Methods
Methods implemented in WannierBerri and allowing the performance boost are described in the following preprint article
Stepan S. Tsirkin. “High performance Wannier interpolation of Berry curvature and related quantities with WannierBerri code”, npj Comput Mater 7, 33 (2021). (Open Access).
Below is given a section from that paper, just for reference.
General equations for Wannier interpolation
The goal of this section is to introduce notation necessary for further discussion. For more details please refer to review (Marzari et al. 2012) and original articles cited therein. The problem of Wannier interpolation is stated in the following way. First we evaluate the energies \(E_{n{\bf q}}\) and wavefunctions \(\psi_{n{\bf q}}({\bf r})\equiv e^{i{\bf q}\cdot{\bf r}}u_{n{\bf q}}({\bf r})\) from first principles on a rather coarse grid of \(N_{\bf q}=N_{\bf q}^1\times N_{\bf q}^2\times N_{\bf q}^3\) wavevectors \({\bf q}\) within the reciprocal unit cell. Next we want to find the energies and wavefunctions at points on a denser grid of wavevectors \({\bf k}\). Further we will consistently use \({\bf q}\) and \({\bf k}\) to denote the ab initio and interpolation grids respectively.
For a group of entangled bands one can define a set of \(J\) WFs defined as
where \({\cal J}_{\bf q}\ge J\) and \({\bf R}\) are real-space lattice vectors. The matrices \(V_{mn'}({\bf q})\) contain all the information of the construction of WFs, and may be generated by Wannier90 code. They are constrained by \(\sum_{m=1}^{{\cal J}_{\bf q}} V^*_{mn}({\bf q})V_{mn'}({\bf q})=\delta_{nn'}\) and are chosen in such a way that the WFs are localized, which yields that the Bloch wavefunctions in the Wannier gauge
vary slowly with the \({\bf k}\) vector, unlike the true wavefunctions. Now let us see how WFs may be used to interpolate the band energies. First, one evaluates the matrix elements of the Hamiltonian
Next, to obtain energies at an arbitrary point \({\bf k}\) one needs to construct the Wannier Hamiltonian
which further may be diagonalized as
where \(U_{nl'}({\bf k})\) are unitary matrices with columns corresponding to the eigenvectors of the Hamiltonian (6). In a similar way, for any operator \(\hat{X}\), for which the matrix elements are evaluated on the ab initio grid, one may obtain the real-space matrix elements
where in a simple case (e.g. \(\hat{X}=\boldsymbol{\sigma}\))
or if \(\hat{X}\) involves momentum-space derivatives, (e.g. the position operator \(\hat{r}_\alpha\equiv i\partial/\partial k\alpha\)) may also involve matrix elements between neighbouring \({\bf q}\) points (see Wang et al. 2006, Lopez et al. 2012 for details). Then the matrix elements may be interpolated to any \({\bf k}\) point in the Wannier gauge by
and further rotated to the Hamiltonian gauge
Note that equations (5), (6) are particular cases of (7) and (9). Equation (7) can be performed by means of FFT, and its result is periodic in \({\bf R}\) with a supercell formed by vectors \(\mathbf{A}_i=\mathbf{a}_iN_{\bf q}^i\), where \(\mathbf{a}_i\) (\(i=1,2,3\)) are the primitive unit cell vectors. Among the equivalent \({\bf R}\) vectors we choose those belonging to the corresponding Wigner-Seitz (WS) supercell. If an \({\bf R}\) vector belongs to the WS supercell boundary, we include all equivalent vectors on the boundary with the corresponding elements \(X({\bf R})\) divided by the degeneracy of the \({\bf R}\) vector. Further, the MDRS method (see Minimal-distance replica selection method) may also slightly modify the set of \({\bf R}\) vectors.
As an example, the total Berry curvature of the occupied manifold is interpolated (Wang et al. 2006) via
where the ingredients of the equation are obtained using eqs. (9), (10) starting from \(D_{nl,\alpha}\equiv\frac{\overline{H}_{nl,\alpha}^{\rm H}}{E_l-E_n}\), \(H_\alpha^{\rm W}\equiv\partial_\alpha H^{\rm W}\), \(A_{mn,\alpha}({\bf R})\equiv\langle\mathbf{0}m\vert\hat{r}_\alpha\vert{\bf R}n\rangle\), \(\overline{\Omega}_\gamma^{\rm W} \equiv\epsilon_{\alpha\beta\gamma}\partial_\alpha A^{\rm W}_\beta\), \(\partial_\alpha\equiv \partial/\partial{k_\alpha}\). The anomalous Hall conductivity is evaluated as an integral
Note, that while the direct Fourier transform ((7)) is performed only once for the calculation, and is not repeated for the multiple \({\bf k}\) points upon interpolation, the inverse Fourier transform ((9)) is repeated for every interpolation \({\bf k}\) point. And in fact it presents the most time-consuming part of the calculation involving Wannier interpolation as implemented in the Wannier90 code.
Mixed Fourier transform
Fig. 2 (a-f) Illustration of the procedure of mixed Fourier transform, adaptive refinement and use of symmetries. 2D picture is used for visualization purposes, while the code actually works in 3D. The area of colored circles corresponds to the weight of the \(\mathbf{K}\)-point, gray crosses denote the points with zero weight. See the text for detailed description. (g) AHC of bcc Fe, evaluated from a grid of \(52\times 52\times 52\) \(\mathbf{k}\) points and 20 recursive adaptive refinement iterations.
In this section we will see how the evaluation of ((9)) may be accelerated. It is easy to see that the computation time of a straightforward discrete Fourier transform scales with the number of \({\bf R}\) vectors and \({\bf k}\) points as \(t\propto N_{\bf R}N_{\bf k}\), and we are typically interested in a case \(N_{\bf k}\gg N_{\bf R}\) (\(N_{\bf R}\approx N_{\bf q}\)).
When the Fourier transform is done on a regular grid of \({\bf k}\) points, it is usually appealing to use the FFT. For that one needs to place the \({\bf R}\) vectors on a regular grid of size \(N_{\bf k}\), fill the missing spots with zeros and perform the standard FFT, which will scale as \(t\propto N_{\bf k}\log{N_{\bf k}}\). However there are some dificulties with such FFT. Mainly, because to perform FFT on a large grid implies storing the data for all \({\bf k}\) points in memory at the same time, which becomes a severe computational limitation. Also FFT does not allow to reduce computation to only the symmetry-irreducible \({\bf k}\) points, and is more difficult to do in parallel. However there is a way to combine the advantages of both the FFT and the usual discrete Fourier transform, leading to the concept of mixed Fourier transform.
We want to evaluate ((9)) for a set of \({\bf k}\) points.
where \(0\le n_i< N_{\bf k}^i\) – integers (\(i=1,2,3\)), \(N_{\bf k}^i\) – size of interpolation grid, \({\bf b}_i\) — reciprocal lattice vectors. Now suppose we can factorize \(N_{\bf k}^i=N_{\rm FFT}^i N_{\bf K}^i\) [2] . Then the set of points ((14)) is equivalent to a set of points \({\bf k}={\bf K}+\boldsymbol{\kappa}\), where
where \(0\le l_i< N_{\bf K}^i\), \(N_{\bf K}=\prod_i N_{\bf K}^i\), \(0\le m_i< N_{\rm FFT}^i\). This separation is illustrated in Fig. 2 (a), which shows a 2\(\times\)2 \({\bf K}\)-grid, each corresponding to 4\(\times\)4 FFT grid (dots of a certain color). Now for each \({\bf K}\)-point we can define
and then (9) reads as
The principle idea of mixed Fourier transform consists in performing the Fourier transform (16) as FFT, while (15) is performed directly. To perform the FFT we put all the \({\bf R}\) vectors on a grid \(N_{\rm FFT}^1\times N_{\rm FFT}^2\times N_{\rm FFT}^3\), and a vector \({\bf R}=\sum_{i=1}^3 n_i\mathbf{a}_i\) is placed on a slot with coordinates \(\widetilde{n}_i= n_i\,{\rm mod}\,N_{\rm FFT}^i\) (\(n_i\) are both positive and negative integers, while \(0\le \widetilde{n}_i<N_{\rm FFT}^i\)). It is important to choose the FFT grid big enough, so that two different \({\bf R}\) vectors are not placed on the same slot in the grid.
The advantages of this approach are the following. First, the computational time scales as \(t_1\propto N_{\bf K}N_{\bf R}\) for (15) and \(t_2\propto N_{\bf K}N_{\rm FFT}\log N_{\rm FFT}\) for (16). Because it is required that \(N_{\rm FFT}\ge N_{\bf R}\) (to fit all \({\bf R}\)-vectors in the FFT box), we have \(t_1 \le t_2 \propto N_{\bf k}\log N_{\rm FFT}\) (in practice it occurs that \(t_1 \ll t_2\)), which scales better then both the Fast and ’slow’ Fourier transforms. Next, we can perform Eqs. (15) and (16) independently for different \({\bf K}\)-points. This saves us memory, and also offers a simple parallelization scheme. Also we can further restrict evaluation only to symmetry irreducible \({\bf K}\)-points (Symmetries) and also perform adaptive refinement over \({\bf K}\)-points (Recursive adaptive refinement).
Moreover, the evaluation time of a mixed Fourier transform only logarithmically depends on the size of the ab initio grid (recall that \(N_{\rm FFT}\sim N_{\bf R}\sim N_{\bf q}\)), while for the slow Fourier transform, the dependence is linear. However, in practice we see that the Fourier transform in the present implementation consumes only a small portion of computational time, and therefore the overall computational time is practically independent of the size of the ab initio grid.
Symmetries
When we integrate some quantity over the BZ, at every \({\bf K}\)-point (after summing over \(\boldsymbol{\kappa}\) points) we obtain the result as a rank-\(m\) tensor \(X_{i_1,\ldots,i_m}({\bf K})\), for example the berry curvature vector \(\Omega_\gamma\) or the conductivity tensor \(\sigma_{xy}\). Then the BZ integral is expressed as a sum
and we initially set \(\{{\bf K}\}\) as a regular grid (14) and \(w_{\bf K}=1/N_{\bf K}\). Suppose \(G\) is the magnetic point group of the system. [3] We define the set of symmetry-irreducible \({\bf K}\) points \(\rm irr\) as a a set of points that \(\forall {\bf K},{\bf K}'\in{\rm irr}\), \(\forall g\in G\) holds \(g{\bf K}\neq{\bf K}'\), unless \(g=E\) (identity). Then we can rewrite the sum (17) as
where we choose \(g_{\bf K}\) such that \(g_{\bf K}^{-1}{\bf K}\in{\rm irr}\) (this choice may be not unique), and obviously \(g_{\bf K}=E\) for \({\bf K}\in{\rm irr}\). Thus, only the irreducible \({\bf K}\) points need to be evaluated. Next, to make sure that the result respects the symmetries, despite possible numerical inaccuracies, we symmetrize the result as:
Note, that \({\cal\widetilde X}={\cal X}\) if the model respects the symmetry precisely (e.g. when symmetry-adapted WFs (Sakuma 2013) are used). Combining (18) and (19) and using \(\sum_f^{G} f\cdot g_{\bf K}= \sum_f^{G} f\) we get
where \(G\cdot{\bf K}\) denotes the orbit of \({\bf K}\) under
action of group \(G\). The latter equation reflects the
implementation in the WB
code. Starting from a regular grid of
\({\bf K}\) points we search for pairs of symmetry-equivalent
points. Whenever such a pair is found, one of the points is excluded and
it’s weight is transferred to the other point. Compare
Fig. 2 (a) and (b): the red
points are removed and their weight is moved to green points. Thus we
end with a set of irreducible \({\bf K}\)-point with weights
\(\widetilde{w}_{\bf K}=\sum_{{\bf K}'}^{G\cdot{\bf K}} w_{{\bf K}'}\).
Next we evaluate \(X({\bf K})\) (employing the corresponding
interpolation scheme) only at symmetry-irreducible
\({\bf K}\)-points. Note, that although some \({\bf k}\)-points
corresponding to the same \({\bf K}\)-point (same color in
Fig. 2 are equivalent, we have to
evaluate them all to be able to use the FFT. Finally, after summation,
we symmetrize the result. The described procedure achieves two goals:
(i) reduce the computational costs and (ii) make the result precisely
symmetric, even if the WFs are not perfectly symmetric. In the present
example we managed to obtain highly symmetric WFs (although without
employment of symmetry-adapted WFs method), and therefore the
symmetrization procedure does not change the result (within relative
accuracy \(\sim 10^{-5}\)). However, for complex materials such
quality of WFs is not always easy to achieve.
Recursive adaptive refinement
It is well known that in calculations of quantities involving Berry curvature or orbital moments, one performs integration over \({\bf k}\)-space of a function that rapidly changes with \({\bf k}\). As a result, small areas of \({\bf k}\)-space give the major contribution to the integral. Such areas often appear in the vicinity of Weyl points, nodal lines, as well as avoided crossings. To accelerate convergence with respect to the number of \({\bf k}\) points, we utilize adaptive mesh refinement similar to Refs. ( Yao et al. 2004; Wang et al. 2006). The authors of (Yao et al. 2004; Wang et al. 2006) assumed a pre-defined threshold, and the \({\bf k}\)-points yielding Berry curvature above the threshold were refined. This is inconvenient because one needs a good intuition to guess an optimal value for this threshold, because it depends both on the quantity one wants to calculate, and the material considered.
In WB
it is implemented in a way that does not require initial guess
from the user. This procedure, in combination with symmetrization
described above, is illustrated in
Fig. 2 in two dimensions (2D),
while the actual work in 3D is described below. After excluding
symmetry-equivalent \({\bf K}\)-points
(Fig. 2 (b)) the results are
evaluated for every \({\bf K}\) point and stored. We assume that
initially each \({\bf K}\) point has weight
\(\widetilde{w}_{\bf K}\) and corresponds to a volume defined by
vectors \(\mathbf{c}_{\bf K}^i=\mathbf{b}_i/N_{\bf k}^i\) centered
at \({\bf K}\). Then we pick a few “most important
\({\bf K}\)-points”. The criteria of importance may be different -
either the Maximal value for any \(E_F\), or maximal value summed
over all \(E_F\), or yielding most variation over the \(E_F\)
(if the evaluated quantity is a function of Fermi level \(E_F\)).
Suppose we selected the magenta point. Then those points are refined —
replaced with 8 points around it with coordinates
where all combinationgs of \(\pm\) signs are used. In Fig. 2 (c) 4 new blue \({\bf K}\)-points in the 2D case. The weight and volume of the initial point is distributed over the new points, thus \(w_{{\bf K}'}=\widetilde{w}_{\bf K}/8\) and \(\mathbf{c}_{{\bf K}'}^i=\mathbf{c}_{{\bf K}}^i/2\). Then the symmetrization is applied again (the four blue points are connected by 4-fold rotation) to exclude the equivalent points, and the weight of the equivalent points is collected on the remaining point, while the vectors \(\mathbf{c}_{{\bf K}'}^i\) are not changed. After the new \({\bf K}\)-points are evaluated, we go to the next iteration of refinement. On each iteration any point may be refined, including both those from the initial regular grid, and those created during previous refinement iterations. The procedure stops after the pre-defined number of iterations was performed. Fig. 2 (g) shows how undesired artificial peaks of the the AHC curve are removed iteration by iteration, yielding a smooth curve.
Minimal-distance replica selection method
The MDRS method (Pizzi et al. 2020) allows to obtain a more accurate Wannier interpolation, in particular when moderate \({\bf q}\)-grids are used in the ab initio calculations. With MDRS method the Fourier transform (9) is modified in the following way:
where \(\mathbf{T}_{mn{\bf R}}^{(j)}\) are
\({\cal N}_{mn{\bf R}}\) lattice vectors that minimise the distance
\(|{\bf r}_m-({\bf r}_n+{\bf R}+{\bf T})|\) for a given set
\(m,n,{\bf R}\). However, the evaluation of
(21) is quite slower than
(9), because every
\({\bf k},m,n,{\bf R}\) an extra loop over \(j\) is needed.
Therefore calculations employing MDRS in postw90.x
(which is enabled
by default) takes more time. Instead it is convenient to re-define the
modified real-space matrix elements as
only once for the calculation, and then the transformation to \({\bf k}\)-space is performed via
Note, that the set of \({\bf R}\) vectors in (22) is increased compared to the initial set of vectors in (7) in order to fit all nonzero elements \(\widetilde{X}_{mn}({\bf R})\) Equation (23) having essentially same form as (9), can be evaluated via mixed Fourier transform, as described in Mixed Fourier transform.
Thus the MDRS method implemented in WB
via
Eqs. (22)-(23),
and has practically no extra computational cost, while giving notable
accuracy improvement.