Troubleshooting

This page documents some of the error messages that can appear in the Questaal suite, and their resolution.

Fatal errors typically begin with a message of the form  Exit -1 routine-name message, indicating where and why the program failed. Non-fatal warnings are sometimes printed as well; these usually contain  “(warning)”  or something similar.

If the warning is severe, there will be an accompanying exclamation mark, e.g.  warning!, which will be logged and indicated on exit (see severe warnings below).

Error messages are also summarized, with solutions, on the Error Messages page. The treatment here is more discursive, and it also covers problems that may arise without producing a message.


General

Generally, your directory should be cleaned after every complete simulation. For example, if you run a simulation with extension .si, then edit the input files and rerun the simulation, you may get conflicts because old information from previous .si runs still exists in the directory.

It is thus good practice to clean your directory of unnecessary files before a new simulation, especially the rst and mixm files. Only files related to the material in question need be removed.
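As an illustration of the cleanup step, here is a minimal Python sketch that looks for leftover rst and mixm files for a given extension. It assumes the files are named stem.ext (as with rst.ext elsewhere on this page); the function names and the dry-run default are hypothetical, not part of Questaal.

```python
from pathlib import Path

# File stems the text singles out as carrying state between runs.
STALE_STEMS = ("rst", "mixm")


def find_stale_files(directory, ext, stems=STALE_STEMS):
    """Return existing files named <stem>.<ext> in `directory`, sorted by path."""
    root = Path(directory)
    candidates = (root / f"{stem}.{ext}" for stem in stems)
    return sorted(p for p in candidates if p.is_file())


def clean_directory(directory, ext, dry_run=True):
    """Report stale files for extension `ext`; delete them unless dry_run is set.

    Returns the list of paths that were found.
    """
    stale = find_stale_files(directory, ext)
    for path in stale:
        if not dry_run:
            path.unlink()
    return stale
```

Calling clean_directory(".", "si") only lists the files; pass dry_run=False to remove them. Input files such as ctrl.si are never touched because only the listed stems are considered.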

Also, you should inspect the standard output! Particularly look out for the severe warning message when the code exits.

Severe Warnings

If a Questaal program includes a  [severe warning]  in the Exit message:

Exit n [1 severe warning] message

it means that a significant problem was encountered which should be investigated. A warning of this type is logged to indicate that you should look for a message somewhere in the standard output containing the string  warning!.

Note: less severe warnings can also appear in the text. These are not indicated when the program exits.
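The check described above can be automated on saved output. The following Python sketch assumes the exit line has the form shown above and that a plural "severe warnings" is used when there is more than one (an assumption; only the singular form appears on this page). The function name is hypothetical.

```python
import re

# Matches e.g. "Exit 0 [1 severe warning] message"; the optional plural "s"
# is an assumption, since only the singular form is documented.
EXIT_RE = re.compile(r"Exit\s+(-?\d+)\s+\[(\d+) severe warnings?\]")


def scan_output(text):
    """Return (severe_count, flagged) for a block of standard output.

    severe_count is the number reported on the Exit line (0 if absent);
    flagged is a list of (line_number, stripped_line) for every line
    containing the string "warning!".
    """
    lines = text.splitlines()
    flagged = [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if "warning!" in line
    ]
    severe_count = 0
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            severe_count = int(match.group(2))
    return severe_count, flagged
```

Lines containing only "(warning)" are deliberately not flagged, since the less severe warnings are not counted on exit.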

1. Unexpected MT radius

If you get an error similar to the following:

unexpected value # for file rmt ... expected #

it is probably because you have a mismatched restart file rst.ext which contains augmentation radii from a prior run. This information, should you change your input files, could become invalid for the current simulation and thus cause errors. See Error messages.

2. Problems with Radial Schrodinger equation solver

A message like this may appear:

RSEQ : nit gt999 and bad nodes for l=2.  Sought 0 but found 1.  e=-2.0811D-01

This can occur for a myriad of reasons, usually because something is amiss in the potential (issue [3]), or because the linearization energy is too far out of range (issue [4]).

3. Small errors in radial integration, ASA

The Questaal codes integrate partial waves on a radial mesh.

It turns out that the algorithm they use can become unstable in the ASA when GGA functionals are used. This is because very small discontinuities can appear in the second energy derivative of the partial wave, at the point where the inward and outward radial integrations meet. This tiny error gets amplified by the GGA, since the potential involves gradients and Laplacians of the density.

The resolution is to add 4 to whatever value you have assigned (if any so far) to tag HAM_QASA.

This switch causes the integrator to integrate only outwards when making the partial wave and its energy derivatives. (The standard procedure is to integrate inwards and outwards to a middle point, and join the solutions.) Outward-only integration can be slightly erroneous when states are deep and core-like (when the wave function is decaying exponentially away from the nucleus), but in the ASA you are not likely to have a valence state where this is a significant issue.

4. Bad logarithmic derivative parameters

The linear method approximates the energy-dependent partial wave with a first-order Taylor series about a linearization energy, traditionally called Eν. Normally you do not specify Eν in the Questaal package, but rather the continuously varying principal quantum number Pl. You can specify Pl yourself (indeed, when getting started an initial estimate is necessary), but normally the codes will float this quantity in the course of a self-consistency cycle.

4.1 P is too small

It may happen that Pl is floated to a value that is too small (referring to the fractional part of Pl). This can happen if the natural band center is far above the Fermi level, e.g. the Ga state. lmf mostly protects against this by not allowing Pl to fall below the free electron value but the ASA codes do not. This is intentional, because what value is acceptable depends on whether the state is folded down or not. (lmf does not have this capability yet.)

Note: This is a common reason for nonsensical ghost bands to appear.

By default lmf uses the free electron value as a lower bound for Pl. This is usually adequate. If HAM_AUTOBAS_GW is true, the lower bound is set a little bit higher. This is because in the GW case the unoccupied states also affect the potential, so there is a higher demand on the precision of states above the Fermi level.

You can also freeze Pl with token SPEC_ATOM_IDMOD.

4.2 P is too large

It may happen that Pl is floated to a value that is too large (referring to the fractional part of Pl), i.e. too close to 1. This can happen if the natural band center is very deep; this can occur when using lmf with local orbitals. Usually it is not an issue, but if it is, you can set PZ to a fixed value and freeze it with SPEC_ATOM_IDMOD.

It can also happen that the floating algorithm goes haywire and puts Pl unrealistically high. This is rare, but it occurs occasionally. A classic example appears when carrying out a QSGW calculation for Ni, where the usual floating algorithm tries to set Pd=3.96 (it should be about 3.85).

The resolution to this is to revert to lmf’s traditional floating algorithm. By default lmf uses a newer algorithm introduced by Takao Kotani with the advent of version 7.0. We make the newer scheme the default because it should in principle be slightly better (in practice there is little difference), but for reasons unknown it occasionally develops problems.

The choice of floating algorithm is buried in the second argument to the pfloat tag, input through either express_autobas_pfloat or HAM_AUTOBAS_PFLOAT. You can find a brief description of the tag by running

$ lmf --input

To select the traditional floating algorithm, set the second element of pfloat to zero, in the input file e.g.

express autobas[... pfloat=2,0]

Note: If you build the input file with the blm utility, you can cause it to autoselect this algorithm by invoking blm with switch  --clfloat.

Alternatively, you can freeze Pl with token SPEC_ATOM_IDMOD, but usually the floating algorithm will make a better choice.
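Sections 4.1 and 4.2 both refer to the fractional part of Pl. As a rough aid for spotting a value that has floated too far, the Python sketch below splits P into its integer and fractional parts and flags channels whose fractional part differs from a reference you supply. The reference values and the tolerance of 0.05 are hypothetical; Questaal does not define them.

```python
import math


def split_p(p):
    """Split P into (principal quantum number, fractional part)."""
    n = math.floor(p)
    return n, round(p - n, 6)


def flag_drift(floated, reference, tol=0.05):
    """Compare floated P values against user-supplied reference values.

    Both arguments map l to P. Returns {l: (floated_P, reference_P)} for
    channels whose fractional parts differ by more than `tol`
    (a hypothetical threshold, chosen only for illustration).
    """
    flagged = {}
    for l, p in floated.items():
        if l not in reference:
            continue
        _, frac = split_p(p)
        _, frac_ref = split_p(reference[l])
        if abs(frac - frac_ref) > tol:
            flagged[l] = (p, reference[l])
    return flagged
```

With the Ni example from the text, flag_drift({2: 3.96}, {2: 3.85}) reports the d channel, since the fractional parts 0.96 and 0.85 differ by more than the tolerance.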

4.3 PZ is too large

The local orbital’s version of P (called PZ) can occasionally become too large for deep-lying states. If you freeze it with SPEC_ATOM_IDMOD, both PZ and the usual valence P are frozen. For an example, see this tutorial.

5. Inconsistent treatment of local orbitals

5.1 Inconsistent contents of local orbitals in the restart file

When reading the restart file, lmf may produce a message like this:

         site   1:Se      :file pz  is  0.00  0.00  0.00  0.00
                          given pz  is  0.00  0.00  3.90  0.00
                          warning!  local orbital mismatch

lmf has found that the local orbitals specified in the input file are not consistent with those in the restart file. This mismatch flags a severe warning, and may also cause lmf to stop with an error, failing to find the eigenvalues.

For solution, see Error Messages.
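To see which channel is responsible, the two pz lines of the message can be compared programmatically. This Python sketch assumes the values on each line are listed in order of increasing l starting from l=0; the function names are hypothetical.

```python
import re

FLOAT_RE = re.compile(r"-?\d+\.\d+")


def pz_values(line):
    """Extract the PZ values that follow the word 'is' on a pz line."""
    _, _, tail = line.partition(" is ")
    return [float(x) for x in FLOAT_RE.findall(tail)]


def mismatched_channels(file_line, given_line):
    """Return [(l, file_pz, given_pz)] for every channel where the two differ.

    Assumes position i on each line corresponds to l = i.
    """
    file_pz = pz_values(file_line)
    given_pz = pz_values(given_line)
    return [
        (l, f, g)
        for l, (f, g) in enumerate(zip(file_pz, given_pz))
        if f != g
    ]
```

For the message shown above this returns a single entry for l=2: the input file requests a local orbital with PZ=3.90 that the restart file does not contain.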

5.2 Error in reading the atm file

When reading the file atm.ext, lmf may abort with an error message like this:

 is=1 qsc0=10 qsca=10 qc=28 qc0=28
 Exit -1 problem in locpot -- possibly low LMXA, or orbital mismatch, species Se

This can happen for more than one reason, but the most common is that lmfa generated the atm file with a valence-core partitioning different from what lmf expects.
Usually this means you invoked lmfa with a different set of local orbitals than the current input conditions specify (see the callout box on this page). For example, lmfa creates the atm file; you then create a new basp file with a different set of local orbitals, but do not re-run lmfa with the altered valence-core partitioning.

The solution is to run lmfa again with the same input conditions as lmf expects.

6. Non-integral number of electrons

You may encounter this message, especially when using the tetrahedron method:

 (warning): non-integral number of electrons --- possible band crossing at E_f

In finding a Fermi level, the integrator assigns weights to each state. This message is printed when the sum of weights doesn’t add up to an integral number of electrons. It most likely occurs when using the tetrahedron integration scheme and two bands cross near the Fermi level. This confuses the tetrahedron integrator because it doesn’t know how to smoothly interpolate the bands. The larger the system and the denser the mesh of bands, the more likely this problem is to appear.

It can also appear if you use a non-integral nuclear charge, or add background charge to the system. This is not an error, and you can disregard the warning.

For the band-crossing case, the resolution is to change the number of k divisions. It can also happen that the problem resolves itself in the course of the self-consistency cycle, as the potential changes.

7. Inexact Inverse Bloch transform

This error may appear when the (static) self-energy Σ0 is read from disk, causing the program to stop:

 Oops!  Bloch sum deviates more than specified tolerance:
   i   j      diff              bloch sum                file value
   1   1    0.000020     -0.020464    0.000000     -0.020443    0.000000
 Try increasing HAM_RSRNGE or HAM_RSSTOL

This occurs when the inverse Bloch sum of Σ0(k) to the real-space Σ0(T) is inexact. (Here T is a lattice translation vector.) The reader performs the inverse Bloch transform using FFT techniques. It is followed by a forward Bloch sum, and the result is compared against the original Σ0(k).

For the inverse sum, a cluster of points around each atom is generated. The radius of the cluster is printed out in a line similar to

 hft2rs: make neighbor table for r.s. hamiltonian using range = 4.9237 * alat

The number of pairs (connecting vectors) it finds within spheres around each atom of radius  range  is printed out in lines like these:

 hft2rs: found 1020 connecting vectors out of 1024 possible for FFT
 symiax: enlarged neighbor table from 1020 to 1452 pairs (48 symops)

The first line tells you how many pairs it found, and also the number of pairs it requires for the FFT to be exact. (The second tells you how many pairs it “padded” by adding equivalent boundary points to the original list, but this is not important here.) The (backward, forward) process should be exact provided enough pairs are available. In the lines above only 1020 pairs were found, while 1024 pairs are needed so as not to lose any information in the process.

To resolve this problem, increase  HAM_RSRNGE. You can use as large a number as you like, but the larger the number, the slower the calculation. It is best to increase  HAM_RSRNGE just enough (say 4.9237→6) that the error does not appear. Alternatively, you can increase the acceptable tolerance in the error (HAM_RSSTOL).
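The diagnosis above amounts to comparing two numbers in the hft2rs output. As a sketch, the following Python function extracts them from saved output, together with the range that was used, so you can see how many pairs are missing before choosing a new HAM_RSRNGE. The regular expressions are modelled on the sample lines above, and the function name is hypothetical.

```python
import re

FOUND_RE = re.compile(
    r"hft2rs: found (\d+) connecting vectors out of (\d+) possible"
)
RANGE_RE = re.compile(r"hft2rs: .*range = ([0-9.]+) \* alat")


def check_pairs(output):
    """Summarize the hft2rs pair count found in `output`.

    Returns None if no pair-count line is present; otherwise a dict with
    the pairs found, the pairs needed for an exact FFT, the shortfall,
    and the range in units of alat (None if that line is absent).
    """
    match = FOUND_RE.search(output)
    if match is None:
        return None
    found, needed = int(match.group(1)), int(match.group(2))
    range_match = RANGE_RE.search(output)
    rng = float(range_match.group(1)) if range_match else None
    return {
        "found": found,
        "needed": needed,
        "missing": needed - found,
        "range": rng,
    }
```

A nonzero "missing" entry indicates that the range was too small for the transform to be exact.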

8. Failure to find all eigenvalues

The diagonalizer is unable to calculate all of the eigenvalues. lmf aborts with a message similar to
Exit -1 zhev: zhegv cannot find all evals

The ASA lm may abort with the message
DIAGNO: tinvit cannot find all evecs

This can happen for several reasons.

  • The diagonalizer sometimes uses inverse iteration to diagonalize the tridiagonal form of the matrix after the Householder transformation.

    Solution: Set [BZ_INVIT] to false; another algorithm will be used. If this is the problem it will usually disappear with some tiny change, e.g. the density is updated.

  • Another common reason for this error is that the overlap matrix is not positive definite.

    • Especially in the ASA, this can happen if spheres overlap too much or the potential is very poor. Change the input conditions.

    • If this occurs when using the lmf code, it may be that convergence parameters are too loose. Especially the PMT basis can produce nearly singular overlap matrices when both the LMTO and APW basis are sizable. This is because they are spanning nearly the same Hilbert space (this is the primary drawback to the method).

    • This can also happen with lmf if your restart file has a valence-core partitioning inconsistent with the ctrl file, also explained here.

      Solutions:

      • First try increasing KMXA, the order of the polynomial used in the augmentation method. The default is 3, so try 4 or 5.
      • Tighten the Ewald tolerance (EWALD_TOL) and the tolerance in the plane-wave expansion of envelope functions (HAM_TOL).
      • Reduce HAM_PWEMAX.
      • Remove orbitals from the LMTO basis or set EH more negative.
      • Set HAM_OVEPS to a small number, e.g. 1e-6.
      • Remove rst.ext.

9. Too many group operations cause code to crash

You may see a message similar to this one when running many of the executables, e.g. lmchk and lmf.

 Exit -1 GRPGEN: too many elements

or this one:

 Exit -1 SGROUP: ng greater than ###:  see Questaal troubleshooting web page

or this one:

pwd MKSYM (warning): generators create more than ngmx=## group ops ...

This can happen for different reasons:

  1. You have made an artificial supercell and an executable like lmchk finds too many internal translation vectors.

  2. You did not specify the lattice vectors with enough precision. If this is the issue the command line switch --tidy can often resolve the problem. See this tutorial for more detail. See in particular Example 5 and Example 7.

  3. You did not specify the basis vectors with enough precision. Here again, --tidy usually cleans up the site positions. Sometimes some extra intervention is needed, however. See Example 6 of the same tutorial for an example and a remedy.

10. Warning! symmetry operations not consistent with Bravais lattice!

This message usually appears because the lattice vectors were not supplied with enough precision. If this is the issue the --tidy switch can often resolve the problem. Example 1 in this tutorial provides an instance, an analysis of the problem, and a resolution.

11. Exit -1 lgen: more than 1 missing plat … try reducing EWALD_TOL

This may happen if you have a long, pencil-like unit cell. Reduce EWALD_TOL by adding this tag:

EWALD TOL=1e-12

or some other tolerance much smaller than the default (1e-8).

12. Exit -1 xlgen: too many vectors, n=

This may happen when you have very eccentric cells, or when EWALD_TOL is very small (see the preceding issue, 11). The number of allowed direct lattice vectors can be increased with the EWALD_NKDMX tag, e.g.

EWALD TOL=1e-12 NKDMX=5000

13. NGHBOR: too many pairs: …

This may happen in different contexts, e.g. running lmchk --angles. The default radius for creating neighbor tables (tables of atom pairs) is too small, which may occur if the length of one or more of the lattice vectors (PLAT) is significantly larger than unity. You can modify the input file by scaling PLAT by some factor, and scaling ALAT by its inverse.

In some contexts you can fix the problem by increasing the radius, e.g. lmchk --angles:r=3.
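The rescaling of PLAT and ALAT mentioned above leaves the physical lattice vectors (ALAT times PLAT) unchanged. A minimal Python sketch of the arithmetic follows; the function name is hypothetical, and the numbers in the usage note are invented for illustration.

```python
def rescale_lattice(alat, plat, factor):
    """Scale PLAT by `factor` and ALAT by its inverse.

    `plat` is a list of three lattice vectors (rows). The products
    alat * plat[i][j], i.e. the physical lattice vectors, are unchanged.
    Returns (new_alat, new_plat).
    """
    if factor == 0:
        raise ValueError("factor must be nonzero")
    new_plat = [[factor * x for x in row] for row in plat]
    return alat / factor, new_plat
```

For example, rescale_lattice(2.0, [[4, 0, 0], [0, 4, 0], [0, 0, 12]], 0.25) gives ALAT=8.0 and lattice vectors of length 1, 1 and 3, describing the same cell.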

14. mapirrq (warning): small mismatch q …

This can occur if the lattice vectors deviate slightly from those dictated by a given symmetry. On rare occasions it can cause the GW code to crash.

As one example consider the following set of lattice vectors:

1    0          0 
-1/4 0.4330128  0 
0    0        19.612656/6.387771

This is equivalent, to seven decimal places, to

1    0          0 
-1/4 sqrt(3/16) 0 
0    0        19.612656/6.387771

Most of the time this slight inconsistency is accommodated, but on rare occasions (this case is one example) the difference is enough to confuse the GW code and cause it to crash. The remedy is to improve the precision when defining the lattice vectors.
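The size of the inconsistency in this example is easy to quantify. The Python sketch below compares the lattice vectors as typed with the symmetry-dictated values, element by element; the variable and function names are hypothetical.

```python
import math

# Third lattice vector component, identical in both versions.
C_OVER_A = 19.612656 / 6.387771

# Lattice vectors as typed (rounded to seven decimal places) ...
PLAT_GIVEN = [
    [1.0, 0.0, 0.0],
    [-0.25, 0.4330128, 0.0],
    [0.0, 0.0, C_OVER_A],
]
# ... and with the exact value sqrt(3/16) in place of 0.4330128.
PLAT_EXACT = [
    [1.0, 0.0, 0.0],
    [-0.25, math.sqrt(3 / 16), 0.0],
    [0.0, 0.0, C_OVER_A],
]


def max_deviation(given, exact):
    """Largest absolute element-wise difference between two sets of vectors."""
    return max(
        abs(g - e)
        for given_row, exact_row in zip(given, exact)
        for g, e in zip(given_row, exact_row)
    )
```

Here max_deviation(PLAT_GIVEN, PLAT_EXACT) is just under 1e-7, which is the level of mismatch that proved sufficient to cause trouble in this case.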

B1 (blm) CVPLAT error message

blm may produce this error:

CVPLAT: could not calculate platcv

Usually this message indicates that the symmetry finder got confused when trying to deduce the crystal system from the given lattice vectors.

The most likely reason is that it found some point group operations, but not an internally consistent set. This can happen if you don’t supply enough precision in the lattice vectors. The command line switch --tidy can usually resolve the problem. See this tutorial for more details, in particular Example 3.

Warning: --tidy will use the (approximate) symmetry operations it finds to make small adjustments to either lattice vectors or site positions. If you really want the configuration as given, run blm with the --nosym switch.

B2 (blm) BRAVSY : Inconsistent rotations

blm may produce an error similar to this one:

BRAVSY : inconsistent threefold rotations ... try running blm with --pr55 --quit=crysys

As in the case of the CVPLAT error message, the lattice vectors are too imprecise to yield a consistent set of symmetry group operations. See Example 2 in this tutorial for an example and a resolution.

B3 (blm) FIXPOS error message

blm may produce an error similar to this one:

 Exit -1 FIXPOS: positions incompatible with symgrp:  dpos=0.000035

This means the site positions you specified are incompatible with the symmetry operations.

You may have to reduce the symmetry operations, or it may be that the site positions are incompatible merely because they were specified with insufficient precision.
Try blm --tidy or blm --fixpos:tol=1e-4 or even blm --fixpos:tol=1e-3.
Usually the (command line switch) --tidy will resolve the problem. See this tutorial.

Warning: These switches make small adjustments to either lattice vectors or site positions.

B4 (blm) bug in pfixplat

You may occasionally encounter this error message when using the --tidy switch

 Exit -1 bug in pfixplat

An example of this can be found in the POSCAR file for YbTlPd in the Materials database.

Routine pfixplat, which tries to make adjustments to the lattice vectors approximately consistent with symmetry operations, is not entirely robust. A safe thing to do is to suppress the call to that routine, e.g.

blm --tidy~fixlat=0 ...

G1 Error message ecore>evalence encountered

hsfp0 may produce this error:

 ---- hsfp0 ixc=3: ecore>evalence
 ---- ERROR EXIT!

This means that some core state is higher than the lowest valence state.

If you look near the bottom of the output you should find a message similar to the following:

 hsfp0 core level ecore(  4,1) =  -2.3631 lies above bottom of valence band = -21.4956

It says that core level #4, spin 1 is the offender. Look at the core table in the GWinput file. In this instance it reads

  atom   l    n  occ unocc   ForX0 ForSxc :CoreState(1=yes, 0=no)
    1    0    1    0    0      0    0    ! 1S *
    1    0    2    0    0      0    0    ! 2S
    1    0    3    0    0      0    0    ! 3S
    1    0    4    1    0      0    0    ! 4S

The fourth core level is the 4s state (it is a Sr atom).

To fix this problem, include the 4s in the valence as a local orbital. (You will also have to include the 4p state, since it lies higher than the 4s.) If you are using a basp file, the species line should contain a PZ entry such as:

 Sr RSMH= ... PZ= 14.9338 14.9148

lmfa should find this state automatically if HAM_AUTOBAS_LOC is set.

Other problems

This section documents problems that do not generate errors or warning messages.

1. Access Denied on https://bitbucket.org/lmto/lm/src/master/README.md

If this happens to you and you have no idea how to resolve it, follow these steps:

  1. Make sure you have accepted the invitation sent by e-mail and can login to https://bitbucket.org.

  2. While you are logged in try to open the link https://bitbucket.org/lmto/lm/src/master/README.md in the same browser. It should now work.

2. blm generates nonsensical site file

You may find blm generates a different structure than what you expect, e.g. the number of sites it generates is not what it should be. It cannot be ruled out that blm has a bug in it, but it is more likely you are using blm incorrectly.

One giveaway is that the number of atoms in the basis is different from what you expect. When you identify the lattice through the space group, blm uses the known symmetry operations of the group to add new atoms to make the basis consistent with the group. The operations it uses are taken from conventions in the International Tables for Crystallography. blm may generate incorrect results because the choice of origin is different from 1. Some crystal structures have two choices of origin; it may be that the input data you have corresponds to the second choice. Unless you specify otherwise blm uses the first choice by default. To specify the latter use LATTICE_ORIGIN=2.

LiFeAs (space group P4/nmm) is a typical instance of this. Information in the literature is most commonly specified using the second choice of origin. Look at file init.lifeas in top-level-directory/testing/init.lifeas. If you invoke blm with ORIGIN=2 taken out of init.lifeas, it will create a site file with 20 atoms and nonsensical coordinates. With ORIGIN=2 included it generates the correct site file with 6 atoms.

As a check of the structure, try running lmchk and verify that bond lengths and bond angles are what you expect, for example lmchk --shell --angles.

3. lmf generates unphysical or “ghost” bands

This can occur for a myriad of reasons, and there is no single explanation, though usually it is connected with deep-lying, core-like states.

One instance where it occurs has to do with inaccuracies in overlapping free-atomic densities. lmf makes a polynomial expansion for the smooth density in the augmentation sphere. When envelope functions are sharply peaked, the expansion for the on-site part can become unstable. In a future release this problem will be solved, but an easy solution for now is to reduce the polynomial cutoff, which you can control with tag SPEC_ATOM_KMXV.

There is an instance of this in the test suite; look at top-level-directory/fp/test/ctrl.cs. You can run this test in any working directory with the command top-level-directory/lm/fp/test/test.fp cs. This input file makes an accurate calculation of Cs; the results should compare very well with the DeltaCodes project.

If you remove the KMXV token from top-level-directory/fp/test/ctrl.cs and run the test, you will find it generates nonsensically deep states.

Another common cause for “ghost” bands: sometimes lmf will try to straddle two principal quantum numbers, which occurs when states near the Fermi level are far removed from the center of gravity of a partial wave. One solution is to freeze the logarithmic derivative parameters in the offending augmentation channel. Freeze these parameters with token SPEC_ATOM_IDMOD. The standard FP and ASA test suites provide several examples of where IDMOD is used; see, e.g. ctrl.kfese or ctrl.zbgan in top-level-directory/fp/test. In the ASA context, ctrl files in top-level-directory/testing make numerous uses of this token.

“Ghost” bands can occur in the QSGW context for a different reason. Interpolation in the QSGW self-energy, Σ0, to k-points other than ones for which it was generated is a delicate matter, and it can go wrong. When it does occur you can often see rapidly varying streaky bands. One simple workaround is to reduce HAM_SIGP_EMAX. Making it smaller reduces the accuracy but makes the interpolation more stable. EMAX=2 is a good compromise if you are having problems. Another alternative is to make the basis functions shorter ranged. Do this by pushing EH deeper; if you are using basp files, you can modify that file.

4. Problems in drawing energy bands

This can occur for a myriad of reasons, and there is no single explanation. If you can create a bnds file using lmf or some other code, but see nothing sensible when drawing pictures, it may be that lmf doesn’t have the right Fermi level.

Look at the top line of the bnds file. It should contain at least three numbers, like this:

300 999.00000     0

The middle number is the Fermi level. If it is large, like 999, it means lmf didn’t read the Fermi level and put a nonsensical number there instead. Run lmf again in regular mode (no --band switch) but add --quit=band so it stops before overwriting the rst file. You can see the Fermi level in the output. Edit the bnds file, or run lmf with --band again.
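The header check can be scripted. This Python sketch reads the second number on the top line of the bnds file and tests whether it looks like a placeholder. The cutoff of 100 is a hypothetical choice made for illustration ("large, like 999" is all the text specifies), and the function names are not part of Questaal.

```python
def read_fermi_level(header_line):
    """Return the Fermi level: the second number on the bnds header line."""
    fields = header_line.split()
    if len(fields) < 3:
        raise ValueError("bnds header should contain at least three numbers")
    return float(fields[1])


def is_placeholder(fermi_level, threshold=100.0):
    """True if the value is implausibly large, e.g. the 999 written when
    the Fermi level was not read. `threshold` is an assumed cutoff."""
    return abs(fermi_level) >= threshold


def check_bnds_file(path):
    """Read the first line of a bnds file; return (fermi_level, looks_bad)."""
    with open(path) as handle:
        header = handle.readline()
    fermi_level = read_fermi_level(header)
    return fermi_level, is_placeholder(fermi_level)
```

For the header shown above, read_fermi_level returns 999.0 and is_placeholder reports True, signalling that the Fermi level needs to be supplied before the bands are drawn.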