[Octopus-devel] [Octopus-notify] svn commit: r3173 - in trunk: liboct src/basic src/species testsuite/finite_systems_2d testsuite/finite_systems_3d by xavier
xavier at tddft.org
Fri Sep 7 15:15:57 WEST 2007
On Fri, 7 Sep 2007, alberto.castro at tddft.org wrote:
> Great work, I could compile and run in single precision in my machine,
> this feature goes back to work. Just one comment regarding the atomic
> code: I remember that the last time that we tried the code in single
> precision we had to make an exception for the atomic code (that module was
> compiled in double precision) because some of the algorithms would not
> converge for some pseudopotentials. Maybe you have not encountered that
> problem, but it is good to remember just in case.
What I had to do was to relax the convergence tolerance, but as you
say, it may be necessary to do it in double precision to get accurate
results.
I also have the problem that the conjugate gradients solver for the
Sternheimer equation converges erratically, while the stabilized
biconjugate gradients (BiCGSTAB) solver converges nicely.
> The time gain is however modest; I don't know what is your experience, but
> in my case, whereas some operations take significantly lower time
> (MF_INTEGRATE), others seem to take almost the same time in double and
> in single precision (the key: NL_OPERATOR). I remember that this was also
> the experience the last time we tried, I don't know if there is a way to
> overcome this problem.
The main advantage of single precision is that you halve the bandwidth
and cache requirements; if you use vector instructions you can also
double the operation throughput, but since the compiler cannot always
vectorize, this is more difficult to get.
In the case of NL_OPERATOR the main bottleneck is currently bringing
the huge array of indices (op%i) from memory, so you can't gain much by
reducing the size of the operands. And this code is not easy to
vectorize to depth 4: even when you can operate in parallel, you can't
load four values at a time due to the distribution of the values in
memory.
To further optimize NL_OPERATOR, we should probably study strategies to
store and operate with sparse matrices efficiently.