[Octopus-devel] Problems with FFTW3

Florian Lorenzen lorenzen at physik.fu-berlin.de
Wed Jul 23 09:24:36 WEST 2008


Dear all,

I just recently discovered a serious problem with the way we call
FFTW3 routines, I'm very puzzled it did not appear earlier.

We initialize our FFTW3 plans with temporary arrays since the in and
out arrays have to be passed to the planner routine. To perform the
FFT we call dfftw_execute_dft or dfftw_execute_dft_{r2c|c2r} from the
so called guru interface of FFTW3. The routines allow to pass the in
and out arrays as additional arguments, BUT this only works under the
following conditions (from FFTW3 manual):

   * The array size, strides, etcetera are the same (since those are
     set by the plan).

   * The input and output arrays are the same (in-place) or different
     (out-of-place) if the plan was originally created to be in-place or
     out-of-place, respectively.

   * For split arrays, the separations between the real and imaginary
     parts, `ii-ri' and `io-ro', are the same as they were for the
     input and output arrays when the plan was created.  (This
     condition is automatically satisfied for interleaved arrays.)

   * The "alignment" of the new input/output arrays is the same as that
     of the input/output arrays when the plan was created, unless the
     plan was created with the `FFTW_UNALIGNED' flag.  Here, the
     alignment is a platform-dependent quantity (for example, it is the
     address modulo 16 if SSE SIMD instructions are used, but the
     address modulo 4 for non-SIMD single-precision FFTW on the same
     machine).  In general, only arrays allocated with `fftw_malloc'
     are guaranteed to be equally aligned.

The first three conditons are usually fulfilled but the fourth may or
may not be fulfilled because we obtain our memory by the Fortran
allocator which sometimes produces the same alignment as in the plan
creation and sometimes not.

I am not sure what is the best way to fix this.

1. Somehow use fftw_malloc, but this will get very clumsy and possibly
error-prone.

2. Pre-allocate the in and out arrays and use the mf2cf and cf2mf
functions to copy to and from these arrays. These should be the ones
used for the plan creation. This solution is the cleanest but could
cause a long trail of changes.

3. Pass FFTW_UNALIGNED. This disables SSE code. Perhaps, we could only
pass this flag when the Fortran allocator does not do alignment
properly which we check anyway. This solution is simplest and probably
appropriate because SSE cannot be used anyway if Fortran does not
align memory to 16 Byte boundaries.

I have not the time to fix this right now, therefore, I also put a
ticket.

Best regards,

Florian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://www.tddft.org/pipermail/octopus-devel/attachments/20080723/9439e17d/attachment.bin 


More information about the Octopus-devel mailing list