[Octopus-users] The result of gs calculation (energy) depends on core number

David Strubbe dstrubbe at berkeley.edu
Wed May 13 19:27:14 WEST 2009


David M,

If you have a queue, you need to submit a job script whose executable
command is "make check", requesting 5 nodes, which is the maximum used by
any of the parallel tests.

David S
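
A minimal sketch of such a job script, assuming a PBS/Torque-style scheduler
(the thread does not say which queue system is in use). The `#PBS` directives,
the job name `octopus-check`, the helper `write_check_job` and the file name
`check.pbs` are assumptions for illustration; only "make check" and the
5-node request come from the message above.

```shell
#!/bin/sh
# Write a batch script that runs the whole test suite inside one allocation.
# Assumption: PBS/Torque directive syntax; adapt it to the local scheduler.
# $1: number of nodes to request, $2: name of the job script to write
write_check_job() {
    cat > "$2" <<EOF
#!/bin/sh
#PBS -N octopus-check
#PBS -l nodes=$1
cd "\$PBS_O_WORKDIR"
make check
EOF
}

# Run from the top of the Octopus build tree.
write_check_job 5 check.pbs
# Then submit it with the scheduler's own command, e.g.: qsub check.pbs
```

This only helps if mpirun, when called from inside the allocation, starts the
processes directly instead of submitting a new job to the queue.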

On Tue, May 12, 2009 at 11:40 PM, david marlan <davma7 at gmail.com> wrote:

> OK, I found the relevant line in oct-run_regression_test.pl:
>
> system("cd $workdir; $mpirun -np $np $octopus_exe_suffix > out ");
>
> Unfortunately I don't know Perl, but I'll try to write a bash replacement
> for mpirun that "hangs" until the task has finished (for example, by
> checking the queue state in a loop).
>
>
> But how have other Octopus users solved this? Is there anybody here who
> uses octopus_mpi with a queue system?
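
The "hang until the task has finished" idea can be sketched as a small polling
loop. This is only an illustration: the function name `wait_for_job` is
hypothetical, and the status command passed to it depends on the queue system
(something like `qstat <jobid>` on a PBS-like system is an assumption, not
something stated in this thread).

```shell
#!/bin/sh
# Block until a queued job has left the queue.
# $1: a command that exits 0 while the job is still queued or running
#     (hypothetical example: "qstat 12345" on a PBS-like system)
# $2: polling interval in seconds (default 10)
wait_for_job() {
    # $1 is left unquoted on purpose so that it splits into command + arguments
    while $1 > /dev/null 2>&1; do
        sleep "${2:-10}"
    done
}
```

A wrapper used in place of mpirun would submit the job, remember its id, and
then call `wait_for_job` before returning, so that the test script finds the
`out` file complete. Where the scheduler offers a blocking submit option, that
is likely simpler than polling.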
>
>
> Thanks!
>
>
>
> On Wed, May 13, 2009 at 8:22 AM, david marlan <davma7 at gmail.com> wrote:
>
>> David, thanks for the advice! I thought about it ("make check") during
>> installation, but I ran into a problem: "make check" tries to execute
>> "mpirun -np 2 ..." and waits for a reply from this command.
>> But on my system "mpirun -np 2 ..." returns immediately, because this
>> command only pushes the task to the queue. Therefore all tests fail.
>>
>> So my question to you is: how can I pass the test suite when I have a
>> queue system?
>>
>>
>> On my mini-cluster of 3 cores I don't have a queue system, and I passed
>> this test suite.
>>
>> The results are:
>>
>> "Finished test run.
>>
>>  Energy [step  1] :      [   OK   ]
>>  Energy [step  5] :      [   OK   ]
>>  Energy [step 10] :      [   OK   ]
>>  Energy [step 15] :      [   OK   ]
>>  Energy [step 20] :      [   OK   ]
>>  Forces [step  1] :      [   OK   ]
>>  Forces [step  5] :      [   OK   ]
>>  Forces [step 10] :      [   OK   ]
>>  Forces [step 15] :      [   OK   ]
>>  Forces [step 20] :      [   OK   ]
>>
>>     Passed:  29 / 34
>>     Skipped: 5 / 34
>>
>> Everything seems to be OK
>>
>> Total run-time of the testsuite: 00:15:37"
>> But I have no problems with Octopus on this mini-cluster (my post about
>> the result's dependence on core number relates to the cluster where I
>> have a queue system).
>> So the question is how I can run "make check" on a cluster with a queue
>> system.
>>
>> Thanks a lot.
>>
>> P.S.: I hope this message will be added to my main thread.
>>
>>
>> On Wed, May 13, 2009 at 6:28 AM, David Strubbe <dstrubbe at berkeley.edu> wrote:
>>
>>> David,
>>>
>>> Have you tried running the testsuite via "make check"?  See if any tests
>>> fail with the parallel executable.  That will give some more information on
>>> whether this is a build/library problem or an inherent bug in the code.
>>>
>>> David Strubbe
>>>
>>> On Tue, May 12, 2009 at 4:58 PM, david marlan <davma7 at gmail.com> wrote:
>>>
>>>> Dear colleagues!
>>>> Sorry for writing at such length. I wanted to describe the problem in
>>>> full detail so that it can be understood.
>>>> I see very strange behaviour from Octopus.
>>>> First of all, note that I use GAMESS (US) (important: with this same
>>>> MKL! see below) on the same system without any problem. I also use my
>>>> own parallel program, also without any problem.
>>>>
>>>>
>>>> 1)
>>>> My system details:
>>>> - MKL library : 10.1.0.015
>>>> - C compiler  : icc (ICC) 10.1 20081024
>>>> - fortran     : ifort (IFORT) 10.1 20081024
>>>> - MVAPICH     : (bound to the Intel compiler)
>>>>
>>>> 2) Installation of required libs
>>>> I used GSL 1.12.
>>>>
>>>> This is all I needed to install GNU GSL (path2libGSL is my path; it is
>>>> actually something like /home/username/lib/.../):
>>>>
>>>> ./configure --prefix=path2libGSL  CC=icc  \
>>>>     CPPFLAGS='-I/opt/intel/mkl/10.1.0.015/include -O3' \
>>>>     LDFLAGS=-L/opt/intel/mkl/10.1.0.015/lib/em64t \
>>>>     LIBS='-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread'
>>>> make
>>>> make install
>>>>
>>>> All ok here.
>>>>
>>>> I have FFTW (version 3) already installed, compiled with icc and ifort:
>>>>
>>>> ./configure --prefix=myfftwLibDir  CC=icc F77=ifort
>>>> make
>>>> make install
>>>>
>>>>
>>>> To my .profile file I added (my SHELL is /bin/sh):
>>>> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/mkl/10.1.0.015/lib/em64t
>>>> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:path2libGSL/lib
>>>>
>>>> After that I logged in again (or ran exec /bin/sh).
>>>>
>>>> Checking the linking of the GSL library:
>>>>
>>>> cd path2libGSL/bin
>>>> ldd gsl-randist
>>>>         libgsl.so.0 => path2libGSL/lib/libgsl.so.0
>>>>         libgslcblas.so.0 => path2libGSL/lib/libgslcblas.so.0
>>>>         libmkl_intel_lp64.so => /opt/intel/mkl/10.1.0.015/lib/em64t/libmkl_intel_lp64.so
>>>>         libmkl_intel_thread.so => /opt/intel/mkl/10.1.0.015/lib/em64t/libmkl_intel_thread.so
>>>>         libmkl_core.so => /opt/intel/mkl/10.1.0.015/lib/em64t/libmkl_core.so
>>>>         libiomp5.so => /opt/intel/fce/10.1.021/lib/libiomp5.so
>>>>         libpthread.so.0 => /lib64/libpthread.so.0
>>>>         libm.so.6 => /lib64/libm.so.6
>>>>         libgcc_s.so.1 => /lib64/libgcc_s.so.1
>>>>         libc.so.6 => /lib64/libc.so.6
>>>>         libdl.so.2 => /lib64/libdl.so.2
>>>>         libimf.so => /opt/intel/fce/10.1.021/lib/libimf.so
>>>>         libsvml.so => /opt/intel/fce/10.1.021/lib/libsvml.so
>>>>         libintlc.so.5 => /opt/intel/fce/10.1.021/lib/libintlc.so.5
>>>>         /lib64/ld-linux-x86-64.so.2
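
A listing like the one above can also be checked mechanically. The sketch
below is only an illustration (the function name `missing_libs` is
hypothetical); it relies on ldd marking unresolved libraries with the words
"not found".

```shell
#!/bin/sh
# Read ldd output on stdin and print the lines for unresolved libraries.
# Exit status: 0 if every library was resolved, 1 if any is missing.
missing_libs() {
    if grep 'not found'; then
        return 1
    fi
    return 0
}

# Usage: ldd gsl-randist | missing_libs && echo "all libraries resolved"
```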
>>>>
>>>>
>>>>
>>>> 3) Now I'm ready to install octopus-3.1.0
>>>>
>>>>
>>>> tar zxf octopus-3.1.0.tar.gz
>>>> cd octopus-3.1.0
>>>>
>>>> ./configure --prefix=path2octopusDir --with-fft=fftw3 \
>>>>     --with-fft-lib="-LmyfftwLibDir/lib -lfftw3 -lm" \
>>>>     --with-blas="-L/opt/intel/mkl/10.1.0.015/lib/em64t -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread" \
>>>>     --with-lapack="-L/opt/intel/mkl/10.1.0.015/lib/em64t -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_lapack -lmkl_core -liomp5 -lpthread" \
>>>>     --with-gsl-prefix="path2libGSL"  --enable-mpi \
>>>>     FC="mpif90" FCFLAGS="-u -zero -fpp1 -nbs -pc80 -pad -align -unroll -O3 -ip -tpp7 -xW" \
>>>>     CC="mpicc" CXX="mpiCC" LIBS="-L/common/mvapich/lib/shared  -lmpich  -lfmpich" \
>>>>     CPPFLAGS='-I/opt/intel/mkl/10.1.0.015/include -I/common/mvapich/include -O3'
>>>>
>>>> make
>>>> make install
>>>>
>>>> Now check the libraries for octopus_mpi:
>>>>
>>>> cd path2octopusDir/bin
>>>> ldd octopus_mpi
>>>>
>>>>         libmkl_intel_lp64.so => /opt/intel/mkl/10.1.0.015/lib/em64t/libmkl_intel_lp64.so
>>>>         libmkl_intel_thread.so => /opt/intel/mkl/10.1.0.015/lib/em64t/libmkl_intel_thread.so
>>>>         libmkl_lapack.so => /opt/intel/mkl/10.1.0.015/lib/em64t/libmkl_lapack.so
>>>>         libmkl_core.so => /opt/intel/mkl/10.1.0.015/lib/em64t/libmkl_core.so
>>>>         libiomp5.so => /opt/intel/fce/10.1.021/lib/libiomp5.so
>>>>         libpthread.so.0 => /lib64/libpthread.so.0
>>>>         libgsl.so.0 => path2libGSL/lib/libgsl.so.0
>>>>         libgslcblas.so.0 => path2libGSL/lib/libgslcblas.so.0
>>>>         libimf.so => /opt/intel/fce/10.1.021/lib/libimf.so
>>>>         libm.so.6 => /lib64/libm.so.6
>>>>         libmpich.so.1.0 => /common/mvapich/lib/shared/libmpich.so.1.0
>>>>         libfmpich.so.1.0 => /common/mvapich/lib/shared/libfmpich.so.1.0
>>>>         libibverbs.so.1 => /usr/lib64/libibverbs.so.1
>>>>         libibumad.so.1 => /usr/lib64/libibumad.so.1
>>>>         librt.so.1 => /lib64/librt.so.1
>>>>         libifport.so.5 => /opt/intel/fce/10.1.021/lib/libifport.so.5
>>>>         libifcore.so.5 => /opt/intel/fce/10.1.021/lib/libifcore.so.5
>>>>         libsvml.so => /opt/intel/fce/10.1.021/lib/libsvml.so
>>>>         libirc.so => /opt/intel/fce/10.1.021/lib/libirc.so
>>>>         libdl.so.2 => /lib64/libdl.so.2
>>>>         libc.so.6 => /lib64/libc.so.6
>>>>         libgcc_s.so.1 => /lib64/libgcc_s.so.1
>>>>         /lib64/ld-linux-x86-64.so.2
>>>>         libintlc.so.5 => /opt/intel/fce/10.1.021/lib/libintlc.so.5
>>>>         libibcommon.so.1 => /usr/lib64/libibcommon.so.1
>>>>
>>>> I repeat: compiling programs is routine for me, and in other
>>>> cases I don't have problems.
>>>>
>>>>
>>>> OK, at this point everything is ready.
>>>>
>>>>
>>>> 4) Let us look at the Methane task from the Octopus tutorial:
>>>> http://www.tddft.org/programs/octopus/wiki/index.php/Methane_molecule
>>>>
>>>> That is, I use the same inp file:
>>>>
>>>> CalculationMode = gs
>>>> Units = eV_Angstrom
>>>>
>>>> radius = 3.5
>>>> spacing = 0.25
>>>>
>>>> CH = 1.2
>>>> %Coordinates
>>>>   "C" |           0 |          0 |           0 | no
>>>>   "H" |  CH/sqrt(3) | CH/sqrt(3) |  CH/sqrt(3) | no
>>>>   "H" | -CH/sqrt(3) |-CH/sqrt(3) |  CH/sqrt(3) | no
>>>>   "H" |  CH/sqrt(3) |-CH/sqrt(3) | -CH/sqrt(3) | no
>>>>   "H" | -CH/sqrt(3) | CH/sqrt(3) | -CH/sqrt(3) | no
>>>> %
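
Since the results in this post come from rerunning this input with different
`spacing` values, a small helper for producing the variants may be useful.
This is a sketch under assumptions: the function name `set_spacing` and the
output file names `inp.<spacing>` are hypothetical, and the run step is left
out because it depends on the local MPI and queue setup.

```shell
#!/bin/sh
# Print a copy of an Octopus input file with the `spacing` line replaced.
# $1: input file, $2: new spacing value
set_spacing() {
    sed "s/^spacing *=.*/spacing = $2/" "$1"
}

# Example: one input file per spacing value tested in this post
# (assumes a file named `inp` in the current directory).
# for s in 0.25 0.225 0.2 0.175 0.15 0.125 0.1; do
#     set_spacing inp "$s" > "inp.$s"
# done
```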
>>>>
>>>>
>>>> I should get the following results (as given on the site):
>>>>
>>>> Spacing        Total Energy
>>>> 0.25           -218.91429602
>>>> 0.225          -218.58280698
>>>> 0.2            -218.30870085
>>>> 0.175          -218.21486251
>>>> 0.15           -218.13058092
>>>> 0.125          -218.15086348
>>>> 0.1            -218.14591227
>>>>
>>>> My results when running on 32 cores, please take a look:
>>>>
>>>> 0.25   -218.91210793
>>>> 0.225  -218.58075825
>>>> 0.2    -218.30704110
>>>> 0.175  -218.21316709
>>>> 0.15   -218.12850191
>>>> 0.125  -218.14888868
>>>> 0.1    -218.14429408
>>>>
>>>> These seem to be OK, but here is what I get when running on 8 cores:
>>>>
>>>> 0.25   -218.91210790
>>>> 0.225  -218.58075825   (not changing with different runs)
>>>> 0.2    -218.30704111   (not changing with different runs)
>>>> 0.175  -218.21316709
>>>> 0.15   -98.96855841;  -98.96857260;  -98.96855840   (each time different)
>>>> 0.125  -112.10695544;  -112.20904537;  -69.05923167;  -112.12479289   (each time different)
>>>> 0.1    -117.09111257;  -102.67383053;  -117.44072169   (each time different)
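
Tables like these can be compared against the reference values automatically.
The sketch below is an illustration only: the function name `check_energies`,
the two-column file layout ("spacing energy" per line) and the 0.01 eV
tolerance used in the example are assumptions, not part of Octopus.

```shell
#!/bin/sh
# Compare "spacing energy" pairs against reference values and flag any
# energy that deviates from the reference by more than a tolerance (in eV).
# $1: reference file, $2: results file, $3: tolerance
check_energies() {
    awk -v tol="$3" '
        NR == FNR { ref[$1] = $2; next }   # first file: load the reference table
        ($1 in ref) {                      # second file: compare matching spacings
            d = $2 - ref[$1]
            if (d < 0) d = -d
            printf "%s %s %s\n", $1, $2, (d > tol ? "DEVIATES" : "ok")
        }' "$1" "$2"
}

# Usage: check_energies reference.txt run_8cores.txt 0.01
```

With the reference and 8-core values for spacings 0.175 and 0.15 quoted above,
this marks the first as ok and the second as DEVIATES.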
>>>>
>>>>
>>>> At the same time I tested this example on my other mini-cluster of 3
>>>> cores:
>>>> - MKL library : 10.1.1.019
>>>> - C compiler  : icc (ICC) 11.0.081
>>>> - fortran     : ifort (IFORT) 11.0.081
>>>> - openMPI     : version 1.2.4 (bound to the Intel compiler)
>>>> - GSL         : 1.12  (compiled as above)
>>>>
>>>> Octopus was compiled the same way as above.
>>>>
>>>> The results are:
>>>> 0.25   -218.91120793
>>>> 0.225  -218.58075827
>>>> 0.2    -218.30704108
>>>> 0.175  -218.21316712
>>>> 0.15   -218.12850191
>>>> 0.125  -218.14888868
>>>> 0.1    -218.14429481
>>>>
>>>> i.e. on this mini-cluster of 3 cores I get acceptable results (at
>>>> least for this example).
>>>>
>>>>
>>>> What does this mean? Does the result depend on the number of cores?
>>>> What would you recommend?
>>>> All ideas are greatly appreciated.
>>>>
>>>> Thanks.
>>>> P.S.: Can anyone reproduce this behaviour?
>>>>
>>>>
>>>> _______________________________________________
>>>> Octopus-users mailing list
>>>> Octopus-users at tddft.org
>>>> http://www.tddft.org/mailman/listinfo/octopus-users
>>>>
>>>>
>>>
>>
>