[Octopus-devel] [Octopus-notify] svn commit: r8566 - trunk/src/states by joseba
xavier at tddft.org
Wed Nov 16 16:48:26 WET 2011
On Wed, Nov 16, 2011 at 11:07 AM, Joseba Alberdi <joseba.alberdi at ehu.es> wrote:
> Hi Xavier,
> In the test I made, it is slower with OpenMP only when using one thread.
But the code without OpenMP on 1 thread is faster than with OpenMP on 4
threads, and the gain with 4 threads is not significant.
> If I add the scheduling clause, the times are much lower than without it,
> about half, so I think that it is not the default one.
Which implementation of OpenMP are you using?
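The question about the implementation matters because, when a loop has no schedule clause, the OpenMP specification leaves the default schedule to the implementation, so different compilers can split the same loop differently. The loop touched by this commit is not shown in the message; the sketch below assumes a plain element-wise copy with an explicit static schedule, and the name parallel_copy is hypothetical, not an Octopus routine.

```c
/* Hypothetical element-wise copy of n doubles, parallelized with OpenMP.
 * schedule(static) gives each thread one contiguous chunk of roughly
 * n / nthreads iterations, which suits a loop where every iteration
 * costs the same. Compiled without OpenMP support, the pragma is
 * ignored and the loop runs serially with identical results. */
void parallel_copy(const double *src, double *dst, long n)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}
```

In Fortran the equivalent directive is `!$omp parallel do schedule(static)`. Stating the schedule explicitly makes timings comparable across compilers, instead of depending on each implementation's default.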
> I know it is not the most important part of the code; in fact it is only a
> memcpy. By the way, is there any implementation of C memcpy in Fortran? If
> so, it would be much faster than any OpenMP implementation. Otherwise we
> could also wrap it from C.
For this we should use the BLAS _copy routines.
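The BLAS _copy routines (scopy, dcopy, ccopy, zcopy) compute y := x for n elements, with independent strides incx and incy. As an illustration only, the C sketch below shows what DCOPY computes, including the reference BLAS convention for negative increments; the name ref_dcopy is hypothetical and not a library function. Real code would call the optimized dcopy from whichever BLAS it already links against. With both increments equal to 1 the operation is a plain contiguous copy, i.e. what memcpy does.

```c
/* Hypothetical reference-style sketch of BLAS DCOPY semantics:
 * y := x for n elements, reading x every incx entries and writing y
 * every incy entries. As in the reference BLAS, a negative increment
 * walks that vector backwards, starting from its last element. */
void ref_dcopy(int n, const double *x, int incx, double *y, int incy)
{
    if (n <= 0) {
        return; /* nothing to copy */
    }
    /* 0-based start offsets; a negative increment starts at the far end */
    long ix = (incx < 0) ? (long)(n - 1) * -incx : 0;
    long iy = (incy < 0) ? (long)(n - 1) * -incy : 0;
    for (int i = 0; i < n; i++) {
        y[iy] = x[ix];
        ix += incx;
        iy += incy;
    }
}
```

From Fortran, copying n double precision values is `call dcopy(n, x, 1, y, 1)`, and `zcopy` takes the same arguments for double complex arrays.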
> I would appreciate it if you could send me the test you made. I want to know
> why it is not scaling with OpenMP (at least in my case).
You can find the results here:
> I started working on this because the coming generations of machines will
> have more and more processors, while memory is not going to grow as
> rapidly. This way we could also simulate bigger systems, using the resources
> more efficiently.
The problem with OpenMP is that it requires a lot of effort, especially if
you want good performance on NUMA machines. Perhaps it would be
better to try OpenCL for CPUs.
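One reason OpenMP code scales poorly on NUMA machines is page placement. On Linux with its default memory policy, a page is allocated on the memory node of the thread that first writes it, so arrays initialized by a single thread all end up on one node. A common mitigation, sketched below in C under that assumption, is "first-touch" initialization using the same static schedule as the compute loop; both function names are hypothetical. Whether each thread then reads its own pages depends on the implementation giving both loops the same chunk assignment and on the threads being pinned (for example with OMP_PROC_BIND=true); the sketch cannot guarantee that by itself.

```c
#include <stdlib.h>

/* Hypothetical allocator that initializes the array in parallel so
 * that, under a first-touch page policy, each thread's chunk is
 * placed on that thread's NUMA node. Without OpenMP the pragma is
 * ignored and initialization runs serially. */
double *alloc_first_touch(long n)
{
    double *p = malloc((size_t)n * sizeof *p);
    if (p == NULL) {
        return NULL;
    }
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++) {
        p[i] = 0.0; /* the first write determines page placement */
    }
    return p;
}

/* Copy loop using the same schedule(static) split as the
 * initialization; with an unchanged thread count and pinned threads,
 * implementations typically give each thread the same chunk, so it
 * works on pages local to its node. */
void copy_static(const double *src, double *dst, long n)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}
```

The same pattern applies in Fortran with `!$omp parallel do schedule(static)` on both the initialization and the compute loop.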