[Octopus-devel] [Octopus-notify] svn commit: r8566 - trunk/src/states by joseba
Xavier Andrade
xavier at tddft.org
Wed Nov 16 16:48:26 WET 2011
Hi,
On Wed, Nov 16, 2011 at 11:07 AM, Joseba Alberdi <joseba.alberdi at ehu.es> wrote:
> Hi Xavier,
>
> in the test I made, OpenMP is slower only with one thread.
>
But the run without OpenMP on 1 thread is faster than OpenMP with 4 threads,
and the gain with 4 threads is not significant.
>
> If I set the scheduling clause, the times are much lower than without this
> clause, about half, so I think that it is not the default one.
>
Which implementation of OpenMP are you using?
> I know it is not the most important part of the code; in fact it is only a
> memcpy. By the way, is there any implementation of C memcpy in Fortran? If
> so, it would be much faster than any OpenMP implementation. Otherwise we
> could also wrap it from C.
>
>
For this we should use the BLAS _copy routines.
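
A minimal sketch in C of the two kinds of copy being compared here, not taken
from the Octopus sources. The names copy_omp, copy_memcpy and copies_agree are
hypothetical, and schedule(static) is only an assumed choice of scheduling
clause. A BLAS build would replace the loop with its copy routine (dcopy for
double precision), which is left out so that the sketch needs no extra library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: element-wise copy with an explicit static schedule.
   If the compiler is not run with OpenMP enabled, the pragma is ignored
   and this is an ordinary serial loop. */
void copy_omp(double *dst, const double *src, size_t n)
{
    long i;

    assert(dst != NULL && src != NULL);
    #pragma omp parallel for schedule(static)
    for (i = 0; i < (long)n; i++) {
        dst[i] = src[i];
    }
}

/* Serial baseline: a single memcpy call over the whole array. */
void copy_memcpy(double *dst, const double *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n * sizeof(double));
}

/* Returns 1 if both routines produce identical output for n > 0 elements,
   0 on mismatch or allocation failure. */
int copies_agree(size_t n)
{
    double *src = malloc(n * sizeof *src);
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    int ok = 0;

    if (src != NULL && a != NULL && b != NULL) {
        for (size_t i = 0; i < n; i++) {
            src[i] = (double)i * 0.5;
        }
        copy_omp(a, src, n);
        copy_memcpy(b, src, n);
        ok = (memcmp(a, b, n * sizeof *src) == 0);
    }
    free(src);
    free(a);
    free(b);
    return ok;
}
```

Timing copy_omp against copy_memcpy for the array sizes of interest, with and
without OpenMP enabled, would show whether the threaded loop pays off for what
is essentially a memory-bound operation.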
> I would appreciate it if you could give me the test you made. I want to know
> why it is not scaling with OpenMP (at least in my case).
>
>
You can find the results here:
https://docs.google.com/spreadsheet/ccc?key=0AjYhC7stoeKvdHhVeEpZQVlfYzlSV29EN0dBTUV3d1E
> I started working on this because the coming generations of machines will
> have more and more processors, and the memory is not going to increase so
> rapidly. This way we could also simulate bigger systems, using the
> resources more efficiently.
>
The problem with OpenMP is that it requires a lot of effort, especially if
you want to have good performance on NUMA machines. Perhaps it would be
better to try OpenCL for CPUs.
Xavier