[Octopus-devel] [Octopus-notify] svn commit: r8566 - trunk/src/states by joseba

Xavier Andrade xavier at tddft.org
Wed Nov 16 16:48:26 WET 2011


Hi,

On Wed, Nov 16, 2011 at 11:07 AM, Joseba Alberdi <joseba.alberdi at ehu.es> wrote:

>  Hi Xavier,
>
> in the test I made, OpenMP is slower only when running with one thread.
>

But without OpenMP, 1 thread is faster than OpenMP with 4 threads, and the
gain with 4 threads is not significant.


>
> If I add the scheduling clause, the times are much lower than without it,
> roughly half, so I think it is not the default one.
>

Which implementation of OpenMP are you using?


> I know it is not the most important part of the code; in fact it is only a
> memcpy. By the way, is there any implementation of C memcpy in Fortran? If
> so, it would be much faster than any OpenMP implementation. Otherwise we
> could also wrap it from C.
>
>
For this we should use the BLAS _copy routines.


> I would appreciate it if you could send me the test you ran. I want to know
> why it is not scaling with OpenMP (at least in my case).
>
>
You can find the results here:

https://docs.google.com/spreadsheet/ccc?key=0AjYhC7stoeKvdHhVeEpZQVlfYzlSV29EN0dBTUV3d1E


> I started working on this because coming generations of machines will
> have more and more processors, while memory is not going to grow as
> rapidly. This way we could also simulate bigger systems, using the
> resources more efficiently.
>

The problem with OpenMP is that it requires a lot of effort, especially if
you want good performance on NUMA machines. Perhaps it would be better to
try OpenCL for CPUs.

Xavier