The Asynchronous Partitioned Global Address Space (APGAS) programming model enables programmers to express the parallelism and locality necessary for high-performance scientific applications on extreme-scale systems.
We used the well-known LULESH hydrodynamics proxy application to explore the performance and programmability of the APGAS model as expressed in the X10 programming language. Our X10 implementation of LULESH extends previous work on the efficient exchange of ghost regions in stencil codes and takes advantage of enhanced runtime support for collective communications; as a result, it outperforms a reference MPI code when scaling to many nodes of a large HPC system. Runtime support for local parallel iteration allows efficient use of all cores within a node. Overall, the X10 implementation was around 10% faster than the reference C++/OpenMP/MPI version across a range of 125 to 4,096 places (750 to 24,576 cores).
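To illustrate the kind of intra-node parallelism referred to above, the following is a minimal X10 sketch (not the LULESH code itself) of the common finish/async idiom for local parallel iteration, with the index space chunked so that each activity works on a contiguous block of elements; the class name, problem size, and chunk count are assumptions made only for illustration.

    // Minimal sketch: local parallel iteration within one place using finish/async.
    // The chunk count and per-element work are placeholders, not the paper's code.
    public class LocalParallelLoopSketch {
        public static def main(args:Rail[String]) {
            val n = 1000000L;   // element count (illustrative)
            val nChunks = 8L;   // e.g. one chunk per core; a real code would query the runtime
            val out = new Rail[Double](n);
            finish for (c in 0L..(nChunks-1L)) async {
                val lo = (n * c) / nChunks;          // first index owned by this activity
                val hi = (n * (c + 1L)) / nChunks;   // one past the last index
                for (i in lo..(hi-1L)) {
                    out(i) = Math.sin(i as Double);  // placeholder per-element work
                }
            }
            Console.OUT.println("out(0) = " + out(0L));
        }
    }

The enclosing finish waits for all spawned activities to complete, while chunking keeps the number of activities proportional to the available cores rather than to the number of elements.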