Arkouda is a framework for large-scale interactive data analytics that combines a Python front-end with a distributed server implemented in Chapel. It runs on parallel architectures ranging from a single node to an entire high-performance computing (HPC) system. Although Arkouda can exploit traditional high-performance computing resources, it currently cannot use the powerful GPU accelerators available in many HPC systems.
In this talk, we demonstrate how the Chapel GPUAPI can be used to accelerate Arkouda operations. The acceleration is most beneficial when a chain of operations is executed on the same data. We extend the GPUAPI to support shared virtual memory using CUDA unified memory, and we use this support to implement a custom domain map for Arkouda arrays. Our preliminary performance results show that GPU-accelerated operations using unified memory outperform the same operations using explicit memory management, while also simplifying the programming task for complex Arkouda operations.
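To make the unified-memory idea concrete, the following is a minimal sketch in plain CUDA rather than in Chapel or the GPUAPI; it is not the implementation described in this talk. The kernels `scale` and `shift` are hypothetical stand-ins for a chain of two element-wise array operations. The sketch assumes a single GPU and uses a single `cudaMallocManaged` allocation that is visible to both host and device, so no explicit host-to-device or device-to-host copies appear between the steps of the chain.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical element-wise steps standing in for a chain of array operations.
__global__ void scale(double *a, double s, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) a[i] *= s;
}

__global__ void shift(double *a, double c, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) a[i] += c;
}

int main() {
    const size_t n = 1 << 20;
    double *a = nullptr;

    // Unified memory: one pointer usable from both host and device code.
    if (cudaMallocManaged(&a, n * sizeof(double)) != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed\n");
        return 1;
    }

    // Initialize on the host through the same pointer.
    for (size_t i = 0; i < n; ++i) a[i] = (double)i;

    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);

    // Chain of operations on the same data, with no explicit copies in between.
    scale<<<blocks, threads>>>(a, 2.0, n);
    shift<<<blocks, threads>>>(a, 1.0, n);

    // Wait for the kernels to finish before the host reads the results.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        fprintf(stderr, "kernel execution failed\n");
        cudaFree(a);
        return 1;
    }

    // Verify on the host: each element should equal 2*i + 1.
    bool ok = true;
    for (size_t i = 0; i < n; ++i) {
        if (a[i] != 2.0 * (double)i + 1.0) { ok = false; break; }
    }
    printf("%s\n", ok ? "ok" : "mismatch");

    cudaFree(a);
    return ok ? 0 : 1;
}
```

With explicit memory management, the same chain would instead need a separate device allocation via `cudaMalloc` plus `cudaMemcpy` calls to move the data to the GPU and back. That bookkeeping is what unified memory removes from the programmer's side.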