Explicit accelerator-data management with OpenACC directives
Context
When GPU accelerators do not share memory with the CPU, OpenACC directives can be used to allocate persistent GPU copies of arrays that exist in CPU memory. This avoids the implicit data transfers that might otherwise occur on entering and leaving accelerated code sections. Such array pairs may, however, require explicit data-transfer directives to keep them synchronised.
Proposal
It is proposed (in branch_5.0 and main) to add a PSyclone transformation script that generates OpenACC acc enter data directives at the start of the time-step subroutine for nominated module variables, in order to make them accelerator-resident during time stepping. The variables are specified in a dictionary of lists of variables, grouped by the module that contains the respective variable.
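The dictionary-driven generation described above can be sketched in plain Python as follows. This is a minimal illustration only: it builds directive text and does not use the PSyclone API. The function name, the module and variable names, and the choice of the copyin clause (rather than, for example, create) are all assumptions made for the example.

```python
def enter_data_directives(nominated, clause="copyin"):
    """Build OpenACC 'enter data' directive lines from a dictionary that maps
    module names to lists of nominated variables (hypothetical helper)."""
    lines = []
    for module, variables in nominated.items():
        if not variables:
            continue  # nothing nominated for this module
        # Separate Fortran comment line recording where the variables live
        lines.append(f"! variables from module {module}")
        lines.append(f"!$acc enter data {clause}({', '.join(variables)})")
    return lines


# Illustrative placeholder names, not taken from the model source
nominated = {
    "example_mod_a": ["fld_a", "fld_b"],
    "example_mod_b": ["fld_c"],
    "example_mod_c": [],
}

for line in enter_data_directives(nominated):
    print(line)
```

In an actual transformation script, the generated directives would be inserted at the start of the time-step subroutine rather than printed.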
Further, local arrays that are hoisted to module level when building the model with GPU acceleration (see script psct-LocalToGlobalArrays.py) can often be made accelerator-resident. It is thus also proposed to extend script psct-LocalToGlobalArrays.py with an option to add OpenACC acc enter data directives for all such hoisted variables that are either used solely inside OpenACC kernels regions or explicitly nominated.
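The selection rule for hoisted variables can be expressed as a short sketch. All names here are hypothetical, and the set of variables used outside kernels regions is assumed to be available from a prior analysis step that is not shown.

```python
def select_resident(hoisted, used_outside_kernels, nominated=()):
    """Return the hoisted variables to make accelerator-resident: those that are
    never used outside OpenACC kernels regions, plus any explicitly nominated."""
    outside = set(used_outside_kernels)
    forced = set(nominated)
    # Preserve the original ordering of the hoisted variables
    return [v for v in hoisted if v not in outside or v in forced]


# Illustrative placeholder names
hoisted = ["zwork_a", "zwork_b", "zwork_c"]
used_outside_kernels = ["zwork_b", "zwork_c"]  # also touched by CPU-executed code
nominated = ["zwork_c"]                         # explicitly requested anyway

print(select_resident(hoisted, used_outside_kernels, nominated))
```

A variable that is nominated despite being used in CPU-executed code, such as zwork_c in this example, would need explicit synchronisation wherever the CPU reads or writes it.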
If an array is required in CPU-executed sections of the code, its CPU-memory version and its GPU-resident copy can be explicitly synchronised using OpenACC acc update directives. It is suggested to add an optional PSyclone transformation script that generically encloses all lbc_lnk calls with appropriate update directives, so that halo exchanges are applied correctly to GPU-resident arrays. This approach is a workaround only: in the absence of CPU-GPU memory sharing it triggers costly data transfers, and a separate development is planned to supersede it with native lbc_lnk support of GPU-resident arrays.
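The enclosing of a halo-exchange call with update directives can be sketched as below. This is a text-level illustration rather than a PSyclone transformation: the helper name is hypothetical, the caller is assumed to supply the list of exchanged arrays, and the argument list of the example lbc_lnk call is a placeholder, not the routine's documented interface.

```python
def wrap_with_updates(call_line, arrays):
    """Enclose one call line with OpenACC update directives: refresh the CPU
    copy before the call, then push the result back to the GPU afterwards."""
    # Reuse the indentation of the call so the directives line up with it
    indent = call_line[: len(call_line) - len(call_line.lstrip())]
    names = ", ".join(arrays)
    return [
        f"{indent}!$acc update self({names})",    # GPU -> CPU
        call_line,
        f"{indent}!$acc update device({names})",  # CPU -> GPU
    ]


# Placeholder call; the argument list is illustrative only
call = "      CALL lbc_lnk( 'example', fld_a, 'T', 1. )"

for line in wrap_with_updates(call, ["fld_a"]):
    print(line)
```

Each wrapped call moves the full arrays in both directions, which is the source of the transfer cost noted above.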