Summer 2022 work: DYN/TRA halo cleanup and optimization
Context
NEMO is memory bound: it spends most of its execution time accessing the arrays held in fast memory that its computations need, rather than computing. The more fast memory is used, the deeper in the memory hierarchy the accesses go and the longer each one takes. The goal is therefore to limit the memory footprint during NEMO's execution. This reduction pays off most inside Fortran loops.
- Modifications of versioned files: DYN. Fortran routines (*.[Ffh]90): dynadv_cen2.F90, dynadv_ubs.F90, dynvor.F90, dynhpg.F90, dynzdf.F90, dynldf_lap_blp.F90, dynldf_iso.F90
- Modifications of versioned files: TRA. Fortran routines (*.[Ffh]90): traadv_cen.F90, traadv_ubs.F90, traadv_mus.F90, trazdf.F90, traldf_lap_blp.F90, traldf_iso.F90
- Modifications of versioned files: LDF. Fortran routines (*.[Ffh]90): slope computation
Proposal
One way to decrease the memory footprint is to reduce the dimension of intermediate arrays. It is often possible to turn temporary 3D arrays into 2D arrays. We replaced 3D z-temporary arrays with 2D arrays by slicing the 3D loops along one axis, so that the temporary is filled and consumed one slice at a time.
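The slicing idea can be sketched on a toy example. The program below is a minimal, self-contained illustration and is not taken from the NEMO sources: the domain sizes, the array names (`pt`, `pu`, `zwx3`, `zwx2`) and the flux formula are all hypothetical. It assumes the temporary has no dependency along the sliced axis (here the vertical index `jk`), which is what allows the 3D temporary to be replaced by a 2D one that is reused at every level.

```fortran
program slice_temporary
   implicit none
   integer, parameter :: wp = selected_real_kind(12, 307)
   ! Hypothetical small domain, for illustration only
   integer, parameter :: jpi = 8, jpj = 6, jpk = 5
   real(wp) :: pt(jpi,jpj,jpk), pu(jpi,jpj,jpk)   ! made-up tracer and velocity
   real(wp) :: zres3(jpi,jpj,jpk)                 ! result using a 3D temporary
   real(wp) :: zres2(jpi,jpj,jpk)                 ! result using a 2D temporary
   real(wp) :: zwx3(jpi,jpj,jpk)                  ! 3D temporary (original style)
   real(wp) :: zwx2(jpi,jpj)                      ! 2D temporary (sliced style)
   integer  :: ji, jj, jk

   ! Fill the inputs with arbitrary values
   do jk = 1, jpk
      do jj = 1, jpj
         do ji = 1, jpi
            pt(ji,jj,jk) = real(ji + 2*jj + 3*jk, wp)
            pu(ji,jj,jk) = 0.1_wp * real(ji - jj + jk, wp)
         end do
      end do
   end do
   zres3(:,:,:) = 0._wp
   zres2(:,:,:) = 0._wp

   ! Version A: two full 3D loop nests communicating through a 3D temporary
   do jk = 1, jpk
      do jj = 1, jpj
         do ji = 1, jpi-1
            zwx3(ji,jj,jk) = pu(ji,jj,jk) * 0.5_wp * ( pt(ji,jj,jk) + pt(ji+1,jj,jk) )
         end do
      end do
   end do
   do jk = 1, jpk
      do jj = 1, jpj
         do ji = 2, jpi-1
            zres3(ji,jj,jk) = zwx3(ji,jj,jk) - zwx3(ji-1,jj,jk)
         end do
      end do
   end do

   ! Version B: the jk loop is hoisted, so only one horizontal slice is needed
   do jk = 1, jpk
      do jj = 1, jpj
         do ji = 1, jpi-1
            zwx2(ji,jj) = pu(ji,jj,jk) * 0.5_wp * ( pt(ji,jj,jk) + pt(ji+1,jj,jk) )
         end do
      end do
      do jj = 1, jpj
         do ji = 2, jpi-1
            zres2(ji,jj,jk) = zwx2(ji,jj) - zwx2(ji-1,jj)
         end do
      end do
   end do

   ! Both versions should give the same result
   if ( maxval( abs( zres3 - zres2 ) ) > 1.e-12_wp ) error stop 'results differ'
   print *, 'results match'
end program slice_temporary
```

In this sketch the temporary shrinks from `jpi*jpj*jpk` to `jpi*jpj` elements, and each slice is consumed right after it is produced. The same transformation is not directly applicable when the second loop nest needs the temporary at several levels at once.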
Side effects:
- the nn_hls = 1 option is removed
- loop fusion is removed
- the ***ldf_iso and ***ldf_lap_blp routines are refactored
- zps management is changed in order to (1) accommodate the optimization and (2) prepare for penalization
The change in the zps framework, in the z + partial steps case:
- the T-point location is unchanged, so the reference gdept/w remains 1D (or uniform)
- partial steps are accounted for with penalization-like scale factors e3t_0, e3u_0, e3v_0, e3f_0. As an important consequence, gdepw_0 is no longer the sum of e3t_0 (the same holds for gdept_0 and e3w_0)
- addition of vertical keys key_vco_.
- removal of e3._0 and gdep._0 as declared variables as they are always substituted either by e3._1d/gdep._1d or by e3._3d/gdep._3d.
- addition of variables e3._3d and gdep._3d when key_vco_3d is activated or e3t/u/v/f_3d when key_vco_1d3d is activated
- AGRIF_DEMO broken...
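Argument for removing loop fusion: the table below gives the duration of routines (in seconds) in the main branch without loop fusion (main) and with loop fusion (main+LF), and in the optimised branch (branch 80) with nn_hls = 2. With loop fusion the total time rises from 10099.31 s to 11380.758 s, whereas branch 80 brings it down to 8056.93 s.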
| Routines | main | main+LF | branch 80 |
| :--- | :---: | :---: | :---: |
| tra_adv (ubs) | 50.11 | 45.47 | 47.58 |
| tra_zdf | 11.69 | 11.73 | 3.55 |
| traldf_lap (iso) | 25.37 | 25.42 | 9.19 |
| dyn_zdf | 11.84 | 11.84 | 4.09 |
| dyn_adv (ubs) | 23.12 | 23.2 | 4.63 |
| dyn_vor | 3.7 | 3.67 | 2.12 |
| dynldf_blp (hor) | 7.9 | 9.29 | 2.99 |
| TOTAL (OCE+ICE) | 10099.31 | 11380.758 | 8056.93 |