Summer 2022 work: DYN/TRA halo cleanup and optimization

Context

NEMO is memory bound: it spends most of its execution time accessing the arrays held in fast memory that its computations need. The more fast memory is in use, the deeper the accesses go in the memory hierarchy and the longer each one takes. The goal is therefore to limit the memory footprint during NEMO's execution. The reduction pays off most inside Fortran loops.

Modifications of versioned files (Fortran routines, *.[Ffh]90):

- DYN: dynadv_cen2.F90, dynadv_ubs.F90, dynvor.F90, dynhpg.F90, dynzdf.F90, dynldf_lap_blp.F90, dynldf_iso.F90
- TRA: traadv_cen.F90, traadv_ubs.F90, traadv_mus.F90, trazdf.F90, traldf_lap_blp.F90, traldf_iso.F90
- LDF: slope computation

Proposal

One way to decrease the memory footprint is to reduce the dimension of intermediate arrays. It is often possible to turn a temporary 3D array into a 2D array. We replaced the 3D temporary arrays (the z-prefixed work arrays) with 2D arrays by slicing the 3D loops along one axis.
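The slicing described above can be sketched with a small standalone program. This is only an illustration under stated assumptions, not code from the branch: the array names (`pt`, `zwx3d`, `zwx2d`), the domain sizes and the simple second-difference operator are all hypothetical, and plain `do` loops are used instead of NEMO's loop macros. It assumes the slicing is done along the vertical index `jk`, so that a single 2D work array is reused level by level.

```fortran
program slice_temporary_sketch
   implicit none
   integer, parameter :: wp = kind(1.0d0)
   integer, parameter :: jpi = 8, jpj = 6, jpk = 5   ! hypothetical small domain
   real(wp) :: pt(jpi,jpj,jpk)       ! input field (hypothetical)
   real(wp) :: zref(jpi,jpj,jpk)     ! result computed with a 3D temporary
   real(wp) :: zres(jpi,jpj,jpk)     ! result computed with a 2D temporary
   real(wp) :: zwx3d(jpi,jpj,jpk)    ! "before": 3D work array
   real(wp) :: zwx2d(jpi,jpj)        ! "after" : 2D work array, reused per level
   integer  :: ji, jj, jk

   call random_number(pt)
   zref(:,:,:) = 0._wp
   zres(:,:,:) = 0._wp

   ! --- Before: two full 3D loops communicating through a 3D temporary ---
   zwx3d(:,:,:) = 0._wp
   do jk = 1, jpk
      do jj = 1, jpj
         do ji = 1, jpi-1
            zwx3d(ji,jj,jk) = pt(ji+1,jj,jk) - pt(ji,jj,jk)
         end do
      end do
   end do
   do jk = 1, jpk
      do jj = 1, jpj
         do ji = 2, jpi-1
            zref(ji,jj,jk) = zwx3d(ji,jj,jk) - zwx3d(ji-1,jj,jk)
         end do
      end do
   end do

   ! --- After: the jk loop is hoisted outside, each level uses a 2D slice ---
   do jk = 1, jpk
      zwx2d(:,:) = 0._wp
      do jj = 1, jpj
         do ji = 1, jpi-1
            zwx2d(ji,jj) = pt(ji+1,jj,jk) - pt(ji,jj,jk)
         end do
      end do
      do jj = 1, jpj
         do ji = 2, jpi-1
            zres(ji,jj,jk) = zwx2d(ji,jj) - zwx2d(ji-1,jj)
         end do
      end do
   end do

   ! Both variants perform the same operations, so the fields should match
   print *, 'max |2D variant - 3D variant| = ', maxval(abs(zres - zref))
end program slice_temporary_sketch
```

In this sketch the work array shrinks from `jpi*jpj*jpk` to `jpi*jpj` elements, and the slice written in the first inner loop is read back immediately in the second, which is the kind of footprint reduction inside loops that the proposal targets.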

Side effects:

- the nn_hls = 1 option is removed
- loop fusion is removed
- the `***ldf_iso` and `***ldf_lap_blp` routines are refactored
- zps management is changed in order to (1) accommodate the optimization and (2) prepare for penalization

The change in the zps framework (z-coordinate with partial steps):

- T-point location is unchanged, so the reference gdept/gdepw stay 1D or uniform
- partial steps are accounted for with penalization-like scale factors e3t_0, e3u_0, e3v_0, e3f_0. As an important consequence, gdepw_0 is no longer the sum of e3t_0 (the same holds for gdept_0 and e3w_0)
- addition of vertical keys key_vco_.
- removal of e3._0 and gdep._0 as declared variables, since they are always substituted by either e3._1d/gdep._1d or e3._3d/gdep._3d
- addition of variables e3._3d and gdep._3d when key_vco_3d is activated, or e3t/u/v/f_3d when key_vco_1d3d is activated
- AGRIF_DEMO is broken...

Argument for removing loop fusion: the table below gives the duration of routines (in seconds) in the main branch without loop fusion (main) and with loop fusion (main+LF), and in the optimised branch (branch 80 with nn_hls = 2).

| Routines         | main    | main+LF | branch 80 |
| :---             | :---:   | :---:   | :---:  |
| tra_adv (ubs)    |  50.11  |  45.47  | 47.58  |
| tra_zdf          |  11.69  |  11.73  |  3.55  |
| traldf_lap (iso) |  25.37  |  25.42  |  9.19  |
| dyn_zdf          |  11.84  |  11.84  |  4.09  |
| dyn_adv (ubs)    |  23.12  |  23.2   |  4.63  |
| dyn_vor          |  3.7    |  3.67   |  2.12  |
| dynldf_blp (hor) |  7.9    |  9.29   |  2.99  |
| TOTAL (OCE+ICE)  | 10099.31 | 11380.758 |  8056.93  |

Edited Dec 16, 2022 by Sibylle Techene
