scaling issues with diamlr
Poor performance with diamlr option in larger configurations
Work with a 4.2.2 base code running an eORCA025 configuration across several thousand cores has revealed a significant performance hit when using the multiple linear regression option (diamlr) to perform harmonic analysis on tidally affected fields.
- Branches impacted: branch_4.2 and main
- Reference configuration/test case (chosen or used as template): AMM
- Dependencies: consistent behaviour with both XIOS2 and XIOS3
Analysis
In this case dia_mlr consumes over 60% of the elapsed time despite being little more than a single iom_put call of a 2D field. The cause can be traced to the 2D-to-scalar transformation carried out by XIOS on receipt of the 2D diamlr_time field. By default this is a global reduction, which is blocking and does not scale well. The reduction is largely redundant since the value is the same everywhere, and XIOS can be instructed to carry out the reduction on local arrays only. Doing so removes the scaling issue, and activating the diamlr diagnostics then has little impact on performance.
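For context, a minimal sketch of the XIOS XML chain involved, assuming a derived scalar field that references the reduction grid along these lines; only diamlr_time, diamlr_grid_2D_to_scalar and the reduce_domain element are taken from the actual NEMO XML, while the scalar field id is illustrative and other field attributes are omitted:

   <!-- 2D field sent from dia_mlr via a single iom_put call; other attributes omitted -->
   <field id="diamlr_time" />

   <!-- illustrative scalar field derived from it; the referenced grid performs the
        2D-to-scalar transformation described above -->
   <field id="diamlr_time_scalar" field_ref="diamlr_time" grid_ref="diamlr_grid_2D_to_scalar" />

   <!-- the reduction itself (grid_def_nemo.xml); with local="true" XIOS averages over each
        process's local subdomain only rather than performing a blocking global collective -->
   <grid id="diamlr_grid_2D_to_scalar" >
      <scalar>
         <reduce_domain operation="average" local="true" />
      </scalar>
   </grid>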
Fix
Simply add the local attribute to the reduction operator:
index d66b314..6cd63ea 100644
--- a/cfgs/SHARED/grid_def_nemo.xml
+++ b/cfgs/SHARED/grid_def_nemo.xml
@@ -377,9 +377,8 @@
       </grid>
       <grid id="diamlr_grid_2D_to_scalar" >
          <scalar>
-            <reduce_domain operation="average" />
+            <reduce_domain operation="average" local="true"/>
          </scalar>
-         <scalar />
       </grid>
       <!-- grid definitions for the computation of daily detided model diagnostics (diadetide) -->
       <grid id="diadetide_grid_T_2D" >
There are also a few minor code changes that can be made to diamlr.F90 to correct some metadata attributes and to remove the need for complete XML attributes when a field in the diamlr_fields field_group is not enabled. These changes will also benefit main.