review and improve timing
The timing diagnostics have been reviewed and improved in order to make the timing easier to use.
NetCDF files are now used to store timing results:
- the timing of step is no longer written in timing.output but directly in a NetCDF file called timing_step.nc (a small inspection sketch is given after this list). For performance reasons, this file is written in chunks of keepnc time steps. The value of keepnc is specified in the call to timing_start; we use 1000. As for timing.output, this file is written only by the MPI process with rank 0 if sn_cfctl%l_oceout = .false., or by all MPI processes if sn_cfctl%l_oceout = .true.
- the timing of the last keepnc time steps of all MPI processes is stored in another NetCDF file called timing_ts_allmpi_step.nc. This file is written only by the MPI process with rank 0 at the end of the run.
- these 2 files are written only for the timing of step, but they could be written for any timed section of the code by using the optional argument keepnc in timing_start.
- the total time (net and full) spent by each MPI task in each timed part of the model is written in NetCDF files called timing_tsum_allmpi_txxx_tyyy.nc, where xxx and yyy are the time step numbers delimiting the window used in the timing. These files are written only by the MPI process with rank 0 at the end of the run.
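As a minimal sketch (not part of the NEMO sources, and making no assumption about the variable names stored inside the files), the new timing files can be inspected from Python with the netCDF4 package:

```python
# Minimal inspection of the NetCDF timing files described above. Only the file
# names are taken from the run; the variable names inside are not assumed, they
# are simply listed.
import glob
from netCDF4 import Dataset

def show(path):
    """Print the dimensions and variables of one NetCDF timing file."""
    print(f"=== {path} ===")
    with Dataset(path) as ds:
        for name, dim in ds.dimensions.items():
            print(f"  dimension {name}: {len(dim)}")
        for name, var in ds.variables.items():
            print(f"  variable  {name}{var.dimensions}: shape {var.shape}")

# per-time-step timing of step, written in chunks of keepnc steps during the run
show("timing_step.nc")
# per-task totals (net and full) for each timing window, written at the end of the run
for path in sorted(glob.glob("timing_tsum_allmpi_t*_t*.nc")):
    show(path)
```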
Other minor improvements to the timing:
- the gnuplot script, timing_gnuplot.sh, has been adapted to the new timing_step.nc and generalized.
- the content of timing.output has been slightly improved by adding the statistics for each timing window.
- even if ln_timing = .false., we time step and provide the corresponding timing information. This timing, limited to step, is extremely light. A simple ncview of the files timing_step.nc or timing_ts_allmpi_step.nc can give very useful information even in production mode (a scripted alternative is sketched after this list).
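If ncview is not at hand, a quick look at the per-process series can also be scripted. This is only a sketch: the variable name step_time below is a placeholder, to be replaced by one of the names actually listed by the inspection script above.

```python
# Quick look at the last keepnc time steps kept for all MPI processes, as a
# scripted alternative to ncview. "step_time" is a placeholder variable name:
# replace it with one of the names actually present in the file.
import matplotlib.pyplot as plt
from netCDF4 import Dataset

VARNAME = "step_time"  # hypothetical name, check the file contents first

with Dataset("timing_ts_allmpi_step.nc") as ds:
    data = ds.variables[VARNAME][:]

if data.ndim == 2:              # e.g. one series per MPI rank
    for series in data:
        plt.plot(series, linewidth=0.5)
else:                           # a single series
    plt.plot(data)
plt.xlabel("time step within the kept window")
plt.ylabel("time per step")
plt.title("timing_ts_allmpi_step.nc")
plt.show()
```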
Other side improvements:
- for benchmarking purposes, we introduced nn_comm = 0, which suppresses all MPI communications in lbc_lnk (of course, in this case, model results are meaningless from a physical point of view).
- in order to reduce the size of ocean.output and layout.dat when a large number of MPI processes is used, we limited the number of lines printed in these ascii files to describe the MPI domain decomposition. A new NetCDF file called layout.nc provides all the detailed information, including the send and receive neighbours in the 8 directions (see the sketch after this list).
- as the finalization of the timing can require a large amount of memory when a large number of MPI processes is used, we introduced the routine nemo_dealloc, which deallocates the arrays allocated by nemo_alloc.
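A possible way to browse layout.nc, assuming only the file name given above (the script prints whatever variables the file contains, without assuming their names):

```python
# Browse the MPI domain-decomposition details now stored in layout.nc rather
# than in the truncated ascii files. No variable names are assumed: everything
# found in the file is printed, so the send/receive neighbour tables for the
# 8 directions can be located by inspection.
import numpy as np
from netCDF4 import Dataset

with Dataset("layout.nc") as ds:
    for name, var in ds.variables.items():
        values = np.asarray(var[:])
        print(f"{name} {var.dimensions} shape={values.shape}")
        if values.size <= 64:   # print small variables in full
            print(values)
```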