rebuild_nemo restart slow
Context
Rebuilding an eORCA025 (121 levels) restart split over ~2500 subdomains on Irene with the default tool took me more than 1 h.
Proposal
To speed it up, I dug deep into my backups and found an MPP version designed by Mireck (UKMO). The principle is very simple:
- We define X readers. Their role is basically to rebuild the global variables. The variables are spread among the readers.
- We define 1 writer. Its role is to write all the global variables that the readers throw at it. Compression and chunking can be activated as in the default tool.
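The reader/writer principle above can be sketched as follows. This is only an illustration, not the code of the tool (which is Fortran with MPI): Python threads stand in for MPI ranks, variables are 1-D lists instead of NetCDF fields, and the round-robin split of variables among readers is an assumption. All function names are hypothetical.

```python
import queue
import threading

def assign_variables(var_names, n_readers):
    """Spread variable names among readers (round-robin is an assumption)."""
    return [var_names[r::n_readers] for r in range(n_readers)]

def rebuild_global(tiles, global_size):
    """Assemble one global 1-D variable from (offset, values) subdomain tiles."""
    out = [0.0] * global_size
    for offset, values in tiles:
        out[offset:offset + len(values)] = values
    return out

def reader(my_vars, subdomain_data, global_size, out_queue):
    """A reader rebuilds each of its variables and hands it to the writer."""
    for name in my_vars:
        out_queue.put((name, rebuild_global(subdomain_data[name], global_size)))
    out_queue.put(None)  # tell the writer that this reader is done

def writer(n_readers, out_queue, output):
    """The single writer stores whatever the readers send until all are done."""
    finished = 0
    while finished < n_readers:
        item = out_queue.get()
        if item is None:
            finished += 1
        else:
            name, field = item
            output[name] = field  # stand-in for the single NetCDF write

def rebuild(subdomain_data, global_size, n_readers):
    """Run n_readers readers and one writer, return the rebuilt variables."""
    out_queue = queue.Queue()
    output = {}
    shares = assign_variables(sorted(subdomain_data), n_readers)
    workers = [
        threading.Thread(target=reader,
                         args=(share, subdomain_data, global_size, out_queue))
        for share in shares
    ]
    workers.append(threading.Thread(target=writer,
                                    args=(n_readers, out_queue, output)))
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return output
```

Because only one process ever touches the output file, compression and chunking behave exactly as in a serial write; the parallelism is confined to reading and assembling the subdomains.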
Btw, I also:
- removed key_netcdf4 => NetCDF4 is now the only option.
- made the date stamp optional => this lets us compare checksums between a reference output file and the tool's output when developing the tools.
- added an option '-i' to rebuild_nemo to trigger icb_combrest automatically afterwards if the input file is an icb restart.
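As an illustration of the checksum comparison mentioned in the date stamp item: once no time-dependent stamp is written into the file, two runs on the same input should be comparable byte for byte. A minimal sketch, assuming SHA-256 as the checksum (the ticket does not say which one is used) and hypothetical function names:

```python
import hashlib

def file_sha256(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

def same_output(reference_path, candidate_path):
    """True if the two files are byte-for-byte identical."""
    return file_sha256(reference_path) == file_sha256(candidate_path)
```

With a date stamp in the file, this comparison fails even when the data are identical, which is why the stamp needs to be switchable.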
With these changes, I can now rebuild my restart in 11 minutes using 16 cores (instead of >1h), and the sea ice restart now takes 2 minutes instead of 20.
For now, the tool is called rebuild_nemo_mpp.f90. I haven't thought about how to merge the two, as the MPI instructions are spread all over the place, and in the end I think people need both versions: one that can be used on the front end of the HPC (debugging, small configurations, output.abort ...) and one for production or for rebuilding large files. For ocean/ice sheet coupling this matters a lot, as restarts are rebuilt every year because we don't know what the decomposition will be the following year once the isf draft changes.
Comments
I didn't try to go further with multiple writers, as I am not convinced that compression works in parallel, especially if you don't have 'collective' access.