Reproducibility of the OBS global grid search
Context
In the context of issue #126 (closed), it was found that the observation-operator implementation (OBS) would fail the SETTE reproducibility test if SETTE compared OBS-specific model output (see !194 (comment 5931)). As a result, !194 (merged) includes a SETTE extension for the comparison of OBS-specific output. This extension exposes the ORCA2_ICE_OBS REPRO failure, but the reproducibility failure itself has yet to be resolved in the corresponding development branch. The same issue has also been found in branch_4.2.
Analysis
Some associations of observation locations with model grid locations in the vicinity of land-only subdomains have been found to differ between the ORCA2_ICE_OBS SETTE runs REPRO_4_8 and REPRO_8_4. (The REPRO_8_4 configuration contains two land-only subdomains, which, however, are not suppressed.) Because halo exchanges that solely affect land points are suppressed (and, by extension, so are exchanges with land-only subdomains), coordinate values in the halo regions of arrays glam{t,u,v,f} and gphi{t,u,v,f} are unreliable. Nevertheless, the OBS grid-search algorithm subjects such values to a global reduction operation (calls of subroutine mpp_global_max in module obs_grid), so the outcome of the grid search can depend on the domain decomposition. Further, the current implementation of the OBS global grid search (option ln_grid_global) appears to be incompatible with the suppression of land subdomains: running REPRO_8_4 on 30 instead of 32 processes (i.e., with its two land-only subdomains suppressed) results in a failure of the model run.
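To make the failure mode more concrete, the following sketch mimics the reduction on a one-dimensional line of points, looping serially over subdomains instead of using MPI. It assumes that mpp_global_max acts as an element-wise maximum over all processes' copies of the global array, and that a halo that is not refreshed keeps an arbitrary stale value (here 500). The module and function names are hypothetical and this is not NEMO code.

```fortran
module halo_max_demo
   ! Hypothetical 1-D illustration (not NEMO code): each subdomain copies its
   ! interior AND halo points into a global-sized work array, and the copies
   ! are combined with an element-wise maximum, standing in for mpp_global_max.
   implicit none
   integer, parameter :: wp = kind(1.0d0)
contains
   ! lam    : true coordinate values on the global line (1..n)
   ! bounds : bounds(1,k), bounds(2,k) = first/last interior point of subdomain k
   ! nhls   : halo width
   ! stale  : .true. for subdomains whose halo was NOT refreshed by an exchange
   ! zstale : value left in a halo that was not refreshed (assumption)
   function reduce_by_max(lam, bounds, nhls, stale, zstale) result(zlamg)
      real(wp), intent(in) :: lam(:)
      integer,  intent(in) :: bounds(:,:)
      integer,  intent(in) :: nhls
      logical,  intent(in) :: stale(:)
      real(wp), intent(in) :: zstale
      real(wp) :: zlamg(size(lam))
      integer :: k, i, n
      n = size(lam)
      zlamg(:) = -huge(1.0_wp)
      do k = 1, size(bounds, 2)
         ! Visit interior plus halo points of subdomain k, clipped to the line
         do i = max(1, bounds(1,k) - nhls), min(n, bounds(2,k) + nhls)
            if (i >= bounds(1,k) .and. i <= bounds(2,k)) then
               zlamg(i) = max(zlamg(i), lam(i))     ! interior: always valid
            else if (stale(k)) then
               zlamg(i) = max(zlamg(i), zstale)     ! halo not refreshed
            else
               zlamg(i) = max(zlamg(i), lam(i))     ! halo refreshed by exchange
            end if
         end do
      end do
   end function reduce_by_max
end module halo_max_demo
```

With two subdomains and refreshed halos, the true coordinates are recovered. With four subdomains and a single stale halo (for example next to a land-only neighbour), points 2 and 5 take the stale value, so the reduced array depends on the decomposition.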
Fix
In branch_4.2, the global reduction used to build the global coordinate arrays could give precedence to interior-domain values over halo values. This would avoid the use of halo coordinate values in the grid search in many (although not all) cases, for example with the following modification:
--- a/src/OCE/OBS/obs_grid.F90
+++ b/src/OCE/OBS/obs_grid.F90
@@ -285,9 +285,21 @@ CONTAINS
zmskg(mig(ji),mjg(jj)) = tmask(ji,jj,1)
END DO
END DO
+ DO jj = 1+nn_hls, jpj-nn_hls
+ DO ji = 1+nn_hls, jpi-nn_hls
+ zlamg(mig(ji),mjg(jj)) = glamt(ji,jj) + 1000000.0_wp
+ zphig(mig(ji),mjg(jj)) = gphit(ji,jj) + 1000000.0_wp
+ zmskg(mig(ji),mjg(jj)) = tmask(ji,jj,1) + 1000000.0_wp
+ END DO
+ END DO
CALL mpp_global_max( zlamg )
CALL mpp_global_max( zphig )
CALL mpp_global_max( zmskg )
+ WHERE( zmskg(:,:) >= 1000000.0_wp )
+ zlamg(:,:) = zlamg(:,:) - 1000000.0_wp
+ zphig(:,:) = zphig(:,:) - 1000000.0_wp
+ zmskg(:,:) = zmskg(:,:) - 1000000.0_wp
+ END WHERE
ELSE
! Add various grids here.
DO jj = 1, jlat
Together with a corresponding modification in file obs_grd_bruteforce.h90, this resolves the ORCA2_ICE_OBS reproducibility failure. Further, it is recommended to stop the model when OBS is active with the global grid-search option (ln_grid_global) and land subdomains are suppressed:
--- a/src/OCE/OBS/diaobs.F90
+++ b/src/OCE/OBS/diaobs.F90
@@ -426,6 +426,9 @@ CONTAINS
ENDIF
!
IF( ln_grid_global ) THEN
+ IF( jpnij < jpni * jpnj ) THEN
+ CALL ctl_stop( 'STOP', 'dia_obs_init: ln_grid_global=T is incompatible with suppressed land subdomains' )
+ END IF
CALL ctl_warn( 'dia_obs_init: ln_grid_global=T may cause memory issues when used with a large number of processors' )
ENDIF
!
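The precedence trick in the obs_grid.F90 modification can be sketched in a simplified one-dimensional setting. Interior values are shifted by 1000000 before the element-wise maximum, so any interior copy beats every halo copy. The shift is then undone wherever the mask copy shows that an interior value won. The 1-D layout, the serial loop standing in for mpp_global_max, the stale halo value 500 and the routine names are assumptions for illustration. The sketch is only valid while every halo value, stale or not, stays below the offset.

```fortran
module offset_max_demo
   ! Hypothetical 1-D sketch (not NEMO code) of giving interior values
   ! precedence over halo values in an element-wise MAX reduction.
   implicit none
   integer, parameter :: wp = kind(1.0d0)
   ! Same offset as in the obs_grid.F90 modification; assumed to exceed
   ! every coordinate, mask and stale halo value.
   real(wp), parameter :: zoff = 1000000.0_wp
contains
   subroutine reduce_interior_first(lam, msk, bounds, nhls, stale, zstale, zlamg, zmskg)
      real(wp), intent(in)  :: lam(:), msk(:)
      integer,  intent(in)  :: bounds(:,:), nhls
      logical,  intent(in)  :: stale(:)
      real(wp), intent(in)  :: zstale
      real(wp), intent(out) :: zlamg(:), zmskg(:)
      integer :: k, i, n
      n = size(lam)
      zlamg(:) = -huge(1.0_wp)
      zmskg(:) = -huge(1.0_wp)
      do k = 1, size(bounds, 2)
         do i = max(1, bounds(1,k) - nhls), min(n, bounds(2,k) + nhls)
            if (i >= bounds(1,k) .and. i <= bounds(2,k)) then
               ! Interior point: shifted so that it wins against any halo copy
               zlamg(i) = max(zlamg(i), lam(i) + zoff)
               zmskg(i) = max(zmskg(i), msk(i) + zoff)
            else if (stale(k)) then
               ! Halo not refreshed: stale value (assumed < zoff)
               zlamg(i) = max(zlamg(i), zstale)
               zmskg(i) = max(zmskg(i), zstale)
            else
               ! Halo refreshed by exchange: true value, unshifted
               zlamg(i) = max(zlamg(i), lam(i))
               zmskg(i) = max(zmskg(i), msk(i))
            end if
         end do
      end do
      ! Undo the shift wherever an interior value won the reduction
      where (zmskg >= zoff)
         zlamg = zlamg - zoff
         zmskg = zmskg - zoff
      end where
   end subroutine reduce_interior_first
end module offset_max_demo
```

In this setting, both decompositions yield identical arrays even when one subdomain has a stale halo. One cost of the approach is that the round trip through the offset rounds coordinates to the spacing of double-precision numbers near 1.0e6 (about 1.2e-10). The rounding is deterministic, so it does not affect reproducibility.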
Corresponding fixes could be applied as part of !194 (merged) to resolve this issue in main.