Implement tiling in RK3 and expand tiling to `wzv`/`wAimp` in MLF
Main changes
- Slightly increased tiling coverage in MLF to include calls to `wzv` and `wAimp`.

  The 2nd call to `wAimp` (after `dyn_spg`) could not be included in the tiling loop without changing results (due to tiling overlap affecting `wi`).
- Implemented tiling in RK3, with coverage similar to that of tiling in MLF (`stpmlf.F90`).

  - 2 loops in `stprk3.F90`
  - 1 loop in `stp2d.F90` (2 if AGRIF is used)
  - 3 loops in `stprk3_stg.F90` (4 if AGRIF is used)

  SWE has not been updated.

  Supporting changes
  - Moved `tra_adv_cen`/`tra_adv_mus`/`tra_adv_qck*`/`tra_adv_ubs` subroutines into a wrapper subroutine

    `pU`/`pV`/`pW` may now be of size `DIMENSION(T2D(2))` (tile-sized) if using MLF, or `DIMENSION(jpi,jpj,jpk)` (MPI domain-sized) if using RK3. As for other subroutines with varying input array argument shapes (e.g. in `eosbn2.F90`), a wrapper subroutine is used to determine the shape of the input array and the correct starting indices, then pass this information as additional arguments to the main subroutine.
  - Changed `pvtr` declaration in `dia_ptr` and `dia_ptr_zint` subroutines from `DIMENSION(T2D(nn_hls))` to `DIMENSION(:,:,:)`

    This is for the same reason as the previous change. `pvtr` is only used for `IF( PRESENT(pvtr) )`, so a wrapper function isn't necessary.
  - Replaced re-use of `Cu_adv` with local working arrays in `wAimp`

    When the argument to `iom_put` is an MPI domain-sized array (`DIMENSION(A2D(x))`) and tiling is active, data will not be sent to XIOS until the final tile (at this point all tiles have finished working on the array). The re-use of `Cu_adv` for multiple diagnostics means that by the time this data is sent to XIOS, it will be largely incorrect for most `wAimp` diagnostics.

    The working arrays replace this re-use of `Cu_adv`. They are tile-sized (`DIMENSION(T2D(x))`) working arrays, so data is sent by `iom_put` for each individual tile. This fix is also needed for the MLF version of `wAimp`.

    The `wimp` diagnostic requires a similar fix for the tiling. `wAimp` is called twice per timestep, but the diagnostics should only use `wi` data from the first call (see related bug fix in the next section). However, by the time `wi` is sent to XIOS on the final tile, the other tiles will have made two calls to `wAimp` and `wi` will contain largely incorrect data.

    Again, the fix is to use a local working array to output the diagnostic. This specific fix is only needed for RK3, as the two `wAimp` calls are in separate tiling loops in MLF and there is no overlap issue. However, XIOS requires consistent array sizes for the `wimp` diagnostic, so a local working array is used for this diagnostic in the MLF version of `wAimp` as well.
  - Reduced size of `Ue_rhs` and `Ve_rhs` arrays from `A2D(2)` to `A2D(0)`

    This is to avoid tiling overlap. The sizes of the `zu_frc` and `zv_frc` arrays in `dyn_spg_ts` have been reduced accordingly.
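
The wrapper approach described above for `pU`/`pV`/`pW` can be illustrated with a minimal, self-contained Fortran sketch. It is not the NEMO implementation: the module, the routine names (`tile_sum`, `tile_sum_main`), the domain size and the omission of halos are all assumptions made for illustration. The wrapper inspects the shape of the actual argument and passes matching lower bounds to the main routine, which can then use domain indices whether the array is tile-sized or domain-sized.

```fortran
module tile_wrapper
   implicit none
   private
   public :: tile_sum

   integer, parameter, public :: jpi = 8, jpj = 6   ! hypothetical MPI-domain size

contains

   ! Wrapper: picks the lower bounds that match the actual argument's shape
   function tile_sum(pfld, kis, kie, kjs, kje) result(zsum)
      real,    intent(in) :: pfld(:,:)            ! tile-sized or domain-sized
      integer, intent(in) :: kis, kie, kjs, kje   ! tile bounds in domain indices
      real :: zsum
      if (size(pfld, 1) == jpi .and. size(pfld, 2) == jpj) then
         zsum = tile_sum_main(pfld, 1, 1, kis, kie, kjs, kje)       ! domain-sized
      else
         zsum = tile_sum_main(pfld, kis, kjs, kis, kie, kjs, kje)   ! tile-sized
      end if
   end function tile_sum

   ! Main routine: indexes pfld with domain indices whatever its actual size
   function tile_sum_main(pfld, klb1, klb2, kis, kie, kjs, kje) result(zsum)
      integer, intent(in) :: klb1, klb2           ! lower bounds chosen by the wrapper
      real,    intent(in) :: pfld(klb1:, klb2:)
      integer, intent(in) :: kis, kie, kjs, kje
      real    :: zsum
      integer :: ji, jj
      zsum = 0.0
      do jj = kjs, kje
         do ji = kis, kie
            zsum = zsum + pfld(ji, jj)
         end do
      end do
   end function tile_sum_main

end module tile_wrapper
```

Calling `tile_sum` with the full array or with a copy of just the tile gives the same result, because the main routine always sees lower bounds that make domain indices valid. The shape test in this sketch cannot distinguish a tile that is as large as the domain, but in that case the tile starts at index 1 and both branches agree.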
- Reduced loop bounds of calculations in `wzv`/`wAimp`/`div_hor` (and called subroutines) from `(1,2,1,2)` to `(1,1,1,1)`

  Supporting changes
  - Reduced size of `hdiv` array from `A2D(2)` to `A2D(1)`
  - Reduced loop bounds of `ssh(Kaa)` calculation in `ssh_nxt` from `(1,2,1,2)` to `(0,0,0,0)`

    Since the calculation of `hdiv` is now only performed on the 1st halo points, an `lbc_lnk` call is needed to set all halo points for `ssh(Kaa)`, so no explicit halo calculations need to be performed.
  - Reduced size of several ISF arrays from `A2D(2)` to `A2D(1)`
  - Restored the wrapper function around `isf_tbl_avg`

    This was commented out in 542a46a4, but is still needed in order to avoid using `A2D(0)` in array arguments, which triggers temporary array copies:

        - CALL isf_tbl_avg( misfkt_cav, misfkb_cav, rhisf_tbl_cav, rfrac_tbl_cav, ze3, ts(A2D(0),:,jp_sal,Kmm), zstbl )
        + CALL isf_tbl_avg( misfkt_cav, misfkb_cav, rhisf_tbl_cav, rfrac_tbl_cav, ze3, ts(:,:,:,jp_sal,Kmm), zstbl )
  - Reduced size of `h_rnf` and `nk_rnf` arrays from `A2D(2)` to `A2D(1)`
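
The temporary-copy issue mentioned above for `isf_tbl_avg` comes from contiguity: a section that strips points from the leading dimensions is not contiguous in memory, whereas a slice that takes the full extent of the leading dimensions is. The sketch below uses the standard `IS_CONTIGUOUS` intrinsic (Fortran 2008) to show the difference. The array extents and function names are hypothetical, and whether a copy is actually made depends on the compiler and on how the dummy argument is declared (a copy is typically needed when a non-contiguous section is passed to an explicit-shape dummy).

```fortran
module section_contiguity
   implicit none
   private
   public :: interior_section_is_contiguous, full_slice_is_contiguous

   ! Hypothetical array extents, for illustration only
   integer, parameter :: jpi = 6, jpj = 5, jpk = 3, jpts = 2

contains

   ! Section with the outer points removed in the first two dimensions
   ! (similar in spirit to an A2D(0) section of a larger array)
   function interior_section_is_contiguous() result(ll)
      logical :: ll
      real, allocatable :: zts(:,:,:,:)
      allocate( zts(jpi, jpj, jpk, jpts) )
      ll = is_contiguous( zts(2:jpi-1, 2:jpj-1, :, 2) )
   end function interior_section_is_contiguous

   ! Full extent in the first three dimensions, one fixed tracer index
   function full_slice_is_contiguous() result(ll)
      logical :: ll
      real, allocatable :: zts(:,:,:,:)
      allocate( zts(jpi, jpj, jpk, jpts) )
      ll = is_contiguous( zts(:, :, :, 2) )
   end function full_slice_is_contiguous

end module section_contiguity
```

Passing the full slice and letting a wrapper work out the starting indices therefore avoids handing the compiler a non-contiguous actual argument.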
- Removed `DO_.D_OVR` macros

  The functionality of these macros has been replaced by the `dom_tile_copy*` functions; see #151 (closed) for more information. At present, no calls to these new functions are required.
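
The stale-data problem described above for `Cu_adv` in `wAimp` can be reproduced with a toy model. This is only a sketch under stated assumptions: a 1D "domain" split into equal tiles, diagnostic A equal to 1.0 and diagnostic B equal to 2.0, and a plain array copy standing in for the deferred send to XIOS. None of the names below exist in NEMO.

```fortran
module deferred_diag_demo
   implicit none
   private
   public :: diag_shared_array, diag_local_array

   ! Hypothetical 1D "domain" split into equal tiles
   integer, parameter, public :: ntiles = 4, ntsize = 2
   integer, parameter, public :: jpi = ntiles*ntsize

contains

   ! One domain-sized array is re-used for diagnostics A (= 1.0) and B (= 2.0).
   ! The output of A is deferred until the final tile.
   subroutine diag_shared_array(pout_a)
      real, intent(out) :: pout_a(jpi)
      real    :: zshared(jpi)
      integer :: jt, is, ie
      zshared(:) = 0.0
      do jt = 1, ntiles
         is = (jt-1)*ntsize + 1
         ie = jt*ntsize
         zshared(is:ie) = 1.0                       ! diagnostic A for this tile
         if (jt == ntiles) pout_a(:) = zshared(:)   ! deferred send of A
         zshared(is:ie) = 2.0                       ! same array re-used for B
      end do
   end subroutine diag_shared_array

   ! A tile-sized working array holds A and is sent tile by tile.
   subroutine diag_local_array(pout_a)
      real, intent(out) :: pout_a(jpi)
      real    :: zloc(ntsize)
      integer :: jt, is, ie
      do jt = 1, ntiles
         is = (jt-1)*ntsize + 1
         ie = jt*ntsize
         zloc(:) = 1.0                              ! diagnostic A for this tile
         pout_a(is:ie) = zloc(:)                    ! sent immediately for this tile
      end do
   end subroutine diag_local_array

end module deferred_diag_demo
```

With the shared array, only the final tile's points still hold diagnostic A when the deferred send happens; every earlier tile has already been overwritten by B. With the tile-sized working array, each tile's values are sent before anything can overwrite them.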
Other changes
Bug fixes
- Corrected some conditionals involving tiling parameters
- Fixed workaround for using `tra_adv_fct` with tiling

  This was broken by 4973adb5; `ll_dofct` is needed in the `tra_adv` conditional:

      - IF( l_istiled .AND. nadv == np_FCT ) THEN
      + IF( l_istiled .AND. nadv == np_FCT .AND. ll_dofct ) THEN
- Fixed `iom_put` being called for `wAimp` diagnostics multiple times per timestep

  `wAimp` can be called twice per timestep for MLF and RK3, so `iom_put` can be called twice per timestep for each `wAimp` diagnostic. In this situation, XIOS will ignore the 2nd call and use data from the 1st call. Since this is the intended source of data for the `wAimp` diagnostics, this doesn't really cause an issue. However, this behaviour of XIOS is not well known and can lead to unexpected changes in results (e.g. issue 3 in #379 (closed)).

  An optional argument (`lddiag`) has been added to the MLF version of `wAimp` to control diagnostic output; this is set to true for the first call. For the RK3 version, diagnostic output is enabled when `k_ind == np_velocity`.
- Removed workarounds for tiling in `bdy_dyn3d_dmp` and `bdy_tra_dmp`

  These workarounds (`IF( l_istiled .AND. ntile /= 1 ) RETURN`) were not tested at the time and do not work.

  A new function (`in_hdom`) has been added to `domutl.F90`, which returns true if the given indices are within the currently active tile. This function is used to ensure that damping is only applied to boundary points within the current tile.

        DO ib = 1, idx_bdy(ib_bdy)%nblen(igrd)
           ii   = idx_bdy(ib_bdy)%nbi(ib,igrd)
           ij   = idx_bdy(ib_bdy)%nbj(ib,igrd)
           zwgt = idx_bdy(ib_bdy)%nbd(ib,igrd)
      +    IF( in_hdom(ii, ij, khls=0) ) THEN
              DO ik = 1, jpkm1
                 zta = zwgt * ( dta_bdy(ib_bdy)%tem(ib,ik) - pts(ii,ij,ik,jp_tem,Kbb) ) * tmask(ii,ij,ik)
                 zsa = zwgt * ( dta_bdy(ib_bdy)%sal(ib,ik) - pts(ii,ij,ik,jp_sal,Kbb) ) * tmask(ii,ij,ik)
                 pts(ii,ij,ik,jp_tem,Krhs) = pts(ii,ij,ik,jp_tem,Krhs) + zta
                 pts(ii,ij,ik,jp_sal,Krhs) = pts(ii,ij,ik,jp_sal,Krhs) + zsa
              END DO
      +    ENDIF
        END DO

  This check is admittedly redundant when tiling is not used.
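
The kind of test that `in_hdom` performs can be sketched as follows. The implementation of the real function is not shown in this description, so everything here is an assumption: the name `in_tile`, the explicit tile-bound arguments (`kis`, `kie`, `kjs`, `kje`) and the treatment of `khls` as an optional halo width that widens the tile on every side.

```fortran
module tile_membership
   implicit none
   private
   public :: in_tile

contains

   ! True if point (ki,kj) lies inside the tile [kis:kie] x [kjs:kje],
   ! optionally widened by khls points on every side.
   function in_tile(ki, kj, kis, kie, kjs, kje, khls) result(ll_in)
      integer, intent(in)           :: ki, kj               ! point to test
      integer, intent(in)           :: kis, kie, kjs, kje   ! hypothetical tile bounds
      integer, intent(in), optional :: khls                 ! halo width (assumed default: 0)
      logical :: ll_in
      integer :: ihls

      ihls = 0
      if (present(khls)) ihls = khls

      ll_in = ki >= kis - ihls .and. ki <= kie + ihls .and. &
         &    kj >= kjs - ihls .and. kj <= kje + ihls
   end function in_tile

end module tile_membership
```

With a halo width of 0 and non-overlapping tiles, each boundary point satisfies the test for exactly one tile, so the damping term is added once per timestep rather than once per tile.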
- Restored output of `dia_ar5` diagnostics in RK3 (Sibylle)
- Fixed output of `uadv_heattr`/`vadv_heattr`/`uadv_salttr`/`vadv_salttr` diagnostics (Sibylle)

  As for `dia_ptr_hst` in #379 (closed), `dia_ar5_hst` is called for each RK3 stage. A similar approach (use of an `l_diaar5` conditional to control diagnostic output) is taken here.
- Various other minor corrections
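
The `lddiag` switch described above for the MLF version of `wAimp` follows a common optional-argument pattern, sketched here in isolation. The routine `update_field`, the counter `nput` (a stand-in for `iom_put` calls) and the default of `.false.` when the argument is absent are assumptions for illustration, not details taken from the NEMO source.

```fortran
module diag_switch_demo
   implicit none
   private
   public :: update_field, nput

   integer :: nput = 0   ! number of simulated diagnostic outputs

contains

   subroutine update_field(pfld, lddiag)
      real,    intent(inout)        :: pfld(:)
      logical, intent(in), optional :: lddiag   ! request diagnostic output
      logical :: ll_diag

      ll_diag = .false.                         ! assumed default when absent
      if (present(lddiag)) ll_diag = lddiag

      pfld(:) = pfld(:) + 1.0                   ! stand-in for the real calculation
      if (ll_diag) nput = nput + 1              ! stand-in for the iom_put calls
   end subroutine update_field

end module diag_switch_demo
```

Calling the routine twice in a timestep, with `lddiag=.true.` on the first call only, produces a single output, so the result no longer relies on XIOS ignoring the second call.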
Minor changes
- Added a `ctl_stop` to `tra_adv_fct` when using tiling (as a precaution)
- Fused several `DO jk & DO_2D` loops together, e.g.

      - DO jk = 1, jpkm1                                              ! At all stages : Add the Stokes Drift
      -    DO_2D( nn_hls, nn_hls-1, nn_hls, nn_hls-1)
      -       pFu(ji,jj,jk) = pFu(ji,jj,jk) + e2u(ji,jj) * e3u(ji,jj,jk,Kmm) * usd(ji,jj,jk)
      -       pFv(ji,jj,jk) = pFv(ji,jj,jk) + e1v(ji,jj) * e3v(ji,jj,jk,Kmm) * vsd(ji,jj,jk)
      -    END_2D
      - END DO
      + DO_3D( nn_hls, nn_hls-1, nn_hls, nn_hls-1, 1, jpkm1 )        ! At all stages : Add the Stokes Drift
      +    pFu(ji,jj,jk) = pFu(ji,jj,jk) + e2u(ji,jj) * e3u(ji,jj,jk,Kmm) * usd(ji,jj,jk)
      +    pFv(ji,jj,jk) = pFv(ji,jj,jk) + e1v(ji,jj) * e3v(ji,jj,jk,Kmm) * vsd(ji,jj,jk)
      + END_3D
- Various other minor optimisations