CI JJOB Tests using CMake #3214

@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA commented Jan 8, 2025

Description

This PR adds CI tests at the JJOB level using CMake/ctest wrappers.
Each individual JJOB test has four distinct phases:

  • Setup: Creates an EXPDIR/COMROOT just for the individual JJOB
  • Stage: Moves the specific files needed to run the JJOB, as specified in ${HOMEgfs}/ci/ctest/cases/{CASE}_{JJOB}.yaml, into the COMROOT
  • Execute: Runs the JJOB in batch (batch card extracted from the XML via Rocoto)
  • Validate: Checks the outputs, which are also specified in the above YAML configuration file (currently stubbed)
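
The four phases above can be sketched as a simple ordered pipeline. This is only an illustration of the control flow, not the scripts added by this PR: the `phase_*` functions and `run_jjob_test` are hypothetical placeholders that just print what each phase would do.

```shell
#!/bin/sh
# Hypothetical sketch: run the four phases of one JJOB test in order and
# stop at the first failure. The phase_* functions are placeholders that
# only print a message; they are not the project's scripts.

phase_setup()    { echo "setup: create EXPDIR/COMROOT for $1"; }
phase_stage()    { echo "stage: place inputs listed in $1.yaml into COMROOT"; }
phase_execute()  { echo "execute: submit $1 to the batch system"; }
phase_validate() { echo "validate: check outputs of $1 (stubbed)"; }

run_jjob_test() {
    # $1 is a test name of the form {CASE}_{JJOB}
    for phase in setup stage execute validate; do
        "phase_${phase}" "$1" || {
            echo "phase '${phase}' failed for $1" >&2
            return 1
        }
    done
}

run_jjob_test C48_ATM_gfs_fcst_seg0
```

Stopping at the first failing phase is a design choice of this sketch; it avoids reporting a passing validate step when an earlier phase never produced any output.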

Resolves #3204

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? YES
  • Does this change require an update to any of the following submodules? NO (If YES, please add a link to any PRs that are pending.)
    • EMC verif-global
    • GDAS
    • GFS-utils
    • GSI
    • GSI-monitor
    • GSI-utils
    • UFS-utils
    • UFS-weather-model
    • wxflow

How has this been tested?

Ran with CMake/ctest:

mterry (orion-login-2) ctest (ctest_jjobs_framework) $ mkdir build; cd build

mterry (orion-login-2) build (ctest_jjobs_framework) $ cmake ../.. -DICSDIR_ROOT=/work/noaa/global/glopara/data/ICSDIR -DHPC_ACCOUNT=nems -DSTAGED_TESTS_DIR=/work/noaa/stmp/GFS_CI_ROOT/ORION/STAGED_TESTS_DIR
-- gw: global-workflow baselines will be used from: '/work2/noaa/global/mterry/global-workflow_forked'
-- gw: global-workflow tests will be run at: '/work2/noaa/global/mterry/global-workflow_forked/ctests/build/ctests/RUNTESTS'
-- gw: global-workflow tests will use the allocation: 'nems'
-- gw: global-workflow tests will use ICSDIR_ROOT: '/work/noaa/global/glopara/data/ICSDIR'
-- gw: global-workflow tests will use staged data from:  '/work/noaa/stmp/GFS_CI_ROOT/ORION/STAGED_TESTS_DIR'
-- Build files have been written to: /work2/noaa/global/mterry/global-workflow_forked/ctests/build

mterry (orion-login-2) build (ctest_jjobs_framework) $ ctest -N
Test project /work2/noaa/global/mterry/global-workflow_forked/ctests/build
  Test #1: test_C48_ATM_gfs_fcst_seg0_setup
  Test #2: test_C48_ATM_gfs_fcst_seg0_stage
  Test #3: test_C48_ATM_gfs_fcst_seg0_execute
  Test #4: test_C48_ATM_gfs_fcst_seg0_validate

Total Tests: 4
mterry (orion-login-2) build (ctest_jjobs_framework) $ ctest -L C48_ATM
Test project /work2/noaa/global/mterry/global-workflow_forked/ctests/build
    Start 1: test_C48_ATM_gfs_fcst_seg0_setup
1/4 Test #1: test_C48_ATM_gfs_fcst_seg0_setup ......   Passed    3.25 sec
    Start 2: test_C48_ATM_gfs_fcst_seg0_stage
2/4 Test #2: test_C48_ATM_gfs_fcst_seg0_stage ......   Passed    1.90 sec
    Start 3: test_C48_ATM_gfs_fcst_seg0_execute
3/4 Test #3: test_C48_ATM_gfs_fcst_seg0_execute ....   Passed  2942.51 sec
    Start 4: test_C48_ATM_gfs_fcst_seg0_validate
4/4 Test #4: test_C48_ATM_gfs_fcst_seg0_validate ...   Passed    0.02 sec

100% tests passed, 0 tests failed out of 4

Label Time Summary:
C48_ATM          = 2947.68 sec*proc (4 tests)
gfs_fcst_seg0    = 2947.68 sec*proc (4 tests)

Total Test time (real) = 2947.68 sec
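
The test names in the listing above follow the pattern `test_{CASE}_{JJOB}_{phase}`. As a minimal sketch, assuming that pattern holds for other cases too, the helpers below build the four names and an anchored regular expression for selecting a single phase with `ctest -R`. Both function names are hypothetical, and case or job names containing regex metacharacters are not escaped.

```shell
#!/bin/sh
# Build the four ctest names for one case/job pair, following the
# test_{CASE}_{JJOB}_{phase} pattern visible in the listing above.
jjob_test_names() {
    # $1 = case (e.g. C48_ATM), $2 = job (e.g. gfs_fcst_seg0)
    for phase in setup stage execute validate; do
        printf 'test_%s_%s_%s\n' "$1" "$2" "$phase"
    done
}

# Build an anchored regular expression that matches one phase only,
# suitable for passing to `ctest -R`.
jjob_test_regex() {
    # $1 = case, $2 = job, $3 = phase
    printf '^test_%s_%s_%s$\n' "$1" "$2" "$3"
}

jjob_test_names C48_ATM gfs_fcst_seg0

# Example invocation (not run here, needs a configured build directory):
#   ctest -R "$(jjob_test_regex C48_ATM gfs_fcst_seg0 stage)" --output-on-failure
```

Selecting a later phase on its own presumably only makes sense once the earlier phases have already run in the same build directory.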

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

Terry McGuinness and others added 30 commits December 6, 2024 10:55
Terry McGuinness and others added 5 commits January 14, 2025 13:20
update last bit
fix misspelled work with create
fixed a grammar in README.md
path to source as HOMEgfs
@WalterKolczynski-NOAA

The execute test is failing when I try:

13:42:21 <ctest_jjobs_framework> build/>ctest --output-on-failure
Test project /work2/noaa/global/wkolczyn/save/global-workflow/pr_3214/ctests/build
    Start 1: test_C48_ATM_gfs_fcst_seg0_setup
1/4 Test #1: test_C48_ATM_gfs_fcst_seg0_setup ......   Passed    2.05 sec
    Start 2: test_C48_ATM_gfs_fcst_seg0_stage
2/4 Test #2: test_C48_ATM_gfs_fcst_seg0_stage ......   Passed    1.82 sec
    Start 3: test_C48_ATM_gfs_fcst_seg0_execute
3/4 Test #3: test_C48_ATM_gfs_fcst_seg0_execute ....***Failed    0.01 sec
+ TEST_NAME=C48_ATM_gfs_fcst_seg0
+ JOB=gfs_fcst_seg0
+ idate=2021032312
+ rocotoboot_dryrun=/work2/noaa/global/mterry/rocoto_dryrun/bin/rocotoboot
+ CASEDIR=/work2/noaa/global/wkolczyn/save/global-workflow/pr_3214/ctests/build/ctests/RUNTESTS/EXPDIR/C48_ATM_gfs_fcst_seg0
+ cd /work2/noaa/global/wkolczyn/save/global-workflow/pr_3214/ctests/build/ctests/RUNTESTS/EXPDIR/C48_ATM_gfs_fcst_seg0
/work2/noaa/global/wkolczyn/save/global-workflow/pr_3214/ctests/build/ctests/scripts/execute.sh: line 13: cd: /work2/noaa/global/wkolczyn/save/global-workflow/pr_3214/ctests/build/ctests/RUNTESTS/EXPDIR/C48_ATM_gfs_fcst_seg0: No such file or directory

    Start 4: test_C48_ATM_gfs_fcst_seg0_validate
4/4 Test #4: test_C48_ATM_gfs_fcst_seg0_validate ...   Passed    0.01 sec

75% tests passed, 1 tests failed out of 4

Label Time Summary:
C48_ATM          =   3.89 sec*proc (4 tests)
gfs_fcst_seg0    =   3.89 sec*proc (4 tests)

Total Test time (real) =   3.90 sec

The following tests FAILED:
          3 - test_C48_ATM_gfs_fcst_seg0_execute (Failed)
Errors while running CTest

# Set HPC_ACCOUNT
set_from_env_or_default(HPC_ACCOUNT HPC_ACCOUNT " ")
if (NOT DEFINED HPC_ACCOUNT)
  message(FATAL_ERROR "HPC_ACCOUNT must be set")

FATAL_ERROR will stop cmake processing (I think).
The return() after this line will not allow the cmake-ification of the project.
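
For reference, CMake's `message(FATAL_ERROR ...)` does stop processing and generation. Separately, if `set_from_env_or_default` always assigns the variable (an assumption, since its definition is not shown here), a default of `" "` would leave `HPC_ACCOUNT` defined and the `NOT DEFINED` check would never fire. The sketch below is a shell analogue of a stricter check, not the project's code: the hypothetical `require_setting` treats an unset, empty, or whitespace-only value as missing.

```shell
#!/bin/sh
# Hypothetical shell analogue of the check under review: treat an unset,
# empty, or whitespace-only value as missing.
require_setting() {
    # $1 = setting name (used in the message), $2 = value to check
    case "$2" in
        *[![:space:]]*) return 0 ;;   # at least one non-blank character
        *) echo "$1 must be set" >&2; return 1 ;;
    esac
}

if require_setting HPC_ACCOUNT "${HPC_ACCOUNT:-}"; then
    echo "using allocation: ${HPC_ACCOUNT}"
else
    echo "no allocation configured"
fi
```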

ctests/CMakeLists.txt (outdated)

WalterKolczynski-NOAA commented Jan 16, 2025

I'm still getting the same error when I try to run on Hercules:

03:46:38 <ctest_jjobs_framework> pr_3214/>cd ctests/
03:46:44 <ctest_jjobs_framework> ctests/>mkdir build
03:46:47 <ctest_jjobs_framework> ctests/>cd build
03:47:04 <ctest_jjobs_framework> build/>cmake ../.. -DICSDIR_ROOT=/work/noaa/global/glopara/data/ICSDIR -DHPC_ACCOUNT=nems -DSTAGED_TESTS_DIR=/work/noaa/stmp/GFS_CI_ROOT/ORION/STAGED_TESTS_DIR
-- The C compiler identification is GNU 11.3.1
-- The CXX compiler identification is GNU 11.3.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'Release' as none was specified.
-- gw: global-workflow baselines will be used from: '/work2/noaa/global/wkolczyn/save/global-workflow/pr_3214'
-- gw: global-workflow tests will be run at: '/work2/noaa/global/wkolczyn/noscrub/global-workflow'
-- gw: global-workflow tests will use the allocation: 'nems'
-- gw: global-workflow tests will use ICSDIR_ROOT: '/work/noaa/global/glopara/data/ICSDIR'
-- gw: global-workflow tests will use staged data from:  '/work/noaa/stmp/GFS_CI_ROOT/ORION/STAGED_TESTS_DIR'
-- Configuring done
-- Generating done
-- Build files have been written to: /work2/noaa/global/wkolczyn/save/global-workflow/pr_3214/ctests/build
03:47:10 <ctest_jjobs_framework> build/>ctest --output-on-failure
Test project /work2/noaa/global/wkolczyn/save/global-workflow/pr_3214/ctests/build
    Start 1: test_C48_ATM_gfs_fcst_seg0_setup
1/4 Test #1: test_C48_ATM_gfs_fcst_seg0_setup ......   Passed    4.88 sec
    Start 2: test_C48_ATM_gfs_fcst_seg0_stage
2/4 Test #2: test_C48_ATM_gfs_fcst_seg0_stage ......   Passed    2.03 sec
    Start 3: test_C48_ATM_gfs_fcst_seg0_execute
3/4 Test #3: test_C48_ATM_gfs_fcst_seg0_execute ....***Failed    0.02 sec
+ TEST_NAME=C48_ATM_gfs_fcst_seg0
+ JOB=gfs_fcst_seg0
+ idate=2021032312
+ rocotoboot_dryrun=/work2/noaa/global/mterry/rocoto_dryrun/bin/rocotoboot
+ CASEDIR=/work2/noaa/global/wkolczyn/save/global-workflow/pr_3214/ctests/build/ctests/RUNTESTS/EXPDIR/C48_ATM_gfs_fcst_seg0
+ cd /work2/noaa/global/wkolczyn/save/global-workflow/pr_3214/ctests/build/ctests/RUNTESTS/EXPDIR/C48_ATM_gfs_fcst_seg0
/work2/noaa/global/wkolczyn/save/global-workflow/pr_3214/ctests/build/ctests/scripts/execute.sh: line 13: cd: /work2/noaa/global/wkolczyn/save/global-workflow/pr_3214/ctests/build/ctests/RUNTESTS/EXPDIR/C48_ATM_gfs_fcst_seg0: No such file or directory

    Start 4: test_C48_ATM_gfs_fcst_seg0_validate
4/4 Test #4: test_C48_ATM_gfs_fcst_seg0_validate ...   Passed    0.02 sec

75% tests passed, 1 tests failed out of 4

Label Time Summary:
C48_ATM          =   6.96 sec*proc (4 tests)
gfs_fcst_seg0    =   6.96 sec*proc (4 tests)

Total Test time (real) =   6.98 sec

The following tests FAILED:
	  3 - test_C48_ATM_gfs_fcst_seg0_execute (Failed)
Errors while running CTest
ERROR 8

RUNTESTS/ is not being created at all
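
In the log above, the failure surfaces as a bare `cd` error at line 13 of execute.sh. As a minimal sketch, a guard like the hypothetical `enter_casedir` below would make the execute step fail with a message that names the missing directory and points back at the setup phase. It is not part of the PR.

```shell
#!/bin/sh
# Hypothetical guard for the `cd` in the execute step: fail with an explicit
# message naming the missing directory instead of a bare shell error.
enter_casedir() {
    # $1 = experiment directory the execute step expects to exist
    if [ ! -d "$1" ]; then
        echo "CASEDIR '$1' does not exist; did the setup phase create it?" >&2
        return 1
    fi
    cd "$1"
}

# Demonstration against a throwaway directory.
demo_dir="$(mktemp -d)"
( enter_casedir "${demo_dir}" && echo "entered existing directory" )
( enter_casedir "${demo_dir}/EXPDIR/C48_ATM_gfs_fcst_seg0" ) \
    || echo "execute step would stop here"
rm -rf "${demo_dir}"
```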

@aerorahul

@WalterKolczynski-NOAA
Tests will fail for you until we have a central staged directory.
It requires either:

  • A central CM-provided baseline staged experiment COM, or
  • A user-generated baseline from develop

This PR is not sufficient for users to simply run ctest. More work is needed on staging the appropriate input files before automation can kick in.
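
Until a central staged directory exists, a pre-check before invoking ctest could make this requirement explicit. The sketch below is hypothetical: the function name and the `<staged root>/<test name>` layout are assumptions for illustration and do not describe how `STAGED_TESTS_DIR` is actually organized.

```shell
#!/bin/sh
# Hypothetical pre-check: confirm that staged inputs exist for a test before
# invoking ctest. The "<staged root>/<test name>" layout is an assumption.
staged_inputs_available() {
    # $1 = staged tests root, $2 = test name such as C48_ATM_gfs_fcst_seg0
    [ -n "$1" ] && [ -d "$1/$2" ]
}

# Demonstration against a throwaway directory tree.
demo_root="$(mktemp -d)"
mkdir -p "${demo_root}/C48_ATM_gfs_fcst_seg0"
for name in C48_ATM_gfs_fcst_seg0 EXAMPLE_CASE_example_job; do
    if staged_inputs_available "${demo_root}" "${name}"; then
        echo "${name}: staged inputs found"
    else
        echo "${name}: no staged inputs, tests would fail"
    fi
done
rm -rf "${demo_root}"
```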

@aerorahul aerorahul left a comment

looks good.

@WalterKolczynski-NOAA

@WalterKolczynski-NOAA Tests will fail for you until we have a central staged directory. It requires either:

  • A central CM-provided baseline staged experiment COM, or
  • A user-generated baseline from develop

This PR is not sufficient for users to simply run ctest. More work is needed on staging the appropriate input files before automation can kick in.

Okay, I ran on the same machine so I was expecting it to stage from Terry's space.

@WalterKolczynski-NOAA WalterKolczynski-NOAA merged commit 01eeb24 into NOAA-EMC:develop Jan 17, 2025
5 checks passed
tsga added a commit to tsga/global-workflow that referenced this pull request Jan 22, 2025
* develop:
  Add echgres as a dependency only for RUN=enkfgdas, not enkfgfs (NOAA-EMC#3246)
  Add domain level to wave gridded COM path (NOAA-EMC#3137)
  CI JJOB Tests using CMake (NOAA-EMC#3214)
  Make assorted updates to waves (NOAA-EMC#3190)
  Move WCOSS2 LD_LIBRARY_PATH patches to load_ufsda_modules.sh (NOAA-EMC#3236)
  Adding a gefs_arch task to GEFS workflow (NOAA-EMC#3211)
  Add additional GEFS variables needed for AI/ML applications  (NOAA-EMC#3221)
  Add bmat task dependency to marine LETKF task (NOAA-EMC#3224)
  Resolve bug with LMOD_TMOD_FIND_FIRST setting affecting build on WCOSS2 (NOAA-EMC#3229)
  Reinstate product groups (NOAA-EMC#3208)
  Additional fixes for downstream jobs (NOAA-EMC#3187)
  Turn IAU off during staging job for cold start experiments (NOAA-EMC#3215)
  Update the gdas.cd hash and enable GDASApp to run on WCOSS2 (NOAA-EMC#3220)
  Update upload-artifact to v4 (NOAA-EMC#3216)
  Prevent duplicate case generation in generate_workflows.sh (NOAA-EMC#3217)
  Update g-w to cycle with C1152 ATM (NOAA-EMC#3206)
  Separate use of initial increment/perturbation file from REPLAY/+03 ICs  (NOAA-EMC#3119)
  Update gsi_enkf hash and gsi_ver (NOAA-EMC#3207)
  Remove cpus-per-task from APRUN_OCNANALECEN on WCOSS2 (NOAA-EMC#3212)
  Remove 5WAVH from AWIPS GRIB2 parm files (NOAA-EMC#3146)
  Remove multi-grid wave support (NOAA-EMC#3188)
  Add echgres as a dependency for earc (NOAA-EMC#3202)
Linked issue: Adding Functional Tests Framework for JJOBS using CMake ctests