Add SIGTERM signal handler, for slurm preemption. #1059

Open
wants to merge 1 commit into base: dev
4 changes: 4 additions & 0 deletions Makefile
@@ -80,6 +80,10 @@ COMPFLAGS += -DACC_SEMILAG_PQM -DTRANS_SEMILAG_PPM
#May cause problems
#COMPFLAGS += -DCATCH_FPE

#Add -DCATCH_SIGTERM to make the simulation bail out gracefully on SIGTERM (for
# example on slurm preemption)
COMPFLAGS += -DCATCH_SIGTERM

#Define MESH=VAMR if you want to use adaptive mesh refinement in velocity space
#MESH = VAMR

18 changes: 17 additions & 1 deletion vlasiator.cpp
@@ -55,9 +55,10 @@
#include "fieldsolver/gridGlue.hpp"
#include "fieldsolver/derivatives.hpp"

#include <signal.h>

#ifdef CATCH_FPE
#include <fenv.h>
#include <signal.h>
/*! Function used to abort the program upon detecting a floating point exception. Which exceptions are caught is defined using the function feenableexcept.
*/
void fpehandler(int sig_num)
@@ -81,6 +82,17 @@ bool globalflags::balanceLoad = 0;
bool globalflags::doRefine=0;
bool globalflags::ionosphereJustSolved = false;

#ifdef CATCH_SIGTERM
// The normal behaviour on SIGTERM is to simply abort the simulation in place.
// This implementation instead attempts to write a restart file and then quit,
// to work nicely with slurm's job preemption mechanism.
void termhandler(int sig_num) {
   logFile << "Caught SIGTERM. Initiating bailout." << endl << flush;
   globalflags::bailingOut = 1;
   globalflags::writeRestart = 1;
}
#endif

ObjectWrapper objectWrapper;

void addTimedBarrier(string name){
@@ -311,6 +323,10 @@ int simulate(int argn,char* args[]) {
   signal(SIGFPE, fpehandler);
#endif

#ifdef CATCH_SIGTERM
   signal(SIGTERM, termhandler);
#endif

   // Initialize memory allocator configuration.
   memory_configurator();
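
A note on the handler above: `termhandler` writes to `logFile` from inside the signal handler, and C++ stream I/O is not among the functions POSIX lists as async-signal-safe. A more conservative pattern is to have the handler only set a `volatile sig_atomic_t` flag and to do the logging and flag handling from the main loop. The sketch below shows that pattern with `sigaction` in place of `signal`. It is a rough illustration, not the PR's code: `sketch::termRequested`, `sketchTermHandler`, `installSketchTermHandler` and `sketchTermRequested` are hypothetical names that do not exist in Vlasiator, and how the flag would be wired into `globalflags::bailingOut` and `globalflags::writeRestart` is left open.

```cpp
#include <signal.h>   // sigaction, sigemptyset, SIGTERM, sig_atomic_t (POSIX)

// Hypothetical stand-in for the bailout request; not part of Vlasiator.
namespace sketch {
   // volatile sig_atomic_t is the type the standard guarantees can be
   // written safely from a signal handler.
   volatile sig_atomic_t termRequested = 0;
}

// The handler only records that SIGTERM arrived. No logging, no
// allocation, no stream I/O happens here.
extern "C" void sketchTermHandler(int /*sig_num*/) {
   sketch::termRequested = 1;
}

// Register the handler with sigaction. Returns true on success.
bool installSketchTermHandler() {
   struct sigaction sa = {};
   sa.sa_handler = sketchTermHandler;
   sigemptyset(&sa.sa_mask);   // block no additional signals in the handler
   sa.sa_flags = SA_RESTART;   // restart interrupted system calls
   return sigaction(SIGTERM, &sa, nullptr) == 0;
}

// Polled from ordinary (non-handler) code, for example once per time step.
bool sketchTermRequested() {
   return sketch::termRequested != 0;
}
```

In the simulation loop, the polling side could then log the message and set the existing flags outside the handler, along the lines of `if (sketchTermRequested()) { logFile << "Caught SIGTERM. Initiating bailout." << endl; globalflags::bailingOut = 1; globalflags::writeRestart = 1; }`. In an MPI run the ranks may not all observe the signal in the same time step, so the flag would probably need to be agreed on across ranks before acting on it; whether the existing bailout path already handles that is not something this diff shows.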
