Draft: #48: vt and magistrate support #49
nmm0 wants to merge 17 commits into main from 48-vt-and-magistrate-support
b3c9958 Magistrate support, switch to Registrations (Matthew-Whitlock)
db115e3 Updates in magistrate, support manual object registration, aggregate … (Matthew-Whitlock)
68cffa2 modify cmake config to make it easier for external libraries to find (nmm0)
4073526 #48: resolve some rebase issues (nmm0)
eaabc86 #48: fixup after rebase on latest main (nmm0)
55701c8 48: move files before rebase (nmm0)
fc4e065 WIP VTContext (Matthew-Whitlock)
7731779 Update stdfile backend, finished V1 VTContext (Matthew-Whitlock)
f21d12f #48: fixup after vt branch rebase (nmm0)
5e719fb #48: fix bug where view hooks test wasn't resetting (nmm0)
55a7fcb #48: clear registered regions when resetting a context (nmm0)
a51d0d4 #48: ensure persistent backend state is cleared on context destruction (nmm0)
b7982c4 #48: disable stdfile backend until we update it (nmm0)
f52a82a Add async checkpointing, plus Jacobi example (Matthew-Whitlock)
f888c76 Add non-member deregister (Matthew-Whitlock)
56589f7 Small magistrate updates (Matthew-Whitlock)
44645a0 Cleaned up Jacobi example (Matthew-Whitlock)
@@ -0,0 +1,122 @@
#include "config.hpp"
#include <vt/transport.h>

namespace Jacobi {
  Config::Config(int argc, char** argv){
    for(int i = 0; i < argc; i++){
      std::string arg = argv[i];
      if( arg == "--decomp"){
        int x = std::stoi(argv[++i]);
        int y = std::stoi(argv[++i]);
        int z = std::stoi(argv[++i]);
        colRange = vt::Index3D(x,y,z);
      } else if(arg == "--input"){
        int x = std::stoi(argv[++i]);
        int y = std::stoi(argv[++i]);
        int z = std::stoi(argv[++i]);
        dataRange = vt::Index3D(x,y,z);
      } else if(arg == "--max-iters") {
        maxIter = std::stoi(argv[++i]);
      } else if(arg == "--tolerance") {
        tolerance = std::stod(argv[++i]);
      } else if(arg == "--async-serialize") {
        asyncCheckpoint = true;
      }
    }

    /* --- Print information about the simulation */
    if(vt::theContext()->getNode() == 0){
      fmt::print(
        stdout, "\n - Solve the linear system for the Laplacian with homogeneous Dirichlet"
        " on [0, 1] x [0, 1] x [0, 1]\n"
      );
      fmt::print(" - Second-order centered finite difference\n");
      fmt::print(" - {} elements decomposed onto {} objects.\n", dataRange.toString(), colRange.toString());
      fmt::print(" - Maximum number of iterations {}\n", maxIter);
      fmt::print(" - Convergence tolerance {}\n", tolerance);
      fmt::print("\n");
    }
  }
}

ResilienceConfig::ResilienceConfig(int argc, char** argv, Jacobi::Config app_cfg){
  for(int i = 0; i < argc; i++){
    std::string arg = argv[i];
    if(arg == "--config")
      config_filename = argv[++i];
    else if(arg == "--mode")
      context_mode = argv[++i];
    else if(arg == "--freq")
      checkpoint_frequency = std::stoi(argv[++i]);
    else if(arg == "--kill")
      kill_iter = std::stoi(argv[++i]);
    else if(arg == "--kill-rank")
      kill_rank = std::stoi(argv[++i]);
    else if(arg == "--iters-per-phase")
      iters_per_phase = std::stoi(argv[++i]);
    else if(arg == "--iters-per-epoch")
      iters_per_epoch = std::stoi(argv[++i]);
  }


  if(context_mode == "VT") {
    if(iters_per_epoch == 0) iters_per_epoch = -1;
    context = kr::make_context(vt::theContext(), config_filename);
  } else if(context_mode == "MPI"){
    if(iters_per_epoch == 0){
      iters_per_epoch = checkpoint_frequency;
      //Can't infer both iters_per_epoch and checkpoint_frequency
      assert(checkpoint_frequency != 0);
    }
    context = kr::make_context(MPI_COMM_WORLD, config_filename);
  } else throw std::invalid_argument("Valid --mode values are VT or MPI");

  std::string freq_str;
  if(checkpoint_frequency < 0) {
    freq_str = "never";
    checkpoint_filter = [](int iter){ return false; };
  } else if(checkpoint_frequency == 0){
    freq_str = "according to json";
    checkpoint_filter = context->default_filter();
  } else {
    freq_str = fmt::format("every {} iterations", checkpoint_frequency);
    checkpoint_filter = kr::Filter::NthIterationFilter(checkpoint_frequency);
  }


  if(iters_per_phase < 1) iters_per_phase = app_cfg.maxIter+1;
  if(iters_per_epoch < 1) iters_per_epoch = app_cfg.maxIter+1;


  if(vt::theContext()->getNode() == 0) {
    fmt::print("kr:: {} Context configured against {}\n", context_mode, config_filename);
    fmt::print("kr:: Checkpointing {}\n", freq_str);
    if(kill_iter > 0 && kill_rank > 0){
      fmt::print("Generating failure at iteration {} on rank {}\n", kill_iter, kill_rank);
      if(kill_rank >= vt::theContext()->getNumNodes()){
        fmt::print("WARNING: kill_rank {} does not exist!\n", kill_rank);
      }
    }

    if(iters_per_epoch == -1){
      fmt::print("kr:: instructing app not to bound iterations\n");
    } else {
      fmt::print("kr:: instructing app to bound every {} iterations\n", iters_per_epoch);
    }

    if(iters_per_phase == -1){
      fmt::print("kr:: instructing app not to use phases\n");
    } else {
      fmt::print("kr:: instructing app to phase every {} iterations\n", iters_per_phase);
    }
  }
}

void ResilienceConfig::try_kill(int current_iteration){
  if(kill_iter == current_iteration &&
     kill_rank == vt::theContext()->getNode()){
    fmt::print(stderr, "Rank {} simulating failure on iteration {}\n",
               kill_rank, kill_iter);
    exit(1);
  }
};
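A note on the argument loops in this file: every value-taking flag reads `argv[++i]` without checking that another argument exists, so a trailing `--freq` or a short `--decomp 2 2` would read past the end of `argv`. One possible guard is sketched below. The helper `next_value`, the struct `ParsedFlags`, and its default values are hypothetical names introduced for illustration only; they are not part of this PR, and the real defaults live in a header that is not shown here.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not in the PR): returns the argument that follows the
// flag at argv[i] and advances i, or throws if the flag is the last argument.
inline std::string next_value(int argc, char** argv, int& i) {
  if (i + 1 >= argc) {
    throw std::invalid_argument(std::string("missing value after ") + argv[i]);
  }
  return argv[++i];
}

// Hypothetical stand-in for two of the options parsed by ResilienceConfig.
// The default values here are assumptions.
struct ParsedFlags {
  int checkpoint_frequency = 0;
  int kill_iter = -1;
};

inline ParsedFlags parse_flags(int argc, char** argv) {
  ParsedFlags out;
  // Start at 1 so the program name is never compared against a flag.
  for (int i = 1; i < argc; i++) {
    std::string arg = argv[i];
    if (arg == "--freq") {
      out.checkpoint_frequency = std::stoi(next_value(argc, argv, i));
    } else if (arg == "--kill") {
      out.kill_iter = std::stoi(next_value(argc, argv, i));
    }
  }
  return out;
}
```

With this shape a missing value surfaces as a `std::invalid_argument`, the same exception type the constructor already throws for a bad `--mode`, instead of undefined behavior.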
ah, just noticed this is wrong; see KR_ENABLE_EXEC_SPACES below. I think I messed this up in the rebase somewhere
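On the config.cpp diff itself: the `--freq` handling in the `ResilienceConfig` constructor encodes three cases in one integer (negative means never checkpoint, zero means defer to the filter from the JSON config, positive means checkpoint every N iterations). The sketch below restates that convention in isolation so it can be exercised without VT or the resilience library. `IterFilter` and `make_checkpoint_filter` are hypothetical names, `json_default` stands in for `context->default_filter()`, and the `iter % n == 0` rule is an assumption about what `kr::Filter::NthIterationFilter` does, not something the diff confirms.

```cpp
#include <functional>

// Hypothetical stand-in for the library's filter type.
using IterFilter = std::function<bool(int)>;

// Maps the --freq convention from the diff onto a filter:
//   freq <  0 : never checkpoint
//   freq == 0 : use the supplied default (the JSON-configured filter in the PR)
//   freq >  0 : checkpoint every freq iterations
// ASSUMPTION: "every Nth" is modelled as iter % n == 0; the real
// kr::Filter::NthIterationFilter may count differently.
inline IterFilter make_checkpoint_filter(int checkpoint_frequency,
                                         IterFilter json_default) {
  if (checkpoint_frequency < 0) {
    return [](int) { return false; };
  }
  if (checkpoint_frequency == 0) {
    return json_default;
  }
  return [n = checkpoint_frequency](int iter) { return iter % n == 0; };
}
```

Pulling the mapping into a free function like this would also make the three cases straightforward to unit test, although whether that is worth it for an example program is a judgment call.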