-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reservoir coupling: Spawn slave processes from master process #5453
base: master
Are you sure you want to change the base?
Conversation
a231330
to
1167c9d
Compare
c7de9b0
to
6b84a2d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't even know that we can spawn addtional processes. Cool. so we could run flow in parallel without mpirun if the data file tells us to.
There are a few things that I am wonderinfg about:
- What happens if I start Master/Slave with
mpriun -np N flow
and N>1 - Can I compile this without MPI
- On a cluster with a queuing system (e.g. the NORCE cluster?), will the admin allow spawning at all or is there a limit for the number of processes?. Usually you have allocated some nodes and they have some hardware limits. So there might even be a natural limit (by RAM) for the number of processes. Or things might slow down if too many of those land on one node.
Having said that, I think this is a beautiful approach and in any case we will use communicators and intercommunicators like here in the end. Let's see whether we spawn or use some other means in the end. I tink that can be changed easily later.
// If first command line argument is "--slave-log-file=<filename>", | ||
// then redirect stdout and stderr to the specified file. | ||
if (this->argc_ >= 2) { | ||
std::string_view arg = this->argv_[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be handled later, but having an assumption that a parameter is at a special position seems to be a really strong/unrealistic one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assumption that if the first argument is --slave-log-file
we can assume that this is a reservoir coupling slave, seems to be quite safe to me since a reservoir coupling slave can only be started by a master process. If this is not a slave process the user may still put --slave-log-file
as the first argument by accident but that is not something that should happen very often.
const auto& rescoup = this->schedule_[0].rescoup(); | ||
char *flow_program_name = argv[0]; | ||
for (const auto& [slave_name, slave] : rescoup.slaves()) { | ||
auto master_slave_comm = MPI_Comm_Ptr(new MPI_Comm(MPI_COMM_NULL)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to able to compile flow without MPI, too. Hence I think we will need some #if HAVE_MPI
somewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I have added #if HAVE_MPI
preprocessor directives in SimulatorFullyImplicitBlackoil.hpp
that should prevent inclusion of the reservoir coupling code if we compile with -DUSE_MPI=NO
} | ||
} | ||
slave_argv[argc] = const_cast<char *>("--slave=true"); | ||
slave_argv[argc+1] = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious: What is this for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The argument --slave=true
is passed to flow such that it knows that it is a slave. It is used for example in SimulatorFullyImplicit.hpp
like this auto slave_mode = Parameters::Get<Parameters::Slave>();
{ | ||
"SLAVES", | ||
{ | ||
{1,{true, [](const std::string& val){ return val.size()<=8;}, "SLAVES(SLAVE_RESERVOIR): Only names of slave reservoirs of up to 8 characters are supported."}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should check with the engineers whether they really only use 8 or reky on truncation and use the rest for further documentation. I think the commercial simulator just truncates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This discussion has been moved to #5436 (comment)
{ | ||
OPM_THROW(std::logic_error, | ||
fmt::format("Inconsistent SCHEDULE section: {}", message)); | ||
} | ||
|
||
void checkScheduleKeywordConsistency(const Opm::Schedule& schedule) | ||
{ | ||
const auto& final_state = schedule.back(); | ||
const auto& rescoup = final_state.rescoup(); | ||
if (rescoup.slaveCount() > 0 && rescoup.masterGroupCount() == 0) { | ||
inconsistentScheduleError("SLAVES keyword without GRUPMAST keyword"); | ||
} | ||
if (rescoup.slaveCount() == 0 && rescoup.masterGroupCount() > 0) { | ||
inconsistentScheduleError("GRUPMAST keyword without SLAVES keyword"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we rather do this check during parsing in opm-common? Then we might even be able to provide file locations.
I have to admit that I do not understand 100% what we are checking. What happens if we have SLAVES and GRUPMAST but at differen dates/steps?
@blattms I think that should work as usual. The master spawns N sub process, but only one of those will spawn the slaves. |
@blattms Good point. I do not have access to the NORCE cluster yet, I will try to test this next week. Currently, I have only tested on my own computer. |
6b84a2d
to
bf103dc
Compare
Rebased |
Do not specify slave program name twice when launching slave process
Open MPI does not support output redirection for spawned child processes.
Copy command line parameters from master to slave command line. Also replace data file name in master argv with data file name of the slave.
Remove debug code that was introduced by mistake in the previous commit
Create one log file for each slave subprocess. Redirect both stdout and stderr to this file
Exclude the reservoir coupling stuff if MPI is not enabled
bf103dc
to
7eb07e4
Compare
@blattms Unfortunately, the NORCE compute machines did not have a job scheduler that I could test this out with. Though, the documentation for e.g. SLURM seems to indicate that it should be possible to allocate resources for |
Builds on #5443. Depends on upstream OPM/opm-common#4123.
This is work in progress, so I am putting this in draft mode for now.