Using files from other jobs #94
Thanks for raising this issue specifically for jobflow-remote. I do believe that file handling is a big limitation in jobflow and all related packages, and I agree that we need to find a solution. The main issue with the solution proposed is that, among the currently available managers for jobflow, jobflow-remote would be the only one able to handle it. The concept of the implementation is nice. I don't think I have much to add about it, but I will keep thinking. A set of notes in random order:
I will reply specifically to the last part about changing the Response to pass files. We have been advocating for the implementation of a specific Store for files, and I believe such a Store would be crucial to deal with large outputs (see e.g. materialsproject/atomate2#515 (comment)). Even though they can probably be used, Maggma stores are not really suitable for this purpose. I am not sure whether such a store would fit better in maggma or directly in jobflow. I need to open an issue on maggma to discuss this.
It is not uncommon that workflows (in atomate2 or otherwise) need to make use of files created by other jobs, without the data from those files being explicitly stored/serialized. In cases where the file does not map to a pymatgen object (e.g., an arbitrary force field model that has been trained), this is especially awkward. I would like to handle this in a framework-agnostic way by allowing the user to indicate that a given job depends on the context of another job. At the moment the workflow creator would have to explicitly write a file to an external directory and then access it via an absolute path in the future workflow, but obviously in the case of jobflow-remote those directories may not be on the same machine (and it is good hygiene that a jobflow-remote job only writes to that job's work_dir).
I propose a system where a job can be started directly from a copy of the directory of another job. For example:
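A minimal sketch of what this could look like, using a hypothetical `copy_files_from` marker attached via job metadata purely for illustration (no such option exists in jobflow or jobflow-remote today):

```python
from jobflow import Flow, job


@job
def parent():
    # Trains a model and writes e.g. model.pth into its own working directory.
    ...


@job
def child(prev):
    # Would run in a directory pre-populated with parent's files, so it could
    # reload model.pth, refit it, and save it again.
    ...


parent_job = parent()
child_job = child(parent_job.output)  # the output reference only establishes ordering here

# Hypothetical marker: in a real implementation this would more likely be an option
# on the @job decorator or the job config, as discussed below.
child_job.update_metadata({"copy_files_from": parent_job.uuid})

flow = Flow([parent_job, child_job])
```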
This would trigger jobflow-remote to copy all of the contents of the workdir of parent into the new workdir of child before submitting the job. When they are executed on the same worker, this transfer is done directly on the remote; otherwise the copy will need two stages, from remote to runner and then from runner to remote. Of course this also creates some duplication of data, but it also allows child to potentially (in this case) refit the model and save it again, and I think this flexibility might be useful in many cases.
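As a rough illustration of that transfer strategy, here is a sketch with hypothetical helper names (jobflow-remote does not expose such an API; the actual logic would live in the runner):

```python
def stage_parent_files(parent_job, child_job, runner):
    """Sketch of the copy strategy described above (all helper methods are hypothetical)."""
    if parent_job.worker == child_job.worker:
        # Same worker: copy directly on the remote machine, nothing passes through the runner.
        runner.remote_copy(parent_job.workdir, child_job.workdir, worker=parent_job.worker)
    else:
        # Different workers: stage 1 pulls the parent's workdir to the runner machine,
        # stage 2 pushes it to the child's worker (duplicating the data along the way).
        local_tmp = runner.download(parent_job.workdir, worker=parent_job.worker)
        runner.upload(local_tmp, child_job.workdir, worker=child_job.worker)
```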
This could then be specialized in the future to explicitly enumerate the files that are required by the child job, e.g.:
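Again as a sketch only, assuming a hypothetical `required_files` option on an extended `@job` decorator (this keyword does not exist in jobflow):

```python
from jobflow import job


@job
def parent():
    # Trains a model and writes model.pth into its working directory.
    ...


# Hypothetical extension of the @job decorator: the child enumerates the files it
# needs, so only those are copied rather than the whole parent directory.
@job(required_files=["model.pth"])
def child(prev):
    # Loads model.pth from its pre-populated working directory.
    ...
```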
with additional checks that `model.pth` was successfully created by `parent` (and not deleted on cleanup) during its execution. I think this would need to be implemented as an extension of the `@job` decorator, which hopefully we could then add upstream to jobflow. This could then be used directly for atomate2 workflows where a shared filesystem is required. There could be many cases that benefit from this approach, for example natively restarting DFT codes from checkpoints (I actually don't know how this is handled at the moment).

An alternative approach would be to extend the jobflow `Response` object to have a special section for files, with an associated store, but I am not sure maggma is flexible enough for this (and not sure if it would be in scope for them, so this would require significant new implementation in jobflow itself); see the sketch below.

Would be interested to hear people's thoughts! I will play around with an implementation of this and can try to make a simple demo.
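For that alternative `Response`-based route, a minimal sketch, assuming a hypothetical `files` field on `Response` tied to a dedicated file store (neither exists in jobflow or maggma today):

```python
from jobflow import Response, job


@job
def train():
    # ... training happens here, writing model.pth into the job's working directory
    return Response(
        output={"loss": 0.01},
        # Hypothetical field: maps a file key to a path that should be pushed to a file store.
        files={"model.pth": "model.pth"},
    )
```

A downstream job could then request `model.pth` from the file store instead of copying the whole parent directory.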