Occasionally test-tube will try to create an experiment version that already exists. We need to add a small delay to avoid the race condition.
williamFalcon changed the title from "Experiment version runtime error when using slurm" to "Experiment version race condition error when using slurm" on Nov 30, 2018
I ran into this same problem. The workaround I found is to set the Experiment.version attribute to the value of the --hpc_exp_number argument that gets passed to the script when it's called from SlurmCluster.optimize_parallel_cluster_gpu(). Since the next_trial_version is read from a single process before the sbatch scripts are enqueued to run in parallel, it won't hit the race condition.
There's probably a better way that handles this automatically, but in the meantime this is the solution I found. I'll open a PR if I find a better way to do it. What do you think @williamFalcon?
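For anyone who wants to try the workaround described above, here is a minimal sketch of it. It only assumes that the training script receives `--hpc_exp_number` as an integer on its command line; the helper name `resolve_experiment_version` and the `name`/`save_dir` values in the comments are made up for illustration, and passing `version` to the `Experiment` constructor is an assumption that should be checked against the installed test-tube release.

```python
import argparse


def resolve_experiment_version(argv=None):
    """Return the version pre-assigned by the launcher, or None if absent.

    Hypothetical helper: it reads the --hpc_exp_number argument mentioned
    in the comment above and ignores every other command-line argument.
    """
    parser = argparse.ArgumentParser(add_help=False)
    # Assumption: the launcher passes the experiment number as an integer.
    parser.add_argument("--hpc_exp_number", type=int, default=None)
    args, _unknown = parser.parse_known_args(argv)
    return args.hpc_exp_number


# Sketch of use inside the training script (not executed here):
#
#   from test_tube import Experiment
#   version = resolve_experiment_version()
#   # Assumption: Experiment accepts a `version` keyword. With a fixed
#   # version, each parallel job writes to its own pre-assigned slot
#   # instead of racing to pick the next free one.
#   exp = Experiment(name="my_exp", save_dir="logs", version=version)
```

When the script is run outside the cluster and the argument is missing, the helper returns `None`, so the version is left for the library to choose as before.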