Releases: DataBiosphere/toil
7.0.0
What's Changed
- Respect job local-ness when chaining by @adamnovak in #4809
- Fix Python 3.8 support by @adamnovak in #4823
- Fix missing description on PyPI by @mr-c in #4820
- Install build module for CI by @stxue1 in #4826
- Use a sentinel location instead of an unmodified location to mark missing files by @adamnovak in #4818
- Bump mypy from 1.8.0 to 1.9.0 by @dependabot in #4830
- Make sure output directory exists before using it by @adamnovak in #4832
- Pass through debugged job status code to prevent infinite loop by @stxue1 in #4829
- Add tests for environment pickling by @adamnovak in #4837
- Add colored logging by @stxue1 in #4828
- Remove unused CI test by @stxue1 in #4843
- Measure CPU and memory usage in WDL Docker containers by @adamnovak in #4819
- Allow debugging jobs by name (and status improvements) by @adamnovak in #4840
- Improve exception handling to not output tracebacks by @stxue1 in #4839
- Update pytest-cov requirement from <5,>=2.12.1 to >=2.12.1,<6 by @dependabot in #4851
- Update docutils requirement from <0.21,>=0.16 to >=0.16,<0.22 by @dependabot in #4866
- Update galaxy-util requirement from <23 to <25 by @dependabot in #4862
- Update galaxy-tool-util requirement from <23 to <25 by @dependabot in #4861
- Bump cwltool from 3.1.20240112164112 to 3.1.20240404144621 by @dependabot in #4870
- Bump gunicorn from 21.2.0 to 22.0.0 by @dependabot in #4871
- Retry Slurm interactions more by @adamnovak in #4869
- Replace use of boto with boto3 for awsProvisioner.py by @stxue1 in #4859
- Allow fetching job inputs for debugging by @adamnovak in #4848
- Make leader wait for expected updates to be visible in the job store, or fail the job by @adamnovak in #4811
- Enable FUSE for privileged Toil clusters by @stxue1 in #4824
- Detect if the GridEngine worker thread has crashed to prevent hanging the workflow by @stxue1 in #4873
- Bump mypy from 1.9.0 to 1.10.0 by @dependabot in #4878
- Support caching on SLURM by @stxue1 in #4884
- Add debug logging for single machine batchsystem to signal worker issue and startup by @stxue1 in #4881
- Update WDL conformance tests on CI by @stxue1 in #4876
- Replace all usage of boto2 with boto3 by @stxue1 in #4868
- Revert ensurepip to get-pip to fix Python 3.10 ARM CI appliance builds by @stxue1 in #4900
- docs cleanup by @mr-c in #4889
- Bump to a new major version by @adamnovak in #4885
- Warn user about wait times for stats gathering with a large quantity of jobs. by @DailyDreaming in #4893
- Allow symlinks to inputs as WDL outputs by @adamnovak in #4883
- bye pytz by @mr-c in #4890
- Stop suggesting infinity when validating half-open intervals by @adamnovak in #4887
- Fix WDL option spelling and tolerate Cromwell-isms by @adamnovak in #4906
- Remove wrapped CWL doc example. by @DailyDreaming in #4892
- Add retries to DockerCheckTest.testBadGoogleRepo by @stxue1 in #4909
- Fix 3.8 backport.timezone import by @stxue1 in #4908
- Update to Python 3.12 by @stxue1 in #4901
- Bump flask-cors from 4.0.0 to 4.0.1 by @dependabot in #4916
- Try /tmp before the workdir for the Toil coordination directory by @stxue1 in #4914
- CWL biocontainer tests: use version corresponding to v2 Docker Image Format by @mr-c in #4912
- Revert "Update to Python 3.12" by @DailyDreaming in #4917
- Bump miniwdl from 1.11.1 to 1.12.0 by @dependabot in #4920
- Support Python 3.12 by @stxue1 in #4919
- Add documentation for installing batch system plugins by @stxue1 in #4926
- Update Werkzeug to appease the Github security police by @adamnovak in #4925
- Revert "Update Werkzeug to appease the Github security police" by @DailyDreaming in #4928
- Bump cwltool from 3.1.20240404144621 to 3.1.20240508115724 by @dependabot in #4936
- Add batchsystem plugin test by @stxue1 in #4933
- Fix bad test paths. by @DailyDreaming in #4938
- Add better logic in finding a temp directory for the Toil coordination directory by @stxue1 in #4918
- Add supported workflow language versions to README by @adamnovak in #4923
6.1.0
Highlighted Features Added
- WDL and CWL task standard output and standard error logs that are not captured by the workflow will now be logged at INFO level and stored in the --writeLogs/--writeLogsGzip directory. (#4657)
- Use a default log limit of 100MiB (#4788)
Breaking Changes
- Stats and logging system again uses job display name (#4755)
- --disableProgress is once again a flag that doesn't take an argument (#4758)
CWL
- Don't clear out user-provided values for the --default-container option (#4730)
WDL
- WDL job names now include numbers for scatters (#4755)
- Multi-line WDL placeholder substitutions no longer interfere with de-indenting WDL command blocks (chanzuckerberg/miniwdl#665)
- Standard error for failed tasks is now always logged to the worker log somewhere (#4781)
Kubernetes
Dependencies
- Deps: removed the ruamel.yaml.string plugin dependency for a simpler solution (#4760)
Misc
- Toil will no longer warn about a missing XDG_RUNTIME_DIR (#4769)
- Read the Docs and CI docs builds should have Graphviz installed (pending CI image rebuild) (#4734)
- Add more Python3.12 compatibility by replacing the one function from distutils that we use, strtobool(). (#4765)
- Set default cache folders to be accessible between toil-wdl-runner workflows (Same as MiniWDL/Singularity defaults) (#4761)
- Set toil-wdl-runner cache folders on Toil managed clusters to be at /var/lib/toil (#4761)
- Fall back to assuming machine has 1 core when CPU count is unavailable. (#4545)
- FileJobStore now supports filenames that get modified when percent-encoded (#4779)
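
As an illustration of the strtobool() replacement mentioned above, here is a minimal sketch of a standalone equivalent. It is not Toil's actual code: the function body is an assumption modeled on the accepted values of distutils.util.strtobool, and unlike the distutils original it returns a bool rather than 1 or 0 and strips surrounding whitespace.

```python
def strtobool(value: str) -> bool:
    """Convert a truthy or falsy string to a bool.

    Hypothetical stand-in for distutils.util.strtobool, which is gone
    in Python 3.12. Accepts the same words, case-insensitively.
    """
    normalized = value.strip().lower()
    if normalized in {"y", "yes", "t", "true", "on", "1"}:
        return True
    if normalized in {"n", "no", "f", "false", "off", "0"}:
        return False
    # Anything else is rejected, as in the distutils version.
    raise ValueError(f"invalid truth value {value!r}")
```

A call such as strtobool("Yes") would give True, and an unrecognized word raises ValueError instead of silently defaulting.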
Thank you to our contributors:
@DailyDreaming @mr-c @stxue1 @adamnovak @app/dependabot
Full Changelog: releases/6.0.0...releases/6.1.0
6.0.0
NOTE!
We now have a config file! https://toil.readthedocs.io/en/latest/running/cliOptions.html#the-config-file
Breaking Changes
- Removed the parasol batch system
- Removed the TES batch system (this is now a plugin)
- Removed our WDL compiler in favor of an interpreter (we still support WDL, we just do it differently now)
- We no longer support python3.7
CWL
- Support CWL 1.2.1 (#4682)
- CWL Pipefish compatibility (#4636)
- Support per-task preemptibility in CWL (#4551)
- Fix configargparse in CWL (#4618)
- cwl: use the latest commit from the proposed CWL v1.2.1 branch (#4565)
- Upgrade cwltool to avoid broken galaxy-tool-util release. (#4639)
- Implement a better config file system for CWL/WDL options (#4666)
- Allow working with remote files in CWL and WDL workflows (#4690)
- Make cwl mutually exclusive groups exist only when cwl is not suppressed (#4725)
- Log more usefully for CWL workflows (#4736)
WDL
- Simplify WDL Toil job graphs (#4524)
- More WDL and Slurm documentation (#4558)
- Improve WDL documentation (#4732)
- Add String to File functionality into toil-wdl-runner (#4589)
- Run WDL output through Toil export system to support URIs (#4579)
- Allow the WDL output section to reference itself (#4592)
- Ensure sibling files in toil-wdl-runner (#4610)
- Make WDLOutputJob collect all task outputs (#4602)
- Report errors in WDL using MiniWDL's error location printer (#4637)
- Remove the WDL compiler. (#4679)
- Implement a better config file system for CWL/WDL options (#4666)
- Allow working with remote files in CWL and WDL workflows (#4690)
- Strip leading whitespace from WDL commands (#4720)
Misc
- Add config file support (#4569)
- Support Python3.11 and drop Python 3.7 (#4646)
- Move TES batch system to a plugin (#4650)
- Turn batch system tests back on (#4649)
- Separate out integration tests to run on a schedule (#4612)
- Avoid concurrent modification in cluster scaler tests (#4600)
- Remove old buckets from AWS (#4588)
- Tests: only request a single core (#4572)
- Reduce the number of assert statements (#4590)
- take any nvidia-smi exception as not having gpu (#4611)
- More resiliency (#4395)
- Remove usage of the deprecated pkg_resources (#4701)
- Make sure cwltool always knows we have an outdir to fix #4698 (#4699)
- AWS jobStoreTest: re-use delete_s3_bucket from toil.lib.aws (#4700)
- Only count output file usage when using the file store (#4692)
- Remove the parasol batch system. (#4678)
- Move around reqs and move aws dev libraries to aws (#4664)
- Make sure the --batchLogsDir exists if it is set (#4635)
- Update EC2 instances and EC2 update script. (#4745)
- remove extraneous dependency on old 'mock' (#4739)
- Point CI at the new public URLs for stuff we host
- Add __init__.py to options folder (#4723)
Bug Fixes
- Lower redirect log level to fix #4526 (#4578)
- Fix mypy from being broken by new boto types (#4577)
- Fix CI on local Gitlab runners (#4571)
- Banish ghost jobs (#4563)
- Stop deleting chained-to jobs which fail as orphaned jobs (#4557)
- Fix pickling error when jobstate file doesn't exist and fix threading error when lock file exists then disappears (#4575)
- Fix #3867 and try to explain but not crash when bad things happen to our mutex file (#4656)
- Fix CI Appliance Builds (#4655)
- Tolerate a failed AMI polling attempt (#4727)
- Add pure Python fallback for getDirSizeRecursively() (#4753)
- Don't mark inputs (or outputs) executable for no reason (#4728)
- Fix scheduled CI tests (#4742)
- Fix --printJobInfo (#4709)
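
To show the idea behind a pure Python fallback for getDirSizeRecursively(), here is a rough sketch. It is not Toil's implementation: the function name dir_size_recursively and the choice to sum apparent file sizes (rather than disk blocks) are assumptions made for illustration.

```python
import os


def dir_size_recursively(path: str) -> int:
    """Return the total apparent size in bytes of regular entries under path.

    Hypothetical sketch: walks the tree with os.walk and sums st_size,
    without following symlinks.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                # lstat so a symlink counts as itself, not as its target.
                total += os.lstat(os.path.join(root, name)).st_size
            except FileNotFoundError:
                # The file vanished between listing and stat; skip it.
                continue
    return total
```

Skipping files that disappear mid-walk keeps the measurement from crashing while other processes are cleaning up the same directory.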
Thank you to our contributors: @stxue1 , @w-gao, @DailyDreaming , @mr-c , @adamnovak , @glennhickey, @misterbrandonwalker, and @a-detiste !
5.12.0
WDL
- Virtualize filenames as in-container paths from point of view of WDL command (#4527)
- Add WDL conformance tests to CI (#4530)
- Use less memory in the Giraffe WDL test (#4541)
Version Upgrades
- Upgrade to cwltool 3.1.20230601100705 (#4500)
- Update mock requirement from <5,>=4.0.3 to >=4.0.3,<6 (#4366)
Misc
- Anonymous access to Google Storage (#4518)
- Reorder config so that default settings are applied first (#4528)
- Add a way to forward accelerators to Docker containers (#4492)
Bug Fixes
- Fix test failures without docker installed (#4544)
- Prevent certain tests from being run twice in CI (#4529)
- Drop external Docker builder (#4523)
- Fix CI lint test (#4533)
- Grab AWS group policies on top of user (#4505)
- Grab accelerator set off the end of the list instead of by index (#4506)
- Fix RtD build (#4491)
- Include tests (#4499)
Thank you to our contributors: @stxue1 , @DailyDreaming , @mr-c , @adamnovak , and @tjni !
5.11.0
Breaking Changes
- Imported files will be symlinked by default, unless the user sets --noLinkImports or the workflow imports with symlink=False. (#3949)
WDL
- Toil will now stop if it encounters an error polling a possible import URL for a WDL workflow input file. (#4479)
- WDL workflows will be protected against imported files with no basenames. (#4477)
Misc
- Toil batch system ID numbers for issued jobs now start at 1. (#4482)
- Attempts to import files from URLs when the implementing job store is missing an extra are now better reported. (#4479)
- Include tests in the source distribution that gets published to PyPI (#4499)
Bug Fixes
- Toil should no longer crash when a delete wins a race against a load in FileJobStore (#4484)
- Prevent local root jobs (such as WDLRootJob) from being run twice. (#4482)
- Slurm and other grid batch system jobs will now have more informative names (#4472)
- WDL workflows can no longer import "" as a File. (#4477)
Thank you to our contributors: @stxue1, @DailyDreaming, @mr-c, @adamnovak
5.10.0
Changelog
Highlighted Features Added
- Add a --caching option which explicitly states whether to use caching with a workflow. Uses a default value depending on whether or not we are using the file job store if not specified. (#4218)
- New prototype WDL runner python -m toil.wdl.wdltoil using MiniWDL (#3468)
- MiniWDL-based WDL implementation can now run the vg Giraffe WDL workflow (#4353)
- Toil now tests against our own tiny set of WDL conformance tests (#4351)
- Toil can run the HPRC assembly WDL workflows (#4435)
- Toil can now use Mesos roles (#4455)
Breaking Changes
- Replace "preemptable" with "preemptible", add example of using --defaultPreemptible flag to Preemptibility documentation (#1951)
CWL
- CWL: run all ExpressionTools on the Leader node, instead of submitting separate jobs (#4157)
Kubernetes
- Kubernetes batch system: Delete jobs individually when batch delete fails (#3403)
- Documentation for running a Toil leader for a Kubernetes workflow outside Kubernetes now covers examples and common problems for running CWL workflows (document toil-cwl-runner + "Running the Leader Outside Kubernetes" #3422)
- Kubernetes batch system: support --maxCores, --maxDisk, and --maxMemory (#2864)
- Add tutorial for Kubernetes launch cluster (#3743)
Dependencies
- Require htcondor 10 exactly (#4315)
- Toil jobs now have a local parameter which determines if they should run on the leader. (#4388)
Misc
- The offline tests can now be run in parallel (#3493)
- Code updated to be more idiomatic for Python3.7 (#4295)
- Support for a --network for toil launch-cluster for Google cloud (#4196)
- Support for a --use_private_ip for toil launch-cluster to dial nodes by private IP instead of public IP (#4196)
- GPU scheduling should now be supported on Slurm (#4308)
- Toil now supports a --batchLogsDir option and TOIL_BATCH_LOGS_DIR environment variable, to provide a directory other than the work dir where Toil will instruct HPC batch systems to save their captured job logs.
- htcondor batch system should now work again, and will retry connections
- Updated the --coalesceStatusCalls help documentation to reflect the current state of #4431 (#4437)
- Toil no longer trusts XDG_RUNTIME_DIR under Slurm (fixes some of the issues behind #4395 when Slurm is configured not to follow the XDG spec) (#4435)
- Toil now puts its lock files for Singularity cache directories for WDL in those directories (#4435)
- Toil's WDL interpreter can now use local-to-the-leader jobs for evaluating WDL code that doesn't need appreciable resources (#4388)
- Toil now tolerates more possible exceptions related to the panasas network file system (#4440)
- Type hinting to functions in resource.py (#938)
- Added return type to inVirtualEnv() in __init__.py (#938)
- Added None checks to some function bodies (#938)
Bug Fixes
- Stop crashing when predefined batch job exit reasons are used and need to go into the message bus log file (#4321)
- Added import subprocess to restore the behavior of #588. (#4429)
- Toil will no longer use the stored message bus path from an old execution of a workflow when deciding where to save the message bus log when restarting a workflow (#4438)
- Fix --custom-net mutual exclusivity bug. (#4458)
Thank you to our contributors: @stxue1 , @DailyDreaming , @mr-c , @adamnovak , @jfennick , @misterbrandonwalker , @w-gao , @stephanaime , @glennhickey , @Hexotical , @manabuishii , @gmloose , @boukn , and @thiagogenez !
5.9.2
Changelog
Bug Fixes
- Change build tag import (#4329)
Thank you to our contributors: @adamnovak , @Hexotical !
5.9.0
Changelog
Bug Fixes
- Fix --provisioner and --metrics together (#4328)
- Ignore incorrect type hint from boto3, remove json.loads (#4330)
- Warn about missing --bypass-file-store with in-place update (#4337)
- Replace prepareHTSubmission with prepareSubmission in HTCondor (#4319)
- Merge "Google fixes" (#4293)
- Support (only) current htcondor (#4320)
- Delete k8s jobs individually when batch delete fails (#4306)
Misc
- Update aws spot documentation (#4310)
- Enable parallel testing (#3493)
- Add documentation for running CWL workflows on non-Toil-managed Kubernetes clusters (#4332)
- Export all slurm args by default (#4237)
- Allow for subclasses of base types in messages (#4322)
- Non cache default (#4299)
Dependencies
- Bump mypy from 0.982 to 0.991 (#4345)
- Bump schema-salad>=8.4.20230128170514,<9 to schema-salad>=8.3.20220913105718,<8.4 (#4342) (#4341)
- Bump cwltool from 3.1.20221008225030 to 3.1.20221201130942 (#4338)
- Bump pyupgrade to 3.7 (#4295)
Thank you to our contributors: @adamnovak , @Hexotical , @w-gao, @mr-c , @gmloose , @boukn , and @thiagogenez !
5.8.0
Changelog
Highlighted Features Added
- Toil server now exposes workflow tasks via WES (#4046).
- Toil server now has a --wes_dialect agc option that will hide any tasks that don't have Amazon Batch job IDs, and put the IDs in the task names for those that do (#4047).
- Toil jobs now accept an accelerators requirement, like accelerators=1 or accelerators={'kind': 'gpu', 'brand': 'nvidia', 'count': 2} (#4163)
- Include total requested cores for each job type in toil stats (#4173)
- Toil jobs now expose job.accelerators to workflow
- Add prefix suffix params to AbstractFileStore.getLocalTempFile and AbstractFileStore.getLocalTempFileName (#4273)
- CWL: --no-compute-checksum, --strict-cpu-limit, --disable-validate, and --fast-parser are now available
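
The two accelerators forms listed above (a bare count, or a dict with keys such as kind, brand, and count) can be reduced to a single shape before scheduling. The sketch below is only an illustration of that idea and is not Toil's parser: normalize_accelerators is a hypothetical helper, and treating a bare integer as that many GPUs is an assumption.

```python
from typing import Dict, Union

AcceleratorSpec = Union[int, Dict[str, Union[str, int]]]


def normalize_accelerators(spec: AcceleratorSpec) -> Dict[str, Union[str, int]]:
    """Turn either accepted form into a dict with 'kind' and 'count'.

    Hypothetical helper: a bare integer is assumed to mean that many
    accelerators of kind 'gpu'.
    """
    if isinstance(spec, int):
        normalized: Dict[str, Union[str, int]] = {"kind": "gpu", "count": spec}
    elif isinstance(spec, dict):
        # Copy so the caller's dict is never mutated.
        normalized = dict(spec)
        normalized.setdefault("kind", "gpu")
        normalized.setdefault("count", 1)
    else:
        raise TypeError(f"unsupported accelerator spec: {spec!r}")
    count = normalized["count"]
    if not isinstance(count, int) or count < 0:
        raise ValueError(f"accelerator count must be a non-negative int: {count!r}")
    return normalized
```

With this shape, downstream code only has to handle dicts, whichever form the workflow author wrote.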
Breaking Changes
- Toil's built-in autoscaler now guesses that some memory and disk space on nodes will not actually be available for jobs; pass --assumeZeroOverhead to revert to the old behavior (#2103)
CWL
- CWL job unit and display names have been changed to make more sense as task names, and management of them has been unified into a CWLNamedJob. (#4046/#4047)
- CWL CUDARequirement is parsed by cwltool and turned into a requirement for the minimum requested number of nvidia GPU accelerators (#3982)
- fix false warning when outputSource contains only one None value (#4300)
Kubernetes
- KubernetesBatchSystem can add nvidia.com/gpu and amd.com/gpu resource requests for jobs that request those accelerators (#4163)
- KubernetesBatchSystem can request GPUs by model key, if nodes are labeled appropriately (#4163)
Dependencies
Misc
- Toil WES server now accepts requests that leave out workflow_params. (#4037)
- The MessageBus has been expanded to use pypubsub, and now has MessageInbox and MessageOutbox objects to represent connections to it. (#4046/#4047)
- ToilMetrics now rides on the MessageBus rails. (#4046/#4047)
- Toil workflows now have a --writeMessages option, which takes a file to which a line-oriented stream of MessageBus messages will be written. Reading this file will allow you to recover the current state of the workflow. (#4046/#4047)
- Add code for warning check to be used when launching cluster with AWS. (#3514)
- Use a CI prebake image for gitlab testing. (#4185)
- Toil clusters now have /var/tmp as the default temporary directory, since they often make large temporary files (#4148)
- Adds basic testing for slurm using a slurm docker cluster by running sample workflows. (#3856)
- Add message bus documentation (#4239)
- SingleMachineBatchSystem can schedule nvidia GPU accelerators, limiting the concurrent jobs to no more than there are accelerators to support, and setting CUDA_VISIBLE_DEVICES in the tasks' environments to tell them which nvidia GPU(s) to use. (#4163)
- AWSBatchBatchSystem can use AWS Batch's GPU resource to provide nvidia GPU accelerators (#4163)
- Toil jobs no longer need to re-run after their child/followOn/service jobs in order to delete themselves. (#3188)
- Message bus is now thread safe (#4276)
- Docker build has been updated with new Aventer Mesos deb URL (fixes #4290)
- docker binary in the container has been updated to that included in the Ubuntu repos (fixes #4282)
- Singularity in the appliance has been updated to 3.10 which is >=3.9, for cgroups v2 support.
- Base Ubuntu container image for the appliance has been updated to 22.04, which has a new enough libc for Debian's Singularity 3.10 debs.
- Safer type usage checking for systems without boto3 installed
- Tests are now more runnable post-installation. Temporary paths are not selected based upon the location of the tests themselves. (#4287)
Bug Fixes
- Only use /var/run/user if XDG tells us we have it in our session. Otherwise we will try other places, including /run/lock/toil. (#4170)
- toil destroy-cluster: terminate stopped instances when destroying the cluster (#4271)
- fileJobStore: handle arbitrary os.link errors to work on some filesystems (#2232)
Thank you to our contributors!
5.7.1
Changelog
Highlighted Features Added
AWS Batch Batch System (#3956)
AGC Integration (#4039) + More AGC integration (#4067) + AGC megabranch (#4113)
Scale TES to be able to run reasonably-sized workflows on Funnel on Kubernetes with the AWS job store (#3927)
CWL
Run CWL conformance tests via WES (#4052)
Implement and test CWL loadContents from URLs to fix #4125 (#4126)
Add CWL tests under ARM (#4038)
Cache results of cwltool version lookup (#4141)
Misc
SGE batch system change to support serial jobs. (#4022)
Performance testing for Graviton instances (#4123)
Stop waiting on hostpath volumes to exist (#4146)
Catch and warn about jobs going away too slowly on FileJobStore (#4149)
Add documentation for the type-checking hooks (#4117)
Pod murder bot (#4060)
Contrib hook scripts (#4105)
Allow newer google-cloud-storage (#4114)
Use environment variable to set parallel partition name (#4096)
Register pytest markers (#4103)
Mention --export=ALL for SLURM environments (#4100) (#4102)
Allow persisting workflow state in WES server across container recreation (#4082)
Change toil kill to use the job store shared file API to find pid.log (#4075)
Bring back kill loop in the single_machine batch system but with a timeout (#4070)
Reorganize Locking (#4059)
Add and test preemptability constraints (#4044)
Enhanced types (#3975)
Use an init process that reaps zombies on toil clusters (#3974)
Add launch cluster support for ARM (#3971)
Feat: square bracket to period separator (#4008)
Add AGC health check endpoint (#3997)
Tolerate and require typed Werkzeug (#4011)
Add more static URLs for Singularity debs (#4007)
Bug Fixes
Update WES set up docs (#4027)
Add real time logs (#4031)
Fail fast if Docker builder is missing (#4001)
Make Toil version be reported as a string in WES (#4013)
Fix assorted typos within assorted comments (#4023)
Make file store case insensitive (#4153)
Pre-lex commands for qsub (#4150)
Update Cactus and exclude broken networkx (#4107)
Make toil kill work when the leader is on another machine (#4084)
Wrong filename in output (#4139)
Tolerate a missing VersionID key to fix #4129 (#4130)
Only import from typing_extensions on old Python where we install it (#4090)
Allow missing username and fix Docker build (#4077)
Leave more time for concurrency measurement to fix #4012 (#4068)
Stop people asking for ARM Mesos clusters to fix #4057 (#4058)
Thank you to our contributors: @mr-c, @adamnovak, @w-gao, @jonathanxu18, @Hexotical, @gmloose, @kannon92, @douglowe, @gcapes, and @pmiddend!