UMI-tools output is now deterministic with --random-seed
Many users have had issues making UMI-tools deterministic, which previously relied upon both --random-seed and the environment variable PYTHONHASHSEED being set. From v1.1.6, only --random-seed is required.
Please note that in some cases the implemented solution may make the output from v1.1.6 differ from that of previous versions, even if --random-seed is set to the same value. The differences will be very slight, and the different outputs represent equally sensible UMI grouping/deduplication, since they relate only to how ties are broken.
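As a rough illustration of why hash randomisation can affect tie-breaking, the sketch below runs a small child script under several PYTHONHASHSEED values. It is not the UMI-tools implementation: the UMI counts, the `pick` helper and both selection rules are hypothetical, chosen only to show that a choice made by iterating over a set can vary with the hash seed, while a choice with an explicit tie-break on the UMI sequence cannot.

```python
import os
import subprocess
import sys

# Hypothetical child script: choose a representative UMI among tied counts.
CHILD = r"""
counts = {"ATGC": 5, "ATGA": 5, "TTGC": 5, "CTGC": 2}
umis = set(counts)  # iteration order of a set of str depends on PYTHONHASHSEED
# max() keeps the first maximum it meets, so ties follow the set's order.
naive = max(umis, key=lambda u: counts[u])
# Explicit tie-break: highest count first, then the UMI sequence itself.
stable = min(umis, key=lambda u: (-counts[u], u))
print(naive, stable)
"""


def pick(hash_seed):
    """Run CHILD in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    out = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    naive, stable = out.stdout.split()
    return naive, stable


results = [pick(seed) for seed in range(5)]
naive_choices = {naive for naive, _ in results}
stable_choices = {stable for _, stable in results}

print("naive choices across hash seeds: ", sorted(naive_choices))
print("stable choices across hash seeds:", sorted(stable_choices))
```

The stable rule always selects the same UMI regardless of the hash seed. The naive rule may select any of the three tied UMIs, and each of those choices is an equally valid representative, which is the sense in which outputs can differ only in how ties are broken.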
Thank you @TyberiusPrime, @christianbioinf and others for their suggestions for how to remove the dependency on PYTHONHASHSEED for deterministic output.
New features
- umi_tools is now deterministic when using --random-seed - @TomSmithCGAT in #550
- Option to extract barcode from read2 only - @TomSmithCGAT in #630
- Adds support for Python 3.12 - @IanSudbery in #657
Bug fixes
- Avoids switching matplotlib backend - @sshen8 in #640
- count_tab now correctly reads UMI and cell barcodes - @eachanjohnson in #654
- count_tab now writes out strings not bytes - @eachanjohnson in #654
- Installation with Python < 3 is now prevented - @IanSudbery in #644
Documentation
- FAQ entry regarding identification of possible duplicate reads/pairs - @TomSmithCGAT in #631
- Improved docs regarding chimeric/unmapped/unpaired read pairs - @TomSmithCGAT in #629
Other
- Add issue templates - @TomSmithCGAT in #632
- Update testing suite to pytest - @eachanjohnson in #655
New Contributors
- @sshen8 made their first contribution in #640
- @eachanjohnson made their first contribution in #654
Full Changelog: 1.1.5...v1.1.6