Releases: CGATOxford/UMI-tools
v1.1.6: Update version.py
UMI-tools output is now deterministic with --random-seed
Many users have had trouble making UMI-tools output deterministic, which previously required both --random-seed and the environment variable PYTHONHASHSEED to be set. From v1.1.6, only --random-seed is required.
Please note that in some cases the implemented solution may make the output from v1.1.6 differ from previous versions, even if --random-seed is set to the same value. The differences will be very slight, and the different outputs represent equally sensible UMI grouping/deduplication since they relate only to how ties are broken.
Thank you @TyberiusPrime, @christianbioinf and others for their suggestions on how to remove the dependency on PYTHONHASHSEED for deterministic output.
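To illustrate why hash randomisation can affect tie-breaking, here is a minimal Python sketch. It is not the UMI-tools implementation: `pick_representative` is a hypothetical helper that chooses one UMI from a set of equally supported candidates. Iteration order over a set of strings can change between interpreter runs unless PYTHONHASHSEED is fixed, so the sketch sorts the candidates before drawing with a seeded generator, which makes the choice depend only on the seed.

```python
import random


def pick_representative(tied_umis, seed):
    """Pick one UMI from a collection of tied candidates, reproducibly.

    Hypothetical helper for illustration; not part of the UMI-tools API.
    """
    # Sorting removes any dependence on set iteration order, which for
    # strings can vary between runs unless PYTHONHASHSEED is fixed.
    candidates = sorted(tied_umis)
    # A dedicated, seeded generator keeps the draw independent of global state.
    rng = random.Random(seed)
    return rng.choice(candidates)


tied = {"ATGC", "ATGA", "TTGC"}
first = pick_representative(tied, seed=42)
# Same candidates presented in a different order give the same answer.
second = pick_representative(list(reversed(sorted(tied))), seed=42)
```

With this pattern, two runs given the same seed and the same tied candidates select the same UMI regardless of the order in which the candidates were collected.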
New features
- umi_tools is now deterministic when using --random-seed - @TomSmithCGAT in #550
- Option to extract barcode from read2 only - @TomSmithCGAT in #630
- Adds support for python 3.12 - @IanSudbery in #657
Bugfix
- Avoids switching matplotlib backend - @sshen8 in #640
- count_tab now correctly reads UMI and cell barcodes - @eachanjohnson in #654
- count_tab now writes out strings not bytes - @eachanjohnson in #654
- Prevents installation with Python < 3 - @IanSudbery in #644
Documentation
- FAQ entry regarding identification of possible duplicate reads/pairs - @TomSmithCGAT in #631
- Improved docs regarding chimeric/unmapped/unpaired read pairs - @TomSmithCGAT in #629
Other
- Add issue templates - @TomSmithCGAT in #632
- Update testing suite to pytest - @eachanjohnson in #655
New Contributors
- @sshen8 made their first contribution in #640
- @eachanjohnson made their first contribution in #654
Full Changelog: 1.1.5...v1.1.6
1.1.5
New features
- Enables read suffixes to be removed from single end data: @IanSudbery in #591. See #580 for motivating issue
- Adds a script to prepare umi_tools dedup output for use with RSEM: @IanSudbery in #609. See #465 and #607 for motivating issues
Bugfix
- Fix lack of help messages in 1.1.4 by @IanSudbery in #586
- Fixes read suffix line end: @IanSudbery in #611
Documentation
New Contributors
Full Changelog: 1.1.4...1.1.5
1.1.4
1.1.3
New features
- Adds '--umi-separator' option to umi_tools extract to specify UMI separator. Thanks @opplatek (#548)
Optimisation
- Speeds up read pair mate writing. Significant benefit for transcriptome alignments (#543)
Bugfix
- Handles umi_tools group output to tsv with --per-contig when no gene tags are present. Thanks @mfansler & @akmorrow13 (#577)
- Fixes syntax warning in extract.py. Thanks @rajivnarayan (#558)
- Improves error message for incorrect command line input. Thanks @epruesse (#506 & #537)
1.1.2
1.1.1
Updates requirements for pysam version to >0.16.0.1. Thanks @sunnymouse25 (#444)
1.1.0
A long overdue release covering some minor functionality updates and bugfixes:
Additional functionality:
- Write out reads failing regex matching with extract/whitelist (see options --filtered-out, --filtered-out2). See #328 for motivation
- Ignore template length with paired-end dedup/group (see option --ignore-tlen). See #357 for motivation. Thanks @skitcattCRUKMI
- Ignore read pair suffixes with extract/whitelist, e.g. /1 or /2 (see option --ignore-read-pair-suffixes). See #325, #391, #418, PierreBSC/Viral-Track#9 for motivation
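As a rough illustration of what ignoring read pair suffixes means, the sketch below strips a trailing /1 or /2 from a read identifier so that both mates share one name. The function `strip_pair_suffix` is hypothetical and written only for this example; how extract/whitelist handle suffixes internally may differ.

```python
def strip_pair_suffix(read_name):
    """Remove a trailing /1 or /2 from a read identifier, if present.

    Hypothetical helper for illustration; not the UMI-tools implementation.
    """
    if read_name.endswith(("/1", "/2")):
        return read_name[:-2]
    return read_name


# Mates named with suffixes compare equal once the suffix is dropped.
name1 = strip_pair_suffix("read_001/1")
name2 = strip_pair_suffix("read_001/2")
```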
Performance
- Sped up error correction mapping for cell barcodes in whitelist by using BKTree. Thanks @redst4r. Note that this adds a new python dependency (pybktree) which is available via pip and conda-forge.
- Very slight reduction in memory usage for dedup/group via bugfix to reduce the number of reads being retained in the buffer. Thanks to @mitrinh1 for spotting this (#428). The bug was equivalent to hardcoding the option --buffer-whole-contig on, which ensures all reads with the same start position are grouped together for deduplication, but at the cost of not yielding reads until the end of each contig, thus increasing memory usage. As such, the bug was not detrimental to results output.
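The BK-tree speed-up mentioned above can be sketched in a few lines. This is a self-contained illustration of the general data structure, not the pybktree code nor the way whitelist calls it; the `BKTree` class, the `hamming` function and the choice of Hamming distance are assumptions made for the example. The idea is that the triangle inequality lets a lookup skip whole subtrees, so a query barcode is compared against only part of the whitelist.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))


class BKTree:
    """Minimal BK-tree over a discrete metric (illustrative only)."""

    def __init__(self, distance, items):
        self.distance = distance
        self.root = None  # each node is (item, {edge_distance: child_node})
        for item in items:
            self.add(item)

    def add(self, item):
        if self.root is None:
            self.root = (item, {})
            return
        node = self.root
        while True:
            parent, children = node
            d = self.distance(item, parent)
            if d == 0:
                return  # item already stored
            if d not in children:
                children[d] = (item, {})
                return
            node = children[d]

    def find(self, query, threshold):
        """Return sorted (distance, item) pairs within `threshold` of `query`."""
        if self.root is None:
            return []
        hits, stack = [], [self.root]
        while stack:
            item, children = stack.pop()
            d = self.distance(query, item)
            if d <= threshold:
                hits.append((d, item))
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - threshold, d + threshold] can contain a match.
            for edge, child in children.items():
                if d - threshold <= edge <= d + threshold:
                    stack.append(child)
        return sorted(hits)


whitelist = ["AAAA", "CCCC", "GGGG", "TTTT"]
tree = BKTree(hamming, whitelist)
matches = tree.find("AAAT", 1)  # barcodes within one mismatch of the query
```

In this toy whitelist, the lookup for "AAAT" examines only the root before pruning the rest of the tree, which is where the saving over a full scan comes from.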
Bugfixes:
- Unmapped mates were not properly discarded with dedup and group. Thanks @Daniel-Liu-c0deb0t for rectifying this.
1.0.1: Merge pull request #385 from CGATOxford/{TS}-DebugCellTag
Debug for KeyError when some reads are missing a cell barcode tag and stats output is required from umi_tools dedup. See comments from @ZHUwj0 in #281
1.0.0
This release is intended to be a stable release with no plans for significant updates to UMI-tools functionality in the near future. As part of this release, much of the code base has been refactored. It is possible this may have introduced bugs which have not been picked up by the regression testing. If so, please raise an issue and we'll try and rectify with a minor release update ASAP.
Documentation
UMI-tools documentation is now available online: https://umi-tools.readthedocs.io/en/latest/index.html
Along with the previous documentation, the readthedocs pages also include new pages:
- FAQ
- Making use of our Algorithms: The API
New knee method for whitelist
- The method to detect the "knee" in whitelist has been updated (#317). This method should always identify a threshold and is now set as the default method. Note that this knee method appears to be slightly more conservative (fewer cells above threshold), but having identified the knee, one can always re-run whitelist and use --set-cell-number to expand the whitelist if desired
- The old method is still available via --knee-method=density
- In addition, to run the old knee method but allow whitelist to exit without error even if a suitable knee point isn't identified, use the new --allow-threshold-error option (#249)
- Putative errors in CBs above the knee can be detected using --ed-above-threshold (#309)
Explicit options for handling chimeric & improper read pairs (#312)
The behaviour for chimeric read pairs, improper read pairs and unmapped reads can now be explicitly set with the --chimeric-pairs, --unpaired-reads and --unmapped-reads options.
New options
- --temp-dir: Set the directory for temporary files (#254)
- --either-read & --either-read-resolve: Extract the UMI from either read (#175)
Misc
0.5.5
Mainly minor debugs and improved detection of incorrect command line options. Minor updates to documentation.
- Resolves issues with correctly skipping reads which have not been assigned (#191 & #273). This involves the addition of the --assigned-status-tag option
Testing for OSX has been dropped due to unresolved issues with travis. We hope to resurrect this in the future!
In line with major python packages (e.g https://www.numpy.org/neps/nep-0014-dropping-python2.7-proposal.html), support for python 2 will be dropped from January 1st 2019.