v0.44.1 (Latest)

Released 21 Oct 05:30 by Yenaled, commit 1b3948c

Minor updates:

Updated bustools extract: it can now work with multiple pairs of FASTQ files and has include/exclude functionality
Minor update to bustools count to prevent issues with BUS files in which multiple consecutive UMIs have the same EC

v0.44.0

Released 10 Sep 21:48 by Yenaled, commit 1dd7335

bustools extract no longer corrupts FASTQ header names during extraction
bustools inspect handles split barcodes properly
bustools count displays a warning when the t2g file and the transcripts file do not match

v0.43.2

Released 04 Jan 11:30 by Yenaled, commit 6705b7e

Update to bustools count for producing split matrices

When bustools count performs gene-level UMI counting to generate nascent, mature, and ambiguous count matrices, exons are now prioritized over introns.

Additionally, the command-line options menu has been simplified somewhat.

v0.43.1

Released 01 Nov 12:45 by Yenaled, commit a7b3e81

bustools: new features for handling flags and barcode prefixes

New features:

bustools sort has a new --no-flags option that eliminates the flag column while sorting
bustools capture can now capture prefixes (supply a 16-bp barcode prefix followed by * in the capture list)
bustools fromtext can now read in the flags column
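The prefix rule for bustools capture can be illustrated with a short Python sketch. This is not bustools' implementation: it assumes barcodes are plain ACGT strings and that any capture-list entry ending in * is matched as a string prefix. The helper names parse_capture_list and is_captured are hypothetical.

```python
def parse_capture_list(lines):
    """Split capture-list entries into exact barcodes and prefixes.

    Hypothetical helper: an entry ending in '*' is treated as a prefix.
    """
    exact, prefixes = set(), []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        if entry.endswith("*"):
            prefixes.append(entry[:-1])  # drop the trailing '*'
        else:
            exact.add(entry)
    return exact, prefixes


def is_captured(barcode, exact, prefixes):
    """True if the barcode is listed exactly or starts with a listed prefix."""
    return barcode in exact or any(barcode.startswith(p) for p in prefixes)


exact, prefixes = parse_capture_list(["AAAACCCCGGGGTTTT*", "ACGTACGTACGTACGT"])
print(is_captured("AAAACCCCGGGGTTTTACGTACGT", exact, prefixes))
```

Here a 24-character barcode is captured because its first 16 bases match the listed 16-bp prefix.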

Other:

Updated kseq.h
Renamed bustools whitelist to bustools allowlist
Fixed issue #89
Added a warning when transcripts.txt and the t2g file do not match (addresses issue #94)
Updated README

v0.43.0

Released 01 Jul 23:50 by github-actions, commit 2be1e43

bustools count generates three matrices (nascent/mature/ambiguous), barcode metadata is supported, and bustools correct is updated

bustools count

bustools count now has a --split= option (-s) that takes a file listing transcripts (a subset of those in the --txnames file) and "splits" the generated count matrix into multiple count matrices, as follows:
- A count matrix (.mtx) is generated from collapsed UMIs that map solely to transcripts listed in the --txnames file but not in the --split file.
- A second count matrix (.2.mtx) is generated from collapsed UMIs that map solely to transcripts listed in the --split file.
- An ambiguous matrix (.ambiguous.mtx) is generated from collapsed UMIs that map to multiple transcripts, some of which would have been assigned to the .mtx file and others to the .2.mtx file. Rather than going into either of those two matrices, such UMIs go into the ambiguous matrix.

Note that, at the gene level, this "splitting" is done after normal UMI collapsing and, by default, matrix assignment only occurs if all transcripts in a collapsed UMI's mapping belong to the same gene (as specified in the t2g file supplied to --genemap). The --split option is useful for generating nascent/mature/ambiguous matrices in workflows that examine splicing.
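The assignment rule above can be sketched as a small Python function. It is a hypothetical helper, not bustools code. It takes the transcripts of one collapsed UMI, the set of transcripts from the --split file, and optionally a transcript-to-gene dict standing in for the --genemap t2g file. It follows the v0.43.0 description only and does not model the exon-over-intron prioritization added in v0.43.2.

```python
def assign_split_matrix(transcripts, split_set, t2g=None):
    """Return "mtx", "2.mtx", "ambiguous", or None (UMI not assigned).

    Hypothetical sketch of the --split rule; names are illustrative only.
    """
    if not transcripts:
        return None
    # Gene-level mode: by default, assign only if every transcript maps to one gene.
    if t2g is not None and len({t2g[t] for t in transcripts}) != 1:
        return None
    in_split = [t in split_set for t in transcripts]
    if not any(in_split):
        return "mtx"        # only transcripts outside the --split file
    if all(in_split):
        return "2.mtx"      # only transcripts listed in the --split file
    return "ambiguous"      # a mix of both kinds


t2g = {"g1.mature": "g1", "g1.nascent": "g1", "g2.mature": "g2"}
print(assign_split_matrix(["g1.mature", "g1.nascent"], {"g1.nascent"}, t2g))
```

With a hypothetical mature and nascent transcript of the same gene, a UMI compatible with both lands in the ambiguous matrix, while a UMI spanning two genes is not assigned at the gene level.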

barcode metadata

BUS records can now store metadata in the barcode column. This metadata may, for example, have been generated by the --batch-barcodes option in kallisto bus, which stores sample barcodes in the metadata portion while cell barcodes occupy the non-metadata portion. When bustools count is run and metadata is detected in the BUS file, it generates a .barcodes.prefix.txt file containing the metadata (extended to 16 characters, because the metadata generated by kallisto bus is 16 characters long).
bustools text has a --showAll (-a) option that can expose the metadata in plaintext format.

bustools correct

The onlist supplied to bustools correct can have multiple columns, so that each component of the barcode is corrected independently.
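Per-component correction can be illustrated with a Python sketch. Several points here are assumptions rather than documented behavior: that the components are contiguous and each has the fixed length of its onlist column, that correction allows a single mismatch (Hamming distance 1), and that a component with zero or several matching onlist entries is rejected. The helper names are hypothetical.

```python
def hamming1_neighbors(seq, alphabet="ACGT"):
    """Yield every sequence that differs from seq at exactly one position."""
    for i, base in enumerate(seq):
        for b in alphabet:
            if b != base:
                yield seq[:i] + b + seq[i + 1:]


def correct_component(seq, onlist):
    """Return seq if on the onlist, else its unique Hamming-1 match, else None."""
    if seq in onlist:
        return seq
    hits = {n for n in hamming1_neighbors(seq) if n in onlist}
    return hits.pop() if len(hits) == 1 else None


def correct_multipart(barcode, columns):
    """Correct each barcode component against its own onlist column (a set)."""
    out, pos = [], 0
    for onlist in columns:
        length = len(next(iter(onlist)))  # assumed fixed length per column
        fixed = correct_component(barcode[pos:pos + length], onlist)
        if fixed is None:
            return None  # any uncorrectable component rejects the barcode
        out.append(fixed)
        pos += length
    return "".join(out)


columns = [{"AAAA", "CCCC"}, {"GGGG", "TTTT"}]
print(correct_multipart("AAAGGGGT", columns))
```

Each component is checked only against its own column, so an error in one component does not affect how the other is corrected.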
bustools correct now has a --replace option (-r) that takes a replacement file containing two columns of equal length. Each barcode in the first column is replaced with the barcode in the corresponding row of the second column. Partial replacements are also possible using an asterisk: for example, a row containing CATCATCC *CATTCCTA means that the end of a barcode, if it is CATCATCC, is replaced with CATTCCTA.
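The replacement file can be modeled in a Python sketch. This interprets the example above and is not bustools code: it assumes that a second-column entry starting with * means "replace this suffix", and that exact matches take precedence over suffix matches. Both helper names are hypothetical.

```python
def load_replacements(rows):
    """Parse 'FROM TO' rows into exact and suffix ('*TO') replacements."""
    exact, suffix = {}, []
    for row in rows:
        src, dst = row.split()
        if dst.startswith("*"):
            suffix.append((src, dst[1:]))  # replace the end of a barcode
        else:
            exact[src] = dst               # replace the whole barcode
    return exact, suffix


def replace_barcode(barcode, exact, suffix):
    """Apply an exact replacement first, then the first matching suffix rule."""
    if barcode in exact:
        return exact[barcode]
    for src, dst in suffix:
        if barcode.endswith(src):
            return barcode[:len(barcode) - len(src)] + dst
    return barcode  # no rule applies; leave unchanged


exact, suffix = load_replacements(["AAAAAAAA CCCCCCCC", "CATCATCC *CATTCCTA"])
print(replace_barcode("GGGGCATCATCC", exact, suffix))
```

Because both columns have equal length, a suffix replacement keeps the barcode length unchanged.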

v0.42.0

Released 19 Dec 14:16 by github-actions, commit a3b939c

BUSZ compression

This version adds the option of compressing sorted BUS files to BUSZ files and of decompressing BUSZ files. For a detailed description of the BUSZ format, see https://github.com/BUStools/BUSZ-format

compress

bustools compress compresses a sorted BUS file.

decompress

bustools decompress decompresses an existing BUSZ file and writes out a BUS file.

Both commands can write their output (BUS or BUSZ) to standard output via the -p flag and read binary input from standard input by using - in place of the file name.

refs/heads/master

Released 04 Jan 08:30 by github-actions, commit 671f60b

Merge pull request #85 from BUStools/devel_compress

Merge compression / decompression

v0.41.0

Released 30 Jun 12:24 by github-actions, commit 6fa0731

Butterfly paper

This version adds new commands to bustools from the Butterfly paper.

clusterhist

bustools clusterhist creates per-gene histogram information across all cells.

predict

bustools predict takes as input an output directory containing the BUS file and related information, and predicts expected gene counts based on the corrected input.

v0.40.0

Released 27 Jan 17:25 by pmelsted, commit b0df3ed

Bugfixes

Fixes to count when generating TCC matrices (closes issue #40).

Adds the cite and version commands, and modifies the merge and project commands.

If the BUS file is written with the read number in the flag column (using kallisto bus -n), the reads can be processed against multiple indices derived from portions of the full transcriptome, and the resulting BUS files can be merged into a single BUS file with bustools merge. This is useful for keeping index memory requirements low, as with RNA velocity.

bustools project takes a map, a two-column file mapping barcodes to barcodes, UMIs to UMIs, or transcripts to genes, and thereby changes the coordinate system of the BUS file.
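The idea of projecting through a two-column map can be sketched in Python on text-form records. This is a hypothetical illustration, not the bustools implementation. Records are modeled as dicts, records whose value is absent from the map are dropped (an assumption), and for transcripts-to-genes the sketch maps a set of transcripts to a set of genes. Records that collapse onto the same coordinates are not merged here.

```python
def load_map(lines):
    """Read a two-column 'FROM TO' map into a dict (hypothetical format)."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping


def project_records(records, mapping, field):
    """Rewrite records[field] through the map; drop unmapped records (assumed)."""
    projected = []
    for rec in records:
        if rec[field] in mapping:
            new = dict(rec)  # copy so the input records are untouched
            new[field] = mapping[rec[field]]
            projected.append(new)
    return projected


def project_transcripts(transcripts, t2g):
    """Map a set of transcripts to the set of genes they belong to."""
    return frozenset(t2g[t] for t in transcripts if t in t2g)


bc_map = load_map(["AAAA CCCC", "GGGG TTTT"])
records = [{"barcode": "AAAA", "umi": "AC", "ec": 0, "count": 1}]
print(project_records(records, bc_map, "barcode"))
```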

v0.39.4

Released 07 Nov 07:04 by pmelsted, commit c3b5816

Improved correct command and assignment of multimapping reads

This release improves the memory footprint and speed of the bustools correct command.

The bustools count command adds the --em option that estimates gene abundances using an EM algorithm for reads that pseudoalign to multiple genes.

Note that the --multimapping option splits each read's count evenly across all genes it pseudoaligns to, whereas the EM algorithm gives a more statistically valid answer. The two options are mutually exclusive.
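The difference between an even split and EM can be seen in a small Python sketch. This is a textbook EM for assigning multimapping reads, not the bustools implementation. It works on one cell's reads, each given as the set of genes it is compatible with, and it ignores gene lengths and any other weighting bustools may apply. The function names are hypothetical.

```python
from collections import defaultdict


def even_split(reads):
    """Each read adds 1/len(genes) to every gene it is compatible with."""
    counts = defaultdict(float)
    for genes in reads:
        for g in genes:
            counts[g] += 1.0 / len(genes)
    return dict(counts)


def em_counts(reads, n_iter=100, tol=1e-8):
    """Distribute each read in proportion to current abundance estimates."""
    if not reads:
        return {}
    genes = sorted({g for gs in reads for g in gs})
    theta = {g: 1.0 / len(genes) for g in genes}  # uniform starting point
    n = len(reads)
    for _ in range(n_iter):
        expected = defaultdict(float)
        for gs in reads:  # E-step: fractional assignment of each read
            total = sum(theta[g] for g in gs)
            for g in gs:
                expected[g] += theta[g] / total
        new_theta = {g: expected[g] / n for g in genes}  # M-step
        delta = max(abs(new_theta[g] - theta[g]) for g in genes)
        theta = new_theta
        if delta < tol:
            break
    return {g: theta[g] * n for g in genes}


reads = [{"A"}, {"A"}, {"A", "B"}]
print(even_split(reads))
print(em_counts(reads))
```

With two reads unique to gene A and one read shared by A and B, the even split gives A 2.5 and B 0.5, while EM attributes the shared read almost entirely to A, because nothing supports B independently.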

Releases: BUStools/bustools

v0.44.1

v0.44.0

Update bustools count producing split matrices

bustools: new features with handling flags and barcode prefixes

Count generates three matrices (nascent/mature/ambiguous), barcode metadata supported, and updates to bustools correct

bustools count

barcode metadata

bustools correct

BUSZ compression

compress

decompress

refs/heads/master: Merge pull request #85 from BUStools/devel_compress

Butterfly paper

clusterhist

predict

Bugfixes

Improved correct command and assignment of multimapping reads