Releases: scverse/scirpy

v0.13.0 - new data structure based on awkward arrays

09 Jun 06:10
ec9b894

This release introduces a new way to store AIRR data in AnnData.obsm using awkward arrays.
This change entails several backwards-incompatible changes to the scirpy workflow.

v0.12.2

26 Apr 04:58
678d0ca
  • Fix IEDB data loader after update of IEDB data formats (backport of #401)

v0.13.0rc1 - new data structure based on awkward arrays

07 Apr 06:58
d8ec147

This update introduces a new data structure based on awkward arrays.
The new data structure is described in more detail in the documentation and is considered the "official" way of representing AIRR data for scverse core and ecosystem packages.

Benefits of the new data structure include:

  • a more natural, lossless representation of AIRR Rearrangement data
  • separation of AIRR data and the receptor model, thereby getting rid of previous limitations (e.g. "only productive chains") and enabling other use-cases (e.g. spatial AIRR data) in the future.
  • clean adata.obs as AIRR data is not expanded into columns
  • support for MuData for working with paired gene expression and AIRR data as separate modalities.

The overall workflow stays the same. However, this update required several backwards-incompatible changes, which are summarized below.

Backwards-incompatible changes

New data structure

Closes issue #327.

Changed behavior:

  • There are no "has_ir" and "multichain" columns in adata.obs anymore.
  • By default all fields are imported from AIRR rearrangement and 10x data.
  • The restriction that all chains added to an AirrCell must have the same fields has been removed. Missing fields are automatically filled with missing values.
  • io.upgrade_schema can update from v0.7 to v0.13 schema. AnnData objects generated with scirpy <= 0.6.x cannot be read anymore.
  • pl.spectratype now has a chain attribute and the meaning of the cdr3_col attribute has changed.
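
To illustrate the relaxed field requirement, here is a minimal sketch in plain Python of how chains with differing fields can be aligned to a common set of fields, with gaps filled by a missing value. It is not scirpy's implementation: the helper `fill_missing_fields` and the example records are hypothetical, and only the field names follow the AIRR Rearrangement convention.

```python
from typing import Any, Dict, List, Optional

Chain = Dict[str, Any]


def fill_missing_fields(
    cells: List[List[Chain]], missing: Optional[Any] = None
) -> List[List[Chain]]:
    """Return a copy of `cells` in which every chain carries the same fields."""
    # Collect the union of all field names, preserving first-seen order.
    fields: List[str] = []
    for chains in cells:
        for chain in chains:
            for name in chain:
                if name not in fields:
                    fields.append(name)
    # Rebuild every chain with the full field set; absent fields get `missing`.
    return [
        [{name: chain.get(name, missing) for name in fields} for chain in chains]
        for chains in cells
    ]


# One list of chains per cell. The number of chains varies between cells,
# which is the kind of ragged layout that awkward arrays are designed for.
cells = [
    [
        {"locus": "TRA", "junction_aa": "CAVSDF"},
        {"locus": "TRB", "junction_aa": "CASSLG", "productive": True},
    ],
    [],  # a cell without any receptor chain
    [{"locus": "TRB", "junction_aa": "CASSPT"}],
]

filled = fill_missing_fields(cells)
```

After the call, every chain in `filled` has the fields `locus`, `junction_aa` and `productive`, and chains that lacked `productive` carry `None` for it.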

New functions:

  • pp.index_chains
  • pp.merge_chains

Removed functions:

  • pp.merge_with_ir
  • pp.merge_airr_chains

API supporting MuData

Closes issue #383

All functions take (where applicable) the following additional, optional keyword arguments:

  • airr_mod: the modality in MuData that contains AIRR information (default: "airr")
  • airr_key: the slot in adata.obsm that contains AIRR rearrangement data (default: "airr")
  • chain_idx_key: the slot in adata.obsm that contains indices specifying which chains in adata.obsm[airr_key] are the primary/secondary chains etc.
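
The idea behind chain_idx_key can be sketched in plain Python: the AIRR records stay untouched, and a separate structure stores, per cell, the positions of the primary and secondary chains. The sketch below is an illustration under assumptions, not scirpy's code. The helper `index_chains_sketch`, the ranking by `duplicate_count`, and the grouping of loci into VJ and VDJ sets are all assumptions made for this example.

```python
from typing import Any, Dict, List, Set

# Assumed grouping of loci into the two receptor arms (illustrative only).
VJ_LOCI = {"TRA", "TRG", "IGK", "IGL"}
VDJ_LOCI = {"TRB", "TRD", "IGH"}


def index_chains_sketch(chains: List[Dict[str, Any]]) -> Dict[str, List[int]]:
    """Return up to two chain positions per arm, most abundant chain first."""

    def ranked(loci: Set[str]) -> List[int]:
        idx = [i for i, chain in enumerate(chains) if chain["locus"] in loci]
        # Highest count first; sorted() is stable, so ties keep input order.
        return sorted(idx, key=lambda i: -chains[i].get("duplicate_count", 0))[:2]

    return {"VJ": ranked(VJ_LOCI), "VDJ": ranked(VDJ_LOCI)}


# All chains of a single cell, in arbitrary order.
cell = [
    {"locus": "TRB", "junction_aa": "CASSLG", "duplicate_count": 5},
    {"locus": "TRA", "junction_aa": "CAVSDF", "duplicate_count": 2},
    {"locus": "TRA", "junction_aa": "CAASGG", "duplicate_count": 9},
]

indices = index_chains_sketch(cell)
# Look up a field of the primary VJ chain through the stored index.
primary_vj = cell[indices["VJ"][0]]["junction_aa"]
```

Keeping the indices separate from the records means the chain selection can be recomputed with different criteria without touching the imported AIRR data.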

New class:

  • util.DataHandler

Updated example datasets

The example datasets have been updated to use the new data structure and are now based on MuData.

  • The example datasets have been regenerated from scratch using the loader notebooks described in the docstring. The Maynard dataset gene expression is now based on values generated with Salmon instead of RSEM/featurecounts.
  • Scirpy now uses pooch to manage example datasets.

Cleanup

  • Removed the deprecated functions io.from_tcr_objs, io.from_ir_objs, io.to_ir_objs, pp.merge_with_tcr, pp.tcr_neighbors, pp.ir_neighbors, tl.chain_pairing
  • Removed the deprecated classes TcrCell, AirrChain, TcrChain
  • Removed the function pl.cdr_convergence which was never public anyway.

Additions

Easy-access functions (scirpy.get)

Closes issue #184

New functions:

  • get.airr
  • get.obs_context
  • get.airr_context

Fixes

  • Several type hints that were previously inaccurate are now updated.
  • Fix x-axis labelling in pl.clonotype_overlap raises an error if row annotations are not unique for each group.

Documentation

The documentation has been updated to reflect the changes described above, in particular the tutorials and the page about the data structure.

Other changes

  • The minimum required Python version is now 3.8 (#381)
  • Increased the minimum version of tqdm to 4.63 (See tqdm/tqdm#1082)
  • pl.repertoire_overlap now always runs tl.repertoire_overlap internally and doesn't rely on cached values.
  • The mode dendro_only in pl.repertoire_overlap has been removed.
  • Cells that have a receptor but no CDR3 sequence previously received a separate clonotype in tl.define_clonotypes. Now they receive no clonotype (i.e. np.nan), as do cells without a receptor.
  • The function tl.clonal_expansion now returns a pd.Series instead of a np.array with inplace=False
  • Removed deprecation for clonotype_imbalanced, see #330
  • The group_abundance tool and plotting function used has_ir as a default group as we could previously rely on this column being present. With the new data structure, this is not the case. To not break old code, the has_ir column is temporarily added when requested. The group_abundance function will have to be rewritten entirely in the future, see #232
  • In pl.spectratype, the parameter groupby has been replaced by chain.
  • We now use isort to organize imports.
  • Static typing has been improved internally (using pylance). It's not perfectly consistent yet, but we will keep working on this in the future.
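
The changed handling of cells without a CDR3 sequence, described in the list above, can be illustrated with a small sketch. It is not the algorithm used by tl.define_clonotypes: the helper `assign_clonotypes_sketch` is hypothetical and groups cells by exact CDR3 identity only.

```python
import math
from typing import Dict, List, Optional, Union


def assign_clonotypes_sketch(
    cdr3_per_cell: List[Optional[str]],
) -> List[Union[str, float]]:
    """Assign one clonotype label per cell, or NaN if no CDR3 is available."""
    ids: Dict[str, str] = {}
    result: List[Union[str, float]] = []
    for cdr3 in cdr3_per_cell:
        if not cdr3:
            # None stands for "no receptor", "" for "receptor without CDR3".
            # Both cases get no clonotype instead of a clonotype of their own.
            result.append(math.nan)
            continue
        if cdr3 not in ids:
            ids[cdr3] = str(len(ids))
        result.append(ids[cdr3])
    return result


clonotypes = assign_clonotypes_sketch(["CASSLG", None, "", "CASSLG", "CASSPT"])
```

Here the first and fourth cell share a clonotype label, while the second and third cell both end up with NaN.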

v0.12.1

07 Apr 06:05
c56897c

Fixes

  • Bump min Python version to 3.8; CI update by @grst in #381
  • Temporarily pin pandas < 2 in #390

Other Changes

  • update pre-commit CI

v0.12.0

27 Jan 13:50
ba45166

New Features

  • Download IEDB and process it into an AnnData object by @ausserh in #377

Fixes

Documentation

Internal changes

New Contributors

Full Changelog: v0.11.2...v0.12.0

v0.11.2

20 Nov 12:36
5b381ad

Fixes

  • Excluded broken python-igraph version (#366)

v0.11.1

18 Aug 12:56
831f817

Fixes

  • Solve incompatibility with scipy v1.9.0 (#360)

Internal changes

  • do not autodeploy docs via CI (currently broken)
  • updated patched version of scikit-learn

v0.11.0

05 Jul 09:55
2c23901

Additions

  • Add data loader for BD Rhapsody single-cell immune-cell receptor data (io.read_bd_rhapsody) (#351)

Fixes

  • Fix type conversions in from_dandelion (#349).
  • Update minimal dandelion version

Documentation

Internal changes

v0.10.1

22 Nov 09:43
c8cf8e9
Compare
Choose a tag to compare

Fixes

  • Fix bug in cellranger import (#310 by @ddemaeyer)
  • Fix that VDJDB download failed when cache dir was not present (#311)

v0.10.0

15 Nov 19:51
76233c5

Additions

This release adds a new feature to query reference databases (#298) comprising

  • an extension of pp.ir_dist to compute distances to a reference dataset,
  • tl.ir_query, to match immune receptors to a reference database based on the distances computed with ir_dist,
  • tl.ir_query_annotate and tl.ir_query_annotate_df to annotate cells based on the result of tl.ir_query, and
  • datasets.vdjdb which conveniently downloads and processes the latest version of VDJDB.
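
Conceptually, such a query matches each receptor to the reference entries that lie within a distance cutoff and then transfers the reference annotation. The sketch below illustrates this idea in plain Python with a Hamming distance. It does not call scirpy, and the function names, the cutoff and the toy reference entries are made up for illustration.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def query_reference_sketch(
    queries: List[str], reference: Dict[str, str], cutoff: int = 1
) -> Dict[str, List[str]]:
    """Map each query sequence to the sorted labels of all close reference entries."""
    return {
        query: sorted(
            {
                label
                for seq, label in reference.items()
                # Simplification: sequences of different length never match.
                if len(seq) == len(query) and hamming(query, seq) <= cutoff
            }
        )
        for query in queries
    }


# Toy reference: CDR3 amino-acid sequence -> annotation (invented values).
reference = {"CASSLG": "antigen_A", "CASSPT": "antigen_B"}

matches = query_reference_sketch(["CASSLG", "CASSLT", "CAVSDF"], reference)
```

With a cutoff of one mismatch, the first query matches only its identical reference entry, the second matches both entries, and the third matches none.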

Fixes

  • Bump minimal dependencies for networkx and tqdm (#300)
  • Fix issue with repertoire_overlap (Fix #302 via #305)
  • Fix issue with define_clonotype_clusters (Fix #303 via #305)
  • Suppress FutureWarnings from pandas in tutorials (#307)

Internal changes

  • Update sphinx to >= 4.1 (#306)
  • Update black version
  • Update the internal folder structure: tl, pp etc. are now real packages instead of aliases