Releases: scverse/scirpy
v0.13.0 - new data structure based on awkward arrays
This release introduces a new way to store AIRR data in AnnData.obsm
using awkward arrays.
This change entails several backwards-incompatible changes to the scirpy workflow.
- Please read the release notes for more details.
- For more information about the new data structure, please see the respective section in the documentation.
v0.12.2
v0.13.0rc1 - new data structure based on awkward arrays
This update introduces a new data structure based on awkward arrays.
The new data structure is described in more detail in the documentation and is considered the "official" way of representing AIRR data for scverse core and ecosystem packages.
Benefits of the new data structure include:
- a more natural, lossless representation of AIRR Rearrangement data
- separation of AIRR data and the receptor model, thereby getting rid of previous limitations (e.g. "only productive chains") and enabling other use-cases (e.g. spatial AIRR data) in the future.
- clean adata.obs as AIRR data is not expanded into columns
- support for MuData for working with paired gene expression and AIRR data as separate modalities.
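The ragged, per-cell layout that these benefits rely on can be pictured with plain Python lists. This is only a conceptual stand-in, not scirpy's implementation: the real structure is an awkward array stored in adata.obsm. The field names follow the AIRR Rearrangement standard, and the sequences are made-up illustration values.

```python
# Conceptual stand-in for a ragged (awkward-array-like) AIRR structure.
# Outer list: one entry per cell. Inner list: the chains found in that cell.
airr = [
    [  # cell 0: two productive chains
        {"locus": "TRA", "junction_aa": "CAVSDLEPNSSASKIIF", "productive": True},
        {"locus": "TRB", "junction_aa": "CASSLGQAYEQYF", "productive": True},
    ],
    [],  # cell 1: no receptor detected
    [  # cell 2: one productive and one non-productive chain
        {"locus": "TRB", "junction_aa": "CASSPGTGGYEQYF", "productive": True},
        {"locus": "TRB", "junction_aa": None, "productive": False},
    ],
]

# The raw data keeps every chain, so cells can have different chain counts.
chains_per_cell = [len(chains) for chains in airr]

# A receptor model (here: "keep productive chains only") is applied as a
# separate step on top of the lossless data instead of at import time.
productive_loci = [
    [chain["locus"] for chain in chains if chain["productive"]]
    for chains in airr
]
```

Because filtering happens after import, a different receptor model can be applied later without re-reading the data.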
The overall workflow stays the same; however, this update required several backwards-incompatible changes, which are summarized below.
Backwards-incompatible changes
New data structure
Closes issue #327.
Changed behavior:
- There are no "has_ir" and "multichain" columns in adata.obs anymore.
- By default all fields are imported from AIRR rearrangement and 10x data.
- The restriction that all chains added to an AirrCell must have the same fields has been removed. Missing fields are automatically filled with missing values.
- io.upgrade_schema can update from v0.7 to v0.13 schema. AnnData objects generated with scirpy <= 0.6.x cannot be read anymore.
- pl.spectratype now has a chain attribute and the meaning of the cdr3_col attribute has changed.
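To illustrate what "missing fields are automatically filled with missing values" means in practice, here is a minimal sketch in plain Python. The helper harmonize_fields is hypothetical and is not part of scirpy; the chain dictionaries and their values are made up for illustration.

```python
def harmonize_fields(chains, fill_value=None):
    """Return copies of `chains` that all share the union of their field names.

    Hypothetical helper for illustration only; fields a chain does not
    define are filled with `fill_value`.
    """
    all_fields = []
    for chain in chains:
        for field in chain:
            if field not in all_fields:
                all_fields.append(field)  # keep first-seen order
    return [
        {field: chain.get(field, fill_value) for field in all_fields}
        for chain in chains
    ]


# Two chains with different sets of fields, e.g. from different sources.
chains = [
    {"locus": "TRA", "junction_aa": "CAVSDLEPNSSASKIIF"},
    {"locus": "TRB", "consensus_count": 12},
]
harmonized = harmonize_fields(chains)
```

After harmonizing, every chain exposes the same fields, so downstream code can access any field without checking whether it exists.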
New functions:
pp.index_chains
pp.merge_chains
Removed functions:
pp.merge_with_ir
pp.merge_airr_chains
API supporting MuData
Closes issue #383
All functions take (where applicable) the additional, optional keyword arguments:
- airr_mod: the modality in MuData that contains AIRR information (default: "airr")
- airr_key: the slot in adata.obsm that contains AIRR rearrangement data (default: "airr")
- chain_idx_key: the slot in adata.obsm that contains indices specifying which chains in adata.obsm[airr_key] are the primary/secondary chains etc.
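The way airr_mod and airr_key jointly locate the AIRR data can be sketched with nested dictionaries standing in for MuData and AnnData objects. The function get_airr_slot and the dictionary layout are assumptions made for this illustration; they do not reproduce scirpy's actual lookup code.

```python
def get_airr_slot(data, airr_mod="airr", airr_key="airr"):
    """Hypothetical lookup of the AIRR slot in a MuData-like or AnnData-like dict.

    If `data` has modalities (MuData-like), the modality named `airr_mod`
    is selected first; the AIRR data is then read from obsm[airr_key].
    """
    if "mod" in data:
        data = data["mod"][airr_mod]
    return data["obsm"][airr_key]


# AnnData-like stand-in: AIRR data lives directly in obsm.
adata_like = {"obsm": {"airr": ["chains of cell 0", "chains of cell 1"]}}

# MuData-like stand-in: gene expression and AIRR data are separate modalities.
mdata_like = {"mod": {"gex": {"obsm": {}}, "airr": adata_like}}

from_adata = get_airr_slot(adata_like)
from_mdata = get_airr_slot(mdata_like)
```

With defaults like these, the same call works for both container types, and the keyword arguments only need to be set when the data is stored under non-default names.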
New class:
util.DataHandler
Updated example datasets
The example datasets have been updated to the new data structure and are now based on MuData.
- The example datasets have been regenerated from scratch using the loader notebooks described in the docstring. The Maynard dataset gene expression is now based on values generated with Salmon instead of RSEM/featurecounts.
- Scirpy now uses pooch to manage example datasets.
Cleanup
- Removed the deprecated functions io.from_tcr_objs, io.from_ir_objs, io.to_ir_objs, pp.merge_with_tcr, pp.tcr_neighbors, pp.ir_neighbors, tl.chain_pairing
- Removed the deprecated classes TcrCell, AirrChain, TcrChain
- Removed the function pl.cdr_convergence which was never public anyway.
Additions
Easy-access functions (scirpy.get)
Closes issue #184
New functions:
get.airr
get.obs_context
get.airr_context
Fixes
- Several type hints that were previously inaccurate are now updated.
- Fix x-axis labelling in pl.clonotype_overlap raises an error if row annotations are not unique for each group.
Documentation
The documentation has been updated to reflect the changes described above, in particular the tutorials and the page about the data structure.
Other changes
- The minimum required Python version is now 3.8 (#381)
- Increased the minimum version of tqdm to 4.63 (See tqdm/tqdm#1082)
- pl.repertoire_overlap now always runs tl.repertoire_overlap internally and doesn't rely on cached values.
- The mode dendro_only in pl.repertoire_overlap has been removed.
- Cells that have a receptor but no CDR3 sequence previously received a separate clonotype in tl.define_clonotypes. Now they receive no clonotype (i.e. np.nan), as do cells without a receptor.
- The function tl.clonal_expansion now returns a pd.Series instead of a np.array with inplace=False
- Removed deprecation for clonotype_imbalanced, see #330
- The group_abundance tool and plotting function used has_ir as a default group as we could previously rely on this column being present. With the new data structure, this is not the case. To not break old code, the has_ir column is temporarily added when requested. The group_abundance function will have to be rewritten entirely in the future, see #232
- In pl.spectratype, the parameter groupby has been replaced by chain.
- We now use isort to organize imports.
- Static typing has been improved internally (using pylance). It's not perfectly consistent yet, but we will keep working on this in the future.
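The changed handling of cells without a CDR3 sequence can be illustrated with a toy clonotype assignment in plain Python. This is a deliberately simplified sketch under stated assumptions: assign_clonotypes is a hypothetical helper, it groups cells by one identical CDR3 string only, and it is not how tl.define_clonotypes is implemented. The sequences are made-up illustration values.

```python
import math


def assign_clonotypes(cdr3_per_cell):
    """Toy clonotype assignment: cells with identical CDR3 share an ID.

    Cells without a CDR3 sequence (None), whether or not they have a
    receptor, receive NaN instead of a clonotype of their own.
    """
    clonotype_ids = {}
    result = []
    for cdr3 in cdr3_per_cell:
        if cdr3 is None:
            result.append(math.nan)
        else:
            # Reuse the ID of a known sequence or hand out the next free one.
            result.append(clonotype_ids.setdefault(cdr3, len(clonotype_ids)))
    return result


clonotypes = assign_clonotypes(
    ["CASSLGQAYEQYF", None, "CASSLGQAYEQYF", "CASSPGTGGYEQYF"]
)
```

Code that counts clonotypes should therefore skip missing values explicitly, for example with math.isnan or the NaN-aware methods of pandas.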
v0.12.1
v0.12.0
New Features
Fixes
Documentation
Internal changes
New Contributors
Full Changelog: v0.11.2...v0.12.0
v0.11.2
v0.11.1
v0.11.0
Additions
- Add data loader for BD Rhapsody single-cell immune-cell receptor data (io.read_bd_rhapsody) (#351)
Fixes
- Fix type conversions in from_dandelion (#349).
- Update minimal dandelion version
Documentation
- Rebranding to scverse (#324, #326)
- Add issue templates
- Fix IMGT typos (#344 by @emjbishop)
Internal changes
- Bump default CI python version to 3.9
- Use patched version of scikit-bio in CI until scikit-bio/scikit-bio#1813 gets merged
v0.10.1
v0.10.0
Additions
This release adds a new feature to query reference databases (#298) comprising
- an extension of pp.ir_dist to compute distances to a reference dataset,
- tl.ir_query, to match immune receptors to a reference database based on the distances computed with ir_dist,
- tl.ir_query_annotate and tl.ir_query_annotate_df to annotate cells based on the result of tl.ir_query, and
- datasets.vdjdb which conveniently downloads and processes the latest version of VDJDB.
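The three steps listed above (compute distances, match against the reference, annotate) can be sketched as a toy example in plain Python. It assumes a Hamming distance with a cutoff of 1 and made-up sequences and annotation labels; the function names hamming and query_reference are hypothetical, and scirpy's own distance metrics and matching logic may differ.

```python
def hamming(a, b):
    """Number of mismatching positions, or None if the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def query_reference(queries, reference, cutoff=1):
    """Collect the annotations of all reference sequences within `cutoff`."""
    matches = {}
    for query in queries:
        hits = []
        for ref_seq, annotation in reference.items():
            dist = hamming(query, ref_seq)  # step 1: distance to reference
            if dist is not None and dist <= cutoff:  # step 2: match
                hits.append(annotation)  # step 3: annotate
        matches[query] = hits
    return matches


# Made-up reference entries mapping a CDR3 sequence to an annotation.
reference = {"CASSLGQAYEQYF": "epitope A", "CASSPGTGGYEQYF": "epitope B"}
queries = ["CASSLGQAYEQYF", "CASSLGQAYEQFF", "CAVSDLEPNSSASKIIF"]
matches = query_reference(queries, reference)
```

Keeping the distance computation separate from the matching step mirrors the split between ir_dist and ir_query: the cutoff or metric can be changed without touching the annotation logic.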
Fixes
- Bump minimal dependencies for networkx and tqdm (#300)
- Fix issue with repertoire_overlap (Fix #302 via #305)
- Fix issue with define_clonotype_clusters (Fix #303 via #305)
- Suppress FutureWarnings from pandas in tutorials (#307)
Internal changes
- Update sphinx to >= 4.1 (#306)
- Update black version
- Update the internal folder structure: tl, pp etc. are now real packages instead of aliases