Implement scverse datastructure (#356)
* Create Awkward AnnData instead of putting everything in obs

* add todo

* Get chain indices for primary and secondary chains

* WIP get module

* Implement ir.get.airr

* Clean up AirrCell

* WIP restructure IO module

* fix imports

* Add helper function for unit tests

* tl.chain_qc successfully runs on the new datastructure

* Update convert anndata

* switch to obsm-based data structure

* update get module

* Update anndata schema check and _make_adata util function.

* fix _make_adata

* update fixtures

* Fix a couple of tests

* Re-add to_airr_cells

* Fix couple more tests

* Fix more IO tests

* More IO tests [skip ci]

* Cleanup has_ir

* WIP fix clonotype neighbors [skip ci]

* WIP fix distance tests

* WIP fix clonotype cluster tests

* Fix spectratype functions [skip ci]

* Fix more tests

* Fix IR dist tests [skip ci]

* Fix tests for ir dist

* Fix spectratype test [skip ci]

* Tests for new upgrade_schema function [skip ci]

* Workaround for group_abundance plot without has_ir column

* Cleanup has_ir

* Clean multi_chain [skip ci]

* stub new index_chains function

* WIP index_chain function [skip ci]

* Add stub test for index_chains

* Stub second test for index_chains

* Complete second test for index_chains [skip ci]

* index_chains tests

* Update target version to v0.13 [skip ci]

* add isort and autoflake

* Fix circular import

* Fix multichain handling (implement get._has_ir)

* re-add fixtures

* isort on tests [skip ci]

* fix remaining IO tests

* update todo flags [skip ci]

* _is_na input sanitization already in AirrCell module [skip ci]

By doing so, we can get rid of multiple todos.

* Fix issue with plotting; get rid of merge_with_ir [skip ci]

* Remove test for merge_with_ir [skip ci]

* Ensure consistent ordering of chains in merge_airr

* Complete unit tests for merge_airr [skip ci]

* Use pre-commit.ci for black formatting

* Bump minimum python version to 3.8

* Bump minimum python version to 3.8

* bump python version in CI tests

* update imports of Literal

* update pre-commit config [skip ci]

* fix compat

* WIP new chain_indices format

* Fix get module

* WIP fix tests

* Fix tests [skip ci]

* Fix dandelion tests

* Update workflow tests

* update min anndata version

* Deprecate include_fields parameter and pass kwargs to from_airr_cells in IO

* WIP update example datasets

* update wu dataset generation

* Update wu2020 dataset to mudata (preliminary)

* First attempt to make tutorial work with mudata

* fix issue with slicing awkward array when slice mask is empty

* Change clonotype calling behavior for missing cdr3 sequences

Previously, cells that had a receptor but no sequence were treated
differently from cells without a receptor: the former were assigned to a
separate clonotype, while the latter got the clonotype `nan`.

Now, cells without a sequence are also assigned the clonotype `nan`.

In practice, this shouldn't have affected many people, because IO already
ensured that only chains with a sequence are imported.

* fix awkward type conversion in index_chains

* Get rid of tqdm workaround which is not needed anymore

* Update API in tutorial to what it *should* look like in the future

* Stub parameter validation [skip ci]

* implement params check class

* update API docs

* Apply new params check to first function

* document params check

* Remove anndata version check decorators

* Restructure to fix circular import [skip ci]

* Unit tests for params check

* Fix notebook pairing

* Params check in index_chains

* update ir_dist with paramscheck [skip ci]

* Apply pre-commit hooks to all files [skip ci]

* Refactor ParamsCheck class

* Refactor chain_qc

* WIP implement param checks

* Update type hints

* Improve _ParamsCheck class [skip ci]

* Fix typing in a couple of files.

* Iterate on tutorial [skip ci]

* Iterate on tutorial

* Rename _ParamsCheck to DataHandler

* Implement get_obs in DataHandler

* WIP fix clonotype_network

* Fix clonotype_network plot [skip ci]

* Update clonal_expansion

* Fix alpha diversity

* Fix repertoire overlap and spectratype

* Fix clonotype modularity

* Fix ir_query [skip ci]

* Fix clonotype convergence

* Fix clonotype imbalance

* Fix clonotype imbalance

* Update processing scripts for Wu2020

* Update maynard loading script

* disable check for same fields in AirrCell [skip ci]

* Update maynard processing script

* WIP tests with mudata

* Update example datasets [skip ci]

Use pooch to manage datasets.

* Fix test for clonotype convergence

* Experimental: use wrapper class for fixture

* Remove outdated TODO statements

* Revert "Experimental: use wrapper class for fixture"

This reverts commit ddf5718.

* Implement inplace logic in DataHandler

* Parametrize fixtures to represent both AnnData and MuData [skip ci]

* Use DataHandler to write results to obs.

* WIP fix tests

* Fix _get_colors [skip ci]

* Fix tests

* Fix test_get_color

* Implement context managers in `get` module

* Fix clustermap

* Fix normalize in spectratype

* Tutorial again complete 🎉

* Fix some open TODOs

* Add tests for get context managers

* update datasets module

* Remove function cdr_convergence, which was never publicly documented anyway

* Update some docstrings

* remove erroneous import [skip ci]

* WIP update docs

* Update usage principles and data structure

* Update MuData section [skip ci]

* WIP update IO tutorial

* Update IO tutorial

* Update datastructure section with info about single AnnData object

* Update main tutorial

* Update API docs page

* Minor doc amendments

* WIP update docstrings

* Fix docstrings

* Fix TODOs

* Fix sphinx warnings

* update isort

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* constrain pandas

* Pandas workarounds

* Revert "Pandas workarounds"

This reverts commit 6e19241.

* pandas version

* Fix problem with color by gene in clonotype_network

* fix missing import in datasets

* cancel previous CI jobs automatically

* test ci

* Concurrency should be outside 'jobs'

* test ci

* Update dependencies

* Update conda dependencies

Will fail, because anndata 0.9rc1 is not on conda.
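
The change in clonotype calling described in the commit message above can be illustrated with a small sketch. This is not scirpy's implementation: `assign_clonotypes` is a hypothetical helper, and "same set of CDR3 sequences" is a deliberately simplified stand-in for the real clonotype definition. It only shows that cells without a receptor and cells whose chains carry no sequence now end up with the same `nan` label.

```python
import math


def assign_clonotypes(cells):
    """Assign a clonotype id per cell (hypothetical helper, simplified rule).

    `cells` is a list with one entry per cell; each entry is a list of CDR3
    sequences, where `None` marks a chain without a sequence.
    """
    ids = {}  # tuple of sequences -> clonotype id
    result = []
    for chains in cells:
        # Ignore chains that have no sequence.
        seqs = tuple(sorted(s for s in chains if s))
        if not seqs:
            # No receptor at all, or a receptor without any sequence: both get nan.
            result.append(math.nan)
            continue
        result.append(ids.setdefault(seqs, len(ids)))
    return result


labels = assign_clonotypes([["CASSLG"], [None], [], ["CASSLG"]])
print(labels)
```

The second cell (receptor, no sequence) and the third cell (no receptor) are treated identically, which is the behavior the commit message describes as new.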

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
grst and pre-commit-ci[bot] authored Apr 7, 2023
1 parent c56897c commit d8ec147
Showing 91 changed files with 4,588 additions and 3,260 deletions.
15 changes: 9 additions & 6 deletions .conda/meta.yaml
@@ -16,30 +16,33 @@ build:

requirements:
host:
- python >=3.7
- python >=3.8
- pip!=22.1 # https://github.com/pypa/pip/issues/11110
- flit
- setuptools_scm
- pytoml
- importlib_metadata

run:
- python >=3.7
- anndata >=0.7.6
- scanpy >=1.6.0
- python >=3.8
- anndata >=0.9rc1
- awkward >=2.1.0
- mudata >=0.2.2
- scanpy >=1.9.3
- pandas >=1.5,<2
- numpy >=1.17.0
- scipy
- parasail-python
- scikit-learn
- python-levenshtein
- python-igraph !=0.10.0,!=0.10.1
- adjusttext >=0.7
- networkx >=2.5
- squarify
- tqdm >=4.44.1
- airr >=1.2
- tqdm >=4.63
- adjusttext >=0.7
- numba >=0.41.0
- pooch >=1.7.0

test:
source_files:
4 changes: 4 additions & 0 deletions .github/workflows/conda.yml
@@ -6,6 +6,10 @@ on:
pull_request:
branches: [master]

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
tests:
if: "!contains(github.event.head_commit.message, 'skip ci')"
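
The `concurrency` block added in this hunk groups runs by workflow name and git ref and, with `cancel-in-progress: true`, cancels a run that is still in progress when a newer run starts in the same group. The Python sketch below models only that rule; the `Scheduler` class, the run ids and the example ref are hypothetical and have nothing to do with GitHub's actual implementation.

```python
class Scheduler:
    """Toy model of a concurrency group with cancel-in-progress (hypothetical)."""

    def __init__(self):
        self.running = {}    # group name -> id of the run currently in progress
        self.cancelled = []  # ids of runs that were superseded by a newer run

    def start(self, workflow, ref, run_id):
        # Mirrors `group: ${{ github.workflow }}-${{ github.ref }}`.
        group = f"{workflow}-{ref}"
        previous = self.running.get(group)
        if previous is not None:
            # A run in the same group is still in progress: cancel it.
            self.cancelled.append(previous)
        self.running[group] = run_id
        return group


scheduler = Scheduler()
scheduler.start("tests", "refs/pull/356/merge", run_id=1)
scheduler.start("tests", "refs/pull/356/merge", run_id=2)  # supersedes run 1
scheduler.start("docs", "refs/pull/356/merge", run_id=3)   # different group
print(scheduler.cancelled)
```

Because the workflow name is part of the group key, a push to the same ref cancels only the older run of the same workflow, not runs of other workflows.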
15 changes: 9 additions & 6 deletions .github/workflows/docs.yml
@@ -8,6 +8,10 @@ on:
release:
types: [created]

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
docs:
if: "!contains(github.event.head_commit.message, 'skip ci')"
@@ -17,9 +21,10 @@ jobs:
matrix:
python-version: [3.9]
os:
- ubuntu-latest
# - macos-latest
- windows-latest
- ubuntu-latest
# - macos-latest
- windows-latest

steps:
- uses: actions/checkout@v2
with:
@@ -67,9 +72,7 @@ jobs:
pip install .[doc,test,rpack,dandelion]
- name: run sphinx
run: |
# cd docs && make html SPHINXOPTS="-W --keep-going"
# TODO do not ignore sphinx warnings
cd docs && make html
cd docs && make html SPHINXOPTS="-W --keep-going"
- name: Get target folder for page deploy from github ref
if: ( matrix.os == 'ubuntu-latest' ) && ( matrix.python-version == '3.8' )
4 changes: 4 additions & 0 deletions .github/workflows/test.yml
@@ -8,6 +8,10 @@ on:
schedule:
- cron: "0 5 * * 0"

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
test:
if: "!contains(github.event.head_commit.message, 'skip ci')"
17 changes: 17 additions & 0 deletions .pre-commit-config.yaml
@@ -4,3 +4,20 @@ repos:
hooks:
- id: black
language_version: python3.10
- repo: https://github.com/PyCQA/isort
rev: 5.12.0
hooks:
- id: isort
- repo: https://github.com/myint/autoflake
rev: v1.4
hooks:
- id: autoflake
args:
- --in-place
- --remove-all-unused-imports
- --remove-unused-variable
- --ignore-init-module-imports
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: check-merge-conflict
2 changes: 1 addition & 1 deletion README.rst
@@ -50,7 +50,7 @@ The case study from our paper is available `here <https://icbi-lab.github.io/sci

Installation
^^^^^^^^^^^^
You need to have Python 3.7 or newer installed on your system. If you don't have
You need to have Python 3.8 or newer installed on your system. If you don't have
Python installed, we recommend installing `Miniconda <https://docs.conda.io/en/latest/miniconda.html>`_.

There are several alternative options to install scirpy:
41 changes: 35 additions & 6 deletions docs/api.rst
@@ -1,3 +1,5 @@
.. _api:

API
===

@@ -20,10 +22,17 @@ Input/Output: `io`
.. currentmodule:: scirpy

.. note::
In scirpy v0.7.0 the way VDJ data is stored in `adata.obs` has changed to
be fully compliant with the `AIRR Rearrangement <https://docs.airr-community.org/en/latest/datarep/rearrangements.html#productive>`__
schema. Please use :func:`~scirpy.io.upgrade_schema` to make `AnnData` objects
from previous scirpy versions compatible with the most recent scirpy workflow.
**scirpy's data structure has been updated in v0.13.0.**

Previously, receptor data was expanded into columns of `adata.obs`; now it is stored as an :term:`awkward array` in `adata.obsm["airr"]`.
Moreover, we now use :class:`~mudata.MuData` to handle paired transcriptomics and :term:`AIRR` data.

:class:`~anndata.AnnData` objects created with older versions of scirpy can be upgraded with :func:`scirpy.io.upgrade_schema` to be compatible with the latest version of scirpy.

Please check out

* the `release notes <https://github.com/scverse/scirpy/releases/tag/v0.13.0>`_ for details about the changes and
* the documentation about :ref:`Scirpy's data structure <data-structure>`

.. autosummary::
:toctree: ./generated
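
To make the layout change described in the note above concrete, the following plain-Python sketch contrasts the two shapes. It is an illustration only: dictionaries and lists stand in for `adata.obs` and for the awkward array in `adata.obsm["airr"]`, the column names and sequences are example values, and `to_ragged` is a hypothetical helper that does not reflect how :func:`scirpy.io.upgrade_schema` works internally.

```python
# Old layout (illustrative): one flat column per chain slot and field.
old_obs = {
    "cell1": {"IR_VJ_1_junction_aa": "CAVRDF", "IR_VDJ_1_junction_aa": "CASSLG"},
    "cell2": {"IR_VJ_1_junction_aa": None, "IR_VDJ_1_junction_aa": "CASSPT"},
    "cell3": {"IR_VJ_1_junction_aa": None, "IR_VDJ_1_junction_aa": None},
}


def to_ragged(obs):
    """Collect the non-missing chain slots of each cell into a variable-length list."""
    ragged = {}
    for cell, columns in obs.items():
        chains = []
        for name, value in columns.items():
            if value is None:
                continue  # empty slots simply disappear in the ragged layout
            # "IR_VJ_1_junction_aa" -> arm "VJ", rank "1", field "junction_aa"
            _, arm, rank, field = name.split("_", 3)
            # "slot" is kept here only for readability of the example.
            chains.append({"slot": f"{arm}_{rank}", field: value})
        ragged[cell] = chains
    return ragged


airr = to_ragged(old_obs)
print({cell: len(chains) for cell, chains in airr.items()})
```

In the flat layout every cell carries every column, mostly filled with missing values; in the ragged layout each cell holds exactly as many chain records as it has chains.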
@@ -37,6 +46,7 @@ formats.
.. autosummary::
:toctree: ./generated

io.read_h5mu
io.read_h5ad
io.read_10x_vdj
io.read_tracer
@@ -75,10 +85,25 @@ Preprocessing: `pp`
.. autosummary::
:toctree: ./generated

pp.merge_with_ir
pp.merge_airr_chains
pp.index_chains
pp.merge_airr
pp.ir_dist

Get: `get`
----------

The `get` module allows retrieving :term:`AIRR` data stored in `adata.obsm["airr"]` as a per-cell :class:`~pandas.DataFrame`
or :class:`~pandas.Series`.

.. module:: scirpy.get
.. currentmodule:: scirpy

.. autosummary::
:toctree: ./generated

get.airr
get.obs_context
get.airr_context

Tools: `tl`
-----------
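
The `get` section added in this hunk describes retrieving per-cell values from the nested AIRR data, and the commit message mentions context managers in the same module. The sketch below shows the general idea in plain Python. Everything in it is an assumption made for illustration: `get_airr` and `temporary_columns` are hypothetical stand-ins rather than the signatures of `get.airr`, `get.obs_context` or `get.airr_context`, and lists of dictionaries replace the awkward array and the chain indices.

```python
from contextlib import contextmanager

# Hypothetical stand-ins: one list of chain records per cell ...
airr = [
    [{"locus": "TRA", "junction_aa": "CAVRDF"}, {"locus": "TRB", "junction_aa": "CASSLG"}],
    [{"locus": "TRB", "junction_aa": "CASSPT"}],
    [],
]
# ... and, per cell, the positions of the VJ and VDJ chains within that list.
chain_indices = [
    {"VJ": [0], "VDJ": [1]},
    {"VJ": [], "VDJ": [0]},
    {"VJ": [], "VDJ": []},
]


def get_airr(variable, chain):
    """Return one value per cell for a chain such as "VJ_1"; None if the chain is absent."""
    kind, rank = chain.rsplit("_", 1)
    pos = int(rank) - 1
    values = []
    for records, idx in zip(airr, chain_indices):
        chosen = idx[kind]
        values.append(records[chosen[pos]][variable] if pos < len(chosen) else None)
    return values


@contextmanager
def temporary_columns(obs, columns):
    """Add columns to `obs` inside the block and remove them afterwards.

    Assumes the new names do not clash with existing columns.
    """
    obs.update(columns)
    try:
        yield obs
    finally:
        for name in columns:
            obs.pop(name, None)


obs = {"sample": ["A", "A", "B"]}
with temporary_columns(obs, {"VDJ_1_junction_aa": get_airr("junction_aa", "VDJ_1")}) as tmp:
    inside = dict(tmp)  # the extra column is visible only inside the block
print(sorted(obs))
```

Keeping the derived column temporary means plotting or grouping code can use it like any other per-cell annotation without permanently duplicating the receptor data.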
@@ -211,6 +236,9 @@ Datasets: `datasets`
.. module:: scirpy.datasets
.. currentmodule:: scirpy

Example datasets
^^^^^^^^^^^^^^^^

.. autosummary::
:toctree: ./generated

Expand Down Expand Up @@ -241,6 +269,7 @@ Utility functions: `util`
.. autosummary::
:toctree: ./generated

util.DataHandler
util.graph.layout_components
util.graph.layout_fr_size_aware
util.graph.igraph_from_sparse_matrix
11 changes: 8 additions & 3 deletions docs/conf.py
@@ -76,6 +76,10 @@
sklearn=("https://scikit-learn.org/stable/", None),
networkx=("https://networkx.org/documentation/networkx-1.10/", None),
dandelion=("https://sc-dandelion.readthedocs.io/en/latest/", None),
muon=("https://muon.readthedocs.io/en/latest", None),
mudata=("https://mudata.readthedocs.io/en/latest/", None),
awkward=("https://awkward-array.org/doc/main/", None),
pooch=("https://www.fatiando.org/pooch/latest/", None),
)


@@ -130,7 +134,8 @@ def setup(app):
("py:class", "D.get(k,d), also set D[k]=d if k not in D"),
("py:class", "None. Update D from mapping/iterable E and F."),
("py:class", "an object providing a view on D's values"),
# Will work once scipy 1.8 is released
("py:class", "scipy.sparse.base.spmatrix"),
("py:class", "scipy.sparse.csr.csr_matrix"),
# don't know why these are not working
("py:class", "seaborn.matrix.ClusterGrid"),
("py:meth", "mudata.MuData.update"),
("py:class", "awkward.highlevel.Array"),
]