Implement scverse datastructure (#356)

* Create Awkward AnnData instead of putting everything in obs
* add todo
* Get chain indices for primary and secondary chains
* WIP get module
* Implement ir.get.airr
* Clean up AirrCell
* WIP restructure IO module
* fix imports
* Add helper function for unit tests
* tl.chain_qc successfully runs on the new datastructure
* Update convert anndata
* switch to obsm-based data structure
* update get module
* Update anndata schema check and _make_adata util function.
* fix _make_adata
* update fixtures
* Fix a couple of tests
* Re-add to_airr_cells
* Fix couple more tests
* Fix more IO tests
* More IO tests [skip ci]
* Cleanup has_ir
* WIP fix clonotype neighbors [skip ci]
* WIP fix distance tests
* WIP fix clonotype cluster tests
* Fix spectratype functions [skip ci]
* Fix more tests
* Fix IR dist tests [skip ci]
* Fix tests for ir dist
* Fix spectratype test [skip ci]
* Tests for new upgrade_schema function [skip ci]
* Workaround for group_abundance plot without has_ir column
* Cleanup has_ir
* Clean multi_chain [skip ci]
* stub new index_chains function
* WIP index_chain function [skip ci]
* Add stub test for index_chains
* Stub second test for index_chains
* Complete second test for index_chains [skip ci]
* index_chains tests
* Update target version to v0.13 [skip ci]
* add isort and autoflake
* Fix circular import
* Fix multichain handling (implement get._has_ir)
* re-add fixtures
* isort on tests [skip ci]
* fix remaining IO tests
* update todo flags [skip ci]
* _is_na input sanitization already in AirrCell module [skip ci]

  By doing so, we can get rid of multiple todos.
* Fix issue with plotting; get rid of merge_with_ir [skip ci]
* Remove test for merge_with_ir [skip ci]
* Ensure consistent ordering of chains in merge_airr
* Complete unit tests for merge_airr [skip ci]
* Use pre-commit.ci for black formatting
* Bump minimum python version to 3.8
* Bump minimum python version to 3.8
* bump python version in CI tests
* update imports of Literal
* update pre-commit config [skip ci]
* fix compat
* WIP new chain_indices format
* Fix get module
* WIP fix tests
* Fix tests [skip ci]
* Fix dandelion tests
* Update workflow tests
* update min anndata version
* Deprecate include_fields parameter and pass kwargs to from_airr_cells in IO
* WIP update example datasets
* update wu dataset generation
* Update wu2020 dataset to mudata (preliminary)
* First attempt to make tutorial work with mudata
* fix issue with slicing awkward array when slice mask is empty
* Change clonotype calling behavior for missing cdr3 sequences

  Previously, cells that had a receptor but no sequence were treated differently from cells with no receptor: the former were assigned to a separate clonotype, while the latter got the clonotype `nan`. Now, cells without a sequence are also assigned the clonotype `nan`. In practice, this should not have affected many users, since IO already ensured that only chains with a sequence are imported.
* fix awkward type conversion in index_chains
* Get rid of tqdm workaround which is not needed anymore
* Update API in tutorial to what it *should* look like in the future
* Stub parameter validation [skip ci]
* implement params check class
* update API docs
* Apply new params check to first function
* document params check
* Remove anndata version check decorators
* Restructure to fix circular import [skip ci]
* Unit tests for params check
* Fix notebook pairing
* Params check in index_chains
* update ir_dist with paramscheck [skip ci]
* Apply pre-commit hooks to all files [skip ci]
* Refactor ParamsCheck class
* Refactor chain_qc
* WIP implement param checks
* Update type hints
* Improve _ParamsCheck class [skip ci]
* Fix typing in a couple of files.
* Iterate on tutorial [skip ci]
* Iterate on tutorial
* Rename _ParamsCheck to DataHandler
* Implement get_obs in DataHandler
* WIP fix clonotype_network
* Fix clonotype_network plot [skip ci]
* Update clonal_expansion
* Fix alpha diversity
* Fix repertoire overlap and spectratype
* Fix clonotype modularity
* Fix ir_query [skip ci]
* Fix clonotype convergence
* Fix clonotype imbalance
* Fix clonotype imbalance
* Update processing scripts for Wu2020
* Update maynard loading script
* disable check for same fields in AirrCell [skip ci]
* Update maynard processing script
* WIP tests with mudata
* Update example datasets [skip ci]

  Use pooch to manage datasets.
* Fix test for clonotype convergence
* Experimental: use wrapper class for fixture
* Remove outdated TODO statements
* Revert "Experimental: use wrapper class for fixture"

  This reverts commit ddf5718.
* Implement inplace logic in DataHandler
* Parametrize fixtures to represent both AnnData and MuData [skip ci]
* Use DataHandler to write results to obs.
* WIP fix tests
* Fix _get_colors [skip ci]
* Fix tests
* Fix test_get_color
* Implement context managers in `get` module
* Fix clustermap
* Fix normalize in spectratype
* Tutorial again complete 🎉
* Fix some open TODOs
* Add tests for get context managers
* update datasets module
* Remove function cdr_convergence, which was never publicly documented anyway
* Update some docstrings
* remove erroneous import [skip ci]
* WIP update docs
* Update usage principles and data structure
* Update MuData section [skip ci]
* WIP update IO tutorial
* Update IO tutorial
* Update datastructure section with info about single AnnData object
* Update main tutorial
* Update API docs page
* Minor doc amendments
* WIP update docstrings
* Fix docstrings
* Fix TODOs
* Fix sphinx warnings
* update isort
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci
* constrain pandas
* Pandas workarounds
* Revert "Pandas workarounds"

  This reverts commit 6e19241.
* pandas version
* Fix problem with color by gene in clonotype_network
* fix missing import in datasets
* cancel previous CI jobs automatically
* test ci
* Concurrency should be outside 'jobs'
* test ci
* Update dependencies
* Update conda dependencies

  Will fail, because anndata 0.9rc1 is not on conda.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
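
The changed clonotype calling behavior for missing cdr3 sequences, described in the commit message, can be illustrated with a small toy example. This is only a sketch in plain Python and not scirpy's implementation: the function `call_clonotypes_toy`, the dictionary layout, and the rule "identical set of CDR3 sequences means same clonotype" are assumptions made for illustration. It shows the new behavior, in which a cell with a receptor but no sequence receives `nan`, just like a cell without any receptor.

```python
import math


def call_clonotypes_toy(cells):
    """Hypothetical helper: assign a clonotype label to each cell.

    `cells` maps a cell id to a list of CDR3 sequences, one entry per chain.
    A `None` entry stands for a chain without a sequence.
    """
    clonotype_ids = {}  # maps a tuple of sequences to a clonotype label
    result = {}
    for cell_id, cdr3s in cells.items():
        # Keep only chains that actually carry a sequence.
        sequences = tuple(sorted(s for s in cdr3s if s))
        if not sequences:
            # New behavior: no receptor and "receptor without sequence"
            # are treated alike and both get `nan`.
            result[cell_id] = math.nan
            continue
        # Cells with the same set of sequences share one label.
        result[cell_id] = clonotype_ids.setdefault(
            sequences, str(len(clonotype_ids))
        )
    return result


cells = {
    "cell1": ["CASSLGTDTQYF"],
    "cell2": ["CASSLGTDTQYF"],
    "cell3": [None],  # receptor present, but no sequence
    "cell4": [],  # no receptor at all
}
clonotypes = call_clonotypes_toy(cells)
```

Under the old behavior described in the commit message, `cell3` would have ended up in a clonotype of its own instead of sharing `nan` with `cell4`.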
scverse · Apr 7, 2023 · d8ec147
1 parent c56897c
Showing 91 changed files with 4,588 additions and 3,260 deletions.
diff --git a/.conda/meta.yaml b/.conda/meta.yaml
@@ -16,30 +16,33 @@ build:
 
 requirements:
   host:
-    - python >=3.7
+    - python >=3.8
     - pip!=22.1 # https://github.com/pypa/pip/issues/11110
     - flit
     - setuptools_scm
     - pytoml
     - importlib_metadata
 
   run:
-    - python >=3.7
-    - anndata >=0.7.6
-    - scanpy >=1.6.0
+    - python >=3.8
+    - anndata >=0.9rc1
+    - awkward >=2.1.0
+    - mudata >=0.2.2
+    - scanpy >=1.9.3
     - pandas >=1.5,<2
     - numpy >=1.17.0
     - scipy
     - parasail-python
     - scikit-learn
     - python-levenshtein
     - python-igraph !=0.10.0,!=0.10.1
-    - adjusttext >=0.7
     - networkx >=2.5
     - squarify
-    - tqdm >=4.44.1
     - airr >=1.2
+    - tqdm >=4.63
+    - adjusttext >=0.7
     - numba >=0.41.0
+    - pooch >=1.7.0
 
 test:
   source_files:

diff --git a/.github/workflows/conda.yml b/.github/workflows/conda.yml
@@ -6,6 +6,10 @@ on:
   pull_request:
     branches: [master]
 
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
 jobs:
   tests:
     if: "!contains(github.event.head_commit.message, 'skip ci')"

diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -8,6 +8,10 @@ on:
   release:
     types: [created]
 
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
 jobs:
   docs:
     if: "!contains(github.event.head_commit.message, 'skip ci')"
@@ -17,9 +21,10 @@ jobs:
       matrix:
         python-version: [3.9]
         os:
-         - ubuntu-latest
-         # - macos-latest
-         - windows-latest
+          - ubuntu-latest
+          # - macos-latest
+          - windows-latest
+
     steps:
       - uses: actions/checkout@v2
         with:
@@ -67,9 +72,7 @@ jobs:
           pip install .[doc,test,rpack,dandelion]
       - name: run sphinx
         run: |
-          # cd docs && make html SPHINXOPTS="-W --keep-going"
-          # TODO do not ignore sphinx warnings
-          cd docs && make html
+          cd docs && make html SPHINXOPTS="-W --keep-going"
 
       - name: Get target folder for page deploy from github ref
         if: ( matrix.os == 'ubuntu-latest' ) &&  ( matrix.python-version == '3.8' )

diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
@@ -8,6 +8,10 @@ on:
   schedule:
     - cron: "0 5 * * 0"
 
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
 jobs:
   test:
     if: "!contains(github.event.head_commit.message, 'skip ci')"

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -4,3 +4,20 @@ repos:
     hooks:
       - id: black
         language_version: python3.10
+  - repo: https://github.com/PyCQA/isort
+    rev: 5.12.0
+    hooks:
+      - id: isort
+  - repo: https://github.com/myint/autoflake
+    rev: v1.4
+    hooks:
+      - id: autoflake
+        args:
+          - --in-place
+          - --remove-all-unused-imports
+          - --remove-unused-variable
+          - --ignore-init-module-imports
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.4.0
+    hooks:
+      - id: check-merge-conflict
diff --git a/README.rst b/README.rst
@@ -50,7 +50,7 @@ The case study from our paper is available `here <https://icbi-lab.github.io/sci
 
 Installation
 ^^^^^^^^^^^^
-You need to have Python 3.7 or newer installed on your system. If you don't have
+You need to have Python 3.8 or newer installed on your system. If you don't have
 Python installed, we recommend installing `Miniconda <https://docs.conda.io/en/latest/miniconda.html>`_.
 
 There are several alternative options to install scirpy:

diff --git a/docs/api.rst b/docs/api.rst
@@ -1,3 +1,5 @@
+.. _api:
+
 API
 ===
 
@@ -20,10 +22,17 @@ Input/Output: `io`
 .. currentmodule:: scirpy
 
 .. note::
-   In scirpy v0.7.0 the way VDJ data is stored in `adata.obs` has changed to
-   be fully compliant with the `AIRR Rearrangement <https://docs.airr-community.org/en/latest/datarep/rearrangements.html#productive>`__
-   schema. Please use :func:`~scirpy.io.upgrade_schema` to make `AnnData` objects
-   from previous scirpy versions compatible with the most recent scirpy workflow.
+    **scirpy's data structure has been updated in v0.13.0.**
+
+    Previously, receptor data was expanded into columns of `adata.obs`, now they are stored as an :term:`awkward array` in `adata.obsm["airr"]`. 
+    Moreover, we now use :class:`~mudata.MuData` to handle paired transcriptomics and :term:`AIRR` data. 
+
+    :class:`~anndata.AnnData` objects created with older versions of scirpy can be upgraded with :func:`scirpy.io.upgrade_schema` to be compatible with the latest version of scirpy.
+
+    Please check out
+
+     * the `release notes <https://github.com/scverse/scirpy/releases/tag/v0.13.0>`_ for details about the changes and
+     * the documentation about :ref:`Scirpy's data structure <data-structure>`
 
    .. autosummary::
       :toctree: ./generated
@@ -37,6 +46,7 @@ formats.
 .. autosummary::
    :toctree: ./generated
 
+   io.read_h5mu
    io.read_h5ad
    io.read_10x_vdj
    io.read_tracer
@@ -75,10 +85,25 @@ Preprocessing: `pp`
 .. autosummary::
    :toctree: ./generated
 
-   pp.merge_with_ir
-   pp.merge_airr_chains
+   pp.index_chains
+   pp.merge_airr
    pp.ir_dist
 
+Get: `get`
+----------
+
+The `get` module allows retrieving :term:`AIRR` data stored in `adata.obsm["airr"]` as a per-cell :class:`~pandas.DataFrame`
+or :class:`~pandas.Series`. 
+
+.. module:: scirpy.get
+.. currentmodule:: scirpy
+
+.. autosummary::
+   :toctree: ./generated
+
+   get.airr
+   get.obs_context
+   get.airr_context
 
 Tools: `tl`
 -----------
@@ -211,6 +236,9 @@ Datasets: `datasets`
 .. module:: scirpy.datasets
 .. currentmodule:: scirpy
 
+Example datasets
+^^^^^^^^^^^^^^^^
+
 .. autosummary::
    :toctree: ./generated
 
@@ -241,6 +269,7 @@ Utility functions: `util`
 .. autosummary::
    :toctree: ./generated
 
+   util.DataHandler
    util.graph.layout_components
    util.graph.layout_fr_size_aware
    util.graph.igraph_from_sparse_matrix

diff --git a/docs/conf.py b/docs/conf.py
@@ -76,6 +76,10 @@
     sklearn=("https://scikit-learn.org/stable/", None),
     networkx=("https://networkx.org/documentation/networkx-1.10/", None),
     dandelion=("https://sc-dandelion.readthedocs.io/en/latest/", None),
+    muon=("https://muon.readthedocs.io/en/latest", None),
+    mudata=("https://mudata.readthedocs.io/en/latest/", None),
+    awkward=("https://awkward-array.org/doc/main/", None),
+    pooch=("https://www.fatiando.org/pooch/latest/", None),
 )
 
 
@@ -130,7 +134,8 @@ def setup(app):
     ("py:class", "D.get(k,d), also set D[k]=d if k not in D"),
     ("py:class", "None.  Update D from mapping/iterable E and F."),
     ("py:class", "an object providing a view on D's values"),
-    # Will work once scipy 1.8 is released
-    ("py:class", "scipy.sparse.base.spmatrix"),
-    ("py:class", "scipy.sparse.csr.csr_matrix"),
+    # don't know why these are not working
+    ("py:class", "seaborn.matrix.ClusterGrid"),
+    ("py:meth", "mudata.MuData.update"),
+    ("py:class", "awkward.highlevel.Array"),
 ]