Split IO into separate package #385

Open
grst opened this issue Mar 16, 2023 · 2 comments

Comments

@grst
Collaborator

grst commented Mar 16, 2023

The scverse core team reached the consensus that IO should not be part of the analysis packages (e.g. scanpy, scirpy, muon), but should instead live in an independent package with minimal dependencies that the analysis packages depend on. The hope is that this leads to wider adoption of scverse data structures, since the "dependency cost" of depending on a lightweight IO package is lower than that of depending on an entire framework.
This issue is to track the goal of creating such a package for scirpy.

Name (?)

A couple of ideas

  • scirpy-io
  • scverse-airr
  • airr-io

Scope

  • All read_xxx and write_xxx functions in scirpy.io
  • AirrCell, to_airr_cells and from_airr_cells functions
  • to/from_dandelion (ideally dandelion adopts the scverse data structure; otherwise these functions should live in dandelion itself)

Maybe

  • merge_airr
  • index_chains
  • get.airr

The latter two go beyond just storing AIRR data as an awkward array and implement the scirpy receptor model. They are likely useful for some other packages, but then again, if a package needs them, it could just depend on the full scirpy.

In case of doubt, err on the side of including less in the package, as it could be added later if required.

@grst
Collaborator Author

grst commented Mar 16, 2023

As discussed with @zktuong, it would be nice to refer to the dandelion preprocessing workflow (which addresses some issues with the cellranger output) from this package and/or scirpy. In the end, this shouldn't be hard, as the dandelion pipeline reads cellranger output and writes AIRR, which can be consumed directly by the read_airr function.
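
To illustrate why this handoff should be simple: an AIRR Rearrangement file is a tab-separated table with one row per chain, so any tool that writes it can feed any tool that reads it. The sketch below is only an illustration using the Python standard library. It is not scirpy's read_airr and does not reproduce its behavior; the sample rows and the helper name group_chains_by_cell are made up for this example, and only the column names come from the AIRR Rearrangement schema.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; only the column names follow the AIRR Rearrangement schema.
SAMPLE_TSV = (
    "sequence_id\tcell_id\tlocus\tjunction_aa\tproductive\n"
    "contig_1\tcell_A\tTRA\tCAVSDLEPNSSASKIIF\tT\n"
    "contig_2\tcell_A\tTRB\tCASSLGQAYEQYF\tT\n"
    "contig_3\tcell_B\tTRB\tCASSPGTGGNEQFF\tF\n"
)


def group_chains_by_cell(handle):
    """Group AIRR rearrangement rows (one row per chain) by their cell_id.

    Hypothetical helper for illustration; values are kept as raw strings.
    """
    cells = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        cells[row["cell_id"]].append(row)
    return dict(cells)


cells = group_chains_by_cell(io.StringIO(SAMPLE_TSV))
for cell_id, chains in cells.items():
    # Print each cell with the loci of its chains.
    print(cell_id, [chain["locus"] for chain in chains])
```

A real reader would additionally validate the schema and convert field types, which is the kind of logic the proposed IO package would own.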

@zktuong
Contributor

zktuong commented Mar 23, 2023

tagging @DennisCambridge

Projects
Status: prio2