-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adapt the new scverse AIRR data structure #261
Comments
Thanks @grst. yes indeed it probably will break lots of things but nevertheless i am committed to implementing this somehow. At the first instance, I'm thinking I can probably get rid of dandelion/dandelion/utilities/_core.py Lines 1470 to 1481 in 878c1a0
Will keep you updated on the progress here (while i try to potentially find a student to work on this !) |
Hi @DennisCambridge, i have an idea why (scverse/scirpy#399). Can you try that branch?
|
Hi Gregor @grst , Yes, it seems to have been restored after I installed this branch. |
perfect, thanks for your feedback! I'll merge it as soon as the checks ran through. |
some initial scalability test |
@grst am i right to think that the new structure essentially holds off on the generation of the cell level Or when the user wants to populate that they can just use |
Exactly! There's also a neat context manager to just temporarily add a column for e.g. plotting and automatically remove it again: with ir.get.airr_context(mdata, "v_call", chain=["VJ_1", "VDJ_1"]) as m:
mu.pl.umap(mdata, color="VJ_1_v_call")
Mudata doesn't propagate changes to obs of one modality automatically. Essentially think of You can either assign columns to mdata.obs["airr:vj_1_locus"] = ir.get.airr(adata, "locus", "VJ_1") or use the adata = mdata["airr"]
adata.obs["vj_1_locus"] = ir.get.airr(adata, "locus", "VJ_1")
mdata.update_obs() I made scirpy functions add to both |
Gotcha!
This didn't work for me unfortunately i had to do with ir.get.airr_context(mdata["airr"], "v_call", chain=["VJ_1", "VDJ_1"]) as m:
mu.pl.umap(mdata["airr"], color="VJ_1_v_call") Anyway - ok now i know what the the new datastructure is doing and i've put in a couple of issues/requests
The roadmap that i have in mind at the moment is for dandelion to work using |
and support |
Sorry, my bad. When using
Just for clarification: Would it need its own function because you don't want to have scirpy as a heavy dependency, or because it really needs to work differently? If it's the former, there's always the option to factor that out in a helper package, see also scverse/scirpy#385 |
at the moment i'm wanting to access beyond VJ_1/2 and VDJ_1/2 in case there's more than 4 contigs - this is currently the main reason why. still not working for mdata for completeness this is my code: mdata = ir.datasets.wu2020_3k()
mdata["gex"].obsm["X_umap"] = mdata["gex"].obsm["X_umap_orig"]
mdata["airr"].obsm["X_umap"] = mdata["gex"].obsm["X_umap_orig"]
with ir.get.airr_context(mdata, "v_call", chain=["VJ_1", "VDJ_1"]) as m:
mu.pl.umap(mdata, color="airr:VJ_1_v_call") ---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File [~/miniconda3/envs/dandelion/lib/python3.10/site-packages/muon/_core/plot.py:143](https://file+.vscode-resource.vscode-cdn.net/Users/uqztuong/Documents/Projects/dandelion/~/miniconda3/envs/dandelion/lib/python3.10/site-packages/muon/_core/plot.py:143), in embedding(data, basis, color, use_raw, layer, **kwargs)
142 try:
--> 143 mod, basis_mod = basis.split(":")
144 except ValueError:
ValueError: not enough values to unpack (expected 2, got 1)
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Cell In[14], line 2
1 with ir.get.airr_context(mdata, "v_call", chain=["VJ_1", "VDJ_1"]) as m:
----> 2 mu.pl.umap(mdata, color="airr:VJ_1_v_call")
File [~/miniconda3/envs/dandelion/lib/python3.10/site-packages/muon/_core/plot.py:273](https://file+.vscode-resource.vscode-cdn.net/Users/uqztuong/Documents/Projects/dandelion/~/miniconda3/envs/dandelion/lib/python3.10/site-packages/muon/_core/plot.py:273), in umap(mdata, **kwargs)
267 def umap(mdata: MuData, **kwargs) -> Union[Axes, List[Axes], None]:
268 """
269 UMAP Scatter plot
270
271 See :func:`muon.pl.embedding` for details.
272 """
--> 273 return embedding(mdata, basis="umap", **kwargs)
File [~/miniconda3/envs/dandelion/lib/python3.10/site-packages/muon/_core/plot.py:145](https://file+.vscode-resource.vscode-cdn.net/Users/uqztuong/Documents/Projects/dandelion/~/miniconda3/envs/dandelion/lib/python3.10/site-packages/muon/_core/plot.py:145), in embedding(data, basis, color, use_raw, layer, **kwargs)
143 mod, basis_mod = basis.split(":")
144 except ValueError:
--> 145 raise ValueError(f"Basis {basis} is not present in the MuData object (.obsm)")
147 if mod not in data.mod:
148 raise ValueError(
149 f"Modality {mod} is not present in the MuData object with modalities {', '.join(data.mod)}"
150 )
ValueError: Basis umap is not present in the MuData object (.obsm) with ir.get.airr_context(mdata["airr"], "v_call", chain=["VJ_1", "VDJ_1"]) as m:
mu.pl.umap(mdata, color="airr:VJ_1_v_call") ---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File [~/miniconda3/envs/dandelion/lib/python3.10/site-packages/muon/_core/plot.py:143](https://file+.vscode-resource.vscode-cdn.net/Users/uqztuong/Documents/Projects/dandelion/~/miniconda3/envs/dandelion/lib/python3.10/site-packages/muon/_core/plot.py:143), in embedding(data, basis, color, use_raw, layer, **kwargs)
142 try:
--> 143 mod, basis_mod = basis.split(":")
144 except ValueError:
ValueError: not enough values to unpack (expected 2, got 1)
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Cell In[17], line 2
1 with ir.get.airr_context(mdata["airr"], "v_call", chain=["VJ_1", "VDJ_1"]) as m:
----> 2 mu.pl.umap(mdata, color="airr:VJ_1_v_call")
File [~/miniconda3/envs/dandelion/lib/python3.10/site-packages/muon/_core/plot.py:273](https://file+.vscode-resource.vscode-cdn.net/Users/uqztuong/Documents/Projects/dandelion/~/miniconda3/envs/dandelion/lib/python3.10/site-packages/muon/_core/plot.py:273), in umap(mdata, **kwargs)
267 def umap(mdata: MuData, **kwargs) -> Union[Axes, List[Axes], None]:
268 """
269 UMAP Scatter plot
270
271 See :func:`muon.pl.embedding` for details.
272 """
--> 273 return embedding(mdata, basis="umap", **kwargs)
File [~/miniconda3/envs/dandelion/lib/python3.10/site-packages/muon/_core/plot.py:145](https://file+.vscode-resource.vscode-cdn.net/Users/uqztuong/Documents/Projects/dandelion/~/miniconda3/envs/dandelion/lib/python3.10/site-packages/muon/_core/plot.py:145), in embedding(data, basis, color, use_raw, layer, **kwargs)
143 mod, basis_mod = basis.split(":")
144 except ValueError:
--> 145 raise ValueError(f"Basis {basis} is not present in the MuData object (.obsm)")
147 if mod not in data.mod:
148 raise ValueError(
149 f"Modality {mod} is not present in the MuData object with modalities {', '.join(data.mod)}"
150 )
ValueError: Basis umap is not present in the MuData object (.obsm) |
fair enough. the
So what should work here is either mdata.obsm["X_umap"] = mdata["gex"].obsm["X_umap_orig"]
mu.pl.umap(mdata, color="airr:VJ_1_v_call") or mdata["gex"].obsm["X_umap"] = mdata["gex"].obsm["X_umap_orig"]
mdata.update() # <-- not sure if this is even needed
mu.pl.embedding(mdata, basis="gex:X_umap", color="airr:VJ_1_v_call") |
ok preliminary commit to use scverse-airr structure as per #261 TODO: - wherever *.data is mentioned in dandelion, this needs to handle the awkward arrays - create a slot/function that stores the cell_id - update the tutorials - create a new to_scirpy/from_scirpy function that basically transfers the .obs, .obsm and .uns - create a to/from_ak function so that the rearrangement data can still be repopulated as a pandas dataframe. - would be cool if i can also make this dask compatible - only issue is dask.dataframe.DataFrame.compute is always slow..? need to learn more about dask. vaex is also cool but not really compatible for contig->cell level wrangling.
OK i've made some preliminary progress over at (d5fb09f). There's a list of TODOs that are probably quite trivial to work through - might get a few grad students to work on them over the next quarter. I also had to deactivate the cell level (.obs equivalent) data generation because it takes a long time to load, compared to just operating from pandas dataframe natively. |
implemented in #342 |
Ok, then I should probably change the scirpy functions |
yes. perhaps @amoschoomy can do in the PR on scirpy's side and you can do a quick check? |
that would be fab! |
Is your feature request related to a problem?
Hi @zktuong and @DennisCambridge,
the new scverse AIRR data structure is out in scirpy v0.13 RC and ready to be experimented with.
As discussed previously, it would be fantastic to get dandelion aligned to it and I'm opening this issue to track this challenge.
Describe the solution you'd like
In an ideal world, dandelion could directly work on the new data structure -- with the potential to get rid of the separate
Dandelion
class and theh5ddl
format.Describe alternatives you've considered
This could turn out to be a tremendous amount of work to transfer to a new data structure (I experienced this myself with scirpy). Therefore an easier solution could be to just support import and export from the new data structure.
Additional context
No response
The text was updated successfully, but these errors were encountered: