Can't save h5mu from Scirpy processed gex+bcr+tcr data if I copy airr into obs #434

Open
Ngort opened this issue Jul 19, 2023 · 3 comments
Labels
bug Something isn't working

Comments

Ngort commented Jul 19, 2023

Describe the bug

Can't save h5mu from Scirpy processed gex+bcr+tcr data if I copy airr into obs (i.e. tdata.obs = tdata.obs.join(ir.get.airr(tdata, tdata.obsm['airr'].fields))).
Unlike in #427, I am on 0.13 and still run into the bug.

TypeError: Can't implicitly convert non-string objects to strings
Above error raised while writing key 'VJ_1_germline_alignment' of <class 'h5py._hl.group.Group'> to /

(The same error is raised for many other columns, including all *_call and *_cigar columns.)
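To narrow down which obs columns are affected before writing, a small pandas-only check can list the object-dtype columns that contain missing values. This is a sketch under an assumption: that object columns mixing values with None are the ones that fail to serialize, as the error message suggests. The function name object_columns_with_missing and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_object_dtype


def object_columns_with_missing(obs: pd.DataFrame) -> list:
    """Hypothetical helper: names of object-dtype columns holding None/NaN.

    Assumption: these are the likely candidates for
    "Can't implicitly convert non-string objects to strings".
    """
    return [
        col
        for col in obs.columns
        if is_object_dtype(obs[col]) and obs[col].isna().any()
    ]


# Toy obs table (hypothetical values, column names styled after scirpy output)
obs = pd.DataFrame({
    "VJ_1_v_call": ["TRAV1", None, "TRAV2"],
    "VJ_1_consensus_count": np.array([5, None, 7], dtype=object),
    "n_genes": [100, 200, 300],
})
print(object_columns_with_missing(obs))  # ['VJ_1_v_call', 'VJ_1_consensus_count']
```

On a real object this would be called as object_columns_with_missing(mdata['tcr'].obs).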

To Reproduce

import muon as mu
import scirpy as ir

mdata = mu.MuData({'gex': adata.copy(),
                   'tcr': tdata.copy(),
                   'bcr': bdata.copy()})

ir.tl.chain_qc(mdata['tcr'])
ir.pp.ir_dist(mdata['tcr'], metric="hamming", sequence='nt', n_jobs=4, cutoff=20, key_added='ir_dist_nt_hamming_global')
ir.pp.ir_dist(mdata['tcr'], metric="identity", sequence='nt', n_jobs=4)

# run on the MuData modality: the original tdata object does not hold the ir_dist results computed above
ir.tl.define_clonotypes(mdata['tcr'], key_added='clone_id',
                        n_jobs=4, dual_ir='all', receptor_arms='all',
                        within_group=['receptor_type', 'donor_id_global'])

mdata['tcr'].obs = mdata['tcr'].obs.join(ir.get.airr(mdata['tcr'], mdata['tcr'].obsm['airr'].fields))

mdata.write(fname)

What else I've tried
Changing columns to categoricals

import re

for mod in mdata.mod.keys():
    for col in mdata[mod].obs.columns:
        # match e.g. VJ_1_v_call, VDJ_1_j_cigar
        if re.findall(r'(V(?:D)?J_\d_\w_(?:call|cigar))', col):
            mdata[mod].obs[col] = mdata[mod].obs[col].astype('category')
            print(mod, ':', col, sep='')

    mdata.update()

Expected behaviour
The file is saved without errors.

System

OS: Linux
Python version 3.9.16
Versions of libraries involved: Muon 0.1.5, Scirpy 0.13.0, Scanpy 1.9.3

Ngort added the bug label Jul 19, 2023
grst (Collaborator) commented Jul 23, 2023

The problem is that AnnData cannot serialize obs columns that contain None values.
A minimal reproducible example:

import anndata
import pandas as pd
import numpy as np

adata = anndata.AnnData(X=None, obs=pd.DataFrame().assign(test=np.array([1, 2, None, 3])))
adata.write_h5ad("test.h5ad")

In principle, AnnData supports nullable integers and booleans, but not nullable strings (see scverse/anndata#679, scverse/anndata#504). However, "nullable" here means a pandas IntegerArray or BooleanArray, not an object-dtype column containing Nones.
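The distinction between an object column with None and a pandas nullable array can be seen with pandas alone (no AnnData involved). A minimal sketch; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

# Same values as in the reprex: the None forces a plain numpy object array
raw = np.array([1, 2, None, 3])
print(raw.dtype)  # object

# pd.array infers a nullable integer array: None becomes pd.NA, dtype "Int64"
nullable = pd.array(raw)
print(nullable.dtype)  # Int64
print(nullable[2] is pd.NA)  # True

# The same inference applies to booleans mixed with None
flags = pd.array(np.array([True, None, False]))
print(flags.dtype)  # boolean
```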

As a workaround, the offending columns can be converted to a pandas array, e.g.

mdata['tcr'].obs["VJ_1_consensus_count"] = pd.array(mdata['tcr'].obs["VJ_1_consensus_count"].values)

We clearly need a better solution than this. I'll check whether this should be solved on the AnnData side, e.g. by an automatic conversion. Otherwise, the scirpy.get.airr function could handle it.
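A minimal sketch of what such an automatic conversion could look like, generalizing the pd.array workaround to every obs column. The helper nullable_obs_columns is hypothetical (it is not part of scirpy or AnnData). It only converts object columns whose non-missing values pandas infers as integers, floats or booleans, and leaves string columns alone, since AnnData did not support nullable strings.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype, is_object_dtype

# Inferred kinds that pd.array() turns into nullable extension arrays
_NULLABLE_KINDS = {"integer", "floating", "mixed-integer-float", "boolean"}


def nullable_obs_columns(obs: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `obs` in which object columns
    holding numbers or booleans mixed with None become nullable arrays."""
    out = obs.copy()
    for col in out.columns:
        if not is_object_dtype(out[col]):
            continue
        # skipna=True ignores None/NaN when inferring the value kind
        if infer_dtype(out[col], skipna=True) in _NULLABLE_KINDS:
            # same conversion as the manual workaround, applied per column
            out[col] = pd.array(out[col].to_numpy())
    return out


# Toy data (hypothetical values)
obs = pd.DataFrame({
    "VJ_1_consensus_count": np.array([5, None, 7], dtype=object),
    "VJ_1_productive": np.array([True, None, False], dtype=object),
    "VJ_1_v_call": ["TRAV1", None, "TRAV2"],
})
converted = nullable_obs_columns(obs)
print(converted.dtypes)
```

If useful, it could be applied before writing, e.g. mdata['tcr'].obs = nullable_obs_columns(mdata['tcr'].obs), then mdata.update() and mdata.write(fname). This is untested against real scirpy output, and string columns containing None stay object dtype, so they may still trigger the error.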

grst (Collaborator) commented Aug 30, 2024

Some progress on the AnnData side: scverse/anndata#1558

Still need to check if this can be closed now.

grst (Collaborator) commented Nov 4, 2024

"Still need to check if this can be closed now."

Unfortunately not.

This depends on scverse/anndata#1068.

Projects
Status: On Hold
Development

No branches or pull requests

2 participants