Does cellbender work with citeseq data? #114

Closed
smk5g5 opened this issue Oct 4, 2021 · 11 comments

@smk5g5

smk5g5 commented Oct 4, 2021

Does cellbender work with 10x citeseq data?

@sjfleming
Member

Yes! It sure does. The "Antibody Capture" features are treated just like gene expression, using the same underlying model. Background counts for protein features will be subtracted, much the same as background counts for genes are. This point will be emphasized in our upcoming paper, as I think it's a nice use of the tool.

I'll show this "raw versus CellBender" dotplot for the "Antibody Capture" features in one of the 10x Genomics PBMC datasets (though this is from an older development version of the code and might not match the current performance):
[image: raw versus CellBender dotplot of the "Antibody Capture" features]
Still, I think you can expect significant cleanup, since the antibody features really tend to have a lot of background.

@sjfleming
Member

@smk5g5 Great question, we definitely hope to point this out more in the upcoming paper, since this data wasn't so common when we first wrote the CellBender package.

@sjfleming
Member

One more thing to point out is that it is probably beneficial to try several values of "--fpr" when working with antibody capture data. Since there is so much additional background noise for most antibody capture, sometimes it is helpful to increase the FPR in order to increase noise removal.
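As a rough illustration of trying several FPR values, here is a minimal Python sketch that assembles a `cellbender remove-background` command line. It assumes that `--fpr` accepts more than one value in a single run and writes one output per value; please confirm with `cellbender remove-background --help` for your installed version. The helper name and the file names are placeholders made up for this example.

```python
import shlex


def build_remove_background_command(input_h5, output_h5, fprs):
    """Return an argv list for one CellBender run covering several FPR values.

    Hypothetical helper for illustration only. Assumes --fpr takes one or
    more space-separated values; check --help for your CellBender version.
    """
    for fpr in fprs:
        # FPR is a rate, so it only makes sense strictly between 0 and 1.
        if not 0.0 < fpr < 1.0:
            raise ValueError(f"FPR must be strictly between 0 and 1, got {fpr}")
    return [
        "cellbender", "remove-background",
        "--input", input_h5,
        "--output", output_h5,
        "--fpr", *[str(fpr) for fpr in fprs],
    ]


# Placeholder file names; substitute your own raw 10x matrix and output path.
cmd = build_remove_background_command(
    "raw_feature_bc_matrix.h5", "cellbender_output.h5", [0.01, 0.05, 0.1]
)
print(shlex.join(cmd))
```

The printed command can be pasted into a shell or passed to `subprocess.run(cmd, check=True)`. Comparing the antibody features across the resulting outputs is one way to decide how far the FPR needs to be raised.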

@sjfleming sjfleming pinned this issue Nov 4, 2021
@sjfleming sjfleming added this to the v0.3.0 milestone Apr 1, 2022
@sjfleming sjfleming self-assigned this Apr 1, 2022
@sjfleming sjfleming added the enhancement New feature or improvement label Apr 1, 2022
@sjfleming
Member

Add mention of this in the docs

@ktpolanski

I gave cellbender a try on CITE data based on this discussion, and the results have been encouraging. I had to use a stratospheric 0.9 for the FPR, but in the end I was able to coax out profiles with a good portion of the soup removed.

Given the fact CITE is inherently soupier than GEX via experimental factors, would it make sense to model their backgrounds separately? Maybe have a more trigger-happy model for the CITE? Accepting separate FPRs for the two modalities seems like a reasonable heuristic, but maybe there's something smarter that could be done at an algorithm level?
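To make the "separate FPRs per modality" heuristic concrete, here is a minimal sketch of merging two denoised outputs after the fact: gene expression columns come from a low-FPR output and antibody columns from a high-FPR output. This is only a post-processing workaround, not something CellBender does itself. It assumes both matrices have already been loaded as dense rows (cells) by columns (features) with identical barcode and feature order; real outputs are sparse matrices in HDF5 files, and loading them is left out. The function name is hypothetical.

```python
def merge_by_modality(low_fpr_counts, high_fpr_counts, feature_types,
                      antibody_label="Antibody Capture"):
    """Take antibody columns from the high-FPR matrix and the rest from the low-FPR one.

    Hypothetical helper for illustration. Both inputs are lists of rows
    (one row per cell) aligned to the same barcodes and the same features.
    """
    if len(low_fpr_counts) != len(high_fpr_counts):
        raise ValueError("both runs must contain the same cells")
    merged = []
    for low_row, high_row in zip(low_fpr_counts, high_fpr_counts):
        if not (len(low_row) == len(high_row) == len(feature_types)):
            raise ValueError("row length must match the number of features")
        # Pick each entry from the run chosen for that feature's modality.
        merged.append([
            high if ftype == antibody_label else low
            for low, high, ftype in zip(low_row, high_row, feature_types)
        ])
    return merged


# Toy data: two genes and one antibody, two cells.
features = ["Gene Expression", "Gene Expression", "Antibody Capture"]
low = [[5, 0, 40], [2, 1, 30]]   # e.g. a conservative FPR
high = [[3, 0, 10], [1, 0, 8]]   # e.g. an aggressive FPR
print(merge_by_modality(low, high, features))
```

The catch is that this needs one output per FPR value and treats the FPR as a per-modality knob, which is exactly the kind of manual step a smarter model-level fix would avoid.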

@sjfleming
Member

@ktpolanski you make a very interesting point. With the public 10x Genomics pbmc5k dataset above, things kind of worked out that the gene expression and antibody features both cleaned up very nicely at about FPR 0.1, which is pretty reasonable. If you are having to go to FPR 0.9, that does represent a massive deviation from what we'd expect.

Two-part answer:

(1) We will (within a few weeks) be releasing cellbender v0.3.0, which constructs the denoised count matrix in a slightly different way from v0.2. In particular, it has per-feature noise removal targets that it tries to hit, based on the dataset. These per-feature targets might be exactly what's needed to fix the issue you're seeing.

(2) If that does not end up fixing the issue you've described, then it is reasonable to consider different alternatives. We'd definitely like it to work well on both modalities without having to run twice using different FPR settings. As you say, it does make sense to model their backgrounds separately. But the current model does basically model all features separately. If one feature has way higher background than another (or one feature type), then the model should be able to learn that without a problem. If (1) doesn't fix the issue, then we might need to consider some tweaks to the model in the medium-term, starting out with figuring out why the antibody features maybe are not obeying the assumptions in our model, or if there is some other noise mechanism at play for antibody capture features (though I don't see what it would be...)

@ktpolanski

Thanks for the response, all of the above sounds very promising. Looking forward to trying out v0.3.0.

@mdmanurung

Is there an estimate of when v0.3.0 will be released? Thanks in advance.

@sjfleming
Member

@mdmanurung
My time estimates are clearly way too optimistic. I have done a whole lot of refactoring, but I think I'm pretty much done. The draft I'm working on is in the branch called sf_dev_0.3.0_postreg. I want to add one more new output file, and potentially modify the heuristics for computing priors, but that's about it. I'd estimate about a month before it all actually undergoes code review and gets merged and released.

@sjfleming
Member

sjfleming commented Apr 26, 2023

Official release will follow merging of this PR, #189
