Weird cell proportions #8
Comments
Hi, I ran into the same thing. The proportions don't add up to one, and I used TPM. Any idea why that would happen? Should I just normalize each sample to sum to one?
@keremw, I got an answer from the authors to my question via e-mail: "this is not uncommon situation for predictors to give values that don't [...] I advise you to dig into the math of a process (both LinSeed or DSA etc)". I tried it with TPMs and got the same result. Also, I didn't have time to dig deeper into the LinSeed/DSA math and just switched to another tool.
Sorry for the slow replies on my side. The weird proportions come from the fact that in both LinSeed and DSA we try to satisfy the sum-to-one constraint as well as we can, yet we do not force the proportions to sum exactly to one. @keremw, if your analysis really requires sum-to-one, you can force it on the columns of the proportions matrix. @pushtiks "Also, I didn't have time to dig deeper into LinSeed/DSA math and just switched to another tool." Sorry to hear that, I will try to maintain the package better from now on. Cheers,
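Forcing sum-to-one on the columns, as suggested above, amounts to dividing every column by its own sum. A minimal sketch in plain Python, assuming rows are cell types and columns are samples. The numbers are the ones reported in the original post of this issue, and `normalize_columns` is a hypothetical helper name, not part of LinSeed:

```python
def normalize_columns(props):
    """Rescale each column (sample) of a cell-types x samples matrix to sum to 1."""
    n_rows, n_cols = len(props), len(props[0])
    col_sums = [sum(props[r][c] for r in range(n_rows)) for c in range(n_cols)]
    if any(s <= 0 for s in col_sums):
        raise ValueError("every sample needs a positive column sum")
    return [[props[r][c] / col_sums[c] for c in range(n_cols)]
            for r in range(n_rows)]


# Rows: cell type 1, cell type 2; columns: three samples (values from the issue).
props = [
    [0.6073787, 0.5740409, 1.027164e+00],
    [0.5784655, 0.6328196, 7.704742e-06],
]

normalized = normalize_columns(props)
for row in normalized:
    print(" ".join("%.4f" % v for v in row))
```

This only rescales each sample. It does not clip negative values or change the ratio between cell types within a sample, so whether it is appropriate depends on the downstream analysis.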
@konsolerr Could the issue be due to the "spill-over" effect, where one cell type raises the proportions of another one? Cheers,
Vickie, the inability to fit the sum-to-one constraint is rather technical. Before we calculate the proportions, we first find "pseudo-proportions": vectors whose changes correspond to changes in the proportions. However, they are found in a different space, so pseudo-proportions won't satisfy the sum-to-one constraint. After that we try to find coefficients that make our "pseudo-proportions" look more like actual proportions, by solving linear equations to fit the sum-to-one constraint. But these equations cannot be solved exactly in most cases, so we "approximate" the sum-to-one constraint but never force it.

I like your questions about cell types and subtypes! Before answering, I want to change your perspective a bit on cell types and their signatures. Assume you have A, B, C1, and C2 (I assume that C2 is somewhat close to C1 but also has some expression signatures from B, if I read your example correctly). What LinSeed does is try to identify a linear subspace in which to place all the genes according to their expression in pure cell types. Imagine a tetrahedron A-B-C1-C2 where we put all the genes with higher expression in A closer to vertex A (the same with B, C1, C2); all the housekeeping genes will be in the middle of this tetrahedron.

Now comes the interesting case: we know that C1 and C2 are transcriptionally similar. This means that genes that are more highly expressed in C will be somewhere in the middle of the edge C1-C2 of this tetrahedron, and if C2 has similar expression to B we will have some genes on the edge C2-B.

Now back to your question: if you have signature genes for C2 and C1, you will have marker genes in the corners of the tetrahedron, which means that you can just run the analysis with 4 cell types and you don't have to worry about "collapsing" C2 into C1 and B. However, if you run the analysis for 3 cell types, it's hard to guess what the triangle projection will be. If C1 and C2 are very similar and C2 has just "some genes" shared with B, then you will find proportions of A, B, C1+C2. Otherwise it's hard to tell. Cheers,
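The sum-to-one fit described at the start of this comment can be illustrated with a small least-squares sketch. This is not LinSeed's or DSA's actual code: it assumes the pseudo-proportions are given as a cell-types x samples matrix and looks for one scaling coefficient per cell type so that the scaled columns sum as close to one as possible. The function names and the numbers are made up for illustration.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_sum_to_one(pseudo):
    """Least-squares coefficients c_k so that sum_k c_k * pseudo[k][j] is
    as close to 1 as possible for every sample j (via the normal equations)."""
    k, n = len(pseudo), len(pseudo[0])
    gram = [[sum(pseudo[a][j] * pseudo[b][j] for j in range(n))
             for b in range(k)] for a in range(k)]
    rhs = [sum(pseudo[a][j] for j in range(n)) for a in range(k)]
    coef = solve(gram, rhs)
    props = [[coef[a] * pseudo[a][j] for j in range(n)] for a in range(k)]
    return coef, props


# Hypothetical pseudo-proportions: 2 cell types x 4 samples, on arbitrary scales.
pseudo = [
    [0.11, 0.24, 0.36, 0.45],
    [0.20, 0.13, 0.07, 0.03],
]

coef, props = fit_sum_to_one(pseudo)
col_sums = [sum(props[a][j] for a in range(len(props)))
            for j in range(len(props[0]))]
print("coefficients:", coef)
print("column sums: ", col_sums)
```

With four samples and only two coefficients the system is overdetermined, so the column sums come out close to one but not exactly one, which is the behaviour reported in this issue.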
Hi!
I'm testing LinSeed on an RNA-seq dataset, and I ran into an issue where the predicted proportions for a sample sum to more than 1, for example:
Cell type 1 0.6073787 0.5740409 1.027164e+00
Cell type 2 0.5784655 0.6328196 7.704742e-06
Do you have any idea what could be going wrong?
I'm using R 3.6.0 and an edgeR CPM matrix, and there are only 2 cell types per sample.
I'm planning to try TPMs instead and will post an update on whether anything changes.