ValueError (ill-defined empirical covariance) #35

Open
m21camby opened this issue Jul 25, 2022 · 2 comments

Comments

@m21camby

Thank you for the wonderful package for analysing CITE-seq data. I'm getting an error while running dsb; the code and error message are below:

pt.pp.dsb(filtered, raw, empty_counts_range=(2, 3), random_state=1)
This gives the following error message:
ValueError: Fitting the mixture model failed because some components have ill-defined empirical covariance (for instance caused by singleton or collapsed samples). Try to decrease the number of components, or increase reg_covar.

Could you help me figure out what is causing the error?

@MattPM
Collaborator

MattPM commented Jul 26, 2022

Hi @m21camby

This message looks like it comes from sklearn.mixture.GaussianMixture, which is called by the Python implementation of dsb in muon, written by @gtca.

@gtca first, many thanks again for adding dsb to muon. I believe the error this user sees above comes from sklearn.mixture. To address it, I think the default value of reg_covar could be raised, e.g. to 1e-5 from the current default of 1e-6, as suggested in this post: https://stackoverflow.com/questions/48370066/how-to-fix-valueerror-in-fitting-gmm-using-sklearn-mixture-gaussianmixture
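For illustration, here is a minimal sketch of what raising reg_covar looks like when calling GaussianMixture directly. It is not muon's actual code: the helper name fit_gmm_with_fallback and the retry-on-ValueError loop are assumptions made for this example. The suggestion above is simply a larger default; the loop only shows how the same parameter can be stepped up when a fit fails.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_gmm_with_fallback(values, reg_covars=(1e-6, 1e-5, 1e-4), random_state=1):
    """Fit a 2-component GMM, retrying with a larger reg_covar if the fit fails.

    Hypothetical helper for illustration only; not part of muon or dsb.
    Returns the fitted model and the reg_covar value that worked.
    """
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    last_error = None
    for reg_covar in reg_covars:
        gmm = GaussianMixture(
            n_components=2, reg_covar=reg_covar, random_state=random_state
        )
        try:
            gmm.fit(x)
            return gmm, reg_covar
        except ValueError as err:
            # e.g. "ill-defined empirical covariance": try a larger reg_covar
            last_error = err
    raise last_error


# Usage on synthetic, well-separated values (made-up data)
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(0, 0.3, 50), rng.normal(5, 0.3, 50)])
gmm, used = fit_gmm_with_fallback(values)
print("reg_covar used:", used, "lower mean:", gmm.means_.min())
```

reg_covar is added to the diagonal of each component's covariance, so a larger value keeps a collapsed component from producing a degenerate covariance, at the cost of a slightly smoothed fit.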

I think this is justifiable in order to force the model to fit and to extract the background mean. In the second step of dsb, each cell's lower mean from a 2-component mixture is combined with the isotype control antibodies. The isotype controls make the estimate of technical variation more robust for any one cell with a poor 2-component mixture fit. Cells with a poor mixture fit could in theory be flagged in the stats returned by the updated versions of dsb; however, that would require different data structures in Python, which should probably be avoided. Might it be possible to add this as a one-line patch on your end? If you would like me to open a PR, let me know.
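To make the per-cell step concrete, here is a rough sketch of extracting each cell's lower mixture mean with a raised reg_covar and marking cells whose fit fails. The function name per_cell_background_mean, the (cells x proteins) input layout, and the NaN-based flag are assumptions for this example, not the dsb or muon implementation; the later step that combines these means with the isotype controls is not shown.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def per_cell_background_mean(norm, reg_covar=1e-5, random_state=1):
    """Return each cell's lower 2-component GMM mean and a failed-fit flag.

    norm: array of shape (n_cells, n_proteins) with normalized protein values.
    Hypothetical helper for illustration only.
    """
    norm = np.asarray(norm, dtype=float)
    background = np.full(norm.shape[0], np.nan)
    for i, cell in enumerate(norm):
        gmm = GaussianMixture(
            n_components=2, reg_covar=reg_covar, random_state=random_state
        )
        try:
            gmm.fit(cell.reshape(-1, 1))
        except ValueError:
            continue  # leave NaN so the cell can be flagged below
        background[i] = gmm.means_.min()  # lower of the two component means
    flagged = np.isnan(background)
    return background, flagged


# Usage on made-up data: 5 cells, 40 proteins, half "background", half "signal"
rng = np.random.default_rng(0)
norm = np.hstack([rng.normal(0, 0.3, (5, 20)), rng.normal(6, 0.3, (5, 20))])
background, flagged = per_cell_background_mean(norm)
print(background, flagged)
```

Returning the flag as a plain boolean array alongside the means is one way to report poorly fit cells without introducing a new data structure.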

By the way, regarding muon: our feature-complete version of dsb has a couple of updates and a new function for datasets without access to empty drops; it would be awesome to add these to muon to harmonize the method available in R and Python. (I can bring that up in a separate discussion.)

-Matt

@MattPM
Collaborator

MattPM commented May 31, 2023

@gtca I'm not sure if you have seen the thread above. If this is not something you have bandwidth for, can you let me know? Thank you!
