Inference procedure failures; how to pick learning rate? #184

Closed
majorkazer opened this issue Mar 20, 2023 · 8 comments

@majorkazer

Hi CellBender team!

I'm using CellBender again for the first time since the last major commit (d82893c), and am running into a persistent problem during the Inference procedure within the first minute of the workflow (via Terra).

I get the following error (tried on multiple different count matrices):

cellbender:remove-background: Command:
cellbender remove-background --input /cromwell_root/fc-03f2851c-c351-43b1-a5d8-e4cd01924104/cellranger_outs/D5_Pilot/ctrl_RTX/raw_feature_bc_matrix.h5 --output ctrl_RTX_out.h5 --cuda --expected-cells 25000 --total-droplets-included 50000 --fpr 0.01 --learning-rate 1.0E-7 --exclude-antibody-capture
cellbender:remove-background: 2023-03-20 20:13:56
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from file /cromwell_root/fc-03f2851c-c351-43b1-a5d8-e4cd01924104/cellranger_outs/D5_Pilot/ctrl_RTX/raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Trimming dataset for inference.
cellbender:remove-background: Excluding 4 features that correspond to antibody capture.
cellbender:remove-background: Including 27614 genes that have nonzero counts.
cellbender:remove-background: Prior on counts in empty droplets is 402
cellbender:remove-background: Prior on counts for cells is 3356
cellbender:remove-background: Excluding barcodes with counts below 201
cellbender:remove-background: Using 25000 probable cell barcodes, plus an additional 25000 barcodes, and 28588 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 621 UMI counts.
cellbender:remove-background: Running inference...
cellbender:remove-background: Inference procedure terminated early due to a NaN value in: mu, lam

The suggested fix is to reduce the learning rate.

I have tried adjusting the learning_rate argument as suggested, trying 1e-4, 1e-5, 1e-6, and 1e-7, but I am still getting the same error. After this error, the workflow continues to run for ~60+ minutes and ultimately fails, but sometimes will still produce outputs. In the workflows that don't produce outputs, I see the following error:

cellbender:remove-background: 2023-03-20 19:13:00
cellbender:remove-background: Preparing to write outputs to file...
/cellbender/cellbender/remove_background/infer.py:346: RuntimeWarning: invalid value encountered in greater
  cell_inds = np.where(self.latents['p'] > 0.9)[0]
/cellbender/cellbender/remove_background/infer.py:617: RuntimeWarning: invalid value encountered in greater
  mean_cell_epsilon = (self.latents['epsilon'][self.latents['p'] > 0.5]).mean()
cellbender:remove-background: Optimal posterior regularization factor = 0.27
/opt/conda/conda-bld/pytorch_1591914895884/work/aten/src/ATen/native/cuda/DistributionBernoulli.cu:68: operator(): block: [0,0,0], thread: [2,0,0] Assertion `0 <= p1 && p1 <= 1` failed.
mu problem values: tensor([nan, nan, nan,  ..., nan, nan, nan], device='cuda:0',
       grad_fn=<IndexBackward>)
lam problem values: tensor([nan, nan, nan,  ..., nan, nan, nan], device='cuda:0',
       grad_fn=<IndexBackward>)
A wild NaN appeared!  In param {mu, lam}

Should I keep scaling down learning_rate until the error no longer occurs? Or could there be something else going on? Just for context, these are highly overloaded reactions, so I'm running with the following parameters:
expected_cells = 25000
total_droplets_included = 50000

Thanks in advance for your help!
Best,
Sam

@sjfleming
Member

Hi Sam! This is definitely a problem and should not be happening. I think there might actually be a bug with --exclude-antibody-capture... I'm not sure how many people have used that feature.

Can you do me a favor and try running (with default learning rate) but omit the --exclude-antibody-capture flag? Do you still see this NaN error?

@majorkazer
Author

Hi @sjfleming !

Omitting the --exclude-antibody-capture flag and running with the default learning rate worked! Do you think including the antibody capture sequences in the calculation will make a big difference? I can grab the "uncorrected" antibody capture matrix from the raw cellranger output.

Thanks for the help!
Sam

@sjfleming
Member

Yes, I would recommend grabbing the "uncorrected" antibody capture matrix from the raw cellranger output, if you don't want to use the cellbender outputs for those values. (Actually though, cellbender usually does a good job with the antibody features. See #114 .)

I do not think that including those features in the input will make a very big difference. The other way you could exclude them (since there is currently a bug) would be to first load the data in scanpy, delete the antibody features, and save an h5ad file to use as input to cellbender. CellBender can currently take an anndata h5ad file as input. I would not expect a big difference in the denoised gene counts whether or not antibody features are included in the input. (In fact, I think including them might help rather than hurt.)
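A minimal sketch of that scanpy workaround, offered as an illustration rather than an official recipe. The function names `feature_keep_mask` and `strip_features` and the file paths are hypothetical; it assumes the raw file is a CellRanger v3 h5 for which scanpy fills in a `feature_types` column in `adata.var`.

```python
from typing import Iterable, List


def feature_keep_mask(feature_types: Iterable[str],
                      excluded: Iterable[str] = ("Antibody Capture",)) -> List[bool]:
    """Return True for every feature whose type is not in `excluded`."""
    excluded_set = set(excluded)
    return [ft not in excluded_set for ft in feature_types]


def strip_features(raw_h5: str, out_h5ad: str,
                   excluded: Iterable[str] = ("Antibody Capture",)) -> None:
    """Load a raw CellRanger h5, drop the excluded feature types, write an h5ad."""
    # Imported lazily so the helper above works without scanpy installed.
    import numpy as np
    import scanpy as sc

    # gex_only=False keeps non-gene features (e.g. antibody capture) in the matrix.
    adata = sc.read_10x_h5(raw_h5, gex_only=False)
    mask = np.array(feature_keep_mask(adata.var["feature_types"], excluded))
    # Subset the columns (features) and write a standalone copy to disk.
    adata[:, mask].copy().write_h5ad(out_h5ad)


# Hypothetical usage (paths are placeholders):
# strip_features("raw_feature_bc_matrix.h5", "raw_no_antibody.h5ad")
```

The resulting h5ad file would then be passed to `--input` in place of the raw h5, with `--exclude-antibody-capture` omitted.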

@sjfleming sjfleming self-assigned this Mar 27, 2023
@sjfleming sjfleming added the bug Something isn't working label Mar 27, 2023
@sjfleming sjfleming added this to the v0.3.0 milestone Mar 27, 2023
@sjfleming sjfleming mentioned this issue Mar 28, 2023
@majorkazer
Author

majorkazer commented Mar 28, 2023 via email

@racng

racng commented Jul 17, 2023

Hi! I also ran into this problem and get the following error when I use the --exclude-antibody-capture flag:

cellbender:remove-background: Inference procedure terminated early due to a NaN value in: mu, lam

Is there a release/branch where this has been fixed that you would recommend we install?
Thanks!

@sjfleming sjfleming mentioned this issue Aug 6, 2023
@sjfleming
Member

Hi @racng , this issue should be fixed currently on the dev branch.

I plan to merge this today and then make an official v0.3.0 release.

@sjfleming
Member

With the new version, the --exclude-antibody-capture flag is replaced by an --exclude-feature-types argument. You can use it like --exclude-feature-types "Antibody Capture" to exclude those features.

I typically include the Antibody Capture features myself. But I have talked to people who want to exclude ATAC features from a multiome analysis, because it's a bit hard for cellbender to handle 200k+ features. That can be achieved via --exclude-feature-types Peaks.
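For anyone scripting these runs, here is a small sketch of how the v0.3.0 command line could be assembled from Python. The helper `build_command` and the file names are hypothetical; only flags that appear in this thread are used. Passing the arguments as a list means a value containing a space, such as "Antibody Capture", needs no manual quoting.

```python
import shlex
from typing import List, Optional


def build_command(input_h5: str, output_h5: str,
                  exclude_feature_type: Optional[str] = None,
                  cuda: bool = True) -> List[str]:
    """Assemble a `cellbender remove-background` argument list."""
    cmd = ["cellbender", "remove-background",
           "--input", input_h5, "--output", output_h5]
    if cuda:
        cmd.append("--cuda")
    if exclude_feature_type is not None:
        # One list element per argument, so the space in the value is preserved.
        cmd += ["--exclude-feature-types", exclude_feature_type]
    return cmd


cmd = build_command("raw_feature_bc_matrix.h5", "ctrl_RTX_out.h5",
                    exclude_feature_type="Antibody Capture")
# Print a shell-safe rendering of the command for logging or copy-paste.
print(shlex.join(cmd))

# To actually launch it (requires CellBender v0.3.0 or later to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```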

@sjfleming
Member

Closed by #238
