BCR tutorial #542

Merged
merged 28 commits into scverse:main on Nov 1, 2024

Conversation

MKanetscheider
Collaborator

MKanetscheider commented Aug 22, 2024

Added beta version v2 of the BCR tutorial and adapted the corresponding file so that it can (hopefully) be rendered on Read the Docs. I have drastically reduced the tutorial because I was unsatisfied with the previous version. I will soon add further literature to the .bib file and adapt the glossary to make the tutorial more precise and less overwhelming, while still pointing interested users to additional information.

I would welcome any feedback (@FFinotello @grst) to make the tutorial as good as it can possibly be!

Closes #199

  • Fix TODO comments
  • CHANGELOG.md updated
  • Tutorial updated (if necessary)
  • rerun tutorial with latest version of scirpy once all required functionality is merged
  • add CI test for tutorial
  • review glossary

…accordingly; tested to add two citations into .bib file

@MKanetscheider
Collaborator Author

Hi, could you help me out, please?
Why is the Read the Docs build failing here? I don't really get the issue, as there are only warnings but no further details :/

@grst
Collaborator

grst commented Aug 22, 2024

Warnings are treated as errors.


/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:40002: WARNING: could not find bibtex key "null.2022"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:40005: WARNING: could not find bibtex key "Suo.2023"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:60003: WARNING: could not find bibtex key "Lefranc.2003"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:60005: WARNING: could not find bibtex key "Suo.2023"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:120003: WARNING: could not find bibtex key "Zhu.2023"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:120014: WARNING: could not find bibtex key "Shi.2019"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170022: WARNING: term not in glossary: 'SHM'
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170024: WARNING: could not find bibtex key "Yaari.2015"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170026: WARNING: could not find bibtex key "Gupta.2017"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170026: WARNING: could not find bibtex key "Kepler.2014"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170028: WARNING: could not find bibtex key "Gupta.2017"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170028: WARNING: could not find bibtex key "Yaari.2015"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:190002: WARNING: could not find bibtex key "Yaari.2015"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:190002: WARNING: could not find bibtex key "DeKosky.2013"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:190008: WARNING: could not find bibtex key "Clauset.2004"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:260004: WARNING: could not find bibtex key "Adams.2020"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:280002: WARNING: could not find bibtex key "Nutt.2015"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:320004: WARNING: could not find bibtex key "Finotello.2016"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:320004: WARNING: could not find bibtex key "Pelissier.2023"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:360002: WARNING: py:func reference target not found: scirpy.tl.hill_diversity_profile
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:380002: WARNING: could not find bibtex key "Chao.2014"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:400004: WARNING: py:func reference target not found: scirpy.tl.convert_hill_table
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:400004: WARNING: py:func reference target not found: scirpy.tl.hill_diversity_profile
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:420002: WARNING: could not find bibtex key "Jost.2010"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:530003: WARNING: could not find bibtex key "Kenneth.2017"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:600003: WARNING: py:func reference target not found: scirpy.pl.logoplot_cdr3_motif
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:600003: WARNING: py:func reference target not found: scirpy.pl.logoplot_cdr3_motif
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:600006: WARNING: py:func reference target not found: scirpy.pl.logoplot_cdr3_motif
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:640005: WARNING: py:func reference target not found: scirpy.tl.mutational_load

This means you are referring to citation keys and functions that don't exist.

@MKanetscheider
Collaborator Author

Thanks a lot, that makes sense... I will add the missing citations and, for now, exclude the references to the new functions, as they are still in their own PR but already used in the notebook... 🥹

@MKanetscheider
Collaborator Author

If the Read the Docs build is successful, we will be able to inspect the tutorial in the website interface, right?

@grst
Collaborator

grst commented Aug 22, 2024

If the Read the Docs build is successful, we will be able to inspect the tutorial in the website interface, right?

yes

@MKanetscheider
Collaborator Author

Hi, I also adapted the glossary a little to include some more information on B cells and B-cell clustering, which in my opinion is important to know/clarify but would be confusing if included in the markdown text of the tutorial. I have some questions that might need discussion:

  • is it possible to include the .h5mu file that I used to load the 5k B cells for the tutorial somewhere on GitHub? It is a rather large file (~2.6 GB), so importing it directly into GitHub shouldn't work as far as I'm aware. Is there an alternative solution? I think it's important that any user can experiment a bit with this toy dataset. Is there a way to provide the test data similar to what you used for the TCR tutorial, i.e. load it with its own function call? If this is desired I would be happy to give it a try, but you may need to offer me some guidance, as I'm not sure how "easy" this is for me :/
  • this somewhat overlaps with the previous point, but should I upload the notebook that contains the code to obtain this subsetted (down to 5k B cells) Stephenson dataset somewhere in Scirpy?
  • lastly, I want to report a bug/issue that I noticed during my work with Scirpy and that concerns scirpy.tl.define_clonotype_clusters (and likely also scirpy.tl.define_clonotype, although I did not test this). Scirpy treats any string inside v_call (the same problem applies to j_call) as a unique V-gene assignment, which is perfectly fine when working with Cell Ranger annotation. However, if we are working with re-annotated data, e.g. from IgBlast or IMGT/HighV-QUEST, this is no longer true. First, the annotation contains alleles, depicted like IGHV3-33*001. The problem is that if another cell had IGHV3-33*002, these two cells would always be separated by Scirpy (with same_v_gene=True), because Scirpy considers them entirely different genes, although they only differ in their alleles, even if everything else, including the junction sequence, is quite similar.
    Another issue with IgBlast or IMGT/HighV-QUEST re-annotation is that it often leaves multiple possible gene assignments in the v_call column if they are all similarly likely.
    What I have done so far with my datasets is to manually modify the v_call and j_call columns before loading the dataset into an AnnData object, so that they only contain the first v/j call without the allele information (a minimal sketch of this is included below).

My idea would be to adapt the clonotype-cluster function so that it automatically handles multiple v_calls/j_calls, i.e. only considers the first one, and also ignores the allele information for clustering without modifying the call itself. Immcantation has its own parameter for how to handle multiple gene calls (see the parameter first=FALSE: https://scoper.readthedocs.io/en/stable/topics/hierarchicalClones/).
Actually, I encountered this problem some time ago and discussed it with @felixpetschko, but eventually we both forgot about it until now. Either way, I think it would be good if @grst could also have a look at this problem and help with a solution, because if I remember correctly it is not that trivial to "fix". Maybe there is an elegant workaround available?
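For illustration, a minimal sketch of the kind of manual clean-up described above, assuming the re-annotated AIRR rearrangement table is available as a pandas DataFrame before it is loaded into AnnData (the file name is a placeholder; v_call/j_call follow the AIRR column naming):

```python
import pandas as pd

def simplify_gene_call(call):
    """Keep only the first of multiple comma-separated gene calls and
    drop the allele suffix, e.g. 'IGHV3-33*01,IGHV3-30*02' -> 'IGHV3-33'."""
    if pd.isna(call) or call == "":
        return call
    first_call = call.split(",")[0]   # keep only the first assignment
    return first_call.split("*")[0]   # strip the allele part after '*'

# hypothetical re-annotated rearrangement table, e.g. from IgBlast
airr_table = pd.read_csv("reannotated_rearrangements.tsv", sep="\t")
for col in ["v_call", "j_call"]:
    airr_table[col] = airr_table[col].map(simplify_gene_call)
```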

@grst
Collaborator

grst commented Oct 8, 2024

is it possible to include the .h5mu file that I used to load the 5k B cells for the tutorial somewhere on GitHub? It is a rather large file (~2.6 GB), so importing it directly into GitHub shouldn't work as far as I'm aware. Is there an alternative solution? I think it's important that any user can experiment a bit with this toy dataset. Is there a way to provide the test data similar to what you used for the TCR tutorial, i.e. load it with its own function call? If this is desired I would be happy to give it a try, but you may need to offer me some guidance, as I'm not sure how "easy" this is for me :/

If you can get the size below 2GB (e.g. by changing the compression to gzip when saving the h5mu file), we can attach it to a scirpy release on GitHub. Otherwise it's possible to upload it to figshare.com or maybe huggingface.co. Such a dataset should definitely be available from scirpy.datasets. It should be easy to add, just take a look at the other functions that are already there.
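Just to illustrate the compression idea, a minimal sketch (the file names are placeholders, and this assumes mudata's write passes the compression argument through to HDF5, as anndata's write_h5ad does):

```python
import mudata as md

# Re-save the object with gzip compression to shrink the .h5mu file.
mdata = md.read_h5mu("bcr_5k.h5mu")
mdata.write("bcr_5k_gzip.h5mu", compression="gzip")
```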

@grst
Collaborator

grst commented Oct 8, 2024

  • On second thought, it might be easiest if you just send me the file; then I can add it to the existing figshare where the other datasets are hosted.

@grst
Collaborator

grst commented Oct 11, 2024

  • Regarding preprocessing, did you also check whether nf-core/airrflow is an option for re-annotation? That could also be a pretty smooth workflow: run a Nextflow pipeline first (it also does some standard analyses) and then follow up with scirpy for more custom analyses.

@grst
Collaborator

grst commented Oct 11, 2024

Just dropping comments here as I go through the notebook...

  • Section Define clonotype clusters: I don't really see the bimodality in the plots. Is this just an issue with this dataset, or might there be a problem with our implementation? If the former, could you please add 1-2 sentences discussing why this pattern is not visible in all cases? And maybe link to an example where it works well...

@MKanetscheider
Collaborator Author

MKanetscheider commented Oct 15, 2024

Just dropping comments here as I go through the notebook...

Section Define clonotype clusters: I don't really see the bimodality in the plots. Is this just an issue with this dataset, or might there be a problem with our implementation? If the former, could you please add 1-2 sentences discussing why this pattern is not visible in all cases? And maybe link to an example where it works well...

Actually, I think our implementation is fine: the distribution only somewhat resembles a bimodality, much like what you can see in the shazam tutorial (https://shazam.readthedocs.io/en/latest/vignettes/DistToNearest-Vignette/). I think that's also why they came up with a computational model to select an appropriate threshold, as it's usually not very clear from the plot alone.
I wrote a short discussion noting that this can occur and that in such cases a fixed threshold might reduce human bias... I know this is not ideal, but since we don't have a way to automatically detect bimodality, this should be sufficient for now.
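As a rough sketch of what such a fixed-threshold workflow could look like with scirpy (the metric and the cutoff of 2 are placeholders, not a recommendation; mdata stands for the MuData object of the tutorial):

```python
import scirpy as ir

# Pairwise CDR3 amino-acid distances with a fixed cutoff instead of a
# threshold read off a (possibly unclear) distance-to-nearest plot.
ir.pp.ir_dist(mdata, metric="levenshtein", sequence="aa", cutoff=2)

# Group cells into clonotype clusters based on those distances;
# same_v_gene=True additionally requires matching V gene calls.
ir.tl.define_clonotype_clusters(
    mdata, sequence="aa", metric="levenshtein", same_v_gene=True
)
```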

@MKanetscheider
Collaborator Author

MKanetscheider commented Oct 15, 2024

Regarding preprocessing, did you also check whether nf-core/airrflow is an option for re-annotation? That could also be a pretty smooth workflow: run a Nextflow pipeline first (it also does some standard analyses) and then follow up with scirpy for more custom analyses.

Yes, I did. It should be usable as a re-annotation tool, since it works with single-cell data derived from Cell Ranger and outputs a .tsv file that follows the AIRR community standards. Do you want to integrate this into the tutorial somehow?

@grst
Collaborator

grst commented Oct 17, 2024

For now, I removed a few sections that depend on other open PRs (#536, #534, #535) and copied their content over to those PRs. I believe that way we can wrap up this PR faster and discuss the other sections in a more focused manner.

Yes, I did. It should be usable as a re-annotation tool, since it works with single-cell data derived from Cell Ranger and outputs a .tsv file that follows the AIRR community standards. Do you want to integrate this into the tutorial somehow?

I think it might be even easier to use than Dandelion for preprocessing. If you think it gives equally good results, we should mention it as another option for preprocessing in the corresponding section.

@MKanetscheider
Collaborator Author

For now, I removed a few sections that depend on other open PRs (#536, #534, #535) and copied their content over to those PRs. I believe that way we can wrap up this PR faster and discuss the other sections in a more focused manner.

Yes, you are definitely right. In a way this tutorial is almost finished, but of course it depends on whether and how much we change in the remaining PRs. So it makes sense to wrap this one up and add sections as part of the other PRs.

I think it might be even easier to use than Dandelion for preprocessing. If you think it gives equally good results, we should mention it as another option for preprocessing in the corresponding section.

If you wish, I will add a reference in an appropriate place so that the user is aware of this possibility 👍
The interesting thing is that Dandelion also relies heavily on Immcantation, so the re-annotation pipeline is essentially the same. The only difference I can see is that with Dandelion one can convert between a Dandelion object and AnnData/MuData quite easily, while in the nf-core workflow one has to write and then read an appropriate file first. Either way, I don't feel like that should be a big obstacle. 😄
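For example, getting the nf-core/airrflow output back into scirpy should mostly come down to reading the AIRR-formatted rearrangement table it writes; a minimal sketch, with the path as a placeholder:

```python
import scirpy as ir

# nf-core/airrflow writes AIRR rearrangement .tsv files; scirpy can read
# those directly into an AnnData object with the receptor information.
adata_bcr = ir.io.read_airr("airrflow_results/repertoire.tsv")
```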

@grst
Collaborator

grst commented Nov 1, 2024

I went through the remaining bits and also added a reference to AIRRflow.
Thanks for your patience and persistence while working on this!

We'll follow up on the missing pieces in #536, #535 and #534

I'll merge this as soon as the tests have run through.

grst merged commit 86e93ce into scverse:main on Nov 1, 2024
10 checks passed