BCR tutorial #542

Merged
merged 28 commits into scverse:main on Nov 1, 2024

Conversation

MKanetscheider
Collaborator

MKanetscheider commented Aug 22, 2024

Added beta version v2 of the BCR tutorial and adapted the corresponding file so that it can (hopefully) be rendered on Read the Docs. I have drastically reduced the tutorial because I was unsatisfied with the previous version. I will soon add further literature to the .bib file and adapt the glossary to make the tutorial more precise and less overwhelming, while still pointing interested users to additional information.

I would welcome any feedback (@FFinotello @grst) to make the tutorial as good as it can possibly be!

Closes #199

  • Fix TODO comments
  • CHANGELOG.md updated
  • Tutorial updated (if necessary)
  • rerun tutorial with latest version of scirpy once all required functionality is merged
  • add CI test for tutorial
  • review glossary

…accordingly; tested to add two citations into .bib file

@MKanetscheider
Collaborator Author

Hi, could you help me out, please?
Why is the Read the Docs build failing here? I don't really get the issue, as there are only warnings but no further details :/

@grst
Collaborator

grst commented Aug 22, 2024

Warnings are treated as errors.


/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:40002: WARNING: could not find bibtex key "null.2022"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:40005: WARNING: could not find bibtex key "Suo.2023"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:60003: WARNING: could not find bibtex key "Lefranc.2003"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:60005: WARNING: could not find bibtex key "Suo.2023"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:120003: WARNING: could not find bibtex key "Zhu.2023"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:120014: WARNING: could not find bibtex key "Shi.2019"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170022: WARNING: term not in glossary: 'SHM'
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170024: WARNING: could not find bibtex key "Yaari.2015"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170026: WARNING: could not find bibtex key "Gupta.2017"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170026: WARNING: could not find bibtex key "Kepler.2014"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170028: WARNING: could not find bibtex key "Gupta.2017"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170028: WARNING: could not find bibtex key "Yaari.2015"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:190002: WARNING: could not find bibtex key "Yaari.2015"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:190002: WARNING: could not find bibtex key "DeKosky.2013"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:190008: WARNING: could not find bibtex key "Clauset.2004"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:260004: WARNING: could not find bibtex key "Adams.2020"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:280002: WARNING: could not find bibtex key "Nutt.2015"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:320004: WARNING: could not find bibtex key "Finotello.2016"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:320004: WARNING: could not find bibtex key "Pelissier.2023"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:360002: WARNING: py:func reference target not found: scirpy.tl.hill_diversity_profile
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:380002: WARNING: could not find bibtex key "Chao.2014"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:400004: WARNING: py:func reference target not found: scirpy.tl.convert_hill_table
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:400004: WARNING: py:func reference target not found: scirpy.tl.hill_diversity_profile
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:420002: WARNING: could not find bibtex key "Jost.2010"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:530003: WARNING: could not find bibtex key "Kenneth.2017"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:600003: WARNING: py:func reference target not found: scirpy.pl.logoplot_cdr3_motif
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:600003: WARNING: py:func reference target not found: scirpy.pl.logoplot_cdr3_motif
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:600006: WARNING: py:func reference target not found: scirpy.pl.logoplot_cdr3_motif
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:640005: WARNING: py:func reference target not found: scirpy.tl.mutational_load

This means you are referring to citation keys and functions that don't exist.

@MKanetscheider
Collaborator Author

Thanks a lot, that makes sense... I will add the missing citations and, for now, exclude the references to the new functions, as they are still in their own PR but already used in the notebook... 🥹

@MKanetscheider
Collaborator Author

If the Read the Docs build is successful, we will be able to inspect the tutorial in the website interface, right?

@grst
Collaborator

grst commented Aug 22, 2024

If the Read the Docs build is successful, we will be able to inspect the tutorial in the website interface, right?

yes

@MKanetscheider
Collaborator Author

Hi, I also adapted the glossary a little to include some more information on B cells and B-cell clustering, which in my opinion is important to know/clarify but would be confusing if included in the markdown text of the tutorial. I have some questions that might need discussion:

  • is it possible to include the .h5mu file that I used to load the 5k B cells for the tutorial somewhere on GitHub? It is a rather large file (~2.6 GB), so importing it directly into GitHub shouldn't work as far as I'm aware. Is there an alternative solution? I think it's important that any user can experiment a bit with this toy dataset. Is there a way to provide the test data similar to what you used for the TCR tutorial, i.e. load it with its own function call? If this is desired I would be happy to give it a try, but you may need to offer me some guidance, as I'm not sure how "easy" this is for me :/
  • this somewhat overlaps with the previous point, but should I upload the notebook that contains the code to obtain this subsetted (down to 5k B cells) Stephenson dataset somewhere in Scirpy?
  • lastly, I want to report a bug/issue that I noticed during my work with Scirpy and that concerns scirpy.tl.define_clonotype_clusters (and likely also scirpy.tl.define_clonotype, although I did not test this). Scirpy treats any string inside v_call (the same problem applies to j_call) as a unique V-gene assignment, which is perfectly fine when working with Cell Ranger annotation. However, if we are working with re-annotated data, e.g. from IgBlast or IMGT/HighV-QUEST, this is no longer true. First, the annotation contains alleles, depicted like IGHV3-33*001. The problem is that if another cell had IGHV3-33*002, these two cells would always be separated by Scirpy (with same_v_gene=True), because Scirpy considers them entirely different genes, although they only differ in their alleles, even if everything else, including the junction sequence, is quite similar.
    Another issue with IgBlast or IMGT/HighV-QUEST re-annotation is that it often leaves multiple possible gene assignments in the v_call column if they are all similarly likely.
    What I have done so far with my datasets is to manually modify the v_call and j_call columns before loading the dataset into an AnnData object, so that they only contain the first v/j call without the allele information (a minimal sketch of this is included below).

My idea would be to adapt the clonotype-cluster function so that it automatically handles multiple v_calls/j_calls, i.e. only considers the first one, and also ignores the allele information for clustering without modifying the call itself. Immcantation has its own parameter for how to handle multiple gene calls (see the parameter first=FALSE: https://scoper.readthedocs.io/en/stable/topics/hierarchicalClones/).
Actually, I encountered this problem some time ago and discussed it with @felixpetschko, but eventually we both forgot about it until now. Either way, I think it would be good if @grst could also have a look at this problem and help with a solution, because if I remember correctly it is not that trivial to "fix". Maybe there is an elegant workaround available?
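For illustration, a minimal sketch of the kind of manual clean-up described above, assuming the re-annotated AIRR rearrangement table is available as a pandas DataFrame before it is loaded into AnnData (the file name is a placeholder; v_call/j_call follow the AIRR column naming):

```python
import pandas as pd

def simplify_gene_call(call):
    """Keep only the first of multiple comma-separated gene calls and
    drop the allele suffix, e.g. 'IGHV3-33*01,IGHV3-30*02' -> 'IGHV3-33'."""
    if pd.isna(call) or call == "":
        return call
    first_call = call.split(",")[0]   # keep only the first assignment
    return first_call.split("*")[0]   # strip the allele part after '*'

# hypothetical re-annotated rearrangement table, e.g. from IgBlast
airr_table = pd.read_csv("reannotated_rearrangements.tsv", sep="\t")
for col in ["v_call", "j_call"]:
    airr_table[col] = airr_table[col].map(simplify_gene_call)
```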

@grst
Collaborator

grst commented Oct 8, 2024

is it possible to include the .h5mu file that I used to load the 5k B cells for the tutorial somewhere on GitHub? It is a rather large file (~2.6 GB), so importing it directly into GitHub shouldn't work as far as I'm aware. Is there an alternative solution? I think it's important that any user can experiment a bit with this toy dataset. Is there a way to provide the test data similar to what you used for the TCR tutorial, i.e. load it with its own function call? If this is desired I would be happy to give it a try, but you may need to offer me some guidance, as I'm not sure how "easy" this is for me :/

If you can get the size below 2GB (e.g. by changing the compression to gzip when saving the h5mu file), we can attach it to a scirpy release on GitHub. Otherwise it's possible to upload it to figshare.com or maybe huggingface.co. Such a dataset should definitely be available from scirpy.datasets. It should be easy to add, just take a look at the other functions that are already there.
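Just to illustrate the compression idea, a minimal sketch (the file names are placeholders, and this assumes mudata's write passes the compression argument through to HDF5, as anndata's write_h5ad does):

```python
import mudata as md

# Re-save the object with gzip compression to shrink the .h5mu file.
mdata = md.read_h5mu("bcr_5k.h5mu")
mdata.write("bcr_5k_gzip.h5mu", compression="gzip")
```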

@grst
Collaborator

grst commented Oct 8, 2024

  • On second thought, it might be easiest if you just send me the file; then I can add it to the existing figshare where the other datasets are hosted.

@grst
Collaborator

grst commented Oct 11, 2024

  • Regarding preprocessing, did you also check whether nf-core/airrflow is an option for re-annotation? That could also be a pretty smooth workflow: run a Nextflow pipeline first (it also does some standard analyses) and then follow up with scirpy for more custom analyses.

@grst
Collaborator

grst commented Oct 11, 2024

Just dropping comments here as I go through the notebook...

  • Section Define clonotype clusters: I don't really see the bimodality in the plots. Is this just an issue with this dataset, or might there be a problem with our implementation? If the former, could you please add 1-2 sentences discussing why this pattern is not visible in all cases? And maybe link to an example where it works well...

@MKanetscheider
Collaborator Author

MKanetscheider commented Oct 15, 2024

Just dropping comments here as I go through the notebook...

Section Define clonotype clusters: I don't really see the bimodality in the plots. Is this just an issue with this dataset, or might there be a problem with our implementation? If the former, could you please add 1-2 sentences discussing why this pattern is not visible in all cases? And maybe link to an example where it works well...

Actually, I think our implementation is fine: the distribution only somewhat resembles a bimodality, much like what you can see in the shazam tutorial (https://shazam.readthedocs.io/en/latest/vignettes/DistToNearest-Vignette/). I think that's also why they came up with a computational model to select an appropriate threshold, as it's usually not very clear from the plot alone.
I wrote a short discussion noting that this can occur and that in such cases a fixed threshold might reduce human bias... I know this is not ideal, but since we don't have a way to automatically detect bimodality, this should be sufficient for now.
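As a rough sketch of what such a fixed-threshold workflow could look like with scirpy (the metric and the cutoff of 2 are placeholders, not a recommendation; mdata stands for the MuData object of the tutorial):

```python
import scirpy as ir

# Pairwise CDR3 amino-acid distances with a fixed cutoff instead of a
# threshold read off a (possibly unclear) distance-to-nearest plot.
ir.pp.ir_dist(mdata, metric="levenshtein", sequence="aa", cutoff=2)

# Group cells into clonotype clusters based on those distances;
# same_v_gene=True additionally requires matching V gene calls.
ir.tl.define_clonotype_clusters(
    mdata, sequence="aa", metric="levenshtein", same_v_gene=True
)
```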

@MKanetscheider
Collaborator Author

MKanetscheider commented Oct 15, 2024

Regarding preprocessing, did you also check whether nf-core/airrflow is an option for re-annotation? That could also be a pretty smooth workflow: run a Nextflow pipeline first (it also does some standard analyses) and then follow up with scirpy for more custom analyses.

Yes, I did. It should be usable as a re-annotation tool, since it works with single-cell data derived from Cell Ranger and outputs a .tsv file that follows the AIRR community standards. Do you want to integrate this into the tutorial somehow?

@grst
Collaborator

grst commented Oct 17, 2024

For now, I removed a few sections that depend on other open PRs (#536, #534, #535) and copied their content over to those PRs. I believe that way we can wrap up this PR faster and discuss the other sections in a more focused manner.

Yes, I did. It should be usable as a re-annotation tool, since it works with single-cell data derived from Cell Ranger and outputs a .tsv file that follows the AIRR community standards. Do you want to integrate this into the tutorial somehow?

I think it might be even easier to use than Dandelion for preprocessing. If you think it gives equally good results, we should mention it as another option for preprocessing in the corresponding section.

@MKanetscheider
Collaborator Author

For now, I removed a few sections that depend on other open PRs (#536, #534, #535) and copied their content over to those PRs. I believe that way we can wrap up this PR faster and discuss the other sections in a more focused manner.

Yes, you are definitely right. In a way this tutorial is almost finished, but of course it depends on whether and how much we change in the remaining PRs. So it makes sense to wrap this one up and add sections as part of the other PRs.

I think it might be even easier to use than Dandelion for preprocessing. If you think it gives equally good results, we should mention it as another option for preprocessing in the corresponding section.

If you wish, I will add a reference in an appropriate place so that the user is aware of this possibility 👍
The interesting thing is that Dandelion also relies heavily on Immcantation, so the re-annotation pipeline is essentially the same. The only difference I can see is that with Dandelion one can convert between a Dandelion object and AnnData/MuData quite easily, while in the nf-core workflow one has to write and then read an appropriate file first. Either way, I don't feel like that should be a big obstacle. 😄
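For example, getting the nf-core/airrflow output back into scirpy should mostly come down to reading the AIRR-formatted rearrangement table it writes; a minimal sketch, with the path as a placeholder:

```python
import scirpy as ir

# nf-core/airrflow writes AIRR rearrangement .tsv files; scirpy can read
# those directly into an AnnData object with the receptor information.
adata_bcr = ir.io.read_airr("airrflow_results/repertoire.tsv")
```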

@grst
Collaborator

grst commented Nov 1, 2024

I went through the remaining bits and also added a reference to AIRRflow.
Thanks for your patience and persistence while working on this!

We'll follow up on the missing pieces in #536, #535 and #534

I'll merge this as soon as the tests have run through.

grst merged commit 86e93ce into scverse:main on Nov 1, 2024
10 checks passed