Add guide on how to estimate clade frequencies #53

Open · wants to merge 1 commit into master

huddlej (Contributor) commented on Mar 15, 2021

Description of proposed changes

Adds the Jupyter notebook and the corresponding reStructuredText version of a how-to guide for estimating clade frequencies from SARS-CoV-2 data.

An open question with this guide (and future guides like it) is where we should source the data. The benefit of the current approach is that users do not need to prepare any data in advance; data are fetched from the live Nextstrain builds. The disadvantages are that the guide's static figures quickly diverge from the current data, and that we don't show users how to load their own local data, which may be much more relevant to them.
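One way to address both concerns would be to let the notebook accept either a URL or a local path for each JSON input, so the same cells work with live Nextstrain data or a user's own build output. A minimal sketch, assuming the inputs are plain Auspice JSON files; `load_auspice_json` is a hypothetical helper name, not part of any Nextstrain package:

```python
import json
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def load_auspice_json(source):
    """Load JSON from an http(s) URL or from a local file path.

    Hypothetical helper: lets one notebook read either live Nextstrain
    data or a user's own local Auspice JSON files.
    """
    scheme = urlparse(str(source)).scheme
    if scheme in ("http", "https"):
        # Remote file, e.g. a live Nextstrain build on data.nextstrain.org.
        with urlopen(source) as response:
            return json.load(response)
    # Anything else is treated as a path on the local filesystem.
    with open(Path(source), encoding="utf-8") as handle:
        return json.load(handle)


# Example usage (not executed here):
# tree = load_auspice_json("https://data.nextstrain.org/ncov_open_global.json")
# tree = load_auspice_json("auspice/my_build.json")  # hypothetical local path
```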

Another potential issue is how we should maintain guides like this that we generate directly from a notebook environment. To prepare this guide for the docs, I had to manually copy images into the images directory and rename them for clarity. The HTML/CSS presentation of tables is also not ideal. We might want to standardize these steps for future guides, even if the standard is just a checklist in the documentation's own documentation.
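The manual image step could be scripted. A minimal sketch, assuming notebooks are saved in nbformat 4, where plot outputs are stored as base64 strings under `outputs[].data["image/png"]`; `extract_notebook_images` and its `<prefix>-cell<N>-<M>.png` naming scheme are hypothetical, not an existing tool or convention:

```python
import base64
import json
from pathlib import Path


def extract_notebook_images(notebook_path, images_dir, prefix):
    """Write every PNG output of a notebook into images_dir.

    Files are named "<prefix>-cell<cell index>-<output index>.png"
    (a hypothetical naming scheme). Returns the written paths.
    """
    with open(notebook_path, encoding="utf-8") as handle:
        notebook = json.load(handle)

    images_dir = Path(images_dir)
    images_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for cell_index, cell in enumerate(notebook.get("cells", [])):
        # Only code cells carry outputs in nbformat 4.
        if cell.get("cell_type") != "code":
            continue
        for output_index, output in enumerate(cell.get("outputs", [])):
            png = output.get("data", {}).get("image/png")
            if png is None:
                continue
            # Multiline strings may be stored as a list of lines.
            if isinstance(png, list):
                png = "".join(png)
            path = images_dir / f"{prefix}-cell{cell_index}-{output_index}.png"
            path.write_bytes(base64.b64decode(png))
            written.append(path)
    return written
```

Predictable file names would also make it easier to reference images from the .rst version without renaming them by hand.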

Testing

The initial guide was tested by @kistlerk, and this version reflects edits based on most of her comments. One comment I did not address here was a suggestion to let users source their own local data for the guide instead of fetching the live Nextstrain data (see the discussion above).

The guide is available through this PR's Read the Docs (RTD) build.

Comment on lines +73 to +74
tree_url = "https://data.nextstrain.org/ncov_global.json"
frequencies_url = "https://data.nextstrain.org/ncov_global_tip-frequencies.json"
Member

Suggested change:
-tree_url = "https://data.nextstrain.org/ncov_global.json"
-frequencies_url = "https://data.nextstrain.org/ncov_global_tip-frequencies.json"
+tree_url = "https://data.nextstrain.org/ncov_open_global.json"
+frequencies_url = "https://data.nextstrain.org/ncov_open_global_tip-frequencies.json"
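For context on what these two files feed into: one way to estimate a clade's frequency at each time point is to sum the tip frequencies of all tips assigned to that clade. A minimal sketch, assuming the tree JSON follows the Auspice v2 layout with clade labels under `node_attrs["clade_membership"]["value"]`, and that the tip-frequencies JSON maps strain names to `{"frequencies": [...]}` alongside a top-level `"pivots"` list. The function names are hypothetical, and this is not necessarily the exact method the guide uses:

```python
from collections import defaultdict


def tip_clades(tree, clade_key="clade_membership"):
    """Map each tip name to its clade label (iterative walk avoids deep recursion)."""
    clades = {}
    stack = [tree]
    while stack:
        node = stack.pop()
        children = node.get("children", [])
        if children:
            stack.extend(children)
        else:
            # Assumed layout: node_attrs[clade_key]["value"] holds the clade name.
            clades[node["name"]] = node.get("node_attrs", {}).get(clade_key, {}).get("value")
    return clades


def clade_frequencies(tree_json, frequencies_json, clade_key="clade_membership"):
    """Sum tip frequencies per clade at each pivot; returns (pivots, {clade: [freqs]})."""
    pivots = frequencies_json["pivots"]
    clades = tip_clades(tree_json["tree"], clade_key)
    totals = defaultdict(lambda: [0.0] * len(pivots))
    for tip, entry in frequencies_json.items():
        # Skip non-tip keys such as "pivots" or "generated_by".
        if not isinstance(entry, dict) or "frequencies" not in entry:
            continue
        clade = clades.get(tip)
        if clade is None:
            continue
        for i, value in enumerate(entry["frequencies"]):
            totals[clade][i] += value
    return pivots, dict(totals)
```

Here `tree_json` and `frequencies_json` would be the parsed contents of `tree_url` and `frequencies_url` (for example, via `json.load`).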

@@ -0,0 +1,758 @@
========================================================
Estimate frequencies of phylogenetic clades through time
Member

> Another potential issue is how we should maintain guides like this that we generate directly from a notebook environment.

What about an extension like nbsphinx or MyST-NB? We'd still have to run the notebook to generate new plots, but that would be less manual work than maintaining the same content in both .ipynb and .rst files.
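With nbsphinx, the .ipynb file could be listed in a toctree and rendered by Sphinx directly, so plots and tables would come from the notebook's stored outputs. A minimal sketch of the relevant `conf.py` settings, assuming nbsphinx is installed and appended to any existing `extensions` list; the option names are nbsphinx's, but the values are a suggestion for this repo:

```python
# conf.py (sketch): render notebooks directly with nbsphinx.
extensions = [
    "nbsphinx",
]

# Use the outputs already saved in the .ipynb rather than re-running the
# notebook during the docs build (re-running would fetch live data).
nbsphinx_execute = "never"

# Keep Jupyter's autosave checkpoint copies out of the build.
exclude_patterns = ["_build", "**.ipynb_checkpoints"]
```

MyST-NB has an analogous switch (`nb_execution_mode = "off"` in recent releases), so the same "run locally, commit outputs" workflow would apply with either extension.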

huddlej (Contributor, Author) commented on Apr 27, 2022

During issue triage, we also realized that this guide could be updated to use Nextstrain open data.

Labels: none yet
Projects: no open projects; Status: Backlog
Participants: 2