Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Augment sarscov2 community builds listing and include datasets in sequence search #266

Closed
wants to merge 13 commits into from

Conversation

eharkins
Copy link
Contributor

see commit messages for now, just putting this up for review since I came to a good breaking point even though there is more to do.

@tsibley tsibley temporarily deployed to nextstrain-s-fetch-comm-qgeg3k January 27, 2021 20:20 Inactive
@eharkins eharkins temporarily deployed to nextstrain-s-fetch-comm-qgeg3k January 28, 2021 01:02 Inactive
@eharkins eharkins temporarily deployed to nextstrain-s-fetch-comm-qgeg3k January 29, 2021 21:28 Inactive
@eharkins
Copy link
Contributor Author

eharkins commented Jan 29, 2021

@jameshadfield Is this clear enough or should we explicitly indicate somehow whether datasets in sequence search results are from Nextstrain or the community?

Screen Shot 2021-01-29 at 1 59 57 PM

(edit: so far you can only get community datasets in search results like this by changing the search results jsonUrl locally, since i havent uploaded this to data.nextstrain.org yet)

@emmahodcroft
Copy link
Member

Just an idea, but could we port the same symbols you used for the map view of builds instead of showing the 'folded corner page' symbol? That would also indicate how 'focused' the build in, which might also help people decide which one might be of most help, depending on why/what they're looking for!

creates an augmented version of
static-site/content/allSARS-CoV-2Builds.yaml
by fetching the actual datasets
to get information about them such
as date updated and number of
tips to be displayed on the build
listing and map.
building on:
081bd55633133e586a3dae86617564e1f2854418
, which fetched community datasets listed
in static-site/content/allSARS-CoV-2Builds.yaml
in order to augment that list with metadata.
This takes the fetched community datasets
and includes them in the sarscov2 search
results database file. Still to-do is testing
this on the front end.
@eharkins eharkins temporarily deployed to nextstrain-s-fetch-comm-qgeg3k February 3, 2021 02:35 Inactive
@eharkins eharkins temporarily deployed to nextstrain-s-fetch-comm-qgeg3k February 3, 2021 02:44 Inactive
@eharkins
Copy link
Contributor Author

eharkins commented Feb 3, 2021

Good idea @emmahodcroft. I think we could do this although the build level isn't immediately accessible in the strain map created for the search results since it's currently just strain -> list of dataset indexes, but we could match up those datasets names with entries from the build list that has the levels in it if we want to go with those icons.

@eharkins eharkins marked this pull request as ready for review February 3, 2021 02:45
@eharkins
Copy link
Contributor Author

eharkins commented Feb 3, 2021

@jameshadfield this should be functional so I marked ready for review. However, before we merge, I still need to change a few URLs to data.nextstrain.org from staging.nextstrain.org.

Just realized I also need to do this (as we discussed last week):
Go through the sars-cov-2 build catalogue and make sure all entries have URLs that point to an auspice server so that we can include all of them and not filter them as we currently are. I believe you mentioned wanting to include the core nextstrain builds in the "augmenting" process so that we can get metadata for those on the map as well. Just noting to myself here that in doing that I'll need to make sure that doesn't result in duplicate datasets in the strain map, since I imagine the nextstrain builds in the catalogue are (all?) also in the s3 bucket that we have been using for search results.

@emmahodcroft
Copy link
Member

Yes, good points @eharkins. I think this is an 'optional extra' rather than a dealbreaker - so if it's a lot of work do to this matching & maintain it - no worries!

Copy link
Member

@jameshadfield jameshadfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @eharkins -- this is looking really good.

I think there's a number of optional extras which I'd be in favor of making issues for rather than solving in this PR:

  • augmenting core datasets (as mentioned above)
  • changing the search icons, if possible
  • cleaning up the displayed URL in search so that we don't display the query part

The functionality and code looks good - tested it a few different ways and it handles problems well (including if the fetch fails).

Things to add / fix before merge:

  • dev docs to explain that the manually curated YAML is being augmented, and that this is happening via GitHub actions (it is, right?)
  • brief dev docs to explain what extra information's added and what's needed (backend + frontend) to add more here
  • URLs to change (as you noted)
  • The augmented YAML currently includes “block chomping indicators” (YAML is crazy when it comes to multi-line strings) which should be removed. For instance:
  - url: >-
      http://auspice.finlaymagui.re/ncov/north-america/canada/ontario?f_division=Ontario
    name: Ontario, Canada
    geo: ontario
    level: division
  • ./scripts/collect-pathogen-resources.js --pathogen seasonal-flu fails:
Writing search data to file  ./data/search_seasonal-flu.json
(node:18117) UnhandledPromiseRejectionWarning: TypeError [ERR_INVALID_ARG_TYPE]: The "path" argument must be of type string. Received undefined
    at validateString (internal/validators.js:124:11)
    at Object.basename (path.js:1156:5)
    at main (/Users/enigma/projects/nextstrain/nextstrain.org/scripts/collect-pathogen-resources.js:96:43)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
...

P.S. I hadn't realised that Nextstrain groups datasets were part of this YAML - it's awesome that they are 😄

@eharkins eharkins temporarily deployed to nextstrain-s-fetch-comm-qgeg3k February 5, 2021 00:20 Inactive
scripts/collect-search-results.js reads in a
list of builds in a manually maintained pathogen
build catalogue yaml file such as
static-site/content/allSARS-CoV-2Builds.yaml
and produces an augmented version with metadata
from each corresponding dataset.
(see 82e072d).
That augmented yaml file is stored on s3 and
fetched here to populate e.g the build-map
(front-end manisfestation of that pathogen build
catalogue on the page) with metadata about each
dataset (e.g. date updated).
this makes this component more
generally applicable for when we
want to have similar build catalogues
for other pathogens
we have a couple of scripts that are
utilities for creating and checking the
format of data resources behind pages
like nextstrain.org/sars-cov-2. This
generalizes the code in those scripts
so that they make sense when we
apply them to other pathogens in
the future
also update the github action using
scripts/collect-pathogen-resources.js

also note that upon merging this we
will need to update a link in the following
doc:
https://docs.nextstrain.org/en/latest/guides/share/sars-cov-2.html ;
the link points to:
scripts/check-sars-cov-2-builds-yaml.js
which is now called:
scripts/check-build-catalogue-yaml.js
this takes the augmenting of the build
catalogue and abstracts it into a function
fixes bug when running for seasonal-flu
introduced by work on this script. Better
logic overall for writing outputs and
printing messages
@eharkins eharkins temporarily deployed to nextstrain-s-fetch-comm-qgeg3k February 5, 2021 01:37 Inactive
eharkins added a commit to nextstrain/docs.nextstrain.org that referenced this pull request Feb 5, 2021
@eharkins
Copy link
Contributor Author

eharkins commented Feb 5, 2021

Thanks @jameshadfield! Cleaned up the commit history a bit and took care of these three:

URLs to change (as you noted)

The augmented YAML currently includes “block chomping indicators” (YAML is crazy when it comes to multi-line strings) which should be removed.

gen-resources.js --pathogen seasonal-flu fails

I also ran nextstrain remote upload s3://nextstrain-data data/allSARS-CoV-2Builds.augmented.yaml so that that file exists on s3 for when we merge and we dont have to wait for a GH action to run (and so that it works on the heroku dev app before we merge).

Docs PR to address dev docs concerns: nextstrain/docs.nextstrain.org#44

this directory contains components
which are now generally applicable
to any page that wants to display a
collection of info about any grouping
of builds (e.g. flu, groups, commeunity),
not just for sars-cov-2.
@eharkins eharkins temporarily deployed to nextstrain-s-fetch-comm-qgeg3k February 8, 2021 22:16 Inactive
@eharkins eharkins mentioned this pull request Feb 10, 2021
@jameshadfield jameshadfield temporarily deployed to nextstrain-s-fetch-comm-zq3gez March 4, 2021 01:39 Inactive
@jameshadfield
Copy link
Member

Closing this as all commits are in #271 - Eli please reopen if i'm missing something!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants