consider changing data logic #174
@ryantibs this is a candidate improvement to complete before the October 5 meeting
Thanks @krivard. And that sounds like a good idea, @sgratzl. What happened to the idea of reading data from disk instead of from the S3 bucket? I remember @nmdefries mentioning that it's actually slow to read from the S3 bucket. And then I suggested we just download a local clone of all the data each Monday that we can read from for the dashboard. Has that been implemented, and does it lead to speed improvements?
I've implemented caching for the score data in #169 that only loads the score data twice a week, after each pipeline run. Every user besides the first one following a pipeline run will be reading the scores from memory. Releasing the caching change is waiting on another PR -- let me go poke some people.
This behavior has been added for target variables, but geo types are still combined, since separating them would provide only a small additional benefit. Most of the time we're handling state forecasts, both when a single state is requested and when we're summarizing across them; separating out US data only saves effort ~1/60 of the time. A full fix here would be to load and store every target variable x forecaster x geo (x date?) combination separately, so we can load minimal data at each step and filter chunks by directory name (in a hierarchical file structure) or element name (in a list) instead of filtering data by row. That should be a lot faster.
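A minimal sketch of what the hierarchical layout described above might look like, assuming the scores are serialized as `.rds` files. The directory layout, file names, and function names here are all hypothetical, not the repo's actual code:

```r
# Hypothetical layout: data/scores/<target>/<forecaster>/<geo>.rds
# One file per combination, so a request reads only the chunk it needs
# and never filters by row.
score_path <- function(root, target, forecaster, geo) {
  file.path(root, target, forecaster, paste0(geo, ".rds"))
}

load_scores <- function(root, target, forecaster, geo) {
  path <- score_path(root, target, forecaster, geo)
  if (!file.exists(path)) return(NULL)
  readRDS(path)  # the file itself is the filter
}

# e.g. load_scores("data/scores", "cases", "some-forecaster", "state")
```

Filtering by path instead of by row also means the date dimension could be folded in the same way if profiling shows it pays off.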
At the moment the app loads all the data, stored in multiple files (cases, deaths, hospitalizations, per US/state), and then one of the first steps is to filter it again by targetVariable (cases, deaths, ...), scoreType, and location.
One option would be to load only the data that is really needed, and split it up into multiple files (targetVariable x score x location (nation or states)). This would reduce the initial loading time, and with https://shiny.rstudio.com/reference/shiny/1.6.0/bindCache.html Shiny could take care of caching datasets.
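A rough sketch of how `bindCache()` could pair with the split files, assuming one `.rds` file per targetVariable x scoreType x location combination. The input IDs and file paths are placeholders, not the dashboard's actual names:

```r
library(shiny)

server <- function(input, output, session) {
  # Read only the file matching the current selection; bindCache()
  # memoises the result across sessions, keyed on the three inputs,
  # so repeated selections skip the disk read entirely.
  scores <- reactive({
    readRDS(file.path("data", input$targetVariable, input$scoreType,
                      paste0(input$location, ".rds")))
  }) |>
    bindCache(input$targetVariable, input$scoreType, input$location)

  output$scoreTable <- renderTable(scores())
}
```

By default the cache is per-app-process; `shinyOptions(cache = ...)` could make it shared or disk-backed if memory becomes a concern.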