consider changing data logic #174

sgratzl · 2021-09-18T19:46:51Z

At the moment the app loads all the data stored in multiple files (cases, deaths, hospitalizations, per US/states) and then, as one of the first steps, filters it again by targetVariable (cases, deaths, ...), scoreType, and location.

One option would be to load only the data that is really needed and split it up into multiple files (targetVariable x score x location (nation or states)). This would reduce the initial loading time, and with https://shiny.rstudio.com/reference/shiny/1.6.0/bindCache.html shiny could take care of caching datasets.
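
A minimal sketch of the idea, written in Python purely for illustration (the app itself is R/Shiny, where `bindCache` would play the caching role). The file naming scheme, the `score_file` and `load_scores` helpers, and the CSV format are all assumptions, not the app's actual layout:

```python
import csv
from functools import lru_cache
from pathlib import Path


def score_file(data_dir, target_variable, score_type, location):
    """Build the path of the one file holding this combination (hypothetical naming)."""
    return Path(data_dir) / f"{target_variable}_{score_type}_{location}.csv"


@lru_cache(maxsize=None)
def load_scores(data_dir, target_variable, score_type, location):
    """Read only the requested file; repeated requests are served from memory."""
    path = score_file(data_dir, target_variable, score_type, location)
    with path.open(newline="") as f:
        # Return an immutable tuple so the cached value is not modified by callers.
        return tuple(csv.DictReader(f))
```

With this shape, the first request for a combination reads one small file, and later requests for the same combination never touch the disk.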

krivard · 2021-09-22T14:22:21Z

@ryantibs this is a candidate improvement to complete before the October 5 meeting

ryantibs · 2021-09-23T14:22:03Z

Thanks @krivard. And sounds like a good idea, @sgratzl

What happened to the idea of reading data from disk instead of from the S3 bucket? I remember @nmdefries mentioning that it's actually slow to read from the S3 bucket. And then I suggested we just download a local clone of all the data each Monday that we can read from for the dashboard. Has that been implemented and does it lead to speed improvements?

nmdefries · 2021-09-23T14:35:25Z

I've implemented caching for the score data in #169 that only loads the score data twice a week, after each pipeline run. Every other user besides the first one following a pipeline run will be reading the scores from memory. Releasing the caching change is waiting on another PR -- let me go poke some people.

nmdefries · 2023-04-17T14:03:47Z

This behavior has been added for target variables, but geo types are still combined because separating them would provide only a small additional benefit. Most of the time we're handling state forecasts, whether a single state is requested or we're summarizing across them; separating out US data only saves effort ~1/60 of the time.

A full fix here could be to load and store separately all target variable x forecaster x geo (x date?) combinations so we can load minimal data at each step and can filter chunks by dir name (in a hierarchical file structure)/element name (in a list) instead of filtering data by row. Should be a lot faster.
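
A rough sketch of the "element name (in a list)" variant, in Python for illustration only (the app is R/Shiny). The column names `target_variable`, `forecaster`, and `geo` and the helpers `partition` and `select` are hypothetical; the point is that selection becomes a key lookup per chunk rather than a scan over every row:

```python
from collections import defaultdict


def partition(rows):
    """Split rows once into chunks keyed by (target_variable, forecaster, geo)."""
    chunks = defaultdict(list)
    for row in rows:
        key = (row["target_variable"], row["forecaster"], row["geo"])
        chunks[key].append(row)
    return dict(chunks)


def select(chunks, target_variable, forecasters, geo):
    """Collect the requested chunks by key; no row-level filtering is needed."""
    selected = []
    for forecaster in forecasters:
        selected.extend(chunks.get((target_variable, forecaster, geo), []))
    return selected
```

The same keys could equally be directory names in a hierarchical file structure, so that only the matching files are ever read.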

sgratzl added the enhancement (New feature or request) label Sep 24, 2021

nmdefries mentioned this issue Mar 31, 2023

Load and cache target variable data separately #257

Merged
