Init/runoff #156

Open · wants to merge 773 commits into master
Conversation

@tommylees112 (Contributor) commented on Mar 4, 2020

NOTE:

  • EarlyStopping is currently not working because I haven't created a train/validation split

Create X/y samples dynamically from data loaded into memory

Sorry, this is a huge PR: we have essentially re-written the Engineer / DataLoaders / Models to work with data loaded into memory. This is better for hard-disk-constrained modelling problems where seq_length is large (e.g. 365 daily timesteps as input to the LSTM models).

Use the pipeline for working with runoff data, where:

  • Data is 2D (station_id, time) instead of 3D
  • Data is on smaller timesteps than monthly (daily)
  • Create a dynamic engineer (DynamicEngineer)
  • Create a dynamic dataloader (DynamicDataLoader)
  • Update the EALSTM / neural networks to work with DynamicDataLoaders
  • New arguments to the models: seq_length, target_var, forecast_horizon (see the sample-creation sketch at the end of this section)

We have created an experiment file for running the OneTimestepForecast runoff modelling:
scripts/experiments/18_runoff_init.py
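
As a rough sketch of what "creating X/y samples dynamically from data loaded into memory" means with the new seq_length / target_var / forecast_horizon arguments (illustrative only, not the repo's implementation; the shapes and names are assumptions):

```python
import numpy as np

def make_xy(data: np.ndarray, target: np.ndarray, seq_length: int, forecast_horizon: int):
    """data: (n_stations, n_times, n_features); target: (n_stations, n_times)."""
    X, y = [], []
    n_times = data.shape[1]
    for t in range(seq_length, n_times - forecast_horizon + 1):
        X.append(data[:, t - seq_length : t, :])       # seq_length input timesteps
        y.append(target[:, t + forecast_horizon - 1])  # target forecast_horizon steps ahead
    return np.stack(X), np.stack(y)

# e.g. 10 stations, 2 years of daily data, 3 dynamic features
data = np.random.rand(10, 730, 3)
target = np.random.rand(10, 730)
X, y = make_xy(data, target, seq_length=365, forecast_horizon=1)
print(X.shape, y.shape)  # (365, 10, 365, 3) (365, 10)
```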

Analysis updates

We have added some updates to the analysis code:

  • overview: update all rmse/r2 functions to calculate spatial scores (a score for each spatial unit) and temporal scores (a time series of scores for each station); see the sketch after this list
  • add more catching of the inversion problem (it turns out this occurs when the order of lat, lon is reversed to lon, lat)
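
A minimal sketch of the spatial vs. temporal scoring idea on (station_id, time) data; this is one plausible reading of the bullet above, not the repo's exact functions:

```python
import numpy as np
import xarray as xr

# Toy predictions and observations on (station_id, time)
times = np.arange("2000-01-01", "2000-12-31", dtype="datetime64[D]")
stations = ["A", "B", "C"]
obs = xr.DataArray(
    np.random.rand(len(stations), len(times)),
    coords={"station_id": stations, "time": times},
    dims=("station_id", "time"),
)
pred = obs + 0.1 * np.random.randn(*obs.shape)

err2 = (pred - obs) ** 2
spatial_rmse = np.sqrt(err2.mean(dim="time"))         # one RMSE per spatial unit (station)
temporal_rmse = np.sqrt(err2.mean(dim="station_id"))  # a time series of RMSE scores
```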

Engineer updates

  • Create a new engineer, OneTimestepForecast (src/engineer/one_timestep_forecast.py)
  • Create a new DynamicEngineer for use with the DynamicDataLoader
    NOTE: do we want this, or do we ideally want to generalise the one_month_forecast?
  • The major difference is collapsing things not by lat, lon but by dimension_name = [c for c in static_ds.coords][0] (see the sketch below)
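
A small sketch of the coordinate-name trick referenced above, with toy station data (the variable names here are assumptions):

```python
import numpy as np
import xarray as xr

# Static data indexed by station_id rather than (lat, lon)
static_ds = xr.Dataset(
    {"elevation": ("station_id", np.random.rand(5))},
    coords={"station_id": [f"gauge_{i}" for i in range(5)]},
)

# Read the spatial dimension off the coords instead of hard-coding lat/lon
dimension_name = [c for c in static_ds.coords][0]
print(dimension_name)                                   # "station_id"
print(static_ds["elevation"].mean(dim=dimension_name))  # collapse over the spatial unit
```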

DataLoader Updates

  • self.get_reducing_dims gets the spatial dimensions (latlon, area, station_id, or whatever is not time!)
  • Aggregations collapse over these reducing dimensions:
    global_mean = x.mean(dim=reducing_dims)
  • build_loc_to_idx_mapping builds a dictionary so we can track which index relates to which spatial unit
  • Various examples of if len(static_np.shape) == 3: checks, to account for 2D spatial information (time, lat, lon) versus 1D spatial information (time, station_id); a sketch of these helpers follows this list
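
A sketch of how the helpers in this list fit together (only the names come from the PR; the function bodies below are assumptions):

```python
import numpy as np
import xarray as xr

def get_reducing_dims(ds: xr.Dataset):
    # Every dimension that is not time is treated as spatial,
    # whether that is (lat, lon) or a single station_id dimension.
    return [d for d in ds.dims if d != "time"]

def build_loc_to_idx_mapping(ds: xr.Dataset, spatial_dim: str):
    # Map each spatial unit (e.g. a station id) to its integer index so that
    # flattened numpy arrays can be traced back to a location.
    return {loc: idx for idx, loc in enumerate(ds[spatial_dim].values)}

ds = xr.Dataset(
    {"discharge": (("time", "station_id"), np.random.rand(10, 4))},
    coords={"time": np.arange(10), "station_id": ["a", "b", "c", "d"]},
)
reducing_dims = get_reducing_dims(ds)                    # ["station_id"]
global_mean = ds["discharge"].mean(dim=reducing_dims)    # collapse over the spatial dims
loc_to_idx = build_loc_to_idx_mapping(ds, "station_id")  # {"a": 0, "b": 1, ...}
```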

TODO:
# TODO: why so many static nones?

  • This is because the standard deviation of some of the values stored in the normalizing_dict becomes 0, so dividing by 0 produces np.nan (see the illustration below)
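
A quick illustration of the zero-standard-deviation issue (the epsilon guard is just one possible fix, not necessarily what the pipeline does):

```python
import numpy as np

values = np.array([3.0, 3.0, 3.0])       # a constant static feature
mean, std = values.mean(), values.std()  # std == 0
with np.errstate(invalid="ignore"):
    print((values - mean) / std)         # [nan nan nan]

safe_std = std if std > 0 else 1e-8      # guard against dividing by zero
print((values - mean) / safe_std)        # [0. 0. 0.]
```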

Model updates

  • seq_length / include_timestep_aggs
  • Use a dataloader to load in the timesteps: for x, y in tqdm.tqdm(train_dataloader): (see the training-loop sketch below)
  • include_monthly_aggs -> include_timestep_aggs = spatial aggregation (a map of mean values for that pixel)
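
A generic sketch of the dataloader-driven training loop (plain PyTorch under assumed shapes; the repo's DynamicDataLoader and models will differ):

```python
import torch
import tqdm
from torch.utils.data import DataLoader, TensorDataset

seq_length, n_features = 365, 3
X = torch.randn(128, seq_length, n_features)  # 128 samples of 365 daily timesteps
y = torch.randn(128, 1)                       # one-timestep-ahead target

train_dataloader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)
lstm = torch.nn.LSTM(input_size=n_features, hidden_size=32, batch_first=True)
head = torch.nn.Linear(32, 1)
optimizer = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()))

for x_batch, y_batch in tqdm.tqdm(train_dataloader):
    optimizer.zero_grad()
    out, _ = lstm(x_batch)                    # out: (batch, seq_length, hidden)
    pred = head(out[:, -1, :])                # predict from the final hidden state
    loss = torch.nn.functional.mse_loss(pred, y_batch)
    loss.backward()
    optimizer.step()
```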

@tommylees112 added the wip (Work in progress - not ready for merging) label on Mar 4, 2020