ML Preprocessing Module #25

Open
jejjohnson opened this issue Feb 26, 2023 · 0 comments

This module will define the pieces that matter for the ML method: the input, the output, and the metadata. For example, we may want to feed in a batch of coordinates or a batch of patches from a cube. This step can be difficult because of the "heaviness" of the data.


Case Studies:

  • Interpolation: Batch of coordinates
  • Interpolation: Batch of patches, x=ssh_sims(lat x lon x time), y=ssh_obs(lat x lon x time)
  • Surrogate: Batch of patches, x=ssh(lat x lon x t-1), y=ssh(lat x lon x t)
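
As a rough illustration of the surrogate case above, here is a minimal sketch of how (x, y) pairs could be built from a time-ordered cube. The function name `surrogate_pairs`, the `[time, lat, lon]` axis order, and the random toy data are assumptions made for this example, not part of the module.

```python
import numpy as np


def surrogate_pairs(ssh):
    """Split a [time, lat, lon] array into (x, y) = (ssh at t-1, ssh at t)."""
    x = ssh[:-1]  # all states except the last one (t-1)
    y = ssh[1:]   # all states except the first one (t)
    return x, y


# Toy cube (assumed layout): 5 time steps, 4 latitudes, 6 longitudes.
rng = np.random.default_rng(0)
ssh = rng.normal(size=(5, 4, 6))

x, y = surrogate_pairs(ssh)
```

Each `x[i]` is the field one step before `y[i]`, so a batch is just a slice along the first axis of both arrays.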
  1. Patch-ify, e.g. [h,w] -> [P, h//factor, w//factor]
  2. Cube-ify, e.g. [h,w,c] -> [P, c//factor, h//factor, w//factor]
  3. Coordinate-ify, e.g. xr.Dataset -> pd.DataFrame, [LAT, LON, TIME, SSH] -> columns [ssh, lat, lon, time], each of shape (lat x lon x time, 1)
    1. broadcast_like
    2. ravel
    3. storage suited to how we sample, e.g. 1) storage for frequency sampling, 2) storage for local sampling
  4. Save sub-patches/cubes, e.g. zarr (Examples: ex1a | ex1b | ex2a | ex2b)
  5. Save coordinates, e.g. csv, parquet (Examples: Kaggle | Nvidia-merlin | nvtabular)
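
The patch-ify and coordinate-ify steps above can be sketched with plain NumPy. This is only an illustration under stated assumptions: `patchify` and `coordinatify` are hypothetical helper names, `factor` is read as the number of splits per axis (so P = factor * factor), and `np.meshgrid` stands in for the broadcast step that `broadcast_like` would do on an xarray object.

```python
import numpy as np


def patchify(field, factor):
    """[h, w] -> [P, h // factor, w // factor] with P = factor * factor."""
    h, w = field.shape
    ph, pw = h // factor, w // factor
    # Drop any remainder so the field divides evenly into patches.
    field = field[: ph * factor, : pw * factor]
    # Split both axes into (block index, offset), then bring block indices first.
    blocks = field.reshape(factor, ph, factor, pw).swapaxes(1, 2)
    return blocks.reshape(factor * factor, ph, pw)


def coordinatify(ssh, lat, lon, time):
    """[lat, lon, time] cube -> table with columns [ssh, lat, lon, time]."""
    # Broadcast the 1D coordinates to the full cube shape, then ravel everything.
    LAT, LON, TIME = np.meshgrid(lat, lon, time, indexing="ij")
    return np.stack([ssh.ravel(), LAT.ravel(), LON.ravel(), TIME.ravel()], axis=1)


# Toy 2D field: 4 x 6 split into 2 x 2 patches of shape 2 x 3.
field = np.arange(24).reshape(4, 6)
patches = patchify(field, factor=2)

# Toy cube (assumed layout): 2 latitudes, 3 longitudes, 4 time steps.
lat = np.array([10.0, 11.0])
lon = np.array([-5.0, -4.0, -3.0])
time = np.arange(4, dtype=float)
ssh = np.arange(24, dtype=float).reshape(2, 3, 4)
table = coordinatify(ssh, lat, lon, time)
```

The resulting `table` has one row per grid point, which is the shape that row-oriented formats such as csv or parquet expect, while `patches` keeps the gridded layout that chunked array storage such as zarr is suited to.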