
breaking change: explore alternative to handling "real" datasets in this package #303

Open
jdhoffa opened this issue Mar 22, 2023 · 5 comments
Labels
breaking change ☠️ API change likely to affect existing code wontfix

Comments

@jdhoffa
Member

jdhoffa commented Mar 22, 2023

This package currently serves two purposes:
1 - To provide mock data for CI/CD, testing, and demonstration purposes
2 - To distribute a few "real" datasets that are actually used in the banks analysis

We should explore if this is actually desirable, and potentially find an alternative path to distribute "real" datasets.

Real datasets include:

  • sector_classifications (and all datasets inside that dataset)
  • green_or_brown
  • iso_codes
  • region_isos

Some of these datasets could, and perhaps should, be moved into the (private) package pacta.scenario.preparation.

Note: this would require equivalent changes in:
https://github.com/RMI-PACTA/r2dii.match
https://github.com/RMI-PACTA/r2dii.analysis

AB#10166

@jdhoffa
Member Author

jdhoffa commented Mar 22, 2023

@cjyetman I know you don't have a lot/ any context on this, but let me know if you have thoughts

@cjyetman
Member

I've been hoping that we make pacta.scenario.preparation public too, but maybe that's way off from now?

@jdhoffa
Member Author

jdhoffa commented Mar 22, 2023

Unfortunately, I think that's way off. pacta.scenario.preparation contains legit/real/genuine scenario data, and I don't think we have the ability to host that data publicly.

At some point, perhaps we can remove the data, and just have the prep functions (I am def not against that), but we need a good way of storing and versioning our data first.

Until then, I think a pkg is the best way to keep the small-medium sized "real" datasets (and also make sure we are all using the same central dataset(s))

@jdhoffa
Member Author

jdhoffa commented Mar 22, 2023

And, in particular, these two datasets are pretty scenario-specific anyway:
green_or_brown
region_isos

The other two we would need to figure out how to store somewhere:
sector_classifications
iso_codes - We can/ should/ could just depend on your package for this. I'm happy to use that as a dependency

@jdhoffa jdhoffa added the breaking change ☠️ API change likely to affect existing code label Apr 14, 2023
@jdhoffa jdhoffa added the medium Likely finished in under a week label Feb 6, 2024
@jdhoffa jdhoffa changed the title Explore if "real" datasets should live in this repository breaking change: explore alternative to handling "real" datasets in this package Mar 6, 2024
@jdhoffa jdhoffa added the ADO Add issue to ADO label Mar 6, 2024
@jdhoffa
Member Author

jdhoffa commented Mar 18, 2024

This breaking change is too large given the current r2dii architecture.
However, it is an absolutely important consideration for future iterations of this software.

Tagging as "wontfix"

@jdhoffa jdhoffa added wontfix and removed medium Likely finished in under a week ADO Add issue to ADO labels Mar 18, 2024

2 participants