Proposal: reduce number of top level packages #331
Comments
I'm not addressing your post quite yet, but will have a read and do so. But related, I would like to refactor the …
Another quick point is that we could combine the two app installs.
I do agree that odc-tools/libs/aws/odc/aws/misc.py (line 8 at commit 03235c5) …
But ultimately, it's not so much threading vs. async that would allow for better performance, but a better strategy of user-guided "parallel" listing: some shallow directory listing followed by deep prefix listing running across several "threads", regardless of whether those threads are async or "normal".
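A minimal sketch of that strategy, assuming two hypothetical listing callables rather than any existing odc.aws API: list_shallow(prefix) returns the sub-prefixes and keys directly under a prefix (roughly what an S3 listing with Delimiter="/" gives you), and list_deep(prefix) returns every key under a prefix recursively. A few cheap shallow levels discover the partitions, then the deep listings run on a plain thread pool; using asyncio tasks instead would not change the shape of the algorithm. The in-memory "bucket" exists only to make the example runnable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical listing primitives (not an existing odc.aws API):
#   list_shallow(prefix) -> (sub_prefixes, keys) directly under prefix, like Delimiter="/"
#   list_deep(prefix)    -> every key under prefix, recursively
ShallowLister = Callable[[str], Tuple[List[str], List[str]]]
DeepLister = Callable[[str], Iterable[str]]


def parallel_list(prefix: str, list_shallow: ShallowLister, list_deep: DeepLister,
                  depth: int = 1, workers: int = 8) -> Iterator[str]:
    """Yield every key under `prefix` (no global ordering across partitions).

    Expand `depth` levels with shallow listings to find sub-prefixes,
    then run one deep listing per sub-prefix on a thread pool.
    """
    frontier = [prefix]
    for _ in range(depth):
        next_frontier: List[str] = []
        for p in frontier:
            sub_prefixes, keys = list_shallow(p)
            yield from keys  # keys sitting directly at this level
            next_frontier.extend(sub_prefixes)
        frontier = next_frontier

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task drains one deep listing; results come back in frontier order.
        for keys in pool.map(lambda p: list(list_deep(p)), frontier):
            yield from keys


# Tiny in-memory stand-in for a bucket, only to make the sketch runnable.
KEYS = ["a/1.tif", "a/x/2.tif", "b/3.tif", "top.txt"]


def fake_shallow(prefix: str) -> Tuple[List[str], List[str]]:
    subs, keys = set(), []
    for k in KEYS:
        if k.startswith(prefix):
            rest = k[len(prefix):]
            if "/" in rest:
                subs.add(prefix + rest.split("/", 1)[0] + "/")
            else:
                keys.append(k)
    return sorted(subs), keys


def fake_deep(prefix: str) -> List[str]:
    return [k for k in KEYS if k.startswith(prefix)]


print(sorted(parallel_list("", fake_shallow, fake_deep, depth=2)))
# ['a/1.tif', 'a/x/2.tif', 'b/3.tif', 'top.txt']
```

Here depth is the "user guided" part: the caller chooses how many shallow levels to expand before fanning out, based on where they expect the key space to be wide enough to split evenly.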
I know what you mean about parallel listing. Not sure if there's an elegant way to partition that. Damien said that the …
Made #332 for this.
It's only used by odc-aio and we have too many projects
Whatever remaining functionality was there was moved to odc.algo.*, as this is where all the Dask-related experiments/utilities are now
Update
Still not sure about cloud libs. I'm leaning towards making it one package with feature flags to pull in optional dependencies like thredds; that might actually make it easier to refactor S3Fetcher to work without …
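A sketch of how one merged package could gate optional features at runtime. The feature names, the module lists and the odc-cloud[...] install hint are assumptions for illustration only; the extras themselves would be declared in packaging metadata (setuptools extras_require, or [project.optional-dependencies] in pyproject.toml), so that only the requested extras pull in heavy dependencies such as aiobotocore.

```python
import importlib.util

# Hypothetical feature -> required modules mapping for a merged "odc-cloud" package.
# Each key would correspond to an extra, e.g. pip install "odc-cloud[azure]".
FEATURE_MODULES = {
    "aws": ["boto3"],
    "azure": ["azure.storage.blob"],
    "aio": ["aiobotocore"],
}


def _importable(module: str) -> bool:
    """Check whether `module` can be found (parents of dotted names do get imported)."""
    try:
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        # For dotted names find_spec imports the parent package first,
        # which raises if that parent is not installed.
        return False


def has_feature(feature: str) -> bool:
    """True if every module that `feature` needs is installed."""
    return all(_importable(m) for m in FEATURE_MODULES[feature])


def require(feature: str) -> None:
    """Raise a helpful ImportError if an optional feature is not installed."""
    missing = [m for m in FEATURE_MODULES[feature] if not _importable(m)]
    if missing:
        raise ImportError(
            f"feature {feature!r} needs {', '.join(missing)}; "
            f"try: pip install 'odc-cloud[{feature}]'"
        )
```

A feature submodule would call require("azure") before its own azure imports, so users without that extra get an actionable message instead of a bare ModuleNotFoundError.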
I agree that we should merge the cloud and dc apps; keep it simple.
@alexgleith Apps I'm not too concerned about, as they are "leaf nodes" and can be merged later on without much disruption. What do we do with libraries, though? We have 4 related and separate libs:
One option is to put it into …
Ok, sorry. Yeah, I'm happy with … Thredds is such a niche use case, and I think if they're using it they have bigger problems than needing to have an unused boto dependency!
Problem
This repository grew "organically" for a while, and some of the earlier experiments have proven to be less relevant than others. A side effect of this growth is a large number of packages and namespaces. Currently there is a one-to-one mapping between namespace and package: odc.algo.* is shipped by odc-algo, and odc-algo ships only files in odc/algo. While this is a reasonable and clean relation, it does make for a larger number of packages. On one hand this allows for higher granularity when declaring dependencies; on the other, packages are not "free". Each one adds to CI delays, makes renaming and moving code around harder, and makes publishing to and managing PyPI/conda harder too, as more secrets need to be managed, maintainers need to be added to more projects, etc. (I spent some time adding @GypsyBojangles as an owner to every PyPI project pushed from this repo, and will need to do a similar thing for generating publishing tokens.)
Stocktake
First, let's decide which things we are definitely keeping as is. I'd say that apps can stay as they are. As far as user-facing libraries go, I have this list:
- odc.algo - mostly xarray + dask tools, for example xr_reproject
- odc.ui - tools for visualizations in Jupyter
- odc.stac - (previously odc.index) STAC and missing datacube index utilities
- odc.stats - large-scale data processing libs/apps (work in progress)
- odc.dscache - used by odc.stats, but has other possible future use cases (odc index export/import, for example)
Then we have "cloud IO helper libs" that are mostly used via apps or other higher-level libs and not directly by users.
- odc.aws - AWS S3 and SQS
- odc.azure - Azure blob storage
- odc.thredds - crawling THREDDS
- odc.aio - AWS S3, but async; has an annoying aiobotocore dependency
The reason these are all separate is the dependencies they pull in. For example, odc.aio depends on aiobotocore, which is really challenging to install in the presence of dependencies on boto* libraries. One option is to put them all into one package, say odc-cloud, and use feature flags to enable/disable features, so instead of depending on odc-azure one would declare a dependency on odc-cloud[AZURE].
Finally, there is an odd bunch of libs:
- odc.io - poor name; used by CLI apps for various unrelated text processing helpers
- odc.ppt - "parallel processing tools": some generic "Future" object handling (not used) and an Async->Thread adapter (used by odc.aio)
- odc.dtools - not used, mostly moved to datacube; used to have "rasterio environment activation/configuration" helpers
- odc.geom - not used, mostly moved to datacube; still has some unfinished work that might be of use later on
Immediate Actions
- Dissolve odc.ppt by moving AsyncThread into odc.aio, and abandon the rest
- Dissolve odc.dtools, possibly moving some things into odc.algo (and adding tests)
- Remove dead code from other libs
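For reference, the general shape of an Async->Thread adapter like the one being moved can be sketched in a few lines. This is not the odc.ppt AsyncThread implementation; it only assumes the common pattern of running an asyncio event loop forever in a daemon thread and handing it coroutines from synchronous code via asyncio.run_coroutine_threadsafe, which returns concurrent.futures.Future objects. The submit/run/close method names are hypothetical.

```python
import asyncio
import threading
from concurrent.futures import Future
from typing import Any, Coroutine, Optional


class AsyncThread:
    """Hypothetical adapter: an event loop in a background thread for sync callers."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # The loop belongs to this thread and runs until close() stops it.
        asyncio.set_event_loop(self._loop)
        self._loop.run_forever()

    def submit(self, coro: Coroutine[Any, Any, Any]) -> Future:
        """Schedule `coro` on the background loop; safe to call from any thread."""
        return asyncio.run_coroutine_threadsafe(coro, self._loop)

    def run(self, coro: Coroutine[Any, Any, Any], timeout: Optional[float] = None) -> Any:
        """Block the calling thread until `coro` finishes and return its result."""
        return self.submit(coro).result(timeout)

    def close(self) -> None:
        # Stop the loop from its own thread, wait for the thread, then release the loop.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()

    def __enter__(self) -> "AsyncThread":
        return self

    def __exit__(self, *exc: Any) -> None:
        self.close()
```

Because callers only see concurrent.futures.Future objects, synchronous code (CLI apps, Dask workers) can fan out many async S3 requests without becoming async itself.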