DEA Notebooks Hackathon: Make DEA Notebooks faster!
Our aim is to make DEA Notebooks faster and more efficient so they run more quickly for our users and in our integration tests.
More efficient notebooks will lead to an improved user experience, and allow us to expand our test coverage across the entire DEA Notebooks repository. This will free us up from having to manually check and test notebooks for broken code, and make sure we fix issues before they impact our users.
Important
We want to preserve the overall "purpose" of our notebooks when making changes.
We should only make changes if we can do it without making our examples less useful or informative to our users. The guide below provides an example workflow you can follow on the day!
- Look at the spreadsheet here, and choose a notebook to focus on from the "Longest running notebooks" column (you can view the notebooks on Knowledge Hub to help you choose)
- Open the DEA Sandbox (https://app.sandbox.dea.ga.gov.au/), and launch the "Default environment 2 Cores, 16G Memory" server option (this is what our external users use, and is most similar to how we run our tests)
- If you haven't edited DEA Notebooks before, follow the technical guide below to get started with Git on the DEA Sandbox.
- Launch your notebook on the DEA Sandbox. Before making any changes, read through and run each cell in your notebook carefully. Try to understand its overall purpose or message, i.e. what is it trying to convey to our users? What functionality is it showing off? What do we need to keep to make sure the example is still useful?
- Once you understand the purpose and approach of the notebook, look for places where we can make it faster to run without affecting its overall purpose. For most notebooks, these will be the most important things to look at:
  - Reducing the time period, e.g. `time=...` (can the notebook be run on one year of data instead of two without affecting its conclusions?)
  - Reducing the area/extent, e.g. `x=..., y=...` (can we load data for a smaller area and still demonstrate the same functionality?)
  - Loading fewer products, e.g. `products=...` (do we have to load data from Landsat 7 and 8 if just Landsat 8 will do?)
  Changes to time/spatial extents/products should be enough for most notebooks. Some other more advanced ideas include:
  - Filtering to less cloudy images by metadata (e.g. `cloud_cover=(0, 10)`) to load only clear images
  - Updating code to be more efficient (e.g. using built-in `xarray` or `numpy` tools instead of for-loops)
- Once you have made some changes to the notebook, double-check that the notebook markdown cells still correctly match and describe the analysis (e.g. update references to time periods/locations to match your new values).
- Re-run the entire notebook (`Kernel > Restart kernel and run all cells`), then commit it back into the repo for review! (see Git details below)
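
Before re-running anything, it can help to get a rough feel for how much a smaller query might shrink the data being loaded. The sketch below compares an "original" and a "reduced" query using the time, area and product ideas from the workflow above. It is illustrative only: `relative_load_size` is a hypothetical helper (it is not part of `datacube` or DEA Tools), the coordinates, dates and product names are placeholders rather than values from any particular notebook, and "area × days × products" is only a crude proxy for load time.

```python
from datetime import date


def relative_load_size(query):
    """Crude proxy for data volume: area (square degrees) x days x products.

    Hypothetical helper for illustration; real load times also depend on
    resolution, cloud filtering, number of bands and so on.
    """
    x_min, x_max = query["x"]
    y_min, y_max = query["y"]
    start, end = (date.fromisoformat(t) for t in query["time"])

    area = abs(x_max - x_min) * abs(y_max - y_min)
    days = (end - start).days + 1  # inclusive of both end dates
    return area * days * len(query["products"])


# Placeholder "before" query: two years, two products, larger extent
original = {
    "x": (153.30, 153.50),
    "y": (-27.60, -27.40),
    "time": ("2019-01-01", "2020-12-31"),
    "products": ["ga_ls7e_ard_3", "ga_ls8c_ard_3"],  # example names only
}

# Placeholder "after" query: one year, one product, smaller extent
reduced = {
    "x": (153.35, 153.45),
    "y": (-27.55, -27.45),
    "time": ("2020-01-01", "2020-12-31"),
    "products": ["ga_ls8c_ard_3"],
}

ratio = relative_load_size(reduced) / relative_load_size(original)
print(f"Reduced query is roughly {ratio:.0%} of the original size")
```

A small ratio suggests the change is worth trying, but the only reliable check is still to re-run the notebook and confirm that the outputs and the surrounding text tell the same story.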
Tip
If you can't find an easy way to update the notebook without impacting its purpose, that's completely fine - some will be easier or harder than others! Feel free to skip it and move onto something new. 🙂
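
For the "more efficient code" idea in the workflow above, the sketch below shows the general pattern of replacing nested for-loops with built-in array operations. It uses a small synthetic `numpy` array in place of real satellite data, and the function names and the NDVI calculation are assumptions chosen for illustration, not code taken from an existing notebook.

```python
import numpy as np

# Synthetic (time, y, x) stacks standing in for loaded red and NIR bands
rng = np.random.default_rng(0)
red = rng.uniform(0.05, 0.30, size=(12, 50, 50))
nir = rng.uniform(0.20, 0.60, size=(12, 50, 50))


def mean_ndvi_loop(red, nir):
    """Mean NDVI through time, computed pixel by pixel with Python loops."""
    n_time, n_y, n_x = red.shape
    out = np.zeros((n_y, n_x))
    for iy in range(n_y):
        for ix in range(n_x):
            total = 0.0
            for it in range(n_time):
                r = red[it, iy, ix]
                n = nir[it, iy, ix]
                total += (n - r) / (n + r)
            out[iy, ix] = total / n_time
    return out


def mean_ndvi_vectorised(red, nir):
    """Same calculation using whole-array operations."""
    ndvi = (nir - red) / (nir + red)
    return ndvi.mean(axis=0)  # average over the time axis


slow = mean_ndvi_loop(red, nir)
fast = mean_ndvi_vectorised(red, nir)

# Both approaches should agree to within floating-point tolerance
print(np.allclose(slow, fast))
```

The vectorised version gives the same result with far fewer Python-level operations. The same approach carries over to `xarray`, where reductions can be written against a named dimension such as `time` instead of an axis number.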
If it is your first time using the DEA Sandbox, follow this guide to register: https://knowledge.dea.ga.gov.au/guides/setup/Sandbox/sandbox/
If it is your first time editing a notebook, follow this guide to setting up DEA Notebooks with Git: https://github.com/GeoscienceAustralia/dea-notebooks/wiki/Edit-a-DEA-Notebook
More details about using the DEA Sandbox and DEA Notebooks are available here:
Updating this wiki: If you notice anything incorrect or out of date in this wiki, please feel free to make an edit!
License: All code in this repository is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.
Contact: If you need assistance with any of the Jupyter Notebooks or Python code in this repository, please post a question on the Open Data Cube Discord chat or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with any notebook, you can file one on GitHub.