[MAIN] Climate Stripes process for global data #7

Open
2 of 6 tasks
agstephens opened this issue Oct 8, 2024 · 0 comments

This is a main ticket from which you will want to create a set of sub-tickets.

What is the CEDA WPS?

The CEDA WPS (Web Processing Service) is a collection of tools, components and services. The front-end is a web application written in Pyramid that gives users access to a range of functions (known as processes in WPS terms), which are packaged together into individual services, each of which is itself a WPS.

Each WPS provides an endpoint that follows the OGC WPS specification, an API exposing simple methods:

  • GetCapabilities - find out service metadata and which processes exist
  • DescribeProcess - find out which parameters need to be defined to call a process
  • Execute - run a process with the parameters provided in the user request
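The three operations above can be invoked as simple key-value-pair GET requests per the OGC WPS 1.0.0 specification. A minimal sketch of building those request URLs (the base URL here is illustrative, not a real CEDA endpoint):

```python
from urllib.parse import urlencode

# Illustrative base URL - a real CEDA WPS endpoint would differ.
BASE = "https://example.org/wps"

def wps_url(request, **extra):
    """Build a WPS 1.0.0 GET (KVP) request URL for the given operation."""
    params = {"service": "WPS", "version": "1.0.0", "request": request}
    params.update(extra)
    return BASE + "?" + urlencode(params)

get_caps = wps_url("GetCapabilities")
describe = wps_url("DescribeProcess", identifier="PlotClimateStripes")
execute = wps_url("Execute", identifier="PlotClimateStripes")
```

Fetching `get_caps` returns the service metadata document; `describe` returns the parameter definitions needed to build the `Execute` request.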

List of WPS names and functions

What are the WPS repositories/components?

  • flamingo: data-subsetter (HadObs, CRUTS)
  • goldfinch: midas-extract (MIDAS Subsetter)
  • swallow: name-model (Met Office NAME dispersion model)
  • vulture: compliance-checker (CF checker)
  • phoenix: the user interface

What is the Climate Stripes process?

We currently provide a "Climate Stripes" CEDA WPS Process here:
https://ceda-wps-ui.ceda.ac.uk/processes/execute?wps=compliance_checker&process=PlotClimateStripes

The climate stripes process is currently implemented as part of vulture (but we will probably move it to its own WPS as part of this work).

The user form for submitting a job looks like this for the current UK Climate Stripes process:
[Image: job submission form for the current UK Climate Stripes process]

It generates a PDF, so that people can produce climate stripes for a UK location and find out what the colours are (for re-use in other projects). The output looks like this:

[Image: example PDF output showing the climate stripes plot and colour values]
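The core idea behind the stripes (independent of how the actual process implements it) is to map each year's temperature anomaly onto a blue-white-red colour scale. A minimal sketch, using made-up values rather than the real NetCDF anomalies:

```python
# Sketch of the "climate stripes" colour mapping: each year's temperature
# anomaly becomes one coloured stripe. The real process reads anomalies
# from a NetCDF dataset; the scale limit here (2.5 degrees) is an assumption.

def anomaly_to_hex(anomaly, max_abs=2.5):
    """Linearly map an anomaly in [-max_abs, +max_abs] to a hex colour:
    blue (cold) through white (neutral) to red (warm)."""
    t = max(-1.0, min(1.0, anomaly / max_abs))  # clamp to [-1, 1]
    if t < 0:   # cold: interpolate white -> blue
        r = g = int(255 * (1 + t))
        b = 255
    else:       # warm: interpolate white -> red
        r = 255
        g = b = int(255 * (1 - t))
    return f"#{r:02x}{g:02x}{b:02x}"

stripes = [anomaly_to_hex(a) for a in (-2.5, 0.0, 2.5)]
# -> ["#0000ff", "#ffffff", "#ff0000"]
```

Reporting these hex values alongside the plot is what lets users re-use the colours in other projects.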

The code

The code is located here:

Next step: climate stripes for global data

The SCD Team is in the USA for a conference in November. They would like to demo the WPS tool during this conference.

So we would like to:

  • expand to global usage

Deadline

  • Show prototype (for feedback) by: 31st October
  • Should be fully delivered by: 12th November
  • Conference date: 17th November

MVP - Minimum Viable Product

  • A new process that sits on top of a global dataset: CRU_TS (see section below: "Which dataset should we use?")
  • Liaise with Ed about getting access to latest NetCDF version (if relevant)

Better than MVP

MVP+:

  • Better checking and explanation of incorrect inputs (or out of bounds)
  • Possibly, click on a map to select a single lat/lon point
    • NOTE: OpenLayers (the map UI framework) can return coordinates from a click - so might be relatively easy to implement.
  • Create a new WPS specifically for Stripes (moving it away from the compliance checker WPS)
  • Improved mark-up formatting for the description part:
    • [Image: current plain-text rendering of the process description]

    • Would be nice to have active hyperlinks, paragraphs, etc. in here (perhaps rendered from Markdown)
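For the "better checking and explanation of incorrect inputs" item, a minimal sketch of the kind of validation helper that could sit in front of the process (function name and messages are illustrative, not existing code):

```python
# Sketch: validate a user-supplied lat/lon point before running the process,
# returning a user-readable explanation when the input is malformed or
# out of bounds.

def validate_point(lat, lon):
    """Return (lat, lon) as floats, or raise ValueError with an
    explanation suitable for showing to the user."""
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        raise ValueError("Latitude and longitude must be numbers.")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"Latitude {lat} is out of bounds (-90 to 90).")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"Longitude {lon} is out of bounds (-180 to 180).")
    return lat, lon
```

Raising early with a clear message means the front-end can show the problem instead of the job failing opaquely part-way through.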

Which dataset should we use?

We propose using this dataset:

Development environment

As well as the production system, there is a development (staging) VM where we run both the front-end web app (using nginx) and the backend WPS services (using supervisord). The staging server is only visible inside the JASMIN firewall (i.e. on the VPN). It is visible at: https://ceda-wps-staging.ceda.ac.uk/

Note that wiring up the client and server on a single test VM is complex; more instructions are here:

cedadev/swallow#15 (comment)

Working with the Dev back-end WPS (vulture)

To access the code and service for the vulture, do:

ssh [email protected]
source setup-vulture.sh

This will activate the conda environment that runs vulture and it will put you in the directory where you can potentially make changes.

NOTE: the directory you are in, /usr/local/src/vulture, is a checkout of the new Git global-stripes branch.

Start the service with:

vulture start -d --outputurl=http://ceda-wps-staging.ceda.ac.uk/outputs \
                         --outputpath=/gws/nopw/j04/ceda_wps/birds/test/outputs/vulture

Stop the service with:

vulture stop

NOTE: The above settings are not how it is managed on the production system - it is only done this way in order to make the server and client compatible on the dev server.

Working with the Dev front-end (phoenix)

To access the code and service for the front end, do:

ssh [email protected]
source setup-env.sh

This will activate the conda environment that runs phoenix and it will put you in the directory where you can potentially make changes.

NOTE: the directory you are in, /code/local/pyramid-phoenix, is a checkout of the Git master branch.

Restart the service with:

make restart

The admin interface - used for adding a WPS

When you have made changes to a WPS back-end, you may need to re-add it to the front-end via the admin interface (get the password from Alan/Ag), which looks like:

[Image: the admin "Register a new service" form]

To get to that page, click on Settings --> Services --> Register a new service. Fill out the form as above.

Git etiquette

Make sure that you create a new branch in the local git repo if you are making changes.
When you are ready for changes to be reviewed, create a Pull Request on GitHub and ask @agstephens or @alaniwi to review and accept it.

Proposed approach

I suggest you follow this approach:

Step 1:

  1. Copy and paste the two main files (listed above) and just rename to "wps_plot_climate_stripes_global" and "WPSPlotClimateStripesGlobal" in the relevant places.
  2. Wire up the new code to the NetCDF file that we are using and then remove the parts you don't need in the new version.
  3. Update the /usr/local/birdhouse/etc/phoenix/ceda_process_role_map.json file to include the new process ID in the open processes. Then restart the front end.
  4. Update a few other files that are required for imports to work - look in some of the __init__.py files for this.
  5. Note that the UK version includes translating lat/lon to OSGB (British National Grid) coordinates - the global data doesn't need any of that.
  6. Note that the Kerchunk stuff is all redundant for the Global NC file so you can open it directly with xarray.
  7. Get it running.
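On step 6, opening the global file directly with xarray would look something like `ds = xr.open_dataset(path)` followed by `ds["tmp"].sel(lat=..., lon=..., method="nearest")` (the variable name `tmp` is an assumption - check the actual CRU_TS file). The nearest-gridpoint selection that `sel` performs amounts to the following, sketched here against a synthetic copy of the CRU_TS 0.5-degree grid:

```python
import numpy as np

# Nearest-gridpoint lookup, i.e. what xarray's .sel(..., method="nearest")
# does for a 1-D coordinate. Grid values below mimic CRU_TS's 0.5-degree
# cell-centre grid; the real coordinates come from the NetCDF file.

def nearest_index(grid, value):
    """Index of the grid coordinate closest to the requested value."""
    return int(np.abs(np.asarray(grid) - value).argmin())

lats = np.arange(-89.75, 90.0, 0.5)
lons = np.arange(-179.75, 180.0, 0.5)

i = nearest_index(lats, 51.5)    # e.g. a point near London
j = nearest_index(lons, -0.12)
```

The selected cell centre is always within a quarter of a degree of the requested point, which is worth echoing back to the user so they know which grid cell was actually plotted.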

Step 2:

  1. Refactor the vulture/stripes_lib/stripes.py module so that both the UK and Global files use mainly the same code.
  2. Create a new WPS, and move both processes into that.
  3. Rename the processes so that UK and Global are in the names.
  4. Look at other improvements on the list above.

Troubleshooting and general tips

Do you want more detailed logging on the backend (vulture)?

Use:

vulture start -d --outputurl=http://ceda-wps-staging.ceda.ac.uk/outputs --outputpath=/gws/nopw/j04/ceda_wps/birds/test/outputs/vulture --log-level=DEBUG

There are a few unit tests

Once you are in the correct vulture setup, you can run:

python -m pytest

It should run some tests (more would be great).

Keep getting a 502 error in the front-end

A 502 error is normally associated with a problem or error in the CEDA security config file on the server. Maybe this config file needs editing and phoenix needs restarting:

/usr/local/birdhouse/etc/phoenix/ceda_process_role_map.json

Possible issues could be:

  • incorrect format of file (not valid JSON)
  • spelling errors
  • process not in "open" processes list when it should be
  • process name has not yet been added to the file.

This file manages access control so it is often the reason why Phoenix just gets stuck and returns a 502 error.
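A quick pre-flight check catches most of the causes above before you restart phoenix. A sketch (the path and the "open" key are taken from this issue; the file's exact schema may differ):

```python
import json

# Sanity-check the role-map file: invalid JSON or a missing process entry
# in this file is a common cause of the 502 error described above.
# NOTE: the "open" key is an assumption about the file's structure.

def check_role_map(path, process_id):
    with open(path) as f:
        data = json.load(f)  # raises JSONDecodeError if the file is invalid
    if process_id not in data.get("open", []):
        print(f"WARNING: {process_id} is not in the open processes list")
    return data
```

Running this against `/usr/local/birdhouse/etc/phoenix/ceda_process_role_map.json` after each edit is cheaper than a restart-and-see cycle.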

Jobs are accepted but never run (when watching them in the front-end logs)

Sometimes jobs start running fine, but then just sit in a suspended status with no indication of why. On the front-end, you just see that the job was accepted but no other logs. Here is an explanation:

  1. In the config of vulture, it has a fixed number of jobs that can be run concurrently - which is probably set by the command-line option: --maxprocesses
  2. The --database option uses the local file, pywps-logs.sqlite as the database that Vulture uses to manage and record its job details. (It is confusingly called logs when it is actually the server-side persistent DB).
  3. If the number of submitted jobs that have not completed exceeds maxprocesses, new jobs will sit in the database marked as in progress - which means PyWPS thinks they are running.
  4. It will then block more jobs from running and the front-end will just show them as accepted in its output logs - but will never progress further.
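The blocking condition can be illustrated with an in-memory SQLite database. The table and column names below are hypothetical - inspect the real `pywps-logs.sqlite` with `.schema` in the sqlite3 CLI before querying it:

```python
import sqlite3

# Illustration of the stuck-jobs condition: once the count of unfinished
# rows in the PyWPS database reaches --maxprocesses, new jobs queue forever.
# Schema and the maxprocesses value here are assumptions for the demo.

MAXPROCESSES = 3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (uuid TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [("a", "running"), ("b", "running"), ("c", "running"), ("d", "accepted")],
)
unfinished = conn.execute(
    "SELECT COUNT(*) FROM requests WHERE status != 'finished'"
).fetchone()[0]
blocked = unfinished > MAXPROCESSES  # job "d" will never be started
```

Deleting the database file (as in the fix below this) resets that count to zero, which is why it unblocks new jobs.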

The fix is simple:

vulture stop
rm -f pywps-logs.sqlite
vulture start -d --outputurl=http://ceda-wps-staging.ceda.ac.uk/outputs \
    --outputpath=/gws/nopw/j04/ceda_wps/birds/test/outputs/vulture --log-level=DEBUG

And then it should all work again.

As part of tidying things up, please change the vulture/cli.py file so that the default database option is called pywps-db.sqlite instead of pywps-logs.sqlite.
