Challenge 22 - Discovering hidden patterns on Climate Data Store #6

Open
EsperanzaCuartero opened this issue Feb 24, 2023 · 0 comments
Labels
Stream 2 Machine Learning for Earth Sciences

EsperanzaCuartero commented Feb 24, 2023

Challenge 22 - Discovering hidden patterns on Climate Data Store

Stream 2 - Machine Learning for Earth Sciences

Goal

The Climate Data Store (CDS) produces a wide set of transactional records and operational logs. These contain a lot of hidden information that could give valuable insight for understanding and predicting system patterns, user behaviours and preferences, and for raising early warnings. Exploiting it could lead to improvements in the system and to more dynamic configuration (QoS).

The aim of the project is to explore what ML/AI can do to reveal this information and how the results could later be applied to CADS operations.

Mentors and skills

  • Mentors: Angel Lopez, Gionata Biavati
  • Skills required:
    • Python (numpy, pandas, xarray)
    • ML/AI models (Python libraries)
    • SQL
    • Splunk (Optional)

Note: Only nationals of European Union (EU) Member States and of countries associated with the EU’s Space Programme (currently Iceland and Norway) are eligible to participate (see Terms and Conditions).


Challenge description

Currently, the information obtained about users is based on very generic indicators and graphs. Going deeper into the exploration of data and logs is done case by case when particular issues or requests need to be addressed.

With the current number and volume of transactions and data, these case-by-case investigations are becoming more and more complicated.

Data/System to use

Transactional information (user requests) for the Climate and Atmosphere Data Stores is held in a Postgres DB. Operational information from the system components is recorded in various logs.

Both sources of information are indexed in Splunk in near real time. The information can be queried directly in Splunk or exported for use in other environments.
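As a minimal sketch of how exported request records might be summarised per user before any modelling, the example below uses an in-memory SQLite database as a stand-in for the Postgres DB. The table name `requests`, its columns (`user_id`, `dataset`, `status`, `size_mb`) and the sample rows are all assumptions made for illustration; the real CDS schema is not described in this issue.

```python
import sqlite3

# Hypothetical sample of exported request records (not real CDS data).
SAMPLE_ROWS = [
    ("u1", "dataset-a", "completed", 120.0),
    ("u1", "dataset-a", "completed", 80.0),
    ("u1", "dataset-b", "failed", 0.0),
    ("u2", "dataset-a", "completed", 10.0),
    ("u3", "dataset-c", "failed", 0.0),
    ("u3", "dataset-c", "failed", 0.0),
]


def summarise_requests(rows):
    """Return per-user request counts, failure counts and total volume."""
    conn = sqlite3.connect(":memory:")  # stand-in for the Postgres DB
    try:
        # Assumed schema: one row per user request.
        conn.execute(
            "CREATE TABLE requests ("
            "user_id TEXT, dataset TEXT, status TEXT, size_mb REAL)"
        )
        conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT user_id, "
            "       COUNT(*) AS n_requests, "
            "       SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) AS n_failed, "
            "       SUM(size_mb) AS total_mb "
            "FROM requests GROUP BY user_id ORDER BY user_id"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in summarise_requests(SAMPLE_ROWS):
        print(row)
```

Per-user aggregates of this kind are one possible starting point for the features that an ML model would consume; the same query shape should carry over to Postgres with little change.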

Solution

Applying ML/AI models to the data collected by the system will allow the extraction of hidden knowledge about user patterns, cause-effect relationships behind issues, and so on.

This knowledge will allow us to better understand and tune the system, put in place more dynamic configuration (QoS), implement new features, inform users, organise the catalogue structure, and more.
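One way such user patterns might be explored is by clustering users on a few behavioural features. The sketch below is a plain k-means implementation in standard-library Python, kept small for readability; it is not a method prescribed by this challenge. The two features (requests per day, mean request size in GB) and the sample values are hypothetical.

```python
import math

# Hypothetical per-user features: (requests per day, mean request size in GB).
USER_FEATURES = [
    (1.0, 0.5),
    (2.0, 0.4),
    (1.5, 0.6),
    (50.0, 0.1),
    (55.0, 0.2),
    (48.0, 0.1),
]


def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of floats; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids


if __name__ == "__main__":
    labels, centroids = kmeans(USER_FEATURES, k=2)
    print(labels)
    print(centroids)
```

In this toy data the occasional users and the high-frequency users end up in separate clusters. With real data the features would normally be scaled first, since k-means is sensitive to differences in units, and a library implementation would be the more practical choice.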

Ideas for the implementation

  • Quality of Service (QoS) rules are a key component for protecting the system: they manage processing requests by balancing users' requirements against the resources available at system level. Currently, QoS is managed manually, based on perception and on the reports consulted. The output of this project could trigger automatic updates to some QoS rules based on the information discovered.
  • Issues on the system usually start as a consequence of abusive users or poorly constructed requests. Sometimes the causes are well known (e.g. users fishing for the latest available data even before it is released). The better we understand these behaviours, the better we can react and put contingency actions in place.
  • Do users download exactly what they need, or are they forced to download more and later extract what they are really looking for? What are they looking for that is not there, and what can we learn from that to improve system performance and/or access to data? (e.g. How do users structure their requests? Is that structure optimal? Are they looking for something in the wrong way, or even for something that does not exist yet?)
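As a simple statistical baseline for the abusive-user idea above (not an ML model, and not something the challenge specifies), unusually heavy users could be flagged with a modified z-score built from the median and the median absolute deviation (MAD). The user IDs, the request counts and the 3.5 threshold, a common rule of thumb, are assumptions for illustration.

```python
import statistics

# Hypothetical number of requests per user over some fixed time window.
REQUESTS_PER_USER = {
    "u1": 12,
    "u2": 9,
    "u3": 15,
    "u4": 11,
    "u5": 400,
    "u6": 10,
}


def flag_heavy_users(requests_per_user, threshold=3.5):
    """Return users whose request count is an upper outlier by modified z-score."""
    counts = list(requests_per_user.values())
    median = statistics.median(counts)
    # MAD: the median of absolute deviations from the median.
    mad = statistics.median(abs(c - median) for c in counts)
    if mad == 0:
        return []  # no spread in the data, so nothing can be scored
    flagged = []
    for user, count in requests_per_user.items():
        # 0.6745 rescales the MAD to match a standard deviation for normal data.
        score = 0.6745 * (count - median) / mad
        if score > threshold:
            flagged.append(user)
    return sorted(flagged)


if __name__ == "__main__":
    print(flag_heavy_users(REQUESTS_PER_USER))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme user would otherwise inflate the spread and hide itself. A flag of this kind could be one input when reviewing QoS rules, alongside whatever a trained model finds.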
EsperanzaCuartero added the "Stream 2 Machine Learning for Earth Sciences" label on Feb 24, 2023
EsperanzaCuartero changed the title from "Challenge 6 - Discovering hidden patterns on Climate Data Store" to "Challenge 22 - Discovering hidden patterns on Climate Data Store" on Feb 27, 2023

4 participants