Adding a national data source

Joe Palmer edited this page Apr 20, 2021 · 2 revisions

Adding a new national data source

In addition to providing regional data, this package also provides the same Covid-19 data at a national level. Currently, the package gets this data from the World Health Organisation (WHO) and the European Centre for Disease Prevention and Control (ECDC). This guide outlines how to add a new source of national-level Covid-19 data.

Writing a national data class

You need an open and accessible data source, preferably a CSV file that is updated regularly and can be downloaded from a fixed (or predictable) URL.

As with adding an individual country class, the best way to get started with adding a national level data class is to follow an example, such as WHO or ECDC.

Naming convention

If your data comes from a body with an acronym (e.g. WHO, ECDC, JHU), name your class after it in all upper case. If it does not (e.g. Google data), use upper camel case (GoogleData).

What fields (variables) to include

  • Make sure your new class inherits from CountryDataClass, not directly from DataClass. CountryDataClass itself inherits from DataClass and adds some generic capabilities, such as filtering for a target country.

Include all fields you would expect to see in any other country class:

  • country - Write here the full name of the source with the acronym in brackets (e.g. World Health Organisation (WHO))
  • supported_levels - Make this a list containing the character "1"
  • supported_region_names - Make this a named list of the supported level and its name (e.g. "1" = "country")
  • supported_region_codes - Make this a named list of the supported level and its region code name (e.g. "1" = "iso_code")
  • common_data_urls - Make this a named list of the URLs from which to download the data (e.g. "main" = "path/2/data.csv")
  • source_data_cols - Make this a vector of the columns contained in the raw data as it is downloaded (e.g. c("cases_new", "cases_total", ...))

Make sure all fields are public.
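
The fields above could be laid out roughly as in the sketch below. It is only an illustration: EXB is a made-up acronym for an imaginary provider, and CountryDataClassStub is a hypothetical stand-in so the code runs without the package loaded. In the package itself you would inherit from the real CountryDataClass, and the WHO and ECDC classes remain the reference to follow.

```r
library(R6)

# Hypothetical stand-in for the package's CountryDataClass, used only so this
# sketch runs on its own. Inside the package, inherit from CountryDataClass.
CountryDataClassStub <- R6Class("CountryDataClassStub",
  public = list(data = list())
)

# "EXB" is an invented acronym, so the class name is all upper case.
EXB <- R6Class("EXB",
  inherit = CountryDataClassStub,
  public = list(
    # Full name of the source with the acronym in brackets
    country = "Example Health Body (EXB)",
    # National data only supports level "1"
    supported_levels = list("1"),
    supported_region_names = list("1" = "country"),
    supported_region_codes = list("1" = "iso_code"),
    # Placeholder path taken from the example in the list above
    common_data_urls = list("main" = "path/2/data.csv"),
    # Columns present in the raw download
    source_data_cols = c("cases_new", "cases_total")
  )
)

src <- EXB$new()
src$supported_region_names[["1"]]
```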

What methods (functions) to include

Downloading and Processing

As with an individual country class, downloading and processing are handled for you. You will only need a custom download method if your data is not available at static URLs in CSV format. If you do provide a new download method, call super$download() first within it. You should never need to add any custom processing methods.
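
If you do need a custom download method, the pattern of calling the parent first might look something like the sketch below. ParentStub and its fake download are hypothetical stand-ins for the package's parent class, and the "extra" table is invented for illustration; only the super$download() call reflects the guidance above.

```r
library(R6)

# Hypothetical stand-in parent: pretends to download the "main" CSV.
ParentStub <- R6Class("ParentStub",
  public = list(
    data = list(),
    download = function() {
      self$data$raw <- list(main = data.frame(cases_new = c(10, 20)))
      invisible(NULL)
    }
  )
)

ExampleSource <- R6Class("ExampleSource",
  inherit = ParentStub,
  public = list(
    download = function() {
      # Let the parent handle the standard download first
      super$download()
      # Then add whatever the standard download cannot fetch
      # (an in-memory table here, purely as a placeholder)
      self$data$raw$extra <- data.frame(deaths_new = c(1, 2))
      invisible(NULL)
    }
  )
)

obj <- ExampleSource$new()
obj$download()
names(obj$data$raw)
```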

Cleaning

You will need to write a public function called clean_common. It should take no arguments (everything should be accessed using self) and should not return anything. This is where the raw data (data$raw) is cleaned (e.g. put into the correct format and checked for consistency) and stored in data$clean. Unlike with individual country classes, you are not expected to provide region codes, so there is no need to join region codes to your data.
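
As a rough idea of what the cleaning step might involve, the sketch below uses base R on a tiny invented data frame. The raw column names (Date_reported, Country, New_cases) and the helper name clean_raw_sketch are assumptions for illustration; your source will have its own columns, and inside clean_common the result would be stored in data$clean via self.

```r
# Hypothetical helper: takes raw data as an argument and returns cleaned data.
clean_raw_sketch <- function(raw) {
  clean <- data.frame(
    date = as.Date(raw$Date_reported),      # text -> Date
    country = trimws(raw$Country),          # strip stray whitespace
    cases_new = as.numeric(raw$New_cases),  # text -> numeric
    stringsAsFactors = FALSE
  )
  # Keep a consistent row order
  clean <- clean[order(clean$date, clean$country), ]
  rownames(clean) <- NULL
  clean
}

# Invented raw data standing in for a downloaded CSV
raw <- data.frame(
  Date_reported = c("2021-01-02", "2021-01-01"),
  Country = c(" Italy", "France "),
  New_cases = c("15", "10"),
  stringsAsFactors = FALSE
)

clean <- clean_raw_sketch(raw)
clean
```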

Return

National classes should have a return function (called return) which takes the processed data (data$processed) and, if the argument totals = TRUE, generates totals for each column, arranged by date and country. This function should also use the argument steps to return either the full data if steps = TRUE or just the returned data otherwise (see the WHO example if this is not clear).
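
The totals logic could be sketched in base R as below. This is not the package's implementation (the WHO class is the reference for that); prepare_return_sketch, the column names and the data are all invented, and the steps handling is left out.

```r
# Hypothetical helper: totals per country if totals = TRUE, otherwise the
# processed data arranged by date and country.
prepare_return_sketch <- function(processed, totals = FALSE) {
  if (totals) {
    out <- aggregate(
      cbind(cases_new, deaths_new) ~ country,
      data = processed,
      FUN = sum
    )
    names(out) <- c("country", "cases_total", "deaths_total")
    return(out[order(out$country), ])
  }
  processed[order(processed$date, processed$country), ]
}

# Invented processed data
processed <- data.frame(
  date = as.Date(c("2021-01-02", "2021-01-01", "2021-01-01", "2021-01-02")),
  country = c("France", "France", "Italy", "Italy"),
  cases_new = c(20, 10, 5, 15),
  deaths_new = c(2, 1, 0, 3),
  stringsAsFactors = FALSE
)

country_totals <- prepare_return_sketch(processed, totals = TRUE)
daily <- prepare_return_sketch(processed, totals = FALSE)
```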

Additional notes

Whilst your class must contain the public functions return and clean_common, you can also write new functions which override parent ones (although you probably will not need to, except perhaps for download). These functions should not return anything (except for return) and should not take arguments. You may also want additional functions to help with things like cleaning. This is fine: store them as public methods, named so that they won't clash with the existing code. For these functions, use arguments and return values rather than self to keep the code flow explicit. Also, please provide tests for these 'helper functions'.