Skip to content

JSON recipes

David Megginson edited this page Jun 20, 2023 · 41 revisions

(For coders)

As an alternative to using the web forms linked from the Filters article, advanced users can pass a full recipe to the HXL Proxy as a URL-encoded JSON specification via the &recipe HTTP parameter. A recipe is simply a JSON list of filters, like this:

[
    {
        "filter": "with_rows",
        "queries": "org=unicef"
    },
    { 
        "filter": "count",
        "tags": "#adm1+name"
    }
]

For the Process JSON spec service, you embed the list of filters as the value of the recipe property in a JSON processing spec:

{
    "input": "http://example.org/dataset.csv",
    "recipe": [
        {
            "filter": "with_rows",
            "queries": "org=unicef"
        },
        { 
            "filter": "count",
            "tags": "#adm1+name"
        }
    ]
}

In both cases, the HXL Proxy will apply the filters in the order you list them, using the output from one as the input to the next. The output of the last filter will be the result of the processing chain.

Filter properties

Every filter specification includes a property filter set to a short name for the filter (e.g. "count"). The other properties, if any, depend on the filter type.

Add column filter properties

Unlike the web form, the JSON version allows you to add multiple columns in a single operation, so the name is add_columns (plural).

{ "filter": "add_columns",
  "specs": ["Country#country+name=Ghana"] }
Property Required Description Example
filter yes The filter type "add_columns"
specs yes A specification or list of specifications in the format header:tagspec=fixed value "Country#country+code=SYR". Note that fixed values may include substitution patterns inside double braces to pull values from other columns, e.g. "X-{{#country+code}}"
before no If true, place the new column at the left rather than right of existing columns True
{ "filter": "append",
  "append_source": "http://example.org/data.csv" }
Property Required Description Example
filter yes The filter type always "append"
append_source yes The URL or JSON source (see JSON data sources) for the dataset to append "http://example.org/data.csv"
add_columns no If true, include columns that are in the append_source but not the original (default False). True
queries no One or more Row queries to filter rows added from the append_source "date>2010"

Append datasets filter (external list) properties

{ "filter": "append_external_list",
  "source_list_url": "http://example.org/list-of-sources.csv" }
Property Required Description Example
filter yes The filter type always "append_external_list"
source_list_url yes The URL or JSON source (see JSON data sources) for the list of dataset URLs to append (in a column tagged #x_source) "http://example.org/list-of-sources.csv"
add_columns no If true, include columns that are in the append_source but not the original (default False). True
queries no One or more Row queries to filter rows added from the append_source "date>2010"

Cache filter properties

(This filter caches the complete dataset for repeated processing. Not generally used inside the HXL proxy.)

{ "filter": "cache" }
Property Required Description Example
filter yes The filter type Always "cache"

Clean data filter properties

{ "filter": "clean_data",
  "date": "date+start" }
Property Required Description Example
filter yes The filter type Always "clean_data"
whitespace no Tag patterns for columns that will have whitespace normalised (list of strings, or single string with patterns comma-separated). "org+name"
upper no Tag patterns for columns that will be converted to all upper case. "adm1+code,adm2+code"
lower no Tag patterns for columns that will be converted to all lower case. ["sector+code", "subsector+code"]
date no Tag patterns for columns that will have dates normalised. "date"
date_format no strftime format string for output dates. "%Y-%m-%d" (ISO 8601 date, e.g. "2018-01-05").
number no Tag patterns for columns that will have numbers normalised. "inneed,affected,population"
number_format no printf-style format string for output numbers. Do not include a leading "%" "0.2f" (two decimal places, e.g. 3.00).
latlon no Tag patterns for columns that will have lat/lon normalised. The attributes +lat, +lon, and +coord will control the normalisation type. "geo"
purge no If true, remove any dates, numbers, or lat/lon that can't be cleaned during parsing. True
queries no One or more Row queries to filter rows that will be cleaned. "sector=wash"

Count rows filter properties

{ "filter": "count",
  "patterns": "adm1+name,org",
  "aggregators": "sum(#targeted) as Total targeted#targeted#total",
  "queries": "sector=WASH'  }
Property Required Description Example
filter yes The filter type Always "count"
patterns yes Tag patterns for the rows to count (list, or single string with patterns comma-separated). "adm1,org"
aggregators no A single string or list of strings describing Aggregators. If omitted, the filter will use the aggregator "count() as Count#meta+count" by default, simply counting unique combinations of patterns. "sum(#targeted)"
queries no One or more Row queries to filter rows that will be counted (ignore any that don't match at least one query). "sector=wash"
{ "filter": "dedup" }
Property Required Description Example
filter yes The filter type Always "dedup"
patterns no Tag patterns for the columns to consider for deduplication (if omitted, consider all columns). "org,country,adm1,sector"
queries no One or more Row queries to filter rows that will be ignored during deduplication. "sector=wash"
{ "filter": "explode" }
Property Required Description Example
filter yes The filter type Always "explode"
header_attribute no The attribute to add for the former header (instead of "+header"). "age"
value_attribute no The attribute to add for the former dataset value (instead of "+value"). "total"

Fill data filter properties

{ "filter": "fill",
  "pattern": "#affected",
  "queries": "sector=wash"}
Property Required Description Example
filter yes The filter type Always "fill"
pattern no A tag pattern (see Tag patterns) matching the column(s) that should be filled (if omitted, fill all). "#affected"
queries no One or more Row queries to filter rows that will be ignored during filling. "sector=wash"
{ "filter": "implode",
  "label_pattern": "#date+year",
  "value_pattern": "#affected" }
Property Required Description Example
filter yes The filter type Always "implode"
label_pattern yes Tag pattern to match the column holding labels. "#date+year"
value_pattern yes Tag pattern to match the column holding values. "#affected"

JSONPath filter properties

{ "filter": "jsonpath",
  "path": "a.b[2]",
  "pattern": "#affected",
  "queries": "sector=wash"}
Property Required Description Example
filter yes The filter type Always "jsonpath"
path yes A JSONPath expression "data.values[3]"
pattern no A tag pattern (see Tag patterns) matching the column(s) where the JSONPath should be applied (if omitted, apply to all). "#affected"
queries no One or more Row queries to filter rows that will be ignored during JSONPath extraction. "sector=wash"
{ "filter": "merge_data",
  "keys": "adm1+code",
  "tags": ["affected", "population"] }
Property Required Description Example
filter yes The filter type Always "merge_data"
merge_source yes The URL or JSON source (see JSON data sources) for the dataset to from which to merge extra columns. "http://example.org/data.csv"
keys yes Tag patterns for the columns to use as shared keys between the two datasets. "adm1+code,adm2+code"
tags yes Tag patterns matching the columns that will be copied over from the merge_source. "affected,population"
replace no If True, attempt to replace existing columns in the source dataset, instead of adding new ones. False
overwrite no If True (and replace is also True), overwrite fields in the source dataset even when there's already a value in them (otherwise, write only when the field is empty). True
queries no One or more Row queries to filter rows that considered from the merge_source. "sector=wash"

Unlike the web form, the JSON version allows you to rename multiple columns in a single operation. As a result, it is called rename_columns (plural).

{ "filter": "rename_columns",
  "specs": "meta+count:Number affected#affected" }
Property Required Description Example
filter yes The filter type Always "rename_columns"
specs yes A single specification or list of rename specifications in the format #old_tagspec:New header#new_tagspec "#meta+count:Number affected#affected"
{ "filter": "replace_data",
  "original": "Oxfm",
  "replacement": "Oxfam",
  "pattern": "org+name" }
Property Required Description Example
filter yes The filter type Always "replace_data"
original yes The original string or regular expression to replace. "Oxfm"
replacement yes The new string to substitute. "Oxfam"
pattern no a tag pattern (see Tag patterns) for the column(s) where the replacement should take place. If omitted, replace in all columns.
use_regex no If True, interpret pattern as a regular expression. False
queries no One or more Row queries to filter rows where replacement can take place. "sector=wash"
{ "filter": "replace_data_map",
  "map_source": "http://example.org/replacement-map.csv" }
Property Required Description Example
filter yes The filter type Always "replace_data_map"
map_source yes The URL or JSON source (see JSON data sources) for the dataset containing a replacement map (see Replacement maps). "http://example.org/replacement-map.csv"
queries no One or more Row queries to filter rows where replacement can take place. "sector=wash"

Sort rows filter properties

{ "filter": "sort",
  "tags": "date,adm1+name" }
Property Required Description Example
filter yes The filter type Always "sort"
tags no Tag patterns for the columns to use as sort keys. "date,adm1+name"
reverse no If True, reverse the sort order (defaults to False). True

With columns filter properties

{ "filter": "with_columns",
  "includes": "org,sector,adm1+name" }

Corresponds to the Cut columns filter with the includes parameter specified.

Property Required Description Example
filter yes The filter type Always "with_columns"
includes yes Tag patterns for the columns to include. "org,sector,adm1+name"

With rows filter properties

{ "filter": "with_rows",
  "queries": ["org=unicef", "date+year>2010"] }

Corresponds to the Select rows filter without the "Invert query" box checked.

Property Required Description Example
filter yes The filter type Always "with_rows"
queries yes Row queries for the rows to include (list or single string). ["org=unicef", "date+year>2010"]

Without columns filter properties

{ "filter": "without_columns",
  "excludes": "contact+email" }

Corresponds to the Cut columns filter with the excludes parameter specified.

Property Required Description Example
filter yes The filter type Always "without_columns"
excludes yes Tag patterns for the columns to exclude. "contact+email"
skip_untagged no also remove all columns without HXL hashtags. True

Without rows filter properties

{ "filter": "without_rows",
  "queries": "status!=active" }

Corresponds to the Select rows filter with the "Invert query" box checked.

Property Required Description Example
filter yes The filter type Always "without_rows"
queries yes Row queries for the rows to exclude (list or single string). "status!=active"
Clone this wiki locally