-
Notifications
You must be signed in to change notification settings - Fork 4
JSON recipes
(For coders)
As an alternative to using the web forms linked from the Filters article, advanced users can pass a full recipe to the HXL Proxy as a URL-encoded JSON specification via the &recipe HTTP parameter. A recipe is simply a JSON list of filters, like this:
[
{
"filter": "with_rows",
"queries": "org=unicef"
},
{
"filter": "count",
"tags": "#adm1+name"
}
]
For the Process JSON spec service, you embed the list of filters as the value of the recipe property in a JSON processing spec:
{
"input": "http://example.org/dataset.csv",
"recipe": [
{
"filter": "with_rows",
"queries": "org=unicef"
},
{
"filter": "count",
"tags": "#adm1+name"
}
]
}
In both cases, the HXL Proxy will apply the filters in the order you list them, using the output from one as the input to the next. The output of the last filter will be the result of the processing chain.
Every filter specification includes a property filter set to a short name for the filter (e.g. "count"). The other properties, if any, depend on the filter type.
Add column filter properties
Unlike the web form, the JSON version allows you to add multiple columns in a single operation, so the name is add_columns (plural).
{ "filter": "add_columns",
"specs": ["Country#country+name=Ghana"] }
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | "add_columns" |
specs | yes | A specification or list of specifications in the format header:tagspec=fixed value | "Country#country+code=SYR". Note that fixed values may include substitution patterns inside double braces to pull values from other columns, e.g. "X-{{#country+code}}" |
before | no | If true, place the new column at the left rather than right of existing columns | True |
Append datasets filter properties
{ "filter": "append",
"append_source": "http://example.org/data.csv" }
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | always "append" |
append_source | yes | The URL or JSON source (see JSON data sources) for the dataset to append | "http://example.org/data.csv" |
add_columns | no | If true, include columns that are in the append_source but not the original (default False). | True |
queries | no | One or more Row queries to filter rows added from the append_source | "date>2010" |
Append datasets filter (external list) properties
{ "filter": "append_external_list",
"source_list_url": "http://example.org/list-of-sources.csv" }
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | always "append_external_list" |
source_list_url | yes | The URL or JSON source (see JSON data sources) for the list of dataset URLs to append (in a column tagged #x_source) | "http://example.org/list-of-sources.csv" |
add_columns | no | If true, include columns that are in the append_source but not the original (default False). | True |
queries | no | One or more Row queries to filter rows added from the append_source | "date>2010" |
(This filter caches the complete dataset for repeated processing. Not generally used inside the HXL proxy.)
{ "filter": "cache" }
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | Always "cache" |
Clean data filter properties
{ "filter": "clean_data",
"date": "date+start" }
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | Always "clean_data" |
whitespace | no | Tag patterns for columns that will have whitespace normalised (list of strings, or single string with patterns comma-separated). | "org+name" |
upper | no | Tag patterns for columns that will be converted to all upper case. | "adm1+code,adm2+code" |
lower | no | Tag patterns for columns that will be converted to all lower case. | ["sector+code", "subsector+code"] |
date | no | Tag patterns for columns that will have dates normalised. | "date" |
date_format | no | strftime format string for output dates. | "%Y-%m-%d" (ISO 8601 date, e.g. "2018-01-05"). |
number | no | Tag patterns for columns that will have numbers normalised. | "inneed,affected,population" |
number_format | no | printf-style format string for output numbers. Do not include a leading "%" | "0.2f" (two decimal places, e.g. 3.00). |
latlon | no | Tag patterns for columns that will have lat/lon normalised. The attributes +lat, +lon, and +coord will control the normalisation type. | "geo" |
purge | no | If true, remove any dates, numbers, or lat/lon that can't be cleaned during parsing. | True |
queries | no | One or more Row queries to filter rows that will be cleaned. | "sector=wash" |
Count rows filter properties
{ "filter": "count",
"patterns": "adm1+name,org",
"aggregators": "sum(#targeted) as Total targeted#targeted#total",
"queries": "sector=WASH' }
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | Always "count" |
patterns | yes | Tag patterns for the rows to count (list, or single string with patterns comma-separated). | "adm1,org" |
aggregators | no | A single string or list of strings describing Aggregators. If omitted, the filter will use the aggregator "count() as Count#meta+count" by default, simply counting unique combinations of patterns. | "sum(#targeted)" |
queries | no | One or more Row queries to filter rows that will be counted (ignore any that don't match at least one query). | "sector=wash" |
Deduplicate rows filter properties
{ "filter": "dedup" }
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | Always "dedup" |
patterns | no | Tag patterns for the columns to consider for deduplication (if omitted, consider all columns). | "org,country,adm1,sector" |
queries | no | One or more Row queries to filter rows that will be ignored during deduplication. | "sector=wash" |
Explode data filter properties
{ "filter": "explode" }
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | Always "explode" |
header_attribute | no | The attribute to add for the former header (instead of "+header"). | "age" |
value_attribute | no | The attribute to add for the former dataset value (instead of "+value"). | "total" |
Fill data filter properties
{ "filter": "fill",
"pattern": "#affected",
"queries": "sector=wash"}
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | Always "fill" |
pattern | no | A tag pattern (see Tag patterns) matching the column(s) that should be filled (if omitted, fill all). | "#affected" |
queries | no | One or more Row queries to filter rows that will be ignored during filling. | "sector=wash" |
Explode data filter properties
{ "filter": "implode",
"label_pattern": "#date+year",
"value_pattern": "#affected" }
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | Always "implode" |
label_pattern | yes | Tag pattern to match the column holding labels. | "#date+year" |
value_pattern | yes | Tag pattern to match the column holding values. | "#affected" |
JSONPath filter properties
{ "filter": "jsonpath",
"path": "a.b[2]",
"pattern": "#affected",
"queries": "sector=wash"}
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | Always "jsonpath" |
path | yes | A JSONPath expression | "data.values[3]" |
pattern | no | A tag pattern (see Tag patterns) matching the column(s) where the JSONPath should be applied (if omitted, apply to all). | "#affected" |
queries | no | One or more Row queries to filter rows that will be ignored during JSONPath extraction. | "sector=wash" |
Merge columns filter properties
{ "filter": "merge_data",
"keys": "adm1+code",
"tags": ["affected", "population"] }
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | Always "merge_data" |
merge_source | yes | The URL or JSON source (see JSON data sources) for the dataset to from which to merge extra columns. | "http://example.org/data.csv" |
keys | yes | Tag patterns for the columns to use as shared keys between the two datasets. | "adm1+code,adm2+code" |
tags | yes | Tag patterns matching the columns that will be copied over from the merge_source. | "affected,population" |
replace | no | If True, attempt to replace existing columns in the source dataset, instead of adding new ones. | False |
overwrite | no | If True (and replace is also True), overwrite fields in the source dataset even when there's already a value in them (otherwise, write only when the field is empty). | True |
queries | no | One or more Row queries to filter rows that considered from the merge_source. | "sector=wash" |
Rename column filter properties
Unlike the web form, the JSON version allows you to rename multiple columns in a single operation. As a result, it is called rename_columns (plural).
{ "filter": "rename_columns",
"specs": "meta+count:Number affected#affected" }
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | Always "rename_columns" |
specs | yes | A single specification or list of rename specifications in the format #old_tagspec:New header#new_tagspec | "#meta+count:Number affected#affected" |
Replace data filter properties
{ "filter": "replace_data",
"original": "Oxfm",
"replacement": "Oxfam",
"pattern": "org+name" }
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | Always "replace_data" |
original | yes | The original string or regular expression to replace. | "Oxfm" |
replacement | yes | The new string to substitute. | "Oxfam" |
pattern | no | a tag pattern (see Tag patterns) for the column(s) where the replacement should take place. If omitted, replace in all columns. | |
use_regex | no | If True, interpret pattern as a regular expression. | False |
queries | no | One or more Row queries to filter rows where replacement can take place. | "sector=wash" |
Replace data (mapping table) filter properties
{ "filter": "replace_data_map",
"map_source": "http://example.org/replacement-map.csv" }
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | Always "replace_data_map" |
map_source | yes | The URL or JSON source (see JSON data sources) for the dataset containing a replacement map (see Replacement maps). | "http://example.org/replacement-map.csv" |
queries | no | One or more Row queries to filter rows where replacement can take place. | "sector=wash" |
Sort rows filter properties
{ "filter": "sort",
"tags": "date,adm1+name" }
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | Always "sort" |
tags | no | Tag patterns for the columns to use as sort keys. | "date,adm1+name" |
reverse | no | If True, reverse the sort order (defaults to False). | True |
{ "filter": "with_columns",
"includes": "org,sector,adm1+name" }
Corresponds to the Cut columns filter with the includes parameter specified.
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | Always "with_columns" |
includes | yes | Tag patterns for the columns to include. | "org,sector,adm1+name" |
{ "filter": "with_rows",
"queries": ["org=unicef", "date+year>2010"] }
Corresponds to the Select rows filter without the "Invert query" box checked.
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | Always "with_rows" |
queries | yes | Row queries for the rows to include (list or single string). | ["org=unicef", "date+year>2010"] |
{ "filter": "without_columns",
"excludes": "contact+email" }
Corresponds to the Cut columns filter with the excludes parameter specified.
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | Always "without_columns" |
excludes | yes | Tag patterns for the columns to exclude. | "contact+email" |
skip_untagged | no | also remove all columns without HXL hashtags. | True |
{ "filter": "without_rows",
"queries": "status!=active" }
Corresponds to the Select rows filter with the "Invert query" box checked.
Property | Required | Description | Example |
---|---|---|---|
filter | yes | The filter type | Always "without_rows" |
queries | yes | Row queries for the rows to exclude (list or single string). | "status!=active" |
Learn more about the HXL standard at http://hxlstandard.org