-
Notifications
You must be signed in to change notification settings - Fork 4
Explode data filter
The Explode data filter on the Recipe page allows you to convert a wide dataset (e.g. time series) to a narrow dataset (suitable for visualisations and databases), similar to the "reshape" command in the R language. For background information, see the Wikipedia article "Wide and narrow data."
The command looks for the HXL +label attribute. Every column where the +label attribute appears in the input ("wide") data appears will generate a new row in the output ("narrow") data, with the column header as one of the values. The output columns will have the +label attribute removed, and new +header and +value attributes added.
(This process is easier to understand by looking at the Example section below.)
Attribute for former header text: the HXL attribute to add to the original hashtag for the column containing the former text label (defaults to +header).
Attribute for original value: the HXL attribute to add to the original hashtag for the column containing the original value (defaults to +value).
District | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 |
---|---|---|---|---|---|---|
#adm1 | #inneed+label | #inneed+label | #inneed+label | #inneed+label | #inneed+label | #inneed+label |
Coast | 5000 | 6000 | 6000 | 2500 | 1500 | 1000 |
Plains | 0 | 0 | 1000 | 8000 | 10000 | 12000 |
District | -- | -- |
---|---|---|
#adm1 | #inneed+header | #inneed+value |
Coast | 2010 | 5000 |
Coast | 2011 | 6000 |
Coast | 2012 | 6000 |
Coast | 2013 | 2500 |
Coast | 2014 | 1500 |
Coast | 2015 | 1000 |
Plains | 2010 | 0 |
Plains | 2011 | 0 |
Plains | 2012 | 1000 |
Plains | 2013 | 8000 |
Plains | 2014 | 10000 |
Plains | 2015 | 12000 |
This filter is especially useful for taking data that has been optimised for human readability (e.g. time-series data) and converting it to a format that's more suitable for automated processing, such as generating visualisations, importing into a database, etc.
Typically, you would follow the Explode data filter with a Rename column filter to change the output column headers and hashtags to something more meaningful, like this:
District | Year | People in need |
---|---|---|
#adm1 | #date+year | #inneed |
Coast | 2010 | 5000 |
Coast | 2011 | 6000 |
Coast | 2012 | 6000 |
Coast | 2013 | 2500 |
Coast | 2014 | 1500 |
Coast | 2015 | 1000 |
Plains | 2010 | 0 |
Plains | 2011 | 0 |
Plains | 2012 | 1000 |
Plains | 2013 | 8000 |
Plains | 2014 | 10000 |
Plains | 2015 | 12000 |
When using the Tagger page to add hashtags to a non-HXL dataset, the Default HXL hashtag field is especially useful: if you set it to a value like "#inneed+label", it will automatically fill that in for all the time-series columns.
Learn more about the HXL standard at http://hxlstandard.org