Data Transforms
A data transform performs operations on a data set prior to visualization. Common examples include filtering and grouping (e.g., group data points with the same stock ticker for plotting as separate lines). Other examples include layout functions (e.g., stream graphs, treemaps, graph layout) that are run prior to mark encoding.
All transform definitions must include a "type" parameter, which specifies the transform to apply. Each transform then has a set of transform-specific parameters. Transformation workflows are defined as an array of transforms; each transform is then evaluated in the specified order.
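To make the evaluation order concrete, here is a minimal Python sketch of a workflow runner that dispatches on each definition's "type" and feeds the output of one transform into the next. It is an illustration only, not Vega's implementation: the registry and the two toy transforms (`keep_positive`, `scale`) are hypothetical names invented for this example.

```python
# Hypothetical transforms; neither exists in Vega. They only make ordering visible.
def keep_positive(data, params):
    return [x for x in data if x > 0]

def scale(data, params):
    return [x * params["factor"] for x in data]

# Hypothetical registry keyed by the "type" parameter.
REGISTRY = {"keep_positive": keep_positive, "scale": scale}

def run_pipeline(data, transforms):
    """Apply each transform definition in the order it is listed."""
    for definition in transforms:
        if "type" not in definition:
            raise ValueError('every transform definition needs a "type"')
        data = REGISTRY[definition["type"]](data, definition)
    return data

pipeline = [{"type": "keep_positive"}, {"type": "scale", "factor": 10}]
result = run_pipeline([-1, 2, 3], pipeline)
print(result)  # [20, 30]
```

Reversing the array changes the intermediate values each transform sees, which is why the order of definitions matters.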
Many transforms accept data attributes, or fields (denoted below as "Field"), as parameters. Data field parameters are strings that describe either individual attributes (e.g., "index") or access paths (e.g., "data.price"). Data fields can also access array indices: for example, "data.list.0" is equivalent to data.list[0] in normal JavaScript.
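The access-path rule can be sketched in a few lines of Python. The helper name `get_field` is an assumption made for this example; the sketch only mirrors the behavior described above and does not reproduce how Vega resolves fields internally.

```python
def get_field(datum, path):
    """Resolve a dotted access path such as "data.list.0" against a datum."""
    value = datum
    for part in path.split("."):
        if isinstance(value, (list, tuple)):
            value = value[int(part)]  # numeric segment indexes into an array
        else:
            value = value[part]       # otherwise look up a named attribute
    return value

datum = {"index": 3, "data": {"price": 9.5, "list": [10, 20, 30]}}
print(get_field(datum, "index"))        # 3
print(get_field(datum, "data.price"))   # 9.5
print(get_field(datum, "data.list.0"))  # 10
```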
Data Manipulation Transforms
These transforms can be used to manipulate the data: to filter and sort elements, form groups, and merge different data sets together.
array
Map a data object to an array of selected values.
Properties
- fields - [Array<Field>] An array of field references to copy into an array, preserving the order specified.
Output
The array transform outputs a new array instance for each datum.
Example
{"type": "array", "fields": ["data.x", "data.y"]}
This example generates new 2-element arrays containing the data.x and data.y values, in that order.
copy
Copies values within a sub-element to the top-level data object. This can be useful for "promoting" raw input values to fields on the wrapping data object.
Properties
- from - [Field] The name of the object to copy values from.
- fields - [Array<String>] The individual field values to copy.
- as - [Array<String>] The field names to copy the values to. This array must be identical in length to the fields parameter.
Output
The copy transform will populate the requested fields in the top-level of the datum.
Example
{"type": "copy", "from": "data", "fields": ["source", "target"]}
This example copies the fields data.source and data.target to the top-level datum. This particular example can be useful for defining source and target fields on network edges.
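As a rough Python sketch of what the copy transform does to each datum (not Vega's own code): the function name `copy_transform` is hypothetical, and the fallback to the original field names when "as" is omitted is an assumption based on the example above.

```python
def copy_transform(data, source, fields, names=None):
    """Copy values from datum[source] to the top level of each datum."""
    names = names or fields  # assumption: reuse the field names when "as" is omitted
    if len(names) != len(fields):
        raise ValueError('"as" must have the same length as "fields"')
    for datum in data:
        for field, name in zip(fields, names):
            datum[name] = datum[source][field]
    return data

edges = [{"data": {"source": 0, "target": 1}}]
copy_transform(edges, "data", ["source", "target"])
print(edges[0])  # {'data': {'source': 0, 'target': 1}, 'source': 0, 'target': 1}
```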
facet
Organizes a data set into groups or "facets".
Properties
- keys - [Array<Field>] The fields to use as keys. Each unique set of key values corresponds to a single facet in the output.
- sort - [String | Array<String>] Optional sort criteria to use for sorting values within each facet.
Output
The facet transform returns a transformed data set organized into facets. Vega uses a standardized data structure for representing hierarchical or faceted data, which consists of a hierarchy of objects with key and values properties.
Example
{"type": "facet", "keys": ["data.category"]}
Facets the data according to the values of the data.category attribute.
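The grouping step can be sketched in Python as follows. This is an illustration under assumptions: the exact shape of Vega's faceted structure is not spelled out here beyond its key and values properties, so the root object with a None key and the use of a list of key values as each facet's key are choices made for this example. The helper names are hypothetical.

```python
def get_field(datum, path):
    """Resolve a dotted access path (helper assumed for this sketch)."""
    for part in path.split("."):
        datum = datum[int(part)] if isinstance(datum, list) else datum[part]
    return datum

def facet(data, keys):
    """Group data so that each unique set of key values forms one facet."""
    groups = {}
    for datum in data:
        key = tuple(get_field(datum, k) for k in keys)
        groups.setdefault(key, []).append(datum)
    # Assumed output shape: a root object whose values are the facets.
    return {
        "key": None,
        "values": [{"key": list(k), "values": v} for k, v in groups.items()],
    }

rows = [
    {"data": {"category": "a", "x": 1}},
    {"data": {"category": "b", "x": 2}},
    {"data": {"category": "a", "x": 3}},
]
tree = facet(rows, ["data.category"])
print([(f["key"], len(f["values"])) for f in tree["values"]])  # [(['a'], 2), (['b'], 1)]
```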
filter
Filters elements from a data set to remove unwanted items.
Properties
- test - [String] A string containing an expression (in JavaScript syntax) for the filter predicate. The expression language includes the variable d, corresponding to the current data object. All methods and constants of the JavaScript Math object are also supported (no prefix required).
Output
The filter transform returns a new data set containing only elements that match the filter test.
Examples
{"type": "filter", "test": "d.data.x > 10"}
This example retains only data elements for which the field data.x is greater than 10.
{"type": "filter", "test": "log(d.data.y)/LN10 > 2"}
This example retains only data elements for which the base-10 logarithm of data.y is greater than 2.
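In JavaScript, Math.log is the natural logarithm and Math.LN10 is the natural logarithm of 10, so log(d.data.y)/LN10 is the base-10 logarithm and the test keeps values of data.y above 100. The Python sketch below re-expresses that predicate as an ordinary function for illustration; it does not parse Vega expression strings, and the function name is hypothetical.

```python
import math

def base10_log_test(d):
    """Python equivalent of the test "log(d.data.y)/LN10 > 2"."""
    return math.log(d["data"]["y"]) / math.log(10) > 2

data = [{"data": {"y": 50}}, {"data": {"y": 150}}, {"data": {"y": 1000}}]
kept = [d for d in data if base10_log_test(d)]
print([d["data"]["y"] for d in kept])  # [150, 1000]
```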
flatten
Converts a faceted or hierarchical data set back into a flat, tabular structure.
Properties
None
Output
The flatten transform returns an array of flattened data values.
Example
{"type": "flatten"}
Flattens a faceted or tree data set.
formula
Extends data elements with new values according to a calculation formula.
Properties
- field - [String] The property name in which to store the computed formula value.
- expr - [String] A string containing an expression (in JavaScript syntax) for the formula. The expression language includes the variable d, corresponding to the current data object. All methods and constants of the JavaScript Math object are also supported (no prefix required).
Output
The formula transform returns the input data set, with each element extended with the computed formula value.
Examples
{"type": "formula", "field": "logx", "expr": "log(d.data.x)/LN10"}
This example computes the base-10 logarithm of data.x and stores the result on each datum as the "logx" property.
{"type": "formula", "field": "xy", "expr": "abs(d.data.x * d.data.y)"}
This example computes the absolute value of data.x times data.y, and stores the result on each datum as the "xy" property.
sort
Sorts the values of a data set (whether it be flat or faceted).
Properties
- by - [Field | Array<Field>] A list of fields to use as sort criteria. By default, ascending order is assumed. Field names may be prepended with a "-" (minus) character to indicate descending order.
Output
The sort transform returns the input data set with elements sorted in place.
Example
{"type": "sort", "by": "-index"}
This example sorts a data set in descending order by the value of the index field.
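The "-" prefix convention and the use of several sort criteria can be sketched in Python like this. The helper names `get_field` and `sort_by` are hypothetical, and the sketch handles flat data only; it is meant to show the comparison logic, not Vega's implementation.

```python
from functools import cmp_to_key

def get_field(datum, path):
    """Resolve a dotted access path (helper assumed for this sketch)."""
    for part in path.split("."):
        datum = datum[int(part)] if isinstance(datum, list) else datum[part]
    return datum

def sort_by(data, by):
    """Sort data in place; a leading "-" on a field name means descending."""
    fields = [by] if isinstance(by, str) else list(by)

    def compare(a, b):
        for field in fields:
            descending = field.startswith("-")
            path = field[1:] if descending else field
            x, y = get_field(a, path), get_field(b, path)
            if x != y:
                order = -1 if x < y else 1
                return -order if descending else order
        return 0  # equal on every criterion

    data.sort(key=cmp_to_key(compare))
    return data

rows = [{"index": 1}, {"index": 3}, {"index": 2}]
sort_by(rows, "-index")
print([r["index"] for r in rows])  # [3, 2, 1]
```

Later fields in the list are consulted only when all earlier fields compare equal.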
stats
Computes descriptive statistics for a data set. If the data set is faceted, this transform will compute stats separately for each facet. By default, the stats operator computes count, minimum, maximum, sum, mean (average), sample variance and sample standard deviation statistics.
Properties
- value - [Field] The field for which to compute the statistics. This field should contain quantitative values.
- median - [Boolean] If true, the median statistic will also be computed.
Output
By default, the stats transform sets the following values on each datum:
- count, min, max, sum, mean, variance, stdev
In addition, a median property may be included if requested.
If the data set is faceted, the statistics will be stored on each facet's root element. If the input data is flat, the stats transform will return a new object with the statistics values.
Example
{"type": "stats", "median": true}
This example computes descriptive statistics, including the median.
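For reference, the statistics listed above can be computed as in the following Python sketch. It is a minimal illustration for a non-empty, flat list of numbers, not Vega's code. Two details are assumptions: the variance of a single value is reported as 0, and the median of an even-sized set is the mean of the two middle values.

```python
import math

def stats(values, median=False):
    """Compute descriptive statistics for a non-empty list of numbers."""
    n = len(values)
    total = sum(values)
    mean = total / n
    # Sample variance divides by n - 1; a single value is assumed to give 0.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    result = {
        "count": n, "min": min(values), "max": max(values),
        "sum": total, "mean": mean,
        "variance": variance, "stdev": math.sqrt(variance),
    }
    if median:
        ordered = sorted(values)
        mid = n // 2
        # Assumption: average the two middle values when n is even.
        result["median"] = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return result

summary = stats([2, 4, 4, 4, 5, 5, 7, 9], median=True)
print(summary["mean"], summary["median"])  # 5.0 4.5
```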
unique
Given an input data set, constructs a new data set containing the unique values for a requested field.
Properties
- field - [Field] The data field for which to compute unique values.
- as - [String] The field name under which to store the unique values in the new data set.
Output
The unique transform produces a new data set consisting of objects with a single defined property, named according to the as parameter.
Example
{"type": "unique", "field": "data.category", "as": "value"}
This example extracts all unique values of data.category in the input data, and produces a new data set in which each object contains a unique value in the value property.
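A Python sketch of the same idea follows. The function and helper names are hypothetical, and keeping the values in first-seen order is an assumption of this example rather than documented behavior.

```python
def get_field(datum, path):
    """Resolve a dotted access path (helper assumed for this sketch)."""
    for part in path.split("."):
        datum = datum[int(part)] if isinstance(datum, list) else datum[part]
    return datum

def unique(data, field, name):
    """Build a new data set with one object per distinct value of field."""
    seen = set()
    out = []
    for datum in data:
        value = get_field(datum, field)
        if value not in seen:
            seen.add(value)
            out.append({name: value})  # store the value under the "as" name
    return out

rows = [{"data": {"category": c}} for c in ["a", "b", "a", "c", "b"]]
print(unique(rows, "data.category", "value"))
# [{'value': 'a'}, {'value': 'b'}, {'value': 'c'}]
```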
zip
Merges two data sets together according to a provided join key. If no join key is provided, the data sets are merged by their indices. In essence, zip extends a data set with values from another data set. When the secondary (with) data set is shorter than the primary data set, the zip transform will cycle through values when matching by index.
Properties
- with - [String] The name of the secondary data set to "zip" with the current, primary data set.
- as - [String] The name of the field in which to store the secondary data set values.
- key - [Field] The field in the primary data set to match against the secondary data set.
- withKey - [Field] The field in the secondary data set to match against the primary data set.
- default - [*] A default value to use if no matching key value is found. If not specified, undefined is used as the default value. The default value is applied only when key values are specified. It will not be used when matching by index.
Output
The zip transform extends the primary data set with values from a matching member of the secondary ("with") data set, and stores the value from the secondary data in the field specified by the as parameter.
Example
{
"type": "zip",
"key": "data.id",
"with": "unemployment",
"withKey": "data.key",
"as": "value"
}
This example matches records in the input data with records in the data set named "unemployment", where the values of data.id (primary data) and data.key (secondary data) match. Matching values in the secondary data are added to the primary data in the field named "value".
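Both matching modes can be sketched in Python as shown below. This is an illustration rather than Vega's implementation: the function name `zip_transform` and its argument names are hypothetical, Python's None stands in for JavaScript's undefined, and letting the last record win when the secondary data contains duplicate keys is an assumption.

```python
def get_field(datum, path):
    """Resolve a dotted access path (helper assumed for this sketch)."""
    for part in path.split("."):
        datum = datum[int(part)] if isinstance(datum, list) else datum[part]
    return datum

def zip_transform(primary, secondary, name, key=None, with_key=None, default=None):
    """Extend each primary datum with a matching member of the secondary data."""
    if key is None:
        # Match by index, cycling when the secondary data set is shorter.
        for i, datum in enumerate(primary):
            datum[name] = secondary[i % len(secondary)]
    else:
        # Match by key; the default applies only in this mode.
        lookup = {get_field(s, with_key): s for s in secondary}
        for datum in primary:
            datum[name] = lookup.get(get_field(datum, key), default)
    return primary

counties = [{"data": {"id": 1}}, {"data": {"id": 2}}, {"data": {"id": 3}}]
unemployment = [{"data": {"key": 1, "rate": 0.05}}, {"data": {"key": 2, "rate": 0.07}}]
zip_transform(counties, unemployment, "value", key="data.id", with_key="data.key")
print(counties[0]["value"]["data"]["rate"])  # 0.05
print(counties[2]["value"])                  # None (no match, no default given)
```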
Visual Encoding Transforms
Visual encoding transforms can be used to create more advanced visualizations, including layout algorithms and geographic projections.
force
Performs force-directed layout for network data. Force-directed layouts treat nodes as charged particles and edges (links among nodes) as springs, and use a physics simulation to determine node positions. The force transform acts on two data sets: one containing nodes and one containing links. Apply the transform to the node data, and include the name of the link data as a transform parameter.
Properties
- links [String] - The name of the link (edge) data set. Objects in this data set must have appropriately defined source and target attributes.
- size [Array] - The dimensions [width, height] of this force layout. Defaults to the width and height of the enclosing data rectangle or group.
- iterations [Number] - The number of iterations to run the force directed layout. The default value is 500.
- charge [Number | Field] - The strength of the charge each node exerts. If the parameter value is a number, it will be used for all nodes. If the parameter is a field definition, the charge will be determined by the data. The default value is -30. Negative values indicate a repulsive force, positive values an attractive force.
- linkDistance [Number | Field] - Determines the length of edges, in pixels. If the parameter value is a number, it will be used for all edges. If the parameter is a field definition, the linkDistance will be determined by the data. The default value is 20.
- linkStrength [Number | Field] - Determines the tension of edges (the spring constant). If the parameter value is a number, it will be used for all edges. If the parameter is a field definition, the linkStrength will be determined by the data. The default value is 1.
- friction [Number] - The strength of the friction force used to stabilize the layout.
- theta [Number] - The theta parameter for the Barnes-Hut algorithm, which is used to compute charge forces between nodes.
- gravity [Number] - The strength of the pseudo-gravity force that pulls nodes towards the center of the layout area.
- alpha [Number] - A "temperature" parameter that determines how much node positions are adjusted at each step.
Output
By default, the force transform sets the following values on each node datum:
- x, y, px, py, fixed, weight, index
Example
{"type": "force", "links": "edges", "linkDistance": 70, "charge": -100, "iterations": 1000}
This example assumes a data set named "edges" has already been defined, and has appropriate source and target attributes that reference the graph nodes.
geo
Performs a cartographic projection. Given longitude and latitude values, sets corresponding x and y properties for a mark.
Properties
- projection [String] - The type of cartographic projection to use. Defaults to "mercator". The geo transform accepts any projection supported by the D3 projection plug-in (for example, albersUsa, albers, hammer, winkel3, etc.).
- lon [Field] - The input longitude values.
- lat [Field] - The input latitude values.
- center [Array] - The center of the projection. The value should be a two-element array of numbers.
- translate [Array] - The translation of the projection. The value should be a two-element array of numbers.
- scale [Number] - The scale of the projection.
- rotate [Number] - The rotation of the projection.
- precision [Number] - The desired precision of the projection.
- clipAngle [Number] - The clip angle of the projection.
Output
By default, the geo transform sets the following values on each datum:
- x, y
Example
{
"type": "geo",
"lat": "data.latitude",
"lon": "data.longitude",
"projection": "winkel3",
"scale": 300,
"translate": [960, 500]
}
This example computes a Winkel3 projection for lat/lon pairs stored in the data.latitude and data.longitude attributes.
geopath
Creates paths for geographic regions, such as countries, states and counties. Given a GeoJSON Feature data value, produces a corresponding path definition, subject to a specified cartographic projection. The geopath transform is intended for use with the path mark type.
Properties
- value [Field] - The data field containing GeoJSON Feature data.
- projection [String] - The type of cartographic projection to use. Defaults to "mercator". The geopath transform accepts any projection supported by the D3 projection plug-in (for example, albersUsa, albers, hammer, winkel3, etc.).
- center [Array] - The center of the projection. The value should be a two-element array of numbers.
- translate [Array] - The translation of the projection. The value should be a two-element array of numbers.
- scale [Number] - The scale of the projection.
- rotate [Number] - The rotation of the projection.
- precision [Number] - The desired precision of the projection.
- clipAngle [Number] - The clip angle of the projection.
Output
By default, the geopath transform sets the following values on each datum:
- path
Example
{
"type": "geopath",
"value": "data",
"projection": "winkel3",
"scale": 300,
"translate": [960, 500]
}
This example creates path definitions using a Winkel3 projection applied to GeoJSON data stored in the data attribute of input data elements.
link
Computes a path definition for connecting nodes within a node-link network or tree diagram.
Properties
- source [Field] - The data field that references the source node for this link.
- target [Field] - The data field that references the target node for this link.
- shape [String] - A string describing the path shape to use. One of "line" (default), "curve", "diagonal", "diagonalX", or "diagonalY".
- tension [Number] - A tension parameter in the range [0,1] for the "tightness" of "curve"-shaped links.
Output
By default, the link transform sets the following values on each datum:
- path
Example
{"type": "link", "shape": "line"}
Creates straight-line links.
{"type": "link", "shape": "curve", "tension": 0.15}
Creates curved links with a limited amount of curvature (tension = 0.15).
pie
Computes a pie chart layout. Given a set of data values, sets startAngle and endAngle properties for a mark. The pie transform is intended for use with the arc mark type.
Properties
- sort [Boolean] - If true, will sort the data prior to computing angles.
- value [Field] - The data values to encode as angular spans. If this property is omitted, all pie slices will have equal spans.
Output
By default, the pie transform sets the following values on each datum:
- startAngle, endAngle
Examples
{"type": "pie", "value": "data.price"}
Computes angular widths for pie slices based on the field data.price.
{"type": "pie"}
Computes angular widths for equal-width pie slices.
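The angle computation behind both examples can be sketched in Python. Several details are assumptions made for illustration and are not stated above: angles are in radians, the first slice starts at 0, the slices together span a full circle (2π), and sorting is ascending by value. The function and helper names are hypothetical.

```python
import math

def get_field(datum, path):
    """Resolve a dotted access path (helper assumed for this sketch)."""
    for part in path.split("."):
        datum = datum[int(part)] if isinstance(datum, list) else datum[part]
    return datum

def pie(data, value=None, sort=False):
    """Set startAngle and endAngle on each datum, proportional to its value."""
    # With no value field, every slice gets the same weight.
    weights = [1 if value is None else get_field(d, value) for d in data]
    order = list(range(len(data)))
    if sort:
        order.sort(key=lambda i: weights[i])  # ascending order is an assumption
    total = sum(weights)
    angle = 0.0
    for i in order:
        data[i]["startAngle"] = angle
        angle += 2 * math.pi * weights[i] / total
        data[i]["endAngle"] = angle
    return data

slices = [{"data": {"price": 1}}, {"data": {"price": 3}}]
pie(slices, value="data.price")
print(slices[0]["endAngle"] - slices[0]["startAngle"])  # a quarter circle, about 1.5708
```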
stack
Computes layout values for stacked graphs, as in stacked bar charts or stream graphs.
Properties
- point [Field] - The data field determining the "points" at which to stack. When values are stacked vertically, this corresponds to the x-coordinates.
- height [Field] - The data field determining the thickness or "height" of a stack.
- offset [String] - The baseline offset style. One of "zero" (default), "silhouette", "wiggle", or "expand". The "silhouette" offset will center the stacks, while "wiggle" will attempt to minimize changes in slope to make the graph easier to read. If "expand" is chosen, the output values will be in the range [0,1].
- order [String] - The sort order for stack layers. One of default (default), reverse, or inside-out.
Output
By default, the stack transform sets the following values on each datum:
- y, y2
Example
{"type": "facet", "keys": ["data.group"]}
{"type": "stack", "point": "data.x", "height": "data.y"}
This example first applies a facet transform to split the data into groups (stacks) according to the field data.group. Next, it applies a stack transform to determine a stacked graph layout.
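For the default "zero" offset, the stacking arithmetic can be sketched in Python as below. This is a simplified illustration, not Vega's implementation: it takes the facets as a plain list of layers, ignores the offset and order parameters, and assumes that y2 holds the baseline of each element and y holds its top. The function and helper names are hypothetical.

```python
def get_field(datum, path):
    """Resolve a dotted access path (helper assumed for this sketch)."""
    for part in path.split("."):
        datum = datum[int(part)] if isinstance(datum, list) else datum[part]
    return datum

def stack(layers, point, height):
    """Stack layers on top of each other at each point, starting from zero."""
    tops = {}  # running top of the stack at each point
    for layer in layers:
        for datum in layer:
            p = get_field(datum, point)
            base = tops.get(p, 0)
            datum["y2"] = base                            # assumed: baseline
            datum["y"] = base + get_field(datum, height)  # assumed: top
            tops[p] = datum["y"]
    return layers

group_a = [{"data": {"x": 0, "y": 1}}, {"data": {"x": 1, "y": 2}}]
group_b = [{"data": {"x": 0, "y": 2}}, {"data": {"x": 1, "y": 1}}]
stack([group_a, group_b], "data.x", "data.y")
print([(d["y2"], d["y"]) for d in group_b])  # [(1, 3), (2, 3)]
```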
treemap
Computes a squarified treemap layout. The treemap transform is intended for visualizing hierarchical or faceted data with the rect mark type.
Properties
- padding [Number | Array] - The padding (in pixels) to provide around internal nodes in the treemap. For example, this might be used to create space to label the internal nodes. The padding value can either be a single number or an array of four numbers [top, right, bottom, left]. The default padding is zero pixels.
- ratio [Number] - The target aspect ratio for the layout to optimize. The default value is the golden ratio, (1 + sqrt(5))/2 =~ 1.618.
- round [Boolean] - If true, treemap cell dimensions will be rounded to integer pixels.
- size [Array] - The dimensions [width, height] of the treemap layout. Defaults to the width and height of the enclosing data rectangle or group.
- sticky [Boolean] - If true, repeated runs of the treemap will use cached partition boundaries. This results in smoother transition animations, at the cost of unoptimized aspect ratios. If sticky is used, do not reuse the same treemap encoder instance across data sets.
- value [Field] - The values to use to determine the area of each leaf-level treemap cell.
Output
By default, the treemap transform sets the following values on each datum:
- x, y, width, height
Example
{"type": "treemap", "value": "data.price"}
Computes a treemap layout where elements are sized according to the field data.price. This example assumes the input data is hierarchical or has already been suitably faceted.
wordcloud
Computes a word cloud layout, similar to Wordle. The wordcloud transform is intended for visualizing words or phrases with the text mark type.
Properties
- font [String] - The font face to use within the word cloud.
- fontSize [Field] - The font size for a word, in pixels.
- fontStyle [String] - The font style (e.g., "italic") to use.
- fontWeight [String] - The font weight (e.g., "bold") to use.
- padding [Number | Array] - The padding (in pixels) to provide around text in the word cloud. The padding value can either be a single number or an array of four numbers [top, right, bottom, left]. The default padding is zero pixels.
- rotate [Field | Object] - The rotation angle for a word. The input value can either be a data field reference, or an object describing an assignment policy. For example, the object {"random": [0, 30, 60]} will randomly assign one of 0, 30 or 60 degrees, while {"alternate": [0, 30, 60]} will repeatedly assign the angles 0, 30 and 60 degrees in order.
- size [Array] - The dimensions [width, height] of the wordcloud layout. Defaults to the width and height of the enclosing data rectangle or group.
- text [Field] - The data field containing the text to visualize for each datum.
Output
By default, the wordcloud transform sets the following values on each datum:
- x, y, font, fontSize, angle
Example
{
"type": "wordcloud",
"text": "data.text",
"font": "Helvetica Neue",
"fontSize": "data.value",
"rotate": {"random": [-60,-30,0,30,60]}
}
Computes a word cloud layout using random angles in the set [-60,-30,0,30,60].