
2 ggparty

Martin edited this page Jun 28, 2019 · 2 revisions

But first things first.
Let’s recreate a simple example already used in the partykit vignette. If you are not familiar with partykit, you should definitely check it out before you work with this package.

library(partykit)
library(ggparty)
data("WeatherPlay", package = "partykit")
sp_o <- partysplit(1L, index = 1:3)
sp_h <- partysplit(3L, breaks = 75)
sp_w <- partysplit(4L, index = 1:2)
pn <- partynode(1L, split = sp_o, kids = list(
  partynode(2L, split = sp_h, kids = list(
    partynode(3L, info = "yes"),
    partynode(4L, info = "no"))),
  partynode(5L, info = "yes"),
  partynode(6L, split = sp_w, kids = list(
    partynode(7L, info = "yes"),
    partynode(8L, info = "no")))))
py <- party(pn, WeatherPlay)
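
Before plotting, it can help to confirm that the hand-built tree has the intended shape. The following is a minimal sketch that rebuilds the same tree so it runs on its own and then queries it with partykit's nodeids(), width() and depth(). The variable names all_ids, terminal_ids, n_leaves and tree_depth are only illustrative.

```r
library(partykit)

# Rebuild the example tree so that this snippet is self-contained.
data("WeatherPlay", package = "partykit")
sp_o <- partysplit(1L, index = 1:3)   # outlook: one branch per factor level
sp_h <- partysplit(3L, breaks = 75)   # humidity: <= 75 vs. > 75
sp_w <- partysplit(4L, index = 1:2)   # windy: one branch per factor level
pn <- partynode(1L, split = sp_o, kids = list(
  partynode(2L, split = sp_h, kids = list(
    partynode(3L, info = "yes"),
    partynode(4L, info = "no"))),
  partynode(5L, info = "yes"),
  partynode(6L, split = sp_w, kids = list(
    partynode(7L, info = "yes"),
    partynode(8L, info = "no")))))
py <- party(pn, WeatherPlay)

print(py)                                     # text representation of the tree
all_ids      <- nodeids(py)                   # IDs of all nodes
terminal_ids <- nodeids(py, terminal = TRUE)  # IDs of the terminal nodes only
n_leaves     <- width(py)                     # number of terminal nodes
tree_depth   <- depth(py)                     # number of split levels below the root
```

The node IDs reported here are the same IDs that appear in the id column of the plot data below.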

ggparty()

The ggparty() function takes a tree of class 'party' and allows us to plot it with the help of the ggplot2 package. To make this possible, the 'party' object first needs to be transformed into a 'data.frame' and then passed to a ggplot() call. This is exactly what happens when we run ggparty().

is.ggplot(ggparty(py))

[1] TRUE

pander::pandoc.table(ggparty(py)$data[,1:16])
id  x    y     parent  birth_order  breaks_label  info  info_list
1   0.5  1     NA      0            NA            NA    NA
2   0.2  0.75  1       1            sunny         NA    NA
3   0.1  0.5   2       1            NA <= NA* 75  yes   NA
4   0.3  0.5   2       2            NA > NA* 75   no    NA
5   0.5  0.5   1       2            overcast      yes   NA
6   0.8  0.75  1       3            rainy         NA    NA
7   0.7  0.5   6       1            false         yes   NA
8   0.9  0.5   6       2            true          no    NA

Table continues below

splitvar  level  kids  nodesize  p.value  horizontal  x_parent  y_parent
outlook   0      3     14        NA       FALSE       NA        NA
humidity  1      2     5         NA       FALSE       0.5       1
NA        2      0     2         NA       FALSE       0.2       0.75
NA        2      0     3         NA       FALSE       0.2       0.75
NA        2      0     4         NA       FALSE       0.5       1
windy     1      2     5         NA       FALSE       0.5       1
NA        2      0     3         NA       FALSE       0.8       0.75
NA        2      0     2         NA       FALSE       0.8       0.75

Plot Data

The first 16 columns of the 'data.frame' passed by ggparty() to ggplot() contain these values:

  • id… ID of the node
  • x… X coordinate of the node
  • y… Y coordinate of the node
  • parent… ID of node’s parent
  • birth_order… Position relative to parent. Goes from left to right.
  • breaks_label… String containing the corresponding split break of the parent’s split variable.
  • info… String containing the info of the node
  • info_list… List containing the info of the node if it was a list
  • splitvar… String containing the name of the variable the node splits on (inner nodes only)
  • level… At which level to draw the node. (0 = root)
  • kids… Number of node’s kids
  • nodesize… Number of rows in node’s data.
  • p.value… P value of model if present
  • horizontal… Logical - specifies whether the tree is to be drawn horizontally or vertically. Identical for all nodes.
  • x_parent… X coordinate of the node’s parent
  • y_parent… Y coordinate of the node’s parent

The remaining columns contain lists of the node’s data and we will need geom_node_plot() to work with them.
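
Because the plot data is an ordinary 'data.frame', the columns described above can be inspected with base R before any geom is added. The sketch below assumes the installed ggparty version uses the same column names as listed here. It splits the nodes into inner and terminal ones via the kids column and checks that the terminal node sizes add up to the number of rows in WeatherPlay. The names plot_data, inner_nodes and terminal_nodes are only illustrative.

```r
library(partykit)
library(ggparty)

# Rebuild the example tree so that this snippet is self-contained.
data("WeatherPlay", package = "partykit")
sp_o <- partysplit(1L, index = 1:3)
sp_h <- partysplit(3L, breaks = 75)
sp_w <- partysplit(4L, index = 1:2)
pn <- partynode(1L, split = sp_o, kids = list(
  partynode(2L, split = sp_h, kids = list(
    partynode(3L, info = "yes"),
    partynode(4L, info = "no"))),
  partynode(5L, info = "yes"),
  partynode(6L, split = sp_w, kids = list(
    partynode(7L, info = "yes"),
    partynode(8L, info = "no")))))
py <- party(pn, WeatherPlay)

# Extract the data.frame that ggparty() hands to ggplot().
plot_data <- ggparty(py)$data

# Inner nodes have at least one kid, terminal nodes have none.
inner_nodes    <- plot_data[plot_data$kids > 0,  c("id", "splitvar", "nodesize")]
terminal_nodes <- plot_data[plot_data$kids == 0, c("id", "info", "nodesize")]

print(inner_nodes)
print(terminal_nodes)

# Every observation ends up in exactly one terminal node, so the
# terminal node sizes should sum to the number of rows in the data.
sizes_match <- sum(terminal_nodes$nodesize) == nrow(WeatherPlay)
print(sizes_match)
```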

Plotting a Tree

Every ggparty plot starts with a call to the eponymous ggparty() function which requires an object of class 'party'. To draw a tree we will need to add several of these components:

Basic Building Blocks

  • geom_edge() draws the edges between the nodes
  • geom_edge_label() labels the edges with the corresponding split breaks
  • geom_node_label() labels the nodes with the split variable, node info or anything else. The shorthand versions of this geom, geom_node_splitvar() and geom_node_info(), have the correct defaults to write the split variables in the inner nodes and the info in the terminal nodes, respectively.
  • geom_node_plot() creates a custom ggplot at the location of the node

In most cases we will want to draw at least edges, edge labels and node labels, so we will have to call the respective functions. The default mappings of geom_edge() and geom_edge_label() ensure that lines are drawn between related nodes and that the corresponding split breaks are plotted at the centers of those lines.

Since the text we want to print on the nodes differs depending on the kind of node, we will call geom_node_label() twice: once for the inner nodes, to plot the split variables, and once for the terminal nodes, to plot the info elements of the tree, which in this case contain the play decision.

ggparty(py) +
  geom_edge() +
  geom_edge_label() +
  # identical to geom_node_splitvar()
  geom_node_label(aes(label = splitvar), ids = "inner") +
  # identical to geom_node_info()
  geom_node_label(aes(label = info), ids = "terminal")


Instead of adding geom_node_label() we can also add the convenience versions geom_node_splitvar() and geom_node_info(), which contain the correct defaults to plot the split variables in the inner nodes and the info in the terminal nodes.
Thanks to the ggplot2 mechanics we can now map different aspects of our plot to properties of the nodes. Whether that’s the best choice in this case is a different question.

ggparty(py) +
  geom_edge() +
  geom_edge_label() +
  # map color to level and size to nodesize for all nodes
  geom_node_splitvar(aes(col = factor(level),
                         size = nodesize)) +
  geom_node_info(aes(col = factor(level),
                     size = nodesize))
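
The same mechanism applies to the label aesthetic: it can be any expression over the columns of the plot data, not just a single column. As a small sketch, the terminal nodes below are labelled with both the info string and the number of observations. The label format and the object name p are only illustrative choices.

```r
library(partykit)
library(ggplot2)
library(ggparty)

# Rebuild the example tree so that this snippet is self-contained.
data("WeatherPlay", package = "partykit")
sp_o <- partysplit(1L, index = 1:3)
sp_h <- partysplit(3L, breaks = 75)
sp_w <- partysplit(4L, index = 1:2)
pn <- partynode(1L, split = sp_o, kids = list(
  partynode(2L, split = sp_h, kids = list(
    partynode(3L, info = "yes"),
    partynode(4L, info = "no"))),
  partynode(5L, info = "yes"),
  partynode(6L, split = sp_w, kids = list(
    partynode(7L, info = "yes"),
    partynode(8L, info = "no")))))
py <- party(pn, WeatherPlay)

p <- ggparty(py) +
  geom_edge() +
  geom_edge_label() +
  geom_node_splitvar() +
  # combine two plot data columns into one label for the terminal nodes
  geom_node_label(aes(label = paste0(info, " (n = ", nodesize, ")")),
                  ids = "terminal")

print(p)
```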

We can create a horizontal tree simply by setting horizontal in ggparty() to TRUE.

ggparty(py, horizontal = TRUE) +
  geom_edge() +
  geom_edge_label() +
  geom_node_splitvar() +
  geom_node_info()
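
To see what the horizontal argument changes at the data level, we can compare the plot data of both orientations. This is a minimal sketch that relies only on the horizontal, x and y columns described under Plot Data. The object names p_vertical and p_horizontal are illustrative.

```r
library(partykit)
library(ggparty)

# Rebuild the example tree so that this snippet is self-contained.
data("WeatherPlay", package = "partykit")
sp_o <- partysplit(1L, index = 1:3)
sp_h <- partysplit(3L, breaks = 75)
sp_w <- partysplit(4L, index = 1:2)
pn <- partynode(1L, split = sp_o, kids = list(
  partynode(2L, split = sp_h, kids = list(
    partynode(3L, info = "yes"),
    partynode(4L, info = "no"))),
  partynode(5L, info = "yes"),
  partynode(6L, split = sp_w, kids = list(
    partynode(7L, info = "yes"),
    partynode(8L, info = "no")))))
py <- party(pn, WeatherPlay)

p_vertical   <- ggparty(py)                     # default orientation
p_horizontal <- ggparty(py, horizontal = TRUE)  # horizontal orientation

# The flag is stored once per node and is identical for all nodes.
print(unique(p_vertical$data$horizontal))
print(unique(p_horizontal$data$horizontal))

# Compare the node coordinates of the two layouts side by side.
print(p_vertical$data[, c("id", "x", "y")])
print(p_horizontal$data[, c("id", "x", "y")])
```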
