
6 layout

Martin edited this page Jun 28, 2019 · 1 revision

Layout

Nodes

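## packages: lmtree() comes from partykit, ggparty() and geom_node_*() from ggparty
library(partykit)
library(ggplot2)
library(ggparty)

## Boston housing data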
data("BostonHousing", package = "mlbench")
BostonHousing <- transform(BostonHousing,
                           chas = factor(chas, levels = 0:1, labels = c("no", "yes")),
                           rad = factor(rad, ordered = TRUE))

## linear model tree
bh_tree <- lmtree(medv ~ log(lstat) + I(rm^2) | zn +
                    indus + chas + nox + age + dis + rad + tax + crim + b + ptratio,
                  data = BostonHousing, minsize = 40)

Let’s take a look at ggparty()’s layout system with the help of this 'lmtree' fitted to the BostonHousing data set from mlbench.

# terminal_space specifies the value of y at which the terminal plots begin
bh_plot <- ggparty(bh_tree, terminal_space = 0.5) +
  geom_edge() +
  geom_edge_label() +
  geom_node_splitvar() +
  # plot first row
  geom_node_plot(gglist = list(
    geom_point(aes(y = medv, x = `log(lstat)`, col = chas),
               alpha = 0.6)),
    # halving the height shrinks plots towards the top
    height = 0.5) +
  # plot second row
  geom_node_plot(gglist = list(
    geom_point(aes(y = medv, x = `I(rm^2)`, col = chas),
               alpha = 0.6)),
    height = 0.5,
    # move -0.25 y to use the bottom half of the terminal space
    nudge_y = -0.25)

bh_plot

ggparty() positions all the nodes within the unit square. For vertical trees the root is always at (0.5, 1); for horizontal ones it is at (0, 0.5). The argument terminal_space specifies how much room should be left for terminal plots. Its default value depends on the depth of the supplied tree. The terminal nodes are placed at this value, and if labels are drawn, they are drawn there. If plots are drawn, their top borders are aligned to this value, i.e. the justification of the terminal plots is not "center" but "top". Reducing the height of a terminal plot therefore shrinks it towards the top.

So if we want to draw multiple plots per node, we have to keep this in mind. The plot above achieves it as follows.
The first geom_node_plot() only takes the argument height = 0.5, which halves its size so that it occupies only the upper half of the area it would normally fill. For the second geom_node_plot() we also set height = 0.5, but additionally we have to specify nudge_y. Since terminal_space is set to 0.5, the first plot now spans from y = 0.5 down to y = 0.25. We therefore want the second plot to start at 0.25, i.e. nudge it from 0.5 by -0.25.
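
The same arithmetic extends to more than two rows of plots. The following is a minimal sketch in base R, assuming (as in the example above) that height is a fraction of the terminal space while nudge_y is measured in unit-square coordinates. The helper stack_terminal_plots() is hypothetical and not part of ggparty.

```r
# Hypothetical helper (not part of ggparty): for n equally sized plots stacked
# inside the terminal space, compute height and nudge_y for each row.
stack_terminal_plots <- function(n, terminal_space) {
  stopifnot(n >= 1, terminal_space > 0, terminal_space <= 1)
  data.frame(
    row     = seq_len(n),
    # height is a fraction of the terminal space, as with height = 0.5 above
    height  = rep(1 / n, n),
    # plots are top-aligned, so row i starts (i - 1) slots below terminal_space
    nudge_y = -(seq_len(n) - 1) * terminal_space / n
  )
}

stack_terminal_plots(n = 2, terminal_space = 0.5)
```

For n = 2 and terminal_space = 0.5 this reproduces the values used above: both heights are 0.5 and the nudges are 0 and -0.25. Each row of the result could then be passed to its own geom_node_plot() call.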

Changing the theme from the default theme_void() to one that draws gridlines lets us see the layout structure described above.

bh_plot + theme_bw()

We can use this information to manually set the positions of nodes. To do this we must pass a 'data.frame' containing the columns id, x and y to the layout argument of ggparty().

ggparty(bh_tree, terminal_space = 0.5,
        # id specifies node; x and y values need to be between 0 and 1
        layout = data.frame(id = c(1, 2),
                            x = c(0.7, 0.3),
                            y = c(1, 0.9))
        ) +
  geom_edge() +
  geom_edge_label() +
  geom_node_splitvar() +
  geom_node_plot(gglist = list(
    geom_point(aes(y = medv, x = `log(lstat)`, col = chas),
               alpha = 0.6)),
    height = 0.5) +
  geom_node_plot(gglist = list(
    geom_point(aes(y = medv, x = `I(rm^2)`, col = chas),
               alpha = 0.6)),
    height = 0.5,
    nudge_y = -0.25) +
  theme_bw()
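
Because a manual layout is an ordinary data frame, it can be checked before it is handed to ggparty(). The sketch below only encodes the two requirements stated above (columns id, x and y; coordinates between 0 and 1). The helper check_layout() is hypothetical and not part of ggparty, which may apply its own checks.

```r
# Hypothetical helper (not part of ggparty): validate a manual layout
# before passing it to ggparty(layout = ...).
check_layout <- function(layout) {
  stopifnot(is.data.frame(layout))
  missing_cols <- setdiff(c("id", "x", "y"), names(layout))
  if (length(missing_cols) > 0) {
    stop("layout is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # x and y must be finite and lie inside the unit square
  in_unit <- function(v) all(is.finite(v) & v >= 0 & v <= 1)
  if (!in_unit(layout$x) || !in_unit(layout$y)) {
    stop("x and y must lie between 0 and 1")
  }
  invisible(layout)
}

# Same positions as in the example above
node_layout <- check_layout(data.frame(id = c(1, 2),
                                       x  = c(0.7, 0.3),
                                       y  = c(1, 0.9)))
```

The checked node_layout can then be supplied as the layout argument of ggparty().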

Axes, Legends and Limits

As mentioned, the nodes of the tree should always be positioned inside the unit square. If a shared legend is used without shared axis labels, the legend is plotted at (0.5, -0.05) with just = "top". If shared axis labels are used, just changes to "bottom" (i.e. the legend shifts approximately 0.05 units downwards) and the x axis label takes the legend's former position. Furthermore, the shared y axis label is plotted outside the unit square. Limits based on the unit square are therefore often not sufficient to capture all elements, and ggparty() should cope with these situations automatically.
If you need to adjust the x and y limits anyway, use coord_cartesian(xlim, ylim) instead of xlim() and ylim(): the latter remove observations outside the plot limits, which can easily lead to unintended consequences.
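
The difference between the two kinds of limits can be seen on a plain ggplot2 plot. This is a small sketch with made-up toy data, independent of ggparty. It counts how many points remain in the built layer data under scale limits versus coordinate limits.

```r
library(ggplot2)

# Toy data (made up for illustration): one point inside [0, 1], two outside
toy <- data.frame(x = c(-0.5, 0.5, 1.5), y = c(0.2, 0.5, 0.8))
p <- ggplot(toy, aes(x = x, y = y)) + geom_point()

# Scale limits: out-of-range values are replaced by NA and dropped when drawn
p_scale <- p + xlim(0, 1)

# Coordinate limits: only the visible window changes, the data stay intact
p_coord <- p + coord_cartesian(xlim = c(0, 1))

# Number of points with a usable x value in the built layer data
n_scale <- sum(!is.na(layer_data(p_scale)$x))
n_coord <- sum(!is.na(layer_data(p_coord)$x))
c(scale_limits = n_scale, coord_limits = n_coord)
```

With scale limits only the point inside the range survives, whereas coord_cartesian() keeps all three. This is why coord_cartesian() is the safer choice when elements sit outside the unit square.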
