6 layout
## packages used throughout this section
library(ggplot2)
library(partykit)
library(ggparty)

## Boston housing data
data("BostonHousing", package = "mlbench")
BostonHousing <- transform(BostonHousing,
  chas = factor(chas, levels = 0:1, labels = c("no", "yes")),
  rad = factor(rad, ordered = TRUE))

## linear model tree
bh_tree <- lmtree(medv ~ log(lstat) + I(rm^2) | zn +
  indus + chas + nox + age + dis + rad + tax + crim + b + ptratio,
  data = BostonHousing, minsize = 40)
Let’s take a look at ggparty()’s layout system with the help of this 'lmtree', which is based on the BostonHousing data set from mlbench.
# terminal space specifies at which value of y the terminal plots begin
bh_plot <- ggparty(bh_tree, terminal_space = 0.5) +
  geom_edge() +
  geom_edge_label() +
  geom_node_splitvar() +
  # plot first row
  geom_node_plot(gglist = list(
                   geom_point(aes(y = medv, x = `log(lstat)`, col = chas),
                              alpha = 0.6)),
                 # halving the height shrinks plots towards the top
                 height = 0.5) +
  # plot second row
  geom_node_plot(gglist = list(
                   geom_point(aes(y = medv, x = `I(rm^2)`, col = chas),
                              alpha = 0.6)),
                 height = 0.5,
                 # move -0.25 y to use the bottom half of the terminal space
                 nudge_y = -0.25)
bh_plot
ggparty() positions all the nodes within the unit square. For vertical trees the root is always at (0.5, 1), for horizontal ones it is at (0, 0.5). The argument terminal_space specifies how much room should be left for terminal plots. The default value depends on the depth of the supplied tree. The terminal nodes are placed at this value, and any labels are drawn there. If plots are drawn, their top borders are aligned to this value, i.e. the just of the terminal plots is not "center" but "top". Therefore reducing the height of a terminal plot shrinks it towards the top.
So if we want to draw multiple plots per node we have to keep this in mind. The code above shows one way to do it. The first geom_node_plot() only takes the argument height = 0.5, which halves its size so that it occupies only the upper half of the area it would normally fill. For the second geom_node_plot() we also set height to 0.5, but additionally we have to specify nudge_y. Since the terminal space is set to 0.5, we know that the first plot now spans from 0.5 to 0.25. So we want to move the line at which the second plot is placed to 0.25, i.e. nudge it from 0.5 by -0.25.
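The arithmetic behind height and nudge_y can be checked with a few lines of base R. This is only a sketch: plot_span() is a hypothetical helper, not a ggparty function, and it assumes that height is a fraction of the terminal space and that a plot's top edge sits at terminal_space + nudge_y, as the numbers in the text imply.

```r
# Hypothetical helper (not part of ggparty): vertical extent of a terminal plot.
# Assumption: the top edge sits at terminal_space + nudge_y, and `height`
# is a fraction of the terminal space.
plot_span <- function(terminal_space, height = 1, nudge_y = 0) {
  top <- terminal_space + nudge_y
  c(top = top, bottom = top - height * terminal_space)
}

# first row: upper half of the terminal space
first <- plot_span(terminal_space = 0.5, height = 0.5)
# second row: nudged down by a quarter to fill the lower half
second <- plot_span(terminal_space = 0.5, height = 0.5, nudge_y = -0.25)

print(first)
print(second)
```

Under these assumptions the first plot runs from 0.5 down to 0.25 and the second from 0.25 down to 0, so the two rows tile the terminal space without overlapping.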
Changing the theme from the default theme_void to one for which gridlines are drawn allows us to see the layout structure described above.
bh_plot + theme_bw()
We can use this information to manually set the positions of nodes. To do this we must pass a 'data.frame' containing the columns id, x and y to the layout argument of ggparty().
ggparty(bh_tree, terminal_space = 0.5,
        # id specifies node; x and y values need to be between 0 and 1
        layout = data.frame(id = c(1, 2),
                            x = c(0.7, 0.3),
                            y = c(1, 0.9))
        ) +
  geom_edge() +
  geom_edge_label() +
  geom_node_splitvar() +
  geom_node_plot(gglist = list(
                   geom_point(aes(y = medv, x = `log(lstat)`, col = chas),
                              alpha = 0.6)),
                 height = 0.5) +
  geom_node_plot(gglist = list(
                   geom_point(aes(y = medv, x = `I(rm^2)`, col = chas),
                              alpha = 0.6)),
                 height = 0.5,
                 nudge_y = -0.25) +
  theme_bw()
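Before passing a hand-written layout to ggparty(), it may help to check it against the requirements stated above (columns id, x and y, with x and y between 0 and 1). The following base R sketch does that. check_layout() is a hypothetical helper for illustration, not part of ggparty, and ggparty() may perform its own checks.

```r
# Hypothetical helper (not part of ggparty): sanity-check a layout data frame
# against the requirements described in the text.
check_layout <- function(layout) {
  stopifnot(is.data.frame(layout))
  missing_cols <- setdiff(c("id", "x", "y"), names(layout))
  if (length(missing_cols) > 0) {
    stop("layout is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # x and y must be finite and lie inside the unit interval
  in_unit <- function(v) all(is.finite(v) & v >= 0 & v <= 1)
  if (!in_unit(layout$x) || !in_unit(layout$y)) {
    stop("x and y must lie between 0 and 1")
  }
  invisible(layout)
}

node_layout <- data.frame(id = c(1, 2), x = c(0.7, 0.3), y = c(1, 0.9))
check_layout(node_layout)
```

If the check passes, node_layout can be supplied as the layout argument exactly as in the call above.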
As mentioned, the nodes of the tree should always be positioned inside the unit square. In case of a shared legend and no shared axis labels, the legend is plotted at (0.5, -0.05) with just = "top". In case shared axis labels are used, just changes to "bottom" (i.e. the legend shifts approximately 0.05 units downwards), and the x axis label takes its position. Furthermore, the shared y axis label will be plotted outside the unit square. So limits based on the unit square will often not be sufficient to capture all elements, and ggparty() should be able to cope with these situations automatically.
In case you need to adjust the x and y limits anyway, use coord_cartesian(xlim, ylim) instead of ylim() and xlim(), since the latter can easily lead to unintended consequences by removing observations outside the plot limits.
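The difference between the two approaches can be seen on a plain ggplot2 plot, independent of ggparty. The sketch below uses a made-up ten-point data frame and counts how many x values survive in the built layer data. The object names (d, p_scale, p_coord) are illustrative only.

```r
library(ggplot2)

# toy data, made up for illustration
d <- data.frame(x = 1:10, y = 1:10)
p <- ggplot(d, aes(x, y)) + geom_point()

# xlim() sets scale limits: values outside them are dropped from the data
p_scale <- p + xlim(0, 5)
# coord_cartesian() only zooms the view; the underlying data stay untouched
p_coord <- p + coord_cartesian(xlim = c(0, 5))

# count the x values that are still available to the layer
n_scale <- sum(!is.na(layer_data(p_scale)$x))
n_coord <- sum(!is.na(layer_data(p_coord)$x))
```

Here n_coord still counts all ten points, while n_scale counts only those inside the limits. Anything computed from the data, such as a smoother or a summary, would therefore change under xlim() but not under coord_cartesian().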