Plot factors with many (long) levels #15

Open
henningsway opened this issue Mar 20, 2019 · 2 comments

Comments

@henningsway

When using partykit, I often run into this problem: https://stackoverflow.com/questions/16581587/how-do-i-jitter-the-node-split-strings-in-plotting-ctree-output-from-partykit

Will your library tackle this problem? :)

Ps: the project looks really cool already!

@martin-borkovec
Owner

Thanks for the suggestion! I probably wouldn't have anticipated this problem, although it seems obvious now. In this particular example it actually works out quite nicely with ggparty, since the default is to plot the edge labels at the center of the edges. Whether that's generally the best option is open to debate, but at least in this case it's advantageous ;)

library(MASS)
library("partykit")
#> Loading required package: grid
#> Loading required package: libcoin
#> Loading required package: mvtnorm
SexTest <- ctree(sex ~ ., data=Aids2)

library(ggparty)
#> Loading required package: ggplot2
ggparty(SexTest) +
  geom_edge() + 
  geom_edge_label() +
  geom_node_splitvar() +
  geom_nodeplot(gglist = list(geom_bar(aes(x = "",
                                           fill = sex),
                                       position = position_fill()) 
  ))

Created on 2019-03-21 by the reprex package (v0.2.1)

But of course it would be nice to have an option to tackle this issue in other cases, so I've just added an argument (splitlevels) that restricts a single geom_edge_label() call to the specified levels of the split. By playing around with the nudge and/or shift arguments, one should hopefully be able to arrive at a satisfying solution. It might get tricky because of the two separate white label-background boxes, though, so maybe we'll come up with a better solution.

ggparty(SexTest) +
  geom_edge() + 
  geom_edge_label(splitlevels = 1:2, y_nudge = 0.025) +
  geom_edge_label(splitlevels = 3:4, y_nudge = -0.025) +
  geom_node_splitvar() +
  geom_nodeplot(gglist = list(geom_bar(aes(x = "",
                                           fill = sex),
                                       position = position_fill()) 
  ))

Created on 2019-03-21 by the reprex package (v0.2.1)

@henningsway
Author

Awesome. This looks great.

I could also think about:

  • dynamically or manually increasing the vertical distance between two inner nodes (i.e. increasing the edge length) when the factor levels are very long or so numerous that they span multiple lines
  • wrapping individual factor levels or the whole "splitting group" in stringr::str_trunc() or similar
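
A rough sketch of the second idea, done on the data side before the tree is fitted rather than inside the plotting code. The helper name shorten_levels() and the toy factor are made up for illustration and are not part of ggparty or partykit; the only assumption is that stringr is installed.

```r
library(stringr)

# Hypothetical helper (not part of ggparty or partykit): shorten the levels
# of a factor so that split labels built from them stay compact.
shorten_levels <- function(f, width = 12) {
  stopifnot(is.factor(f))
  # str_trunc() cuts each level to at most `width` characters,
  # counting the trailing ellipsis.
  short <- str_trunc(levels(f), width = width, side = "right")
  # Truncation can make two levels identical, which would silently merge
  # them on assignment; make.unique() appends a suffix to keep them apart.
  levels(f) <- make.unique(short)
  f
}

# Toy factor, invented for this example
region <- factor(c("North-Western Territories",
                   "North-Eastern Territories",
                   "South"))

levels(shorten_levels(region, width = 10))
```

The shortened factor could then replace the original column before calling ctree(), so that the edge labels are presumably built from the short levels. stringr::str_wrap() could be swapped in for str_trunc() if line breaks are preferable to cutting the text.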
