I'm making a tree using the party package for a poster, and the background of the poster is grey. I've been able to change the background of all of my other plots (box plots, scatter plots) to grey by using the command par(bg = "grey") but this doesn't work for ctree.
For example, this makes a scatter plot on a grey background:
airq <- subset(airquality, !is.na(Ozone))
par(bg="grey")
plot(Temp ~ Wind, data = airq)
But this does not make a tree on a grey background:
library("party")
air.ct <- ctree(Ozone ~ ., data = airq)
par(bg = "grey")
plot(air.ct, inner_panel=node_inner(air.ct, pval = TRUE, id = FALSE),
terminal_panel = node_boxplot(air.ct, id = FALSE))
Please help, my poster is due on Thursday!
Both the party package and its successor partykit are based on the grid package for visualization. Therefore, the par() function for base graphics is ignored when creating grid graphics. For the latter, there is a gpar() function but it does not directly support setting a bg background.
Therefore, in the current version of party or partykit setting the background color is not possible via simple arguments - only by supplying adapted panel functions.
However, as this feature was already partially supported in some panel functions, I've adapted the partykit package on R-Forge to enable setting backgrounds. The most recent version of the package is required for this:
library("partykit")
packageDescription("partykit")$Version
## [1] "1.0-5"
The tree can be grown as in your example:
airq <- subset(airquality, !is.na(Ozone))
air.ct <- ctree(Ozone ~ ., data = airq)
Then we first add an empty page with a gray background:
grid.newpage()
grid.rect(gp = gpar(col = "gray", fill = "gray"))
Then the tree can be added:
plot(air.ct,
ip_args = list(id = FALSE, fill = "gray"),
ep_args = list(fill = "gray"),
tp_args = list(id = FALSE, bg = "gray", fill = "slategray"),
newpage = FALSE
)
To obtain this development version of partykit, please go to the R-Forge page of the package. There you can either check out the source package (see "SCM") and install it by hand - or you can wait until a new package has been built (see "R Packages"). The latter should hopefully be completed in a few hours.
I have what is probably the easiest question in the RStudio world. I'm an R novice, bumbling my way through a published program. The program generates dendrogram heatmaps, and I created a custom color palette for each of the dendrograms I need to export. My heatmaps look good, but I don't know how to either (1) display the custom color-coded scale bar on my heatmap, or alternatively (2) just see and save the custom palette, as you can do with display.brewer.pal, which would at least give me the palette so I can annotate later.
Here's what I have done
pal <- colorRampPalette(c("#4d4d4d", "white", "#32c200")) #GreytoGreen
curr.pal = pal(15)
which gives me (when I say View(curr.pal))
"#4D4D4D" "#666666" "#7F7F7F" "#999999" "#B2B2B2" "#CCCCCC" "#E5E5E5" "#FFFFFF" "#E1F6DA" "#C4EDB6" "#A7E491" "#89DC6D" "#6CD348" "#4FCA24" "#32C200"
The relevant part of the heatmap code is
myHeatmap <- function(x) {
map.input = t(x)
distance <- dist(map.input[, 18:24], method = "euclidean")
cluster <- hclust(distance, method = "complete")
heatmap(map.input, Rowv = as.dendrogram(cluster), Colv = NA, xlab = "Lag", col = curr.pal, scale = "none")
}
Any suggestions how to get the legend displayed for my 15 color palette on the heat map, or at least get a png of it? Thank you, I apologize for the easy question.
I tried to implement your code (at least, in a similar way) with the "gplots" package. Its heatmap.2 function has some extensions compared to heatmap, including a color key drawn next to the plot.
The code is attached (I used "mtcars" dataframe to play around):
library(gplots)
map.input <- scale(mtcars)
pal <- colorpanel(15, "#4d4d4d", "white", "#32c200")
distance <- dist(map.input, method = "euclidean")
cluster <- hclust(distance, method = "complete")
heatmap.2(map.input, Rowv = as.dendrogram(cluster), Colv = TRUE, xlab = "Lag", col = pal, scale = "none",
trace = "none")
I can also easily export the plot as a png from the Plots tab in RStudio.
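For option (2) in the question, saving just the palette, here is a minimal base R sketch. It uses the same image() idiom that display.brewer.pal relies on. The output file name and the plot title are my own placeholders, not anything from the original program.

```r
# Rebuild the 15-colour palette from the question
pal <- colorRampPalette(c("#4d4d4d", "white", "#32c200"))
curr.pal <- pal(15)
n <- length(curr.pal)

# Hypothetical output location; change it to wherever the file should go
out_file <- file.path(tempdir(), "grey_to_green_palette.png")

png(out_file, width = 600, height = 150)
par(mar = c(3, 1, 2, 1))
# One row of n cells, each filled with one palette colour
image(1:n, 1, as.matrix(1:n), col = curr.pal,
      xlab = "", ylab = "", xaxt = "n", yaxt = "n", bty = "n",
      main = "Grey to green (15 steps)")
# Number the swatches so they can be referred to in annotations
axis(1, at = 1:n, labels = 1:n, tick = FALSE)
dev.off()
```

The swatches are drawn left to right in the same order as curr.pal, so the first cell is the grey end and the last cell is the green end.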
Context
I am using ggraph to arrange nodes (leaves of a tree) in a circular dendrogram and then add connections between some of the nodes (using hierarchical bundling using geom_conn_bundle):
library(ggraph)
library(igraph)
# Example data
edges <- data.frame(from="root", to=paste("leaf", seq(1,100), sep=""))
vertices <- data.frame(name = unique(c(as.character(edges$from), as.character(edges$to))) )
tree <- graph_from_data_frame( edges, vertices=vertices )
# Drawing nodes
pr <- ggraph(tree, layout = "dendrogram", circular = TRUE) +
geom_edge_diagonal(alpha = 0.2)
# Example connection
pr <- pr + geom_conn_bundle(
data = get_con(from = 23, to = 42),
alpha=0.8,
width=3,
colour="skyblue",
tension = 0.9
)
print(pr)
This nicely displays a nearly transparent dendrogram and some (in this example one) connections in skyblue.
Problem / Desired output
What I'd like, though, is for the direction of the connection to be indicated by a color gradient (i.e. starting with green and slowly changing into red) instead of showing the connection in just one color (skyblue). How can I achieve such a color gradient using R and ggraph's geom_conn_bundle?
The following excerpt from Holten (2006) can serve as an example of how I'd like the connections to look:
Several of the ggraph geoms for drawing edges, including geom_conn_bundle and geom_edge_diagonal, have a calculated index stat. It's a number from 0 to 1 of how far along the edge a point is. Note that the simplified versions of these geoms (geom_*0) don't calculate it. Some mentions of it are in this blog post by the ggraph author.
In this case, map the index to color inside your bundle's aes with stat(index), then set a gradient scale with scale_edge_color_gradient (not scale_color_gradient, as I initially tried).
In the example picture, I can't tell whether the width is also scaled, but the same would work, e.g. edge_width = stat(index).
library(ggraph)
library(igraph)
ggraph(tree, layout = "dendrogram", circular = TRUE) +
geom_edge_diagonal(alpha = 0.2) +
geom_conn_bundle(aes(color = stat(index)),
data = get_con(from = 23, to = 42),
alpha=0.8,
width=3,
# colour="skyblue",
tension = 0.9
) +
scale_edge_color_gradient(low = "green", high = "red")
Created on 2019-03-09 by the reprex package (v0.2.1)
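To make the index idea more concrete, here is a small base R sketch of what a green-to-red gradient scale does with values between 0 and 1. It only illustrates the mapping and does not reproduce ggraph's internals; the five sample index values are arbitrary.

```r
# Positions along an edge: 0 is the start node, 1 is the end node
index <- seq(0, 1, length.out = 5)

# Linear interpolation between the two end colours
ramp <- colorRamp(c("green", "red"))

# One hex colour per position along the edge
edge_cols <- rgb(ramp(index), maxColorValue = 255)
edge_cols
```

The first colour is pure green and the last is pure red, which is why the gradient reads as a direction from the "from" node to the "to" node.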
The partykit package plots barplots at the terminal nodes of trees which gives a visual rendition of the posterior probabilities of the dependent variable classes.
I would like to add those barplots to the inner nodes as well, below the standard circles/ellipses. This requires passing a function that combines node_inner() and node_barplot() to the inner_panel argument of the plot() method.
But those functions have pretty complex internals, and I'm not sure how to mix the two in order to have the two inner plots stacked vertically.
Any ideas?
It's possible, it just doesn't look very appealing. If you want to show the name of the splitting variable and the p-value, it would be better to tweak the mainlab argument of node_barplot. The answer to "Ctree classification with weights - results displayed" illustrates how to include weights in the title; in a similar fashion you could display the splitting variable and p-value.
If you are determined to set up a new panel function that has two subpanels, you need a little bit of grid programming (the graphics system that the plot() method is based on). You need to set up a grid.layout and then go through the resulting viewports.
make_inner_and_barplot <- function(object, ...) {
function(node) {
## layout
pushViewport(viewport(layout = grid.layout(nrow = 2, ncol = 1,
heights = unit(c(0.2, 0.8), "npc"))))
## background color
grid.rect(gp = gpar(fill = "white", col = 0))
## circle
pushViewport(viewport(layout.pos.col = 1, layout.pos.row = 1))
node_inner(object)(node)
popViewport()
## barplot
pushViewport(viewport(layout.pos.col = 1, layout.pos.row = 2))
node_barplot(object, id = FALSE, ...)(node)
popViewport(2)
}
}
With the resulting panel function you can then do:
ct <- ctree(factor(cyl) ~ ., data = mtcars, minsplit = 2)
plot(ct, inner_panel = make_inner_and_barplot(ct), tnex = 0.8)
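If the viewport handling is the confusing part, here is a stripped-down sketch that uses only grid, with no tree involved. It sets up the same 20%/80% two-row layout and fills each cell with a placeholder. The viewport names and the label texts are made up for illustration.

```r
library(grid)

grid.newpage()
# Parent viewport with a 2 x 1 layout: top 20%, bottom 80%
pushViewport(viewport(layout = grid.layout(nrow = 2, ncol = 1,
  heights = unit(c(0.2, 0.8), "npc"))))

# Top cell: where node_inner() draws in the panel function above
pushViewport(viewport(layout.pos.col = 1, layout.pos.row = 1, name = "top"))
grid.rect(gp = gpar(fill = "grey90"))
grid.text("node label goes here")
popViewport()

# Bottom cell: where node_barplot() draws in the panel function above
pushViewport(viewport(layout.pos.col = 1, layout.pos.row = 2, name = "bottom"))
grid.rect(gp = gpar(fill = "white"))
grid.text("barplot goes here")
# Pop the bottom cell and the layout viewport in one go
popViewport(2)

# NULL means we are back at the root viewport, so nothing is left open
back_at_root <- is.null(current.vpPath())
```

Every pushViewport() needs a matching pop, otherwise the next node of the tree is drawn inside the wrong viewport.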
I am using following commands to produce a scatterplot with jitter:
ddf = data.frame(NUMS = rnorm(500), GRP = sample(LETTERS[1:5],500,replace=T))
library(lattice)
stripplot(NUMS~GRP,data=ddf, jitter.data=T)
I want to add boxplots over these points (one for every group). I tried searching, but I could not find code that plots all points (and not just outliers) with jitter. How can I solve this? Thanks for your help.
Here's one way using base graphics.
boxplot(NUMS ~ GRP, data = ddf, lwd = 2, ylab = 'NUMS')
stripchart(NUMS ~ GRP, vertical = TRUE, data = ddf,
method = "jitter", add = TRUE, pch = 20, col = 'blue')
To do this in ggplot2, try:
library(ggplot2)
ggplot(ddf, aes(x=GRP, y=NUMS)) +
geom_boxplot(outlier.shape=NA) + #avoid plotting outliers twice
geom_jitter(position=position_jitter(width=.1, height=0))
Obviously you can adjust the width and height arguments of position_jitter() to your liking (although I'd recommend height=0 since height jittering will make your plot inaccurate).
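To see why height = 0 keeps the plot accurate, here is a rough base R analogue that jitters only the x positions by hand. It is a sketch of the idea rather than what ggplot2 does internally. The seed and the 0.1 half-width are my choices, picked to mirror width = .1 above.

```r
set.seed(42)
ddf <- data.frame(NUMS = rnorm(500),
                  GRP = sample(LETTERS[1:5], 500, replace = TRUE))

# Group positions on the x axis: A = 1, B = 2, ...
x_pos <- as.numeric(factor(ddf$GRP))

# Horizontal jitter only: shift each point by at most 0.1 to either side
x_jit <- x_pos + runif(nrow(ddf), min = -0.1, max = 0.1)

# The y values are left exactly as measured
y_val <- ddf$NUMS

# Suppress the boxplot's own outlier points so they are not drawn twice
boxplot(NUMS ~ GRP, data = ddf, outline = FALSE, ylim = range(ddf$NUMS))
points(x_jit, y_val, pch = 20, col = "blue")
```

Because only x is perturbed, each point can still be read off the y axis at its true value.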
I've written an R function called spreadPoints() within a package called basicPlotteR. The package can be installed directly into your R library using the following code:
install.packages("devtools")
library("devtools")
install_github("JosephCrispell/basicPlotteR")
For the example provided, I used the following code to generate the example figure below.
ddf = data.frame(NUMS = rnorm(500), GRP = sample(LETTERS[1:5],500,replace=T))
boxplot(NUMS ~ GRP, data = ddf, lwd = 2, ylab = 'NUMS')
spreadPointsMultiple(data=ddf, responseColumn="NUMS", categoriesColumn="GRP",
col="blue", plotOutliers=TRUE)
It is a work in progress (the lack of formula as input is clunky!) but it provides a non-random method to spread points on the X axis that doubles as a violin like summary of the data. Take a look at the source code, if you're interested.
For a lattice solution:
library(lattice)
ddf = data.frame(NUMS = rnorm(500), GRP = sample(LETTERS[1:5], 500, replace = T))
bwplot(NUMS ~ GRP, ddf, panel = function(...) {
panel.bwplot(..., pch = "|")
panel.xyplot(..., jitter.x = TRUE)})
The default median dot symbol was changed to a line with pch = "|". Other properties of the box and whiskers can be adjusted with box.umbrella and box.rectangle through the trellis.par.set() function. The amount of jitter can be adjusted through an argument named factor, where factor = 1.5 increases it by 50%.
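Here is a sketch of those adjustments put together. I pass the box settings through par.settings in the call instead of calling trellis.par.set() globally, so they only affect this one plot. The colours and line type are arbitrary choices, and factor = 1.5 is simply the value mentioned above.

```r
library(lattice)

set.seed(1)
ddf <- data.frame(NUMS = rnorm(500),
                  GRP = sample(LETTERS[1:5], 500, replace = TRUE))

p <- bwplot(NUMS ~ GRP, data = ddf,
  # Per-plot equivalents of trellis.par.set(box.rectangle = ..., box.umbrella = ...)
  par.settings = list(
    box.rectangle = list(col = "black"),
    box.umbrella  = list(col = "black", lty = 1)
  ),
  panel = function(...) {
    panel.bwplot(..., pch = "|")
    # factor controls how widely the points are spread horizontally
    panel.xyplot(..., jitter.x = TRUE, factor = 1.5)
  })
print(p)
```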
I am trying to visualize my data flow with a Sankey Diagram in R.
I found this blog post linking to an R script that produces a Sankey Diagram; unfortunately, it's quite raw and somewhat limited (see below for sample code and data).
Does anyone know of other scripts—or maybe even a package—that is more developed? My end goal is to visualize both data flow and percentages by relative size of diagram components, like in these examples of Sankey Diagrams.
I posted a somewhat similar question on the r-help list, but after two weeks without any responses I'm trying my luck here on stackoverflow.
Thanks,
Eric
PS. I'm aware of the Parallel Sets Plot, but that is not what I'm looking for.
# thanks to, https://tonybreyal.wordpress.com/2011/11/24/source_https-sourcing-an-r-script-from-github/
sourc.https <- function(url, ...) {
# install and load the RCurl package
if (match('RCurl', nomatch=0, installed.packages()[,1])==0) {
install.packages(c("RCurl"), dependencies = TRUE)
require(RCurl)
} else require(RCurl)
# parse and evaluate each .R script
sapply(c(url, ...), function(u) {
eval(parse(text = getURL(u, followlocation = TRUE,
cainfo = system.file("CurlSSL", "cacert.pem",
package = "RCurl"))), envir = .GlobalEnv)
} )
}
# from https://gist.github.com/1423501
sourc.https("https://raw.github.com/gist/1423501/55b3c6f11e4918cb6264492528b1ad01c429e581/Sankey.R")
# My example (there is another example inside Sankey.R):
inputs = c(6, 144)
losses = c(6,47,14,7, 7, 35, 34)
unit = "n ="
labels = c("Transfers",
"Referrals\n",
"Unable to Engage",
"Consultation only",
"Did not complete the intake",
"Did not engage in Treatment",
"Discontinued Mid-Treatment",
"Completed Treatment",
"Active in \nTreatment")
SankeyR(inputs,losses,unit,labels)
# Clean up my mess
rm("inputs", "labels", "losses", "SankeyR", "sourc.https", "unit")
Sankey Diagram produced with the above code,
This plot can be created through the networkD3 package. It allows you to create interactive sankey diagrams. Here you can find an example. I also added a screenshot so you have an idea what it looks like.
# Load package
library(networkD3)
# Load energy projection data
URL <- paste0(
"https://cdn.rawgit.com/christophergandrud/networkD3/",
"master/JSONdata/energy.json")
Energy <- jsonlite::fromJSON(URL)
# Plot
sankeyNetwork(Links = Energy$links, Nodes = Energy$nodes, Source = "source",
Target = "target", Value = "value", NodeID = "name",
units = "TWh", fontSize = 12, nodeWidth = 30)
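To use this with the data from the question, the flows have to be turned into a nodes table and a links table. Below is a rough sketch of one way to do that. The "All clients" hub node is my own invention, since the original data has no explicit middle stage, and I am assuming the labels vector lists the two inputs first and the seven losses after them. networkD3 expects zero-based node indices in the links table.

```r
inputs <- c(6, 144)
losses <- c(6, 47, 14, 7, 7, 35, 34)
labels <- c("Transfers", "Referrals\n", "Unable to Engage",
            "Consultation only", "Did not complete the intake",
            "Did not engage in Treatment", "Discontinued Mid-Treatment",
            "Completed Treatment", "Active in \nTreatment")

# Collapse the line breaks that were only needed for the SankeyR labels
clean_labels <- trimws(gsub("\\s+", " ", labels))

# Hypothetical hub node that all inputs flow into and all losses flow out of
hub <- "All clients"
nodes <- data.frame(name = c(clean_labels, hub), stringsAsFactors = FALSE)
hub_id <- nrow(nodes) - 1  # zero-based index of the hub

links <- rbind(
  # inputs -> hub
  data.frame(source = seq_along(inputs) - 1,
             target = hub_id,
             value  = inputs),
  # hub -> losses
  data.frame(source = hub_id,
             target = length(inputs) + seq_along(losses) - 1,
             value  = losses)
)

# With networkD3 installed, the call mirrors the energy example above:
# networkD3::sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
#                          Target = "target", Value = "value", NodeID = "name",
#                          units = "clients", fontSize = 12, nodeWidth = 30)
```

The total flowing into the hub equals the total flowing out (150 in both directions), which is a quick sanity check that the tables are consistent.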
I have created a package (riverplot) that has a slightly different, but overlapping functionality compared to the Sankey function, and can produce plots like this one:
If you want to do it with R, your best bet seems to be @Roman's suggestion: hack the SankeyR function. For example, below is my very quick fix. It simply orients the labels vertically, slightly offsets them, and decreases the font size for the input referrals to make the plot look a bit better. This modification only changes lines 171 and 223 in the SankeyR function:
#line171 - change oversized font size of input label
fontsize = max(0.5,frInputs[j]*1.5)#1.5 instead of 2.5
#line223 - srt changes from 35 to 90 to orient labels vertically,
#and offset adjusts them to get better alignment with arrows
text(txtX, txtY, fullLabel, cex=fontsize, pos=4, srt=90, offset=0.1)
I am no ace in trigonometry, but that is really what you need for changing the direction of the arrows. In my view it would be ideal if you could adjust the loss arrows so they are oriented horizontally rather than vertically. Otherwise, while my solution fixes the problem with label orientation, it doesn't make the diagram much more readable...
In addition to rCharts, Sankey diagrams can now be also generated in R with googleVis (version >= 0.5.0). For example, this post describes the generation of the following diagram using googleVis:
R's alluvial package will also do this (from ?alluvial).
# install.packages(c("alluvial"), dependencies = TRUE)
require(alluvial)
# Titanic data
tit <- as.data.frame(Titanic)
# 4d
alluvial( tit[,1:4], freq=tit$Freq, border=NA,
hide = tit$Freq < quantile(tit$Freq, .50),
col=ifelse( tit$Class == "3rd" & tit$Sex == "Male", "red", "gray") )
plotly has the same power as networkD3 package (example link).
For completeness, there is also the ggalluvial package which is a ggplot2 extension for alluvial/Sankey diagrams.
Here is an example taken from the package's documentation
# devtools::install_github("corybrunson/ggalluvial", ref = "optimization")
library(ggalluvial)
titanic_wide <- data.frame(Titanic)
ggplot(data = titanic_wide,
aes(axis1 = Class, axis2 = Sex, axis3 = Age,
y = Freq)) +
scale_x_discrete(limits = c("Class", "Sex", "Age"), expand = c(.1, .05)) +
xlab("Demographic") +
geom_alluvium(aes(fill = Survived)) +
geom_stratum() + geom_text(stat = "stratum", label.strata = TRUE) +
theme_minimal() +
ggtitle("passengers on the maiden voyage of the Titanic",
"stratified by demographics and survival") +
theme(legend.position = 'bottom')
ggplot(titanic_wide,
aes(y = Freq,
axis1 = Survived, axis2 = Sex, axis3 = Class)) +
geom_alluvium(aes(fill = Class),
width = 0, knot.pos = 0, reverse = FALSE) +
guides(fill = FALSE) +
geom_stratum(width = 1/8, reverse = FALSE) +
geom_text(stat = "stratum", label.strata = TRUE, reverse = FALSE) +
scale_x_continuous(expand = c(0, 0),
breaks = 1:3, labels = c("Survived", "Sex", "Class")) +
scale_y_discrete(expand = c(0, 0)) +
coord_flip() +
ggtitle("Titanic survival by class and sex")
Created on 2018-11-13 by the reprex package (v0.2.1.9000)
Judging by these definitions this function, like the Parallel Sets Plot, lacks the capacity to split and combine flows (i.e. through more than one transition).
Since Sankey diagrams are directed weighted graphs, a package like qgraph might be useful.
The SankeyR function provides clearer labels if you sort the losses in descending order as the text is placed closer to the arrow heads without overlapping.
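A short sketch of that reordering, using the data from the question. I am assuming that the first two labels belong to the inputs and the remaining seven belong to the losses in the same order, so the loss labels are reordered together with the values. The SankeyR call is commented out because the function has to be sourced from the gist first.

```r
inputs <- c(6, 144)
losses <- c(6, 47, 14, 7, 7, 35, 34)
labels <- c("Transfers", "Referrals\n", "Unable to Engage",
            "Consultation only", "Did not complete the intake",
            "Did not engage in Treatment", "Discontinued Mid-Treatment",
            "Completed Treatment", "Active in \nTreatment")

# Positions of the losses from largest to smallest
ord <- order(losses, decreasing = TRUE)
losses_sorted <- losses[ord]

# Assumption: labels = input labels followed by loss labels
input_labels <- labels[seq_along(inputs)]
loss_labels  <- labels[-seq_along(inputs)]
labels_sorted <- c(input_labels, loss_labels[ord])

# SankeyR(inputs, losses_sorted, "n =", labels_sorted)
```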
Have a look at //sankeybuilder.com, which offers a ready-to-go solution where you can upload your data and play back variations over time. The transition works well (similar to the YouTube demo in your question). The SankeyTrend demo includes many time slots (years of data). Once it has loaded (the Sankey diagrams are built automatically), click the play button in the upper right-hand corner of the page to play back the time slots; you can even pause and resume. Demo url is here: SankeyTrend. Hope this helps your quest for the perfect Sankey diagram.
Just open sourced a package that uses an alluvial diagram to visualize workflow stages. Since history is kept when the alluvial form is used, there aren't any crossovers in the edges.
https://github.com/claytontstanley/shiny.alluvial