First off fair warning that this is relevant to a quiz question from coursera.org practical machine learning. However, my question does not deal with the actual question asked, but is a tangential question about plotting.
I have a training set of data and I am trying to create a plot for each predictor that includes the outcome on the y axis, the index of the data set on the x axis, and colors the plot by the predictor in order to determine the cause of bias along the index. To make the color argument more clear I am trying to use cut2() from the Hmisc package.
Here is my data:
library(ggplot2)
library(caret)
library(AppliedPredictiveModeling)
library(Hmisc)
data(concrete)
set.seed(1000)
inTrain = createDataPartition(mixtures$CompressiveStrength, p = 3/4)[[1]]
training = mixtures[ inTrain,]
testing = mixtures[-inTrain,]
training$index <- 1:nrow(training)
I tried this and it makes all the plots but they are all the same color.
plotCols <- function(x) {
cols <- names(x)
for (i in 1:length(cols)) {
assign(paste0("cutEx",i), cut2(x[ ,i]))
print(qplot(x$index, x$CompressiveStrength, color=paste0("cutEx",i)))
}
}
plotCols(training)
Then I tried this and it makes all the plots, and this time they are colored but the cut doesn't work.
plotCols <- function(x) {
cols <- names(x)
for (i in 1:length(cols)) {
assign(cols[i], cut2(x[ ,i]))
print(qplot(x$index, x$CompressiveStrength, color=x[ ,cols[i]]))
}
}
plotCols(training)
It seems qplot() doesn't like having paste() in the color argument. Does anyone know another way to loop through the color argument and still keep my cuts? Any help is greatly appreciated!
Your desired output is easier to achieve using ggplot() instead of qplot(), since you can use aes_string(), that accepts strings as arguments.
plotCols <- function(x) {
cols <- names(x)
for (i in 1:length(cols)) {
assign(paste0("cutEx", i), cut2(x[, i]))
p <- ggplot(x) +
aes_string("index", "CompressiveStrength", color = paste0("cutEx", i)) +
geom_point()
print(p)
}
}
plotCols(training)
Related
I have data I'm plotting using ggplot's facet_grid:
My data:
species <- c("spcies1","species2")
conditions <- c("cond1","cond2","cond3")
batches <- 1:6
df <- expand.grid(species=species,condition=conditions,batch=batches)
set.seed(1)
df$y <- rnorm(nrow(df))
df$replicate <- 1
df$col.fill <- paste(df$species,df$condition,df$batch,sep=".")
My plot:
integerBreaks <- function(n = 5, ...)
{
library(scales)
breaker <- pretty_breaks(n, ...)
function(x){
breaks <- breaker(x)
breaks[breaks == floor(breaks)]
}
}
library(ggplot2)
p <- ggplot(df,aes(x=replicate,y=y,color=col.fill))+
geom_point(size=3)+facet_grid(~col.fill,scales="free_x")+
scale_x_continuous(breaks=integerBreaks())+
theme_minimal()+theme(legend.position="none",axis.title=element_text(size=8))
which gives:
Obviously the labels are long and come out pretty messed up in the figure so I was wondering if there's a way edit these labels in the ggplot object (p) or the gtable/gTree/grob/gDesc object (ggplotGrob(p)).
I am aware that one way of getting better labels is to use the labeller function when the ggplot object is created but in my case I'm specifically looking for a way to edit the facet labels after the ggplot object has been created.
As I mentioned in the comments, the facet names are nested quite deeply within the gtable that ggplotGrob() gives you. However, this is still possible and since the OP explicitly wants to edit them after being plotted, you can do this with:
library(grid)
gg <- ggplotGrob(p)
edited_grobs <- mapply(FUN = function(x, y) {
x[["grobs"]][[1]][["children"]][[2]][["children"]][[1]][["label"]] <- y
return(x)
},
gg$grobs[which(grepl("strip-t",gg$layout$name))],
unique(gsub("cond","c", df$condition)),
SIMPLIFY = FALSE)
gg$grobs[which(grepl("strip-t",gg$layout$name))] <- edited_grobs
grid.draw(gg)
Note that this extracts all the strips using gg$grobs[which(grepl("strip-t",gg$layout$name))] and passes them to the mapply to be reset with the gsub(...) that OP specified in their comment.
In general, if you want to access just one of the text labels, there is a very similar structure which I made use of in my mapply:
num_to_access <- 1
gg$grobs[which(grepl("strip-t",gg$layout$name))][[num_to_access]][["grobs"]][[1]][["children"]][[2]][["children"]][[1]]$label
So to access the 4th label for example all you would need to do is change num_to_acces to be 4. Hope this helps!
I am trying to create three different plots in a for loop and then plotting them together in the same graph.
I know that some questions regarding this topic have already been asked. But I do not know what I am doing wrong. Why is my plot being overwritten.
Nevertheless, I tried both solutions (creating a list or using assign function) and I do not know why I get my plot overwriten at the end of the loop.
So, the first solution is to create a list:
library(gridExtra)
library(ggplot2)
out<-list()
for (i in c(1,2,4)){
print(i)
name= paste("WT.1",colnames(WT.1#meta.data[i]), sep=" ")
print(name)
out[[length(out) + 1]] <- qplot(NEW.1#meta.data[i],
geom="density",
main= name)
print(out[[i]])
}
grid.arrange(out[[1]], out[[2]], out[[3]], nrow = 2)
When I print the plot inside the loop, I get what I want...but of course they are not together.
First Plot
When I plot them all together at the end, I get the same plot for all of the three: the last Plot I did.
All together
This is the second option: assign function. I have exactly the same problem.
for ( i in c(1,2,4)) {
assign(paste("WT.1",colnames(WT.1#meta.data[i]),sep="."),
qplot(NEW.1#meta.data[i],geom="density",
main=paste0("WT.1",colnames(WT.1#meta.data[i]))))
}
You're missing to dev.off inside the loop for every iteration. Reproducible code below:
library(gridExtra)
library(ggplot2)
out<-list()
for (i in c(1,2,3)){
print(i)
out[[i]] <- qplot(1:100, rnorm(100), colour = runif(100))
print(out[[i]])
dev.off()
}
grid.arrange(out[[1]], out[[2]], out[[3]], nrow = 2)
In the following reproducible example I try to create a function for a ggplot distribution plot and saving it as an R object, with the intention of displaying two plots in a grid.
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
output<-list(distribution,var1,dat)
return(output)
}
Call to function:
set.seed(100)
df <- data.frame(x = rnorm(100, mean=10),y =rep(1,100))
output1 <- ggplothist(dat=df,var1='x')
output1[1]
All fine untill now.
Then i want to make a second plot, (of note mean=100 instead of previous 10)
df2 <- data.frame(x = rep(1,1000),y = rnorm(1000, mean=100))
output2 <- ggplothist(dat=df2,var1='y')
output2[1]
Then i try to replot first distribution with mean 10.
output1[1]
I get the same distibution as before?
If however i use the information contained inside the function, return it back and reset it as a global variable it works.
var1=as.numeric(output1[2]);dat=as.data.frame(output1[3]);p1 <- output1[1]
p1
If anyone can explain why this happens I would like to know. It seems that in order to to draw the intended distribution I have to reset the data.frame and variable to what was used to draw the plot. Is there a way to save the plot as an object without having to this. luckly I can replot the first distribution.
but i can't plot them both at the same time
var1=as.numeric(output2[2]);dat=as.data.frame(output2[3]);p2 <- output2[1]
grid.arrange(p1,p2)
ERROR: Error in gList(list(list(data = list(x = c(9.66707664902549, 11.3631137069225, :
only 'grobs' allowed in "gList"
In this" Grid of multiple ggplot2 plots which have been made in a for loop " answer is suggested to use a list for containing the plots
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
pltlist <- list()
pltlist[["plot"]] <- distribution
output<-list(pltlist,var1,dat)
return(output)
}
output1 <- ggplothist(dat=df,var1='x')
p1<-output1[1]
output2 <- ggplothist(dat=df2,var1='y')
p2<-output2[1]
output1[1]
Will produce the distribution with mean=100 again instead of mean=10
and:
grid.arrange(p1,p2)
will produce the same Error
Error in gList(list(list(plot = list(data = list(x = c(9.66707664902549, :
only 'grobs' allowed in "gList"
As a last attempt i try to use recordPlot() to record everything about the plot into an object. The following is now inside the function.
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
distribution<-recordPlot()
output<-list(distribution,var1,dat)
return(output)
}
This function will produce the same errors as before, dependent on resetting the dat, and var1 variables to what is needed for drawing the distribution. and similarly can't be put inside a grid.
I've tried similar things like arrangeGrob() in this question "R saving multiple ggplot2 plots as R-object in list and re-displaying in grid " but with no luck.
I would really like a solution that creates an R object containing the plot, that can be redrawn by itself and can be used inside a grid without having to reset the variables used to draw the plot each time it is done. I would also like to understand wht this is happening as I don't consider it intuitive at all.
The only solution I can think of is to draw the plot as a png file, saved somewhere and then have the function return the path such that i can be reused - is that what other people are doing?.
Thanks for reading, and sorry for the long question.
Found a solution
How can I reference the local environment within a function, in R?
by inserting
localenv <- environment()
And referencing that in the ggplot
distribution <- ggplot(data=dat, aes(dat[,var1]),environment = localenv)
made it all work! even with grid arrange!
I have a function which manipulates a ggplot object, by converting it to a grob and then modifying the layers. I would like the function to return a ggplot object not a grob. Is there a simple way to convert a grob back to gg?
The documentation on ggplotGrob is awfully sparse.
Simple example:
P <- ggplot(iris) + geom_bar(aes(x=Species, y=Petal.Width), stat="identity")
G <- ggplotGrob(P)
... some manipulation to G ...
## DESIRED:
P2 <- inverse_of_ggplotGrob(G)
such that, we can continue to use basic ggplot syntax, ie
`P2 + ylab ("The Width of the Petal")`
UPDATE:
To answer the question in the comment, the motivation here is to modify the colors of facet labels programmatically, based on the value of label name in each facet. The functions below work nicely (based on input from baptise in a previous question).
I would like for the return value from colorByGroup to be a ggplot object, not simply a grob.
Here is the code, for those interested
get_grob_strips <- function(G, strips=grep(pattern="strip.*", G$layout$name)) {
if (inherits(G, "gg"))
G <- ggplotGrob(G)
if (!inherits(G, "gtable"))
stop ("G must be a gtable object or a gg object")
strip.type <- G$layout[strips, "name"]
## I know this works for a simple
strip.nms <- sapply(strips, function(i) {
attributes(G$grobs[[i]]$width$arg1)$data[[1]][["label"]]
})
data.table(grob_index=strips, type=strip.type, group=strip.nms)
}
refill <- function(strip, colour){
strip[["children"]][[1]][["gp"]][["fill"]] <- colour
return(strip)
}
colorByGroup <- function(P, colors, showWarnings=TRUE) {
## The names of colors should match to the groups in facet
G <- ggplotGrob(P)
DT.strips <- get_grob_strips(G)
groups <- names(colors)
if (is.null(groups) || !is.character(groups)) {
groups <- unique(DT.strips$group)
if (length(colors) < length(groups))
stop ("not enough colors specified")
colors <- colors[seq(groups)]
names(colors) <- groups
}
## 'groups' should match the 'group' in DT.strips, which came from the facet_name
matched_groups <- intersect(groups, DT.strips$group)
if (!length(matched_groups))
stop ("no groups match")
if (showWarnings) {
if (length(wh <- setdiff(groups, DT.strips$group)))
warning ("values in 'groups' but not a facet label: \n", paste(wh, colapse=", "))
if (length(wh <- setdiff(DT.strips$group, groups)))
warning ("values in facet label but not in 'groups': \n", paste(wh, colapse=", "))
}
## identify the indecies to the grob and the appropriate color
DT.strips[, color := colors[group]]
inds <- DT.strips[!is.na(color), grob_index]
cols <- DT.strips[!is.na(color), color]
## Fill in the appropriate colors, using refill()
G$grobs[inds] <- mapply(refill, strip = G$grobs[inds], colour = cols, SIMPLIFY = FALSE)
G
}
I would say no. ggplotGrob is a one-way street. grob objects are drawing primitives defined by grid. You can create arbitrary grobs from scratch. There's no general way to turn a random collection of grobs back into a function that would generate them (it's not invertible because it's not 1:1). Once you go grob, you never go back.
You could wrap a ggplot object in a custom class and overload the plot/print commands to do some custom grob manipulation, but that's probably even more hack-ish.
You can try the following:
p = ggplotify::as.ggplot(g)
For more info, see https://cran.r-project.org/web/packages/ggplotify/vignettes/ggplotify.html
It involves a little bit of a cheat annotation_custom(as.grob(plot),...), so it may not work for all circumstances: https://github.com/GuangchuangYu/ggplotify/blob/master/R/as-ggplot.R
Have a look at the ggpubr package: it has a function as_ggplot(). If your grob is not too complex it might be a solution!
I would also advise to have a look at the patchwork package which combine nicely ggplots... it is likely to not be what you are looking for but... have a look.
Can I provide a parameter to the ggpairs function in the GGally package to use log scales for some, not all, variables?
You can't provide the parameter as such (a reason is that the function creating the scatter plots is predefined without scale, see ggally_points), but you can change the scale afterward using getPlot and putPlot. For instance:
custom_scale <- ggpairs(data.frame(x=exp(rnorm(1000)), y=rnorm(1000)),
upper=list(continuous='points'), lower=list(continuous='points'))
subplot <- getPlot(custom_scale, 1, 2) # retrieve the top left chart
subplotNew <- subplot + scale_y_log10() # change the scale to log
subplotNew$type <- 'logcontinuous' # otherwise ggpairs comes back to a fixed scale
subplotNew$subType <- 'logpoints'
custom_scale <- putPlot(custom_fill, subplotNew, 1, 2)
This is essentially the same answer as Jean-Robert but looks much more simple (approachable). I don't know if it is a new feature but it doesn't look like you need to use getPlot or putPlot anymore.
custom_scale[1,2]<-custom_scale[1,2] + scale_y_log10() + scale_x_log10()
Here is a function to apply it across a big matrix. Supply the number of rows in the plot and the name of the plot.
scalelog2<-function(x=2,g){ #for below diagonal
for (i in 2:x){
for (j in 1:(i-1)) {
g[i,(j)]<-g[i,(j)] + scale_x_continuous(trans='log2') +
scale_y_continuous(trans='log2')
} }
for (i in 1:x){ #for the bottom row
g[(x+1),i]<-g[(x+1),i] + scale_y_continuous(trans='log2')
}
for (i in 1:x){ #for the diagonal
g[i,i]<-g[i,i]+ scale_x_continuous(trans='log2') }
return(g) }
It's probably better use a linear scale and log transform variables as appropriate before supplying them to ggpairs because this avoids ambiguity in how the correlation coefficients have been computed (before or after log-transform).
This can be easily achieved e.g. like this:
library(tidyverse)
log10_vars <- vars(ends_with(".Length")) # define variables to be transformed
iris %>% # use standard R example dataframe
mutate_at(log10_vars, log10) %>% # log10 transform selected columns
rename_at(log10_vars, sprintf, fmt="log10 %s") %>% # rename variables accordingly
GGally::ggpairs(aes(color=Species))