ggplot in a function: variable not found - r

I have an issue trying to write a function that creates a plot using ggplot. Here is some code:
y1<- sample(1:30,45,replace = T)
x1 <- rep(rep(c("a1","a2","a3","a4","a5"),3),each=3)
x2 <- rep(rep(c("b1","b2","b3","b4","b5"),3),each=3)
df <- data.frame(y1,x1,x2)
library(Rmisc)
dfsum <- summarySE(data=df, measurevar="y1",groupvars=c("x1","x2"))
myplot <- function(d,v, w,g) {
pd <- position_dodge(.1)
localenv <- environment()
ggplot(data=d, aes(x=v,y=w,group=g),environment = localenv) +
geom_errorbar(data=d,aes(ymin=d$w-d$se, ymax=d$w+d$se,col=d$g), width=.4, position=pd,environment = localenv) +
geom_line(position=pd,linetype="dotted") +
geom_point(data=d,position=pd,aes(col=g))
}
myplot(dfsum,x1,y1,x2)
As I was looking for similar questions, I found that specifying the local environment should solve the issue. However it did not help in my case.
Thank you

Preliminary Note
When looking at your data.frame, the group variable does not make any sense, as it is perfectly confounded with the x variable. Hence I adapted your data a bit, to show a full example:
Data
library(Rmisc)
library(ggplot2)
d <- expand.grid(x1 = paste0("a", 1:5),
x2 = paste0("b", 1:5))
d <- d[rep(1:NROW(d), each = 3), ]
d$y1 <- rnorm(NROW(d))
dfsum <- summarySE(d, measurevar = "y1", groupvars = paste0("x", 1:2))
Plot Function
myplot <- function(mydat, xvar, yvar, grpvar) {
  mydat$ymin <- mydat[[yvar]] - mydat$se
  mydat$ymax <- mydat[[yvar]] + mydat$se
  pd <- position_dodge(width = .5)
  ggplot(mydat, aes_string(x = xvar, y = yvar, group = grpvar,
                           ymin = "ymin", ymax = "ymax", color = grpvar)) +
    geom_errorbar(width = .4, position = pd) +
    geom_point(position = pd) +
    geom_line(position = pd, linetype = "dashed")
}
myplot(dfsum, "x1", "y1", "x2")
Explanation
Your problem occurs because the scope of x1, x2 and y1 was ambiguous. As you also defined these variables in the top-level (global) environment, R did not complain in the first place. If you had added rm(x1, x2, y1) to your original code right after creating your data.frame, you would have seen the problem earlier.
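The scoping effect can be reproduced without ggplot2. The sketch below uses a hypothetical function show_arg() and toy data (neither is part of the original code) to show that a bare name passed as an argument is looked up in the calling environment, not in the data frame:

```r
# Hypothetical helper: returns its second argument unchanged.
show_arg <- function(d, v) v

d  <- data.frame(x1 = 4:6)
x1 <- 1:3                       # global variable with the same name as the column

from_global <- show_arg(d, x1)  # picks up the global x1, not d$x1

rm(x1)
# With the global removed, the same call fails: d is never searched for x1.
res <- tryCatch(show_arg(d, x1), error = function(e) e)
```

Here from_global holds the global vector, and res is an error condition rather than the column of d.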
ggplot looks in the data.frame you provide for all the variables you want to map to aesthetics. If you want to create a function where you specify the names of the mapped variables as arguments, you should use aes_string instead of aes, as the former expects a string giving the name of the variable rather than the variable itself.
With this approach, however, you cannot do calculations on the spot, so you need to create the variables ymin and ymax beforehand in your data.frame. Furthermore, you do not need to provide the data argument for each geom if it is the same as the one provided to ggplot.
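In more recent ggplot2 releases aes_string() is deprecated, and the .data pronoun is the usual replacement. The following is only a sketch of how the function might look in that style. The name myplot_data(), the pd_width argument and the toy data frame are assumptions for illustration, and the toy data merely stands in for the summarySE() output:

```r
library(ggplot2)

# Hypothetical variant of myplot(): column names arrive as strings and are
# looked up in the data with the .data pronoun instead of aes_string().
myplot_data <- function(mydat, xvar, yvar, grpvar, pd_width = 0.5) {
  pd <- position_dodge(width = pd_width)
  ggplot(mydat, aes(x = .data[[xvar]], y = .data[[yvar]],
                    group = .data[[grpvar]], colour = .data[[grpvar]])) +
    # .data also allows computing the error-bar limits inside aes()
    geom_errorbar(aes(ymin = .data[[yvar]] - .data$se,
                      ymax = .data[[yvar]] + .data$se),
                  width = 0.4, position = pd) +
    geom_point(position = pd) +
    geom_line(position = pd, linetype = "dashed") +
    labs(x = xvar, y = yvar, colour = grpvar)
}

# Small stand-in for the summarySE() output, so the sketch runs without Rmisc
toy <- data.frame(
  x1 = rep(c("a1", "a2"), each = 2),
  x2 = rep(c("b1", "b2"), times = 2),
  y1 = c(1.0, 2.0, 1.5, 2.5),
  se = c(0.2, 0.3, 0.1, 0.4)
)
p <- myplot_data(toy, "x1", "y1", "x2")
```

With the data from the answer, the call would be myplot_data(dfsum, "x1", "y1", "x2"). The explicit labs() call is there because the axis titles would otherwise show the .data[[...]] expressions.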

I've got it plotting something, let me know if this isn't the expected output.
The changes I've made to the code to get it working are:
Load the ggplot2 library
Remove the d$ from the geom_errorbar call to w and g, as these are function arguments rather than columns in d.
I've also removed the data=d calls from all layers except the main ggplot one as these aren't necessary.
library(ggplot2)
myplot <- function(d,v, w,g) {
pd <- position_dodge(.1)
localenv <- environment()
ggplot(data=d, aes(x=v,y=w,group=g),environment = localenv) +
geom_errorbar(aes(ymin=w-se, ymax=w+se,col=g), width=.4,
position=pd,environment = localenv) +
geom_line(position=pd,linetype="dotted") +
geom_point(position=pd,aes(col=g))
}
myplot(dfsum,x1,y1,x2)

Related

How to specify an arbitrary amount of variables in aes in a generic function in R?

I am making a Shiny application where the user specifies the independent variables and, as a result, Shiny displays a time series plot with plotly, where hovering over each point shows the selected parameters.
If I know the exact number of variables that the user selects, I am able to construct the time series plot without a problem. Let's say there are 3 parameters chosen:
ggp <- ggplot(data = data.depend(), aes(x = Datum, y = y, tmp1 = .data[[input$Coockpit.Dependencies.Undependables[1]]], tmp2 = .data[[input$Coockpit.Dependencies.Undependables[2]]], tmp3 = .data[[input$Coockpit.Dependencies.Undependables[3]]])) +
geom_point()
ggplotly(ggp)
where data.depend() looks like
and the selected parameters are stored in a character vector
So the problem is that for each parameter I want to include in the tooltip, I have to hard code it in the aes function as tmpi = .data[[input$Coockpit.Dependencies.Undependables[i]]]. I would, however, like to write a generic function that handles any number of selected parameters. Any comments or suggestions are welcome.
EDIT:
Below a minimal working example:
data.dummy <- data.frame(Charge = c(1,2,3,4,5), Datum = c(as.Date("2020-01-01"),as.Date("2020-01-02"),as.Date("2020-01-03"),as.Date("2020-01-04"),as.Date("2020-01-05")), y = c(4,5,6,4,5), ZuluftTemperatur = c(52,51,54,58,49), Durchflussgeschwindigkeit = c(690, 716,722,710,801), ZuluftFeuchtigkeit= c(3.9,4.1,3.8,3.0,4.9))
ChosenParams <- c("ZuluftTemperatur", "ZuluftFeuchtigkeit", "Durchflussgeschwindigkeit")
ggp <- ggplot(data = data.dummy, aes(x = Datum, y = y, tmp1 = .data[[ChosenParams[1]]], tmp2 = .data[[ChosenParams[2]]], tmp3 = .data[[ChosenParams[3]]])) + geom_point()
ggplotly(ggp)
Result:
So this works at the "cost" of me knowing the user is choosing three parameters and therefore I write in aes tmpi = .data[[ChosenParams[i]]]; i=1:3. I am interested in a solution with the same result but where I don't have to write tmpi = .data[[ChosenParams[i]]] i-number of times
Thank you!
One solution is to use eval(parse(...)) to create the code for you:
library(ggplot2)
library(plotly)
data.dummy <- data.frame(Charge = c(1,2,3,4,5), Datum = c(as.Date("2020-01-01"),as.Date("2020-01-02"),as.Date("2020-01-03"),as.Date("2020-01-04"),as.Date("2020-01-05")), y = c(4,5,6,4,5), ZuluftTemperatur = c(52,51,54,58,49), Durchflussgeschwindigkeit = c(690, 716,722,710,801), ZuluftFeuchtigkeit= c(3.9,4.1,3.8,3.0,4.9))
ChosenParams <- c("ZuluftTemperatur", "ZuluftFeuchtigkeit", "Durchflussgeschwindigkeit")
ggp <- eval(parse(text = paste0("ggplot(data = data.dummy, aes(x = Datum, y = y, ",
paste0("tmp", seq_along(ChosenParams), " = .data[[ChosenParams[", seq_along(ChosenParams), "]]]", collapse = ", "),
")) + geom_point()"
)
))
ggplotly(ggp)
Just note that this is not very efficient and in some cases it is not advised to use it (see What specifically are the dangers of eval(parse(...))?). There might also be a way to use quasiquotation in aes(), but I am not really familiar with it.
EDIT: Added a way to do it with quasiquotation.
I had a closer look at quasiquotation in aes() and found a nicer way to do it using syms() and !!!:
data.dummy <- data.frame(Charge = c(1,2,3,4,5), Datum = c(as.Date("2020-01-01"),as.Date("2020-01-02"),as.Date("2020-01-03"),as.Date("2020-01-04"),as.Date("2020-01-05")), y = c(4,5,6,4,5), ZuluftTemperatur = c(52,51,54,58,49), Durchflussgeschwindigkeit = c(690, 716,722,710,801), ZuluftFeuchtigkeit= c(3.9,4.1,3.8,3.0,4.9))
ChosenParams <- c("ZuluftTemperatur", "ZuluftFeuchtigkeit", "Durchflussgeschwindigkeit")
names(ChosenParams) <- paste0("tmp", seq_along(ChosenParams))
ChosenParams <- syms(ChosenParams)
ggp <- ggplot(data = data.dummy, aes(x = Datum, y = y, !!!ChosenParams)) + geom_point()
ggplotly(ggp)
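The syms() and !!! approach above can be wrapped in a small function, so that the number of parameters no longer matters. This is a sketch only: tooltip_aes() is a hypothetical name, and the reduced data.dummy keeps just the columns needed here.

```r
library(ggplot2)

# Hypothetical helper: builds the mapping for any number of column names.
tooltip_aes <- function(params) {
  extra <- rlang::syms(params)                       # strings -> symbols
  names(extra) <- paste0("tmp", seq_along(params))   # tmp1, tmp2, ...
  aes(x = Datum, y = y, !!!extra)                    # splice into aes()
}

data.dummy <- data.frame(
  Datum = as.Date("2020-01-01") + 0:4,
  y = c(4, 5, 6, 4, 5),
  ZuluftTemperatur = c(52, 51, 54, 58, 49),
  ZuluftFeuchtigkeit = c(3.9, 4.1, 3.8, 3.0, 4.9)
)

mapping <- tooltip_aes(c("ZuluftTemperatur", "ZuluftFeuchtigkeit"))
ggp <- ggplot(data.dummy, mapping) + geom_point()
```

In the Shiny app, the character vector of selected inputs would presumably be passed in place of the literal vector, and ggp handed to ggplotly() as before.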

Split violin plot with points on top to indicate info

This is a followup post from here
and here
I have successfully implemented the split violin plot in ggplot2 for my data (two median estimator densities, for two cases) that need to be compared. Now I would like to add confidence intervals, so I am following the code posted in the links above:
EDIT: A reproducible example
tmp <- rnorm(1000,0,1)
tmp.2 <- rnorm(1000,0,1)
x.1 <- density(tmp)
y.1 <- density(tmp.2)
Here, I am making the densities and extracting the (x, y) pairs. Then I get the quantiles back:
# Make densities
densities <- as.data.frame(c(x.1$x,y.1$x))
colnames(densities) <- "loc"
densities$dens <- c(x.1$y,y.1$y)
densities$drop_case <- c(rep("B",512),rep("S",512))
densities$dens <- ifelse(densities$drop_case=="B",densities$dens*-1,densities$dens)
densities$dens <- ifelse(densities$drop_case=="S",densities$dens*1,densities$dens)
conf <- as.data.frame(c(quantile(tmp,c(0.025,0.975))[1],quantile(tmp,c(0.025,0.975))[2],quantile(tmp.2,c(0.025,0.975))[1],quantile(tmp.2,c(0.025,0.975))[2]))
colnames(conf) <- "intervals"
conf$drop_case <- c(rep("B",2),rep("S",2))
conf$length <- rep(1000,4)
Now I am trying to extract the values inside the densities, as noted in the linked posts:
# Find data points in densities
val.tmp <- rep(0,4)
val.tmp.2 <- rep(0,4)
for (i in 1:4) {
x.here <- densities$loc
y.here <- densities$dens
your.number<- conf$intervals[i]
pos.tmp <- which(abs(x.here-your.number)==min(abs(x.here-your.number)))
val.tmp[i] <- x.here[pos.tmp]
val.tmp.2[i] <- y.here[pos.tmp]
}
conf$positions <- val.tmp
conf$length <- val.tmp.2
conf$length <- ifelse(conf$drop_case=="B",conf$length*-1,conf$length)
conf$length <- ifelse(conf$drop_case=="S",conf$length*1,conf$length)
ggplot(densities,aes(dens, loc, fill = factor(drop_case)))+
geom_polygon()+
scale_x_continuous(breaks = 0, name = info$Name)+
ylab('Estimator Density') +
theme(axis.title.x = element_blank())+
geom_point(data = conf, aes(x = positions, y = length, fill = factor(drop_case), group = factor(drop_case))
,shape = 21, colour = "black", show.legend = FALSE)
Unfortunately, the points are not mapped onto the densities but are instead scattered across the plane.
There is a bunch of little mistakes in the code. Firstly, within that for loop, you can't set x.here and y.here to all of the density and location values, since that includes both groups. Secondly, since the signs are already changed in densities there is no need to use those ifelse statements afterwards. Thirdly, you would only need the top ifelse anyway, since the bottom one does absolutely nothing. Finally, you had the x and y mappings in geom_point the wrong way around!
There is a bunch of other things one could change to make the code more understandable and pretty, but I'm on limited time, so I'll leave those for what they are.
Below the full adjusted code:
tmp <- rnorm(1000,0,1)
tmp.2 <- rnorm(1000,0,1)
x.1 <- density(tmp)
y.1 <- density(tmp.2)
# Make densities
densities <- as.data.frame(c(x.1$x,y.1$x))
colnames(densities) <- "loc"
densities$dens <- c(x.1$y,y.1$y)
densities$drop_case <- c(rep("B",512),rep("S",512))
densities$dens <- ifelse(densities$drop_case=="B",densities$dens*-1,densities$dens)
conf <- as.data.frame(c(quantile(tmp,c(0.025,0.975)), quantile(tmp.2,c(0.025,0.975))))
colnames(conf) <- "intervals"
conf$drop_case <- c(rep("B",2),rep("S",2))
conf$length <- rep(1000,4)
val.tmp <- rep(0,4)
val.tmp.2 <- rep(0,4)
for (i in 1:4) {
x.here <- densities$loc[densities$drop_case == conf$drop_case[i]]
y.here <- densities$dens[densities$drop_case == conf$drop_case[i]]
your.number<- conf$intervals[i]
pos.tmp <- which(abs(x.here-your.number)==min(abs(x.here-your.number)))
val.tmp[i] <- x.here[pos.tmp]
val.tmp.2[i] <- y.here[pos.tmp]
}
conf$positions <- val.tmp
conf$length <- val.tmp.2
ggplot(densities, aes(dens, loc, fill = drop_case)) +
geom_polygon()+
ylab('Estimator Density') +
theme(axis.title.x = element_blank())+
geom_point(data = conf, aes(x = length, y = positions, fill = drop_case),
shape = 21, colour = "black", show.legend = FALSE)
This results in:
I would personally prefer a plot with line segments:
ggplot(densities, aes(dens, loc, fill = factor(drop_case)))+
geom_polygon()+
ylab('Estimator Density') +
theme(axis.title.x = element_blank())+
geom_segment(data = conf, aes(x = length, xend = 0, y = positions, yend = positions))
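The per-group lookup done in the for loop can also be written as a small helper. The sketch below uses a hypothetical function name and a toy density table that only mimics the column names of densities. It uses which.min(), which returns a single index even if two grid points are equally close.

```r
# Hypothetical helper: nearest density grid point to `value` within one group.
nearest_on_density <- function(dens_df, value, group) {
  sub <- dens_df[dens_df$drop_case == group, ]
  sub[which.min(abs(sub$loc - value)), c("loc", "dens")]
}

# Toy density table with the same column names as `densities`
toy <- data.frame(
  loc = c(-1, 0, 1, -1, 0, 1),
  dens = c(-0.2, -0.4, -0.2, 0.1, 0.5, 0.1),
  drop_case = rep(c("B", "S"), each = 3)
)
hit <- nearest_on_density(toy, 0.9, "S")
```

Applied to the real data, one call per row of conf (with conf$intervals[i] and conf$drop_case[i]) would give the positions and length values.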

User input name to ggplot

I am writing a function that plots a heat map for users. In the following example, it plots the change of grade over time for each gender.
However, this is a special case: the "Gender" column may have another name, such as "Class".
I want to let users supply their own column name and have ggplot label each axis accordingly.
How do I modify my function heatmap() to do this?
sampledata <- matrix(c(1:60,1:60,rep(0:1,each=60),sample(1:3,120,replace = T)),ncol=3)
colnames(sampledata) <- c("Time","Gender","Grade")
sampledata <- data.frame(sampledata)
heatmap=function(sampledata,Gender)
{
sampledata$Time <- factor(sampledata$Time)
sampledata$Grade <- factor(sampledata$Grade)
sampledata$Gender <- factor(sampledata$Gender)
color_palette <- colorRampPalette(c("#31a354","#2c7fb8", "#fcbfb8","#f03b20"))(length((levels(factor(sampledata$Grade)))))
ggplot(data = sampledata) + geom_tile( aes(x = Time, y = Gender, fill = Grade))+scale_x_discrete(breaks = c("10","20","30","40","50"))+scale_fill_manual(values =color_palette,labels=c("0-1","1-2","2-3","3-4","4-5","5-6",">6"))+ theme_bw()+scale_y_discrete(labels=c("Female","Male"))
}
The easiest solution is redefining the function using aes_string.
When the function is called, you need to pass it the name of the column
you want to use as a string.
heatmap=function(sampledata,y)
{
sampledata$Time <- factor(sampledata$Time)
sampledata$Grade <- factor(sampledata$Grade)
sampledata$new_var <- factor(sampledata[,y])
color_palette <- colorRampPalette(c("#31a354","#2c7fb8", "#fcbfb8","#f03b20"))(length((levels(factor(sampledata$Grade)))))
ggplot(data = sampledata) + geom_tile( aes_string(x = "Time", y = "new_var", fill = "Grade"))+scale_x_discrete(breaks = c("10","20","30","40","50"))+scale_fill_manual(values =color_palette,labels=c("0-1","1-2","2-3","3-4","4-5","5-6",">6"))+ theme_bw()+scale_y_discrete(labels=c("Female","Male")) + ylab(y)
}
# Below an example of how you call the newly defined function
heatmap(sampledata, "Gender")
Alternatively if you want to retain the quote free syntax, there is a slightly more complex solution:
heatmap=function(sampledata,y)
{
arguments <- as.list(match.call())
axis_label <- deparse(substitute(y))
y = eval(arguments$y, sampledata)
sampledata$Time <- factor(sampledata$Time)
sampledata$Grade <- factor(sampledata$Grade)
sampledata$y <- factor(y)
color_palette <- colorRampPalette(c("#31a354","#2c7fb8", "#fcbfb8","#f03b20"))(length((levels(factor(sampledata$Grade)))))
ggplot(data = sampledata) + geom_tile( aes(x = Time, y = y, fill = Grade))+scale_x_discrete(breaks = c("10","20","30","40","50"))+scale_fill_manual(values =color_palette,labels=c("0-1","1-2","2-3","3-4","4-5","5-6",">6"))+ theme_bw()+scale_y_discrete(labels=c("Female","Male")) + ylab(axis_label)
}
# Below an example of how you call the newly defined function
heatmap(sampledata, Gender)
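Another way to keep the quote-free syntax is the embrace operator {{ }}, assuming the installed ggplot2 and rlang versions are recent enough to support it. The sketch below is a simplified, hypothetical variant (heatmap_tidy() and the toy data are illustrations, and the colour palette and scale settings of the original function are left out):

```r
library(ggplot2)

# Hypothetical variant: `y` is an unquoted column name, forwarded with {{ }}.
heatmap_tidy <- function(data, y) {
  ggplot(data) +
    geom_tile(aes(x = factor(Time), y = factor({{ y }}), fill = factor(Grade))) +
    # as_label() turns the captured column name into the axis title
    labs(x = "Time", y = rlang::as_label(rlang::enquo(y)), fill = "Grade") +
    theme_bw()
}

toy <- data.frame(
  Time = rep(1:3, times = 2),
  Class = rep(c(0, 1), each = 3),
  Grade = c(1, 2, 3, 3, 2, 1)
)
p <- heatmap_tidy(toy, Class)
```

This avoids match.call() and eval(), and the axis title follows whatever column name the caller passes.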

ggplot axis order (factor) changes when using last_plot()

I've been able to successfully create a dotplot in ggplot of percentages by gender. But I want to highlight the significant differences. I thought I could do this with a combination of subsetting and the use of last_plot().
Here’s my data:
require(ggplot2)
require(reshape2)
prog <- c("Honors", "Academic", "Social", "Media")
m <- c(30,35,40,23)
f <- c(25,40,45,15)
s <- c(0.7, 0.4, 0.1, 0.03)
temp <- as.data.frame(cbind(prog, m, f, s), stringsAsFactors=FALSE)
first <- temp[,1:3]
first.melt <- melt(first, id.vars = 'prog', variable.name = 'Gender', value.name = 'Percent')
first.melt <- as.data.frame(cbind(first.melt,temp[,4]), , stringsAsFactors=FALSE)
names(first.melt) <- c("program", "Gender", "Percent", "sig")
first.melt$program <- as.factor(first.melt$program)
Here’s where I reverse the order of my Program variable, so that when graphed it will be alphabetical from top to bottom.
first.melt[,1] = with(first.melt, factor(first.melt[,1], levels = rev(levels(first.melt[,1]))))
first.melt$sig <- as.numeric(as.character(first.melt$sig))
first.melt$Percent <- as.numeric(as.character(first.melt$Percent))
Now, I subset...
first.melt.ns <- subset(first.melt,sig > 0.05)
first.melt.sig <- subset(first.melt,sig <= 0.05)
ggplot(first.melt.ns, aes(program, y=Percent, shape=Gender)) +
geom_point(size=3) +
coord_flip() +
scale_shape_manual(values=c("m"=1, "f"=5))
The first run of ggplot gets me my non-significant Program pairs, and they are in the right order. So I add the two new points for male and female (making them solid, to draw attention to them as a significant pair):
last_plot() +
geom_point(data=first.melt.sig, aes(program[Gender=="m"], y=Percent[Gender=="m"]), size=3, shape=19) +
geom_point(data=first.melt.sig, aes(program[Gender=="f"], y=Percent[Gender=="f"]),size=4, shape=18)
The points get added just fine – ggplot works. But notice my Program axis – it’s correct, but reversed now.
First, you really should avoid as.data.frame(cbind(...)). It is dramatically increasing the amount of work necessary to prepare your data. The function for creating data frames is (naturally) data.frame. Use it!
What you're doing here is basically trying to get around the limitation of only having one shape scale. It's probably easiest to just do this:
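# stringsAsFactors = TRUE keeps prog a factor (the default became FALSE in R 4.0.0);
# the levels() call below relies on that
temp <- data.frame(prog, m, f, s, stringsAsFactors = TRUE)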
first <- temp[,1:3]
first.melt <- melt(first, id.vars = 'prog', variable.name = 'Gender', value.name = 'Percent')
first.melt$sig <- rep(temp$s,times = 2)
first.melt[,1] = with(first.melt, factor(first.melt[,1], levels = rev(levels(first.melt[,1]))))
first.melt.sig <- subset(first.melt,sig < 0.05)
first.melt$Percent[first.melt$sig < 0.05] <- NA
ggplot() +
geom_point(data = first.melt,aes(x = prog,y = Percent,shape = Gender),size = 3) +
geom_point(data = first.melt.sig[1,],aes(x = prog,y = Percent),shape = 19) +
geom_point(data = first.melt.sig[2,],aes(x = prog,y = Percent),shape = 18) +
coord_flip() +
scale_shape_manual(values=c("m"=1, "f"=5))
In general, work to structure your ggplot code so that you're subsetting data frames, not variables inside of aes. That gets both tricky and dangerous, because ggplot is assuming certain things about what you pass inside of aes in order for the evaluation to work properly.
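As a minimal sketch of that advice, with hypothetical toy data rather than the data from the question: the highlight layer gets its own subsetted data frame, while the aes() mapping stays the same for both layers.

```r
library(ggplot2)

# Toy data (hypothetical), with the factor levels deliberately reversed
pts <- data.frame(
  program = factor(c("A", "B", "C"), levels = c("C", "B", "A")),
  Percent = c(10, 20, 30),
  sig = c(0.50, 0.01, 0.20)
)

p <- ggplot(pts, aes(x = program, y = Percent)) +
  geom_point(shape = 1, size = 3) +
  # Highlight layer: subset the data frame, keep the aes() mapping untouched
  geom_point(data = subset(pts, sig <= 0.05), shape = 19, size = 3) +
  coord_flip()
```

Because subset() does not drop factor levels and both layers map the same column, the axis order is taken from the factor's levels rather than from whatever rows a layer happens to contain.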

How to produce a meaningful draftsman/correlation plot for discrete values

One of my favorite tools for exploratory analysis is pairs(), however in the case of a limited number of discrete values, it falls flat as the dots all align perfectly. Consider the following:
y <- t(rmultinom(n=1000,size=4,prob=rep(.25,4)))
pairs(y)
It doesn't really give a good sense of correlation. Is there an alternative plot style that would?
If you change y to a data.frame you can add some 'jitter' and with the col option you can set the transparency level (the 4th number in rgb):
y <- data.frame(y)
pairs(sapply(y,jitter), col = rgb(0,0,0,.2))
Or you could use ggplot2's plotmatrix:
library(ggplot2)
plotmatrix(y) + geom_jitter(alpha = .2)
Edit: Since plotmatrix in ggplot2 is deprecated, use ggpairs (from the GGally package mentioned in @hadley's comment above):
library(GGally)
ggpairs(y, lower = list(params = c(alpha = .2, position = "jitter")))
Here is an example using corrplot:
library(corrplot)
M <- cor(y)
corrplot.mixed(M)
You can find more examples in the intro
http://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html
Here are a couple of options using ggplot2:
library(ggplot2)
## re-arrange data (copied from plotmatrix function)
prep.plot <- function(data) {
grid <- expand.grid(x = 1:ncol(data), y = 1:ncol(data))
grid <- subset(grid, x != y)
all <- do.call("rbind", lapply(1:nrow(grid), function(i) {
xcol <- grid[i, "x"]
ycol <- grid[i, "y"]
data.frame(xvar = names(data)[ycol], yvar = names(data)[xcol],
x = data[, xcol], y = data[, ycol], data)
}))
all$xvar <- factor(all$xvar, levels = names(data))
all$yvar <- factor(all$yvar, levels = names(data))
return(all)
}
dat <- prep.plot(data.frame(y))
## plot with transparent jittered points
ggplot(dat, aes(x = x, y=y)) +
geom_jitter(alpha=.125) +
facet_grid(xvar ~ yvar) +
theme_bw()
## plot with color representing density
ggplot(dat, aes(x = factor(x), y=factor(y))) +
geom_bin2d() +
facet_grid(xvar ~ yvar) +
theme_bw()
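The counts that geom_bin2d() encodes as colour can also be inspected numerically with base R. This is a small sketch for one pair of columns; the seed and the dimnames col1 and col2 are arbitrary choices made for illustration.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
y <- t(rmultinom(n = 1000, size = 4, prob = rep(.25, 4)))

# Cross-tabulate two of the four columns: each cell counts the rows that
# share one particular (column 1, column 2) combination of values.
tab <- table(col1 = y[, 1], col2 = y[, 2])

# Row-wise proportions: distribution of column 2 given each value of column 1
prop <- prop.table(tab, margin = 1)
```

Printing tab or prop gives the same information as one panel of the binned plot, in table form.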
I don't have enough credits yet to comment on @Vincent's post. When running
library(GGally)
ggpairs(y, lower = list(params = c(alpha = .2, position = "jitter")))
I get
Error in stop_if_params_exist(obj$params) :
'params' is a deprecated argument. Please 'wrap' the function to supply arguments. help("wrap", package = "GGally")
So it seems, based on the indicated help page, that it would need to be in this case here:
ydf <- as.data.frame(y)
regularPlot <- ggpairs(ydf, lower = list(continuous = wrap(ggally_points, alpha = .2, position = "jitter")))
regularPlot

Resources