Aesthetics Error When Calling ggplot() Using Two Methods - r

My end goal is to create a function to easily build a series of ggplot objects. However, while testing a piece of the code I plan to use within my function, I'm receiving a geom_point aesthetics error whose cause doesn't seem to match other instances of this error for which I've found SO questions.
Reproducible code below
library(ggpubr)
library(ggplot2)
redData <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
,header = TRUE, sep = ";")
datatest <- redData
x <- "alcohol"
y <- "quality"
#PlotTest fails with Error: geom_point requires the following missing aesthetics: x, y
PlotTest<-ggplot(datatest, aes(datatest$x,datatest$y)) +
geom_point()+xlim(0,15)+ylim(0,10)
#PlotTest2 works just fine, they should be functionally equivalent
PlotTest2 <- ggplot(redData, aes(redData$"alcohol", redData$"quality")) +
geom_point()+xlim(0,15)+ylim(0,10)
PlotTest
PlotTest2
PlotTest and PlotTest2 should be functionally equivalent, but they clearly are not, and I can't see what causes one to work and not the other.
EDIT
I realize now that datatest$x and datatest$y don't actually resolve to datatest$"alcohol" and datatest$"quality". That was silly.
Is there some way to access data via a variable name that stores the column name? That would be what I need.

library(ggpubr)
library(ggplot2)
redData <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv" ,header = TRUE, sep = ";")
datatest <- redData
x <- "alcohol"
y <- "quality"
ggplot(datatest,aes(x=datatest[,x],y=datatest[,y]))+geom_point()+xlim(0,15)+ylim(0,10)+labs(x=x,y=y)
ggplot(redData,aes(x=alcohol,y=quality))+geom_point()+xlim(0,15)+ylim(0,10)

You can use aes_string(), which takes the names of the columns as character strings:
library(dplyr)
library(ggplot2)
plot_cars <- function(data = mtcars, x, y) {
data %>%
ggplot(aes_string(x, y)) +
geom_point()
}
plot_cars(x = "mpg", y = "cyl")
In your example above you'd call ggplot(redData, aes_string(x, y))..., though I don't have your data to test that.
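aes_string() is soft-deprecated in ggplot2 3.0.0 and later. A minimal sketch of the same idea using the .data pronoun inside aes() is shown here, assuming a reasonably recent ggplot2; the function name plot_xy is hypothetical, and mtcars stands in for the wine data so the example runs without a download.

```r
library(ggplot2)

# Hypothetical helper: x and y are column names given as strings.
plot_xy <- function(data, x, y) {
  ggplot(data, aes(x = .data[[x]], y = .data[[y]])) +
    geom_point() +
    labs(x = x, y = y)  # label the axes with the column names
}

x <- "mpg"
y <- "cyl"
p <- plot_xy(mtcars, x, y)
print(p)
```

With the data from the question, the call would presumably be plot_xy(datatest, "alcohol", "quality") followed by the same xlim() and ylim() calls.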

Related

How to use column names starting with numbers in ggplot functions

I have a huge dataframe whose variable/column names start with a number, such as `1_variable`. I am trying to create a function that takes these column names as arguments and then plots a few boxplots using ggplot. I need to pass the name as a string, but I also need to wrap it in backticks (``) to use it in ggplot. However, I am not sure how to convert a character string such as "1_variable" into the input `1_variable` that ggplot expects.
small reproducible example:
dfx = data.frame(`1ev`=c(rep(1,5), rep(2,5)), `2ev`=sample(10:99, 10),
`3ev`=10:1, check.names = FALSE)
If I were to plot the figure manually, the input would look like this:
dfx$`1ev` <- as.factor(dfx$`1ev`)
ggplot(dfx, aes(x = `1ev`, y = `2ev`))+
geom_boxplot()
the function I'd like to be able to run for the dataframe is this one:
plot_boxplot <- function(data, group, value){
data = data[c(group, value)]
data[,group] = as.factor(data[,group])
plot <- ggplot(data, aes(x = group, y = value))+
geom_boxplot()
return(plot)
}
1. Try
plot_boxplot(dfx, `1ev`, `2ev`)
which gives me an error saying Error in [.data.frame(data, c(group, value)) : object '1ev' not found
2. Try
entering the arguments with double quotes "" gives me unexpectedly this:
plot_boxplot(dfx, "1ev", "2ev")
3. Try
I also tried to replace the double quotes of the string with gsub in the function
gsub('\"', '`', group)
but that does not change anything about its output.
4. Try
finally, I also tried to make use of aes_string , but that just gives me the same errors.
plot_boxplot <- function(data, group, value){
data = data[c(as.character(group), as.character(value))]
data[,group] = as.factor(data[,group])
plot <- ggplot(data, aes_string(x= group, y=value))+
geom_boxplot()
return(plot)
}
plot_boxplot(dfx, `1ev`, `2ev`)
plot_boxplot(dfx, "1ev", "2ev")
Ideally I would like to run the function to produce this output:
plot_boxplot(dfx, group = "1ev", value = "2ev")
[can be produced with this code manually]
ggplot(dfx, aes(x= `1ev`, y=`2ev`)) +
geom_boxplot()
Any help would be greatly appreciated.
One way to do this is a combination of aes_ and as.name():
plot_boxplot <- function(data, group, value){
data = data[c(group, value)]
data[,group] = as.factor(data[,group])
plot <- ggplot(data, aes_(x= as.name(group), y=as.name(value))) +
geom_boxplot()
return(plot)
}
And passing in strings for group and value:
plot_boxplot(dfx, "1ev", "2ev")
It's not the same plot you show above, but it looks to align with the data.
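aes_() is soft-deprecated in ggplot2 3.0.0 and later. As a sketch under the assumption of a recent ggplot2, the .data pronoun accepts the column name as a string, so non-syntactic names such as "1ev" need no backticks or as.name():

```r
library(ggplot2)

dfx <- data.frame(`1ev` = c(rep(1, 5), rep(2, 5)),
                  `2ev` = sample(10:99, 10),
                  `3ev` = 10:1,
                  check.names = FALSE)

plot_boxplot <- function(data, group, value) {
  # Convert the grouping column to a factor so each level gets its own box
  data[[group]] <- as.factor(data[[group]])
  ggplot(data, aes(x = .data[[group]], y = .data[[value]])) +
    geom_boxplot() +
    labs(x = group, y = value)
}

p <- plot_boxplot(dfx, group = "1ev", value = "2ev")
print(p)
```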

creating a subset of data frame when running a loop

I'm quite new to R and trying to find my way around. I have created a new data frame based on the "original" data frame.
library(dplyr)
prdgrp <- as.vector(mth['MMITCL'])
prdgrp %>% distinct(MMITCL)
When doing this, the result is a list of unique values of the column MMITCL. I would like to use this data in a loop that first creates a new subset of the original data and then prints a graph based on it:
#START LOOP
for (i in 1:length(prdgrp))
{
# mth[c(MMITCL==prdgrp[i],]
mth_1 <- mth[c(mth$MMITCL==prdgrp[i]),]
# Development of TPC by month
library(ggplot2)
library(scales)
ggplot(mth_1, aes(Date, TPC_MTD))+ geom_line()
}
# END LOOP
Doing this gives me the following error message:
Error in mth$MMITCL == prdgrp[i] :
comparison of these types is not implemented
In addition: Warning:
In `[.data.frame`(mth, c(mth$MMITCL == prdgrp[i]), ) :
Incompatible methods ("Ops.factor", "Ops.data.frame") for "=="
What am I doing wrong?
If you just want to plot the outputs, there is no need to subset the data frame; it is simpler to put ggplot in a loop (or, more likely, use facet_wrap). Without seeing your data it is a bit hard to give a precise answer. However, there are two generic iris examples below; hopefully these will also show where you made the error in subsetting your data frame. Please let me know if you have any questions.
library(ggplot2)
#looping example
for(i in 1:length(unique(iris$Species))){
g <- ggplot(data = iris[iris$Species == unique(iris$Species)[i], ],
aes(x = Sepal.Length,
y = Sepal.Width)) +
geom_point()
print(g)
}
#facet_wrap example
g <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point() +
facet_wrap(~Species)
g
However, if you need to save the data frames for later use, one option is to put them into a list. If you only need the data frame within the loop, you can remove the list and use whatever variable name you wish.
myData4Later <- list()
for(i in 1:length(unique(iris$Species))){
myData4Later[[i]] <- iris[iris$Species == unique(iris$Species)[i], ]
g <- ggplot(data = myData4Later[[i]],
aes(x = Sepal.Length,
y = Sepal.Width)) +
geom_point()
print(g)
}
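Applied to the code in the question, the error most likely arises because mth['MMITCL'] is a one-column data frame and as.vector() leaves it a data frame, so mth$MMITCL == prdgrp[i] compares a factor with a data frame. The sketch below uses a made-up stand-in for mth (the real data was not posted, so the values are hypothetical), takes the unique values as a plain vector, and calls print() explicitly, since ggplot objects are not auto-printed inside a for loop.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's `mth` data frame
mth <- data.frame(
  MMITCL  = factor(rep(c("A", "B", "C"), each = 4)),
  Date    = rep(as.Date("2020-01-01") + 0:3, times = 3),
  TPC_MTD = c(1, 2, 3, 4, 2, 4, 6, 8, 5, 4, 3, 2)
)

# unique() on the column itself returns a vector, not a data frame
prdgrp <- unique(mth$MMITCL)

plots <- list()
for (i in seq_along(prdgrp)) {
  mth_1 <- mth[mth$MMITCL == prdgrp[i], ]
  plots[[i]] <- ggplot(mth_1, aes(Date, TPC_MTD)) +
    geom_line() +
    ggtitle(as.character(prdgrp[i]))
  print(plots[[i]])  # explicit print is needed inside a loop
}
```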

Function for formatting and plotting in R

I am currently trying to create a function that will format my data properly and return a sorted bar plot. Yet for some reason I keep getting this error:
Error in `$<-.data.frame`(`*tmp*`, "Var1", value = integer(0)) :
replacement has 0 rows, data has 3
I have tried debugging it, but have had no luck. I have an example of what I expect down at the bottom. Can anyone spot what I am doing wrong?
x <- rep(c("Mark","Jimmy","Jones","Jones","Jones","Jimmy"),2)
y <- rnorm(12)
df <- data.frame(x,y)
plottingfunction <- function(data, name,xlabel,ylabel,header){
newDf <- data.frame(table(data))
order <- newDf[order(newDf$Freq, decreasing = FALSE), ]$Var1
newDf$Var1 <- factor(newDf$Var1,order)
colnames(newDf)[1] <- name
plot <- ggplot(newDf, aes(x=name, y=Freq)) +
xlab(xlabel) +
ylab(ylabel) +
ggtitle(header) +
geom_bar(stat="identity", fill="lightblue", colour="black") +
coord_flip()
return(plot)
}
plottingfunction(df$x, "names","xlabel","ylabel","header")
A few comments: your function didn't work because this part isn't correct:
order <- newDf[order(newDf$Freq, decreasing = FALSE), ]$Var1
There is no guarantee that newDf will have a column named Var1. What seems to have happened is that, while testing your code, you ran:
newDf <- data.frame(table(df$x))
which named the first column Var1, but when you ran your function, the name was different. To get around this, I would recommend being explicit with your column names. In this example, I used the dplyr library to make my life easier. Following your code and logic, it would look like this:
newDf <- data %>% group_by_(col_name) %>% tally
order <- newDf[order(newDf$n, decreasing = FALSE), col_name][[col_name]]
data[,col_name] <- factor(data[,col_name], order)
Then within your ggplot call we can use aes_string to refer to the column name of the data frame instead. The whole function would then look like this:
library(dplyr)
library(ggplot2)
plottingFunction <- function(data, col_name, xlabel, ylabel, header) {
#' create a dataframe with the data that we're interested in
#' make sure that you preserve the name of the column that you're
#' counting on...
newDf <- data %>% group_by_(col_name) %>% tally
order <- newDf[order(newDf$n, decreasing = FALSE), col_name][[col_name]]
data[,col_name] <- factor(data[,col_name], order)
plot <- ggplot(data, aes_string(col_name)) +
xlab(xlabel) +
ylab(ylabel) +
ggtitle(header) +
geom_bar(fill="lightblue", colour="black") +
coord_flip()
return(plot)
}
plottingFunction(df, "x", "xlabel","ylabel","header")
Which would have output like:
I think for your plot having stat="identity" is redundant since you can just use your original data frame rather than having a transformed one.
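group_by_() and aes_string() are deprecated in recent dplyr and ggplot2 releases. A minimal sketch of the same function without them follows, assuming a recent ggplot2; the name plot_counts is hypothetical. It orders the factor levels by frequency in base R and refers to the column through the .data pronoun.

```r
library(ggplot2)

plot_counts <- function(data, col_name, xlabel, ylabel, header) {
  # Frequencies in ascending order; their names become the factor levels
  counts <- sort(table(data[[col_name]]))
  data[[col_name]] <- factor(data[[col_name]], levels = names(counts))
  ggplot(data, aes(x = .data[[col_name]])) +
    geom_bar(fill = "lightblue", colour = "black") +
    labs(x = xlabel, y = ylabel, title = header) +
    coord_flip()
}

x <- rep(c("Mark", "Jimmy", "Jones", "Jones", "Jones", "Jimmy"), 2)
y <- rnorm(12)
df <- data.frame(x, y)

p <- plot_counts(df, "x", "xlabel", "ylabel", "header")
print(p)
```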

Object created inside function not found by ggplot

I have a csv of time series data for a number of sites that I produce ggplots for, showing changes in means using the changepoint package. I have written a function that takes the csv, performs some calculations to get the means then loops through the sites producing a plot for each. My problem is that an object created in the for loop isn't found.
A very simplified example is below but produces the same error:
df1 <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-10"),
by = "day"),
site1 = runif(10),
site2 = runif(10),
site3 = runif(10))
example <- function(df1){
sname <- names(df1)[-1]
for (i in 1:length(sname)){
df2 <- df1[,c(1, 1+i)]
df2$label <- factor(rep("ts", by=length(df2[,1])))
plot <- ggplot()+
geom_point(data=df2, aes(x=date, y=df2[,2]))+
geom_line(data=df2, aes(x=date, y=df2[,2]))
sname.i<-sname[i]
filename<-paste0(sname.i, "-test-plot.pdf")
ggsave(file=filename, plot)
}
}
example(df1)
The error I get is: " Error in eval(expr, envir, enclos) : object 'df2' not found"
I'm not quite sure what the problem is as I have created similar loops which have worked in the past. If I assign a value to i and step through the code within the loop it works fine. I'm thinking an environment problem? Or is ggsave doing something wiggy? Any help/pointers gratefully received.
Thanks.
Your problem is not so much your code as the implementation of the ggplot2 package. This package uses nonstandard evaluation, and that can seriously mess up your results.
Take a look at the example code at the end of this post. In the global environment, I create a data frame called df2 with different values. If I run your code now, you get plots that look like this:
Note that on the X axis, it uses the correct dates, but the values on the Y axis are the ones from the dataframe df2 that is in the global environment! So the function aes() looks for the data in two different places. If you specify the name of a variable as a symbol (date) the function first looks in the data frame that is specified in the function call. However, an expression like df2[,2] cannot be found in the dataframe, as there is no variable with that name. Due to the way the ggplot2 package is constructed, R will look for that in the global environment instead of the calling environment.
As per wici's comment: Your best option is probably to use the function aes_string(), as this allows you to pass the aes in character form, and this function evaluates expressions in the correct environment:
plot <- ggplot()+
geom_point(data=df2, aes_string(x="date", y=sname[i]))+
geom_line(data=df2, aes_string(x="date", y=sname[i]))
Alternatively, you can get around that by using eval() and parse() like this:
example <- function(df1){
sname <- names(df1)[-1]
for (i in 1:length(sname)){
df2 <- df1[,c(1, 1+i)]
df2$label <- factor(rep("ts", by=length(df2[,1])))
aesy <- sname[i]
command <- paste("plot <- ggplot()+
geom_point(data=df2, aes(x=date, y=",aesy,"))+
geom_line(data=df2, aes(x=date, y=",aesy,"))")
eval(parse(text=command))
sname.i<-sname[i]
print(plot)
}
}
If you try that out with the example script below, you'll see that this time around you get the correct values displayed. Note that this is a suboptimal solution, as are most solutions involving eval(). I'd go for aes_string() here.
EXAMPLE SCRIPT
df1 <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-10"),
by = "day"),
site1 = runif(10),
site2 = runif(10),
site3 = runif(10))
df2 <- data.frame(date = seq(as.Date("2014-10-01"), as.Date("2014-10-10"),
by = "day"),
site1 = runif(10,10,20),
site2 = runif(10,10,20),
site3 = runif(10,10,20))
example <- function(df1){
sname <- names(df1)[-1]
for (i in 1:length(sname)){
df2 <- df1[,c(1, 1+i)]
df2$label <- factor(rep("ts", by=length(df2[,1])))
plot <- ggplot()+
geom_point(data=df2, aes(x=date, y=df2[,2]))+
geom_line(data=df2, aes(x=date, y=df2[,2]))
sname.i<-sname[i]
print(plot)
}
}
example(df1)
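For the loop in the question, a sketch that avoids both df2[,2] inside aes() and the soft-deprecated aes_string() is given here, assuming ggplot2 3.0.0 or later. The function name make_site_plots is hypothetical; it returns the plots in a named list rather than saving them, so ggsave() can be applied to each element afterwards.

```r
library(ggplot2)

df1 <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-10"),
                             by = "day"),
                  site1 = runif(10),
                  site2 = runif(10),
                  site3 = runif(10))

make_site_plots <- function(df) {
  sname <- names(df)[-1]
  plots <- lapply(sname, function(s) {
    # .data[[s]] looks the column up in the plot's own data by name
    ggplot(df, aes(x = date, y = .data[[s]])) +
      geom_point() +
      geom_line() +
      labs(y = s)
  })
  names(plots) <- sname
  plots
}

plots <- make_site_plots(df1)
print(plots[["site1"]])
```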

Local Variables Within aes

I'm trying to use a local variable in aes when I plot with ggplot. This is my problem boiled down to the essence:
xy <- data.frame(x=1:10,y=1:10)
plotfunc <- function(Data,YMul=2){
ggplot(Data,aes(x=x,y=y*YMul))+geom_line()
}
plotfunc(xy)
This results in the following error:
Error in eval(expr, envir, enclos) : object 'YMul' not found
It seems as if I cannot use local variables (or function arguments) in aes. Could it be that this occurs because the content of aes is evaluated later, when the local variable is out of scope? How can I avoid this problem (other than by not using the local variable within aes)?
I would capture the local environment,
xy <- data.frame(x=1:10,y=1:10)
plotfunc <- function(Data, YMul = 2){
.e <- environment()
ggplot(Data, aes(x = x, y = y*YMul), environment = .e) + geom_line()
}
plotfunc(xy)
Here's an alternative that allows you to pass in any value through the YMul argument without having to add it to the Data data.frame or to the global environment:
plotfunc <- function(Data, YMul = 2){
eval(substitute(
expr = {
ggplot(Data,aes(x=x,y=y*YMul)) + geom_line()
},
env = list(YMul=YMul)))
}
plotfunc(xy, YMul=100)
To see how this works, try out the following line in isolation:
substitute({ggplot(Data, aes(x=x, y=y*YMul))}, list(YMul=100))
ggplot()'s aes expects YMul to be a variable within the data data frame. Try including YMul there instead:
Thanks to @Justin: ggplot()'s aes seems to look for YMul in the data data frame first and, if it is not found there, in the global environment. I like to add such variables to the data frame, as follows, because it makes sense to me conceptually. I also don't have to worry about changes to global variables having unexpected consequences for functions. But all of the other answers are also correct, so use whichever suits you.
require("ggplot2")
xy <- data.frame(x = 1:10, y = 1:10)
xy <- cbind(xy, YMul = 2)
ggplot(xy, aes(x = x, y = y * YMul)) + geom_line()
Or, if you want the function in your example:
plotfunc <- function(Data, YMul = 2)
{
ggplot(cbind(Data, YMul), aes(x = x, y = y * YMul)) + geom_line()
}
plotfunc(xy)
I am using ggplot2, and your example seems to work fine with the current version.
However, it is easy to come up with variants which still create trouble. I was myself confused by similar behavior, and that's how I found this post (top Google result for "ggplot how to evaluate variables when passed"). For example, if we move ggplot out of plotfunc:
xy <- data.frame(x=1:10,y=1:10)
plotfunc <- function(Data,YMul=2){
geom_line(aes(x=x,y=y*YMul))
}
ggplot(xy)+plotfunc(xy)
# Error in eval(expr, envir, enclos) : object 'YMul' not found
In the above variant, "capturing the local environment" is not a solution because ggplot is not called from within the function, and only ggplot has the "environment=" argument.
But there is now a family of functions "aes_", "aes_string", "aes_q" which are like "aes" but capture local variables. If we use "aes_" in the above, we still get an error because now it doesn't know about "x". But it is easy to refer to the data directly, which solves the problem:
plotfunc <- function(Data,YMul=2){
geom_line(aes_(x=Data$x,y=Data$y*YMul))
}
ggplot(xy)+plotfunc(xy)
# works
Have you looked at the solution given by @wch (W. Chang)?
https://github.com/hadley/ggplot2/issues/743
I think it is the better one. It is essentially the same as @baptiste's, but it passes the environment directly in the call to ggplot.
I reproduce it here:
g <- function() {
foo3 <- 4
ggplot(mtcars, aes(x = wt + foo3, y = mpg),
environment = environment()) +
geom_point()
}
g()
# Works
If you execute your code outside of the function it works. And if you execute the code within the function with YMul defined globally, it works. I don't fully understand the inner workings of ggplot but this works...
YMul <- 2
plotfunc <- function(Data){
ggplot(Data,aes(x=x,y=y*YMul))+geom_line()
}
plotfunc(xy)
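In ggplot2 3.0.0 and later, aes() captures its expressions together with the environment they were written in, so local variables and function arguments are found without any of the workarounds above. The sketch below assumes such a version; layerfunc is a hypothetical name for the variant that builds only a layer. The `!!` operator, also supported by aes() since 3.0.0, inlines the current value instead of looking it up later.

```r
library(ggplot2)

xy <- data.frame(x = 1:10, y = 1:10)

# The function argument YMul is found because aes() remembers
# the environment in which it was called.
plotfunc <- function(Data, YMul = 2) {
  ggplot(Data, aes(x = x, y = y * YMul)) + geom_line()
}
p <- plotfunc(xy, YMul = 100)

# Variant that returns only a layer; `!!` inlines the value of YMul.
layerfunc <- function(YMul = 2) {
  geom_line(aes(x = x, y = y * !!YMul))
}
q <- ggplot(xy) + layerfunc(3)

print(p)
print(q)
```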

Resources