I have a csv of time series data for a number of sites that I produce ggplots for, showing changes in means using the changepoint package. I have written a function that takes the csv, performs some calculations to get the means then loops through the sites producing a plot for each. My problem is that an object created in the for loop isn't found.
A very simplified example is below but produces the same error:
library(ggplot2)
df1 <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-10"),
by = "day"),
site1 = runif(10),
site2 = runif(10),
site3 = runif(10))
example <- function(df1){
sname <- names(df1)[-1]
for (i in 1:length(sname)){
df2 <- df1[,c(1, 1+i)]
df2$label <- factor(rep("ts", by=length(df2[,1])))
plot <- ggplot()+
geom_point(data=df2, aes(x=date, y=df2[,2]))+
geom_line(data=df2, aes(x=date, y=df2[,2]))
sname.i<-sname[i]
filename<-paste0(sname.i, "-test-plot.pdf")
ggsave(file=filename, plot)
}
}
example(df1)
The error I get is: " Error in eval(expr, envir, enclos) : object 'df2' not found"
I'm not quite sure what the problem is as I have created similar loops which have worked in the past. If I assign a value to i and step through the code within the loop it works fine. I'm thinking an environment problem? Or is ggsave doing something wiggy? Any help/pointers gratefully received.
Thanks.
Your problem is not so much your code as the implementation of the ggplot2 package. This package uses non-standard evaluation, and that can seriously mess up your results.
Take a look at the example code at the end of this post. I create in the global environment a data frame called df2 with different values. If I run your code now, you get plots that look like this:
Note that on the X axis, it uses the correct dates, but the values on the Y axis are the ones from the dataframe df2 that is in the global environment! So the function aes() looks for the data in two different places. If you specify the name of a variable as a symbol (date) the function first looks in the data frame that is specified in the function call. However, an expression like df2[,2] cannot be found in the dataframe, as there is no variable with that name. Due to the way the ggplot2 package is constructed, R will look for that in the global environment instead of the calling environment.
As per wici's comment: your best option is probably to use the function aes_string(), as this allows you to pass the aesthetics in character form, and this function evaluates expressions in the correct environment:
plot <- ggplot()+
geom_point(data=df2, aes_string(x="date", y=sname[i]))+
geom_line(data=df2, aes_string(x="date", y=sname[i]))
Alternatively, you can get around that by using eval() and parse() like this:
example <- function(df1){
sname <- names(df1)[-1]
for (i in 1:length(sname)){
df2 <- df1[,c(1, 1+i)]
df2$label <- factor(rep("ts", by=length(df2[,1])))
aesy <- sname[i]
command <- paste("plot <- ggplot()+
geom_point(data=df2, aes(x=date, y=",aesy,"))+
geom_line(data=df2, aes(x=date, y=",aesy,"))")
eval(parse(text=command))
sname.i<-sname[i]
print(plot)
}
}
If you try that out with the example script below, you'll see that this time around you get the correct values displayed. Note that this is a suboptimal solution, as are most solutions involving eval(). I'd go for aes_string() here.
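On ggplot2 3.0.0 and later, aes_string() is soft-deprecated and the .data pronoun is the usual replacement. Below is a minimal sketch of the same loop under that assumption. plot_sites is a hypothetical helper name, and it returns the plots in a list instead of saving them, so it can be run without writing any files.

```r
library(ggplot2)

# Hypothetical helper (not from the original post): one plot per site column.
plot_sites <- function(df) {
  site_names <- names(df)[-1]          # every column except the date
  plots <- lapply(site_names, function(site) {
    # .data[[site]] looks the column up by the string stored in `site`
    ggplot(df, aes(x = .data[["date"]], y = .data[[site]])) +
      geom_point() +
      geom_line() +
      labs(x = "date", y = site)
  })
  names(plots) <- site_names
  plots
}

df1 <- data.frame(
  date  = seq(as.Date("2015-01-01"), as.Date("2015-01-10"), by = "day"),
  site1 = runif(10),
  site2 = runif(10),
  site3 = runif(10)
)
plots <- plot_sites(df1)
```

Each plot can then be saved with something like ggsave(paste0(name, "-test-plot.pdf"), plots[[name]]). lapply() is used rather than a for loop because aes() captures its expressions lazily, so a plot built in a for loop but only printed after the loop has finished would likely see the final value of i.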
EXAMPLE SCRIPT
library(ggplot2)
df1 <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-10"),
by = "day"),
site1 = runif(10),
site2 = runif(10),
site3 = runif(10))
df2 <- data.frame(date = seq(as.Date("2014-10-01"), as.Date("2014-10-10"),
by = "day"),
site1 = runif(10,10,20),
site2 = runif(10,10,20),
site3 = runif(10,10,20))
example <- function(df1){
sname <- names(df1)[-1]
for (i in 1:length(sname)){
df2 <- df1[,c(1, 1+i)]
df2$label <- factor(rep("ts", by=length(df2[,1])))
plot <- ggplot()+
geom_point(data=df2, aes(x=date, y=df2[,2]))+
geom_line(data=df2, aes(x=date, y=df2[,2]))
sname.i<-sname[i]
print(plot)
}
}
example(df1)
My end goal is to create a function to easily build a series of ggplot objects. However, in running some tests on a piece of the code I plan to use within my function, I'm receiving a geom_point aesthetics error whose cause doesn't seem to match other instances of this error for which I've found SO questions.
Reproducible code below
library(ggpubr)
library(ggplot2)
redData <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
,header = TRUE, sep = ";")
datatest <- redData
x <- "alcohol"
y <- "quality"
#PlotTest fails with Error: geom_point requires the following missing aesthetics: x, y
PlotTest<-ggplot(datatest, aes(datatest$x,datatest$y)) +
geom_point()+xlim(0,15)+ylim(0,10)
#PlotTest2 works just fine, they should be functionally equivalent
PlotTest2 <- ggplot(redData, aes(redData$"alcohol", redData$"quality")) +
geom_point()+xlim(0,15)+ylim(0,10)
PlotTest
PlotTest2
PlotTest and PlotTest2 should be functionally equivalent, but they clearly are not, and I can't see what causes one to work and not the other.
EDIT
I realize now that datatest$x,datatest$y don't actually resolve to datatest$"alcohol" and datatest$"quality". That was silly.
Is there some way to access data via a variable name that stores the column name? That would be what I need.
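For what it's worth, here is a small sketch of why datatest$x comes back empty while bracket indexing works. The two-row data frame is made up for illustration and only stands in for the wine data.

```r
# Made-up stand-in for the wine data (illustration only)
datatest <- data.frame(alcohol = c(9.4, 9.8), quality = c(5L, 6L))
x <- "alcohol"

by_dollar  <- datatest$x      # `$` looks for a column literally named "x"
by_bracket <- datatest[[x]]   # `[[` uses the string stored in x

is.null(by_dollar)                        # TRUE
identical(by_bracket, datatest$alcohol)   # TRUE
```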
library(ggpubr)
library(ggplot2)
redData <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv" ,header = TRUE, sep = ";")
datatest <- redData
x <- "alcohol"
y <- "quality"
ggplot(datatest,aes(x=datatest[,x],y=datatest[,y]))+geom_point()+xlim(0,15)+ylim(0,10)+labs(x=x,y=y)
ggplot(redData,aes(x=alcohol,y=quality))+geom_point()+xlim(0,15)+ylim(0,10)
You can use aes_string(), which takes the column names as character strings:
library(dplyr)
library(ggplot2)
plot_cars <- function(data = mtcars, x, y) {
data %>%
ggplot(aes_string(x, y)) +
geom_point()
}
plot_cars(x = "mpg", y = "cyl")
In your example above you'd call ggplot(redData, aes_string(x, y))..., though I don't have your data to test that.
I am trying to add multiple lines to a plot using a for loop. Using i as a variable to navigate in my data, for some reason, I get an error in the ggplot part all of a sudden, saying that object i is not found ("Error in as.name(names(data$SURF)[i]) : object 'i' not found"). It seems like only those variables are accepted that are linked to the data frame given to ggplot. I have used different variables in ggplots before though, so I don't know why it doesn't work now. This is my code:
library(ggplot2)
#creating object 'data' for example purpose
surf <- list()
day <- c(1:7)
for(i in 1:5){
surf[[i]] <- data.frame(day*i, day-i)
names(surf)[i] <- paste("var",i,sep = "")
colnames(surf[[i]]) <- c("T0", "whatever")
}
data <- list(surf)
names(data)[1] <- "SURF"
df <- data.frame(day)
ret <- ggplot(df, aes(x=day))
for(i in 1:length(names(data$SURF))){
df[,i+1] <- data$SURF[[i]]$T0
colnames(df)[i+1] <- names(data$SURF)[i]
ret <- ret + geom_line(data = df, aes(y=as.name(names(data$SURF)[i]), colour= names(data$SURF)[i]))
}
I managed to solve the problem in a not fully pleasing way, by omitting new variables in the plot part. I am not fully content with this solution though, because I want to keep working from a more automated code. This is the "dirty" solution:
df <- data.frame(day)
ret <- ggplot(df, aes(x=day))
for(i in 1:length(names(data$SURF))){
df$y <- data$SURF[[i]]$T0
df$name <- names(data$SURF[i])
ret <- ret + geom_line(data = df, aes(y=y, colour= name))
}
I'd be grateful if someone could help me figure out why the use of 'external' variable i does not work in this example.
Thanks for the help. I see, my head was thinking in the wrong direction and way too complicated. Using the melt() function of the reshape2 package really gives me everything I need for this purpose. For anyone interested, here is how I solved my problem:
library(reshape2)
library(ggplot2)
for(i in 1:length(data$SURF)){
df[,i+1] <- data$SURF[[i]]$T0
}
colnames(df) <- c("day", names(data$SURF))
mlt <- melt(data = df, id.vars = "day")
ret <- ggplot(mlt) +
aes(x=day, y=value, group=variable, colour = variable) +
geom_line()
I have a df with multiple y-series which I want to plot individually, so I wrote a fn that selects one particular series, assigns to a local variable dat, then plots it. However ggplot/geom_step when called inside the fn doesn't treat it properly like a single series. I don't see how this can be a scoping issue, since if dat wasn't visible, surely ggplot would fail?
You can verify the code is correct when executed from the toplevel environment, but not inside the function. This is not a duplicate question. I understand the problem (this is a recurring issue with ggplot), but I've read all the other answers; this is not a duplicate and they do not give the solution.
set.seed(1234)
require(ggplot2)
require(scales)
N = 10
df <- data.frame(x = 1:N,
id_ = c(rep(20,N), rep(25,N), rep(33,N)),
y = c(runif(N, 1.2e6, 2.9e6), runif(N, 5.8e5, 8.9e5) ,runif(N, 2.4e5, 3.3e5)),
row.names=NULL)
plot_series <- function(id_, envir=environment()) {
dat <- subset(df,id_==id_)
p <- ggplot(data=dat, mapping=aes(x,y), color='red') + geom_step()
# Unsuccessfully trying the approach from http://stackoverflow.com/questions/22287498/scoping-of-variables-in-aes-inside-a-function-in-ggplot
p$plot_env <- envir
plot(p)
# Displays wrongly whether we do the plot here inside fn, or return the object to parent environment
return(p)
}
# BAD: doesn't plot geom_step!
plot_series(20)
# GOOD! but what's causing the difference?
ggplot(data=subset(df,id_==20), mapping=aes(x,y), color='red') + geom_step()
#plot_series(25)
#plot_series(33)
This works fine:
plot_series <- function(id_) {
dat <- df[df$id_ == id_,]
p <- ggplot(data=dat, mapping=aes(x,y), color='red') + geom_step()
return(p)
}
print(plot_series(20))
If you simply step through the original function using debug, you'll quickly see that the subset line did not actually subset the data frame at all: it returned all rows!
Why? Because subset uses non-standard evaluation and you used the same name for both the column name and the function argument. As jlhoward demonstrates above, it would have worked (but probably not been advisable) to have simply used different names for the two.
The reason is that subset evaluates with the data frame first. So all it sees in the logical expression is the always true id_ == id_ within that data frame.
One way to think about it is to play dumb (like a computer) and ask yourself when presented with the condition id_ == id_ how do you know what exactly each symbol refers to. It's ambiguous, and subset makes a consistent choice: use what's in the data frame.
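To make that concrete, here is a minimal sketch with a made-up four-row data frame. pick_subset and pick_index are hypothetical names used only for this illustration.

```r
# Made-up data: two rows with id_ 20, one with 25, one with 33
df <- data.frame(id_ = c(20, 20, 25, 33), y = 1:4)

# Both sides of `==` resolve to the column df$id_, so every row passes
pick_subset <- function(id_) subset(df, id_ == id_)

# Standard evaluation: df$id_ is the column, the bare id_ is the argument
pick_index <- function(id_) df[df$id_ == id_, ]

n_subset <- nrow(pick_subset(20))   # 4: nothing was filtered out
n_index  <- nrow(pick_index(20))    # 2: only the rows where id_ is 20
```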
Notwithstanding the comments, this works:
plot_series <- function(z, envir=environment()) {
dat <- subset(df,id_==z)
p <- ggplot(data=dat, mapping=aes(x,y), color='red') + geom_step()
p$plot_env <- envir
plot(p)
# Displays wrongly whether we do the plot here inside fn, or return the object to parent environment
return(p)
}
plot_series(20)
The problem seems to be that subset is interpreting id_ on the RHS of the == as identical to the LHS, so this is equivalent to subsetting on TRUE, which of course includes all the rows of df. That's the plot you are seeing.
I'm trying to write my first function in R. I have a dataframe with an indeterminate number of columns, and I want to create a ggplot of each pair of columns. For example, columns 1&2, 1&3, 1&4 etc.
However, when I try the following function I get the object not found error, but only when we get to the ggplot portion.
Thanks,
BrandPlot=function(Brand){
NoCol=ncol(Brand)
count=2
while (count<=NoCol){
return(ggplot(Brand, aes(x=Brand[,1], y=Brand[,count]))+geom_point())
count=(count+1)
}
}
To clarify,
I'm trying to get the following effect (I also plan on adding additional things like geom_smooth(), but I want to get it working first):
ggplot(Brand, aes(x=Brand[,1], y=Brand[,2]))+geom_point()
ggplot(Brand, aes(x=Brand[,1], y=Brand[,3]))+geom_point()
ggplot(Brand, aes(x=Brand[,1], y=Brand[,4]))+geom_point()
ggplot(Brand, aes(x=Brand[,1], y=Brand[,5]))+geom_point()
Per the note above, something like this may be what you're looking for...
brandplot <- function(x){
require(reshape2)
require(ggplot2)
x_melt <- melt(x, id.vars = names(x)[1])
ggplot(x_melt,
aes_string(x = names(x_melt)[1],
y = 'value',
group = 'variable')) +
geom_point() +
facet_wrap( ~ variable)
}
dat <- data.frame(a = sample(1:10, 25, T),
b = sample(20:30, 25, T),
c = sample(40:50, 25, T))
brandplot(dat)
[Note: @maloneypatr's solution is a better way to use ggplot for your application].
To answer your question directly, there are a couple of problems.
Your function returns after the first run through the loop (i.e., count=2), so you will never get more than one plot from this.
ggplot evaluates arguments to aes(...) in the context of the data frame defined in data=..., so it is looking for something like Brand$Brand (i.e., a column named Brand in the dataframe Brand). Since there is no such column, you get the Object not found error.
The following code will generate a series of n-1 plots where n = ncol(Brand).
BrandPlot=function(Brand){
for (count in 2:ncol(Brand)){
ggp <- ggplot(Brand, aes_string(x=names(Brand)[1], y=names(Brand)[count]))
ggp <- ggp + geom_point()
ggp <- ggp + ggtitle(paste(names(Brand)[count], " vs. ", names(Brand)[1]))
plot(ggp)
}
}
While trying to overlay a new line on an existing ggplot, I am getting the following error:
Error: ggplot2 doesn't know how to deal with data of class uneval
The first part of my code works fine. Below is an image of "recent" hourly wind generation data from a Midwestern United States electric power market.
Now I want to overlay the last two days' worth of observations in red. It should be easy, but I can't figure out why I am getting an error.
Any assistance would be greatly appreciated.
Below is a reproducible example:
library(ggplot2)
library(scales)  # for comma
# Read in Wind data
fname <- "https://www.midwestiso.org/Library/Repository/Market%20Reports/20130510_hwd_HIST.csv"
df <- read.csv(fname, header=TRUE, sep="," , skip=7)
df <- df[1:(length(df$MKTHOUR)-5),]
# format variables
df$MWh <- as.numeric(df$MWh)
df$Datetime <- strptime(df$MKTHOUR, "%m/%d/%y %I:%M %p")
# Create some variables
df$Date <- as.Date(df$Datetime)
df$HrEnd <- df$Datetime$hour+1
# Subset recent and last data
last.obs <- range(df$Date)[2]
df.recent <- subset(df, Date %in% seq(last.obs-30, last.obs-2, by=1))
df.last <- subset(df, Date %in% seq(last.obs-2, last.obs, by=1))
# plot recent in Grey
p <- ggplot(df.recent, aes(HrEnd, MWh, group=factor(Date))) +
geom_line(color="grey") +
scale_y_continuous(labels = comma) +
scale_x_continuous(breaks = seq(1,24,1)) +
labs(y="MWh") +
labs(x="Hour Ending") +
labs(title="Hourly Wind Generation")
p
# plot last two days in Red
p <- p + geom_line(df.last, aes(HrEnd, MWh, group=factor(Date)), color="red")
p
When you add a new data set to a geom you need to use the data= argument, or put the arguments in the proper order: mapping=..., data=.... Take a look at the arguments in ?geom_line.
Thus:
p + geom_line(data=df.last, aes(HrEnd, MWh, group=factor(Date)), color="red")
Or:
p + geom_line(aes(HrEnd, MWh, group=factor(Date)), df.last, color="red")
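If you want to check the argument order yourself, here is a small sketch. The two toy data frames (recent, last) are made up and only stand in for df.recent and df.last.

```r
library(ggplot2)

# The first two formal arguments of geom_line() are mapping, then data,
# so an unnamed data frame in first position is taken as the mapping.
first_two <- names(formals(geom_line))[1:2]

# Made-up stand-ins for df.recent and df.last
recent <- data.frame(hour = 1:4, mwh = c(10, 12, 11, 13))
last   <- data.frame(hour = 1:4, mwh = c(14, 15, 13, 16))

p  <- ggplot(recent, aes(hour, mwh)) + geom_line(colour = "grey")
# Naming data= removes any dependence on argument position
p2 <- p + geom_line(data = last, aes(hour, mwh), colour = "red")
```

The second layer of p2 then draws from last rather than from the data passed to ggplot().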
Another cause is accidentally putting the data=... inside the aes(...) instead of outside:
RIGHT:
ggplot(data=df[df$var7=='9-06',], aes(x=lifetime,y=rep_rate,group=mdcp,color=mdcp) ...)
WRONG:
ggplot(aes(data=df[df$var7=='9-06',],x=lifetime,y=rep_rate,group=mdcp,color=mdcp) ...)
In particular, this can happen when you prototype your plot command with qplot(), which doesn't use an explicit aes(), then edit/copy-and-paste it into a ggplot() call:
qplot(data=..., x=...,y=..., ...)
ggplot(data=..., aes(x=...,y=...,...))
It's a pity ggplot's error message isn't Missing 'data' argument! instead of this cryptic nonsense, because that's what this message often means.
This could also occur if you refer to a variable in the data.frame that doesn't exist. For example, recently I forgot to tell ddply to summarize by one of my variables that I used in geom_line to specify line color. Then, ggplot didn't know where to find the variable I hadn't created in the summary table, and I got this error.