While trying to overlay a new line onto an existing ggplot, I am getting the following error:
Error: ggplot2 doesn't know how to deal with data of class uneval
The first part of my code works fine. Below is an image of "recent" hourly wind generation data from a Midwestern United States electric power market.
Now I want to overlay the last two days' worth of observations in red. It should be easy, but I can't figure out why I am getting an error.
Any assistance would be greatly appreciated.
Below is a reproducible example:
library(ggplot2)
library(scales) # provides comma(), used for the y-axis labels below
# Read in Wind data
fname <- "https://www.midwestiso.org/Library/Repository/Market%20Reports/20130510_hwd_HIST.csv"
df <- read.csv(fname, header=TRUE, sep="," , skip=7)
df <- df[1:(length(df$MKTHOUR)-5),]
# format variables
df$MWh <- as.numeric(df$MWh)
df$Datetime <- strptime(df$MKTHOUR, "%m/%d/%y %I:%M %p")
# Create some variables
df$Date <- as.Date(df$Datetime)
df$HrEnd <- df$Datetime$hour+1
# Subset recent and last data
last.obs <- range(df$Date)[2]
df.recent <- subset(df, Date %in% seq(last.obs-30, last.obs-2, by=1))
df.last <- subset(df, Date %in% seq(last.obs-2, last.obs, by=1))
# plot recent in Grey
p <- ggplot(df.recent, aes(HrEnd, MWh, group=factor(Date))) +
geom_line(color="grey") +
scale_y_continuous(labels = comma) +
scale_x_continuous(breaks = seq(1,24,1)) +
labs(y="MWh") +
labs(x="Hour Ending") +
labs(title="Hourly Wind Generation")
p
# plot last two days in Red
p <- p + geom_line(df.last, aes(HrEnd, MWh, group=factor(Date)), color="red")
p
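When you add a new data set to a geom, you need to use the data= argument, or else pass the arguments in the order the function expects: mapping first, then data. Otherwise the data frame is matched to mapping and the aes() object (class uneval) is matched to data, which is what the error message is complaining about. Take a look at the arguments in ?geom_line.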
Thus:
p + geom_line(data=df.last, aes(HrEnd, MWh, group=factor(Date)), color="red")
Or:
p + geom_line(aes(HrEnd, MWh, group=factor(Date)), df.last, color="red")
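For anyone who wants to try this without downloading the wind data, here is a minimal sketch of the same fix. The data frames base_df and last_df and their columns are made up for illustration; they only stand in for df.recent and df.last.

```r
library(ggplot2)

# Hypothetical toy data standing in for df.recent and df.last
base_df <- data.frame(hour = rep(1:3, 2),
                      mwh  = c(5, 6, 7, 4, 6, 8),
                      day  = rep(c("a", "b"), each = 3))
last_df <- data.frame(hour = 1:3, mwh = c(9, 8, 10), day = "c")

# The first two formal arguments of geom_line() are mapping and data,
# in that order, so an unnamed data frame in first position is taken
# as the mapping.
arg_order <- names(formals(geom_line))[1:2]
print(arg_order)

# Naming data= avoids the mix-up; the layer inherits the aesthetics
# (hour, mwh, group = day) from the ggplot() call.
p <- ggplot(base_df, aes(hour, mwh, group = day)) +
  geom_line(colour = "grey") +
  geom_line(data = last_df, colour = "red")
p
```

Because the second layer reuses the plot's aesthetics, last_df only needs to have the same column names as base_df.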
Another cause is accidentally putting the data=... inside the aes(...) instead of outside:
RIGHT:
ggplot(data=df[df$var7=='9-06',], aes(x=lifetime,y=rep_rate,group=mdcp,color=mdcp) ...)
WRONG:
ggplot(aes(data=df[df$var7=='9-06',],x=lifetime,y=rep_rate,group=mdcp,color=mdcp) ...)
In particular, this can happen when you prototype your plot command with qplot(), which doesn't use an explicit aes(), and then edit or copy-and-paste it into a ggplot() call:
qplot(data=..., x=...,y=..., ...)
ggplot(data=..., aes(x=...,y=...,...))
It's a pity ggplot's error message isn't "Missing 'data' argument!" instead of this cryptic nonsense, because that's what this message often means.
This could also occur if you refer to a variable in the data.frame that doesn't exist. For example, recently I forgot to tell ddply to summarize by one of my variables that I used in geom_line to specify line color. Then, ggplot didn't know where to find the variable I hadn't created in the summary table, and I got this error.
I create a dummy time series xts object with missing data on date 2-09-2015 as:
library(xts)
library(ggplot2)
library(scales)
set.seed(123)
seq <- seq(as.POSIXct("2015-09-01"),as.POSIXct("2015-09-02"), by = "1 hour")
ob1 <- xts(rnorm(length(seq),150,5),seq)
seq2 <- seq(as.POSIXct("2015-09-03"),as.POSIXct("2015-09-05"), by = "1 hour")
ob2 <- xts(rnorm(length(seq2),170,5),seq2)
final_ob <- rbind(ob1,ob2)
plot(final_ob)
# with ggplot
df <- data.frame(time = index(final_ob), val = coredata(final_ob) )
ggplot(df, aes(time, val)) + geom_line()+ scale_x_datetime(labels = date_format("%Y-%m-%d"))
After plotting my data looks like this:
The red coloured rectangular portion represents the date on which data is missing. How should I show that data was missing on this day in the main plot?
I think I should show this missing data with a different colour, but I don't know how I should process the data to reflect the missing-data behaviour in the main plot.
Thanks for the great reproducible example.
I think you are best off to omit that line in your "missing" portion. If you have a straight line (even in a different colour) it suggests that data was gathered in that interval, that happened to fall on that straight line. If you omit the line in that interval then it is clear that there is no data there.
The problem is that you want the hourly data to be connected by lines, and then no lines in the "missing data section" - so you need some way to detect that missing data section.
You have not given a criterion for this in your question, so based on your example I will say that each line on the plot should consist of data at hourly intervals; if there's a break of more than an hour, then there should be a new line. You will have to adjust this criterion to your specific problem. All we're doing is splitting up your dataframe into pieces that each get plotted as one line.
So first create a variable that says which "group" (i.e. line) each data point is in:
df$grp <- factor(c(0, cumsum(diff(df$time) > 1)))
Then you can use the group= aesthetic which geom_line uses to split up lines:
ggplot(df, aes(time, val)) + geom_line(aes(group=grp)) + # <-- only change
scale_x_datetime(labels = date_format("%Y-%m-%d"))
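One caveat with the grouping line: diff() on POSIXct values returns a difftime whose units are picked automatically, so "> 1" means "more than one hour" only when the units happen to come out as hours (as they do for this hourly example). A minimal sketch that pins the units explicitly, using a made-up time vector rather than the question's data:

```r
# Hypothetical hourly series with a gap (values are illustrative only)
time <- c(seq(as.POSIXct("2015-09-01 00:00", tz = "UTC"), by = "1 hour", length.out = 3),
          seq(as.POSIXct("2015-09-03 00:00", tz = "UTC"), by = "1 hour", length.out = 3))

# Convert the gaps to hours explicitly instead of relying on automatic units
gap_hours <- as.numeric(diff(time), units = "hours")

# Start a new group whenever the gap to the previous point exceeds one hour
grp <- factor(c(0, cumsum(gap_hours > 1)))
print(data.frame(time, grp))
```

The resulting grp can be used in aes(group=grp) exactly as above; the one-hour threshold is still an assumption to adjust for your own data.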
Using the plot() function, is it possible to change the line type over a certain interval (e.g. from x=1 to x=2) and leave the rest of the plot as another line type?
I know I could use lines() multiple times for the same effect, but I'm wondering if there's an easier way.
How about using ggplot instead?
library(ggplot2)
data <- data.frame(matrix(rnorm(20),20))
names(data) <- "series"
data <- data.frame(cbind(Index=1:nrow(data),data))
# flag the interval to highlight (Index 4 to 7)
data$Col <- data$Index < 8 & data$Index > 3
ggplot(data, aes(x=Index,y=series,color=factor(Col))) +
geom_line(aes(group=1),size=1) +
guides(colour="none") # hide the legend
I am looking to resolve an error that I encounter when trying to use direct.label to label a ggplot with only one series. Below is an example to illustrate how direct.label fails if there is only a single series.
In my real data, I am looping through regions and want to use direct labels on the sub-regions. However, some of the regions have only one sub-region, which results in an error when using direct.label. Any assistance would be greatly appreciated.
library(ggplot2)
library(directlabels)
# sample data from ggplot2 movies data
# (since ggplot2 2.0.0 the movies data set lives in the ggplot2movies package)
mry <- do.call(rbind, by(movies, round(movies$rating), function(df) {
nums <- tapply(df$length, df$year, length)
data.frame(rating=round(df$rating[1]), year = as.numeric(names(nums)), number=as.vector(nums))
}))
# use direct labels to label based on rating
p <- ggplot(mry, aes(x=year, y=number, group=rating, color=rating)) + geom_line()
direct.label(p, "last.bumpup")
# subset to only a single rating
mry2 = subset(mry, rating==10)
p2 <- ggplot(mry2, aes(x=year, y=number, group=rating, color=rating)) + geom_line()
p2
# direct labels fails when attempting to label plot with a single series
direct.label(p2, "last.bumpup")
This was indeed a bug; the package maintainer has already fixed it. To obtain the updated version:
install.packages("directlabels", repos="http://r-forge.r-project.org")
I've just checked, everything now runs fine. Nice catch!
Using this toy example:
ddd <- c("31/03/1995","30/04/1995","31/05/1995","31/08/2013","30/09/2013","31/10/2013","30/11/2013")
rrr <- c("returns.1","returns.1","returns.1","returns.5","returns.5","returns.5","returns.5")
vvv <- c(-0.204598992791177,3.01855013302475,6.3888266761452,-1.21353731479968,7.20845451481339,3.97428317355226,0.0155720962396065)
df <- cbind(ddd,rrr,vvv)
df <- as.data.frame(df)
# go through as.character() so the values survive if vvv was read as a factor
df$vvv <- as.numeric(as.character(df$vvv))
I am able to plot my data (with no legend/labels for the lines) using the code below:
ggplot(data=df,aes(x=ddd,y=vvv)) +
geom_line(aes(group=rrr))
But I wish to add colours/legends, so I modify the above to:
ggplot(data=df,aes(x=ddd,y=vvv)) +
geom_line(aes(group=rrr)) +
geom_line(aes(colour=rrr))
But this returns the following error:
Error in x[1:min(n, length(x))] :
only 0's may be mixed with negative subscripts
Any idea what the issue is, or what I should check in my data frame?
Because ddd is a factor, you need to use both group and colour on the same layer. In your example, you are using two layers, one without group, and one without colour:
ggplot(data=df,aes(x=ddd,y=vvv)) +
geom_line(aes(group=rrr, colour=rrr))
An alternative is to convert your dates into date format instead of factor, though with your dates so far apart you may want to facet by rrr (also, note this way stuff actually shows up in the correct order):
df$ddd <- as.Date(df$ddd, format="%d/%m/%Y")
ggplot(data=df,aes(x=ddd,y=vvv)) +
geom_line(aes(colour=rrr)) +
facet_wrap(~ rrr, scales="free_x")
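To see why the conversion changes the ordering, here is a small base R sketch using a few of the date strings from the question. As text (and therefore as factor levels) the dates sort alphabetically by day of month; as Date values they sort chronologically.

```r
# A few of the date strings from the question, in chronological order
ddd <- c("31/03/1995", "30/04/1995", "31/05/1995", "31/08/2013")

# As a factor, the levels are sorted as text, so "30/..." comes before "31/..."
text_order <- levels(factor(ddd))
print(text_order)

# As Date objects, sorting follows the calendar
dates <- as.Date(ddd, format = "%d/%m/%Y")
date_order <- format(sort(dates), "%d/%m/%Y")
print(date_order)
```

This is the reordering a discrete x-axis built from the factor would show, and it is what the as.Date() call avoids.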
I can't find a way to ask ggplot2 to show an empty level in a boxplot without imputing my dataframe with actual missing values.
Here is reproducible code:
# fake data
dftest <- expand.grid(time=1:10,measure=1:50)
dftest$value <- rnorm(dim(dftest)[1],3+0.1*dftest$time,1)
# and let's suppose we didn't observe anything at time 2
# doesn't work even when forcing with factor(..., levels=...)
p <- ggplot(data=dftest[dftest$time!=2,],aes(x=factor(time,levels=1:10),y=value))
p + geom_boxplot()
# the only way seems to be to have at least one actual missing value in the dataframe
dftest2 <- dftest
dftest2[dftest2$time==2,"value"] <- NA
p <- ggplot(data=dftest2,aes(x=factor(time),y=value))
p + geom_boxplot()
So I guess I'm missing something. This is not a problem when dealing with a balanced experiment where these missing data might be explicit in the dataframe. But with observed data in a cohort for example, it means imputing the data with missing values for unobserved combinations...
Thanks for your help.
You can control the breaks in a suitable scale function, in this case scale_x_discrete. Make sure you use the argument drop=FALSE:
p <- ggplot(data=dftest[dftest$time!=2,],aes(x=factor(time,levels=1:10),y=value))
p + geom_boxplot() +
scale_x_discrete("time", breaks=factor(1:10), drop=FALSE)
I like to do my data manipulation in advance of sending it to ggplot. I think this makes the code more readable. This is how I would do it myself, but the results are the same. Note, however, that the ggplot scale gets much simpler, since you don't have to specify the breaks:
dfplot <- dftest[dftest$time!=2, ]
dfplot$time <- factor(dfplot$time, levels=1:10)
ggplot(data=dfplot, aes(x=time ,y=value)) +
geom_boxplot() +
scale_x_discrete("time", drop=FALSE)
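As a quick check of what drop=FALSE relies on, here is a base R sketch with a made-up vector showing that a factor remembers a level even when no rows use it. The scale drops such unused levels by default, which is why the empty slot disappears without drop=FALSE.

```r
# Hypothetical observations with nothing recorded at time 2
time <- c(1, 3, 4, 4)

# Declaring the levels keeps 2 as a valid, but empty, category
f <- factor(time, levels = 1:4)
counts <- table(f)
print(counts)

# droplevels() discards it, much as the default scale behaviour does
n_after_drop <- nlevels(droplevels(f))
print(n_after_drop)
```

So the information about the empty level lives in the factor itself; drop=FALSE only tells scale_x_discrete to keep it on the axis.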