Understanding dates and plotting with ggplot2 in R

I have a CSV file called gdata.csv, with data like:
id,date,totKm,eLiter,euros,liters,km
1,24-04-2010,23678,1.180,42.00,35.59,450
2,16-05-2010,24058,1.200,43.00,35.83,380
3,27-05-2010,24488,1.160,44.00,37.93,430
4,12-06-2010,24960,1.180,45.00,38.14,472
With ggplot2
I just want to plot date and eLiter in a line chart with ggplot2, with this code:
x_date <- as.Date(gdata$date, format = "%d-%m-%Y")
ggplot(eliter, aes(x_date, eliter)) + geom_line()
But it returns this error related to the class:
Error: ggplot2 doesn't know how to deal with data of class numeric
I have tried to make a data.frame, but it still returns the error:
d <- data.frame(xdate = x_date, yeliter=gdata$eLiter)
ggplot(d$xdate, aes(d$xdate, d$yeliter)) + geom_line()
Error: ggplot2 doesn't know how to deal with data of class Date
With plot
I have managed to do this with plot() function:
plot(gdata$eLiter~as.Date(gdata$date, "%d-%m-%Y"), type = "s", xlab="Date",ylab="€/Liter", main="€/liter trend", col='blue')
And it works fine! But I cannot do it with ggplot.
Could anyone help me?
Thank you very much.

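ggplot() expects a data frame as its first argument, with the column names given inside aes(). Convert the date column to class Date, pass the whole data frame, and add + scale_x_date() like this: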
Lines <- "id,date,totKm,eLiter,euros,liters,km
1,24-04-2010,23678,1.180,42.00,35.59,450
2,16-05-2010,24058,1.200,43.00,35.83,380
3,27-05-2010,24488,1.160,44.00,37.93,430
4,12-06-2010,24960,1.180,45.00,38.14,472"
DF <- read.csv(text = Lines)
DF$date <- as.Date(DF$date, "%d-%m-%Y")
library(ggplot2)
ggplot(DF, aes(date, eLiter)) +
geom_line() +
scale_x_date()
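Both errors in the question come from passing a vector (first a numeric column, then a Date column) where ggplot() expects a data frame. The following is a minimal sketch of a check to run before plotting, using two rows copied from the sample data and base R only. The variable name gdata follows the question; first_arg_ok and date_ok are illustrative names.

```r
# Two rows from the question's gdata.csv, read inline for illustration
gdata <- read.csv(text = "id,date,eLiter
1,24-04-2010,1.180
2,16-05-2010,1.200")

# Convert the day-month-year strings to class Date inside the data frame
gdata$date <- as.Date(gdata$date, format = "%d-%m-%Y")

# ggplot() needs the data frame itself as its first argument
first_arg_ok <- is.data.frame(gdata)
date_ok <- inherits(gdata$date, "Date") && !anyNA(gdata$date)
print(c(first_arg_ok = first_arg_ok, date_ok = date_ok))
```

If either check is FALSE, fix the input before calling ggplot(gdata, aes(date, eLiter)).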

Related

Problems creating datetime series graph in R using ggplot

I am trying to create a graph with the following characteristics:
x-axis: time and date
y-axis: data
here you can download my dataframe: https://my.cloudme.com/josechka/data
I try to produce the graph using:
p <- ggplot(data, aes(x = Date, y = Var, group = 1)) +
geom_line() +
scale_x_date(labels = date_format("%m/%d/%Y")) +
scale_y_continuous(limits = c(0, 70000))
p
And I get the result:
Error: Invalid input: date_trans works with objects of class Date only
I am quite new to R and ggplot. What am I doing wrong?
As suggested, you have to convert the Date column into a Date object.
data$Date<-as.Date(data$Date, format="%d/%m/%Y")
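The format string given to as.Date() describes how the text is stored in the file, which is independent of the label format used later in scale_x_date(). A small base R sketch with a made-up value (the contents of the real file are not shown here) illustrates why the order of %d and %m matters:

```r
# Hypothetical value: 3 April 2014 written day-first
raw <- "03/04/2014"

day_first   <- as.Date(raw, format = "%d/%m/%Y")  # read as 3 April
month_first <- as.Date(raw, format = "%m/%d/%Y")  # read as 4 March

# Both parse without error, so a wrong format can go unnoticed
print(format(day_first, "%Y-%m-%d"))
print(format(month_first, "%Y-%m-%d"))
```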
Now you can use your script in order to create the plot:
library("ggplot2")
library("scales")
p <- ggplot(data, aes(x = Date, y = Var, group = 1)) +
geom_line() +
scale_x_date(labels = date_format("%m/%d/%Y")) +
scale_y_continuous(limits = c(0, 70000))
p
And this is the resulting plot:
Thanks for the comments; they helped me find the solution. Both suggestions let me plot my data. However, there is a small problem: data from the same day are grouped together, so the daily behaviour of the variable is not visible. I tried converting the Date column with the following command instead:
data$Date <- as.POSIXct(data$Date, format="%d/%m/%Y %H:%M:%S")
It worked. However, the original data must be in the format d/m/Y h:m:s. Thanks very much for the comments, which helped me a lot in solving my problem.
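The grouping described above happens because a Date holds only the calendar day, while POSIXct also keeps the time of day. The sketch below shows the difference on two made-up timestamps in the layout mentioned; the values and the UTC time zone are assumptions for illustration. With a POSIXct column, the matching ggplot2 scale is scale_x_datetime() rather than scale_x_date().

```r
# Hypothetical timestamps in the d/m/Y h:m:s layout described above
stamps <- c("01/03/2014 06:00:00", "01/03/2014 18:30:00")

# as.Date() keeps only the calendar day, so both readings collapse
as_day <- as.Date(stamps, format = "%d/%m/%Y")

# as.POSIXct() keeps the time of day (tz fixed to UTC for reproducibility)
as_time <- as.POSIXct(stamps, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

print(length(unique(as_day)))
print(length(unique(as_time)))
```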

Cannot convert a time variable to plot it on ggplot

I have two problems handling my time variable in GNU R!
Firstly, I cannot convert the time data (downloadable here) from factor (or character) with as.POSIXlt or with as.Date without an error message like this:
character string is not in a standard unambiguous format
I have then tried to convert my time data with:
dates <- strptime(time, "%Y-%m-%j")
which only gives me:
NA
Secondly, the reason why I wanted (had) to convert my time data is that I want to plot it with ggplot2 and adjust my scale_x_continuous (as described here) so that it only labels every 50 years (i.e. 1250-01-01, 1300-01-01, etc.) on the x-axis; otherwise the x-axis is too busy (see graph below).
This is the code I use:
library(ggplot2)
library(scales)
library(reshape)
df <- read.csv(file="https://dl.dropboxusercontent.com/u/109495328/time.csv")
attach(df)
dates <- as.character(time)
population <- factor(Number_Humans)
ggplot(df, aes(x = dates, y = population)) + geom_line(aes(group=1), colour="#000099") + theme(axis.text.x=element_text(angle=90)) + xlab("Time in Years (A.D.)")
You need to remove the quotation marks in the date column, then you can convert it to date format:
df <- read.csv(file="https://dl.dropboxusercontent.com/u/109495328/time.csv")
df$time <- gsub('\"', "", as.character(df$time), fixed=TRUE)
df$time <- as.Date(df$time, "%Y-%m-%d")  # %d is day of month; %j would be day of year
ggplot(df, aes(x = time, y = Number_Humans)) +
geom_line(colour="#000099") +
theme(axis.text.x=element_text(angle=90)) +
xlab("Time in Years (A.D.)")
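The question also asked for one axis label every 50 years, which the code above does not address. The following base R sketch shows one way to build such breaks once the column is a Date. The three input values are made up to mimic the quoted dates in the file, and the names raw, when and breaks_50 are illustrative.

```r
# Hypothetical values mimicking the quoted dates in the file
raw <- c('"1250-01-01"', '"1300-01-01"', '"1400-01-01"')

# Strip the literal quotation marks, then parse year-month-day
clean <- gsub('"', "", raw, fixed = TRUE)
when <- as.Date(clean, format = "%Y-%m-%d")

# One break every 50 years across the observed range
breaks_50 <- seq(min(when), max(when), by = "50 years")
print(format(breaks_50, "%Y"))
```

A vector like breaks_50 could then be supplied as scale_x_date(breaks = breaks_50, date_labels = "%Y") in place of scale_x_continuous().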

How to deal with "data of class uneval" error from ggplot2?

While trying to overlay a new line on an existing ggplot, I am getting the following error:
Error: ggplot2 doesn't know how to deal with data of class uneval
The first part of my code works fine. Below is an image of "recent" hourly wind generation data from a Midwestern United States electric power market.
Now I want to overlay the last two days' worth of observations in red. It should be easy, but I can't figure out why I am getting an error.
Any assistance would be greatly appreciated.
Below is a reproducible example:
# Load the plotting packages (scales provides comma)
library(ggplot2)
library(scales)
# Read in Wind data
fname <- "https://www.midwestiso.org/Library/Repository/Market%20Reports/20130510_hwd_HIST.csv"
df <- read.csv(fname, header=TRUE, sep="," , skip=7)
df <- df[1:(length(df$MKTHOUR)-5),]
# format variables
df$MWh <- as.numeric(df$MWh)
df$Datetime <- strptime(df$MKTHOUR, "%m/%d/%y %I:%M %p")
# Create some variables
df$Date <- as.Date(df$Datetime)
df$HrEnd <- df$Datetime$hour+1
# Subset recent and last data
last.obs <- range(df$Date)[2]
df.recent <- subset(df, Date %in% seq(last.obs-30, last.obs-2, by=1))
df.last <- subset(df, Date %in% seq(last.obs-2, last.obs, by=1))
# plot recent in Grey
p <- ggplot(df.recent, aes(HrEnd, MWh, group=factor(Date))) +
geom_line(color="grey") +
scale_y_continuous(labels = comma) +
scale_x_continuous(breaks = seq(1,24,1)) +
labs(y="MWh") +
labs(x="Hour Ending") +
labs(title="Hourly Wind Generation")
p
# plot last two days in Red
p <- p + geom_line(df.last, aes(HrEnd, MWh, group=factor(Date)), color="red")
p
When you add a new data set to a geom, you need to use the data= argument, or put the arguments in the proper order: mapping=..., data=.... Take a look at the arguments in ?geom_line.
Thus:
p + geom_line(data=df.last, aes(HrEnd, MWh, group=factor(Date)), color="red")
Or:
p + geom_line(aes(HrEnd, MWh, group=factor(Date)), df.last, color="red")
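The positional order can be confirmed from the function itself. A short sketch, guarded so that it only runs when ggplot2 is installed (arg_order is an illustrative name):

```r
# Runs only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  # First two formal arguments of geom_line(), in positional order
  arg_order <- names(formals(ggplot2::geom_line))[1:2]
  print(arg_order)
}
```

Because mapping comes before data in geom_line(), an unnamed data frame in first position is taken as the mapping, which is what triggers the error in the question.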
Another cause is accidentally putting the data=... inside the aes(...) instead of outside:
RIGHT:
ggplot(data=df[df$var7=='9-06',], aes(x=lifetime,y=rep_rate,group=mdcp,color=mdcp) ...)
WRONG:
ggplot(aes(data=df[df$var7=='9-06',],x=lifetime,y=rep_rate,group=mdcp,color=mdcp) ...)
In particular, this can happen when you prototype your plot command with qplot(), which doesn't use an explicit aes(), then edit/copy-and-paste it into a ggplot() call:
qplot(data=..., x=...,y=..., ...)
ggplot(data=..., aes(x=...,y=...,...))
It's a pity ggplot's error message isn't Missing 'data' argument! instead of this cryptic nonsense, because that's what this message often means.
This could also occur if you refer to a variable in the data.frame that doesn't exist. For example, recently I forgot to tell ddply to summarize by one of my variables that I used in geom_line to specify line color. Then, ggplot didn't know where to find the variable I hadn't created in the summary table, and I got this error.
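A quick way to catch the missing-variable case before plotting is to compare the names used in aes() against the columns of the data frame. This is a minimal sketch; missing_vars is a hypothetical helper, and summary_tbl is a toy table standing in for the ddply output.

```r
# Hypothetical helper: report aes() variables absent from a data frame
missing_vars <- function(data, vars) {
  setdiff(vars, names(data))
}

# Toy summary table that lacks the column used for line colour
summary_tbl <- data.frame(HrEnd = 1:3, MWh = c(10, 12, 9))
print(missing_vars(summary_tbl, c("HrEnd", "MWh", "Date")))
```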

Drawing a multiline graph with ggplot2 from a zoo object

Hi all,
I read several previous messages on Stack Overflow and went through the documentation of zoo and ggplot2, but didn't find a suitable answer.
Say I have a zoo object called 'data'. The original data in the flat file are as follows:
Date,Quote1,Quote2,Quote3,Quote4,Quote5
18/07/2008,42.36,44.53,28.4302,44.3,42
21/07/2008,43.14,44.87,28.6186,44.83,43.27
22/07/2008,43.26,44.85,28.6056,44.86,42.84
23/07/2008,44.74,45.61,29.7558,45.69,#N/A
24/07/2008,43.99,45.14,29.2944,45.19,#N/A
25/07/2008,43.18,45.33,29.4569,45.46,43.65
28/07/2008,43.45,44.72,28.5016,44.89,43.31
29/07/2008,43.49,44.8,28.1247,44.88,42.85
30/07/2008,44.55,45.54,28.0727,45.58,43.67
31/07/2008,43.36,45.5,27.9818,45.63,43.91
01/08/2008,43.34,44.75,28.0792,44.69,43.04
Now, I want to plot the time series of these five financial products on a single line graph in order to compare their evolution.
I wish to use ggplot2.
Would anyone be kind enough to give me some hints?
If data is your zoo object then try this (and see ?autoplot.zoo for more info):
library(zoo)
library(ggplot2)
p <- autoplot(data, facets = NULL)
p
or perhaps this since I don't think the automatic varying of linetype looks so good with this many series in the same panel:
p + aes(linetype = NULL)
Here is one way to do it:
df <- read.csv(text = "Date,Quote1,Quote2,Quote3,Quote4,Quote5
18/07/2008,42.36,44.53,28.4302,44.3,42
21/07/2008,43.14,44.87,28.6186,44.83,43.27
22/07/2008,43.26,44.85,28.6056,44.86,42.84
23/07/2008,44.74,45.61,29.7558,45.69,#N/A
24/07/2008,43.99,45.14,29.2944,45.19,#N/A
25/07/2008,43.18,45.33,29.4569,45.46,43.65
28/07/2008,43.45,44.72,28.5016,44.89,43.31
29/07/2008,43.49,44.8,28.1247,44.88,42.85
30/07/2008,44.55,45.54,28.0727,45.58,43.67
31/07/2008,43.36,45.5,27.9818,45.63,43.91
01/08/2008,43.34,44.75,28.0792,44.69,43.04", na.strings = "#N/A")
df$Date <- as.Date(df$Date, format = "%d/%m/%Y")
Create a zoo object:
library(zoo)
dat <- zoo(df[-1], df$Date)
Transform the object to a data frame for ggplot2:
df_new <- data.frame(value = as.vector(dat),
time = time(dat),
quote = rep(names(dat), each = nrow(dat)))
Plot:
library(ggplot2)
ggplot(df_new, aes(y = value, x = time, colour = quote)) + geom_line()
Here's another, slightly different method, using melt from reshape:
# Read your data and format date (as proposed by Sven)
df <- read.csv(text = "Date,Quote1,Quote2,Quote3,Quote4,Quote5
18/07/2008,42.36,44.53,28.4302,44.3,42
21/07/2008,43.14,44.87,28.6186,44.83,43.27
22/07/2008,43.26,44.85,28.6056,44.86,42.84
23/07/2008,44.74,45.61,29.7558,45.69,#N/A
24/07/2008,43.99,45.14,29.2944,45.19,#N/A
25/07/2008,43.18,45.33,29.4569,45.46,43.65
28/07/2008,43.45,44.72,28.5016,44.89,43.31
29/07/2008,43.49,44.8,28.1247,44.88,42.85
30/07/2008,44.55,45.54,28.0727,45.58,43.67
31/07/2008,43.36,45.5,27.9818,45.63,43.91
01/08/2008,43.34,44.75,28.0792,44.69,43.04", na.strings = "#N/A")
df$Date <- as.Date(df$Date, format = "%d/%m/%Y")
library(reshape)
library(ggplot2)
# reshape the quote columns with melt (one row per quote and date)
melted <- melt(df[-1])
# add dates, repeated once for each quote series
melted2 <- cbind(Date = rep(df$Date, times = ncol(df) - 1), melted)
# plot with ggplot
ggplot(melted2, aes(y = value, x = Date, color = variable)) + geom_line()
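Both answers reshape the wide table into one row per date and series. As a hedged alternative, the sketch below does the same reshaping in base R only, without zoo or reshape, on a three-row, two-series excerpt of the question's data. The names wide and long are illustrative.

```r
# Three rows and two series from the question's data
wide <- read.csv(text = "Date,Quote1,Quote2
18/07/2008,42.36,44.53
21/07/2008,43.14,44.87
22/07/2008,43.26,44.85")
wide$Date <- as.Date(wide$Date, format = "%d/%m/%Y")

# stack() puts all series in one 'values' column with an 'ind' label column
long <- data.frame(Date = rep(wide$Date, times = ncol(wide) - 1),
                   stack(wide[-1]))
print(long)
```

The result can be plotted with ggplot(long, aes(Date, values, colour = ind)) + geom_line().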

R + ggplot2: how to hide missing dates from x-axis?

Say we have the following simple data-frame of date-value pairs, where some dates are missing in the sequence (i.e. Jan 12 thru Jan 14). When I plot the points, it shows these missing dates on the x-axis, but there are no points corresponding to those dates. I want to prevent these missing dates from showing up in the x-axis, so that the point sequence has no breaks. Any suggestions on how to do this? Thanks!
dts <- c(as.Date( c('2011-01-10', '2011-01-11', '2011-01-15', '2011-01-16')))
df <- data.frame(dt = dts, val = seq_along(dts))
ggplot(df, aes(dt,val)) + geom_point() +
scale_x_date(date_labels = '%d%b', date_breaks = '1 day')
I made a package that does this. It's called bdscale and it's on CRAN and github. Shameless plug.
To replicate your example:
> library(bdscale)
> library(ggplot2)
> library(scales)
> dts <- as.Date( c('2011-01-10', '2011-01-11', '2011-01-15', '2011-01-16'))
> df <- data.frame(dt = dts, val = seq_along(dts))
> ggplot(df, aes(x=dt, y=val)) + geom_point() +
scale_x_bd(business.dates=dts, labels=date_format('%d%b'))
But what you probably want is to load known valid dates, then plot your data using the valid dates on the x-axis:
> nyse <- bdscale::yahoo('SPY') # get valid dates from SPY prices
> dts <- as.Date('2011-01-10') + 1:10
> df <- data.frame(dt=dts, val=seq_along(dts))
> ggplot(df, aes(x=dt, y=val)) + geom_point() +
scale_x_bd(business.dates=nyse, labels=date_format('%d%b'), max.major.breaks=10)
Warning message:
Removed 3 rows containing missing values (geom_point).
The warning is telling you that it removed three dates:
15th = Saturday
16th = Sunday
17th = MLK Day
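Two of the three removed dates can be identified from the weekday alone, as the base R sketch below shows for the same ten dates. The holiday on the 17th is a Monday, so it cannot be found this way, which is why a calendar of valid dates is loaded instead. The name weekend is illustrative.

```r
# The ten dates from the example above: 11 to 20 January 2011
dts <- as.Date("2011-01-10") + 1:10

# "%u" gives the ISO weekday number as text: "1" = Monday ... "7" = Sunday
weekend <- dts[format(dts, "%u") %in% c("6", "7")]
print(weekend)
```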
Turn the date data into a factor then. At the moment, ggplot is interpreting the data in the sense you have told it the data are in - a continuous date scale. You don't want that scale, you want a categorical scale:
require(ggplot2)
dts <- as.Date( c('2011-01-10', '2011-01-11', '2011-01-15', '2011-01-16'))
df <- data.frame(dt = dts, val = seq_along(dts))
ggplot(df, aes(dt,val)) + geom_point() +
scale_x_date(date_labels = '%d%b', date_breaks = '1 day')
versus
df <- data.frame(dt = factor(format(dts, format = '%d%b')),
val = seq_along(dts))
ggplot(df, aes(dt,val)) + geom_point()
which produces:
Is that what you wanted?
The first question is: why do you want to do that? There is no point in showing a coordinate-based plot if your axes are not coordinates. If you really want to do this, you can convert to a factor. Be careful with the order, though:
dts <- c(as.Date( c('31-10-2011', '01-11-2011', '02-11-2011',
'05-11-2011'),format="%d-%m-%Y"))
dtsf <- format(dts, format= '%d%b')
df <- data.frame(dt=ordered(dtsf,levels=dtsf),val=seq_along(dts))
ggplot(df, aes(dt,val)) + geom_point()
With factors you have to be careful: the levels are sorted alphabetically by default unless you specify them yourself, for example with an ordered factor. This can get you into trouble with some date formats. If you don't take the order into account, you get:
df <- data.frame(dt=factor(dtsf),val=seq_along(dts))
ggplot(df, aes(dt,val)) + geom_point()
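The difference between the two factors can be seen without plotting by comparing their levels. A base R sketch using the same four dates (month abbreviations depend on the locale, but the leading day numbers decide the sort order here); alphabetical and chronological are illustrative names:

```r
dts <- as.Date(c("31-10-2011", "01-11-2011", "02-11-2011", "05-11-2011"),
               format = "%d-%m-%Y")
dtsf <- format(dts, format = "%d%b")

# Default factor: levels sorted as text, so the October date moves last
alphabetical <- levels(factor(dtsf))

# Explicit levels: order of appearance, which here is chronological
chronological <- levels(factor(dtsf, levels = dtsf))

print(alphabetical)
print(chronological)
```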
