I am trying to create a graph with the following characteristics:
x-axis: time and date
y-axis: data
here you can download my dataframe: https://my.cloudme.com/josechka/data
I try to produce the graph using:
p <- ggplot(data, aes(x = Date, y = Var, group = 1)) +
  geom_line() +
  scale_x_date(labels = date_format("%m/%d/%Y")) +
  scale_y_continuous(limits = c(0, 70000))
p
And I get the result:
Error: Invalid input: date_trans works with objects of class Date only
I am quite new in R and ggplot. What am I doing wrong?
As suggested you have to format the Date column into a Date object.
data$Date<-as.Date(data$Date, format="%d/%m/%Y")
Now you can use your script in order to create the plot:
library("ggplot2")
library("scales")
p <- ggplot(data, aes(x = Date, y = Var, group = 1)) +
  geom_line() +
  scale_x_date(labels = date_format("%m/%d/%Y")) +
  scale_y_continuous(limits = c(0, 70000))
p
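Before plotting, it can help to confirm that the conversion actually produced Date values. The following is only a sketch: the sample strings are made up (assumed to be in d/m/Y h:m:s form) and stand in for the real data$Date column.

```r
# Hypothetical sample values standing in for data$Date
raw <- c("01/02/2015 00:10:00", "01/02/2015 12:30:00", "02/02/2015 06:45:00")

# Matching format: parsed as Date (the time-of-day part is dropped)
d <- as.Date(raw, format = "%d/%m/%Y")
class(d)   # "Date"
anyNA(d)   # FALSE

# Mismatched format: the values cannot be parsed and become NA
d_wrong <- as.Date(raw, format = "%Y-%m-%d")
all(is.na(d_wrong))
```

If the second check reports NA values, the format string does not match the text in the column.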
And this is the resulting plot:
Thanks for the comments. They helped me find the solution. Both suggestions let me plot my data. However, there is a small problem: data from the same day is grouped together, so it is not possible to see the daily behaviour of the variable. I tried formatting the Date column with the following command:
data$Date <- as.POSIXct(data$Date, format="%d/%m/%Y %H:%M:%S")
It worked. However, it is important that the original data is in the format d/m/Y h:m:s. Thanks very much for the comments, which helped me a lot to solve my problem.
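A minimal sketch of the full datetime version, assuming the column holds d/m/Y h:m:s strings. The sample values and the fixed tz = "UTC" are illustrative assumptions, not the asker's data. Once the column is POSIXct, the date scale is swapped for scale_x_datetime():

```r
library(ggplot2)

# Hypothetical sample in d/m/Y h:m:s form; Var values are made up
data <- data.frame(
  Date = c("01/02/2015 00:10:00", "01/02/2015 12:30:00", "02/02/2015 06:45:00"),
  Var  = c(12000, 45000, 30000)
)

# Keep the time of day by parsing to POSIXct (tz fixed here for reproducibility)
data$Date <- as.POSIXct(data$Date, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# A datetime column needs scale_x_datetime() rather than scale_x_date()
p <- ggplot(data, aes(x = Date, y = Var, group = 1)) +
  geom_line() +
  scale_x_datetime(date_labels = "%m/%d/%Y %H:%M") +
  scale_y_continuous(limits = c(0, 70000))
p
```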
In ggplot2, I have a question about appropriate scales for making POSIXct datetimes into time-of-day in an axis. Consider:
library(tidyverse)
library(lubridate)
library(hms)
library(patchwork)
test <- tibble(
dates = c(ymd_hms("2022-01-01 6:00:00"),
ymd_hms("2023-01-01 19:00:00")),
x = c(1, 2),
hms_dates = as_hms(dates)
)
plot1 <- ggplot(test) + geom_point(aes(x = x, y = dates)) +
scale_y_time()
plot2 <- ggplot(test) + geom_point(aes(x = x, y = hms_dates)) +
scale_y_time()
plot1 + plot2
The Plot 1 y axis includes dates and times, but Plot 2 shows just the time of day. That's what I want! I'd like to generate plots like Plot 2 without having to use the hms::as_hms approach. This seems to imply some option for scale_y_datetime (or similar) that I can't find. I'd welcome suggestions.
Does someone have an example of how to use the limits option in scale_*_time, or (see question #1) limits for a scale_y_datetime that specify hours within the day? For example, limits(c(8,22)) predictably fails.
For your second question: when dealing with dates, datetimes or times, you have to set the limits and/or breaks as dates, datetimes or times too, i.e. use limits = as_hms(c("8:00:00", "22:00:00")):
library(tidyverse)
library(lubridate)
library(hms)
ggplot(test) + geom_point(aes(x = x, y = hms_dates)) +
scale_y_time(limits = as_hms(c("8:00:00", "22:00:00")))
#> Warning: Removed 1 rows containing missing values (`geom_point()`).
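The warning is expected: one of the two example times falls outside the limits. A small sketch that checks this outside of ggplot2, using the two times of day from the question's test data (06:00 and 19:00). Comparing on the numeric scale is just one convenient way to do the check.

```r
library(hms)

# Times of day from the question's example data (06:00 and 19:00)
times <- as_hms(c("06:00:00", "19:00:00"))
lims  <- as_hms(c("08:00:00", "22:00:00"))

# hms values are stored as seconds since midnight, so compare numerically
secs     <- as.numeric(times)
lim_secs <- as.numeric(lims)
outside  <- secs < lim_secs[1] | secs > lim_secs[2]

times[outside]   # the point that falls outside the limits
sum(outside)     # 1
```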
Concerning your first question: TBMK this cannot be achieved via scale_..._datetime. If you just want to show the time part of your dates, then converting to an hms object is IMHO the easiest way to achieve that. You could of course set the units shown as axis text via the date_labels argument, e.g. date_labels="%H:%M:%S" to show only the time of day. However, as your dates variable is still a datetime, the scale, breaks and limits will still reflect that, i.e. you only change the format of the labels, and for your example data you end up with an axis showing the same time for each break, i.e. the start of the day.
ggplot(test) + geom_point(aes(x = x, y = dates)) +
scale_y_datetime(date_labels = "%H:%M:%S")
I am trying to make a data vs. Time graph for some Methane emissions data I have. The code so far looks like this:
CH4 <- as.numeric(Aeris_2_Data$CH4)
Aeris_2_Data$Date.Time <- as.POSIXct(Aeris_2_Data$Time_Stamp, tz = "", "%m/%d/%Y %H:%M:%S")
ggplot(Aeris_2_Data, aes(x = Aeris_2_Data$Date.Time, y = as.numeric(CH4)) + geom_point() + labs(x = "Time", y = "CH4 [ppm]") + ggtitle("Methane Over Time")
My data looks like this:
[image: head(Aeris_2_Data)] and this: [image: an extension of head]
I am trying to plot CH4 over time, as you can probably see from the small code fragment I've managed so far, but I keep getting the error:
Error in seq.int(0, to0 - from, by) : 'to' must be a finite number
Everything seems to match the ggplot info I remember and also found online. What is going wrong? My guess is that it has to do with the formatting of the time data, which is in the format %m/%d/%Y %H:%M:%S and stored as a character in the csv file I am pulling from. How do I properly format that to change it? Thanks in advance.
There are two errors in your code:
the date format is "%m/%d/%Y %H:%M" and not "%m/%d/%Y %H:%M:%S"
one ) is missing after aes()
Additionally, as mentioned in the comments, you should use Date.Time directly rather than Aeris_2_Data$Date.Time inside aes(), and convert CH4 to numeric directly in the data.frame.
The code should be:
Aeris_2_Data$CH4 <- as.numeric(Aeris_2_Data$CH4)
Aeris_2_Data$Date.Time <- as.POSIXct(Aeris_2_Data$Time_Stamp, tz = "", "%m/%d/%Y %H:%M")
ggplot(Aeris_2_Data, aes(x = Date.Time, y = as.numeric(CH4))) + geom_point() + labs(x = "Time", y = "CH4 [ppm]") + ggtitle("Methane Over Time")
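The first error is easy to reproduce in isolation. In this sketch the timestamp is a made-up value without seconds, as the Time_Stamp column is said to be. When the format string asks for seconds that are not in the text, the result is NA, and a column that is entirely NA gives ggplot2 no finite range to build the axis from.

```r
# Hypothetical timestamp without seconds
stamp <- "03/14/2022 13:05"

# Format expects seconds that are not there: result is NA
bad <- as.POSIXct(stamp, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
is.na(bad)   # TRUE

# Format matches the text: parsed correctly
good <- as.POSIXct(stamp, format = "%m/%d/%Y %H:%M", tz = "UTC")
format(good, "%H:%M")   # "13:05"
```

Running sum(is.na(Aeris_2_Data$Date.Time)) after the conversion is a quick way to spot this kind of mismatch.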
I am trying to plot a time series in ggplot2. Assume I am using the following data structure (2500 x 20 matrix):
set.seed(21)
n <- 2500
x <- matrix(replicate(20,cumsum(sample(c(-1, 1), n, TRUE))),nrow = 2500,ncol=20)
aa <- x
rnames <- seq(as.Date("2010-01-01"), length=dim(aa)[1], by="1 month") - 1
rownames(aa) <- format(as.POSIXlt(rnames, format = "%Y-%m-%d"), format = "%d.%m.%Y")
colnames(aa) <- paste0("aa", 1:ncol(aa))
library("ggplot2")
library("reshape2")
library("scales")
aa <- melt(aa, id.vars = rownames(aa))
names(aa) <- c("time","id","value")
Now the following command to plot the time series produces a weird looking x axis:
ggplot(aa, aes(x=time,y=value,colour=id,group=id)) +
geom_line()
What I found out is that I can change the format to date:
aa$time <- as.Date(aa$time, "%d.%m.%Y")
ggplot(aa, aes(x=time,y=value,colour=id,group=id)) +
geom_line()
This looks better, but still not a good graph. My question is especially how to control the formatting of the x axis.
Does it have to be in Date format? How can I control the number of breaks (i.e. years) shown in either case? Setting the breaks manually seems to be mandatory if Date is not used; otherwise I believe ggplot2 picks some kind of useful default for the breaks.
For example the following command does not work:
aa$time <- as.Date(aa$time, "%d.%m.%Y")
ggplot(aa, aes(x=time,y=value,colour=id,group=id)) +
geom_line() +
scale_x_continuous(breaks=pretty_breaks(n=10))
Also, if you have any hints on how to improve the overall look of the graph, feel free to add them (e.g. the lines look a bit imprecise imho).
You can format dates with scale_x_date as @Gopala mentioned. Here's an example using a shortened version of your data for illustration.
library(dplyr)
# Dates need to be in date format
aa$time <- as.Date(aa$time, "%d.%m.%Y")
# Shorten data to speed rendering
aa = aa %>% group_by(id) %>% slice(1:200)
In the code below, we get date breaks every six months with date_breaks="6 months". That's probably more breaks than you want in this case and is just for illustration. If you want to determine which months get the breaks (e.g., Jan/July, Feb/Aug, etc.) then you also need to use coord_cartesian and set the start date with xlim and expand=FALSE so that ggplot won't pad the start date. But when you set expand=FALSE you also don't get any padding on the y-axis, so you need to add the padding manually with scale_y_continuous (I'd prefer to be able to set expand separately for the x and y axes, but AFAIK it's not possible). Because the breaks are packed tightly, we use a theme statement to rotate the labels by 90 degrees.
ggplot(aa, aes(x=time,y=value,colour=id,group=id)) +
geom_line(show.legend=FALSE) +
scale_y_continuous(limits=c(min(aa$value) - 2, max(aa$value) + 1)) +
scale_x_date(date_breaks="6 months",
labels=function(d) format(d, "%b %Y")) +
coord_cartesian(xlim=c(as.Date("2009-07-01"), max(aa$time) + 182),
expand=FALSE) +
theme_bw() +
theme(axis.text.x=element_text(angle=-90, vjust=0.5))
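If fewer, yearly breaks are wanted instead, one option is to pass an explicit vector of Date breaks. This is a sketch on a made-up monthly series (the toy data frame and the two-year spacing are illustrative assumptions), not a drop-in replacement for the code above:

```r
library(ggplot2)

# Small made-up monthly series; 'time' is already of class Date
set.seed(1)
dates <- seq(as.Date("2010-01-01"), by = "1 month", length.out = 121)
toy <- data.frame(time = dates, value = cumsum(rnorm(121)))

# Explicit breaks: one tick every two years across the data range
year_breaks <- seq(as.Date("2010-01-01"), as.Date("2020-01-01"), by = "2 years")

ggplot(toy, aes(x = time, y = value)) +
  geom_line() +
  scale_x_date(breaks = year_breaks, date_labels = "%Y") +
  theme_bw()
```

The breaks vector has to be of class Date as well, for the same reason that limits on a time scale have to be times.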
I have a CSV file called gdata.csv, with data like:
id,date,totKm,eLiter,euros,liters,km
1,24-04-2010,23678,1.180,42.00,35.59,450
2,16-05-2010,24058,1.200,43.00,35.83,380
3,27-05-2010,24488,1.160,44.00,37.93,430
4,12-06-2010,24960,1.180,45.00,38.14,472
With ggplot2
I just want to plot date and eLiter in a line chart with ggplot2, using this code:
x_date <- as.Date(gdata$date, format = "%d-%m-%Y")
ggplot(eliter, aes(x_date, eliter)) + geom_line()
But it returns this error related to the class:
Error: ggplot2 doesn't know how to deal with data of class numeric
I have tried to make a data.frame, but it still returns an error:
d <- data.frame(xdate = x_date, yeliter=gdata$eLiter)
ggplot(d$xdate, aes(d$xdate, d$yeliter)) + geom_line()
Error: ggplot2 doesn't know how to deal with data of class Date
With plot
I have managed to do this with plot() function:
plot(gdata$eLiter~as.Date(gdata$date, "%d-%m-%Y"), type = "s", xlab="Date",ylab="€/Liter", main="€/liter trend", col='blue')
And it works fine! But I can not do it with ggplot.
Could anyone help me?
Thank you very much.
Add + scale_x_date() like this:
Lines <- "id,date,totKm,eLiter,euros,liters,km
1,24-04-2010,23678,1.180,42.00,35.59,450
2,16-05-2010,24058,1.200,43.00,35.83,380
3,27-05-2010,24488,1.160,44.00,37.93,430
4,12-06-2010,24960,1.180,45.00,38.14,472"
DF <- read.csv(text = Lines)
DF$date <- as.Date(DF$date, "%d-%m-%Y")
library(ggplot2)
ggplot(DF, aes(date, eLiter)) +
geom_line() +
scale_x_date()
I have CSV data of a log for 24 hours that looks like this:
svr01,07:17:14,'u1#user.de','8.3.1.35'
svr03,07:17:21,'u2#sr.de','82.15.1.35'
svr02,07:17:30,'u3#fr.de','2.15.1.35'
svr04,07:17:40,'u2#for.de','2.1.1.35'
I read the data with tbl <- read.csv("logs.csv")
How can I plot this data in a histogram to see the number of hits per hour?
Ideally, I would get 4 bars representing hits per hour for svr01, svr02, svr03 and svr04.
Thank you for helping me here!
I don't know if I understood you right, so I will split my answer in two parts. The first part is how to convert your time into a vector you can use for plotting.
a) Converting your data into hours:
#df being the dataframe
df$timestamp <- strptime(df$timestamp, format="%H:%M:%S")
df$hours <- as.numeric(format(df$timestamp, format="%H"))
hist(df$hours)
This gives you a histogram of hits over all servers. If you want to split the histograms this is one way but of course there are numerous others:
b) Making a histogram with ggplot2
#install.packages("ggplot2")
require(ggplot2)
ggplot(data=df) + geom_histogram(aes(x=hours), binwidth=1) + facet_wrap(~ server)
# or use a color instead
ggplot(data=df) + geom_histogram(aes(x=hours, fill=server), binwidth=1)
c) You could also use another package:
require(plotrix)
l <- split(df$hours, f=df$server)
multhist(l)
The examples are given below. The third makes comparison easier but ggplot2 simply looks better I think.
EDIT
Here is how these solutions look (first, second and third solution; plots not shown).
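The snippets above assume a data frame df with columns server and timestamp. A minimal sketch of how such a data frame could be built from the sample rows in the question follows. The column names are assumptions, since the file has no header row, and the hits per server and hour are tabulated in base R.

```r
# The four sample rows from the question ('#' kept exactly as shown there)
lines <- "svr01,07:17:14,'u1#user.de','8.3.1.35'
svr03,07:17:21,'u2#sr.de','82.15.1.35'
svr02,07:17:30,'u3#fr.de','2.15.1.35'
svr04,07:17:40,'u2#for.de','2.1.1.35'"

# Column names are assumptions; the file itself has no header row
df <- read.csv(text = lines, header = FALSE, quote = "'",
               col.names = c("server", "timestamp", "user", "ip"),
               stringsAsFactors = FALSE)

# Extract the hour without storing a POSIXlt column in the data frame
df$hours <- as.numeric(format(strptime(df$timestamp, "%H:%M:%S"), "%H"))

# Hits per server and hour as a contingency table
hits <- table(server = df$server, hour = df$hours)
hits
```

Note that header = FALSE matters here: with the default header = TRUE, read.csv would treat the first log line as column names.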
An example dataset:
dat = data.frame(server = paste("svr", round(runif(1000, 1, 10)), sep = ""),
time = Sys.time() + sort(round(runif(1000, 1, 36000))))
The trick I use is to create a new variable which only specifies in which hour the hit was recorded:
dat$hr = strftime(dat$time, "%H")
Now we can use some plyr magic:
library(plyr)
hits_hour = count(dat, vars = c("server","hr"))
And create the plot:
ggplot(data = hits_hour) + geom_bar(aes(x = hr, y = freq, fill = server), stat="identity", position = "dodge")
Which looks like:
I don't really like this plot, I'd be more in favor of:
ggplot(data = hits_hour) + geom_line(aes(x = as.numeric(hr), y = freq)) + facet_wrap(~ server, nrow = 1)
Which looks like:
Putting all the facets in one row allows easy comparison of the number of hits between the servers. This will look even better when using real data instead of my random data.
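If plyr is not available, roughly the same summary can be produced with base R. This is a sketch under stated assumptions: the data are random stand-ins generated with a fixed seed and a fixed UTC start time, and table() is used in place of plyr's count(). Unlike count(), table() also keeps server/hour combinations with zero hits.

```r
# Random stand-in data, similar in spirit to the example dataset above
set.seed(42)
dat <- data.frame(
  server = paste0("svr", sample(1:4, 1000, replace = TRUE)),
  time   = as.POSIXct("2024-01-01 00:00:00", tz = "UTC") +
             sort(round(runif(1000, 1, 36000)))
)

# Hour of each hit as a two-digit string
dat$hr <- strftime(dat$time, "%H", tz = "UTC")

# Count hits per server and hour; the count column is named 'freq'
hits_hour <- as.data.frame(table(server = dat$server, hr = dat$hr),
                           responseName = "freq")
head(hits_hour)
```

The resulting hits_hour has the same column names (server, hr, freq) that the plotting calls above rely on, although server and hr come back as factors, so as.numeric(as.character(hr)) would be needed for the line plot.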