Issue with date to plot box plot with ggplot - r

I have the following data frame d:
TS Turbidity
1 2014-12-12 00:00:00 87
2 2014-12-12 00:15:00 87
3 2014-12-12 00:30:00 91
4 2014-12-12 00:45:00 84
5 2014-12-12 01:00:00 92
6 2014-12-12 01:15:00 89
TS is my time combining the year, month, day, hour, minutes, and second. When I look at the nature of my data, TS is:
$ TS : POSIXct, format: "2014-12-12 00:00:00" "2014-12-12 00:15:00"
So, as far as I can tell, R understands that TS is in a date format.
I want to create a boxplot per month (note that I have several years of data). I create a new column Month as follows:
d$Month<-as.Date(cut(d$TS, breaks="month"))
Then I plot the data:
ggplot(d, aes(x = factor(Month), y = Turbidity))+ geom_boxplot() + theme_bw()
This plots my data well, but I have too many x-axis labels and would like to show a label only every 4 months, for example. I add scale_x_date:
ggplot(d, aes(x = factor(Month), y = Turbidity))+ geom_boxplot() + theme_bw() +
scale_x_date(date_breaks = "4 month", date_labels = "%B")
This is the step where I run into trouble. I get this error message:
"Error: Invalid input: date_trans works with objects of class Date only".
But R reports that Month is in a date format:
$ Month : Date, format: "2014-12-01" "2014-12-01" "2014-12-01"
I have looked at forums but cannot figure out where the problem is, because as far as I can tell Month is already a Date.
Thanks for any help!

One approach could be the following (with modified data):
library(dplyr) # provides %>% and mutate()
library(ggplot2)
library(lubridate)
df %>% mutate(TS = ymd_hms(TS)) %>%
ggplot(aes(x = cut(TS, breaks="quarter"), y = Turbidity)) +
geom_boxplot() +
labs(x = "Start date of Quarter") +
theme_bw()
Data : Different from OP
df <- read.table(text =
"TS Turbidity
'2014-09-12 00:00:00' 87
'2014-09-12 00:15:00' 107
'2014-10-12 00:30:00' 91
'2014-10-12 00:30:00' 50
'2014-11-12 00:45:00' 84
'2014-11-12 00:45:00' 60
'2014-12-12 01:00:00' 92
'2014-12-12 01:15:00' 60
'2015-01-12 00:00:00' 87
'2015-01-12 00:15:00' 107
'2015-02-12 00:30:00' 91
'2015-02-12 00:30:00' 50
'2015-03-12 00:45:00' 84
'2015-03-12 00:45:00' 60
'2015-04-12 01:00:00' 92
'2015-04-12 01:15:00' 60
'2015-05-12 00:00:00' 87
'2015-05-12 00:15:00' 107
'2015-06-12 00:30:00' 91
'2015-06-12 00:30:00' 50
'2015-07-12 00:45:00' 84
'2015-07-12 00:45:00' 60
'2015-08-12 01:00:00' 92
'2015-08-12 01:15:00' 60", header = TRUE, stringsAsFactors = FALSE)

In your call to ggplot you explicitly convert Month to a factor with aes(x = factor(Month)). Try removing the factor() wrapper from Month.
This doesn't change the object outside of ggplot, which is why you still see that its class is Date when you check it. But inside ggplot you are definitely converting the class from Date to factor.
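A minimal sketch of that change, assuming simulated data in place of the question's d (the values below are made up). Once factor() is removed, x is a continuous Date, so geom_boxplot() needs group = Month to keep one box per month; scale_x_date() then accepts date_breaks:

```r
library(ggplot2)

# Hypothetical stand-in for the question's data frame `d`: one year of readings
set.seed(1)
d <- data.frame(
  TS = seq(as.POSIXct("2014-01-01", tz = "UTC"), by = "6 hours", length.out = 4 * 365)
)
d$Turbidity <- round(rnorm(nrow(d), mean = 90, sd = 5))

# Same Month column as in the question: first day of each month, class Date
d$Month <- as.Date(cut(d$TS, breaks = "month"))

# No factor(): x stays a Date, and group = Month gives one box per month
p <- ggplot(d, aes(x = Month, y = Turbidity, group = Month)) +
  geom_boxplot() +
  scale_x_date(date_breaks = "4 months", date_labels = "%b %Y") +
  theme_bw()
p
```

The "%b %Y" label format is only a suggestion; with several years of data, "%B" alone would repeat the same month names.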

Related

How to plot a line chart in ggplot with a date and time axis?

I have a simple dataset that contains three columns of hourly observations over the course of a few days.
The data looks something like...
Time Fast Standard Slow
Aug 02 2020 18:00:00 100 200 300
Aug 02 2020 19:00:00 50 100 150
Aug 02 2020 18:00:00 100 200 300
Aug 03 2020 12:00:00 50 100 150
Aug 03 2020 11:00:00 40 50 70
I start by loading up the CSV:
library(tidyverse)
# Link source
if (!exists("gasprices")) { # Check if the object is not already loaded
  if (file.exists("./datafiles/gasprices.rdata")) {
    load("./datafiles/gasprices.rdata")
  } else {
    gasprices <- read.csv("./datafiles/gasprices.csv")
  }
}
But when I go to plot one of the lines, I get a blank plot. I think R is showing every row, when what I really need is three overall change-over-time lines for the three variables (fast, standard, slow). My ideal outcome would show three lines of different colors changing over time in the x axis.
# Plot
g <- ggplot(gasprices, aes(x=Time, y=Fast)) +
geom_line(color = "#00AFBB", size = 2) +
xlab("") +
theme_light()
g
Any help would be greatly appreciated. Thank you,
It's likely to do with the column data types. Try running the following on your data frame; what do you get?
lapply(gasprices, class)
Try setting the datatype to a datetime before plotting:
gasprices$Time<- as.POSIXct(gasprices$Time, format = "%b %e %Y %H:%M:%S")
Have a look at this page for details about providing the format used to parse the datetime.
Let me know how it goes!
What you should do is use the tidyverse package to unpivot your data.
require(ggplot2)
require(tidyr)
require(dplyr)
Let's create a dataframe with the same structure:
Data <- data.frame ( time = c(1,2,3), fast = c(100, 105, 110), slow = c(50, 70, 90), standart = c(94, 95, 96))
time fast slow standart
1 1 100 50 94
2 2 105 70 95
3 3 110 90 96
Now we unpivot the data.
# Store the long-format result; it is used by the plot below
UnpivotData <- Data %>%
tidyr::gather(key = 'Speed Type', value = 'Speed Value', -time)
time Speed Type Speed Value
1 fast 100
2 fast 105
3 fast 110
1 slow 50
2 slow 70
3 slow 90
1 standart 94
2 standart 95
3 standart 96
ggplot2::ggplot(data = UnpivotData, mapping = ggplot2::aes(x = time, y = `Speed Value`, color = `Speed Type`)) +
ggplot2::geom_line()
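Putting the two suggestions together, here is a rough sketch on rows adapted from the question's sample. It assumes an English locale for parsing "Aug" and uses tidyr's pivot_longer(), the newer replacement for gather(). The column names Speed and Price are made up for illustration:

```r
library(ggplot2)
library(tidyr)

# Rows adapted from the question's sample; a real run would use gasprices.csv
gasprices <- data.frame(
  Time = c("Aug 02 2020 18:00:00", "Aug 02 2020 19:00:00", "Aug 02 2020 20:00:00",
           "Aug 03 2020 11:00:00", "Aug 03 2020 12:00:00"),
  Fast = c(100, 50, 100, 40, 50),
  Standard = c(200, 100, 200, 50, 100),
  Slow = c(300, 150, 300, 70, 150)
)

# Parse the text timestamps; %b relies on an English locale for "Aug"
gasprices$Time <- as.POSIXct(gasprices$Time, format = "%b %d %Y %H:%M:%S", tz = "UTC")

# One row per (Time, Speed) pair so that color can be mapped to Speed
long <- pivot_longer(gasprices, cols = c(Fast, Standard, Slow),
                     names_to = "Speed", values_to = "Price")

g <- ggplot(long, aes(x = Time, y = Price, color = Speed)) +
  geom_line() +
  xlab("") +
  theme_light()
g
```

Because Time is a POSIXct column, ggplot2 picks a datetime x scale on its own and draws one line per Speed value.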
One option is to convert your dates to numeric values before plotting lines with ggplot() (ggplot2 can also plot POSIXct values directly via scale_x_datetime()). Perhaps this thread will help you. After doing this, you must provide new axis tick labels for your plot, for example:
plot + scale_x_discrete(labels= df$Time)
Here is a full example with date-to-numeric along with assigning axis tick labels:
library(reshape2)
# Make data frame
# Columns are separated by two or more spaces so the single spaces inside Time survive
Lines <- "Time  Fast  Standard  Slow
Aug 02 2020 18:00:00  100  200  300
Aug 02 2020 19:00:00  50  100  150
Aug 02 2020 20:00:00  100  200  300
Aug 03 2020 12:00:00  50  100  150
Aug 03 2020 11:00:00  40  50  70"
df <- read.csv(text = gsub("  +", ",", readLines(textConnection(Lines))),
check.names = FALSE)
# Convert date string to proper format
df$Time <- as.POSIXct(df$Time, format = "%b %d %Y %H:%M:%S")
# Reshape data for easier plotting. This function is from Reshape2.
df <- melt(df, id = "Time")
# Plot
ggplot(data = df, aes(x = as.numeric(Time), y = value, color = variable)) +
geom_line() +
scale_x_continuous(breaks = as.numeric(df$Time), labels = as.character(df$Time)) +
theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1))

Order of dates when plotting time series in R

I would like to know if the order of dates matter when plotting a time series in R.
For example, the data frame below has its dates starting in 2010 and increasing down the rows, here until 2011:
Date Number of visits
2010-05-17 13
2010-05-18 11
2010-05-19 4
2010-05-20 2
2010-05-21 23
2010-05-22 26
2011-05-13 14
and the one below, where the years are jumbled up:
Date Number of visits
2011-06-19 10
2009-04-25 5
2012-03-09 20
2011-01-04 45
Would I be able to plot a time series in R for the second example above? Is it required that the dates be sorted in order to plot a time series?
Assuming the data shown reproducibly in the Note at the end, create an ordering vector o and then plot the ordered data:
o <- order(dat$Date)
plot(dat[o, ], type = "o")
or convert the data to a zoo series, which will automatically order it, and then plot.
library(zoo)
z <- read.zoo(dat)
plot(z, type = "o")
Note
The data in reproducible form:
Lines <- "Date  Number of visits
2010-05-17  13
2010-05-18  11
2010-05-19  4
2010-05-20  2
2010-05-21  23
2010-05-22  26
2011-05-13  14"
# Columns are separated by two or more spaces, so the header "Number of visits" stays intact
dat <- read.csv(text = gsub("  +", ",", readLines(textConnection(Lines))),
check.names = FALSE)
dat$Date <- as.Date(dat$Date)
as.Date solves your problem:
data$Date <- as.Date(data$Date)
ggplot(data, aes(Date, Number_of_visits)) + geom_line()
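As a small check on the jumbled rows from the question: geom_line() connects observations in order of the x variable, so once Date is a real Date the row order should not matter for ggplot2. The sketch below assumes the column is named Number_of_visits and also shows the explicit sort that base plot() needs:

```r
library(ggplot2)

# The jumbled rows from the question, with Date converted to class Date
dat <- data.frame(
  Date = as.Date(c("2011-06-19", "2009-04-25", "2012-03-09", "2011-01-04")),
  Number_of_visits = c(10, 5, 20, 45)
)

# Explicit sort, needed for base plot(..., type = "o") to join points in time order
sorted <- dat[order(dat$Date), ]

# ggplot2: geom_line() orders by x itself, so the unsorted data frame is fine
p <- ggplot(dat, aes(Date, Number_of_visits)) +
  geom_line() +
  geom_point()
p
```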

Plotting several lines in one diagram

I have a time series which shows the electricity load for every 15min during one year. I already filtered to show only one specific weekday.
My dataframe:
Date Timestamp Weekday Load
2017-01-02 00:00:00 Monday 272
2017-01-02 00:15:00 Monday 400
2017-01-02 00:30:00 Monday 699
2017-01-02 00:45:00 Monday 764
2017-01-02 01:00:00 Monday 983
..
..
2017-01-09 00:45:00 Monday 764
2017-01-09 01:00:00 Monday 983
..
2017-12-25 23:45:00 Monday 983
Now I want to plot one line per Monday, all in the same diagram:
x axis = Timestamp
y axis = Load
I tried with ggplot:
ggplot(Loadprofile, aes(x= Timestamp, y = Load, color = Date)) + geom_line()
But this gives me the following error:
Error: Aesthetics must be either length 1 or the same as the data (4992): x, y, colour
This is the output; the x-axis does not look continuous, though?
Any suggestions?
Your problem is that you need Date to be a factor, but when it is in Date form, ggplot treats it as a continuous variable.
I simulated some data just to be able to draw the graph; the following code generates it:
library(tidyverse)
library(lubridate)
DateTimes <- seq(
from=as.POSIXct("2017-1-02 0:00", tz="UTC"),
to=as.POSIXct("2017-1-09 23:59", tz="UTC"),
by="15 min"
)
DF <- data.frame(Date = as.Date(DateTimes),
                 timestamp = strftime(DateTimes, format = "%H:%M:%S", tz = "UTC"),
                 # Hour of day as a number, used to simulate the load curve
                 hour = hour(DateTimes) + minute(DateTimes) / 60,
                 Weekday = weekdays(DateTimes)) %>%
  filter(Weekday == "Monday") %>%
  # n() is the number of rows after filtering; DF does not exist yet at this point
  mutate(load = hour * 20 - hour^2 + rnorm(n(), sd = 1000) + as.numeric(Date)) %>%
  mutate(load = ifelse(Date < ymd("2017-01-04"), load - 5000, load))
Once I have done that, if I do the following:
ggplot(DF, aes(x = timestamp, y = load)) + geom_line(aes(group = as.factor(Date), color = as.factor(Date)))
I get the following graph
I think that is what you need; if you need more help formatting the x-axis and legend, let me know.
Cheers

How to change x axis from years to months with ggplot2

I have a web visits over time chart which plots daily traffic from 2014 until now, and looks like this:
ggplot(subset(APRA, Post_Day > "2013-12-31"), aes(x = Post_Day, y = Page_Views))+
geom_line()+
scale_y_continuous(labels = comma)+
ylim(0,50000)
As you can see it's not a great graph, what would make a bit more sense is to break it down by month as opposed to day. However when I try this code:
ggplot(subset(APRA, Post_Day > "2013-12-31"), aes(x = Post_Day, y = Page_Views))+
geom_line()+
scale_y_continuous(labels = comma)+
ylim(0,50000)+
scale_x_date(date_breaks = "1 month", minor_breaks = "1 week", labels = date_format("%B"))
I get this error:
Error: Invalid input: date_trans works with objects of class Date only
The date field Post_Day is POSIXct. Page_Views is numeric. Data looks like:
Post_Title Post_Day Page_Views
Title 1 2016-05-15 139
Title 2 2016-05-15 61
Title 3 2016-05-15 79
Title 4 2016-05-16 125
Title 5 2016-05-17 374
Title 6 2016-05-17 39
Title 7 2016-05-17 464
Title 8 2016-05-17 319
Title 9 2016-05-18 84
Title 10 2016-05-18 64
Title 11 2016-05-19 433
Title 12 2016-05-19 418
Title 13 2016-05-19 124
Title 14 2016-05-19 422
I'm looking to change the X axis from a daily granularity into monthly.
The sample data set shown in the question has multiple data points per day. So, it needs to be aggregated day-wise anyway. For the aggregation by day or month, data.table and lubridate are used.
Create sample data
As no reproducible example is supplied, a sample data set is created:
library(data.table)
n_rows <- 5000L
n_days <- 365L*3L
set.seed(123L)
DT <- data.table(Post_Title = paste("Title", 1:n_rows),
Post_Day = as.Date("2014-01-01") + sample(0:n_days, n_rows, replace = TRUE),
Page_Views = round(abs(rnorm(n_rows, 500, 200))))[order(Post_Day)]
DT
Post_Title Post_Day Page_Views
1: Title 74 2014-01-01 536
2: Title 478 2014-01-01 465
3: Title 3934 2014-01-01 289
4: Title 4136 2014-01-01 555
5: Title 740 2014-01-02 442
---
4996: Title 1478 2016-12-31 586
4997: Title 2251 2016-12-31 467
4998: Title 2647 2016-12-31 468
4999: Title 3243 2016-12-31 498
5000: Title 4302 2016-12-31 309
Plot raw data
Without aggregation the data can be plotted by
library(ggplot2)
ggplot(DT) + aes(Post_Day, Page_Views) + geom_line()
Aggregated by day
ggplot(DT[, .(Page_Views = sum(Page_Views)), by = Post_Day]) +
aes(Post_Day, Page_Views) + geom_line()
To aggregate day-wise the grouping parameter by of data.table is used and sum() as aggregation function. The aggregation is reducing the number of data points from 5000 to 1087. Hence, the plot looks less convoluted.
Aggregated by month
ggplot(DT[, .(Page_Views = sum(Page_Views)),
by = .(Post_Month = lubridate::floor_date(Post_Day, "month"))]) +
aes(Post_Month, Page_Views) + geom_line()
In order to aggregate by month, the grouping parameter by is used again, but this time Post_Day is mapped to the first day of the respective month. So, 2014-03-26 becomes a Post_Month of 2014-03-01, which keeps the date class of Post_Day (Date in the sample data). This way, the x-axis remains continuous with a date scale. This avoids the trouble of converting Post_Day to a factor, e.g., "2014-03" using format(Post_Day, "%Y-%m"), where the x-axis would become discrete.
APRA$month <- as.factor(strftime(APRA$Post_Day, "%m"))
APRA <- APRA[order(as.numeric(APRA$month)),]
This would add a month column to your data
# Sum the page views within each month
z <- sapply(split(APRA, APRA$month), function(x) {sum(as.numeric(x$Page_Views))})
z <- data.frame(Page_Views = z, month = names(z))
This would create a z data frame which has the months and the page views for each month
Now plot it
# group = 1 is needed because month is discrete
ggplot(z, aes(x = month, y = Page_Views, group = 1)) + geom_line()
Please let me know if this is what you were looking for. Also I haven't compiled it, please tell if it throws some error.
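For the error message itself, a more direct route may be to match the scale to the column class: scale_x_datetime() for POSIXct, or convert the column to Date and keep scale_x_date(). A minimal sketch on made-up daily data (the values and the Post_Date column are assumptions for illustration):

```r
library(ggplot2)

# Hypothetical daily data with a POSIXct Post_Day, as described in the question
set.seed(42)
APRA <- data.frame(
  Post_Day = as.POSIXct("2016-01-01", tz = "UTC") + 86400 * (0:179),
  Page_Views = round(runif(180, 50, 500))
)

# Option 1: keep POSIXct and use the datetime scale
p1 <- ggplot(APRA, aes(x = Post_Day, y = Page_Views)) +
  geom_line() +
  scale_x_datetime(date_breaks = "1 month", date_labels = "%B")

# Option 2: convert to Date, after which scale_x_date() is accepted
APRA$Post_Date <- as.Date(APRA$Post_Day, tz = "UTC")
p2 <- ggplot(APRA, aes(x = Post_Date, y = Page_Views)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%B")
p1
```

Either option only changes the axis breaks and labels; the line still shows daily values, so the aggregation above is still needed for monthly totals.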

plotting daily rainfall data using geom_step

I have some rainfall data collected continuously from which I have calculated daily totals. Here is some toy data:
Date <- c(seq(as.Date("2016-07-01"), by = "1 day", length.out = 10))
rain_mm <- c(3,6,8,12,0,0,34,23,5,1)
rain_data <- data.frame(Date, rain_mm)
I can plot this data as follows:
ggplot(rain_data, aes(Date, rain_mm)) +
geom_bar(stat = "identity") +
scale_x_date(date_labels = "%d")
Which gives the following:
This seems fine. It is clear how much rainfall there was on a certain day. However, it could also be interpreted that between midday of one day and midday of the next, a certain amount of rain fell, which is wrong. This is especially a problem if the graph is combined with other plots of related continuous variables over the same period.
To get round this issue I could use geom_step as follows:
library(ggplot2)
ggplot(rain_data, aes(Date, rain_mm)) +
geom_step() +
scale_x_date(date_labels = "%d")
Which gives:
This is a better way to display the data, and now scale_x_date appears to be a continuous axis. However, it would be nice to get the area below the steps filled, but I can't seem to find a straightforward way of doing this.
Q1: How can I fill beneath the geom_step? Is it possible?
It may also be useful to convert Date into POSIXct to facilitate identical x-axis in multi-plot figures as discussed in this SO question here.
I can do this as follows:
library(dplyr)
rain_data_POSIX <- rain_data %>% mutate(Date = as.POSIXct(Date))
Date rain_mm
1 2016-07-01 01:00:00 3
2 2016-07-02 01:00:00 6
3 2016-07-03 01:00:00 8
4 2016-07-04 01:00:00 12
5 2016-07-05 01:00:00 0
6 2016-07-06 01:00:00 0
7 2016-07-07 01:00:00 34
8 2016-07-08 01:00:00 23
9 2016-07-09 01:00:00 5
10 2016-07-10 01:00:00 1
However, this gives a time of 01:00 for each date. I would rather have 00:00. Can I change this in the as.POSIXct function call, or do I have to do it afterwards using a separate function? I think it has something to do with tz = "" but I can't figure it out.
How can I convert from class Date to POSIXct so that the time generated is 00:00?
Thanks
For your first question, you can work off this example. First, create a time-lagged version of your data:
rain_tl <- mutate( rain_data, rain_mm = lag( rain_mm ) )
Then combine this time-lagged version with the original data, and re-sort by date:
rain_all <- bind_rows( old = rain_data, new = rain_tl, .id="source" ) %>%
arrange( Date, source )
(Note the newly created source column is used to break ties, correctly interlacing the original data with the time-lagged version):
> head( rain_all )
source Date rain_mm
1 new 2016-07-01 NA
2 old 2016-07-01 3
3 new 2016-07-02 3
4 old 2016-07-02 6
5 new 2016-07-03 6
6 old 2016-07-03 8
You can now use the combined data frame to "fill" your steps:
ggplot(rain_data, aes(Date, rain_mm)) +
geom_step() +
geom_ribbon( data = rain_all, aes( ymin = 0, ymax = rain_mm ),
fill="tomato", alpha=0.5 )
This produces the following plot:
For your second question, the problem is that as.POSIXct does not pass additional arguments to the converter, so specifying the tz argument does nothing.
You basically have two options:
1) Reformat the output to what you want: format( as.POSIXct( Date ), "%F 00:00" ), which returns a vector of type character. If you want to preserve the object type as POSIXct, you can instead...
2) Cast your Date vector to character prior to passing it to as.POSIXct: as.POSIXct( as.character(Date) ), but this will leave off the time entirely, which may be what you want anyway.
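A short sketch of option 2 on the toy dates from the question. Passing tz = "UTC" is an assumption added here to make the result independent of the local time zone; without it the character dates are parsed as local midnight:

```r
# Toy dates from the question
Date <- seq(as.Date("2016-07-01"), by = "1 day", length.out = 10)

# Going through character parses each date as midnight in the given time zone
midnight <- as.POSIXct(as.character(Date), tz = "UTC")

# The clock time is now 00:00 for every element
unique(format(midnight, "%H:%M"))
```

When printed, midnight values show only the date part, which is why the time appears to be left off.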
If you would like to avoid the hack, you can customize the position in the geom_bar expression.
I found good results with:
ggplot(rain_data, aes(Date, rain_mm)) +
geom_bar(stat = "identity", position = position_nudge(x = 0.51), width = 0.99) +
scale_x_date(date_labels = "%d")
