Plotting Time Series Data in R and Tidy - r

I am trying to use lubridate to sort out time series data from my temperature sensors. I would ultimately like a plot that has time on the x axis and temperature on the y axis. I have been using the function parse_date_time to try and create a new date variable but all I get is NA.
temps<-temps %>% as_tibble() %>%
mutate(date = parse_date_time(Date.Time..GMT..0500, "mdYHM"))
temps

The problem is that you used a capital Y, which expects a four-digit year, while the year in your data has only two digits. Use a lowercase y instead:
temps %>% as_tibble() %>%
mutate(date = parse_date_time(Date.Time..GMT..0500, "mdyHM"))
To produce a simple plot, here is some basic code:
ggplot(temps) +
aes(x = date, y = TempF) +
geom_line()
For further details on the plot itself, I suggest having a look at the ggplot2 documentation.
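The two-digit versus four-digit year distinction can be checked on a single timestamp. The sketch below uses base R's as.POSIXct instead of lubridate (a swapped-in equivalent, not the code from the answer); base R's %y plays the same role as the lowercase y in the lubridate order string. The sample string and the UTC time zone are assumptions for illustration.

```r
# One timestamp in the same month/day/two-digit-year layout as the sensor data
stamp <- "6/18/18 12:57"

# %y reads a two-digit year; %Y would expect four digits
parsed <- as.POSIXct(stamp, format = "%m/%d/%y %H:%M", tz = "UTC")

format(parsed, "%Y-%m-%d %H:%M")
# "2018-06-18 12:57"
```

If the result is NA, the format string does not match the layout of the input, which is the first thing to check.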
In my sample data it worked
temps <- data.frame(
Date.Time..GMT..0500 = c("6/18/18 12:57", "6/18/18 13:57", "6/18/18 14:57"),
var = c(1,2,3)
)
parse_date_time(temps$Date.Time..GMT..0500, "mdYHM")
# [1] "2018-06-18 12:57:00 UTC" "2018-06-18 13:57:00 UTC" "2018-06-18 14:57:00 UTC"

Related

Plotting data in order of date

The aim is to make a historical plot in R using the following data:
date new_cases
1 22/01/2020 NA
2 23/01/2020 0
3 24/01/2020 1
4 25/01/2020 0
5 26/01/2020 3
6 27/01/2020 0
7 28/01/2020 0
8 29/01/2020 1
9 30/01/2020 0
10 31/01/2020 2
... etc.
I have never plotted like this, the only thing I know is using
plot(data$new_cases)
But surely there is a more aesthetically pleasing way to do this?
Thanks
In base R, you can convert the date column to date (as suggested above) using as.Date. Then, you can plot and label your graph.
df$date <- as.Date(df$date, "%d/%m/%Y")
plot(new_cases ~ date, df, xaxt = "n", type = "l")
axis(1, df$date, format(df$date, "%d/%m/%Y"), cex.axis = .7)
Another option is to use tidyverse and convert the date, then pipe into ggplot2:
library(tidyverse)
df %>%
mutate(date = as.Date(date, format = "%d/%m/%Y")) %>%
ggplot(aes(date,new_cases)) +
geom_line()
Or if you just want points, then you can do as @Dave2e suggested:
df$date <- as.Date(df$date, "%d/%m/%Y")
plot(x=df$date, y=df$new_cases)
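The conversion with as.Date matters because day-first strings do not sort chronologically as text. A minimal sketch, using three made-up dates in the same dd/mm/yyyy layout as the question:

```r
# Three dates as character strings, day first (illustrative values)
raw <- c("22/01/2020", "31/01/2020", "01/02/2020")

# Sorted as text, the February date comes first because "0" sorts before "2"
sort(raw)

# Converted to Date, the same values sort chronologically
parsed <- as.Date(raw, format = "%d/%m/%Y")
sort(parsed)
```

Plotting functions order and space a Date axis the same way, which is why the conversion comes before the call to plot.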
For completeness, date is a poor and potentially perilous name for a variable and may result in a namespace conflict, as date is also the name of functions in base R and lubridate. When using this data, I would suggest that you rename the column to case_date or equivalent.
On another matter, if you are using tidyverse you may want to explore lubridate, which is part of tidyverse.
# pasted_data <- read.delim(pipe("pbpaste"))
library("tidyverse")
library("tidyr")
library("lubridate")
pasted_data %>%
separate(
col = 1,
sep = "\\s+",
into = c("col_ind", "case_date", "cases"),
convert = TRUE
) %>%
mutate(case_date = dmy(case_date)) %>%
ggplot(aes(x = case_date, y = cases)) +
geom_line()
For posterity: if the data is pasted in the format shown, it first needs to be separated into distinct columns. In tidyverse, this can be achieved using tidyr::separate.
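As a dependency-free alternative to the pipeline above, the pasted text can also be split into columns with base R's read.table. This is only a sketch: the inline text txt stands in for the pasted data, and the column name case_date follows the renaming suggestion made earlier.

```r
# Inline copy of the first rows of the question's data (stand-in for the pasted text)
txt <- "date new_cases
1 22/01/2020 NA
2 23/01/2020 0
3 24/01/2020 1"

# The header has one field fewer than the rows, so read.table
# treats the first column as row names
cases <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Rename the column to avoid clashing with the date() functions, then convert it
names(cases)[names(cases) == "date"] <- "case_date"
cases$case_date <- as.Date(cases$case_date, format = "%d/%m/%Y")

str(cases)
```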

How to plot dates as dates (not numbers or character) on x axis of ggplot?

I have a huge data set containing bacteria samples (4 types of bacteria) from 10 water resources from 2010 until 2019. Some values are missing, so they should be excluded from the plot and the analysis.
I want to plot a time series for each type of bacteria for each resource for all years.
What is the best way to do that?
library("ggplot2")
BactData= read.csv('Råvannsdata_Bergen_2010_2018a.csv', sep='\t',header=TRUE)
summary(BactData,na.rm = TRUE)
df$Date = as.Date( df$Date, '%d/%m/%Y')
#require(ggplot2)
ggplot( data = df, aes( Date,BactData$Svartediket_CB )) + geom_line()
#plot(BactData$Svartediket_CB,col='brown')
plot(BactData$Svartediket_CP,col='cyan')
plot(BactData$Svartediket_EC,col='magenta')
plot(BactData$Svartediket_IE,col='darkviolet')
Using plot is not satisfactory because the x axis shows just index numbers, not dates. I tried to use ggplot but got an error message. I am a beginner in R.
Error message
Error in df$Date : object of type 'closure' is not subsettable
Data as CSV file with tab delimiter
This will do the trick
BactData = read.csv('Råvannsdata_Bergen_2010_2018a.csv', sep='\t',header=TRUE, stringsAsFactors = F)
colnames(BactData)[1] <- "Date"
library(lubridate)
BactData$Date = dmy(BactData$Date) # converts strings to date class
ggplot(data = BactData, aes(Date, Svartediket_CB )) + geom_line()
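The error in the question is worth a short illustration. The data was read into BactData, but the code then refers to df, which was never created. In that case R finds the built-in function stats::df (the F distribution density), and a function cannot be subset with $. A small sketch that reproduces the message without stopping the script:

```r
# With no data frame called df in the workspace, the name df resolves to
# stats::df, the density function of the F distribution
is.function(stats::df)

# Subsetting a function with $ fails; capture the message instead of stopping
msg <- tryCatch(stats::df$Date, error = function(e) conditionMessage(e))
msg
```

Using the name the data was actually assigned to (BactData here) avoids the problem.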
You can filter for any year using dplyr with lubridate. For example, 2017:
library(dplyr)
BactData %>% filter(year(Date) == 2017) %>%
ggplot(aes(Date, Svartediket_CB )) + geom_line()
Or for two years
library(dplyr)
BactData %>% filter(year(Date) == 2017 | year(Date) == 2018) %>%
ggplot(aes(Date, Svartediket_CB )) + geom_line()
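For more than a couple of years, chaining conditions with | gets long; %in% is a compact alternative, and inside filter the analogous condition would be year(Date) %in% 2017:2018. The sketch below shows the same idea in base R on a made-up data frame (bact, its dates, and its count column are all hypothetical).

```r
# Hypothetical sample data: four dates spanning 2016 to 2019
bact <- data.frame(
  Date  = as.Date(c("2016-12-31", "2017-01-01", "2018-06-15", "2019-01-01")),
  count = c(4, 7, 2, 9)  # made-up values
)

# Extract the year of each date as an integer
yr <- as.integer(format(bact$Date, "%Y"))

# Keep only the rows whose year is 2017 or 2018
subset_17_18 <- bact[yr %in% 2017:2018, ]
subset_17_18
```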

How to filter a dataset by the time stamp

I'm working with some bird GPS tracking data, and I would like to exclude points based on the time stamp.
Some background information: the GPS loggers track each bird for just over 24 hours, starting in the evening, and continuing through the night and the following day. What I would like to do is exclude points taken after 9:30 pm on the day AFTER deployment (so removing points from the very end of the track).
As an R novice, I'm struggling because the deployment dates differ for each bird, so I can't simply use subset() for a specific date and time.
An example of my dataframe (df):
BirdID x y Datetime
15K12 492719.9 5634805 2015-06-23 18:25:00
15K12 492491.5 5635018 2015-06-23 18:27:00
15K70 455979.1 5653581 2015-06-24 19:54:00
15K70 456040.9 5653668 2015-06-24 19:59:00
So, pretending these points represent the start of the GPS track for each animal, I would like to remove points after 9:30 pm on June 24 for bird 15K12, and after 9:30 pm on June 25 for bird 15K70.
Any ideas?
First, check if df$Datetime is a date variable:
class(df$Datetime)
If it's not, you can convert it with this:
df$Datetime <- ymd_hms(df$Datetime)
Use mutate to create a new variable called newdate that holds each bird's cutoff: the day after the bird's earliest observation, at 21:30:00.
Then filter Datetime against newdate to keep only the observations recorded before that cutoff.
library(dplyr); library(lubridate)
df %>%
group_by(BirdID) %>%
mutate(newdate = as.POSIXct(date(min(Datetime)) + days(1) + hours(21) + minutes(30))) %>%
filter(Datetime < newdate)
Here is a reproducible example:
library(dplyr); library(lubridate)
set.seed(1)
# Create a data frame (1000 observations)
BirdID <- paste(rep(floor(runif(250, 1, 20)),4),
rep("k", 1000), rep(floor(runif(250, 1, 40)),4), sep = "")
x <- rnorm(1000, mean = 47000, sd = 2000)
y <- rnorm(1000, mean = 5650000, sd = 300000)
Datetime <- as.POSIXct(rnorm(1000, mean = as.numeric(as.POSIXct("2015-06-23 18:25:00")), sd = 99999), tz = "GMT", origin = "1970-01-01")
df <- data.frame(BirdID, x, y, Datetime, stringsAsFactors = FALSE)
# Filter the data frame by the specified date
df_filtered <- df %>%
group_by(BirdID) %>%
mutate(newdate = as.POSIXct(date(min(Datetime)) + days(1) + hours(21) + minutes(30))) %>%
filter(Datetime < newdate)
This should solve the problem.
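The per-bird cutoff logic can also be checked on a handful of rows in base R. This is a sketch under stated assumptions, not the dplyr code above: the first timestamp of each bird comes from the question, the later timestamps are made up to fall on either side of the cutoff, and all times are treated as UTC.

```r
# Small example: per bird, one start point plus made-up points near the cutoff
df <- data.frame(
  BirdID = c("15K12", "15K12", "15K12", "15K70", "15K70", "15K70"),
  Datetime = as.POSIXct(
    c("2015-06-23 18:25:00", "2015-06-24 21:00:00", "2015-06-24 21:45:00",
      "2015-06-24 19:54:00", "2015-06-25 21:29:00", "2015-06-25 22:10:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Earliest timestamp per bird, repeated for every row of that bird
first_seen <- ave(as.numeric(df$Datetime), df$BirdID, FUN = min)
first_day  <- as.Date(as.POSIXct(first_seen, origin = "1970-01-01", tz = "UTC"))

# Cutoff: 21:30 on the day after the bird's first observation
cutoff <- as.POSIXct(paste(first_day + 1, "21:30:00"), tz = "UTC")

# Keep only the points recorded before each bird's own cutoff
kept <- df[df$Datetime < cutoff, ]
kept
```

Because the cutoff is computed per row from the bird's own first timestamp, differing deployment dates need no special handling.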

Adding date ticks to ggplot in R

I am trying to add ticks to my x-axis in this graph to show all the months of the year:
My code is as follows:
library(ggplot2)
library(scales)
p <- ggplot(df_test, aes(time, reading))
p + geom_point(alpha = 1/4) + geom_smooth()
I have tried to use scale_x_date but have come across the following error:
Error: Invalid input: date_trans works with objects of class Date only
Here's the data frame I'm using:
hour reading date time
1 53 1/1/15 2015-01-01 01:00:00
2 55 1/1/15 2015-01-01 02:00:00
3 56 1/1/15 2015-01-01 03:00:00
The class of my time variable:
class(df_test$time)
"POSIXct" "POSIXt"
Use scale_x_date(breaks="month", labels=date_format("%b")). Here's an example.
library(quantmod)
sp500 <- getSymbols("SP500", src="FRED", auto.assign=FALSE)
sp500 <- sp500["2015-01-01::"]
sp500 <- data.frame(time=as.POSIXct(index(sp500), origin="1970-01-01"),sp500)
class(sp500$time)
# [1] "POSIXct" "POSIXt"
library(ggplot2)
library(scales) # for date_format(...)
ggplot(sp500, aes(x=as.Date(time), y=SP500))+
geom_line()+
scale_x_date(breaks="month", labels=date_format("%b"))
You are trying to use a scale specific for Date on a POSIXct object. The solution is to cast the POSIXct object to a date using as.Date:
> Sys.time()
[1] "2015-09-16 09:52:42 CEST"
> as.Date(Sys.time())
[1] "2015-09-16"
To do this on your data.frame, I recommend using the dplyr package:
df_test = df_test %>% mutate(time = as.Date(time))
Combining the answers of both @PaulHiemstra and @jihoward, I've managed to figure out an answer.
First using the dplyr library to rework the data:
library(dplyr)
df_test1 = df_test %>% mutate(time = as.Date(time))
Then using scale_x_date:
library(ggplot2)
library(scales)
p <- ggplot(df_test1, aes(time, reading))
p + geom_point(alpha = 1/4)+
scale_x_date(breaks="month", labels=date_format("%b"))
Gives the result:
Since the answers above were written (2015), date_format() has, I believe, been deprecated. Replace it with label_date() or scales::label_date(). It may not be attached in your session, but scales is installed alongside ggplot2, so the scales:: prefix might be necessary.
Here's a copy and paste of the accepted answer by @timothyylim with the change.
library(dplyr)
df_test1 = df_test %>% mutate(time = as.Date(time))
library(ggplot2)
library(scales)
p <- ggplot(df_test1, aes(time, reading))
p + geom_point(alpha = 1/4)+
scale_x_date(breaks="month", labels = scales::label_date("%b"))
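The class mismatch behind the original error can be inspected directly, and there is a second way around it: ggplot2 also provides scale_x_datetime() for POSIXct columns, which keeps the time of day instead of dropping it. The sketch below uses a made-up data frame df_demo (hypothetical monthly timestamps and readings, UTC assumed) and only builds the plot if ggplot2 is installed.

```r
# Hypothetical data: one POSIXct timestamp per month of 2015, made-up readings
df_demo <- data.frame(
  time = seq(as.POSIXct("2015-01-01 01:00:00", tz = "UTC"),
             by = "month", length.out = 12),
  reading = c(53, 55, 56, 58, 61, 66, 70, 69, 64, 59, 55, 52)
)

class(df_demo$time)            # POSIXct, so scale_x_date() rejects it

# Option 1: drop the time of day and use scale_x_date()
as_day <- as.Date(df_demo$time, tz = "UTC")
class(as_day)

# Option 2: keep POSIXct and use the datetime scale instead
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(df_demo, ggplot2::aes(time, reading)) +
    ggplot2::geom_point() +
    ggplot2::scale_x_datetime(date_breaks = "1 month", date_labels = "%b")
}
```

Passing tz to as.Date makes the conversion explicit, which matters when timestamps sit close to midnight.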

Change date order at axis

I am always struggling with this, so I think it is finally time to ask for some help...
I tried to make a reproducible example, but for some reason I cannot get my x$monthday in the %m-%d format :(.
x<-data.frame(seq(as.POSIXct('2012-10-01'), as.POSIXct('2015-03-01'), by= "day"))
names(x)<- "date"
x$month<- months(x$date)
x$monthday<- as.POSIXct(x$date, format= "%m-%d")
x1<- x[x$month== 'October' |x$month== 'November' | x$month== 'December' |x$month== 'January'|x$month== 'February', ]
y<- 1: nrow(x1)
x2<-cbind(x1, y)
x3<- aggregate(list(y=x2$y), list(monthday=x2$monthday), mean)
plot(x3$monthday, x3$y)
The date has the format of %m/%d and is of a time series from October-March.
R orders the axis beautifully from January to December, which causes a big gap in between, because my data range from October-March.
How can I make my x axis order in the form from October-March?
Thank you very much in advance.
library(dplyr)
library(ggplot2)
library(lubridate)
# Fake data
dat <- data.frame(date=seq(as.POSIXct('2012-10-01'), as.POSIXct('2015-03-01'), by="day"))
set.seed(23)
dat$temperature = cumsum(rnorm(nrow(dat)))
# Subset to October - February (months 10-12 and 1-2), as in the question's code
dat <- dat[months(dat$date) %in% month.name[c(1:2,10:12)], ]
# Calculate mean daily temperature
dat = dat %>% group_by(Month=month(date), Day=day(date)) %>%
summarise(dailyMeanTemp = mean(temperature)) %>%
mutate(newDate = as.Date(ifelse(Month %in% 10:12,
paste0("2014-", Month, "-", Day),
paste0("2015-", Month, "-", Day))))
The mutate function above creates a fake year, only so that we can keep the dates in "date" format and get them ordered from October to March. There's probably a better way to do it (maybe a function in the zoo or xts packages), but this seems to work.
ggplot(dat, aes(newDate, dailyMeanTemp)) +
geom_line() + geom_point() +
labs(y="Mean Temperature", x="Month")
Or, in base graphics:
plot(dat$newDate, dat$dailyMeanTemp)
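The fake-year trick can be seen in isolation on a few month/day pairs. A minimal base R sketch, with made-up values and the same placeholder years (2014 and 2015) as the answer:

```r
# One illustrative day from each month of the October - February season
month <- c(1L, 2L, 10L, 11L, 12L)
day   <- c(15L, 15L, 15L, 15L, 15L)

# Autumn months get the earlier placeholder year, winter months the later one
plot_year <- ifelse(month >= 10L, 2014L, 2015L)
plot_date <- as.Date(sprintf("%d-%02d-%02d", plot_year, month, day))

# Ordering by the placeholder date now runs October, November, December,
# January, February
month[order(plot_date)]
```

The placeholder years carry no meaning of their own; they only exist to make the dates sort in season order, so the axis labels should show months only.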
