How to plot 24 hours for 365 days - r

How can I plot hourly time series data so that the x-axis runs from 1 to 24? Let's say I have one year of data, so 365 days and 8000+ rows.
I tried with ggplot2 but couldn't get it to work.
The head looks like this:
Value DateTime
1 104 2018-01-01 01:00:00
2 104 2018-01-01 02:00:00
3 108 2018-01-01 03:00:00
4 106 2018-01-01 04:00:00
5 117 2018-01-01 05:00:00
6 166 2018-01-01 06:00:00
And Tail
Value DateTime
8754 160.10 2018-12-31 19:00:00
8755 156.11 2018-12-31 20:00:00
8756 139.11 2018-12-31 21:00:00
8757 112.11 2018-12-31 22:00:00
8758 96.10 2018-12-31 23:00:00
8759 90.11 2019-01-01 00:00:00
Here is an image of what I'm trying to achieve.

What about having the time of day and the date as separate variables? You can use the hms package to do this.
df$timeOfDay <- hms::as_hms(df$DateTime)
df$date <- as.Date(df$DateTime)
Now you can use timeOfDay on the x-axis and date as the grouping aesthetic.
This works for me:
library(ggplot2)
ggplot(df, aes(x = timeOfDay, y = Value)) +
  geom_line(aes(group = date))
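The same idea (one line per day, hour of day on the x-axis) can be sketched in base R without extra packages. This is only an illustration under assumptions: the three days of hourly data are made up, and the names hour and by_day are not from the original post.

```r
# Toy data (assumed): three days of hourly values, standing in for the real df
dt <- seq(as.POSIXct("2018-01-01 00:00:00", tz = "UTC"), by = "hour", length.out = 72)
df <- data.frame(Value = seq_along(dt), DateTime = dt)

# Split the timestamp into hour of day and calendar date
df$hour <- as.integer(format(df$DateTime, "%H"))
df$date <- as.Date(df$DateTime, tz = "UTC")

# One data frame per day
by_day <- split(df, df$date)

# Empty canvas, then one line per day
plot(NULL, xlim = c(0, 23), ylim = range(df$Value),
     xlab = "Hour of day", ylab = "Value")
for (d in by_day) lines(d$hour, d$Value)
```

With a full year of data the loop simply draws 365 lines on the same 0 to 23 axis.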

Related

Remove days based on number of hours missing

I have some air pollution data measured by hours.
Datetime PM2.5 Station.id
2020-01-01 00:00:00 10 1
2020-01-01 01:00:00 NA 1
2020-01-01 02:00:00 15 1
2020-01-01 03:00:00 NA 1
2020-01-01 04:00:00 7 1
2020-01-01 05:00:00 20 1
2020-01-01 06:00:00 30 1
2020-01-01 00:00:00 NA 2
2020-01-01 01:00:00 17 2
2020-01-01 02:00:00 21 2
2020-01-01 03:00:00 55 2
I have a very large amount of data collected from many stations. Using R, what is the most efficient way to remove a day when it has (1) a total of 18 hours of missing data AND (2) 8 continuous hours of missing data?
PS. In the original data, the rows with missing values may either have been removed already or be present as NAs.
The "most efficient" way will almost certainly use data.table. Something like this:
library(data.table)
setDT(your_data)
# longest run of consecutive NAs in a vector (0 if there are none)
max_na_run <- function(x) {
  r <- rle(is.na(x))
  max(r$lengths[r$values], 0L)
}
# keep a station-day unless it has >= 18 missing hours AND a run of >= 8 missing hours
your_data[, date := as.IDate(Datetime)][,
  if (!(sum(is.na(PM2.5)) >= 18 && max_na_run(PM2.5) >= 8)) .SD,
  by = .(date, Station.id)
]
# date Station.id Datetime PM2.5
# 1: 2020-01-01 1 2020-01-01 00:00:00 10
# 2: 2020-01-01 1 2020-01-01 01:00:00 NA
# 3: 2020-01-01 1 2020-01-01 02:00:00 15
# 4: 2020-01-01 1 2020-01-01 03:00:00 NA
# 5: 2020-01-01 1 2020-01-01 04:00:00 7
# 6: 2020-01-01 1 2020-01-01 05:00:00 20
# 7: 2020-01-01 1 2020-01-01 06:00:00 30
Using this sample data:
your_data = fread(text = 'Datetime,PM2.5,Station.id
2020-01-01 00:00:00,10,1
2020-01-01 01:00:00,NA,1
2020-01-01 02:00:00,15,1
2020-01-01 03:00:00,NA,1
2020-01-01 04:00:00,7,1
2020-01-01 05:00:00,20,1
2020-01-01 06:00:00,30,1')
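To see what the rle() part of the condition computes, here is a small base R sketch on a made-up vector (the values in x are assumed, not taken from the question). It counts the total number of NAs and the length of the longest consecutive run of NAs, which are the two quantities compared against 18 and 8 above.

```r
# Toy vector (assumed): one run of three NAs and one isolated NA
x <- c(10, NA, NA, NA, 5, NA, 7)

# Run-length encoding of the "is missing" flags
r <- rle(is.na(x))

# Total number of missing values
total_missing <- sum(is.na(x))

# Longest run of consecutive NAs; the trailing 0L covers the case of no NAs at all
longest_run <- max(r$lengths[r$values], 0L)
```

Without the 0L fallback, max() is called on an empty vector for a day with no missing hours, which gives -Inf with a warning.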

how to generate two columns with a 2019 year date and each date having time slot 9.00 am to 8.00 pm?

Create 2 columns in R, one holding a 2019 date and the second holding a time, with time slots from 9:00 AM to 8:00 PM at 1-hour intervals. So in total each date should have 12 rows (9:00 through 20:00 inclusive). For example (below)
I am not sure, what is your desired column type, so you have different options below :-)
Here comes my solution:
library(lubridate)
library(tidyverse)
start <- ymd_hms("2019-05-01 09:00:00")
end <- start + hm("11:00")
tibble(timestamp = seq.POSIXt(start, end, by = 3600)) %>%
mutate(day = date(timestamp),
time = format(timestamp, "%H:%M:%S")) %>%
select(day, time, timestamp)
day time timestamp
<date> <chr> <dttm>
1 2019-05-01 09:00:00 2019-05-01 09:00:00
2 2019-05-01 10:00:00 2019-05-01 10:00:00
3 2019-05-01 11:00:00 2019-05-01 11:00:00
4 2019-05-01 12:00:00 2019-05-01 12:00:00
5 2019-05-01 13:00:00 2019-05-01 13:00:00
6 2019-05-01 14:00:00 2019-05-01 14:00:00
7 2019-05-01 15:00:00 2019-05-01 15:00:00
8 2019-05-01 16:00:00 2019-05-01 16:00:00
9 2019-05-01 17:00:00 2019-05-01 17:00:00
10 2019-05-01 18:00:00 2019-05-01 18:00:00
11 2019-05-01 19:00:00 2019-05-01 19:00:00
12 2019-05-01 20:00:00 2019-05-01 20:00:00
Regards
Paweł
A random date range:
df <- data.frame(
date = seq.Date(Sys.Date() - 6, Sys.Date(), 1)
)
df <- merge(df,expand.grid(date = df$date, time = 9:20))
df <- df[order(df$date, df$time), ]
df$time <- sprintf("%02i:00", df$time)
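Both answers above cover a single day or a short range. A minimal base R sketch for the whole of 2019 could look like the following; the names dates and slots are chosen here for illustration, and the hourly slots 9 through 20 are assumed to match the "9:00 AM to 8:00 PM" in the question.

```r
# Every calendar day in 2019
dates <- seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = "day")

# All combinations of hour slot and date; hour varies fastest,
# so rows come out ordered by date, then hour
slots <- expand.grid(hour = 9:20, date = dates)
slots <- slots[, c("date", "hour")]

# Format the hour as a zero-padded time string, e.g. 9 becomes "09:00"
slots$time <- sprintf("%02d:00", slots$hour)
```

This gives 365 dates times 12 slots, so 4380 rows in total.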

R: calculate number of occurrences which have started but not ended - count if within a datetime range

I've got a dataset with the following shape
ID Start Time End Time
1 01/01/2017 00:15:00 01/01/2017 07:15:00
2 01/01/2017 04:45:00 01/01/2017 06:15:00
3 01/01/2017 10:20:00 01/01/2017 20:15:00
4 01/01/2017 02:15:00 01/01/2017 00:15:00
5 02/01/2017 15:15:00 03/01/2017 00:30:00
6 03/01/2017 07:00:00 04/01/2017 09:15:00
I would like to count, every 15 minutes for an entire year, how many items have started but not finished, i.e. the number of items with a start time less than or equal to the time I'm looking at and an end time greater than that time.
I'm looking for an approach using tidyverse/dplyr if possible.
Any help or guidance would be very much appreciated.
If I understand correctly, the OP wants to count the number of simultaneously active events.
One possibility to tackle this question is the coverage() function from Bioconductor's IRanges package. Another one is to aggregate in a non-equi join, which is available with the data.table package.
Non-equi join
# create sequence of datetimes (limited to 4 days for demonstration)
seq15 <- seq(lubridate::as_datetime("2017-01-01"),
lubridate::as_datetime("2017-01-05"), by = "15 mins")
# aggregate within a non-equi join
library(data.table)
result <- periods[.(time = seq15), on = .(Start.Time <= time, End.Time > time),
.(time, count = sum(!is.na(ID))), by = .EACHI][, .(time, count)]
result
time count
1: 2017-01-01 00:00:00 0
2: 2017-01-01 00:15:00 1
3: 2017-01-01 00:30:00 1
4: 2017-01-01 00:45:00 1
5: 2017-01-01 01:00:00 1
---
381: 2017-01-04 23:00:00 0
382: 2017-01-04 23:15:00 0
383: 2017-01-04 23:30:00 0
384: 2017-01-04 23:45:00 0
385: 2017-01-05 00:00:00 0
The result can be visualized graphically:
library(ggplot2)
ggplot(result) + aes(time, count) + geom_step()
Data
periods <- readr::read_table(
"ID Start.Time End.Time
1 01/01/2017 00:15:00 01/01/2017 07:15:00
2 01/01/2017 04:45:00 01/01/2017 06:15:00
3 01/01/2017 10:20:00 01/01/2017 20:15:00
4 01/01/2017 02:15:00 01/01/2017 00:15:00
5 02/01/2017 15:15:00 03/01/2017 00:30:00
6 03/01/2017 07:00:00 04/01/2017 09:15:00"
)
# convert date-time strings to class POSIXct
library(data.table)
cols <- names(periods)[names(periods) %like% "Time$"]
setDT(periods)[, (cols) := lapply(.SD, lubridate::dmy_hms), .SDcols = cols]
periods
ID Start.Time End.Time
1: 1 2017-01-01 00:15:00 2017-01-01 07:15:00
2: 2 2017-01-01 04:45:00 2017-01-01 06:15:00
3: 3 2017-01-01 10:20:00 2017-01-01 20:15:00
4: 4 2017-01-01 02:15:00 2017-01-01 00:15:00
5: 5 2017-01-02 15:15:00 2017-01-03 00:30:00
6: 6 2017-01-03 07:00:00 2017-01-04 09:15:00
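The join condition (Start.Time <= time, End.Time > time) can be cross-checked with a plain base R loop. The sketch below is only an illustration: it uses the first three rows of the sample data and a one-day grid, and the names grid and count are assumptions. It is far slower than the non-equi join on a full year, but it makes the counting rule explicit.

```r
# First three rows of the sample data, as POSIXct in UTC
periods <- data.frame(
  ID = 1:3,
  Start.Time = as.POSIXct(c("2017-01-01 00:15:00", "2017-01-01 04:45:00",
                            "2017-01-01 10:20:00"), tz = "UTC"),
  End.Time = as.POSIXct(c("2017-01-01 07:15:00", "2017-01-01 06:15:00",
                          "2017-01-01 20:15:00"), tz = "UTC")
)

# 15-minute grid covering one day (97 points including both midnights)
grid <- seq(as.POSIXct("2017-01-01 00:00:00", tz = "UTC"),
            as.POSIXct("2017-01-02 00:00:00", tz = "UTC"), by = "15 min")

# For each grid point, count items that have started but not yet ended
count <- vapply(seq_along(grid), function(i) {
  sum(periods$Start.Time <= grid[i] & periods$End.Time > grid[i])
}, integer(1))

result <- data.frame(time = grid, count = count)
```

An item counts as active from its start time up to, but not including, its end time.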

subset data with timestamp irrespective of date in R

The Dataset
head(data)
Date OPEN
2015-11-30 10:00:00 951.15
2015-11-30 10:30:00 949.90
2015-11-30 11:00:00 943.45
2015-11-30 11:30:00 944.30
2015-11-30 12:00:00 942.00
2015-11-30 12:30:00 940.60
2015-01-01 10:00:00 951.15
2015-01-01 10:30:00 949.90
2015-01-02 10:30:00 943.45
2015-01-02 11:30:00 944.30
2015-01-03 10:00:00 943.45
2015-01-03 10:30:00 943.45
2015-01-03 11:30:00 944.30
2015-01-06 10:00:00 942.00
2015-01-06 10:30:00 940.60
2015-01-06 11:00:00 940.60
2015-01-06 11:30:00 942.00
str(data)
'data.frame': 32023 obs. of 2 variables:
$ Date : POSIXct, format: "2015-11-30 10:00:00" "2015-11-30 10:30:00" "2015-11-30 11:00:00" ...
$ OPEN : num 951 950 943 944 942 ...
Hi,
The data frame is shown above. I want to extract the OPEN prices with timestamps 10:00 and 10:30 for all available dates. The filter should keep only timestamps from 10:00 to 10:30, irrespective of the date. Please suggest how to do this in R.
Thanks.
We can format the 'Date' to extract the HH:MM part, use %in% to get a logical vector and subset based on that.
subset(data, format(Date, "%H:%M") %in% c("10:00", "10:30"), select="OPEN")
# OPEN
#1 951.15
#2 949.90
#7 951.15
#8 949.90
#9 943.45
#11 943.45
#12 943.45
#14 942.00
#15 940.60
If you need everything between those two times (inclusive):
library(chron)
tod <- times(format(data$Date, "%H:%M:%S"))
data[tod >= times("10:00:00") & tod <= times("10:30:00"), ]
You can also use the lubridate package for a more readable subset:
library(lubridate)
res <- subset(data, minute(Date) <=30 & hour(Date) == 10)
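A range filter is also possible without extra packages, because zero-padded "HH:MM" strings sort in the same order as the times they represent. The sketch below uses a small made-up data frame (the 10:15 row is not in the question's data) and the assumed names hhmm and morning.

```r
# Toy data (assumed), in the same shape as the question's data frame
data <- data.frame(
  Date = as.POSIXct(c("2015-11-30 10:00:00", "2015-11-30 10:30:00",
                      "2015-11-30 11:00:00", "2015-01-02 10:15:00"), tz = "UTC"),
  OPEN = c(951.15, 949.90, 943.45, 944.30)
)

# Time of day as a zero-padded string, ignoring the date
hhmm <- format(data$Date, "%H:%M")

# Keep rows whose time of day falls in [10:00, 10:30]
morning <- data[hhmm >= "10:00" & hhmm <= "10:30", ]
```

Unlike the %in% approach, this also keeps in-between timestamps such as 10:15.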

Split time series data into time intervals (say an hour) and then plot the count

I just have a data file with one column of time series:
'2012-02-01 17:42:44'
'2012-02-01 17:42:44'
'2012-02-01 17:42:44'
...
I want to split the data up such that I have a count at the top of hour. Say:
'2012-02-01 17:00:00' 20
'2012-02-01 18:00:00' 30
The '20' and '30' represent the number of time series entries for that hour. I want to be able to graph the time vs. that count. How can I do this with R?
Here is my current line graph plot.
library(ggplot2)
req <- read.table("times1.dat")
summary(req)
da <- req$V2
db <- req$V1
time <- as.POSIXct(db)
png('time_data_errs.png', width=800, height=600)
gg <- qplot(time, da) + geom_line()
print(gg)
dev.off()
It sounds like you want to use cut to figure out how many values occur within an hour.
It's generally helpful if you can provide some sample data. Here's some:
set.seed(1) # So you can get the same numbers as I do
MyDates <- ISOdatetime(2012, 1, 1, 0, 0, 0, tz = "GMT") + sample(1:27000, 500)
head(MyDates)
# [1] "2012-01-01 01:59:29 GMT" "2012-01-01 02:47:27 GMT" "2012-01-01 04:17:46 GMT"
# [4] "2012-01-01 06:48:39 GMT" "2012-01-01 01:30:45 GMT" "2012-01-01 06:44:13 GMT"
You can use table and cut (with the argument breaks="hour"; see ?cut.POSIXt for more info) to find the frequencies per hour.
MyDatesTable <- table(cut(MyDates, breaks="hour"))
MyDatesTable
#
# 2012-01-01 00:00:00 2012-01-01 01:00:00 2012-01-01 02:00:00 2012-01-01 03:00:00
# 59 73 74 83
# 2012-01-01 04:00:00 2012-01-01 05:00:00 2012-01-01 06:00:00 2012-01-01 07:00:00
# 52 62 64 33
# Or a data.frame if you prefer
data.frame(MyDatesTable)
# Var1 Freq
# 1 2012-01-01 00:00:00 59
# 2 2012-01-01 01:00:00 73
# 3 2012-01-01 02:00:00 74
# 4 2012-01-01 03:00:00 83
# 5 2012-01-01 04:00:00 52
# 6 2012-01-01 05:00:00 62
# 7 2012-01-01 06:00:00 64
# 8 2012-01-01 07:00:00 33
Finally, here's a line plot of the MyDatesTable object:
plot(MyDatesTable, type="l", xlab="Time", ylab="Freq")
cut can handle a range of time intervals. For example, if you wanted to tabulate for every 30 minutes, you can easily adapt the breaks argument to handle that:
data.frame(table(cut(MyDates, breaks = "30 mins")))
# Var1 Freq
# 1 2012-01-01 00:00:00 22
# 2 2012-01-01 00:30:00 37
# 3 2012-01-01 01:00:00 38
# 4 2012-01-01 01:30:00 35
# 5 2012-01-01 02:00:00 32
# 6 2012-01-01 02:30:00 42
# 7 2012-01-01 03:00:00 39
# 8 2012-01-01 03:30:00 44
# 9 2012-01-01 04:00:00 25
# 10 2012-01-01 04:30:00 27
# 11 2012-01-01 05:00:00 33
# 12 2012-01-01 05:30:00 29
# 13 2012-01-01 06:00:00 29
# 14 2012-01-01 06:30:00 35
# 15 2012-01-01 07:00:00 33
Update
Since you were trying to plot with ggplot2, here's one approach (not sure if it is the best since I usually use base R's graphics when I need to).
Create a data.frame of the table (as demonstrated above) and add a dummy "group" variable and plot that as follows:
MyDatesDF <- data.frame(MyDatesTable, grp = 1)
ggplot(MyDatesDF, aes(Var1, Freq)) + geom_line(aes(group = grp))
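One possible refinement, sketched here under assumptions: the first column of the table is a factor, which is why the dummy group is needed. Converting the labels back to POSIXct gives a real time axis. The names counts and hour are chosen for illustration.

```r
set.seed(1)
MyDates <- ISOdatetime(2012, 1, 1, 0, 0, 0, tz = "GMT") + sample(1:27000, 500)

# Hourly counts as a data frame; the first column holds the bin labels as a factor
counts <- as.data.frame(table(cut(MyDates, breaks = "hour")))
names(counts) <- c("hour", "Freq")

# Turn the factor labels back into date-times
counts$hour <- as.POSIXct(as.character(counts$hour), tz = "GMT")

# Base graphics line plot with a proper time axis
plot(counts$hour, counts$Freq, type = "l", xlab = "Time", ylab = "Freq")
```

With hour stored as POSIXct, ggplot(counts, aes(hour, Freq)) + geom_line() should also work without the dummy group variable.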

Resources