I have this date-time value (class POSIXct):
dput(test)
structure(1376661600, class = c("POSIXct", "POSIXt"), tzone = "")
I need to increment this value by one hour if the time is later than 07:00 and earlier than 13:00 and the date falls on a weekday (Monday to Friday).
Is there some sort of package I can use to do this?
# A data.frame with a .POSIXct column
d <- data.frame(x = .POSIXct(0, tz="GMT") + 6:14*60*60)
d
# x
#1 1970-01-01 06:00:00
#2 1970-01-01 07:00:00
#3 1970-01-01 08:00:00
#4 1970-01-01 09:00:00
#5 1970-01-01 10:00:00
#6 1970-01-01 11:00:00
#7 1970-01-01 12:00:00
#8 1970-01-01 13:00:00
#9 1970-01-01 14:00:00
# get the hours
hour <- as.POSIXlt(d[["x"]])$hour
subsetBool <- hour > 7 & hour < 13 # a logical vector to use for subsetting
# replace subset with subset + 1 hour
d[["x"]][subsetBool] <- d[["x"]][subsetBool] + 60 * 60
d
# x
#1 1970-01-01 06:00:00
#2 1970-01-01 07:00:00
#3 1970-01-01 09:00:00
#4 1970-01-01 10:00:00
#5 1970-01-01 11:00:00
#6 1970-01-01 12:00:00
#7 1970-01-01 13:00:00
#8 1970-01-01 13:00:00
#9 1970-01-01 14:00:00
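The answer above checks only the hour and does not apply the Monday-to-Friday condition from the question. Below is a minimal base-R sketch that adds the weekday check and compares the full clock time, so 07:30 counts as later than 07:00. The function name shift_weekday_morning is hypothetical, and "one hour" is taken to mean 3600 seconds. On a daylight-saving changeover day in a local time zone, that is not the same as one clock hour.

```r
# Hypothetical helper: shift date-times by one hour when the clock time is strictly
# between 07:00 and 13:00 and the day is Monday to Friday. Base R only.
shift_weekday_morning <- function(x) {
  lt <- as.POSIXlt(x)                              # keeps the tzone attribute of x
  secs <- lt$hour * 3600 + lt$min * 60 + lt$sec    # seconds since midnight
  in_window <- secs > 7 * 3600 & secs < 13 * 3600  # strictly after 07:00, before 13:00
  is_weekday <- lt$wday %in% 1:5                   # wday: 0 = Sunday, 6 = Saturday
  hit <- which(in_window & is_weekday)             # which() drops NAs safely
  x[hit] <- x[hit] + 3600                          # add one hour (in seconds)
  x
}

# 2013-08-16 is a Friday and 2013-08-17 a Saturday (times in UTC)
x <- as.POSIXct(c("2013-08-16 07:00:00", "2013-08-16 07:30:00",
                  "2013-08-16 12:59:00", "2013-08-17 09:00:00"), tz = "UTC")
shifted <- shift_weekday_morning(x)
# 07:00 unchanged, 07:30 -> 08:30, 12:59 -> 13:59, Saturday 09:00 unchanged
format(shifted, "%Y-%m-%d %H:%M")
```

For a data frame column, apply it as `d[["x"]] <- shift_weekday_morning(d[["x"]])`.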
I have a dataset of hourly observations with the format %Y%m%d %H:%M, which looks like 2020-03-01 01:00:00, for various days. How can I filter out a certain time interval? My goal is to keep the observations between 08:00 and 20:00.
You can extract the hour from the column and keep the rows whose hour is between 8 and 20 (inclusive).
df$hour <- as.integer(format(df$datetime, '%H'))
result <- subset(df, hour >= 8 & hour <= 20)
result
# datetime hour
#9 2020-01-01 08:00:00 8
#10 2020-01-01 09:00:00 9
#11 2020-01-01 10:00:00 10
#12 2020-01-01 11:00:00 11
#13 2020-01-01 12:00:00 12
#14 2020-01-01 13:00:00 13
#15 2020-01-01 14:00:00 14
#16 2020-01-01 15:00:00 15
#17 2020-01-01 16:00:00 16
#18 2020-01-01 17:00:00 17
#19 2020-01-01 18:00:00 18
#20 2020-01-01 19:00:00 19
#21 2020-01-01 20:00:00 20
#33 2020-01-02 08:00:00 8
#34 2020-01-02 09:00:00 9
#35 2020-01-02 10:00:00 10
#...
#...
Data
df <- data.frame(datetime = seq(as.POSIXct('2020-01-01 00:00:00', tz = 'UTC'),
as.POSIXct('2020-01-10 00:00:00', tz = 'UTC'), 'hour'))
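Alternatively, with dplyr's between() (both bounds inclusive) and lubridate's hour():
library(dplyr)      # between()
library(lubridate)  # hour()
between(hour(your_date_value), 8, 19)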
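If the data are finer than hourly, comparing only the hour keeps every timestamp from 20:00 to 20:59. Below is a base-R sketch that compares the full clock time instead, so the upper bound is exactly 20:00:00. It assumes a POSIXct column named datetime, as in the first answer, and uses made-up 15-minute data.

```r
# Hypothetical 15-minute series covering one day, in UTC
x <- seq(as.POSIXct("2020-03-01 00:00:00", tz = "UTC"), by = "15 min", length.out = 96)
df <- data.frame(datetime = x)

# Zero-padded "HH:MM:SS" strings sort in clock order, so string comparison works.
clock <- format(df$datetime, "%H:%M:%S")
result <- df[clock >= "08:00:00" & clock <= "20:00:00", , drop = FALSE]

# Keeps 08:00 through 20:00 inclusive: 12 hours x 4 steps + 1 = 49 rows
nrow(result)
```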
Create 2 columns in R: one holding a 2019 date and the other a time, with hourly slots from 9:00 AM to 8:00 PM, so each date gets one row per hourly slot (09:00 through 20:00).
I'm not sure what column type you want, so below are several options :-)
Here is my solution:
library(lubridate)
library(tidyverse)
start <- ymd_hms("2019-05-01 09:00:00")
end <- start + hm("11:00")
tibble(timestamp = seq.POSIXt(start, end, by = 3600)) %>%
mutate(day = date(timestamp),
time = format(timestamp, "%H:%M:%S")) %>%
select(day, time, timestamp)
day time timestamp
<date> <chr> <dttm>
1 2019-05-01 09:00:00 2019-05-01 09:00:00
2 2019-05-01 10:00:00 2019-05-01 10:00:00
3 2019-05-01 11:00:00 2019-05-01 11:00:00
4 2019-05-01 12:00:00 2019-05-01 12:00:00
5 2019-05-01 13:00:00 2019-05-01 13:00:00
6 2019-05-01 14:00:00 2019-05-01 14:00:00
7 2019-05-01 15:00:00 2019-05-01 15:00:00
8 2019-05-01 16:00:00 2019-05-01 16:00:00
9 2019-05-01 17:00:00 2019-05-01 17:00:00
10 2019-05-01 18:00:00 2019-05-01 18:00:00
11 2019-05-01 19:00:00 2019-05-01 19:00:00
12 2019-05-01 20:00:00 2019-05-01 20:00:00
Regards
Paweł
A base R alternative, using an arbitrary date range (the last seven days):
df <- data.frame(
date = seq.Date(Sys.Date() - 6, Sys.Date(), 1)
)
df <- merge(df,expand.grid(date = df$date, time = 9:20))
df <- df[order(df$date, df$time), ]
df$time <- sprintf("%02i:00", df$time)
I've got a dataset with the following shape:
ID  Start Time           End Time
1   01/01/2017 00:15:00  01/01/2017 07:15:00
2   01/01/2017 04:45:00  01/01/2017 06:15:00
3   01/01/2017 10:20:00  01/01/2017 20:15:00
4   01/01/2017 02:15:00  01/01/2017 00:15:00
5   02/01/2017 15:15:00  03/01/2017 00:30:00
6   03/01/2017 07:00:00  04/01/2017 09:15:00
I would like to count, every 15 minutes over an entire year, how many items have started but not finished, i.e. the number of items with a start time less than or equal to the time I'm looking at and an end time greater than it.
I'm looking for an approach using tidyverse/dplyr if possible.
Any help or guidance would be very much appreciated.
If I understand correctly, the OP wants to count the number of simultaneously active events.
One possibility to tackle this question is the coverage() function from Bioconductor's IRanges package. Another is to aggregate in a non-equi join, which is available in the data.table package.
Non-equi join
# create sequence of datetimes (limited to 4 days for demonstration)
seq15 <- seq(lubridate::as_datetime("2017-01-01"),
lubridate::as_datetime("2017-01-05"), by = "15 mins")
# aggregate within a non-equi join
library(data.table)
result <- periods[.(time = seq15), on = .(Start.Time <= time, End.Time > time),
.(time, count = sum(!is.na(ID))), by = .EACHI][, .(time, count)]
result
time count
1: 2017-01-01 00:00:00 0
2: 2017-01-01 00:15:00 1
3: 2017-01-01 00:30:00 1
4: 2017-01-01 00:45:00 1
5: 2017-01-01 01:00:00 1
---
381: 2017-01-04 23:00:00 0
382: 2017-01-04 23:15:00 0
383: 2017-01-04 23:30:00 0
384: 2017-01-04 23:45:00 0
385: 2017-01-05 00:00:00 0
The result can be visualized graphically:
library(ggplot2)
ggplot(result) + aes(time, count) + geom_step()
Data
periods <- data.frame(
  ID = 1:6,
  Start.Time = c("01/01/2017 00:15:00", "01/01/2017 04:45:00", "01/01/2017 10:20:00",
                 "01/01/2017 02:15:00", "02/01/2017 15:15:00", "03/01/2017 07:00:00"),
  End.Time = c("01/01/2017 07:15:00", "01/01/2017 06:15:00", "01/01/2017 20:15:00",
               "01/01/2017 00:15:00", "03/01/2017 00:30:00", "04/01/2017 09:15:00"),
  stringsAsFactors = FALSE
)
# convert date-time strings (day/month/year) to class POSIXct
library(data.table)
cols <- names(periods)[names(periods) %like% "Time$"]
setDT(periods)[, (cols) := lapply(.SD, lubridate::dmy_hms), .SDcols = cols]
periods
ID Start.Time End.Time
1: 1 2017-01-01 00:15:00 2017-01-01 07:15:00
2: 2 2017-01-01 04:45:00 2017-01-01 06:15:00
3: 3 2017-01-01 10:20:00 2017-01-01 20:15:00
4: 4 2017-01-01 02:15:00 2017-01-01 00:15:00
5: 5 2017-01-02 15:15:00 2017-01-03 00:30:00
6: 6 2017-01-03 07:00:00 2017-01-04 09:15:00
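For comparison, here is a brute-force base-R sketch that needs no packages. For every 15-minute time t, it counts the periods with start <= t < end, the same condition as the non-equi join. It rebuilds the six example periods as standalone vectors. Because it compares every period with every grid time, the data.table join will scale better for a full year of 15-minute steps (35,040 points) and many periods.

```r
# Standalone copy of the six example periods, parsed as UTC
fmt <- "%d/%m/%Y %H:%M:%S"
start <- as.POSIXct(c("01/01/2017 00:15:00", "01/01/2017 04:45:00", "01/01/2017 10:20:00",
                      "01/01/2017 02:15:00", "02/01/2017 15:15:00", "03/01/2017 07:00:00"),
                    format = fmt, tz = "UTC")
end <- as.POSIXct(c("01/01/2017 07:15:00", "01/01/2017 06:15:00", "01/01/2017 20:15:00",
                    "01/01/2017 00:15:00", "03/01/2017 00:30:00", "04/01/2017 09:15:00"),
                  format = fmt, tz = "UTC")

# 15-minute grid over 4 days, as in the data.table example
grid <- seq(as.POSIXct("2017-01-01", tz = "UTC"),
            as.POSIXct("2017-01-05", tz = "UTC"), by = "15 mins")

# For each grid time, count the periods that are active (start <= t < end)
count <- vapply(seq_along(grid),
                function(k) sum(start <= grid[k] & end > grid[k]),
                integer(1))
result <- data.frame(time = grid, count = count)
head(result)
```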
I have a data frame in which I split the datetime column into date and time (two columns). However, when I group by time I get duplicate times. To investigate, I used table() on the time column, and it showed duplicates too. This is a sample:
> table(df$time)
00:00:00 00:00:00 00:15:00 00:15:00 00:30:00 00:30:00
2211 1047 2211 1047 2211 1047
As you can see, after the split one of the "unique" values kept an extra space (" "), so it no longer matches the other. Is there an easy way to solve this?
PS: The datatype of the time column is character.
EDIT: Code added
library(reshape2)  # for colsplit()
df$datetime <- as.character.Date(df$datetime)
x <- colsplit(df$datetime, ' ', names = c('Date','Time'))
df <- cbind(df, x)
There are a number of approaches. One of them is to extract the date and the time from the datetime column with date-time functions (as.Date() and format()) instead of splitting strings:
df <- data.frame(datetime = seq(
from=as.POSIXct("2018-5-15 0:00", tz="UTC"),
to=as.POSIXct("2018-5-16 24:00", tz="UTC"),
by="30 min") )
head(df$datetime)
#[1] "2018-05-15 00:00:00 UTC" "2018-05-15 00:30:00 UTC" "2018-05-15 01:00:00 UTC" "2018-05-15 01:30:00 UTC"
#[5] "2018-05-15 02:00:00 UTC" "2018-05-15 02:30:00 UTC"
df$Date <- as.Date(df$datetime)
df$Time <- format(df$datetime,"%H:%M:%S")
head(df)
# datetime Date Time
# 1 2018-05-15 00:00:00 2018-05-15 00:00:00
# 2 2018-05-15 00:30:00 2018-05-15 00:30:00
# 3 2018-05-15 01:00:00 2018-05-15 01:00:00
# 4 2018-05-15 01:30:00 2018-05-15 01:30:00
# 5 2018-05-15 02:00:00 2018-05-15 02:00:00
# 6 2018-05-15 02:30:00 2018-05-15 02:30:00
table(df$Time)
#00:00:00 00:30:00 01:00:00 01:30:00 02:00:00 02:30:00 03:00:00 03:30:00 04:00:00 04:30:00 05:00:00 05:30:00
#3 2 2 2 2 2 2 2 2 2 2 2
#06:00:00 06:30:00 07:00:00 07:30:00 08:00:00 08:30:00 09:00:00 09:30:00 10:00:00 10:30:00 11:00:00 11:30:00
#2 2 2 2 2 2 2 2 2 2 2 2
#12:00:00 12:30:00 13:00:00 13:30:00 14:00:00 14:30:00 15:00:00 15:30:00 16:00:00 16:30:00 17:00:00 17:30:00
#2 2 2 2 2 2 2 2 2 2 2 2
#18:00:00 18:30:00 19:00:00 19:30:00 20:00:00 20:30:00 21:00:00 21:30:00 22:00:00 22:30:00 23:00:00 23:30:00
#2 2 2 2 2 2 2 2 2 2 2 2
# If the data are stored as character strings, even with extra spaces, the approach above still works:
df <- data.frame(datetime=c("2018-05-15 00:00:00","2018-05-15 00:30:00",
"2018-05-15 01:00:00", "2018-05-15 02:00:00",
"2018-05-15 00:00:00","2018-05-15 00:30:00"),
stringsAsFactors=FALSE)
df$Date <- as.Date(df$datetime)
df$Time <- format(as.POSIXct(df$datetime, tz="UTC"),"%H:%M:%S")
head(df)
# datetime Date Time
# 1 2018-05-15 00:00:00 2018-05-15 00:00:00
# 2 2018-05-15 00:30:00 2018-05-15 00:30:00
# 3 2018-05-15 01:00:00 2018-05-15 01:00:00
# 4 2018-05-15 02:00:00 2018-05-15 02:00:00
# 5 2018-05-15 00:00:00 2018-05-15 00:00:00
# 6 2018-05-15 00:30:00 2018-05-15 00:30:00
table(df$Time)
#00:00:00 00:30:00 01:00:00 02:00:00
# 2 2 1 1
reshape2::colsplit() accepts regular expressions, so you could split on \s+ (written "\\s+" in an R string), which matches one or more whitespace characters.
You can find out more about regular expressions in R with ?base::regex. The syntax is largely consistent across languages, so almost any regex tutorial will help. Take a look at https://regex101.com/: it evaluates your regular expressions in real time and shows exactly what each part matches, which is extremely helpful!
Keep in mind that in R string literals the backslash is itself an escape character, so every backslash in a regular expression must be doubled. So \s (to match one whitespace character) must be written as \\s in R.
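The same \s+ idea can be shown without reshape2. The base-R sketch below uses sub() to strip the date part from two made-up strings. A double space stands in for the stray whitespace in the question, which is an assumption about its cause. trimws() is included as an alternative for a Time column that has already been split.

```r
# Hypothetical inputs: the second string has two spaces between date and time.
x <- c("2018-05-15 00:00:00", "2018-05-15  00:00:00")

# Removing the date plus a single space leaves the extra space on the time part:
# time_single is c("00:00:00", " 00:00:00").
time_single <- sub("^\\S+ ", "", x)

# Removing the date plus "\\s+" (one or more whitespace characters) consumes the
# whole run: time_regex is c("00:00:00", "00:00:00").
time_regex <- sub("^\\S+\\s+", "", x)

# If the column is already split, trimws() drops leading and trailing whitespace.
time_fixed <- trimws(time_single)

table(time_regex)  # a single level, "00:00:00", with count 2
```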
I have hourly rainfall and temperature data covering a long period, and I would like to get daily values from the hourly data. I define a day as running from 07:00:00 to 07:00:00 the next day.
Could you tell me how to convert hourly data to daily values over a specific interval (for example, 07:00:00 to 07:00:00, or 12:00:00 to 12:00:00)?
Rainfall data looks like:
1970-01-05 00:00:00 1.0
1970-01-05 01:00:00 1.0
1970-01-05 02:00:00 1.0
1970-01-05 03:00:00 1.0
1970-01-05 04:00:00 1.0
1970-01-05 05:00:00 3.6
1970-01-05 06:00:00 3.6
1970-01-05 07:00:00 2.2
1970-01-05 08:00:00 2.2
1970-01-05 09:00:00 2.2
1970-01-05 10:00:00 2.2
1970-01-05 11:00:00 2.2
1970-01-05 12:00:00 2.2
1970-01-05 13:00:00 2.2
1970-01-05 14:00:00 2.2
1970-01-05 15:00:00 2.2
1970-01-05 16:00:00 0.0
1970-01-05 17:00:00 0.0
1970-01-05 18:00:00 0.0
1970-01-05 19:00:00 0.0
1970-01-05 20:00:00 0.0
1970-01-05 21:00:00 0.0
1970-01-05 22:00:00 0.0
1970-01-05 23:00:00 0.0
1970-01-06 00:00:00 0.0
First, create some reproducible data so we can help you better:
require(xts)
set.seed(1)
# keep the full date-time: wrapping this in as.Date() would drop the hours
X = data.frame(When = seq(from = ISOdatetime(2012, 01, 01, 00, 00, 00),
                          length.out = 100, by = "1 hour"),
               Measurements = sample(1:20, 100, replace=TRUE))
We now have a data frame with 100 hourly observations where the dates start at 2012-01-01 00:00:00 and end at 2012-01-05 03:00:00 (time is in 24-hour format).
Second, convert it to an XTS object.
X2 = xts(X$Measurements, order.by=X$When)
Third, learn how to subset a specific time window.
X2['T04:00/T08:00']
# [,1]
# 2012-01-01 04:00:00 5
# 2012-01-01 05:00:00 18
# 2012-01-01 06:00:00 19
# 2012-01-01 07:00:00 14
# 2012-01-01 08:00:00 13
# 2012-01-02 04:00:00 18
# 2012-01-02 05:00:00 7
# 2012-01-02 06:00:00 10
# 2012-01-02 07:00:00 12
# 2012-01-02 08:00:00 10
# 2012-01-03 04:00:00 9
# 2012-01-03 05:00:00 5
# 2012-01-03 06:00:00 2
# 2012-01-03 07:00:00 2
# 2012-01-03 08:00:00 7
# 2012-01-04 04:00:00 18
# 2012-01-04 05:00:00 8
# 2012-01-04 06:00:00 16
# 2012-01-04 07:00:00 20
# 2012-01-04 08:00:00 9
Fourth, use that information with apply.daily and whatever function you want, as follows:
apply.daily(X2['T04:00/T08:00'], mean)
# [,1]
# 2012-01-01 08:00:00 13.8
# 2012-01-02 08:00:00 11.4
# 2012-01-03 08:00:00 5.0
# 2012-01-04 08:00:00 14.2
Update: Custom endpoints
After re-reading your question, I see that I misinterpreted what you wanted.
It seems that you want to take the mean of a 24 hour period, not necessarily from midnight to midnight.
For this, you should ditch apply.daily and instead, use period.apply with custom endpoints, like this:
# You want to start at 7 AM: find the index of the first record at 07:00.
A = which(as.character(index(X2)) == "2012-01-01 07:00:00")
# Use that index to build the endpoints, one every 24 records.
# Endpoints must start at 0 and end at the total number of records (100 here).
ep = c(0, seq(A, 100, by=24), 100)
period.apply(X2, INDEX=ep, FUN=function(x) mean(x))
# [,1]
# 2012-01-01 07:00:00 12.62500
# 2012-01-02 07:00:00 10.08333
# 2012-01-03 07:00:00 10.79167
# 2012-01-04 07:00:00 11.54167
# 2012-01-05 03:00:00 10.25000
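You can use this code:
# sum of j consecutive values of s, starting at index i
fun <- function(s,i,j) { sum(s[i:(i+j-1)]) }
# nb_of_days and your_time_serie are placeholders for your own data
sapply(X=seq(1,24*nb_of_days,24),FUN=fun,s=your_time_serie,j=24)
To shift the 24-hour window, change the starting index 1: use 8 for 07:00:00 to 07:00:00, or 13 for 12:00:00 to 12:00:00. This assumes the series starts at 00:00:00 with no missing hours. With a shifted start, the last window runs past the end of the series, so its sum is NA.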
Step 1: transform date to POSIXct
ttt <- as.POSIXct("1970-01-05 08:00:00",tz="GMT")
ttt
#"1970-01-05 08:00:00 GMT"
Step 2: subtract a difftime of 7 hours
ttt <- ttt-as.difftime(7,units="hours")
ttt
#"1970-01-05 01:00:00 GMT"
Step 3: truncate to days
ttt<-trunc(ttt,"days")
ttt
#"1970-01-05 GMT"
Step 4: group by the truncated value and use plyr, data.table or whatever method you prefer to calculate daily means
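Below is a minimal base-R sketch of steps 2 to 4 together. It assumes a data frame with a POSIXct column time and a numeric column rain (both names hypothetical) and uses aggregate() for daily totals; use mean instead of sum for temperature. Each 07:00-to-07:00 day is labelled with the date on which it starts.

```r
# Hypothetical data: 48 hourly rainfall values from 1970-01-05 00:00 GMT,
# 1.0 on the first calendar day and 2.0 on the second.
df <- data.frame(
  time = seq(as.POSIXct("1970-01-05 00:00:00", tz = "GMT"), by = "1 hour", length.out = 48),
  rain = rep(c(1, 2), each = 24)
)

# Steps 2-3: shift back 7 hours, then take the calendar date.
# Each day then covers records from 07:00 up to (not including) 07:00 the next day.
df$day <- as.Date(df$time - as.difftime(7, units = "hours"))

# Step 4: daily totals (swap sum for mean to average temperature instead).
daily <- aggregate(rain ~ day, data = df, FUN = sum)
daily
```

With this sample, 1970-01-04 collects the seven readings from 00:00 to 06:00 on the 5th. If the 07:00 reading should instead close the previous day, which is what the period.apply() endpoints above produce, shift by 8 hours rather than 7 for hourly data.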
Using regular expressions should get you what you need: select the lines whose timestamps fall within your hour range, sum their values, and repeat this for each day.