Maintain date time stamp when calculating time intervals - r
I calculated the time intervals between date-time stamps, grouped by location and sensor. Here is some of my data:
datehour <- c("2016-03-24 20","2016-03-24 06","2016-03-24 18","2016-03-24 07","2016-03-24 16",
"2016-03-24 09","2016-03-24 15","2016-03-24 09","2016-03-24 20","2016-03-24 05",
"2016-03-25 21","2016-03-25 07","2016-03-25 19","2016-03-25 09","2016-03-25 12",
"2016-03-25 07","2016-03-25 18","2016-03-25 08","2016-03-25 16","2016-03-25 09",
"2016-03-26 20","2016-03-26 06","2016-03-26 18","2016-03-26 07","2016-03-26 16",
"2016-03-26 09","2016-03-26 15","2016-03-26 09","2016-03-26 20","2016-03-26 05",
"2016-03-27 21","2016-03-27 07","2016-03-27 19","2016-03-27 09","2016-03-27 12",
"2016-03-27 07","2016-03-27 18","2016-03-27 08","2016-03-27 16","2016-03-27 09")
location <- c(1,1,2,2,3,3,4,4,"out","out",1,1,2,2,3,3,4,4,"out","out",
1,1,2,2,3,3,4,4,"out","out",1,1,2,2,3,3,4,4,"out","out")
sensor <- c(1,16,1,16,1,16,1,16,1,16,1,16,1,16,1,16,1,16,1,16,
1,16,1,16,1,16,1,16,1,16,1,16,1,16,1,16,1,16,1,16)
Temp <- c(35,34,92,42,21,47,37,42,63,12,35,34,92,42,21,47,37,42,63,12,
35,34,92,42,21,47,37,42,63,12,35,34,92,42,21,47,37,42,63,12)
df <- data.frame(datehour,location,sensor,Temp)
I used the following code to calculate the time differences. However, it does not keep the correct date-hour with each entry. See columns datehour1 and datehour2.
library(data.table)  # for setDT()

df$datehour <- as.POSIXct(df$datehour, format = "%Y-%m-%d %H")
final.time.df <- setDT(df)[order(datehour, location, sensor),
                           .(difftime(datehour[-length(datehour)], datehour[-1], unit = "hour"),
                             datehour1 = datehour[1], datehour2 = datehour[2]),
                           .(location, sensor)]
I would like each time difference to be labelled with the two timestamps used to calculate it. I would like the result to be the following:
location sensor V1 datehour1 datehour2
out 16 -28 hours 2016-03-24 05:00:00 2016-03-25 09:00:00
1 16 -25 hours 2016-03-24 06:00:00 2016-03-25 07:00:00
2 16 -26 hours 2016-03-24 07:00:00 2016-03-25 09:00:00
3 16 -22 hours 2016-03-24 09:00:00 2016-03-25 07:00:00
4 16 -23 hours 2016-03-24 09:00:00 2016-03-25 08:00:00
4 1 -27 hours 2016-03-24 15:00:00 2016-03-25 18:00:00
3 1 -20 hours 2016-03-24 16:00:00 2016-03-25 12:00:00
2 1 -25 hours 2016-03-24 18:00:00 2016-03-25 19:00:00
1 1 -25 hours 2016-03-24 20:00:00 2016-03-25 21:00:00
out 1 -20 hours 2016-03-24 20:00:00 2016-03-25 16:00:00
Okay, so I'm not an expert by any means at data.table solutions, and as a result I'm not quite sure how you're using the grouping statement to resolve the number of values down to 10.
That said, I think the answer to your question (if you haven't already solved this another way) lies in the difftime(datehour[-length(datehour)], datehour[-1], unit = "hour") chunk of code: not because it calculates the difference incorrectly, but because it prevents the grouping statement from resolving to the expected number of groups.
I tried separating the grouping from the time difference calculation, and was able to get to your expected output (obviously some formatting required):
# group first, keeping the first two timestamps per location/sensor combination
final.time.df <- setDT(df)[order(datehour, location, sensor),
                           .(datehour1 = datehour[1], datehour2 = datehour[2]),
                           .(location, sensor)]
# then compute the difference outside the grouped call
final.time.df$diff <- final.time.df$datehour1 - final.time.df$datehour2
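For what it's worth, the two steps can also be combined into a single data.table call. This is just a sketch (it assumes library(data.table) is loaded and that df$datehour has already been converted with as.POSIXct, as in the question), with units = "hours" so V1 comes out as an hour difference like the expected output:

# sketch: group and compute the difference in one step
final.time.df <- setDT(df)[order(datehour, location, sensor),
                           .(V1 = difftime(datehour[1], datehour[2], units = "hours"),
                             datehour1 = datehour[1], datehour2 = datehour[2]),
                           .(location, sensor)]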
If I've missed the point, feel free to let me know and I'll delete the answer! I know it's not a particularly insightful answer, but it looks like this might do it; I'm stuck on a problem myself right now and wanted to try to help.
Related
Calculating mean and sd of bedtime (hh:mm) in R - the problem is times before/after midnight
I got the following dataset:

data <- read.table(text="
wake_time sleep_time
08:38:00 23:05:00
09:30:00 00:50:00
06:45:00 22:15:00
07:27:00 23:34:00
09:00:00 23:00:00
09:05:00 00:10:00
06:40:00 23:28:00
10:00:00 23:30:00
08:10:00 00:10:00
08:07:00 00:38:00", header=T)

I used the chron package to calculate the average wake_time:

> mean(times(data$wake_time))
[1] 08:20:12

But when I do the same for the variable sleep_time, this happens:

> mean(times(data$sleep_time))
[1] 14:04:00

I guess the result is distorted because sleep_time contains times before and after midnight. But how can I solve this problem? Additionally: how can I calculate the sd of the times? I want to use it like "mean wake-up-time 08:20 ± 44 min", for example.
The times values are stored as numbers between 0 and 1 representing a fraction of a day. If the sleep time is earlier than the wake time, you can "add a day" before taking the mean. For example:

library(chron)
wake <- times(data$wake_time)
sleep <- times(data$sleep_time)
times(mean(ifelse(sleep < wake, sleep + 1, sleep)))
# [1] 23:40:00

And since the values are parts of a day, if you want the sd in minutes, you'd take the partial-day values and convert to minutes:

sd(ifelse(sleep < wake, sleep + 1, sleep) * 24 * 60)
# [1] 47.60252
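To get the "mean wake-up-time 08:20 ± 44 min" style of output the question asks for, a small sketch building on the wake vector above (the formatting choices here are just one illustration, not a chron idiom):

# sketch: mean as hh:mm plus sd in whole minutes
m <- times(mean(wake))                  # mean wake time as a fraction of a day
s <- sd(as.numeric(wake) * 24 * 60)     # sd of the wake times, in minutes
sprintf("mean wake-up time %s ± %.0f min", substr(format(m), 1, 5), s)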
R convert hourly to daily data up to 0:00 instead of 23:00
How do you set 0:00 as the end of day instead of 23:00 in hourly data? I have this struggle while using period.apply or to.period, as both return days ending at 23:00. Here is an example:

x1 = xts(seq(as.POSIXct("2018-02-01 00:00:00"), as.POSIXct("2018-02-05 23:00:00"), by="hour"),
         x = rnorm(120))

The following functions show periods ending at 23:00:

to.period(x1, OHLC = FALSE, drop.date = FALSE, period = "days")
x1[endpoints(x1, 'days')]

So when I am aggregating the hourly data to daily, does someone have an idea how to set the end of day at 0:00?
As already pointed out by another answer here, to.period on days computes on the data with timestamps between 00:00:00 and 23:59:59.9999999 on the day in question. So 23:00:00 is seen as the last timestamp in your data, and 00:00:00 corresponds to a value in the next day "bin".

What you can do is shift all the timestamps back 1 hour, use to.period to get the daily data points from the hourly points, and then use align.time to get the timestamps aligned correctly.

(More generally, to.period is useful for generating OHLCV type data, so if you're generating, say, hourly bars from ticks, it makes sense to look at all the ticks between 23:00:00 and 23:59:59.99999 in the bar creation; then 00:00:00 to 00:59:59.9999... would form the next hourly bar, and so on.)

Here is an example:

> tail(x1["2018-02-01"])
#                           [,1]
# 2018-02-01 18:00:00 -1.2760349
# 2018-02-01 19:00:00 -0.1496041
# 2018-02-01 20:00:00 -0.5989614
# 2018-02-01 21:00:00 -0.9691905
# 2018-02-01 22:00:00 -0.2519618
# 2018-02-01 23:00:00 -1.6081656

> head(x1["2018-02-02"])
#                           [,1]
# 2018-02-02 00:00:00 -0.3373271
# 2018-02-02 01:00:00  0.8312698
# 2018-02-02 02:00:00  0.9321747
# 2018-02-02 03:00:00  0.6719425
# 2018-02-02 04:00:00 -0.5597391
# 2018-02-02 05:00:00 -0.9810128

> head(x1["2018-02-03"])
#                           [,1]
# 2018-02-03 00:00:00  2.3746424
# 2018-02-03 01:00:00  0.8536594
# 2018-02-03 02:00:00 -0.2467268
# 2018-02-03 03:00:00 -0.1316978
# 2018-02-03 04:00:00  0.3079848
# 2018-02-03 05:00:00  0.2445634

Shift the index back one hour, then aggregate:

x2 <- x1
.index(x2) <- .index(x1) - 3600

> tail(x2["2018-02-01"])
#                           [,1]
# 2018-02-01 18:00:00 -0.1496041
# 2018-02-01 19:00:00 -0.5989614
# 2018-02-01 20:00:00 -0.9691905
# 2018-02-01 21:00:00 -0.2519618
# 2018-02-01 22:00:00 -1.6081656
# 2018-02-01 23:00:00 -0.3373271

x.d2 <- to.period(x2, OHLC = FALSE, drop.date = FALSE, period = "days")

> x.d2
#                            [,1]
# 2018-01-31 23:00:00  0.12516594
# 2018-02-01 23:00:00 -0.33732710
# 2018-02-02 23:00:00  2.37464235
# 2018-02-03 23:00:00  0.51797747
# 2018-02-04 23:00:00  0.08955208
# 2018-02-05 22:00:00  0.33067734

x.d2 <- align.time(x.d2, n = 86400)

> x.d2
#                   [,1]
# 2018-02-01  0.12516594
# 2018-02-02 -0.33732710
# 2018-02-03  2.37464235
# 2018-02-04  0.51797747
# 2018-02-05  0.08955208
# 2018-02-06  0.33067734

Want to convince yourself? Try something like this:

x3 <- rbind(x1, xts(x = matrix(c(1, 2), nrow = 2),
                    order.by = as.POSIXct(c("2018-02-01 23:59:59.999", "2018-02-02 00:00:00"))))

x3["2018-02-01 23/2018-02-02 01"]
#                               [,1]
# 2018-02-01 23:00:00.000 -1.6081656
# 2018-02-01 23:59:59.999  1.0000000
# 2018-02-02 00:00:00.000 -0.3373271
# 2018-02-02 00:00:00.000  2.0000000
# 2018-02-02 01:00:00.000  0.8312698

x3.d <- to.period(x3, OHLC = FALSE, drop.date = FALSE, period = "days")

> x3.d <- align.time(x3.d, 86400)
> x3.d
#                   [,1]
# 2018-02-02  1.00000000
# 2018-02-03 -0.09832625
# 2018-02-04 -0.65075506
# 2018-02-05 -0.09423664
# 2018-02-06  0.33067734

See that the value of 2 at 00:00:00 did not form the last observation in the day bin for 2018-02-02, which ran from 2018-02-01 00:00:00 to 2018-02-01 23:59:59.9999.

Of course, if you want the daily timestamp to be the start of the day rather than the end of the day (which would make 2018-02-01 the start-of-bar timestamp for the first row of x3.d above), you could shift the day back by one. You can do this relatively safely for most time zones when your data doesn't involve weekend dates:

index(x3.d) = index(x3.d) - 86400

I say relatively safely because there are corner cases when there are time shifts in a time zone, e.g. be careful with daylight savings.
Simply subtracting 86400 seconds can be a problem when going from Sunday to Saturday in time zones where daylight saving occurs:

# e.g. bad: daylight saving occurs on this weekend for US EST
z <- xts(x = 9, order.by = as.POSIXct("2018-03-12", tz = "America/New_York"))
> index(z) - 86400
[1] "2018-03-10 23:00:00 EST"

i.e. the timestamp is off by one hour, when you really want the midnight timestamp (00:00:00). You could get around this problem using something much safer, like this:

library(lubridate)
# right
> index(z) - days(1)
[1] "2018-03-11 EST"
I don't think this is possible, because 00:00 is the start of the day. From the manual:

These endpoints are aligned in POSIXct time to the zero second of the day at the beginning, and the 59.9999th second of the 59th minute of the 23rd hour of the final day

I think the solution here is to use minutes instead of hours. Using your example:

x1 = xts(seq(as.POSIXct("2018-02-01 00:00:00"), as.POSIXct("2018-02-05 23:59:00"), by="min"),
         x = rnorm(7200))
to.period(x1, OHLC = FALSE, drop.date = FALSE, period = "day")
x1[endpoints(x1, 'day')]
Creating time index that creates same intervals every day
I want to make a time index that starts at 09:15:00 and progresses in 50-minute intervals before ending at 15:30:00, e.g. 09:15:00, 10:05:00, 10:55:00 and so on. This is my code, which perfectly creates these time indices for the 1st day. However, it becomes messed up the next day: it begins at 09:25:00 instead of 09:15:00 and gets all the intervals wrong. The start time keeps changing every day.

Intervals <- seq(as.POSIXct("2016-04-01 09:15:00", format="%Y-%m-%d %H:%M:%S"),
                 as.POSIXct("2016-04-29 15:30:00", format="%Y-%m-%d %H:%M:%S"),
                 by="50 min")

As I am trying to calculate various intervals by changing only the argument by="50 min" (for example to by="55 min") and need flexibility in fixing the end time, I put it as before 15:30:00. Please help me fix it.
You might be better off generating one sequence and reusing it for each day. As so:

start <- "2016-04-01"
stop <- "2016-04-29"
daylength <- difftime(as.POSIXct(stop), as.POSIXct(start), units="days")
Intervals <- seq(
  as.POSIXct(paste(start, "09:15")),
  as.POSIXct(paste(start, "15:30")),
  by="50 min"
)
out <- Intervals + as.difftime(rep(0:daylength, each=length(Intervals)), units="days")

range(out)
#[1] "2016-04-01 09:15:00 AEST" "2016-04-29 15:05:00 AEST"
It may be worth exploring making use of the cut function. For example, for the set of days:

myDays <- seq(
  from = as.Date("2016-04-01"),
  to = as.Date("2016-04-29"),
  by = "day"
)

one could arrive at 50-minute intervals for each day via:

myIntervals <- data.frame(table(cut(x = as.POSIXct(myDays), breaks = "50 min")))

Preview:

> head(myIntervals, 10)
                  Var1 Freq
1  2016-04-01 01:00:00    1
2  2016-04-01 01:50:00    0
3  2016-04-01 02:40:00    0
4  2016-04-01 03:30:00    0
5  2016-04-01 04:20:00    0
6  2016-04-01 05:10:00    0
7  2016-04-01 06:00:00    0
8  2016-04-01 06:50:00    0
9  2016-04-01 07:40:00    0
10 2016-04-01 08:30:00    0
Time to failure variable based off start and end timestamps in R
I have two data sets. Data set 1 contains time stamps at 15-minute intervals, starting at 2009-08-18 18:15:00 and ending 2012-11-09 22:30:00, with measurements taken at those times. Data set 2 has start and end time stamps for faults occurring in a factory. There are 6 fault types; their start and end times also fall on 15-minute intervals, yet a fault can last longer than one interval. They also all fall somewhere between 2009-08-18 18:15:00 and 2012-11-09 22:30:00. I am trying to create a time-to-failure variable for the faults, where -i would indicate the next fault is i intervals (of 15 minutes) away and i would indicate the fault started i intervals ago. For example:

DataSet 1

Timestamp            Sensor 1
2009-09-04 10:00:00        30
2009-09-04 10:30:00        40
2009-09-04 10:45:00        33
2009-09-04 11:00:00        23
2009-09-04 11:15:00        24
2009-09-04 11:30:00        42

DataSet 2

Start Time       End Time          Fault Type
09/04/09 10:45   9/4/2009 11:15    1
09/04/09 21:45   9/4/2009 22:00    1
09/04/09 23:00   9/4/2009 23:15    1
09/05/09 10:45   9/5/2009 11:15    1
09/05/09 21:30   9/5/2009 23:15    1
09/08/09 10:45   9/8/2009 12:30    1

So what I want to end up with is the following time-to-failure variable (TTF1), and then repeat the process for faults 2-6:

Timestamp            Sensor 1  TTF1
2009-09-04 10:00:00        30    -3
2009-09-04 10:30:00        40    -1
2009-09-04 10:45:00        33     0
2009-09-04 11:00:00        23     1
2009-09-04 11:15:00        24     2
2009-09-04 11:30:00        42   -41

I know I can use the sqldf function to separate out each fault type, but I have no clue where to begin with creating the time-to-fault variable. I'm very stuck; any help would be greatly appreciated!
You can use the difftime() function from base R to get the time difference between two timestamps:

(z <- Sys.time() - 3600)
Sys.time() - z                    # just over 3600 seconds
as.difftime(c("0:3:20", "11:23:15"))
as.difftime(c("3:20", "23:15", "2:"), format = "%H:%M") # 3rd gives NA
(z <- as.difftime(c(0, 30, 60), units = "mins"))
as.numeric(z, units = "secs")
as.numeric(z, units = "hours")
format(z)

I would recommend setting units = "mins". You can convert the class to character, strip out any non-numeric data with gsub, then change the class with as.numeric. Finally, just divide by 15 to get the 15-minute time units you want. You can use floor to round the result if needed.
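To connect that to the time-to-failure column the asker wants, here is a rough sketch along those lines (difftime in minutes, divided by 15). It assumes the measurement timestamps and the fault start/end times are already POSIXct; the names Timestamp, StartTime and EndTime are placeholders rather than the asker's actual column names, and it only handles one fault type at a time:

# Sketch only: negative values count 15-minute intervals until the next fault
# starts; zero/positive values count intervals since the current fault started.
ttf_one_fault <- function(timestamps, starts, ends) {
  vapply(seq_along(timestamps), function(i) {
    ts <- timestamps[i]                          # indexing keeps the POSIXct class
    active <- which(starts <= ts & ts <= ends)   # is a fault in progress?
    ref <- if (length(active) > 0) {
      starts[active[1]]
    } else {
      nxt <- which(starts > ts)                  # otherwise the next fault to start
      if (length(nxt) == 0) return(NA_real_)
      starts[nxt[1]]
    }
    as.numeric(difftime(ts, ref, units = "mins")) / 15
  }, numeric(1))
}

# hypothetical usage, repeated for each fault type:
# dataset1$TTF1 <- ttf_one_fault(dataset1$Timestamp, faults1$StartTime, faults1$EndTime)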
Create a time interval of 15 minutes from minutely data in R?
I have some data which is formatted in the following way:

time  count
00:00    17
00:01    62
00:02    41

So I have hours from 00:00 to 23:59 and a count per minute. I'd like to group the data in intervals of 15 minutes, such that:

time         count
00:00-00:15    148
00:16-00:30    284

I have tried to do it manually but this is exhausting, so I am sure there has to be a function or something to do it easily, but I haven't figured out how yet. I'd really appreciate some help!! Thank you very much!
For data that's in POSIXct format, you can use the cut function to create 15-minute groupings and then aggregate by those groups. The code below shows how to do this in base R and with the dplyr and data.table packages.

First, create some fake data:

set.seed(4984)
dat = data.frame(time=seq(as.POSIXct("2016-05-01"), as.POSIXct("2016-05-01") + 60*99, by=60),
                 count=sample(1:50, 100, replace=TRUE))

Base R

Cut the data into 15-minute groups:

dat$by15 = cut(dat$time, breaks="15 min")

                   time count                by15
1   2016-05-01 00:00:00    22 2016-05-01 00:00:00
2   2016-05-01 00:01:00    11 2016-05-01 00:00:00
3   2016-05-01 00:02:00    31 2016-05-01 00:00:00
...
98  2016-05-01 01:37:00    20 2016-05-01 01:30:00
99  2016-05-01 01:38:00    29 2016-05-01 01:30:00
100 2016-05-01 01:39:00    37 2016-05-01 01:30:00

Now aggregate by the new grouping column, using sum as the aggregation function:

dat.summary = aggregate(count ~ by15, FUN=sum, data=dat)

                 by15 count
1 2016-05-01 00:00:00   312
2 2016-05-01 00:15:00   395
3 2016-05-01 00:30:00   341
4 2016-05-01 00:45:00   318
5 2016-05-01 01:00:00   349
6 2016-05-01 01:15:00   397
7 2016-05-01 01:30:00   341

dplyr

library(dplyr)
dat.summary = dat %>% group_by(by15=cut(time, "15 min")) %>%
  summarise(count=sum(count))

data.table

library(data.table)
dat.summary = setDT(dat)[ , list(count=sum(count)), by=cut(time, "15 min")]

UPDATE: To answer the comment, for this case the end point of each grouping interval is as.POSIXct(as.character(dat$by15)) + 60*15 - 1. In other words, the end point of the grouping interval is 15 minutes minus one second from the start of the interval. We add 60*15 - 1 because POSIXct is denominated in seconds. The as.POSIXct(as.character(...)) is there because cut returns a factor, and this just converts it back to date-time so that we can do math on it.

If you want the end point to the nearest minute before the next interval (instead of the nearest second), you could do as.POSIXct(as.character(dat$by15)) + 60*14.

If you don't know the break interval (for example, because you chose the number of breaks and let R pick the interval), you could find the number of seconds to add by doing max(unique(diff(as.POSIXct(as.character(dat$by15))))) - 1.
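As a small illustration of that end-point arithmetic, a sketch building on dat.summary from the base R aggregate example above (the start and end column names are just illustrative):

# sketch: attach each bin's start and its last second as explicit columns
dat.summary$start <- as.POSIXct(as.character(dat.summary$by15))
dat.summary$end   <- dat.summary$start + 60*15 - 1
head(dat.summary)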
The cut approach is handy but slow with large data frames. The following approach is approximately 1,000x faster than the cut approach (tested with 400k records).

# Function: Truncate (floor) POSIXct to time interval (specified in seconds)
# Author: Stephen McDaniel # PowerTrip Analytics
# Date : 2017MAY
# Copyright: (C) 2017 by Freakalytics, LLC
# License: MIT
floor_datetime <- function(date_var, floor_seconds = 60, origin = "1970-01-01") {
  # defaults to minute rounding
  if(!is(date_var, "POSIXct")) stop("Please pass in a POSIXct variable")
  if(is.na(date_var)) return(as.POSIXct(NA)) else {
    return(as.POSIXct(floor(as.numeric(date_var) / (floor_seconds)) * (floor_seconds), origin = origin))
  }
}

Sample output:

test <- data.frame(good = as.POSIXct(Sys.time()),
                   bad1 = as.Date(Sys.time()),
                   bad2 = as.POSIXct(NA))
test$good_15 <- floor_datetime(test$good, 15 * 60)
test$bad1_15 <- floor_datetime(test$bad1, 15 * 60)
Error in floor_datetime(test$bad, 15 * 60) : Please pass in a POSIXct variable
test$bad2_15 <- floor_datetime(test$bad2, 15 * 60)
test

                    good       bad1 bad2             good_15 bad2_15
1 2017-05-06 13:55:34.48 2017-05-06 <NA> 2017-05-06 13:45:00    <NA>
You can do it in one line by using the trs function from the foqat package, like:

df_15mins = trs(df, "15 mins")

Below is a repeatable example:

library(foqat)
head(aqi[,c(1,2)])
#                 Time        NO
#1 2017-05-01 01:00:00 0.0376578
#2 2017-05-01 01:01:00 0.0341483
#3 2017-05-01 01:02:00 0.0310285
#4 2017-05-01 01:03:00 0.0357016
#5 2017-05-01 01:04:00 0.0337507
#6 2017-05-01 01:05:00 0.0238120

#mean
aqi_15mins = trs(aqi[,c(1,2)], "15 mins")
head(aqi_15mins)
#                 Time         NO
#1 2017-05-01 01:00:00 0.02736549
#2 2017-05-01 01:15:00 0.03244958
#3 2017-05-01 01:30:00 0.03743626
#4 2017-05-01 01:45:00 0.02769419
#5 2017-05-01 02:00:00 0.02901817
#6 2017-05-01 02:15:00 0.03439455