Time difference calculations across midnight in R

I am working on a problem where I need to calculate time differences in minutes. The time values are stored in hh:mm:ss format in a column (more than 28,000 values).
I have been using the following call to calculate the time difference:
tdiff <- dt[dt, Time_Diff := c(abs(diff(as.numeric(Time))),0.30), Student_ID]
where dt is the ordered data table and 0.30 stands for the 30 minutes assigned to the last activity of each student in a course.
This works, but it does not handle differences that cross midnight.
Thanks to @niko for his help, that part is now solved. However, the 30 minutes that should be assigned to each student's last activity are still missing. Any help in this direction will be greatly appreciated. Thank you.
The expected output is shown below:
S_Id Date Time Time_Diff Time_Diff(minutes)
A 10/08/2018 23:49:00 00:01:00 1
A 10/08/2018 23:50:00 00:09:00 9
A 10/08/2018 23:59:00 00:02:00 2
A 10/09/2018 00:01:00 00:09:00 9
A 10/09/2018 00:10:00 08:02:00 482
A 10/09/2018 08:12:00 04:08:00 248
A 10/09/2018 12:20:00 10:01:00 601
A 10/09/2018 22:21:00 01:35:00 95
A 10/09/2018 23:56:00 00:09:00 9
A 10/10/2018 00:05:00 00:05:00 5
A 10/10/2018 00:10:00 00:02:00 2
A 10/10/2018 00:12:00 00:30:00 30
B 10/08/2018 23:49:00 00:01:00 1
B 10/08/2018 23:50:00 00:09:00 9
B 10/08/2018 23:59:00 00:02:00 2
B 10/09/2018 00:01:00 00:09:00 9
B 10/09/2018 00:10:00 08:02:00 482
B 10/09/2018 08:12:00 04:08:00 248
B 10/09/2018 12:20:00 10:01:00 601
B 10/09/2018 22:21:00 01:35:00 95
B 10/09/2018 23:56:00 00:09:00 9
B 10/10/2018 00:05:00 00:05:00 5
B 10/10/2018 00:10:00 00:02:00 2
B 10/10/2018 00:12:00 00:30:00 30
C 10/08/2018 23:49:00 00:01:00 1
C 10/08/2018 23:50:00 00:09:00 9
C 10/08/2018 23:59:00 00:02:00 2
C 10/09/2018 00:01:00 00:09:00 9
C 10/09/2018 00:10:00 08:02:00 482
C 10/09/2018 08:12:00 04:08:00 248
C 10/09/2018 12:20:00 10:01:00 601
C 10/09/2018 22:21:00 01:35:00 95
C 10/09/2018 23:56:00 00:09:00 9
C 10/10/2018 00:05:00 00:05:00 5
C 10/10/2018 00:10:00 00:02:00 2
C 10/10/2018 00:12:00 00:30:00 30

Try converting the date and time to POSIXct, so that the date is part of each timestamp and differences across midnight come out right:
# dt is your data frame
diff(as.POSIXct(paste(dt$Date, dt$Time), format='%m/%d/%Y %H:%M:%S')) # or '%d/%m/%Y %H:%M:%S'
That should do the trick.
Data
dt <- structure(list(Date = c("10/08/2018", "10/08/2018", "10/08/2018", "10/09/2018", "10/09/2018",
"10/09/2018", "10/09/2018", "10/09/2018", "10/09/2018", "10/10/2018",
"10/10/2018", "10/10/2018"),
Time = c("23:49:00", "23:50:00", "23:59:00", "00:01:00", "00:10:00", "08:12:00",
"12:20:00", "22:21:00", "23:56:00", "00:05:00", "00:10:00", "00:12:00")),
class = "data.frame", row.names = c(NA, -12L))
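For the remaining part of the question (30 minutes for each student's last activity), one option is to take the differences within each student and append 30 at the end of each group. The following is a minimal base R sketch under stated assumptions, not the answerer's code: the S_Id column name is taken from the expected output, while the small sample, the DateTime and Time_Diff_min column names, and the UTC time zone are made up for illustration. It also assumes the rows are already ordered by time within each student.

```r
# Hypothetical sample: two students, three activities each
dt <- data.frame(
  S_Id = rep(c("A", "B"), each = 3),
  Date = rep(c("10/08/2018", "10/08/2018", "10/09/2018"), times = 2),
  Time = rep(c("23:50:00", "23:59:00", "00:01:00"), times = 2)
)

# Combine date and time so that differences across midnight are correct
# (tz = "UTC" is an assumption; use the time zone of your data)
dt$DateTime <- as.POSIXct(paste(dt$Date, dt$Time),
                          format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Within each student: minutes until the next activity,
# with 30 appended for that student's last activity
dt$Time_Diff_min <- ave(as.numeric(dt$DateTime), dt$S_Id,
                        FUN = function(x) c(diff(x) / 60, 30))

dt$Time_Diff_min
```

For this sample the result is 9, 2, 30 for each student, which matches the corresponding rows of the expected output.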

Related

Joining datasets in nearest time R

I am trying to join two data sets on the closest TimeDate (POSIXct format).
Some of the TimeDate values match exactly, while others differ by 5 min.
(df1) //// every 5 min with specific time points
# A tibble: 6 × 3
TimeDate TimeDateAnimal Event
<dttm> <chr> <dbl>
1 2015-03-01 00:55:00 2015-03-01 00:55:00 G 1
**2 2015-03-01 03:40:00 2015-03-01 03:40:00 G 1
3 2015-03-01 03:45:00 2015-03-01 03:45:00 G 1**
4 2015-03-01 13:35:00 2015-03-01 13:35:00 G 1
5 2015-03-01 18:45:00 2015-03-01 18:45:00 G 1
6 2015-03-01 19:10:00 2015-03-01 19:10:00 G 1
> (df2) /// every 10 min
A tibble: 52 × 3
TimeDate TimeDateAnimal Temperature
<dttm> <chr> <dbl>
1 2015-03-01 00:05:00 2015-03-01 00:05:00 G 38.52000
2 2015-03-01 00:15:00 2015-03-01 00:15:00 G 38.65333
3 2015-03-01 00:25:00 2015-03-01 00:25:00 G 38.78667
4 2015-03-01 00:35:00 2015-03-01 00:35:00 G 38.86000
5 2015-03-01 00:45:00 2015-03-01 00:45:00 G 38.92667
6 2015-03-01 00:55:00 2015-03-01 00:55:00 G 38.99333
..
**34 2015-03-01 03:35:00 2015-03-01 03:35:00 G 38.80000
35 2015-03-01 03:45:00 2015-03-01 03:45:00 G 38.80000**
I would like this output:
Merge df:
TimeDate TimeDateAnimal Temperature Event
<dttm> <chr> <dbl> <dbl>
1 2015-03-01 00:05:00 2015-03-01 00:05:00 G 38.52000 NA
2 2015-03-01 00:15:00 2015-03-01 00:15:00 G 38.65333 NA
3 2015-03-01 00:25:00 2015-03-01 00:25:00 G 38.78667 NA
4 2015-03-01 00:35:00 2015-03-01 00:35:00 G 38.86000 NA
5 2015-03-01 00:45:00 2015-03-01 00:45:00 G 38.92667 NA
6 2015-03-01 00:55:00 2015-03-01 00:55:00 G 38.99333 NA
..
**34 2015-03-01 03:35:00 2015-03-01 03:35:00 G 38.80000 1
35 2015-03-01 03:45:00 2015-03-01 03:45:00 G 38.80000 1**
I tried fuzzyjoin and data.table, but I always get extra rows instead of a merge on the nearest TimeDate:
Test<- merge(df1, df2, by = "TimeDate", roll = "nearest",all = T)
#head(Test, n=15)
TimeDate TimeDateAnimal.x Event TimeDateAnimal.y Temperature
10 2015-03-01 00:45:00 <NA> NA 2015-03-01 00:45:00 G 38.92667
11 2015-03-01 00:55:00 2015-03-01 00:55:00 G 1 <NA> NA
12 2015-03-01 00:55:00 <NA> NA 2015-03-01 00:55:00 G 38.99333
Thanks in advance.
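One possible way to approach this, sketched in base R: for every row of df2, look up the nearest TimeDate in df1 and copy its Event only when the gap is within a tolerance. This is a hedged illustration, not a tested answer for the full data; the three-row sample, the 5-minute tolerance (tol), the UTC time zone, and the helper names gap, nearest and min_gap are assumptions made for the example.

```r
# Hypothetical subsets of the two data sets
df1 <- data.frame(
  TimeDate = as.POSIXct(c("2015-03-01 03:40:00", "2015-03-01 03:45:00"), tz = "UTC"),
  Event = c(1, 1)
)
df2 <- data.frame(
  TimeDate = as.POSIXct(c("2015-03-01 03:25:00", "2015-03-01 03:35:00",
                          "2015-03-01 03:45:00"), tz = "UTC"),
  Temperature = c(38.80, 38.80, 38.80)
)

tol <- 5 * 60  # assumed tolerance in seconds

# Absolute gap in seconds between every df2 time (rows) and df1 time (columns)
gap <- abs(outer(as.numeric(df2$TimeDate), as.numeric(df1$TimeDate), "-"))

# Index of the nearest df1 row for each df2 row, and the size of that gap
nearest <- apply(gap, 1, which.min)
min_gap <- gap[cbind(seq_len(nrow(df2)), nearest)]

# Copy Event only when the nearest df1 time is close enough
df2$Event <- ifelse(min_gap <= tol, df1$Event[nearest], NA)
df2
```

Because the result keeps exactly the rows of df2, no extra rows appear. The outer() matrix grows with nrow(df1) * nrow(df2), so for large inputs a rolling join in data.table may be the better fit.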

extract the remaining time period

I have two data frames.
df1
Tstart Tend start_temp
2012-12-19 21:12:00 2012-12-20 02:48:00 17.7637930350627
2013-01-31 17:36:00 2013-01-31 22:54:00 18.9618654078963
2013-02-14 09:12:00 2013-02-14 09:48:00 18.2361739981826
2013-02-21 15:36:00 2013-02-21 16:36:00 20.9938186870285
2013-03-21 03:54:00 2013-03-21 05:18:00 16.7130008152092
2013-03-30 23:42:00 2013-03-31 02:30:00 15.3775459369926
df2
datetime airtemp
2012-12-11 23:00:00 14.40
2012-12-11 23:06:00 14.22
2012-12-11 23:12:00 14.04
2012-12-11 23:18:00 13.86
2012-12-11 23:24:00 13.68
2012-12-11 23:30:00 13.50
......
2015-03-31 23:24:00 15.46
2015-03-31 23:30:00 15.90
2015-03-31 23:36:00 15.82
2015-03-31 23:42:00 15.74
I want to extract the remaining datetime values from df2 (df2 is a time series), that is, everything outside the periods between Tstart and Tend in df1.
Can you please help me to do this?
Many thanks.
With base R we can try the following (with the following df1 & df2):
df1 <- read.csv(text='Tstart, Tend, start_temp
2012-12-19 21:12:00, 2012-12-20 02:48:00, 17.7637930350627
2013-01-31 17:36:00, 2013-01-31 22:54:00, 18.9618654078963
2013-02-14 09:12:00, 2013-02-14 09:48:00, 18.2361739981826
2013-02-21 15:36:00, 2013-02-21 16:36:00, 20.9938186870285
2013-03-21 03:54:00, 2013-03-21 05:18:00, 16.7130008152092
2013-03-30 23:42:00, 2013-03-31 02:30:00, 15.3775459369926', header=TRUE)
df2 <- read.csv(text='datetime, airtemp
2012-12-11 23:00:00, 14.40
2012-12-11 23:06:00, 14.22
2012-12-11 23:12:00, 14.04
2012-12-11 23:18:00, 13.86
2012-12-11 23:24:00, 13.68
2012-12-19 23:30:00, 13.50
2013-03-21 04:24:00, 15.46
2013-03-21 23:30:00, 15.90
2015-03-31 23:36:00, 15.82
2015-03-31 23:42:00, 15.74', header=TRUE)
df1$Tstart <- strptime(as.character(df1$Tstart), '%Y-%m-%d %H:%M:%S')
df1$Tend <- strptime(as.character(df1$Tend), '%Y-%m-%d %H:%M:%S')
df2$datetime <- strptime(as.character(df2$datetime), '%Y-%m-%d %H:%M:%S')
# keep a df2 row only if its datetime lies outside every [Tstart, Tend] interval in df1
indices <- sapply(1:nrow(df2), function(j)
  all(sapply(1:nrow(df1), function(i)
    df2[j,]$datetime < df1[i,]$Tstart | df2[j,]$datetime > df1[i,]$Tend)))
df2[indices,]
# datetime airtemp
#1 2012-12-11 23:00:00 14.40
#2 2012-12-11 23:06:00 14.22
#3 2012-12-11 23:12:00 14.04
#4 2012-12-11 23:18:00 13.86
#5 2012-12-11 23:24:00 13.68
#8 2013-03-21 23:30:00 15.90
#9 2015-03-31 23:36:00 15.82
#10 2015-03-31 23:42:00 15.74
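The same filter can also be sketched without the nested sapply calls by comparing all timestamps against all intervals at once. This is only an illustration under assumptions: the sample is cut down to two intervals and four readings, the timestamps are stored as POSIXct in UTC, and starts, ends, inside and remaining are names invented here.

```r
# Hypothetical cut-down sample: two excluded intervals, four readings
starts <- as.POSIXct(c("2012-12-19 21:12:00", "2013-03-21 03:54:00"), tz = "UTC")
ends   <- as.POSIXct(c("2012-12-20 02:48:00", "2013-03-21 05:18:00"), tz = "UTC")
df2 <- data.frame(
  datetime = as.POSIXct(c("2012-12-11 23:00:00", "2012-12-19 23:30:00",
                          "2013-03-21 04:24:00", "2013-03-21 23:30:00"), tz = "UTC"),
  airtemp = c(14.40, 13.50, 15.46, 15.90)
)

# inside[j, i] is TRUE when reading j falls within interval i
t <- as.numeric(df2$datetime)
inside <- outer(t, as.numeric(starts), ">=") & outer(t, as.numeric(ends), "<=")

# Keep the readings that fall in none of the intervals
remaining <- df2[rowSums(inside) == 0, ]
remaining
```

Here the second and third readings fall inside an interval, so only the first and fourth rows remain.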

How to get equally spaced intervals when counting factors?

I am having difficulty creating time intervals with 30-minute breaks that begin either on the full hour (:00) or on the half hour (:30).
For instance:
library(reshape2)
library(dplyr)
# Given some data which resemble the original data
foo <- data.frame(start.time = c("2012-02-01 13:47:00",
"2012-02-01 14:02:00",
"2012-02-01 14:20:00",
"2012-02-01 14:40:00",
"2012-02-01 15:08:00",
"2012-02-01 16:01:00",
"2012-02-01 16:02:00",
"2012-02-01 16:20:00",
"2012-02-01 17:09:00",
"2012-02-01 18:08:00",
"2012-02-01 18:20:00",
"2012-02-01 19:08:00"
),
employee = c("mike","john","john","steven","mike","mike","mike","steven","mike","steven","mike","mike"))
# start.time employee
#1 2012-02-01 13:47:00 mike
#2 2012-02-01 14:02:00 john
#3 2012-02-01 14:20:00 john
#4 2012-02-01 14:40:00 steven
#5 2012-02-01 15:08:00 mike
#6 2012-02-01 16:01:00 mike
#7 2012-02-01 16:02:00 mike
#8 2012-02-01 16:20:00 steven
#9 2012-02-01 17:09:00 mike
#10 2012-02-01 18:08:00 steven
#11 2012-02-01 18:20:00 mike
#12 2012-02-01 19:08:00 mike
# change factor to POSIXct
foo$start.time <- as.POSIXct(foo$start.time)
# long to wide
my_emp<- dcast(foo, start.time ~ employee, fun.aggregate = length)
# 30 min breaks
my_emp_ag<- my_emp %>% group_by(start.time = as.POSIXct(cut(start.time, breaks="30 min"))) %>%
summarize(john = sum(john ),mike = sum(mike ),steven = sum(steven))
# Missing intervals
miss_interval <- data.frame(start.time=seq(from = min(as.POSIXct(my_emp$start.time)), to= max(as.POSIXct(my_emp$start.time)), by = "30 mins"))
# join old with new
substitited <- left_join(miss_interval,my_emp_ag,by=c('start.time'))
# change NA to zero
substitited[is.na(substitited)] <- 0
start.time john mike steven
1 2012-02-01 13:47:00 1 1 0
2 2012-02-01 14:17:00 1 0 1
3 2012-02-01 14:47:00 0 1 0
4 2012-02-01 15:17:00 0 0 0
5 2012-02-01 15:47:00 0 2 0
6 2012-02-01 16:17:00 0 0 1
7 2012-02-01 16:47:00 0 1 0
8 2012-02-01 17:17:00 0 0 0
9 2012-02-01 17:47:00 0 0 1
10 2012-02-01 18:17:00 0 1 0
11 2012-02-01 18:47:00 0 1 0
which is almost what I want, except that the intervals should start at 2012-02-01 13:30:00, 2012-02-01 14:00:00 and so on.
library(data.table)
library(lubridate)
# floor each start.time to the preceding 30-minute boundary
setDT(foo)[, round.time := floor_date(ymd_hms(start.time), "30 minutes")]
start.time employee round.time
1: 2012-02-01 13:47:00 mike 2012-02-01 13:30:00
2: 2012-02-01 14:02:00 john 2012-02-01 14:00:00
3: 2012-02-01 14:20:00 john 2012-02-01 14:00:00
4: 2012-02-01 14:40:00 steven 2012-02-01 14:30:00
5: 2012-02-01 15:08:00 mike 2012-02-01 15:00:00
6: 2012-02-01 16:01:00 mike 2012-02-01 16:00:00
7: 2012-02-01 16:02:00 mike 2012-02-01 16:00:00
8: 2012-02-01 16:20:00 steven 2012-02-01 16:00:00
9: 2012-02-01 17:09:00 mike 2012-02-01 17:00:00
10: 2012-02-01 18:08:00 steven 2012-02-01 18:00:00
11: 2012-02-01 18:20:00 mike 2012-02-01 18:00:00
12: 2012-02-01 19:08:00 mike 2012-02-01 19:00:00
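To get from the floored times to counts per employee, including empty intervals, something along these lines may work. It is a base R sketch under assumptions rather than part of the answer above: it uses the first six rows of foo, floors via integer arithmetic on the epoch seconds (which lines up with :00 and :30 only for time zones with a whole or half-hour offset, UTC being assumed here), and the names half_hour, bin, all_bins and counts are invented for the example.

```r
# First six rows of the question's data, parsed as UTC (an assumption)
foo <- data.frame(
  start.time = as.POSIXct(c("2012-02-01 13:47:00", "2012-02-01 14:02:00",
                            "2012-02-01 14:20:00", "2012-02-01 14:40:00",
                            "2012-02-01 15:08:00", "2012-02-01 16:01:00"), tz = "UTC"),
  employee = c("mike", "john", "john", "steven", "mike", "mike")
)

# Floor every timestamp to the start of its 30-minute interval
half_hour <- 30 * 60
foo$bin <- as.POSIXct(floor(as.numeric(foo$start.time) / half_hour) * half_hour,
                      origin = "1970-01-01", tz = "UTC")

# All intervals between the first and last bin, so empty ones show up as zeros
all_bins <- seq(min(foo$bin), max(foo$bin), by = "30 min")
fmt <- "%Y-%m-%d %H:%M:%S"
counts <- table(factor(format(foo$bin, fmt), levels = format(all_bins, fmt)),
                foo$employee)
counts
```

The table has one row per interval from 13:30:00 to 16:00:00; the 14:00:00 row counts john twice and the 15:30:00 row is all zeros.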

R - Gap fill a time series

I am trying to fill in the gaps in one of my time series by merging a full day time series into my original time series. But for some reason I get duplicate entries and all the rest of my data is NA.
My data looks like this:
> head(data)
TIME Water_Temperature
1 2016-08-22 00:00:00 81.000
2 2016-08-22 00:01:00 80.625
3 2016-08-22 00:02:00 85.000
4 2016-08-22 00:03:00 80.437
5 2016-08-22 00:04:00 85.000
6 2016-08-22 00:05:00 80.375
> tail(data)
TIME Water_Temperature
1398 2016-08-22 23:54:00 19.5
1399 2016-08-22 23:55:00 19.5
1400 2016-08-22 23:56:00 19.5
1401 2016-08-22 23:57:00 19.5
1402 2016-08-22 23:58:00 19.5
1403 2016-08-22 23:59:00 19.5
Some minutes in between are missing (1403 rows instead of 1440). I tried to fill them in using:
data.length <- length(data$TIME)
time.min <- data$TIME[1]
time.max <- data$TIME[data.length]
all.dates <- seq(time.min, time.max, by="min")
all.dates.frame <- data.frame(list(TIME=all.dates))
merged.data <- merge(all.dates.frame, data, all=T)
But that gives me a result of 1449 rows instead of 1440. The first eight minutes are duplicates in the time stamp column and all other values in Water_Temperature are NA. Looks like this:
> merged.data[1:25,]
TIME Water_Temperature
1 2016-08-22 00:00:00 NA
2 2016-08-22 00:00:00 81.000
3 2016-08-22 00:01:00 NA
4 2016-08-22 00:01:00 80.625
5 2016-08-22 00:02:00 NA
6 2016-08-22 00:02:00 85.000
7 2016-08-22 00:03:00 NA
8 2016-08-22 00:03:00 80.437
9 2016-08-22 00:04:00 NA
10 2016-08-22 00:04:00 85.000
11 2016-08-22 00:05:00 NA
12 2016-08-22 00:05:00 80.375
13 2016-08-22 00:06:00 NA
14 2016-08-22 00:06:00 80.812
15 2016-08-22 00:07:00 NA
16 2016-08-22 00:07:00 80.812
17 2016-08-22 00:08:00 NA
18 2016-08-22 00:08:00 80.937
19 2016-08-22 00:09:00 NA
20 2016-08-22 00:10:00 NA
21 2016-08-22 00:11:00 NA
22 2016-08-22 00:12:00 NA
23 2016-08-22 00:13:00 NA
24 2016-08-22 00:14:00 NA
25 2016-08-22 00:15:00 NA
> tail(merged.data)
TIME Water_Temperature
1444 2016-08-22 23:54:00 NA
1445 2016-08-22 23:55:00 NA
1446 2016-08-22 23:56:00 NA
1447 2016-08-22 23:57:00 NA
1448 2016-08-22 23:58:00 NA
1449 2016-08-22 23:59:00 NA
Does anyone have an idea what's going wrong?
EDIT:
I am now using the xts and zoo packages to do the job:
library(xts)
library(zoo)
df1.zoo<-zoo(data[,-1],data[,1])
df2 <- as.data.frame(as.zoo(merge(as.xts(df1.zoo), as.xts(zoo(,seq(start(df1.zoo),end(df1.zoo),by="min"))))))
Very easy and effective!
Instead of merge, use rbind, which gives you an irregular time series without NAs to start with. If you really want a regular time series with a frequency of, say, 1 minute, you can build a time-based sequence as an index, merge it with your data (after using rbind), and fill the resulting NAs with na.locf. Hope this helps.
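The regular-index step could look roughly like this. It is a minimal base R sketch under assumptions: a three-row sample with clean whole-minute POSIXct timestamps in UTC, a left join via merge(all.x = TRUE), and a hand-written carry-forward standing in for zoo's na.locf so that the example needs no packages. The names all_minutes, filled and idx are made up here.

```r
# Hypothetical sample with the minutes 00:02 and 00:03 missing
data <- data.frame(
  TIME = as.POSIXct(c("2016-08-22 00:00:00", "2016-08-22 00:01:00",
                      "2016-08-22 00:04:00"), tz = "UTC"),
  Water_Temperature = c(81.000, 80.625, 85.000)
)

# Regular one-minute index spanning the observed range
all_minutes <- data.frame(TIME = seq(min(data$TIME), max(data$TIME), by = "min"))

# Left join: one row per minute, NA where no observation exists
filled <- merge(all_minutes, data, by = "TIME", all.x = TRUE)

# Carry the last observation forward (stand-in for zoo::na.locf);
# the first row is never NA here because the index starts at min(TIME)
idx <- cummax(seq_along(filled$Water_Temperature) * !is.na(filled$Water_Temperature))
filled$Water_Temperature_locf <- filled$Water_Temperature[idx]
filled
```

The result has five rows, one per minute, and the two missing minutes take the value 80.625 from 00:01 in the carried-forward column.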
You can try merging with full_join from the tidyverse.
This works for me with two data frames (daily values) sharing a column named Date.
library(tidyverse)
# reduce() expects my_data to be a list of data frames
big_data<-my_data %>%
reduce(full_join, by="Date")

Subsetting seconds in xts/zoo

Is there a way to subset by seconds in xts?
2013-01-01 00:01:00 2.2560000
2013-01-01 00:02:00 2.3883333
2013-01-01 00:03:00 1.8450000
2013-01-01 00:04:00 1.6966667
2013-01-01 00:04:03 1.3100000
2013-01-01 00:05:00 0.8533333
I want to get the line that does not have :00 seconds at the end:
2013-01-01 00:04:03 1.3100000
Usually I would subset by time [T09:00/T09:30], but right now I want the lines whose timestamp does not have 00 for seconds. Thanks!
First, I have to get your example data into R (please use dput next time):
lines <- '2013-01-01 00:01:00 2.2560000
2013-01-01 00:02:00 2.3883333
2013-01-01 00:03:00 1.8450000
2013-01-01 00:04:00 1.6966667
2013-01-01 00:04:03 1.3100000
2013-01-01 00:05:00 0.8533333'
library(xts)
tmp <- read.table(text=lines)
x <- xts(tmp[, 3], as.POSIXct(paste(tmp[, 1], tmp[, 2])))
You can use the .indexsec function to extract rows where the second is (or isn't) 0.
x[.indexsec(x) == 0]
# [,1]
#2013-01-01 00:01:00 2.2560000
#2013-01-01 00:02:00 2.3883333
#2013-01-01 00:03:00 1.8450000
#2013-01-01 00:04:00 1.6966667
#2013-01-01 00:05:00 0.8533333
x[.indexsec(x) != 0]
# [,1]
#2013-01-01 00:04:03 1.31
Another idea would be to use the unexported xts:::startof function which is analogous to the endpoints function.
x[xts:::startof(x, "mins")]
# [,1]
#2013-01-01 00:01:00 2.2560000
#2013-01-01 00:02:00 2.3883333
#2013-01-01 00:03:00 1.8450000
#2013-01-01 00:04:00 1.6966667
#2013-01-01 00:05:00 0.8533333
Or, if you only want the row that does not end in 00, you can use negative subsetting:
x[-xts:::startof(x, "mins")]
# [,1]
#2013-01-01 00:04:03 1.31
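For comparison, the seconds test itself can be sketched in base R without xts by reading the sec field of a POSIXlt conversion. This is an illustration only: the plain vectors times and values, the name off_minute, and the UTC time zone are assumptions, and the data is a three-row excerpt of the example above.

```r
# Three of the example timestamps, stored as plain vectors (UTC assumed)
times <- as.POSIXct(c("2013-01-01 00:04:00", "2013-01-01 00:04:03",
                      "2013-01-01 00:05:00"), tz = "UTC")
values <- c(1.6966667, 1.3100000, 0.8533333)

# POSIXlt exposes the seconds component of each timestamp
off_minute <- as.POSIXlt(times)$sec != 0

# Observations that do not fall on a whole minute
values[off_minute]
```

Only the 00:04:03 observation (1.31) is selected.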
Here's how to do it by merging with a zero width xts object that has the index that you want.
merge(xts(, seq(start(x), end(x), by="min")), x, all=FALSE)
# x
#2013-01-01 00:01:00 2.2560000
#2013-01-01 00:02:00 2.3883333
#2013-01-01 00:03:00 1.8450000
#2013-01-01 00:04:00 1.6966667
#2013-01-01 00:05:00 0.8533333
