calculating differences in times, data grouped by rows - r

I have a data set in the following format
ID DATETIME VALUE
1 4/2/2012 10:00 300
1 5/2/2012 23:00 150
1 6/3/2012 10:00 650
2 1/2/2012 10:00 450
2 2/2/2012 13:00 240
3 6/5/2012 09:00 340
3 7/5/2012 23:00 240
I would like to first calculate the time difference from the first instance per ID to each subsequent time.
ID DATETIME VALUE DIFTIME(days)
1 4/2/2012 10:00 300 0
1 5/2/2012 23:00 150 1.3
1 6/3/2012 10:00 650 33
2 1/2/2012 10:00 450 0
2 2/2/2012 13:00 240 1
3 6/5/2012 09:00 340 0
3 7/5/2012 23:00 240 1
And then I'd like to make this a wide format
ID 0 1 1.3 33
1 300 na 150 650
2 450 240 na na
3 340 240 na na

Here is a solution using the data.table and reshape2 packages:
library(data.table)
DT <- as.data.table(dat)
DT[, `:=`(DIFTIME, c(0, diff(as.Date(DATETIME)))), by = "ID"]
## ID VALUE DATETIME DIFTIME
## 1: 1 300 2012-02-04 10:00:00 0
## 2: 1 150 2012-02-05 23:00:00 1
## 3: 1 650 2012-03-06 10:00:00 30
## 4: 2 450 2012-02-01 10:00:00 0
## 5: 2 240 2012-02-02 13:00:00 1
## 6: 3 340 2012-05-06 09:00:00 0
## 7: 3 240 2012-05-07 23:00:00 1
library(reshape2)
dcast(formula = ID ~ DIFTIME, data = DT[, list(ID, DIFTIME, VALUE)])
## ID 0 1 30
## 1 1 300 150 650
## 2 2 450 240 NA
## 3 3 340 240 NA
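Side note: the question actually asks for the difference (in fractional days) from the first timestamp per ID rather than between consecutive rows, and as.Date() drops the time of day. A small sketch of that variant, assuming the same DT as above:
DT[, DIFTIME := as.numeric(difftime(DATETIME, DATETIME[1L], units = "days")), by = ID]
# the same dcast() call then spreads VALUE over these fractional DIFTIME columns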
Data in handy format
Here is my dat:
structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L), DATETIME = structure(c(1328346000,
1328479200, 1331024400, 1328086800, 1328184000, 1336287600, 1336424400
), class = c("POSIXct", "POSIXt"), tzone = ""), VALUE = c(300L,
150L, 650L, 450L, 240L, 340L, 240L)), .Names = c("ID", "DATETIME",
"VALUE"), class = "data.frame", row.names = c(NA, 7L))

Related

Repeat rows then manipulate those rows using data table in R

Hi, I am new to data.table syntax in R (and R in general) and need help to repeat certain rows and incrementally increase them based on category.
My mock data table information is below:
> head(dt)
Time Values1 Values2 Values3 Category
1: 00:15:00 1 2 1.5 A
2: 00:30:00 3 4 2.5 A
3: 00:45:00 5 6 3.5 A
4: 01:00:00 7 8 4.5 A
5: 01:15:00 9 10 5.5 A
6: 01:30:00 11 12 6.5 A
> tail(dt)
Time Values1 Values2 Values3 Category
1: 22:45:00 182 181 92.5 B
2: 23:00:00 184 183 93.5 B
3: 23:15:00 186 185 94.5 B
4: 23:30:00 188 187 95.5 B
5: 23:45:00 190 189 96.5 B
6: 00:00:00 192 191 97.5 B
> str(dt)
Classes ‘data.table’ and 'data.frame': 192 obs. of 5 variables:
$ Time :Class 'ITime' int [1:192] 900 1800 2700 3600 4500 5400 6300 7200 8100 9000 ...
$ Values1 : int 1 3 5 7 9 11 13 15 17 19 ...
$ Values2 : int 2 4 6 8 10 12 14 16 18 20 ...
$ Values3 : num 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 ...
$ Category: chr "A" "A" "A" "A" ...
- attr(*, ".internal.selfref")=<externalptr>
If the Category is A, I want to extrapolate each value in the Time column to one-minute steps while the remaining columns keep the same values. Note that if the time is 00:15, then my extrapolated section would have times from both 00:01 to 00:14 and 00:16 to 00:29.
---Goal---:
If the category is B, then the time extrapolation is 5 minutes.
The final result will have the original data with all the time extrapolations and no duplicated time values based on Category.
--- Thought Process----:
My strategy is to break up into categories A and B, somehow find ways to add the extrapolated time and append them back to the original data table.
So far, I know how to break up into categories A and B, come up with a function to add minutes to the as.ITime-type Time column, and repeat each row in the Time column:
add_minutes <- function(m) {
  x <- m * 60
  return(x)
}
A <- dt[Category == 'A']
B <- dt[Category == 'B']
A <- A[, list(freq = rep(1, 14)), by = .(Time, Values1, Values2, Values3, Category)][, freq := NULL]
However, I do not know how to combine the add_minutes() function with those repeated rows to:
Reset the time for each original time value.
For example, if the original time is 00:30 and I have repeated that line 14 times, then I want the 14 appearances of 00:30 to become the sequence from 00:31 to 00:44. If the original time is 00:45, then I want a sequence from 00:46 to 00:59, and so on.
Append this back to the original data table
Thank you in advance for your help!!
Unfortunately, the rolling join suggested by pseudospin will not return the expected result, because as.ITime("00:00:00") is part of the time series dt and thus will be rolled forward to the additional time steps at 00:01:00, 00:02:00, 00:03:00, etc. for Category A, or 00:05:00, 00:10:00, etc. for Category B, respectively. (Note that as.ITime("24:00:00") == as.ITime("00:00:00").)
The approach below
creates all required time steps completed_ts for each Category,
right joins with dt, which adds many NAs in the value columns,
fills the missing values for each Category by last observation carried forward,
and finally fills the missing values at the top of each Category by next observation carried backward.
completed_ts <- rbind(
  data.table(Time = as.ITime(seq(1L, 1440L, 1L) * 60L), Category = "A"),
  data.table(Time = as.ITime(seq(5L, 1440L, 5L) * 60L), Category = "B")
)
res <- dt[completed_ts, on = .(Time, Category)]
cols <- paste0("Values", 1:3)
res[, (cols) := lapply(.SD, nafill, type = "locf"), .SDcols = cols, by = Category]
res[, (cols) := lapply(.SD, nafill, type = "nocb"), .SDcols = cols, by = Category]
# print interesting parts of the result
res[Category == "A", .SD[c(1:16, .N - 16:0)]]
res[Category == "B", .SD[c(1:4, .N - 4:0)]]
Time Values1 Values2 Values3 Category
1: 00:01:00 1 2 1.5 A
2: 00:02:00 1 2 1.5 A
3: 00:03:00 1 2 1.5 A
4: 00:04:00 1 2 1.5 A
5: 00:05:00 1 2 1.5 A
6: 00:06:00 1 2 1.5 A
7: 00:07:00 1 2 1.5 A
8: 00:08:00 1 2 1.5 A
9: 00:09:00 1 2 1.5 A
10: 00:10:00 1 2 1.5 A
11: 00:11:00 1 2 1.5 A
12: 00:12:00 1 2 1.5 A
13: 00:13:00 1 2 1.5 A
14: 00:14:00 1 2 1.5 A
15: 00:15:00 1 2 1.5 A
16: 00:16:00 1 2 1.5 A
17: 23:44:00 187 188 94.5 A
18: 23:45:00 189 190 95.5 A
19: 23:46:00 189 190 95.5 A
20: 23:47:00 189 190 95.5 A
21: 23:48:00 189 190 95.5 A
22: 23:49:00 189 190 95.5 A
23: 23:50:00 189 190 95.5 A
24: 23:51:00 189 190 95.5 A
25: 23:52:00 189 190 95.5 A
26: 23:53:00 189 190 95.5 A
27: 23:54:00 189 190 95.5 A
28: 23:55:00 189 190 95.5 A
29: 23:56:00 189 190 95.5 A
30: 23:57:00 189 190 95.5 A
31: 23:58:00 189 190 95.5 A
32: 23:59:00 189 190 95.5 A
33: 00:00:00 191 192 96.5 A
Time Values1 Values2 Values3 Category
Time Values1 Values2 Values3 Category
1: 00:05:00 2 1 2.5 B
2: 00:10:00 2 1 2.5 B
3: 00:15:00 2 1 2.5 B
4: 00:20:00 2 1 2.5 B
5: 23:40:00 188 187 95.5 B
6: 23:45:00 190 189 96.5 B
7: 23:50:00 190 189 96.5 B
8: 23:55:00 190 189 96.5 B
9: 00:00:00 192 191 97.5 B
Note that data.table's nafill() function currently only supports double and integer data types. If you need to fill other data types, please see zoo::na.locf().
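For example, a hedged sketch for a hypothetical character column (the name Label is made up here for illustration; zoo::na.locf() works on any atomic type):
library(zoo)
# carry last observation forward, then fill leading NAs backwards, per Category
res[, Label := zoo::na.locf(Label, na.rm = FALSE), by = Category]
res[, Label := zoo::na.locf(Label, fromLast = TRUE, na.rm = FALSE), by = Category]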
Reproducible data
library(data.table)
dtA <- data.table(Time = seq(as.ITime("00:15:00"), by = 900L, length.out = 96L),
                  Values1 = seq(1L, by = 2L, length.out = 96L),
                  Values2 = seq(2L, by = 2L, length.out = 96L),
                  Values3 = seq(1.5, by = 1.0, length.out = 96L),
                  Category = rep("A", 96L))
dtB <- data.table(Time = seq(as.ITime("00:15:00"), by = 900L, length.out = 96L),
                  Values1 = seq(to = 192L, by = 2L, length.out = 96L),
                  Values2 = seq(to = 191L, by = 2L, length.out = 96L),
                  Values3 = seq(to = 97.5, by = 1.0, length.out = 96L),
                  Category = rep("B", 96L))
dt <- rbind(dtA, dtB)
The magical rolling join in data.table.
desire <- rbind(
  data.table(Category = "A", Time = as.ITime(seq(1, 1440, 1) * 60)),
  data.table(Category = "B", Time = as.ITime(seq(5, 1440, 5) * 60))
)
dt[desire, on = c('Category','Time'), roll = TRUE, rollends = c(TRUE, TRUE)]

Cross join two dataframes by key column using condition in R

I have two dataframes.
mydata1=structure(list(ID_WORKES = c(58005854L, 58005854L, 58002666L,
58002666L), ID_SP_NAR = c(463L, 1951L, 21L, 465L), KOD_DEPO = c(3786L,
3786L, 1439L, 1439L), KOD_DOR = c(58L, 58L, 92L, 92L), COLUMN_MASH = c(6L,
6L, 5L, 5L), prop_violations = structure(c(1L, 2L, 2L, 2L), .Label = c("0.2",
"1"), class = "factor"), mash_score = c(0L, 2L, 2L, 2L)), .Names = c("ID_WORKES",
"ID_SP_NAR", "KOD_DEPO", "KOD_DOR", "COLUMN_MASH", "prop_violations",
"mash_score"), class = "data.frame", row.names = c(NA, -4L))
mydata2=structure(list(ID_SP_NAR = c(463L, 1951L, 21L, 465L, 500L, 600L
)), .Names = "ID_SP_NAR", class = "data.frame", row.names = c(NA,
-6L))
I need to cross-join/merge these dataframes by ID_SP_NAR. mydata2 contains only the key variable ID_SP_NAR.
I need to join them in such a way that if an ID_WORKES does not have some of the ID_SP_NAR codes from mydata2, then those codes are inserted into the dataset, but with zero values in the prop_violations and mash_score variables.
I.e., ID_SP_NAR in mydata2 has these values:
ID_SP_NAR
463
1951
21
465
500
600
ID_WORKES = 58005854 has 463 and 1951 but not the others, and ID_WORKES = 58002666 has 21 and 465 and not the others.
So the desired output after the cross join is:
ID_WORKES ID_SP_NAR KOD_DEPO KOD_DOR COLUMN_MASH prop_violations mash_score
1 58005854 463 3786 58 6 0.2 0
2 58005854 1951 3786 58 6 1 2
3 58005854 21 3786 58 6 0 0
4 58005854 465 3786 58 6 0 0
5 58005854 500 3786 58 6 0 0
6 58005854 600 3786 58 6 0 0
7 58002666 21 1439 92 5 1 2
8 58002666 465 1439 92 5 1 2
9 58002666 500 1439 92 5 0 0
10 58002666 600 1439 92 5 0 0
11 58002666 463 1439 92 5 0 0
12 58002666 1951 1439 92 5 0 0
KOD_DEPO, KOD_DOR, and COLUMN_MASH have fixed values per ID_WORKES; they must be kept too.
How to do that?
merge(mydata1, mydata2, by = "ID_SP_NAR") is not working (I also tried it via a left join and that doesn't work); it doesn't insert zeros as I want.
We could use complete from tidyr to expand the dataset based on 'ID_WORKES' and the values of 'ID_SP_NAR' in the second dataset:
library(tidyverse)
mydata1 %>%
  mutate_if(is.factor, as.character) %>%
  complete(ID_WORKES, ID_SP_NAR = mydata2$ID_SP_NAR,
           fill = list(prop_violations = '0', mash_score = 0)) %>%
  fill(3:5)
# A tibble: 12 x 7
# ID_WORKES ID_SP_NAR KOD_DEPO KOD_DOR COLUMN_MASH prop_violations mash_score
# <int> <int> <int> <int> <int> <chr> <dbl>
# 1 58002666 21 1439 92 5 1 2
# 2 58002666 463 1439 92 5 0 0
# 3 58002666 465 1439 92 5 1 2
# 4 58002666 500 1439 92 5 0 0
# 5 58002666 600 1439 92 5 0 0
# 6 58002666 1951 1439 92 5 0 0
# 7 58005854 21 1439 92 5 0 0
# 8 58005854 463 3786 58 6 0.2 0
# 9 58005854 465 3786 58 6 0 0
#10 58005854 500 3786 58 6 0 0
#11 58005854 600 3786 58 6 0 0
#12 58005854 1951 3786 58 6 1 2
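A data.table sketch of the same idea (assuming mydata1 and mydata2 as above): build the full worker x code grid with CJ(), join, carry the fixed per-worker columns into the new rows, and zero out the scores.
library(data.table)
dt1 <- as.data.table(mydata1)
dt1[, prop_violations := as.character(prop_violations)]  # avoid factor-level issues
grid <- CJ(ID_WORKES = unique(dt1$ID_WORKES), ID_SP_NAR = mydata2$ID_SP_NAR)
full <- dt1[grid, on = .(ID_WORKES, ID_SP_NAR)]
fixed <- c("KOD_DEPO", "KOD_DOR", "COLUMN_MASH")
# each worker has constant values in these columns, so take the first non-NA per worker
full[, (fixed) := lapply(.SD, function(x) x[!is.na(x)][1]), by = ID_WORKES, .SDcols = fixed]
# codes a worker never had get zero scores
full[is.na(mash_score), `:=`(prop_violations = "0", mash_score = 0L)]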

Lookup Last and First Value for unique id based on datetime in R

I have the below-mentioned dataframe:
ID Date Value1 Value2 Value3 Value4
XX-12 2018-02-01 15:48:15 XXC 1000 15.45 18
XX-12 2018-02-05 20:18:43 XTR 1500 15.45 12
XX-13 2018-02-03 19:14:17 XRR 1900 18.25 10
XX-13 2018-02-03 22:42:18 XTC 1600 20.25 12
XX-14 2018-02-04 23:14:45 XRY 1100 10.50 10
XX-15 2018-02-05 21:16:48 XTC 1400 20.25 14
From the above dataframe, I want to derive the initial value (I_Value) and final value (F_Value) based on the datetime, as well as the difference between the initial and final values.
Required Output:
ID I_Value1 F_Value1 I_Value2 F_Value2 Diff2 I_Value3 F_Value3 Diff3 I_Value4 F_Value4 Diff4
XX-12 XXC XTR 1000 1500 500 15.45 15.45 0 18 12 -6
XX-13 XRR XTC 1900 1600 -300 18.25 20.25 2 10 12 2
XX-14 XRY XTC 1100 1100 0 10.50 10.50 0 10 10 0
XX-15 XTC XTC 1400 1400 0 20.25 20.25 0 14 14 0
Using dplyr
library(dplyr)
df %>%
  mutate(Date = as.POSIXct(Date, "%Y-%m-%d %H:%M:%S", tz = "GMT")) %>%
  arrange(ID, Date) %>%
  group_by(ID) %>%
  summarise_at(vars(Value1:Value4), funs(I = first(.),
                                         F = last(.),
                                         Diff = ifelse(is.character(.), NA, last(.) - first(.)))) %>%
  select_if(~!all(is.na(.)))
gives
ID Value1_I Value2_I Value3_I Value4_I Value1_F Value2_F Value3_F Value4_F Value2_Diff Value3_Diff Value4_Diff
<chr> <chr> <int> <dbl> <int> <chr> <int> <dbl> <int> <int> <dbl> <int>
1 XX-12 XXC 1000 15.4 18 XTR 1500 15.4 12 500 0 -6
2 XX-13 XRR 1900 18.2 10 XTC 1600 20.2 12 -300 2.00 2
3 XX-14 XRY 1100 10.5 10 XRY 1100 10.5 10 0 0 0
4 XX-15 XTC 1400 20.2 14 XTC 1400 20.2 14 0 0 0
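funs() has since been superseded in dplyr; a rough equivalent with across() and where() (a sketch, assuming df as in the sample data below) would be:
library(dplyr)
df %>%
  mutate(Date = as.POSIXct(Date, format = "%Y-%m-%d %H:%M:%S", tz = "GMT")) %>%
  arrange(ID, Date) %>%
  group_by(ID) %>%
  summarise(across(Value1:Value4,
                   list(I = first, F = last,
                        Diff = ~ if (is.numeric(.x)) last(.x) - first(.x) else NA)),
            .groups = "drop") %>%
  select(where(~ !all(is.na(.x))))  # drops the all-NA Diff column of the character Value1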
Sample data:
df <- structure(list(ID = c("XX-12", "XX-12", "XX-13", "XX-13", "XX-14",
"XX-15"), Date = c("2018-02-01 15:48:15", "2018-02-05 20:18:43",
"2018-02-03 19:14:17", "2018-02-03 22:42:18", "2018-02-04 23:14:45",
"2018-02-05 21:16:48"), Value1 = c("XXC", "XTR", "XRR", "XTC",
"XRY", "XTC"), Value2 = c(1000L, 1500L, 1900L, 1600L, 1100L,
1400L), Value3 = c(15.45, 15.45, 18.25, 20.25, 10.5, 20.25),
Value4 = c(18L, 12L, 10L, 12L, 10L, 14L)), .Names = c("ID",
"Date", "Value1", "Value2", "Value3", "Value4"), class = "data.frame", row.names = c(NA,
-6L))

Date difference between end date to start date

I have data that looks like below.
id from_date to_date
1 2015-03-09 2015-03-14
2 2015-02-22 2015-02-24
2 2015-05-06 2015-05-17
3 2015-02-12 2015-02-16
4 2015-03-10 2015-03-16
4 2015-03-22 2015-04-07
4 2015-06-07 2015-07-07
4 2015-07-06 2015-07-07
4 2015-08-02 2015-08-07
I want to create a separate variable which is the difference between the to date and the next from date, grouped by id.
So the first time of each id will be NA. I tried the method below, based on another answer on Stack Overflow, but I could not achieve that.
library(data.table)
chf1 = data.table(id = chf$id, from_date = chf$f.date, to_date = chf$t.date)
setkey(chf1,id)
chf1[,diff:=c(NA,difftime(from_date, to_date, units = "days")),by=id]
The output should look like:
id from_date to_date difference
1 2015-03-09 2015-03-14 NA
2 2015-02-22 2015-02-24 NA
2 2015-05-06 2015-05-17 71
3 2015-02-12 2015-02-16 NA
4 2015-03-10 2015-03-16 NA
4 2015-03-22 2015-04-07 6
4 2015-06-07 2015-06-10 64
4 2015-07-06 2015-07-07 26
4 2015-08-02 2015-08-07 26
There are three issues in the code:
1) chf1$from_date and chf1$to_date get the whole columns, so there is no effect of grouping by 'id'.
2) difftime gives output with the same length as the initial column length.
3) As difftime takes the difference between each element of 'from_date' and the corresponding element of 'to_date', there is no need for by = id.
Therefore, the code can be
chf1[, diff1:=difftime(from_date, to_date, units = "days")]
chf1
# id from_date to_date diff1
#1: 1 2015-03-09 2015-03-14 -5 days
##2: 2 2015-02-22 2015-02-24 -2 days
#3: 2 2015-05-06 2015-05-17 -11 days
#4: 3 2015-02-12 2015-02-16 -4 days
#5: 4 2015-03-10 2015-03-16 -6 days
#6: 4 2015-03-22 2015-04-07 -16 days
#7: 4 2015-06-07 2015-07-07 -30 days
#8: 4 2015-07-06 2015-07-07 -1 days
#9: 4 2015-08-02 2015-08-07 -5 days
Based on the description in the OP's post, if we need the difference between the next value of 'from_date' and the current 'to_date', after grouping by 'id', use difftime on the shifted 'from_date' together with 'to_date' and assign (`:=`) it to 'diff1'.
chf1[, diff1 := difftime(shift(from_date, type = "lead"), to_date,
                         units = "days"), by = id]
chf1
# id from_date to_date diff1
#1: 1 2015-03-09 2015-03-14 NA days
#2: 2 2015-02-22 2015-02-24 71 days
#3: 2 2015-05-06 2015-05-17 NA days
#4: 3 2015-02-12 2015-02-16 NA days
#5: 4 2015-03-10 2015-03-16 6 days
#6: 4 2015-03-22 2015-04-07 61 days
#7: 4 2015-06-07 2015-07-07 -1 days
#8: 4 2015-07-06 2015-07-07 26 days
#9: 4 2015-08-02 2015-08-07 NA days
Or it could be
chf1[, diff1 := difftime(from_date, shift(to_date), units = "days"), by = id]
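For comparison, a dplyr sketch of this last version (lag() mirrors data.table's shift(); assumes chf1 as built in the data section below):
library(dplyr)
chf1 %>%
  group_by(id) %>%
  mutate(diff1 = as.numeric(difftime(from_date, lag(to_date), units = "days"))) %>%
  ungroup()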
data
chf <- structure(list(id = c(1L, 2L, 2L, 3L, 4L, 4L, 4L, 4L, 4L),
f.date = structure(c(16503,
16488, 16561, 16478, 16504, 16516, 16593, 16622, 16649), class = "Date"),
t.date = structure(c(16508, 16490, 16572, 16482, 16510, 16532,
16623, 16623, 16654), class = "Date")), .Names = c("id",
"f.date", "t.date"), row.names = c(NA, -9L), class = "data.frame")
chf1 = data.table(id = chf$id,from_date = chf$f.date,to_date = chf$t.date)

index grouped columns in data frame

I have a data frame as follow
time site val
2014-09-01 00:00:00 2001 1
2014-09-01 00:15:00 2001 0
2014-09-01 00:30:00 2001 2
2014-09-01 00:45:00 2001 0
2014-09-01 00:00:00 2002 1
2014-09-01 00:15:00 2002 0
2014-09-01 00:30:00 2002 2
2014-09-02 00:45:00 2001 0
2014-09-02 00:00:00 2001 1
2014-09-02 00:15:00 2001 0
2014-09-02 00:30:00 2001 2
2014-09-02 00:45:00 2001 0
2014-09-02 00:00:00 2002 1
2014-09-02 00:15:00 2002 0
2014-09-02 00:30:00 2002 2
2014-09-02 00:45:00 2001 0
I'd like to be able to group it by time and site, then add a new variable that will consist of the occurrence index within the group.
time site val h
2014-09-01 00:00:00 2001 1 1
2014-09-01 00:15:00 2001 0 2
2014-09-01 00:30:00 2001 2 3
2014-09-01 00:45:00 2001 0 4
2014-09-01 00:00:00 2002 1 1
2014-09-01 00:15:00 2002 0 2
2014-09-01 00:30:00 2002 2 3
2014-09-02 00:45:00 2002 0 4
2014-09-02 00:00:00 2001 1 1
2014-09-02 00:15:00 2001 0 2
2014-09-02 00:30:00 2001 2 3
2014-09-02 00:45:00 2001 0 4
2014-09-02 00:00:00 2002 1 1
2014-09-02 00:15:00 2002 0 2
2014-09-02 00:30:00 2002 2 3
2014-09-02 00:45:00 2001 0 4
df <- structure(list(time = structure(c(1409522400, 1409523300, 1409524200,
1409525100, 1409522400, 1409523300, 1409524200, 1409611500, 1409608800,
1409609700, 1409610600, 1409611500, 1409608800, 1409609700, 1409610600,
1409611500), class = c("POSIXct", "POSIXt"), tzone = ""), site = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L), .Label = c("2001",
"2002"), class = "factor"), val = c(1L, 0L, 2L, 0L, 1L, 0L, 2L,
0L, 1L, 0L, 2L, 0L, 1L, 0L, 2L, 0L)), .Names = c("time", "site",
"val"), row.names = c(NA, -16L), class = "data.frame")
What are my possibilities in R to achieve this?
Thanks
Using dplyr. First we create a column id extracting the day from the date (column time). Then we group by site and id, and add a new variable counter numbering the occurrences within those two groups.
df$id <- as.factor(format(df$time,'%d'))
library(dplyr)
df %>% group_by(site, id) %>% mutate(counter = row_number())
Output:
time site val id counter
(time) (fctr) (int) (fctr) (int)
1 2014-09-01 00:00:00 2001 1 01 1
2 2014-09-01 00:15:00 2001 0 01 2
3 2014-09-01 00:30:00 2001 2 01 3
4 2014-09-01 00:45:00 2001 0 01 4
5 2014-09-01 00:00:00 2002 1 01 1
6 2014-09-01 00:15:00 2002 0 01 2
7 2014-09-01 00:30:00 2002 2 01 3
8 2014-09-02 00:45:00 2001 0 02 1
9 2014-09-02 00:00:00 2001 1 02 2
10 2014-09-02 00:15:00 2001 0 02 3
11 2014-09-02 00:30:00 2001 2 02 4
12 2014-09-02 00:45:00 2001 0 02 5
13 2014-09-02 00:00:00 2002 1 02 1
14 2014-09-02 00:15:00 2002 0 02 2
15 2014-09-02 00:30:00 2002 2 02 3
16 2014-09-02 00:45:00 2001 0 02 6
We can use ave
df$h <- with(df, ave(val, cumsum(c(TRUE,diff(time)< 0)), FUN= seq_along))
df
# time site val h
#1 2014-09-01 03:30:00 2001 1 1
#2 2014-09-01 03:45:00 2001 0 2
#3 2014-09-01 04:00:00 2001 2 3
#4 2014-09-01 04:15:00 2001 0 4
#5 2014-09-01 03:30:00 2002 1 1
#6 2014-09-01 03:45:00 2002 0 2
#7 2014-09-01 04:00:00 2002 2 3
#8 2014-09-02 04:15:00 2001 0 4
#9 2014-09-02 03:30:00 2001 1 1
#10 2014-09-02 03:45:00 2001 0 2
#11 2014-09-02 04:00:00 2001 2 3
#12 2014-09-02 04:15:00 2001 0 4
#13 2014-09-02 03:30:00 2002 1 1
#14 2014-09-02 03:45:00 2002 0 2
#15 2014-09-02 04:00:00 2002 2 3
#16 2014-09-02 04:15:00 2001 0 4
NOTE: This is based on the expected output shown in the OP's post. I understand that 'site' is also described as a grouping variable, but then the expected output should be something else.
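A data.table sketch of the same day-and-site grouping (assuming the df from the question):
library(data.table)
setDT(df)[, h := seq_len(.N), by = .(site, day = as.IDate(time))]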
