I love the gt package in R, but I'm having trouble coming up with crisp code for row grouping that suits large tables where the row-group labels are not known in advance.
Consider this toy example; since this is a small data.table, it looks OK.
library(data.table)
library(magrittr)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:data.table':
#>
#> hour, isoweek, mday, minute, month, quarter, second, wday, week,
#> yday, year
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
library(gt)
# create a toy data.table
dt <- data.table(datetime = seq(ymd_hm(202205100800), by = "5 hours", length.out = 15))[order(datetime)]
dt[, date := as_date(datetime)]
dt[, time := format(datetime, "%H:%M")]
dt[, values := seq(1000, by = 10, length.out = 15)]
# Here's what my toy data.table looks like:
print(dt)
#> datetime date time values
#> 1: 2022-05-10 08:00:00 2022-05-10 08:00 1000
#> 2: 2022-05-10 13:00:00 2022-05-10 13:00 1010
#> 3: 2022-05-10 18:00:00 2022-05-10 18:00 1020
#> 4: 2022-05-10 23:00:00 2022-05-10 23:00 1030
#> 5: 2022-05-11 04:00:00 2022-05-11 04:00 1040
#> 6: 2022-05-11 09:00:00 2022-05-11 09:00 1050
#> 7: 2022-05-11 14:00:00 2022-05-11 14:00 1060
#> 8: 2022-05-11 19:00:00 2022-05-11 19:00 1070
#> 9: 2022-05-12 00:00:00 2022-05-12 00:00 1080
#> 10: 2022-05-12 05:00:00 2022-05-12 05:00 1090
#> 11: 2022-05-12 10:00:00 2022-05-12 10:00 1100
#> 12: 2022-05-12 15:00:00 2022-05-12 15:00 1110
#> 13: 2022-05-12 20:00:00 2022-05-12 20:00 1120
#> 14: 2022-05-13 01:00:00 2022-05-13 01:00 1130
#> 15: 2022-05-13 06:00:00 2022-05-13 06:00 1140
# Now let's create a table using the gt package and add row groups.
# We will group on date.
dt %>%
  gt() %>%
  tab_row_group(label = "May 10", id = "may10", rows = date == ymd(20220510)) %>%
  tab_row_group(label = "May 11", id = "may11", rows = date == ymd(20220511)) %>%
  tab_row_group(label = "May 12", id = "may12", rows = date == ymd(20220512)) %>%
  row_group_order(groups = c("may10", "may11", "may12")) %>%
  cols_hide(columns = c(datetime, date))
But in real life there may be hundreds of dates, and the dates are not known in advance. With the current tab_row_group() approach, the code becomes unwieldy.
Is there a way to shorten the code and automate the row groupings?
You can use the groupname_col argument of the gt() function:
dt %>%
gt(groupname_col = c("date")) %>%
cols_hide(columns = c(datetime,date))
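If you want group labels like "May 10" rather than the raw dates, one option (a sketch; date_label is a hypothetical helper column, not part of the original data) is to pre-format a label column and group on that. Since dt is already sorted by datetime, the groups appear in chronological order:
# build a formatted label column, then group on it
dt[, date_label := format(date, "%b %d")]  # e.g. "May 10"
dt %>%
  gt(groupname_col = "date_label") %>%
  cols_hide(columns = c(datetime, date))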
Suppose I work with meteorological observations and want to know not only the daily maximum value but also the timestamp at which that value was observed. Is it possible to accomplish this without significant overhead, e.g. by setting some sort of parameter?
library(xts)
set.seed(42)
# init data
datetimes <- seq(from = as.POSIXct("2022-01-01"),
to = as.POSIXct("2022-01-08"),
by = "10 mins")
values <- length(datetimes) |> runif() |> sin()
data <- xts(x = values,
order.by = datetimes)
#
ep <- endpoints(data, "days")
data_max <- period.apply(data, INDEX = ep, FUN = max)
head(data_max)
#> [,1]
#> 2022-01-01 23:50:00 0.8354174
#> 2022-01-02 23:50:00 0.8396034
#> 2022-01-03 23:50:00 0.8364624
#> 2022-01-04 23:50:00 0.8376930
#> 2022-01-05 23:50:00 0.8392988
#> 2022-01-06 23:50:00 0.8372780
Obviously, this would not work with summarizing functions like mean and median, where you would want to specify the interval width considered; but when working with e.g. min and max, how would I proceed if I wanted to know the exact index of the value in question?
At the moment I'm just looping over my xts subsets to determine the relevant index, but maybe there is a more elegant approach, perhaps even an argument to period.apply() I haven't noticed that returns this information.
sub <- "2022-01-04"
ind <- which(data[sub] == max(data[sub]))
data[sub][ind]
#> [,1]
#> 2022-01-04 23:20:00 0.837693
My desired output would look like this:
#> [,1]
#> 2022-01-01 03:40:00 0.8354174
#> 2022-01-02 15:00:00 0.8396034
#> 2022-01-03 05:10:00 0.8364624
#> 2022-01-04 23:20:00 0.8376930
#> 2022-01-05 02:50:00 0.8392988
#> 2022-01-06 06:40:00 0.8372780
Thanks a lot in advance!
1) Use ave and subset that down to the rows containing the maxima. The last line keeps only the first maximum in each day; it could be omitted if there are no duplicate maxima within a day (e.g. for a strictly increasing input) or if all maxima are desired. The format() call is there to avoid time zone problems.
library(magrittr)
library(xts)
data %>%
  subset(ave(., as.Date(format(time(.))), FUN = max) == .) %>%
  subset(ave(., as.Date(format(time(.))), FUN = seq_along) == 1)
## [,1]
## 2022-01-01 03:40:00 0.8354174
## 2022-01-02 15:00:00 0.8396034
## 2022-01-03 05:10:00 0.8364624
## 2022-01-04 23:20:00 0.8376930
## 2022-01-05 02:50:00 0.8392988
## 2022-01-06 06:40:00 0.8372780
## 2022-01-07 08:40:00 0.8406546
## 2022-01-08 00:00:00 0.2335385
2) Another possibility is to aggregate using tapply with which.max to get the time of the first maximum in each date and then subset the data to those times.
data %>%
  subset(time(.) %in% tapply(time(.), as.Date(format(time(.))), \(x) x[which.max(.[x])]))
## [,1]
## 2022-01-01 03:40:00 0.8354174
## 2022-01-02 15:00:00 0.8396034
## 2022-01-03 05:10:00 0.8364624
## 2022-01-04 23:20:00 0.8376930
## 2022-01-05 02:50:00 0.8392988
## 2022-01-06 06:40:00 0.8372780
## 2022-01-07 08:40:00 0.8406546
## 2022-01-08 00:00:00 0.2335385
Using your data and inserting some NAs:
set.seed(42)
datetimes <- seq(from = as.POSIXct('2022-01-01'),
                 to = as.POSIXct('2022-01-09'), by = '10 mins')
values <- length(datetimes) |> runif() |> sin()
values[seq_along(values) %% 10 == 0] <- NA
data <- xts(x = values,
            order.by = datetimes)
# endpoints() must be computed after `data` is (re)built
ep <- endpoints(data, 'days')
data_max <- period.apply(data, INDEX = ep, FUN = max, na.rm = TRUE)
data[data %in% data_max]
[,1]
2022-01-01 03:40:00 0.8354174
2022-01-02 15:00:00 0.8396034
2022-01-03 05:10:00 0.8364624
2022-01-04 23:20:00 0.8376930
2022-01-05 02:50:00 0.8392988
2022-01-06 06:40:00 0.8372780
2022-01-07 08:40:00 0.8406546
2022-01-08 06:40:00 0.8411353
2022-01-09 00:00:00 0.8247520
It seems period.apply() is the way to go, and the index of the 'moment' can be found. It also helps to use the same time period throughout (01 -> 09, not 10).
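For completeness, a hedged sketch of getting the timestamps from period.apply() itself (reusing data and ep as defined above): have FUN return the time of each period's maximum rather than its value, then subset the series at those times.
# FUN returns the (numeric) time of the max within each period
max_times <- period.apply(data, INDEX = ep,
                          FUN = function(x) as.numeric(time(x)[which.max(x)]))
# subset the original series at those timestamps; the tz choice is cosmetic,
# since xts subsetting compares absolute times
data[as.POSIXct(as.numeric(max_times), origin = "1970-01-01", tz = "UTC")]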
I'm working with trip ticket data, which includes a column of dates and times. I want to group trips into Morning (05:00-10:59), Lunch (11:00-12:59), Afternoon (13:00-17:59), Evening (18:00-23:59), and Dawn/Graveyard (00:00-04:59), and then count the number of trips (by counting the unique values in the trip_id column) for each of those categories.
Only I don't know how to group/summarize according to time values. Is this possible in R?
trip_id start_time end_time day_of_week
1 CFA86D4455AA1030 2021-03-16 08:32:30 2021-03-16 08:36:34 Tuesday
2 30D9DC61227D1AF3 2021-03-28 01:26:28 2021-03-28 01:36:55 Sunday
3 846D87A15682A284 2021-03-11 21:17:29 2021-03-11 21:33:53 Thursday
4 994D05AA75A168F2 2021-03-11 13:26:42 2021-03-11 13:55:41 Thursday
5 DF7464FBE92D8308 2021-03-21 09:09:37 2021-03-21 09:27:33 Sunday
Here's a solution with hour() and case_when().
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
trip <- tibble(start_time = mdy_hm("1/1/2022 1:00") + minutes(seq(0, 700, 15)))
trip <- trip %>%
mutate(
hr = hour(start_time),
time_of_day = case_when(
hr >= 5 & hr < 11 ~ "morning",
hr >= 11 & hr < 13 ~ "afternoon",
TRUE ~ "fill in the rest yourself :)"
)
)
print(trip)
#> # A tibble: 47 x 3
#> start_time hr time_of_day
#> <dttm> <int> <chr>
#> 1 2022-01-01 01:00:00 1 fill in the rest yourself :)
#> 2 2022-01-01 01:15:00 1 fill in the rest yourself :)
#> 3 2022-01-01 01:30:00 1 fill in the rest yourself :)
#> 4 2022-01-01 01:45:00 1 fill in the rest yourself :)
#> 5 2022-01-01 02:00:00 2 fill in the rest yourself :)
#> 6 2022-01-01 02:15:00 2 fill in the rest yourself :)
#> 7 2022-01-01 02:30:00 2 fill in the rest yourself :)
#> 8 2022-01-01 02:45:00 2 fill in the rest yourself :)
#> 9 2022-01-01 03:00:00 3 fill in the rest yourself :)
#> 10 2022-01-01 03:15:00 3 fill in the rest yourself :)
#> # ... with 37 more rows
trips <- trip %>%
count(time_of_day)
print(trips)
#> # A tibble: 3 x 2
#> time_of_day n
#> <chr> <int>
#> 1 afternoon 7
#> 2 fill in the rest yourself :) 16
#> 3 morning 24
Created on 2022-03-21 by the reprex package (v2.0.1)
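Since the question asks for counts of unique trip_id values rather than row counts, here is a hedged sketch that completes the bins and counts distinct trips (trips_raw, with trip_id and start_time columns, stands in for the OP's real data; the toy tibble above has no trip_id column):
trip_counts <- trips_raw %>%
  mutate(
    hr = hour(start_time),
    time_of_day = case_when(
      hr >= 5  & hr < 11 ~ "Morning",
      hr >= 11 & hr < 13 ~ "Lunch",
      hr >= 13 & hr < 18 ~ "Afternoon",
      hr >= 18           ~ "Evening",
      TRUE               ~ "Dawn/Graveyard"   # 00:00-04:59
    )
  ) %>%
  group_by(time_of_day) %>%
  summarise(n_trips = n_distinct(trip_id))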
I'm doing something quite simple. Given a dataframe of start dates and end dates for specific periods, I want to expand each period into a full weekly sequence (keeping the factor for each row), then output everything in a single large dataframe.
For instance:
library(tidyverse)
library(lubridate)
# Dataset
start_dates = ymd_hms(c("2019-05-08 00:00:00",
"2020-01-17 00:00:00",
"2020-03-03 00:00:00",
"2020-05-28 00:00:00",
"2020-12-10 00:00:00",
"2021-05-07 00:00:00",
"2022-01-04 00:00:00"), tz = "UTC")
end_dates = ymd_hms(c( "2019-10-24 00:00:00",
"2020-03-03 00:00:00",
"2020-05-28 00:00:00",
"2020-12-10 00:00:00",
"2021-05-07 00:00:00",
"2022-01-04 00:00:00",
"2022-01-19 00:00:00"), tz = "UTC")
df1 = data.frame(studying = paste0("period", 1:7), start_dates, end_dates)
It was suggested to me to use do(), which currently works fine, but I hate it when things are superseded. I also have a way of doing it using map2. But the documentation (https://dplyr.tidyverse.org/reference/do.html) suggests you can use nest_by(), across() and summarise() to do the same job as do(). How would I go about getting the same result? I've tried a lot of things, but I just can't seem to get it.
# do() way to do it
df1 %>%
group_by(studying) %>%
do(data.frame(week=seq(.$start_dates,.$end_dates,by="1 week")))
# transmute() way to do it
df1 %>%
  transmute(weeks = map2(start_dates, end_dates, seq, by = "1 week"), studying) %>%
  unnest(cols = c(weeks))
As the documentation of ?do suggests, we can now use summarise and replace the . with across():
library(tidyverse)
library(lubridate)
df1 %>%
group_by(studying) %>%
summarise(week = seq(across()$start_dates,
across()$end_dates,
by = "1 week"))
#> `summarise()` has grouped output by 'studying'. You can override using the
#> `.groups` argument.
#> # A tibble: 134 x 2
#> # Groups: studying [7]
#> studying week
#> <chr> <dttm>
#> 1 period1 2019-05-08 00:00:00
#> 2 period1 2019-05-15 00:00:00
#> 3 period1 2019-05-22 00:00:00
#> 4 period1 2019-05-29 00:00:00
#> 5 period1 2019-06-05 00:00:00
#> 6 period1 2019-06-12 00:00:00
#> 7 period1 2019-06-19 00:00:00
#> 8 period1 2019-06-26 00:00:00
#> 9 period1 2019-07-03 00:00:00
#> 10 period1 2019-07-10 00:00:00
#> # … with 124 more rows
Created on 2022-01-19 by the reprex package (v0.3.0)
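The question also mentions nest_by(); a hedged sketch of that route (nest_by() produces a rowwise frame whose data list-column is unwrapped inside summarise(); multi-row summarise() results are allowed in dplyr >= 1.0):
df1 %>%
  nest_by(studying) %>%
  summarise(week = seq(data$start_dates, data$end_dates, by = "1 week"),
            .groups = "drop")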
You can also use tidyr::complete:
df1 %>%
group_by(studying) %>%
complete(start_dates = seq(from = start_dates, to = end_dates, by = "1 week")) %>%
select(-end_dates, weeks = start_dates)
# A tibble: 134 x 2
# Groups: studying [7]
studying weeks
<chr> <dttm>
1 period1 2019-05-08 00:00:00
2 period1 2019-05-15 00:00:00
3 period1 2019-05-22 00:00:00
4 period1 2019-05-29 00:00:00
5 period1 2019-06-05 00:00:00
6 period1 2019-06-12 00:00:00
7 period1 2019-06-19 00:00:00
8 period1 2019-06-26 00:00:00
9 period1 2019-07-03 00:00:00
10 period1 2019-07-10 00:00:00
# ... with 124 more rows
Although marked Experimental, the help file for group_modify does say that
‘group_modify()’ is an evolution of ‘do()’
and, in fact, the code for the example in the question using group_modify is nearly the same as with do.
# with group_modify
df2 <- df1 %>%
group_by(studying) %>%
group_modify(~ data.frame(week = seq(.$start_dates, .$end_dates, by = "1 week")))
# with do
df0 <- df1 %>%
group_by(studying) %>%
do(data.frame(week = seq(.$start_dates, .$end_dates, by = "1 week")))
identical(df2, df0)
## [1] TRUE
Not sure if this is exactly what you are looking for, but here is my attempt with rowwise and unnest:
df1 %>%
rowwise() %>%
mutate(week = list(seq(start_dates, end_dates, by = "1 week"))) %>%
select(studying, week) %>%
unnest(cols = c(week))
Another approach:
library(tidyverse)
df1 %>%
group_by(studying) %>%
summarise(df = tibble(weeks = seq(start_dates, end_dates, by = 'week'))) %>%
unnest(df)
#> `summarise()` has grouped output by 'studying'. You can override using the `.groups` argument.
#> # A tibble: 134 × 2
#> # Groups: studying [7]
#> studying weeks
#> <chr> <dttm>
#> 1 period1 2019-05-08 00:00:00
#> 2 period1 2019-05-15 00:00:00
#> 3 period1 2019-05-22 00:00:00
#> 4 period1 2019-05-29 00:00:00
#> 5 period1 2019-06-05 00:00:00
#> 6 period1 2019-06-12 00:00:00
#> 7 period1 2019-06-19 00:00:00
#> 8 period1 2019-06-26 00:00:00
#> 9 period1 2019-07-03 00:00:00
#> 10 period1 2019-07-10 00:00:00
#> # … with 124 more rows
Created on 2022-01-20 by the reprex package (v2.0.1)
Below is what my data looks like. I need to find the max and min temp for each day, as well as the corresponding time.
Temp date time
280.9876771 01-01-79 03:00:00
291.9695498 01-01-79 06:00:00
294.9583426 01-01-79 09:00:00
290.2357847 01-01-79 12:00:00
286.2944531 01-01-79 15:00:00
282.9282138 01-01-79 18:00:00
280.326689 01-01-79 21:00:00
279.2551605 02-01-79 00:00:00
281.3981824 02-01-79 03:00:00
293.076125 02-01-79 06:00:00
295.8072204 02-01-79 09:00:00
This is the code I tried for daily min and max temp:
library(xts)
read.csv("hourly1.csv", header = T) -> hourly1
xts(hourly1$Temp, as.Date(hourly1$date)) -> temp_date1
apply.daily(temp_date1, min) -> mintemp1_date
apply.daily(temp_date1, max) -> maxtemp1_date
I need help regarding how to find the time of day for min and max temp
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
dataset <- read.table(text = 'Temp date time
280.9876771 01-01-79 03:00:00
291.9695498 01-01-79 06:00:00
294.9583426 01-01-79 09:00:00
290.2357847 01-01-79 12:00:00
286.2944531 01-01-79 15:00:00
282.9282138 01-01-79 18:00:00
280.326689 01-01-79 21:00:00
279.2551605 02-01-79 00:00:00
281.3981824 02-01-79 03:00:00
293.076125 02-01-79 06:00:00
295.8072204 02-01-79 09:00:00',
header = TRUE,
stringsAsFactors = FALSE)
dataset %>%
group_by(date) %>%
summarise(min_temp = min(Temp),
min_temp_time = time[which.min(x = Temp)],
max_temp = max(Temp),
max_temp_time = time[which.max(x = Temp)])
#> # A tibble: 2 x 5
#> date min_temp min_temp_time max_temp max_temp_time
#> <chr> <dbl> <chr> <dbl> <chr>
#> 1 01-01-79 280. 21:00:00 295. 09:00:00
#> 2 02-01-79 279. 00:00:00 296. 09:00:00
Created on 2019-06-15 by the reprex package (v0.3.0)
Hope this helps.
Try the dplyr package.
df <- structure(list(Temp = c(280.9876771, 291.9695498, 294.9583426,
290.2357847, 286.2944531, 282.9282138, 280.326689, 279.2551605,
281.3981824, 293.076125, 295.8072204),
date = c("01-01-79", "01-01-79",
"01-01-79", "01-01-79", "01-01-79", "01-01-79", "01-01-79", "02-01-79",
"02-01-79", "02-01-79", "02-01-79"),
time = c("03:00:00", "06:00:00", "09:00:00", "12:00:00", "15:00:00", "18:00:00", "21:00:00", "00:00:00",
"03:00:00", "06:00:00", "09:00:00")),
row.names = c(NA, -11L),
class = c("data.frame"))
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df %>%
group_by(date)%>%
slice(which.max(Temp))
#> # A tibble: 2 x 3
#> # Groups: date [2]
#> Temp date time
#> <dbl> <chr> <chr>
#> 1 295. 01-01-79 09:00:00
#> 2 296. 02-01-79 09:00:00
df %>%
group_by(date)%>%
slice(which.min(Temp))
#> # A tibble: 2 x 3
#> # Groups: date [2]
#> Temp date time
#> <dbl> <chr> <chr>
#> 1 280. 01-01-79 21:00:00
#> 2 279. 02-01-79 00:00:00
Created on 2019-06-15 by the reprex package (v0.3.0)
A data.table + lubridate solution:
# load libraries
library(data.table)
library(lubridate)
# load data
dt <- fread(" Temp date time
280.9876771 01-01-79 03:00:00
291.9695498 01-01-79 06:00:00
294.9583426 01-01-79 09:00:00
290.2357847 01-01-79 12:00:00
286.2944531 01-01-79 15:00:00
282.9282138 01-01-79 18:00:00
280.326689 01-01-79 21:00:00
279.2551605 02-01-79 00:00:00
281.3981824 02-01-79 03:00:00
293.076125 02-01-79 06:00:00
295.8072204 02-01-79 09:00:00")
# Convert date - time values to real dates:
dt[, date2 := dmy_hms(paste(date, time, sep = " "))]
# find the date - time for max temp:
dt[, date2[which(Temp == max(Temp))], by = floor_date(date2, "days")]
# find the date - time for min temp:
dt[, date2[which(Temp == min(Temp))], by = floor_date(date2, "days")]
Thank you guys for the help, but I have 116,881 entries.
So I tried the index() function in R, which fetched me the corresponding ids:
index(hourly1)[hourly1$Temp %in% maxtemp1_date] -> max_id
index(hourly1)[hourly1$Temp %in% mintemp1_date] -> min_id
Then I used the VLOOKUP function in Excel to get the desired solution.
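For what it's worth, the Excel step can also be done directly in R; a hedged sketch (assuming hourly1, mintemp1_date, and maxtemp1_date from the earlier attempt) that pulls the matching rows:
# rows of hourly1 whose Temp equals one of the daily extremes
max_rows <- hourly1[hourly1$Temp %in% coredata(maxtemp1_date), ]
min_rows <- hourly1[hourly1$Temp %in% coredata(mintemp1_date), ]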
In data.table:
dt[, x.value.min := frollapply(x = x, n = 2, FUN = min, fill = NA, align = "right", na.rm = TRUE), by = ID]
I have a dataframe with the data below (the average of the values at timestamps 7:50 and 7:40 should be my value of A for timestamp 7:45).
Date_Time | A
7/28/2017 8:00| 443.75
7/28/2017 7:50| 440.75
7/28/2017 7:45| NA
7/28/2017 7:40| 447.5
7/28/2017 7:30| 448.75
7/28/2017 7:20| 444.5
7/28/2017 7:15| NA
7/28/2017 7:10| 440.25
7/28/2017 7:00| 447.5
I want it to transform into 15 min interval something like below using mean:
Date / Time | Object Value
7/28/2017 8:00| 465
7/28/2017 7:45| 464.875
7/28/2017 7:30| 464.75
7/28/2017 7:15| 464.875
7/28/2017 7:00| 465
Update
The OP changed the desired output. Since I have no time to update my answer, I will leave it as is. See my comment on the original post for how to use na.interpolation to fill in the missing values.
Original Post
This solution assumes you calculated the average based on the values at 8:00, 7:30, and 7:00.
library(dplyr)
library(tidyr)
library(lubridate)
library(imputeTS)
dt2 <- dt %>%
mutate(Date.Time = mdy_hm(Date.Time)) %>%
filter(Date.Time %in% seq(min(Date.Time), max(Date.Time), by = "15 min")) %>%
complete(Date.Time = seq(min(Date.Time), max(Date.Time), by = "15 min")) %>%
mutate(Object.Value = na.interpolation(Object.Value)) %>%
fill(Object.Name) %>%
arrange(desc(Date.Time))
dt2
# A tibble: 5 x 3
Date.Time Object.Name Object.Value
<dttm> <chr> <dbl>
1 2017-07-28 08:00:00 a 465.000
2 2017-07-28 07:45:00 a 464.875
3 2017-07-28 07:30:00 a 464.750
4 2017-07-28 07:15:00 a 464.875
5 2017-07-28 07:00:00 a 465.000
Data
dt <- read.table(text = "'Date Time' 'Object Name' 'Object Value'
'7/28/2017 8:00' a 465
'7/28/2017 7:50' a 465
'7/28/2017 7:40' a 464.75
'7/28/2017 7:30' a 464.75
'7/28/2017 7:20' a 464.75
'7/28/2017 7:10' a 465
'7/28/2017 7:00' a 465",
header = TRUE, stringsAsFactors = FALSE)
If the values measured on the 10-minute intervals are time-integrated averages over that period, it's reasonable to average them to a different period. If these are instantaneous measurements, then it's more reasonable to smooth them as others have suggested.
To take time-integrated averages measured on the 10-minute schedule and average those to the 15-minute schedule, you can use the intervalaverage package:
library(data.table)
library(intervalaverage)
x <- structure(list(time = c("7/28/2017 8:00", "7/28/2017 7:50", "7/28/2017 7:45",
"7/28/2017 7:40", "7/28/2017 7:30", "7/28/2017 7:20", "7/28/2017 7:15",
"7/28/2017 7:10", "7/28/2017 7:00"), A = c(443.75, 440.75, NA,
447.5, 448.75, 444.5, NA, 440.25, 447.5)), row.names = c(NA,
-9L), class = "data.frame")
y <- structure(list(time = c("7/28/2017 8:00", "7/28/2017 7:45", "7/28/2017 7:30",
"7/28/2017 7:15", "7/28/2017 7:00")), row.names = c(NA, -5L), class = "data.frame")
setDT(x)
setDT(y)
x
#> time A
#> 1: 7/28/2017 8:00 443.75
#> 2: 7/28/2017 7:50 440.75
#> 3: 7/28/2017 7:45 NA
#> 4: 7/28/2017 7:40 447.50
#> 5: 7/28/2017 7:30 448.75
#> 6: 7/28/2017 7:20 444.50
#> 7: 7/28/2017 7:15 NA
#> 8: 7/28/2017 7:10 440.25
#> 9: 7/28/2017 7:00 447.50
y
#> time
#> 1: 7/28/2017 8:00
#> 2: 7/28/2017 7:45
#> 3: 7/28/2017 7:30
#> 4: 7/28/2017 7:15
#> 5: 7/28/2017 7:00
x[, time:=as.POSIXct(time,format='%m/%d/%Y %H:%M',tz = "UTC")]
setnames(x, "time","start_time")
x[, start_time_integer:=as.integer(start_time)]
y[, time:=as.POSIXct(time,format='%m/%d/%Y %H:%M',tz = "UTC")]
setnames(y, "time","start_time")
y[, start_time_integer:=as.integer(start_time)]
setkey(y, start_time)
setkey(x, start_time)
##drop the times at :15 and :45
x <- x[!start_time %in% as.POSIXct(c("2017-07-28 07:45:00","2017-07-28 07:15:00"),tz="UTC")]
x[, end_time_integer:=as.integer(start_time)+60L*10L-1L]
x[, end_time:=as.POSIXct(end_time_integer,origin="1969-12-31 24:00:00",tz = "UTC")]
y[, end_time_integer:=as.integer(start_time)+60L*15L-1L]
y[, end_time:=as.POSIXct(end_time_integer,origin="1969-12-31 24:00:00",tz = "UTC")]
x
#> start_time A start_time_integer end_time_integer
#> 1: 2017-07-28 07:00:00 447.50 1501225200 1501225799
#> 2: 2017-07-28 07:10:00 440.25 1501225800 1501226399
#> 3: 2017-07-28 07:20:00 444.50 1501226400 1501226999
#> 4: 2017-07-28 07:30:00 448.75 1501227000 1501227599
#> 5: 2017-07-28 07:40:00 447.50 1501227600 1501228199
#> 6: 2017-07-28 07:50:00 440.75 1501228200 1501228799
#> 7: 2017-07-28 08:00:00 443.75 1501228800 1501229399
#> end_time
#> 1: 2017-07-28 07:09:59
#> 2: 2017-07-28 07:19:59
#> 3: 2017-07-28 07:29:59
#> 4: 2017-07-28 07:39:59
#> 5: 2017-07-28 07:49:59
#> 6: 2017-07-28 07:59:59
#> 7: 2017-07-28 08:09:59
y
#> start_time start_time_integer end_time_integer end_time
#> 1: 2017-07-28 07:00:00 1501225200 1501226099 2017-07-28 07:14:59
#> 2: 2017-07-28 07:15:00 1501226100 1501226999 2017-07-28 07:29:59
#> 3: 2017-07-28 07:30:00 1501227000 1501227899 2017-07-28 07:44:59
#> 4: 2017-07-28 07:45:00 1501227900 1501228799 2017-07-28 07:59:59
#> 5: 2017-07-28 08:00:00 1501228800 1501229699 2017-07-28 08:14:59
out <- intervalaverage(x,y,interval_vars=c("start_time_integer","end_time_integer"),value_vars="A")
out[, start_time:=as.POSIXct(start_time_integer,origin="1969-12-31 24:00:00",tz="UTC")]
out[, end_time:=as.POSIXct(end_time_integer,origin="1969-12-31 24:00:00",tz="UTC")]
out[, list(start_time,end_time, A)]
#> start_time end_time A
#> 1: 2017-07-28 07:00:00 2017-07-28 07:14:59 445.0833
#> 2: 2017-07-28 07:15:00 2017-07-28 07:29:59 443.0833
#> 3: 2017-07-28 07:30:00 2017-07-28 07:44:59 448.3333
#> 4: 2017-07-28 07:45:00 2017-07-28 07:59:59 443.0000
#> 5: 2017-07-28 08:00:00 2017-07-28 08:14:59 NA
#Note that this just equivalent to taking weighted.mean:
weighted.mean(c(447.5,440.25),w=c(10,5))
#> [1] 445.0833
weighted.mean(c(440.25,444.5),w=c(5,10))
#> [1] 443.0833
#etc
Note that the intervalaverage package requires integer columns defining closed intervals, hence the conversion to integer. The integers are converted back to datetime (POSIXct) for readability.
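As an aside, for the instantaneous-measurement case mentioned at the top: a hedged sketch of the smoothing alternative, linearly interpolating onto the 15-minute grid with base approx() (reusing the x table built above; an illustration, not part of the original answer):
# interpolate the 10-minute readings onto a 15-minute grid
grid15 <- seq(min(x$start_time), max(x$start_time), by = "15 min")
interp <- approx(x = as.numeric(x$start_time), y = x$A, xout = as.numeric(grid15))
data.frame(time = grid15, A = interp$y)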