How to combine two columns of time in R?

I have two text files:
1-
> head(val)
V1 V2 V3
1 2015/03/31 00:00 0.134
2 2015/03/31 01:00 0.130
3 2015/03/31 02:00 0.133
4 2015/03/31 03:00 0.132
2-
> head(tes)
A B date
1 0.04 0.02 2015-03-31 02:18:56
What I need is to combine V1 (date) and V2 (hour) in val, then search val for the date and time closest to date in tes, extract the corresponding V3, and put it in tes.
The desired output would be:
tes
A B date V3
1 0.04 0.02 2015-03-31 02:18:56 0.133

Updated answer based on OP's comments.
val$date <- with(val,as.POSIXct(paste(V1,V2), format="%Y/%m/%d %H:%M"))
val
# V1 V2 V3 date
# 1 2015/03/31 00:00 0.134 2015-03-31 00:00:00
# 2 2015/03/31 01:00 0.130 2015-03-31 01:00:00
# 3 2015/03/31 02:00 0.133 2015-03-31 02:00:00
# 4 2015/03/31 03:00 0.132 2015-03-31 03:00:00
# 5 2015/04/07 13:00 0.080 2015-04-07 13:00:00
# 6 2015/04/07 14:00 0.082 2015-04-07 14:00:00
tes$date <- as.POSIXct(tes$date)
tes
# A B date
# 1 0.04 0.02 2015-03-31 02:18:56
# 2 0.05 0.03 2015-03-31 03:30:56
# 3 0.06 0.04 2015-03-31 05:30:56
# 4 0.07 0.05 2015-04-07 13:42:56
f <- function(d) { # for given tes$date, find val$V3
diff <- abs(difftime(val$date,d,units="min"))
if (min(diff) > 45) Inf else which.min(diff) # no match if the closest time is more than 45 min away
}
tes <- cbind(tes,val[sapply(tes$date,f),c("date","V3")])
tes
# A B date date V3
# 1 0.04 0.02 2015-03-31 02:18:56 2015-03-31 02:00:00 0.133
# 2 0.05 0.03 2015-03-31 03:30:56 2015-03-31 03:00:00 0.132
# 3 0.06 0.04 2015-03-31 05:30:56 <NA> NA
# 4 0.07 0.05 2015-04-07 13:42:56 2015-04-07 14:00:00 0.082
The function f(...) calculates the index into val (the row number) for which val$date is closest in time to the given tes$date, unless that closest time is more than 45 min away, in which case Inf is returned. Using this function with sapply(...) as in:
sapply(tes$date, f)
returns a vector of row numbers in val matching your condition for each tes$date.
The reason we use Inf instead of NA for missing values is that indexing a data.frame using Inf always returns a single "row" containing NA, whereas indexing using NA returns nrow(...) rows all containing NA.
I added the extra rows into val and tes per your comment.
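The Inf-versus-NA indexing behaviour described above is easy to check on a toy data frame. This is only a minimal sketch; the data frame `df` and its columns are made up for illustration and are not part of the question's data.

```r
# Toy data frame; the names are illustrative only.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Indexing with Inf yields exactly one all-NA row per Inf.
by_inf <- df[c(2, Inf), ]

# A bare NA is a logical NA, which is recycled over every row.
by_na <- df[NA, ]

nrow(by_inf)  # 2: row 2 plus one NA row
nrow(by_na)   # 3: one NA row for each row of df
```

This is why returning Inf from f keeps the result of sapply(...) aligned one-to-one with the rows of tes.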

Related

How to aggregate hourly data into daily values for several years [duplicate]

I have hourly weather data in the following format:
Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
...
...
12/31/2000 23:00,25
What I need is a daily aggregate of max, min, ave like this:
Date,MaxDBT,MinDBT,AveDBT
01/01/2000,36,23,28
01/02/2000,34,22,29
01/03/2000,32,25,30
...
...
12/31/2000,35,9,20
How to do this in R?

1) This can be done compactly using zoo:
L <- "Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25"
library(zoo)
stat <- function(x) c(min = min(x), max = max(x), mean = mean(x))
z <- read.zoo(text = L, header = TRUE, sep = ",", format = "%m/%d/%Y", aggregate = stat)
This gives:
> z
min max mean
2000-01-01 30 33 31.33333
2000-12-31 25 25 25.00000
2) Here is a solution that uses only core R:
DF <- read.csv(text = L)
DF$Date <- as.Date(DF$Date, "%m/%d/%Y")
ag <- aggregate(DBT ~ Date, DF, stat) # same stat as in zoo solution
The last line gives:
> ag
Date DBT.min DBT.max DBT.mean
1 2000-01-01 30.00000 33.00000 31.33333
2 2000-12-31 25.00000 25.00000 25.00000
EDIT: (1) Since this first appeared the text= argument to read.zoo was added in the zoo package.
(2) minor improvements.
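One detail of the core R solution that may be worth knowing: when the summary function returns a vector, aggregate() stores the result as a single matrix column, even though it prints as DBT.min, DBT.max and DBT.mean. A minimal sketch of flattening it into ordinary columns follows; the small `DF` here is rebuilt by hand so the block is self-contained, and `flat` is just an illustrative name.

```r
# Same four observations as in the answer, with the dates already parsed.
DF <- data.frame(
  Date = as.Date(c("2000-01-01", "2000-01-01", "2000-01-01", "2000-12-31")),
  DBT  = c(30, 31, 33, 25)
)
stat <- function(x) c(min = min(x), max = max(x), mean = mean(x))
ag <- aggregate(DBT ~ Date, DF, stat)

ncol(ag)   # 2: Date plus one matrix column called DBT

# Expand the matrix column into separate columns.
flat <- do.call(data.frame, ag)
names(flat)  # "Date" "DBT.min" "DBT.max" "DBT.mean"
```

After flattening, columns such as flat$DBT.max can be used directly.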

Using strptime(), trunc() and ddply() from the plyr package :
#Make the data
ZZ <- textConnection("Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25")
dataframe <- read.csv(ZZ,header=T)
close(ZZ)
# Do the calculations
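# strptime() and trunc() return POSIXlt; converting to POSIXct is safer for data frame columns
dataframe$Date <- as.POSIXct(strptime(dataframe$Date,format="%m/%d/%Y %H:%M"))
dataframe$day <- as.POSIXct(trunc(dataframe$Date,"day"))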
require(plyr)
ddply(dataframe,.(day),
summarize,
aveDBT=mean(DBT),
maxDBT=max(DBT),
minDBT=min(DBT)
)
gives
day aveDBT maxDBT minDBT
1 2000-01-01 31.33333 33 30
2 2000-12-31 25.00000 25 25
To clarify:
strptime converts the character strings to date-times according to the format. To see how you can specify the format, see ?strptime. trunc then truncates these date-times to the specified unit, which is day in this case.
ddply evaluates the function summarize within the data frame after splitting it up according to day. Everything after summarize is an argument passed on to summarize.
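To see what strptime and trunc do on their own, here is a minimal base R sketch on a single timestamp. The tz = "UTC" argument is my addition so that the result does not depend on the local time zone.

```r
# Parse one timestamp in the question's format.
x <- strptime("01/01/2000 02:00", format = "%m/%d/%Y %H:%M", tz = "UTC")
class(x)  # "POSIXlt" "POSIXt"

# Truncate to the start of the day.
day <- trunc(x, "days")
format(day, "%Y-%m-%d %H:%M")  # "2000-01-01 00:00"
```

All timestamps from the same calendar day truncate to the same value, which is what makes day usable as a grouping variable.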

There is also a nice package called hydroTSM. It uses zoo objects and can aggregate to other time steps as well.
The function for your case is subdaily2daily. You can choose whether the aggregation should be based on min, max, mean, and so on.

A couple of options:
1. Timetk
If you have a data frame (or tibble) then the summarize_by_time() function from timetk can be used:
library(tidyverse)
library(timetk)
# Collect Data
text <- "Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25"
df <- read_csv(text, col_types = cols(Date = col_datetime("%m/%d/%Y %H:%M")))
df
#> # A tibble: 4 x 2
#> Date DBT
#> <dttm> <dbl>
#> 1 2000-01-01 01:00:00 30
#> 2 2000-01-01 02:00:00 31
#> 3 2000-01-01 03:00:00 33
#> 4 2000-12-31 23:00:00 25
# Summarize
df %>%
summarise_by_time(
.date_var = Date,
.by = "day",
min = min(DBT),
max = max(DBT),
mean = mean(DBT)
)
#> # A tibble: 2 x 4
#> Date min max mean
#> <dttm> <dbl> <dbl> <dbl>
#> 1 2000-01-01 00:00:00 30 33 31.3
#> 2 2000-12-31 00:00:00 25 25 25
Created on 2021-05-21 by the reprex package (v2.0.0)
2. Tidyquant
You can use the tidyquant package for this. The process involves using the tq_transmute function to return a data frame that is modified using the xts aggregation function, apply.daily. We'll apply a custom stat_fun, which returns the min, max and mean. However, you can apply any vector function you'd like, such as quantile.
library(tidyquant)
df
#> # A tibble: 4 x 2
#> Date DBT
#> <dttm> <dbl>
#> 1 2000-01-01 01:00:00 30
#> 2 2000-01-01 02:00:00 31
#> 3 2000-01-01 03:00:00 33
#> 4 2000-12-31 23:00:00 25
stat_fun <- function(x) c(min = min(x), max = max(x), mean = mean(x))
df %>%
tq_transmute(select = DBT,
mutate_fun = apply.daily,
FUN = stat_fun)
#> # A tibble: 2 x 4
#> Date min max mean
#> <dttm> <dbl> <dbl> <dbl>
#> 1 2000-01-01 03:00:00 30 33 31.33333
#> 2 2000-12-31 23:00:00 25 25 25.00000

If your time column is POSIXct (or can be converted with as.POSIXct()), all you need is cut() and aggregate().
Try this:
split_hour = cut(as.POSIXct(temp$time), breaks = "60 mins") # bin the times into 60-minute intervals
temp$hour = split_hour # make the hourly variable
ag = aggregate(. ~ hour, temp, mean)
In this case, temp is like this
temp
1 0.6 0.6 0.0 0.350 0.382 0.000 2020-04-13 18:30:42
2 0.0 0.5 0.5 0.000 0.304 0.292 2020-04-13 19:56:02
3 0.0 0.2 0.2 0.000 0.107 0.113 2020-04-13 20:09:10
4 0.6 0.0 0.6 0.356 0.000 0.376 2020-04-13 20:11:57
5 0.0 0.3 0.2 0.000 0.156 0.148 2020-04-13 20:12:07
6 0.0 0.4 0.4 0.000 0.218 0.210 2020-04-13 22:02:49
7 0.2 0.2 0.0 0.112 0.113 0.000 2020-04-13 22:31:43
8 0.3 0.0 0.3 0.155 0.000 0.168 2020-04-14 03:19:03
9 0.4 0.0 0.4 0.219 0.000 0.258 2020-04-14 03:55:58
10 0.2 0.0 0.0 0.118 0.000 0.000 2020-04-14 04:25:25
11 0.3 0.3 0.0 0.153 0.160 0.000 2020-04-14 05:38:20
12 0.0 0.7 0.8 0.000 0.436 0.493 2020-04-14 05:40:02
13 0.0 0.0 0.2 0.000 0.000 0.101 2020-04-14 05:40:44
14 0.3 0.0 0.3 0.195 0.000 0.198 2020-04-14 06:09:26
15 0.2 0.2 0.0 0.130 0.128 0.000 2020-04-14 06:17:15
16 0.2 0.0 0.0 0.144 0.000 0.000 2020-04-14 06:19:36
17 0.3 0.0 0.4 0.177 0.000 0.220 2020-04-14 06:23:43
18 0.2 0.0 0.0 0.110 0.000 0.000 2020-04-14 06:25:19
19 0.0 0.0 0.0 1.199 1.035 0.251 2020-04-14 07:05:24
20 0.2 0.2 0.0 0.125 0.107 0.000 2020-04-14 07:21:46
ag is like this
ag
1 2020-04-13 18:30:00 0.60000000 0.6000000 0.0000000 0.3500000 0.38200000 0.00000000
2 2020-04-13 19:30:00 0.15000000 0.2500000 0.3750000 0.0890000 0.14175000 0.23225000
3 2020-04-13 21:30:00 0.00000000 0.4000000 0.4000000 0.0000000 0.21800000 0.21000000
4 2020-04-13 22:30:00 0.20000000 0.2000000 0.0000000 0.1120000 0.11300000 0.00000000
5 2020-04-14 02:30:00 0.30000000 0.0000000 0.3000000 0.1550000 0.00000000 0.16800000
6 2020-04-14 03:30:00 0.30000000 0.0000000 0.2000000 0.1685000 0.00000000 0.12900000
7 2020-04-14 05:30:00 0.18750000 0.1500000 0.2125000 0.1136250 0.09050000 0.12650000
8 2020-04-14 06:30:00 0.10000000 0.1000000 0.0000000 0.6620000 0.57100000 0.12550000
9 2020-04-14 07:30:00 0.00000000 0.3000000 0.2000000 0.0000000 0.16200000 0.11800000
10 2020-04-14 19:30:00 0.20000000 0.3000000 0.0000000 0.1460000 0.19000000 0.00000000
11 2020-04-14 20:30:00 0.06666667 0.2000000 0.2666667 0.0380000 0.11766667 0.17366667
12 2020-04-14 22:30:00 0.20000000 0.3000000 0.0000000 0.1353333 0.18533333 0.00000000
13 2020-04-14 23:30:00 0.00000000 0.5000000 0.5000000 0.0000000 0.28000000 0.32100000
14 2020-04-15 01:30:00 0.25000000 0.2000000 0.4500000 0.1355000 0.11450000 0.26100000
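Note that in the output above the bins start at 18:30:00 rather than on the hour. With breaks = "60 mins", cut() only rounds the first timestamp down to the minute, whereas breaks = "hour" rounds it down to the hour. A minimal sketch of the difference, using a made-up three-row `temp` with a single `value` column (not the data above):

```r
temp <- data.frame(
  time  = as.POSIXct(c("2020-04-13 18:30:42", "2020-04-13 19:10:00",
                       "2020-04-13 19:56:02"), tz = "UTC"),
  value = c(0.6, 0.2, 0.5)
)

# Bins anchored at the first timestamp's minute: [18:30, 19:30), [19:30, 20:30)
temp$by_60min <- cut(temp$time, breaks = "60 mins")
# Bins anchored on the hour: [18:00, 19:00), [19:00, 20:00)
temp$by_hour <- cut(temp$time, breaks = "hour")

ag_60min <- aggregate(value ~ by_60min, temp, mean)
ag_hour  <- aggregate(value ~ by_hour, temp, mean)

ag_60min$value  # 0.40 0.50
ag_hour$value   # 0.60 0.35
```

For daily aggregates, as asked in the question, breaks = "day" works the same way.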

How to increase time series granularity in R Dataframe? [duplicate]

I have a dataframe that contains hourly weather information. I would like to increase the granularity of the time measurements (5 minute intervals instead of 60 minute intervals) while copying the other columns data into the new rows created:
Current Dataframe Structure:
Date Temperature Humidity
2015-01-01 00:00:00 25 0.67
2015-01-01 01:00:00 26 0.69
Target Dataframe Structure:
Date Temperature Humidity
2015-01-01 00:00:00 25 0.67
2015-01-01 00:05:00 25 0.67
2015-01-01 00:10:00 25 0.67
.
.
.
2015-01-01 00:55:00 25 0.67
2015-01-01 01:00:00 26 0.69
2015-01-01 01:05:00 26 0.69
2015-01-01 01:10:00 26 0.69
.
.
.
What I've Tried:
for(i in 1:nrow(df)) {
five.minutes <- seq(df$date[i], length = 12, by = "5 mins")
for(j in 1:length(five.minutes)) {
df$date[i]<-rbind(five.minutes[j])
}
}
Error I'm getting:
Error in as.POSIXct.numeric(value) : 'origin' must be supplied

One possible solution uses fill from tidyr and left_join from dplyr.
The approach is to create a date/time series at 5-minute intervals running from the minimum time to just under one hour past the maximum time in the data frame. Left-joining the data frame onto that series gives all the desired rows, with NA for Temperature and Humidity in the new ones. fill then replaces each NA with the previous valid value.
library(dplyr)
library(tidyr)
# Data
df <- read.table(text = "Date Temperature Humidity
'2015-01-01 00:00:00' 25 0.67
'2015-01-01 01:00:00' 26 0.69
'2015-01-01 02:00:00' 28 0.69
'2015-01-01 03:00:00' 25 0.69", header = T, stringsAsFactors = F)
df$Date <- as.POSIXct(df$Date, format = "%Y-%m-%d %H:%M:%S")
# Create a data frame with all date/times at intervals of 5 mins
Dates <- data.frame(Date = seq(min(df$Date), max(df$Date)+3540, by = 5*60))
# Start from Dates so the rows stay in time order, which fill() relies on
result <- Dates %>%
left_join(df, by="Date") %>%
fill(Temperature, Humidity)
result
# Date Temperature Humidity
#1 2015-01-01 00:00:00 25 0.67
#2 2015-01-01 00:05:00 25 0.67
#3 2015-01-01 00:10:00 25 0.67
#4 2015-01-01 00:15:00 25 0.67
#5 2015-01-01 00:20:00 25 0.67
#6 2015-01-01 00:25:00 25 0.67
#7 2015-01-01 00:30:00 25 0.67
#8 2015-01-01 00:35:00 25 0.67
#9 2015-01-01 00:40:00 25 0.67
#10 2015-01-01 00:45:00 25 0.67
#11 2015-01-01 00:50:00 25 0.67
#12 2015-01-01 00:55:00 25 0.67
#13 2015-01-01 01:00:00 26 0.69
#14 2015-01-01 01:05:00 26 0.69
#.....
#.....
#44 2015-01-01 03:35:00 25 0.69
#45 2015-01-01 03:40:00 25 0.69
#46 2015-01-01 03:45:00 25 0.69
#47 2015-01-01 03:50:00 25 0.69
#48 2015-01-01 03:55:00 25 0.69
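If the input is known to be exactly hourly, the same result can be sketched in base R without a join, by repeating each row 12 times and adding 5-minute offsets. This is a different technique from the join-and-fill approach above, and it assumes there are no gaps in the hourly data; `steps` and `out` are illustrative names.

```r
df <- data.frame(
  Date = as.POSIXct(c("2015-01-01 00:00:00", "2015-01-01 01:00:00"), tz = "UTC"),
  Temperature = c(25, 26),
  Humidity = c(0.67, 0.69)
)

# Offsets within one hour, in seconds: 0, 5, ..., 55 minutes.
steps <- seq(0, 55, by = 5) * 60

# Repeat every row once per offset, then shift the timestamps.
out <- df[rep(seq_len(nrow(df)), each = length(steps)), ]
out$Date <- out$Date + rep(steps, times = nrow(df))
rownames(out) <- NULL

nrow(out)  # 24
```

Each original row is copied unchanged, so Temperature and Humidity never need filling.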

I think this might do (using the tibble and lubridate packages):
library(tibble)
library(lubridate)
df=tibble(DateTime=c("2015-01-01 00:00:00","2015-01-01 01:00:00"),Temperature=c(25,26),Humidity=c(.67,.69))
df$DateTime<-ymd_hms(df$DateTime)
DateTime=as.POSIXct((sapply(1:(nrow(df)-1),function(x) seq(from=df$DateTime[x],to=df$DateTime[x+1],by="5 min"))),
origin="1970-01-01", tz="UTC")
Temperature=c(sapply(1:(nrow(df)-1),function(x) rep(df$Temperature[x],12)),df$Temperature[nrow(df)])
Humidity=c(sapply(1:(nrow(df)-1),function(x) rep(df$Humidity[x],12)),df$Humidity[nrow(df)])
tibble(as.character(DateTime),Temperature,Humidity)
<chr> <dbl> <dbl>
1 2015-01-01 00:00:00 25.0 0.670
2 2015-01-01 00:05:00 25.0 0.670
3 2015-01-01 00:10:00 25.0 0.670
4 2015-01-01 00:15:00 25.0 0.670
5 2015-01-01 00:20:00 25.0 0.670
6 2015-01-01 00:25:00 25.0 0.670
7 2015-01-01 00:30:00 25.0 0.670
8 2015-01-01 00:35:00 25.0 0.670
9 2015-01-01 00:40:00 25.0 0.670
10 2015-01-01 00:45:00 25.0 0.670
11 2015-01-01 00:50:00 25.0 0.670
12 2015-01-01 00:55:00 25.0 0.670
13 2015-01-01 01:00:00 26.0 0.690

Efficient dynamic addition of rows in dataframe and dynamic calculation in R

I have the following dataframe (ts1):
D1 Value N
1 20/11/2014 16:00 0.00
2 20/11/2014 17:00 0.01 1
3 20/11/2014 19:00 0.05 2
4 20/11/2014 22:00 0.20 3
5 20/11/2014 23:00 0.03 4
I would like to insert N-1 new rows wherever N > 1, so that the new ts1 will be:
D1 Value N
1 20/11/2014 16:00 0.00 1
2 20/11/2014 17:00 0.01 1
3 20/11/2014 18:00 0.03 1 <---
4 20/11/2014 19:00 0.05 1
5 20/11/2014 20:00 0.10 1 <---
6 20/11/2014 21:00 0.15 1 <---
7 20/11/2014 22:00 0.20 1
8 20/11/2014 23:00 0.03 1
As can be seen, lines 3, 5 and 6 were added because of the gaps in time (N > 1). The new entries in ts1$Value are filled in by taking the difference between the values on either side of the gap and spreading it evenly over the inserted rows. I would like to add the values as efficiently as possible, with a minimum number of passes over the data frame.

Here is a complete solution.
The last command, na.approx, performs the linear interpolation that fills in the new rows:
> Lines <- "D1,Value
+ 1,20/11/2014 16:00,0.00
+ 2,20/11/2014 17:00,0.01
+ 3,20/11/2014 19:00,0.05
+ 4,20/11/2014 22:00,0.20
+ 5,20/11/2014 23:00,0.03"
> ts1 <- read.csv(text = Lines, as.is = TRUE)
> library(zoo)
> z <- read.zoo(ts1, tz = "", format = "%d/%m/%Y %H:%M")
>
> z0 <- zoo(, seq(start(z), end(z), "hours"))
> zz <- merge(z, z0)
> interpolated <- na.approx(zz)
> interpolated
2014-11-20 16:00:00 2014-11-20 17:00:00 2014-11-20 18:00:00 2014-11-20 19:00:00 2014-11-20 20:00:00 2014-11-20 21:00:00
0.00 0.01 0.03 0.05 0.10 0.15
2014-11-20 22:00:00 2014-11-20 23:00:00
0.20 0.03
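For comparison, the same linear interpolation can be sketched in base R with approx(), which makes the arithmetic explicit. This is not the zoo approach used above, only an equivalent illustration; `t_obs`, `v_obs`, `grid` and `filled` are names I made up, and tz = "UTC" is an assumption to keep the example independent of the local time zone.

```r
# Observed timestamps and values from the question.
t_obs <- as.POSIXct(c("2014-11-20 16:00", "2014-11-20 17:00", "2014-11-20 19:00",
                      "2014-11-20 22:00", "2014-11-20 23:00"), tz = "UTC")
v_obs <- c(0.00, 0.01, 0.05, 0.20, 0.03)

# Complete hourly grid from first to last observation.
grid <- seq(min(t_obs), max(t_obs), by = "hour")

# Interpolate linearly on the numeric time axis.
filled <- approx(x = as.numeric(t_obs), y = v_obs, xout = as.numeric(grid))$y
filled  # 0.00 0.01 0.03 0.05 0.10 0.15 0.20 0.03
```

The gap between 19:00 (0.05) and 22:00 (0.20) is split into three equal steps of 0.05, giving 0.10 and 0.15 for the inserted hours.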

Adding missing dates to dataframe

I have a data frame which looks like this:
times values
1 2013-07-06 20:00:00 0.02
2 2013-07-07 20:00:00 0.03
3 2013-07-09 20:00:00 0.13
4 2013-07-10 20:00:00 0.12
5 2013-07-11 20:00:00 0.03
6 2013-07-14 20:00:00 0.06
7 2013-07-15 20:00:00 0.08
8 2013-07-16 20:00:00 0.07
9 2013-07-17 20:00:00 0.08
There are a few dates missing from the data, and I would like to insert them and to carry over the value from the previous day into these new rows, i.e. obtain this:
times values
1 2013-07-06 20:00:00 0.02
2 2013-07-07 20:00:00 0.03
3 2013-07-08 20:00:00 0.03
4 2013-07-09 20:00:00 0.13
5 2013-07-10 20:00:00 0.12
6 2013-07-11 20:00:00 0.03
7 2013-07-12 20:00:00 0.03
8 2013-07-13 20:00:00 0.03
9 2013-07-14 20:00:00 0.06
10 2013-07-15 20:00:00 0.08
11 2013-07-16 20:00:00 0.07
12 2013-07-17 20:00:00 0.08
...
I have been trying to use a vector of all the dates:
dates <- as.Date(1:length(df),origin = df$times[1])
I am stuck, and can't find a way to do it without a horrible for loop in which I'm getting lost...
Thank you for your help

Some test data (I am using Date, yours seems to be a different type, but this does not affect the algorithm):
data = data.frame(dates = as.Date(c("2011-12-15", "2011-12-17", "2011-12-19")),
values = as.double(1:3))
# Generate **all** timestamps at which you want to have your result.
# I use `seq`, but you may use any other method of generating those timestamps.
alldates = seq(min(data$dates), max(data$dates), 1)
# Filter out timestamps that are already present in your `data.frame`:
# Construct a `data.frame` to append with missing values:
dates0 = alldates[!(alldates %in% data$dates)]
data0 = data.frame(dates = dates0, values = NA_real_)
# Append this `data.frame` and resort in time:
data = rbind(data, data0)
data = data[order(data$dates),]
# Forward fill the values.
# (I would recommend moving this code into a separate `ffill` function;
# it has proved very useful in general.)
current = NA_real_
data$values = sapply(data$values, function(x) {
current <<- ifelse(is.na(x), current, x); current })
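A standalone ffill function of the kind recommended above could look like the following. This is only one possible sketch, written in vectorized form so that it avoids the global `current` variable; it is not the answer's own implementation.

```r
# Forward fill: replace each NA with the most recent non-NA value.
ffill <- function(x) {
  idx <- cumsum(!is.na(x))        # how many non-NA values seen so far
  c(NA, x[!is.na(x)])[idx + 1]    # idx 0 maps to NA, so leading NAs stay NA
}

ffill(c(NA, 1, NA, NA, 2, NA))  # NA 1 1 1 2 2
```

With this in place, the last step of the answer becomes data$values = ffill(data$values).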

library(zoo)
g <- data.frame(dates=seq(min(data$dates),max(data$dates),1))
na.locf(merge(g,data,by="dates",all.x=TRUE))
or entirely with zoo:
z <- read.zoo(data)
gz <- zoo(, seq(min(time(z)), max(time(z)), "day")) # time grid in zoo
na.locf(merge(z, gz))

Using tidyr's complete and fill, assuming the times column is already of class POSIXct:
library(tidyr)
df %>%
complete(times = seq(min(times), max(times), by = 'day')) %>%
fill(values)
# A tibble: 12 x 2
# times values
# <dttm> <dbl>
# 1 2013-07-06 20:00:00 0.02
# 2 2013-07-07 20:00:00 0.03
# 3 2013-07-08 20:00:00 0.03
# 4 2013-07-09 20:00:00 0.13
# 5 2013-07-10 20:00:00 0.12
# 6 2013-07-11 20:00:00 0.03
# 7 2013-07-12 20:00:00 0.03
# 8 2013-07-13 20:00:00 0.03
# 9 2013-07-14 20:00:00 0.06
#10 2013-07-15 20:00:00 0.08
#11 2013-07-16 20:00:00 0.07
#12 2013-07-17 20:00:00 0.08
data
df <- structure(list(times = structure(c(1373140800, 1373227200, 1373400000,
1373486400, 1373572800, 1373832000, 1373918400, 1374004800, 1374091200
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), values = c(0.02,
0.03, 0.13, 0.12, 0.03, 0.06, 0.08, 0.07, 0.08)), row.names = c(NA,
-9L), class = "data.frame")

df2 <- data.frame(times=seq(min(df$times), max(df$times), by="day"))
df3 <- merge(x=df2, y=df, by="times", all.x=T)
idx <- which(is.na(df3$values))
for (id in idx)
df3$values[id] <- df3$values[id-1]
df3
# times values
# 1 2013-07-06 20:00:00 0.02
# 2 2013-07-07 20:00:00 0.03
# 3 2013-07-08 20:00:00 0.03
# 4 2013-07-09 20:00:00 0.13
# 5 2013-07-10 20:00:00 0.12
# 6 2013-07-11 20:00:00 0.03
# 7 2013-07-12 20:00:00 0.03
# 8 2013-07-13 20:00:00 0.03
# 9 2013-07-14 20:00:00 0.06
# 10 2013-07-15 20:00:00 0.08
# 11 2013-07-16 20:00:00 0.07
# 12 2013-07-17 20:00:00 0.08

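You can try this, using a rolling join from the data.table package (NADayWiseOrders is a data.table with a date column):
library(data.table)
setkey(NADayWiseOrders, date)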
all_dates <- seq(from = as.Date("2013-01-01"),
to = as.Date("2013-01-07"),
by = "days")
NADayWiseOrders[J(all_dates), roll=Inf]
date orders amount guests
1: 2013-01-01 50 2272.55 149
2: 2013-01-02 3 64.04 4
3: 2013-01-03 3 64.04 4
4: 2013-01-04 1 18.81 0
5: 2013-01-05 2 77.62 0
6: 2013-01-06 2 77.62 0
7: 2013-01-07 2 35.82 2
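Since NADayWiseOrders is not defined in the answer, here is a self-contained sketch of the same rolling join on a made-up table. The name `orders_dt` and its values are hypothetical; only the technique (a keyed join with roll = Inf, which carries the last observation forward) is taken from the answer.

```r
library(data.table)

# Hypothetical table with 2013-01-03 missing.
orders_dt <- data.table(
  date   = as.Date(c("2013-01-01", "2013-01-02", "2013-01-04")),
  orders = c(50, 3, 1)
)
setkey(orders_dt, date)

all_dates <- seq(as.Date("2013-01-01"), as.Date("2013-01-04"), by = "day")

# For each date in all_dates, take the matching row or roll the previous one forward.
filled <- orders_dt[J(all_dates), roll = Inf]
filled$orders  # 50 3 3 1
```

The missing 2013-01-03 picks up the values from 2013-01-02, which matches rows 2 and 3 in the output above.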
