I have this data.frame:
Time a b c d
1 2015-01-01 00:00:00 863 1051 1899 25385
2 2015-01-01 01:00:00 920 1009 1658 24382
3 2015-01-01 02:00:00 1164 973 1371 22734
4 2015-01-01 03:00:00 1503 949 779 21286
5 2015-01-01 04:00:00 1826 953 720 20264
6 2015-01-01 05:00:00 2109 952 743 19905
...
Time a b c d
8756 2015-12-31 19:00:00 0 775 4957 28812
8757 2015-12-31 20:00:00 0 783 5615 29568
8758 2015-12-31 21:00:00 0 790 4838 28653
8759 2015-12-31 22:00:00 0 766 3841 27078
8760 2015-12-31 23:00:00 72 729 2179 24565
8761 2016-01-01 00:00:00 290 710 1612 23311
It represents every hour of every day for a year. I would like to extract one row per day: the row where d reaches its daily maximum. So at the end I want to obtain a 365x5 data.frame.
I have tried all the suggestions from Extract the maximum value within each group in a dataframe and also Daily minimum values in R, but it still doesn't work.
Maybe it comes from the way I generate my time series?
library(lubridate)
start <- dmy_hms("1 Jan 2015 00:00:00")
end <- dmy_hms("01 Jan 2016 00:00:00")
time <- as.data.frame(seq(start, end, by="hours"))
Thanks for the help!
If we are aggregating by day, convert the 'Time' column to Date class (stripping off the time attributes), group by that, and get the max of 'd'. In the OP's post, the data.table syntax involves both mydf and df. Assuming these are the same object, we need
library(data.table)
setDT(mydf)[, .(d = max(d)), by = .(Day = as.Date(Time))]
Or using aggregate from base R
aggregate(d ~ Day, transform(mydf, Day = as.Date(Time)), FUN = max)
Or with tidyverse
library(tidyverse)
mydf %>%
group_by(Day = as.Date(Time)) %>%
summarise(d = max(d))
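If the goal is the full 365x5 data.frame (the entire row at each daily maximum of 'd') rather than just the maximum itself, a minimal sketch, assuming 'd' is already numeric (see the NOTE below):
library(dplyr)
mydf %>%
  group_by(Day = as.Date(Time)) %>%
  slice(which.max(d)) %>%   # keep the whole row where d peaks that day
  ungroup()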
NOTE: Based on the OP's comments, columns 'a' to 'd' are of factor class. We need to convert them to numeric, either at the beginning or during the processing stage
mydf$d <- as.numeric(as.character(mydf$d))
For multiple columns
mydf[c('a', 'b', 'c', 'd')] <- lapply(mydf[c('a', 'b', 'c', 'd')], function(x)
    as.numeric(as.character(x)))
data
mydf <- structure(list(Time = c("2015-01-01 00:00:00", "2015-01-01 01:00:00",
"2015-01-01 02:00:00", "2015-01-01 03:00:00", "2015-01-01 04:00:00",
"2015-01-01 05:00:00"), a = c(863L, 920L, 1164L, 1503L, 1826L,
2109L), b = c(1051L, 1009L, 973L, 949L, 953L, 952L), c = c(1899L,
1658L, 1371L, 779L, 720L, 743L), d = c(25385L, 24382L, 22734L,
21286L, 20264L, 19905L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
'max' doesn't work with factors. Hence convert the column for which you are finding the maximum (in your case, column d) to numeric, going through as.character first so you get the values rather than the factor level codes
Assuming your data set is in a data frame
mydf$d = as.numeric(as.character(mydf$d))
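A quick illustration of why the as.character step matters; as.numeric on a factor returns the internal level codes, not the displayed values:
f <- factor(c("25385", "24382"))
as.numeric(f)                 # 2 1  -- level codes
as.numeric(as.character(f))   # 25385 24382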
Thanks for your help! Finally I chose
do.call(rbind, lapply(split(test, test$time), function(x) x[which.max(x$d), ]))
which allows me to have a 365x5 data.frame. All your suggestions were right. I just needed to change my time series like this:
time <- as.data.frame(rep(1:365, each = 24))
names(time) <- "time"
test <- cbind.data.frame(time, df)
I would like to do calculations with lab values from various (asynchronous) dates (120+). However, lubridate does not allow both ymd and ymd_hms in the same column. Hence, I wrote a working grep which looks up the dates without a time and appends 00:00:00 to them, so e.g. 01-12-1998 becomes 01-12-1998 00:00:00 (based on this question: lubridate converting midnight timestamp returns NA: how to fill missing timestamp)
Now I want to make a for loop which automatically recognizes the eligible columns (they might change in the future) and performs the time-addition function.
I couldn't find the right documentation to tie all the functions below together. Would love to know where to find more info on this!
Data frame: Testset
ID Lab_date1 Lab_date2 Lab_date3 Lab_date4
76 18/1/1982 26/01/1990 20/06/1990 15/11/1990
183 18/10/1982 24/04/1989 27/04/1989 02/04/1991
27 1/11/1983 18/10/1982 01:01 13/04/1983 31/10/1984
84 12-1-1983 12-1-1983 00:00 21-4-1983 15:10 22-3-1984 00:00
28 13-10-1989 13-1-1989 12:00 13-11-1991 14:11 19-11-1991 00:00
120 1-10-1982 14-7-1982 00:00 26-8-1986 00:00 26-8-1986 00:00
The function for altering the dates, currently programmed for Lab_date1, is
Testset$Lab_date1[grep("[0-9]{1,2}.[0-9]{1,2}.[0-9]{4}$",Testset$Lab_date1)] <- paste(
Testset$Lab_date1[grep("[0-9]{1,2}.[0-9]{1,2}.[0-9]{4}$",Testset$Lab_date1)],"00:00:00")
Also, I wrote a grep (pattern) which returns the column numbers of the lab dates, i.e. 2:5. Can this result be fed into a for loop with the function above?
dat_lab <- grep(pattern="Lab_date",
x=colnames(Testset))
I already tried this, but it didn't work (dat_lab is an unnamed integer vector, so names(dat_lab) is NULL and the loop body never runs; y also copies an index rather than a column, and nothing is assigned back to Testset):
for(i in names(dat_lab)){
y <- dat_lab[i]
y[grep("[0-9]{1,2}.[0-9]{1,2}.[0-9]{4}$",y)] <- paste(
y[grep("[0-9]{1,2}.[0-9]{1,2}.[0-9]{4}$",y)],"00:00:00")
}
You can use parse_date_time from lubridate to parse datetimes in different formats.
library(dplyr)
Testset %>%
mutate(across(starts_with('Lab_date'),
lubridate::parse_date_time, c('dmY', 'dmY HM'))) -> Testset
Testset
# ID Lab_date1 Lab_date2 Lab_date3 Lab_date4
#1 76 1982-01-18 1990-01-26 00:00:00 1990-06-20 00:00:00 1990-11-15
#2 183 1982-10-18 1989-04-24 00:00:00 1989-04-27 00:00:00 1991-04-02
#3 27 1983-11-01 1982-10-18 01:01:00 1983-04-13 00:00:00 1984-10-31
#4 84 1983-01-12 1983-01-12 00:00:00 1983-04-21 15:10:00 1984-03-22
#5 28 1989-10-13 1989-01-13 12:00:00 1991-11-13 14:11:00 1991-11-19
#6 120 1982-10-01 1982-07-14 00:00:00 1986-08-26 00:00:00 1986-08-26
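Note: passing extra arguments to across() this way is deprecated in recent dplyr versions; an equivalent lambda form (a sketch) would be
Testset %>%
  mutate(across(starts_with('Lab_date'),
                ~ lubridate::parse_date_time(.x, c('dmY', 'dmY HM')))) -> Testset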
data
Testset <- structure(list(ID = c(76L, 183L, 27L, 84L, 28L, 120L), Lab_date1 = c("18/1/1982",
"18/10/1982", "1/11/1983", "12-1-1983", "13-10-1989", "1-10-1982"
), Lab_date2 = c("26/01/1990", "24/04/1989", "18/10/1982 01:01",
"12-1-1983 00:00", "13-1-1989 12:00", "14-7-1982 00:00"), Lab_date3 = c("20/06/1990",
"27/04/1989", "13/04/1983", "21-4-1983 15:10", "13-11-1991 14:11",
"26-8-1986 00:00"), Lab_date4 = c("15/11/1990", "02/04/1991",
"31/10/1984", "22-3-1984 00:00", "19-11-1991 00:00", "26-8-1986 00:00"
)), class = "data.frame", row.names = c(NA, -6L))
We can use anytime. Note in the output below that ambiguous day-month strings (e.g. 1/11/1983 in row 3 and 1-10-1982 in row 6) end up parsed month-first.
library(anytime)
library(dplyr)
addFormats('%d/%m/%Y')
Testset %>%
mutate(across(starts_with('Lab_date'), anytime))
# ID Lab_date1 Lab_date2 Lab_date3 Lab_date4
#1 76 1982-01-18 1990-01-26 1990-06-20 1990-11-15
#2 183 1982-10-18 1989-04-24 1989-04-27 1991-04-02
#3 27 1983-01-11 1982-10-18 1983-04-13 1984-10-31
#4 84 1983-01-12 1983-01-12 1983-04-21 1984-03-22
#5 28 1989-10-13 1989-01-13 1991-11-13 1991-11-19
#6 120 1982-01-10 1982-07-14 1986-08-26 1986-08-26
I'm trying to summarize values for overlapping time periods.
I can use only tidyr, ggplot2 and dplyr libraries. Base R is preferred though.
My data looks like this, but usually it has around 100 records:
df <- structure(list(Start = structure(c(1546531200, 1546531200, 546531200, 1546638252.6316, 1546549800, 1546534800, 1546545600, 1546531200, 1546633120, 1547065942.1053), class = c("POSIXct", "POSIXt"), tzone = "UTC"), Stop = structure(c(1546770243.1579, 1546607400, 1547110800, 1546670652.6316, 1547122863.1579, 1546638252.6316, 1546878293.5579, 1546416000, 1546849694.4, 1547186400), class = c("POSIXct", "POSIXt"), tzone = "UTC"), Value = c(12610, 520, 1500, 90, 331380, 27300, 6072, 4200, 61488, 64372)), .Names = c("Start", "Stop", "Value"), row.names = c(41L, 55L, 25L, 29L, 38L, 28L, 1L, 20L, 14L, 31L), class = c("tbl_df", "tbl", "data.frame"))
head(df) and str(df) give:
Start Stop Value
2019-01-03 16:00:00 2019-01-06 10:24:03 12610
2019-01-03 16:00:00 2019-01-04 13:10:00 520
2019-01-03 16:00:00 2019-01-10 09:00:00 1500
2019-01-04 21:44:12 2019-01-05 06:44:12 90
2019-01-03 21:10:00 2019-01-10 12:21:03 331380
2019-01-03 17:00:00 2019-01-04 21:44:12 27300
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 10 obs. of 3 variables:
$ Start: POSIXct, format: "2019-01-03 16:00:00" "2019-01-03 16:00:00" ...
$ Stop : POSIXct, format: "2019-01-06 10:24:03" "2019-01-04 13:10:00" ...
$ Value: num 12610 520 1500 90 331380 ...
So there are overlapping time periods with "Start" and "Stop" dates and an assigned value. Within each record the value applies between df$Start and df$Stop; outside that window it counts as 0.
I want to create another data frame based on which I could show how these values sum up and change over time. The desired output would look like this (the "sum" column is made up):
> head(df2)
timestamp sum
"2019-01-02 09:00:00 CET" 14352
"2019-01-03 17:00:00 CET" 6253
"2019-01-03 18:00:00 CET" 23465
"2019-01-03 21:00:00 CET" 3241
"2019-01-03 22:10:00 CET" 23235
"2019-01-04 14:10:00 CET" 123321
To get unique timestamps:
timestamps <- sort(unique(c(df$Start, df$Stop)))
With the df2 data frame I could easily draw a graph with ggplot, but how do I get these sums?
I think I should iterate over the df data frame with either some custom function or any built-in summarize function which would work like this:
fnct <- function(date, min, max, value) {
if (date >= min && date <=max) {
a <- value
}
else {
a <- 0
}
return(a)
}
...for every given date from timestamps iterate through df and give me a sum of values for the timestamp.
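Put together, a minimal base R sketch of that idea would be:
timestamps <- sort(unique(c(df$Start, df$Stop)))
sums <- sapply(timestamps, function(ts) sum(df$Value[df$Start <= ts & ts <= df$Stop]))
df2 <- data.frame(timestamp = timestamps, sum = sums)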
It looks really simple and I'm missing something very basic.
Here's a tidyverse solution similar to my response to this recent question. I use gather() to bring the timestamps (Starts and Stops) into one column, with another column specifying which is which. The Starts add the value and the Stops subtract it, and then we just take the cumulative sum to get values at all the instants when the sum changes.
For 100 records, there won't be any perceivable speed improvement from using data.table; in my experience it starts to make more of a difference around 1M records, especially when grouping is involved.
library(dplyr); library(tidyr)
df2 <- df %>%
gather(type, time, Start:Stop) %>%
mutate(chg = if_else(type == "Start", Value, -Value)) %>%
arrange(time) %>%
mutate(sum = cumsum(chg)) # EDIT: corrected per OP comment
> head(df2)
## A tibble: 6 x 5
# Value type time chg sum
# <dbl> <chr> <dttm> <dbl> <dbl>
#1 1500 Start 1987-04-27 14:13:20 1500 1500
#2 4200 Stop 2019-01-02 08:00:00 -4200 -2700
#3 12610 Start 2019-01-03 16:00:00 12610 9910
#4 520 Start 2019-01-03 16:00:00 520 10430
#5 4200 Start 2019-01-03 16:00:00 4200 14630
#6 27300 Start 2019-01-03 17:00:00 27300 41930
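To match the desired two-column shape from the question, one could then keep just the timestamp and the running sum (a sketch):
df2 %>% select(timestamp = time, sum)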
In the past I have tried to solve similar problems using the tidyverse/base R... But nothing comes even remotely close to the speeds that data.table provides for these kinds of operations, so I encourage you to give it a try...
For questions like this, my favourite function is foverlaps() from the data.table package. With this function you can (fast!) perform an overlap join. If you want more flexibility in your joining than foverlaps() provides, a non-equi join (again using data.table) is probably the best (and fastest!) option. But foverlaps() will do here (I guess).
I used the sample data you provided, but filtered out rows where Stop <= Start (probably a typo in your sample data). When df$Start is not before df$Stop, foverlaps gives a warning and won't execute.
library( data.table )
#create data.table with the periods you wish to summarise on
#NB: UTC is used as timezone, since this is also the case in the sample data provided!!
dt.dates <- data.table( id = paste0( "Day", 1:31 ),
Start = seq( as.POSIXct( "2019-01-01 00:00:00", format = "%Y-%m-%d %H:%M:%S", tz = "UTC" ),
as.POSIXct( "2019-01-31 00:00:00", format = "%Y-%m-%d %H:%M:%S", tz = "UTC" ),
by = "1 days"),
Stop = seq( as.POSIXct( "2019-01-02 00:00:00", format = "%Y-%m-%d %H:%M:%S", tz = "UTC" ) - 1,
as.POSIXct( "2019-02-01 00:00:00", format = "%Y-%m-%d %H:%M:%S", tz = "UTC" ) - 1,
by = "1 days") )
If you do not want to summarise on a daily basis but by hour, minute, second, or year, just change the values (and step size) in the dt.dates data.table so that they match your periods.
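For example, a sketch of an hourly grid built the same way (dt.hours and hrs are hypothetical names, same UTC timezone):
hrs <- seq( as.POSIXct( "2019-01-01 00:00:00", tz = "UTC" ),
            as.POSIXct( "2019-01-31 23:00:00", tz = "UTC" ),
            by = "1 hours" )
dt.hours <- data.table( id = paste0( "Hour", seq_along( hrs ) ),
                        Start = hrs,
                        Stop = hrs + 3600 - 1 )   # each period ends 1 second before the next starts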
#set df as data.table
dt <- as.data.table( df )
#filter out any row where Stop is smaller than Start
dt <- dt[ Start < Stop, ]
#perform overlap join
#first set keys
setkey(dt, Start, Stop)
#then perform join
result <- foverlaps( dt.dates, dt, type = "within" )
#summarise
result[, .( Value = sum( Value , na.rm = TRUE ) ), by = .(Day = i.Start) ]
output
# Day Value
# 1: 2019-01-01 1500
# 2: 2019-01-02 1500
# 3: 2019-01-03 1500
# 4: 2019-01-04 351562
# 5: 2019-01-05 413050
# 6: 2019-01-06 400440
# 7: 2019-01-07 332880
# 8: 2019-01-08 332880
# 9: 2019-01-09 332880
# 10: 2019-01-10 64372
# 11: 2019-01-11 0
# 12: 2019-01-12 0
# 13: 2019-01-13 0
# 14: 2019-01-14 0
# 15: 2019-01-15 0
# 16: 2019-01-16 0
# 17: 2019-01-17 0
# 18: 2019-01-18 0
# 19: 2019-01-19 0
# 20: 2019-01-20 0
# 21: 2019-01-21 0
# 22: 2019-01-22 0
# 23: 2019-01-23 0
# 24: 2019-01-24 0
# 25: 2019-01-25 0
# 26: 2019-01-26 0
# 27: 2019-01-27 0
# 28: 2019-01-28 0
# 29: 2019-01-29 0
# 30: 2019-01-30 0
# 31: 2019-01-31 0
# Day Value
plot
#summarise for plot
result.plot <- result[, .( Value = sum( Value , na.rm = TRUE ) ), by = .(Day = i.Start) ]
library( ggplot2 )
ggplot( data = result.plot, aes( x = Day, y = Value ) ) + geom_col()
First of all, I have a large data.table with a single column, date, but str(date) shows it is chr.
date
2015-07-01 0:15:00
2015-07-01 0:30:00
2015-07-01 0:45:00
2015-07-01 0:60:00
2015-07-01 1:15:00
2015-07-01 1:30:00
2015-07-01 1:45:00
2015-07-01 1:60:00
What I want to do is:
1. put them in a standard format, like 2015-07-01 00:15:00
2. correct the time, for example 2015-07-01 1:60:00 -> 2015-07-01 02:00:00
For the first one, I tried to use as.POSIXct() to reset the format, which should be correct, but the problem is that data like 2015-07-01 1:60:00 just become NA after the transformation.
Does anybody have ideas?
Here is some code to generate test data:
library(data.table)
dd <- data.table(date = c("2015-07-01 0:15:00", "2015-07-01 0:30:00",
    "2015-07-01 0:45:00", "2015-07-01 0:60:00", "2015-07-01 1:15:00",
    "2015-07-01 1:30:00", "2015-07-01 1:45:00", "2015-07-01 1:60:00", "2015-07-01 2:15:00"))
Note: this table is just for one day and the last value of the table is
2015-07-01 23:60:00
For any unclear points, feel free to let me know. Thanks!
In base R you could try this:
df1$date <- gsub(":60:",":59:",df1$date, fixed = TRUE)
df1$date <- as.POSIXct(df1$date)
the59s <- grepl(":59:",df1$date)
df1$date[the59s] <- df1$date[the59s] + 60
#> df1
# date
#1 2015-07-01 00:15:00
#2 2015-07-01 00:30:00
#3 2015-07-01 00:45:00
#4 2015-07-01 01:00:00
#5 2015-07-01 01:15:00
#6 2015-07-01 01:30:00
#7 2015-07-01 01:45:00
#8 2015-07-01 02:00:00
#9 2015-07-01 02:15:00
The idea is to let POSIXct perform the conversion to the next hour / day / month / ... triggered by a "60 minutes" value. For this we first identify those entries containing :60: and replace that part with :59:. Then the column is converted into a POSIXct object. Afterwards we find all those entries containing a ":59:" and add 60 (seconds), thereby converting the time/date to the intended format.
In the case described by the OP the data contains only quarter-hour values 0, 15, 30, 45, 60. A more general situation may include genuine 59-minute values that should not be converted to the next hour. It would then be better to store the relevant row indices before performing the conversion:
the60s <- grepl(":60:", df1$date)
df1$date <- gsub(":60:",":59:",df1$date, fixed = TRUE)
df1$date <- as.POSIXct(df1$date)
df1$date[the60s] <- df1$date[the60s] + 60
data:
df1 <- structure(list(date = structure(1:9, .Label = c("2015-07-01 0:15:00",
"2015-07-01 0:30:00", "2015-07-01 0:45:00", "2015-07-01 0:60:00",
"2015-07-01 1:15:00", "2015-07-01 1:30:00", "2015-07-01 1:45:00",
"2015-07-01 1:60:00", "2015-07-01 2:15:00"), class = "factor")),
.Names = "date", row.names = c(NA, -9L), class = "data.frame")
I have a table in R like:
start duration
02/01/2012 20:00:00 5
05/01/2012 07:00:00 6
etc... etc...
I got to this by importing a table from Microsoft Excel that looked like this:
date time duration
2012/02/01 20:00:00 5
etc...
I then merged the date and time columns by running the following code:
d.f <- within(d.f, { start=format(as.POSIXct(paste(date, time)), "%m/%d/%Y %H:%M:%S") })
I want to create a third column called 'end', which will be calculated as the number of hours after the start time. I am pretty sure that my time is a POSIXct vector. I have seen how to manipulate one datetime object, but how can I do that for the entire column?
The expected result should look like:
start duration end
02/01/2012 20:00:00 5 02/02/2012 01:00:00
05/01/2012 07:00:00 6 05/01/2012 13:00:00
etc... etc... etc...
Using lubridate
> library(lubridate)
> df$start <- mdy_hms(df$start)
> df$end <- df$start + hours(df$duration)
> df
# start duration end
#1 2012-02-01 20:00:00 5 2012-02-02 01:00:00
#2 2012-05-01 07:00:00 6 2012-05-01 13:00:00
data
df <- structure(list(start = c("02/01/2012 20:00:00", "05/01/2012 07:00:00"
), duration = 5:6), .Names = c("start", "duration"), class = "data.frame", row.names = c(NA,
-2L))
You can simply add duration*3600 (seconds per hour) to the start column of the data frame. E.g. with one date:
start = as.POSIXct("02/01/2012 20:00:00",format="%m/%d/%Y %H:%M:%S")
start
[1] "2012-02-01 20:00:00 CST"
start + 5*3600
[1] "2012-02-02 01:00:00 CST"
I have read in and formatted my data set as shown below.
library(xts)
#Read data from file
x <- read.csv("data.dat", header=F)
x[is.na(x)] <- c(0) #If empty fill in zero
#Construct data frames
rawdata.h <- data.frame(x[,2],x[,3],x[,4],x[,5],x[,6],x[,7],x[,8]) #Hourly data
rawdata.15min <- data.frame(x[,10]) #15 min data
#Convert time index to proper format
index.h <- as.POSIXct(strptime(x[,1], "%d.%m.%Y %H:%M"))
index.15min <- as.POSIXct(strptime(x[,9], "%d.%m.%Y %H:%M"))
#Set column names
names(rawdata.h) <- c("spot","RKup", "RKdown","RKcon","anm", "pp.stat","prod.h")
names(rawdata.15min) <- c("prod.15min")
#Convert data frames to time series objects
data.htemp <- xts(rawdata.h,order.by=index.h)
data.15mintemp <- xts(rawdata.15min,order.by=index.15min)
#Select desired subset period
data.h <- data.htemp["2013"]
data.15min <- data.15mintemp["2013"]
I want to be able to combine the hourly data in data.h$prod.h with the 15-minute data in data.15min$prod.15min corresponding to the same hour.
An example would be to take the average of the hourly value at time 2013-12-01 00:00-01:00 with the last 15 minute value in that same hour, i.e. the 15 minute value from time 2013-12-01 00:45-01:00. I'm looking for a flexible way to do this with an arbitrary hour.
Any suggestions?
Edit: Just to clarify further: I want to do something like this:
N <- NROW(data.h$prod.h)
for (i in 1:N){
prod.average[i] <- mean(data.h$prod.h[i] + #INSERT CODE THAT FINDS LAST 15 MIN IN HOUR i )
}
I found a solution to my problem by converting the 15-minute data into hourly data using the very useful .index* functions from the xts package, as shown below.
prod.new <- data.15min$prod.15min[.indexmin(data.15min$prod.15min) %in% c(45:59)]
This creates a new time series with only the values occurring in the 45-59 minute interval of each hour.
For those curious my data looked like this:
Original hourly series:
> data.h$prod.h[1:4]
2013-01-01 00:00:00 19.744
2013-01-01 01:00:00 27.866
2013-01-01 02:00:00 26.227
2013-01-01 03:00:00 16.013
Original 15 minute series:
> data.15min$prod.15min[1:4]
2013-09-30 00:00:00 16.4251
2013-09-30 00:15:00 18.4495
2013-09-30 00:30:00 7.2125
2013-09-30 00:45:00 12.1913
2013-09-30 01:00:00 12.4606
2013-09-30 01:15:00 12.7299
2013-09-30 01:30:00 12.9992
2013-09-30 01:45:00 26.7522
New series with only the last 15 minutes in each hour:
> prod.new[1:4]
2013-09-30 00:45:00 12.1913
2013-09-30 01:45:00 26.7522
2013-09-30 02:45:00 5.0332
2013-09-30 03:45:00 2.6974
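From here, one hedged sketch of the averaging asked about in the question: shift the index of prod.new back 45 minutes (an assumption: this lines each last-quarter value up with the start of its hour), merge with the hourly series, and take row means.
prod.shifted <- prod.new
index(prod.shifted) <- index(prod.shifted) - 45 * 60   # stamp 00:45 values at 00:00
merged <- merge(data.h$prod.h, prod.shifted, join = "inner")
prod.average <- rowMeans(merged)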
Short answer
df %>%
group_by(t = cut(time, "30 min")) %>%
summarise(v = mean(value))
Long answer
Since you want to compress the 15-minute time series to a smaller resolution (30 minutes), you should use the dplyr package or any other package that implements the "group by" concept.
For instance:
s = seq(as.POSIXct("2017-01-01"), as.POSIXct("2017-01-02"), "15 min")
df = data.frame(time = s, value=1:97)
df is a time series with 97 rows and two columns.
head(df)
time value
1 2017-01-01 00:00:00 1
2 2017-01-01 00:15:00 2
3 2017-01-01 00:30:00 3
4 2017-01-01 00:45:00 4
5 2017-01-01 01:00:00 5
6 2017-01-01 01:15:00 6
The cut.POSIXt, group_by and summarise functions do the work:
df %>%
group_by(t = cut(time, "30 min")) %>%
summarise(v = mean(value))
t v
1 2017-01-01 00:00:00 1.5
2 2017-01-01 00:30:00 3.5
3 2017-01-01 01:00:00 5.5
4 2017-01-01 01:30:00 7.5
5 2017-01-01 02:00:00 9.5
6 2017-01-01 02:30:00 11.5
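If the data already lives in xts (as in the original question), the same 30-minute compression can be sketched with endpoints() and period.apply(); note that the result is stamped with each bucket's last observation time rather than its start:
library(xts)
x <- xts(df$value, order.by = df$time)
ep <- endpoints(x, on = "minutes", k = 30)   # bucket boundaries every 30 minutes
period.apply(x, ep, mean)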
A more robust way is to convert the 15-minute values into hourly values by taking their average, then do whatever operation you want.
### 15 Minutes Data
min15 <- structure(list(V1 = structure(1:8, .Label = c("2013-01-01 00:00:00",
"2013-01-01 00:15:00", "2013-01-01 00:30:00", "2013-01-01 00:45:00",
"2013-01-01 01:00:00", "2013-01-01 01:15:00", "2013-01-01 01:30:00",
"2013-01-01 01:45:00"), class = "factor"), V2 = c(16.4251, 18.4495,
7.2125, 12.1913, 12.4606, 12.7299, 12.9992, 26.7522)), .Names = c("V1",
"V2"), class = "data.frame", row.names = c(NA, -8L))
min15
### Hourly Data
hourly <- structure(list(V1 = structure(1:4, .Label = c("2013-01-01 00:00:00",
"2013-01-01 01:00:00", "2013-01-01 02:00:00", "2013-01-01 03:00:00"
), class = "factor"), V2 = c(19.744, 27.866, 26.227, 16.013)), .Names = c("V1",
"V2"), class = "data.frame", row.names = c(NA, -4L))
hourly
### Convert 15min data into hourly data by taking average of 4 values
min15$V1 <- as.POSIXct(min15$V1,origin="1970-01-01 0:0:0")
min15 <- aggregate(. ~ cut(min15$V1,"60 min"),min15[setdiff(names(min15), "V1")],mean)
min15
names(min15) <- c("time","min15")
names(hourly) <- c("time","hourly")
### merge the corresponding values
combined <- merge(hourly,min15)
### average of hourly and 15min values
rowMeans(combined[,2:3])
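For the sample data above, the two merged rows work out to roughly:
# hour 00: (19.744 + 13.5696) / 2  = 16.6568
# hour 01: (27.866 + 16.2355) / 2  ~ 22.0507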