Comparing time series with different sampling rates (dates) in R

I have two long time series to compare; however, they are sampled completely differently. The first is sampled hourly, the second irregularly.
I would like to compare Value1 and Value2, so I want to select the Value1 records from df1 at 02:00 on the dates present in df2. How can I do this in R?
df1:
Date1                 Value1
2014-01-01 01:00:00   0.16
2014-01-01 02:00:00   0.13
2014-01-01 03:00:00   0.6
2014-01-02 01:00:00   0.5
2014-01-02 02:00:00   0.22
2014-01-02 03:00:00   0.17
2014-01-19 01:00:00   0.2
2014-01-19 02:00:00   0.11
2014-01-19 03:00:00   0.15
2014-01-21 01:00:00   0.13
2014-01-21 02:00:00   0.33
2014-01-21 03:00:00   0.1
2014-01-23 01:00:00   0.09
2014-01-23 02:00:00   0.02
2014-01-23 03:00:00   0.16
df2:
Date2        Value2
2014-01-01   13
2014-01-19   76
2014-01-23   8
desired output (df_fused):
Date1                 Value1   Value2
2014-01-01 02:00:00   0.13     13
2014-01-19 02:00:00   0.11     76
2014-01-23 02:00:00   0.02     8

Here is a data.table approach:
library( data.table )
# sample data; if df1/df2 already exist as data.frames you can use setDT(df1); setDT(df2) instead
df1 <- fread("Date1 Value1
2014-01-01 01:00:00 0.16
2014-01-01 02:00:00 0.13
2014-01-01 03:00:00 0.6
2014-01-02 01:00:00 0.5
2014-01-02 02:00:00 0.22
2014-01-02 03:00:00 0.17
2014-01-19 01:00:00 0.2
2014-01-19 02:00:00 0.11
2014-01-19 03:00:00 0.15
2014-01-21 01:00:00 0.13
2014-01-21 02:00:00 0.33
2014-01-21 03:00:00 0.1
2014-01-23 01:00:00 0.09
2014-01-23 02:00:00 0.02
2014-01-23 03:00:00 0.16")
df2 <- fread("Date2 Value2
2014-01-01 13
2014-01-19 76
2014-01-23 8")
# convert Date1 to POSIXct
df1[, Date1 := as.POSIXct( Date1, format = "%Y-%m-%d %H:%M:%S", tz = "UTC" )]
# set df2 dates to the 02:00:00 timestamp
df2[, Date2 := as.POSIXct( paste( Date2, "02:00:00" ), format = "%Y-%m-%d %H:%M:%S", tz = "UTC" )]
#join
df2[ df1, Value1 := i.Value1, on = .(Date2 = Date1)][]
# Date2 Value2 Value1
# 1: 2014-01-01 02:00:00 13 0.13
# 2: 2014-01-19 02:00:00 76 0.11
# 3: 2014-01-23 02:00:00 8 0.02
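For comparison, the same fusion can be sketched in base R with merge(); this assumes df1 and df2 are plain data.frames as shown in the question, with Date1 already POSIXct and Date2 still a date string:
# put the df2 dates at 02:00 and merge on the timestamp
df2$Date2 <- as.POSIXct(paste(df2$Date2, "02:00:00"), tz = "UTC")
df_fused <- merge(df1, df2, by.x = "Date1", by.y = "Date2")
df_fused
#                 Date1 Value1 Value2
# 1 2014-01-01 02:00:00   0.13     13
# 2 2014-01-19 02:00:00   0.11     76
# 3 2014-01-23 02:00:00   0.02      8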

Related

Creating dataframe in R with multiple lines of the same hour

I'm trying to create a dataframe with the following columns: dt, depth, var1.
I need 4 rows for each hour, running through a whole year, because I need to adjust var1 at certain depths:
dt                 depth   var1
2008-01-01 00:00   2       0.01
2008-01-01 00:00   40      0.01
2008-01-01 00:00   45      0.01
2008-01-01 00:00   100     0.01
2008-01-01 01:00   2       0.01
2008-01-01 01:00   40      0.01
2008-01-01 01:00   45      0.01
2008-01-01 01:00   100     0.01
2008-01-01 02:00   2       0.01
2008-01-01 02:00   40      0.01
2008-01-01 02:00   45      0.01
2008-01-01 02:00   100     0.01
2008-01-01 03:00   2       0.01
How do I create the "dt" list for the first column?
Thank you!
You can use expand.grid:
expand.grid(
  dt = seq(as.POSIXct("2008-01-01 00:00:00", 'UTC'),
           as.POSIXct("2008-01-01 03:00:00", 'UTC'), 'hour'),
  depth = c(2, 40, 45, 100),
  var1 = 0.01
) -> result
result
# dt depth var1
#1 2008-01-01 00:00:00 2 0.01
#2 2008-01-01 01:00:00 2 0.01
#3 2008-01-01 02:00:00 2 0.01
#4 2008-01-01 03:00:00 2 0.01
#5 2008-01-01 00:00:00 40 0.01
#6 2008-01-01 01:00:00 40 0.01
#7 2008-01-01 02:00:00 40 0.01
#8 2008-01-01 03:00:00 40 0.01
#9 2008-01-01 00:00:00 45 0.01
#10 2008-01-01 01:00:00 45 0.01
#11 2008-01-01 02:00:00 45 0.01
#12 2008-01-01 03:00:00 45 0.01
#13 2008-01-01 00:00:00 100 0.01
#14 2008-01-01 01:00:00 100 0.01
#15 2008-01-01 02:00:00 100 0.01
#16 2008-01-01 03:00:00 100 0.01
If you want the order as shown, you can reorder the above result (a base R sketch follows after the expand_grid output) or use tidyr::expand_grid:
tidyr::expand_grid(
  dt = seq(as.POSIXct("2008-01-01 00:00:00", 'UTC'),
           as.POSIXct("2008-01-01 03:00:00", 'UTC'), 'hour'),
  depth = c(2, 40, 45, 100),
  var1 = 0.01
) -> result
result
# A tibble: 16 x 3
# dt depth var1
# <dttm> <dbl> <dbl>
# 1 2008-01-01 00:00:00 2 0.01
# 2 2008-01-01 00:00:00 40 0.01
# 3 2008-01-01 00:00:00 45 0.01
# 4 2008-01-01 00:00:00 100 0.01
# 5 2008-01-01 01:00:00 2 0.01
# 6 2008-01-01 01:00:00 40 0.01
# 7 2008-01-01 01:00:00 45 0.01
# 8 2008-01-01 01:00:00 100 0.01
# 9 2008-01-01 02:00:00 2 0.01
#10 2008-01-01 02:00:00 40 0.01
#11 2008-01-01 02:00:00 45 0.01
#12 2008-01-01 02:00:00 100 0.01
#13 2008-01-01 03:00:00 2 0.01
#14 2008-01-01 03:00:00 40 0.01
#15 2008-01-01 03:00:00 45 0.01
#16 2008-01-01 03:00:00 100 0.01
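If you prefer to keep the expand.grid result and simply reorder it, a minimal base R sketch (assuming result from the first block above):
result <- result[order(result$dt, result$depth), ]
result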
Try this:
start_date_time <- as.POSIXct("2008-01-01 00:00", format= "%Y-%m-%d %H:%M")
end_date_time <- as.POSIXct("2008-01-01 03:00", format= "%Y-%m-%d %H:%M")
df <- data.frame(dt = rep(seq(start_date_time, end_date_time, by = 3600), each = 4),
                 depth = rep(c(2, 40, 45, 100)),
                 var1 = 0.01)
df
#> dt depth var1
#> 1 2008-01-01 00:00:00 2 0.01
#> 2 2008-01-01 00:00:00 40 0.01
#> 3 2008-01-01 00:00:00 45 0.01
#> 4 2008-01-01 00:00:00 100 0.01
#> 5 2008-01-01 01:00:00 2 0.01
#> 6 2008-01-01 01:00:00 40 0.01
#> 7 2008-01-01 01:00:00 45 0.01
#> 8 2008-01-01 01:00:00 100 0.01
#> 9 2008-01-01 02:00:00 2 0.01
#> 10 2008-01-01 02:00:00 40 0.01
#> 11 2008-01-01 02:00:00 45 0.01
#> 12 2008-01-01 02:00:00 100 0.01
#> 13 2008-01-01 03:00:00 2 0.01
#> 14 2008-01-01 03:00:00 40 0.01
#> 15 2008-01-01 03:00:00 45 0.01
#> 16 2008-01-01 03:00:00 100 0.01
Created on 2021-04-22 by the reprex package (v2.0.0)
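A data.table cross-join is another way to build the same grid; a sketch assuming the start_date_time and end_date_time objects defined above:
library(data.table)
grid <- CJ(dt = seq(start_date_time, end_date_time, by = 3600),
           depth = c(2, 40, 45, 100))   # CJ sorts by dt, then depth
grid[, var1 := 0.01]
grid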

overlap(intersect) time interval and xts

There are two time datasets: data from a rain collector -- time intervals ti with start, end and rain p (total amount of rain per period, in mm):
ti <- data.frame(
  start = c("2017-06-05 19:30:00", "2017-06-06 12:00:00"),
  end = c("2017-06-05 23:30:00", "2017-06-06 14:00:00"),
  p = c(16.4, 4.4)
)
ti[, 1] <- as.POSIXct(ti[, 1])
ti[, 2] <- as.POSIXct(ti[, 2])
and a time series ts from a gauging station with time and parameter q, the water discharge (cubic metres per second):
ts <- data.frame(stringsAsFactors = FALSE,
  time = c("2017-06-05 16:00:00", "2017-06-05 19:00:00",
           "2017-06-05 21:00:00", "2017-06-05 23:00:00",
           "2017-06-06 9:00:00", "2017-06-06 11:00:00", "2017-06-06 13:00:00",
           "2017-06-06 16:00:00", "2017-06-06 17:00:00"),
  q = c(0.78, 0.84, 0.9, 0.78, 0.78, 0.78, 0.78, 1.22, 1.25)
)
ts[, 1] <- as.POSIXct(ts[, 1])
I need to intersect the time series with the time intervals and create a new column in ts that is TRUE if the row falls inside a rain interval and FALSE otherwise, like this:
time q rain
1 2017-06-05 16:00:00 0.78 FALSE
2 2017-06-05 19:00:00 0.84 FALSE
3 2017-06-05 21:00:00 0.90 TRUE # there was rain
4 2017-06-05 23:00:00 0.78 TRUE # there was rain
5 2017-06-06 9:00:00 0.78 FALSE
6 2017-06-06 11:00:00 0.78 FALSE
7 2017-06-06 13:00:00 0.78 TRUE # there was rain
8 2017-06-06 16:00:00 1.22 FALSE
9 2017-06-06 17:00:00 1.25 FALSE
Do you have any ideas on how to perform such a simple operation?
With sqldf:
library(sqldf)
sqldf('select ts.*, case when ti.p is not null then 1 else 0 end as rain
       from ts
       left join ti
       on start <= time and
          time <= end')
Result:
time q rain
1 2017-06-05 16:00:00 0.78 0
2 2017-06-05 19:00:00 0.84 0
3 2017-06-05 21:00:00 0.90 1
4 2017-06-05 23:00:00 0.78 1
5 2017-06-06 9:00:00 0.78 0
6 2017-06-06 11:00:00 0.78 0
7 2017-06-06 13:00:00 0.78 1
8 2017-06-06 16:00:00 1.22 0
9 2017-06-06 17:00:00 1.25 0
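If you want a logical rain column (TRUE/FALSE) as asked, a small base R sketch over the same ti intervals:
# TRUE if the timestamp falls inside any [start, end] interval
ts$rain <- sapply(ts$time, function(t) any(t >= ti$start & t <= ti$end))
ts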

How to increase time series granularity in R Dataframe? [duplicate]

This question already has answers here:
Insert rows for missing dates/times
(9 answers)
Closed 5 years ago.
I have a dataframe that contains hourly weather information. I would like to increase the granularity of the time measurements (5-minute intervals instead of 60-minute intervals) while copying the other columns' data into the new rows created:
Current Dataframe Structure:
Date Temperature Humidity
2015-01-01 00:00:00 25 0.67
2015-01-01 01:00:00 26 0.69
Target Dataframe Structure:
Date Temperature Humidity
2015-01-01 00:00:00 25 0.67
2015-01-01 00:05:00 25 0.67
2015-01-01 00:10:00 25 0.67
.
.
.
2015-01-01 00:55:00 25 0.67
2015-01-01 01:00:00 26 0.69
2015-01-01 01:05:00 26 0.69
2015-01-01 01:10:00 26 0.69
.
.
.
What I've Tried:
for(i in 1:nrow(df)) {
  five.minutes <- seq(df$date[i], length = 12, by = "5 mins")
  for(j in 1:length(five.minutes)) {
    df$date[i] <- rbind(five.minutes[j])
  }
}
Error I'm getting:
Error in as.POSIXct.numeric(value) : 'origin' must be supplied
One possible solution uses fill from tidyr and right_join from dplyr.
The approach is to create a date/time sequence from the minimum to the maximum time in the dataframe plus 55 minutes, then right-join the dataframe to that sequence, which gives all the desired rows but with NA for Temperature and Humidity. Finally, use fill to populate the NA values with the previous valid values.
# Data
library(dplyr)
library(tidyr)
df <- read.table(text = "Date Temperature Humidity
'2015-01-01 00:00:00' 25 0.67
'2015-01-01 01:00:00' 26 0.69
'2015-01-01 02:00:00' 28 0.69
'2015-01-01 03:00:00' 25 0.69", header = TRUE, stringsAsFactors = FALSE)
df$Date <- as.POSIXct(df$Date, format = "%Y-%m-%d %H:%M:%S")
# Create a dataframe with all possible date/times at intervals of 5 mins
Dates <- data.frame(Date = seq(min(df$Date), max(df$Date) + 3540, by = 5*60))
result <- df %>%
  right_join(Dates, by = "Date") %>%
  fill(Temperature, Humidity)
result
# Date Temperature Humidity
#1 2015-01-01 00:00:00 25 0.67
#2 2015-01-01 00:05:00 25 0.67
#3 2015-01-01 00:10:00 25 0.67
#4 2015-01-01 00:15:00 25 0.67
#5 2015-01-01 00:20:00 25 0.67
#6 2015-01-01 00:25:00 25 0.67
#7 2015-01-01 00:30:00 25 0.67
#8 2015-01-01 00:35:00 25 0.67
#9 2015-01-01 00:40:00 25 0.67
#10 2015-01-01 00:45:00 25 0.67
#11 2015-01-01 00:50:00 25 0.67
#12 2015-01-01 00:55:00 25 0.67
#13 2015-01-01 01:00:00 26 0.69
#14 2015-01-01 01:05:00 26 0.69
#.....
#.....
#44 2015-01-01 03:35:00 25 0.69
#45 2015-01-01 03:40:00 25 0.69
#46 2015-01-01 03:45:00 25 0.69
#47 2015-01-01 03:50:00 25 0.69
#48 2015-01-01 03:55:00 25 0.69
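The join step can also be folded into tidyr::complete; a sketch assuming the same df and the packages loaded above:
result <- df %>%
  complete(Date = seq(min(Date), max(Date) + 3540, by = 5*60)) %>%
  fill(Temperature, Humidity)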
I think this might do it:
library(tibble)
library(lubridate)
df = tibble(DateTime = c("2015-01-01 00:00:00", "2015-01-01 01:00:00"), Temperature = c(25, 26), Humidity = c(.67, .69))
df$DateTime <- ymd_hms(df$DateTime)
DateTime = as.POSIXct(sapply(1:(nrow(df)-1), function(x) seq(from = df$DateTime[x], to = df$DateTime[x+1], by = "5 min")),
                      origin = "1970-01-01", tz = "UTC")
Temperature = c(sapply(1:(nrow(df)-1), function(x) rep(df$Temperature[x], 12)), df$Temperature[nrow(df)])
Humidity = c(sapply(1:(nrow(df)-1), function(x) rep(df$Humidity[x], 12)), df$Humidity[nrow(df)])
tibble(as.character(DateTime), Temperature, Humidity)
<chr> <dbl> <dbl>
1 2015-01-01 00:00:00 25.0 0.670
2 2015-01-01 00:05:00 25.0 0.670
3 2015-01-01 00:10:00 25.0 0.670
4 2015-01-01 00:15:00 25.0 0.670
5 2015-01-01 00:20:00 25.0 0.670
6 2015-01-01 00:25:00 25.0 0.670
7 2015-01-01 00:30:00 25.0 0.670
8 2015-01-01 00:35:00 25.0 0.670
9 2015-01-01 00:40:00 25.0 0.670
10 2015-01-01 00:45:00 25.0 0.670
11 2015-01-01 00:50:00 25.0 0.670
12 2015-01-01 00:55:00 25.0 0.670
13 2015-01-01 01:00:00 26.0 0.690

Split date into YYYY-MM-DD-HH-MM-SS and aggregate date (R)

How can one split the following datetime into year-month-day-hour-minute-second? The date was created using:
datetime = seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = 'GMT'),
                      as.POSIXct("2015-11-30 23:59:59", tz = 'GMT'),
                      by = "hour", tz = "GMT")
The ultimate goal is to aggregate x which is at hourly resolution into 6-hourly resolution. Probably it is possible to aggregate datetime without needing to split it?
datetime x
1 2015-04-01 00:00:00 0.0
2 2015-04-01 01:00:00 0.0
3 2015-04-01 02:00:00 0.0
4 2015-04-01 03:00:00 0.0
5 2015-04-01 04:00:00 0.0
6 2015-04-01 05:00:00 0.0
7 2015-04-01 06:00:00 0.0
8 2015-04-01 07:00:00 0.0
9 2015-04-01 08:00:00 0.0
10 2015-04-01 09:00:00 0.0
11 2015-04-01 10:00:00 0.0
12 2015-04-01 11:00:00 0.0
13 2015-04-01 12:00:00 0.0
14 2015-04-01 13:00:00 0.0
15 2015-04-01 14:00:00 0.0
16 2015-04-01 15:00:00 0.0
17 2015-04-01 16:00:00 0.0
18 2015-04-01 17:00:00 0.0
19 2015-04-01 18:00:00 0.0
20 2015-04-01 19:00:00 0.0
21 2015-04-01 20:00:00 0.0
22 2015-04-01 21:00:00 0.0
23 2015-04-01 22:00:00 1.6
24 2015-04-01 23:00:00 0.2
25 2015-04-02 00:00:00 1.5
26 2015-04-02 01:00:00 1.5
27 2015-04-02 02:00:00 0.5
28 2015-04-02 03:00:00 0.0
29 2015-04-02 04:00:00 0.0
30 2015-04-02 05:00:00 0.0
31 2015-04-02 06:00:00 0.0
32 2015-04-02 07:00:00 0.5
33 2015-04-02 08:00:00 0.3
34 2015-04-02 09:00:00 0.0
35 2015-04-02 10:00:00 0.0
36 2015-04-02 11:00:00 0.0
37 2015-04-02 12:00:00 0.0
38 2015-04-02 13:00:00 0.0
39 2015-04-02 14:00:00 0.0
40 2015-04-02 15:00:00 0.0
41 2015-04-02 16:00:00 0.0
42 2015-04-02 17:00:00 0.0
43 2015-04-02 18:00:00 0.0
44 2015-04-02 19:00:00 0.0
45 2015-04-02 20:00:00 0.0
46 2015-04-02 21:00:00 0.0
47 2015-04-02 22:00:00 0.0
48 2015-04-02 23:00:00 0.0
....
The output should be very close to:
YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss
2015-04-01 00:00:00 2015-04-01 06:00:00 2015-04-01 12:00:00 2015-04-01 18:00:00
2015-04-02 00:00:00 2015-04-02 06:00:00 2015-04-02 12:00:00 2015-04-02 18:00:00
.....
I appreciate your thoughts on this.
EDIT
How can I implement @r2evans' answer on a list object such as:
x = runif(5856)
flst1=list(x,x,x,x)
flst1=lapply(flst1, function(x){x$datetime <- as.POSIXct(x$datetime, tz = "GMT"); x})
sixhours1=lapply(flst1, function(x) {x$bin <- cut(x$datetime,sixhours);x})
head(sixhours1[[1]],n=7)
ret=lapply(sixhours1, function(x) aggregate(x$precip, list(x$bin), sum,na.rm=T))
head(ret[[1]],n=20)
Your minimal data is incomplete, so I'll generate something random:
dat <- data.frame(datetime = seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = "GMT"),
                                        as.POSIXct("2015-11-30 23:59:59", tz = "GMT"),
                                        by = "hour", tz = "GMT"),
                  x = runif(5856))
# the "1+" ensures we extend at least to the end of the datetimes;
# without it, the last several rows in "bin" would be NA
sixhours <- seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = "GMT"),
                       1 + as.POSIXct("2015-11-30 23:59:59", tz = "GMT"),
                       by = "6 hours", tz = "GMT")
# this doesn't have to go into the data.frame (could be a separate
# vector), but I'm including it for easy row-wise comparison
dat$bin <- cut(dat$datetime, sixhours)
head(dat, n=7)
# datetime x bin
# 1 2015-04-01 00:00:00 0.91022534 2015-04-01 00:00:00
# 2 2015-04-01 01:00:00 0.02638850 2015-04-01 00:00:00
# 3 2015-04-01 02:00:00 0.42486354 2015-04-01 00:00:00
# 4 2015-04-01 03:00:00 0.90722845 2015-04-01 00:00:00
# 5 2015-04-01 04:00:00 0.24540085 2015-04-01 00:00:00
# 6 2015-04-01 05:00:00 0.60360906 2015-04-01 00:00:00
# 7 2015-04-01 06:00:00 0.01843313 2015-04-01 06:00:00
tail(dat)
# datetime x bin
# 5851 2015-11-30 18:00:00 0.5963204 2015-11-30 18:00:00
# 5852 2015-11-30 19:00:00 0.2503440 2015-11-30 18:00:00
# 5853 2015-11-30 20:00:00 0.9600476 2015-11-30 18:00:00
# 5854 2015-11-30 21:00:00 0.6837394 2015-11-30 18:00:00
# 5855 2015-11-30 22:00:00 0.9093506 2015-11-30 18:00:00
# 5856 2015-11-30 23:00:00 0.9197769 2015-11-30 18:00:00
nrow(dat)
# [1] 5856
The work:
ret <- aggregate(dat$x, list(dat$bin), mean)
nrow(ret)
# [1] 976
head(ret)
# Group.1 x
# 1 2015-04-01 00:00:00 0.5196193
# 2 2015-04-01 06:00:00 0.4770019
# 3 2015-04-01 12:00:00 0.5359483
# 4 2015-04-01 18:00:00 0.8140603
# 5 2015-04-02 00:00:00 0.4874332
# 6 2015-04-02 06:00:00 0.6139554
tail(ret)
# Group.1 x
# 971 2015-11-29 12:00:00 0.6881228
# 972 2015-11-29 18:00:00 0.4791925
# 973 2015-11-30 00:00:00 0.5793872
# 974 2015-11-30 06:00:00 0.4809868
# 975 2015-11-30 12:00:00 0.5157432
# 976 2015-11-30 18:00:00 0.7199298
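For the EDIT (a list of data frames), the same binning and aggregation can be wrapped in lapply; a sketch assuming each list element looks like dat above, with datetime and x columns (the list flst below is hypothetical):
flst <- list(dat, dat, dat)
ret <- lapply(flst, function(d) {
  d$bin <- cut(d$datetime, sixhours)   # reuse the 6-hourly breaks
  aggregate(d$x, list(bin = d$bin), sum, na.rm = TRUE)
})
head(ret[[1]])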
I got a solution using:
library(xts)
flst <- list.files(pattern = ".csv")
flst1 <- lapply(flst, function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE, sep = ",", fill = TRUE,
                                           dec = ".", quote = "\"", colClasses = c('factor', 'numeric', 'NULL'))) # read files, ignoring the 3rd column
head(flst1[[1]])
dat.xts <- lapply(flst1, function(x) xts(x$precip, as.POSIXct(x$datetime)))
head(dat.xts[[1]])
ep.xts <- lapply(dat.xts, function(x) endpoints(x, on = "hours", k = 6)) # k = by; see ?endpoints for "on"
head(ep.xts[[1]])
# sum each series over its own 6-hourly endpoints
stations6hrly <- Map(function(x, ep) period.apply(x, INDEX = ep, FUN = sum), dat.xts, ep.xts)
head(stations6hrly[[703]])
[,1]
2015-04-01 05:00:00 0.3
2015-04-01 11:00:00 1.2
2015-04-01 17:00:00 0.0
2015-04-01 23:00:00 0.2
2015-04-02 05:00:00 0.0
2015-04-02 11:00:00 1.4
The dates are not labelled as I wanted, but the values are correct. I doubt there is a shifttime function in R like the one in CDO.
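If the bins should instead be labelled by their period start (00:00, 06:00, ...), one option is to floor the index of each result; a sketch assuming lubridate is available (its floor_date accepts multi-unit strings such as "6 hours"):
library(lubridate)
stations6hrly <- lapply(stations6hrly, function(x) {
  index(x) <- floor_date(index(x), "6 hours")   # e.g. 05:00 -> 00:00, 11:00 -> 06:00
  x
})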

Adding missing dates to dataframe

I have a data frame which looks like this:
times values
1 2013-07-06 20:00:00 0.02
2 2013-07-07 20:00:00 0.03
3 2013-07-09 20:00:00 0.13
4 2013-07-10 20:00:00 0.12
5 2013-07-11 20:00:00 0.03
6 2013-07-14 20:00:00 0.06
7 2013-07-15 20:00:00 0.08
8 2013-07-16 20:00:00 0.07
9 2013-07-17 20:00:00 0.08
There are a few dates missing from the data, and I would like to insert them and to carry over the value from the previous day into these new rows, i.e. obtain this:
times values
1 2013-07-06 20:00:00 0.02
2 2013-07-07 20:00:00 0.03
3 2013-07-08 20:00:00 0.03
4 2013-07-09 20:00:00 0.13
5 2013-07-10 20:00:00 0.12
6 2013-07-11 20:00:00 0.03
7 2013-07-12 20:00:00 0.03
8 2013-07-13 20:00:00 0.03
9 2013-07-14 20:00:00 0.06
10 2013-07-15 20:00:00 0.08
11 2013-07-16 20:00:00 0.07
12 2013-07-17 20:00:00 0.08
...
I have been trying to use a vector of all the dates:
dates <- as.Date(1:length(df),origin = df$times[1])
I am stuck, and can't find a way to do it without a horrible for loop in which I'm getting lost...
Thank you for your help
Some test data (I am using Date, yours seems to be a different type, but this does not affect the algorithm):
data = data.frame(dates = as.Date(c("2011-12-15", "2011-12-17", "2011-12-19")),
                  values = as.double(1:3))
# Generate **all** timestamps at which you want to have your result.
# I use `seq`, but you may use any other method of generating those timestamps.
alldates = seq(min(data$dates), max(data$dates), 1)
# Filter out timestamps that are already present in your `data.frame`:
# Construct a `data.frame` to append with missing values:
dates0 = alldates[!(alldates %in% data$dates)]
data0 = data.frame(dates = dates0, values = NA_real_)
# Append this `data.frame` and resort in time:
data = rbind(data, data0)
data = data[order(data$dates),]
# Forward-fill the values.
# I would recommend moving this code into a separate `ffill` function
# (it has proved to be very useful in general):
current = NA_real_
data$values = sapply(data$values, function(x) {
  current <<- ifelse(is.na(x), current, x); current })
Alternatively, merge onto a full date grid and carry values forward with zoo's na.locf:
library(zoo)
g <- data.frame(dates = seq(min(data$dates), max(data$dates), 1))
na.locf(merge(g, data, by = "dates", all.x = TRUE))
or entirely with zoo:
z <- read.zoo(data)
gz <- zoo(, seq(min(time(z)), max(time(z)), "day")) # time grid in zoo
na.locf(merge(z, gz))
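To get back to a plain data frame from the zoo result, a small sketch:
res <- na.locf(merge(z, gz))
data.frame(dates = index(res), values = as.numeric(coredata(res)))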
Using tidyr's complete and fill, assuming the times column is already of class POSIXct:
library(tidyr)
df %>%
  complete(times = seq(min(times), max(times), by = 'day')) %>%
  fill(values)
# A tibble: 12 x 2
# times values
# <dttm> <dbl>
# 1 2013-07-06 20:00:00 0.02
# 2 2013-07-07 20:00:00 0.03
# 3 2013-07-08 20:00:00 0.03
# 4 2013-07-09 20:00:00 0.13
# 5 2013-07-10 20:00:00 0.12
# 6 2013-07-11 20:00:00 0.03
# 7 2013-07-12 20:00:00 0.03
# 8 2013-07-13 20:00:00 0.03
# 9 2013-07-14 20:00:00 0.06
#10 2013-07-15 20:00:00 0.08
#11 2013-07-16 20:00:00 0.07
#12 2013-07-17 20:00:00 0.08
data
df <- structure(list(times = structure(c(1373140800, 1373227200, 1373400000,
1373486400, 1373572800, 1373832000, 1373918400, 1374004800, 1374091200
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), values = c(0.02,
0.03, 0.13, 0.12, 0.03, 0.06, 0.08, 0.07, 0.08)), row.names = c(NA,
-9L), class = "data.frame")
df2 <- data.frame(times = seq(min(df$times), max(df$times), by = "day"))
df3 <- merge(x = df2, y = df, by = "times", all.x = TRUE)
idx <- which(is.na(df3$values))
for (id in idx)
  df3$values[id] <- df3$values[id - 1]
df3
# times values
# 1 2013-07-06 20:00:00 0.02
# 2 2013-07-07 20:00:00 0.03
# 3 2013-07-08 20:00:00 0.03
# 4 2013-07-09 20:00:00 0.13
# 5 2013-07-10 20:00:00 0.12
# 6 2013-07-11 20:00:00 0.03
# 7 2013-07-12 20:00:00 0.03
# 8 2013-07-13 20:00:00 0.03
# 9 2013-07-14 20:00:00 0.06
# 10 2013-07-15 20:00:00 0.08
# 11 2013-07-16 20:00:00 0.07
# 12 2013-07-17 20:00:00 0.08
You can try this (a data.table rolling join; note that NADayWiseOrders is an example data.table not defined above):
library(data.table)
setkey(NADayWiseOrders, date)
all_dates <- seq(from = as.Date("2013-01-01"),
                 to = as.Date("2013-01-07"),
                 by = "days")
NADayWiseOrders[J(all_dates), roll = Inf]
date orders amount guests
1: 2013-01-01 50 2272.55 149
2: 2013-01-02 3 64.04 4
3: 2013-01-03 3 64.04 4
4: 2013-01-04 1 18.81 0
5: 2013-01-05 2 77.62 0
6: 2013-01-06 2 77.62 0
7: 2013-01-07 2 35.82 2
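The same rolling-join idea can be sketched on the question's own df (assuming df is the times/values data frame shown earlier and data.table is loaded):
setDT(df)
all_times <- data.table(times = seq(min(df$times), max(df$times), by = "day"))
df[all_times, on = "times", roll = Inf]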
