I have a large time series (stored as a data frame, n >= 6000) that looks like this:
time, precip
1 2005-09-30 11:45:00, 0.08
2 2005-09-30 23:45:00, 0.72
3 2005-10-01 11:45:00, 0.01
4 2005-10-01 23:45:00, 0.08
5 2005-10-02 11:45:00, 0.10
6 2005-10-02 23:45:00, 0.33
7 2005-10-03 11:45:00, 0.15
8 2005-10-03 23:45:00, 0.30
9 2005-10-04 11:45:00, 0.00
10 2005-10-04 23:45:00, 0.00
11 2005-10-05 11:45:00, 0.02
12 2005-10-05 23:45:00, 0.00
13 2005-10-06 11:45:00, 0.00
14 2005-10-06 23:45:00, 0.01
15 2005-10-07 11:45:00, 0.00
16 2005-10-07 23:45:00, 0.00
17 2005-10-08 11:45:00, 0.00
18 2005-10-08 23:45:00, 0.16
19 2005-10-09 11:45:00, 0.03
20 2005-10-09 23:45:00, 0.00
Each row has a time (YYYY-MM-DD HH:MM:SS, a 12-hour time step) and a precipitation amount. I'd like to separate the data by storm events.
What I'd like to do is this:
1) add a new column called "storm"
2) for each run of precipitation values separated by 0s, call it one storm.
For example...
Time, Precip, Storm
1 2005-09-30 11:45:00, 0.08, 1
2 2005-09-30 23:45:00, 0.72, 1
3 2005-10-01 11:45:00, 0.01, 1
4 2005-10-01 23:45:00, 0.08, 1
5 2005-10-02 11:45:00, 0.10, 1
6 2005-10-02 23:45:00, 0.33, 1
7 2005-10-03 11:45:00, 0.15, 1
8 2005-10-03 23:45:00, 0.30, 1
9 2005-10-04 11:45:00, 0.00
10 2005-10-04 23:45:00, 0.00
11 2005-10-05 11:45:00, 0.02, 2
12 2005-10-05 23:45:00, 0.00
13 2005-10-06 11:45:00, 0.00
14 2005-10-06 23:45:00, 0.01, 3
15 2005-10-07 11:45:00, 0.00
16 2005-10-07 23:45:00, 0.00
17 2005-10-08 11:45:00, 0.00
18 2005-10-08 23:45:00, 0.16, 4
19 2005-10-09 11:45:00, 0.03, 4
20 2005-10-09 23:45:00, 0.00
3) after that, my plan is to subset the data by storm event.
I am pretty new to R, so don't be afraid of pointing out the obvious. Your help would be much appreciated!
You can find the events within a storm, then use rle and modify the results:
# assuming your data is called rainfall
# identify whether a precipitation has been recorded at each timepoint
rainfall$storm <- rainfall$precip > 0
# do run length encoding on this storm indicator
storms <- rle(rainfall$storm)
# set the FALSE values to NA
is.na(storms$values) <- !storms$values
# replace the TRUE values with a number in sequence
storms$values[which(storms$values)] <- seq_len(sum(storms$values, na.rm = TRUE))
# use inverse.rle to revert to the full length column
rainfall$stormNumber <- inverse.rle(storms)
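Once the storm numbers are in place, subsetting by event (step 3 of the question) is straightforward; a minimal sketch, assuming the columns created above:
# all rows belonging to storm 1
storm1 <- subset(rainfall, stormNumber == 1)
# or split the whole data frame into a list with one element per storm
stormList <- split(rainfall, rainfall$stormNumber)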
Assuming this input:
Lines <- "time, precip
1 2005-09-30 11:45:00, 0.08
2 2005-09-30 23:45:00, 0.72
3 2005-10-01 11:45:00, 0.01
4 2005-10-01 23:45:00, 0.08
5 2005-10-02 11:45:00, 0.10
6 2005-10-02 23:45:00, 0.33
7 2005-10-03 11:45:00, 0.15
8 2005-10-03 23:45:00, 0.30
9 2005-10-04 11:45:00, 0.00
10 2005-10-04 23:45:00, 0.00
11 2005-10-05 11:45:00, 0.02
12 2005-10-05 23:45:00, 0.00
13 2005-10-06 11:45:00, 0.00
14 2005-10-06 23:45:00, 0.01
15 2005-10-07 11:45:00, 0.00
16 2005-10-07 23:45:00, 0.00
17 2005-10-08 11:45:00, 0.00
18 2005-10-08 23:45:00, 0.16
19 2005-10-09 11:45:00, 0.03
20 2005-10-09 23:45:00, 0.00
"
We read in the data and then create a logical vector that is TRUE for each non-zero precip value whose prior value is zero. We prepend a first element that is TRUE if z[1] is non-zero and FALSE if it is zero. Applying cumsum to this vector gives the correct storm numbers at the positions of non-zero precip values. To handle the positions corresponding to zero precip values, we use replace to store the empty value into them:
# read in data
library(zoo)
z <- read.zoo(text = Lines, skip = 1, tz = "", index = 2:3)[, 2]
# calculate
e <- NA # empty
cbind(precip = z, storm = replace(cumsum(c(z[1]!=0, z!=0 & lag(z,-1)==0)), z==0, e))
The last line gives this:
precip storm
2005-09-30 11:45:00 0.08 1
2005-09-30 23:45:00 0.72 1
2005-10-01 11:45:00 0.01 1
2005-10-01 23:45:00 0.08 1
2005-10-02 11:45:00 0.10 1
2005-10-02 23:45:00 0.33 1
2005-10-03 11:45:00 0.15 1
2005-10-03 23:45:00 0.30 1
2005-10-04 11:45:00 0.00 NA
2005-10-04 23:45:00 0.00 NA
2005-10-05 11:45:00 0.02 2
2005-10-05 23:45:00 0.00 NA
2005-10-06 11:45:00 0.00 NA
2005-10-06 23:45:00 0.01 3
2005-10-07 11:45:00 0.00 NA
2005-10-07 23:45:00 0.00 NA
2005-10-08 11:45:00 0.00 NA
2005-10-08 23:45:00 0.16 4
2005-10-09 11:45:00 0.03 4
2005-10-09 23:45:00 0.00 NA
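If you need the result back as a plain data frame for subsetting, one way (a sketch, assuming the cbind result above is assigned to x) is:
x <- cbind(precip = z, storm = replace(cumsum(c(z[1] != 0, z != 0 & lag(z, -1) == 0)), z == 0, e))
df <- data.frame(time = index(x), coredata(x))
split(df, df$storm)   # one data frame per storm; rows with NA storm are dropped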
I have a dataset like this:
structure(list(`Frequency
Percent` = c("car", "window", "ball",
"ups"), AI = c("2\n0.00", "3\n0.00", "1\n0.00", "2\n0.00"), BLK = c("0\n0.00",
"218\n0.29", "48\n0.06", "0\n0.00"), HIANIC = c("1\n0.00", "8\n0.01",
"4\n0.01", "0\n0.00"), NATRICAN = c("9\n0.01", "7\n0.01", "8\n0.01",
"0\n0.00"), UNK = c("15\n0.02", "83\n0.11", "36\n0.05", "0\n0.00"
), yy = c("111\n0.15", "897\n1.20", "756\n1.02", "1\n0.00")), class = "data.frame", row.names = c(NA,
-4L))
How can I split each cell on "\n" to make two new columns? For instance, for the car row's AI cell ("2\n0.00"), I would have 2 and 0.00 in two different columns.
One way is to use tidyr::separate in a for loop:
for(i in names(df[,-1])){
  df <- tidyr::separate(df, i, sep = "\n", into = c(i, paste0(i, "_val")))
}
Output:
# Frequency\n Percent AI AI_val BLK BLK_val HIANIC HIANIC_val NATRICAN NATRICAN_val UNK UNK_val yy yy_val
# 1 car 2 0.00 0 0.00 1 0.00 9 0.01 15 0.02 111 0.15
# 2 window 3 0.00 218 0.29 8 0.01 7 0.01 83 0.11 897 1.20
# 3 ball 1 0.00 48 0.06 4 0.01 8 0.01 36 0.05 756 1.02
# 4 ups 2 0.00 0 0.00 0 0.00 0 0.00 0 0.00 1 0.00
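Note the separated columns are still character; if you want numeric types, type.convert can convert the whole data frame at once (a small sketch, assuming R >= 3.6, where type.convert gained a data.frame method):
df <- type.convert(df, as.is = TRUE)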
Using tidyr::separate_rows and tidyr::pivot_wider you could do:
library(tidyr)
library(dplyr)
dat |>
mutate(unit = c("n\npct")) |>
separate_rows(-1, sep = "\n") |>
pivot_wider(names_from = "unit", values_from = -1)
#> # A tibble: 4 × 15
#> Frequency\n…¹ AI_n AI_pct BLK_n BLK_pct HIANI…² HIANI…³ NATRI…⁴ NATRI…⁵ UNK_n
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 car 2 0.00 0 0.00 1 0.00 9 0.01 15
#> 2 window 3 0.00 218 0.29 8 0.01 7 0.01 83
#> 3 ball 1 0.00 48 0.06 4 0.01 8 0.01 36
#> 4 ups 2 0.00 0 0.00 0 0.00 0 0.00 0
#> # … with 5 more variables: UNK_pct <chr>, yy_n <chr>, yy_pct <chr>,
#> # unit_n <chr>, unit_pct <chr>, and abbreviated variable names
#> # ¹`Frequency\n Percent`, ²HIANIC_n, ³HIANIC_pct, ⁴NATRICAN_n,
#> # ⁵NATRICAN_pct
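The helper unit column itself also gets pivoted, which is where the trailing unit_n and unit_pct variables come from; assuming you don't need them, a small cleanup sketch:
dat |>
  mutate(unit = c("n\npct")) |>
  separate_rows(-1, sep = "\n") |>
  pivot_wider(names_from = "unit", values_from = -1) |>
  select(-unit_n, -unit_pct)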
A base R one-liner:
do.call(data.frame, lapply(DF, \(x) do.call(rbind, strsplit(x, "\n"))))
# Frequency.Percent AI.1 AI.2 BLK.1 BLK.2 HIANIC.1 HIANIC.2 NATRICAN.1
#1 car 2 0.00 0 0.00 1 0.00 9
#2 window 3 0.00 218 0.29 8 0.01 7
#3 ball 1 0.00 48 0.06 4 0.01 8
#4 ups 2 0.00 0 0.00 0 0.00 0
# NATRICAN.2 UNK.1 UNK.2 yy.1 yy.2
#1 0.01 15 0.02 111 0.15
#2 0.01 83 0.11 897 1.20
#3 0.01 36 0.05 756 1.02
#4 0.00 0 0.00 1 0.00
Or also add type conversion:
type.convert(do.call(data.frame, lapply(DF, \(x) do.call(rbind, strsplit(x, "\n")))), as.is=TRUE)
There is also a base R solution:
dat = structure(list(`Frequency
Percent` = c("car", "window", "ball",
"ups"), AI = c("2\n0.00", "3\n0.00", "1\n0.00", "2\n0.00"), BLK = c("0\n0.00",
"218\n0.29", "48\n0.06", "0\n0.00"), HIANIC = c("1\n0.00", "8\n0.01",
"4\n0.01", "0\n0.00"), NATRICAN = c("9\n0.01", "7\n0.01", "8\n0.01",
"0\n0.00"), UNK = c("15\n0.02", "83\n0.11", "36\n0.05", "0\n0.00"
), yy = c("111\n0.15", "897\n1.20", "756\n1.02", "1\n0.00")), class = "data.frame", row.names = c(NA,
-4L))
transformed = data.frame(Freq_pc = dat[,1])
for(col in seq(2, ncol(dat))){
  transformed = cbind(transformed, t(matrix(unlist(strsplit(dat[,col], "\n")), nrow=2)))
  names(transformed)[c(2*(col-1), 2*(col-1)+1)] = c(paste0(names(dat)[col], "_n"), paste0(names(dat)[col], "_pc"))
}
That results in:
Freq_pc AI_n AI_pc BLK_n BLK_pc HIANIC_n HIANIC_pc NATRICAN_n NATRICAN_pc UNK_n UNK_pc yy_n yy_pc
1 car 2 0.00 0 0.00 1 0.00 9 0.01 15 0.02 111 0.15
2 window 3 0.00 218 0.29 8 0.01 7 0.01 83 0.11 897 1.20
3 ball 1 0.00 48 0.06 4 0.01 8 0.01 36 0.05 756 1.02
4 ups 2 0.00 0 0.00 0 0.00 0 0.00 0 0.00 1 0.00
We may use cSplit:
library(splitstackshape)
cSplit(df1, 2:ncol(df1), sep = "\n")
Output:
Frequency\nPercent AI_1 AI_2 BLK_1 BLK_2 HIANIC_1 HIANIC_2 NATRICAN_1 NATRICAN_2 UNK_1 UNK_2 yy_1 yy_2
1: car 2 0 0 0.00 1 0.00 9 0.01 15 0.02 111 0.15
2: window 3 0 218 0.29 8 0.01 7 0.01 83 0.11 897 1.20
3: ball 1 0 48 0.06 4 0.01 8 0.01 36 0.05 756 1.02
4: ups 2 0 0 0.00 0 0.00 0 0.00 0 0.00 1 0.00
I've created a frequency table in R with the fdth package using this code:
fdt(x, breaks = "Sturges")
The specific result was:
Class limits f rf rf(%) cf cf(%)
[-15.907,-11.817) 12 0.00 0.10 12 0.10
[-11.817,-7.7265) 8 0.00 0.07 20 0.16
[-7.7265,-3.636) 6 0.00 0.05 26 0.21
[-3.636,0.4545) 70 0.01 0.58 96 0.79
[0.4545,4.545) 58 0.00 0.48 154 1.27
[4.545,8.6355) 91 0.01 0.75 245 2.01
[8.6355,12.726) 311 0.03 2.55 556 4.57
[12.726,16.817) 648 0.05 5.32 1204 9.89
[16.817,20.907) 857 0.07 7.04 2061 16.93
[20.907,24.998) 1136 0.09 9.33 3197 26.26
[24.998,29.088) 1295 0.11 10.64 4492 36.90
[29.088,33.179) 1661 0.14 13.64 6153 50.55
[33.179,37.269) 2146 0.18 17.63 8299 68.18
[37.269,41.36) 2525 0.21 20.74 10824 88.92
[41.36,45.45) 1349 0.11 11.08 12173 100.00
It was given as a list:
> class(x)
[1] "fdt.multiple" "fdt" "list"
I need to convert it into a data frame object, so I can have a table. How can I do it?
I'm a beginner at using R :(
Since you did not provide a reproducible example of your data, I have used the example from the help page of ?fdt, which is close to what you have.
library(fdth)
mdf <- data.frame(c1=sample(LETTERS[1:3], 1e2, TRUE),
c2=as.factor(sample(1:10, 1e2, TRUE)),
n1=c(NA, NA, rnorm(96, 10, 1), NA, NA),
n2=rnorm(100, 60, 4),
n3=rnorm(100, 50, 4),
stringsAsFactors=TRUE)
fdt <- fdt(mdf, breaks = 'FD', by = 'c1')
class(fdt)
#[1] "fdt.multiple" "fdt" "list"
You can extract the table component from each list element and bind them together:
result <- purrr::map_df(fdt, `[[`, 'table')
#In base R
#result <- do.call(rbind, lapply(fdt, `[[`, 'table'))
result
# Class limits f rf rf(%) cf cf(%)
#1 [8.1781,9.1041) 5 0.20833333 20.833333 5 20.833333
#2 [9.1041,10.03) 6 0.25000000 25.000000 11 45.833333
#3 [10.03,10.956) 10 0.41666667 41.666667 21 87.500000
#4 [10.956,11.882) 3 0.12500000 12.500000 24 100.000000
#5 [53.135,56.121) 4 0.16000000 16.000000 4 16.000000
#6 [56.121,59.107) 8 0.32000000 32.000000 12 48.000000
#7 [59.107,62.092) 8 0.32000000 32.000000 20 80.000000
#....
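If you also want to record which list element each block of rows came from, map_df takes an .id argument (a sketch, assuming the fdt object built above):
result <- purrr::map_df(fdt, `[[`, 'table', .id = 'group')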
I have two xts objects.
The first one "stocks_purchase_dates" contains the opening dates and purchase prices of 4 stocks.
stocks_purchase_dates
stock1 stock2 stock3 stock4
2018-03-19 NA NA NA 165.78
2018-03-21 NA 36.1 NA NA
2018-03-23 23 NA NA NA
2018-03-26 NA NA 48.81 NA
The second one "stocks_prices_mar15_mar28" contains the prices of the 4 stocks for the period March 15 - March 28, 2018.
stocks_prices_mar15_mar28
stock1 stock2 stock3 stock4
2018-03-15 23.30 44.28 54.75 177.34
2018-03-16 23.06 45.12 55.10 176.72
2018-03-19 23.31 44.44 54.31 174.02
2018-03-20 23.75 44.82 54.06 173.96
2018-03-21 23.92 43.19 53.91 170.02
2018-03-22 23.47 41.27 51.68 167.61
2018-03-23 23.43 39.96 49.90 163.73
2018-03-26 24.16 38.27 51.68 171.50
2018-03-27 23.40 37.19 50.10 167.11
2018-03-28 23.27 36.99 50.94 165.26
In "stocks_prices_mar15_mar28", I want to replace the values of each stock before the opening date (given in "stocks_purchase_dates") with 0s.
One possible solution is to replace by column and dates:
stocks_prices_mar15_mar28[,"stock1"]["/2018-03-22", ] <- 0
stocks_prices_mar15_mar28[,"stock2"]["/2018-03-20", ] <- 0
stocks_prices_mar15_mar28[,"stock3"]["/2018-03-25", ] <- 0
stocks_prices_mar15_mar28[,"stock4"]["/2018-03-18", ] <- 0
The output is:
stocks_prices_mar15_mar28
stock1 stock2 stock3 stock4
2018-03-15 0.00 0.00 0.00 0.00
2018-03-16 0.00 0.00 0.00 0.00
2018-03-19 0.00 0.00 0.00 165.78
2018-03-20 0.00 0.00 0.00 173.96
2018-03-21 0.00 36.10 0.00 170.02
2018-03-22 0.00 41.27 0.00 167.61
2018-03-23 23.00 39.96 0.00 163.73
2018-03-26 24.16 38.27 48.81 171.50
2018-03-27 23.40 37.19 50.10 167.11
2018-03-28 23.27 36.99 50.94 165.26
It works, but if we have many more stocks and opening dates it will become laborious and complicated.
Is there any way to accomplish the task more efficiently, for example with apply or for loop or a function from the purrr package?
I used a for loop over the column names. With !is.na(stocks_purchase_dates$stock1) you can find which stock1 record is not NA, and with which you can find the position of that record. With .index you can filter inside an xts object. So we check whether the index of stocks_prices_mar15_mar28 is earlier than the index of the record found in stocks_purchase_dates via which and !is.na, and set those records to 0.
Note that this only works if there is exactly one purchase record per stock in stocks_purchase_dates.
for(i in names(stocks_purchase_dates)) {
  # the purchase date for this stock is the index of its single non-NA record
  open_idx <- index(stocks_purchase_dates[, i])[which(!is.na(stocks_purchase_dates[, i]))]
  stocks_prices_mar15_mar28[, i][.index(stocks_prices_mar15_mar28[, i]) < open_idx] <- 0
}
stocks_prices_mar15_mar28
stock1 stock2 stock3 stock4
2018-03-15 0.00 0.00 0.00 0.00
2018-03-16 0.00 0.00 0.00 0.00
2018-03-19 0.00 0.00 0.00 174.02
2018-03-20 0.00 0.00 0.00 173.96
2018-03-21 0.00 43.19 0.00 170.02
2018-03-22 0.00 41.27 0.00 167.61
2018-03-23 23.43 39.96 0.00 163.73
2018-03-26 24.16 38.27 51.68 171.50
2018-03-27 23.40 37.19 50.10 167.11
2018-03-28 23.27 36.99 50.94 165.26
When copying and replacing values in the code, be careful of all brackets and braces.
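For many stocks, a vectorized sketch (not from the original answer; it assumes exactly one purchase per stock and that both objects use the same index type) is to compare every price date against every opening date at once and zero out the masked cells:
# opening date per stock: position of the first non-NA value in each column
open_dates <- index(stocks_purchase_dates)[apply(!is.na(stocks_purchase_dates), 2, which.max)]
# logical matrix: TRUE where the price date precedes that stock's opening date
mask <- outer(index(stocks_prices_mar15_mar28), open_dates, `<`)
cd <- coredata(stocks_prices_mar15_mar28)
cd[mask] <- 0
stocks_prices_mar15_mar28 <- xts(cd, order.by = index(stocks_prices_mar15_mar28))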
I have a data frame (shown as an image in the original post). For each gill, I would like to find the maximum time for which the diameter is different from 0. I have tried the aggregate function and the dplyr package, but this did not work. A combination of for, if, and aggregate would probably work, but I did not find how to do it.
I'm not sure of the best way to approach this. I'd appreciate any help.
After grouping by 'Gill', subset the 'Time' where 'Diametre' is not 0 and get the max (assuming 'Time' is of numeric class):
library(dplyr)
df1 %>%
group_by(Gill) %>%
summarise(Time = max(Time[Diametre != 0]))
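If instead you want the maximum attached to every row as a new column (the MaxTime column from the question's image), swapping summarise for mutate should work; a sketch under that assumption (gills whose diameter is always 0 would give -Inf with a warning):
df1 %>%
  group_by(Gill) %>%
  mutate(MaxTime = max(Time[Diametre != 0]))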
Here is how you can use aggregate:
> df<- data.frame(
Gill = rep(1:11, each = 2),
diameter = c(0,0,1,0,0,0,73.36, 80.08,1,25.2,53.48,61.21,28.8,28.66,71.2,80.25,44.55,53.50,60.91,0,11,74.22),
time = 0.16
)
> df
Gill diameter time
1 1 0.00 0.16
2 1 0.00 0.16
3 2 1.00 0.16
4 2 0.00 0.16
5 3 0.00 0.16
6 3 0.00 0.16
7 4 73.36 0.16
8 4 80.08 0.16
9 5 1.00 0.16
10 5 25.20 0.16
11 6 53.48 0.16
12 6 61.21 0.16
13 7 28.80 0.16
14 7 28.66 0.16
15 8 71.20 0.16
16 8 80.25 0.16
17 9 44.55 0.16
18 9 53.50 0.16
19 10 60.91 0.16
20 10 0.00 0.16
21 11 11.00 0.16
22 11 74.22 0.16
> # Remove diameter == 0 before aggregate
> dfnew <- df[df$diameter != 0, ]
> aggregate(dfnew$time, list(dfnew$Gill), max )
Group.1 x
1 2 0.16
2 4 0.16
3 5 0.16
4 6 0.16
5 7 0.16
6 8 0.16
7 9 0.16
8 10 0.16
9 11 0.16
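Note that gills whose diameter is always 0 (here 1 and 3) drop out of this result entirely. If you want them listed with NA instead, one option (a sketch, using the df above) is to merge back against the full set of gills:
res <- aggregate(dfnew$time, list(Gill = dfnew$Gill), max)
merge(data.frame(Gill = unique(df$Gill)), res, all.x = TRUE)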
I would use a different approach than the elegant solution that akrun suggested. This method creates the MaxTime column that you show in your image.
#This will split your df into a list of data frames for each gill.
list.df <- split(df1, df1$Gill)
Then you can use lapply to find the maximum of Time for each Gill and then make that value a new column called MaxTime.
library(dplyr)
list.df <- lapply(list.df, function(x) mutate(x, MaxTime = max(x$Time[x$Diametre != 0])))
Then you can combine these split data frames back together using bind_rows():
df1 = bind_rows(list.df)
I have a data frame which looks like this:
times values
1 2013-07-06 20:00:00 0.02
2 2013-07-07 20:00:00 0.03
3 2013-07-09 20:00:00 0.13
4 2013-07-10 20:00:00 0.12
5 2013-07-11 20:00:00 0.03
6 2013-07-14 20:00:00 0.06
7 2013-07-15 20:00:00 0.08
8 2013-07-16 20:00:00 0.07
9 2013-07-17 20:00:00 0.08
There are a few dates missing from the data, and I would like to insert them and to carry over the value from the previous day into these new rows, i.e. obtain this:
times values
1 2013-07-06 20:00:00 0.02
2 2013-07-07 20:00:00 0.03
3 2013-07-08 20:00:00 0.03
4 2013-07-09 20:00:00 0.13
5 2013-07-10 20:00:00 0.12
6 2013-07-11 20:00:00 0.03
7 2013-07-12 20:00:00 0.03
8 2013-07-13 20:00:00 0.03
9 2013-07-14 20:00:00 0.06
10 2013-07-15 20:00:00 0.08
11 2013-07-16 20:00:00 0.07
12 2013-07-17 20:00:00 0.08
...
I have been trying to use a vector of all the dates:
dates <- as.Date(1:length(df),origin = df$times[1])
I am stuck, and can't find a way to do it without a horrible for loop in which I'm getting lost...
Thank you for your help
Some test data (I am using Date, yours seems to be a different type, but this does not affect the algorithm):
data = data.frame(dates = as.Date(c("2011-12-15", "2011-12-17", "2011-12-19")),
values = as.double(1:3))
# Generate **all** timestamps at which you want to have your result.
# I use `seq`, but you may use any other method of generating those timestamps.
alldates = seq(min(data$dates), max(data$dates), 1)
# Filter out timestamps that are already present in your `data.frame`:
# Construct a `data.frame` to append with missing values:
dates0 = alldates[!(alldates %in% data$dates)]
data0 = data.frame(dates = dates0, values = NA_real_)
# Append this `data.frame` and resort in time:
data = rbind(data, data0)
data = data[order(data$dates),]
# forward fill the values
# I would recommend moving this code into a separate `ffill` function
# (it has proved to be very useful in general):
current = NA_real_
data$values = sapply(data$values, function(x) {
current <<- ifelse(is.na(x), current, x); current })
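Following the comment above, the same logic wrapped into a minimal ffill() sketch:
ffill <- function(x) {
  current <- NA_real_
  sapply(x, function(v) {
    current <<- ifelse(is.na(v), current, v)
    current
  })
}
data$values <- ffill(data$values)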
Alternatively, with zoo's na.locf on a merged date grid:
library(zoo)
g <- data.frame(dates=seq(min(data$dates),max(data$dates),1))
na.locf(merge(g,data,by="dates",all.x=TRUE))
or entirely with zoo:
z <- read.zoo(data)
gz <- zoo(, seq(min(time(z)), max(time(z)), "day")) # time grid in zoo
na.locf(merge(z, gz))
Using tidyr's complete and fill, assuming the times column is already of class POSIXct:
library(tidyr)
df %>%
complete(times = seq(min(times), max(times), by = 'day')) %>%
fill(values)
# A tibble: 12 x 2
# times values
# <dttm> <dbl>
# 1 2013-07-06 20:00:00 0.02
# 2 2013-07-07 20:00:00 0.03
# 3 2013-07-08 20:00:00 0.03
# 4 2013-07-09 20:00:00 0.13
# 5 2013-07-10 20:00:00 0.12
# 6 2013-07-11 20:00:00 0.03
# 7 2013-07-12 20:00:00 0.03
# 8 2013-07-13 20:00:00 0.03
# 9 2013-07-14 20:00:00 0.06
#10 2013-07-15 20:00:00 0.08
#11 2013-07-16 20:00:00 0.07
#12 2013-07-17 20:00:00 0.08
data
df <- structure(list(times = structure(c(1373140800, 1373227200, 1373400000,
1373486400, 1373572800, 1373832000, 1373918400, 1374004800, 1374091200
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), values = c(0.02,
0.03, 0.13, 0.12, 0.03, 0.06, 0.08, 0.07, 0.08)), row.names = c(NA,
-9L), class = "data.frame")
Another base R option: merge against a full daily sequence, then fill the NAs with a loop:
df2 <- data.frame(times=seq(min(df$times), max(df$times), by="day"))
df3 <- merge(x=df2, y=df, by="times", all.x=T)
idx <- which(is.na(df3$values))
for (id in idx)
df3$values[id] <- df3$values[id-1]
df3
# times values
# 1 2013-07-06 20:00:00 0.02
# 2 2013-07-07 20:00:00 0.03
# 3 2013-07-08 20:00:00 0.03
# 4 2013-07-09 20:00:00 0.13
# 5 2013-07-10 20:00:00 0.12
# 6 2013-07-11 20:00:00 0.03
# 7 2013-07-12 20:00:00 0.03
# 8 2013-07-13 20:00:00 0.03
# 9 2013-07-14 20:00:00 0.06
# 10 2013-07-15 20:00:00 0.08
# 11 2013-07-16 20:00:00 0.07
# 12 2013-07-17 20:00:00 0.08
You can try a rolling join with data.table; roll = Inf carries the last observation forward:
library(data.table)
setkey(NADayWiseOrders, date)
all_dates <- seq(from = as.Date("2013-01-01"),
to = as.Date("2013-01-07"),
by = "days")
NADayWiseOrders[J(all_dates), roll=Inf]
date orders amount guests
1: 2013-01-01 50 2272.55 149
2: 2013-01-02 3 64.04 4
3: 2013-01-03 3 64.04 4
4: 2013-01-04 1 18.81 0
5: 2013-01-05 2 77.62 0
6: 2013-01-06 2 77.62 0
7: 2013-01-07 2 35.82 2