How to eliminate warning message when summarizing date based on max(Date)


I am trying to summarize dates by ID based on the max() of ExitDate. When I run the following code, however, I receive this message:
In max.default(structure(NA_real_, class = "Date"), na.rm = TRUE) :
no non-missing arguments to max; returning -Inf
I have imported the data and set the date values using setAs. Using setClass eliminated the initial warning message (as noted in another answer) but I don't know how to eliminate these other warning messages.
Any advice would be greatly appreciated!
setClass("myDate")
setAs("character", "myDate", function(from)
as.Date(from, format = "%m/%d/%Y"))
library(dplyr)
prog <- read.csv("Program.csv",
stringsAsFactors = FALSE,
colClasses = c("EntryDate" = "myDate",
"ExitDate" = "myDate",
"DateUpdated"= "myDate"))
prog2 <- prog %>%
group_by(id, EntryDate) %>%
summarize(new_exit = as.Date(max(ExitDate, na.rm = TRUE), origin ="1970-01-01")) %>%
right_join(prog, by = c("id", "EntryDate"))
id EntryDate ExitDate
1 5 2014-10-06 <NA>
2 5 2014-02-05 2014-02-21
3 3 2014-02-05 2014-02-28
4 3 2014-09-30 2014-11-25
5 3 2014-11-25 <NA>
6 4 2014-10-03 <NA>
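The warning comes from groups in which every ExitDate is NA: with na.rm = TRUE nothing is left for max() to work on, so it returns -Inf and warns. Below is a minimal sketch of one way around it, using the six rows shown above. The name safe_max_date is a hypothetical helper, not part of base R or dplyr, and the grouping is done in base R only to keep the example self-contained.

```r
# Hypothetical helper: return a missing Date instead of calling max() on nothing
safe_max_date <- function(x) {
  if (all(is.na(x))) as.Date(NA) else max(x, na.rm = TRUE)
}

# The six example rows from the question
prog <- data.frame(
  id        = c(5, 5, 3, 3, 3, 4),
  EntryDate = as.Date(c("2014-10-06", "2014-02-05", "2014-02-05",
                        "2014-09-30", "2014-11-25", "2014-10-03")),
  ExitDate  = as.Date(c(NA, "2014-02-21", "2014-02-28",
                        "2014-11-25", NA, NA))
)

# One group per id/EntryDate pair, mirroring the group_by() call
groups   <- split(prog$ExitDate, paste(prog$id, prog$EntryDate))
new_exit <- do.call(c, lapply(groups, safe_max_date))
new_exit
```

In the original pipeline the same helper could presumably replace the max() call, as in summarize(new_exit = safe_max_date(ExitDate)), in which case the as.Date(..., origin = ...) wrapper should no longer be needed.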

Related

mutate variable based on other columns with similar names

I have a df here (the desired output, my starting df does not have the Flag variable):
df <- data.frame(
Person = c('1','2','3'),
Date = as.Date(c('2010-09-30', '2012-11-20', '2015-03-11')),
Treatment_1 = as.Date(c('2010-09-30', '2012-11-21', '2015-03-22')),
Treatment_2 = as.Date(c('2011-09-30', 'NA', '2011-03-22')),
Treatment_3 = as.Date(c('2012-09-30', '2015-11-21', '2015-06-22')),
Surgery_1 = as.Date(c(NA, '2016-11-21', '2015-03-12')),
Surgery_2 = as.Date(c(NA, '2017-11-21', '2019-03-12')),
Surgery_3 = as.Date(c(NA, '2018-11-21', '2013-03-12')),
Flag = c('', 'Y', '')
)
and I want to derive the Flag variable based on these conditions:
For any column that starts with Treatment, set Flag to "" if Date = Treatment
For any column that starts with Surgery, set Flag to "" if Date = Surgery OR Date = Surgery +1 OR Date = Surgery - 1 (basically if the Surgery date is on the day, one day before, or one day after the Date variable, set Flag to "").
else set Flag = "Y"
I've looked into mutate_at but that rewrites the variables and assigns values of True/False.
This is wrong but this is my attempt:
df2 <- df %>%
mutate(Flag = case_when(
vars(starts_with("Treatment"), Date == . ) ~ '',
vars(starts_with("Surgery"), Date == . | Date == . - 1 | Date == . + 1) ~ '',
TRUE ~ 'Y')
)
UPDATE 2022-Aug-22
When I change a cell (Surgery_3 in row 3) to the same date as the Date in row 2:
df <- data.frame(
Person = c('1','2','3'),
Date = as.Date(c('2010-09-30', '2012-11-20', '2015-03-11')),
Treatment_1 = as.Date(c('2010-09-30', '2012-11-21', '2015-03-22')),
Treatment_2 = as.Date(c('2011-09-30', 'NA', '2011-03-22')),
Treatment_3 = as.Date(c('2012-09-30', '2015-11-21', '2015-06-22')),
Surgery_1 = as.Date(c(NA, '2016-11-21', '2015-03-12')),
Surgery_2 = as.Date(c(NA, '2017-11-21', '2019-03-12')),
Surgery_3 = as.Date(c(NA, '2018-11-21', '2012-11-20')),
Flag = c('', 'Y', '')
)
and then re-run the base R solution, the Flag in the second row is no longer "Y", although it should be, because that row meets none of the conditions above.

We can use rowwise and c_across along with any for each condition in case_when. Then, we can make a list for the Date (and +1, -1 days) for Surgery to match.
library(tidyverse)
df %>%
rowwise() %>%
mutate(Flag = case_when(
any(c_across(starts_with("Treatment")) == Date) ~ "",
any(c_across(starts_with("Surgery")) %in% c(Date, (Date +1), (Date-1))) ~ "",
TRUE ~ "Y"
))
Output
Person Date Treatment_1 Treatment_2 Treatment_3 Surgery_1 Surgery_2 Surgery_3 Flag
<chr> <date> <date> <date> <date> <date> <date> <date> <chr>
1 1 2010-09-30 2010-09-30 2011-09-30 2012-09-30 NA NA NA ""
2 2 2012-11-20 2012-11-21 NA 2015-11-21 2016-11-21 2017-11-21 2018-11-21 "Y"
3 3 2015-03-11 2015-03-22 2011-03-22 2015-06-22 2015-03-12 2019-03-12 2013-03-12 ""
Update
Here is a possible base R solution that is a lot quicker than the tidyverse version. It could be done in one line of code, but I decided that readability is better. First, I duplicate the Surgery columns so that we also have +1 day and -1 day versions, and convert all of these columns to character. Then, I subset the Treatment columns and convert them to character as well. I convert to character so that the dates keep a readable form once everything is bound into a single matrix and can be compared there with ==. Then, I bind the date, treatment, and surgery columns together (a). Next, I compare the Date column against every other column and use apply to check row by row whether any comparison is TRUE; ifelse returns "" if so and "Y" if not. Finally, I bind the result back to the original data frame (minus Flag from your original data frame).
dup_names <- colnames(df)[startsWith(colnames(df), "Surgery")]
surgery <-
cbind(df[dup_names], setNames(df[dup_names] + 1, paste0(dup_names, "_range1")))
surgery <-
sapply(cbind(surgery, setNames(df[dup_names] - 1, paste0(
dup_names, "_range2"
))), as.character)
treatment <-
sapply(df[startsWith(colnames(df), "Treatment")], as.character)
a <- cbind(Date = as.character(df$Date), treatment, surgery)
cbind(subset(df, select = -Flag),
Flag = ifelse(apply(a[,1]==a[,2:ncol(a)], 1, any, na.rm = TRUE), "", "Y"))
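For the updated data in the question, the same row-by-row idea can also be sketched directly on date differences, without converting anything to character. This is only an illustration under the question's stated conditions; the intermediate names (treat, surg, treat_hit, surg_hit, hit) are made up for the example.

```r
# Updated example data from the question, without the Flag column
df <- data.frame(
  Person = c("1", "2", "3"),
  Date = as.Date(c("2010-09-30", "2012-11-20", "2015-03-11")),
  Treatment_1 = as.Date(c("2010-09-30", "2012-11-21", "2015-03-22")),
  Treatment_2 = as.Date(c("2011-09-30", NA, "2011-03-22")),
  Treatment_3 = as.Date(c("2012-09-30", "2015-11-21", "2015-06-22")),
  Surgery_1 = as.Date(c(NA, "2016-11-21", "2015-03-12")),
  Surgery_2 = as.Date(c(NA, "2017-11-21", "2019-03-12")),
  Surgery_3 = as.Date(c(NA, "2018-11-21", "2012-11-20"))
)

treat <- df[startsWith(names(df), "Treatment")]
surg  <- df[startsWith(names(df), "Surgery")]

# Each column is compared element-wise with Date, so row i is only
# ever checked against its own Date, never against another row's
treat_hit <- sapply(treat, function(col) as.numeric(col - df$Date) == 0)
surg_hit  <- sapply(surg,  function(col) abs(as.numeric(col - df$Date)) <= 1)

# A row is cleared ("") if any Treatment or Surgery column matched
hit <- rowSums(cbind(treat_hit, surg_hit), na.rm = TRUE) > 0
df$Flag <- ifelse(hit, "", "Y")
df[c("Person", "Date", "Flag")]
```

Because the comparison is element-wise, the 2012-11-20 value in row 3 of Surgery_3 cannot affect the flag of row 2.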

Here is an alternative using across approach:
library(tidyverse)
df %>%
# Compare each column element-wise with Date so that a row is only matched against its own Date
mutate(across(starts_with("Treatment"), ~as.numeric(. == Date), .names ="new_{.col}"),
across(starts_with("Surgery"), ~as.numeric(abs(as.numeric(. - Date)) <= 1), .names ="new_{.col}")) %>%
mutate(Flag = ifelse(rowSums(select(., contains('new')), na.rm = TRUE) >= 1, "", "Y"), .keep="used") %>%
bind_cols(df)
Flag Person Date Treatment_1 Treatment_2 Treatment_3 Surgery_1 Surgery_2 Surgery_3
1 1 2010-09-30 2010-09-30 2011-09-30 2012-09-30 <NA> <NA> <NA>
2 Y 2 2012-11-20 2012-11-21 <NA> 2015-11-21 2016-11-21 2017-11-21 2018-11-21
3 3 2015-03-11 2015-03-22 2011-03-22 2015-06-22 2015-03-12 2019-03-12 2013-03-12

Updated to add data.table approach
If you want a data.table approach, here it is:
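library(data.table)
library(stringr)
setDT(df) # melt() and the join need a data.table; assumes df has no Flag column yet
df[melt(df, id=c(1,2))[,flag:=fifelse(
(str_starts(variable,"T") & value==Date) |
(str_starts(variable,"S") & abs(value-Date)<=1),"", "Y")][
, .(flag=min(flag,na.rm=TRUE)), Person], on=.(Person)]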
Output
Person Date Treatment_1 Treatment_2 Treatment_3 Surgery_1 Surgery_2 Surgery_3 flag
1: 1 2010-09-30 2010-09-30 2011-09-30 2012-09-30 <NA> <NA> <NA>
2: 2 2012-11-20 2012-11-21 <NA> 2015-11-21 2016-11-21 2017-11-21 2018-11-21 Y
3: 3 2015-03-11 2015-03-22 2011-03-22 2015-06-22 2015-03-12 2019-03-12 2013-03-12
I like Andrew's approach, but I was working on this when his answer came in, so here it is in case you are interested
df %>% inner_join(
pivot_longer(df, cols=Treatment_1:Surgery_3) %>%
mutate(flag=case_when(
(str_starts(name,"T") & value==Date) | (str_starts(name,"S") & abs(value-Date)<=1) ~ "",
TRUE ~"Y")) %>%
group_by(Person) %>%
summarize(flag = min(flag))
)
Output:
Person Date Treatment_1 Treatment_2 Treatment_3 Surgery_1 Surgery_2 Surgery_3 flag
1 1 2010-09-30 2010-09-30 2011-09-30 2012-09-30 <NA> <NA> <NA>
2 2 2012-11-20 2012-11-21 <NA> 2015-11-21 2016-11-21 2017-11-21 2018-11-21 Y
3 3 2015-03-11 2015-03-22 2011-03-22 2015-06-22 2015-03-12 2019-03-12 2013-03-12

How to insert NA values in a ts object to fill the gap with another time series?

(I'm new to R) I have two time series with different lengths, one starting in Jan 2011 (ts1) and the other in Jan 2016 (ts2).
How can I fill the time interval "ts1 - ts2" (from Jan 2011 to Dec 2015) in ts2 with NA values to "align" it with ts1?

Say you have two time-series data.tables of different lengths:
library(data.table)
dt1 = data.table(
Date = seq(as.Date('2000-01-01'), as.Date('2000-01-10'), by = 1),
Return1 = rnorm(10)
)
dt2 = data.table(
Date = seq(as.Date('2000-01-05'), as.Date('2000-01-10'), by = 1),
Return2 = rnorm(6)
)
You can call merge() on the two data.tables and supply the variable you want to merge them by, in this case "Date". Furthermore, we pass the all = TRUE argument in order to keep rows whose Date appears in only one of dt1 and dt2.
dtmain = merge(dt1, dt2, by = 'Date', all = TRUE)
> dtmain
Date Return1 Return2
1: 2000-01-01 -2.9934945 NA
2: 2000-01-02 -0.6712139 NA
3: 2000-01-03 0.2146184 NA
4: 2000-01-04 1.2342134 NA
5: 2000-01-05 0.3276646 -2.35205416
6: 2000-01-06 1.1823349 0.39382064
7: 2000-01-07 -0.8771251 0.72213968
8: 2000-01-08 -0.8145120 -0.15433887
9: 2000-01-09 1.0455526 0.05794934
10: 2000-01-10 -1.2378961 -0.49929648
Consider now if you have three or more time-series data.table objects:
dt3 = data.table(
Date = seq(as.Date('2000-01-02'), as.Date('2000-01-8'), by = 1),
Return3 = rnorm(7)
)
If you want to merge them all, you can use the following solution using Reduce():
dtlist = list(dt1, dt2, dt3) # Put your TS objects in a list
by = 'Date' # Declare the variable you want to merge the tables on
dtmain = Reduce(function(...) merge(..., all = TRUE, by = by), dtlist)
> dtmain
Date Return1 Return2 Return3
1: 2000-01-01 0.45667875 NA NA
2: 2000-01-02 -0.84284705 NA 0.7747270
3: 2000-01-03 0.58849764 NA -0.4224948
4: 2000-01-04 -0.76110475 NA -0.7372464
5: 2000-01-05 0.72950287 -0.6800249 -0.6412878
6: 2000-01-06 1.65512675 -0.9477490 0.4073604
7: 2000-01-07 -0.56407002 0.9283520 0.3264292
8: 2000-01-08 0.05535025 1.7146754 0.7125701
9: 2000-01-09 0.06031502 1.2413374 NA
10: 2000-01-10 -0.23840704 0.3846532 NA

Welcome to StackOverflow! In the future please include an example of your data so that we can test the code before providing an answer. In this case, any time series object with different start dates would suffice. I have had to find my own data to answer your question.
First I load stock price data into R with the quantmod package. This returns objects that are of the class xts, which is convenient in this case. I've loaded AAPL, which starts from 2011 and GOOG, which starts from 2016. Now to achieve what you want, the easiest way is to create a new xts object from 2011 to 2016 and fill it with NAs. Then simply combine the shorter time series object with this new time series object that has NAs, in this case GOOG.
library(quantmod)
getSymbols('AAPL', from = "2011-01-01", to = "2019-09-30")
getSymbols("GOOG", from = '2016-01-01', to = "2019-09-30")
new_rows <- nrow(AAPL) - nrow(GOOG)
temp <- matrix(NA, nrow = new_rows, ncol = ncol(GOOG))
temp <- xts(temp, order.by = index(AAPL[1:new_rows,,drop=F]))
column_names <- colnames(GOOG)
GOOG <- rbind(temp, GOOG)
colnames(GOOG) <- column_names
nrow(AAPL)==nrow(GOOG)
[1] TRUE
Now GOOG has the same start date as AAPL, and it has NAs from January 2011 to December 2015.
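If the two series are plain ts objects, as in the question, base R's window() with extend = TRUE may be enough on its own. The sketch below uses simulated monthly data, since the question did not include any; ts1 and ts2 follow the question's naming, and ts2_padded is a name made up for the example.

```r
set.seed(42)

# Simulated monthly series: ts1 covers Jan 2011 - Dec 2019, ts2 covers Jan 2016 - Dec 2019
ts1 <- ts(rnorm(108), start = c(2011, 1), frequency = 12)
ts2 <- ts(rnorm(48),  start = c(2016, 1), frequency = 12)

# extend = TRUE lets window() reach back before ts2's own start;
# the added periods are filled with NA
ts2_padded <- window(ts2, start = start(ts1), extend = TRUE)

# cbind() on ts objects also aligns them on a common time axis
both <- cbind(ts1, ts2)
head(both, 3)
```

After this, ts2_padded starts in January 2011 and has the same length as ts1, with NA for every month before January 2016.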

Matching values between data frames based on overlapping dates

I am currently dealing with the following data structures:
Attributes df:
ID Begin_A End_A Interval Value
1 5 1990-03-01 2017-03-10 1990-03-01 UTC--2017-03-10 UTC Cat1
2 10 1993-12-01 2017-12-02 1993-12-01 UTC--2017-12-02 UTC Cat2
3 5 1991-03-01 2017-03-03 1991-03-01 UTC--2017-03-03 UTC Cat3
4 10 1995-12-05 2017-12-10 1995-12-05 UTC--2017-12-10 UTC Cat4
Bookings df:
ID Begin_A End_A Interval
1 5 2017-03-03 2017-03-05 2017-03-03 UTC--2017-03-05 UTC
2 6 2017-05-03 2017-05-05 2017-05-03 UTC--2017-05-05 UTC
3 8 2017-03-03 2017-03-05 2017-03-03 UTC--2017-03-05 UTC
4 10 2017-12-05 2017-12-06 2017-12-05 UTC--2017-12-06 UTC
As already mentioned in the following post: Matching values conditioned on overlapping Intervals and ID , I intend to do the following data-restructuring: Take the ID from bookings, filter all rows of the attributes data frame where attributes ID matches the booking ID. Check which of the rows with matching attribute ID also have overlapping time intervals (int_overlaps from lubridate). Then take the respective value from the Value column and print each of them in the Attribute_value column.
The intended result would look like this:
ID Begin_A End_A Interval Attribute_value
5 2017-03-03 2017-03-05 2017-03-03 UTC--2017-03-05 UTC Cat1,Cat3
6 2017-05-03 2017-05-05 2017-05-03 UTC--2017-05-05 UTC NA
8 2017-03-03 2017-03-05 2017-03-03 UTC--2017-03-05 UTC NA
10 2017-12-05 2017-12-06 2017-12-05 UTC--2017-12-06 UTC Cat4
ycw already provided a partial answer to this question here:(https://stackoverflow.com/a/46819541/8259308). This solution does not allow long periods between Begin_A and End_A in the attributes data frame, because a vector with individual dates is created with this command:
complete(Date = full_seq(Date, period = 1), ID) %>%
Since my original dataset has a very large number of observations with long time frames in the Attributes data frame, R cannot process the resulting volume of rows. My idea was either to modify the line above so that dates advance in monthly steps (which would also reduce precision) or to try a new approach.
The following code produces the data frames presented above:
library(lubridate)
library(tidyverse)
# Attributes data frame:
date1 <- as.Date(c('1990-3-1','1993-12-1','1991-3-1','1995-12-5'))
date2 <- as.Date(c('2017-3-10','2017-12-2','2017-3-3','2017-12-10'))
attributes <- data.frame(matrix(NA,nrow=4, ncol = 5))
names(attributes) <- c("ID","Begin_A", "End_A", "Interval", "Value")
attributes$ID <- as.numeric(c(5,10,5,10))
attributes$Begin_A <-date1
attributes$End_A <-date2
attributes$Interval <-attributes$Begin_A %--% attributes$End_A
attributes$Value<- as.character(c("Cat1","Cat2","Cat3","Cat4"))
### Bookings data frame:
date1 <- as.Date(c('2017-3-3','2017-5-3','2017-3-3','2017-12-5'))
date2 <- as.Date(c('2017-3-5','2017-5-5','2017-3-5','2017-12-6'))
bookings <- data.frame(matrix(NA,nrow=4, ncol = 4))
names(bookings) <- c("ID","Begin_A", "End_A", "Interval")
bookings$ID <- as.numeric(c(5,6,8,10))
bookings$Begin_A <-date1
bookings$End_A <-date2
bookings$Interval <-bookings$Begin_A %--% bookings$End_A
This is the solution for the previous post provided by ycw:
library(tidyverse)
attributes2 <- attributes %>%
select(-Interval) %>%
gather(Type, Date, ends_with("_A")) %>%
select(-Type) %>%
group_by(Value) %>%
complete(Date = full_seq(Date, period = 1), ID) %>%
ungroup()
bookings2 <- bookings %>%
select(-Interval) %>%
gather(Type, Date, ends_with("_A")) %>%
select(-Type) %>%
group_by(ID) %>%
complete(Date = full_seq(Date, period = 1)) %>%
ungroup()
bookings3 <- bookings2 %>%
left_join(attributes2, by = c("ID", "Date")) %>%
group_by(ID) %>%
summarise(Attribute_value = toString(sort(unique(Value)))) %>%
mutate(Attribute_value = ifelse(Attribute_value %in% "", NA, Attribute_value))
bookings4 <- bookings %>% left_join(bookings3, by = "ID")
bookings4
ID Begin_A End_A Interval Attribute_value
1 5 2017-03-03 2017-03-05 2017-03-03 UTC--2017-03-05 UTC Cat1, Cat3
2 6 2017-05-03 2017-05-05 2017-05-03 UTC--2017-05-05 UTC <NA>
3 8 2017-03-03 2017-03-05 2017-03-03 UTC--2017-03-05 UTC <NA>
4 10 2017-12-05 2017-12-06 2017-12-05 UTC--2017-12-06 UTC Cat4

You may consider data.table which allows for "non-equi joins", i.e. joins based on >=, >, <= and <. In the same call, aggregate operations may be performed on the groups in the LHS data set that each row in the RHS data set (i) matches (by = .EACHI).
d1[d2, on = .(id = id, end >= begin),
.(i.begin, i.end, val_str = toString(val)), by = .EACHI]
# id end i.begin i.end val_str
# 1: 5 2017-03-03 2017-03-03 2017-03-05 Cat3, Cat1
# 2: 6 2017-05-03 2017-05-03 2017-05-05 NA
# 3: 8 2017-03-03 2017-03-03 2017-03-05 NA
# 4: 10 2017-12-05 2017-12-05 2017-12-06 Cat4
Data preparation:
d1 <- data.frame(id = c(5, 10, 5, 10),
begin = as.Date(c('1990-3-1','1993-12-1','1991-3-1','1995-12-5')),
end = as.Date(c('2017-3-10','2017-12-2','2017-3-3','2017-12-10')),
val = c("Cat1", "Cat2", "Cat3", "Cat4"))
d2 <- data.frame(id = c(5, 6, 8, 10),
begin = as.Date(c('2017-3-3','2017-5-3','2017-3-3','2017-12-5')),
end = as.Date(c('2017-3-5','2017-5-5','2017-3-5','2017-12-6')))
library(data.table)
setDT(d1)
setDT(d2)
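Two intervals overlap when each one starts no later than the other ends. The sketch below spells out that two-sided check in base R on the same example data, purely to illustrate the condition; attrs and books are names made up for the example. It compares every booking with every attribute row, so it is not meant for very large inputs, where the same two conditions would presumably go into the on clause of a non-equi join.

```r
attrs <- data.frame(
  id    = c(5, 10, 5, 10),
  begin = as.Date(c("1990-03-01", "1993-12-01", "1991-03-01", "1995-12-05")),
  end   = as.Date(c("2017-03-10", "2017-12-02", "2017-03-03", "2017-12-10")),
  val   = c("Cat1", "Cat2", "Cat3", "Cat4")
)
books <- data.frame(
  id    = c(5, 6, 8, 10),
  begin = as.Date(c("2017-03-03", "2017-05-03", "2017-03-03", "2017-12-05")),
  end   = as.Date(c("2017-03-05", "2017-05-05", "2017-03-05", "2017-12-06"))
)

# For each booking, collect the values of all attribute rows with the
# same id whose interval overlaps the booking interval
books$Attribute_value <- vapply(seq_len(nrow(books)), function(i) {
  hit <- attrs$id == books$id[i] &
    attrs$begin <= books$end[i] &   # attribute starts before booking ends
    attrs$end >= books$begin[i]     # attribute ends after booking starts
  if (any(hit)) toString(attrs$val[hit]) else NA_character_
}, character(1))

books
```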

calculate stats based on dynamic window using dplyr

I am trying to use dplyr in R to calculate rolling stats (mean, sd, etc) based on a dynamic window based on dates and for specific models. For instance, within groupings of items, I would like to calculate the rolling mean for all data 10 days prior. The dates on the data are not sequential and not complete so I can't use a fixed window.
One way to do this is use rollapply referencing the window width as shown below. However, I'm having trouble calculating the dynamic width. I'd prefer a method that omits the intermediate step of calculating the window and simply calculate based on the date_lookback. Here's a toy example.
I've used for loops to do this, but they are very slow.
library(dplyr)
library(zoo)
date_lookback <- 10 #days to look back for rolling calcs
df <- data.frame(label = c(rep("a",5),rep("b",5)),
date = as.Date(c("2017-01-02","2017-01-20",
"2017-01-21","2017-01-30","2017-01-31","2017-01-05",
"2017-01-08","2017-01-09","2017-01-10","2017-01-11")),
data = c(790,493,718,483,825,186,599,408,108,666),stringsAsFactors = FALSE) %>%
mutate(.,
cut_date = date - date_lookback, #calcs based on sample since this date
dyn_win = c(1,1,2,3,3,1,2,3,4,5), ##!! need to calculate this vector??
roll_mean = rollapply(data, align = "right", width = dyn_win, mean),
roll_sd = rollapply(data, align = "right", width = dyn_win, sd))
These are the roll_mean and roll_sd results I'm looking for:
> df
label date data cut_date dyn_win roll_mean roll_sd
1 a 2017-01-02 790 2016-12-23 1 790.0000 NA
2 a 2017-01-20 493 2017-01-10 1 493.0000 NA
3 a 2017-01-21 718 2017-01-11 2 605.5000 159.0990
4 a 2017-01-30 483 2017-01-20 3 564.6667 132.8847
5 a 2017-01-31 825 2017-01-21 3 675.3333 174.9467
6 b 2017-01-05 186 2016-12-26 1 186.0000 NA
7 b 2017-01-08 599 2016-12-29 2 392.5000 292.0351
8 b 2017-01-09 408 2016-12-30 3 397.6667 206.6938
9 b 2017-01-10 108 2016-12-31 4 325.2500 222.3921
10 b 2017-01-11 666 2017-01-01 5 393.4000 245.5928
Thanks in advance.

You could try explicitly referencing your dataset inside the dplyr call:
date_lookback <- 10 #days to look back for rolling calcs
df <- data.frame(label = c(rep("a",5),rep("b",5)),
date = as.Date(c("2017-01-02","2017-01-20",
"2017-01-21","2017-01-30","2017-01-31","2017-01-05",
"2017-01-08","2017-01-09","2017-01-10","2017-01-11")),
data = c(790,493,718,483,825,186,599,408,108,666),stringsAsFactors = FALSE)
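library(dplyr)
# In this data each date/label pair is unique, so every group has one row:
# `date` and `label` are scalars, while df$date, df$label and df$data are the full columns.
df %>%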
group_by(date,label) %>%
mutate(.,
roll_mean = mean(ifelse(df$date >= date-date_lookback & df$date <= date & df$label == label,
df$data,NA),na.rm=TRUE),
roll_sd = sd(ifelse(df$date >= date-date_lookback & df$date <= date & df$label == label,
df$data,NA),na.rm=TRUE))
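The same date-based window can be sketched without dplyr as a small helper, which makes the window logic explicit. roll_stat is a hypothetical function name introduced for this example. It still visits every row once and scans the whole column each time, so it is not expected to be faster than the approach above on large data.

```r
date_lookback <- 10  # days to look back for rolling calcs

df <- data.frame(
  label = c(rep("a", 5), rep("b", 5)),
  date = as.Date(c("2017-01-02", "2017-01-20", "2017-01-21", "2017-01-30",
                   "2017-01-31", "2017-01-05", "2017-01-08", "2017-01-09",
                   "2017-01-10", "2017-01-11")),
  data = c(790, 493, 718, 483, 825, 186, 599, 408, 108, 666),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply `fun` to all values of the same group whose
# date lies within `lookback` days before (and including) the current row's date
roll_stat <- function(dates, values, groups, lookback, fun) {
  vapply(seq_along(dates), function(i) {
    in_win <- groups == groups[i] &
      dates >= dates[i] - lookback &
      dates <= dates[i]
    fun(values[in_win])
  }, numeric(1))
}

df$roll_mean <- roll_stat(df$date, df$data, df$label, date_lookback, mean)
df$roll_sd   <- roll_stat(df$date, df$data, df$label, date_lookback, sd)
df
```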

Automatically expanding data frame with NAs values across any number of columns for missing dates

I'm interested in expanding a data frame with missing values across any number of columns for the periods where data is missing following the data units.
Example
The problem can be easily illustrated with a simple example.
Data
The generated data contains some time series observations, with dates missing at random.
# Data generation
# Seed
set.seed(1)
# Size
sizeDf <- 10
# Populate data frame
dta <- data.frame(
dates = seq(
from = Sys.Date() - (sizeDf - 1),
to = Sys.Date(),
by = 1
),
varA = runif(n = sizeDf),
varB = runif(n = sizeDf),
varC = runif(n = sizeDf)
)
# Delete rows
dta <-
dta[-sample(1:sizeDf, replace = TRUE, size = round(sqrt(sizeDf), 0)),]
Preview
>> dta
dates varA varB varC
1 2016-07-28 0.26550866 0.2059746 0.93470523
2 2016-07-29 0.37212390 0.1765568 0.21214252
3 2016-07-30 0.57285336 0.6870228 0.65167377
4 2016-07-31 0.90820779 0.3841037 0.12555510
7 2016-08-03 0.94467527 0.7176185 0.01339033
8 2016-08-04 0.66079779 0.9919061 0.38238796
9 2016-08-05 0.62911404 0.3800352 0.86969085
10 2016-08-06 0.06178627 0.7774452 0.34034900
Key characteristics
From the perspective of the proposed analysis, the key characteristics are:
The date units, days in this case
Randomly missing dates
Missing dates
seq(
from = Sys.Date() - (sizeDf - 1),
to = Sys.Date(),
by = 1
)[!(seq(
from = Sys.Date() - (sizeDf - 1),
to = Sys.Date(),
by = 1
) %in% dta$dates)]
"2016-08-01" "2016-08-02"
Desired results
The newly created data frame should look like this:
>> dtaNew
dates varA varB varC
1 2016-07-28 0.3337749 0.32535215 0.8762692
2 2016-07-29 0.4763512 0.75708715 0.7789147
3 2016-07-30 0.8921983 0.20269226 0.7973088
4 2016-07-31 0.8643395 0.71112122 0.4552745
5 2016-08-01 NA NA NA
6 2016-08-02 NA NA NA
7 2016-08-03 0.9606180 0.14330438 0.6049333
8 2016-08-04 0.4346595 0.23962942 0.6547239
9 2016-08-05 0.7125147 0.05893438 0.3531973
10 2016-08-06 0.3999944 0.64228826 0.2702601
This was simply obtained with:
dtaNew[dtaNew$dates %in% missDates, 2:4] <- NA
where missDates is taken from the previous seq.
Attempts
Creating vector with all the dates is simple:
allDates <- seq(from = min(dta$dates), to = max(dta$dates), by = 1)
but obviously I cannot just push it to the data frame:
>> dta$allDates <- allDates
Error in `$<-.data.frame`(`*tmp*`, "allDates", value = c(17010, 17011, :
replacement has 10 rows, data has 8
One possible solution would be a loop that pushes a row of NA values into the data frame for each date identified as missing, but this is grossly inefficient and messy.
To sum up, I'm interested in achieving the following:
Expanding the data frame with all the dates following the same unit, i.e. for missing daily data days are added, for missing quarterly data quarters are added.
I would then like to push NA values across all the columns of the data frame wherever a missing date was found.

If I understand your question, you can use rbind.fill from the plyr package to get your desired output:
sizeDf <- 10
# Populate data frame
dta <- data.frame(
dates = seq(
from = Sys.Date() - (sizeDf - 1),
to = Sys.Date(),
by = 1
),
varA = runif(n = sizeDf),
varB = runif(n = sizeDf),
varC = runif(n = sizeDf)
)
# Delete rows
dta <-dta[-sample(1:sizeDf, replace = TRUE, size = round(sqrt(sizeDf), 0)),]
#Get missing dates
missing_dates <- seq(from=min(dta$dates), to=max(dta$dates), by=1)[!(seq(from=min(dta$dates), to=max(dta$dates), by=1) %in% dta$dates)]
#Create the new dataset by using plyr's rbind.fill function
dta_new <- plyr::rbind.fill(dta,data.frame(dates=missing_dates))
#Order the data by the dates column
dta_new <- dta_new[order(dta_new$dates),]
#Print it
print(dta_new, row.names = F, right = F)
dates varA varB varC
2016-07-28 0.837859418 0.2966637 0.61245244
2016-07-29 0.144884547 0.9284294 0.11033990
2016-07-30 NA NA NA
2016-07-31 NA NA NA
2016-08-01 0.003167049 0.9096805 0.29239470
2016-08-02 0.574859760 0.1466993 0.69541969
2016-08-03 NA NA NA
2016-08-04 0.748639215 0.9602836 0.67681826
2016-08-05 0.983939562 0.4867804 0.35270309
2016-08-06 0.383366957 0.2241982 0.09244522
I hope this helps.
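A base R alternative that avoids the extra package is to build the full date sequence as its own data frame and left-join the data onto it with merge(). The sketch below uses fixed dates instead of Sys.Date() so that it is reproducible; all_dates and dta_new are names chosen for the example.

```r
set.seed(1)

# Ten consecutive days of data, then two days removed to mimic the gaps
dta <- data.frame(
  dates = seq(as.Date("2016-07-28"), by = "day", length.out = 10),
  varA = runif(10),
  varB = runif(10),
  varC = runif(10)
)
dta <- dta[-c(5, 6), ]

# Full sequence between the first and last observed date
all_dates <- data.frame(dates = seq(min(dta$dates), max(dta$dates), by = "day"))

# all.x = TRUE keeps every date; columns are NA where dta has no row
dta_new <- merge(all_dates, dta, by = "dates", all.x = TRUE)
dta_new
```

This works for any number of value columns. For other units, the by argument of seq() accepts strings such as "month" or "quarter", provided the dates in the data fall on that grid.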
