Dealing with NAs in vectors with user-defined functions - r

I'm trying to create a function in R that replicates Excel's EOMonth function (i.e. you enter a date and a number of months and it returns the end-of-month date for that number of months before / after the input date). I've got a function that works on a single input using the lubridate package:
EOMonth <- function(date, months)
{
NewDate <- date %m+% months(months)
NewDate <- ceiling_date(NewDate, "month") - days(1)
}
The problem is how to 'vectorise' this (not sure that's the right word). When I pass the function a vector (in this case a column in a data frame), I get the following message:
NAs are not allowed in subscripted assignments
I don't want to ignore any NAs in the vector (because I am sending the results to a new column in the data frame). I just want the function, when it sees an NA, to return an NA, but to process all the valid dates as the function dictates. I'm really confused as to how to do this; most of the posts I have seen on this topic relate to how to ignore / remove NAs from the results.
Any help would be greatly appreciated.
Thanks.
Edit. Added in some sample data. Below is sample input data:
01/07/2016
NA
22/07/2016
NA
30/06/2016
22/07/2016
22/07/2016
29/07/2016
NA
22/07/2016
30/06/2016
NA
31/01/2016
02/08/2016
So, entering the following:
newVector <- EOMonth(OldVector, 3)
Should return the end of the month for each of the dates in 3 months' time:
31/10/2016
NA
31/10/2016
NA
30/09/2016
31/10/2016
31/10/2016
31/10/2016
NA
31/10/2016
30/09/2016
NA
30/04/2016
30/11/2016

One solution is to first make a vector of NAs and then process only the non-NA elements of date. Note that the NA vector needs to be of class Date; otherwise the dates are coerced to numeric when they are assigned into it.
library(lubridate)
EOMonth <- function(date, months)
{
# start from an all-NA vector that already has class Date
NewDate <- as.Date(rep(NA, length(date)))
NewDate[!is.na(date)] <- date[!is.na(date)] %m+% months(months)
NewDate[!is.na(date)] <- ceiling_date(NewDate[!is.na(date)], "month") - days(1)
NewDate
}
EOMonth(OldVector, 3)
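For anyone without lubridate, the same "prefill with NA, then assign only the valid elements" pattern can be sketched in base R. This is only an illustration under assumptions: the name eomonth_base and its argument n are hypothetical, the input is assumed to already be of class Date, and the month arithmetic goes through POSIXlt fields instead of %m+%.

```r
# Hypothetical base R sketch; eomonth_base and n are illustrative names.
eomonth_base <- function(date, n) {
  # Start from an all-NA vector that already has class Date
  out <- as.Date(rep(NA, length(date)))
  ok <- !is.na(date)
  # Move to the first day of the month after the target month ...
  lt <- as.POSIXlt(date[ok])
  lt$mday[] <- 1L
  lt$mon <- lt$mon + as.integer(n) + 1L
  # ... then step back one day to land on the month end
  out[ok] <- as.Date(lt) - 1
  out
}

old_vector <- as.Date(c("01/07/2016", NA, "30/06/2016", "31/01/2016"),
                      format = "%d/%m/%Y")
format(eomonth_base(old_vector, 3), "%d/%m/%Y")
```

The NA in the second position stays NA, and the result keeps class Date, so it can be assigned straight into a new data frame column.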

Related

Replacement of missing day and month in dates using R

This question is about how to replace missing days and months in a data frame using R. Considering the data frame below, 99 denotes missing day or month and NA represents dates that are completely unknown.
df<-data.frame("id"=c(1,2,3,4,5),
"date" = c("99/10/2014","99/99/2011","23/02/2016","NA",
"99/04/2009"))
I am trying to replace the missing days and months based on the following criteria:
For dates with missing day but known month and year, the replacement date would be a random selection from the middle of the interval (first day to the last day of that month). Example, for id 1, the replacement date would be sampled from the middle of 01/10/2014 to 31/10/2014. For id 5, this would be the middle of 01/04/2009 to 30/04/2009. Of note is the varying number of days for different months, e.g. 31 days for October and 30 days for April.
As in the case of id 2, where both day and month are missing, the replacement date is a random selection from the middle of the interval (first day to last day of the year), e.g 01/01/2011 to 31/12/2011.
Please note: complete dates (e.g. the case of id 3) and NAs are not to be replaced.
I have tried by making use of the seq function together with the as.POSIXct and as.Date functions to obtain the sequence of dates from which the replacement dates are to be sampled. The difficulty I am experiencing is how to automate the R code to obtain the date intervals (it varies across distinct id) and how to make a random draw from the middle of the intervals.
The expected output would have the date of id 1, 2 and 5 replaced but those of id 3 and 4 remain unchanged. Any help on this is greatly appreciated.
This isn't the prettiest, but it seems to work and adapts to differing month and year lengths:
set.seed(999)
df$dateorig <- df$date
seld <- grepl("^99/", df$date)
selm <- grepl("^../99", df$date)
md <- seld & (!selm)
mm <- seld & selm
df$date <- as.Date(gsub("99","01",as.character(df$date)), format="%d/%m/%Y")
monrng <- sapply(df$date[md], function(x) seq(x, length.out=2, by="month")[2]) - as.numeric(df$date[md])
df$date[md] <- df$date[md] + sapply(monrng, sample, 1)
yrrng <- sapply(df$date[mm], function(x) seq(x, length.out=2, by="12 months")[2]) - as.numeric(df$date[mm])
df$date[mm] <- df$date[mm] + sapply(yrrng, sample, 1)
#df
# id date dateorig
#1 1 2014-10-14 99/10/2014
#2 2 2011-02-05 99/99/2011
#3 3 2016-02-23 23/02/2016
#4 4 <NA> NA
#5 5 2009-04-19 99/04/2009

Subset dataframe in r for a specific month and date

I have a dataframe that looks like this:
V1 V2 V3 Month_nr Date
1 2 3 1 2017-01-01
3 5 6 1 2017-01-02
6 8 9 2 2017-02-01
6 8 9 8 2017-08-01
and I want to take all variables from the data set that have Month=1 (January) and date from 2017-01-01 til 2017-01-31 (so end of January), which means that I want to take the dates as well. I would create a column with days but I have multiple observations for one day and this would be even more confusing. I tried it with this:
df<- filter(df,df$Month_nr == 1, df$Date > 2017-01-01 && df$Date < 2017-01-31)
but it did not work. I would really appreciate your help! I am desperate at this point. My dataset has measurements for an entire year (months 1 to 12), which is why I filter by month.
The problem is that you didn't put quotation marks around 2017-01-01. Directly putting 2017-01-01 will compute the subtraction and return a number, and then you're comparing a string to a number. You can compare string to string; with string, "2" is still greater than "1", so it would work for comparing dates as strings. BTW, you don't need to write df$ when using filter; you can directly write the column names without quoting when using the tidyverse.
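A small base R sketch of the point about unquoted dates, using the four dates from the question (the variable names dates and in_january are just for illustration). It also uses the element-wise & rather than &&, since && is meant for single logical values:

```r
# Unquoted, 2017-01-01 is plain arithmetic: 2017 minus 1 minus 1
unquoted <- 2017-01-01
unquoted  # 2015

# Comparing real Date objects gives one TRUE/FALSE per row
dates <- as.Date(c("2017-01-01", "2017-01-02", "2017-02-01", "2017-08-01"))
in_january <- dates >= as.Date("2017-01-01") & dates <= as.Date("2017-01-31")
in_january  # TRUE TRUE FALSE FALSE
```

The resulting logical vector can then be used to subset the data frame, for example with df[in_january, ].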
Why do you need to have the month as well as dates in the filter? Just the filter on the dates would work fine. However, you will have to convert the date column into a date object. You can do that as follows:
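# convert the Date column from the question to class Date
df$Date <- as.Date(df$Date, format = "%Y-%m-%d")
# keep rows from 1 January to 31 January 2017, inclusive
df_new <- subset(df, Date >= as.Date("2017-01-01") & Date <= as.Date("2017-01-31"))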

R: as.POSIXct returns NA

I am new to manipulating time in R. I am trying to turn a character string a="201304021500" into POSIXct form using the following code:
# First I turn my character string a into the format "2013/04/02 15:00"
annee=substr(a,1,4)
mois=substr(a,5,6)
jour=substr(a,7,8)
heure=substr(a,9,10)
# minutes (this line was missing; assumed to be characters 11-12 of a)
min=substr(a,11,12)
a=paste(paste(annee,mois,jour,sep="/"),paste(heure,min,sep=":"),sep=" ")
# Then I use the as.POSIXct function (as follows), which unfortunately returns NA, even though "2013/04/02 15:00" does not fall on a DST change in France (where I am working)
> as.POSIXct(a,format="%y/%m/%d %h:%m",tz="Europe/Paris")
[1] NA
How can I solve this problem? Much appreciated!
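One likely cause, offered as a sketch and not a confirmed diagnosis: the format codes are case-sensitive. In strptime-style formats %y is a two-digit year, %h is an abbreviated month name and %m is the month, so "%y/%m/%d %h:%m" does not describe the string. %Y, %H and %M are the four-digit year, the hour and the minute. Under that assumption the original string can be parsed directly, without rebuilding it first (the name parsed is just for illustration):

```r
a <- "201304021500"
# %Y = 4-digit year, %m = month, %d = day, %H = hour (00-23), %M = minute
parsed <- as.POSIXct(a, format = "%Y%m%d%H%M", tz = "Europe/Paris")
format(parsed, "%Y/%m/%d %H:%M")  # "2013/04/02 15:00"
```

If you prefer to keep the slash-separated string, the matching format would be "%Y/%m/%d %H:%M".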

R turn irregular time interval into regular ones using previous numbers

I have an irregular time interval like this:
df=data.frame(Date=c("2013-01-08","2013-01-11","2013-01-13","2013-01-21","2013-02-06"), runningtotal=c(800,910,1060,1210,660))
I found that, as a zoo object, it can be merged with a regular time interval, filling in 0 for missing values. However, I need to fill in the previous value instead, except at the start of each month, where it should be 0. So the end output is like this:
date runningtotal
2013-01-01 0
2013-01-02 0
...
2013-01-08 800
2013-01-09 800
2013-01-10 800
2013-01-11 910
2013-01-12 910
2013-01-13 1060
...
2013-02-01 0
And also, does it make sense to fill in value like this for forecasting purpose?
Thanks.
Try approxfun with the constant method. I don't have lubridate and just deal with regular Date objects. For instance:
df<-data.frame(Date=c("2013-01-08","2013-01-11","2013-01-13","2013-01-21","2013-02-06"), runningtotal=c(800,910,1060,1210,660))
df$Date<-as.Date(as.character(df$Date))
#create some new dates
newDates<-seq(df$Date[1],df$Date[5],length.out=10)
intfun<-approxfun(df$Date,df$runningtotal,method="constant",yleft=0,yright=0)
data.frame(newDates,intfun(newDates))
I would use na.locf from the zoo package, but you should prepare the data before applying it.
## generate a vector of dates, starting on the first day of the first month
## (day() and day<- are assumed to come from lubridate; DF$Date is assumed to be a date-time column)
mm <- min(DF$Date)
day(mm) <- 1
seq_dates <- seq.POSIXt(mm,max(DF$Date),by='days')
## add zero values for the beginning of each month
DF <- rbind(DF,data.frame(Date=seq_dates[day(seq_dates)==1],runningtotal=0))
library(zoo)
## merge with the sequence of dates, and apply na.locf to carry previous values forward
na.locf(merge(seq_dates,DF,by=1,all.x=TRUE))
The idea is that na.locf replaces each missing value with the previous non-missing value. Merging your data with a full sequence of dates (from the start of the first month to the last date) is what inserts those missing values in the first place.
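The same merge-then-carry-forward idea can be sketched in base R on the data from the question. Treat this as an illustration under assumptions, not as the zoo approach itself: the carry-forward step is a cumsum indexing trick standing in for na.locf, and the names first_day, all_dates, missing_starts, zeros, filled and seen are made up for the example.

```r
df <- data.frame(
  Date = as.Date(c("2013-01-08", "2013-01-11", "2013-01-13",
                   "2013-01-21", "2013-02-06")),
  runningtotal = c(800, 910, 1060, 1210, 660)
)

# Daily sequence from the first day of the first month to the last observation
first_day <- as.Date(format(min(df$Date), "%Y-%m-01"))
all_dates <- seq(first_day, max(df$Date), by = "day")

# Zero rows for month starts that have no observation of their own
month_starts <- all_dates[format(all_dates, "%d") == "01"]
missing_starts <- month_starts[!month_starts %in% df$Date]
zeros <- data.frame(Date = missing_starts,
                    runningtotal = rep(0, length(missing_starts)))

# The left join leaves NA on every day without a value
filled <- merge(data.frame(Date = all_dates), rbind(df, zeros),
                by = "Date", all.x = TRUE)

# Carry the last non-missing value forward; the first row is a month start,
# so there is always an earlier value to carry
seen <- !is.na(filled$runningtotal)
filled$runningtotal <- filled$runningtotal[seen][cumsum(seen)]
head(filled, 10)
```

Because every month start is forced to 0 before the fill, the running total restarts at each month boundary, as in the requested output.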

How to align dates for merging two xts files?

I'm trying to analyze 1-year %-change data in R on two data series by merging them into one file. One series is weekly and the other is monthly. Converting the weekly series to monthly is the problem. Using apply.monthly() on the weekly data creates a monthly file but with intra-monthly dates that don't match the first-day-of-month format in the monthly series after combining the two files via merge.xts(). Question: How to change the resulting merged file (sample below) to one monthly entry for both series?
2012-11-01 0.02079801 NA
2012-11-24 NA -0.03375796
2012-12-01 0.02052502 NA
2012-12-29 NA 0.04442094
2013-01-01 0.01881466 NA
2013-01-26 NA 0.06370272
2013-02-01 0.01859883 NA
2013-02-23 NA 0.02999318
You can pass indexAt="firstof" in a call to to.monthly to get monthly data using the first of the month for the index.
library(quantmod)
getSymbols(c("USPRIV", "ICSA"), src="FRED")
merge(USPRIV, to.monthly(ICSA, indexAt="firstof", OHLC=FALSE))
Something like this:
do.call(rbind, by(d[-1], d[[1]] - as.POSIXlt(d[[1]])$mday, FUN=apply, 2, sum, na.rm=TRUE))
## V2 V3
## 2012-10-31 0.02079801 -0.03375796
## 2012-11-30 0.02052502 0.04442094
## 2012-12-31 0.01881466 0.06370272
## 2013-01-31 0.01859883 0.02999318
Note that the dates are encoded as row names, not as a column in the result.
This is a frequently occurring issue. Sometimes I forget my own solution for it, and Google does not easily lead to one, so I am posting my solution here.
Basically, just convert the index of the monthly aggregated series to yearmon. You can also optionally convert it back to yyyy-mm-dd format (the 1st of each month) with as.Date. Once the exact dates are stripped and the indices are 'homogenised', all the columns align perfectly.
# Here with the pipe from dplyr
time(myxts) <- time(myxts) %>% as.yearmon() %>% as.Date()
# or without dplyr
time(myxts) <- as.Date( as.yearmon( time(myxts) ) )
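To see why homogenising the index makes the series line up, here is a base R sketch using plain data frames in place of xts objects. That is a simplifying assumption, and the names monthly, weekly and month are made up for the example. The values are the ones from the sample in the question.

```r
# Monthly series, already indexed on the first of each month
monthly <- data.frame(
  month = as.Date(c("2012-11-01", "2012-12-01", "2013-01-01", "2013-02-01")),
  V2 = c(0.02079801, 0.02052502, 0.01881466, 0.01859883)
)

# Aggregated weekly series, indexed on intra-month dates
weekly <- data.frame(
  month = as.Date(c("2012-11-24", "2012-12-29", "2013-01-26", "2013-02-23")),
  V3 = c(-0.03375796, 0.04442094, 0.06370272, 0.02999318)
)

# Replace the day with 01 so every date maps to the first of its month
weekly$month <- as.Date(format(weekly$month, "%Y-%m-01"))

# With matching keys, the merge gives one row per month and no NAs
aligned <- merge(monthly, weekly, by = "month")
aligned
```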
