How to convert ARIMA forecast output to Date - r

I have got the output from the ARIMA forecast. But I don't know how to convert the first column to date. The output is the prediction for the last days of 2021 and I wanted to get the date or a day number of 2021. Is it possible?
Following is the first column output which I need to convert:
2021.9589 is the output for 17th Dec.
2021.9616 is the output for 18th Dec.
2021.9644 is the output for 19th Dec.
2021.9671 ... and so on....
2021.9699
2021.9726
2021.9753
2021.9781
2021.9808
2021.9836
2021.9863
2021.989
2021.9918
2021.9945
2021.9973
2022.0000
2022.0027
2022.0055

I would use date_decimal from lubridate. It works like this:
> require(lubridate)
> ( d <- date_decimal(2021.9589) )
[1] "2021-12-16 23:57:50 UTC"
And then use month and day functions:
> month(d)
[1] 12
> day(d)
[1] 16
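The forecast index above is printed to four decimals, so date_decimal can land a minute or two before midnight and report the 16th where the 17th is meant. Below is a minimal base R sketch that rounds to the nearest whole day instead. It assumes the values are daily decimal years; decimal_year_to_date is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: convert decimal years (e.g. 2021.9589) to Dates,
# rounding to the nearest whole day to absorb the 4-decimal printing error.
decimal_year_to_date <- function(x) {
  year   <- floor(x)
  start  <- as.Date(paste0(year, "-01-01"))
  end    <- as.Date(paste0(year + 1, "-01-01"))
  n_days <- as.numeric(end - start)        # 365 or 366
  start + round((x - year) * n_days)
}

x <- c(2021.9589, 2021.9616, 2021.9644, 2022.0000, 2022.0027)
dates <- decimal_year_to_date(x)
format(dates)
# "2021-12-17" "2021-12-18" "2021-12-19" "2022-01-01" "2022-01-02"

# Day number within the year, if that is preferred over a date
day_number <- as.integer(format(dates, "%j"))
day_number
# 351 352 353 1 2
```

With lubridate loaded, rounding the date_decimal result to the nearest day should give the same dates.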

Related

Why does the as.Date function in R convert the years I enter into the current year 2020?

I have some dates in a dataframe, and when I use as.Date() to convert them into dates, the years convert into 2020, which isn't really valid because the file only has data up to 2018.
What I have so far:
> fechadeinsc1[2]
[1] "2020-08-15"
> class(fechadeinsc1)
[1] "Date"
> fechainsc[2]
[1] "2017/99/99"
> class(fechainsc)
[1] "character"
As you can see, fechadeinsc1 was converted into a date and fechainsc is the original vector, whose elements are character strings. "fechadeinsc1" should give the same year, shouldn't it? Even though the days and months aren't valid.
Another example:
> fechadenac1[2]
[1] "2020-12-31"
> class(fechadenac1)
[1] "Date"
> fechanac[2]
[1] "12/31/2016"
> class(fechanac)
[1] "character"
Again, the year changes.
My code:
fechanac <- dat$fecha_nac
fechainsc <- dat$fecha_insc
fechadeinsc1 <- as.Date(fechainsc,tryFormats =c("%d/%m/%y","%m/%d/%y","%y","%d%m%y","%m%d%y"))
fechadenac1 <- as.Date(fechanac,tryFormats =c("%d/%m/%y","%m/%d/%y","%y","%d%m%y","%m%d%y"))
"dat" is the original dataframe which contains information about newborns registered in 2016 and 2017 in Ecuador, if anyone wants the original .csv file please contact me.
According to the documentation for strptime (referenced from ?as.Date), you should use upper-case %Y for 4-digit years:
%y Year without century (00--99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 -- that is the behaviour specified by the 2004 and 2008 POSIX standards, but they do also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’.
%Y Year with century. [...]
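A small sketch of the difference, using made-up sample values rather than the original data: with %y only the first two digits of "2016" are read, which is why every year comes out as 2020.

```r
fechanac <- c("12/31/2016", "01/15/2017")  # made-up sample values

# %y reads only two digits of the year ("20"), so the result is 2020
wrong <- as.Date(fechanac, format = "%m/%d/%y")
# %Y reads the full four-digit year
right <- as.Date(fechanac, format = "%m/%d/%Y")

format(wrong)  # "2020-12-31" "2020-01-15"
format(right)  # "2016-12-31" "2017-01-15"
```

Passing a single explicit format also avoids relying on tryFormats, which picks one format based on the first non-missing element and then applies it to the whole vector.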

Sequential numbering for each month on a period of time in R

I have a set of dates covering about 10 years, from April 2006 to August 2016, i.e. 125 months. I want to identify each month with a sequential number from 1 to 125 in a new column.
Example:
All dates in Apr'2006 will be identified as 1...May'2006 as 2 ...... Aug'2016 as 125.
Dates in the data set is in format type.
Requesting guidance on how to achieve this.
Assume that you start with a vector of dates in factor format:
x<- as.factor(c("8/7/2006", "12/13/2006", "12/14/2006"))
First you should convert this vector to Date format. In your case this can be done like this
x<- as.Date(x, format= "%m/%d/%Y")
Using the format command you can delete the day of a specific date:
format(x, "%Y %m")
[1] "2006 08" "2006 12" "2006 12"
This way you get rid of the day and just keep year and month.
Next, define a reference vector that contains every month from April 2006 to August 2016. Use by = "month" so that each element falls in a distinct calendar month (dates spaced evenly by length.out alone do not line up with month boundaries):
ref <- seq(from = as.Date("04/01/2006", format = "%m/%d/%Y"), by = "month", length.out = 125)
ref <- format(ref, "%Y %m")
Finally, compare the entries of x with the entries of ref. This can be done with sapply, which applies a function to each element of x. Here, the function is:
myfun <- function(z) {
  which(ref == format(z, "%Y %m"))
}
Since you do not need myfun elsewhere, you can plug it directly into the sapply call. sapply already simplifies the result to a vector, so no unlist is needed:
sapply(x, function(z) which(ref == format(z, "%Y %m")))
[1] 5 9 9
This should do the trick.
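The month number can also be computed arithmetically, without a reference vector. This is a sketch under the assumption that April 2006 is month 1; the sample dates are made up for illustration.

```r
x <- as.Date(c("8/7/2006", "12/13/2006", "12/14/2006", "8/28/2016"),
             format = "%m/%d/%Y")

yr <- as.integer(format(x, "%Y"))
mo <- as.integer(format(x, "%m"))

# Months elapsed since April 2006, plus 1 so that April 2006 is month 1
month_index <- (yr - 2006L) * 12L + (mo - 4L) + 1L
month_index
# 5 9 9 125
```

Because everything here is vectorised, this stays fast on long date columns.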
Using lubridate to format the dates:
library(lubridate)
# Create a data frame from the string below, as a factor variable
dat <- '8/7/2006 12/13/2006 12/14/2006 12/15/2006 12/16/2006 8/28/2007 8/29/2007 4/22/2008 4/23/2008 4/24/2008 4/25/2008 4/28/2008 4/29/2008 4/30/2008 5/1/2008 5/2/2008 5/7/2016 5/7/2016 5/7/2016 5/7/2016 6/26/2016 7/4/2016 7/31/2016 8/28/2016'
test_df <- data.frame(original=as.factor(strsplit(dat, ' ')[[1]]))
# We will need to convert the dates to strings in the right format
test_df$converted_string <- as.character(floor_date(mdy(test_df$original), unit="month"))
# Create a lookup table
my_months <- seq(125)
names(my_months) <- seq(as.Date('2006-04-01'), by='month', length.out=125)
# Do the lookup
test_df$converted_int <- my_months[test_df$converted_string]

Low efficiency when I loop in dataframe R

I use the following code to map month names to numbers, and I find it much slower than other data frame computations that do not use a for loop.
Sys.time()
head(df[,4])
for (i in 1:nrow(df)){
df$monthnum[i]<-match(tolower(as.character(df[i,4])), tolower(month.name))
}
Sys.time()
and I got output like this:
> Sys.time()
[1] "2016-03-07 19:20:53 CST"
> dim(df)
[1] 229464 6
> head(df[,4])
[1] January January January January January January
Levels: April August December February January July June March May November October September
> for (i in 1:nrow(df)){
+ df$monthnum[i]<-match(tolower(as.character(df[i,4])), tolower(month.name))
+ }
> Sys.time()
[1] "2016-03-07 19:23:23 CST"
Can anyone explain how a for loop works on a data frame? Any information will be appreciated.
Use the sapply function.
First, create your function:
my_function = function(my_month){
match(tolower(as.character(my_month)), tolower(month.name))
}
then use sapply
sapply(df[,4],my_function)
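Since match and tolower are already vectorised, the whole column can also be passed in one call, with no loop or sapply. A minimal sketch on a made-up data frame (the column name month is an assumption; the original uses the fourth column):

```r
# Made-up example data; in the question the month names sit in column 4
df <- data.frame(id = 1:4,
                 month = factor(c("January", "March", "december", "May")))

# One vectorised call over the whole column
df$monthnum <- match(tolower(as.character(df$month)), tolower(month.name))
df$monthnum
# 1 3 12 5
```

The loop in the question is slow mainly because each df$monthnum[i] <- ... assignment modifies the data frame one element at a time.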

Obtaining last Friday's date

I can get today's date:
Sys.Date( )
But how do I get last Friday's date?
I tried:
library(xts)
date1 <- Sys.Date( )
to.weekly(date1 )
But this gives an error.
I think this should work:
library(lubridate)
Sys.Date() - wday(Sys.Date() + 1)
Try this:
library(zoo)
lastfri(Sys.Date())
where lastfri is the same as the one-line function nextfri in the zoo quickref vignette, except that ceiling is replaced with floor. zoo does not export lastfri, so it has to be defined first. Note that lastfri is vectorized, i.e. it can take a vector of input dates and produces a vector of output dates. For example,
library(zoo)
Sys.Date()
## 2015-03-10
lastfri(Sys.Date() + 0:6)
## [1] "2015-03-06" "2015-03-06" "2015-03-06" "2015-03-13" "2015-03-13"
## [6] "2015-03-13" "2015-03-13"
Thus last Friday was March 6th, and we keep getting March 6th until the day advances to the next Friday, at which point the last Friday is March 13th.
Aside: Next Friday is Friday the 13th.
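For reference, here is a base R sketch of what such a lastfri could look like. It is a reconstruction of the idea, not a copy of the vignette code, and relies on the fact that 1970-01-02 (day number 1) was a Friday.

```r
# Most recent Friday on or before each date in x.
# Fridays are the days whose day number since 1970-01-01 is 1 modulo 7.
lastfri <- function(x) {
  x <- as.Date(x)
  as.Date(7 * floor((as.numeric(x) - 1) / 7) + 1, origin = "1970-01-01")
}

res <- lastfri(as.Date("2015-03-10") + 0:6)
format(res)
# "2015-03-06" "2015-03-06" "2015-03-06" "2015-03-13" "2015-03-13"
# "2015-03-13" "2015-03-13"
```

A date that is itself a Friday is returned unchanged, which matches the output shown above.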
Here is a function that finds the last date for any day of the week:
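library(lubridate)
getlastdate <- function(day) {
  # the seven days before today; exactly one of them is the requested weekday
  dates <- seq(Sys.Date() - 7, Sys.Date() - 1, by = "days")
  dates[wday(dates, label = TRUE) == day]
}
getlastdate("Mon")
# "2015-03-09"
Enter the day of the week in the abbreviated form that wday(dates, label = TRUE) returns. At the time of writing these were:
Sun Mon Tues Wed Thurs Fri Sat
Newer lubridate versions take the abbreviations from the locale (for example Tue and Thu in English), so check the labels on your installation.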
Last Friday was 4 days ago, thus:
> Sys.Date()-4
[1] "2015-03-06"
Or, whatever today's weekday is, using base R (the modulo keeps the offset between 1 and 7, so it also works on Saturdays):
Sys.Date() - ((as.POSIXlt(Sys.Date())$wday + 1) %% 7 + 1)

Create end of the month date from a date variable

I have a large data frame with date variables, which reflect first day of the month. Is there an easy way to create a new data frame date variable that represents the last day of the month?
Below is some sample data:
date.start.month=seq(as.Date("2012-01-01"),length=4,by="months")
df=data.frame(date.start.month)
df$date.start.month
"2012-01-01" "2012-02-01" "2012-03-01" "2012-04-01"
I would like to return a new variable with:
"2012-01-31" "2012-02-29" "2012-03-31" "2012-04-30"
I've tried the following but it was unsuccessful:
df$date.end.month=seq(df$date.start.month,length=1,by="+1 months")
To get the end of months you could just create a Date vector containing the 1st of all the subsequent months and subtract 1 day.
date.end.month <- seq(as.Date("2012-02-01"),length=4,by="months")-1
date.end.month
[1] "2012-01-31" "2012-02-29" "2012-03-31" "2012-04-30"
Here is another solution using the lubridate package:
date.start.month=seq(as.Date("2012-01-01"),length=4,by="months")
df=data.frame(date.start.month)
library(lubridate)
df$date.end.month <- ceiling_date(df$date.start.month, "month") - days(1)
df$date.end.month
[1] "2012-01-31" "2012-02-29" "2012-03-31" "2012-04-30"
This uses the same concept given by James above, in that it gets the first day of the next month and subtracts one day.
By the way, this will work even when the input date is not necessarily the first day of the month. So for example, today is the 27th of the month and it still returns the correct last day of the month:
ceiling_date(Sys.Date(), "month") - days(1)
[1] "2017-07-31"
Use timeLastDayInMonth from the timeDate package:
library(timeDate)
df$eom <- timeLastDayInMonth(df$somedate)
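With lubridate, subtracting one day from the first of a month gives the last day of the previous month:
library(lubridate)
as.Date("2019-09-01") - days(1)
[1] "2019-08-31"
or, to get the last day of the same month, add one month first:
as.Date("2019-09-01") + months(1) - days(1)
[1] "2019-09-30"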
A straightforward solution is to use as.yearmon from the zoo package (which xts loads) and convert back with as.Date and frac = 1. frac is a number between 0 and 1 that indicates the fraction of the way through the period that the result represents.
library(zoo)
as.Date(as.yearmon(seq.Date(as.Date('2017-02-01'), by = 'month', length.out = 6)), frac = 1)
[1] "2017-02-28" "2017-03-31" "2017-04-30" "2017-05-31" "2017-06-30" "2017-07-31"
Or, if you prefer piping with magrittr:
library(magrittr)
seq.Date(as.Date('2017-02-01'), by = 'month', length.out = 6) %>%
as.yearmon() %>% as.Date(frac = 1)
[1] "2017-02-28" "2017-03-31" "2017-04-30" "2017-05-31" "2017-06-30" "2017-07-31"
A function as below would do the work (assume dt is scalar) -
month_end <- function(dt) {
d <- seq(dt, dt+31, by="days")
max(d[format(d,"%m")==format(dt,"%m")])
}
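If you have a vector of Dates, apply it element by element. sapply drops the Date class and returns day counts, so convert the result back:
as.Date(sapply(dates, month_end), origin = "1970-01-01")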
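The same "first day of the next month minus one day" idea can be written so that it is vectorised and needs no packages. This is a sketch; end_of_month is a hypothetical name. It relies on the fact that the first of a month plus 31 days always falls in the following month.

```r
# Last day of the month for each date in d (d may be any day of the month)
end_of_month <- function(d) {
  first_this <- as.Date(format(d, "%Y-%m-01"))               # first of this month
  first_next <- as.Date(format(first_this + 31, "%Y-%m-01")) # first of next month
  first_next - 1
}

d <- as.Date(c("2012-01-01", "2012-02-15", "2012-03-01", "2012-12-10"))
eom <- end_of_month(d)
format(eom)
# "2012-01-31" "2012-02-29" "2012-03-31" "2012-12-31"
```

The result keeps the Date class, so it can be assigned straight to a data frame column.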
You can use timeperiodsR:
date.start.month=seq(as.Date("2012-01-01"),length=4,by="months")
df=data.frame(date.start.month)
df$date.start.month
# install.packages("timeperiodsR")
library(timeperiodsR)
pm <- previous_month(df$date.start.month[1]) # get previous month
start(pm) # first day of previous month
end(pm) # last day of previous month
seq(pm) # vector with all days of previous month
We can also use bsts::LastDayInMonth:
transform(df, date.end.month = bsts::LastDayInMonth(df$date.start.month))
# date.start.month date.end.month
# 1 2012-01-01 2012-01-31
# 2 2012-02-01 2012-02-29
# 3 2012-03-01 2012-03-31
# 4 2012-04-01 2012-04-30
In addition to lubridate, the tidyverse team has released the clock package, which has nice functionality for this:
library(clock)
date_build(2012, 1:12, 31, invalid = "previous")
# [1] "2012-01-31" "2012-02-29" "2012-03-31" "2012-04-30" "2012-05-31" "2012-06-30"
# [7] "2012-07-31" "2012-08-31" "2012-09-30" "2012-10-31" "2012-11-30" "2012-12-31"
The invalid argument specifies what to do with an invalid date (e.g. 2012-02-31). From the documentation:
"previous": The previous valid instant in time.
"previous-day": The previous valid day in time, keeping the time of day.
"next": The next valid instant in time.
"next-day": The next valid day in time, keeping the time of day.
"overflow": Overflow by the number of days that the input is invalid by. Time of day is dropped.
"overflow-day": Overflow by the number of days that the input is invalid by. Time of day is kept.
"NA": Replace invalid dates with NA.
"error": Error on invalid dates.
