Create indicator variables of holidays from a date column - r

I am still a bonehead novice, so forgive me if this is a simple question, but I can't find the answer on Stack Overflow. I would like to create a set of indicator variables for each of the major US holidays by applying a function to my date field that detects which days are holidays; I could then use model.matrix() or similar to convert the result to a set of indicator variables.
For example, I have daily data from Jan 1, 2012 through September 15, 2013, and I would like to create an indicator variable for Easter.
I am currently using the timeDate package, passing a year to its Easter() function to find the date. I then type the dates into the following code to create an indicator variable.
Easter(2012)
EasterInd2012<-as.numeric(DATASET$Date=="2012-04-08")

The easiest way to get a general holiday indicator variable is to create a vector of all the holidays you're interested in and then match those dates in your data frame. Something like this should work:
library(timeDate)
# Sample data
Date <- seq(as.Date("2012-01-01"), as.Date("2013-09-15"), by="1 day")
DATASET <- data.frame(rnorm(624), Date)
# Vector of holidays
holidays <- c(as.Date("2012-01-01"),
as.Date(Easter(2013)),
as.Date("2012-12-25"),
as.Date("2012-12-31"))
# 1 if holiday, 0 if not. Could also be a factor, like c("Yes", "No")
DATASET$holiday <- ifelse(DATASET$Date %in% holidays, 1, 0)
You can either manually input the dates, or use some of timeDate's built-in holiday functions (the listHolidays() function shows all those). So you could also construct holidays like so:
holidays <- c(as.Date("2012-01-01"),
as.Date(Easter(2013)),
as.Date(USLaborDay(2012)),
as.Date(USThanksgivingDay(2012)),
as.Date(USMemorialDay(2012)),
as.Date("2012-12-25"),
as.Date("2012-12-31"))
To get specific indicators for each holiday, you'll need to do them one at a time:
EasterInd2012 <- ifelse(DATASET$Date==as.Date(Easter(2012)), 1, 0)
LaborDay2012 <- ifelse(DATASET$Date==as.Date(USLaborDay(2012)), 1, 0)
# etc.
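If one line per holiday gets tedious, the same %in% idea can be looped over a named list of dates. This is only a minimal sketch in base R, assuming the holiday dates have already been collected (typed by hand here; they could equally come from timeDate functions). The list name holiday_dates and the "...Ind" column names are made up for illustration.

```r
# Sample data covering the same range as in the question
Date <- seq(as.Date("2012-01-01"), as.Date("2013-09-15"), by = "1 day")
DATASET <- data.frame(Date = Date)

# Hypothetical named list: one entry per holiday, one date per year in range
holiday_dates <- list(
  NewYear   = as.Date(c("2012-01-01", "2013-01-01")),
  Easter    = as.Date(c("2012-04-08", "2013-03-31")),
  Christmas = as.Date("2012-12-25")
)

# One 0/1 column per holiday, e.g. DATASET$EasterInd
for (h in names(holiday_dates)) {
  DATASET[[paste0(h, "Ind")]] <- as.integer(DATASET$Date %in% holiday_dates[[h]])
}
```

Each indicator column then sums to the number of times that holiday occurs in the data, which is a quick sanity check.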

Related

Filter Data by Seasonal Ranges Over Several Years Based on Month and Day Column in R Studio

I am trying to filter a large dataset so that it contains only results within a range of days and months, repeated over several years, in order to evaluate seasonal objectives. My season is defined as 15 March through 15 September. I can't figure out how to filter the days so that the day limits apply only to March and September and not to the other months within the range. My data frame is very large and contains proprietary information, but I think the most important information is that the dates are described by the columns SampleDate (date formatted as %y%m%d), day (numeric), and month (numeric).
I have tried filtering using multiple conditions like so:
S1 <- S1 %>%
filter((S1$month >= 3 & S1$day >=15) , (S1$month<=9 & S1$day<=15 ))
I also attempted to set ranges using between for every year that I have data with no luck:
S1 %>% filter(between(SampleDate, as.Date("2010-03-15"), as.Date("2010-09-15") &
as.Date("2011-03-15"), as.Date("2011-09-15")&
as.Date("2012-03-15"), as.Date("2012-09-15")&
as.Date("2013-03-15"), as.Date("2013-09-15")&
as.Date("2014-03-15"), as.Date("2014-09-15")&
as.Date("2015-03-15"), as.Date("2015-09-15")&
as.Date("2016-03-15"), as.Date("2016-09-15")&
as.Date("2017-03-15"), as.Date("2017-09-15")&
as.Date("2018-03-15"), as.Date("2018-09-15")))
I am pretty new to R and can't find any solution online. I know there must be a somewhat simple way to do this! Any help is greatly appreciated!
Maybe something like this:
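library(data.table)
library(stringr)  # provides str_sub()
df <- setDT(df)
# convert a date like this '2020-01-01' into this '01-01'
df[,`:=`(month_day = str_sub(date, 6, 10))]
# zero-padded 'MM-DD' strings sort in calendar order, so string comparison works
df[month_day >= '03-15' & month_day <= '09-15']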
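The same month-day comparison can be sketched in base R without extra packages. This assumes SampleDate is already of class Date; the sample rows below are invented purely to show which dates pass the filter.

```r
# Made-up sample data; only the SampleDate column matters here
S1 <- data.frame(SampleDate = as.Date(c("2010-03-14", "2010-03-15", "2011-07-04",
                                        "2012-09-15", "2012-09-16", "2013-12-01")))

# "MM-DD" strings are zero-padded, so they compare in calendar order
month_day <- format(S1$SampleDate, "%m-%d")
in_season <- month_day >= "03-15" & month_day <= "09-15"

# Keep rows from 15 March through 15 September, in any year
S1_season <- S1[in_season, , drop = FALSE]
```

Because the year is dropped before comparing, no per-year date ranges are needed.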

Is there a way to round dates to the next trading day while keeping both the date and variable columns in R?

How can I round the dates in the date column to the following business day? So each Saturday, Sunday and holiday should be transformed to the following business day. Furthermore, how can we include the output from the other columns as well in the transformation to following business days?
I tried this with the bizdays function:
TestDates <- RawTweetDataWSentiment
View(TestDates)
bizdays.options$set(default.calendar="UnitedKingdom/ANBIMA")
cal <- create.calendar("UnitedKingdom/ANBIMA", holidays=holidaysANBIMA, weekdays=c("saturday", "sunday"))
adjust.next(TestDates$Date, cal)
TestDates1 <- adjust.next(TestDates$Date, cal)
View(TestDates1)
This, however, only returns the date column.
Does anyone know how to do this in R?
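One possible approach is to write the adjusted dates back into the Date column of the data frame instead of storing them in a separate object, so the other columns stay attached; with bizdays that would mean assigning the result of adjust.next() to TestDates$Date. The sketch below shows the same idea in base R only, under stated assumptions: next_business_day() is a hypothetical helper (not part of bizdays), and the Sentiment column and the single holiday are invented for illustration.

```r
# Hypothetical helper: roll weekend days and holidays forward to the next business day
next_business_day <- function(dates, holidays = as.Date(character(0))) {
  dates <- as.Date(dates)
  repeat {
    wd  <- as.POSIXlt(dates)$wday                 # 0 = Sunday, 6 = Saturday
    off <- wd %in% c(0, 6) | dates %in% holidays  # non-business days
    if (!any(off)) break
    dates[off] <- dates[off] + 1                  # move one day forward and re-check
  }
  dates
}

# Made-up data: a date column plus one other variable
TestDates <- data.frame(
  Date      = as.Date(c("2021-01-01", "2021-01-02", "2021-01-04")),
  Sentiment = c(0.1, -0.3, 0.5)
)

# Overwrite the column in place so the other columns are kept
TestDates$Date <- next_business_day(TestDates$Date, holidays = as.Date("2021-01-01"))
```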

Subset a dataframe based on numerical values of a string inside a variable

I have a data frame containing a time series of meteorological measurements with monthly resolution from 1961 to 2018. I am interested in the variable that measures the monthly average temperature, since I need the multi-annual average temperature for the summers.
To do this I must extract the fifth and sixth digits from the "DateVariable" column, which are the month.
The values in the time column are formatted like this:
"19610701". So I need the 07 (July) after 1961.
I started by coding this for a single month for other purposes, so I have not tried anything worth mentioning. I guess that grepl could do the work, but I do not know how the "matching" operator works.
So I started with this code that works.
summersmonth<- Df[DateVariable %like% "19610101" | DateVariable %like% "19610201"]
I am expecting a code like this
summermonths <- Df[DateVariable %like% "**06**" | DateVariable%like% "**07**..]
So that all entries with month digit from 06 to 09 are saved in the new dataframe summermonths.
Thanks in advance for any reply or feedback regarding my question.
Update
Thanks to your answers I got the first part, which is to convert the variable with as.Date and derive a month-name column (class character).
Now I need to select the months from June to September.
A horrible way to get the result I wanted is to do several subset and a rbind afterward.
Sommer1<-subset(Df, MonthVar == "Mai")
Sommer2<-subset(Df, MonthVar == "Juli")
Sommer3<-subset(Df, MonthVar == "September")
SummerTotal<-rbind(Sommer1,Sommer2,Sommer3)
I would be very glad to see this written in a tidy way.
Update 2 - Solution
Here is the tidy way, following the question "Using multiple criteria in subset function and logical operators":
Veg_Seas<-subset(Df, subset = MonthVar %in% c("Mai","Juni","Juli","August","September"))
You can convert your date variable to Date class and extract the month:
library(lubridate)  # month() is not base R; data.table also provides one
allmonths <- month(as.Date(Df$DateVariable, format="%Y%m%d"))
Note that if your column was originally imported as a factor, you need to convert it to character first:
allmonths <- month(as.Date(as.character(Df$DateVariable), format="%Y%m%d"))
Then you can check whether each row falls in a summer month:
summersmonth <- Df[allmonths %in% 6:9, ]
Example:
as.Date("20190702", format="%Y%m%d")
[1] "2019-07-02"
month(as.Date("20190702", format="%Y%m%d"))
[1] 7
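For readers without lubridate or data.table loaded, the month can also be pulled out with base R's format(). This is a small sketch with invented sample values (the Temp column is made up); the column name DateVariable follows the question.

```r
# Made-up sample data in the "YYYYMMDD" layout described in the question
Df <- data.frame(
  DateVariable = c("19610501", "19610601", "19610701", "19611001"),
  Temp         = c(12.1, 16.4, 18.2, 9.3)
)

# Parse to Date, then take the month as an integer (characters 5-6 of the string)
mon <- as.integer(format(as.Date(as.character(Df$DateVariable), format = "%Y%m%d"), "%m"))

# Keep June through September
summermonths <- Df[mon %in% 6:9, ]
```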
We can use anydate from anytime to convert to Date class and then extract the month
library(anytime)
month(anydate(as.character(Df$DateVariable)))

Simple time series analysis with R: aggregating and subsetting

I want to convert monthly data into quarterly averages. These are my 2 datasets:
gas <- UKgas
dd <- UKDriverDeaths
I was able to accomplish this (I think) for the dd data like so:
library(zoo)  # provides zoo() and as.yearqtr
dd.zoo <- zoo(dd)
ddq <- aggregate(dd.zoo, as.yearqtr, mean)
However I cannot figure out how to do this with the gas data...any help?
Follow-up
When I try to subset the data based on date (1969-1984) the resulting data does not include 1969 Q1 and instead includes 1985 Q1...any suggestions on how to fix this? I was just trying to subset as gas[1969:1984].
Originally I did not plan to post an answer, as it looks like you did not check your UKgas dataset beforehand to see that it is already a quarterly time series.
But the follow-up question is worth answering. A "ts" object comes with many handy generic functions. We can use window to easily subset a time series. To extract the section between the first quarter of 1969 and the final quarter of 1984, we can use
window(UKgas, start = c(1969,1), end = c(1984,4))
The result will still be a quarterly time series.
On the other hand, if we use "[" for subsetting, we lose object class:
class(UKgas[1:12])
#[1] "numeric"
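Both points can be checked with base R alone, since UKgas and UKDriverDeaths ship with the datasets package. The sketch below uses the aggregate() method for "ts" objects as an alternative to the zoo route in the question; it is one way to do it, not the only one.

```r
# UKgas is already quarterly, so no aggregation is needed
frequency(UKgas)   # 4 observations per year

# Monthly -> quarterly means for the monthly series, staying in the "ts" class
ddq <- aggregate(UKDriverDeaths, nfrequency = 4, FUN = mean)

# Subset by time with window(); the result is still a quarterly "ts"
gas_sub <- window(UKgas, start = c(1969, 1), end = c(1984, 4))
```

Indexing with gas[1969:1984] treats the numbers as positions in the vector rather than as years, which is why window() is the safer tool here.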

Creating a weekend dummy variable

I'm trying to create a dummy variable in my dataset in R for weekend i.e. the column has a value of 1 when the day is during a weekend and a value of 0 when the day is during the week.
I first tried iterating through the entire dataset by row and assigning the weekend variable a 1 if the date is on the weekend. But this takes forever considering there are ~70,000 rows and I know there is a much simpler way, I just can't figure it out.
Below is what I want the data frame to look like. Right now it looks like this except for the weekend column. I don't know if this changes anything, but right now date is a factor. I also have a list of the dates that fall on weekends:
weekend <- c("2/9/2013", "2/10/2013", "2/16/2013", "2/17/2013", ... , "3/2/2013")
date hour weekend
2/10/2013 0 1
2/11/2013 1 0
.... .... ....
Thanks for the help
It might be safer to rely on data structures and functions that are actually built around dates:
dat <- read.table(text = "date hour weekend
+ 2/10/2013 0 1
+ 2/11/2013 1 0",header = TRUE,sep = "")
> weekdays(as.Date(as.character(dat$date),"%m/%d/%Y")) %in% c('Sunday','Saturday')
[1] TRUE FALSE
This is essentially the same idea as SenorO's answer, but we convert the dates to an actual date column and then simply use weekdays, which means we don't need to have a list of weekends already on hand.
DF$IsWeekend <- DF$date %in% weekend
Then if you really prefer 0s and 1s:
DF$IsWeekend <- as.numeric(DF$IsWeekend)
I would first check that my dates really are weekend dates.
weekends <- c("2/9/2013", "2/10/2013", "2/16/2013", "2/17/2013","3/2/2013")
weekends = weekends[ as.POSIXlt(as.Date(weekends,'%m/%d/%Y'))$wday %in% c(0,6)]
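Then, using transform and ifelse, I create the new column (converting date to Date class as well, so both sides of %in% have the same type):
transform(dat ,weekend = ifelse(as.Date(as.character(date),'%m/%d/%Y') %in% as.Date(weekends,'%m/%d/%Y') ,1,0 ))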
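A compact, vectorized variant that avoids both the row-by-row loop and the hand-made list of weekend dates is sketched below. It relies on the wday field of as.POSIXlt (0 = Sunday, 6 = Saturday), which does not depend on the locale the way the day names from weekdays() do. The sample rows are invented for illustration.

```r
# Made-up sample data; date is a factor, as in the question
dat <- data.frame(
  date = factor(c("2/9/2013", "2/10/2013", "2/11/2013")),
  hour = c(0, 0, 1)
)

# Convert factor -> character -> Date, then flag Saturdays and Sundays
d <- as.Date(as.character(dat$date), format = "%m/%d/%Y")
dat$weekend <- as.integer(as.POSIXlt(d)$wday %in% c(0, 6))
```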
