Creating a weekend dummy variable - r

I'm trying to create a dummy variable in my dataset in R for weekend i.e. the column has a value of 1 when the day is during a weekend and a value of 0 when the day is during the week.
I first tried iterating through the entire dataset row by row, assigning the weekend variable a 1 if the date falls on a weekend. But this takes forever with ~70,000 rows, and I know there is a much simpler way; I just can't figure it out.
Below is what I want the dataframe to look like. Right now it looks like this except for the weekend column. I don't know if this changes anything, but right now date is a factor. I also have a list of the dates that fall on weekends:
weekend <- c("2/9/2013", "2/10/2013", "2/16/2013", "2/17/2013", ... , "3/2/2013")
date hour weekend
2/10/2013 0 1
2/11/2013 1 0
.... .... ....
Thanks for the help

It might be safer to rely on data structures and functions that are actually built around dates:
dat <- read.table(text = "date hour weekend
2/10/2013 0 1
2/11/2013 1 0", header = TRUE, sep = "")
weekdays(as.Date(as.character(dat$date), "%m/%d/%Y")) %in% c('Sunday', 'Saturday')
[1] TRUE FALSE
This is essentially the same idea as SenorO's answer, but we convert the dates to an actual date column and then simply use weekdays, which means we don't need to have a list of weekends already on hand.
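One caveat with weekdays() is that it returns day names in the current locale, so comparing against 'Saturday' and 'Sunday' can silently fail on a non-English system. A minimal sketch of a locale-independent variant follows. It assumes a data frame named DF with a factor column date in month/day/year format (the sample rows are made up for illustration) and uses the numeric day of week from as.POSIXlt instead of day names.

```r
# Hypothetical sample data mirroring the question; `date` is a factor here.
DF <- data.frame(date = factor(c("2/9/2013", "2/10/2013", "2/11/2013")),
                 hour = c(0, 1, 2))

# Convert factor -> character -> Date, then read the numeric day of week.
d <- as.Date(as.character(DF$date), format = "%m/%d/%Y")
wday <- as.POSIXlt(d)$wday   # 0 = Sunday, 1 = Monday, ..., 6 = Saturday

# 1 for Saturday/Sunday, 0 otherwise; vectorized, so no row loop is needed.
DF$weekend <- as.integer(wday %in% c(0, 6))
DF
```

Because every step is vectorized, this should stay fast on the ~70,000 rows mentioned in the question.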

DF$IsWeekend <- DF$date %in% weekend
Then if you really prefer 0s and 1s:
DF$IsWeekend <- as.numeric(DF$IsWeekend)

I would first check that my dates really are weekend dates.
weekends <- c("2/9/2013", "2/10/2013", "2/16/2013", "2/17/2013","3/2/2013")
weekends = weekends[ as.POSIXlt(as.Date(weekends,'%m/%d/%Y'))$wday %in% c(0,6)]
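Then, using transform and ifelse, I create the new column (date is converted to Date first so that both sides of %in% have the same class):
transform(dat, weekend = ifelse(as.Date(as.character(date), '%m/%d/%Y') %in% as.Date(weekends, '%m/%d/%Y'), 1, 0))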

Related

Subset a dataframe based on numerical values of a string inside a variable

I have a data frame which is a time series of meteorological measurement with monthly resolution from 1961 till 2018. I am interested in the variable that measures the monthly average temperature since I need the multi-annual average temperature for the summers.
To do this I must extract the fifth and sixth digits from the "DateVariable" column, which encode the month.
The values in the time column are formatted like this:
"19610701". So I need the 07 (July) after 1961.
I only started coding this for a single month, for other purposes, so I have not tried anything worth mentioning. I guess that grepl could do the work, but I do not know how the "matching" operator works.
So I started with this code, which works:
summersmonth <- Df[DateVariable %like% "19610101" | DateVariable %like% "19610201"]
I am hoping for code like this:
summermonths <- Df[DateVariable %like% "**06**" | DateVariable %like% "**07**"..]
So that all entries with month digit from 06 to 09 are saved in the new dataframe summermonths.
Thanks in advance for any reply or feedback regarding my question.
Update
Thanks to your answers I got the first part, which is to convert the variable with as.Date and format it as a month name (class character).
Now I need to select the months from June to September.
A horrible way to get the result I wanted is to do several subsets and an rbind afterward.
Sommer1<-subset(Df, MonthVar == "Mai")
Sommer2<-subset(Df, MonthVar == "Juli")
Sommer3<-subset(Df, MonthVar == "September")
SummerTotal<-rbind(Sommer1,Sommer2,Sommer3)
I would be very glad to see this written in a tidy way.
Update 2 - Solution
Here is the tidy way, as described in "Using multiple criteria in subset function and logical operators":
Veg_Seas<-subset(Df, subset = MonthVar %in% c("Mai","Juni","Juli","August","September"))
You can convert your date variable to Date class and take the month (month() is not base R; it is available in lubridate, and data.table has one too):
library(lubridate)
allmonths <- month(as.Date(Df$DateVariable, format="%Y%m%d"))
Note that if your column was originally imported as a factor, you need to convert it to character first:
allmonths <- month(as.Date(as.character(Df$DateVariable), format="%Y%m%d"))
Then you can check whether it is a summermonth:
summersmonth <- Df[allmonths %in% 6:9, ]
Example:
as.Date("20190702", format="%Y%m%d")
[1] "2019-07-02"
month(as.Date("20190702", format="%Y%m%d"))
[1] 7
We can use anydate from anytime to convert to Date class and then extract the month
library(anytime)
month(anydate(as.character(Df$DateVariable)))
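Since the question literally asks for the fifth and sixth characters, a base R sketch that skips date parsing altogether may also be useful. It assumes DateVariable holds strings of the form "YYYYMMDD"; the sample rows and the Temp values below are made up purely for illustration.

```r
# Hypothetical sample data; Temp values are invented for the example.
Df <- data.frame(
  DateVariable = c("19610101", "19610601", "19610701", "19611001", "20180901"),
  Temp = c(-1.2, 16.4, 18.1, 9.3, 14.8),
  stringsAsFactors = FALSE
)

# Characters 5 and 6 of "YYYYMMDD" are the month; as.character() guards
# against the column having been imported as a factor.
mm <- substr(as.character(Df$DateVariable), 5, 6)

# Keep June through September.
summermonths <- Df[mm %in% c("06", "07", "08", "09"), ]
summermonths
```

This avoids extra packages, at the cost of trusting that every value really follows the eight-digit layout.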

Next week day for a given vector of dates

I'm trying to get the next week day for a vector of dates in R. My approach was to create a vector of week days and then find the one closest to the weekend date I have. The problem is that for Saturdays and some holidays (of which there are a lot in my country) I end up getting the previous week day, which doesn't work.
This is an example of my problem:
vecDates = as.Date(c("2011-01-11","2011-01-12","2011-01-13","2011-01-14","2011-01-17","2011-01-18",
"2011-01-19","2011-01-20","2011-01-21","2011-01-24"))
testDates = as.Date(c("2011-01-22","2011-01-23"))
findInterval(testDates,vecDates)
For both dates the correct answer should be 10, which is "2011-01-24", but I get 9.
I thought of a solution where I remove all the dates before the one I'm analyzing and then use findInterval. It works, but it is not vectorized and is therefore too slow for my actual purpose.
Does this do what you want?
vecDates = as.Date(c("2011-01-11","2011-01-12",
"2011-01-13","2011-01-14",
"2011-01-17","2011-01-18",
"2011-01-19","2011-01-20",
"2011-01-21","2011-01-24"))
testDates = as.Date(c("2011-01-20","2011-01-22","2011-01-23"))
get_next_biz_day <- function(testdays, bizdays){
o <- findInterval(testdays, bizdays) + 1
bizdays[o]
}
get_next_biz_day(testDates, vecDates)
#[1] "2011-01-21" "2011-01-24" "2011-01-24"
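Note that the function above returns the business day strictly after each test date, so "2011-01-20" maps to "2011-01-21" even though it is itself a business day. If a test date that is already a business day should map to itself, one possible variant (a sketch, with the hypothetical name get_biz_day_on_or_after) uses the left.open argument of findInterval:

```r
vecDates <- as.Date(c("2011-01-11", "2011-01-12", "2011-01-13", "2011-01-14",
                      "2011-01-17", "2011-01-18", "2011-01-19", "2011-01-20",
                      "2011-01-21", "2011-01-24"))
testDates <- as.Date(c("2011-01-20", "2011-01-22", "2011-01-23"))

# Hypothetical helper: first business day on or after each test date.
# bizdays must be sorted in increasing order, as findInterval requires.
get_biz_day_on_or_after <- function(testdays, bizdays) {
  # left.open = TRUE makes the intervals (b[i], b[i+1]], so a date equal to
  # b[i] gets index i - 1, and adding 1 maps it back to itself.
  o <- findInterval(testdays, bizdays, left.open = TRUE) + 1
  bizdays[o]   # NA when a test date falls after the last business day
}

res <- get_biz_day_on_or_after(testDates, vecDates)
res
```

Both versions stay fully vectorized; the only difference is how dates that are already business days are treated.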

how to convert the weekdays to a factor in R

I have a data frame like this:
date count wd
2012-10-01 0 Monday
2012-10-02 5 Tuesday
2012-10-06 10 Saturday
2012-10-07 15 Sunday
I use
dat <- mutate(dat,wd = weekdays(as.Date(dat$date)))
to add a new column "wd". However, I'd like to add a new factor column that shows whether each day is a weekday or a weekend, something like:
date count wd
2012-10-01 0 weekday
2012-10-02 5 weekday
2012-10-06 10 weekend
2012-10-07 15 weekend
Is there any simple way to do that? Thanks
ifelse is the standard way to check a condition on each element of a vector and do something based on the check. Since you already have the named weekdays, you have a pretty trivial condition to check:
dat$we = ifelse(dat$wd %in% c("Saturday", "Sunday"), "weekend", "weekday")
This adds a new variable, we, to your data. Note that ifelse returns a character vector here, and assigning it with $<- does not convert it, so wrap the call in factor() if you need a factor.
You can, of course, use ifelse() inside mutate (or use dplyr::if_else), in which case you would likewise need to wrap the result in factor() to coerce it to a factor; it will be a character vector otherwise.
For other methods of checking weekendness that don't depend on already having the names of the days of the week see How to check if a date is a weekend?.
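As a rough sketch of the factor version, the snippet below builds the column straight from the dates rather than from the day names, so it does not depend on the locale used by weekdays(). The column name daytype is an assumption chosen for illustration; the sample rows are taken from the question.

```r
dat <- data.frame(date = c("2012-10-01", "2012-10-02", "2012-10-06", "2012-10-07"),
                  count = c(0, 5, 10, 15))

# Numeric day of week: 0 = Sunday, ..., 6 = Saturday.
wday <- as.POSIXlt(as.Date(dat$date))$wday

# factor() with explicit levels fixes the level order regardless of the data.
dat$daytype <- factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
                      levels = c("weekday", "weekend"))
str(dat$daytype)
```

Setting the levels explicitly keeps "weekday" as the reference level even if a subset of the data happens to contain only weekends.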

grouping by date and treatment in R

I have a time series that looks at how caffeine impacts test scores. On each day, the first test is used to measure a baseline score for the day, and the second score is the effect of a treatment.
Post Caffeine Score Time/Date
yes 10 3/17/2014 17:58:28
no 9 3/17/2014 23:55:47
no 7 3/18/2014 18:50:50
no 10 3/18/2014 23:09:03
Some days have a caffeine treatment, others do not. Here's my question: how do I group the observations by day and create a measure of impact by subtracting each day's first score from its second?
I'm going to be using these groupings for later graphs and analysis, so I think it's most efficient if there's a way to create objects that look at the improvement in score each day and groups by whether caffeine (treatment) was used.
Thank you for your help!
First make a column for the day. The timestamps are in month/day/year format, so parse them with an explicit format:
df$day = as.Date(df$'Time/Date', format="%m/%d/%Y")
then I think what you're after is two aggregates:
1) To find if the day had caffeine
dayCaf = aggregate(df$Caffeine~df$day, FUN=function(x) ifelse(length(which(grepl("yes",x)))>0,1,0))
2) To calculate the difference in scores
dayDiff = aggregate(df$Score~df$day, FUN=function(x) x[2]-x[1])
Now put the two together
out = merge(dayCaf, dayDiff, by='df$day')
That gives:
df$day df$Caffeine df$Score
1 2014-03-17 1 -1
2 2014-03-18 0 3
The whole code is:
df$day = as.Date(df$'Time/Date', format="%m/%d/%Y")
dayCaf = aggregate(df$Caffeine~df$day, FUN=function(x) ifelse(length(which(grepl("yes",x)))>0,1,0))
dayDiff = aggregate(df$Score~df$day, FUN=function(x) x[2]-x[1])
out = merge(dayCaf, dayDiff, by='df$day')
Just replace "df" with the name of your frame and it should work.
Alternatively:
DF <- data.frame(Post.Caffeine = c("Yes","No","No","No"),Score=c(10,9,7,10),Time.Date=c("3/17/2014 17:58:28","3/17/2014 23:55:47","3/18/2014 18:50:50", "3/18/2014 23:09:03"))
DF$Time.Date <- as.Date(DF$Time.Date,format="%m/%d/%Y")
DF2 <- setNames(aggregate(Score~Time.Date,DF,diff),c("Date","Diff"))
DF2$PC <- DF2$Date %in% DF$Time.Date[DF$Post.Caffeine=="Yes"]
DF2
EDIT: This assumes that your data is in the order that you demonstrate.
data.table solution. The order part sorts your data first (if it is already sorted, you can remove the order part; just leave the comma in place). The advantage of this approach is that the whole process is done in one statement, and it will be fast too:
library(data.table)
setDT(temp)[order(as.POSIXct(strptime(`Time/Date`, "%m/%d/%Y %H:%M:%S"))),
list(HadCaffeine = if(any(PostCaffeine == "yes")) "yes" else "no",
Score = diff(Score)),
by = as.Date(strptime(`Time/Date`, "%m/%d/%Y"))]
## as.Date HadCaffeine Score
## 1: 2014-03-17 yes -1
## 2: 2014-03-18 no 3
This solution assumes temp is your data set and that the variable is named PostCaffeine instead of Post Caffeine (it is bad practice in R to put spaces or / into variable names, as it limits what you can do with them).
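For comparison, here is a base R sketch that sorts by the full timestamp before taking the difference, so it does not rely on the rows already being in order. The data frame DF and the column names PostCaffeine, Score and TimeDate are assumptions for illustration; the rows are the four from the question, deliberately shuffled.

```r
# Rows from the question, deliberately out of order.
DF <- data.frame(
  PostCaffeine = c("no", "yes", "no", "no"),
  Score = c(9, 10, 10, 7),
  TimeDate = c("3/17/2014 23:55:47", "3/17/2014 17:58:28",
               "3/18/2014 23:09:03", "3/18/2014 18:50:50"),
  stringsAsFactors = FALSE
)

# Parse the full timestamp, sort on it, then derive the calendar day.
DF$ts <- as.POSIXct(DF$TimeDate, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
DF <- DF[order(DF$ts), ]
DF$day <- as.Date(DF$ts)

# One row per day: was caffeine used, and second score minus first score.
per_day <- do.call(rbind, lapply(split(DF, DF$day), function(g) {
  data.frame(day = g$day[1],
             had_caffeine = any(g$PostCaffeine == "yes"),
             diff = g$Score[2] - g$Score[1])
}))
per_day
```

This assumes exactly two tests per day, as in the question; days with a different number of rows would need separate handling.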

Create indicator variables of holidays from a date column

I am still a bonehead novice, so forgive me if this is a simple question, but I can't find the answer on Stack Overflow. I would like to create a set of indicator variables for each of the major US holidays, just by applying a function to my date field that can detect which days are holidays; then I could use model.matrix etc. to convert to a set of indicator variables.
For example, I have daily data from Jan 1 2012 through September 15th, 2013 and I would like to create a variable indicator for Easter.
I am currently using the timeDate package to pass a year to their function Easter() to find the date. I then type the dates into the following code to create an indicator variable.
Easter(2012)
EasterInd2012<-as.numeric(DATASET$Date=="2012-04-08")
The easiest way to get a general holiday indicator variable is to create a vector of all the holidays you're interested in and then match those dates in your data frame. Something like this should work:
library(timeDate)
# Sample data
Date <- seq(as.Date("2012-01-01"), as.Date("2013-09-15"), by="1 day")
DATASET <- data.frame(rnorm(624), Date)
# Vector of holidays
holidays <- c(as.Date("2012-01-01"),
as.Date(Easter(2013)),
as.Date("2012-12-25"),
as.Date("2012-12-31"))
# 1 if holiday, 0 if not. Could also be a factor, like c("Yes", "No")
DATASET$holiday <- ifelse(DATASET$Date %in% holidays, 1, 0)
You can either manually input the dates, or use some of timeDate's built-in holiday functions (the listHolidays() function shows all those). So you could also construct holidays like so:
holidays <- c(as.Date("2012-01-01"),
as.Date(Easter(2013)),
as.Date(USLaborDay(2012)),
as.Date(USThanksgivingDay(2012)),
as.Date(USMemorialDay(2012)),
as.Date("2012-12-25"),
as.Date("2012-12-31"))
To get specific indicators for each holiday, you'll need to do them one at a time:
EasterInd2012 <- ifelse(DATASET$Date==as.Date(Easter(2012)), 1, 0)
LaborDay2012 <- ifelse(DATASET$Date==as.Date(USLaborDay(2012)), 1, 0)
# etc.
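Rather than writing one line per holiday, the indicators can be built in a single pass from a named list of dates. The sketch below uses only base R and hard-codes a few dates for illustration; in practice the list entries could be filled from timeDate functions such as as.Date(Easter(2012)). The names holiday_dates and indicators are assumptions, not part of any package.

```r
Date <- seq(as.Date("2012-01-01"), as.Date("2013-09-15"), by = "1 day")
DATASET <- data.frame(Date = Date)

# Named list: one entry per holiday, each holding every date of that holiday
# within the sample period (hard-coded here for illustration).
holiday_dates <- list(
  Easter    = as.Date(c("2012-04-08", "2013-03-31")),
  Christmas = as.Date("2012-12-25")
)

# sapply returns a matrix with one 0/1 column per holiday, named after the list.
indicators <- sapply(holiday_dates, function(d) as.integer(DATASET$Date %in% d))
DATASET <- cbind(DATASET, indicators)

colSums(indicators)   # number of flagged days per holiday
```

Adding another holiday then only requires a new entry in the list.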
