How to convert weekdays to a factor in R

I have a data frame like this:
date count wd
2012-10-01 0 Monday
2012-10-02 5 Tuesday
2012-10-06 10 Saturday
2012-10-07 15 Sunday
I use
dat <- mutate(dat,wd = weekdays(as.Date(dat$date)))
to add a new column "wd". However, I'd like to add a new factor column that shows whether each day is a weekday or a weekend, something like:
date count wd
2012-10-01 0 weekday
2012-10-02 5 weekday
2012-10-06 10 weekend
2012-10-07 15 weekend
Is there a simple way to do that? Thanks

ifelse is the standard way to check a condition on each element of a vector and do something based on the check. Since you already have the named weekdays, you have a pretty trivial condition to check:
dat$we = ifelse(dat$wd %in% c("Saturday", "Sunday"), "weekend", "weekday")
This adds a new variable, we, to your data. Note that ifelse() returns a character vector here, and assigning it with $<- does not convert it, so wrap the call in factor() if you need a factor.
You can, of course, use ifelse() inside mutate (or use dplyr::if_else); the same applies there: wrap the result in factor() to coerce it to a factor, otherwise it will be a character vector.
For other methods of checking weekendness that don't depend on already having the names of the days of the week see How to check if a date is a weekend?.
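As a minimal sketch of the whole step in base R, assuming the date column holds YYYY-MM-DD strings as in the question: the weekday number can be used instead of the day name, which avoids depending on the locale that weekdays() uses. The column name we follows the answer above; the explicit levels argument is an addition for illustration.

```r
# Sample data shaped like the question's data frame
dat <- data.frame(
  date  = c("2012-10-01", "2012-10-02", "2012-10-06", "2012-10-07"),
  count = c(0, 5, 10, 15)
)

# POSIXlt weekday number: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
wday <- as.POSIXlt(as.Date(dat$date))$wday

# Build the factor explicitly; ifelse() alone would give a character vector
dat$we <- factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
                 levels = c("weekday", "weekend"))
```

Setting the levels explicitly keeps both levels present even if a subset of the data happens to contain only weekdays.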

Related

Subset dataframe in r for a specific month and date

I have a dataframe that looks like this:
V1 V2 V3 Month_nr Date
1 2 3 1 2017-01-01
3 5 6 1 2017-01-02
6 8 9 2 2017-02-01
6 8 9 8 2017-08-01
and I want to take all variables from the data set that have Month=1 (January) and date from 2017-01-01 til 2017-01-31 (so end of January), which means that I want to take the dates as well. I would create a column with days but I have multiple observations for one day and this would be even more confusing. I tried it with this:
df<- filter(df,df$Month_nr == 1, df$Date > 2017-01-01 && df$Date < 2017-01-31)
but it did not work. I would appreciate so much your help! I am desperate at this point. My dataset has measurements for an entire year (from 1 to 12) and hence I filter for months.
The problem is that you didn't put quotation marks around 2017-01-01. Written bare, 2017-01-01 is evaluated as a subtraction and returns a number (2015), so you end up comparing a string to a number. Strings can be compared with strings, and because dates in YYYY-MM-DD form sort the same way alphabetically as chronologically, comparing dates as strings works. Also use & (or separate the conditions with commas) rather than &&, which is not vectorized. BTW, you don't need to write df$ inside filter; with the tidyverse you can write the column names directly, without quoting.
Why do you need to have the month as well as dates in the filter? Just the filter on the dates would work fine. However, you will have to convert the date column into a date object. You can do that as follows:
df$Date <- as.Date(df$Date, format = "%Y-%m-%d")
df_new <- subset(df, Date >= "2017-01-01" & Date <= "2017-01-31")
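The two points made in these answers can be checked on a small copy of the question's data. This is only a sketch in base R: the data frame is retyped from the question (with fewer columns), and the variable names unquoted and jan are made up for illustration.

```r
# A reduced copy of the data from the question
df <- data.frame(
  V1       = c(1, 3, 6, 6),
  Month_nr = c(1, 1, 2, 8),
  Date     = c("2017-01-01", "2017-01-02", "2017-02-01", "2017-08-01")
)

# Without quotes this is plain arithmetic: 2017 - 1 - 1
unquoted <- 2017-01-01

# Convert to Date, then filter with the vectorized & operator
df$Date <- as.Date(df$Date, format = "%Y-%m-%d")
jan <- df[df$Date >= as.Date("2017-01-01") & df$Date <= as.Date("2017-01-31"), ]
```

Filtering on the date range alone already restricts the result to January, so the Month_nr condition is not needed here.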

R: create index for xts time object from calendar week , e.g. 201501 ... 201553

I know how to get the week from an index, but don't know the other way around: how to create an index if I have the calendar weeks (in this case, from an SAP system with 0CALWEEK as 201501, 201502 ... 201552, 201553).
Found this:
How to Parse Year + Week Number in R?
but the day is needed and it's not clear how to set it, especially at the end of the year (year-week-day: YEAR-53-01 does not always exist, since the first day of week 53 might be Monday, in which case 01 (Sunday) is not in that week).
I could try to get in the source system the first day of the corresponding week (through SQL) but thought R might do it easier...
Do you have any suggestions?
(Which day counts as the first day of the week is not important, since I will create all objects the same way and then merge/cbind them before continuing the analysis. If zoo is easier, I'll go with it.)
Thanks!
The problem is that all indices end up as 2015-07-29:
data <- 1:4
weeks <- c('201501','201502','201552','201553')
weeks_2 <- as.Date(weeks,format='%Y%w')
xts(data, order.by = weeks_2)
[,1]
2015-07-29 1
2015-07-29 2
2015-07-29 3
2015-07-29 4
test <- xts(data, order.by = weeks_2)
index(test)
[1] "2015-07-29" "2015-07-29" "2015-07-29" "2015-07-29"
You can use the as.Date() function; I think it is the easiest way:
weeks <- c('201501','201502','201552','201553')
as.Date(paste0(weeks,'1'),format='%Y%W%w') # paste a dummy day
## [1] "2015-01-05" "2015-01-12" "2015-12-28" NA
Where:
%W: Week 00-53 with Monday as first day of the week
or
%U: Week 00-53 with Sunday as first day of the week
%w: Weekday 0-6, Sunday is 0
In 2015, week number 53 doesn't exist under this numbering. If you want to start with 2015-01-01, just set the right weekday:
weeks <- c('201500','201501','201502','201551','201552')
as.Date(paste0(weeks,'4'),format='%Y%W%w')
## [1] "2015-01-01" "2015-01-08" "2015-01-15" "2015-12-24" "2015-12-31"
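The %W numbering above counts week 01 from the first Monday of the year. If the week codes instead follow ISO 8601 numbering (an assumption; the question does not say how 0CALWEEK counts weeks), week 1 is the week containing January 4, and 2015 does have a week 53. A rough sketch of that conversion in base R, using a hypothetical helper named iso_week_monday that returns the Monday of each week:

```r
# Hypothetical helper: Monday of an ISO 8601 week given codes like "201553".
# Assumes the first four characters are the ISO year and the last two the week.
iso_week_monday <- function(calweek) {
  year <- as.integer(substr(calweek, 1, 4))
  week <- as.integer(substr(calweek, 5, 6))
  # ISO week 1 is the week that contains January 4
  jan4 <- as.Date(sprintf("%d-01-04", year))
  # Days elapsed since Monday (Monday = 0, ..., Sunday = 6)
  days_since_monday <- (as.POSIXlt(jan4)$wday + 6) %% 7
  week1_monday <- jan4 - days_since_monday
  week1_monday + (week - 1) * 7
}

weeks <- c("201501", "201502", "201552", "201553")
idx <- iso_week_monday(weeks)
```

The resulting Date vector can be passed as order.by to xts() or zoo(). The helper does not validate its input, so a week 53 code for a year without an ISO week 53 would silently map to the first week of the following year.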
You may also try lubridate (use substr() to split the year and week out of each code):
library(lubridate)
# a number from your list: 201502
# set the year
x <- ymd("2015-01-1")
# retrieve second week
week(x) <- 2
x
[1] "2015-01-08"
You can use the result for your index or rownames().
zoo and xts are great for time series once you have set the names;
be sure to remove any column with dates from your data frame.

R turn irregular time interval into regular ones using previous numbers

I have an irregular time interval like this:
df=data.frame(Date=c("2013-01-08","2013-01-11","2013-01-13","2013-01-21","2013-02-06"), runningtotal=c(800,910,1060,1210,660))
I found that, as a zoo object, it can be merged with a regular time interval, filling in 0 for missing values. However, I need to fill in the previous value instead, except at the start of each month, where it should be 0. So the end output is like this:
date runningtotal
2013-01-01 0
2013-01-02 0
...
2013-01-08 800
2013-01-09 800
2013-01-10 800
2013-01-11 910
2013-01-12 910
2013-01-13 1060
...
2013-02-01 0
Also, does it make sense to fill in values like this for forecasting purposes?
Thanks.
Try approxfun with the constant method. I don't have lubridate, so this just uses regular Date objects. For instance:
df<-data.frame(Date=c("2013-01-08","2013-01-11","2013-01-13","2013-01-21","2013-02-06"), runningtotal=c(800,910,1060,1210,660))
df$Date<-as.Date(as.character(df$Date))
#create some new dates
newDates<-seq(df$Date[1],df$Date[5],length.out=10)
intfun<-approxfun(df$Date,df$runningtotal,method="constant",yleft=0,yright=0)
data.frame(newDates,intfun(newDates))
I would use na.locf from the zoo package, but you should prepare the data before applying it.
library(zoo)
library(lubridate) # for day() and day<-
## generate a vector of dates (assumes DF$Date is POSIXct)
mm <- min(DF$Date)
day(mm) <- 1
seq_dates <- seq.POSIXt(mm,max(DF$Date),by='days')
## add zero values for the beginning of each month
DF <- rbind(DF,data.frame(Date=seq_dates[day(seq_dates)==1],runningtotal=0))
## merge with the sequence of dates, and apply na.locf for previous values.
na.locf(merge(seq_dates,DF,by=1,all.x=TRUE))
The idea is to apply na.locf, which replaces missing values with the previous non-missing value. Merging your data with a sequence of dates (from the start of the first month to the last date) inserts the missing values.
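For comparison, the same idea (insert the missing days, reset to 0 at each month start, then carry the last value forward) can be sketched in base R without zoo or lubridate. This is an illustrative variant rather than the approach of either answer; the names all_days, out, and observed are made up, and Date is assumed to be a plain Date column.

```r
df <- data.frame(
  Date = as.Date(c("2013-01-08", "2013-01-11", "2013-01-13", "2013-01-21", "2013-02-06")),
  runningtotal = c(800, 910, 1060, 1210, 660)
)

# Regular daily grid from the first day of the first month to the last date
all_days <- seq(as.Date(format(min(df$Date), "%Y-%m-01")), max(df$Date), by = "day")
out <- data.frame(date = all_days, runningtotal = NA_real_)

# Place the observed values on the grid
out$runningtotal[match(df$Date, all_days)] <- df$runningtotal

# Reset to 0 on the first day of each month, unless that day was observed
month_start <- format(all_days, "%d") == "01" & is.na(out$runningtotal)
out$runningtotal[month_start] <- 0

# Carry the last observation forward: cumsum() counts the non-missing values
# seen so far, which indexes into the vector of non-missing values
observed <- out$runningtotal[!is.na(out$runningtotal)]
out$runningtotal <- observed[cumsum(!is.na(out$runningtotal))]
```

The cumsum() indexing works here because the first grid day is always a month start and therefore never missing.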

Selecting Specific Dates in R

I am wondering how to create a subset of data in R based on a list of dates, rather than by a date range.
For example, I have the following data set data which contains 3 years of 6-minute data.
date zone month day year hour minute temp speed gust dir
1 09/06/2009 00:00 PDT 9 6 2009 0 0 62 2 15 156
2 09/06/2009 00:06 PDT 9 6 2009 0 6 62 13 16 157
I have used breeze<-subset(data, ws>=15 & wd>=247.5 & wd<=315, select=date:dir) to select the rows which meet my criteria for a sea breeze, which is fine, but what I want to do is create a subset of the days which contain those times that meet my criteria.
I have used...
as.character(breeze$date)
trimdate<-strtrim(breeze$date, 10)
breezedate<-as.Date(trimdate, "%m/%d/%Y")
breezedate<-format(breezedate, format="%m/%d/%Y")
...to extract the dates from each row that meets my criteria so I have a variable called breezedate that contains a list of the dates that I want (not the most eloquent coding to do this, I'm sure). There are about two-hundred dates in the list. What I am trying to do with the next command is in my original dataset data to create a subset which contains only those days which meet the seabreeze criteria, not just the specific times.
breezedays<-(data$date==breezedate)
I think one of my issues here is that I am comparing one value to a list of values, but I am not sure how to make it work.
Let's assume your breezedate list looks like this and data$date is a simple string:
breezedate <- as.Date(c("2009-09-06", "2009-10-01"))
This is probably what you want:
breezedays <- data[as.Date(data$date, '%m/%d/%Y') %in% breezedate, ]
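To make the %in% approach concrete, here is a small self-contained sketch. The four rows and the speed threshold are invented for illustration and only loosely follow the question's columns; the helper variable day is also made up.

```r
# Invented sample rows: a date-time string and a wind speed
data <- data.frame(
  date  = c("09/06/2009 00:00", "09/06/2009 00:06",
            "09/07/2009 00:00", "10/01/2009 12:00"),
  speed = c(2, 16, 3, 20)
)

# Calendar day of every row; as.Date() stops parsing after the date part
day <- as.Date(data$date, format = "%m/%d/%Y")

# Days on which at least one row meets the criterion
breezedate <- unique(day[data$speed >= 15])

# %in% tests each row's day against the whole set of days;
# the trailing comma selects rows and keeps all columns
breezedays <- data[day %in% breezedate, ]
```

Both rows from 09/06/2009 are kept even though only one of them meets the speed criterion, which is the day-level subset the question asks for.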
The intersect() function compares two vectors and returns the values they have in common. Note that it returns the shared values themselves, not the matching rows of data, and both vectors must hold the dates in the same format.
To use, run the following:
breezedays <- intersect(data$date,breezedate) # returns the values that appear in both data$date and breezedate

Creating a weekend dummy variable

I'm trying to create a dummy variable in my dataset in R for weekend i.e. the column has a value of 1 when the day is during a weekend and a value of 0 when the day is during the week.
I first tried iterating through the entire dataset by row and assigning the weekend variable a 1 if the date is on the weekend. But this takes forever considering there are ~70,000 rows and I know there is a much simpler way, I just can't figure it out.
Below is what I want the dataframe to look like. Right now it looks like this except for the weekend column. I don't know if this changes anything, but right now date is a factor. I also have a list of the dates that fall on weekends:
weekend <- c("2/9/2013", "2/10/2013", "2/16/2013", "2/17/2013", ... , "3/2/2013")
date hour weekend
2/10/2013 0 1
2/11/2013 1 0
.... .... ....
Thanks for the help
It might be safer to rely on data structures and functions that are actually built around dates:
dat <- read.table(text = "date hour weekend
2/10/2013 0 1
2/11/2013 1 0",header = TRUE,sep = "")
weekdays(as.Date(as.character(dat$date),"%m/%d/%Y")) %in% c('Sunday','Saturday')
## [1] TRUE FALSE
This is essentially the same idea as SenorO's answer, but we convert the dates to an actual date column and then simply use weekdays, which means we don't need to have a list of weekends already on hand.
DF$IsWeekend <- DF$date %in% weekend
Then if you really prefer 0s and 1s:
DF$IsWeekend <- as.numeric(DF$IsWeekend)
I would first check that my dates really are weekend dates.
weekends <- c("2/9/2013", "2/10/2013", "2/16/2013", "2/17/2013","3/2/2013")
weekends = weekends[ as.POSIXlt(as.Date(weekends,'%m/%d/%Y'))$wday %in% c(0,6)]
Then, using transform and ifelse, I create the new column (converting date to a Date first so both sides of %in% have the same type):
transform(dat ,weekend = ifelse(as.Date(as.character(date),'%m/%d/%Y') %in% as.Date(weekends,'%m/%d/%Y') ,1,0 ))
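Since the question mentions that a row-by-row loop over ~70,000 rows was too slow, here is a rough vectorized sketch on generated hourly data. The two-week date range is invented, and the date column is deliberately built as a factor in month/day/year form to resemble the question's setup.

```r
# Two weeks of hourly rows starting on Saturday 2013-02-09 (invented range)
days <- seq(as.Date("2013-02-09"), by = "day", length.out = 14)
dat <- data.frame(
  date = factor(format(rep(days, each = 24), "%m/%d/%Y")),
  hour = rep(0:23, times = 14)
)

# Convert the factor to Date once, for the whole column
d <- as.Date(as.character(dat$date), format = "%m/%d/%Y")

# wday is 0 for Sunday and 6 for Saturday; as.integer() turns TRUE/FALSE into 1/0
dat$weekend <- as.integer(as.POSIXlt(d)$wday %in% c(0, 6))
```

Every step operates on the whole column at once, so no explicit loop or precomputed list of weekend dates is needed.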
