R: assign a week value to a range of dates

I have a data frame that looks like this:
dataset <- data.frame(date = seq(from=as.Date("2015-07-06"),
to=as.Date("2015-07-15"),by="day"),
stringsAsFactors=F)
My objective is to assign a week value to each consecutive block of 5 dates, so that it looks something like this:
date week
1: 2015-07-06 Week 1
2: 2015-07-07 Week 1
3: 2015-07-08 Week 1
4: 2015-07-09 Week 1
5: 2015-07-10 Week 1
6: 2015-07-11 Week 2
7: 2015-07-12 Week 2
8: 2015-07-13 Week 2
9: 2015-07-14 Week 2
10: 2015-07-15 Week 2
My data contains only weekdays, so each week has only 5 days. Each week starts on a Monday, just to give some context.
Is there a way to do this other than counting the first 5 and assigning "Week 1", then counting the next 5 and assigning "Week 2", and so on?
I'm putting this piece in a for loop, so I'm hoping for a straightforward solution.
Thank you very much!

As the 'date' column has only weekdays and no breaks, we can use gl/paste to create a week index. This doesn't depend on the number of rows of the dataset, i.e. it works even if nrow is not a multiple of 5.
dataset$week <- paste('Week', as.numeric(gl(nrow(dataset),5, nrow(dataset))))
Another option is to use format after converting the 'date' column to the 'Date' class (%W is the week of the year, with Monday as the first day of the week):
format(as.Date(dataset$date),'%W')
#[1] "27" "27" "27" "27" "27" "27" "27" "28" "28" "28"
Or
library(lubridate)
week(strptime(dataset$date, format = '%Y-%m-%d'))
#[1] 27 27 28 28 28 28 28 28 28 29
but these give the week of the year rather than a counter starting at 1, so I am not sure that is what you want.
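The gl approach relies on the dates having no gaps. If some weekdays can be missing (holidays, for example), a sketch that counts whole 7-day periods from the Monday on or before the first date keeps the labels aligned with the calendar. It assumes weekday-only Date data; the dropped 2015-07-14 below is a hypothetical holiday used only to show the gap handling.

```r
# weekday-only dates; POSIXlt wday is 0 = Sunday, 1 = Monday, ..., 6 = Saturday
all_days <- seq(as.Date("2015-07-06"), as.Date("2015-07-24"), by = "day")
dates <- all_days[as.POSIXlt(all_days)$wday %in% 1:5]
# drop one day to simulate a gap (hypothetical holiday)
dates <- dates[dates != as.Date("2015-07-14")]
dataset <- data.frame(date = dates)

# Monday on or before the first date
first_date <- min(dataset$date)
start_monday <- first_date - (as.POSIXlt(first_date)$wday + 6) %% 7

# whole weeks elapsed since that Monday, counted from 1
dataset$week <- paste("Week", as.integer(dataset$date - start_monday) %/% 7 + 1)
dataset
```

With the gap, the second week simply has 4 rows instead of 5, while the third week still starts on Monday 2015-07-20.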

Here is my solution with the week function of the lubridate package. Note that the date column needs to be converted with ymd before it is passed to the week function.
library(lubridate)
dataset$date <- ymd(dataset$date)
dataset$week <- week(dataset$date)
dataset
date week
1 2015-07-06 27
2 2015-07-07 27
3 2015-07-08 28
4 2015-07-09 28
5 2015-07-10 28
6 2015-07-11 28
7 2015-07-12 28
8 2015-07-13 28
9 2015-07-14 28
10 2015-07-15 29

Here's a simple solution using base R:
nweeks <- 10 #choose as required
days <- paste0("Week",rep(seq(nweeks),each=5))
#> days
# [1] "Week1" "Week1" "Week1" "Week1" "Week1" "Week2" "Week2" "Week2" "Week2" "Week2" "Week3" "Week3" "Week3" "Week3" "Week3"
#[16] "Week4" "Week4" "Week4" "Week4" "Week4" "Week5" "Week5" "Week5" "Week5" "Week5" "Week6" "Week6" "Week6" "Week6" "Week6"
#[31] "Week7" "Week7" "Week7" "Week7" "Week7" "Week8" "Week8" "Week8" "Week8" "Week8" "Week9" "Week9" "Week9" "Week9" "Week9"
#[46] "Week10" "Week10" "Week10" "Week10" "Week10"
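The vector above always has 5 × nweeks elements, so it has to be cut to the number of rows before it can be assigned to a column. A sketch that derives the length from the data instead, using the length.out argument of rep() (the 12-row data frame is only an example):

```r
dataset <- data.frame(date = seq(as.Date("2015-07-06"), by = "day", length.out = 12))
n <- nrow(dataset)
# repeat each week number 5 times, then truncate to exactly n labels
dataset$week <- paste0("Week", rep(seq_len(ceiling(n / 5)), each = 5, length.out = n))
dataset$week
```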

Related

How to find third Sunday date for all months between a date range in R and their respective values

For a particular date range, for example between 2020-01-29 and 2021-05-02, I want to find the dates of the 3rd Sunday of every month, along with their associated values in a data.frame.
Additionally, if there is a 5th Monday in any month, I want to obtain its date and corresponding value in a separate data.frame.
Please note that the results need to be restricted to the given date range, using the dates in the data.frame.
## create a data frame of dates and values
dates_seq <- seq(as.Date("2019/12/28"), by = "day", length.out = 1000)
df <- data.frame(dates_seq = dates_seq, values = 1:1000)
To summarize, I want to find the third Sunday of every month and its corresponding value, and the fifth Monday of every month, if there is one, along with its value.
Here's a base R approach :
# Get date between 2020-01-29 and 2021-05-0
temp <- subset(df, dates_seq >= as.Date('2020-01-29') &
dates_seq <= as.Date('2021-05-02'))
#Add weekday
temp$week_day <- weekdays(temp$dates_seq)
#Add week number for each month
temp$week_number <- ave(temp$week_day, temp$week_day,
format(temp$dates_seq, "%Y-%m"), FUN = seq_along)
#Subset 3rd Sunday and 5th Monday
subset(temp, week_number == 3 & week_day == 'Sunday' |
week_number == 5 & week_day == 'Monday')
# dates_seq values week_day week_number
#51 2020-02-16 51 Sunday 3
#79 2020-03-15 79 Sunday 3
#94 2020-03-30 94 Monday 5
#114 2020-04-19 114 Sunday 3
#142 2020-05-17 142 Sunday 3
#177 2020-06-21 177 Sunday 3
#185 2020-06-29 185 Monday 5
#205 2020-07-19 205 Sunday 3
#233 2020-08-16 233 Sunday 3
#248 2020-08-31 248 Monday 5
#268 2020-09-20 268 Sunday 3
#296 2020-10-18 296 Sunday 3
#324 2020-11-15 324 Sunday 3
#339 2020-11-30 339 Monday 5
#359 2020-12-20 359 Sunday 3
#387 2021-01-17 387 Sunday 3
#422 2021-02-21 422 Sunday 3
#450 2021-03-21 450 Sunday 3
#458 2021-03-29 458 Monday 5
#478 2021-04-18 478 Sunday 3
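The n-th weekday of a month can also be computed directly with date arithmetic instead of counting rows. This is a sketch: nth_weekday, from, to and keep are hypothetical names, and wday follows the POSIXlt convention (0 = Sunday, 1 = Monday), which does not depend on the locale the way weekdays() does.

```r
# Example data from the question
dates_seq <- seq(as.Date("2019/12/28"), by = "day", length.out = 1000)
df <- data.frame(dates_seq = dates_seq, values = 1:1000)

from <- as.Date("2020-01-29")
to <- as.Date("2021-05-02")

# Hypothetical helper: the n-th given weekday of each month, NA if it does not exist
nth_weekday <- function(month_starts, wday, n) {
  first_wday <- as.POSIXlt(month_starts)$wday
  first_hit <- month_starts + (wday - first_wday) %% 7  # first such weekday in the month
  out <- first_hit + 7 * (n - 1)
  # a result that spilled into the next month means that month has no n-th weekday
  out[format(out, "%m") != format(month_starts, "%m")] <- NA
  out
}

month_starts <- seq(as.Date(format(from, "%Y-%m-01")), to, by = "month")
keep <- function(d) d[!is.na(d) & d >= from & d <= to]

third_sundays <- keep(nth_weekday(month_starts, 0, 3))
fifth_mondays <- keep(nth_weekday(month_starts, 1, 5))

# look up the corresponding values
df[df$dates_seq %in% third_sundays, ]
df[df$dates_seq %in% fifth_mondays, ]
```

For this range it returns the same 15 third Sundays and 5 fifth Mondays as the output listed above.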
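Since lubridate's wday() counts Sunday as day 1 by default, this code gives you a data frame containing all third Sundays in the requested range:
library(dplyr)
library(lubridate)
third_sundays <- df %>%
  mutate(dates_seq = as.Date(dates_seq)) %>%
  mutate(year = year(dates_seq),
         month = month(dates_seq),
         day = wday(dates_seq)) %>%
  filter(day == 1) %>%
  group_by(year, month) %>%
  slice(3) %>%
  ungroup() %>%
  # keep only the requested date range (after picking the 3rd Sunday of each month)
  filter(dates_seq >= as.Date("2020-01-29"), dates_seq <= as.Date("2021-05-02"))
You could then match this against the original data frame to find the row indices.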

Extract weekdays from a set of dates in R

I know that with the lubridate package I can get the weekday of each date. I am now dealing with a large dataset with many date entries, and I want to extract the weekday for each of them. Looking up each date individually is not practical, so I would like a function that takes the date column of my data frame and returns the weekday corresponding to each date.
My data frame looks like this:
uinq_id Product_ID Date_of_order count
1 Aarkios04_2014-09-09 Aarkios04 2014-09-09 10
2 ABEE01_2014-08-18 ABEE01 2014-08-18 1
3 ABEE01_2014-08-19 ABEE01 2014-08-19 0
4 ABEE01_2014-08-20 ABEE01 2014-08-20 0
5 ABEE01_2014-08-21 ABEE01 2014-08-21 0
6 ABEE01_2014-08-22 ABEE01 2014-08-22 0
I am trying to generate:
uinq_id Product_ID Date_of_order count weekday
1 Aarkios04_2014-09-09 Aarkios04 2014-09-09 10 Tues
2 ABEE01_2014-08-18 ABEE01 2014-08-18 1 Mon
3 ABEE01_2014-08-19 ABEE01 2014-08-19 0 Tues
4 ABEE01_2014-08-20 ABEE01 2014-08-20 0 Wed
5 ABEE01_2014-08-21 ABEE01 2014-08-21 0 Thurs
6 ABEE01_2014-08-22 ABEE01 2014-08-22 0 Fri
Any help would be much appreciated.
Thank you.
Using weekdays from base R you can do this for a vector all at once:
temp = data.frame(timestamp = Sys.Date() + 1:20)
> head(temp)
timestamp
1 2016-06-01
2 2016-06-02
3 2016-06-03
4 2016-06-04
5 2016-06-05
6 2016-06-06
temp$weekday = weekdays(temp$timestamp)
> head(temp)
timestamp weekday
1 2016-06-01 Wednesday
2 2016-06-02 Thursday
3 2016-06-03 Friday
4 2016-06-04 Saturday
5 2016-06-05 Sunday
6 2016-06-06 Monday
We can use format to get the output
df1$weekday <- format(as.Date(df1$Date_of_order), "%a")
df1$weekday
#[1] "Tue" "Mon" "Tue" "Wed" "Thu" "Fri"
According to ?strptime
%a - Abbreviated weekday name in the current locale on this platform.
(Also matches full name on input: in some locales there are no
abbreviations of names.)
library(lubridate)
# the dates are stored as YYYY-MM-DD, so the format must use dashes, not slashes
date <- as.Date(yourdata$Date_of_order, format = "%Y-%m-%d")
yourdata$WeekDay <- weekdays(date)
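weekdays() and format(..., "%a") return names in the current locale, while the question asks for specific labels such as "Tues" and "Thurs". A sketch that indexes a fixed label vector with the locale-independent wday field of POSIXlt (0 = Sunday); the label spellings are copied from the desired output in the question, and day_labels is a hypothetical name.

```r
df1 <- data.frame(Date_of_order = c("2014-09-09", "2014-08-18", "2014-08-19",
                                    "2014-08-20", "2014-08-21", "2014-08-22"))
# labels in POSIXlt wday order: 0 = Sunday, ..., 6 = Saturday
day_labels <- c("Sun", "Mon", "Tues", "Wed", "Thurs", "Fri", "Sat")
wday <- as.POSIXlt(as.Date(df1$Date_of_order))$wday
df1$weekday <- day_labels[wday + 1]
df1$weekday
```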

Divide time-series data into weekday and weekend datasets using R

I have a dataset consisting of two columns (timestamp and power):
str(df2)
'data.frame': 720 obs. of 2 variables:
$ timestamp: POSIXct, format: "2015-08-01 00:00:00" "2015-08-01 01:00:00" " ...
$ power : num 124 149 118 167 130 ..
This dataset covers one entire month. I want to create two subsets of it: one containing the weekend data and the other containing the weekday (Monday to Friday) data. In other words, one dataset should contain the data for Saturday and Sunday, and the other the data for the remaining days. Both subsets should keep both columns. How can I do this in R?
I tried to use aggregate and split, but I am not clear how to specify the division of the dataset through the FUN parameter of aggregate.
You can do this with base R functions: first use strptime to parse the date from the first column, then use the weekdays function.
Example:
df1<-data.frame(timestamp=c("2015-08-01 00:00:00","2015-10-13 00:00:00"),power=1:2)
df1$day<-strptime(df1[,1], "%Y-%m-%d")
df1$weekday<-weekdays(df1$day)
df1
timestamp power day weekday
2015-08-01 00:00:00 1 2015-08-01 Saturday
2015-10-13 00:00:00 2 2015-10-13 Tuesday
Building on @ShruS's example:
df<-data.frame(timestamp=c("2015-08-01 00:00:00","2015-10-13 00:00:00", "2015-10-11 00:00:00", "2015-10-14 00:00:00"))
df$day<-strptime(df[,1], "%Y-%m-%d")
df$weekday<-weekdays(df$day)
df1 = subset(df,df$weekday == "Saturday" | df$weekday == "Sunday")
df2 = subset(df,df$weekday != "Saturday" & df$weekday != "Sunday")
> df
timestamp day weekday
1 2015-08-01 00:00:00 2015-08-01 Saturday
2 2015-10-13 00:00:00 2015-10-13 Tuesday
3 2015-10-11 00:00:00 2015-10-11 Sunday
4 2015-10-14 00:00:00 2015-10-14 Wednesday
> df1
timestamp day weekday
1 2015-08-01 00:00:00 2015-08-01 Saturday
3 2015-10-11 00:00:00 2015-10-11 Sunday
> df2
timestamp day weekday
2 2015-10-13 00:00:00 2015-10-13 Tuesday
4 2015-10-14 00:00:00 2015-10-14 Wednesday
Initially I tried more complex approaches using extra packages, but in the end I came up with a basic approach in base R.
#adding day column to existing set
df2$day <- weekdays(as.POSIXct(df2$timestamp))
# creating two data_subsets, i.e., week_data and weekend_data
week_data<- data.frame(timestamp=factor(), power= numeric(),day= character())
weekend_data<- data.frame(timestamp=factor(),power=numeric(),day= character())
#Specifying weekend days in vector, weekend
weekend <- c("Saturday","Sunday")
for(i in 1:nrow(df2)){
if(is.element(df2[i,3], weekend)){
weekend_data <- rbind(weekend_data, df2[i,])
} else{
week_data <- rbind(week_data, df2[i,])
}
}
The resulting data frames, weekend_data and week_data, are the two subsets I needed.
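The loop above grows both data frames one row at a time with rbind, which gets slow as the data grows, and weekdays() returns locale-dependent names. A vectorized sketch that builds one logical index from the POSIXlt wday field (0 = Sunday, 6 = Saturday) and splits in a single step; the example data only mimics the shape of the question's df2, and the power values are placeholders.

```r
# Example data shaped like the question: one month of hourly readings
df2 <- data.frame(
  timestamp = seq(as.POSIXct("2015-08-01 00:00:00", tz = "UTC"),
                  by = "hour", length.out = 720),
  power = seq_len(720)  # placeholder values, not real measurements
)
# TRUE for Saturday and Sunday, independent of the locale
is_weekend <- as.POSIXlt(df2$timestamp)$wday %in% c(0, 6)
weekend_data <- df2[is_weekend, ]
week_data <- df2[!is_weekend, ]
c(nrow(weekend_data), nrow(week_data))
```

Both subsets keep the timestamp and power columns.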

How to get week numbers from dates?

Looking for an R function to convert dates into week numbers (of the year), I went with week from the data.table package.
However, I observed some strange behaviour:
> week("2014-03-16") # Sun, expecting 11
[1] 11
> week("2014-03-17") # Mon, expecting 12
[1] 11
> week("2014-03-18") # Tue, expecting 12
[1] 12
Why does the week number switch to 12 on Tuesday instead of Monday? What am I missing? (Time zone should be irrelevant, as these are just dates.)
Other suggestions for (base) R functions are appreciated as well.
Base package: use strftime with the format %V to get the week of the year as a decimal number (01–53), as defined in ISO 8601. (More details in the documentation: ?strftime.)
strftime(c("2014-03-16", "2014-03-17","2014-03-18", "2014-01-01"), format = "%V")
Output:
[1] "11" "12" "12" "01"
If you try with lubridate:
library(lubridate)
lubridate::week(ymd("2014-03-16", "2014-03-17","2014-03-18", '2014-01-01'))
[1] 11 11 12 1
The pattern is the same. Try isoweek instead:
lubridate::isoweek(ymd("2014-03-16", "2014-03-17","2014-03-18", '2014-01-01'))
[1] 11 12 12 1
I understand the need for packages in certain situations, but the base language is so elegant and so proven (and debugged and optimized).
Why not:
dt <- as.Date("2014-03-16")
dt2 <- as.POSIXlt(dt)
dt2$yday
[1] 74
Then it is your choice whether the first week of the year is numbered zero (as in C indexing) or 1 (as in R indexing).
No packages to learn, update, or worry about bugs in.
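The step from yday to a week number can be sketched as follows. POSIXlt's yday is zero-based (1 January is 0), so integer division by 7 plus 1 numbers 1 to 7 January as week 1; this is the same rule as the (yday(x)-1)%/%7 + 1 correction discussed in another answer, because lubridate's yday() is one-based. Note that these weeks are counted from 1 January, not aligned to Mondays, so they are not ISO weeks.

```r
dates <- as.Date(c("2014-03-16", "2014-03-17", "2014-03-18", "2014-01-01"))
yday0 <- as.POSIXlt(dates)$yday  # zero-based: 1 January is 0
week_no <- yday0 %/% 7 + 1       # 1-7 January is week 1, 8-14 January is week 2, ...
week_no
```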
Actually, I think you may have discovered a bug in the week(...) function, or at least an error in the documentation. Hopefully someone will jump in and explain why I am wrong.
Looking at the code:
library(lubridate)
> week
function (x)
yday(x)%/%7 + 1
<environment: namespace:lubridate>
The documentation states:
Weeks is the number of complete seven day periods that have occured between the date and January 1st, plus one.
But since Jan 1 is the first day of the year (not the zeroth), the first "week" will be a six day period. The code should (??) be
(yday(x)-1)%/%7 + 1
NB: You are using week(...) from the data.table package, which has the same code as lubridate::week, except that it coerces everything to integer rather than numeric for efficiency. So this function has the same problem (??).
If you want to get the week number together with the year, use "%Y-W%V",
e.g. yearAndweeks <- strftime(dates, format = "%Y-W%V")
so
> strftime(c("2014-03-16", "2014-03-17","2014-03-18", "2014-01-01"), format = "%Y-W%V")
becomes:
[1] "2014-W11" "2014-W12" "2014-W12" "2014-W01"
If you want to get the week number with the year, Grant Shannon's solution using strftime works, but you need to make some corrections for dates around January 1st. For instance, 2016-01-03 (yyyy-mm-dd) is in week 53 of 2015, not 2016, and 2018-12-31 is in week 1 of 2019, not 2018. This code provides some examples and a solution: in the column "yearweek" the years are sometimes wrong; in "yearweek2" they are corrected (rows 2 and 5).
library(dplyr)
library(lubridate)
# create a testset
test <- data.frame(matrix(data = c("2015-12-31",
"2016-01-03",
"2016-01-04",
"2018-12-30",
"2018-12-31",
"2019-01-01") , ncol=1, nrow = 6 ))
# add a colname
colnames(test) <- "date_txt"
# compute year-week numbers: yearweek (uncorrected) and yearweek2 (corrected)
test <- test %>%
mutate(date = as.Date(date_txt, format = "%Y-%m-%d")) %>%
mutate(yearweek = as.integer(strftime(date, format = "%Y%V"))) %>%
mutate(yearweek2 = ifelse(test = day(date) > 7 & substr(yearweek, 5, 6) == '01',
yes = yearweek + 100,
no = ifelse(test = month(date) == 1 & as.integer(substr(yearweek, 5, 6)) > 51,
yes = yearweek - 100,
no = yearweek)))
# print the result
print(test)
date_txt date yearweek yearweek2
1 2015-12-31 2015-12-31 201553 201553
2 2016-01-03 2016-01-03 201653 201553
3 2016-01-04 2016-01-04 201601 201601
4 2018-12-30 2018-12-30 201852 201852
5 2018-12-31 2018-12-31 201801 201901
6 2019-01-01 2019-01-01 201901 201901
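R's ?strptime also documents %G, the ISO 8601 week-based year that goes with %V. Pairing %G with %V instead of %Y should give the corrected year-week numbers directly, without the ifelse adjustment. This is a sketch; since these conversions go through the platform's formatting code, it is worth checking on your own system.

```r
dates <- as.Date(c("2015-12-31", "2016-01-03", "2016-01-04",
                   "2018-12-30", "2018-12-31", "2019-01-01"))
# %G is the ISO 8601 week-based year, %V the ISO week number
yearweek <- as.integer(format(dates, "%G%V"))
yearweek
```

The result matches the yearweek2 column above.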
I think the problem is that the week calculation somehow uses the first day of the year. I don't understand the internal mechanics, but you can see what I mean with this example:
library(data.table)
dd <- seq(as.IDate("2013-12-20"), as.IDate("2014-01-20"), 1)
# dd <- seq(as.IDate("2013-12-01"), as.IDate("2014-03-31"), 1)
dt <- data.table(i = 1:length(dd),
day = dd,
weekday = weekdays(dd),
day_rounded = round(dd, "weeks"))
## Now let's add the weekdays for the "rounded" date
dt[ , weekday_rounded := weekdays(day_rounded)]
## This seems to make internal sense with the "week" calculation
dt[ , weeknumber := week(day)]
dt
i day weekday day_rounded weekday_rounded weeknumber
1: 1 2013-12-20 Friday 2013-12-17 Tuesday 51
2: 2 2013-12-21 Saturday 2013-12-17 Tuesday 51
3: 3 2013-12-22 Sunday 2013-12-17 Tuesday 51
4: 4 2013-12-23 Monday 2013-12-24 Tuesday 52
5: 5 2013-12-24 Tuesday 2013-12-24 Tuesday 52
6: 6 2013-12-25 Wednesday 2013-12-24 Tuesday 52
7: 7 2013-12-26 Thursday 2013-12-24 Tuesday 52
8: 8 2013-12-27 Friday 2013-12-24 Tuesday 52
9: 9 2013-12-28 Saturday 2013-12-24 Tuesday 52
10: 10 2013-12-29 Sunday 2013-12-24 Tuesday 52
11: 11 2013-12-30 Monday 2013-12-31 Tuesday 53
12: 12 2013-12-31 Tuesday 2013-12-31 Tuesday 53
13: 13 2014-01-01 Wednesday 2014-01-01 Wednesday 1
14: 14 2014-01-02 Thursday 2014-01-01 Wednesday 1
15: 15 2014-01-03 Friday 2014-01-01 Wednesday 1
16: 16 2014-01-04 Saturday 2014-01-01 Wednesday 1
17: 17 2014-01-05 Sunday 2014-01-01 Wednesday 1
18: 18 2014-01-06 Monday 2014-01-01 Wednesday 1
19: 19 2014-01-07 Tuesday 2014-01-08 Wednesday 2
20: 20 2014-01-08 Wednesday 2014-01-08 Wednesday 2
21: 21 2014-01-09 Thursday 2014-01-08 Wednesday 2
22: 22 2014-01-10 Friday 2014-01-08 Wednesday 2
23: 23 2014-01-11 Saturday 2014-01-08 Wednesday 2
24: 24 2014-01-12 Sunday 2014-01-08 Wednesday 2
25: 25 2014-01-13 Monday 2014-01-08 Wednesday 2
26: 26 2014-01-14 Tuesday 2014-01-15 Wednesday 3
27: 27 2014-01-15 Wednesday 2014-01-15 Wednesday 3
28: 28 2014-01-16 Thursday 2014-01-15 Wednesday 3
29: 29 2014-01-17 Friday 2014-01-15 Wednesday 3
30: 30 2014-01-18 Saturday 2014-01-15 Wednesday 3
31: 31 2014-01-19 Sunday 2014-01-15 Wednesday 3
32: 32 2014-01-20 Monday 2014-01-15 Wednesday 3
i day weekday day_rounded weekday_rounded weeknumber
My workaround is this function:
https://github.com/geneorama/geneorama/blob/master/R/round_weeks.R
round_weeks <- function(x){
require(data.table)
dt <- data.table(i = 1:length(x),
day = x,
weekday = weekdays(x))
offset <- data.table(weekday = c('Sunday', 'Monday', 'Tuesday', 'Wednesday',
'Thursday', 'Friday', 'Saturday'),
offset = -(0:6))
dt <- merge(dt, offset, by="weekday")
dt[ , day_adj := day + offset]
setkey(dt, i)
return(dt[ , day_adj])
}
Of course, you can easily change the offset to make Monday first or whatever. The best way to do this would be to add an offset to the offset... but I haven't done that yet.
I provided a link to my simple geneorama package, but please don't rely on it too much, because it is likely to change and is not well documented.
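round_weeks() maps weekday names to offsets through a merge, which depends on English weekday names from weekdays(). The same Sunday-based result, plus the "offset to the offset" for other week starts, can be sketched with modular arithmetic on POSIXlt's wday field. floor_week and week_start are hypothetical names, not part of data.table or geneorama.

```r
# Move each date back to the start of its week.
# week_start uses POSIXlt wday numbering: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
floor_week <- function(x, week_start = 0) {
  x <- as.Date(x)
  x - (as.POSIXlt(x)$wday - week_start) %% 7
}

d <- as.Date(c("2014-01-01", "2014-01-05", "2014-01-08"))
floor_week(d)                  # weeks starting on Sunday, like round_weeks()
floor_week(d, week_start = 1)  # weeks starting on Monday
```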
Using only base, I wrote the following function.
Note:
Assumes Mon is day number 1 in the week
First week is week 1
Returns 0 if week is 52 from last year
Fine-tune to suit your needs.
findWeekNo <- function(myDate){
# Find out the start day of week 1; that is the date of first Mon in the year
weekday <- switch(weekdays(as.Date(paste(format(as.Date(myDate),"%Y"),"01-01", sep = "-"))),
"Monday"={1},
"Tuesday"={2},
"Wednesday"={3},
"Thursday"={4},
"Friday"={5},
"Saturday"={6},
"Sunday"={7}
)
firstMon <- ifelse(weekday==1,1, 9 - weekday )
weekNo <- floor((as.POSIXlt(myDate)$yday - (firstMon-1))/7)+1
return(weekNo)
}
findWeekNo("2017-01-15") # 2
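findWeekNo() handles one date at a time (switch() needs a single value) and relies on English weekday names from weekdays(). A vectorized, locale-independent sketch of the same convention uses POSIXlt's wday and yday fields; find_week_no is a hypothetical name. The convention (Monday-based weeks, week 1 starting on the first Monday, earlier days in week 0) is the one ?strptime describes for base R's %W, so the two can be cross-checked.

```r
# Monday-based weeks: week 1 starts on the first Monday of the year,
# days before it are week 0.
find_week_no <- function(x) {
  lt <- as.POSIXlt(as.Date(x))
  days_since_monday <- (lt$wday + 6) %% 7  # Monday = 0, ..., Sunday = 6
  (lt$yday + 7 - days_since_monday) %/% 7
}

find_week_no(c("2017-01-01", "2017-01-02", "2017-01-15"))

# cross-check against base R's %W over more than a year of dates
d <- seq(as.Date("2016-12-25"), as.Date("2018-01-10"), by = "day")
all(find_week_no(d) == as.integer(format(d, "%W")))
```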

Conditional statement in R

I'm struggling to write an if-then statement in R. I have a variable called diel and I would like it to be either "day" or "night" based on the values of a variable called hour. I wrote the code first in SAS, and it looks like this:
length diel $5;
if 7 <= hour < 17 then diel = 'day';
if 19 <= hour <= 24 then diel = 'night';
if 0 <= hour < 5 then diel = 'night';
run;
As you can see, the hours of dusk (17-19) and dawn (5-7) are left out. This is the part I'm having trouble with in R: I can't figure out how to leave out dusk and dawn. When I write:
dat4$diel <- ifelse ((dat4$hour)< 17, ifelse((dat4$hour) <=7,"day","night"),"night")
it labels the correct hours as day but labels everything else as night. When I try any other combination, such as adding another ifelse statement, it overwrites the first statement and labels all of the hours as day. Thanks for any suggestions!
Something like this might do:
hour <- 0:24
c('night', NA, 'day', NA, 'night')[findInterval(hour, c(0,5,7,17,19,24), rightmost.closed=TRUE)]
## [1] "night" "night" "night" "night" "night" NA NA
## [8] "day" "day" "day" "day" "day" "day" "day"
## [15] "day" "day" "day" NA NA "night" "night"
## [22] "night" "night" "night" "night"
I would do this:
dat = data.frame(hour =0:24)
transform(dat,diel =ifelse( hour < 17 & hour >=7 , 'day',
ifelse(hour>=19 | hour <5,'night',NA)))
hour diel
1 0 night
2 1 night
3 2 night
4 3 night
5 4 night
6 5 <NA>
7 6 <NA>
8 7 day
9 8 day
10 9 day
11 10 day
12 11 day
13 12 day
14 13 day
15 14 day
16 15 day
17 16 day
18 17 <NA>
19 18 <NA>
20 19 night
21 20 night
22 21 night
23 22 night
24 23 night
25 24 night
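A third option that stays close to the SAS version: start the column as NA and overwrite only the rows each condition selects, so hours that match no condition (dusk and dawn) stay NA. The dat4 data frame below is an example with the hours 0 to 24.

```r
dat4 <- data.frame(hour = 0:24)
# start with NA so that dusk (17-19) and dawn (5-7) stay unassigned
dat4$diel <- NA_character_
# each line mirrors one of the SAS if-statements in the question
dat4$diel[dat4$hour >= 7 & dat4$hour < 17] <- "day"
dat4$diel[dat4$hour >= 19 & dat4$hour <= 24] <- "night"
dat4$diel[dat4$hour >= 0 & dat4$hour < 5] <- "night"
table(dat4$diel, useNA = "ifany")
```

Unlike nested ifelse calls, later assignments here only touch the rows their own condition selects, so they cannot overwrite the "day" labels.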
