R: Get workweek number, not seven day periods since Jan 1st

Hi, I am looking at commodity price data covering a few years. I want to summarize prices by work week, not by weeks defined as seven-day periods since Jan 1st. When I tried:
data <- mutate(data, week = week(strptime(Date, "%m/%d/%Y")))
The lubridate week() function counts "1/13/10" (mdy) as week 2 and "1/14/10" as week 3. I want those to be in the same week: basically, any run of Mon-Fri in the same calendar week. If the year starts on a Wednesday, I want week 1 to be Wed-Fri and week 2 to start the next Monday. I have no data on any weekends. Any thoughts? Thanks

This will give you week number assuming Date column is in Date format (you can use as.Date() to convert):
data <- mutate(data, week = format(Date, '%U'))
If you want week and year, you can use:
data <- mutate(data, week = format(Date, '%Y-%U'))
It will correctly number partial weeks.
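Note: %U starts weeks on Sunday, and days before the first Sunday of the year get week number 00 (that should be no problem for grouping). Use %W instead if you want weeks to start on Monday.
You can also do it without dplyr and its mutate(), like this: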
data$week <- format(data$Date, '%U')
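As a quick check of the idea above, here is a minimal sketch in base R. The helper name workweek is hypothetical (it is not part of lubridate or dplyr). It assumes the data contain no weekend dates, and it shifts the Monday-based %W number by one only when January 1st falls on Tuesday through Friday, so that a partial first week counts as week 1.

```r
# Hypothetical helper: Monday-based work week number in which a partial
# first week (year starting Tue-Fri) counts as week 1.
workweek <- function(d) {
  w <- as.integer(format(d, "%W"))             # 00 before the first Monday
  jan1 <- as.Date(paste0(format(d, "%Y"), "-01-01"))
  jan1_wday <- as.integer(format(jan1, "%u"))  # 1 = Monday ... 7 = Sunday
  w + (jan1_wday %in% 2:5)                     # shift only for a Tue-Fri start
}

d <- as.Date(c("1/1/10", "1/4/10", "1/13/10", "1/14/10"), format = "%m/%d/%y")
data.frame(Date = d, U = format(d, "%U"), workweek = workweek(d))
```

With either %U or this helper, 1/13/10 and 1/14/10 land in the same week. The helper only changes how the first days of the year are numbered.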

Related

Convert from character to date in a "YYYY-WW" format in R

I have a hard time converting character to date in R.
I have a file where the dates are given as "2014-01", where the first is the year and the second is the week of the year. I want to convert this to a date type.
I have tried the following
z <- as.Date('2014-01', '%Y-%W')
print(z)
Output: "2014-12-05"
Which is not what I want. I want to get the same format out, i.e. the output should be "2014-01", but now as a date type.
It sounds like you are dealing with some version of year week, which exists in three forms in lubridate:
week() returns the number of complete seven day periods that have occurred between the date and January 1st, plus one.
isoweek() returns the week as it would appear in the ISO 8601 system, which uses a reoccurring leap week.
epiweek() is the US CDC version of epidemiological week. It follows same rules as isoweek() but starts on Sunday. In other parts of the world the convention is to start epidemiological weeks on Monday, which is the same as isoweek.
Lubridate has functions to extract these from a date, but I don't know of a built-in way to go the other direction, from a week to one representative day (out of 7 possible). If you're dealing with the first version, one simple way is to add 7 * (Week - 1) days to January 1 of the year.
library(dplyr)
data.frame(yearweek = c('2014-01', '2014-03')) %>%
tidyr::separate(yearweek, c("Year", "Week"), convert = TRUE) %>%
mutate(Date = as.Date(paste0(Year, "-01-01")) + 7 * (Week-1))
  Year Week       Date
1 2014    1 2014-01-01
2 2014    3 2014-01-15
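If the week numbers follow the %W convention instead (weeks start on Monday and week 01 begins on the first Monday of the year), similar arithmetic in base R maps each year-week to the Monday that starts it. This is only a sketch: the function name yearweek_to_monday is made up for illustration, and treating the input as %W-style is an assumption about the data.

```r
# Hypothetical helper: "YYYY-WW" (Monday-based %W weeks) -> Monday of that week
yearweek_to_monday <- function(yearweek) {
  parts <- strsplit(yearweek, "-", fixed = TRUE)
  year <- vapply(parts, `[`, character(1), 1)
  week <- as.integer(vapply(parts, `[`, character(1), 2))
  jan1 <- as.Date(paste0(year, "-01-01"))
  jan1_wday <- as.integer(format(jan1, "%u"))   # 1 = Monday ... 7 = Sunday
  first_monday <- jan1 + (8 - jan1_wday) %% 7   # first Monday on or after Jan 1
  first_monday + 7 * (week - 1)
}

mondays <- yearweek_to_monday(c("2014-01", "2014-03"))
format(mondays, "%Y-%W")  # formats back to the original year-week strings
```

The result is a real Date vector, so it can be sorted or plotted, and format(x, "%Y-%W") recovers the original label for display.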

Convert day of year to date assuming all years are non-leap years

I have a df with year and day of year as columns:
dat <- data.frame(year = rep(1980:2015, each = 365), day = rep(1:365,times = 36))
Please note that I am assuming 365 days in a year even if it is a leap year. I need to generate two things:
1) month
2) date
I did this:
# this lists which days of the year fall in each month
months <- list(1:31, 32:59, 60:90, 91:120, 121:151, 152:181, 182:212, 213:243, 244:273, 274:304, 305:334, 335:365)
library(dplyr)
# this assigns each day to a month
dat1 <- dat %>% mutate(month = sapply(day, function(x) which(sapply(months, function(y) x %in% y))))
I want to produce a third column which is a date in the format year,month,day.
However, since I am assuming all years are non-leap years, I need to ensure that my dates also reflect this i.e. there should be no date as 29th Feb.
The reason I need the date is that I want to number the 15-day periods of a year. A year will have 24 such periods:
1st Jan - 15th Jan: period 1
16th Jan - 31st Jan: period 2
1st Feb - 15th Feb: period 3 ...
16th Dec - 31st Dec: period 24
I need dates to tell whether a day falls in the first half of its month (day <= 15) or the second half (day > 15). I use the following script to do this:
dat2 <- dat1 %>% mutate(twowk = month*2 - (as.numeric(format(date,"%d")) <= 15))
In order for me to run this above line, I need to generate date and hence my question.
A possible solution:
dat$dates <- as.Date(paste0(dat$year,'-',
format(strptime(paste0('1981-',dat$day), '%Y-%j'),
'%m-%d'))
)
What this does:
With strptime(paste0('1981-',dat$day), '%Y-%j') you get the dates of a non-leap year.
By embedding that in format with '%m-%d' you extract the month and the day in the month.
paste that together with the year in the year-column and wrap that in as.Date to get a non-leap-year date.
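To see the pieces working together, here is a small self-contained sketch on two years of data. It reuses the reference-year idea from the answer and then derives the month and the half-month period; the column names date, month and twowk follow the question, and the two-year sample is just for illustration.

```r
# Two years of 365 days each; 1980 is a leap year in the real calendar
dat <- data.frame(year = rep(c(1980, 1981), each = 365),
                  day  = rep(1:365, times = 2))

# Month-day taken from a non-leap reference year, so "02-29" never appears
md <- format(strptime(paste0("1981-", dat$day), "%Y-%j"), "%m-%d")
dat$date  <- as.Date(paste0(dat$year, "-", md))
dat$month <- as.integer(format(dat$date, "%m"))

# Half-month period: 1 for 1-15 Jan, 2 for 16-31 Jan, ..., 24 for 16-31 Dec
dat$twowk <- dat$month * 2 - (as.integer(format(dat$date, "%d")) <= 15)

dat[c(59, 60), ]  # day 59 and day 60 of 1980: 28 Feb, then 1 Mar
```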

Split dataframe and calculate averages for data subsets in R

I have this data frame in R:
steps day month
4758 Tuesday December
9822 Wednesday December
10773 Thursday December
I want to iterate over the data frame and apply a function to the steps column based on the value in the month column. I'm trying to work out the average number of steps per weekday for each month.
I want to output to a new data frame like so where the week days repeat but I only have the average values per day:
average.steps day month
4500 Tuesday December
9000 Wednesday December
1000 Thursday December
I can work out how to work out the averages for the data frame as a whole, but want to use a for loop to apply it just for step values from the same month.
avgsteps <- ddply(DATA, "day", summarise, msteps = mean(steps))
My basic idea for the for function was:
f <- function(m in month) {ddply(DATA, "day", summarise, msteps = mean(steps))}
But it won't process it and throws the error:
Error: unexpected 'in' in "f <- function(m in"
Any help would be greatly appreciated!
EDIT:
So I've tried @agstudy's suggested fix (below) and it gets the right data structure (a single value for each weekday for each month), but the value assigned to each day is identical. I'm a bit confused about what could be going wrong.
steps.month.day.avg <- ddply(steps.month.day, .(fitbit.day,fitbit.month), summarise, msteps = mean(steps))
No need to loop here; you should just change the variables the data frame is split by:
ddply(DATA, .(day,month), summarise, msteps = mean(steps))
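For comparison, the same split-by-two-variables grouping can be sketched in base R with aggregate(), which avoids loading plyr. The toy step counts below are invented for illustration; the column names follow the question.

```r
# Toy data; the step counts are invented for illustration
DATA <- data.frame(
  steps = c(4000, 5000, 9000, 1000, 3000),
  day   = c("Tuesday", "Tuesday", "Wednesday", "Tuesday", "Tuesday"),
  month = c("December", "December", "December", "January", "January")
)

# One row per (day, month) pair with the mean of steps in that group
avg <- aggregate(steps ~ day + month, data = DATA, FUN = mean)
names(avg)[names(avg) == "steps"] <- "msteps"
avg
```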

Calculating the number of weeks for each year based on dates using R

I have a dataset with dates of 2 different years (2009 and 2010) and would like to have the corresponding week number for each date.
My dataset is similar to this:
anim <- c(012,023,045,098,067)
dob <- c("01-09-2009","12-09-2009","22-09-2009","10-10-2010","28-10-2010")
mydf <- data.frame(anim,dob)
mydf
anim dob
1 12 01-09-2009
2 23 12-09-2009
3 45 22-09-2009
4 98 10-10-2010
5 67 28-10-2010
I would like to have variable "week" in the third column with the corresponding week numbers for each date.
EDIT:
Note: Week one begins on January 1st, week two begins on January 8th for each year
Any help would be highly appreciated.
Baz
Your definition of "week of year"
EDIT: Note: Week one begins on January 1st, week two begins on January 8th for each year
differs from the standard ones supported by strftime:
%U
Week of the year as decimal number (00–53) using Sunday as the first day 1
of the week (and typically with the first Sunday of the year as day 1 of
week 1). The US convention.
%W
Week of the year as decimal number (00–53) using Monday as the first day
of week (and typically with the first Monday of the year as day 1 of week
1). The UK convention.
So you need to compute it based on the day-of-year number.
# subtract 1 from the day of year so that days 1-7 map to week 1
mydf$week <- ((as.numeric(strftime(as.POSIXct(mydf$dob,
format="%d-%m-%Y"),
format="%j")) - 1) %/% 7) + 1
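The day-of-year arithmetic is easy to get off by one, so here is a short sketch that checks it at the week boundaries. The function name week_from_jan1 is hypothetical. Note that the day of year has to be reduced by one before the integer division; otherwise January 7th would already land in week 2.

```r
# Hypothetical helper: week number where week 1 starts on Jan 1,
# week 2 on Jan 8, and so on
week_from_jan1 <- function(x, fmt = "%d-%m-%Y") {
  doy <- as.integer(strftime(as.Date(x, format = fmt), "%j"))
  (doy - 1) %/% 7 + 1   # days 1-7 -> week 1, days 8-14 -> week 2, ...
}

week_from_jan1(c("01-01-2009", "07-01-2009", "08-01-2009"))  # boundary check

mydf <- data.frame(dob = c("01-09-2009", "12-09-2009", "22-09-2009",
                           "10-10-2010", "28-10-2010"))
mydf$week <- week_from_jan1(mydf$dob)
mydf
```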
Post 2011 Answer
library(lubridate)
mydf$week <- week(as.Date(mydf$dob, format = "%d-%m-%Y"))
lubridate package is straight-forward for day-to-day tasks like this.
If you want to know how many weeks (or 7-day periods) have passed between your date of interest and the first day of the year, regardless of which day of the week the year started on, the following is a solution (using floor_date from lubridate, after converting dob to Date).
dob <- as.Date(mydf$dob, format = "%d-%m-%Y")
mydf$weeks <- difftime(dob, floor_date(dob, "year"), units = "weeks")

Bucketing data into weekly, bi-weekly, monthly and quarterly data in R

I have a data frame with two columns. Date, Gender
I want to change the Date column to the start of the week for that observation. For example if Jun-28-2011 is a Tuesday, I'd like to change it to Jun-27-2011. Basically I want to re-label Date fields such that two data points that are in the same week have the same Date.
I also want to be able to do it bi-weekly, monthly, and especially quarterly.
Update:
Let's use this as a dataset.
datset <- data.frame(date = as.Date("2011-06-28")+c(1:100))
One slick way to do this that I just learned recently is to use the lubridate package:
library(lubridate)
datset <- data.frame(date = as.Date("2011-06-28")+c(1:100))
#Add 1, since floor_date appears to round down to Sundays
floor_date(datset$date,"week") + 1
I'm not sure about how to do bi-weekly binning, but monthly and quarterly are easily handled with the respective base functions:
quarters(datset$date)
months(datset$date)
EDIT: Interestingly, floor_date from lubridate does not appear to be able to round down to the nearest quarter, but the function of the same name in ggplot2 does.
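For the bi-weekly case, and as a package-free alternative for the others, here is a base R sketch that relabels each date with the start of its bucket. The variable names are made up for illustration, and the bi-weekly origin (a Monday chosen as the start of the first 14-day bin) is an assumption you would pick to suit your data.

```r
d <- as.Date(c("2011-06-28", "2011-07-13", "2011-07-28"))

# Weekly: step back to the Monday of each date's week (%u: 1 = Monday)
week_start <- d - (as.integer(format(d, "%u")) - 1)

# Bi-weekly: 14-day bins counted from an arbitrary Monday (an assumption)
origin <- as.Date("2011-06-27")
biweek_start <- origin + 14 * (as.numeric(d - origin) %/% 14)

# Monthly and quarterly: first day of the month or of the quarter
month_start <- as.Date(format(d, "%Y-%m-01"))
q_month <- as.integer(3 * ((as.integer(format(d, "%m")) - 1) %/% 3) + 1)
quarter_start <- as.Date(sprintf("%s-%02d-01", format(d, "%Y"), q_month))

data.frame(d, week_start, biweek_start, month_start, quarter_start)
```

Because every bucket is labelled by a Date, two observations in the same week, fortnight, month or quarter end up with the same value and can be grouped directly.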
Look at ?strftime. In particular, the following formats:
%b: Abbreviated month name in the current locale. (Also matches full name on input.)
%B: Full month name in the current locale. (Also matches abbreviated name on input.)
%m: Month as decimal number (01–12).
%W: Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
eg:
> strftime("2011-07-28","Month: %B, Week: %W")
[1] "Month: July, Week: 30"
> paste("Quarter:",ceiling(as.integer(strftime("2011-07-28","%m"))/3))
[1] "Quarter: 3"
