Bucketing data into weekly, bi-weekly, monthly and quarterly data in R

I have a data frame with two columns. Date, Gender
I want to change the Date column to the start of the week for that observation. For example if Jun-28-2011 is a Tuesday, I'd like to change it to Jun-27-2011. Basically I want to re-label Date fields such that two data points that are in the same week have the same Date.
I also want to be able to do it bi-weekly, monthly and especially quarterly.
Update:
Let's use this as a dataset.
datset <- data.frame(date = as.Date("2011-06-28")+c(1:100))

One slick way to do this that I just learned recently is to use the lubridate package:
library(lubridate)
datset <- data.frame(date = as.Date("2011-06-28")+c(1:100))
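# floor_date() rounds down to Sunday by default, so adding 1 gives a Monday.
# Caveat: a date that is itself a Sunday is then labelled with the following Monday.
# Newer lubridate versions also accept week_start = 1 for Monday-based weeks.
floor_date(datset$date, "week") + 1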
I'm not sure how to do bi-weekly binning, but monthly and quarterly labels are easily obtained with the respective base functions. Note that these return names such as "Q3" and "July" without the year, not dates:
quarters(datset$date)
months(datset$date)
EDIT: Interestingly, at the time of writing floor_date from lubridate did not appear to be able to round down to the nearest quarter, while the function of the same name in ggplot2 did. Current lubridate versions accept "quarter" as a unit.
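For the bi-weekly case, one option that needs no packages is base R's cut() method for Date objects, which accepts interval strings such as "week", "2 weeks", "month" and "quarter". The following is a minimal sketch on the same dataset. The bucket() helper is a name made up for this example, and the two-week bins are simply counted from the Monday of the first week in the data, which is an assumption about what "bi-weekly" should mean.

```r
# Same example data as above
datset <- data.frame(date = as.Date("2011-06-28") + 1:100)

# Hypothetical helper: label each date with the first day of its bin.
# cut() on Dates returns a factor whose levels are the bin start dates.
bucket <- function(x, by) as.Date(as.character(cut(x, breaks = by)))

datset$week      <- bucket(datset$date, "week")     # weeks start on Monday by default
datset$fortnight <- bucket(datset$date, "2 weeks")  # 14-day bins from the first Monday
datset$month     <- bucket(datset$date, "month")    # first day of the month
datset$quarter   <- bucket(datset$date, "quarter")  # first day of the quarter

head(datset)
```

Because every column holds a real Date, two observations in the same bin share the same value and the result can be passed straight to aggregate() or table().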

Look at ?strftime. In particular, the following formats:
%b: Abbreviated month name in the current locale. (Also matches full name on input.)
%B: Full month name in the current locale. (Also matches abbreviated name on input.)
%m: Month as decimal number (01–12).
%W: Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
eg:
> strftime("2011-07-28","Month: %B, Week: %W")
[1] "Month: July, Week: 30"
> paste("Quarter:",ceiling(as.integer(strftime("2011-07-28","%m"))/3))
[1] "Quarter: 3"

Related

Convert from character to date in a "YYYY-WW" format in R

I have a hard time converting character to date in R.
I have a file where the dates are given as "2014-01", where the first is the year and the second is the week of the year. I want to convert this to a date type.
I have tried the following
z <- as.Date('2014-01', '%Y-%W')
print(z)
Output: "2014-12-05"
This is not what I want. I want to keep the same format, i.e. the output should still read "2014-01", but as a date type.
It sounds like you are dealing with some version of year week, which exists in three forms in lubridate:
week() returns the number of complete seven day periods that have occurred between the date and January 1st, plus one.
isoweek() returns the week as it would appear in the ISO 8601 system, which uses a reoccurring leap week.
epiweek() is the US CDC version of epidemiological week. It follows same rules as isoweek() but starts on Sunday. In other parts of the world the convention is to start epidemiological weeks on Monday, which is the same as isoweek.
Lubridate has functions to extract these from a date, but I don't know of a built-in way to go in the other direction, from a week to one representative day (out of 7 possible). If you're dealing with the first version, one simple way is to add 7 * (Week - 1) days to January 1 of the year.
library(dplyr)
data.frame(yearweek = c('2014-01', '2014-03')) %>%
  tidyr::separate(yearweek, c("Year", "Week"), convert = TRUE) %>%
  mutate(Date = as.Date(paste0(Year, "-01-01")) + 7 * (Week - 1))
  Year Week       Date
1 2014    1 2014-01-01
2 2014    3 2014-01-15
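If the week numbers follow ISO 8601 instead of the "January 1 plus multiples of 7" convention, the start of a week can be derived in base R from the rule that ISO week 1 is the week containing January 4. The sketch below assumes that convention. iso_week_start() is a hypothetical helper written for this example, not a lubridate function.

```r
# Hypothetical helper: Monday that starts a given ISO 8601 week.
iso_week_start <- function(year, week) {
  jan4 <- as.Date(sprintf("%d-01-04", as.integer(year)))
  wday <- as.integer(format(jan4, "%u"))  # 1 = Monday, ..., 7 = Sunday
  week1_monday <- jan4 - (wday - 1)       # Monday of the week containing January 4
  week1_monday + 7 * (as.integer(week) - 1)
}

yearweek <- c("2014-01", "2014-03")
parts <- strsplit(yearweek, "-")
years <- sapply(parts, `[`, 1)
weeks <- sapply(parts, `[`, 2)

iso_week_start(years, weeks)
```

Note that under ISO rules the first week of 2014 begins in the previous calendar year, so the two conventions can differ by several days.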

Is there a way to use the round date to next trading day while keeping both date and variable columns in R?

How can I round the dates in the date column to the following business day? So each Saturday, Sunday and holiday should be transformed to the following business day. Furthermore, how can we include the output from the other columns as well in the transformation to following business days?
I tried this with the bizdays function:
TestDates <- RawTweetDataWSentiment
View(TestDates)
bizdays.options$set(default.calendar="UnitedKingdom/ANBIMA")
cal <- create.calendar("UnitedKingdom/ANBIMA", holidays=holidaysANBIMA, weekdays=c("saturday", "sunday"))
adjust.next(TestDates$Date, cal)
TestDates1 <- adjust.next(TestDates$Date, cal)
View(TestDates1)
This, however, only returns the date column.
Does anyone know how to do this in R while keeping the other columns?
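Since adjust.next() is applied to a vector of dates, its result is only that vector. Assigning the result back to the column (TestDates$Date <- adjust.next(TestDates$Date, cal)) instead of to a new object should keep the other columns alongside the adjusted dates. The same pattern is sketched below in base R so that it runs without bizdays. The tweets data frame, the holiday list and the roll_to_business_day() helper are all made up for illustration.

```r
# Hypothetical helper: move weekends and holidays forward to the next business day.
roll_to_business_day <- function(dates, holidays = as.Date(character())) {
  is_off <- function(d) format(d, "%u") %in% c("6", "7") | d %in% holidays
  off <- is_off(dates)
  while (any(off)) {
    dates[off] <- dates[off] + 1  # step one day forward and re-check
    off <- is_off(dates)
  }
  dates
}

# Made-up data: a date column plus one other variable
tweets <- data.frame(
  Date      = as.Date(c("2021-01-01", "2021-01-02", "2021-01-04")),
  Sentiment = c(0.1, -0.3, 0.5)
)

# Overwrite the date column in place; Sentiment stays attached to each row
tweets$Date <- roll_to_business_day(tweets$Date, holidays = as.Date("2021-01-01"))
tweets
```

After the adjustment, several rows can share a trading day, so they can be combined with aggregate(Sentiment ~ Date, data = tweets, FUN = mean) if one value per day is needed.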

Convert YYYY-YY to Year(date)

I have a data frame with year column as financial year
Year
2001-02
2002-03
2003-04
How can I convert this with as.Date, keeping either the whole thing or just the second year, i.e. 2002, 2003, 2004? When converting with %Y, I inevitably get 2001-08-08, 2002-08-08, 2003-08-08, etc.
Thanks
library(lubridate)
Year <- c('2001-02', '2002-03', '2003-04')
year(as.Date(gsub('[0-9]{2}-', '', Year), format = '%Y'))
1) ISOdate Clarifying the question, since it refers to yearend and Date we assume that the input is the fiscal Year shown in the question (plus we have added the "1999-00" edge case) as well as the month and day of the yearend. We assume that the output desired is the yearend as a Date object. (If that is not the intended question and you just want the fiscal yearend year as a number then see Note at the end.)
Returning to the assumed problem, let us suppose, for example, that March 31st is the yearend. Below we extract the first 4 characters of Year using substring, convert that to numeric and add 1. Then we pass that along with month and day to ISOdate and finally convert the result to Date. No regular expressions or packages are used.
# test inputs
month <- 3
day <- 31
Year <- c("1999-00", "2001-02", "2002-03", "2003-04")
# yearends
as.Date(ISOdate( as.numeric(substring(Year, 1, 4))+1, month, day))
## [1] "2000-03-31" "2002-03-31" "2003-03-31" "2004-03-31"
2) string manipulation An alternative solution using the same inputs is the following. It is similar, except that we use sub with a regular expression that matches the minus sign and the two characters after it, substituting a zero-length string for them, then convert to numeric and add 1. We then build a string in a format acceptable to as.Date using sprintf and finally apply as.Date. No packages are used.
as.Date(sprintf("%d-%d-%d", as.numeric(sub("-..", "", Year))+1, month, day))
## [1] "2000-03-31" "2002-03-31" "2003-03-31" "2004-03-31"
Note: If you only wanted the fiscal yearend year as a number then it would be just this:
as.numeric(substring(Year, 1, 4)) + 1

R: Creating two date variables from a complete date

I have date recorded as: Month/Day/Year or MM/DD/YYYY
I would like to write code that creates two new variables from that information.
I would like a year variable alone
I would like to create a quarter variable
The Quarter Variables would not be influenced by year. I would want this variable to apply to all years.
Quarter 1 would be January 1 - March 31
Quarter 2 would be April 1 - June 30
Quarter 3 would be July 1 - September 30
Quarter 4 would be October 1 - December 31
Any assistance would be greatly appreciated. I cannot seem to get the nuance of how to do these functions in R.
Thanks,
Jared
Assuming that the date variable is of a POSIX class (POSIXct or POSIXlt), you could do:
#example date
date <- as.POSIXlt( "05/12/2015", format='%m/%d/%Y')
To return the year from a date, data.table already provides a function, year:
library(data.table)
> year(date)
[1] 2015
As for the quarter it can easily be created from the function below (uses data.table::month that returns the number of a month):
quarter <- function(x) {
  rep(c('quarter 1','quarter 2','quarter 3','quarter 4'), each=3)[month(x)]
}
> quarter(date)
[1] "quarter 2"
Using only the base packages:
Try parsing your dates with the strptime function so that they are all in Year-Month-Day format. This format constrains each element of the date to have the same character length and the same position. See the strptime documentation for the appropriate format argument.
# the dates must be quoted strings, otherwise R evaluates them as divisions
date.vec <- c("1/1/1999", "2/2/1999")
fmt.date.vec <- strptime(date.vec, "%m/%d/%Y")
With the dates in this format it is easy to extract the year, month, and day using the substring function
Year<-substring(fmt.date.vec,1,4)
Month<-substring(fmt.date.vec,6,7)
Day<-substring(fmt.date.vec,9,10)
With this information you can now generate your Quarter vector any number of ways. For example if a data.frame "df" has a Month column:
df$Quarter<-"Quarter_1"
df[df$Month %in% c("04","05","06"),]$Quarter<-"Quarter_2"
df[df$Month %in% c("07","08","09"),]$Quarter<-"Quarter_3"
df[df$Month %in% c("10","11","12"),]$Quarter<-"Quarter_4"
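The same two variables can also be derived arithmetically, without substrings or repeated subsetting. This is a minimal base R sketch, assuming the input is a character vector in MM/DD/YYYY form. The sample values and the column names are made up for illustration.

```r
# Made-up sample dates in MM/DD/YYYY form
raw <- c("05/12/2015", "11/30/2016", "01/01/2017")
dates <- as.Date(raw, format = "%m/%d/%Y")

df <- data.frame(date = dates)
df$Year <- as.integer(format(dates, "%Y"))

# Months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4, independent of the year
df$Quarter <- (as.integer(format(dates, "%m")) - 1) %/% 3 + 1

df
```

The integer division maps each block of three months to one quarter number, so the quarter does not depend on the year.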

Post-Process a Stata %tw date in R

The %tw format in Stata has the form: 1960w1 which has no equivalent in R.
Therefore %tw dates must be post-processed.
Importing a .dta file into R, the date is an integer like 1304 (instead of 1985w5) or 1426 (instead of 1987w23). If it was a simple time series you could set a starting date as follows:
ts(df, start= c(1985,5), frequency=52)
Another possibility would be:
as.Date(Camp$date, format= "%Yw%W" , origin = "1985w5")
But if each row is not a single date, then you must convert it.
The package ISOweek is based on ISO-8601 with the form "1985-W05" and does not process the Stata %tw.
The lubridate package does not work with this format. Its week() function returns the number of complete seven day periods that have occurred between the date and January 1st, plus one.
In Stata, week 1 of any year starts on 1 January, whatever day of the week that is (see the Stata documentation on dates).
With the %W format in R, weeks start on Monday.
From ?strptime, %V is
the week of the year as decimal number (01–53) as defined in ISO 8601. If the week (starting on Monday) containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise, it is the last week of the previous year, and the next week is week 1. (Accepted but ignored on input.)
Larmarange noted on GitHub that haven doesn't interpret these dates properly:
months, week, quarter and halfyear are specific format from Stata,
respectively %tm, %tw, %tq and %th. I'm not sure that there are
corresponding formats available in R. So far they are imported as
integers.
Is there a way to convert Stata %tw to a date format R understands?
Here is a Stata file with dates
This won't be an answer in terms of R code, but it is commentary on Stata weeks that can't be fitted into a comment.
Strictly, dates in Stata are not defined by the display formats that make them intelligible to people. A date in Stata is always a numeric variable or scalar or macro defined with origin the first instance in 1960. Thus it is at best a shorthand to talk about %tw dates, etc. We can use display to see the effects of different date display formats:
. di %td 0
01jan1960
. di %tw 0
1960w1
. di %tq 0
1960q1
. di %td 42
12feb1960
. di %tw 42
1960w43
. di %tq 42
1970q3
A subtle point made explicit above is that changing the display format will not change what is stored, i.e. the numeric value.
Otherwise put, dates in Stata are not distinct data types; they are just integers made intelligible as dates by a pertinent display format.
The question presupposes that it was correct to describe some weekly dates in terms of Stata weeks. This seems unlikely, as I know no instance in which a body outside StataCorp uses the week rules of Stata, not only that week 1 always starts on 1 January, but also that week 52 always includes either 8 or 9 days and hence that there is never a week 53 in a calendar year.
So, you need to go upstream and find out what the data should have been. Failing some explanation, my best advice is to map the 52 weeks of each year to the days that start them, namely days 1(7)358 of each calendar year.
Stata weeks won't map one-to-one to any other scheme for defining weeks.
More in this article on Stata weeks
It's not completely clear what the question is but the year and week corresponding to 1304 are:
wk <- 1304
1960 + wk %/% 52
## [1] 1985
wk %% 52 + 1
## [1] 5
so assuming that the first week of the year is week 1 and starts on Jan 1st, the beginning of the above week is this date:
as.Date(paste(1960 + wk %/% 52, 1, 1, sep = "-")) + 7 * (wk %% 52)
## [1] "1985-01-29"
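For a whole column of imported integers, the same arithmetic can be wrapped in a vectorized function. This is a sketch under the assumption stated above, namely that week 1 starts on January 1 and each year has exactly 52 weeks. stata_tw_to_date() is a hypothetical name, not part of haven or any other package.

```r
# Hypothetical helper: convert Stata weekly integers (weeks since 1960w1)
# into the year, the week number and the date on which that week starts.
stata_tw_to_date <- function(tw) {
  year <- 1960 + tw %/% 52
  week <- tw %% 52 + 1
  start <- as.Date(sprintf("%d-01-01", as.integer(year))) + 7 * (week - 1)
  data.frame(
    label = paste0(year, "w", week),  # Stata-style display, e.g. "1985w5"
    year  = year,
    week  = week,
    start = start
  )
}

res <- stata_tw_to_date(c(1304, 1426))
res
```

The two inputs are the integers mentioned in the question, and the labels reproduce the 1985w5 and 1987w23 that Stata displays for them.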
