R: create index for xts time object from calendar week, e.g. 201501 ... 201553

I know how to get the week from an index, but not the other way around: how to create an index if I have the calendar weeks (in this case from an SAP system, with 0CALWEEK values such as 201501, 201502 ... 201552, 201553).
I found this:
How to Parse Year + Week Number in R?
but a day is needed there, and it's not clear how to set it, especially at the end of the year (Year-Week-Day: YEAR-53-01 does not always exist, since the first day of week 53 might be a Monday, and then 01 (Sunday) is not in that week).
I could try to get the first day of the corresponding week from the source system (through SQL), but thought R might do it more easily...
Do you have any suggestions?
(Which day counts as the first day of the week is not important, since I will create all objects the same way, merge/cbind them, and then continue the analysis. If zoo is easier, I'll go with it.)
Thanks!

The problem is that all indices end up as 2015-07-29:
library(xts)
data <- 1:4
weeks <- c('201501','201502','201552','201553')
weeks_2 <- as.Date(weeks,format='%Y%w')
xts(data, order.by = weeks_2)
[,1]
2015-07-29 1
2015-07-29 2
2015-07-29 3
2015-07-29 4
test <- xts(data, order.by = weeks_2)
index(test)
[1] "2015-07-29" "2015-07-29" "2015-07-29" "2015-07-29"

You can use the as.Date() function; I think it is the easiest way:
weeks <- c('201501','201502','201552','201553')
as.Date(paste0(weeks,'1'),format='%Y%W%w') # paste a dummy day
## [1] "2015-01-05" "2015-01-12" "2015-12-28" NA
Where:
%W: Week 00-53 with Monday as first day of the week
or
%U: Week 00-53 with Sunday as first day of the week
%w: Weekday 0-6 Sunday is 0
In 2015, week 53 doesn't exist under %W (its Monday would be 2016-01-04), hence the NA. If you want to start with 2015-01-01 (a Thursday), just set the matching weekday:
weeks <- c('201500','201501','201502','201551','201552')
as.Date(paste0(weeks,'4'),format='%Y%W%w')
## [1] "2015-01-01" "2015-01-08" "2015-01-15" "2015-12-24" "2015-12-31"
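If the SAP week codes follow ISO 8601 (weeks start on Monday and week 01 is the week containing 4 January), then week 53 does exist in some years, 2015 among them, and the %W approach above returns NA for it. A minimal base-R sketch under that assumption (worth checking against the source system) computes the Monday of each ISO week directly; iso_week_monday is a hypothetical name:

```r
# Monday of an ISO 8601 week given as "YYYYWW" (e.g. "201553").
# Assumption: the codes follow ISO 8601 (weeks start on Monday and week 01
# is the week containing 4 January); verify this against the source system.
iso_week_monday <- function(yyyyww) {
  yr <- as.integer(substr(yyyyww, 1, 4))
  wk <- as.integer(substr(yyyyww, 5, 6))
  jan4 <- as.Date(sprintf("%d-01-04", yr))
  # POSIXlt$wday is 0 = Sunday ... 6 = Saturday; shift so that Monday = 0
  days_since_monday <- (as.POSIXlt(jan4)$wday + 6) %% 7
  # Monday of week 01, then whole weeks forward
  jan4 - days_since_monday + 7 * (wk - 1)
}

weeks <- c("201501", "201502", "201552", "201553")
iso_week_monday(weeks)
## [1] "2014-12-29" "2015-01-05" "2015-12-21" "2015-12-28"
```

The result can be passed as order.by to xts(); because every object built this way maps a given week code to the same Monday, merge/cbind will line them up.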

You could try substr() and lubridate:
library(lubridate)
# a number from your list: 201502
# set the year (substr("201502", 1, 4) is "2015")
x <- ymd("2015-01-01")
# move to the second week (substr("201502", 5, 6) is "02")
week(x) <- 2
x
[1] "2015-01-08"
You can use the result for your index or rownames().
zoo and xts are great for time series once you have set the index;
be sure to remove any column with dates from your data frame.
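The same substr() idea can be vectorized over the whole list. A minimal base-R sketch, assuming lubridate's week() convention (week n is the n-th block of seven days counted from 1 January, so week 53 is only one or two days long); the variable names are illustrative:

```r
# Vectorized base-R version of the substr() idea, assuming lubridate's week()
# convention: week n starts 7 * (n - 1) days after 1 January.
weeks <- c("201501", "201502", "201552", "201553")
yr <- as.integer(substr(weeks, 1, 4))
wk <- as.integer(substr(weeks, 5, 6))
idx <- as.Date(sprintf("%d-01-01", yr)) + 7 * (wk - 1)
idx
## [1] "2015-01-01" "2015-01-08" "2015-12-24" "2015-12-31"
```

The second element matches the 2015-01-08 that week(x) <- 2 gives above.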

Related

Adding quarters to R date

I have R time series data where I calculate the means of all values up to a particular date and store these means at the date + 4 quarters. The dates are all month ends. To achieve this, I am looking to add 4 quarters to a date. My question is: how can I add 4 quarters to an R Date? An illustration:
library(lubridate)  # provides quarter()
a <- as.Date("2006-01-01")
b <- as.Date("2011-01-01")
date_range <- quarter(seq.Date(a, b, by = "quarter"), with_year = TRUE)
> date_range[1] + 1
[1] 2007.1
> date_range[1] + quarter(1)
[1] 2007.1
> date_range[1] + 0.25
[1] 2006.35
One possible way I am thinking of is to get year-quarter dates and then add 4 to them, but I wasn't sure what the best way to do this is.
The problem is that quarters have different lengths. Q1 is shortest because it includes February (though it ties with Q2 in leap years). Things like this make "adding a quarter to a date" poorly defined. Even adding months to a date can be tricky at the ends of months: what is 1 month after January 31?
Beginnings of months are more straightforward, and I would recommend you use the 1st day of quarters rather than the last (if you must use a specific date). lubridate provides functions like floor_date() and ceiling_date() to which you can pass unit = "quarter" and they will return the first day of the current or subsequent quarter, respectively. You can also always add months(3) to a day at the beginning of a month, though of course if your intention is to add 4 quarters you may as well just add 1 year.
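Just add 12 months or a year instead?
Or, if it must be quarters, define a function yourself, like so:
library(lubridate)  # months() applied to a number returns a lubridate Period
# note: naming the function quarters masks base::quarters()
quarters <- function(x) {
  months(3 * x)
}
and then use it to add to the date sequence:
date_range <- seq.Date(a, b, by = "quarter")
date_range + quarters(4)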
Lubridate has a function for quarters already included. This is a much better solution than creating your own function.
https://www.rdocumentation.org/packages/lubridate/versions/1.7.4/topics/quarter
Old answer, but for those arriving here: lubridate has an operator, %m+%, that adds months without rolling past the end of the target month (e.g. January 31 plus one month gives February 28, or 29 in a leap year).
library(lubridate)
a <- as.Date("2006-01-01")
Add months to a date:
The original poster wanted 4 quarters into the future, which is 12 months.
future_date <- a %m+% months(12)
future_date
[1] "2007-01-01"
You could also do years as the period:
future_date <- a %m+% years(1)
Subtract months from a date:
Subtract with %m-%.
If you wanted the date 3 months before 2006-01-01:
past_date <- a %m-% months(3)
past_date
[1] "2005-10-01"
Example with a date that is not at the end of a month:
%m-% keeps the same day of the month:
as.Date("2022-10-10") %m-% months(3)
[1] "2022-07-10"
For more, see documentation on "Add and subtract months to a date without exceeding the last day of the new month"
Note that other answers that use Date class will give irregularly spaced series and so are unsuitable for time series analysis.
To do this in such a way that time series analyses can be performed and noting the zoo tag on the question, the yearmon class represents year/month as year + fraction where fraction is 0 for Jan, 1/12 for Feb, 2/12 for Mar, ..., 11/12 for Dec. Thus adding 4 quarters is just a matter of adding 1. (Adding x quarters is done by adding x/4.)
library(zoo)
ym <- yearmon(2006) + 0:11/12 # months in 2006
ym + 1 # one year later
Also, the first line below converts yearmon objects to end-of-month Dates and the second line converts Dates back to yearmon. Using frac = 0, or omitting frac, in the first line would convert to beginning-of-month dates.
d <- as.Date(ym, frac = 1) # d is Date vector of end-of-months
as.yearmon(d) # convert Date vector to yearmon
If your input dates represent quarters then there is also the yearqtr class which represents a year/quarter as year + fraction where fraction is 0, 1/4, 2/4, 3/4 for the 4 quarters of a year. Adding 4 quarters is done by adding 1 (or to add x quarters add x/4).
yq <- as.yearqtr(2006) + 0:3/4 # all quarters in 2006
yq + 1 # one year later
Conversions work similarly to yearmon:
d <- as.Date(yq, frac = 1) # d is Date vector of end-of-quarters
as.yearqtr(d) # convert Date vector to yearqtr
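For comparison, the same "evenly spaced quarters" idea can be sketched in base R without zoo by counting quarters as integers (year * 4 + quarter - 1), so adding k quarters means adding k. The helpers quarter_count(), quarter_start() and quarter_end() are hypothetical names for this sketch; mapping back to the last day of the shifted quarter handles month-end inputs like those in the question, whatever the quarter lengths:

```r
# Hypothetical helpers: represent a quarter as an integer count,
# year * 4 + (quarter - 1), so that adding k quarters is adding k.
quarter_count <- function(d) {
  lt <- as.POSIXlt(d)
  (lt$year + 1900) * 4 + lt$mon %/% 3   # POSIXlt: years from 1900, months 0-11
}
quarter_start <- function(qc) {
  as.Date(sprintf("%d-%02d-01", qc %/% 4, (qc %% 4) * 3 + 1))
}
# last day of a quarter = first day of the next quarter minus one day
quarter_end <- function(qc) quarter_start(qc + 1) - 1

d <- as.Date(c("2006-03-31", "2006-06-30", "2006-12-31"))
quarter_end(quarter_count(d) + 4)   # 4 quarters later, at quarter end
## [1] "2007-03-31" "2007-06-30" "2007-12-31"
```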

sequence of monthly dates making sure it's the same day, or the last day of month in case of invalid

Given an initial date, I want to generate a sequence of dates with monthly intervals, ensuring every element has the same day as the initial date or the last day of the month in case the same day would yield an invalid date.
Sounds pretty standard, right?
Using difftime is not possible. Here's what the help file of difftime says:
Units such as "months" are not possible as they are not of constant
length. To create intervals of months, quarters or years use seq.Date
or seq.POSIXt.
But then looking at the help file of seq.POSIXt I find that:
Using "month" first advances the month without changing the day: if
this results in an invalid day of the month, it is counted forward
into the next month: see the examples.
This is the example in the help file.
> seq(ISOdate(2000,1,31), by = "month", length.out = 4)
[1] "2000-01-31 12:00:00 GMT" "2000-03-02 12:00:00 GMT"
"2000-03-31 12:00:00 GMT" "2000-05-01 12:00:00 GMT"
So, given that the initial date is day 31, this would yield invalid dates in February, April, etc. The sequence therefore skips those months, because it "counts forward" and ends up at 2000-03-02 instead of 2000-02-29.
If I start on 2000-01-31, I would like the sequence as follows:
2000-01-31
2000-02-29
2000-03-31
2000-04-30
...
And it should properly handle leap-years, so if the initial date is 2015-01-31 the sequence should be:
2015-01-31
2015-02-28
2015-03-31
2015-04-30
...
These are just examples to illustrate the problem; I do not know the initial date in advance, nor can I assume anything about it. The initial date may well be in the middle of the month (2015-01-15), in which case seq works fine. But it can also be, as in the examples, near the end of the month, on days where seq alone is problematic (29, 30 and 31). Nor can I assume that the initial date is the last day of the month.
I have looked around for a solution. In some questions here on SO (e.g. here) there is a "trick" to get the last day of a month: take the first day of the next month and subtract 1. Finding the first day is "easy" because it is always day 1.
So my solution so far is:
# Given an initial date for my sequence
initial_date <- as.Date("2015-01-31")
# Find the first day of the month
library(magrittr) # to use pipes and make the code more readable
first_day_of_month <- initial_date %>%
  format("%Y-%m") %>%
  paste0("-01") %>%
  as.Date()
# Generate a sequence from initial date, using seq
# This is the sequence that will have incorrect values in months that would
# have invalid dates
given_dat_seq <- seq(initial_date, by = "month", length.out = 4)
# And then generate an auxiliary sequence for the last day of the month
# I do this by generating a sequence that starts on the first day of the
# same month as initial date and goes one month further
# (length 5 instead of 4) and subtracting 1 from all the elements
last_day_seq <- seq(first_day_of_month, by = "month", length.out = 5) - 1
# And finally, for each pair of elements, I take the min date of both
pmin(given_dat_seq, last_day_seq[2:5])
It works, but it is, at the same time, kinda dumb, hacky and convoluted. So I do not like it. And most importantly, I cannot believe there is no easier way to do this in R.
Can someone please point me to a simpler solution? (I guess it should have been as simple as seq(initial_date, "month", 4), but apparently it is not). I've googled it and looked here in SO and R mailing lists, but apart from the tricks I mentioned above, I couldn't find a solution.
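The "first day of the next month minus 1" trick mentioned above can also be written as a small vectorized base-R helper that builds the first day of the following month directly, without seq() or magrittr; month_end is a hypothetical name:

```r
# Hypothetical helper: last day of the month of each date in d.
month_end <- function(d) {
  lt <- as.POSIXlt(d)
  y <- lt$year + 1900          # POSIXlt years count from 1900
  m <- lt$mon + 1              # POSIXlt months are 0-11
  next_y <- y + (m == 12)      # December rolls over into January of next year
  next_m <- m %% 12 + 1
  as.Date(sprintf("%d-%02d-01", next_y, next_m)) - 1
}

month_end(as.Date(c("2000-02-10", "2015-02-10", "2015-12-31")))
## [1] "2000-02-29" "2015-02-28" "2015-12-31"
```

Taking pmin() of the same-day dates and these month ends is the clamping step the question performs.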
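The simplest solution is %m+% from lubridate, which solves this exact problem. So:
library(lubridate)
seq_monthly <- function(from, length.out) {
  # add 0, 1, ..., length.out - 1 months to the start date; %m+% rolls back to
  # the last day of the month when the target day does not exist
  from %m+% months(0:(length.out - 1))
}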
Output:
> seq_monthly(as.Date("2015-01-31"),length.out=4)
[1] "2015-01-31" "2015-02-28" "2015-03-31" "2015-04-30"
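Similar to the lubridate answer, here is one using RcppBDT (which wraps the Boost Date.Time library from C++). Note that each pass of the loop adds i months to the already-shifted date, so the offsets accumulate (1, 3, 6, 10 and 15 months after January 31):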
R> dt <- new(bdtDt, 2010, 1, 31); for (i in 1:5) { dt$addMonths(i); print(dt) }
[1] "2010-02-28"
[1] "2010-04-30"
[1] "2010-07-31"
[1] "2010-11-30"
[1] "2011-04-30"
R> dt <- new(bdtDt, 2000, 1, 31); for (i in 1:5) { dt$addMonths(i); print(dt) }
[1] "2000-02-29"
[1] "2000-04-30"
[1] "2000-07-31"
[1] "2000-11-30"
[1] "2001-04-30"
R>

R: Creating two date variables from a complete date

I have dates recorded as Month/Day/Year (MM/DD/YYYY).
I would like to write code that creates two new variables from that information.
I would like a year variable alone
I would like to create a quarter variable
The quarter variable should not depend on the year; I want it to apply to all years.
Quarter 1 would be January 1 - March 31
Quarter 2 would be April 1 - June 30
Quarter 3 would be July 1 - September 30
Quarter 4 would be October 1 - December 31
Any assistance would be greatly appreciated. I cannot seem to get the nuance of how to do these functions in R.
Thanks,
Jared
Assuming that the date variable is of class POSIXct or POSIXlt, you could do:
#example date
date <- as.POSIXlt( "05/12/2015", format='%m/%d/%Y')
To return the year from a date, data.table already has a function, year():
library(data.table)
> year(date)
[1] 2015
As for the quarter it can easily be created from the function below (uses data.table::month that returns the number of a month):
quarter <- function(x) {
rep(c('quarter 1','quarter 2','quarter 3','quarter 4'), each=3)[month(x)]
}
> quarter(date)
[1] "quarter 2"
Using only the base packages:
Try formatting your dates with the strptime function, so that all dates are in Year-Month-Day format. This format constrains each element of the date to the same character length and the same position. See the strptime documentation for the appropriate format argument.
date.vec<-c("1/1/1999","2/2/1999")
fmt.date.vec<-strptime(date.vec, "%m/%d/%Y")
With the dates in this format it is easy to extract the year, month, and day using the substring function
Year<-substring(fmt.date.vec,1,4)
Month<-substring(fmt.date.vec,6,7)
Day<-substring(fmt.date.vec,9,10)
With this information you can now generate your Quarter vector in any number of ways. For example, if a data.frame "df" has a Month column:
df$Quarter <- "Quarter_1"
# index the column rather than subsetting rows, so this also works when no row matches
df$Quarter[df$Month %in% c("04","05","06")] <- "Quarter_2"
df$Quarter[df$Month %in% c("07","08","09")] <- "Quarter_3"
df$Quarter[df$Month %in% c("10","11","12")] <- "Quarter_4"
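Base R also has quarters(), which returns "Q1" to "Q4" for Date and POSIX* input, so the substring step can be skipped. A minimal sketch, using made-up example dates in the question's MM/DD/YYYY format:

```r
# Example MM/DD/YYYY strings (made up for illustration)
date.vec <- c("05/12/2015", "11/30/2014", "01/01/1999")
d <- as.Date(date.vec, format = "%m/%d/%Y")
Year <- as.integer(format(d, "%Y"))
Quarter <- quarters(d)                     # "Q2" "Q4" "Q1"
QuarterNum <- as.POSIXlt(d)$mon %/% 3 + 1  # 1-4; POSIXlt months are 0-based
```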

Lubridate week() to find consecutive week number for multi-year periods

Within R, say I have a vector of some Lubridate dates:
> Date
"2012-01-01 UTC"
"2013-01-01 UTC"
Next, suppose I want to see what week number these days fall in:
> week(Date)
1
1
Lubridate is fantastic!
But wait... I'm dealing with a time series with 10,000 rows of data... and the data spans 3 years.
I've been struggling with finding some way to make this happen:
> result of awesome R code here
1
54
The question: is there a succinct way to coax out a list of week numbers over multiyear periods within Lubridate? More directly, I would like the first week of the second year to be represented as the 54th week, and the first week in the third year as the 107th week, ad nauseam.
So far, I've attempted a number of hacky schemes but cannot seem to create something not fastened together with scotch tape. Any advice would be greatly appreciated. Thanks in advance.
To get the interval from a particular date to another date, you can just subtract...
If tda is your vector of dates, then
tda - min(tda)
will be the difference in seconds between them.
To get the units out in weeks:
(tda - min(tda))/eweeks(1)
To do it from a particular date:
tda - ymd(19960101)
This gives the number of days from 1996 to each value.
From there, you can divide by days per week, or seconds per week.
(tda - ymd(19960101))/eweeks(1)
To get only the integer part, and starting from January 2012:
trunc((tda - ymd(20111225))/eweeks(1))
Test data:
tda = ymd(c(20120101, 20120106, 20130101, 20130108))
Output:
1 1 53 54
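The same calculation also works without lubridate. A base-R sketch using plain day differences and integer division, with the same test dates and the 2011-12-25 origin taken from the answer:

```r
# Base-R sketch: consecutive week numbers across years from day differences.
tda <- as.Date(c("2012-01-01", "2012-01-06", "2013-01-01", "2013-01-08"))
origin <- as.Date("2011-12-25")   # same origin as the answer above
ndays <- as.numeric(tda - origin, units = "days")
weekno <- ndays %/% 7             # whole weeks elapsed since the origin
weekno
## [1]  1  1 53 54
```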
Since eweeks() is now deprecated, I thought I'd add to @beroe's answer.
If tda is your date vector, you can get the week numbers with:
weeknos <- (interval(min(tda), tda) %/% weeks(1)) + 1
where %/% performs integer division (5 / 3 = 1.667; 5 %/% 3 = 1).
You can do something like this:
week(dat) +53*(year(dat)-min(year(dat)))
Given you like lubridate (as do I)
year_week <- function(x,base) week(x) - week(base) + 52*(year(x) - year(base))
test <- ymd(c(20120101, 20120106, 20130101, 20130108))
year_week(test, "2012-01-01")
Giving
[1] 0 0 52 53

Calculate the week number (0-53) in year

I have a dataset with locations and dates. I would like to calculate week of the year as number (00–53) but using Thursday as the first day of the week. The data looks like this:
location <- c("a","b","a","b","a","b")
date <- c("04-01-2013","26-01-2013","03-02-2013","09-02-2013","20-02-2013","03-03-2013")
mydf <- data.frame(location, date)
mydf
I know that there is the strftime function for calculating the week of the year, but it only allows Monday or Sunday as the first day of the week.
Any help would be highly appreciated.
Just add 3 to the Date-formatted values:
> mydf$Dt <- as.Date(mydf$date, format="%d-%m-%Y")
> weeknum <- as.numeric( format(mydf$Dt+3, "%U"))
> weeknum
[1] 1 4 5 6 7 9
This uses a 0-based counting convention, since that is what strftime provides and we are just piggybacking off that code base, so the first Friday in a year that begins on a Tuesday (as 2013 did) gives a week-1 result. Add 1 to the value if you want a 1-based convention. (Fundamentally, Date-formatted values are an integer sequence counted from the "origin", so they don't really recognize years or weeks. Adding 3 just shifts the reference frame of the underlying Date integer.)
Edit note: changed to an add-three strategy per Gabor's advice, which still does not address the question of how to deal with the last week of the prior year.
Since the question stated that week goes from 00-53 we assume that the week number is the number of Thursdays in the year on or before the date in question. Thus, the first Thursday in the year begins week 1 and week 0 is assigned to any days prior to that.
(There were comments that if the first day of the year were Tuesday then that would be week 1 but if that were the case there could never be a week 0 as seems to be required in the subject so some clarification on precisely what the definition of week number is may be required. Here we are going to use the definition in the preceding paragraph but it would not be hard to change it if we knew what the definition was. For example, if we always wanted the first week in the year to be 1 even if it were a short week then we could add !is.thu(jan1(d)) to the result.)
Both of the solutions below are short enough that they could be expressed in one statement; however, we have factored them into several short functions each for clarity. The first is particularly straightforward, but the second is automatically vectorized without the need for sapply and would likely be more efficient.
1. sum Thursdays in year This solution assumes the input d is of class "Date" and just sums the number of Thursdays in the year before or on it:
is.thu <- function(x) weekdays(x) == "Thursday"  # weekdays() names are locale-dependent; assumes an English locale
jan1 <- function(x) as.Date(cut(x, "year"))
week4 <- function(d) {
sapply(d, function(d) sum(is.thu(seq(jan1(d), d, by = "day"))))
}
We can test it like this:
d <- as.Date(c("2013-01-04", "2013-01-26", "2013-02-03", "2013-02-09",
"2013-02-20", "2013-03-03"))
week4(d) # 1 4 5 6 7 9
2. nextthu
Based on the nextfri function in the zoo quickref vignette, the number of days since the Epoch (1970-01-01, itself a Thursday) of the next Thursday (or of the day in question if it's already a Thursday) is given by nextthu in the first line below. Applying this to the first day of the year, we derive the result, where d is as before:
nextthu <- function(d) 7 * ceiling(as.numeric(d) / 7)
week4a <- function(d) (as.numeric(d) - nextthu(jan1(d))) %/% 7 + 1
and here is a test
week4a(d) # 1 4 5 6 7 9
ADDED: fixed bug in second solution.
