I am working with a data frame (INPUT) that contains the number of transactions of a product per calendar quarter. The first column (DATE) contains the calendar quarter in this format: "2016 Q2". I would like to transform this date into a financial quarter format such as "2016/17 Q1". The financial year starts on 1 April.
I came up with the following code, which does the job, but I was wondering if there is a formula or neater code that I could use.
INPUT$FY_Date=character(nrow(INPUT))
for (i in 1:nrow(INPUT)) {
INPUT$FY_Date[i]= if(substr(INPUT$DATE[i],7,7)==1) paste(as.numeric(substr(INPUT$DATE[i],1,4))-1,"/",substr(INPUT$DATE[i],3,4)," Q4",sep="") else
paste(substr(INPUT$DATE[i],1,4),"/", formatC(as.numeric(substr(INPUT$DATE[i],3,4))+1,width=2,format="d",flag=0)," Q",as.numeric(substr(INPUT$DATE[i],7,7))-1,sep="")
}
I could not find any previous related posts so I would appreciate any guidance.
Using the "yearqtr" class defined in zoo we can do it in two lines of code.
Convert to "yearqtr". The "yearqtr" class uses an internal representation of year + (qtr-1)/4 where qtr is 1, 2, 3 or 4, so adding 3/4 will shift it to the year-end year and fiscal quarter. For example, "2016 Q2" is stored as 2016.25, and adding 0.75 gives 2017.00, i.e. Q1 of the financial year ending in 2017. Then in the final line of code as.integer will extract the year-end year. The format function can be used to get the rest, where %y means 2-digit year and %q means quarter.
library(zoo)
# test input
yq <- c("2016 Q2", "2016 Q3", "2016 Q4", "2017 Q1")
fyq <- as.yearqtr(yq, format = "%Y Q%q") + 3/4
paste0(as.integer(fyq) - 1, format(fyq, "/%y Q%q"))
giving:
[1] "2016/17 Q1" "2016/17 Q2" "2016/17 Q3" "2016/17 Q4"
Note that if you don't need the specific format shown in the question you could just use format(fyq) in place of the last line or maybe format(fyq, "%Y Q%q").
Update: Minor code improvements.
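If you would rather avoid the zoo dependency, the same mapping can be sketched in base R. This is only a sketch: the helper name to_fy_quarter is made up for illustration, and it assumes every input string looks exactly like "2016 Q2" and that the financial year starts on 1 April.

```r
# Hypothetical helper (not part of zoo): calendar quarter -> financial quarter,
# assuming "YYYY Qn" input and a financial year starting on 1 April.
to_fy_quarter <- function(x) {
  year <- as.integer(substr(x, 1, 4))   # calendar year
  qtr  <- as.integer(substr(x, 7, 7))   # calendar quarter, 1 to 4
  # Calendar Q1 belongs to the financial year that started the previous April
  fy_start <- ifelse(qtr == 1L, year - 1L, year)
  fy_qtr   <- ifelse(qtr == 1L, 4L, qtr - 1L)
  sprintf("%d/%02d Q%d", fy_start, (fy_start + 1L) %% 100L, fy_qtr)
}

fy <- to_fy_quarter(c("2016 Q1", "2016 Q2", "2017 Q1"))
fy
# [1] "2015/16 Q4" "2016/17 Q1" "2016/17 Q4"
```

Because it is vectorised, it can be applied to the whole column in one go, e.g. INPUT$FY_Date <- to_fy_quarter(as.character(INPUT$DATE)).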
Let's say I have a date as follows:
Date = as.Date('2020-11-30')
Now I want to determine the quarter for this date, so I can use the zoo package:
library(zoo)
as.yearqtr(Date)  ### [1] "2020 Q4"
However I want to determine the quarter with respect to a date, say
Date1 = as.Date("2020-05-31")
So with respect to this date, the quarter of Date should be Q2.
Is there any way to set up the base in the quarter calculation?
Any pointer will be highly appreciated.
Thanks,
If we want to extract the quarter, use format:
format(as.yearqtr(Date1), 'Q%q')
[1] "Q2"
Or if it is based on difference, try
paste0("Q", (as.yearqtr(Date) - as.yearqtr(Date1)) * 4)
[1] "Q2"
I have panel of quarterly data which looks like this:
x <- c("Q1 2013","Q2 2013", "Q3 2013", "Q4 2013")
How can I properly read this data into R as a quarterly time series, so that I can perform analysis on it?
I tried to use yearqtr from the zoo package, but all I get is NA.
as.Date(as.yearqtr(x, format = "Q%q /%yyyy"))
This could be because of the space between Q1 and 2013. I'm open to changing my format if I have to, but I'm not even sure what format would work in R. Should I change my columns to 1.2013, 2.2013, ..., or would that not be recognized as a date format by R either? And how would I change them when I have a repeated sample of quarterly dates in this format: Q1 2013, etc.?
This should solve your problem:
as.yearqtr(format(x), "Q%q %Y")
This is the output:
# [1] "2013 Q1" "2013 Q2" "2013 Q3" "2013 Q4"
You can also convert it to dates:
as.Date(as.yearqtr(format(x), "Q%q %Y"))
And the output would be:
# [1] "2013-01-01" "2013-04-01" "2013-07-01" "2013-10-01"
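For the analysis itself, one option is to use the parsed quarters as the index of a zoo series. A minimal sketch, with made-up values standing in for your real data column (the names values, z and w are placeholders, not from the question):

```r
library(zoo)

x <- c("Q1 2013", "Q2 2013", "Q3 2013", "Q4 2013")
values <- c(10, 12, 9, 14)   # placeholder numbers, not real data

# Index the series by quarter
z <- zoo(values, order.by = as.yearqtr(x, format = "Q%q %Y"))

index(z)   # the quarters, as a "yearqtr" vector

# Subset from 2013 Q2 onwards
w <- window(z, start = as.yearqtr("2013 Q2", format = "%Y Q%q"))
```

With a quarter-based index, subsetting and merging by period work without first converting to full dates.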
I want to convert "October 2010" and "November 2010" to a numeric format, so that taking the difference of the two gives the result 1.
I tried to use the as.Date function, but it seems that it only works for the full format: month-day-year.
You can try formatting your raw date strings, and treating each one as being on the first day of that month.
dates <- c("October 2010", "November 2010")
# extract the first three letters for the month, and the last 4 digits for the year
dates.new <- paste0(substr(dates, 1, 3), "-01-", substr(dates, nchar(dates)-3, nchar(dates)))
> dates.new
[1] "Oct-01-2010" "Nov-01-2010"
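# convert to POSIXct: %b matches the abbreviated month and %Y the 4-digit year;
# a fixed UTC time zone keeps a daylight-saving change out of the difference
dates.posix <- as.POSIXct(dates.new, format="%b-%d-%Y", tz="UTC")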
diff <- dates.posix[2] - dates.posix[1]
> diff
Time difference of 31 days
In your question you want to calculate the difference in number of months and not in number of days. You could map your month-year character vector to a numeric number of months, starting at month 1 with the first month in your dataset and ending with month n with the last month in your dataset. Then it would be straightforward to calculate a difference in number of months.
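The month-number mapping described above could look something like this in base R. It is a sketch that assumes every string is a full English month name followed by a four-digit year; month.name is a built-in constant, so this does not depend on the locale. The variable names are only illustrative.

```r
dates <- c("October 2010", "November 2010")

# Split each string into its month name and its year
parts     <- strsplit(dates, " ")
month_num <- match(sapply(parts, `[`, 1), month.name)   # 1 to 12
year_num  <- as.integer(sapply(parts, `[`, 2))

# Running month count, so consecutive months differ by exactly 1
month_index  <- year_num * 12L + month_num
months_apart <- diff(month_index)
months_apart
# [1] 1
```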
Alternatively, to be able to manipulate date-time objects, you will have to create full dates by introducing a 01 in front of all dates, for example "01 November 2010", and then calculate the difference between dates. This is the main part of the answer below.
Manipulating date-time objects
The lubridate package can calculate the difference between two dates. It deals with non trivial issues such as February 29th. If it's not installed on your system:
install.packages("lubridate")
Then
library(lubridate)
ymd("20160301")-ymd("20160228")
# Time difference of 2 days
ymd("20150301")-ymd("20150228")
# Time difference of 1 days
To read full month names look at formatting details in help(parse_date_time)
d <- parse_date_time("November 01 2010", "Bdy") - parse_date_time("October 01 2010", "Bdy")
d
# Time difference of 31 days
d is a difftime object (see "converting a difftime to integer"); you can convert it to a numeric number of days or weeks, but not to a number of months:
class(d)
# [1] "difftime"
as.numeric(d, units="days")
# [1] 31
as.numeric(d, units="weeks")
# [1] 4.428571
I can generate quarterly OHLC data from a daily time series:
library(quantmod)
getSymbols("SPY", from="2000-01-01", to=Sys.Date())
tail(SPY)
dfQ <- to.quarterly(SPY[,6])
tail(dfQ)
I can also generate the quarterly mean:
dfmean1 <- apply.quarterly(xts(SPY[,6]), FUN = mean)
tail(dfmean1)
However I am having problems merging the two, with an index showing the first date of the quarter (rather than the last date of the quarter).
Thank you for your help
I think you have two questions here. The first is how to have a mean column in OHLC quarterly data. The second is how to have datestamps for the start of each quarter, instead of "last" datestamps. The xts/quantmod packages assume you want "last" datestamps, so go with the flow, and just replace the datestamps at the end.
To have mean with OHLC I've found it best just to do the OHLC calculation myself. So instead of passing mean to apply.quarterly(), do this:
bars = apply.quarterly(xts(SPY[,6]), FUN = function(x){
d=coredata(x);
c(first(d),max(d),min(d),last(d),mean(d))
} )
colnames(bars)=c("open","high","low","close","mean")
This gives:
...
2013-09-30 159.71 171.28 159.56 167.10 165.9822
2013-12-31 168.43 184.69 164.59 184.69 176.1416
2014-01-08 182.92 183.52 182.36 183.52 183.0340
Then to fix the datestamps:
index(bars) = as.Date(as.yearqtr(index(bars)))
To understand that, start by looking at index(bars), then look at as.yearqtr(index(bars)), which gives:
[1] "2000 Q1" "2000 Q2" "2000 Q3" ...
... "2013 Q3" "2013 Q4" "2014 Q1"
Then, as luck would have it, as.Date() gives you the datestamp of the start of each quarter.
The final bit is to assign the new index back to the bars object with index(bars) = ... (or index(bars) <- ... if you prefer).
By the way, there is also an indexAt="lastof" or indexAt="firstof" parameter you could give to to.quarterly(). Experiment with this, but in my tests it was not quite useful enough.
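To try the two steps without downloading anything, here is a sketch on synthetic data. The random-walk prices are made up, and plain indexing is used for the open and close rather than first() and last(); otherwise it follows the same idea as above.

```r
library(xts)   # also loads zoo, which provides as.yearqtr()

set.seed(1)
days <- seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")
px   <- xts(100 + cumsum(rnorm(length(days))), order.by = days)  # synthetic prices

# One row per quarter: open, high, low, close and mean
bars <- apply.quarterly(px, FUN = function(x) {
  d <- as.numeric(coredata(x))
  c(d[1], max(d), min(d), d[length(d)], mean(d))
})
colnames(bars) <- c("open", "high", "low", "close", "mean")

# Relabel each bar with the first day of its quarter
index(bars) <- as.Date(as.yearqtr(index(bars)))
index(bars)
# [1] "2013-01-01" "2013-04-01" "2013-07-01" "2013-10-01"
```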
I have a data frame with two columns: Date and Gender.
I want to change the Date column to the start of the week for that observation. For example, if Jun-28-2011 is a Tuesday, I'd like to change it to Jun-27-2011. Basically, I want to re-label the Date fields such that two data points that are in the same week have the same Date.
I also want to be able to do it bi-weekly, monthly and especially quarterly.
Update:
Let's use this as a dataset.
datset <- data.frame(date = as.Date("2011-06-28")+c(1:100))
One slick way to do this that I just learned recently is to use the lubridate package:
library(lubridate)
datset <- data.frame(date = as.Date("2011-06-28")+c(1:100))
#Add 1, since floor_date appears to round down to Sundays
floor_date(datset$date,"week") + 1
I'm not sure about how to do bi-weekly binning, but monthly and quarterly are easily handled with the respective base functions:
quarters(datset$date)
months(datset$date)
EDIT: Interestingly, floor_date from lubridate does not appear to be able to round down to the nearest quarter, but the function of the same name in ggplot2 does.
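Base R's cut() method for dates is another option. It returns the start date of each period rather than a label, and it accepts a multiple such as "2 weeks". A sketch on the same example dates; note that the two-week bins are counted from the first week in the data, which is an assumption about what bi-weekly should mean here.

```r
dates <- as.Date("2011-06-28") + 1:100

# cut() labels each period by its first day; weeks start on Monday by default
week_start    <- as.Date(cut(dates, "week"))
biweek_start  <- as.Date(cut(dates, "2 weeks"))
month_start   <- as.Date(cut(dates, "month"))
quarter_start <- as.Date(cut(dates, "quarter"))

# 2011-06-29 (a Wednesday) maps to these period starts
head(data.frame(date = dates, week_start, month_start, quarter_start), 1)
```

Dates that fall in the same week, month or quarter then share the same value, so the result can be used directly as a grouping column.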
Look at ?strftime. In particular, the following formats:
%b: Abbreviated month name in the current locale. (Also matches full name on input.)
%B: Full month name in the current locale. (Also matches abbreviated name on input.)
%m: Month as decimal number (01–12).
%W: Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
eg:
> strftime("2011-07-28","Month: %B, Week: %W")
[1] "Month: July, Week: 30"
> paste("Quarter:",ceiling(as.integer(strftime("2011-07-28","%m"))/3))
[1] "Quarter: 3"