Adding fiscal month end - r

I would like to add a fiscal month-end column to a dataset in R. In my company the fiscal month ends on the 21st. For example:
12/22/2019 to 1/21/2020 will be Jan-2020
1/22/2020 to 2/21/2020 will be Feb-2020
2/22/2020 to 3/21/2020 will be Mar-2020
etc
How would I accomplish this in R? The Date column in my data is in %m/%d/%Y format (e.g. 1/22/2020).

You could extract the day of the month and, if it is the 22nd or later, add 10 days to the date, then format the result as month-year:
transform(dat, Fiscal_Month = format(Date +
ifelse(as.integer(format(Date, '%d')) >= 22, 10, 0), '%b %Y'))
# Date Fiscal_Month
#1 2020-01-20 Jan 2020
#2 2020-01-21 Jan 2020
#3 2020-01-22 Feb 2020
#4 2020-01-23 Feb 2020
#5 2020-01-24 Feb 2020
This can also be done without ifelse, like this:
transform(dat, Fiscal_Month = format(Date + c(0, 10)[(as.integer(format(Date, '%d')) >= 22) + 1],
'%b %Y'))
Sample data used:
dat <- data.frame(Date = seq(as.Date('2020-01-20'), by = '1 day',length.out = 5))
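The same day-of-month rule can be wrapped in a small function that starts from the m/d/Y strings described in the question. This is a minimal sketch; `fiscal_month` and its `format_in`/`format_out` arguments are hypothetical names. The rule works because adding 10 days to any day from the 22nd to the 31st always lands on days 1 to 10 of the following month, while days up to the 21st stay where they are.

```r
# Hypothetical helper: fiscal month label when the fiscal month closes on the 21st.
fiscal_month <- function(x, format_in = "%m/%d/%Y", format_out = "%b-%Y") {
  d <- as.Date(x, format = format_in)
  day <- as.integer(format(d, "%d"))
  # Days 22..31 are pushed into the next calendar month (days 1..10); days 1..21 are unchanged.
  format(d + ifelse(day >= 22, 10, 0), format_out)
}

# Boundary dates taken from the fiscal calendar in the question
x <- c("12/22/2019", "1/21/2020", "1/22/2020", "2/21/2020", "2/22/2020", "3/21/2020")
fiscal_month(x)
```

Month abbreviations from %b follow the current locale; in an English locale the six boundary dates come out as Jan-2020, Jan-2020, Feb-2020, Feb-2020, Mar-2020, Mar-2020.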

1) yearmon. We perform the following steps:
create test data d which shows both a date in the start of period month (i.e. 22nd or later) and a date in the end of period month (i.e. 21st or earlier)
convert the input d to Date class giving dd
subtract 21 days thereby shifting it to the month that starts the fiscal period
convert that to ym of yearmon class (which represents a year and a month without a day directly and internally represents it as the year plus 0 for Jan, 1/12 for Feb, ..., 11/12 for Dec) and then add 1/12 to get to the month at the end of fiscal period.
format it as shown. (We could omit this step, i.e. the last line of code, if the default yearmon format, e.g. Jan 2020, is acceptable.)
The whole thing could easily be written in a single line of code but we have broken it up for clarity.
library(zoo)
d <- c("1/22/2020", "1/21/2020") # test data
dd <- as.Date(d, "%m/%d/%Y")
ym <- as.yearmon(dd - 21) + 1/12
format(ym, "%b-%y")
## [1] "Feb-20" "Jan-20"
2) Base R. This can also be done using only base R, as follows. We reuse dd from above. cut computes the first of the month that dd - 21 lies in (as a factor rather than a Date object) and as.Date then converts it to a Date. Adding 31 days shifts it into the end-of-period month, and formatting that gives the final answer.
format(as.Date(cut(dd - 21, "month")) + 31, "%b-%y")
## [1] "Feb-20" "Jan-20"
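To see why the base R version works, this sketch prints each intermediate step for the boundary dates listed in the question. Subtracting 21 days maps each 22nd-to-21st fiscal window onto a single calendar month, and adding 31 days to the first of any month always lands on day 1 to 4 of the next month, so formatting without the day gives the fiscal month. The intermediate variable names are only for illustration.

```r
# Boundary dates taken from the question's fiscal calendar
d <- c("12/22/2019", "1/21/2020", "1/22/2020", "2/21/2020", "2/22/2020", "3/21/2020")
dd <- as.Date(d, "%m/%d/%Y")

shifted <- dd - 21                        # each 22nd-to-21st window becomes one calendar month
first <- as.Date(cut(shifted, "month"))   # first day of that calendar month
fiscal <- first + 31                      # day 1 + 31 always falls on day 1 to 4 of the next month

data.frame(d, shifted, first, fiscal, label = format(fiscal, "%b-%y"))
```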

Related

R Weekly Time Series Object

I have the following vector, which contains data for each day of December.
vector1 <- c(1056772, 674172, 695744, 775040, 832036,735124,820668,1790756,1329648,1195276,1267644,986716,926468,828892,826284,749504,650924,822256,3434204,2502916,1262928,1025980,1828580,923372,658824,956916,915776,1081736,869836,898736,829368)
Now I want to create a time series object on a weekly basis, and I used the following code snippet:
weeklyts = ts(vector1,start=c(2016,12,01), frequency=7)
However, the starting and end points are not correct. I always get the following time series:
> weeklyts
Time Series:
Start = c(2017, 5)
End = c(2021, 7)
Frequency = 7
[1] 1056772 674172 695744 775040 832036 735124 820668 1790756 1329648 1195276 1267644 986716 926468 828892 826284 749504
[17] 650924 822256 3434204 2502916 1262928 1025980 1828580 923372 658824 956916 915776 1081736 869836 898736 829368
Does anybody know what I am doing wrong?
To get a time series that starts and ends where you expect, you need to think about how the series is indexed. You have 31 days of data from December 2016.
The start argument of ts takes 2 numbers, not 3: the time unit (here the year) and the period within it. So you would use something like c(2016, 1) if you start with month 1 of 2016. See the following example.
ts(1:12, start = c(2016, 1), frequency = 12)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2016 1 2 3 4 5 6 7 8 9 10 11 12
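The Start = c(2017, 5) printed in the question can be reproduced by hand: ts() only uses the first two elements of start, combining them as start[1] + (start[2] - 1)/frequency, so c(2016, 12, 01) with frequency 7 becomes 2016 + 11/7, which falls in period 5 of 2017. In this sketch `seq_len(31)` stands in for vector1, since only the time attributes matter.

```r
x <- seq_len(31)   # stand-in for vector1: only the time attributes matter here

# ts() combines the first two elements of start as start[1] + (start[2] - 1) / frequency;
# the third element (01) is ignored.
start_time <- 2016 + (12 - 1) / 7
floor(start_time)                                  # year 2017
round((start_time - floor(start_time)) * 7) + 1    # period 5 within 2017

weeklyts <- ts(x, start = c(2016, 12, 01), frequency = 7)
start(weeklyts)                                    # 2017 5, as printed in the question
```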
Daily data are awkward with ts because ts cannot handle leap years; that is why you often see people use a frequency of 365.25 for daily series with an annual cycle. To get a correct December 2016 series we can do the following:
ts(vector1, start = c(2016, 336), frequency = 366)
Time Series:
Start = c(2016, 336)
End = c(2016, 366)
Frequency = 366
[1] 1056772 674172 695744 775040 832036 735124 820668 1790756 1329648 1195276 1267644 986716 926468 828892 826284 749504
[17] 650924 822256 3434204 2502916 1262928 1025980 1828580 923372 658824 956916 915776 1081736 869836 898736 829368
Note the following things that are going on:
Frequency is 366 because 2016 is a leap year
start is c(2016, 336) because 336 is the day of the year of "2016-12-01"
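Rather than looking up 336 by hand, the day of the year can be computed from the date with format(..., "%j"), and a leap-year check picks 366 or 365 for the frequency. This is a minimal sketch with a hypothetical helper name, `daily_ts`; like the answer's example it is only sensible for a series that stays within one calendar year, because the frequency is fixed.

```r
# Hypothetical helper: daily ts whose start matches a calendar date.
daily_ts <- function(x, first_date) {
  first_date <- as.Date(first_date)
  year <- as.integer(format(first_date, "%Y"))
  doy <- as.integer(format(first_date, "%j"))   # day of the year, 1..366
  is_leap <- (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
  ts(x, start = c(year, doy), frequency = if (is_leap) 366 else 365)
}

dec <- daily_ts(seq_len(31), "2016-12-01")
start(dec)   # 2016 336
end(dec)     # 2016 366
```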
Personally, I use the xts package (and zoo) to handle daily data and the functions in xts to aggregate to a weekly time series. These can then be used with packages that expect ts objects, such as forecast.
edit: added small xts example
my_df <- data.frame(dates = seq.Date(as.Date("2016-12-01"), as.Date("2017-01-31"), by = "day"),
var1 = rep(1:31, 2))
library(xts)
my_xts <- xts(my_df[, -1], order.by = my_df$dates)
# rollup to weekly. Dates shown are the last day in the weekperiod.
my_xts_weekly <- period.apply(my_xts, endpoints(my_xts, on = "weeks"), colSums)
head(my_xts_weekly)
[,1]
2016-12-04 10
2016-12-11 56
2016-12-18 105
2016-12-25 154
2017-01-01 172
2017-01-08 35
Depending on your needs, you can transform this back into a data.frame, etc. Read the help for period.apply, as you can supply your own function to apply over each period. And read the xts (and zoo) vignettes.
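If installing xts is not an option, a similar weekly rollup can be sketched in base R with cut() and tapply(); this swaps in a different technique and is not the xts method. cut() labels each week by the Monday that starts it, so the sketch adds 6 days to label weeks by their Sunday, which matches the xts dates for full weeks.

```r
# Same toy data as the xts example above
my_df <- data.frame(dates = seq.Date(as.Date("2016-12-01"), as.Date("2017-01-31"), by = "day"),
                    var1 = rep(1:31, 2))

# cut() assigns each day to a week labelled by its starting Monday
week_start <- cut(my_df$dates, "week", start.on.monday = TRUE)
weekly <- tapply(my_df$var1, week_start, sum)

# Relabel each week with the Sunday that ends it
weekly_df <- data.frame(week_end = as.Date(names(weekly)) + 6,
                        var1 = as.vector(weekly))
head(weekly_df)
```

One difference: xts labels each period by its last observation, so the final partial week (January 30 to 31) is dated 2017-01-31 there but 2017-02-05 here.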

R - Calculate sum within date range using zoo

Suppose I have a data frame with ten years of daily temperature data (in degrees C) like this:
mydf <- data.frame(Date = seq(as.Date("2001/1/1"), as.Date("2010/12/31"), by = "day"), Temp = runif(3652, 0, 40))
I am trying to calculate growing degree days for plants. This is how it works: within a date range, I need to integrate (sum) the difference between the daily temperature and a base temperature, let's say 10 degrees C. To make it harder, the date range spans two calendar years. For example, I need to calculate the growing degree days between November 1st and March 31st for all years in the time series. In terms of an "algorithm", the logic would be something like this:
t_base <- 10
for (each day between nov 1st and mar 31st) {
sum (Temp - t_base)
}
How to do this using the zoo package?
Note that "yearmon" class variables are of the form year + frac where the frac is 0 for Jan, 1/12 for Feb, 2/12 for Mar, etc. Below ym is a "yearmon" vector corresponding to the Date except that we have added two months. ym is then split into year y (the season-end year) and month m (where month is 0 for the first month of the season, 1 for the second month, ..., 4 for the 5th and last month in season and higher numbers for months not in season) . in.seas is TRUE for those data points in Nov, Dec, Jan, Feb or Mar (which corresponds to m <= 4). Finally use ave to calculate the cumulative sum among dates having the same season-end year or aggregate to calculate the sum.
library(zoo)
t_base <- 10  # base temperature from the question
z <- read.zoo(mydf)
ym <- as.numeric(as.yearmon(index(z)) + 2/12)
y <- floor(ym) # year of date's season end or this year if not in season
m <- round(12 * (ym - y)) # month Nov = 0, Dec = 1, Jan = 2, Feb = 3, Mar = 4, ...
in.seas <- m <= 4
Cum <- ave(z[in.seas], y[in.seas], FUN = function(x) cumsum(x - t_base))
or to just get the sum of each season:
Sum <- aggregate(z[in.seas], y[in.seas], function(x) sum(x - t_base))
Note that fortify.zoo(x) will convert zoo object x back to a data frame should that be necessary.
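The same season logic can also be sketched in base R with integer month arithmetic instead of yearmon fractions; this is a swapped-in alternative, not the zoo approach. Shifting the 0-based POSIXlt month by two gives the same y (season-end year) and m (Nov = 0, ..., Mar = 4) as the answer above. The set.seed call is only there to make the random example data reproducible.

```r
set.seed(42)  # only so the random example data are reproducible
mydf <- data.frame(Date = seq(as.Date("2001/1/1"), as.Date("2010/12/31"), by = "day"),
                   Temp = runif(3652, 0, 40))
t_base <- 10

lt <- as.POSIXlt(mydf$Date)
mon0 <- lt$mon                              # 0 = Jan, ..., 11 = Dec
y <- lt$year + 1900 + (mon0 + 2) %/% 12     # season-end year: Nov and Dec roll into next year
m <- (mon0 + 2) %% 12                       # Nov = 0, Dec = 1, Jan = 2, Feb = 3, Mar = 4, ...
in.seas <- m <= 4

# Growing degree days per season, using the question's definition sum(Temp - t_base)
gdd <- tapply(mydf$Temp[in.seas] - t_base, y[in.seas], sum)
# Running total within each season, aligned with the in-season rows
cum_gdd <- ave(mydf$Temp[in.seas] - t_base, y[in.seas], FUN = cumsum)
gdd
```

As with the zoo version, the first (2001) and last (2011) seasons are partial because the data start in January 2001 and end in December 2010.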

R Create function to add water year column

I want to be able to create a water year column for a time series. The US water year runs from October through September and is named for the year in which it ends. For example, the 2014 water year runs from October 1, 2013 to September 30, 2014.
This is the US water year, but it is not the only one. Therefore I want to pass in a start month and have the water year calculated for each date.
For example if my data looks like
date
2008-01-01 00:00:00
2008-02-01 00:00:00
2008-03-01 00:00:00
2008-04-01 00:00:00
.
.
.
2008-12-01 00:00:00
I want my function to work something like:
wtr_yr <- function(data, start_month) {
# does stuff
}
Then my output would be
wtr_yr(data, 2)
date wtr_yr
2008-01-01 00:00:00 2008
2008-02-01 00:00:00 2009
2008-03-01 00:00:00 2009
2008-04-01 00:00:00 2009
.
.
.
2009-01-01 00:00:00 2009
2009-02-01 00:00:00 2010
2009-03-01 00:00:00 2010
2009-04-01 00:00:00 2010
I started by breaking the date up into separate columns, but I don't think that is the best way to go about it. Any advice?
Thanks in advance!
We can use POSIXlt to come up with an answer.
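wtr_yr <- function(dates, start_month=10) {  # 10 = October, the US water year
# Convert dates into POSIXlt
dates.posix = as.POSIXlt(dates)
# Year offset: POSIXlt months are 0-based, so month >= start_month means mon >= start_month - 1.
# A January start means the water year is the calendar year, so there is no offset.
offset = ifelse(start_month > 1 & dates.posix$mon >= start_month - 1, 1, 0)
# Water year
adj.year = dates.posix$year + 1900 + offset
# Return the water year
adj.year
}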
Let's now use this function in an example.
# Sample input vector
dates = c("2008-01-01 00:00:00",
"2008-02-01 00:00:00",
"2008-03-01 00:00:00",
"2008-04-01 00:00:00",
"2009-01-01 00:00:00",
"2009-02-01 00:00:00",
"2009-03-01 00:00:00",
"2009-04-01 00:00:00")
# Display the function output
wtr_yr(dates, 2)
# Combine the input and output vectors in a dataframe
df = data.frame(dates, wtr_yr=wtr_yr(dates, 2))
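The question asked for a function taking the whole data frame, wtr_yr(data, start_month), and returning it with a new column. A minimal sketch of such a wrapper around the same POSIXlt logic follows; `add_wtr_yr` and its `date_col` argument are hypothetical names, and it assumes the dates can be parsed by as.POSIXlt.

```r
# Hypothetical wrapper with the signature the question asked for.
# Returns `data` with an added wtr_yr column; a water year is named for the year it ends in.
add_wtr_yr <- function(data, start_month = 10, date_col = "date") {
  lt <- as.POSIXlt(data[[date_col]], tz = "UTC")
  # POSIXlt months are 0-based, so calendar month >= start_month is lt$mon >= start_month - 1.
  # A January start means the water year is just the calendar year.
  data$wtr_yr <- lt$year + 1900 + (start_month > 1 & lt$mon >= start_month - 1)
  data
}

df <- data.frame(date = as.POSIXct(c("2008-01-01", "2008-02-01", "2009-01-01", "2009-02-01"),
                                   tz = "UTC"))
add_wtr_yr(df, start_month = 2)
```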
I had a similar problem a while back, but dealing with fiscal years that started in October. I found this function, which also computes the quarters within the year. Since I only wanted it to output the fiscal year, I edited a small part of the function to do that. There is surely a cleaner, more efficient way of doing it, but this should work for smaller data sets. Here is the edited function:
getYearQuarter <- function(x,
firstMonth=7,
fy.prefix='FY',
quarter.prefix='Q',
sep='-',
level.range=c(min(x), max(x)) ) {
if(level.range[1] > min(x) || level.range[2] < max(x)) {
warning(paste0('The range of x is greater than level.range. Values ',
'outside level.range will be returned as NA.'))
}
quarterString <- function(d) {
year <- as.integer(format(d, format='%Y'))
month <- as.integer(format(d, format='%m'))
y <- ifelse(firstMonth > 1 & month >= firstMonth, year+1, year)
q <- cut( (month - firstMonth) %% 12, breaks=c(-Inf,2,5,8,Inf),
labels=paste0(quarter.prefix, 1:4))
# Edited: q is still computed above, but only the fiscal-year label is returned
return(paste0(fy.prefix, substring(y,3,4)))
}
vals <- quarterString(x)
levels <- unique(quarterString(seq(
as.Date(format(level.range[1], '%Y-%m-01')),
as.Date(format(level.range[2], '%Y-%m-28')), by='month')))
return(factor(vals, levels=levels, ordered=TRUE))
}
Your input vector should be of class Date; then specify the start month. Assuming you have a data frame (df) with a 'date' column as in your question, this should do the trick:
df$wtr_yr <- getYearQuarter(df$date, firstMonth=10)
You can also add a water year column using the water_year() function from the "lfstat" package:
https://www.rdocumentation.org/packages/lfstat/versions/0.9.4/topics/water_year

Format date strings comprising weeks and quarters as Date objects

I have dates in an R data frame column formatted as character strings such as WK01Q32014.
I want to turn each date into a Date() object.
So I altered the format to make it look like 01-3-2014. I want to try to do something like as.Date("01-3-2014","%W-%Q-%Y") for example, but there is no format code for quarters that I know of.
Is there any way to do this using the lubridate, zoo, or any other libraries?
I don't know of any specific function, but here's a basic one:
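convert_WQ_to_Date <- function(D) {
# Assumes 13-week quarters, with week 1 of Q1 starting on January 1
weeks <- as.integer(substr(D, 3, 4))   # "WK01Q32014" -> 1
quarter <- as.integer(substr(D, 6, 6)) # -> 3
year <- substr(D, 7, 10)               # -> "2014"
days <- 7 * ((quarter - 1) * 13 + (weeks-1))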
as.Date(sprintf("%s-01-01", year)) + days
}
Example
D <- c("WK01Q32014", "WK01Q12014", "WK05Q42014", "WK01Q22014", "WK02Q32014")
convert_WQ_to_Date(D)
[1] "2014-07-02" "2014-01-01" "2014-10-29" "2014-04-02" "2014-07-09"
The week, quarter and year do not uniquely define a date, so we have to add an assumption. Here we assume that week 1 starts on the first day of the quarter, week 2 starts 7 days later, and so on.
Below, we extract the qtr-year part and use as.yearqtr in the zoo package to convert that to a yearqtr object and then use as.Date to convert that to a date which is the first of the quarter. We then extract the week, subtract 1 and multiply by 7 to get the days offset. Adding the first of the quarter to the offset gives the result:
library(zoo)
xx <- "01-3-2014" # week-quarter-year
qtr.start <- as.Date(as.yearqtr(sub("...", "", xx), "%q-%Y"))
days <- 7 * (as.numeric(sub("-.*", "", xx)) - 1)
qtr.start + days
## [1] "2014-07-01"
Assuming the traditional notion of each quarter starting respectively on 1st January, 1st April, 1st July and 1st October (in line with the quarters function), just start at these dates and add 7 days for each week:
x <- c("01-3-2014","01-1-2014","05-4-2014","01-2-2014","02-3-2014")
y <- as.numeric(substr(x,6,9))
m <- as.numeric(substr(x,4,4))
d <- as.numeric(substr(x,1,2))
as.Date(paste(y,(m-1)*3+1,"01",sep="-")) + (7*(d-1))
#[1] "2014-07-01" "2014-01-01" "2014-10-29" "2014-04-01" "2014-07-08"
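The same quarter-start assumption can be applied to the original WK01Q32014 strings directly, without first rewriting them as 01-3-2014. This is a sketch; `wq_to_date` is a hypothetical name, and it assumes every string follows exactly the WKwwQqYYYY layout.

```r
# Hypothetical helper: parse "WKwwQqYYYY" strings, assuming week 1 starts on the first day
# of its calendar quarter (Jan 1, Apr 1, Jul 1, Oct 1) and each later week is 7 days on.
wq_to_date <- function(s) {
  stopifnot(all(grepl("^WK[0-9]{2}Q[1-4][0-9]{4}$", s)))   # reject anything in another layout
  week <- as.integer(substr(s, 3, 4))
  qtr  <- as.integer(substr(s, 6, 6))
  year <- as.integer(substr(s, 7, 10))
  qtr_start <- as.Date(sprintf("%d-%02d-01", year, (qtr - 1L) * 3L + 1L))
  qtr_start + 7L * (week - 1L)
}

wq_to_date(c("WK01Q32014", "WK01Q12014", "WK05Q42014", "WK01Q22014", "WK02Q32014"))
```

Under this assumption the results agree with the answer above: 2014-07-01, 2014-01-01, 2014-10-29, 2014-04-01, 2014-07-08.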

Split date data (m/d/y) into 3 separate columns

I need to convert dates (m/d/y format) into 3 separate columns on which I hope to run an algorithm (I'm trying to convert my dates into Julian Day Numbers). I saw a suggestion from another user for separating data into multiple columns using Oracle. I'm using R and am thoroughly stuck on how to code this. Would A1, A2, ... represent my new column headings, and what would the format difference be with the "update set" section?
update <tablename> set A1 = substr(ORIG, 1, 4),
A2 = substr(ORIG, 5, 6),
A3 = substr(ORIG, 11, 6),
A4 = substr(ORIG, 17, 5);
I'm trying hard to improve my skills in R but cannot figure this one out... any help is much appreciated. Thanks in advance... :)
I use the format() method for Date objects to pull apart dates in R. Using Dirk's datetxt, here is how I would go about breaking up a date into its constituent parts:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
datetxt <- as.Date(datetxt)
df <- data.frame(date = datetxt,
year = as.numeric(format(datetxt, format = "%Y")),
month = as.numeric(format(datetxt, format = "%m")),
day = as.numeric(format(datetxt, format = "%d")))
Which gives:
> df
date year month day
1 2010-01-02 2010 1 2
2 2010-02-03 2010 2 3
3 2010-09-10 2010 9 10
Note what several others have said; you can get the Julian dates without splitting out the various date components. I added this answer to show how you could do the breaking apart if you needed it for something else.
Given a text variable x, like this:
> x
[1] "10/3/2001"
then:
> as.Date(x,"%m/%d/%Y")
[1] "2001-10-03"
converts it to a date object. Then, if you need it:
> julian(as.Date(x,"%m/%d/%Y"))
[1] 11598
attr(,"origin")
[1] "1970-01-01"
gives you a Julian date (relative to 1970-01-01).
Don't try the substring thing...
See help(as.Date) for more.
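Note that julian() counts days from 1970-01-01 by default, which is not the astronomical Julian Day Number the question mentions. Because R stores a Date as days since 1970-01-01, and 1970-01-01 has Julian Day Number 2440588, the conversion is a fixed offset. A minimal sketch with a hypothetical helper name, `jdn`:

```r
# Hypothetical helper: Julian Day Number for a Date.
# R stores a Date as days since 1970-01-01, and 1970-01-01 is JDN 2440588.
jdn <- function(d) as.numeric(d) + 2440588

x <- "10/3/2001"
jdn(as.Date(x, "%m/%d/%Y"))   # 11598 + 2440588 = 2452186
```

The fractional Julian Date at 00:00 UTC of a given day is 0.5 less than its Julian Day Number, since Julian days start at noon.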
Quick ones:
Julian date converters already exist in base R; see e.g. help(julian).
One approach may be to parse the date as a POSIXlt and to then read off the components. Other date / time classes and packages will work too but there is something to be said for base R.
Parsing dates as strings is almost always a bad approach.
Here is an example:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
dates <- as.Date(datetxt) ## you could examine these as well
plt <- as.POSIXlt(dates) ## now as POSIXlt types
plt[["year"]] + 1900 ## years are with offset 1900
#[1] 2010 2010 2010
plt[["mon"]] + 1 ## and months are in the range 0 .. 11
#[1] 1 2 9
plt[["mday"]]
#[1] 2 3 10
df <- data.frame(year=plt[["year"]] + 1900,
month=plt[["mon"]] + 1, day=plt[["mday"]])
df
# year month day
#1 2010 1 2
#2 2010 2 3
#3 2010 9 10
And of course
julian(dates)
#[1] 14611 14643 14862
#attr(,"origin")
#[1] "1970-01-01"
To convert dates in m/d/y order into 3 separate columns, consider this df:
df <- data.frame(date = c("01-02-18", "02-20-18", "03-23-18"))
df
date
1 01-02-18
2 02-20-18
3 03-23-18
Convert to date format
df$date <- as.Date(df$date, format="%m-%d-%y")
df
date
1 2018-01-02
2 2018-02-20
3 2018-03-23
To get three separate columns with year, month and day:
library(lubridate)
df$year <- year(df$date)
df$month <- month(df$date)
df$day <- day(df$date)
df
date year month day
1 2018-01-02 2018 1 2
2 2018-02-20 2018 2 20
3 2018-03-23 2018 3 23
Hope this helps.
Hi Gavin: another way [using your idea] is:
The data-frame we will use is oilstocks which contains a variety of variables related to the changes over time of the oil and gas stocks.
The variables are:
colnames(stocks)
"bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC"
"emMN" "emMN.1" "chdate" "chV" "cbO" "chC" "chMN" "chMX"
One of the first things to do is to convert the emdate field, which holds the dates as m/d/Y text rather than as Date objects, into a Date vector.
realdate <- as.Date(stocks$emdate, format = "%m/%d/%Y")
Next we want to split emdate column into three separate columns representing month, day and year using the idea supplied by you.
dfdate <- data.frame(date = realdate)
year <- as.numeric(format(realdate, "%Y"))
month <- as.numeric(format(realdate, "%m"))
day <- as.numeric(format(realdate, "%d"))
ls() will include the individual vectors, day, month, year and dfdate.
Now merge the dfdate, day, month, year into the original data-frame [stocks].
ostocks<-cbind(dfdate,day,month,year,stocks)
colnames(ostocks)
"date" "day" "month" "year" "bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC" "emMN" "emMX" "chdate" "chV"
"cbO" "chC" "chMN" "chMX"
Similar results and I also have date, day, month, year as separate vectors outside of the df.
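If the loose day, month and year vectors are not wanted, the same columns can be added in one step with transform(). The stocks data are not available here, so this sketch uses a small hypothetical stand-in with an emdate column of m/d/Y strings; the emC values are made up for illustration.

```r
# Hypothetical stand-in for the stocks data frame
stocks <- data.frame(emdate = c("1/3/2011", "2/14/2011", "12/30/2011"),
                     emC = c(10.5, 11.2, 9.8))

realdate <- as.Date(stocks$emdate, format = "%m/%d/%Y")
# transform() adds the new columns to the data frame without leaving extra vectors behind
ostocks <- transform(stocks,
                     date  = realdate,
                     day   = as.numeric(format(realdate, "%d")),
                     month = as.numeric(format(realdate, "%m")),
                     year  = as.numeric(format(realdate, "%Y")))
ostocks
```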
