How to calculate MTD in R for a different month

The task is to find the number of working days month-to-date (MTD) for two months: the month with the highest inflow (July, in my case) and the current month.
Because I plan to run the statement as a script every day, I don't want to hardcode anything.
The data frame looks like this:
SERVICE            | BEST MONTH TOTAL | BEST MONTH MTD | CURR. MONTH MTD
No of Working Days |                  |                |
..
..
..
For "BEST MONTH TOTAL", I used the following statement:
report[1,2] <- sum(!weekdays(seq(as.Date('2019-07-01'), as.Date('2019-07-31'), 'days')) %in% c('Sunday','Saturday'))
For the current month's number of days MTD, I used:
difftime(Sys.Date(),'2019-09-01',units = "days" )
This gives the output:
Time difference of 12.22917 days
Is there a way to get just the integer 12?
And how do I calculate BEST MONTH MTD? Is there a function that goes back to the same day of the month as Sys.Date(), but in July, so that I can calculate the number of working days MTD?
In other words, what I essentially need is:
difftime('2019-07-13','2019-07-01', units = "days")
But I don't want to hardcode '2019-07-13', because I want to run this as a script without changing the date every day. I also just need the difference as an integer, without the "Time difference of ... days" text.

To convert to the number of days, as a numeric:
as.numeric(difftime(Sys.Date(),'2019-09-01',units = "days" ))
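The .22917 fraction (5.5 hours) is most likely a time-zone offset. difftime() converts both arguments to POSIXct. A Date such as Sys.Date() becomes midnight UTC, while a character string such as '2019-09-01' becomes midnight in the local time zone. As a result, as.numeric() alone can still return 12.22917.

The following is a minimal base-R sketch, assuming you want whole elapsed days. It subtracts two Date objects and converts the result with as.integer(). The helper name days_since is hypothetical.

```r
# Hypothetical helper: whole days elapsed from `start` to `today`.
# Converting both ends to Date first means no time-zone offset is involved.
days_since <- function(today, start) {
  as.integer(as.Date(today) - as.Date(start))
}

days_since(as.Date("2019-09-13"), "2019-09-01")
# [1] 12

# In a daily script, pass Sys.Date() instead of a fixed date:
# days_since(Sys.Date(), "2019-09-01")
```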

Is this what you want?
trunc(difftime(Sys.Date(),'2019-09-01',units = "days"))
#output
#Time difference of 12 days
best_month_mtd <- function(y) {
  trunc(difftime(Sys.Date(), y, units = "days"))
}
#typical usage
best_month_mtd('2019-09-01')
#output
#Time difference of 12 days
That should give you what you want.
Update
If you just want the number of days elapsed since the start of the month, a one-liner using lubridate might be handy. Note that this can differ from the actual number of working days. The latter requires handling days on which the organization is closed, as well as national holidays.
library(lubridate)
# -------------------------------------------------------------------------
#The start date of the month can be computed using floor_date() from lubridate
#floor_date() takes a date-time object and rounds it down to the nearest
#boundary of the specified time unit.
month_start <- floor_date(Sys.Date(), "month")
month_start
#"2019-09-01"
# -------------------------------------------------------------------------
#difftime between Sys.Date() and month_start
difftime(Sys.Date(),month_start, units = "days")
#Time difference of 12 days
# -------------------------------------------------------------------------
# to print only the number of days
paste0(trunc(difftime(Sys.Date(),month_start, units = "days")))
#"12"
# -------------------------------------------------------------------------
#Using the above, a one-liner can be constructed as follows.
paste0(trunc(difftime(Sys.Date(),floor_date(Sys.Date(), "month"), units = "days")))
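The answers above count elapsed calendar days. For the working-day columns in the question, a base-R sketch could look like the following. It makes these assumptions:

- Working days are Monday to Friday, and holidays are not excluded.
- The best month (July, passed as best_month = 7) is in the same calendar year as today.
- Both the first of the month and today are counted.

All function names are hypothetical.

```r
# Count Monday-Friday dates between start and end, inclusive.
# format(x, "%u") gives the ISO weekday (1 = Monday ... 7 = Sunday) and,
# unlike weekdays(), does not depend on the locale's day names.
working_days <- function(start, end) {
  if (end < start) return(0L)
  days <- seq(start, end, by = "day")
  sum(as.integer(format(days, "%u")) <= 5L)
}

# First and last day of the month containing `d`.
month_start <- function(d) as.Date(format(d, "%Y-%m-01"))
month_end <- function(d) seq(month_start(d), by = "month", length.out = 2)[2] - 1

# Hypothetical helper filling the three columns of the report.
# Assumption: the best month lies in the same calendar year as `today`.
mtd_working_days <- function(today = Sys.Date(), best_month = 7L) {
  best_start <- as.Date(sprintf("%s-%02d-01", format(today, "%Y"), best_month))
  # Same day of the month as today, clamped to the best month's last day
  best_same_day <- min(best_start + as.integer(format(today, "%d")) - 1L,
                       month_end(best_start))
  c(best_month_total = working_days(best_start, month_end(best_start)),
    best_month_mtd   = working_days(best_start, best_same_day),
    curr_month_mtd   = working_days(month_start(today), today))
}

mtd_working_days(as.Date("2019-09-13"))
# best_month_total = 23, best_month_mtd = 10, curr_month_mtd = 10
```

Calling mtd_working_days() with no arguments uses Sys.Date(), so nothing needs to be hardcoded in the daily script.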

Related

Date Formatting in Time Series Codes

I have a .csv file that looks like this:
Date
Time
Demand
01-Jan-05
6:30
6
01-Jan-05
6:45
3
...
23-Jan-05
21:45
0
23-Jan-05
22:00
1
The days are broken into 15 minute increments from 6:30 - 22:00.
Now, I am trying to do a time series on this, but I am a little lost on the notation of this.
I have the following so far:
library(tidyverse)
library(forecast)
library(zoo)
tp <- read.csv(".csv")
tp.ts <- ts(tp$DEMAND, start = c(), end = c(), frequency = 63)
The frequency I am after is an entire day, which I believe makes the number 63.***
However, I am unsure as to how to notate the dates in c().
***Edit
If the frequency is meant to be the number of observations per unit of time, and I am trying to observe just Demand by the 15-minute time slots (Time) in each day (Date), maybe my frequency is 1?
***Edit 2
So I think I am struggling with the time series because I have a Date column (stored as character) and a separate Time column.
Since I need the Demand data at the given times on each date, maybe I need to convert the dates so they can be used in ts(), and combine the Date and Time data into a new column?
If I do this, I am assuming this should give me the times I need (6:30 to 22:00) but with the addition of having the date?
However, the data is to be used to predict the Demand for the rest of the month. So maybe the Date is an important variable if the day of the week impacts Demand?
We assume you are starting with tp shown reproducibly in the Note at the end. A complete cycle of 24 * 4 = 96 points should be represented by one unit of time internally. The chron class does that so read it in as a zoo series z with chron time index and then convert that to ts giving ts_ser or possibly leave it as a zoo series depending on what you are going to do next.
library(zoo)
library(chron)
to_chron <- function(date, time) as.chron(paste(date, time), "%d-%b-%y %H:%M")
z <- read.zoo(tp, index = 1:2, FUN = to_chron, frequency = 4 * 24)
ts_ser <- as.ts(z)
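If you would rather keep only the 06:30 to 22:00 slots as one seasonal cycle per day (the frequency of 63 mentioned in the question), here is a base-R sketch. It is an alternative to the chron approach, not a replacement for it. It makes these assumptions:

- Every day has all 63 quarter-hour slots with no gaps, since ts() cannot represent missing slots on its own.
- Month abbreviations such as "Jan" are parsed in an English or C locale, because %b is locale-dependent.

Note also that the column is Demand. Because R names are case-sensitive, tp$DEMAND returns NULL.

```r
# Same two sample rows as the Note below
tp <- data.frame(Date = c("01-Jan-05", "01-Jan-05"),
                 Time = c("6:30", "6:45"),
                 Demand = c(6L, 3L))

# Combine Date and Time into a single date-time column
tp$DateTime <- as.POSIXct(paste(tp$Date, tp$Time),
                          format = "%d-%b-%y %H:%M", tz = "UTC")

# Number of 15-minute slots from 06:30 to 22:00 inclusive (63)
first_slot <- as.POSIXct("2005-01-01 06:30", tz = "UTC")
last_slot  <- as.POSIXct("2005-01-01 22:00", tz = "UTC")
slots_per_day <- length(seq(first_slot, last_slot, by = "15 min"))

# One time unit = one day; start = c(1, 1) means day 1, first slot
tp_ts <- ts(tp$Demand, start = c(1, 1), frequency = slots_per_day)
```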
Note
tp <- structure(list(Date = c("01-Jan-05", "01-Jan-05"), Time = c("6:30",
"6:45"), Demand = c(6L, 3L)), row.names = 1:2, class = "data.frame")

Bizdays doesn't exclude weekends

I am trying to calculate utilization rates by relative employee lifespans. I need to assign to each employee the total number of hours available between the earliest and the latest dates on which time was recorded. I will then use this as the divisor in utilization rate = workhours / totalhours.
When testing the bizdays function, I tried a simple example.
bizdays::bizdays("2020-02-07","2020-02-14")
[1] 7
Any reason why the function is not returning the correct number of business days?
I am expecting 5 business days since 2/07 was a Friday so only 1 week should be included.
The goal is to use bizdays in the following function, which will be applied to a grouped df with gapply.
timeentry = function(x){
  end_date = max(x$terminus) # creates an end_date variable from the furthest end date in the group
  start_date = min(x$onset) # creates a start_date from the earliest start date in the group
  start_date %>% bizdays(end_date) * 8 # counts business days between the two dates and multiplies by 8 to get work hours
}
I will apply the function in this manner. Unfortunately, it returns an error saying it cannot allocate a vector of size 4687 Gb. This is a separate issue I hope someone can point out.
util = group %>% gapply(.,timeentry)
where group is the grouped df.
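Without a cal argument, bizdays() uses its default calendar, which does not treat weekends as non-working days, so the result (7) is just the number of calendar days. Try setting up a calendar with create.calendar: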
library(bizdays)
create.calendar(name = "demo", weekdays = c("saturday", "sunday"))
bizdays::bizdays("2020-02-07","2020-02-14", cal = "demo")
[1] 5
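As a cross-check without the bizdays package, here is a base-R sketch that counts Monday-to-Friday dates between two dates. It assumes the start date is excluded and the end date included. For this example (two Fridays), excluding either endpoint gives 5. Holidays are not handled, and count_weekdays is a hypothetical name.

```r
# Hypothetical helper: Monday-Friday days in (from, to], i.e. the start
# date is excluded and the end date is included. No holiday handling.
count_weekdays <- function(from, to) {
  from <- as.Date(from)
  to <- as.Date(to)
  if (to <= from) return(0L)
  days <- seq(from + 1, to, by = "day")
  # "%u" is the ISO weekday: 1 = Monday ... 7 = Sunday
  sum(as.integer(format(days, "%u")) <= 5L)
}

count_weekdays("2020-02-07", "2020-02-14")
# [1] 5
count_weekdays("2020-02-07", "2020-02-14") * 8  # work hours at 8 h/day
# [1] 40
```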

How do i find week number from an arbitrary start date in R?

How do I find the week number from an arbitrary start date in R? Let's say I want my start date to be August 1st.
Using lubridate, you can do:
interval(today(), dmy("21-08-2020"))/weeks(1)
[1] 30.42857
Or from the date of interest to another date:
interval(dmy("21-08-2020"), dmy("21-09-2020"))/weeks(1)
[1] 4.428571
You can use difftime for this:
difftime("2020-08-21", Sys.Date(), units = "weeks")
# Time difference of 30.45238 weeks
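The answers above return the (fractional) number of weeks between two dates. If "week number" instead means which 7-day block a date falls in, counting from an arbitrary start date as week 1, a base-R sketch could be as follows. The helper week_number is hypothetical. Weeks here are consecutive 7-day blocks beginning on the start date, not calendar weeks.

```r
# Hypothetical helper: week 1 is the start date plus the following 6 days,
# week 2 the next 7 days, and so on. Dates before the start give 0 or less.
week_number <- function(d, start) {
  as.integer(floor(as.numeric(as.Date(d) - as.Date(start)) / 7)) + 1L
}

week_number(c("2020-08-01", "2020-08-07", "2020-08-08", "2020-08-21"),
            start = "2020-08-01")
# [1] 1 1 2 3
```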

Adding quarters to R date

I have R time series data, where I am calculating the means of all values up to a particular date and storing these means at the date + 4 quarters. The dates are all month ends. To achieve this, I want to add 4 quarters to a date. My question is: how can I add 4 quarters to an R Date? An illustration:
library(lubridate)  # for quarter()
a <- as.Date("2006-01-01")
b <- as.Date("2011-01-01")
date_range <- quarter(seq.Date(a, b, by = "quarter"), with_year = TRUE)
> date_range[1] + 1
[1] 2007.1
> date_range[1] + quarter(1)
[1] 2007.1
> date_range[1] + 0.25
[1] 2006.35
One possible way I am considering is to get year-quarter values and then add 4 to them, but I'm not sure what the best way to do this is.
The problem is that quarters have different lengths. Q1 is the shortest because it includes February (though it ties with Q2 in leap years). Things like this make "adding a quarter to a date" poorly defined. Even adding months to a date can be tricky at the ends of months: what is 1 month after January 31?
Beginnings of months are more straightforward, and I would recommend you use the 1st day of quarters rather than the last (if you must use a specific date). lubridate provides functions like floor_date() and ceiling_date() to which you can pass unit = "quarter" and they will return the first day of the current or subsequent quarter, respectively. You can also always add months(3) to a day at the beginning of a month, though of course if your intention is to add 4 quarters you may as well just add 1 year.
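Since the question's dates are all month ends, one well-defined reading of "add n quarters" is "go to the last day of the month 3 * n months later". The following is a base-R sketch of that interpretation, using the hypothetical helper name add_quarters_month_end. Note that it always returns a month end, even when the input is not one.

```r
# Hypothetical helper: shift dates by n quarters (3 * n months) and return
# the last day of the target month.
add_quarters_month_end <- function(d, n) {
  n <- as.integer(n)
  y <- as.integer(format(d, "%Y"))
  m <- as.integer(format(d, "%m"))
  # Zero-based month count (year * 12 + month - 1) of the month *after*
  # the target month
  idx <- y * 12L + (m - 1L) + 3L * n + 1L
  # First day of that following month, minus one day = last day of target month
  first_after <- as.Date(sprintf("%04d-%02d-01", idx %/% 12L, idx %% 12L + 1L))
  first_after - 1
}

d <- as.Date(c("2006-01-31", "2006-02-28", "2007-02-28"))
add_quarters_month_end(d, 4)
# [1] "2007-01-31" "2007-02-28" "2008-02-29"
```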
Just add 12 months or a year instead?
Or, if it must be quarters, define a function yourself, like so:
library(lubridate)  # for months() returning a period
# note: this masks base::quarters()
quarters <- function(x) {
  months(3 * x)
}
and then use it to add to the date sequence:
date_range <- seq.Date(a, b, by = "quarter")
date_range + quarters(4)
Lubridate has a function for quarters already included. This is a much better solution than creating your own function.
https://www.rdocumentation.org/packages/lubridate/versions/1.7.4/topics/quarter
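Old answer, but for those arriving here: lubridate has an operator %m+% that adds months without exceeding the last day of the new month (for example, January 31 %m+% months(1) gives the last day of February).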
a <- as.Date("2006-01-01")
Add months to a date:
The original poster wanted 4 quarters into the future, which is 12 months.
future_date <- a %m+% months(12)
future_date
[1] "2007-01-01"
You could also do years as the period:
future_date <- a %m+% years(1)
Subtract months from a date:
Subtract months with %m-%.
If you wanted the date 3 months before 1/1/2006:
past_date <- a %m-% months(3)
past_date
[1] "2005-10-01"
Example with dates not at the end of a month:
%m-% (like %m+%) keeps the day of the month when that day exists in the target month:
as.Date("2022-10-10") %m-% months(3)
[1] "2022-07-10"
For more, see documentation on "Add and subtract months to a date without exceeding the last day of the new month"
Note that other answers that use Date class will give irregularly spaced series and so are unsuitable for time series analysis.
To do this in such a way that time series analyses can be performed and noting the zoo tag on the question, the yearmon class represents year/month as year + fraction where fraction is 0 for Jan, 1/12 for Feb, 2/12 for Mar, ..., 11/12 for Dec. Thus adding 4 quarters is just a matter of adding 1. (Adding x quarters is done by adding x/4.)
library(zoo)
ym <- yearmon(2006) + 0:11/12 # months in 2006
ym + 1 # one year later
The following converts yearmon objects to end-of-month Dates (first line) and Dates back to yearmon (second line). Using frac = 0, or omitting frac, in the first line would convert to beginning-of-month dates.
d <- as.Date(ym, frac = 1) # d is Date vector of end-of-months
as.yearmon(d) # convert Date vector to yearmon
If your input dates represent quarters then there is also the yearqtr class which represents a year/quarter as year + fraction where fraction is 0, 1/4, 2/4, 3/4 for the 4 quarters of a year. Adding 4 quarters is done by adding 1 (or to add x quarters add x/4).
yq <- as.yearqtr(2006) + 0:3/4 # all quarters in 2006
yq + 1 # one year later
Conversions work similarly to yearmon:
d <- as.Date(yq, frac = 1) # d is Date vector of end-of-quarters
as.yearqtr(d) # convert Date vector to yearqtr

How to find decimal representation of years in R?

Since I need reasonably accurate representations of years in decimal format (~ 4-5 digits of accuracy would work) I turned to the lubridate package. This is what I have tried:
library(lubridate)
refDate <- as.Date("2016-01-10")
endDate <- as.Date("2020-12-31")
daysInLeapYear <- 366
daysInRegYear <- 365
leapYearFractStart <- 0
leapYearRegStart <- 0
daysInterval <- as.interval(difftime(endDate, refDate, unit = "d"), start = refDate)
periodObject <- as.period(daysInterval)
if(leap_year(refDate)) {
  leapYearFractStart <- (as.numeric(days_in_month(refDate))-as.numeric(format(refDate, "%d")))/daysInLeapYear
}
if(!leap_year(refDate)) {
  leapYearRegStart <- (as.numeric(days_in_month(refDate))-as.numeric(format(refDate, "%d")))/daysInRegYear
}
returnData <- periodObject@year+(periodObject@month/12)+leapYearFractStart+leapYearRegStart
It is safe to assume that the end date is always at the end of a month, hence there is no leap-year check for the end date. Relying on lubridate for proper year/month counting, I am adjusting for leap years only at the start date.
I reckon this only gets me to within 3 digits of accuracy! In addition, it looks a bit crude.
Is there a more complete and accurate procedure to determine decimal representation of years in an interval?
It's very unclear what you're trying to do exactly here, which makes accuracy difficult to talk about.
lubridate has a function decimal_date which turns dates into decimals. But since 3 decimal places give you 1000 possible positions within a year, while there are only 365/366 days, between 2 and 3 viable values fall within a single day. Accuracy depends on when in the day you want the result to fall.
> decimal_date(as.POSIXlt("2016-01-10 00:00:01"))
[1] 2016.025
> decimal_date(as.POSIXlt("2016-01-10 12:00:00"))
[1] 2016.026
> decimal_date(as.POSIXlt("2016-01-10 23:59:59"))
[1] 2016.027
In other words, going beyond 3 decimal places is only really important if you're interested in the time of day.
This solution uses only base R. We get the beginning of the year using cut(..., "year") and the number of days in the year by differencing it with the beginning of the next year obtained using cut(..., "year") on an arbitrary date in the following year. Finally use those quantities to get the fraction and add it to the year.
d <- as.Date(c("2015-01-31", "2016-01-01", "2016-01-10", "2016-12-31")) # sample input
year_begin <- as.Date(cut(d, "year"))
days_in_year <- as.numeric( as.Date(cut(year_begin + 366, "year")) - year_begin )
as.numeric(format(d, "%Y")) + as.numeric(d - year_begin) / days_in_year
## [1] 2015.082 2016.000 2016.025 2016.997
Alternately, using as.POSIXlt this variation crams it into one line:
with(unclass(as.POSIXlt(d)),1900+year+yday/as.numeric(as.Date(cut(d-yday+366,"y"))-d+yday))
## [1] 2015.082 2016.000 2016.025 2016.997
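To get the length of the interval from the question (2016-01-10 to 2020-12-31) in decimal years, one option is to wrap the base-R approach above in a function and subtract. The names decimal_year and interval_years are hypothetical. Each date is treated as its midnight, so 2020-12-31 maps to just under 2021. Add one day to the end date if the last day should count in full.

```r
# Hypothetical wrapper around the base-R approach above:
# year + (days since Jan 1) / (days in that year)
decimal_year <- function(d) {
  d <- as.Date(d)
  year_begin <- as.Date(cut(d, "year"))
  days_in_year <- as.numeric(as.Date(cut(year_begin + 366, "year")) - year_begin)
  as.numeric(format(d, "%Y")) + as.numeric(d - year_begin) / days_in_year
}

# Interval length in years, as the difference of two decimal years
interval_years <- decimal_year("2020-12-31") - decimal_year("2016-01-10")
round(interval_years, 5)
# [1] 4.97268
```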
