I am currently working with dates in R and need to calculate the time difference between two quarters. I have used the zoo library to transform my dates into quarterly format, but I am struggling to calculate the difference between my dates.
Here is some sample code for reproducibility:
sample_dataframe <- data.frame(
  First_Purchase_date = as.Date(c("2020-01-15", "2019-02-10", "2018-12-24")),
  Recent_Purchase_date = as.Date(c("2020-06-20", "2020-10-10", "2019-05-26"))
)
library(zoo)
#using zoo library to transform my dates into quarters
sample_dataframe$First_purchase_quarter <- as.yearqtr((sample_dataframe$First_Purchase_date), "%Y-%m-%d")
sample_dataframe$Recent_Purchase_quarter <- as.yearqtr((sample_dataframe$Recent_Purchase_date), "%Y-%m-%d")
What I want to achieve is to subtract First_purchase_quarter from Recent_Purchase_quarter to get a time difference in quarters.
So if Recent_Purchase_quarter is 2019 Q2 and First_purchase_quarter is 2018 Q4, the result should be 2.
What would be the easiest way to get the time difference in quarters as described above?
#using zoo library to transform my dates into quarters
sample_dataframe$First_purchase_quarter <- as.yearqtr((sample_dataframe$First_Purchase_date), "%Y-%m-%d")
sample_dataframe$Recent_Purchase_quarter <- as.yearqtr((sample_dataframe$Recent_Purchase_date), "%Y-%m-%d")
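# a yearqtr difference is measured in years, so multiply by 4 to get quarters
sample_dataframe$diff <- (sample_dataframe$Recent_Purchase_quarter - sample_dataframe$First_purchase_quarter) * 4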
head(sample_dataframe$diff)
[1] 1 7 2
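As a cross-check on the result above, here is a minimal sketch that computes the same difference twice: once with zoo's yearqtr and once in base R. The helper name quarter_index is made up for this illustration, and the round() call is only a guard against floating-point noise, not something the original answer used.

```r
library(zoo)

first_purchase  <- as.Date(c("2020-01-15", "2019-02-10", "2018-12-24"))
recent_purchase <- as.Date(c("2020-06-20", "2020-10-10", "2019-05-26"))

# yearqtr stores a quarter as year + (quarter - 1)/4,
# so a difference in years times 4 is a number of quarters
first_q  <- as.yearqtr(first_purchase)
recent_q <- as.yearqtr(recent_purchase)
diff_zoo <- as.integer(round(4 * (as.numeric(recent_q) - as.numeric(first_q))))

# base R cross-check: number the quarters consecutively (hypothetical helper)
quarter_index <- function(d) {
  lt <- as.POSIXlt(d)
  4L * (lt$year + 1900L) + lt$mon %/% 3L
}
diff_base <- quarter_index(recent_purchase) - quarter_index(first_purchase)

diff_zoo   # 1 7 2
diff_base  # 1 7 2
```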
I'm trying to build a time series. My data frame has each month listed as a number. When I use as.Date() I get NA. How do I convert a number to its respective month, as a date?
Example
Base R has built-in month constants. Make sure your numbers are actually numeric by using as.numeric(), and then you can just use month.name[1], which outputs "January".
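A small sketch of that idea, assuming the month numbers arrive as strings and, purely for illustration, that they all belong to the year 2020 (the question does not say which year):

```r
# month numbers stored as text, as they might come out of a file
month_num <- as.integer(c("1", "2", "12"))

# month.name is a built-in character vector of English month names
month_labels <- month.name[month_num]
month_labels  # "January", "February", "December"

# to get real Date values a year and a day are needed as well;
# the year 2020 and day 1 are assumptions made for this example
month_dates <- as.Date(sprintf("2020-%02d-01", month_num))
month_dates   # 2020-01-01, 2020-02-01, 2020-12-01
```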
Below we assume that the month numbers given are the number of months relative to a base of the first month, so for example month 13 would represent 12 months after month 1. We also assume that the months are unique, since that is the case in the question and since it is stated there that the data represent a time series.
1) Let base be the year and month as a yearmon class object identifying the base year/month, and assume months is a vector of month numbers such that 1 is the base, 2 is one month later and so on. Since the yearmon class represents a year and month as year + 0 for Jan, year + 1/12 for Feb, ..., year + 11/12 for Dec, we have the code below to get a Date vector. Alternately, use ym directly, since that already models a year and month.
library(zoo)
# inputs
base <- as.yearmon("2020-01")
months <- 1:9
ym <- base + (months-1)/12
as.Date(ym)
## [1] "2020-01-01" "2020-02-01" "2020-03-01" "2020-04-01" "2020-05-01"
## [6] "2020-06-01" "2020-07-01" "2020-08-01" "2020-09-01"
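The same arithmetic carries across year boundaries, which is the point of treating month 13 as 12 months after month 1. A quick sketch with month numbers chosen only for illustration:

```r
library(zoo)

base <- as.yearmon("2020-01")
months <- c(1, 13, 26)          # 26 is 25 months after the base month

# each step of 1/12 is one month, so month 13 lands in January 2021
ym <- base + (months - 1) / 12
later_dates <- as.Date(ym)
later_dates  # 2020-01-01, 2021-01-01, 2022-02-01
```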
For example, if we have this data.frame we can convert that to a zoo series or a ts series like this using base from above:
library(zoo)
DF <- data.frame(month = 1:9, value = 11:19) # input
z <- with(DF, zoo(value, base + (month-1)/12)) # zoo series
tt <- as.ts(z) # ts series
2) Alternately, if it were known that the series is consecutive months starting in January 2020 then we could ignore the month column and do this (where DF and base were shown above):
library(zoo)
zz <- zooreg(DF$value, base, freq = 12) # zooreg series
as.ts(zz) # ts series
3) This would also work to create a ts series if we can make the same assumptions as in (2). This uses only base R.
ts(DF$value, start = 2020, freq = 12)
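To confirm what the base R version in (3) produces, one can inspect the resulting ts object. This sketch reuses the DF from above and spells out start as c(2020, 1), which is equivalent to start = 2020 for a monthly series:

```r
DF <- data.frame(month = 1:9, value = 11:19)

# monthly series starting in January 2020
tt <- ts(DF$value, start = c(2020, 1), frequency = 12)

frequency(tt)  # 12
start(tt)      # 2020 1
end(tt)        # 2020 9

# window() selects by (year, month) pairs; here March to May 2020
spring <- window(tt, start = c(2020, 3), end = c(2020, 5))
as.numeric(spring)  # 13 14 15
```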
I have time series data in R, where I calculate the mean of all values up to a particular date and store that mean at the date + 4 quarters. The dates are all month ends. To achieve this, I want to add 4 quarters to a date. My question is: how can I add 4 quarters to an R Date object? An illustration:
library(lubridate) # provides quarter()
a <- as.Date("2006-01-01")
b <- as.Date("2011-01-01")
date_range <- quarter(seq.Date(a, b, by = "quarter"), with_year = TRUE)
> date_range[1] + 1
[1] 2007.1
> date_range[1] + quarter(1)
[1] 2007.1
> date_range[1] + 0.25
[1] 2006.35
One possible way I am thinking of is to get year-quarter dates and then add 4 to them, but I wasn't sure what the best way to do this is.
The problem is that quarters have different lengths. Q1 is shortest because it includes February (though it ties with Q2 in leap years). Things like this make "adding a quarter to a date" poorly defined. Even adding months to a date can be tricky at the ends of months: what is 1 month after January 31?
Beginnings of months are more straightforward, and I would recommend you use the 1st day of quarters rather than the last (if you must use a specific date). lubridate provides functions like floor_date() and ceiling_date() to which you can pass unit = "quarter" and they will return the first day of the current or subsequent quarter, respectively. You can also always add months(3) to a day at the beginning of a month, though of course if your intention is to add 4 quarters you may as well just add 1 year.
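A short sketch of those lubridate calls, using a date picked only for illustration. It also shows the end-of-month problem mentioned above: plain + with months() returns NA when the target day does not exist.

```r
library(lubridate)

d <- as.Date("2006-05-31")

# first day of the quarter containing d, and of the following quarter
q_start <- floor_date(d, unit = "quarter")    # 2006-04-01
q_next  <- ceiling_date(d, unit = "quarter")  # 2006-07-01

# from the first of a month, adding months or years is unambiguous
one_quarter_on  <- q_start + months(3)  # 2006-07-01
four_quarters_on <- q_start + years(1)  # 2007-04-01

# one month after January 31 is not a real date, so lubridate gives NA
jan31_plus_month <- as.Date("2006-01-31") + months(1)
```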
Just add 12 months or a year instead?
Or if it must be quarters, define yourself a function, like so:
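# requires lubridate, whose months() builds a period of x months;
# note that this name masks base::quarters()
quarters <- function(x) {
months(3*x)
}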
and then use it to add to the date sequence:
date_range <- seq.Date(a, b, by = "quarter")
date_range + quarters(4)
Lubridate has a function for quarters already included. This is a much better solution than creating your own function.
https://www.rdocumentation.org/packages/lubridate/versions/1.7.4/topics/quarter
Old answer, but for those arriving here: lubridate has an operator, %m+%, that adds months without rolling past the last day of the resulting month.
a <- as.Date("2006-01-01")
Adding months to a date:
The original poster wanted a date 4 quarters in the future, which is 12 months.
future_date <- a %m+% months(12)
future_date
[1] "2007-01-01"
You could also do years as the period:
future_date <- a %m+% years(1)
Subtracting months from a date:
Subtract months with %m-%.
If you wanted the date 3 months before 1/1/2006:
past_date <- a %m-% months(3)
past_date
[1] "2005-10-01"
Example with a date not at the end of a month:
%m-% preserves the day of the month:
as.Date("2022-10-10") %m-% months(3)
[1] "2022-07-10"
For more, see documentation on "Add and subtract months to a date without exceeding the last day of the new month"
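Since the question's dates are all month ends, it may help to see what %m+% does there. As far as I can tell it keeps the day of the month and only rolls back when that day does not exist, so a February 28 stays on the 28th even when the target February has 29 days. The snapping step with ceiling_date() below is my own addition for that case, not part of the answer above.

```r
library(lubridate)

month_ends <- as.Date(c("2006-01-31", "2007-02-28", "2008-02-29"))

# 4 quarters = 12 months; days that do not exist roll back to the month's last day
plus_4q <- month_ends %m+% months(12)
plus_4q   # 2007-01-31, 2008-02-28, 2009-02-28

# to always land on a month end: go to the first of the next month, step back one day
snapped <- ceiling_date(plus_4q, unit = "month") - 1
snapped   # 2007-01-31, 2008-02-29, 2009-02-28
```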
Note that other answers that use Date class will give irregularly spaced series and so are unsuitable for time series analysis.
To do this in such a way that time series analyses can be performed and noting the zoo tag on the question, the yearmon class represents year/month as year + fraction where fraction is 0 for Jan, 1/12 for Feb, 2/12 for Mar, ..., 11/12 for Dec. Thus adding 4 quarters is just a matter of adding 1. (Adding x quarters is done by adding x/4.)
library(zoo)
ym <- yearmon(2006) + 0:11/12 # months in 2006
ym + 1 # one year later
The first line below converts yearmon objects to end-of-month Dates, and the second line converts Dates back to yearmon. Using frac = 0 or omitting frac in the first line would convert to beginning-of-month dates.
d <- as.Date(ym, frac = 1) # d is Date vector of end-of-months
as.yearmon(d) # convert Date vector to yearmon
If your input dates represent quarters then there is also the yearqtr class which represents a year/quarter as year + fraction where fraction is 0, 1/4, 2/4, 3/4 for the 4 quarters of a year. Adding 4 quarters is done by adding 1 (or to add x quarters add x/4).
yq <- as.yearqtr(2006) + 0:3/4 # all quarters in 2006
yq + 1 # one year later
Conversions work similarly to yearmon:
d <- as.Date(yq, frac = 1) # d is Date vector of end-of-quarters
as.yearqtr(d) # convert Date vector to yearqtr
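Putting the yearqtr pieces together for a single month-end date, as a sketch of the round trip the question asks about (the input date is just an example):

```r
library(zoo)

month_end <- as.Date("2006-03-31")

# Date -> yearqtr, then add 4 quarters (4/4 = 1 year)
yq <- as.yearqtr(month_end)
later <- yq + 4 / 4
format(later)                            # "2007 Q1"

# yearqtr -> Date: frac = 1 gives the last day of the quarter,
# the default (frac = 0) gives the first day
end_of_quarter   <- as.Date(later, frac = 1)  # 2007-03-31
start_of_quarter <- as.Date(later)            # 2007-01-01
```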
I'm creating an xts object with a weekly (7 day) frequency to use in forecasting. However, even when using the frequency=7 argument in the xts call, the resulting xts object has a frequency of 1.
Here's an example with random data:
> values <- rnorm(364, 10)
> days <- seq.Date(from=as.Date("2014-01-01"), to=as.Date("2014-12-30"), by='days')
> x <- xts(values, order.by=days, frequency=7)
> frequency(x)
[1] 1
I have also tried, after using the above code, frequency(x) <- 7. However, this changes the class of x to only zooreg and zoo, losing the xts class and messing with the time stamp formats.
Does xts automatically choose a frequency based on analyzing the data in some way? If so, how can you override this to set a specific frequency for forecasting purposes (in this case, passing a seasonal time series to ets from the forecast package)?
I understand that xts may not allow frequencies that don't make sense, but a frequency of 7 with daily time stamps seems pretty logical.
Consecutive Date class dates always have a frequency of 1 since consecutive dates are 1 apart. Use ts or zooreg to get a frequency of 7:
tt <- ts(values, frequency = 7)
library(zoo)
zr <- as.zooreg(tt)
# or
zr <- zooreg(values, frequency = 7)
These will create a series whose times are 1, 1+1/7, 1+2/7, ...
If we have some index values of zr
zrdates <- index(zr)[5:12]
we can recover the dates from zrdates like this:
days[match(zrdates, index(zr))]
As pointed out in the comments xts does not support this type of series.
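For reference, a base R sketch of the frequency-7 idea with the calendar dates kept alongside in a lookup table. The lookup data frame is my own device for getting back to dates, and the seed is arbitrary; nothing here touches xts or the forecast package.

```r
set.seed(42)
values <- rnorm(364, 10)
days <- seq.Date(as.Date("2014-01-01"), as.Date("2014-12-30"), by = "day")

# a ts with frequency 7 treats every 7 observations as one seasonal cycle
tt <- ts(values, frequency = 7)
frequency(tt)  # 7

# cycle() gives each observation's position (1..7) within its cycle;
# storing it next to the dates lets us map back later
lookup <- data.frame(date = days, position = as.integer(cycle(tt)))

# 2014-01-01 was a Wednesday, so position 1 always falls on a Wednesday
# (POSIXlt wday counts from Sunday = 0, so Wednesday = 3)
position_1_wday <- unique(as.POSIXlt(lookup$date[lookup$position == 1])$wday)
```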
I have a chunk of data logging temperatures from a few dozen devices every hour for over a year. The data are stored as a zoo object. I'd very much like to summarize those data by looking at the average values for every one of the 24 hours in a day (1am, 2am, 3am, etc.). So that for each device I can see what its average value is for all the 1am times, 2am times, and so on. I can do this with a loop but sense that there must be a way to do this in zoo with an artful use of aggregate.zoo. Any help?
require(zoo)
# random hourly data over 30 days for five series
x <- matrix(rnorm(24 * 30 * 5),ncol=5)
# Assign hourly data with a real time and date
x.DateTime <- as.POSIXct("2014-01-01 0100",format = "%Y-%m-%d %H") +
seq(0, by = 3600, length.out = nrow(x)) # one time stamp per row of x
# make a zoo object
x.zoo <- zoo(x, x.DateTime)
#plot(x.zoo)
# what I want:
# the average value for each series at 1am, 2am, 3am, etc. so that
# the dimensions of the output are 24 (hours) by 5 (series)
# If I were just working on x I might do something like:
res <- matrix(NA,ncol=5,nrow=24)
for(i in 1:nrow(res)){
res[i,] <- apply(x[seq(i,nrow(x),by=24),],2,mean)
}
res
# how can I avoid the loop and write an aggregate statement in zoo that
# will get me what I want?
Calculate the hour for each time point and then aggregate by that:
hr <- as.numeric(format(time(x.zoo), "%H"))
ag <- aggregate(x.zoo, hr, mean)
dim(ag)
## [1] 24 5
ADDED
Alternately use hours from chron or hour from data.table:
library(chron)
ag <- aggregate(x.zoo, hours, mean)
This is quite similar to the other answer but takes advantage of the fact that the by=... argument to aggregate.zoo(...) can be a function, which will be applied to time(x.zoo):
as.hour <- function(t) as.numeric(format(t,"%H"))
result <- aggregate(x.zoo,as.hour,mean)
identical(result,ag) # ag from G. Grothendieck answer
# [1] TRUE
Note that this produces a result identical to the other answer, but not the same as yours. This is because your dataset starts at 1:00am, not midnight, so your loop produces a matrix wherein the 1st row corresponds to 1:00am and the last row corresponds to midnight. These solutions produce zoo objects wherein the first row corresponds to midnight.
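To make that row-ordering point concrete, here is a sketch that rebuilds the example and checks that the aggregate result matches the loop once the midnight row is moved to the end. I have fixed the time zone to UTC and generated exactly one time stamp per row; both are assumptions on my part so the example runs the same everywhere.

```r
library(zoo)

set.seed(1)
x <- matrix(rnorm(24 * 30 * 5), ncol = 5)

# hourly time stamps starting at 01:00, one per row of x (UTC is an assumption)
x.DateTime <- as.POSIXct("2014-01-01 01:00", tz = "UTC") +
  seq(0, by = 3600, length.out = nrow(x))
x.zoo <- zoo(x, x.DateTime)

# aggregate by hour of day: rows are ordered 0, 1, ..., 23
hr <- as.numeric(format(time(x.zoo), "%H"))
ag <- aggregate(x.zoo, hr, mean)

# the loop from the question: rows are ordered 1, 2, ..., 23, 0
res <- matrix(NA_real_, nrow = 24, ncol = 5)
for (i in seq_len(nrow(res))) {
  res[i, ] <- colMeans(x[seq(i, nrow(x), by = 24), ])
}

# move the midnight row (first row of ag) to the end before comparing
ag_reordered <- unname(coredata(ag)[c(2:24, 1), ])
same <- isTRUE(all.equal(ag_reordered, res))
same  # TRUE
```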
I couldn't find a solution to my problem with the POSIXct format. I have monthly data. This is a fragment of my code:
Data <- as.POSIXct(as.character(czerwiec$Data), format = "%Y-%m-%d %H:%M:%S")
get.rows <- Data >= as.POSIXct(as.character("2013-06-03 00:00:01")) & Data <= as.POSIXct(as.character("2013-06-09 23:59:59"))
czerwiec <- czerwiec[get.rows,]
Data <- Data[get.rows]
I chose one whole week of June, from the 3rd to the 9th, and want to compute the sum of column X (czerwiec$X) for every hour. As you can see, I could narrow the time window by hand, but it would be silly to do it like this:
get.rows <- Data >= as.POSIXct(as.character("2013-06-03 00:00:01")) &
Data <= as.POSIXct(as.character("2013-06-03 00:59:59"))
then
get.rows <- Data >= as.POSIXct(as.character("2013-06-04 00:00:01")) &
Data <= as.POSIXct(as.character("2013-06-04 00:59:59"))
At the end of these operations, I can compute the sum for this hour, and so on.
Do you have any idea how I can select all rows whose date is between 2013-06-03 and 2013-06-09 and whose time is between 00:00:01 and 00:59:59?
About the data frame "czerwiec": it has three columns, the first called "ID", the second "Price" and the third "Data" (meaning Date).
Thx for help :)
This might help. I've used the lubridate package, which doesn't really do anything you can't do in base R, but it makes handling dates much easier.
# Set up Data as a string vector
Data <- c("2013-06-01 05:05:05", "2013-06-06 05:05:05", "2013-06-06 08:10:05", "2013-07-07 05:05:05")
require(lubridate)
# Set up the data frame with fake data. This makes a reproducible example
set.seed(4) #For reproducibility, always set the seed when using random numbers
# Create a data frame with Data and price
czerwiec <- data.frame(price=runif(4))
# Use lubridate to turn the Data string into a vector of POSIXct objects
czerwiec$Data <- ymd_hms(Data)
# Determine the 'yearday' -i.e. yearday of Jan 1 is 1; yearday of Dec 31 is 365 (or 366 in a leap year)
czerwiec$yday <- yday(czerwiec$Data)
# in.range is true if the date is in the desired date range
czerwiec$in.range <- czerwiec$yday >= yday(ymd("2013-06-03")) &
czerwiec$yday <= yday(ymd("2013-06-09"))
# Pick out the dates that have the range that you want
selected_dates <- subset(czerwiec, in.range==TRUE)
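The question also asks for a sum per hour within that week, which the code above stops short of. A base R sketch of that last step follows; the data are made up, the column summed is Price (the question mentions both X and Price, so that choice is a guess), and UTC is assumed for the time stamps.

```r
# made-up data in the shape described in the question
czerwiec <- data.frame(
  Price = c(10, 20, 5, 7),
  Data = as.POSIXct(
    c("2013-06-03 00:15:00", "2013-06-04 00:45:00",
      "2013-06-06 08:10:05", "2013-07-07 05:05:05"),
    tz = "UTC"
  )
)

# keep only rows whose calendar date falls in 3-9 June 2013
day <- as.Date(czerwiec$Data)
week <- czerwiec[day >= as.Date("2013-06-03") & day <= as.Date("2013-06-09"), ]

# hour of day (0-23) for each remaining row
week$hour <- as.integer(format(week$Data, "%H"))

# one sum per hour of day, pooled over all days of the week
hourly <- aggregate(Price ~ hour, data = week, FUN = sum)
hourly
#   hour 0 -> 30, hour 8 -> 5
```

Hours with no rows simply do not appear in the result; merging against a 0:23 table would fill them in if that matters.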