Suppose I have a data frame with ten years of daily temperature data (in degree C) like this:
mydf <- data.frame(Date = seq(as.Date("2001/1/1"), as.Date("2010/12/31"), by = "day"), Temp = runif(3652, 0, 40))
I am trying to calculate growing degree days for plants. This is how it works: within a date range, I need to integrate the difference between the daily temperature and a base temperature, let's say 10 degrees C. To make it harder, the date range goes across years. For example, I need to calculate the growing degree days between November 1st and March 31st for all years in the time series. In terms of an "algorithm", the logic would be something like this:
t_base <- 10
for (each day between nov 1st and mar 31st) {
sum (Temp - t_base)
}
How to do this using the zoo package?
Note that "yearmon" class variables are of the form year + frac, where frac is 0 for Jan, 1/12 for Feb, 2/12 for Mar, etc. Below, ym holds the "yearmon" values corresponding to the dates, shifted forward by two months and converted to numeric. ym is then split into the year y (the season-end year) and the month m (where m is 0 for the first month of the season, 1 for the second month, ..., 4 for the fifth and last month of the season, and higher numbers for months not in season). in.seas is TRUE for those data points in Nov, Dec, Jan, Feb or Mar (which corresponds to m <= 4). Finally, use ave to calculate the cumulative sum among dates having the same season-end year, or aggregate to calculate the sum per season.
library(zoo)
z <- read.zoo(mydf)
ym <- as.numeric(as.yearmon(index(z)) + 2/12)
y <- floor(ym) # year of date's season end or this year if not in season
m <- round(12 * (ym - y)) # month Nov = 0, Dec = 1, Jan = 2, Feb = 3, Mar = 4, ...
in.seas <- m <= 4
Cum <- ave(z[in.seas], y[in.seas], FUN = function(x) cumsum(x - t_base))
or to just get the sum of each season:
Sum <- aggregate(z[in.seas], y[in.seas], function(x) sum(x - t_base))
Note that fortify.zoo(x) will convert zoo object x back to a data frame should that be necessary.
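For comparison, the same season grouping can be sketched in base R without zoo. This is only an illustration under the question's setup: the names season_end and gdd are introduced here, and the seed is arbitrary.

```r
# Sketch: season-end year grouping in base R (no zoo).
# season_end and gdd are illustrative names, not from the answer above.
set.seed(42)
mydf <- data.frame(
  Date = seq(as.Date("2001-01-01"), as.Date("2010-12-31"), by = "day"),
  Temp = runif(3652, 0, 40)
)
t_base <- 10

mon <- as.integer(format(mydf$Date, "%m"))
yr  <- as.integer(format(mydf$Date, "%Y"))

# Nov and Dec belong to the season that ends in the following year
in_seas    <- mon %in% c(11, 12, 1, 2, 3)
season_end <- ifelse(mon >= 11, yr + 1L, yr)

# One sum of (Temp - t_base) per season, labelled by season-end year
gdd <- tapply((mydf$Temp - t_base)[in_seas], season_end[in_seas], sum)
gdd
```

The first and last entries (2001 and 2011) cover only part of a season, because the data start in January 2001 and end in December 2010.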
I would like to add a fiscal month-end column to a dataset in R. In my company the fiscal month ends on the 21st of each month. For example
12/22/2019 to 1/21/2020 will be Jan-2020
1/22/2020 to 2/21/2020 will be Feb-2020
2/22/2020 to 3/21/2020 will be Mar-2020
etc
Dataset
Desired_output
How would I accomplish this in R? The Date column in my data is in %m/%d/%Y format (e.g. 1/22/2020).
You could extract the day of the month and, if it is 22 or later, add 10 days to the date and then format the result as month-year:
transform(dat, Fiscal_Month = format(Date +
ifelse(as.integer(format(Date, '%d')) >= 22, 10, 0), '%b %Y'))
# Date Fiscal_Month
#1 2020-01-20 Jan 2020
#2 2020-01-21 Jan 2020
#3 2020-01-22 Feb 2020
#4 2020-01-23 Feb 2020
#5 2020-01-24 Feb 2020
This can also be done without ifelse like this :
transform(dat, Fiscal_Month = format(Date + c(0, 10)
[(as.integer(format(Date, '%d')) >= 22) + 1], '%b %Y'))
data
Used this sample data :
dat <- data.frame(Date = seq(as.Date('2020-01-20'), by = '1 day',length.out = 5))
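To see why adding 10 days is enough, note that any day from the 22nd to the 31st plus 10 lands in the first days of the following month, even from February. A small check across a year boundary, using made-up dates and a locale-independent "%Y-%m" format instead of '%b %Y':

```r
# Sketch: the "add 10 days from the 22nd onwards" rule on boundary dates.
# The dates below are illustrative test values, not from the question.
dat <- data.frame(Date = as.Date(c("2019-12-21", "2019-12-22", "2020-01-21",
                                   "2020-01-31", "2020-02-22")))
day <- as.integer(format(dat$Date, "%d"))

# Days 22..31 are pushed into the next calendar month before formatting
dat$Fiscal_Month <- format(dat$Date + ifelse(day >= 22, 10, 0), "%Y-%m")
dat
```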
1) yearmon We perform the following steps:
create test data d which shows both a date in the start of period month (i.e. 22nd or later) and a date in the end of period month (i.e. 21st or earlier)
convert the input d to Date class giving dd
subtract 21 days thereby shifting it to the month that starts the fiscal period
convert that to ym of yearmon class (which represents a year and a month without a day directly and internally represents it as the year plus 0 for Jan, 1/12 for Feb, ..., 11/12 for Dec) and then add 1/12 to get to the month at the end of fiscal period.
format it as shown. (We could omit this step, i.e. the last line of code, if the default format that yearmon uses, e.g. Jan 2020, is ok.)
The whole thing could easily be written in a single line of code but we have broken it up for clarity.
library(zoo)
d <- c("1/22/2020", "1/21/2020") # test data
dd <- as.Date(d, "%m/%d/%Y")
ym <- as.yearmon(dd - 21) + 1/12
format(ym, "%b-%y")
## [1] "Feb-20" "Jan-20"
2) Base R This could be done using only base R as follows. We make use of dd from above. cut computes the first of the month that dd-21 lies in (but not as a Date class object) and then as.Date converts it to one. Adding 31 shifts it to the end-of-period month, and formatting this gives the final answer.
format(as.Date(cut(dd - 21, "month")) + 31, "%b-%y")
## [1] "Feb-20" "Jan-20"
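The base R steps can also be wrapped in a small helper. This is a sketch only: fiscal_month and its fmt argument are hypothetical names, and format() is used in place of cut() to get the first of the month.

```r
# Sketch: fiscal month label for dates given as %m/%d/%Y strings.
# fiscal_month() is a hypothetical helper, not part of any package.
fiscal_month <- function(x, fmt = "%b-%Y") {
  d <- as.Date(x, "%m/%d/%Y")
  # Shift back 21 days, then snap to the first of that month
  first_of_start <- as.Date(format(d - 21, "%Y-%m-01"))
  # Adding 31 days always lands in the following month
  format(first_of_start + 31, fmt)
}

# "%Y-%m" avoids locale-dependent month abbreviations
fiscal_month(c("12/22/2019", "1/21/2020", "1/22/2020", "2/21/2020"), "%Y-%m")
```

With the question's ranges, the first two dates should fall in January 2020 and the last two in February 2020.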
I have two data frames I want to join together. I have a series of dates and I want to join a quarter at time t up with its t+1 quarter.
I get a little stuck at quarter 4, joining up with the year + 1. i.e. Q4 of 2006 should be joined with Q1 of 2007.
In my data I have an event which occurs once per year, say on February 15th 2006, and another event which occurs on March 3rd 2006. At the end of March I collect all the events together and then obtain a number between 1 and 5 for each document. I want to track the monthly performance over the next 3 months (i.e. in quarter 2 in this case).
Then I take the events which happened between April and June and track these from July - Sept.
Then take all the events which happened between July and September and track the performance from Oct to Dec.
Take all the events which happened between Oct and Dec and track the performance from Jan t+1 to Mar t+1.
How can this be done?
library(lubridate)
dates_A <- sample(seq(as.Date('2005/01/01'), as.Date('2010/01/01'), by="day"), 1000)
x_var_A <- rnorm(1000)
d_A <- data.frame(dates_A, x_var_A) %>%
mutate(quarter_A = quarter(dates_A),
year_A = year(dates_A))
dates_B <- sample(seq(as.Date('2005/01/01'), as.Date('2010/01/01'), by="day"), 1000)
x_var_B <- rnorm(1000)
d_B <- data.frame(dates_B, x_var_B) %>%
mutate(quarter_B = quarter(dates_B),
year_B = year(dates_B),
quarter_plus_B = quarter(dates_B + months(3)))
One way to accomplish this is to combine your year and quarter and join based on that.
Your code already computes the next quarter, so add a combined year-quarter column to each of your calls to mutate() and then join on that new column.
library(lubridate)
library(tidyverse)
dates_A <- sample(seq(as.Date('2005/01/01'), as.Date('2010/01/01'), by="day"), 1000)
x_var_A <- rnorm(1000)
d_A <- data.frame(dates_A, x_var_A) %>%
mutate(quarter_A = quarter(dates_A),
year_A = year(dates_A),
YearQ = paste(year_A, quarter_A))
dates_B <- sample(seq(as.Date('2005/01/01'), as.Date('2010/01/01'), by="day"), 1000)
x_var_B <- rnorm(1000)
d_B <- data.frame(dates_B, x_var_B) %>%
mutate(quarter_B = quarter(dates_B),
year_B = year(dates_B),
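quarter_plus_B = quarter(dates_B %m+% months(3)),
year_plus_B = year(dates_B %m+% months(3)), # year of the next quarter, so Q4 rolls over to Q1 of the following year
YearQ = paste(year_plus_B, quarter_plus_B))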
final_d <- left_join(d_A, d_B, by = "YearQ")
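An alternative that sidesteps the year rollover entirely is to number the quarters consecutively, so that "next quarter" is simply +1. The following is a base R sketch with made-up data; quarter_index, events and perf are hypothetical names, and which table gets shifted depends on which side holds the earlier quarter.

```r
# Sketch: consecutive quarter index, so Q4 2006 + 1 equals Q1 2007.
# quarter_index(), events and perf are illustrative, not from the question.
quarter_index <- function(d) {
  lt <- as.POSIXlt(d)
  (lt$year + 1900L) * 4L + lt$mon %/% 3L
}

events <- data.frame(event_date = as.Date(c("2006-02-15", "2006-11-20")),
                     score = c(3, 5))
perf <- data.frame(perf_date = as.Date(c("2006-05-10", "2007-01-31", "2007-02-28")),
                   ret = c(0.1, -0.2, 0.3))

# Events in quarter t are matched with performance rows in quarter t + 1
events$join_q <- quarter_index(events$event_date) + 1L
perf$join_q   <- quarter_index(perf$perf_date)

joined <- merge(events, perf, by = "join_q")
joined
```

Here the November 2006 event is matched with the two performance rows from the first quarter of 2007.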
I have an R time series data frame, consisting of multiple variables for each day for about 19 years of data. I would like to compute the mean of only the months which have more than 10 days of values. So, if a month (e.g. Jan for the entire period 1996-2015) has fewer than 10 days of values, I would like to exclude it from the calculation of the monthly mean for the whole time period.
The data frame is as follows:
date val1,val2,val3,val4,val5
1 1996-01-01 5.25,4.20,3.58,6.44,2.66
2 1996-01-02 10.11,9.22,14.25,12.11,13.22
3 1996-01-03 25.11,30.44,45.22,31.24,27.35
..
..
..
7305 2015-12-31 30.54,55.14,63.12,51.22,45.21
Any ideas?
You can first get the number of observations per month with aggregate and then restrict your dataset to those which have at least minDays observations using merge.
x <- read.table(sep=c(","), head=T, as.is = TRUE, text=
"date,val1,val2,val3,val4,val5
1996-01-01,5.25,4.20,3.58,6.44,2.66
1996-01-02,10.11,9.22,14.25,12.11,13.22
1996-01-03,25.11,30.44,45.22,31.24,27.35")
minDays <- 10
x$ym <- substr(x$date,1,nchar(x$date)-3) #get year month out of date
tt <- aggregate(val1 ~ ym, data=x, FUN=length) #Get number of observations per month
aggregate(val1 ~ ym, data=merge(x, tt[tt$val1>=minDays, "ym", drop=FALSE]), FUN=mean) #Calculate mean when n observations are >= minDays
Or using ave:
x <- read.table(sep=c(","), head=T, as.is = TRUE, text=
"date,val1,val2,val3,val4,val5
1996-01-01,5.25,4.20,3.58,6.44,2.66
1996-01-02,10.11,9.22,14.25,12.11,13.22
1996-01-03,25.11,30.44,45.22,31.24,27.35")
minDays <- 10
x$ym <- substr(x$date,1,nchar(x$date)-3) #get year month out of date
x$n <- with(x, ave(val1, ym, FUN=length))
aggregate(val1 ~ ym, data=x[x$n>=minDays,], FUN=mean)
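The three sample rows above are all below the threshold, so the result there is empty. A small sketch with synthetic data (12 days in one month, 3 in the next; the values are made up) shows the filter keeping one month and dropping the other:

```r
# Sketch: only year-months with at least minDays observations get a mean.
# The data are synthetic and for illustration only.
dates <- c(seq(as.Date("1996-01-01"), by = "day", length.out = 12),
           seq(as.Date("1996-02-01"), by = "day", length.out = 3))
x <- data.frame(date = dates, val1 = seq_along(dates))
minDays <- 10

x$ym <- format(x$date, "%Y-%m")              # year-month label
x$n  <- ave(x$val1, x$ym, FUN = length)      # observations per year-month
res  <- aggregate(val1 ~ ym, data = x[x$n >= minDays, ], FUN = mean)
res
```

Only 1996-01 survives, with a mean of 6.5 over its 12 values.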
I would like to calculate the time difference considering ONLY the days and months.
For example:
> as.Date("2018-12-15")-as.Date("2018-12-16")
Time difference of -1 days
> as.Date("2008-12-15")-as.Date("2018-12-16")
Time difference of -3653 days
I want both of them to return -1.
Edit:
Leap years should not be considered, as we just want an approximation and the return value does not need to be exact.
As suggested by @Omry Atia, we can set the year component of both dates to the same year and then calculate the difference.
library(lubridate)
get_difference_without_years <- function(x, y) {
x <- ymd(x)
year(x) <- 2018
y <- ymd(y)
year(y) <- 2018
x - y
}
get_difference_without_years("2018-12-15", "2018-12-16")
#Time difference of -1 days
get_difference_without_years("2008-12-15", "2018-12-16")
#Time difference of -1 days
To keep it in base R
get_difference_without_years <- function(x, y) {
x <- as.Date(paste0("2018-", format(as.Date(x), "%m-%d")))
y <- as.Date(paste0("2018-", format(as.Date(y), "%m-%d")))
x - y
}
get_difference_without_years("2008-12-15", "2018-12-16")
#Time difference of -1 days
get_difference_without_years("2018-12-15", "2018-12-16")
#Time difference of -1 days
The question is not well defined for the case that the dates straddle the end of Feb and one year is a leap year and one is not but ignoring this we can replace the year in each date with a leap year if either is a leap year (year 2000) and a non-leap year (year 1999) otherwise and then subtract:
library(lubridate)
d1 <- "2008-12-15"
d2 <- "2018-12-16"
yr <- 1999 + (leap_year(as.Date(d1)) || leap_year(as.Date(d2)))
as.Date(sub("....", yr, d1)) - as.Date(sub("....", yr, d2))
## Time difference of -1 days
ADDED
In a comment the poster indicated that we can ignore the problems introduced by leap years. In that case we can just pick a leap year as the date to substitute in so that it always returns an answer. We do that below. We no longer need lubridate to check whether the dates are leap years or not.
as.Date(sub("....", 2000, d1)) - as.Date(sub("....", 2000, d2))
## Time difference of -1 days
(Alternately we could pick a year that is not a leap year and since most years are not leap years that would more likely not be one day off for straddled dates; however, it would be at the cost of failing if one of the dates is Feb 29th.)
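The effect of the reference year on dates that straddle the end of February can be seen in a short sketch. md_diff and ref_year are hypothetical names introduced for illustration, and the inputs are assumed to be "YYYY-MM-DD" strings.

```r
# Sketch: month-day difference after substituting a common reference year.
# md_diff() is an illustrative helper, not from the answer above.
md_diff <- function(d1, d2, ref_year) {
  swap_year <- function(d) {
    as.Date(paste0(ref_year, substr(as.character(d), 5, 10)), format = "%Y-%m-%d")
  }
  as.numeric(swap_year(d1) - swap_year(d2))
}

md_diff("2008-12-15", "2018-12-16", 2000)  # -1
md_diff("2008-03-01", "2018-02-28", 2000)  # 2: the leap reference year counts Feb 29
md_diff("2008-03-01", "2018-02-28", 1999)  # 1: the non-leap reference year does not
```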
If we're allowed to be a bit more approximate, ignoring leap-years, we can simplify things a bit by using %j (day of year) in format().
yd_diff <- function(x, y=NULL) {
x <- as.integer(format(x, "%j"))
if (is.null(y)) {
diff(x)
} else {
x - as.integer(format(y, "%j"))
}
}
d1 <- as.Date("2008-12-15")
d2 <- as.Date("2018-12-16")
yd_diff(d1, d2)
# 0
set.seed(1)
rd <- as.Date(sample(1:10000, 5), origin="1970-01-01")
yd_diff(rd)
# -30 180 65 -123
And even simpler, we can convert the date to integer and take the modulo days in a year. Graciously, R lets you use modulo with non-integers.
(as.integer(d1) %% 365.24) - (as.integer(d2) %% 365.24)
# -0.6
diff(as.integer(rd) %% 365.24)
# -30.72 180.80 64.84 -123.44
Another solution might be to extract only the day-of-year from each date, and then do the maths op, especially if leap-years are important.
For example, the DoY for the following:
DayOfYear(2020, 12, 15) = 350 # leap year
DayOfYear(2018, 12, 15) = 349
DayOfYear(2016, 12, 15) = 350 # leap year
DayOfYear(2011, 12, 16) = 350
You can find lots of suggestions on how to get the DoY in "Extract day number of year from dates" and "How do you convert POSIX date to day of year in R?".
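DayOfYear above is notation rather than an existing R function. One way to get the same numbers in base R is format() with "%j"; day_of_year below is a hypothetical helper written for this sketch.

```r
# Sketch: day of year via format(x, "%j"); day_of_year() is illustrative.
day_of_year <- function(y, m, d) {
  as.integer(format(as.Date(sprintf("%04d-%02d-%02d", y, m, d)), "%j"))
}

day_of_year(2020, 12, 15)  # 350 (leap year)
day_of_year(2018, 12, 15)  # 349
day_of_year(2011, 12, 16)  # 350

# Difference in day-of-year between two dates from different years
day_of_year(2018, 12, 15) - day_of_year(2011, 12, 16)  # -1
```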
I would like a function that counts the number of occurrences of a specific weekday per month,
e.g. Nov '13 -> 5 Fridays, while Dec '13 would return 4 Fridays.
Is there an elegant function that would return this?
library(lubridate)
num_days <- function(date){
x <- as.Date(date)
start = floor_date(x, "month")
count = days_in_month(x)
d = wday(start)
sol = ifelse(d > 4, 5, 4) # rough estimate: if the month starts on Thursday or later (wday() counts Sunday as 1), assume 5 Fridays
sol
}
num_days("2013-08-01")
num_days(today())
What would be a better way to do this?
1) Here d is the input, a Date class object, e.g. d <- Sys.Date(). The result gives the number of Fridays in the year/month that contains d. Replace 5 with 1 to get the number of Mondays:
first <- as.Date(cut(d, "month"))
last <- as.Date(cut(first + 31, "month")) - 1
sum(format(seq(first, last, "day"), "%w") == 5)
2) Alternately replace the last line with the following line. Here, the first term is the number of Fridays from the Epoch to the next Friday on or after the first of the next month and the second term is the number of Fridays from the Epoch to the next Friday on or after the first of d's month. Again, we replace all 5's with 1's to get the count of Mondays.
ceiling(as.numeric(last + 1 - 5 + 4) / 7) - ceiling(as.numeric(first - 5 + 4) / 7)
The second solution is slightly longer (although it has the same number of lines) but it has the advantage of being vectorized, i.e. d could be a vector of dates.
UPDATE: Added second solution.
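As a sketch of how the vectorized variant might be packaged, the function below takes a vector of dates and a weekday number in the "%w" convention (0 = Sunday, ..., 5 = Friday). count_weekday and wday are names made up here, and format() replaces cut() for finding the first of the month.

```r
# Sketch: vectorized count of a given weekday in each date's month.
# count_weekday() is a hypothetical wrapper around the ceiling() formula.
count_weekday <- function(d, wday = 5) {
  first      <- as.Date(format(d, "%Y-%m-01"))           # first of d's month
  next_first <- as.Date(format(first + 31, "%Y-%m-01"))  # first of next month
  ceiling((as.numeric(next_first) - wday + 4) / 7) -
    ceiling((as.numeric(first) - wday + 4) / 7)
}

d <- as.Date(c("2013-11-15", "2013-12-15"))
count_weekday(d)            # Fridays: 5 in Nov 2013, 4 in Dec 2013
count_weekday(d, wday = 1)  # Mondays: 4 in Nov 2013, 5 in Dec 2013
```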
There are a number of ways to do it. Here is one:
countFridays <- function(y, m) {
fr <- as.Date(paste(y, m, "01", sep="-"))
to <- fr + 31
dt <- seq(fr, to, by="1 day")
df <- data.frame(date=dt, mon=as.POSIXlt(dt)$mon, wday=as.POSIXlt(dt)$wday)
df <- subset(df, df$wday==5 & df$mon==df[1,"mon"])
return(nrow(df))
}
It creates the first of the month, and a day in the next month.
It then creates a data frame of month index (on a 0 to 11 range, but we only use this for comparison) and weekday.
We then subset to a) be in the same month and b) on a Friday. That is your result set, and
we return the number of rows as your answer.
Note that this only uses base R code.
Without using lubridate -
#arguments to pass to function:
whichweekday <- 5
whichmonth <- 11
whichyear <- 2013
#function code:
firstday <- as.Date(paste('01',whichmonth,whichyear,sep="-"),'%d-%m-%Y')
lastday <- seq(firstday, length.out = 2, by = "1 month")[2] - 1 # last day of the month; seq() rolls December over into the next year
sum(
strftime(
seq.Date(
from = firstday,
to = lastday,
by = "day"),
'%w'
) == whichweekday)