I want to be able to create a water year column for a time series. The US water year runs from October through September and is named for the year in which it ends. For example, the 2014 water year runs from October 1, 2013 to September 30, 2014.
This is the US water year, but not the only one, so I want to enter a start month and have the water year calculated for each date.
For example, if my data looks like:
date
2008-01-01 00:00:00
2008-02-01 00:00:00
2008-03-01 00:00:00
2008-04-01 00:00:00
.
.
.
2008-12-01 00:00:00
I want my function to work something like this:
wtr_yr <- function(data, start_month) {
  # does stuff
}
Then my output would be
wtr_yr(data, 2)
date wtr_yr
2008-01-01 00:00:00 2008
2008-02-01 00:00:00 2009
2008-03-01 00:00:00 2009
2008-04-01 00:00:00 2009
.
.
.
2009-01-01 00:00:00 2009
2009-02-01 00:00:00 2010
2009-03-01 00:00:00 2010
2009-04-01 00:00:00 2010
I started by breaking the date up into separate columns, but I don't think that is the best way to go about it. Any advice?
Thanks in advance!
We can use POSIXlt to come up with an answer.
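wtr_yr <- function(dates, start_month = 10) {
  # Convert dates into POSIXlt
  dates.posix = as.POSIXlt(dates)
  # Year offset: POSIXlt months are 0-based (0 = January), hence start_month - 1.
  # A start month of 1 means the water year is the calendar year (no offset).
  offset = ifelse(start_month > 1 & dates.posix$mon >= start_month - 1, 1, 0)
  # Water year (POSIXlt years are counted from 1900)
  adj.year = dates.posix$year + 1900 + offset
  # Return the water year
  adj.year
}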
Let's now use this function in an example.
# Sample input vector
dates = c("2008-01-01 00:00:00",
"2008-02-01 00:00:00",
"2008-03-01 00:00:00",
"2008-04-01 00:00:00",
"2009-01-01 00:00:00",
"2009-02-01 00:00:00",
"2009-03-01 00:00:00",
"2009-04-01 00:00:00")
# Display the function output
wtr_yr(dates, 2)
# Combine the input and output vectors in a dataframe
df = data.frame(dates, wtr_yr=wtr_yr(dates, 2))
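The same rule can also be written without POSIXlt, using format() to pull out the year and month. This is only a sketch: `water_year()` is a hypothetical name, and the default start month of 10 assumes the US convention (the water year is named for the year in which it ends).

```r
# Hypothetical helper: label each date with the water year in which it ends.
# start_month is the first month of the water year (10 = October, US convention).
water_year <- function(dates, start_month = 10) {
  d <- as.Date(dates)
  yr <- as.integer(format(d, "%Y"))
  mo <- as.integer(format(d, "%m"))
  # Dates on or after the start month belong to the next year's water year;
  # a start month of 1 makes the water year equal to the calendar year.
  yr + as.integer(start_month > 1 & mo >= start_month)
}

# US water year 2014 runs from 2013-10-01 to 2014-09-30
water_year(c("2013-09-30", "2013-10-01", "2014-09-30"))
```

With start_month = 2 it reproduces the output asked for in the question (e.g. 2008-01-01 gives 2008 and 2008-02-01 gives 2009).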
I had a similar problem a while back, but with fiscal years that start in October. I found this function, which also computes the quarters within the year. In my case I only wanted the fiscal year, so I edited a small part of the function to output just that. There is surely a cleaner or more efficient way to do it, but this should work for smaller data sets. Here is the edited function:
getYearQuarter <- function(x,
                           firstMonth = 7,
                           fy.prefix = 'FY',
                           quarter.prefix = 'Q',
                           sep = '-',
                           level.range = c(min(x), max(x))) {
  if (level.range[1] > min(x) | level.range[2] < max(x)) {
    warning(paste0('The range of x is greater than level.range. Values ',
                   'outside level.range will be returned as NA.'))
  }
  quarterString <- function(d) {
    year <- as.integer(format(d, format = '%Y'))
    month <- as.integer(format(d, format = '%m'))
    # Months on or after firstMonth belong to the next fiscal year
    y <- ifelse(firstMonth > 1 & month >= firstMonth, year + 1, year)
    # Quarter within the fiscal year (computed but unused in this edited version)
    q <- cut((month - firstMonth) %% 12, breaks = c(-Inf, 2, 5, 8, Inf),
             labels = paste0(quarter.prefix, 1:4))
    # Edited: return only the fiscal year label, e.g. "FY14"
    return(paste0(fy.prefix, substring(y, 3, 4)))
  }
  vals <- quarterString(x)
  levels <- unique(quarterString(seq(
    as.Date(format(level.range[1], '%Y-%m-01')),
    as.Date(format(level.range[2], '%Y-%m-28')), by = 'month')))
  return(factor(vals, levels = levels, ordered = TRUE))
}
Your input vector should be of class Date; then specify the start month. Assuming you have a data frame (df) with a 'date' column as in your question, this should do the trick (the result is an ordered factor of labels such as "FY14"):
df$wtr_yr <- getYearQuarter(df$date, firstMonth = 10)
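The edited function computes the fiscal quarter q but then discards it. A minimal sketch of keeping both parts, assuming labels of the form "FY14-Q1" joined by sep; `fiscal_label()` is a hypothetical name and not part of the original function:

```r
# Hypothetical helper: fiscal-year and quarter label for each Date, e.g. "FY14-Q1".
fiscal_label <- function(x, firstMonth = 10, fy.prefix = "FY",
                         quarter.prefix = "Q", sep = "-") {
  year <- as.integer(format(x, "%Y"))
  month <- as.integer(format(x, "%m"))
  # Same fiscal-year rule as quarterString()
  y <- ifelse(firstMonth > 1 & month >= firstMonth, year + 1, year)
  # Months elapsed since the fiscal year started (0-11), grouped into quarters
  q <- cut((month - firstMonth) %% 12, breaks = c(-Inf, 2, 5, 8, Inf),
           labels = paste0(quarter.prefix, 1:4))
  paste0(fy.prefix, substring(y, 3, 4), sep, q)
}

fiscal_label(as.Date(c("2013-10-15", "2014-01-05", "2014-09-30")))
```

Here a fiscal year starting in October puts October to December in Q1 and July to September in Q4.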
You can also add a water-year column using the water_year() function in the "lfstat" package:
https://www.rdocumentation.org/packages/lfstat/versions/0.9.4/topics/water_year
Related
I would like to add (mutate) a fiscal month column to a dataset in R. In my company the fiscal month ends on the 21st. For example:
12/22/2019 to 1/21/2020 will be Jan-2020
1/22/2020 to 2/21/2020 will be Feb-2020
2/22/2020 to 3/21/2020 will be Mar-2020
etc
How would I accomplish this in R? The Date column in my data is in %m/%d/%Y format (e.g. 1/22/2020).
You could extract the day of the month and, if it is 22 or greater, add 10 days to the date, then format the result as month-year:
transform(dat, Fiscal_Month = format(Date +
ifelse(as.integer(format(Date, '%d')) >= 22, 10, 0), '%b %Y'))
# Date Fiscal_Month
#1 2020-01-20 Jan 2020
#2 2020-01-21 Jan 2020
#3 2020-01-22 Feb 2020
#4 2020-01-23 Feb 2020
#5 2020-01-24 Feb 2020
This can also be done without ifelse:
transform(dat, Fiscal_Month = format(Date + c(0, 10)
[(as.integer(format(Date, '%d')) >= 22) + 1], '%b %Y'))
Sample data used:
dat <- data.frame(Date = seq(as.Date('2020-01-20'), by = '1 day',length.out = 5))
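Since the question's Date column is stored as text in %m/%d/%Y format, it has to be converted with as.Date() first. A minimal sketch wrapping the add-10-days idea; `fiscal_month()` is a hypothetical helper, and it assumes fiscal months run from the 22nd through the 21st of the next month:

```r
# Hypothetical helper: label each date with the month its fiscal period ends in,
# assuming fiscal months run from the 22nd through the 21st of the next month.
fiscal_month <- function(x, fmt = "%b-%Y") {
  d <- as.Date(x, "%m/%d/%Y")
  day <- as.integer(format(d, "%d"))
  # From the 22nd onward, adding 10 days always lands in the next calendar month
  format(d + ifelse(day >= 22, 10, 0), fmt)
}

fiscal_month(c("12/22/2019", "1/21/2020", "1/22/2020", "2/22/2020"))
```

Note that %b gives month abbreviations in the current locale; pass fmt = "%Y-%m" for a locale-independent label.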
1) yearmon. We perform the following steps:
Create test data d containing both a date in the month that starts a fiscal period (the 22nd or later) and a date in the month that ends one (the 21st or earlier).
Convert the input d to Date class, giving dd.
Subtract 21 days, shifting each date into the month that starts its fiscal period.
Convert that to ym of class yearmon (which represents a year and a month without a day; internally it is the year plus 0 for Jan, 1/12 for Feb, ..., 11/12 for Dec) and then add 1/12 to get the month that ends the fiscal period.
Format it as shown. (We could omit this step, i.e. the last line of code, if the default yearmon format, e.g. Jan 2020, is acceptable.)
The whole thing could easily be written in a single line of code, but we have broken it up for clarity.
library(zoo)
d <- c("1/22/2020", "1/21/2020") # test data
dd <- as.Date(d, "%m/%d/%Y")
ym <- as.yearmon(dd - 21) + 1/12
format(ym, "%b-%y")
## [1] "Feb-20" "Jan-20"
2) Base R. This can also be done in base R alone, as follows. We reuse dd from above. cut() computes the first of the month that dd - 21 lies in (returned as a factor rather than a Date), and as.Date() converts it to a Date. Adding 31 days shifts it into the month that ends the fiscal period, and formatting that gives the final answer.
format(as.Date(cut(dd - 21, "month")) + 31, "%b-%y")
## [1] "Feb-20" "Jan-20"
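Another base-R option, offered only as a sketch, is to count months as an integer index (year * 12 + month - 1) and add one whenever the day is the 22nd or later. month.abb is R's built-in vector of English month abbreviations, so the labels do not depend on the locale:

```r
# Test dates in the question's %m/%d/%Y format
dd <- as.Date(c("1/22/2020", "1/21/2020", "12/22/2019"), "%m/%d/%Y")

year <- as.integer(format(dd, "%Y"))
month <- as.integer(format(dd, "%m"))
day <- as.integer(format(dd, "%d"))

# Months since year 0, bumped by one from the 22nd onward (month the fiscal period ends in)
idx <- year * 12 + (month - 1) + (day >= 22)
fy_year <- idx %/% 12
fy_month <- idx %% 12 + 1

paste0(month.abb[fy_month], "-", substring(fy_year, 3, 4))
```

The integer index handles the December-to-January rollover without any date arithmetic.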
Is there a way to transform dates like "2016-01-8" into "20160101q", meaning the first half of January 2016, or "20160127" into "20160102q", meaning the second half of January 2016? Thank you in advance.
Here is a solution using the data.table and lubridate packages.
It uses lubridate::days_in_month() to determine the number of days in the month of each date. This is necessary because months differ in length: February normally has 28 days, so day 15 of February falls in the second half (02q), whereas January has 31 days, so day 15 of January falls in the first half (01q).
The logic for calculating the q period is:
If day_number / number_of_days_in_month > 0.5, the q period is 02q;
otherwise it is 01q.
A paste0() call then creates the text for the q_date column, and sprintf() adds a leading zero to single-digit month numbers.
library(data.table)
library(lubridate)
#sample data
data <- data.table( date = as.Date( c("2019-12-30", "2020-01-15", "2020-02-15", "2020-02-14") ) )
# date
# 1: 2019-12-30
# 2: 2020-01-15
# 3: 2020-02-15
# 4: 2020-02-14
#if the day / #days of month > 0.5, date is in q2, else q1
data[ lubridate::mday(date) / lubridate::days_in_month(date) > 0.5,
q_date := paste0( lubridate::year(date), sprintf( "%02d", lubridate::month(date) ), "02q" ) ]
data[ is.na( q_date ),
q_date := paste0( lubridate::year(date), sprintf( "%02d", lubridate::month(date) ), "01q" ) ]
# date q_date
# 1: 2019-12-30 20191202q
# 2: 2020-01-15 20200101q
# 3: 2020-02-15 20200202q
# 4: 2020-02-14 20200201q
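The same rule (day / days in month > 0.5 means second half) can be sketched in base R without data.table or lubridate; `half_month_code()` is a hypothetical name, and the month length is derived from the first day of the following month:

```r
# Hypothetical helper: "YYYYMM01q" for the first half of the month, "YYYYMM02q" for the second.
half_month_code <- function(x) {
  d <- as.Date(x)
  first <- as.Date(format(d, "%Y-%m-01"))
  # First day of the next month: jump 32 days ahead, then snap back to day 1
  next_first <- as.Date(format(first + 32, "%Y-%m-01"))
  n_days <- as.integer(next_first - first)
  day <- as.integer(format(d, "%d"))
  half <- ifelse(day / n_days > 0.5, "02q", "01q")
  paste0(format(d, "%Y%m"), half)
}

half_month_code(c("2019-12-30", "2020-01-15", "2020-02-15", "2020-02-14"))
```

Because 2020 is a leap year, February has 29 days here, so the 15th (15/29 > 0.5) still lands in the second half.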
You can try mutate() and paste0(): first decompose the date into day, month and year, then create a variable that says whether the date is in the first or second half of the month, and finally paste together the year, the zero-padded month and that variable ("01q" or "02q").
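library(tidyverse)
library(lubridate)
date <- c("2016-01-8",
          "2016-01-27")
id <- c(1, 2)
x <- data.frame(id, date, stringsAsFactors = FALSE)
x <- x %>%
  mutate(date = ymd(date),
         year = year(date),
         month = month(date),
         day = day(date))
# day 16 onward is the second half of the month
x$half <- "01q"
x$half[x$day > 15] <- "02q"
# sprintf() zero-pads the month so January gives "01", not "1"
paste0(x$year, sprintf("%02d", x$month), x$half)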
How can a date/time object in R be transformed into a fractional Julian day?
For example, how can I turn this date:
date <- as.POSIXct('2006-12-12 12:00:00',tz='GMT')
into a number like this
> fjday
[1] 345.5
where the Julian day is the number of days elapsed since January 1st. The fraction 0.5 means it is 12:00 (noon), i.e. half of the day has passed.
This is just an example; my real data covers all 365 days of 2006.
Since all your dates are from the same year (2006) this should be pretty easy:
julian(date, origin = as.POSIXct('2006-01-01', tz = 'GMT'))
If you or another reader happen to expand your dataset to other years, then you can set the origin for the beginning of each year as follows:
sapply(date, function(x) julian(x, origin = as.POSIXct(paste0(format(x, "%Y"),'-01-01'), tz = 'GMT')))
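A vectorized alternative for several years, sketched under the assumption that the times should be interpreted in GMT: POSIXlt already stores the 0-based day of the year (yday) and the clock time, so the fraction can be computed directly. `frac_yday()` is a hypothetical name:

```r
# Hypothetical helper: days elapsed since 1 January of each date's own year.
frac_yday <- function(x, tz = "GMT") {
  lt <- as.POSIXlt(x, tz = tz)
  # yday is 0 for 1 January; add the elapsed fraction of the current day
  lt$yday + (lt$hour + lt$min / 60 + lt$sec / 3600) / 24
}

frac_yday(as.POSIXct(c("2006-12-12 12:00:00", "2007-01-01 06:00:00"), tz = "GMT"))
```

Each element uses its own year's 1 January as the origin, with no sapply() loop.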
Have a look at the difftime function:
> unclass(difftime('2006-12-12 12:00:00', '2006-01-01 00:00:00', tz="GMT", units = "days"))
[1] 345.5
attr(,"units")
[1] "days"
A function to convert a POSIXct value to a Julian day, extending the answer above; define it before use. It handles one value at a time and counts 1 January as day 1.
julian_conv <- function(x) {
  if (is.na(x)) { # An NA date would break the origin constructed below
    return(NA)
  }
  else {
    j <- julian(x, origin = as.POSIXlt(paste0(format(x, "%Y"), '-01-01')))
    temp <- unclass(j) # Drop the difftime class to get a plain number of days
    return(temp[1] + 1) # Count 1 January as day 1 (e.g. 2016-01-01 -> 1)
  }
}
Example:
date <- as.POSIXct('2006-12-12 12:00:00')
julian_conv(date)
#[1] 346.5
I would like to count the number of Sundays, Mondays, Tuesdays, ..., Saturdays in 2001, treating the public holidays 1 Jan, 5 April, 13 April, 25 Dec and 26 Dec as Sundays. How can I do this in R? Thanks
Here is a version using weekdays(). Note that weekdays() returns day names in the current locale; this answer was originally run in a Lithuanian locale (so Sunday was "sekmadienis"). In an English locale it looks like this:
dates <- as.Date("2001-01-01") + 0:364
wd <- weekdays(dates)
idx <- which(dates %in% as.Date(c("2001-01-01", "2001-04-05",
                                  "2001-04-13", "2001-12-25", "2001-12-26")))
wd[idx] <- "Sunday"
table(wd)
wd
   Friday    Monday  Saturday    Sunday  Thursday   Tuesday Wednesday
       51        52        52        57        51        51        51
Try the following:
# get all the dates you need
dates <- seq(from = as.Date("2001-01-01"), to = as.Date("2001-12-31"), by = "day")
# the public holidays
pub <- as.Date(c("2001-01-01", "2001-04-05", "2001-04-13",
                 "2001-12-25", "2001-12-26"))
# get rid of the public holidays
dates <- dates[!dates %in% pub]
# day of the week from POSIXlt: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
wday <- as.POSIXlt(dates)$wday
# Now, count the number of Mondays, for example:
sum(wday == 1)
For details, see the documentation for DateTimeClasses. Remember to add 5 to your count of Sundays.
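To get all seven counts at once without depending on the locale's day names, one option (a sketch) is to recode the holidays to Sunday using POSIXlt's numeric wday and tabulate with fixed labels:

```r
dates <- seq(as.Date("2001-01-01"), as.Date("2001-12-31"), by = "day")
holidays <- as.Date(c("2001-01-01", "2001-04-05", "2001-04-13",
                      "2001-12-25", "2001-12-26"))

# 0 = Sunday, 1 = Monday, ..., 6 = Saturday
wday <- as.POSIXlt(dates)$wday
# Treat public holidays as Sundays
wday[dates %in% holidays] <- 0L

counts <- table(factor(wday, levels = 0:6,
                       labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")))
counts
```

Fixing the factor levels also keeps the days in week order instead of alphabetical order.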
I have a dataset filled with the average windspeed per hour for multiple years. I would like to create an 'average year', in which for each hour the average windspeed for that hour over multiple years is calculated. How can I do this without looping endlessly through the dataset?
Ideally, I would like to just loop through the data once, extracting for each row the right month, day, and hour, and adding the windspeed from that row to the right row in a dataframe where the aggregates for each month, day, and hour are gathered. Is it possible to do this without extracting the month, day, and hour, and then looping over the complete average-year data.frame to find the right row?
Some example data:
data.multipleyears <- data.frame(
DATETIME = c("2001-01-01 01:00:00", "2001-05-03 09:00:00", "2007-01-01 01:00:00", "2008-02-29 12:00:00"),
Windspeed = c(10, 5, 8, 3)
)
Which I would like to aggregate in a dataframe like this:
average.year <- data.frame(
  DATETIME = c("01-01 00:00:00", "01-01 01:00:00", ..., "12-31 23:00:00"),
  Aggregate.Windspeed = c(100, 80, ...)
)
From there, I can go on calculating the averages, etc. I have probably overlooked some command, but what would be the right syntax for something like this (in pseudocode):
for (i in 1:nrow(data.multipleyears)) {
average.year$Aggregate.Windspeed[
where average.year$DATETIME(month, day, hour) == data.multipleyears$DATETIME[i](month, day, hour)] <- average.year$Aggregate.Windspeed + data.multipleyears$Windspeed[i]
}
Or something like that. Help is appreciated!
I predict that ddply() from the plyr package is going to be your best friend :). I created a 30-year dataset with random hourly windspeeds between 1 and 10 m/s:
begin_date = as.POSIXlt("1990-01-01", tz = "GMT")
# 30 year dataset
dat = data.frame(dt = begin_date + (0:(24*30*365)) * (3600))
dat = within(dat, {
speed = runif(length(dt), 1, 10)
unique_day = strftime(dt, "%d-%m")
})
> head(dat)
dt unique_day speed
1 1990-01-01 00:00:00 01-01 7.054124
2 1990-01-01 01:00:00 01-01 2.202591
3 1990-01-01 02:00:00 01-01 4.111633
4 1990-01-01 03:00:00 01-01 2.687808
5 1990-01-01 04:00:00 01-01 8.643168
6 1990-01-01 05:00:00 01-01 5.499421
To calculate the daily normals (30-year averages; the term is widely used in meteorology) over this 30-year period:
library(plyr)
res = ddply(dat, .(unique_day),
summarise, mean_speed = mean(speed), .progress = "text")
> head(res)
unique_day mean_speed
1 01-01 5.314061
2 01-02 5.677753
3 01-03 5.395054
4 01-04 5.236488
5 01-05 5.436896
6 01-06 5.544966
This takes only a few seconds on my modest dual-core AMD machine, so I suspect a hand-written single pass through the data is not needed. Further ddply() calls can compute other aggregations (month, season, etc.) separately.
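The question asks for month-day-hour slots rather than whole days. A base-R sketch of that with aggregate(), using the question's sample data; the key format "%m-%d %H:00" and the GMT time zone are assumptions:

```r
data.multipleyears <- data.frame(
  DATETIME = c("2001-01-01 01:00:00", "2001-05-03 09:00:00",
               "2007-01-01 01:00:00", "2008-02-29 12:00:00"),
  Windspeed = c(10, 5, 8, 3)
)

dt <- as.POSIXct(data.multipleyears$DATETIME, tz = "GMT")
# One key per calendar hour, ignoring the year
data.multipleyears$key <- format(dt, "%m-%d %H:00")

# Mean windspeed for each month-day-hour slot across all years
hourly <- aggregate(Windspeed ~ key, data = data.multipleyears, FUN = mean)
hourly
```

Swapping FUN = mean for FUN = sum gives the aggregate totals described in the question.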
You can use substr to extract the part of the date you want, and then use tapply or ddply to aggregate the data.
tapply(
data.multipleyears$Windspeed,
substr( data.multipleyears$DATETIME, 6, 19),
mean
)
# 01-01 01:00:00 02-29 12:00:00 05-03 09:00:00
# 9 3 5
library(plyr)
ddply(
data.multipleyears,
.(when=substr(DATETIME, 6, 19)),
summarize,
Windspeed=mean(Windspeed)
)
# when Windspeed
# 1 01-01 01:00:00 9
# 2 02-29 12:00:00 3
# 3 05-03 09:00:00 5
This is a fairly old post, but I wanted to add that timeAverage() in the openair package can probably also be used; its manual describes more options for that function.