Adding a column of corresponding seasons to dataframe - r

Here is an example of my dataframe. I am working in R.
date name count
2016-11-12 Joe 5
2016-11-15 Bob 5
2016-06-15 Nick 12
2016-10-16 Cate 6
I would like to add a column to my data frame that will tell me the season that corresponds to the date. I would like it to look like this:
date name count Season
2016-11-12 Joe 5 Winter
2016-11-15 Bob 5 Winter
2017-06-15 Nick 12 Summer
2017-10-16 Cate 6 Fall
I have started some code:
startWinter <- c(month.name[1], month.name[12], month.name[11])
startSummer <- c(month.name[5], month.name[6], month.name[7])
startSpring <- c(month.name[2], month.name[3], month.name[4])
# create a function to find the correct season based on the month
MonthSeason <- function(Month) {
# !is.na()
# ignores values with NA
# match()
# returns a vector of the positions of matches
# If the starting month matches a spring season, print "Spring". If the starting month matches a summer season, print "Summer" etc.
ifelse(!is.na(match(Month, startSpring)),
return("spring"),
return(ifelse(!is.na(match(Month, startWinter)),
"winter",
ifelse(!is.na(match(Month, startSummer)),
"summer","fall"))))
}
This code gives me the season for a month. Im not sure if I am going about this problem in the right way. Can anyone help me out?
Thanks!

There are a couple of hacks, and their usability depends on whether you want to use meteorological or astronomical seasons. I'll offer both, I think they offer sufficient flexibility.
I'm going to use your second data provided, since it provides more than just "Winter".
txt <- "date name count
2016-11-12 Joe 5
2016-11-15 Bob 5
2017-06-15 Nick 12
2017-10-16 Cate 6"
dat <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
dat$date <- as.Date(dat$date)
The quickest method works well when seasons are defined strictly by month.
metseasons <- c(
"01" = "Winter", "02" = "Winter",
"03" = "Spring", "04" = "Spring", "05" = "Spring",
"06" = "Summer", "07" = "Summer", "08" = "Summer",
"09" = "Fall", "10" = "Fall", "11" = "Fall",
"12" = "Winter"
)
metseasons[format(dat$date, "%m")]
# 11 11 06 10
# "Fall" "Fall" "Summer" "Fall"
If you choose to use date ranges for your seasons that are not defined by month start/stop such as the astronomical seasons, here's another 'hack':
astroseasons <- as.integer(c("0000", "0320", "0620", "0922", "1221", "1232"))
astroseasons_labels <- c("Winter", "Spring", "Summer", "Fall", "Winter")
If you use proper Date or POSIX types, then you are including years, which makes things a little less-generic. One might think of using julian dates, but during leap years this produces anomalies. So, with the assumption that Feb 28 is never a seasonal boundary, I'm "numericizing" the month-day. Even though R does do character-comparisons just fine, cut expects numbers, so we convert them to integers.
Two safe-guards: because cut is either right-open (and left-closed) or right-closed (and left-open), then our two book-ends need to extend beyond the legal dates, ergo "0000" and "1232". There are other techniques that could work equally well here (e.g., using -Inf and Inf, post-integerization).
astroseasons_labels[ cut(as.integer(format(dat$date, "%m%d")), astroseasons, labels = FALSE) ]
# [1] "Fall" "Fall" "Spring" "Fall"
Notice that the third date is in Spring when using astronomical seasons and Summer otherwise.
This solution can easily be adjusted to account for the Southern hemisphere or other seasonal preferences/beliefs.
Edit: motivated by #Kristofersen's answer (thanks), I looked into benchmarks. lubridate::month uses a POSIXct-to-POSIXlt conversion to extract the month, which can be over 10x faster than my format(x, "%m") method. As such:
metseasons2 <- c(
"Winter", "Winter",
"Spring", "Spring", "Spring",
"Summer", "Summer", "Summer",
"Fall", "Fall", "Fall",
"Winter"
)
Noting that as.POSIXlt returns 0-based months, we add 1:
metseasons2[ 1 + as.POSIXlt(dat$date)$mon ]
# [1] "Fall" "Fall" "Summer" "Fall"
Comparison:
library(lubridate)
library(microbenchmark)
set.seed(42)
x <- Sys.Date() + sample(1e3)
xlt <- as.POSIXlt(x)
microbenchmark(
metfmt = metseasons[ format(x, "%m") ],
metlt = metseasons2[ 1 + xlt$mon ],
astrofmt = astroseasons_labels[ cut(as.integer(format(x, "%m%d")), astroseasons, labels = FALSE) ],
astrolt = astroseasons_labels[ cut(100*(1+xlt$mon) + xlt$mday, astroseasons, labels = FALSE) ],
lubridate = sapply(month(x), seasons)
)
# Unit: microseconds
# expr min lq mean median uq max neval
# metfmt 1952.091 2135.157 2289.63943 2212.1025 2308.1945 3748.832 100
# metlt 14.223 16.411 22.51550 20.0575 24.7980 68.924 100
# astrofmt 2240.547 2454.245 2622.73109 2507.8520 2674.5080 3923.874 100
# astrolt 42.303 54.702 72.98619 66.1885 89.7095 163.373 100
# lubridate 5906.963 6473.298 7018.11535 6783.2700 7508.0565 11474.050 100
So the methods using as.POSIXlt(...)$mon are significantly faster. (#Kristofersen's answer could be improved by vectorizing it, perhaps with ifelse, but that still won't compare to the speed of the vector lookups with or without cut.)

You can do this pretty quickly with lubridate and a function to change the month number into a season.
library(lubridate)
seasons = function(x){
if(x %in% 2:4) return("Spring")
if(x %in% 5:7) return("Summer")
if(x %in% 8:10) return("Fall")
if(x %in% c(11,12,1)) return("Winter")
}
dat$Season = sapply(month(dat$date), seasons)
> dat
date name count Season
1 2016-11-12 Joe 5 Winter
2 2016-11-15 Bob 5 Winter
3 2016-06-15 Nick 12 Summer
4 2016-10-16 Cate 6 Fall

if your data is df:
# create dataframe for month and corresponding season
dfSeason <- data.frame(season = c(rep("Winter", 3), rep("Summer", 3),
rep("Spring", 3), rep("Fall", 3)),
month = month.name[c(11,12,1, 5:7, 2:4, 8:10)],
stringsAsFactors = F)
# make date as date
df$data <- as.Date(df$date)
# match the month of the date in df (format %B) with month in season
# then use it to index the season of dfSeason
df$season <- dfSeason$season[match(format(df$data, "%B"), dfSeason$month)]

Related

How to determine the seasons of the year from a multitemporal data list using R?

I would like to determine the seasons here in my region from a time list using dplyr or tidyr.
In my province:
Summer: Starts on December 21st through March 20th.
Autumn: Starts on March 21st through June 20th.
Winter: Starts on June 21st through September 22nd.
Spring: Starts September 23rd through December 20th.
My data.frame
sample_station <-c('A','A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','C','C','C','C','C','C','C','C','C','C','A','B','C','A','B','C')
Date_dmy <-c('01/01/2000','08/08/2000','16/03/2001','22/09/2001','01/06/2002','05/01/2002','26/01/2002','16/02/2002','09/03/2002','30/03/2002','20/04/2002','04/01/2000','11/08/2000','19/03/2001','25/09/2001','04/06/2002','08/01/2002','29/01/2002','19/02/2002','12/03/2002','13/09/2001','08/01/2000','15/08/2000','23/03/2001','29/09/2001','08/06/2002','12/01/2002','02/02/2002','23/02/2002','16/03/2002','06/04/2002','01/02/2000','01/02/2000','01/02/2000','02/11/2001','02/11/2001','02/11/2001')
Temperature <-c(17,20,24,19,17,19,23,26,19,19,21,15,23,18,22,22,23,18,19,26,21,22,23,27,19,19,21,23,24,25,26,29,30,21,25,24,23)
df<-data.frame(sample_station, Date_dmy, Temperature)
1) Use findInterval to look up the date in the season_start vector and extract the associated season_name.
library(dplyr)
# given Date class vector returns vector of season names
date2season <- function(date) {
season_start <- c("0101", "0321", "0621", "0923", "1221") # mmdd
season_name <- c("Summer", "Autumn", "Winter", "Spring", "Summer")
mmdd <- format(date, "%m%d")
season_name[findInterval(mmdd, season_start)] ##
}
df %>% mutate(season = date2season(as.Date(Date_dmy, "%d/%m/%Y")))
giving:
sample_station Date_dmy Temperature season
1 A 01/01/2000 17 Summer
2 A 08/08/2000 20 Winter
3 A 16/03/2001 24 Summer
4 A 22/09/2001 19 Winter
5 A 01/06/2002 17 Autumn
...snip...
1a) The last line in date2season, marked ##, could optionally be replaced with
season_name[(mmdd >= "0101") + (mmdd >= "0321") + (mmdd >= "0621") +
(mmdd >= "0923") + (mmdd >= "1221")]
and in that case you don't need the line defining season_start either.
2) An alternative is to use case_when:
df %>%
mutate(mmdd = format(as.Date(Date_dmy, "%d/%m/%Y"), "%m%d"),
season = case_when(
mmdd <= "0320" ~ "Summer",
mmdd <= "0620" ~ "Autumn",
mmdd <= "0922" ~ "Winter",
mmdd <= "1220" ~ "Spring",
TRUE ~ "Summer")) %>%
select(-mmdd)

R - subsetting a data frame in a for loop

im trying to subset a data frame in a for loop to create a smaller data.frame. This is my data.frame
day rain in mm temperature in °C season
1 201 20 summer
2 156 18 summer
3 56 -4 winter
4 98 15 spring
I want to extract a data.frame for each season (with all columns). Here is my code:
for (season in seasons){
a<- weather[which(weather$season %in% season[1]) , ,drop=TRUE]
...
}
Unfortunately, the sub-setting doesn' t work. When i use
a<- weather[which(weather$season %in% "summer") , ,drop=TRUE] it works perfectly. Also this does not work properly:
season <- "summer"
a<- weather[which(weather$season %in% season[1]) , ,drop=TRUE]
Does anyone see the problem with my code? Thank you.
It works with dplyr.
library(dplyr)
mydf <- data.frame(day = c(1,2,3,4),
rain = c(201,156,56,98),
temperature = c(20,18,-4,15),
season = c("summer", "summer", "winter", "spring"))
seasons <- c("spring", "summer", "autumn", "winter")
for (sea in seasons) {
a <- dplyr::filter(mydf, season == sea)
print(a)
}

Odd or even in r

persnr date
411223-6213 2011-01-19
420211-6911 2012-01-19
420604-7716 2007-09-01
430404-8558 2011-09-01
431030-7030 2011-09-01
440127-0055 2012-09-01
I want to create a new column for persnr if the 10th digit is odd or even.
The new column will they be true or false depending on whether the 10th digit of persnr is odd or even. odd=true, even=false
I also would like to create another column för 'date' so for example 2011-09-01 is fall and in the new column fall=true
2012-01-19 is spring and in the new column spring=false.
This is certainly basic but I am a new user in the R and may not be right on it.
You can try substr. Not sure if you count the - character also. In that case,
v1 <- as.numeric(substr(df1$persnr,10,10))
Or else replace 10 by 11 as in #nico's post
df1$newCol <- as.logical(v1%%2)
I would prefer to have it as a logical column, but if you need to change it to 'odd', 'even'
df1$newCol <- c('even', 'odd')[df1$newCol+1L]
# Generate the data
my.data <- data.frame(
persnr=c("411223-6213", "420211-6911",
"420604-7716", "430404-8558",
"431030-7030", "440127-0055"),
date = c("2011-01-19", "2012-01-19",
"2007-09-01", "2011-09-01",
"2011-09-01", "2012-09-01"))
# Get the 10th digit of persnr using substring, then check the reminder
# of its division by 2 to determine if it is odd or even
# Note that I get the 11th char as there is a - in the middle of the number
digit.10 <- substr(my.data$persnr, 11, 11)
my.data$evenOdd <- ifelse(as.integer(digit.10)%%2, "odd", "even")
my.data$evenOdd <- factor(my.data$evenOdd, levels=c("odd", "even"))
To find the season of each date:
# Get month and day, ignore year
month.day <- strftime(my.data$date, format="%m-%d")
# Now check which season we're in -- ASSUMING NORTHERN HEMISPHERE, change if needed
# Also note that the dates of solstices and equinoxes are variable so
# this is approximative...
# Set everyone to winter
my.data$season <- "Winter"
# Find indices for the other seasons
spring <- which(month.day >= "03-21" & month.day < "06-21")
summer <- which(month.day >= "06-21" & month.day < "09-21")
fall <- which(month.day >= "09-21" & month.day < "12-21")
my.data$season[spring] <- "Spring"
my.data$season[summer] <- "Summer"
my.data$season[fall] <- "Fall"
my.data$season <- factor(my.data$season, levels =
c("Spring", "Summer", "Fall", "Winter"))

R Need to extract month and assign season [duplicate]

This question already has answers here:
Find which season a particular date belongs to
(11 answers)
Closed 8 years ago.
I am using R, and I need to set up a loop (I think) where I extract the month from the date and assign a season. I would like to assign winter to months 12, 1, 2; spring to 3, 4, 5; summer to 6, 7, 8; and fall to 9, 10, 11. I have a subset of the data below. I am awful with loops and couldn't figure it out. Also for the date, I wasn't sure how packages like lubridate would work
"","UT_TDS_ID_2011.Monitoring.Location.ID","UT_TDS_ID_2011.Activity.Start.Date","UT_TDS_ID_2011.Value","UT_TDS_ID_2011.Season"
"1",4930585,"7/28/2010 0:00",196,""
"2",4933115,"4/21/2011 0:00",402,""
"3",4933115,"7/23/2010 0:00",506,""
"4",4933115,"6/14/2011 0:00",204,""
"8",4933115,"12/3/2010 0:00",556,""
"9",4933157,"11/18/2010 0:00",318,""
"10",4933157,"11/6/2010 0:00",328,""
"11",4933157,"7/23/2010 0:00",290,""
"12",4933157,"6/14/2011 0:00",250,""
Regarding the subject/title of the question, its actually possible to do this without extracting the month. The first two solutions below do not extract the month. There is also a third solution which does extract the month but only to increment it.
1) as.yearqtr/as.yearmon Convert the dates to year/month and add one month (1/12). Then the calendar quarters correspond to the seasons so convert to year/quarter, yq, and label the quarters as shown:
library(zoo)
yq <- as.yearqtr(as.yearmon(DF$dates, "%m/%d/%Y") + 1/12)
DF$Season <- factor(format(yq, "%q"), levels = 1:4,
labels = c("winter", "spring", "summer", "fall"))
giving:
dates Season
1 7/28/2010 summer
2 4/21/2011 spring
3 7/23/2010 summer
4 6/14/2011 summer
5 12/3/2010 winter
6 11/18/2010 fall
7 11/6/2010 fall
8 7/23/2010 summer
9 6/14/2011 summer
1a) A variation of this is to use chron's quarters which produces a factor so that levels=1:4 does not have to be specified. To use chron replace the last line in (1) with:
library(chron)
DF$Season <- factor(quarters(as.chron(yq)),
labels = c("winter", "spring", "summer", "fall"))
chron could also be used in conjunction with the remaining solutions.
2) cut. This solution only uses the base of R. First convert the dates to the first of the month using cut and add 32 to get a date in the next month, d. The quarters corresponding to d are the seasons so compute the quarters using quarters and construct the labels in the same fashion as the first answser:
d <- as.Date(cut(as.Date(DF$dates, "%m/%d/%Y"), "month")) + 32
DF$Season <- factor(quarters(d), levels = c("Q1", "Q2", "Q3", "Q4"),
labels = c("winter", "spring", "summer", "fall"))
giving the same answer.
3) POSIXlt This solution also uses only the base of R:
p <- as.POSIXlt(as.Date(DF$dates, "%m/%d/%Y"))
p$day <- 1
p$mo <- p$mo+1
DF$Season <- factor(quarters(p), levels = c("Q1", "Q2", "Q3", "Q4"),
labels = c("winter", "spring", "summer", "fall"))
Note 1: We could optionally omit levels= in all these solutions if we knew that every season appears.
Note 2: We used this data frame:
DF <- data.frame(dates = c('7/28/2010', '4/21/2011', '7/23/2010',
'6/14/2011', '12/3/2010', '11/18/2010', '11/6/2010', '7/23/2010',
'6/14/2011'))
Using only base R, you can convert the "datetime" column to "Date" class (as.Date(..)), extract the "month" (format(..., '%m')) and change the character value to numeric (as.numeric(). Create an "indx" vector that have values from "1" to "12", set the names of the values according to the specific season (setNames(..)), and use this to get the corresponding "Season" for the "months" vector.
months <- as.numeric(format(as.Date(df$datetime, '%m/%d/%Y'), '%m'))
indx <- setNames( rep(c('winter', 'spring', 'summer',
'fall'),each=3), c(12,1:11))
df$Season <- unname(indx[as.character(months)])
df
# datetime Season
#1 7/28/2010 0:00 summer
#2 4/21/2011 0:00 spring
#3 7/23/2010 0:00 summer
#4 6/14/2011 0:00 summer
#5 12/3/2010 0:00 winter
#6 11/18/2010 0:00 fall
#7 11/6/2010 0:00 fall
#8 7/23/2010 0:00 summer
#9 6/14/2011 0:00 summer
Or as #Roland mentioned in the comments, you can use strptime to convert the "datetime" to "POSIXlt" and extract the month ($mon)
months <- strptime(df$datetime, format='%m/%d/%Y %H:%M')$mon +1
and use the same method as above
data
df <- data.frame(datetime = c('7/28/2010 0:00', '4/21/2011 0:00',
'7/23/2010 0:00', '6/14/2011 0:00', '12/3/2010 0:00', '11/18/2010 0:00',
'11/6/2010 0:00', '7/23/2010 0:00', '6/14/2011 0:00'),stringsAsFactors=FALSE)

Find which season a particular date belongs to

I have a vector of dates and for each entry, I would like to assign a season. So for example, if a date is between 21.12. and 21.3., I would says that's winter. So far I have tried the following code but I couldn't make it more generic, irrespective of the year.
my.dates <- as.Date("2011-12-01", format = "%Y-%m-%d") + 0:60
low.date <- as.Date("2011-12-15", format = "%Y-%m-%d")
high.date <- as.Date("2012-01-15", format = "%Y-%m-%d")
my.dates[my.dates <= high.date & my.dates >= low.date]
[1] "2011-12-15" "2011-12-16" "2011-12-17" "2011-12-18" "2011-12-19" "2011-12-20" "2011-12-21" "2011-12-22" "2011-12-23" "2011-12-24" "2011-12-25"
[12] "2011-12-26" "2011-12-27" "2011-12-28" "2011-12-29" "2011-12-30" "2011-12-31" "2012-01-01" "2012-01-02" "2012-01-03" "2012-01-04" "2012-01-05"
[23] "2012-01-06" "2012-01-07" "2012-01-08" "2012-01-09" "2012-01-10" "2012-01-11" "2012-01-12" "2012-01-13" "2012-01-14" "2012-01-15"
I have tried formatting the dates without the year, but it isn't working.
ld <- as.Date("12-15", format = "%m-%d")
hd <- as.Date("01-15", format = "%m-%d")
my.dates[my.dates <= hd & my.dates >= ld]
How about using something like this:
getSeason <- function(DATES) {
WS <- as.Date("2012-12-15", format = "%Y-%m-%d") # Winter Solstice
SE <- as.Date("2012-3-15", format = "%Y-%m-%d") # Spring Equinox
SS <- as.Date("2012-6-15", format = "%Y-%m-%d") # Summer Solstice
FE <- as.Date("2012-9-15", format = "%Y-%m-%d") # Fall Equinox
# Convert dates from any year to 2012 dates
d <- as.Date(strftime(DATES, format="2012-%m-%d"))
ifelse (d >= WS | d < SE, "Winter",
ifelse (d >= SE & d < SS, "Spring",
ifelse (d >= SS & d < FE, "Summer", "Fall")))
}
my.dates <- as.Date("2011-12-01", format = "%Y-%m-%d") + 0:60
head(getSeason(my.dates), 24)
# [1] "Fall" "Fall" "Fall" "Fall" "Fall" "Fall" "Fall"
# [8] "Fall" "Fall" "Fall" "Fall" "Fall" "Fall" "Fall"
# [15] "Winter" "Winter" "Winter" "Winter" "Winter" "Winter"
One note: 2012 is a good year to which to convert all of the dates; since it is a leap year, any February 29ths in your data set will be handled smoothly.
I have something similarly ugly as Tim:
R> toSeason <- function(dat) {
+
+ stopifnot(class(dat) == "Date")
+
+ scalarCheck <- function(dat) {
+ m <- as.POSIXlt(dat)$mon + 1 # correct for 0:11 range
+ d <- as.POSIXlt(dat)$mday # correct for 0:11 range
+ if ((m == 3 & d >= 21) | (m == 4) | (m == 5) | (m == 6 & d < 21)) {
+ r <- 1
+ } else if ((m == 6 & d >= 21) | (m == 7) | (m == 8) | (m == 9 & d < 21)) {
+ r <- 2
+ } else if ((m == 9 & d >= 21) | (m == 10) | (m == 11) | (m == 12 & d < 21)) {
+ r <- 3
+ } else {
+ r <- 4
+ }
+ r
+ }
+
+ res <- sapply(dat, scalarCheck)
+ res <- ordered(res, labels=c("Spring", "Summer", "Fall", "Winter"))
+ invisible(res)
+ }
R>
And here is a test:
R> date <- Sys.Date() + (0:11)*30
R> DF <- data.frame(Date=date, Season=toSeason(date))
R> DF
Date Season
1 2012-02-29 Winter
2 2012-03-30 Spring
3 2012-04-29 Spring
4 2012-05-29 Spring
5 2012-06-28 Summer
6 2012-07-28 Summer
7 2012-08-27 Summer
8 2012-09-26 Fall
9 2012-10-26 Fall
10 2012-11-25 Fall
11 2012-12-25 Winter
12 2013-01-24 Winter
R> summary(DF)
Date Season
Min. :2012-02-29 Spring:3
1st Qu.:2012-05-21 Summer:3
Median :2012-08-12 Fall :3
Mean :2012-08-12 Winter:3
3rd Qu.:2012-11-02
Max. :2013-01-24
R>
I would create a lookup table, and go from there. An example (note the code obfuscation using the d() function and the pragmatic way of filling the lut):
# Making lookup table (lut), only needed once. You can save
# it using save() for later use. Note I take a leap year.
d = function(month_day) which(lut$month_day == month_day)
lut = data.frame(all_dates = as.POSIXct("2012-1-1") + ((0:365) * 3600 * 24),
season = NA)
lut = within(lut, { month_day = strftime(all_dates, "%b-%d") })
lut[c(d("Jan-01"):d("Mar-20"), d("Dec-21"):d("Dec-31")), "season"] = "winter"
lut[c(d("Mar-21"):d("Jun-20")), "season"] = "spring"
lut[c(d("Jun-21"):d("Sep-20")), "season"] = "summer"
lut[c(d("Sep-21"):d("Dec-20")), "season"] = "autumn"
rownames(lut) = lut$month_day
After creating the lookup table, you can extract quite easily from it to what season a month/day combination belongs to:
dat = data.frame(dates = Sys.Date() + (0:11)*30)
dat = within(dat, {
season = lut[strftime(dates, "%b-%d"), "season"]
})
> dat
dates season
1 2012-02-29 winter
2 2012-03-30 spring
3 2012-04-29 spring
4 2012-05-29 spring
5 2012-06-28 summer
6 2012-07-28 summer
7 2012-08-27 summer
8 2012-09-26 autumn
9 2012-10-26 autumn
10 2012-11-25 autumn
11 2012-12-25 winter
12 2013-01-24 winter
All nice and vectorized :). I think once the table is created, this is very quick.
Simply use time2season function. It gets date and generates season:
time2season(x, out.fmt = "months", type="default")
You can find more infromation here.
I think this would do it, but it's an ugly solution:
my.dates <- as.Date("2011-12-01", format = "%Y-%m-%d") + 0:60
ld <- as.Date("12-15", format = "%m-%d")
hd <- as.Date("01-15", format = "%m-%d")
my.dates2 <- as.Date(unlist(lapply(strsplit(as.character(my.dates),split=""),function(x) paste(x[6:10],collapse=""))),format="%m-%d")
my.dates[my.dates2 <= hd | my.dates2 >= ld]
[1] "2011-12-15" "2011-12-16" "2011-12-17" "2011-12-18" "2011-12-19"
[6] "2011-12-20" "2011-12-21" "2011-12-22" "2011-12-23" "2011-12-24"
[11] "2011-12-25" "2011-12-26" "2011-12-27" "2011-12-28" "2011-12-29"
[16] "2011-12-30" "2011-12-31" "2012-01-01" "2012-01-02" "2012-01-03"
[21] "2012-01-04" "2012-01-05" "2012-01-06" "2012-01-07" "2012-01-08"
[26] "2012-01-09" "2012-01-10" "2012-01-11" "2012-01-12" "2012-01-13"
[31] "2012-01-14" "2012-01-15"
My solution is not fast but is flexible about the starts of the seasons as long as they are defined in a dataframe first for the function assignSeason. It requires magrittr for the piping functions, lubridate for the year function, and dplyr for mutate.
seasons <- data.frame(
SE = as.POSIXct(c("2009-3-20", "2010-3-20", "2011-3-20", "2012-3-20",
"2013-3-20", "2014-3-20"), format="%Y-%m-%d"),
SS = as.POSIXct(c("2009-6-21", "2010-6-21", "2011-6-21", "2012-6-20",
"2013-6-21", "2014-6-21"), format="%Y-%m-%d"),
FE = as.POSIXct(c("2009-9-22", "2010-9-23", "2011-9-23", "2012-9-22",
"2013-9-22", "2014-9-23"), format="%Y-%m-%d"),
WS = as.POSIXct(c("2009-12-21", "2010-12-21", "2011-12-22", "2012-12-21",
"2013-12-21", "2014-12-21"), format="%Y-%m-%d")
)
assignSeason <- function(dat, SeasonStarts=seasons) {
dat %<>% mutate(
Season = lapply(Date,
function(x) {
findInterval(
x,
SeasonStarts[which(year(x)==year(SeasonStarts$WS)), ]
)
}
) %>% unlist
)
dat[which(dat$Season==0 | dat$Season==4), ]$Season <- "Winter"
dat[which(dat$Season==1), ]$Season <- "Spring"
dat[which(dat$Season==2), ]$Season <- "Summer"
dat[which(dat$Season==3), ]$Season <- "Fall"
return(dat)
}
Example data:
dat = data.frame(
Date = as.POSIXct(strptime(as.Date("2011-12-01", format = "%Y-%m-%d") +
(0:10)*30, format="%Y-%m-%d"))
)
dat %>% assignSeason
Result:
Date Season
1 2011-12-01 Fall
2 2011-12-31 Winter
3 2012-01-30 Winter
4 2012-02-29 Winter
5 2012-03-30 Spring
6 2012-04-29 Spring
7 2012-05-29 Spring
8 2012-06-28 Summer
9 2012-07-28 Summer
10 2012-08-27 Summer
11 2012-09-26 Fall
Here a more general solution, that nevertheless needs 3 libraries... It considers all years and the hemisphere:
library(data.table)
library(zoo)
library(dplyr)
get.seasons <- function(dates, hemisphere = "N"){
years <- unique(year(dates))
years <- c(min(years - 1), max(years + 1), years) %>% sort
if(hemisphere == "N"){
seasons <- c("winter", "spring", "summer", "fall")}else{
seasons <- c("summer", "fall", "winter", "spring")}
dt.dates <- bind_rows(
data.table(date = as.Date(paste0(years, "-12-21")), init = seasons[1], type = "B"),# Summer in south hemisphere
data.table(date = as.Date(paste0(years, "-3-21")), init = seasons[2], type = "B"), # Fall in south hemisphere
data.table(date = as.Date(paste0(years, "-6-21")), init = seasons[3], type = "B"), # Winter in south hemisphere
data.table(date = as.Date(paste0(years, "-9-23")), init = seasons[4], type = "B"), # Winter in south hemisphere
data.table(date = dates, i = 1:(length(dates)), type = "A") # dates to compute
)[order(date)]
dt.dates[, init := zoo::na.locf(init)]
return(dt.dates[type == "A"][order(i)]$init)
}
I think library zoo would be easy
library(zoo)
yq <- as.yearqtr(as.yearmon(DF$dates, "%m/%d/%Y") + 1/12)
DF$Season <- factor(format(yq, "%q"), levels = 1:4,
labels = c("winter", "spring", "summer", "fall"))
The most accurate approach to this issue is by splitting up the season that intersects newyear.
Now I'm a c# guy but the idea behind the season check is the same for all languages.
I've created a jsfiddle here: https://jsfiddle.net/pieterjandc/L3prwqmh/1/
Here is the core code, which splits up the season crossing the newyear, and performs the comparision:
const seasons = [{
name: 'Spring',
start: new Date(2000, 2, 21),
end: new Date(2000, 5, 20)
},{
name: 'Summer',
start: new Date(2000, 5, 21),
end: new Date(2000, 8, 20)
},{
name: 'Autumn/Fall',
start: new Date(2000, 8, 21),
end: new Date(2000, 11, 20)
},{
name: 'Winter',
start: new Date(2000, 11, 21),
end: new Date(2001, 2, 20)
}];
/** Checks if a date is within a specified season */
function checkSeason(season, date) {
let remappedStart = new Date(2000, season.start.getMonth(), season.start.getDate());
let remappedDate = new Date(2000, date.getMonth(), date.getDate());
let remappedEnd = new Date(2000, season.end.getMonth(), season.end.getDate());
// Check if the season crosses newyear
if (season.start.getFullYear() === season.end.getFullYear()) {
// Simple comparison
return (remappedStart <= remappedDate) && (remappedDate <= remappedEnd);
} else {
// Split the season, remap all to year 2000, and perform a simple comparison
return (remappedStart <= remappedDate) && (remappedDate <= new Date(2000, 11, 31))
|| (new Date(2000, 0, 1) <= remappedDate) && (remappedDate <= remappedEnd);
}
}
function findSeason(seasons, date) {
for (let i = 0; i < seasons.length; i++) {
let isInSeason = checkSeason(seasons[i], date);
if (isInSeason === true) {
return seasons[i];
}
}
return null;
}
8 years later and there is a really easy Lubridate answer for checking if X date is in Y date range.
as.Date("2020-05-01") %within% (as.Date("2020-01-01") %--% as.Date("2021-01-01"))
So you'd define your date ranges using the lubridate date range opperator, %--%
range_1 <- A_Date %--% Z_date
then to check if X date is within range_1 use %within%
library(lubridate)
summer <-
ymd(paste0(seq(2019, 2021), "-01", "-01")) %--% ymd(paste0(seq(2019, 2021), "-05", "-05"))
ymd("2020-02-01") %within% summer
since the above ranges are from 20xx-01-1 %--% 20xx-05-05 the query above returns FALSE, TRUE, FALSE but you could set a query to return TRUE if any are TRUE.
Bit late to the party but an additional base R solution (I stole #Josh O'Brien's brilliant logic for the astronomical seasons piece) updating the UTC dates for equinoxes and solstices for the 2016 - 2026 decade (i will endeavour to add a lookup table for the UTC dates for the equinoxes and solstices in the past and future).
# Function to take a date vector and return the season
# season_stamper => function
season_stamper <- function(
date_vec,
date_fmt = "%Y-%m-%d",
hemisphere = c("north", "south"),
season_type = c(
ifelse(hemisphere == "south",
"monthly periods", "astronomical"),
ifelse(hemisphere == "south",
"astronomical", "monthly periods")
)){
# Resolve which hemisphere was selected:
# hemisphere_selected => string scalar
hemisphere_selected <- match.arg(hemisphere)
# Extract the month number from the dates:
# mon_nos => integer vector
mon_nos <- (as.POSIXlt(strptime(date_vec, date_fmt))$mon + 1)
# Resolve the type of season: season_type_selected => character scalar
season_type_selected <- match.arg(season_type)
# If the season type is a 3-month period:
if(season_type_selected == "monthly periods"){
# Resolve the seasons based on the hemisphere:
# seasons => string vector
seasons <- switch(
hemisphere_selected,
"north"=c("Winter", "Spring", "Summer", "Fall"),
c("Summer", "Autumn", "Winter", "Spring")
)
# Stamp the date vector: season_stamps => string vector
season_stamps <- seasons[((mon_nos %/% (12 / 4)) %% 4 + 1)]
# Otherwise:
}else{
# Convert dates from any year to 2020: d=> Date Scalar
d <- as.Date(strftime(date_vec, format="2020-%m-%d"))
# If the dates are from the northern hemisphere:
if(hemisphere_selected == "north"){
# Store as a variable Date of the Winter Solstice for a leap year:
# WS => date scalar
WS <- as.Date("2020-12-21", format = "%Y-%m-%d")
# Store as a variable Date of the Spring Equinox for a leap year:
# SE => date scalar
SE <- as.Date("2020-3-20", format = "%Y-%m-%d")
# Store as a variable Date of the Summer Solstice for a leap year:
# SS => date scalar
SS <- as.Date("2020-6-21", format = "%Y-%m-%d")
# Store as a variable Date of the Fall Equinox for a leap year:
# SS => date scalar
FE <- as.Date("2020-9-22", format = "%Y-%m-%d")
# Resolve the season: season_stamps => character vector
season_stamps <- ifelse(d >= WS | d < SE, "Winter",
ifelse(d >= SE & d < SS, "Spring",
ifelse(d >= SS & d < FE, "Summer", "Fall")))
# Otherwise:
}else{
# Store as a variable Date of the Summer Solstice for a leap year:
# WS => date scalar
SS <- as.Date("2020-12-21", format = "%Y-%m-%d")
# Store as a variable the Date of the Autumn Equinox:
# AE => date scalar
AE <- as.Date("2020-3-20", format = "%Y-%m-%d")
# Store as a variable the Date of the Winter Solstice:
# WS => date scalar
WS <- as.Date("2020-6-21", format = "%Y-%m-%d")
# Store as a variable the DAte of the Spring Equinox:
# SE => date scalar
SE <- as.Date("2020-9-22", format = "%Y-%m-%d")
# Resolve the season: season_stamps => character vector
season_stamps <- ifelse(d >= SS | d < AE, "Summer",
ifelse(d >= SE & d < SS, "Spring",
ifelse(d >= WS & d < SE, "Winter", "Autumn")))
}
}
# Explicitly define the returned object:
# string vecctor => Global Env
return(season_stamps)
}
# Data:
my.dates <- as.Date("2019-12-01", format = "%Y-%m-%d") + 0:60
low.date <- as.Date("2019-12-15", format = "%Y-%m-%d")
high.date <- as.Date("2020-01-15", format = "%Y-%m-%d")
date_vec <- my.dates[my.dates <= high.date & my.dates >= low.date]

Resources