I have a vector of dates, and for each entry I would like to assign a season. For example, if a date falls between 21.12. and 21.3., I would say that's winter. So far I have tried the following code, but I couldn't make it generic, irrespective of the year.
my.dates <- as.Date("2011-12-01", format = "%Y-%m-%d") + 0:60
low.date <- as.Date("2011-12-15", format = "%Y-%m-%d")
high.date <- as.Date("2012-01-15", format = "%Y-%m-%d")
my.dates[my.dates <= high.date & my.dates >= low.date]
[1] "2011-12-15" "2011-12-16" "2011-12-17" "2011-12-18" "2011-12-19" "2011-12-20" "2011-12-21" "2011-12-22" "2011-12-23" "2011-12-24" "2011-12-25"
[12] "2011-12-26" "2011-12-27" "2011-12-28" "2011-12-29" "2011-12-30" "2011-12-31" "2012-01-01" "2012-01-02" "2012-01-03" "2012-01-04" "2012-01-05"
[23] "2012-01-06" "2012-01-07" "2012-01-08" "2012-01-09" "2012-01-10" "2012-01-11" "2012-01-12" "2012-01-13" "2012-01-14" "2012-01-15"
I have tried formatting the dates without the year, but it isn't working.
ld <- as.Date("12-15", format = "%m-%d")
hd <- as.Date("01-15", format = "%m-%d")
my.dates[my.dates <= hd & my.dates >= ld]
How about using something like this:
getSeason <- function(DATES) {
WS <- as.Date("2012-12-15", format = "%Y-%m-%d") # Winter Solstice
SE <- as.Date("2012-3-15", format = "%Y-%m-%d") # Spring Equinox
SS <- as.Date("2012-6-15", format = "%Y-%m-%d") # Summer Solstice
FE <- as.Date("2012-9-15", format = "%Y-%m-%d") # Fall Equinox
# Convert dates from any year to 2012 dates
d <- as.Date(strftime(DATES, format="2012-%m-%d"))
ifelse (d >= WS | d < SE, "Winter",
ifelse (d >= SE & d < SS, "Spring",
ifelse (d >= SS & d < FE, "Summer", "Fall")))
}
my.dates <- as.Date("2011-12-01", format = "%Y-%m-%d") + 0:60
head(getSeason(my.dates), 24)
# [1] "Fall" "Fall" "Fall" "Fall" "Fall" "Fall" "Fall"
# [8] "Fall" "Fall" "Fall" "Fall" "Fall" "Fall" "Fall"
# [15] "Winter" "Winter" "Winter" "Winter" "Winter" "Winter"
One note: 2012 is a good year to which to convert all of the dates; since it is a leap year, any February 29ths in your data set will be handled smoothly.
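To see why the reference year matters, here is a minimal sketch (base R only; the variable names are illustrative and not part of the answer above). It remaps a February 29th into a non-leap year and into 2012, passing an explicit format to as.Date so that an impossible date shows up as NA.

```r
# A date that only exists in leap years
feb29 <- as.Date("2016-02-29")

# Remap to a non-leap reference year: 2011-02-29 does not exist
in_2011 <- as.Date(format(feb29, "2011-%m-%d"), format = "%Y-%m-%d")

# Remap to a leap reference year: the date survives
in_2012 <- as.Date(format(feb29, "2012-%m-%d"), format = "%Y-%m-%d")

is.na(in_2011)
in_2012
```

With a non-leap reference year, such entries would silently drop out of every season comparison.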
I have something similarly ugly as Tim:
R> toSeason <- function(dat) {
+
+ stopifnot(class(dat) == "Date")
+
+ scalarCheck <- function(dat) {
+ m <- as.POSIXlt(dat)$mon + 1 # correct for 0:11 range
+ d <- as.POSIXlt(dat)$mday # day of the month, 1:31
+ if ((m == 3 & d >= 21) | (m == 4) | (m == 5) | (m == 6 & d < 21)) {
+ r <- 1
+ } else if ((m == 6 & d >= 21) | (m == 7) | (m == 8) | (m == 9 & d < 21)) {
+ r <- 2
+ } else if ((m == 9 & d >= 21) | (m == 10) | (m == 11) | (m == 12 & d < 21)) {
+ r <- 3
+ } else {
+ r <- 4
+ }
+ r
+ }
+
+ res <- sapply(dat, scalarCheck)
+ res <- ordered(res, levels=1:4, labels=c("Spring", "Summer", "Fall", "Winter"))
+ invisible(res)
+ }
R>
And here is a test:
R> date <- Sys.Date() + (0:11)*30
R> DF <- data.frame(Date=date, Season=toSeason(date))
R> DF
Date Season
1 2012-02-29 Winter
2 2012-03-30 Spring
3 2012-04-29 Spring
4 2012-05-29 Spring
5 2012-06-28 Summer
6 2012-07-28 Summer
7 2012-08-27 Summer
8 2012-09-26 Fall
9 2012-10-26 Fall
10 2012-11-25 Fall
11 2012-12-25 Winter
12 2013-01-24 Winter
R> summary(DF)
Date Season
Min. :2012-02-29 Spring:3
1st Qu.:2012-05-21 Summer:3
Median :2012-08-12 Fall :3
Mean :2012-08-12 Winter:3
3rd Qu.:2012-11-02
Max. :2013-01-24
R>
I would create a lookup table, and go from there. An example (note the code obfuscation using the d() function and the pragmatic way of filling the lut):
# Making lookup table (lut), only needed once. You can save
# it using save() for later use. Note I take a leap year.
d = function(month_day) which(lut$month_day == month_day)
lut = data.frame(all_dates = as.POSIXct("2012-1-1") + ((0:365) * 3600 * 24),
season = NA)
lut = within(lut, { month_day = strftime(all_dates, "%b-%d") })
lut[c(d("Jan-01"):d("Mar-20"), d("Dec-21"):d("Dec-31")), "season"] = "winter"
lut[c(d("Mar-21"):d("Jun-20")), "season"] = "spring"
lut[c(d("Jun-21"):d("Sep-20")), "season"] = "summer"
lut[c(d("Sep-21"):d("Dec-20")), "season"] = "autumn"
rownames(lut) = lut$month_day
After creating the lookup table, you can extract quite easily from it to what season a month/day combination belongs to:
dat = data.frame(dates = Sys.Date() + (0:11)*30)
dat = within(dat, {
season = lut[strftime(dates, "%b-%d"), "season"]
})
> dat
dates season
1 2012-02-29 winter
2 2012-03-30 spring
3 2012-04-29 spring
4 2012-05-29 spring
5 2012-06-28 summer
6 2012-07-28 summer
7 2012-08-27 summer
8 2012-09-26 autumn
9 2012-10-26 autumn
10 2012-11-25 autumn
11 2012-12-25 winter
12 2013-01-24 winter
All nice and vectorized :). I think once the table is created, this is very quick.
Simply use the time2season function from the hydroTSM package. It takes dates and returns the corresponding seasons:
time2season(x, out.fmt = "months", type="default")
You can find more information here.
I think this would do it, but it's an ugly solution:
my.dates <- as.Date("2011-12-01", format = "%Y-%m-%d") + 0:60
ld <- as.Date("12-15", format = "%m-%d")
hd <- as.Date("01-15", format = "%m-%d")
my.dates2 <- as.Date(unlist(lapply(strsplit(as.character(my.dates),split=""),function(x) paste(x[6:10],collapse=""))),format="%m-%d")
my.dates[my.dates2 <= hd | my.dates2 >= ld]
[1] "2011-12-15" "2011-12-16" "2011-12-17" "2011-12-18" "2011-12-19"
[6] "2011-12-20" "2011-12-21" "2011-12-22" "2011-12-23" "2011-12-24"
[11] "2011-12-25" "2011-12-26" "2011-12-27" "2011-12-28" "2011-12-29"
[16] "2011-12-30" "2011-12-31" "2012-01-01" "2012-01-02" "2012-01-03"
[21] "2012-01-04" "2012-01-05" "2012-01-06" "2012-01-07" "2012-01-08"
[26] "2012-01-09" "2012-01-10" "2012-01-11" "2012-01-12" "2012-01-13"
[31] "2012-01-14" "2012-01-15"
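The same month-day comparison can be written without splitting strings. A minimal sketch, assuming the window runs from 15 December to 15 January as above: format() extracts month and day as an integer such as 1215, and because the window wraps around the new year, the two conditions are joined with | rather than &. The names md, in_window and selected are illustrative.

```r
my.dates <- as.Date("2011-12-01") + 0:60

# Month and day as an integer, e.g. 15 December -> 1215, 5 January -> 105
md <- as.integer(format(my.dates, "%m%d"))

# The window crosses the new year, so a date qualifies if it is
# on or after 15 December OR on or before 15 January
in_window <- md >= 1215 | md <= 115

selected <- my.dates[in_window]
range(selected)
length(selected)
```

Because the year never enters the comparison, the same test works for dates from any year.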
My solution is not fast but is flexible about the starts of the seasons as long as they are defined in a dataframe first for the function assignSeason. It requires magrittr for the piping functions, lubridate for the year function, and dplyr for mutate.
seasons <- data.frame(
SE = as.POSIXct(c("2009-3-20", "2010-3-20", "2011-3-20", "2012-3-20",
"2013-3-20", "2014-3-20"), format="%Y-%m-%d"),
SS = as.POSIXct(c("2009-6-21", "2010-6-21", "2011-6-21", "2012-6-20",
"2013-6-21", "2014-6-21"), format="%Y-%m-%d"),
FE = as.POSIXct(c("2009-9-22", "2010-9-23", "2011-9-23", "2012-9-22",
"2013-9-22", "2014-9-23"), format="%Y-%m-%d"),
WS = as.POSIXct(c("2009-12-21", "2010-12-21", "2011-12-22", "2012-12-21",
"2013-12-21", "2014-12-21"), format="%Y-%m-%d")
)
assignSeason <- function(dat, SeasonStarts=seasons) {
dat %<>% mutate(
Season = lapply(Date,
function(x) {
findInterval(
x,
SeasonStarts[which(year(x)==year(SeasonStarts$WS)), ]
)
}
) %>% unlist
)
dat[which(dat$Season==0 | dat$Season==4), ]$Season <- "Winter"
dat[which(dat$Season==1), ]$Season <- "Spring"
dat[which(dat$Season==2), ]$Season <- "Summer"
dat[which(dat$Season==3), ]$Season <- "Fall"
return(dat)
}
Example data:
dat = data.frame(
Date = as.POSIXct(strptime(as.Date("2011-12-01", format = "%Y-%m-%d") +
(0:10)*30, format="%Y-%m-%d"))
)
dat %>% assignSeason
Result:
Date Season
1 2011-12-01 Fall
2 2011-12-31 Winter
3 2012-01-30 Winter
4 2012-02-29 Winter
5 2012-03-30 Spring
6 2012-04-29 Spring
7 2012-05-29 Spring
8 2012-06-28 Summer
9 2012-07-28 Summer
10 2012-08-27 Summer
11 2012-09-26 Fall
Here is a more general solution that nevertheless needs 3 libraries. It considers all years and the hemisphere:
library(data.table)
library(zoo)
library(dplyr)
get.seasons <- function(dates, hemisphere = "N"){
years <- unique(year(dates))
years <- c(min(years - 1), max(years + 1), years) %>% sort
if(hemisphere == "N"){
seasons <- c("winter", "spring", "summer", "fall")}else{
seasons <- c("summer", "fall", "winter", "spring")}
dt.dates <- bind_rows(
data.table(date = as.Date(paste0(years, "-12-21")), init = seasons[1], type = "B"),# Summer in south hemisphere
data.table(date = as.Date(paste0(years, "-3-21")), init = seasons[2], type = "B"), # Fall in south hemisphere
data.table(date = as.Date(paste0(years, "-6-21")), init = seasons[3], type = "B"), # Winter in south hemisphere
data.table(date = as.Date(paste0(years, "-9-23")), init = seasons[4], type = "B"), # Spring in south hemisphere
data.table(date = dates, i = 1:(length(dates)), type = "A") # dates to compute
)[order(date)]
dt.dates[, init := zoo::na.locf(init)]
return(dt.dates[type == "A"][order(i)]$init)
}
I think this would be easy with the zoo library:
library(zoo)
yq <- as.yearqtr(as.yearmon(DF$dates, "%m/%d/%Y") + 1/12)
DF$Season <- factor(format(yq, "%q"), levels = 1:4,
labels = c("winter", "spring", "summer", "fall"))
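The zoo code above appears to work by shifting every date forward by one month, so that December lands in the first quarter together with January and February, and then reading off the quarter. A rough base R equivalent of that idea, for month-based seasons only, might look like this (month_to_season is a hypothetical helper name, not part of zoo):

```r
month_to_season <- function(dates) {
  # Month number as 1..12 (POSIXlt stores months as 0..11)
  m <- as.POSIXlt(dates)$mon + 1L
  # Map 12,1,2 -> 1; 3,4,5 -> 2; 6,7,8 -> 3; 9,10,11 -> 4
  idx <- (m %% 12L) %/% 3L + 1L
  c("winter", "spring", "summer", "fall")[idx]
}

dates <- as.Date(c("2011-12-15", "2012-01-15", "2012-04-01",
                   "2012-07-04", "2012-10-31"))
month_to_season(dates)
```

This only reproduces the whole-month grouping; it does not handle season boundaries that fall mid-month.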
The most accurate approach to this issue is to split up the season that spans the new year.
Now I'm a c# guy but the idea behind the season check is the same for all languages.
I've created a jsfiddle here: https://jsfiddle.net/pieterjandc/L3prwqmh/1/
Here is the core code, which splits up the season crossing the new year and performs the comparison:
const seasons = [{
name: 'Spring',
start: new Date(2000, 2, 21),
end: new Date(2000, 5, 20)
},{
name: 'Summer',
start: new Date(2000, 5, 21),
end: new Date(2000, 8, 20)
},{
name: 'Autumn/Fall',
start: new Date(2000, 8, 21),
end: new Date(2000, 11, 20)
},{
name: 'Winter',
start: new Date(2000, 11, 21),
end: new Date(2001, 2, 20)
}];
/** Checks if a date is within a specified season */
function checkSeason(season, date) {
let remappedStart = new Date(2000, season.start.getMonth(), season.start.getDate());
let remappedDate = new Date(2000, date.getMonth(), date.getDate());
let remappedEnd = new Date(2000, season.end.getMonth(), season.end.getDate());
// Check if the season crosses newyear
if (season.start.getFullYear() === season.end.getFullYear()) {
// Simple comparison
return (remappedStart <= remappedDate) && (remappedDate <= remappedEnd);
} else {
// Split the season, remap all to year 2000, and perform a simple comparison
return (remappedStart <= remappedDate) && (remappedDate <= new Date(2000, 11, 31))
|| (new Date(2000, 0, 1) <= remappedDate) && (remappedDate <= remappedEnd);
}
}
function findSeason(seasons, date) {
for (let i = 0; i < seasons.length; i++) {
let isInSeason = checkSeason(seasons[i], date);
if (isInSeason === true) {
return seasons[i];
}
}
return null;
}
8 years later and there is a really easy Lubridate answer for checking if X date is in Y date range.
as.Date("2020-05-01") %within% (as.Date("2020-01-01") %--% as.Date("2021-01-01"))
So you'd define your date ranges using the lubridate interval operator, %--%:
range_1 <- A_Date %--% Z_date
Then, to check whether date X is within range_1, use %within%:
library(lubridate)
summer <-
ymd(paste0(seq(2019, 2021), "-01", "-01")) %--% ymd(paste0(seq(2019, 2021), "-05", "-05"))
ymd("2020-02-01") %within% summer
Since the ranges above run from 20xx-01-01 %--% 20xx-05-05, the query above returns FALSE, TRUE, FALSE, but you could wrap the query (e.g. in any()) to return TRUE if any are TRUE.
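For comparison, the same "is the date inside any of these ranges" check can be sketched in base R with plain comparisons and any(). This is an illustration using the same example ranges, not lubridate's implementation; starts, ends and hits are made-up names.

```r
# One range per year: 1 January to 5 May
starts <- as.Date(paste0(2019:2021, "-01-01"))
ends   <- as.Date(paste0(2019:2021, "-05-05"))

x <- as.Date("2020-02-01")

# One logical per range: is x inside it (bounds included)?
hits <- x >= starts & x <= ends
hits

# Collapse to a single answer: is x inside at least one range?
any(hits)
```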
Bit late to the party, but here is an additional base R solution (I stole @Josh O'Brien's brilliant logic for the astronomical seasons piece), updating the UTC dates for equinoxes and solstices for the 2016 - 2026 decade (I will endeavour to add a lookup table for the UTC dates of past and future equinoxes and solstices).
# Function to take a date vector and return the season
# season_stamper => function
season_stamper <- function(
date_vec,
date_fmt = "%Y-%m-%d",
hemisphere = c("north", "south"),
season_type = c(
ifelse(hemisphere == "south",
"monthly periods", "astronomical"),
ifelse(hemisphere == "south",
"astronomical", "monthly periods")
)){
# Resolve which hemisphere was selected:
# hemisphere_selected => string scalar
hemisphere_selected <- match.arg(hemisphere)
# Extract the month number from the dates:
# mon_nos => integer vector
mon_nos <- (as.POSIXlt(strptime(date_vec, date_fmt))$mon + 1)
# Resolve the type of season: season_type_selected => character scalar
season_type_selected <- match.arg(season_type)
# If the season type is a 3-month period:
if(season_type_selected == "monthly periods"){
# Resolve the seasons based on the hemisphere:
# seasons => string vector
seasons <- switch(
hemisphere_selected,
"north"=c("Winter", "Spring", "Summer", "Fall"),
c("Summer", "Autumn", "Winter", "Spring")
)
# Stamp the date vector: season_stamps => string vector
season_stamps <- seasons[((mon_nos %/% (12 / 4)) %% 4 + 1)]
# Otherwise:
}else{
# Convert dates from any year to 2020: d=> Date Scalar
d <- as.Date(strftime(date_vec, format="2020-%m-%d"))
# If the dates are from the northern hemisphere:
if(hemisphere_selected == "north"){
# Store as a variable Date of the Winter Solstice for a leap year:
# WS => date scalar
WS <- as.Date("2020-12-21", format = "%Y-%m-%d")
# Store as a variable Date of the Spring Equinox for a leap year:
# SE => date scalar
SE <- as.Date("2020-3-20", format = "%Y-%m-%d")
# Store as a variable Date of the Summer Solstice for a leap year:
# SS => date scalar
SS <- as.Date("2020-6-21", format = "%Y-%m-%d")
# Store as a variable Date of the Fall Equinox for a leap year:
# FE => date scalar
FE <- as.Date("2020-9-22", format = "%Y-%m-%d")
# Resolve the season: season_stamps => character vector
season_stamps <- ifelse(d >= WS | d < SE, "Winter",
ifelse(d >= SE & d < SS, "Spring",
ifelse(d >= SS & d < FE, "Summer", "Fall")))
# Otherwise:
}else{
# Store as a variable Date of the Summer Solstice for a leap year:
# SS => date scalar
SS <- as.Date("2020-12-21", format = "%Y-%m-%d")
# Store as a variable the Date of the Autumn Equinox:
# AE => date scalar
AE <- as.Date("2020-3-20", format = "%Y-%m-%d")
# Store as a variable the Date of the Winter Solstice:
# WS => date scalar
WS <- as.Date("2020-6-21", format = "%Y-%m-%d")
# Store as a variable the Date of the Spring Equinox:
# SE => date scalar
SE <- as.Date("2020-9-22", format = "%Y-%m-%d")
# Resolve the season: season_stamps => character vector
season_stamps <- ifelse(d >= SS | d < AE, "Summer",
ifelse(d >= SE & d < SS, "Spring",
ifelse(d >= WS & d < SE, "Winter", "Autumn")))
}
}
# Explicitly define the returned object:
# string vector => Global Env
return(season_stamps)
}
# Data:
my.dates <- as.Date("2019-12-01", format = "%Y-%m-%d") + 0:60
low.date <- as.Date("2019-12-15", format = "%Y-%m-%d")
high.date <- as.Date("2020-01-15", format = "%Y-%m-%d")
date_vec <- my.dates[my.dates <= high.date & my.dates >= low.date]
I would like to determine the seasons here in my region from a time list using dplyr or tidyr.
In my province:
Summer: Starts on December 21st through March 20th.
Autumn: Starts on March 21st through June 20th.
Winter: Starts on June 21st through September 22nd.
Spring: Starts September 23rd through December 20th.
My data.frame
sample_station <-c('A','A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','C','C','C','C','C','C','C','C','C','C','A','B','C','A','B','C')
Date_dmy <-c('01/01/2000','08/08/2000','16/03/2001','22/09/2001','01/06/2002','05/01/2002','26/01/2002','16/02/2002','09/03/2002','30/03/2002','20/04/2002','04/01/2000','11/08/2000','19/03/2001','25/09/2001','04/06/2002','08/01/2002','29/01/2002','19/02/2002','12/03/2002','13/09/2001','08/01/2000','15/08/2000','23/03/2001','29/09/2001','08/06/2002','12/01/2002','02/02/2002','23/02/2002','16/03/2002','06/04/2002','01/02/2000','01/02/2000','01/02/2000','02/11/2001','02/11/2001','02/11/2001')
Temperature <-c(17,20,24,19,17,19,23,26,19,19,21,15,23,18,22,22,23,18,19,26,21,22,23,27,19,19,21,23,24,25,26,29,30,21,25,24,23)
df<-data.frame(sample_station, Date_dmy, Temperature)
1) Use findInterval to look up the date in the season_start vector and extract the associated season_name.
library(dplyr)
# given Date class vector returns vector of season names
date2season <- function(date) {
season_start <- c("0101", "0321", "0621", "0923", "1221") # mmdd
season_name <- c("Summer", "Autumn", "Winter", "Spring", "Summer")
mmdd <- format(date, "%m%d")
season_name[findInterval(mmdd, season_start)] ##
}
df %>% mutate(season = date2season(as.Date(Date_dmy, "%d/%m/%Y")))
giving:
sample_station Date_dmy Temperature season
1 A 01/01/2000 17 Summer
2 A 08/08/2000 20 Winter
3 A 16/03/2001 24 Summer
4 A 22/09/2001 19 Winter
5 A 01/06/2002 17 Autumn
...snip...
1a) The last line in date2season, marked ##, could optionally be replaced with
season_name[(mmdd >= "0101") + (mmdd >= "0321") + (mmdd >= "0621") +
(mmdd >= "0923") + (mmdd >= "1221")]
and in that case you don't need the line defining season_start either.
2) An alternative is to use case_when:
df %>%
mutate(mmdd = format(as.Date(Date_dmy, "%d/%m/%Y"), "%m%d"),
season = case_when(
mmdd <= "0320" ~ "Summer",
mmdd <= "0620" ~ "Autumn",
mmdd <= "0922" ~ "Winter",
mmdd <= "1220" ~ "Spring",
TRUE ~ "Summer")) %>%
select(-mmdd)
A subject was measured at several time points over several days. I have a column "resptime_s" (the time at which the subject answered the beep on his smartphone). Now I want to know the mean time between those (so between the rows of this column) with the night time taken out (night time is always from 22:30 till 7:30). Take as example:
The R script:
setwd("C:/Users/Hanne/Desktop/")
dat <- read.csv(file="datnew2.csv", sep=";",header=TRUE)
rows <- c(1:388) #time points
columns <- c(2,60) # datum and time
nVariables = 2
newdata<-dat[rows,columns]
head(newdata)
fun2 <- function(x){
bt <- as.integer(sub("(^\\d{1,2}):.*", "\\1", x))
f <- cumsum(c(FALSE, diff(bt) < 0))
d <- rep(as.Date("2018-01-01"), length.out = length(bt))
bt <- as.POSIXct(paste(d, x))
res <- sapply(split(bt, f), function(b) c(0, difftime(b[-1], b[1])))
unname(unlist(res))
}
fun2(newdata$resptime_s)
But the result isn't correct.
And with:
dput(head(newdata, 30))
I obtained this output:
Using the different functions for working with time intervals in lubridate gives the most elegant and easy to understand solution.
library(tidyverse)
library(lubridate)
data <- tribble(
~time_point, ~beeptime,
1, "08:30",
2, "11:13",
3, "12:08",
4, "17:20",
5, "22:47",
6, "7:36",
7, "9:40"
) %>%
mutate(beeptime = as_datetime(hm(beeptime)))
1. Define the daytime interval
day <- interval(
as_datetime(hm("07:30")),
as_datetime(hm("22:30"))
)
2. Keep daytime beeps and estimate the time (interval) between them
# %--% is basically the same as interval() above.
data_interval <-
data %>%
filter(beeptime %within% day) %>%
mutate(beep_interval = lag(beeptime) %--% beeptime)
3. Take the average
# You can use as.numeric() to extract (e.g.) minutes, which you can
# just pass to mean().
data_interval$beep_interval %>%
as.numeric("minutes") %>%
abs() %>%
mean(na.rm = TRUE)
#> [1] 247.6
Try the following. It pastes a date that increments every time the next hour is less than the previous one. Then difftime works as expected.
fun <- function(x){
bt <- as.integer(sub("(^\\d{1,2}):.*", "\\1", x))
# Day offset: goes up by one each time the hour drops, i.e. after each midnight
day_offset <- cumsum(c(FALSE, diff(bt) < 0))
d <- as.Date("2018-01-01") + day_offset
beeptime <- as.POSIXct(paste(d, x))
difftime(beeptime[-1], beeptime[1])
}
fun(newdata$beeptime)
#Time differences in hours
#[1] 2.716667 3.633333 8.833333 14.283333 23.100000 25.166667
Data.
newdata <-
structure(list(time_point = 1:7, beeptime = structure(1:7, .Label = c("08:30",
"11:13", "12:08", "17:20", "22:47", "7:36", "9:40"), class = "factor")), class = "data.frame", row.names = c(NA,
-7L))
Edit.
I believe that I have misunderstood the question. The OP does not want differences between the first hour and all others. What is needed is the differences restarting from zero every night.
If this is the case, the following function will do it.
fun2 <- function(x){
bt <- as.integer(sub("(^\\d{1,2}):.*", "\\1", x))
f <- cumsum(c(FALSE, diff(bt) < 0))
d <- rep(as.Date("2018-01-01"), length.out = length(bt))
bt <- as.POSIXct(paste(d, x))
res <- sapply(split(bt, f), function(b) c(0, difftime(b[-1], b[1])))
unname(unlist(res))
}
fun2(newdata$beeptime)
#[1] 0.000000 2.716667 3.633333 8.833333 14.283333 0.000000 2.066667
Another approach could be to convert beeptime to an offset (in seconds) from midnight using the lubridate package.
We can then write function(s) to calculate the difference in time excluding night time (22:30 - 7:30).
Before we start on the solution, let's have a look at the offsets in seconds from midnight for 7:30 and 22:30.
library(lubridate)
as.numeric(seconds(hm("7:30")))
# [1] 27000
as.numeric(seconds(hm("22:30")))
# [1] 81000
I have written two functions to calculate the difference between two times:
# Checks an individual time and shifts it to the nearest night boundary, so that
# time spent overnight can be excluded.
checkNightBoundry <- function(val){
if(val < 27000){
val = 27000
} else if(val > 81000) {
val = 81000
}
val
}
# Arguments are offsets from midnight in seconds
# Calculate the difference between two times, excluding night time
calcDifftime <- function(currVal, prevVal){
diffTime <- 0
currVal = checkNightBoundry(currVal)
prevVal = checkNightBoundry(prevVal)
if(currVal > prevVal){
diffTime = currVal - prevVal
}else if(currVal < prevVal){
diffTime = (81000 - prevVal) + (currVal - 27000)
}
diffTime
}
Now, use above functions:
library(dplyr)
library(lubridate)
df %>% mutate(beeptimeOffset = as.numeric(seconds(hm(beeptime)))) %>%
mutate(diffTime = mapply(calcDifftime,
beeptimeOffset, lag(beeptimeOffset, default = first(beeptimeOffset)))/3600)
# timepoint beeptime beeptimeOffset(sec) diffTime(hrs)
# 1 1 08:30 30600 0.0000000
# 2 2 11:13 40380 2.7166667
# 3 3 12:08 43680 0.9166667
# 4 4 17:20 62400 5.2000000
# 5 5 22:47 82020 5.1666667
# 6 6 7:36 27360 0.1000000
# 7 7 9:40 34800 2.0666667
Data:
df <- read.table(text =
"timepoint beeptime
1 08:30
2 11:13
3 12:08
4 17:20
5 22:47
6 7:36
7 9:40",
header = TRUE, stringsAsFactors = FALSE)
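The two scalar helpers above can also be written in vectorised form, which avoids mapply. The following is a sketch of the same clamping logic under the same 07:30 and 22:30 boundaries, using pmin/pmax and ifelse. The function names clamp_to_day and awake_diff are hypothetical, and the offsets are the seconds-from-midnight values shown in the table above.

```r
night_end   <- 7.5 * 3600    # 07:30 as seconds from midnight (27000)
night_start <- 22.5 * 3600   # 22:30 as seconds from midnight (81000)

# Push any time that falls in the night onto the nearest boundary
clamp_to_day <- function(s) pmin(pmax(s, night_end), night_start)

# Waking time between consecutive beeps, in seconds
awake_diff <- function(curr, prev) {
  curr <- clamp_to_day(curr)
  prev <- clamp_to_day(prev)
  ifelse(curr >= prev,
         curr - prev,                                  # same day
         (night_start - prev) + (curr - night_end))    # across one night
}

offsets <- c(30600, 40380, 43680, 62400, 82020, 27360, 34800)
prev <- c(offsets[1], head(offsets, -1))   # previous beep; first row compares with itself
diff_hours <- awake_diff(offsets, prev) / 3600
diff_hours
```

Like the original functions, this assumes consecutive beeps are at most one night apart.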
Here is an example of my dataframe. I am working in R.
date name count
2016-11-12 Joe 5
2016-11-15 Bob 5
2016-06-15 Nick 12
2016-10-16 Cate 6
I would like to add a column to my data frame that will tell me the season that corresponds to the date. I would like it to look like this:
date name count Season
2016-11-12 Joe 5 Winter
2016-11-15 Bob 5 Winter
2017-06-15 Nick 12 Summer
2017-10-16 Cate 6 Fall
I have started some code:
startWinter <- c(month.name[1], month.name[12], month.name[11])
startSummer <- c(month.name[5], month.name[6], month.name[7])
startSpring <- c(month.name[2], month.name[3], month.name[4])
# create a function to find the correct season based on the month
MonthSeason <- function(Month) {
# !is.na()
# ignores values with NA
# match()
# returns a vector of the positions of matches
# If the starting month matches a spring season, print "Spring". If the starting month matches a summer season, print "Summer" etc.
ifelse(!is.na(match(Month, startSpring)),
return("spring"),
return(ifelse(!is.na(match(Month, startWinter)),
"winter",
ifelse(!is.na(match(Month, startSummer)),
"summer","fall"))))
}
This code gives me the season for a month. I'm not sure if I am going about this problem in the right way. Can anyone help me out?
Thanks!
There are a couple of hacks, and their usability depends on whether you want to use meteorological or astronomical seasons. I'll offer both, I think they offer sufficient flexibility.
I'm going to use your second data provided, since it provides more than just "Winter".
txt <- "date name count
2016-11-12 Joe 5
2016-11-15 Bob 5
2017-06-15 Nick 12
2017-10-16 Cate 6"
dat <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
dat$date <- as.Date(dat$date)
The quickest method works well when seasons are defined strictly by month.
metseasons <- c(
"01" = "Winter", "02" = "Winter",
"03" = "Spring", "04" = "Spring", "05" = "Spring",
"06" = "Summer", "07" = "Summer", "08" = "Summer",
"09" = "Fall", "10" = "Fall", "11" = "Fall",
"12" = "Winter"
)
metseasons[format(dat$date, "%m")]
# 11 11 06 10
# "Fall" "Fall" "Summer" "Fall"
If you choose to use date ranges for your seasons that are not defined by month start/stop such as the astronomical seasons, here's another 'hack':
astroseasons <- as.integer(c("0000", "0320", "0620", "0922", "1221", "1232"))
astroseasons_labels <- c("Winter", "Spring", "Summer", "Fall", "Winter")
If you use proper Date or POSIX types, then you are including years, which makes things a little less-generic. One might think of using julian dates, but during leap years this produces anomalies. So, with the assumption that Feb 28 is never a seasonal boundary, I'm "numericizing" the month-day. Even though R does do character-comparisons just fine, cut expects numbers, so we convert them to integers.
Two safe-guards: because cut is either right-open (and left-closed) or right-closed (and left-open), then our two book-ends need to extend beyond the legal dates, ergo "0000" and "1232". There are other techniques that could work equally well here (e.g., using -Inf and Inf, post-integerization).
astroseasons_labels[ cut(as.integer(format(dat$date, "%m%d")), astroseasons, labels = FALSE) ]
# [1] "Fall" "Fall" "Spring" "Fall"
Notice that the third date is in Spring when using astronomical seasons and Summer otherwise.
This solution can easily be adjusted to account for the Southern hemisphere or other seasonal preferences/beliefs.
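To make the boundary behaviour of cut concrete, here is a small sketch that uses the same break points on a handful of hand-picked month-day integers. It also tries the -Inf/Inf book-ends mentioned above. The names season_labels, breaks_padded, breaks_inf and md are illustrative only.

```r
season_labels <- c("Winter", "Spring", "Summer", "Fall", "Winter")

# Book-ends just outside the legal month-day range ...
breaks_padded <- c(0, 320, 620, 922, 1221, 1232)
# ... or infinite book-ends, which avoid having to pick padding values
breaks_inf <- c(-Inf, 320, 620, 922, 1221, Inf)

# Month-day integers around the boundaries: Jan 1, Mar 20, Mar 21,
# Dec 21, Dec 22, Dec 31
md <- c(101, 320, 321, 1221, 1222, 1231)

# cut() is right-closed by default, so a break value belongs to the
# interval that ends there (Mar 20 is still Winter, Mar 21 is Spring)
by_padded <- season_labels[cut(md, breaks_padded, labels = FALSE)]
by_inf    <- season_labels[cut(md, breaks_inf, labels = FALSE)]
by_padded
identical(by_padded, by_inf)
```

Shifting a break by one changes which day starts the season, so the break values are the place to adjust for other conventions.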
Edit: motivated by @Kristofersen's answer (thanks), I looked into benchmarks. lubridate::month uses a POSIXct-to-POSIXlt conversion to extract the month, which can be over 10x faster than my format(x, "%m") method. As such:
metseasons2 <- c(
"Winter", "Winter",
"Spring", "Spring", "Spring",
"Summer", "Summer", "Summer",
"Fall", "Fall", "Fall",
"Winter"
)
Noting that as.POSIXlt returns 0-based months, we add 1:
metseasons2[ 1 + as.POSIXlt(dat$date)$mon ]
# [1] "Fall" "Fall" "Summer" "Fall"
Comparison:
library(lubridate)
library(microbenchmark)
set.seed(42)
x <- Sys.Date() + sample(1e3)
xlt <- as.POSIXlt(x)
microbenchmark(
metfmt = metseasons[ format(x, "%m") ],
metlt = metseasons2[ 1 + xlt$mon ],
astrofmt = astroseasons_labels[ cut(as.integer(format(x, "%m%d")), astroseasons, labels = FALSE) ],
astrolt = astroseasons_labels[ cut(100*(1+xlt$mon) + xlt$mday, astroseasons, labels = FALSE) ],
lubridate = sapply(month(x), seasons)
)
# Unit: microseconds
# expr min lq mean median uq max neval
# metfmt 1952.091 2135.157 2289.63943 2212.1025 2308.1945 3748.832 100
# metlt 14.223 16.411 22.51550 20.0575 24.7980 68.924 100
# astrofmt 2240.547 2454.245 2622.73109 2507.8520 2674.5080 3923.874 100
# astrolt 42.303 54.702 72.98619 66.1885 89.7095 163.373 100
# lubridate 5906.963 6473.298 7018.11535 6783.2700 7508.0565 11474.050 100
So the methods using as.POSIXlt(...)$mon are significantly faster. (@Kristofersen's answer could be improved by vectorizing it, perhaps with ifelse, but that still won't compare to the speed of the vector lookups with or without cut.)
You can do this pretty quickly with lubridate and a function to change the month number into a season.
library(lubridate)
seasons = function(x){
if(x %in% 2:4) return("Spring")
if(x %in% 5:7) return("Summer")
if(x %in% 8:10) return("Fall")
if(x %in% c(11,12,1)) return("Winter")
}
dat$Season = sapply(month(dat$date), seasons)
> dat
date name count Season
1 2016-11-12 Joe 5 Winter
2 2016-11-15 Bob 5 Winter
3 2016-06-15 Nick 12 Summer
4 2016-10-16 Cate 6 Fall
if your data is df:
# create dataframe for month and corresponding season
dfSeason <- data.frame(season = c(rep("Winter", 3), rep("Summer", 3),
rep("Spring", 3), rep("Fall", 3)),
month = month.name[c(11,12,1, 5:7, 2:4, 8:10)],
stringsAsFactors = F)
# make date as date
df$data <- as.Date(df$date)
# match the month of the date in df (format %B) with month in season
# then use it to index the season of dfSeason
df$season <- dfSeason$season[match(format(df$data, "%B"), dfSeason$month)]
I have a dataframe with a number of accounts, their status and the start and endtime for that status. I would like to report on the number of accounts in each of these statuses over a date range. The data looks like the df below, with the resulting report. (Actual data contains more state values. N/A values are shown with a dummy date far in the future.)
df <- data.frame(account = c(1,1,2,3),
state = c("Open","Closed","Open","Open"),
startdate = c("2016-01-01","2016-04-04","2016-03-02","2016-08-01"),
enddate = c("2016-04-04","2999-01-01","2016-05-02","2016-08-05")
)
report <- data.frame(date = seq(from = as.Date("2016-04-01"),by="1 day", length.out = 6),
number.open = c(2,2,2,1,1,1)
)
I have looked at options involving rowwise() and mutate from dplyr and foverlaps from data.table, but haven't been able to code it up so it works.
(See Checking if Date is Between two Dates in R)
We can use sapply to do this for us:
report$NumberOpen <-
sapply(report$date, function(x)
sum(as.Date(df1$startdate) < as.Date(x) &
as.Date(df1$enddate) > as.Date(x) &
df1$state == 'Open'))
# report
# date NumberOpen
# 1 2016-04-01 2
# 2 2016-04-02 2
# 3 2016-04-03 2
# 4 2016-04-04 1
# 5 2016-04-05 1
# 6 2016-04-06 1
data
df1 <- data.frame(account = c(1,1,2,3),
state = c("Open","Closed","Open","Open"),
startdate = c("2016-01-01","2016-04-04","2016-03-02","2016-08-01"),
enddate = c("2016-04-04","2999-01-01","2016-05-02","2016-08-05")
)
report <- data.frame(date = seq(from = as.Date("2016-04-01"),by="1 day", length.out = 6)
)
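For larger inputs, the same strict comparison can be done in one step with outer() instead of looping over the report dates. This is a sketch under the same example data and the same strict < and > rules as the sapply version. The names open, is_open and number_open are illustrative, and dates are converted to plain numbers before comparing to keep the example simple.

```r
df1 <- data.frame(account = c(1, 1, 2, 3),
                  state = c("Open", "Closed", "Open", "Open"),
                  startdate = c("2016-01-01", "2016-04-04", "2016-03-02", "2016-08-01"),
                  enddate = c("2016-04-04", "2999-01-01", "2016-05-02", "2016-08-05"),
                  stringsAsFactors = FALSE)
report_dates <- seq(as.Date("2016-04-01"), by = "1 day", length.out = 6)

# Keep only the periods in the state of interest
open <- df1[df1$state == "Open", ]
start_num <- as.numeric(as.Date(open$startdate))
end_num   <- as.numeric(as.Date(open$enddate))
day_num   <- as.numeric(report_dates)

# Logical matrix: rows are report dates, columns are open periods;
# TRUE where the date lies strictly inside the period
is_open <- outer(day_num, start_num, ">") & outer(day_num, end_num, "<")

# Count the open periods per report date
number_open <- rowSums(is_open)
data.frame(date = report_dates, number.open = number_open)
```

The matrix has one cell per date and period pair, so memory use grows with the product of the two lengths.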
I am trying to add a month to a date I have, but so far I have not found a straightforward way to do it. Following is what I tried.
d <- as.Date("2004-01-31")
d + 60
# [1] "2004-03-31"
Adding days won't help, as months differ in length.
seq(as.Date("2004-01-31"), by = "month", length = 2)
# [1] "2004-01-31" "2004-03-02"
The above might work, but again it's not straightforward.
It also seems to add 30 days or so to the date, which has issues like the below:
seq(as.Date("2004-01-31"), by = "month", length = 10)
# [1] "2004-01-31" "2004-03-02" "2004-03-31" "2004-05-01" "2004-05-31" "2004-07-01" "2004-07-31" "2004-08-31" "2004-10-01" "2004-10-31"
In the above, the month does not advance by exactly one at every step (February is skipped and March appears twice).
The following approach also failed for the month, but succeeded for the year:
d <- as.POSIXlt(as.Date("2010-01-01"))
d$year <- d$year +1
d
# [1] "2011-01-01 UTC"
d <- as.POSIXlt(as.Date("2010-01-01"))
d$month <- d$month +1
d
Error in format.POSIXlt(x, usetz = TRUE) : invalid 'x' argument
What is the right method to do this ?
Function %m+% from lubridate adds one month without exceeding last day of the new month.
library(lubridate)
(d <- ymd("2012-01-31"))
1 parsed with %Y-%m-%d
[1] "2012-01-31 UTC"
d %m+% months(1)
[1] "2012-02-29 UTC"
It is ambiguous when you say "add a month to a date".
Do you mean
1. add 30 days?
2. increase the month part of the date by 1?
In both cases a whole package for a simple addition seems a bit exaggerated.
For the first point, of course, the simple + operator will do:
d=as.Date('2010-01-01')
d + 30
#[1] "2010-01-31"
As for the second I would just create a one line function as simple as that (and with a more general scope):
add.months= function(date,n) seq(date, by = paste (n, "months"), length = 2)[2]
You can use it with arbitrary months, including negative:
add.months(d, 3)
#[1] "2010-04-01"
add.months(d, -3)
#[1] "2009-10-01"
Of course, if you want to add only and often a single month:
add.month=function(date) add.months(date,1)
add.month(d)
#[1] "2010-02-01"
If you add one month to the 31st of January, then, since the 31st of February is meaningless, the natural result is to carry the 3 missing days over into the following month, March. So, correctly:
add.month(as.Date("2010-01-31"))
#[1] "2010-03-03"
In case, for some very special reason, you need to put a ceiling to the last available day of the month, it's a bit longer:
add.months.ceil=function (date, n){
#no ceiling
nC=add.months(date, n)
#ceiling
date=as.Date(format(date, "%Y-%m-01")) # first day of the month, in base R
C=add.months(date, n+1)-1
#use ceiling in case of overlapping
if(nC>C) return(C)
return(nC)
}
As usual you could add a single month version:
add.month.ceil=function(date) add.months.ceil(date,1)
So:
d=as.Date('2010-01-31')
add.month.ceil(d)
#[1] "2010-02-28"
d=as.Date('2010-01-21')
add.month.ceil(d)
#[1] "2010-02-21"
And with decrements:
d=as.Date('2010-03-31')
add.months.ceil(d, -1)
#[1] "2010-02-28"
d=as.Date('2010-03-21')
add.months.ceil(d, -1)
#[1] "2010-02-21"
Besides, you didn't say whether you were interested in a scalar or a vector solution. As for the latter:
add.months.v= function(date,n) as.Date(sapply(date, add.months, n), origin="1970-01-01")
Note: the *apply family drops the Date class, which is why it has to be rebuilt with as.Date.
The vector version gives:
d=c(as.Date('2010/01/01'), as.Date('2010/01/31'))
add.months.v(d,1)
[1] "2010-02-01" "2010-03-03"
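If you would rather avoid sapply and also want the ceiling behaviour for whole vectors, one possible sketch works on the POSIXlt fields directly. The name `add_months_clamped` is made up for this example, and it assumes `n` is a single whole number and that `dates` contains no missing values.

```r
# Hypothetical helper (not part of the answer above): vectorised month shift
# that caps the day at the last day of the target month. Base R only.
add_months_clamped <- function(dates, n) {
  lt <- as.POSIXlt(dates)
  # First day of the target month; out-of-range months are normalised by as.Date()
  first <- lt
  first$mday <- rep(1L, length(dates))
  first$mon <- first$mon + n
  first_of_target <- as.Date(first)
  # Last day of the target month = first day of the following month minus one
  following <- first
  following$mon <- following$mon + 1
  days_in_target <- as.integer(format(as.Date(following) - 1, "%d"))
  # Keep the original day of month unless the target month is too short
  first_of_target + (pmin(lt$mday, days_in_target) - 1)
}

d <- as.Date(c("2010-01-01", "2010-01-31", "2012-01-31"))
format(add_months_clamped(d, 1))
#[1] "2010-02-01" "2010-02-28" "2012-02-29"
```

Because everything here is ordinary vector arithmetic, the Date class survives and no `origin` is needed.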
Hope you liked it))
Vanilla R only has a naive difftime class, but the lubridate package on CRAN lets you do what you ask:
require(lubridate)
d <- ymd(as.Date('2004-01-01')) %m+% months(1)
d
[1] "2004-02-01"
Hope that helps.
The simplest way is to convert Date to POSIXlt format.
Then perform the arithmetic operation as follows:
date_1m_fwd <- as.POSIXlt("2010-01-01")
date_1m_fwd$mon <- date_1m_fwd$mon +1
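The snippet above stops before showing the result. A slightly fuller sketch, assuming you start from a Date and want a Date back (the variable names are only illustrative), might look like this. Note that the field is called `mon` and is 0-based; there is no `month` field, which is why the assignment in the question produces an invalid object.

```r
d <- as.Date("2010-01-01")
lt <- as.POSIXlt(d)

# POSIXlt is a list of date-time fields; the month field is "mon" (0 = January)
lt$mon
#[1] 0

lt$mon <- lt$mon + 1

# Convert back so the result is an ordinary Date again
d_next <- as.Date(lt)
format(d_next)
#[1] "2010-02-01"

# A month index past December rolls over into the next year on conversion
dec <- as.POSIXlt(as.Date("2010-12-15"))
dec$mon <- dec$mon + 1
format(as.Date(dec))
#[1] "2011-01-15"
```

The same conversion also normalises day overflow, so 31 January plus one month comes back as a date in early March rather than the end of February.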
Moreover, in case you want to work with Date columns in a data.table, POSIXlt is unfortunately not supported as a column type.
You can still add months using base R code, as follows:
library(data.table)
dt <- as.data.table(seq(as.Date("2010-01-01"), length.out=5, by="month"))
dt[,shifted_month:=tail(seq(V1[1], length.out=length(V1)+3, by="month"),length(V1))]
Hope it helps.
"mondate" is somewhat similar to "Date" except that adding n adds n months rather than n days:
> library(mondate)
> d <- as.Date("2004-01-31")
> as.mondate(d) + 1
mondate: timeunits="months"
[1] 2004-02-29
Here's a function that doesn't require any packages to be installed. You give it a Date object (or a character that it can convert into a Date), and it adds n months to that date without changing the day of the month (unless the month you land on doesn't have enough days in it, in which case it defaults to the last day of the returned month). Just in case it doesn't make sense reading it, there are some examples below.
Function definition
addMonth <- function(date, n = 1){
if (n == 0){return(date)}
if (n %% 1 != 0){stop("Input Error: argument 'n' must be an integer.")}
# Check to make sure we have a standard Date format
if (is.character(date)){date = as.Date(date)}
# Turn the year, month, and day into numbers so we can play with them
y = as.numeric(substr(as.character(date),1,4))
m = as.numeric(substr(as.character(date),6,7))
d = as.numeric(substr(as.character(date),9,10))
# Run through the computation
i = 0
# Adding months
if (n > 0){
while (i < n){
m = m + 1
if (m == 13){
m = 1
y = y + 1
}
i = i + 1
}
}
# Subtracting months
else if (n < 0){
while (i > n){
m = m - 1
if (m == 0){
m = 12
y = y - 1
}
i = i - 1
}
}
# If past 28th day in base month, make adjustments for February
if (d > 28 & m == 2){
# If it's a leap year, return the 29th day
if ((y %% 4 == 0 & y %% 100 != 0) | y %% 400 == 0){d = 29}
# Otherwise, return the 28th day
else{d = 28}
}
# If 31st day in base month but only 30 days in end month, return 30th day
else if (d == 31){if (m %in% c(1, 3, 5, 7, 8, 10, 12) == FALSE){d = 30}}
# Turn year, month, and day into strings and put them together to make a Date
y = as.character(y)
# If month is single digit, add a leading 0, otherwise leave it alone
if (m < 10){m = paste('0', as.character(m), sep = '')}
else{m = as.character(m)}
# If day is single digit, add a leading 0, otherwise leave it alone
if (d < 10){d = paste('0', as.character(d), sep = '')}
else{d = as.character(d)}
# Put them together and return the result as a Date
return(as.Date(paste(y,'-',m,'-',d, sep = '')))
}
Some examples
Adding months
> addMonth('2014-01-31', n = 1)
[1] "2014-02-28" # February, non-leap year
> addMonth('2014-01-31', n = 5)
[1] "2014-06-30" # June only has 30 days, so day of month dropped to 30
> addMonth('2014-01-31', n = 24)
[1] "2016-01-31" # Increments years when n is a multiple of 12
> addMonth('2014-01-31', n = 25)
[1] "2016-02-29" # February, leap year
Subtracting months
> addMonth('2014-01-31', n = -1)
[1] "2013-12-31"
> addMonth('2014-01-31', n = -7)
[1] "2013-06-30"
> addMonth('2014-01-31', n = -12)
[1] "2013-01-31"
> addMonth('2014-01-31', n = -23)
[1] "2012-02-29"
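As an aside, the two while loops in addMonth step through the months one at a time. The same year/month rollover can also be computed in one go by counting months from a common origin. This is only a sketch of that idea: `shift_year_month` is a made-up name, and it covers just the year/month part, not the day-of-month adjustment.

```r
# Hypothetical helper: shift a (year, month) pair by n months without looping
shift_year_month <- function(y, m, n) {
  # Months counted from a common origin, with January as 0
  total <- y * 12 + (m - 1) + n
  # Integer division gives the year, the remainder gives the 0-based month
  list(year = total %/% 12, month = total %% 12 + 1)
}

fwd <- shift_year_month(2014, 1, 25)   # year 2016, month 2, as in the n = 25 example
back <- shift_year_month(2014, 1, -23) # year 2012, month 2, as in the n = -23 example
```

Since R's `%%` returns a non-negative remainder for a positive divisor, the same expression handles negative `n` without a separate branch.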
addedMonth <- seq(as.Date('2004-01-01'), length=2, by='1 month')[2]
addedQuarter <- seq(as.Date('2004-01-01'), length=2, by='1 quarter')[2]
I turned antonio's thoughts into a specific function:
library(DescTools)
> AddMonths(as.Date('2004-01-01'), 1)
[1] "2004-02-01"
> AddMonths(as.Date('2004-01-31'), 1)
[1] "2004-02-29"
> AddMonths(as.Date('2004-03-30'), -1)
[1] "2004-02-29"