Subset between two dates [duplicate] - r

I am working with daily returns from a Brazilian index (IBOV) going back to 1993, and I am trying to figure out the best way to subset for periods between two dates.
The data frame (IBOV_RET) is as follows :
head(IBOV_RET)
DATE 1D_RETURN
1 1993-04-28 -0.008163265
2 1993-04-29 -0.024691358
3 1993-04-30 0.016877637
4 1993-05-03 0.000000000
5 1993-05-04 0.033195021
6 1993-05-05 -0.012048193
...
I set 2 variables DATE1 and DATE2 as dates
DATE1 <- as.Date("2014-04-01")
DATE2 <- as.Date("2014-05-05")
I was able to create a new subset using this code:
TEST <- IBOV_RET[IBOV_RET$DATE >= DATE1 & IBOV_RET$DATE <= DATE2,]
It worked, but I was wondering if there is a better way to subset the data between two dates, maybe using subset().

As already pointed out by @MrFlick, you can't get around the basic logic of subsetting. One way to make it easier to subset your specific data.frame is to define a function that takes two inputs, like DATE1 and DATE2 in your example, and returns the rows of IBOV_RET that fall between those two dates.
myfunc <- function(x,y){IBOV_RET[IBOV_RET$DATE >= x & IBOV_RET$DATE <= y,]}
DATE1 <- as.Date("1993-04-29")
DATE2 <- as.Date("1993-05-04")
Test <- myfunc(DATE1,DATE2)
#> Test
# DATE X1D_RETURN
#2 1993-04-29 -0.02469136
#3 1993-04-30 0.01687764
#4 1993-05-03 0.00000000
#5 1993-05-04 0.03319502
You can also enter the specific dates directly into myfunc:
myfunc(as.Date("1993-04-29"),as.Date("1993-05-04")) #will produce the same result
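A slightly more general variant, sketched here on the assumption that you may want to reuse the helper on other data frames: pass the data frame and the name of the date column as arguments instead of hard-coding IBOV_RET. The name subset_dates, its parameters, and the column name RET are hypothetical; the toy data only mirrors the first rows shown in the question.

```r
# Hypothetical helper: keep rows whose date column lies in [from, to] (inclusive).
subset_dates <- function(df, from, to, col = "DATE") {
  d <- df[[col]]
  # !is.na(d) drops rows with a missing date instead of returning NA rows
  df[!is.na(d) & d >= from & d <= to, , drop = FALSE]
}

# Toy data mirroring the head() output in the question
IBOV_RET <- data.frame(
  DATE = as.Date(c("1993-04-28", "1993-04-29", "1993-04-30",
                   "1993-05-03", "1993-05-04", "1993-05-05")),
  RET  = c(-0.008163265, -0.024691358, 0.016877637,
           0.000000000, 0.033195021, -0.012048193)
)

res <- subset_dates(IBOV_RET, as.Date("1993-04-29"), as.Date("1993-05-04"))
print(res)
```

Both bounds are inclusive here, matching the >= and <= used in the question.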

You can use the subset() function with the & operator, comparing the DATE column against both bounds:
subset(IBOV_RET, DATE >= DATE1 & DATE <= DATE2)
Updating for a more "tidyverse-oriented" approach:
library(dplyr)
IBOV_RET %>%
  filter(DATE >= DATE1, DATE <= DATE2) # comma is the same as &

There is no real other way to extract date ranges. The logic is the same as extracting a range of numeric values as well, you just need to do the explicit Date conversion as you've done. You can make your subsetting shorter as you would with any other subsetting task with subset or with. You can break ranges into intervals with cut (there is a specific cut.Date overload). But base R does not have any way to specify Date literals so you cannot avoid the conversion. I can't imagine what other sort of syntax you may have had in mind.
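To illustrate the cut() remark, here is a minimal sketch that bins a few dates by calendar month using cut() on a Date vector. The vectors dates and ret are assumptions that mirror the first rows of the question's data.

```r
dates <- as.Date(c("1993-04-28", "1993-04-29", "1993-04-30",
                   "1993-05-03", "1993-05-04", "1993-05-05"))
ret <- c(-0.008163265, -0.024691358, 0.016877637,
         0.000000000, 0.033195021, -0.012048193)

# cut() on a Date vector returns a factor whose levels are labelled
# by the first day of each interval (here: each calendar month)
month_bin <- cut(dates, breaks = "month")

# split the returns by month and count the observations in each bin
by_month <- split(ret, month_bin)
n_per_month <- lengths(by_month)
print(n_per_month)
```

This groups rows into intervals rather than selecting a single range, so it complements the >= / <= comparison rather than replacing it.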

What about:
DATE1 <- as.Date("1993-04-29")
DATE2 <- as.Date("1993-05-04")
# creating a date range with the start and end date:
dates <- seq(DATE1, DATE2, by="days")
IBOV_RET <- subset(IBOV_RET, DATE %in% dates)

I believe lubridate could help here:
library(lubridate)
daterange <- interval(DATE1, DATE2)
TEST <- IBOV_RET[which(IBOV_RET$DATE %within% daterange),]

I am fond of the dplyr package. Load it:
library("dplyr")
and then, as you did:
Date1 <- as.Date("2014-04-01")
Date2 <- as.Date("2014-05-05")
Finally:
test <- filter(IBOV_RET, DATE > Date1 & DATE < Date2) # use >= and <= to include the endpoints

You can use dplyr's between() function after converting the strings to dates (both bounds are inclusive):
df %>%
filter(between(date_column, as.Date("string-date-lower-bound"), as.Date("string-date-upper-bound")))

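Test <- IBOV_RET[IBOV_RET$DATE >= as.Date("2014-04-01") | IBOV_RET$DATE <= as.Date("1993-05-04"), ]
Here I am using the "or" operator |, so a row is kept when its date is on or after the first date or on or before the second date. Note that this selects the rows outside the span between the two dates; to keep the rows between two dates, use & as in the question.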

Related

How to split a data frame in R based on date when multiple rows have identical date stamp [duplicate]


How to aggregate using water years (Oct 1 2008 - Sept 30 2009)

I have daily precipitation measurements in R. My dates are in the format 2008-01-01 and span 10 years. I am trying to aggregate from 2008-10-01 to 2009-09-30, but I am not sure how. Is there a way in aggregate() to set a start date for the aggregation and grouping?
My current code is
data <- aggregate(data$total_snow_cm, by=list(data$year), FUN = 'sum')
but this gives me the total snowfall for each calendar year (Jan to Dec), whereas I want each total to run from Oct 2008 to Sept 2009.
Assuming your data are in long format, I'd do something like this:
library(tidyverse)
library(lubridate) # ymd(), month(), year(); loaded explicitly in case your tidyverse version does not attach it
# make sure R knows your dates are dates - you mention they're 'yyyy-mm-dd', so
yourdataframe <- yourdataframe %>%
  mutate(yourcolumnforprecipdate = ymd(yourcolumnforprecipdate))
# in this script or another, define a water year function:
# Jan-Sep keep their calendar year, Oct-Dec count towards the next one
water_year <- function(date) {
  ifelse(month(date) < 10, year(date), year(date) + 1)
}
# new wateryear column for your data, using your new function
yourdataframe <- yourdataframe %>%
  mutate(wateryear = water_year(yourcolumnforprecipdate))
# now group by water year (and location if there's more than one)
# and sum and create new data.frame
wy_sums <- yourdataframe %>%
  group_by(locationcolumn, wateryear) %>%
  summarize(wy_totalprecip = sum(dailyprecip))
For more info, read up on lubridate, the tidyverse package that the ymd() function comes from. It has other parsers such as ymd_hms(). mutate() is from the tidyverse's dplyr library. Both libraries are extremely useful!
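The same water-year grouping can also be sketched in base R with aggregate(), which is what the question asked about. This is only an illustration: the data frame snow and its constant 1 cm per day values are made up, and water_year is redefined here with format() so that no package is needed.

```r
# Water year: Jan-Sep keep their calendar year, Oct-Dec count towards the next one
water_year <- function(date) {
  y <- as.integer(format(date, "%Y"))
  m <- as.integer(format(date, "%m"))
  ifelse(m >= 10, y + 1L, y)
}

# Made-up daily data: 1 cm of snow every day from 2008-01-01 to 2009-12-31
snow <- data.frame(date = seq(as.Date("2008-01-01"), as.Date("2009-12-31"), by = "day"))
snow$total_snow_cm <- 1
snow$wateryear <- water_year(snow$date)

# Sum snowfall per water year using aggregate()'s formula interface
wy_sums <- aggregate(total_snow_cm ~ wateryear, data = snow, FUN = sum)
print(wy_sums)
```

With one unit per day, the total for water year 2009 (2008-10-01 to 2009-09-30) equals the number of days in that span, 365, which makes the grouping easy to check by eye.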
I'd like to answer the question as it was asked, which was how to do this with aggregate().
You may use with() to wrap the data specification around aggregate(). In the with() you can define date intervals as you can with numbers.
df1.agg <- with(df1[as.Date("2008-10-01") <= df1$year & df1$year <= as.Date("2009-09-30"), ],
aggregate(total_snow_cm, by=list(year), FUN=sum))
Another way is to use aggregate()'s formula interface, where data and, hence, also the interval can be specified inside the aggregate() call.
df1.agg <- aggregate(total_snow_cm ~ year,
data=df1[as.Date("2008-10-01") <= df1$year &
df1$year <= as.Date("2009-09-30"), ], FUN=sum)
Result
head(df1.agg)
# year total_snow_cm
# 1 2008-10-01 171
# 2 2008-10-02 226
# 3 2008-10-03 182
# 4 2008-10-04 129
# 5 2008-10-05 135
# 6 2008-10-06 222
Data
set.seed(42)
df1 <- data.frame(total_snow_cm=sample(120:240, 4018, replace=TRUE),
year=seq(as.Date("2000-01-01"),as.Date("2010-12-31"), by="day"))

Trying to count rows with non-missing date/category observations: why is this code not working?

I've got a data frame with five categorical variables and two date variables.
I'd like to get a count of the observations for which none of the categorical variables is missing AND for which the difference between the dates is less than or equal to six months. So for this data frame, it would be a count of 1 as only one observation (row 1) meets the criteria.
The code I've tried so far works on the minimal working example but doesn't work when I run it on my actual data set. When I take the code apart the bits and pieces work (eg as.numeric(difftime(white$dnf_DateDeath, white$RecruitmentFinal, units = "days")) <= 182.52) but when together as below I get [1] NA. I have no idea why.
Is there a way of building an ifelse() tree, so that the expressions might get evaluated step-wise? Any help would be much appreciated.
Starting point:
df <-
data.frame(sports=c(1,NA,1,1),car=c(1,NA,NA,1),hobbies=c(1,NA,1,1),
home=c(1,NA,NA,1),office=c(1,1,NA,1), start_date=c("01/01/2016",
"01/01/2016","01/01/2016","01/01/2016"),
leave_date=c("01/04/2016","01/03/2016",NA,"01/12/2016"))
I've tried using:
library(lubridate)
sum(!is.na(df$sports) &!is.na(df$hobbies) & !is.na(df$car) &
!is.na(df$home) & !is.na(df$office) &
as.period(interval(df$start_date, df$leave_date)) <= months(6))
And I've also tried:
sum(!is.na(df$sports) &!is.na(df$hobbies) & !is.na(df$car) &
!is.na(df$home) & !is.na(df$office) &
as.numeric(difftime(df$leave_date, df$start_date, units = "days"))
<= 182.52)
The following seems to work as expected.
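df2 <- df[complete.cases(df), ]
# parse the dd/mm/yyyy strings explicitly, then compare the gap in days
days <- difftime(as.Date(df2$leave_date, format = "%d/%m/%Y"), as.Date(df2$start_date, format = "%d/%m/%Y"), units = "days")
df2[abs(days) <= 365.25/2, ]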
# sports car hobbies home office start_date leave_date
#1 1 1 1 1 1 01/01/2016 01/04/2016
EDIT.
If you want to use package lubridate for date arithmetic, you could do
library(lubridate)
inx <- dmy(df2$start_date) + months(6) >= dmy(df2$leave_date) # >= so that exactly six months still counts
df2[inx, ]
# sports car hobbies home office start_date leave_date
#1 1 1 1 1 1 01/01/2016 01/04/2016
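Since the question asks for a count rather than the rows themselves, here is a base R sketch that returns the count directly. It assumes the dates are in dd/mm/yyyy format (consistent with the dmy() call above). One plausible reason for the NA on the real data is a row where all categories are present but a date is missing, because TRUE & NA is NA; including the date columns in complete.cases() avoids that, since FALSE & NA is FALSE.

```r
df <- data.frame(
  sports = c(1, NA, 1, 1), car = c(1, NA, NA, 1), hobbies = c(1, NA, 1, 1),
  home = c(1, NA, NA, 1), office = c(1, 1, NA, 1),
  start_date = c("01/01/2016", "01/01/2016", "01/01/2016", "01/01/2016"),
  leave_date = c("01/04/2016", "01/03/2016", NA, "01/12/2016"),
  stringsAsFactors = FALSE
)

# Parse the dd/mm/yyyy strings into Date objects
start <- as.Date(df$start_date, format = "%d/%m/%Y")
leave <- as.Date(df$leave_date, format = "%d/%m/%Y")

# A row counts if nothing is missing (dates included) and the gap is at most half a year
ok <- complete.cases(df) & as.numeric(leave - start) <= 365.25 / 2
n_ok <- sum(ok)
print(n_ok)
```

On this example data only the first row qualifies, so the count is 1, as the question expects.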

How do I create a string containing quotes and then parse and evaluate it?

I have a data frame with 3 years worth of sales data that I'm trying to convert to a time series. Manually creating subsets for each of the 36 months:
mydfJan2011 <- subset(myDataFrame,
as.Date("2011-01-01") <= myDataFrame$Dates &
myDataFrame$Dates <= as.Date("2011-01-31"))
...
mydfDec2013 <- subset(myDataFrame,
as.Date("2013-12-01") <= myDataFrame$Dates &
myDataFrame$Dates <= as.Date("2013-12-31"))
and then summing them up and putting them into a vector
counts[1] <- sum(mydfJan2011$itemsSold)
...
counts[36] <- sum(mydfDec2013$itemsSold)
to get the values for the time series works fine, but I'd like to make it a little more automatic as I have to create more than one time series, so I'm trying to turn it into a loop.
In order to do that, I need to create a string with a subset command like this:
"subset(myDataFrame,
as.Date("2011-01-01") <= myDataFrame$Dates &
myDataFrame$Dates <= as.Date("2011-01-31"))"
But when I use paste, the result is this:
myString
>"subset(myDataFrame, as.Date(\"2011-02-01\") <= myDataFrame$Dates & myDataFrame$Dates <= as.Date(\"2011-02-28\"))"
and
eval(parse(text = myString))
results in the following error message:
Error in charToDate(x) :
character string is not in a standard unambiguous format
whereas just typing in the command (without escapes) results in the subset I'm trying to create.
I've tried playing around with single and double quotes, substitute and deparse, but none of it results in any kind of subset of my data frame.
Any suggestions?
Even another way of splitting up the data by month and summing it up would be welcome.
Thanks,
Signe
Here is a solution using tapply:
with(sales, tapply(itemsSold, substr(Dates, 1, 7), sum))
Produces monthly sums (I limited my data to 9 months for illustrative purposes, but this extends to longer periods):
2011-01 2011-02 2011-03 2011-04 2011-05 2011-06 2011-07 2011-08 2011-09
1592.097 1468.427 1594.386 1563.014 1595.489 1560.361 1553.128 1663.705 1325.519
tapply computes the sum of values in a vector (sales$itemsSold) grouped by the values of another vector (substr(sales$Dates, 1, 7), which is basically "yyyy-mm"). with lets me avoid typing sales$ repeatedly. You should almost never have to use eval(parse(...)); there is almost always a better, faster way to do it without resorting to that.
And here is the data I used:
set.seed(1)
sales <- data.frame(Dates=seq(as.Date("2011-01-01"), as.Date("2011-09-30"), by="+1 day"))
sales$itemsSold <- runif(nrow(sales), 1, 100)
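Because the stated goal was a time series, the monthly sums can be passed straight to ts(). The sketch below reuses the sample data above and assumes the series starts in January 2011 with one value per month; format(Dates, "%Y-%m") is used instead of substr() but produces the same "yyyy-mm" keys.

```r
set.seed(1)
sales <- data.frame(Dates = seq(as.Date("2011-01-01"), as.Date("2011-09-30"), by = "day"))
sales$itemsSold <- runif(nrow(sales), 1, 100)

# Monthly totals keyed by "yyyy-mm"; the keys sort chronologically
monthly <- with(sales, tapply(itemsSold, format(Dates, "%Y-%m"), sum))

# Convert to a monthly time series starting in January 2011
sales_ts <- ts(as.numeric(monthly), start = c(2011, 1), frequency = 12)
print(sales_ts)
```

To build more than one series, the same two lines can be applied to each column of interest, with no string building or eval(parse(...)) involved.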
For reference, there are also several 3rd party packages that simplify this type of computation (see data.table, dplyr).
Here's a data.table approach that aggregates by year and month, using the first of the month as the respective group label:
library(data.table)
##
mDt <- Dt[
,list(monthSold=sum(itemsSold)),
keyby=list(mDay=as.Date(paste0(
year(Dates),"-",month(Dates),"-01")))]
##
R> head(mDt)
mDay monthSold
1: 2012-01-01 179
2: 2012-02-01 128
3: 2012-03-01 152
4: 2012-04-01 160
5: 2012-05-01 152
6: 2012-06-01 141
Data:
set.seed(123)
Dt <- data.table(
Dates=seq.Date(
from=as.Date("2012-01-01"),
to=as.Date("2014-12-31"),
by="day"),
itemsSold=rpois(1096,5))
