Filter Data by Seasonal Ranges Over Several Years Based on Month and Day Column in R Studio - r

I am trying to filter a large dataset to contain results between a range of days and months over several years to evaluate seasonal objectives. My season is defined from 15 March through 15 September. I can't figure out how to filter the days so that they are only applied to March and September and not to the other months within the range. My dataframe is very large and contains proprietary information, but I think the most important information is that the dates are described by these columns: SampleDate (date formatted as %y%m%d), day (numeric), and month (numeric).
I have tried filtering using multiple conditions like so:
S1 <- S1 %>%
filter((S1$month >= 3 & S1$day >=15) , (S1$month<=9 & S1$day<=15 ))
I also attempted to set ranges using between for every year that I have data with no luck:
S1 %>% filter(between(SampleDate, as.Date("2010-03-15"), as.Date("2010-09-15") &
as.Date("2011-03-15"), as.Date("2011-09-15")&
as.Date("2012-03-15"), as.Date("2012-09-15")&
as.Date("2013-03-15"), as.Date("2013-09-15")&
as.Date("2014-03-15"), as.Date("2014-09-15")&
as.Date("2015-03-15"), as.Date("2015-09-15")&
as.Date("2016-03-15"), as.Date("2016-09-15")&
as.Date("2017-03-15"), as.Date("2017-09-15")&
as.Date("2018-03-15"), as.Date("2018-09-15")))
I am pretty new to R and can't find any solution online. I know there must be a somewhat simple way to do this! Any help is greatly appreciated!

Maybe something like this:
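library(data.table)
setDT(df)  # converts df to a data.table by reference
# convert a date like '2020-01-01' into '01-01'
# (date is the date column; SampleDate in the question)
df[, month_day := format(as.Date(date), "%m-%d")]
# zero-padded 'mm-dd' strings sort chronologically, so string comparison works
df[month_day >= '03-15' & month_day <= '09-15']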
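The first attempt in the question fails because the day conditions are applied to every month, so a date such as 10 June is dropped because its day is below 15. A minimal sketch of one way to restrict the day test to the two boundary months, in base R. The S1 data built here is made up for illustration; only the column names SampleDate, month, and day come from the question.

```r
# Hypothetical sample data: one row per day across two years
S1 <- data.frame(SampleDate = seq(as.Date("2010-01-01"),
                                  as.Date("2011-12-31"), by = "day"))
S1$month <- as.integer(format(S1$SampleDate, "%m"))
S1$day   <- as.integer(format(S1$SampleDate, "%d"))

# Months strictly inside the season need no day test;
# the day test applies only to March and September
in_season <- (S1$month > 3 & S1$month < 9) |
  (S1$month == 3 & S1$day >= 15) |
  (S1$month == 9 & S1$day <= 15)

season <- S1[in_season, ]
range(season$SampleDate)
```

The same logical expression should also work inside dplyr's filter() using the bare column names (month, day) instead of S1$month and S1$day.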

Related

R function for finding difference of dates with exceptions?

I was wondering if there is a function for finding the difference between an issue date and a maturity date, but with two maturity date sources. For example, I want to prioritize the dates in maturity date source 1 and subtract the issue date from them to find the difference. Then, if my dataset is missing dates from maturity date source 1, such as in lines 5 & 6, I want to use dates from maturity date source 2 to fill in the rest. I have tried the code below, but am unsure how to incorporate the data from maturity date source 2 without changing everything else. I have attached a picture for reference. Thank you in advance.
df$Maturity_Date_source_1 <- as.Date(c(df$Maturity_Date_source_1))
df$Issue_Date <- as.Date(c(df$Issue_Date))
df$difference <- (df$Maturity_Date_source_1 - df$Issue_Date) / 365.25
df$difference <- as.numeric(c(df$difference))
An option would be to coalesce the columns and then do the difference
library(dplyr)
df %>%
mutate(difference = as.numeric((coalesce(Maturity_Date_source_1,
Maturity_Date_source_2) - Issue_Date)/365.25))
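For readers without dplyr, the same fallback idea can be sketched in base R: start from source 1 and fill only its missing entries from source 2. The two-row data frame below is invented for illustration; the column names follow the question.

```r
# Hypothetical data: the second row has no date in source 1
df <- data.frame(
  Issue_Date             = as.Date(c("2010-01-01", "2012-06-30")),
  Maturity_Date_source_1 = as.Date(c("2020-01-01", NA)),
  Maturity_Date_source_2 = as.Date(c("2021-01-01", "2017-06-30"))
)

# Prefer source 1; fall back to source 2 only where source 1 is NA
maturity <- df$Maturity_Date_source_1
missing  <- is.na(maturity)
maturity[missing] <- df$Maturity_Date_source_2[missing]

# Date subtraction gives days; divide by 365.25 for approximate years
df$difference <- as.numeric(maturity - df$Issue_Date) / 365.25
df
```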

Calculate mean of one column for 14 rows before certain row, as identified by date for each group (year)

I would like to calculate the mean of Mean.Temp.c. before a certain date, such as 1963-03-23 as shown in the date2 column in this example. This is when peak snowmelt runoff occurred in 1963 in my area. I want to know the 10-day mean temperature before this date (i.e., 1963-03-23). How can I do it? I have 50 years of data, and the peak snowmelt date is different each year.
example data
You can try:
library(dplyr)
df %>%
  mutate(date2 = as.Date(as.character(date2)),
         # pass Date bounds so between() compares like with like
         ten_day_mean = mean(Mean.Temp.c[between(date2, as.Date("1963-03-14"), as.Date("1963-03-23"))]))
In this case the desired mean would populate the whole column.
Or with data.table:
library(data.table)
setDT(df)[between(as.Date(as.character(date2)), "1963-03-14", "1963-03-23"), ten_day_mean := mean(Mean.Temp.c)]
In the latter case you'd get NA for those days that are not relevant for your date range.
Supposing date2 is a Date field and your data.frame is called x:
start_date <- as.Date("1963-03-23")-10
end_date <- as.Date("1963-03-23")
mean(x$Mean.Temp.c.[x$date2 >= start_date & x$date2 <= end_date])
Now, if you have multiple years of interest, you could wrap this code within a for loop (or [s|l]apply) taking elements from a vector of dates.
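A rough sketch of that last suggestion, using vapply over a vector of peak dates. The data, the peak_dates vector, and the mean_before helper are all hypothetical. The window here is the 10 days strictly before each peak date, which is one possible reading of the question.

```r
# Hypothetical daily data covering two years
set.seed(1)
x <- data.frame(date = seq(as.Date("1963-01-01"),
                           as.Date("1964-12-31"), by = "day"))
x$Mean.Temp.c. <- rnorm(nrow(x))

# Hypothetical peak snowmelt dates, one per year
peak_dates <- as.Date(c("1963-03-23", "1964-04-02"))

# Mean temperature over the n_days days strictly before a peak date
mean_before <- function(peak, n_days = 10) {
  in_window <- x$date >= peak - n_days & x$date < peak
  mean(x$Mean.Temp.c.[in_window])
}

# Index into peak_dates so each element keeps its Date class
ten_day_means <- vapply(seq_along(peak_dates),
                        function(i) mean_before(peak_dates[i]),
                        numeric(1))
data.frame(peak = peak_dates, ten_day_mean = ten_day_means)
```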

Removing specific times and days every week from time dataframe

Been learning R for a couple months and stumbled across an issue that I can't seem to find yet on stackoverflow. I have a timeframe dataset dictated by:
ts <- seq.POSIXt(as.POSIXlt("2014-08-01 15:00"), as.POSIXlt("2017-08-04 19:33"), by="min")
ts <- format.POSIXct(ts,'%Y%m%d %H%M')
df <- data.frame(timestamp=ts)
I have seen how to remove specific times from every day, and how to remove complete days such as weekends/holidays, but I am looking to remove subsets from every week, specifically 8:00 on every Saturday to 9:00 on every Monday throughout the entire dataset. I have tried doing the reverse, subsetting the period I need using lubridate (thanks @Christian):
dfc = ymd_hm(df$timestamp)
df[day(dfc) == 2 & hour(dfc) >= 9 | day(dfc) == 7 & hour(dfc) >= 8,]
but it didn't seem to work.
Cheers.
You can't subset with square brackets when using lubridate; its functions are called like regular functions. Try replacing e.g. hour[dfc] with hour(dfc) and you should be fine.
edit: to subset a range, you need to be aware that == is not the same as >=.
edit2: a bit more of a pointer in the right direction:
ts_sat_until_monday = seq.POSIXt(as.POSIXlt("2014-08-02 09:00"),
as.POSIXlt("2014-08-04 08:00"), by = 1)
unique(day(ts_sat_until_monday))
unique(hour(ts_sat_until_monday))
#what about sunday? up to you
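One thing to watch for is that lubridate's day() returns the day of the month, not the day of the week, so a weekday test is needed instead. A minimal base R sketch of dropping Saturday 08:00 up to Monday 09:00, assuming a shorter two-week series and a UTC time zone for illustration. It relies on as.POSIXlt()$wday, where 0 is Sunday and 6 is Saturday.

```r
# Hypothetical two-week minute series (2014-08-02 is a Saturday)
ts <- seq(as.POSIXct("2014-08-01 15:00", tz = "UTC"),
          as.POSIXct("2014-08-15 15:00", tz = "UTC"), by = "min")

lt <- as.POSIXlt(ts)
hour_frac <- lt$hour + lt$min / 60   # time of day in hours

# Saturday from 08:00, all of Sunday, and Monday before 09:00
drop <- (lt$wday == 6 & hour_frac >= 8) |
  lt$wday == 0 |
  (lt$wday == 1 & hour_frac < 9)

kept <- ts[!drop]
df <- data.frame(timestamp = format(kept, "%Y%m%d %H%M"))
```

Whether the 09:00 Monday minute itself is kept or dropped is a choice; here it is kept.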

How to subset data according to date in R?

Simple enough question. I have data of US treasury bill rates, with two columns-
1) Date and 2) Rate. The data ranges back to 1960. I wish to subset the rates from 1990 onward, i.e. according to the date.
Code:-
data = read.csv("3mt-bill.csv")
rates= ?
So, I just want a vector of the t-bill rates, but from 1990 onwards.
How should I write the condition?
We first need to convert 'Date' to Date class, extract the year with format, check whether it is 1990 or later, and subset 'Rate' with that logical vector:
data$Rate[as.integer(format(as.Date(data$Date), "%Y")) >= 1990]
If the 'Date' column includes only the year, it is easier:
data$Rate[data$Date >= 1990]
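Just in case, if we need tidyverse:
library(tidyverse)
library(lubridate)  # provides ymd() and year(); not attached by older tidyverse versions
data %>%
  filter(year(ymd(Date)) >= 1990) %>%
  pull(Rate)  # pull() returns a vector; select(Rate) would return a one-column data frame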
Or using data.table
library(data.table)
setDT(data)[year(as.IDate(Date)) >= 1990, Rate]

Create indicator variables of holidays from a date column

I am still a bonehead novice so forgive me if this is a simple question, but I can't find the answer on stackoverflow. I would like to create a set of indicator variables for each of the major US holidays, just by applying a function to my date field that can detect which days are holidays, and then I could use model.matrix etc. to convert to a set of indicator variables.
For example, I have daily data from Jan 1 2012 through September 15th, 2013 and I would like to create a variable indicator for Easter.
I am currently using the timeDate package to pass a year to their function Easter() to find the date. I then type the dates into the following code to create an indicator variable.
Easter(2012)
EasterInd2012<-as.numeric(DATASET$Date=="2012-04-08")
The easiest way to get a general holiday indicator variable is to create a vector of all the holidays you're interested in and then match those dates in your data frame. Something like this should work:
library(timeDate)
# Sample data
Date <- seq(as.Date("2012-01-01"), as.Date("2013-09-15"), by="1 day")
DATASET <- data.frame(rnorm(624), Date)
# Vector of holidays
holidays <- c(as.Date("2012-01-01"),
as.Date(Easter(2013)),
as.Date("2012-12-25"),
as.Date("2012-12-31"))
# 1 if holiday, 0 if not. Could also be a factor, like c("Yes", "No")
DATASET$holiday <- ifelse(DATASET$Date %in% holidays, 1, 0)
You can either manually input the dates, or use some of timeDate's built-in holiday functions (the listHolidays() function shows all those). So you could also construct holidays like so:
holidays <- c(as.Date("2012-01-01"),
as.Date(Easter(2013)),
as.Date(USLaborDay(2012)),
as.Date(USThanksgivingDay(2012)),
as.Date(USMemorialDay(2012)),
as.Date("2012-12-25"),
as.Date("2012-12-31"))
To get specific indicators for each holiday, you'll need to do them one at a time:
EasterInd2012 <- ifelse(DATASET$Date==as.Date(Easter(2012)), 1, 0)
LaborDay2012 <- ifelse(DATASET$Date==as.Date(USLaborDay(2012)), 1, 0)
# etc.
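If there are many holidays, the one-at-a-time step could be looped instead. A small sketch in base R, assuming the holiday dates have already been collected into a named list. The holiday_dates list and the resulting column names are hypothetical, and the dates are typed in by hand here, though they could equally come from the timeDate functions above.

```r
# Sample date range from the question
Date <- seq(as.Date("2012-01-01"), as.Date("2013-09-15"), by = "day")
DATASET <- data.frame(Date)

# Hypothetical named list: one vector of dates per holiday
holiday_dates <- list(
  NewYear   = as.Date(c("2012-01-01", "2013-01-01")),
  Easter    = as.Date(c("2012-04-08", "2013-03-31")),
  Christmas = as.Date("2012-12-25")
)

# One 0/1 indicator column per holiday, e.g. EasterInd
for (h in names(holiday_dates)) {
  DATASET[[paste0(h, "Ind")]] <- as.integer(DATASET$Date %in% holiday_dates[[h]])
}

colSums(DATASET[, -1])
```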
