How to subset data according to date in R?

Simple enough question. I have data of US treasury bill rates, with two columns-
1) Date and 2) Rate. The data ranges back to 1960. I wish to subset the rates from 1990 onward, i.e. according to the date.
Code:-
data = read.csv("3mt-bill.csv")
rates= ?
So, I just want a vector of the t-bill rates, but from 1990 onwards.
How should I write the condition?

We first need to convert 'Date' to Date class, extract the year with format, check whether it is 1990 or later, and subset 'Rate' with that logical vector
data$Rate[as.integer(format(as.Date(data$Date), "%Y")) >= 1990]
If the 'Date' column contains only the year, it is simpler
data$Rate[data$Date >= 1990]
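If we prefer the tidyverse (year and ymd come from lubridate)
library(tidyverse)
library(lubridate)
data %>%
  filter(year(ymd(Date)) >= 1990) %>%
  pull(Rate)  # pull() returns a vector; select(Rate) would keep a one-column data frame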
Or using data.table
library(data.table)
setDT(data)[year(as.IDate(Date)) >= 1990, Rate]
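For a quick check of the base R approach, here is a minimal sketch on a made-up data frame. The rate values and the ISO date format (YYYY-MM-DD) are assumptions, since the original CSV is not shown.

```r
# Toy stand-in for read.csv("3mt-bill.csv"); the values are invented for illustration
data <- data.frame(
  Date = c("1989-11-01", "1989-12-01", "1990-01-01", "1990-02-01"),
  Rate = c(7.69, 7.63, 7.64, 7.74),
  stringsAsFactors = FALSE
)

# Parse the dates, extract the year as an integer, and compare numerically
yr <- as.integer(format(as.Date(data$Date), "%Y"))

# Plain numeric vector of rates from 1990 onward
rates <- data$Rate[yr >= 1990]
rates
```

If the dates in the file are stored in another layout, as.Date needs a matching format argument, for example format = "%m/%d/%Y".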

Related

How can I show the quarter as Q1, without the year, in R

I am studying R, and an exercise requires me to create a Quarter column whose values look like Q4.
I used the zoo and lubridate libraries, but the result I got was a year plus Q1.
I have a column InvoiceDate that holds a complete datetime.
What I need:
The original table, more specifically one column, has a date-time like this: 2010-12-01 08:26:00. The column name is InvoiceDate. I need to take this column and create other columns with specific values, such as quarter, year, month, etc.
What I achieved:
How do I achieve my goal?
You can use the inbuilt functions to extract the information that you need.
library(dplyr)
df %>%
  mutate(quarter = quarters(InvoiceDate),
         Month = format(InvoiceDate, '%b'),
         weekday = weekdays(InvoiceDate))
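The same idea also works without dplyr. The sketch below assumes InvoiceDate is (or has been converted to) POSIXct; the two timestamps are made up, and the Year and Month column names are only illustrative.

```r
# Made-up invoice timestamps in the format shown in the question
df <- data.frame(
  InvoiceDate = as.POSIXct(c("2010-12-01 08:26:00", "2011-02-15 10:00:00"),
                           tz = "UTC")
)

# quarters() returns only the quarter label, with no year attached
df$Quarter <- quarters(df$InvoiceDate)

# Year and month as integers, extracted with format()
df$Year  <- as.integer(format(df$InvoiceDate, "%Y"))
df$Month <- as.integer(format(df$InvoiceDate, "%m"))
df
```

If InvoiceDate was read in as text, it would first need converting, for example with as.POSIXct(InvoiceDate, format = "%Y-%m-%d %H:%M:%S").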

Filter Data by Seasonal Ranges Over Several Years Based on Month and Day Column in R Studio

I am trying to filter a large dataset to contain results between a range of days and months over several years to evaluate seasonal objectives. My season is defined from 15 March through 15 September. I can't figure out how to filter the days so that they are only applied to March and September and not the other months within the range. My dataframe is very large and contains proprietary information, but I think the most important information is that the dates are described by columns: SampleDate (date formatted as %y%m%d), day (numeric), and month (numeric).
I have tried filtering using multiple conditions like so:
S1 <- S1 %>%
filter((S1$month >= 3 & S1$day >=15) , (S1$month<=9 & S1$day<=15 ))
I also attempted to set ranges using between for every year that I have data with no luck:
S1 %>% filter(between(SampleDate, as.Date("2010-03-15"), as.Date("2010-09-15") &
as.Date("2011-03-15"), as.Date("2011-09-15")&
as.Date("2012-03-15"), as.Date("2012-09-15")&
as.Date("2013-03-15"), as.Date("2013-09-15")&
as.Date("2014-03-15"), as.Date("2014-09-15")&
as.Date("2015-03-15"), as.Date("2015-09-15")&
as.Date("2016-03-15"), as.Date("2016-09-15")&
as.Date("2017-03-15"), as.Date("2017-09-15")&
as.Date("2018-03-15"), as.Date("2018-09-15")))
I am pretty new to R and can't find any solution online. I know there must be a somewhat simple way to do this! Any help is greatly appreciated!
Maybe something like this:
library(data.table)
library(stringr)  # for str_sub()
df <- setDT(df)
# 'date' stands for the date column (SampleDate in the question)
# convert a date like this '2020-01-01' into this '01-01'
df[, `:=`(month_day = str_sub(date, 6, 10))]
df[month_day >= '03-15' & month_day <= '09-15']
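The numeric month and day columns from the question can also be used directly. In filter(), comma-separated conditions are combined with AND, so the first attempt in the question keeps only rows where day equals 15. A minimal sketch of one alternative, on made-up rows, applies the day limits to March and September only:

```r
# Made-up rows using the month and day columns described in the question
S1 <- data.frame(
  month = c(3, 3, 6, 9, 9, 12),
  day   = c(14, 15, 1, 15, 16, 25)
)

# In season: 15 March or later, any day from April to August, or up to 15 September
in_season <- with(S1,
  (month == 3 & day >= 15) |
  (month > 3 & month < 9) |
  (month == 9 & day <= 15)
)

season <- S1[in_season, ]
season
```

With dplyr, the same logical expression can go inside filter() using the bare column names. Because it ignores the year, it covers every year in the data at once.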

Subset a dataframe based on numerical values of a string inside a variable

I have a data frame which is a time series of meteorological measurement with monthly resolution from 1961 till 2018. I am interested in the variable that measures the monthly average temperature since I need the multi-annual average temperature for the summers.
To do this I must extract the fifth and sixth digits of the "DateVariable" column, which are the month.
The values in the time column are formatted like this:
"19610701". So I need the 07 (July) after 1961.
I started coding for one month for other purposes, so I have not tried anything worth mentioning. I guess that grepl could do the job, but I do not know how the "matching" operator works.
So I started with this code, which works.
summersmonth <- Df[DateVariable %like% "19610101" | DateVariable %like% "19610201"]
I am expecting code like this
summermonths <- Df[DateVariable %like% "**06**" | DateVariable %like% "**07**..]
So that all entries with month digits from 06 to 09 are saved in the new dataframe summermonths.
Thanks in advance for any reply or feedback regarding my question.
Update
Thanks to your answers I got the first part, which is to convert the variable with as.Date into a "month" format (class character).
Now I need to select the months from June to September.
A horrible way to get the result I want is to do several subsets and an rbind afterwards.
Sommer1<-subset(Df, MonthVar == "Mai")
Sommer2<-subset(Df, MonthVar == "Juli")
Sommer3<-subset(Df, MonthVar == "September")
SummerTotal<-rbind(Sommer1,Sommer2,Sommer3)
I would be very glad to see this written in a tidy way.
Update 2 - Solution
Here is the tidy way, following "Using multiple criteria in subset function and logical operators":
Veg_Seas<-subset(Df, subset = MonthVar %in% c("Mai","Juni","Juli","August","September"))
You can convert your date variable to Date class and take the month (month() here comes from lubridate; data.table provides one too):
library(lubridate)
allmonths <- month(as.Date(Df$DateVariable, format="%Y%m%d"))
Note that if your column was originally imported as a factor, you need to convert it to character first:
allmonths <- month(as.Date(as.character(Df$DateVariable), format="%Y%m%d"))
Then you can check whether it is a summer month:
summersmonth <- Df[allmonths %in% 6:9, ]
Example:
as.Date("20190702", format="%Y%m%d")
[1] "2019-07-02"
month(as.Date("20190702", format="%Y%m%d"))
[1] 7
We can use anydate from anytime to convert to Date class and then extract the month (month() is not part of anytime; it comes from lubridate or data.table)
library(anytime)
library(lubridate)
month(anydate(as.character(Df$DateVariable)))
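Since the question describes the month as the fifth and sixth digits of the string, it can also be read off with substr in base R, with no date conversion at all. This is a minimal sketch; the Temp column and all values are made up for illustration.

```r
# Made-up monthly records; DateVariable follows the "YYYYMMDD" layout from the question
Df <- data.frame(
  DateVariable = c("19610101", "19610601", "19610701", "19610901", "19611001"),
  Temp = c(-1.2, 16.4, 18.1, 13.9, 9.0),   # hypothetical temperature column
  stringsAsFactors = FALSE
)

# Characters 5 and 6 hold the month; convert them to an integer
mon <- as.integer(substr(Df$DateVariable, 5, 6))

# Keep June through September
summermonths <- Df[mon %in% 6:9, ]

# Multi-annual summer mean of the hypothetical temperature column
mean(summermonths$Temp)
```

If DateVariable was imported as a factor or as a number, wrapping it in as.character() before substr keeps the digit positions intact.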

Calculate mean of one column for 14 rows before certain row, as identified by date for each group (year)

I would like to calculate the mean of Mean.Temp.c. before a certain date, such as 1963-03-23, as shown in the date2 column in this example. This is when peak snowmelt runoff occurred in 1963 in my area. I want to know the mean temperature for the 10 days before this date (i.e., 1963-03-23). How can I do this? I have 50 years of data, and the peak snowmelt date is different each year.
example data
You can try:
library(dplyr)
df %>%
  mutate(date2 = as.Date(as.character(date2)),
         ten_day_mean = mean(Mean.Temp.c[between(date2, as.Date("1963-03-14"), as.Date("1963-03-23"))]))
In this case the desired mean would populate the whole column.
Or with data.table:
library(data.table)
setDT(df)[between(as.Date(as.character(date2)), as.Date("1963-03-14"), as.Date("1963-03-23")), ten_day_mean := mean(Mean.Temp.c)]
In the latter case you'd get NA for those days that are not relevant for your date range.
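Supposing date2 is a Date field and your data.frame is called x:
# note: start_date through end_date inclusive covers 11 calendar days; subtract 9 instead for exactly 10 days ending on the peak date
start_date <- as.Date("1963-03-23")-10
end_date <- as.Date("1963-03-23")
mean(x$Mean.Temp.c.[x$date2 >= start_date & x$date2 <= end_date])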
Now, if you have multiple years of interest, you could wrap this code within a for loop (or [s|l]apply) taking elements from a vector of dates.
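A minimal sketch of that sapply idea follows, on made-up data. The peak_dates vector and the mean_before helper are hypothetical names, and the window here is taken as the 10 days strictly before each peak date, which is one possible reading of the question.

```r
# Made-up daily series; date2 is the observation date, as in the answer above
x <- data.frame(
  date2 = seq(as.Date("1963-03-01"), as.Date("1964-04-30"), by = "1 day")
)
x$Mean.Temp.c. <- seq_len(nrow(x))   # invented values, just the row number

# Hypothetical peak snowmelt dates, one per year
peak_dates <- as.Date(c("1963-03-23", "1964-04-10"))

# Mean over the n_days days strictly before a peak date
mean_before <- function(peak, n_days = 10) {
  in_window <- x$date2 >= peak - n_days & x$date2 < peak
  mean(x$Mean.Temp.c.[in_window])
}

# Index into peak_dates so each element keeps its Date class
ten_day_means <- sapply(seq_along(peak_dates),
                        function(i) mean_before(peak_dates[i]))
ten_day_means
```

To include the peak day itself in the window, the comparison x$date2 < peak would become x$date2 <= peak, with n_days adjusted to match.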

Create indicator variables of holidays from a date column

I am still a bonehead novice so forgive me if this is a simple question, but I can't find the answer on Stack Overflow. I would like to create a set of indicator variables for each of the major US holidays, just by applying a function to my date field that can detect which days are holidays, and then I could use model.matrix, etc. to convert to a set of indicator variables.
For example, I have daily data from Jan 1 2012 through September 15th, 2013 and I would like to create a variable indicator for Easter.
I am currently using the timeDate package to pass a year to their function Easter() to find the date. I then type the dates into the following code to create an indicator variable.
Easter(2012)
EasterInd2012<-as.numeric(DATASET$Date=="2012-04-08")
The easiest way to get a general holiday indicator variable is to create a vector of all the holidays you're interested in and then match those dates in your data frame. Something like this should work:
library(timeDate)
# Sample data
Date <- seq(as.Date("2012-01-01"), as.Date("2013-09-15"), by="1 day")
DATASET <- data.frame(rnorm(624), Date)
# Vector of holidays
holidays <- c(as.Date("2012-01-01"),
as.Date(Easter(2013)),
as.Date("2012-12-25"),
as.Date("2012-12-31"))
# 1 if holiday, 0 if not. Could also be a factor, like c("Yes", "No")
DATASET$holiday <- ifelse(DATASET$Date %in% holidays, 1, 0)
You can either manually input the dates, or use some of timeDate's built-in holiday functions (the listHolidays() function shows all those). So you could also construct holidays like so:
holidays <- c(as.Date("2012-01-01"),
as.Date(Easter(2013)),
as.Date(USLaborDay(2012)),
as.Date(USThanksgivingDay(2012)),
as.Date(USMemorialDay(2012)),
as.Date("2012-12-25"),
as.Date("2012-12-31"))
To get specific indicators for each holiday, you'll need to do them one at a time:
EasterInd2012 <- ifelse(DATASET$Date==as.Date(Easter(2012)), 1, 0)
LaborDay2012 <- ifelse(DATASET$Date==as.Date(USLaborDay(2012)), 1, 0)
# etc.
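The one-at-a-time step can also be written as a loop over a named list of dates. This is only a sketch: the holiday_dates list and the "Ind" column suffix are hypothetical, and the dates are typed in by hand here, although timeDate functions such as Easter() or USLaborDay() wrapped in as.Date() could supply them instead.

```r
# Daily dates for 2012 only, to keep the example small
DATASET <- data.frame(
  Date = seq(as.Date("2012-01-01"), as.Date("2012-12-31"), by = "1 day")
)

# Named holiday dates, entered by hand; each element may hold several dates
holiday_dates <- list(
  Easter    = as.Date("2012-04-08"),
  LaborDay  = as.Date("2012-09-03"),
  Christmas = as.Date("2012-12-25")
)

# One 0/1 indicator column per holiday, named EasterInd, LaborDayInd, ...
for (h in names(holiday_dates)) {
  DATASET[[paste0(h, "Ind")]] <- as.integer(DATASET$Date %in% holiday_dates[[h]])
}

colSums(DATASET[, -1])
```

Because each list element can be a vector of dates, the same loop covers data spanning several years, for example Easter = as.Date(c("2012-04-08", "2013-03-31")).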

Resources