Selecting Date Range from Column in R

I have a dataset in R that I read in using read.table (the table is named a). I want to select dates from '2007-02-01' to '2007-02-02'. The Date column is currently of class "character".
Date
1 16/12/2006
2 16/12/2006
3 16/12/2006
4 16/12/2006
5 16/12/2006
6 16/12/2006
I tried the following:
1. as.Date(a$Date) returns date in the format "0016-12-20"
2. a[a$Date >= '2007-02-01' & a$Date <= '2007-02-01'] returns all rows with 0 variables
3. strptime(a$Date,'%d%b%Y') returns NA values

Convert the Date column to class Date, using a format that matches the data (day/month/year), then subset:
df$Date <- as.Date(df$Date, '%d/%m/%Y')
subset(df, Date >= as.Date('2007-02-01') & Date <= as.Date('2007-02-02'))
You can also use dplyr and lubridate:
library(dplyr)
df %>%
  mutate(Date = lubridate::dmy(Date)) %>%
  filter(Date >= as.Date('2007-02-01') & Date <= as.Date('2007-02-02'))
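As a self-contained sketch of the base R route, here is the same idea on made-up rows (the data frame a below is hypothetical, not the asker's data). Two things matter: the format string has to match the text, and row subsetting needs the comma inside the brackets. Without the comma, a[...] selects columns rather than rows, which is likely why the second attempt returned 0 variables.

```r
# Hypothetical sample data in the same day/month/year text format
a <- data.frame(
  Date = c("16/12/2006", "01/02/2007", "02/02/2007", "03/02/2007"),
  stringsAsFactors = FALSE
)

# Parse with a format that matches the text: day, month, four-digit year
a$Date <- as.Date(a$Date, format = "%d/%m/%Y")

# Keep rows in the inclusive range; note the comma (rows first, then columns)
from <- as.Date("2007-02-01")
to   <- as.Date("2007-02-02")
in_range <- a[a$Date >= from & a$Date <= to, , drop = FALSE]
print(in_range)
```

drop = FALSE keeps the result a data frame even though there is only one column.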

Related

String to Date with leading X character

I'm trying to convert the Date column to date format but I keep getting an error. I think the problem might be that the date is a character and has an X before the year:
HMC.Close Date
1 39.71 X2007.01.03
2 40.04 X2007.01.04
3 38.67 X2007.01.05
4 38.89 X2007.01.08
5 38.91 X2007.01.09
6 37.94 X2007.01.10
This is the code I've been running:
stock_honda <- expand.grid("HMC" = HMC$HMC.Close) %>%
"Date" = as.Date(row.names(as.data.frame(HMC))) %>%
subset(Date >"2021-02-28" & Date < "2022-03-11")
Error in charToDate(x) :
character string is not in a standard unambiguous format
You can use gsub to first remove the "X" that is causing the problem, and then use ymd from the lubridate package to convert the strings into Dates. You can do the conversion with mutate(across(...)) from the dplyr package to keep everything in the tidyverse style.
library(dplyr)
library(lubridate)
df |>
  # Mutate Date to remove X and convert it to Date
  mutate(across(Date, function(x){
    ymd(gsub("X","", x))
  }))
# HMC.Close Date
#1 39.71 2007-01-03
#2 40.04 2007-01-04
#3 38.67 2007-01-05
#4 38.89 2007-01-08
#5 38.91 2007-01-09
#6 37.94 2007-01-10
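If you would rather not load lubridate, the same conversion can be sketched in base R. This assumes every value has the form "X" followed by yyyy.mm.dd; the vector x below is a hypothetical stand-in for the Date column. Characters in an as.Date format string that are not conversion specifications are matched literally, so the "X" can either be stripped or simply written into the format.

```r
# Hypothetical input mirroring the values shown in the question
x <- c("X2007.01.03", "X2007.01.04", "X2007.01.05")

# Option 1: strip the leading "X" (anchored with ^ so only the first character is touched)
d1 <- as.Date(sub("^X", "", x), format = "%Y.%m.%d")

# Option 2: treat the "X" as a literal character in the format string
d2 <- as.Date(x, format = "X%Y.%m.%d")

print(d1)
```

Both options return Date vectors, so the later range comparison works directly.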
Here is a pipeline that avoids prepending "X" to the dates in the first place:
library(quantmod)
getSymbols(c("FCAU.VI", "TYO", "VWAGY", "HMC"), na.rm = TRUE)
library(tidyverse)
stock_honda <- (HMC
%>% as.data.frame()
%>% rownames_to_column("Date")
%>% select(Date, HMC.Close)
%>% mutate(across(Date, lubridate::ymd))
%>% filter(between(Date, as.Date("2021-02-28"), as.Date("2022-03-11")))
)
It would be nice if there were a version of between that avoided the need to explicitly convert to dates. (filter("2021-02-28" < Date, Date < "2022-03-11") would also work for the last step.)
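A wrapper along those lines is easy to sketch in base R. The helper name between_dates is hypothetical (it is not part of dplyr or lubridate); it accepts the bounds as "YYYY-MM-DD" strings or Date objects and, like between, includes both endpoints.

```r
# Hypothetical helper (not part of dplyr): inclusive range check that
# accepts the bounds as "YYYY-MM-DD" strings or as Date objects
between_dates <- function(x, left, right) {
  stopifnot(inherits(x, "Date"))
  x >= as.Date(left) & x <= as.Date(right)
}

dates <- as.Date(c("2021-02-27", "2021-02-28", "2021-06-15",
                   "2022-03-11", "2022-03-12"))
keep <- between_dates(dates, "2021-02-28", "2022-03-11")
print(dates[keep])
```

Note that this is inclusive at both ends, whereas the original question used strict inequalities; swap >= and <= for > and < if the endpoints should be excluded.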

How to get date type format?

I have a data frame with a date column in yyyy-mm-dd format, e.g.
df <- data.frame(dat = seq.Date(from = as.Date("2021-01-01"), to = as.Date("2021-01-07"), by = 1))
I want to create a column of strings in this format:
for example, 2021-01-07 should look like 07-JAN-21.
Use format() to build the string and toupper() to upper-case the month abbreviation:
toupper(format(date_column, "%d-%b-%y"))
Applied to the data frame above:
> df$dat <- toupper(format(df$dat, "%d-%b-%y"))
> df
        dat
1 01-JAN-21
2 02-JAN-21
3 03-JAN-21
4 04-JAN-21
5 05-JAN-21
6 06-JAN-21
7 07-JAN-21
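One caveat: %b produces the month abbreviation in the session's locale, so the output above assumes an English locale. A minimal locale-independent sketch, using the built-in month.abb constant instead of %b, could look like this:

```r
d <- seq(as.Date("2021-01-01"), as.Date("2021-01-07"), by = "day")

# month.abb is a built-in constant of English month abbreviations,
# so the result does not depend on the session locale
mon <- toupper(month.abb[as.integer(format(d, "%m"))])
out <- paste(format(d, "%d"), mon, format(d, "%y"), sep = "-")
print(out)
```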

How can I subset a dataset to a specific year?

I have a dataset (Crime) with 6,847,944 observations. I am trying to downsize this data to only those occurring in the relevant year of 2016. The dates can be found in the "Date" column. I have tried all of the following for code:
#change dates to proper format#
Crime$Date = as.Date(Crime$Date, format = "%m/%d/%y")
#filter crimes to 2016#
ATTEMPT 1: Crime16 = subset(Crime$Date = as.Date("2016"))
RESULT 1: Error: unexpected '=' in "Crime16 = subset(Crime$Date ="
ATTEMPT 2: Crimes_2016 <- Crime[year(Date)==2016,]
RESULT 2: Error in as.POSIXlt.default(x, tz = tz(x)) : do not know how to convert 'x' to class “POSIXlt”
ATTEMPT 3: Crimes_2016 = subset(Crime, Date >=2016/1/1 & Date <= 2016/31/12)
RESULT 3: Creates data frame, but contains no observations.
ATTEMPT 4: morecrimes = subset(Crime, Date == 2016)
RESULT 4: Creates data frame, but contains no observations.
ATTEMPT 5: Crimes.2016 = selectByDate(Crime$Date = 2016)
RESULT 5: Error: unexpected '=' in "Crimes.2016 = selectByDate(Crime$Date ="
Without a proper reproducible example dataset I cannot be sure of what you are after but... taking the following dataframe as a test:
x <- data.frame(
"Date" = as.Date(c("2016-01-01", "2015-05-12", "2016-06-16"), format = "%Y-%m-%d"),
"Crime" = LETTERS[1:3])
Which gives:
> x
Date Crime
1 2016-01-01 A
2 2015-05-12 B
3 2016-06-16 C
This can be subset with a logical vector generated by format(x$Date, "%Y") == "2016", which formats each date as just its year and compares it to "2016". Using that vector as the row index of the data.frame returns the rows where it is TRUE:
> x[format(x$Date, "%Y") == "2016", ]
Date Crime
1 2016-01-01 A
3 2016-06-16 C
Alternatively you could use the dplyr function filter():
library(tidyverse)
# Route 1. Call filter() directly, passing the data frame as the first argument
filter(x, format(Date, "%Y") == "2016")
# Route 2. Pipe the data frame into filter()
x %>% filter(format(Date, "%Y") == "2016")
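A range comparison is another option, sketched below on made-up data (the Crime data frame here is hypothetical and assumes the raw dates are text like "01/15/2016" with a four-digit year, hence %Y). It also shows what attempt 3 was missing: a bare 2016/1/1 is evaluated as arithmetic division, not as a date, so the bounds have to be built with as.Date.

```r
# Hypothetical sample; assumes dates are text like "01/15/2016" (four-digit year)
Crime <- data.frame(
  Date = c("01/15/2016", "12/31/2015", "12/31/2016", "01/01/2017"),
  Type = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)
Crime$Date <- as.Date(Crime$Date, format = "%m/%d/%Y")

# Half-open range: on or after 1 Jan 2016 and strictly before 1 Jan 2017
Crime16 <- subset(Crime,
                  Date >= as.Date("2016-01-01") & Date < as.Date("2017-01-01"))
print(Crime16)
```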

How to change the date format & remove rows from dataframe before certain date R Studio

I have a dataframe with over 8.8 million observations and I need to remove rows from the dataframe before a certain date. Currently the date format is in MM/DD/YYYY but I would like to convert it to R date format (I believe YYYY-MM-DD).
When I run the code below, the dates come out in the correct R format, but the year is wrong: every date becomes 2020, even though none of the dates in my data frame are from 2020.
> dates <- nyc_call_data_sample$INCIDENT_DATETIME
> date <- as.Date(dates,
+ format = "%m/%d/%y")
> head(nyc_call_data_sample$INCIDENT_DATETIME)
[1] "07/01/2015" "04/24/2016" "04/01/2013" "02/07/2015" "06/27/2016" "05/04/2017"
> head(date)
[1] "2020-07-01" "2020-04-24" "2020-04-01" "2020-02-07" "2020-06-27" "2020-05-04"
> nyc_call_data_sample$INCIDENT_DATETIME <- strptime(as.character(nzd$date), "%d/%m/%y")
Also, I have data that goes back as far as 2013. How would I go about removing all rows from the dataframe that are before 01/01/2017?
Thanks!
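as.Date and basic subsetting (see ?Extract) are your friends here. Note that the format must use %Y (four-digit year) rather than %y (two-digit year): with %y, only the "20" of "2015" is read as the year, which is why every date came out as 2020.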
dat <- data.frame(
unformatted = c("07/01/2015", "04/24/2016", "04/01/2013", "02/07/2015", "06/27/2016", "05/04/2017")
)
dat$date <- as.Date(dat$unformatted, format = "%m/%d/%Y")
dat
# unformatted date
# 1 07/01/2015 2015-07-01
# 2 04/24/2016 2016-04-24
# 3 04/01/2013 2013-04-01
# 4 02/07/2015 2015-02-07
# 5 06/27/2016 2016-06-27
# 6 05/04/2017 2017-05-04
dat[ dat$date >= as.Date("2017-01-01"), ]
# unformatted date
# 6 05/04/2017 2017-05-04
(Feel free to remove the unformatted column with dat$unformatted <- NULL.)
With tidyverse:
library(dplyr)
dat %>%
  mutate(date = as.Date(unformatted, format = "%m/%d/%Y")) %>%
  select(-unformatted) %>%
  filter(date >= as.Date("2017-01-01"))
# date
# 1 2017-05-04
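To see the difference between the two format codes side by side, here is a small sketch using three of the values from the question:

```r
raw <- c("07/01/2015", "04/24/2016", "05/04/2017")

# %y reads only two digits ("20"), which R expands to the year 2020
wrong <- as.Date(raw, format = "%m/%d/%y")

# %Y reads the full four-digit year
right <- as.Date(raw, format = "%m/%d/%Y")

print(format(wrong))
print(format(right))
```

The first vector reproduces the 2020 dates from the question; the second keeps the original years.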

compare date variable with a list of dates

I have a df with a datetime variable (made with lubridate)
str(raw_data$date)
POSIXct[1:37166], format: "2016-11-04 09:12:38" "2016-11-04 09:04:08" "2016-11-04 09:04:14" "2016-11-04 09:08:01" "2016-11-04 09:11:56" ...
and a list of dates for a school term
vsdate<- c("2017/01/30","2017/03/31","2017/04/18","2017/06/30","2017/07/17","2017/09/22","2017/10/09","2017/12/22","2018/01/30","2018/03/29","2018/04/16","2018/06/29","2018/07/16","2018/09/21","2018/10/08","2018/12/21")
vsdate <- as_date(vsdate)
I want to check whether the dates in raw_data fall between the pairs of dates in the list. I have done this below, but I can't get it to work in the tidyverse:
vsdate<- c("2017/01/30","2017/03/31","2017/04/18","2017/06/30","2017/07/17","2017/09/22","2017/10/09","2017/12/22","2018/01/30","2018/03/29","2018/04/16","2018/06/29","2018/07/16","2018/09/21","2018/10/08","2018/12/21")
vsdate <- as.Date(vsdate)
raw_data$Vic.School.Term=0
raw_data[raw_data$date<=vsdate[2]& raw_data$date>=vsdate[1],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[4]& raw_data$date>=vsdate[3],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[6]& raw_data$date>=vsdate[5],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[8]& raw_data$date>=vsdate[7],"Vic.School.Term"]<-1
raw_data[raw_data$date<=vsdate[10]& raw_data$date>=vsdate[9],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[12]& raw_data$date>=vsdate[11],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[14]& raw_data$date>=vsdate[13],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[16]& raw_data$date>=vsdate[15],"Vic.School.Term"]<-1
and here is my failed attempt in the tidyverse:
raw_data<- raw_data <- mutate(school.term=case_when(
between(date,vsdate[1],vsdate[2] ~ 1)))
Error in between(date, vsdate[1], vsdate[2] ~ 1) :
Expecting a single value: [extent=3].
Thanks!
Your call to between is not closed properly. Its signature is between(x, left, right), but you have between(x, left, right ~ 1), so the formula ends up inside the third argument. Close between first and put ~ 1 after it. See below for the first few cases:
library(dplyr)
library(lubridate)
raw_data <- data.frame( date = c("2016-11-04 09:12:38", "2016-11-04 09:04:08",
"2016-11-04 09:04:14", "2016-11-04 09:08:01",
"2016-11-04 09:11:56", "2017-02-15 09:10:01",
"2017-05-01 10:00:00")
)
raw_data %>% mutate(date = ymd_hms(date)) -> raw_data
str(raw_data)
vsdate<- ymd(c("2017/01/30","2017/03/31","2017/04/18","2017/06/30",
"2017/07/17","2017/09/22","2017/10/09","2017/12/22",
"2018/01/30","2018/03/29","2018/04/16","2018/06/29",
"2018/07/16","2018/09/21","2018/10/08","2018/12/21"))
str(vsdate)
raw_data %>% mutate(school.term = case_when(between(as.Date(date), vsdate[1], vsdate[2]) ~ 1,
                                            between(as.Date(date), vsdate[3], vsdate[4]) ~ 1,
                                            TRUE ~ 0))
date school.term
1 2016-11-04 09:12:38 0
2 2016-11-04 09:04:08 0
3 2016-11-04 09:04:14 0
4 2016-11-04 09:08:01 0
5 2016-11-04 09:11:56 0
6 2017-02-15 09:10:01 1
7 2017-05-01 10:00:00 1
Also, note the as.Date call inside between. It converts the POSIXct date-times to class Date so that they can be compared with the Date values in vsdate.
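Writing one between line per term gets repetitive. As a rough base R sketch of a way to cover all terms at once, the start/end pairs can be split out of vsdate and each date checked against every pair. The sample dates in d are hypothetical, only the 2017 boundaries are used to keep it short, and both ends of each term are treated as inclusive (the original loop code mixes < and <=, so adjust if needed).

```r
# Term boundaries from the question: odd positions are starts, even are ends
vsdate <- as.Date(c("2017/01/30", "2017/03/31", "2017/04/18", "2017/06/30",
                    "2017/07/17", "2017/09/22", "2017/10/09", "2017/12/22"),
                  format = "%Y/%m/%d")
starts <- vsdate[c(TRUE, FALSE)]
ends   <- vsdate[c(FALSE, TRUE)]

# Hypothetical sample dates to classify
d <- as.Date(c("2016-11-04", "2017-02-15", "2017-05-01",
               "2017-07-05", "2017-12-22"))

# For each date, is it inside any [start, end] pair (both ends inclusive)?
in_term <- vapply(seq_along(d),
                  function(i) any(d[i] >= starts & d[i] <= ends),
                  logical(1))
school.term <- as.integer(in_term)
print(data.frame(date = d, school.term))
```

With POSIXct input, convert with as.Date first, as in the answer above.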
