String to Date with leading X character - r

I'm trying to convert the Date column to date format but I keep getting an error. I think the problem might be that the date is a character and has an X before the year:
HMC.Close Date
1 39.71 X2007.01.03
2 40.04 X2007.01.04
3 38.67 X2007.01.05
4 38.89 X2007.01.08
5 38.91 X2007.01.09
6 37.94 X2007.01.10
This is the code I've been running:
stock_honda <- expand.grid("HMC" = HMC$HMC.Close) %>%
"Date" = as.Date(row.names(as.data.frame(HMC))) %>%
subset(Date >"2021-02-28" & Date < "2022-03-11")
Error in charToDate(x) :
character string is not in a standard unambiguous format

You can use gsub to first remove the "X" that is causing a problem and then use ymd from lubridate package to convert the strings into Dates. Additionally, you can make that conversion using mutate(across(...)) from the dplyr package to do everything in a tidyverse-way.
library(dplyr)
library(lubridate)
df |>
  # Mutate Date to remove X and convert it to Date
  mutate(across(Date, function(x) {
    ymd(gsub("X", "", x))
  }))
# HMC.Close Date
#1 39.71 2007-01-03
#2 40.04 2007-01-04
#3 38.67 2007-01-05
#4 38.89 2007-01-08
#5 38.91 2007-01-09
#6 37.94 2007-01-10
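If loading lubridate is not desirable, base R can do the same conversion, because as.Date accepts literal characters in its format string. This is a minimal sketch, assuming every string looks like X2007.01.03; the vector name x_dates is made up for illustration.

```r
# Hypothetical sample mirroring the Date column from the question
x_dates <- c("X2007.01.03", "X2007.01.04", "X2007.01.05")

# The leading "X" and the dots are matched literally by the format string
parsed <- as.Date(x_dates, format = "X%Y.%m.%d")

# Equivalent two-step version: strip the prefix first, then parse
parsed_alt <- as.Date(sub("^X", "", x_dates), format = "%Y.%m.%d")

print(parsed)
# [1] "2007-01-03" "2007-01-04" "2007-01-05"
```

Anchoring the pattern with ^ removes only a leading X, which is a little safer than removing every X in the string.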

Here is a pipeline that avoids prepending "X" to the dates in the first place:
library(quantmod)
getSymbols(c("FCAU.VI", "TYO", "VWAGY", "HMC"), na.rm = TRUE)
library(tidyverse)
stock_honda <- (HMC
  %>% as.data.frame()
  %>% rownames_to_column("Date")
  %>% select(Date, HMC.Close)
  %>% mutate(across(Date, lubridate::ymd))
  %>% filter(between(Date, as.Date("2021-02-28"), as.Date("2022-03-11")))
)
It would be nice if there were a version of between that avoided the need to explicitly convert to dates. (filter("2021-02-28" < Date, Date < "2022-03-11") would also work for the last step, although the strict comparisons exclude the two endpoint dates that between includes.)
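The X prefix most likely comes from R repairing the dates into syntactically valid names, which is what make.names does (data.frame applies it when check.names = TRUE). That is an assumption about where the question's names came from; the sketch below only demonstrates the renaming behaviour itself.

```r
# Date-like strings are not valid R names, so make.names() repairs them:
# a leading digit gets an "X" prefix and "-" becomes "."
raw_names <- c("2007-01-03", "2007-01-04")
fixed_names <- make.names(raw_names)
print(fixed_names)
# [1] "X2007.01.03" "X2007.01.04"

# Undo the repair and recover real dates
recovered <- as.Date(fixed_names, format = "X%Y.%m.%d")
print(recovered)
# [1] "2007-01-03" "2007-01-04"
```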

Related

Transform number chain into date

I am trying to transform a list of numbers (e.g. 20200119) into a valid date (here: 2020-01-19)
This is my trial data:
df <- data.frame(c(20200119, 20180718, 20180729, 20150502, 20010301))
colnames(df)[1] = "Dates"
And this is what I tried so far:
df <- as_date(df)
df <- as.Date.numeric(df)
df <- as.Date.factor(df)
None of them works, unfortunately.
I also tried to separate the numbers, but I couldn't manage that either.
Can somebody help me?
Convert the numbers to character first, then convert that to a Date with the format %Y%m%d:
as.Date(as.character(df$Dates), "%Y%m%d")
#[1] "2020-01-19" "2018-07-18" "2018-07-29" "2015-05-02" "2001-03-01"
Another option is strptime with the same format (note that it returns a POSIXlt date-time rather than a Date):
df <- data.frame(c(20200119, 20180718, 20180729, 20150502, 20010301))
colnames(df)[1] = "Dates"
df$Dates2 <- strptime(df$Dates, format = "%Y%m%d")
df
#> Dates Dates2
#> 1 20200119 2020-01-19
#> 2 20180718 2018-07-18
#> 3 20180729 2018-07-29
#> 4 20150502 2015-05-02
#> 5 20010301 2001-03-01
Created on 2023-01-12 with reprex v2.0.2
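The two answers above return different classes, which can matter later when filtering or joining. This is a small sketch of the difference, using a shortened copy of the question's numbers; the variable names are illustrative only.

```r
dates_num <- c(20200119, 20180718, 20010301)

# strptime() gives a POSIXlt date-time (midnight in the local time zone)
lt <- strptime(dates_num, format = "%Y%m%d")
class(lt)
# [1] "POSIXlt" "POSIXt"

# as.Date() on the character form gives a plain Date
d <- as.Date(as.character(dates_num), format = "%Y%m%d")
class(d)
# [1] "Date"

# Impossible calendar dates become NA instead of raising an error
bad <- as.Date("20200231", format = "%Y%m%d")
is.na(bad)
# [1] TRUE
```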

How to convert a "char" column to datetime column in large datasets

I am working with large datasets in which one column is stored as a character type instead of a date-time type. I am trying to convert it but have not been able to.
Do you have any suggestions for this problem? It would be very helpful for me.
Thanks in advance
Code which I am using right now:
c_data$dt_1 <- lubridate::parse_date_time(c_data$started_at,"ymd HMS")
getting output:
2027- 05- 20 20:10:03
but desired output is
2020-05-20 10:03
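Here is another way using lubridate. The strings have no seconds field, so use dmy_hm rather than dmy_hms (dmy_hms would misread "2020 10:03" as year 20, hour 20, minute 10, second 03):
library(dplyr)
library(lubridate)
df <- tibble(start_at = c("27/05/2020 10:03", "25/05/2020 10:47"))
df %>%
  mutate(start_at = dmy_hm(start_at))
# A tibble: 2 x 1
  start_at
  <dttm>
1 2020-05-27 10:03:00
2 2020-05-25 10:47:00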
In R, dates and times print in a single standard format. You can change its format to the one you need, but the result is then of type character.
If you want to show the data as year-month-day hour:minute, you can use format as -
format(Sys.time(), '%Y-%m-%d %H:%M')
#[1] "2021-08-27 17:54"
For the entire column you can apply this as -
c_data$dt_2 <- format(c_data$dt_1, '%Y-%m-%d %H:%M')
Read ?strptime for different formatting options.
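To make the parse-then-format distinction concrete, here is a base R sketch. It assumes the input looks like the day/month/year strings used in the other answers and that UTC is an acceptable time zone; both are assumptions, not details from the question.

```r
# Parse day/month/year strings without a seconds field (UTC assumed here)
started_at <- c("27/05/2020 10:03", "25/05/2020 10:47")
parsed <- as.POSIXct(started_at, format = "%d/%m/%Y %H:%M", tz = "UTC")

# format() controls the display but returns character, not a date-time
shown <- format(parsed, "%Y-%m-%d %H:%M")
print(shown)
# [1] "2020-05-27 10:03" "2020-05-25 10:47"
class(shown)
# [1] "character"
```

Keeping the parsed column for calculations and creating the formatted one only for display avoids losing date arithmetic.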
Using anytime
library(dplyr)
library(anytime)
addFormats("%d/%m/%Y %H:%M")
df %>%
mutate(start_at = anytime(start_at))
-output
# A tibble: 2 x 1
start_at
<dttm>
1 2020-05-27 10:03:00
2 2020-05-25 10:47:00

How to change the date format & remove rows from dataframe before certain date R Studio

I have a dataframe with over 8.8 million observations and I need to remove rows from the dataframe before a certain date. Currently the date format is in MM/DD/YYYY but I would like to convert it to R date format (I believe YYYY-MM-DD).
When I run the code that I have below, it puts them in the correct R format, but it does not keep the correct date. For some reason, it makes the dates 2020. None of the dates in my data frame have the year 2020
> dates <- nyc_call_data_sample$INCIDENT_DATETIME
> date <- as.Date(dates,
+ format = "%m/%d/%y")
> head(nyc_call_data_sample$INCIDENT_DATETIME)
[1] "07/01/2015" "04/24/2016" "04/01/2013" "02/07/2015" "06/27/2016" "05/04/2017"
> head(date)
[1] "2020-07-01" "2020-04-24" "2020-04-01" "2020-02-07" "2020-06-27" "2020-05-04"
> nyc_call_data_sample$INCIDENT_DATETIME <- strptime(as.character(nzd$date), "%d/%m/%y")
Also, I have data that goes back as far as 2013. How would I go about removing all rows from the dataframe that are before 01/01/2017
Thanks!
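as.Date and basic subsetting (see ?Extract) are your friends here. The years came out as 2020 because %y matches a two-digit year, so only the "20" of "2015" is read; use %Y for four-digit years.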
dat <- data.frame(
unformatted = c("07/01/2015", "04/24/2016", "04/01/2013", "02/07/2015", "06/27/2016", "05/04/2017")
)
dat$date <- as.Date(dat$unformatted, format = "%m/%d/%Y")
dat
# unformatted date
# 1 07/01/2015 2015-07-01
# 2 04/24/2016 2016-04-24
# 3 04/01/2013 2013-04-01
# 4 02/07/2015 2015-02-07
# 5 06/27/2016 2016-06-27
# 6 05/04/2017 2017-05-04
dat[ dat$date >= as.Date("2017-01-01"), ]
# unformatted date
# 6 05/04/2017 2017-05-04
(Feel free to remove the unformatted column with dat$unformatted <- NULL.)
With tidyverse:
library(dplyr)
dat %>%
mutate(date = as.Date(unformatted, format = "%m/%d/%Y")) %>%
select(-unformatted) %>%
filter(date >= as.Date("2017-01-01"))
# date
# 1 2017-05-04
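The 2020 dates in the question can be reproduced directly by comparing the two year specifiers. This is a minimal sketch on two of the question's values:

```r
x <- c("07/01/2015", "04/24/2016")

# %y reads only two digits, so "2015" is consumed as "20" -> year 2020
wrong <- as.Date(x, format = "%m/%d/%y")
print(wrong)
# [1] "2020-07-01" "2020-04-24"

# %Y reads the full four-digit year
right <- as.Date(x, format = "%m/%d/%Y")
print(right)
# [1] "2015-07-01" "2016-04-24"
```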

compare date variable with a list of dates

I have a df with a datetime variable (made with lubridate)
str(raw_data$date)
POSIXct[1:37166], format: "2016-11-04 09:12:38" "2016-11-04 09:04:08" "2016-11-04 09:04:14" "2016-11-04 09:08:01" "2016-11-04 09:11:56" ...
and a list of dates for a school term
vsdate<- c("2017/01/30","2017/03/31","2017/04/18","2017/06/30","2017/07/17","2017/09/22","2017/10/09","2017/12/22","2018/01/30","2018/03/29","2018/04/16","2018/06/29","2018/07/16","2018/09/21","2018/10/08","2018/12/21")
vsdate <- as_date(vsdate)
I want to compare if the dates in the list are between the dates in raw_data. I have done this below, but I can't get it to work in the tidyverse:
vsdate<- c("2017/01/30","2017/03/31","2017/04/18","2017/06/30","2017/07/17","2017/09/22","2017/10/09","2017/12/22","2018/01/30","2018/03/29","2018/04/16","2018/06/29","2018/07/16","2018/09/21","2018/10/08","2018/12/21")
vsdate <- as.Date(vsdate)
raw_data$Vic.School.Term=0
raw_data[raw_data$date<=vsdate[2]& raw_data$date>=vsdate[1],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[4]& raw_data$date>=vsdate[3],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[6]& raw_data$date>=vsdate[5],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[8]& raw_data$date>=vsdate[7],"Vic.School.Term"]<-1
raw_data[raw_data$date<=vsdate[10]& raw_data$date>=vsdate[9],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[12]& raw_data$date>=vsdate[11],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[14]& raw_data$date>=vsdate[13],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[16]& raw_data$date>=vsdate[15],"Vic.School.Term"]<-1
and here is my failed attempt in the tidyverse:
raw_data<- raw_data <- mutate(school.term=case_when(
between(date,vsdate[1],vsdate[2] ~ 1)))
Error in between(date, vsdate[1], vsdate[2] ~ 1) :
Expecting a single value: [extent=3].
Thanks!
Your between() call is not closed properly: the right-hand side of the formula (~ 1) ended up inside its parentheses. The signature is between(x, left, right), but you wrote between(x, left, right ~ 1). Here it is for the first two terms:
library(dplyr)
library(lubridate)
raw_data <- data.frame( date = c("2016-11-04 09:12:38", "2016-11-04 09:04:08",
"2016-11-04 09:04:14", "2016-11-04 09:08:01",
"2016-11-04 09:11:56", "2017-02-15 09:10:01",
"2017-05-01 10:00:00")
)
raw_data %>% mutate(date = ymd_hms(date)) -> raw_data
str(raw_data)
vsdate<- ymd(c("2017/01/30","2017/03/31","2017/04/18","2017/06/30",
"2017/07/17","2017/09/22","2017/10/09","2017/12/22",
"2018/01/30","2018/03/29","2018/04/16","2018/06/29",
"2018/07/16","2018/09/21","2018/10/08","2018/12/21"))
str(vsdate)
raw_data %>% mutate(school.term = case_when(between(as.Date(date), vsdate[1], vsdate[2]) ~ 1,
                                            between(as.Date(date), vsdate[3], vsdate[4]) ~ 1,
                                            TRUE ~ 0))
date school.term
1 2016-11-04 09:12:38 0
2 2016-11-04 09:04:08 0
3 2016-11-04 09:04:14 0
4 2016-11-04 09:08:01 0
5 2016-11-04 09:11:56 0
6 2017-02-15 09:10:01 1
7 2017-05-01 10:00:00 1
Also note the as.Date() call inside between(). It converts the POSIXct date-times to Date so that they are compared with vsdate on the same scale.
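Writing one between() line per term gets repetitive with many terms. As a sketch of a more general approach in base R, the boundaries can be split into start and end vectors and each day checked against all pairs at once. It assumes both endpoints are inclusive (the question mixes < and <=), and the stamps vector is hypothetical data standing in for raw_data$date.

```r
# Term boundaries from the question: odd positions start a term, even positions end it
vsdate <- as.Date(c("2017/01/30", "2017/03/31", "2017/04/18", "2017/06/30",
                    "2017/07/17", "2017/09/22", "2017/10/09", "2017/12/22"),
                  format = "%Y/%m/%d")
starts <- vsdate[c(TRUE, FALSE)]
ends   <- vsdate[c(FALSE, TRUE)]

# Hypothetical timestamps standing in for raw_data$date
stamps <- as.POSIXct(c("2016-11-04 09:12:38", "2017-02-15 09:10:01",
                       "2017-05-01 10:00:00", "2017-07-05 12:00:00"), tz = "UTC")
days <- as.Date(stamps)

# 1 if the day falls inside any [start, end] pair, otherwise 0
in_term <- vapply(seq_along(days),
                  function(i) as.numeric(any(days[i] >= starts & days[i] <= ends)),
                  numeric(1))
print(in_term)
# [1] 0 1 1 0
```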

Vectorised time zone conversion with lubridate

I have a data frame with a column of date-time strings:
library(tidyverse)
library(lubridate)
testdf = data_frame(
mytz = c('Australia/Sydney', 'Australia/Adelaide', 'Australia/Perth'),
mydt = c('2018-01-17T09:15:00', '2018-01-17T09:16:00', '2018-01-17T09:18:00'))
testdf
# A tibble: 3 x 2
# mytz mydt
# <chr> <chr>
# 1 Australia/Sydney 2018-01-17T09:15:00
# 2 Australia/Adelaide 2018-01-17T09:16:00
# 3 Australia/Perth 2018-01-17T09:18:00
I want to convert these date-time strings to POSIX date-time objects with their respective timezones:
testdf %>% mutate(mydt_new = ymd_hms(mydt, tz = mytz))
Error in mutate_impl(.data, dots) :
Evaluation error: tz argument must be a single character string.
In addition: Warning message:
In if (tz != "UTC") { :
the condition has length > 1 and only the first element will be used
I get the same result if I use ymd_hms without a timezone and pipe it into force_tz. Is it fair to conclude that lubridate doesn't support any sort of vectorisation when it comes to timezone operations?
Another option is map2. It may be better to keep the results in a list column, because a single POSIXct vector has only one time zone attribute and combining the values would coerce them all to one tz.
library(tidyverse)
out <- testdf %>%
mutate(mydt_new = map2(mydt, mytz, ~ymd_hms(.x, tz = .y)))
If required, it can be unnested (the values are then shown in a single time zone)
out %>%
  unnest(mydt_new)
The values in the list are
out %>%
pull(mydt_new)
#[[1]]
#[1] "2018-01-17 09:15:00 AEDT"
#[[2]]
#[1] "2018-01-17 09:16:00 ACDT"
#[[3]]
#[1] "2018-01-17 09:18:00 AWST"
The message "tz argument must be a single character string" indicates that more than one time zone was passed to ymd_hms(). To make sure only one time zone reaches the function at a time, I used rowwise(). Note that I am not in an Australian time zone, so I am not sure whether my output is identical to yours.
testdf <- tibble(mytz = c('Australia/Sydney', 'Australia/Adelaide', 'Australia/Perth'),
                 mydt = c('2018-01-17 09:15:00', '2018-01-17 09:16:00', '2018-01-17 09:18:00'))
testdf %>%
  rowwise() %>%
  mutate(mydt_new = ymd_hms(mydt, tz = mytz))
mytz mydt mydt_new
<chr> <chr> <dttm>
1 Australia/Sydney 2018-01-17 09:15:00 2018-01-17 06:15:00
2 Australia/Adelaide 2018-01-17 09:16:00 2018-01-17 06:46:00
3 Australia/Perth 2018-01-17 09:18:00 2018-01-17 09:18:00
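Because a POSIXct column can hold only one time zone, one way to keep the result unambiguous is to store the instants in UTC and keep the zone names in their own column. The following is a base R sketch of that idea rather than a lubridate feature; it assumes the system time zone database knows the three Australian zones.

```r
mytz <- c("Australia/Sydney", "Australia/Adelaide", "Australia/Perth")
mydt <- c("2018-01-17T09:15:00", "2018-01-17T09:16:00", "2018-01-17T09:18:00")

# Parse each string in its own zone and keep the result as seconds since the epoch
secs <- mapply(
  function(x, tz) as.numeric(as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S", tz = tz)),
  mydt, mytz, USE.NAMES = FALSE
)

# One POSIXct vector can carry only one time zone, so store the instants in UTC
instants <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
print(instants)
# [1] "2018-01-16 22:15:00 UTC" "2018-01-16 22:46:00 UTC" "2018-01-17 01:18:00 UTC"

# Show one instant in its original zone when needed
format(instants[1], tz = mytz[1], usetz = TRUE)
```

The local wall-clock time can always be recovered by formatting an instant with the zone stored alongside it.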

Resources