Hi guys, I have a list of dates in this weird format: X1.22.20 X1.23.20 (month.day.year),
and I would like to convert them to dates like "2020-06-11" (and then format them with '%d %b %Y').
I tried this:
> min.date <- min(dates)
> max.date <- max(dates)
> min.date.txt <- min.date %>% format('%d %b %Y')
> max.date.txt <- max.date %>% format('%d %b %Y') %>% paste('UTC')
> min.date
[1] "2002-10-10"
The value is clearly wrong, because I know for sure that there are no dates from 2002 in this data.
Any help?
Thanks
Assuming that the question is how to convert the input x shown below to Date class, use as.Date with a format that corresponds to the input: it must start with X, have dots where the input has dots, and so on. See ?strptime for documentation on the percent codes.
x <- c("X1.22.20", "X1.23.20") # input
as.Date(x, format = "X%m.%d.%y")
## [1] "2020-01-22" "2020-01-23"
Note that if you got those dates like this:
Lines <- "1.22.20 1.23.20
1 2
3 4"
read.table(text = Lines, header = TRUE)
## X1.22.20 X1.23.20
## 1 1 2
## 2 3 4
then the X can be avoided using check.names = FALSE as follows:
read.table(text = Lines, header = TRUE, check.names = FALSE)
## 1.22.20 1.23.20
## 1 1 2
## 2 3 4
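A minimal sketch tying this back to the min/max step in the question, assuming the dates are available as a character vector in the X-prefixed form (the sample vector x below is made up for illustration). Once the strings are real Date objects, min() and max() compare chronologically rather than as text:

```r
# Hypothetical values in the X<month>.<day>.<year> form from the question
x <- c("X1.22.20", "X1.23.20", "X6.11.20")

# Convert to Date; the format string mirrors the input, including the leading X
dates <- as.Date(x, format = "X%m.%d.%y")

# min/max now operate on dates, not on character strings
min.date <- min(dates)
max.date <- max(dates)

# ISO output is locale-independent; '%d %b %Y' would print the month
# abbreviation in the current locale instead
format(min.date, "%Y-%m-%d")
format(max.date, "%Y-%m-%d")
```

If the values are column names of a data frame, names(df) could be passed in place of x.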
I read in an excel file, where 1 column contains dates in different format: excel format (e.g. 43596) and text (e.g. "01.01.2020").
To convert excel format one can use as.Date(as.numeric(df$date), origin = "1899-12-30")
to convert text one can use as.Date(df$date, format = "%d.%m.%Y")
These work for individual values, but when I try ifelse as:
df$date <- ifelse(length(df$date)==5,
as.Date(as.numeric(df$date), origin = "1899-12-30"),
as.Date(df$date, format = "%d.%m.%Y"))
or a for loop:
for (i in length(x)) {
if(nchar(x[i])==5) {
y[i] <- as.Date(as.numeric(x[i]), origin = "1899-12-30")
} else {x[i] <- as.Date(x[i], , format = "%d.%m.%Y"))}
}
print(x)
It does not work because of:
"character string is not in a standard unambiguous format"
Could you advise a better way to convert the different date formats into a single appropriate one?
I have 2 solutions for it.
The first is to change the code, which I don't like because it depends on the xlsx date formats:
> df <- tibble(date = c("01.01.2020","43596"))
>
> df$date <- as.Date(ifelse(nchar(df$date)==5,
+ as.Date(as.numeric(df$date), origin = "1899-12-30"),
+ as.Date(df$date, format = "%d.%m.%Y")), origin = "1970-01-01")
Warning message:
In as.Date(as.numeric(df$date), origin = "1899-12-30") :
NAs introduced by coercion
>
> df$date
[1] "2020-01-01" "2019-05-11"
>
The second is to save the document as CSV and read it with the read_csv() function from the readr package. That solved everything for me.
You could use sapply to apply ifelse to each value:
df$date <- as.Date(sapply(df$date,function(date) ifelse(nchar(date)==5,
as.Date(as.numeric(date), origin = "1899-12-30"),
as.Date(date, format = "%d.%m.%Y"))),
origin="1970-01-01")
df
# A tibble: 6 x 2
contract date
<dbl> <date>
1 231429 2019-05-11
2 231437 2020-01-07
3 231449 2021-01-01
4 231459 2020-03-03
5 231463 2020-10-27
6 231466 2011-03-17
A tidyverse solution using rowwise
library(dplyr)
library(lubridate)
df %>%
rowwise() %>%
mutate(date_new=as.Date(ifelse(grepl("\\.",date),
as.character(dmy(date)),
as.character(as.Date(as.numeric(date), origin="1899-12-30"))))) %>%
ungroup()
# A tibble: 6 × 3
contract date date_new
<dbl> <chr> <date>
1 231429 43596 2019-05-11
2 231437 07.01.2020 2020-01-07
3 231449 01.01.2021 2021-01-01
4 231459 03.03.2020 2020-03-03
5 231463 44131 2020-10-27
6 231466 40619 2011-03-17
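Since ifelse() drops the Date class (which is why the answers above wrap the result in another as.Date call), one alternative is to fill a Date vector by logical index. This is only a sketch under assumptions: the sample vector raw is made up, and any all-digit string is treated as an Excel serial number.

```r
# Hypothetical mixed input: Excel serial numbers and dd.mm.yyyy text
raw <- c("43596", "07.01.2020", "01.01.2021", "44131")

# Assumption: strings consisting only of digits are Excel serials
is_serial <- grepl("^[0-9]+$", raw)

# Start with an all-NA Date vector and fill each group separately,
# so as.numeric() never sees the text dates and no coercion warning is raised
out <- rep(as.Date(NA), length(raw))
out[is_serial]  <- as.Date(as.numeric(raw[is_serial]), origin = "1899-12-30")
out[!is_serial] <- as.Date(raw[!is_serial], format = "%d.%m.%Y")

out
```

Anything that matches neither pattern stays NA, which makes unexpected formats easy to spot with is.na(out).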
I have a data frame with a date column in yyyy-mm-dd format,
for example:
df= data.frame(dat = seq.Date(from= as.Date("2021-01-01") , to = as.Date("2021-01-07"), by =1))
I want to create a column of strings in this format:
example : 2021-01-07 should look like 07-JAN-21
toupper(format(date_column, "%d-%b-%y"))
Here it is applied to the example data:
> df$dat <- toupper(format(df$dat, "%d-%b-%y"))
> df
dat
1 01-JAN-21
2 02-JAN-21
3 03-JAN-21
4 04-JAN-21
5 05-JAN-21
6 06-JAN-21
7 07-JAN-21
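One caveat: %b prints the month abbreviation in the current locale, so the result above assumes an English locale. A possible locale-independent sketch uses R's built-in month.abb constant, which is always English (the vector d below is made up for illustration):

```r
# Hypothetical sample dates
d <- as.Date(c("2021-01-01", "2021-01-07"))

# month.abb holds the English abbreviations regardless of locale;
# index it with the numeric month taken from the date
mon <- toupper(month.abb[as.integer(format(d, "%m"))])

# Assemble DD-MON-YY from the locale-independent pieces
out <- paste(format(d, "%d"), mon, format(d, "%y"), sep = "-")
out
```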
I have a dataframe with over 8.8 million observations and I need to remove rows from the dataframe before a certain date. Currently the date format is in MM/DD/YYYY but I would like to convert it to R date format (I believe YYYY-MM-DD).
When I run the code below, it puts them in the correct R format, but it does not keep the correct date: for some reason, every year becomes 2020, although none of the dates in my data frame are from 2020.
> dates <- nyc_call_data_sample$INCIDENT_DATETIME
> date <- as.Date(dates,
+ format = "%m/%d/%y")
> head(nyc_call_data_sample$INCIDENT_DATETIME)
[1] "07/01/2015" "04/24/2016" "04/01/2013" "02/07/2015" "06/27/2016" "05/04/2017"
> head(date)
[1] "2020-07-01" "2020-04-24" "2020-04-01" "2020-02-07" "2020-06-27" "2020-05-04"
> nyc_call_data_sample$INCIDENT_DATETIME <- strptime(as.character(nzd$date), "%d/%m/%y")
Also, my data goes back as far as 2013. How would I go about removing all rows from the data frame that are before 01/01/2017?
Thanks!
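as.Date and basic extraction (see ?Extract) are your friends here. The 2020 dates come from %y, which reads only a two-digit year: it consumes the "20" of "2015" and ignores the rest. Use %Y for four-digit years.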
dat <- data.frame(
unformatted = c("07/01/2015", "04/24/2016", "04/01/2013", "02/07/2015", "06/27/2016", "05/04/2017")
)
dat$date <- as.Date(dat$unformatted, format = "%m/%d/%Y")
dat
# unformatted date
# 1 07/01/2015 2015-07-01
# 2 04/24/2016 2016-04-24
# 3 04/01/2013 2013-04-01
# 4 02/07/2015 2015-02-07
# 5 06/27/2016 2016-06-27
# 6 05/04/2017 2017-05-04
dat[ dat$date >= as.Date("2017-01-01"), ]
# unformatted date
# 6 05/04/2017 2017-05-04
(Feel free to remove the unformatted column with dat$unformatted <- NULL.)
With tidyverse:
library(dplyr)
dat %>%
mutate(date = as.Date(unformatted, format = "%m/%d/%Y")) %>%
select(-unformatted) %>%
filter(date >= as.Date("2017-01-01"))
# date
# 1 2017-05-04
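To make the %y versus %Y difference concrete, here is a small sketch on a few of the sample values from the question. The round-trip check at the end is just one possible safeguard, not something the answers above prescribe:

```r
x <- c("07/01/2015", "04/24/2016", "05/04/2017")

# %y reads two digits, so "2015" is parsed as year "20" (i.e. 2020)
wrong <- as.Date(x, format = "%m/%d/%y")
# %Y reads the full four-digit year
right <- as.Date(x, format = "%m/%d/%Y")

format(wrong)
format(right)

# Sanity check: formatting the parsed dates back should reproduce the input
roundtrip_ok <- identical(format(right, "%m/%d/%Y"), x)

# Keep only dates on or after 2017-01-01
kept <- right[right >= as.Date("2017-01-01")]
```

On 8.8 million rows, a round-trip check like this costs one extra format() call and catches a wrong format string immediately.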
I have a column with dates that are formatted like this:
yyyymm (e.g. 201809)
I want them to be formatted like this:
mm.yyyy (e.g. 09.2018)
I tried:
FF5factors$date <- strptime(FF5factors$date, format= "%Y%m")
format(FF5factors$date, format="%m.%Y")
But it only returns NA values.
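What about appending a day and parsing the result as a full date (%m is the month, %M would be minutes, and as.Date needs a day of the month whenever a month is given):
d <- '201809'
format(as.Date(paste0(d, '01'), '%Y%m%d'), '%m.%Y')
[1] "09.2018"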
Here are some alternatives. The question did not provide date in reproducible form so we assume the first line below although the first 4 alternatives will also work with date <- "201809" and with date <- factor(201809) .
date <- 201809
# 1
sub("(....)(..)", "\\2.\\1", date)
## [1] "09.2018"
# 2
library(zoo)
format(as.yearmon(format(date), "%Y%m"), "%m.%Y")
## [1] "09.2018"
# 3
paste(substr(date, 5, 6), substr(date, 1, 4), sep = ".")
## [1] "09.2018"
# 4
format(as.Date(paste0(date, "01"), "%Y%m%d"), "%m.%Y")
## [1] "09.2018"
# 5
sprintf("%02d.%d", date %% 100, date %/%100)
## [1] "09.2018"
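Applied to a whole column, alternative 4 could look like the sketch below. The data frame contents are made up (only the name FF5factors comes from the question), and the date_fmt column name is hypothetical. Going through as.Date has the side benefit that an impossible month such as 201813 yields NA instead of being silently reshuffled:

```r
# Hypothetical stand-in for the data in the question
FF5factors <- data.frame(date = c(201809, 201812, 201901))

# Append a day so the string is a complete date, then parse it
parsed <- as.Date(paste0(FF5factors$date, "01"), format = "%Y%m%d")

# Stop early if any value was not a valid yyyymm
stopifnot(!anyNA(parsed))

# Store the mm.yyyy string in a new (hypothetical) column
FF5factors$date_fmt <- format(parsed, "%m.%Y")
FF5factors$date_fmt
```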
My Code is reading in a CSV file and converting the time stamp column to the R time format
DF <- read.csv("DF.CSV",head=TRUE,sep=",")
DF[51082,1]
[1] 03/01/2012 19:29
DF[1,1]
[1] 02/24/12 00:29
It reads it in properly and the above 2 rows are displayed as expected
DF$START <- as.POSIXct(strptime(paste(DF$START),format="%m/%d/%y %H:%M"))
DF[1,1]
[1] "2012-02-24 00:29:00 GMT"
DF[51082,1]
[1] NA
After converting them to the R time format with strptime and displaying them again, some of the values are NA. No error message was displayed, and I cannot figure out the reason.
You have (at least) two different date formats,
one in %Y (4-digit years), one in %y (2-digit years).
Unless 12 really means 12AD, you need to try both.
DF <- data.frame(
START = c(
"03/01/2012 19:29",
"02/24/12 00:29"
),
stringsAsFactors = FALSE
)
coalesce <- function (x, ...) {
z <- class(x)
for (y in list(...)) {
x <- ifelse(is.na(x), y, x)
}
class(x) <- z
x
}
DF$START <- coalesce(
as.POSIXct(strptime(DF$START, format="%m/%d/%y %H:%M")),
as.POSIXct(strptime(DF$START, format="%m/%d/%Y %H:%M"))
)
# START
# 1 2012-03-01 19:29:00
# 2 2012-02-24 00:29:00
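An alternative sketch that avoids parsing twice: strptime recycles its format argument, so a per-row format can be chosen up front. This assumes the only variation is a two-digit versus four-digit year in otherwise identical MM/DD/YY HH:MM strings, and it fixes the time zone to UTC for reproducibility (the original data's time zone is not stated):

```r
x <- c("03/01/2012 19:29", "02/24/12 00:29")

# Detect rows whose year has four digits
four_digit <- grepl("^[0-9]{2}/[0-9]{2}/[0-9]{4} ", x)

# Pick the matching format for each row
fmt <- ifelse(four_digit, "%m/%d/%Y %H:%M", "%m/%d/%y %H:%M")

# strptime applies format i to element i; tz = "UTC" is an assumption
parsed <- as.POSIXct(strptime(x, format = fmt, tz = "UTC"))
parsed
```

Rows that match neither layout still come back as NA, so is.na(parsed) remains a useful check afterwards.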
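If every timestamp has a four-digit year, this is enough:
> DF$START <- as.POSIXct(strptime(paste(DF$START),format="%m/%d/%Y %H:%M"))
%Y parses the year with century. Rows with a two-digit year, such as 02/24/12, would be read as year 12, so this does not handle the mixed data on its own.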