Unable to convert character / factor class into date - r

I know this question is asked a lot, but I only come to you because I tried everything (including the tips from similar questions that I managed to understand).
I have a rather big CSV file (> 16 000 rows), with, among others, a "Date" column, containing dates in the following format "01/01/1999".
However, when loading the file, the column is not recognised as a date, but as a Factor with read.csv2, or a character with fread (data.table package). I loaded the lubridate library.
In both cases, I tried to convert the column to a date format, using all methods I knew (column = Date, data = test):
as.Date(test$Date, format = "%d/%m/%Y", tz = "")
Or
strptime(test$Date, format = "%d/%m/%y", tz = "")
Or
as_date(test$Date)
And also the function dmy from lubridate, and
as.POSIXct(test$Date, "%d/%m/%y", tz = "").
I also tried changing the format: ymd instead of dmy, "-" instead of "/".
I even tried to change the character class to numeric (when loaded with fread), and the factor class to numeric (when loaded with read.csv2).
Despite all of this, the columns stay in their factor / character classes.
Does someone know what I missed?

Just use the anydate() function from the anytime package:
R> library(anytime)
R> var <- as.factor(c("01/01/1999", "01/02/1999"))
R> anydate(var)
[1] "1999-01-01" "1999-01-02"
R>
R> class(anydate(var))
[1] "Date"
R>
R> class(var)
[1] "factor"
R>
R>
It will read just about any input time, and convert it without requiring a format and this works as long as the represented is somewhat standard (i.e. we do not work with two-digit years etc).
(Otherwise you can of course also use the base R functions after first converting from factor back to character via as.character(). But anytime() and anydate() do that, and much more, for you too.)

If you are using read.csv2, try
read.csv2(..., stringsAsFactors=F)
and then continue with as.Date

Related

Easiest way to have "yyyy.mm.dd" format in R?

So I realized that this isn't a common date type to deal with at least with using as.Date(). When I do the following , the output isn't correct.
> as.Date(Sys.Date(), format = "yyyy.mm.dd")
[1] "2022-06-21"
Is there an easy way to this with lubridate or base R?
We can use format instead of as.Date as Sys.Date() is already in Date class, however as commented, format returns only a character class
format(Sys.Date(), '%Y.%m.%d')

how can I convert number to date?

I have a problem with the as.date function.
I have a list of normal date shows in the excel, but when I import it in R, it becomes numbers, like 33584. I understand that it counts since a specific day. I want to set up my date in the form of "dd-mm-yy".
The original data is:
how the "date" variable looks like in r
I've tried:
as.date <- function(x, origin = getOption(date.origin)){
origin <- ifelse(is.null(origin), "1900-01-01", origin)
as.Date(date, origin)
}
and also simply
as.Date(43324, origin = "1900-01-01")
but none of them works. it shows the error: do not know how to convert '.' to class “Date”
Thank you guys!
The janitor package has a pair of functions designed to deal with reading Excel dates in R. See the following links for usage examples:
https://www.rdocumentation.org/packages/janitor/versions/2.0.1/topics/excel_numeric_to_date
https://www.rdocumentation.org/packages/janitor/versions/2.0.1/topics/convert_to_date
janitor::excel_numeric_to_date(43324)
[1] "2018-08-12"
I've come across excel sheets read in with readxl::read_xls() that read date columns in as strings like "43488" (especially when there is a cell somewhere else that has a non-date value). I use
xldate<- function(x) {
xn <- as.numeric(x)
x <- as.Date(xn, origin="1899-12-30")
}
d <- data.frame(date=c("43488"))
d$actual_date <- xldate(d$date)
print(d$actual_date)
# [1] "2019-01-23"
Dates are notoriously annoying. I would highly recommend the lubridate package for dealing with them. https://lubridate.tidyverse.org/
Use as_date() from lubridate to read numeric dates if you need to.
You can use format() to put it in dd-mm-yy.
library(lubridate)
date_vector <- as_date(c(33584, 33585), origin = lubridate::origin)
formatted_date_vector <- format(date_vector, "%d-%m-%y")

Problems with parse_date_time converting a character vector

I have an imported CSV in R which contains a column of dates and times - this is imported into R as character. The format is "30/03/2020 08:59". I want to convert these strings into a format that allows me to work on them. For simplicity I have made a dataframe which has a single column of these dates (854) in this format.
I'm trying to use the parse_date_time function from lubridate.
It works fine when I reference a single value, e.g.
b=parse_date_time(consults_dates[3,1],orders="dmy HM")
gives b=2020-03-30 09:08:00
However, when I try to perform this on the entire(consults_dates), I get an error, e.g.
c= parse_date_time(consults_dates,orders="dmy HM") gives error:
Warning message:
All formats failed to parse. No formats found.
Apologies - if this is blatantly a simple question, day 1 of R after years of Matlab.
You need to pass the column to parse_date_time function and not the entire dataframe.
library(lubridate)
consults_dates$colum_name <- parse_date_time(consults_dates$colum_name, "dmy HM")
However, if you have only one format in the column you can use dmy_hm
consults_dates$colum_name <- dmy_hm(consults_dates$colum_name)
In base R, we can use :
consults_dates$colum_name <- as.POSIXct(consults_dates$colum_name,
format = "%d/%m/%Y %H:%M", tz = "UTC")

Convert factors to datetime, <fctr> 10/25/2018 (M, D, Y)

I have a data frame called RequisitionHistory2 with a variable called RequisitionDateTime and the levels are factors which look like 4/30/2019 14:16 I would like to split this into RequisitionDate and RequisitionTime in a datetime format.
I tried this code, but this still does not solve my issue with needing to split these into their own columns. The code also did not work as I got the error below.
mutate(When = as.POSIXct(RequisitionHistory2, format="%m/%d/%. %H:%M %p"))
Error in as.POSIXct.default(RequisitionHistory2, format = "%m/%d/%. %H:%M %p") : do not know how to convert 'RequisitionHistory2' to class “POSIXct”
I would like to have the variable RequisitionDateTime split into RequisitionDate and another variable RequisitionTime in the dataframe RequisitionHistory2. Any help is greatly appreciated!
Do not convert factors to datetime directly. You will need to convert it to a character first and then use a datetime function.
as.Date(as.character("10/25/2018"), format = "%m/%d/%Y")
would work for your date example.
library(lubridate)
mutate(df,When = mdy_hm(RequisitionHistory2))
If your datetime is in 4/30/2019 14:16 format
Note that as.POSIXct() works only on datetimes already in ISO 8601 format. I wrote a blog post about this and I think would be helpful for you to check out:
https://jackylam.io/tutorial/uber-data/
The anytime package ON CRAN directly converts from many formats, including factor and ordered to dates and datetime objects. It also heuristically tries a number of viable formats so that you do not need a format string. See the README at GitHub for an introduction, there is also a vignette
Your example works:
R> library(anytime)
R> anytime(as.factor("4/30/2019 14:16"))
[1] "2019-04-30 14:16:00 CDT"
R> anytime(as.factor("4/3/2019 14:16:17"), useR=TRUE)
[1] "2019-04-03 14:16:17 CDT"
R>
However, the underlying (Boost C++) parser does not like single digit days or month so you may need to flip back to R's parser via useR=TRUE as I did on the second example.

Converting integer format date to double format of date

I have date format in following format in a data frame:
Jan-85
Apr-99
1-Nov
Feb-96
When I see the typeof(df$col) I get the answer as "integer".
Actually when I see the format in excel it is in m/d/yyyy format. I was trying to convert this to date format in R. All my efforts yielded NA.
I tried parse_date_time function. I tried as.date along with as.character. I tried as.POSIXct but everything is giving me NA.
My trials were as follows and everything was a failure:
as.Date.numeric(df$col,"m%d%Y")
transform(df$col, as.Date(as.character(df$col), "%m%d%Y"))
as.Date(df$col,"m%d%Y")
as.POSIXct.numeric(as.character(loan_new$issue_d), format="%Y%m%d")
as.POSIXct.date(as.character(df$col), format="%Y%m%d")
mdy(df$col)
parse_date_time(df$col,c("mdy"))
How can I convert this to date format? I have used lubridate package for parse_date_time and mdy package.
dput output is below
Label <- factor(c("Apr-08",
"Apr-09", "Apr-10", "Apr-11", "Aug-07", "Aug-08", "Aug-09", "Aug-10",
"Aug-11", "Dec-07", "Dec-08", "Dec-09", "Dec-10", "Dec-11", "Feb-08",
"Feb-09", "Feb-10", "Feb-11", "Jan-08", "Jan-09", "Jan-10", "Jan-11",
"Jul-07", "Jul-08", "Jul-09", "Jul-10", "Jul-11", "Jun-07", "Jun-08",
"Jun-09", "Jun-10", "Jun-11", "Mar-08", "Mar-09", "Mar-10", "Mar-11",
"May-08", "May-09", "May-10", "May-11", "Nov-07", "Nov-08", "Nov-09",
"Nov-10", "Nov-11", "Oct-07", "Oct-08", "Oct-09", "Oct-10", "Oct-11",
"Sep-07", "Sep-08", "Sep-09", "Sep-10", "Sep-11"))
NA is typically what you get when you misspecify the format. Which is what you do. That said, if your data is really looking like the first example you gave, it's impossible to simply convert this to a date. You have two different formats, one being month-year and the other day-month.
If your updated date (i.e. Dec-11) is the correct format, then you use the format argument of as.Date like this:
date <- "Dec-11"
as.Date(date, format = "%b-%d")
# [1] "2017-12-11"
Or on your example data:
as.Date(Label, format = "%b-%d")
# [1] "2017-04-08" "2017-04-09" "2017-04-10" "2017-04-11" "2017-08-07" "2017-08-08"
# [7] "2017-08-09" "2017-08-10" "2017-08-11" "2017-12-07" "2017-12-08" "2017-12-09"
If you want to convert something like Jan-85, you have to decide which day of the month that date should have. Say we just take the first of each month, then you can do:
x <- "Jan-85"
xd <- paste0("1-",x)
as.Date(xd, "%d-%b-%y")
# [1] "1985-01-01"
More information on the format codes can be found on ?strptime
Note that R will automatically add this year as the year. It has to, otherwise it can't specify the date. In case you do not have a day of the month (eg like Jan-85), conversion to a date is impossible because the underlying POSIX algorithms don't have all necessary information.
Also keep in mind that this only works when your locale is set to english. Otherwise you have a big chance your OS won't recognize the month abbreviations correctly. To do so, do eg:
Sys.setlocale(category = "LC_TIME", locale = "English_United Kingdom")
You can later set it back to the original one if you must, or restart your R session to reset the locale settings.
note: Please check carefully which locale notations are valid for your OS. The above example works on Windows, but is not guaranteed on either Linux or Mac.
Why you see integer
The fact that these string values are of integer type, is due to the fact that R automatically convert character vectors to factors when reading in a data frame. So typeof() returns integer because that's the internal representation of a factor.

Resources