How can I parse a date in another language? - r

My locale is French:
> Sys.getlocale()
[1] "fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8"
I would like to parse dates written in English, but I don't know how to pass that setting to my function.
If the date is in French, everything works:
> as.Date("15 mai 2004", "%d %B %Y")
[1] "2004-05-15"
If the date is in English, it doesn't work:
> as.Date("15 may 2004", "%d %B %Y")
[1] NA

OK, here is the solution: set the LC_TIME locale to an English one before parsing:
Sys.setlocale(category = "LC_TIME", locale = "en_GB.UTF-8")
as.Date("15 may 2004", "%d %B %Y")
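To see why a given string fails, it can help to list the month names that %B and %b will match in the active locale. The following is a minimal base R sketch; the variable names are only illustrative, and the printed names depend entirely on your LC_TIME setting.

```r
# Build one reference date per month, then format each with %B and %b.
# The results are the names strptime()/as.Date() expect in this locale.
ref_dates <- as.Date(sprintf("2000-%02d-01", 1:12))
month_names_full <- format(ref_dates, "%B")
month_names_abbr <- format(ref_dates, "%b")

print(Sys.getlocale("LC_TIME"))
print(data.frame(full = month_names_full, abbr = month_names_abbr))
```

If "may" is not in the list for your locale (in French it is "mai"), as.Date returns NA for "15 may 2004".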

Related

Convert "2014-05" into date format as "May 2014" for display in ggplot in R

I have dates in the character format "2017-03" and I want to convert them to "March 2017" for display in ggplot in R. But when I try as.Date("2017-03","%Y-%m"), it gives NA.
You can consider using the zoo::as.yearmon function:
library(zoo)
#Sample data
v <- c("2014-05", "2017-03")
as.yearmon(v, "%Y-%m")
#[1] "May 2014" "Mar 2017"
# If you want the full month name, format the yearmon value:
format(as.yearmon(v, "%Y-%m"), "%B %Y")
#[1] "May 2014" "March 2017"
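As for why the original as.Date call returns NA: ?strptime notes that when a month is specified, the day of the month has to be specified too. A base R sketch that avoids zoo is to append a day before parsing. The choice of day 1 is an assumption made only so the string can be parsed, and the month names produced by %B depend on the locale.

```r
# Sample year-month strings, as in the question.
v <- c("2014-05", "2017-03")

# Append a day so that the string is a complete date, then parse it.
d <- as.Date(paste0(v, "-01"), format = "%Y-%m-%d")

# Format for display; %B gives the full month name in the current locale.
labels <- format(d, "%B %Y")
print(labels)
```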
Dates can be parsed in both directions.
The conversion you mentioned is covered by MKR's answer, quoted here.
Use the zoo package:
library(zoo)
date <- "2017-03"
as.yearmon(date, "%Y-%m")
#[1] "Mar 2017"
format(as.yearmon(date, "%Y-%m"), "%B %Y")
#[1] "March 2017"
If you want to parse "March 1 2017" or similar formats back to a standard date:
Use the readr package, whose parse_date function accepts a format string and a locale:
library(readr)
DATE <- "March 1 2017"
parse_date(DATE, "%B %d %Y")
#[1] "2017-03-01"
Or, if you are parsing dates written in another language:
foreign_date <- "1 janvier 2018"
parse_date(foreign_date, "%d %B %Y", locale = locale("fr"))
#[1] "2018-01-01"
By passing locale = locale("<language code>") you can parse dates with non-English month names into standard dates. To list the supported language codes, use:
date_names_langs()
Format specifications:
- Year: %Y (4 digits), %y (2 digits; 00-69 -> 2000-2069, 70-99 -> 1970-1999)
- Month: %m (2 digits), %b (abbreviated name, e.g. Jan), %B (full name, e.g. January)
- Day: %d (2 digits)
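The numeric specifications can be tried out in base R without any extra package. This is only a sketch with made-up sample strings. Note that base R's strptime documents a slightly different two-digit-year pivot (00-68 become 20xx, 69-99 become 19xx), so the examples stay away from the boundary.

```r
# Two-digit years: "03" is read as 2003 and "99" as 1999.
dates_chr <- c("01/02/03", "15/11/99")
parsed <- as.Date(dates_chr, format = "%d/%m/%y")
print(format(parsed, "%Y-%m-%d"))
# [1] "2003-02-01" "1999-11-15"

# Four-digit year with numeric month and day.
full_year <- as.Date("2004-05-15", format = "%Y-%m-%d")
print(format(full_year, "%d.%m.%Y"))
# [1] "15.05.2004"
```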

Invalid %%A auto parser

Can anybody tell me what is wrong with this command in R?
I literally have tried everything:
d0 <- "domingo 04 febrero 2018"
parse_date(d0, "%A %d %B %Y", locale = locale("es"))
When I execute the above code, I get an error that says
"invalid %%A auto parser"
I think that the problem is that the "%A" format specification is not defined in the parse_date function.
d0 <- "04 febrero 2018"
parse_date(d0, "%d %B %Y", locale = locale("es"))
[1] "2018-02-04"
%A (full weekday name) is a valid conversion specification for base R's strptime.
The actual problem is that parse_date (a function from the readr package) does not support every strptime specification. Missing features include week and weekday specifications such as %A.
But there is an alternative in base R itself. Here are a few examples:
# %A is supported in base R
> format(Sys.Date(), "%A %d %B %Y")
[1] "Sunday 04 February 2018"
# Try to parse character string to date now
> as.POSIXct("Sunday 04 February 2018", format = "%A %d %B %Y")
[1] "2018-02-04 GMT"
# Let's run the OP's code with the date translated to English
d0 <- "Sunday 04 February 2018"
as.POSIXct(d0, format = "%A %d %B %Y", tz = "GMT")
# Result :
[1] "2018-02-04 GMT"
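Since the weekday name is redundant once day, month and year are known, another option is to drop it before parsing and keep using parse_date for the Spanish month name. The sketch below assumes the weekday is always the first space-separated word of the string; the readr step only runs if readr is installed.

```r
d0 <- "domingo 04 febrero 2018"

# Remove the first word (the weekday) and the spaces that follow it.
d0_no_weekday <- sub("^[^ ]+ +", "", d0)
print(d0_no_weekday)
# [1] "04 febrero 2018"

# Parse the remainder with readr's Spanish locale, as in the question.
if (requireNamespace("readr", quietly = TRUE)) {
  parsed <- readr::parse_date(d0_no_weekday, "%d %B %Y",
                              locale = readr::locale("es"))
  print(parsed)
}
```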

R returns NA when I try to coerce a string to date using as.Date

I am trying to convert "March 15, 2017" to a date.
I ran as.Date("March 15, 2017", "%B %d, %Y") and it returned NA.
The format string looks right to me, so what is the problem?
You are close, but you have been bitten by your locale. If you look at the documentation for strptime, you will notice that
%B Full month name in the current locale. (Also matches abbreviated name on input.)
The same thing happens on my system, because the Slovenian locale doesn't use English month names:
> as.Date("March 15, 2017", "%B %d, %Y")
[1] NA
> Sys.getlocale()
[1] "LC_COLLATE=Slovenian_Slovenia.1250;LC_CTYPE=Slovenian_Slovenia.1250;LC_MONETARY=Slovenian_Slovenia.1250;LC_NUMERIC=C;LC_TIME=Slovenian_Slovenia.1250"
What you can do is change the locale, perhaps only for the duration of the conversion.
> Sys.setlocale(locale = "English")
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
> as.Date("March 15, 2017", "%B %d, %Y")
[1] "2017-03-15"
And then back to normal
> Sys.setlocale(locale = "Slovenian")
[1] "LC_COLLATE=Slovenian_Slovenia.1250;LC_CTYPE=Slovenian_Slovenia.1250;LC_MONETARY=Slovenian_Slovenia.1250;LC_NUMERIC=C;LC_TIME=Slovenian_Slovenia.1250"
> as.Date("March 15, 2017", "%B %d, %Y")
[1] NA
But if I use a Slovenian name for March:
> as.Date("Marec 15, 2017", "%B %d, %Y")
[1] "2017-03-15"
The locale name depends on your operating system; see ?Sys.setlocale for more info.
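Changing the locale only for the duration of the conversion can be wrapped in a small function. The sketch below is one way to do it in base R; with_time_locale is a hypothetical helper name, not an existing function. It uses the "C" locale, which yields English month and day names and is available on every platform, so no OS-specific locale name is needed.

```r
# Hypothetical helper: evaluate `expr` with LC_TIME temporarily set to `locale`.
with_time_locale <- function(locale, expr) {
  old <- Sys.getlocale("LC_TIME")
  # Restore the previous locale when the function exits, even on error.
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  new <- Sys.setlocale("LC_TIME", locale)
  if (identical(new, "")) {
    stop("locale '", locale, "' is not available on this system")
  }
  # `expr` is lazily evaluated, so it is only run here, under the new locale.
  force(expr)
}

d <- with_time_locale("C", as.Date("March 15, 2017", "%B %d, %Y"))
print(d)
# [1] "2017-03-15"
```

After the call, Sys.getlocale("LC_TIME") reports the same value as before, so the rest of the session keeps its original month names.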

Conversion from character to date/time returns NA

I often use as.POSIXct to convert characters to POSIXct, but I get NA sometimes and I don't know why. For example:
DATE <- "Fri Apr 10 11:57:47 2015"
DATE_in_posix <- as.POSIXct(DATE, format="%a %b %d %H:%M:%S %Y")
I tried this too:
DATE_in_posix <- as.POSIXct(DATE, format="%a %h %d %H:%M:%S %Y")
But result for both is always:
> DATE_in_posix
[1] NA
Maybe the input for as.POSIXct is too long? If so, what could be the solution?
It's probably because "Fri" and "Apr" are not the correct abbreviations in your locale.
Use Sys.setlocale("LC_TIME", locale) to set your R session's locale to one that will correctly interpret English abbreviations. See the Examples section of ?Sys.setlocale for how to specify locale in the above function call.
For example, on my Ubuntu machine it would be:
> Sys.setlocale("LC_TIME", "en_US.UTF-8")
> as.POSIXct("Fri Apr 10 11:57:47 2015", format="%a %b %d %H:%M:%S %Y")
[1] "2015-04-10 11:57:47 CDT"
Thanks a lot Henrik!!!
I changed the LC_TIME category like this, and now it works:
> Sys.getlocale(category = "LC_TIME")
[1] "German_Germany.1252"
> Sys.setlocale("LC_TIME", "English")
[1] "English_United States.1252"
> DATE_in_posix <- as.POSIXct(DATE, format="%a %b %d %H:%M:%S %Y")
> DATE_in_posix
[1] "2015-04-10 11:57:47 CEST"
And strptime now works too, of course:
> DATE_in_posix <- strptime(DATE, format="%a %b %d %H:%M:%S %Y")
> DATE_in_posix
[1] "2015-04-10 11:57:47 CEST"
Thank you so much guys and have a nice weekend!
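If changing the session locale is not an option, the English abbreviations can also be translated by hand, because the base R constant month.abb always holds the English names regardless of locale. The sketch below is a hypothetical helper, parse_english_timestamp, which assumes a single string laid out exactly as "Www Mmm dd HH:MM:SS YYYY" and interprets it in UTC unless another tz is given. The weekday is simply ignored.

```r
# Hypothetical helper: parse "Fri Apr 10 11:57:47 2015" without touching the locale.
parse_english_timestamp <- function(x, tz = "UTC") {
  stopifnot(is.character(x), length(x) == 1)
  # Split on spaces: weekday, month abbreviation, day, time, year.
  parts <- strsplit(x, " +")[[1]]
  # month.abb is c("Jan", "Feb", ...) in every locale.
  month_num <- match(parts[2], month.abb)
  # Rebuild a purely numeric timestamp that needs no locale-specific names.
  iso <- sprintf("%s-%02d-%02d %s",
                 parts[5], month_num, as.integer(parts[3]), parts[4])
  as.POSIXct(iso, format = "%Y-%m-%d %H:%M:%S", tz = tz)
}

DATE_in_posix <- parse_english_timestamp("Fri Apr 10 11:57:47 2015")
print(format(DATE_in_posix, "%Y-%m-%d %H:%M:%S"))
# [1] "2015-04-10 11:57:47"
```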
