Conversion from text string to date in R

I have a column of text dates in the format Feb20, Mar20, ... Feb21. I would like to create a new column with properly formatted dates, as in Feb/20, Mar/20, etc. What is the easiest way to do this? I need this for my ggplot.
Thank you in advance!

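If the locale setting is an issue (%b matches month abbreviations in the current LC_TIME locale, so "Feb" may not parse in, say, a French or German locale), try:
mystr <- c("Feb20", "Mar20")
# store the current locale, then switch to the "C" locale
orig_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")
# create dates: prepend day "01" so each string is a complete date
ans <- as.POSIXct( paste0( "01", mystr), format = "%d%b%y" )
# restore the previous locale
Sys.setlocale("LC_TIME", orig_locale)
# format the dates however you like
format(ans, format = "%b/%y")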
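A minimal sketch of the same idea wrapped in a helper, assuming every string follows the MonYY pattern shown in the question. The name parse_mon_yy is hypothetical; on.exit() restores the original LC_TIME locale even if parsing fails, and the result is a Date (first day of each month) rather than a POSIXct.

```r
# Hypothetical helper: parse "Feb20"-style strings into Dates (first day of the month)
parse_mon_yy <- function(x) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)  # always restore the locale
  Sys.setlocale("LC_TIME", "C")                       # English month abbreviations
  as.Date(paste0("01", x), format = "%d%b%y")
}

df <- data.frame(month = c("Feb20", "Mar20", "Feb21"), stringsAsFactors = FALSE)
df$date  <- parse_mon_yy(df$month)
df$label <- format(df$date, "%b/%y")  # label text follows the session's locale
df
```

For ggplot it is usually easier to keep the Date column for the x axis and only format the axis labels, for example with ggplot2's scale_x_date(date_labels = "%b/%y"), so the months stay in chronological order.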

Related

How to format dates by country in R?

I need a simple way to format dates according to different countries' formats. Ideally, I would set this up once and use it everywhere in the code.
Let's say for EN and FR the formats should be YYYY-MM-DD (England) and DD-MM-YYYY (France).
library(lubridate)  # for today()
# This works but requires extra work: every call has to go through the wrapper
format_date <- function(date_obs, country_code) {
  if(country_code == "en") result <- format(date_obs, format = "%Y-%m-%d")
  if(country_code == "fr") result <- format(date_obs, format = "%d-%m-%Y")
  result
}
format_date(today(), "en")
format_date(today(), "fr")
# I need this kind of solution (pseudocode: Sys.setlocale() has no date_format argument)
Sys.setlocale(date_format = '%d-%m-%Y')
print(today()) # <<- should be in French format
Thanks!
As far as I know, there is no explicit way to get the preferred date format of a country in R. The only thing you can do is retrieve it yourself.
Using the data in the CSV below, you can convert each date format into an R strptime format and then use it to format your dates:
library(dplyr)
library(stringr)
library(lubridate)  # for today()
date_format <- read.csv("https://gist.githubusercontent.com/mlconnor/1887156/raw/014a026f00d0a0a1f5646a91780d26a90781a169/country_date_formats.csv")
date_format <-
date_format %>%
mutate(Date.Format = str_replace_all(Date.Format, c("yyyy" = "%Y",
"MM" = "%m",
"(?<!M|%)M(?!M)" = "%-m",
"dd" = "%d",
"(?<!d|%)d(?!d)" = "%e"))) %>%  # single d: day of month without a leading zero (%a would be the weekday name)
select(country = ISO639.2.Country.Code, date_format = Date.Format)
format_to_locale <- function(date, locale) format(date, date_format[date_format$country == locale, "date_format"])
format_to_locale(today(), "FR")
#[1] "07/02/2023"
format_to_locale(today(), "US")
#[1] "02/ 7/2023"
This probably has some limitations, but it is a starting point.
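If the main goal is "configure once, use everywhere", a lighter alternative is a small lookup table plus one option holding the chosen country. This is a sketch: the option name my.date.country, the table country_formats and the helper fmt_date are all hypothetical, and as far as I know print() on a Date will still show the ISO format, so you still call the helper explicitly.

```r
# Hypothetical lookup table: country code -> strftime format
country_formats <- c(en = "%Y-%m-%d", fr = "%d-%m-%Y")

# Set the country once (made-up option name)
options(my.date.country = "fr")

# Format with the configured country unless another one is given;
# an unknown code fails loudly because [[ ]] errors on a missing name
fmt_date <- function(x, country = getOption("my.date.country", "en")) {
  format(x, format = country_formats[[country]])
}

d <- as.Date("2023-02-07")
fmt_date(d)        # "07-02-2023"
fmt_date(d, "en")  # "2023-02-07"
```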

Parsing dates with different formats using lubridate

I am importing data from a CSV file where the date column contains dates recorded in different formats. I want to parse the column so that it has class Date and all of the dates are formatted in the same style (i.e. %d-%m-%Y). I want to use lubridate, as I have some experience with it and want to get better at using it.
I have looked at the answers to "Parsing dates with different formats" and "Parsing dates in multiple formats in R using lubridate", but I found them incomplete.
Typically when I import csv data I change the col_types like so:
library(readr)
library(dplyr)
potatoes <- read_csv("data/potato_prices.csv",
col_types = cols(
DATE = col_date(format = "%Y-%m-%d"),
'M04003DE00BERM372NNBR' = col_double())) %>%
rename("Price" = "M04003DE00BERM372NNBR")
But because my DATE column contains dates in different formats, dates not formatted as "%Y-%m-%d" become NA, and the class of the column appears as unknown.
I have tried col_guess() instead of col_date() with an exact format, and then mutating the DATE column with the following code, but it has not worked as I would like.
library(lubridate)  # for parse_date_time()
potatoes <- read_csv("data/potato_prices.csv",
col_types = cols(
DATE = col_guess(),
'M04003DE00BERM372NNBR' = col_double()))
potatoes <- potatoes %>%
mutate(DATE = parse_date_time(DATE, orders = c("Ymd", "dmY"))) %>%
rename("Price" = "M04003DE00BERM372NNBR")
Here is an example of how my data appears in Excel, in CSV format:
DATE <- c("1879-01-01", "1879-02-01", "1879-03-01", "1879-04-01", "1/05/1990", "1/06/1990", "1/07/1990", "1/08/1990", "1/09/1990", "1/10/1990")
Price <- c("23", "17.9", "17.8", "18", "20", "22", "20", "19", "17.2", "15")
spuds <- data.frame(DATE, Price)
I want a tibble with two columns: DATE of class Date and Price of type double. I will then create plots with ggplot, and I think that will be easiest if my DATE column is of class Date.
Thanks
The following function tries each of the date formats passed in its format argument in turn. It uses lubridate's guess_formats() to get the possible strptime formats for those orders, parses with the first one, and then retries only the elements that are still NA with the next one.
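as_Date <- function(x, format = c("ymd", "dmy", "mdy")){
  # candidate strptime formats guessed from the requested orders
  fmt <- lubridate::guess_formats(x, format)
  fmt <- unique(fmt)
  # parse everything with the first format...
  y <- as.Date(x, format = fmt[1])
  # ...then retry only the elements that are still NA with the remaining formats
  for(i in seq_along(fmt)[-1]){
    na <- is.na(y)
    if(!any(na)) break
    y[na] <- as.Date(x[na], format = fmt[i])
  }
  y
}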
formats <- c("ymd", "dmy")
as_Date(spuds$DATE, formats)
#[1] "1879-01-01" "1879-02-01" "1879-03-01" "1879-04-01"
#[5] "1990-05-01" "1990-06-01" "1990-07-01" "1990-08-01"
#[9] "1990-09-01" "1990-10-01"
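If you prefer not to depend on guess_formats(), the same try-each-format loop can be written in base R with explicit strptime formats. This is a sketch that assumes the only two layouts are the ones in the sample data (%Y-%m-%d and %d/%m/%Y); parse_multi is a hypothetical name. A Date column has no display format of its own, so the %d-%m-%Y style is applied with format() only when printing or labelling.

```r
# Hypothetical helper: parse with formats[1], then retry the NAs with each later format
parse_multi <- function(x, formats) {
  out <- as.Date(x, format = formats[1])
  for (f in formats[-1]) {
    todo <- is.na(out)            # elements not parsed yet
    if (!any(todo)) break
    out[todo] <- as.Date(x[todo], format = f)
  }
  out
}

# subset of the sample data from the question
DATE  <- c("1879-01-01", "1879-02-01", "1/05/1990", "1/10/1990")
Price <- c("23", "17.9", "20", "15")
spuds <- data.frame(DATE  = parse_multi(DATE, c("%Y-%m-%d", "%d/%m/%Y")),
                    Price = as.numeric(Price))
str(spuds)
format(spuds$DATE, "%d-%m-%Y")  # display only; the column stays of class Date
```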

How to properly format a date-time column in R using mutate?

I am trying to convert a string column to a date-time series.
The values in the column look like this example: "2019-02-27T19:08:29+000"
(dateTime is the column, i.e. the variable.)
mutate(df,dateTime=as.Date(dateTime, format = "%Y-%m-%dT%H:%M:%S+0000"))
But the result is:
2019-02-27
What about the hours, minutes and seconds?
I need them to filter by date-time.
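Your code is almost correct. Only the extra 0 in the format string and the as.Date() call were wrong: as.Date() returns a Date, which stores days only, so the time is dropped. Use as.POSIXct() instead: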
library("dplyr")
df <- data.frame(dateTime = "2019-02-27T19:08:29+000",
stringsAsFactors = FALSE)
# "+000" is matched literally, so tz = "UTC" states the offset explicitly
mutate(df, dateTime = as.POSIXct(dateTime, format = "%Y-%m-%dT%H:%M:%S+000", tz = "UTC"))
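Since the goal is to filter by date-time, here is a base-R sketch of that step, assuming the offset is always "+000" (UTC). The second timestamp and the cutoff are made-up values for illustration. %z is not used because R's %z expects a signed four-digit hhmm offset such as +0000, while the data has only three digits.

```r
df <- data.frame(dateTime = c("2019-02-27T19:08:29+000", "2019-02-28T08:00:00+000"),
                 stringsAsFactors = FALSE)
# "+000" is matched literally; tz = "UTC" records that these are UTC times
df$dateTime <- as.POSIXct(df$dateTime, format = "%Y-%m-%dT%H:%M:%S+000", tz = "UTC")

# keep only rows after a cutoff date-time (made-up value)
cutoff <- as.POSIXct("2019-02-27 20:00:00", tz = "UTC")
late <- df[df$dateTime > cutoff, , drop = FALSE]
late
```

With dplyr loaded, filter(df, dateTime > cutoff) does the same thing.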

strptime fails when working with a data frame

strptime seems to be missing something in this scenario:
aDateInPOSIXct <- strptime("2018-12-31", format = "%Y-%m-%d")
someText <- "asdf"
df <- data.frame(aDateInPOSIXct, someText, stringsAsFactors = FALSE)
bDateInPOSIXct <- strptime("2019-01-01", format = "%Y-%m-%d")
df[1,1] <- bDateInPOSIXct
Assigning bDateInPOSIXct to the data frame fails with:
Error in as.POSIXct.numeric(value) : 'origin' must be supplied
And a warning:
provided 11 variables to replace 1 variables
I want to use both POSIXct dates and POSIXct date-times to compare this and that. It's way less work than manipulating character strings -- and POSIX takes care of the time zone issues. Unfortunately, I'm missing something.
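You only need to convert the results of strptime() to POSIXct explicitly. strptime() returns a POSIXlt object, which is internally a list of date-time components; assigning that list into a single cell of the data frame is what produces the warning and error above. as.POSIXct() gives one value per date-time instead: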
aDateInPOSIXct <- as.POSIXct(strptime("2018-12-31", format = "%Y-%m-%d"))
someText <- "asdf"
df <- data.frame(aDateInPOSIXct, someText, stringsAsFactors = FALSE)
bDateInPOSIXct <- as.POSIXct(strptime("2019-01-01", format = "%Y-%m-%d"))
df[1,1] <- bDateInPOSIXct
Check the R documentation, which says:
Character input is first converted to class "POSIXlt" by strptime: numeric input is first converted to "POSIXct".
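To see why the explicit conversion matters, the sketch below compares the two classes: strptime() gives a POSIXlt, which underneath is a list, while as.POSIXct() gives a single number per date-time that fits in a data frame column and compares directly. The column name when is just for illustration.

```r
lt <- strptime("2018-12-31", format = "%Y-%m-%d")
ct <- as.POSIXct(lt)
class(lt)            # "POSIXlt" "POSIXt"
class(ct)            # "POSIXct" "POSIXt"
is.list(unclass(lt)) # TRUE: one list element per date-time component

df <- data.frame(when = ct, someText = "asdf", stringsAsFactors = FALSE)
# assigning a POSIXct value into the POSIXct column works
df[1, "when"] <- as.POSIXct(strptime("2019-01-01", format = "%Y-%m-%d"))
df$when[1] > ct      # TRUE: POSIXct values compare directly
```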

How to convert dates in formats like 20080408, 2008/04/08 or 08/04/2008 to a single format

Please help: I have a large CSV file with a date column containing dates in various formats, such as 20080408, 2008/04/08 or 08/04/2008. How do I change these to a single dd/mm/yyyy format in R?
You can also do it with failure tests via lubridate's mdy() and ymd() conversions (hence the suppressWarnings() calls). I don't think you're going to be able to ensure proper handling of things like "08/04/2008" if 08 is supposed to be the day component, though, given that the functions can't read minds.
library(lubridate)
dat <- c("20080408", "2008/04/08", "08/04/2008")
dat.1 <- unlist(lapply(dat, function(x) {
suppressWarnings(res <- mdy(x))
if (is.na(res)) { suppressWarnings(res <- ymd(x)) }
return(as.character(res))
}))
dat.1
## [1] "2008-04-08" "2008-04-08" "2008-08-04"
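To find the rows affected by that ambiguity, a small base-R check can flag "xx/yy/yyyy" strings where both leading numbers could be a month (and differ), so the day-first and month-first readings disagree. The helper name is_ambiguous_dm is hypothetical; it only finds those rows, and you still have to decide which rule applies to them.

```r
# Hypothetical helper: TRUE where "xx/yy/yyyy" reads differently day-first vs month-first
is_ambiguous_dm <- function(x) {
  m <- regmatches(x, regexec("^([0-9]{1,2})/([0-9]{1,2})/[0-9]{4}$", x))
  vapply(m, function(parts) {
    if (length(parts) == 0) return(FALSE)  # not in xx/yy/yyyy layout
    a <- as.integer(parts[2])
    b <- as.integer(parts[3])
    a <= 12 && b <= 12 && a != b           # both could be a month
  }, logical(1))
}

is_ambiguous_dm(c("20080408", "2008/04/08", "08/04/2008", "25/04/2008"))
# FALSE FALSE TRUE FALSE
```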
The following should work for your data.frame. You may need to convert your date column to class character with as.character() so that the string split function strsplit() works (it fails on factors). After that, the loop checks how many characters come before the first "/" and picks the matching format.
Example:
df <- data.frame(DATE=as.character(c("20080408", "2008/04/08", "08/04/2008")), DATE2=as.Date(NA))
df$DATE=as.character(df$DATE)  # before R 4.0, data.frame() turned strings into factors
for(i in seq_along(df$DATE)){
  # split on "/" and look at the length of the first piece
  sp <- unlist(strsplit(df$DATE[i], "/"))
  if(nchar(sp[1]) == 8){  # 20080408: no separator
    df$DATE2[i] <- as.Date(df$DATE[i], format="%Y%m%d")
  }
  if(nchar(sp[1]) == 4){  # 2008/04/08: year first
    df$DATE2[i] <- as.Date(df$DATE[i], format="%Y/%m/%d")
  }
  if(nchar(sp[1]) == 2){  # 08/04/2008: day first
    df$DATE2[i] <- as.Date(df$DATE[i], format="%d/%m/%Y")
  }
}
Result:
df
# DATE DATE2
#1 20080408 2008-04-08
#2 2008/04/08 2008-04-08
#3 08/04/2008 2008-04-08
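The same per-row decision can be vectorised by matching each string against a regular expression for its layout instead of looping. This is a sketch; the helper name parse_mixed and the day-first reading of "08/04/2008" are assumptions taken from the answer above, and strings matching none of the patterns stay NA.

```r
# Hypothetical helper: choose the strptime format from the shape of each string
parse_mixed <- function(x) {
  x <- as.character(x)                                   # factors -> character
  rules <- list(
    c("^[0-9]{8}$",                         "%Y%m%d"),    # 20080408
    c("^[0-9]{4}/[0-9]{1,2}/[0-9]{1,2}$",   "%Y/%m/%d"),  # 2008/04/08
    c("^[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}$",   "%d/%m/%Y")   # 08/04/2008, day first (assumed)
  )
  out <- rep(as.Date(NA), length(x))
  for (r in rules) {
    hit <- grepl(r[1], x)
    out[hit] <- as.Date(x[hit], format = r[2])
  }
  out
}

dates <- parse_mixed(c("20080408", "2008/04/08", "08/04/2008"))
format(dates, "%d/%m/%Y")  # "08/04/2008" "08/04/2008" "08/04/2008"
```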
You can read them as character values and convert them using as.Date.
x1 <- '20080408' ## class character (string)
x2 <- '2008/04/08'
x1.dt <- as.Date(x1, format='%Y%m%d')
x2.dt <- as.Date(x2, format='%Y/%m/%d') ## different format
format(c(x1.dt, x2.dt), format='%d/%m/%Y') ## format the Date objects (not the strings) in any layout you want
Check out ?strftime for all the formatting options.
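One pitfall when reading these from a CSV: by default read.csv() reads a column of values like 20080408 as integers, and as.Date() on a number ignores the format and treats it as a day count (older R versions stop with an "'origin' must be supplied" error instead). A minimal sketch that forces the column to character first; the inline CSV stands in for your file.

```r
csv <- "id,date\n1,20080408\n2,20080409\n"   # stands in for your file
# read the date column as character so "20080408" is not turned into an integer
dat <- read.csv(text = csv, colClasses = c(date = "character"))
dat$date <- as.Date(dat$date, format = "%Y%m%d")
format(dat$date, "%d/%m/%Y")                 # "08/04/2008" "09/04/2008"
```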
