Convert "xx-xxx-xxxx" to date in R - r

I want to convert strings such as "19-SEP-2022" to date. Is there any available function in R? Thank you.

For completeness, I want to add the parse_date_time function from the lubridate package. Without doubt, the preferred answer here is that of @Marco Sandri:
library(lubridate)
x <- "19-SEP-2022"
x <- parse_date_time(x, "dmy")
x
[1] "2022-09-19 UTC"
class(x)
[1] "POSIXct" "POSIXt"

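Yes, strptime can be used to parse strings into date-times (it returns a POSIXlt object).
You could do something like strptime("19-SEP-2022", "%d-%b-%Y").
According to ?strptime, leading zeros are optional on input, so %d also parses days that are not zero-padded; %e (the space-padded day) mainly matters for output.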
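If a Date rather than a date-time is wanted, as.Date accepts the same format string. This is a minimal sketch, assuming an English (or C) locale, because %b matches month abbreviations in the current locale; the second input string is made up for illustration.

```r
# Parse "day-abbreviated month-year" strings straight into Date objects.
# Month names are matched case-insensitively on input (see ?strptime).
x <- c("19-SEP-2022", "1-Jan-2023")   # second value is a hypothetical example
d <- as.Date(x, format = "%d-%b-%Y")
d
class(d)
```

In a non-English locale the month abbreviations may not match and the result would be NA.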

A decade or so ago I started writing the anytime package because of the firm belief that for obvious date(time) patterns we should not need to specify patterns, or learn grammars.
I still use it daily, and so do a bunch of other CRAN users.
> anytime::anydate("19-SEP-2022")
[1] "2022-09-19"
So here we do exactly what you ask for: supply the string, return a date object.

Related

Convert YYYYMM factor format to YYYY-MM format

I have data in YYYYMM format and I wish to convert it to YYYY-MM format.
Example: 201805 should become 2018-05.
How could I do this, please?
We can use as.yearmon from zoo to convert it to a yearmon object and then format it:
library(zoo)
v1 <- 201805
format(as.yearmon(as.character(v1), "%Y%m"), "%Y-%m")
#[1] "2018-05"
I like the idea of using actual dates here. If the day component does not matter to you, then you may arbitrarily set each of your dates to the first of the month. Then we can leverage R's date functions to do the heavy lifting.
x <- "201805"
x <- paste0(x, "01")
x
[1] "20180501"
y <- format(as.Date(x, format = "%Y%m%d"), "%Y-%m-%d")
substr(y, 1, 7)
[1] "2018-05"
You could use regular expressions:
data <- "201805"
sub("(\\d{4})", "\\1-", data)
[1] "2018-05"
Another variant, using only lookarounds:
sub("(?<=\\d{4})(?=\\d{2})", "-", data, perl=TRUE)
How about the following? (I am assuming the OP does not need to validate the variable's value here.)
val <- "201805"
sub("(..$)", "-\\1", val)
Or, to match specifically the last two digits, try the following.
val <- "201805"
sub("(\\d{2}$)", "-\\1", val)
[1] "2018-05"
Very similar to some of the others, but because I find the package useful I will mention it:
library(lubridate)
date <- "201805"
format(ymd(paste0(date,"01")), "%Y-%m")
Lubridate can make life easy if the formats start to vary.
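Since the question title mentions a factor, here is a minimal sketch of the regex approach applied to a factor column. The vector v and its values are hypothetical; the explicit as.character() step is there to make clear that the labels, not the internal level codes, are being transformed.

```r
# Hypothetical factor holding YYYYMM codes.
v <- factor(c("201805", "201911", "201805"))

# as.character() recovers the labels; as.numeric(v) would instead return
# the internal level codes (1, 2, 1).
ym <- sub("^(\\d{4})(\\d{2})$", "\\1-\\2", as.character(v))
ym
```

Anchoring the pattern with ^ and $ leaves any value that is not exactly six digits unchanged, which makes malformed entries easier to spot.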
Here is another option, albeit a longer one, using stringi:
somestring <- "201805"
stringi::stri_sub(somestring, 1, 4) <- "-"   # somestring is now "-05"
somestring1 <- "201805"
somestring2 <- substring(somestring1, 1, 4)  # "2018"
paste0(somestring2, somestring)
Result:
"2018-05"

R - converting dates within data.frame

I am working with a data.frame whose dates are given in the following format: Aug 12, 2017.
class(data[,1]) is factor.
How can I convert these into dates?
data[,1] <- as.Date.factor(data[,1], format = "%m.%d.%y") returns NAs.
I would suggest the package lubridate for very easy to use functions to operate with dates. For example:
mdy("Aug 12,2017")
[1] "2017-08-12"
If your date is in YYYY-MM-DD format, you can use the ymd function. There are also other functions such as dmy, dmy_hms (for datetime), etc.
If your column is called my.date, you can do:
data$my.date <- mdy(data$my.date)
Alternatively, you can use the %<>% operator from magrittr to make your code even shorter:
data$my.date %<>% mdy
Use as.POSIXct (Base-R Solution):
as.POSIXct("Aug 12,2017", format="%b%d,%Y")
Output:
[1] "2017-08-12 CEST"
Using strptime could also work:
strptime("Aug 12,2017", "%b%d,%Y")
Output:
[1] "2017-08-12 UTC"
The second parameter for strptime is the format of the dates you have. For instance, if your dates are like this "1/5/2005", then the format would be:
format="%m/%d/%Y"
Hope it helps
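The attempt in the question returns NA because "%m.%d.%y" describes a numeric month, dots as separators, and a two-digit year, none of which match "Aug 12, 2017". A minimal base R sketch for a factor column follows; the data frame and the column name my.date are hypothetical, and an English (or C) locale is assumed so that %b matches "Aug".

```r
# Hypothetical data frame with a factor column of dates like "Aug 12, 2017".
data <- data.frame(my.date = factor(c("Aug 12, 2017", "Sep 3, 2017")))

# Convert the factor labels to character, then parse with a format that
# mirrors the input: abbreviated month, day, comma, four-digit year.
data$my.date <- as.Date(as.character(data$my.date), format = "%b %d, %Y")
str(data)
```

Whitespace in the format matches zero or more whitespace characters in the input, so the same format should also handle "Aug 12,2017".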

R convert character "111213" into proper time which is "11:12:13"

R convert character "111213" into time "11:12:13".
strptime("111213", format="%H%m%s") gives NA
and
strptime("111213", "%H%m%s") gives 1970-01-01 01:00:13 CET
I think the canonical answer would be as in my comment:
format(strptime("111213", format="%H%M%S"), "%H:%M:%S")
#[1] "11:12:13"
where you can read ?strptime for all the details. format is a generic function, and in this specific case we are using format.POSIXlt.
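For context on why the format in the question fails: in ?strptime, %m is the month number and %s is seconds since the epoch (platform-specific), whereas minutes and seconds within a time are %M and %S. The sketch below, with one made-up extra input, also shows that parsing validates the fields, which plain string manipulation does not.

```r
# HHMMSS strings; the second value is a hypothetical example.
x <- c("111213", "235959")
tm <- strptime(x, format = "%H%M%S", tz = "UTC")
hhmmss <- format(tm, "%H:%M:%S")
hhmmss

# An out-of-range hour is rejected (NA) rather than silently reformatted.
bad <- strptime("256199", format = "%H%M%S", tz = "UTC")
is.na(bad)
```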
Another solution is to merely play with string:
paste(substring("111213", c(1,3,5), c(2,4,6)), collapse = ":")
#[1] "11:12:13"
This makes sense because your input is really not a Date-Time: there is no Date.
We can use
library(chron)
times(gsub("(.{2})(?=\\d)", "\\1:", "111213", perl = TRUE))
#[1] 11:12:13
To manipulate times, you can use the hms package.
By default, it works with the %H:%M:%S (or %X) format.
For your specific time format ("111213"), you need to go through the base function as.difftime:
hms::as_hms(as.difftime("111213", format = "%H%M%S"))
#> 11:12:13
So if we also include the date in a similar "integer" format, we can use the command:
strptime("20181017 112233", format="%Y%m%d %H%M%S")

how to take the difference between dates like "30MAY07"?

In R, how can I convert time variable "30MAY07" or "21AUG09" to a value? I want to find the time difference between them. Thanks!
You can use the lubridate package for this:
library(lubridate)
dmy(c('30MAY07', '21AUG09'))
# [1] "2007-05-30 UTC" "2009-08-21 UTC"
strptime and as.Date from base R are also good options, but lubridate makes very well-informed guesses as to the format of the date. As you can see in your example, there is no need to specify anything other than the day-month-year function (dmy), and things work out of the box.
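To get the time difference the question asks about, the parsed dates can be subtracted directly. A minimal base R sketch, assuming an English (or C) locale so that %b matches "MAY" and "AUG":

```r
# Parse two-digit-year dates such as "30MAY07" and take their difference.
d <- as.Date(c("30MAY07", "21AUG09"), format = "%d%b%y")
d

delta <- diff(d)        # a difftime object, in days
delta
as.numeric(delta)       # plain number of days: 814
```

Dates returned by lubridate's dmy can be subtracted in the same way.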

extract part of a date in a dataframe column

Thanks for your help in advance. I am working with the getQuote function in the quantmod package, which returns the following data frame:
Is there a way to modify all the dates in the first column to exclude the timestamp, while retaining the data frame structure? I just want the "YYYY-MM-DD" in the first column. I know that if it were a vector of dates, I would use substr(df[,1],1,10). I have also looked into the apply function, with: apply(df[,1],1,substr,1,10).
Another option not mentioned yet:
tt <- getQuote("AAPL")
trunc(tt[,1], units='days')
This returns the date in POSIXlt. You can wrap it in as.POSIXct, if you want.
using ?strptime
tt <- getQuote("AAPL")
tt[,1]
[1] "2013-01-16 02:52:00 CET"
as.POSIXct(strptime(tt[,1],format ='%Y-%m-%d')) ## as.POSIXct because strptime returns POSIXlt
[1] "2013-01-16 CET"
EDIT
You can use the format argument of POSIXct, but you need to convert the tt[,1] to character before.
as.POSIXct(as.character(tt[,1]),format ='%Y-%m-%d')
[1] "2013-01-16 CET"
I would do this with lubridate
library(plyr)
library(lubridate)
tickers <- c("AAPL","AAJX","ABR")
df <- ldply(tickers, getQuote)
rownames(df) <- tickers
df[,"Trade Time"] <- paste(year(df[,"Trade Time"]),month(df[,"Trade Time"]),day(df[,"Trade Time"]),sep="-")
There might be a more elegant way of printing the date, but this is what came to me first.
You may just use gsub. No need to convert data type.
tt <- getQuote("AAPL")
tt[, 'Trade Time']<- gsub(" [0-9]{2}:[0-9]{2}:[0-9]{2}", "", tt[, 'Trade Time'])
It can be as simple as:
tt[,1]=as.Date(tt[,1])
(where tt is tt <- getQuote("AAPL"), as shown in the alternative answers)
The blank before the comma means "do all rows" and the 1 after the comma means "operate on (just) the first column".
I prefer this solution because it gives you a Date object, which must be exactly what you want if you are trying to strip off timestamps.
agstudy's answer gives you a date with a timezone, and that is going to bite you the first time you run your script in a different timezone. (Aside: I got some regressions in a unit test suite when I ran them in the U.K. while there at Christmas, due to a subtle timezone assumption in my test code.)
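One caveat worth sketching: the as.Date method for POSIXct takes a tz argument, documented with "UTC" as its default, so a timestamp shortly after local midnight can land on the previous calendar day. The example below uses a made-up timestamp rather than live getQuote output and assumes the "Europe/Paris" zone is available on the system.

```r
# Hypothetical trade time, 30 minutes past midnight in Paris (23:30 UTC).
x <- as.POSIXct("2013-01-16 00:30:00", tz = "Europe/Paris")

utc_date   <- as.Date(x, tz = "UTC")            # calendar date in UTC
local_date <- as.Date(x, tz = "Europe/Paris")   # calendar date in Paris
as_text    <- format(x, "%Y-%m-%d")             # character, in x's own zone

utc_date
local_date
as_text
```

Passing the zone explicitly keeps the result independent of where the script happens to run.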
