In R, how can I convert time variable "30MAY07" or "21AUG09" to a value? I want to find the time difference between them. Thanks!
You can use the lubridate package for this:
library(lubridate)
dmy(c('30MAY07', '21AUG09'))
# [1] "2007-05-30" "2009-08-21"
strptime and as.Date from base R are also good options, but lubridate makes well-informed guesses about the format of the date. In your example, there is no need to specify anything beyond calling the day-month-year function (dmy), and things work out of the box.
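Since the question also asks for the difference between the two dates, here is a minimal base-R sketch of that step. The format string "%d%b%y" is my reading of the "30MAY07" layout, and the locale is set to C on the assumption that the month abbreviations are English.

```r
# Month abbreviations such as "MAY" are locale dependent; the C locale uses English names.
invisible(Sys.setlocale("LC_TIME", "C"))

# %d = day, %b = abbreviated month name, %y = two-digit year
dates <- as.Date(c("30MAY07", "21AUG09"), format = "%d%b%y")

# Subtracting two Date objects yields a difftime
elapsed <- dates[2] - dates[1]
elapsed_days <- as.numeric(elapsed, units = "days")
print(elapsed_days)
```

The same subtraction works on the Date vector returned by dmy().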
I want to convert strings such as "19-SEP-2022" to date. Is there any available function in R? Thank you.
Just for completeness, I want to add the parse_date_time function from the lubridate package. Without doubt, the preferred answer here is that of @Marco Sandri:
library(lubridate)
x <- "19-SEP-2022"
x <- parse_date_time(x, "dmy")
> x
[1] "2022-09-19 UTC"
> class(x)
[1] "POSIXct" "POSIXt"
Yes, strptime can be used to parse strings into dates.
You could do something like strptime("19-SEP-2022", "%d-%b-%Y").
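On input, %d also accepts days that are not zero-padded, so the same format handles "9-SEP-2022"; %e mainly matters when formatting output, where it pads single-digit days with a space instead of a zero.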
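A short sketch of the strptime route, under the assumption of an English or C locale for "SEP". It shows that strptime returns a POSIXlt object, that as.Date turns it into a plain date, and that, as far as I can tell, %d parses an unpadded day on input as well.

```r
# %b matches month abbreviations in the current locale; C uses English names.
invisible(Sys.setlocale("LC_TIME", "C"))

parsed <- strptime("19-SEP-2022", "%d-%b-%Y", tz = "UTC")
print(class(parsed))  # a POSIXlt date-time, not a Date

# Convert to a Date when no time of day is needed
as_date <- as.Date(parsed)

# as.Date accepts the same format string directly; the day here is not zero-padded
unpadded <- as.Date("9-SEP-2022", format = "%d-%b-%Y")
print(unpadded)
```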
A decade or so ago I started writing the anytime package because of the firm belief that for obvious date(time) patterns we should not need to specify patterns, or learn grammars.
I still use it daily, and so do a bunch of other CRAN users.
> anytime::anydate("19-SEP-2022")
[1] "2022-09-19"
So here we do exactly what you ask for: supply the string, return a date object.
I'm just starting my adventure with the lubridate package and dates in R, and at the beginning I was surprised by some behavior. When I do seq(ymd("2021-01-01"), ymd("2021-01-04"), ddays(1)) I only get one date: [1] "2021-01-01". But when I do seq(ymd_h("2021-01-01 00"), ymd_h("2021-01-04 00"), ddays(1)) I get the more expected result, which is four dates: "2021-01-01 UTC" "2021-01-02 UTC" "2021-01-03 UTC" "2021-01-04 UTC".
I admit that it surprised me a lot.
I would be very grateful if someone could explain in simple words why this is happening.
And immediately the second question. Is there any function like seq that would correctly understand the d... functions in the lubridate package (ddays, dhours, dminutes etc)?
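seq is not part of the lubridate package and doesn't understand the d... functions; it treats ddays(1) as the plain number 86400, the length of the duration in seconds.
ymd returns a Date, so when you call seq, you are using seq.Date, which reads a numeric step as a number of days. A step of 86400 days overshoots the end date, which is why you get only the first date.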
You want seq(ymd("2021-01-01"), ymd("2021-01-04"), "days")
ymd_h returns a POSIXct object, so then seq is using seq.POSIXct, which reads a numeric step as seconds; 86400 seconds happens to be exactly one day.
You again want seq(ymd_h("2021-01-01 00"), ymd_h("2021-01-04 00"), "days"), but now the result is a POSIXct vector.
See the help for seq.Date and seq.POSIXct to see how they differ.
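The difference between the two seq methods can be seen without lubridate at all. The sketch below uses the plain number 86400 as the step, on the assumption that this is what ddays(1) reduces to when seq receives it (one day expressed in seconds).

```r
start <- as.Date("2021-01-01")
end   <- as.Date("2021-01-04")
step  <- 86400  # one day in seconds; assumed to equal as.numeric(lubridate::ddays(1))

# seq.Date counts a numeric step in days, so the second element would lie far past `end`
by_number_date <- seq(start, end, by = step)
print(length(by_number_date))

# seq.POSIXct counts a numeric step in seconds, so 86400 advances one day at a time
by_number_time <- seq(as.POSIXct("2021-01-01", tz = "UTC"),
                      as.POSIXct("2021-01-04", tz = "UTC"),
                      by = step)
print(length(by_number_time))

# Naming the unit avoids the ambiguity for Date input
by_unit_date <- seq(start, end, by = "day")
print(by_unit_date)
```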
The new clock package has many good functions for date manipulation, including one called date_seq you might find useful.
I am working with a data.frame whose dates are given in the following format: Aug 12, 2017.
class(data[,1]) returns "factor".
How can I convert these into dates?
data[,1] <- as.Date.factor(data[,1], format = "%m.%d.%y") returns NAs.
I would suggest the package lubridate for very easy to use functions to operate with dates. For example:
mdy("Aug 12,2017")
[1] "2017-08-12"
If your date is in YYYY-MM-DD format, you can use the ymd function. There are also other functions such as dmy, dmy_hms (for datetime), etc.
If your column is called my.date, you can do:
data$my.date <- mdy(data$my.date)
Alternatively, you can use the %<>% operator from magrittr to make your code even shorter:
data$my.date %<>% mdy
Use as.POSIXct (Base-R Solution):
as.POSIXct("Aug 12,2017", format="%b%d,%Y")
Output:
[1] "2017-08-12 CEST"
Using strptime could also work:
strptime("Aug 12,2017", "%b%d,%Y")
Output:
[1] "2017-08-12 UTC"
The second parameter for strptime is the format of the dates you have. For instance, if your dates are like this "1/5/2005", then the format would be:
format="%m/%d/%Y"
Hope it helps
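To tie the base-R answers back to the original factor column, here is a hedged sketch. The data frame and its column name my.date are hypothetical stand-ins, and the C locale is set on the assumption that the month abbreviations are English. The format from the question, "%m.%d.%y", returns NA because it expects a numeric month, literal dots, and a two-digit year, none of which match "Aug 12, 2017".

```r
# %b matches month abbreviations in the current locale; C uses English names.
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical data standing in for the asker's factor column
data <- data.frame(my.date = factor(c("Aug 12, 2017", "Sep 3, 2017")))

# Convert the factor to character first, then parse:
# %b = abbreviated month name, %d = day, %Y = four-digit year
data$my.date <- as.Date(as.character(data$my.date), format = "%b %d, %Y")
str(data)
```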
I am currently working on replicating a SAS code to R.
In SAS, there is INTNX function that helps to advance a date by a given interval.
For example -
intnx('month','2013/12/10',3) = 2014/03/10
I was wondering if there is a function in R that works in a similar fashion?
Using the lubridate package, you can simply do this:
library(lubridate)
ymd("2013/12/10") + months(3)
[1] "2014-03-10"
Note also that if you want to add a month without exceeding the last day of the new month, you should use %m+%:
ymd("2013/01/31") %m+% months(1)
[1] "2013-02-28"
There is. You could do:
seq(as.Date("2013-12-10"), length=2, by="3 months")[2]
[1] "2014-03-10"
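One caveat with the seq approach is month-end dates. The sketch below shows what base R does when the target month is shorter than the starting day, plus one possible way to clamp to the end of the month; the clamping step is my own illustration, not something seq provides.

```r
jan31 <- as.Date("2013-01-31")

# Stepping one month from Jan 31 rolls over into March, because Feb 31 does not exist
rolled <- seq(jan31, by = "month", length.out = 2)[2]
print(rolled)

# Illustrative clamp: compute the last day of the following month and cap the result there
first_of_month   <- as.Date(format(jan31, "%Y-%m-01"))
last_of_next_mon <- seq(first_of_month, by = "month", length.out = 3)[3] - 1
clamped <- min(rolled, last_of_next_mon)
print(clamped)
```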
Thanks for your help in advance. I am working with the getQuote function in the quantmod package, which returns the following data frame:
Is there a way to modify all the dates in the first column to exclude the time stamp, while retaining the data frame structure? I just want the "YYYY-MM-DD" in the first column. I know that if it were a vector of dates, I would use substr(df[,1],1,10). I have also looked into the apply function, with: apply(df[,1],1,substr,1,10).
Another option not mentioned yet:
tt <- getQuote("AAPL")
trunc(tt[,1], units='days')
This returns the date in POSIXlt. You can wrap it in as.POSIXct, if you want.
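A small self-contained sketch of the trunc approach, using a hand-made timestamp as a stand-in for the getQuote output (the value and the UTC time zone are assumptions for illustration):

```r
# Stand-in for a trade time returned by getQuote
stamp <- as.POSIXct("2013-01-16 02:52:00", tz = "UTC")

# trunc drops the time of day and returns a POSIXlt object
day_lt <- trunc(stamp, units = "days")
print(class(day_lt))

# Wrap in as.POSIXct if the column should stay POSIXct
day_ct <- as.POSIXct(day_lt)
print(day_ct)
```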
Using strptime (see ?strptime):
tt <- getQuote("AAPL")
tt[,1]
[1] "2013-01-16 02:52:00 CET"
as.POSIXct(strptime(tt[,1],format ='%Y-%m-%d')) ## as.POSIXct because strptime returns POSIXlt
[1] "2013-01-16 CET"
EDIT
You can use the format argument of as.POSIXct, but you need to convert tt[,1] to character first.
as.POSIXct(as.character(tt[,1]),format ='%Y-%m-%d')
[1] "2013-01-16 CET"
I would do this with lubridate
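library(quantmod)  # provides getQuote
library(plyr)
library(lubridate)
tickers <- c("AAPL","AAJX","ABR")
df <- ldply(tickers, getQuote)
rownames(df) <- tickers
# Rebuild the date as a character string; note that month and day are not zero-padded
df[,"Trade Time"] <- paste(year(df[,"Trade Time"]),month(df[,"Trade Time"]),day(df[,"Trade Time"]),sep="-")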
There might be a more elegant way of printing the date, but this is what came to me first.
You may just use gsub. No need to convert data type.
tt <- getQuote("AAPL")
tt[, 'Trade Time']<- gsub(" [0-9]{2}:[0-9]{2}:[0-9]{2}", "", tt[, 'Trade Time'])
It can be as simple as:
tt[,1]=as.Date(tt[,1])
(where tt is tt <- getQuote("AAPL"), as shown in the alternative answers)
The blank before the comma means "do all rows" and the 1 after the comma means "operate on (just) the first column".
I prefer this solution because it gives you a Date object, which must be exactly what you want if you are trying to strip off timestamps.
agstudy's answer gives you a date with a timezone, and that is going to bite you the first time you run your script in a different timezone. (Aside: I got some regressions in a unit test suite when I ran them in the U.K. while there at Christmas, due to a subtle timezone assumption in my test code.)
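The timezone concern applies to as.Date as well: as far as I know, as.Date on a POSIXct value interprets it in UTC unless a tz is given, so a timestamp shortly after local midnight can land on the previous day. The sketch below uses a made-up timestamp in Europe/Paris and passes tz explicitly in each call so the outcome does not depend on defaults.

```r
# Hypothetical trade time, half an hour after midnight in Paris
stamp <- as.POSIXct("2013-01-16 00:30:00", tz = "Europe/Paris")

# Interpreted in UTC, the instant still belongs to the previous calendar day
date_utc <- as.Date(stamp, tz = "UTC")
print(date_utc)

# Interpreted in the timestamp's own zone, the calendar date is preserved
date_local <- as.Date(stamp, tz = "Europe/Paris")
print(date_local)

# Alternative: format in the stamp's zone first, then parse the text as a Date
date_text <- as.Date(format(stamp, "%Y-%m-%d"))
print(date_text)
```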