parse date time function order format - r

The original input data looks like "1975M01"; the variable name is Month.
We convert the input into a POSIXct date-time object with parse_date_time(Month, "%Y%m").
I wonder why the 'M' in "1975M01" can simply be ignored: how can the function recognize the year and the month and skip the 'M' in the middle automatically?

You can try as.POSIXct(). There is no day component, but presumably we want the first day of the month, so we can just paste "01" onto the end of the string(s). Then define the "M" literally in the format= string, and of course add %d for the day:
x <- as.POSIXct(paste0('1975M01', '01'), format='%YM%m%d')
x
# [1] "1975-01-01 CET"
where
class(x)
# [1] "POSIXct" "POSIXt"
This should also work with lubridate, but I can't install it here to test.
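As for the original question: lubridate's parse_date_time() with the default exact = FALSE treats non-digit characters such as the "M" as separators between the numeric fields, which is why it can pick out the year and month on its own. A minimal sketch, assuming lubridate can be installed:
library(lubridate)
# the "M" between year and month is skipped as a separator
parse_date_time("1975M01", orders = "Ym")
# [1] "1975-01-01 UTC"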

Related

R: why a '%b.%Y' date class is not "Date"?

Sometimes I work with data like this:
sep-2018
From date like this:
Sys.Date()
[1] "2018-09-21"
To have this result, I generally use:
format(Sys.Date(),'%b-%Y')
But its class is not a date:
class(format(Sys.Date(),'%b-%Y'))
[1] "character"
Why isn't it a date? Is it possible to get this result with class() = "Date", and how?
Also, an external library like zoo shows the same behaviour.
library(zoo)
> class(format(as.yearmon(format(Sys.Date()), "%Y-%m-%d"), "%b.%Y"))
[1] "character"
Also using "%m.%Y" seems to generate the same thing, but it does not creates (for example) ordering issue.
The format command takes the date and outputs a printable string based on the format you provide. To quote the documentation:
An object of similar structure to x containing character representations of the
elements of the first argument x in a common format, and in the current
locale's encoding.
Also, a Date variable is stored as a numeric type internally (number of days since 1970-01-01)
dput(Sys.Date())
#structure(17795, class = "Date")
structure(0, class = "Date")
#[1] "1970-01-01"
So to pinpoint the date, you need day, month and year fields. If you don't have all three, it will probably return NA or an error. Similarly for time classes. If you don't have the data then you can just use some dummy values, and use format to print only the fields you want.
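For example (a minimal sketch; the month-name matching assumes an English locale and may be case-sensitive on some platforms), you can append a dummy day, parse a full date, and then format only the fields you want:
x <- "sep-2018"
d <- as.Date(paste0(x, "-01"), format = "%b-%Y-%d")  # dummy day "01"
d
# [1] "2018-09-01"
format(d, "%b-%Y")
# e.g. "Sep-2018" (case depends on the locale)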
As Rohit says, format doesn't output a Date object, but a string in the format of your choice.
To get a Date object from a string like "sep-2018" you could use readr::parse_date().
(my_date <- readr::parse_date("sep-2018", format = '%b-%Y'))
#[1] "2018-09-01"
class(my_date)
#[1] "Date"

How do I change the format of a char vector containing milliseconds to timeseries vector in R

I have a data frame in R with two character columns. The first column is a time series and the second contains continuous numbers. The time field records time down to the millisecond. I am trying to convert this column to a date-time column, but whichever method I use for the conversion, I lose the millisecond information.
Following is the dataframe:
time = c("08-08-2016 09:16:33.430","08-08-2016 09:16:37.930")
values <- c(45,21)
my_data <- data.frame(time,values)
I would like to preserve the millisecond information. However, when I convert the time character vector using the following method, I lose the milliseconds (output: 2016-08-08 09:16:33, 2016-08-08 09:16:37).
my_data$time=strptime(my_data$time,format="%m-%d-%Y %H:%M:%S.%OS")
I also tried the as.POSIXct and as.Date functions but could not resolve it. Can someone please help?
Use %OS instead of %S, not in addition to it; "%m-%d-%Y %H:%M:%OS" is the format string required:
options(digits.secs=6)
as.POSIXct(my_data$time, format="%m-%d-%Y %H:%M:%OS")
#[1] "2016-08-08 09:16:33.43 AEST" "2016-08-08 09:16:37.93 AEST"
You have a standard-enough format, so anytime can parse this automagically without additional input from you:
R> library(anytime)
R> timevec <- c("08-08-2016 09:16:33.430","08-08-2016 09:16:37.930")
R> anytime(timevec)
[1] "2016-08-08 09:16:33.43 CDT" "2016-08-08 09:16:37.93 CDT"
R>
I tend to have options(digits.secs=6) set by default which is why the display also shows the fractional seconds.

R not recognizing time component of datetime values

I have a dataframe where one column lists a bunch of datetimes. Oddly, the data type for that column is "integer." I need to coerce the column to a proper datetime data type such as POSIXct so that I can subtract these timestamps from those in another field. However, when I try to coerce these datetime values into POSIXct, they lose the time component. When I try to do math on the datetimes without first coercing into another datatype, R acts as if the time component of the timestamp isn't there (it assumes each date has a time of midnight). What's going on and how do I fix it so that R recognizes the timestamp?
> dates[1]
[1] 2016-05-05T16:46:21-04:00
48 Levels: 2016-05-03T06:45:42-04:00 2016-05-03T06:45:43-04:00 ... 2016-05-05T16:50:00-04:00
> typeof(dates)
[1] "integer"
> as.POSIXct(dates[1])
[1] "2016-05-05 EDT"
> as.character(dates[1])
[1] "2016-05-05T16:46:21-04:00"
> as.POSIXct(as.character(dates[1]))
[1] "2016-05-05 EDT"
You can use as.POSIXct with the tz argument to convert the timestamps with the right level of control.
If the timezones are all UTC-04:00 and that is your local timezone, you can use:
dates = as.POSIXct(dates, format="%Y-%m-%dT%H:%M:%S", tz=Sys.timezone())
If they are all UTC-04:00 and that is not your local timezone, but you know the exact location, then you can specify the appropriate timezone from the tz database:
dates = as.POSIXct(dates, format="%Y-%m-%dT%H:%M:%S", tz="America/Port_of_Spain")
Alternatively, you can use a generic fixed-offset timezone; note that the signs of the Etc/GMT* zone names are inverted, so UTC-04:00 is "Etc/GMT+4":
dates = as.POSIXct(dates, format="%Y-%m-%dT%H:%M:%S", tz="Etc/GMT+4")
[EDIT: With thanks to Roland for his comment below. I originally used strptime, which uses the same syntax, but returns a POSIXlt object.]
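If the offsets embedded in the strings may vary, another option (a sketch, assuming every string carries an explicit offset) is to let %z consume the offset itself; base R accepts it on input in the ±hhmm form, so the colon has to be stripped first, and the result is reported in whatever tz you ask for:
x <- as.character(dates)
x <- sub(":([0-9]{2})$", "\\1", x)   # "-04:00" -> "-0400"
as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
# e.g. "2016-05-05 20:46:21 UTC" for the first element shown above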

error in getting the correct date using strptime in R

I'm using strptime to extract the date, and the result has the wrong year.
Where is the error in the code below?
strptime('8/29/2013 14:13', "%m/%d/%y")
[1] "2020-08-29 PDT"
What are other ways to extract the date and time as separate columns?
The data I have is in this format - 8/29/2013 14:13
I want to split this into two columns, one is 8/29/2013 and the other is 14:13.
You have a four-digit year, so you need to use %Y:
strptime('8/29/2013 14:13', "%m/%d/%Y" )
[1] "2013-08-29 CEST"
Do you really want the date and time in separate columns? It is usually much easier to deal with a single date-time object.
Here's one possibility to separate time and date from the string.
For convenience, we could first convert the string into a POSIX object:
datetime <- '8/29/2013 14:13'
datetime.P <- as.POSIXct(datetime, format='%m/%d/%Y %H:%M')
Then we can use as.Date() to extract the date from this object and use format() to display it in the desired format:
format(as.Date(datetime.P),"%m/%d/%Y")
#[1] "08/29/2013"
To store the time separately we can use, e.g., the strftime() function:
strftime(datetime.P, '%H:%M')
#[1] "14:13"
The last function, strftime(), is vectorized over its first argument (it is essentially a wrapper around format()), so if we are dealing with a vector datetime containing several character strings with date and time in the format described in the OP, it can be applied to the whole converted vector at once; wrapping it in sapply(), as in the example below, gives the same result element by element.
Example
datetime <- c('8/29/2013 14:13', '9/15/2014 12:03')
datetime.P <- as.POSIXct(datetime, format='%m/%d/%Y %H:%M')
format(as.Date(datetime.P),"%m/%d/%Y")
#[1] "08/29/2013" "09/15/2014"
sapply(datetime.P, strftime, '%H:%M')
#[1] "14:13" "12:03"
Hope this helps.

extract part of a date in a dataframe column

Thanks for your help in advance. I am working with the getQuote function in the quantmod package, which returns a data frame of quotes.
Is there a way to modify all the dates in the first column to exclude the time stamp, while retaining the data frame structure? I just want the "YYYY-MM-DD" in the first column. I know that if it were a vector of dates, I would use substr(df[,1], 1, 10). I have also looked into the apply function, with apply(df[,1], 1, substr, 1, 10).
Another option not mentioned yet:
tt <- getQuote("AAPL")
trunc(tt[,1], units='days')
This returns the date in POSIXlt. You can wrap it in as.POSIXct, if you want.
using ?strptime
tt <- getQuote("AAPL")
tt[,1]
[1] "2013-01-16 02:52:00 CET"
as.POSIXct(strptime(tt[,1],format ='%Y-%m-%d')) ## as.POSIXct because strptime returns POSIXlt
[1] "2013-01-16 CET"
EDIT
You can use the format argument of as.POSIXct, but you need to convert tt[,1] to character first.
as.POSIXct(as.character(tt[,1]),format ='%Y-%m-%d')
[1] "2013-01-16 CET"
I would do this with lubridate
library(plyr)
library(lubridate)
tickers <- c("AAPL","AAJX","ABR")
df <- ldply(tickers, getQuote)
rownames(df) <- tickers
df[,"Trade Time"] <- paste(year(df[,"Trade Time"]),month(df[,"Trade Time"]),day(df[,"Trade Time"]),sep="-")
There might be a more elegant way of printing the date, but this is what came to me first.
You may just use gsub. No need to convert data type.
tt <- getQuote("AAPL")
tt[, 'Trade Time']<- gsub(" [0-9]{2}:[0-9]{2}:[0-9]{2}", "", tt[, 'Trade Time'])
It can be as simple as:
tt[,1]=as.Date(tt[,1])
(where tt is tt <- getQuote("AAPL"), as shown in the alternative answers)
The blank before the comma means "do all rows" and the 1 after the comma means "operate on (just) the first column".
I prefer this solution because it gives you a Date object, which must be exactly what you want if you are trying to strip off timestamps.
agstudy's answer gives you a date with a timezone, and that is going to bite you the first time you run your script in a different timezone. (Aside: I got some regressions in a unit test suite when I ran them in the U.K. while there at Christmas, due to a subtle timezone assumption in my test code.)
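Related to that timezone point: as.Date() on a POSIXct truncates in a fixed timezone (UTC by default in the R versions I have used), so passing tz explicitly keeps the truncation in the timezone you care about. A hedged sketch:
tt <- getQuote("AAPL")
# truncate to a calendar date in the local timezone rather than in UTC
as.Date(tt[, 1], tz = Sys.timezone())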

Resources