Convert timestamps without adding yyyy-mm-dd - r

I want to convert timestamps from characters to time but I do not want any date. On top of that, it must display milliseconds, as in "hh:mm:ss,os".
If I use as.POSIXct it always adds a date prefix to my timestamp, and that is not my intention. I also checked the lubridate package, but I can't seem to find a function that goes beyond "as.hms" so that it displays at least two digits of milliseconds.
Example using POSIXct
df <-c("01:31:12.20","01:31:14.56","01:31:14.84")
options(digits.secs = 2)
df <- as.POSIXct(df, format="%H:%M:%OS")
This is the outcome:
[1] "2019-03-15 01:31:12.20 EDT" "2019-03-15 01:31:14.55 EDT"
[3] "2019-03-15 01:31:14.83 EDT"
Thank you.

Perhaps, the hms package does what the OP expects. hms implements an S3 class for storing and formatting time-of-day values, based on the 'difftime' class.
library(hms)
as.hms(df)
01:31:12.200000
01:31:14.560000
01:31:14.840000
It can be used for calculation, e.g.,
diff(as.hms(df))
00:00:02.360000
00:00:00.280000
Please note that the print() and format() methods do not accept other parameters and do not respect options(digits.secs = 2).
The hms class is similar to lubridate's hms() function, which creates a period object:
lubridate::hms(df)
[1] "1H 31M 12.2S" "1H 31M 14.56S" "1H 31M 14.84S"
Arithmetic can be done as well:
diff(lubridate::hms(df))
[1] 2.36 0.28
Please, be aware that the internal representation of time, date, and datetime objects usually is based on numeric types to allow for doing calculations. The internal representation is different from the character string when the object is printed.
The ITime class in the data.table package is a time-of-day class which stores the integer number of seconds in the day. So, it cannot handle milliseconds.
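The point about numeric internal representation can be sketched in base R without any package. The helpers to_seconds() and format_tod() below are hypothetical names made up for this illustration, not part of hms or lubridate; the sketch assumes input of the form "hh:mm:ss.ss" and a display with two decimal places.

```r
# Hypothetical helper: convert "hh:mm:ss.ss" strings to seconds since midnight.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Hypothetical helper: format seconds since midnight as "hh:mm:ss.ss".
# Rounding to whole centiseconds first avoids truncation surprises.
format_tod <- function(secs) {
  cs <- round(secs * 100)
  h  <- cs %/% 360000
  m  <- (cs %% 360000) %/% 6000
  s  <- (cs %% 6000) / 100
  sprintf("%02d:%02d:%05.2f", as.integer(h), as.integer(m), s)
}

x    <- c("01:31:12.20", "01:31:14.56", "01:31:14.84")
secs <- to_seconds(x)      # plain numeric vector, usable in arithmetic
gaps <- diff(secs)         # differences in seconds
shown <- format_tod(secs)  # character strings for display only
```

The numeric vector is what you calculate with; the character strings exist only for printing.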

Related

Convert "xx-xxx-xxxx" to date in R

I want to convert strings such as "19-SEP-2022" to date. Is there any available function in R? Thank you.
Just for completeness, I want to add the parse_date_time function from the lubridate package. Without a doubt, the preferred answer here is that of @Marco Sandri:
library(lubridate)
x <- "19-SEP-2022"
x <- parse_date_time(x, "dmy")
x
[1] "2022-09-19 UTC"
> class(x)
[1] "POSIXct" "POSIXt"
Yes, strptime can be used to parse strings into dates.
You could do something like strptime("19-SEP-2022", "%d-%b-%Y").
If your days are not zero-padded, then use %e instead of %d.
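One caveat worth knowing: strptime matches %b against the month abbreviations of the current locale, so "SEP" may fail to parse in a non-English locale. As a sketch of a locale-independent alternative, the hypothetical helper parse_dmy_abb() below (a name made up for this example) matches the month against R's built-in month.abb constant, which is always English. It assumes input of the form "dd-MON-yyyy".

```r
# Hypothetical helper: parse "dd-MON-yyyy" independently of the locale.
parse_dmy_abb <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  day   <- as.integer(vapply(parts, `[`, character(1), 1))
  # Case-insensitive match against the English month abbreviations.
  mon   <- match(toupper(vapply(parts, `[`, character(1), 2)), toupper(month.abb))
  year  <- as.integer(vapply(parts, `[`, character(1), 3))
  as.Date(sprintf("%04d-%02d-%02d", year, mon, day), format = "%Y-%m-%d")
}

parsed <- parse_dmy_abb(c("19-SEP-2022", "5-Jan-2023"))
```

The result is a Date vector, so it can be used directly in date arithmetic.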
A decade or so ago I started writing the anytime package because of the firm belief that for obvious date(time) patterns we should not need to specify patterns, or learn grammars.
I still use it daily, and so do a bunch of other CRAN users.
> anytime::anydate("19-SEP-2022")
[1] "2022-09-19"
>
So here we do exactly what you ask for: supply the string, return a date object.

Time series, lubridate, and the seq function

I'm just starting my adventure with the lubridate package and dates in R, and right at the beginning I was surprised by some behavior. When I do seq(ymd("2021-01-01"), ymd("2021-01-04"), ddays(1)) I only get one date, [1] "2021-01-01". But when I do seq(ymd_h("2021-01-01 00"), ymd_h("2021-01-04 00"), ddays(1)) I get the more expected result, which is four dates: "2021-01-01 UTC" "2021-01-02 UTC" "2021-01-03 UTC" "2021-01-04 UTC".
I admit that it surprised me a lot.
I will be very grateful for explaining in simple words why this is happening.
And immediately the second question. Is there any function like seq that would correctly understand the d... functions in the lubridate package (ddays, dhours, dminutes etc)?
seq is not part of the lubridate package and doesn't understand the d... functions.
ymd returns a Date, so when you call seq, you are using seq.Date.
You want seq(ymd("2021-01-01"), ymd("2021-01-04"), "days")
ymd_h returns a POSIXct object, so then seq is using seq.POSIXct.
You again want seq(ymd_h("2021-01-01 00"), ymd_h("2021-01-04 00"), "days"), but now the result is a POSIXct vector.
See the help for seq.Date and seq.POSIXct to see how they differ.
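A likely explanation for the single date, sketched in base R: ddays(1) is a duration of 86400 seconds, and when seq receives a plain number as its step, seq.Date counts it in days while seq.POSIXct counts it in seconds. The sketch uses the bare number 86400 in place of ddays(1); that substitution is an assumption about how seq treats the duration object.

```r
step <- 86400  # seconds in one day; stands in for ddays(1)

# Date sequence: a numeric step is counted in days, so the second value
# would be 86400 days later, far past the end date.
d <- seq(as.Date("2021-01-01"), as.Date("2021-01-04"), by = step)
length(d)

# POSIXct sequence: a numeric step is counted in seconds, so 86400 is one day.
p <- seq(as.POSIXct("2021-01-01", tz = "UTC"),
         as.POSIXct("2021-01-04", tz = "UTC"), by = step)
length(p)

# A character step such as "day" works for both classes.
d2 <- seq(as.Date("2021-01-01"), as.Date("2021-01-04"), by = "day")
length(d2)
```

This is why the character form of the step is the safer choice when the same code may receive either a Date or a POSIXct.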
The new clock package has many good functions for date manipulation, including one called date_seq you might find useful.

How to avoid that anytime(<numeric>) "updates by reference"?

I want to convert a numeric variable to POSIXct using anytime. My issue is that anytime(<numeric>) converts the input variable as well - I want to keep it.
Simple example:
library(anytime)
t_num <- 1529734500
anytime(t_num)
# [1] "2018-06-23 08:15:00 CEST"
t_num
# [1] "2018-06-23 08:15:00 CEST"
This differs from the 'non-update by reference' behaviour of as.POSIXct in base R:
t_num <- 1529734500
as.POSIXct(t_num, origin = "1970-01-01")
# [1] "2018-06-23 08:15:00 CEST"
t_num
# 1529734500
Similarly, anydate(<numeric>) also updates by reference:
d_num <- 17707
anydate(d_num)
# [1] "2018-06-25"
d_num
# [1] "2018-06-25"
I can't find an explicit description of this behaviour in ?anytime. I could use as.POSIXct as above, but does anyone know how to handle this within anytime?
anytime author here: this is standard R and Rcpp and passing-by-SEXP behaviour: you cannot protect a SEXP being passed from being changed.
The view that anytime takes is that you are asking for an input to be converted to a POSIXct as that is what anytime does: from char, from int, from factor, from anything. As a POSIXct really is a numeric value (plus a S3 class attribute) this is what you are getting.
If you do not want this (counter to the design of anytime), you can do what @Moody_Mudskipper and @PKumar showed: use a temporary expression (or variable).
(I also think the data.table example is a little unfair as data.table -- just like Rcpp -- is very explicit about taking references where it can. So of course it refers back to the original variable. There are idioms for deep copy if you need them.)
Lastly, an obvious trick is to use format if you just want different display:
R> d <- data.frame(t_num=1529734500)
R> d[1, "posixct"] <- format(anytime::anytime(d[1, "t_num"]))
R> d
       t_num             posixct
1 1529734500 2018-06-23 01:15:00
R>
That would work the same way in data.table, of course, as the string representation is a type change. Ditto for IDate / ITime.
Edit: The development version in the GitHub repo has had functionality to preserve the incoming argument since June 2017, so the next CRAN version, whenever I push it, will have it too.
You could hack it like this:
library(anytime)
t_num <- 1529734500
anytime(t_num+0)
# POSIXct[1:1], format: "2018-06-23 08:15:00"
t_num
# [1] 1529734500
Note that an integer input will be treated differently:
t_int <- 1529734500L
anytime(t_int)
# POSIXct[1:1], format: "2018-06-23 08:15:00"
t_int
# [1] 1529734500
If you do this, it will work:
t_num <- 1529734500
anytime(t_num*1)
# [1] "2018-06-23 06:15:00 UTC"
t_num
# [1] 1529734500
Any reason to be married to anytime?
.POSIXct(t_num, tz = 'Europe/Berlin')
# [1] "2018-06-23 08:15:00 CEST"
.POSIXct(x, tz) is a wrapper for structure(x, class = c('POSIXct', 'POSIXt'), tzone = tz), so you do not need to declare an origin. It is essentially what as.POSIXct.numeric does, except that the latter is flexible in allowing non-UTC origin dates; look at print(as.POSIXct.numeric).
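A small base-R sketch of that equivalence, using UTC as the time zone purely so the result does not depend on the machine it runs on. The variable names a, b and d are made up for the example.

```r
t_num <- 1529734500

# Three ways to attach the POSIXct class to the same number of seconds.
a <- .POSIXct(t_num, tz = "UTC")
b <- structure(t_num, class = c("POSIXct", "POSIXt"), tzone = "UTC")
d <- as.POSIXct(t_num, origin = "1970-01-01", tz = "UTC")

all.equal(a, b)                 # same underlying value and attributes
format(a, "%Y-%m-%d %H:%M:%S")  # "2018-06-23 06:15:00"

# The input is left alone: it is still a bare number.
class(t_num)
```

None of these calls modifies t_num, which is the behaviour the question asks for.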
When I did my homework before posting the question, I checked the open anytime issues. I have now browsed the closed ones as well, where I found exactly the same issue as mine:
anytime is overwriting inputs
There the package author writes:
I presume because as.POSIXct() leaves its input alone, we should too?
So from anytime version 0.3.1 (unreleased):
Numeric input is now preserved rather than silently cast to the return object type
Thus, one answer to my question is: "wait for 0.3.1"*.
When 0.3.1 is released, the behaviour of anytime(<numeric>) will agree with anytime(<non-numeric>) and as.POSIXct(<numeric>), and work-arounds will no longer be needed.
*Didn't have to wait too long: 0.3.1 is now released: "Numeric input is now preserved rather than silently cast to the return object type"

How do I change the format of a char vector containing milliseconds to timeseries vector in R

I have a DF in R which has two character columns. The first column is a time series array and the second column contains continuous numbers. The time series field has time recorded in milliseconds. I am trying to convert this array to a date array. However whichever method I use to convert the same, I lose the milliseconds information.
Following is the dataframe:
time = c("08-08-2016 09:16:33.430","08-08-2016 09:16:37.930")
values <- c(45,21)
my_data <- data.frame(time,values)
I would like to preserve the millisecond information. However, when I convert the time char array using the following method, I lose the milliseconds (output time array: 2016-08-08 09:16:33, 2016-08-08 09:16:37).
my_data$time=strptime(my_data$time,format="%m-%d-%Y %H:%M:%S.%OS")
I also tried using as.POSIXct, as.Date functions but could not resolve. Can someone please help?
Use %OS instead of %S, not in addition to it. "%m-%d-%Y %H:%M:%OS" is the format string required:
options(digits.secs=6)
as.POSIXct(my_data$time, format="%m-%d-%Y %H:%M:%OS")
#[1] "2016-08-08 09:16:33.43 AEST" "2016-08-08 09:16:37.93 AEST"
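To check that the fractional seconds really are kept, independently of how they are printed, one can look at the underlying numbers. This is a sketch that assumes UTC as the time zone so the result is reproducible; the names parsed, frac and gap are made up for the example.

```r
x <- c("08-08-2016 09:16:33.430", "08-08-2016 09:16:37.930")
parsed <- as.POSIXct(x, format = "%m-%d-%Y %H:%M:%OS", tz = "UTC")

# Fractional part of the stored seconds; non-zero means the
# milliseconds survived parsing.
frac <- as.numeric(parsed) %% 1

# Time difference between the two rows, in seconds.
gap <- as.numeric(diff(parsed), units = "secs")

# Display with three decimals for one call, without changing options().
# Note that R truncates rather than rounds here, so the last digit
# can occasionally be one lower than in the input.
format(parsed, "%Y-%m-%d %H:%M:%OS3")
```

The digits.secs option and the %OSn format only affect display; the stored value is the same either way.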
You have a standard-enough format so that anytime can parse this automagically without additional input from you:
R> timevec <- c("08-08-2016 09:16:33.430","08-08-2016 09:16:37.930")
R> anytime(timevec)
[1] "2016-08-08 09:16:33.43 CDT" "2016-08-08 09:16:37.93 CDT"
R>
I tend to have options(digits.secs=6) set by default which is why the display also shows the fractional seconds.

R not recognizing time component of datetime values

I have a dataframe where one column lists a bunch of datetimes. Oddly, the data type for that column is "integer." I need to coerce the column to a proper datetime data type such as POSIXct so that I can subtract these timestamps from those in another field. However, when I try to coerce these datetime values into POSIXct, they lose the time component. When I try to do math on the datetimes without first coercing into another datatype, R acts as if the time component of the timestamp isn't there (it assumes each date has a time of midnight). What's going on and how do I fix it so that R recognizes the timestamp?
> dates[1]
[1] 2016-05-05T16:46:21-04:00
48 Levels: 2016-05-03T06:45:42-04:00 2016-05-03T06:45:43-04:00 ... 2016-05-05T16:50:00-04:00
> typeof(dates)
[1] "integer"
> as.POSIXct(dates[1])
[1] "2016-05-05 EDT"
> as.character(dates[1])
[1] "2016-05-05T16:46:21-04:00"
> as.POSIXct(as.character(dates[1]))
[1] "2016-05-05 EDT"
You can use as.POSIXct with the tz argument to convert the timestamps with the right level of control.
If the timezones are all UTC-04:00 and that is your local timezone, you can use:
dates = as.POSIXct(dates, format="%Y-%m-%dT%H:%M:%S", tz=Sys.timezone())
If they are all UTC-04:00 and that is not your local timezone, but you know the exact location, then you can specify the appropriate timezone from the tz database:
dates = as.POSIXct(dates, format="%Y-%m-%dT%H:%M:%S", tz="America/Port_of_Spain")
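Alternatively, you can use a generic fixed-offset timezone. Note that the POSIX-style Etc/GMT names have inverted signs, so UTC-04:00 is written "Etc/GMT+4":
dates = as.POSIXct(dates, format="%Y-%m-%dT%H:%M:%S", tz="Etc/GMT+4")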
[EDIT: With thanks to Roland for his comment below. I originally used strptime, which uses the same syntax, but returns a POSIXlt object.]
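If the offsets in the strings should be honoured rather than assumed, a sketch along the following lines may help. It relies on the %z conversion, which reads offsets written as -0400; because support for the colon form -04:00 may depend on the R version, the sketch removes the colon first. The sample values are taken from the question, and a factor column would need as.character() before this step.

```r
x <- c("2016-05-05T16:46:21-04:00", "2016-05-03T06:45:42-04:00")

# Turn "-04:00" into "-0400" so that %z can read it.
no_colon <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)

# Parse with the offset applied; the result is stored and shown in UTC.
parsed <- as.POSIXct(no_colon, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
format(parsed, "%Y-%m-%d %H:%M:%S")

# Timestamps can now be subtracted with the time component included.
elapsed <- difftime(parsed[1], parsed[2], units = "hours")
```

Once the column is POSIXct, the time of day takes part in arithmetic, so the difference above is a little over 58 hours rather than exactly two days.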
