formatting time in R error - r

I have a Time column in my df with values like 1.01.2016 0:00:05. I want it without the seconds, so I used df$Time <- as.POSIXct(df$Time, format = "%d.%m.%Y :%H:%M", tz = "Asia/Kolkata"), but I get NA. What is the problem here?

I suspect there are two things working here: the storage of a time object (POSIXt), and the representation of that object.
The string you present is (I believe) not a proper POSIXt (whether POSIXct or POSIXlt) object for R, which means it is just a character string. In that case, you can remove the seconds with:
gsub(':[^:]*$', '', '1.01.2016 0:00:05')
# [1] "1.01.2016 0:00"
However, that is still just a string, not a date or time object. If you parse it into a time-object that R knows about:
as.POSIXct("1.01.2016 0:00:05", format = "%d.%m.%Y %H:%M:%S", tz = "Asia/Kolkata")
# [1] "2016-01-01 00:00:05 IST"
then you now have a time object that R knows something about ... and it defaults to representing it (printing it on the console) with seconds-precision. Typically, all that is available to change for the console-printing is the precision of the seconds, as in
options("digits.secs")
# $digits.secs
# NULL
Sys.time()
# [1] "2018-06-26 18:21:06 PDT"
options("digits.secs"=3)
Sys.time()
# [1] "2018-06-26 18:21:10.090 PDT"
then you can get more. But alas, I do not think there is an R option to say "always print my POSIXt objects in this way". So your only choice is (at the point where you no longer need it to be a time-like object) to change it back into a string with no time-like value:
x <- as.POSIXct("1.01.2016 0:00:05", format = "%d.%m.%Y %H:%M:%S", tz = "Asia/Kolkata")
x
# [1] "2016-01-01 00:00:05 IST"
?strptime
# see that day-of-month can either be "%d" for 01-31 or "%e" for 1-31
format(x, format="%e.%m.%Y %H:%M")
# [1] " 1.01.2016 00:00"
(This works equally well for a vector.)
I lean towards converting to POSIXt and back to a string rather than my gsub example, because as.POSIXct will tell you (by returning NA) when a string does not match the date-time format you are expecting, whereas gsub will happily do something wrong or nothing.
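To make that last point concrete, here is a minimal sketch of the parse-then-format round trip on a small vector. The vector times and the deliberately broken third entry are made up for illustration. It also shows that the format string from the question, with its stray ":" before %H, yields NA.

```r
# Hypothetical sample data; the third element is deliberately malformed.
times <- c("1.01.2016 0:00:05", "15.03.2016 13:45:59", "not a time")

# Parse with a format that matches the strings, including the seconds.
parsed <- as.POSIXct(times, format = "%d.%m.%Y %H:%M:%S", tz = "Asia/Kolkata")

# Entries that do not match the format come back as NA, so they are easy to spot.
bad <- is.na(parsed)

# Convert back to character without the seconds (NA stays NA).
no_secs <- format(parsed, format = "%d.%m.%Y %H:%M")

# The format from the question has a stray ":" before %H, so nothing matches.
from_question <- as.POSIXct("1.01.2016 0:00:05",
                            format = "%d.%m.%Y :%H:%M", tz = "Asia/Kolkata")
```

Here bad flags only the third element, and from_question is NA because the literal ":" in the format has nothing to match in the input.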

Try as.POSIXlt:
> test <- "1.01.2016 0:00:05"
> as.POSIXlt(test, "%d.%m.%Y %H:%M:%S", tz="Asia/Kolkata")
[1] "2016-01-01 00:00:05 IST"

Related

R common character to date converter for multiple formats

I am working with an input file where I have different string dates given in different month,day,year formats
example input ->
input <- c("2014-08-31 23:59:38" , "9/1/2014 00:00:25","2014-08-31 13:39:23", "12/1/2014 20:03:28")
How can I use a single function that converts these various date formats quickly? I am processing millions of lines.
so far I have written this function:
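library(lubridate)
# Handles one string at a time: try month/day/year first, then fall back to year-month-day.
convert_date <- function(x){
  if (is.na(mdy_hms(x))){
    return(ymd_hms(x))
  }
  return(mdy_hms(x))
}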
However, it is extremely slow, I am looking for a faster and more convenient method.
Thank you so much for your time.
If you can construct a vector of possible formats that the date could be in, you could use clock. For each date-time string, it stops on the first format that succeeds.
Note that this only works if your formats are unambiguous. i.e. it would probably give you faulty results if you had both %m/%d/%Y and %d/%m/%Y in the same vector, because those are ambiguous.
library(clock)
input <- c(
  "2014-08-31 23:59:38", "9/1/2014 00:00:25",
  "2014-08-31 13:39:23", "12/1/2014 20:03:28"
)
format <- c("%Y-%m-%d %H:%M:%S", "%m/%d/%Y %H:%M:%S")
date_time_parse(input, zone = "UTC", format = format)
#> [1] "2014-08-31 23:59:38 UTC" "2014-09-01 00:00:25 UTC"
#> [3] "2014-08-31 13:39:23 UTC" "2014-12-01 20:03:28 UTC"
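If adding a package is not an option, the same "first format that succeeds" idea can be sketched in base R. This is only an illustration, not how clock does it internally; parse_multi is a made-up helper name. It is vectorized: each format is tried only on the elements that are still NA.

```r
# Hypothetical helper: try each format in turn on the elements not yet parsed.
parse_multi <- function(x, formats, tz = "UTC") {
  # Start with an all-NA POSIXct vector of the right length and time zone.
  out <- .POSIXct(rep(NA_real_, length(x)), tz = tz)
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break
    # as.POSIXct returns NA where the format does not match.
    out[todo] <- as.POSIXct(x[todo], format = fmt, tz = tz)
  }
  out
}

input <- c(
  "2014-08-31 23:59:38", "9/1/2014 00:00:25",
  "2014-08-31 13:39:23", "12/1/2014 20:03:28"
)
formats <- c("%Y-%m-%d %H:%M:%S", "%m/%d/%Y %H:%M:%S")
res <- parse_multi(input, formats)
```

The same caveat about ambiguous formats applies: the order of formats decides which interpretation wins.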

POSIXct Strips Seconds from 12-hour Timestamp

I'm trying to convert a 12-hour timestamp to a POSIXct object in R. For some reason it strips away the seconds after the conversion.
## timestamp
chk = '17-MAY-16 04.51.34.000000000 PM'
## convert
as.POSIXct(chk, format = '%d-%b-%y %I.%M.%S.%OS %p', tz = 'America/New_York')
[1] "2016-05-17 16:51:00 EDT"
Am I doing something incorrectly?
It does not strip the seconds. It simply adheres to a default for printing and formatting which does not include subseconds.
Witness an example that (1) actually has subsecond entries, (2) runs in a session with options(digits.secs) set correctly, and (3) corrects one error you had in the format string.
Demo:
R> options(digits.secs=6) # important to tell R we want subsecs
R> input <- '17-MAY-16 04.51.34.123456 PM'
R> as.POSIXct(input, '%d-%b-%y %I.%M.%OS %p', tz = 'America/New_York')
[1] "2016-05-17 16:51:34.123456 EDT"
R>
If we reset digits.secs=0 it falls back to whole seconds only (which is after all a good default for many settings, though one may argue that %OS could override it...)
R> options(digits.secs=0) # reset
R> as.POSIXct(input, '%d-%b-%y %I.%M.%OS %p', tz = 'America/New_York')
[1] "2016-05-17 16:51:34 EDT"
R>
Also note the small change to the format string. Don't use both %S and %OS.
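If you would rather not touch the global option, a small sketch of the alternative is to ask for fractional seconds explicitly when formatting, using %OS followed by a digit count. The timestamp below is made up and written in a numeric layout to avoid locale-dependent month names.

```r
# Parse a timestamp with fractional seconds; %OS reads them on input.
x <- as.POSIXct("2016-05-17 16:51:34.123456",
                format = "%Y-%m-%d %H:%M:%OS", tz = "America/New_York")

# The fraction is stored even when the default printing hides it.
has_fraction <- (as.numeric(x) %% 1) > 0

# Whole seconds only.
whole <- format(x, "%H:%M:%S")

# Three fractional digits, independent of options(digits.secs).
millis <- format(x, "%H:%M:%OS3")
```

Here whole is "16:51:34" and millis is "16:51:34.123".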

setting column to datetime in R

The date in my dataset is like this: 20130501000000 and I'm trying to convert this to a better datetime format in R
data1$date <- as.Date(data1$date, format = "%Y-%m-%s-%h-%m-%s")
However, I get an error for needing an origin. After I put the very first cell under date in as origin, it converts every cell under date to N/A. Is this right or should I try as.POSIXct()?
That is a somewhat degenerate format, but the anytime() and anydate() functions of the anytime package can help you, without requiring any explicit format strings:
R> anytime("20130501000000") ## returns POSIXct
[1] "2013-05-01 CDT"
R> anydate("20130501000000") ## returns Date
[1] "2013-05-01"
R>
Note that we parse from character representation here -- parsing from numeric would be wrong as we use a conflicting heuristic to make sense of dates stored as numeric values.
So here your code would just become
data1$date <- anytime::anydate(data1$date)
provided data1$date is in character, else wrap one as.character() around it.
Lastly, if you actually want Datetime rather than Date (as per your title), don't use anydate() but anytime().
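If the column is stored as numeric, the conversion to character deserves a little care: for numbers this large, the default conversion may switch to scientific notation, depending on the R version and settings. A hedged sketch that sidesteps this uses sprintf("%.0f", ...) and then base R parsing. The data frame below is made up for illustration.

```r
# Hypothetical data: the timestamps are stored as plain numbers.
data1 <- data.frame(date = c(20130501000000, 20130501035001))

# Render all digits without scientific notation.
date_chr <- sprintf("%.0f", data1$date)

# Parse the 14-digit strings; the time zone is an assumption, adjust as needed.
data1$datetime <- as.POSIXct(date_chr, format = "%Y%m%d%H%M%S", tz = "UTC")

# Keep only the calendar date if the time of day is not needed.
data1$day <- as.Date(data1$datetime, tz = "UTC")
```

The character vector date_chr could equally be handed to anytime() or anydate().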
Before I write my answer, I would like to say that the format argument should describe the format your string is already in. Since "20130501000000" has no - between the components of the date, you have to use:
as.Date("20130501000000", format = "%Y%m%d%H%M%S")
# [1] "2013-05-01"
which works just fine, does not produce any error, and will return an object of class Date:
as.Date("20130501000000", format = "%Y%m%d%H%M%S") |> class()
# [1] "Date"
Therefore, I think your issue is with the format string rather than with the origin of the date.
Now to my detailed answer:
As far as I know and can understand, the as.Date() will convert it to "date", so if you want the time part of the string as well, you have to use as.POSIXct():
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S")
# [1] "2013-05-01 EEST"
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S") |> class()
# [1] "POSIXct" "POSIXt"
Note that the timezone is EEST, which is my local timezone. If you want a different timezone, you have to specify it with tz. For example, to set the timezone to UTC:
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S", tz = "UTC")
# [1] "2013-05-01 UTC"
Using as.POSIXct(), you can do arithmetic with the object:
times <- c("20130501000000",
"20130501035001") # added 03:50:01 to the first element
class(times)
# [1] "character"
times <- as.POSIXct(times, format = "%Y%m%d%H%M%S", tz = "UTC")
class(times)
# [1] "POSIXct" "POSIXt"
times[2] - times[1]
# Time difference of 3.833611 hours
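Subtracting two POSIXct values lets R pick the unit (hours here). If you need a fixed unit, for example when comparing many rows, a small sketch with difftime() and an explicit units argument looks like this:

```r
times <- as.POSIXct(c("20130501000000", "20130501035001"),
                    format = "%Y%m%d%H%M%S", tz = "UTC")

# Ask for minutes explicitly instead of letting R choose the unit.
gap_mins <- difftime(times[2], times[1], units = "mins")

# A plain number of seconds, convenient for further arithmetic.
gap_secs <- as.numeric(difftime(times[2], times[1], units = "secs"))
```

Here gap_secs is 13801, that is 3 hours, 50 minutes and 1 second.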

Converting chr "00:00:00" to date-time "00:00:00"

My question comes from this question. The question had the following character string.
x <- "2007-02-01 00:00:00"
y <- "02/01/2007 00:06:10"
If you try to convert this string to date-class object, something funny happens.
This is a sample from @nrusell's answer.
as.POSIXct(x,tz=Sys.timezone())
[1] "2007-02-01 EST"
as.POSIXct(y,format="%m/%d/%Y %H:%M:%S",tz=Sys.timezone())
[1] "2007-02-01 00:06:10 EST"
As you see, 00:00:00 disappears from the first example. @Richard Scriven left the following example in our discussion using lubridate.
dt <- as.POSIXct("2007-02-01 00:00:00")
hour(dt) <- hour(dt)+1
dt
[1] "2007-02-01 01:00:00 EST"
hour(dt) <- hour(dt)-1
dt
[1] "2007-02-01 EST"
Once again, 00:00:00 disappears. Why does R avoid keeping 00:00:00 in date-class object after conversion? How can we keep 00:00:00?
It is only the printing that drops the time part when the time component is midnight; the stored value is unchanged. This is literally explained in the ?strftime help, specifically under the format parameter:
A character string. The default is "%Y-%m-%d %H:%M:%S" if any
component has a time component which is not midnight, and "%Y-%m-%d"
otherwise
One idea is to redefine the S3 method print for POSIXct object:
print.POSIXct <- function(x,...)print(format(x,"%Y-%m-%d %H:%M:%S"))
Now, for your example, if you print your x date (with the midnight part) you get:
x <- "2007-02-01 00:00:00"
x <- as.POSIXct(x,tz=Sys.timezone())
x
[1] "2007-02-01 00:00:00"
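If overriding print.POSIXct for the whole session feels too invasive, a more local sketch is to call format() with an explicit format string wherever the midnight time must be shown. The time zone is set to UTC here only to keep the example reproducible.

```r
x <- as.POSIXct("2007-02-01 00:00:00", tz = "UTC")

# The default formatting drops the time part when it is midnight.
default_txt <- format(x)

# An explicit format string always shows it.
full_txt <- format(x, "%Y-%m-%d %H:%M:%S")

# The stored value is exactly midnight either way.
is_midnight <- as.numeric(x) %% 86400 == 0
```

Here default_txt is "2007-02-01" and full_txt is "2007-02-01 00:00:00".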

Modifying timezone of a POSIXct object without changing the display

I have a POSIXct object and would like to change its tz attribute WITHOUT R interpreting it (interpreting it would mean changing how the datetime is displayed on the screen).
Some background: I am using the fasttime package from S. Urbanek, which takes strings and casts them to POSIXct very quickly. The problem is that the string should represent a datetime in "GMT", and that is not the case for my data.
I end up with a POSIXct object with tz=GMT when in reality it is tz=GMT+1. If I change the timezone with
attr(datetime, "tzone") <- "Europe/Paris";
datetime <- .POSIXct(datetime,tz="Europe/Paris");
then it will be "displayed" as GMT+2 (the underlying value never changes).
EDIT: Here is an example
datetime=as.POSIXct("2011-01-01 12:32:23.234",tz="GMT")
attributes(datetime)
#$tzone
#[1] "GMT"
datetime
#[1] "2011-01-01 12:32:23.233 GMT"
How can I change this attribute without R interpreting it, i.e. how can I change tzone and still have datetime displayed as "2011-01-01 12:32:23.233"?
EDIT/SOLUTION: @GSee's solution is reasonably fast, lubridate::force_tz is very slow
datetime=rep(as.POSIXct("2011-01-01 12:32:23.234",tz="GMT"),1e5)
f <- function(x,tz) return(as.POSIXct(as.numeric(x), origin="1970-01-01", tz=tz))
> system.time(datetime2 <- f(datetime,"Europe/Paris"))
user system elapsed
0.01 0.00 0.02
> system.time(datetime3 <- force_tz(datetime,"Europe/Paris"))
user system elapsed
5.94 0.02 5.98
identical(datetime2,datetime3)
[1] TRUE
To change the tz attribute of a POSIXct variable it is not best practice to convert to character or numeric and then back to POSIXct. Instead you could use the force_tz function of the lubridate package
library(lubridate)
datetime2 <- force_tz(datetime, tzone = "CET")
datetime2
attributes(datetime2)
EDITED:
My previous solution was passing a character value to origin (i.e.origin="1970-01-01"). That only worked here because of a bug (#PR14973) that has now been fixed in R-devel.
origin was being coerced to POSIXct using the tz argument of the as.POSIXct call, and not "GMT" as it was documented to do. The behavior has been changed to match the documentation which, in this case, means that you have to specify your timezone for both the origin and the as.POSIXct call.
datetime
#[1] "2011-01-01 12:32:23.233 GMT"
as.POSIXct(as.numeric(datetime), origin=as.POSIXct("1970-01-01", tz="Europe/Paris"),
tz="Europe/Paris")
#[1] "2011-01-01 12:32:23.233 CET"
This also works in older versions of R.
An alternative to the lubridate package is via conversion to and back from character type:
recastTimezone.POSIXct <- function(x, tz) return(
as.POSIXct(as.character(x), origin = as.POSIXct("1970-01-01"), tz = tz))
(Adapted from GSee's answer)
Don't know if this is efficient, but it would work for time zones with daylight savings.
Test code:
x <- as.POSIXct('2003-01-03 14:00:00', tz = 'Etc/UTC')
x
recastTimezone.POSIXct(x, tz = 'Australia/Melbourne')
Output:
[1] "2003-01-03 14:00:00 UTC"
[1] "2003-01-03 14:00:00 AEDT" # Nothing is changed apart from the time zone.
Output if I replaced as.character() by as.numeric() (as GSee had done):
[1] "2003-01-03 14:00:00 UTC"
[1] "2003-01-03 15:00:00 AEDT" # An hour is added.
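A variant of the character round trip that spells out the format, sketched under the assumption that microsecond precision is enough. force_zone is a made-up name, not a base or lubridate function. It keeps the displayed clock time and changes the zone, so the underlying instant shifts by the UTC offset of the new zone.

```r
# Hypothetical helper: keep the wall-clock reading, swap the time zone.
force_zone <- function(x, tz) {
  # format() renders x in its own time zone; digits beyond microseconds are lost.
  txt <- format(x, "%Y-%m-%d %H:%M:%OS6")
  # Re-parse the same clock reading as if it had been recorded in tz.
  as.POSIXct(txt, format = "%Y-%m-%d %H:%M:%OS", tz = tz)
}

x <- as.POSIXct("2003-01-03 14:00:00", tz = "UTC")
y <- force_zone(x, "Australia/Melbourne")

# Same clock reading, different zone.
shown <- format(y, "%Y-%m-%d %H:%M:%S")

# Melbourne is 11 hours ahead of UTC on this date, so y is 11 hours earlier.
shift_hours <- (as.numeric(x) - as.numeric(y)) / 3600
```

Because the offset is looked up per timestamp when re-parsing, this should also behave sensibly across daylight saving changes, at the cost of being slower than pure numeric manipulation.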

Resources