How to extract floating-point seconds from a POSIXct object in R?

I am working on a project that requires getting the exact second from a POSIXct object. For example, if printing out a POSIXct object named tm :
> tm
[1] "2017-07-10 09:03:32.26876 BRT"
> class(tm)
[1] "POSIXct" "POSIXt"
If I run:
> format(tm, "%S")
[1] "32"
which only prints the whole seconds. Instead I want "32.26876"; how do I do that? Thanks in advance for the help.

You can use the %OS conversion specification to extract fractional seconds:
tm <- as.POSIXct("2017-07-10 09:03:32.26876 BRT", format="%Y-%m-%d %H:%M:%OS")
class(tm)
# [1] "POSIXct" "POSIXt"
# set the second digits option
options(digits.secs=5)
# `strftime` or `format`:
strftime(tm, "%OS")
#[1] "32.26875"
format(tm, "%OS")
#[1] "32.26875"

From ?strftime the following is noted:
Specific to R is %OSn, which for output gives the seconds
truncated to 0 <= n <= 6 decimal places (and if %OS is not
followed by a digit, it uses the setting of
getOption("digits.secs"), or if that is unset, n = 0).
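Hence we can recover up to 6 decimal places. Note that the output is truncated rather than rounded, and the stored double is only the nearest representable value to the input, so the last digits can differ from what was typed: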
> tm <- as.POSIXct("2017-07-10 09:03:32.26876", tz = "BRT")
> tm
[1] "2017-07-10 09:03:32.268 BRT"
> format(tm, "%OS5")
[1] "32.26875"
> format(tm, "%OS6")
[1] "32.268759"

If you convert it to a POSIXlt, this is easy.
tm = as.POSIXct("2017-07-10 09:03:32.26876 BRT")
as.POSIXlt(tm)$sec
[1] 32.26876
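
As a rough sketch that puts the approaches above side by side: the seconds can be pulled out as a number and rounded explicitly, which sidesteps the truncation that %OSn applies. The UTC time zone and the variable names here are assumptions, chosen only to make the example reproducible.

```r
# Assumption: UTC is used only so the example behaves the same everywhere.
tm <- as.POSIXct("2017-07-10 09:03:32.26876", tz = "UTC")

# Numeric seconds, including the fractional part
sec <- as.POSIXlt(tm)$sec

# %OSn truncates to n digits; sprintf() rounds to the requested digits
sec_truncated <- format(tm, "%OS5")
sec_rounded <- sprintf("%.5f", sec)

print(sec_truncated)
print(sec_rounded)
```

Rounding with sprintf() (or round()) is usually the safer choice when the input precision is known, since the truncated form can come out one unit low in the last digit.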

Related

What does calling as.numeric() do to a lubridate Date object?

I am working with an external package that's converting columns of a dataframe with the lubridate date type Date into numeric type. (Confirmed by running as.numeric() on the columns).
I'm wondering if there's a way to convert it back?
For example, if I have the date "01-01-2021" then running as.numeric on it returns -719143. How can I turn that back into "01-01-2021"?
Note that Date class is part of base R, not lubridate.
You probably assumed that the data was year/month/day by mistake. Using base R to eliminate lubridate as a problem we can replicate the question's result like this:
as.numeric(as.Date("01-01-2021", "%Y-%m-%d"))
## [1] -719143
Had we used day/month/year we would have gotten:
as.numeric(as.Date("01-01-2021", "%d-%m-%Y"))
## [1] 18628
or using lubridate
library(lubridate)
as.numeric(dmy("01-01-2021"))
## [1] 18628
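It would be best if you fix the mistake that resulted in -719143, but if you don't control that and are faced with an input of -719143 and want to get a Date back from it, the following works. Be aware that the mis-parse kept only the first two digits of the year (20), so the result is 2020-01-01 rather than 2021-01-01: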
# input x is numeric; result is Date class
fixup <- function(x) as.Date(format(.Date(x), "%y-%m-%d"), "%d-%m-%y")
fixup(-719143)
## [1] "2020-01-01"
Note that we can't tell from the question whether 01-01-2021 is supposed to represent day-month-year or month-day-year, so we assumed the first; if it is meant to be the second, it should be obvious at this point how to proceed.
EDIT #2: It looks like the original data is being parsed as Jan 20, year 1, which might happen if the year-month-day columns were jumbled while being parsed:
as.numeric(as.Date("01-01-2021", format = "%Y-%m-%d", origin = "1970-01-01"))
[1] -719143
as.numeric(as.Date("0001-01-20", origin = "1970-01-01"))
[1] -719143
Is there a way to share an example of the raw data as you have it? e.g. dput(MY_DATA[1:10, DATE_COL])
EDIT: -719143 is about 1970 years of days, which can't be a coincidence, given that many date/time formats use 1970 as a baseline. I wonder if 01-01-2021 is being interpreted as the numeric formula equal to -2021 and so we're looking at perhaps -2021 seconds/days/[?] before year zero, which would be about -1970 years before the epoch...
-719143/(365)
[1] -1970.255
For instance, we can get something close with:
as.numeric(as.Date("0000-01-01", origin = "1970-01-01"))
[1] -719528
Original answer:
R treats a string describing a date as text:
x <- "01-01-2021"
class(x)
[1] "character"
We can convert it to a Date data type using these two equivalent commands:
base_dt <- as.Date(x, "%m-%d-%Y") # base R version
lubridt <- lubridate::mdy(x) # convenience lubridate function
identical(base_dt, lubridt)
[1] TRUE
Under the hood, a Date object in R is a numeric value with a flag telling R it's a date:
> typeof(lubridt) # What general type of data is it?
[1] "double" # --> numeric, stored as a double
> as.numeric(lubridt)
[1] 18628
> class(lubridt) # Does it have any special class attributes?
[1] "Date" # --> yes, it's a Date
> dput(lubridt) # How would we construct it from scratch?
structure(18628, class = "Date") # --> by giving 18628 a Date attribute
In R, a Date is encoded as the number of days since 1970 began:
> as.Date("1970-01-1") + as.numeric(lubridt)
[1] "2021-01-01"
We could convert it back to the original text using:
format(base_dt, "%m-%d-%Y")
[1] "01-01-2021"
identical(x, format(base_dt, "%m-%d-%Y"))
[1] TRUE
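For the "convert it back" part of the question, a minimal sketch of the round trip, assuming the number really came from a correctly parsed Date (the variable names are only illustrative):

```r
# A Date is stored as a day count since 1970-01-01
d <- as.Date("01-01-2021", format = "%d-%m-%Y")
n <- as.numeric(d)

# Convert the number back to a Date, then back to the original text layout
d_back <- as.Date(n, origin = "1970-01-01")
txt_back <- format(d_back, "%d-%m-%Y")

print(n)
print(d_back)
print(txt_back)
```

If the number came from a mis-parsed string, as with -719143 above, this round trip returns the mis-parsed date, not the intended one.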

Why does R add non-existing microseconds/nanoseconds to date-time objects?

I have datetimes written in format "%d-%b-%Y %H:%M:%S.%OS",
so for example "25-Apr-2021 18:31:56.234",
that is to the precision of milliseconds.
And when I parse that to a time object, I see the values are not the same: sometimes it adds a microsecond, sometimes it subtracts one, or similar.
Why is this and what to do about this?
I want to have a time object which is exactly 56 seconds and 234 milliseconds (and zeroes after that if it needs to add higher precision)!
For example some of the values it prints when I call print(as.numeric(), digits=20) command: "1615310444.7509999 1615310442.5550001",
or when I ask for difference between some 2 values, it gives: "Time difference of 0.003999949 secs" for example.
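You can use options(digits.secs = 3) to show the milliseconds, as in the example below. With digits.secs set to 0 the fractional part is not printed; with 3 it is. Note that options() returns the previous settings, so dp shows the old value rather than the new one. Also note that the format should use %OS on its own for the seconds, not %S.%OS.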
> dp <- options(digits.secs=0)
> dp
$digits.secs
[1] 0
> strptime("25-Apr-2021 18:31:56.234", format = "%d-%b-%Y %H:%M:%OS")
[1] "2021-04-25 18:31:56 +08"
> dp <- options(digits.secs=3)
> dp
$digits.secs
[1] 0
> strptime("25-Apr-2021 18:31:56.234", format = "%d-%b-%Y %H:%M:%OS")
[1] "2021-04-25 18:31:56.234 +08"
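The display option does not change what is stored: a POSIXct value is a double holding seconds since the epoch, so a millisecond value is only stored approximately. One hedged way to deal with this is to round to the known precision before comparing or reporting. The ISO-style input strings and the UTC time zone below are assumptions, used to keep the sketch independent of locale.

```r
# Assumption: ISO-style input and UTC, so month names and local zones do not matter.
t1 <- as.POSIXct("2021-04-25 18:31:56.234", tz = "UTC")
t2 <- as.POSIXct("2021-04-25 18:31:56.238", tz = "UTC")

# The raw difference carries floating-point noise; round to the known precision
d_raw <- as.numeric(difftime(t2, t1, units = "secs"))
d_ms <- round(d_raw, 3)

# Recover the millisecond part of t1 as a whole number
ms <- round((as.numeric(t1) %% 1) * 1000)

print(d_ms)
print(ms)
```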

setting column to datetime in R

The date in my dataset is like this: 20130501000000 and I'm trying to convert this to a better datetime format in R
data1$date <- as.Date(data1$date, format = "%Y-%m-%s-%h-%m-%s")
However, I get an error for needing an origin. After I put the very first cell under date in as origin, it converts every cell under date to NA. Is this right or should I try as.POSIXct()?
That is a somewhat degenerate format, but the anytime() and anydate() functions of the anytime package can help you, without requiring any explicit format strings:
R> anytime("20130501000000") ## returns POSIXct
[1] "2013-05-01 CDT"
R> anydate("20130501000000") ## returns Date
[1] "2013-05-01"
R>
Note that we parse from the character representation here; parsing from numeric would be wrong, as we use a conflicting heuristic to make sense of dates stored as numeric values.
So here your code would just become
data1$date <- anytime::anydate(data1$date)
provided data1$date is in character, else wrap one as.character() around it.
Lastly, if you actually want Datetime rather than Date (as per your title), don't use anydate() but anytime().
Before I write my answer, I would like to say that the format argument should be the format that your string is in. Therefore, if you have "20130501000000", you have to use (you don't have - between each component of your date in the string format):
as.Date("20130501000000", format = "%Y%m%d%H%M%S")
# [1] "2013-05-01"
which works just fine, does not produce any error, and will return an object of class Date:
as.Date("20130501000000", format = "%Y%m%d%H%M%S") |> class()
# [1] "Date"
Therefore, I think your issue is more of a formatting and not origin of the date.
Now to my detailed answer:
As far as I know and can understand, the as.Date() will convert it to "date", so if you want the time part of the string as well, you have to use as.POSIXct():
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S")
# [1] "2013-05-01 EEST"
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S") |> class()
# [1] "POSIXct" "POSIXt"
Note that the timezone is EEST, which is my local timezone. If you want a different timezone, pass it through the tz argument. For example, to set the timezone to UTC:
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S", tz = "UTC")
# [1] "2013-05-01 UTC"
Using as.POSIXct(), you can do arithmetic with the object:
times <- c("20130501000000",
"20130501035001") # added 03:50:01 to the first element
class(times)
# [1] "character"
times <- as.POSIXct(times, format = "%Y%m%d%H%M%S", tz = "UTC")
class(times)
# [1] "POSIXct" "POSIXt"
times[2] - times[1]
# Time difference of 3.833611 hours
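If the column is numeric rather than character, the conversion to text needs some care, because as.character() can switch to scientific notation for values this large. A minimal sketch, assuming a hypothetical data frame data1 with a numeric date column and UTC timestamps:

```r
# Hypothetical data frame with the timestamp stored as a number
data1 <- data.frame(date = c(20130501000000, 20130501035001))

# sprintf("%.0f") writes out all digits, avoiding scientific notation
stamp <- sprintf("%.0f", data1$date)

# Parse as date-time, then derive a plain Date if only the day is needed
data1$datetime <- as.POSIXct(stamp, format = "%Y%m%d%H%M%S", tz = "UTC")
data1$day <- as.Date(data1$datetime, tz = "UTC")

print(data1)
```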

Date and time conversion with R

I'm using R Studio. When I try to convert the date and time format using as.Date or as.Time I'm getting NA as the result. I also tried to set the Locale as it has been recommended in some of the problems in SO, that's also not helping. The default class is factor after I import from the text file. I also tried to make it a character. Still the problem exists. Any help?
> x<-c("16-12-2006")
> class(x)
[1] "character"
> y<-as.Date(x)
> class(y)
[1] "Date"
> y<-as.Date(x,format="d%m%Y%")
> class(y)
[1] "Date"
> y
[1] NA
You are just misplacing the % signs and missing the hyphens in your format string.
The format string needs to match the string characters exactly (spaces, hyphens, commas, colons, etc.). See the document: Date-time Conversion Functions to and from Character.
Try:
> y <- as.Date(x, format="%d-%m-%Y")
and it should work.
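Since the question mentions that the column arrives as a factor, here is a small sketch of the full conversion. The factor below is a stand-in for the imported column.

```r
# Stand-in for a column that was read in as a factor
x <- factor("16-12-2006")

# Convert to character first, then give a format that mirrors the text exactly
y <- as.Date(as.character(x), format = "%d-%m-%Y")

print(y)
```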
Try this:
> x <- as.POSIXct(strptime("11-09-2015", "%d-%m-%Y"))
> as.Date(x)
[1] "2015-09-10"
> x
[1] "2015-09-11 CEST"
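x has class "POSIXct" "POSIXt", while as.Date(x) has class "Date" and can be used as the x-axis in ggplot. Note that the date printed by as.Date(x) above is one day earlier than x: the conversion was done in UTC rather than in the local time zone, so pass the tz argument to as.Date() if the local date is wanted.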
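The effect of the time zone on the POSIXct-to-Date conversion can be made explicit with a short sketch. "Europe/Berlin" is an assumption here, standing in for the CEST zone in the output above.

```r
# Assumption: "Europe/Berlin" stands in for the CEST zone shown above.
x <- as.POSIXct("2015-09-11 00:00:00", tz = "Europe/Berlin")

# Midnight local time is still the previous evening in UTC
d_utc <- as.Date(x, tz = "UTC")
d_local <- as.Date(x, tz = "Europe/Berlin")

print(d_utc)
print(d_local)
```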

Convert numeric values to dates

I have a numeric vector as follows
aa <- c(1022011, 2022011, 13022011, 23022011) (this vector is just a sample, it is very long)
Values are written in such a way that first value is day then month and then year.
What I am doing right now is
as.Date(as.character(aa), "%d%m%Y")
but,
it is causing problems (returning NA) in case of single digits day numbers. (i.e. 1022011, 2022011).
so basically
as.Date("1022011", "%d%m%Y") does not work
but
as.Date("01022011", "%d%m%Y") (pasting '0' ahead of the number) works.
I want to avoid pasting '0' in such cases. Is there any other (direct) alternative to convert numeric values to dates at once?
It could be rearranged using sub in which case a plain as.Date with no format works:
x <- c(1022011, 11022011) # test data
pat <- "^(..?)(..)(....)$"
as.Date(sub(pat, "\\3-\\2-\\1", x))
giving:
[1] "2011-02-01" "2011-02-11"
Depending on your platform, you could use sprintf to add a zero at the beginning. It seems that Mac is OK with this, but not Windows 7, given the discussion with the OP.
aa <- c(1022011, 2022011, 13022011, 23022011)
as.Date(sprintf("%08s", aa), format = "%d%m%Y")
[1] "2011-02-01" "2011-02-02" "2011-02-13" "2011-02-23"
UPDATE
@CathyG kindly mentioned that sprintf("%08i",aa) works on Windows 7.
You can use dmy in lubridate:
library(lubridate)
aa <- c(1022011, 2022011, 13022011, 23022011)
> dmy(aa)
[1] "2011-02-01 UTC" "2011-02-02 UTC" "2011-02-13 UTC" "2011-02-23 UTC"
and if you don't want the timezone just wrap it in as.Date:
> as.Date(dmy(aa))
[1] "2011-02-01" "2011-02-02" "2011-02-13" "2011-02-23"
Thank you @Ben Bolker,
> as.Date(mdy(aa))
[1] "2011-01-02" "2011-02-02" "2012-01-02" "2011-01-02"
I know you don't want to add a "0" but still, in base R, this works :
as.Date(sapply(aa,function(x){ifelse(nchar(x)==8,x,paste("0",x,sep=""))}),format = "%d%m%Y")
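A compact variant of the zero-padding idea, offered as a sketch rather than a definitive answer: formatting the values with an integer conversion pads them in one vectorised step. It assumes every value is a whole number that fits in an integer.

```r
aa <- c(1022011, 2022011, 13022011, 23022011)

# Zero-pad to 8 digits; as.integer() makes the integer conversion explicit
padded <- sprintf("%08d", as.integer(aa))

# With a fixed width of 8, the day always occupies two digits
dates <- as.Date(padded, format = "%d%m%Y")

print(padded)
print(dates)
```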
