I have a dataframe where one column lists a bunch of datetimes. Oddly, the data type for that column is "integer." I need to coerce the column to a proper datetime data type such as POSIXct so that I can subtract these timestamps from those in another field. However, when I try to coerce these datetime values into POSIXct, they lose the time component. When I try to do math on the datetimes without first coercing into another datatype, R acts as if the time component of the timestamp isn't there (it assumes each date has a time of midnight). What's going on and how do I fix it so that R recognizes the timestamp?
> dates[1]
[1] 2016-05-05T16:46:21-04:00
48 Levels: 2016-05-03T06:45:42-04:00 2016-05-03T06:45:43-04:00 ... 2016-05-05T16:50:00-04:00
> typeof(dates)
[1] "integer"
> as.POSIXct(dates[1])
[1] "2016-05-05 EDT"
> as.character(dates[1])
[1] "2016-05-05T16:46:21-04:00"
> as.POSIXct(as.character(dates[1]))
[1] "2016-05-05 EDT"
You can use as.POSIXct with the tz argument to convert the timestamps with the right level of control.
If the timezones are all UTC-04:00 and that is your local timezone, you can use:
dates = as.POSIXct(dates, format="%Y-%m-%dT%H:%M:%S", tz=Sys.timezone())
If they are all UTC-04:00 and that is not your local timezone, but you know the exact location, then you can specify the appropriate timezone from the tz database:
dates = as.POSIXct(dates, format="%Y-%m-%dT%H:%M:%S", tz="America/Port_of_Spain")
Alternatively, you can use a generic GMT-4 timezone:
dates = as.POSIXct(dates, format="%Y-%m-%dT%H:%M:%S", tz="Etc/GMT-4")
[EDIT: With thanks to Roland for his comment below. I originally used strptime, which uses the same syntax, but returns a POSIXlt object.]
Related
Format of original input data is like: "1975M01", the variable name is Month.
The way we put the input into POSIXct date-time object is: parse_date_time(Month, "%Y%m").
I wonder why we ignored the 'M'(1975M01), like how can the function recognize the year and month and ignore the M in the middle automatically?
You can try as.POSIXct(). We have no days, though. But probably we want the first day of the month and just paste 01 at the end of the string(s). Finally define the "M" literally in format= string, and of course add %day,
x <- as.POSIXct(paste0('1975M01', '01'), format='%YM%m%d')
x
# [1] "1975-01-01 CET"
where
class(x)
# [1] "POSIXct" "POSIXt"
.
This should also work with lubridate but can't install it.
I want to convert timestamps from characters to time but I do not want any date. On top of that, it must display milliseconds, as in "hh:mm:ss,os".
If I use as.POSIXCT it always adds a date prefix to my timestamp and that is not my intention. I also checked the lubridate package but I can't seem to find a function that goes beyond "as.hms" so that it displays at least two digits in milliseconds.
Example using POSIXct
df <-c("01:31:12.20","01:31:14.56","01:31:14.84")
options(digits.secs = 2)
df <- as.POSIXct(df, format="%H:%M:%OS")
This is the outcome:
[1] "2019-03-15 01:31:12.20 EDT" "2019-03-15 01:31:14.55 EDT"
[3] "2019-03-15 01:31:14.83 EDT"
Thank you.
Perhaps, the hms package does what the OP expects. hms implements an S3 class for storing and formatting time-of-day values, based on the 'difftime' class.
library(hms)
as.hms(df)
01:31:12.200000
01:31:14.560000
01:31:14.840000
It can be used for calculation, e.g.,
diff(as.hms(df))
00:00:02.360000
00:00:00.280000
Please, note that the print() and format() methods do not accept other parameters and do not respect options(digits.secs = 2).
The hms class is similar to lubridate's as.hms() function which creates a period object:
lubridate::hms(df)
[1] "1H 31M 12.2S" "1H 31M 14.56S" "1H 31M 14.84S"
Arithmetic can be done as well:
diff(lubridate::hms(df))
[1] 2.36 0.28
Please, be aware that the internal representation of time, date, and datetime objects usually is based on numeric types to allow for doing calculations. The internal representation is different from the character string when the object is printed.
The ITime class in the data.table package is a time-of-day class which stores the integer number of seconds in the day. So, it cannot handle milliseconds.
Sometimes I work with data like this:
sep-2018
From date like this:
Sys.Date()
[1] "2018-09-21"
To have this result, I generally use:
format(Sys.Date(),'%b-%Y')
But its class is not a date:
class(format(Sys.Date(),'%b-%Y'))
[1] "character"
Why it's not a date? Is it possible to have it with class() = date, and how?
Also an external library like zoo have the same thing.
library(zoo)
> class(format(as.yearmon(format(Sys.Date()), "%Y-%m-%d"), "%b.%Y"))
[1] "character"
Also using "%m.%Y" seems to generate the same thing, but it does not creates (for example) ordering issue.
The format command takes the date and outputs a printable string based on the format you provide. To quote the documentation:
An object of similar structure to x containing character representations of the
elements of the first argument x in a common format, and in the current
locale's encoding.
Also, a Date variable is stored as a numeric type internally (number of days since 1970-01-01)
dput(Sys.Date())
#structure(17795, class = "Date")
structure(0, class = "Date")
#[1] "1970-01-01"
So to pinpoint the date, you need day, month and year fields. If you don't have all three, it will probably return NA or an error. Similarly for time classes. If you don't have the data then you can just use some dummy values, and use format to print only the fields you want.
As Rohit says, format doesn't outputs a Date object, but a string in the format of your choice.
To get a Date object from a string like "sep-2018" you could use readr::parse_date().
(my_date <- readr::parse_date("sep-2018", format = '%b-%Y'))
#[1] "2018-09-01"
class(my_date)
#[1] "Date"
I have a DF in R which has two character columns. The first column is a time series array and the second column contains continuous numbers. The time series field has time recorded in milliseconds. I am trying to convert this array to a date array. However whichever method I use to convert the same, I lose the milliseconds information.
Following is the dataframe:
time = c("08-08-2016 09:16:33.430","08-08-2016 09:16:37.930")
values <- c(45,21)
my_data <- data.frame(time,values)
I would like to preserve the millisecond information. However, as I convert the time char array using following method, I lose the milliseconds (O/P time array= 2016-08-08 09:16:33,08-08-2016 09:16:37) .
my_data$time=strptime(my_data$time,format="%m-%d-%Y %H:%M:%S.%OS")
I also tried using as.POSIXct, as.Date functions but could not resolve. Can someone please help?
%OS instead of %S, not in addition to it. "%m-%d-%Y %H:%M:%OS" is the format string required:
options(digits.secs=6)
as.POSIXct(my_data$time, format="%m-%d-%Y %H:%M:%OS")
#[1] "2016-08-08 09:16:33.43 AEST" "2016-08-08 09:16:37.93 AEST"
You have a standard-enough format so that anytime can parse this automagically with additional input from you:
R> timevec <- c("08-08-2016 09:16:33.430","08-08-2016 09:16:37.930")
R> anytime(timevec)
[1] "2016-08-08 09:16:33.43 CDT" "2016-08-08 09:16:37.93 CDT"
R>
I tend to have options(digits.secs=6) set by default which is why the display also shows the fractional seconds.
I'm using strptime to extract date and the result is a wrong year
Where is the error in the below code:
strptime('8/29/2013 14:13', "%m/%d/%y")
[1] "2020-08-29 PDT"
What are the other ways to extract date and time as separate columns.
The data I have is in this format - 8/29/2013 14:13
I want to split this into two columns, one is 8/29/2013 and the other is 14:13.
You have a four digit year so you need to use %Y
strptime('8/29/2013 14:13', "%m/%d/%Y" )
[1] "2013-08-29 CEST"
Do you really want data and time in separate columns? It usually much easier to deal with a single date-time object.
Here's one possibility to separate time and date from the string.
For convenience, we could first convert the string into a POSIX object:
datetime <- '8/29/2013 14:13'
datetime.P <- as.POSIXct(datetime, format='%m/%d/%Y %H:%M')
Then we can use as.Date() to extract the date from this object and use format() to display it in the desired format:
format(as.Date(datetime.P),"%m/%d/%Y")
#[1] "08/29/2013"
To store the time separately we can use, e.g., the strftime() function:
strftime(datetime.P, '%H:%M')
#[1] "14:13"
The last function (strftime()) is not vectorized, which means that if we are dealing with a vector datetime containing several character strings with date and time in the format as described in the OP, it should be wrapped into a loop like sapply() to extract the time from each string.
Example
datetime <- c('8/29/2013 14:13', '9/15/2014 12:03')
datetime.P <- as.POSIXct(datetime, format='%m/%d/%Y %H:%M')
format(as.Date(datetime.P),"%m/%d/%Y")
#[1] "08/29/2013" "09/15/2014"
sapply(datetime.P, strftime, '%H:%M')
#[1] "14:13" "12:03"
Hope this helps.