Converting time format to numeric with R - r

In most cases, we convert numeric time to POSIXct format using R. However, if we want to compare two time points, then we would prefer the numeric time format. For example, I have a date format like "2001-03-13 10:31:00",
begin <- "2001-03-13 10:31:00"
Using R, I want to covert this into a numeric (e.g., the Julian time), perhaps something like the passing seconds between 1970-01-01 00:00:00 and 2001-03-13 10:31:00.
Do you have any suggestions?
The Julian calendar began in 45 BC (709 AUC) as a reform of the Roman calendar by Julius Caesar. It was chosen after consultation with the astronomer Sosigenes of Alexandria and was probably designed to approximate the tropical year (known at least since Hipparchus). see http://en.wikipedia.org/wiki/Julian_calendar

If you just want to remove ":" , " ", and "-" from a character vector then this will suffice:
end <- gsub("[: -]", "" , begin, perl=TRUE)
#> end
#[1] "20010313103100"
You should read the section about 1/4 of the way down in ?regex about character classes. Since the "-" is special in that context as a range operator, it needs to be placed first or last.
After your edit then the answer is clearly what #joran wrote, except that you would need first to convert to a DateTime class:
as.numeric(as.POSIXct(begin))
#[1] 984497460
The other point to make is that comparison operators do work for Date and DateTime classed variables, so the conversion may not be necessary at all. This compares 'begin' to a time one second later and correctly reports that begin is earlier:
as.POSIXct(begin) < as.POSIXct(begin) +1
#[1] TRUE

Based on the revised question this should do what you want:
begin <- "2001-03-13 10:31:00"
as.numeric(as.POSIXct(begin))
The result is a unix timestamp, the number of seconds since epoch, assuming the timestamp is in the local time zone.

Maybe this could also work:
library(lubridate)
...
df <- '24:00:00'
as.numeric(hms(df))
hms() will convert your data from one time format into another, this will let you convert it into seconds. See full documentation.
I tried this because i had trouble with data which was in that format but over 24 hours.

The example from ?as.POSIX help gives
as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S"))
so for you it would be
as.numeric(as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S")))

Related

conversion from 12 hr to 24 hr in R and combine two tables

The image for Y table
enter image description here
I want to roll join two tables trial and trial2 with the key as time stamp. One table 'trial' has timestamp POSIXct as the key and other one 'trial2' has a timestamp in character . I tried to convert 'trial2' timestamo from 12 hour format to 24 hr format (POSIXct) so that I can apply roll join on them. But whatever I have used till now gave me NULL in the resulting field rolli for trial2.
library(data.table)
library(dplyr)
library(lubridate)
library(readr)
library(hms)
trial <- read_csv("X.csv")
trial2 <- read_csv("Y.csv")
trial2$rolli<- as.POSIXct(trial2$date ,format = '%m/%d/%Y %I:%M:%S %p')
#######OR#########
trial2$rolli<-strptime(trial2$date, "%m/%d/%Y %I:%M:%S %p")
#######OR#########
trial2$rolli<-ymd_hms(trial2$date)
trial<-mutate(trial, rolli=ymd_hms(paste("2018-11-27", Time), tz='Asia/Kolkata'))
trial<-data.table(trial)
trial2<-data.table(trial2)
setkey(trial, rolli)
setkey(trial2, rolli)
try<-trial[trial2, roll = "nearest"]
class(trial$rolli)
#[1] "POSIXct" "POSIXt"
class(trial2$rolli)
#[1] "POSIXct" "POSIXt"
Debugging is always hard, so a general tip: try to reduce it as much as possible.
Looking at it, I'd think that the parsing of the character hits a problem. I'm not sure about lubridate and ymd_hms, but the as.POSIXct and strptime calls should work.
You can check by printing trial2$date and trial2$rolli. If the date looks fine, but rolli consists of all NA's, then that's the problem.
Probably the dates provided as characters are not in the exact right format, these functions can be very picky.
In order to know exactly what is going wrong, I'd need to see a sample of Y.csv, but you can check if you've typed everything exactly right: spaces, or have you switched "\" and "/"? Also, I normally work with 24-hour-notation, so it could be that strptime is picky about a specification being "am" or "AM" or "a.m." or something else.
EDIT: I've seen the format you're trying to supply, which has decimals in the seconds, which means %S doesn't do the trick.
Instead, you want %OS (it is in the help for ?strptime, but it's quite hidden). Also, I can't see it clearly in the image, but in your original code, there are 2 spaces between "%Y" and "%I". Are there 2 in your input as well?
Anyway:
strptime('11/27/2018 11:44:04.479 AM', format='%m/%d/%Y %I:%M:%OS %p')
# Works with me
trial2$rolli<-strptime(trial2$date, "%m/%d/%Y %I:%M:%OS %p")
# Should solve your problem.
Furthermore, when printing trial2$rolli, the fractional part is not shown, but it is stored. You can show it with as.numeric(trial2$rolli) %% 1, although there may be some small rounding differences.
2nd EDIT:
To fix problems where you have times like 0:00 PM in your input (which is technically wrong, but you might not have control over your input), you can use:
trial2$date <- sub('0+(:..:..)', '12\1', trial2$date)
It replaces all occurences of the form 0 :restoftime or 00 :restoftime with 12 :restoftime
Only be careful about what your source actually means by something like 0:00:00.000 AM: is this midnight or noon? I don't know how R-functions handle this generally (or even if it's guaranteed to always be the same), and I'm not going to burn my hands on that question. If you look on the internet there are a lot of people who have very strong opinions on what AM/PM means in those circumstances, in all variations.

Trimming unwanted characters

I have a very large data set (CSV) with information about bicycle counts from a bike share system. The information I'm working with is the time at which bicycles were taken out of the racks (departure time) and also the total travel time. What I want to do is to add them so I can get the arrival time at the arrival station. The departure time variable is FECHA_HORA_RETIRO and the travel time variable is TIEMPO_USO. The former, which is read by R as factor object, is in the following format: "23/01/2017 19:55:16". On the other hand, TIEMPO_USO is read by R as a character and it's in the following format: "0:17:46".
> head(viajes_ecobici_2017$FECHA_HORA_RETIRO)
[1] 28/01/2017 13:51 17/01/2017 16:24 12/01/2017 16:38 25/01/2017 10:31
> head(viajes_ecobici_2017$TIEMPO_USO)
[1] "1:35:37" "0:11:17" "0:32:51" "0:31:29" "1:31:59" "0:21:43" "0:5:43"
I first used strptime to get everything in the desired format
> viajes_ecobici_2017$FECHA_HORA_RETIRO =format(strptime(viajes_ecobici_2017$FECHA_HORA_RETIRO,format = "%d/%m/%Y %H:%M"),format = "%d/%m/%Y %H:%M:%S")
> viajes_ecobici_2017$TIEMPO_USO = format(strptime(viajes_ecobici_2017$TIEMPO_USO, format="%H:%M:%S"), format="%H:%M:%S")
This works with most observations. However, several observations became NA values after running this code. I went back to the original data to see why this was happening and created a variable with just the observations that became NA. When I looked closer at this observations I saw they have this format "\t\t01/06/2017 00:01". How can I get rid of the "\t\t" while preserving the rest of the information?
Thanks in advance for your help.
trimws() trims white space (including tab characters, \t) from the ends of a character variable:
viajes_ecobici_2017$TIEMPO_USO <- trimws(viajes_ecobici_2017$TIEMPO_USO)
For what it's worth, readr::read_csv() has a built-in trimws option (which is TRUE by default).
Assuming that the variable with the problem is TIEMPO_USO, then a simple regex would take care of the tab characters ("\t")
viajes_ecobici_2017$TIEMPO_USO <- gsub("^\\t\\t","", viajes_ecobici_2017$TIEMPO_USO)

convert string to time in r

I have an array of time strings, for example 115521.45 which corresponds to 11:55:21.45 in terms of an actual clock.
I have another array of time strings in the standard format (HH:MM:SS.0) and I need to compare the two.
I can't find any way to convert the original time format into something useable.
I've tried using strptime but all it does is add a date (the wrong date) and get rid of time decimal places. I don't care about the date and I need the decimal places:
for example
t <- strptime(105748.35, '%H%M%OS') = ... 10:57:48
using %OSn (n = 1,2 etc) gives NA.
Alternatively, is there a way to convert a time such as 10:57:48 to 105748?
Set the options to allow digits in seconds, and then add the date you wish before converting (so that the start date is meaningful).
options(digits.secs=3)
strptime(paste0('2013-01-01 ',105748.35), '%Y-%M-%d %H%M%OS')

cast string directly to IDateTime

I am using the new version of data.table and especially the AWESOME fread function. My files contain dates that are loaded as strings (cause I don't know to do it otherwise) looking like 01APR2008:09:00:00.
I need to sort the data.table on those datetimes and then for the sort to be efficient to cast then in the IDateTime format (or anything alse I would not know yet).
> strptime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S")
[1] "2008-04-01 09:00:00"
> IDateTime(strptime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S"))
idate itime
1: 2008-04-01 09:00:00
> IDateTime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S")
Error in charToDate(x) :
character string is not in a standard unambiguous format
It looks like I cannot do DT[ , newType := IDateTime(strptime(oldType, "%d%b%Y:%H:%M:%S"))].
My questions are then:
Is there a way to cast directly to IDateTime from fread, such that I can sort afterward efficiently?
If not, what is the most efficient way to go knowing that I would like to be able to sort DT by this datetime column
Unfortunately (for efficiency) strptime produces a POSIXlt type, which is unsupported by data.table and always will be due its size (40 bytes per date!) and structure. Although strftime produces the much better POSIXct, it still does it via POSIXlt. More info here :
http://stackoverflow.com/a/12788992/403310
Looking to base functions such as as.Date, it uses strptime too, creating an integer offset from epoch (oddly) stored as double. The IDate (and friends) class in data.table aims to achieve integer epoch offsets stored as, um, integer. Suitable for fast sorting by base::sort.list(method = "radix") (which is really a counting sort). IDate doesn't really aim to be fast at (usually one off) conversion.
So to convert string dates/times, rightly or wrongly, I tend to roll my own helper function.
If the string date is "2012-12-24" I'd lean towards: as.integer(gsub("-", "", col)) and proceed with YYYYMMDD integer dates. Similarly times can be HHMMDD as an integer. Two columns: date and time separately can be useful if you generally want to roll = TRUE within a day, but not to the previous day. Grouping by month is simple and fast: by = date %/% 100L. Adding and subtracting days is troublesome, but it is anyway because rarely do you want to add calendar days, rather weekdays or business days. So that's a lookup to your business day vector anyway.
In your case the character month would need a conversion to 1:12. There isn't a separator in your dates "01APR2008", so a substring would be one way followed by a match or fmatch on the month name. Are you in control of the file format? If so, numbers are better in an unambiguous format that sorts naturally such as %Y-%m-%d, or %Y%m%d.
I haven't yet got to how best do this in fread, so date/times are left as character currently because I'm not yet sure how to detect the date format or which type to output. It does need to output either integer or double dates though, rather than inefficient character. I suspect that my use of YYYYMMDD integers are seen as unconventional, so I'm a little hesitant to make that the default. They have their place, and there are pros and cons of epoch based dates too. Dates don't have to be always epoch based is all I'm suggesting.
What do you think? Btw, thanks for encouragement on fread; was nice to see.
I d'ont know how your file is structured, but from your comment you want to use the date field as a key. Why not to read it as a time series and format it when in reading?
Here I use zoo to do it.(Here I suppose that the date column is the first one,otherwise see index.colum argument)
ff <- function(x) as.POSIXct(strptime(x,"%d%b%Y:%H:%M:%S"))
h <- read.zoo(text = "03avril2008:09:00:00 125
02avril2008:09:30:00 126
05avril2008:09:10:00 127
04avril2008:09:20:00 128
01avril2008:09:00:00 128"
,FUN=ff)
You get your dates sorted in the right format and sorted.
The conversion is natural from POSIXct to IDateTime
IDateTime(index(h))
idate itime
1: 2008-04-01 09:00:00
2: 2008-04-02 09:30:00
3: 2008-04-03 09:00:00
4: 2008-04-04 09:20:00
5: 2008-04-05 09:10:00
Here sure you still do 2 conversions, But you do it when reading data, and the second you do it without dealing with any format problem.

How to convert in both directions between year,month,day and dates in R?

How to convert between year,month,day and dates in R?
I know one can do this via strings, but I would prefer to avoid converting to strings, partly because maybe there is a performance hit?, and partly because I worry about regionalization issues, where some of the world uses "year-month-day" and some uses "year-day-month".
It looks like ISODate provides the direction year,month,day -> DateTime , although it does first converts the number to a string, so if there is a way that doesn't go via a string then I prefer.
I couldn't find anything that goes the other way, from datetimes to numerical values? I would prefer not needing to use strsplit or things like that.
Edit: just to be clear, what I have is, a data frame which looks like:
year month day hour somevalue
2004 1 1 1 1515353
2004 1 1 2 3513535
....
I want to be able to freely convert to this format:
time(hour units) somevalue
1 1515353
2 3513535
....
... and also be able to go back again.
Edit: to clear up some confusion on what 'time' (hour units) means, ultimately what I did was, and using information from How to find the difference between two dates in hours in R?:
forwards direction:
lh$time <- as.numeric( difftime(ISOdate(lh$year,lh$month,lh$day,lh$hour), ISOdate(2004,1,1,0), units="hours"))
lh$year <- NULL; lh$month <- NULL; lh$day <- NULL; lh$hour <- NULL
backwards direction:
... well, I didnt do backwards yet, but I imagine something like:
create difftime object out of lh$time (somehow...)
add ISOdate(2004,1,1,0) to difftime object
use one of the solution below to get the year,month,day, hour back
I suppose in the future, I could ask the exact problem I'm trying to solve, but I was trying to factorize my specific problem into generic reusable questions, but maybe that was a mistake?
Because there are so many ways in which a date can be passed in from files, databases etc and for the reason you mention of just being written in different orders or with different separators, representing the inputted date as a character string is a convenient and useful solution. R doesn't hold the actual dates as strings and you don't need to process them as strings to work with them.
Internally R is using the operating system to do these things in a standard way. You don't need to manipulate strings at all - just perhaps convert some things from character to their numerical equivalent. For example, it is quite easy to wrap up both operations (forwards and backwards) in simple functions you can deploy.
toDate <- function(year, month, day) {
ISOdate(year, month, day)
}
toNumerics <- function(Date) {
stopifnot(inherits(Date, c("Date", "POSIXt")))
day <- as.numeric(strftime(Date, format = "%d"))
month <- as.numeric(strftime(Date, format = "%m"))
year <- as.numeric(strftime(Date, format = "%Y"))
list(year = year, month = month, day = day)
}
I forego the a single call to strptime() and subsequent splitting on a separation character because you don't like that kind of manipulation.
> toDate(2004, 12, 21)
[1] "2004-12-21 12:00:00 GMT"
> toNumerics(toDate(2004, 12, 21))
$year
[1] 2004
$month
[1] 12
$day
[1] 21
Internally R's datetime code works well and is well tested and robust if a bit complex in places because of timezone issues etc. I find the idiom used in toNumerics() more intuitive than having a date time as a list and remembering which elements are 0-based. Building on the functionality provided would seem easier than trying to avoid string conversions etc.
I'm a bit late to the party, but one other way to convert from integers to date is the lubridate::make_date function. See the example below from R for Data Science:
library(lubridate)
library(nycflights13)
library(tidyverse)
a <- flights %>%
mutate(date = make_date(year, month, day))
Found one solution for going from date to year,month,day.
Let's say we have a date object, that we'll create here using ISOdate:
somedate <- ISOdate(2004,12,21)
Then, we can get the numerical components of this as follows:
unclass(as.POSIXlt(somedate))
Gives:
$sec
[1] 0
$min
[1] 0
$hour
[1] 12
$mday
[1] 21
$mon
[1] 11
$year
[1] 104
Then one can get what one wants for example:
unclass(as.POSIXlt(somedate))$mon
Note that $year is [actual year] - 1900, month is 0-based, mday is 1-based (as per the POSIX standard)

Resources