I've been trying to find a simple way of formatting the output from difftime as HH:MM:SS.ms. So far I haven't come across anything, which surprised me.
I did write the function below, which almost does it. The limitation is that the numbers are not zero-padded, e.g. 2 hr, 3 min, 4.5 sec becomes "2:3:4.5" instead of "02:03:04.5".
Does anyone have a better suggestion?
format.timediff <- function(start_time) {
diff = as.numeric(difftime(Sys.time(), start_time, units="mins"))
hr <- diff%/%60
min <- floor(diff - hr * 60)
sec <- round(diff%%1 * 60,digits=2)
return(paste(hr,min,sec,sep=':'))
}
In addition to @GSee's comment, you could use a function like this:
f <- function(start_time) {
start_time <- as.POSIXct(start_time)
dt <- difftime(Sys.time(), start_time, units="secs")
# Since you only want the H:M:S, we can ignore the date...
# but you have to be careful about time-zone issues
format(.POSIXct(dt,tz="GMT"), "%H:%M:%S")
}
f(Sys.Date())
Merge_Charge_Point$Duration<- difftime(Merge_Charge_Point$EndConnectionDateTime, Merge_Charge_Point$StartConnectionDateTime, units="secs")
This is the code. It converts the data into seconds, but the outcome should be a time string.
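For the zero-padding problem, one base-R option is sprintf(). The sketch below is only an illustration: format_hms and its digits argument are names made up here, and it assumes the input is a non-negative number of seconds (or a difftime measured in seconds). Unlike the %H approach, the hour field keeps counting past 24.

```r
# Hypothetical helper: format a duration in seconds as zero-padded HH:MM:SS(.s)
format_hms <- function(secs, digits = 1) {
  # round first so that e.g. 59.96 s carries over into the minutes
  secs <- round(as.numeric(secs), digits)
  hr   <- as.integer(secs %/% 3600)
  min  <- as.integer((secs %% 3600) %/% 60)
  sec  <- secs %% 60
  # seconds field: two integer digits, plus "." and the decimals if requested
  width <- if (digits > 0) digits + 3 else 2
  fmt <- paste0("%02d:%02d:%0", width, ".", digits, "f")
  sprintf(fmt, hr, min, sec)
}

format_hms(2 * 3600 + 3 * 60 + 4.5)
# [1] "02:03:04.5"
```

Because sprintf() is vectorised, the same call should also work on a whole column of durations, such as the Duration column above.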
For accident analysis, I have to check whether accidents logged in one system also exist in the logs from another system. The problem is that both systems are filled in manually, so (small?) differences in location and time may occur.
For now, I've got this problem solved with a function, which I call with:
sys1log.df["match_1_900"] <- apply(sys1log.df, 1, bestMatch, marginLocation = 1, marginTime = 900)
marginLocation is the margin I want to use for the location of an incident. In this case the margin is 1, so for an incident at location 1, all incidents in sys2log.df that are logged between 0 and 2 are possible candidates for a match.
The same goes for marginTime, in this example set to 900 seconds. All incidents from sys2log.df that are logged within a quarter of an hour before (or after) the incident from sys1log.df are possible matches.
The only thing I want to match 'hard' is the roadnumber.
The function bestMatch is:
bestMatch <- function (x, marginLocation, marginTime) {
location <- as.numeric( x[10] )
roadnumber <- as.numeric( x[9] )
time <- as.POSIXct( strptime(x[4], "%Y-%m-%d %H:%M:%S") )
require("dplyr")
df <- sys2log.df %>%
#filter rows that match criteria (within margins)
filter(road == roadnumber,
loc < location + marginLocation,
loc > location - marginLocation,
starttime < time + marginTime,
starttime > time - marginTime) %>%
#create column with absolute difference between time system1 and time system2
mutate(timeDifference = abs( as.numeric(time) - as.numeric(starttime) )) %>%
#sort on timeDifference
arrange(timeDifference)
#if a match is found, return the value in column 15 from the row with the smallest timeDifference)
if (length(df)) {
return(df[1,15])
} else {
return(NA)
}
}
This works fine, but the problem is that the logs contain >100.000 rows, so the apply call takes about 15-30 minutes to run. I'm using multiple combinations of location/time margins, so I would really like to speed things up.
I think this can be done (much) faster, using data.table's rolling joins. My "problem" is that I would like to join on three keys, of which two should contain a rolling window/margin. Data.table only lets you apply a rolling join on one (the last) key.
I'm sure there is a way to achieve my goal with data.table (or another package), but I'm lost. Who can point me in the right direction?
This is typically a situation where you shouldn't use apply: it converts your data.frame to a matrix, and then at each iteration you have to reconvert every value.
Use purrr::pmap instead to iterate over the chosen columns.
Don't sort your data when you're only looking for a minimum value; use which.min (and keep only the first result in case of ties).
Your test on length(df) counts the columns of the data.frame, so it will never fail; I think you meant to test nrow(df). I just skipped it, as you can test afterwards what object you received.
As you don't provide a reproducible example I can't guarantee that it works as I'm a lousy blind coder :). But it should point you to the solution.
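# I'm supposing that the indices 10, 9 and 4 are for loc, road and starttime,
# and that the columns are already well formatted (starttime as POSIXct)
library(dplyr)
library(purrr)
get_new_col <- function(marginLocation = 1, marginTime = 900) {
  sys1log.df %>%
    # rename so the pmap arguments don't clash with the column names of sys2log.df
    select(location = loc, roadnumber = road, time = starttime) %>%
    pmap(function(location, roadnumber, time) {
      time <- as.numeric(time)  # compare times as plain seconds
      sys2log.df %>%
        filter(road == roadnumber,
               loc < location + marginLocation,
               loc > location - marginLocation,
               as.numeric(starttime) < time + marginTime,
               as.numeric(starttime) > time - marginTime) %>%
        # column 15 (as in the question) of the closest row; NA when nothing matched
        {.[which.min(abs(time - as.numeric(.$starttime)))[1], 15]}
    })
}
# pmap returns a list, so this creates a list column
sys1log.df$match_1_900 <- get_new_col()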
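The two points above about length() and which.min() can be checked on a toy data.frame. This is only an illustrative sketch; toy and its columns are made-up names, not part of the logs in the question.

```r
# Made-up data, only to illustrate length() vs nrow() and which.min()
toy <- data.frame(id = 1:3, timeDifference = c(5, 2, 9))
no_match <- toy[toy$id > 10, ]           # a filter that matches nothing

n_cols <- length(no_match)               # counts columns, so it is still 2
n_rows <- nrow(no_match)                 # counts rows, so it is 0

closest <- which.min(toy$timeDifference)          # index of the smallest value
none    <- which.min(no_match$timeDifference)[1]  # NA when there are no rows
```

Indexing with that NA gives an NA result, which is why the explicit if/else from the question can be dropped.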
I've tried to use Sys.time to get the time elapsed between two points. However, it doesn't output in a way I like.
This is how it looks now:
a <- Sys.time()
# ...running stuff between these two points...
b <- Sys.time()
c <- b - a
c
Time difference of 1.00558 hours
I only want the number and the units. I know that to get just the number I can do:
c[[1]]
However, sometimes the result of c can give me seconds or minutes. I only want instances wherein I have the number and when the units are in hours. Does anyone know of a way such that I would get something like the following, using Sys.time() (or any alternative):
if (units == "hours")
{
if (number >= 1)
{
#do something
}
}
Using difftime from base R lets you obtain the time difference in the units you choose. The rest is formatting.
a = Sys.time()
Sys.sleep(5) #do something
b = Sys.time()
paste0(round(as.numeric(difftime(time1 = b, time2 = a, units = "secs")), 3), " Seconds")
#[1] "5.091 Seconds"
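To get the check the question asks for, the units can be forced to hours so that no unit test is needed at all. A minimal sketch, using fixed timestamps (made up here) instead of Sys.time() so the result is reproducible:

```r
# Fixed example timestamps, 1 hour and 20 seconds apart
a <- as.POSIXct("2024-01-01 08:00:00", tz = "UTC")
b <- as.POSIXct("2024-01-01 09:00:20", tz = "UTC")

elapsed <- difftime(b, a, units = "hours")  # always in hours, never "auto"
hours   <- as.numeric(elapsed)              # plain number, no units attached

# b - a picks its units automatically; as.numeric() can still convert it
hours_auto <- as.numeric(b - a, units = "hours")

if (hours >= 1) {
  message("This took at least an hour")
}
```

units(elapsed) returns the unit as a string if it is still needed for display.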
The package tictoc simplifies this kind of timing. It doesn't return hours, but we can create a new function that converts its second-based measurements into hours.
library(tictoc)
toc_hour <- function() {
x <- toc()
(x$toc - x$tic) / 3600
}
You normally start the timer with tic() and stop it with toc().
tic()
Sys.sleep(2)
toc()
# 2.02 sec elapsed
Calling toc_hour() instead of toc() returns the number of hours that have elapsed.
tic()
Sys.sleep(2)
toc_hour()
# 2.25 sec elapsed
# elapsed
# 0.000625
It still prints the number of seconds above the hours, but if you capture the result it will only store the number of hours for downstream analysis.
tic()
Sys.sleep(2)
x <- toc_hour()
if(x < 1) {print("This took under an hour")}
You can evaluate everything as an argument to the system.time function. It will give you the elapsed time in seconds.
paste0(system.time( rnorm(1000000, 0, 1) )[3] / 3600, " hours")
# "2.58333333334172e-05 hours"
Alternatively, you can use Frank's suggestion in the comments: difftime(b, a, units = "hours"), which is probably the preferable solution in most cases.
The tictoc package normally returns seconds. The other solutions using this package manually convert this to other units, but I find the output still doesn't look right. Instead, use the built-in func.toc argument of toc() to change the output. For example:
toc_min <- function(tic,toc,msg="") {
mins <- round((((toc-tic)/60)),2)
outmsg <- paste0(mins, " minutes elapsed")
}
And then:
tic()
Sys.sleep(1)
toc(func.toc=toc_min)
returns
0.02 minutes elapsed
I think lubridate is the quickest solution for you:
start <- Sys.time()
## Do Stuff here
end <- Sys.time()
elapsed <- lubridate::ymd_hms(end) - lubridate::ymd_hms(start)
print(elapsed)
It should return something useful like:
Time difference of 12.1 hours
Maybe you can try the tictoc package.
As described in the documentation you can do the following:
tic()
#Do something
toc(log = TRUE, quiet = TRUE)
#Put the result in a log
log.txt <- tic.log(format = TRUE)
#Extract only the value
res <- gsub( " sec elapsed", "", unlist(log.txt))
#Clear the log
tic.clearlog()
That way, res holds only the value, as a string in seconds, so it is simple to get hours from it (convert it with as.numeric and divide by 3600).
Moreover, if you don't clear the log, you can run a succession of tic() and toc() calls and collect everything in log.txt. Then gsub( " sec elapsed", "", unlist(log.txt)) gives you a vector of strings with the value in seconds for each iteration, which can be pretty useful.
I am working in R. I have a date sequence and I would like to assign whether each particular date is term time or a school holiday. I plan to do this using a dataframe column where each row is labelled "Holiday" or "Term".
My approach is to create a time sequence and individually specify each holiday dates using a vector; the first element is the start date of the holiday and the second is the end of the holiday.
I then create an if-or statement to test whether the time lies within any of the holiday dates specified in the vectors.
Here is my code so far:
start <- as.POSIXlt("2015-10-10 00:00:00")
end <- as.POSIXlt("2016-03-31 00:00:00")
DateSeq <- seq(from=start, to=end, by="mins")
#Holidays defined using a vector with by start and end date
H1 <- c("2015-10-26", "2015-11-3") #October half term
H2 <- c("2015-12-16", "2016-01-05") #Christmas holiday
H3 <- c("2016-02-15", "2016-02-19") #Feb half term
H4 <- c("2016-03-24", "2016-03-31") #Easter holiday
date_table <- data.frame(Time = DateSeq)
if ((round(date_table$Time, units = "days")== H1[1] <> H1[2]) | (round(date_table$Time, units = "days") == H2[1] <> H2[2])) {
date_table$Holiday <- "Holiday"
} else {
date_table$Holiday <- "Term"
}
As you can see this code does not work and simply labels all the rows as "Term".
Therefore I am wondering the following:
How I can specify the date range in the holiday vectors so it can be used within the if statement?
Whether this is the best approach to use? I am quite new to R and I was thinking of alternatives such as creating a set of individual sequences for term and holidays and then stitching them together. This approach however seems fiddly but would appreciate your thoughts.
Thanks for your help.
Similar to your approach, but avoiding if-else.
# Or alternatively by="mins"
DateSeq <- seq(from=start, to=end, by="hours")
date_table <- data.frame(Time = DateSeq)
# Put all holidays together
H<-rbind(H1,H2,H3,H4)
# Or alternatively set to "Term"
date_table$Holiday<-0
# Assign all the holidays
for (i in 1:nrow(H)){
date_table[date_table$Time > H[i,1] & date_table$Time < H[i,2],"Holiday"]<-1 # or "Holiday"
}
# Check they are correctly assigned
plot(date_table)
You could use cut() to cut DateSeq into term and holiday.
res <- cut(DateSeq, breaks = as.POSIXlt( c(H1, H2, H3, H4), format = "%Y-%m-%d"), labels = c("October", "term1", "Christmas", "term2", "Feb", "Term3", "Easter"))
table(res)
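Another way to express the same idea is to compare calendar days against each holiday's start and end. The sketch below is a hedged illustration rather than a drop-in replacement: the holidays data.frame and the names days and label are made up here, it treats both the start and the end date as part of the holiday, and it uses UTC and hourly steps to keep the example small.

```r
# Holiday ranges from the question, stored as start/end Dates
holidays <- data.frame(
  start = as.Date(c("2015-10-26", "2015-12-16", "2016-02-15", "2016-03-24")),
  end   = as.Date(c("2015-11-03", "2016-01-05", "2016-02-19", "2016-03-31"))
)

times <- seq(from = as.POSIXct("2015-10-10 00:00:00", tz = "UTC"),
             to   = as.POSIXct("2016-03-31 00:00:00", tz = "UTC"),
             by   = "hour")
days <- as.Date(times, tz = "UTC")   # calendar day of every timestamp

# TRUE for every timestamp whose day falls inside any holiday (inclusive)
is_holiday <- rep(FALSE, length(days))
for (k in seq_len(nrow(holidays))) {
  is_holiday <- is_holiday |
    (days >= holidays$start[k] & days <= holidays$end[k])
}

date_table <- data.frame(Time = times,
                         Holiday = ifelse(is_holiday, "Holiday", "Term"))
```

The loop runs once per holiday, not once per row, so it stays cheap even for a minute-by-minute sequence.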
I've got some performance time data in mm:ss.00 format (e.g. 02:15.45 or 00:34.58). R is recognizing the variable as a factor, but I'd like to convert each performance time to just seconds (e.g. 02:15.45 to 135.45). I've searched for an answer but can't seem to find a way to make it work.
Thanks in advance.
Using lubridate package (part of tidyverse):
library(lubridate)
period_to_seconds(hms("12:12:54"))
Here's one I've used for a number of years. It's vectorized, too.
toSeconds <- function(x){
if (!is.character(x)) stop("x must be a character string of the form H:M:S")
if (length(x)<=0)return(x)
unlist(
lapply(x,
function(i){
i <- as.numeric(strsplit(i,':',fixed=TRUE)[[1]])
if (length(i) == 3)
i[1]*3600 + i[2]*60 + i[3]
else if (length(i) == 2)
i[1]*60 + i[2]
else if (length(i) == 1)
i[1]
}
)
)
}
And the reverse (preserves fractional seconds to the number of digits requested):
secondsToString <- function(x,digits=2){
unlist(
lapply(x,
function(i){
# fractional seconds
fs <- as.integer(round((i - floor(i))*(10^digits)))
fmt <- ''
if (i >= 3600)
fmt <- '%H:%M:%S'
else if (i >= 60)
fmt <- '%M:%S'
else
fmt <- '%OS'
i <- format(as.POSIXct(strptime("0:0:0","%H:%M:%S")) + i, format=fmt)
if (fs > 0)
sub('[0]+$','',paste(i,formatC(fs,width=digits,flag='0'),sep='.'))
else
i
}
)
)
}
Look into strptime. Specifically
t = "02:15.45"
(as.numeric(as.POSIXct(strptime(t, format = "%M:%OS"))) -
as.numeric(as.POSIXct(strptime("0", format = "%S"))))
This will work, but is possibly a little awkward (doing it this way mostly because of POSIXct's annoying automatic unit conversion...)
library(lubridate)
df$variable<- hms(df$variable)
df$variable<- as.numeric(df$variable)
Making it a one-liner works as well. Works like a charm for me.
I hope this helps.
I am not that comfortable with R, so I don't know whether there is a built-in function available, but I have worked out this code.
mmss_to_ss <- function (string)
{
mmss <- strsplit (string, ":", T)
mm <- as.numeric (mmss[[1]][1])
ss <- as.numeric (mmss[[1]][2])
return (mm * 60 + ss)
}
This will accept a time string in mm:ss format and return second values. The code can be easily modified to convert from hh:mm:ss to seconds also.
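The same split-and-multiply idea can be written so that it takes a whole column at once, including a factor as in the question. A minimal base-R sketch; mmss_to_seconds is a made-up name, and it assumes every element has the form mm:ss(.00) or hh:mm:ss(.00).

```r
# Hypothetical vectorised helper: "mm:ss.00" or "hh:mm:ss.00" -> seconds
mmss_to_seconds <- function(x) {
  x <- as.character(x)                     # factors become plain strings
  parts <- strsplit(x, ":", fixed = TRUE)  # one character vector per element
  vapply(parts, function(p) {
    p <- as.numeric(p)
    # weights are ..., 3600, 60, 1, depending on how many fields there are
    sum(p * 60^(rev(seq_along(p)) - 1))
  }, numeric(1))
}

secs <- mmss_to_seconds(factor(c("02:15.45", "00:34.58", "01:00:00")))
```

Here the first element should come out as 135.45 seconds (2 * 60 + 15.45), up to floating-point rounding.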
You can do this easily with the lubridate package. If you use the format "h:m:s", you can convert the variable to a lubridate object with
hms("12:12:54")
And then convert it to seconds with
seconds(hms("12:12:54"))
Here is a link to the lubridate article in JSS
http://www.jstatsoft.org/v40/i03/paper
(EDIT: one of the issues here is scale, namely what works for one row will blow up/crash R on a 200,000 * 50 dataframe. For example, strptime must be applied column-wise, not row-wise, to avoid hanging.
I'm looking for working code solutions that you actually ran on 200,000 * 50, including your measured runtime, not just casual "this is easy" remarks. It's easy to get runtimes > 12 hrs if you pick the wrong fn. Next, I also asked you to make my zero-time adjustment code faster; the job's not finished till that's done. No one has attempted that so far.)
I want to vectorize and accelerate the following multistep log-time conversion, with millisecond accuracy, which involves converting the output of strptime() to a single numeric, followed by subtraction and then log(), on a large data-frame (200,000 rows * 300 cols; other (non-time) columns omitted).
Code below.
As well as making it vectorized and fast, an extra problem is that I'm not sure how best to represent the (higher-dimensional) intermediate values at each step (e.g. as a list from strptime(), a matrix, or a vector). I already tried apply, sapply, lapply, vapply, plyr::maply(), ... but the incompatibility of the intermediate format(s) keeps messing me up...
Each row has 50 columns time1..time50 (chr, format="HH:MM:SS.sss") representing time as string in millisecond resolution. I need millisecond accuracy.
Within each row, columns time1..time50 are in non-decreasing order, and I want to convert them into the log of the time before time50. The conversion fn parse_hhmmsecms() is at the bottom and needs serious vectorization and speeding up; you can see alternative versions commented out. What I've figured out so far: strptime() is faster than (multiple) substr() calls. I then convert somehow to a list of three numerics (hh, mm, sec.ms), then convert to a vector, assuming the next step should be to vector-multiply with %*% c(3600,60,1) to convert to numeric seconds.
Here is pseudocode of what I do for each row, and each time-string; full code is at bottom:
for each row in dataframe { # vectorize this, loop_apply(), or whatever...
#for each time-column index i ('time1'..'time50') { # vectorize this...
hhmmsecms_50 <- parse_hhmmsecms(xx$time50[i])
# Main computation
xx[i,Clogtime] <- -10*log10(1000*(hhmmsecms_50 - parse_hhmmsecms(xx[i,Ctime]) ))
# Minor task: fix up all the 'zero-time' events to be evenly spaced between -3..0
#}
}
So there are five subproblems involved:
1. How to vectorize handling the list returned by strptime()? Since it returns a list of 3 items, when passed a 2D dataframe or a 1D row of time-strings we will get a 3D or 2D intermediate object. (Do we internally use a list of lists? A matrix of lists? An array of lists?)
2. How to vectorize the entire function parse_hhmmsecms()?
3. Then do the subtraction and log.
4. Vectorize the zero-time fixup code as well (this is now the slowest part by far).
5. How to accelerate steps 1...4?
Code snippet below using ten example columns time41..50 (use random_hhmmsecms() if you want a bigger sample)
I did my best to follow these recommendations, this is as reproducible as I can get it in six hours' work:
# Each of 200,000 rows has 50 time strings (chr) like this...
xx <- structure(list(time41 = c("08:00:41.465", "08:00:50.573", "08:00:50.684"
), time42 = c("08:00:41.465", "08:00:50.573", "08:00:50.759"),
time43 = c("08:00:41.465", "08:00:50.573", "08:00:50.759"
), time44 = c("08:00:41.465", "08:00:50.664", "08:00:50.759"
), time45 = c("08:00:41.465", "08:00:50.684", "08:00:50.759"
), time46 = c("08:00:42.496", "08:00:50.684", "08:00:50.759"
), time47 = c("08:00:42.564", "08:00:50.759", "08:00:51.373"
), time48 = c("08:00:48.370", "08:00:50.759", "08:00:51.373"
), time49 = c("08:00:50.573", "08:00:50.759", "08:00:54.452"
), time50 = c("08:00:50.573", "08:00:50.759", "08:00:54.452"
)), .Names = c("time41", "time42", "time43", "time44", "time45",
"time46", "time47", "time48", "time49", "time50"), row.names = 3:5, class = "data.frame")
# Handle millisecond timing and time conversion
options('digits.secs'=3)
# Parse "HH:MM:SS.sss" timestring into (numeric) number of seconds (Very slow)
parse_hhmmsecms <- function(t) {
as.numeric(substr(t,1,2))*3600 + as.numeric(substr(t,4,5))*60 + as.numeric(substr(t,7,12)) # WORKS, V SLOW
#c(3600,60,1) %*% sapply((strsplit(t[1,]$time1, ':')), as.numeric) # SLOW, NOT VECTOR
#as.vector(as.numeric(unlist(strsplit(t,':',fixed=TRUE)))) %*% c(3600,60,1) # WANT TO VECTORIZE THIS
}
random_hhmmsecms <- function(n=1, min=8*3600, max=16*3600) {
# Generate n random hhmmsecms objects between min and max (8am:4pm)
xx <- runif(n,min,max)
ss <- xx %% 60
mm <- (xx %/% 60) %% 60
hh <- xx %/% 3600
sprintf("%02d:%02d:%06.3f", hh,mm,ss)
}
xx$logtime45 <- xx$logtime44 <- xx$logtime43 <- xx$logtime42 <- xx$logtime41 <- NA
xx$logtime50 <- xx$logtime49 <- xx$logtime48 <- xx$logtime47 <- xx$logtime46 <- NA
# (we pass index vectors as the dataframe column ordering may change)
Ctime <- which(colnames(xx)=='time41') : which(colnames(xx)=='time50')
Clogtime <- which(colnames(xx)=='logtime41') : which(colnames(xx)=='logtime50')
for (i in seq_len(nrow(xx))) {
#if (i%%100==0) { print(paste('... row',i)) }
hhmmsecms_50 <- parse_hhmmsecms(xx$time50[i])
xx[i,Clogtime] <- -10*log10(1000*(hhmmsecms_50 - parse_hhmmsecms(xx[i,Ctime]) ))
# Now fix up all the 'zero-time' events to be evenly spaced between -3..0
Czerotime.p <- which(xx[i,Clogtime]==Inf | xx[i,Clogtime]>-1e-9)
xx[i,Clogtime[Czerotime.p]] <- seq(-3,0,length.out=length(Czerotime.p))
}
You may be overcomplicating things.
Start with the base classes, which handle milliseconds very well (and, on appropriate operating systems, even microseconds), but note that
you need to set options("digits.secs"=6) (that's the max that can be displayed) to see them displayed
you need an additional parsing character for strptime et al. (%OS rather than %S)
all of which is in the docs, and in countless examples here on SO.
Quick examples:
R> someTime <- ISOdatetime(2011, 12, 27, 2, 3, 4.567)
R> someTime
[1] "2011-12-27 02:03:04.567 CST"
R> now <- Sys.time()
R> now
[1] "2011-12-27 16:48:20.247298 CST" # microsecond display on Linux
R>
R> txt <- "2001-02-03 04:05:06.789123"
R> strptime(txt, "%Y-%m-%d %H:%M:%OS") # note the %OS for sub-seconds
[1] "2001-02-03 04:05:06.789123"
R>
And key functions such as strptime or as.POSIXct are all vectorised and you can throw entire columns at them.
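As a rough illustration of throwing entire columns at strptime, the sketch below parses a whole block of "HH:MM:SS.sss" strings in one call and then does the subtraction and log on the resulting matrix. It is an assumption-laden sketch, not a benchmarked solution: to_secs and times are made-up names, the data is a two-column cut-down of the question's sample, and the zero-time fixup is left out.

```r
# Hypothetical helper: parse a data.frame/matrix of "HH:MM:SS.sss" strings
# into a numeric matrix of seconds since midnight, in a single strptime call
to_secs <- function(m) {
  m  <- as.matrix(m)
  lt <- strptime(m, "%H:%M:%OS", tz = "UTC")    # vectorised over all cells
  secs <- lt$hour * 3600 + lt$min * 60 + lt$sec # lt$sec keeps the fraction
  dim(secs) <- dim(m)                           # restore rows x columns
  secs
}

times <- data.frame(time49 = c("08:00:41.465", "08:00:50.573"),
                    time50 = c("08:00:50.573", "08:00:50.759"),
                    stringsAsFactors = FALSE)

secs <- to_secs(times)
# a length-nrow vector is recycled down the columns, so every cell is
# subtracted from the last column of its own row
gap_ms  <- 1000 * (secs[, ncol(secs)] - secs)
logtime <- -10 * log10(gap_ms)                  # Inf where the gap is zero
```

Cells with a zero gap come out as Inf, which is what the question's zero-time fixup then looks for.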