I have a data frame with the amount of time it takes to do a lap and I'm trying to separate that into individual data frames for each driver.
The time values look like this, in minutes:seconds.milliseconds format, except for the first lap, which has a colon between seconds and milliseconds.
13:14:50 1:28.322 1:24.561 1:23.973 1:23.733 1:24.752
I'd like to have these in a separate data frame in a seconds format like this.
794.500 88.322 84.561 83.973 83.733 84.752
When I convert this to a numeric it gives the following values.
214 201 174 150 133 183
And when I use strptime or POSIXlt it gives me huge values which are also wrong, even when I use the format codes. However, when I subtracted two values, the time difference was correct, and from that I found that they were all off by 1609164020. These values also drop the decimal part, which I need.
You can use POSIXlt in conjunction with a conversion to seconds.
First, add a date to your first time element:
ds <- c("13:14:50", "1:28.322", "1:24.561", "1:23.973", "1:23.733", "1:24.752")
ds[1] <- paste( Sys.Date(), ds[1] )
#[1] "2020-12-29 13:14:50" "1:28.322" "1:24.561"
#[4] "1:23.973" "1:23.733" "1:24.752"
Create a function to convert the subsequent minutes:seconds.milliseconds to seconds.milliseconds:
to_sec <- function(x){ as.numeric(sub( ":.*","", x )) * 60 +
as.numeric( sub( ".*:","", x ) ) }
Convert the vector to dates that enable calculation of time differences:
secs <- cumsum(to_sec(ds[2:6]))  # cumulative seconds, kept numeric
dv <- c( as.POSIXlt(ds[1]), as.POSIXlt(ds[1]) + secs )
# [1] "2020-12-29 13:14:50 CET" "2020-12-29 13:16:18 CET"
# [3] "2020-12-29 13:17:42 CET" "2020-12-29 13:19:06 CET"
# [5] "2020-12-29 13:20:30 CET" "2020-12-29 13:21:55 CET"
dv[6] - dv[1]
# Time difference of 7.089017 mins
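If the goal is simply the lap times in seconds, as the question asks, the date-time step can be skipped. The following is a minimal sketch, not part of the answer above: the helper name lap_to_seconds is hypothetical, and it assumes the values are character strings. The integers 214, 201, ... in the question look like factor codes, so a factor column would first need as.character().

```r
# Hypothetical helper (not from the answer above): convert lap strings to seconds.
lap_to_seconds <- function(x) {
  # Turn a second colon, if present, into a decimal point: "13:14:50" -> "13:14.50"
  x <- sub("^([0-9]+:[0-9]+):", "\\1.", x)
  minutes <- as.numeric(sub(":.*", "", x))  # part before the colon
  seconds <- as.numeric(sub(".*:", "", x))  # part after the colon
  minutes * 60 + seconds
}

laps <- c("13:14:50", "1:28.322", "1:24.561", "1:23.973", "1:23.733", "1:24.752")
lap_seconds <- lap_to_seconds(laps)
```

For the sample row this should reproduce the values requested in the question, starting with 794.5 for the first lap.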
I would like to change only the year on a POSIX date-time value, from 2013-12-30 XX:XX:XX to 2012-12-30 XX:XX:XX. I would like this to be general, as there are hundreds of instances with different hours. Is it possible to do this while keeping the column as a POSIX value?
1) Base R. Convert to POSIXlt, subtract one from the year component and convert back to POSIXct. No packages are used.
yearMinus <- function(x, n = 1) {
lt <- as.POSIXlt(x)
lt$year <- lt$year - n
as.POSIXct(lt)
}
# test
datetimes <- as.POSIXct( c("2013-12-30 03:02:01", "2013-12-30 03:02:01") )
yearMinus(datetimes)
## [1] "2012-12-30 03:02:01 EST" "2012-12-30 03:02:01 EST"
2) gsubfn. Convert to character, match 4 digits, convert the match to numeric and subtract 1 (done in the second argument, which represents the transformation in formula notation), and then convert back to POSIXct. This is done in one gsubfn call.
library(gsubfn)
as.POSIXct(gsubfn("\\d{4}", ~ as.numeric(year) - 1, as.character(datetimes)))
## [1] "2012-12-30 03:02:01 EST" "2012-12-30 03:02:01 EST"
If you want to subtract a year from the current timestamp
df$time - lubridate::years(1)
If you want to change only specific date without changing the time we can use sub
df$time <- as.POSIXct(sub("2013-12-30", "2012-12-30", df$time))
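To see that the base R approach keeps the column as POSIXct while leaving the different hours untouched, here is a small sketch. The data frame is made up for illustration, and tz = "UTC" is an assumption chosen only so the result does not depend on the local time zone.

```r
# Hypothetical data frame; the column name `time` follows the snippets above.
df <- data.frame(
  time = as.POSIXct(c("2013-12-30 03:02:01", "2013-12-30 17:45:00"), tz = "UTC")
)

lt <- as.POSIXlt(df$time)   # broken-down time, with year stored as years since 1900
lt$year <- lt$year - 1      # move every row back one year
df$time <- as.POSIXct(lt)   # back to POSIXct for storage in the data frame

shifted <- format(df$time, "%Y-%m-%d %H:%M:%S")
```

Both rows should now read 2012-12-30 with their original times of day.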
I have been encountering a strange performance issue with R.
I have a csv file that contains close to 600,00 lines and 11 columns. The last column contains dates. I am trying to filter rows based on whether the date in the last column falls on a weekend or a weekday. As you can see from the output below, this relatively simple filtering takes 12 seconds.
> library(lubridate)
> data335 = read.csv("data335.csv")
> Sys.time()
[1] "2017-10-29 00:50:16 IST"
> delete_variable = data335[ifelse((wday(data335$ticket_date) %in% c("1","6")), T , F),][11]
> Sys.time()
[1] "2017-10-29 00:50:28 IST"
However, filtering on other column values hardly takes a second or two.
> Sys.time()
[1] "2017-10-29 00:58:58 IST"
> delete_variable = data335[(data335$route_no == "V-335EUP") ,][11]
> Sys.time()
[1] "2017-10-29 00:58:58 IST"
I'm sure, in the earlier filtering case, I am not doing it in the R way. Is there a way to bring this time taken to filter within 2 seconds?
On my machine, your original code ran in ~7 seconds. I noticed that data335$ticket_date was stored as a factor, so I read it in as a string and coerced it to date format. Time dropped to .1 second.
I also took out the ifelse() call, because %in% already returns a logical vector, and used numeric instead of character values in c(1,7) (you had c("1", "6"), but if you are looking for weekends, I think you want 1 and 7). Those changes gave minor speed improvements.
library(lubridate)
data335 <- read.csv('Downloads/data335.csv', stringsAsFactors=FALSE)
data335$ticket_date <- as.Date(data335$ticket_date, format="%d-%m-%Y")
start <- Sys.time()
delete_variable = data335[wday(data335$ticket_date) %in% c(1,7),][11]
end <- Sys.time()
end-start
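The same filter can also be sketched without lubridate. This is only an illustration on made-up rows (the route "V-500C" and the dates are invented), assuming the dd-mm-YYYY layout implied by the format string above. Note that base R numbers weekdays differently from lubridate's default: POSIXlt uses 0 for Sunday through 6 for Saturday.

```r
# Made-up sample rows in the dd-mm-YYYY layout used by the answer's format string.
tickets <- data.frame(
  route_no    = c("V-335EUP", "V-335EUP", "V-500C"),
  ticket_date = c("28-10-2017", "29-10-2017", "30-10-2017"),
  stringsAsFactors = FALSE
)

# Parse once, up front, so the filter itself works on Date values.
tickets$ticket_date <- as.Date(tickets$ticket_date, format = "%d-%m-%Y")

# POSIXlt stores the weekday as 0 (Sunday) to 6 (Saturday).
is_weekend <- as.POSIXlt(tickets$ticket_date)$wday %in% c(0, 6)
weekend_rows <- tickets[is_weekend, ]
```

Here the first two rows (a Saturday and a Sunday) are kept and the Monday row is dropped.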
I have a column that I have already changed to Date using as.Date
> test_subset$Date <- as.Date(test_subset$Date, "%d/%m/%Y")
The next column holds numeric data as the times corresponding to adjacent column of dates. The problem I have is it keeps inserting the current date into this time only column - I specify column "TIME"
> test_subset[[2]] <- strptime(test_subset[[2]], "%H:%M:%S")
but the result appends the current date to the time.
How can I adjust the "TIME" column so that it does NOT include the current date?
The date is being inserted because you're using strptime(). You can use format(), since you don't necessarily need a special class for the times. If we have an example vector x,
x <- Sys.time() + 0:5
#[1] "2015-03-05 07:08:27 PST" "2015-03-05 07:08:28 PST"
#[3] "2015-03-05 07:08:29 PST" "2015-03-05 07:08:30 PST"
#[5] "2015-03-05 07:08:31 PST" "2015-03-05 07:08:32 PST"
To get the date only, you can use
as.Date(format(x, "%F"))
# [1] "2015-03-05" "2015-03-05" "2015-03-05" "2015-03-05"
# [5] "2015-03-05" "2015-03-05"
And to get the time only, use format() again, but with %T.
format(x, "%T")
# [1] "07:08:27" "07:08:28" "07:08:29" "07:08:30" "07:08:31"
# [6] "07:08:32"
If it turns out you do need a time class on the second column, you can use chron::times() which gives a "times" class.
(time <- chron::times(format(x, "%T")))
# [1] 07:08:27 07:08:28 07:08:29 07:08:30 07:08:31 07:08:32
class(time)
# [1] "times"
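Another option, sketched here under assumptions, is to parse the date and time columns together so that no placeholder date is ever needed, and to pull out either part with format() on demand. The sample values are invented, the stamp column is a hypothetical addition, and tz = "UTC" is chosen only to keep the result independent of the local time zone.

```r
# Hypothetical two-column data, shaped like the question's Date and TIME columns.
test_subset <- data.frame(
  Date = c("05/03/2015", "06/03/2015"),
  TIME = c("07:08:27", "23:15:00"),
  stringsAsFactors = FALSE
)

# Parse date and time together into one POSIXct column.
test_subset$stamp <- as.POSIXct(
  paste(test_subset$Date, test_subset$TIME),
  format = "%d/%m/%Y %H:%M:%S", tz = "UTC"
)

# Recover either part as text whenever it is needed.
date_part <- format(test_subset$stamp, "%Y-%m-%d")
time_part <- format(test_subset$stamp, "%H:%M:%S")
```

The time_part vector contains only the clock times, with no date attached.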
I have a column named timings of class factor with time stamps in the following format:
1/11/07 15:15
I applied strptime on timings to generate tStamp as follows:
tStamp=strptime(timings,format="%m/%d/%Y %H:%M")
i)
The corresponding entry in tStamp looks like 0007-01-11 15:15:00 now. Why has it made 2007 or 07 into 0007? What is a correct way to generate tStamp?
ii)
After generating tStamp correctly, how do we convert it to Unix time (seconds since 1970)?
You need the lowercase %y for 2-digit years:
R> pt <- strptime("1/11/07 15:15",format="%m/%d/%y %H:%M")
R> pt
[1] "2007-01-11 15:15:00 CST"
R>
where CST is my local timezone.
And as.numeric() or as.double() converts to a double ...
R> as.numeric(pt)
[1] 1168550100
... which has fractional seconds if those are in the input:
R> options("digits.secs"=3) # show milliseconds
R> as.numeric(Sys.time()) # convert current time
[1] 1372201674.52 # now with sub-seconds.
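Both steps can be put together in a short sketch. The time zone is pinned to UTC here, which is an assumption made so the number does not depend on the machine's locale; with a local zone such as CST the result differs by the zone offset.

```r
stamp <- "1/11/07 15:15"

# %y reads a two-digit year; tz = "UTC" keeps the result locale-independent.
pt <- as.POSIXct(strptime(stamp, format = "%m/%d/%y %H:%M", tz = "UTC"))

# Seconds since 1970-01-01 00:00:00 UTC.
unix_seconds <- as.numeric(pt)

# Going back from seconds to a date-time, with the origin read as UTC.
back <- as.POSIXct(unix_seconds, origin = "1970-01-01", tz = "UTC")
```

In UTC this gives 1168528500, which is 21600 seconds (six hours) less than the CST value shown above.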
My goal is to create a vector of POSIXct time stamps given a start, an end and a delta (15min, 1hour, 1day). I hoped I could use seq for this, but I have a problem converting between the numeric and POSIXct representation:
now <- Sys.time()
now
# [1] "2012-01-19 10:30:39 CET"
as.POSIXct(as.double(now), origin="1970-01-01", tz="CET")
# [1] "2012-01-19 09:30:39 CET"
as.POSIXct(as.double(now), origin=as.POSIXct("1970-01-01", tz="CET"), tz="CET")
# [1] "2012-01-19 09:30:39 CET"
One hour gets lost during this conversion. What am I doing wrong?
There is a seq() method for objects of class "POSIXt" which is the super class of the "POSIXlt" and "POSIXct" classes. As such you don't need to do any conversion.
> now <- Sys.time()
> tseq <- seq(from = now, length.out = 100, by = "mins")
> length(tseq)
[1] 100
> head(tseq)
[1] "2012-01-19 10:52:38 GMT" "2012-01-19 10:53:38 GMT"
[3] "2012-01-19 10:54:38 GMT" "2012-01-19 10:55:38 GMT"
[5] "2012-01-19 10:56:38 GMT" "2012-01-19 10:57:38 GMT"
You have to be aware that when converting from POSIXct to numeric, R takes the timezone into account but always starts counting from a GMT origin :
> xgmt <- as.POSIXct('2011-01-01 14:00:00',tz='GMT')
> xest <- as.POSIXct('2011-01-01 14:00:00',tz='EST')
> (as.numeric(xgmt) - as.numeric(xest)) / 3600
[1] -5
As you see, the time in EST is conceived to be five hours earlier than the time in GMT, which is the time difference between both timezones. It's that value that is saved internally.
The as.POSIXct() function just adds an attribute containing the timezone. It doesn't alter the value, so you get the time presented in GMT time, but with an attribute telling it is EST. This also means that once you go from POSIXct to numeric, you should treat your data as if it's GMT time. (It's a whole lot more complex than that, but it's the general idea.) So you have to calculate the offset as follows:
> nest <- as.numeric(xest)
> origin <- as.POSIXct('1970-01-01 00:00:00',tz='EST')
> offset <- as.numeric(origin)
> as.POSIXct(nest-offset,origin=origin)
[1] "2011-01-01 14:00:00 EST"
This works whatever the timezone is in your locale (in my case, that's actually CET). Also note that behaviour of timezone data can differ between systems.
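A simpler round trip, sketched here as an alternative rather than as the method above, is to state the origin explicitly in UTC so that no offset has to be computed. The example also shows that changing only the tzone attribute leaves the stored number alone. The variable names are illustrative.

```r
x <- as.POSIXct("2011-01-01 14:00:00", tz = "UTC")
secs <- as.numeric(x)  # seconds since 1970-01-01 00:00:00 UTC

# An origin that is already a UTC date-time cannot be re-read in the local zone.
utc_origin <- as.POSIXct("1970-01-01 00:00:00", tz = "UTC")
back <- as.POSIXct(secs, origin = utc_origin, tz = "UTC")

# Changing only the tzone attribute alters the display, not the stored number.
relabelled <- x
attr(relabelled, "tzone") <- "EST"
```

Both back and relabelled hold the same number of seconds as x; only the printed clock time of relabelled changes.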
These time zone issues are always fiddly, but I think the problem is that your origin is being calculated in the wrong time zone (since the string only specifies the date).
Try using origin <- now - as.numeric(now).
Alternatively, use lubridate::origin, which is the POSIXct date-time 1970-01-01 UTC.
A full solution, again using lubridate.
library(lubridate)
start <- now()
seq(start, start + days(3), by = "15 min")
I do not have an answer to your problem, but I do have an alternative way of creating vectors of POSIXct objects. If, for example, you want to create a vector of 1000 timestamps from now with a delta_t of 15 minutes:
now = Sys.time()
dt = 15 * 60 # in seconds
timestamps = now + seq(0, 999) * dt # 1000 timestamps, 15 minutes apart
> head(timestamps)
[1] "2012-01-19 11:17:46 CET" "2012-01-19 11:32:46 CET"
[3] "2012-01-19 11:47:46 CET" "2012-01-19 12:02:46 CET"
[5] "2012-01-19 12:17:46 CET" "2012-01-19 12:32:46 CET"
The trick is you can add a vector of seconds to a POSIXct object.
An alternative to using seq.POSIXt is xts::timeBasedSeq, which allows you to specify the sequence as a string:
library(xts)
now <- Sys.time()
timeBasedSeq(paste("2012-01-01/",format(now),"/H",sep="")) # Hourly steps
timeBasedSeq(paste("2012-01-01/",format(now),"/d",sep="")) # Daily steps
You need to use seq(from=start,to=end, by=step). Note that in step you can either use "days" or an integer defining how many seconds elapse from item to item.
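As a rough sketch of that call for the three deltas mentioned in the question (15 minutes, 1 hour, 1 day): the start and end values below are invented, and tz = "UTC" is an assumption used to avoid daylight-saving effects in the example.

```r
start <- as.POSIXct("2012-01-19 10:00:00", tz = "UTC")
end   <- as.POSIXct("2012-01-19 11:00:00", tz = "UTC")

# Step given as a string, or as a number of seconds; both yield the same sequence.
quarter_hours <- seq(from = start, to = end, by = "15 min")
same_in_secs  <- seq(from = start, to = end, by = 15 * 60)

# With length.out instead of an end point.
hourly <- seq(from = start, by = "1 hour", length.out = 3)
daily  <- seq(from = start, by = "1 day",  length.out = 3)
```

The results stay POSIXct throughout, so no conversion to numeric and back is involved.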