Convert lap times from factor to time with decimals in R

I have a set of F1 lap times in minutes and seconds. The seconds have tenths, hundredths and thousandths of a second, and I need to keep that precision because I am trying to graph this data, but R classes it as a factor. I have tried converting it to numeric (as.numeric), but that does not seem to work.
Here is a data sample:
DriverId Q1 Q2 Q3
1 1:05.585 1:04.772 1:04.293
20 1:04.655 1:07.354 1:04.123
ggplot does not let me graph the data properly if it is not in the right format.
I have tried using the chron package and the POSIXlt class, to no avail, as they do not seem to support the mm:ss.000 format.
Help please!

You can use strptime, where %OS represents the seconds and their fractional parts. Example:
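x <- as.factor(c("1:04.293", "1:04.772", "1:04.890"))
options(digits.secs = 3)    # show fractional seconds when printing
x <- strptime(x, "%M:%OS")  # the factor is converted to character; today's date is filled in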
Output:
"2018-01-13 00:01:04.293 CET" "2018-01-13 00:01:04.772 CET" "2018-01-13 00:01:04.890 CET"
Return only minutes and seconds (as character), for example for tick labels in a plot:
format(x,"%M:%OS3")
[1] "01:04.293" "01:04.772" "01:04.890"
So an example with plotting:
plot(x,c(1,2,3),xaxt='n')
at <- axTicks(1)
axis(1, at=at, labels=format(as.POSIXct(at,origin='1970-1-1'),"%M:%OS3"), las=1)
Hope this helps!
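If all you need is a numeric axis, another option (not from the answer above; a sketch assuming every value has exactly one colon, as in "1:05.585") is to convert each lap time to a plain number of seconds. `lap_to_seconds` and `seconds_to_lap` are hypothetical helper names, not functions from any package.

```r
# Hypothetical helpers for "m:ss.sss" lap times (assumes exactly one ":" per value)
lap_to_seconds <- function(x) {
  x <- as.character(x)                     # factor -> the text it displays
  parts <- strsplit(x, ":", fixed = TRUE)  # "1:05.585" -> c("1", "05.585")
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

seconds_to_lap <- function(s) {
  s <- round(s, 3)                         # round first so 59.9996 cannot print as "0:60.000"
  m <- s %/% 60                            # whole minutes
  sprintf("%d:%06.3f", as.integer(m), s - m * 60)
}

q1 <- factor(c("1:05.585", "1:04.655"))
secs <- lap_to_seconds(q1)                 # plain numbers, fractional seconds kept
seconds_to_lap(secs)                       # back to "m:ss.sss" text for labels
```

The numeric seconds can go straight into ggplot, and seconds_to_lap can be passed as a labels function, e.g. scale_y_continuous(labels = seconds_to_lap), so the axis still reads as m:ss.sss.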

We can create a times class object with times from chron
library(chron)
library(tidyr)
library(dplyr)
gather(df1, key, val, -DriverId) %>%
mutate(val = chron::times(paste0("00:", val)))
In the above, we converted it to 'long' format. If we don't need to reshape, then
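df1 %>%
  mutate(across(-DriverId, ~ chron::times(paste0("00:", .x))))
# DriverId Q1 Q2 Q3
#1 1 00:01:06 00:01:05 00:01:04
#2 20 00:01:05 00:01:07 00:01:04
The printed values are rounded to whole seconds (1:05.585 shows as 00:01:06), but the stored values are fractions of a day, so the fractional seconds are still there for plotting.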

Related

Converting character to dates with hours and minutes

I'm having trouble converting character values into date-times (hours and minutes). I have the following code:
start <- c("2022-01-10 9:35PM","2022-01-10 10:35PM")
end <- c("2022-01-11 7:00AM","2022-01-11 8:00AM")
dat <- data.frame(start,end)
These are all in character form. I would like to:
Convert all the date-times into a date-time class and into 24-hour format, e.g. "2022-01-10 9:35PM" into "2022-01-10 21:35",
and "2022-01-11 7:00AM" into "2022-01-11 7:00", because I would like to calculate the difference between the dates in hours.
I would also like to add an ID column with a specific ID. The desired data would look like this:
ID <- c(101,101)
start <- c("2022-01-10 21:35","2022-01-10 22:35")
end <- c("2022-01-11 7:00","2022-01-11 8:00")
diff <- c(9,10) # I'm not sure how the calculations would turn out to be
dat <- data.frame(ID,start,end,diff)
I would appreciate all the help there is! Thanks!!!
You can use lubridate::ymd_hm. Don't use floor if you want the exact value.
library(dplyr)
library(lubridate)
dat %>%
  mutate(ID = 101,
         across(c(start, end), ymd_hm),
         diff = floor(difftime(end, start, units = "hours")))
start end ID diff
1 2022-01-10 21:35:00 2022-01-11 07:00:00 101 9 hours
2 2022-01-10 22:35:00 2022-01-11 08:00:00 101 9 hours
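The base R approach with strptime needs %I (hour on a 12-hour clock) together with %p; %p is ignored when combined with %H, which would leave 9:35PM as 09:35. Note that %p only parses AM/PM reliably in an English or C time locale:
strptime(dat$start, "%Y-%m-%d %I:%M%p")
[1] "2022-01-10 21:35:00 CET" "2022-01-10 22:35:00 CET"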
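Putting the pieces together in base R, a sketch of the whole task (24-hour conversion, ID column and difference in hours) could look like this. It uses %I with %p, assumes an English or C time locale for the AM/PM marker, and works in UTC so no daylight-saving shift sneaks into the difference; the ID value 101 comes from the question.

```r
# Assumption: %p only parses AM/PM reliably in an English or C time locale
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

start <- c("2022-01-10 9:35PM", "2022-01-10 10:35PM")
end   <- c("2022-01-11 7:00AM", "2022-01-11 8:00AM")

# %I = hour on a 12-hour clock, %p = AM/PM marker (ignored if paired with %H)
fmt <- "%Y-%m-%d %I:%M%p"
dat <- data.frame(
  ID    = 101,
  start = as.POSIXct(start, format = fmt, tz = "UTC"),
  end   = as.POSIXct(end,   format = fmt, tz = "UTC")
)

# Ask for hours explicitly instead of letting difftime choose the unit
dat$diff <- as.numeric(difftime(dat$end, dat$start, units = "hours"))

# 24-hour display, e.g. "2022-01-10 21:35"
format(dat$start, "%Y-%m-%d %H:%M")

invisible(Sys.setlocale("LC_TIME", old_locale))  # restore the previous locale
```

Here diff is 9.416667 (9 h 25 min) for both rows; wrap it in floor() if only whole hours are wanted.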

How to calculate time difference in milliseconds using R when formats are different?

I have a problem in R that is killing me! Can you help me?
I found a question on Stack Overflow that gave me a very good explanation.
Here is the link: How to parse milliseconds?
I was able to implement the following code that works very well.
z2 <- strptime("10/2/20 11:16:17.682", "%d/%m/%y %H:%M:%OS")
z1 <- strptime("10/2/20 11:16:16.683", "%d/%m/%y %H:%M:%OS")
When I calculate z2-z1, I get
Time difference of 0.9989998 secs
Similarly, when I use
z3 <- strptime("130 11:16:16.683", "%j %H:%M:%OS")
z4 <- strptime("130 11:16:18.682", "%j %H:%M:%OS")
When I calculate z4-z3, I get
Time difference of 1.999 secs
What is my problem?
The first column has the format 130 18:25:50.408, with millions of rows!!!
The second column has the format 2020 130 18:25:51.357 that is like the first column but has the year 2020.
The first column is also from 2020, but as the year is not there R uses the current year.
First question,
How can I subtract the two columns? I know how to subtract columns in general.
What I do not know is how to subtract these two kinds of times.
For example, second time is 2020 130 18:25:51.357
and first time is 130 18:25:50.408
I guess that I can do it programmatically converting it to a string, and eliminating the 2020. However, I am hoping that a quicker solution is available using base R or the lubridate package.
Second question,
"%j %H:%M:%OS" is the format for 130 11:16:16.683
What is the format for 2020 130 18:25:51.357?
As explained before this is working very well:
z3 <- strptime("130 11:16:16.683", "%j %H:%M:%OS")
But this is NOT working:
z7 <- strptime("2020 130 11:16:16.683", "%y %j %H:%M:%OS")
UPDATE 1
I solved the second question!
However, I have not figured out yet the first question.
For the second question, the mistake in the format was that instead of %y, I need to write %Y with upper case.
Here is one example:
later <- strptime("2020 130 11:16:17.683", "%Y %j %H:%M:%OS")
earlier <- strptime("2020 130 11:16:16.684", "%Y %j %H:%M:%OS")
difftime(later,earlier,units="secs")
The R result is:
Time difference of 0.9990001 secs
UPDATE 2
At this point, what is pending is the following:
I need to subtract two times that were recorded on the same day in 2020.
The second time does have the year, the first time does not.
later <- strptime("2020 130 11:16:17.683", "%Y %j %H:%M:%OS")
earlier <- strptime("130 11:16:16.684", "%j %H:%M:%OS")
difftime(later,earlier,units="secs")
R produces the following result:
Time difference of -31622399 secs
Why? Because the year is missing, R fills in the current year for earlier, and since we are in 2021, that is 2021 rather than 2020.
My columns have millions of rows.
At this point, my guess is that I would need to add 2020 with a concatenation or something like that. Is there any other method?
Thank you for your help!
Your object z2 is a POSIXlt object, which is essentially a list of the individual components (seconds, minutes, hours and so on) of your time:
print.default(z2)
# $sec
# [1] 17.682
#
# $min
# [1] 16
#
# $hour
# [1] 11
#
# $mday
# [1] 10
#
# $mon
# [1] 1
#
# $year
# [1] 120
#
# $wday
# [1] 1
#
# $yday
# [1] 40
#
# $isdst
# [1] 0
#
# $zone
# [1] "GMT"
#
# $gmtoff
# [1] NA
#
# attr(,"class")
# [1] "POSIXlt" "POSIXt"
When you do a subtraction, z2 - z1, R dispatches the operation to a method called -.POSIXt, which itself calls difftime. difftime converts z2 to a POSIXct object, i.e. a count of seconds since the start of the epoch, 1970-01-01 00:00:00 UTC.
options("digits" = 16)
print.default(as.POSIXct(z2))
# [1] 1581333377.682
# attr(,"class")
# [1] "POSIXct" "POSIXt"
# attr(,"tzone")
# [1] ""
difftime(z2, z1)
# Time difference of 0.9989998340606689 secs
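R, like most software, works with double-precision floating-point numbers, so arithmetic is not exact, only approximately right. Here the POSIXct values are around 1.58e9 seconds, which leaves room for only about six or seven decimal digits of the fractional part. Most software tries to hide this imprecision by reducing the number of digits shown. Different numbers carry different representation errors, so you might prefer to refer directly to the sec element of z2 (this is only valid when all the other components of the two times are equal).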
print.default(z2$sec - z1$sec)
# [1] 0.9989999999999988
You could therefore apply the time difference using your favourite data.frame tools.
options("digits" = 6)
# character columns
df1 <- data.frame(
col1 = c("10/2/20 11:16:17.682", "10/2/20 11:16:16.683"),
col2 = c("130 11:16:16.683", "130 11:16:18.682"),
stringsAsFactors = FALSE)
library(dplyr)
# convert columns to POSIXlt
df2 <- mutate(df1,
col1 = strptime(col1, "%d/%m/%y %H:%M:%OS"),
col2 = strptime(stringr::str_c("2020 ", col2), "%Y %j %H:%M:%OS"),
diff_days = unclass(difftime(col2, col1, units = "days")))
df2
# col1 col2 diff_days
# 1 2020-02-10 11:16:17 2020-05-09 11:16:16 88.9583
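# 2 2020-02-10 11:16:16 2020-05-09 11:16:18 88.9584
The differences are one hour short of 89 days (89 - 1/24 = 88.9583) because the example ran in a local time zone that switched to daylight saving time between February and May; passing tz = "UTC" to strptime avoids this.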
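A base R version of the concatenation idea from Update 2, using the example times from the question. This is a sketch: it assumes every row of the year-less column is from 2020 and uses UTC to rule out daylight-saving shifts. Because the inputs only have millisecond resolution, rounding the difference to three decimals removes the floating-point noise discussed above without discarding real information.

```r
# Example values in the two formats from the question
later_str   <- c("2020 130 11:16:17.683", "2020 130 18:25:51.357")
earlier_str <- c("130 11:16:16.684", "130 18:25:50.408")

fmt <- "%Y %j %H:%M:%OS"
later <- as.POSIXct(later_str, format = fmt, tz = "UTC")
# Assumption: the year-less column is entirely from 2020, so prepend the year
earlier <- as.POSIXct(paste("2020", earlier_str), format = fmt, tz = "UTC")

# paste() and as.POSIXct() work on whole columns, so no loop is needed;
# round to the millisecond resolution of the input
diff_secs <- round(as.numeric(difftime(later, earlier, units = "secs")), 3)
diff_secs
```

With real data, later_str and earlier_str would simply be the two columns of the data frame.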

How do I customize the start and finish hour of a 24-hour line plot? (for example, start at 7:30)

I have 24 hours of data starting at 7:30 today (for example) and running until 7:30 the next day. Because I didn't link the date to the line plot, R sorts the hours starting from 00:00, even though the data starts at 7:30. I am a beginner in R and I don't know where to begin solving this problem: should I try adding the date to the x-axis as well, or is there a better solution?
My times() call somehow didn't work either; it used to work when I was plotting data at 15-minute increments.
library(chron)
d <- read.csv(file="data.csv", header = T)
t <- times(d$Time)
plot(t,d$MCO2, type="l")
[Figure: graph created from the 24-hour data (image not available)]
[Figure: graph created from 15-minute data using the same code (image not available)]
I wanted the x-axis to run from 7:30 to 7:30 the next day, but it now shows decimal numbers from 0.0 to 1.
Here is the link to the data, just in case:
https://www.dropbox.com/s/wsg437gu00e5t08/Data%20210519.csv?dl=0
The question is actually about combining a date column and a time column to create a timestamp containing both date and time. Note that I suggest processing everything as if we are in the GMT time zone. You can pick whatever time zone you want, then stick to it.
# use ggplot
library(ggplot2)
# assume everything happens in GMT timezone
Sys.setenv( TZ = "GMT" )
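# replicating the data: a measurement result sampled at 1 sec interval
# (start and end are illustrative, chosen to match the head() output below)
start <- as.POSIXct("2019-05-22 00:00:00", tz = "GMT")
end <- start + 24*60*60 - 1
t <- seq(start, end, by = "1 sec")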
Time24 <- trimws(strftime(t, format = "%k:%M:%OS", tz="GMT"))
Date <- strftime(t, format = "%d/%m/%Y", tz="GMT")
head(Time24)
head(Date)
d <- data.frame(Date, Time24)
# this is just a random data of temperature
d$temp <- rnorm(length(d$Date),mean=25,sd=5)
head(d)
# the resulting data is as follows
# Date Time24 temp
#1 22/05/2019 0:00:00 22.67185
#2 22/05/2019 0:00:01 19.91123
#3 22/05/2019 0:00:02 19.57393
#4 22/05/2019 0:00:03 15.37280
#5 22/05/2019 0:00:04 31.76683
#6 22/05/2019 0:00:05 26.75153
# this is the answer to the question,
# which is combining the date and the time columns of the data
# note we still assume that this happens in GMT
t <- as.POSIXct(paste(d$Date, d$Time24, sep = " "), format = "%d/%m/%Y %H:%M:%OS", tz = "GMT")
# print the data into a plot
png(filename = "test.png", width = 800, height = 600, units = "px", pointsize = 22)
p <- ggplot(d, aes(x = t, y = temp)) + geom_line() +
  scale_x_datetime(date_breaks = "3 hour",
                   date_labels = "%H:%M\n%d-%b")
print(p)   # explicit print is needed when the script is run with source()
dev.off()  # close the png device so the file is actually written
The problem is that the function times does not include information about the day. This is a problem since your data spans two days.
The data type you use should be able to include information about the day. POSIXct is such a data type, and since it is the go-to date-time class in R, it is also much easier to plot.
Before plotting the data, the time column should have the correct difference in days. When the column is simply converted with as.POSIXct, the times of day 2 are read as if they were from day 1. This is why we have to add 24 hours to the correct entries.
After that, it is just a matter of plotting. I added an example using ggplot2, since I prefer those plots.
You might notice that as.POSIXct adds an incorrect date (today's) to your time information. Don't worry about this: it is only a dummy date. You don't use the date itself; it just lets you work with the difference in days.
library(ggplot2)
# Read in your data set
d <- read.csv(file = "Data 210519.csv", header = TRUE)
# Read column into R date-time object
t <- as.POSIXct(d$Time24, format = "%H:%M:%OS")
# Add 24 hours to the times on day 2.
startOfDayTwo <- as.POSIXct("00:00:00", format = "%H:%M:%OS")
endOfDayTwo <- as.POSIXct("07:35:00", format = "%H:%M:%OS")
onDayTwo <- t >= startOfDayTwo & t <= endOfDayTwo
t[onDayTwo] <- t[onDayTwo] + 24*60*60
plot(t,d$MCO2, type="l")
# arguably a nicer plot
ggplot(d,aes(x=t,y=MCO2)) + geom_line() +
scale_x_datetime(date_breaks = "2 hour",
date_labels = "%I:%M %p")
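Rather than hard-coding the 00:00 to 07:35 window, a more general variant of the same idea adds a day every time the clock value goes backwards. This is a sketch, assuming the rows are in chronological order, contain no missing values, and never have a gap longer than 24 hours; unwrap_times is a hypothetical helper and its start_date is only a dummy date, as in the answer.

```r
# Hypothetical helper: clock times ("HH:MM:SS") in recording order,
# possibly running past midnight. start_date is a dummy date.
unwrap_times <- function(time_str, start_date = "2000-01-01", tz = "UTC") {
  t <- as.POSIXct(paste(start_date, time_str),
                  format = "%Y-%m-%d %H:%M:%OS", tz = tz)
  # A drop in clock time means midnight was crossed; count crossings so far
  crossings <- cumsum(c(0, diff(as.numeric(t)) < 0))
  t + crossings * 24 * 60 * 60
}

x <- unwrap_times(c("07:30:00", "23:59:59", "00:00:01", "07:29:59"))
format(x, "%d %H:%M:%S")
# "01 07:30:00" "01 23:59:59" "02 00:00:01" "02 07:29:59"
```

Applied to the question's data, the time column of the CSV (its exact name and format are assumptions here) would go in as time_str, and the result can be used directly as the x values of plot() or ggplot().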

R: plotting data.frame against time with "hardcoded" date column

I have googled a lot and yet I still can't figure this one out. I am trying to plot one column in a data frame against time; however, my dates are "hardcoded" (for lack of a better word) as the index of the data frame rather than stored in a separate DATE column.
> head(tmp)[1]
This is what I get; the first column is RETURNS:
RETURNS
2010-01-13 00:00:00 0.8291384
2010-01-14 00:00:00 0.2423567
2010-01-15 00:00:00 -1.0882186
2010-01-19 00:00:00 1.2422194
2010-01-20 00:00:00 -1.0654438
2010-01-21 00:00:00 -1.9126605
If I plot it like:
plot(tmp$RETURNS)
I get a plot of returns against index from 1 to 1500 (number of obs.) and not time. If I got a distinct time column I would plot it like this and it would be fine:
plot(tmp$DATE, tmp$RETURNS)
However, I don't know how to extract the dates from that "hardcoded" date column, if that makes sense. I tried converting it to other objects (timeSeries, zoo, etc.), but that didn't help. I am sure there is some simple function; I just can't find it. Thanks for any help, guys.
EDIT:
Thanks guys, your help is very much appreciated. All the answers are great; too bad I can't accept them all ;) Of course, rownames was what I was looking for.
Reproducing your data (you should really have used dput to make life easier for us):
df <- as.data.frame(c(0.8291384, 0.2423567,-1.0882186, 1.2422194,-1.0654438,-1.9126605))
names(df) <- c("RETURNS")
rownames(df) <- c("2010-01-13 00:00:00", "2010-01-14 00:00:00", "2010-01-15 00:00:00", "2010-01-19 00:00:00","2010-01-20 00:00:00","2010-01-21 00:00:00")
df
RETURNS
2010-01-13 00:00:00 0.8291384
2010-01-14 00:00:00 0.2423567
2010-01-15 00:00:00 -1.0882186
2010-01-19 00:00:00 1.2422194
2010-01-20 00:00:00 -1.0654438
2010-01-21 00:00:00 -1.9126605
Cleaning up:
df$Date <- as.Date(rownames(df))
rownames(df) <- NULL
df
RETURNS Date
1 0.8291384 2010-01-13
2 0.2423567 2010-01-14
3 -1.0882186 2010-01-15
4 1.2422194 2010-01-19
5 -1.0654438 2010-01-20
6 -1.9126605 2010-01-21
Plotting:
plot(df$Date, df$RETURNS)
or
library(ggplot2)
ggplot(df, aes(x=Date, y=RETURNS)) + geom_point() + scale_x_date()
Assuming that the input is as in the Note below then using zoo we can plot using classic graphics, ggplot2 and lattice as follows. We also show a base R solution at the end and a variation. Note that since the time is always 0 we used "Date" class for the time index in all cases.
library(zoo)
z <- zoo(df$RETURNS, as.Date(rownames(df)))
plot(z)
library(ggplot2)
autoplot(z)
library(lattice)
xyplot(z)
# this one does not use any packages
df2 <- data.frame(time = as.Date(rownames(df)), RETURNS = df$RETURNS)
plot(RETURNS ~ time, df2)
# this also works using df2 just calculated
plot(df2)
Note: We assume the input is:
df <- data.frame(
RETURNS = c(0.8291384, 0.2423567,-1.0882186, 1.2422194,-1.0654438,-1.9126605),
row.names = c("2010-01-13 00:00:00", "2010-01-14 00:00:00", "2010-01-15 00:00:00",
"2010-01-19 00:00:00","2010-01-20 00:00:00","2010-01-21 00:00:00"))

R Rounding a Time

I have a data frame with a series of times in the following format:
08:09:23.079
> class(timer3)
[1] "factor"
I would like to round/convert them to this format:
08:09
The end goal is to use them as values for the x-axis of a plot, so I assume they would need to be converted to some time class (zoo, as.Date, etc.).
Any suggestions?
Suppose we have this input data:
DF <- data.frame(times = c("08:09:23.079", "08:30:13.062"), values = 1:2)
To keep it simple, let's assume that there is at most one time point per minute (afterwards we show a slightly longer alternative without this restriction):
library(zoo)
library(chron)
# this assumes we want to store times to the second
tt <- times(as.character(DF$times))
z <- zoo(DF$values, tt)
plot(z, xaxt = "n")
# custom axis - assumes sufficiently many points to get reasonable graph
# round tick mark locations to the minute and remove the seconds from label
axt <- trunc(times(axTicks(1)), "min")
axis(1, at = axt, labels = sub(":..$", "", axt))
The above method of creating z could alternatively be replaced with this. It works whether or not there is more than one point per minute, as it aggregates them to the minute:
# with this z we will store times to the minute
z <- read.zoo(DF, FUN = function(x) trunc(times(as.character(x)), "min"),
aggregate = mean)
EDIT: plotting and truncation.
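At the risk of being called a necromancer, I will answer this question, as I think this situation arises quite often.
Here is how to do it if you convert your time series data to xts format. The function to use here is align.time. Note that it moves each timestamp up to the next multiple of n seconds (00:14:59 becomes 00:15:00 below), so it rounds up rather than truncating.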
> head(GBPJPY)
GBPJPY.Open GBPJPY.High GBPJPY.Low GBPJPY.Close
2009-05-01 00:14:59 146.387 146.882 146.321 146.620
2009-05-01 00:29:54 146.623 146.641 146.434 146.579
2009-05-01 00:44:59 146.579 146.908 146.570 146.810
2009-05-01 00:59:59 146.810 146.842 146.030 146.130
2009-05-01 01:14:59 146.130 146.330 146.100 146.315
2009-05-01 01:29:57 146.315 146.382 146.159 146.201
> head(align.time(GBPJPY, 15*60))
GBPJPY.Open GBPJPY.High GBPJPY.Low GBPJPY.Close
2009-05-01 00:15:00 146.387 146.882 146.321 146.620
2009-05-01 00:30:00 146.623 146.641 146.434 146.579
2009-05-01 00:45:00 146.579 146.908 146.570 146.810
2009-05-01 01:00:00 146.810 146.842 146.030 146.130
2009-05-01 01:15:00 146.130 146.330 146.100 146.315
2009-05-01 01:30:00 146.315 146.382 146.159 146.201
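Without xts, the same kind of snapping can be sketched in base R by working on the numeric seconds stored inside POSIXct. This is an illustration only: align_up and align_down are hypothetical helpers, they assume UTC, and they have not been checked against align.time for timestamps that already sit exactly on a boundary. Using floor() instead of ceiling() truncates, which is what the question's 08:09 example asks for.

```r
# Hypothetical helpers: snap POSIXct times to multiples of n seconds (UTC assumed)
align_up <- function(x, n) {
  as.POSIXct(ceiling(as.numeric(x) / n) * n, origin = "1970-01-01", tz = "UTC")
}
align_down <- function(x, n) {
  as.POSIXct(floor(as.numeric(x) / n) * n, origin = "1970-01-01", tz = "UTC")
}

x <- as.POSIXct(c("2009-05-01 00:14:59", "2009-05-01 00:29:54"), tz = "UTC")
format(align_up(x, 15 * 60), "%H:%M:%S")   # next 15-minute mark
format(align_down(x, 60), "%H:%M")         # truncated to the minute
```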
as.zoo(sapply(timer3,substring,1,5))
or as.xts?
Maybe looking at a bigger sample of your data would help.
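Two steps: 1) factor to character with as.character() (calling as.numeric() directly on a factor returns its internal level codes, not the displayed values); 2) character to date-time with strptime(), which returns POSIXlt (wrap it in as.POSIXct() if you need POSIXct).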
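The two steps, plus the truncation to HH:MM that the question asks for, sketched in base R. timer3 is rebuilt here from the two example times as a hypothetical stand-in for the question's factor column, and UTC is assumed so that no time-zone shift applies.

```r
# Hypothetical factor column like the question's timer3
timer3 <- factor(c("08:09:23.079", "08:30:13.062"))

# Step 1: factor -> character; step 2: character -> date-time.
# strptime() returns POSIXlt (today's date is filled in); as.POSIXct() converts it.
tt <- as.POSIXct(strptime(as.character(timer3), "%H:%M:%OS", tz = "UTC"))

# Truncate (not round) to the minute, then keep only HH:MM for labels
labs <- format(trunc(tt, "mins"), "%H:%M")
labs
# "08:09" "08:30"
```

Plotting against tt keeps a true time axis; the HH:MM strings are only needed for display.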
