How can I sum the total time per driver in R? Can someone help me?
Total time
Preferred end result
One recommendation to make: please do not use images to share data. Instead, use dput() of your data frame. See this post on making a reproducible example on SO.
One approach to this involves the tidyverse and lubridate packages (I am sure there are other solutions).
First, I would put your data into long form instead of wide. The times are then converted from %H:%M:%OS (with fractional seconds) to the number of seconds since midnight.
Then, for each driver, these times are summed up, and results are provided in different formats:
total_time1 - total number of seconds (with decimal places)
total_time2 - number of minutes (M) and number of decimal seconds (S)
total_time3 - total time in %M:%OS format (minutes and decimal seconds)
Edit: In addition, I have added two columns based on OP request:
total_time_minutes - total number of minutes (with decimal places)
avg_speed - average speed in km/hr, assuming a total distance of 27,004.65 meters
I hope this is helpful. Please let me know.
library(tidyverse)
library(lubridate)
df %>%
  pivot_longer(cols = -lap) %>%
  mutate(lap_time = as.numeric(as.POSIXct(value, format = "%H:%M:%OS", tz = "UTC")) -
           as.numeric(as.POSIXct(Sys.Date(), tz = "UTC"))) %>%
  group_by(name) %>%
  summarise(total_time1 = sum(lap_time)) %>%
  mutate(total_time2 = seconds_to_period(total_time1),
         total_time3 = sprintf("%d:%.4f", minute(total_time2), second(total_time2)),
         total_time_minutes = total_time1/60,
         avg_speed = 3.6 * 27004.65/total_time1) %>%
  as.data.frame()
Output
        name total_time1          total_time2 total_time3 total_time_minutes avg_speed
1     Bottas     319.782 5M 19.7815999984741S   5:19.7816            5.32969   304.010
2   Hamilton     320.320 5M 20.3204002380371S   5:20.3204            5.33867   303.498
3    Leclerc     319.981   5M 19.98140001297S   5:19.9814            5.33302   303.820
4 Verstappen     318.220  5M 18.219899892807S   5:18.2199            5.30366   305.502
5     Vettel     318.625 5M 18.6247997283936S   5:18.6248            5.31041   305.114
Data
df <- structure(list(lap = 1:5, Bottas = c("00:01:04.9388", "00:01:03.7164",
"00:01:04.0028", "00:01:03.3424", "00:01:03.7812"), Hamilton = c("00:01:04.5280",
"00:01:03.7524", "00:01:03.9632", "00:01:04.3712", "00:01:03.7056"
), Leclerc = c("00:01:04.9812", "00:01:03.7740", "00:01:04.6026",
"00:01:03.3920", "00:01:03.2316"), Verstappen = c("00:01:04.1704",
"00:01:03.7383", "00:01:03.7128", "00:01:02.8460", "00:01:03.7524"
), Vettel = c("00:01:04.3632", "00:01:02.8244", "00:01:03.7164",
"00:01:03.8532", "00:01:03.8676")), class = "data.frame", row.names = c(NA,
-5L))
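As a possible alternative to going through POSIXct, the lap times can be split on the colons and converted to seconds directly, which avoids depending on the current date. This is only a sketch in base R: the helper name to_seconds is made up for illustration, and the vector holds Bottas's lap times from the data above.

```r
# Hypothetical helper: convert "HH:MM:SS.ssss" strings to seconds
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Bottas's five lap times, copied from the data above
bottas <- c("00:01:04.9388", "00:01:03.7164", "00:01:04.0028",
            "00:01:03.3424", "00:01:03.7812")

total_seconds <- sum(to_seconds(bottas))

# Format as minutes:seconds; %07.4f zero-pads seconds below 10
total_formatted <- sprintf("%d:%07.4f",
                           as.integer(total_seconds %/% 60),
                           total_seconds %% 60)
total_formatted
```

For these five laps the total comes to 319.7816 seconds, which formats as 5:19.7816 and agrees with total_time3 in the output above.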
Related
How do you convert SS.xxx (Seconds.Milliseconds) to MM:SS.xxx (Minutes:Seconds.Milliseconds) using R?
For example, my input is
time = 92.180
my desired output is
time = 01:32.180
All time fields have 3 decimal places.
One option is the lubridate package. Since you did not specify the output class, I have included a few possible outputs:
library(lubridate)
library(magrittr) # provides the %>% pipe used below
t <- 92.180
# your output string as character
lubridate::seconds(t) %>%
  lubridate::as_datetime() %>%
  format("%M:%OS3")
# output as period
lubridate::seconds(t) %>%
  lubridate::as.period()
# output as duration
lubridate::seconds(t) %>%
  lubridate::as.duration()
# output as difftime
lubridate::seconds(t) %>%
  lubridate::as.difftime()
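If only the character output is needed, the same result can probably be had without any package. The following is a minimal base R sketch; format_mmss is a made-up name, and rounding to milliseconds first is my own choice so that values such as 119.9996 do not end up printed as 01:60.000.

```r
# Hypothetical helper: format seconds as "MM:SS.xxx"
format_mmss <- function(secs) {
  secs <- round(secs, 3)                 # work at millisecond precision
  minutes <- as.integer(secs %/% 60)     # whole minutes
  sprintf("%02d:%06.3f", minutes, secs - 60 * minutes)
}

format_mmss(92.180)
format_mmss(c(5.5, 125.25))
```

The function is vectorised, so it can be applied to a whole column of times at once.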
I want to convert from chr to date format.
I have this representing year-week:
2020-53
I've tried this:
mutate(semana=as_date(year_week,format="%Y-%U"))
but I get the same date, 2020-01-18, for every row in the dataset.
I also tried
mutate(semana=strptime(year_week, "%Y-%U"))
and got the same result.
Here you can see the wrong conversion
Any ideas? Thanks.
I think I've got something that does the job.
library(tidyverse)
library(lubridate)
# Set up table like example in post
trybble <- tibble(year_week = c("2020-53", rep("2021-01", 5)),
                  country = c("UK", "FR", "GER", "ITA", "SPA", "UK"))
# Function to go into mutate with given info of year and week
y_wsetter <- function(fixme, yeargoal, weekgoal) {
  lubridate::year(fixme) <- yeargoal
  lubridate::week(fixme) <- weekgoal
  return(fixme)
}
# Making a random date so col gets set right
rando <- make_datetime(year = 2021, month = 1, day = 1)
# Show time
trybble <- trybble %>%
  add_column(semana = rando) %>% # Set up col of dates to fix
  mutate(yerr = substr(year_week, 1, 4)) %>% # Get year as chr
  mutate(week = substr(year_week, 6, 7)) %>% # Get week as chr
  mutate(semana2 = y_wsetter(semana,
                             as.numeric(yerr),
                             as.numeric(week))) %>% # fixed dates
  select(-c(yerr, week, semana))
Notes:
If you somehow plug in a week greater than 53, lubridate doesn't mind, and goes forward a year.
I really struggled to get mutate to play nicely without writing my own function y_wsetter. In my experience, when a mutate takes multiple inputs, or when I'm changing a "property" of a value instead of the whole value itself, I usually need to write a function. I'm using the lubridate package to change just the year or week based on your year_week column, so this is one such situation where a quick function helps mutate out.
I was having a weird time when I tried setting rando to Sys.Date(), so I manually set it to something using make_datetime. YMMV
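Another possible route, sketched here in base R, is to compute the date arithmetically. As far as I can tell from the strptime documentation, an unspecified month and day default to the current ones, which would explain why "%Y-%U" alone gives the same date for every row. The sketch below assumes the year_week values are ISO 8601 weeks (week 1 is the week containing 4 January, and weeks start on Monday), which seems likely given that 2020-53 exists. The function name iso_week_to_date is made up for illustration.

```r
# Hypothetical helper: Monday of an ISO year-week given as "YYYY-WW"
iso_week_to_date <- function(year_week) {
  year <- as.integer(substr(year_week, 1, 4))
  week <- as.integer(substr(year_week, 6, 7))
  jan4 <- as.Date(sprintf("%d-01-04", year))   # always falls in ISO week 1
  wday <- as.integer(format(jan4, "%u"))       # 1 = Monday, ..., 7 = Sunday
  week1_monday <- jan4 - (wday - 1)
  week1_monday + 7 * (week - 1)
}

iso_week_to_date(c("2020-53", "2021-01"))
```

Under that assumption, 2020-53 maps to 2020-12-28 and 2021-01 maps to 2021-01-04. If the weeks follow a different convention, the anchor date would need to change.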
EDIT:
I've tried the changes and opted for the tidyquant package shared in the comments below.
This time I've set a range with variables, but I think I'm having trouble turning it into a function or a vector. This could either be the result of me writing a bad for loop or a limitation of the underlying library.
The idea behind this loop is that it pulls the adjusted prices for the period and then takes the first and last price to calculate a change (aka the return in share price.)
I'm not sure, but would love some thoughts!
start_date = "2019-05-20"
end_date = "2019-05-30"
Symbol_list <- c("CTVA","IBM", "GOOG", "GE")
range_returns <- for (Symbol in Symbol_list) {
  frame <- tq_get(Symbol, get = "stock.prices", from = start_date, to = end_date, complete_cases = FALSE)[,7]
  (frame[nrow(frame),] - frame[1,]) / frame[1,]
}
Old stuff
Let's say I've got a dataframe
symbol <- c("GOOG", "IBM","GE","F","BKR")
name <- c("Google", "IBM","General Electric","Ford","Berkshire Hathaway")
df <- data.frame(symbol, name)
And I want to create a third column - df$custom_return that's defined based on my personal time frame.
I've tried working with the quantmod package and I'm having some trouble with its constraints.
Where I'm at:
I have to pull the entire price history first, which prevents me from creating a new column like so:
start_date <- "2003-01-05"
end_date <- "2019-01-05"
df$defined_period_return <- ROC(getSymbols(df$symbol, src = "yahoo", from = start_date, to = end_date, periodicity = "monthly"))
I know that I only want the adjusted close which is the 6th column for the Yahoo source. So, I could add the following and just pull the records into an environment.
price_history <- NULL
for (Symbol in df$symbol)
  price_history <- cbind(price_history,
                         getSymbols(Symbol, from = start_date,
                                    to = end_date, periodicity = "daily",
                                    auto.assign = FALSE)[, 6])
Ok, that seems feasible, but it's not exactly seamless, and now I run into an issue if one of my symbols (tickers) falls outside of the range of dates provided. For example, CTVA is one of them: it didn't start trading until after the end date. The whole scrape stops right there. How do I skip over that error?
And let's say we solve the "snag" of not finding relevant records... how would you calculate the return for each symbol over different timelines? For example, Google didn't start trading until 2004. getSymbols does pull the price history once it starts trading, but that return timeline is different from GE's, which had data at the start of my range.
No need for a for loop. You can do everything with tidyquant and dplyr. For the first and last observations of a group you can use the functions first and last from dplyr. See code below for a working example.
library(tidyquant)
library(dplyr)
start_date = "2019-05-20"
end_date = "2019-05-30"
Symbol_list <- c("CTVA","IBM", "GOOG", "GE")
stocks <- tq_get(Symbol_list, get = "stock.prices", from = start_date, to = end_date, complete_cases = FALSE)
stocks %>%
  group_by(symbol) %>%
  summarise(returns = (last(adjusted) / first(adjusted)) - 1) # calculate returns
# A tibble: 4 x 2
  symbol returns
  <chr>    <dbl>
1 CTVA   -0.0172
2 GE     -0.0516
3 GOOG   -0.0197
4 IBM    -0.0402
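For anyone who would rather keep the per-symbol structure of the original loop: a for loop in R evaluates to NULL, so assigning it to range_returns cannot collect anything. One way around that is vapply. The sketch below uses made-up price vectors in place of the tq_get() results so that it runs offline; the symbols AAA and BBB and the helper period_return are hypothetical.

```r
# Made-up adjusted prices standing in for the tq_get() output of each symbol
prices <- list(
  AAA = c(100, 102, 110),
  BBB = c(50, 48, 45)
)

# Hypothetical helper: simple return from the first to the last price
period_return <- function(p) (p[length(p)] - p[1]) / p[1]

# vapply returns a named numeric vector, one return per symbol
range_returns <- vapply(prices, period_return, numeric(1))
range_returns
```

Here AAA comes out at 0.10 and BBB at -0.10. The formula is the same as last / first - 1 in the dplyr version above.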
I am currently working on a project and I need some help. I want to predict the length of flight delays, using a statistical model. The data set does not contain the length of flight delays, but it can be calculated from the actual and scheduled departure times.
I will include a link if you want the whole dataset:
https://drive.google.com/file/d/11BXmJCB5UGEIRmVkM-yxPb_dHeD2CgXa/view?usp=sharing
I then ran the following code
Delays <- read.table("FlightDelays.csv", header=T, sep=",")
DepatureTime <- strptime(formatC(Delays$deptime, width = 4, format = "d", flag = "0"), "%H%M")
ScheduleTime <- strptime(formatC(Delays$schedtime, width = 4, format = "d", flag = "0"), "%H%M")
DelayTime <- as.numeric(difftime(DepatureTime, ScheduleTime))/60
DelayData <- data.frame(DelayTime, Delays)
The above code gives me the delay time in minutes.
For those of you who do not want to download the whole dataset, here is a small example of some observations:
structure(list(schedtime = c(1455, 1640, 1245, 1715, 1039 , 2120), deptime = c(1455, 1640, 1245, 1709, 1035, 0010)), .Names = c("schedtime", "deptime"), row.names = c(NA, 6L), class = "data.frame")
If you run the code I gave at the beginning, the delay for the 6th observation will be -1270 minutes, not 170 minutes. I believe this is because strptime assumes both times are on the same day and doesn't recognise that the delay pushed the departure time into the early hours of the following day.
How can I get the code to recognise that a delay will sometimes push the departure time on to the following day?
Thank you for any help
Using lubridate:
library(lubridate)
ScheduleTime <- as_datetime(formatC(Delays$schedtime, width = 4, format = "d", flag = "0"),format="%H%M")
DepatureTime <- as_datetime(formatC(Delays$deptime, width = 4, format = "d", flag = "0"),format="%H%M") + hours(ifelse(Delays$deptime < Delays$schedtime & Delays$schedtime > 2000,24,0))
DelayTime <- as.numeric(difftime(DepatureTime, ScheduleTime, units = "mins")) # delay in minutes, with explicit units
DelayData <- data.frame(DelayTime, Delays)
The problem is that you have to decide when a deptime smaller than schedtime means the flight departed on the following day, and when it simply means the flight left early. I don't see a general way around that.
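To make that decision explicit, here is a small base R sketch that works in minutes since midnight. It uses a slightly different rule from the code above: any departure that appears to be more than 12 hours early is treated as a next-day departure. That 12-hour threshold is purely an assumption and should be adjusted to the data; hhmm_to_minutes is a made-up helper name.

```r
# Hypothetical helper: clock times stored as hhmm numbers to minutes since midnight
hhmm_to_minutes <- function(x) (x %/% 100) * 60 + x %% 100

# The six example observations from the question
flights <- data.frame(
  schedtime = c(1455, 1640, 1245, 1715, 1039, 2120),
  deptime   = c(1455, 1640, 1245, 1709, 1035, 10)
)

delay <- hhmm_to_minutes(flights$deptime) - hhmm_to_minutes(flights$schedtime)

# Assumption: nothing leaves more than 12 hours (720 minutes) early,
# so such a gap is read as a departure on the following day
delay <- ifelse(delay < -720, delay + 1440, delay)
delay
```

With this rule the sixth observation gives 170 minutes, while the flights that left 6 and 4 minutes early keep their small negative delays.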
In R I have a dataset containing abbreviated numbers, I really want the full metric so I can sum the values... is there a library or something that would aid in this effort?
i.e.
start result
5k = 5,000
.5k = 500
.25k = 250
5m = 5,000,000
.5m = 500,000
and so on...
Data
dd <- data.frame(start = c('5k', '.5k', '.25k', '5m', '.5m'),
result = c('5,000', '500', '250', '5,000,000', '500,000'),
stringsAsFactors = FALSE)
There is no need for any library. Just declare k = 1000 and put the * operator between the number and the suffix, so that 5k becomes 5 * k.
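Since the values are stored as strings, the multiplication has to be applied after splitting off the suffix. A minimal base R sketch of that idea follows; expand_abbrev is a made-up name, and it assumes the only suffixes are k (thousand) and m (million), as in the example data.

```r
# Hypothetical helper: expand strings such as "5k" or ".5m" to plain numbers
expand_abbrev <- function(x) {
  multipliers <- c(k = 1e3, m = 1e6)   # assumed suffix meanings
  suffix <- tolower(substr(x, nchar(x), nchar(x)))
  has_suffix <- suffix %in% names(multipliers)
  # Drop the suffix where there is one, then convert the rest to numeric
  number <- as.numeric(ifelse(has_suffix, substr(x, 1, nchar(x) - 1), x))
  number * ifelse(has_suffix, multipliers[suffix], 1)
}

start <- c("5k", ".5k", ".25k", "5m", ".5m")
values <- expand_abbrev(start)
values
sum(values)
```

For the example data this gives 5000, 500, 250, 5000000 and 500000, which can then be summed as ordinary numbers.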