I have time data in hours:minutes:seconds:milliseconds. I also have a descriptor for what occurs at each timepoint. An example of my dataset is as follows:
StartTime <- c("00:00:00:00", "00:00:14:04", "00:01:51:06", "00:03:30:02")
Events <- c("Start whistle", "Pass", "Shot", "Pass")
RawData <- data.frame(StartTime, Events)
I now wish to create a new column that rounds according to minutes played, from the StartTime column. My anticipated output would be:
MinutesPlayed <- c(0, 0, 2, 3)
I have tried to round using the following code, however it includes the date (unnecessary for my work) and still keeps time in H:M:S format.
RawData$MinutesPlayed <- strptime(RawData$StartTime, "%H:%M:%S")
Where am I going wrong?
If you're rounding up, then the last element should go up to 4 (since 30.02 seconds rounds to 1 minute). Here's an idea using strptime(), rounding the minutes.
## replace the last colon with a decimal point
st <- sub("(.*):(.*)", "\\1.\\2", StartTime)
## convert to POSIXlt and grab the rounded minutes
round(strptime(st, "%H:%M:%OS"), "mins")$min
# [1] 0 0 2 4
sapply(strsplit(as.character(RawData$StartTime),":"), function(x)
#Use 'ceiling' or 'round' instead of 'floor' as needed
floor(as.numeric(x[1])*60 + #hours to minutes
as.numeric(x[2]) + #minutes
as.numeric(x[3])/60 + #seconds to minutes
as.numeric(x[4])/60000)) #milliseconds to minutes
#[1] 0 0 1 3
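To attach the result to the data frame in one vectorised step, the same arithmetic can be sketched as below. It assumes, like the second answer, that the fourth field is milliseconds; the names `parts` and `total_min` are only illustrative. Swapping `floor()` for `round()` switches between "minutes completed" and "nearest minute".

```r
StartTime <- c("00:00:00:00", "00:00:14:04", "00:01:51:06", "00:03:30:02")
Events <- c("Start whistle", "Pass", "Shot", "Pass")
RawData <- data.frame(StartTime, Events, stringsAsFactors = FALSE)

# Split every timestamp into its four fields; one row per timestamp
parts <- matrix(as.numeric(unlist(strsplit(RawData$StartTime, ":", fixed = TRUE))),
                ncol = 4, byrow = TRUE)

# Total elapsed time in minutes (assumption: the last field is milliseconds)
total_min <- parts[, 1] * 60 + parts[, 2] + parts[, 3] / 60 + parts[, 4] / 60000

RawData$MinutesFloor <- floor(total_min)  # minutes fully completed
RawData$MinutesRound <- round(total_min)  # nearest minute
RawData
```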
I am trying to convert a number into time format.
For example:
I am calculating how long an electric car has to be charged at a charging station of 11 kWh.
Energy demand - 2,8 kWh
Charging time = 2,8 kWh/11 kWh = 0,257 h
0,257 h = 15 min 25 sec. = 00:15:25
How can I convert 0,257 h into 00:15:25 in R?
Based on the example, we will assume that the input is less than 24 (but if that is not the case these could be modified to handle that depending on the definition of what such an input should produce).
1) chron::times Use chron times like this. times measures times in fractions of a day so divide the hours (.257) by 24 to give the fraction of a day that it represents.
library(chron)
times(.257 / 24)
## [1] 00:15:25
This gives a chron "times" class object. If x is such an object use format(x) to convert it to a character string, if desired.
2) POSIXct This uses no packages although it is longer. It returns the time as a character string. POSIXct measures time in seconds and so multiply the hours (.257) by 3600 as there are 3600 seconds in an hour.
format(as.POSIXct("1970-01-01") + 3600 * .257, "%H:%M:%S")
## [1] "00:15:25"
2a) This variation would also work. It is longer but it involves no conversion factors. It returns a character string.
format(as.POSIXct("1970-01-01") + as.difftime(.257, units = "hours"), "%H:%M:%S")
## [1] "00:15:25"
Updates: Added (2). Also added (2a) and improved (2).
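If the input can reach 24 hours or more, the POSIXct variants wrap around to the next day. A minimal sketch that keeps counting hours instead is below; `hours_to_hms` is a hypothetical helper name, and rounding to whole seconds is an assumption about the desired output.

```r
# Hypothetical helper: decimal hours -> "HH:MM:SS", hours are not wrapped at 24
hours_to_hms <- function(h) {
  total <- round(h * 3600)                      # total whole seconds
  sprintf("%02d:%02d:%02d",
          as.integer(total %/% 3600),           # whole hours
          as.integer((total %/% 60) %% 60),     # minutes within the hour
          as.integer(total %% 60))              # seconds within the minute
}

hours_to_hms(0.257)
hours_to_hms(c(2.257, 26.5))
```

For comparison, `format(as.POSIXct("1970-01-01", tz = "UTC") + 3600 * 26.5, "%H:%M:%S")` shows the wrapped clock time rather than 26 hours.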
The answer by @GGrothendieck seems to be the way to go here. But if you had to do this in base R, you could just compute the hour, minute, and second components and build the time string manually:
x <- 2.257 # number of hours
total <- round(x*60*60, digits=0) # the total number of seconds
hours <- total %/% (60*60) # whole hours
minutes <- (total %/% 60) %% 60 # minutes within the hour, taken from the rounded total so all parts agree
seconds <- total %% 60
ts <- paste0(formatC(hours, width=2, flag="0"), ":",
formatC(minutes, width=2, flag="0"), ":",
formatC(seconds, width=2, flag="0"))
ts
# [1] "02:15:25"
The tidyverse solution would use the hms package:
hms::hms(0.257 * 60^2)
#> 00:15:25.2
Gives you an object of classes hms and difftime. If you want a string:
format(hms::hms(0.257 * 60^2))
#> [1] "00:15:25.2"
Inside a function I need to convert some number, generally in the range of 20 to 200, into a difftime and show it via format as the expected time needed to finish.
as.difftime has got a useful units="auto" so it will use "sec" say for 20 secs and "mins" for 60+ secs...
But it says also
> as.difftime(100, units="auto")
Error in as.difftime(100, units = "auto") :
need explicit units for numeric conversion
How can I avoid that?
EDIT: Current workaround
> (Sys.time()+100)-Sys.time()
Time difference of 1.666667 mins
Is lubridate an alternative? (Recent lubridate releases call this function make_difftime; new_difftime is the older name.)
library(lubridate)
new_difftime(second = 20)
# Time difference of 20 secs
new_difftime(second = 60)
# Time difference of 1 mins
new_difftime(second = 240)
# Time difference of 4 mins
new_difftime(second = 1000000)
# Time difference of 11.57407 days
# new_difftime creates an object of same class as as.difftime does.
class(as.difftime(20, units = "secs"))
# [1] "difftime"
class(new_difftime(second = 20))
# [1] "difftime"
It is also possible to specify input values of several units. E.g. from ?new_difftime
new_difftime(second = 3, minute = 1.5, hour = 2, day = 6, week = 1)
# Time difference of 13.08441 days
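The workaround from the question can also be wrapped in a small base R helper. This is only a sketch: `auto_difftime` is a hypothetical name, and it relies on `difftime()` accepting `units = "auto"` when it is given two date-times rather than a bare number.

```r
# Hypothetical helper: seconds -> difftime with automatically chosen units
auto_difftime <- function(secs) {
  origin <- as.POSIXct(0, origin = "1970-01-01", tz = "UTC")  # arbitrary fixed reference point
  difftime(origin + secs, origin, units = "auto")
}

auto_difftime(20)    # reported in seconds
auto_difftime(100)   # reported in minutes
format(auto_difftime(100))
```

Using a fixed reference point instead of two calls to `Sys.time()` keeps the result exact, since the clock cannot advance between the two calls.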
This works for me in R:
# Setting up the first inner while-loop controller, the start of the next water year
NextH2OYear <- as.POSIXlt(firstDate)
NextH2OYear$year <- NextH2OYear$year + 1
NextH2OYear<-as.Date(NextH2OYear)
But this doesn't:
# Setting up the first inner while-loop controller, the start of the next water month
NextH2OMonth <- as.POSIXlt(firstDate)
NextH2OMonth$mon <- NextH2OMonth$mon + 1
NextH2OMonth <- as.Date(NextH2OMonth)
I get this error:
Error in as.Date.POSIXlt(NextH2OMonth) :
zero length component in non-empty POSIXlt structure
Any ideas why? I need to systematically add one year (for one loop) and one month (for another loop) and am comparing the resulting changed variables to values with a class of Date, which is why they are being converted back using as.Date().
Thanks,
Tom
Edit:
Below is the entire section of code. I am using RStudio (version 0.97.306). The code below is a function that is passed an array of two columns, Date (class Date) and discharge data (class numeric), which are used to calculate the monthly averages. So firstDate and lastDate are of class Date and are determined from the passed array. This code is adapted from working code that calculates the yearly averages. There may be one or two things I still need to change over, but I cannot error-check the later parts because of the early errors I get in my use of POSIXlt. Here is the code:
MonthlyAvgDischarge<-function(values){
#determining the number of values - i.e. the number of rows
dataCount <- nrow(values)
# Determining first and last dates
firstDate <- (values[1,1])
lastDate <- (values[dataCount,1])
# Setting up vectors for results
WaterMonths <- numeric(0)
class(WaterMonths) <- "Date"
numDays <- numeric(0)
MonthlyAvg <- numeric(0)
# while loop variables
loopDate1 <- firstDate
loopDate2 <- firstDate
# Setting up the first inner while-loop controller, the start of the next water month
NextH2OMonth <- as.POSIXlt(firstDate)
NextH2OMonth$mon <- NextH2OMonth$mon + 1
NextH2OMonth <- as.Date(NextH2OMonth)
# Variables used in the loops
dayCounter <- 0
dischargeTotal <- 0
dischargeCounter <- 1
resultsCounter <- 1
loopCounter <- 0
skipcount <- 0
# Outer while-loop, controls the progression from one year to another
while(loopDate1 <= lastDate)
{
# Inner while-loop controls adding up the discharge for each water year
# and keeps track of day count
while(loopDate2 < NextH2OMonth)
{
if(is.na(values[resultsCounter,2]))
{
# Skip this date
loopDate2 <- loopDate2 + 1
# Skip this value
resultsCounter <- resultsCounter + 1
#Skipped counter
skipcount<-skipcount+1
} else{
# Adding up discharge
dischargeTotal <- dischargeTotal + values[resultsCounter,2]
}
# Adding a day
loopDate2 <- loopDate2 + 1
#Keeping track of days
dayCounter <- dayCounter + 1
# Keeping track of Dicharge position
resultsCounter <- resultsCounter + 1
}
# Adding the results/water years/number of days into the vectors
WaterMonths <- c(WaterMonths, as.Date(loopDate2, format="%mm/%Y"))
numDays <- c(numDays, dayCounter)
MonthlyAvg <- c(MonthlyAvg, round((dischargeTotal/dayCounter), digits=0))
# Resetting the left hand side variables of the while-loops
loopDate1 <- NextH2OMonth
loopDate2 <- NextH2OMonth
# Resetting the right hand side variable of the inner while-loop
# moving it one year forward in time to the next water year
NextH2OMonth <- as.POSIXlt(NextH2OMonth)
NextH2OMonth$year <- NextH2OMonth$Month + 1
NextH2OMonth<-as.Date(NextH2OMonth)
# Resettting vraiables that need to be reset
dayCounter <- 0
dischargeTotal <- 0
loopCounter <- loopCounter + 1
}
WaterMonths <- format(WaterMonthss, format="%mm/%Y")
# Uncomment the line below and return AvgAnnualDailyAvg if you want the water years also
# AvgAnnDailyAvg <- data.frame(WaterYears, numDays, YearlyDailyAvg)
return((MonthlyAvg))
}
The same error occurs in regular R. When doing it line by line, it's not a problem; when running it as a script, it is.
Plain R
seq(Sys.Date(), length = 2, by = "month")[2]
seq(Sys.Date(), length = 2, by = "year")[2]
Note that this works with POSIXlt too, e.g.
seq(as.POSIXlt(Sys.Date()), length = 2, by = "month")[2]
mondate.
library(mondate)
now <- mondate(Sys.Date())
now + 1 # date in one month
now + 12 # date in 12 months
Mondate is a bit smarter about things like mondate("2013-01-31") + 1, which gives the last day of February, whereas seq(as.Date("2013-01-31"), length = 2, by = "month")[2] gives March 3rd.
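If the end-of-month behaviour is wanted without an extra package, it can be approximated in base R. The sketch below is one way to do it, not how mondate itself works; `add_months_clamped` is a hypothetical helper name.

```r
# Hypothetical helper: add n months, clamping to the last day of the target month
add_months_clamped <- function(d, n = 1) {
  lt <- as.POSIXlt(d)
  day <- lt$mday                 # remember the original day of the month
  lt$mday <- 1L                  # move to the 1st so the month shift cannot overflow
  lt$mon <- lt$mon + n
  first <- as.Date(lt)           # first day of the target month
  nxt <- as.POSIXlt(first)
  nxt$mon <- nxt$mon + 1         # first day of the month after the target
  last_day <- as.POSIXlt(as.Date(nxt) - 1)$mday
  first + (pmin(day, last_day) - 1)
}

add_months_clamped(as.Date("2013-01-31"))                   # end of February
seq(as.Date("2013-01-31"), length.out = 2, by = "month")[2] # rolls over into March
```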
yearmon If you don't really need the day part then yearmon may be preferable:
library(zoo)
now.ym <- as.yearmon(Sys.Date())
now.ym + 1/12 # add one month
now.ym + 1 # add one year
ADDED comment on POSIXlt and section on yearmon.
Here is how you can add 1 month to a date in R, using package lubridate:
library(lubridate)
x <- as.POSIXlt("2010-01-31 01:00:00")
month(x) <- month(x) + 1
x
# [1] "2010-03-03 01:00:00 PST"
(Note that the addition rolls over into March, as the 31st of February doesn't exist.)
Can you perhaps provide a reproducible example? What's in firstDate, and what version of R are you using? I do this kind of manipulation of POSIXlt dates quite often and it seems to work:
Sys.Date()
# [1] "2013-02-13"
date = as.POSIXlt(Sys.Date())
date$mon = date$mon + 1
as.Date(date)
# [1] "2013-03-13"
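One possible cause, judging only from the full function posted in the question: near the end of the outer loop the next month is computed from NextH2OMonth$Month, but a POSIXlt object has no component named Month (the month field is mon). Reading a missing component gives NULL, and NULL + 1 has length zero, so assigning it to $year would leave a zero-length component in the structure, which matches the wording of the error. A small sketch with a made-up date:

```r
d <- as.Date("2013-02-13")    # made-up example date
lt <- as.POSIXlt(d)

is.null(lt$Month)             # TRUE: POSIXlt has no "Month" component
length(lt$Month + 1)          # 0: this is what would be assigned to $year

# Advancing by one month with the existing component name
lt$mon <- lt$mon + 1
as.Date(lt)
```

That would also fit the observation that the first conversion succeeds when the lines are run one by one, since the problematic assignment only happens at the end of the first loop pass.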
I've got logs of events that contain:
start time, end time, category id and count. They cover several months.
I'd like to aggregate them over time to be able to trace histograms over a given day, week, month.
So I assume the best way to do this is to bin the periods in buckets. I think 5 minutes would be good.
e.g. If an event starts at 1.01pm and ends at 1.07pm, I'd like to obtain 2 records for it as it covers 2 periods of 5 minutes (0-5 and 5-10) and replicate the rest of the original data for these new records (category and count)
if my input logs (x) are as such:
start / end / catid / count
2012-11-17 15:05:02.0, 2012-11-17 15:12:52.0, 1, 2
2012-11-17 15:07:13.0, 2012-11-17 15:17:47.0, 2, 10
2012-11-17 15:11:00.0, 2012-11-17 15:12:33.0, 3, 5
2012-11-17 15:12:01.0, 2012-11-17 15:20:00.0, 4, 1
I'm trying to get the output bucketed in 5 minutes (b) this way:
start / catid / count
2012-11-17 15:05:00.0 1, 2
2012-11-17 15:10:00.0 1, 2
2012-11-17 15:05:00.0 2, 10
2012-11-17 15:10:00.0 2, 10
2012-11-17 15:15:00.0 2, 10
2012-11-17 15:10:00.0 3, 5
2012-11-17 15:10:00.0 4, 1
2012-11-17 15:15:00.0 4, 1
Then I can easily aggregate the new data frame (b) over category ids for the period I want (hour, day, week, month)
I'm starting with R and I found a lot of explanations on how to bucket a time value but not a period of time.
I've had a look at zoo and xts but I couldn't quite find what to do.
Hopefully that makes sense to some of you.
Edit:
I've slightly modified Ram's suggestion to get the correct calculation of blocks using the rounded endtime rather than the original end time. (Thanks Ram!)
mnslot=15 # size of the buckets/slot in minutes
#Round down the minutes of starttime to a mutliple of mnslot
st.str <- strptime(st, "%Y-%m-%d %H:%M:%S")
min_st <- as.numeric(format(st.str, "%M"))
roundedmins <- floor(min_st/mnslot) * mnslot
st.base <- strptime(st, "%Y-%m-%d %H")
rounded_start <- st.base + (roundedmins * 60)
#Round down the minutes of the endtime to a multiple of mnslot.
en.str <- strptime(en, "%Y-%m-%d %H:%M:%S")
min_en <- as.numeric(format(en.str, "%M"))
roundedmins <- floor(min_en/mnslot) * mnslot
en.base <- strptime(en, "%Y-%m-%d %H")
rounded_end<- en.base + (roundedmins * 60)
# calculate the number of blocks based on the rounded minutes of start and end
numblocks <- floor(as.numeric(difftime(rounded_end, rounded_start, units = "mins")) / mnslot) + 1
# difftime() with explicit units is used because plain subtraction picks its units automatically
# (seconds, minutes, hours or days, depending on the size of the differences)
#Create REPLICATED Rows, depending on the size of the interval
replicated_cat = NULL
replicated_count = NULL
replicated_start = NULL
for (n in seq_along(numblocks)){
for (newrow in 1:numblocks[n]){
# step by the slot size (mnslot minutes, in seconds) from the rounded start computed above
replicated_start = c(replicated_start, rounded_start[n]+(newrow-1)*mnslot*60 )
replicated_cat = c(replicated_cat, df$catid[n])
replicated_count = c(replicated_count, df$count[n])
}
}
#Change to readable format
POSIXT <- unix2POSIXct(replicated_start)
newdf <- data.frame(POSIXT, replicated_cat, replicated_count)
names(newdf) <- c("start", "CatId", "Count")
newdf
This produces the required output. It is a bit slow though :p
Here's a fully working version. It involves step-by-step data manipulation for what you are after.
#storing the original data as a csv
df <- read.csv("tsdata.csv")
st<-as.POSIXlt(df$start)
en<-as.POSIXlt(df$end)
#a utility function to convert formats
unix2POSIXct <- function (time) structure(time, class = c("POSIXt", "POSIXct") )
#For each row, determine how many replications are needed
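# difftime() with units = "mins" makes the division by 5 independent of the automatically chosen units
numdups <- floor(as.numeric(difftime(en, st, units = "mins")) / 5) + 1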
st.str <- strptime(st, "%Y-%m-%d %H:%M:%S")
min_st <- as.numeric(format(st.str, "%M"))
#Round down the minutes of start to 5 minute starts. 0,5,10 etc...
roundedmins <- floor(min_st/5) * 5
st.base <- strptime(st, "%Y-%m-%d %H")
df$rounded_start <- st.base + (roundedmins * 60)
#Create REPLICATED Rows, depending on the size of the interval
replicated_cat = NULL
replicated_count = NULL
replicated_start = NULL
for (n in 1:length(numdups)){
for (newrow in 1:numdups[n]){
replicated_start = c(replicated_start, df$rounded_start[n]+(newrow-1)*300 )
replicated_cat = c(replicated_cat, df$catid[n])
replicated_count = c(replicated_count, df$count[n])
}
}
#Change to readable format
POSIXT <- unix2POSIXct(replicated_start)
newdf <- data.frame(POSIXT, replicated_cat, replicated_count)
names(newdf) <- c("start", "CatId", "Count")
newdf
Which produces:
start CatId Count
1 2012-11-17 15:05:00 1 2
2 2012-11-17 15:10:00 1 2
3 2012-11-17 15:05:00 2 10
4 2012-11-17 15:10:00 2 10
5 2012-11-17 15:15:00 2 10
6 2012-11-17 15:10:00 3 5
7 2012-11-17 15:10:00 4 1
8 2012-11-17 15:15:00 4 1
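The nested loops above grow their vectors one element at a time, which gets slow on long logs. A vectorised sketch of the same idea follows. It is not the code from either post: the data frame `x` just re-types the sample logs, the timestamps are assumed to be UTC, and an event ending exactly on a bucket boundary is assumed not to count for the bucket that starts there (which matches the expected output in the question).

```r
# Sample logs from the question, re-typed (assumption: times are UTC)
x <- data.frame(
  start = as.POSIXct(c("2012-11-17 15:05:02", "2012-11-17 15:07:13",
                       "2012-11-17 15:11:00", "2012-11-17 15:12:01"), tz = "UTC"),
  end   = as.POSIXct(c("2012-11-17 15:12:52", "2012-11-17 15:17:47",
                       "2012-11-17 15:12:33", "2012-11-17 15:20:00"), tz = "UTC"),
  catid = 1:4,
  count = c(2, 10, 5, 1)
)
slot <- 5 * 60                                   # bucket width in seconds

first_slot <- floor(as.numeric(x$start) / slot)  # index of the bucket containing the start
end_slot <- ceiling(as.numeric(x$end) / slot)    # index of the first bucket boundary at or after the end
n <- pmax(end_slot - first_slot, 1)              # buckets touched; at least one per event

idx <- rep(seq_len(nrow(x)), n)                  # repeat each row once per bucket
offset <- sequence(n) - 1                        # 0, 1, 2, ... within each event
b <- data.frame(
  start = as.POSIXct((first_slot[idx] + offset) * slot,
                     origin = "1970-01-01", tz = "UTC"),
  catid = x$catid[idx],
  count = x$count[idx]
)
b
```

From here, `aggregate()` or `tapply()` over `b$start` and `b$catid` gives the per-period totals.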
That's not an easy one ... I am also missing the structure of the whole problem so I hope it is ok if I limit myself to outlining the basic approach, if things are unclear you can come back to me.
First (if I were you) I would install the 'lubridate' package, which makes playing around with dates/times a lot easier.
Then maybe try something like this:
z <- strptime("17/11/12 15:05:00.0", "%d/%m/%y %H:%M:%OS")
This will define your starting point in time, if that is supposed to be defined by the first logs(x) time then there is the minute command available e.g.
z <- strptime("17/11/12 15:05:02.0", "%d/%m/%y %H:%M:%OS")
minute(z)<-5;second(z)<-0.0 #I guess, you get the concept
Then produce a sequence of 5 minute intervals
z5s<-z+minutes(seq(0,100,5))
This will produce a sequence of 20, 5 minute time intervals, here again I do not know how flexible the whole thing is supposed to be.
Finally, you could then play around with, for instance, integer division or modulo operations
z2<-z+minutes(2)
z2 should be the end time, I just added 2 minutes "manually" here to illustrate the concept
as.numeric(difftime(z2, z, units = "mins")) >= 5
# [1] FALSE
or if you want to see how many full 5 minute spans are covered, use integer division: as.numeric(difftime(z2, z, units = "mins")) %/% 5
or whatever other functions you prefer to match/distribute the log times across the z5s POSIXlt intervals.
Hope this helps a bit i.e. gives you some direction.
I'm trying to write an elegant function in R to calculate the elapsed time between two timestamps, which are stored as integers with the format hmm or hhmm. I would like to return the elapsed time as an integer of minutes.
Here's my solution so far, which can probably be greatly improved:
#Treatment of various length inputs:
#1 digit = m
#2 digits = mm
#3 digits = hmm
#4 digits = hhmm
#5+ digits = failure
elapsedtime <- function(S,E) {
S<-c(as.character(S))
E<-c(as.character(E))
if (length(S)!=length(E)) {
stop("Invalid input")
}
for (i in seq(1:length(S))) {
if (nchar(S[i])>4) {S[i]<-NA}
if (nchar(E[i])>4) {E[i]<-NA}
while (nchar(S[i])<4) {
S[i]<-paste('0',S[i],sep='')
}
while (nchar(E[i])<4) {
E[i]<-paste('0',E[i],sep='')
}
S[i]<-as.character(as.numeric(substr(S[i],1,2))*60+as.numeric(substr(S[i],3,4)))
E[i]<-as.character(as.numeric(substr(E[i],1,2))*60+as.numeric(substr(E[i],3,4)))
}
S<-as.numeric(S)
E<-as.numeric(E)
return(E-S)
}
elapsedtime(944,1733)
elapsedtime(44,33)
elapsedtime(44,133)
elapsedtime(c(944,44),c(1733,33))
elapsedtime(c(44,44),c(33,133))
elapsedtime(944,17335)
elapsedtime(c(944,945),c(1733,17335))
elapsedtime(c(944,945),c(1733,17335,34))
I'm not too wedded to the need to deal with the 1 and 2 digit cases, but I need to be able to handle input with 3 or 4 digits. I'm running this on a lot of dates, and doing 3/4 digits quickly is much preferable to doing 1,2,3 or 4 digits slowly.
/edit: Changed code to work properly on vectors of times
Try the following functions:
# calculate number of minutes of timestamp
mins <- function(x){
floor(x/100)*60+x%%100
}
# calculate difference in minutes
elapsedtime <- function(S,E){
mins(E)-mins(S)
}
This avoids loops so is vectorised. mins will work if hours are greater than 99, ie HHHMM, or you can modify for higher time units.
If 5-digit timestamps are actually erroneous, you can add the following as the first line of mins (the result has to be assigned back to x to take effect):
x <- ifelse(x<10000,x,NA)
So you will get an NA in the difference when either or both of the timestamps are 5 digits.
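Put together, and run against some of the test calls from the question, the two functions look like this. It is a sketch that includes the optional five-digit check; the only change to the answer's code is assigning the `ifelse()` result back to `x`.

```r
# Minutes since midnight for timestamps stored as hmm / hhmm integers
mins <- function(x) {
  x <- ifelse(x < 10000, x, NA)    # treat 5+ digit values as invalid
  floor(x / 100) * 60 + x %% 100   # hours * 60 + minutes
}

# Elapsed minutes between start S and end E (vectorised)
elapsedtime <- function(S, E) {
  mins(E) - mins(S)
}

elapsedtime(944, 1733)
elapsedtime(c(44, 44), c(33, 133))
elapsedtime(c(944, 945), c(1733, 17335))
```

Unlike the loop version, inputs of different lengths are recycled by R instead of raising an error, so a length check would have to be added separately if that matters.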