Calculating Idle time for Uber service - r

I have an Uber dataset containing the variables pickup point, request time, drop time, and a date variable without month and year.
I need code that calculates idle time and stores it in a new variable, idle time. The calculation is as follows:
If the pickup points are the same for consecutive rows and the dates are different for consecutive rows, the value is NA; otherwise it is the difference between the drop time in the first row and the pickup time in the second row. I have done this in Excel and need to do it in R.
Attached is a screenshot of the data in Excel.

Try something like this, if this is what you are looking for
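# The first row has no previous trip, so every row starts as NA
df$idle <- NA_real_
for (i in seq_len(nrow(df))[-1]) {
  same_point <- df$Pickup.point[i] == df$Pickup.point[i - 1]
  same_date <- df$Date[i] == df$Date[i - 1]
  # Idle time is only defined within the same pickup point and date
  if (same_point && same_date) {
    df$idle[i] <- df$Req[i] - df$Drop[i - 1]
  }
}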
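The same rule can also be written without a loop. The following is a minimal sketch that mirrors the logic of the loop above (NA when the pickup point or the date changes, otherwise request time minus the previous drop time). The sample data is hypothetical, and times are assumed to be stored as numbers (minutes since midnight) for simplicity.

```r
# Hypothetical sample data; Req and Drop are minutes since midnight.
df <- data.frame(
  Pickup.point = c("Airport", "Airport", "City", "City", "City"),
  Date = c(11, 11, 11, 12, 12),
  Req  = c(300, 420, 500, 60, 200),
  Drop = c(360, 480, 560, 120, 260)
)

n <- nrow(df)
# Compare each row with the one before it; the first row has no predecessor.
same_point <- c(FALSE, df$Pickup.point[-1] == df$Pickup.point[-n])
same_date  <- c(FALSE, df$Date[-1] == df$Date[-n])
# Gap between this row's request time and the previous row's drop time.
gap <- c(NA, df$Req[-1] - df$Drop[-n])

df$idle <- ifelse(same_point & same_date, gap, NA)
print(df)
```

If Req and Drop are date-time values rather than numbers, difftime with an explicit units argument is probably a safer way to compute the gap.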

Related

Grouping Time Duration by Date when Intervals Cross Midnight

I'm working with a simple dataframe in R containing two columns that represent a time interval:
Started (Date/Time)
Ended (Date/Time)
I want to create a column containing the duration of these time intervals where I can then group by date. The issue is some of the intervals cross midnight and thus have time durations associated with two different dates. Rather than arbitrarily grouping these by their start/end dates I'd like to find a way to include times prior to midnight in one date group and those after midnight in the next day's group.
My current approach seems inefficient, plus I'm hitting a roadblock. First I reformatted the df and created a blank column to hold duration, plus another to hold a "new end date" for performing interval operations:
Start.Date
Start.Time
End.Date
End.Time
Duration
End.Date.New
I then used a loop to find instances where the time crossed midnight and store the last second of that day, 23:59:59, in the End.Date.New column:
for (i in 1:nrow(df)) {
  if (df$End.Time[i] < df$Start.Time[i]) {
    df$End.Time.New[i] = '23:59:59'
  }
}
The idea would be that, for instances where End.Time.New != NA, I could calculate Duration using Start.Time and End.Time.New and use Start.Date as my group-by variable. I would then have to generate an identical row that added 1 day to the start time and perform a similar operation (End.Date and 00:00:00) to populate the duration column, and I haven't been able to figure out how to make this work.
Is this separate-and-loop approach the best way to achieve this or is there a more efficient strategy using functions I may not be aware of?
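One loop-free possibility is to split each interval at the midnight that follows its start and then sum the pieces per date. The sketch below assumes the original Started and Ended columns are POSIXct values in UTC and that no interval spans more than one midnight; the sample data and the names next_midnight, pieces and daily are made up for illustration.

```r
# Hypothetical data: the second interval crosses midnight.
df <- data.frame(
  Started = as.POSIXct(c("2020-01-01 10:00:00", "2020-01-01 23:00:00"), tz = "UTC"),
  Ended   = as.POSIXct(c("2020-01-01 11:30:00", "2020-01-02 01:00:00"), tz = "UTC")
)

# Midnight following each start time (start of day plus 24 hours; UTC has no DST).
next_midnight <- as.POSIXct(trunc(df$Started, units = "days")) + 86400
crosses <- df$Ended > next_midnight

# Part of each interval that falls on the start date.
end_first <- df$Ended
end_first[crosses] <- next_midnight[crosses]
first_part <- data.frame(
  Date  = as.Date(df$Started, tz = "UTC"),
  Hours = as.numeric(difftime(end_first, df$Started, units = "hours"))
)

# Remainder after midnight, credited to the next date.
second_part <- data.frame(
  Date  = as.Date(next_midnight[crosses], tz = "UTC"),
  Hours = as.numeric(difftime(df$Ended[crosses], next_midnight[crosses], units = "hours"))
)

pieces <- rbind(first_part, second_part)
# Total hours per date, as a named numeric vector.
daily <- sapply(split(pieces$Hours, format(pieces$Date)), sum)
print(daily)
```

Intervals longer than a day would need the split repeated for each midnight they cross.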

R - Filter Dates by Time Window without including weekends

Is there a way to window filter dates by a number of days excluding weekends?
I know you can use the between function to filter between two specific dates, but I only know one of the two dates. The other date should be 4 days prior, counted in business days only (not counting weekends).
A pseudo-example of what I am looking for: given this Wednesday, I want to filter everything up to 4 business days beforehand:
window(z, start = as.POSIXct("2017-09-13"), end = as.POSIXct("2017-09-20"))
Another example would be if I am given this Friday's date, the start date would be Monday.
Ideally, I want to be able to play with the window value.
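A minimal base R sketch of the idea: step back one calendar day at a time and count only Monday to Friday. The function name business_days_before is hypothetical, and public holidays are not considered.

```r
# Return the date that lies n business days before end_date.
business_days_before <- function(end_date, n) {
  d <- as.Date(end_date)
  while (n > 0) {
    d <- d - 1
    # POSIXlt wday: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
    if (as.POSIXlt(d)$wday %in% 1:5) {
      n <- n - 1
    }
  }
  d
}

end_date   <- as.Date("2017-09-22")  # a Friday
start_date <- business_days_before(end_date, 4)

# Hypothetical daily series to filter with the computed window.
dates <- seq(as.Date("2017-09-01"), as.Date("2017-09-30"), by = "day")
in_window <- dates[dates >= start_date & dates <= end_date]
print(start_date)
print(in_window)
```

The second argument is the adjustable window size. For a time series object, the computed start date could be passed as the start argument of window.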

How do I make periods out of times in R?

I have 10 million+ data points which look like:
Identifier Times Data
6597104 2015-05-01 04:08:05 0.15512575543732
In order to study these I want to add a Period (1, 2,...) column so the oldest row with the 6597104 identifier is period 1 and the second oldest is period 2 etc. However the times come irregularly so I can't just make it a time series object.
Does anyone know how to do this? Thanks in advance
Let's call your data frame data.
First sort it by time, oldest first, using
data <- data[order(data$Times), ]
Then add a new column called Period
for (i in 1:nrow(data)) {
  data$Period[i] <- paste("Period", i, sep = " ")
}
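Since the question asks for periods counted separately for each identifier, one possible loop-free variant uses order and ave. This is a sketch on hypothetical sample data; the column names follow the question.

```r
# Hypothetical sample data with irregular timestamps for two identifiers.
data <- data.frame(
  Identifier = c(6597104, 6597104, 1234567, 6597104, 1234567),
  Times = as.POSIXct(c("2015-05-01 04:08:05", "2015-04-28 12:00:00",
                       "2015-05-02 09:15:00", "2015-05-03 18:30:10",
                       "2015-04-30 23:59:00"), tz = "UTC"),
  Data = c(0.155, 0.201, 0.330, 0.412, 0.098)
)

# Sort by identifier, then by time (oldest first).
data <- data[order(data$Identifier, data$Times), ]

# Number the rows 1, 2, ... within each identifier.
data$Period <- ave(seq_len(nrow(data)), data$Identifier, FUN = seq_along)
print(data)
```

On 10 million+ rows this should be considerably faster than a row-by-row loop, although that has not been measured here.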

R: subsetting timestamped dataframe periodically

I have a csv file that contains many thousands of timestamped data points. The file includes the following columns: Date, Tag, East, North & DistFromMean. The following is a sample of the data in the file:
The data is recorded approximately every 15 minutes for 12 tags over a month. Starting from the first date entry, I want to select subsets of the data at a regular interval, e.g. every 3 hours. Because the tags transmit at slightly different rates, each selection needs a minimum and a maximum time around the target rather than an exact time.
I have found a related previous question but don't understand the answer well enough to implement it.
The solution could firstly ask for the Tag number, then the period required perhaps in minutes from the start time (i.e. every 3hrs or 180 minutes), the minimum time range and the maximum time range, both of which would be constant for whatever time period was used. The minimum and maximum would probably need to be plus and minus 6 minutes from the period selected.
As the code below shows, I've managed to read in the file, change the Date format to POSIXlt and extract data within a specific time frame but the bit I'm stuck on is extracting the data every nth minute and within a range.
TestData<- read.csv ("TestData.csv", header=TRUE, as.is=TRUE)
TestData$Date <- strptime(TestData$Date, "%d/%m/%Y %H:%M")
TestData[TestData$Date >= as.POSIXlt("2014-02-26 7:10:00") & TestData$Date < as.POSIXlt("2014-02-26 7:18:00"),]
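One way to approach the "every nth minute within a range" step is to measure each row's elapsed minutes from the tag's first record and keep rows that fall within the tolerance of a multiple of the period. The sketch below assumes Date has been converted to POSIXct; the function name select_periodic, its parameters, and the sample data are hypothetical.

```r
# Keep rows of one tag that lie within tol_min minutes of a multiple of period_min,
# counted from that tag's first timestamp.
select_periodic <- function(df, tag, period_min = 180, tol_min = 6) {
  d <- df[df$Tag == tag, ]
  elapsed <- as.numeric(difftime(d$Date, min(d$Date), units = "mins"))
  # Distance in minutes to the nearest multiple of the period.
  offset <- elapsed %% period_min
  dist <- pmin(offset, period_min - offset)
  d[dist <= tol_min, ]
}

# Hypothetical data: one tag, irregular offsets (minutes) from the first record.
start   <- as.POSIXct("2014-02-26 07:00:00", tz = "UTC")
offsets <- c(0, 16, 31, 170, 176, 184, 200, 355, 362, 380)
tags    <- data.frame(Tag = 1, Date = start + offsets * 60)

picked <- select_periodic(tags, tag = 1)
print(picked)
```

More than one row can fall inside the same window, so a further step (for example keeping the row closest to the target time) may be needed.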

Using R to subset overlapping daily sensor data

I have a data set (3.2 million rows) in R which consists of pairs of time (milliseconds) and volts. The sensor that gathers the data only runs during the day so the time is actually the milliseconds since start-up that day.
For example, if the sensor runs 12 hours per day, then the maximum possible time value for one day is 43,200,000 ms (12h * 60m * 60s * 1000ms).
The data is continually added to a single file, which means there are many overlapping time values:
X: [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5...] // example if range was 1-5 for one day
Y: [voltage readings at each point in time...]
I would like to separate each "run" into unique data frames so that I could clearly see individual days. Currently when I plot the entire data set it is incredibly muddy because in fact all of the days are being shown in the single plot. Thanks for any help.
If your data.frame df has columns X and Y, you can use diff to find every time X goes down (meaning a new day, it sounds like):
df$Day = cumsum(c(1, diff(df$X) < 0))
Day1 = df[df$Day==1,]
plot(Day1$X, Day1$Y)
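To get every run as its own data frame rather than extracting one day at a time, the Day column can be passed to split. A small sketch with made-up readings, assuming X restarts at a lower value each day:

```r
# Hypothetical data: three runs, the last one incomplete.
df <- data.frame(
  X = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3),
  Y = c(0.1, 0.2, 0.3, 0.2, 0.1, 0.4, 0.5, 0.6, 0.5, 0.4, 0.7, 0.8, 0.9)
)

# A drop in X marks the start of a new day.
df$Day <- cumsum(c(1, diff(df$X) < 0))

# List of data frames, one per day, named "1", "2", "3".
runs <- split(df, df$Day)
print(names(runs))
print(runs[["3"]])
```

Each element of runs can then be plotted separately, for example with plot(runs[["1"]]$X, runs[["1"]]$Y).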

Resources