Changing Time To A Comparable Function In R - r

I have a dataset in .csv, and I have added a column of my own that holds the total time taken for a task to be completed. Two other columns contain the start time and the end time, and I calculated the total time taken from them. The start time and end time columns are in the datetime format 5/7/2018 16:13, while the total time taken column is in the format 0:08:20 (H:MM:SS).
I understand that for datetimes it is possible to use the functions as.Date or as.POSIXlt to change the variable type from a factor to a date. Is there a function that can convert my total time taken column (currently a factor) so that I can use it in scatterplots and plots in general? I tried as.numeric, but the numbers that come out are gibberish and do not correspond to the original times.

If you want to plot the total time taken for each row, then I would suggest just plotting that difference as seconds. Here is a code snippet which shows how you can convert your start or end date into a numerical value:
start <- "5/7/2018 16:13"
start_date <- as.POSIXct(start, format="%d/%m/%Y %H:%M")
as.numeric(start_date)
[1] 1530799980
The above is a UNIX timestamp, which is the number of seconds since the epoch (January 1, 1970, UTC). The exact value printed depends on the time zone of your R session. But since you want a difference between start and end times, this detail does not really matter for you, and the difference you get should be valid.
If you want to use minutes, hours, or some other time unit, you can easily convert the seconds, for example by dividing by 60 or 3600.
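The question also asks about the H:MM:SS column itself, so here is a minimal sketch of both routes: taking the difference of the parsed start and end times, and converting the duration text directly to seconds. The data frame df, its column names (start, end, total) and the sample values are hypothetical stand-ins for the CSV, and tz = "UTC" is an assumption made only to keep the result independent of the session time zone.

```r
# Hypothetical sample mirroring the columns described in the question
df <- data.frame(
  start = c("5/7/2018 16:13", "5/7/2018 17:00"),
  end   = c("5/7/2018 16:21", "5/7/2018 18:30"),
  total = c("0:08:20", "1:30:00"),
  stringsAsFactors = FALSE
)

# Route 1: parse both datetimes and take the difference in seconds
start_ct <- as.POSIXct(df$start, format = "%d/%m/%Y %H:%M", tz = "UTC")
end_ct   <- as.POSIXct(df$end,   format = "%d/%m/%Y %H:%M", tz = "UTC")
elapsed_secs <- as.numeric(difftime(end_ct, start_ct, units = "secs"))

# Route 2: convert the H:MM:SS text to seconds.
# as.character() guards against the column having been read as a factor,
# which is why as.numeric() alone returns the factor level codes.
parts <- strsplit(as.character(df$total), ":", fixed = TRUE)
total_secs <- vapply(
  parts,
  function(p) sum(as.numeric(p) * c(3600, 60, 1)),
  numeric(1)
)

elapsed_secs  # 480 5400
total_secs    # 500 5400
```

Either numeric vector can be passed straight to plot(). The two differ in the first row only because the sample start and end times carry no seconds.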

Related

Grouping Time Duration by Date when Intervals Cross Midnight

I'm working with a simple dataframe in R containing two columns that represent a time interval:
Started (Date/Time)
Ended (Date/Time)
I want to create a column containing the duration of these time intervals where I can then group by date. The issue is some of the intervals cross midnight and thus have time durations associated with two different dates. Rather than arbitrarily grouping these by their start/end dates I'd like to find a way to include times prior to midnight in one date group and those after midnight in the next day's group.
My current approach seems inefficient, plus I'm hitting a roadblock. First I reformatted the df and created a blank column to hold duration, plus another to hold a "new end date" for performing interval operations:
Start.Date
Start.Time
End.Date
End.Time
Duration
End.Date.New
I then used a loop to find instances where the time crossed midnight and store the last second of that day, 23:59:59, in the End.Date.New column:
for (i in 1:nrow(df)) {
  if (df$End.Time[i] < df$Start.Time[i]) {
    df$End.Time.New[i] = '23:59:59'
  }
}
The idea would be that, for instances where End.Time.New != NA, I could calculate Duration using Start.Time and End.Time.New and use Start.Date as my group-by variable. I would then have to generate an identical row that added 1 day to the start time and perform a similar operation (End.Date and 00:00:00) to populate the duration column, and I haven't been able to figure out how to make this work.
Is this separate-and-loop approach the best way to achieve this or is there a more efficient strategy using functions I may not be aware of?
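One possible alternative to the separate-and-loop approach, sketched under assumptions: keep Started and Ended as POSIXct, cut each interval at every midnight it crosses, and then sum the pieces per date. The helper split_at_midnight, the sample data and tz = "UTC" are all hypothetical. The fixed 24-hour day length ignores daylight saving changes.

```r
# Hypothetical sample: the first interval crosses midnight
df <- data.frame(
  Started = as.POSIXct(c("2020-01-01 22:00:00", "2020-01-02 09:00:00"), tz = "UTC"),
  Ended   = as.POSIXct(c("2020-01-02 01:30:00", "2020-01-02 10:15:00"), tz = "UTC")
)

# Split one interval into per-day pieces (hypothetical helper)
split_at_midnight <- function(started, ended) {
  first_day  <- as.POSIXct(trunc(started, units = "days"))
  last_day   <- as.POSIXct(trunc(ended, units = "days"))
  day_starts <- seq(first_day, last_day, by = "day")
  day_ends   <- day_starts + 24 * 3600  # assumes 24-hour days (no DST)
  # Clip each day to the part actually covered by the interval
  piece_start <- pmax(as.numeric(day_starts), as.numeric(started))
  piece_end   <- pmin(as.numeric(day_ends), as.numeric(ended))
  data.frame(
    Date  = as.Date(day_starts, tz = "UTC"),
    Hours = (piece_end - piece_start) / 3600
  )
}

pieces <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  split_at_midnight(df$Started[i], df$Ended[i])
}))

# Total duration per calendar date
daily <- aggregate(Hours ~ Date, data = pieces, FUN = sum)
daily
```

With this sample, 2020-01-01 receives 2 hours and 2020-01-02 receives 2.75 hours (1.5 from the interval that crossed midnight plus 1.25 from the second interval). An interval ending exactly at midnight produces a zero-hour piece for the following day, which can be filtered out if unwanted.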

Convert UTC ISO string date to Unix timestamp

I have this UTC date in a Google spreadsheet: 2018-10-18T08:55:13Z and would like to convert it to Unix timestamp (1539852913). I tried this formula, but it's unable to recognize the timevalue:
=DATEVALUE(MID(A1;1;10)) + TIMEVALUE(MID(A1;12;8))
If I can get a valid date and time, I can use this formula to convert to Unix timestamp:
=(A1-$C$1)*86400
Does anyone have a solution for this?
Simpler:
=86400*(left(substitute(A1,"T"," "),19))-2209161600
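This replaces T with a space and cuts off the Z, leaving a string that is recognised as a date and time in arithmetic. Multiplying by 86400 converts the day-and-time serial number into seconds, and subtracting 2209161600 (25569 days × 86400, where 25569 is the serial number of 1 January 1970) adjusts for the offset to the Unix epoch.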
Assuming your date has leading zeros for single-digit days and months, pull out each part of the date string and drop it into the DATE formula as follows:
Year
=LEFT(A1,4)
Month
=MID(A1,6,2)
Day
=MID(A1,9,2)
Use the date formula
=DATE(year,month,day)
=DATE(LEFT(A1,4),MID(A1,6,2),MID(A1,9,2))
A similar process can be used for TIME
Hour
=MID(A1,12,2)
Minutes
=MID(A1,15,2)
Seconds
=MID(A1,18,2)
Time
=TIME(Hour,Minutes,Seconds)
=TIME(MID(A1,12,2),MID(A1,15,2),MID(A1,18,2))
1) There are other methods
2) The formulas will need to be adapted if you do not have leading 0 for each unit. In that case you would need to use FIND to identify the position of key characters and measure the distance between them to determine if there was a single digit unit or double digit unit.
The integer part of a date-time value (left of the decimal point) represents the number of days since 1900/01/01 (with that date being 1 in Excel's numbering; Google Sheets counts from 1899/12/30 as day 0), and the decimal portion represents the time as a fraction of a day. To get a full date and time, you therefore add the date formula to the time formula as follows:
=DATE(LEFT(A1,4),MID(A1,6,2),MID(A1,9,2))+TIME(MID(A1,12,2),MID(A1,15,2),MID(A1,18,2))
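As a cross-check of the arithmetic outside the spreadsheet, here is a small sketch in R (chosen only because the rest of this page uses R). It parses the same ISO string as UTC and confirms that the serial-number offset used in the formulas above leads back to the same Unix timestamp. The variable names are illustrative.

```r
iso <- "2018-10-18T08:55:13Z"

# Parse the ISO 8601 string as UTC; the literal T and Z are matched as-is
parsed  <- as.POSIXct(iso, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
unix_ts <- as.numeric(parsed)

# Spreadsheet serial number of 1970-01-01, and the same offset in seconds
epoch_serial <- 25569
offset_secs  <- epoch_serial * 86400

# Serial number a spreadsheet would hold for this date-time, then back to Unix time
serial <- unix_ts / 86400 + epoch_serial
round(serial * 86400 - offset_secs)

unix_ts  # 1539852913
```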

R: subsetting timestamped dataframe periodically

I have a csv file that contains many thousands of timestamped data points. The file includes the following columns: Date, Tag, East, North & DistFromMean. The following is a sample of the data in the file:
The data is recorded approximately every 15 minutes for 12 tags over a month. I want to select subsets of the data at regular intervals (e.g. every 3 hours), starting from the first date entry. Because the tags transmit at slightly different rates, each selection needs a window defined by a minimum and a maximum time around the target time.
I have found a related previous question but don't understand the answer well enough to implement it.
The solution could firstly ask for the Tag number, then the period required perhaps in minutes from the start time (i.e. every 3hrs or 180 minutes), the minimum time range and the maximum time range, both of which would be constant for whatever time period was used. The minimum and maximum would probably need to be plus and minus 6 minutes from the period selected.
As the code below shows, I've managed to read in the file, change the Date format to POSIXlt and extract data within a specific time frame but the bit I'm stuck on is extracting the data every nth minute and within a range.
TestData <- read.csv("TestData.csv", header=TRUE, as.is=TRUE)
TestData$Date <- strptime(TestData$Date, "%d/%m/%Y %H:%M")
TestData[TestData$Date >= as.POSIXlt("2014-02-26 7:10:00") & TestData$Date < as.POSIXlt("2014-02-26 7:18:00"),]
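One way the selection described above could be sketched, under assumptions: measure each timestamp's elapsed minutes from the first entry, find the nearest multiple of the period, and keep rows that fall within the tolerance. The function select_periodic, its parameter names and the sample timestamps are hypothetical. The sketch uses POSIXct rather than POSIXlt and handles a single tag.

```r
# Hypothetical sample for a single tag, in minutes after 07:00
offsets_mins <- c(0, 16, 170, 178, 185, 200, 358, 375)
TestData <- data.frame(
  Date = as.POSIXct("2014-02-26 07:00:00", tz = "UTC") + offsets_mins * 60,
  Tag  = 1
)

# TRUE for rows within tol_mins of a multiple of period_mins,
# counted from the earliest timestamp (hypothetical helper)
select_periodic <- function(dates, period_mins = 180, tol_mins = 6) {
  elapsed <- as.numeric(difftime(dates, min(dates), units = "mins"))
  nearest <- round(elapsed / period_mins) * period_mins
  abs(elapsed - nearest) <= tol_mins
}

one_tag <- TestData[TestData$Tag == 1, ]
keep <- select_periodic(one_tag$Date, period_mins = 180, tol_mins = 6)
one_tag[keep, ]
```

With this sample, the rows at 0, 178, 185 and 358 minutes are kept. Two rows can land in the same window (178 and 185 here), so a further step would be needed if only one row per window is wanted. For several tags, the same function could be applied to each tag's rows in turn.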

convert string to time in r

I have an array of time strings, for example 115521.45 which corresponds to 11:55:21.45 in terms of an actual clock.
I have another array of time strings in the standard format (HH:MM:SS.0) and I need to compare the two.
I can't find any way to convert the original time format into something useable.
I've tried using strptime but all it does is add a date (the wrong date) and get rid of time decimal places. I don't care about the date and I need the decimal places:
For example:
t <- strptime(105748.35, '%H%M%OS')  # gives ... 10:57:48
Using %OSn (n = 1, 2, etc.) gives NA.
Alternatively, is there a way to convert a time such as 10:57:48 to 105748?
Set the options to allow digits in seconds, and then add the date you wish before converting (so that the start date is meaningful).
options(digits.secs=3)
strptime(paste0('2013-01-01 ', 105748.35), '%Y-%m-%d %H%M%OS')
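If the date really is irrelevant, another option is to skip date-time classes and compare both formats as seconds since midnight. This is a sketch only: it assumes the compact strings are zero-padded to HHMMSS before the decimal point, and the variable names and sample values are illustrative. The last line covers the reverse conversion asked about in the question.

```r
compact  <- c("115521.45", "105748.35")     # HHMMSS.ss
standard <- c("11:55:21.45", "10:57:48.0")  # HH:MM:SS.s

# Compact format: fixed character positions for hours, minutes, seconds
secs_compact <- as.numeric(substr(compact, 1, 2)) * 3600 +
  as.numeric(substr(compact, 3, 4)) * 60 +
  as.numeric(substr(compact, 5, nchar(compact)))

# Standard format: split on ":" and weight each field
parts <- strsplit(standard, ":", fixed = TRUE)
secs_standard <- vapply(
  parts,
  function(p) sum(as.numeric(p) * c(3600, 60, 1)),
  numeric(1)
)

# Compare with a small tolerance because the values are floating point
same_time <- abs(secs_compact - secs_standard) < 1e-6
same_time  # TRUE FALSE

# Reverse direction: "10:57:48" to "105748"
gsub(":", "", "10:57:48", fixed = TRUE)
```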

Creating a specific sequence of date/times in R

I want to create a single column with a sequence of date/time increasing every hour for one year or one month (for example). I was using a code like this to generate this sequence:
start.date<-"2012-01-15"
start.time<-"00:00:00"
interval<-60 # 60 minutes
increment.mins<-interval*60
x<-paste(start.date,start.time)
for(i in 1:365){
print(strptime(x, "%Y-%m-%d %H:%M:%S")+i*increment.mins)
}
However, I am not sure how to specify the range of the sequence of dates and hours. I have also been having problems dealing with the first hour, "00:00:00". What is the best way to specify the length of the date/time sequence for a month, a year, etc.? Any suggestions will be appreciated.
I would strongly recommend using the POSIXct data type. That way you can use seq without any problems and use the data however you want.
start <- as.POSIXct("2012-01-15")
interval <- 60
end <- start + as.difftime(1, units="days")
seq(from=start, by=interval*60, to=end)
Now you can do whatever you want with your vector of timestamps.
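To cover a month or a year instead of a single day, seq also accepts calendar units in by. The following is a sketch with tz = "UTC" assumed so that daylight saving changes do not affect the count; the variable names are illustrative.

```r
start <- as.POSIXct("2012-01-15 00:00:00", tz = "UTC")

# End point exactly one calendar month later (use by = "year" for one year)
one_month_later <- seq(start, by = "month", length.out = 2)[2]

# Hourly timestamps from start to the end point, both inclusive
hourly <- seq(start, one_month_later, by = "hour")
length(hourly)  # 745: 31 days * 24 hours, plus the end point

# Midnight prints without a time part by default, so format explicitly
format(hourly[1], "%Y-%m-%d %H:%M:%S")  # "2012-01-15 00:00:00"

# Single-column data frame, as asked for in the question
result <- data.frame(timestamp = hourly)
```

The "00:00:00" difficulty mentioned in the question may simply be this printing behaviour: the midnight value is still stored, and format() shows it in full.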
Try this. mondate is very clever about advancing by a month. For example, it will advance the last day of Jan to the last day of Feb, whereas other date/time classes tend to overshoot into Mar. chron does not use time zones, so your code can't get the time zone bugs that it can with POSIXct. Here x is from the question.
library(chron)
library(mondate)
start.time.num <- as.numeric(as.chron(x))
# +1 means one month. Use +12 if you want one year.
end.time.num <- as.numeric(as.chron(paste(mondate(x)+1, start.time)))
# 1/24 means one hour. Change as needed.
hours <- as.chron(seq(start.time.num, end.time.num, 1/24))
