I am working in R. My data set has a column of times (hh:mm:ss). The column is listed as factors.
Ultimately, I would like to be able to get the diff() between the values in this column (that is, the difference between time1 and time2 in the same column). I can't do that with factors.
How would I convert the times to another form that will allow me to calculate the difference in times within the same column?
Any help would be really appreciated!!
Try this code:
library(lubridate)
library(dplyr)
id <- c(1,2,3,4)
time1 <- hms(factor(c("09:00:01","04:02:50","10:30:21","11:15:25")))
time2 <- hms(factor(c("00:00:01","01:02:00","09:30:11","14:15:25")))
df<-data.frame(id, time1, time2)
df %>%
group_by(id) %>%
summarize(t_diff = time1 - time2)
t_diff
1 9H 0M 0S
2 3H 0M 50S
3 1H 0M 10S
4 -3H 0M 0S
Taking what you've suggested:
x <- factor(c("12:34:56", "12:35:45", "12:48:00"))
y <- factor(c("12:42:56", "13:22:41", "17:11:21"))
## convert to time
x <- strptime(x, format = "%H:%M:%S")
y <- strptime(y, format = "%H:%M:%S")
## now you want a vector of differences
difftime(y,x, units="hours")
Time differences in hours
[1] 0.1333333 0.7822222 4.3891667
Edit: I misread your question. You want differences within one column. Remember that you'll run into the fencepost problem: the output will have one fewer element than your column has rows. Try
diff(x)
Time differences in secs
[1] 49 735
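To tie this back to the original question (a single factor column), here is a minimal base R sketch. The data frame `df`, its column `time`, and the new column `secs_since_prev` are made-up names for illustration, and padding with `NA` is just one assumed way of handling the fencepost issue.

```r
# Hypothetical data: one factor column of hh:mm:ss times
df <- data.frame(time = factor(c("12:34:56", "12:35:45", "12:48:00")))

# factor -> character -> POSIXct; today's date gets attached to every value,
# so it cancels out when the values are differenced
t <- as.POSIXct(as.character(df$time), format = "%H:%M:%S", tz = "UTC")

# Successive differences in seconds, padded with a leading NA so the
# result has the same length as the column
df$secs_since_prev <- c(NA, as.numeric(diff(t), units = "secs"))
df
```

The first row has no predecessor, so it gets `NA`; the remaining rows hold 49 and 735 seconds, matching the output above.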
I want to get the interval between two dates displayed in days, using the lubridate package.
Method 1 using interval function.
library(lubridate)
date1 <- as.Date("2022-08-08")
date2 <- as.Date("2022-09-08")
x <- interval(date1, date2)
print(x)
days(x)
Output as follows
[1] 2022-08-08 UTC--2022-09-08 UTC
[1] "2678400d 0H 0M 0S"
Question: Why is the answer not correct? It shows 2678400 days!
Method 2 using difftime function.
y <- difftime(date2, date1, units="days")
print(y)
Output as follows
Time difference of 31 days
The thing is, I want it to display only 31 instead of the whole sentence "Time difference of 31 days"
Need some guidance here.
lubridate::days() works with numerics. You've given it an interval, which gets coerced with as.numeric(x); that gives 2678400, the number of seconds between 2022-08-08 and 2022-09-08. You're a victim of implicit coercion.
@jay.sf has given you the solution for difftime. To get the correct answer using lubridate:
time_length(x, "days")
[1] 31
@JustJames gave the full explanation of what went wrong in their now-deleted answer:
"According to the docs
as.interval changes difftime, Duration, Period and numeric class objects to intervals that begin at the specified date-time. Numeric objects are first coerced to timespans equal to the numeric value in seconds."
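The coercion can be made visible step by step. This is a small sketch, assuming lubridate is installed; it only uses the interval from the question and shows two ways of getting the length in days.

```r
library(lubridate)

x <- interval(as.Date("2022-08-08"), as.Date("2022-09-08"))

# The interval's numeric value is its length in seconds
as.numeric(x)

# This is effectively what days(x) did: a period of that many days
days(as.numeric(x))

# Dividing by a one-day duration gives the length in days
x / ddays(1)

# time_length() does the same conversion explicitly
time_length(x, "days")
```

Both `x / ddays(1)` and `time_length(x, "days")` return a plain number, which is what the question was after.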
Try this,
date1 <- as.Date("2022-08-08")
date2 <- as.Date("2022-09-08")
dateDiff <- as.numeric(difftime(date2, date1))
print(dateDiff)
Output
> dateDiff = as.numeric(difftime(date2, date1))
> print(dateDiff)
[1] 31
Hope this helps!
I have a list of times output by a model. They are decimal numbers representing the hours elapsed since the model started running, but the output gives them no units; they are just numbers.
What I would like to do is convert these numbers into a date stamp format, using the numbers as hours the model has run, specifically as either a POSIXct or Date variable so I can start using the bupaR library.
So, for example, I would like to convert 1.75 into 1 hour 45 minutes, and 25.5 into a time equal to 1 day, 1 hour and 30 minutes.
Thanks for any help
One possible way:
We could use seconds_to_period() function from lubridate after multiplying by 3600 to get seconds:
library(lubridate)
x <- 1.75
seconds_to_period(x*3600)
#[1] "1H 45M 0S"
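Since the question asks for a POSIXct variable, the elapsed hours can also be added to a start time. This is a minimal base R sketch; the start time below is an assumption made up for illustration, so replace it with the real start of the model run.

```r
# Hypothetical model output: elapsed time in decimal hours
model_hours <- c(1.75, 25.5)

# Assumed start of the model run (not given in the question)
start <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")

# POSIXct arithmetic works in seconds, so convert hours to seconds first
stamps <- start + model_hours * 3600
format(stamps, "%Y-%m-%d %H:%M:%S")
# [1] "2024-01-01 01:45:00" "2024-01-02 01:30:00"
```

The second value lands 1 day, 1 hour and 30 minutes after the assumed start, as requested.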
I want to add a column which is a subtraction of Store_Entry_Time from Store_Exit_Time.
For example, the result for row 1 should be (2014-12-02 18:49:05.402863 - 2014-12-02 16:56:32.394052) = approximately 1 hour 53 minutes (I want this result in hours only).
I entered class(Store_Entry_Time) and it says "character".
How do I compute the difference and put it into a new column named "Time Spent"?
You can use ymd_hms from lubridate to convert the columns into POSIXct format and then use difftime to calculate the difference in time.
library(dplyr)
df <- df %>%
  mutate(across(c(Store_Entry_Time, Store_Exit_Time), lubridate::ymd_hms),
         Time_Spent = as.numeric(difftime(Store_Exit_Time,
                                          Store_Entry_Time, units = 'hours')))
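For a base R option here, we can try using as.POSIXct together with difftime:
df$Time_Spent <- as.numeric(difftime(as.POSIXct(df$Store_Exit_Time),
                                     as.POSIXct(df$Store_Entry_Time),
                                     units = "hours"))
Setting units = "hours" guarantees that the new column is measured in hours. Plain subtraction of POSIXct values lets R pick the unit, which becomes seconds or minutes when the shortest difference is under an hour.
Example:
Store_Exit_Time <- "2014-12-02 18:49:05.402863"
Store_Entry_Time <- "2014-12-02 16:56:32.394052"
Time_Spent <- as.numeric(difftime(as.POSIXct(Store_Exit_Time), as.POSIXct(Store_Entry_Time), units = "hours"))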
Time_Spent
[1] 1.875836
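One thing worth checking with the subtraction approach is which unit R chose. The sketch below uses the row from the question (truncated to whole seconds) plus a second, made-up 30-minute visit to show how the automatic unit changes.

```r
# First pair is from the question; the second pair is hypothetical
entry <- as.POSIXct(c("2014-12-02 16:56:32", "2014-12-02 10:00:00"), tz = "UTC")
exit  <- as.POSIXct(c("2014-12-02 18:49:05", "2014-12-02 10:30:00"), tz = "UTC")

# Plain subtraction picks the unit from the size of the difference
units(exit[1] - entry[1])   # "hours"
units(exit[2] - entry[2])   # "mins"

# Asking for hours explicitly keeps every row on the same scale
hours_spent <- as.numeric(difftime(exit, entry, units = "hours"))
hours_spent
```

With explicit units the second visit comes out as 0.5 rather than 30, so the column stays comparable across rows.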
I have a character vector of date-times (e.g. "2014-04-17 23:33:00") and want to make a matrix with date and time as my columns.
This is my code:
dat <- as.POSIXct(dates)
date = data.frame(
date=dat,
time=format(dat, "%H:%M")
)
I took a look at "extract hours and seconds from POSIXct for plotting purposes in R" and it helped, but the problem is that I only get 00:00 in my time column. The time is not extracted from the dates vector.
Any help is appreciated.
Using the following vector as an example:
dates<- c("2012-02-06 15:47:00","2012-02-06 15:02:00")
dat <- as.POSIXct(dates)
date.df = data.frame(
date=dat,
time=format(dat, "%H:%M")
)
You will obtain the correct times ("%H:%M")
> date.df
date time
1 2012-02-06 15:47:00 15:47
2 2012-02-06 15:02:00 15:02
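If the time column still comes out as 00:00, it may help to parse with an explicit format and time zone, so that strings which do not match show up as NA instead of being read some other way. This is a sketch under that assumption, since the question does not show the actual input; it also stores the date part as a Date so the two columns are cleanly separated.

```r
dates <- c("2012-02-06 15:47:00", "2012-02-06 15:02:00")

# Explicit format and time zone: values that do not match become NA
dat <- as.POSIXct(dates, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

date.df <- data.frame(
  date = as.Date(dat),          # date part only
  time = format(dat, "%H:%M")   # time part as a character string
)
date.df
```

Checking `anyNA(dat)` right after parsing is a quick way to spot inputs in an unexpected format.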
I have a chunk of data logging temperatures from a few dozen devices every hour for over a year. The data are stored as a zoo object. I'd very much like to summarize those data by looking at the average values for every one of the 24 hours in a day (1am, 2am, 3am, etc.). So that for each device I can see what its average value is for all the 1am times, 2am times, and so on. I can do this with a loop but sense that there must be a way to do this in zoo with an artful use of aggregate.zoo. Any help?
require(zoo)
# random hourly data over 30 days for five series
x <- matrix(rnorm(24 * 30 * 5),ncol=5)
# Assign hourly data with a real time and date
x.DateTime <- as.POSIXct("2014-01-01 0100", format = "%Y-%m-%d %H%M") +
  seq(0, by = 3600, length.out = nrow(x))  # one time stamp per row of x
# make a zoo object
x.zoo <- zoo(x, x.DateTime)
#plot(x.zoo)
# what I want:
# the average value for each series at 1am, 2am, 3am, etc. so that
# the dimensions of the output are 24 (hours) by 5 (series)
# If I were just working on x I might do something like:
res <- matrix(NA,ncol=5,nrow=24)
for(i in 1:nrow(res)){
res[i,] <- apply(x[seq(i,nrow(x),by=24),],2,mean)
}
res
# how can I avoid the loop and write an aggregate statement in zoo that
# will get me what I want?
Calculate the hour for each time point and then aggregate by that:
hr <- as.numeric(format(time(x.zoo), "%H"))
ag <- aggregate(x.zoo, hr, mean)
dim(ag)
## [1] 24 5
ADDED
Alternately use hours from chron or hour from data.table:
library(chron)
ag <- aggregate(x.zoo, hours, mean)
This is quite similar to the other answer but takes advantage of the fact that the by=... argument to aggregate.zoo(...) can be a function, which will be applied to time(x.zoo):
as.hour <- function(t) as.numeric(format(t,"%H"))
result <- aggregate(x.zoo,as.hour,mean)
identical(result,ag) # ag from G. Grothendieck answer
# [1] TRUE
Note that this produces a result identical to the other answer, but not the same as yours. This is because your dataset starts at 1:00am, not midnight, so your loop produces a matrix in which the first row corresponds to 1:00am and the last row corresponds to midnight. These solutions produce zoo objects in which the first row corresponds to midnight.
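To check that the aggregate result and the loop agree once the rows are lined up, the midnight row can be moved to the end. This is a sketch, assuming zoo is installed; the seed and the UTC time zone are arbitrary choices made here for reproducibility, and `ag_1am_first` is a made-up name.

```r
library(zoo)

set.seed(1)  # arbitrary seed, only so the example is reproducible
x <- matrix(rnorm(24 * 30 * 5), ncol = 5)
idx <- as.POSIXct("2014-01-01 01:00", tz = "UTC") +
  seq(0, by = 3600, length.out = nrow(x))
x.zoo <- zoo(x, idx)

# Aggregate by hour of day; rows are ordered by hour 0, 1, ..., 23
as.hour <- function(t) as.numeric(format(t, "%H"))
ag <- aggregate(x.zoo, as.hour, mean)

# Loop-style result from the question: row 1 is 1am, row 24 is midnight
res <- t(sapply(1:24, function(i) colMeans(x[seq(i, nrow(x), by = 24), ])))

# Move the midnight row (hour 0) to the end so the row order matches the loop
ag_1am_first <- coredata(ag)[c(2:24, 1), ]
all.equal(unname(ag_1am_first), unname(res))
```

After the reordering the two results should agree up to floating-point tolerance.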