I have a series of durations that range up to 118 hours, in a format like "118:34:42", where 118 is hours, 34 is minutes, and 42 is seconds. The output should be a number of seconds.
I would like to convert this to some kind of time type in R, but most of the libraries I've looked at want to add a date (lubridate, zoo, xts), or return "NA" due to the hours being beyond a 24 hour range. I could parse the string and return a number of seconds, but I'm wondering if there's a faster way.
I'm slightly new to R (maybe 3 months in to working with this).
Any help figuring out how to deal with this would be appreciated.
Example:
library(lubridate)
x <- c("118:34:42", "114:12:12")
tt <- hms(x)
Error in parse_date_time(hms, orders, truncated = truncated, quiet = TRUE) :
No formats could be infered from the training set.
#try another route
w <- "118:34:42"
tt2 <- hms(w)
tt2
#[1] NA
z <- "7:02:02"
tt3 <- hms(z)
tt3
#[1] "7H 2M 2S"
In the lubridate package there is a function hms() that returns a time object:
library(lubridate)
x <- c("118:34:42", "114:12:12")
tt <- hms(x)
tt
[1] 118 hours, 34 minutes and 42 seconds
[2] 114 hours, 12 minutes and 12 seconds
The function hms() returns an object of class Period:
str(tt)
Formal class 'Period' [package "lubridate"] with 6 slots
..@ .Data : num [1:2] 42 12
..@ year : num [1:2] 0 0
..@ month : num [1:2] 0 0
..@ day : num [1:2] 0 0
..@ hour : num [1:2] 118 114
..@ minute: num [1:2] 34 12
You can do arithmetic using these objects. For example:
tt[2] - tt[1]
[1] -4 hours, -22 minutes and -30 seconds
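Since the question asks for a number of seconds, a base R sketch that avoids date classes entirely may also be useful. The helper name `hms_to_seconds` is hypothetical, and the sketch assumes every string has exactly three colon-separated numeric fields.

```r
# Hypothetical helper: convert "H:M:S" strings (hours may exceed 24) to seconds.
# Assumes each string has exactly three colon-separated numeric fields.
hms_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

x <- c("118:34:42", "114:12:12")
secs <- hms_to_seconds(x)
secs
# [1] 426882 411132
```

lubridate also provides period_to_seconds(), which should give the same numbers when applied to the Period returned by hms(x).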
I have a list of periods that I would like to convert into something that behaves like a vector (eventually to add as a column in a data frame).
library(lubridate)
x <- list(ms("09:10"), ms("09:02"), ms("1:10"))
# some_function(x)
# with output
ms(c("09:10", "09:02", "1:10"))
unlist and purrr::flatten don't work in this case, since the result loses its Period properties.
d <- do.call("c", x)
class(d)
[1] "Period"
attr(,"package")
[1] "lubridate"
Or
d <- data.frame(date = do.call("c", x))
str(d)
'data.frame': 3 obs. of 1 variable:
$ date:Formal class 'Period' [package "lubridate"] with 6 slots
.. ..@ .Data : num 10 2 10
.. ..@ year : num 0 0 0
.. ..@ month : num 0 0 0
.. ..@ day : num 0 0 0
.. ..@ hour : num 0 0 0
.. ..@ minute: num 9 9 1
d
date
1 9M 10S
2 9M 2S
3 1M 10S
See here: why does unlist() kill dates in R
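The same behaviour can be seen in base R without lubridate. The sketch below uses a list of Date objects as a stand-in for the list of Periods (an assumption made only to keep the example self-contained): unlist() drops the class attribute, while do.call("c", ...) dispatches to the class's own c() method.

```r
# A list of Date objects stands in for the list of Periods.
x <- list(as.Date("2020-01-01"), as.Date("2020-01-02"), as.Date("2020-01-03"))

flat <- unlist(x)          # attributes are dropped: plain numbers remain
class(flat)

kept <- do.call("c", x)    # dispatches to c.Date(), so the class survives
class(kept)
```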
I have a dataframe of times looking like this:
library(lubridate)
times <- structure(list(exp1 = c("17:19:04 \r", "17:28:53 \r", "17:38:44 \r"),
exp2 = c("17:22:04 \r", "17:31:53 \r", "17:41:45 \r")),
row.names = c(NA, 3L), class = "data.frame")
I want to convert the times into more convenient date-time objects, which I will do with the hms() helper function from the lubridate package.
Running hms() on one column of my dataframe works like a charm:
hms(times[,1])
[1] "17H 19M 4S" "17H 28M 53S" "17H 38M 44S"
Great, surely I can just apply() on my whole dataframe then.
apply(times, 2, hms)
Which gives a weird dataframe with some integers, definitely not what I am expecting.
What is the right way to convert all of my times dataframe with the hms() function?
library(dplyr)
library(lubridate)
times %>% mutate_all(hms)
#OR
mutate_all(times, hms)
# exp1 exp2
#1 17H 19M 4S 17H 22M 4S
#2 17H 28M 53S 17H 31M 53S
#3 17H 38M 44S 17H 41M 45S
Instead of apply, we can use lapply: apply converts the data frame to a matrix, which wouldn't hold the attributes created by hms.
library(lubridate)
times[] <- lapply(times, hms)
str(times)
#'data.frame': 3 obs. of 2 variables:
# $ exp1:Formal class 'Period' [package "lubridate"] with 6 slots
# .. ..@ .Data : num 4 53 44
# .. ..@ year : num 0 0 0
# .. ..@ month : num 0 0 0
# .. ..@ day : num 0 0 0
# .. ..@ hour : num 17 17 17
# .. ..@ minute: num 19 28 38
# $ exp2:Formal class 'Period' [package "lubridate"] with 6 slots
# .. ..@ .Data : num 4 53 45
# .. ..@ year : num 0 0 0
# .. ..@ month : num 0 0 0
# .. ..@ day : num 0 0 0
# .. ..@ hour : num 17 17 17
# .. ..@ minute: num 22 31 41
With dplyr 1.0.0 or later, we can use mutate with across:
library(dplyr)
times %>%
mutate(across(everything(), hms))
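To see why apply() and lapply() differ here, a minimal base R sketch follows. It uses as.Date on made-up date strings instead of hms (an assumption, so that the example runs without lubridate), but the mechanism should be the same: apply() works on a matrix and flattens the results, whereas lapply() keeps each column's class.

```r
df <- data.frame(a = c("2020-01-01", "2020-01-02"),
                 b = c("2020-02-01", "2020-02-02"),
                 stringsAsFactors = FALSE)

# apply() coerces df to a character matrix and flattens the results,
# so the Date class is lost and only day counts remain.
via_apply <- apply(df, 2, as.Date)
is.numeric(via_apply)

# lapply() works column by column; assigning into df[] keeps the data frame.
via_lapply <- df
via_lapply[] <- lapply(df, as.Date)
sapply(via_lapply, class)
```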
I have used lubridate to create a column of time format ####M ##S
df$delay <- minutes(df$finish-Delays$start)
How can I convert this column to just give the number #### in front of minutes?
Thanks for any help.
Take a look at the S4 slots...
library(lubridate)
# create data
df <- data.frame(finish = 20)
Delays <- data.frame(start = 10)
(df$delay <- minutes(df$finish-Delays$start))
[1] "10M 0S"
# take a look at the 'delay' object
str(df$delay)
Formal class 'Period' [package "lubridate"] with 6 slots
..@ .Data : num 0
..@ year : num 0
..@ month : num 0
..@ day : num 0
..@ hour : num 0
..@ minute: num 10
# access the 'minute' slot
df$delay@minute
[1] 10
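For readers unfamiliar with S4 slots, here is a small sketch of slot access using a toy class. ToyPeriod is hypothetical and is not lubridate's Period; it only mimics the layout shown by str() above, with a numeric data part plus a minute slot.

```r
library(methods)  # usually attached already; loaded explicitly for Rscript

# Toy S4 class (hypothetical, not lubridate's Period): a numeric data part
# holding seconds, plus a separate 'minute' slot.
setClass("ToyPeriod", contains = "numeric",
         slots = c(minute = "numeric"))

p <- new("ToyPeriod", c(0, 30), minute = c(10, 5))

p@minute            # direct slot access with @
slot(p, "minute")   # the same, with the slot name given as a string
p@.Data             # the data part (seconds in this toy class)
```

For a real Period, lubridate also exports accessor functions such as minute(), which may be preferable to reaching into slots directly.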
I have a data frame of almost 1600 observations with this structure:
head(df)
Start_Time Duration
1 2014-09-18 10:01:00 4 mins
2 2014-09-18 08:01:00 41 mins
3 2014-09-18 08:01:00 22 mins
4 2014-09-18 08:01:00 41 mins
5 2014-09-18 08:01:00 60 mins
6 2014-09-18 07:02:00 17 mins
I have plotted my data with this function:
plot(df$Start_Time,as.numeric(df$Duration), ylab = "Duration", xlab = "Date", ylim = c(0,450))
Since the data frame contains several tens of observations per day, I would like to draw a trend line in order to make it easier to read the data visually.
I tried this code:
fit <- glm(df$Start_Time~df$Duration)
co <- coef(fit)
abline(fit, col="red", lwd=2)
but I get this error:
Error in model.frame.default(formula = df$Start_Time ~ df$Duration, :
invalid type (list) for variable 'df$Start_Time'
I got the same error with this code:
abline(lm(df$Start_Time ~ df$Duration))
From reading the error messages, I suppose that those functions can't handle non-numeric values.
I tried this and got no error, but the line wasn't displayed on my graph:
fit <- glm(as.numeric(df$Start_Time)~df$Duration)
co <- coef(fit)
abline(fit, col="red", lwd=2)
What is the correct way of drawing trend lines / regression lines when one of the variables is in the datetime format?
NOTE: what follows is the result of str(df)
str(df)
'data.frame': 4121 obs. of 2 variables:
$ Start_Time: POSIXlt, format: "2014-09-18 10:01:00" "2014-09-18 08:01:00" "2014-09-18 08:01:00" "2014-09-18 08:01:00" ...
$ Duration :Class 'difftime' atomic [1:4121] 4 41 22 41 60 17 17 2 3 3 ... .. ..- attr(*, "units")= chr "mins"
Try the following code that reproduces data in the format you stated, then fits a linear model using lm() instead of glm() and plots the results, including a line of best fit.
set.seed(1)
times <- as.POSIXct("2014-09-18") + sort(runif(11, min=0, max=1000))
df <- data.frame(Start_time = times[-11])
df$Duration <- difftime(times[-11], times[-1])
model <- lm(Start_time ~ Duration, df)
plot(Start_time ~ Duration, df)
abline(model)
The structure of the data frame is close to what you report (Start_time is POSIXct here rather than POSIXlt):
str(df)
'data.frame': 10 obs. of 2 variables:
$ Start_time: POSIXct, format: "2014-09-18 00:01:01" "2014-09-18 00:03:21" "2014-09-18 00:03:25" ...
$ Duration :Class 'difftime' atomic [1:10] -139.9 -4.29 -59.53 -106.62 -200.73 ...
.. ..- attr(*, "units")= chr "secs"
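A further sketch, on made-up data, of two points that may explain the question's symptoms. First, POSIXlt is stored as a list, which is what the "invalid type (list)" error refers to; converting to POSIXct gives a plain numeric representation. Second, abline() draws y against x, so if the plot has start time on the x axis and duration on the y axis, the model probably needs duration as the response. The data and column values below are assumptions for illustration only.

```r
set.seed(42)

# Made-up data: 50 start times over six hours, durations in minutes.
start_ct <- as.POSIXct("2014-09-18 07:00:00", tz = "UTC") +
  sort(runif(50, min = 0, max = 6 * 3600))
df <- data.frame(
  Start_Time = start_ct,
  Duration   = as.difftime(round(runif(50, min = 1, max = 60)), units = "mins")
)

# POSIXlt is a list underneath; POSIXct is a number of seconds.
lt_is_list <- is.list(unclass(as.POSIXlt(start_ct)))
ct_is_num  <- is.numeric(unclass(df$Start_Time))

# Duration (y) modelled on start time (x), matching the axes of the plot.
fit <- lm(as.numeric(Duration) ~ as.numeric(Start_Time), data = df)

plot(df$Start_Time, as.numeric(df$Duration),
     xlab = "Date", ylab = "Duration (mins)")
abline(fit, col = "red", lwd = 2)
```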
I was about to blog about a useful R function I'd made, went to create some dummy data, but the dummy data behaves differently! Help!
library(xts)
data=xts(1:139,Sys.Date()-139:1)
Looking at it, it all looks good:
> head(data)
[,1]
2012-03-07 1
2012-03-08 2
2012-03-09 3
2012-03-10 4
2012-03-11 5
2012-03-12 6
> tail(data)
[,1]
2012-07-18 134
2012-07-19 135
2012-07-20 136
2012-07-21 137
2012-07-22 138
2012-07-23 139
> head(index(data))
[1] "2012-03-07" "2012-03-08" "2012-03-09" "2012-03-10" "2012-03-11" "2012-03-12"
> tail(index(data))
[1] "2012-07-18" "2012-07-19" "2012-07-20" "2012-07-21" "2012-07-22" "2012-07-23"
> range(index(data))
[1] "2012-03-07" "2012-07-23"
But, rollapply is weird. The range(index()) gives "1 40" instead of the strings.
> rollapply(data,width=40,by=30,FUN=function(x){print(range(index(x)));length(x)})
[1] 1 40
[1] 1 40
[1] 1 40
[1] 1 40
2012-03-26 40
2012-04-25 40
2012-05-25 40
2012-06-24 40
This is officially weird, because on my real data rollapply outputs a date range as strings. I compared str on my real data and on the above artificial data, and they are identical. In particular they both say 'Indexed by objects of class: [Date] TZ:' and they both say: 'tclass: chr "Date"'
Well, no, I exaggerate; the following artificial data has identical structure to my real data:
data=xts(data.frame(a=1:139,b=seq(3.14,by=0.01,length.out=139)),Sys.Date()-139:1)
It has exactly the same weird rollapply issue.
P.S. The useful function I mentioned is a rollapply wrapper; I've not shown it above because I don't need to: the core xts rollapply shows the problem too. But I'll post a link to it, in a comment, when I finally blog about it :-)
UPDATE
Here is some output with an xts object where it works:
> rollapply(data,width=40,by=30,FUN=function(x){print(class(x));print(range(index(x)));length(x)})
[1] "xts" "zoo"
[1] "2012-01-02" "2012-02-24"
...
> class(data)
[1] "xts" "zoo"
> str(data)
An ‘xts’ object from 2012-01-02 to 2012-07-18 containing:
Data: num [1:139, 1] 76.9 76.7 76.7 77.1 76.9 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "Close"
Indexed by objects of class: [Date] TZ:
xts Attributes:
List of 2
$ tclass: chr "Date"
$ tzone : chr ""
Here is some output with my artificial xts object (except I've added: colnames(data)=c("Close"))
> rollapply(data,width=40,by=30,FUN=function(x){print(class(x));print(range(index(x)));length(x)})
[1] "integer"
[1] 1 40
...
> class(data)
[1] "xts" "zoo"
> str(data)
An ‘xts’ object from 2012-03-07 to 2012-07-23 containing:
Data: int [1:139, 1] 1 2 3 4 5 6 7 8 9 10 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "Close"
Indexed by objects of class: [Date] TZ:
xts Attributes:
List of 2
$ tclass: chr "Date"
$ tzone : chr ""
I.e. identical str/class, identical function call, but different result. The xts object where it works is read from a csv file using this code:
d=read.table(fname,sep=',',header=T,stringsAsFactors=F)
x=as.xts(subset(d,select=-datestamp),order.by=as.Date(d$datestamp))
Observe the following:
rollapply(data,width=40,by=30,FUN=function(x){class(x)})
2012-03-26 integer
2012-04-25 integer
2012-05-25 integer
2012-06-24 integer
rollapply is passing the subsets of data as integer rather than xts objects.
The code for zoo:::rollapply.zoo appears to only use standard [ subsetting so it's not clear why the class information is being lost.
Edit
Actually there is a line:
dat <- mapply(f, seq_along(time(data)), width, MoreArgs = list(data = coredata(data),
...), SIMPLIFY = FALSE)
So only the coredata is being passed to the eventual function. This means you can't use rollapply to get these partial ranges.
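One possible workaround, sketched in base R, is to build the windows by hand so that the index stays available. This does not use rollapply at all; the plain vectors below are hypothetical stand-ins for index(data) and coredata(data), with width and by taken from the question.

```r
# Hypothetical stand-ins for the xts index and data: plain vectors, no xts needed.
dates  <- as.Date("2012-03-07") + 0:138
values <- 1:139

width <- 40
by    <- 30

# Right edge of each window: 40, 70, 100, 130.
ends <- seq(width, length(dates), by = by)

# For each window, record the date range and the number of observations.
windows <- lapply(ends, function(e) {
  idx <- (e - width + 1):e
  data.frame(from = min(dates[idx]), to = max(dates[idx]),
             n = length(values[idx]))
})
result <- do.call(rbind, windows)
result
```

With a real xts object, the same loop could index the object itself (for example data[idx]) so that each window keeps its dates.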