For each row of a data frame, I want to count how many events occur between the date-times given in the start and end columns.
Please consider the following function:
calcFreqTimeInterval <- function (startTime, endTime, timestampVector) {
sum(timestampVector >= startTime & timestampVector <= endTime)
}
As arguments to the function, I want to use
df <- data.frame(start=c("06/11/2013 10:00:00","06/11/2013 17:30:00"), end=c("06/11/2013 11:15:00","06/11/2013 17:45:00"))
timestamp <- as.POSIXlt(c("2013-11-06 10:30:19","2013-11-06 10:32:19","2013-11-06 11:00:19", "2013-11-06 17:40:50","2013-11-06 17:42:50"))
respectively. After converting the columns to POSIX date-times with
df$start <- as.POSIXlt((df$start), format="%d/%m/%Y %H:%M:%S")
df$end <- as.POSIXlt((df$end), format="%d/%m/%Y %H:%M:%S")
I would like to obtain the following result:
expectedResult <- c(3,2)
I could use apply if all my arguments were in df, but how do I also pass a vector as an argument?
You need to use mapply. Before that, you need to use the POSIXct class instead of the POSIXlt class:
df <- data.frame(start=c("06/11/2013 10:00:00","06/11/2013 17:30:00"), end=c("06/11/2013 11:15:00","06/11/2013 17:45:00"))
timestamp <- as.POSIXct(c("2013-11-06 10:30:19","2013-11-06 10:32:19","2013-11-06 11:00:19", "2013-11-06 17:40:50","2013-11-06 17:42:50"))
df$start <- as.POSIXct((df$start), format="%d/%m/%Y %H:%M:%S")
df$end <- as.POSIXct((df$end), format="%d/%m/%Y %H:%M:%S")
mapply applies FUN to the first elements of each ... argument, the second elements, the third elements, and so on. Arguments are recycled if necessary. MoreArgs is a list of other arguments to FUN.
mapply(FUN = calcFreqTimeInterval, startTime = df$start, endTime = df$end, MoreArgs = list(timestampVector = timestamp))
## [1] 3 2
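As an alternative to mapply, the same counts can be obtained by looping over the row indices with vapply. This is only a sketch: it reuses calcFreqTimeInterval from the question, and the explicit tz = "UTC" is an assumption added to keep the example independent of the local time zone.

```r
# Counting function from the question
calcFreqTimeInterval <- function(startTime, endTime, timestampVector) {
  sum(timestampVector >= startTime & timestampVector <= endTime)
}

df <- data.frame(
  start = c("06/11/2013 10:00:00", "06/11/2013 17:30:00"),
  end   = c("06/11/2013 11:15:00", "06/11/2013 17:45:00"),
  stringsAsFactors = FALSE
)

# tz = "UTC" is an assumption made only for reproducibility
df$start <- as.POSIXct(df$start, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
df$end   <- as.POSIXct(df$end,   format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
timestamp <- as.POSIXct(
  c("2013-11-06 10:30:19", "2013-11-06 10:32:19", "2013-11-06 11:00:19",
    "2013-11-06 17:40:50", "2013-11-06 17:42:50"),
  tz = "UTC"
)

# One count per row: the interval comes from row i, the event vector is fixed
counts <- vapply(
  seq_len(nrow(df)),
  function(i) calcFreqTimeInterval(df$start[i], df$end[i], timestamp),
  integer(1)
)
counts
```

vapply checks that every call returns a single integer, which makes a wrong return type fail early instead of silently producing a list.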
I've got the following time frame:
A <- c('2016-01-01', '2019-01-05')
B <- c('2017-05-05','2019-06-05')
X_Period <- interval("2015-01-01", "2019-12-31")
Y_Periods <- interval(A, B)
I'd like to find the non-overlapping periods between X_Period and Y_Periods, so that the result would be:
[1]'2015-01-01'--'2015-12-31'
[2]'2017-05-06'--'2019-01-04'
[3]'2019-06-06'--'2019-12-31'
I'm trying to use setdiff, but it does not work:
setdiff(X_Period, Y_Periods)
Here is an option:
library(lubridate)
seq_X <- as.Date(seq(int_start(X_Period), int_end(X_Period), by = "1 day"))
seq_Y <- as.Date(do.call("c", sapply(Y_Periods, function(x)
seq(int_start(x), int_end(x), by = "1 day"))))
unique_dates_X <- seq_X[!seq_X %in% seq_Y]
lst <- aggregate(
unique_dates_X,
by = list(cumsum(c(0, diff.Date(unique_dates_X) != 1))),
FUN = function(x) c(min(x), max(x)),
simplify = FALSE)$x
lapply(lst, function(x) interval(x[1], x[2]))
#[[1]]
#[1] 2015-01-01 UTC--2015-12-31 UTC
#
#[[2]]
#[1] 2017-05-06 UTC--2019-01-04 UTC
#
#[[3]]
#[1] 2019-06-06 UTC--2019-12-31 UTC
The strategy is to convert the intervals to by-day sequences (one for X_Period and one for Y_Periods); then we find all days that are part of X_Period but not part of Y_Periods. We then aggregate to determine the first and last date in every sub-sequence of consecutive dates. The resulting lst is a list with those start/end dates. To convert to intervals, we simply loop through the list and convert each pair of start/end dates to an interval.
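The same strategy can be sketched in base R with plain Date vectors, without lubridate. This is an illustrative variant, not the code from the answer above: the variable names (x_start, y_start, free_days, run_id, gaps) are made up for the example, and the result is a data frame of start/end dates rather than a list of intervals.

```r
# Outer period and the periods to subtract, as plain Dates
x_start <- as.Date("2015-01-01")
x_end   <- as.Date("2019-12-31")
y_start <- as.Date(c("2016-01-01", "2019-01-05"))
y_end   <- as.Date(c("2017-05-05", "2019-06-05"))

# Expand every period into a by-day sequence
x_days <- seq(x_start, x_end, by = "day")
y_days <- do.call(c, lapply(seq_along(y_start), function(i) {
  seq(y_start[i], y_end[i], by = "day")
}))

# Days that belong to the outer period only
free_days <- x_days[!x_days %in% y_days]

# A new run starts wherever the gap to the previous day is not exactly 1
run_id <- cumsum(c(TRUE, diff(as.numeric(free_days)) != 1))
runs <- split(free_days, run_id)

# First and last day of every run of consecutive dates
gaps <- data.frame(
  start = unname(do.call(c, lapply(runs, min))),
  end   = unname(do.call(c, lapply(runs, max)))
)
gaps
```

Expanding to daily sequences is simple but only practical at day resolution and for moderately long periods; for very long ranges, comparing the interval endpoints directly would be cheaper.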
I'm currently struggling with R and calculating the time difference in days.
I have a data.frame with around 60,000 rows. In this data frame there are two columns called "Start" and "End". Both columns contain data in UNIX time format WITH milliseconds - as you can see by the last three digits.
Start <- c("1470581434000", "1470784954000", "1470811368000", "1470764345000")
End <- c("1470560601000", "1470581549000", "1470785452000", "1470764722000")
d <- data.frame(Start, End)
My desired output is an extra column called timediff in which the time difference is given in days.
I tried difftime and strptime, which I found here, but nothing worked.
Maybe one of you has worked with time difference calculations in the past.
Thanks a lot
There is a very small and fast solution:
Start_POSIX <- as.POSIXct(as.numeric(Start)/1000, origin="1970-01-01")
End_POSIX <- as.POSIXct(as.numeric(End)/1000, origin="1970-01-01")
difftime(Start_POSIX, End_POSIX)
Time differences in mins
[1] 347.216667 3390.083333 431.933333 -6.283333
or, if you want another unit:
difftime(Start_POSIX, End_POSIX, units = "secs")
Time differences in secs
[1] 20833 203405 25916 -377
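Since the question asks for the difference in days, the same call can be made with units = "days" and stored as a new column. The following is a minimal sketch: to_posix is a hypothetical helper name, and tz = "UTC" is an assumption added for reproducibility (it does not affect the differences).

```r
Start <- c("1470581434000", "1470784954000", "1470811368000", "1470764345000")
End   <- c("1470560601000", "1470581549000", "1470785452000", "1470764722000")
d <- data.frame(Start, End, stringsAsFactors = FALSE)

# Hypothetical helper: millisecond UNIX timestamps (as strings) to POSIXct
to_posix <- function(ms) {
  as.POSIXct(as.numeric(ms) / 1000, origin = "1970-01-01", tz = "UTC")
}

# Difference Start - End in days, stored as a plain numeric column
d$timediff <- as.numeric(
  difftime(to_posix(d$Start), to_posix(d$End), units = "days")
)
d
```

Wrapping the result in as.numeric drops the difftime class, which is convenient if the column is later used in arithmetic or written to a file.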
You have a few steps you'll need to take:
# 1. Separate the milliseconds.
# To do this, insert a period in front of the last three digits
Start <-
sub(pattern = "(\\d{3}$)", # get the pattern of three digits at the end of the string
replacement = ".\\1", # replace with a . and then the pattern
x = Start)
# 2. Convert to numeric
Start <- as.numeric(Start)
# 3. Convert to POSIXct
Start <- as.POSIXct(Start,
origin = "1970-01-01")
For convenience, it would be good to put all of these steps into a function:
# Bundle all three steps into one function
unixtime_to_posixct <- function(x)
{
x <- sub(pattern = "(\\d{3}$)",
replacement = ".\\1",
x = x)
x <- as.numeric(x)
as.POSIXct(x,
origin = "1970-01-01")
}
And with that, you can get your differences in days:
#* Put it all together.
library(dplyr)
library(magrittr)
Start <- c("1470581434000", "1470784954000", "1470811368000", "1470764345000")
End <- c("1470560601000", "1470581549000", "1470785452000", "1470764722000")
d <- data.frame(Start,
End,
stringsAsFactors = FALSE)
lapply(
X = d,
FUN = unixtime_to_posixct
) %>%
as.data.frame() %>%
mutate(diff = difftime(Start, End, units = "days"))
I'm trying to adapt the answer to my previous question (Difference between dates in many columns in R). I've realised I only want the time difference between a given column and the column immediately to its left. Example for clarification:
df <- data.frame(
Group=c("A","B"),
ID=c(1,2),
Date1=as.POSIXct(c('2016-04-25 09:15:29','2016-04-25 09:15:29')),
Date2=as.POSIXct(c('2016-04-25 14:01:19','2016-04-25 14:01:19')),
Date3=as.POSIXct(c('2016-04-26 13:28:19','2016-04-26 13:28:19')),
stringsAsFactors=F
)
My desired output is Date2-Date1 and Date3-Date2. This would of course extend to many columns, i.e. Date4-Date3 etc., but I do not need Date3-Date1. To clarify, how can I automate the following for many columns?
df$Date2_Date1 <- difftime(df$Date2,df$Date1, units = c("hours"))
df$Date3_Date2 <- difftime(df$Date3,df$Date2, units = c("hours"))
Thanks to @bgoldst for the original answer. I think I just need to adapt cmb below to have the correct sequence:
cmb <- combn(seq_len(ncol(df)-1L)+1L,2L);
res <- abs(apply(cmb,2L,function(x) difftime(df[[x[1L]]],df[[x[2L]]],units='hours')));
colnames(res) <- apply(cmb,2L,function(x,cns) paste0(cns[x[1L]],'_',cns[x[2L]]),names(df))
Thanks
Given your example, this should do the trick:
df <- data.frame(
Group=c("A","B"),
ID=c(1,2),
Date1=as.POSIXct(c('2016-04-25 09:15:29','2016-04-25 09:15:29')),
Date2=as.POSIXct(c('2016-04-25 14:01:19','2016-04-25 14:01:19')),
Date3=as.POSIXct(c('2016-04-26 13:28:19','2016-04-26 13:28:19')),
stringsAsFactors=F
)
mapply(difftime, df[, 4:5], df[, 3:4], units = "hours")
> Date2 Date3
> [1,] 4.763889 23.45
> [2,] 4.763889 23.45
In this call, mapply applies difftime to the two sets of columns provided, so it computes df[, 4] - df[, 3], then df[, 5] - df[, 4]. You will of course have to replace these with the column numbers of your own date columns and make sure they are in the right order.
Good luck!
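To avoid hard-coding column numbers, the adjacent pairs can be built from the column names. This is a sketch under two assumptions: all date columns have names starting with "Date", and they appear in the data frame in the order in which they should be compared. The names date_cols, later, earlier and result are illustrative, and tz = "UTC" is added only for reproducibility.

```r
df <- data.frame(
  Group = c("A", "B"),
  ID    = c(1, 2),
  Date1 = as.POSIXct(c("2016-04-25 09:15:29", "2016-04-25 09:15:29"), tz = "UTC"),
  Date2 = as.POSIXct(c("2016-04-25 14:01:19", "2016-04-25 14:01:19"), tz = "UTC"),
  Date3 = as.POSIXct(c("2016-04-26 13:28:19", "2016-04-26 13:28:19"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Assumption: date columns are named Date1, Date2, ... and are already ordered
date_cols <- grep("^Date", names(df), value = TRUE)
later   <- date_cols[-1]                   # Date2, Date3, ...
earlier <- date_cols[-length(date_cols)]   # Date1, Date2, ...

# One numeric vector of hour differences per adjacent pair of columns
diffs <- lapply(seq_along(later), function(i) {
  as.numeric(difftime(df[[later[i]]], df[[earlier[i]]], units = "hours"))
})
names(diffs) <- paste0(later, "_", earlier)

# Append the new columns (Date2_Date1, Date3_Date2, ...) to the data frame
result <- cbind(df, as.data.frame(diffs))
result
```

Because the pairs are derived from date_cols, adding a Date4 column would produce Date4_Date3 without further changes.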
You could use Non-Standard Evaluation:
First you create a character vector with the names of the columns containing the dates. So let's say all the columns starting with 'Date':
dates = names(df)[grepl("^Date", names(df))]
We create a list of formulas that dynamically calculate the difference between two adjacent columns:
all_operations = lapply(seq_len(length(dates) - 1), function(i){
as.formula(paste("~difftime(", dates[i + 1], ",", dates[i],", units = c('hours'))"))
})
This will create the formulas:
[[1]]: ~difftime(Date2, Date1, units = c("hours"))
[[2]]: ~difftime(Date3, Date2, units = c("hours"))
Then you can use dplyr's NSE mutate_ to apply the dynamic formulas generated above (note that mutate_ is deprecated in dplyr 0.7.0 and later):
df %>%
mutate_(.dots = setNames(all_operations, paste0("Diff", seq_len(length(dates) - 1))))
I have a data frame df like:
ID time
a 121:24:30
b 130:30:00
The time column is a factor after importing the data.
I want to convert the values of the time column into minutes. At first, I tried:
df$time <- times(df$time)
But I got the warning message:
"out of day time entry"
I noticed that the value in the hour position is more than 24 in my dataset.
So what should I do now?
Thanks in advance!
You could use the lubridate package for this.
library(lubridate)
x <- hms(as.character(df$time)) # the column is a factor, so convert it to character first
(hour(x) * 60) + minute(x) + (second(x) / 60)
# [1] 7284.5 7830.0
Assuming your data is saved as dat, use the following:
#convert to character
dat$time <- as.character(dat$time)
#split by ":"
times <- strsplit(dat$time, ":")
# get minutes
dat$time <- sapply(times, function(x){
x = as.numeric(x)
x[1]*60+x[2]+x[3]/60
})
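The split-and-weight idea above can also be written without an explicit per-element function, by stacking the split parts into a matrix and multiplying by the weights for hours, minutes and seconds. This is only a sketch; time_chr, parts and minutes are illustrative names, and it assumes every value has exactly three colon-separated fields.

```r
# Example values from the question, starting from a factor as after import
time_chr <- as.character(factor(c("121:24:30", "130:30:00")))

# One row per value, columns = hours, minutes, seconds
parts <- do.call(rbind, strsplit(time_chr, ":", fixed = TRUE))
parts <- matrix(as.numeric(parts), nrow = nrow(parts))

# Weights convert each field to minutes: hours * 60, minutes * 1, seconds / 60
minutes <- as.vector(parts %*% c(60, 1, 1 / 60))
minutes
```

Because nothing here treats the first field as a time of day, hour values above 24 are handled like any other number.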
Another option (just for fun) is to play around with the gsubfn package
s <- factor(c("121:24:30", "130:30:00"))
library(gsubfn)
as.numeric(gsubfn("(\\d+):(\\d+):(\\d+)",
~ as.numeric(x)*60 + as.numeric(y) + as.numeric(z)/60,
as.character(s)))
## [1] 7284.5 7830.0
Let's say I have a data frame consisting of 3 columns with dates:
index <- c("31.10.2012", "16.06.2012")
begin <- c("22.10.2012", "29.05.2012")
end <- c("24.10.2012", "17.06.2012")
index.new <- as.Date(index, format = "%d.%m.%Y")
begin.new <- as.Date(begin, format = "%d.%m.%Y")
end.new <- as.Date(end, format = "%d.%m.%Y")
data.frame(index.new, begin.new, end.new)
My problem: I want to select (subset) the rows where the interval between the begin and end dates falls within the 4 days before the index day. This is obviously the case only in row 2.
Can you help me out here?
The way you express the problem is inconsistent: in the first case dates.new[1] > dates.new[2], and in the second case dates.new[3] < dates.new[4]. Putting both intervals in ascending order:
interval1 = c(dates.new[2], dates.new[1])
interval2 = c(dates.new[3],dates.new[4])
If you want to check whether interval2 CONTAINS interval1:
all.equal(findInterval(interval1, interval2),c(1,1))
Please let me know if this works and if it is what you want:
library("timeDate")
index <- c("31.10.2012", "16.06.2012")
begin <- c("22.10.2012", "29.05.2012")
end <- c("24.10.2012", "17.06.2012")
index.new <- as.Date(index, format = "%d.%m.%Y")
begin.new <- as.Date(begin, format = "%d.%m.%Y")
end.new <- as.Date(end, format = "%d.%m.%Y")
data <- data.frame(index.new, begin.new, end.new)
apply(data, 1, function(x){paste(x[1]) %in% paste(timeSequence(x[2], x[3], by = "day"))})
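Because Date columns support comparison and arithmetic directly, the row-wise apply can be replaced by a vectorised condition. The sketch below shows two readings of the question, both of which are assumptions: inside tests whether the index day lies within [begin, end] (what the apply call above checks), and overlaps tests whether [begin, end] overlaps the 4 days up to and including the index day. The names dat, window_days, inside and overlaps are illustrative.

```r
index <- c("31.10.2012", "16.06.2012")
begin <- c("22.10.2012", "29.05.2012")
end   <- c("24.10.2012", "17.06.2012")

dat <- data.frame(
  index = as.Date(index, format = "%d.%m.%Y"),
  begin = as.Date(begin, format = "%d.%m.%Y"),
  end   = as.Date(end,   format = "%d.%m.%Y")
)

# Reading 1 (assumption): the index day lies inside [begin, end]
inside <- dat$index >= dat$begin & dat$index <= dat$end

# Reading 2 (assumption): [begin, end] overlaps [index - 4 days, index]
window_days <- 4
overlaps <- dat$begin <= dat$index & dat$end >= dat$index - window_days

# Both readings keep only the second row for this example
dat[overlaps, ]
```

For the example data both conditions select row 2, so the data alone does not settle which reading was intended.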