Compare time intervals in R

Let's say I have a data frame consisting of 3 columns with dates:
index <- c("31.10.2012", "16.06.2012")
begin <- c("22.10.2012", "29.05.2012")
end <- c("24.10.2012", "17.06.2012")
index.new <- as.Date(index, format = "%d.%m.%Y")
begin.new <- as.Date(begin, format = "%d.%m.%Y")
end.new <- as.Date(end, format = "%d.%m.%Y")
data.frame(index.new, begin.new, end.new)
My problem: I want to select (subset) the rows where the interval between the begin and end dates is within 4 days before the index date. Here that is only the case in row 2.
Can you help me out here?

The way you have expressed the problem is inconsistent: in the first case dates.new[1] > dates.new[2], while in the second case dates.new[3] < dates.new[4]. Putting each interval in ascending order:
interval1 = c(dates.new[2], dates.new[1])
interval2 = c(dates.new[3],dates.new[4])
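If you want to check whether interval2 CONTAINS interval1 (isTRUE is needed because all.equal returns a character description, not FALSE, when the values differ):
isTRUE(all.equal(findInterval(interval1, interval2), c(1, 1)))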
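Since dates.new is not defined in the question, here is a self-contained sketch of the same containment idea. The helper name interval_contains and the two example intervals are hypothetical, and rightmost.closed = TRUE is used on the assumption that an endpoint on the boundary should count as inside.

```r
# Hypothetical helper: TRUE when both endpoints of `inner` lie within `outer`.
# Both arguments are length-2 Date vectors in ascending order.
interval_contains <- function(outer, inner) {
  # findInterval() returns 0 below outer[1], 1 inside [outer[1], outer[2]],
  # and 2 above outer[2] when rightmost.closed = TRUE.
  pos <- findInterval(as.numeric(inner), as.numeric(outer),
                      rightmost.closed = TRUE)
  all(pos == 1L)
}

# Illustrative intervals (assumed values, not taken from the answer)
inner_iv <- as.Date(c("2012-10-22", "2012-10-24"))
outer_iv <- as.Date(c("2012-10-20", "2012-10-31"))

res_inside  <- interval_contains(outer_iv, inner_iv)
res_outside <- interval_contains(inner_iv, outer_iv)
```

Converting with as.numeric keeps the comparison on the underlying day counts, so the helper does not depend on how findInterval treats Date objects.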

Please let me know if this works and if it is what you want.
library("timeDate")
index <- c("31.10.2012", "16.06.2012")
begin <- c("22.10.2012", "29.05.2012")
end <- c("24.10.2012", "17.06.2012")
index.new <- as.Date(index, format = "%d.%m.%Y")
begin.new <- as.Date(begin, format = "%d.%m.%Y")
end.new <- as.Date(end, format = "%d.%m.%Y")
data <- data.frame(index.new, begin.new, end.new)
apply(data, 1, function(x){paste(x[1]) %in% paste(timeSequence(x[2], x[3], by = "day"))})
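A base R alternative, as a sketch under one reading of the question: a row is kept when the interval [begin, end] overlaps the window from 4 days before the index date up to the index date. This interpretation is an assumption, chosen because it selects only row 2 as the question expects. The names dates, window_days, overlaps and selected are illustrative.

```r
# Data from the question, parsed as Date objects
dates <- data.frame(
  index = as.Date(c("31.10.2012", "16.06.2012"), format = "%d.%m.%Y"),
  begin = as.Date(c("22.10.2012", "29.05.2012"), format = "%d.%m.%Y"),
  end   = as.Date(c("24.10.2012", "17.06.2012"), format = "%d.%m.%Y")
)

window_days <- 4  # length of the look-back window before the index date

# Two intervals overlap when each one starts before the other one ends
overlaps <- dates$begin <= dates$index &
  dates$end >= dates$index - window_days

selected <- dates[overlaps, ]
```

If the intended rule is instead that the whole interval must lie inside the window, the condition would need to compare begin against index - window_days and end against index.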

Related

Changing date formats from MM-YYYY to DD-MM-YYYY by creating random DD

I have a dataset that has a variable date_of_birth (MM-YYYY). I would like to change this format to DD-MM-YYYY by creating random DD for each observation.
df1 <- as.Date(paste0(df,"01/",MMYYYY),format="%d-%m-%Y")
dates <- c("02-1986", "03-1990")
add_random_day <- function(date) {
  date <- lubridate::as_date(date, format="%m-%Y")
  days_in_month <- lubridate::days_in_month(date)
  random_day <- sapply(days_in_month, sample, size = 1)
  lubridate::day(date) <- random_day
  date
}
add_random_day(dates)
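A base R sketch of the same idea, in case parsing a month-year string without a day component returns NA on your setup. It prepends "01-" before parsing, so the day is always specified. The function name add_random_day_base is hypothetical.

```r
# Hypothetical base R variant: parse "MM-YYYY" as the first of the month,
# then move to a randomly chosen valid day of that month.
add_random_day_base <- function(mm_yyyy) {
  first <- as.Date(paste0("01-", mm_yyyy), format = "%d-%m-%Y")
  # first + 31 always lands in the following month; subtracting its
  # day-of-month gives the last day of the original month
  in_next_month <- first + 31
  last <- in_next_month - as.integer(format(in_next_month, "%d"))
  n_days <- as.integer(format(last, "%d"))
  # one random day per element, between 1 and the month length
  random_day <- vapply(n_days, function(n) sample.int(n, 1L), integer(1))
  first + (random_day - 1L)
}

birth_months <- c("02-1986", "03-1990")
birth_dates  <- add_random_day_base(birth_months)
birth_labels <- format(birth_dates, "%d-%m-%Y")  # DD-MM-YYYY strings
```

Calling set.seed() beforehand makes the random days reproducible.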

Replace NA value in column with modified date in other column

I have the following dataset:
A B
2007-11-22 2004-11-18
<NA> 2004-11-10
When the value in column A is NA, I want it to be replaced by the date in B plus an additional 25 days.
Here is what the outcome should look like:
A B
2007-11-22 2004-11-18
2004-12-05 2004-11-10
So far, I have tried the following ifelse call, but with no success.
library(lubridate)
data$A<- ifelse(is.na(data$A),data$B+days(25),data$A)
Could anyone tell me what's wrong with it or give me an alternate solution? The code to build my dataset is below.
A<-c("2007-11-22 01:00:00", NA)
B<-c("2004-11-18","2004-11-10")
data<-data.frame(A,B)
data$A<-as.Date(data$A);data$B<-as.Date(data$B)
The cause of the issue can be traced back to the source code of ifelse. If you run View(ifelse), you will see the following lines at the bottom of the source:
ans <- test
len <- length(ans)
ypos <- which(test)
npos <- which(!test)
if (length(ypos) > 0L)
    ans[ypos] <- rep(yes, length.out = len)[ypos]
if (length(npos) > 0L)
    ans[npos] <- rep(no, length.out = len)[npos]
ans
Here test is a logical vector, and ans is initialized as a copy of test. When ans[ypos] <- rep(yes, length.out = len)[ypos] runs, the Date values are assigned into that logical vector, so ans is coerced to plain numeric and the Date class is dropped. That's why column A contains numbers after using ifelse.
You can try the code below
data$A <- as.Date(ifelse(is.na(data$A), data$B + days(25), data$A), origin = "1970-01-01")
which gives
> data
A B
1 2007-11-22 2004-11-18
2 2004-12-05 2004-11-10
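The class loss can be checked directly. This is a small sketch using standalone vectors rather than the data frame; the names raw and fixed are illustrative, and B + 25 is used in place of days(25) to avoid needing lubridate.

```r
A <- as.Date(c("2007-11-22", NA))
B <- as.Date(c("2004-11-18", "2004-11-10"))

# ifelse() builds its result from the logical test vector,
# so the Date class of A and B is not carried over
raw <- ifelse(is.na(A), B + 25, A)
raw_class <- class(raw)   # "numeric": days since 1970-01-01

# Converting back with an explicit origin restores the Date class
fixed <- as.Date(raw, origin = "1970-01-01")
```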
Assuming the data given reproducibly in the Note at the end -- in particular we assume both columns are of Date class -- compute a logical vector is_na which indicates which entries are NA and then set those from B.
is_na <- is.na(data$A)
data$A[is_na] <- data$B[is_na] + 25
This would also work and has the advantage that it does not overwrite data:
transform(data, A = replace(A, is.na(A), B[is.na(A)] + 25))
Note
Lines <- "
A B
2007-11-22 2004-11-18
NA 2004-11-10"
data <- read.table(text = Lines, header = TRUE)
data[] <- lapply(data, as.Date) # convert to Date class
Instead of ifelse, you could use coalesce from dplyr, which keeps the Date class:
library(tidyverse)
library(lubridate)
A <- c("2007-11-22 01:00:00", NA)
B <- c("2004-11-18","2004-11-10")
data <-data.frame(A,B)
data <- data %>%
  mutate(A = as_date(A),
         B = as_date(B),
         A = coalesce(A, B + days(25)))

R - How to calculate with values by different minutes per hour within the same column

Dear Stack Overflow Community,
I have a dataset with datetimes [POSIXct '%d.%m.%Y %H:%M'] and sensor measurements in [A] and [V].
The datetime is one column and the sensors are the other columns, with one column per sensor.
I'd like to calculate a correction value from values within each sensor's column.
The correction value should be written into a new column, once per hour.
I'd like to calculate the correction as follows:
correction = |x - (0.5 * (y+z))|
x= value of sensor 1, if Minute =='00'
y= value of sensor 1, if Minute =='03'
z= value of sensor 1, if Minute =='06'
What I'd like to have is a function that calculates this formula for every hour, but only if a value exists for all three minutes ('00', '03' and '06') in that hour, and writes the correction value into a new column (Data$correction).
I hope I have explained what I'd like to do.
I tried several loops as well as apply and mapply functions, but there was always a problem with the date format or with the function.
The code below seems to be the best approach to me. It doesn't work right now, but I hope there is a way to make it work.
I also think that writing out vectors and merging them back with melt or merge might not be the best way, but right now I'm just struggling and don't know how to solve the problem.
I really hope you can help me. Thanks so much.
Test_sub <- read.table(file= 'Test_sub.csv',
header=T, sep= ';', dec='.', stringsAsFactors= F)
sensor1_V_0 <- Test_sub[format(Test_sub$Datehour, format = '%M') == '00',]
sensor1_V_3 <- Test_sub[format(Test_sub$Datehour, format = '%M') == '03',]
sensor1_V_6 <- Test_sub[format(Test_sub$Datehour, format = '%M') == '06',]
test_sub2<- mapply(function(x, y, z) x-(0.5*(y+z)), sensor1_V_0$sensor1_V, sensor1_V_3$sensor1_V, sensor1_V_6$sensor1_V)
Let's start by creating some fake data:
dill<-data.frame(time=seq(as.POSIXct("2019-01-01 11:30"), as.POSIXct("2019-01-01 13:20"), by=180),val=runif(37,0,100))
Now we can do this:
require(tidyverse)
require(lubridate)
dill <- dill %>%
  # group by the hour -- note this assumes there's only one day in the data,
  # you'll need to adjust this if there's more than one day
  group_by(hour(time)) %>%
  # remove any hours in the data that don't have minutes 0, 3 and 6
  filter(any(minute(time)==3) & any(minute(time)==6) & any(minute(time)==0)) %>%
  # calculate the correction
  mutate(correction=abs(val[minute(time)==0]-0.5*(val[minute(time)==3]+val[minute(time)==6])))
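For data that spans more than one day, the grouping key has to include the date as well as the hour. The following base R sketch shows one way to do that. The function name correction_per_hour and the toy readings are hypothetical. Unlike the filter above, it keeps incomplete hours and marks them with NA, which is a design choice rather than a requirement from the question.

```r
# Hypothetical helper: |x - 0.5 * (y + z)| per date-hour, where x, y, z are
# the values at minutes 00, 03 and 06; NA when any of the three is missing.
correction_per_hour <- function(time, val) {
  hour_key <- format(time, "%Y-%m-%d %H")      # date plus hour, not hour alone
  minute   <- as.integer(format(time, "%M"))
  one_hour <- function(idx) {
    x <- val[idx][minute[idx] == 0]
    y <- val[idx][minute[idx] == 3]
    z <- val[idx][minute[idx] == 6]
    if (length(x) == 1 && length(y) == 1 && length(z) == 1) {
      abs(x - 0.5 * (y + z))
    } else {
      NA_real_
    }
  }
  per_hour <- vapply(split(seq_along(val), hour_key), one_hour, numeric(1))
  unname(per_hour[hour_key])                   # repeat the value on each row
}

# Toy data (assumed values): a complete hour on day 1, an incomplete one on day 2
toy_time <- as.POSIXct(c("2019-01-01 10:00", "2019-01-01 10:03",
                         "2019-01-01 10:06", "2019-01-02 10:00",
                         "2019-01-02 10:03"), tz = "UTC")
toy_val  <- c(10, 4, 8, 1, 2)
toy_corr <- correction_per_hour(toy_time, toy_val)
```

With the toy values, the first hour gives |10 - 0.5 * (4 + 8)| = 4, and the second hour has no reading at minute 06.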
An example of the data would be:
y <- seq(from = 0.1, to = 0.5, by = 0.1)
min <- as.POSIXct('2018-09-25 09:00:00')
max <- as.POSIXct('2018-09-26 17:45:00')
SEQ <- data.frame(Datehour = seq.POSIXt(min, max, by = 60 * 3))
str(SEQ)
# keep only the timestamps whose minute is 00, 03, 06, 15, 30 or 45
keep <- format(SEQ$Datehour, format = '%M') %in% c('00', '03', '06', '15', '30', '45')
data <- data.frame(Datehour = SEQ$Datehour[keep], y = 0.1, z = 0.3)

Calculate time difference in R

I'm currently struggling with calculating a time difference in days in R.
I have a data.frame with around 60,000 rows. In this data frame there are two columns called "Start" and "End". Both columns contain UNIX timestamps WITH milliseconds, as you can see from the last three digits.
Start <- c("1470581434000", "1470784954000", "1470811368000", "1470764345000")
End <- c("1470560601000", "1470581549000", "1470785452000", "1470764722000")
d <- data.frame(Start, End)
My desired output is an extra column called timediff that contains the time difference in days.
I tried it with difftime and strptime, which I found here, but nothing worked.
Maybe one of you has calculated time differences like this before.
Thanks a lot
There is a very small and fast solution:
Start_POSIX <- as.POSIXct(as.numeric(Start)/1000, origin="1970-01-01")
End_POSIX <- as.POSIXct(as.numeric(End)/1000, origin="1970-01-01")
difftime(Start_POSIX, End_POSIX)
Time differences in mins
[1] 347.216667 3390.083333 431.933333 -6.283333
or if you want another unit:
difftime(Start_POSIX, End_POSIX, units = "secs")
Time differences in secs
[1] 20833 203405 25916 -377
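Since the question asks for days stored in a column named timediff, the same approach can be written as the sketch below. The helper name ms_to_posixct is hypothetical, and tz = "UTC" is an assumption that only affects printing, not the difference.

```r
Start <- c("1470581434000", "1470784954000", "1470811368000", "1470764345000")
End   <- c("1470560601000", "1470581549000", "1470785452000", "1470764722000")
d <- data.frame(Start, End, stringsAsFactors = FALSE)

# Hypothetical helper: millisecond UNIX timestamps (as strings) to POSIXct
ms_to_posixct <- function(ms) {
  as.POSIXct(as.numeric(ms) / 1000, origin = "1970-01-01", tz = "UTC")
}

# units = "days" fixes the unit instead of letting difftime() pick one;
# as.numeric() stores plain numbers in the new column
d$timediff <- as.numeric(
  difftime(ms_to_posixct(d$Start), ms_to_posixct(d$End), units = "days")
)
```

Negative values mean that Start lies before End for that row.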
You have a few steps you'll need to take:
# 1. Separate the milliseconds.
# To do this, insert a period in front of the last three digits
Start <-
  sub(pattern = "(\\d{3}$)", # get the pattern of three digits at the end of the string
      replacement = ".\\1", # replace with a . and then the pattern
      x = Start)
# 2. Convert to numeric
Start <- as.numeric(Start)
# 3. Convert to POSIXct
Start <- as.POSIXct(Start,
                    origin = "1970-01-01")
For convenience, it would be good to put these all into a function
# Bundle all three steps into one function
unixtime_to_posixct <- function(x)
{
  x <- sub(pattern = "(\\d{3}$)",
           replacement = ".\\1",
           x = x)
  x <- as.numeric(x)
  as.POSIXct(x,
             origin = "1970-01-01")
}
And with that, you can get your differences in days
#* Put it all together.
library(dplyr)
library(magrittr)
Start <- c("1470581434000", "1470784954000", "1470811368000", "1470764345000")
End <- c("1470560601000", "1470581549000", "1470785452000", "1470764722000")
d <- data.frame(Start,
                End,
                stringsAsFactors = FALSE)
lapply(
  X = d,
  FUN = unixtime_to_posixct
) %>%
  as.data.frame() %>%
  mutate(diff = difftime(Start, End, units = "days"))

Pass dataframe and vector values as argument of function

For each row of a data frame I want to count how many events occur between the date time indicated in the column start and end.
Please consider the following function
calcFreqTimeInterval <- function (startTime, endTime, timestampVector) {
  sum(timestampVector >= startTime & timestampVector <= endTime)
}
As arguments to the function I want
df <- data.frame(start=c("06/11/2013 10:00:00","06/11/2013 17:30:00"), end=c("06/11/2013 11:15:00","06/11/2013 17:45:00"))
timestamp <- as.POSIXlt(c("2013-11-06 10:30:19","2013-11-06 10:32:19","2013-11-06 11:00:19", "2013-11-06 17:40:50","2013-11-06 17:42:50"))
respectively. After converting the columns to POSIXlt with
df$start <- as.POSIXlt((df$start), format="%d/%m/%Y %H:%M:%S")
df$end <- as.POSIXlt((df$end), format="%d/%m/%Y %H:%M:%S")
I would like to obtain as result
expectedResult <- c(3,2)
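I could use apply if all my arguments were in the df, but how do I also pass a vector as an argument?
You need to use mapply. Before that, you need to use the POSIXct class instead of POSIXlt: POSIXlt objects are stored as lists of date-time components, whereas POSIXct is a plain numeric vector that mapply can iterate over element by element.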
df <- data.frame(start=c("06/11/2013 10:00:00","06/11/2013 17:30:00"), end=c("06/11/2013 11:15:00","06/11/2013 17:45:00"))
timestamp <- as.POSIXct(c("2013-11-06 10:30:19","2013-11-06 10:32:19","2013-11-06 11:00:19", "2013-11-06 17:40:50","2013-11-06 17:42:50"))
df$start <- as.POSIXct((df$start), format="%d/%m/%Y %H:%M:%S")
df$end <- as.POSIXct((df$end), format="%d/%m/%Y %H:%M:%S")
mapply applies FUN to the first elements of each ... argument, the second elements, the third elements, and so on. Arguments are recycled if necessary. MoreArgs is a list of other arguments to FUN.
mapply(FUN = calcFreqTimeInterval, startTime = df$start, endTime = df$end, MoreArgs = list(timestampVector = timestamp))
## [1] 3 2
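An equivalent sketch that loops over row indices with vapply, which also checks that each count is a single integer. The timezone "UTC" and the name counts are assumptions added for reproducibility, not part of the original answer.

```r
events <- data.frame(
  start = c("06/11/2013 10:00:00", "06/11/2013 17:30:00"),
  end   = c("06/11/2013 11:15:00", "06/11/2013 17:45:00"),
  stringsAsFactors = FALSE
)
events$start <- as.POSIXct(events$start, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
events$end   <- as.POSIXct(events$end,   format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

timestamp <- as.POSIXct(c("2013-11-06 10:30:19", "2013-11-06 10:32:19",
                          "2013-11-06 11:00:19", "2013-11-06 17:40:50",
                          "2013-11-06 17:42:50"), tz = "UTC")

# One count per row: how many timestamps fall inside [start, end]
counts <- vapply(
  seq_len(nrow(events)),
  function(i) sum(timestamp >= events$start[i] & timestamp <= events$end[i]),
  integer(1)
)
```

Indexing with events$start[i] keeps the POSIXct class on each element, so the comparison with timestamp stays a date-time comparison.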
