Calculate the difference between to date columns of a dataframe

Calculate the difference between to date columns of a dataframe - r

How can I get the difference between Date1 and Date2 columns of my dataframe?
Date1 Tfd Date2 Sex
13/08/1936 3 09/01/2013 M
25/04/1948 2 14/05/2014 M
26/01/1939 1 03/07/2015 F
13/02/1935 8 03/08/2012 F
I have tryed:
age<-apply(df[, c("Date1", "Date2")], function(x, y) difftime(strptime(y, format = "%d.%m.%Y"), strptime(x, format = "%d.%m.%Y"),units="years"))
but I get this error:
Error in strptime(y, format = "%d.%m.%Y") :
argument "y" is missing, with no default
Do you know how can I solve this?

You don't need apply here :
as.numeric(as.Date(df$Date2, "%d/%m/%Y") - as.Date(df$Date1, "%d/%m/%Y"))
#[1] 27908 24125 27917 28296
difftime does not have units as 'years'. The maximum units it has is of weeks. You can divide the week value with 52.25 to get year of use lubridate's time_length function.
Or using dplyr with difftime
library(dplyr)
library(lubridate)
df %>%
mutate_at(vars(starts_with('date')), lubridate::dmy) %>%
mutate(diff = time_length(difftime(Date2, Date1), 'years'))
# Date1 Tfd Date2 Sex diff
#1 1936-08-13 3 2013-01-09 M 76.4
#2 1948-04-25 2 2014-05-14 M 66.1
#3 1939-01-26 1 2015-07-03 F 76.4
#4 1935-02-13 8 2012-08-03 F 77.5

Related

Build datetime column in R

I have 2 columns
one is date :
2011-04-13
2013-07-29
2010-11-23
the other is time :
3
22
15
I want to make a new column contains date time
it will be like this
2011-04-13 3:00:00
2013-07-29 22:00:00
2010-11-23 15:00:00
I managed to combine them as string
but when i convert them to datetime i get only date the time disappears
any idea how to get date and time in one column?
my script
data <- read.csv("d:\\__r\\hour.csv")
data$date <- as.POSIXct(paste(data$dteday , paste(data$hr, ":00:00", sep=""), sep=" "))

as example you can use ymd_hm function from lubridate:
a <- c("2014-09-08", "2014-09-08", "2014-09-08")
b <- c(3, 4, 5)
library(lubridate)
library(tidyverse)
tibble(a, b) %>%
mutate(time = paste0(a, " ", b, "-0"),
time = ymd_hm(time))
output would be:
# A tibble: 3 x 3
a b time
<chr> <dbl> <dttm>
1 2014-09-08 3 2014-09-08 03:00:00
2 2014-09-08 4 2014-09-08 04:00:00
3 2014-09-08 5 2014-09-08 05:00:00

found this fixed the problem
data$date <- as.POSIXct(strptime(paste(data$dteday , paste(data$hr, ":00:00", sep=""), sep=" "), "%Y-%m-%d %H:%M:%S"))

Complete data frame with missing date ranges for multiple parameters

I have the following data frame:
Date_from <- c("2013-02-01","2013-05-10","2013-08-13","2013-02-01","2013-05-10","2013-08-13","2013-02-01","2013-05-10","2013-08-13")
Date_to <- c("2013-05-07","2013-08-12","2013-11-18","2013-05-07","2013-08-12","2013-11-18","2013-05-07","2013-08-12","2013-11-18")
y <- data.frame(Date_from,Date_to)
y$concentration <- c("1.5","2.5","1.5","3.5","1.5","2.5","1.5","3.5","3")
y$Parameter<-c("A","A","A","B","B","B","C","C","C")
y$Date_from <- as.Date(y$Date_from)
y$Date_to <- as.Date(y$Date_to)
y$concentration <- as.numeric(y$concentration)
I will need to check the data frame if for EACH Parameter the date range begins at the first day of the year (2013-01-01) and ends at the last day of the year (2013-12-31). If not I will need to add an extra row at the beginning and at the end for each of the parameters to complete the date range to a full year for each parameter. The result should look like this:
Date_from Date_to concentration Parameter
2013-01-01 2013-01-31 NA NA
2013-02-01 2013-05-07 1.5 A
2013-05-10 2013-08-12 2.5 A
2013-08-13 2013-11-18 1.5 A
2013-11-19 2013-12-31 NA NA
2013-01-01 2013-01-31 NA NA
2013-02-01 2013-05-07 3.5 B
2013-05-10 2013-08-12 1.5 B
2013-08-13 2013-11-18 2.5 B
2013-11-19 2013-12-31 NA NA
2013-01-01 2013-01-31 NA NA
2013-02-01 2013-05-07 1.5 C
2013-05-10 2013-08-12 3.5 C
2013-08-13 2013-11-18 3.0 C
2013-11-19 2013-12-31 NA NA
Please note: The date ranges are only equal in this example for simplification.
UPDATE: This is my original data snippet and code:
sm<-read.csv("https://www.dropbox.com/s/tft6inwcrjqujgt/Test_data.csv?dl=1",sep=";",header=TRUE)
cleaned_sm<-sm[,c(4,5,11,14)] ##Delete obsolete columns
colnames(cleaned_sm)<-c("Parameter","Concentration","Date_from","Date_to")
cleaned_sm$Date_from<-as.Date(cleaned_sm$Date_from, format ="%d.%m.%Y")
cleaned_sm$Date_to<-as.Date(cleaned_sm$Date_to, format ="%d.%m.%Y")
#detect comma decimal separator and replace with dot decimal separater as comma is not recognised as a number
cleaned_sm=lapply(cleaned_sm, function(x) gsub(",", ".", x))
cleaned_sm<-data.frame(cleaned_sm)
cleaned_sm$Concentration <- as.numeric(cleaned_sm$Concentration)
cleaned_sm$Date_from <- as.Date(cleaned_sm$Date_from)
cleaned_sm$Date_to <- as.Date(cleaned_sm$Date_to)
Added code based on #jasbner:
cleaned_sm %>%
group_by(Parameter) %>%
do(add_row(.,
Date_from = ymd(max(Date_to))+1 ,
Date_to = ymd(paste(year(max(Date_to)),"1231")),
Parameter = .$Parameter[1])) %>%
do(add_row(.,
Date_to = ymd(min(Date_from))-1,
Date_from = ymd(paste(year(min(Date_from)),"0101")) ,
Parameter = .$Parameter[1],
.before = 0)) %>%
filter(!duplicated(Date_from,fromLast = T),!duplicated(Date_to))

My attempt with dplyr and lubridate. Hacked together but I think it should work. Note this does not look for any gaps in the middle of the date ranges. Basically, for each group, you add a row before and after that particular group. Then if there are any cases where the date range starts at the beginning of the year or ends at the end of the year the added rows are filtered out.
library(dplyr)
library(lubridate)
cleaned_sm %>%
group_by(Parameter) %>%
do(add_row(.,
Date_from = ymd(max(.$Date_to))+1 ,
Date_to = ymd(paste(year(max(.$Date_to)),"1231")),
Parameter = .$Parameter[1])) %>%
do(add_row(.,
Date_to = ymd(min(.$Date_from))-1,
Date_from = ymd(paste(year(min(.$Date_from)),"0101")) ,
Parameter = .$Parameter[1],
.before = 0)) %>%
filter(!duplicated(Date_from,fromLast = T),!duplicated(Date_to))
# A tibble: 15 x 4
# Groups: Parameter [3]
# Date_from Date_to concentration Parameter
# <date> <date> <dbl> <chr>
# 1 2013-01-01 2013-01-31 NA A
# 2 2013-02-01 2013-05-07 1.50 A
# 3 2013-05-10 2013-08-12 2.50 A
# 4 2013-08-13 2013-11-18 1.50 A
# 5 2013-11-19 2013-12-31 NA A
# 6 2013-01-01 2013-01-31 NA B
# 7 2013-02-01 2013-05-07 3.50 B
# 8 2013-05-10 2013-08-12 1.50 B
# 9 2013-08-13 2013-11-18 2.50 B
# 10 2013-11-19 2013-12-31 NA B
# 11 2013-01-01 2013-01-31 NA C
# 12 2013-02-01 2013-05-07 1.50 C
# 13 2013-05-10 2013-08-12 3.50 C
# 14 2013-08-13 2013-11-18 3.00 C
# 15 2013-11-19 2013-12-31 NA C

This seems like it requires a combination of different packages to attack it. I am using tidyr, data.table, and I used lubridate.
date.start <- seq.Date(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")
Date.Int <- data.frame(Date_from = date.start, Date_to = date.start)
y_wide <- y %>% spread(Parameter, concentration)
y_wide <- as.data.table(setkey(as.data.table(y_wide), Date_from, Date_to))
Date.Int <- as.data.table(setkey(as.data.table(Date.Int), Date_from, Date_to))
dats <- foverlaps(Date.Int, y_wide, nomatch = NA)
fin.dat <- dats %>%
mutate(A = ifelse(is.na(A), -5, A),
seqs = cumsum(!is.na(A) & A != lag(A, default = -5))) %>%
group_by(seqs) %>%
summarise(Date_from = first(i.Date_from),
Date_to = last(i.Date_to) ,
A = first(A),
B = first(B),
C = first(C)) %>%
mutate(A = ifelse(A == -5, NA, A)) %>%
ungroup()%>%
gather(Concentration, Parameter, A:C) %>%
mutate(Concentration = ifelse(is.na(Parameter), NA, Concentration))
Okay, so I created a vector of dates from a start point to an end point (date.start); then I turned into a data.frame with the same interval names and interval dates for Date.Int. This is because foverlaps needs to compare two intervals (same date start and end dates in Date.Int are now officially intervals). I then took your data you provided and spread, turning it from long format data to wide format data and turned that into a data.table. keying a data.table sets up how it should be arranged, and when using foverlaps you have to key the start dates and end dates (in that order). foverlaps determines if an interval falls within another interval of dates. If you print out dats, you will see a bunch of lines with NA for everything because they did not fall within an interval. So now we have to group these in some manner. I picked grouping by values of "A" in dats. The grouping variable is called seqs. But then I summarised the data, and then switched it back from wide format to long format and replaced the appropriate NA values.

R Difference in time between rows

I've triangulated information from other SO answers for the below code, but getting stuck with an error message. Searched SO for similar errors and resolutions but haven't been able to figure it out, so help is appreciated.
For every group ("id"), I want to get the difference between the start times for consecutive rows.
Reproducible data:
require(dplyr)
df <-data.frame(id=as.numeric(c("1","1","1","2","2","2")),
start= c("1/31/17 10:00","1/31/17 10:02","1/31/17 10:45",
"2/10/17 12:00", "2/10/17 12:20","2/11/17 09:40"))
time <- strptime(df$start, format = "%m/%d/%y %H:%M")
df %>%
group_by(id)%>%
mutate(diff = time - lag(time),
diff_mins = as.numeric(diff, units = 'mins'))
Gets me error:
Error in mutate_impl(.data, dots) :
Column diff must be length 3 (the group size) or one, not 6
In addition: Warning message:
In unclass(time1) - unclass(time2) :
longer object length is not a multiple of shorter object length

Do you mean something like this?
There is no need for lag here, a simple diff on the grouped times is sufficient.
df %>%
mutate(start = as.POSIXct(start, format = "%m/%d/%y %H:%M")) %>%
group_by(id) %>%
mutate(diff = c(0, diff(start)))
## A tibble: 6 x 3
## Groups: id [2]
# id start diff
# <dbl> <dttm> <dbl>
#1 1. 2017-01-31 10:00:00 0.
#2 1. 2017-01-31 10:02:00 2.
#3 1. 2017-01-31 10:45:00 43.
#4 2. 2017-02-10 12:00:00 0.
#5 2. 2017-02-10 12:20:00 20.
#6 2. 2017-02-11 09:40:00 1280.

You can use lag and difftime (per Hadley):
df %>%
mutate(time = as.POSIXct(start, format = "%m/%d/%y %H:%M")) %>%
group_by(id) %>%
mutate(diff = difftime(time, lag(time)))
# A tibble: 6 x 4
# Groups: id [2]
id start time diff
<dbl> <fct> <dttm> <time>
1 1. 1/31/17 10:00 2017-01-31 10:00:00 <NA>
2 1. 1/31/17 10:02 2017-01-31 10:02:00 2
3 1. 1/31/17 10:45 2017-01-31 10:45:00 43
4 2. 2/10/17 12:00 2017-02-10 12:00:00 <NA>
5 2. 2/10/17 12:20 2017-02-10 12:20:00 20
6 2. 2/11/17 09:40 2017-02-11 09:40:00 1280

r subset dataset by date range over multiple years

I am working with a multi-year dataset that has columns for date (%Y-%m-%d) and daily values for several variables.
In R, how do I subset the data by a date range (i.e., June 29 +/- 5 days) but capture the data from all years?
DATE A B C
1996-06-10 12:00:00 178.0 24.1 1.7
1996-06-11 12:00:00 184.1 30.2 1.1
1996-06-12 12:00:00 187.2 29.4 1.8
1996-06-13 12:00:00 194.4 35.0 5.3
1996-06-14 12:00:00 200.3 35.9 1.5
1996-06-15 12:00:00 138.9 15.1 0.0
...

1) Base R
Let yrs be all unique years in the data and targets be each of those years with the target's month and day. Then create dates which contains all dates within delta days of any value in targets. Note that sapply strips dates of its "Date" class but that does not matter since it is only subsequently used in %in% and that ignores the class. Finally subset DF down to those rows whose DATE is in dates. No packages are used.
# inputs (also DF defined in Note at end)
target <- "06-19"
delta <- 5
DATE <- as.Date(DF$DATE)
yrs <- unique(format(DATE, "%Y"))
targets <- as.Date(paste(yrs, target, sep = "-"))
dates <- c(sapply(targets, "+", seq(-delta, delta)))
DF[DATE %in% dates, ]
giving:
DATE A B C
5 1996-06-14 12:00:00 200.3 35.9 1.5
6 1996-06-15 12:00:00 138.9 15.1 0.0
2) sqldf
Alternately, this can be done using a single SQL statement. Note that we assume that the DATE column is character since the question referred to it being in a particular format. Now, using the same inputs the inner select generates target dates from each year and then the outer select joins DF to those rows within delta days of any target date. We use the H2 database backend here since it has better date support than SQLite.
library(sqldf)
library(RH2)
# inputs (also DF defined in Note at end)
target <- "06-19"
delta <- 5
fn$sqldf("select DF.* from DF
join (select distinct cast(substr(DATE, 1, 4) || '-' || '$target' as DATE) as target
from DF)
on cast(substr(DATE, 1, 10) as DATE) between target - $delta and target + $delta")
giving:
DATE A B C
1 1996-06-14 12:00:00 200.3 35.9 1.5
2 1996-06-15 12:00:00 138.9 15.1 0.0
We could simplify the SQL somewhat if DATE is of R's "Date" class. That is, replace the sqldf statement above with:
DF2 <- transform(DF, DATE = as.Date(DATE))
fn$sqldf("select DF2.* from DF2
join (select distinct cast(year(DATE) || '-' || '$target' as DATE) as target from DF2)
on DATE between target - $delta and target + $delta")
giving:
DATE A B C
1 1996-06-14 200.3 35.9 1.5
2 1996-06-15 138.9 15.1 0.0
Note
The input DF is assumed to be:
DF <- structure(list(DATE = c("1996-06-10 12:00:00", "1996-06-11 12:00:00",
"1996-06-12 12:00:00", "1996-06-13 12:00:00", "1996-06-14 12:00:00",
"1996-06-15 12:00:00"), A = c(178, 184.1, 187.2, 194.4, 200.3,
138.9), B = c(24.1, 30.2, 29.4, 35, 35.9, 15.1), C = c(1.7, 1.1,
1.8, 5.3, 1.5, 0)), .Names = c("DATE", "A", "B", "C"), row.names = c(NA,
-6L), class = "data.frame")

A base R attempt.
Stealing the example data from the other answer by Kevin:
df <- data.frame(
my_date = seq.Date(as.Date("1990-01-01"), as.Date("1999-12-31"), by = 1),
x = rnorm(3652),
y = rnorm(3652),
z = rnorm(3652)
)
Set your variables for the selection:
month_num <- 6
day_num <- 29
bound <- 5
Find the key dates in your range of years:
keydates <- as.Date(sprintf(
"%d-%02d-%02d",
do.call(seq, as.list(as.numeric(range(format(df$my_date, "%Y"))))),
month_num,
day_num
))
Make a selection:
out <- df[df$my_date %in% outer(keydates, -bound:bound, `+`),]
Check that it worked:
table(format(out$my_date, "%m-%d"))
#06-24 06-25 06-26 06-27 06-28 06-29 06-30 07-01 07-02 07-03 07-04
# 10 10 10 10 10 10 10 10 10 10 10
One valid value for each day/month for each year 1990 to 1999, centred on "06-29" with a range of 5 days either side

You can use lubridate intervals to provide valid date ranges and then use a purrr map to run each interval over your data to filter.
library(dplyr)
library(lubridate)
library(magrittr) # only because I've used the "exposition" (%$%) pipe
library(purrr)
df <- tibble(
my_date = as.POSIXct(
seq.Date(as.Date("1990-01-01"), as.Date("1999-12-31"), by = 1),
tz = "UTC"
),
x = rnorm(3652),
y = rnorm(3652),
z = rnorm(3652)
)
month_num <- 6
day_num <- 29
bound <- 5
date_span <- df %>%
select(my_date) %>%
filter(month(my_date) == month_num & day(my_date) == day_num) %>%
mutate(
start = my_date - days(bound),
end = my_date + days(bound)
) %$%
interval(start, end, tzone = "UTC")
map_dfr(date_span, ~filter(df, my_date %within% .x))
# # A tibble: 110 x 4
# my_date x y z
# <dttm> <dbl> <dbl> <dbl>
# 1 1990-06-24 10:00:00 0.404 1.33 1.58
# 2 1990-06-25 10:00:00 0.351 -1.73 0.665
# 3 1990-06-26 10:00:00 -0.512 1.01 1.72
# 4 1990-06-27 10:00:00 1.55 0.417 -0.126
# 5 1990-06-28 10:00:00 1.86 1.18 0.322
# 6 1990-06-29 10:00:00 -0.0193 -0.105 0.356
# 7 1990-06-30 10:00:00 0.844 -0.712 1.51
# 8 1990-07-01 10:00:00 -0.431 0.451 -2.19
# 9 1990-07-02 10:00:00 1.74 -0.0650 -0.866
# 10 1990-07-03 10:00:00 0.965 -0.506 -0.0690
# # ... with 100 more rows

You could also go via the Julian day, which allows you to do basic arithmetic operations (e.g. ± 5 days) without the need to convert back and forth between Date and character objects. Keep in mind that your target date translates into a different Julian day during leap years, so you'll need to extract this piece of information somehow (use lubridate::leap_year if you don't like the base R approach below):
## convert dates to julian day
dat$JULDAY = format(
dat$DATE
, "%j"
)
## target date (here 19 june) as julian day
dat$TARGET = ifelse(
as.integer(
format(
dat$DATE
, "%y"
)
) %% 4 == 0
, 171 # leap year
, 170 # common year
)
## create subset
subset(
dat
, JULDAY >= (TARGET - 5) & JULDAY <= (TARGET + 5)
, select = c("DATE", "A", "B", "C")
)
# DATE A B C
# 5 1996-06-14 12:00:00 200.3 35.9 1.5
# 6 1996-06-15 12:00:00 138.9 15.1 0.0

Grouping every n minutes with dplyr

I have a dataset containing 10 events occuring at a certain time on a given day, with corresponding value for each event:
d1 <- data.frame(date = as.POSIXct(c("21/05/2010 19:59:37", "21/05/2010 08:40:30",
"21/05/2010 09:21:00", "21/05/2010 22:29:50", "21/05/2010 11:27:34",
"21/05/2010 18:25:14", "21/05/2010 15:16:01", "21/05/2010 09:41:53",
"21/05/2010 15:01:29", "21/05/2010 09:02:06"), format ="%d/%m/%Y %H:%M:%S"),
value = c(11313,42423,64645,643426,1313313,1313,3535,6476,11313,9875))
I want to aggregate the results every 3 minutes, in a standard dataframe format (from "21/05/2010 00:00:00" to "21/05/2010 23:57:00", so that the dataframe has 480 bins of 3 minutes each)
First, I create a dataframe containing bins of 3 minutes each:
d2 <- data.frame(date = seq(as.POSIXct("2010-05-21 00:00:00"),
by="3 min", length.out=(1440/3)))
Then, I merge the two dataframes together and remove NAs:
library(dplyr)
m <- merge(d1, d2, all=TRUE) %>% mutate(value = ifelse(is.na(value),0,value))
Finally, I use period.apply() from the xts package to sum the values for each bin:
library(xts)
a <- period.apply(m$value, endpoints(m$date, "minutes", 3), sum)
Is there a more efficient way to do this ? It does not feel optimal.
Update #1
I adjusted my code after Joshua's answer:
library(xts)
startpoints <- function (x, on = "months", k = 1) {
head(endpoints(x, on, k) + 1, -1)
}
m <- seq(as.POSIXct("2010-05-21 00:00:00"), by="3 min", length.out=1440/3)
x <- merge(value=xts(d1$value, d1$date), xts(,m))
y <- period.apply(x, c(0,startpoints(x, "minutes", 3)), sum, na.rm=TRUE)
I wasn't aware that na.rm=TRUE could be used with period.apply(), which now allows me to skip mutate(value = ifelse(is.na(value),0,value)). It's a step forward and I'm actually pleased with the xts approach here but I would like to know if there is a pure dplyr solution I could use in such a situation.
Update #2
After trying Khashaa's answer, I had an error because my timezone was not specified. So I had:
> tail(d4)
interval sumvalue
476 2010-05-21 23:45:00 NA
477 2010-05-21 23:48:00 NA
478 2010-05-21 23:51:00 NA
479 2010-05-21 23:54:00 NA
480 2010-05-21 23:57:00 11313
481 2010-05-22 02:27:00 643426
> d4[450,]
interval sumvalue
450 2010-05-21 22:27:00 NA
Now, after Sys.setenv(TZ="UTC"), it all works fine.

lubridate-dplyr-esque solution.
library(lubridate)
library(dplyr)
d2 <- data.frame(interval = seq(ymd_hms('2010-05-21 00:00:00'), by = '3 min',length.out=(1440/3)))
d3 <- d1 %>%
mutate(interval = floor_date(date, unit="hour")+minutes(floor(minute(date)/3)*3)) %>%
group_by(interval) %>%
mutate(sumvalue=sum(value)) %>%
select(interval,sumvalue)
d4 <- merge(d2,d3, all=TRUE) # better if left_join is used
tail(d4)
# interval sumvalue
#475 2010-05-21 23:42:00 NA
#476 2010-05-21 23:45:00 NA
#477 2010-05-21 23:48:00 NA
#478 2010-05-21 23:51:00 NA
#479 2010-05-21 23:54:00 NA
#480 2010-05-21 23:57:00 NA
d4[450,]
# interval sumvalue
#450 2010-05-21 22:27:00 643426
If you are comfortable working with Date (I am not), you can dispense with lubridate, and replace the final merge with left_join.

If you need to group data into n minute bins, the floor_date function can allow multiple units to be specified within the unit argument of the function. For example:
library(lubridate)
x <- ymd_hms("2009-08-03 12:25:59.23")
floor_date(x, unit = "3minutes")
"2009-08-03 12:24:00 UTC"
Using your example:
library(lubridate)
library(tidyverse)
# make complete time sequence
d2 <- data.frame(timePeriod = seq(as.POSIXct("2010-05-21 00:00:00"),
by="3 min", length.out=(1440/3)))
d1 %>%
mutate(timePeriod = floor_date(date, "3minutes")) %>%
group_by(timePeriod) %>%
summarise(sum = sum(value)) %>%
right_join(d2)

I'm not sure about a dplyr solution, but here's an xts solution:
startpoints <- function (x, on = "months", k = 1) {
head(endpoints(x, on, k) + 1, -1)
}
m3 <- seq(as.POSIXct("2010-05-21 00:00:00"),
by="3 min", length.out=1440/3)
x <- merge(value=xts(d1$value, d1$date), xts(,m3))
y <- period.apply(x, c(0,startpoints(x, "minutes", 3)), sum, na.rm=TRUE)
Update: Here's another xts solution that is a bit more careful about correctly aligning the aggregated values. Not to suggest the prior solution was wrong, but this solution is easier to follow and repeat in other analysis.
m3 <- seq(as.POSIXct("2010-05-20 23:59:59.999"),
by="3 min", length.out=1440/3)
x <- merge(value=xts(d1$value, d1$date), xts(,m3))
y <- period.apply(x, endpoints(x, "minutes", 3), sum, na.rm=TRUE)
y <- align.time(y, 60*3)

Recently, the padr package has been developed which can also solve this in a clean way.
library(lubridate)
library(dplyr)
library(padr)
d1 <- data.frame(date = as.POSIXct(c("21/05/2010 19:59:37", "21/05/2010 08:40:30",
"21/05/2010 09:21:00", "21/05/2010 22:29:50", "21/05/2010 11:27:34",
"21/05/2010 18:25:14", "21/05/2010 15:16:01", "21/05/2010 09:41:53",
"21/05/2010 15:01:29", "21/05/2010 09:02:06"), format ="%d/%m/%Y %H:%M:%S"),
value = c(11313,42423,64645,643426,1313313,1313,3535,6476,11313,9875))
res <- d1 %>%
as_tibble() %>%
arrange(date) %>%
# Thicken the results to fall in 3 minute buckets
thicken(
interval = '3 min',
start_val = as.POSIXct('2010-05-21 00:00:00'),
colname = "date_pad") %>%
# Pad the results to fill in the rest of the 3 minute buckets
pad(
interval = '3 min',
by = 'date_pad',
start_val = as.POSIXct('2010-05-21 00:00:00'),
end_val = as.POSIXct('2010-05-21 23:57:00')) %>%
select(date_pad, value)
res
#> # A tibble: 480 x 2
#> date_pad value
#> <dttm> <dbl>
#> 1 2010-05-21 00:00:00 NA
#> 2 2010-05-21 00:03:00 NA
#> 3 2010-05-21 00:06:00 NA
#> 4 2010-05-21 00:09:00 NA
#> 5 2010-05-21 00:12:00 NA
#> 6 2010-05-21 00:15:00 NA
#> 7 2010-05-21 00:18:00 NA
#> 8 2010-05-21 00:21:00 NA
#> 9 2010-05-21 00:24:00 NA
#> 10 2010-05-21 00:27:00 NA
#> # ... with 470 more rows
res[450,]
#> # A tibble: 1 x 2
#> date_pad value
#> <dttm> <dbl>
#> 1 2010-05-21 22:27:00 643426

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Calculate the difference between to date columns of a dataframe - r

Related

Build datetime column in R

Complete data frame with missing date ranges for multiple parameters

R Difference in time between rows

r subset dataset by date range over multiple years

Grouping every n minutes with dplyr

Categories

Resources