Get means of every n rows without groups in R

My data (df) looks similar to this:
date         address1  address2
2015-01-01          2         8
2015-01-02          3         7
2015-01-03          7         3
2015-01-04          3         1
2015-01-05          9         4
2015-01-06          3         4
I want to get the 3-day average of the values at each address, like this:
date         address1  address2
2015-01-03          4         6
2015-01-06          5         3
I have tried to extract a date for every three days with d <- date[seq(1, length(date), by = 3)]. I calculated the values with dat <- rowsum(df[,-1], rep(1:6, each = 3)), then divided the whole data frame by 3 and combined d and dat.
I have tried to find a rowmean that works like rowsum, but did not manage to. Rolling means also do not suit my case, because they average overlapping rows (rows are used multiple times).
Please help me improve my method. Thanks a lot.
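For reference, here is a runnable sketch of the attempt described above (assuming the six-row df shown at the top). Note that the grouping vector passed to rowsum needs one label per row, so rep(1:2, each = 3) rather than rep(1:6, each = 3), and seq(3, ..., by = 3) picks the last date of each block to match the desired output:
d   <- df$date[seq(3, nrow(df), by = 3)]          # last date of each block of three rows
dat <- rowsum(df[, -1], rep(1:2, each = 3)) / 3   # block sums divided by 3 = block means
data.frame(date = d, dat)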

You can create a group for every 3 rows and take the mean of all the "address" columns -
library(dplyr)
df %>%
  mutate(date = as.Date(date)) %>%
  group_by(grp = ceiling(row_number() / 3)) %>%
  summarise(date = last(date),
            across(starts_with('address'), mean, na.rm = TRUE)) %>%
  select(-grp)
#        date address1 address2
#      <date>    <dbl>    <dbl>
#1 2015-01-03        4        6
#2 2015-01-06        5        3
Another option is to cut the dates into 3-day bins, but this labels each group with its starting date.
df %>%
  mutate(date = as.Date(date)) %>%
  group_by(date = cut(date, '3 days')) %>%
  summarise(across(starts_with('address'), mean, na.rm = TRUE))
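If you would rather have each bin labelled with its last date (a small tweak on the above, assuming every 3-day window is fully populated), you can shift the cut labels by 2 days:
df %>%
  mutate(date = as.Date(date)) %>%
  group_by(date = as.Date(cut(date, '3 days')) + 2) %>%   # start of bin + 2 = end of bin
  summarise(across(starts_with('address'), ~ mean(.x, na.rm = TRUE)))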

The solutions below use the input shown reproducibly in the Note at the end. The first two use only base R. The first requires that the number of rows be a multiple of 3, but the others do not have this restriction.
1) rowsum Create a grouping vector, date, and use it in the second argument to rowsum giving the numeric matrix shown.
nr <- nrow(df)
date <- df$date[ 3 * col(matrix(0, 3, nr/3)) ]
rowsum(df[-1], date) / 3
## address1 address2
## 2015-01-03 4 6
## 2015-01-06 5 3
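To see what the grouping vector contains for this 6-row input (a short trace of the construction above):
# col(matrix(0, 3, nr/3)) is a 3 x 2 matrix whose entries, read down the columns, are 1 1 1 2 2 2,
# so 3 * col(...) gives the indices 3 3 3 6 6 6 and
# date <- df$date[c(3, 3, 3, 6, 6, 6)]
#   ## "2015-01-03" "2015-01-03" "2015-01-03" "2015-01-06" "2015-01-06" "2015-01-06"
# i.e. every row in a block of three is labelled with the date of the block's last row.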
2) aggregate Alternatively use aggregate, giving a 3-column data frame.
nr <- nrow(df)
date <- ave(df$date, seq(0, length = nr) %/% 3, FUN = max)
aggregate(df[-1], data.frame(date), mean)
## date address1 address2
## 1 2015-01-03 4 6
## 2 2015-01-06 5 3
3) collap collap from the collapse package can be used in place of aggregate. date is from (2).
library(collapse)
collap(df[-1], date)
## date address1 address2
## 1 2015-01-03 4 6
## 2 2015-01-06 5 3
4) data.table Using data.table and date from (2) this returns a data.table (which is also a data frame).
library(data.table)
as.data.table(df[, -1])[, lapply(.SD, mean), by = .(date)]
## date address1 address2
## 1: 2015-01-03 4 6
## 2: 2015-01-06 5 3
Note
The input in reproducible form is:
df <-
structure(list(date = c("2015-01-01", "2015-01-02", "2015-01-03",
"2015-01-04", "2015-01-05", "2015-01-06"), address1 = c(2L, 3L,
7L, 3L, 9L, 3L), address2 = c(8L, 7L, 3L, 1L, 4L, 4L)), class = "data.frame", row.names = c(NA,
-6L))

Another base R option with aggregate + ave
aggregate(
. ~ date,
transform(
df,
date = ave(date, ceiling(seq_along(date) / 3), FUN = max)
),
mean
)
gives
date address1 address2
1 2015-01-03 4 6
2 2015-01-06 5 3

How to combine data with same rownames to one column in R

I'm trying to move a large list with >200,000 character entries from this:
startTime 1
max 3
min 1
EndTime 2
avg 2
startTime 2
max ..
min ..
EndTime ..
avg ..
..
to a dataframe like this:
startTime max min EndTime avg
1 3 1 2 2
2 .. .. .. ..
I managed it with a for-loop, but it takes too much time. Is there a more efficient way that avoids looping?
Expanding your input data a bit you could use unstack from base R.
Input:
dat
# V1 V2
#1 startTime 1
#2 max 3
#3 min 1
#4 EndTime 2
#5 avg 2
#6 startTime 2
#7 max 3
#8 min 4
#9 EndTime 5
#10 avg 6
Result:
out <- unstack(dat, V2 ~ V1)
out
# avg EndTime max min startTime
#1 2 2 3 1 1
#2 6 5 3 4 2
If you want the column names in the same order as they appear in dat$V1, do
out <- out[unique(dat$V1)]
data
dat <- structure(list(V1 = c("startTime", "max", "min", "EndTime", "avg",
"startTime", "max", "min", "EndTime", "avg"), V2 = c(1L, 3L,
1L, 2L, 2L, 2L, 3L, 4L, 5L, 6L)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA,
-10L))
Simply transpose it
library( data.table )
dt <- data.table::fread(" startTime 1
max 3
min 1
EndTime 2
avg 2
startTime 2", header = FALSE)
as.data.table( t( dt ) )
# V1 V2 V3 V4 V5 V6
# 1: startTime max min EndTime avg startTime
# 2: 1 3 1 2 2 2
This is not an exact duplicate of How to reshape data from long to wide format? so I will answer.
First create a new column ID and then use one of the solutions in the duplicate. I will use the solution based on package reshape2.
pattern <- as.character(df1[1, 1])
ipat <- grep(pattern, df1[[1]])
df1$ID <- rep(seq_along(ipat), each = nrow(df1)/length(ipat))
library(reshape2)
result <- dcast(df1, ID ~ V1, value.var = "V2")[-1]
# avg EndTime max min startTime
#1 2 2 3 1 1
#2 1 3 4 2 2
Final clean-up: put the input dataset df1 back as it was.
df1 <- df1[-ncol(df1)]
Data.
df1 <- read.table(text = "
startTime 1
max 3
min 1
EndTime 2
avg 2
startTime 2
max 4
min 2
EndTime 3
avg 1
")
Here are some alternatives. They do not use any packages.
Assume the input DF shown reproducibly in the Note at the end.
1) xtabs The first line of code converts the first column to character in case it is a factor. We do not need this with the data shown in the Note, but it doesn't hurt and might be useful if the column were a factor, so that it is in a known state.
Then convert the V1 column to a factor having levels in the order they appear, so that they don't get rearranged upon output. Also define nicer names and create a Group number vector which numbers the first group of 5 rows as 1, the second group as 2, and so on.
Finally use xtabs to create the desired table. If you prefer a data frame as the output rather than a table then use as.data.frame(xt).
DF2 <- transform(DF, V1 = as.character(V1))
DF2 <- transform(DF2, Stat = factor(V1, levels = V1[1:5]),
Value = V2,
Group = cumsum(V1== "startTime"))
xt <- xtabs(Value ~ Group + Stat, DF2)
xt
giving:
Stat
Group startTime max min EndTime avg
1 1 3 1 2 2
2 2 4 1 3 2
2) matrix Even shorter is this one-liner. It gives a matrix. Use as.data.frame(m) if you want a data frame.
m <- matrix(DF$V2,, 5, byrow = TRUE, list(NULL, DF$V1[1:5]))
m
giving:
startTime max min EndTime avg
[1,] 1 3 1 2 2
[2,] 2 4 1 3 2
Note
The input in reproducible form. I have added a few rows.
Lines <- "
startTime 1
max 3
min 1
EndTime 2
avg 2
startTime 2
max 4
min 1
EndTime 3
avg 2"
DF <- read.table(text = Lines, as.is = TRUE)
A tidyverse solution using @markus' data would be:
library(tidyverse)
dat %>%
  group_by(tmp = cumsum(V1 == "startTime")) %>%
  spread(V1, V2) %>%
  ungroup %>%
  select(-tmp)
# # A tibble: 2 x 5
# avg EndTime max min startTime
# <int> <int> <int> <int> <int>
# 1 2 2 3 1 1
# 2 6 5 3 4 2
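spread() has since been superseded in tidyr; with tidyr 1.0 or later the same idea can be written with pivot_wider() (a sketch using @markus' dat, not part of the original answer):
dat %>%
  mutate(tmp = cumsum(V1 == "startTime")) %>%   # record number: increments at each "startTime"
  pivot_wider(names_from = V1, values_from = V2) %>%
  select(-tmp)
Note that pivot_wider() keeps the columns in order of first appearance (startTime, max, min, EndTime, avg) rather than sorting them alphabetically.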

calculate timeline for different subjects in dataframe

I have data like
subject date number
1 1/2/01 4
1 3/2/01 6
1 10/2/01 7
2 1/1/01 2
2 4/1/01 3
I want to get R to work out the number of days since the first sample for each subject, e.g.:
Subject days
1 0
1 2
1 9
2 0
2 3
How can I do this? I have converted the dates using lubridate.
Something like:
for(i in 1:nrow(data)){
  if(data$date[i] != data$date[i - 1]) {
    data$timeline <- data$date[i] - data$date[i - 1]
  }
}
I get the error:
argument is of length 0 - I think the problem is the first row, where there is no preceding row (data$date[0] is a zero-length vector, so the if() condition has nothing to test)?
I would use dplyr to do some grouping and data manipulation. Note that we first have to convert your date into something R will recognize as a date.
library(dplyr)
dat$Date <- as.Date(dat$date, '%d/%m/%y')
dat %>%
  group_by(subject) %>%
  mutate(days = Date - min(Date))
# subject date number Date days
# <int> <chr> <int> <date> <time>
# 1 1 1/2/01 4 2001-02-01 0
# 2 1 3/2/01 6 2001-02-03 2
# 3 1 10/2/01 7 2001-02-10 9
# 4 2 1/1/01 2 2001-01-01 0
# 5 2 4/3/01 3 2001-03-04 62
here's the data:
dat <- structure(list(subject = c(1L, 1L, 1L, 2L, 2L), date = c("1/2/01",
"3/2/01", "10/2/01", "1/1/01", "4/3/01"), number = c(4L, 6L,
7L, 2L, 3L), Date = structure(c(11354, 11356, 11363, 11323, 11385
), class = "Date")), .Names = c("subject", "date", "number",
"Date"), row.names = c(NA, -5L), class = "data.frame")
Using the input shown in the Note, convert the date column to Date class (assuming it is in the form dd/mm/yy) and then use ave to subtract the least date from all the dates for each subject. If the input is sorted as in the question we could optionally use x[1] instead of min(x). No packages are used.
data$date <- as.Date(data$date, "%d/%m/%y")
diff1 <- function(x) x - min(x)
with(data, data.frame(subject, days = ave(as.numeric(date), subject, FUN = diff1)))
giving:
subject days
1 1 0
2 1 2
3 1 9
4 2 0
5 2 62
Note
The input used, in reproducible form, is:
Lines <- "
subject date number
1 1/2/01 4
1 3/2/01 6
1 10/2/01 7
2 1/1/01 2
2 4/3/01 3"
data <- read.table(text = Lines, header = TRUE)

R Finding times between events and admissions, within patients

I have a series of admissions for patients (dataframe 'admissions' below) and a series of events (2nd dataframe called 'events').
I am interested in whether events occurred within 5 days after an admission. Obviously matches have to be made within patient ID ('id').
In real life, the admissions data frame contains >500k admissions on 100k patients. One patient might have multiple admissions, and multiple events. Not all patients will have an event.
admissions <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L), date = structure(c(16436,
16443, 16574, 16468, 16481, 16494), class = "Date")), .Names = c("id",
"date"), row.names = c(NA, 6L), class = "data.frame")
> admissions
id date
1 1 2015-01-01
2 1 2015-01-08
3 1 2015-05-19
4 2 2015-02-02
5 2 2015-02-15
6 2 2015-02-28
events <- structure(list(id = c(1L, 1L, 2L), date = structure(c(16453,
16578, 16467), class = "Date")), .Names = c("id", "date"), row.names = 7:9, class = "data.frame")
> events
id date
7 1 2015-01-18
8 1 2015-05-23
9 2 2015-02-01
I guess I just need the minimum difference in days (only positive values considered) for each event relative to the admissions, matched within patients.
Event 1 (id ==1): +10 days (10 days after 08/01/2015)
Event 2 (id ==1): +4 days
Event 3 (id ==2): -1 days
I can then select those events that fall within my window (which will probably be 5 days).
My guess would be that lapply() is involved, but for some reason the apply functions are not very natural to me (yet!).
Using dplyr:
library(dplyr)
mutate(events, event_id=row_number()) %>% # Add event id
right_join(admissions, by="id") %>% # Join with admissions
rename(adm_date = date.y, ev_date = date.x) %>% # Clean names
mutate(diff = ev_date - adm_date) %>% # Compute difference
filter(diff >= 0) %>% # Filter
group_by(event_id) %>%
arrange(diff) %>% # Sort ascending by diff by event_id
summarise_each(funs(first), ev_date, adm_date, diff) # Get nearest
Source: local data frame [2 x 4]
event_id ev_date adm_date diff
1 1 2015-01-18 2015-01-08 10 days
2 2 2015-05-23 2015-05-19 4 days
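Since the stated goal is events within 5 days after an admission, one small addition to the pipeline above (not in the original answer) is to tighten the filter step:
# replace the filter step above with
filter(diff >= 0, diff <= 5) %>%   # keep only events falling in the 5-day window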
Using data.table rolling join:
keycols <- c("id", "date")
admissions_dt <- admissions %>% mutate(adm_date = date) %>% as.data.table()
setkeyv(admissions_dt, keycols)
events_dt <- mutate(events, event_id=row_number()) %>% as.data.table()
setkeyv(events_dt, keycols)
admissions_dt[events_dt, roll=10][order(event_id)]
id date adm_date event_id
1: 1 2015-01-18 2015-01-08 1
2: 1 2015-05-23 2015-05-19 2
3: 2 2015-02-01 <NA> 3
Using data.table 1.9.5 for its on= feature.
For each row in event, find the index corresponding to the closest date <= admissions$date.
idx = setDT(admissions)[events, which=TRUE, roll=TRUE, on=c("id", "date")]
idx
# [1] 2 3 NA
If you already know you only want a 5-day window, then you can use roll=5 instead of roll=TRUE. roll=<positive number> performs a LOCF rolling join.
The indices correspond to matching rows in admission for each row of event. So we can now extract the date as follows:
setDT(events)[, adm_date := admissions$date[idx]]
# id date adm_date
# 1: 1 2015-01-18 2015-01-08
# 2: 1 2015-05-23 2015-05-19
# 3: 2 2015-02-01 <NA>
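To go from here to the 5-day window, one possible follow-up (a sketch; the column name days_since_adm is illustrative, not from the original answer) computes the gap in days and filters on it:
events[, days_since_adm := as.numeric(date - adm_date)]   # NA where no prior admission exists
events[!is.na(days_since_adm) & days_since_adm <= 5]       # events within 5 days of an admission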

Replace NA with values from previous date

I have a dated data frame like this one, with approximately 1 million rows
id date variable
1 1 2015-01-01 NA
2 1 2015-01-02 -1.1874087
3 1 2015-01-03 -0.5936396
4 1 2015-01-04 -0.6131957
5 1 2015-01-05 1.0291688
6 1 2015-01-06 -1.5810152
Reproducible example is here:
#create example data set
Df <- data.frame(id = factor(rep(1:3, each = 10)),
date = rep(seq.Date(from = as.Date('2015-01-01'),
to = as.Date('2015-01-10'), by = 1),3),
variable = rnorm(30))
Df$variable[c(1,7,12,18,22,23,29)] <- NA
What I want to do is replace NA values in variable with the values from the previous date for each id. I created a loop which works but is very slow (you can find it below). Can you please advise a fast alternative for this task? Thank you!
library(dplyr)
#create new variable
Df$variableNew <- Df$variable
#create row numbers vector
Df$n <- 1:dim(Df)[1]
#order data frame by date
Df <- arrange(Df, date)
for (id in levels(Df$id)){
  I <- Df$n[Df$id == id] # create vector of rows for specific id
  for (row in 1:length(I)){ # if variable is NA for the first date, change it to the mean value
    if (is.na(Df$variableNew[I[1]])) {
      Df$variableNew[I[row]] <- mean(Df$variable, na.rm = T)
    }
    if (is.na(Df$variableNew[I[row]])){ # if variable is NA, assign the value from the previous date
      Df$variableNew[I[row]] <- Df$variableNew[I[row - 1]]
    }
  }
}
This data.table solution should be extremely fast.
library(zoo) # for na.locf(...)
library(data.table)
setDT(Df)[,variable:=na.locf(variable, na.rm=FALSE),by=id]
Df[,variable:=if (is.na(variable[1])) c(mean(variable,na.rm=TRUE),variable[-1]) else variable,by=id]
Df
# id date variable
# 1: 1 2015-01-01 -0.288720759
# 2: 1 2015-01-02 -0.005344028
# 3: 1 2015-01-03 0.707310667
# 4: 1 2015-01-04 1.034107735
# 5: 1 2015-01-05 0.223480415
# 6: 1 2015-01-06 -0.878707613
# 7: 1 2015-01-07 -0.878707613
# 8: 1 2015-01-08 -2.000164945
# 9: 1 2015-01-09 -0.544790740
# 10: 1 2015-01-10 -0.255670709
# ...
So this replaces all embedded NAs using LOCF by id, and then makes a second pass replacing any leading NA with the average of variable for that id. Note that if you do this in the reverse order you may get a different answer, because the mean would then be computed before the LOCF-filled values are part of the column.
If you get the dev version of tidyr (0.3.0) available on GitHub, there is a function fill which will do exactly this:
#devtools::install_github("hadley/tidyr")
library(tidyr)
library(dplyr)
Df %>% group_by(id) %>%
fill(variable)
It will not fill the first value - we can do that with a mutate and replace:
Df %>% group_by(id) %>%
mutate(variable = ifelse(is.na(variable) & row_number()==1,
replace(variable, 1, mean(variable, na.rm = TRUE)),
variable)) %>%
fill(variable)
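With released tidyr the same fill() call works; as an alternative to the mutate/replace step above, one possible sketch (not part of the original answer) fills any remaining leading NA with the group mean using dplyr::coalesce, starting again from the original Df:
Df %>%
  group_by(id) %>%
  fill(variable) %>%                                                      # carry last observation forward within id
  mutate(variable = coalesce(variable, mean(variable, na.rm = TRUE))) %>% # leading NAs get the group mean
  ungroup()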

How to flatten / merge overlapping time periods

I have a large data set of time periods, defined by a 'start' and an 'end' column. Some of the periods overlap.
I would like to combine (flatten / merge / collapse) all overlapping time periods to have one 'start' value and one 'end' value.
Some example data:
ID start end
1 A 2013-01-01 2013-01-05
2 A 2013-01-01 2013-01-05
3 A 2013-01-02 2013-01-03
4 A 2013-01-04 2013-01-06
5 A 2013-01-07 2013-01-09
6 A 2013-01-08 2013-01-11
7 A 2013-01-12 2013-01-15
Desired result:
ID start end
1 A 2013-01-01 2013-01-06
2 A 2013-01-07 2013-01-11
3 A 2013-01-12 2013-01-15
What I have tried:
require(dplyr)
data <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), class = "factor", .Label = "A"),
start = structure(c(1356998400, 1356998400, 1357084800, 1357257600,
1357516800, 1357603200, 1357948800), tzone = "UTC", class = c("POSIXct",
"POSIXt")), end = structure(c(1357344000, 1357344000, 1357171200,
1357430400, 1357689600, 1357862400, 1358208000), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), .Names = c("ID", "start", "end"), row.names = c(NA,
-7L), class = "data.frame")
remove.overlaps <- function(data){
  data2 <- data
  for (i in 1:length(unique(data$start))) {
    x3 <- filter(data2, start >= data$start[i] & start <= data$end[i])
    x4 <- x3[1,]
    x4$end <- max(x3$end)
    data2 <- filter(data2, start < data$start[i] | start > data$end[i])
    data2 <- rbind(data2, x4)
  }
  data2 <- na.omit(data2)
}
data <- remove.overlaps(data)
Here's a possible solution. The basic idea here is to compare lagged start date with the maximum end date "until now" using the cummax function and create an index that will separate the data into groups
data %>%
arrange(ID, start) %>% # as suggested by @Jonno in case the data is unsorted
group_by(ID) %>%
mutate(indx = c(0, cumsum(as.numeric(lead(start)) >
cummax(as.numeric(end)))[-n()])) %>%
group_by(ID, indx) %>%
summarise(start = first(start), end = last(end))
# Source: local data frame [3 x 4]
# Groups: ID
#
# ID indx start end
# 1 A 0 2013-01-01 2013-01-06
# 2 A 1 2013-01-07 2013-01-11
# 3 A 2 2013-01-12 2013-01-15
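To see how indx separates the groups for this example, here is a rough trace using the ID == "A" rows above (month-day only):
# start:                      01-01 01-01 01-02 01-04 01-07 01-08 01-12
# end:                        01-05 01-05 01-03 01-06 01-09 01-11 01-15
# lead(start):                01-01 01-02 01-04 01-07 01-08 01-12    NA
# cummax(end):                01-05 01-05 01-05 01-06 01-09 01-11 01-15
# lead(start) > cummax(end):      F     F     F     T     F     T   (last value dropped by [-n()])
# indx = c(0, cumsum(...)):       0     0     0     0     1     1     2   -> three groups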
@David Arenburg's answer is great - but I ran into an issue where an earlier interval ended after a later interval, and using last in the summarise call then resulted in the wrong end date. I'd suggest changing first(start) and last(end) to min(start) and max(end):
data %>%
group_by(ID) %>%
mutate(indx = c(0, cumsum(as.numeric(lead(start)) >
cummax(as.numeric(end)))[-n()])) %>%
group_by(ID, indx) %>%
summarise(start = min(start), end = max(end))
Also, as @Jonno Bourne mentioned, sorting by start and any grouping variables is important before applying the method.
For the sake of completeness, the IRanges package on Bioconductor has some neat functions which can be used to deal with date or date-time ranges. One of them is the reduce() function, which merges overlapping or adjacent ranges.
However, there is a drawback because IRanges works on integer ranges (hence the name), so the convenience of using IRanges functions comes at the expense of converting Date or POSIXct objects to and fro.
Also, it seems that dplyr doesn't play well with IRanges (at least judged by my limited experience with dplyr) so I use data.table:
library(data.table)
options(datatable.print.class = TRUE)
library(IRanges)
library(lubridate)
setDT(data)[, {
  ir <- reduce(IRanges(as.numeric(start), as.numeric(end)))
  .(start = as_datetime(start(ir)), end = as_datetime(end(ir)))
}, by = ID]
ID start end
<fctr> <POSc> <POSc>
1: A 2013-01-01 2013-01-06
2: A 2013-01-07 2013-01-11
3: A 2013-01-12 2013-01-15
A code variant is
setDT(data)[, as.data.table(reduce(IRanges(as.numeric(start), as.numeric(end))))[
, lapply(.SD, as_datetime), .SDcols = -"width"],
by = ID]
In both variants, as_datetime() from the lubridate package is used, which spares us from specifying the origin when converting numbers to POSIXct objects.
It would be interesting to see a benchmark comparison of the IRanges approaches vs David's answer.
It looks like I'm a little late to the party, but I took @zach's code and re-wrote it using data.table below. I didn't do comprehensive testing, but this seemed to run about 20% faster than the tidy version. (I couldn't test the IRanges method because the package is not yet available for R 3.5.1.)
Also, fwiw, the accepted answer doesn't capture the edge case in which one date range is totally within another (e.g., 2018-07-07 to 2018-07-14 is within 2018-05-01 to 2018-12-01). @zach's answer does capture that edge case.
library(data.table)
start_col = c("2018-01-01","2018-03-01","2018-03-10","2018-03-20","2018-04-10","2018-05-01","2018-05-05","2018-05-10","2018-07-07")
end_col = c("2018-01-21","2018-03-21","2018-03-31","2018-04-09","2018-04-30","2018-05-21","2018-05-26","2018-05-30","2018-07-14")
# create fake data, double it, add ID
# change row 17, such that each ID grouping is a little different
# also adds an edge case in which one date range is totally within another
# (this is the edge case not currently captured by the accepted answer)
d <- data.table(start_col = as.Date(start_col), end_col = as.Date(end_col))
d2<- rbind(d,d)
d2[1:(.N/2), ID := 1]
d2[(.N/2 +1):.N, ID := 2]
d2[17,end_col := as.Date('2018-12-01')]
# set keys (also orders)
setkey(d2, ID, start_col, end_col)
# get rid of overlapping transactions and do the date math
squished <- d2[,.(START_DT = start_col,
END_DT = end_col,
indx = c(0, cumsum(as.numeric(lead(start_col)) > cummax(as.numeric(end_col)))[-.N])),
keyby=ID
][,.(start=min(START_DT),
end = max(END_DT)),
by=c("ID","indx")
]
I think that you can solve this problem pretty nicely with dplyr and the ivs package, which is designed for working with interval vectors, exactly like what you have here. It is inspired by IRanges, but is more suitable for use in the tidyverse and is completely generic so it can handle date intervals automatically (no need to convert to numeric and back).
The key is to combine the start/end boundaries into a single interval vector column, and then use iv_groups(). This merges all of the overlapping intervals in the interval vector and returns the intervals that remain after the overlaps have been merged.
It seems like you want to do this by ID, so I've also grouped by ID.
library(ivs)
library(dplyr)
data <- tribble(
  ~ID, ~start, ~end,
  "A", "2013-01-01", "2013-01-05",
  "A", "2013-01-01", "2013-01-05",
  "A", "2013-01-02", "2013-01-03",
  "A", "2013-01-04", "2013-01-06",
  "A", "2013-01-07", "2013-01-09",
  "A", "2013-01-08", "2013-01-11",
  "A", "2013-01-12", "2013-01-15"
) %>%
  mutate(
    start = as.Date(start),
    end = as.Date(end)
  )
data
#> # A tibble: 7 × 3
#> ID start end
#> <chr> <date> <date>
#> 1 A 2013-01-01 2013-01-05
#> 2 A 2013-01-01 2013-01-05
#> 3 A 2013-01-02 2013-01-03
#> 4 A 2013-01-04 2013-01-06
#> 5 A 2013-01-07 2013-01-09
#> 6 A 2013-01-08 2013-01-11
#> 7 A 2013-01-12 2013-01-15
# Combine `start` and `end` into a single interval vector column
data <- data %>%
mutate(interval = iv(start, end), .keep = "unused")
# Note that this is a half-open interval!
data
#> # A tibble: 7 × 2
#> ID interval
#> <chr> <iv<date>>
#> 1 A [2013-01-01, 2013-01-05)
#> 2 A [2013-01-01, 2013-01-05)
#> 3 A [2013-01-02, 2013-01-03)
#> 4 A [2013-01-04, 2013-01-06)
#> 5 A [2013-01-07, 2013-01-09)
#> 6 A [2013-01-08, 2013-01-11)
#> 7 A [2013-01-12, 2013-01-15)
# It seems like you'd want to group by ID, so let's do that.
# Then we use `iv_groups()` which merges all overlapping intervals and returns
# the intervals that remain after all the overlaps have been merged
data %>%
  group_by(ID) %>%
  summarise(interval = iv_groups(interval), .groups = "drop")
#> # A tibble: 3 × 2
#> ID interval
#> <chr> <iv<date>>
#> 1 A [2013-01-01, 2013-01-06)
#> 2 A [2013-01-07, 2013-01-11)
#> 3 A [2013-01-12, 2013-01-15)
Created on 2022-04-05 by the reprex package (v2.0.1)
