Compute the maximum value by group and by a time "window" - r

For the following panel data (tracking the Value for unit "ID" over "Time"):
ID=c(1,1,1,1,1,2,2,2,2,2)
Time=c(1,2,3,4,5,1,2,3,4,5)
Value=c(1,9,4,8,5,2,5,9,7,6)
I would like to create a vector that holds, for each "ID", the maximum value over the last two days (assuming that the unit of Time is a day).
Output vector "Max_Value" would be as follows:
Max_Value=c(1,9,9,8,8,2,5,9,9,7)
To clarify, here's how Max_Value is computed for ID "1".
For ID "1", the maximum value at "Time 1" is 1, which is the maximum of {1}.
Similarly, for ID "1", the maximum value at "Time 2" is 9, which is the maximum of {1,9}.
Again, for ID "1", the maximum value at "Time 3" is 9, which is the maximum of {9,4}.
For ID "1", the maximum value at "Time 4" is 8, which is the maximum of {4,8}.
For ID "1", the maximum value at "Time 5" is 8, which is the maximum of {8,5}.

If you just have vectors and Time is complete and sorted, slide + ave could work well for you:
ave(Value, ID, FUN = function(x) slider::slide_dbl(x, max, .before=1))
#> [1] 1 9 9 8 8 2 5 9 9 7
Or even a full Base R solution:
Value[ave(Value, ID, FUN = function(x) c(0, -(diff(x)<0))) + seq_along(Value)]
#> [1] 1 9 9 8 8 2 5 9 9 7
Otherwise you can solve it with dplyr + slider:
library(dplyr)
data.frame(ID, Time, Value) %>%
group_by(ID) %>%
mutate(Max_Value = slider::slide_index_dbl(Value, Time, max, .before=1)) %>%
ungroup()
#> # A tibble: 10 x 4
#> ID Time Value Max_Value
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 1 1
#> 2 1 2 9 9
#> 3 1 3 4 9
#> 4 1 4 8 8
#> 5 1 5 5 8
#> 6 2 1 2 2
#> 7 2 2 5 5
#> 8 2 3 9 9
#> 9 2 4 7 9
#> 10 2 5 6 7
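If Time may have gaps and slider is not available, the same time-indexed window can be sketched in base R. This is only a minimal sketch: window_max is a hypothetical helper name (not from any package), and it compares every row with every other row, so it scales quadratically and is best kept to small data.

```r
ID    <- c(1,1,1,1,1,2,2,2,2,2)
Time  <- c(1,2,3,4,5,1,2,3,4,5)
Value <- c(1,9,4,8,5,2,5,9,7,6)

# Hypothetical helper: for each row, take the max of `value` over rows with
# the same id whose time lies in [time - before, time].
window_max <- function(id, time, value, before = 1) {
  vapply(seq_along(value), function(i) {
    in_window <- id == id[i] & time >= time[i] - before & time <= time[i]
    max(value[in_window])
  }, numeric(1))
}

Max_Value <- window_max(ID, Time, Value)
Max_Value
#> [1] 1 9 9 8 8 2 5 9 9 7
```

Because the window is defined on Time rather than on row position, a missing day simply leaves fewer values in the window.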

Try this:
library(data.table)
dt <- data.table(ID=c(1,1,1,1,1,2,2,2,2,2),
Time=c(1,2,3,4,5,1,2,3,4,5),
Value=c(1,9,4,8,5,2,5,9,7,6))
max_v <- function(x) max(dt[ID==x$ID & Time <= x$Time & Time > (x$Time-2) ,Value])
sapply(split(dt,1:nrow(dt)),max_v)

I believe you can use a rollapply() style function from zoo setting a width of 2:
library(dplyr)
library(tidyr)
library(zoo)
#Data
df <- data.frame(ID,Time,Value)
#Code
newdf <- df %>% group_by(ID) %>%
mutate(Max=rollapply(Value,width=2,FUN=function(x) max(x, na.rm=TRUE),
by=1, by.column=TRUE,partial=TRUE,fill=NA, align="right"))
Output:
# A tibble: 10 x 4
# Groups: ID [2]
ID Time Value Max
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 1
2 1 2 9 9
3 1 3 4 9
4 1 4 8 8
5 1 5 5 8
6 2 1 2 2
7 2 2 5 5
8 2 3 9 9
9 2 4 7 9
10 2 5 6 7

With data.table you can also try frollapply (a fast rolling function). Note that fill is set to first(Value) so that the first row of each ID group, where only one element is available instead of two, gets its own value.
library(data.table)
dt <- data.frame(ID,Time,Value)
setDT(dt)
dt[, ValueMax := frollapply(x = Value,
n = 2,
max,
fill = first(Value),
align = "right",
na.rm = TRUE),
by = ID]
Output
ID Time Value ValueMax
1: 1 1 1 1
2: 1 2 9 9
3: 1 3 4 9
4: 1 4 8 8
5: 1 5 5 8
6: 2 1 2 2
7: 2 2 5 5
8: 2 3 9 9
9: 2 4 7 9
10: 2 5 6 7
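For a window of exactly two rows, a rolling maximum is just the element-wise maximum of each value and the previous value in its group, so a package-free version is possible. A minimal sketch, assuming the rows are already sorted by Time within each ID and that Time has no gaps:

```r
ID    <- c(1,1,1,1,1,2,2,2,2,2)
Value <- c(1,9,4,8,5,2,5,9,7,6)

# Within each ID, pair every value with the previous one (the first value is
# paired with itself) and keep the larger of the two.
ValueMax <- ave(Value, ID, FUN = function(x) pmax(x, c(x[1], head(x, -1))))
ValueMax
#> [1] 1 9 9 8 8 2 5 9 9 7
```

This shortcut does not generalise to wider windows; for those, a rolling function such as the ones shown above is the better fit.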

Related

How can I create a new column with mutate function in R that is a sequence of values of other columns in R?

I have a data frame that looks like this :
a b c
1 2 10
2 2 10
3 2 10
4 2 10
5 2 10
I want to create a column, with the mutate function or something else under the dplyr framework of functions (or base R), that is a sequence from b to c (i.e. from 2 to 10, with length equal to the number of rows of this tibble or data frame).
Ideally, I want my new data frame to look like this:
a b c c
1 2 10 2
2 2 10 4
3 2 10 6
4 2 10 8
5 2 10 10
How can I do this with R using dplyr ?
library(tidyverse)
n=5
a = seq(1,n,length.out=n)
b = rep(2,n)
c = rep(10,n)
data = tibble(a,b,c)
We may do
library(dplyr)
data %>%
rowwise %>%
mutate(new = seq(b, c, length.out = n)[a]) %>%
ungroup
-output
# A tibble: 5 × 4
a b c new
<dbl> <dbl> <dbl> <dbl>
1 1 2 10 2
2 2 2 10 4
3 3 2 10 6
4 4 2 10 8
5 5 2 10 10
If you want this done "by group" for each a value (creating many new rows), we can create the sequence as a list column and then unnest it:
data %>%
mutate(result = map2(b, c, seq, length.out = n)) %>%
unnest(result)
# # A tibble: 25 × 4
# a b c result
# <dbl> <dbl> <dbl> <dbl>
# 1 1 2 10 2
# 2 1 2 10 4
# 3 1 2 10 6
# 4 1 2 10 8
# 5 1 2 10 10
# 6 2 2 10 2
# 7 2 2 10 4
# 8 2 2 10 6
# 9 2 2 10 8
# 10 2 2 10 10
# # … with 15 more rows
# # ℹ Use `print(n = ...)` to see more rows
If you want to keep the same number of rows and go from the first b value to the last c value, we can use seq directly in mutate:
data %>%
mutate(result = seq(from = first(b), to = last(c), length.out = n()))
# # A tibble: 5 × 4
# a b c result
# <dbl> <dbl> <dbl> <dbl>
# 1 1 2 10 2
# 2 2 2 10 4
# 3 3 2 10 6
# 4 4 2 10 8
# 5 5 2 10 10
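The same single-sequence idea can be written without dplyr. A minimal base R sketch, assuming a plain data frame with the same columns as in the question:

```r
n <- 5
data <- data.frame(a = seq_len(n), b = rep(2, n), c = rep(10, n))

# One evenly spaced sequence from the first b to the last c,
# with as many points as there are rows.
data$result <- seq(from = data$b[1], to = data$c[nrow(data)],
                   length.out = nrow(data))
data$result
#> [1]  2  4  6  8 10
```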
This one?
library(dplyr)
df %>%
mutate(c1 = a*b)
a b c c1
1 1 2 10 2
2 2 2 10 4
3 3 2 10 6
4 4 2 10 8
5 5 2 10 10

Roll max in R. From first row to current row

I would like to calculate max value from first row to current row
df <- data.frame(id = c(1,1,1,1,2,2,2), value = c(2,5,3,2,4,5,4), result = c(NA,2,5,5,NA,4,5))
I have tried grouping by id with dplyr and using the rollmax function from zoo, but did not succeed.
1) rollmax is used with a fixed width but here we have a variable width so using rollapplyr, which seems close to the approach of the question, we have:
library(dplyr)
library(zoo)
df %>%
group_by(id) %>%
mutate(out = lag(rollapplyr(value, 1:n(), max))) %>%
ungroup
giving:
# A tibble: 7 x 4
# Groups: id [2]
id value result out
<dbl> <dbl> <dbl> <dbl>
1 1 2 NA NA
2 1 5 2 2
3 1 3 5 5
4 1 2 5 5
5 2 4 NA NA
6 2 5 4 4
7 2 4 5 5
2) It is also possible to perform the grouping via the width (second) argument of rollapplyr, which eliminates the need for dplyr. In this case the widths are 1, 2, 3, 4, 1, 2, 3, and Max is like max except that it does not use the last element of its argument x. (An alternate expression for the width would be seq_along(id) - match(id, id) + 1.)
library(zoo)
Max <- function(x) if (length(x) == 1) NA else max(head(x, -1))
transform(df, out = rollapplyr(value, sequence(rle(id)$lengths), Max))
giving:
id value result out
1 1 2 NA NA
2 1 5 2 2
3 1 3 5 5
4 1 2 5 5
5 2 4 NA NA
6 2 5 4 4
7 2 4 5 5
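The two width expressions mentioned in 2) can be compared directly. A small check in base R, assuming (as in the example data) that each id occupies one contiguous block of rows; w1 and w2 are names introduced here for illustration:

```r
id <- c(1,1,1,1,2,2,2)

# Position of each row within its run of identical ids
w1 <- sequence(rle(id)$lengths)

# Alternate expression: row index minus the index of the first row
# with the same id, plus 1
w2 <- seq_along(id) - match(id, id) + 1

w1
#> [1] 1 2 3 4 1 2 3
all(w1 == w2)
#> [1] TRUE
```

If an id reappeared later in the data, rle would treat it as a new run while match would still point to its first occurrence, so the two expressions would no longer agree.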
A data.table option using shift + cummax
> setDT(df)[, result2 := shift(cummax(value)), id][]
id value result result2
1: 1 2 NA NA
2: 1 5 2 2
3: 1 3 5 5
4: 1 2 5 5
5: 2 4 NA NA
6: 2 5 4 4
7: 2 4 5 5
library(dplyr)
df |>
group_by(id) |>
mutate(result = lag(cummax(value)))
# # A tibble: 7 x 3
# # Groups: id [2]
# id value result
# <dbl> <dbl> <dbl>
# 1 1 2 NA
# 2 1 5 2
# 3 1 3 5
# 4 1 2 5
# 5 2 4 NA
# 6 2 5 4
# 7 2 4 5
Here is a base R solution. This would just get you the cumulative maximum:
df$result = ave(df$value, df$id, FUN=cummax)
To get the cumulative maximum with the lag you wanted:
df$result = ave(df$value, df$id, FUN=function(x) c(NA,cummax(x[-(length(x))])))

R Tidy : Dynamic Sequential Threshold

I'm trying to find a tidy way to dynamically adjust a threshold as I "move" through a tibble using library(tidyverse). For example, imagine a tibble containing sequential observations:
example <-
tibble(observed = c(2,1,1,2,2,4,10,4,2,2,3))
example
# A tibble: 11 x 1
observed
<dbl>
1 2
2 1
3 1
4 2
5 2
6 4
7 10
8 4
9 2
10 2
11 3
I'm trying to calculate a threshold that starts with the initial value (2) and increments by a prespecified amount (in this case, 1) unless the current observation is greater than that threshold in which case the current observation becomes the reference threshold and further thresholds increment from it. Here is what the final tibble would look like:
answer <-
example %>%
mutate(threshold = c(2,3,4,5,6,7,10,11,12,13,14))
answer
# A tibble: 11 x 2
observed threshold
<dbl> <dbl>
1 2 2
2 1 3
3 1 4
4 2 5
5 2 6
6 4 7
7 10 10
8 4 11
9 2 12
10 2 13
11 3 14
I'm looking for the best way to do this using dplyr/tidy. All help is appreciated!
EDIT:
The answers so far are very close, but miss in the case that the observed values drop and increase again. For example consider the same tibble as example above, but with a 4 instead of a 3 for the final observation:
example <-
tibble(observed = c(2,1,1,2,2,4,10,4,2,2,4))
example
# A tibble: 11 x 1
observed
<dbl>
1 2
2 1
3 1
4 2
5 2
6 4
7 10
8 4
9 2
10 2
11 4
The diff & cumsum method then gives:
example %>%
group_by(gr = cumsum(c(TRUE, diff(observed) > thresh))) %>%
mutate(thresold = first(observed) + row_number() - 1) %>%
ungroup %>%
select(-gr)
A tibble: 11 x 2
observed thresold
<dbl> <dbl>
1 2 2
2 1 3
3 1 4
4 2 5
5 2 6
6 4 4
7 10 10
8 4 11
9 2 12
10 2 13
11 4 4
Where the final threshold value is incorrect.
You could use diff to create groups and add row number in the group to the first value.
library(dplyr)
thresh <- 1
example %>%
group_by(gr = cumsum(c(TRUE, diff(observed) > thresh))) %>%
mutate(thresold = first(observed) + row_number() - 1) %>%
ungroup %>%
select(-gr)
# A tibble: 11 x 2
# observed thresold
# <dbl> <dbl>
# 1 2 2
# 2 1 3
# 3 1 4
# 4 2 5
# 5 2 6
# 6 4 4
# 7 10 10
# 8 4 11
# 9 2 12
#10 2 13
#11 3 14
To understand how the groups are created here are the detailed steps :
We first calculate the difference between consecutive values
diff(example$observed)
#[1] -1 0 1 0 2 6 -6 -2 0 1
Note that diff gives output of length 1 less than the actual length.
We compare it with thresh which gives TRUE for every time we have value greater than the threshold
diff(example$observed) > thresh
#[1] FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
Now since output of diff has one value less we add one value as TRUE
c(TRUE, diff(example$observed) > thresh)
# [1] TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
and then finally take cumsum to create groups which is used in group_by.
cumsum(c(TRUE, diff(example$observed) > thresh))
# [1] 1 1 1 1 1 2 3 3 3 3 3
EDIT
For the updated question we can add another condition that checks whether the current observation is greater than the running count, and update the values accordingly.
example %>%
group_by(gr = cumsum(c(TRUE, diff(observed) > thresh) &
observed > first(observed) + row_number())) %>%
mutate(thresold = first(observed) + row_number() - 1) %>%
ungroup() %>%
select(-gr)
# A tibble: 11 x 2
# observed thresold
# <dbl> <dbl>
# 1 2 2
# 2 1 3
# 3 1 4
# 4 2 5
# 5 2 6
# 6 4 7
# 7 10 10
# 8 4 11
# 9 2 12
#10 2 13
#11 4 14
We can create the grouping variable with lag of the column difference
library(dplyr)
thresh <- 1
example %>%
group_by(grp = cumsum((observed - lag(observed, default = first(observed)) >
thresh))) %>%
mutate(threshold = observed[1] + row_number() - 1) %>%
ungroup %>%
mutate(new = row_number() + 1,
threshold = pmax(threshold, new)) %>%
select(-grp, -new)
# A tibble: 11 x 2
# observed threshold
# <dbl> <dbl>
# 1 2 2
# 2 1 3
# 3 1 4
# 4 2 5
# 5 2 6
# 6 4 7
# 7 10 10
# 8 4 11
# 9 2 12
#10 2 13
#11 3 14
I think I've figured out a way to do this by utilizing zoo::na.locf (although I'm not sure this part is really necessary).
First create the harder of the two examples I've listed in my description:
example2 <-
tibble(observed = c(2,1,1,2,2,4,10,4,2,2,4))
example2 %>%
mutate(def = first(observed) + row_number() - 1) %>%
mutate(t1 = pmax(observed,def)) %>%
mutate(local_maxima = ifelse(observed == t1,t1,NA)) %>%
mutate(groupings = zoo::na.locf(local_maxima)) %>%
group_by(groupings) %>%
mutate(threshold = groupings + row_number() - 1) %>%
ungroup() %>%
select(-def,-t1,-local_maxima,-groupings)
Result:
# A tibble: 11 x 2
observed threshold
<dbl> <dbl>
1 2 2
2 1 3
3 1 4
4 2 5
5 2 6
6 4 7
7 10 10
8 4 11
9 2 12
10 2 13
11 4 14
I'd definitely prefer a more elegant solution if anyone finds one.
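One way to read the rule in the question is as a recurrence: each threshold is the larger of the current observation and the previous threshold plus the increment. Unrolling that recurrence suggests a vectorised form based on cummax. The following is a sketch under that reading of the rule, checked only against the example data; inc and idx are names introduced here for illustration.

```r
observed <- c(2,1,1,2,2,4,10,4,2,2,4)
inc <- 1                      # assumed increment per row
idx <- seq_along(observed)

# threshold[i] = max(observed[i], threshold[i-1] + inc)
# Subtracting inc * idx turns the recurrence into a running maximum.
threshold <- cummax(observed - inc * idx) + inc * idx
threshold
#> [1]  2  3  4  5  6  7 10 11 12 13 14

# The same recurrence written out explicitly, as a cross-check
step <- Reduce(function(prev, x) max(x, prev + inc), observed, accumulate = TRUE)
all(threshold == step)
#> [1] TRUE
```

Inside a dplyr pipeline the same expression should fit in a single mutate call, with row_number() playing the role of idx.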

Create new column and carry forward value from previous group to next

I am trying to carry forward the value from the previous group to the next group. I tried to solve it using rleid, but could not get the desired result.
df <- data.frame(signal = c(1,1,5,5,5,2,3,3,3,4,4,5,5,5,5,6,7,7,8,9,9,9,10),
desired_outcome = c(NA, NA, 1, 1, 1, 5, 2, 2, 2, 3, 3, 4, 4,4,4,5,6,6,7,8,8,8,9))
# outcome column has the expected result -
signal desired_outcome
1 1 NA
2 1 NA
3 5 1
4 5 1
5 5 1
6 2 5
7 3 2
8 3 2
9 3 2
10 4 3
11 4 3
12 5 4
13 5 4
14 5 4
15 5 4
16 6 5
17 7 6
18 7 6
19 8 7
20 9 8
21 9 8
22 9 8
23 10 9
rle gives the lengths and values of the runs in which the same value occurs. Then: remove the last value, shift the remaining values one position over by adding an NA at the beginning (which also makes up for the removed last value), and repeat each value as many times as given by lengths (i.e. the lengths of the runs of identical values in the original vector).
with(rle(df$signal), rep(c(NA, head(values, -1)), lengths))
# [1] NA NA 1 1 1 5 2 2 2 3 3 4 4 4 4 5 6 6 7 8 8 8 9
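The one-liner can be unpacked into its intermediate steps. A minimal sketch in base R, using the signal vector from the question; r, shifted and out are names introduced here for illustration:

```r
signal <- c(1,1,5,5,5,2,3,3,3,4,4,5,5,5,5,6,7,7,8,9,9,9,10)

r <- rle(signal)
r$values   # run values:  1 5 2 3 4 5 6 7 8 9 10
r$lengths  # run lengths: 2 3 1 3 2 4 1 2 1 3 1

# Drop the last run value and put NA in front, so that each run
# now carries the value of the run before it
shifted <- c(NA, head(r$values, -1))

# Expand back to the original length
out <- rep(shifted, r$lengths)
```

Note that the two runs of 5 are treated separately, which is why the second one receives 4 rather than 1.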
Another way could be to first lag signal then use rleid to create groups and use mutate to broadcast first value of each group to all the values.
library(dplyr)
df %>%
mutate(out = lag(signal)) %>%
group_by(group = data.table::rleid(signal)) %>%
mutate(out = first(out)) %>%
ungroup() %>%
select(-group)
# A tibble: 23 x 2
# signal out
# <dbl> <dbl>
# 1 1 NA
# 2 1 NA
# 3 5 1
# 4 5 1
# 5 5 1
# 6 2 5
# 7 3 2
# 8 3 2
# 9 3 2
#10 4 3
# … with 13 more rows

Create new column based on condition from other column per group using tidy evaluation

Similar to this question but I want to use tidy evaluation instead.
df = data.frame(group = c(1,1,1,2,2,2,3,3,3),
date = c(1,2,3,4,5,6,7,8,9),
speed = c(3,4,3,4,5,6,6,4,9))
> df
group date speed
1 1 1 3
2 1 2 4
3 1 3 3
4 2 4 4
5 2 5 5
6 2 6 6
7 3 7 6
8 3 8 4
9 3 9 9
The task is to create a new column (newValue) whose value, per group, is the value of the date column in the row where speed == 4. Example: group 1 has a newValue of 2 because date[speed==4] = 2.
group date speed newValue
1 1 1 3 2
2 1 2 4 2
3 1 3 3 2
4 2 4 4 4
5 2 5 5 4
6 2 6 6 4
7 3 7 6 8
8 3 8 4 8
9 3 9 9 8
It worked without tidy evaluation
df %>%
group_by(group) %>%
mutate(newValue=date[speed==4L])
#> # A tibble: 9 x 4
#> # Groups: group [3]
#> group date speed newValue
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 3 2
#> 2 1 2 4 2
#> 3 1 3 3 2
#> 4 2 4 4 4
#> 5 2 5 5 4
#> 6 2 6 6 4
#> 7 3 7 6 8
#> 8 3 8 4 8
#> 9 3 9 9 8
But I got an error with tidy evaluation:
my_fu <- function(df, filter_var){
filter_var <- sym(filter_var)
df <- df %>%
group_by(group) %>%
mutate(newValue=!!filter_var[speed==4L])
}
my_fu(df, "date")
#> Error in quos(..., .named = TRUE): object 'speed' not found
Thanks in advance.
We can wrap the unquoting in parentheses. Otherwise, because [ binds more tightly than !, !! is applied to the whole expression (filter_var[speed == 4L]) instead of to filter_var alone.
library(rlang)
library(dplyr)
my_fu <- function(df, filter_var){
filter_var <- sym(filter_var)
df %>%
group_by(group) %>%
mutate(newValue=(!!filter_var)[speed==4L])
}
my_fu(df, "date")
# A tibble: 9 x 4
# Groups: group [3]
# group date speed newValue
# <dbl> <dbl> <dbl> <dbl>
#1 1 1 3 2
#2 1 2 4 2
#3 1 3 3 2
#4 2 4 4 4
#5 2 5 5 4
#6 2 6 6 4
#7 3 7 6 8
#8 3 8 4 8
#9 3 9 9 8
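The role of the parentheses can be checked with base R's quote(), without loading dplyr or rlang. The sketch below only inspects how the two expressions parse; it does not evaluate them, and the variable names are introduced here for illustration.

```r
# Without parentheses: the outer calls are `!` and `!`, and the subsetting
# call filter_var[...] sits inside them, so !! would apply to the whole subset.
without_parens <- quote(!!filter_var[speed == 4L])
identical(without_parens[[1]], as.name("!"))
#> [1] TRUE
identical(without_parens[[2]][[2]][[1]], as.name("["))
#> [1] TRUE

# With parentheses: the outermost call is `[`, applied to (!!filter_var),
# so only filter_var is unquoted before subsetting.
with_parens <- quote((!!filter_var)[speed == 4L])
identical(with_parens[[1]], as.name("["))
#> [1] TRUE
```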
You can also use sqldf: join df with a subquery of itself that is restricted to speed = 4:
library(sqldf)
df = data.frame(group = c(1,1,1,2,2,2,3,3,3),
date = c(1,2,3,4,5,6,7,8,9),
speed = c(3,4,3,4,5,6,6,4,9))
sqldf("SELECT df_origin.*, df4.`date` new_value FROM
df df_origin join (SELECT `group`, `date` FROM df WHERE speed = 4) df4
on (df_origin.`group` = df4.`group`)")

Resources