How to perform a cumulative sum with unique IDs only? - r

I have the following data frame:
d<-data.frame(Day=c(1, 1, 1, 1, 1, 1, 2), ID=c("A", "B", "C", "D", "A", "B", "B"), Value=c(1, 2, 3, 4, 5, 6, 7))
On each day, I would like a cumulative sum over unique IDs, counting only the most recent value of an ID that repeats. My expected output is as follows:
d<-data.frame(Day=c(1, 1, 1, 1, 1, 1, 2), ID=c("A", "B", "C", "D", "A", "B", "B"), Value=c(1, 2, 3, 4, 5, 6, 7), Sum=c(1, 3, 6, 10, 14, 18, 7))
Day ID Value Sum
1 1 A 1 1
2 1 B 2 3
3 1 C 3 6
4 1 D 4 10
5 1 A 5 14
6 1 B 6 18
7 2 B 7 7
where the 5th entry adds up values 2, 3, 4, 5 (because A repeats) and the 6th entry adds up values 3, 4, 5, and 6 (because both A and B repeat). The 7th entry restarts because it is a new day.
I don't think I can use cumsum() directly, as it accepts only one argument. I also don't want to keep a counter for each ID, as I may have up to 100 unique IDs per day.
Any hints or help would be appreciated! Thank you!

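You can take the difference between consecutive values within each Day and ID, and then use cumsum within each Day:
library(data.table)
setDT(d)
# change in each ID's value since its previous appearance that day (first appearance: the value itself)
d[, v_eff := Value - shift(Value, fill = 0), by = .(Day, ID)]
# running total of those changes within each day
d[, s := cumsum(v_eff), by = Day]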
Day ID Value Sum v_eff s
1: 1 A 1 1 1 1
2: 1 B 2 3 2 3
3: 1 C 3 6 3 6
4: 1 D 4 10 4 10
5: 1 A 5 14 4 14
6: 1 B 6 18 4 18
7: 2 B 7 7 7 7
Base R analogue...
d$v_eff <- with(d, ave(Value, Day, ID, FUN = function(x) c(x[1], diff(x)) ))
d$s <- with(d, ave(v_eff, Day, FUN = cumsum))
Day ID Value Sum v_eff s
1 1 A 1 1 1 1
2 1 B 2 3 2 3
3 1 C 3 6 3 6
4 1 D 4 10 4 10
5 1 A 5 14 4 14
6 1 B 6 18 4 18
7 2 B 7 7 7 7
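The differencing works because, when an ID repeats within a day, swapping its old value for the new one changes the running total by new minus old (row 5: 10 + (5 - 1) = 14). As a sanity check, here is a direct but slow (quadratic) implementation of the definition in the question. latest_sum is a hypothetical helper name, and it assumes rows are already in time order within each day:

```r
# Reference implementation of the definition (quadratic, for checking only):
# for each row, sum the most recent Value of every ID seen so far that day.
latest_sum <- function(day, id, value) {
  out <- numeric(length(value))
  for (i in seq_along(value)) {
    # rows from the same day up to and including row i
    rows <- which(day == day[i] & seq_along(value) <= i)
    # keep only the last occurrence of each ID among those rows
    last <- rows[!duplicated(id[rows], fromLast = TRUE)]
    out[i] <- sum(value[last])
  }
  out
}

d <- data.frame(Day = c(1, 1, 1, 1, 1, 1, 2),
                ID = c("A", "B", "C", "D", "A", "B", "B"),
                Value = c(1, 2, 3, 4, 5, 6, 7))
d$check <- latest_sum(d$Day, d$ID, d$Value)
d$check
# 1 3 6 10 14 18 7
```

This matches the s column from the cumsum answers, which do the same job in a single pass.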

Related

Replace value when value above and below are the same

I have the following dataframe df (dput below):
> df
group value
1 A 2
2 A 2
3 A 3
4 A 2
5 A 1
6 A 2
7 A 2
8 A 2
9 B 3
10 B 3
11 B 3
12 B 4
13 B 3
14 B 3
15 B 4
16 B 4
Within each group, I would like to replace a value when the values directly above and below it are the same. For example, row 3 has a 2 above and a 2 below, so its 3 should become 2. The desired output should look like this:
group value
1 A 2
2 A 2
3 A 2
4 A 2
5 A 2
6 A 2
7 A 2
8 A 2
9 B 3
10 B 3
11 B 3
12 B 3
13 B 3
14 B 3
15 B 4
16 B 4
So I was wondering if anyone knows how to replace a value when the values above and below it are the same, as in the example above?
dput data:
df<-structure(list(group = c("A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B"), value = c(2, 2, 3, 2,
1, 2, 2, 2, 3, 3, 3, 4, 3, 3, 4, 4)), class = "data.frame", row.names = c(NA,
-16L))
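With ifelse, lead and lag, comparing neighbours only within each group:
library(dplyr) # mutate(.by = ...) requires dplyr >= 1.1.0
df %>%
mutate(value = ifelse(coalesce(lag(value) == lead(value), FALSE), # FALSE at group edges
lag(value), value),
.by = group)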
group value
1 A 2
2 A 2
3 A 2
4 A 2
5 A 2
6 A 2
7 A 2
8 A 2
9 B 3
10 B 3
11 B 3
12 B 3
13 B 3
14 B 3
15 B 4
16 B 4
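A base R sketch of the same idea, using ave() to apply the neighbour comparison separately to each group. fill_between is a hypothetical helper name; like the ifelse answer, it compares against the original neighbours, so replacements do not cascade:

```r
df <- data.frame(group = rep(c("A", "B"), each = 8),
                 value = c(2, 2, 3, 2, 1, 2, 2, 2, 3, 3, 3, 4, 3, 3, 4, 4))

# Within one group: replace x[i] by x[i - 1] when x[i - 1] == x[i + 1]
fill_between <- function(x) {
  n <- length(x)
  prev <- c(NA, x[-n])  # value above (NA for the group's first row)
  nxt <- c(x[-1], NA)   # value below (NA for the group's last row)
  hit <- !is.na(prev) & !is.na(nxt) & prev == nxt
  x[hit] <- prev[hit]
  x
}

df$value <- ave(df$value, df$group, FUN = fill_between)
df$value
# 2 2 2 2 2 2 2 2 3 3 3 3 3 3 4 4
```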

Counter sequential of specific values in R

I have a column like this:
a = c(3, 1, 2, 3, 3, 3, 1, 3, 2, 3, 3, 1, 3, 2, 1, 3, 1)
I want a column that counts the 1s and 2s cumulatively, like this:
a b
1 3 0
2 1 1
3 2 2
4 3 2
5 3 2
6 3 2
7 1 3
8 3 3
9 2 4
10 3 4
11 3 4
12 1 5
13 3 5
14 2 6
15 1 7
16 3 7
17 1 8
We can use cumsum on a logical vector
df1$b <- cumsum(df1$a %in% c(1, 2))
data
df1 <- data.frame(a)
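The trick is that cumsum() treats TRUE as 1 and FALSE as 0, so the running total only increases at the 1s and 2s. Spelled out step by step (is_hit is just an illustrative name):

```r
a <- c(3, 1, 2, 3, 3, 3, 1, 3, 2, 3, 3, 1, 3, 2, 1, 3, 1)
is_hit <- a %in% c(1, 2)  # TRUE wherever a is 1 or 2
b <- cumsum(is_hit)       # each TRUE adds 1, each FALSE adds 0
data.frame(a, is_hit, b)
```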

Is there a way to use the lead function to figure out the first row that meets a condition?

Hi, I have a data frame like this:
df= structure(list(a = c(1, 3, 4, 6, 3, 2, 5, 1), b = c(1, 3, 4,
2, 6, 7, 2, 6), c = c(6, 3, 6, 5, 3, 6, 5, 3), d = c(6, 2, 4,
5, 3, 7, 2, 6), e = c(1, 2, 4, 5, 6, 7, 6, 3), f = c(2, 3, 4,
2, 2, 7, 5, 2)), .Names = c("a", "b", "c", "d", "e", "f"), row.names = c(NA,
8L), class = "data.frame")
df$total <- rowSums(df)
df$row <- seq_len(nrow(df))
So the data frame looks like this:
> df
a b c d e f total row
1 1 1 6 6 1 2 17 1
2 3 3 3 2 2 3 16 2
3 4 4 6 4 4 4 26 3
4 6 2 5 5 5 2 25 4
5 3 6 3 3 6 2 23 5
6 2 7 6 7 7 7 36 6
7 5 2 5 2 6 5 25 7
8 1 6 3 6 3 2 21 8
What I want to do is find, for each row, the first following row whose total is greater than the current row's total. For example, row 1 has a total of 17, and the nearest following row with a larger total is row 3.
I could loop through each row, but it gets really messy. Is this possible?
Thanks in advance.
We can do this in two steps with dplyr. First, rowwise() makes mutate() evaluate once per row (much like an apply() loop over the rows), and which(.$total > total) lists every row whose total is larger than the current row's total (.$total is the whole column passed in by the %>% pipe, while total is the current row's value). Then we drop the rows that come before the current one and take the first remaining index, which is the next larger row (NA if there is none):
library(dplyr)
df %>%
rowwise() %>%
mutate(nxt = list(which(.$total > total)),
nxt = nxt[nxt > row][1])
# A tibble: 8 × 9
# Rowwise:
a b c d e f total row nxt
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 1 1 6 6 1 2 17 1 3
2 3 3 3 2 2 3 16 2 3
3 4 4 6 4 4 4 26 3 6
4 6 2 5 5 5 2 25 4 6
5 3 6 3 3 6 2 23 5 6
6 2 7 6 7 7 7 36 6 NA
7 5 2 5 2 6 5 25 7 NA
8 1 6 3 6 3 2 21 8 NA
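The rowwise answer compares every row's total against the whole column, so the work grows quadratically with the number of rows. A different technique, the "next greater element" pass with a stack, visits each row a constant number of times. A base R sketch, assuming rows are already in order and using strict > as in the answer above; next_greater is a hypothetical helper name:

```r
next_greater <- function(x) {
  n <- length(x)
  res <- rep(NA_integer_, n)
  stack <- integer(n)  # preallocated stack of rows still waiting for a larger total
  top <- 0L
  for (i in seq_len(n)) {
    # row i answers every waiting row whose total is smaller than x[i]
    while (top > 0L && x[stack[top]] < x[i]) {
      res[stack[top]] <- i
      top <- top - 1L
    }
    top <- top + 1L
    stack[top] <- i
  }
  res  # rows never popped keep NA
}

total <- c(17, 16, 26, 25, 23, 36, 25, 21)
next_greater(total)
# 3 3 6 6 6 NA NA NA
```

With the data frame from the question, df$nxt <- next_greater(df$total) gives the same nxt column. To treat ties as a match (>= instead of >), change < to <= in the while condition.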

Conditional statement within group

I have a data frame in which I want to make a new column whose values depend on a condition within each group. For the data frame below, I want a new column n_actions that is:
Cond 1. 2 for the whole group if a 6 appears in the group's step column
Cond 2. 3 for the whole group if a 9 appears in the group's step column (this takes precedence when both 6 and 9 appear, as in group D)
Cond 3. 1 if neither 6 nor 9 appears in the group's step column
#dataframe start
dataframe <- data.frame(group = c("A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C", "D", "D", "D", "D", "D", "D", "D", "D", "D"),
step = c(1, 2, 3, 1, 2, 3, 4, 5, 6, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 8, 9))
# dataframe desired
dataframe$n_actions <- c(rep(1, 3), rep(2, 6), rep(1, 3), rep(3, 9))
Try this:
library(dplyr)
dataframe %>%
group_by(group) %>%
mutate(n_actions = ifelse(9 %in% step, 3,
ifelse(6 %in% step, 2, 1)))
# A tibble: 21 x 3
# Groups: group [4]
group step n_actions
<fctr> <dbl> <dbl>
1 A 1 1
2 A 2 1
3 A 3 1
4 B 1 2
5 B 2 2
6 B 3 2
7 B 4 2
8 B 5 2
9 B 6 2
10 C 1 1
# ... with 11 more rows
Another way with dplyr's case_when:
library(dplyr)
dataframe %>%
group_by(group) %>%
mutate(
n_actions = case_when(
9 %in% step ~ 3,
6 %in% step ~ 2,
TRUE ~ 1
)
)
Output:
# A tibble: 21 x 3
# Groups: group [4]
group step n_actions
<fct> <dbl> <dbl>
1 A 1 1
2 A 2 1
3 A 3 1
4 B 1 2
5 B 2 2
6 B 3 2
7 B 4 2
8 B 5 2
9 B 6 2
10 C 1 1
11 C 2 1
12 C 3 1
13 D 1 3
14 D 2 3
15 D 3 3
16 D 4 3
17 D 5 3
18 D 6 3
19 D 7 3
20 D 8 3
21 D 9 3
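It seems you could also integer-divide (%/%) the maximum step in each group by 3. Note that this relies on the pattern in this example (groups ending at step 3, 6 or 9) rather than testing for a 6 or 9 directly.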
dataframe <- transform(dataframe,
n_actions2 = ave(step, group, FUN = function(x) max(x) %/% 3))
dataframe
# group step n_actions n_actions2
#1 A 1 1 1
#2 A 2 1 1
#3 A 3 1 1
#4 B 1 2 2
#5 B 2 2 2
#6 B 3 2 2
#7 B 4 2 2
#8 B 5 2 2
#9 B 6 2 2
#10 C 1 1 1
#11 C 2 1 1
#12 C 3 1 1
#13 D 1 3 3
#14 D 2 3 3
#15 D 3 3 3
#16 D 4 3 3
#17 D 5 3 3
#18 D 6 3 3
#19 D 7 3 3
#20 D 8 3 3
#21 D 9 3 3
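A base R version that tests for 6 and 9 directly, so it does not depend on each group's maximum step. actions_for is a hypothetical helper name; checking 9 before 6 mirrors the ifelse() and case_when() answers:

```r
dataframe <- data.frame(
  group = c("A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C",
            "D", "D", "D", "D", "D", "D", "D", "D", "D"),
  step = c(1, 2, 3, 1, 2, 3, 4, 5, 6, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 8, 9)
)

# Per-group rule: 3 if the group contains a 9, else 2 if it contains a 6, else 1
actions_for <- function(s) if (9 %in% s) 3 else if (6 %in% s) 2 else 1

# ave() recycles the single value returned for each group across that group's rows
dataframe$n_actions <- ave(dataframe$step, dataframe$group, FUN = actions_for)
dataframe
```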

R: new column of row difference from max value of another column according to group

The title may be unclear, but I hope this code demonstrates my problem.
I have a data frame with three columns: sensor (A and B), hour_day (the hour of the day, 0-4) and value (the temperature reading, 1-5).
new.df <- data.frame(
sensor = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
hour_day = c(0:4, 0:4),
value = c(1, 1, 3, 1, 2, 1, 3, 4, 5, 2)
)
new.df
sensor hour_day value
1 A 0 1
2 A 1 1
3 A 2 3
4 A 3 1
5 A 4 2
6 B 0 1
7 B 1 3
8 B 2 4
9 B 3 5
10 B 4 2
I want to make a new column giving the difference in hours between each row and the hour at which that sensor reached its maximum value.
Desired result:
sensor value hour_day hour_from_max_hour
1 A 1 0 -2
2 A 1 1 -1
3 A 3 2 0
4 A 1 3 1
5 A 2 4 2
6 B 1 0 -3
7 B 3 1 -2
8 B 4 2 -1
9 B 5 3 0
10 B 2 4 1
Note that sensor A peaks at hour 2 and sensor B at hour 3. I just want a new column that tells me how many hours each row is from its sensor's maximum.
Thank you in advance and please let me know if I can provide more information.
EDIT
The previous answers were very helpful, but I forgot that there is one more variable (day) in this problem. Also, sometimes a sensor has more than one maximum on a given day. When this is the case, I would like to base the difference on the first maximum.
df_add <- data.frame(
sensor = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B",
"A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
hour_day = c(0:4, 0:4, 0:4, 0:4),
value = c(1, 1, 3, 3, 2,
3, 2, 4, 4, 1,
1, 5, 6, 6, 2,
2, 1, 3, 3, 1),
day = c(1, 1, 1, 1, 1,
1, 1, 1, 1, 1,
2, 2, 2, 2, 2,
2, 2, 2, 2, 2)
)
df_add
sensor hour_day value day
1 A 0 1 1
2 A 1 1 1
3 A 2 3 1
4 A 3 3 1
5 A 4 2 1
6 B 0 3 1
7 B 1 2 1
8 B 2 4 1
9 B 3 4 1
10 B 4 1 1
11 A 0 1 2
12 A 1 5 2
13 A 2 6 2
14 A 3 6 2
15 A 4 2 2
16 B 0 2 2
17 B 1 1 2
18 B 2 3 2
19 B 3 3 2
20 B 4 1 2
A simple pipe can do it. All you need is to look up the hour of max(value) inside mutate(); taking [1] of which(value == max(value)) picks the first maximum if there are ties.
library(dplyr)
new.df %>%
group_by(sensor) %>%
mutate(hour_from_max_hour = hour_day - hour_day[which(value == max(value))[1]])
## A tibble: 10 x 4
## Groups: sensor [2]
# sensor hour_day value hour_from_max_hour
# <fct> <int> <dbl> <int>
# 1 A 0 1. -2
# 2 A 1 1. -1
# 3 A 2 3. 0
# 4 A 3 1. 1
# 5 A 4 2. 2
# 6 B 0 1. -3
# 7 B 1 3. -2
# 8 B 2 4. -1
# 9 B 3 5. 0
#10 B 4 2. 1
library(dplyr)
new.df.2 <-
# First get the hours with the max values
new.df %>%
group_by(sensor) %>%
filter(value == max(value)) %>% # ties would keep more than one row per sensor
ungroup() %>%
select(sensor, max_hour = hour_day) %>% # This renames hour_day as max_hour
# Now join that to the original table and make the calculation
right_join(new.df, by = "sensor") %>%
mutate(hour_from_max_hour = hour_day - max_hour)
Result:
new.df.2
# A tibble: 10 x 5
sensor max_hour hour_day value hour_from_max_hour
<fct> <int> <int> <dbl> <int>
1 A 2 0 1 -2
2 A 2 1 1 -1
3 A 2 2 3 0
4 A 2 3 1 1
5 A 2 4 2 2
6 B 3 0 1 -3
7 B 3 1 3 -2
8 B 3 2 4 -1
9 B 3 3 5 0
10 B 3 4 2 1
This is probably how I would do it:
library(plyr)
dd = ddply(new.df, .(sensor), summarize,
max.value = max(value),
hour.of.max = hour_day[which.max(value)])
new.df = merge(new.df, dd, all.x = TRUE, by = 'sensor')
new.df$hour_from_max_hour = new.df$hour_day - new.df$hour.of.max
This gives you a couple of extra columns, but you can delete them:
sensor hour_day value max.value hour.of.max hour_from_max_hour
1 A 0 1 3 2 -2
2 A 1 1 3 2 -1
3 A 2 3 3 2 0
4 A 3 1 3 2 1
5 A 4 2 3 2 2
6 B 0 1 5 3 -3
7 B 1 3 5 3 -2
8 B 2 4 5 3 -1
9 B 3 5 5 3 0
10 B 4 2 5 3 1
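None of the answers above use the day column from the EDIT. A base R sketch that groups by both sensor and day and uses which.max(), which returns the position of the first maximum, so ties resolve to the earliest hour. offset_from_first_max is a hypothetical helper name, and df_add is the EDIT's data written more compactly. Adding day to the group_by() call in the first answer should also work, since which(value == max(value))[1] likewise picks the first maximum:

```r
# Same data as df_add in the EDIT
df_add <- data.frame(
  sensor = rep(rep(c("A", "B"), each = 5), 2),
  hour_day = rep(0:4, 4),
  value = c(1, 1, 3, 3, 2, 3, 2, 4, 4, 1, 1, 5, 6, 6, 2, 2, 1, 3, 3, 1),
  day = rep(c(1, 2), each = 10)
)

# idx: row numbers of one sensor/day group; returns hour offsets from the first maximum
offset_from_first_max <- function(idx) {
  hours <- df_add$hour_day[idx]
  hours - hours[which.max(df_add$value[idx])]
}

# ave() splits the row numbers by sensor and day and applies the helper to each piece
df_add$hour_from_max_hour <- ave(seq_len(nrow(df_add)), df_add$sensor, df_add$day,
                                 FUN = offset_from_first_max)
df_add
```

In this data every sensor/day group has a tie at hours 2 and 3, so each group is measured from hour 2.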
