Replace value when value above and below are the same - r

I have the following dataframe df (dput below):
> df
group value
1 A 2
2 A 2
3 A 3
4 A 2
5 A 1
6 A 2
7 A 2
8 A 2
9 B 3
10 B 3
11 B 3
12 B 4
13 B 3
14 B 3
15 B 4
16 B 4
I would like to replace a value when the values directly above and below it are the same, per group. For example, row 3 has a value of 2 above it and 2 below it, which means the 3 should become 2. The desired output should look like this:
group value
1 A 2
2 A 2
3 A 2
4 A 2
5 A 2
6 A 2
7 A 2
8 A 2
9 B 3
10 B 3
11 B 3
12 B 3
13 B 3
14 B 3
15 B 4
16 B 4
Does anyone know how to replace a value when the values above and below it are the same, as in the example above?
dput data:
df<-structure(list(group = c("A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B"), value = c(2, 2, 3, 2,
1, 2, 2, 2, 3, 3, 3, 4, 3, 3, 4, 4)), class = "data.frame", row.names = c(NA,
-16L))

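With ifelse, lead and lag, grouped so that neighbours are only compared within the same group:
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(value = ifelse(coalesce(lead(value) == lag(value), FALSE),
                        lag(value), value)) %>%
  ungroup() %>%
  as.data.frame()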
group value
1 A 2
2 A 2
3 A 2
4 A 2
5 A 2
6 A 2
7 A 2
8 A 2
9 B 3
10 B 3
11 B 3
12 B 3
13 B 3
14 B 3
15 B 4
16 B 4
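For readers who prefer to avoid a package dependency, the same neighbour comparison can be sketched in base R. This is only an illustration: `smooth_between` and `value_new` are hypothetical names, not part of the original answer, and the sketch assumes that each row is compared against the original (not already replaced) neighbours.

```r
# Rebuild the example data from the question
df <- data.frame(
  group = rep(c("A", "B"), each = 8),
  value = c(2, 2, 3, 2, 1, 2, 2, 2, 3, 3, 3, 4, 3, 3, 4, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace x[i] with its neighbours' value when the
# value above and the value below are equal.
smooth_between <- function(x) {
  n <- length(x)
  if (n < 3) return(x)          # no row has both neighbours
  prev <- c(NA, x[-n])          # value above each row
  nxt  <- c(x[-1], NA)          # value below each row
  same <- !is.na(prev) & !is.na(nxt) & prev == nxt
  x[same] <- prev[same]
  x
}

# ave() applies the helper separately within each group
df$value_new <- ave(df$value, df$group, FUN = smooth_between)
df
```

Because the first and last row of each group have a missing neighbour, they are never replaced, which matches the desired output in the question.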

Related

filling NA with values from another table

I have the following datasets in RStudio:
df =
a b
1 A
1 NA
1 A
1 NA
2 C
2 NA
2 B
3 A
3 NA
3 C
3 D
and fill_with =
a b
1 A
2 B
3 C
How do I fill the NA values in df in the b column according to the a column?
Ex: a=1, b=NA, then I look at the table fill_with at a=1, and I see that I should fill it with b=A.
In the end it should look the following way:
df =
a b
1 A
1 A
1 A
1 A
2 C
2 B
2 B
3 A
3 C
3 C
3 D
We can use ifelse
df$b <- ifelse(is.na(df$b),
               fill_with$b[match(df$a, fill_with$a)],
               df$b)
Output
a b
1 1 A
2 1 A
3 1 A
4 1 A
5 2 C
6 2 B
7 2 B
8 3 A
9 3 C
10 3 C
11 3 D
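To make the lookup step explicit, here is a self-contained sketch of the same idea, split into intermediate steps. The variable names `idx` and `lookup` are introduced here for illustration only; the data are retyped from the question.

```r
# Data retyped from the question
df <- data.frame(
  a = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  b = c("A", NA, "A", NA, "C", NA, "B", "A", NA, "C", "D"),
  stringsAsFactors = FALSE
)
fill_with <- data.frame(
  a = c(1, 2, 3),
  b = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# For every row of df, find the row of fill_with that has the same key a
idx <- match(df$a, fill_with$a)

# Replacement candidate for every row (only used where b is missing)
lookup <- fill_with$b[idx]

# Keep existing values, fill only the NAs
df$b <- ifelse(is.na(df$b), lookup, df$b)
df
```

Existing non-missing values such as the C in row 5 are left alone, because `ifelse` only takes the looked-up value where `is.na(df$b)` is TRUE.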
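Alternatively, with tidyr's fill. Note that this copies the nearest non-missing b within each a group and never consults fill_with; it matches the desired output for this data, but not in general:
library(tidyverse)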
df <- read_table("a b
1 A
1 NA
1 A
1 NA
2 C
2 NA
2 B
3 A
3 NA
3 C
3 D")
df %>%
  group_by(a) %>%
  fill(b, .direction = "updown")
# A tibble: 11 x 2
# Groups: a [3]
a b
<dbl> <chr>
1 1 A
2 1 A
3 1 A
4 1 A
5 2 C
6 2 B
7 2 B
8 3 A
9 3 C
10 3 C
11 3 D
Base R
tmp <- which(is.na(df$b))
df$b[tmp] <- fill_with$b[match(df$a, fill_with$a)[tmp]]
a b
1 1 A
2 1 A
3 1 A
4 1 A
5 2 C
6 2 B
7 2 B
8 3 A
9 3 C
10 3 C
11 3 D
library(tidyverse)
df <- data.frame(
stringsAsFactors = FALSE,
a = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
b = c("A", NA, "A", NA, "C", NA, "B", "A", NA, "C", "D")
)
fill_with <- data.frame(
stringsAsFactors = FALSE,
a = c(1L, 2L, 3L),
b = c("A", "B", "C")
)
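# Note: rows_update() overwrites every matching b, not only the NAs (rows 5, 8 and 11 below
# differ from the desired output); rows_patch(x = df, y = fill_with, by = "a") only fills NAs.
rows_update(x = df, y = fill_with, by = "a")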
#> a b
#> 1 1 A
#> 2 1 A
#> 3 1 A
#> 4 1 A
#> 5 2 B
#> 6 2 B
#> 7 2 B
#> 8 3 C
#> 9 3 C
#> 10 3 C
#> 11 3 C
Created on 2022-08-22 with reprex v2.0.2

Create an edges dataframe using the values in the cells of another dataframe

I have the dataframe below:
dummy<-structure(list(Name = c("A", "B", "C", "A", "B", "C"), `#BISB` = c(2,
6, 4, 0, 4, 6), `#BISC` = c(2, 6, 4, 0, 4, 6)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
and I have created its node file like
nodes<-structure(list(id = c(1, 2, 3, 4, 5), group = c("A", "B", "C",
"#BISB", "#BISC")), class = "data.frame", row.names = c(NA, -5L
))
Now, given the information from these two data frames, I would like to create the edges dataframe, following the logic of the dummy dataframe that every Name (A, B, C) is connected to each department (#BISB, #BISC), with the corresponding values as the edge title and label. It should look like this:
from to label arrows title length
1 1 4 2 to 2 300
2 1 5 2 to 2 300
3 1 4 0 to 0 300
4 1 5 0 to 0 300
5 2 4 6 to 6 300
6 2 5 6 to 6 300
7 2 4 4 to 4 300
8 2 5 4 to 4 300
9 3 4 4 to 4 300
10 3 5 4 to 4 300
11 3 4 6 to 6 300
12 3 5 6 to 6 300
I'm not sure where all the values in your desired output are coming from, but I believe this is a good starting point.
library(data.table)
setDT(dummy)
setDT(nodes)
ans <- melt(dummy, id.vars = "Name", variable.factor = FALSE, value.name = "arrows")
ans[nodes, from := i.id, on = .(Name = group)]
ans[nodes, to := i.id, on = .(variable = group)]
# Name variable arrows from to
# 1: A #BISB 2 1 4
# 2: B #BISB 6 2 4
# 3: C #BISB 4 3 4
# 4: A #BISB 0 1 4
# 5: B #BISB 4 2 4
# 6: C #BISB 6 3 4
# 7: A #BISC 2 1 5
# 8: B #BISC 6 2 5
# 9: C #BISC 4 3 5
#10: A #BISC 0 1 5
#11: B #BISC 4 2 5
#12: C #BISC 6 3 5
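The starting point above stops before the remaining columns of the desired output. A base R sketch of one way to finish the edges data frame follows. It rests on assumptions read off the question's desired output rather than confirmed by the asker: `label` and `title` both carry the cell value, `arrows` is the constant "to", and `length` is the constant 300. The names `dept_cols`, `long` and `edges` are illustrative, and the row order differs from the question's table.

```r
# Data retyped from the question (check.names = FALSE keeps the "#" names)
dummy <- data.frame(
  Name = c("A", "B", "C", "A", "B", "C"),
  `#BISB` = c(2, 6, 4, 0, 4, 6),
  `#BISC` = c(2, 6, 4, 0, 4, 6),
  check.names = FALSE, stringsAsFactors = FALSE
)
nodes <- data.frame(
  id = c(1, 2, 3, 4, 5),
  group = c("A", "B", "C", "#BISB", "#BISC"),
  stringsAsFactors = FALSE
)

# Reshape to long form: one row per (Name, department) cell
dept_cols <- setdiff(names(dummy), "Name")
long <- data.frame(
  Name  = rep(dummy$Name, times = length(dept_cols)),
  dept  = rep(dept_cols, each = nrow(dummy)),
  value = unlist(dummy[dept_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Look up node ids for both ends of each edge
edges <- data.frame(
  from   = nodes$id[match(long$Name, nodes$group)],
  to     = nodes$id[match(long$dept, nodes$group)],
  label  = long$value,   # assumption: label is the cell value
  arrows = "to",         # assumption: constant, as in the desired output
  title  = long$value,   # assumption: title is the cell value
  length = 300,          # assumption: constant, as in the desired output
  stringsAsFactors = FALSE
)
edges
```

If the row order of the question's table matters, `edges[order(edges$from), ]` would group the rows by source node.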

Conditional statement within group

I have a dataframe in which I want to make a new column with values based on a condition within groups. For the dataframe below, I want to make a new column n_actions which gives:
Cond 1. the number 2 for the whole group if a 6 appears in column step
Cond 2. the number 3 for the whole group if a 9 appears in column step
Cond 3. the number 1 if neither a 6 nor a 9 appears in column step for the group
#dataframe start
dataframe <- data.frame(group = c("A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C", "D", "D", "D", "D", "D", "D", "D", "D", "D"),
step = c(1, 2, 3, 1, 2, 3, 4, 5, 6, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 8, 9))
# dataframe desired
dataframe$n_actions <- c(rep(1, 3), rep(2, 6), rep(1, 3), rep(3, 9))
Try out:
library(dplyr)
dataframe %>%
  group_by(group) %>%
  mutate(n_actions = ifelse(9 %in% step, 3,
                            ifelse(6 %in% step, 2, 1)))
# A tibble: 21 x 3
# Groups: group [4]
group step n_actions
<fctr> <dbl> <dbl>
1 A 1 1
2 A 2 1
3 A 3 1
4 B 1 2
5 B 2 2
6 B 3 2
7 B 4 2
8 B 5 2
9 B 6 2
10 C 1 1
# ... with 11 more rows
Another way with dplyr's case_when:
library(dplyr)
dataframe %>%
  group_by(group) %>%
  mutate(
    n_actions = case_when(
      9 %in% step ~ 3,
      6 %in% step ~ 2,
      TRUE ~ 1
    )
  )
Output:
# A tibble: 21 x 3
# Groups: group [4]
group step n_actions
<fct> <dbl> <dbl>
1 A 1 1
2 A 2 1
3 A 3 1
4 B 1 2
5 B 2 2
6 B 3 2
7 B 4 2
8 B 5 2
9 B 6 2
10 C 1 1
11 C 2 1
12 C 3 1
13 D 1 3
14 D 2 3
15 D 3 3
16 D 4 3
17 D 5 3
18 D 6 3
19 D 7 3
20 D 8 3
21 D 9 3
For this data, where the steps in each group run consecutively from 1, you could integer-divide (%/%) the maximum step per group by 3, it seems.
dataframe <- transform(dataframe,
                       n_actions2 = ave(step, group, FUN = function(x) max(x) %/% 3))
dataframe
# group step n_actions n_actions2
#1 A 1 1 1
#2 A 2 1 1
#3 A 3 1 1
#4 B 1 2 2
#5 B 2 2 2
#6 B 3 2 2
#7 B 4 2 2
#8 B 5 2 2
#9 B 6 2 2
#10 C 1 1 1
#11 C 2 1 1
#12 C 3 1 1
#13 D 1 3 3
#14 D 2 3 3
#15 D 3 3 3
#16 D 4 3 3
#17 D 5 3 3
#18 D 6 3 3
#19 D 7 3 3
#20 D 8 3 3
#21 D 9 3 3
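The integer-division shortcut depends on the particular step values in this data. A base R sketch that encodes the three stated conditions directly, without that assumption, could look like the following. The helper name `classify_steps` is hypothetical, and checking for 9 before 6 (so that a group containing both gets 3) follows the ordering used in the dplyr answers above.

```r
# Data retyped from the question
dataframe <- data.frame(
  group = rep(c("A", "B", "C", "D"), times = c(3, 6, 3, 9)),
  step  = as.numeric(c(1:3, 1:6, 1:3, 1:9)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one value per group, based on which steps occur
classify_steps <- function(s) {
  if (9 %in% s) {
    3
  } else if (6 %in% s) {
    2
  } else {
    1
  }
}

# ave() recycles the single value returned for each group over its rows
dataframe$n_actions <- ave(dataframe$step, dataframe$group, FUN = classify_steps)
dataframe
```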

R: new column of row difference from max value of another column according to group

The title of the question may be unclear, but I hope this code will clearly demonstrate my problem.
I have a data frame with three columns: $sensor (A and B), $hour_day, the hour of the day (0-4), and $value, the temperature reading (1-5).
new.df <- data.frame(
  sensor = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
  hour_day = c(0:4, 0:4),
  value = c(1, 1, 3, 1, 2, 1, 3, 4, 5, 2)
)
new.df
sensor hour_day value
1 A 0 1
2 A 1 1
3 A 2 3
4 A 3 1
5 A 4 2
6 B 0 1
7 B 1 3
8 B 2 4
9 B 3 5
10 B 4 2
I want to make a new column that indicates the difference in hour from the hour with maximum value according to the sensor.
Desired result
sensor value hour_day hour_from_max_hour
1 A 1 0 -2
2 A 1 1 -1
3 A 3 2 0
4 A 1 3 1
5 A 2 4 2
6 B 1 0 -3
7 B 3 1 -2
8 B 4 2 -1
9 B 5 3 0
10 B 2 4 1
Note that the maximum is at hour 2 for sensor A and at hour 3 for sensor B. I just want a new column that tells me how many hours each row is from the hour of the maximum value for its sensor.
Thank you in advance and please let me know if I can provide more information.
EDIT
The previous answers were very helpful, but I forgot that there is one more variable (day) in this problem. Also, sometimes there is more than one maximum in a column. When this is the case, I would like to base the difference on the first maximum.
df_add <- data.frame(
sensor = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B",
"A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
hour_day = c(0:4, 0:4, 0:4, 0:4),
value = c(1, 1, 3, 3, 2,
3, 2, 4, 4, 1,
1, 5, 6, 6, 2,
2, 1, 3, 3, 1),
day = c(1, 1, 1, 1, 1,
1, 1, 1, 1, 1,
2, 2, 2, 2, 2,
2, 2, 2, 2, 2)
)
df_add
sensor hour_day value day
1 A 0 1 1
2 A 1 1 1
3 A 2 3 1
4 A 3 3 1
5 A 4 2 1
6 B 0 3 1
7 B 1 2 1
8 B 2 4 1
9 B 3 4 1
10 B 4 1 1
11 A 0 1 2
12 A 1 5 2
13 A 2 6 2
14 A 3 6 2
15 A 4 2 2
16 B 0 2 2
17 B 1 1 2
18 B 2 3 2
19 B 3 3 2
20 B 4 1 2
A simple pipe can do it. All you have to do is look up the hour of the first max(value) inside mutate().
library(dplyr)
new.df %>%
  group_by(sensor) %>%
  mutate(hour_from_max_hour = hour_day - hour_day[which(value == max(value))[1]])
## A tibble: 10 x 4
## Groups: sensor [2]
# sensor hour_day value hour_from_max_hour
# <fct> <int> <dbl> <int>
# 1 A 0 1. -2
# 2 A 1 1. -1
# 3 A 2 3. 0
# 4 A 3 1. 1
# 5 A 4 2. 2
# 6 B 0 1. -3
# 7 B 1 3. -2
# 8 B 2 4. -1
# 9 B 3 5. 0
#10 B 4 2. 1
library(dplyr)
new.df.2 <-
  # First get the hours with the max values
  new.df %>%
  group_by(sensor) %>%
  filter(value == max(value)) %>%
  ungroup() %>%
  select(sensor, max_hour = hour_day) %>% # This renames hour_day as max_hour
  # Now join that to the original table and make the calculation
  right_join(new.df, by = "sensor") %>%
  mutate(hour_from_max_hour = hour_day - max_hour)
Result:
new.df.2
# A tibble: 10 x 5
sensor max_hour hour_day value hour_from_max_hour
<fct> <int> <int> <dbl> <int>
1 A 2 0 1 -2
2 A 2 1 1 -1
3 A 2 2 3 0
4 A 2 3 1 1
5 A 2 4 2 2
6 B 3 0 1 -3
7 B 3 1 3 -2
8 B 3 2 4 -1
9 B 3 3 5 0
10 B 3 4 2 1
This is probably how I would do it:
library(plyr)
dd <- ddply(new.df, .(sensor), summarize,
            max.value = max(value),
            hour.of.max = hour_day[which.max(value)])
new.df <- merge(new.df, dd, all.x = TRUE, by = "sensor")
new.df$hour_from_max_hour <- new.df$hour_day - new.df$hour.of.max
This gives you a couple of extra columns, but you can delete them:
sensor hour_day value max.value hour.of.max hour_from_max_hour
1 A 0 1 3 2 -2
2 A 1 1 3 2 -1
3 A 2 3 3 2 0
4 A 3 1 3 2 1
5 A 4 2 3 2 2
6 B 0 1 5 3 -3
7 B 1 3 5 3 -2
8 B 2 4 5 3 -1
9 B 3 5 5 3 0
10 B 4 2 5 3 1
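The question's EDIT adds a day variable and ties in the maximum. A base R sketch for that case follows, assuming the groups are defined by sensor and day together and that ties are resolved by taking the first maximum (`which.max` returns the index of the first one). The helper `first_max_hour` and the column `max_hour` are hypothetical names introduced for illustration.

```r
# df_add retyped from the question's EDIT
df_add <- data.frame(
  sensor   = rep(rep(c("A", "B"), each = 5), times = 2),
  hour_day = rep(0:4, times = 4),
  value    = c(1, 1, 3, 3, 2,
               3, 2, 4, 4, 1,
               1, 5, 6, 6, 2,
               2, 1, 3, 3, 1),
  day      = rep(c(1, 2), each = 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: given the row numbers of one sensor/day group,
# return the hour at which the first maximum of value occurs.
first_max_hour <- function(rows) {
  df_add$hour_day[rows][which.max(df_add$value[rows])]
}

# Apply the helper per sensor and day; ave() recycles the single hour
df_add$max_hour <- ave(seq_len(nrow(df_add)), df_add$sensor, df_add$day,
                       FUN = first_max_hour)
df_add$hour_from_max_hour <- df_add$hour_day - df_add$max_hour
df_add
```

In this data the first maximum happens to fall at hour 2 in every sensor/day group, so each group gets the offsets -2 to 2.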

How to perform a cumulative sum with unique IDs only?

I have the following data frame:
d<-data.frame(Day=c(1, 1, 1, 1, 1, 1, 2), ID=c("A", "B", "C", "D", "A", "B", "B"), Value=c(1, 2, 3, 4, 5, 6, 7))
On each day, I would like a cumulative sum of unique values, taking only the most recent value for an entry that repeats. My expected output is as follows:
d<-data.frame(Day=c(1, 1, 1, 1, 1, 1, 2), ID=c("A", "B", "C", "D", "A", "B", "B"), Value=c(1, 2, 3, 4, 5, 6, 7), Sum=c(1, 3, 6, 10, 14, 18, 7))
Day ID Value Sum
1 1 A 1 1
2 1 B 2 3
3 1 C 3 6
4 1 D 4 10
5 1 A 5 14
6 1 B 6 18
7 2 B 7 7
where the 5th entry adds up values 2, 3, 4, 5 (because A repeats) and the 6th entry adds up values 3, 4, 5, and 6 (because both A and B repeat). The 7th entry restarts because it is a new day.
I don't think I can use cumsum() as it only accepts 1 parameter. I also don't want to keep a counter for each ID as I may have up to 100 unique IDs per day.
Any hints or help would be appreciated! Thank you!
You can difference the values by ID and Day and then use cumsum:
library(data.table)
setDT(d)
d[, v_eff := Value - shift(Value, fill=0), by=.(Day, ID)]
d[, s := cumsum(v_eff), by=Day]
Day ID Value Sum v_eff s
1: 1 A 1 1 1 1
2: 1 B 2 3 2 3
3: 1 C 3 6 3 6
4: 1 D 4 10 4 10
5: 1 A 5 14 4 14
6: 1 B 6 18 4 18
7: 2 B 7 7 7 7
Base R analogue...
d$v_eff <- with(d, ave(Value, Day, ID, FUN = function(x) c(x[1], diff(x)) ))
d$s <- with(d, ave(v_eff, Day, FUN = cumsum))
Day ID Value Sum v_eff s
1 1 A 1 1 1 1
2 1 B 2 3 2 3
3 1 C 3 6 3 6
4 1 D 4 10 4 10
5 1 A 5 14 4 14
6 1 B 6 18 4 18
7 2 B 7 7 7 7
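To see why differencing works, it can help to compare it with a direct (and slower) computation of what the question describes: for each row, sum the most recent value of every ID seen so far on that day. The sketch below is only a check on the example data; `brute`, `v_eff` and `s` are illustrative names.

```r
# Data retyped from the question
d <- data.frame(
  Day   = c(1, 1, 1, 1, 1, 1, 2),
  ID    = c("A", "B", "C", "D", "A", "B", "B"),
  Value = c(1, 2, 3, 4, 5, 6, 7),
  stringsAsFactors = FALSE
)

# Direct computation: for row i, keep the latest Value per ID among the
# rows of the same day up to and including i, then sum them.
brute <- sapply(seq_len(nrow(d)), function(i) {
  seen   <- d[seq_len(i), ]
  seen   <- seen[seen$Day == d$Day[i], ]
  latest <- seen[!duplicated(seen$ID, fromLast = TRUE), ]
  sum(latest$Value)
})

# Differencing approach from the answers: each repeat contributes only
# its change relative to the previous value of the same ID on that day.
v_eff <- ave(d$Value, d$Day, d$ID, FUN = function(x) c(x[1], diff(x)))
s     <- ave(v_eff, d$Day, FUN = cumsum)

cbind(d, brute = brute, s = s)
```

Replacing an old value with a new one changes the running total by exactly their difference, which is why the cumulative sum of `v_eff` agrees with the direct computation.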

Resources