group_by and case_when() function for multiple conditions - r

I'm struggling with a problem in R. I want to create a new variable (qc), grouping by NAME and PLOT and using case_when(): where "EH" > "PH", give me "B", otherwise give me "Q".
I have a data set like this:
df <- tibble(
NAMEOFEXPERIMENT= c("A","A","A","A","A","A","A","B","B","B","B","B","B","B","B"),
PLOT= c(2,1,2,1,2,1,2,1,2,1,2,1,2,1,2),
trait= c("EH","NP","NP","PH","PH","PL","PL","EH","EH","NP","NP","PH","PH","PL","PL"),
traitValue= c(125,36,36,240,"NA",36,36,90,110,35,33,215,190,36,31)
)
# A tibble: 15 x 4
NAME PLOT trait traitValue
<chr> <dbl> <chr> <chr>
1 A 2 EH 250
2 A 1 NP 36
3 A 2 NP 36
4 A 1 PH 240
5 A 2 PH 200
6 A 1 PL 36
7 A 2 PL 36
8 B 1 EH 90
9 B 2 EH 110
10 B 1 NP 35
11 B 2 NP 33
12 B 1 PH 215
13 B 2 PH 190
14 B 1 PL 36
15 B 2 PL 31
This is what I want to achieve: if "EH" > "PH", then give me B, else give me Q.
If "PL" > "NP", then give me B, else give me Q.
Thus, qc on line 4 should be empty, since there is no NAME "A", PLOT 1, trait "EH" to compare with.
# A tibble: 15 x 5
NAME PLOT trait traitValue qc
<chr> <dbl> <chr> <chr> <chr>
1 A 2 EH 250 B
2 A 1 NP 36 Q
3 A 2 NP 36 Q
4 A 1 PH 240
5 A 2 PH 200 B
6 A 1 PL 36 Q
7 A 2 PL 36 Q
8 B 1 EH 90 Q
9 B 2 EH 110 Q
10 B 1 NP 35 B
11 B 2 NP 33 Q
12 B 1 PH 215 Q
13 B 2 PH 190 Q
14 B 1 PL 36 B
15 B 2 PL 31 Q
When I run this code
dt2 <- df %>%
  group_by(NAME, PLOT) %>%
  mutate(data_qc = case_when(
    traitValue[trait == "EH"] > traitValue[trait == "PH"] ~ "B",
    traitValue[trait == "EH"] < traitValue[trait == "PH"] ~ "Q",
    traitValue[trait == "PL"] > traitValue[trait == "NP"] ~ "B",
    traitValue[trait == "PL"] < traitValue[trait == "NP"] ~ "Q"
  ))
I got this Error
Error in `mutate()`:
! Problem while computing `data_qc = case_when(...)`.
i The error occurred in group 1: NAME = "A", PLOT = 1.
Caused by error in`case_when()`:
! `traitValue[trait == "EH"] > traitValue[trait == "PH"] ~ "B"`, traitValue[trait == "EH"] < traitValue[trait == "PH"] ~ "Q"`
must be length 3 or one, not 0.

I don't fully understand your constraints. You did not specify what should happen if "PH" > "EH" and "PL" > "NP" at the same time: in that case, should the final outcome be "B" or "Q"?
However, to get you started I wrote the following code:
## Loading the required libraries
library(dplyr)
library(tidyverse)
## Creating the dataframe
df <- data.frame(
NAMEOFEXPERIMENT= c("A","A","A","A","A","A","A","B","B","B","B","B","B","B","B"),
PLOT= c(2,1,2,1,2,1,2,1,2,1,2,1,2,1,2),
trait= c("EH","NP","NP","PH","PH","PL","PL","EH","EH","NP","NP","PH","PH","PL","PL"),
traitValue= c(125,36,36,240,200,36,36,90,110,35,33,215,190,36,31)
)
## Removing duplicates (assign the result, otherwise df is left unchanged)
df <- unique(df)
## Pivot longer to wider
df %>%
pivot_wider(names_from = trait, values_from = traitValue) %>%
arrange(NAMEOFEXPERIMENT,PLOT) %>%
mutate(ConditionalValue1 = ifelse(EH>PH,"B", "Q"),
ConditionalValue2 = ifelse(PL>NP,"B", "Q"))
Output
# A tibble: 4 x 8
NAMEOFEXPERIMENT PLOT EH NP PH PL ConditionalValue1 ConditionalValue2
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 A 1 NA 36 240 36 NA Q
2 A 2 125 36 200 36 Q Q
3 B 1 90 35 215 36 Q B
4 B 2 110 33 190 31 Q Q
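The error in the question comes from the group NAME = "A", PLOT = 1, which has no "EH" row, so traitValue[trait == "EH"] has length 0 there. As a minimal sketch of one way to stay in long format, the following uses a hypothetical helper, first_or_na() (not part of dplyr), that returns NA when a trait is missing from a group. It assumes the column is called NAME, that traitValue is numeric with the values from the printed table (250 and 200 for NAME "A", PLOT 2), and that EH/PH rows receive the EH > PH result while PL/NP rows receive the PL > NP result, which is how the desired output appears to be filled in.

```r
library(dplyr)
library(tibble)

# Assumed data: values taken from the printed table in the question
df <- tibble(
  NAME = c("A","A","A","A","A","A","A","B","B","B","B","B","B","B","B"),
  PLOT = c(2,1,2,1,2,1,2,1,2,1,2,1,2,1,2),
  trait = c("EH","NP","NP","PH","PH","PL","PL","EH","EH","NP","NP","PH","PH","PL","PL"),
  traitValue = c(250,36,36,240,200,36,36,90,110,35,33,215,190,36,31)
)

# Hypothetical helper: first value of one trait in the current group, NA if absent
first_or_na <- function(value, trait, which) {
  v <- value[trait == which]
  if (length(v) == 0) NA_real_ else v[1]
}

qc_long <- df %>%
  group_by(NAME, PLOT) %>%
  mutate(
    # One comparison result per group, recycled to every row of the group
    eh_gt_ph = first_or_na(traitValue, trait, "EH") > first_or_na(traitValue, trait, "PH"),
    pl_gt_np = first_or_na(traitValue, trait, "PL") > first_or_na(traitValue, trait, "NP"),
    qc = case_when(
      trait %in% c("EH", "PH") ~ if_else(eh_gt_ph, "B", "Q"),
      trait %in% c("PL", "NP") ~ if_else(pl_gt_np, "B", "Q")
    )
  ) %>%
  ungroup() %>%
  select(-eh_gt_ph, -pl_gt_np)

qc_long
```

Because a comparison with NA yields NA, the PH row of NAME "A", PLOT 1 ends up with a missing qc instead of raising an error.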

Related

Calculating changes from "yesterday's" value in R?

I have data on bird individuals and their feeding locations. The feeding locations move, and so I want to create a variable that calculates the distance from yesterday's feeding location to "today's" feeding options.
Here is a reprex that exemplifies what I'm talking about. The 'bird' column represents the bird individual IDs, and feedLoc represents the possible feeding locations for each day. Then there is the date of that observation. h (horizontal) and v (vertical) represent the coordinates of the feeding locations on a grid. Finally, bp represents whether that individual was identified at the feeding location or not.
reprex <- tibble(bird = c("A", "A", "A", "B", "B", "B", "C", "C"),
feedLoc = c("x","y", "x", "x", "y", "x", "y", "z"),
date = as.Date(c("2020-05-10", "2020-05-11", "2020-05-11",
"2020-05-24", "2020-05-25", "2020-05-25",
"2020-05-22", "2020-05-23")),
h = c(100, 123, 45, 75, 89, 64, 99, 101),
v = c(89, 23, 65, 92, 29, 90, 120, 34),
bp = c(1, 1, 0, 1, 0, 1, 1, 0))
Which produces this:
# A tibble: 8 × 6
bird feedLoc date h v bp
<chr> <chr> <date> <dbl> <dbl> <dbl>
1 A x 2020-05-10 100 89 1
2 A y 2020-05-11 123 23 1
3 A x 2020-05-11 45 65 0
4 B x 2020-05-24 75 92 1
5 B y 2020-05-25 89 29 0
6 B x 2020-05-25 64 90 1
7 C y 2020-05-22 99 120 1
8 C z 2020-05-23 101 34 0
My question is, I want to make a new variable that calculates the distance from yesterday's feeding choice (so, the rows where bp == 1 AND date == date - 1), to the current feeding location options for each bird individual using the coordinate data. How would I do this? Thanks!
I initially tried to group by bird and feedLoc id's, arrange by date, and then lag the h and v variables so that I could then use the distance formula to calculate distance from yesterday's ant swarm choice. However, the issue with that is that in the data set, the row previous when arranged is not always exactly "yesterday".
Create a dataframe filtered to bp == 1, add 1 to the date to match rows to the next day, then left_join() to your original data to compute distances:
library(dplyr)
yesterday <- reprex %>%
filter(bp == 1) %>%
transmute(bird, date = date + 1, h.yest = h, v.yest = v)
reprex %>%
left_join(yesterday, by = c("bird", "date")) %>%
mutate(
dist = sqrt((h - h.yest)^2 + (v - v.yest)^2)
) %>%
select(!h.yest:v.yest)
# A tibble: 8 × 7
bird feedLoc date h v bp dist
<chr> <chr> <date> <dbl> <dbl> <dbl> <dbl>
1 A x 2020-05-10 100 89 1 NA
2 A y 2020-05-11 123 23 1 69.9
3 A x 2020-05-11 45 65 0 60.0
4 B x 2020-05-24 75 92 1 NA
5 B y 2020-05-25 89 29 0 64.5
6 B x 2020-05-25 64 90 1 11.2
7 C y 2020-05-22 99 120 1 NA
8 C z 2020-05-23 101 34 0 86.0
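To illustrate why shifting the date and joining differs from lag(), here is a small sketch on hypothetical data (bird "D" and its coordinates are made up for this example). The bird is recorded on 2020-06-01 and 2020-06-03 only, so there is no true "yesterday" for either row.

```r
library(dplyr)
library(tibble)

# Hypothetical data: no observation on 2020-06-02
gap <- tibble(
  bird = c("D", "D"),
  date = as.Date(c("2020-06-01", "2020-06-03")),
  h = c(0, 3),
  v = c(0, 4),
  bp = c(1, 1)
)

# Shift chosen locations forward one day so they line up with "today"
yesterday_gap <- gap %>%
  filter(bp == 1) %>%
  transmute(bird, date = date + 1, h.yest = h, v.yest = v)

gap_dist <- gap %>%
  left_join(yesterday_gap, by = c("bird", "date")) %>%
  mutate(
    dist = sqrt((h - h.yest)^2 + (v - v.yest)^2),    # join-based: NA when no record for the day before
    lag_dist = sqrt((h - lag(h))^2 + (v - lag(v))^2) # lag-based: uses the previous row regardless of date
  )

gap_dist
```

Here dist is NA on both rows, while lag_dist reports a distance for the second row even though the previous row is two days earlier.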
Try something like this dplyr approach. It first restricts the manipulation to rows with bp == 1, then checks whether the feeding location is different and the previous date is exactly one day earlier (lag(date) == date - 1), and then calculates the difference for h and v. After all that, it adds back the bp == 0 data and rearranges (this approach saves a more convoluted case_when statement). If this isn't exactly what you need, post an example of the desired output and I will edit. Good luck!
library(dplyr)
reprex %>%
group_by(bird) %>%
filter(bp == 1) %>%
arrange(date) %>%
mutate(h_change = case_when(
feedLoc != lag(feedLoc) & lag(date) == date - 1 ~ h - lag(h)),
v_change = case_when(
feedLoc != lag(feedLoc) & lag(date) == date - 1 ~ v - lag(v)
)) %>%
right_join(reprex) %>% arrange(bird, date)
Output:
# bird feedLoc date h v bp h_change v_change
# <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A x 2020-05-10 100 89 1 NA NA
# 2 A y 2020-05-11 123 23 1 23 -66
# 3 A x 2020-05-11 45 65 0 NA NA
# 4 B x 2020-05-24 75 92 1 NA NA
# 5 B x 2020-05-25 64 90 1 NA NA
# 6 B y 2020-05-25 89 29 0 NA NA
# 7 C y 2020-05-22 99 120 1 NA NA
# 8 C z 2020-05-23 101 34 0 NA NA

Identify pairs or groups of rows that have the same values across multiple columns

Say I have a data.frame:
file = read.table(text = "sex age num
M 32 5
F 31 2
M 91 2
M 30 1
M 23 1
F 19 1
F 31 2
F 21 2
M 32 5
F 65 3
M 24 5", header = T, sep = "")
I want to get a sorted data frame of all rows that have the exact same values of sex, age, and num with any other row in the data frame.
The result should look like this (note that the data frame is sorted by the pairs or groups that are duplicated with each other):
result = read.table(text = "sex age num
M 32 5
M 32 5
F 31 2
F 31 2", header = T, sep = "")
I have tried various combinations of distinct in dplyr and duplicated, but they don't quite get at this use case.
We need duplicated() twice: once in the normal direction, from top to bottom, and once from bottom to top (fromLast = TRUE). Combining the two with | gives TRUE for a row that is duplicated in either direction, which we then use for subsetting.
out <- file[duplicated(file)|duplicated(file, fromLast = TRUE),]
out$sex <- factor(out$sex, levels = c("M", "F"))
out1 <- out[do.call(order, out),]
row.names(out1) <- NULL
-output
> out1
sex age num
1 M 32 5
2 M 32 5
3 F 31 2
4 F 31 2
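A small sketch on a made-up character vector shows why both directions are needed: duplicated() alone flags only the later copies, and fromLast = TRUE flags only the earlier ones.

```r
# Hypothetical example vector
x <- c("a", "b", "a", "c", "b")

forward  <- duplicated(x)                   # flags repeats seen after a first occurrence
backward <- duplicated(x, fromLast = TRUE)  # flags repeats when scanning from the end
either   <- forward | backward              # TRUE for every element that has a twin

x[either]
```

Only "c", which occurs once, is dropped; every copy of "a" and "b" is kept.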
The above can be written in tidyverse
library(dplyr)
file %>%
arrange(sex == "F", across(everything())) %>%
filter(duplicated(.)|duplicated(., fromLast = TRUE))
sex age num
1 M 32 5
2 M 32 5
3 F 31 2
4 F 31 2
An alternative approach:
Here all groups with more than one row are kept:
library(dplyr)
file %>%
  group_by(sex, age, num) %>%
  filter(n() > 1) %>%
  arrange(.by_group = TRUE) %>%
  ungroup()
sex age num
<chr> <int> <int>
1 F 31 2
2 F 31 2
3 M 32 5
4 M 32 5
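As a sketch of the same idea without explicit grouping, add_count() can attach the group size as a column, which is then used for filtering. The column name n_same is an arbitrary choice for this example.

```r
library(dplyr)

file <- read.table(text = "sex age num
M 32 5
F 31 2
M 91 2
M 30 1
M 23 1
F 19 1
F 31 2
F 21 2
M 32 5
F 65 3
M 24 5", header = TRUE)

dups <- file %>%
  add_count(sex, age, num, name = "n_same") %>%  # number of rows sharing all three values
  filter(n_same > 1) %>%
  arrange(sex, age, num) %>%
  select(-n_same)

dups
```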
file = read.table(text = "sex age num
M 32 5
F 31 2
M 91 2
M 30 1
M 23 1
F 19 1
F 31 2
F 21 2
M 32 5
F 65 3
M 24 5", header = T, sep = "")
library(vctrs)
library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 4.1.2
file %>%
filter(vec_duplicate_detect(.)) %>%
arrange(across(everything()))
#> sex age num
#> 1 F 31 2
#> 2 F 31 2
#> 3 M 32 5
#> 4 M 32 5
Created on 2022-08-19 by the reprex package (v2.0.1.9000)
A base R option using subset + ave
> subset(file, ave(seq_along(num), sex, age, num, FUN = length) > 1)
sex age num
1 M 32 5
2 F 31 2
7 F 31 2
9 M 32 5
or rbind + split
> do.call(rbind, Filter(function(x) nrow(x) > 1, split(file, ~ sex + age + num)))
sex age num
F.31.2.2 F 31 2
F.31.2.7 F 31 2
M.32.5.1 M 32 5
M.32.5.9 M 32 5
Here is an approach, using .SD[.N>1] by group in data.table
library(data.table)
result = setDT(file)[, i:=.I][, .SD[.N>1],.(sex,age,num)][, i:=NULL]
Output:
sex age num
1: M 32 5
2: M 32 5
3: F 31 2
4: F 31 2

Dplyr, join successive dataframes to pre-existing columns, summing their values

I want to perform multiple joins to original dataframe, from the same source with different IDs each time. Specifically I actually only need to do two joins, but when I perform the second join, the columns being joined already exist in the input df, and rather than add these columns with new names using the .x/.y suffixes, I want to sum the values to the existing columns. See the code below for the desired output.
# Input data:
values <- tibble(
id = LETTERS[1:10],
variable1 = 1:10,
variable2 = (1:10)*10
)
df <- tibble(
twin_id = c("A/F", "B/G", "C/H", "D/I", "E/J")
)
> values
# A tibble: 10 x 3
id variable1 variable2
<chr> <int> <dbl>
1 A 1 10
2 B 2 20
3 C 3 30
4 D 4 40
5 E 5 50
6 F 6 60
7 G 7 70
8 H 8 80
9 I 9 90
10 J 10 100
> df
# A tibble: 5 x 1
twin_id
<chr>
1 A/F
2 B/G
3 C/H
4 D/I
5 E/J
So this is the two joins:
joined_df <- df %>%
tidyr::separate(col = twin_id, into = c("left_id", "right_id"), sep = "/", remove = FALSE) %>%
left_join(values, by = c("left_id" = "id")) %>%
left_join(values, by = c("right_id" = "id"))
> joined_df
# A tibble: 5 x 7
twin_id left_id right_id variable1.x variable2.x variable1.y variable2.y
<chr> <chr> <chr> <int> <dbl> <int> <dbl>
1 A/F A F 1 10 6 60
2 B/G B G 2 20 7 70
3 C/H C H 3 30 8 80
4 D/I D I 4 40 9 90
5 E/J E J 5 50 10 100
And this is the output I want, using the only way I can see to get it:
output_df_wanted <- joined_df %>%
mutate(
variable1 = variable1.x + variable1.y,
variable2 = variable2.x + variable2.y) %>%
select(twin_id, left_id, right_id, variable1, variable2)
> output_df_wanted
# A tibble: 5 x 5
twin_id left_id right_id variable1 variable2
<chr> <chr> <chr> <int> <dbl>
1 A/F A F 7 70
2 B/G B G 9 90
3 C/H C H 11 110
4 D/I D I 13 130
5 E/J E J 15 150
I can see how to get what I want using a mutate statement, but I will have a much larger number of variables in the actual dataset. I am wondering if this is the best way to do this.
You can try reshaping your data and using dplyr::summarise_at:
library(tidyr)
library(dplyr)
df %>%
separate(col = twin_id, into = c("left_id", "right_id"), sep = "/", remove = FALSE) %>%
pivot_longer(-twin_id) %>%
left_join(values, by = c("value" = "id")) %>%
group_by(twin_id) %>%
summarise_at(vars(starts_with("variable")), sum) %>%
separate(col = twin_id, into = c("left_id", "right_id"), sep = "/", remove = FALSE)
## A tibble: 5 x 5
# twin_id left_id right_id variable1 variable2
# <chr> <chr> <chr> <int> <dbl>
#1 A/F A F 7 70
#2 B/G B G 9 90
#3 C/H C H 11 110
#4 D/I D I 13 130
#5 E/J E J 15 150
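In current dplyr versions, summarise_at() is superseded by across(). A minimal sketch of the same reshaping idea with across() follows; the column names side and id passed to pivot_longer() are choices made for this example, and the left_id/right_id columns are not re-created here.

```r
library(dplyr)
library(tidyr)
library(tibble)

values <- tibble(
  id = LETTERS[1:10],
  variable1 = 1:10,
  variable2 = (1:10) * 10
)
df <- tibble(
  twin_id = c("A/F", "B/G", "C/H", "D/I", "E/J")
)

summed <- df %>%
  separate(col = twin_id, into = c("left_id", "right_id"), sep = "/", remove = FALSE) %>%
  # One row per twin member, so a single join covers both sides
  pivot_longer(c(left_id, right_id), names_to = "side", values_to = "id") %>%
  left_join(values, by = "id") %>%
  group_by(twin_id) %>%
  # Sum every column whose name starts with "variable"
  summarise(across(starts_with("variable"), sum), .groups = "drop")

summed
```

Because the columns are selected by name pattern, the same code should work unchanged when there are many more variable columns.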
You can use my package safejoin, if using a GitHub package is acceptable to you.
The idea is that you have conflicting columns. dplyr and base R deal with the conflict by renaming them, while safejoin is more flexible: you can supply the function to apply in case of conflicts. Here you want to add them, so we'll use conflict = `+`; for the same effect you could have used conflict = ~ .x + .y or conflict = ~ ..1 + ..2.
# remotes::install_github("moodymudskipper/safejoin")
library(tidyverse)
library(safejoin)
values <- tibble(
id = LETTERS[1:10],
variable1 = 1:10,
variable2 = (1:10)*10
)
df <- tibble(
twin_id = c("A/F", "B/G", "C/H", "D/I", "E/J")
)
joined_df <- df %>%
tidyr::separate(col = twin_id, into = c("left_id", "right_id"), sep = "/", remove = FALSE) %>%
left_join(values, by = c("left_id" = "id")) %>%
safe_left_join(values, by = c("right_id" = "id"), conflict = `+`)
joined_df
#> # A tibble: 5 x 5
#> twin_id left_id right_id variable1 variable2
#> <chr> <chr> <chr> <int> <dbl>
#> 1 A/F A F 7 70
#> 2 B/G B G 9 90
#> 3 C/H C H 11 110
#> 4 D/I D I 13 130
#> 5 E/J E J 15 150
Created on 2020-04-29 by the reprex package (v0.3.0)

Dynamic Columns in Dplyr using NSE on the RHS

I am attempting to reference existing columns in dplyr through a loop. Effectively, I would like to evaluate the operations from one table (evaluation in below example) to be performed to another table (dt in below example). I do not want to hardcode the column names on the RHS within mutate(). I would like to control the evaluations being performed from the evaluation table below. So I am trying to make the process dynamic.
Here is a sample dataframe:
dt = data.frame(
A = c(1:20),
B = c(11:30),
C = c(21:40),
AA = rep(1, 20),
BB = rep(2, 20)
)
Here is a table of sample operations to be performed:
evaluation = data.frame(
New_Var = c("AA", "BB"),
Operation = c("(A*2) > B", "(B*2) <= C"),
Result = c("True", "False")
) %>% mutate_all(as.character)
What I am trying to do is the following:
for (i in 1:nrow(evaluation)) {
var = evaluation$New_Var[i]
dt = dt %>%
rowwise() %>%
mutate(!!var := ifelse(eval(parse(text = evaluation$Operation[i])),
evaluation$Result[i],
!!var))
}
My desired result would be something like this, except that the "AA" entries in the AA column would instead be the original numeric values of that column (1, 1, 1, 1, 1).
UPDATED:
I believe my syntax in the "False" part of the ifelse statement is incorrect. What is the correct syntax to specify "!!var" in the false portion of the ifelse statement?
I know there are other ways to do it using base R, but I would rather do it through dplyr, as it is cleaner code to look at. I am leveraging rowwise() to do it element by element.
Modified data to (a) enforce type consistency for columns AA and BB and (b) ensure that at least one row satisfies the second condition.
dt = tibble(
A = c(1:20),
B = c(10:29), ## Note the change
C = c(21:40),
AA = rep("a", 20), ## Note initialization with strings
BB = rep("b", 20) ## Ditto
)
To make your loop work, you need to convert your code strings into actual expressions. You can use rlang::sym() for variable names and rlang::parse_expr() for everything else.
for( i in 1:nrow(evaluation) )
{
var <- rlang::sym(evaluation$New_Var[i])
op <- rlang::parse_expr(evaluation$Operation[i])
dt = dt %>% rowwise() %>%
mutate(!!var := ifelse(!!op, evaluation$Result[i],!!var))
}
# # A tibble: 20 x 5
# A B C AA BB
# <int> <int> <int> <chr> <chr>
# 1 1 10 21 a False
# 2 2 11 22 a False
# 3 3 12 23 a b
# 4 4 13 24 a b
# 5 5 14 25 a b
# 6 6 15 26 a b
# 7 7 16 27 a b
# 8 8 17 28 a b
# 9 9 18 29 a b
# 10 10 19 30 True b
# 11 11 20 31 True b
# 12 12 21 32 True b
# 13 13 22 33 True b
# 14 14 23 34 True b
# 15 15 24 35 True b
# 16 16 25 36 True b
# 17 17 26 37 True b
# 18 18 27 38 True b
# 19 19 28 39 True b
# 20 20 29 40 True b
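The difference between rlang::sym() and rlang::parse_expr() can be seen in a small sketch outside of mutate(). The vectors passed to eval_tidy() are made up for illustration.

```r
library(rlang)

# sym() turns the whole string into ONE variable name, even if it looks like code
op_sym <- sym("(A*2) > B")

# parse_expr() parses the string into a call that can be evaluated
op_expr <- parse_expr("(A*2) > B")

is_symbol(op_sym)  # the string became a single symbol
is_call(op_expr)   # the string became a call to `>`

# Evaluate the parsed expression against some hypothetical columns
res <- eval_tidy(op_expr, data = list(A = c(1, 10), B = c(5, 5)))
res
```

This is why column names go through sym() while operations such as "(A*2) > B" need parse_expr().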
Assuming that Felipe's answer was the functionality you desired, here's a more "tidyverse"/pipe-oriented/functional approach.
Data
library(rlang)
library(dplyr)
library(purrr)
operations <- tibble(
old_var = exprs(A, B),
new_var = exprs(AA, BB),
test = exprs(2*A > B, 2*B <= C),
result = exprs("True", "False")
)
original <- tibble(
A = sample.int(30, 10),
B = sample.int(30, 10),
C = sample.int(30, 10)
)
original
# A tibble: 10 x 3
A B C
<int> <int> <int>
1 4 20 5
2 30 29 11
3 1 27 14
4 2 21 4
5 17 19 24
6 14 25 9
7 5 22 22
8 6 13 7
9 25 4 21
10 12 11 12
Functions
# Here's your reusable functions
generic_mutate <- function(dat, new_var, test, result, old_var) {
dat %>% mutate(!!new_var := ifelse(!!test, !!result, !!old_var))
}
generic_ops <- function(dat, ops) {
pmap(ops, generic_mutate, dat = dat) %>%
reduce(full_join)
}
generic_mutate takes a single original dataframe, a single new_var, etc. It performs the test, adds the new column with the appropriate name and values.
generic_ops is the "vectorized" version. It takes the original dataframe as the first argument, and a dataframe of operations as the second. It then parallel maps over each column of new variable names, tests, etc, and calls generic_mutate on each one. That results in a list of dataframes, each with one added column. The reduce then combines them back all together with a sequential full_join.
Results
original %>%
generic_ops(operations)
Joining, by = c("A", "B", "C")
# A tibble: 10 x 5
A B C AA BB
<int> <int> <int> <chr> <chr>
1 4 20 5 4 20
2 30 29 11 True 29
3 1 27 14 1 27
4 2 21 4 2 21
5 17 19 24 True 19
6 14 25 9 True 25
7 5 22 22 5 22
8 6 13 7 6 13
9 25 4 21 True False
10 12 11 12 True 11
The magic here is using exprs(...) so you can store NSE names and operations in a tibble without forcing their evaluation. I think this is a lot cleaner than storing names and operations in strings with quotation marks.
How's this:
evaluation = data.frame(
Old_Var = c('A', 'B'),
New_Var = c("AA", "BB"),
Operation = c("(A*2) > B", "(B*2) <= C"),
Result = c("True", "False")
) %>% mutate_all(as.character)
for (i in 1:nrow(evaluation)) {
old <- sym(evaluation$Old_Var[i])
new <- sym(evaluation$New_Var[i])
op <- sym(evaluation$Operation[i])
res <- sym(evaluation$Result[i])
dt <- dt %>%
mutate(!!new := ifelse(!!op, !!res, !!old))
}
EDIT: My last answer doesn't work because sym() makes rlang look for a variable whose name is the whole operation string (e.g. a variable named (A*2) > B) instead of evaluating the expression. I got this to work using a mix of tidy evaluation and base R. You can of course follow @Brian's advice and use this solution with pmap. I honestly don't know how well this will perform though, as I think it will evaluate the ifelse once per row, and I am not sure it's a vectorized operation...
dt <- tibble(
A = c(1:20),
B = c(11:30),
C = c(21:40),
AA = rep(1, 20),
BB = rep(2, 20)
)
evaluation = tibble(
Old_Var = c('A', 'B'),
New_Var = c("AA", "BB"),
Operation = c('(A*2) > B', '(B*2) <= C'),
Result = c("True", "False")
)
for (i in 1:nrow(evaluation)) {
old <- evaluation$Old_Var[i]
new <- evaluation$New_Var[i]
op <- evaluation$Operation[i]
res <- evaluation$Result[i]
dt <- dt %>%
mutate(!!sym(new) := eval(parse(text = sprintf('ifelse(%s, "%s", %s)', op, res, old))))
}
One way is to rework the conditions first, then pass them to mutate :
conds <- parse(text=evaluation$Operation) %>%
as.list() %>%
setNames(evaluation$New_Var) %>%
imap(~expr(ifelse(!!.,"True", !!sym(.y))))
conds
#> $AA
#> ifelse((A * 2) > B, "True", AA)
#>
#> $BB
#> ifelse((B * 2) <= C, "True", BB)
dt %>% mutate(!!!conds)
#> A B C AA BB
#> 1 1 11 21 1 2
#> 2 2 12 22 1 2
#> 3 3 13 23 1 2
#> 4 4 14 24 1 2
#> 5 5 15 25 1 2
#> 6 6 16 26 1 2
#> 7 7 17 27 1 2
#> 8 8 18 28 1 2
#> 9 9 19 29 1 2
#> 10 10 20 30 1 2
#> 11 11 21 31 True 2
#> 12 12 22 32 True 2
#> 13 13 23 33 True 2
#> 14 14 24 34 True 2
#> 15 15 25 35 True 2
#> 16 16 26 36 True 2
#> 17 17 27 37 True 2
#> 18 18 28 38 True 2
#> 19 19 29 39 True 2
#> 20 20 30 40 True 2

Subtract value from tibble column based on another tibble

Say I have a tibble of values:
raw = tibble(
group = c("A", "B", "C", "A", "B", "C"),
value = c(10, 20, 30, 40, 50, 60)
)
# A tibble: 6 x 2
group value
<chr> <dbl>
1 A 10
2 B 20
3 C 30
4 A 40
5 B 50
6 C 60
I want to subtract a certain amount from each value in my tibble depending on which group it belongs to. The amounts I need to subtract are in another tibble:
corrections = tibble(
group = c("A", "B", "C"),
corr = c(0, 1, 2)
)
# A tibble: 3 x 2
group corr
<chr> <dbl>
1 A 0
2 B 1
3 C 2
What is the most elegant way to achieve this? The following works, but I feel like it is messy - surely there is another way?
mutate(raw, corrected = value - as_vector(corrections[corrections["group"] == group, "corr"]))
# A tibble: 6 x 3
group value corrected
<chr> <dbl> <dbl>
1 A 10 10
2 B 20 19
3 C 30 28
4 A 40 40
5 B 50 49
6 C 60 58
How about first joining raw and corrections and then calculating corrected?
library(dplyr)
left_join(raw, corrections, by = "group") %>%
mutate(corrected = value - corr) %>%
select(-corr)
#> # A tibble: 6 x 3
#> group value corrected
#> <chr> <dbl> <dbl>
#> 1 A 10 10
#> 2 B 20 19
#> 3 C 30 28
#> 4 A 40 40
#> 5 B 50 49
#> 6 C 60 58
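If a join feels heavy for a two-column lookup table, another option is a named vector. This is only a sketch: it relies on tibble::deframe(), which turns a two-column tibble into a named vector, and assumes every group in raw has exactly one entry in corrections.

```r
library(dplyr)
library(tibble)

raw <- tibble(
  group = c("A", "B", "C", "A", "B", "C"),
  value = c(10, 20, 30, 40, 50, 60)
)
corrections <- tibble(
  group = c("A", "B", "C"),
  corr = c(0, 1, 2)
)

# Named vector: names are the groups, values are the corrections
lookup <- deframe(corrections)

corrected <- raw %>%
  mutate(corrected = value - unname(lookup[group]))  # index the lookup by group name

corrected
```

A group missing from corrections would give NA here, much like the unmatched rows of a left_join().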
