Declaring variables inside mutate - r

I am trying to refer to a column inside mutate using all_of, with the column name stored in a variable, but I am not getting the proper output.
asd <- data.frame(Col1 = c("A","B"), Col2 = c("R","E"))
a1 <- "Col1"
When I perform the operation below, I get the wrong output:
asd %>% mutate(q1 = case_when(all_of(a1) == "A" ~ 1))
Col1 Col2 q1
1 A R NA
2 B E NA
Expected Output
asd %>% mutate(q1 = case_when(Col1 == "A" ~ 1))
Col1 Col2 q1
1 A R 1
2 B E NA

Or we could use glue::glue. Just bear in mind that whatever you put inside the curly braces will be evaluated as R code:
library(glue)
asd %>%
mutate(q1 = case_when(
eval(parse(text = glue("{a1}"))) == "A" ~ 1
))
Col1 Col2 q1
1 A R 1
2 B E NA

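Wrap it in get(), which looks up the object (here, the column) named by the string in a1. all_of() is intended for selection contexts such as select() or across(), so it is not needed around get():
R> asd %>% mutate(q1 = case_when(get(a1) == "A" ~ 1))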
Col1 Col2 q1
1 A R 1
2 B E NA

We could use across
library(dplyr)
library(stringr)
asd %>%
mutate(across(all_of(a1), ~ case_when(. == 'A' ~ 1),
.names = "{str_replace(.col, '.*', 'q1')}"))
Col1 Col2 q1
1 A R 1
2 B E NA
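Another option, not mentioned in the answers above, is the .data pronoun that dplyr provides for data-masking verbs such as mutate. The following is a minimal sketch, assuming a reasonably recent dplyr; the name res is only used for illustration.

```r
library(dplyr)

asd <- data.frame(Col1 = c("A", "B"), Col2 = c("R", "E"))
a1 <- "Col1"

# .data[[a1]] retrieves the column whose name is stored in the string a1
res <- asd %>%
  mutate(q1 = case_when(.data[[a1]] == "A" ~ 1))

res
```

Unlike eval(parse()) or get(), this only ever looks inside the data frame, so a variable of the same name in the calling environment cannot be picked up by accident.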

Related

how to simplify repetitive mutate conditions

I have an example df:
df <- data.frame(
col1 = c(4,5,6,11),
col2 = c('b','b','c', 'b')
)
> df
col1 col2
1 4 b
2 5 b
3 6 c
4 11 b
and I mutate based on these conditions:
df2 <- df %>%
mutate(col3 = case_when(
col2 == 'b' & col1 == 4 ~ 10,
col2 == 'b' & col1 == 5 ~ 15,
col2 == 'b' & col1 == 11 ~ 20,
col2 == 'c' & col1 == 6 ~ 7)
)
> df2
col1 col2 col3
1 4 b 10
2 5 b 15
3 6 c 7
4 11 b 20
You can see that the first 3 conditions are repetitive in that they require col2 == 'b'. Is there some syntax or another package or more efficient way of combining same/similar conditions so that I don't need to repeat col2 == 'b'? Like a one liner that if col2 == 'b' then do these transformations.
You can write nested case_when
df %>% mutate(col3 = case_when(
col2=="b" ~ case_when(
col1 == 4 ~ 10,
col1 == 5 ~ 15,
col1 == 11 ~ 20),
col2=="c" ~ case_when(
col1 == 6 ~ 7)))
Another solution could be to use an auxiliary lookup table. This limits the flexibility of case_when and only works for equality matches, but I suspect it is a lot faster.
library(dplyr)
df <- data.frame(
col1 = c(4,5,6,11),
col2 = c('b','b','c', 'b')
)
choices <- data.frame(
col2 = c(rep("b",3),"c"),
col1 = c(4,5,11,6),
col3 = c(10,15,20,7)
)
df %>% left_join(choices, by = c("col1"="col1", "col2"="col2"))
#> col1 col2 col3
#> 1 4 b 10
#> 2 5 b 15
#> 3 6 c 7
#> 4 11 b 20
Be sure to capture the default unmatched cases.
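As a sketch of what capturing the unmatched cases could look like: rows with no entry in the lookup table come back from left_join with NA in col3, and coalesce can fill in a fallback. The extra row (99, "z") and the default value 0 are assumptions made up for this illustration.

```r
library(dplyr)

# Same data as above plus one hypothetical row with no entry in choices
df <- data.frame(
  col1 = c(4, 5, 6, 11, 99),
  col2 = c("b", "b", "c", "b", "z")
)
choices <- data.frame(
  col2 = c(rep("b", 3), "c"),
  col1 = c(4, 5, 11, 6),
  col3 = c(10, 15, 20, 7)
)

res <- df %>%
  left_join(choices, by = c("col1", "col2")) %>%
  # Unmatched rows have col3 = NA after the join; replace with the assumed default
  mutate(col3 = coalesce(col3, 0))

res
```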

User Defined Function with Case When Reference

I'm trying to define a function that references a value in another cell according to its row (if that makes sense). Some generalized data would look like this:
col1 col2 col3
A 1 B
B 2 C
C 6 NA
My goal is the below, where "calc" is a sum of col2 from the respective row and the col2 value from the referenced row
col1 col2 calc
A 1 3
B 2 8
C 6 6
letters <- c("A","B","C")
calc<- case_when(data$col1 == "A" ~ subset(data$col2, data$col3 == "A"),
data$col1 == "B" ~ subset(data$col2, data$col3 == "B"),
data$col1 == "C" ~ subset(data$col2, data$col3 == "C"))
totals<- data.frame(cbind(data$col2,calc)) %>% rowSums(., na.rm = TRUE)
data.frame(cbind(data$col1,totals))
I'm not sure how to turn this into a function -- I tried the below,
udfunction<- function(x){
calc<- case_when(data$col1 == x ~ subset(data$col2, data$col3 == x))
totals<- data.frame(cbind(data$col1,data$col2,calc)) %>% rowSums(., na.rm = TRUE)
data.frame(cbind(data$col1,totals))
}
udfunction(letters)
then got the error:
In col1 == x : longer object length is not a multiple of shorter object length
udfunction(letters)
Any help would be very appreciated!
transform(df, calc = col2 + c(0, col2)[match(col3, col1, 0) + 1])
col1 col2 col3 calc
1 A 1 B 3
2 B 2 C 8
3 C 6 <NA> 6
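Since the question asks for a function, the one-liner above can be wrapped as follows. This is only a sketch: the function name add_referenced is hypothetical, and it assumes the column names col1, col2 and col3 from the question. match() returns the row position of each col3 value within col1, with 0 for no match; adding 1 and indexing into c(0, col2) turns "no match" into an addition of 0.

```r
# Hypothetical helper: adds to each row's col2 the col2 value of the row
# that col3 points to (looked up via col1)
add_referenced <- function(data) {
  idx <- match(data$col3, data$col1, nomatch = 0)   # 0 where col3 has no match
  data$calc <- data$col2 + c(0, data$col2)[idx + 1] # position 1 holds the 0
  data
}

data <- data.frame(
  col1 = c("A", "B", "C"),
  col2 = c(1, 2, 6),
  col3 = c("B", "C", NA)
)

res <- add_referenced(data)
res
```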

R - identify cols that contain any of a values set

I have a dataframe like this
df <- data.frame(col1 = c(letters[1:4],"a"),col2 = 1:5,col3 = letters[10:14])
df
col1 col2 col3
1 a 1 j
2 b 2 k
3 c 3 l
4 d 4 m
5 a 5 n
I would like to identify the columns that contain any value from the following vector:
vals=c("a","b","n","w")
A tidy solution would be awesome!
We may use select
library(dplyr)
df %>%
select(where(~ any(. %in% vals, na.rm = TRUE)))
-output
col1 col3
1 a j
2 b k
3 c l
4 d m
5 a n
A similar option in base R is with Filter
Filter(\(x) any(x %in% vals, na.rm = TRUE), df)
col1 col3
1 a j
2 b k
3 c l
4 d m
5 a n
Another tidyverse option is to use keep() from purrr.
library(purrr)
df %>%
keep( ~ any(.x %in% vals))
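If only the names of the matching columns are needed rather than the columns themselves, a base R sketch along the same lines could look like this. The names hits and matching_cols are made up for the example.

```r
df <- data.frame(col1 = c(letters[1:4], "a"), col2 = 1:5, col3 = letters[10:14])
vals <- c("a", "b", "n", "w")

# One TRUE/FALSE per column: does the column contain any of the values?
hits <- vapply(df, function(x) any(x %in% vals), logical(1))

# Keep the names of the columns that matched
matching_cols <- names(df)[hits]
matching_cols
```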

dplyr ifelse mutate reference to variable outside the data frame

I have a simple problem but I haven't figured out the solution yet. I don't know how to refer to a variable outside the data frame when I'm using dplyr. Here is a small chunk of code:
library(dplyr)
var <- 1
df <- data.frame(col1 = c("a", "b", "c"), col2 = c(1, 2, 3))
df %>% mutate(col2 = ifelse(var == 1, col2 + var, col2))
Result:
col1 col2
1 a 2
2 b 2
3 c 2
Desired output:
col1 col2
1 a 2
2 b 3
3 c 4
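This is not a dplyr-specific issue. ifelse returns a result with the same length as its test, so a length-1 condition such as var == 1 yields a single value, which mutate then recycles to every row. When the condition has length 1, use if and else instead of the vectorized ifelse.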
library(dplyr)
df %>% mutate(col2 = if(var == 1) col2 + var else col2)
# col1 col2
#1 a 2
#2 b 3
#3 c 4
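The difference can be seen outside of dplyr with plain vectors. This is a small sketch; the names short and full are only for illustration.

```r
var <- 1
col2 <- c(1, 2, 3)

# ifelse() shapes its result like the test, which has length 1 here,
# so only the first element of col2 + var is kept
short <- ifelse(var == 1, col2 + var, col2)

# if/else evaluates the condition once and returns the whole chosen vector
full <- if (var == 1) col2 + var else col2

length(short)
length(full)
```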
We could use rowwise and sum
df %>%
rowwise() %>%
mutate(col2 = ifelse(var == 1, sum(col2,var), col2))
col1 col2
<chr> <dbl>
1 a 2
2 b 3
3 c 4
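We could use base R for this. Note that this variant adds var only to the rows where col2 equals var, so its result differs from the desired output above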
i1 <- df$col2 == var
df$col2[i1] <- df$col2[i1] + var
-output
> df
col1 col2
1 a 2
2 b 2
3 c 3
Or use data.table
library(data.table)
setDT(df)[col2 == var, col2 := col2 + var]

Variable names as Input in an R Function

I have a dataframe with several numeric variables along with factors. I wish to run over the numeric variables and replace the negative values to missing. I couldn't do that.
My alternative idea was to write a function that gets a dataframe and a variable, and does it. It didn't work either.
My code is:
NegativeToMissing = function(df,var)
{
df$var[df$var < 0] = NA
}
Error in $<-.data.frame(`*tmp*`, "var", value = logical(0)) : replacement has 0 rows, data has 40
What am I doing wrong?
Thank you.
Here is an example with some dummy data.
df1 <- data.frame(col1 = c(-1, 1, 2, 0, -3),
col2 = 1:5,
col3 = LETTERS[1:5])
df1
# col1 col2 col3
#1 -1 1 A
#2 1 2 B
#3 2 3 C
#4 0 4 D
#5 -3 5 E
Now find columns that are numeric
numeric_cols <- sapply(df1, is.numeric)
And replace negative values
df1[numeric_cols] <- lapply(df1[numeric_cols], function(x) replace(x, x < 0 , NA))
df1
# col1 col2 col3
#1 NA 1 A
#2 1 2 B
#3 2 3 C
#4 0 4 D
#5 NA 5 E
You could also do
df1[df1 < 0] <- NA
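As for why the function in the question fails: df$var looks for a column literally named "var" instead of using the value of the argument, and the function never returns the modified data frame. A minimal sketch of a corrected version, assuming the column name is passed as a string (the name negative_to_missing is made up here):

```r
# Sketch: var is expected to be a column name given as a character string
negative_to_missing <- function(df, var) {
  # [[ ]] uses the value of var, whereas df$var would look for a column named "var"
  df[[var]][df[[var]] < 0] <- NA
  df  # return the modified copy; the caller's data frame is not changed in place
}

df1 <- data.frame(col1 = c(-1, 1, 2, 0, -3),
                  col2 = 1:5,
                  col3 = LETTERS[1:5])

res <- negative_to_missing(df1, "col1")
res
```

Because R functions work on a copy, the result has to be assigned, for example df1 <- negative_to_missing(df1, "col1").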
With tidyverse, we can use mutate with across (the older mutate_if(is.numeric, funs(...)) form relies on funs(), which is deprecated)
library(tidyverse)
df1 %>%
mutate(across(where(is.numeric), ~ replace(.x, .x < 0, NA)))
If you still want to change only one selected variable, a solution with dplyr would be to use non-standard evaluation:
library(dplyr)
NegativeToMissing <- function(df, var) {
quo_var = quo_name(var)
df %>%
mutate(!!quo_var := ifelse(!!var < 0, NA, !!var))
}
NegativeToMissing(data, var=quo(val1)) # use quo() function without ""
# val1 val2
# 1 1 1
# 2 NA 2
# 3 2 3
Data used:
data <- data.frame(val1 = c(1, -1, 2),
val2 = 1:3)
data
# val1 val2
# 1 1 1
# 2 -1 2
# 3 2 3
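With more recent versions of dplyr and rlang, the same idea can be written with the embrace operator {{ }}, so the caller passes a bare column name without quo(). This is a sketch under that assumption; the function name negative_to_missing is hypothetical.

```r
library(dplyr)

# Sketch: var is passed as a bare column name and forwarded with {{ }}
negative_to_missing <- function(df, var) {
  df %>%
    mutate({{ var }} := ifelse({{ var }} < 0, NA, {{ var }}))
}

data <- data.frame(val1 = c(1, -1, 2),
                   val2 = 1:3)

res <- negative_to_missing(data, val1)
res
```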
