Check if single column is equal to any multiple others - r

My question seems simple, but I just can't do it. I have a dataframe with multiple columns with the name starting with coa and another column p with values like A, D, F, and so on, which changes according to the id.
All I found is how to do this matching with a fixed value, let's say "A", as below:
df <-df %>%
mutate(ly = any(str_detect(c_across(starts_with("coa")), "A")))
However, in my case, I want to compare to the column p specifically, where p changes, something like this:
df <-df %>%
mutate(ly = any(str_detect(c_across(starts_with("coa")), p)))
In this case, I get the error:
x no applicable method for 'type' applied to an object of class "factor"
Any thoughts? Thanks!

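To create a single logical column, use if_any. The error message suggests that p is a factor, and str_detect() needs a character pattern, so convert p with as.character() first. Note that str_detect() does regex matching; for an exact comparison, use .x == as.character(p) instead.
library(dplyr)
library(stringr)
df <- df %>%
mutate(ly = if_any(starts_with("coa"), ~ str_detect(.x, as.character(p))))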
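As a minimal sketch of the exact-match variant, the example below builds a small made-up data frame (the values are assumptions, not the asker's data) in which p is a factor, and flags rows where any coa column equals p.

```r
library(dplyr)

# Hypothetical toy data: p is deliberately a factor, as the error message implies
df <- data.frame(
  id = 1:3,
  coa1 = c("A", "B", "C"),
  coa2 = c("D", "D", "F"),
  p = factor(c("A", "F", "F")),
  stringsAsFactors = FALSE
)

# TRUE when at least one column starting with "coa" equals p in that row
res <- df %>%
  mutate(ly = if_any(starts_with("coa"), ~ .x == as.character(p)))

print(res)
```

Because if_any works row by row on vectorized comparisons, no rowwise() or c_across() is needed here.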

I think this is a good place to use dplyr::across. You can run vignette('colwise') for a more comprehensive guide, but the key point is that across compares every column starting with "coa" to p at once, replacing each one with a logical column. Older dplyr versions let you pass p to == through the ... argument of across, but that is deprecated as of dplyr 1.1.0, so a lambda is the safer choice.
library(dplyr)
df <- tibble(p = 1:10, coa1 = 1:10, coa2 = 11:20)
df %>%
mutate(across(.cols = starts_with('coa'), .fns = ~ .x == p))

Related

How to get grouping variables and tibble to a function (possibly group_nest)?

Suppose I have the following code:
df <- data.frame(a=c(1,2,3), b=c(4,5,6), c=c(7,8,9))
func <- function(...) {
the_args <- list(...)
data <- the_args[[1]]
message(names(data))
}
Now I want to make three calls to func, one for each distinct value of a. I thought maybe group_nest was my friend, but not quite:
# func gets all rows instead of one group at a time
df %>% group_nest(a) %>% func()
# func gets one group at a time, but without a
df %>% group_nest(a) %>% mutate(result=map(data, func))
I'd like func to be called three times (one for each distinct value of a), each time with all three columns (a, b, c).
Suggestions?
EDIT: If I knew the grouping in advance, I could hardcode it in advance:
df %>% group_nest(a) %>% mutate(result=map(data, func, a))
and inside the function I could set a <- the_args[[2]]
However, I want a result that is resilient to different groupings, and passes a complete data frame (data and grouping columns put together), so func doesn't have to know anything about how to assemble data.
EDIT 2: My actual use case has grouping columns specified more generally, i.e., something like
grouping_cols <- c('a')
df %>% group_nest(across(all_of(grouping_cols))) %>%
mutate(result=map(data, func))
For the simplest case, you can just
df %>% group_nest(a_ = a)
And as pointed out by the OP, you can also use a variable for more generic cases
df %>% group_nest(foo = across(all_of(grouping_cols)))
Another alternative would be
df %>%
mutate(across(all_of(grouping_cols), `(`, .names = "{.col}_")) %>%
group_nest(across(all_of(paste0(grouping_cols, "_"))))
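A different route, not taken from the answers above, is group_split, which keeps the grouping columns in each piece by default. The sketch below assumes a simplified func that takes one data frame and returns its column names, just to show that each call sees a, b, and c.

```r
library(dplyr)
library(purrr)

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
grouping_cols <- c("a")

# Simplified stand-in for func: reports which columns it received
func <- function(data) paste(names(data), collapse = ",")

# group_split() returns one complete data frame per group,
# grouping columns included, so func needs no knowledge of the grouping
results <- df %>%
  group_split(across(all_of(grouping_cols))) %>%
  map_chr(func)

print(results)
```

The trade-off is that the results come back as a plain vector or list rather than as a column next to the group keys.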

multivariable function in long-format dataframe in R

Calculating a function of multiple variables for a dataframe in wide format is very familiar:
library(tidyverse)
df <- tibble(t = 1:3, b = 11:13, c = 21:23)
df <- df %>% mutate(d = b + c) # or base R: df$d <- df$b + df$c
What about when the dataframe is in long format? e.g.
df <- df %>% pivot_longer(-t, names_to = "variable", values_to = "value")
In this long format, you could imagine the same operation working by first group_by(t), and then calculating one value of d for each group, namely that group's variable=b value plus that group's variable=c value. Is this possible? One might think of something like summarise(d = b + c) but that expects wide format.
NB my real-world example has more than two cols b and c and I want to put them into a defined function, not just add them. My working solution is pivoting a huge dataframe from long to wide, calling my multivariable function to define a new column, then pivoting back to long.
Edit: to make the real world example explicit, I need to call a defined function that treats its arguments differently, unlike sum. For example
my.func <- function(b, c) { b^c }
How could the variable d be calculated by applying this function to the values of b and c associated with the same value of t?
We can just do sum instead of +
library(dplyr)
library(tidyr)
df %>%
group_by(t) %>%
summarise(d = sum(value[variable %in% c('b', 'c')]))
To apply my.func instead, extract the values that correspond to 'b' and 'c' within each group
df %>%
group_by(t) %>%
mutate(new = my.func(value[variable == 'b'], value[variable == 'c']))
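For a quick check of the approach, here is a self-contained sketch with made-up values (b = 1:3 and c = 2 are assumptions chosen so the result is easy to verify by hand). It uses summarise so that each t yields one value of d.

```r
library(dplyr)
library(tidyr)

my.func <- function(b, c) { b^c }

# Hypothetical long-format data: for each t, one row for b and one for c
long <- tibble(t = 1:3, b = c(1, 2, 3), c = c(2, 2, 2)) %>%
  pivot_longer(-t, names_to = "variable", values_to = "value")

# Pick out the b and c values inside each group and pass them by name
d_by_t <- long %>%
  group_by(t) %>%
  summarise(d = my.func(b = value[variable == "b"],
                        c = value[variable == "c"]))

print(d_by_t)
```

This relies on each group having exactly one b row and one c row; with duplicates the function would receive vectors of unequal or unexpected length.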

Passing variable in function to other function variables in R

I am trying to pass a variable Phyla (which is also the name of a df column of interest) into other functions. However, I get the error: Error: Column `tax_level` is unknown, which I understand. It would just be more convenient to state the column once in the function call, since this will be repeated numerous times in the script. I have tried using OTU_melt_grouped[,1], since this will always be the first column to use in the dcast function, but get the error: Error: Must use a vector in `[`, not an object of class matrix. Moreover, it does not solve my problem in the group_by function, since I want to be able to specify Phyla, Class, Order, etc.
I am sure there must be a simple solution, but I don't know where to start!
taxa_specific_columns_func <- function(data, tax_level = Phyla) {
OTU_melt_grouped <- data %>%
group_by(tax_level, variable) %>%
summarise(value = sum(value))
taxa_cols <- dcast(OTU_melt_grouped, variable ~ tax_level)
rownames(taxa_cols) <- meta_data$site
taxa_cols <- taxa_cols[-1]
return(taxa_cols)
}
tax_test <- taxa_specific_columns_func(OTU_melt)
As we are passing an unquoted variable, we can use the curly-curly ({{ }}) operator in group_by and again in pivot_wider
library(dplyr)
library(tidyr)
library(tibble)
taxa_specific_columns_func <- function(data, tax_level = Phyla) {
data %>%
group_by({{tax_level}}, variable) %>%
summarise(value = sum(value)) %>%
pivot_wider(names_from = {{tax_level}}, values_from = value) %>%
column_to_rownames("variable")
}
taxa_specific_columns_func(OTU_melt)
# A B C D E
#a 0.01859254 0.42141238 -0.196961 -0.1859115 -0.2901680
#b -0.64700080 NA -0.161108 NA NA
#c -0.03297331 0.05871052 -1.963341 NA 0.7608218
data
set.seed(48)
OTU_melt <- data.frame(Phyla = rep(LETTERS[1:5], each = 3),
variable = sample(letters[1:3], 15, replace = TRUE), value = rnorm(15))
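To see that the same function handles different taxonomic levels, here is a sketch on a small deterministic data frame. The Class column, the values, and the name sum_by_level are assumptions made for illustration; .groups = "drop" is added only to silence the regrouping message.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with two taxonomic levels
otu <- data.frame(
  Phyla = c("A", "A", "B", "B"),
  Class = c("x", "y", "x", "y"),
  variable = c("s1", "s2", "s1", "s2"),
  value = c(1, 2, 3, 4)
)

# tax_level is passed unquoted and forwarded with {{ }}
sum_by_level <- function(data, tax_level) {
  data %>%
    group_by({{ tax_level }}, variable) %>%
    summarise(value = sum(value), .groups = "drop") %>%
    pivot_wider(names_from = {{ tax_level }}, values_from = value)
}

by_phyla <- sum_by_level(otu, Phyla)
by_class <- sum_by_level(otu, Class)

print(by_phyla)
print(by_class)
```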

How can I simultaneously assign value to multiple new columns with R and dplyr?

Given
base <- data.frame( a = 1)
f <- function() c(2,3,4)
I am looking for a solution that would result in a function f being applied to each row of base data frame and the result would be appended to each row. Neither of the following works:
result <- base %>% rowwise() %>% mutate( c(b,c,d) = f() )
result <- base %>% rowwise() %>% mutate( (b,c,d) = f() )
result <- base %>% rowwise() %>% mutate( b,c,d = f() )
What is the correct syntax for this task?
This appears to be a similar problem (Assign multiple new variables on LHS in a single line in R) but I am specifically interested in solving this with functions from tidyverse.
I think the best you can do is use do() to modify the data.frame, for example:
base %>% do(cbind(., setNames(as.list(f()), c("b","c","d"))))
It would probably be better if f() returned a list in the first place, with one named element per column.
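In dplyr 1.0.0 and later, where do() is superseded, mutate() can create several columns at once when an unnamed expression returns a data frame. The sketch below is an alternative to the do() approach, not the original answer's method; it assumes the tibble package is available and uses a two-row base to show the row-by-row behavior.

```r
library(dplyr)
library(tibble)

base <- data.frame(a = c(1, 1))
f <- function() c(2, 3, 4)

# An unnamed one-row tibble inside mutate() is unpacked into columns b, c, d
result <- base %>%
  rowwise() %>%
  mutate(as_tibble_row(setNames(f(), c("b", "c", "d")))) %>%
  ungroup()

print(result)
```

The column names have to be supplied somewhere, which is why having f() return a named vector or list is the cleaner design.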
In case you're willing to do this without dplyr:
# starting data frame
base_frame <- data.frame(col_a = 1:10, col_b = 10:19)
# the function you want applied to a given column
add_to <- function(x) { x + 100 }
# run this function on your base data frame, specifying the column you want to apply the function to:
add_computed_col <- function(frame, funct, col_choice) {
frame[paste(floor(runif(1, min=0, max=10000)))] = lapply(frame[col_choice], funct)
return(frame)
}
Usage:
df <- add_computed_col(base_frame, add_to, 'col_a')
head(df)
And add as many columns as needed:
df_b <- add_computed_col(df, add_to, 'col_b')
head(df_b)
Finally, rename the new columns, since add_computed_col gives each one a random numeric name.

How to pass multiple columns as string to function in dplyr::mutate_at

I have the following (heavily simplified) dplyr example for mutate:
xx <- data.frame(x = 1:10, y = c(rep(1,4),rep(2,6)))
bla_fun <- function(x,y){cat(x," ",y,"\n"); min(x,y)}
xx %>% rowwise() %>%
mutate( z = bla_fun(x,y))
I would like to get it working with mutate_at, which lets me pass the column names as strings.
xx %>% rowwise() %>%
mutate_at( c("x","y"), funs("bla_fun") )
But this does not work. How to get it working?
mutate_at mutates every single column separately.
Your particular example can be solved (assuming the min is a placeholder that cannot be replaced by pmin) like this:
xx %>%
mutate(z = map2(!!sym("x"), !!sym("y"), !!sym("bla_fun")))
Alternatively, keep rowwise() and splice the column symbols into the call:
syms <- rlang::syms(c("x", "y"))
xx %>%
rowwise() %>%
mutate( z = bla_fun(!!! syms))
Side note 1: mutate_at is typically for applying n unary functions to n variables, not 1 n-ary function to n variables. mutate does the job.
Side note 2: there is no need to group rowwise. You could more simply mutate(xx, z = purrr::map2_dbl(x, y, bla_fun)) or rewrite/vectorize bla_fun with pmin() to mutate directly.
Combine this with the use of syms for strings: mutate(xx, z = mapply(bla_fun, !!! syms)) for instance, or mutate(xx, z = purrr::pmap_dbl(list(!!! syms), bla_fun)).
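Another way to work from strings, sketched here as an alternative to the !! and !!! forms above, is the .data pronoun, which looks columns up by name inside mutate(). The cols vector is an assumed stand-in for wherever the column names come from, and bla_fun is shown without the cat() call to keep the output quiet.

```r
library(dplyr)
library(purrr)

xx <- data.frame(x = 1:10, y = c(rep(1, 4), rep(2, 6)))
bla_fun <- function(x, y) min(x, y)

# Column names held as strings (assumed to arrive this way)
cols <- c("x", "y")

# .data[[name]] fetches each column by string; map2_dbl applies
# the two-argument function element by element
res <- xx %>%
  mutate(z = map2_dbl(.data[[cols[1]]], .data[[cols[2]]], bla_fun))

print(res)
```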
