from mutate_at to mutate(across(, also .names - r

I'm recoding survey responses (character) to a set of questions (that are not in contiguous columns), and I was thrilled to get the following code to work:
# make a list of the selected columns
fcols <- c(2, 6, 8, 9, 14)
# recode the selected columns
d <- d %>% mutate_at(vars(fcols),
                     ~(recode(.,
                              "OriginalResponse1" = "NewResponse1",
                              "OriginalResponse2" = "NewResponse2",
                              "OriginalResponse3" = "NewResponse3",
                              "OriginalResponse4" = "NewResponse4",
                              .default = NA_character_)))
My main question has to do with making this work with "across", since "mutate_at" is apparently superseded.
I tried the below - put in the "across", and make sure to add a new closed paren at the end - but it doesn't work:
d <- d %>% mutate(across(vars(fcols),
                         ~(recode(.,
                                  "OriginalResponse1" = "NewResponse1",
                                  "OriginalResponse2" = "NewResponse2",
                                  "OriginalResponse3" = "NewResponse3",
                                  "OriginalResponse4" = "NewResponse4",
                                  .default = NA_character_))))
Error: Problem with `mutate()` input `..1`.
x Must subset columns with a valid subscript vector.
x Subscript has the wrong type `quosures`.
i It must be numeric or character.
i Input `..1` is `across(...)`.
Also, I've been trying to create a new set of columns (rather than just changing the existing ones) using the .names argument, after the .default argument, but I haven't been able to get that to work, except once only partially - when the columns appeared but they were all empty.
Main question: what am I missing in converting this to "across" from the working "mutate_at" version?
Bonus: how do I get the .names part to work?

With across, when fcols holds column positions you don't need vars:
library(dplyr)
d %>% mutate(across(fcols,
                    ~recode(.,
                            "OriginalResponse1" = "NewResponse1",
                            "OriginalResponse2" = "NewResponse2",
                            "OriginalResponse3" = "NewResponse3",
                            "OriginalResponse4" = "NewResponse4",
                            .default = NA_character_)))
.names is useful when you want to keep the original columns as they are and create new columns. Note that it is an argument of across, not of recode:
d %>% mutate(across(fcols,
                    ~recode(.,
                            "OriginalResponse1" = "NewResponse1",
                            "OriginalResponse2" = "NewResponse2",
                            "OriginalResponse3" = "NewResponse3",
                            "OriginalResponse4" = "NewResponse4",
                            .default = NA_character_), .names = '{col}_new'))

We can wrap fcols with all_of instead of vars. Also, recode can take a named vector (old values as names, new values as elements) spliced in with !!!:
library(dplyr)
library(stringr)
nm1 <- setNames(str_c("NewResponse", 1:4),
                str_c("OriginalResponse", 1:4))
d %>%
  mutate(across(all_of(fcols),
                ~recode(., !!! nm1,
                        .default = NA_character_)))
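To make the two answers above concrete, here is a minimal sketch on made-up data. The tibble `d`, its column names `q1`/`q2`, and the name `lookup` are assumptions for illustration only. It combines all_of, a spliced named vector, and .names, and it places .names as an argument of across (putting it inside recode is one plausible reason the new columns came out empty in the question, though that cannot be confirmed from the post).

```r
library(dplyr)

# Hypothetical toy data; column names and values are invented for illustration
d <- tibble(
  id = 1:3,
  q1 = c("OriginalResponse1", "OriginalResponse2", "Something else"),
  q2 = c("OriginalResponse3", "OriginalResponse1", "OriginalResponse4")
)
fcols <- c(2, 3)  # positions of the question columns in this toy data

# Named vector: names are the old values, elements are the new values
lookup <- setNames(paste0("NewResponse", 1:4),
                   paste0("OriginalResponse", 1:4))

d_new <- d %>%
  mutate(across(all_of(fcols),
                ~ recode(.x, !!!lookup, .default = NA_character_),
                .names = "{.col}_new"))  # .names belongs to across(), not recode()

d_new
```

The original `q1` and `q2` columns are kept, and the recoded values go into `q1_new` and `q2_new`; any value not in `lookup` becomes NA.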

Related

Pass column names into a function using apply or map

I want to apply multiple functions to the same dataframe. However, I have not been able to successfully pass column names as a parameter in purrr::imap. I keep getting the following error:
Error in UseMethod("select") : no applicable method for 'select'
applied to an object of class "character"
I have tried many combinations for evaluation (e.g., using !!!, [[, enquo, sys.lang, and on and on). When I apply a function (e.g., check_1) directly to a dataframe, select works fine. However, it does not work when I try to pass column names as a parameter using imap and exec. The format of the column name is part of the issue (e.g., 1.1.), but I have tried quotes and single quotes, etc.
This is a follow up to a previous post, but that post and solution focused on applying multiple functions to individual columns. Now, I need to apply multiple functions, which use more than one column in the dataframe; hence, the need to specify column names in a function.
Minimal Example
Data
df <- structure(
  list(
    `1.1.` = c("Andrew", "Max", "Sylvia", NA, "1",
               NA, NA, "Jason"),
    `1.2.` = c(1, 2, 2, NA, 4, 5, 3, NA),
    `1.2.1.` = c(
      "cool", "amazing", "wonderful", "okay",
      NA, NA, "chocolate", "fine"
    )
  ),
  class = "data.frame",
  row.names = c(NA, -8L)
)
What I have Tried
library(purrr)
library(dplyr)
check_1 <- function(x, col1, col2) {
  x %>%
    dplyr::select(col1, col2) %>%
    dplyr::mutate(row.index = row_number()) %>%
    dplyr::filter(col1 == "Jason" & is.na(col2) == TRUE) %>%
    dplyr::select(row.index) %>%
    unlist() %>%
    as.vector()
}
check_2 <- function(x, col1, col2) {
  index <- x %>%
    dplyr::select(col1, col2) %>%
    dplyr::mutate(row.index = row_number()) %>%
    dplyr::filter(col1 >= 3 & col1 <= 5 & is.na(col2) == TRUE) %>%
    dplyr::select(row.index) %>%
    unlist() %>%
    as.vector()
  return(index)
}
checks <-
  list("df" = list(fn = check_1, pars = list(col1 = "1.1.", col2 = "1.2.")),
       "df" = list(fn = check_2, pars = list(col1 = "1.2.", col2 = "1.2.1.")))
results <-
  purrr::imap(checks, ~ exec(.x$fn, x = .y,!!!.x$pars))
Expected Output
> results
$df
[1] 8
$df
[1] 5 6
Besides the "class character" error, I also get an additional error when I try to test the check_2 function on its own, where it returns no expected values.
[1] 1.2. 1.2.1. row.index
<0 rows> (or 0-length row.names)
I have looked at many other similar SO posts (e.g., this one), but none have solved this issue for me.
The first issue is that you pass the name of the dataframe but not the dataframe itself. That's why you get the first error: you are trying to select from a character string. To solve this issue, add the dataframe to the list you are looping over.
The second issue is that when you pass the column names as character string you have to tell dplyr that these characters refer to columns in your data. This could be achieved by e.g. making use of the .data pronoun.
Finally, instead of select + unlist + as.vector you could simply use dplyr::pull:
library(purrr)
library(dplyr)
check_1 <- function(x, col1, col2) {
  x %>%
    dplyr::select(all_of(c(col1, col2))) %>%
    dplyr::mutate(row.index = row_number()) %>%
    dplyr::filter(.data[[col1]] == "Jason" & is.na(.data[[col2]]) == TRUE) %>%
    dplyr::pull(row.index)
}
check_2 <- function(x, col1, col2) {
  x %>%
    dplyr::select(all_of(c(col1, col2))) %>%
    dplyr::mutate(row.index = row_number()) %>%
    dplyr::filter(.data[[col1]] >= 3 & .data[[col1]] <= 5 & is.na(.data[[col2]]) == TRUE) %>%
    dplyr::pull(row.index)
}
checks <-
  list(df = list(df = df, fn = check_1, pars = list(col1 = "1.1.", col2 = "1.2.")),
       df = list(df = df, fn = check_2, pars = list(col1 = "1.2.", col2 = "1.2.1.")))
purrr::map(checks, ~ exec(.x$fn, x = .x$df, !!!.x$pars))
#> $df
#> [1] 8
#>
#> $df
#> [1] 5 6
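The two points in this answer (pass the data frame itself, and use the .data pronoun for column names held as strings) can be reduced to a smaller sketch. The data frame `toy`, the helper `rows_with_na`, and the list element names are hypothetical and chosen only for illustration.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data
toy <- data.frame(name = c("Ann", "Bob", NA), score = c(1, NA, 3))

# Hypothetical helper: row indices where the column named by the string `col` is NA.
# Writing is.na(col) would test the string itself; .data[[col]] looks up the column.
rows_with_na <- function(data, col) {
  data %>%
    mutate(row.index = row_number()) %>%
    filter(is.na(.data[[col]])) %>%
    pull(row.index)
}

# Each entry carries the data frame itself (not its name) plus the parameters
checks <- list(
  name_check  = list(data = toy, pars = list(col = "name")),
  score_check = list(data = toy, pars = list(col = "score"))
)

# exec() accepts !!! to splice the parameter list into the call
results <- map(checks, ~ exec(rows_with_na, data = .x$data, !!!.x$pars))
results
```

Giving the list entries distinct names (rather than repeating `df`) also makes the individual results easier to pull out afterwards.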
Use select({{col1}}, {{col2}}).
This will most probably help you.

replace NA in selected columns with replace_na

I have a dataset that contains columns hh_c22j, hh_r02a, hh_r02b. I want to replace NAs in these columns with 0. Right now I have the command below, and it works. But it is redundant, as I need to specify the replacement 0 for each column.
df %>% select(case_id, hh_c22j, hh_r02a, hh_r02b) %>% replace_na(list(hh_c22j=0, hh_r02a=0, hh_r02b=0))
I want to select the columns together in an array/list like below.
df %>% select(case_id, hh_c22j, hh_r02a, hh_r02b) %>% replace_na(c(hh_c22j, hh_r02a, hh_r02b), 0)
But I got an error. The error msg is :
Error in is_list(replace) : object 'hh_c22j' not found
Error: 1 components of `...` were not used.
We detected these problematic arguments:
* `..1`
Did you misspecify an argument?
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlib_error_dots_unused>
1 components of `...` were not used.
We detected these problematic arguments:
* `..1`
Did you misspecify an argument?
Backtrace:
1. `%>%`(...)
5. ellipsis:::action_dots(...)
Run `rlang::last_trace()` to see the full context.
Assuming you have other columns in the data as well but want to change just the three columns, you can do this:
library(dplyr)
library(tidyr)  # needed for replace_na
df %>% mutate_at(vars(hh_c22j, hh_r02a, hh_r02b), list(~ replace(., which(is.na(.)), 0)))
# Alternatively, using replace_na
df %>% mutate_at(vars(hh_c22j, hh_r02a, hh_r02b), list(~ replace_na(., 0)))
Just for future reference, a small reproducible sample would go a long way to get better answers!
One clean option is to use the mutate_all function and pass it the function to apply to each column. For example, here I create a dataset similar to yours and replace the NA values with 0:
library(dplyr)
library(tidyr)  # for replace_na
data <- data.frame(hh_c22j = sample(c(NA, 1), size = 5, replace = TRUE),
                   hh_r02a = sample(c(NA, 1), size = 5, replace = TRUE),
                   hh_r02b = sample(c(NA, 1), size = 5, replace = TRUE))
data %>%
  mutate_all(replace_na, 0)
If you only want to perform this operation on some columns, mutate_at is a similar option where you can specify which column(s) to use this on.
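For comparison, here is a sketch of two ways to avoid repeating each column, using across (which supersedes mutate_at and mutate_all) or a programmatically built list. The data frame `toy` and the vector `cols` are invented for illustration; only the column names come from the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data using the column names from the question
toy <- data.frame(
  case_id = 1:3,
  hh_c22j = c(NA, 1, 2),
  hh_r02a = c(3, NA, NA),
  hh_r02b = c(NA, NA, 5)
)
cols <- c("hh_c22j", "hh_r02a", "hh_r02b")

# Option 1: across() applies the vector form of replace_na() to each selected column
out1 <- toy %>%
  mutate(across(all_of(cols), ~ replace_na(.x, 0)))

# Option 2: build the named list that the data-frame form of replace_na() expects
out2 <- toy %>%
  replace_na(as.list(setNames(rep(0, length(cols)), cols)))
```

Both options leave `case_id` and any other columns untouched. Option 2 is closest to what the question attempted: the data-frame method of replace_na takes a named list as its second argument, not a vector of columns followed by a value.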

R - How to input a list value into a function? Non-numeric argument to binary operator

I am trying to input a value from my list of names into my function so that I can perform some calculations on it using values from a dataframe.
library(dplyr)
## My list of names
name_list = list(c("A", "B"), c("C", "D"))
## Some random function to perform calculations
random_function = function(input){
input/10
}
## The reason you see name_list[[1]][1] is because I wish to do this repeatedly for different list of names.
data.frame("A"=c(1,1,2,2,3,4), "B"=c(1,3,5,7,9,11)) %>%
mutate(A2 = random_function(name_list[[1]][1]))
Unfortunately, this doesn't work and returns the error:
"non-numeric argument to binary operator"
Is there anyway around this?
What I want is essentially:
data.frame("A"=c(1,1,2,2,3,4), "B"=c(1,3,5,7,9,11)) %>%
mutate(A2 = random_function(A))
We can use mutate_all (df1 below is the data frame from the question):
library(dplyr)
df1 %>%
  mutate_all(list(`2` = ~ random_function(.)))
Or, if we need to do this based on 'name_list', convert the name to a symbol and evaluate it (!!):
df1 %>%
  mutate(A2 = random_function(!! rlang::sym(name_list[[1]][1])))
Or specify it in mutate_at:
df1 %>%
  mutate_at(vars(name_list[[1]][1]), list(A2 = ~ random_function(.)))
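The error in the question arises because `name_list[[1]][1]` is the string "A", so the function tries to divide a string by 10. As a self-contained sketch, assuming `df1` is the data frame from the question, the .data pronoun is another way to look up a column from a string, alongside `!! rlang::sym(...)`:

```r
library(dplyr)

name_list <- list(c("A", "B"), c("C", "D"))
random_function <- function(input) input / 10

# df1 is assumed to be the data frame shown in the question
df1 <- data.frame(A = c(1, 1, 2, 2, 3, 4), B = c(1, 3, 5, 7, 9, 11))

# .data[["A"]] retrieves column A, so the function receives numbers, not a string
out <- df1 %>%
  mutate(A2 = random_function(.data[[name_list[[1]][1]]]))
```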

Dplyr Non Standard Evaluation -- Help Needed

I am making my first baby steps with non standard evaluation (NSE) in dplyr.
Consider the following snippet: it takes a tibble, sorts it according to the values inside a column and replaces the n-k lower values with "Other".
See for instance:
library(dplyr)
df <- cars %>% as_tibble()
k <- 3
df2 <- df %>%
  arrange(desc(dist)) %>%
  mutate(dist2 = factor(c(dist[1:k],
                          rep("Other", n() - k)),
                        levels = c(dist[1:k], "Other")))
What I would like is a function such that:
df2bis <- df %>% sort_keep(old_column, new_column, levels_to_keep)
produces the same result, where old_column is "dist" (the column I use to sort the data set), new_column is "dist2" (the column I generate), and levels_to_keep is "k" (the number of values I explicitly retain).
I am getting lost in enquo, quo_name etc...
Any suggestion is appreciated.
You can do:
library(dplyr)
sort_keep = function(df, old_column, new_column, levels_to_keep) {
  old_column = enquo(old_column)
  new_column = as.character(substitute(new_column))
  df %>%
    arrange(desc(!!old_column)) %>%
    mutate(use = !!old_column,
           !!new_column := factor(c(use[1:levels_to_keep],
                                    rep("Other", n() - levels_to_keep)),
                                  levels = c(use[1:levels_to_keep], "Other")),
           use = NULL)
}
df %>% sort_keep(dist, dist2, 3)
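With more recent versions of rlang and dplyr, the same idea can be sketched with the embrace operator `{{ }}` and a glue-style name on the left of `:=`, which avoids enquo and substitute. This is an alternative formulation, not the answer's code. It uses if_else on row_number() instead of concatenating vectors, and it wraps the levels in unique() as a safeguard against ties among the top values, which is a small deviation from the original snippet.

```r
library(dplyr)

# Sketch of sort_keep using {{ }}; behaviour on ties differs slightly from the
# original because duplicated top values are collapsed into one factor level
sort_keep <- function(df, old_column, new_column, levels_to_keep) {
  df %>%
    arrange(desc({{ old_column }})) %>%
    mutate("{{ new_column }}" := factor(
      if_else(row_number() <= levels_to_keep,
              as.character({{ old_column }}),
              "Other"),
      levels = c(unique(as.character(head({{ old_column }}, levels_to_keep))),
                 "Other")
    ))
}

out <- cars %>% as_tibble() %>% sort_keep(dist, dist2, 3)
```

The column names are passed unquoted, as in the desired call `sort_keep(dist, dist2, 3)`.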
Something like this?
old_column = "dist"
new_column = "dist2"
levels_to_keep = 3
command = "df2bis<-df %>% sort_keep(old_column, new_column, levels_to_keep)"
command = gsub('old_column', old_column, command)
command = gsub('new_column', new_column, command)
command = gsub('levels_to_keep', levels_to_keep, command)
eval(parse(text=command))

R: dplyr::lag throws error when trying to lag characters in tibble

I'm getting the following error in R when I try to use the lag function (from the dplyr library) on a column of characters in a tibble:
Error in mutate_impl(.data, dots) : Expecting a single string value: [type=logical; extent=1].
This error does not occur for a column of characters in a data frame. I also don't get the error for a column of numbers in either a tibble or a data frame.
Does anyone know why I'm getting this discrepancy in the lag function for data frames versus tibbles? Thanks!
Here is some sample code that reproduces the error. I have examples of both when lag works and when it doesn't. I have tried updating the tidyverse and dplyr libraries on my machine but I'm still getting the same error.
tib = data_frame(x = c('a','b','c'), y = 1:3)
# lagging column of characters in tibble throws error
res = tib %>%
  mutate(lag_n = lag(x, n=1, default = NA))
# lagging column of numbers in tibble does NOT throw error
res = tib %>%
  mutate(lag_c = lag(y, n=1, default = NA))
df = data.frame(x = c('a','b','c'), y = 1:3)
# lagging column of characters in data frame does NOT throw error
res = df %>%
  mutate(lag_n = lag(x, n=1, default = NA))
# lagging column of numbers in data frame does NOT throw error
res = df %>%
  mutate(lag_c = lag(y, n=1, default = NA))
You're running into this error because dplyr and tibble are strict about the type of NA values that they allow you to use (or, more specifically, they are stricter about checking the type of the variable you create). A bare NA is logical, so for a character column you need NA_character_, like so:
res = tib %>%
  mutate(lag_n = lag(x, n=1, default = NA_character_))
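As a self-contained sketch of the same fix, the example below uses a typed NA for each column type. The column names `lag_x` and `lag_y` are chosen here for illustration, and tibble() is used in place of data_frame(), which is deprecated in current versions of tibble.

```r
library(dplyr)

tib <- tibble(x = c("a", "b", "c"), y = 1:3)

# Match the type of the default to the type of the column being lagged
res <- tib %>%
  mutate(lag_x = lag(x, n = 1, default = NA_character_),
         lag_y = lag(y, n = 1, default = NA_integer_))
```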
