I want to apply multiple functions to the same dataframe. However, I have not been able to successfully pass column names as a parameter in purrr::imap. I keep getting the following error:
Error in UseMethod("select") : no applicable method for 'select'
applied to an object of class "character"
I have tried many combinations for evaluation (e.g., using !!!, [[, enquo, sys.lang, and on and on). When I apply a function (e.g., check_1) directly to a dataframe, select works fine. However, it does not work when I try to pass column names as a parameter using imap and exec. The format of the column name is part of the issue (e.g., 1.1.), but I have tried quotes and single quotes, etc.
This is a follow up to a previous post, but that post and solution focused on applying multiple functions to individual columns. Now, I need to apply multiple functions, which use more than one column in the dataframe; hence, the need to specify column names in a function.
Minimal Example
Data
df <- structure(
list(
`1.1.` = c("Andrew", "Max", "Sylvia", NA, "1",
NA, NA, "Jason"),
`1.2.` = c(1, 2, 2, NA, 4, 5, 3, NA),
`1.2.1.` = c(
"cool", "amazing", "wonderful", "okay",
NA, NA, "chocolate", "fine"
)
),
class = "data.frame",
row.names = c(NA, -8L)
)
What I have Tried
library(purrr)
library(dplyr)
check_1 <- function(x, col1, col2) {
x %>%
dplyr::select(col1, col2) %>%
dplyr::mutate(row.index = row_number()) %>%
dplyr::filter(col1 == "Jason" & is.na(col2) == TRUE) %>%
dplyr::select(row.index) %>%
unlist() %>%
as.vector()
}
check_2 <- function(x, col1, col2) {
index <- x %>%
dplyr::select(col1, col2) %>%
dplyr::mutate(row.index = row_number()) %>%
dplyr::filter(col1 >= 3 & col1 <= 5 & is.na(col2) == TRUE) %>%
dplyr::select(row.index) %>%
unlist() %>%
as.vector()
return(index)
}
checks <-
list("df" = list(fn = check_1, pars = list(col1 = "1.1.", col2 = "1.2.")),
"df" = list(fn = check_2, pars = list(col1 = "1.2.", col2 = "1.2.1.")))
results <-
purrr::imap(checks, ~ exec(.x$fn, x = .y,!!!.x$pars))
Expected Output
> results
$df
[1] 8
$df
[1] 5 6
Besides the "class character" error, I also get an additional error when I try to test the check_2 function on its own, where it returns no expected values.
[1] 1.2. 1.2.1. row.index
<0 rows> (or 0-length row.names)
I have looked at many other similar SO posts (e.g., this one), but none have solved this issue for me.
The first issue is that you pass the name of the dataframe but not the dataframe itself. That's why you get the first error, as you are trying to select from a character string. To solve this issue, add the dataframe to the list you are looping over.
The second issue is that when you pass the column names as character string you have to tell dplyr that these characters refer to columns in your data. This could be achieved by e.g. making use of the .data pronoun.
Finally, instead of select + unlist + as.vector you could simply use dplyr::pull:
library(purrr)
library(dplyr)
check_1 <- function(x, col1, col2) {
x %>%
dplyr::select(all_of(c(col1, col2))) %>%
dplyr::mutate(row.index = row_number()) %>%
dplyr::filter(.data[[col1]] == "Jason" & is.na(.data[[col2]]) == TRUE) %>%
dplyr::pull(row.index)
}
check_2 <- function(x, col1, col2) {
x %>%
dplyr::select(all_of(c(col1, col2))) %>%
dplyr::mutate(row.index = row_number()) %>%
dplyr::filter(.data[[col1]] >= 3 & .data[[col1]] <= 5 & is.na(.data[[col2]]) == TRUE) %>%
dplyr::pull(row.index)
}
checks <-
list(df = list(df = df, fn = check_1, pars = list(col1 = "1.1.", col2 = "1.2.")),
df = list(df = df, fn = check_2, pars = list(col1 = "1.2.", col2 = "1.2.1.")))
purrr::map(checks, ~ exec(.x$fn, x = .x$df, !!!.x$pars))
#> $df
#> [1] 8
#>
#> $df
#> [1] 5 6
Use select({{col1}}, {{col2}}).
This will most probably help you.
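To make the difference between the two suggestions concrete: {{ }} is intended for column names passed unquoted, whereas .data[[ ]] is intended for column names passed as strings, which is what the checks list in the question does. A minimal sketch under that assumption; rows_bare, rows_chr and the two-row toy_df are made up for illustration:

```r
library(dplyr)

# Toy data with the same awkward column names as in the question (illustrative only)
toy_df <- data.frame(`1.1.` = c("Andrew", "Jason"),
                     `1.2.` = c(1, NA),
                     check.names = FALSE)

# Interface 1: columns are passed as bare (backticked) names and embraced with {{ }}
rows_bare <- function(x, col1, col2) {
  x %>%
    mutate(row.index = row_number()) %>%
    filter({{ col1 }} == "Jason" & is.na({{ col2 }})) %>%
    pull(row.index)
}

# Interface 2: columns are passed as strings and looked up through the .data pronoun
rows_chr <- function(x, col1, col2) {
  x %>%
    mutate(row.index = row_number()) %>%
    filter(.data[[col1]] == "Jason" & is.na(.data[[col2]])) %>%
    pull(row.index)
}

res_bare <- rows_bare(toy_df, `1.1.`, `1.2.`)
res_chr  <- rows_chr(toy_df, "1.1.", "1.2.")
```

Since exec() with !!!pars hands over strings, the string interface is probably the one that fits the original setup.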
Related
I have
df = data.frame(Col1 = c( NA, 1," ", 2345.768,"hi","", NA, 3.4, "44.99"))
and want to format specific values, and created a udf
format_it = function(y, n_decimals, dash_type, suffix = ""){
if(is.na(y)) return(dash_type)
if(nchar(gsub(" ", "", y))==0) return(y)
has_letter = grep("[A-z]+", y)
if(is_empty(has_letter)== TRUE) {
return(paste0(format(round(as.numeric(y), n_decimals), nsmall=n_decimals, big.mark = ","),suffix))
}
if(has_letter == 1){
return(y)
} else{
x = as.numeric(y)
ifelse(is.na(x),
dash_type,
paste0(format(round(as.numeric(x), n_decimals), nsmall=n_decimals, big.mark = ","),suffix))}
}
I tested each value individually, ie format_it(df$Col1[1],1,"-"), and each one worked ok
but, when I set up a set_formatter in flextable,
df %>%
flextable() %>%
set_formatter(Col1 = function(x) format_it(x,1,"-"))
I hoped the results would be correct, but received the wrong results,
with the message: the condition has length > 1 and only the first element will be used
I tried updating to include Vectorize, but received the same error
Any suggestions?
I would like to see
I'm a little confused by your function, but below is a fresh, reproducible approach to recreating your table (based on your function), which produces your desired output. It first replaces any NA values in the original data with "-", then uses grepl to check for non-numeric values (e.g., "hi") and keeps those unchanged, then standardizes the number of decimal places in the numeric values with sprintf. This approach stays within the dplyr "world", using mutate() and case_when(), and does not use a user-defined function.
df %>%
mutate(Col1 = case_when(
is.na(Col1) ~ "-",
!grepl("[^A-Za-z]", Col1) ~ Col1,
grepl(".", Col1) ~ sprintf("%.1f", as.numeric(Col1)),
)) %>%
flextable::flextable()
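The warning quoted in the question ("the condition has length > 1") suggests that the formatter is being called with the whole column rather than one value at a time, so if(is.na(y)) sees a vector. If a user-defined function is still preferred, it could be made vectorised. A minimal sketch, assuming that reading is right; format_col is a hypothetical name, and base R's formatC is swapped in for the original format(round(...)) call:

```r
# Hypothetical vectorised variant of format_it: it handles a whole column at once
format_col <- function(y, n_decimals, dash_type, suffix = "") {
  y <- as.character(y)
  # Values that cannot be parsed as numbers (e.g. "hi", "", " ") become NA here
  num <- suppressWarnings(as.numeric(y))
  out <- y                                 # default: keep the value unchanged
  out[is.na(y)] <- dash_type               # true missing values get the dash
  is_num <- !is.na(num)
  out[is_num] <- paste0(
    formatC(num[is_num], format = "f", digits = n_decimals, big.mark = ","),
    suffix
  )
  out
}

col1 <- c(NA, 1, " ", 2345.768, "hi", "", NA, 3.4, "44.99")
formatted <- format_col(col1, 1, "-")
```

Such a function could then be passed to set_formatter in the same way as in the question, e.g. Col1 = function(x) format_col(x, 1, "-").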
I built a function to use it inside the purrr::map2 function and run it in two lists. When I run the function steps separately it works ok. But apparently in map2 it runs the first time (for the first elements of list .x[[1]] .y[[1]]) and then in the second round throws this error in the for function:
How can I find out why it's not working?
PS: It's hard to put an example of the data here because they are lists with very specific characteristics for this function. I'm sorry.
Here is the function:
df <- list()
build_HUW_raster <- function(.x, .y) {
list.time <- .x %>%
split(.$id) %>%
purrr::map(~list(t=as.matrix(.x$date),
xy=unname(as.matrix(.x[,c(22,23)])))
)
for(i in 1:50){
cat(i," ")
path=list.time[[i]]
ctmc=ctmcmove::path2ctmc(path$xy,path$t,r,method="LinearInterp")
df[[i]] <- as.data.frame(do.call(cbind, ctmc))
}
df <- df %>% purrr::map(~ group_by(., ec) %>%
summarise(rt = mean(rt)) %>%
arrange(desc(rt))
)
stacktime <- df %>% purrr::map(~ rename(., cell = ec)) %>%
map(~dplyr::left_join(cargo.grid, ., by="cell", copy=T)) %>%
map(~raster::rasterize(., r, field="rt", na.rm=F, background=0)) %>%
raster::stack()
stackprop <- .y %>%
split(.$id) %>%
purrr::map(~ raster::rasterize(., y = r,
field=.$proportion,
fun=function(x, ...)median(x))) %>%
raster::stack()
stack_huw <- raster::overlay(raster::calc(stacktime, fun=function(x)
ifelse(is.na(x), NA, x/sum(x, na.rm=T))), stackprop, fun=function(x,y)x*y
)
raster_mean <- raster::stackApply(stack_huw,
indices = rep(1,raster::nlayers(stack_huw)),
fun = "mean",
na.rm = F
)
}
result.list <- purrr::map2 (.x=list1, .y=list2, fun=build_HUW_raster)
The reason lies in which element is being looped over. [[ extracts the list element itself, and depending on the class of that element, map then loops over its individual elements (if it is a vector/matrix) or over its columns (if it is a data.frame), as these are the units. [, by contrast, returns a list of length one that still contains the element, so map passes the whole element to the function:
list(1, 2, 3)[1]
[[1]]
[1] 1
vs
list(1, 2, 3)[[1]]
[1] 1
When we loop with map and apply a function that requires a specific structure (e.g. colSums requires a matrix/data.frame, i.e. an object with a dim attribute), it fails if we use [[
> map(replicate(2, data.frame(col1 = 1:5, col2 = 6:10), simplify = FALSE)[[1]], colSums)
Error in .f(.x[[i]], ...) :
'x' must be an array of at least two dimensions
> map(replicate(2, data.frame(col1 = 1:5, col2 = 6:10), simplify = FALSE)[1], colSums)
[[1]]
col1 col2
15 40
Here, we may change the code to
purrr::map2(.x = list1[1], .y = list2[1], .f = build_HUW_raster)
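For the "how can I find out why it's not working?" part of the question, one option is to wrap the function so that map2 does not stop at the first failure and each error is kept alongside the element that caused it. A small sketch with purrr::safely(); toy_fn and its inputs are made up for illustration and stand in for build_HUW_raster, list1 and list2:

```r
library(purrr)

# Stand-in for the real function: fails for some inputs (illustrative only)
toy_fn <- function(x, y) {
  if (y == 0) stop("division by zero is not allowed here")
  x / y
}

# safely() returns a wrapped function that yields list(result = , error = )
# instead of aborting the whole loop
safe_fn <- safely(toy_fn)

res <- map2(list(1, 2, 3), list(1, 0, 3), safe_fn)

# Collect the errors and find which positions failed
errors <- map(res, "error")
failed <- which(!map_lgl(errors, is.null))
messages <- map_chr(errors[failed], conditionMessage)
```

Once the failing position is known, the function body can be run step by step on just that pair of elements, for example list1[[failed[1]]] and list2[[failed[1]]].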
Consider the following data frame with two columns of strings of variable length:
library("tidyverse")
df <- tibble(REF = c("TTG", "CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT", "T", "TTGTGTGTGTGTGTGTGTGTGT"),
ALT = c("T", "CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT,CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT", "TTG", "TTGTGTGTGTGTGTGTGTGTGTGT"))
# # A tibble: 4 × 2
# REF ALT
# <chr> <chr>
# 1 TTG T
# 2 CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT,CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
# 3 T TTG
# 4 TTGTGTGTGTGTGTGTGTGTGT TTGTGTGTGTGTGTGTGTGTGTGT
Differently from column REF, column ALT sometimes includes several strings concatenated by comma (e.g. row 2).
I want to compare the length of strings in REF (REF_LEN) and ALT (ALT_LEN), and generate a TYPE column with values:
"SNM" when REF_LEN = ALT_LEN
"INS" when REF_LEN < ALT_LEN
"DEL" when REF_LEN > ALT_LEN
But I want to do it in a way that, when several strings are present in ALT, the output of this new TYPE column contains these types as well separated by a comma. i.e., the expected output here would be:
"DEL" "INS,DEL" "INS" "INS"
So far, I know how to get the length of values in ALT, but I fail at collapsing these values, as the output will contain lengths from all ALTs in the table, not just pairwise (i.e. 1,35,31,3,24):
df %>%
dplyr::mutate(REF_LEN = str_length(REF),
ALT_LEN = str_split(ALT, ","),
ALT_LEN = purrr::map(ALT_LEN, str_length) %>% unlist() %>% paste(collapse = ","))
Code above is incomplete as you can see, but I am also unable to work in a different direction using a helper function that gets the TYPE column above done. This will return many errors, but not sure why, since it seems to work nicely with values from ALT_LEN individually:
name <- function(alt_lens, ref_len) {
alt_lens <- unlist(alt_lens)
ifelse(alt_lens < ref_len, "DEL", ifelse(alt_lens > ref_len, "INS", "SNM"))
}
df %>%
dplyr::mutate(REF_LEN = str_length(REF),
ALT_LEN = str_split(ALT, ","),
TYPE = purrr::map(ALT_LEN, str_length) %>% name(REF_LEN))
Any ideas? thanks!
Here's a codegolf-ish base R solution:
df <- data.frame(REF = c("TTG", "CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT", "T", "TTGTGTGTGTGTGTGTGTGTGT"),
ALT = c("T", "CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT,CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT", "TTG", "TTGTGTGTGTGTGTGTGTGTGTGT"))
df$TYPE <- mapply(
function(x, y) paste(c("INS", "SNM", "DEL")[2 + sign(nchar(x)- nchar(y))], collapse = ","),
df$REF, strsplit(df$ALT, ","), USE.NAMES = FALSE)
df$TYPE
#> [1] "DEL" "INS,DEL" "INS" "INS"
Created on 2022-04-20 by the reprex package (v2.0.1)
Update: Removed first answer. Thanks to akrun for pointing me there! The concept is the same: using nchar with case_when; the difference is the use of separate_rows from the tidyr package:
library(dplyr)
library(tidyr)
df %>%
mutate(id = row_number()) %>%
separate_rows(ALT, sep = ",") %>%
mutate(TYPE = case_when(nchar(REF)==nchar(ALT) ~ "SNM",
nchar(REF)< nchar(ALT) ~ "INS",
nchar(REF)> nchar(ALT) ~ "DEL",
TRUE ~ NA_character_)) %>%
group_by(id) %>%
mutate(TYPE = toString(TYPE)) %>%
slice(1)
REF ALT id TYPE
<chr> <chr> <int> <chr>
1 TTG T 1 DEL
2 CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT 2 INS, DEL
3 T TTG 3 INS
4 TTGTGTGTGTGTGTGTGTGTGT TTGTGTGTGTGTGTGTGTGTGTGT 4 INS
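If the exact "INS,DEL" format (no space after the comma) is wanted while keeping one row per REF, another option is to classify each row's ALT alleles inside map2_chr(). A sketch of that idea, using the stringr/purrr functions already tried in the question; classify_alts is a hypothetical helper:

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

df <- tibble(REF = c("TTG", "CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT", "T", "TTGTGTGTGTGTGTGTGTGTGT"),
             ALT = c("T", "CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT,CGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT", "TTG", "TTGTGTGTGTGTGTGTGTGTGTGT"))

# Compare one REF string against all of its ALT strings and
# collapse the resulting types into a single comma-separated value
classify_alts <- function(ref, alts) {
  ref_len <- str_length(ref)
  alt_len <- str_length(alts)
  type <- case_when(
    alt_len == ref_len ~ "SNM",
    alt_len >  ref_len ~ "INS",
    TRUE               ~ "DEL"
  )
  paste(type, collapse = ",")
}

out <- df %>%
  mutate(TYPE = map2_chr(REF, str_split(ALT, ","), classify_alts))
```

Because map2_chr() walks over REF and the split ALT list in parallel, the lengths are compared pairwise per row rather than across the whole table.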
Sometimes I change column names of nested tibble depending on the value of another column.
I can do it by preparing a function and using it in map().
library(tidyverse)
# sample data
d <- tibble(col1 = 1:8, col2 = 11:18, group = letters[rep(1:2, each = 4)]) %>%
nest(-group)
f <- function(data, group) rename(data, !!paste0(group, "_col1") := col1)
# Run
desired_output <- d %>%
mutate(data = map2(data, group, f))
names(desired_output$data[[1]]) # "a_col1" "col2" # work
I want to do it with an anonymous function, but it doesn't work. Is there a way to do something like mutate(map(rename(!!a := b))) with an anonymous function?
Thank you for any advice.
d %>%
mutate(data2 = map2(data, group,
function(data, group) {
data %>%
rename(!!paste0(group, "_col1") := col1)
}))
# Error in paste0(group, "_col1") : object 'group' not found
I am not sure what you exactly mean by anonymous function but here is a way to do this without having to assign the function f to a variable:
d %>%
mutate(data = map2(data, group, ~rename_at(.x, 1, function(z) paste(.y, z, sep = "_"))))
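As for why the anonymous-function attempt in the question fails: a likely explanation is that !! is processed by the enclosing mutate() before map2() ever calls the function, and at that point no object called group can be found. A sketch that keeps the anonymous function but avoids !! altogether by using rename_with(); it assumes dplyr >= 1.0 and uses the current nest(data = ...) syntax:

```r
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)

d <- tibble(col1 = 1:8, col2 = 11:18, group = letters[rep(1:2, each = 4)]) %>%
  nest(data = c(col1, col2))

out <- d %>%
  mutate(data = map2(data, group, function(data, group) {
    # The renaming function is only evaluated when map2() calls it,
    # so `group` here is the function argument, not a global object
    rename_with(data, function(nm) paste0(group, "_", nm), .cols = all_of("col1"))
  }))

new_names_a <- names(out$data[[1]])
new_names_b <- names(out$data[[2]])
```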
I am making my first baby steps with non standard evaluation (NSE) in dplyr.
Consider the following snippet: it takes a tibble, sorts it according to the values inside a column and replaces the n-k lower values with "Other".
See for instance:
library(dplyr)
df <- cars%>%as_tibble
k <- 3
df2 <- df %>%
arrange(desc(dist)) %>%
mutate(dist2 = factor(c(dist[1:k],
rep("Other", n() - k)),
levels = c(dist[1:k], "Other")))
What I would like is a function such that:
df2bis<-df %>% sort_keep(old_column, new_column, levels_to_keep)
produces the same result, where old_column is "dist" (the column I use to sort the data set), new_column is "dist2" (the column I generate) and levels_to_keep is "k" (the number of values I explicitly retain).
I am getting lost in enquo, quo_name etc...
Any suggestion is appreciated.
You can do:
library(dplyr)
sort_keep=function(df,old_column, new_column, levels_to_keep){
old_column = enquo(old_column)
new_column = as.character(substitute(new_column))
df %>%
arrange(desc(!!old_column)) %>%
mutate(use = !!old_column,
!!new_column := factor(c(use[1:levels_to_keep],
rep("Other", n() - levels_to_keep)),
levels = c(use[1:levels_to_keep], "Other")),
use=NULL)
}
df%>%sort_keep(dist,dist2,3)
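With more recent versions of dplyr/rlang, the same function could probably be written with the embrace operator {{ }} instead of enquo/substitute. A sketch, assuming dplyr >= 1.0 and rlang >= 0.4.3 (for the "{{ }}" := naming syntax); keep_top is a hypothetical helper and not part of dplyr:

```r
library(dplyr)

# Hypothetical helper: keep the first k values of an already sorted vector
# as factor levels and lump everything else into "Other"
keep_top <- function(x, k) {
  top <- as.character(x[seq_len(k)])
  factor(c(top, rep("Other", length(x) - k)),
         levels = c(unique(top), "Other"))
}

sort_keep <- function(df, old_column, new_column, levels_to_keep) {
  df %>%
    arrange(desc({{ old_column }})) %>%
    # "{{ new_column }}" := turns the bare name supplied by the caller into the column name
    mutate("{{ new_column }}" := keep_top({{ old_column }}, levels_to_keep))
}

df2bis <- as_tibble(cars) %>% sort_keep(dist, dist2, 3)
```

The helper keeps the factor logic out of the mutate() call, so the only tidy-evaluation pieces left are the two embraced arguments.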
Something like this?
old_column = "dist"
new_column = "dist2"
levels_to_keep = 3
command = "df2bis<-df %>% sort_keep(old_column, new_column, levels_to_keep)"
command = gsub('old_column', old_column, command)
command = gsub('new_column', new_column, command)
command = gsub('levels_to_keep', levels_to_keep, command)
eval(parse(text=command))