Specify arguments when applying function with sapply - r

I created the following function, which checks whether a column's correlation with the target exceeds a threshold. The function is applied to the diamonds dataset (assigned to dt here).
select_variables_gen <- function(variable, target = dt$price, threshold = 0.9) {
  if (all(class(variable) %in% c("numeric", "integer"))) {
    corr <- abs(cor(variable, target))
    if (corr > threshold) {
      return(TRUE)
    } else {
      FALSE
    }
  } else {
    FALSE
  }
}
Now that I want to apply the function, I can't figure out how to specify its arguments. This is what I tried:
alt_selected_gen <- names(dt)[sapply(dt,
select_variables(variable = dt, target = dt$carat, threshold = 0.1))]
alt_selected_gen;
This returns an error saying that the 2nd and 3rd arguments are unused. How can I use the function (with sapply or any other way) so that I can specify the arguments?
My desired output is the names of the columns whose correlation with the target is above the threshold. Using the default values with the above code, that would be:
[1] "carat" "price"

You pass your function to sapply. What you are trying to pass is a call to your function.
When you use sapply on a data frame, the columns are sent one by one to your function as its first argument. If you want to pass further named arguments to your function, you add them directly as parameters to sapply after the function itself. This works because of the dots argument (...) in sapply's formal arguments, which passes any extra parameters into the call to your function.
It should therefore just be
names(dt)[sapply(dt, select_variables_gen, target = dt$carat, threshold = 0.1)]
#> [1] "carat" "table" "price" "x" "y" "z"
Notice also that the function is called select_variables_gen in your example, not select_variables.
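As a minimal, self-contained sketch of the same mechanism, the example below uses a small made-up data frame (toy) and a hypothetical helper (exceeds_threshold) rather than the diamonds data. It shows that extra named arguments given to sapply end up in each call to the function, and that this is equivalent to wrapping the call in an anonymous function.

```r
# Hypothetical toy data: two strongly correlated numeric columns,
# one weakly correlated numeric column, and one character column.
toy <- data.frame(
  a = 1:10,
  b = 2 * (1:10),
  c = rep(c(1, -1), times = 5),
  d = letters[1:10]
)

# Hypothetical helper: TRUE when a numeric column's absolute
# correlation with `target` exceeds `threshold`, FALSE otherwise.
exceeds_threshold <- function(variable, target, threshold = 0.9) {
  is.numeric(variable) && abs(cor(variable, target)) > threshold
}

# Extra arguments after the function are forwarded through sapply's `...`.
selected <- names(toy)[sapply(toy, exceeds_threshold,
                              target = toy$a, threshold = 0.9)]

# Equivalent form with an explicit anonymous function.
same <- names(toy)[sapply(toy, function(col) {
  exceeds_threshold(col, target = toy$a, threshold = 0.9)
})]

print(selected)
```

The first form is shorter; the anonymous-function form can be easier to read when the column is not the first argument of the function being applied.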

Related

How to pass a list whose names is not NULL to the arguments of do.call in r

When I try to pass a list whose names is not NULL, I get the following error from evaluating do.call: Error: argument "x" is missing, with no default. Is there another way to bypass the names of the list and access instead the actual elements within the list without setting the names to NULL?
# with NULL names, do.call runs
num_list <- list(1:10)
do.call(mean,num_list)
# without names being NULL, do.call fails
names(num_list) <- 'a'
do.call(mean,num_list)
Specifically, I'd like to pass the list to a function's ellipsis such as for raster::merge, https://www.rdocumentation.org/packages/raster/versions/3.3-7/topics/merge.
library(rgdal)
library(sf)
library(raster)
cities <- sf::st_read(system.file("vectors/cities.shp", package = "rgdal"))
birds <- sf::st_read(system.file("vectors/trin_inca_pl03.shp", package = "rgdal"))
sf_shapes <- list(cities, birds)
# without names works
sf_shape_extents = lapply(sf_shapes, raster::extent)
sf_max <- do.call(what = raster::merge, args = sf_shape_extents)
# with names does not
names(sf_shapes) <- c('cities', 'birds')
sf_shape_extents_names = lapply(sf_shapes, raster::extent)
sf_max_names <- do.call(what = raster::merge, args = sf_shape_extents_names)
Either ensure that the names of the list being passed in correspond to the parameter names of the function, or make the list unnamed so that the position of each list element corresponds to the position of the parameter in question.
names(num_list) <- 'x'
do.call(mean,num_list)
[1] 5.5
names(num_list) <- 'a'
do.call(mean,unname(num_list))
[1] 5.5
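The three cases can be put side by side in a small sketch. It relies only on base R; the variable names (res_named, res_unnamed, res_renamed) are illustrative, and tryCatch is used here only so that the failing call does not stop the script.

```r
num_list <- list(a = 1:10)

# Names that do not match a formal argument: do.call builds mean(a = 1:10),
# which fails because `x` is never supplied.
res_named <- tryCatch(
  do.call(mean, num_list),
  error = function(e) "error"
)

# Dropping the names makes the element match by position.
res_unnamed <- do.call(mean, unname(num_list))

# Renaming the element to the formal argument name also works.
res_renamed <- do.call(mean, setNames(num_list, "x"))

print(list(res_named, res_unnamed, res_renamed))
```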
EDIT:
I do not see any structural change in your edited version. The error is because of the names since they do not correspond to the named parameters of the function. You are passing in named arguments and that will throw an error.
The question you need to ask yourself is what are the parameter names of the function you intend to use?
If a function collects its inputs through the ellipsis and does not care about their names, it does not matter whether the arguments you pass in are named, e.g. the paste function in R:
a <- list(a="a",b=3,c="d");
do.call(paste,a)
[1] "a 3 d"
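One way to answer the question about parameter names is to inspect the function's formal arguments before calling do.call. The sketch below uses a hypothetical helper, param_names, built on base R's formals; note that formals returns NULL for primitive functions, so this check only applies to closures such as mean and paste.

```r
# Hypothetical helper: names of a closure's formal arguments.
param_names <- function(f) names(formals(f))

mean_params <- param_names(mean)
paste_params <- param_names(paste)

# A list is safe to pass by name when every name is a formal argument.
args_ok <- list(x = 1:10)
args_bad <- list(a = 1:10)
ok_matches <- all(names(args_ok) %in% mean_params)
bad_matches <- all(names(args_bad) %in% mean_params)

print(mean_params)
print(paste_params)
```

For a function like paste, whose first formal is the ellipsis, unmatched names simply travel along inside the dots, which is why the named list above still works.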

Modify ellipsis in R

I have a problem with an ellipsis use case. My function accepts a list of objects; let's call them objects of class "X". Inside my function, the X objects are converted to class "Xs", so I end up with a list of "Xs" objects. A function that I import from another package can process multiple "Xs" objects at once, but they have to be enumerated (ellipsis mechanism), not passed as a list. Is there a way to solve this? I want something like this:
examplefun <- function(charlist){
nums <- lapply(charlist, as.numeric)
sum(... = nums)
}
Of course, the example above throws an error, but it shows what I want to achieve. I tried unlist with recursive = FALSE ("X" and "Xs" are themselves lists), but it does not work.
If there is no solution then:
Let's assume I decided to accept ... instead of a list of "X" objects. Can I modify the ellipsis elements (change them to "Xs") and then pass them to a function that accepts an ellipsis? It would look like this:
examplefun2 <- function(...){
function that modify object in ... to "Xs" objects
sum(...)
}
In your first function, just call sum directly, because sum works on a vector of numbers just as it does on individual numbers.
examplefun <- function (charlist) {
nums <- vapply(charlist, as.numeric, numeric(1L))
sum(nums)
}
(Note the use of vapply instead of lapply: sum expects atomic vectors, so we can’t pass a list.)
In your second function, you can capture ... and work with the captured variable:
examplefun2 <- function (...) {
nums <- as.numeric(c(...))
sum(nums)
}
For more complex arguments, Roland’s comment is a good alternative: Modify the function arguments as a list, and pass it to do.call.
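A minimal sketch of that do.call approach, assuming the conversion step can be written as a function applied to each element (as.numeric stands in for the conversion to "Xs", and sum stands in for the imported function). The names examplefun3 and examplefun_list are illustrative only.

```r
# Capture the dots as a list, transform each element, then spread the
# transformed list back out as separate arguments with do.call.
examplefun3 <- function(...) {
  args <- list(...)
  converted <- lapply(args, as.numeric)
  do.call(sum, converted)
}

# The same idea when the input is already a list.
examplefun_list <- function(charlist) {
  converted <- lapply(charlist, as.numeric)
  do.call(sum, converted)
}

total_dots <- examplefun3("1", "2", "3.5")
total_list <- examplefun_list(list("1", "2", "3.5"))
print(c(total_dots, total_list))
```

Because do.call passes each list element as its own argument, this also works when the elements are themselves lists or other non-atomic objects that cannot be combined with c().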

how to get variable names from list

I have a list of functions, which also contains one user-defined function:
> fun <- function(x) {x}
> funs <- c(median, mean, fun)
Is it possible to get the function names as strings from this list? My only workaround so far has been to create a vector that contains the function names as strings:
> fun.names <- c("median", "mean", "fun")
When I want to get a variable name I usually use this trick (please correct me if it is wrong), but as you can see it only works for a single variable, not for a list:
> as.character(substitute(mean))
[1] "mean"
> as.character(substitute(funs))
[1] "funs"
Is there something that will also work for a list? Does it make any difference whether the list contains functions or data?
EDIT:
I need to pass this list of functions (plus other data) to another function. The functions from the list will then be applied to a dataset. The function names are needed because, if several functions are passed in the list, I want to be able to determine which function was applied. So far I've been using this:
window.size <- c(1,2,3)
combinations <- expand.grid(window.size, c(median, mean))
combinations <- cbind(combinations, rep(c("median","mean"), each = length(window.size)))
Generally speaking, this is not possible. Consider this definition of funs:
funs <- c(median,mean,function(x) x);
In this case, there's no name associated with the user-defined function at all. There's no rule in R that says all functions must be bound to a name at any point in time.
If you want to start making some assumptions about whether and where all such lambdas are defined, then possibilities open up.
One idea is to search the closure environment of each function for an entry that matches (identically) to the function itself, and then use that name. This will incur a performance penalty due to the comparison work, but may be tolerable if you don't have to run it repetitively:
getFunNameFromClosure <- function(fun) names(which(do.call(c,eapply(environment(fun),identical,fun)))[1L]);
Demo:
fun <- function(x) x;
funs <- c(median,mean,fun);
sapply(funs,getFunNameFromClosure);
## [1] "median" "mean" "fun"
Caveats:
1: As explained earlier, this will not work on functions that were never bound to a name. Furthermore, it will not work on functions whose closure environment does not contain a binding to the function. This could happen if the function was bound to a name in a different environment than its closure (via a return value, superassignment, or assign() call) or if its closure environment was explicitly changed.
2: It is possible to bind a function to multiple names. Thus, the name you get as a result of the eapply() search may not be the one you expect. Here's a good demonstration of this:
getFunNameFromClosure(ls); ## gets wrong name
## [1] "objects"
identical(ls,objects); ## this is why
## [1] TRUE
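Given these caveats, a simpler alternative (not part of the approach above, and shown only as a sketch) is to attach the names when the list is built, so nothing has to be recovered afterwards. The function first_value is a hypothetical user-defined function used for illustration.

```r
# Hypothetical user-defined function.
first_value <- function(x) x[1]

# Name each element explicitly when constructing the list.
funs <- list(median = median, mean = mean, first_value = first_value)

# The names travel with the list and with the results of sapply.
results <- sapply(funs, function(f) f(c(1, 2, 6)))
fun_names <- names(funs)

print(results)
```

This avoids any search through environments and works for anonymous functions too, at the cost of typing each name once.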
Here is a hacky approach:
funs <- list(median, mean)
fun_names = sapply(funs, function(x) {
s = as.character(deparse(eval(x)))[[2]]
gsub('UseMethod\\(|[[:punct:]]', '', s)
})
names(funs) <- fun_names
funs
$median
function (x, na.rm = FALSE)
UseMethod("median")
<bytecode: 0x103252878>
<environment: namespace:stats>
$mean
function (x, ...)
UseMethod("mean")
<bytecode: 0x103ea11b8>
<environment: namespace:base>
combinations <- expand.grid(window.size, fun_names, c(median, mean))

Pass Name and Value of Character Vector into Function using lapply

I have a function that takes two arguments:
my.function <- function(name, value) {
print(name)
print(value) #using print as example
}
I have an integer vector that has names and values:
freq.chars <- table(sample(LETTERS[1:5], 10, replace=TRUE))
I'd like to use lapply to apply my.function to freq.chars so that the name of each item is passed in as name, and the value (in this case the frequency) is passed in as value.
When I try,
lapply(names(freq.chars), my.function)
I get an error that "value" is missing with no default.
I've also tried
lapply(names(freq.chars), my.function, name = names(freq.chars), value = freq.chars)
, in which case I get an error: unused argument value = c(...).
Sorry for the edits and clarity, I'm new at this...
We use this test data:
set.seed(123) # needed for reproducibility
char.vector <- sample(LETTERS[1:5], 10, replace=TRUE)
freq.chars <- table(char.vector)
Here are several variations:
# 1. iterate simultaneously over names and values
mapply(my.function, names(freq.chars), unname(freq.chars))
# 2. same code except Map replaces mapply. Map returns a list.
Map(my.function, names(freq.chars), unname(freq.chars))
# 3. iterate over index and then turn index into names and values
sapply(seq_along(freq.chars),
function(i) my.function(names(freq.chars)[i], unname(freq.chars)[i]))
# 4. same code as last one except lapply replaces sapply. Returns list.
lapply(seq_along(freq.chars),
function(i) my.function(names(freq.chars)[i], unname(freq.chars)[i]))
# 5. this iterates over names rather than over an index
sapply(names(freq.chars), function(nm) my.function(nm, freq.chars[[nm]]))
# 6. same code as last one except lapply replaces sapply. Returns list.
lapply(names(freq.chars), function(nm) my.function(nm, freq.chars[[nm]]))
Note that mapply and sapply have an optional USE.NAMES argument that controls whether names are inferred for the result and an optional simplify argument (SIMPLIFY for mapply) which controls whether list output is simplified. Use these arguments for further control.
Update Completely revised presentation.
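To make the effect of the variations above easier to see, here is a small sketch that returns a string instead of printing. The function describe and the vector freq are made up for illustration; freq stands in for the table of frequencies.

```r
# Hypothetical two-argument function that returns a label.
describe <- function(name, value) paste0(name, ": ", value)

# Hypothetical named integer vector standing in for the frequency table.
freq <- c(A = 3L, B = 5L, C = 2L)

# mapply walks over names and values in parallel and simplifies the result.
labels_vec <- mapply(describe, names(freq), freq, USE.NAMES = FALSE)

# Map does the same but always returns a list, named after the first input.
labels_list <- Map(describe, names(freq), freq)

print(labels_vec)
print(names(labels_list))
```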
If you just want to pass another argument to your function, specify it after the function name (the 3rd argument to lapply).
lapply(names(freq.chars), my.function, char.vector)
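As a sketch of what this does, the example below uses a hypothetical function add_suffix. Each element of the first argument to lapply becomes the first argument of the function, and the extra argument is passed unchanged (the same whole object) to every call, either by position or by name.

```r
# Hypothetical function with two parameters.
add_suffix <- function(name, suffix) paste(name, suffix)

# The extra argument is matched by position to `suffix`...
by_position <- lapply(c("A", "B"), add_suffix, "seen")

# ...or can be given by name, which is clearer with several parameters.
by_name <- lapply(c("A", "B"), add_suffix, suffix = "seen")

print(by_position)
```

Note that the extra argument is not iterated over; to step through names and values together, one of the mapply or Map forms is needed.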

How do I access function names in R?

I am writing a function that receives two parameters: a data frame, and a function, and, after processing the data frame, summarizes it using the function parameter (e.g. mean, sd,...). My question is, how can I get the name of the function received as a parameter?
How about:
f <- function(x) deparse(substitute(x))
f(mean)
# [1] "mean"
f(sd)
# [1] "sd"
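Applied to the situation in the question, a minimal sketch might look like the following. The function summarise_with and its return structure are assumptions made for illustration; the key point is that substitute is called on the parameter inside the function that received it.

```r
# Hypothetical summariser: applies `fn` to every column of `data`
# and records the expression the caller used for `fn`.
summarise_with <- function(data, fn) {
  fn_name <- deparse(substitute(fn))
  list(name = fn_name, values = sapply(data, fn))
}

res <- summarise_with(data.frame(a = 1:3, b = 4:6), mean)
print(res$name)
print(res$values)
```

If the caller passes an anonymous function, fn_name will contain the deparsed function definition rather than a short name, so this is only useful when functions are passed by name.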
do.call may be what you want here. You can get a function name as character value, and then pass that and a list of arguments to do.call for evaluation. For example:
X<-"mean"
do.call(X,args=list(c(1:5)) )
[1] 3
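A related base R tool is match.fun, which accepts either a function or its name as a string and returns the function. The wrapper apply_by_name below is a hypothetical sketch of how a function could accept both forms.

```r
# Hypothetical wrapper: `fn` may be a function or a character name.
apply_by_name <- function(fn, x) {
  f <- match.fun(fn)
  f(x)
}

from_string <- apply_by_name("mean", 1:5)
from_function <- apply_by_name(mean, 1:5)
print(c(from_string, from_function))
```

When the name is supplied as a string, it is already available for labelling the output, so no call to substitute is needed.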
Perhaps I'm misunderstanding the question, but it seems like you could simply have the function name as a parameter, and evaluate the function like normal within your function. This approach works fine for me. The ellipsis is for added parameters to your function of interest.
myFunc=function(data,func,...){return(func(data,...))}
myFunc(runif(100), sd)
And if you'd want to apply it to every column or row of a data.frame, you could simply use an apply statement in myFunc.
Here's my attempt. Perhaps you want to return both the result and the function name:
y <- 1:10
myFunction <- function(x, param) {
return(paste(param(x), substitute(param)))
}
myFunction(y, mean)
# [1] "5.5 mean"
