data name as an argument passed in a function in R - r

I built an R package that includes a dataset called mouse, which I can access using data(mouse). In the package, I also have a function called fun that takes, as its first argument, the name of a dataset (included in the package):
fun = function(dt = NULL, ...) {
data(dt)
...
dt.sub = dt[ ,1:6]
...
}
However, when I use the function as fun(dt = "mouse"), it says In data(dt) : data set ‘dt’ not found. Also, I cannot use dt[ ,1:6] since dt here is a string. I tried to use noquote and as.name functions to get rid of the quotation marks, but the object dt does NOT refer to the mouse dataset.
My question is, what's the best approach to pass the name of a dataset (mouse in this case) in the function argument, and then use it in the function body? Thanks!

Try this:
f <- function(dt = NULL) {
do.call("data", list(dt))
dt <- eval(as.name(dt))
head(dt)
}
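A variant of the same idea, as a minimal sketch: data() accepts a character vector through its list argument and can load into a private environment, so the global workspace is not touched. The helper name load_named_dataset is hypothetical, and the built-in mtcars dataset from the datasets package stands in for mouse here.

```r
# Hypothetical helper: load a packaged dataset by its name (a string)
# and return the object instead of leaving it in the global environment.
load_named_dataset <- function(dt, package = "datasets") {
  stopifnot(is.character(dt), length(dt) == 1)
  env <- new.env()
  # list = takes the dataset name as a character vector
  data(list = dt, package = package, envir = env)
  get(dt, envir = env)
}

# mtcars is only a stand-in for the asker's mouse dataset
cars_sub <- load_named_dataset("mtcars")[, 1:6]
head(cars_sub)
```

Inside the asker's package, the package argument would presumably be the package's own name.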

Related

R: how to write a wrapper for a function calling another function

I use the function caRamel from the package with the same name. The function takes another function my_func as an argument....
caRamel(
fn=my_func,
other_caRamel_parameters...
)
my_func is a function taking a unique parameter i (by design, mandatory and given by the caRamel function):
my_func <- function(i) {
require(somelib)
my_path = "C:/myfolder/"
do things with i
...
}
By design, there is no way to pass further arguments to the my_func function inside the caRamel function and I have to hard code everything like the my_path variable for example inside my_func.
However, I would like to be able to pass my_path and others variables as parameters in the caRamel function like so:
caRamel(
fn=my_func,
other_caRamel_parameters...,
my_path="C:/myfolder/", ...
)
with:
my_func <- function(i, my_path) {
require(somelib)
my_path = my_path
do things with i
...
}
I was then wondering if it was possible to write a "wrapper" in this case so that further parameters can be passed to my_func? What could be the options to achieve that?
We can use {purrr} and {rlang} in a custom myCaRamel function which only takes the ellipsis ... as argument.
First we capture the dots with rlang::list2(...).
Then we get all formals of the caRamel function as names with rlang::fn_fmls_names.
Now we differentiate between arguments which go into the caRamel function and those which go into your custom function.
We use purrr::partial to pre-fill the arguments that go into your my_fun.
Then we call the original caRamel function with the partial function we created, together with the arguments that go into caRamel.
In the example below, I wrap the last call in rlang::expr() to show that it is working. Delete the expr() around the last call to make it actually run.
The downside is that every argument needs to be correctly named. Unnamed arguments and partial matching won't work. Further, if my_fun and caRamel have arguments with the same name, those arguments only go into caRamel and won't reach my_fun.
library(purrr)
library(rlang)
library(caRamel)
my_fun <- function(x, my_path) {
# use my_path
# and some x
}
myCaRamel <- function(...) {
dots <- rlang::list2(...)
caramel_args <- rlang::fn_fmls_names(caRamel)
caramel_args_wo_fun <- caramel_args["func" != caramel_args]
other_args <- names(dots)[!names(dots) %in% caramel_args]
my_fn <- dots[["func"]]
my_fn <- purrr::partial(my_fn, !!! dots[other_args])
rlang::expr( # just to show correct out put - delete this line
caRamel(func = my_fn,
!!! dots[names(dots) %in% caramel_args_wo_fun])
) # delete this line
}
myCaRamel(func = my_fun,
my_path = "C:/myfolder/",
nvar = 10)
#> caRamel(func = my_fn, nvar = 10)
Created on 2021-12-11 by the reprex package (v2.0.1)
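If only a few extra values are needed, a plain closure may be enough and avoids the extra packages. The sketch below is an assumption-laden illustration: run_with_single_arg is a hypothetical stand-in for any function that, like caRamel, calls the supplied function with a single argument, and make_my_func is a hypothetical factory that fixes my_path in advance.

```r
# Hypothetical stand-in for an optimizer that calls fn(i) with one argument
run_with_single_arg <- function(fn, values) {
  vapply(values, fn, character(1))
}

# The user's function, now taking the extra parameter explicitly
my_func <- function(i, my_path) {
  file.path(my_path, paste0("run_", i, ".txt"))
}

# Factory: returns a one-argument function with my_path already fixed
make_my_func <- function(my_path) {
  force(my_path)  # evaluate now, so later changes to the caller's variable don't matter
  function(i) my_func(i, my_path = my_path)
}

result <- run_with_single_arg(make_my_func("C:/myfolder"), 1:2)
print(result)
```

The function returned by make_my_func has exactly the one-argument shape the caller expects, so nothing in the calling function needs to change.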

dataset and UDFs using sapply

I have a dataset (employee) created from a csv, that displays data as given below;
employee[1,]
age name designation
28 Tony Manager
I have created a function that returns a decision based on an input parameter;
loan_eligible_decision <- function(p)
{
if(p$designation == "manager")
{
decision <- "yes"
}
return(decision)
}
when the function is called directly it works fine and gives the result below;
loan_eligible_decision(employee[1,])
gives me output: yes
However when called within an sapply family it throws a reference error;
sapply(data.frame(employee[1,]),loan_eligible_decision(x))
Error in p$marital : $ operator is invalid for atomic vectors
Any suggestions as to what could be a possible workaround/solution?
I have also tried replacing the if condition with;
if(p[[designation]] == "manager")
and called upon the function like so;
sapply(employee['1',],loan_eligible_decision(x))
The error:
Error in loan_eligible_decision(x) : object 'designation' not found
You are calling the function incorrectly: sapply expects the function itself as its second argument, not the result of calling it. It should be
myfun <- function(x) x^2
sapply(xy, FUN = myfun)
In any case, try inserting a browser() call within the function and inspect what is going on. See ?browser for more info.
myfun <- function(x) {
browser()
x^2
}
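Applied to the employee example, one possible sketch is to iterate over row indices, because sapply on a data frame iterates over its columns (each an atomic vector, hence the $ error). The second row and the "no" branch below are assumptions added for illustration; the original function only handles the "yes" case.

```r
# Illustrative data: the first row is from the question, the second is made up
employee <- data.frame(
  age = c(28, 35),
  name = c("Tony", "Ana"),
  designation = c("Manager", "Analyst"),
  stringsAsFactors = FALSE
)

# p is a one-row data frame; the else branch is an assumption
loan_eligible_decision <- function(p) {
  if (p$designation == "Manager") "yes" else "no"
}

# Pass an anonymous function; each call receives one row
decisions <- sapply(
  seq_len(nrow(employee)),
  function(i) loan_eligible_decision(employee[i, ])
)
print(decisions)
```

Note that the comparison is case-sensitive, so "manager" would not match the stored value "Manager".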

Accessing ... function arguments by (string) name inside the function in R?

I'm trying to write a function with dynamic arguments (i.e. the function argument names are not determined beforehand). Inside the function, I can generate a list of possible argument names as strings and try to extract the function argument with the corresponding name (if given). I tried using match.arg, but that does not work.
As a (massively stripped-down) example, consider the following attempt:
# Override column in the dataframe. Dots arguments can be any
# of the column names of the data.frame.
dataframe.override = function(frame, ...) {
for (n in names(frame)) {
# Check whether this col name was given as an argument to the function
if (!missing(n)) {
vl = match.arg(n);
# DO something with that value and assign it as a column:
newval = vl
frame[,n] = newval
}
}
frame
}
AA = data.frame(a = 1:5, b = 6:10, c = 11:15)
dataframe.override(AA, b = c(5,6,6,6,6)) # Should override column b
Unfortunately, the match.arg apparently does not work:
Error in match.arg(n) : 'arg' should be one of
So, my question is: Inside a function, how can I check whether the function was called with a given argument and extract its value, given the argument name as a string?
Thanks,
Reinhold
PS: In reality, the "Do something..." part is quite complicated, so simply assigning the vector to the dataframe column directly without such a function is not an option.
You probably want to review the chapter on Non Standard Evaluation in Advanced-R. I also think Hadley's answer to a related question might be useful.
So: let's start from that other answer. The most idiomatic way to get the arguments to a function is like this:
get_arguments <- function(...){
match.call(expand.dots = FALSE)$`...`
}
That provides a list of the arguments with names:
> get_arguments(one, test=2, three=3)
[[1]]
one
$test
[1] 2
$three
[1] 3
You could simply call names() on the result to get the names.
Note that if you want the values as strings you'll need to use deparse, e.g.
deparse(get_arguments(one, test=2, three=3)[[2]])
[1] "2"
P.S. Instead of looping through all columns, you might want to use intersect or setdiff, e.g.
dataframe.override = function(frame, ...) {
  # Collect the evaluated ... arguments as a named list
  args = list(...)
  # Keep only the argument names that are also columns of the data frame
  matching.cols = intersect(names(frame), names(args))
  for (n in matching.cols) {
    # Look up the value of the argument by its (string) name
    vl = args[[n]]
    # DO something with that value and assign it as a column:
    newval = vl
    frame[, n] = newval
  }
  frame
}
P.P.S: I'm assuming there's a reason you're not using dplyr::mutate for this.
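To answer the question as asked (checking for an argument and extracting its value, given the name as a string), a minimal sketch is to turn the dots into a list and index it by name. The helper name get_dot_value is hypothetical.

```r
# Hypothetical helper: return the value of the ... argument called `name`,
# or NULL if no such argument was supplied.
get_dot_value <- function(name, ...) {
  args <- list(...)  # evaluates the arguments and keeps their names
  if (name %in% names(args)) args[[name]] else NULL
}

found <- get_dot_value("b", a = 1, b = c(5, 6, 6, 6, 6))
missing_arg <- get_dot_value("z", a = 1)
print(found)
print(missing_arg)
```

Unlike match.call, list(...) evaluates the arguments, which is usually what is wanted when the values are to be written into a column.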

R data tables accessing columns by name

If I have a data table, foo, in R with a column named "date", I can get the vector of date values by the notation
foo[, date]
(Unlike data frames, date doesn't need to be in quotes).
How can this be done programmatically? That is, if I have a variable x whose value is the string "date", then how do I access the column of foo with that name?
Something that sort of works is to create a symbol:
sym <- as.name(x)
v <- foo[, eval(sym)]
...
As I say, that sort of works, but there is something not quite right about it. If that code is inside a function myFun in package myPackage, then it seems that it doesn't work if I explicitly use the package through:
myPackage::myFun(...)
I get an error message saying "undefined columns selected".
[edited] Some more details
Suppose I create a package called myPackage. This package has a single file with the following in it:
library(data.table)
#' @export
myFun <- function(table1) {
names1 <- names(table1)
name1 <- names1[[1]]
sym <- as.Name(name1)
table1[, eval(sym)]
}
If I load that function using R Studio, then
myFun(tbl)
returns the first column of the data table tbl.
On the other hand, if I call
myPackage::myFun(tbl)
it doesn't work. It complains about
Error in .subset(x, j) : invalid subscript type 'builtin'
I'm just curious as to why myPackage:: would make this difference.
A quick way which points to a longer way is this:
subset(foo, TRUE, date)
The subset function accepts unquoted symbols/names for its 'subset' and 'select' arguments. (Its author, however, thinks this was a bad idea and suggests we use formulas instead.) This was the jumping-off point for sections of Hadley Wickham's Advanced Programming webpages (and book): http://adv-r.had.co.nz/Computing-on-the-language.html and http://adv-r.had.co.nz/Functional-programming.html . You can also look at the code for subset.data.frame:
> subset.data.frame
function (x, subset, select, drop = FALSE, ...)
{
r <- if (missing(subset))
rep_len(TRUE, nrow(x))
else {
e <- substitute(subset)
r <- eval(e, x, parent.frame())
if (!is.logical(r))
stop("'subset' must be logical")
r & !is.na(r)
}
vars <- if (missing(select))
TRUE
else {
nl <- as.list(seq_along(x))
names(nl) <- names(x)
eval(substitute(select), nl, parent.frame())
}
x[r, vars, drop = drop]
}
The problem with "naked" expressions passed into functions is that their evaluation frame is sometimes not what is expected. R formulas, like functions, carry a pointer to the environment in which they were defined.
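The point about formulas carrying their environment can be seen in a small sketch. The names make_formula and local_value are hypothetical and exist only for this illustration.

```r
# A formula created inside a function remembers that function's frame
make_formula <- function() {
  local_value <- 42
  y ~ x
}

fm <- make_formula()
# The formula's environment still holds the local variable
recovered <- get("local_value", envir = environment(fm))
print(recovered)
```

A bare symbol passed as an argument carries no such pointer, so the function receiving it has to decide where to evaluate it.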
I think the problem is that you've defined myFun in your global environment, so it only appeared to work.
I changed as.Name to as.name, and created a package with the following functions:
library(data.table)
myFun <- function(table1) {
names1 <- names(table1)
name1 <- names1[[1]]
sym <- as.name(name1)
table1[, eval(sym)]
}
myFun_mod <- function(dt) {
# dt[, eval(as.name(colnames(dt)[1]))]
dt[[colnames(dt)[1]]]
}
Then, I tested it using this:
library(data.table)
myDt <- data.table(a=letters[1:3],b=1:3)
myFun(myDt)
myFun_mod(myDt)
myFun didn't work
myFun_mod did work
The output:
> library(test)
> myFun(myDt)
Error in eval(expr, envir, enclos) : object 'a' not found
> myFun_mod(myDt)
[1] "a" "b" "c"
then I added the following line to the NAMESPACE file:
import(data.table)
This is what @mnel was talking about with this link:
Using data.table package inside my own package
After adding import(data.table), both functions work.
I'm still not sure why you got the particular .subset error, which is why I went through the effort of reproducing the result...
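For the original question (selecting a column whose name is stored in a string), here is a short sketch of a few forms that avoid eval on a symbol. It assumes the data.table package is installed; the table contents are made up for illustration.

```r
library(data.table)

# Illustrative table; the values are made up
foo <- data.table(date = as.Date("2020-01-01") + 0:2, value = 1:3)
x <- "date"

v1 <- foo[[x]]        # list-style extraction, returns the column as a vector
v2 <- foo[, get(x)]   # get() looks the name up among the columns
sub <- foo[, ..x]     # the .. prefix refers to x in the calling scope; returns a data.table

print(v1)
print(names(sub))
```

The [[ form is the one used by myFun_mod above; it does not rely on data.table's own evaluation of the j argument.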

Passing expression through functions

I'm using data.table package and trying to write a function (shown below):
require(data.table)
# Function definition
f = function(path, key) {
table = data.table(read.delim(path, header=TRUE))
e = substitute(key)
setkey(table, e) # <- Error in setkeyv(x, cols, verbose = verbose) : some columns are not in the data.table: e
return(table)
}
# Usage
f("table.csv", ID)
Here I try to pass an expression to the function. Why doesn't this code work?
I've already tried different combinations of substitute(), quote() and eval(). So, it'd be great if you could also explain how to get this to work.
First, let's look at how the setkey function does things from the data.table package:
# setkey function
function (x, ..., verbose = getOption("datatable.verbose"))
{
if (is.character(x))
stop("x may no longer be the character name of the data.table. The possibility was undocumented and has been removed.")
cols = getdots()
if (!length(cols))
cols = colnames(x)
else if (identical(cols, "NULL"))
cols = NULL
setkeyv(x, cols, verbose = verbose)
}
So, when you do:
require(data.table)
dt <- data.table(ID=c(1,1,2,2,3), y = 1:5)
setkey(dt, ID)
It calls the function getdots which is internal to data.table (that is, it's not exported). Let's have a look at that function:
# data.table:::getdots
function ()
{
as.character(match.call(sys.function(-1), call = sys.call(-1),
expand.dots = FALSE)$...)
}
So, what does this do? It takes the parameter you entered in setkey and it uses match.call to extract the arguments separately. That is, the match.call argument for this example case would be:
setkey(x = dt, ... = list(ID))
and since it's a list, you can access the ... parameter with $... to get a list of one element with the value ID. Converting this list to character with as.character results in "ID" (a character vector). setkey then passes this to setkeyv internally to set the keys.
Now why doesn't this work when you write setkey(table, key) inside your function?
This is precisely because of the way setkey/getdots is. The setkey function is designed to take any argument after the first argument (which is a data.table) and then return the ... argument as a character.
That is, if you give setkey(dt, key) then it'll return cols <- "key". If you give setkey(dt, e), it'll give back cols <- "e". It doesn't check whether "key" is an existing variable and, if so, substitute the variable's value. All it does is convert whatever you provide (whether it be a symbol or a character string) to a character.
Of course this won't work in your case because you want the value in key = ID to be provided in setkey. At least I can't think of a way to do this.
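The behaviour described above can be reproduced in base R with a small sketch. show_dots is a hypothetical function that mimics the as.character(match.call(...)) idea; it is not data.table's actual code.

```r
# Hypothetical imitation of the getdots idea: return the ... arguments
# as the text the caller typed, without evaluating them.
show_dots <- function(x, ...) {
  as.character(match.call(expand.dots = FALSE)$`...`)
}

typed <- show_dots(NULL, ID, y)
print(typed)

# A variable holding a column name is not dereferenced:
key <- "ID"
not_resolved <- show_dots(NULL, key)
print(not_resolved)
```

The second call returns the text "key" rather than "ID", which is the same reason setkey(table, e) looks for a column literally named e.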
How to get around this?
As @agstudy already mentions, the best/easiest way is to pass "ID" and use setkeyv. But, if you really insist on using f("table.csv", ID) then, this is what you could do:
f <- function(path, key) {
table = data.table(read.delim(path, header=TRUE))
e = as.character(match.call(f)$key)
setkeyv(table, e)
return(table)
}
Here, you first use match.call to get the value corresponding to argument key and then convert it to a character and then pass that to setkeyv.
In short, setkey internally uses setkeyv. And imho, setkey is a convenient function to be used when you already know the column name of the data.table for which you need to set the key. Hope this helps.
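An equivalent sketch uses deparse(substitute(key)) to turn the unquoted argument into a string. To stay self-contained it takes a data.table directly instead of reading a file; the function name set_key_by_symbol and the sample data are assumptions, and the data.table package is assumed to be installed.

```r
library(data.table)

# Hypothetical variant of f() that takes a data.table instead of a path
set_key_by_symbol <- function(dt, key) {
  col <- deparse(substitute(key))  # unquoted ID becomes the string "ID"
  setkeyv(dt, col)                 # setkeyv accepts column names as strings
  dt
}

# Made-up data for illustration
dt <- data.table(ID = c(3, 1, 2), y = 1:3)
set_key_by_symbol(dt, ID)
print(key(dt))
```

setkeyv sorts the table by reference, so dt itself is reordered by ID after the call.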
I can't tell from your code what you're trying to achieve, so I'll answer the question the title asks instead; "How to pass an expression through a function?"
If you want to do this (this should be avoided where possible), you can do the following:
f <- function(expression) {
return(eval(parse(text=expression)))
}
For example:
f("a <- c(1,2,3); sum(a)")
# [1] 6
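If the expression does not have to arrive as a string, a sketch that avoids parse() is to capture the argument with substitute() and evaluate it in the caller's frame. The name eval_expr is hypothetical.

```r
# Hypothetical helper: capture the unevaluated argument and evaluate it
# where the caller's variables live.
eval_expr <- function(expr) {
  captured <- substitute(expr)
  eval(captured, envir = parent.frame())
}

a <- c(1, 2, 3)
total <- eval_expr(sum(a))
print(total)
```

This keeps the expression as code rather than text, so syntax errors are caught when the call is written rather than when the string is parsed.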
