Create multiple functions/varying arguments in R using a list - r

I'm trying to create multiple functions with varying arguments.
Just some background: I need to compute one function for each of 75 days and multiply them later to create a maximum-likelihood function. They all have the same form; they only differ in some arguments. That's why I wanted to do this via a loop.
I've tried to put all the equations in a list to have access to them later on.
The list this loop generates has 75 elements, but they're all the same: the [i] inside the defined function is not taken into account by the loop, meaning that M_b[i] (M_b is a vector with 75 elements) does not vary.
Does someone know why this is the case?
Simplified version of the code:
for (i in 1:75){
log_likelihood[[i]] <-
list(function(e_b,mu_b){M_b[i]*log(e_b*mu_b)})
}
I couldn't find an answer to this in other questions. I'm sorry if a similar thread already exists.

You need to force the evaluation of the variable M_b[i]; see https://adv-r.hadley.nz/function-factories.html. Below is my attempt to make it work:
func = function(i){
i = force(i)
f = function(e_b,mu_b){i*log(e_b*mu_b) }
return(f)
}
# test
func(9)(7,3) == 9*log(7*3)
#some simulated values for M_b
M_b = runif(75)
log_likelihood = vector("list",75)
for (idx in 1:75){
log_likelihood[[idx]] <- func(M_b[idx])
}
# we test it on say e_b=5, mu_b=6
test = sapply(log_likelihood,function(i)i(5,6))
actual = sapply(M_b,function(i)i*log(5*6))
identical(test,actual)
[1] TRUE
This is called lazy evaluation: R doesn't evaluate an expression until its value is actually needed. As correctly pointed out by @SDS0, the value you get is the one at i=75. We can try it with a version of the function that does not call force:
func = function(i){function(e_b,mu_b){i*log(e_b*mu_b) }}
M_b = 1:3
log_likelihood = vector("list",3)
for (idx in 1:3){
log_likelihood[[idx]] = func(M_b[idx])
}
sapply(log_likelihood,function(f)f(5,6))
[1] 10.20359 10.20359 10.20359
#you get 10.20359 which is M_b[3]*log(5*6)
There is one last option, which I just learned of: use lapply. Since R 3.2.0, higher-order functions such as lapply force the arguments of the functions they apply, so each closure captures its own value:
func = function(i){function(e_b,mu_b){i*log(e_b*mu_b) }}
log_likelihood = lapply(1:3,function(idx)func(M_b[idx]))
sapply(log_likelihood,function(f)f(5,6))
[1] 3.401197 6.802395 10.203592
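Since the stated goal is to combine the daily terms into one likelihood, the next step could be sketched as follows. This assumes the terms are already on the log scale (as the log() in the simplified equation suggests), so they are summed rather than multiplied; make_term and total_loglik are names made up for this illustration, and M_b holds toy values.

```r
# Factory: force(m) fixes the value of m when the closure is created.
make_term <- function(m) {
  force(m)
  function(e_b, mu_b) m * log(e_b * mu_b)
}

M_b <- c(1, 2, 3)                 # toy stand-in for the 75 daily values
terms <- lapply(M_b, make_term)   # one closure per day

# Hypothetical combined log-likelihood: the sum of all daily terms.
total_loglik <- function(e_b, mu_b) {
  sum(vapply(terms, function(f) f(e_b, mu_b), numeric(1)))
}

total_loglik(5, 6)                # same as sum(M_b) * log(5 * 6)
```

A function like total_loglik could then be handed to an optimizer, wrapped so that it takes a single parameter vector.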

Related

How to pass a parameter name as an argument to a function

I understand this title may not make any sense. I searched everywhere but couldn't find an answer. What I'm trying to do is make a function that will take a parameter name for another function, a vector, and then keep calling that function with the parameter value equal to every item in the vector.
For simplicity's sake I'm not dealing with a vector below but just a single integer.
tuner <- function(param, a, ...) {
myfunction(param = a, ...)
}
and the code would effectively just run
myfunction(param = a)
I can't get this to work! The code actually runs but the resulting call completely ignores the parameter I put in and just runs
myfunction()
instead. Any solutions?
You can't really treat parameter names as variables that need to be evaluated in R. One workaround would be to build a list of parameters and then pass that to do.call. For example:
myfunction <- function(x=1, y=5) {
x+y
}
tuner <- function(param, a, ...) {
do.call("myfunction", c(setNames(list(a), param), list(...)))
}
tuner("x", 100)
# [1] 105
tuner("y", 100)
# [1] 101
tuner("y", 100, 2)
# [1] 102
Another way using rlang would be
library(rlang)
tuner <- function(param, a, ...) {
args <- exprs(!!param := a, ...)
eval_tidy(expr(myfunction(!!!args)))
}
which would give the same results.
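The question originally asked for a vector of values rather than a single one. A minimal sketch of that extension, built on the do.call approach, might look like this; tuner_vec is a made-up name and myfunction is the toy function from the answer above.

```r
myfunction <- function(x = 1, y = 5) {
  x + y
}

# Call myfunction once per element of `values`, each time passing the
# element under the argument name stored in `param`.
tuner_vec <- function(param, values, ...) {
  lapply(values, function(v) {
    do.call(myfunction, c(setNames(list(v), param), list(...)))
  })
}

res <- tuner_vec("x", c(10, 20, 30))
unlist(res)   # 15 25 35
```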

R: eval parse function call not accessing correct environments

I'm trying to read a function call as a string and evaluate this function within another function. I'm using eval(parse(text = )) to evaluate the string. The function I'm calling in the string doesn't seem to have access to the environment in which it is nested. In the code below, my "isgreater" function finds the object y, defined in the global environment, but can't find the object x, defined within the function. Does anybody know why, and how to get around this? I have already tried adding the argument envir = .GlobalEnv to both of my evals, to no avail.
str <- "isgreater(y)"
isgreater <- function(y) {
return(eval(y > x))
}
y <- 4
test <- function() {
x <- 3
return(eval(parse(text = str)))
}
test()
Error:
Error in eval(y > x) : object 'x' not found
Thanks to @MrFlick and @r2evans for their useful and thought-provoking comments. As for a solution, I've found that this code works. x must be passed into the function as an argument and cannot be a default value. In the code below, my function generates a list of results with the x variable being changed within the function. If anyone knows why this is, I would love to know.
str <- "isgreater(y, x)"
isgreater <- function(y, x) {
return(eval(y > x))
}
y <- 50
test <- function() {
list <- list()
for(i in 1:100) {
x <- i
bool <- eval(parse(text = str))
list <- append(list, bool)
}
return(list)
}
test()
After considering the points made by @r2evans, I have elected to change my approach to the problem so that I do not arrive at this string-parsing step. Thanks a lot, everyone.
I offer the following code, not as a solution, but rather as an insight into how R "works". The code does things that are quite dangerous and should only be examined for its demonstration of how to assert a value for x. Note that the assignment writes x into the environment where isgreater was defined (here the global environment), so it overwrites any x that already exists there:
str <- "isgreater(y)"
isgreater <- function(y) {
return(eval( y > x ))
}
y <- 4
test <- function() {
environment(isgreater)$x <- 5
return(eval(parse(text = str) ))
}
test()
#[1] FALSE
The environment<- function is used in the R6 programming paradigm. Take a look at ?R6 if you are interested in working with a more object-oriented set of structures and syntax. (I will note that when I first ran your code, there was an object named x in my workspace and some of my efforts were able to succeed to the extent of not throwing an error, but they were finding that length-10000 vector and filling up my console with logical results until I escaped the console. Yet another argument for passing both x and y to isgreater.)
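The error in the question comes from lexical scoping: a function looks up free variables in the environment where it was defined, not in the one it is called from. The sketch below shows this without eval or parse; all names in it are invented for the illustration.

```r
show_scope <- function() {
  threshold <- 10                        # visible to functions defined here
  defined_here <- function(y) y > threshold
  caller <- function() {
    threshold <- 0                       # local to caller; defined_here never sees it
    defined_here(5)
  }
  caller()
}

show_scope()   # FALSE: 5 is compared with 10, not with 0
```

In the question, isgreater is defined at top level, so it searches the global environment for x and never sees the x created inside test. That is why passing x as an argument works.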

R, pass-by-value inside a function

Suppose you define a function in R using the following code:
a <- 1
f <- function(x) x + a
If you later redefine a you will change the function f. (So, f(1) = 2 as given, but if you later redefine a = 2 then f(1) = 3.) Is there a way to force R to use the value of a at the time the function is defined? (That is, f would not change with later redefinitions of a.)
The above is the shortest case I could think of that embodies the problem I am having. More specifically, as requested, my situation is:
I am working with a bunch of objects I am calling "person". Each person is defined as a probability distribution that depends on an n-dimensional vector $a$ and an n-dimensional vector of constraints w (the share of wealth).
I want to create a "society" with N people, that is a list of N persons. To that end, I created two n by N matrices A and W. I now loop over 1 to N to create the individuals.
Society <- list()
### doesn't evaluate theta at the time, but does w...
for (i in 1:Npeople) {
w <- WealthDist[i,]
u <- function(x) prod(x^A[i,])
P <- list(u,w)
names(P) <- c("objective","w")
Society[[length(Society)+1]] <- P
}
w is passed by value, so each person gets the right amount of wealth. But A behaves as if it were passed by reference: everybody is assigned the same function u (namely, the function using i = N).
To finish it up, the next steps are to get the Society and, via two optimizations get an "equilibrium point".
You can create a function which uses a locked binding and creates a function to complete your purpose. The value of a at that time is stored as w in an environment the function can see, and it will not be replaced by later changes to a.
a <- 1
j <- new.env() # create a new environment
create.func <- function () {
j$w <<- a
function (x) {
x+ j$w
}
}
f <- create.func()
a <- 2
f(2)
[1] 3 # if w was changed this should be 4
Credits to Andrew Taylor (see comments)
EDIT: BE CAREFUL: f will change if you call create.func, even if you do not store it into f. To avoid this, you could write this code (it clearly depends on what you want).
a <- 1
create.func <- function (x) {
j <- new.env()
j$w <- a
function (x) {
x + j$w
}
}
f <- create.func()
f(1)
[1] 2
a <- 2
q <- create.func()
q(1)
[1] 3
f(1)
[1] 2
EDIT 2: Lazy evaluation doesn't apply here because a is evaluated by being set to j$w. If you had used it as an argument say:
function(a)
function(x)
#use a here
you would have to use force before defining the second function, because then it wouldn't be evaluated.
EDIT 3: I removed the foo <- etc. The function will return as soon as it is declared, since you want it to be similar to the code factories defined in your link.
EDIT by OP: Just to add to the accepted answer that, in the spirit of
Function Factory in R
the code below works:
funs.gen <- function(n) {
force(n)
function(x) {
x + n
}
}
funs = list()
# `names` is assumed to be a character vector defined elsewhere
for (i in seq_along(names)) {
n = names[i]
funs[[n]] = funs.gen(i)
}
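Applied to the Society loop from the question, the same factory idea might look like the sketch below. It assumes each row of A and WealthDist belongs to one person, as the indexing A[i,] in the question suggests; make_utility and the matrix values are invented for the example.

```r
# Factory: force(alpha) captures this person's exponents immediately.
make_utility <- function(alpha) {
  force(alpha)
  function(x) prod(x^alpha)
}

# Toy data: two people, two goods (one row per person).
A <- matrix(c(1, 2,
              3, 4), nrow = 2, byrow = TRUE)
WealthDist <- matrix(c(0.5, 0.5,
                       0.3, 0.7), nrow = 2, byrow = TRUE)

Society <- lapply(seq_len(nrow(A)), function(i) {
  list(objective = make_utility(A[i, ]), w = WealthDist[i, ])
})

Society[[1]]$objective(c(2, 3))   # 2^1 * 3^2 = 18
Society[[2]]$objective(c(2, 3))   # 2^3 * 3^4 = 648
```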
R doesn't do pass by reference; everything is passed to functions by value. As you've noticed, since a is defined in the global environment, functions which reference a are referencing the global value of a, which is subject to change. To ensure that a specific value of a is used, you can use it as a parameter in the function.
f <- function(x, a = 1) {
x + a
}
This defines a as a parameter that defaults to 1. The value of a used by the function will then always be the value passed to the function, regardless of whether a is defined in the global environment.
If you're going to use lapply(), you simply pass a as a parameter to lapply().
lapply(X, f, a = <value>)
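As a small concrete illustration of the parameter approach (the values here are arbitrary):

```r
f <- function(x, a = 1) {
  x + a
}

a <- 100                         # a global `a` does not affect f
f(1)                             # 2: the default a = 1 is used

res <- lapply(1:3, f, a = 10)    # a is passed through lapply to every call
unlist(res)                      # 11 12 13
```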
Define a within f
f <- function(x) {a<-1;x + a}

combination of expand.grid and mapply?

I am trying to come up with a variant of mapply (call it xapply for now) that combines the functionality (sort of) of expand.grid and mapply. That is, for a function FUN and a list of arguments L1, L2, L3, ... of unknown length, it should produce a list of length n1*n2*n3 (where ni is the length of list i) which is the result of applying FUN to all combinations of the elements of the list.
If expand.grid worked to generate lists of lists rather than data frames, one might be able to use it, but I have in mind that the lists may be lists of things that won't necessarily fit into a data frame nicely.
This function works OK if there are exactly three lists to expand, but I am curious about a more generic solution. (FLATTEN is unused, but I can imagine that FLATTEN=FALSE would generate nested lists rather than a single list ...)
xapply3 <- function(FUN,L1,L2,L3,FLATTEN=TRUE,MoreArgs=NULL) {
retlist <- list()
count <- 1
for (i in seq_along(L1)) {
for (j in seq_along(L2)) {
for (k in seq_along(L3)) {
retlist[[count]] <- do.call(FUN,c(list(L1[[i]],L2[[j]],L3[[k]]),MoreArgs))
count <- count+1
}
}
}
retlist
}
edit: forgot to return the result. One might be able to solve this by making a list of the indices with combn and going from there ...
I think I have a solution to my own question, but perhaps someone can do better (and I haven't implemented FLATTEN=FALSE ...)
xapply <- function(FUN,...,FLATTEN=TRUE,MoreArgs=NULL) {
L <- list(...)
inds <- do.call(expand.grid,lapply(L,seq_along)) ## Marek's suggestion
retlist <- list()
for (i in 1:nrow(inds)) {
arglist <- mapply(function(x,j) x[[j]],L,as.list(inds[i,]),SIMPLIFY=FALSE)
if (FLATTEN) {
retlist[[i]] <- do.call(FUN,c(arglist,MoreArgs))
}
}
retlist
}
edit: I tried @baptiste's suggestion, but it's not easy (or wasn't for me). The closest I got was
xapply2 <- function(FUN,...,FLATTEN=TRUE,MoreArgs=NULL) {
L <- list(...)
xx <- do.call(expand.grid,L)
f <- function(...) {
do.call(FUN,lapply(list(...),"[[",1))
}
mlply(xx,f)
}
which still doesn't work. expand.grid is indeed more flexible than I thought (although it creates a weird data frame that can't be printed), but enough magic is happening inside mlply that I can't quite make it work.
Here is a test case:
L1 <- list(data.frame(x=1:10,y=1:10),
data.frame(x=runif(10),y=runif(10)),
data.frame(x=rnorm(10),y=rnorm(10)))
L2 <- list(y~1,y~x,y~poly(x,2))
z <- xapply(lm,L2,L1)
xapply(lm,L2,L1)
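For comparison, here is a more compact sketch of the same index-grid idea, checked on a toy function where the expected result is easy to verify by hand. xapply_sketch is a made-up name and FLATTEN is left out; this is one possible formulation, not a replacement for the version above.

```r
xapply_sketch <- function(FUN, ..., MoreArgs = NULL) {
  L <- list(...)
  # One row per combination of indices; the first list varies fastest.
  inds <- expand.grid(lapply(L, seq_along))
  lapply(seq_len(nrow(inds)), function(r) {
    # Pick element j of each list, where j comes from row r of the grid.
    args <- Map(function(x, j) x[[j]], L, inds[r, ])
    do.call(FUN, c(args, MoreArgs))
  })
}

res <- xapply_sketch(function(a, b) a * b, list(1, 2, 3), list(10, 20))
unlist(res)   # 10 20 30 20 40 60
```

Because only indices go through expand.grid, the list elements themselves can be data frames, formulas, or anything else.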
@ben-bolker, I had a similar desire and think I have a preliminary solution worked out, which I've also tested to work in parallel. The function, which I somewhat confusingly called gmcmapply (g for grid), takes an arbitrarily large named list mvars (that gets expand.grid-ed within the function) and a FUN that uses the list names as if they were arguments to the function itself. gmcmapply updates the formals of FUN so that by the time FUN is passed to mcmapply, its arguments reflect the variables that the user would like to iterate over (which would be layers in a nested for loop). mcmapply then dynamically updates the values of these formals as it cycles over the expanded set of variables in mvars.
I've posted the preliminary code as a gist (reprinted with an example below) and would be curious to get your feedback on it. I'm a grad student and a self-described intermediately skilled R enthusiast, so this is pushing my R skills for sure. You or other folks in the community may have suggestions that would improve on what I have. I do think that, even as it stands, I'll be coming back to this function quite a bit in the future.
gmcmapply <- function(mvars, FUN, SIMPLIFY = FALSE, mc.cores = 1, ...){
require(parallel)
FUN <- match.fun(FUN)
funArgs <- formals(FUN)[which(names(formals(FUN)) != "...")] # allow for default args to carry over from FUN.
expand.dots <- list(...) # allows for expanded dot args to be passed as formal args to the user specified function
# Implement non-default arg substitutions passed through dots.
if(any(names(funArgs) %in% names(expand.dots))){
dot_overwrite <- names(funArgs[which(names(funArgs) %in% names(expand.dots))])
funArgs[dot_overwrite] <- expand.dots[dot_overwrite]
#for arg naming and matching below.
expand.dots[dot_overwrite] <- NULL
}
## build grid of mvars to loop over, this ensures that each combination of various inputs is evaluated (equivalent to creating a structure of nested for loops)
grid <- expand.grid(mvars,KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
# specify formals of the function to be evaluated by merging the grid to mapply over with expanded dot args
argdefs <- rep(list(bquote()), ncol(grid) + length(expand.dots) + length(funArgs) + 1)
names(argdefs) <- c(colnames(grid), names(funArgs), names(expand.dots), "...")
argdefs[which(names(argdefs) %in% names(funArgs))] <- funArgs # replace with proper dot arg inputs.
argdefs[which(names(argdefs) %in% names(expand.dots))] <- expand.dots # replace with proper dot arg inputs.
formals(FUN) <- argdefs
if(SIMPLIFY) {
#standard mapply
do.call(mcmapply, c(FUN, c(unname(grid), mc.cores = mc.cores))) # mc.cores = 1 == mapply
} else{
#standard Map
do.call(mcmapply, c(FUN, c(unname(grid), SIMPLIFY = FALSE, mc.cores = mc.cores)))
}
}
example code below:
# Example 1:
# just make sure variables used in your function appear as the names of mvars
myfunc <- function(...){
return_me <- paste(l3, l1^2 + l2, sep = "_")
return(return_me)
}
mvars <- list(l1 = 1:10,
l2 = 1:5,
l3 = letters[1:3])
### list output (Map-like, the default SIMPLIFY = FALSE)
lreturns <- gmcmapply(mvars, myfunc)
### concatenated output (mapply-like)
lreturns <- gmcmapply(mvars, myfunc, SIMPLIFY = TRUE)
## N.B. This is equivalent to running:
lreturns <- c()
for(l1 in 1:10){
for(l2 in 1:5){
for(l3 in letters[1:3]){
lreturns <- c(lreturns,myfunc(l1,l2,l3))
}
}
}
### concatenated output run on 2 cores.
lreturns <- gmcmapply(mvars, myfunc, SIMPLIFY = TRUE, mc.cores = 2)
# Example 2:
# Pass non-default args to FUN.
## Since the apply functions dont accept full calls as inputs (calls are internal), user can pass arguments to FUN through dots, which can overwrite a default option for FUN.
# e.g. apply(x,1,FUN) works and apply(x,1,FUN(arg_to_change= not_default)) does not, the correct way to specify non-default/additional args to FUN is:
# gmcmapply(mvars, FUN, arg_to_change = not_default)
## update myfunc to have a default argument
myfunc <- function(rep_letters = 3, ...){
return_me <- paste(rep(l3, rep_letters), l1^2 + l2, sep = "_")
return(return_me)
}
lreturns <- gmcmapply(mvars, myfunc, rep_letters = 1)
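The formals-rewriting step is the part that lets myfunc refer to l1, l2 and l3 without declaring them. In isolation, the idea can be sketched like this (a toy illustration of the mechanism only, not an excerpt from gmcmapply):

```r
# The body refers to l1, l2 and l3, but the function declares no arguments yet.
f <- function() paste(l3, l1 + l2, sep = "_")

# Replace the argument list; alist() allows arguments without default values.
formals(f) <- alist(l1 = , l2 = , l3 = )

names(formals(f))   # "l1" "l2" "l3"
f(1, 2, "a")        # "a_3"
```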
A bit of additional functionality I would like to add but am still trying to work out:
Cleaning up the output to be a pretty nested list with the names of mvars (normally, I'd create multiple lists within a nested for loop and tag lower-level lists onto higher-level lists all the way up until all layers of the gigantic nested loop were done). I think using some abstracted variant of the solution provided here will work, but I haven't figured out how to make the solution flexible to the number of columns in the expand.grid-ed data.frame.
I would like an option to log the outputs of the child processes that get called in mcmapply in a user-specified directory, so you could look at .txt outputs from every combination of variables generated by expand.grid (e.g. if the user prints model summaries or status messages as a part of FUN, as I often do). I think a feasible solution is to use the substitute() and body() functions, described here, to edit FUN to open a sink() at the beginning of FUN and close it at the end if the user specifies a directory to write to. Right now, I just program it right into FUN itself, but later it would be nice to just pass gmcmapply an argument called something like log_children = "path_to_log_dir" and then edit the body of the function to (pseudocode) sink(file = file.path(log_children, paste0(paste(names(mvars), sep = "_"), ".txt")))
Let me know what you think!
-Nate

Writing a function for initializing parameters in R/Splus

I'd like to write a function that will create and return a set of parameters to be used in a function mySimulation I've created. Until now, I've basically been doing, e.g., mySimulation(parm1 = 3, parm2 = 4). This is now suboptimal because (1) in the actual version, the number of parameters is becoming unwieldy and (2) I'd like to keep track of different combinations of the parameters that produce the different models I'm using. So, I wrote createParms (a minimally sufficient version shown below) to do the trick. My whole approach just seems so clunky though. With all the statisticians using R, I'm sure there's a more standard way of handling my issue...right?
createParms <- function(model = "default", ...) {
# Returns a list `parms` of parameters which will then be used in
# mySimulation(parms)
#
# Args:
# model: ["default" | "mymodel"] character string representation of a model
# with known parameters
# ...: parameters of the existing `model` to overwrite.
# if nothing is supplied then the model parameters will be left as is.
# passed variables must be named.
# e.g., `parm1 = 10, parm2 = 20` is good. `10, 20` is bad.
#
# Returns:
# parms: a list of parameters to be used in mySimulation(parms)
#
parms.names <- c("parm1", "parm2")
parms <- vector(mode = "list", length = length(parms.names))
names(parms) <- parms.names
overwrite <- list(...)
overwrite.names <- names(overwrite)
if (model == "default") {
parms$parm1 <- 0
parms$parm2 <- 0
} else if (model == "mymodel") {
parms$parm1 <- 1
parms$parm2 <- 2
}
if (length(overwrite) != 0) {
parms[overwrite.names] <- overwrite
}
return(parms)
}
I think if you know the combination of parameters to be used for each model, then it is better to create a data frame of model names and parameters as shown below
# create a data frame with model names and parameters
# NOTE: i am assuming all models have equal number of parameters
# if they are unequal, then store as list of models
model = c('default', 'mymodel');
parm1 = c(0.5, 0.75);
parm2 = c(1, 2);
models.df = data.frame(model, parm1, parm2)
You can now simulate any of the models by passing it as an argument to your mySimulation function. I have used a dummy simulation example, which you can replace with your code.
# function to run simulation based on model name
mySimulation = function(model = 'default'){
# find row corresponding to model of interest
mod.row = match(model, models.df$model)
# extract parameters corresponding to model
parms = models.df[mod.row, -1]
# run dummy simulation of drawing normal random variables
sim.df = rnorm(100, mean = parms[,1], sd = parms[,2])
return(sim.df)
}
If you now want to run all your simulations in one step, you can use the excellent plyr package and invoke
library(plyr)
sim.all = ldply(models.df$model, mySimulation)
If each of your simulations returns unequal number of values then you can use the function llply instead of ldply.
If you provide more information about the return values of your simulation and details on what it does, this code can be easily tweaked to get what you want.
Let me know if this works
If the simulation function always takes the same set of arguments, then Ramnath's approach of storing them in a data frame is best. For the more general case of variable inputs to mySimulation, you should store each set of inputs in a list, probably using a list of lists for running several simulations.
The idea behind your createParms function looks sound; you can simplify the code a little bit.
createParms <- function(model = "default", ...)
{
#default case
parms <- list(
parm1 = 0,
parm2 = 0
)
#other special cases
if(model == "mymodel")
{
parms <- within(parms,
{
parm1 <- 1
parm2 <- 2
})
}
#overwrite from ...
dots <- list(...)
parms[names(dots)] <- dots
parms
}
Test this with, e.g.,
createParms()
createParms("mymodel")
createParms("mymodel", parm2 = 3)
do.call may come in handy for running your simulation, as in
do.call(mySimulation, createParms())
EDIT: What do.call does for you
If you have parms <- createParms(), then
do.call(mySimulation, parms)
is the same as
with(parms, mySimulation(parm1, parm2))
The main advantage is that you don't need to spell out each parameter that you are passing into mySimulation (or to modify that function to accept the parameters in list form).
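Putting the list-of-lists idea together with do.call, a run over several parameter sets could be sketched as follows. mySimulation here is a hypothetical stand-in (the real one was never shown), createParms is a condensed version of the function above, and the scenario names are made up.

```r
# Condensed version of createParms from above.
createParms <- function(model = "default", ...) {
  parms <- list(parm1 = 0, parm2 = 0)
  if (model == "mymodel") {
    parms <- list(parm1 = 1, parm2 = 2)
  }
  dots <- list(...)
  parms[names(dots)] <- dots   # overwrite from ...
  parms
}

# Hypothetical stand-in for the real simulation.
mySimulation <- function(parm1, parm2) parm1 + 10 * parm2

# A named list of parameter sets, one entry per scenario.
scenarios <- list(
  base = createParms(),
  alt  = createParms("mymodel", parm2 = 3)
)

results <- lapply(scenarios, function(p) do.call(mySimulation, p))
results$base   # 0
results$alt    # 31
```

Keeping the scenarios in a named list also records which parameter combination produced which result.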
