I've come across an interesting problem. I have a function of three variables; for simplicity and transparency, let's say it is this:
my_fun <- function(a, b, c) paste(a, b, c, sep = '-')
I want to create multiple functions that take only the argument c, one for each of several combinations of a and b. I am using the functions map2 and partial (both from the purrr package).
require(purrr)
funs <- map2(letters[1:5], LETTERS[1:5], partial, ...f = my_fun)
I would expect each function in the list to produce different output, but that is not the case.
funs[[1]]('hi') # [1] "e-E-hi"
funs[[3]]('hi') # [1] "e-E-hi"
funs[[5]]('hi') # [1] "e-E-hi"
I am able to come up with a different solution to my problem, so my question isn't "how to do it". I am rather interested in why it does this.
Another example using base mapply:
mapply(partial, letters[1:5], LETTERS[1:5], MoreArgs = list(...f = my_fun))[[1]]('hi')
# [1] "e-E-hi"
The problem stems from the fact that partial uses lazy evaluation, which within map2 means that it is storing .x and .y instead of a and A. Luckily there is a function argument for that, and we can use:
funs <- map2(letters[1:5], LETTERS[1:5], partial, ...f = my_fun, .lazy = FALSE)
funs[[1]]('hi')
# [1] "a-A-hi"
If you look at your version, we see this:
funs[[1]]
# function (...)
# my_fun(.x[[i]], .y[[i]], ...)
# <environment: 0x00000000201d9598>
And the same for each one of the other 4.
Now, if we look into that environment, we can see:
ls(envir = environment(funs[[1]]))
# [1] "i"
So there is an object i stored in there that determines which .x and .y we get, and its value is:
get('i', environment(funs[[1]]))
# [1] 5
Also note that your arguments are stored there as well, but are hidden due to their starting with a .:
ls(envir = environment(funs[[1]]), all.names = TRUE)
# [1] "..." ".f" ".x" ".y" "i"
get('.x', envir = environment(funs[[1]]))
# [1] "a" "b" "c" "d" "e"
So for all of these, we get the same result. Specifically, the executed call ends up being:
my_fun(letters[1:5][[5]], LETTERS[1:5][[5]], 'hi')
The lazy evaluation is not playing nice here: every function ends up using the loop counter stored inside map2, which has already reached its final value by the time any of the functions is called.
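For comparison, here is a minimal base-R sketch of the same idea without purrr. The helper name make_partial is hypothetical (it is not part of purrr or of the question); it fixes a and b by forcing them when each function is built, so nothing depends on a loop counter later on.

```r
my_fun <- function(a, b, c) paste(a, b, c, sep = "-")

# Hypothetical helper: returns a one-argument function with a and b fixed.
make_partial <- function(a, b) {
  force(a)  # evaluate the promises now, so the current values are captured
  force(b)
  function(c) my_fun(a, b, c)
}

# Map() calls make_partial once per (a, b) pair and returns a list of closures.
funs <- Map(make_partial, letters[1:5], LETTERS[1:5])

funs[[1]]("hi")  # "a-A-hi"
funs[[5]]("hi")  # "e-E-hi"
```

Each closure has its own environment holding its own a and b, which is the property the lazily built versions above were missing.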
I have been studying the purrr family of functions recently, and while reading the documentation of map_if I came across an alternative form for the .p argument (the predicate function) that I could not understand. It says:
"Alternatively, if the elements of .x are themselves lists of objects,
a string indicating the name of a logical element in the inner lists"
I was wondering if someone could tell me what it means and how I can go about using it while I deal with a list whose elements are also lists. Something like this:
x <- list(a = list(foo = 1:2, bar = 3:4), b = list(baz = 5:6))
A simple example would be much appreciated as I've done some research and could not find any indication of it.
Thank you very much in advance.
I am not fully sure what exactly you want to understand, but for a list of lists, note that only map_if is available; there is no pmap_if. Let's take a slightly different list of lists from the one you suggested.
x <- list(a = list(foo = 1:2, bar = 3:4), b = list(baz = 5:6), c = list(bird = 7:10))
Now map_if applies .f wherever .p is TRUE. So if we want to take the mean of every odd-indexed list in x, we have to use a nested map:
map_if(x, as.logical(seq_along(x) %% 2) , ~map(.x, ~mean(.x)))
$a
$a$foo
[1] 1.5
$a$bar
[1] 3.5
$b
$b$baz
[1] 5 6
$c
$c$bird
[1] 8.5
We may also use other logical conditions in .p. The example below produces the same output.
map_if(x, names(x) %in% c("a", "c") , ~map(.x, ~mean(.x)))
Or, say x is named like this:
x <- list(val1 = list(foo = 1:2, bar = 3:4), ind1 = list(baz = 5:6), val2 = list(bird = 7:10))
then the syntax below (str_detect is from the stringr package) will produce similar results:
map_if(x, str_detect(names(x), "val") , ~map(.x, ~mean(.x)))
I hope this is close to what you wanted to understand.
P.S. You can give it a read too.
It seems to refer to an inner variable name with a TRUE/FALSE value. Here is the basic example I created to test it.
Create a list where the inner list has boolean values for one variable:
A <- list(foo=list(x=1, y=TRUE), bar=list(x=2, y=FALSE))
Reference the boolean variable (y) as the .p predicate by passing a string with the variable name:
map_if(A, "y", as.character)
$foo
[1] "1" "TRUE"
$bar
$bar$x
[1] 2
$bar$y
[1] FALSE
So it only modified the foo element, since its y was TRUE; bar wasn't altered, since its y was FALSE.
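To make the behaviour concrete, here is a rough base-R imitation of what the string predicate appears to do. This is only a sketch of the observed behaviour, not purrr's actual implementation; the variable names selected and result are made up for illustration.

```r
A <- list(foo = list(x = 1, y = TRUE), bar = list(x = 2, y = FALSE))

# Look up the element named "y" in each inner list and use it as the predicate.
selected <- vapply(A, function(el) isTRUE(el[["y"]]), logical(1))

# Apply the function only to the selected elements; leave the rest untouched.
result <- A
result[selected] <- lapply(A[selected], as.character)

result$foo  # "1" "TRUE"
result$bar  # unchanged inner list
```

In other words, the string "y" stands in for a predicate along the lines of function(el) el[["y"]].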
As applied to the same R code or objects, quote and substitute typically return different objects. How can one make this difference apparent?
is.identical <- function(X){
out <- identical(quote(X), substitute(X))
out
}
> tmc <- function(X){
out <- list(typ = typeof(X), mod = mode(X), cls = class(X))
out
}
> df1 <- data.frame(a = 1, b = 2)
Here the printed output of quote and substitute are the same.
> quote(df1)
df1
> substitute(df1)
df1
And the structure of the two are the same.
> str(quote(df1))
symbol df1
> str(substitute(df1))
symbol df1
And the type, mode and class are all the same.
> tmc(quote(df1))
$typ
[1] "symbol"
$mod
[1] "name"
$cls
[1] "name"
> tmc(substitute(df1))
$typ
[1] "symbol"
$mod
[1] "name"
$cls
[1] "name"
And yet, the outputs are not the same.
> is.identical(df1)
[1] FALSE
Note that this question shows some inputs that cause the two functions to display different outputs. However, the outputs are different even when they appear the same, and are the same by most of the usual tests, as shown by the output of is.identical() above. What is this invisible difference, and how can I make it appear?
note on the tags: I am guessing that the Common LISP quote and the R quote are similar
The reason is that the behavior of substitute() is different based on where you call it, or more precisely, what you are calling it on.
Understanding what will happen requires a very careful parsing of the (subtle) documentation for substitute(), specifically:
Substitution takes place by examining each component of the parse tree
as follows: If it is not a bound symbol in env, it is unchanged. If it
is a promise object, i.e., a formal argument to a function or
explicitly created using delayedAssign(), the expression slot of the
promise replaces the symbol. If it is an ordinary variable, its value
is substituted, unless env is .GlobalEnv in which case the symbol is
left unchanged.
So there are essentially three options.
In this case:
> df1 <- data.frame(a = 1, b = 2)
> identical(quote(df1),substitute(df1))
[1] TRUE
df1 is an "ordinary variable", but substitute is called in .GlobalEnv, since the env argument defaults to the current evaluation environment. Hence we're in the very last case, where the symbol df1 is left unchanged, and so it is identical to the result of quote(df1).
In the context of the function:
is.identical <- function(X){
out <- identical(quote(X), substitute(X))
out
}
The important distinction is that now we're calling these functions on X, not df1. For most R users, this is a silly, trivial distinction, but when playing with subtle tools like substitute it becomes important. X is a formal argument of a function, so that implies we're in a different case of the documented behavior.
Specifically, it says that now "the expression slot of the promise replaces the symbol". We can see what this means if we debug() the function and examine the objects in the context of the function environment:
> debugonce(is.identical)
> is.identical(X = df1)
debugging in: is.identical(X = df1)
debug at #1: {
out <- identical(quote(X), substitute(X))
out
}
Browse[2]>
debug at #2: out <- identical(quote(X), substitute(X))
Browse[2]> str(quote(X))
symbol X
Browse[2]> str(substitute(X))
symbol df1
Browse[2]> Q
Now we can see that what happened is precisely what the documentation said would happen (Ha! So obvious! ;) )
X is a formal argument, or a promise, which according to R is not the same thing as df1. For most people writing functions, they are effectively the same, but the internal implementation disagrees. X is a promise object, and substitute replaces the symbol X with the one that it "points to", namely df1. This is what the docs mean by the "expression slot of the promise"; it is what R sees in the X = df1 part of the function call.
To round things out, try to guess what will happen in this case:
is.identical <- function(X){
out <- identical(quote(A), substitute(A))
out
}
is.identical(X = df1)
(Hint: now A is not a "bound symbol in the environment".)
A final example illustrating more directly the final case in the docs with the confusing exception:
#Ordinary variable, but in .GlobalEnv
> a <- 2
> substitute(a)
a
#Ordinary variable, but NOT in .GlobalEnv
> e <- new.env()
> e$a <- 2
> substitute(a,env = e)
[1] 2
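The three documented cases can also be seen side by side inside a single function. The following is a small sketch; the function name three_cases and the variable local_var are invented for illustration.

```r
three_cases <- function(X) {
  local_var <- 10
  list(
    promise  = substitute(X),          # formal argument: the promise's expression
    unbound  = substitute(A),          # not bound in this environment: unchanged
    ordinary = substitute(local_var)   # ordinary local variable: its value
  )
}

# substitute() does not force the promise, so df1 need not exist here.
res <- three_cases(df1 + 1)

res$promise   # the call df1 + 1
res$unbound   # the symbol A
res$ordinary  # the value 10
```

Comparing each element with identical() against quote(df1 + 1), quote(A) and 10 makes the otherwise invisible difference explicit.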
Let's say I have a function named Fun1 within which I use many different built-in R functions for various tasks. How can I get a list of the built-in functions used inside Fun1?
Fun1 <- function(x,y){
sum(x,y)
mean(x,y)
c(x,y)
print(x)
print(y)
}
So my output should be a character vector, i.e. sum, mean, c, print, because these are the built-in functions I have used inside Fun1.
I have tried using grep function
grep("\\(",body(Fun1),value=TRUE)
# [1] "sum(x, y)" "mean(x, y)" "c(x, y)" "print(x)" "print(y)"
It looks OK, but the arguments (x and y) should not appear; I want just the names of the functions used inside the body of Fun1.
So my overall goal is to print the unique list of built-in functions, or any user-created functions, used inside a particular function, here Fun1.
Any help on this is highly appreciated. Thanks.
You could use all.vars() to get all the variable names (including functions) that appear inside the body of Fun1, then compare that with some prepared list of functions. You mention in-built functions, so I will compare it with the base package object names.
## full list of variable names inside the function body
(vars <- all.vars(body(Fun1)[-1], functions = TRUE))
# [1] "sum" "x" "y" "mean" "c" "print"
## compare it with the base package object names
intersect(vars, ls(baseenv()))
# [1] "sum" "mean" "c" "print"
I removed the first element of the function body because presumably you don't care about {, which would have been matched against the base package list.
Another possibility, albeit a bit less reliable, would be to compare the formal arguments of Fun1 to all the variable names in the function. Like I said, likely less reliable though because if you make assignments inside the function you will end up with incorrect results.
setdiff(vars, names(formals(Fun1)))
# [1] "sum" "mean" "c" "print"
These are fun though, and you can fiddle around with them.
Access to the parser tokens is available with functions from utils.
tokens <- utils::getParseData(parse(text = deparse(body(Fun1)), keep.source = TRUE)) # keep.source is needed outside interactive sessions
unique(tokens[tokens[["token"]] == "SYMBOL_FUNCTION_CALL", "text"])
[1] "sum" "mean" "c" "print"
This should be somewhat helpful - this will return all functions however.
library(magrittr) # provides the %>% pipe
func_list = Fun1 %>%
body() %>% # extracts the function body
toString() %>% # converts to a single string
gsub("[{}]", "", .) %>% # removes curly braces
gsub("\\s*\\([^\\)]+\\)", "", .) %>% # removes everything between parentheses
strsplit(",") %>% # splits the string at commas
unlist() %>% # converts to a vector
trimws(., "both") # removes leading and trailing whitespace
[1] "" "sum" "mean" "c" "print" "print"
> table(func_list)
func_list
          c  mean print   sum
    1     1     1     2     1
This is extremely limited to your example... you could modify this to be more robust. It will fall over where a function has brackets nesting other functions etc.
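One way around the nesting problem is to walk the parsed expression instead of its text. Below is a minimal sketch under that assumption; find_calls is a hypothetical helper, and it deliberately ignores edge cases such as empty arguments (x[1, ]) or calls hidden in default arguments of nested function definitions.

```r
Fun1 <- function(x, y) {
  sum(x, y)
  mean(x, y)
  c(x, y)
  print(x)
  print(y)
}

# Hypothetical helper: recursively collect the names of all called functions.
find_calls <- function(expr) {
  if (!is.call(expr)) return(character(0))
  head <- expr[[1]]
  # The head is usually a symbol; for something like pkg::f(x) it is itself a call.
  fn <- if (is.symbol(head)) as.character(head) else find_calls(head)
  args <- unlist(lapply(as.list(expr)[-1], find_calls), use.names = FALSE)
  unique(c(fn, args))
}

# Drop "{" because it is the block itself rather than a function of interest.
used <- setdiff(find_calls(body(Fun1)), "{")
used  # "sum" "mean" "c" "print"
```

Because the recursion follows the call tree, nested calls such as sum(mean(x), y) are picked up as well.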
This is not so beautiful, but it works:
Fun1 <- function(x,y){
sum(x,y)
mean(x,y)
c(x,y)
print(x)
print(y)
}
getFNamesInFunction <- function(f.name){
f <- deparse(body(get(f.name)))
f <- f[grepl(pattern = "\\(", x = f)]
f <- sapply(X = strsplit(split = "\\(", x = f), FUN = function(x) x[1])
unique(trimws(f[f != ""]))
}
getFNamesInFunction("Fun1")
[1] "sum" "mean" "c" "print"
as.list(Fun1)[3]
gives you the part of the function between the curly braces.
{
sum(x, y)
mean(x, y)
c(x, y)
print(x)
print(y)
}
Hence
gsub("\\(.*$", "", as.list(Fun1)[3])
gives you everything before the first "(", which ends with what is presumably the name of the first function.
Taking this as a starting point, you should be able to add a loop that gives you the other functions and not only the first one.
In R, I have a list of functions (strategies for a simulation). For example:
a <- function(x){
return(x)
}
b <- function(y){
return(y)
}
funclist <- list(a,b)
I'd like to write some code that returns the name of each function. Normally, for functions I would use:
as.character(substitute(a))
But this does not work for the list, as it just would return the list name (as expected). I then tried lapply:
> lapply(X = funclist,FUN = substitute)
Error in lapply(X = funclist, FUN = substitute) :
'...' used in an incorrect context
But get the above error.
Ideally I would get (lapply solution):
[[1]]
[1] "a"
[[2]]
[1] "b"
or even (sapply solution):
[1] "a" "b"
After you do
funclist <- list(a,b)
The symbols a and b are evaluated and the functions they point to are stored in the list. There is no way to get back to the original names. (The substitute() "trick" works on parameters passed to functions as promises. It will not work on values that have already been evaluated, at least not without additional escaping.)
If you want to retain names, it's best to use a named list. You can do
funclist <- list(a=a,b=b)
or
funclist <- setNames(list(a,b), c("a","b"))
or even use mget() here
funclist <- mget(c("a","b"))
All these methods will return a named list, and you can use
names(funclist)
# [1] "a" "b"
to get the names
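If typing each name twice is a nuisance, the names can be captured at the moment the list is built, while the arguments are still unevaluated. This is only a sketch; named_list is a hypothetical helper, not a base R function.

```r
a <- function(x) x
b <- function(y) y

# Hypothetical helper: builds a list and names each element after the
# expression that was supplied for it.
named_list <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]   # unevaluated arguments
  nms <- vapply(exprs, deparse, character(1))   # their text, e.g. "a", "b"
  setNames(list(...), nms)
}

funclist <- named_list(a, b)
names(funclist)  # "a" "b"
```

This works because substitute() sees the arguments before they are evaluated, which is exactly the information that list(a, b) throws away.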
I have a series of objects storing the results of some statistical models in my workspace. Call them "model1", "model2", etc. Each of these models has the same set of named elements attached, $coef, for example. I would like to extract into a list or vector the values stored in a particular element from all objects containing the string "model".
The following code entered at the command line does what I want:
unlist(lapply(parse(text = paste0(ls()[grep("model", ls() )], "$", "coef")), eval))
From this, I've created the following generic function:
get.elements <- function(object, element) {
unlist(lapply(parse(text = paste0(ls()[grep(object, ls() )], "$", element)), eval))
}
However, when I run this function I get the following error:
Error in parse(text = paste0(ls()[grep(object, ls() )], "$", element)) :
<text>:1:1: unexpected '$'
1: $
^
Q1. Why does this code work when run from the command line but not as a function, and more importantly, how do I fix it?
Q2. Even better, is there a simpler method that will accomplish the same thing? This seems like such a common task for statisticians and simulation modelers that I would expect some kind of command in the base package, yet I've been unable to find anything. Surely there must be a more elegant way to do this than my cumbersome method.
Thanks for all help.
--Dave
Q1) The code fails because ls() looks in the environment of the function and since there are no matching objects there,
paste0(ls()[grep(object, ls() )], "$", element)
is equivalent to
paste0("$", element)
To get ls() to look in your workspace, you'd need ls(pos = 1).
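The scoping issue is easy to check directly. In the sketch below, show_scope is a made-up function with the same arguments as get.elements; it simply returns what ls() sees from inside a function.

```r
# Hypothetical function with the same arguments as get.elements.
show_scope <- function(object, element) {
  ls()  # lists the function's own environment, not the workspace
}

visible <- show_scope("model", "coef")
visible  # "element" "object"

# Nothing matches "model", so the pasted text collapses to just "$coef".
pasted <- paste0(visible[grep("model", visible)], "$", "coef")
pasted   # "$coef"
```

Parsing "$coef" is what produces the "unexpected '$'" error shown in the question.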
Q2) This is a common task, but as far as I know there isn't a function to do this, because where the models are, what they are called, what objects you want to extract and how you want them returned will depend on your requirements. A slightly neater version of what you propose above would be
nm <- paste0("model", 1:2) # adjust numbers as required
unlist(lapply(nm, function(x) get(x)$coef))
Alternatively you could put your models in a list and use
modList <- list(model1, model2)
unlist(lapply(modList, "[[", "coefficients"))
You can use map from the purrr package.
l1 <- list(a1 = 3, a2 = 2)
l2 <- list(a1 = 334, a2 = 34)
l3 <- list(a1 = 90, a2 = 112)
l <- list(l1, l2, l3)
purrr::map(l, "a1")
this gives:
### Result
[[1]]
[1] 3
[[2]]
[1] 334
[[3]]
[1] 90
This is a way to obtain the desired output:
get.elements <- function(object, element) {
unlist(lapply(ls(pattern = object, .GlobalEnv),
function(x) get(x, .GlobalEnv)[[element, exact = FALSE]]))
}
Both element and object are character strings.
Note that I used the argument exact = FALSE, since the element is named coefficients, not coef. In this way, you can still use element = "coef".
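A variation on the same idea that makes the search environment an explicit argument is sketched below. The name get_elements, its envir parameter, and the toy model objects are all assumptions made for illustration; mget() fetches every matching object in one call and keeps the object names on the result.

```r
# Hypothetical variant: collect one element from every object whose name
# matches a pattern, searching in a given environment.
get_elements <- function(pattern, element, envir = globalenv()) {
  objs <- mget(ls(pattern = pattern, envir = envir), envir = envir)
  lapply(objs, function(obj) obj[[element]])
}

# Toy stand-ins for fitted models, kept in their own environment.
demo_env <- new.env()
demo_env$model1 <- list(coefficients = c(x = 1.5))
demo_env$model2 <- list(coefficients = c(x = 2.5))
demo_env$other  <- list(coefficients = c(x = 99))

res <- get_elements("^model", "coefficients", envir = demo_env)
names(res)  # "model1" "model2"
```

Returning a named list rather than calling unlist() keeps track of which value came from which model; wrap the result in unlist() if a flat vector is preferred.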