cannot combine with and functions in R - r

Could somebody please point out to me why is that the following example does not work:
df <- data.frame(ex =rep(1,5))
sample.fn <- function(var1) with(df, mean(var1))
sample.fn(ex)
It seems that I am using the wrong syntax to combine with inside of a function.
Thanks,

This is what I meant by learning to use "[" (actually"[["):
> df <- data.frame(ex =rep(1,5))
> sample.fn <- function(var1) mean(df[[var1]])
> sample.fn('ex')
[1] 1
You cannot use an unquoted ex since there is no object named 'ex', at least not at the global environment where you are making the call to sample.fn. 'ex' is a name only inside the environment of the df-dataframe and only df itself is "visible" when the sample.fn-function is called.
Out of interest, I tried using the method that the with.default function uses to build a function taking an unquoted expression argument in the manner you were expecting:
samp.fn <- function(expr) mean(
eval(substitute(expr), df, enclos = parent.frame())
)
samp.fn(ex)
#[1] 1
It's not a very useful function, since it would only be applicable when there was a dataframe named 'df' in the parent.frame(). And apologies for incorrectly claiming that there was a warning on the help page. As #rawr points out the warning about using functions that depend on non-standard evaluation appears on the subset page.

Related

How do you call a function that takes no inputs within a pipeline?

I tried searching for this but couldn't find any similar questions. Let's say, for the sake of a simple example, I want to do the following using dplyr's pipe %>%.
c(1,3,5) %>% ls() %>% mean()
Setting aside what the use case would be for a pipeline like this, how can I call a function "mid-pipeline" that doesn't need any inputs coming from the left-hand side and just pass them along to the next function in the pipeline? Basically, I want to put an "intermission" or "interruption" of sorts into my pipeline, let that function do its thing, and then continue on my merry way. Obviously, the above doesn't actually work, and I know the T pipe %T>% also won't be of use here because it still expects the middle function to need inputs coming from the lhs. Are there options here shy of assigning intermediate objects and restarting the pipeline?
With the ‘magrittr’ pipe operator you can put an operand inside {…} to prevent automatic argument substitution:
c(1,3,5) %>% {ls()} %>% mean()
# NA
# Warning message:
# In mean.default(.) : argument is not numeric or logical: returning NA
… but of course this serves no useful purpose.
Incidentally, ls() inside a pipeline is executed in its own environment rather than the calling environment so its use here is even less useful. But a different function that returned a sensible value could be used, e.g.:
c(1,3,5) %>% {rnorm(10)} %>% mean()
# [1] -0.01068046
Or, if you intended for the left-hand side to be passed on, skipping the intermediate ls(), you could do the following:
c(1,3,5) %>% {ls(); .} %>% mean()
# [1] 3
… again, using ls() here won’t be meaningful but some other function that has a side-effect would work.
You could define an auxilliary function like this, that takes an argument it doesn't use in order to allow it to fit in the pipe:
ls_return_x <- function(x){
print(ls())
x
}
c(1,3,5) %>% ls() %>% mean()
Note, the ls() call in this example will print the objects in the environment within the ls_return_x() function. Check out the help page for ls() if you want to print the environment from the global environment.
I don't know if there is an inbuilt function but you could certainly create a helper function for this
> callfun <- function(x, fun){fun(); return(x)}
> c(1, 3, 5) %>% callfun(fun = ls) %>% mean()
# [1] 3
I don't really see the point but hey - it's your life.

dplyr, rlang: Unable to predict if minor varients of passing names to nested dplyr functions will work

Data for reproducibility
.i <- tibble(a=2*1:4+1, b=2*1:4)
This function is supposed to take its data and other arguments as unquoted names, find those names in the data, and use them to add a column and filter out the
top row. It does not work. Mutate says it can not find a.
t1 <- function(.j=.i, X=a, Y=b){
e_X <- enquo(X)
e_Y <- enquo(Y)
mutate(.data=.j, pass=UQ(e_X)+1) %>%
filter(UQ(e_Y) > 3) -> out
out
}
t1(a,b)
This function, which I found by typo -- note the .i instead of .j in the mutate statement -- does what the previous function was supposed to do. And I don't know why. I think it is skipping over the function arguments and finding .i in the global environment. Or maybe it is using a ouiji board.
t2 <- function(.j=.i, X=a, Y=b){
e_X <- enquo(X)
e_Y <- enquo(Y)
mutate(.data=.i, pass=UQ(e_X)+1) %>%
filter(UQ(e_Y) > 3) -> out
out
}
t1(a,b)
Since mutate could not find .j when passed to it in the usual R way, maybe it needs to be passed in an rlang-style quosure, like the formals X and Y. This function also does not work, with UQ in mutate saying that it can not find a. Like the first function above, it works if the .j in mutate is replaced with a .i. (Seems like there should be an "enquos" to parallel quos).
t3 <- function(.j=.i, X=a, Y=b){
e_j <- enquo(.j)
e_X <- enquo(X)
e_Y <- enquo(Y)
mutate(.data=UQ(.j), pass=UQ(e_X)+1) %>%
filter(UQ(e_Y) > 3) -> out
out
}
t1(a,b)
Finally, it appears that, once the .i substitution in mutate is made, t4() no longer needs a data argument at all. See below, where I replace it with bop_foo_foo. If, however, you replace bop_foo_foo throughout with the name of the data, .i, (t5()) then UQ again fails to find a.
bop_foo_foo <- 0
t4 <- function(bop_foo_foo, X=a, Y=b){
e_j <- enquo(bop_foo_foo)
e_X <- enquo(X)
e_Y <- enquo(Y)
mutate(.data=UQ(.i), pass=UQ(e_X)+1) %>%
filter(UQ(e_Y) > 3) -> out
out
}
t1(a,b)
The functions above seem to me to be relatively minor variants on a single function. I have run dozens more, and although I have observed some patterns,
and read the enquo and UQ help files I do not know how many times, a real
understanding continues to elude me.
I would like to know why the functions above that that don't work don't, and why the ones that do work do. I don't necessarily need a function by function critique. If you can state general principles that embody the required, understanding, that would be delightful. And more than sufficient.
I think it is skipping over the function arguments and finding .i in the global environment.
Yes, scope of symbols in R is hierarchical. The variables local to a function are looked up first, and then the surrounding environment of the function is inspected, and so on.
mutate(.data = UQ(.j), ...)
I think you are missing the difference between regular arguments and (quasi)quoted arguments. Unquoting is only relevant for quasiquoted arguments. Since the .data argument of mutate() is not quasiquoted it does not make sense to try and unquote stuff. The quasiquoted arguments are the ones that are captured/quoted with enexpr() or enquo(). You can tell whether an argument is quasiquoted either by looking at the documentation or by recognising that the argument supports direct references to columns (regular arguments need to be explicit about where to find the columns).
In the next version of rlang, the exported UQ() function will throw an error to make it clear that it should not be called directly and that it can only be used in quasiquoted arguments.
I would suggest:
Call the first argument of your function data or df rather than .i.
Don't give it a default. The user should always supply the data.
Don't capture it with enquo() or enexpr() or substitute(). Instead pass it directly to the data argument of other verbs.
Once this is out of the way it will be easier to work out the rest.

Evaluating a function that is an argument in another function using quo() in R

I have made a function that takes as an argument another function, the argument function takes as its argument some object (in the example a vector) which is supplied by the original function. It has been challenging to make the function call in the right way. Below are three approaches I have used after having read Programming with dplyr.
Only Option three works,
I would like to know if this is in fact the best way to evaluate a function within a function.
library(dplyr);library(rlang)
#Function that will be passed as an argument
EvaluateThis1 <- quo(mean(vector))
EvaluateThis2 <- ~mean(vector)
EvaluateThis3 <- quo(mean)
#First function that will recieve a function as an argument
MyFunc <- function(vector, TheFunction){
print(TheFunction)
eval_tidy(TheFunction)
}
#Second function that will recieve a function as an argument
MyFunc2 <- function(vector, TheFunction){
print(TheFunction)
quo(UQ(TheFunction)(vector)) %>%
eval_tidy
}
#Option 1
#This is evaluating vector in the global environment where
#EvaluateThis1 was captured
MyFunc(1:4, EvaluateThis1)
#Option 2
#I don't know what is going on here
MyFunc(1:4, EvaluateThis2)
MyFunc2(1:4, EvaluateThis2)
#Option 3
#I think this Unquotes the function splices in the argument then
#requotes before evaluating.
MyFunc2(1:4, EvaluateThis3)
My question is:
Is option 3 the best/most simple way to perform this evaluation
An explanation of what is happening
Edit
After reading #Rui Barradas very clear and concise answer I realised that I am actually trying to do someting similar to below which I didn't manage to make work using Rui's method but solved using environment setting
OtherStuff <-c(10, NA)
EvaluateThis4 <-quo(mean(c(vector,OtherStuff), na.rm = TRUE))
MyFunc3 <- function(vector, TheFunction){
#uses the captire environment which doesn't contain the object vector
print(get_env(TheFunction))
#Reset the enivronment of TheFunction to the current environment where vector exists
TheFunction<- set_env(TheFunction, get_env())
print(get_env(TheFunction))
print(TheFunction)
TheFunction %>%
eval_tidy
}
MyFunc3(1:4, EvaluateThis4)
The function is evaluated within the current environment not the capture environment. Because there is no object "OtherStuff" within that environment, the parent environments are searched finding "OtherStuff" in the Global environment.
I will try to answer to question 1.
I believe that the best and simpler way to perform this kind of evaluation is to do without any sort of fancy evaluation techniques. To call the function directly usually works. Using your example, try the following.
EvaluateThis4 <- mean # simple
MyFunc4 <- function(vector, TheFunction){
print(TheFunction)
TheFunction(vector) # just call it with the appropriate argument(s)
}
MyFunc4(1:4, EvaluateThis4)
function (x, ...)
UseMethod("mean")
<bytecode: 0x000000000489efb0>
<environment: namespace:base>
[1] 2.5
There are examples of this in base R. For instance approxfun and ecdf both return functions that you can use directly in your code to perform subsequent calculations. That's why I've defined EvaluateThis4 like that.
As for functions that use functions as arguments, there are the optimization ones, and, of course, *apply, byand ave.
As for question 2, I must admit to my complete ignorance.

'=' vs. '<-' as a function argument in R

I am a beginner so I'd appreciate any thoughts, and I understand that this question might be too basic for some of you.
Also, this question is not about the difference between <- and =, but about the way they get evaluated when they are part of the function argument. I read this thread, Assignment operators in R: '=' and '<-' and several others, but I couldn't understand the difference.
Here's the first line of code:
My objective is to get rid of variables in the environment. From reading the above thread, I would believe that <- would exist in the user workspace, so there shouldn't be any issue with deleting all variables.
Here is my code and two questions:
Question 1
First off, this code doesn't work.
rm(ls()) #throws an error
I believe this happens because ls() returns a character vector, and rm() expects an object name. Am I correct? If so, I would appreciate if someone could guide me how to get object names from character array.
Question 2
I googled this topic and found that this code below deletes all variables.
rm(list = ls())
While this does help me, I am unsure why = is used instead of <-. If I run the following code, I get an error Error in rm(list <- ls()) : ... must contain names or character strings
rm(list <- ls())
Why is this? Can someone please guide me? I'd appreciate any help/guidance.
I read this thread, Assignment operators in R: '=' and '<-' and several others, but I couldn't understand the difference.
No wonder, since the answers there are actually quite confusing, and some are outright wrong. Since that’s the case, let’s first establish the difference between them before diving into your actual question (which, it turns out, is mostly unrelated):
<- is an assignment operator
In R, <- is an operator that performs assignment from right to left, in the current scope. That’s it.
= is either an assignment operator or a distinct syntactic token
=, by contrast, has several meanings: its semantics change depending on the syntactic context it is used in:
If = is used inside a parameter list, immediately to the right of a parameter name, then its meaning is: “associate the value on the right with the parameter name on the left”.
Otherwise (i.e. in all other situations), = is also an operator, and by default has the same meaning as <-: i.e. it performs assignment in the current scope.
As a consequence of this, the operators <- and = can be used interchangeably1. However, = has an additional syntactic role in an argument list of a function definition or a function call. In this context it’s not an operator and cannot be replaced by <-.
So all these statements are equivalent:
x <- 1
x = 1
x[5] <- 1
x[5] = 1
(x <- 1)
(x = 1)
f((x <- 5))
f((x = 5))
Note the extra parentheses in the last example: if we omitted these, then f(x = 5) would be interpreted as a parameter association rather than an assignment.
With that out of the way, let’s turn to your first question:
When calling rm(ls()), you are passing ls() to rm as the ... parameter. Ronak’s answer explains this in more detail.
Your second question should be answered by my explanation above: <- and = behave differently in this context because the syntactic usage dictates that rm(list = ls()) associates ls() with the named parameter list, whereas <- is (as always) an assignment operator. The result of that assignment is then once again passed as the ... parameter.
1 Unless somebody changed their meaning: operators, like all other functions in R, can be overwritten with new definitions.
To expand on my comment slightly, consider this example:
> foo <- function(a,b) b+1
> foo(1,b <- 2) # Works
[1] 3
> ls()
[1] "b" "foo"
> foo(b <- 3) # Doesn't work
Error in foo(b <- 3) : argument "b" is missing, with no default
The ... argument has some special stuff going on that restricts things a little further in the OP's case, but this illustrates the issue with how R is parsing the function arguments.
Specifically, when R looks for named arguments, it looks specifically for arg = val, with an equals sign. Otherwise, it is parsing the arguments positionally. So when you omit the first argument, a, and just do b <- 1, it thinks the expression b <- 1 is what you are passing for the argument a.
If you check ?rm
rm(..., list = character(),pos = -1,envir = as.environment(pos), inherits = FALSE)
where ,
... - the objects to be removed, as names (unquoted) or character strings (quoted).
and
list - a character vector naming objects to be removed.
So, if you do
a <- 5
and then
rm(a)
it will remove the a from the global environment.
Further , if there are multiple objects you want to remove,
a <- 5
b <- 10
rm(a, b)
This can also be written as
rm(... = a, b)
where we are specifying that the ... part in syntax takes the arguments a and b
Similarly, when we want to specify the list part of the syntax, it has to be given by
rm(list = ls())
doing list <- ls() will store all the variables from ls() in the variable named list
list <- ls()
list
#[1] "a" "b" "list"
I hope this is helpful.

Why subset doesn't mind missing subset argument for dataframes?

Normally I wonder where mysterious errors come from but now my question is where a mysterious lack of error comes from.
Let
numbers <- c(1, 2, 3)
frame <- as.data.frame(numbers)
If I type
subset(numbers, )
(so I want to take some subset but forget to specify the subset-argument of the subset function) then R reminds me (as it should):
Error in subset.default(numbers, ) :
argument "subset" is missing, with no default
However when I type
subset(frame,)
(so the same thing with a data.frame instead of a vector), it doesn't give an error but instead just returns the (full) dataframe.
What is going on here? Why don't I get my well deserved error message?
tl;dr: The subset function calls different functions (has different methods) depending on the type of object it is fed. In the example above, subset(numbers, ) uses subset.default while subset(frame, ) uses subset.data.frame.
R has a couple of object-oriented systems built-in. The simplest and most common is called S3. This OO programming style implements what Wickham calls a "generic-function OO." Under this style of OO, an object called a generic function looks at the class of an object and then applies the proper method to the object. If no direct method exists, then there is always a default method available.
To get a better idea of how S3 works and the other OO systems work, you might check out the relevant portion of the Advanced R site. The procedure of finding the proper method for an object is referred to as method dispatch. You can read more about this in the help file ?UseMethod.
As noted in the Details section of ?subset, the subset function "is a generic function." This means that subset examines the class of the object in the first argument and then uses method dispatch to apply the appropriate method to the object.
The methods of a generic function are encoded as
< generic function name >.< class name >
and can be found using methods(<generic function name>). For subset, we get
methods(subset)
[1] subset.data.frame subset.default subset.matrix
see '?methods' for accessing help and source code
which indicates that if the object has a data.frame class, then subset calls the subset.data.frame the method (function). It is defined as below:
subset.data.frame
function (x, subset, select, drop = FALSE, ...)
{
r <- if (missing(subset))
rep_len(TRUE, nrow(x))
else {
e <- substitute(subset)
r <- eval(e, x, parent.frame())
if (!is.logical(r))
stop("'subset' must be logical")
r & !is.na(r)
}
vars <- if (missing(select))
TRUE
else {
nl <- as.list(seq_along(x))
names(nl) <- names(x)
eval(substitute(select), nl, parent.frame())
}
x[r, vars, drop = drop]
}
Note that if the subset argument is missing, the first lines
r <- if (missing(subset))
rep_len(TRUE, nrow(x))
produce a vector of TRUES of the same length as the data.frame, and the last line
x[r, vars, drop = drop]
feeds this vector into the row argument which means that if you did not include a subset argument, then the subset function will return all of the rows of the data.frame.
As we can see from the output of the methods call, subset does not have methods for atomic vectors. This means, as your error
Error in subset.default(numbers, )
that when you apply subset to a vector, R calls the subset.default method which is defined as
subset.default
function (x, subset, ...)
{
if (!is.logical(subset))
stop("'subset' must be logical")
x[subset & !is.na(subset)]
}
The subset.default function throws an error with stop when the subset argument is missing.

Resources