input variables in R functions from third party libraries - r

I am a reasonably proficient python programmer messing around with some R.
On this website, for the third party library ICC, I'm confused about input variables for the function ICCest.
Located here:
http://www.inside-r.org/packages/cran/ICC/docs/ICCest
I can use:
ICCest(Chick, weight, data=ChickWeight, CI.type="S")
And I got this to work. Chick and weight are column names for the data frame variable called ChickWeight. All is well and good.
But what type of variables are Chick and weight? They aren't in my R namespace, and they aren't strings because they don't have quotes around them.
Doing:
ICCest(Chick, "weight", data=ChickWeight, CI.type="S")
yields:
In ICCest(Chick, "weight", data = ChickWeight, CI.type = "S") :
passing a character string to 'y' is deprecated since ICC vesion 2.3.0 and will not be supported in future versions. The argument to 'y' should either be an unquoted column name of 'data' or an object
So again, in my nice friendly Python land you can't pass in unquoted character strings that are not objects in your namespace, so I am quite confused.
What is happening here?

You can take a look at the function's code by typing ICCest (without the parentheses):
> ICCest
Object with tracing code, class "functionWithTrace"
Original definition:
function (x, y, data = NULL, alpha = 0.05, CI.type = c("THD", "Smith")){
square <- function(z) {
z^2
}
icall <- list(y = substitute(y), x = substitute(x))
if (is.character(icall$y)) {
warning("passing a character string to 'y' is deprecated since ICC vesion 2.3.0 and will not be supported in future versions. The argument to 'y' should either be an unquoted column name of 'data' or an object")
if (missing(data))
stop("Supply either the unquoted name of the object containing 'y' or supply both 'data' and then 'y' as an unquoted column name to 'data'")
icall$y <- eval(as.name(y), data, parent.frame())
} ...
What happens after the square function block is that the inputs are captured in icall with substitute(), which returns the unevaluated expressions you typed (here, the symbols Chick and weight) rather than their values. You can think of these as small parse trees. So there's no error when you pass plain weight without quotation marks, because at this point R has not yet tried to evaluate those expressions. This works because R evaluates function arguments lazily: each argument is a promise that is only evaluated when its value is needed, and substitute() reads the promise's expression without forcing it.
Inside the if block (where your warning is raised), you can see that they use eval to update the local variable icall$y. eval evaluates an expression within an environment, and when that environment is a data frame, its column names are treated as variables bound to the columns.
The documentation says that eval takes an expression as its first input. This is why, inside the if block for a string input y, y is converted into a symbol (a name object) with as.name before being passed to eval, whose signature is:
eval(expr, envir = parent.frame(),...)
Expressions and strings are different things in R. So in the last line of code shown above, the y input (here, weight) is evaluated in the data environment, which here is ChickWeight.
To get a better feeling, try this:
> eval(weight, ChickWeight)
Error in eval(weight, ChickWeight) : object 'weight' not found
But if you make an unevaluated expression first, it will work:
> expr <- quote(weight)
> eval(expr, ChickWeight)
Here, quote is doing roughly the same thing as substitute in the icall <- list(...) line of the function. See the question on the difference between substitute and quote for more.
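To make the mechanism concrete, here is a minimal sketch of a helper that accepts a column either unquoted or as a string, in the same spirit as the icall code in ICCest. The name get_column is hypothetical and not part of the ICC package; it only uses base R's substitute(), as.name() and eval().

```r
# Hypothetical helper, not part of ICC: accept an unquoted column name
# or a character string, as ICCest() does for its 'y' argument.
get_column <- function(col, data) {
  expr <- substitute(col)            # unevaluated argument: the symbol `weight` or the string "weight"
  if (is.character(expr)) {
    expr <- as.name(expr)            # convert the string "weight" into the symbol `weight`
  }
  eval(expr, data, parent.frame())   # look the symbol up among the columns of `data` first
}

w1 <- get_column(weight, ChickWeight)    # unquoted column name
w2 <- get_column("weight", ChickWeight)  # character string
identical(w1, w2)
# [1] TRUE
```

Because data is searched before parent.frame(), a column always takes precedence over a global variable with the same name.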

Why are you passing your y as a quoted string? The function doesn't appear to require quoted strings for variable names. Doing
str(ChickWeight)
will give you the types for the variables. They aren't in a 'name space' because they are variable names in the data.frame ChickWeight.

Related

Dynamic scoping questions in R

I'm reading Hadley Wickham's Advanced R and am testing the following code from it:
subset2 = function(df, condition){
condition_call = eval(substitute(condition),df )
df[condition_call,]
}
df = data.frame(a = 1:10, b = 2:11)
condition = 3
subset2(df, a < condition)
Then I got the following error message:
Error in eval(substitute(condition), df) : object 'a' not found
I read the explanation as follows but don't quite understand:
If eval() can’t find the variable inside the data frame (its second argument), it looks in the environment of subset2(). That’s obviously not what we want, so we need some way to tell eval() where to look if it can’t find the variables in the data frame.
As I understand it, when eval(substitute(condition), df) runs, the variable that cannot be found should be condition, so why is it object "a" that cannot be found?
On the other hand, why does the following code not produce any error?
subset2 = function(df, condition){
condition_call = eval(substitute(condition),df )
df[condition_call,]
}
df = data.frame(a = 1:10, b = 2:11)
y = 3
subset2(df, a < y)
This more stripped down example may make it easier for you to see what's going on in Hadley's example. The first thing to note is that the symbol condition appears here in four different roles, each of which I've marked with a numbered comment.
## Role of symbol `condition`
f <- function(condition) { #1 -- formal argument
a <- 100
condition + a #2 -- symbol bound to formal argument
}
condition <- 3 #3 -- symbol in global environment
f(condition = condition + a) #4 -- supplied argument (on RHS)
## Error in f(condition = condition + a) (from #1) : object 'a' not found
The other important thing to understand is that symbols in supplied arguments (here the right hand side part of condition = condition + a at #4) are searched for in the evaluation frame of the calling function. From Section 4.3.3 Argument Evaluation of the R Language Definition:
One of the most important things to know about the evaluation of arguments to a function is that supplied arguments and default arguments are treated differently. The supplied arguments to a function are evaluated in the evaluation frame of the calling function. The default arguments to a function are evaluated in the evaluation frame of the function.
In the example above, f() is called from the top level, so the evaluation frame of the calling function is the global environment, .GlobalEnv.
Taking this step by step, here is what happens when you call f(condition = condition + a). During function evaluation, R comes across the expression condition + a in the function body (at #2). It searches for values of a and condition, and finds a locally assigned symbol a. It finds that the symbol condition is bound to the formal argument named condition (at #1). The value of that formal argument, supplied during the function call, is condition + a (at #4).
As noted in the R Language Definition, the values of the symbols in the expression condition + a are searched for in the environment of the calling function, here the global environment. Since the global environment contains a variable named condition (assigned at #3) but no variable named a, it is unable to evaluate the expression condition + a (at #4), and fails with the error that you see.
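The distinction quoted from the R Language Definition can be checked directly. In this sketch g() is a hypothetical function whose default argument refers to a variable that only exists inside g(), while the supplied argument is evaluated in the caller's frame.

```r
# Hypothetical function: the default for `condition` mentions `a`,
# which is created inside g()'s own evaluation frame.
g <- function(condition = a + 1) {
  a <- 100
  condition            # the promise is forced here, after `a` exists locally
}

a <- 1                 # a different `a` in the calling (global) environment

g()                    # default argument: evaluated in g()'s frame, uses a = 100
# [1] 101
g(condition = a + 1)   # supplied argument: evaluated in the caller's frame, uses a = 1
# [1] 2
```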
I want to add some details in case someone stumbles on this question. The problematic line is
condition_call = eval(substitute(condition),df )
Inside subset2(), condition is a promise object whose expression slot is "a < condition". substitute(condition) extracts that expression and returns it as a call object, "a < condition".
Then eval() starts to evaluate "a < condition" in the df environment. It needs to find both a and condition.
a is found in df successfully, and this is not where the error is generated.
Then R searches for condition in df and cannot find it.
So R goes up to the enclosing environment, which by default is the execution environment of subset2(), and finds condition there.
What it finds is actually the promise object mentioned before, with expression slot "a < condition".
To get a value, R has to force that promise, which evaluates "a < condition" in the environment where the promise was created: the global environment, where subset2() was called. That environment contains condition (3) but not a, because df is no longer being searched. This is the part that really generates the error.
To summarize the problem here:
R does find a in df the first time.
The error arises when R looks for condition: it finds the promise object condition in subset2()'s frame, instead of the 3 assigned outside, and tries to evaluate it.
Then R runs into the problem:
evaluating "a < condition" requires a, which does not exist in the global environment where the promise is evaluated.
For my second example, R cannot find y in df or in the execution environment of subset2(), so it continues to the global environment (which here is both where subset2() was defined and where it was called) and finds y there as 3. Because y has a different name from the promise object condition, R never tries to re-evaluate "a < y", and no error is generated.
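The way out that the quoted Advanced R passage is leading up to is to tell eval() where to look when a symbol is not in the data frame, by passing the caller's environment as its third (enclos) argument. A minimal sketch of that version, using the same df and condition as the question:

```r
subset2 <- function(df, condition) {
  condition_call <- substitute(condition)
  # Third argument (enclos): where to look for symbols not found in `df`.
  # parent.frame() is a supplied argument, so it is evaluated here inside
  # subset2() and returns the environment subset2() was called from.
  rows <- eval(condition_call, df, parent.frame())
  df[rows, ]
}

df <- data.frame(a = 1:10, b = 2:11)
condition <- 3
subset2(df, a < condition)
#   a b
# 1 1 2
# 2 2 3
```

Since subset2()'s own frame is skipped, the promise named condition is never found by the lookup, and condition resolves to the 3 in the caller's environment.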

What's the difference between substitute and quote in R

In the official docs, it says:
substitute returns the parse tree for the (unevaluated) expression
expr, substituting any variables bound in env.
quote simply returns its argument. The argument is not evaluated and
can be any R expression.
But when I try:
> x <- 1
> substitute(x)
x
> quote(x)
x
It looks like both quote and substitute return the expression that's passed to them as an argument.
So my question is, what's the difference between substitute and quote, and what does it mean to "substituting any variables bound in env"?
Here's an example that may help you to easily see the difference between quote() and substitute(), in one of the settings (processing function arguments) where substitute() is most commonly used:
f <- function(argX) {
list(quote(argX),
substitute(argX),
argX)
}
suppliedArgX <- 100
f(argX = suppliedArgX)
# [[1]]
# argX
#
# [[2]]
# suppliedArgX
#
# [[3]]
# [1] 100
R has lazy evaluation, so the identity of a variable name token is a little less clear than in other languages. This is used in libraries like dplyr where you can write, for instance:
summarise(mtcars, total_cyl = sum(cyl))
We can ask what each of these tokens means: summarise and sum are defined functions, mtcars is a defined data frame, total_cyl is a keyword argument for the function summarise. But what is cyl?
> cyl
Error: object 'cyl' not found
It isn't anything! Well, not yet. R doesn't evaluate it right away, but treats it as an expression to be evaluated later in an environment different from the global environment your command line is working in, specifically one where the columns of mtcars are defined. Somewhere in the guts of dplyr, something like this is happening:
> substitute(cyl, mtcars)
[1] 6 6 4 6 8 ...
Suddenly cyl means something. That's what substitute is for.
So what is quote for? Well sometimes you want your lazily-evaluated expression to be represented somewhere else before it's evaluated, i.e. you want to display the actual code you're writing without any (or only some) values substituted. The docs you quoted explain this is common for "informative labels for data sets and plots".
So, for example, you could create a quoted expression, and then both print the unevaluated expression in your chart to show how you calculated and actually calculate with the expression.
expr <- quote(x + y)
print(expr) # x + y
eval(expr, list(x = 1, y = 2)) # 3
Note that substitute can do this expression trick too, while also giving you the option to substitute values for only some of the symbols. So its features are roughly a superset of quote's.
expr <- substitute(x + y, list(x = 1))
print(expr) # 1 + y
eval(expr, list(y = 2)) # 3
Maybe this section of the documentation will help somewhat:
Substitution takes place by examining each component of the parse tree
as follows: If it is not a bound symbol in env, it is unchanged. If it
is a promise object, i.e., a formal argument to a function or
explicitly created using delayedAssign(), the expression slot of the
promise replaces the symbol. If it is an ordinary variable, its value
is substituted, unless env is .GlobalEnv in which case the symbol is
left unchanged.
Note the final bit, and consider this example:
e <- new.env()
assign(x = "a",value = 1,envir = e)
> substitute(a,env = e)
[1] 1
Compare that with:
> quote(a)
a
So there are two basic situations in which substitution will occur: when we're using it on an argument of a function, and when env is some environment other than .GlobalEnv. That's why your particular example was confusing.
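The same rule applies to ordinary local variables inside a function. In this sketch h() is a hypothetical function; inside it, substitute() uses h()'s evaluation frame as env, so a bound local variable is replaced by its value while an unbound symbol is left alone. Passing .GlobalEnv explicitly shows the top-level exception.

```r
h <- function() {
  a <- 1
  substitute(a + b)    # env defaults to h()'s frame: `a` is bound there, `b` is not
}
h()
# 1 + b

# With env = .GlobalEnv, symbols are left unchanged even if they are bound there
a <- 1
substitute(a + b, env = globalenv())
# a + b
```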
For another comparison with quote, consider modifying the myplot function in the examples section to be:
myplot <- function(x, y)
plot(x, y, xlab = deparse(quote(x)),
ylab = deparse(quote(y)))
and you'll see that quote really doesn't do any substitution.
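For a side-by-side view, here is a minimal sketch of the usual deparse(substitute(x)) idiom next to the quote() version. The function names label_sub and label_quote are hypothetical; deparse() turns the returned expression into a string, as in the myplot example.

```r
label_sub   <- function(x) deparse(substitute(x))  # the expression the caller supplied
label_quote <- function(x) deparse(quote(x))       # always just the symbol `x`

label_sub(mtcars$mpg)
# [1] "mtcars$mpg"
label_quote(mtcars$mpg)
# [1] "x"
```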
Regarding your question of why .GlobalEnv is treated as an exception for substitute, it is simply inherited from S. From The R Language Definition (https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Substitutions):
The special exception for substituting at the top level is admittedly peculiar. It has been inherited from S and the rationale is most likely that there is no control over which variables might be bound at that level so that it would be better to just make substitute act as quote.

Why subset doesn't mind missing subset argument for dataframes?

Normally I wonder where mysterious errors come from but now my question is where a mysterious lack of error comes from.
Let
numbers <- c(1, 2, 3)
frame <- as.data.frame(numbers)
If I type
subset(numbers, )
(so I want to take some subset but forget to specify the subset-argument of the subset function) then R reminds me (as it should):
Error in subset.default(numbers, ) :
argument "subset" is missing, with no default
However when I type
subset(frame,)
(so the same thing with a data.frame instead of a vector), it doesn't give an error but instead just returns the (full) dataframe.
What is going on here? Why don't I get my well deserved error message?
tl;dr: The subset function calls different functions (has different methods) depending on the type of object it is fed. In the example above, subset(numbers, ) uses subset.default while subset(frame, ) uses subset.data.frame.
R has a couple of object-oriented systems built in. The simplest and most common is called S3. This OO programming style implements what Wickham calls "generic-function OO." Under this style of OO, an object called a generic function looks at the class of its argument and then applies the matching method to it. If no method exists for that class, the default method is used, provided one is defined.
To get a better idea of how S3 and the other OO systems work, you might check out the relevant portion of the Advanced R site. The procedure of finding the proper method for an object is called method dispatch. You can read more about this in the help file ?UseMethod.
As noted in the Details section of ?subset, the subset function "is a generic function." This means that subset examines the class of the object in the first argument and then uses method dispatch to apply the appropriate method to the object.
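A toy generic shows the same dispatch mechanism in isolation. describe() and its methods are hypothetical and only report which method ran. UseMethod() is the base R function that performs S3 dispatch on the class of the first argument.

```r
# Hypothetical S3 generic: UseMethod() dispatches on class(x)
describe <- function(x, ...) UseMethod("describe")

describe.data.frame <- function(x, ...) "data.frame method"
describe.default    <- function(x, ...) "default method"

numbers <- c(1, 2, 3)
frame <- as.data.frame(numbers)

describe(frame)     # class "data.frame" has its own method
# [1] "data.frame method"
describe(numbers)   # class "numeric" has none, so the default is used
# [1] "default method"
```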
The methods of a generic function are encoded as
< generic function name >.< class name >
and can be found using methods(<generic function name>). For subset, we get
methods(subset)
[1] subset.data.frame subset.default subset.matrix
see '?methods' for accessing help and source code
which indicates that if the object has class data.frame, then subset calls the subset.data.frame method (function). It is defined as below:
subset.data.frame
function (x, subset, select, drop = FALSE, ...)
{
r <- if (missing(subset))
rep_len(TRUE, nrow(x))
else {
e <- substitute(subset)
r <- eval(e, x, parent.frame())
if (!is.logical(r))
stop("'subset' must be logical")
r & !is.na(r)
}
vars <- if (missing(select))
TRUE
else {
nl <- as.list(seq_along(x))
names(nl) <- names(x)
eval(substitute(select), nl, parent.frame())
}
x[r, vars, drop = drop]
}
Note that if the subset argument is missing, the first lines
r <- if (missing(subset))
rep_len(TRUE, nrow(x))
produce a vector of TRUE values with one element per row of the data.frame, and the last line
x[r, vars, drop = drop]
feeds this vector into the row index, which means that if you did not include a subset argument, then the subset function will return all of the rows of the data.frame.
As we can see from the output of the methods call, subset does not have a method for atomic vectors. This means that, as your error message
Error in subset.default(numbers, )
shows, when you apply subset to a vector, R calls the subset.default method, which is defined as
subset.default
function (x, subset, ...)
{
if (!is.logical(subset))
stop("'subset' must be logical")
x[subset & !is.na(subset)]
}
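The error does not actually come from the stop() call in subset.default. Its first line, is.logical(subset), has to evaluate subset, and evaluating a missing argument that has no default raises "argument "subset" is missing, with no default" before stop() is ever reached.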
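The difference between the two methods comes down to whether the argument is checked with missing() before it is used. A minimal sketch with two hypothetical functions: keep_rows() mirrors the check in subset.data.frame and keep_rows_strict() mirrors subset.default.

```r
# Checks missing() first, like subset.data.frame: no argument means "keep everything"
keep_rows <- function(x, subset) {
  if (missing(subset)) subset <- rep_len(TRUE, length(x))
  x[subset & !is.na(subset)]
}

# Uses the argument straight away, like subset.default: evaluating a
# missing argument with no default raises an error
keep_rows_strict <- function(x, subset) {
  if (!is.logical(subset)) stop("'subset' must be logical")
  x[subset & !is.na(subset)]
}

numbers <- c(1, 2, 3)
keep_rows(numbers)
# [1] 1 2 3
msg <- tryCatch(keep_rows_strict(numbers), error = conditionMessage)
msg
# [1] "argument \"subset\" is missing, with no default"
```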

invalid 'envir' argument of type 'character' -- in self-defined function with lattice histogram

I want a function with parameters for the data (dat), a factor (myfactor) and a variable name (myvar) that dynamically generates histograms (I have to use lattice).
Using iris as a minimal example:
data(iris)
my_histogram <- function(myvar,myfactor,dat){
listofparam <- c(myvar,myfactor)
myf <- as.formula(paste("~",paste(listofparam,collapse="|")))
histogram(myf,
data=dat,
main=bquote(paste(.(myvar),"distribution by",.(myfactor),seq=" ")))}
my_histogram("Sepal.Length","Species","iris")
I also tried do.call as some posts indicated:
my_histogram <- function(myvar,myfactor,dat){
listofparam <- c(myvar,myfactor)
myf <- as.formula(paste("~",paste(listofparam,collapse="|")))
p <- do.call("histogram",
args = list(myf,
data=dat))
print(p)
}
my_histogram("Sepal.Length","Species","iris")
But the error appears: invalid 'envir' argument of type 'character'. I think the program doesn't know where to look for this myf string. How can I fix this, or is there a better way?
Readers of this should be aware that the question has completely mutated from an earlier version and doesn't really match up with this answer anymore. The answer to the new question appears in the comments.
There is no object named Sepal.Length. (So R is creating an error even before my_function gets called.) There is only a column name, and it would need to be quoted to pass it to a function. (The data object could not be created because that URL fails to deliver the data. Why aren't you using the built-in copy of the iris data object?)
You will also need to build a formula from myvar and fac. Formulas are expressions and get parsed without evaluation of their tokens. You need to build a formula inside your function that looks like: ~Sepal.Length|Species and then pass it to the histogram call. Consult ?as.formula
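For the current version of the question, here is a minimal sketch of one possible fix, assuming the error comes from passing the name "iris" as data: histogram() evaluates the formula using data as the environment, and a character string is not a valid environment. The sketch looks the data frame up by name with get() when a string is given, builds the formula with as.formula(), and uses a plain paste() title instead of bquote(); these are illustrative choices, not the only way to do it.

```r
library(lattice)

my_histogram <- function(myvar, myfactor, dat) {
  # Assumption: `dat` may be a data frame or the name of one visible from the caller
  if (is.character(dat)) dat <- get(dat, envir = parent.frame())
  myf <- as.formula(paste("~", myvar, "|", myfactor))  # e.g. ~ Sepal.Length | Species
  histogram(myf, data = dat,
            main = paste(myvar, "distribution by", myfactor))
}

p <- my_histogram("Sepal.Length", "Species", "iris")
print(p)   # lattice plots are only drawn when printed
```

Passing the data frame itself, my_histogram("Sepal.Length", "Species", iris), skips the get() step entirely.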

Convert character vector to numeric vector in R for value assignment?

I have:
z = data.frame(x1=a, x2=b, x3=c, etc)
I am trying to do:
for (i in 1:10)
{
paste(c('N'),i,sep="") -> paste(c('z$x'),i,sep="")
}
Problems:
paste(c('z$x'),i,sep="") yields "z$x1", "z$x2", ... instead of the actual values. I need the expression to be evaluated. I tried as.numeric and eval; neither seemed to work.
paste(c('N'),i,sep="") yields "N1", "N2", ... I need the result to be used merely as a name. If I try to assign it a value, such as paste(c('N'),5,sep="") -> 5, i.e. "N5" -> 5 instead of N5 -> 5, I get the error: target of assignment expands to non-language object.
This task is pretty trivial since I can simply do:
N1 = x1...
N2 = x2...
etc, but I want to learn something new
I'd suggest using something like for( i in 1:10 ) z[,i] <- N[,i]...
BUT, since you said you want to learn something new, you can play around with parse and substitute.
NOTE: these little tools are funny, but experienced users (not me) avoid them.
This is called "computing on the language". It's very interesting, and it helps understanding the way R works. Let me try to give an intro:
The basic language construct is a constant, like a numeric or character vector. It is trivial because it is not different from its "unevaluated" version, but it is one of the building blocks for more complicated expressions.
The (officially) basic language object is the symbol, also known as a name. It's nothing but a pointer to another object, i.e., a token that identifies another object which may or may not exist. For instance, if you run x <- 10, then x is a symbol that refers to the value 10. In other words, evaluating the symbol x yields the numeric vector 10. Evaluating a non-existent symbol yields an error.
A symbol looks like a character string, but it is not. You can turn a string into a symbol with as.symbol("x").
The next language object is the call. This is a recursive object, implemented as a list whose elements are constants, symbols, or other calls. The first element must not be a constant, because it must evaluate to the actual function that will be called. The other elements are the arguments to this function.
If the first element does not evaluate to an existing function, R will throw either Error: attempt to apply non-function or Error: could not find function "x" (if the first element is a symbol that is undefined or points to something other than a function).
Example: the code line f(x, y+z, 2) will be parsed as a list of 4 elements: the first is f (a symbol), the second is x (another symbol), the third is another call, and the fourth is a numeric constant. The third element, y+z, is itself a call to the function '+' with two arguments, so it parses as a list of three symbols: '+', y and z.
Finally, there is also the expression object, that is a list of calls/symbols/constants, that are meant to be evaluated one by one.
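All of these objects can be inspected directly with base R. This sketch uses as.symbol(), quote() and expression() on the same f(x, y+z, 2) example; f, x, y and z do not need to exist because nothing is evaluated.

```r
s <- as.symbol("x")             # a symbol (name), not a string
class(s)
# [1] "name"

cl <- quote(f(x, y + z, 2))     # a call; nothing is evaluated
length(cl)                      # the function plus three arguments
# [1] 4
class(cl[[1]])                  # the function position holds the symbol f
# [1] "name"
class(cl[[3]])                  # y + z is itself a call ...
# [1] "call"
as.list(cl[[3]])                # ... made of the symbols `+`, y and z
class(cl[[4]])                  # the constant 2
# [1] "numeric"

ex <- expression(a <- 1, a + 1) # an expression vector: two calls, evaluated in turn
length(ex)
# [1] 2
```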
You'll find lots of information here:
https://github.com/hadley/devtools/wiki/Computing-on-the-language
OK, now let's get back to your question :-)
What you have tried does not work because the output of paste is a character string, and the assignment function expects as its first argument something that evaluates to a symbol, to be either created or modified. Alternatively, the first argument can also evaluate to a call associated with a replacement function. These are a little trickier, but they are handled by the assignment function itself, not by the parser.
The error message you see, target of assignment expands to non-language object, is triggered by the assignment function, precisely because your target evaluates to a string.
We can fix that by building up a call that has the symbols you want in the right places. The most "brute force" method is to put everything inside a string and use parse:
parse(text=paste('N',i," -> ",'z$x',i,sep=""))
Another way to get there is to use substitute:
substitute(x -> y, list(x=as.symbol(paste("N",i,sep="")), y=substitute(z$w, list(w=paste("x",i,sep="")))))
The inner substitute creates the calls z$x1, z$x2, etc. The outer substitute puts this call as the target of the assignment, and the symbols N1, N2, etc. as the values.
parse results in an expression, and substitute in a call. Both can be passed to eval to get the same result.
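To check that the two constructions do the same thing, here is a small sketch. The values N1 to N3 and the one-row data frame z are made up for illustration; each loop builds one assignment per column and evaluates it.

```r
N1 <- 10; N2 <- 20; N3 <- 30                 # hypothetical values
z <- data.frame(x1 = 0, x2 = 0, x3 = 0)      # hypothetical target data frame

# Approach 1: paste the code together as text, parse it, evaluate it
for (i in 1:3) {
  eval(parse(text = paste("N", i, " -> ", "z$x", i, sep = "")))  # e.g. N1 -> z$x1
}
via_parse <- z

# Approach 2: build the same assignment call from symbols with substitute()
z <- data.frame(x1 = 0, x2 = 0, x3 = 0)
for (i in 1:3) {
  target <- substitute(z$w, list(w = paste("x", i, sep = "")))     # z$"x1", ...
  assign_call <- substitute(y <- x,
                            list(x = as.symbol(paste("N", i, sep = "")), y = target))
  eval(assign_call)
}

identical(via_parse, z)
# [1] TRUE
```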
Just one final note: I repeat that all this is intended as a didactic example, to help understanding the inner workings of the language, but it is far from good programming practice to use parse and substitute, except when there is really no alternative.
A data.frame is a named list. It is usually good practice, and idiomatic R, not to have lots of objects in the global environment, but to keep related (or similar) objects in lists and use lapply etc.
You could use list2env to multiassign the named elements of your list (the columns in your data.frame) to the global environment
DD <- data.frame(x = 1:3, y = letters[1:3], z = 3:1, stringsAsFactors = TRUE)
list2env(DD, envir = parent.frame())
## <environment: R_GlobalEnv>
## ta da, x, y and z now exist within the global environment
x
## [1] 1 2 3
y
## [1] a b c
## Levels: a b c
z
## [1] 3 2 1
I am not exactly sure what you are trying to accomplish. But here is a guess:
### Create a data.frame using the alphabet
data <- data.frame(x = 'a', y = 'b', z = 'c', stringsAsFactors = FALSE)
### Create a numerical index corresponding to the letter position in the alphabet
index <- match(unlist(data[1, ]), letters)
### Use an 'lapply' to apply a function to every element in 'index'; creates a list
val <- lapply(index, function(x) {
paste('N', x, sep = '')
})
### Assign names to our list
names(val) <- names(data)
### Observe the result
val$x
