Evaluate strings for regression - r

From these strings
data = "mtcars"
y = "mpg"
x = c("cyl","disp")
, I am trying to perform a linear model. I tried things like
epp=function(x) eval(parse(text=paste0(x,collapse="+")))
lm(data=epp(data),epp(y)~epp(x))
# Error in eval(expr, envir, enclos) : object 'cyl' not found
where the last line was aimed to be equivalent to
lm(data=mtcars,mpg~cyl+disp)

This involves two operations that are both described in multiple SO entries that use perhaps singly either the get or as.formula functions:
lm(data=get(data),
formula=as.formula( paste( y, "~", paste(x, collapse="+") ) )
)
In both cases you are use a text/character object to return a language object. In the first argument get returns a 'symbol' that can be evaluated and in the second instance as.formula returns a 'formula' object. #blmoore is correct in advising us that lm will accept a character object, so the as.formula call is not needed here.

Related

Why subset doesn't mind missing subset argument for dataframes?

Normally I wonder where mysterious errors come from but now my question is where a mysterious lack of error comes from.
Let
numbers <- c(1, 2, 3)
frame <- as.data.frame(numbers)
If I type
subset(numbers, )
(so I want to take some subset but forget to specify the subset-argument of the subset function) then R reminds me (as it should):
Error in subset.default(numbers, ) :
argument "subset" is missing, with no default
However when I type
subset(frame,)
(so the same thing with a data.frame instead of a vector), it doesn't give an error but instead just returns the (full) dataframe.
What is going on here? Why don't I get my well deserved error message?
tl;dr: The subset function calls different functions (has different methods) depending on the type of object it is fed. In the example above, subset(numbers, ) uses subset.default while subset(frame, ) uses subset.data.frame.
R has a couple of object-oriented systems built-in. The simplest and most common is called S3. This OO programming style implements what Wickham calls a "generic-function OO." Under this style of OO, an object called a generic function looks at the class of an object and then applies the proper method to the object. If no direct method exists, then there is always a default method available.
To get a better idea of how S3 works and the other OO systems work, you might check out the relevant portion of the Advanced R site. The procedure of finding the proper method for an object is referred to as method dispatch. You can read more about this in the help file ?UseMethod.
As noted in the Details section of ?subset, the subset function "is a generic function." This means that subset examines the class of the object in the first argument and then uses method dispatch to apply the appropriate method to the object.
The methods of a generic function are encoded as
< generic function name >.< class name >
and can be found using methods(<generic function name>). For subset, we get
methods(subset)
[1] subset.data.frame subset.default subset.matrix
see '?methods' for accessing help and source code
which indicates that if the object has a data.frame class, then subset calls the subset.data.frame the method (function). It is defined as below:
subset.data.frame
function (x, subset, select, drop = FALSE, ...)
{
r <- if (missing(subset))
rep_len(TRUE, nrow(x))
else {
e <- substitute(subset)
r <- eval(e, x, parent.frame())
if (!is.logical(r))
stop("'subset' must be logical")
r & !is.na(r)
}
vars <- if (missing(select))
TRUE
else {
nl <- as.list(seq_along(x))
names(nl) <- names(x)
eval(substitute(select), nl, parent.frame())
}
x[r, vars, drop = drop]
}
Note that if the subset argument is missing, the first lines
r <- if (missing(subset))
rep_len(TRUE, nrow(x))
produce a vector of TRUES of the same length as the data.frame, and the last line
x[r, vars, drop = drop]
feeds this vector into the row argument which means that if you did not include a subset argument, then the subset function will return all of the rows of the data.frame.
As we can see from the output of the methods call, subset does not have methods for atomic vectors. This means, as your error
Error in subset.default(numbers, )
that when you apply subset to a vector, R calls the subset.default method which is defined as
subset.default
function (x, subset, ...)
{
if (!is.logical(subset))
stop("'subset' must be logical")
x[subset & !is.na(subset)]
}
The subset.default function throws an error with stop when the subset argument is missing.

Specifying variables in cor.matrix

Trying to use Deducer's cor.matrix to create a correlation matrix to be used in ggcorplot.
Trying to run a simple example. Only explicitly specifying the variable names in the data works:
cor.mat<-cor.matrix(variables=d(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width),
data=iris[1:4],test=cor.test,method='p')
But I'd like it to simple use all columns in the provided data.
This:
cor.mat<-cor.matrix(data=iris[1:4],test=cor.test,method='p')
throws an error:
Error in eval(expr, envir, enclos) : argument is missing, with no default
This:
cor.mat<-cor.matrix(variables=d(as.name(paste(colnames(iris)[1:4]),collapse=",")),
data=iris[1:4],test=cor.test,method='p')
Error in as.name(paste(colnames(iris)[1:4]), collapse = ",") :
unused argument (collapse = ",")
So is there any way to tell variables to use all columns in data without explicitly specifying them?
The first argument of the function is variables =, which is required but you did not specify (you had data =). Try
cor.mat <- cor.matrix(variables = iris[1:4], test = cor.test, method = 'p')
ggcorplot(cor.mat, data=iris)

input variables in R functions from third party libraries

I am a reasonably proficient python programmer messing around with some R.
On this website, for the third party library ICC, I'm confused about input variables for the function ICCest.
Located here:
http://www.inside-r.org/packages/cran/ICC/docs/ICCest
I can use:
ICCest(Chick, weight, data=ChickWeight, CI.type="S")
And I got this to work. Chick and weight are column names for the data frame variable called ChickWeight. All is well and good.
Except, that, what type of variables are "Chick" and "weight"?? They aren't in my R namespace. They aren't strings because they don't have quotes around them.
Doing:
ICCest(Chick, "weight", data=ChickWeight, CI.type="S")
yields:
In ICCest(Chick, "weight", data = ChickWeight, CI.type = "S") :
passing a character string to 'y' is deprecated since ICC vesion 2.3.0 and will not be supported in future versions. The argument to 'y' should either be an unquoted column name of 'data' or an object
So again in my nice friendly python land you can't pass in unquoted characters strings that are not objects in your namespace so I am quite confused.
What is happening here?
You can take a look at the function's code by typing ICCest (without the parantheses):
> ICCest
Object with tracing code, class "functionWithTrace"
Original definition:
function (x, y, data = NULL, alpha = 0.05, CI.type = c("THD", "Smith")){
square <- function(z) {
z^2
}
icall <- list(y = substitute(y), x = substitute(x))
if (is.character(icall$y)) {
warning("passing a character string to 'y' is deprecated since ICC vesion 2.3.0 and will not be supported in future versions. The argument to 'y' should either be an unquoted column name of 'data' or an object")
if (missing(data))
stop("Supply either the unquoted name of the object containing 'y' or supply both 'data' and then 'y' as an unquoted column name to 'data'")
icall$y <- eval(as.name(y), data, parent.frame())
} ...
what happens after the square function block, is that the input is stored in icall in a parse tree, which you can think of as a set of unevaluated expressions. So there's no error when you pass plain weight without the quotation marks, because at this point, there hasn't been an attempt to evaluate the expressions yet. (I'm a bit unsure about this last statement. I hope someone can confirm if it is technically correct)
Inside the if block (where your warning is raised), you can see that they are using eval to update the local variable icall$y. What eval does is essentially evaluating an expression within an environment. Specifically, in the environment of a dataframe, the column names are considered part of the environment.
Now it says in the documentation, that eval takes an expression as its first input. This is why y is cast to an object with as.name before being passed to eval (remember that we are in the if block for string input y)
eval(expr, envir = parent.frame(),...)
And expressions and strings are different in R. So in the last line of code shown above, the y input (here, weight) is being evaluated in the data environment --which, here, is ChickWeight.
To get a better feeling, try this:
> eval(weight, ChickWeight)
Error in eval(weight, ChickWeight) : object 'weight' not found
But if you make an unevaluated expression first, it will work:
> expr <- quote(weight)
> eval(expr, ChickWeight)
Here, quote is doing roughly the same thing as substitute in the 4th line of the function. Check here for more on quote and substitute\.
Why are you passing your y as a quoted string. The function doesn't appear to require quoted strings for variable names. Doing
str(ChickWeight)
will give you the types for the variables. They aren't in a 'name space' because they are variable names in the data.frame ChickWeight.

cannot combine with and functions in R

Could somebody please point out to me why is that the following example does not work:
df <- data.frame(ex =rep(1,5))
sample.fn <- function(var1) with(df, mean(var1))
sample.fn(ex)
It seems that I am using the wrong syntax to combine with inside of a function.
Thanks,
This is what I meant by learning to use "[" (actually"[["):
> df <- data.frame(ex =rep(1,5))
> sample.fn <- function(var1) mean(df[[var1]])
> sample.fn('ex')
[1] 1
You cannot use an unquoted ex since there is no object named 'ex', at least not at the global environment where you are making the call to sample.fn. 'ex' is a name only inside the environment of the df-dataframe and only df itself is "visible" when the sample.fn-function is called.
Out of interest, I tried using the method that the with.default function uses to build a function taking an unquoted expression argument in the manner you were expecting:
samp.fn <- function(expr) mean(
eval(substitute(expr), df, enclos = parent.frame())
)
samp.fn(ex)
#[1] 1
It's not a very useful function, since it would only be applicable when there was a dataframe named 'df' in the parent.frame(). And apologies for incorrectly claiming that there was a warning on the help page. As #rawr points out the warning about using functions that depend on non-standard evaluation appears on the subset page.

Anonymous function in R

Using a dataset w, which includes a numeric column PY, I can do:
nrow(subset(w, PY==50))
and get the correct answer. If, however, I try to create a function:
fxn <- function(dataset, fac, lev){nrow(subset(dataset, fac==lev))}
and run
fxn(w, PY, 50)
I get the following error:
Error in eval(expr, envir, enclos) : object 'PY' not found
What am I doing wrong? Thanks.
From the documentation of subset:
Warning
This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
This rather obscure warning was very well explained here: Why is `[` better than `subset`?
The final word is you can't use subset other than interactively, in particular, not via a wrapper like you are trying. You should use [ instead:
fxn <- function(dataset, fac, lev) nrow(dataset[dataset[fac] == lev, , drop = FALSE])
or rather simply:
fxn <- function(dataset, fac, lev) sum(dataset[fac] == lev)

Resources