R - extract variable names from unevaluated expression - r

Assume the following model is written in a text file by someone not familiar with R:
goal1 = dec1_g1 + dec2_g1 + dec3_g1
goal2 = min(dec1_g2, dec2_g2, dec3_g2)
goal3 = dec1_g3 - dec2_g3 - dec3_g3
...
I need to be able to parse the text file with the model and evaluate any one line without having to assign values to the dec variables from the remaining lines of the model. While the parse function creates an unevaluated expression exp that can be queried and evaluated in parts as eval(exp[1]), eval(exp[2]), I haven't found a way to do something like eval(exp['goal1']).
Question: is there a way to parse the model without evaluating it and create a list with elements named by the left-hand sides of the model expressions, e.g.
model = list(
"goal1" = expression(goal1 = dec1_g1 + dec2_g1 + dec3_g1),
"goal2" = expression(goal2 = min(dec1_g2, dec2_g2, dec3_g2)),
"goal3" = expression(goal3 = dec1_g3 - dec2_g3 - dec3_g3),
...
)
Motivation: I want to be able to load the model from within R code, parse it, and evaluate it expression by expression, assigning correct values to the dec variables depending on the goal that's being evaluated.

The "left hand side" of expression(x=y+z) is actually the name of the argument you're passing to expression(), whose value is the (unevaluated) call y + z. So it's not a part of the expression, but is returned as the name of the list element (an expression is a list of calls, usually unnamed):
> as.list(expression(x=y+z))
$x
y + z
> names(expression(x=y+z))
[1] "x"
If, OTOH, you use the formula constructor ~, then you get the LHS as a part of the expression:
> as.list(expression(x~y+z))
[[1]]
x ~ y + z
And you can get to it by selecting the second element of the call:
> expression(x~y+z)[[1]]
x ~ y + z
> expression(x~y+z)[[1]][[1]]
`~`
> expression(x~y+z)[[1]][[2]]
x
Note: in the last line, x is a symbol.
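For the model file itself, one possible approach (a sketch, not the only way) is to parse each line and split the resulting call to `=` into its left- and right-hand sides. The model_text vector below stands in for the lines you would read from your file, and the names exprs and rhs are just for illustration:

```r
# Hypothetical stand-in for the lines of the model file
# (in practice something like readLines() on your file).
model_text <- c(
  "goal1 = dec1_g1 + dec2_g1 + dec3_g1",
  "goal2 = min(dec1_g2, dec2_g2, dec3_g2)",
  "goal3 = dec1_g3 - dec2_g3 - dec3_g3"
)

# parse() does not evaluate anything; each element is a call to `=`,
# i.e. `=`(lhs, rhs), so [[2]] is the LHS symbol and [[3]] is the RHS call.
exprs <- parse(text = model_text)
rhs <- lapply(exprs, function(e) e[[3]])
names(rhs) <- vapply(exprs, function(e) as.character(e[[2]]), character(1))

# Variable names used by one goal, without evaluating it
vars_goal1 <- all.vars(rhs$goal1)

# Evaluate a single goal, supplying only the variables that goal needs
value_goal2 <- eval(rhs$goal2, list(dec1_g2 = 5, dec2_g2 = 3, dec3_g2 = 7))

print(vars_goal1)   # "dec1_g1" "dec2_g1" "dec3_g1"
print(value_goal2)  # 3
```

With this layout, rhs[["goal1"]] plays the role of the eval(exp['goal1']) lookup asked about in the question.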

Related

Creating a function for GWR maps

I have created a function for GWR maps. The code works well when I run it outside the function, but when I wrap it in a function I get an error. I was wondering if anyone could help, thank you!
#a = polygon shapefile
#b = dependent variable of shapefile
#c = explanatory variable 1
#d = explanatory variable 2
GWR_map <- function(a,b,c,d){
GWRbandwidth <- gwr.sel(a$b ~ a$c+a$d, a,adapt=T)
gwr.model = gwr(a$b ~ a$c+a$d, data = a, adapt=GWRbandwidth, hatmatrix=TRUE, se.fit=TRUE)
gwr.model
}
GWR_map(OA.Census,"Qualification", "Unemployed", "White_British")
The above code produces the following error:
Error in model.frame.default(formula = a$b ~ a$c + a$d, data = a, drop.unused.levels = TRUE) :
invalid type (NULL) for variable 'a$b'
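You can't use a function parameter after $: a$b looks for a component literally named b, not the one whose name is stored in b, so it returns NULL. Try changing your function to use the [[x]] notation instead. It should look like this: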
GWR_map <- function(a, b, c, d) {
  GWRbandwidth <- gwr.sel(a[[b]] ~ a[[c]] + a[[d]], a, adapt = TRUE)
  gwr.model <- gwr(a[[b]] ~ a[[c]] + a[[d]], data = a, adapt = GWRbandwidth, hatmatrix = TRUE, se.fit = TRUE)
  gwr.model
}
The R help docs (section 6.2 on lists) explain this difference well:
Additionally, one can also use the names of the list components in double square brackets, i.e., Lst[["name"]] is the same as Lst$name. This is especially useful, when the name of the component to be extracted is stored in another variable as in
x <- "name"; Lst[[x]]
It is very important to distinguish Lst[[1]] from Lst[1]. ‘[[...]]’ is the operator used to select a single element, whereas ‘[...]’ is a general subscripting operator. Thus the former is the first object in the list Lst, and if it is a named list the name is not included. The latter is a sublist of the list Lst consisting of the first entry only. If it is a named list, the names are transferred to the sublist.
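Here is a small sketch of the difference on a plain data frame, plus an alternative way to build the formula from column names with reformulate(). The data frame df and the helper fit_by_name are made up for illustration, and lm() stands in for the modelling function; whether gwr.sel() and gwr() accept a formula built this way unchanged is an assumption I have not checked.

```r
# Toy data, invented for this example
df <- data.frame(y = c(1, 2, 3, 4), x1 = c(2, 4, 5, 9), x2 = c(1, 0, 1, 0))
col <- "y"

dollar_is_null <- is.null(df$col)          # `$` looks for a column literally named "col"
bracket_works  <- identical(df[[col]], df$y)  # `[[` uses the value stored in col

# Hypothetical helper: build response ~ term1 + term2 from strings
fit_by_name <- function(data, response, ...) {
  f <- reformulate(c(...), response = response)
  lm(f, data = data)
}

fit <- fit_by_name(df, "y", "x1", "x2")
print(dollar_is_null)  # TRUE
print(bracket_works)   # TRUE
print(formula(fit))    # y ~ x1 + x2
```

A side benefit of building the formula from names is that the fitted model reports readable terms such as x1 instead of a[[c]].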

What's the difference between substitute and quote in R

In the official docs, it says:
substitute returns the parse tree for the (unevaluated) expression
expr, substituting any variables bound in env.
quote simply returns its argument. The argument is not evaluated and
can be any R expression.
But when I try:
> x <- 1
> substitute(x)
x
> quote(x)
x
It looks like both quote and substitute return the expression that's passed as an argument to them.
So my question is, what's the difference between substitute and quote, and what does "substituting any variables bound in env" mean?
Here's an example that may help you to easily see the difference between quote() and substitute(), in one of the settings (processing function arguments) where substitute() is most commonly used:
f <- function(argX) {
list(quote(argX),
substitute(argX),
argX)
}
suppliedArgX <- 100
f(argX = suppliedArgX)
# [[1]]
# argX
#
# [[2]]
# suppliedArgX
#
# [[3]]
# [1] 100
R has lazy evaluation, so the identity of a variable name token is a little less clear than in other languages. This is used in libraries like dplyr where you can write, for instance:
summarise(mtcars, total_cyl = sum(cyl))
We can ask what each of these tokens means: summarise and sum are defined functions, mtcars is a defined data frame, total_cyl is a keyword argument for the function summarise. But what is cyl?
> cyl
Error: object 'cyl' not found
It isn't anything! Well, not yet. R doesn't evaluate it right away, but treats it as an expression to be evaluated later in an environment different from the global environment your command line is working in, specifically one where the columns of mtcars are defined. Somewhere in the guts of dplyr, something like this is happening:
> substitute(cyl, mtcars)
[1] 6 6 4 6 8 ...
Suddenly cyl means something. That's what substitute is for.
So what is quote for? Well sometimes you want your lazily-evaluated expression to be represented somewhere else before it's evaluated, i.e. you want to display the actual code you're writing without any (or only some) values substituted. The docs you quoted explain this is common for "informative labels for data sets and plots".
So, for example, you could create a quoted expression, and then both print the unevaluated expression in your chart to show how you calculated and actually calculate with the expression.
expr <- quote(x + y)
print(expr) # x + y
eval(expr, list(x = 1, y = 2)) # 3
Note that substitute can do this expression trick too, while giving you the option to replace only some of the symbols with values. So its features are a superset of quote's.
expr <- substitute(x + y, list(x = 1))
print(expr) # 1 + y
eval(expr, list(y = 2)) # 3
Maybe this section of the documentation will help somewhat:
Substitution takes place by examining each component of the parse tree
as follows: If it is not a bound symbol in env, it is unchanged. If it
is a promise object, i.e., a formal argument to a function or
explicitly created using delayedAssign(), the expression slot of the
promise replaces the symbol. If it is an ordinary variable, its value
is substituted, unless env is .GlobalEnv in which case the symbol is
left unchanged.
Note the final bit, and consider this example:
e <- new.env()
assign(x = "a",value = 1,envir = e)
> substitute(a,env = e)
[1] 1
Compare that with:
> quote(a)
a
So there are two basic situations when the substitution will occur: when we're using it on an argument of a function, and when env is some environment other than .GlobalEnv. That's why your particular example was confusing.
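To put the cases side by side, here is a minimal sketch. The names e, f, arg and the result variables are made up for the example; passing globalenv() explicitly is used to mimic what happens when you type substitute() at the prompt.

```r
# With env = globalenv() (the top-level case), symbols are left unchanged.
at_top <- substitute(a + b, globalenv())

# In any other environment, bound ordinary variables are replaced by their values.
e <- new.env()
assign("a", 1, envir = e)
in_env <- substitute(a + b, e)

# Inside a function, a formal argument (a promise) is replaced by the
# expression the caller supplied.
f <- function(arg) substitute(arg)
in_fun <- f(a + b)

print(at_top)  # a + b
print(in_env)  # 1 + b
print(in_fun)  # a + b
```

Note that b stays a symbol in every case because it is not bound in any of the environments involved.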
For another comparison with quote, consider modifying the myplot function in the examples section to be:
myplot <- function(x, y)
    plot(x, y, xlab = deparse(quote(x)),
         ylab = deparse(quote(y)))
and you'll see that quote really doesn't do any substitution.
Regarding your question why GlobalEnv is treated as an exception for substitute, it is just a heritage of S. From The R language definition (https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Substitutions):
The special exception for substituting at the top level is admittedly peculiar. It has been inherited from S and the rationale is most likely that there is no control over which variables might be bound at that level so that it would be better to just make substitute act as quote.

peculiar syntax for function within()

I came across this fantastic function called
within {base}
I use it more often now than the much hyped
mutate {dplyr}
My question is, why does within() have such a peculiar format, with the assignment operator <- used instead of the usual = for args? And how is it different from mutate, other than what is given in this fantastic article I found? I am interested to know the underlying mechanism.
Article of Bob Munchen - 2013
The function within takes an expression as second argument. That expression is essentially a codeblock, best contained within curly brackets {}.
In this codeblock, you can assign new variables, change values and the likes. The variables can be used in the codeblock as objects.
mutate on the other hand takes a set of arguments for the mutation. These arguments have to be named after the variable that should be created, and get the value for that variable as the value.
So :
mutate(iris, ratio = Sepal.Length/Petal.Length)
# and
within(iris, {ratio = Sepal.Length/Petal.Length})
give the same result. The problem starts when you remove the curly brackets:
> within(iris, ratio = Sepal.Length/Petal.Length)
Error in eval(substitute(expr), e) : argument is missing, with no default
The curly brackets enclosed an expression (piece of code), and hence within() worked correctly. If you don't use the {}, then R semantics reads that last command as "call the function within with iris as first argument and a second argument called ratio set to Sepal.Length/Petal.Length". And as the function within() doesn't have an argument ratio, that one is ignored. Instead, within looks for the expression that should be the second argument. But it can't find that one, so that explains the error.
So there's little peculiar about it. Both functions just have different arguments. All the rest is pretty much how R deals with arguments.
Args of within are not assigned with <- but with the usual =.
Let's see the first example in your link:
mydata.new <- within(mydata, {
x2 <- x ^ 2
x3 <- x2 + 100
} )
Here,
{
x2 <- x ^ 2
x3 <- x2 + 100
}
is just an argument of the function (an R expression). Neither x2 nor x3 is an argument to within. The function could have been called this way instead to make it clearer:
mydata.new <- within(data = mydata, expr = {
x2 <- x ^ 2
x3 <- x2 + 100
})
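As for the underlying mechanism, here is a rough sketch of the idea. This is a simplified stand-in called my_within, not the actual base R implementation: it copies the columns into an environment, evaluates the unevaluated code block there, and copies the variables back.

```r
# Simplified illustration only; base::within() handles more cases
# (for example, variables removed or set to NULL inside the block).
my_within <- function(data, expr) {
  # Put the columns into a fresh environment whose parent is the caller's frame
  e <- list2env(as.list(data), parent = parent.frame())
  # Run the code block there; assignments just create variables in e
  eval(substitute(expr), e)
  # Copy every variable in e back into the data frame
  for (nm in ls(e)) data[[nm]] <- get(nm, envir = e)
  data
}

mydata <- data.frame(x = 1:3)
res <- my_within(mydata, {
  x2 <- x ^ 2
  x3 <- x2 + 100
})
print(res$x3)  # 101 104 109
```

Because the block is ordinary code run in that environment, later lines can use variables created by earlier ones (x3 uses x2), which is why it reads like a script rather than a list of named arguments.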

Creating call objects to compare to formula elements

I would like to create an object from a string to compare with an element of a formula.
For example, in the following:
# note that f does not exist
myForm <- y ~ f(x)
theF <- myForm[[3]]
fString <- "f(x)"
How can I compare fString to theF?
If I know the string is "f(x)" I can manually enter the following
cheating <- as.call(quote(f(x)))
identical(theF, cheating)
which works (it gives TRUE), but I want to be able to take the string "f(x)" as an argument (e.g. maybe it's "g(x)").
The real point of this question is for me to understand better how to work with call objects and quote function.
parse(text = s) converts text, s, to an expression and e[[1]] extracts the call object from a length 1 expression e. theF is a call object so putting these together we have:
identical(theF, parse(text = fString)[[1]])
## TRUE
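Two other ways to get the same call object, shown as a sketch: str2lang() (available from R 3.6.0) parses a single string straight to a language object, and call() builds the call from a function name and arguments. The variable names on the left are only for illustration.

```r
myForm  <- y ~ f(x)   # f does not need to exist; nothing is evaluated
theF    <- myForm[[3]]
fString <- "f(x)"

# Parse the string directly into a call (requires R >= 3.6.0)
same_parsed <- identical(theF, str2lang(fString))

# Build the call from its pieces: function name plus a quoted argument
same_built <- identical(theF, call("f", quote(x)))

# Take the call apart again
fun_name  <- as.character(theF[[1]])
arg_names <- all.vars(theF)

print(same_parsed)  # TRUE
print(same_built)   # TRUE
print(fun_name)     # "f"
print(arg_names)    # "x"
```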
Note that formulas do nothing on their own in R.
A formula is just an unevaluated call to ~, stored with class "formula" and the environment it was created in, so it simply prints as
y ~ f(x)
It is then up to the functions that accept formulas to interpret it.
Check coplot for an example implementation.

Converting from a Formula object to a list

In R, I would like to iterate over a formula object. R automatically converts a formula to a parse tree, so I see no reason why I shouldn't be able to iterate.
For example, f <- ~x + y has elements f[[1]] = ~ and f[[2]] = x + y. However, for(v in f) print(toString(v)) does not output
[1] "~"
[1] "+, x, y"
as I would expect it to. Instead, it gives the error invalid for() loop sequence.
If I need to do it manually, I could always use for(i in 1:length(f)) print(toString(f[[i]])) which does produce the correct output. However, I would like to know why the first method does not work.
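A likely explanation, offered as a sketch rather than a definitive answer: for() only accepts vectors (including lists), pairlists, or NULL as its sequence, while a formula is a language object. Copying the elements into an ordinary list first makes the loop work. The names parts and labels below are made up for the example.

```r
f <- ~ x + y
f_type <- typeof(f)   # a formula is a call, not a vector

# Collect the elements of the call into a plain list, then iterate over that
parts <- lapply(seq_len(length(f)), function(i) f[[i]])
for (v in parts) print(toString(v))

# The same strings collected in one vector
labels <- vapply(parts, toString, character(1))

# If only the variable names are needed, all.vars() returns them directly
vars <- all.vars(f)

print(f_type)  # "language"
print(vars)    # "x" "y"
```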
