General information or tips about eval(parse(...)) [duplicate]

There are several questions on how to avoid using eval(parse(...))
r-evalparse-is-often-suboptimal
avoiding-the-infamous-evalparse-construct
Which sparks the questions:
Why specifically should eval(parse()) be avoided?
And, most importantly, what are the dangers?
Are there any dangers if the code is not used in production? (I'm thinking of the danger of getting back unintended results. Clearly, if you are not careful about what you are parsing, you will have issues. But is that any more dangerous than being sloppy with get()?)

Most of the arguments against eval(parse(...)) arise not from security concerns (after all, no claims are made about R being a safe interface to expose to the Internet) but because such code generally does things that can be accomplished by less obscure methods, i.e. methods that are both quicker and more human-parseable. The R language is supposed to be high-level, so the preference of the cognoscenti (and I do not consider myself in that group) is to see code that is both compact and expressive.
So the danger is that eval(parse(...)) is a backdoor for getting around a lack of knowledge, and the hope in raising that barrier is that people will improve their use of the R language. The door remains open, but the hope is for more expressive use of other features. Carl Witthoft's question earlier today illustrated not knowing that the get function was available, and the question he linked to exposed a lack of understanding of how [[ behaves (and how $ is more limited than [[). In both cases an eval(parse(...)) solution could be constructed, but it was clunkier and less clear than the alternative.
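For illustration, here is a minimal sketch of both substitutions (the object and list names are made up):
x1 <- 10
nm <- "x1"
eval(parse(text = nm))   # clunky: builds and evaluates the string "x1" as code
get(nm)                  # clearer: looks the object up by name
lst <- list(x1 = 10)
lst$nm                   # NULL, because $ does not evaluate its argument
lst[[nm]]                # 10, because [[ does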

The security concerns only really arise if you start calling eval on strings that another user has passed to you. This is a big deal if you are creating an application that runs R in the background, but for data analysis where you are writing code to be run by yourself, then you shouldn't need to worry about the effect of eval on security.
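To make the risk concrete, here is the shape of the problem (the payload string is invented for illustration and deliberately left commented out):
# imagine this string arrives from an untrusted user
user_input <- 'system("rm -rf ~")'   # hypothetical malicious payload
# eval(parse(text = user_input))     # would run an arbitrary shell command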
There are some other problems with eval(parse(...)), though.
Firstly, code using eval-parse is usually much harder to debug than non-parsed code, which is problematic because debugging software is twice as difficult as writing it in the first place.
Here's a function with a mistake in it.
std <- function()
{
  mean(1to10)
}
Silly me, I've forgotten about the colon operator and created my vector wrongly. If I try and source this function, then R notices the problem and throws an error, pointing me at my mistake.
Here's the eval-parse version.
ep <- function()
{
  eval(parse(text = "mean(1to10)"))
}
This will source, because the broken code is hidden inside a perfectly valid string. It is only later, when we come to run the code, that the error is thrown. So by using eval-parse, we've lost the source-time error-checking capability.
I also think that this second version of the function is much more difficult to read.
The other problem with eval-parse is that it is much slower than directly executed code. Compare
system.time(for(i in seq_len(1e4)) mean(1:10))
   user  system elapsed
   0.08    0.00    0.07
and
system.time(for(i in seq_len(1e4)) eval(parse(text = "mean(1:10)")))
   user  system elapsed
   1.54    0.14    1.69
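Most of that overhead is the repeated parsing; if you really must start from a string, parsing once and reusing the expression recovers most of the speed (a sketch; timings will vary by machine):
expr <- parse(text = "mean(1:10)")               # parse once, outside the loop
system.time(for(i in seq_len(1e4)) eval(expr))   # much closer to the direct version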

Usually there's a better way of 'computing on the language' than working with code-strings; in my experience, evalparse-heavy code needs a lot of safeguarding to guarantee sensible output.
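For instance, the call can be constructed directly as a language object with base R's call() or bquote(), with no strings involved (a minimal sketch):
f <- "mean"; v <- 1:10
cl <- call(f, v)                     # builds the call mean(1:10) as a language object
eval(cl)                             # [1] 5.5
eval(bquote(.(as.name(f))(.(v))))    # same thing, templated with bquote()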
As the sketch shows, the same task can usually be solved by working on R code as a language object directly; Hadley Wickham has a useful guide on metaprogramming in R here:
The defmacro() function in the gtools library is my favourite substitute (no half-assed R pun intended) for the evalparse construct:
require(gtools)
# both predicate & action_to_take will be substituted with code
F <- defmacro(predicate, action_to_take,
              expr = if (predicate) action_to_take)
F(1 != 1, action_to_take = print('arithmetic doesnt work!'))
F(pi > 3, action_to_take = return('good!'))
[1] "good!"
# the raw code for F
print(F)
function (predicate = stop("predicate not supplied"), action_to_take = stop("action_to_take not supplied"))
{
    tmp <- substitute(if (predicate) action_to_take)
    eval(tmp, parent.frame())
}
<environment: 0x05ad5d3c>
The benefit of this method is that you are guaranteed to get back syntactically-legal R code. More on this useful function can be found here:
Hope that helps!

In some programming languages, eval() is a function which evaluates a string as though it were an expression and returns a result; in others, it executes multiple lines of code as though they had been included instead of the line including the eval. The input to eval is not necessarily a string; in languages that support syntactic abstractions (like Lisp), eval's input will consist of abstract syntactic forms.
http://en.wikipedia.org/wiki/Eval
There are all kinds of exploits that one can take advantage of if eval is used improperly.
An attacker could supply a program with the string "session.update(authenticated=True)" as data, which would update the session dictionary to set an authenticated key to be True. To remedy this, all data which will be used with eval must be escaped, or it must be run without access to potentially harmful functions.
http://en.wikipedia.org/wiki/Eval
In other words, the biggest danger of eval() is the potential for code injection into your application. The use of eval() can also cause performance issues in some languages, depending on what it is being used for.
Specifically in R, it's probably because you can use get() in place of eval(parse()) and your results will be the same without having to resort to eval().

Related

what's preventing additions to the current set of R reserved words/symbols?

Is there a historical precedent of internal changes to the R parser, adding new reserved words or symbols?
If I remember correctly, data.table uses a serendipitous := that was once defined but left unused in R internals, but I'm not aware of others. However, as the language evolves, it would sometimes seem useful to define new symbols.
An obvious case could be made for magrittr's pipe %>% which has become ubiquitous for many, but remains a pain to type (sure, there are keyboard tricks, but still). Similarly, dplyr/rlang introduce/repurpose notations for "tidy evaluation" (!!, !!!, :=, ~, etc.).
Another case I'm seeing is the verbosity of lambda functions. Would it be possible, theoretically, to define internally something like f = λ(x) x+1 instead of f = function(x) x+1, or are there character restrictions on top of other reasons?
Why add an ergonomics feature if you risk breaking a runtime that hosts a huge ecosystem? Also, once you add one feature, you are on a slippery slope, staring straight into the face of feature bloat.
And if you say that we can be smart and judicious about which features we add, how do we structure that decision process? R does not have a "benevolent dictator" with the final word in decisions like this, so you are left with design by committee, with all that it entails.
The big thing with R has always been the package ecosystem, in which if you want a feature you write it yourself -- as in your magrittr example. The language itself has remained close to its S roots and has successfully served as a stable platform for all the development that has been happening.
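To make that concrete, user-defined %any% infix operators are the standard way to add notation without touching the parser, and that is the route magrittr took for %>%; a minimal sketch (the operator names here are made up):
`%+%` <- function(a, b) paste(a, b)    # a made-up string-concatenation operator
"hello" %+% "world"                    # [1] "hello world"
`%|>%` <- function(lhs, rhs) rhs(lhs)  # a bare-bones pipe in the spirit of %>%
1:10 %|>% mean                         # [1] 5.5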

How to not fall into R's 'lazy evaluation trap'

"R passes promises, not values. The promise is forced when it is first evaluated, not when it is passed.", see this answer by G. Grothendieck. Also see this question referring to Hadley's book.
In simple examples such as
> funs <- lapply(1:10, function(i) function() print(i))
> funs[[1]]()
[1] 10
> funs[[2]]()
[1] 10
it is possible to take such unintuitive behaviour into account.
However, I find myself frequently falling into this trap during daily development. I follow a rather functional programming style, which means that I often have a function A returning a function B, where B is in some way depending on the parameters with which A was called. The dependency is not as easy to see as in the above example, since calculations are complex and there are multiple parameters.
Overlooking such an issue leads to difficult-to-debug problems, since all calculations run smoothly, except that the result is incorrect. Only an explicit validation of the results reveals the problem.
On top of that, even when I have noticed such a problem, I am never really sure which variables I need to force and which I don't.
How can I make sure not to fall into this trap? Are there any programming patterns that prevent this or that at least make sure that I notice that there is a problem?
You are creating functions with implicit parameters, which isn't necessarily best practice. In your example, the implicit parameter is i. Another way to rework it would be:
library(functional)
myprint <- function(x) print(x)
funs <- lapply(1:10, function(i) Curry(myprint, i))
funs[[1]]()
# [1] 1
funs[[2]]()
# [1] 2
Here, we explicitly specify the parameters of the function by using Curry. Note that we could have curried print directly, but didn't here for illustrative purposes.
Curry creates a new version of the function with parameters pre-specified. This makes the parameter specification explicit and avoids the potential issues you are running into because Curry forces evaluations (there is a version that doesn't, but it wouldn't help here).
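If you prefer not to depend on the functional package, a bare-bones stand-in takes a few lines of base R; the key point is that list(...) evaluates the arguments immediately, which is exactly what defeats the laziness (partial is a made-up name):
partial <- function(f, ...) {
  args <- list(...)   # the promises are forced right here
  function(...) do.call(f, c(args, list(...)))
}
funs <- lapply(1:10, function(i) partial(print, i))
funs[[1]]()
# [1] 1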
Another option is to capture the entire environment of the parent function, copy it, and make it the parent env of your new function:
funs2 <- lapply(
  1:10, function(i) {
    fun.res <- function() print(i)
    environment(fun.res) <- list2env(as.list(environment()))  # force parent env copy
    fun.res
  }
)
funs2[[1]]()
# [1] 1
funs2[[2]]()
# [1] 2
but I don't recommend this since you will potentially be copying a whole bunch of variables you may not even need. Worse, this gets a lot more complicated if you have nested layers of functions that create functions. The only benefit of this approach is that you can continue your implicit parameter specification, but again, that seems like bad practice to me.
As others have pointed out, this might not be the best style of programming in R. But one simple option is just to get into the habit of forcing everything. If you do this, realize that you don't need to actually call force; just evaluating the symbol will do it. To make it less ugly, you could make it a practice to start functions like this:
myfun <- function(x, y, z) {
  x; y; z
  ## code
}
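Applied to the closure example from the question, the same habit looks like this (force() merely evaluates its argument, but it states the intent explicitly):
funs <- lapply(1:10, function(i) {
  force(i)   # evaluate the promise before the closure is returned
  function() print(i)
})
funs[[1]]()
# [1] 1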
There is some work in progress to improve R's higher-order functions, like the apply functions and Reduce, in handling situations like these. Whether this makes it into R 3.2.0, to be released in a few weeks, depends on how disruptive the changes turn out to be. It should become clear in a week or so.
R has a function that helps safeguard against lazy evaluation in situations like closure creation: forceAndCall().
From the online R help documentation:
forceAndCall is intended to help defining higher order functions like apply to behave more reasonably when the result returned by the function applied is a closure that captured its arguments.
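For example, a minimal sketch (forceAndCall(n, FUN, ...) calls FUN and forces its first n arguments; it is available from R 3.2.0):
make_printer <- function(i) function() print(i)
funs <- lapply(1:10, function(i) forceAndCall(1, make_printer, i))
funs[[1]]()
# [1] 1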

Recursion using only global variables

For the sake of simplicity, SmallBasic has only global variables. It does not have locals or parameters.
Although this makes it simpler to teach or learn, it also complicates some matters, like recursive functions. I had a hard time creating a simple recursive function in SmallBasic and had to use a manual stack. This works, but it makes things more complicated and contradicts the initial goal of simplicity!
This is how I can write the factorial:
n = 5
ind = 1
fact()
TextWindow.WriteLine("fact(5)=" + f)

Sub fact
  If n = 1 Then
    f = 1
  Else
    ind = ind + 1
    keepn[ind] = n
    n = n - 1
    fact()
    f = f * keepn[ind]
    ind = ind - 1
  EndIf
EndSub
Note: I wrote it just now and it could have errors.
You see the picture. I'm manually creating a stack and using it to simulate local variables for the recursion.
Is there an easy way to create this recursive function?
I think you do have to resort to global variables to write a recursive function in SmallBasic.
I'd agree that SmallBasic's lack of function arguments is quite limiting and often makes a supposedly simple programming language quite complex to use in practice.
SmallBasic's library however is great for beginners, making it significantly easier to put things on the screen than enterprise frameworks like WinForms or WPF. The library, SmallBasicLibrary.dll, can be easily loaded into other .Net languages including VB.Net, C# and F#. Simply create a console application and add a reference to the library and then use import/using/open against the Library namespace.
While teaching my kids programming I started with SmallBasic, they loved the Turtle functionality, but then quickly switched to F# which has first-class support for functions and far less ceremony when compared to VB.Net or C#. Having to explain public static void Main to a 7yo before they could print "Hello World" just wasn't an attractive option to me.
As an experiment I've also created an alternative SmallBasic compiler implementation which you may find interesting as it includes support for function arguments, tuples and pattern matching.
I think it's worth noting that creating a recursive function this way, i.e. with only global variables and a manual stack, is very educational in its own right. This is close to the way assembly works, so from that perspective, having to do things this way could actually be considered a feature...

Smart design of a math parser?

What is the smartest way to design a math parser? By that I mean a function that takes a math string (like "2 + 3 / 2 + (2 * 5)") and returns the calculated value. I did write one in VB6 ages ago, but it ended up being way too bloated and not very portable (or smart, for that matter...). General ideas, pseudocode or real code are appreciated.
A pretty good approach involves two steps. The first step converts the expression from infix to postfix notation (e.g. via Dijkstra's shunting-yard algorithm). Once that's done, it's pretty trivial to write a postfix evaluator.
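As a minimal sketch of those two steps (my own illustration in R, assuming well-formed input limited to numbers, the four binary operators and parentheses; note the higher-OR-equal precedence test, which handles left-associativity):
prec <- c("+" = 1, "-" = 1, "*" = 2, "/" = 2)

to_postfix <- function(s) {
  tokens <- regmatches(s, gregexpr("[0-9.]+|[-+*/()]", s))[[1]]
  out <- character(0); ops <- character(0)
  for (tk in tokens) {
    if (grepl("^[0-9.]+$", tk)) {
      out <- c(out, tk)                # operands go straight to the output
    } else if (tk == "(") {
      ops <- c(ops, tk)
    } else if (tk == ")") {
      while (tail(ops, 1) != "(") {    # pop until the matching "("
        out <- c(out, tail(ops, 1)); ops <- head(ops, -1)
      }
      ops <- head(ops, -1)             # discard the "("
    } else {
      # pop operators of higher OR equal precedence (left-associativity)
      while (length(ops) > 0 && tail(ops, 1) != "(" &&
             prec[[tail(ops, 1)]] >= prec[[tk]]) {
        out <- c(out, tail(ops, 1)); ops <- head(ops, -1)
      }
      ops <- c(ops, tk)
    }
  }
  c(out, rev(ops))                     # flush the remaining operators
}

eval_postfix <- function(tokens) {
  stack <- numeric(0)
  for (tk in tokens) {
    if (grepl("^[0-9.]+$", tk)) {
      stack <- c(stack, as.numeric(tk))
    } else {                           # apply the operator to the top two values
      b <- tail(stack, 1); a <- tail(stack, 2)[1]
      stack <- c(head(stack, -2),
                 switch(tk, "+" = a + b, "-" = a - b, "*" = a * b, "/" = a / b))
    }
  }
  stack
}

eval_postfix(to_postfix("2 + 3 / 2 + (2 * 5)"))   # 13.5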
I wrote a few blog posts about designing a math parser. There is a general introduction, basic knowledge about grammars, sample implementation written in Ruby and a test suite. Perhaps you will find these materials useful.
You have a couple of approaches. You could generate dynamic code and execute it in order to get the answer without needing to write much code. Just perform a search on runtime generated code in .NET and there are plenty of examples around.
Alternatively you could create an actual parser and generate a little parse tree that is then used to evaluate the expression. Again this is pretty simple for basic expressions. Check out codeplex as I believe they have a math parser on there. Or just look up BNF which will include examples. Any website introducing compiler concepts will include this as a basic example.
Codeplex Expression Evaluator
If you have an "always on" application, you could just post the math string to Google and parse the result. A simple way, and maybe not what you need, but smart in some way, I guess.
I know this is old, but I came across it while trying to develop a calculator as part of a larger app, and ran into some issues using the accepted answer. The links were immensely helpful in understanding and solving this problem and should not be discounted. I was writing an Android app in Java, and for each item in the expression "string" I actually stored a String in an ArrayList as the user typed on the keypad. For the infix-to-postfix conversion I iterated through each String in the ArrayList, then evaluated the newly arranged postfix ArrayList of Strings.
This was fantastic for a small number of operands/operators, but longer calculations were consistently off, especially as the expressions started evaluating to non-integers. The provided link for infix-to-postfix conversion suggests popping the stack if the scanned item is an operator and the top-of-stack item has a higher precedence. I found that this is almost correct: popping the top-of-stack item if its precedence is higher OR EQUAL to the scanned operator finally made my calculations come out correct. Hopefully this will help anyone working on this problem, and thanks to Justin Poliey (and fas?) for providing some invaluable links.
The related question Equation (expression) parser with precedence? has some good information on how to get started with this as well.
-Adam
Assuming your input is an infix expression in string format, you could convert it to postfix and, using a pair of stacks (an operator stack and an operand stack), work out the solution from there. You can find general algorithm information at the Wikipedia link.
ANTLR is a very nice LL(*) parser generator. I recommend it highly.
Developers always want to take a clean approach and try to implement the parsing logic from the ground up, usually ending up with the Dijkstra shunting-yard algorithm. The result is neat-looking code, but it is possibly riddled with bugs. I have developed such an API, JMEP, that does all that, but it took me years to get stable code.
Even with all that work, you can see from that project page that I am seriously considering switching over to JavaCC or ANTLR, even after all the work already done.
11 years into the future from when this question was asked: If you don't want to re-invent the wheel, there are many exotic math parsers out there.
There is one that I wrote years ago which supports arithmetic operations, equation solving, differential calculus, integral calculus, basic statistics, function/formula definition, graphing, etc.
It's called ParserNG and it's free.
Evaluating an expression is as simple as:
MathExpression expr = new MathExpression("(34+32)-44/(8+9(3+2))-22");
System.out.println("result: " + expr.solve());
result: 43.16981132075472
Or using variables and calculating simple expressions:
MathExpression expr = new MathExpression("r=3;P=2*pi*r;");
System.out.println("result: " + expr.getValue("P"));
Or using functions:
MathExpression expr = new MathExpression("f(x)=39*sin(x^2)+x^3*cos(x);f(3)");
System.out.println("result: " + expr.solve());
result: -10.65717648378352
Or to evaluate the derivative at a given point (note that it does symbolic differentiation, not numerical, behind the scenes, so the accuracy is not limited by the errors of numerical approximations):
MathExpression expr = new MathExpression("f(x)=x^3*ln(x); diff(f,3,1)");
System.out.println("result: " + expr.solve());
result: 38.66253179403897
Which differentiates x^3 * ln(x) once at x=3.
The number of times you can differentiate is 1 for now.
Or for numerical integration:
MathExpression expr = new MathExpression("f(x)=2*x; intg(f,1,3)");
System.out.println("result: " + expr.solve());
result: 7.999999999998261... approx: 8
This parser is decently fast and has lots of other functionality.
Work has been concluded on porting it to Swift via bindings to Objective-C, and we have used it in graphing applications amongst other iterative use cases.
DISCLAIMER: ParserNG is authored by me.
