Smart design of a math parser? - math

What is the smartest way to design a math parser? What I mean is a function that takes a math string (like "2 + 3 / 2 + (2 * 5)") and returns the calculated value. I wrote one in VB6 ages ago, but it ended up being way too bloated and not very portable (or smart, for that matter...). General ideas, pseudocode or real code are appreciated.

A pretty good approach involves two steps. The first step is converting the expression from infix to postfix notation (e.g. via Dijkstra's shunting-yard algorithm). Once that's done, it's pretty trivial to write a postfix evaluator.
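The two steps above can be sketched in Python as follows. This is a minimal illustration, not a complete parser: it assumes only the four binary operators, parentheses and unsigned decimal numbers, and the names `tokenize`, `to_postfix`, `eval_postfix` and `evaluate` are made up for the example.

```python
import operator
import re

# Precedence and implementation of each binary operator (all left-associative).
OPS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}


def tokenize(text):
    """Split the input into number, operator and parenthesis tokens."""
    tokens = re.findall(r"\d+\.?\d*|[-+*/()]", text)
    if "".join(tokens) != "".join(text.split()):
        raise ValueError("unexpected character in expression")
    return tokens


def to_postfix(tokens):
    """Step 1: shunting-yard conversion from infix tokens to postfix."""
    output, stack = [], []
    for tok in tokens:
        if tok in OPS:
            # Pop operators of higher or equal precedence (left-associativity).
            while stack and stack[-1] in OPS and OPS[stack[-1]][0] >= OPS[tok][0]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the matching "("
        else:
            output.append(float(tok))
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output


def eval_postfix(postfix):
    """Step 2: evaluate the postfix sequence with a single operand stack."""
    stack = []
    for item in postfix:
        if isinstance(item, float):
            stack.append(item)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[item][1](left, right))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


def evaluate(text):
    return eval_postfix(to_postfix(tokenize(text)))


print(evaluate("2 + 3 / 2 + (2 * 5)"))  # 13.5
```

Unary minus, functions and right-associative operators such as exponentiation are deliberately left out; each of them needs extra handling in the conversion step.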

I wrote a few blog posts about designing a math parser. There is a general introduction, basic knowledge about grammars, sample implementation written in Ruby and a test suite. Perhaps you will find these materials useful.

You have a couple of approaches. You could generate dynamic code and execute it in order to get the answer without needing to write much code. Just perform a search on runtime generated code in .NET and there are plenty of examples around.
Alternatively you could create an actual parser and generate a little parse tree that is then used to evaluate the expression. Again this is pretty simple for basic expressions. Check out codeplex as I believe they have a math parser on there. Or just look up BNF which will include examples. Any website introducing compiler concepts will include this as a basic example.
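As a rough sketch of the parse-tree approach, here is a small recursive-descent parser in Python. The grammar is an assumption chosen for illustration (expr ::= term (("+" | "-") term)*, term ::= factor (("*" | "/") factor)*, factor ::= NUMBER | "(" expr ")" | "-" factor), and the tree is represented as nested tuples rather than dedicated node classes.

```python
import re


def tokenize(text):
    """Split the input into number, operator and parenthesis tokens."""
    return re.findall(r"\d+\.?\d*|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr ::= term (("+" | "-") term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            node = (self.take(), node, self.term())
        return node

    def term(self):
        # term ::= factor (("*" | "/") factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            node = (self.take(), node, self.factor())
        return node

    def factor(self):
        # factor ::= NUMBER | "(" expr ")" | "-" factor
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok == "-":
            return ("neg", self.factor())
        if tok is None or tok in {"+", "*", "/", ")"}:
            raise SyntaxError(f"unexpected token {tok!r}")
        return float(tok)


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return tree


def evaluate(node):
    """Walk the tree: leaves are floats, inner nodes are tuples."""
    if isinstance(node, float):
        return node
    if node[0] == "neg":
        return -evaluate(node[1])
    op, left, right = node
    a, b = evaluate(left), evaluate(right)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    return a / b


print(parse("1+2*3"))                          # ('+', 1.0, ('*', 2.0, 3.0))
print(evaluate(parse("2 + 3 / 2 + (2 * 5)")))  # 13.5
```

Each grammar rule becomes one method, so operator precedence falls out of which rule calls which, and the loops in `expr` and `term` make the operators left-associative.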
Codeplex Expression Evaluator

If your application is always online, you could just send the math string to Google and parse the result. It's a simple approach, and perhaps not what you need, but smart in its own way, I guess.

I know this is old, but I came across this while developing a calculator as part of a larger app and ran into some issues using the accepted answer. The links were IMMENSELY helpful in understanding and solving this problem and should not be discounted. I was writing an Android app in Java, and for each item in the expression "string" I actually stored a String in an ArrayList as the user typed on the keypad. For the infix-to-postfix conversion, I iterated through each String in the ArrayList, then evaluated the newly arranged postfix ArrayList of Strings. This was fantastic for a small number of operands/operators, but longer calculations were consistently off, especially as the expressions started evaluating to non-integers. The provided link for infix-to-postfix conversion suggests popping the stack if the scanned item is an operator and the item on top of the stack has a higher precedence. I found that this is almost correct. Popping the top item if its precedence is higher than OR EQUAL to that of the scanned operator finally made my calculations come out correct. Hopefully this will help anyone working on this problem, and thanks to Justin Poliey (and fas?) for providing some invaluable links.
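The difference between the two popping rules is easy to reproduce. The Python sketch below is an illustration only: it works on a pre-split token list (like the ArrayList of Strings described above), leaves out parentheses, and uses a hypothetical `pop_equal` flag to switch between "higher" and "higher or equal".

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(tokens, pop_equal=True):
    """Infix-to-postfix conversion without parentheses.

    pop_equal=False pops only strictly higher precedence (the buggy rule);
    pop_equal=True also pops equal precedence (correct for left-associative
    operators).
    """
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            while stack:
                top, cur = PRECEDENCE[stack[-1]], PRECEDENCE[tok]
                if top > cur or (pop_equal and top == cur):
                    output.append(stack.pop())
                else:
                    break
            stack.append(tok)
        else:
            output.append(tok)
    while stack:
        output.append(stack.pop())
    return output


def eval_postfix(postfix):
    stack = []
    for tok in postfix:
        if tok in PRECEDENCE:
            right, left = stack.pop(), stack.pop()
            if tok == "+":
                stack.append(left + right)
            elif tok == "-":
                stack.append(left - right)
            elif tok == "*":
                stack.append(left * right)
            else:
                stack.append(left / right)
        else:
            stack.append(float(tok))
    return stack[0]


tokens = ["8", "-", "3", "-", "2"]
print(to_postfix(tokens, pop_equal=False))                # ['8', '3', '2', '-', '-']
print(eval_postfix(to_postfix(tokens, pop_equal=False)))  # 7.0, i.e. 8 - (3 - 2)
print(to_postfix(tokens, pop_equal=True))                 # ['8', '3', '-', '2', '-']
print(eval_postfix(to_postfix(tokens, pop_equal=True)))   # 3.0, i.e. (8 - 3) - 2
```

With the strict rule, chains of equal-precedence operators are grouped from the right, which only goes unnoticed for `+` and `*`. Subtraction and division expose it, which fits the observation that results went wrong once non-integers appeared.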

The related question Equation (expression) parser with precedence? has some good information on how to get started with this as well.
-Adam

Assuming your input is an infix expression in string format, you could convert it to postfix and, using a pair of stacks: an operator stack and an operand stack, work the solution from there. You can find general algorithm information at the Wikipedia link.

ANTLR is a very nice LL(*) parser generator. I recommend it highly.

Developers always want a clean approach and try to implement the parsing logic from the ground up, usually ending up with Dijkstra's shunting-yard algorithm. The result is neat-looking code, but possibly riddled with bugs. I have developed such an API, JMEP, that does all that, but it took me years to get stable code.
Even so, you can see from that project page that I am seriously considering switching over to JavaCC or ANTLR, despite all the work already done.

11 years into the future from when this question was asked: If you don't want to re-invent the wheel, there are many exotic math parsers out there.
There is one that I wrote years ago which supports arithmetic operations, equation solving, differential calculus, integral calculus, basic statistics, function/formula definition, graphing, etc.
It's called ParserNG and it's free.
Evaluating an expression is as simple as:
MathExpression expr = new MathExpression("(34+32)-44/(8+9(3+2))-22");
System.out.println("result: " + expr.solve());
result: 43.16981132075472
Or using variables and calculating simple expressions:
MathExpression expr = new MathExpression("r=3;P=2*pi*r;");
System.out.println("result: " + expr.getValue("P"));
Or using functions:
MathExpression expr = new MathExpression("f(x)=39*sin(x^2)+x^3*cos(x);f(3)");
System.out.println("result: " + expr.solve());
result: -10.65717648378352
Or to evaluate the derivative at a given point (note that it does symbolic, not numerical, differentiation behind the scenes, so the accuracy is not limited by the errors of numerical approximation):
MathExpression expr = new MathExpression("f(x)=x^3*ln(x); diff(f,3,1)");
System.out.println("result: " + expr.solve());
result: 38.66253179403897
Which differentiates x^3 * ln(x) once at x=3.
The number of times you can differentiate is 1 for now.
Or for numerical integration:
MathExpression expr = new MathExpression("f(x)=2*x; intg(f,1,3)");
System.out.println("result: " + expr.solve());
result: 7.999999999998261... approx: 8
This parser is decently fast and has lots of other functionality.
Work has been concluded on porting it to Swift via bindings to Objective C and we have used it in graphing applications amongst other iterative use-cases.
DISCLAIMER: ParserNG is authored by me.

Related

Recursion using only global variables

For the sake of simplicity, SmallBasic has only global variables. It does not have locals or parameters.
Although this makes it simpler to teach or learn, it also complicates some matters, like recursive functions. I had a hard time creating a simple recursive function in SmallBasic and had to use a manual stack. This works, but it makes things more complicated and contradicts the initial main goal of simplicity!
This is how I can write the factorial:
n = 5
ind = 1
fact()
TextWindow.WriteLine("fact(5)=" + f)
Sub fact
  If n = 1 Then
    f = 1
  Else
    ind = ind + 1
    keepn[ind] = n
    n = n - 1
    fact()
    f = f * keepn[ind]
    ind = ind - 1
  EndIf
EndSub
Note: I wrote it just now and it could have errors.
You get the picture: I'm manually creating a stack and using it to simulate local variables for the recursion.
Is there an easy way to create this recursive function?
I think you do have to resort to global variables to write a recursive function in SmallBasic.
I'd agree that SmallBasic's lack of function arguments is quite limiting and often makes a supposedly simple programming language quite complex to use in practice.
SmallBasic's library however is great for beginners, making it significantly easier to put things on the screen than enterprise frameworks like WinForms or WPF. The library, SmallBasicLibrary.dll, can be easily loaded into other .Net languages including VB.Net, C# and F#. Simply create a console application and add a reference to the library and then use import/using/open against the Library namespace.
While teaching my kids programming I started with SmallBasic, they loved the Turtle functionality, but then quickly switched to F# which has first-class support for functions and far less ceremony when compared to VB.Net or C#. Having to explain public static void Main to a 7yo before they could print "Hello World" just wasn't an attractive option to me.
As an experiment I've also created an alternative SmallBasic compiler implementation which you may find interesting as it includes support for function arguments, tuples and pattern matching.
I think it's worth noting that creating a recursive function in this way - i.e. with only global variables, using a stack - is very educational in its own right. This is closer to the way assembly works, so from that perspective having to do things this way could actually be considered a feature...
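For comparison, the same manual-stack technique can be mimicked in Python by restricting a function to global state. This is only a sketch of the idea from the question, not SmallBasic code; the names mirror the original (`n`, `f`, `ind`, `keepn`), and `n <= 1` is used instead of `n = 1` so that an input of 0 also terminates.

```python
# Global state only: fact() takes no parameters and returns nothing.
n = 0        # current argument
f = 0        # result
ind = 0      # stack pointer
keepn = {}   # manual stack of saved arguments, indexed by ind


def fact():
    global n, f, ind
    if n <= 1:
        f = 1
    else:
        ind += 1
        keepn[ind] = n       # push the current argument
        n -= 1
        fact()               # recursive call overwrites n and f
        f = f * keepn[ind]   # restore the saved argument and use it
        ind -= 1             # pop


n = 5
fact()
print("fact(5) =", f)  # fact(5) = 120
```

The saved values in `keepn` play the role that a call frame plays in a language with local variables, which is the similarity to assembly mentioned above.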

matrix multiplication order PVM vs MVP in graphics programming

Hi there. I was wondering why most tutorials and code samples use MVP to describe the Model-View-Projection matrix, instead of PVM, which is the actual order of multiplication in the code:
mat4 MVP = ProjectionMatrix * ViewMatrix * ModelMatrix;
gl_Position = MVP * VertexInModelSpace;
It seems much more understandable to me to write PVM instead of MVP.
Matrices don't actually have a fixed meaning, just relations between rows and columns. The meaning is freely definable by the developers. The MVP order follows from standard mathematical conventions. But since nothing says you cannot define the vectors as columns instead of rows, nothing precludes the ordering you describe either.
Clarification: changing the notation transposes the meaning, so the following applies:
M_{mvp}^T = M_{pvm}
By the definition of matrix multiplication, the following rule kicks in:
(AB)^T = B^T A^T
Since B can itself be another matrix product, the rule applies recursively to chains of any length. This essentially means that, by changing notation, you have swapped the multiplication order.
It's a bit like looking at the problem from the outside versus from the inside. In this case you're thinking as an outside observer, whereas the other way around one observes things from the standpoint of the first operator in the chain. Personally I think the notation you use may be more intuitive for this specific task; the other is just far more common, mainly because all the mathematics books I have ever seen use this convention, so blame the mathematicians.
So better stick with the more common way; it makes things more generally understandable. For example, nothing stops me from typing this answer in Finnish, but the convention on Stack Overflow is to answer in English, which makes answers understandable to most users. Use the more common form, since others may not grasp the difference, and that leads to errors.
The other problem is that matrix multiplication is not, in general, commutative:
AB != BA
So it's a good idea to stick with the convention.
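The transposition rule can be checked numerically. The Python sketch below uses small hypothetical 2x2 integer matrices as stand-ins for the projection, view and model matrices (real ones would be 4x4), and plain nested lists instead of a matrix library.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


# Hypothetical stand-ins for the projection, view and model matrices.
P = [[2, 0], [0, 3]]
V = [[1, 1], [0, 1]]
M = [[0, -1], [1, 0]]

# Column-vector convention: the vector is on the right, so the product
# is written P * V * M and the model matrix is applied first.
v_col = [[5], [7]]
pvm = matmul(matmul(P, V), M)
col_result = matmul(pvm, v_col)

# Row-vector convention: the vector is on the left, every matrix is
# transposed, and the product is written M^T * V^T * P^T.
v_row = transpose(v_col)
mvp = matmul(matmul(transpose(M), transpose(V)), transpose(P))
row_result = matmul(v_row, mvp)

print(transpose(pvm) == mvp)                # True: (PVM)^T = M^T V^T P^T
print(row_result == transpose(col_result))  # True: same vector, transposed
print(matmul(P, V) == matmul(V, P))         # False: order matters
```

Both conventions apply the model transform to the vertex first; only the written order of the factors differs.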

Code generation for mathematical problems [closed]

I would like to write a program that takes in a description of a mathematical (optimization) problem, parses it, and generates compact, efficient C code that solves it. I have a hacked up solution to a much smaller, more specific problem, in python, but it is ugly and just relies on templating the C code - so I have a whole mess of strings that look like
for (k = 0; k <= %s; k += %s) a[k] = v[k]/%s * a[i];
And then there is a mess of complex conditional logic, and at some point the above line gets written to solve_problem.c, after filling in the correct values of %s.
It actually gets much more complicated, because typically the problem is parameterized by matrices with certain structure, etc, and the approach above, while workable, is sort of starting to fall apart under its own weight.
So I suppose what I'm looking for is high-level advice on how to represent these sorts of problems in code, or rather just examples of other projects where this has been solved. Someone told me to use OCaml or F# and look at FFTW, but something simpler would be appreciated.
I'm sorry for being so inarticulate, but it's difficult for me to even express what I'm looking for to myself, which is, I think, the root of the problem.
for (k = 0; k <= %s; k += %s) a[k] = v[k]/%s * a[i];
You are asking for ways to represent code like the above. This could be represented by the value:
For("k", Int 0, Leq(Var "k", a), Set("k", Add(Var "k", b)),
    SetElt(Var "a", Var "k",
           Mul(Div(GetElt(Var "v", Var "k"), c), GetElt(Var "a", Var "i"))))
given a type like this:
type Expr =
| Int of int
| Var of string
| Leq of Expr * Expr
| Add of Expr * Expr
| Mul of Expr * Expr
| Div of Expr * Expr
| Set of string * Expr
| SetElt of Expr * Expr * Expr
| GetElt of Expr * Expr
| For of string * Expr * Expr * Expr * Expr
I wrote a very simple high-level VM called HLVM that you might find enlightening because it uses such representations in a simple way. The definitions are here and a bunch of tests written using those definitions are here.
This representation is far more powerful than string munging because the pattern match compiler does exhaustiveness and redundancy checking for you, making it easy to write functions over values of this Expr type including optimization passes and code generators.
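Since the question mentions a Python prototype, here is a rough Python analogue of the same idea: a tree of small node classes plus one function that prints C. It is a sketch under assumptions, not HLVM's design. The class names follow the type above, except that a single hypothetical `BinOp` node stands in for `Leq`, `Add`, `Mul` and `Div`, and Python will not check exhaustiveness the way an ML pattern-match compiler does.

```python
from dataclasses import dataclass


class Expr:
    """Base class for all syntax-tree nodes."""


@dataclass
class Int(Expr):
    value: int


@dataclass
class Var(Expr):
    name: str


@dataclass
class BinOp(Expr):
    op: str      # "+", "*", "/", "<=", ...
    left: Expr
    right: Expr


@dataclass
class GetElt(Expr):
    array: Expr
    index: Expr


@dataclass
class Set(Expr):
    name: str
    value: Expr


@dataclass
class SetElt(Expr):
    array: Expr
    index: Expr
    value: Expr


@dataclass
class For(Expr):
    var: str
    init: Expr
    cond: Expr
    step: Expr
    body: Expr


def emit(e):
    """Pretty-print a tree as C source text."""
    if isinstance(e, Int):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    if isinstance(e, BinOp):
        return f"({emit(e.left)} {e.op} {emit(e.right)})"
    if isinstance(e, GetElt):
        return f"{emit(e.array)}[{emit(e.index)}]"
    if isinstance(e, Set):
        return f"{e.name} = {emit(e.value)}"
    if isinstance(e, SetElt):
        return f"{emit(e.array)}[{emit(e.index)}] = {emit(e.value)}"
    if isinstance(e, For):
        return (f"for ({e.var} = {emit(e.init)}; {emit(e.cond)}; "
                f"{emit(e.step)}) {emit(e.body)};")
    raise TypeError(f"unknown node: {e!r}")


# The loop from the question, with placeholder names n and d and a step of 2
# standing in for the three %s values.
loop = For("k", Int(0), BinOp("<=", Var("k"), Var("n")),
           Set("k", BinOp("+", Var("k"), Int(2))),
           SetElt(Var("a"), Var("k"),
                  BinOp("*", BinOp("/", GetElt(Var("v"), Var("k")), Var("d")),
                        GetElt(Var("a"), Var("i")))))

print(emit(loop))
# for (k = 0; (k <= n); k = (k + 2)) a[k] = ((v[k] / d) * a[i]);
```

Because the loop is data rather than a format string, an optimization pass can inspect or rewrite it (for example, hoisting the division) before `emit` ever runs.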
You are trying to implement a compiler, and this is how you should approach your problem. There is an input language which describes your optimization problem, and the output language is C.
You can chop up your problem into the following tasks (not necessarily solved in this order):
Design a data structure which represents the abstract syntax for your input language.
Design a data structure which represents the abstract syntax of your output language, which in your case is (a subset of) C.
Design concrete syntax of your input language.
Implement a lexer and a parser which converts concrete syntax to abstract syntax.
Implement a pretty printer which converts the abstract syntax of your output language to concrete syntax.
Implement a compiler which takes an optimization problem, expressed in abstract syntax, into the output, again expressed in abstract syntax.
If you are not used to implementing languages and compilers you will be tempted to take shortcuts. For example, you might consider parsing using regular expressions. Or you might think it is a good idea to skip the abstract syntax, and just generate the C source directly. I strongly advise against this. Abstraction is your friend because it will make your problem manageable.
You should carefully choose the language in which you will implement the whole thing. Of course, something like Ocaml is perfect for the job. But if you do not know Ocaml already, you should just stick to whatever language you are most comfortable with. You should not try to implement parsers by hand, there are plenty of parser generators out there. It is worth learning one. You may find my PL Zoo helpful.
I don't know how much background you have in optimization, but I doubt the path you described is the way to go. Specifically, I would be surprised if you could write efficient C code to solve optimization problems, unless you are restricting yourself to specific classes of problems. Optimization typically distinguishes between different types of problems (linear vs. nonlinear; integer vs. continuous vs. mixed-integer programming), each of which typically requires a very different algorithm to solve.
You might want to look into the Microsoft Solver Foundation for some ideas. Essentially, the MSF is a general API, which allows you to declare your problem in multiple forms (OML, a declarative language for specifying optimization problems, but also C# and F#), and then feeds the problem to the appropriate solver, given the nature of the problem.
Dunno about simpler. Suggest you look at existing work in mathematical modelling. I wouldn't expect this to be simple; solver codes are difficult enough, and generating them is harder.
You need ways to specify the details of your problem, and means to assemble the parts of the answer controlled by these details.
I recommend:
Sinapse, a system for generating mathematical modelling codes; this paper talks about how the knowledge is organized and how it supports the generation of finite-differencing codes,
and
Solving finite differencing equations, an MIT thesis in the same vein.
(I worked on the Sinapse system during its initial development).
It sounds like you want something like symbolic computation. Look into implementations such as the following:
Matlab
Mathematica
TomSym
Python
In general try looking at optimization packages, many support some kind of symbolic representations.

Coding mathematical algorithms - should I use variables in the book or more descriptive ones?

I'm maintaining code for a mathematical algorithm that came from a book, with references in the comments. Is it better to have variable names that are descriptive of what the variables represent, or should the variables match what is in the book?
For a simple example, I may see this code, which reflects the variable in the book.
A_c = v*v/r
I could rewrite it as
centripetal_acceleration = velocity*velocity/radius
The advantage of the latter is that anyone looking at the code could understand it. However, the advantage of the former is that it is easier to compare the code with what is in the book. I may do this in order to double check the implementation of the algorithms, or I may want to add additional calculations.
Perhaps I am over-thinking this, and should simply use comments to describe what the variables are. I tend to favor self-documenting code however (use descriptive variable names instead of adding comments to describe what they are), but maybe this is a case where comments would be very helpful.
I know this question can be subjective, but I wondered if anyone had any guiding principles in order to make a decision, or had links to guidelines for coding math algorithms.
I would prefer to use the more descriptive variable names. You can't guarantee everyone that is going to look at the code has access to "the book". You may leave and take your copy, it may go out of print, etc. In my opinion it's better to be descriptive.
We use a lot of mathematical reference books in our work, and we reference them in comments, but we rarely use the same mathematically abbreviated variable names.
A common practise is to summarise all your variables, indexes and descriptions in a comment header before starting the code proper. eg.
// A_c = Centripetal Acceleration
// v = Velocity
// r = Radius
A_c = (v^2)/r
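One way to combine both styles, sketched in Python: keep descriptive names on the function boundary, and use the book's symbols only inside the body, next to a comment table that maps them. The function name and the "eq. N" placeholder are hypothetical; the reference would point at wherever the formula actually appears.

```python
def centripetal_acceleration(velocity, radius):
    """Return the centripetal acceleration, A_c = v^2 / r.

    Symbols follow the source text (the book, eq. N):
        A_c  centripetal acceleration
        v    velocity
        r    radius
    """
    v, r = velocity, radius  # short aliases so the formula matches the book
    A_c = v * v / r
    return A_c


print(centripetal_acceleration(velocity=10.0, radius=5.0))  # 20.0
```

Callers see self-documenting names, while anyone checking the implementation against the book can compare the single formula line symbol by symbol.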
I write a lot of mathematical software. IF I can insert in the comments a very specific reference to a book or a paper or (best) web site that explains the algorithm and defines the variable names, then I will use the SHORT names like a = v * v / r because it makes the formulas easier to read and write and verify visually.
IF not, then I will write very verbose code with lots of comments and long descriptive variable names. Essentially, my code becomes a paper that describes the algorithm (anyone remember Knuth's "Literate Programming" efforts, years ago? Though the technology for it never took off, I emulate the spirit of that effort). I use a LOT of ascii art in my comments, with box-and-arrow diagrams and other descriptive graphics. I use Jave.de -- the Java Ascii Vmumble Editor.
I will sometimes write my math with short, angry little variable names, easier to read and write for ME because I know the math, then use REFACTOR to replace the names with longer, more descriptive ones at the end, but only for code that is much more informal.
I think it depends almost entirely upon the audience for whom you're writing -- and don't ever mistake the compiler for the audience either. If your code is likely to be maintained by more or less "general purpose" programmers who may not/probably won't know much about physics so they won't recognize what v and r mean, then it's probably better to expand them to be recognizable for non-physicists. If they're going to be physicists (or, for another example, game programmers) for whom the textbook abbreviations are clear and obvious, then use the abbreviations. If you don't know/can't guess which, it's probably safer to err on the side of the names being longer and more descriptive.
I vote for the "book" version. 'v' and 'r' etc. are pretty well understood as abbreviations for velocity and radius, and are more compact.
How far would you take it?
Most (non-greek :-)) keyboards don't provide easy access to Δ, but it's valid as part of an identifier in some languages (e.g. C#):
int Δv;
int Δx;
Anyone coming afterwards and maintaining the code may curse you every day. Similarly for a lot of other symbols used in maths. So if you're not going to use those actual symbols (and I'd encourage you not to), I'd argue you ought to translate the rest, where it doesn't make for code that's too verbose.
In addition, what if you need to combine algorithms, and those algorithms have conflicting usage of variables?
A compromise could be to code and debug as contained in the book, and then perform a global search and replace for all of your variables towards the end of your development, so that it is easier to read. If you do this I would change the names of the variables slightly so that it is easier to change them later.
e.g A_c# = v#*v#/r#

Can Rebol's parse function express rules for parsing CSS2 / CSS3 fully?

Are there limitations to the power of Rebol's parse function? Would it be capable of parsing the whole CSS2 / CSS3 spec, or would it run into a theoretical impossibility when forming some of the rules?
Update after HostileFork's answer: I mean, with regexps I think it would be rather impossible; is parse much more powerful?
If yes, does that mean it would be possible to build an HTML5-compatible browser in Rebol VID?
Your question of "are there limits" is slippery. I'll try and give you "the answer" instead of just "yeah, sure"...which would be more expedient albeit not too educational. :)
Consider the following snippet. It captures the parser position into x, and then runs what's in parentheses in the DO dialect. That code re-sets x to the tail of the input if the css-parser function succeeds, or to the head of the input if the function fails. Finally, it sets the parse position to the current x. And as we know, PARSE returns true only if we're at the end of the input series when the rules finish...
parse my-css [x: (x: either css-parser x [tail x] [head x]) :x]
That's valid parse dialect code AND it returns true if (and only if) the css-parser function succeeds. Therefore, if you can write a css parser in Rebol at all, you can write it "in the parse dialect".
(This leads to the question of whether it's possible to solve a given computing problem in a Rebol function. Thankfully, computer scientists don't have to re-answer that question each time a new language pops up. You can compute anything that can be computed by a Turing machine, and nothing that can't be...and check out Alan Turing's own words, in layman's terms. CSS parsing isn't precisely the halting problem, so yeah... it can be done.)
I'll take a stab at re-framing your question:
"Is it possible to write a block of rules (which do not use PAREN!, SET-WORD!, or GET-WORD! constructs) that can be passed into the PARSE function and return TRUE on any valid CSS file and FALSE on any malformed one?"
The formal specification of what makes for good or bad CSS is put out by the W3C:
http://www.w3.org/TR/CSS2/grammar.html
But notice that even there, it's not all cut-and-dry. Their "formal" specification of color constants can't rule out #abcd, they had to write about it in the comments, in English:
/*
* There is a constraint on the color that it must
* have either 3 or 6 hex-digits (i.e., [0-9a-fA-F])
* after the "#"; e.g., "#000" is OK, but "#abcd" is not.
*/
hexcolor
: HASH S*
;
This leads us to ask if we would forgive Rebol for not being able to do that kind of recognition after we've tied PARSE's hands by taking away PAREN!/GET-WORD!/SET-WORD! (I just want to point out this kind of issue in light of your question).
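To make the distinction concrete, here is a small Python sketch that separates the grammar-level match from the side condition stated in the W3C comment. It is an illustration under simplifying assumptions: the HASH token is approximated as "#" followed by ASCII letters, digits, "_" or "-" (the real CSS2 token also allows non-ASCII characters and escapes), and the function names are made up.

```python
import re

# Simplified stand-in for the CSS2 HASH token: "#" followed by name characters.
HASH = re.compile(r"#([_a-zA-Z0-9-]+)")
HEX_DIGITS = set("0123456789abcdefABCDEF")


def matches_hash(text):
    """Grammar-level check only: is the text a HASH token?"""
    return HASH.fullmatch(text) is not None


def is_hexcolor(text):
    """Grammar match plus the constraint from the comment: 3 or 6 hex digits."""
    m = HASH.fullmatch(text)
    if m is None:
        return False
    name = m.group(1)
    return len(name) in (3, 6) and all(ch in HEX_DIGITS for ch in name)


print(matches_hash("#abcd"), is_hexcolor("#abcd"))  # True False
print(matches_hash("#000"), is_hexcolor("#000"))    # True True
```

The extra check in `is_hexcolor` plays the role that embedded code would play in a rule block: the grammar accepts the token, and a separate test rejects the values the prose constraint rules out.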
As part of the Rebol 3 parse project there's been a write-up of the Theory of Parse...
The PARSE dialect is an enhanced member of the family of Top-down parsing languages (TDPL family) including the Top-down parsing language (TDPL), the Generalized top-down parsing language (GTDPL) and the Parsing expression grammar (PEG) and uses the same "ordered choice" parsing method as the other members of the family.
As pointed out in the link above, being a member of this class makes Rebol's PARSE strictly more powerful than both regular expressions and LL parsers. I assume it's more powerful than LL(k) and LL* parsers as well, but it has been a while since I've studied this stuff and I wouldn't bet my life on it. :)
You don't really need to understand what all that means in order to make use of it to answer your "can it be done" question. Since people have claimed to parse CSS with ANTLR, and ANTLR is an LL* parser, then I'd say Rebol can do it. PAREN! is the ace-in-the-hole which lets you do "anything" if you hit a wall, but it's a slippery slope to start using it too carelessly.
PARSE should be perfectly capable of parsing the spec, should you have the motive and patience to write the rules. It'd be a bit more involved than, say, a JSON parser, but it'd be the same idea.

Resources