I'm trying to solve a CSP (constraint satisfaction problem) that is based on arbitrary context-free grammars. A quick example: let's say we have a context-free grammar with the following production rules: S->A, S->B, S->AB, A->Aa, A->a, A->aa, B->Bb, B->b, B->bb.
Now I'm looking for a word whose derivation uses specific (sub-sequences of) production rules. For example:
"B->b" must be used 3 times in the whole derivation-sequence
If "S->AB" is used, it must be followed by "B->Bb"
If "S->Aa" is used, it must directly be followed by "A->a"
I know the problem is a CSP, but I couldn't find a specific algorithm for solving it. Any ideas which (concrete) algorithm could be used? Also, I'm thinking about the right data structure: should I use an n-ary tree or an array (a CYK table)?
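To make the constraints concrete, here is a minimal sketch of how I currently picture them (assuming a derivation is modeled simply as the sequence of applied productions; the helper names are mine):

# A derivation modeled as a list of productions, each a (head, body) pair.
derivation = [("S", "AB"), ("B", "Bb"), ("B", "b"),
              ("B", "b"), ("B", "b"), ("A", "a")]

def used_n_times(deriv, rule, n):
    # e.g. "B->b" must be used exactly 3 times in the whole derivation
    return deriv.count(rule) == n

def followed_by(deriv, rule, successor):
    # e.g. every use of "S->AB" must be followed (later) by "B->Bb"
    return all(successor in deriv[i + 1:]
               for i, r in enumerate(deriv) if r == rule)

def directly_followed_by(deriv, rule, successor):
    # e.g. every use of "A->Aa" must be immediately followed by "A->a"
    return all(i + 1 < len(deriv) and deriv[i + 1] == successor
               for i, r in enumerate(deriv) if r == rule)

print(used_n_times(derivation, ("B", "b"), 3))                    # True
print(followed_by(derivation, ("S", "AB"), ("B", "Bb")))          # True
print(directly_followed_by(derivation, ("A", "Aa"), ("A", "a")))  # True (vacuously)

Checking a candidate derivation like this is easy; what I'm missing is the algorithm that searches for one satisfying all constraints.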
I'm confused. Semantically we can construct 2^(2^n) Boolean functions, but I read in Morris Mano's Digital Electronics that we can construct 2^2n combinations of minterms/maxterms. How?
Samsamp, could you point to a specific place in the book, or even better provide an exact quote? In the copy I found over the Internet I was not able to find such a claim after a quick glance. The closest thing I found is:
Since the function can be either 1 or 0 for each minterm, and since there are 2^n minterms, one can calculate the possible functions that can be formed with n variables to be 2^(2^n).
which looks OK to me.
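To make the count concrete, here is a quick brute-force check (my own illustration, not from the book) that enumerates every Boolean function of n = 2 variables by its truth table:

from itertools import product

n = 2
rows = list(product([0, 1], repeat=n))  # 2^n = 4 input rows
# A function is fixed by choosing one output bit (0 or 1) per input row,
# so there are 2^(2^n) = 16 distinct functions for n = 2.
functions = list(product([0, 1], repeat=len(rows)))
print(len(functions))  # 16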
I would like to create a grammar for parsing a toy-like formula language that resembles S-expression syntax.
I read through the "Getting Started with PyParsing" book and it included a very nice section that sort of covers a similar grammar.
Two examples of data to parse are:
sum(5,10,avg(15,20))+10
stdev(5,10)*2
Now, I have come up with a grammar that sort of parses the formula but disregards expanding the functions and operator precedence.
What would be the best practice to continue with it? Should I add parse actions for words that match oneOf the function names (sum, avg, ...)? If I build a nested list, I could do a depth-first walk of the parse results and evaluate the functions.
It's a little difficult to advise without seeing more of your code. Still, from what you describe, it sounds like you are mostly tokenizing, to recognize the various bits of punctuation and distinguishing variable names from numeric constants from algebraic operators. nestedExpr will impart some structure, but only basic parenthetical nesting - this still leaves operator precedence handling for your post-parsing work.
If you are learning about parsing infix notation, there is a succession of pyparsing examples to look through and study (at the pyparsing wiki Examples page). Start with fourFn.py, which is actually a five function infix notation parser. Look through its BNF() method, and get an understanding of how the recursive definitions work (don't worry about the pushFirst parse actions just yet). By structuring the parser this way, operator precedence gets built right into the parsed results. If you parse 4 + 2 * 3, a mere tokenizer just gives you ['4','+','2','*','3'], and then you have to figure out how to do the 2*3 before adding the 4 to get 10, and not just brute force add 4 and 2, then multiply by 3 (which gives the wrong answer of 18). The parser in fourFn.py will give you ['4','+',['2','*','3']], which is enough structure for you to know to evaluate the 2*3 part before adding it to 4.
This whole concept of parsing infix notation with precedence of operations is so common that I wrote a helper function that does most of the hard work, called operatorPrecedence. You can see how this works in the example simpleArith.py, and then move on to eval_arith.py to see the extensions needed to create an evaluator of the parsed structure. simpleBool.py is another good example showing precedence for logical terms AND'ed and OR'ed together.
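As a rough illustration of the idea (a minimal sketch, assuming a recent pyparsing where operatorPrecedence is also available under its newer name, infixNotation; this is not code from those examples):

from pyparsing import infixNotation, opAssoc, oneOf, pyparsing_common

integer = pyparsing_common.integer
# Levels are listed from highest to lowest precedence, so '*'/'/' bind
# tighter than '+'/'-', and the grouping lands right in the parse results.
expr = infixNotation(integer, [
    (oneOf("* /"), 2, opAssoc.LEFT),
    (oneOf("+ -"), 2, opAssoc.LEFT),
])

print(expr.parseString("4 + 2 * 3"))  # -> [[4, '+', [2, '*', 3]]]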
Finally, since you are doing something Excel-like, take a look at excelExpr.py. It tries to handle some of the crazy corner cases you get when trying to evaluate Excel cell references, including references to other sheets and other workbooks.
Good luck!
I'm maintaining code for a mathematical algorithm that came from a book, with references in the comments. Is it better to have variable names that are descriptive of what the variables represent, or should the variables match what is in the book?
For a simple example, I may see this code, which reflects the variable in the book.
A_c = v*v/r
I could rewrite it as
centripetal_acceleration = velocity*velocity/radius
The advantage of the latter is that anyone looking at the code can understand it. However, the advantage of the former is that it is easier to compare the code with what is in the book. I may do this in order to double-check the implementation of the algorithms, or I may want to add additional calculations.
Perhaps I am over-thinking this and should simply use comments to describe what the variables are. I tend to favor self-documenting code, however (descriptive variable names instead of comments describing what they are), but maybe this is a case where comments would be very helpful.
I know this question can be subjective, but I wondered if anyone had any guiding principles in order to make a decision, or had links to guidelines for coding math algorithms.
I would prefer to use the more descriptive variable names. You can't guarantee that everyone who is going to look at the code has access to "the book". You may leave and take your copy, it may go out of print, etc. In my opinion it's better to be descriptive.
We use a lot of mathematical reference books in our work, and we reference them in comments, but we rarely use the same mathematically abbreviated variable names.
A common practice is to summarise all your variables, indexes and descriptions in a comment header before starting the code proper, e.g.:
// A_c = Centripetal Acceleration
// v = Velocity
// r = Radius
A_c = (v^2)/r
I write a lot of mathematical software. IF I can insert in the comments a very specific reference to a book or a paper or (best) web site that explains the algorithm and defines the variable names, then I will use the SHORT names like a = v * v / r because it makes the formulas easier to read and write and verify visually.
IF not, then I will write very verbose code with lots of comments and long descriptive variable names. Essentially, my code becomes a paper that describes the algorithm (anyone remember Knuth's "Literate Programming" efforts, years ago? Though the technology for it never took off, I emulate the spirit of that effort). I use a LOT of ASCII art in my comments, with box-and-arrow diagrams and other descriptive graphics. I use JavE (jave.de), the Java ASCII Versatile Editor.
I will sometimes write my math with short, angry little variable names, easier to read and write for ME because I know the math, then use REFACTOR to replace the names with longer, more descriptive ones at the end, but only for code that is much more informal.
I think it depends almost entirely upon the audience for whom you're writing -- and don't ever mistake the compiler for the audience either. If your code is likely to be maintained by more or less "general purpose" programmers who may not/probably won't know much about physics so they won't recognize what v and r mean, then it's probably better to expand them to be recognizable for non-physicists. If they're going to be physicists (or, for another example, game programmers) for whom the textbook abbreviations are clear and obvious, then use the abbreviations. If you don't know/can't guess which, it's probably safer to err on the side of the names being longer and more descriptive.
I vote for the "book" version. 'v' and 'r' etc are pretty well understood as acronymns for velocity and radius and is more compact.
How far would you take it?
Most (non-Greek :-)) keyboards don't provide easy access to Δ, but it's valid as part of an identifier in some languages (e.g. C#):
int Δv;
int Δx;
Anyone coming afterwards and maintaining the code may curse you every day. Similarly for a lot of other symbols used in maths. So if you're not going to use those actual symbols (and I'd encourage you not to), I'd argue you ought to translate the rest, where it doesn't make for code that's too verbose.
In addition, what if you need to combine algorithms, and those algorithms have conflicting usage of variables?
A compromise could be to code and debug using the names as contained in the book, and then perform a global search-and-replace on all of your variables towards the end of your development, so that it is easier to read. If you do this, I would change the names of the variables slightly so that they are easier to replace later, e.g.
A_c# = v#*v#/r#
Are there limitations to the power of Rebol's parse function? Would it be capable of parsing the whole CSS2/CSS3 spec, or will it encounter theoretical impossibilities in forming some rules?
Update after HostileFork's answer: I mean that with regexps I think it would be rather impossible; is parse much more powerful?
If yes, does that mean it would be possible to build a browser in Rebol/VID compatible with HTML5?
Your question of "are there limits" is slippery. I'll try and give you "the answer" instead of just "yeah, sure"...which would be more expedient albeit not too educational. :)
Consider the following snippet. It captures the parser position into x, and then runs what's in parentheses in the DO dialect. That code re-sets x to the tail of the input if the css-parser function succeeds, or to the head of the input if the function fails. Finally, it sets the parse position to the current x. And as we know, PARSE returns true only if we're at the end of the input series when the rules finish...
parse my-css [x: (x: either css-parser x [tail x] [head x]) :x]
That's valid parse dialect code AND it returns true if (and only if) the css-parser function succeeds. Therefore, if you can write a css parser in Rebol at all, you can write it "in the parse dialect".
(This leads to the question of whether it's possible to solve a given computing problem in a Rebol function. Thankfully, computer scientists don't have to re-answer that question each time a new language pops up. You can compute anything that can be computed by a Turing machine, and nothing that can't be... and check out Alan Turing's own words, in layman's terms. CSS parsing isn't precisely the halting problem, so yeah... it can be done.)
I'll take a stab at re-framing your question:
"Is it possible to write a block of rules (which do not use PAREN!, SET-WORD!, or GET-WORD! constructs) that can be passed into the PARSE function and return TRUE on any valid CSS file and FALSE on any malformed one?"
The formal specification of what makes for good or bad CSS is put out by the W3C:
http://www.w3.org/TR/CSS2/grammar.html
But notice that even there, it's not all cut-and-dried. Their "formal" specification of color constants can't rule out #abcd; they had to write about it in the comments, in English:
/*
* There is a constraint on the color that it must
* have either 3 or 6 hex-digits (i.e., [0-9a-fA-F])
* after the "#"; e.g., "#000" is OK, but "#abcd" is not.
*/
hexcolor
: HASH S*
;
This leads us to ask if we would forgive Rebol for not being able to do that kind of recognition after we've tied PARSE's hands by taking away PAREN!/GET-WORD!/SET-WORD! (I just want to point out this kind of issue in light of your question).
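(As an aside, and only to make the constraint itself concrete: the 3-or-6-digit rule is a perfectly regular constraint; the W3C grammar just can't state it at that level because HASH is a single lexer token. A quick check in Python, not part of the spec:)

import re

# The comment's constraint, stated directly: exactly 3 or 6 hex digits.
hexcolor = re.compile(r'#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$')

print(bool(hexcolor.match("#000")))   # True
print(bool(hexcolor.match("#abcd")))  # False: 4 digits is not allowed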
As part of the Rebol 3 parse project there's been a write-up of the Theory of Parse...
The PARSE dialect is an enhanced member of the family of Top-down parsing languages (TDPL family) including the Top-down parsing language (TDPL), the Generalized top-down parsing language (GTDPL) and the Parsing expression grammar (PEG) and uses the same "ordered choice" parsing method as the other members of the family.
As pointed out in the link above, being a member of this class makes Rebol's PARSE strictly more powerful than both regular expressions and LL parsers. I assume it's more powerful than LL(k) and LL* parsers as well, but it has been a while since I've studied this stuff and I wouldn't bet my life on it. :)
You don't really need to understand what all that means in order to make use of it to answer your "can it be done" question. Since people have claimed to parse CSS with ANTLR, and ANTLR is an LL* parser, then I'd say Rebol can do it. PAREN! is the ace-in-the-hole which lets you do "anything" if you hit a wall, but it's a slippery slope to start using it too carelessly.
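To give a feel for what "ordered choice" means in practice, here is a tiny sketch (written in Python for neutrality; it is obviously not Rebol's actual implementation):

# "Ordered choice", as in PEGs and Rebol's PARSE: the first alternative
# that matches wins, and later alternatives are never reconsidered.
def ordered_choice(parsers):
    def parse(s, i):
        for p in parsers:
            r = p(s, i)
            if r is not None:
                return r  # commit to the first success
        return None
    return parse

def lit(t):
    # Match a literal string at position i; return the new position.
    return lambda s, i: i + len(t) if s.startswith(t, i) else None

p = ordered_choice([lit("a"), lit("ab")])
print(p("ab", 0))  # -> 1: "a" matched first, so "ab" is never tried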
Should be perfectly capable of parsing the spec, should you have motive and patience to write the rules. It'd be a bit more involved than, say, a JSON parser, but it'd be the same idea.
What is the smartest way to design a math parser? What I mean is a function that takes a math string (like "2 + 3 / 2 + (2 * 5)") and returns the calculated value. I did write one in VB6 ages ago, but it ended up being way too bloated and not very portable (or smart, for that matter...). General ideas, pseudocode or real code are appreciated.
A pretty good approach would involve two steps. The first step involves converting the expression from infix to postfix notation (e.g. via Dijkstra's shunting yard algorithm). Once that's done, it's pretty trivial to write a postfix evaluator.
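A minimal sketch of both steps (a bare-bones shunting yard handling only binary left-associative operators and parentheses, plus a postfix evaluator; no unary minus or error handling):

import operator

PREC = {'+': 1, '-': 1, '*': 2, '/': 2}
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def to_postfix(tokens):
    # Shunting yard: convert an infix token list to postfix (RPN).
    out, ops = [], []
    for t in tokens:
        if t in PREC:
            # For left-associative operators, pop while the stacked
            # operator has higher OR EQUAL precedence.
            while ops and ops[-1] != '(' and PREC[ops[-1]] >= PREC[t]:
                out.append(ops.pop())
            ops.append(t)
        elif t == '(':
            ops.append(t)
        elif t == ')':
            while ops[-1] != '(':
                out.append(ops.pop())
            ops.pop()  # discard the matching '('
        else:
            out.append(t)  # operand
    out.extend(reversed(ops))
    return out

def eval_postfix(tokens):
    # Trivial postfix evaluation with a single operand stack.
    stack = []
    for t in tokens:
        if t in OPS:
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[t](a, b))
        else:
            stack.append(float(t))
    return stack[0]

tokens = "2 + 3 / 2 + ( 2 * 5 )".split()
print(eval_postfix(to_postfix(tokens)))  # 13.5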
I wrote a few blog posts about designing a math parser. There is a general introduction, basic knowledge about grammars, sample implementation written in Ruby and a test suite. Perhaps you will find these materials useful.
You have a couple of approaches. You could generate dynamic code and execute it in order to get the answer without needing to write much code. Just do a search on runtime-generated code in .NET and there are plenty of examples around.
Alternatively you could create an actual parser and generate a little parse tree that is then used to evaluate the expression. Again, this is pretty simple for basic expressions. Check out CodePlex, as I believe there is a math parser on there. Or just look up BNF, which will include examples. Any website introducing compiler concepts will include this as a basic example.
Codeplex Expression Evaluator
If you have an "always on" application, just post the math string to google and parse the result. Simple way but not sure if that's what you need - but smart in some way i guess.
I know this is old, but I came across this while trying to develop a calculator as part of a larger app and ran across some issues using the accepted answer. The links were IMMENSELY helpful in understanding and solving this problem and should not be discounted. I was writing an Android app in Java, and for each item in the expression "string" I actually stored a String in an ArrayList as the user typed on the keypad. For the infix-to-postfix conversion, I iterated through each String in the ArrayList, then evaluated the newly arranged postfix ArrayList of Strings. This was fantastic for a small number of operands/operators, but longer calculations were consistently off, especially as the expressions started evaluating to non-integers. In the provided link for infix-to-postfix conversion, it suggests popping the stack if the scanned item is an operator and the topStack item has a higher precedence. I found that this is almost correct. Popping the topStack item if its precedence is higher OR EQUAL to the scanned operator finally made my calculations come out correct. Hopefully this will help anyone working on this problem, and thanks to Justin Poliey (and fas?) for providing some invaluable links.
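To see why higher-or-equal matters for left-associative operators, here is a stripped-down comparison of the two pop rules (the same idea as the sketch earlier in this thread, reduced to the one comparison; parentheses omitted):

PREC = {'+': 1, '-': 1, '*': 2, '/': 2}

def to_postfix(tokens, strict):
    # strict=True pops only on strictly higher precedence -- the buggy rule.
    out, ops = [], []
    for t in tokens:
        if t in PREC:
            while ops and (PREC[ops[-1]] > PREC[t] if strict
                           else PREC[ops[-1]] >= PREC[t]):
                out.append(ops.pop())
            ops.append(t)
        else:
            out.append(t)
    return out + ops[::-1]

print(to_postfix("8 - 3 - 2".split(), strict=True))   # ['8','3','2','-','-'] -> 7, wrong
print(to_postfix("8 - 3 - 2".split(), strict=False))  # ['8','3','-','2','-'] -> 3, right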
The related question Equation (expression) parser with precedence? has some good information on how to get started with this as well.
-Adam
Assuming your input is an infix expression in string format, you could convert it to postfix and, using a pair of stacks (an operator stack and an operand stack), work the solution from there. You can find general algorithm information at the Wikipedia link.
ANTLR is a very nice LL(*) parser generator. I recommend it highly.
Developers always want to take a clean approach and try to implement the parsing logic from the ground up, usually ending up with the Dijkstra shunting-yard algorithm. The result is neat-looking code, but it is possibly riddled with bugs. I have developed such an API, JMEP, that does all that, but it took me years to have stable code.
Even with all that work, you can see from that project page that I am seriously considering switching over to JavaCC or ANTLR, even after all the work already done.
11 years into the future from when this question was asked: If you don't want to re-invent the wheel, there are many exotic math parsers out there.
There is one that I wrote years ago which supports arithmetic operations, equation solving, differential calculus, integral calculus, basic statistics, function/formula definition, graphing, etc.
It's called ParserNG and it's free.
Evaluating an expression is as simple as:
MathExpression expr = new MathExpression("(34+32)-44/(8+9(3+2))-22");
System.out.println("result: " + expr.solve());
result: 43.16981132075472
Or using variables and calculating simple expressions:
MathExpression expr = new MathExpression("r=3;P=2*pi*r;");
System.out.println("result: " + expr.getValue("P"));
Or using functions:
MathExpression expr = new MathExpression("f(x)=39*sin(x^2)+x^3*cos(x);f(3)");
System.out.println("result: " + expr.solve());
result: -10.65717648378352
Or to evaluate the derivative at a given point (note that it does symbolic differentiation, not numerical, behind the scenes, so the accuracy is not limited by the errors of numerical approximations):
MathExpression expr = new MathExpression("f(x)=x^3*ln(x); diff(f,3,1)");
System.out.println("result: " + expr.solve());
result: 38.66253179403897
Which differentiates x^3 * ln(x) once at x=3.
The number of times you can differentiate is 1 for now.
Or for numerical integration:
MathExpression expr = new MathExpression("f(x)=2*x; intg(f,1,3)");
System.out.println("result: " + expr.solve());
result: 7.999999999998261... approx: 8
This parser is decently fast and has lots of other functionality.
Work has been concluded on porting it to Swift via bindings to Objective-C, and we have used it in graphing applications amongst other iterative use cases.
DISCLAIMER: ParserNG is authored by me.