Can Rebol's parse function create rules for parsing CSS2/CSS3 fully? - css

Are there limitations to the power of Rebol's parse function? Would it be capable of parsing the whole CSS2/CSS3 spec, or would it encounter theoretical impossibilities in forming some rules?
Update after HostileFork's answer: I mean that with regexps I think it would be rather impossible; is parse much more powerful?
If yes, does that mean it would be possible to build a browser in Rebol VID compatible with HTML5?

Your question of "are there limits" is slippery. I'll try and give you "the answer" instead of just "yeah, sure"...which would be more expedient albeit not too educational. :)
Consider the following snippet. It captures the parser position into x, and then runs what's in parentheses in the DO dialect. That code re-sets x to the tail of the input if the css-parser function succeeds, or to the head of the input if the function fails. Finally, it sets the parse position to the current x. And as we know, PARSE returns true only if we're at the end of the input series when the rules finish...
parse my-css [x: (x: either css-parser x [tail x] [head x]) :x]
That's valid parse dialect code AND it returns true if (and only if) the css-parser function succeeds. Therefore, if you can write a css parser in Rebol at all, you can write it "in the parse dialect".
(This leads to the question of whether it's possible to solve a given computing problem in a Rebol function. Thankfully, computer scientists don't have to re-answer that question each time a new language pops up. You can compute anything that can be computed by a Turing machine, and nothing that can't...and check out Alan Turing's own words, in layman's terms. CSS parsing isn't precisely the halting problem, so yeah... it can be done.)
I'll take a stab at re-framing your question:
"Is it possible to write a block of rules (which do not use PAREN!, SET-WORD!, or GET-WORD! constructs) that can be passed into the PARSE function and return TRUE on any valid CSS file and FALSE on any malformed one?"
The formal specification of what makes for good or bad CSS is put out by the W3C:
http://www.w3.org/TR/CSS2/grammar.html
But notice that even there, it's not all cut-and-dried. Their "formal" specification of color constants can't rule out #abcd; they had to write about it in the comments, in English:
/*
* There is a constraint on the color that it must
* have either 3 or 6 hex-digits (i.e., [0-9a-fA-F])
* after the "#"; e.g., "#000" is OK, but "#abcd" is not.
*/
hexcolor
: HASH S*
;
This leads us to ask whether we would forgive Rebol for not being able to do that kind of recognition once we've tied PARSE's hands by taking away PAREN!/GET-WORD!/SET-WORD!. (I just want to point out this kind of issue in light of your question.)
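For illustration, here's what that "3 or 6 hex digits" recognition looks like as an ordered choice, sketched in Python (the function name and regexes are mine, standing in for a PARSE/PEG rule, not anything from the W3C grammar):

```python
import re

def parse_hexcolor(s):
    """Ordered choice: try the 6-digit form first, then the 3-digit form.
    This expresses directly the constraint that the flex-style grammar
    above had to relegate to a comment."""
    for pattern in (r'#[0-9a-fA-F]{6}', r'#[0-9a-fA-F]{3}'):
        if re.fullmatch(pattern, s):
            return True
    return False
```

The 6-digit alternative is tried first; only if it fails to match the whole token does the 3-digit alternative get a chance, so #000 and #abcdef are accepted while #abcd is rejected.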
As part of the Rebol 3 parse project there's been a write-up of the Theory of Parse...
The PARSE dialect is an enhanced member of the family of Top-down parsing languages (TDPL family) including the Top-down parsing language (TDPL), the Generalized top-down parsing language (GTDPL) and the Parsing expression grammar (PEG) and uses the same "ordered choice" parsing method as the other members of the family.
As pointed out in the link above, being a member of this class makes Rebol's PARSE strictly more powerful than both regular expressions and LL parsers. I assume it's more powerful than LL(k) and LL* parsers as well, but it has been a while since I've studied this stuff and I wouldn't bet my life on it. :)
You don't really need to understand what all that means in order to make use of it to answer your "can it be done" question. Since people have claimed to parse CSS with ANTLR, and ANTLR is an LL* parser, then I'd say Rebol can do it. PAREN! is the ace-in-the-hole which lets you do "anything" if you hit a wall, but it's a slippery slope to start using it too carelessly.

Should be perfectly capable of parsing the spec, should you have motive and patience to write the rules. It'd be a bit more involved than, say, a JSON parser, but it'd be the same idea.

What does the "note" command do in Isabelle and when is it necessary?

I am trying to learn the Isar language (as of Isabelle 2020) and to understand the note command. It seems to be a fundamental element of the language, since a lot of the "derived elements" are defined based on it.
I am not sure what it does in terms of its English equivalent:
When is it necessary to note something? Aren't all the facts and/or assumptions known at this point automatically used, or do I have to explicitly note certain facts before I can use them? If so, which ones do and do not need to be noted?
What are the things to be noted?
In the documentation Isar-ref.PDF, appendix A1.2 (pp319), under "Primitives", it says:
note a = b reconsider and declare facts
so it seems that an equality is to be noted.
Then, in A 1.3 on the same page, it says:
from a ≡ note a then
...
from this ≡ then
Here note doesn't seem to work on an equality. Also, there seems to be an endless loop:
from a ≡ note a then ≡ note a from this ≡ note a note this then ...
(then expands to from and back to then).
then on the same page, there is:
also ~ note calculation = this
In English, the word "also" (and to some extent "note that") is optional. This is partly why it is so confusing to me here because it seems to be required, and I am not sure what it does. (I can understand other things such as assume as it moves some facts into the context.)
I've seen this and that used in a note command (e.g. here: Why are the following trivial self-equalities needed in the Isabelle/Isar proof?). This confused me a lot.
Is there a dictionary-style (English dictionary) explanation somewhere of what note does to the things it operates on (this, that, calculation, etc.)?
... and why is note required?
(The above appendix is the closest thing to a specification I could find.)
No one ever uses that in the way of the question. It hurts to look at that proof. The Isar ref is telling you how things are defined internally, but it is a terrible introduction to Isar. Don't read it. Forget everything you read there. Really.
The right way to start Isar is the prog-prove tutorial, or slides used for teaching.
No facts are used implicitly. Isabelle only uses the facts that you tell it to use. That is why you need using, or have to pass theorems as arguments to tactics.
The usual way to use note is to give a name to something that does not have a name yet, like:
case Suc
note meaningfull_name_one = this(1) and meaningfull_name_two = this(2)
or variants of that. This allows you to refer to theorems by the name meaningfull_name_one instead of writing down the full expression. It is not necessary, but it is easier (also maintenance-wise). When you have 10 assumptions, you don't want to use prems(1) or Suc(8) or whatever.
As mentioned by @ManuelEberl,
note [simp] = ...
is useful to locally declare a theorem as simp.
Short summary:
this ::= last fact
note H = ... ::= name facts
that ::= name of the fact generated by obtain
Don't ever use calculation directly.
It is possible to abuse note as in the question, by not giving a name to things and relying on its side effects. Don't do it.
One reason is that you can create new theorems with low-level operations such as [OF _]. To simplify reusing such a theorem, you can give it a name.

In GHC's STG output with -O2, what's this sequence following Str=DmdType all about?

(Misleading title: it's only one of a plethora of inter-related questions below. These may sound like asking for a full reference manual, but keep in mind that for this topic there is no reference manual other than the entirety of GHC's source code for its STG pipeline stage, plus the collective accumulated experience of others/"insiders".)
I'm exploring "transpiling" Haskell (from scratch for fun/learning, ignoring existing projects; target language/s similarly high-level / "already-fit-for-STG-machine" with existing GC + lambdas/func-values + closures) and so I'm trying to become ever more familiar with GHC's STG IR. Having repeatedly gone through the dozen-or-two online articles/videos of varying age, depth, detail that actually deal with the topic (plus the original paper, plus StgSyn.hs), and understanding many-perhaps-most basic principles, seeing -ddump-stged output still baffles me in various parts (I won't manually parse it but reuse GHC API's in-memory AST later on of course) --- mostly I think I'm stuck mapping my "roughly known" concepts to the "still-foreign" abbreviated/codified identifiers of that IR. If you know your way around STG a bit, mind looking at the following mini-sample to clarify a few open questions and help further solidify my (and future searchers') grasp?
From a most simple .hs module, I have -ddump-stged twice, first (on the left) with -O0 and then (on the right) with -O2, both captured in this diff.
Walking through everything def-by-def..
Lines L_|R5-11: so in O2, testX1 and testX2 seem to be global constants/literals for the integers 4 and 5 --- O0 doesn't have them. Curious!
Is Str=DmdType something about strictness? "Strictness is of type on-demand" or some such? But then a top-level/heap-ish/"global" constant literal can't be "lazy" can it.. (one of the things where I can't just casually Ctrl+F in StgSyn.hs --- it's not in there! which is odd in its own way, how come there's STG syntax not in StgSyn.hs)
Caf have a rough idea about constant-applicative-forms, but Unf=OtherCon? "Other constructor" (unboxed/native Type.S#-related?) ..
Line L6|R14: Surprised to still see type-class information in there (Num), is that "just info/annotation" or is this crucial for any of the built-in code-gens to set up some "dictionary" lookup machinery at runtime? (I'd sure hope by the late STG / pre-CMM stage that would be resolved and inlined already where possible at least in O2. After all GHC has also decided to type-default 4 and 5 to Integer). Generally speaking I understand STG is "untyped" other than denoting prim types, saturated cons, perhaps strings (looks like it later on at the bottom), so such "typeclass" annotations can only be.. I guess for readers to find their way around the ddump-ed *.stg. But correct me if not.
GblId probably just "global identifier" aka top-level CAF right? Arity clear.
Line L7|R18: now Str=DmdType for testX is, only in O2, followed by a freakish <S(LLC(C(S))LLLL),U(1*C1(C1(U)),A,1*C1(C1(U)),A,A,A,C(U))><L,U>! What's that, SKI calculus? ;D no seriously, LLC.. LLLL.. stack or other memory layout hints for CMM? Any idea? Must be some optimization, would like to understand which-and-how..
Line L8|R20: $dNum_sGM (left) and $dNum_sIx (right) have me a bit concerned, they don't seem to be "defined at the module level" here anywhere. Typeclass "method dispatch dictionary lookup" kind of thing? Would eg. CMM take this together with the above Num annotation to set things up? It always appears together with the input func arg.
The whole function "body" for both left and right can be seen here essentially as "3 lets with a lambda-ish form for 3 atoms, 2 of which are statically known literal-constants" --- I suppose this is standard and to be expected in the STG IR AST? For the first of these, funnily enough we could say that O0 has "inlined the global (what is testX1 or testX2 in O2) and O2 hasn't" (making the latter much shorter as that applies to both these constant literals).
I've only ever seen Occ=Once, what are the others and how to interpret? Once for one isn't even in StgSyn.hs..
Now LclId a counterpart to the earlier encountered GblId. That's denoting the scope of the identifier? Could it also be anything else, in this expression context? As in: if traversing the AST I roughly know how deep I am, I can ignore this since if I'm at the top-level it must be GblId and otherwise LclId? Hm.. maybe better take what STG gives me but then I need to be sure about the semantics and possibilities.. guys, using StgSyn.hs I have the wrong source file, right? Nothing on this in there either.. (always hopeful as its comments are quite well-done)
the rest is just metadata as string constants, OK.. oh wait, look at O2, there's Str=DmdType m1 and Str=DmdType m, what's the m/m1 about, another thing I don't see "defined anywhere at the module level" here? And it's not in O0..
still going strong? Merely a bonus question (for now), tell us about srt:SRT:[] ;)
Just a few tidbits - a full answer is quite beyond my knowledge.
The type of your function is
testX :: GHC.Num.Num a => a -> a
It’s compiled to a function with two arguments: a dictionary of the Num type class, and the actual argument.
The $d… names stand for dictionaries of type-class instances. The <S(LLC(C(S))LLLL),… annotations are strictness information about the function arguments. They basically say which parts of the argument will be used by your function and which won't. It looks a bit weird here because it contains information about all the class-instance members.
Some of this is explained here:
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/Demand
The srt:SRT: is the "Static Reference Table", i.e. the list of free variables of the expression - in your case, always [].

Excel-like toy-formula parsing

I would like to create a grammar for parsing a toy like formula language that resembles S-expression syntax.
I read through the "Getting Started with PyParsing" book and it included a very nice section that sort of covers a similar grammar.
Two examples of data to parse are:
sum(5,10,avg(15,20))+10
stdev(5,10)*2
Now, I have come up with a grammar that sort-of parses the formula but disregards
expanding the functions and operator precedence.
What would be the best practice to continue with it: should I add parseActions
for words that match oneOf the function names (sum, avg, ...)? If I build a nested
list, could I do a depth-first walk of the parse results and evaluate the functions?
It's a little difficult to advise without seeing more of your code. Still, from what you describe, it sounds like you are mostly tokenizing, to recognize the various bits of punctuation and distinguishing variable names from numeric constants from algebraic operators. nestedExpr will impart some structure, but only basic parenthetical nesting - this still leaves operator precedence handling for your post-parsing work.
If you are learning about parsing infix notation, there is a succession of pyparsing examples to look through and study (at the pyparsing wiki Examples page). Start with fourFn.py, which is actually a five function infix notation parser. Look through its BNF() method, and get an understanding of how the recursive definitions work (don't worry about the pushFirst parse actions just yet). By structuring the parser this way, operator precedence gets built right into the parsed results. If you parse 4 + 2 * 3, a mere tokenizer just gives you ['4','+','2','*','3'], and then you have to figure out how to do the 2*3 before adding the 4 to get 10, and not just brute force add 4 and 2, then multiply by 3 (which gives the wrong answer of 18). The parser in fourFn.py will give you ['4','+',['2','*','3']], which is enough structure for you to know to evaluate the 2*3 part before adding it to 4.
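As a sketch of that post-parse evaluation step, here is a plain-Python walker for such nested results (eval_nested is a hypothetical helper of my own, not part of pyparsing):

```python
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def eval_nested(parsed):
    """Evaluate a nested token list like ['4', '+', ['2', '*', '3']]:
    a bare token is a number; a list alternates operands and operators."""
    if isinstance(parsed, str):
        return float(parsed)
    result = eval_nested(parsed[0])
    # consume (operator, operand) pairs left to right
    for op, operand in zip(parsed[1::2], parsed[2::2]):
        result = OPS[op](result, eval_nested(operand))
    return result
```

Because the parser already grouped ['2', '*', '3'] into its own sublist, the recursion evaluates it to 6 before the addition ever sees it, so no precedence logic is needed here at all.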
This whole concept of parsing infix notation with precedence of operations is so common that I wrote a helper function that does most of the hard work, called operatorPrecedence. You can see how this works in the example simpleArith.py, and then move on to eval_arith.py to see the extensions needed to create an evaluator of the parsed structure. simpleBool.py is another good example, showing precedence for logical terms AND'ed and OR'ed together.
Finally, since you are doing something Excel-like, take a look at excelExpr.py. It tries to handle some of the crazy corner cases you get when trying to evaluate Excel cell references, including references to other sheets and other workbooks.
Good luck!

What, if any, is wrong with this approach to declarative I/O

I'm not sure exactly how much this falls under 'programming' as opposed to 'programming language design'. But the issue is this:
Say, for simplicity's sake, we have two 'special' lists/arrays/vectors/whatever, which we'll just call 'ports': one called stdIn and another stdOut. These conceptually represent, respectively:
All the user input given to the program in the duration of the program
All the output written to the terminal during the duration of the program
In Haskell-inspired pseudocode, it should then be possible to create this wholly declarative program:
let stdOut = ["please input a number",
"and please input another number",
"The product of both numbers is: " ++ stdIn[0] * stdIn[1]]
Which would do the expected, ask for two numbers, and print their product. The trick being that stdOut represents the list of strings written to the terminal at the completion of the program, and stdIn the list of input strings. Type errors and the fact that there needs to be some safeguard to only print the next line after a new line has been entered left aside here for simplicity's sake, it's probably easy enough to solve that.
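The laziness this depends on can be modeled concretely; here is a rough Python sketch (names and mechanism are my own), where indexing stdIn is what actually pulls input:

```python
class LazyStdIn:
    """A list-like view of stdin that only demands a line when an index
    is first touched, mimicking the lazy-list semantics the pseudocode
    relies on."""
    def __init__(self, read_line):
        self._read_line = read_line  # e.g. input, or a scripted source
        self._lines = []

    def __getitem__(self, i):
        while len(self._lines) <= i:
            self._lines.append(self._read_line())
        return self._lines[i]

def run(std_in):
    # The declarative program: only building the third string forces
    # the two reads, so the prompts can be emitted first.
    return ["please input a number",
            "and please input another number",
            "The product of both numbers is: "
            + str(int(std_in[0]) * int(std_in[1]))]
```

With a strict list the product line would demand both inputs before the prompts could be printed; the delayed indexing is what keeps the interleaving right.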
So, before I go off to implement this idea, are there any pitfalls to it that I overlooked? I'm not aware of a similar construct already existing, so it would be naïve not to take into account that there may be an obvious pitfall I overlooked.
Otherwise, I know that of course:
let stdOut = [stdIn[50],"Hello, World!"]
Would be an error if these results need to be interwoven in a similar fashion as above.
A similar approach was used in early versions of Haskell, except that the elements of the stdin and stdout channels were not strings but generic IO 'actions'--in fact, input and output were generalized to 'response' and 'request'. As long as both channels are lazy (i.e. they are actually 'iterators' or 'enumerators'), the runtime can simply walk the request channel, act on each request and tack appropriate responses onto the response channel. Unfortunately, the system was very hard to use, so it was scrapped in favor of monadic IO. See these papers:
Hudak, P., and Sundaresh, R. On the expressiveness of purely-functional I/O systems. Tech. Rep. YALEU/DCS/RR-665, Department of Computer Science, Yale University, Mar. 1989.
Peyton Jones, S. Tackling the Awkward Squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell. In Engineering theories of software construction, 2002, pp. 47--96.
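A toy model of that request/response walk, with a Python generator standing in for the lazy program (the 'put'/'get' request tags are my invention, not the historical Request/Response constructors):

```python
def run_dialogue(program, input_lines):
    """Walk the program's request stream, acting on each request and
    feeding a response back in - a rough model of Haskell 1.0 stream I/O."""
    inputs = iter(input_lines)
    outputs = []
    gen = program()
    request = next(gen)                  # first request
    while True:
        try:
            if request[0] == 'put':      # output request: no response needed
                outputs.append(request[1])
                request = gen.send(None)
            else:                        # 'get': respond with an input line
                request = gen.send(next(inputs))
        except StopIteration:
            break
    return outputs

def greet():
    yield ('put', 'What is your name?')
    name = yield ('get',)
    yield ('put', 'Hello, ' + name + '!')
```

The runtime never looks ahead in the request stream, so the program cannot observe an input before it has issued the corresponding 'get' - which is exactly the synchronization discipline the monadic replacement later made implicit.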
The approach you're describing sounds like "Dialogs." In their award-winning 1993 paper Imperative Functional Programming, Phil Wadler and Simon Peyton Jones give some examples where dialogs really don't work very well, and they explain why monadic I/O is better.
I don't see how you will weave them considering this example compared to your own:
let stdOut = ["Welcome to the program which multiplies.",
"please input a number",
"and please input another number",
"The product of both numbers is: " ++ stdIn[0] * stdIn[1]]
Should the program prompt for the number represented by stdIn[0] after outputting one line (as in your example) or two lines? If the index 0 represents the 0th input from stdin, then it seems something similar to:
let stdOut = ["Welcome to the program which multiplies.",
"please input a number",
some_annotation(stdIn[0]),
"and please input another number",
some_annotation(stdIn[1]),
"The product of both numbers is: " ++ stdIn[0] * stdIn[1]]
will be required in order to coordinate the timing of output and input.
I like your idea. Replace some_annotation with your preference, perhaps something akin to "synchronize"? I couldn't come up with an incisive word for it.
This approach seems to be the "most obvious" way to add I/O to a pure λ-calculus, and other people have mentioned that something along those lines has been tried in Haskell and Miranda.
However, I am aware of a language, not based on a λ-calculus, that still uses a very similar system:
How to handle input and output in a
language without side effects? In a
certain sense, input and output aren't
side effects; they are, so to speak,
front- and back-effects. (...) [A program is]
a function from the space
of possible inputs to the space of
possible outputs.
Input and output streams are
represented as lists of natural
numbers from 0 to 255, each
corresponding to one byte. End-of-file
is represented by the value 256, not
by end of list. (This is because it is
often easier to deal with EOF as a
character than as a special case.
Nevertheless, I wonder if it wouldn't
be better to use end-of-list.)
(...)
It's not difficult to write
interactive programs (...) [but] doing
so is, technically speaking, a sin.
(...) In a referentially transparent
language, anything not explicitly
synchronized is fair game for
evaluation in any order whatsoever, at
the run-time system's discretion.
(...) The most obvious way of writing
this particular program is to cons
together the "Hello, [name]!" string
in an expression which is conditioned
on receipt of a newline. If you do
this you are safe, because there's no
way for any evaluator to prove in
advance that the user will ever type a
newline.
(...)
So there's no practical problem with
interactive software. Nevertheless,
there's something unpleasant about the
way the second case is prevented. A
referentially transparent program
should not have to rely on lazy
evaluation in order to work properly.
How to escape this moral dilemma? The
hard way is to switch to a more
sophisticated I/O system, perhaps
based on Haskell's, in which input and
output are explicitly synchronized.
I'm rather disinclined to do this, as
I much prefer the simplicity of the
current system. The easy way out is to
write batch programs which happen to
work well interactively. This is
mainly just a matter of not prompting
the user.
Perhaps you would enjoy doing some programming in Lazy K?

Smart design of a math parser?

What is the smartest way to design a math parser? What I mean is a function that takes a math string (like "2 + 3 / 2 + (2 * 5)") and returns the calculated value. I did write one in VB6 ages ago, but it ended up being way too bloated and not very portable (or smart, for that matter...). General ideas, pseudocode, or real code is appreciated.
A pretty good approach would involve two steps. The first step involves converting the expression from infix to postfix (e.g. via Dijkstra's shunting yard) notation. Once that's done, it's pretty trivial to write a postfix evaluator.
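A minimal sketch of those two steps in Python (the function names and tokenizer regex are mine; it handles only +, -, *, / and parentheses, with no unary minus):

```python
import operator
import re

# precedence and implementation for each (left-associative) operator
OPS = {'+': (1, operator.add), '-': (1, operator.sub),
       '*': (2, operator.mul), '/': (2, operator.truediv)}

def to_postfix(tokens):
    """Dijkstra's shunting yard: infix token list -> postfix token list."""
    out, stack = [], []
    for tok in tokens:
        if tok in OPS:
            # pop operators of higher OR EQUAL precedence (left-assoc)
            while stack and stack[-1] in OPS and OPS[stack[-1]][0] >= OPS[tok][0]:
                out.append(stack.pop())
            stack.append(tok)
        elif tok == '(':
            stack.append(tok)
        elif tok == ')':
            while stack[-1] != '(':
                out.append(stack.pop())
            stack.pop()                  # discard the '('
        else:                            # a number
            out.append(tok)
    out.extend(reversed(stack))
    return out

def eval_postfix(postfix):
    """Trivial postfix evaluator over a single operand stack."""
    stack = []
    for tok in postfix:
        if tok in OPS:
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[tok][1](a, b))
        else:
            stack.append(float(tok))
    return stack[0]

def calc(expr):
    tokens = re.findall(r'\d+(?:\.\d+)?|[()+*/-]', expr)
    return eval_postfix(to_postfix(tokens))
```

For example, calc("2 + 3 / 2 + (2 * 5)") first produces the postfix sequence 2 3 2 / + 2 5 * + and then evaluates it to 13.5.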
I wrote a few blog posts about designing a math parser. There is a general introduction, basic knowledge about grammars, sample implementation written in Ruby and a test suite. Perhaps you will find these materials useful.
You have a couple of approaches. You could generate dynamic code and execute it in order to get the answer without needing to write much code. Just perform a search on runtime generated code in .NET and there are plenty of examples around.
Alternatively you could create an actual parser and generate a little parse tree that is then used to evaluate the expression. Again this is pretty simple for basic expressions. Check out codeplex as I believe they have a math parser on there. Or just look up BNF which will include examples. Any website introducing compiler concepts will include this as a basic example.
Codeplex Expression Evaluator
If you have an "always on" application, just post the math string to Google and parse the result. A simple way, but not sure if that's what you need - smart in some way, I guess.
I know this is old, but I came across this while trying to develop a calculator as part of a larger app, and ran across some issues using the accepted answer. The links were IMMENSELY helpful in understanding and solving this problem and should not be discounted. I was writing an Android app in Java, and for each item in the expression "string," I actually stored a String in an ArrayList as the user types on the keypad. For the infix-to-postfix conversion, I iterated through each String in the ArrayList, then evaluated the newly arranged postfix ArrayList of Strings.
This was fantastic for a small number of operands/operators, but longer calculations were consistently off, especially as the expressions started evaluating to non-integers. In the provided link for infix-to-postfix conversion, it suggests popping the stack if the scanned item is an operator and the topStack item has a higher precedence. I found that this is almost correct: popping the topStack item if its precedence is higher OR EQUAL to that of the scanned operator finally made my calculations come out correct. Hopefully this will help anyone working on this problem, and thanks to Justin Poliey (and fas?) for providing some invaluable links.
The related question Equation (expression) parser with precedence? has some good information on how to get started with this as well.
-Adam
Assuming your input is an infix expression in string format, you could convert it to postfix and, using a pair of stacks: an operator stack and an operand stack, work the solution from there. You can find general algorithm information at the Wikipedia link.
ANTLR is a very nice LL(*) parser generator. I recommend it highly.
Developers always want to take a clean approach and implement the parsing logic from the ground up, usually ending up with the Dijkstra shunting-yard algorithm. The result is neat-looking code, but possibly riddled with bugs. I have developed such an API, JMEP, that does all that, but it took me years to arrive at stable code.
Even with all that work, you can see from that project page that I am seriously considering switching over to using JavaCC or ANTLR, even after all the work already done.
11 years into the future from when this question was asked: If you don't want to re-invent the wheel, there are many exotic math parsers out there.
There is one that I wrote years ago which supports arithmetic operations, equation solving, differential calculus, integral calculus, basic statistics, function/formula definition, graphing, etc.
Its called ParserNG and its free.
Evaluating an expression is as simple as:
MathExpression expr = new MathExpression("(34+32)-44/(8+9(3+2))-22");
System.out.println("result: " + expr.solve());
result: 43.16981132075472
Or using variables and calculating simple expressions:
MathExpression expr = new MathExpression("r=3;P=2*pi*r;");
System.out.println("result: " + expr.getValue("P"));
Or using functions:
MathExpression expr = new MathExpression("f(x)=39*sin(x^2)+x^3*cos(x);f(3)");
System.out.println("result: " + expr.solve());
result: -10.65717648378352
Or to evaluate the derivative at a given point (note that it does symbolic differentiation, not numerical, behind the scenes, so the accuracy is not limited by the errors of numerical approximations):
MathExpression expr = new MathExpression("f(x)=x^3*ln(x); diff(f,3,1)");
System.out.println("result: " + expr.solve());
result: 38.66253179403897
Which differentiates x^3 * ln(x) once at x=3.
The number of times you can differentiate is 1 for now.
or for Numerical Integration:
MathExpression expr = new MathExpression("f(x)=2*x; intg(f,1,3)");
System.out.println("result: " + expr.solve());
result: 7.999999999998261... approx: 8
This parser is decently fast and has lots of other functionality.
Work has been concluded on porting it to Swift via bindings to Objective C and we have used it in graphing applications amongst other iterative use-cases.
DISCLAIMER: ParserNG is authored by me.
