Empty string as base case in BNF recursion? - bnf

I'm writing a small grammar as an exercise for class and my professor didn't really get too specific with what qualifies a legal BNF expression in terms.
The BNF grammar is supposed to recognize strings in this form: AB, AABB, AAABBB, A...B... (general form: AnBn)
So, what I got up to was writing:
<S> --> A<S>B | ""
My simple yes/no question is whether or not this is legal in BNF, and if not why?
<...> represents a non-terminal, I have no idea if that's convention or what not

Yes, this is acceptable BNF. You can see an example of this on the Wikipedia entry in the example about US postal addresses.
Typically, though, I've seen the empty string represented as ε.

Related

How can I write an unambiguous nearley grammar for boolean search operators

The Context
I am climbing the Nearley learning curve and trying to write a grammar for a search query parser.
The Goal
I would like to write grammar that is able to parse a querystring that contains boolean operators (e.g. AND, OR, NOT). Lets use AND for this question as a trivial case.
For instance, the grammar should recognize these example strings as valid:
pants
pants AND socks
jumping jacks
The Attempt
My naive attempt looks something like this:
query ->
statement
| statement "AND" statement
statement -> .:+
The Problem
The above grammar attempt is ambiguous because .:+ will match literally any string.
What I really want is for the first condition to match any string that does not contain AND in it. Once "AND" appears I want to enter the second condition only.
The Question
How can I detect these two distinct cases without having ambiguous grammar?
I am worried I'm missing something fundamental; I can imagine a ton of use cases where we want arbitrary text split up by known operators.
Yeah, if you've got an escape hatch that could be literally anything, you're going to have a problem.
Somewhere you're going to want to define what your base set of tokens are, at least something like \S+ and then how those tokens can be composed.
The place I'd typically start for a parser is trying to figure out where recursion is accounted for in the parser, and what approach to parsing the lib you're relying on takes.
Looks like Nearley is an Earley parser, and as the wikipedia entry for them notes, they're efficient for left-recursion.
This is just hazarding a guess, but something like this might get you to conjunction at least.
CONJUNCTION -> AND | OR
STATEMENT -> TOKENS | (TOKENS CONJUNCTION STATEMENT)
TOKENS -> [^()]+
A structure like this should be unambiguous and bans parentheses in tokens, unless they're surrounded by double quotes.

What is retracting in the context of OCaml?

I have scribbled the term "retracting in OCaml" in a small space in my notebook and now I can't seem to recollect what it was about nor can I find anything about it on the internet.
Does this term really exist or is it my lecturer's own notation for some property of OCaml. My classmates also don't seem to remember what it was about so I just want to confirm if I was dreaming or not.
Another possible explanation: in math, a retraction is a left inverse of a morphism (see this definition). In particular, a parser can be seen as a retraction w.r.t. a given pretty-printer: start from an abstract syntax tree (AST) and pretty-print it, then parsing the resulting source code should yield the original AST (while the opposite is not necessarily true). It doesn't have much to do with OCaml per se but it is linked to an algebraic view (of compiling) which is quite common in functional programming.

calculate binary expression - convert string to binary data

I get a string like this: "000AND111"
I need to calculate this and return the result.
How I do it in Flex?
just see this post thanks to the pingback by #powerlljf3
I would suggest a 3 phases approach.
1- write a small parser that split up the string in meaningful tokens (numbers and operands). Since Operands are all litterals and numbers are 0/1 combination, the parser is pretty easy (the grammer is LL1), so regular expressions can be really do the work here.
2- after building up the sequency of tokens and what is tecnically call the parsed expression tree (the sequency of tokens and operands), just implements any operand with the specific function (the link to my blog, works for few of the common boolean algebra operands)
3- finally just start reading tokens from left to right, and apply function where operands are found.
I would look through this http://www.nicolabortignon.com/as3-bitwise-operations/. It has many examples of binary math that can be used in AS3.

Can rebol parse function be able to create rules for parsing css2 / css3 fully?

Are there limitation to rebol parse function power ? Would it be capable of parsing the whole css2 / css 3 spec or will it encounter theorical impossibility to form some rules ?
Update after HostileFork answer: I mean in regexp I think it would be rather impossible, is parse much more powerfull ?
If yes does it mean it would be possible to build a browser in rebol vid compatible with html5 ?
Your question of "are there limits" is slippery. I'll try and give you "the answer" instead of just "yeah, sure"...which would be more expedient albeit not too educational. :)
Consider the following snippet. It captures the parser position into x, and then runs what's in parentheses in the DO dialect. That code re-sets x to the tail of the input if the css-parser function succeeds, or to the head of the input if the function fails. Finally, it sets the parse position to the current x. And as we know, PARSE returns true only if we're at the end of the input series when the rules finish...
parse my-css [x: (x: either css-parser x [tail x] [head x]]) :x]
That's valid parse dialect code AND it returns true if (and only if) the css-parser function succeeds. Therefore, if you can write a css parser in Rebol at all, you can write it "in the parse dialect".
(This leads to the question of it's possible to solve a given computing problem in a Rebol function. Thankfully, computer scientists don't have to re-answer that question each time a new language pops up. You can compute anything that be computed by a Turing machine, and nothing that can't be...and check out Alan Turing's own words, in layman's terms. CSS parsing isn't precisely the halting problem, so yeah... it can be done.)
I'll take a stab at re-framing your question:
"Is it possible to write a block of rules (which do not use PAREN!, SET-WORD!, or GET-WORD! constructs) that can be passed into the PARSE function and return TRUE on any valid CSS file and FALSE on any malformed one?"
The formal specification of what makes for good or bad CSS is put out by the W3C:
http://www.w3.org/TR/CSS2/grammar.html
But notice that even there, it's not all cut-and-dry. Their "formal" specification of color constants can't rule out #abcd, they had to write about it in the comments, in English:
/*
* There is a constraint on the color that it must
* have either 3 or 6 hex-digits (i.e., [0-9a-fA-F])
* after the "#"; e.g., "#000" is OK, but "#abcd" is not.
*/
hexcolor
: HASH S*
;
This leads us to ask if we would forgive Rebol for not being able to do that kind of recognition after we've tied PARSE's hands by taking away PAREN!/GET-WORD!/SET-WORD! (I just want to point out this kind of issue in light of your question).
As part of the Rebol 3 parse project there's been a write-up of the Theory of Parse...
The PARSE dialect is an enhanced member of the family of Top-down parsing languages (TDPL family) including the Top-down parsing language (TDPL), the Generalized top-down parsing language (GTDPL) and the Parsing expression grammar (PEG) and uses the same "ordered choice" parsing method as the other members of the family.
As pointed out in the link above, being a member of this class makes Rebol's PARSE strictly more powerful than both regular expressions and LL parsers. I assume it's more powerful than LL(k) and LL* parsers as well, but it has been a while since I've studied this stuff and I wouldn't bet my life on it. :)
You don't really need to understand what all that means in order to make use of it to answer your "can it be done" question. Since people have claimed to parse CSS with ANTLR, and ANTLR is an LL* parser, then I'd say Rebol can do it. PAREN! is the ace-in-the-hole which lets you do "anything" if you hit a wall, but it's a slippery slope to start using it too carelessly.
Should be perfectly capable of parsing the spec, should you have motive and patience to write the rules. It'd be a bit more involved than, say, a JSON parser, but it'd be the same idea.

Is there any decryption algorithm that uses a dictionary to decrypt an encrypted algorithm?

Well I have been working on an assigment and it states:
A program has to be developed, and coded in C language, to decipher a document written
in Italian that is encoded using a secret key. The secret key is obtained as random
permutation of all the uppercase letters, lowercase letters, numbers and blank space. As
an example, let us consider the following two strings:
Plain: “ABCDEFGHIJKLMNOPQRSTUVXWYZabcdefghijklmnopqrstuvwxyz0123456789 ”
Code: “BZJ9y0KePWopxYkQlRjhzsaNTFAtM7H6S24fC5mcIgXbnLOq8Uid 3EDv1ruVGw”
The secret key modifies only letters, numbers, and spaces of the original document, while
the remaining characters are left unchanged. The document is stored in a text file whose
length is unknown.
The program has to read the document, find the secret key (which by definition is
unknown; the above table is just an example and it is not the key used for preparing the
sample files available on the web course) using a suitable decoding algorithm, and write
the decoded document to a new text file.
And I know that I have to upload an English dictionary into the program but I don't why it has been asked (may be not in that statement but I have to do THAT). My question is, while I can do that program using simple encryption/decryption algorithm then what's the use of uploading the English dictionary in our program? So is there any decryption algorithm that uses a dictionary to decrypt an encrypted algorithm? Or can somebody tell me what approach or algorithm should I use to solve that problem???
An early reply (and also authentic one) will be highly appreciated from you.
Thank you guys.
This is a simple substitution cipher. It can be broken using frequency analysis. The Wikipedia articles explain both concepts thoroughly. What you need to do is:
Find the statistical frequency of characters in Italian texts. If you can't find this published anywhere, you can build it yourself by analyzing a large corpus of Italian texts.
Analyze the frequency of characters in the cipher text, and match it to the statistical data.
The first Wikipedia article links to a set of tools that implement all of the above. You just need to use and possibly adapt it to your use case.
Your cipher is a substitution cipher. That is it substitutes one letter for another.
consider the cipher text
"yjr,1drv2ry1od1q1..."
We can use a dictionary to find the plaintext.
Find punctuation, since a space always follows a comma, you can find the substitution rule for spaces.
which gives you.
"yjr, drv2ry od q..."
Notice the word lengths. Since there only two 1 letter words in the english language the q is probably i or a. "yjr" is probably "why", "the", "how" etc.
We try why with the result
"why, dyv2yw od q..."
There are no english words with two y's, and end in w.
So we try "the" and get
"the, dev2et od q..."
We conclude that the is a likely answer.
Now we search our dictionary for words that start look like ?e??et.
rinse repeat.
That is, find some set of words which fit into the lengths available and do not break each others substitution rules.
Personally I just do the frequency analysis suggested above.
Frequency analysis, as both other respondents said, is the way to go, and you can use digrams and trigrams to make it much stronger. Just grab tons of Italian text from the web and churn ahead! It's really pretty simple programming.

Resources