How do you abstract some expression to BNF?

For example:
waldo:=fern+alpha/-beta^gamma;
The arithmetic expression above may be abstracted by this BNF (it may differ somewhat from standard BNF, but let's ignore that for now):
AEXP = AS $AS ;
AS = .ID ':=' EX1 ';' ;
EX1 = EX2 $( '+' EX2 / '-' EX2 ) ;
EX2 = EX3 $( '*' EX3 / '/' EX3 ) ;
EX3 = EX4 $( '^' EX3 ) ;
EX4 = '+' EX5 / '-' EX5 / EX5 ;
EX5 = .ID / .NUMBER / '(' EX1 ')' ;
.END
But the EX1~EX5 abstraction is not very intuitive to me. (I don't quite understand how these rules were crafted in the first place.)
Are there any steps to follow when normalizing such expressions?

You can translate this notation to EBNF directly.
Naming categories EX1 through EX5 is not an uncommon way of specifying operator precedence. In fact it is a good one, IMHO, especially in some languages that have 15 or more precedence levels, like C and C++ do. :)
You can rename them to expression, term, factor, primary, etc. (or whatever terms make sense to you).
ADDENDUM
If you need a translation of the above into more traditional EBNF, here is how I would do it:
AEXP => AS+
AS => id ':=' EX1 ';'
EX1 => EX2 (('+' | '-') EX2)*
EX2 => EX3 (('*' | '/') EX3)*
EX3 => EX4 ('^' EX3)*
EX4 => ('+'|'-')? EX5
EX5 => id | number | '(' EX1 ')'
I use '*' for zero or more, '+' for one or more, and '?' for optional. It is pretty cool how operator precedence is handled here, I think.
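To make the precedence idea concrete, here is a minimal recursive-descent sketch in Python with one method per rule EX1 to EX5. The tokenizer, the Parser class and the fully parenthesized output format are assumptions made for illustration; it handles only the expression part (EX1), and for EX3 it recurses at most once after '^', which groups '^' to the right.

```python
import re

# Numbers, identifiers, or any single non-space character (operators, parentheses).
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")


def tokenize(text):
    return [number or ident or op for number, ident, op in TOKEN.findall(text)]


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def ex1(self):  # EX1 => EX2 (('+' | '-') EX2)*
        node = self.ex2()
        while self.peek() in ("+", "-"):
            op = self.take()
            right = self.ex2()
            node = f"({node}{op}{right})"
        return node

    def ex2(self):  # EX2 => EX3 (('*' | '/') EX3)*
        node = self.ex3()
        while self.peek() in ("*", "/"):
            op = self.take()
            right = self.ex3()
            node = f"({node}{op}{right})"
        return node

    def ex3(self):  # EX3 => EX4 ('^' EX3), taken at most once here
        left = self.ex4()
        if self.peek() == "^":
            self.take()
            return f"({left}^{self.ex3()})"
        return left

    def ex4(self):  # EX4 => ('+' | '-')? EX5
        if self.peek() in ("+", "-"):
            op = self.take()
            return f"({op}{self.ex5()})"
        return self.ex5()

    def ex5(self):  # EX5 => id | number | '(' EX1 ')'
        tok = self.take()
        if tok == "(":
            node = self.ex1()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not re.fullmatch(r"\w+", tok):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok


def parse(text):
    parser = Parser(text)
    node = parser.ex1()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return node


print(parse("fern+alpha/-beta^gamma"))  # (fern+(alpha/((-beta)^gamma)))
```

The lower a rule sits in the chain, the tighter its operator binds: in this grammar the unary minus (EX4) is applied before '^' (EX3), which is applied before '/' (EX2), which is applied before '+' (EX1).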
ADDENDUM 2:
Please note: It appears that the rule for EX3 is wrong. The way it stands now you can get parse trees like this
                   EX3
                    |
  +-----+-----+-----+-----+-----+-----+
  |     |     |     |     |     |     |
 EX4    ^    EX3    ^    EX3    ^    EX3
            / | \                   / | \
         EX4  ^  EX3             EX4  ^  EX3
So writing a^b^c^d^e^f could mean a^(b^c)^d^(e^f). But in fact there are other ways to make this tree. The grammar is ambiguous.
It appears the designer of the grammar wanted to make the ^ operator right-associative. But to do so, the rule should have been
EX3 => EX4 ('^' EX3)?
Now the grammar is no longer ambiguous. Look how the derivation of a^b^c^d^e^f MUST now proceed:
    EX3
   / | \
EX4  ^  EX3
       / | \
    EX4  ^  EX3
           / | \
        EX4  ^  EX3
               / | \
            EX4  ^  EX3
                   / | \
                EX4  ^  EX3
Now a^b^c^d^e^f can ONLY parse as a^(b^(c^(d^(e^f))))
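As a rough check on the ambiguity claim, the following Python sketch counts parse trees for a chain of n operands joined by '^' under each version of the EX3 rule. The function names are hypothetical, and the counting assumes every operand is a plain EX4 with no parentheses.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def trees_star(n):
    """Parse trees for n operands under EX3 => EX4 ('^' EX3)*."""
    # The first operand is the EX4; the other n - 1 are split into EX3 groups.
    return tails_star(n - 1)


@lru_cache(maxsize=None)
def tails_star(m):
    """Ways to cut m trailing operands into consecutive EX3 groups."""
    if m == 0:
        return 1
    # The first group takes j operands, the rest are cut up recursively.
    return sum(trees_star(j) * tails_star(m - j) for j in range(1, m + 1))


def trees_optional(n):
    """Parse trees for n operands under EX3 => EX4 ('^' EX3)?."""
    # Either the chain ends here, or exactly one EX3 holds everything that follows.
    return 1 if n == 1 else trees_optional(n - 1)


for n in range(1, 7):
    print(n, trees_star(n), trees_optional(n))
```

Under these assumptions the starred rule gives 2 trees for three operands and 42 for the six operands of a^b^c^d^e^f, while the optional form always gives exactly one.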
An alternative is to rewrite the rule as EX3 => EX4 ('^' EX4)* and have a side rule saying "OBTW the caret is right associative."

Related

how to accomplish each_slice like ruby with jq

Sample Input
[1,2,3,4,5,6,7,8,9]
My Solution
$ echo '[1,2,3,4,5,6,7,8,9]' | jq --arg g 4 '. as $l|($g|tonumber) as $n |$l|length as $c|[range(0;$c;($g|tonumber))]|map($l[.:.+$n])' -c
Output
[[1,2,3,4],[5,6,7,8],[9]]
Is there a shorter or handier way to do this?
Use a while loop to chop off the first 4 elements .[4:] until the array is empty []. Then, for each result array, consider only its first 4 items [:4]. Generalized to $n:
jq -c --argjson n 4 '[while(. != []; .[$n:])[:$n]]'
[[1,2,3,4],[5,6,7,8],[9]]
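For readers who want to trace the chop-and-take idea step by step, here is the same loop sketched in Python rather than jq. The function name chop_slices is made up for this illustration.

```python
def chop_slices(items, n):
    """Repeatedly take the first n items, then drop them, until nothing is left."""
    rest = list(items)
    chunks = []
    while rest != []:            # mirrors: while(. != []; ...)
        chunks.append(rest[:n])  # mirrors: [:$n]
        rest = rest[n:]          # mirrors: .[$n:]
    return chunks


print(chop_slices([1, 2, 3, 4, 5, 6, 7, 8, 9], 4))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
```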
There's an undocumented builtin function, _nwise/1, which you would use like this:
jq -nc --argjson n 4 '[1,2,3,4,5,6,7,8,9] | [_nwise($n)]'
[[1,2,3,4],[5,6,7,8],[9]]
Notice that using --argjson allows you to avoid the call to tonumber.
Another way is to use reduce over the whole list, appending one sub-array of up to $g entries per step:
jq -c --argjson g 4 '. as $input |
reduce range(0; ( $input | length ) ; $g) as $r ( []; . + [ $input[ $r: ( $r + $g ) ] ] )'
The three-argument form range(from; upto; by) generates numbers starting at from and stopping before upto, in increments of by.
E.g. range(0; 9; 4) for your original input produces the indices 0, 4, 8. The reduce iterates over these indices and builds the final list by appending the slices produced by the array slice operation, i.e. [0:4], [4:8] and [8:12].
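The index-stepping approach translates almost directly to other languages. As an illustration only, a Python sketch (each_slice is a hypothetical name borrowed from the Ruby method in the question):

```python
def each_slice(items, n):
    """Slice items at the start indices 0, n, 2n, ... like range(0; length; n) in jq."""
    return [items[start:start + n] for start in range(0, len(items), n)]


print(list(range(0, 9, 4)))                         # [0, 4, 8]
print(each_slice([1, 2, 3, 4, 5, 6, 7, 8, 9], 4))   # [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
```

As in jq, a slice that runs past the end of the list (here [8:12]) is simply cut short, which is why the last chunk holds a single element.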

Recursive macro makes infinite recursion

I made a simple macro that returns the taken parameter.
macro_rules! n {
    ($n:expr) => {{
        let val: usize = $n;
        match val {
            0 => 0,
            _ => n!(val - 1),
        }
    }};
}
When I compile this code with the option external-macro-backtrace, it raises an error:
error: recursion limit reached while expanding the macro `n`
--> src/main.rs:15:18
|
10 | macro_rules! n {
| _-
| |_|
| |
11 | | ($n:expr) => {{
12 | | let val: usize = $n;
13 | | match val {
14 | | 0 => 0,
15 | | _ => n!(val - 1),
| | ^^^^^^^^^^^
| | |
| | in this macro invocation
16 | | }
17 | | }};
18 | | }
| | -
| |_|
| |_in this expansion of `n!`
| in this expansion of `n!`
...
31 | | n!(1);
| | ------ in this macro invocation
|
= help: consider adding a `#![recursion_limit="128"]` attribute to your crate
I changed the recursion_limit to 128 and higher, but the limit in the compiler error message just increases as well. Even when I call n!(0) it produces the same error. I think it is infinite recursion, but I can't find the reason.
Well, it really is an infinite recursion. Check what your macro invocation n!(0) will be expanded into:
{
    let val: usize = 0;
    match val {
        0 => 0,
        _ => n!(val - 1),
    }
}
...and since the second match arm is itself a macro invocation, the compiler has to expand n!(val - 1) too. That produces another block containing another n!(val - 1), and so on without end.
The key point here is that macro expansion happens at compile time, while the match statement you're trying to use to limit the recursion is evaluated only at run time, so it can't stop anything from being expanded before that. Unfortunately, there's no easy way to do this, since Rust won't evaluate macro arguments (even if they are constant expressions). Just adding a (0) => {0} branch to the macro won't work either, because the nested invocation passes unevaluated tokens (for example val - 1 or 1 - 1), which never match the literal 0.
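The compile-time nature of the problem can be imitated with a toy text-substitution model in Python. This is only a sketch of the idea, not how rustc works: BODY is the macro body from the question written as a format string, and expand and its limit parameter are hypothetical stand-ins for the expander and recursion_limit.

```python
import re

# The body of n! from the question; {arg} marks where $n is substituted.
BODY = "{{ let val: usize = {arg}; match val {{ 0 => 0, _ => n!(val - 1) }} }}"
CALL = re.compile(r"n!\(([^()]*)\)")


def expand(source, limit=128):
    """Replace n!(...) invocations textually until none remain or the limit is hit."""
    for _ in range(limit):
        call = CALL.search(source)
        if call is None:
            return source
        # Substitution never evaluates the argument or looks at the match arms.
        source = source[:call.start()] + BODY.format(arg=call.group(1)) + source[call.end():]
    if CALL.search(source) is None:
        return source
    raise RecursionError("recursion limit reached while expanding the macro n")


try:
    expand("n!(0)")
except RecursionError as err:
    print(err)
```

Every substitution puts a fresh n!(val - 1) back into the text, so raising the limit only delays the error.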

jq select elements with array not containing string

Now, this is somewhat similar to jq: select only an array which contains element A but not element B but it somehow doesn't work for me (which is likely my fault)... ;-)
So here's what we have:
[
  {
    "employeeType": "student",
    "cn": "dc8aff1",
    "uid": "dc8aff1",
    "ou": [
      "4210910",
      "4210910 #Abg",
      "4210910 Abgang",
      "4240115",
      "4240115 5",
      "4240115 5\/5"
    ]
  },
  {
    "employeeType": "student",
    "cn": "160f656",
    "uid": "160f656",
    "ou": [
      "4210910",
      "4210910 3",
      "4210910 3a"
    ]
  }
]
I'd like to select all elements where ou does not contain a specific string, say "4210910 3a" or - which would be even better - where ou does not contain any member of a given list of strings.
When the inputs are likely to change, you should pass them to your filter as a parameter rather than hardcoding them. Also, using contains might not work for you in general. It is applied recursively, so for strings even substrings will match, which might not be what you want.
For example:
["10", "20", "30", "40", "50"] | contains(["0"])
is true
I would write it like this:
$ jq --argjson ex '["4210910 3a"]' 'map(select(all(.ou[]; $ex[]!=.)))' input.json
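To spell out the difference between exact and substring matching outside of jq, here is a small Python sketch. The function names are made up, and the records are a shortened copy of the sample input.

```python
records = [
    {"cn": "dc8aff1", "ou": ["4210910", "4210910 #Abg", "4240115 5"]},
    {"cn": "160f656", "ou": ["4210910", "4210910 3", "4210910 3a"]},
]


def without_forbidden(items, forbidden):
    """Keep items whose "ou" list shares no exact element with forbidden."""
    banned = set(forbidden)
    return [item for item in items if banned.isdisjoint(item["ou"])]


def has_substring(strings, needle):
    """Substring test, comparable to what contains does for strings in jq."""
    return any(needle in s for s in strings)


print([r["cn"] for r in without_forbidden(records, ["4210910 3a"])])  # ['dc8aff1']
print(has_substring(["10", "20", "30"], "0"))  # True, although "0" is not an element
print("0" in ["10", "20", "30"])               # False
```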
This response addresses the case where .ou is an array and we are given another array of forbidden strings.
For clarity, let's define a filter, intersectq(a;b), that will return true iff the arrays have an element in common:
def intersectq(a;b):
any(a[]; . as $x | any( b[]; . == $x) );
This is effectively a loop-within-a-loop, but because of the semantics of any/2, the computation will stop once a match has been found.(*)
Assuming $ex is the list of exceptions, then the filter we could use to solve the problem would be:
map(select(intersectq(.ou; $ex) | not))
For example, we could use an invocation along the lines suggested by Jeff:
$ jq --argjson ex '["4210910 3a"]' -f myfilter.jq input.json
Now you might ask: why use the any-within-any double loop rather than .[]-within-all double loop? The answer is efficiency, as can be seen using debug:
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; ($b[] | debug) != .)'
["DEBUG:",1]
["DEBUG:",1]
false
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; . as $x | all( $b[]; debug | $x != .))'
["DEBUG:",1]
false
(*) Footnote
Of course intersectq/2 as defined here is still O(m*n) and thus inefficient, but the main point of this post is to highlight the drawback of the .[]-within-all double loop.
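Python's built-in any() short-circuits as well, so the any-within-any shape can be sketched there to watch how many comparisons actually run. This is an analogy rather than a statement about jq internals; intersects and its seen log are hypothetical names.

```python
def intersects(a, b, seen):
    """True if a and b share an element; every comparison made is appended to seen."""
    def same(x, y):
        seen.append((x, y))
        return x == y

    # Both generator expressions stop at the first True they encounter.
    return any(any(same(x, y) for y in b) for x in a)


log = []
print(intersects([1, 2, 3], [1, 1], log), log)  # True [(1, 1)]
```

The worst case is still every pair, as the footnote says: with no common element, all len(a) * len(b) comparisons are made.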
Here is a solution that checks the .ou member of each element of the input using foreach and contains.
["4210910 3a"] as $list # adjust as necessary
| .[]
| foreach $list[] as $e (
.; .; if .ou | contains([$e]) then . else empty end
)
EDIT: I now realize a filter of the form foreach E as $X (.; .; R) can almost always be rewritten as E as $X | R so the above is really just
["4210910 3a"] as $list
| .[]
| $list[] as $e
| if .ou | contains([$e]) then . else empty end

Backus Naur Form Associativity

Is this the correct way to implement right associativity for Exponentiation PowExp? So that 2^3^4 is actually (2^(3^4))
<Exp>     ::= <Exp> + <MulExp>
            | <Exp> - <MulExp>
            | <MulExp>
<MulExp>  ::= <MulExp> * <PowExp>
            | <MulExp> / <PowExp>
            | <PowExp>
<PowExp>  ::= <NegExp> ^ <PowExp>
            | <NegExp>
<NegExp>  ::= - <RootExp>
            | <RootExp>
<RootExp> ::= ( <Exp> )
            | 1 | 2 | 3 | 4
The way you've written it is correct.
Incidentally, you might want to reconsider your hierarchy; in regular math, −3^4 is −(3^4), not (−3)^4. So you might want - 3 ^ 4 to mean - (3 ^ 4), in which case NegExp would include PowExp rather than the other way around. (But I suppose it could be confusing if -3 ^ 4 means -(3 ^ 4), so maybe there's no intuitive order-of-operations here? Another possibility is to require parentheses for either reading, by having PowExp and NegExp both depend directly on RootExp.)
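A short Python sketch shows why the direction of the recursion matters. pow_exp mirrors the right-recursive PowExp rule over already-evaluated operands, and left_fold is what a left-recursive rule would compute instead; both names are invented for this example.

```python
def pow_exp(operands):
    """<PowExp> ::= <NegExp> ^ <PowExp> | <NegExp>: the tail is evaluated first."""
    head, rest = operands[0], operands[1:]
    if not rest:
        return head
    return head ** pow_exp(rest)


def left_fold(operands):
    """What a left-recursive rule would compute: ((a ^ b) ^ c) ..."""
    result = operands[0]
    for value in operands[1:]:
        result = result ** value
    return result


print(pow_exp([2, 3, 4]) == 2 ** (3 ** 4))    # True: 2^3^4 read as 2^(3^4)
print(left_fold([2, 3, 4]) == (2 ** 3) ** 4)  # True: the left-associative reading
print(-3 ** 4, (-3) ** 4)                     # -81 81
```

The last line reflects Python's own choice, where unary minus binds less tightly than **, so -3 ** 4 is read as -(3 ** 4). With the grammar in the question, NegExp sits below PowExp, so - 3 ^ 4 is read the other way, as (-3) ^ 4.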

Find out the language generated, given a context-free grammar?

Should I manually apply the production rules to find out the language generated by this grammar? This is tedious; is there any trick or tip to speed things up?
G = {{S, B}, {a, b}, P, S}
P = {S -> aSa | aBa, B -> bB | b}
EDIT: I found Matajon's answer a good one: think about the language generated by each non-terminal symbol and then combine them.
But I'm still stuck when I have to solve more complicated examples like this:
G = {{S, R, T}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, P, S}
P = {S -> A | AS | BR | CT,
R -> AR | BT | C | CS,
T -> AT | B | BS | CR,
A -> 0 | 3 | 6 | 9,
B -> 1 | 4 | 7,
C -> 2 | 5 | 8}
Crazy, isn't it? Taken from past exams (programming languages course).
I don't know any general trick, but usually it helps to think about the language generated from each non-terminal.
In your example, the language generated from B is obviously L(B) = {b}^+. Then think about the S rules: using the first rule, you can generate the sentential forms {a^n.S.a^n | n >= 1}. If you use the second rule on these sentential forms, or on S alone, you can generate the sentential forms {a^n.B.a^n | n >= 1}.
The rest is pretty easy: you combine these two things and get L(G) = {a^n.b^+.a^n | n >= 1}.
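If you want to double-check a guess like this mechanically, one option is to enumerate every word the grammar derives up to some length and compare it with the conjectured set. A minimal Python sketch follows; generate and expected are hypothetical helper names, and the length pruning relies on the fact that no rule in this grammar shortens a sentential form.

```python
from collections import deque

RULES = {"S": ["aSa", "aBa"], "B": ["bB", "b"]}


def generate(max_len):
    """All terminal words of length <= max_len derivable from S."""
    seen, words = set(), set()
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if any.
        idx = next((i for i, ch in enumerate(form) if ch in RULES), None)
        if idx is None:
            words.add(form)
            continue
        for rhs in RULES[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


def expected(max_len):
    """The conjectured language a^n b^m a^n with n, m >= 1, up to max_len."""
    return {
        "a" * n + "b" * m + "a" * n
        for n in range(1, max_len)
        for m in range(1, max_len)
        if 2 * n + m <= max_len
    }


print(sorted(generate(5), key=len))
print(generate(9) == expected(9))
```

Agreement up to a bounded length is evidence, not a proof, but it catches most wrong guesses quickly.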
By the way, in the definition of a grammar, the terminals and nonterminals are sets, not tuples, and the third component is the set of production rules, not the start symbol. So you should write G = {{S, B}, {a, b}, P, S}.
Edit
Actually, there is a way to solve your second example without much thinking, just by following something like a cookbook, because the language generated by your second context-free grammar is in fact regular.
When you substitute the rules for A, B and C into the first three rules, you get
P' = {S -> 0 | 3 | 6 | 9 | 0S | 3S | 6S | 9S | 1R | 4R | 7R | 2T | 5T | 8T
R -> 0R | 3R | 6R | 9R | 1T | 4T | 7T | 2 | 5 | 8 | 2S | 5S | 8S
T -> 0T | 3T | 6T | 9T | 1 | 4 | 7 | 1S | 4S | 7S | 2R | 5R | 8R}
And P' is a regular grammar. Because of that, you can convert it to a nondeterministic finite automaton (there is a really simple way, look for it) and then convert the resulting NFA to a regular expression (this is not so simple, but if you follow an algorithm and don't get lost, you should be OK). And from the regular expression it is easy to tell what language it describes.
Also, once you have an NFA for this language, you can look at it and determine what it does logically (it has something to do with the counts of 1,4,7 and 2,5,8 in the word and their difference mod 3. Think it through, it is your homework, after all :-) )
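The grammar-to-NFA step is mechanical enough to sketch in a few lines of Python, which also gives you something to experiment with while working out the language. The sketch assumes the usual construction for right-linear grammars: a rule X -> dY becomes a transition from X to Y on d, and a rule X -> d becomes a transition from X to an extra accepting state, here called F (a name chosen for this example).

```python
CLASSES = {"A": "0369", "B": "147", "C": "258"}
RULES = {
    "S": ["A", "AS", "BR", "CT"],
    "R": ["AR", "BT", "C", "CS"],
    "T": ["AT", "B", "BS", "CR"],
}
FINAL = "F"  # extra accepting state, not part of the original grammar


def build_nfa():
    """Map (state, digit) to the set of successor states."""
    delta = {}
    for state, alternatives in RULES.items():
        for alt in alternatives:
            # "AS" means: read a digit from class A, then continue in state S.
            target = alt[1] if len(alt) == 2 else FINAL
            for digit in CLASSES[alt[0]]:
                delta.setdefault((state, digit), set()).add(target)
    return delta


def accepts(word, delta=None):
    """Run the NFA on word, tracking the set of currently possible states."""
    delta = build_nfa() if delta is None else delta
    current = {"S"}
    for ch in word:
        current = set().union(*(delta.get((q, ch), set()) for q in current))
    return FINAL in current


for word in ["0", "1", "12", "21", "111", "25"]:
    print(word, accepts(word))
```

Feeding it words of your own choosing and watching which of S, R and T stay reachable is a quick way to test a hypothesis about what the three states keep track of.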
Of course, if you don't have a context-free grammar that generates a regular language, you can't use this trick. There is no general way to tell what language a grammar generates (the language equality problem for CFGs is undecidable); you have to think about every single example and look for similarities and patterns in its logical structure.
I think you'll just need to apply the production rules.
