Determining if grammar meets LL(1) requirements - bnf

I have a BNF for a recursive descent parser. One of the steps to solving this is to verify that the grammar is LL(1), but I keep coming up with verification that it is not.
The BNF in question, or more specifically, the exact area I'm having an issue:
<S> -> start <vars> <block>
<block> -> begin <vars> <stats> end
<vars> -> e | id = number <vars>
<stats> -> <if> | <block> | <loop> | <assign>
There is more to this, but these are the only productions that are relevant to this question, I believe.
My approach to solving this is to compute FIRST of the right hand sides of those productions that have a choice. If there is no choice, I skip, as I know they are already k=0.
FIRST(e | id = number <vars>) = {e, id} // Since it can derive the empty string, I must also compute FOLLOW.
FOLLOW( e | id = number <vars> ) = FOLLOW(<vars>)
Non-terminal 'vars' appears in 2 productions, <S> and <block>, and is followed by two nonterminals: 'block' and 'stats'
FIRST(<block>) = {begin}
FIRST(<stats>) = { ... begin ... } // contains all terminals
Now, my problem. In computing the FOLLOW(), I have found two begin tokens, which leads me to say that this grammar is not LL(1). However, I don't believe that the answer to this exercise is that it is not possible to create a recursive descent parser, so I believe that I've made an error somewhere or that I have executed the algorithm incorrectly.
Can anyone point me in the right direction?

So you've correctly found that FOLLOW(<vars>) = FIRST(<block>) ∪ FIRST(<stats>). These are sets, so when you take the union of the two FIRST sets (each of which contains begin), begin appears only once in the result. As long as neither of these sets ends up containing id, the two alternatives of <vars> can be told apart with one token of lookahead, and your grammar is still LL(1).
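To make the set argument concrete, here is a minimal Python sketch that computes FIRST and FOLLOW for the fragment quoted in the question. The productions for <if>, <loop> and <assign> are not shown in the question, so the terminals IF, LOOP and ASSIGN below are hypothetical placeholders, and the helper names are illustrative only.

```python
EPS = "e"  # the question writes the empty string as e

# Grammar fragment from the question. IF, LOOP and ASSIGN are placeholder
# terminals (an assumption): the real <if>, <loop>, <assign> rules are not shown.
GRAMMAR = {
    "S": [["start", "vars", "block"]],
    "block": [["begin", "vars", "stats", "end"]],
    "vars": [[EPS], ["id", "=", "number", "vars"]],
    "stats": [["IF"], ["block"], ["LOOP"], ["ASSIGN"]],
}


def first_of_seq(seq, first):
    """FIRST of a sequence of symbols, given the FIRST sets computed so far."""
    out = set()
    for sym in seq:
        if sym == EPS:
            continue
        f = first[sym] if sym in GRAMMAR else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)  # every symbol could vanish, so the sequence is nullable
    return out


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # iterate until no set grows any more
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                new = first_of_seq(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow["S"].add("$")  # end-of-input marker
    changed = True
    while changed:
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue
                    tail = first_of_seq(alt[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def vars_is_ll1(follow):
    # The empty alternative is predicted on FOLLOW(<vars>); the other
    # alternative starts with id, so the two must not share id.
    return "id" not in follow["vars"]


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
```

Under these assumptions FOLLOW["vars"] is a set in which begin occurs once, and vars_is_ll1(FOLLOW) holds. If the real <assign> rule starts with id, the check would fail.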


Recursive Decision Tree In Problems

I am trying to draw the decision trees of 2 Prolog problems, one that uses an accumulator and one that doesn't. Here are my problems and the solutions I came up with, respectively:
length([H|T],N) :- length(T,N1), N is N1+1.
length([ ],0).
Goal: ?- length([1,2,3],N).
Second one with accumulator:
length_acc(L,N) :- len_acc(L,0,N).
len_acc([H|T], A, N) :- A1 is A+1, len_acc(T, A1, N).
len_acc([], A, A).
Goal: ?-length_acc([1,2], N).
Are the decision trees correctly drawn, or have I made a mistake? What's the correct way to draw this kind of recursive decision tree?
Thanks.
The tree you are referring to is usually called a search-tree aka SLD-tree, not to be confused with a proof-tree.
Both the problems you have outlined are the most simple cases of search-trees:
there is only one solution
the query does not fail
each step in the search can only match a single clause (empty list vs non-empty list)
These three characteristics imply that there will only be a single branch in the SLD tree.
You'll get the following search-trees:
Note that in a correct search-tree, at most one goal is resolved in each step, which makes search-trees very large. It is therefore common to draw simplified trees where multiple goals are resolved in each step. These are arguably not true search-trees, but they illustrate the search in a more succinct way.
Edges in the tree are labeled with substitutions that are applied to the variables as part of the unification algorithm.
Search-trees correspond closely to traces, and you can usually do a straight translation from a trace of your program to a search tree.
I advise you to study search-trees for queries that have multiple answers and branches that can fail, which gives more interesting trees with multiple branches. An example from The Art of Prolog by Sterling, Shapiro:
Program:
father(abraham, isaac). male(isaac).
father(haran, lot). male(lot).
father(haran, milcah). female(milcah).
father(haran, yiscah). female(yiscah).
son(X,Y):- father(Y,X), male(X).
daughter(X,Y):- father(Y,X), female(X).
Query:
?- son(S, haran).
Search-tree:
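As a rough illustration of where the branches come from, the following Python sketch enumerates the candidates for son(S, haran) and marks each one as succeeding or failing. It is not a Prolog engine and not the tree from the book; the function name son_branches is hypothetical, and facts whose first argument does not match are simply skipped rather than shown as failed branches.

```python
# Facts from the program above, as plain Python data.
FATHER = [
    ("abraham", "isaac"),
    ("haran", "lot"),
    ("haran", "milcah"),
    ("haran", "yiscah"),
]
MALE = {"isaac", "lot"}


def son_branches(parent):
    """One (candidate, outcome) pair per father/2 fact that matches parent."""
    branches = []
    for father, child in FATHER:
        if father != parent:
            continue  # father(parent, S) does not unify with this fact
        # Remaining goal on this branch: male(child)
        outcome = "success" if child in MALE else "fail"
        branches.append((child, outcome))
    return branches


BRANCHES = son_branches("haran")
```

Each tuple corresponds to one branch below the goal father(haran, S), male(S): only the branch with S = lot reaches the empty goal, the other two fail at male/1.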
A nice way to understand something is to re-implement it yourself.
It's especially nice to implement Prolog when you already have Prolog to implement it with. :)
program( patriarchs, P ) :-
    P = [ % [son(S, haran)] ,                  % Resolvent
          [father(abraham, isaac)]             % Clauses...
        , [father(haran, lot)]                 % [Head, Body...]
        , [father(haran, milcah)]
        , [father(haran, yiscah)]
        , [male(isaac)]
        , [male(lot)]
        , [female(milcah)]
        , [female(yiscah)]
        , [son(X,Y), father(Y,X), male(X)]
        , [daughter(X,Y), father(Y,X), female(X)]
        ].

solve( Program ):-
    Program = [[] | _].                        % empty resolvent -- success
solve( [[Goal | Res] | Clauses] ) :-
    member( Rule, Clauses),
    copy_term( Rule, [Head | Body]),           % rename vars
    Goal = Head,                               % unify head
    append( Body, Res, Res2 ),                 % replace goal
    solve( [Res2 | Clauses] ).

query( What, Query ):-                         % Query is a list of Goals to Solve
    program( What, Program),
    solve( [ Query | Program ] ).
Testing,
23 ?- query( patriarchs, [son(S, haran)] ).
S = lot ;
false.
Now the above solve/1 can be augmented to keep a record of the successful instantiations of Goal that made the unifications Goal = Head possible.

PostScript forall on dictionaries

According to the PLRM it doesn't matter in which order you execute a forall on a dict:
(p. 597) forall pushes a key and a value on the operand stack and executes proc for each key-value pair in the dictionary
...
(p. 597) The order in which forall enumerates the entries in the dictionary is arbitrary. New entries put in the dictionary during the execution of proc may or may not be included in the enumeration. Existing entries removed from the dictionary by proc will not be encountered later in the enumeration.
Now I was executing some code:
/d 5 dict def
d /abc 123 put
d { } forall
My output (operand stack) is:
--------top-
/abc
123
-----bottom-
The output of ghostscript and PLRM (operand stack) is:
--------top-
123
/abc
-----bottom-
Does it really not matter in what order you process the key-value pairs of the dict?
On the stack, do you need to push the value first and then the key, or the key first? (The PLRM only talks about "a key and a value", but doesn't say anything about the order.)
Thanks in advance
It would probably help if you quoted the page number when you quote sections from the PLRM; it's hard to see where you are getting this from.
When executing forall, the order in which it enumerates the dictionary pairs is arbitrary; you have no influence over it. However, forall always pushes the key and then the value. Even if this is only implied in the text you (didn't quite) quote, you can see from the example for the forall operator that this is the case.
When you say 'my output', do you mean you are writing your own PostScript interpreter? If so, then your output is incorrect: when pushing a key/value pair, the key is pushed first.
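The push order can be sketched with a small Python model of the operand stack. This is only an illustration of the behaviour described above, not a PostScript implementation; the function forall and the list-as-stack representation are assumptions made for the example.

```python
def forall(dictionary, proc, stack):
    """Model of forall on a dict: push key, then value, then run proc."""
    # PostScript enumerates in arbitrary order; this model just uses
    # Python's dict iteration order.
    for key, value in dictionary.items():
        stack.append(key)    # the key is pushed first ...
        stack.append(value)  # ... so the value ends up on top
        proc(stack)
    return stack


# Equivalent of:  /d 5 dict def   d /abc 123 put   d { } forall
stack = forall({"/abc": 123}, lambda s: None, [])
```

With the end of the list as the top of the stack, 123 is on top and /abc sits below it, which matches the Ghostscript output shown in the question.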

Writing a syntax analyser (parser) from BNF without using a generator

I'm interested in writing a parser (syntax analyser) from a BNF grammar without using a generator tool like yacc or bison.
As an example, I take the BNF for simple arithmetic expressions (from the Dragon Book, section 2.2.6):
expr -> expr + term | expr - term | term
term -> term * factor | term / factor | factor
factor -> number | (expr)
Suppose I would like to create the equivalent code.
I guess I should create three functions called parseExpr, parseTerm and parseFactor.
How do I construct these functions from the BNF defined above?
For the parseFactor it seems to be obvious :
1. Get the token type
2. If the type is number, return a node for the number type
3. If the token represents an opening parenthesis, get the node returned by parseExpr
4. Check that the next token is a closing parenthesis. If yes, return the node obtained at 3. If no, throw an error
For parseExpr, I'm a little confused about how to start and how to interpret the production: expr -> expr + term | expr - term | term
How do I translate this production? How do I detect which case applies? The same question applies to the term production.
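One common way to handle productions like these, sketched here in Python as an assumption rather than as the Dragon Book's own code, is to replace the left recursion with a loop: expr becomes "a term followed by any number of (+ or -) term". A direct recursive call for expr -> expr + term would recurse forever before consuming a token. The tokenizer, the tuple-based nodes and all names below are illustrative choices.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse_expr(self):
        # expr -> term { ('+' | '-') term }   (left recursion turned into a loop)
        node = self.parse_term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.parse_term())  # keeps left associativity
        return node

    def parse_term(self):
        # term -> factor { ('*' | '/') factor }
        node = self.parse_factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.parse_factor())
        return node

    def parse_factor(self):
        # factor -> number | ( expr )
        token = self.advance()
        if token is not None and token.isdigit():
            return ("number", int(token))
        if token == "(":
            node = self.parse_expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.parse_expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

The case to apply is chosen by looking at the next token: after a term, a + or - means another term follows, and anything else ends the expr. For example, parse("1 - 2 - 3") nests the first subtraction inside the second, as the left-recursive grammar requires.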

Variable Names in SWI Prolog

I have been using the chr library along with the jpl interface. I have a general inquiry though. I send the constraints from SWI Prolog to an instance of a java class from within my CHR program. The thing is if the input constraint is leq(A,B) for example, the names of the variables are gone, and the variable names that appear start with _G. This happens even if I try to print leq(A,B) without using the interface at all. It appears that whenever the variable is processed the name is replaced with a fresh one. My question is whether there is a way to do the mapping back. For example whether there is a way to know that _G123 corresponds to A and so on.
Thank you very much.
(This question has nothing to do with CHR nor is it specific to SWI).
The variable names you use when writing a Prolog program are discarded completely by the Prolog system. The reason is that this information cannot be used to print variables accurately. There might be several independent instances of that variable. So one would need to add some unique identifier to the variable name. Also, maintaining that information at runtime would incur significant overheads.
To see this, consider a predicate mylist/1.
?- [user].
|: mylist([]).
|: mylist([_E|Es]) :- mylist(Es).
|: % user://2 compiled 0.00 sec, 4 clauses
true.
Here, we have used the variable _E for each element of the list. The toplevel now prints all those elements with a unique identifier:
?- mylist(Fs).
Fs = [] ;
Fs = [_G295] ;
Fs = [_G295, _G298] ;
Fs = [_G295, _G298, _G301] .
The second answer might be printed as Fs = [_E] instead. But what about the third? It cannot be printed as Fs = [_E,_E] since the elements are different variables. So something like Fs = [_E_295,_E_298] is the best we could get. However, this would imply a lot of extra bookkeeping.
But there is also another reason, why associating source code variable names with runtime variables would lead to extreme complexities: In different places, that variable might have a different name. Here is an artificial example to illustrate this:
p1([_A,_B]).
p2([_B,_A]).
And the query:
?- p1(L), p2(L).
L = [_G337, _G340].
What names would you like these two elements to have? The first element might have the name _A or _B, or maybe even better: _A_or_B. Or even _Ap1_and_Bp2. Who would benefit from this?
Note that the variable names mentioned in the query at the toplevel are retained:
?- Fs = [_,F|_], mylist(Fs).
Fs = [_G231, F] ;
Fs = [_G231, F, _G375] ;
Fs = [_G231, F, _G375, _G378]
So there is a way to get that information. On how to obtain the names of variables in SWI and YAP while reading a term, please refer to this question.

Erlang Hash Tree

I'm working on a p2p app that uses hash trees.
I am writing the hash tree construction functions (publ/4 and publ_top/4) but I can't see how to fix publ_top/4.
I try to build a tree with publ/1:
nivd:publ("file.txt").
prints hashes...
** exception error: no match of right hand side value [67324168]
in function nivd:publ_top/4
in call from nivd:publ/1
The code in question is here:
http://github.com/AndreasBWagner/nivoa/blob/886c624c116c33cc821b15d371d1090d3658f961/nivd.erl
Where do you think the problem is?
Thank You,
Andreas
Looking at your code I can see one issue that would generate that particular exception error
publ_top(_,[],Accumulated,Level) ->
    %% Go through the accumulated list of hashes from the prior level
    publ_top(string:len(Accumulated),Accumulated,[],Level+1);
publ_top(FullLevelLen,RestofLevel,Accumulated,Level) ->
    case FullLevelLen =:= 1 of
        false -> [F,S|T]=RestofLevel,
            io:format("~w---~w~n",[F,S]),
            publ_top(FullLevelLen,T,lists:append(Accumulated,[erlang:phash2(string:concat([F],[S]))]),Level);
        true -> done
    end.
In the first function clause you match against the empty list. In the second clause you match against a list of length at least 2 ([F,S|T]). What happens when FullLevelLen is different from 1 and RestofLevel is a list of length 1? (Hint: you'll get the above error.)
The error would be easier to spot if you would pattern match on the function arguments, perhaps something like:
publ_top(_,[],Accumulated,Level) ->
    %% Go through the accumulated list of hashes from the prior level
    publ_top(string:len(Accumulated),Accumulated,[],Level+1);
publ_top(1, _, _, _) ->
    done;
publ_top(FullLevelLen, [F,S|T], Accumulated, Level) ->
    %% FullLevelLen must be bound in the head because it is passed on below
    io:format("~w---~w~n",[F,S]),
    publ_top(FullLevelLen,T,lists:append(Accumulated,[erlang:phash2(string:concat([F],[S]))]),Level);
%% Missing case:
% publ_top(_, [H], Accumulated, Level) ->
%     ...
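What the missing clause should do is a design decision the original code does not settle. As one possibility, the Python sketch below pairs up hashes level by level and carries an unpaired last element up unchanged. The promotion rule, the function names and the use of zlib.crc32 as a stand-in for erlang:phash2 are all assumptions made for illustration.

```python
import zlib


def combine(left, right):
    """Hash a pair of child hashes (crc32 stands in for erlang:phash2)."""
    return zlib.crc32(f"{left}:{right}".encode())


def next_level(hashes):
    """Build the parent level from a list of child hashes."""
    parents = []
    for i in range(0, len(hashes) - 1, 2):
        parents.append(combine(hashes[i], hashes[i + 1]))
    if len(hashes) % 2 == 1:
        # The case the Erlang code does not handle: a single leftover hash.
        # Assumption: promote it unchanged to the next level.
        parents.append(hashes[-1])
    return parents


def root(hashes):
    """Reduce a non-empty list of leaf hashes to the top hash."""
    if not hashes:
        raise ValueError("need at least one hash")
    level = list(hashes)
    while len(level) > 1:
        level = next_level(level)
    return level[0]
```

With three leaves, next_level produces two parents (one combined, one promoted), so the reduction always reaches a single root instead of failing on a one-element remainder.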
