Grako left recursion - bnf

I'm trying to use grako to describe a simple left-recursive grammar, but I'm having trouble doing so.
Right-recursion does work without any problem :
symbol = /[a-z]/ ;
condition = symbol "AND" condition | symbol ;
start = condition $ ;
According to all examples I found, left-recursion should be described this way :
symbol = /[a-z]/ ;
condition = condition "AND" symbol | symbol ;
start = condition $ ;
However, it does not work for the input given below :
a AND b AND c
I get this error :
grako.exceptions.FailedParse: srecur(1:3) Expecting end of text. :
a AND b AND c
^
start
What I understand at this point is that the first character of the input matches symbol rather than condition "AND" symbol, so grako takes that alternative. But my start rule requires that all characters have been consumed.
I've tried many workarounds, but I haven't been able to find one that fits.

Grako is in fact a PEG parser generator. PEG parsers cannot handle left recursion without special support.
More details here and there.
For my needs, I was able to solve the problem with this kind of expression :
condition = symbol { "AND" symbol }* ;
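The repetition form gives a flat sequence rather than a nested tree. If left-associative structure is needed, it can be rebuilt by folding the sequence from the left. The following is a minimal hand-written sketch in Python and does not use grako itself; the names tokenize and parse_condition are illustrative only.

```python
import re

# One token is either the keyword AND or a single lowercase letter.
TOKEN = re.compile(r"\s*(AND|[a-z])")


def tokenize(text):
    """Split the input into 'AND' keywords and single-letter symbols."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_condition(text):
    """Parse symbol { "AND" symbol }* and nest the result to the left."""
    tokens = tokenize(text)
    if not tokens or tokens[0] == "AND":
        raise ValueError("expected a symbol")
    tree = tokens[0]
    pos = 1
    while pos < len(tokens):
        # Each loop iteration consumes one "AND" symbol pair.
        if (tokens[pos] != "AND" or pos + 1 >= len(tokens)
                or tokens[pos + 1] == "AND"):
            raise ValueError(f"expected 'AND' symbol at token {pos}")
        tree = (tree, "AND", tokens[pos + 1])
        pos += 2
    return tree


print(parse_condition("a AND b AND c"))
# prints (('a', 'AND', 'b'), 'AND', 'c')
```

The loop plays the role of the { ... }* repetition, and the reassignment of tree is what produces the left nesting that the left-recursive rule would have described.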

Related

Avoiding left recursion in parsing LiveScript object definitions

I'm working on a parser for LiveScript language, and am having trouble with parsing both object property definition forms — key: value and (+|-)key — together. For example:
prop: "val"
+boolProp
-boolProp
prop2: val2
I have the key: value form working with this:
Expression ::= TestExpression
| ParenExpression
| OpExpression
| ObjDefExpression
| PropDefExpression
| LiteralExpression
| ReferenceExpression
PropDefExpression ::= Expression COLON Expression
ObjDefExpression ::= PropDefExpression (NEWLINE PropDefExpression)*
// ... other expressions
But however I try to add ("+"|"-") IDENTIFIER to PropDefExpression or ObjDefExpression, I get errors about using left recursion. What's the (right) way to do this?
The grammar fragment you posted is already left-recursive, i.e. without even adding (+|-)boolprop, the non-terminal 'Expression' derives a form in which 'Expression' reappears as the leftmost symbol:
Expression -> PropDefExpression -> Expression COLON Expression
And it's not just left-recursive, it's ambiguous. E.g.
Expression COLON Expression COLON Expression
can be derived in two different ways (roughly, left-associative vs right-associative).
You can eliminate both these problems by using something more restricted on the left of the colon, e.g.:
PropDefExpression ::= Identifier COLON Expression
Also, another ambiguity: Expression derives PropDefExpression in two different ways, directly and via ObjDefExpression. My guess is, you can drop the direct derivation.
Once you've taken care of those things, it seems to me you should be able to add (+|-)boolprop without errors (unless it conflicts with one of the other kinds of expression that you didn't show).
Mind you, looking at the examples at http://livescript.net, I'm doubtful how much of that you'll be able to capture in a conventional grammar. But if you're just going for a subset, you might be okay.
I don't know how much help this will be, because I know nothing about GrammarKit and not much more about the language you're trying to parse.
However, it seems to me that
PropDefExpression ::= Expression COLON Expression
is not quite accurate, and it is creating an ambiguity when you add the boolean property production because an Expression might start with a unary - operator. In the actual grammar, though, a property cannot start with an arbitrary Expression. There are two types of key-property definitions:
name : expression
parenthesized_expression : expression
(Which is to say, an expression used as a key needs to start with a "(".)
That means that a boolean property definition, starting with + or - is recognizable from the first token, which is precisely the condition needed for successful recursive descent parsing. There are several other property definition syntaxes, including names and parenthesized_expressions not followed by a :
That's easy to parse with an LR(1) parser, like the one Jison produces, but to parse it with a recursive-descent parser you need to left-factor. (It's possible that GrammarKit can do this for you, by the way.) Basically, you'd need something like (this is not complete):
PropertyDefinition ::= PropertyPrefix PropertySuffix? | BooleanProperty
PropertyPrefix ::= NAME | ParenthesizedExpression
PropertySuffix ::= COLON Expression | DOT NAME
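To illustrate why the left-factored form suits recursive descent, here is a rough Python sketch of a predictive parser for a cut-down version of that fragment. The token kinds (NAME, STRING, COLON, PLUS, MINUS) and the function name parse_property are assumptions made for the example, an Expression is reduced to a single NAME or STRING token, and the parenthesized prefix and the DOT NAME suffix are left out.

```python
def parse_property(tokens):
    """Parse one property from a list of (kind, text) tokens.

    Returns (property, remaining_tokens). Simplified on purpose:
    an Expression is a single NAME or STRING token here.
    """
    if not tokens:
        raise SyntaxError("unexpected end of input")
    kind, text = tokens[0]

    # BooleanProperty: the first token alone decides this branch.
    if kind in ("PLUS", "MINUS"):
        if len(tokens) < 2 or tokens[1][0] != "NAME":
            raise SyntaxError("expected NAME after + or -")
        return {"key": tokens[1][1], "value": kind == "PLUS"}, tokens[2:]

    # PropertyPrefix (NAME only in this sketch).
    if kind != "NAME":
        raise SyntaxError(f"unexpected token {text!r}")
    rest = tokens[1:]

    # Optional PropertySuffix: COLON Expression.
    if rest and rest[0][0] == "COLON":
        if len(rest) < 2 or rest[1][0] not in ("NAME", "STRING"):
            raise SyntaxError("expected expression after ':'")
        return {"key": text, "value": rest[1][1]}, rest[2:]

    # Bare name with no suffix.
    return {"key": text, "value": None}, rest


prop, rest = parse_property(
    [("NAME", "prop"), ("COLON", ":"), ("STRING", '"val"')]
)
print(prop)
```

Every branch is chosen by looking at one token, so the parser never has to call itself on the leftmost symbol, which is what the left-recursive version would require.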

What is meaning of ##*/ in unix?

I found syntax like below.
${VARIABLE##*/}
what is the meaning of ##*/ in this?
I know meaning of */ in ls */ but not aware about what above syntax does.
This example will make it clear:
VARIABLE='abcd/def/123'
echo "${VARIABLE#*/}"
def/123
echo "${VARIABLE##*/}"
123
##*/ strips the longest match of anything followed by / from the start of the value.
#*/ strips the shortest match of anything followed by / from the start of the value.
PS: Using all capital variable names is not considered very good practice in Unix shell. Better to use variable instead of VARIABLE.
From man bash:
${parameter#word}
${parameter##word}
Remove matching prefix pattern. The word is expanded to produce
a pattern just as in pathname expansion. If the pattern matches
the beginning of the value of parameter, then the result of the
expansion is the expanded value of parameter with the shortest
matching pattern (the ``#'' case) or the longest matching pattern
(the ``##'' case) deleted. If parameter is @ or *, the
pattern removal operation is applied to each positional parameter
in turn, and the expansion is the resultant list. If parameter
is an array variable subscripted with @ or *, the pattern
removal operation is applied to each member of the array in
turn, and the expansion is the resultant list.
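For readers who think more easily in a general-purpose language, the two expansions from the example can be imitated in Python. This sketch only covers the specific pattern */ (anything up to a slash), not arbitrary shell patterns, and the function names are made up for illustration.

```python
def strip_shortest_prefix(value, sep="/"):
    """Imitate ${value#*/}: drop everything up to and including the first sep."""
    _head, found, tail = value.partition(sep)
    # If sep does not occur, the shell leaves the value unchanged.
    return tail if found else value


def strip_longest_prefix(value, sep="/"):
    """Imitate ${value##*/}: drop everything up to and including the last sep."""
    _head, found, tail = value.rpartition(sep)
    return tail if found else value


variable = "abcd/def/123"
print(strip_shortest_prefix(variable))  # prints def/123
print(strip_longest_prefix(variable))   # prints 123
```

The second function is the usual way to get the last component of a path, which is the most common reason to see ##*/ in scripts.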

Sliding window matching in functional programming

I'm trying to implement a sliding window algorithm for matching words in a text file. I come from a procedural background and my first attempt to do this in a functional language like Erlang seems to require time O(n^2) (or even more). How would one do this in a functional language?
-module(test).
-export([readText/1,patternCount/2,main/0]).
readText(FileName) ->
{ok,File} = file:read_file(FileName),
unicode:characters_to_list(File).
patternCount(Text,Pattern) ->
patternCount_(Text,Pattern,string:len(Pattern),0).
patternCount_(Text,Pattern,PatternLength,Count) ->
case string:len(Text) < PatternLength of
true -> Count;
false ->
case string:equal(string:substr(Text,1,PatternLength),Pattern) of
true ->
patternCount_(string:substr(Text,2),Pattern,PatternLength,Count+1);
false ->
patternCount_(string:substr(Text,2),Pattern,PatternLength,Count)
end
end.
main() ->
test:patternCount(test:readText("file.txt"),"hello").
Your question is a bit too broad, since it asks about implementing this algorithm in functional languages but how best to do that is language-dependent. My answer therefore focuses on Erlang, given your example code.
First, note that there's no need to have separate patternCount and patternCount_ functions. Instead, you can just have multiple patternCount functions with different arities as well as multiple clauses of the same arity. Let's rewrite your functions to take that into account, and also replace calls to string:len/1 with the length/1 built-in function:
patternCount(Text,Pattern) ->
patternCount(Text,Pattern,length(Pattern),0).
patternCount(Text,Pattern,PatternLength,Count) ->
case length(Text) < PatternLength of
true -> Count;
false ->
case string:equal(string:substr(Text,1,PatternLength),Pattern) of
true ->
patternCount(string:substr(Text,2),Pattern,PatternLength,Count+1);
false ->
patternCount(string:substr(Text,2),Pattern,PatternLength,Count)
end
end.
Next, the multi-level indentation in the patternCount/4 function is a "code smell" indicating it can be done better. Let's split that function into multiple clauses:
patternCount(Text,Pattern,PatternLength,Count) when length(Text) < PatternLength ->
Count;
patternCount(Text,Pattern,PatternLength,Count) ->
case string:equal(string:substr(Text,1,PatternLength),Pattern) of
true ->
patternCount(string:substr(Text,2),Pattern,PatternLength,Count+1);
false ->
patternCount(string:substr(Text,2),Pattern,PatternLength,Count)
end.
The first clause uses a guard to detect that no more matches are possible, while the second clause looks for matches. Now let's refactor the second clause to use Erlang's built-in matching. We want to advance through the input text one element at a time, just as the original code does, but we also want to detect matches as we do so. Let's perform the matches in our function head, like this:
patternCount(_Text,[]) -> 0;
patternCount(Text,Pattern) ->
patternCount(Text,Pattern,Pattern,length(Pattern),0).
patternCount(Text,_Pattern,_Pattern,PatternLength,Count) when length(Text) < PatternLength ->
Count;
patternCount(Text,[],Pattern,PatternLength,Count) ->
patternCount(Text,Pattern,Pattern,PatternLength,Count+1);
patternCount([C|TextTail],[C|PatternTail],Pattern,PatternLength,Count) ->
patternCount(TextTail,PatternTail,Pattern,PatternLength,Count);
patternCount([_|TextTail],_,Pattern,PatternLength,Count) ->
patternCount(TextTail,Pattern,Pattern,PatternLength,Count).
First, note that we added a new argument to the bottom four clauses: we now pass Pattern as both the second and third arguments to allow us to use one of them for matching and one of them to maintain the original pattern, as explained more fully below. Note also that we added a new clause at the very top to check for an empty Pattern and just return 0 in that case.
Let's focus only on the bottom three patternCount/5 clauses. These clauses are tried in order at runtime, but let's look at the second of these three clauses first, then the third clause, then the first of the three:
In the second of these three clauses, we write the first and second arguments in [Head|Tail] list notation, which means Head is the first element of the list and Tail is the rest of the list. We use the same variable for the head of both lists, which means that if the first elements of both lists are equal, we have a potential match in progress, so we then recursively call patternCount/5 passing the tails of the lists as the first two arguments. Passing the tails allows us to advance through both the input text and the pattern an element at a time, checking for matching elements.
In the last clause, the heads of the first two arguments do not match; if they did, the runtime would execute the second clause, not this one. This means that our pattern match has failed, and so we no longer care about the first element of the first argument nor about the second argument, and we have to advance through the input text to look for a new match. Note that we write both the head of the input text and the second argument as the _ "don't care" variable, as they are no longer important to us. We recursively call patternCount/5, passing the tail of the input text as the first argument and the full Pattern as the second argument, allowing us to start looking for a new match.
In the first of these three clauses, the second argument is the empty list, which means we've gotten here by successfully matching the full Pattern, element by element. So we recursively call patternCount/5 passing the full Pattern as the second argument to start looking for a new match, and we also increment the match count.
Try it! Here's the full revised module:
-module(test).
-export([read_text/1,pattern_count/2,main/0]).
read_text(FileName) ->
{ok,File} = file:read_file(FileName),
unicode:characters_to_list(File).
pattern_count(_Text,[]) -> 0;
pattern_count(Text,Pattern) ->
pattern_count(Text,Pattern,Pattern,length(Pattern),0).
pattern_count(Text,_Pattern,_Pattern,PatternLength,Count)
when length(Text) < PatternLength ->
Count;
pattern_count(Text,[],Pattern,PatternLength,Count) ->
pattern_count(Text,Pattern,Pattern,PatternLength,Count+1);
pattern_count([C|TextTail],[C|PatternTail],Pattern,PatternLength,Count) ->
pattern_count(TextTail,PatternTail,Pattern,PatternLength,Count);
pattern_count([_|TextTail],_,Pattern,PatternLength,Count) ->
pattern_count(TextTail,Pattern,Pattern,PatternLength,Count).
main() ->
pattern_count(read_text("file.txt"),"hello").
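When testing the module, one behaviour is worth checking. By my reading of the clauses, after a partial match fails the search resumes after the mismatching character, and after a full match it resumes after the matched text, so occurrences that overlap an earlier attempt are not counted (for example "aab" in "aaab", or the second "aa" in "aaa"). The original substring-based version advances one character at a time and does count them. A small reference implementation, sketched here in Python rather than Erlang, can be used to cross-check results:

```python
def pattern_count(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    if not pattern:
        return 0
    width = len(pattern)
    # Slide a window of len(pattern) over every start position.
    return sum(
        1
        for start in range(len(text) - width + 1)
        if text.startswith(pattern, start)
    )


print(pattern_count("aaab", "aab"))          # prints 1
print(pattern_count("aaa", "aa"))            # prints 2
print(pattern_count("hello hello", "hello")) # prints 2
```

Whether overlapping matches should count depends on what the word matching is for; for patterns such as "hello" that cannot overlap themselves, both approaches agree.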
A few final recommendations:
Searching through text element by element is slower than necessary. You should have a look at the Boyer-Moore algorithm and other related algorithms to see ways of advancing through text in larger chunks. For example, Boyer-Moore attempts to match at the end of the pattern first, since if that's not a match, it can advance through the text by as much as the full length of the pattern.
You might also want to look into using Erlang binaries rather than lists, as they are more compact memory-wise and they allow for matching more than just their first elements. For example, if Text is the input text as a binary and Pattern is the pattern as a binary, and assuming the size of Text is equal to or greater than the size of Pattern, this code attempts to match the whole pattern:
case Text of
    <<Pattern:PatternLength/binary, TextTail/binary>> ->
        patternCount(TextTail,Pattern,PatternLength,Count+1);
    <<_,TextTail/binary>> ->
        patternCount(TextTail,Pattern,PatternLength,Count)
end.
Note that this code snippet reverts to using patternCount/4 since we no longer need the extra Pattern argument to work through element by element.
As shown in the full revised module, when calling functions in the same module, you don't need the module prefix. See the simplified main/0 function.
As shown in the full revised module, conventional Erlang style does not use mixed case function names like patternCount. Most Erlang programmers would use pattern_count instead.
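To give a feel for the Boyer-Moore recommendation above, here is a sketch of the simpler Boyer-Moore-Horspool variant, written in Python for brevity. It is not full Boyer-Moore: it uses only a single shift table keyed on the text character under the last position of the window. The function name horspool_count is made up for the example.

```python
def horspool_count(text, pattern):
    """Count matches (overlapping included) using Horspool-style shifts."""
    m, n = len(pattern), len(text)
    if m == 0 or n < m:
        return 0
    # For every pattern character except the last: how far the window can
    # jump when that character sits under the window's last position.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    count, pos = 0, 0
    while pos <= n - m:
        # The window is compared with a slice for brevity; a tuned
        # implementation would compare right to left and stop early.
        if text[pos:pos + m] == pattern:
            count += 1
        # Characters that do not occur in the pattern allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return count


print(horspool_count("hello hello", "hello"))  # prints 2
```

The gain comes from the last line of the loop: when the character under the end of the window does not occur in the pattern at all, the window moves forward by the whole pattern length instead of by one element.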

Subtracting count values in UNIX

I am trying to subtract one count value from another, but I am having a problem with the following code :
count=$?
count1=$?
(then some operations and above count values got some value suppose 1,2 respectively)
$count=$count1 - $count ==> Here it should get : 2-1=1 )
I don't know the exact syntax for this, so can anyone help me please?
You can use the shell's expression syntax:
count=$(($count1-$count))
the $ prefix on variables is optional inside $(()), so this can also be written as:
count=$((count1-count))
Unix provides you the command expr that lets you evaluate any arithmetic expression you want. At shell prompt try :
expr 2 - 3 + 5 '*' 8
Remember that the shell treats * as a wildcard, so you need to quote or escape it (for example '*' or \*).
You can then use backticks (`) to evaluate an expression anywhere :
count=`expr $count1 - $count`
Be aware that all arguments MUST be separated with spaces.
This will work for Bourne-shell which is the one recommended for shell-scripts.

Simple Vim Programming (vimrc file)

I'm trying to learn how to configure my .vimrc file with my own functions.
I'd like to write a function that traverses every line in a file and counts the total number of characters, but ignores all whitespace. This is for a programming exercise and as a stepping stone to more complex programs (I know there are other ways to get this example value using Vim or external programs).
Here's what I have so far:
function countchars()
let line = 0
let count = 0
while line < line("$")
" update count here, don't count whitespace
let line = getline(".")
return count
endfun
What functional code could I replace that commented line with?
If I understand the question correctly, you're looking to count the number of non-whitespace characters in a line. A fairly simple way to do this is to remove the whitespace and look at the length of the resulting line. Therefore, something like this:
function! Countchars()
let l = 1
let char_count = 0
while l <= line("$")
let char_count += len(substitute(getline(l), '\s', '', 'g'))
let l += 1
endwhile
return char_count
endfunction
The key part of the answer to the question is the use of substitute. The command is:
substitute(expr,pattern,repl,flags)
expr in this case is getline(l) where l is the number of the line being iterated over. getline() returns the content of the line, so this is what is being parsed. The pattern is the regular expression \s which matches any single whitespace character. It is replaced with '', i.e. an empty string. The flag g makes it repeat the substitute as many times as whitespace is found on the line.
Once the substitution is complete, len() gives the number of non-whitespace characters and this is added to the current value of char_count with +=.
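The same strip-then-measure idea can be written outside Vim, which may help when checking the function's result. This is only an illustrative Python sketch with a made-up function name; note that Python's \s matches a somewhat wider set of whitespace characters than Vim's \s, which covers space and tab.

```python
import re


def count_non_whitespace(lines):
    """Sum the lengths of the lines after deleting whitespace characters."""
    return sum(len(re.sub(r"\s", "", line)) for line in lines)


print(count_non_whitespace(["a b", "\tcd ", ""]))  # prints 4
```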
A few things that I've changed from your sample:
The function name starts with a capital letter (this is a requirement for user defined functions: see :help user-functions)
I've renamed count to char_count as you can't have a variable with the same name as a function and count() is a built-in function
Likewise for line: I renamed this to l
The first line in a file is line 1, not line 0, so I initialised l to 1
The while loop counted up to but not including the last line, I assume you wanted all the lines in the file (this is probably related to the line numbering starting at 1): I changed your code to use <= instead of <
Blocks aren't based on indentation in vim, so the while needs an endwhile
In your function, you have let line = getline('.')
I added a ! on the function definition as it makes incremental development much easier (every time you re-source the file, it will override the function with the new version rather than spitting out an error message about it already existing).
Incrementing through the file works slightly differently...
In your function, you had let line = getline('.'). Ignoring the variable name, there are still some problems with this implementation. I think what you meant was let l = line('.'), which gives the line number of the current line. getline('.') gives the contents of the current line, so the comparison on the while line would be comparing the content of the current line with the number of the last line and this would fail. The other problem is that you're not actually moving through the file, so the current line would be whichever line you were on when you called the function and would never change, resulting in an infinite loop. I've replaced this with a simple += 1 to step through the file.
There are cases where using the current line would be a useful way to do this, for example if you were writing a function that took a range of lines, but I think I've written enough, and the above will hopefully get you going. There are plenty of people on stackoverflow to help with any issues anyway!
Have a look at:
:help usr_41.txt
:help function-list
:help user-functions
:help substitute()
along with the :help followed by the various things I used in the function (getline(), line(), let+= etc).
Hope that was helpful.
This approach uses lists:
function! Countchars()
let n = 0
for line in getline(1,line('$'))
let n += len(split(line,'\zs\s*'))
endfor
return n
endfunction
I suppose you have already found the solution.
Just for info:
I use this to count characters without spaces in Vim:
%s/\S/&/gn

Resources