Pyparser grammar not parsing correctly - pyparsing

Here is my grammar:
from pyparsing import Combine, Forward, Group, Literal, Optional, Word
from pyparsing import alphas, delimitedList, infixNotation, nums, oneOf, opAssoc, operatorPrecedence, quotedString, removeQuotes
integer = Combine(Optional(oneOf("+ -")) + Word(nums)).setParseAction(lambda t: int(t[0]))
real = Combine(Optional(oneOf("+ -")) + Word(nums) + "." + Optional(Word(nums))).setParseAction(
lambda t: float(t[0]))
variable = Word(alphas)
qs = quotedString.setParseAction(removeQuotes)
lt_brac = Literal('[').suppress()
rt_brac = Literal(']').suppress()
exp_op = Literal('^')
mult_op = oneOf('* /')
plus_op = oneOf('+ -')
relation = oneOf('== != < >')
regex_compare = Literal('~')
function_call = Forward()
operand = function_call | qs | real | integer | variable
expr = operatorPrecedence(operand,
[
(":", 2, opAssoc.LEFT),
(exp_op, 2, opAssoc.RIGHT),
(regex_compare, 2, opAssoc.LEFT),
(mult_op, 2, opAssoc.LEFT),
(plus_op, 2, opAssoc.LEFT),
(relation, 2, opAssoc.LEFT)
])
bool_operand = expr
bool_expr = infixNotation(bool_operand,
[
("not", 1, opAssoc.RIGHT),
("and", 2, opAssoc.LEFT),
("or", 2, opAssoc.LEFT),
])
function_param = function_call | expr | variable | integer | real
function_call <<= Group(variable + lt_brac + Group(Optional(delimitedList(function_param))) + rt_brac)
final_expr = Group(function_call | bool_expr | expr )
final_expr.enablePackrat()
def parse(expression):
    return final_expr.parseString(expression)
The above grammar is supposed to parse arithmetic expressions; relational statements (<, >, !=, ==) whose operands can be arithmetic expressions; and boolean expressions (or, and, not) whose operands can be arithmetic or relational statements.
The grammar also supports function calls of the form name[params]. Params can be arithmetic expressions.
This works fine in most cases. However, I have the following question: using the above grammar, when I try to parse
print(parse("abs[abc:sec - abc:sge] > 1"))
I get the following output
[[['abs', [[['abc', ':', 'sec'], '-', ['abc', ':', 'sge']]]]]]
Why is the ' > 1' ignored?

It's ignored because of this definition of final_expr:
final_expr = Group(function_call | bool_expr | expr )
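Why do you define this expression this way? Since function_call is the first alternative, it matches abs[...] and the parser stops there; parseString does not complain about the unparsed ' > 1' unless you pass parseAll=True. An expr is a simple bool_expr, and a function_call is a simple expr. Just do this: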
final_expr = bool_expr
And you'll parse your given expression as:
[[['abs', [[['abc', ':', 'sec'], '-', ['abc', ':', 'sge']]]], '>', 1]]
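To see why the order of alternatives matters, here is a minimal sketch using a toy grammar (not the one from the question). The names too_eager and comparison are made up for illustration. It shows that | takes the first alternative that matches, even when a later one would consume more of the input.

```python
from pyparsing import Word, alphas, infixNotation, nums, oneOf, opAssoc

operand = Word(alphas) | Word(nums)
comparison = infixNotation(operand, [(oneOf("< >"), 2, opAssoc.LEFT)])

# A bare operand is tried first, matches "x", and parsing stops there.
too_eager = operand | comparison
short_result = too_eager.parseString("x > 1").asList()

# comparison already accepts a bare operand, so it can be used on its own.
full_result = comparison.parseString("x > 1").asList()
bare_result = comparison.parseString("x").asList()

print(short_result)  # ['x']
print(full_result)   # [['x', '>', '1']]
print(bare_result)   # ['x']
```

The same reasoning applies to final_expr: the most general expression already covers the simpler ones, so listing the simpler ones first only gives them a chance to stop early.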

Related

pyparsing parse c/cpp enums with values as user defined macros

I have a use case where I need to match enums whose values can be user-defined macros.
Example enum
typedef enum
{
VAL_1 = -1,
VAL_2 = 0,
VAL_3 = 0x10,
VAL_4 = TEST_ENUM_CUSTOM(1,2),
}MyENUM;
I am using the code below. It works if I don't use the format shown in VAL_4, but I need to match that format as well. I am new to pyparsing; any help is appreciated.
My code:
LBRACE, RBRACE, EQ, COMMA = map(Suppress, "{}=,")
_enum = Suppress("enum")
identifier = Word(alphas, alphanums + "_")
integer = Word("-"+alphanums)  # I have tried adding "_(,)" to this, but it does not match.
enumValue = Group(identifier("name") + Optional(EQ + integer("value")))
enumList = Group(enumValue + ZeroOrMore(COMMA + enumValue) + Optional(COMMA))
enum = _enum + Optional(identifier("enum")) + LBRACE + enumList("names") + RBRACE + Optional(identifier("typedef"))
enum.ignore(cppStyleComment)
enum.ignore(cStyleComment)
Thanks
-Purna
Adding more characters to integer is the wrong way to go. Even this expression:
integer = Word("-"+alphanums)
isn't super-great, since it would match "---", "xyz", "q--10-", and many other non-integer strings.
Better to define integer properly. You could do:
integer = Combine(Optional('-') + Word(nums))
but I've found that for these low-level expressions that occur many places in your parse string, a Regex is best:
integer = Regex(r"-?\d+") # Regex(r"-?[0-9]+") if you like more readable re's
Then define one for hex_integer as well.
Then to add macros, we need a recursive expression, to handle the possibility of macros having arguments that are also macros.
So at this point, we should just stop writing code for a bit, and do some design. In parser development, this design usually looks like a BNF, where you describe your parser in a sort of pseudocode:
enum_expr ::= "typedef" "enum" [identifier]
"{"
enum_item_list
"}" [identifier] ";"
enum_item_list ::= enum_item ["," enum_item]... [","]
enum_item ::= identifier "=" enum_value
enum_value ::= integer | hex_integer | macro_expression
macro_expression ::= identifier "(" enum_value ["," enum_value]... ")"
Note the recursion of macro_expression: it is used in defining enum_value, but it includes enum_value as part of its own definition. In pyparsing, we use a Forward to set up this kind of recursion.
See how that BNF is implemented in the code below. I build on some of the items you posted, but the macro expression required some rework. The bottom line is "don't just keep adding characters to integer trying to get something to work."
LBRACE, RBRACE, EQ, COMMA, LPAR, RPAR, SEMI = map(Suppress, "{}=,();")
_typedef = Keyword("typedef").suppress()
_enum = Keyword("enum").suppress()
identifier = Word(alphas, alphanums + "_")
# define an enumValue expression that is recursive, so that enumValues
# that are macros can take parameters that are enumValues
enumValue = Forward()
# add more types as needed - parse action on hex_integer will do parse-time
# conversion to int
integer = Regex(r"-?\d+").addParseAction(lambda t: int(t[0]))
# or just use the signed_integer expression found in pyparsing_common
# integer = pyparsing_common.signed_integer
hex_integer = Regex(r"0x[0-9a-fA-F]+").addParseAction(lambda t: int(t[0], 16))
# a macro defined using enumValue for parameters
macro_expr = Group(identifier + LPAR + Group(delimitedList(enumValue)) + RPAR)
# use '<<=' operator to attach recursive definition to enumValue
enumValue <<= hex_integer | integer | macro_expr
# remaining enum expressions
enumItem = Group(identifier("name") + Optional(EQ + enumValue("value")))
enumList = Group(delimitedList(enumItem) + Optional(COMMA))
enum = (_typedef
+ _enum
+ Optional(identifier("enum"))
+ LBRACE
+ enumList("names")
+ RBRACE
+ Optional(identifier("typedef"))
+ SEMI
)
# this comment style includes cStyleComment too, so no need to
# ignore both
enum.ignore(cppStyleComment)
Try it out:
enum.runTests([
"""
typedef enum
{
VAL_1 = -1,
VAL_2 = 0,
VAL_3 = 0x10,
VAL_4 = TEST_ENUM_CUSTOM(1,2)
}MyENUM;
""",
])
runTests is for testing and debugging your parser during development. Use enum.parseString(some_enum_expression) or enum.searchString(some_c_header_file_text) to get the actual parse results.
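As a rough illustration of getting at the actual results with parseString, here is a cut-down sketch that parses a single enum item whose value is a nested macro. The item expression and the OUTER/INNER macro names are invented for this example; the value expressions mirror the ones defined above.

```python
from pyparsing import (Forward, Group, Regex, Suppress, Word,
                       alphas, alphanums, delimitedList)

LPAR, RPAR, EQ = map(Suppress, "()=")
identifier = Word(alphas, alphanums + "_")

# recursive value: a macro's arguments may themselves be macros
value = Forward()
hex_integer = Regex(r"0x[0-9a-fA-F]+").addParseAction(lambda t: int(t[0], 16))
integer = Regex(r"-?\d+").addParseAction(lambda t: int(t[0]))
macro = Group(identifier + LPAR + Group(delimitedList(value)) + RPAR)
value <<= hex_integer | integer | macro

# hypothetical single-item expression, just for this demonstration
item = identifier("name") + EQ + value("value")

result = item.parseString("VAL_4 = OUTER(1, INNER(0x10, -2))", parseAll=True)
name = result["name"]
nested = result.asList()

print(name)    # VAL_4
print(nested)  # ['VAL_4', ['OUTER', [1, ['INNER', [16, -2]]]]]
```

Each macro comes back as a [name, [arguments]] pair, so nested macros show up as nested lists and the integers have already been converted by the parse actions.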
The new railroad diagram feature in the upcoming pyparsing 3.0 release can generate a visual representation of this parser.

antlr same type arithmetic expression

I am trying to write an ANTLR grammar that parses arithmetic expressions only when the variables have the same type. If the left and right sides are not of the same type, the expression should not parse. This is what I have:
stat
: Left = VARIABLE Op = ASSIGMENT Right = expr # Assigment
;
expr
: '(' Exp = expr ')' # Parens
| MINUS Exp = expr # UnaryMinus
| Left = expr Op = (TIMES | DIV) Right = expr # MulDiv
| Left = expr Op = (PLUS | MINUS) Right = expr # AddSub
| (VARIABLE | CONSTANT) # Element
;
ASSIGMENT : '=' ;
PLUS : '+' ;
MINUS : '-' ;
TIMES : '*' ;
DIV : '/' ;
LPAREN : '(' ;
RPAREN : ')' ;
I don't want anything like x = 5 + 'f' or x = c - 5 (if c is a variable that is not an integer) to be accepted.
This is called semantic analysis. The grammar only describes the shape of an expression; it does not know the types of the variables.
When parsing is done, you have to walk the generated AST and check the correctness of each expression and variable.
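A minimal sketch of such a post-parse check, written in plain Python rather than against the ANTLR-generated classes. The tuple-based AST, the symbol table and the names type_of and check_assignment are all assumptions made for illustration; with ANTLR the same logic would typically live in a visitor or listener over the parse tree.

```python
class TypeMismatch(Exception):
    """Raised when operands or an assignment mix different types."""


def type_of(node, symbols):
    """Return the type name of an expression node, checking it on the way."""
    kind = node[0]
    if kind == "const":                      # ("const", value)
        return type(node[1]).__name__
    if kind == "var":                        # ("var", name)
        return symbols[node[1]]
    if kind == "neg":                        # ("neg", operand)
        return type_of(node[1], symbols)
    if kind == "binop":                      # ("binop", op, left, right)
        _, op, left, right = node
        left_type = type_of(left, symbols)
        right_type = type_of(right, symbols)
        if left_type != right_type:
            raise TypeMismatch(f"{left_type} {op} {right_type}")
        return left_type
    raise ValueError(f"unknown node kind: {kind!r}")


def check_assignment(target, expr, symbols):
    """Check 'target = expr' and return the common type."""
    expr_type = type_of(expr, symbols)
    if symbols[target] != expr_type:
        raise TypeMismatch(f"{symbols[target]} = {expr_type}")
    return expr_type


symbols = {"x": "int", "c": "str"}                  # hypothetical symbol table
good = ("binop", "+", ("const", 5), ("var", "x"))   # x = 5 + x
bad = ("binop", "-", ("var", "c"), ("const", 5))    # x = c - 5

print(check_assignment("x", good, symbols))  # int
try:
    check_assignment("x", bad, symbols)
except TypeMismatch as err:
    print("rejected:", err)                  # rejected: str - int
```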

filter max of N

Would it be possible to write in FFL a version of filter that stops filtering after the first negative match, i.e. the remaining items are assumed to be positive matches? More generally, a filter that removes at most N items.
Example:
removeMaxOf1([1,2,3,4], value>=2)
Expected Result:
[1,3,4]
This seems like something very difficult to write in a pure functional style. Maybe recursion or let could achieve it?
Note: the whole motivation for this question was hypothesizing about micro-optimizations, so performance is very relevant. I am also looking for something that is generally applicable to any data type, not just int.
I have recently added find_index to the engine which allows this to be done easily:
if(n = -1, list, list[:n] + list[n+1:])
where n = find_index(list, value>=2)
where list = [1,2,3,4]
find_index returns the index of the first match, or -1 if no match is found. There is also find_index_or_die, which returns the index of the first match and asserts if none is found; use it when you're absolutely certain there is a match in the list.
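For readers who don't know FFL, the find-the-index-then-splice idea can be sketched in Python. This is only an analogy, not FFL code: remove_first_failing is a hypothetical name, and keep is assumed to be the predicate an element must satisfy to stay.

```python
def remove_first_failing(items, keep):
    """Drop only the first element for which keep(element) is false."""
    for index, item in enumerate(items):
        if not keep(item):
            # splice around the first failing element and stop scanning
            return items[:index] + items[index + 1:]
    # nothing failed the predicate, so return an unchanged copy
    return items[:]


result = remove_first_failing([1, 2, 3, 4], lambda v: v < 2)
words = remove_first_failing(["a", "bb", "c", "dd"], lambda s: len(s) == 1)

print(result)  # [1, 3, 4]
print(words)   # ['a', 'c', 'dd']
```

The predicate is evaluated only up to the first failure, which is the point of the micro-optimization, and nothing in it is specific to int.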
You could also implement something like this using recursion:
def filterMaxOf1(list ls, function(list)->bool pred, list result=[]) ->list
base ls = []: result
base not pred(ls[0]): result + ls[1:]
recursive: filterMaxOf1(ls[1:], pred, result + [ls[0]])
Of course recursion can! :D
filterMaxOf1(input, target)
where filterMaxOf1 = def
([int] l, function f) -> [int]
if(size(l) = 0,
[],
if(not f(l[0]),
l[1:],
flatten([
l[0],
recurse(l[1:], f)
])
)
)
where input = [
1, 2, 3, 4, ]
where target = def
(int i) -> bool
i < 2
Some checks:
--> filterMaxOf1([1, ]) where filterMaxOf1 = [...]
[1]
--> filterMaxOf1([2, ]) where filterMaxOf1 = [...]
[]
--> filterMaxOf1([1, 2, ]) where filterMaxOf1 = [...]
[1]
--> filterMaxOf1([1, 2, 3, 4, ]) where filterMaxOf1 = [...]
[1, 3, 4]
This flavor loses some strong type safety, but is nearer to tail recursion:
filterMaxOf1(input, target)
where filterMaxOf1 = def
([int] l, function f) -> [int]
flatten(filterMaxOf1i(l, f))
where filterMaxOf1i = def
([int] l, function f) -> [any]
if(size(l) = 0,
[],
if(not f(l[0]),
l[1:],
[
l[0],
recurse(l[1:], f)
]
)
)
where input = [
1, 2, 3, 4, ]
where target = def
(int i) -> bool
i < 2

pyparsing: Grouping guidelines

Below is the code I put together. It can parse a nested function call, a logical function call, or a hybrid call that nests both function and logical function calls. The dump() output has too many unnecessary levels of nesting because of grouping, but removing the Group() calls results in wrong output. Is there a guideline for when to use Group()?
Also, the pyparsing documentation doesn't detail how to walk the resulting tree, and there is not much information available elsewhere. Please point me to a link or guide that would help me write a tree walker for recursively parsed data for my test cases.
I will be translating this parsed data into valid Tcl code.
from pyparsing import *
from pyparsing import OneOrMore, Optional, Word, delimitedList, Suppress
# parse action -maker; # from Paul's example
def makeLRlike(numterms):
    if numterms is None:
        # None operator can only be a binary op
        initlen = 2
        incr = 1
    else:
        initlen = {0:1,1:2,2:3,3:5}[numterms]
        incr = {0:1,1:1,2:2,3:4}[numterms]
    # define parse action for this number of terms,
    # to convert flat list of tokens into nested list
    def pa(s,l,t):
        t = t[0]
        if len(t) > initlen:
            ret = ParseResults(t[:initlen])
            i = initlen
            while i < len(t):
                ret = ParseResults([ret] + t[i:i+incr])
                i += incr
            return ParseResults([ret])
    return pa
line = Forward()
fcall = Forward().setResultsName("fcall")
flogical = Forward()
lparen = Literal("(").suppress()
rparen = Literal(")").suppress()
arg = Word(alphas,alphanums+"_"+"."+"+"+"-"+"*"+"/")
args = delimitedList(arg).setResultsName("arg")
fargs = delimitedList(OneOrMore(flogical) | OneOrMore(fcall) |
OneOrMore(arg))
fname = Word(alphas,alphanums+"_")
fcall << Group(fname.setResultsName('func') + Group(lparen +
Optional(fargs) + rparen).setResultsName('fargs'))
flogic = Keyword("or") | Keyword("and") | Keyword("not")
logicalArg = delimitedList(Group(fcall.setResultsName("fcall")) |
Group(arg.setResultsName("arg")))
#logicalArg.setDebug()
flogical << Group(logicalArg.setResultsName('larg1') +
flogic.setResultsName('flogic') + logicalArg.setResultsName('larg2'))
#logical = operatorPrecedence(flogical, [(not, 1, opAssoc.RIGHT, makeLRlike(2)),
#                                        (and, 2, opAssoc.LEFT, makeLRlike(2)),
#                                        (or , 2, opAssoc.LEFT, makeLRlike(2))])
line = flogical | fcall #change to logical if operatorPrecedence is used
# Works fine
print(line.parseString("f(x, y)").dump())
print(line.parseString("f(h())").dump())
print(line.parseString("a and b").dump())
print(line.parseString("f(a and b)").dump())
print(line.parseString("f(g(x))").dump())
print(line.parseString("f(a and b) or h(b not c)").dump())
print(line.parseString("f(g(x), y)").dump())
print(line.parseString("g(f1(x), a, b, f2(x,y, k(x,y)))").dump())
print(line.parseString("f(a not c) and g(f1(x), a, b, f2(x,y, k(x,y)))").dump())
#Doesn't work yet;
#try changing flogical assignment to logicalArg | flogic
#print line.parseString("a or b or c").dump()
#print line.parseString("f(a or b(x) or c)").dump()
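As a small illustration of what Group changes, and of one way to walk grouped results, here is a sketch using a toy function-call grammar rather than the one above. The names flat_call, call and walk are invented for this example, and the walker is just one possible approach: it converts the results with asList() and recurses over plain Python lists.

```python
from pyparsing import (Forward, Group, Optional, Suppress, Word,
                       alphas, alphanums, delimitedList)

LPAR, RPAR = map(Suppress, "()")
name = Word(alphas, alphanums + "_")

# Without Group, every token lands in one flat list and the nesting is lost.
flat_call = Forward()
flat_call <<= name + LPAR + Optional(delimitedList(flat_call | name)) + RPAR

# With Group around each call and each argument list, structure is preserved.
call = Forward()
call <<= Group(name + LPAR + Group(Optional(delimitedList(call | name))) + RPAR)

flat = flat_call.parseString("f(g(x), y)").asList()
nested = call.parseString("f(g(x), y)").asList()


def walk(node, depth=0):
    """Depth-first walk over nested lists, yielding (depth, leaf) pairs."""
    if isinstance(node, list):
        for child in node:
            yield from walk(child, depth + 1)
    else:
        yield depth, node


leaves = [leaf for _, leaf in walk(nested)]

print(flat)    # ['f', 'g', 'x', 'y']
print(nested)  # [['f', [['g', ['x']], 'y']]]
print(leaves)  # ['f', 'g', 'x', 'y']
```

A translator would replace the yield with whatever output each kind of node needs; the recursion pattern stays the same.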

pyparsing not parsing the whole string

I have the following grammar and test case:
from pyparsing import Word, nums, Forward, Suppress, OneOrMore, Group
#A grammar for a simple class of regular expressions
number = Word(nums)('number')
lparen = Suppress('(')
rparen = Suppress(')')
expression = Forward()('expression')
concatenation = Group(expression + expression)
concatenation.setResultsName('concatenation')
disjunction = Group(lparen + OneOrMore(expression + Suppress('|')) + expression + rparen)
disjunction.setResultsName('disjunction')
kleene = Group(lparen + expression + rparen + '*')
kleene.setResultsName('kleene')
expression << (number | disjunction | kleene | concatenation)
#Test a simple input
tests = """
(8)*((3|2)|2)
""".splitlines()[1:]
for t in tests:
    print(t)
    print(expression.parseString(t))
    print()
The result should be
[['8', '*'],[['3', '2'], '2']]
but instead, I only get
[['8', '*']]
How do I get pyparsing to parse the whole string?
parseString has a parameter parseAll. If you call parseString with parseAll=True you will get error messages if your grammar does not parse the whole string. Go from there!
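A minimal sketch of the difference, using a deliberately tiny grammar (digits is a made-up name, unrelated to the grammar in the question):

```python
from pyparsing import OneOrMore, ParseException, Word, nums

digits = Word(nums)

# Default behaviour: parsing stops after the first match and the
# rest of the input is silently ignored.
partial = digits.parseString("12 34 x").asList()

# With parseAll=True, unparsed trailing input raises ParseException.
try:
    OneOrMore(digits).parseString("12 34 x", parseAll=True)
    complained = False
except ParseException as err:
    complained = True
    message = str(err)  # includes the location where parsing gave up

print(partial)     # ['12']
print(complained)  # True
```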
Your concatenation expression is not doing what you want, and comes close to being left-recursive (fortunately it is the last term in your expression). Your grammar works if you instead do:
expression << OneOrMore(number | disjunction | kleene)
With this change, I get this result:
[['8', '*'], [['3', '2'], '2']]
EDIT:
You can also avoid the operator-precedence problem (<< binds more tightly than |) if you use the <<= operator instead:
expression <<= OneOrMore(number | disjunction | kleene)
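To make the precedence issue concrete, here is a small sketch with two made-up Forward expressions. With <<, Python evaluates the shift first, so only the first alternative is attached; with <<=, the whole right-hand side is evaluated before it is attached.

```python
from pyparsing import Forward, ParseException, Word, alphas, nums

# "<<" binds tighter than "|": this attaches only Word(nums), and the
# "| Word(alphas)" part builds a separate expression that is discarded.
lshift_expr = Forward()
lshift_expr << Word(nums) | Word(alphas)

# "<<=" is an assignment, so the full alternation is attached.
ilshift_expr = Forward()
ilshift_expr <<= Word(nums) | Word(alphas)


def accepts(parser, text):
    """Return True if parser matches the whole of text."""
    try:
        parser.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


print(accepts(lshift_expr, "123"))   # True
print(accepts(lshift_expr, "abc"))   # False
print(accepts(ilshift_expr, "abc"))  # True
```

Wrapping the right-hand side in parentheses, as in the question's original expression << (...), has the same effect as <<=.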
