pyparsing not parsing the whole string - pyparsing

I have the following grammar and test case:
from pyparsing import Word, nums, Forward, Suppress, OneOrMore, Group
#A grammar for a simple class of regular expressions
number = Word(nums)('number')
lparen = Suppress('(')
rparen = Suppress(')')
expression = Forward()('expression')
concatenation = Group(expression + expression)
concatenation.setResultsName('concatenation')
disjunction = Group(lparen + OneOrMore(expression + Suppress('|')) + expression + rparen)
disjunction.setResultsName('disjunction')
kleene = Group(lparen + expression + rparen + '*')
kleene.setResultsName('kleene')
expression << (number | disjunction | kleene | concatenation)
#Test a simple input
tests = """
(8)*((3|2)|2)
""".splitlines()[1:]
for t in tests:
print t
print expression.parseString(t)
print
The result should be
[['8', '*'],[['3', '2'], '2']]
but instead, I only get
[['8', '*']]
How do I get pyparsing to parse the whole string?

parseString has a parameter parseAll. If you call parseString with parseAll=True you will get error messages if your grammar does not parse the whole string. Go from there!

Your concatenation expression is not doing what you want, and comes close to being left-recursive (fortunately it is the last term in your expression). Your grammar works if you instead do:
expression << OneOrMore(number | disjunction | kleene)
With this change, I get this result:
[['8', '*'], [['3', '2'], '2']]
EDIT:
You an also avoid the precedence of << over | if you use the <<= operator instead:
expression <<= OneOrMore(number | disjunction | kleene)

Related

pyparsing parse c/cpp enums with values as user defined macros

I have a usecase where i need to match enums where values can be userdefined macros.
Example enum
typedef enum
{
VAL_1 = -1
VAL_2 = 0,
VAL_3 = 0x10,
VAL_4 = **TEST_ENUM_CUSTOM(1,2)**,
}MyENUM;
I am using the below code, if i don't use format as in VAL_4 it works. I need match format as in VAL_4 as well. I am new to pyparsing, any help is appeciated.
My code:
BRACE, RBRACE, EQ, COMMA = map(Suppress, "{}=,")
_enum = Suppress("enum")
identifier = Word(alphas, alphanums + "_")
integer = Word("-"+alphanums) **#I have tried to "_(,)" to this but is not matching.**
enumValue = Group(identifier("name") + Optional(EQ + integer("value")))
enumList = Group(enumValue + ZeroOrMore(COMMA + enumValue) + Optional(COMMA))
enum = _enum + Optional(identifier("enum")) + LBRACE + enumList("names") + RBRACE + Optional(identifier("typedef"))
enum.ignore(cppStyleComment)
enum.ignore(cStyleComment)
Thanks
-Purna
Just adding more characters to integer is just the wrong way to go. Even this expression:
integer = Word("-"+alphanums)
isn't super-great, since it would match "---", "xyz", "q--10-", and many other non-integer strings.
Better to define integer properly. You could do:
integer = Combine(Optional('-') + Word(nums))
but I've found that for these low-level expressions that occur many places in your parse string, a Regex is best:
integer = Regex(r"-?\d+") # Regex(r"-?[0-9]+") if you like more readable re's
Then define one for hex_integer also,
Then to add macros, we need a recursive expression, to handle the possibility of macros having arguments that are also macros.
So at this point, we should just stop writing code for a bit, and do some design. In parser development, this design usually looks like a BNF, where you describe your parser in a sort of pseudocode:
enum_expr ::= "typedef" "enum" [identifier]
"{"
enum_item_list
"}" [identifier] ";"
enum_item_list ::= enum_item ["," enum_item]... [","]
enum_item ::= identifier "=" enum_value
enum_value ::= integer | hex_integer | macro_expression
macro_expression ::= identifier "(" enum_value ["," enum_value]... ")"
Note the recursion of macro_expression: it is used in defining enum_value, but it includes enum_value as part of its own definition. In pyparsing, we use a Forward to set up this kind of recursion.
See how that BNF is implemented in the code below. I build on some of the items you posted, but the macro expression required some rework. The bottom line is "don't just keep adding characters to integer trying to get something to work."
LBRACE, RBRACE, EQ, COMMA, LPAR, RPAR, SEMI = map(Suppress, "{}=,();")
_typedef = Keyword("typedef").suppress()
_enum = Keyword("enum").suppress()
identifier = Word(alphas, alphanums + "_")
# define an enumValue expression that is recursive, so that enumValues
# that are macros can take parameters that are enumValues
enumValue = Forward()
# add more types as needed - parse action on hex_integer will do parse-time
# conversion to int
integer = Regex(r"-?\d+").addParseAction(lambda t: int(t[0]))
# or just use the signed_integer expression found in pyparsing_common
# integer = pyparsing_common.signed_integer
hex_integer = Regex(r"0x[0-9a-fA-F]+").addParseAction(lambda t: int(t[0], 16))
# a macro defined using enumValue for parameters
macro_expr = Group(identifier + LPAR + Group(delimitedList(enumValue)) + RPAR)
# use '<<=' operator to attach recursive definition to enumValue
enumValue <<= hex_integer | integer | macro_expr
# remaining enum expressions
enumItem = Group(identifier("name") + Optional(EQ + enumValue("value")))
enumList = Group(delimitedList(enumItem) + Optional(COMMA))
enum = (_typedef
+ _enum
+ Optional(identifier("enum"))
+ LBRACE
+ enumList("names")
+ RBRACE
+ Optional(identifier("typedef"))
+ SEMI
)
# this comment style includes cStyleComment too, so no need to
# ignore both
enum.ignore(cppStyleComment)
Try it out:
enum.runTests([
"""
typedef enum
{
VAL_1 = -1,
VAL_2 = 0,
VAL_3 = 0x10,
VAL_4 = TEST_ENUM_CUSTOM(1,2)
}MyENUM;
""",
])
runTests is for testing and debugging your parser during development. Use enum.parseString(some_enum_expression) or enum.searchString(some_c_header_file_text) to get the actual parse results.
Using the new railroad diagram feature in the upcoming pyparsing 3.0 release, here is a visual representation of this parser:

pyparsing: Grouping guidelines

pyparsing: The below is the code i put up which can parse a nested function call , a logical function call or a hybrid call which nests both the function and a logical function call. The dump() data adds too many unnecessary levels of braces because of grouping. Removing the Group() results in a wrong output. Is there a guideline to use Group(parsers)?
Also the Pyparsing document does'nt detail on how to walk the tree created and not much of data is available out there. Please point me to a link/guide which helps me write the tree walker for recursively parsed data for my test cases.
I will be translating this parsed data to a valid tcl code.
from pyparsing import *
from pyparsing import OneOrMore, Optional, Word, delimitedList, Suppress
# parse action -maker; # from Paul's example
def makeLRlike(numterms):
if numterms is None:
# None operator can only by binary op
initlen = 2
incr = 1
else:
initlen = {0:1,1:2,2:3,3:5}[numterms]
incr = {0:1,1:1,2:2,3:4}[numterms]
# define parse action for this number of terms,
# to convert flat list of tokens into nested list
def pa(s,l,t):
t = t[0]
if len(t) > initlen:
ret = ParseResults(t[:initlen])
i = initlen
while i < len(t):
ret = ParseResults([ret] + t[i:i+incr])
i += incr
return ParseResults([ret])
return pa
line = Forward()
fcall = Forward().setResultsName("fcall")
flogical = Forward()
lparen = Literal("(").suppress()
rparen = Literal(")").suppress()
arg = Word(alphas,alphanums+"_"+"."+"+"+"-"+"*"+"/")
args = delimitedList(arg).setResultsName("arg")
fargs = delimitedList(OneOrMore(flogical) | OneOrMore(fcall) |
OneOrMore(arg))
fname = Word(alphas,alphanums+"_")
fcall << Group(fname.setResultsName('func') + Group(lparen +
Optional(fargs) + rparen).setResultsName('fargs'))
flogic = Keyword("or") | Keyword("and") | Keyword("not")
logicalArg = delimitedList(Group(fcall.setResultsName("fcall")) |
Group(arg.setResultsName("arg")))
#logicalArg.setDebug()
flogical << Group(logicalArg.setResultsName('larg1') +
flogic.setResultsName('flogic') + logicalArg.setResultsName('larg2'))
#logical = operatorPrecedence(flogical, [(not, 1, opAssoc.RIGHT,
makeLRlike(2)),
# (and, 2, opAssoc.LEFT,
makeLRlike(2)),
# (or , 2, opAssoc.LEFT,
makeLRlike(2))])
line = flogical | fcall #change to logical if operatorPrecedence is used
# Works fine
print line.parseString("f(x, y)").dump()
print line.parseString("f(h())").dump()
print line.parseString("a and b").dump()
print line.parseString("f(a and b)").dump()
print line.parseString("f(g(x))").dump()
print line.parseString("f(a and b) or h(b not c)").dump()
print line.parseString("f(g(x), y)").dump()
print line.parseString("g(f1(x), a, b, f2(x,y, k(x,y)))").dump()
print line.parseString("f(a not c) and g(f1(x), a, b, f2(x,y,
k(x,y)))").dump()
#Does'nt work fine yet;
#try changing flogical assignment to logicalArg | flogic
#print line.parseString("a or b or c").dump()
#print line.parseString("f(a or b(x) or c)").dump()

Pyparsing: ParseAction not called

On a simple grammar I am in the bad situation that one of my ParseActions is not called.
For me this is strange as parseActions of a base symbol ("logic_oper") and a derived symbol ("cmd_line") are called correctly. Just "pa_logic_cmd" is not called. You can see this on the output which is included at the end of the code.
As there is no exception on parsing the input string, I am assuming that the grammar is (basically) correct.
import io, sys
import pyparsing as pp
def diag(msg, t):
print("%s: %s" % (msg , str(t)) )
def pa_logic_oper(t): diag('logic_oper', t)
def pa_operand(t): diag('operand', t)
def pa_ident(t): diag('ident', t)
def pa_logic_cmd(t): diag('>>>>>> logic_cmd', t)
def pa_cmd_line(t): diag('cmd_line', t)
def make_grammar():
semi = pp.Literal(';')
ident = pp.Word(pp.alphas, pp.alphanums).setParseAction(pa_ident)
operand = (ident).setParseAction(pa_operand)
op_and = pp.Keyword('A')
op_or = pp.Keyword('O')
logic_oper = (( op_and | op_or) + pp.Optional(operand))
logic_oper.setParseAction(pa_logic_oper)
logic_cmd = logic_oper + pp.Suppress(semi)
logic_cmd.setParseAction(pa_logic_cmd)
cmd_line = (logic_cmd)
cmd_line.setParseAction(pa_cmd_line)
grammar = pp.OneOrMore(cmd_line) + pp.StringEnd()
return grammar
if __name__ == "__main__":
inp_str = '''
A param1;
O param2;
A ;
'''
grammar = make_grammar()
print( "pp-version:" + pp.__version__)
parse_res = grammar.parseString( inp_str )
'''USAGE/Output: python test_4.py
pp-version:2.0.3
operand: ['param1']
logic_oper: ['A', 'param1']
cmd_line: ['A', 'param1']
operand: ['param2']
logic_oper: ['O', 'param2']
cmd_line: ['O', 'param2']
logic_oper: ['A']
cmd_line: ['A']
'''
Can anybody give me a hint on this parseAction problem?
Thanks,
The problem is here:
cmd_line = (logic_cmd)
cmd_line.setParseAction(pa_cmd_line)
The first line assigns cmd_line to be the same expression as logic_cmd. You can verify by adding this line:
print("???", cmd_line is logic_cmd)
Then the second line calls setParseAction, which overwrites the parse action of logic_cmd, so the pa_logic_cmd will never get called.
Remove the second line, since you are already testing the calling of the parse action with pa_logic_cmd. You could change to using the addParseAction method instead, but to my mind that is an invalid test (adding 2 parse actions to the same pyparsing expression object).
Or, change the definition of cmd_line to:
cmd_line = pp.Group(logic_cmd)
Now you will have wrapped logic_cmd inside another expression, and you can then independently set and test the running of parse actions on the two different expressions.

How to find double quotation marks in ML

I have this code which finds double quotation marks and converts the inside of those quotation marks into a string. It manages to find the first quotation mark but fails to find the second so: "this" would be "this . How do I get it I can get this function to find the full string.
Maybe this is too obvious:
if (ch = #"\"") then SOME(String(x ^ "\""))
I do not really understand your code: you return the string just after the first occurence of the quotation mark, but this string has been built with the characters that you've found before it. Moreover, why do you return SOME(Error) instead of NONE?
You need to use a boolean variable to know when the first quotation mark has been seen and to stop when the second one is found. So I would write something like this:
fun parseString x inStr quote =
case (TextIO.input1 inStr, quote) of
(NONE, _) => NONE
| (SOME #"\"", true) => SOME x
| (SOME #"\"", false) => parseString x inStr true
| (SOME ch, true) => parseString (x ^ (String.str ch)) inStr quote
| (SOME _ , false) => parseString x inStr quote;
and initialize quote with false.

How should I understand `let _ as s = "abc" in s ^ "def"` in OCaml?

let _ as s = "abc" in s ^ "def"
So how should understand this?
I guess it is some kind of let pattern = expression thing?
First, what's the meaning/purpose/logic of let pattern = expression?
Also, in pattern matching, I know there is pattern as identifier usage, in let _ as s = "abc" in s ^ "def", _ is pattern, but behind as, it is an expression s = "abc" in s ^ "def", not an identifier, right?
edit:
finally, how about this: (fun (1 | 2) as i -> i + 1) 2, is this correct?
I know it is wrong, but why? fun pattern -> expression is allowed, right?
I really got lost here.
The grouping is let (_ as s) = "abc" -- which is just a convoluted way of saying let s = "abc", because as with a wildcard pattern _ in front is pretty much useless.
The expression let pattern = expr1 in expr2 is pretty central to OCaml. If the pattern is just a name, it lets you name an expression. This is like a local variable in other language. If the pattern is more complicated, it lets you destructure expr1, i.e., it lets you give names to its components.
In your expression, behind as is just an identifier: s. I suspect your confusion all comes down to this one thing. The expression can be parenthesized as:
let (_ as s) = "abc" in s ^ "def"
as Andreas Rossberg shows.
Your final example is correct if you add some parentheses. The compiler/toplevel rightly complains that your function is partial; i.e., it doesn't know what to do with most ints,
only with 1 and 2.
Edit: here's a session that shows how to add the parentheses to your final example:
$ ocaml
OCaml version 4.00.0
# (fun (1 | 2) as i -> i + 1) 2;;
Error: Syntax error
# (fun ((1 | 2) as i) -> i + 1) 2;;
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
0
- : int = 3
#
Edit 2: here's a session that shows how to remove the warning by specifying an exhaustive set of patterns.
$ ocaml
OCaml version 4.00.0
# (function ((1|2) as i) -> i + 1 | _ -> -1) 2;;
- : int = 3
# (function ((1|2) as i) -> i + 1 | _ -> -1) 3;;
- : int = -1
#

Resources