I want to do something like this (I tried four different ways). Create a string value that contains hex characters:
$searchString = "filterValue\xEF\xBF\xBD"
#$searchString = "filterValue" + 0xEF -as [char] + 0xBF -as [char] + 0xBD -as [char]
#$searchString = "filterValue" + 0xEF + 0xBF + 0xBD
$searchString = "filterValue\0xEF\0xBF\0xBD"
Write-Host "$searchString len=$($searchString.Length)"
Length should be 14.
Is there some way to add them inline, rather than creating three separate char values as shown in this post: Writing a hex escaped char in Powershell?
I tried the longer way:
$char1 = (0xEF -as [char])
$char2 = (0xBF -as [char])
$char3 = (0xBD -as [char])
$searchString = "filterValue" + $char1 + $char2 + $char3
Write-Host "$searchString len=$($searchString.Length)"
$posSearchString = $filterBytes.IndexOf("$searchString")
But that doesn't seem to match what I'm looking for:
Use subexpressions:
$searchString = "filterValue$([char]0xEF)$([char]0xBF)$([char]0xBD)"
String concatenation or using the format operator would work too as long as you cast the values:
$searchString = "filterValue" + [char]0xEF + [char]0xBF + [char]0xBD
$searchString = "filterValue{0}{1}{2}" -f [char]0xEF, [char]0xBF, [char]0xBD
If you want to use the -as operator instead of casting the numbers to chars you need to group the expressions:
$searchString = "filterValue" + (0xEF -as [char]) + (0xBF -as [char]) + (0xBD -as [char])
$searchString = "filterValue{0}{1}{2}" -f (0xEF -as [char]), (0xBF -as [char]), (0xBD -as [char])
I have a use case where I need to match enums whose values can be user-defined macros.
Example enum
typedef enum
{
VAL_1 = -1,
VAL_2 = 0,
VAL_3 = 0x10,
VAL_4 = TEST_ENUM_CUSTOM(1,2),
}MyENUM;
I am using the code below. It works if I don't use the format shown in VAL_4, but I need to match that format as well. I am new to pyparsing; any help is appreciated.
My code:
LBRACE, RBRACE, EQ, COMMA = map(Suppress, "{}=,")
_enum = Suppress("enum")
identifier = Word(alphas, alphanums + "_")
integer = Word("-"+alphanums) # I have tried to add "_(,)" to this but it is not matching.
enumValue = Group(identifier("name") + Optional(EQ + integer("value")))
enumList = Group(enumValue + ZeroOrMore(COMMA + enumValue) + Optional(COMMA))
enum = _enum + Optional(identifier("enum")) + LBRACE + enumList("names") + RBRACE + Optional(identifier("typedef"))
enum.ignore(cppStyleComment)
enum.ignore(cStyleComment)
Thanks
-Purna
Adding more characters to integer is the wrong way to go. Even this expression:
integer = Word("-"+alphanums)
isn't super-great, since it would match "---", "xyz", "q--10-", and many other non-integer strings.
Better to define integer properly. You could do:
integer = Combine(Optional('-') + Word(nums))
but I've found that for these low-level expressions that occur many places in your parse string, a Regex is best:
integer = Regex(r"-?\d+") # Regex(r"-?[0-9]+") if you like more readable re's
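To make the difference concrete, here is a small sketch using only Python's standard re module rather than pyparsing. The `loose` pattern is my approximation of what Word("-"+alphanums) accepts (any run of '-', ASCII letters, and digits); the sample strings are the ones mentioned above plus a few real integers.

```python
import re

# Approximation of Word("-" + alphanums): one or more characters drawn
# from '-', ASCII letters, and digits, in any order.
loose = re.compile(r"[-A-Za-z0-9]+")

# The stricter pattern: an optional leading minus sign, then digits only.
strict = re.compile(r"-?\d+")

samples = ["-1", "0", "42", "---", "xyz", "q--10-"]
for s in samples:
    is_loose = bool(loose.fullmatch(s))
    is_strict = bool(strict.fullmatch(s))
    print(f"{s!r:10} loose={is_loose} strict={is_strict}")
```

The loose pattern accepts every sample, while the strict one accepts only the three actual integers.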
Then define one for hex_integer also.
Then to add macros, we need a recursive expression, to handle the possibility of macros having arguments that are also macros.
So at this point, we should just stop writing code for a bit, and do some design. In parser development, this design usually looks like a BNF, where you describe your parser in a sort of pseudocode:
enum_expr ::= "typedef" "enum" [identifier]
"{"
enum_item_list
"}" [identifier] ";"
enum_item_list ::= enum_item ["," enum_item]... [","]
enum_item ::= identifier "=" enum_value
enum_value ::= integer | hex_integer | macro_expression
macro_expression ::= identifier "(" enum_value ["," enum_value]... ")"
Note the recursion of macro_expression: it is used in defining enum_value, but it includes enum_value as part of its own definition. In pyparsing, we use a Forward to set up this kind of recursion.
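If the recursion is hard to picture, here is a hand-written sketch of just the enum_value and macro_expression rules in plain Python, without pyparsing. The function names (tokenize, parse_enum_value, parse_macro) and the nested macro INNER in the usage line are made up for illustration; the point is only that parse_macro calls parse_enum_value, which can call parse_macro again.

```python
import re

# Hex integers come first so "0x10" is not split into "0" and "x10".
TOKEN = re.compile(r"\s*(0x[0-9a-fA-F]+|-?\d+|[A-Za-z_]\w*|[(),])")

def tokenize(text):
    """Split text into integer, identifier, and punctuation tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_enum_value(tokens, i):
    """enum_value ::= integer | hex_integer | macro_expression"""
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[i]
    if re.fullmatch(r"0x[0-9a-fA-F]+", tok):
        return int(tok, 16), i + 1
    if re.fullmatch(r"-?\d+", tok):
        return int(tok), i + 1
    return parse_macro(tokens, i)

def parse_macro(tokens, i):
    """macro_expression ::= identifier "(" enum_value ["," enum_value]... ")" """
    name = tokens[i]
    if not re.fullmatch(r"[A-Za-z_]\w*", name) or tokens[i + 1:i + 2] != ["("]:
        raise ValueError(f"expected a macro call at token {i}")
    i += 2
    args = []
    while True:
        # The recursive step: each argument is itself an enum_value.
        arg, i = parse_enum_value(tokens, i)
        args.append(arg)
        if tokens[i:i + 1] == [","]:
            i += 1
        elif tokens[i:i + 1] == [")"]:
            return [name, args], i + 1
        else:
            raise ValueError(f"expected ',' or ')' at token {i}")

value, end = parse_enum_value(tokenize("TEST_ENUM_CUSTOM(1, INNER(0x10, -2))"), 0)
print(value)
```

In pyparsing the Forward placeholder plays the role that the mutually recursive function calls play here.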
See how that BNF is implemented in the code below. I build on some of the items you posted, but the macro expression required some rework. The bottom line is "don't just keep adding characters to integer trying to get something to work."
LBRACE, RBRACE, EQ, COMMA, LPAR, RPAR, SEMI = map(Suppress, "{}=,();")
_typedef = Keyword("typedef").suppress()
_enum = Keyword("enum").suppress()
identifier = Word(alphas, alphanums + "_")
# define an enumValue expression that is recursive, so that enumValues
# that are macros can take parameters that are enumValues
enumValue = Forward()
# add more types as needed - parse action on hex_integer will do parse-time
# conversion to int
integer = Regex(r"-?\d+").addParseAction(lambda t: int(t[0]))
# or just use the signed_integer expression found in pyparsing_common
# integer = pyparsing_common.signed_integer
hex_integer = Regex(r"0x[0-9a-fA-F]+").addParseAction(lambda t: int(t[0], 16))
# a macro defined using enumValue for parameters
macro_expr = Group(identifier + LPAR + Group(delimitedList(enumValue)) + RPAR)
# use '<<=' operator to attach recursive definition to enumValue
enumValue <<= hex_integer | integer | macro_expr
# remaining enum expressions
enumItem = Group(identifier("name") + Optional(EQ + enumValue("value")))
enumList = Group(delimitedList(enumItem) + Optional(COMMA))
enum = (_typedef
+ _enum
+ Optional(identifier("enum"))
+ LBRACE
+ enumList("names")
+ RBRACE
+ Optional(identifier("typedef"))
+ SEMI
)
# this comment style includes cStyleComment too, so no need to
# ignore both
enum.ignore(cppStyleComment)
Try it out:
enum.runTests([
"""
typedef enum
{
VAL_1 = -1,
VAL_2 = 0,
VAL_3 = 0x10,
VAL_4 = TEST_ENUM_CUSTOM(1,2)
}MyENUM;
""",
])
runTests is for testing and debugging your parser during development. Use enum.parseString(some_enum_expression) or enum.searchString(some_c_header_file_text) to get the actual parse results.
Using the new railroad diagram feature in the upcoming pyparsing 3.0 release, here is a visual representation of this parser:
My input is growing from simple 2-level nested lists to a complex nested list of lists. I see where pyparsing.nestedExpr() is the bees' knees for this kind of thing, but I'm still wanting to build up a nested Dict structure.
With the basics somewhat squared away I've crafted this:
import pyparsing as pp
input_works = '''
(unitsOfMeasure
(altitudeUnits "m")
(capacitanceUnits "pF")
(designUnits "MIL")
(drawingUnits "MIL")
(drawingAccuracy 2)
(drawingHeight 28000)
)'''
# recursive dict
input_doesnt_work = '''
(parameterFile "out.tf"
(revision "15.6")
(xcoord 1234.567)
(ycoord -3456.890)
(unitsOfMeasure
(altitudeUnits "m")
(capacitanceUnits "pF")
(designUnits "MIL")
(drawingUnits "MIL")
(drawingAccuracy 2)
(drawingHeight 28000)
)
)'''
v_string = pp.Word(pp.alphanums+'_'+'-'+'.')
v_quoted_string = pp.Combine( '"' + v_string + '"')
v_number = pp.Regex(r'[+-]?(?P<float1>\d+)(?P<float2>\.\d+)?(?P<float3>[Ee][+-]?\d+)?')
keyy = v_string
valu = pp.Or( [ v_string, v_quoted_string, v_number])
item = pp.Group( pp.Literal('(').suppress() + keyy + pp.OneOrMore( valu) + pp.Literal(')').suppress() )
# some magic - use Forward to make the dicts self-referential and thus recursive
dicts = pp.Forward()
dicts << pp.Group( pp.Literal('(').suppress() + \
keyy + \
pp.Optional( valu) + \
pp.OneOrMore( pp.Or( item, dicts)) + \
pp.Literal(')').suppress() )
print "dicts_input_works yields: ", dicts.parseString( input_works)
print "dicts_input_doesnt_work yields: ", dicts.parseString( input_doesnt_work)
input_doesnt_work chokes on line 6, col 5, as if the self-reference in
pp.OneOrMore( pp.Or( item, dicts))
isn't being seen.
TIA,
code_warrior
Spotted my error 3sec after posting:
pp.OneOrMore( pp.Or( item, dicts )) + \ # wrong
pp.OneOrMore( pp.Or( [item, dicts] )) + \ # right
Never mind, nothing to see here, move along.
Thanks,
code_warrior
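For anyone wondering why the first form fails silently rather than raising an error: as far as I can tell, Or takes its alternatives as a single iterable, followed by an optional savelist flag, so a second positional expression is consumed as that flag and never becomes an alternative. A minimal sketch, assuming pyparsing is installed (the two literals are placeholders, not the grammar from the question):

```python
import pyparsing as pp

item = pp.Literal("a")
other = pp.Literal("b")

# Wrong: `other` lands in the savelist parameter, so Or only knows `item`.
wrong = pp.Or(item, other)

# Right: both alternatives are passed together in one list.
right = pp.Or([item, other])

print(len(wrong.exprs), len(right.exprs))

try:
    wrong.parseString("b")
    wrong_matched_b = True
except pp.ParseException:
    wrong_matched_b = False

print("wrong form matches 'b':", wrong_matched_b)
print("right form matches 'b':", right.parseString("b").asList())
```

The ^ operator (item ^ other) builds the same Or without the list.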
Using HtmlProvider to access a web-based table sometimes returns a fraction as a string (correct) and, at other times, returns a DateTime (incorrect).
What am I missing?
module Test =
    open System.IO
    open FSharp.Data
    let [<Literal>] url = "https://www.example.com/fractions"
    type profile = HtmlProvider<url>
    let profile = profile.Load(url)
    let [<Literal>] resultFile = @"C:\temp\data\Profile.csv"
    let CsvResult =
        do
            use writer = new StreamWriter(resultFile, false)
            writer.WriteLine "\"Date\";\"Fraction\""
            for row in profile.Tables.Table1.Rows do
                "\"" + row.``Date``.ToString() + "\"" + ";" |> writer.Write
                "\"" + row.``Fraction``.ToString() + "\"" + ";" |> writer.WriteLine
            writer.Close()
    let csvResult = CsvResult
Without seeing sample data I can't be 100% certain, but I'm guessing that it's parsing fractions as dates if the numbers involved would be valid dates in the culture you're using: e.g., 1/4 would be a valid date in any culture that uses / as a separator, and would be treated either as April 1st or as January 4th, depending on which parsing culture your system defaults to.
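To illustrate why only some fractions would be affected, here is a rough analogy in Python using the standard datetime module. This is not the type provider's actual inference logic; the helper name try_parse and the fixed year 2017 are assumptions made only so the example runs. A fraction such as 1/4 is a plausible date under either field order, while 13/20 is not a valid date under either one.

```python
from datetime import datetime

def try_parse(text, fmt):
    """Return (month, day) if text reads as a date under fmt, else None."""
    try:
        # A fixed year is appended so that only the day/month part matters.
        parsed = datetime.strptime(text + "/2017", fmt + "/%Y")
    except ValueError:
        return None
    return (parsed.month, parsed.day)

for text in ["1/4", "13/20"]:
    day_first = try_parse(text, "%d/%m")
    month_first = try_parse(text, "%m/%d")
    print(f"{text}: day/month -> {day_first}, month/day -> {month_first}")
```

So a column that mixes values like 1/4 and 13/20 can plausibly come back as a date for some values and as a string for others.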
Other type providers in FSharp.Data (such as the CSV type provider) allow you to configure how each column will be parsed, but that's not an option the HTML type provider gives you (which is a bit of a missing feature). But since the HTML type provider does allow you to specify the culture info for datetime and number parsing, one way you might be able to work around this is to specify a culture that does not use / as a date separator (but still uses . as a decimal point, since otherwise, if the HTML you're parsing has numbers written like 1,000 for one thousand, that could be interpreted as 1). One such culture is the en-IN culture ("English (India)"), where the date separator is - and the decimal point is a period.
So try passing Culture="en-IN" in your HtmlProvider options (type provider static parameters have to be literals, so the culture is given by name, e.g. HtmlProvider<url, Culture="en-IN">), and see if that helps it stop treating fractions as dates.
The following combination of functions worked:
// http://www.fssnip.net/29/title/Regular-expression-active-pattern
module Solution =
    open System
    open System.Text.RegularExpressions
    open FSharp.Data
    let (|Regex|_|) pattern input =
        let m = Regex.Match(input, pattern)
        if m.Success then Some(List.tail [ for g in m.Groups -> g.Value ])
        else None
    let ptrnFraction = @"^([0-9]?[0-9]?)(\/)([0-9]?[0-9]?)$"
    let ptrnDateTime = @"(\d{2})\/(\d{2})\/(\d{4}) (\d{2}):(\d{2}):(\d{2})"
    let ToFraction input =
        match input with
        | Regex ptrnFraction [ numerator; operator; denominator ] ->
            (numerator + operator + denominator).ToString()
        | Regex ptrnDateTime [ day; month; year; hours; minutes; seconds ] ->
            (day + "/" + month).ToString()
        | _ -> "Not valid!"
    let dtInput = @"05/09/2017 00:00:00"
    let frcInput = @"13/20"
    let outDate = ToFraction dtInput
    printfn "Out Date: %s" outDate
    let outFraction = ToFraction frcInput
    printfn "Out Fraction: %s" outFraction
    //Output:> Out Date: 05/09 Out Fraction: 13/20
Thus, I was able to replace:
"\"" + row.``Fraction``.ToString() + "\"" + ";" |> writer.WriteLine
with:
"\"" + ToFraction(row.``Fraction``.ToString()) + "\"" + ";" |> writer.WriteLine
Thanks to @rmunn for the clarity of his explanations and the benefit of his expertise.
I have defined a grammar:
column = Word(alphanums + '._`')
stmt = column + Literal("(") + Group(delimitedList( column )) +Literal(")")
Now I want to match the query below using a close match:
sql = "seller(food_type,count(sellers),sum(weight),Earned_money)"
I do not want to change the grammar defined above. How can I get a close match when functions are given as arguments?
result = stmt.parseString(sql)
print result.dump()
def Review(sql):
    stmt = GetGrammar(sql)
    result = stmt.parseString(sql,parseAll=False)
    print result.dump()
Here, sql is a stored procedure of 400-500 lines. I am automating part of our code review, and for this purpose I have written a grammar for SQL statements.
But parsing throws an exception even if only a small string fails to match, and then it terminates. I want it not to abort when exceptions occur, because at least the parsable part is still useful for reviewing database queries.
GetGrammar returns the grammar for a procedure; everything inside it is a SQL statement.
def GetGrammar(sql):
    InputParameters = delimitedList( Optional((_in|_out|_inout),'') + column +
                                     DataType)
    DeclarativeSyntax = (_declare + column + DataType+';')
    # the outer parentheses let the expression span several lines
    createProcedureStmt = (createProcedure +
        StoredProcedure.setResultsName("Procedure") +
        lpar +
        Optional(InputParameters.setResultsName("Input"),'') +
        rpar +
        Optional(_sql_security_invoker,'').setResultsName("SQLSECURITY") +
        _begin +
        ZeroOrMore( DeclarativeSyntax ).setResultsName("Declare") +
        ZeroOrMore( ( selectStmt|setStmt|ifStmt.setResultsName("IfStmt")|
            callStmt|updateStmt|createStmt|dropStmt|alterStmt|insertStmt
            |deleteStmt|WhileStmt.setResultsName("WhileStmt")|createStmt ) + ';') +
        _end+Optional(';',''))
    return createProcedureStmt
I have the following grammar and test case:
from pyparsing import Word, nums, Forward, Suppress, OneOrMore, Group
#A grammar for a simple class of regular expressions
number = Word(nums)('number')
lparen = Suppress('(')
rparen = Suppress(')')
expression = Forward()('expression')
concatenation = Group(expression + expression)
concatenation.setResultsName('concatenation')
disjunction = Group(lparen + OneOrMore(expression + Suppress('|')) + expression + rparen)
disjunction.setResultsName('disjunction')
kleene = Group(lparen + expression + rparen + '*')
kleene.setResultsName('kleene')
expression << (number | disjunction | kleene | concatenation)
#Test a simple input
tests = """
(8)*((3|2)|2)
""".splitlines()[1:]
for t in tests:
    print t
    print expression.parseString(t)
    print
The result should be
[['8', '*'],[['3', '2'], '2']]
but instead, I only get
[['8', '*']]
How do I get pyparsing to parse the whole string?
parseString has a parameter parseAll. If you call parseString with parseAll=True you will get error messages if your grammar does not parse the whole string. Go from there!
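A minimal sketch of the difference, assuming pyparsing is installed. The one-token grammar here is a stand-in, not the grammar from the question: without parseAll the parser stops quietly after the first match, while with parseAll=True the leftover text raises a ParseException.

```python
from pyparsing import Word, nums, ParseException

number = Word(nums)
text = "123 abc"

# Default behaviour: parse what matches and silently ignore the rest.
partial = number.parseString(text)
print(partial.asList())

# With parseAll=True, unparsed trailing input is reported as an error.
try:
    number.parseString(text, parseAll=True)
    raised = False
except ParseException as err:
    raised = True
    print("parse failed:", err)
```

The exception message points at the place where parsing stopped, which is usually the quickest way to find the part of the grammar that needs work.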
Your concatenation expression is not doing what you want, and comes close to being left-recursive (fortunately it is the last term in your expression). Your grammar works if you instead do:
expression << OneOrMore(number | disjunction | kleene)
With this change, I get this result:
[['8', '*'], [['3', '2'], '2']]
EDIT:
You can also avoid the precedence of << over | if you use the <<= operator instead:
expression <<= OneOrMore(number | disjunction | kleene)
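The precedence point can be checked without pyparsing. In the sketch below, Expr is a toy class invented for this example; it only records how Python groups the operators. Since << binds more tightly than |, an unparenthesized right-hand side is split, whereas <<= is an assignment and always evaluates the whole right-hand side first.

```python
class Expr:
    """Toy stand-in for a parser element; records how operators group."""
    def __init__(self, label):
        self.label = label

    def __or__(self, other):
        return Expr(f"({self.label} | {other.label})")

    def __lshift__(self, other):
        return Expr(f"({self.label} << {other.label})")

    def __ilshift__(self, other):
        return Expr(f"({self.label} <<= {other.label})")

target, a, b = Expr("target"), Expr("a"), Expr("b")

# Without parentheses, only `a` is attached; `| b` is applied afterwards.
no_parens = target << a | b
print(no_parens.label)    # ((target << a) | b)

# Parentheses (as in the question's code) attach the whole alternation.
with_parens = target << (a | b)
print(with_parens.label)  # (target << (a | b))

# Augmented assignment evaluates the full right-hand side first.
target <<= a | b
print(target.label)       # (target <<= (a | b))
```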