Source + (E)BNF = ast.json - abstract-syntax-tree

Is there any way to parse a source string with a custom (E)BNF and get the AST as JSON?
Let me explain what I need:
I have a source string and a BNF grammar (as a string too).
I feed the EBNF to the parser.
Add the source.
Get the AST and save it as JSON.

"EBNF as lexer" is nonsensical. But the rest of your question can be interpreted as, "can I get a parser driven by an EBNF to produce an AST in JSON form?"
Sure.
Most parser generators accept (E)BNF and "parse". Most of them don't automatically produce an AST; they require the programmer to define how each rule should generate tree nodes. Those won't work for your task.
Some do generate ASTs as data structures automatically using just the BNF and a source file: ANTLR4 (I think), and our DMS Software Reengineering Toolkit. Neither of these produces JSON directly, but in both cases it should be straightforward to write (once) a generic tree-walker that spits out JSON.
DMS's BNF will handle any context-free grammar using just BNF rules. ANTLR4 handles most grammars but has restrictions on what you can write (e.g., certain kinds of left recursion are verboten), and requirements for you to add extra disambiguating information where the grammar isn't LL(1).
DMS will export XML directly. See this example.
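To illustrate the "generic tree-walker that spits out JSON" idea, here is a minimal Python sketch. The `Node` class is a hypothetical stand-in for whatever node type your parser generator actually emits; the point is that one small recursive function covers every grammar:

```python
import json

class Node:
    """Hypothetical stand-in for a parser generator's tree-node type."""
    def __init__(self, rule, children=()):
        self.rule = rule                  # grammar rule that produced this node
        self.children = list(children)    # Nodes (subrules) or strings (tokens)

def to_jsonable(node):
    """Generic walker: convert any parse tree to plain dicts/lists for JSON."""
    if isinstance(node, Node):
        return {"rule": node.rule,
                "children": [to_jsonable(c) for c in node.children]}
    return {"token": str(node)}           # leaf: a token/lexeme

# e.g. the tree a parser might build for "1 + 2"
tree = Node("expr", [Node("term", ["1"]), "+", Node("term", ["2"])])
print(json.dumps(to_jsonable(tree), indent=2))
```

Because the walker only looks at the node's rule name and children, it never needs to change when the grammar does.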

Is there any way to find special characters in a string

I have a requirement to perform different operations if a string contains any special character.
Is there any way to implement regular expressions in Gremlin?
Input_Name= Test#input
if Input_Name.contains( "#/$%...")
{
println " error "
}
else
{
println "success"
}
Currently the Gremlin language does not have a TextP.regex predicate. Some implementations, such as JanusGraph, do add custom regex extensions to Gremlin. You could also, if the database you are using allows it, use Groovy closure syntax to include a regex in a query. Within the TinkerPop community we are planning to add a TextP.regex to the Gremlin language. The code is written and on a branch that we hope will be part of the TinkerPop 3.6.0 release if all goes well.
However, in your case, perhaps the existing TextP.containing could be used if there is a finite set of special characters you are looking for, but it is likely not the most optimal way to solve the problem, as you will have to or() several has() steps together.
Another option might be to use an external index if your database implementation supports that.
Just as an example of the closure syntax, if your implementation allows it, a regex match would look like the example below. In general, though, the use of closures is not recommended, and many implementations either fully block or severely limit their use.
gremlin> g.V().limit(20).filter {it.get().values('desc').next() ==~ "[A-Za-z]* [A-Z]'(.*)"}.values('desc')
==>Chicago O'Hare International Airport
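For reference, the underlying check the question asks for (does a string contain any of a finite set of special characters?) is a one-line character-class test outside of Gremlin. A Python sketch, where the exact set of special characters is an assumption:

```python
import re

# Assumed set of special characters; adjust to your actual requirements.
SPECIAL = re.compile(r"[#/$%&@!*]")

def classify(name: str) -> str:
    """Return 'error' if the string contains a special character, else 'success'."""
    return "error" if SPECIAL.search(name) else "success"

print(classify("Test#input"))   # -> error
print(classify("TestInput"))    # -> success
```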

Decode and parse a file encoded with BER (Basic Encoding Rules) to output relevant fields to CSV, without a .asn (ASN.1) schema?

The files I have been given are sample CDR (Call Detail Record) files.
SGSN / GGSN data format: ASN.1 Basic Encoding Rules (BER).
The files have no extensions and I do not have a schema to work with. How can I approach this?
Vasil is correct that, to a degree, BER can be decoded without a schema. However, if the schema uses implicit tags, you won't get very far before you run into blocks of data that you have no idea how to interpret. You will very likely need either to get the schema files or to use a tool that has the appropriate schema definitions built in.
If the files follow 3GPP 32.297 and 32.298, those specifications are freely available and you may be interested in https://www.3gpp.org/ftp/Specs/archive/32_series/32.298/ASN.1/
My company has a visual editor that can handle 32.297 CDR files. You can get a trial at: https://www.obj-sys.com/products/asn1ve/index.php. It comes with some CDR specs built in, so you might not need to get the schemas yourself.
To a certain extent it is possible to decode any valid BER encoded data without having the ASN.1 schema.
Try decoding the file using the unber tool from the asn1c project or this online decoder.
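To see why schema-less decoding works at all: BER is a self-describing tag-length-value format, so the nesting structure can be recovered even when the field meanings can't. A minimal Python TLV walker as a sketch (single-byte tags only; real CDR files need the full tag rules that tools like unber implement):

```python
def parse_tlv(data: bytes):
    """Parse a run of BER TLVs; recurse into constructed encodings."""
    out, i = [], 0
    while i < len(data):
        tag = data[i]; i += 1            # single-byte tags only (sketch)
        length = data[i]; i += 1
        if length & 0x80:                # long-form length: next N bytes hold it
            n = length & 0x7F
            length = int.from_bytes(data[i:i + n], "big"); i += n
        value, i = data[i:i + length], i + length
        if tag & 0x20:                   # constructed (SEQUENCE, SET, ...)
            out.append((tag, parse_tlv(value)))
        else:                            # primitive: keep raw bytes as hex
            out.append((tag, value.hex()))
    return out

# SEQUENCE { INTEGER 5, INTEGER 7 }
print(parse_tlv(bytes.fromhex("3006020105020107")))
# -> [(48, [(2, '05'), (2, '07')])]
```

Without the schema you get tags and raw hex values, which is exactly the "blocks of data you have no idea how to interpret" problem described above.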

ANTLR for parsing calculation in JSON format

I am new to ANTLR and trying to see whether ANTLR fits my scenario or whether I should stick with a JSON deserialization library and write some custom code.
The input text is JSON representing an expression like
{
  "Operator": "ADD",
  "Operands": [
    {"Operator": "ADD", "Operands": [{"OperandValue": 23}, {"OperandValue": 32}]},
    {"Operator": "ADD", "Operands": [{"OperandValue": 11}, {"OperandValue": 12}]}
  ]
}
The list of operators can evolve.
The JSON is created programmatically, not written by users by hand.
I have to read this JSON, validate it, and give meaningful error messages to my client.
If the JSON is valid and parsed, I have to translate it to T-SQL code and execute it on SQL Server.
All my code will be in C# and I need:
max Debug support and ease
unit testability
less custom code
min. learning
With the above needs in mind, I tried writing a rough ANTLR grammar plus custom code, and below are my observations.
With ANTLR, much of my logic will live in the grammar file, which might be difficult for a newcomer to debug and unit test. For my grammar POC I was relying only on the context.GetText() property to figure out where I was and adjust my grammar.
I would have to write a modularized grammar (building blocks), so that I can have a visitor for each smallest part and a more manageable visitor class.
How can I give more meaningful messages to my clients, given that they know almost nothing about my grammar/parsing engine?
custom JSON deserialization (JSON.NET)
The code is easy to debug, and everyone understands it: I get a JSON reader and write conditions to check whether the current token is a JSON object or a JSON array, whether it has a property Operator with value ADD, and so on.
With custom code I can give more meaningful validation-failure messages.
To me, ANTLR seems to add the most value when your input is not highly structured and no ready-made parser is available; in the case of JSON it doesn't add much over existing JSON parsers.
Is ANTLR meant for this scenario?
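Whichever route is chosen, the core job is a recursive walk over an already-parsed JSON tree, which is why a deserialization library usually suffices here. A minimal Python sketch (the question's environment is C#, but the shape is the same; the operator table and error messages are assumptions):

```python
import json

OPS = {"ADD": sum}   # assumed operator table; extend as new operators appear

def evaluate(node):
    """Recursively validate and evaluate one expression node."""
    if "OperandValue" in node:
        return node["OperandValue"]
    op = node.get("Operator")
    if op not in OPS:
        raise ValueError(f"unknown Operator: {op!r}")
    operands = node.get("Operands")
    if not isinstance(operands, list) or not operands:
        raise ValueError(f"{op}: 'Operands' must be a non-empty list")
    return OPS[op]([evaluate(child) for child in operands])

doc = json.loads("""
{"Operator":"ADD","Operands":[
  {"Operator":"ADD","Operands":[{"OperandValue":23},{"OperandValue":32}]},
  {"Operator":"ADD","Operands":[{"OperandValue":11},{"OperandValue":12}]}]}
""")
print(evaluate(doc))   # -> 78
```

The same recursive structure can emit a T-SQL string instead of a number, and the validation branches are exactly where domain-specific error messages go.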

Can you use wildcards in a token with ParseKit?

I'm trying to add a symbol token using ParseKit below:
[t.symbolState add:@"<p style=\"margin-left: 20px;\">"];
I'm wondering if ParseKit allows for wildcards when adding a symbol, such as:
[t.symbolState add:@"<p style=\"margin-left: ##px;\">"];
I want to be able to then extract the wildcard from the token during the parsing procedure.
Is such a thing possible with ParseKit?
Developer of ParseKit here.
I think using ParseKit in this way is not a good idea.
ParseKit (and its successor PEGKit) excel at tokenizing input and then parsing at the token level.
There are several natural tokens in the example input you've provided, but what you are trying to do here is ignore those natural tokens, combine them into a blob of input, and then do fancy sub-token matching using patterns.
There is a popular, powerful tool for fancy sub-token matching using patterns: Regular Expressions. They will be a much better solution for that kind of thing than PEGKit.
However, I still don't think Regular Expressions are the tool you want to use here (or, at least not the only tool).
It looks like you want to parse XML input. Don't use Regex or PEGKit for that. Use an XML parser. Always use an XML parser for parsing XML input.
You may choose to use another XML API layered on top of the XML Parser (SAX, StAX, DOM, XSLT, XQuery, etc.) but, underneath it all, you should be parsing with an XML parser (and, of course, all of those tools I listed, do).
See here for more info.
Then, once you have the style attribute string value you are looking for, use Regex to do fancy pattern matching.
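As a sketch of the recommended split (markup parser first, regex only for the final sub-token match), here in Python with the stdlib html.parser; the exact attribute pattern is an assumption based on the question's example:

```python
import re
from html.parser import HTMLParser

class StyleGrabber(HTMLParser):
    """Collect margin-left pixel values: the parser finds the style attribute,
    and only then does a regex do the fancy sub-token pattern matching."""
    def __init__(self):
        super().__init__()
        self.margins = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style", "")
        m = re.search(r"margin-left:\s*(\d+)px", style)
        if m:
            self.margins.append(int(m.group(1)))

p = StyleGrabber()
p.feed('<p style="margin-left: 20px;">hello</p>')
print(p.margins)   # -> [20]
```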

How to decode ASN.1 format to CSV format using Unix programming

I have ASN.1 format files that I have to convert into a readable CSV format.
I need a decoder with some advanced options, such as scheduling and auto-FTP.
Pretty old thread, but it still comes up at the top of Google searches, and it would be good for people to get an answer. Not exactly Unix programming, but you can find a "generic" JavaScript-based ASN.1 decoder at http://lapo.it/asn1js/.
You can also download it and run it natively on your box.
Erlang provides very good support for reading and writing BER-, DER- and PER-encoded ASN.1 packets. The Erlang compiler even accepts ASN.1 syntax natively and produces a custom codec as an Erlang module.
It's a whole new programming language for most people though, so whether it's worth learning it just for this exercise, I'll leave up to you. It is a very fun language to learn, however, and will teach you a very different way to think about programming.
You could have a look at the asn1 compiler.
It converts ASN.1 syntax files to C code.
As Marcelo noted on your question, you didn't say precisely what you need, so I can't tell whether it covers all your bases, but you will be able to compile the code to a binary (from the C code, obviously).
There is an open-source package called asn1c which will do ASN.1 encoding and decoding. It's a C library that you need to build and then write code around to implement your program. In order to build the library, it requires the ASN.1 syntax file that was used to construct the encoded messages. When decoding, one option is to output the data to an XML file, which you would then need to convert to a CSV file somehow. At a minimum, it supports BER, XER, and PER.
In Python, there is also the PyASN1 library and tools: http://pyasn1.sourceforge.net/
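Whichever decoder you pick (asn1c's XML output, PyASN1, etc.), the last step is plain CSV writing. A Python sketch with placeholder field names and data, since the real fields depend on your CDR schema:

```python
import csv
import io

# Placeholder records: in practice these come from your ASN.1 decoder.
records = [
    {"imsi": "001010123456789", "duration": 60},
    {"imsi": "001010987654321", "duration": 125},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["imsi", "duration"])
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```

Scheduling and auto-FTP are then a matter of wrapping a script like this in cron plus an FTP/SFTP client, rather than features of the decoder itself.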
