Antlr V4 grammar for XQuery 3.1 - xquery

I found the following Antlr 4 grammar:
https://gist.github.com/JoeyAcc/829c28fcf18091ed6ebfcf91d7519f58
I received the following error while trying to generate the code from it:
error(134): xquery31.g4:178:26: rule reference PragmaContentsInternal is not currently supported in a set
error(134): xquery31.g4:264:25: rule reference DirPIContentsInternal is not currently supported in a set
error(134): xquery31.g4:268:32: rule reference CDataSectionContentsInternal is not currently supported in a set
error(134): xquery31.g4:311:17: rule reference StringConstructorCharsInternal is not currently supported in a set
warning(156): xquery31.g4:377:59: invalid escape sequence
warning(156): xquery31.g4:378:60: invalid escape sequence
warning(156): xquery31.g4:392:30: invalid escape sequence
warning(156): xquery31.g4:394:30: invalid escape sequence
error(134): xquery31.g4:416:27: rule reference CommentContentsInternal is not currently supported in a set
warning(146): xquery31.g4:13:0: non-fragment lexer rule Prolog can match the empty string
warning(146): xquery31.g4:110:0: non-fragment lexer rule WindowVars can match the empty string
warning(146): xquery31.g4:127:0: non-fragment lexer rule OrderModifier can match the empty string
warning(146): xquery31.g4:211:0: non-fragment lexer rule PredicateList can match the empty string
warning(146): xquery31.g4:248:0: non-fragment lexer rule DirAttributeList can match the empty string
warning(146): xquery31.g4:261:0: non-fragment lexer rule DirCommentContents can match the empty string
error(99): xquery31.g4::: grammar xquery31 has no rules
Are there any suggestions to fix this ANTLR V4 grammar?
The EBNF for XQuery v3.1 is here

The last line of your error message is very telling. If you look at the grammar, every rule name starts with a capital letter, which in ANTLR marks a lexer rule. So this file contains only lexer rules, and ones with errors at that; the parser rules are entirely absent. Perhaps contact the poster of this grammar or search for the parser rules before investing time in this file.
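ANTLR tells lexer rules from parser rules by the first letter of the rule name: uppercase for lexer rules, lowercase for parser rules. The following is a rough sketch of how one might check a grammar file for this. The helper name `classify_rules` and the regex are illustrative assumptions, the regex is a heuristic rather than a real ANTLR parser, and the sample rule bodies are made up for the demonstration (only the rule names come from the warnings above).

```python
import re

# Heuristic (an assumption, not ANTLR's own parser): a rule definition starts
# a line with an optional "fragment" keyword, then an identifier, then a colon.
_RULE_START = re.compile(r"^\s*(?:fragment\s+)?([A-Za-z_]\w*)\s*:", re.MULTILINE)


def classify_rules(grammar_text):
    """Split rule names into (lexer_rules, parser_rules) by first-letter case."""
    lexer_rules, parser_rules = [], []
    for name in _RULE_START.findall(grammar_text):
        if name[0].isupper():
            lexer_rules.append(name)   # uppercase first letter: lexer rule
        else:
            parser_rules.append(name)  # lowercase first letter: parser rule
    return lexer_rules, parser_rules


# Hypothetical excerpt: the rule names appear in the warnings, the bodies are invented.
sample = """
grammar xquery31;
Prolog : (DefaultNamespaceDecl | Setter)* ;
PredicateList : Predicate* ;
"""

lexer_rules, parser_rules = classify_rules(sample)
print("lexer rules: ", lexer_rules)
print("parser rules:", parser_rules)
```

An empty parser-rule list for a file that declares a combined `grammar` is consistent with the "has no rules" error: ANTLR found nothing it could treat as a parser rule.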

I think the conclusion of the work by JoeyACC is here - https://gist.github.com/JoeyAcc/375663dfc21cd7ec4d35ea8f7ff251d0#file-xquery31-g4
Another approach would be to take this Antlrv4 XQuery 1.0 grammar and add the missing parts for 3.1: https://github.com/bluezio/xquery-parser

I ended up creating my own Antlr4 grammar for XQuery 3.1. It is at: https://github.com/xqdoc/xqdoc/tree/master/src/main/antlr4/org/xqdoc

Related

Do I need an empty sequence () in addition to a comment in an enclosed expression?

When trying to use the oXygen editor to comment out a node inside of an element oXygen simply wrapped it into (:<foo>foo 1</foo>:), but I then found out that that way the node did not get commented out but was rather prefixed by a text node with (: and suffixed by a text node with :).
Then I looked up the syntax and found out you need to use an enclosed expression {(:<foo>foo 1</foo>:)} instead to have access to the comment syntax.
However, while BaseX and Saxon 9.8 happily accept {(:<foo>foo 1</foo>:)}, Altova complains and needs an additional empty sequence {(:<foo>foo 1</foo>:)()}.
https://www.w3.org/TR/xquery-31/#doc-xquery31-EnclosedExpr suggests in XQuery 3.1 the expression inside curly braces is optional and defaults to ().
Does this also mean that in XQuery 3.1 it should suffice to use simply the comment inside of the curly braces, without an empty sequence?
So to summarize, Saxon and BaseX allow me to use <root>{(:<foo>foo 1</foo>:)}</root> while Altova complains about incorrect syntax, forcing me to use <root>{(:<foo>foo 1</foo>:)()}</root>.
Is that still necessary in XQuery 3.1?
Sounds like a bug in their commenter, which is pretty common in XQuery editors. Within an element - and assuming you are using direct element constructors, not computed element constructors - use XML comments:
<hello>world
<!-- Don't print me -->
</hello>
Computed element constructors still use XQuery comments:
element hello {
'world' (: Don't print me :)
}

IIS UrlRewrite RegEx differences with .NET RegEx

I have a valid RegEx pattern in .NET:
(?>.*param1=value1.*)(?<!.*param2=\d+.*) which matches if:
query string contains param1=value1
but does not contain param2= a number
It works in .NET. However IIS URLRewrite complains that it is not a valid pattern.
Can I not use zero-width negative look behind (?<! ) expressions with IIS URLRewrite?
Note that I tried to apply this pattern both in web.config (properly changing < and > to &lt; and &gt; respectively) and in the IIS Manager - all without success.
The default regex syntax in IIS URL Rewrite is ECMAScript, which is not compatible with .NET regex syntax. See URL Rewrite Module Configuration Reference:
ECMAScript – Perl compatible (ECMAScript standard compliant) regular expression syntax. This is a default option for any rule.
You cannot use a lookbehind at all, you will have to rely on lookaheads only:
^(?!.*param2=\d).*param1=value1.*
Pattern explanation:
^ - start of string
(?!.*param2=\d) - a negative lookahead that fails the match (returns no match) if param2= followed by a digit (\d) appears after 0+ characters other than a newline
.*param1=value1.* - match a whole line that contains param1=value1
You can enhance this rule by adding \b around param1=value1 to only match it as a whole word.
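To see how the lookahead-only pattern behaves, here is a small sketch. It uses Python's `re` module rather than the IIS engine, which is an assumption made for convenience; the pattern uses only constructs (anchors, `.*`, `\d`, negative lookahead) that both flavors support. The function name `rule_matches` and the sample query strings are made up for illustration.

```python
import re

# Pattern from the answer above: negative lookahead only, no lookbehind
# and no atomic group.
PATTERN = re.compile(r"^(?!.*param2=\d).*param1=value1.*")


def rule_matches(query_string):
    """Return True if the query string has param1=value1 and no param2=<digit>."""
    return PATTERN.search(query_string) is not None


# Hypothetical query strings covering the interesting cases.
samples = [
    "param1=value1",                 # param1 present, no param2
    "a=b&param1=value1&param2=abc",  # param2 present but not numeric
    "param1=value1&param2=42",       # numeric param2 after param1
    "param2=7&param1=value1",        # numeric param2 before param1
    "param3=x",                      # param1 missing
]
for qs in samples:
    print(qs, "->", rule_matches(qs))
```

Because the lookahead is anchored at the start of the string, it rejects a numeric param2 whether it appears before or after param1.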

What is an example usage of <url-modifier> at CSS url() function?

3.4. Resource Locators: the <url> type describes a <url-modifier> at
A URL is a pointer to a resource and is a functional notation
denoted by <url>. The syntax of a <url> is:
<url> = url( <string> <url-modifier>* )
In addition to the syntax defined above, a <url> can sometimes be
written in other ways:
For legacy reasons, a <url> can be written without quotation marks around the URL itself. This syntax is specially-parsed, and
produces a <url-token> rather than a function syntactically.
[CSS3SYN]
Some CSS contexts, such as @import, allow a <url> to be represented by a <string> instead. This behaves identically to
writing a url() function containing that string. Because these
alternate ways of writing a <url> are not functional notations, they
cannot accept any <url-modifier>s.
Note: The special parsing rules for the legacy quotation mark-less
<url> syntax means that parentheses, whitespace characters, single
quotes (') and double quotes (") appearing in a URL must be escaped
with a backslash, e.g. url(open\(parens), url(close\)parens).
Depending on the type of URL, it might also be possible to write these
characters as URL-escapes (e.g. url(open%28parens) or
url(close%29parens)) as described in [URL]. (If written as a
normal function containing a string, ordinary string escaping rules
apply; only newlines and the character used to quote the string need
to be escaped.)
at
3.4.2. URL Modifiers
The url() function supports specifying additional <url-modifier>s,
which change the meaning or the interpretation of the URL somehow. A
<url-modifier> is either an <ident> or a function.
This specification does not define any <url-modifier>s, but other
specs may do so.
See also CSS Values and Units Module Level 3
Editor’s Draft, 21 March 2016
What are example usages of <ident> and function at url() ?
What are differences between <string> , <ident>, function at url() ?
A <url-modifier> is either an <ident> or a function.
<ident> is an identifier.
A portion of the CSS source that has the same syntax as an <ident-token>.
<ident-token> Syntax ;
I could not find any examples of <ident> used within the url function but
as mentioned in this email there are some possible future uses.
Fetch options to control CORS/cookies/etc
working with Subresource Integrity
Looking at the <ident> syntax, you cannot use a key/value pair, so I assume
most of this would be implemented using a function, which does not yet exist. Resource hinting could be implemented using <ident>:
.foo {
background-image: url("//aa.com/img.svg" prefetch);
}
I did, however, find "A Collection of Interesting Ideas" with a function <url-modifier> defined.
SVG Parameters (not official spec)
The param() function is a <url-modifier>
.foo {
background-image: url("//aa.com/img.svg" param(--color var(--primary-color)));
}

Case insensitive token matching

Is it possible to set the grammar to match case-insensitively?
So, for example, a rule:
checkName = 'CHECK' Word;
would match check name as well as CHECK name
Creator of PEGKit here.
The only way to do this currently is to use a Semantic Predicate in a round-about sort of way:
checkName = { MATCHES_IGNORE_CASE(LS(1), #"check") }? Word Word;
Some explanations:
Semantic Predicates are a feature lifted directly from ANTLR. The Semantic Predicate part is the { ... }?. These can be placed anywhere in your grammar rules. They should contain either a single expression or a series of statements ending in a return statement which evaluates to a boolean value. This one contains a single expression. If the expression evaluates to false, matching of the current rule (checkName in this case) will fail. A true value will allow matching to proceed.
MATCHES_IGNORE_CASE(str, regexPattern) is a convenience macro I've defined for your use in Predicates and Actions to do regex matches. It has a case-sensitive friend: MATCHES(str, regexPattern). The second argument is an NSString* regex pattern. Meaning should be obvious.
LS(num) is another convenience macro for your use in Predicates/Actions. It means fetch a Lookahead String, and the argument specifies how far to look ahead. So LS(1) means look ahead by 1. In other words, "fetch the string value of the first upcoming token the parser is about to try to match".
Notice that I'm still matching Word twice at the end there. The first Word is necessary for matching 'check' (even though it was already tested in the predicate, it was not matched and consumed). The second Word is for your name or whatever.
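The "test in the predicate, then match and consume" idea can be sketched outside PEGKit. The snippet below is not PEGKit's API; `TokenStream`, `lookahead`, `consume` and `match_check_name` are hypothetical names, and a plain lowercase comparison stands in for the regex match that MATCHES_IGNORE_CASE performs.

```python
class TokenStream:
    """Hypothetical stand-in for a parser's token buffer."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def lookahead(self, n):
        """Return the n-th upcoming token (1-based) without consuming it."""
        index = self.pos + n - 1
        return self.tokens[index] if index < len(self.tokens) else None

    def consume(self):
        """Return the next token and advance past it."""
        token = self.lookahead(1)
        if token is None:
            raise ValueError("unexpected end of input")
        self.pos += 1
        return token


def match_check_name(stream):
    """Mimic: checkName = { predicate on LS(1) }? Word Word;"""
    # Predicate: inspects the upcoming token but consumes nothing.
    upcoming = stream.lookahead(1)
    if upcoming is None or upcoming.lower() != "check":
        return None  # predicate is false, so the rule fails
    keyword = stream.consume()  # first Word: actually matches 'check'
    name = stream.consume()     # second Word: the name
    return keyword, name


print(match_check_name(TokenStream(["CHECK", "name"])))
print(match_check_name(TokenStream(["check", "name"])))
print(match_check_name(TokenStream(["select", "name"])))
```

When the predicate is false the stream position is untouched, which is why the rule still needs its own Word to consume the keyword when the predicate is true.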
Hope that helps.

Whitespace in Treetop grammar

How explicit do I need to be when specifying where whitespace is or is not allowed? For instance, would these rules:
rule lambda
'lambda' ( '(' params ')' )? block
end
rule params
# ...
end
rule block
'{' # ... '}'
end
be sufficient to match
lambda {
}
Basically, do I need to specify everywhere that optional whitespace may appear?
Yes, you do. In these rules you need to skip whitespace, but, for instance, when you parse strings, which may contain whitespace, you would like to retain it; that's why you have to specify.
However, before applying treetop to your string, you may try to run a "quick and dirty" regexp-based algorithm that discards whitespace from the places where it's optional. Still, this may be much harder than specifying whitespace in your grammar.
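A minimal sketch of such a "quick and dirty" pre-pass, written in Python rather than Ruby, might look like the following. It assumes that double-quoted strings with backslash escapes are the only place where whitespace matters; the function name `strip_optional_whitespace` is hypothetical. It also shows why the approach is fragile: it would glue together two identifiers separated only by a space.

```python
import re

# Tokens, in priority order: a double-quoted string (kept verbatim),
# a run of whitespace (dropped), a run of other characters (kept),
# or a stray unmatched quote (kept so that no input is lost).
_TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|\s+|[^\s"]+|"')


def strip_optional_whitespace(source):
    """Remove whitespace everywhere except inside double-quoted strings."""
    kept = []
    for match in _TOKEN.finditer(source):
        token = match.group(0)
        if token[0].isspace():
            continue  # whitespace outside a string literal: discard
        kept.append(token)
    return "".join(kept)


print(strip_optional_whitespace("lambda {\n}"))
print(strip_optional_whitespace('lambda (x) { "keep  this" }'))
```

With the whitespace removed, the example input from the question reduces to text that the whitespace-free rules can match, but any construct that relies on a space as a separator would need special handling.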

Resources