Is there a way to exclude specific characters (match everything but them) in "match" in .tmLanguage?

As the title says, is there a way to exclude specific characters in "match" in .tmLanguage syntax highlighting? For example, match everything except w and s.

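I believe [^ws] is what you're looking for. A ^ placed immediately after the opening [ negates a character class: just as [abc] matches a single a, b, or c, [^abc] matches any single character except those three. Make sure the ^ is the first character inside the square brackets; outside brackets, ^ is an anchor instead. Note that [^ws] matches exactly one character, so use [^ws]+ to match a whole run of them.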
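A quick way to see both uses of ^ is a small sketch with Python's re module. TextMate-style grammars are evaluated by a different engine (Oniguruma or a fork of it in most implementations), so treat this as an illustration of the regex semantics rather than of .tmLanguage itself; simple negated classes like this one behave the same way in both.

```python
import re

# ^ as the FIRST character inside [...] negates the class:
# [^ws] matches exactly one character that is neither 'w' nor 's'.
NOT_W_OR_S = re.compile(r"[^ws]")
# Adding + matches a whole run of such characters.
RUN_NOT_W_OR_S = re.compile(r"[^ws]+")
# Outside brackets, ^ is the start-of-string anchor instead.
STARTS_WITH_S = re.compile(r"^s")

print(NOT_W_OR_S.findall("sws!"))               # ['!']
print(RUN_NOT_W_OR_S.findall("was this wise"))  # ['a', ' thi', ' ', 'i', 'e']
print(STARTS_WITH_S.match("sws") is not None)   # True
```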

Related

Vectorizing Dot before or after the Function Name?

I can vectorize a function using the dot notation:
a = Vector(0:10) .* 4
As in plenty of examples I have read, the dot comes before the asterisk.
However, this does not work in the following case:
Complex.(a,a)
Here the dot suddenly goes after the function name.
Is this intended? And is there a rule?
For functions, the dot always goes after the function name.
For operators, such as * or +, the dot goes before the operator. However, you can also enclose the operator in parentheses and append the dot, just as with a function.
To make this difference even more explicit, consider this example where we apply "multiply" with function call syntax:
x = rand(2,2)
sqrt.(x)
.*(x,x)
(*).(x,x)
x .* x
The last three commands all do the same thing: they multiply x by itself elementwise.
See the corresponding sections of the Julia documentation for more: Dot Syntax for Vectorizing Functions and Vectorized "dot" operators.

How can I write an unambiguous Nearley grammar for boolean search operators?

The Context
I am climbing the Nearley learning curve and trying to write a grammar for a search query parser.
The Goal
I would like to write a grammar that is able to parse a query string that contains boolean operators (e.g. AND, OR, NOT). Let's use AND for this question as a trivial case.
For instance, the grammar should recognize these example strings as valid:
pants
pants AND socks
jumping jacks
The Attempt
My naive attempt looks something like this:
query ->
    statement
  | statement "AND" statement
statement -> .:+
The Problem
The above grammar attempt is ambiguous because .:+ will match literally any string.
What I really want is for the first alternative to match any string that does not contain AND. Once "AND" appears, I want only the second alternative to apply.
The Question
How can I detect these two distinct cases without having ambiguous grammar?
I am worried I'm missing something fundamental; I can imagine a ton of use cases where we want arbitrary text split up by known operators.
Yeah, if you've got an escape hatch that could be literally anything, you're going to have a problem.
Somewhere you're going to want to define what your base set of tokens is (at least something like \S+), and then how those tokens can be composed.
When I write a parser, I typically start by figuring out where recursion is handled in the grammar, and which parsing approach the library you're relying on takes.
It looks like Nearley is an Earley parser, and as the Wikipedia entry on them notes, Earley parsers handle left recursion efficiently.
This is just hazarding a guess, but something like this might at least get you conjunctions:
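CONJUNCTION -> "AND" | "OR"
STATEMENT -> TOKENS | (TOKENS CONJUNCTION STATEMENT)
TOKENS -> [^()]+
A structure like this bans parentheses in tokens and splits a query at known conjunctions. Note, though, that [^()]+ still matches spaces and the words AND and OR themselves, so to make the grammar truly unambiguous, TOKENS has to exclude the conjunction keywords, for example by building it from whitespace-separated tokens (\S+) that are not keywords.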
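To make the \S+ idea concrete, here is a hand-written recursive-descent sketch in Python rather than Nearley (tokenize, parse_statement and parse_query are hypothetical names). It tokenizes on whitespace first, so AND and OR become separate tokens, and TOKENS is a run of words that are not conjunctions; that exclusion is what removes the ambiguity. It only covers STATEMENT -> TOKENS | TOKENS CONJUNCTION STATEMENT, with no parentheses, NOT, or precedence between AND and OR.

```python
import re

CONJUNCTIONS = {"AND", "OR"}


def tokenize(query):
    # Base tokens: maximal runs of non-whitespace, i.e. the \S+ idea above.
    return re.findall(r"\S+", query)


def parse_statement(tokens, pos=0):
    """STATEMENT -> TOKENS | TOKENS CONJUNCTION STATEMENT.

    Returns (tree, next_pos). A tree is either a plain string of search
    terms or a tuple (conjunction, left, right).
    """
    words = []
    # TOKENS: one or more words that are NOT conjunction keywords.
    # Excluding the keywords here is what makes the grammar unambiguous.
    while pos < len(tokens) and tokens[pos] not in CONJUNCTIONS:
        words.append(tokens[pos])
        pos += 1
    if not words:
        raise ValueError(f"expected search terms at token {pos}")
    left = " ".join(words)
    if pos == len(tokens):
        return left, pos
    # The loop stopped on a conjunction, so recurse for the right-hand side.
    conjunction = tokens[pos]
    right, pos = parse_statement(tokens, pos + 1)
    return (conjunction, left, right), pos


def parse_query(query):
    tree, _ = parse_statement(tokenize(query))
    return tree


print(parse_query("jumping jacks"))    # jumping jacks
print(parse_query("pants AND socks"))  # ('AND', 'pants', 'socks')
```

Because the rule is right-recursive, "a AND b OR c" groups as a AND (b OR c); giving AND and OR different precedence would need one rule per precedence level.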

JavaCC - choice based on return type?

I have an ifElse statement, which can be of the following two types:
a) ifElse(condition, expression_bool_result, expression_bool_result)
where expression_bool_result may be TRUE/FALSE, the result of and(), or(), ==, !=, ..., or a further ifElse
b) ifElse(condition, expression_arith_result, expression_arith_result)
where expression_arith_result may be any number, the result of calculations or of further functions returning a number, ... (or a further ifElse)
Since I am new to JavaCC, I would like to ask what a production could look like that allows the parser to make a clear decision.
Currently I get the warning:
Warning: Choice conflict involving two expansions at
line 824, column 5 and line 825, column 5 respectively.
A common prefix is: "ifElse" "("
Consider using a lookahead of 3 or more for earlier expansion.
which, as far as I can tell, implies that my grammar (regarding ifElse) is ambiguous.
If there is no way to write it unambiguously, what could the suggested lookahead look like?
Thanks for your feedback in advance!
No fixed amount of lookahead could possibly resolve this ambiguity in all cases. You could have an arbitrarily long stream of tokens that form a valid expression_arith_result - but is then followed by a comparison operator and another arithmetic value, thus turning it into an expression_bool_result.
The solution would be to have a single ifElse statement that takes a condition and two arbitrary expressions. The required agreement in type between the two expressions would then be a matter of semantics, not grammar.
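As a minimal sketch of that semantic check, assuming a tiny hypothetical AST (Num, Bool, Compare, IfElse) rather than whatever tree the JavaCC parser actually builds, the grammar accepts any ifElse(condition, expr, expr) and a later pass like this one rejects mismatched branches:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: float


@dataclass
class Bool:
    value: bool


@dataclass
class Compare:  # e.g. left == right, left != right
    op: str
    left: "Expr"
    right: "Expr"


@dataclass
class IfElse:  # the single, untyped ifElse(condition, a, b) production
    cond: "Expr"
    then_branch: "Expr"
    else_branch: "Expr"


Expr = Union[Num, Bool, Compare, IfElse]


def type_of(expr: Expr) -> str:
    """Return "arith" or "bool", raising TypeError on a type mismatch."""
    if isinstance(expr, Num):
        return "arith"
    if isinstance(expr, Bool):
        return "bool"
    if isinstance(expr, Compare):
        if type_of(expr.left) != type_of(expr.right):
            raise TypeError(f"cannot compare {expr.left!r} with {expr.right!r}")
        return "bool"
    if isinstance(expr, IfElse):
        if type_of(expr.cond) != "bool":
            raise TypeError("ifElse condition must be boolean")
        then_t, else_t = type_of(expr.then_branch), type_of(expr.else_branch)
        if then_t != else_t:
            raise TypeError(f"ifElse branches disagree: {then_t} vs {else_t}")
        return then_t
    raise TypeError(f"unknown expression {expr!r}")
```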
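Jason's answer is correct in that you can't resolve the choice with a fixed amount of lookahead. However, JavaCC does not limit you to fixed-length lookahead: a syntactic LOOKAHEAD can contain an entire expansion, which the parser scans ahead over before committing to a choice. So you can do the following: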
void IfExpression() :
{ }
{
    LOOKAHEAD( <IFELSE> "(" Condition() "," BooleanExpression() )
    BooleanIfExpression()
  |
    ArithmeticIfExpression()
}
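The idea behind that syntactic LOOKAHEAD can be sketched outside JavaCC as a speculative parse: try the prefix ifElse ( Condition , BooleanExpression, rewind regardless of the outcome, and pick the alternative based on whether it succeeded. The toy grammar below (TRUE/FALSE, integers, +, == and !=) and all class and method names are hypothetical stand-ins for the real productions; JavaCC generates the equivalent lookahead scanning code from the LOOKAHEAD specification.

```python
class ParseError(Exception):
    pass


class Parser:
    """Toy recursive-descent parser over a list of string tokens."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r} at {self.pos}, got {self.peek()!r}")
        self.pos += 1

    def speculate(self, rule):
        """Unbounded syntactic lookahead: run `rule`, then always rewind.
        Returns True if `rule` parsed without error."""
        saved = self.pos
        try:
            rule()
            return True
        except ParseError:
            return False
        finally:
            self.pos = saved

    def number(self):
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise ParseError(f"expected a number at {self.pos}, got {tok!r}")
        self.pos += 1
        return int(tok)

    def arith(self):
        # Arith -> ArithIf | NUMBER ("+" NUMBER)*
        if self.peek() == "ifElse":
            return self.arith_if()
        node = self.number()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.number())
        return node

    def boolean(self):
        # Bool -> TRUE | FALSE | BoolIf | Arith ("==" | "!=") Arith
        tok = self.peek()
        if tok in ("TRUE", "FALSE"):
            self.pos += 1
            return tok == "TRUE"
        if tok == "ifElse" and self.speculate(self.bool_if_prefix):
            return self.bool_if()
        left = self.arith()
        op = self.peek()
        if op not in ("==", "!="):
            raise ParseError(f"expected == or != at {self.pos}, got {op!r}")
        self.pos += 1
        return (op, left, self.arith())

    def bool_if_prefix(self):
        # Mirrors LOOKAHEAD( <IFELSE> "(" Condition() "," BooleanExpression() )
        self.expect("ifElse")
        self.expect("(")
        self.boolean()
        self.expect(",")
        self.boolean()

    def _if(self, kind, branch):
        self.expect("ifElse")
        self.expect("(")
        cond = self.boolean()
        self.expect(",")
        a = branch()
        self.expect(",")
        b = branch()
        self.expect(")")
        return (kind, cond, a, b)

    def bool_if(self):
        return self._if("boolIf", self.boolean)

    def arith_if(self):
        return self._if("arithIf", self.arith)

    def if_expression(self):
        # IfExpression -> LOOKAHEAD(prefix) BooleanIfExpression | ArithmeticIfExpression
        if self.speculate(self.bool_if_prefix):
            return self.bool_if()
        return self.arith_if()
```

The same speculation is used inside boolean(), because an ifElse appearing where a boolean is expected may be either a boolean if-expression or an arithmetic one followed by == or !=.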

Regular expression ASP.NET validator for a few words

I'm trying to create a validator for a string that may contain 1 to N words, separated by a single whitespace character (spaces only between words). I'm new to regex, so I'm a bit confused, because my expression seems to be correct:
^[[a-zA-Z]+\s{1}]{0,}[a-zA-Z]+$
What am I doing wrong here? (It accepts only 2 words, but I want it to accept 1 or more words.)
Any help is greatly appreciated :)
As often happens with someone beginning a new programming language or syntax, you're close, but not quite! The ^ and $ anchors are being used correctly, and the character classes [a-zA-Z] will match only letters (sounds right to me), but your repetition is a little off, and your grouping is not what you think it is - which is your primary problem.
^[[a-zA-Z]+\s{1}]{0,}[a-zA-Z]+$
 ^           ^^^^^^^^
 a           bbbacccc
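It only matches two words because you effectively don't have any group repetition; this is because you don't really have any groups, only character classes. [[a-zA-Z] is a single class whose members include a literal [, and the ] after \s{1} is just a literal ] repeated {0,} times, so the pattern means "letters, one whitespace, optional ] characters, letters". The simplest fix is to change the first [ and its matching closing bracket (marked by a's in the listing above) to parentheses: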
^([a-zA-Z]+\s{1}){0,}[a-zA-Z]+$
This single change will make it work the way you expect! However, there are a few recommendations and considerations I'd like to make.
First, for readability and code maintenance, use the single-character repetition operators instead of repetition braces wherever possible: * repeats zero or more times, + repeats one or more times, and ? repeats zero or one times (AKA optional). Your repetition curly braces are syntactically correct and do what you intend, but one (marked by b's above) should be removed because it is redundant, and the other (marked by c's above) should be shortened to an asterisk *, as they have exactly the same meaning:
^([a-zA-Z]+\s)*[a-zA-Z]+$
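To check the fix, here is a small sketch using Python's re module, which treats these constructs the same way as .NET for this pattern. It compares the question's pattern (with the inner [ escaped only to avoid Python's "possible nested set" FutureWarning; the meaning is unchanged), the corrected pattern, and a variant with the easy-to-make A-z typo, whose range also admits [, \, ], ^, _ and `.

```python
import re

# The question's pattern. The inner "[" is escaped only to avoid Python's
# "possible nested set" FutureWarning; it is still a literal "[" in the class.
ORIGINAL = re.compile(r"^[\[a-zA-Z]+\s{1}]{0,}[a-zA-Z]+$")
# The corrected pattern: zero or more "word + one whitespace", then a final word.
FIXED = re.compile(r"^([a-zA-Z]+\s)*[a-zA-Z]+$")
# The same pattern with an A-z typo: A-z also covers [ \ ] ^ _ and `.
TYPO = re.compile(r"^([a-zA-Z]+\s)*[a-zA-z]+$")


def accepts(pattern, text):
    return pattern.match(text) is not None


for sample in ["pants", "pants socks", "one two three", "two  spaces", "snake_case"]:
    print(repr(sample), accepts(ORIGINAL, sample), accepts(FIXED, sample), accepts(TYPO, sample))
```

Keep in mind that \s matches any whitespace (tabs included); if only a literal space is allowed, put a space character in its place.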
Second, I would recommend considering (depending upon your application requirements) the \w shorthand character class instead of the [a-zA-Z] character class, with the following considerations:
it matches both upper and lowercase letters
it does match more than letters (it matches digits 0-9 and the underscore as well)
it can often be configured to match non-English (unicode) letters for multi-lingual input
If any of these are unnecessary or undesirable, stick with [a-zA-Z]; either way, you're on the right track!
On a side note, the character combination \b is a word-boundary assertion and is not needed for your case, as you will already begin and end where there are letters and letters only!
As for learning more about regular expressions, I would recommend Regular-Expressions.info, which has a wealth of info about regexes and the inner workings and quirks of the various implementations. I also use a tool called RegexBuddy to test and debug expressions.

Simple integer regular expression

I have ValidationRegularExpression="[0-9]" which only allows a single character. How do I make it allow between (and including) 1 and 7 digits? I tried [0-9]{1-7} but it didn't work.
You got the syntax almost correct: [0-9]{1,7}.
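You can make your solution a bit more elegant (and Unicode-aware) by replacing [0-9] with the generic character class "decimal digit", \d. In .NET, \d matches any Unicode decimal digit, not only 0-9 (remember that other scripts use different characters for digits). If the validator also runs client-side (as ASP.NET's RegularExpressionValidator does by default), the browser's JavaScript engine treats \d as [0-9] only.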
And here's the documentation for future reference:
.NET Framework Regular Expressions
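A quick way to see the difference between the two forms is Python's re module, used here only for illustration; .NET's \d is likewise Unicode-aware by default unless RegexOptions.ECMAScript is set. fullmatch is used so the whole input must match, which is what ^...$ would do in a raw pattern.

```python
import re

ASCII_DIGITS = re.compile(r"[0-9]{1,7}")
# In Python 3, \d in a str pattern matches any Unicode decimal digit...
ANY_DIGITS = re.compile(r"\d{1,7}")
# ...unless the ASCII flag restricts it to [0-9].
ASCII_D = re.compile(r"\d{1,7}", re.ASCII)


def is_valid(pattern, text):
    # fullmatch: the whole input must match, like wrapping the pattern in ^...$.
    return pattern.fullmatch(text) is not None


arabic_indic = "\u0663\u0664"  # ARABIC-INDIC DIGIT THREE, ARABIC-INDIC DIGIT FOUR
print(is_valid(ASCII_DIGITS, "1234567"), is_valid(ASCII_DIGITS, "12345678"))    # True False
print(is_valid(ASCII_DIGITS, arabic_indic), is_valid(ANY_DIGITS, arabic_indic))  # False True
```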
If you want to avoid leading zeros, you can use this:
^(?!0\d)\d{1,7}$
The first part, (?!0\d), is a negative lookahead assertion that checks whether the string starts with a 0 followed by another digit; if so, there is no match. A single 0 on its own is still accepted.
Check online here: http://regexr.com?2thtr
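The leading-zero rule can also be checked offline; this sketch runs the pattern above unchanged through Python's re.fullmatch (note that Python's \d, like .NET's, also accepts non-ASCII digits).

```python
import re

# The answer's pattern, unchanged.
NO_LEADING_ZERO = re.compile(r"^(?!0\d)\d{1,7}$")


def valid_no_leading_zero(text):
    return NO_LEADING_ZERO.fullmatch(text) is not None


for sample in ["0", "7", "70", "07", "0123", "1234567", "12345678", ""]:
    print(repr(sample), valid_no_leading_zero(sample))
```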
