Regex, get only first occurrence and stop - css

I'm trying to grab each individual keyframes declaration in a css file, and copy it, but inserting moz/ms/o to handle each browser with keyframes.
I'm using this regex:
(@)(-webkit-)([\s\S]*)(\}\R\}\R@)
To try and capture each collection (see full example at my Rubular)

Try this:
/(@)(-webkit-)(.*?\R\})/m
The m modifier makes it a multi-line regexp, so . also matches newlines. I removed the match for @ at the end, because with it the pattern can't match the last block in the file. And *? makes the match non-greedy, so it stops at the first closing brace that sits at the start of a line and matches one block at a time.
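To see the greedy/non-greedy difference outside Rubular, here is a minimal Python sketch. It rests on a few assumptions: the stylesheet below is made up for illustration, Python's re module has no \R so \r?\n stands in for it, and re.DOTALL plays the role of Ruby's /m.

```python
import re

# Hypothetical stylesheet with two keyframes blocks (not taken from the question).
css = (
    "@-webkit-keyframes fade {\n"
    "from { opacity: 0; }\n"
    "to { opacity: 1; }\n"
    "}\n"
    "@-webkit-keyframes slide {\n"
    "from { left: 0; }\n"
    "to { left: 10px; }\n"
    "}\n"
)

# re.DOTALL makes "." match newlines too, like Ruby's /m modifier.
greedy = re.findall(r"@-webkit-.*\r?\n\}", css, flags=re.DOTALL)
lazy = re.findall(r"@-webkit-.*?\r?\n\}", css, flags=re.DOTALL)

print(len(greedy))  # 1: one match swallowing both blocks
print(len(lazy))    # 2: one match per block
```

The greedy version runs from the first @-webkit- to the last line-initial closing brace in the file, which is why the question's pattern grabbed everything.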

The closest you'll get is...
(@-webkit-[^}]*}\s*to\s*{[^}]*}\s*})
...which copes reasonably well with unusual or mangled indentation in your CSS files. This is how it works:
( Start a capture group...
@-webkit- ...at this phrase.
[^}]* } Continue until you see a '}' character.
\s* to \s* { Next, the word 'to', optionally surrounded by whitespace, followed by '{'...
[^}]* } ...keep going until the next '}' character.
\s* } A final '}' character, possibly preceded by whitespace.
) Stop capturing.
Because regex doesn't understand nesting, some inputs may still produce false positives.
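For the copying step the question is after, something along these lines might work. It is only a sketch in Python: add_prefixes and the sample stylesheet are hypothetical names and data, the braces are escaped to be safe in Python's re syntax, and it assumes every block has the simple from/to shape the pattern expects.

```python
import re

# Same idea as the pattern above; the part after "@-webkit-" is captured
# so it can be reused under the other vendor prefixes.
BLOCK = re.compile(r"@-webkit-([^}]*\}\s*to\s*\{[^}]*\}\s*\})")

def add_prefixes(css, prefixes=("moz", "ms", "o")):
    """Return css with a prefixed copy of each matched block appended after it."""
    def expand(match):
        body = match.group(1)
        copies = [match.group(0)]  # keep the original -webkit- block
        copies += ["@-%s-%s" % (prefix, body) for prefix in prefixes]
        return "\n".join(copies)
    return BLOCK.sub(expand, css)

# Hypothetical input, not from the question.
sample = (
    "@-webkit-keyframes fade {\n"
    "  from { opacity: 0; }\n"
    "  to { opacity: 1; }\n"
    "}\n"
)
print(add_prefixes(sample))
```

Note that this only rewrites the at-rule name; any -webkit- prefixed properties inside the block are copied unchanged.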

Related

How can I find all substrings, which are between two strings including a line-break using R?

How would I find all substrings that are between "##" and "\n" or "{"?
For example, facing "## Test\n" or "## Test {", I would like to get back "Test".
I am not experienced in using regex but started by trying
str_match("## Test\n", "## (.*?) \n")
using the stringr package. But there seems to be an issue with the line break.
The following should work:
str_match("## Test\n", "##\\s*([^\n{]*)[\n{]")
## matches ##
\s* matches any number of whitespace characters
([^\n{]*) will match and capture any number of characters that are not \n or {
[\n{] ends the pattern on either \n or {
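The same pattern can be tried in Python to watch the pieces work. This is a sketch, not the stringr call: heading_text is a hypothetical helper name, and the .strip() is an addition because the capture group keeps the space before "{".

```python
import re

# Same pattern as above; in a Python raw string the backslashes are single.
HEADING = re.compile(r"##\s*([^\n{]*)[\n{]")

def heading_text(line):
    """Return the text between '##' and the first newline or '{', or None."""
    match = HEADING.search(line)
    return match.group(1).strip() if match else None

print(heading_text("## Test\n"))  # Test
print(heading_text("## Test {"))  # Test
```

Without the strip, the second call would return "Test " with a trailing space, since the space is neither a newline nor a brace.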
You can use an assertion and a negated class.
Putting a \r carriage return in the class will take care of line break issues.
(?<=##)[^\r\n{]*
What it matches is what you need.
Expanded
(?<= \#\# )
[^\r\n{]*
Also, if you expect a second ## to follow on the same line, not separated by a line break or {,
and that is valid input, use something like this
(?<=##)(?:(?!##|[\r\n{])[\S\s])*
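A quick way to compare the two lookbehind patterns is to run both over the same input. The sketch below uses Python's re, which accepts this fixed-width lookbehind; the sample strings are invented for illustration.

```python
import re

SIMPLE = r"(?<=##)[^\r\n{]*"
TEMPERED = r"(?<=##)(?:(?!##|[\r\n{])[\S\s])*"

# Hypothetical inputs, not from the question.
text = "## First\n## Second {\nbody\n"
packed = "##a##b\n"

print(re.findall(SIMPLE, text))      # [' First', ' Second ']
print(re.findall(SIMPLE, packed))    # ['a##b']
print(re.findall(TEMPERED, packed))  # ['a', 'b']
```

Both patterns keep the whitespace around the text, so the results usually need trimming afterwards. Only the second one stops at a following ## on the same line.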

regex to find pattern not inside another pattern

I'm trying to write a regex to find all ID selectors in a CSS file. Basically, that means any word that starts with a #, so okay
#\w+
Except ... color specifiers can also start with a #. So what I really want is all words that start with a # that are NOT between { and }. I can't figure out how to say this.
I'm doing this in Notepad++ so I need that flavor of regex.
BTW my real objective is to delete everything that's not an ID selector from the file, so I end up with just a list of selectors. My first try was
Find: [^#]*(#\w+)
Replace: \1\r\n
... and then hit Replace All.
But then I ran into the color problem.
Update
Someone asks for an example. Ok:
Input:
.foo {max-width: 500px;}
#bar {text-align: left;}
.splunge, #plugh {color: #ff0088;}
Desired output:
#bar
#plugh
Note the point is that it includes the two "pound strings" that come outside of braces but not the one that comes inside braces.
What about this? You could use a lookahead expression:
#\w+(?=[^}]*?{)
It ensures that a { follows the match with no } in between. A following { indicates that the match is part of a selector, and refusing to cross a } excludes color values inside a declaration block.
#: match must begin with a #
\w+: match one or more word characters (might need tweaking; \w is equivalent to [A-Za-z0-9_])
(?=...): positive lookahead
[^}]*?: zero or more characters other than }, as few as possible
{: the { character
https://regex101.com/r/Di43hX/3
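Here is the lookahead run against the example input from the question, as a small Python sketch rather than a Notepad++ search. The brace is escaped as \{ to stay on the safe side in Python's re syntax.

```python
import re

# The example input from the question.
css = (
    ".foo {max-width: 500px;}\n"
    "#bar {text-align: left;}\n"
    ".splunge, #plugh {color: #ff0088;}\n"
)

# "#word" followed by a "{" with no "}" in between.
ids = re.findall(r"#\w+(?=[^}]*?\{)", css)
print(ids)  # ['#bar', '#plugh']
```

The color #ff0088 is skipped because the lookahead hits the closing } before it can reach another {.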

R - replace last instance of a regex match and everything afterwards

I'm trying to use a regex to replace the last instance of a phrase (and everything after that phrase, which could be any character):
stringi::stri_replace_last_regex("_AB:C-_ABCDEF_ABC:45_ABC:454:", "_ABC.*$", "CBA")
However, I can't seem to get the regex to function properly:
Input: "_AB:C-_ABCDEF_ABC:45_ABC:454:"
Actual output: "_AB:C-CBA"
Desired output: "_AB:C-_ABCDEF_ABC:45_CBA"
I have tried gsub() as well but that hasn't worked.
Any ideas where I'm going wrong?
One solution is:
sub("(.*)_ABC.*", "\\1_CBA", Input)
[1] "_AB:C-_ABCDEF_ABC:45_CBA"
Have a look at what stringi::stri_replace_last_regex does:
Replaces with the given replacement string last substring of the input that matches a regular expression
What does your _ABC.*$ pattern match inside _AB:C-_ABCDEF_ABC:45_ABC:454:? It matches the first _ABC (that is right after C-) and all the text after to the end of the line (.*$ grabs 0+ chars other than line break chars to the end of the line). Hence, you only have 1 match, and it is the last.
Solutions can be many:
1) Capture all text before the last occurrence of the pattern and insert the captured value with a replacement backreference (this pattern does not have to be anchored at the end of the string with $):
sub("(.*)_ABC.*", "\\1_CBA","_AB:C-_ABCDEF_ABC:45_ABC:454:")
2) Using a tempered greedy token to make sure you only match any char that does not start your pattern up to the end of the string after matching it (this pattern must be anchored at the end of the string with $):
sub("(?s)_ABC(?:(?!_ABC).)*$", "_CBA","_AB:C-_ABCDEF_ABC:45_ABC:454:", perl=TRUE)
Note that this pattern requires the perl=TRUE argument so that sub parses it with the PCRE engine (or you may use stringr::str_replace, which is powered by the ICU regex library and supports lookaheads).
3) A negative lookahead may be used to make sure your pattern does not appear anywhere to the right of your pattern (this pattern does not have to be anchored at the end of the string with $):
sub("(?s)_ABC(?!.*_ABC).*", "_CBA","_AB:C-_ABCDEF_ABC:45_ABC:454:", perl=TRUE)
See the R demo online; all three lines of code return _AB:C-_ABCDEF_ABC:45_CBA.
Note that (?s) in the PCRE patterns is necessary in case your strings may contain a newline (and . in a PCRE pattern does not match newline chars by default).
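The three approaches carry over to other engines largely unchanged. As a rough cross-check, here they are in Python's re module (a sketch, not the R code above), together with the original pattern to show why it fails:

```python
import re

s = "_AB:C-_ABCDEF_ABC:45_ABC:454:"

# The original attempt: the first "_ABC" already matches through to the end,
# so there is only one match and everything after "_AB:C-" is replaced.
naive = re.sub(r"_ABC.*$", "CBA", s)

# 1) Greedy capture of everything before the last "_ABC".
by_capture = re.sub(r"(.*)_ABC.*", r"\1_CBA", s)

# 2) Tempered greedy token, anchored at the end of the string.
by_tempered = re.sub(r"(?s)_ABC(?:(?!_ABC).)*$", "_CBA", s)

# 3) Negative lookahead: no further "_ABC" anywhere to the right.
by_lookahead = re.sub(r"(?s)_ABC(?!.*_ABC).*", "_CBA", s)

print(naive)         # _AB:C-CBA
print(by_capture)    # _AB:C-_ABCDEF_ABC:45_CBA
print(by_tempered)   # _AB:C-_ABCDEF_ABC:45_CBA
print(by_lookahead)  # _AB:C-_ABCDEF_ABC:45_CBA
```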
Arguably the safest thing to do is to use a negative lookahead to find the last occurrence:
_ABC(?:(?!_ABC).)+$
gsub("_ABC(?:(?!_ABC).)+$", "_CBA","_AB:C-_ABCDEF_ABC:45_ABC:454:", perl=TRUE)
[1] "_AB:C-_ABCDEF_ABC:45_CBA"
Using gsub and back referencing
gsub("(.*)ABC.*$", "\\1CBA","_AB:C-_ABCDEF_ABC:45_ABC:454:")
[1] "_AB:C-_ABCDEF_ABC:45_CBA"

Whitespace in Treetop grammar

How explicit do I need to be when specifying where whitespace is or is not allowed? For instance, would these rules:
rule lambda
'lambda' ( '(' params ')' )? block
end
rule params
# ...
end
rule block
'{' # ... '}'
end
be sufficient to match
lambda {
}
Basically do I need to specify everywhere optional whitespace may appear?
Yes, you do. In these rules you need to skip whitespace, but when you parse strings, for instance, which may contain whitespace, you want to retain it; that's why you have to specify it.
However, before applying Treetop to your string, you may try to run a "quick and dirty" regexp-based algorithm that discards whitespace from the places where it is optional. Still, this may be much harder than specifying whitespace in your grammar.
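To give a feel for that "quick and dirty" route, here is a rough Python sketch. It assumes the only place where whitespace matters is inside double-quoted string literals, which is an assumption and not something the question states. strip_optional_whitespace is a hypothetical helper name.

```python
import re

# Either a double-quoted string literal (kept as is) or a run of whitespace (dropped).
TOKEN = re.compile(r'("(?:\\.|[^"\\])*")|\s+')

def strip_optional_whitespace(source):
    """Remove whitespace everywhere except inside double-quoted strings."""
    return TOKEN.sub(lambda m: m.group(1) or "", source)

print(strip_optional_whitespace('lambda {\n}'))        # lambda{}
print(strip_optional_whitespace('puts "a  b" {\n}'))   # puts"a  b"{}
```

It also shows why this gets hard quickly: the space in something like "lambda x" would be removed as well, gluing two tokens together, so a real pre-pass would need to know nearly as much about the language as the grammar does.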

Regex for anything between []

I need to find the regex for []
For eg, if the string is - Hi [Stack], Here is my [Tag] which i need to [Find].
It should return
Stack, Tag, Find
Pretty simple, you just need to (1) escape the brackets with backslashes, and (2) use (.*?) to capture the contents.
\[(.*?)\]
The parentheses are a capturing group, they capture their contents for later use. The question mark after .* makes the matching non-greedy. This means it will match the shortest match possible, rather than the longest one. The difference between greedy and non-greedy comes up when you have multiple matches in a line:
Hi [Stack], Here is my [Tag] which i need to [Find].
^______________________________________________^
A greedy match will find the longest string possible between two sets of square brackets. That's not right. A non-greedy match will find the shortest:
Hi [Stack], Here is my [Tag] which i need to [Find].
^_____^
Anyways, the code will end up looking like:
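using System;
using System.Text.RegularExpressions;

// Verbatim string (@"...") so the backslashes reach the regex engine unchanged.
string regex = @"\[(.*?)\]";
string text = "Hi [Stack], Here is my [Tag] which i need to [Find].";
foreach (Match match in Regex.Matches(text, regex))
{
    // Groups[1] holds the text between the brackets.
    Console.WriteLine("Found {0}", match.Groups[1].Value);
}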
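If you want to see the greedy/non-greedy difference for yourself, here is a minimal sketch of the same comparison in Python (the pattern syntax is the same as in the C# code above):

```python
import re

text = "Hi [Stack], Here is my [Tag] which i need to [Find]."

lazy = re.findall(r"\[(.*?)\]", text)    # shortest match for each pair of brackets
greedy = re.findall(r"\[(.*)\]", text)   # first "[" all the way to the last "]"

print(lazy)    # ['Stack', 'Tag', 'Find']
print(greedy)  # ['Stack], Here is my [Tag] which i need to [Find']
```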
\[([\w]+?)\]
should work. You might have to change the matching group if you need to include special chars as well.
Depending on what environment you mean:
\[([^\]]+)]
.NET syntax, taking care of multiple embedded brackets:
\[ ( (?: \\. | (?<OPEN> \[) | (?<-OPEN> \]) | [^\]] )*? (?(OPEN)(?!)) ) \]
This counts the number of opened [ sections in OPEN and only succeeds if OPEN is 0 in the end.
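Engines without balancing groups can get the same effect with a small hand-written scan. The Python sketch below keeps an explicit depth counter in place of OPEN. top_level_brackets is a hypothetical helper, and unlike the .NET pattern it does not treat backslash escapes specially.

```python
def top_level_brackets(text):
    """Return the contents of each outermost [...] section, nested brackets included."""
    results = []
    depth = 0      # number of currently open "[" (the role OPEN plays above)
    start = None   # index just after the outermost "["
    for i, ch in enumerate(text):
        if ch == "[":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
            if depth == 0:
                results.append(text[start:i])
    return results

print(top_level_brackets("a [b [c] d] e [f]"))  # ['b [c] d', 'f']
```

A section is only reported once the counter is back at zero, so an unclosed [ at the end of the input yields nothing for that section.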
I encountered a similar issue and discovered that this also does the trick.
\[\w{1,}\]
Here \w is a metacharacter that matches a single word character (a letter, digit or underscore), and the {1,} quantifier means one or more of them.
In general, n{X,} matches at least X occurrences of n. With the upper bound left out on purpose, {1,} is equivalent to +.
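One thing to watch with this pattern is that it has no capturing group, so each match includes the brackets. A short Python sketch of the difference:

```python
import re

text = "Hi [Stack], Here is my [Tag] which i need to [Find]."

whole = re.findall(r"\[\w{1,}\]", text)  # no group: the match includes the brackets
inner = re.findall(r"\[(\w+)\]", text)   # group plus "+", which is the same as {1,}

print(whole)  # ['[Stack]', '[Tag]', '[Find]']
print(inner)  # ['Stack', 'Tag', 'Find']
```

Since \w only covers letters, digits and underscores, neither version would match something like [two words].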

Resources