Need some help building a somewhat simple REGEX expression - asp.net

I'm trying to build a regex that accepts only numbers, optionally with a decimal point, with a maximum of 3 digits to the right of the decimal point (thousandths) and 50 to the left. Valid entries would look something like these:
1
1.0
.1
1.011
.011
1202938.123
1237923782.0
So far I have ^([0-9]*|\d*\.\d{1}?\d*){1,999}$. Any help appreciated. Thanks.

I believe this should suffice:
^(?=.)\d{0,50}(?:\.\d{0,3})?$
See the regex demo. Note this will also match 1. (and even a lone .); if this is undesired, change \d{0,3} to \d{1,3}. Similarly, this regex will match .5 (with no integer part); if you don't want this, use \d{1,50} instead of \d{0,50}.
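To sanity-check this pattern against the sample inputs from the question, here is a small sketch. It uses Python's re module purely for illustration (an assumption: the question targets ASP.NET, but the lookahead and quantifier syntax used here is read the same way by both engines). It also compares the \d{0,3} and \d{1,3} variants on a few edge cases: 1., a lone ., and too many decimals.

```python
import re

# Pattern from the answer: up to 50 integer digits, optional dot, up to 3 decimals.
# The lookahead (?=.) only rejects the empty string.
loose = re.compile(r"^(?=.)\d{0,50}(?:\.\d{0,3})?$")
# Variant that requires at least one digit after the dot.
strict = re.compile(r"^(?=.)\d{0,50}(?:\.\d{1,3})?$")

samples = ["1", "1.0", ".1", "1.011", ".011", "1202938.123", "1237923782.0"]
edge_cases = ["", "1.", ".", "1.1234", "abc"]

for s in samples + edge_cases:
    print(repr(s), bool(loose.match(s)), bool(strict.match(s)))
```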

You could try:
^(?=.+)\d{0,50}(?:\.\d{1,3})?$
Demonstration here at regex101.com
Explanation -
^ tells the regex that the match will begin at the start of the string,
(?=.+) is a positive lookahead that tells the regex that matching should only start if the line contains at least one character (as rightly pointed out in the comments!),
\d{0,50} matches 0 to 50 digits,
(?:\.\d{1,3})? matches an optional dot (.) followed by 1 to 3 digits,
$ tells the regex that whatever it has matched so far must be followed by the end of the string.
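The lookahead is there because every other part of the pattern is optional, so without it the empty string would match. Here is a minimal Python sketch showing this (Python is used only for illustration):

```python
import re

# Answer's pattern: (?=.+) rejects the empty string, and the optional group
# requires 1-3 digits after any dot.
with_lookahead = re.compile(r"^(?=.+)\d{0,50}(?:\.\d{1,3})?$")
# The same pattern without the lookahead, to show why the lookahead is needed.
without_lookahead = re.compile(r"^\d{0,50}(?:\.\d{1,3})?$")

for s in ["", "1", ".5", "1.", "12.345", "12.3456"]:
    print(repr(s), bool(with_lookahead.match(s)), bool(without_lookahead.match(s)))
```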

Other way: You can check if the string isn't empty and if the dot is always followed by digits, putting a word-boundary at a strategic place:
^\d{0,50}\.?\b\d{0,3}$
As you can see, everything in the pattern is optional except the word boundary, which does the magic.
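The word-boundary trick can be checked the same way. In this Python sketch (illustration only), the \b after the optional dot rejects the empty string, a lone dot and a trailing dot. The 50-digit limit still holds.

```python
import re

# \b after the optional dot: a dot must be followed by a digit, and an
# empty string contains no word boundary at all.
pattern = re.compile(r"^\d{0,50}\.?\b\d{0,3}$")

tests = ["1", "1.0", ".1", "1202938.123", "", ".", "1.", "1.1234"]
results = {s: bool(pattern.match(s)) for s in tests}
print(results)
```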

Related

Extract up to two more digits

This may be a very simple question, but I don't have much experience with regular expressions. This page is a good source of regular expressions, but I could not figure out how to use them in the following code:
data %>% filter(grepl("^A01H1", icl))
Question
I would like to extract the values in one column of my data frame that start with A01H1 followed by up to 2 more digits, for example A01H100, A01H140, A01H110. I could not find a solution despite a few attempts:
Attempts
I looked at this question, from which I used ^A01H1[0-9].{2} to select up to two more digits.
I also tried adding a character class, ^A01H1[0-9][0-9][x-y], to stop after two digits.
Any help would be much appreciated :)
You can use "^A01H1\\d{1,2}$".
The first part ("^A01H1"), you figured out yourself, so what are we doing in the second part ("\\d{1,2}$")?
\d matches any digit and is equivalent to [0-9]; since we are working in R, the backslash itself must be escaped inside the string, so we write \\d
{1,2} indicates we want to have 1 or 2 matches of \\d
$ specifies the end of the string, so nothing may come afterwards; this prevents matching more than 2 digits
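For comparison, here is the same anchored check sketched in Python rather than R. The icl values are made up for illustration. A raw string avoids the double backslash that R needs.

```python
import re

# Hypothetical sample codes standing in for the icl column.
icl = ["A01H100", "A01H140", "A01H11", "A01H1", "A01H1000", "B01H100"]

# A raw string needs only one backslash, unlike R's "\\d".
pattern = re.compile(r"^A01H1\d{1,2}$")
kept = [code for code in icl if pattern.search(code)]
print(kept)  # ['A01H100', 'A01H140', 'A01H11']
```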
It looks as if you want to match a part of a string that starts with A01H1, then contains 1 or 2 digits and then is not followed with any digit.
You may use
^A01H1\d{1,2}(?!\d)
See the regex demo. If there can be no text after two digits at all, replace (?!\d) with $.
Details
^ - start of string
A01H1 - literal string
\d{1,2} - one to two digits
(?!\d) - no digit allowed immediately to the right
$ - end of string
In R, you could use it like
grepl("^A01H1\\d{1,2}(?!\\d)", icl, perl=TRUE)
Or, with the string end anchor,
grepl("^A01H1\\d{1,2}$", icl)
Note that perl=TRUE is only necessary when using PCRE-specific syntax like (?!\d), a negative lookahead.
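The (?!\d) and $ versions only behave differently when something follows the digits. Here is a small Python sketch (illustration only). Python's re supports lookahead without any extra flag, unlike R's default engine.

```python
import re

codes = ["A01H140", "A01H140-x", "A01H1400"]

lookahead = re.compile(r"^A01H1\d{1,2}(?!\d)")  # no third digit, but other text may follow
anchored = re.compile(r"^A01H1\d{1,2}$")        # nothing may follow the digits

for code in codes:
    print(code, bool(lookahead.search(code)), bool(anchored.search(code)))
```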

How to match more than one ending character? [duplicate]

I am trying to find a regex that matches a string only if it does not end with three or more '0' characters. Intuitively, I tried:
.*[^0]{3,}$
But this does not match when there are one or two zeroes at the end of the string.
If you have to do it without lookbehind assertions (e.g. in JavaScript engines that predate ES2018, which added lookbehind):
^(?:.{0,2}|.*(?!000).{3})$
Otherwise, use hsz's answer.
Explanation:
^ # Start of string
(?: # Either match...
.{0,2} # a string of up to two characters
| # or
.* # any string
(?!000) # (unless followed by three zeroes)
.{3} # followed by three characters
) # End of alternation
$ # End of string
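Here is a minimal Python sketch (illustration only) of the lookbehind-free alternation, run on a few short and long inputs:

```python
import re

# Lookbehind-free version from the answer above.
no_trailing_000 = re.compile(r"^(?:.{0,2}|.*(?!000).{3})$")

cases = ["", "00", "120", "1000", "10000", "100", "5646549800"]
results = {s: bool(no_trailing_000.match(s)) for s in cases}
print(results)
```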
You can try using a negative look-behind, i.e.:
(?<!000)$
Tests:
Test | Target String | Matches
1 | 654153640 | Yes
2 | 5646549800 | Yes
3 | 848461158000 | No
4 | 84681840000 | No
5 | 35450008748 | Yes
Please keep in mind that negative look-behinds aren't supported in every language, however.
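The table above can be reproduced with Python's re module, which supports fixed-width lookbehind. Python is used here only for illustration.

```python
import re

# The end of the string must not be preceded by "000".
pattern = re.compile(r"(?<!000)$")

rows = ["654153640", "5646549800", "848461158000", "84681840000", "35450008748"]
verdicts = ["Yes" if pattern.search(t) else "No" for t in rows]
for n, (target, verdict) in enumerate(zip(rows, verdicts), start=1):
    print(n, target, verdict)
```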
What's wrong with the no-lookbehind, more general-purpose ^(.(?!.*0{3,}$))*$?
The general pattern is ^(.(?!.* + not-ending-with-pattern + $))*$. You don't have to reverse engineer the state machine like Tim's answer does; you just insert the pattern you don't want to match at the end.
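Below, the general-purpose pattern is checked in Python (illustration only) next to a variant that moves the lookahead in front of each character. That variant is an adjustment added here, not part of the answer. The original form only checks after consuming a character, so it accepts the bare string 000; the variant rejects it.

```python
import re

# Pattern from the answer: each consumed character must not be followed
# by a run of three or more zeros that reaches the end of the string.
general = re.compile(r"^(.(?!.*0{3,}$))*$")
# Variant with the lookahead placed before each character,
# so the check also runs at position 0.
tempered = re.compile(r"^(?:(?!.*0{3,}$).)*$")

for s in ["123", "1000", "10001", "000", "0000"]:
    print(repr(s), bool(general.match(s)), bool(tempered.match(s)))
```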
This is one of those things that RegExes aren't that great at, because the string isn't very regular (whatever that means). The only way I could come up with was to give it every possibility.
.*[^0]..$|.*.[^0].$|.*..[^0]$
which simplifies to
.*([^0]|[^0].|[^0]..)$
That's fine if you only want strings not ending in three 0s, but strings not ending in ten 0s would be long. But thankfully, this string is a bit more regular than some of these sorts of combinations, and you can simplify it further.
.*[^0].{0,2}$
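Here is a quick Python check of the simplified pattern (illustration only). The pattern has one limitation: it needs at least one non-zero character. As a result, short all-zero strings such as 00 are rejected even though they do not end with three zeros.

```python
import re

# Some non-zero character must sit within the last three positions.
pattern = re.compile(r".*[^0].{0,2}$")

cases = ["5646549800", "848461158000", "10", "00"]
results = {s: bool(pattern.search(s)) for s in cases}
print(results)
```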

sub command to extract data and split data frame column [duplicate]

Simple regex question. I have a string in the following format:
this is a [sample] string with [some] special words. [another one]
What is the regular expression to extract the words within the square brackets, i.e.:
sample
some
another one
Note: In my use case, brackets cannot be nested.
You can use the following regex globally:
\[(.*?)\]
Explanation:
\[ : [ is a meta char and needs to be escaped if you want to match it literally.
(.*?) : match everything in a non-greedy way and capture it.
\] : ] is a meta char and needs to be escaped if you want to match it literally.
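Applied globally, for example with Python's re.findall (used here only for illustration), the capturing group yields exactly the three items from the question:

```python
import re

text = "this is a [sample] string with [some] special words. [another one]"
# With one capturing group, re.findall returns the group's text for each match.
contents = re.findall(r"\[(.*?)\]", text)
print(contents)  # ['sample', 'some', 'another one']
```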
(?<=\[).+?(?=\])
This captures the content without the brackets:
(?<=\[) - positive lookbehind for [
.+? - non-greedy match for the content (at least one character)
(?=\]) - positive lookahead for ]
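The lookaround version has no group to read, because the whole match is the content. Here is a Python sketch for illustration:

```python
import re

text = "this is a [sample] string with [some] special words. [another one]"
# The lookarounds check for [ and ] without consuming them,
# so each whole match is just the bracket contents.
contents = [m.group() for m in re.finditer(r"(?<=\[).+?(?=\])", text)]
print(contents)  # ['sample', 'some', 'another one']
```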
EDIT: for nested brackets, the regex below may work in simple cases:
(\[(?:\[??[^\[]*?\]))
This should work out ok:
\[([^]]+)\]
Can brackets be nested?
If not: \[([^]]+)\] matches one item, including the square brackets. Backreference \1 will contain the item to be matched. If your regex flavor supports lookaround, use
(?<=\[)[^]]+(?=\])
This will only match the item inside brackets.
To match a substring between the first [ and last ], you may use
\[.*\] # Including open/close brackets
\[(.*)\] # Excluding open/close brackets (using a capturing group)
(?<=\[).*(?=\]) # Excluding open/close brackets (using lookarounds)
See a regex demo and a regex demo #2.
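The question's string shows how first/last matching differs from per-pair matching. In this Python sketch (illustration only), both greedy forms capture everything between the first [ and the last ]:

```python
import re

text = "this is a [sample] string with [some] special words. [another one]"
greedy = re.search(r"\[(.*)\]", text).group(1)          # capturing group
around = re.search(r"(?<=\[).*(?=\])", text).group()    # lookarounds
print(greedy)
print(around == greedy)  # True
```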
Use the following expressions to match strings between the closest square brackets:
Including the brackets:
\[[^][]*] - PCRE, Python re/regex, .NET, Golang, POSIX (grep, sed, bash)
\[[^\][]*] - ECMAScript (JavaScript, C++ std::regex, VBA RegExp)
\[[^\]\[]*] - Java, ICU regex
\[[^\]\[]*\] - Onigmo (Ruby, requires escaping of brackets everywhere)
Excluding the brackets:
(?<=\[)[^][]*(?=]) - PCRE, Python re/regex, .NET (C#, etc.), JGSoft Software
\[([^][]*)] - Bash, Golang - capture the contents between the square brackets with a pair of unescaped parentheses, also see below
\[([^\][]*)] - JavaScript, C++ std::regex, VBA RegExp
(?<=\[)[^\]\[]*(?=]) - Java regex, ICU (R stringr)
(?<=\[)[^\]\[]*(?=\]) - Onigmo (Ruby, requires escaping of brackets everywhere)
NOTE: * matches 0 or more characters, use + to match 1 or more to avoid empty string matches in the resulting list/array.
Whenever lookaround support is available, the above solutions rely on it to exclude the leading/trailing open/close bracket. Otherwise, they rely on capturing groups.
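On nested input, a lazy .*? is not enough to restrict the match to the innermost pair. You also have to exclude both [ and ] from the character class. Here is a Python sketch (illustration only, using the escaped [^\]\[] form):

```python
import re

nested = "a [b [c] d] e"
# Lazy .*? stops at the first ], but the match can still start at an outer [.
lazy = re.findall(r"\[(.*?)\]", nested)
# Excluding both [ and ] from the class keeps only the innermost pair.
innermost = re.findall(r"\[([^\]\[]*)\]", nested)
print(lazy)       # ['b [c']
print(innermost)  # ['c']
```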
If you need to match nested parentheses, you may see the solutions in the Regular expression to match balanced parentheses thread and replace the round brackets with the square ones to get the necessary functionality. You should use capturing groups to access the contents with open/close bracket excluded:
\[((?:[^][]++|(?R))*)] - PHP PCRE
\[((?>[^][]+|(?<o>)\[|(?<-o>]))*)] - .NET demo
\[(?:[^\]\[]++|(\g<0>))*\] - Onigmo (Ruby) demo
If you do not want to include the brackets in the match, here's the regex: (?<=\[).*?(?=\])
Let's break it down
The . matches any character except for line terminators. The ?= is a positive lookahead. A positive lookahead finds a string when a certain string comes after it. The ?<= is a positive lookbehind. A positive lookbehind finds a string when a certain string precedes it. To quote this,
Look ahead positive (?=)
Find expression A where expression B follows:
A(?=B)
Look behind positive (?<=)
Find expression A where expression B precedes:
(?<=B)A
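The two definitions can be illustrated with a pair of toy inputs in Python. The strings are made up for illustration.

```python
import re

prices = "10 USD, 20 EUR, 30 USD"
# A(?=B): digits (A) only where " USD" (B) follows; B is not part of the match.
usd = re.findall(r"\d+(?= USD)", prices)

tags = "#red, blue, #green"
# (?<=B)A: word characters (A) only where "#" (B) precedes them.
hashtags = re.findall(r"(?<=#)\w+", tags)

print(usd)       # ['10', '30']
print(hashtags)  # ['red', 'green']
```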
The Alternative
If your regex engine does not support lookaheads and lookbehinds, then you can use the regex \[(.*?)\] to capture the innards of the brackets in a group and then you can manipulate the group as necessary.
How does this regex work?
The parentheses capture the characters in a group. The .*? gets all of the characters between the brackets (except for line terminators, unless you have the s flag enabled) in a way that is not greedy.
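The line-terminator caveat matters when the bracketed text spans lines. In Python (illustration only), re.S plays the role of the s flag:

```python
import re

text = "[first\nline] and [second]"
# Without DOTALL, . cannot cross the newline, so the first pair is skipped.
plain = re.findall(r"\[(.*?)\]", text)
# re.S (DOTALL) lets . match \n as well.
dotall = re.findall(r"\[(.*?)\]", text, re.S)
print(plain)   # ['second']
print(dotall)  # ['first\nline', 'second']
```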
In case you have nested brackets, you can likely design an expression with recursion, similar to:
\[(([^\]\[]+)|(?R))*+\]
which, of course, depends on the language or regex engine you are using, since not every engine supports recursion with (?R).
Other than that,
\[([^\]\[\r\n]*)\]
or,
(?<=\[)[^\]\[\r\n]*(?=\])
are good options to explore.
Test
const regex = /\[([^\]\[\r\n]*)\]/gm;
const str = `This is a [sample] string with [some] special words. [another one]
This is a [sample string with [some special words. [another one
This is a [sample[sample]] string with [[some][some]] special words. [[another one]]`;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
Source
Regular expression to match balanced parentheses
(?<=\[).*?(?=\]) works well, as per the explanation given above. Here's a Python example:
import re
# The sample must contain [ and ] for this pattern to find anything.
s = "Pagination.go['formPagination_bottom',2,'Page',true,'1',null,'2013']"
print(re.search(r'(?<=\[).*?(?=\])', s).group())
# 'formPagination_bottom',2,'Page',true,'1',null,'2013'
@Tim Pietzcker's answer here
(?<=\[)[^]]+(?=\])
is almost the one I've been looking for. But there is one issue: some legacy browsers do not support positive lookbehind.
So I had to make my own :). I managed to write this:
/([^[]+(?=]))/g
Maybe it will help someone.
console.log("this is a [sample] string with [some] special words. [another one]".match(/([^[]+(?=]))/g));
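The same lookbehind-free idea works in Python's re. It is shown here only for illustration, since Python itself does support lookbehind.

```python
import re

text = "this is a [sample] string with [some] special words. [another one]"
# A run of non-[ characters that is immediately followed by ]; no lookbehind needed.
parts = re.findall(r"[^\[]+(?=\])", text)
print(parts)  # ['sample', 'some', 'another one']
```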
If you want to filter only lowercase letters (a-z) between square brackets:
(\[[a-z]*\])
If you want lowercase and uppercase letters (a-zA-Z):
(\[[a-zA-Z]*\])
If you want lowercase and uppercase letters plus digits (a-zA-Z0-9):
(\[[a-zA-Z0-9]*\])
If you want everything between square brackets (text, numbers and symbols; note that .* is greedy, so this spans from the first [ to the last ]):
(\[.*\])
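Here is a Python sketch (illustration only, with a made-up input) showing which bracketed items each of these classes accepts:

```python
import re

text = "x [abc] [ABC] [a1] [a b]"

lower = re.findall(r"(\[[a-z]*\])", text)
letters = re.findall(r"(\[[a-zA-Z]*\])", text)
alnum = re.findall(r"(\[[a-zA-Z0-9]*\])", text)
# Greedy .* runs from the first [ to the last ].
everything = re.findall(r"(\[.*\])", text)

print(lower, letters, alnum, everything, sep="\n")
```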
This regex extracts the content between square brackets or parentheses:
(?:(?<=\().+?(?=\))|(?<=\[).+?(?=\]))
(?: non-capturing group
(?<=\().+?(?=\)) positive lookbehind and lookahead to extract the text between parentheses
| or
(?<=\[).+?(?=\]) positive lookbehind and lookahead to extract the text between square brackets
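Here is a Python sketch of the combined pattern on a made-up string (illustration only). The trailing empty pair () yields nothing here, because .+? needs at least one character and no closing parenthesis follows it.

```python
import re

text = "call f(x, y) with [option] and ()"
pattern = r"(?:(?<=\().+?(?=\))|(?<=\[).+?(?=\]))"
found = re.findall(pattern, text)
print(found)  # ['x, y', 'option']
```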
In R, try:
library(stringr)
x <- 'foo[bar]baz'
str_replace(x, ".*?\\[(.*?)\\].*", "\\1")
[1] "bar"
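Here is the same idea of replacing the whole string with group 1, sketched with Python's re.sub for illustration:

```python
import re

x = "foo[bar]baz"
# The whole string is matched and replaced by capture group 1.
result = re.sub(r".*?\[(.*?)\].*", r"\1", x)
print(result)  # bar
```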
([[][a-z \s]+[]])
The above should work, given the following explanation:
Characters within square brackets [] define a character class, which matches one of the characters listed inside the brackets
\s matches a whitespace character
+ means at least one occurrence of the preceding element.
I needed to include newlines as well as the brackets:
\[[\s\S]+\]
If someone wants to match and select a string containing one or more dots inside square brackets like "[fu.bar]" use the following:
(?<=\[)(\w+\.\w+.*?)(?=\])
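Here is a Python check of the dotted-content pattern (illustration only; the input is made up). The pattern has one capturing group, so re.findall returns the group contents:

```python
import re

text = "[fu.bar] [plain] [a.b.c]"
dotted = re.findall(r"(?<=\[)(\w+\.\w+.*?)(?=\])", text)
print(dotted)  # ['fu.bar', 'a.b.c']
```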

Using Regex OR operator to solve 2 conditions

I am trying to combine 2 regular expressions into 1 with the OR operator: |
I have one that checks for match of a letter followed by 8 digits:
Regex.IsMatch(s, "^[A-Z]\d{8}$")
I have another that checks for simply 9 digits:
Regex.IsMatch(s, "^\d{9}$")
Now, instead of doing:
If Not Regex.IsMatch(s, "^[A-Z]\d{8}$") AndAlso
Not Regex.IsMatch(s, "^\d{9}$") Then
...
End If
I thought I could simply do:
If Not Regex.IsMatch(s, "^[A-Z]\d{8}|\d{9}$") Then
...
End If
Apparently I am not combining the two correctly and apparently I am horrible at regular expressions. Any help would be much appreciated.
And for those wondering, I did take a glance at How to combine 2 conditions and more in regex and I am still scratching my head.
The | operator has the lowest precedence of all regex operators, so in your original regex it splits the whole pattern into ^[A-Z]\d{8} and \d{9}$. You should combine the two regexes with grouping parentheses to make the precedence explicit. As in:
"^(([A-Z]\d{8})|(\d{9}))$"
How about using ^[A-Z0-9]\d{8}$ ?
I think you want to group the conditions:
Regex.IsMatch(s, "^(([A-Z]\d{8})|(\d{9}))$")
The ^ and $ represent the beginning and end of the line, so you don't want them considered in the or condition. The parens allow you to be explicit about "everything in this paren" or "anything in this other paren"
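Here is a Python sketch of the difference (illustration only). It uses re.search because .NET's Regex.IsMatch also looks for a match anywhere in the string.

```python
import re

# Alternation has the lowest precedence, so this reads as (^[A-Z]\d{8}) OR (\d{9}$).
ungrouped = re.compile(r"^[A-Z]\d{8}|\d{9}$")
# Grouping keeps both anchors outside the alternation.
grouped = re.compile(r"^(([A-Z]\d{8})|(\d{9}))$")

for s in ["A12345678", "123456789", "A12345678xyz", "xyz123456789"]:
    print(s, bool(ungrouped.search(s)), bool(grouped.search(s)))
```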
@MikeC's offering seems the best:
^[A-Z0-9]\d{8}$
...but as to why your expression is not working the way you might expect, you have to understand that the | "or" (alternation) operator has the lowest precedence of all: unless a group limits it, it separates everything to its left from everything to its right. If you use your example:
^[A-Z]\d{8}|\d{9}$
...you're basically saying "match beginning of string, capital letter, then 8 digits OR match 9 digits then end of string" -- if, instead you mean "match beginning of string, then a capital letter followed by 8 digits then the end of string OR the beginning of the string followed by 9 digits, then the end of string", then you want one of these:
^([A-Z]\d{8}|\d{9})$
^[A-Z]\d{8}$|^\d{9}$
Hope this is helpful for your understanding
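The three corrected forms can be compared in Python (illustration only, with made-up inputs). On these inputs they all agree.

```python
import re

patterns = {
    "grouped": r"^([A-Z]\d{8}|\d{9})$",
    "anchored twice": r"^[A-Z]\d{8}$|^\d{9}$",
    "single class": r"^[A-Z0-9]\d{8}$",
}
inputs = ["A12345678", "123456789", "A1234567", "AB1234567", "A12345678x"]

# For each input, record whether each pattern finds a match.
results = {s: {name: bool(re.search(p, s)) for name, p in patterns.items()} for s in inputs}
for s, verdicts in results.items():
    print(s, verdicts)
```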
I find the OR operator a bit weird sometimes as well, what I do I use groups to denote which sections I want to match, so your regex would become something like so: ^(([A-Z]\d{8})|(\d{9}))$

Need help with a regex

Hi, I'm trying to write a regular expression that will take a string and ensure it starts with an 'R' and is followed by 4 numeric digits, then anything.
e.g. RXXXX.................
Can anybody help me with this? This is for ASP.NET
You want it to be at the beginning of the line, not anywhere. Also, for efficiency, you don't want the .+ or .* at the end, because that will match unnecessary characters. So the following regex is what you really want:
^R\d{4}
This should do it...
^R\d{4}.*$
\d{4} matches 4 digits
.* is simply a way to match any character 0 or more times
the beginning ^ and end $ anchors ensure that nothing precedes or follows
As Vincent suggested, for your specific task it could even be simplified to this...
^R\d{4}
Because as you stated, it doesn't really matter what follows.
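Here is a Python sketch (illustration only) comparing ^R\d{4}.*$ with the shorter ^R\d{4}. They agree on single-line input but differ in Python when the input contains a line break. This is because . does not match \n, and $ (without multiline mode) only matches at the very end or before a final newline.

```python
import re

full = re.compile(r"^R\d{4}.*$")
prefix = re.compile(r"^R\d{4}")

cases = ["R1234", "R1234 anything", "R123x", "xR1234", "R1234\nsecond line"]
for s in cases:
    print(repr(s), bool(full.search(s)), bool(prefix.search(s)))
```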
/^R\d{4}.*/ and set the case-insensitive option unless you only want capital R's
^R\d{4}.*
The caret ^ matches the position before the first character in the string.
\d matches any numeric character (it's the same as [0-9])
{4} indicates that there must be exactly 4 numbers, and
.* matches 0 or more other characters
To use:
// Requires: using System.Text.RegularExpressions;
string input = "R0012 etc..";
Match match = Regex.Match(input, @"^R\d{4}.*", RegexOptions.IgnoreCase);
if (match.Success)
{
// Success!
}
Note the use of RegexOptions.IgnoreCase to ignore the case of the letter R (so it'll also match strings that start with r). Leave this out if you want a case-sensitive match.
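Here is the same case-insensitive check in Python, for illustration. re.IGNORECASE plays the role of RegexOptions.IgnoreCase:

```python
import re

results = {}
for s in ["R0012 etc..", "r0012 etc..", "X0012"]:
    # re.IGNORECASE makes ^R also accept a lowercase r.
    results[s] = re.match(r"^R\d{4}.*", s, re.IGNORECASE) is not None
print(results)
```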

Resources