Check if character is number - r

I want to check if a character can be safely converted to a numeric by using a regex.
However, I don't see my error. Example:
stringr::str_detect("4.", pattern = "-{0,1}[0-9]+(.[0-9]+){0,1}")
This produces a TRUE. My intention was to specifiy that whenever a . follows the first sequence of numbers, there must be at least one other number, therefore (.[0-9]+){0,1}.
What's wrong here?

Note:
(.[0-9]+){0,1} is an optional pattern because {0,1} (=?) makes the .[0-9]+ pattern sequence match one or zero times. So, yes, one or more digits ([0-9]+) must follow any char other than line break chars (matched with an unescaped .), but this pattern is optional, and thus you cannot require anything with it.
. is unescaped, so it matches any char other than line break chars. Escape it to match a literal dot
Your regex is not anchored, and can match partial substrings in a longer string. Use ^ and $ to make the pattern match the whole string.
So, consider using
stringr::str_detect("4.", pattern = "^-?[0-9]+(?:\\.[0-9]+)?$")
where
^ - start of string
-? - an optional - char
[0-9]+ - one or more digits
(?:\.[0-9]+)? - a non-capturing group matching an optional sequence of a . and then one or more digits
$ - end of string.

Related

R: How to use stringr to extract the substring as the output to mutate a column of strings that begins with a string pattern and end with a number?

I'm creating a small example to be put into mutate(). Not sure why this doesn't work.
> str_extract("rs1234-<b>C</b>","^rs*\\d$")
[1] NA
I'd be great if you can point to my misunderstanding of the language instead of merely providing a solution. I expect to get "rs1234".
The ^rs*\d$ regex matches
^ - start of string
rs* - r and zero or more occurrences of s char
\d - a digit
$ - end of string.
So, your pattern matches strings like rsssss1, r3, etc.
You need
str_extract("rs1234-<b>C</b>", "^rs\\d+")
where ^rs\d+ matches rs at the start of string and then one or more digits. See this regex demo.
But if I just want the substring in between "rs" and the last number. What should I do?
You would use rs.*\d:
str_extract("rs1234-<b>C</b>", "rs.*\\d")
where rs.*\d matches rs, then any zero or more chars other than line break chars as many as possible and then a digit.
NOTE: If you need to match line endings, too, you need to prepend the last pattern with (?s) inline DOTALL modifier.
See this regex demo.

REGEX pattern match in R for Course number

I need to identify matching course number that have xx.3xxxxxx.
These are some examples of the course numbers.
26.3730004
27.0210000
26.3730009
26.7114001
23.9610071
26.0A34430
23.3670005
26.0B05430
I tried many patterns one example I used is the pattern below. It did not get any match.
"[^0-9]{2}\Q.\E3[^0-9]+$"
I tried using grep and grepl. I actually need the code to return indexes.
This code shows my attempt to tag the rows that have matches.
Teacher$virtual[
which(
grepl("[^0-9]{2}\\Q.\\E3[^0-9]+$",Teacher$CourseNumber))]
<- "1"
I need to remove any row from my dataframe that have the course number with that pattern. XX.3XXXXXX
But, my code did not find any match. Can you please help me?
You should use
grepl("^[0-9]{2}\\.3", Teacher$CourseNumber)
See the regex graph:
Details:
^ - start of a string
[0-9]{2} - two digits
\\. - a dot (note that a regex escape is a literal backslash, but inside a string literal, "...", a single backslash is used to form string escape sequences, hence the backslash must be double to obtain a literal backslash char necessary for a regex escape)
3 - a 3 char.
NOTE: If you want to use in-pattern quoting with \Q and \E (in between which all chars are treated literally) you need to use PCRE regex, add perl=TRUE and use
grepl("^[0-9]{2}\\Q.\\E3", Teacher$CourseNumber, perl=TRUE)
Now, the dot is treated as a literal dot, not a . metacharacter that matches any char but a line break char (in a PCRE regex, . does not match line break chars by default).
Here, this simple expression would likely cover that:
^[0-9]{2}\.[3].+$
which has a [3] boundary right after the .. It would probably work without start and end anchors:
[0-9]{2}\.[3].+
Demo
We can add or reduce the boundaries, if it'd be necessary.

How to remove the first character if it's a comma using QRegExp?

I have the following QRegExpValidator
QRegExpValidator doubleValidator = new QRegExpValidator(QRegExp("[-+]?[0-9]*[\\.,]?[0-9]+([eE][-+]?[0-9]+)?"));
It's supposed to be a Double numbers validator that accepts numbers, only one "e" sign, one comma OR dot and one + or - sign at the beggining of the string or after the "e" sign. It works for every case, except that it allows the string to start with a comma or dot. I tried to use [^\\.,] and variations and they did in fact work, but in this case, it would also allow to put two +- signs.
How can I make this to work?
The [-+]?[0-9]*[.,]?[0-9]+([eE][-+]?[0-9]+)? pattern allows the , or . at the start because [-+]? and [0-9]* can match empty strings due to the ? (one or zero occurrences) and * (zero or more occurrences) quantifiers, and then [.,] matches a single occurrence of . or ,. Besides, if the method you are using does not anchor the pattern by default, you also need ^ and $ anchors around the pattern.
I suggest fixing that with
"^[-+]?[0-9]+([.,][0-9]+)?([eE][-+]?[0-9]+)?$"
^ ^^^^^^^^^^^^^^ ^
Note you do not need to escape the dot inside a character class, [.] always matches a dot char only.
The [0-9]+([.,][0-9]+)? matches 1+ digits and then an optional sequence of a . or , followed with 1+ digits.

Regex to maintain matched parts

I would like to achieve this result : "raster(B04) + raster(B02) - raster(A10mB03)"
Therefore, I created this regex: B[0-1][0-9]|A[1,2,6]0m/B[0-1][0-9]"
I am now trying to replace all matches of the string "B04 + B02 - A10mB03" with gsub("B[0-1][0-9]]|[A[1,2,6]0mB[0-1][0-9]", "raster()", string)
How could I include the original values B01, B02, A10mB03?
PS: I also tried gsub("B[0-1][0-9]]|[A[1,2,6]0mB[0-1][0-9]", "raster(\\1)", string) but it did not work.
Basically, you need to match some text and re-use it inside a replacement pattern. In base R regex methods, there is no way to do that without a capturing group, i.e. a pair of unescaped parentheses, enclosing the whole regex pattern in this case, and use a \\1 replacement backreference in the replacement pattern.
However, your regex contains some issues: [A[1,2,6] gets parsed as a single character class that matches A, [, 1, ,, 2 or 6 because you placed a [ before A. Also, note that , inside character classes matches a literal comma, and it is not what you expected. Another, similar issue, is with [0-9]] - it matches any ASCII digit with [0-9] and then a ] (the ] char does not have to be escaped in a regex pattern).
So, a potential fix for you expression can look like
gsub("(B[0-1][0-9]|A[126]0mB[0-1][0-9])", "raster(\\1)", string)
Or even just matching 1 or more word chars (considering the sample string you supplied)
gsub("(\\w+)", "raster(\\1)", string)
might do.
See the R demo online.

Regular expression for one Alphabet and many numbers

I need a regular expression that check my string contains one alphabet+many digits.Eg, A546456 or B456 or N455,asp.net
I have tried
^[a-z|A-Z|]+[a-z|A-Z|0-9]+[0-9]*
and
[a-zA-Z0-9]+
A regular expression to validate if a string starts with a single letter A-Z in any case and has 1 or more digits and nothing else is: ^[A-Za-z]\d+$
Explanation:
^ ... beginning of string (or beginning of line in other context).
[A-Za-z] ... a character class definition for a single character (no multiplier appended) matching a letter in range A-Z in any case. \w cannot be used as it matches also an underscore and letters from other languages as for example German umlauts.
\d ... a digit which is equal the expression [0-9].
+ ... previous expression (any digit) 1 or more times.
$ ... end of string (or end of line in other context).
Be careful with a regular expression containing a backslash as used here for \d put into a string as the backslash character is in many programming/scripting languages also the escape character inside strings and must be therefore often escaped with one more backslash, i.e. "^[A-Za-z]\\d+$"

Resources