Regex for "Characters Numbers" - asp.net

I need a Regex that matches these Strings:
Test 1
Test 123
Test 1.1 (not required but would be neat)
Test
Test a
But not the following:
Test 1a
I don't know how the pattern should look so that it allows text or whitespace at the end, but not when a number comes directly before it.
I tried this one
^.*([0-9])$ (matches only Test 1, but not for example Test or Test a)
and this one
^.*[0-9].$ (matches only Test 1a, but not for example Test or Test 1)
but they don't match what I need.

This works for all the cases you provided:
^\w+(\s(\d+(\.\d+)?|[a-z]))?$
Regex Breakdown
^ #Start of string
\w+ #Match one or more word characters (letters, digits, underscore)
(\s #Match a whitespace
(
\d+ #Match one or more digits
(\.\d+)? #Decimal point followed by digits (optional)
| #Alternation(OR)
[a-z] #Match a single lowercase letter
)
)? #Make it optional
$ #End of string
If you also want to include capital letters, then you can use
^\w+(\s(\d+(\.\d+)?|[A-Za-z]))?$
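As a quick sanity check, here is a minimal sketch that runs the pattern above against the sample strings from the question. It uses Python's re module, which is an assumption for illustration only (the question is about ASP.NET); the names should_match, should_not_match and matches are hypothetical.

```python
import re

# Pattern from the answer above (variant that also allows capital letters).
PATTERN = re.compile(r"^\w+(\s(\d+(\.\d+)?|[A-Za-z]))?$")

# Sample strings taken from the question.
should_match = ["Test 1", "Test 123", "Test 1.1", "Test", "Test a"]
should_not_match = ["Test 1a"]


def matches(s: str) -> bool:
    """Return True if the whole string fits the pattern."""
    return PATTERN.search(s) is not None


for s in should_match + should_not_match:
    print(f"{s!r}: {matches(s)}")
```

Every string in should_match should print True and "Test 1a" should print False, because after the digits only the end of the string is allowed.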

Try with
^\w+\s+((\d+\.\d+)|(\d+)|([^\d^\s]\w+))?\s*$

Another pattern for you to try:
^(Test(?:$|\s(?:\d$|[a-z]$|\d{3}|\d\.\d$)))

As per your strings in your question (and your comments):
^\w+(\s[a-z]|\s\d+(\.\d+)?)?$

filter paths that have only one "/" in R using regular expression

I have a vector of different paths such as
levs<-c( "20200507-30g_25d" , "20200507-30g_25d/ggg" , "20200507-30g_25d/grn", "20200507-30g_25d/ylw", "ggg" , "grn", "tre_livelli", "tre_livelli/20200507-30g_25d", "tre_livelli/20200507-30g_25d/ggg", "tre_livelli/20200507-30g_25d/grn", "tre_livelli/20200507-30g_25d/ylw" , "ylw" )
which is actually the output of a list.dirs with recursive set to TRUE.
I want to identify only the paths which have just one subfolder (that is "20200507-30g_25d/ggg" , "20200507-30g_25d/grn", "20200507-30g_25d/ylw").
I thought I would filter the vector to find only those paths that have exactly one "/" and then compare these with the ones that have more than one "/" to get rid of the partial paths.
I tried a regular expression such as:
grep(levs,pattern='/{1}', value=TRUE)
but I get this:
"20200507-30g_25d/ggg" "20200507-30g_25d/grn" "20200507-30g_25d/ylw" "tre_livelli/20200507-30g_25d" "tre_livelli/20200507-30g_25d/ggg" "tre_livelli/20200507-30g_25d/grn" "tre_livelli/20200507-30g_25d/ylw"
Any idea on how to proceed?
/{1} is a regex that is equal to / and just matches a / anywhere in a string, and there can be more than one / inside it. Please have a look at the regex tag page:
Using {1} as a single-repetition quantifier is harmless but never useful. It is basically an indication of inexperience and/or confusion.
h{1}t{1}t{1}p{1} matches the same string as the simpler expression http (or ht{2}p for that matter) but as you can see, the redundant {1} repetitions only make it harder to read.
You can use
grep(levs, pattern="^[^/]+/[^/]+$", value=TRUE)
# => [1] "20200507-30g_25d/ggg" "20200507-30g_25d/grn" "20200507-30g_25d/ylw" "tre_livelli/20200507-30g_25d"
Regex details:
^ - matches the start of string
[^/]+ - one or more chars other than /
/ - a / char
[^/]+ - one or more chars other than /
$ - end of string.
NOTE: if the parts before or after the only / in the string can be empty, replace + with *: ^[^/]*/[^/]*$.
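For comparison, the same "exactly one /" filter can be sketched in Python. This is only an illustration of the pattern explained above, not part of the R answer; levs mirrors the vector from the question, and ONE_SLASH and one_level are hypothetical names.

```python
import re

# Same paths as the R vector in the question.
levs = [
    "20200507-30g_25d", "20200507-30g_25d/ggg", "20200507-30g_25d/grn",
    "20200507-30g_25d/ylw", "ggg", "grn", "tre_livelli",
    "tre_livelli/20200507-30g_25d", "tre_livelli/20200507-30g_25d/ggg",
    "tre_livelli/20200507-30g_25d/grn", "tre_livelli/20200507-30g_25d/ylw",
    "ylw",
]

# fullmatch anchors the pattern at both ends, so ^ and $ are not needed.
ONE_SLASH = re.compile(r"[^/]+/[^/]+")

one_level = [p for p in levs if ONE_SLASH.fullmatch(p)]
print(one_level)
```

This should select the same four paths as the grep call above.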
An option with str_count to count the number of instances of /
library(stringr)
levs[str_count(levs, "/") == 1 ]
Output:
[1] "20200507-30g_25d/ggg" "20200507-30g_25d/grn"
[3] "20200507-30g_25d/ylw" "tre_livelli/20200507-30g_25d"
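Both approaches still return "tre_livelli/20200507-30g_25d", which the question wants to exclude because it has subfolders of its own. A minimal Python sketch of the second step described in the question (dropping one-slash paths that are the parent of a deeper path) might look like this. The prefix check is my reading of the asker's plan, and the names one_slash and leaf_only are hypothetical.

```python
# Same paths as the R vector in the question.
levs = [
    "20200507-30g_25d", "20200507-30g_25d/ggg", "20200507-30g_25d/grn",
    "20200507-30g_25d/ylw", "ggg", "grn", "tre_livelli",
    "tre_livelli/20200507-30g_25d", "tre_livelli/20200507-30g_25d/ggg",
    "tre_livelli/20200507-30g_25d/grn", "tre_livelli/20200507-30g_25d/ylw",
    "ylw",
]

# Step 1: keep paths that contain exactly one "/".
one_slash = [p for p in levs if p.count("/") == 1]

# Step 2 (assumption): drop any path that is the parent of a deeper path,
# i.e. some other entry starts with "<path>/".
leaf_only = [
    p for p in one_slash
    if not any(q.startswith(p + "/") for q in levs)
]
print(leaf_only)
```

With the sample vector, this should leave only the three paths under 20200507-30g_25d that the question asks for.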

regular expression in Teradata query

I have below input strings:
string1:
xyx;;;;str1=P1:P2|str2=1/3|str3=s1:s2
string2:
mzn;;;;str1 = P3:P4 | str2 = 2/5
result expected:
for string1:
str1_val=P1:P2
str2_val=1/3
for string2:
str1_val=P3:P4
str2_val=2/5
I tried with
str1_val = REGEXP_SUBSTR('xyx;;;;str1=P1:P2|str2=1/3|str3=s1:s2', '(?<=str1=)(.*?)(?=\|)') - working fine
str2_val = REGEXP_SUBSTR('xyx;;;;str1=P1:P2|str2=1/3|str3=s1:s2', '(?<=str2=)(.*?)(?=\|)') - working fine
This works for string1 but not for string2.
Please help me find one approach that works in both cases.
You need to allow optional spaces, but a lookbehind only accepts fixed-length patterns. \K is a similar alternative: it resets the start of the match, i.e. everything matched before it is dropped from the result:
REGEXP_SUBSTR(s,'str1\s*=\s*\K([^|]+)')
\s* = optional whitespace
\K = reset start of match
([^|]+) = one or more chars other than |
See RegEx101
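To illustrate the idea outside Teradata, here is a minimal Python sketch. Python's re module does not support \K, so this swaps in a capturing group, which is a different technique with the same effect of skipping the key and the optional spaces. The helper name extract is hypothetical, and the strip() call is my addition: [^|]+ also captures the space before " |" in string2.

```python
import re
from typing import Optional


def extract(key: str, s: str) -> Optional[str]:
    """Return the value after 'key =' up to the next '|', or None."""
    # \s* on both sides of '=' allows optional spaces;
    # the capturing group stands in for \K, which Python's re lacks.
    m = re.search(re.escape(key) + r"\s*=\s*([^|]+)", s)
    return m.group(1).strip() if m else None


string1 = "xyx;;;;str1=P1:P2|str2=1/3|str3=s1:s2"
string2 = "mzn;;;;str1 = P3:P4 | str2 = 2/5"

for s in (string1, string2):
    print(extract("str1", s), extract("str2", s))
```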

R regex match whole word taking punctuation into account

I'm in R. I want to match whole words in text, taking punctuation into account.
Example:
to_match = c('eye','nose')
text1 = 'blah blahblah eye-to-eye blah'
text2 = 'blah blahblah eye blah'
I would like eye to be matched in text2 but not in text1.
That is, the command:
to_match[sapply(paste0('\\<',to_match,'\\>'),grepl,text1)]
should return character(0). But right now, it returns eye.
I also tried with '\\b' instead of '\\<', with no success.
Use
to_match[sapply(paste0('(?:\\s|^)',to_match,'(?:\\s|$)'),grepl,text1)]
The point is that word boundaries match between a word char and a non-word char, which is why you had a match in eye-to-eye. You want to match only between whitespace or the start/end of the string.
In a TRE regex, this is better done with groups as this regex library does not support lookarounds and you just need to test a string for a single pattern match to return true or false.
The (?:\s|^) noncapturing group matches any whitespace or start of string and (?:\s|$) matches whitespace or end of string.
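The same whitespace-delimited check can be sketched in Python for comparison. This is an illustration only, not the R answer itself; whole_words is a hypothetical helper, and re.escape is added as a precaution in case a search term contains regex metacharacters.

```python
import re

to_match = ["eye", "nose"]
text1 = "blah blahblah eye-to-eye blah"
text2 = "blah blahblah eye blah"


def whole_words(words, text):
    """Return the words that occur delimited by whitespace or string edges."""
    return [
        w for w in words
        # (?:\s|^) and (?:\s|$) mirror the groups used in the R pattern.
        if re.search(r"(?:\s|^)" + re.escape(w) + r"(?:\s|$)", text)
    ]


print(whole_words(to_match, text1))
print(whole_words(to_match, text2))
```

For text1 the result should be empty, since every eye is adjacent to a hyphen; for text2 it should contain only eye.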

REGEX: Remove middle of string after certain number of "/"

How do I remove the middle of a string using regex? I have the following URL:
https://www.sec.gov/Archives/edgar/data/1347185/000134718517000016/0001347185-17-000016-index.htm/exh1025730032017.xml
but I want it to look like this:
https://www.sec.gov/Archives/edgar/data/1347185/000134718517000016/exh1025730032017.xml
I can get rid of everything after "data/../../"
That last long string of numbers isn't needed.
I tried this
sub(sprintf("^((?:[^/]*;){8}).*"),"", URLxml)
But it doesn't do anything! Help please!
To remove the last but one subpart of the path, you may use
x <- "https://www.sec.gov/Archives/edgar/data/1347185/000134718517000016/0001347185-17-000016-index.htm/exh1025730032017.xml"
sub("^(.*/).*/(.*)", "\\1\\2", x)
## [1] "https://www.sec.gov/Archives/edgar/data/1347185/000134718517000016/exh1025730032017.xml"
See the online R demo and here is a regex demo.
Details:
^ - start of a string
(.*/) - Group 1 (referred to with \1 from the replacement string) any 0+ chars up to the last but one /
.*/ - any 0+ chars up to the last /
(.*) - Group 2 (referred to with \2 backreference from the replacement string) any 0+ chars up to the end.
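The same two-group replacement can be sketched in Python to show how the greedy groups split the URL. This is an illustration under the assumption that Python's re.sub is an acceptable stand-in for R's sub; the variable names are hypothetical.

```python
import re

url = ("https://www.sec.gov/Archives/edgar/data/1347185/"
       "000134718517000016/0001347185-17-000016-index.htm/"
       "exh1025730032017.xml")

# Group 1 greedily takes everything up to the last-but-one "/",
# the unparenthesised .*/ consumes the segment to drop,
# and group 2 keeps whatever follows the last "/".
cleaned = re.sub(r"^(.*/).*/(.*)", r"\1\2", url)
print(cleaned)
```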
a<-'https://www.sec.gov/Archives/edgar/data/1347185/000134718517000016/0001347185-17-000016-index.htm/exh1025730032017.xml'
gsub('data/(.+?)/(.+?)/(.+?)/','data/\\1/\\2/',a)
So, in the URL, the third segment after data/ is removed:
data/.../.../..(this is removed)../ ....

Split a string by a plus sign (+) character

I have a string in a data frame as: "(1)+(2)"
I want to split with delimiter "+" such that I get one element as (1) and the other as (2), hence preserving the parentheses. I used strsplit but it does not preserve the parentheses.
Use
strsplit("(1)+(2)", "\\+")
or
strsplit("(1)+(2)", "+", fixed = TRUE)
Using strsplit("(1)+(2)", "+") doesn't work because, unless specified otherwise, the split argument is treated as a regular expression, and the + character is special in regex. Other characters that also need extra care are:
?
*
.
^
$
\
|
{ }
[ ]
( )
The following worked for me (in Python):
import re
re.split('\\+', 'ABC+CDE')
Output:
['ABC', 'CDE']
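Applied to the string from the question, a minimal Python sketch of the two options (a literal split versus an escaped regex split) could look like this. It mirrors the fixed = TRUE and "\\+" variants from the R answer; the variable names are hypothetical.

```python
import re

s = "(1)+(2)"

# Literal split: str.split treats "+" as plain text,
# comparable to fixed = TRUE in R's strsplit.
parts_literal = s.split("+")

# Regex split: re.escape adds the backslash so "+" is not
# read as a quantifier.
parts_regex = re.split(re.escape("+"), s)

print(parts_literal)
print(parts_regex)
```

Both calls should keep the parentheses intact in each element.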
