regular expression in Teradata query - teradata

I have the input strings below:
string1:
xyx;;;;str1=P1:P2|str2=1/3|str3=s1:s2
string2:
mzn;;;;str1 = P3:P4 | str2 = 2/5
result expected:
for string1:
str1_val=P1:P2
str2_val=1/3
for string2:
str1_val=P3:P4
str2_val=2/5
I tried with
str1_val = REGEXP_SUBSTR('xyx;;;;str1=P1:P2|str2=1/3|str3=s1:s2', '(?<=str1=)(.*?)(?=\|)') - working fine
str2_val = REGEXP_SUBSTR('xyx;;;;str1=P1:P2|str2=1/3|str3=s1:s2', '(?<=str2=)(.*?)(?=\|)') - working fine
This works for string1 but not for string2.
Please suggest one approach that works for both cases.

You need to allow optional whitespace around the =, but a lookbehind only accepts fixed-length patterns, so \s* cannot go inside it. \K works similarly: it resets the start of the match, i.e. everything matched before it is dropped from the result:
REGEXP_SUBSTR(s,'str1\s*=\s*\K([^|]+)')
\s* = optional whitespace
\K = reset start of match
([^|]+) = one or more characters other than |
See RegEx101
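For readers who want to try the idea outside Teradata, here is a minimal Python sketch of the same pattern. Python's re module does not support \K, so a capture group stands in for it; the helper name extract_value is hypothetical. Note that [^|]+ also picks up the space before " |" in string2, so the sketch strips the result (in SQL the analogous step would presumably be wrapping the call in TRIM).

```python
import re

# A capture group stands in for \K, which Python's re module does not support.
PATTERN_TEMPLATE = r"{key}\s*=\s*([^|]+)"

def extract_value(text, key):
    """Return the value following `key =` up to the next '|', or None."""
    match = re.search(PATTERN_TEMPLATE.format(key=re.escape(key)), text)
    # strip() drops the trailing space left before " |" in spaced input
    return match.group(1).strip() if match else None

string1 = "xyx;;;;str1=P1:P2|str2=1/3|str3=s1:s2"
string2 = "mzn;;;;str1 = P3:P4 | str2 = 2/5"

for s in (string1, string2):
    print(extract_value(s, "str1"), extract_value(s, "str2"))
```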

Related

filter paths that have only one "/" in R using regular expression

I have a vector of different paths such as
levs<-c( "20200507-30g_25d" , "20200507-30g_25d/ggg" , "20200507-30g_25d/grn", "20200507-30g_25d/ylw", "ggg" , "grn", "tre_livelli", "tre_livelli/20200507-30g_25d", "tre_livelli/20200507-30g_25d/ggg", "tre_livelli/20200507-30g_25d/grn", "tre_livelli/20200507-30g_25d/ylw" , "ylw" )
which is actually the output of a list.dirs with recursive set to TRUE.
I want to identify only the paths which have just one subfolder (that is "20200507-30g_25d/ggg" , "20200507-30g_25d/grn", "20200507-30g_25d/ylw").
I thought I could filter the vector to find only those paths that have exactly one "/" and then compare these with the ones that have more than one "/" to get rid of the partial paths.
I tried with regular expression such as:
grep(levs,pattern='/{1}', value=T)
but I get this:
"20200507-30g_25d/ggg" "20200507-30g_25d/grn" "20200507-30g_25d/ylw" "tre_livelli/20200507-30g_25d" "tre_livelli/20200507-30g_25d/ggg" "tre_livelli/20200507-30g_25d/grn" "tre_livelli/20200507-30g_25d/ylw"
Any idea on how to proceed?
/{1} is equivalent to / and matches a single / anywhere in the string, so strings that contain more than one / match as well. Please have a look at the regex tag page:
Using {1} as a single-repetition quantifier is harmless but never useful. It is basically an indication of inexperience and/or confusion.
h{1}t{1}t{1}p{1} matches the same string as the simpler expression http (or ht{2}p for that matter) but as you can see, the redundant {1} repetitions only make it harder to read.
You can use
grep(levs, pattern="^[^/]+/[^/]+$", value=TRUE)
# => [1] "20200507-30g_25d/ggg" "20200507-30g_25d/grn" "20200507-30g_25d/ylw" "tre_livelli/20200507-30g_25d"
See the regex demo:
^ - matches the start of string
[^/]+ - one or more chars other than /
/ - a / char
[^/]+ - one or more chars other than /
$ - end of string.
NOTE: if the parts before or after the only / in the string can be empty, replace + with *: ^[^/]*/[^/]*$.
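As a quick cross-check of the pattern outside R, the following Python sketch applies the same expression to the vector from the question. It uses re.fullmatch instead of the ^ and $ anchors; the variable names are illustrative only.

```python
import re

levs = [
    "20200507-30g_25d", "20200507-30g_25d/ggg", "20200507-30g_25d/grn",
    "20200507-30g_25d/ylw", "ggg", "grn", "tre_livelli",
    "tre_livelli/20200507-30g_25d", "tre_livelli/20200507-30g_25d/ggg",
    "tre_livelli/20200507-30g_25d/grn", "tre_livelli/20200507-30g_25d/ylw",
    "ylw",
]

# fullmatch anchors the pattern at both ends, like ^...$ in the R call
single_slash = re.compile(r"[^/]+/[^/]+")
one_level = [p for p in levs if single_slash.fullmatch(p)]
print(one_level)
```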
An option with str_count to count the number of instances of /
library(stringr)
levs[str_count(levs, "/") == 1 ]
-output
[1] "20200507-30g_25d/ggg" "20200507-30g_25d/grn"
[3] "20200507-30g_25d/ylw" "tre_livelli/20200507-30g_25d"
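Both answers still return "tre_livelli/20200507-30g_25d", which the question treats as a partial path because other entries continue below it. A possible follow-up step, sketched here in Python under the assumption that a path is "partial" whenever another entry starts with it plus a "/", is to drop those entries after counting slashes:

```python
levs = [
    "20200507-30g_25d", "20200507-30g_25d/ggg", "20200507-30g_25d/grn",
    "20200507-30g_25d/ylw", "ggg", "grn", "tre_livelli",
    "tre_livelli/20200507-30g_25d", "tre_livelli/20200507-30g_25d/ggg",
    "tre_livelli/20200507-30g_25d/grn", "tre_livelli/20200507-30g_25d/ylw",
    "ylw",
]

# Step 1: keep paths with exactly one "/", like str_count(levs, "/") == 1
one_slash = [p for p in levs if p.count("/") == 1]

# Step 2: drop any path that is a parent of another entry (assumed meaning
# of "partial path" in the question)
leaves = [p for p in one_slash
          if not any(q.startswith(p + "/") for q in levs)]
print(leaves)
```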

Remove all whitespace from string AX 2012

In the PurchPackingSlipJournalCreate class, the initHeader method has this line:
vendPackingSlipJour.PackingSlipId = purchParmTable.Num;
When I copy and paste ' FDG 2020 ' (all the blanks are tab characters) into the Num field and click OK, I want the value to be written as 'FDG2020' to the PackingSlipId field of the vendPackingSlipJour table.
I tried vendPackingSlipJour.PackingSlipId = strRem(purchParmTable.Num, " ");
but it doesn't work for tab characters.
How can I remove all whitespace characters from a string?
Version 1
Try the strAlpha() function.
From the documentation:
Copies only the alphanumeric characters from a string.
Version 2
Because version 1 also deletes allowed hyphens (-), you could use strKeep().
From the documentation:
Builds a string by using only the characters from the first input string that the second input string specifies should be kept.
This will require you to specify all desired characters, a rather long list...
Version 3
Use regular expressions to replace any unwanted characters (defined as "not a wanted character"). This is similar to version 2, but the list of allowed characters can be expressed much more concisely.
The example below allows alphanumeric characters (a-z, A-Z, 0-9), underscores (_) and hyphens (-). The final value for newText is ABC-12_3.
str badCharacters = @"[^a-zA-Z0-9_-]"; // so NOT an allowed character
str newText = System.Text.RegularExpressions.Regex::Replace(' ABC-12_3 ', badCharacters, '');
Version 4
If you know the only unwanted characters are tabs ('\t'), then you can go hunting for those specifically as well.
vendPackingSlipJour.PackingSlipId = strRem(purchParmTable.Num, '\t');
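To compare the approaches side by side, here is a small Python sketch (not X++; the function names are hypothetical). The first function removes every whitespace character, which is what the question literally asks for; the second mirrors the allow-list idea from version 3.

```python
import re

def remove_whitespace(text):
    """Remove spaces, tabs, newlines and any other whitespace."""
    return re.sub(r"\s+", "", text)

def keep_allowed(text):
    """Keep only letters, digits, underscores and hyphens (version 3)."""
    return re.sub(r"[^a-zA-Z0-9_-]", "", text)

raw = "\tFDG\t2020\t"  # tabs instead of spaces, as in the question
print(remove_whitespace(raw))
print(keep_allowed(" ABC-12_3 "))
```

The allow-list variant is stricter: it would also drop punctuation such as "/" or ".", which may or may not be wanted in a packing slip id.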

REGEX: Remove middle of string after certain number of "/"

How do I remove the middle of a string using regex. I have the following url:
https://www.sec.gov/Archives/edgar/data/1347185/000134718517000016/0001347185-17-000016-index.htm/exh1025730032017.xml
but I want it to look like this:
https://www.sec.gov/Archives/edgar/data/1347185/000134718517000016/exh1025730032017.xml
I can get rid of everything after "data/../../"
That last long string of numbers isn't needed.
I tried this:
sub(sprintf("^((?:[^/]*;){8}).*"),"", URLxml)
But it doesn't do anything! Help please!
To remove the last but one subpart of the path, you may use
x <- "https://www.sec.gov/Archives/edgar/data/1347185/000134718517000016/0001347185-17-000016-index.htm/exh1025730032017.xml"
sub("^(.*/).*/(.*)", "\\1\\2", x)
## [1] "https://www.sec.gov/Archives/edgar/data/1347185/000134718517000016/exh1025730032017.xml"
See the online R demo and here is a regex demo.
Details:
^ - start of a string
(.*/) - Group 1 (referred to with \1 from the replacement string) any 0+ chars up to the last but one /
.*/ - any 0+ chars up to the last /
(.*) - Group 2 (referred to with \2 backreference from the replacement string) any 0+ chars up to the end.
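The same substitution can be sketched in Python to see how the greedy groups split the URL. The function name is hypothetical; the pattern is the one from the answer, with Python's \1 and \2 replacement syntax.

```python
import re

def remove_penultimate_segment(url):
    """Drop the path segment just before the last one."""
    # Group 1 greedily takes everything up to the last-but-one "/",
    # the unnamed .*/ consumes the segment to drop, group 2 is the rest.
    return re.sub(r"^(.*/).*/(.*)", r"\1\2", url)

x = ("https://www.sec.gov/Archives/edgar/data/1347185/000134718517000016/"
     "0001347185-17-000016-index.htm/exh1025730032017.xml")
print(remove_penultimate_segment(x))
```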
a<-'https://www.sec.gov/Archives/edgar/data/1347185/000134718517000016/0001347185-17-000016-index.htm/exh1025730032017.xml'
gsub('data/(.+?)/(.+?)/(.+?)/','data/\\1/\\2/',a)
In other words, the pattern matches data/ followed by three path segments and keeps only the first two:
data/<kept>/<kept>/<removed>/...

Regex for "Characters Numbers"

I need a Regex that matches these Strings:
Test 1
Test 123
Test 1.1 (not required but would be neat)
Test
Test a
But not the following:
Test 1a
I don't know what the pattern should look like so that it allows text or whitespace at the end, but not when there is a number directly before it.
I tried this one
^.*([0-9])$ (matches only Test 1, but not for example Test or Test a)
and this one
^.*[0-9].$ (matches only Test 1a, but not for example Test or Test 1)
but they don't match what I need.
This works for all the cases you provided:
^\w+(\s(\d+(\.\d+)?|[a-z]))?$
Regex Demo
Regex Breakdown
^ #Start of string
\w+ #Match one or more word characters (letters, digits, underscore)
(\s #Match a single whitespace character
(
\d+ #Match one or more digits
(\.\d+)? #Decimal point followed by digits (optional)
| #Alternation (OR)
[a-z] #Match a single lowercase letter
)
)? #Make the whole whitespace-plus-suffix group optional
$ #End of string
If you also want to include capital letters, then you can use
^\w+(\s(\d+(\.\d+)?|[A-Za-z]))?$
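A small Python check of this pattern against the strings from the question may help when adapting it. This is only a sketch: it uses re.fullmatch in place of the ^ and $ anchors, and the list names are illustrative.

```python
import re

# Same pattern as above; fullmatch replaces the ^ and $ anchors
pattern = re.compile(r"\w+(\s(\d+(\.\d+)?|[A-Za-z]))?")

should_match = ["Test 1", "Test 123", "Test 1.1", "Test", "Test a"]
should_not_match = ["Test 1a"]

for s in should_match + should_not_match:
    print(repr(s), bool(pattern.fullmatch(s)))
```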
Try with
^\w+\s+((\d+\.\d+)|(\d+)|([^\d^\s]\w+))?\s*$
Another pattern for you to try:
^(Test(?:$|\s(?:\d$|[a-z]$|\d{3}|\d\.\d$)))
LIVE DEMO.
As per your strings in your question (and your comments):
^\w+(\s[a-z]|\s\d+(\.\d+)?)?$

Is it possible to turn off case insensitivity using pattern only?

The Regex is created with the IgnoreCase option set. Is it possible to turn off case insensitivity using the pattern only (like a negation of (?i))?
In the example below, find a pattern for which the result would be "aBaaaBBaaB".
string pattern = "???";
string input = "aAaaaAAaaA";
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
var result = regex.Replace(input, "B");
You can turn off options inline by using - before the option. E.g. the negation of (?i) is (?-i):
a minus sign (-) before an option or set of options turns those options off. For example, (?i-mn) turns case-insensitive matching (i) on, turns multiline mode (m) off, and turns unnamed group captures (n) off.
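For the .NET example in the question, that means a pattern such as (?-i)A. The same idea can be tried in Python, with one difference worth flagging: as far as I know, Python's re module only accepts the scoped group form (?-i:...) and not a bare (?-i). A minimal sketch:

```python
import re

text = "aAaaaAAaaA"

# re.IGNORECASE is on for the whole pattern; (?-i:...) switches it off
# for the group contents only, so just the uppercase "A" matches.
result = re.sub(r"(?-i:A)", "B", text, flags=re.IGNORECASE)
print(result)  # aBaaaBBaaB
```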
