REGEXP_SUBSTR to return first and last segment - oracle11g

I have a dataset which may store an account number in several different variations. It may contain hyphens or spaces as segment separators, or it may be fully concatenated. My desired output is the first three and last five alphanumeric characters. I'm having problems joining the two segments into "FIRST_THREE_AND_LAST_FIVE":
with testdata as (select '1-23-456-78-90-ABCDE' txt from dual union all
select '1 23 456 78 90 ABCDE' txt from dual union all
select '1234567890ABCDE' txt from dual union all
select '123ABCDE' txt from dual union all
select '12DE' txt from dual)
select TXT
,regexp_replace(txt, '[^[[:alnum:]]]*',null) NO_HYPHENS_OR_SPACES
,regexp_substr(regexp_replace(txt, '[^[[:alnum:]]]*',null), '([[:alnum:]]){3}',1,1) FIRST_THREE
,regexp_substr(txt, '([[:alnum:]]){5}$',1,1) LAST_FIVE
,regexp_substr(regexp_replace(txt, '[^[[:alnum:]]]*',null), '([[:alnum:]]){3}',1,1) FIRST_THREE_AND_LAST_FIVE
from testdata;
My desired output would be:
FIRST_THREE_AND_LAST_FIVE
-------------------------
123ABCDE
123ABCDE
123ABCDE
123ABCDE
(null)

Here's my try. Note that when regexp_replace() does not find a match, the original string is returned; that's why you can't get a NULL directly. My thought was to check whether the result string matched the original string, but of course that would not work for row 4 ('123ABCDE'), where the result is correct and happens to match the original string. Others have mentioned methods for checking the length, etc. with a CASE, but I would be stricter and also check that the first 3 characters are numeric and the last 5 are alphabetic, since just checking that 8 characters are returned doesn't guarantee they are the right 8 characters! I'll leave that up to the reader.
Anyway, this looks for a digit followed by an optional dash or space (per the specs) and remembers the digit (3 times), then also remembers the last 5 alphabetic characters. It then returns the remembered groups in that order.
I highly recommend you make this a function where you pass your string in and get a cleaned string in return, as it will be much easier to maintain, encapsulates this code for re-usability and allows for better error checking using PL/SQL code.
SQL> with testdata(txt) as (
  select '1-23-456-78-90-ABCDE' from dual
  union
  select '1 23 456 78 90 ABCDE' from dual
  union
  select '1234567890ABCDE' from dual
  union
  select '123ABCDE' from dual
  union
  select '12DE' from dual
)
select
  case when length(regexp_replace(upper(txt), '^(\d)[- ]?(\d)[- ]?(\d)[- ]?.*([A-Z]{5})$', '\1\2\3\4')) < 8
    -- Needs more robust error checking here
    THEN 'NULL' -- for readability
    else regexp_replace(upper(txt), '^(\d)[- ]?(\d)[- ]?(\d)[- ]?.*([A-Z]{5})$', '\1\2\3\4')
  end result
from testdata;
RESULT
--------------------------------------------------------------------------------
123ABCDE
123ABCDE
123ABCDE
123ABCDE
NULL
SQL>
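Following the recommendation above to make this a function, here is a minimal sketch, not a definitive implementation. The function name clean_account_no is hypothetical, and it assumes a valid account number starts with three digits and ends with five letters once hyphens and spaces are removed (the stricter check the answer leaves to the reader).

```sql
-- Hypothetical helper: returns the first 3 and last 5 characters of an
-- account number, or NULL if it does not look valid.
-- Assumption: valid numbers begin with 3 digits and end with 5 letters
-- once separators are stripped.
create or replace function clean_account_no (p_txt in varchar2)
  return varchar2
  deterministic
is
  -- Remove every non-alphanumeric character (hyphens, spaces, ...)
  l_clean varchar2(4000) := regexp_replace(p_txt, '[^[:alnum:]]');
begin
  -- Stricter than a length check: digits first, letters last
  if regexp_like(l_clean, '^[0-9]{3}.*[[:alpha:]]{5}$') then
    return substr(l_clean, 1, 3) || substr(l_clean, -5);
  end if;
  -- Too short, or wrong kind of characters: no result
  return null;
end clean_account_no;
/
```

With the question's testdata CTE in front of it, the call is simply select txt, clean_account_no(txt) from testdata;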

You can use the fact that the replacement string (the third parameter) of REGEXP_REPLACE() can take back-references to get a lot closer. Wrapped in a CASE expression, you get what you're after:
select case when length(regexp_replace(txt, '[^[:alnum:]]')) >= 8 then
regexp_replace( regexp_replace(txt, '[^[:alnum:]]')
, '^([[:alnum:]]{3}).*([[:alnum:]]{5})$'
, '\1\2')
end
from test_data
That is: where the length of the string with all non-alphanumeric characters removed is greater than or equal to 8, return the 1st and 2nd groups, which are respectively the first 3 and last 5 alphanumeric characters.
This feels... overly complex. Once you've removed all non-alphanumeric characters you can just use an ordinary SUBSTR():
with test_data as (
select '1-23-456-78-90-ABCDE' txt from dual union all
select '1 23 456 78 90 ABCDE' txt from dual union all
select '1234567890ABCDE' txt from dual union all
select '123ABCDE' txt from dual union all
select '12DE' txt from dual
)
, standardised as (
select regexp_replace(txt, '[^[:alnum:]]') as txt
from test_data
)
select case when length(txt) >= 8 then substr(txt, 1, 3) || substr(txt, -5) end
from standardised

I feel like I'm missing something, but can't you just concatenate your two working columns? I.e., since you have a working regex for the first 3 and the last 5, just replace FIRST_THREE_AND_LAST_FIVE with:
regexp_substr(regexp_substr(regexp_replace(txt, '[^[:alnum:]]'), '([[:alnum:]]){3}',1,1)||regexp_substr(txt, '([[:alnum:]]){5}$',1,1),'([[:alnum:]]){8}',1,1)
EDIT: Added an outer regexp_substr wrapper to return null when the concatenation is not 8 alphanumeric characters long

Related

trying to understand how oracle REGEXP_REPLACE work

I'm new to REGEXP_REPLACE and need help.
When I run
SELECT REGEXP_REPLACE('7ELEVEN USA','[(\D^USA|^CANADA|^Canada)]','') "NAME" from dual
I get 7ELEVE: as you can see, the last character N is missing.
I also want to remove the leading number from the value below and display 7-ELEVEN STORE.
20991 7-ELEVEN STORE
Any help is greatly appreciated.
Thanks in advance.
Well, you don't even need regular expressions to remove the leading number: the good old SUBSTR + INSTR does the job just fine (that's RES2). If you want a regexp, then this pattern: ^\d+ does it. It says:
^ anchor to the beginning of the string
\d+ take all digits there are (up to the first non-digit character, which is the space)
An example:
SQL> with test (col) as
  (select '20991 7-ELEVEN STORE' from dual)
select
  regexp_replace(col, '^\d+') res1,
  substr(col, instr(col, ' ') + 1) res2
from test;
RES1 RES2
--------------- --------------
7-ELEVEN STORE 7-ELEVEN STORE
SQL>
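Note that ^\d+ removes only the digits, so RES1 actually starts with the space that followed them (' 7-ELEVEN STORE'), unlike RES2. A minimal variant that also drops that space extends the pattern with \s* (any whitespace right after the digits):

```sql
-- '^\d+\s*' : the leading digits plus any whitespace immediately after them
select regexp_replace(col, '^\d+\s*') as res1
from (select '20991 7-ELEVEN STORE' as col from dual);
```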
[EDIT]
As for the first query you posted (I didn't realize it was part of the question): if you want to select the first "word" from that string, I wouldn't use REGEXP_REPLACE but (REGEXP_)SUBSTR:
SQL> with test (col) as
  (select '7ELEVEN USA' from dual)
select regexp_substr(col, '\w+') res1,
  substr(col, 1, instr(col, ' ') - 1) res2
from test;
RES1 RES2
------- -------
7ELEVEN 7ELEVEN
SQL>
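As for why the original query lost the N: square brackets define a bracket expression, which matches any single character listed inside it, so [(\D^USA|^CANADA|^Canada)] removes every U, S, A and N (among other characters) wherever they occur, rather than matching whole words. If the intent was instead to strip a trailing country name, a sketch under that assumption uses a parenthesised alternation anchored at the end of the string:

```sql
-- Assumption: the goal is to remove a trailing ' USA', ' CANADA' or ' Canada'.
-- ( | | ) is an alternation of whole words, unlike [ ], which is a character set.
select regexp_replace('7ELEVEN USA', ' (USA|CANADA|Canada)$') as name
from dual;
```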

How to replace with zero after full-stop if not have any value using regexp_substr in oracle

Values are like:
Num(column)
786.56
35
select num,regexp_substr(num,'[^.]*') "first",regexp_substr(num,'[^.]+$') "second" from cost
When I execute the above query, the output is:
num first second
786.56 786 56
35 35 35
I want to print zero if there is no value after the decimal point; by default, the second column repeats the first value.
There are two options here; using either the occurrence or subexpression parameters available in REGEXP_SUBSTR().
Subexpression - the 6th parameter
Using subexpressions you can pick out which group () in your match you want to return in any given function call.
SQL> with the_data (n) as (
  select 786.56 from dual union all
  select 35 from dual
)
select regexp_substr(n, '^(\d+)\.?(\d+)?$', 1, 1, null, 1) as f
     , regexp_substr(n, '^(\d+)\.?(\d+)?$', 1, 1, null, 2) as s
from the_data;
F S
--- ---
786 56
35
^(\d+)\.?(\d+)?$ means: at the start of the string ^, pick a group () of digits \d+ followed by an optional decimal point \.?. Then pick an optional group of digits at the end of the string $.
We then use subexpressions to pick out which group of digits to return.
Occurrence - the 4th parameter
If you put the digits in a group and forget about matching the start and end of the string, you can pick the first group of digits and the second group of digits:
SQL> with the_data (n) as (
  select 786.56 from dual union all
  select 35 from dual
)
select regexp_substr(n, '(\d+)\.?', 1, 1, null, 1) as f
     , regexp_substr(n, '(\d+)\.?', 1, 2, null, 1) as s
from the_data;
F S
--- ---
786 56
35
(\d+)\.? means pick a group () of digits \d+ followed by an optional decimal point \.?. The first occurrence is the digits before the decimal point, and the second occurrence is the digits after it. Note that you still have to use the 6th parameter of REGEXP_SUBSTR() (subexpression) to state that you want only the data in the group.
Both options
You'll notice that neither of these returns 0 when there are no decimal places; you'll have to add that with COALESCE() when the return value is NULL. You also need an explicit cast to an integer, as COALESCE() expects consistent data types (this is best practice anyway):
SQL> with the_data (n) as (
  select 786.56 from dual union all
  select 35 from dual
)
select cast(regexp_substr(n, '^(\d+)\.?(\d+)?$', 1, 1, null, 1) as integer) as f
     , coalesce(cast(regexp_substr(n, '^(\d+)\.?(\d+)?$', 1, 1, null, 2) as integer), 0) as s
from the_data;
F S
---- ----
786 56
35 0
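One caveat with the cast: converting the digits after the point to an integer drops leading zeros, so a value such as 786.05 would give S = 5. If those digits matter as text, a sketch that keeps S as a string and defaults it to '0' (both COALESCE() arguments are then character data, so no cast is needed) could look like this. The 786.05 row is made up for illustration, and like the queries above it relies on the session's decimal separator being '.'.

```sql
with the_data (n) as (
  select 786.56 from dual union all
  select 786.05 from dual union all  -- illustrative row with a leading zero
  select 35 from dual
)
select regexp_substr(n, '^(\d+)\.?(\d+)?$', 1, 1, null, 1) as f
       -- keep the fractional digits as text; '0' when there are none
     , coalesce(regexp_substr(n, '^(\d+)\.?(\d+)?$', 1, 1, null, 2), '0') as s
from the_data;
```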

sort semicolon separated values per row in a column

I want to sort the semicolon-separated values within each row of a column. E.g.:
Input:
abc;pqr;def;mno
xyz;pqr;abc
abc
xyz;jkl
Output:
abc;def;mno;pqr
abc;pqr;xyz
abc
jkl;xyz
Can anyone help?
Perhaps something like this. Breaking it down:
First we need to break up the strings into their component tokens, and then reassemble them, using LISTAGG(), while ordering them alphabetically.
There are many ways to break up a symbol-separated string. Here I demonstrate the use of a hierarchical query. It requires that the input strings be uniquely distinguished from each other. Since the exact same semicolon-separated string may appear more than once, and since there is no info from the OP about any other unique column in the table, I create a unique identifier (using ROW_NUMBER()) in the most deeply nested subquery. Then I run the hierarchical query to break up the inputs and then reassemble them in the outermost SELECT.
with
test_data as (
select 'abc;pqr;def;mno' as str from dual union all
select 'xyz;pqr;abc' from dual union all
select 'abc' from dual union all
select 'xyz;jkl' from dual
)
-- End of test data (not part of the solution!)
-- SQL query begins BELOW THIS LINE.
select str,
listagg(token, ';') within group (order by token) as sorted_str
from (
select rn, str,
regexp_substr(str, '([^;]*)(;|$)', 1, level, null, 1) as token
from (
select str, row_number() over (order by null) as rn
from test_data
)
connect by level <= length(str) - length(replace(str, ';')) + 1
and prior rn = rn
and prior sys_guid() is not null
)
group by rn, str
;
STR SORTED_STR
--------------- ---------------
abc;pqr;def;mno abc;def;mno;pqr
xyz;pqr;abc abc;pqr;xyz
abc abc
xyz;jkl jkl;xyz
4 rows selected.
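Since there are many ways to break up the strings, here is a sketch of one alternative to CONNECT BY: recursive subquery factoring (a recursive WITH clause, available from Oracle 11g Release 2). The tokenising REGEXP_SUBSTR() call and the ROW_NUMBER() identifier are the same as above; only the way the token rows are generated changes. The CTE names numbered and tokens are arbitrary.

```sql
with
test_data as (
  select 'abc;pqr;def;mno' as str from dual union all
  select 'xyz;pqr;abc' from dual union all
  select 'abc' from dual union all
  select 'xyz;jkl' from dual
),
-- Give each input row a unique id, as in the CONNECT BY version
numbered as (
  select str, row_number() over (order by null) as rn
  from test_data
),
-- Anchor member: first token of each row.
-- Recursive member: next token, until lvl reaches (semicolons + 1).
tokens (rn, str, lvl, token) as (
  select rn, str, 1, regexp_substr(str, '([^;]*)(;|$)', 1, 1, null, 1)
  from numbered
  union all
  select rn, str, lvl + 1, regexp_substr(str, '([^;]*)(;|$)', 1, lvl + 1, null, 1)
  from tokens
  where lvl < length(str) - length(replace(str, ';')) + 1
)
-- Reassemble each row's tokens in alphabetical order
select str,
       listagg(token, ';') within group (order by token) as sorted_str
from tokens
group by rn, str
order by rn;
```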

How to get value from oracle stored in || separated?

I am new to Oracle and I want to get values from a column which is stored as "Ashu||123 ||Main Menu|ENG||1|1".
As you can see, each value is separated by the || symbol. In the above value, Ashu is the customer name and 123 is the id; I want both values, as customer name and customer id.
In the query below, I include some test data "on the fly" (not part of the solution; use your actual table name instead of test_data in the main query, and your actual column name instead of str). I included several special cases for testing, to make sure the query works correctly in all cases. I assume the first value (before the first ||) is the customer name and the second the customer id, and the rest of the input string can be ignored. I looked in particular to see that the query handles null values correctly (assuming they may happen in your data).
I left the customer id as a string; if it must be a number, it may be better to wrap it in TO_NUMBER().
with
test_data ( str ) as (
select 'Ashu||123||Main Menu|ENG||1|1' from dual union all
select 'Misha||125' from dual union all
select 'Babu||||Main Menu|NZL||?' from dual union all
select 'Rim||' from dual union all
select 'Todd' from dual union all
select '||139||Other Stuff' from dual
)
-- end of test data (only for testing and illustration) - not part of solution
-- SQL query begins BELOW THIS LINE
select str,
regexp_substr(str, '([^|]*)(\|\||$)', 1, 1, null, 1) as cust_name,
regexp_substr(str, '([^|]*)(\|\||$)', 1, 2, null, 1) as cust_id
from test_data
;
STR CUST_NAME CUST_ID
----------------------------- --------- -------
Ashu||123||Main Menu|ENG||1|1 Ashu 123
Misha||125 Misha 125
Babu||||Main Menu|NZL||? Babu
Rim|| Rim
Todd Todd
||139||Other Stuff 139
6 rows selected.
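The question's sample value has a space after the id ('123 ||'), and the answer suggests TO_NUMBER() if the id must be numeric. A sketch combining both, assuming the id field is either empty or a plain integer (TO_NUMBER() raises an error on anything else), is:

```sql
-- Assumes the second ||-separated field is empty or an integer;
-- TRIM removes the stray space seen in the question ('123 ||').
select str,
       regexp_substr(str, '([^|]*)(\|\||$)', 1, 1, null, 1) as cust_name,
       to_number(trim(regexp_substr(str, '([^|]*)(\|\||$)', 1, 2, null, 1))) as cust_id
from (select 'Ashu||123 ||Main Menu|ENG||1|1' as str from dual);
```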

How can I concatenate(or merge) values from 2 result sets with the same PK?

I don't know if I'm being dumb here but I can't seem to find an efficient way to do this. I wrote a very long and inefficient query that does what I need, but what I WANT is a more efficient way.
I have 2 result sets that display an ID (a PK which is generic/from the same source in both sets) and a FLAG (A - Approve and V - Validate).
Result Set 1
ID FLAG
1 V
2 V
3 V
4 V
5 V
6 V
Result Set 2
ID FLAG
2 A
5 A
7 A
8 A
I want to "merge" these two sets to give me this output:
ID FLAG
1 V
2 (V/A)
3 V
4 V
5 (V/A)
6 V
7 A
8 A
Neither of the 2 result sets will ever have all the IDs, so a simple left join from one result set to the other, with a CASE expression, is not an easy solution.
I'm currently doing a union between the two sets to get ALL the IDs. Thereafter I left join the 2 result sets to get the required '(V/A)' using a CASE expression.
There must be a more efficient way but I just can't seem to figure it out now as I'm running low on amps... I need a holiday... :-/
Thanks in advance!
Use a FULL OUTER JOIN:
SELECT ID,
CASE
WHEN t1.FLAG IS NULL THEN t2.FLAG
WHEN t2.FLAG IS NULL THEN t1.FLAG
ELSE '(' || t1.FLAG || '/' || t2.FLAG || ')'
END AS MERGED_FLAG
FROM TABLE1 t1
FULL OUTER JOIN TABLE2 t2
USING (ID)
ORDER BY ID
Share and enjoy.
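To try the FULL OUTER JOIN without creating any tables, the two result sets from the question can be supplied inline as CTEs. The names result_set_1 and result_set_2 are placeholders for your actual queries:

```sql
with result_set_1 (id, flag) as (
  select 1, 'V' from dual union all
  select 2, 'V' from dual union all
  select 3, 'V' from dual union all
  select 4, 'V' from dual union all
  select 5, 'V' from dual union all
  select 6, 'V' from dual
),
result_set_2 (id, flag) as (
  select 2, 'A' from dual union all
  select 5, 'A' from dual union all
  select 7, 'A' from dual union all
  select 8, 'A' from dual
)
select id,  -- USING (id) merges the key; it must not be qualified
       case
         when t1.flag is null then t2.flag  -- only in set 2
         when t2.flag is null then t1.flag  -- only in set 1
         else '(' || t1.flag || '/' || t2.flag || ')'  -- in both
       end as merged_flag
from result_set_1 t1
full outer join result_set_2 t2
  using (id)
order by id;
```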
I think that you can use XMLAGG. Here is an example:
SELECT deptno,
SUBSTR (REPLACE (REPLACE (XMLAGG (XMLELEMENT ("x", ename)
ORDER BY ename),'</x>'),'<x>','|'),2) as concated_list
FROM emp
GROUP BY deptno
ORDER BY deptno;
Bye
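On Oracle 11g Release 2 and later, LISTAGG() builds the same kind of delimited list without the XML round trip (and without XML escaping of characters such as &). A sketch against the same emp table, assumed here to be the classic SCOTT demo table; note that in 11g the result is a VARCHAR2 limited to 4000 bytes:

```sql
-- One row per department: employee names sorted and joined with '|'
select deptno,
       listagg(ename, '|') within group (order by ename) as concated_list
from emp
group by deptno
order by deptno;
```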
