REGEXP_REPLACE replacing text with the same text in lower case - plsql

I am trying to use REGEXP_REPLACE in PL/SQL to replace some text with the same text in lower case. More precisely, any text between "()" that consists of a single character should be converted to lower case.
Here is an example:
SELECT REGEXP_REPLACE(
'i want what what is between <> in lower case : I am a test(E) (A) (HELLO)'
, '(\(\D\))', '<\1>'
) FROM DUAL
Desired result:
I want what is between <> in lower case : I am a test(e) (a) (HELLO)
or this, because I am a little confused about my exercise:
I want what is between <> in lower case : I am a test<(e)> <(a)> (HELLO)
How can I get my text in lower case? I tried several approaches but could not get it to work. I don't know how to tell REGEXP_REPLACE to put the "\1" content in lower case.
Thank you for your help.
Best regards.
MS

(With Oracle 11g) Here is how to replace the first occurrence:
Use REGEXP_INSTR and REGEXP_SUBSTR to locate the matched pattern so that LOWER can be applied to it.
SELECT substr(it, 1 , REGEXP_instr( it, '(\(\D\))')-1)
||lower( REGEXP_substr(it, '(\(\D\))') )
||substr(it, REGEXP_instr( it, '(\(\D\))')+3, length(it))
FROM (SELECT 'i want what what is between <> in lower case : I am a test(E) (A) (HELLO)' it from dual) ;
and if you want the weird <> around it:
SELECT substr(it, 1 , REGEXP_instr( it, '(\(\D\))')-1)
|| '<'
||lower( REGEXP_substr(it, '(\(\D\))') )
|| '>'
||substr(it, REGEXP_instr( it, '(\(\D\))')+3, length(it))
FROM (SELECT 'i want what what is between <> in lower case : I am a test(E) (A) (HELLO)' it from dual) ;
I think you cannot have recursive regexp in Oracle. So if you want to be able to replace 2 occurrences:
SELECT substr(rit, 1 , REGEXP_instr( rit, '(\([[:upper:]]{1}\))')-1)
||lower( REGEXP_substr(rit, '(\([[:upper:]]{1}\))') )
||substr(rit, REGEXP_instr( rit, '(\([[:upper:]]{1}\))')+3, length(rit))
from (
(SELECT substr(it, 1 , REGEXP_instr( it, '(\([[:upper:]]{1}\))')-1)
||lower( REGEXP_substr(it, '(\([[:upper:]]{1}\))') )
||substr(it, REGEXP_instr( it, '(\([[:upper:]]{1}\))')+3, length(it)) rit
FROM (SELECT 'i want what what is between <> in lower case : I am a test(E) (A) (HELLO)' it from dual))
) ;
(I also replaced \D with [[:upper:]]{1}, which is more accurate: \D matches any non-digit character, not only upper-case letters.)

Too bad this has to be such a difficult problem! It seems like it should be easy.
To handle a variable number of occurrences of the pattern, you need to loop through the string looking for them. Maybe someone will come up with a slick solution using CONNECT BY or similar, but in the meantime, since you are using PL/SQL, why not go old-school and create a function that does it? It will arguably be easier to maintain, and it wraps the logic in a reusable unit available to everyone. Pass it a string and have it return the cleaned-up version.
SQL> select lower_single_letters('I want what is between in lower case : I am a test(E) (A) (HELLO)') text
from dual;
TEXT
--------------------------------------------------------------------------------
I want what is between in lower case : I am a test(e) (a) (HELLO)
SQL>
Here's some sample code since I wanted an example for use in my utility package:
CREATE OR REPLACE function lower_single_letters(string_in varchar2) return varchar2 is
tmp_string varchar2(1000) := string_in; -- Holds the string
regex_pattern constant varchar2(20) := '\([[:upper:]]\)'; -- Pattern to look for '(A)'
letter_offset integer; -- Offset of the pattern
letter varchar2(1); -- The letter to lower()
BEGIN
-- Loop while the pattern is found in the string passed in
while regexp_like(tmp_string, regex_pattern)
loop
-- Get the offset in the string
letter_offset := regexp_instr(tmp_string, regex_pattern)+1;
-- Get the letter
letter := substr(tmp_string, letter_offset, 1);
-- Replace it in the string
tmp_string := regexp_replace(tmp_string, '.', lower(letter), 1, letter_offset);
end loop;
-- Return it when the pattern is no longer found
return(tmp_string);
END lower_single_letters;
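Because the pattern only ever covers a single letter, the loop can also be written without any regular expression. The block below is a minimal sketch of that alternative, not the function above: it assumes only the ASCII letters A to Z need handling (unlike [[:upper:]], which may also match accented letters), and the variable name s is just for illustration.

```sql
DECLARE
  -- Sample input taken from the question
  s VARCHAR2(4000) := 'I am a test(E) (A) (HELLO)';
BEGIN
  -- Replace '(A)' with '(a)', '(B)' with '(b)', and so on.
  -- Longer texts such as '(HELLO)' never match a three-character pattern.
  FOR i IN ASCII('A') .. ASCII('Z') LOOP
    s := REPLACE(s, '(' || CHR(i) || ')', '(' || LOWER(CHR(i)) || ')');
  END LOOP;
  DBMS_OUTPUT.PUT_LINE(s);  -- I am a test(e) (a) (HELLO)
END;
/
```

In SQL*Plus, SET SERVEROUTPUT ON is needed to see the printed line.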


Teradata LIKE for range for values - (regular expression?)

In Teradata I need a condition that selects only records:
starting with a digit between 0 and 4
followed by the string ABCD
followed by anything
I can use Substring and it works, but it is not a nice piece of code.
SELECT
'4ABCDXXX' AS T
, CASE WHEN
Cast (Substring (T, 1,1) AS SMALLINT) BETWEEN 0 AND 4
AND Substring (T, 2,4) = 'ABCD'
THEN 'OK' ELSE 'NOK' END
I tried
LIKE '[0-4]ABCD%'
But this does not seem to be working...
How can this be elegantly achieved?
Thanks.
I don't think Teradata supports the enhanced LIKE syntax you are attempting. Instead, we can use REGEXP_SIMILAR, which returns 1 when the whole string matches the pattern:
SELECT
'4ABCDXXX' AS T,
CASE WHEN REGEXP_SIMILAR('4ABCDXXX', '^[0-4]ABCD.*$', 'c') = 1
THEN 'OK' ELSE 'NOK' END AS label
FROM yourTable;
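The same test can go directly into a WHERE clause to filter rows. This is a minimal sketch, assuming a character column named t in yourTable (the column name is hypothetical); REGEXP_SIMILAR returns an integer, so the result is compared with 1:

```sql
-- Keep only rows whose value starts with a digit 0-4 followed by ABCD
SELECT t
FROM yourTable
WHERE REGEXP_SIMILAR(t, '^[0-4]ABCD.*$', 'c') = 1;
```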
I've never been able to make negative lookaheads work in Teradata, so I would use two tests:
select
'4ABCD123' as t,
case when
regexp_similar(t,'^[0-4]ABCD.*') = 1 -- starts with 0-4 followed by ABCD
and t not like '%ABCD' -- does not end with ABCD
then 'ok' else 'nok' end;

How do I code a SQL statement replacing all x'BF' with x'00' in a data field that contains the upside-down question mark (¿)?

How do I code this properly so that it works in Oracle SQL:
update table_name
set field_name =
replace(field_name, x'BF', x'00')
where condition expression ;
I am not sure how to replace all occurrences of hex 'BF' with the null value hex '00' in the data field field_name.
You can use the unistr() function to provide a Unicode character. e.g.:
update table_name
set field_name = replace(field_name, unistr('\00bf'))
where condition expression ;
which would remove the ¿ character completely; or to replace it with a null character:
set field_name = replace(field_name, unistr('\00bf'), unistr('\0000'))
though I suspect sticking a null in there will confuse things even more later, when some other system tries to read that text and stops at the null.
Quick demo:
with t (str) as (
select 'A ¿ char' from dual
)
select str,
replace(str, unistr('\00bf')) as removed,
replace(str, unistr('\00bf'), unistr('\0000')) as replaced,
dump(replace(str, unistr('\00bf')), 16) as removed_hex,
dump(replace(str, unistr('\00bf'), unistr('\0000')), 16) as replaced_hex
from t;
STR       REMOVED   REPLACED  REMOVED_HEX                         REPLACED_HEX
--------- --------- --------- ----------------------------------- -----------------------------------
A ¿ char  A  char   A  char   Typ=1 Len=7: 41,20,20,63,68,61,72   Typ=1 Len=8: 41,20,0,20,63,68,61,72
(Just as an example of the problems you'll have - because of the null I couldn't copy and paste that from SQL Developer, and had to switch to SQL*Plus...)
The first dump shows the two spaces (hex 20) next to each other; the second shows a null character between them.
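Before running the update, it may help to check how many rows actually contain the character. A minimal sketch, reusing the placeholder names table_name and field_name from the question:

```sql
-- Count the rows whose field contains the inverted question mark (U+00BF)
SELECT COUNT(*) AS affected_rows
FROM table_name
WHERE INSTR(field_name, UNISTR('\00bf')) > 0;
```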

Getting ORA-22922 (nonexistent LOB value) or no result at all with wm_concat()

(Using Oracle 11.2)
I have a rather complicated SQL with something like
wm_concat( distinct abc )
that is expected to return some varchar2(4000) compatible result.
It causes ORA-00932: inconsistent datatypes in my select used in some coalesce( some_varchar_col, wm_concat( ... ) ).
So I tried casting it via two different methods:
dbms_lob.substr( ..., 4000 ) -- L) tried even with 3000 in case of "unicode byte blow-up"
cast( ... as varchar2(4000)) -- C) tried even with 3000 in case of "unicode byte blow-up"
(They are used in a view, but experimenting suggests that the problem is not related to the view.)
Depending on the column and other operators I either get N) no result or O) ORA-22922:
select * from view_with_above_included where rownum <= 100
N) My Eclipse Data Explorer JDBC connection returns without any result (no empty column headers, no "0 rows affected" message, only the query time statistics). It could be an internal exception that is not reported as such.
O)
ORA-22922: nonexistent LOB value
ORA-06512: in "SYS.DBMS_LOB", line 1092
ORA-06512: in line 1
Strangely the following test queries work:
-- rownum <= 100 would already cause the above problems
select * from view_with_above_included where rownum <= 10
or
select * from view_with_above_included
but the actual aggregated data never exceeds 1000 characters in length.
Luckily, it works with the listagg( ... ) function available since 11.2 (which we are already running), so we did not have to investigate further:
listagg( abc, ',' ) within group ( order by abc )
(Where wm_concat(...) is, as one should know, some internal and officially unsupported function.)
A rather nice (because not so bloated) way to implement the distinct functionality is a regexp with a back-reference, which should work in many cases (it can misfire when one value is a prefix of the next, e.g. 'a,ab' becomes 'ab'):
regexp_replace(
listagg( abc, ',' ) within group ( order by abc )
, '(^|,)(.+)(,\2)+', '\1\2' )
(Hopefully we will see a working listagg( distinct abc ) in the future, which would be as neat as the wm_concat syntax. Postgres, for example, has long supported string_agg( distinct abc ) [1].)
-- [1] Postgres SQL example:
select string_agg( distinct x, ',' ) from unnest('{a,b,a}'::text[]) as x
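Another common way to get distinct values out of listagg, without any regexp post-processing, is to deduplicate in an inline view first. A minimal sketch, assuming the column abc lives in a table called mytab (both names are placeholders):

```sql
-- Deduplicate first, then aggregate the distinct values
SELECT LISTAGG(abc, ',') WITHIN GROUP (ORDER BY abc) AS abc_list
FROM (SELECT DISTINCT abc FROM mytab);
```

This avoids the prefix problem of the regexp approach, at the cost of an extra subquery, which can be awkward when other aggregates are computed in the same query.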
If the list exceeds 4000 characters, one cannot use listagg anymore (ORA-22922 again).
But luckily we can use the xmlagg function here (as mentioned here).
If you want a distinct list on the result truncated to 4000 characters, uncomment the lines marked (1).
-- lowercase marks everything that could/should be specific to your query
-- uncomment the (1) lines to get a distinct list on the result truncated to 4000 chars
WITH cfg AS (
SELECT
',' AS list_delim,
'([^,]+)(,\1)*(,|$)' AS list_dist_match, -- regexp match for distinct functionality
'\1\3' AS LIST_DIST_REPL -- regexp replace for distinct functionality
FROM DUAL
)
SELECT
--REGEXP_REPLACE( DBMS_LOB.SUBSTR( -- (1)
RTRIM( XMLAGG( XMLELEMENT( E, mycol, list_delim ).EXTRACT('//text()')
ORDER BY mycol ).GetClobVal(), LIST_DELIM )
--, 4000 ), LIST_DIST_MATCH, LIST_DIST_REPL ) -- (1)
AS mylist
FROM mytab, CFG

extract some characters with instr

In PL/SQL, I have this text:
${cat};${dog};
I would like to extract this:
${dog}
I'm trying with instr, but the result always includes the last semicolon:
SELECT substr(field,instr(field,'$',1,2),instr(field,';',1,2)-1),...
Any help please
The function is defined as substr(str, pos, len): the third argument is a length, not an end position, so you have to subtract the positions, as in substr(str, pos1, pos2 - pos1).
Not sure if it's what you need, but I would code this:
select substr('${cat};${dog};'
,instr('${cat};${dog};',';',1,1)+1
,instr('${cat};${dog};',';',1,2)-instr('${cat};${dog};',';',1,1)-1
)
from dual;
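To see why the subtraction is needed, it can help to look at the intermediate positions. The query below is a small sketch that works through the arithmetic on the sample string; the alias s and the column names are only for illustration:

```sql
-- Positions in '${cat};${dog};': first ';' at 7, second ';' at 14,
-- so the second item starts at 8 and is 14 - 7 - 1 = 6 characters long.
SELECT INSTR(s, ';', 1, 1) AS pos1,
       INSTR(s, ';', 1, 2) AS pos2,
       SUBSTR(s,
              INSTR(s, ';', 1, 1) + 1,
              INSTR(s, ';', 1, 2) - INSTR(s, ';', 1, 1) - 1) AS second_item
FROM (SELECT '${cat};${dog};' AS s FROM dual);
```

The second_item column comes out as ${dog}.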
If looking for the second item in the list, here's a way using REGEXP_SUBSTR() to return the 2nd occurrence of a set of zero or more characters that are not a semi-colon, where they are followed by a semi-colon or the end of the line. This allows for a NULL value in the list:
select REGEXP_SUBSTR('${cat};${dog};', '([^;]*)(;|$)', 1, 2, NULL, 1) from dual;
Even better, make the call to REGEXP_SUBSTR generic and put it into a stored function that you pass a string, the element you want and the delimiter and have it return the string.
Benefits:
- Logic and code is encapsulated in a reusable function all can use without having to understand the regular expression syntax (but still get the power from it)
- There is a consistent, simple way to call it
- Code becomes MUCH easier to follow/debug
- If it needs to change there is only one place to change it
- A particular element of a list can be SELECTed
- An element of a list can be used in a WHERE clause
Here is the function definition itself:
FUNCTION GET_LIST_ELEMENT(string_in VARCHAR2, element_in NUMBER, delimiter_in VARCHAR2 DEFAULT ',') RETURN VARCHAR2 IS
BEGIN
RETURN REGEXP_SUBSTR(string_in, '([^\'||delimiter_in || ']*)(\'||delimiter_in||'|$)', 1, element_in, NULL, 1);
END GET_LIST_ELEMENT;
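A short usage sketch, assuming the function has been compiled either standalone (with CREATE OR REPLACE in front) or inside a package, in which case the call needs the package prefix. The table mytable and column list_col are hypothetical names:

```sql
-- Second element of a semicolon-delimited list; should return ${dog}
SELECT get_list_element('${cat};${dog};', 2, ';') AS element
FROM dual;

-- The same call used as a filter in a WHERE clause
SELECT *
FROM mytable
WHERE get_list_element(list_col, 2, ';') = '${dog}';
```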

PLS-00103: Encountered the symbol ","

This procedure raises the following error.
CREATE OR REPLACE PROCEDURE SAMPLE
IS
BEGIN
EXECUTE IMMEDIATE
'CREATE TABLE COLUMN_NAMES AS (
SELECT LISTAGG(COLUMN_NAME, ',') WITHIN GROUP (ORDER BY COLUMN_NAME) AS STUDENTS
FROM
(SELECT DISTINCT COLUMN_NAME
FROM BW_COLUMN_ROW_CELL_JOIN)
)';
END;
/
gives:
PLS-00103: Encountered the symbol "," when expecting one of the following:
* & = - + ; < / > at in is mod remainder not rem return
returning <an exponent (**)> <> or != or ~= >= <= <> and or
like like2 like4 likec between into using || multiset bulk member submultiset
Can anyone say what is wrong with this?
Thanks.
Another way (in Oracle 10g and later) is to use the alternative string literal notation - this means you don't need to worry about correctly escaping all the single quotes in the string, e.g. q'{my string's got embedded quotes}':
CREATE OR REPLACE PROCEDURE SAMPLE
IS
BEGIN
EXECUTE IMMEDIATE q'[
CREATE TABLE COLUMN_NAMES AS (
SELECT LISTAGG(COLUMN_NAME, ',') WITHIN GROUP (ORDER BY COLUMN_NAME) AS STUDENTS
FROM
(SELECT DISTINCT COLUMN_NAME
FROM BW_COLUMN_ROW_CELL_JOIN)
)]';
END;
/
I think the problem is that you have single quotes within single quotes. I can't test this at the moment, but I'd suggest you try the following (note that each inner quote is written as two single quotes, '', which escapes it):
CREATE OR REPLACE PROCEDURE SAMPLE
IS
BEGIN
EXECUTE IMMEDIATE 'CREATE TABLE COLUMN_NAMES AS ( SELECT LISTAGG(COLUMN_NAME, '','') WITHIN GROUP (ORDER BY COLUMN_NAME) AS STUDENTS FROM (SELECT DISTINCT COLUMN_NAME FROM BW_COLUMN_ROW_CELL_JOIN) )';
END;
/
I'd also try the CREATE TABLE part of the code standalone first, just to make sure it's valid before wrapping it in a procedure.
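One way to check the statement before executing it is to build it in a variable and print it. This is a sketch only: the variable name stmt is an assumption, and the statement text is the one from the question, written with the q'[...]' literal so the inner quotes need no escaping.

```sql
DECLARE
  -- Hold the dynamic statement in a variable so it can be inspected
  stmt VARCHAR2(4000) := q'[
CREATE TABLE COLUMN_NAMES AS (
  SELECT LISTAGG(COLUMN_NAME, ',') WITHIN GROUP (ORDER BY COLUMN_NAME) AS STUDENTS
  FROM (SELECT DISTINCT COLUMN_NAME FROM BW_COLUMN_ROW_CELL_JOIN)
)]';
BEGIN
  DBMS_OUTPUT.PUT_LINE(stmt);   -- needs SET SERVEROUTPUT ON in SQL*Plus
  -- EXECUTE IMMEDIATE stmt;    -- enable once the printed text runs on its own
END;
/
```

The printed text can be pasted into a SQL session and run directly, which separates quoting mistakes from genuine SQL errors.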
Single quotes inside the string passed to EXECUTE IMMEDIATE cannot be written directly; one way to code them is with CHR(39):
CREATE OR REPLACE PROCEDURE SAMPLE
IS
BEGIN
EXECUTE IMMEDIATE
'CREATE TABLE COLUMN_NAMES AS (
SELECT LISTAGG(COLUMN_NAME,'||chr(39)||','||chr(39)||') WITHIN GROUP (ORDER BY COLUMN_NAME) AS STUDENTS
FROM
(SELECT DISTINCT COLUMN_NAME FROM BW_COLUMN_ROW_CELL_JOIN))';
END;
