Replace special characters in Oracle database - oracle11g

I have more than 200,000 rows in a table with a column that contains the special character 'Æ??'; for example, the text is test1234Æ??. How can I replace it with the '?' symbol?

The character Æ has Unicode code point U+00C6; searching for "Æ unicode" will tell you that. With that knowledge, you can use the UNISTR function to represent the character. You can also paste the character directly into your SQL, as shown below, if your client supports Unicode characters.
WITH mystrings AS
(SELECT 'found a Æ in this text' as str FROM DUAL UNION ALL
SELECT 'text is test1234Æ??' FROM DUAL
)
SELECT
REPLACE(str,UNISTR('\00c6'),'?') as clean_string,
REPLACE(str,'Æ','?') as clean_string2,
REPLACE(str,UNISTR('\00c6??'),'?') as clean_stringnoqm
FROM mystrings;
CLEAN_STRING            CLEAN_STRING2           CLEAN_STRINGNOQM
----------------------- ----------------------- -----------------------
found a ? in this text  found a ? in this text  found a Æ in this text
text is test1234???     text is test1234???     text is test1234?
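To find the code point for UNISTR without searching online, any Unicode-aware language can report it. A minimal Python sketch (the helper name `unistr_escape` is hypothetical) that builds the `\XXXX` escape UNISTR expects and mimics the first REPLACE call on one sample string:

```python
def unistr_escape(ch: str) -> str:
    """Build the \\XXXX escape that Oracle's UNISTR accepts for one character."""
    code_point = ord(ch)
    if code_point > 0xFFFF:
        # UNISTR works with UTF-16 code units, so characters outside the BMP
        # would need two escapes (a surrogate pair); not handled in this sketch.
        raise ValueError("character outside the Basic Multilingual Plane")
    return "\\%04X" % code_point

print(unistr_escape("Æ"))                            # \00C6
print("text is test1234Æ??".replace("\u00c6", "?"))  # text is test1234???
```

The printed escape can be pasted into `UNISTR('\00C6')`; hex digits are accepted in either case.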
If you only want to keep the characters A-Z, a-z, 0-9, comma, and space, you can use a regular expression that replaces everything else. There are plenty of other answers around covering that approach.
WITH mystrings AS
( SELECT
'Brand1® is supercool, but I prefer bRand2™ since it supports Æ' AS str
FROM dual
)
SELECT
regexp_replace(str,'[^A-Za-z0-9, ]', '?') AS clean_string
FROM mystrings;
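The same whitelist can be tried outside the database first. In this Python sketch, `re.sub` with the same character class replaces every character that is not an ASCII letter, digit, comma, or space (the `clean_string` name just mirrors the SQL alias):

```python
import re

# Same class as the SQL example: keep A-Z, a-z, 0-9, comma and space.
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9, ]")

def clean_string(text: str) -> str:
    return NOT_ALLOWED.sub("?", text)

print(clean_string("Brand1® is supercool, but I prefer bRand2™ since it supports Æ"))
# Brand1? is supercool, but I prefer bRand2? since it supports ?
```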

Related

PLSQL: Find invalid characters in a database column (UTF-8)

I have a text column in a table that I need to validate to identify which records contain non-UTF-8 characters.
Below is an example record where there are invalid characters.
text = 'PP632485 - Hala A - prace kuchnia Zepelin, wymiana muszli, monta􀄪 tablic i uchwytów na r􀄊czniki, wymiana zamka systemowego'
There are over 3 million records in this table, so I need to validate them all at once and get the rows where this text column has non UTF-8 characters.
I tried the following:
instr(text, chr(26)) > 0 - no records get fetched
text LIKE '%ó%' (tried this for a few invalid characters I noticed) - no records get fetched
update <table> set text = replace(text, 'ó', 'ó') - no change seen in text
Is there anything else I can do?
Appreciate your input.
This is Oracle 11.2
The characters you're seeing might be invalid for your data, but they are valid AL32UTF8 characters; otherwise they would not be displayed correctly. It's up to you to determine which character set contains the correct set of characters.
For example, to check if a string only contains characters in the US7ASCII character set, use the CONVERT function. Any character that cannot be converted into a valid US7ASCII character will be displayed as ?.
The example below first replaces existing question marks with the string '~~~~~' (so they are not mistaken for conversion failures), then converts the text, and finally checks whether the converted text contains a question mark.
WITH t (c) AS
(SELECT 'PP632485 - Hala A - prace kuchnia Zepelin, wymiana muszli, monta􀄪 tablic i uchwytów na r􀄊czniki, wymiana zamka systemowego' FROM DUAL UNION ALL
SELECT 'Just a bit of normal text' FROM DUAL UNION ALL
SELECT 'Question mark ?' FROM DUAL),
converted_t (c) AS
(
SELECT
CONVERT(
REPLACE(c,'?','~~~~~')
,'US7ASCII','AL32UTF8')
FROM t
)
SELECT CASE WHEN INSTR(c,'?') > 0 THEN 'Invalid' ELSE 'Valid' END as status, c
FROM converted_t
;
STATUS  C
Invalid PP632485 - Hala A - prace kuchnia Zepelin, wymiana muszli, montao??? tablic i uchwyt??w na ro??Sczniki, wymiana zamka systemowego
Valid   Just a bit of normal text
Valid   Question mark ~~~~~
Again, this is just an example - you might need a less restrictive character set.
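The CONVERT trick maps every character outside the target set to ?. A rough Python analogue, assuming you only need the Valid/Invalid verdict: `str.encode` with `errors="replace"` also substitutes ? for characters the codec cannot represent. The exact replacement text can differ from Oracle's output, so compare only the verdict.

```python
def ascii_status(text: str) -> str:
    # Protect genuine question marks first, as the SQL example does.
    protected = text.replace("?", "~~~~~")
    # Each character that ASCII cannot represent becomes "?".
    converted = protected.encode("ascii", errors="replace").decode("ascii")
    return "Invalid" if "?" in converted else "Valid"

for sample in ["uchwytów na ręczniki", "Just a bit of normal text", "Question mark ?"]:
    print(ascii_status(sample), sample)
```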
--UPDATE--
With your data, it's up to you to decide how to continue. Determine what a good target character set is. Contrary to what I said earlier, passing a source character set to CONVERT is optional; if it is omitted, the database character set is used.
Things you could try:
Check which characters show up as '�' when converting from UTF8 to AL32UTF8:
select * from G2178009_2020030114_dinllk
WHERE INSTR(CONVERT(text ,'AL32UTF8','UTF8'),'�') > 0;
Check whether the converted text matches the original text. In this example I'm converting to UTF8 and comparing the result against the original. Rows containing characters that UTF8 does not represent the same way will not match, so this query returns only the unchanged rows; use <> instead of = to list the problem rows.
select * from G2178009_2020030114_dinllk
WHERE
CONVERT(text ,'UTF8') = text;
This should be enough tools for you to diagnose your data issue.
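If you can pull the text into a script, another option is to list the characters that are common culprits. This Python sketch (the helper `suspicious_chars` is hypothetical, and what counts as "suspicious" is an assumption) flags private-use characters and characters outside the Basic Multilingual Plane, which Oracle's older UTF8 character set stores as surrogate pairs rather than as the 4-byte sequences AL32UTF8 uses:

```python
import unicodedata

def suspicious_chars(text: str) -> list[tuple[str, str]]:
    """Return (character, 'U+XXXX') pairs for private-use or non-BMP characters."""
    found = []
    for ch in text:
        code_point = ord(ch)
        # Category "Co" = private use; > 0xFFFF = outside the Basic Multilingual Plane.
        if code_point > 0xFFFF or unicodedata.category(ch) == "Co":
            found.append((ch, f"U+{code_point:04X}"))
    return found

print([cp for _, cp in suspicious_chars("monta\U0010012A tablic")])  # ['U+10012A']
```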
As shown by previous comments, you can detect the issue in place, but it's difficult to automatically correct in place.
I have used ftfy (https://pypi.org/project/ftfy/) to correct incorrectly encoded characters in large files.
It guesses what the actual UTF8 character should be, and there are some controls on how it does this. For you, the problem is that you have to pull the data out, fix it, and put it back in.
So assuming you can get the data out to the file system to fix it, you can locate files with bad encodings with something like this:
find . -type f -exec sh -c 'iconv -f utf-8 -t utf-16 "$1" >/dev/null 2>&1 || echo "$1"' _ {} \;
This produces a list of files that potentially need to be processed by ftfy.
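If Python is available, a similar check can be done without iconv. A minimal sketch (both function names are hypothetical) that reports files whose bytes do not decode as UTF-8; it reads each file fully into memory, which may not suit very large files:

```python
from pathlib import Path

def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def files_with_invalid_utf8(root: str) -> list[Path]:
    """Walk root recursively and return files whose content is not valid UTF-8."""
    return [
        path
        for path in sorted(Path(root).rglob("*"))
        if path.is_file() and not is_valid_utf8(path.read_bytes())
    ]
```

Calling `files_with_invalid_utf8(".")` gives a list comparable to the find/iconv pipeline, ready to be passed to ftfy.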

Trimming column data in Snowflake

In Teradata, to trim a leading or trailing zero "0" (or any other character), we can use:
TRIM(LEADING '0' FROM COLUMN) OR TRIM(TRAILING '0' FROM COLUMN)
In Snowflake this doesn't seem to work. What is the Snowflake alternative?
Use LTRIM. As shown in the documentation, it removes the specified leading characters from a string (a blank space by default). For trailing characters, use RTRIM.
select ltrim('000000123', '0');
Gives:
+---------------------------+
| LTRIM('000000123', '0') |
|---------------------------|
| 123 |
+---------------------------+
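Trimming stops at the first character that is not in the characters argument, so a value such as '#000000123' is left untouched by LTRIM(..., '0'). Python's `str.lstrip` and `str.rstrip` treat their argument as a set of characters in the same spirit, which makes them a quick way to check expectations; this is an analogy, not Snowflake itself:

```python
# str.lstrip/str.rstrip remove any leading/trailing characters found in the argument.
print("000000123".lstrip("0"))    # 123
print("#000000123".lstrip("0"))   # #000000123 ('#' is not in the set, so nothing is removed)
print("#000000123".lstrip("#0"))  # 123
print("12300000".rstrip("0"))     # 123
```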
Scenario 1
If there are blank spaces on either side of the text in Oracle.
Solution
select trim(column_name) from table_name
Scenario 2
If the text contains zeros and special characters to remove.
Solution
SELECT REGEXP_REPLACE(Column_name,'[#$#!~*)({}.,:;"0]') FROM table_name;
Scenario 3
If the text contains digits and special characters to remove.
Solution
SELECT REGEXP_REPLACE(Column_name,'[#$#!~*)({}.,:;"0123456789]') FROM table_name;
Scenario 4
If the text contains zeros, leading/trailing blank spaces, and special characters.
Solution
SELECT REGEXP_REPLACE(TRIM(Column_name),'[#$#!~*)({}.,:;"0]') FROM table_name;
Scenario 5
If the text contains digits, leading/trailing blank spaces, and special characters.
Solution
SELECT REGEXP_REPLACE(TRIM(Column_name),'[#$#!~*)({}.,:;"0123456789]') FROM table_name;
Scenario 6
If there are blank spaces between words.
Solution
SELECT REGEXP_REPLACE(Column_name,' ') FROM table_name;
Scenario 7
If there are blank spaces before, after, and between words.
Solution
SELECT REGEXP_REPLACE(TRIM(Column_name),' ') FROM table_name;
Scenario 8
If there are blank spaces between words as well as special characters.
Solution
SELECT REGEXP_REPLACE(REGEXP_REPLACE(TRIM(column_name),' '),'[#$#!~*{}.,:_;"-]') FROM table_name;
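Other combinations of these statements can be built the same way. Note that REGEXP_REPLACE removes every occurrence of the listed characters, not only leading or trailing ones.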
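Watch the position of a hyphen inside these character classes: written between two characters, as in `:-_`, it forms a range, here from `:` (0x3A) to `_` (0x5F), which also covers the uppercase letters A-Z. Placing the hyphen last makes it literal. Python's `re` module parses bracket ranges the same way, so this sketch shows the difference (the pattern names and sample value are illustrative):

```python
import re

RANGE_BUG = r'[#$!~*{}.,:-_;"]'  # ":-_" is the range 0x3A..0x5F, which includes A-Z
LITERAL = r'[#$!~*{}.,:_;"-]'    # hyphen last, so it matches only "-"

def scenario8(value: str, pattern: str) -> str:
    # Trim, drop every space, then drop the listed special characters.
    return re.sub(pattern, "", value.strip().replace(" ", ""))

print(scenario8(" Hello, World_! ", RANGE_BUG))  # elloorld
print(scenario8(" Hello, World_! ", LITERAL))    # HelloWorld
```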

Get the correct Hexadecimal for strange symbol

I have this strange symbol in my PL/SQL Developer client (in the image I attached, it's the symbol between P and B).
In the past, for a different symbol, I was able to update my DB and remove it with this:
update table set ent_name = replace(ent_name, UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('C29B')), ' ');
The problem is that I don't remember how I translated the symbol I had at that time to C29B.
Can you help me understand how to translate the current symbol to its hex form, so that I can use the same command to remove it from my database?
Thanks
As long as it's in your table, you can use the DUMP function to find it.
Use DUMP to get the byte representation of any data you wish to inspect for oddities.
A good overview: Oracle / PLSQL: DUMP Function
Here's some text with plain ASCII:
select dump('Dashes-and "smart quotes"') from dual;
Typ=96 Len=25:
68,97,115,104,101,115,45,97,110,100,32,34,115,109,97,114,116,32,113,117,111,116,101,115,34
Now introduce funny characters:
select dump('Dashes—and “smart quotes”') from dual;
Typ=96 Len=31:
68,97,115,104,101,115,226,128,148,97,110,100,32,226,128,156,115,109,97,114,116,32,113,117,111,116,101,115,226,128,157
In this case, the number of bytes increased because my DB uses UTF8: the dash and the two quotes each take three bytes instead of one. Values above 127 fall outside the ASCII range, so they stand out and can be inspected further.
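In a UTF8 database the byte values DUMP reports are simply the UTF-8 encoding of the text, so they can be reproduced outside Oracle for cross-checking. A Python sketch (the helper name `dump_decimal` is hypothetical):

```python
def dump_decimal(text: str) -> list[int]:
    """Byte values of text encoded as UTF-8, comparable to DUMP output in a UTF8 database."""
    return list(text.encode("utf-8"))

plain = dump_decimal('Dashes-and "smart quotes"')
fancy = dump_decimal("Dashes\u2014and \u201csmart quotes\u201d")
print(len(plain), len(fancy))  # 25 31
print(fancy[6:9])              # [226, 128, 148], the dash after "Dashes"
```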
The ASCIISTR function provides an even more convenient way to see the special characters:
select asciistr('Dashes—and “smart quotes”') from dual;
Dashes\2014and \201Csmart quotes\201D
This one converts non-ASCII characters into backslashed Unicode hex.
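ASCIISTR's escapes are UTF-16 code units, so characters outside the BMP come out as a surrogate pair. This Python sketch (the helper `asciistr_like` is hypothetical) approximates that output for non-ASCII characters; it does not try to reproduce every ASCIISTR detail:

```python
def asciistr_like(text: str) -> str:
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        else:
            units = ch.encode("utf-16-be")  # 2 bytes per UTF-16 code unit
            for i in range(0, len(units), 2):
                out.append("\\%04X" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)

print(asciistr_like("Dashes\u2014and \u201csmart quotes\u201d"))  # Dashes\2014and \201Csmart quotes\201D
print(asciistr_like("\U0001F44D"))                                 # \D83D\DC4D
```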
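DUMP also takes an optional second argument that controls the output format: 16 returns each byte in hexadecimal, 17 prints each byte as a character if it is printable and in hex otherwise, and adding 1000 (as in 1017) also includes the character set name: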
select DUMP('Thumbs 👍', 1017) from dual;
Typ=96 Len=11 CharacterSet=AL32UTF8: T,h,u,m,b,s, ,f0,9f,91,8d
select DUMP('Smiley 😊 Face', 17) from dual;
Typ=96 Len=16: S,m,i,l,e,y, ,f0,9f,98,8a, ,F,a,c,e
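To answer the original question directly: the string passed to HEXTORAW is the hex of the character's bytes in the database character set. For example, C29B is the UTF-8 encoding of U+009B, a C1 control character. A Python sketch (hypothetical helper, assuming an AL32UTF8 database) that produces that hex for any character:

```python
def hex_for_hextoraw(ch: str) -> str:
    """Uppercase hex of the UTF-8 bytes of ch, usable as HEXTORAW('...') input."""
    return ch.encode("utf-8").hex().upper()

print(hex_for_hextoraw("\u009b"))      # C29B
print(hex_for_hextoraw("\U0001F44D"))  # F09F918D
```

In practice you can read the same bytes from `DUMP(ent_name, 16)` on an affected row; the sketch is only a cross-check.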

Remove UTF-8 substring in sqlite

I am trying to remove some invisible characters from a table. I tried this query:
UPDATE table SET text = REPLACE(text, x'202B', '' )
with no luck. I also tried selecting it using:
SELECT REPLACE(text, x'202B', '####') AS text FROM table
but nothing is replaced, so I'm guessing that it can't find x'202B' in the text column, but if I use this query:
SELECT * FROM table WHERE text REGEXP "[\x202B]"
I do get results.
x'202B' is not a single invisible Unicode character; it is a blob containing the two ASCII characters space (0x20) and + (0x2B).
SQLite stores strings in the database encoding, which is UTF-8 by default.
When you construct strings from bytes manually, you have to use the same encoding. For U+202B, the UTF-8 bytes are:
x'E280AB'
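A quick way to confirm this is Python's built-in sqlite3 module. The sketch below (table and column names are placeholders) shows that x'202B' is the text ' +', that the UTF-8 bytes x'E280AB' do match the invisible character, and that SQLite's char() function, which takes code points, avoids spelling out bytes at all:

```python
import sqlite3

def demo() -> dict[str, str]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (text TEXT)")
    conn.execute("INSERT INTO t VALUES (?)", ("left\u202bright",))
    results = {
        # The blob x'202B' is the two bytes 0x20 0x2B, i.e. the text " +".
        "blob_as_text": conn.execute("SELECT CAST(x'202B' AS TEXT)").fetchone()[0],
        # " +" does not occur in the stored text, so nothing is replaced.
        "wrong_bytes": conn.execute("SELECT replace(text, x'202B', '') FROM t").fetchone()[0],
        # The UTF-8 bytes of U+202B match the stored character.
        "utf8_bytes": conn.execute("SELECT replace(text, x'E280AB', '') FROM t").fetchone()[0],
        # char() builds a string from code points: 8235 == 0x202B.
        "char_func": conn.execute("SELECT replace(text, char(8235), '') FROM t").fetchone()[0],
    }
    conn.close()
    return results

for key, value in demo().items():
    print(key, repr(value))
```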

How to escape a % sign in sqlite?

I do a full text search using LIKE clause and the text can contain a '%'.
What is a good way to search for a % sign in an sqlite database?
I did try
SELECT * FROM table WHERE text_string LIKE '%[%]%'
but that doesn't work in sqlite.
From the SQLite documentation
If the optional ESCAPE clause is present, then the expression following the ESCAPE keyword must evaluate to a string consisting of a single character. This character may be used in the LIKE pattern to include literal percent or underscore characters. The escape character followed by a percent symbol (%), underscore (_), or a second instance of the escape character itself matches a literal percent symbol, underscore, or a single escape character, respectively.
Alternatively, you can avoid LIKE wildcards entirely with instr():
SELECT * FROM table WHERE instr(text_string, ?)>0
Here, ? is a bound parameter holding your search term.
For example, you can also pass the text directly:
SELECT * FROM table WHERE instr(text_string, '%')>0
SELECT * FROM table WHERE instr(text_string, '98.9%')>0 etc.
Hope this helps.
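For completeness, here is the ESCAPE clause from the documentation quote in action, using Python's sqlite3 module with placeholder table data. The hypothetical helper `escape_like` prefixes %, _ and the escape character itself with a backslash, so the search term is matched literally. Unlike instr(), LIKE in SQLite is case-insensitive for ASCII letters by default.

```python
import sqlite3

def escape_like(term: str, esc: str = "\\") -> str:
    # Escape the escape character first, then the two LIKE wildcards.
    return term.replace(esc, esc * 2).replace("%", esc + "%").replace("_", esc + "_")

def search(conn: sqlite3.Connection, term: str) -> list[str]:
    pattern = "%" + escape_like(term) + "%"
    rows = conn.execute(
        "SELECT text_string FROM t WHERE text_string LIKE ? ESCAPE '\\' ORDER BY rowid",
        (pattern,),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (text_string TEXT)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("growth 98.9% yoy",), ("value 98.95",), ("100% done",), ("plain",)])
print(search(conn, "98.9%"))  # ['growth 98.9% yoy']
print(search(conn, "%"))      # ['growth 98.9% yoy', '100% done']
```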
