Get the correct Hexadecimal for strange symbol - plsql

I have this strange symbol on my pl/sql developer client (check image it's the symbol between P and B )
In the past, and for a different symbol, i was able to update my DB and remove them making this:
update table set ent_name = replace(ent_name, UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('C29B')), ' ');
The problem is that i dont remember how I translated the symbol (i had at that time) to the C29B.
Can you help me to understand how can i translate the currenct symbol to the HEX format, to i can use the command to remove it from my database?
Thanks

As long as it's in your table, you can use the DUMP function to find it.
Use DUMP to get the byte representation of the data in code of you wish to inspect for weirdness.
A good overview: Oracle / PLSQL: DUMP Function
Here's some text with plain ASCII:
select dump('Dashes-and "smart quotes"') from dual;
Typ=96 Len=25:
68,97,115,104,101,115,45,97,110,100,32,34,115,109,97,114,116,32,113,117,111,116,101,115,34
Now introduce funny characters:
select dump('Dashes—and “smart quotes”') from dual;
Typ=96 Len=31:
68,97,115,104,101,115,226,128,148,97,110,100,32,226,128,156,115,109,97,114,116,32,113,117,111,116,101,115,226,128,157
In this case, the number of bytes increased because my DB is using UTF8. Numbers outside of the valid range for ASCII stand out and can be inspected further.
The ASCIISTR function provides an even more convenient way to see the special characters:
select asciistr('Dashes—and “smart quotes”') from dual;
Dashes\2014and \201Csmart quotes\201D
This one converts non-ASCII characters into backslashed Unicode hex.
The DUMP function takes an additional argument that can be used to format the output in a nice way:
select DUMP('Thumbs 👍', 1017) from dual;
Typ=96 Len=11 CharacterSet=AL32UTF8: T,h,u,m,b,s, ,f0,9f,91,8d
select DUMP('Smiley 😊 Face', 17) from dual;
Typ=96 Len=16: S,m,i,l,e,y, ,f0,9f,98,8a, ,F,a,c,e

Related

PLSQL: Find invalid characters in a database column (UTF-8)

I have a text column in a table which I need to validate to recognize which records have non UTF-8 characters.
Below is an example record where there are invalid characters.
text = 'PP632485 - Hala A - prace kuchnia Zepelin, wymiana muszli, monta􀄪 tablic i uchwytów na r􀄊czniki, wymiana zamka systemowego'
There are over 3 million records in this table, so I need to validate them all at once and get the rows where this text column has non UTF-8 characters.
I tried below:
instr(text, chr(26)) > 0 - no records get fetched
text LIKE '%ó%' (tried this for a few invalid characters I noticed) - no records get fetched
update <table> set text = replace(text, 'ó', 'ó') - no change seen in text
Is there anything else I can do?
Appreciate your input.
This is Oracle 11.2
The characters you're seeing might be invalid for your data, but they are valid AL32UTF8 characters. Else they would not be displayed correctly. It's up to you to determine what character set contains the correct set of characters.
For example, to check if a string only contains characters in the US7ASCII character set, use the CONVERT function. Any character that cannot be converted into a valid US7ASCII character will be displayed as ?.
The example below first replaces the question marks with string '~~~~~', then converts and then checks for the existence of a question mark in the converted text.
WITH t (c) AS
(SELECT 'PP632485 - Hala A - prace kuchnia Zepelin, wymiana muszli, monta􀄪 tablic i uchwytów na r􀄊czniki, wymiana zamka systemowego' FROM DUAL UNION ALL
SELECT 'Just a bit of normal text' FROM DUAL UNION ALL
SELECT 'Question mark ?' FROM DUAL),
converted_t (c) AS
(
SELECT
CONVERT(
REPLACE(c,'?','~~~~~')
,'US7ASCII','AL32UTF8')
FROM t
)
SELECT CASE WHEN INSTR(c,'?') > 0 THEN 'Invalid' ELSE 'Valid' END as status, c
FROM converted_t
;
Invalid
PP632485 - Hala A - prace kuchnia Zepelin, wymiana muszli, montao??? tablic i uchwyt??w na ro??Sczniki, wymiana zamka systemowego
Valid
Just a bit of normal text
Valid
Question mark ~~~~~
Again, this is just an example - you might need a less restrictive character set.
--UPDATE--
With your data: it's up to you to determine how you want to continue. Determine what is a good target data set. Contrary to what I set earlier, it's not mandatory to pass a "from dataset" argument in the CONVERT function.
Things you could try:
Check which characters show up as '�' when converting from UTF8 at AL32UTF8
select * from G2178009_2020030114_dinllk
WHERE INSTR(CONVERT(text ,'AL32UTF8','UTF8'),'�') > 0;
Check if the converted text matches the original text. In this example I'm converting to UTF8 and comparing against the original text. If it is different then the converted text will not be the same as the original text.
select * from G2178009_2020030114_dinllk
WHERE
CONVERT(text ,'UTF8') = text;
This should be enough tools for you to diagnose your data issue.
As shown by previous comments, you can detect the issue in place, but it's difficult to automatically correct in place.
I have used https://pypi.org/project/ftfy/ to correct invalidly encoded characters in large files.
It guesses what the actual UTF8 character should be, and there are some controls on how it does this. For you, the problem is that you have to pull the data out, fix it, and put it back in.
So assuming you can get the data out to the file system to fix it, you can locate files with bad encodings with something like this:
find . -type f | xargs -I {} bash -c "iconv -f utf-8 -t utf-16 {} &>/dev/null || echo {}"
This produces a list of files that potentially need to be processed by ftfy.

I keep getting a syntax error on this code [duplicate]

I'm trying to insert some information to MySQL with Pascal, but when I run the program I get the error
unknown column 'mohsen' in field list
This is my code
procedure TForm1.Button1Click(Sender: TObject);
var
aSQLText: string;
aSQLCommand: string;
namee:string;
family:string;
begin
namee:='mohsen';
family:='dolatshah';
aSQLText:= 'INSERT INTO b_tbl(Name,Family) VALUES (%s,%s)';
aSQLCommand := Format(aSQLText, [namee, family]);
SQLConnector1.ExecuteDirect(aSQLCommand);
SQLTransaction1.Commit;
end;
How can I solve this problem?
It's because your
VALUES (%s,%s)
isn't surrounding the namee and family variable contents by quotes. Therefore, your back-end Sql engine thinks your mohsen is a column name, not a value.
Instead, use, e.g.
VALUES (''%s'',''%s'')
as in
Namee := 'mohsen';
Family := 'dolatshah';
aSQLText:= 'INSERT INTO b_tbl(Name,Family) VALUES (''%s'',''%s'')';
aSQLCommand := Format(aSQLText,[namee,family]);
In the original version of my answer, I explained how to fix your problem by "doubling up" single quotes in the Sql you were trying to build, because it seemed to me that you were having difficulty seeing (literally) what was wrong with what you were doing.
An alternative (and better) way to avoid your problem (and the one I always use in real life) is to use the QuotedStr() function. The same code would then become
aSQLText := 'INSERT INTO b_tbl (Name, Family) VALUES (%s, %s)';
aSQLCommand := Format(aSQLText, [QuotedStr(namee), QuotedStr(family)]);
According to the Online Help:
Use QuotedStr to convert the string S to a quoted string. A single quote character (') >is inserted at the beginning and end of S, and each single quote character in the string is >repeated.
What it means by "repeated" is what I've referred to as "doubling up". Why that's important, and the main reason I use QuotedStr is to avoid the Sql db-engine throwing an error when the value you want to send contains a single quote character as in O'Reilly.
Try adding a row containing that name to your table using MySql Workbench and you'll see what I mean.
So, not only does using QuotedStr make constructing SQL statements as strings in Delphi code less error-prone, but it also avoid problems at the back-end, too.
Just in case this will help anybody else I had the same error when I was parsing a python variable with a sql statement and it had an if statement in i.e.
sql="select bob,steve, if(steve>50,'y','n') from table;"
try as I might it coming up with this "unknown column y" - so I tried everything and then I was about to get rid of it and give it up as a bad job until I thought I would swap the " for ' and ' for "..... Hoooraaahh it works!
This is the statement that worked
sql='select bob,steve, if(steve>50,"y","n") from table;'
Hope it helps...
To avoid this sort of problem and SQL injection you should really look into using SQL parameters for this, not the Pascal format statement.

Why one AL32UTF8 character not display the I-Acute, yet other one displays the tilde-N?

My Oracle 11g is configured with AL32UTF8
NLS_CHARACTERSET AL32UTF8
Why does the tilde-N display as tilde-N in the second record, but the Acute-I and K
not display with Acute-I and K in the first record?
Additional Information:
The hex code for the Accent-I is CD
When I take the HEX code from the dump and convert it using UNISTR(), the character displays with the accent.
select
unistr('\0052\0045\0059\004B\004A\0041\0056\00CD\004B')
as hex_to_unicode
from dual;
This is probably an issue with whatever client you are using to display the results than your database. What are you using?
You can check if the database results are correct using the DUMP function. If the value in your table has the correct byte sequence for your database character set, you're good.
Edit:
OK, I'm pretty sure your data is bad. You're talking about
LATIN CAPITAL LETTER I WITH ACUTE, which is Unicode code point U+00CD. That is not the same as byte 0xCD. You're using database character set AL32UTF8, which uses UTF-8 encoding. The correct UTF-8 encoding for the U+00CD character is the two-byte sequence 0xC38D.
What you have is UTF-8 byte sequence 0xCD4B, which I'm pretty sure is invalid.
The Oracle UNISTR function takes the code point in UCS-2 encoding, which is roughly the same as UTF-16, not UTF-8.
Demonstration here: http://sqlfiddle.com/#!4/7e9d1f/1

Convert hex to text using SQLite

I am trying to convert from a string representing hex data to the textual encoding (ASCII, UTF-8, etc.) of the hex data using purely SQLite language. Essentially I want the functionality of the X'[hex]' syntax, but applied to a programmatically derived hex string.
I want, for example, select X(hex_data_string) from ..., which is not legal SQLite syntax.
Obviously in the above snippet, I would not necessarily be able to output the data if it was not in a valid textual encoding. That is, if hex_data_string contains control chars, etc., the X() should fail in some way. If this is possible, there would have to be a default character encoding or the desired character encoding would have to be specified somehow.
I am not asking about how to retrieve the hex data string value from the SQLite database and then use C or some other facility to convert it. I am trying to perform this conversion in pure SQLite because I have queries that I check which return a text representation of hex characters representing binary data. Most of the binary data is ASCII, so I want to be able to quickly view the content of the binary data in my query output when applicable.
Intuitively, I figured this could be accomplished by casting the hex data string to a blob and using hex() but that still returns the hex data string.
Any ideas?
Possible duplicates:
SQLite X'...' notation with column data
sqlite char, ascii function
This is quite old but I was looking for the same thing so I thought I'd post:
It seems you can just cast hex data as a varchar to convert it to ascii.
ie:
select cast(data as varchar) from some_table will return a string representation of a binary field (data).
This is not possible in pure SQLite.
As an embedded database, SQLite is designed to provide only pure database functions, and leave the program logic to the the application.
You are supposed to retrieve the hex data string value from the SQLite database and then use C or some other facility to convert it.
If you control the program you're running the queries in, you could install a user-defined function that does this conversion.
This will not be fully useful since it can't really be used inline, but I have done this in the past when I just wanted to convert one row of data and not create/find another utility.
WITH RECURSIVE test(c,cur) as (
select '','686F77647921'
UNION ALL
select c || char((case substr(cur,1,1) when 'A' then 10 when 'B' then 11 when 'C' then 12 when 'D' then 13 when 'E' then 14 when 'F' then 15 else substr(cur,1,1) end)*16
+ (case substr(cur,2,1) when 'A' then 10 when 'B' then 11 when 'C' then 12 when 'D' then 13 when 'E' then 14 when 'F' then 15 else substr(cur,2,1) end)),
substr(cur,3)
from test where length(cur)>0
)
select * from test

SQLite X'...' notation with column data

I am trying to write a custom report in Spiceworks, which uses SQLite queries. This report will fetch me hard drive serial numbers that are unfortunately stored in a few different ways depending on what version of Windows and WMI were on the machine.
Three common examples (which are enough to get to the actual question) are as follows:
Actual serial number: 5VG95AZF
Hexadecimal string with leading spaces: 2020202057202d44585730354341543934383433
Hexadecimal string with leading zeroes: 3030303030303030313131343330423137454342
The two hex strings are further complicated in that even after they are converted to ASCII representation, each pair of numbers are actually backwards. Here is an example:
3030303030303030313131343330423137454342 evaluates to 00000000111430B17ECB
However, the actual serial number on that hard drive is 1141031BE7BC, without leading zeroes and with the bytes swapped around. According to other questions and answers I have read on this site, this has to do with the "endianness" of the data.
My temporary query so far looks something like this (shortened to only the pertinent section):
SELECT pd.model as HDModel,
CASE
WHEN pd.serial like "30303030%" THEN
cast(('X''' || pd.serial || '''') as TEXT)
WHEN pd.serial like "202020%" THEN
LTRIM(X'2020202057202d44585730354341543934383433')
ELSE
pd.serial
END as HDSerial
The result of that query is something like this:
HDModel HDSerial
----------------- -------------------------------------------
Normal Serial 5VG95AZF
202020% test case W -DXW05CAT94843
303030% test case X'3030303030303030313131343330423137454342'
This shows that the X'....' notation style does convert into the correct (but backwards) result of W -DXW05CAT94843 when given a fully literal number (the 202020% line). However, I need to find a way to do the same thing to the actual data in the column, pd.serial, and I can't find a way.
My initial thought was that if I could build a string representation of the X'...' notation, then perhaps cast() would evaluate it. But as you can see, that just ends up spitting out X'3030303030303030313131343330423137454342' instead of the expected 00000000111430B17ECB. This means the concatenation is working correctly, but I can't find a way to evaluate it as hex the same was as in the manual test case.
I have been googling all morning to see if there is just some syntax I am missing, but the closest I have come is this concatenation using the || operator.
EDIT: Ultimately I just want to be able to have a simple case statement in my query like this:
SELECT pd.model as HDModel,
CASE
WHEN pd.serial like "30303030%" THEN
LTRIM(X'pd.serial')
WHEN pd.serial like "202020%" THEN
LTRIM(X'pd.serial')
ELSE
pd.serial
END as HDSerial
But because pd.serial gets wrapped in single quotes, it is taken as a literal string instead of taken as the data contained in that column. My hope was/is that there is just a character or operator I need to specify, like X'$pd.serial' or something.
END EDIT
If I can get past this first hurdle, my next task will be to try and remove the leading zeroes (the way LTRIM eats the leading spaces) and reverse the bytes, but to be honest, I would be content even if that part isn't possible because it wouldn't be hard to post-process this report in Excel to do that.
If anyone can point me in the right direction I would greatly appreciate it! It would obviously be much easier if I was using PHP or something else to do this processing, but because I am trying to have it be an embedded report in Spiceworks, I have to do this all in a single SQLite query.
X'...' is the binary representation in sqlite. If the values are string, you can just use them as such.
This should be a start:
sqlite> select X'3030303030303030313131343330423137454342';
00000000111430B17ECB
sqlite> select ltrim(X'3030303030303030313131343330423137454342','0');
111430B17ECB
I hope this puts you on the right path.

Resources