I'm using Delphi XE7 with FireDAC to access SQLite.
When I put data into a TEXT field, any trailing spaces or #0 characters get truncated.
Is there something I can change in either SQLite or FireDAC to have it preserve the trailing white space?
// The trailing spaces after Command don't come back from SQLite.
fFireDACQuery.ParamByName(kSQLFieldScriptCommands).AsString := 'Command ';
Disable the StrsTrim property. This property is described as:
TFDFormatOptions.StrsTrim
Controls the removing of trailing spaces from string values and zero
bytes from binary values.
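Applied to the snippet from your question, a minimal sketch (using the same fFireDACQuery and kSQLFieldScriptCommands identifiers from your code, and assuming fFireDACQuery is a TFDQuery) would be:
// Turn trimming off before binding the parameter so the trailing space survives.
fFireDACQuery.FormatOptions.StrsTrim := False;
fFireDACQuery.ParamByName(kSQLFieldScriptCommands).AsString := 'Command ';
fFireDACQuery.ExecSQL;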
It also seems that you want to store binary data rather than text. If that is correct, it would be better to define your field's data type as, e.g., BINARY[255], a fixed-length binary string of 255 bytes (255 is the maximum length of the ShortString you use).
You would then access the parameter value for such a field this way:
var
  Data: RawByteString;
begin
  ReadByteDataSomehow(Data);
  FDQuery.FormatOptions.StrsTrim := False;
  FDQuery.SQL.Text := 'INSERT INTO MyTable (MyBinaryField) VALUES (:MyBinaryData)';
  FDQuery.ParamByName('MyBinaryData').AsByteStr := Data;
  FDQuery.ExecSQL;
end;
I have a text column in a table which I need to validate to recognize which records have non UTF-8 characters.
Below is an example record where there are invalid characters.
text = 'PP632485 - Hala A - prace kuchnia Zepelin, wymiana muszli, monta􀄪 tablic i uchwytów na r􀄊czniki, wymiana zamka systemowego'
There are over 3 million records in this table, so I need to validate them all at once and get the rows where this text column has non UTF-8 characters.
I tried the following:
instr(text, chr(26)) > 0 - no records get fetched
text LIKE '%ó%' (tried this for a few invalid characters I noticed) - no records get fetched
update <table> set text = replace(text, 'ó', 'ó') - no change seen in text
Is there anything else I can do?
Appreciate your input.
This is Oracle 11.2
The characters you're seeing might be invalid for your data, but they are valid AL32UTF8 characters; otherwise they would not be displayed correctly. It's up to you to determine which character set contains the correct set of characters.
For example, to check if a string only contains characters in the US7ASCII character set, use the CONVERT function. Any character that cannot be converted into a valid US7ASCII character will be displayed as ?.
The example below first replaces any existing question marks with the string '~~~~~', then converts, and then checks for the existence of a question mark in the converted text.
WITH t (c) AS
(SELECT 'PP632485 - Hala A - prace kuchnia Zepelin, wymiana muszli, monta􀄪 tablic i uchwytów na r􀄊czniki, wymiana zamka systemowego' FROM DUAL UNION ALL
SELECT 'Just a bit of normal text' FROM DUAL UNION ALL
SELECT 'Question mark ?' FROM DUAL),
converted_t (c) AS
(
SELECT
CONVERT(
REPLACE(c,'?','~~~~~')
,'US7ASCII','AL32UTF8')
FROM t
)
SELECT CASE WHEN INSTR(c,'?') > 0 THEN 'Invalid' ELSE 'Valid' END as status, c
FROM converted_t
;
STATUS   C
-------  -------------------------------------------------------------
Invalid  PP632485 - Hala A - prace kuchnia Zepelin, wymiana muszli, montao??? tablic i uchwyt??w na ro??Sczniki, wymiana zamka systemowego
Valid    Just a bit of normal text
Valid    Question mark ~~~~~
Again, this is just an example - you might need a less restrictive character set.
--UPDATE--
With your data: it's up to you to determine how you want to continue. Determine what a good target character set is. Contrary to what I said earlier, it's not mandatory to pass a source character set argument to the CONVERT function.
Things you could try:
Check which characters show up as '�' when converting from UTF8 to AL32UTF8:
select * from G2178009_2020030114_dinllk
WHERE INSTR(CONVERT(text ,'AL32UTF8','UTF8'),'�') > 0;
Check if the converted text matches the original text. In this example I'm converting to UTF8 and comparing against the original text. If a character cannot be represented in the target character set, the converted text will differ from the original.
select * from G2178009_2020030114_dinllk
WHERE
CONVERT(text ,'UTF8') = text;
This should be enough tools for you to diagnose your data issue.
As shown by previous comments, you can detect the issue in place, but it's difficult to automatically correct in place.
I have used https://pypi.org/project/ftfy/ to correct invalidly encoded characters in large files.
It guesses what the actual UTF8 character should be, and there are some controls on how it does this. For you, the problem is that you have to pull the data out, fix it, and put it back in.
So assuming you can get the data out to the file system to fix it, you can locate files with bad encodings with something like this:
find . -type f | xargs -I {} bash -c "iconv -f utf-8 -t utf-16 {} &>/dev/null || echo {}"
This produces a list of files that potentially need to be processed by ftfy.
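If you prefer to do the fixing in code rather than at the command line, a minimal Python sketch could look like the following (the file names broken.txt and fixed.txt are placeholders):
import ftfy

# Read the exported text, let ftfy repair the badly encoded characters,
# and write the repaired text to a new file.
with open('broken.txt', encoding='utf-8', errors='replace') as src, \
     open('fixed.txt', 'w', encoding='utf-8') as dst:
    for line in src:
        dst.write(ftfy.fix_text(line))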
I'm trying to create a table and I get an error. Could someone please let me know how to add a column whose name starts with an integer? The statement and the error are below.
Create table mutablecode
(
4th_Procedure_Code varchar(20)
);
Syntax error, expected something like ','
between an integer and the word 'th_Procedure_Code'
A valid object name in Teradata consists of a-z, A-Z, 0-9, #, _, and $, but must not start with a digit.
If you really need this column name you must double quote it (then almost any character is allowed):
"4th_Procedure_Code" varchar(20)
Remark: According to Standard SQL a double quoted name is case-sensitive, but in Teradata it's still case-insensitive.
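Applied to your statement, the create would then look like this (only the column name changes):
Create table mutablecode
(
   "4th_Procedure_Code" varchar(20)
);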
I am trying to remove some invisible characters from a table. I tried this query:
UPDATE table SET text = REPLACE(text, x'202B', '' )
with no luck. I also tried selecting it using:
SELECT REPLACE(text, x'202B', '####') AS text FROM table
but nothing is replaced, so I'm guessing that it can't find x'202B' in the text column, but if I use this query:
SELECT * FROM table WHERE text REGEXP "[\x202B]"
I do get results.
x'202B' is not a single, invisible Unicode character; it is a blob containing the two ASCII characters space (0x20) and + (0x2B).
All SQLite strings are encoded in UTF-8.
When you are constructing strings from bytes manually, you have to use the same encoding:
x'E280AB'
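So, assuming table and text are stand-ins for your real table and column names, the update from your question would become something like:
UPDATE table SET text = REPLACE(text, x'E280AB', '');
The same x'E280AB' literal should also make your SELECT/REPLACE test match.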
When doing a select of all columns from a table consisting of 86 columns in SQLA, I always get the error Row size or Sort Key size overflow. The only way to avoid this error is to trim down the number of columns in the select, but this is an unconventional solution. There has to be a way to select all columns from this table in one select statement.
Bounty
I am adding this bounty because I cannot hack my way past this issue any longer. There has to be a solution to this. Right now, I am selecting from a table with Unicode columns. I am assuming this is causing the row size to exceed capacity. When I remove Session Character Set=UTF8 from my connection string, I get the error The string contains an untranslatable character. I am using the .NET data provider 14.0.0.1. Is there a way to increase the size?
Update
Rob, you never cease to impress! Your suggestion of using UTF16 works. It even works in SQLA after I update my ODBC config. I think my problem all along has been my lack of understanding of ASCII, Latin, UTF8, and UTF16.
We also have an 80-column table that consists of all Latin columns, a few of which are varchar(1000). I get the same error in SQLA when selecting from it in UTF8 and UTF16, but I can select from it just fine after switching to ASCII or Latin mode in my ODBC config.
Rob, can you provide insight as to what's happening here? My theory is that, because the table is in the Latin set, using UTF8 or UTF16 causes a conversion to a larger number of bytes, which results in the error, especially for the varchar(1000) columns. If I use Latin as my session character set, no conversion is done and I get the string in its native encoding. As for the issue in question, UTF8 fails because the encoding cannot be "downgraded"?
Per request, here is the DDL of the table in question:
CREATE MULTISET TABLE mydb.mytable ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
FIELD1 VARCHAR(214) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
FIELD2 VARCHAR(30) CHARACTER SET UNICODE CASESPECIFIC,
FIELD3 VARCHAR(60) CHARACTER SET UNICODE CASESPECIFIC NOT NULL,
FIELD4 VARCHAR(4000) CHARACTER SET UNICODE CASESPECIFIC,
FIELD5 VARCHAR(900) CHARACTER SET UNICODE CASESPECIFIC,
FIELD6 VARCHAR(900) CHARACTER SET UNICODE CASESPECIFIC,
FIELD7 VARCHAR(900) CHARACTER SET UNICODE CASESPECIFIC,
FIELD8 VARCHAR(900) CHARACTER SET UNICODE CASESPECIFIC,
FIELD9 VARCHAR(900) CHARACTER SET UNICODE CASESPECIFIC,
FIELD10 VARCHAR(900) CHARACTER SET UNICODE CASESPECIFIC,
FIELD11 VARCHAR(3600) CHARACTER SET UNICODE CASESPECIFIC,
FIELD12 VARCHAR(3600) CHARACTER SET UNICODE CASESPECIFIC,
FIELD13 VARCHAR(3600) CHARACTER SET UNICODE CASESPECIFIC,
FIELD14 VARCHAR(3600) CHARACTER SET UNICODE CASESPECIFIC)
PRIMARY INDEX ( FIELD1 );
Without seeing your table definition, have you considered using UTF16 instead of UTF8 for your SESSION CHARSET?
Some more research on your error message found this post suggesting that UTF16 may afford you the ability to return records that UTF8 otherwise will not.
Edit:
If you recall from the link that I shared above, for a given VARCHAR(n) the bytes to store would be as follows:
LATIN: n bytes
UTF8: n*3 bytes
UTF16: n*2 bytes
This would mean that a VARCHAR(4000) UNICODE field in a UTF8 session should require 12KB. If you have to deal with UNICODE data consistently, it may be to your advantage to leave or change your default session character set to UTF16. In my experience I have not had to work with UNICODE data, so I couldn't tell you what pitfalls changing your character set may introduce for LATIN data elsewhere in your database(s).
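As a rough back-of-the-envelope check against the DDL above (using the multipliers listed, and with the caveat that the exact row/response size limit depends on your release): the UNICODE columns add up to 30 + 60 + 4000 + 6*900 + 4*3600 = 23,890 characters, plus 214 bytes for the LATIN column. In a UTF8 session that is roughly 23,890 * 3 + 214 ≈ 71,900 bytes, while in a UTF16 session it is roughly 23,890 * 2 + 214 ≈ 48,000 bytes, which would explain why the UTF16 session stays under a limit in the 64KB range where the UTF8 session does not.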
Hope this helps.
I have an SQLite table that contains a BLOB I need to do a size/length check on. How do I do that?
According to the documentation, length(blob) only works on text and will stop counting after the first NULL. My tests confirmed this. I'm using SQLite 3.4.2.
I haven't had this problem, but you could try length(hex(blob))/2
Update (Aug-2012):
For SQLite 3.7.6 (released April 12, 2011) and later, length(blob_column) works as expected with both text and binary data.
For me, length(blob) works just fine and gives the same results as the other approach.
As an additional answer: a common problem is that SQLite effectively ignores the declared column type of a table, so if you store a string in a blob column, that value becomes a string for that row. As length works differently on strings, it will then only return the number of characters before the first 0 octet. It's easy to store strings in blob columns by accident, because you normally have to cast explicitly to insert a blob:
insert into table values ('xxxx');                 -- string insert
insert into table values (cast('xxxx' as blob));   -- blob insert
To get the correct length for values stored as strings, you can cast the argument of length to blob:
select length(string-value-from-blob-column);      -- treats the blob column as a string
select length(cast(blob-column as blob));          -- correctly returns the blob length
The reason why length(hex(blob-column))/2 works is that hex doesn't stop at internal 0 octets, and the generated hex string doesn't contain 0 octets anymore, so length returns the correct (full) length.
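A small sketch that demonstrates the difference, using a three-byte value with an embedded 0 octet (expected results in the comments, assuming a UTF-8 database):
select length(cast(x'610062' as text));        -- 1: the text length stops at the embedded 0 octet
select length(x'610062');                      -- 3: the blob length counts every byte
select length(hex(cast(x'610062' as text)))/2; -- 3: hex sees every byte, so the full length is recovered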
Example of a select query that does this, getting the length of the blob in column myblob, in table mytable, in row 3:
select length(myblob) from mytable where rowid=3;
The LENGTH() function in SQLite 3.7.13 on Debian 7 does not work for me, but LENGTH(HEX())/2 works fine.
# sqlite --version
3.7.13 2012-06-11 02:05:22 f5b5a13f7394dc143aa136f1d4faba6839eaa6dc
# sqlite xxx.db "SELECT docid, LENGTH(doccontent), LENGTH(HEX(doccontent))/2 AS b FROM cr_doc LIMIT 10;"
1|6|77824
2|5|176251
3|5|176251
4|6|39936
5|6|43520
6|494|101447
7|6|41472
8|6|61440
9|6|41984
10|6|41472