How do I quote a UTF-8 string literal in SQLite3?

I'm looking to encode and store Unicode in a SQLite database. Is there any way to write a raw UTF-8 (Unicode) string literal in a SQL query?
I'm looking for something similar to Java, where I can toss a \u00E9 into a string and have it automagically upconvert to Unicode.

What language are you using? SQLite handles Unicode just fine; it's creating the literals in your host language that is less obvious.
$ sqlite3 junk.sqlite
SQLite version 3.6.22
sqlite> create table names (id integer primary key, name string);
sqlite> insert into names values (null,
'î℉ yõù g𐌹ѷЄ ΣϘГくטƏ UTF-8, it stores it');
sqlite> select * from names;
1|î℉ yõù g𐌹ѷЄ ΣϘГくטƏ UTF-8, it stores it

SQLite doesn't have escape sequences. But your programming language probably does.
# in Python
import sqlite3
db = sqlite3.connect("example.db")  # assumes MyTable already exists
db.execute("INSERT INTO MyTable(MyColumn) VALUES('\u00E9')")
or, better, bind the value as a parameter:
db.execute("INSERT INTO MyTable(MyColumn) VALUES(?)", ['\u00E9'])
If for some reason you have to write a UTF-8 literal in pure SQL, you can do something like:
sqlite> SELECT CAST(X'C3A9' AS TEXT);
é
Edit: Since this answer was originally written, a CHAR function has been added to SQLite. So now, you could write
INSERT INTO MyTable(MyColumn) VALUES(CHAR(233))
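CHAR also accepts several code points at once, so a longer literal can be built in a single call:
sqlite> SELECT CHAR(233, 116, 233);
été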

If your problem is reinterpretation of escape sequences in SQLite, you can (ab)use json_extract, e.g.:
UPDATE `tableToFix` SET `columnToFix` = json_extract('"' || `columnToFix` || '"', '$');
INSERT INTO test VALUES (json_extract('"P\u0159\u00edli\u0161 \u017elu\u0165ou\u010dk\u00fd k\u016f\u0148 \u00fap\u011bl \u010f\u00e1belsk\u00e9 \u00f3dy."', '$'));
Note the quote handling: a valid JSON string starts and ends with ", so you must add the quotes yourself before calling json_extract.
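For example, a JSON \u escape round-trips to the raw character:
sqlite> SELECT json_extract('"caf\u00e9"', '$');
café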

If you configure your database to use UTF-8 (I believe this is the default for many installations; do PRAGMA encoding="UTF-8"; at schema creation time to be certain), this shouldn't be an issue.
If you send SQLite3 a set of characters encoded in UTF-8, it should have no problem dealing with it.
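To check what an existing database uses, query the pragma without a value from the shell:
sqlite> PRAGMA encoding;
UTF-8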
If Java lets you "toss a \u00E9 into a string", I'd just use that, and ensure that when you place the string into the database you convert it to a UTF-8 byte encoding using whatever mechanism Java provides. I do not believe SQLite provides, or needs to provide, this for you.

Related

How to extract a Teradata .TPT file with UTF-8 encoding

We are currently extracting several Teradata .TPT files that we will upload to AWS S3, but the files are coming out ANSI-encoded.
I need them to be encoded as UTF-8.
You must specify the character set in your TPT script. At the top add:
USING CHARACTER SET UTF8
The tricky part is that UTF8 here reserves up to 3 bytes per character, so in your DEFINE SCHEMA you must triple the size of each field.
For example if your schema looks like:
DEFINE SCHEMA s_some_export
(
status VARCHAR(20),
userid VARCHAR(20),
firstname VARCHAR(64)
);
You'll have to triple the values to accommodate your UTF8 characters:
DEFINE SCHEMA s_some_export
(
status VARCHAR(60),
userid VARCHAR(60),
firstname VARCHAR(192)
);
Sometimes, because I'm lazy, I define my TPT with USING CHARACTER SET UTF16 so that I only need to double each field size (the math is easier). BUT it means I have to convert the output to UTF-8 after extraction. On Linux this is just: iconv -f UTF-16LE -t UTF-8 myoutputfile.csv > myoutputfile.utf8.csv
Some caveats:
If your table's field is defined as CHAR with CHARACTER SET LATIN, you may run into column size issues with your schema.
Dates and timestamps can get weird, as they don't need to be doubled, so defining them as VARCHAR in your schema can get you into trouble. You may have to fuss around a bit here. My suggestion: change the view from which you are selecting the data for your TPT to CAST(yourdate AS VARCHAR(10)) AS yourdate, then use VARCHAR(30) in your schema so you don't have to think about the field types while defining it. This means extra CPU overhead in your extraction, but unless you are running tight on resources I think it's worth it. I'm also very lazy that way, and always happy to just get the damned TPT to extract data without much debugging.
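A minimal sketch of that view change (the view, database, and column names here are hypothetical):
REPLACE VIEW v_some_export AS
SELECT status,
       userid,
       CAST(yourdate AS VARCHAR(10)) AS yourdate  -- fixed-width text, so the TPT schema can treat it like any other VARCHAR
FROM some_db.some_table;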

Escape chars for SQLite3 command (without prepare)

I am creating a C++ program which will output a series of SQL statements (create, insert, etc) and write them to a file. This file will be used to create and populate a SQLite3 database.
I need to ensure that any values inserted are properly escaped so that they fit within the quoted string in the insert statement. Since there is no SQLite database available (this program just writes to a text file), I cannot use prepared statements. Can someone tell me which characters need to be escaped, and how?
So far I've only found that the ' character needs to be escaped with another '.
Inside a string, the only character to be escaped is the quote ' itself.
As for table/column names, you need to quote them if they conflict with SQL keywords.
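For example (table and column names here are made up), a quote inside a value is escaped by doubling it, and a name that collides with a keyword can be double-quoted:
INSERT INTO songs (artist) VALUES ('O''Brien');  -- stored as O'Brien
CREATE TABLE "order" (id INTEGER PRIMARY KEY);   -- "order" would otherwise clash with the keyword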

Opposite of HEX() in SQLite?

I have this simple query that returns a bunch of guids as hexadecimal strings:
SELECT HEX(guid) FROM table;
One of them is for instance 43F4124307108902B7A919F4D4D0770D. Then imagine I want to get the record with this guid, so I write a query like this:
SELECT * FROM table WHERE guid = '43F4124307108902B7A919F4D4D0770D';
Of course, this will not work, since the string is interpreted as plain text and not converted to the blob its hex digits represent. I looked through the documented functions, but couldn't find one that takes a hexadecimal string and converts it to a blob.
While writing the question I found the answer. I simply had to add an X before the string. Like this:
SELECT * FROM table WHERE guid = X'43F4124307108902B7A919F4D4D0770D';
I figured I should post the question anyway, since none of the "Similar Questions" answer this. What I was looking for was not a function but a literal, and once I realized that, I quickly found the answer.
As of SQLite 3.41.0, unhex() is also supported:
SQLite version 3.41.0 2023-02-08 12:47:37
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> select unhex('41');
A
sqlite> select hex('a');
61
sqlite> .q
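With unhex() available, the lookup from the question can also be written without the X'' literal; unhex() returns a BLOB, so the comparison matches the stored guid just like the X'' form:
SELECT * FROM table WHERE guid = unhex('43F4124307108902B7A919F4D4D0770D');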

Syntax for using collate nocase in a SQLite replace function

I have an existing database where they created their own Unicode collation sequence. I'm trying to use the following code and get a "no such collation sequence" exception. Can anybody help with the syntax to use "collate nocase" with this code?
update Songs set
SongPath = replace (SongPath, 'Owner.Funkytown', 'Jim');
Dump the database (via the shell), edit the output SQL (find the column definitions and add COLLATE NOCASE), then recreate the database.
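A sketch of that round trip in the sqlite3 shell (database file names here are hypothetical; the edit itself happens in your text editor):
$ sqlite3 songs.db
sqlite> .output dump.sql
sqlite> .dump
sqlite> .quit
(edit dump.sql so the column definition reads: SongPath TEXT COLLATE NOCASE)
$ sqlite3 new_songs.db ".read dump.sql"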

Classic ASP, SQL Server and character encodings

I have a classic ASP page that gets POSTed to. The data gets POSTed as UTF-8 (I can see this in Fiddler). I then open an ADODB connection to a database and store the data in a VARCHAR field. If the data can be represented in 8859-1 (e.g. iñtërnâtiônàlizætiøn) it is stored correctly in the varchar field. If I try strings that can't be mapped to 8859-1 (e.g. Здравствуйте!) I get ????????????!. This all makes sense, as the varchar field cannot hold Unicode. I also understand that using an nvarchar field should enable me to store these strings.
My question is this: what settings in SQL Server or in the ADODB object control how the strings are converted from UTF-8 to 8859-1? Does VBScript (ASP) send the strings to ADODB.Connection.Execute as UTF-8 (or, as I suspect it actually does, UTF-16), with the database itself handling the conversion? Is this controlled by the collation of the database (SQL_Latin1_General_CP1_CI_AS in this case)?
If you switch to using NVARCHAR instead, you'll need to remember to use the N prefix in your SQL commands whenever you use a string which is Unicode, like so:
INSERT INTO SOME_TABLE (someField) VALUES (N'Some Unicode Text')
SELECT * FROM SOME_TABLE WHERE someField=N'Some Unicode Text'
If you don't do this, the strings won't be treated as Unicode, and your data will be silently converted to Latin1 (or whatever the default character set is for the relevant database/table/field), even if that field is an NVARCHAR.
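For example, with a Latin1 default collation the unprefixed literal is mangled before it is ever compared or stored, while the N'' literal survives:
SELECT 'Здравствуйте!' AS without_n, N'Здравствуйте!' AS with_n;
-- without_n: ????????????!   with_n: Здравствуйте!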
You are correct.
VBScript and ADODB only know strings as Unicode (or UTF-16, as it's sometimes referred to).
It's the DB's collation settings that determine how the VARCHAR fields are encoded.
In SQL_Latin1_General_CP1_CI_AS it's really the CP1 part that determines the code page to use. In this case, 1 is a legacy reference to Windows-1252, which is a superset of ISO-8859-1.
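You can confirm which code page a collation maps to with COLLATIONPROPERTY:
SELECT COLLATIONPROPERTY('SQL_Latin1_General_CP1_CI_AS', 'CodePage');
-- returns 1252, i.e. Windows-1252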
