SQLite database supporting Unicode data - sqlite

I'm writing a Java Swing application that needs to accept Unicode strings dragged into a JTable. Is it possible to store Unicode data in an SQLite database? If so, which SQLite version supports Unicode? I need the free SQLite, not a paid edition.

SQLite always stores text data as Unicode, using the Unicode encoding specified when the database was created. The database driver itself takes care to return the data as the Unicode string in the encoding used by your language/platform.
If you have conversion problems, either your application tried to store an ASCII string without converting it to Unicode, or you tried to read one value and force a conversion on it.
SQLite uses a kind of dynamic typing, where each value is stored using a specific storage class. A column's type specifies the affinity or how the value is treated. For example:
A column with NUMERIC affinity may contain values using all five storage classes. When text data is inserted into a NUMERIC column, the storage class of the text is converted to INTEGER or REAL if the text is a well-formed integer or real literal.
There are five storage classes, NULL, INTEGER, REAL, TEXT, BLOB. TEXT stores string data using the Unicode encoding specified for the database (UTF-8, UTF-16BE or UTF-16LE).
What specific problem are you facing, or is this a general question?
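As a quick sanity check, here is a minimal sketch in Python (using the standard-library sqlite3 module instead of Java, purely for illustration; the table and column names are made up) that stores a few non-ASCII strings in a TEXT column and reads them back:

```python
import sqlite3

# In-memory database; the "greetings" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE greetings (id INTEGER PRIMARY KEY, body TEXT)")

samples = ["Здравствуйте!", "こんにちは", "iñtërnâtiônàlizætiøn"]
conn.executemany("INSERT INTO greetings (body) VALUES (?)", [(s,) for s in samples])

# Read the strings back and ask SQLite how it stored them.
rows = [r[0] for r in conn.execute("SELECT body FROM greetings ORDER BY id")]
encoding = conn.execute("PRAGMA encoding").fetchone()[0]
storage = conn.execute("SELECT typeof(body) FROM greetings LIMIT 1").fetchone()[0]

print(rows == samples, encoding, storage)
```

A JDBC driver should behave the same way as long as you pass Java strings through bound parameters instead of converting them to bytes yourself.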

SQLite always uses Unicode strings.

sqlite3 doesn't fully support Unicode. There is a wrapper class called CppSQLite3 which fully supports Unicode.

Related

Does SQLite actually support DATE type?

After reading https://sqlite.org/datatype3.html which states
"SQLite does not have a storage class set aside for storing dates
and/or times."
but still being able to run this
CREATE TABLE User (ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, BORN_ON DATE NULL)
and then see BORN_ON listed as a Date column in "DB Browser for SQLite",
I start to wonder whether SQLite really supports a Date type or is just "faking" the support using other types. And even if so, why does DB Browser see it as a Date? Is any meta info stored inside the DB?
SQLite does not fake Date with Numerics.
There is no Date data type in SQLite.
In Datatypes In SQLite Version 3 it is explained clearly that:
SQLite uses a more general dynamic type system
Instead of data types there are 5 Storage Classes: NULL, INTEGER, REAL, TEXT and BLOB.
Also:
Any column in an SQLite version 3 database, except an INTEGER PRIMARY
KEY column, may be used to store a value of any storage class.
So when you use Date as the data type of a column in the CREATE TABLE statement you are not restricted to store in it only date-like values. Actually you can store anything in that column.
Tools like "DB Browser for SQLite" and others may offer various data types to select from to define a column when you create the table.
The selection of the data type that you make is not restrictive, but it is rather indicative of what type of data you want to store in a column.
In fact, you can create a table without even declaring the data types of the columns:
CREATE TABLE tablename(col1, col2)
or use fictional data types:
CREATE TABLE tablename(col1 somedatatype, col2 otherdatatype)
and insert values of any data type:
INSERT INTO tablename(col1, col2) VALUES
(1, 'abc'),
('XYZ', '2021-01-06'),
(null, 3.5)
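To see this for yourself, here is a small sketch (Python's built-in sqlite3 module, in-memory database) that runs the statements above with the fictional data types and asks SQLite which storage class each value ended up with:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Fictional declared types are accepted without complaint.
conn.execute("CREATE TABLE tablename(col1 somedatatype, col2 otherdatatype)")
conn.execute(
    "INSERT INTO tablename(col1, col2) VALUES "
    "(1, 'abc'), ('XYZ', '2021-01-06'), (null, 3.5)"
)

# typeof() reports the storage class of each stored value.
classes = conn.execute(
    "SELECT typeof(col1), typeof(col2) FROM tablename ORDER BY rowid"
).fetchall()
print(classes)
```

Each row reports its own storage class, independent of what the column was declared as.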
Based on what Colonel Thirty Two suggested (read more on the page) it seems that when you declare a field as Date its affinity will be numeric.
So SQLite "fakes" Date with Numerics.
And even if so why the DB Browser see it as a Date? Any meta info stored inside the DB?
Yes, it simply stores the type name used when the column was created. The linked page calls it "declared type". In this case you get NUMERIC affinity (DATE is even given as one of the examples in 3.1.1) and it behaves like any other column with this affinity:
A column with NUMERIC affinity may contain values using all five storage classes. When text data is inserted into a NUMERIC column, the storage class of the text is converted to INTEGER or REAL (in order of preference) if the text is a well-formed integer or real literal, respectively. If the TEXT value is a well-formed integer literal that is too large to fit in a 64-bit signed integer, it is converted to REAL. For conversions between TEXT and REAL storage classes, only the first 15 significant decimal digits of the number are preserved. If the TEXT value is not a well-formed integer or real literal, then the value is stored as TEXT. For the purposes of this paragraph, hexadecimal integer literals are not considered well-formed and are stored as TEXT. (This is done for historical compatibility with versions of SQLite prior to version 3.8.6 2014-08-15 where hexadecimal integer literals were first introduced into SQLite.) If a floating point value that can be represented exactly as an integer is inserted into a column with NUMERIC affinity, the value is converted into an integer. No attempt is made to convert NULL or BLOB values.
A string might look like a floating-point literal with a decimal point and/or exponent notation but as long as the value can be expressed as an integer, the NUMERIC affinity will convert it into an integer. Hence, the string '3.0e+5' is stored in a column with NUMERIC affinity as the integer 300000, not as the floating point value 300000.0.
So if you insert dates looking like e.g. "2021-01-05" they will be stored as strings. But you can also insert strings which don't look like dates, and if you insert "20210105" it will be stored as the number 20210105.
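A minimal sketch of that behaviour, using Python's sqlite3 module (the table is a cut-down version of the one in the question, and the sample values are just illustrations):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DATE is only a declared type; the column gets NUMERIC affinity.
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, born_on DATE)")

values = ["2021-01-05", "20210105", "not a date"]
conn.executemany("INSERT INTO user (born_on) VALUES (?)", [(v,) for v in values])

# Every value was bound as a string; typeof() shows what was actually stored.
stored = conn.execute(
    "SELECT born_on, typeof(born_on) FROM user ORDER BY id"
).fetchall()
print(stored)
```

Only the string that is a well-formed integer literal changes storage class; the other two stay TEXT.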
You can use CHECK constraints to prevent inserting non-date strings.
See also https://sqlite.org/lang_datefunc.html which says what (string and number) formats date/time functions expect.
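One possible CHECK constraint, sketched in Python with the sqlite3 module. It assumes you want to accept only NULL or text in YYYY-MM-DD form, and the helper name try_insert is made up. It compares with IS rather than =, because a CHECK expression that evaluates to NULL does not count as a violation:

```python
import sqlite3

def try_insert(conn, value):
    """Return True if the row was accepted, False if the CHECK rejected it."""
    try:
        conn.execute("INSERT INTO user (born_on) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
# date(x) returns a normalized 'YYYY-MM-DD' string, or NULL if x is not a valid date.
# "x IS date(x)" is therefore true only for NULL or an already-normalized date string.
conn.execute(
    "CREATE TABLE user ("
    " id INTEGER PRIMARY KEY,"
    " born_on DATE CHECK (born_on IS date(born_on)))"
)

results = {v: try_insert(conn, v) for v in ["2021-01-05", "not a date", "20210105", None]}
print(results)
```

If NULL should be rejected as well, adding NOT NULL to the column is enough.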

How to extract a Teradata .TPT file with UTF-8 encoding

We are currently extracting several Teradata .TPT files that we will upload to AWS S3. However, the files are coming out ANSI-encoded.
I need them to be encoded as UTF-8.
You must specify the character set in your TPT script. At the top add:
USING CHARACTER SET UTF8
The tricky part is that UTF8 here is sized at 3 bytes per character, so in your DEFINE SCHEMA you must triple the size of each field.
For example if your schema looks like:
DEFINE SCHEMA s_some_export
(
status VARCHAR(20),
userid VARCHAR(20),
firstname VARCHAR(64)
);
You'll have to triple the values to accommodate your UTF8 characters:
DEFINE SCHEMA s_some_export
(
status VARCHAR(60),
userid VARCHAR(60),
firstname VARCHAR(192)
);
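The sizing arithmetic can be sketched in a few lines of Python. The helper export_size and the BYTES_PER_CHAR table are made up for illustration; the multipliers are the ones described in this answer, not something read from Teradata:

```python
# Hypothetical helper: scale a column's character length to the size
# the TPT schema needs for a given session character set.
BYTES_PER_CHAR = {"UTF8": 3, "UTF16": 2}

def export_size(char_len, charset="UTF8"):
    return char_len * BYTES_PER_CHAR[charset]

# Character lengths from the example schema above.
columns = {"status": 20, "userid": 20, "firstname": 64}
utf8_sizes = {name: export_size(n) for name, n in columns.items()}

# Why the multiplier is needed: UTF-8 byte counts vary per character.
byte_counts = {ch: len(ch.encode("utf-8")) for ch in ["A", "é", "€"]}

print(utf8_sizes)
print(byte_counts)
```

The field has to be sized for the worst case, even if most of your data is plain ASCII at one byte per character.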
Sometimes, because I'm lazy, I define my TPT with USING CHARACTER SET UTF16 so that I only need to double each field size (the math is easier). BUT it means I have to convert the output to UTF8 after extraction. In Linux this would just be iconv -f UTF-16LE -t UTF-8 myoutputfile.csv > myoutputfile.utf8.csv
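If iconv isn't available (on a Windows box, say), the same conversion can be sketched in Python. The function name utf16le_to_utf8 is made up, and it assumes the export really is UTF-16LE:

```python
def utf16le_to_utf8(src_path, dst_path, chunk_chars=1 << 20):
    """Re-encode a UTF-16LE text file as UTF-8, streaming in chunks."""
    # newline="" turns off newline translation so line endings are copied as-is.
    with open(src_path, "r", encoding="utf-16-le", newline="") as fin, \
         open(dst_path, "w", encoding="utf-8", newline="") as fout:
        while True:
            chunk = fin.read(chunk_chars)
            if not chunk:
                break
            fout.write(chunk)

# Usage (file names taken from the iconv example):
# utf16le_to_utf8("myoutputfile.csv", "myoutputfile.utf8.csv")
```

Reading in chunks keeps memory use flat for large extracts.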
Some caveats:
If your table's field is defined as CHAR and CHARACTER SET LATIN then you may run into column size issues with your schema.
Dates and Timestamps can get weird as they don't need to be doubled, so defining them as VARCHAR in your schema can get you into trouble. You may have to fuss around a bit here. My suggestion would be to change the view from which you are selecting the data for your TPT to use CAST(yourdate AS VARCHAR(10)) AS yourdate, and then use VARCHAR(30) in your schema so you don't have to think about the field types while defining your schema. This means extra CPU overhead in your extraction, but unless you are running tight on resources I think it's worth it. I'm also very lazy that way and always happy to just get the damned TPT to extract data without much debugging.

The Difference between SQLite NVARCHAR and NVARCHAR2

I don't know what the difference is between an SQLite NVARCHAR column and an NVARCHAR2 column.
I know that NVARCHAR is a Unicode-only text column, but what about NVARCHAR2?
There is a difference. In a way...
Here's the thing:
As Lasse V. Karlsen says, SQLite does not act on the types you mentioned nor does it restrict the length by an argument passed in like in NVARCHAR(24) (but you could do check constraints to restrict length).
So why are these available in SQLite Expert (and other tools)?
This info will be saved in the database schema (please check https://www.sqlite.org/datatype3.html#affinity and http://www.sqlite.org/pragma.html#pragma_table_info). So should you bother to set these when creating an SQLite db, given that SQLite itself will not use them?
Yes if you will be using any tool to generate code from the schema! Maybe somebody will ask you to transfer the db to MSSQL, then there are some great tools that will use the schema and will map your SQLite to MSSQL in a blink. Or maybe you will use some .NET tool to map the tables into POCO classes, and these can also use the schema to map to the correct type and they will also use the restrictions and transfer these into data annotations on the properties that the columns map to. And EntityFramework 7 will have support built in for SQLite and their code generation will surely make use of the schema.
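Both points can be checked with a short sketch in Python's sqlite3 module (the person table and its columns are made up): the declared type is kept verbatim in the schema, the length in parentheses is not enforced, and a CHECK constraint is what actually limits length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " name NVARCHAR(24) CHECK (length(name) <= 24),"
    " nick NVARCHAR2(24))"
)

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
declared = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(person)")}

# nick has no CHECK, so a 100-character value is accepted despite the "(24)".
conn.execute("INSERT INTO person (name, nick) VALUES (?, ?)", ("short", "x" * 100))
nick_len = conn.execute("SELECT length(nick) FROM person").fetchone()[0]

# name has a CHECK, so a 25-character value is rejected.
try:
    conn.execute("INSERT INTO person (name, nick) VALUES (?, ?)", ("y" * 25, "ok"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(declared, nick_len, rejected)
```

The declared dictionary is the same information a code generator or migration tool would read.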
There is no difference.
SQLite does not operate with strict data types like that, it has "storage classes".
If you check the official documentation you'll find this rule, one of five used to determine which type affinity to assign to a column from the data type you declare:
If the declared type of the column contains any of the strings "CHAR", "CLOB", or "TEXT" then that column has TEXT affinity. Notice that the type VARCHAR contains the string "CHAR" and is thus assigned TEXT affinity.
There are 5 rules in total, but rule 2 covers both NVARCHAR and NVARCHAR2, and both give the column TEXT affinity.
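A quick way to confirm that both declarations behave identically, sketched with Python's sqlite3 module (table t and its columns are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a NVARCHAR(10), b NVARCHAR2(10))")

# Both declared types contain "CHAR", so both columns get TEXT affinity:
# an integer is converted to text on the way in, and Unicode text is kept as text.
conn.execute("INSERT INTO t (a, b) VALUES (?, ?)", (123, 123))
conn.execute("INSERT INTO t (a, b) VALUES (?, ?)", ("Здравствуйте!", "Здравствуйте!"))

types = conn.execute("SELECT typeof(a), typeof(b) FROM t ORDER BY rowid").fetchall()
print(types)
```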

Oracle character set data is not being displayed in asp.net

I have stored Urdu data in an Oracle 9i database that uses the WE8MSWIN1252 character set, and now I want to display this data in a browser with ASP.NET. The data is not displayed correctly: it shows up as Chinese-looking characters.
Can anyone tell me how I can translate this data back into actual Urdu?
You cannot properly encode Urdu data in CHAR or VARCHAR2 data types if the database character set is Windows-1252. You can only encode the characters that are part of the Windows-1252 character set which is a Western European character set. You would need to either use NCHAR and NVARCHAR2 data types (assuming your national character set supports Urdu) or change the database character set of the database. There is a chapter in the Globalization Support Guide that discusses how to change the character set of an existing database.
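The underlying limitation is easy to demonstrate outside Oracle. This sketch uses Python's cp1252 codec as a stand-in for WE8MSWIN1252 (an assumption: the two are meant to be the same code page, but this is not Oracle's own conversion code):

```python
urdu = "اردو"  # the word "Urdu" written in Urdu script

# Windows-1252 has no code points for these characters, so encoding fails.
try:
    urdu.encode("cp1252")
    fits_in_cp1252 = True
except UnicodeEncodeError:
    fits_in_cp1252 = False

# A Unicode encoding round-trips the same text without loss.
round_trip = urdu.encode("utf-8").decode("utf-8")

print(fits_in_cp1252, round_trip == urdu)
```

This is why the fix has to be on the storage side (national character set columns or a Unicode database character set) and not in the ASP.NET display code.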

Classic ASP, SQL Server and character encodings

I have a classic ASP page that gets POSTed to. The data gets POSTed as UTF-8 (I can see this in Fiddler). I then open an ADODB connection to a database and store the data in a VARCHAR field. If the data can be represented by 8859-1 (e.g. iñtërnâtiônàlizætiøn) it is stored correctly in the varchar field. If I try strings that can't be mapped to 8859-1 (e.g. Здравствуйте!) I get ????????????!. This all makes sense as the varchar field cannot hold Unicode. I also understand that using an nvarchar field should enable me to store UTF-8 strings.
My question is this. What settings in SQL Server or in the ADODB object control how the strings are converted from UTF-8 to 8859-1? Does VBScript (ASP) send the strings to ADODB.Connection.Execute as UTF-8 (or what I think it is actually doing - UTF-16) and the database itself handles the conversion? Is this controlled by the collation of the database (SQL_Latin1_General_CP1_CI_AS in this case)?
If you switch to using NVARCHAR instead, you'll need to remember to use the N specifier in your SQL commands whenever you use a Unicode string literal, like so:
INSERT INTO SOME_TABLE (someField) VALUES (N'Some Unicode Text')
SELECT * FROM SOME_TABLE WHERE someField=N'Some Unicode Text'
If you don't do this then the strings won't get treated as Unicode, and your data will be silently converted to Latin1 (or whatever the default character set is for the relevant database/table/field) even if that field is an NVARCHAR.
You are correct.
VBScript and ADODB only know strings as Unicode (or UTF-16, as it's sometimes referred to).
It's the DB's collation settings that determine how the VARCHAR fields are encoded.
In SQL_Latin1_General_CP1_CI_AS it's really the CP1 bit which determines the code page to use. In this case 1 is a legacy reference to Windows-1252, which is a superset of ISO-8859-1.
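The effect described in the question can be reproduced with a short Python sketch. It uses Python's cp1252 codec with errors="replace" as an approximation of the lossy conversion to the VARCHAR code page; SQL Server's own conversion is separate code, so treat this only as an illustration:

```python
russian = "Здравствуйте!"
latin = "iñtërnâtiônàlizætiøn"

# Characters with no Windows-1252 equivalent are replaced by "?".
lossy = russian.encode("cp1252", errors="replace").decode("cp1252")

# Text that fits in Windows-1252 survives the round trip unchanged.
intact = latin.encode("cp1252").decode("cp1252")

print(lossy)
print(intact == latin)
```

Once the question marks are stored, the original characters are gone. The conversion has to be avoided (NVARCHAR column, N'...' literals), because it cannot be reversed afterwards.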
