Oracle Alias Encoding and Extended Characters - oracle11g

I'm working on a database access layer and have just noticed that Oracle 11g seems to have some issues handling non-Latin characters in aliases.
It seems that characters above 0x7F in an alias count as two characters as far as the 30-character alias length limit is concerned.
For instance in both Oracle SQL Developer and ODP.net:
SELECT
LENGTH('ÔÔÔÔÔÔÔÔÔÔÔÔÔÔÔ') "ÔÔÔÔÔÔÔÔÔÔÔÔÔÔÔ"
FROM DUAL
Works and reports a string length of 15, however:
SELECT
LENGTH('ÔÔÔÔÔÔÔÔÔÔÔÔÔÔÔx') "ÔÔÔÔÔÔÔÔÔÔÔÔÔÔÔx"
FROM DUAL
reports an ORA-00972: 'identifier too long' error.
This seems to imply that the alias string is being encoded in a way that means the accented characters are becoming two characters.
Is this expected and does anyone know what the actual restriction/encoding is here?
I need a reliable way to determine if a provided alias string is permitted.
For what it's worth the Oracle settings are as follows:
Client:
NLS_LANG = ENGLISH_UNITED KINGDOM.WE8MSWIN1252
Database:
NLS_CHARACTERSET = AL32UTF8
NLS_NCHAR_CHARACTERSET = AL16UTF16

column_name in dba_tab_cols is a varchar2(30 byte). That means that it stores up to 30 bytes of data. Your database character set is UTF-8 so each character may require up to 3 bytes of data which would mean, worst case, that you might be limited to 10 characters. Assuming that all your identifiers use valid Windows-1252 characters, I don't think any characters would require more than 2 bytes of storage.
If you're trying to determine whether an identifier is valid from a client programming language:
Convert the identifier to UTF-8
Get the byte length of the UTF-8 encoded identifier
Check whether the byte length is greater than 30 (see the sketch below)
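For example, a minimal Python sketch of that check (the function name and the 30-byte limit are my own choices; adjust the limit for your Oracle version and the encoding to match your database character set):

def is_valid_oracle_alias(alias, max_bytes=30):
    # Oracle measures identifier length in bytes of the database character
    # set (AL32UTF8 here), not in characters, so each 'Ô' counts as 2 bytes.
    return len(alias.encode("utf-8")) <= max_bytes

print(is_valid_oracle_alias("Ô" * 15))        # True  (30 bytes)
print(is_valid_oracle_alias("Ô" * 15 + "x"))  # False (31 bytes)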

Related

How to extract a Teradata .TPT file with UTF-8 encoding

We are currently extracting several Teradata .TPT files that we will upload to AWS S3; however, the files are coming out with ANSI encoding.
I need them to be encoded as UTF-8.
You must specify the character set in your TPT script. At the top add:
USING CHARACTER SET UTF8
The tricky part is that UTF8 here counts up to 3 bytes per character, so in your DEFINE SCHEMA you must triple the size of each field.
For example if your schema looks like:
DEFINE SCHEMA s_some_export
(
status VARCHAR(20),
userid VARCHAR(20),
firstname VARCHAR(64)
);
You'll have to triple the values to accommodate your UTF8 characters:
DEFINE SCHEMA s_some_export
(
status VARCHAR(60),
userid VARCHAR(60),
firstname VARCHAR(192)
);
Sometimes, because I'm lazy, I define my TPT with USING CHARACTER SET UTF16 so that I only need to double each field size (the math is easier). BUT it means I have to convert it to UTF8 after extraction. In Linux this would just be iconv -f UTF-16LE -t UTF-8 myoutputfile.csv > myoutputfile.utf8.csv
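If iconv isn't handy (on Windows, say), roughly the same conversion can be done with a short Python sketch; the file names are just the ones from the example above, and it assumes the extract really is little-endian UTF-16:

# Re-encode a UTF-16LE extract as UTF-8, line by line (same effect as the iconv command).
with open("myoutputfile.csv", "r", encoding="utf-16-le") as src, \
     open("myoutputfile.utf8.csv", "w", encoding="utf-8", newline="") as dst:
    for line in src:
        dst.write(line)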
Some caveats:
If your table's field is defined as CHAR with CHARACTER SET LATIN, then you may run into column size issues with your schema.
Dates and timestamps can get weird since they don't need to be doubled, so defining them as VARCHAR in your schema can get you into trouble. You may have to fuss around a bit here. My suggestion would be to change the view from which you are selecting the data for your TPT to CAST(yourdate AS VARCHAR(10)) AS yourdate, and then use VARCHAR(30) in your schema so you don't have to think about the field types while defining your schema. This means extra CPU overhead in your extraction, but unless you are running tight on resources I think it's worth it. I'm also very lazy that way and always happy to just get the damned TPT to extract data without much debugging.

SQLite database supporting Unicode data

I'm using a Java Swing application which needs Unicode strings loaded into a JTable. Is it possible to store Unicode data in an SQLite database? If so, which SQLite supports Unicode? I need the free SQLite, not a premium one.
SQLite always stores text data as Unicode, using the Unicode encoding specified when the database was created. The database driver itself takes care to return the data as the Unicode string in the encoding used by your language/platform.
If you have conversion problems, either your application tried to store an ASCII string without converting it to Unicode, or you tried to read one value and force a conversion on it.
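As a quick illustration, here is a minimal Python sqlite3 sketch (the table and column names are made up) showing a Unicode string surviving the round trip:

import sqlite3

conn = sqlite3.connect(":memory:")  # any SQLite database behaves the same way
conn.execute("CREATE TABLE words (label TEXT)")
conn.execute("INSERT INTO words VALUES (?)", ("Здравствуйте!",))

# The value comes back as the same Unicode string that went in.
print(conn.execute("SELECT label FROM words").fetchone()[0])  # Здравствуйте!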
SQLite uses a kind of dynamic typing, where each value is stored using a specific storage class. A column's type specifies its affinity, i.e. how values stored in it are treated. For example:
A column with NUMERIC affinity may contain values using all five storage classes. When text data is inserted into a NUMERIC column, the storage class of the text is converted to INTEGER or REAL if the text is a well-formed number.
There are five storage classes, NULL, INTEGER, REAL, TEXT, BLOB. TEXT stores string data using the Unicode encoding specified for the database (UTF-8, UTF-16BE or UTF-16LE).
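For instance, a small Python sqlite3 sketch of that NUMERIC-affinity conversion (the table and column names are made up):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (amount NUMERIC)")
conn.execute("INSERT INTO t VALUES (?)", ("42",))     # text that looks like a number
conn.execute("INSERT INTO t VALUES (?)", ("hello",))  # text that doesn't

# '42' was stored with the INTEGER storage class; 'hello' stayed TEXT.
print(conn.execute("SELECT amount, typeof(amount) FROM t").fetchall())
# [(42, 'integer'), ('hello', 'text')]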
What specific problem are you facing, or is this a general question?
SQLite always uses Unicode strings.
sqlite3 doesn't fully support UNICODE. There is a wrapper class called CppSQLite3 which fully supports UNICODE.

What is the maximum size limit of varchar data type in sqlite?

I'm new to SQLite. I want to know the maximum size limit of the varchar data type in SQLite.
Can anybody point me to some information about this? I searched the sqlite.org site and they give the answer as:
Q. What is the maximum size of a VARCHAR in SQLite?
A. SQLite does not enforce the length of a VARCHAR. You can declare a VARCHAR(10) and SQLite will be happy to let you put 500 characters in it. And it will keep all 500 characters intact - it never truncates.
But I want to know the exact maximum size limit of the varchar datatype in SQLite.
From http://www.sqlite.org/limits.html:
Maximum length of a string or BLOB
The maximum number of bytes in a string or BLOB in SQLite is defined by the preprocessor macro SQLITE_MAX_LENGTH. The default value of this macro is 1 billion (1 thousand million or 1,000,000,000). You can raise or lower this value at compile-time using a command-line option like this:
-DSQLITE_MAX_LENGTH=123456789
The current implementation will only support a string or BLOB length up to 2^31-1 or 2147483647. And some built-in functions such as hex() might fail well before that point. In security-sensitive applications it is best not to try to increase the maximum string and blob length. In fact, you might do well to lower the maximum string and blob length to something more in the range of a few million if that is possible.
During part of SQLite's INSERT and SELECT processing, the complete content of each row in the database is encoded as a single BLOB. So the SQLITE_MAX_LENGTH parameter also determines the maximum number of bytes in a row.
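To see the "no enforcement" part in action, here is a small Python sqlite3 sketch (the table name is made up):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body VARCHAR(10))")  # declared length is not enforced
conn.execute("INSERT INTO notes VALUES (?)", ("x" * 500,))

# All 500 characters survive; only SQLITE_MAX_LENGTH bounds the size.
print(conn.execute("SELECT length(body) FROM notes").fetchone()[0])  # 500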

What is the difference between related SQLite data-types like INT, INTEGER, SMALLINT and TINYINT?

When creating a table in SQLite3, I get confused when confronted with all the possible datatypes which imply similar contents, so could anyone tell me the difference between the following data-types?
INT, INTEGER, SMALLINT, TINYINT
DEC, DECIMAL
LONGCHAR, LONGVARCHAR
DATETIME, SMALLDATETIME
Is there some documentation somewhere which lists the min./max. capacities of the various data-types? For example, I guess smallint holds a larger maximum value than tinyint, but a smaller value than integer, but I have no idea of what these capacities are.
SQLite, technically, has no data types; there are storage classes in a manifest typing system, and yeah, it's confusing if you're used to traditional RDBMSes. Internally, each value is stored using one of a handful of storage classes (NULL, INTEGER, REAL, TEXT, BLOB), and values are coerced/converted between storage classes based on affinities (i.e. the data types assigned to columns).
The best thing that I'd recommend you do is:
Temporarily forget everything you used to know about standalone database datatypes
Read about datatypes and storage classes on the SQLite site
Take the types based off of your old schema, and see what they'd map to in SQLite
Migrate all the data to the SQLite database.
Note: The datatype limitations can be cumbersome, especially if you add time durations, or dates, or things of that nature in SQL. SQLite has very few built-in functions for that sort of thing. However, SQLite does provide an easy way for you to make your own built-in functions for adding time durations and things of that nature, through the sqlite3_create_function library function. You would use that facility in place of traditional stored procedures.
The difference is syntactic sugar. Only a few substrings of the type names matter as far as the type affinity is concerned.
INT, INTEGER, SMALLINT, TINYINT → INTEGER affinity, because they all contain "INT".
LONGCHAR, LONGVARCHAR → TEXT affinity, because they contain "CHAR".
DEC, DECIMAL, DATETIME, SMALLDATETIME → NUMERIC, because they don't contain any of the substrings that matter.
The rules for determining affinity are listed at the SQLite site.
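If you want to check what a declared type actually maps to, here is a small Python sqlite3 sketch (the table and column names are made up) that inserts the same text value into each column and asks SQLite for the resulting storage class:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (a TINYINT, b LONGVARCHAR, c DECIMAL)")
conn.execute("INSERT INTO demo VALUES (?, ?, ?)", ("7", "7", "7"))

# TINYINT -> INTEGER affinity, LONGVARCHAR -> TEXT, DECIMAL -> NUMERIC,
# so the same text value '7' ends up stored differently per column.
print(conn.execute("SELECT typeof(a), typeof(b), typeof(c) FROM demo").fetchone())
# ('integer', 'text', 'integer')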
If you insist on strict typing, you can implement it with CHECK constraints:
CREATE TABLE T (
N INTEGER CHECK(TYPEOF(N) = 'integer'),
Str TEXT CHECK(TYPEOF(Str) = 'text'),
Dt DATETIME CHECK(JULIANDAY(Dt) IS NOT NULL)
);
But I never bother with it.
As for the capacity of each type:
INTEGER is always signed 64-bit. Note that SQLite optimizes the storage of small integers behind-the-scenes, so TINYINT wouldn't be useful anyway.
REAL is always 64-bit (double).
TEXT and BLOB have a maximum size determined by a preprocessor macro, which defaults to 1,000,000,000 bytes.
Most of those are there for compatibility. You really only have integer, float, text, and blob. Dates can be stored as either a number (unix time is integer, microsoft time is float) or as text.
NULL. The value is a NULL value.
INTEGER. The value is a signed integer, stored in 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.
REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number.
TEXT. The value is a text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16LE).
BLOB. The value is a blob of data, stored exactly as it was input.
As an addition to dan04's answer: if you want to blindly insert a non-zero NUMERIC value represented as TEXT, but ensure that the text is convertible to a numeric:
your_numeric_col NUMERIC CHECK(abs(your_numeric_col) <> 0)
Typical use case is in a query from a program that treats all data as text (for uniformity & simplicity, since SQLite already does so). The nice thing about this is that it allows constructs like this:
INSERT INTO table (..., your_numeric_column, ...) VALUES (..., some_string, ...)
which is convenient if you're using placeholders, because you don't have to handle such non-zero numeric fields specially. An example using Python's sqlite3 module would be:
conn_or_cursor.execute(
    "INSERT INTO table VALUES (" + ",".join("?" * num_values) + ")",
    str_value_tuple)  # no need to convert them from str to int/float
In the above example, all values in str_value_tuple will be escaped and quoted as strings when passed to SQLite. However, since we're not explicitly checking the type via TYPEOF but only convertibility to a numeric type, it will still work as desired (i.e., SQLite will either store it as a numeric or fail otherwise).

Classic ASP, SQL Server and character encodings

I have a classic ASP page that gets POSTed to. The data gets POSTed as UTF-8 (I can see this in Fiddler). I then open an ADODB connection to a database and store the data in a VARCHAR field. If the data can be represented by 8859-1 (e.g. iñtërnâtiônàlizætiøn) it is stored correctly in the varchar field. If I try strings that can't be mapped to 8859 (e.g. Здравствуйте!) I get ????????????!. This all makes sense as the varchar field cannot hold Unicode. I also understand that using an nvarchar field should enable me to store UTF-8 strings.
My question is this. What settings in SQL Server or in the ADODB object control how the strings are converted from UTF-8 to 8859-1? Does VBScript (ASP) send the strings to ADODB.Connection.Execute as UTF-8 (or what I think it is actually doing - UTF-16) and the database itself handles the conversion? Is this controlled by the collation of the database (SQL_Latin1_General_CP1_CI_AS in this case)?
If you switch to using NVARCHAR instead, then you'll need to remember to use the N prefix in your SQL commands, like so, whenever you use a string literal which is Unicode:
INSERT INTO SOME_TABLE (someField) VALUES (N'Some Unicode Text')
SELECT * FROM SOME_TABLE WHERE someField=N'Some Unicode Text'
If you don't do this, then the strings won't get treated as Unicode and your data will be silently converted to Latin1 (or whatever the default character set for the relevant database/table/field is), even if that field is an NVARCHAR.
You are correct.
VBScript and ADODB only know strings as Unicode (or UTF-16, as it's sometimes referred to).
It's the DB's collation settings that determine how the VARCHAR fields are encoded.
In SQL_Latin1_General_CP1_CI_AS it's really the CP1 bit which determines the code page to use. In this case CP1 is a legacy reference to code page 1252 (Windows-1252), which is a superset of ISO-8859-1.
