I'm using MySQL 5 on shared hosting, connecting from ASP.NET 3.5 using the MySQL 5.1 ODBC driver. I'd like to store UTF8 strings. My tables used to be all in "latin1_swedish_ci", but I converted the database, table, and column to UTF8 using:
ALTER DATABASE `my_db` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci ;
ALTER TABLE `my_table` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci ;
ALTER TABLE `my_table` CHANGE `subject` `subject` TEXT CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL
But I still get this error when inserting non-ASCII characters (like "遊ぶ") into my database using an OdbcConnection and OdbcCommand:
ERROR [HY000] [MySQL][ODBC 5.1 Driver][mysqld-5.0.51b-community-nt]Incorrect string value: '\xE3\x80\x80\xE6\x89\x8B...' for column 'subject' at row 1
Note that since I'm using the 5.1 driver, I can't use "SET NAMES utf8;" - it produces an error.
Any ideas what I'm missing?
A few things to check:
Make sure your tables and text fields really accept utf8:
Use the MySQL Query Browser and try to edit some data manually.
If it stays OK once you saved your edits, then the fields and tables are correctly set.
Make sure that you are actually inserting UTF-8 encoded characters and not characters in another encoding, such as GB2312 for Chinese.
If that's the case, you may need to convert the strings to utf8 before you can send them to the database.
You can have a look at using the Encoding class in .NET.
I've answered a post on something related a while ago and there is a CodeProject article on just this issue.
You probably also need to make sure that the ODBC connection string includes the following:
CharSet=utf8;
There is a list of all the parameters you can use for the ODBC connection.
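To make the second check concrete, here is a minimal Python sketch (not part of the original answer) that decodes the bytes quoted in the error message. The helper name is_valid_utf8 is made up for illustration. The bytes decode cleanly as UTF-8, which suggests the application is already sending UTF-8 and that the rejection happens further along, for example at the connection character set.

```python
# Bytes copied from the "Incorrect string value" error message above.
error_bytes = b"\xE3\x80\x80\xE6\x89\x8B"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Decode the bytes and list the code points they represent.
decoded = error_bytes.decode("utf-8")
code_points = [f"U+{ord(ch):04X}" for ch in decoded]

# For contrast: the same character encoded as GB2312 is not valid UTF-8.
gb_bytes = "手".encode("gb2312")

print(code_points)
print(is_valid_utf8(error_bytes), is_valid_utf8(gb_bytes))
```

If the second value printed is False for your own input, the string needs to be converted to UTF-8 before it is sent to the database.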
Related
I tried using AES_DECRYPT on an .asp page, and it only showed ???? instead of the decrypted value in clear text when I used this select query.
select *,AES_DECRYPT(thepassword,'myencyptkey2018' ) AS passw from personal
But if I use this with convert using utf8 then it displays the text value.
select *,CONVERT(AES_DECRYPT(thepassword,'myencyptkey2018' ) USING utf8) AS passw from personal
My MySQL database is set to use charset utf8, my .asp page uses charset utf8, and so does the connection string. As far as I know, I'm using utf8 everywhere. So my question is, why do I need to use convert using utf8?
Why doesn't the first select query above work? Where in my settings is it not using utf8, if anywhere?
Thanks.
These are the variables that you might want to see.
We are currently extracting several Teradata .TPT files that we will upload to AWS S3; however, the files are coming out with ANSI encoding.
I need them to come out with UTF-8 encoding.
You must specify the character set in your TPT script. At the top add:
USING CHARACTER SET UTF8
The tricky part is that UTF8 here has 3 bytes per character, so in your DEFINE SCHEMA you must triple the size of each field.
For example if your schema looks like:
DEFINE SCHEMA s_some_export
(
status VARCHAR(20),
userid VARCHAR(20),
firstname VARCHAR(64),
);
You'll have to triple the values to accommodate your UTF8 characters:
DEFINE SCHEMA s_some_export
(
status VARCHAR(60),
userid VARCHAR(60),
firstname VARCHAR(192),
);
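The tripling can be automated. The following Python sketch is an assumption-laden illustration, not part of TPT: the function name scale_varchar_sizes is made up, and it only rewrites VARCHAR(n) declarations, leaving every other type untouched.

```python
import re


def scale_varchar_sizes(schema: str, factor: int = 3) -> str:
    """Multiply every VARCHAR(n) length in a schema definition by `factor`.

    Hypothetical helper: use factor=3 for UTF8, factor=2 for UTF16.
    """
    def widen(match: re.Match) -> str:
        # group(1) is "VARCHAR(" and group(2) is the original length.
        return f"{match.group(1)}{int(match.group(2)) * factor})"

    return re.sub(r"(VARCHAR\s*\()\s*(\d+)\s*\)", widen, schema,
                  flags=re.IGNORECASE)


latin_schema = """DEFINE SCHEMA s_some_export
(
status VARCHAR(20),
userid VARCHAR(20),
firstname VARCHAR(64)
);"""

utf8_schema = scale_varchar_sizes(latin_schema)         # triple for UTF8
utf16_schema = scale_varchar_sizes(latin_schema, 2)     # double for UTF16
print(utf8_schema)
```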
Sometimes, because I'm lazy, I define my TPT with USING CHARACTER SET UTF16 so that I only need to double each field size (the math is easier). BUT it means I have to convert it to UTF8 after extraction. In Linux this would just be iconv -f UTF-16LE -t UTF-8 myoutputfile.csv > myoutputfile.utf8.csv
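Where iconv is not available, the same conversion can be sketched in Python. This is only an illustration: the function name utf16le_to_utf8 and the chunk size are assumptions, and like the iconv command above it passes a leading byte order mark through unchanged rather than stripping it.

```python
def utf16le_to_utf8(src_path: str, dst_path: str,
                    chunk_chars: int = 1 << 16) -> None:
    """Re-encode a UTF-16LE text file as UTF-8, reading in chunks.

    newline="" disables newline translation so line endings are preserved.
    """
    with open(src_path, "r", encoding="utf-16-le", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # iter() with a sentinel stops when read() returns the empty string.
        for chunk in iter(lambda: src.read(chunk_chars), ""):
            dst.write(chunk)


# Example call, using the file names from the iconv command:
# utf16le_to_utf8("myoutputfile.csv", "myoutputfile.utf8.csv")
```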
Some caveats:
If your table's field is defined as CHAR and CHARACTER SET LATIN then you may run into column size issues with your schema.
Dates and Timestamps can get weird, as they don't need to be doubled, so defining them as VARCHAR in your schema can get you into trouble. You may have to fuss around a bit here. My suggestion would be to change the view from which you are selecting the data for your TPT and CAST(yourdate AS VARCHAR(10)) as yourdate, and then use VARCHAR(30) in your schema so you don't have to think about the field types while defining your schema. This means extra CPU overhead in your extraction, but unless you are running tight on resources I think it's worth it. I'm also very lazy that way and always happy to just get the damned TPT to extract data without much debugging.
I'm using ROracle to query a database with a VARCHAR2 field containing some Unicode characters. When I access the database directly or via RJDBC, I have no issues with pulling this data.
When I pull the data with ROracle, I get ????? instead of the text.
In OCI you have to use the environment variable NLS_LANG. For example:
NLS_LANG=AMERICAN_AMERICA.AL32UTF8
will make the OCI client return all strings in UTF8. This should work if the internal string representation in R also uses UTF8; ROracle can then make a simple binary copy from one buffer into another.
Oracle uses question marks when it cannot translate a character into the target code page.
I have an existing database where they created their own Unicode collation sequence. I'm trying to use the following code and get a "no such collation sequence" exception. Can anybody help with the syntax to use "collate nocase" with this code?
update Songs set
SongPath = replace (SongPath, 'Owner.Funkytown', 'Jim');
Dump the database (via the shell), edit the output SQL (find the column definitions and change them to use COLLATE NOCASE), then recreate the database.
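A different technique from the dump-and-recreate route, sketched here in Python only as an illustration, is to register a collation with the missing name on the connection before running the update. The real collation name in the asker's database is not given, so UNICODE_NOCASE below is a placeholder, and the comparison function is an assumption: if it orders strings differently from the original collation, existing indexes built with that collation may become inconsistent.

```python
import sqlite3


def casefold_collation(a: str, b: str) -> int:
    """Case-insensitive comparison returning -1, 0 or 1 (assumed semantics)."""
    x, y = a.casefold(), b.casefold()
    return (x > y) - (x < y)


def open_with_collation(path: str, name: str = "UNICODE_NOCASE") -> sqlite3.Connection:
    """Open a database and register a collation under `name` (placeholder name)."""
    conn = sqlite3.connect(path)
    conn.create_collation(name, casefold_collation)
    return conn


# Demo on an in-memory database that mimics the table from the question.
conn = open_with_collation(":memory:")
conn.execute("CREATE TABLE Songs (SongPath TEXT COLLATE UNICODE_NOCASE)")
conn.execute("CREATE INDEX idx_song_path ON Songs (SongPath)")
conn.execute("INSERT INTO Songs VALUES (?)",
             ("C:/Users/Owner.Funkytown/Music/a.mp3",))
conn.execute(
    "UPDATE Songs SET SongPath = replace(SongPath, 'Owner.Funkytown', 'Jim')"
)
rows = [row[0] for row in conn.execute("SELECT SongPath FROM Songs")]
print(rows)
```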
I have a classic ASP page that gets POSTed to. The data gets POSTed as UTF-8 (I can see this in Fiddler). I then open an ADODB connection to a database and store the data in a VARCHAR field. If the data can be represented by 8859-1 (e.g. iñtërnâtiônàlizætiøn) it is stored correctly in the varchar field. If I try strings that can't be mapped to 8859 (e.g. Здравствуйте!) I get ????????????!. This all makes sense as the varchar field cannot hold unicode. I also understand that using an nvarchar field should enable me to store utf-8 strings.
My question is this. What settings in SQL Server or in the ADODB object control how the strings are converted from UTF-8 to 8859-1? Does VBScript (ASP) send the strings to ADODB.Connection.Execute as UTF-8 (or what I think it is actually doing - UTF-16) and the database itself handles the conversion? Is this controlled by the collation of the database (SQL_Latin1_General_CP1_CI_AS in this case)?
If you switch to using NVARCHAR instead, then you'll need to remember to use the N specifier in your SQL commands whenever you use a string which is Unicode, like so:
INSERT INTO SOME_TABLE (someField) VALUES (N'Some Unicode Text')
SELECT * FROM SOME_TABLE WHERE someField=N'Some Unicode Text'
If you don't do this, then the strings won't get treated as Unicode and your data will be silently converted to Latin1, or whatever the default character set is for the relevant database/table/field, even if that field is an NVARCHAR.
You are correct.
VBScript and ADODB only know strings as Unicode (or UTF-16, as it's sometimes referred to).
It's the DB's collation settings that determine how the VARCHAR fields are encoded.
In SQL_Latin1_General_CP1_CI_AS it's really the CP1 bit which determines the code page to use. In this case 1 is a legacy reference to Windows-1252, which is a superset of ISO-8859-1.
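The effect described in the question can be imitated in Python. This is only an analogy for the lossy conversion into a Windows-1252 VARCHAR column, not SQL Server's actual code path: characters that exist in code page 1252 survive the round trip, while the rest are replaced by question marks.

```python
latin = "iñtërnâtiônàlizætiøn"
cyrillic = "Здравствуйте!"

# Every character of the first string exists in Windows-1252, so it round-trips.
stored_latin = latin.encode("cp1252").decode("cp1252")

# Cyrillic letters have no Windows-1252 equivalent; errors="replace"
# substitutes "?" for each one, while "!" is kept.
stored_cyrillic = cyrillic.encode("cp1252", errors="replace").decode("cp1252")

print(stored_latin)
print(stored_cyrillic)
```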