SQLite UTF-8 inconsistency

I have a SQLite file containing pronunciations of words, such as /ˈdɪkʃən(ə)ɹi/, in its records. However, when I view it in any SQLite browser on a Mac, I see garbled characters like /ÃËdêkÃÆÃâ¢n(Ãâ¢)ùi,ÃËdêkÃÆÃâ¢nÃâºÃ¹i/, but when I use SQLite2009 Pro on Windows, the characters display properly.
I have also tried PRAGMA encoding = "UTF-8"; but to no avail.
What is going on here?

An old question now, but the browsers you were using were broken and couldn't properly display UTF-8; there was nothing wrong with your input or storage. What you got back was exactly what you'd expect to see from a tool that doesn't support UTF-8.
Instead, use the cross-platform SQLite Manager add-on for Firefox.
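If you want to convince yourself that the data (rather than the viewer) is fine, you can read the raw bytes back and check that they decode as UTF-8. A minimal sketch in Python; the file name, table, and column are hypothetical stand-ins for your database:

import sqlite3

# Return raw bytes instead of decoded strings so the stored data can be inspected directly.
conn = sqlite3.connect("dict.sqlite")        # hypothetical file name
conn.text_factory = bytes

for (raw,) in conn.execute("SELECT pron FROM entries LIMIT 10"):   # hypothetical table/column
    try:
        print(raw.decode("utf-8"))           # prints e.g. /ˈdɪkʃən(ə)ɹi/ if storage is fine
    except UnicodeDecodeError:
        print("not valid UTF-8:", raw)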

Related

How to display foreign characters?

I just downloaded Brackets after hearing that it will probably be the next big editor. I know it is still in beta, but does anyone know how to display foreign characters in this editor? When I try to display a foreign character, this symbol � is displayed instead.
Can anyone help?
Brackets currently only supports UTF-8 encoded files.
There's an item in the feature backlog to support other encodings, so you should upvote it if that's an important use case for you.
If you're pretty certain that the file you're working with is UTF-8 encoded (bearing in mind that it's tricky to tell for sure), then it sounds like you've hit a bug. As a Brackets core contributor, I definitely encourage you to file it in our issue tracker — and please include a link to the file in question if at all possible.
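Since it is indeed tricky to tell whether a file really is UTF-8, one quick check is to attempt a strict decode. A minimal sketch in Python; the file path is hypothetical:

# A strict decode either succeeds (the bytes are at least well-formed UTF-8)
# or raises with the offset of the first offending byte.
def is_valid_utf8(path):
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError as err:
        print(f"invalid byte at offset {err.start}: {data[err.start]:#x}")
        return False

print(is_valid_utf8("notes.txt"))            # hypothetical file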

How to detect wrong encoding declaration?

I am building an ASP.NET web service that loads other web pages and then hands them to clients.
I have been handling character encodings reasonably well: I read the charset from the HTML meta tag and then use that encoding to read the file.
Nevertheless, some page authors just don't understand character sets. They declare a specific encoding, e.g. "gb2312", when the page is actually plain UTF-8. When I use gb2312 to decode that text, everything turns into a mess.
How can I detect whether the text was decoded properly? When I load the same page into IE, it correctly uses UTF-8 to decode it. How does it achieve that?
When a BOM is present, it tells you which Unicode encoding is used (see BOM and encoding).
If you want to detect the character set more generally, you could use CharDetSharp, a C# port of Mozilla's character set detector.
If you want to be extra sure that you picked the right one, you could look for special characters that are not supposed to be there. Legitimate text is not very likely to include "óké", so you could scan for such sequences and, if you find them, try a different encoding/character set to process the file.
In practice, it is really hard to make your application completely fool-proof.
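As a rough illustration of the "trust the BOM, then distrust the declaration" approach, here is a minimal Python sketch (standard library only, heuristics deliberately simplified): if there is no BOM but the raw bytes decode cleanly as strict UTF-8 and contain non-ASCII characters, the declared legacy charset is probably wrong.

import codecs

def guess_encoding(raw: bytes, declared: str) -> str:
    # 1. A BOM is the most reliable signal when one is present.
    for bom, name in ((codecs.BOM_UTF8, "utf-8-sig"),
                      (codecs.BOM_UTF16_LE, "utf-16-le"),
                      (codecs.BOM_UTF16_BE, "utf-16-be")):
        if raw.startswith(bom):
            return name
    # 2. Non-ASCII bytes that still decode as strict UTF-8 strongly suggest
    #    the declared legacy charset (e.g. gb2312) is wrong.
    try:
        raw.decode("utf-8")
        if not raw.isascii():
            return "utf-8"
    except UnicodeDecodeError:
        pass
    return declared                           # fall back to what the page claimed

print(guess_encoding("héllo".encode("utf-8"), declared="gb2312"))   # -> utf-8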

How to repair unicode letters?

Someone sent me letters like this in an email:
IVIØR†€™
when the correct text is supposed to be:
IVIØR†€™
How do I represent them in their original Portuguese language? They got altered after being passed through an HTTP GET request.
I probably will not be able to fix the site, but maybe I can create a repair tool for these badly encoded letters? Does anyone know of such a tool, or how to do the repair manually by hand? It seems like nothing is lost, just badly interpreted.
What happened here is that UTF-8 got misinterpreted as ISO-8859-1; and then other kinds of mangling (the bad ISO-8859-1 string being re-UTF-8-encoded; the non-breaking space character '\xA0' being converted to regular space '\x20') seem to have happened afterward, though those may just be a result of pasting it into Stack Overflow.
Due to the subsequent mangling, there's no really good way to completely undo it, but you can largely undo it by passing it through a not-very-strict UTF-8 interpreter. For example, if I save "IVIØR†€™" as a text-file on my computer, using Notepad, with the "ANSI" (single-byte) encoding, and then I open it in Firefox and tell it to interpret it as UTF-8 (Firefox > Web Developer > Character Encoding > Unicode (UTF-8)), then it displays "IVIØR� €™". (The "�" is because of the '\xA0' having been changed to '\x20', which broke the UTF-8 encoding.)
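For the common single-step case (UTF-8 bytes decoded as ISO-8859-1 or Windows-1252), the damage can often be reversed by re-encoding with the wrong charset and decoding as UTF-8 again. A minimal Python sketch, assuming the mangling happened exactly once; the sample string is a hypothetical Portuguese example, not the one from the question:

def unmangle(text: str) -> str:
    # errors="replace" keeps going when later damage (like the \xa0 -> space
    # substitution mentioned above) has made some bytes unrecoverable.
    return text.encode("cp1252", errors="replace").decode("utf-8", errors="replace")

mangled = "Mário™".encode("utf-8").decode("cp1252")   # simulate the damage: 'MÃ¡rioâ„¢'
print(unmangle(mangled))                               # -> 'Mário™'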
They're probably not broken. It's just a mismatch between the encoding they were sent in and the encoding you're viewing them with.
Figure out what encoding was originally used, decode with that same encoding, and it should look like the original. As for writing a "fix-it" tool, you'd always need to know what encoding the text was originally created in, which can be complicated depending on the source and on whether or not you have access to that information.

SQLite character encoding for Google Gears

We're using jQuery to get a JSON-string from our server (UTF-8 response, also UTF-8 request through jQuery) and put this JSON into a Google Gears WorkerPool. This workerpool processes the JSON and stores it into a Gears database (SQLite).
It turns out that, apparently, SQLite stores the data using ISO-8859-1 rather than UTF-8. Since we're trying to store user names that might contain Cyrillic characters (and others that you might encounter in Europe), this goes horribly wrong.
Can anyone tell me how to change the character encoding in either the Gears WorkerPool or the SQLite database that Gears employs? Of course, if I'm looking in the wrong direction with my problem, feel free to offer alternatives!
Unfortunately, HTML5 isn't an option as we're supposed to support IE7 primarily.
Try "PRAGMA encoding='utf-8' " before you define any tables.
see This link
And this link for SQLites PRAGMA syntax
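The key detail is that the encoding pragma only takes effect on a brand-new, empty database; once a table exists, the text encoding is fixed. A minimal sketch of the order of operations, using Python's sqlite3 module as a stand-in for the Gears database API (the file name is hypothetical):

import sqlite3

conn = sqlite3.connect("gears_cache.db")               # hypothetical, newly created file
conn.execute("PRAGMA encoding = 'UTF-8'")              # must run before any table is created
conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT)")
conn.execute("INSERT INTO users VALUES (?)", ("Дмитрий",))   # Cyrillic round-trips intact
conn.commit()
print(conn.execute("PRAGMA encoding").fetchone())      # -> ('UTF-8',)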

Adobe Flex fails on unicode / foreign input in Linux

I have been learning Flex for a few days now and noticed that entering Unicode / foreign characters on Linux into a TextInput, TextArea, or RichTextEditor produces unreadable text composed of several characters (it seems like UTF-8 handling is at fault). Output, on the other hand, is flawless.
I tried hard to find anything about this issue on the internet, but could only find one old blog entry. The author produced a temporary workaround, but it is not sufficient.
So if Windows allows Unicode input and Linux doesn't, what should I do? Maybe the problem is on my machine only? Has anybody run into the same problem, and perhaps found a solution?
I have Adobe Flash 10.0.32.18 installed on my Sabayon Linux box.
It might have something to do with this bug:
Incorrect unicode input in linux
which, apparently, will be fixed once Flash Player 10.1 is released.
Just to further update the answer: Flex 4 components support Unicode, and Unicode characters can be typed into input controls using Google Chrome, Firefox 3.6+, and IE7+.
For Java MySQL users:
database.url=jdbc:mysql://localhost:3306/sampledb?useUnicode=true&characterEncoding=utf-8
This allows UTF-8 data-write operations.
The database table and columns must also be set to a utf8_* encoding to make sure the Unicode data can be stored in the tables.
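For comparison, the same two settings (a UTF-8 connection plus UTF-8 table definitions) can be expressed from Python as well; this sketch assumes the mysql-connector-python package, and the credentials and table are hypothetical:

import mysql.connector

conn = mysql.connector.connect(
    host="localhost", port=3306,
    user="app", password="secret",            # hypothetical credentials
    database="sampledb",
    charset="utf8mb4",                         # connection-level counterpart of the JDBC URL parameters
)
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS users (
        id   INT PRIMARY KEY AUTO_INCREMENT,
        name VARCHAR(100)
    ) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
""")
conn.commit()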
