SQLite database shows question marks (???) instead of these Unicode characters (தமிழ்) - sqlite

I imported a CSV file containing Unicode into an SQLite database but instead of seeing the text, all that I see are question marks. Like this, "???". The encoding is UTF-8 (I've mentioned below what happened when I tried UTF-16). The SQLite manager I'm using is DB Browser for SQLite.
This is the Unicode that I typed: தமிழ்
Now, according to this answer on Stack Overflow, SQLite stores text data as Unicode. So the fact that my text is Unicode can't be the problem.
The characters I'm trying to use belong to the Tamil language, and I'm trying to use them as Unicode. According to Wikipedia, there is an encoding for Tamil called TACE16, a 16-bit, Unicode-based character encoding.
So then I set the encoding as UTF-16 when I imported the CSV file. But the file doesn't even show up in the database after importing when I do that. But it says import is successful.
Then I tried importing the CSV file with UTF-8 encoding as usual. But after importing I right clicked the row header, selected "Set Encoding" and set it to UTF-16. Now it didn't show question marks but it shows something like Chinese characters. This is what it shows now: 㼿㼿.
I tried setting TACE16 while importing. I also tried setting it manually. But it said it's either an incorrect encoding or it is not supported.
Further searching online didn't turn up anything. Could someone tell me how I can fix this issue? Basically, I want this text "தமிழ்" to show in the SQLite database after importing the CSV file which has the text.
Thank you so much. I would really appreciate your help.

I had a similar issue once, but in my case the problem was only in the DB software I used to view the tables. Have you tried retrieving your data from the database? Is it correct when you retrieve it?
In any case, unless you tell us exactly which tools you are using and for what, it is impossible to find a solution for your specific case.

OK, it turns out the issue was my CSV file. I edited it in Excel, and I guess Excel saved it using a different encoding. I'm still not sure what the exact issue was, but here is how I fixed it.
I opened Notepad and typed out the data separated by commas. I saved the file with the .csv extension. Here's the important part: you have to change the encoding to Unicode, using the drop-down menu just to the left of the Save button. Here's a link to a YouTube video that shows you how.
Also, you don't need to type everything in Notepad, which can get tedious.
Type everything out in Google Spreadsheets and download it as a CSV file. It works. If you have to use Notepad, type the data in Excel, concatenate the cells in each row using a formula, and copy and paste the result into Notepad. Don't forget to have the formula add a comma between the cell values.
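If a scripting language is available, the import can also be done without a GUI tool, which removes any guessing about the encoding used along the way. The following is only a minimal sketch using Python's standard `csv` and `sqlite3` modules; the file names, the table name `words`, and the helper `import_csv` are made up for illustration, and it assumes the CSV really is saved as UTF-8 with a header row.

```python
import csv
import os
import sqlite3
import tempfile

def import_csv(csv_path, db_path, table="words"):
    """Load a UTF-8 CSV (header row plus data rows) into a new SQLite table."""
    # "utf-8-sig" also accepts files that start with a byte order mark.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    return conn

# Build a small sample CSV so the sketch is self-contained.
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "tamil.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows([["id", "word"], ["1", "தமிழ்"]])

conn = import_csv(csv_path, os.path.join(workdir, "tamil.db"))
stored = conn.execute('SELECT word FROM "words"').fetchone()[0]
print(stored == "தமிழ்")  # True if the text survived the round trip
```

If the comparison prints `True` but a GUI still shows question marks, the data in the database is fine and the display tool is the likely culprit.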

Related

Save css files using Notepad as Encoding of Ansi or UTF-8?

I'm new at web development and css,
One simple question came into my mind...
I know how to create a CSS file, but when I save it using Notepad I'm not sure whether to set the encoding to ANSI or UTF-8.
I'm not sure which one is the best choice.
I searched on the internet, but I didn't find something helpful.
I want to know which one is the best choice for saving the file as css that will not be a problem in the future.
Please take a look at the attached image.
Thanks for your help.
Attachment 1: the Save dialog in Windows Notepad
I recommend saving as UTF-8,
because if in the future your CSS contains characters like
áéíóúñäëïöü, or others, a file saved as ANSI may only show an error character in their place.
One place where such characters can turn up is the content of a pseudo-element, for example:
.menu .spain:after{
content:"España";
}
result UTF8: "España"
result ANSI: "Espa[]a"
I also recommend using an editor that offers more conveniences, like Atom, which is very simple to use.
It may look intimidating at first, but it is a simple editor with many aids for programming.
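The "Espa[]a" result above can be reproduced outside the browser. This is a small Python sketch, assuming that Notepad's "ANSI" option means Windows-1252 (true on Western European Windows setups, but it depends on the system locale) and that the file is later read as UTF-8.

```python
text = "España"

# Bytes Notepad would write for each choice (ANSI assumed to be Windows-1252).
ansi_bytes = text.encode("cp1252")
utf8_bytes = text.encode("utf-8")

# An ANSI file read as UTF-8: the single byte 0xF1 for "ñ" is not valid UTF-8,
# so the decoder substitutes the replacement character U+FFFD.
ansi_read_as_utf8 = ansi_bytes.decode("utf-8", errors="replace")

# A UTF-8 file read as UTF-8 comes back unchanged.
utf8_read_as_utf8 = utf8_bytes.decode("utf-8")

print(ansi_bytes.hex(), utf8_bytes.hex())
print(ansi_read_as_utf8)
print(utf8_read_as_utf8)
```

The replacement character is what many fonts draw as an empty box or a question mark in a diamond.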

Automatic generation of RTF table

I need to export some information that exists in a Linux text file into a Windows Word file containing a few tables.
Is there a ready-made tool that will create a nice RTF table on Linux?
The input can be CSV, or maybe some other simple table format.
I've tried Googling, but most results are for the reverse (creating simple text from RTF).
I've tried to write something myself (following Using Tables in RTF),
but I ran into some problems and thought that maybe there is no need to reinvent the wheel...
Thanks
:)
You can create the table in html and convert it to rtf using unoconv. It requires a recent LibreOffice or OpenOffice with UNO bindings.
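The HTML half of that suggestion is easy to script. Below is a rough Python sketch that turns CSV text into an HTML table; the function name `csv_to_html_table` and the sample data are invented for illustration, and the first CSV row is assumed to be a header. The conversion step itself is left to unoconv, for which something like `unoconv -f rtf table.html` should work, though check the options of your installed version.

```python
import csv
import html
import io

def csv_to_html_table(csv_text):
    """Render CSV text as a minimal HTML page containing one table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    lines = ['<html><head><meta charset="utf-8"></head><body>',
             '<table border="1">']
    for i, row in enumerate(rows):
        tag = "th" if i == 0 else "td"  # first row is treated as the header
        cells = "".join(f"<{tag}>{html.escape(cell)}</{tag}>" for cell in row)
        lines.append(f"<tr>{cells}</tr>")
    lines += ["</table>", "</body></html>"]
    return "\n".join(lines)

sample = "name,size\nfoo,1 < 2\nbar,3\n"
page = csv_to_html_table(sample)
print(page)
```

Writing `page` to a file such as `table.html` gives unoconv something to convert.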

how to convert MS word Unicode 2-byte Cyrillic to CP866 1-byte Cyrillic

I am having an issue with a piece of hardware that only contains the CP866 library/code page for Cyrillic. The text that I want to display is currently in MS Word and I need to convert it to the CP866 in a text file. (I know it just keeps getting worse!)
I am aware that MS Word uses Unicode to display Cyrillic and, if I am not mistaken, it uses UTF-16. So if I try to copy the text to NP++, which from what I can tell only uses UTF-8, the hex value changes.
For example, the hex values for 'й': UTF-16 is 0439 and UTF-8 is d0b9, but what I need is the CP866 hex value A9 (89 is the uppercase 'Й').
Now I wish I could use different hardware, but it is what it is. Does anyone know the best way to make this happen? Maybe a different Text Editor someone could suggest.
Thanks for the help
I think I figured it out.
Open the .doc file and go to Word Options under the main round Office button. Advanced tab -> General section -> check "Confirm file format conversion on open". Click OK. Close the file.
Reopen the .doc file. Choose Save As, change the type to Plain Text (.txt), and the File Conversion dialog should pop up. Choose Cyrillic (DOS). Click OK. A new pop-up warns that something might not display, blah blah blah... click Yes.
Close the file.
Go to the file and open it in NP++. Everything looks strange because it's now being displayed based on the ANSI map... BUT the hex values seem (I have not completely verified) to be the correct CP866. Now I can load my hardware.
I will be working on this for another day or two. I will report back if this did not work correctly.
Take a day off and come back later. It always seems to work. Hope this helps anyone else who may be experiencing similar issues.
Best!
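For anyone who can run a script instead of going through Word's Save As dialog, the same conversion is a one-liner in most languages. This is just a Python sketch (the helper name `to_cp866` is made up); it relies on Python's built-in `cp866` codec and prints the byte values for lowercase 'й' and uppercase 'Й' so they can be checked against a code page chart.

```python
def to_cp866(text):
    """Encode text as CP866 bytes; raises UnicodeEncodeError for characters
    that have no CP866 equivalent."""
    return text.encode("cp866")

for ch in ("й", "Й"):
    utf16 = ch.encode("utf-16-be").hex()
    utf8 = ch.encode("utf-8").hex()
    cp866 = to_cp866(ch).hex()
    print(ch, "UTF-16:", utf16, "UTF-8:", utf8, "CP866:", cp866)

lower_hex = to_cp866("й").hex()
upper_hex = to_cp866("Й").hex()
```

To convert a whole file, read it with its real encoding and write the result of `to_cp866` in binary mode.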

Check the encoding of text in SQlite

I'm having a nightmare dealing with non-European text in SQLite. I think the problem is that SQLite isn't encoding the text in UTF-8. So I want to check what the encoding is, and hopefully change it to UTF-8. I encoded a CSV in UTF-8 and simply imported it into SQLite, but the non-Roman text is garbled.
I would like to know:
1) How to check the encoding.
2) How to change the encoding if it is not UTF-8. I've been reading about PRAGMA encoding, but I'm not sure how to use it.
I used OpenOffice 3 to create a spreadsheet with half English and half Japanese text. Next I saved the file as a CSV using UTF-8. This part seems to be OK. I also tried doing it with Google Docs and it worked fine. Next I opened SQLite Browser and did a CSV import. The English text shows up perfectly, but the Japanese text is garbled symbols. I think SQLite is using a different encoding (perhaps UTF-16?).
You can test the encoding with this pragma:
PRAGMA encoding;
You cannot change the encoding for an existing database. To create a new database with a specific encoding, open a SQLite connection to a blank file, run this pragma:
PRAGMA encoding = "UTF-8";
And then create your database.
If you have a database and need a different encoding, then you need to create a new database with the new encoding, and then recreate the schema and import all the data.
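As a rough illustration of both pragmas, here is a Python sketch using the standard `sqlite3` module with in-memory databases standing in for real files; the table name `t` and the sample text are made up. The key point is that the encoding pragma is issued on a fresh connection before anything else is created.

```python
import sqlite3

# 1) Check the encoding of a database (a fresh one defaults to UTF-8).
conn = sqlite3.connect(":memory:")
default_encoding = conn.execute("PRAGMA encoding").fetchone()[0]
print(default_encoding)

# 2) Choose the encoding for a NEW database: set it before creating any table.
new_conn = sqlite3.connect(":memory:")
new_conn.execute('PRAGMA encoding = "UTF-16"')
new_conn.execute("CREATE TABLE t (word TEXT)")
new_encoding = new_conn.execute("PRAGMA encoding").fetchone()[0]
print(new_encoding)  # reported with the machine's byte order appended

# The text comes back intact whichever storage encoding is used, because
# SQLite converts between the stored encoding and what the API asks for.
new_conn.execute("INSERT INTO t VALUES (?)", ("日本語",))
word = new_conn.execute("SELECT word FROM t").fetchone()[0]
print(word)
```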
However, if you have a problem with garbled text, it's pretty much always a problem with one of the tools being used, not SQLite itself. Even if SQLite is using a different encoding, the only result is some extra computation as SQLite constantly converts from the stored encoding to the encoding requested through the API. If you're using anything other than the C-level APIs, you should never need to care about the encoding: the APIs used by your tool dictate which encoding should be used.
Many SQLite tools, including command line shells, have shown issues mangling text on its way into or out of SQLite. Try running SQLite from a command line and telling it to import the file itself instead of going through SQLite Browser.
I also experienced a similar issue. I used SQLiteStudio to access the database and export data. SQLiteStudio does not handle UTF8 special characters correctly, however, the SQLite database itself contains the correct UTF8 characters. I ended up writing a code snippet in C# to connect to the database, run my query, and export the data. This approach worked fine.
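A quick way to tell a display problem from damaged data is to ask SQLite for the raw bytes with its built-in `hex()` function. The sketch below does this from Python rather than C#; the table `t` and its rows are invented for illustration, and the database is assumed to use the default UTF-8 encoding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (label TEXT, word TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("good", "தமிழ்"),   # text stored correctly
                  ("damaged", "???")])  # text already replaced before storage

hex_by_label = dict(conn.execute("SELECT label, hex(word) FROM t"))
for label, stored_hex in hex_by_label.items():
    print(label, stored_hex)

# Correctly stored text matches its UTF-8 bytes; literal question marks
# show up as the byte 3F repeated.
expected_hex = "தமிழ்".encode("utf-8").hex().upper()
```

If the hex output matches the UTF-8 bytes, the database is fine and only the viewer is at fault; a run of `3F` means the text was lost before or during the import.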

What is causing the corruption of text fields with ¿ characters?

We have a very strange problem in our application: all of a sudden we started noticing
upside-down question marks being saved along with other text typed into the fields on the screen. These upside-down question marks were not entered by the users, and it is unclear where they come from. We are using Oracle 10g with ASP.NET.
Here is an example of the issue: "140, 141) ¿ 16-Oct-07". If anyone has seen this before and found a way to fix it, please let me know how.
This sounds like a character encoding issue. Please check what encoding your database (tables) are set to, and what encoding the objects or strings which are passing data in the database are of. If there is a mis-match (DB in ANSI, App in UTF-8), these sorts of issues can appear.
Greg, you should check the NLS_CHARACTERSET setting, not NLS_NCHAR_CHARACTERSET. I bet it's WE8ISO8859P1 or something similar, and not Unicode. The problem occurs when the submitted data is in Unicode, probably UTF-8, and Oracle tries to map the characters to the WE8ISO8859P1 character set. It does fine for most of them but fails for high ASCII number characters, like 140.
So yes, I have seen the same issue in our application and in our case it was caused by special quote marks (“example”, ‘example’) that were copied from MS Word. Word automatically converts double quotes to some other quotes. The solution was to convert the database to UTF-8.
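The effect is easy to imitate. The following Python sketch is not Oracle's conversion code, just an approximation under the assumption that every character missing from ISO-8859-1 gets replaced by "¿"; the function name `store_as_latin1` is made up.

```python
def store_as_latin1(text, substitute="¿"):
    """Imitate storing Unicode text in an ISO-8859-1 column: characters with
    no ISO-8859-1 equivalent are replaced by the substitute character."""
    out = []
    for ch in text:
        try:
            ch.encode("iso-8859-1")
            out.append(ch)
        except UnicodeEncodeError:
            out.append(substitute)
    return "".join(out)

pasted = "\u201cexample\u201d 16-Oct-07"  # curly quotes as pasted from Word
stored = store_as_latin1(pasted)
print(stored)

accented = store_as_latin1("España")  # characters inside ISO-8859-1 survive
print(accented)
```

Plain accented letters pass through untouched, which is why the problem only shows up with Word's typographic quotes and similar characters.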
If your users are copying from MS Word, you can turn the feature off. It's part of the AutoCorrect/AutoFormat functionality. If you uncheck the replace options for quotes and apostrophes, you should be OK. Be sure to turn off the replacements in both AutoFormat and AutoFormat As You Type.

Resources