how to display chinese character properly in sqlite console? - sqlite

Here is the sample csv file in utf-8 format which can be opened in win7's notepad and the chinese character displayed properly ,please download it .
http://pan.baidu.com/s/1sj0ia4H
Open your cmd ,and set chcp 650001.
C:\Users\pengsir>sqlite3 e:\\test.db
SQLite version 3.8.4.3 2014-04-03 16:53:12
Enter ".help" for usage hints.
sqlite> create table ipo(name TEXT,method TEXT);
sqlite> .separator ","
sqlite> .import "e:\\tmp.csv" ipo
sqlite> select * from ipo;
000001,公开招募
000002,申请表抽签é™é¢è®¤è´­
000004,定å‘å‘è¡Œ
000005,银行储蓄存å•æ–¹å¼
000006,申请表抽签é™é¢è®¤è´­
000007,自办å‘è¡Œ
000008,自办å‘è¡Œ
000009,定å‘å‘è¡Œ
000010,定å‘å‘è¡Œ
000011,申请表抽签等é¢è®¤è´­
sqlite>
why the same sqlite command can get proper display in sqlitemanager?
and how can i set to display chinese character in sqlite console?
In pysqlite3 , it can get right display in python console.
>>> import sqlite3
>>> con=sqlite3.connect("e:\\test.db")
>>> cur=con.cursor()
>>> cur.execute("select * from ipo;")
<sqlite3.Cursor object at 0x01751720>
>>> print(cur.fetchall())
[('000001', '公开招募'), ('000002', '申请表抽签限额认购'), ('000004', '定向发行'
), ('000005', '银行储蓄存单方式'), ('000006', '申请表抽签限额认购'), ('000007',
'自办发行'), ('000008', '自办发行'), ('000009', '定向发行'), ('000010', '定向发
行'), ('000011', '申请表抽签等额认购')]
>>>

This issue concers how
Command Prompt window
shows the characters, and is not about how sqlite3
prints the output;
As a simple demonstration here we absolutely exclude sqlite3 and look at the files by the type command:
Let's see whats happen in other different O.S., for example in OSX:
ISO-8859-1
correspond to (Windows latino 1), windows equivalent code page setting: chcp 819
UTF8
correspond to Unicode (UTF-8), windows equivalent code page setting: chcp 65001
Pretty the same behavior also happens in Windows:
use command chcp to inspect and/or setting-up your current code page
NOTICE: this is a screenshot of an Italian Windows XP and as you can see there is still no luck! :-( , in this case the cause consists in a leak of available fonts configurable in
command prompt properties in my "Windows XP" box:
I hope this is not the case of your "Windows Seven" box ( ..but if it is , please leave me a comment to be a more specific in this part of the answer ).
..when the problem switches to the "fonts available" then Additional Languages supports would be installed and still need forcing UTF-8 by a chcp 65001:
How to get proper fonts
follows the list of steps I followed to get the result on ITA WinXP SP2 as shown in the above screenshot:
Step 1 Install East Asian language files on your computer
lecture link: to install East Asian language files on your computer
In summary these two options have been both checked
and in "Advanced Tab" I've selected Chinese:
Step 2 Switch from raster to chinese font in the terminal/"Command Windows"
Extra Step 3 (Optional) Check font in notepad
Notepad can be useful for some inspections on fonts, for example open the temp.csv and play with fonts but be aware of: Necessary criteria for fonts to be available in a command window

Well the obvious problem is that Windows (pretty much in general) has a problem in dealing with UTF-8. Especially the command line tool is by default set to a country specific codepage rather than unicode.
Usually you can (temporarily) fix it by setting the codepage for the command-line session to utf-8, for example by typing:
chcp 65001
But the problem is that in your case this does not really fix it, since sqlite seems to still run with the default charset, and there does not seem to be any option to set the current sqlite3 session to unicode.
Still the good news above it all is, that your data is correct, and you can work with it correctly using sqlitemanager or similar tools, which are able to handle unicode appropriately.
To further substantiate this: If you open your original csv with Excel it probably also will give you messed up characters (since it usually does not default to unicode). Whereas LibreOffice will typically ask you for the encoding to use, and given unicode will show the correct text, but given a different encoding (eg: western europe, etc.) will give you the same result as excel (you can preview it there quite nicely, give it a shot).
Hope this helps!

Related

How do I tell Vim to use any/all dictionary files that fit a filepath wildcard glob?

I am trying to set the dictionary option (to allow autocompletion of certain words of my choosing) using wildcards in a filename glob, as follows:
:set dict+=$VIM/dict/dict*.lst
The hope is that, with this line in the initially sourced .vimrc (or, in my case of Windows 10, _vimrc), I can add different dictionary files to my $VIM/dict directory later, and each new invocation of Vim will use those dictionary files, without me needing to modify my .vimrc settings.
However, an error message says that there is no such file. When I give a specific filename (as in :set dict+=$VIM/dict/dict01.lst ), then it works.
The thing is, I could swear that this used to work. I had this setting in my .vimrc files since I started using Vim 7.1, and I don't recall any such error message until recently. Now it shows up on my Linux laptop as well as my Windows 7 and Windows 10 laptops. I can't remember exactly when this started happening.
Yes, I tried using backslashes (as in :set dict+=$VIM\dict\dict*.lst ) in case it was a Windows compatibility issue, but that still doesn't work. (Also this is happening on my Linux laptop, too, and that doesn't use backslashes for filepaths.)
Am I going senile? Or is there some other mysterious force going on?
Assuming for now that it is a change in the latest version of Vim, is there some way to specify "use all the dictionary files that fit this glob"?
-- Edited 2021-02-14 06:17:07
I also checked to see if it was due to having more than one file that fits the wildcard glob. (I thought that if I had more than one file that fit the wildcard, the glob would turn into two filenames, equivalent to saying dict+=$VIM/dict/dict01.lst dict02.lst which would not be syntactically valid.) But it still did not working after removing extra files so that only one file fit my pathname of $VIM/dict/dict*.lst . (I had previously put another Addendum here happily explaining that this was how I solved my problem, but it turned out to be premature.)
You must expand wildcards before setting an option. Multiple file names must be separated by commas. For example,
let &dictionary = tr(expand("$VIM/dict/dict*.lst"), "\n", ",")
If adding a value to a non-empty option, don't forget to add comma too (let is more universal than set, so it's less forgiving):
let &dictionary .= "," . tr(expand(...)...)

How to import Geonames into SQLite?

I need to import the Geonames database (http://download.geonames.org/export/dump/) into SQLite (file is about a gigabyte in size, ±8,000,000 records, tab-delimited).
I'm using the built-in SQLite-possibilities of Mac OS X, accessed through terminal. All goes well, until record 381174 (tested with older file, the exact number varies slightly depending on the exact version of the Geonames database, as it is updated every few days), where the error "expected 19 columns of data but found 18" is displayed.
The exact line causing the problem is:
126704 Gora Kyumyurkey Gora Kyumyurkey Gora Kemyurkey,Gora
Kyamyar-Kup,Gora Kyumyurkey,Gora Këmyurkëy,Komur Qu",Komur
Qu',Komurkoy Dagi,Komūr Qū’,Komūr Qū”,Kummer Kid,Kömürköy Dağı,kumwr
qwʾ,كُمور
قوء 38.73335 48.24133 T MT AZ AZ 00 0 2471 Asia/Baku 2014-03-05
I've tested various countries separately, and the western countries all completely imported without a problem, causing me to believe the problem is somewhere in the exotic characters used in some entries. (I've put this line into a separate file and tested with several other database-programs, some did give an error, some imported without a problem).
How do I solve this error, or are there other ways to import the file?
Thanks for your help and let me know if you need more information.
Regarding the question title, a preliminary search resulted in
the GeoNames format description ("tab-delimited text in utf8 encoding")
https://download.geonames.org/export/dump/readme.txt
some libraries (untested):
Perl: https://github.com/mjradwin/geonames-sqlite (+ autocomplete demo JavaScript/PHP)
PHP: https://github.com/robotamer/geonames-to-sqlite
Python: https://github.com/commodo/geonames-dump-to-sqlite
GUI (mentioned by #charlest):
https://github.com/sqlitebrowser/sqlitebrowser/
The SQLite tools have import capability as well:
https://sqlite.org/cli.html#csv_import
It looks like a bi-directional text issue. "كُمور قوء" is expected to be at the end of the comma-separated alternate name list. However, on account of it being dextrosinistral (or RTL), it's displaying on the wrong side of the latitude and longitude values.
I don't have visibility of your import method, but it seems likely to me that that's why it thinks a column is missing.
I found the same problem using the script from the geonames forum here: http://forum.geonames.org/gforum/posts/list/32139.page
Despite adjusting the script to run on Mac OS X (Sierra 10.12.6) I was getting the same errors. But thanks to the script author since it helped me get the sqlite database file created.
After a little while I decided to use the sqlite DB Browser for SQLite (version 3.11.2) rather than continue with the script.
I had errors with this method as well and found that I had to set the "Quote character" setting in the import dialog to the blank state. Once that was done the import from the FULL allCountries.txt file ran to completion taking just under an hour on my MacBookPro (an old one but with SSD).
Although I have not dived in deeper I am assuming that the geonames text files must not be quote parsed in any way. Each line simply needs to be handled as tab delimited UTF-8 strings.
At the time of writing allCountries.txt is 1.5GB with 11,930,517 records. SQLite database file is just short of 3GB.
Hope that helps.
UPDATE 1:
Further investigation has revealed that it is indeed due to the embedded quotes in the geonames files, and looking here: https://sqlite.org/quirks.html#dblquote shows that SQLite has problems with quotes. Hence you need to be able to switch off quote parsing in SQLite.
Despite the 3.11.2 version of DB Browser being based on SQLite 3.27.2 which does not have the required mods to ignore the quotes, I can only assume it must be escaping the quotes when you set the "Quote character" to blank.

Error invalid control sequence "a href=\”%s\”

I am translating a WordPress theme Magnovus using the default .po file in poEditor 1.6.3 on OSX Mavericks (free version). I got the following errors even when I copied the text over to the translation:
error invalid control sequence
the string I am translating is
You must be logged in to post a comment.
whether I copy it or translate it, I keep on getting this error.
I think the quotes are escaped properly here. Any ideas how to fix this?
This is a bug in Poedit 1.6.3 occurring for some (but curiously, not all) users on OS X Mavericks. It is related to OS X's Smart Quotes feature, which is on by default since 10.9.
It will be fixed in Poedit 1.6.4 and you can download a fixed build of 1.6.3 from http://www.poedit.net/dl/Poedit-1.6.3b.zip. You will need to manually fix the already-broken invalid translation to use \" (i.e. the "dumb quote"), but it won't happen again with 1.6.3b+.
(By the way: http://poedit.net/support.php)
Testing with poEdit 1.6.4 (and MacOSX 10.9.2), these errors of invalid control sequence continue to occur randomly (after fuzzy procedure). The place of errors indicated by red lines in screen table is not good. The only way to repair is to used a text editor and replace \” by \" !
Question : is a way to stop automatic fuzzy suggestion ?
M.
You should use: %s, not \"%s\".

vi editor behaving weird

My vi editor has started behaving weirdly since today morning.
When I use keys in command mode, the characters are being overwritten. For example when using 'w' for word jump, it replaces the characters. To make sure something is not wrong with my settings I have removed the .exrc file from my home, but its still not working correctly.
I am using solaris 10, shell is ksh and version is: Version SVR4.0, Solaris 2.5.0
Example:
VARCHAR(3) is changed to (3) CHAR(3) on doing word jump.
Any idea how it has changed suddenly? Does it use any other default resource file that might have been changed?
It need not be a resource file of vi that has been changed. The setting of the environment variable TERM in the original shell session may be wrong. Compare $TERM there and in a new shell.

console print w/o scrolling

I see console apps print colors and seen apps such as ffmpeg print text over itself instead of a new line. How do I print over an existing line? I want to display fps in my console app either at the very top or very bottom and have regular printfs go there and scroll normally.
I need this for windows, but this is meant to be cross platform, so I will eventually have a linux and mac implementation.
There is two simple possibilities which work on linux as well as windows, but only for one line:
printf("\b"); will return for one character, so you might count how many character you want to backspace and fire this in a loop, or you know that you only write n numbers and do it likeprintf("\b\b\b\b\b\b\b\b\b\b");
printf("text to be overwritten by next printf\r"); this will return the cursor to the beginning of the line, so any next printf will overwrite it. Make sure to write a string of same length or longer so you overwrite it entirely.
If you want to rewrite several lines, there is nothing so portable as ncurses, there is libs for it on practically every operating system, and you don't have to take care of the ANSI-differences.
edit: added link to ncurses wikipedia page, gives great overview and introduction, as well as link list and maybe a translation to your preferred language
Check out ncurses. It has bindings for most scripting languages.
You can use '\r' instead of '\n'.
The ASCII character number 8 (A.K.A. Ctrl-H, BS or Backspace) lets you back up one character. ASCII Character number 13 (A.K.A Ctrl-M, CR or Carriage Return) returns the cursor at the beggining of the line.
If you are working in C try putchar(8); and putchar(13);
The magic of the colors, cursor locating and bliking and so on are inside ANSI escape codes. Any text console capable of handling ANSI codes can use them just printing them out to console (i.e. by means of echo in a bash script or printf() function in C).
Unix terminals support ANSI escape sequences and Windows world used to support them back in old MS-DOS days, but the multibyte console support put an end to this. There is more information here. However there are other ways out of just ANSI sequences printing available on Windows. Moreover if you have Cygwin installed on your Windows maching ANSI codes work just as great as on any Unix terminal.
Many people mention Ncurses library that is the de-facto standard for any gui-like text based applications. What this library does is to hide all the terminal differences (Windows/Unix flavours) to represent the same information as identical as possible across all the platforms, though from my own experience I tell you this is not always true (i.e. typical text window frames change because the especial chars are not available under all character encodings). The counterpart of using ncurses is that it is a complete API and it is much harder to start out with it than simply writing out some ANSI escape sequences for simple things such as change the font color, cleaning screen or moving back the cursor to a random position.
For the sake of completeness I paste an example of use of ANSI sequence under Linux that changes the prompt to blue and shows the date:
PS1="\[\033[34m\][\$(date +%H%M)][\u#\h:\w]$ "
You can use Ncurses -
ncurses package is a subroutine library for terminal-independent screen-painting and input-event handling which presents a high level screen model to the programmer, hiding differences between terminal types and doing automatic optimization of output to change one screenfull of text into another
Depending on the platform which you are developing on there's probably a more powerful API which you could use, rather than old ASCII control codes.
e.g. If you are working on Win32 you can actually manipulate the console screen buffer directly.
A good place to start might be here
http://msdn.microsoft.com/en-us/library/ms683171(VS.85).aspx
I have been looking for similar functions/API which would allow me to access the console as something other than a stream of text for other platforms. Haven't found anything yet, but then again, I haven't been looking that hard.
Hope it helps.

Resources