I'm repeatedly getting this error from 10.0.29-MariaDB-0ubuntu0.16.04.1 Ubuntu 16.04
ERROR 1071 (42000) at line 81: Specified key was too long; max key
length is 767 bytes
The line targeted will typically look like this:
name VARCHAR(255) NOT NULL UNIQUE,
Changing it to a VARCHAR(63) makes the error go away. Is this a bug in MariaDB?
To work around this error, do one of
Workaround: do one of
Upgrade to 5.7.7 (or later) for 3072 byte limit instead of 767
Change 255 to 191 on the VARCHAR (assuming your values are not too long)
ALTER .. CONVERT TO utf8 -- but this disallows Emoji and some Chinese
Use a "prefix" index (ill-advised)
Reconfigure (for 5.6.3 - 5.7.6) (below)
Reconfiguring 5.6.3 or 5.5.14:
SET GLOBAL innodb_file_format=Barracuda;
SET GLOBAL innodb_file_per_table=1;
SET GLOBAL innodb_large_prefix=1;
logout & login (to get the global values);
ALTER TABLE tbl ROW_FORMAT=DYNAMIC; (or COMPRESSED)
(The version numbers are based on Oracle's MySQL; the MariaDB version numbers are different for this issue.)
To follow up on the accepted answer - to change to UTF8, which I found the easiest solution, you can use the below line in MariaDb when creating your table:
name varchar(255) CHARACTER SET 'utf8' UNIQUE NOT NULL,
Related
With Free Pascal 3.0.4, this test program correctly writes ÄÖÜ
program FPCTest;
uses IdURI;
begin
WriteLn(TIdURI.URLDecode('%C3%84%C3%96%C3%9C'));
ReadLn;
end.
However if the unit LazUTF8 (as described here) is used, it writes ???
program FPCTest;
uses IdURI, LazUTF8;
begin
WriteLn(TIdURI.URLDecode('%C3%84%C3%96%C3%9C'));
ReadLn;
end.
How can I fix this decoding error for programs which use LazUTF8?
When the String type is an alias for AnsiString 1, much of Indy's functionality exposes extra parameters/properties to let users control which ANSI encodings are used when AnsiString values are passed around in operations that perform AnsiString<->byte conversions.
1: Delphi pre-2009, and FreePascal/Lazarus when {$ModeSwitch UnicodeStrings} and {$Mode DelphiUnicode} are not used (FYI, Indy 11 will use them!).
In most cases, Indy's default byte encoding is ASCII (because many of the Internet protocols that Indy implements originally supported only ASCII - individual Indy components upgrade themselves to UTF as appropriate per protocol), though some things use the OS default codepage/charset instead.
Indy's default byte encoding can be changed at runtime by setting the global GIdDefaultTextEncoding variable in the IdGlobal unit, eg:
GIdDefaultTextEncoding := encUTF8;
But, in this particular situation, TIdURI.URLEncode() does not use GIdDefaultTextEncoding, but it does have an optional ADestEncoding parameter that you can use to specify a specific byte encoding for the returned AnsiString (in addition to an optional AByteEncoding parameter to specify the byte encoding of the parsed url octets - UTF-8 by default), eg:
TIdURI.URLDecode('%C3%84%C3%96%C3%9C'
{$IFNDEF FPC_UNICODESTRINGS}, IndyTextEncoding_UTF8, IndyTextEncoding_UTF8{$ENDIF}
)
The above will parse the url-encoded octets as UTF-8, and then return that data as-is in a UTF-8 encoded AnsiString.
If you do not specify an output encoding for ADestEncoding, URLDecode() defaults to the OS default. If you want it to use GIdDefaultTextEncoding instead, specify IndyTextEncoding_Default in the ADestEncoding parameter:
TIdURI.URLDecode('%C3%84%C3%96%C3%9C'
{$IFNDEF FPC_UNICODESTRINGS}, IndyTextEncoding_UTF8, IndyTextEncoding_Default{$ENDIF}
)
Another option would be to use the IndyTextEncoding(CodePage) function for ADestEncoding, passing it FreePascal's DefaultSystemCodePage variable, which the LazUtils package sets to CP_UTF8 2:
TIdURI.URLDecode('%C3%84%C3%96%C3%9C'
{$IFNDEF FPC_UNICODESTRINGS}, IndyTextEncoding_UTF8, IndyTextEncoding(DefaultSystemCodePage){$ENDIF}
)
2: I have opened a ticket in Indy's issue tracker to add support for DefaultSystemCodePage when compiling for FreePascal/Lazarus.
With this change in TIdURI.URLDecode lines 386ff LazUTF8 can be used:
{$IFDEF FPC}
Result := string(AByteEncoding.GetString(LBytes));
{$ELSE}
{$IFDEF STRING_IS_ANSI}
EnsureEncoding(ADestEncoding, encOSDefault);
CheckByteEncoding(LBytes, AByteEncoding, ADestEncoding);
SetString(Result, PAnsiChar(LBytes), Length(LBytes));
{$ELSE}
Result := AByteEncoding.GetString(LBytes);
{$ENDIF}
{$ENDIF}
Notes
This change assumes that the LazUTF8 unit is used always, and the Indy source code change needs to be applied every time when a new version is used.
Also I found no way to fix the TIdURI.URLDecode in a way which works with and without LazUTF8.
The following command in the sqlite CLI does not work (ellipses are expanded in the actual command):
create table a ('c0, c1, c2, c3, ..., c1000');
I'm just left with the ...> prompt until I quit out with this error message:
Error: unrecognized token: "'c0, c1, ..., c697," (with ellipses expanded).
Incidentally, creating a table with up to 698 columns works.
It doesn't seem like I'm actually hitting SQLite's limits though, so why does this happen? https://www.sqlite.org/limits.html
The default setting for SQLITE_MAX_COLUMN is 2000.
The maximum number of bytes in the text of an SQL statement is limited to SQLITE_MAX_SQL_LENGTH which defaults to 1000000.
(For the purposes of this question, disregard the wisdom of creating a table with so many columns.)
The Windows console appears to have a limit of 4096 bytes for entered lines.
Executing that string through any other mechanism (e.g., .read from a file) works fine.
Here is the sample csv file in utf-8 format which can be opened in win7's notepad and the chinese character displayed properly ,please download it .
http://pan.baidu.com/s/1sj0ia4H
Open your cmd ,and set chcp 650001.
C:\Users\pengsir>sqlite3 e:\\test.db
SQLite version 3.8.4.3 2014-04-03 16:53:12
Enter ".help" for usage hints.
sqlite> create table ipo(name TEXT,method TEXT);
sqlite> .separator ","
sqlite> .import "e:\\tmp.csv" ipo
sqlite> select * from ipo;
000001,公开招募
000002,申请表抽ç¾é™é¢è®¤è´
000004,定å‘å‘è¡Œ
000005,银行储蓄å˜å•æ–¹å¼
000006,申请表抽ç¾é™é¢è®¤è´
000007,自办å‘è¡Œ
000008,自办å‘è¡Œ
000009,定å‘å‘è¡Œ
000010,定å‘å‘è¡Œ
000011,申请表抽ç¾ç‰é¢è®¤è´
sqlite>
why the same sqlite command can get proper display in sqlitemanager?
and how can i set to display chinese character in sqlite console?
In pysqlite3 , it can get right display in python console.
>>> import sqlite3
>>> con=sqlite3.connect("e:\\test.db")
>>> cur=con.cursor()
>>> cur.execute("select * from ipo;")
<sqlite3.Cursor object at 0x01751720>
>>> print(cur.fetchall())
[('000001', '公开招募'), ('000002', '申请表抽签限额认购'), ('000004', '定向发行'
), ('000005', '银行储蓄存单方式'), ('000006', '申请表抽签限额认购'), ('000007',
'自办发行'), ('000008', '自办发行'), ('000009', '定向发行'), ('000010', '定向发
行'), ('000011', '申请表抽签等额认购')]
>>>
This issue concers how
Command Prompt window
shows the characters, and is not about how sqlite3
prints the output;
As a simple demonstration here we absolutely exclude sqlite3 and look at the files by the type command:
Let's see whats happen in other different O.S., for example in OSX:
ISO-8859-1
correspond to (Windows latino 1), windows equivalent code page setting: chcp 819
UTF8
correspond to Unicode (UTF-8), windows equivalent code page setting: chcp 65001
Pretty the same behavior also happens in Windows:
use command chcp to inspect and/or setting-up your current code page
NOTICE: this is a screenshot of an Italian Windows XP and as you can see there is still no luck! :-( , in this case the cause consists in a leak of available fonts configurable in
command prompt properties in my "Windows XP" box:
I hope this is not the case of your "Windows Seven" box ( ..but if it is , please leave me a comment to be a more specific in this part of the answer ).
..when the problem switches to the "fonts available" then Additional Languages supports would be installed and still need forcing UTF-8 by a chcp 65001:
How to get proper fonts
follows the list of steps I followed to get the result on ITA WinXP SP2 as shown in the above screenshot:
Step 1 Install East Asian language files on your computer
lecture link: to install East Asian language files on your computer
In summary these two options have been both checked
and in "Advanced Tab" I've selected Chinese:
Step 2 Switch from raster to chinese font in the terminal/"Command Windows"
Extra Step 3 (Optional) Check font in notepad
Notepad can be useful for some inspections on fonts, for example open the temp.csv and play with fonts but be aware of: Necessary criteria for fonts to be available in a command window
Well the obvious problem is that Windows (pretty much in general) has a problem in dealing with UTF-8. Especially the command line tool is by default set to a country specific codepage rather than unicode.
Usually you can (temporarily) fix it by setting the codepage for the command-line session to utf-8, for example by typing:
chcp 65001
But the problem is that in your case this does not really fix it, since sqlite seems to still run with the default charset, and there does not seem to be any option to set the current sqlite3 session to unicode.
Still the good news above it all is, that your data is correct, and you can work with it correctly using sqlitemanager or similar tools, which are able to handle unicode appropriately.
To further substantiate this: If you open your original csv with Excel it probably also will give you messed up characters (since it usually does not default to unicode). Whereas LibreOffice will typically ask you for the encoding to use, and given unicode will show the correct text, but given a different encoding (eg: western europe, etc.) will give you the same result as excel (you can preview it there quite nicely, give it a shot).
Hope this helps!
I need to import the Geonames database (http://download.geonames.org/export/dump/) into SQLite (file is about a gigabyte in size, ±8,000,000 records, tab-delimited).
I'm using the built-in SQLite-possibilities of Mac OS X, accessed through terminal. All goes well, until record 381174 (tested with older file, the exact number varies slightly depending on the exact version of the Geonames database, as it is updated every few days), where the error "expected 19 columns of data but found 18" is displayed.
The exact line causing the problem is:
126704 Gora Kyumyurkey Gora Kyumyurkey Gora Kemyurkey,Gora
Kyamyar-Kup,Gora Kyumyurkey,Gora Këmyurkëy,Komur Qu",Komur
Qu',Komurkoy Dagi,Komūr Qū’,Komūr Qū”,Kummer Kid,Kömürköy Dağı,kumwr
qwʾ,كُمور
قوء 38.73335 48.24133 T MT AZ AZ 00 0 2471 Asia/Baku 2014-03-05
I've tested various countries separately, and the western countries all completely imported without a problem, causing me to believe the problem is somewhere in the exotic characters used in some entries. (I've put this line into a separate file and tested with several other database-programs, some did give an error, some imported without a problem).
How do I solve this error, or are there other ways to import the file?
Thanks for your help and let me know if you need more information.
Regarding the question title, a preliminary search resulted in
the GeoNames format description ("tab-delimited text in utf8 encoding")
https://download.geonames.org/export/dump/readme.txt
some libraries (untested):
Perl: https://github.com/mjradwin/geonames-sqlite (+ autocomplete demo JavaScript/PHP)
PHP: https://github.com/robotamer/geonames-to-sqlite
Python: https://github.com/commodo/geonames-dump-to-sqlite
GUI (mentioned by #charlest):
https://github.com/sqlitebrowser/sqlitebrowser/
The SQLite tools have import capability as well:
https://sqlite.org/cli.html#csv_import
It looks like a bi-directional text issue. "كُمور قوء" is expected to be at the end of the comma-separated alternate name list. However, on account of it being dextrosinistral (or RTL), it's displaying on the wrong side of the latitude and longitude values.
I don't have visibility of your import method, but it seems likely to me that that's why it thinks a column is missing.
I found the same problem using the script from the geonames forum here: http://forum.geonames.org/gforum/posts/list/32139.page
Despite adjusting the script to run on Mac OS X (Sierra 10.12.6) I was getting the same errors. But thanks to the script author since it helped me get the sqlite database file created.
After a little while I decided to use the sqlite DB Browser for SQLite (version 3.11.2) rather than continue with the script.
I had errors with this method as well and found that I had to set the "Quote character" setting in the import dialog to the blank state. Once that was done the import from the FULL allCountries.txt file ran to completion taking just under an hour on my MacBookPro (an old one but with SSD).
Although I have not dived in deeper I am assuming that the geonames text files must not be quote parsed in any way. Each line simply needs to be handled as tab delimited UTF-8 strings.
At the time of writing allCountries.txt is 1.5GB with 11,930,517 records. SQLite database file is just short of 3GB.
Hope that helps.
UPDATE 1:
Further investigation has revealed that it is indeed due to the embedded quotes in the geonames files, and looking here: https://sqlite.org/quirks.html#dblquote shows that SQLite has problems with quotes. Hence you need to be able to switch off quote parsing in SQLite.
Despite the 3.11.2 version of DB Browser being based on SQLite 3.27.2 which does not have the required mods to ignore the quotes, I can only assume it must be escaping the quotes when you set the "Quote character" to blank.
I have a strange problem with an FTS table in my sqlite database. I am creating a table like this:
CREATE VIRTUAL TABLE table01 USING fts4(name, content, tokenize=icu ru_RU);
Then inserting two rows, one with name and content and one only with name.
INSERT INTO table01(name, content) VALUES('Abc1', 'asdasd');
INSERT INTO table01(name) VALUES('Abc2');
And when I am trying to update or delete row with only name column filled I am getting an error "Access violation at address 7244CF96 in module 'sqlite3.dll'. Read of address 00000000.". There is all ok with other row, I can update it or delete as usual. It doesn't matter what locale do I use, I've tried en_US and many others. And there is no difference between fts3 and fts4. It seems like ICU bug, because there is no problem with built-in tokenizers of sqlite. Is it ICU or sqlite bug, or I am doing something wrong? I've tried this on SQLite Expert and in my project with all required compile options and all libraries included.
Latest in the stack frame:
msvcr100d.dll!strlen(unsigned char * buf) Line 81 Asm
my_prog.exe!icuOpen(sqlite3_tokenizer * pTokenizer, const char * zInput, int nInput, sqlite3_tokenizer_cursor * * ppCursor) Line 60276 + 0x9 bytes
my_prog.exe!fts3PendingTermsAdd(Fts3Table * p, const char * zText, int iCol, unsigned int * pnWord) Line 52616 + 0x18 bytes C
my_prog.exe!fts3DeleteTerms(int * pRC, Fts3Table * p, Mem * * apVal, unsigned int * aSz) Line 52828 + 0x1a bytes C
fts3DeleteTerms is getting null pointer instead of string and passing it down. I cannot debug it step-by-step right now (I am using amalgamation and VS 2010 cannot debug more than 65535 lines code).
Address 00000000 is, of course, a null pointer. Sounds like an sqlite3 bug (at least in the fts4/icu glue) if it is trying to pass null to the ICU tokenizer. Can you build everything in debug mode and find out who is getting the access violation?