Unicode errors after upgrade to 2.1.0 final - django-cms

I've recently upgraded a DjangoCMS project from 2.1.0beta3 to 2.1.0
final, and I've started getting Unicode errors during page editing.
There was a large volume of production content that was migrated
forward with South. I get the error while (using TinyMCE) I try to
insert another plugin, such as an image, into a text plugin or when I
try to add a plugin to a placeholder.
URL:
/admin/cms/page/188/edit-plugin/673/edit-plugin/676/
Stack Trace:
File "/srv/wsphp/wspython/virtualenv/iaffe-prod/lib/python2.6/site-packages/django/template/__init__.py", line 849, in render
return _render_value_in_context(output, context)
File "/srv/wsphp/wspython/virtualenv/iaffe-prod/lib/python2.6/site-packages/django/template/__init__.py", line 829, in _render_value_in_context
value = force_unicode(value)
File "/srv/wsphp/wspython/virtualenv/iaffe-prod/lib/python2.6/site-packages/django/utils/encoding.py", line 88, in force_unicode
raise DjangoUnicodeDecodeError(s, *e.args)
DjangoUnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 12: ordinal not in range(128). You passed in <django.forms.forms.BoundField object at 0xb73cedec> (<class 'django.forms.forms.BoundField'>)
If I repeatedly try to create the plugin, the ID (676 here) increments,
so it looks like the error happens when the form is rendered. This
affects link, picture, and teaser plugins, but not text, file or
snippet plugins.
I'd appreciate any help in isolating the cause here.
Thanks,
Michael

Bit of a late answer, but I had some problems with unicode and solved in with defining source code encodings also see http://evanjones.ca/python-utf8.html
I put
# -*- coding: utf-8 -*-
at the top of the offending files and everything was sorted.

It turns out that this was a data migration issue. The ultimate solution was to force utf8 encoding in the relevant mysql tables using commands like:
alter table cms_page convert to character set utf8;

Related

Testable map base xml to flat file fails in BTS2013r2

I have lots of BTS2010 unit tests that check an XML file can be mapped to flat file.
I have developed my first of such tests on BTS2013r2 but on executing TestableMapBase.TestMap(_inputFilename, _inputType, outputFilename, _outputType), I get the error "Generate schema instance failure"
I've used reflector to debug the MS assemblies and got as far as the following line within CFrameworkSchemaTreeExtensions.cs of Microsoft.BizTalk.TOM.Adapter :
infoArray = instanceGenerator.GenerateInstance(filename, xmlInstance);
on executing, the infoArray is populated with the following error
ErrorInfo: hexadecimal value 0x00, is an invalid character. Line 2, position 1."
Prior to executing I have taken the content of xmlInstance, pasted into Notepad++ and used the Hex plugin to search for null characters (hex 0x00), there are none.
I have tried many different XML inputs to the maps on two different BizTalk development laptops and get the same result.
Has anyone been able to successfully run tests of XML to flat file in BTS2013r2?
Today I have created the most basic of solutions (1 BizTalk project + 1 unit test project) in order to test if this really is a Microsoft bug. It does seem that way because I got the same error when running this very simple test on a third BizTalk development laptop. I have added the source code to the following github repo: https://github.com/RobBowman/FFMapFailBTS2013r2
Make sure it is not an encoding issue. Finding a 0x00 at that position sounds like the input file is in UTF-16 format, while the processor is expecting UTF-8 or another single-byte encoding.
Microsoft have published a hotfix for this - see: https://social.msdn.microsoft.com/Forums/en-US/cacecbfd-8b71-409c-bd59-2eed26950f25/test-map-to-flat-file-in-bts-2013r2-does-this-ever-work?forum=biztalkgeneral

how to display chinese character properly in sqlite console?

Here is the sample csv file in utf-8 format which can be opened in win7's notepad and the chinese character displayed properly ,please download it .
http://pan.baidu.com/s/1sj0ia4H
Open your cmd ,and set chcp 650001.
C:\Users\pengsir>sqlite3 e:\\test.db
SQLite version 3.8.4.3 2014-04-03 16:53:12
Enter ".help" for usage hints.
sqlite> create table ipo(name TEXT,method TEXT);
sqlite> .separator ","
sqlite> .import "e:\\tmp.csv" ipo
sqlite> select * from ipo;
000001,公开招募
000002,申请表抽签é™é¢è®¤è´­
000004,定å‘å‘è¡Œ
000005,银行储蓄存å•æ–¹å¼
000006,申请表抽签é™é¢è®¤è´­
000007,自办å‘è¡Œ
000008,自办å‘è¡Œ
000009,定å‘å‘è¡Œ
000010,定å‘å‘è¡Œ
000011,申请表抽签等é¢è®¤è´­
sqlite>
why the same sqlite command can get proper display in sqlitemanager?
and how can i set to display chinese character in sqlite console?
In pysqlite3 , it can get right display in python console.
>>> import sqlite3
>>> con=sqlite3.connect("e:\\test.db")
>>> cur=con.cursor()
>>> cur.execute("select * from ipo;")
<sqlite3.Cursor object at 0x01751720>
>>> print(cur.fetchall())
[('000001', '公开招募'), ('000002', '申请表抽签限额认购'), ('000004', '定向发行'
), ('000005', '银行储蓄存单方式'), ('000006', '申请表抽签限额认购'), ('000007',
'自办发行'), ('000008', '自办发行'), ('000009', '定向发行'), ('000010', '定向发
行'), ('000011', '申请表抽签等额认购')]
>>>
This issue concers how
Command Prompt window
shows the characters, and is not about how sqlite3
prints the output;
As a simple demonstration here we absolutely exclude sqlite3 and look at the files by the type command:
Let's see whats happen in other different O.S., for example in OSX:
ISO-8859-1
correspond to (Windows latino 1), windows equivalent code page setting: chcp 819
UTF8
correspond to Unicode (UTF-8), windows equivalent code page setting: chcp 65001
Pretty the same behavior also happens in Windows:
use command chcp to inspect and/or setting-up your current code page
NOTICE: this is a screenshot of an Italian Windows XP and as you can see there is still no luck! :-( , in this case the cause consists in a leak of available fonts configurable in
command prompt properties in my "Windows XP" box:
I hope this is not the case of your "Windows Seven" box ( ..but if it is , please leave me a comment to be a more specific in this part of the answer ).
..when the problem switches to the "fonts available" then Additional Languages supports would be installed and still need forcing UTF-8 by a chcp 65001:
How to get proper fonts
follows the list of steps I followed to get the result on ITA WinXP SP2 as shown in the above screenshot:
Step 1 Install East Asian language files on your computer
lecture link: to install East Asian language files on your computer
In summary these two options have been both checked
and in "Advanced Tab" I've selected Chinese:
Step 2 Switch from raster to chinese font in the terminal/"Command Windows"
Extra Step 3 (Optional) Check font in notepad
Notepad can be useful for some inspections on fonts, for example open the temp.csv and play with fonts but be aware of: Necessary criteria for fonts to be available in a command window
Well the obvious problem is that Windows (pretty much in general) has a problem in dealing with UTF-8. Especially the command line tool is by default set to a country specific codepage rather than unicode.
Usually you can (temporarily) fix it by setting the codepage for the command-line session to utf-8, for example by typing:
chcp 65001
But the problem is that in your case this does not really fix it, since sqlite seems to still run with the default charset, and there does not seem to be any option to set the current sqlite3 session to unicode.
Still the good news above it all is, that your data is correct, and you can work with it correctly using sqlitemanager or similar tools, which are able to handle unicode appropriately.
To further substantiate this: If you open your original csv with Excel it probably also will give you messed up characters (since it usually does not default to unicode). Whereas LibreOffice will typically ask you for the encoding to use, and given unicode will show the correct text, but given a different encoding (eg: western europe, etc.) will give you the same result as excel (you can preview it there quite nicely, give it a shot).
Hope this helps!

How to import Geonames into SQLite?

I need to import the Geonames database (http://download.geonames.org/export/dump/) into SQLite (file is about a gigabyte in size, ±8,000,000 records, tab-delimited).
I'm using the built-in SQLite-possibilities of Mac OS X, accessed through terminal. All goes well, until record 381174 (tested with older file, the exact number varies slightly depending on the exact version of the Geonames database, as it is updated every few days), where the error "expected 19 columns of data but found 18" is displayed.
The exact line causing the problem is:
126704 Gora Kyumyurkey Gora Kyumyurkey Gora Kemyurkey,Gora
Kyamyar-Kup,Gora Kyumyurkey,Gora Këmyurkëy,Komur Qu",Komur
Qu',Komurkoy Dagi,Komūr Qū’,Komūr Qū”,Kummer Kid,Kömürköy Dağı,kumwr
qwʾ,كُمور
قوء 38.73335 48.24133 T MT AZ AZ 00 0 2471 Asia/Baku 2014-03-05
I've tested various countries separately, and the western countries all completely imported without a problem, causing me to believe the problem is somewhere in the exotic characters used in some entries. (I've put this line into a separate file and tested with several other database-programs, some did give an error, some imported without a problem).
How do I solve this error, or are there other ways to import the file?
Thanks for your help and let me know if you need more information.
Regarding the question title, a preliminary search resulted in
the GeoNames format description ("tab-delimited text in utf8 encoding")
https://download.geonames.org/export/dump/readme.txt
some libraries (untested):
Perl: https://github.com/mjradwin/geonames-sqlite (+ autocomplete demo JavaScript/PHP)
PHP: https://github.com/robotamer/geonames-to-sqlite
Python: https://github.com/commodo/geonames-dump-to-sqlite
GUI (mentioned by #charlest):
https://github.com/sqlitebrowser/sqlitebrowser/
The SQLite tools have import capability as well:
https://sqlite.org/cli.html#csv_import
It looks like a bi-directional text issue. "كُمور قوء" is expected to be at the end of the comma-separated alternate name list. However, on account of it being dextrosinistral (or RTL), it's displaying on the wrong side of the latitude and longitude values.
I don't have visibility of your import method, but it seems likely to me that that's why it thinks a column is missing.
I found the same problem using the script from the geonames forum here: http://forum.geonames.org/gforum/posts/list/32139.page
Despite adjusting the script to run on Mac OS X (Sierra 10.12.6) I was getting the same errors. But thanks to the script author since it helped me get the sqlite database file created.
After a little while I decided to use the sqlite DB Browser for SQLite (version 3.11.2) rather than continue with the script.
I had errors with this method as well and found that I had to set the "Quote character" setting in the import dialog to the blank state. Once that was done the import from the FULL allCountries.txt file ran to completion taking just under an hour on my MacBookPro (an old one but with SSD).
Although I have not dived in deeper I am assuming that the geonames text files must not be quote parsed in any way. Each line simply needs to be handled as tab delimited UTF-8 strings.
At the time of writing allCountries.txt is 1.5GB with 11,930,517 records. SQLite database file is just short of 3GB.
Hope that helps.
UPDATE 1:
Further investigation has revealed that it is indeed due to the embedded quotes in the geonames files, and looking here: https://sqlite.org/quirks.html#dblquote shows that SQLite has problems with quotes. Hence you need to be able to switch off quote parsing in SQLite.
Despite the 3.11.2 version of DB Browser being based on SQLite 3.27.2 which does not have the required mods to ignore the quotes, I can only assume it must be escaping the quotes when you set the "Quote character" to blank.

where can I download the ispell *.dict and *.affix files?

I am quite new to postgresql full text search and I am setting up the configuration as where can I download the ispell *.dict and *.affix filefollowing (exactly as in docs):
CREATE TEXT SEARCH DICTIONARY english_ispell (
TEMPLATE = ispell,
DictFile = english,
AffFile = english,
StopWords = english
);
So, this I think expects files english.dict and english.affix on for example:
/usr/share/postgresql/9.2/tsearch_data
But these files are not there. I just have ispell_sample.dict and ispell_sample.affix - which when included above work fine - no problem.
So... I followed this post and downloaded the required dictionary from the open office people and renamed the .dic to .dict and .aff to .affix. Then I have checked (using file -bi dict.affix and file -bi english.dict and they are UTF8 encoded).
When I run the above text search dictionary, I get the error:
ERROR: wrong affix file format for flag
CONTEXT: line 2778 of configuration file "/usr/share/postgresql/9.2/tsearch_data/english.affix": "COMPOUNDMIN 1
"
I was wondering if anyone had clues on how to solve this problem or if anyone had encountered this before..
Thanks./.
UPDATE:1: I guess the question can be rephrased as follows:
where can I download the ispell *.dict and *.affix file for postgres
Here's a good reference: https://www.cs.hmc.edu/~geoff/ispell-dictionaries.html This is a good resource for those dictionaries of any language.

Set Qt default encoding to UTF-8

In my Qt application my source code files are encoded as UTF-8. For the following code...
QMessageBox::critical(this, "Nepoznata pogreška", "Dogodila se nepoznata pogreška! Želite li zatvoriti ovaj program ?", QMessageBox::Yes, QMessageBox::No);
...when I show that message box, the character "š" wouldn't be displayed as "š", but as something strange. This is because Qt converts all C-strings as if they are encoded using LATIN-1. To solve this I've been using:
QMessageBox::critical(this, QString::fromUtf8("Nepoznata pogreška"), QString::fromUtf8("Dogodila se nepoznata pogreška! Želite li zatvoriti ovaj program ?"), QMessageBox::Yes, QMessageBox::No);
Is there a way to get rid of all the calls to QString::fromUtf8()?
Have you tried using QTextCodec::setCodecForCStrings(QTextCodec::codecForName("UTF-8"))?
setCodecForCStrings() had been deprecated.
Try instead,
QTextCodec::setCodecForLocale(QTextCodec::codecForName("UTF-8"));
It worked for me.
Regarding the "guess" that "Qt5 assumes all source files are UTF-8 encoded": Thiago Macieira explains the decision made by Qt's developers here.
The assumption can be disabled with QT_NO_CAST_FROM_ASCII according to the documentation.

Resources