Chinese Text Display in Qt Embedded Linux? - qt

I am using the code below to display Chinese text when a button is clicked. It works fine on Windows, but when I try it on the embedded device it shows junk characters.
I am using the "Batang" font, which is installed on my embedded device.
QTextCodec::setCodecForCStrings(QTextCodec::codecForLocale());
QTextCodec::setCodecForTr(QTextCodec::codecForLocale());
QString qString1 = tr("鳶尾花");
QByteArray byteArray = qString1.toUtf8();
const char* cString = byteArray.data();
QString qString2 = QString::fromUtf8(cString);
QTextCodec::setCodecForTr(QTextCodec::codecForName(cString));
ui->txtFirstname->setText(qString2);
Any help is appreciated.
Thanks

When you added the line
QTextCodec::setCodecForTr(QTextCodec::codecForName(cString));
you probably thought that the following overload:
QTextCodec * QTextCodec::codecForName ( const QByteArray & name ) [static]
would try to find the best codec for the characters in the byte array you supplied.
However, this function tries to find the codec which has a name closest to the value you supplied, so you would have to do something like
QTextCodec::setCodecForTr(QTextCodec::codecForName("Big5"));
instead.
Have you tried leaving out that line? You are already setting the text codec a few lines above anyway.
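To see why a codec mismatch shows up as junk rather than as an error, here is a minimal sketch in plain C++ (no Qt). It assumes, purely for illustration, that the source file was saved as UTF-8 and that the device's locale codec is Latin-1 (ISO 8859-1). The question does not say which locale the device uses, and the helper name `latin1ToUtf8` is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: interpret every byte as a Latin-1 character (byte
// value == code point) and re-encode the result as UTF-8 so it can be shown.
// This is in effect what happens when UTF-8 bytes are decoded as Latin-1.
std::string latin1ToUtf8(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out += static_cast<char>(b);                 // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));   // U+0080..U+00FF need
            out += static_cast<char>(0x80 | (b & 0x3F)); // two UTF-8 bytes
        }
    }
    return out;
}

// Example: the three UTF-8 bytes of U+82B1 (花) come out as three unrelated
// characters: U+00E8 (è), the invisible control character U+008A and U+00B1 (±).
void mojibakeExample() {
    assert(latin1ToUtf8("\xE8\x8A\xB1") == "\xC3\xA8\xC2\x8A\xC2\xB1");
}
```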

I resolved it using:
QTextCodec::setCodecForTr(QTextCodec::codecForName("GB18030"));
Big5 was not giving me the correct result.
Thanks.

Try using a different encoding instead of UTF-8, depending on the characters you will be using. Hope this helps.
* Guobiao is mainly used in Mainland China and Singapore. All Guobiao standards are prefixed with GB; the latest version is GB18030, a one-, two- or four-byte encoding.
* Big5, used in Taiwan, Hong Kong and Macau, is a one or two byte encoding.
* Unicode, with the set of CJK Unified Ideographs.
Because the characters you use seem to be Big5 characters, read this for more info: http://doc.qt.nokia.com/stable/codec-big5.html
A tutorial can be found here: http://doc.qt.nokia.com/latest/qtextcodec.html
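For comparison with the byte widths listed above, this plain C++ sketch shows the widths UTF-8 uses. The helper `encodeUtf8` is hypothetical. UTF-8 needs one byte for ASCII, three for most CJK Unified Ideographs such as U+82B1 (花), and four beyond U+FFFF. A Big5 or GB18030 codec does not decode these non-ASCII bytes back to the same characters. That is why the codec has to match the encoding the text was actually stored in.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Precondition: cp is a Unicode scalar value (<= U+10FFFF, not a surrogate).
std::string encodeUtf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                       // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx + 2 continuation
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx + 3 continuation
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, `encodeUtf8(0x82B1)` yields the three bytes E8 8A B1.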

Related

Qt - How to detect possible invalid data loss with QTextStream readAll?

I am reading in a file that should be UTF-8 encoded using QTextStream::readAll(). If I attempt to open a corrupt UTF-8 file (or a binary file) I want to know that the data was not valid UTF-8.
I tried checking the status() after the read, but it did not indicate any abnormal condition.
I know I could read the whole file in binary mode and write a routine to check it myself, but it seems there should be an easier way, since the read has done all that UTF-8 conversion already.
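For reference, the hand-written check mentioned in the question need not be long. Below is a minimal sketch in plain C++ (no Qt). The name `isValidUtf8` is hypothetical, and the checks follow the UTF-8 rules of RFC 3629. It rejects stray continuation bytes, truncated sequences, overlong encodings, UTF-16 surrogates and values above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if `data` is well-formed UTF-8 (RFC 3629).
bool isValidUtf8(const std::string& data) {
    // Smallest code point allowed for each sequence length (index = length);
    // anything below it is an overlong encoding.
    static const char32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < data.size()) {
        const unsigned char lead = static_cast<unsigned char>(data[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return false;                          // stray continuation byte or 0xF8..0xFF
        if (i + len > data.size()) return false;    // truncated at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(data[i + k]);
            if ((c & 0xC0) != 0x80) return false;   // expected 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minForLen[len]) return false;          // overlong
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                // beyond Unicode
        i += len;
    }
    return true;
}

// Quick self-checks of the rules above.
void validatorExamples() {
    assert(isValidUtf8("\xE8\x8A\xB1"));   // U+82B1, well-formed
    assert(!isValidUtf8("\xE8\x8A"));      // truncated sequence
    assert(!isValidUtf8("\xC0\xAF"));      // overlong encoding of '/'
}
```

One way to use it is to read the file's raw bytes first and hand them to QTextStream only once the check passes.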
You can use QTextCodec for this.
QTextCodec * QTextCodec::codecForUtfText(const QByteArray & ba, QTextCodec * defaultCodec)
From documentation:
Tries to detect the encoding of the provided snippet ba by using the BOM (Byte Order Mark) and returns a QTextCodec instance that is capable of decoding the text to unicode. If the codec cannot be detected from the content provided, defaultCodec is returned.
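Note that, as the quoted documentation says, this detection relies on the BOM. A UTF-8 file without a BOM, corrupt or not, simply gets `defaultCodec`, so this call alone does not flag invalid bytes. The plain C++ sketch below illustrates the kind of BOM check being described. The function `detectUtfByBom` is hypothetical and is not Qt's implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: name the UTF encoding announced by a leading byte
// order mark, or return `fallback` when there is none (like the
// defaultCodec behaviour quoted above). UTF-32 is tested first because the
// UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
std::string detectUtfByBom(const std::string& bytes, const std::string& fallback) {
    auto startsWith = [&bytes](const std::string& prefix) {
        return bytes.compare(0, prefix.size(), prefix) == 0;
    };
    if (startsWith(std::string("\x00\x00\xFE\xFF", 4))) return "UTF-32BE";
    if (startsWith(std::string("\xFF\xFE\x00\x00", 4))) return "UTF-32LE";
    if (startsWith("\xEF\xBB\xBF")) return "UTF-8";
    if (startsWith("\xFE\xFF")) return "UTF-16BE";
    if (startsWith("\xFF\xFE")) return "UTF-16LE";
    return fallback;
}

// Example: a BOM is recognised; plain text falls back to the default.
void bomExamples() {
    assert(detectUtfByBom("\xEF\xBB\xBF" "abc", "default") == "UTF-8");
    assert(detectUtfByBom("abc", "default") == "default");
}
```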

Preserve non-ascii characters between std::string and QString

In my program the user can either provide a filename on the command line or choose one using a QFileDialog. In the first case, I have a char* without any encoding information; in the second, I have a QString.
To store the filename for later use (Recent Files), I need it as a QString. But to open the file with std::ifstream, I need a std::string.
Now the fun starts. I can do:
filename = QString::fromLocal8Bit(argv[1]);
later on, I can do:
std::string fn = filename.toLocal8Bit().constData();
This works for most characters, but not all. For example, the word Раи́са will look the same after going through this conversion but will, in fact, contain different characters.
So while I can have a file named Раи́са.txt, and it will be displayed as Раи́са.txt, the program will not find it in the filesystem. Most letters work, but и́ doesn't.
(Note that it does work correctly when the file was chosen in the QFileDialog. It does not when it originated from the command line.)
Is there any better way to preserve the filename? Right now I obtain it in whatever native encoding, and can pass-on in the same encoding, without knowing it. At least so I thought.
'и́' is not an ASCII character; it is the letter и (U+0438) followed by a combining acute accent (U+0301), so it has no single 8-bit representation. How it is represented in argv[1] is OS-dependent, but it is not represented by just one char.
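In UTF-8, "и́" is the four bytes D0 B8 CC 81, which decode to two code points: и (U+0438) and the combining acute accent (U+0301). Unicode has no precomposed form of this letter. The minimal plain C++ decoder below makes the split visible. The helper `decodeUtf8` is hypothetical and assumes well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode well-formed UTF-8 into code points.
// Malformed input is not diagnosed; reads never go past the end of `bytes`.
std::u32string decodeUtf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        // Sequence length from the lead byte: 0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx.
        const std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        // Keep only the payload bits of the lead byte.
        char32_t cp = len == 1 ? lead : (lead & (0x7F >> len));
        for (std::size_t k = 1; k < len && i + k < bytes.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        out += cp;
        i += len;
    }
    return out;
}

// Example: "и́" (bytes D0 B8 CC 81) is two code points, not one.
void combiningExample() {
    const std::u32string cps = decodeUtf8("\xD0\xB8\xCC\x81");
    assert(cps.size() == 2 && cps[0] == U'\u0438' && cps[1] == U'\u0301');
}
```

For the whole word "Раи́са", the decoder returns six code points for five visible letters.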
QString::fromLocal8Bit() uses the same codec, QTextCodec::codecForLocale(), as toLocal8Bit(). And as you say, your std::string will hold "Раи́са.txt", so that's not the problem.
Depending on how your OS defines std::ifstream, though, std::ifstream may expect each char to be its own character and not go through the OS's translation. I expect that you are on Windows, since you are seeing this problem, in which case you should use the std::wstring overloads of std::fstream, which are Microsoft-specific: http://msdn.microsoft.com/en-us/library/4dx08bh4.aspx
You can get a std::wstring from a QString by using QString::toStdWString().
See here for more info: fstream::open() Unicode or Non-Ascii characters don't work (with std::ios::out) on Windows
EDIT:
A good cross-platform option for projects that have access to it is Boost.Filesystem; ypnos mentions its file streams as specifically pertinent.

Retrieve Unicode code points > U+FFFF from QChar

I have an application that is supposed to deal with all kinds of characters and at some point display information about them. I use Qt and its inherent Unicode support in QChar, QString etc.
Now I need the code point of a QChar in order to look up some data in http://unicode.org/Public/UNIDATA/UnicodeData.txt, but QChar's unicode() method only returns a ushort (unsigned short), which usually is a number from 0 to 65535 (or 0xFFFF). There are characters with code points > 0xFFFF, so how do I get these? Is there some trick I am missing or is this currently not supported by Qt/QChar?
Each QChar is a UTF-16 code unit, not a complete Unicode code point. Therefore, non-BMP characters are stored as a surrogate pair of two QChars.
The solution appears to lie in code that is documented but not seen much on the Web. Start from the code point itself (155222 in decimal in the example). Use QChar::requiresSurrogates() to determine whether a single QChar is large enough. For this code point it is not, so you need to create two QChars.
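#include <QChar>
#include <QString>
// Build a QString from any code point, e.g. toQString(155222) for U+25E56 (4 bytes in UTF-8)
QString toQString(uint cp)
{
    if (QChar::requiresSurrogates(cp))
    {
        QChar charArray[2];
        charArray[0] = QChar(QChar::highSurrogate(cp)); // UTF-16 high half
        charArray[1] = QChar(QChar::lowSurrogate(cp));  // UTF-16 low half
        return QString(charArray, 2);
    }
    return QString(QChar(cp)); // BMP code point: one QChar is enough
}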
The resulting QString will contain the correct information to display your supplementary-plane character.
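The arithmetic behind QChar::highSurrogate(), QChar::lowSurrogate() and, for the direction the question actually asks about, QChar::surrogateToUcs4() is small enough to sketch in plain C++. The functions below are hypothetical stand-ins written for illustration, not Qt's own code:

```cpp
#include <cassert>
#include <cstdint>

// Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 units.
std::uint16_t highSurrogateOf(std::uint32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    return static_cast<std::uint16_t>(0xD800 + ((cp - 0x10000) >> 10));   // top 10 bits
}

std::uint16_t lowSurrogateOf(std::uint32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    return static_cast<std::uint16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF)); // low 10 bits
}

// Recombine a high/low pair into the original code point.
std::uint32_t codePointFromSurrogates(std::uint16_t high, std::uint16_t low) {
    assert(high >= 0xD800 && high <= 0xDBFF && low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000 + ((static_cast<std::uint32_t>(high) - 0xD800) << 10)
                   + (static_cast<std::uint32_t>(low) - 0xDC00);
}
```

With the code point used above, 155222 (U+25E56), the pair is 0xD857, 0xDE56, and recombining that pair gives 155222 again.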
Unicode characters beyond U+FFFF in Qt
QChar itself only supports Unicode characters up to U+FFFF.
QString supports Unicode characters beyond U+FFFF by concatenating two QChars (that is, by using UTF-16 encoding). However, the QString API doesn't help you much if you need to process characters beyond U+FFFF. As an example, a QString instance which contains the single Unicode character U+131F6 will return a size of 2, not 1.
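Getting a count of 1 instead of 2 means walking the string by code point and treating each high surrogate followed by a low surrogate as a single character. If you only need the code points, QString::toUcs4() returns them as a vector. Here is a plain C++ sketch over std::u16string, which uses the same UTF-16 layout as QString. The helper name is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of code points in a UTF-16 string. A valid
// high+low surrogate pair counts once; an unpaired surrogate counts as one.
std::size_t codePointCount(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool isHigh = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        if (isHigh && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            ++i;  // skip the low half of the pair
        ++count;
    }
    return count;
}

// Example from the text: U+131F6 is one character but two UTF-16 units.
void sizeExample() {
    const std::u16string s = u"\U000131F6";
    assert(s.size() == 2 && codePointCount(s) == 1);
}
```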
I opened QTBUG-18868 about this problem back in 2011, but after more than three years (!) of discussion, it was finally closed as "out of scope" without any resolution.
Solution
You can, however, download and use these Unicode Qt string wrapper classes which have been attached to the Qt bug report. Licensed under the LGPL.
This download contains the wrapper classes QUtfString, QUtfChar, QUtfRegExp and QUtfStringList which supplement the existing Qt classes and allow you to do things like this:
QUtfString str;
str.append(0x1307C); // Some Unicode character beyond U+FFFF
Q_ASSERT(str.size() == 1);
Q_ASSERT(str[0] == 0x1307C);
str += 'a';
Q_ASSERT(str.size() == 2);
Q_ASSERT(str[1] == 'a');
Q_ASSERT(str.indexOf('a') == 1);
For further details about the implementation, usage and runtime complexity please see the API documentation included within the download.

Does QString::fromUtf8 automatically reverse a Hebrew string?

I am having a problem where a Hebrew string is being displayed in reverse. I use QTableWidget to display some info, and here the string appears correctly using:
CString hebrewStr; hebrewStr.ToUTF8();
QString s = QString::fromUtf8( hebrewStr );
In another part of my program this same string is displayed on the screen, but not using Qt, and this is where it is shown in reverse:
CString hebrewStr;
hebrewStr.ToUTF8();
I have debugged, and hebrewStr.ToUTF8() produces exactly the same Unicode string in both cases, but the string is only displayed correctly in the QTableWidget. So I am wondering if Qt automatically reverses a given Hebrew string (since it is a right-to-left language). Thanks!
Yes, Qt takes care of this. QString::fromUtf8() decodes the UTF-8 encoded string into full Unicode (UTF-16) characters, and Qt's text layout then displays the Hebrew right-to-left. If you would like to do something similar in MFC, you should use CStringW and decode the string.
Use MultiByteToWideChar to convert UTF-8 to a CStringW.
Related question on Stack Overflow.
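Both MultiByteToWideChar with CP_UTF8 and QString::fromUtf8() perform the same basic step: they decode UTF-8 bytes into UTF-16 code units. The characters stay in logical (reading) order, and placing Hebrew right-to-left is the job of whatever draws the text. Here is a portable plain C++ sketch of that decoding step. It assumes well-formed input, and the helper name `utf8ToUtf16` is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode well-formed UTF-8 into UTF-16, emitting a
// surrogate pair for code points above U+FFFF. Input validation is omitted.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        const std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        char32_t cp = len == 1 ? lead : (lead & (0x7F >> len));  // payload bits of lead byte
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);                    // BMP: one unit
        } else {
            cp -= 0x10000;                                       // supplementary: two units
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += len;
    }
    return out;
}

// Example: alef (U+05D0) then shin (U+05E9) stay in that (logical) order.
void hebrewExample() {
    assert(utf8ToUtf16("\xD7\x90\xD7\xA9") == u"\u05D0\u05E9");
}
```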

HttpUtility.HtmlDecode cannot decode ASCII greater than 127

I have a list of characters that display fine in a WebBrowser in the form of encoded characters such as €  ...
But when posting these characters to the server, I realized that HttpUtility.HtmlDecode cannot convert them into characters the way the browser did; they all become spaces.
text = System.Web.HttpUtility.HtmlDecode("€");
I expect it to return €, but it returns a space instead. The same thing happens for some other characters as well.
Does anyone know how to fix this or any workaround?
This is commonly the result of using literal values and mixing UTF-8 and ASCII. In UTF-8 the euro sign is encoded as 3 bytes, so there is no ASCII counterpart for it.
Update
Your code is illegal if you are using UTF-8, since only the first 128 characters are single bytes in UTF-8 and the rest are encoded in multiple bytes. You need to use the Unicode syntax:
// !!! NOT HtmlDecode!!!
text = System.Web.HttpUtility.UrlDecode("%E2%82%AC");
UPDATE
OK, I have left the code as it was but added the comment that it does not work. It does not work because this is not an encoding that concerns HTML; it is not HTML at all. It concerns URLs, and as such you need to use UrlDecode instead.
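The %E2%82%AC above is just the three UTF-8 bytes of the euro sign, U+20AC (decimal 8364), each written as %XX. The minimal sketch below shows that percent-decoding step, which yields the bytes UrlDecode starts from before turning them back into text. It is written in C++ rather than C#, and the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if `c` is not hex.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical helper: turn each %XX into the byte 0xXX; everything else,
// including malformed % sequences, is copied through unchanged.
std::string percentDecode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hexValue(in[i + 1]);
            const int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits
                continue;
            }
        }
        out += in[i];
    }
    return out;
}

// Example: E2 82 AC is the UTF-8 encoding of U+20AC, the euro sign.
void euroExample() {
    assert(percentDecode("%E2%82%AC") == "\xE2\x82\xAC");
}
```

Unlike HttpUtility.UrlDecode, this sketch does not turn '+' into a space.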
ASCII is 7-bit; there are no characters 128 through 255. The MSDN article you linked is following the long tradition of pretending ASCII is 8-bit; the article actually shows code page 437.
I'm not sure why you're not simply writing € (compatibility?), but &euro; or &#8364; should do, too.
You typically want to do something like:
// Requires: using System.Net; using System.Text;
string html = "€";
string trash = WebUtility.HtmlDecode(html);
// Convert from the default encoding to UTF-8
byte[] bytes = Encoding.Default.GetBytes(trash);
string proper = Encoding.UTF8.GetString(bytes);

Resources