Displaying UTF-8 characters in a PlainTextEdit - qt

I'm trying to display Chinese characters encoded in UTF-8 in a PlainTextEdit control, but it doesn't render them properly.
My data comes from a database and I know that the string I get in Qt is correct (the bytes are the same as in the database). Once I have the Chinese character in a QString, I tried various things to display it, but it always results in either question marks or random ASCII characters:
QString chineseChar = query.value(fieldNo).toString(); // get the character
ui->plainTextEdit->appendPlainText(chineseChar); // doesn't work
ui->plainTextEdit->appendPlainText(chineseChar.toUtf8()); // doesn't work
ui->plainTextEdit->appendPlainText(QString::fromUtf8(chineseChar.toAscii())); // doesn't work
Any suggestion on how to handle that?

"My data comes from a database and I know that the string I get in Qt is correct (the bytes are the same as in the database)."
How did you check that? Try with chineseChar.toUtf8().toHex().
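The hex check can be tried outside Qt as well. The following Java analogue (the class and method names are hypothetical, not part of Qt) prints the UTF-8 bytes of a string the same way toUtf8().toHex() does, so the output can be compared byte for byte with what the database stores:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; a Java analogue of
// chineseChar.toUtf8().toHex() in Qt, not part of any library.
class Utf8HexCheck {
    // Returns the UTF-8 bytes of s as lowercase hex, two digits per byte.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // Mask to 0..255 so negative Java bytes still print as two digits.
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+4E2D, a common Chinese character, is E4 B8 AD in UTF-8.
        System.out.println(utf8Hex("\u4E2D")); // prints e4b8ad
    }
}
```

If the hex printed for the string read from the database differs from the hex of the expected character, the string was already wrong before it reached the widget.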
Once your string data is in a QString, all UI elements accepting a QString will handle it correctly. Usually the error happens when converting from plain text data (const char*/QByteArray) to the QString.
The conversions here:
ui->plainTextEdit->appendPlainText(chineseChar.toUtf8()); // doesn't work
ui->plainTextEdit->appendPlainText(QString::fromUtf8(chineseChar.toAscii())); // doesn't work
convert the Unicode string to a byte array and then back to a QString (implicitly, in the first case), since those methods expect a QString.
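Why a detour through an 8-bit encoding ends in question marks can be sketched in Java (an analogy, not Qt code; the class name is hypothetical). Java's String.getBytes replaces every character the target charset cannot represent with the charset's replacement byte, which for US-ASCII is '?', reproducing the symptom from the question:

```java
import java.nio.charset.StandardCharsets;

// Illustration only: a round trip through a charset that cannot represent
// Chinese characters. Not Qt code; the class name is hypothetical.
class AsciiRoundTrip {
    static String throughAscii(String s) {
        // Unmappable characters are replaced by '?' during encoding.
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(throughAscii("\u4E2D\u6587")); // prints ??
        System.out.println(throughAscii("abc"));          // prints abc
    }
}
```

The information is lost at the encoding step, so no later fromUtf8() call can bring the characters back.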
I suggest you define QT_NO_CAST_FROM_ASCII and QT_NO_CAST_TO_ASCII to avoid any unwanted QByteArray<->QString conversions.
If the string is wrong, the error usually happened before, when converting from QByteArray/const char* to QString, i.e. in query.value(fieldNo).toString(). Try with:
QString chineseChar = QString::fromUtf8( query.value(fieldNo).toByteArray() );
If that doesn't help, the problem is somewhere in QtSQL assuming the wrong encoding for the data it receives from the database.
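The "random characters" symptom is what decoding correct bytes with the wrong codec looks like. A Java sketch of the difference, assuming ISO-8859-1 as a stand-in for whatever wrong encoding the driver might use (the class name and byte values are illustrative; the bytes are simply the UTF-8 encoding of U+4E2D):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: the same database bytes decoded with the right and a
// wrong codec. ISO-8859-1 is an assumed stand-in for the wrong encoding.
class WrongCodecDecode {
    // UTF-8 encoding of U+4E2D, as it might be stored in the database.
    static final byte[] RAW = {(byte) 0xE4, (byte) 0xB8, (byte) 0xAD};

    static String decode(Charset cs) {
        return new String(RAW, cs);
    }

    public static void main(String[] args) {
        String right = decode(StandardCharsets.UTF_8);      // one character, U+4E2D
        String wrong = decode(StandardCharsets.ISO_8859_1); // three unrelated Latin-1 characters
        System.out.println(right.equals("\u4E2D")); // prints true
        System.out.println(wrong.length());         // prints 3
    }
}
```

This is the same idea as QString::fromUtf8(query.value(fieldNo).toByteArray()): take the raw bytes and state their encoding explicitly instead of letting something else guess.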

Related

invalid pixel in Firefox because of content charset setting in Netty server

I am developing an HTTP server with Netty. On some occasions, the server must respond with a 1x1 transparent pixel. So I hard-coded a transparent GIF pixel in base64 and returned it with the following code:
String pixel_string= new String (Base64.decodeBase64("R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="));
HttpResponse response = new DefaultHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.OK);
response.setContent(ChannelBuffers.copiedBuffer(pixel_string, CharsetUtil.UTF_8));
EDIT: I also set the content type:
response.setHeader(HttpHeaders.Names.CONTENT_TYPE,
"image/gif");
In Chrome, everything is fine. However, Firefox tells me that it cannot display the pixel (which is pretty bad for my app), as the pixel data is invalid.
After much investigation, I finally found a fix: changing the charset to ISO-8859-1.
response.setContent(ChannelBuffers.copiedBuffer(
responseBuilder.pixel_string, CharsetUtil.ISO_8859_1));
I don't understand why it works, which makes me think that I may run into trouble in some cases. I tried changing the Firefox preferences (to use UTF-8 as the default), but it doesn't change much.
Why does Firefox accept the ISO-8859-1 encoding and not UTF-8? Can I change that? Does anyone have a clue about the origin of the issue, and how to be sure that it will work whatever the user's settings?
Thanks
It's not Firefox that's accepting the encoding or not. It's your server.
When you do your base64 decode you produce a string that contains some characters, but what you really produced was bytes that you're then treating as characters. A Java String holds UTF-16 code units, and new String(byte[]) decodes with the platform's default charset, so in practice (assuming a single-byte default such as ISO-8859-1) what you're doing is taking each byte, treating it as a 16-bit integer, and constructing the UTF-16 "string" made up of those code units.
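The byte-to-code-unit mapping described above is exactly what ISO-8859-1 decoding does. This Java sketch passes the charset explicitly (the question's code relies on the platform default, which is worth avoiding) and checks the mapping for all 256 byte values; the class name is hypothetical:

```java
import java.nio.charset.StandardCharsets;

// Illustration only: ISO-8859-1 decoding maps byte value n to char n.
class ByteToCharMapping {
    static boolean latin1IsByteForByte() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i; // every possible byte value
        }
        String s = new String(all, StandardCharsets.ISO_8859_1);
        if (s.length() != 256) {
            return false;
        }
        for (int i = 0; i < 256; i++) {
            // Each char must carry the unsigned value of its source byte.
            if (s.charAt(i) != (char) i) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(latin1IsByteForByte()); // prints true
    }
}
```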
But when you want to put all this on the network, you have to convert your string to bytes, and the charset argument to copiedBuffer says how to do that. If converting to UTF-8, any character that came from a byte with the high bit set will be encoded as a two-byte UTF-8 sequence. If converting to ISO-8859-1, on the other hand, the conversion just drops the high byte of each UTF-16 code unit (which in your case is always zero anyway).
So the conversion to ISO-8859-1 reproduces the actual byte array you got from base64-decoding, while the conversion to UTF-8 produces something else, which may or may not make sense depending on the exact byte values.
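Both re-encodings can be checked on the pixel from the question. A minimal Java sketch, assuming java.util.Base64 in place of the commons-codec Base64.decodeBase64 used in the question (the class and method names are hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Illustration only: what each charset does to the GIF once it has been
// turned into a String. java.util.Base64 stands in for commons-codec.
class PixelCharsetDemo {
    // The 1x1 transparent GIF from the question, base64-encoded.
    static final String PIXEL_B64 =
        "R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==";

    // The real GIF bytes.
    static byte[] pixel() {
        return Base64.getDecoder().decode(PIXEL_B64);
    }

    // Bytes -> String (one char per byte) -> bytes again with the target charset.
    static byte[] reencode(Charset target) {
        String asString = new String(pixel(), StandardCharsets.ISO_8859_1);
        return asString.getBytes(target);
    }

    public static void main(String[] args) {
        byte[] original = pixel();
        byte[] viaLatin1 = reencode(StandardCharsets.ISO_8859_1);
        byte[] viaUtf8 = reencode(StandardCharsets.UTF_8);
        System.out.println(original.length);                    // 37
        System.out.println(Arrays.equals(original, viaLatin1)); // true
        // The single byte 0xF9 in the GIF becomes the two bytes C3 B9 in UTF-8.
        System.out.println(viaUtf8.length);                     // 38
    }
}
```

Only one byte of this GIF (0xF9, in the graphic control extension) has the high bit set, which is why the UTF-8 version is exactly one byte longer and Firefox rejects it.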
The copiedBuffer method you call is not appropriate for the type of data (binary) you are using. According to the Javadoc of the Netty API, the overload you are calling:
Creates a new big-endian buffer whose content is the specified string encoded in the specified charset.
This means your binary data is being "converted" to UTF-8, which is meaningless. If you save the generated file and look at it with a hex editor, you'll probably see that it is corrupted.
Try with something like this (untested code):
static byte[] pixel_data = Base64.decodeBase64("R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==");
HttpResponse response = ...
response.setHeader(HttpHeaders.Names.CONTENT_TYPE, "image/gif");
response.setContent(ChannelBuffers.copiedBuffer(pixel_data));

Converting UTF-16 QByteArray to QString

I have a QByteArray which contains bytes in UTF-16 format.
A Java program sends data to a Qt program via a socket using:
// dos is a DataOutputStream
dos.writeChars("hello world");
On the receiving side, the Qt program reads the data from the socket into a QByteArray, and I want to convert it to a QString. Inspecting the QByteArray's data, it contains 0h0e0l0l0o0 0w0o0r0l0d.
When I try to make a QString out of it like this:
QString str(byteArray);
the resulting string is empty, perhaps because it encounters a 0 byte at the start, and of course because the documentation of the constructor I am using says that it internally uses fromAscii(), and what I am passing is not ASCII.
I guess I have to somehow use QString::fromUtf16(), but that requires a ushort* and I have a QByteArray.
Please advise what is the best way to do it.
Thanks,
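Get the pointer returned by QByteArray::constData() (or data()) and cast it to const ushort*.
This works assuming your UTF-16 data has the same endianness as the host or starts with a BOM (byte order mark). Note that Java's DataOutputStream.writeChars() writes big-endian UTF-16 without a BOM, so on a little-endian machine such as x86 the bytes have to be swapped first. Passing the length explicitly also avoids relying on a 16-bit zero terminator; QByteArray only guarantees a single terminating '\0' byte:
QByteArray utf16 = ....;
auto str = QString::fromUtf16(
    reinterpret_cast<const ushort*>(utf16.constData()),
    utf16.size() / 2); // length in UTF-16 code units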
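The endianness condition matters here because of how the Java side writes the data. DataOutputStream.writeChars writes each char as two bytes, high byte first, with no byte order mark, which is why the buffer looks like 0h0e0l... This Java sketch (the class and method names are hypothetical) captures those bytes and shows that only a big-endian UTF-16 decode recovers the text:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration: captures the bytes that
// DataOutputStream.writeChars would send over the socket.
class WriteCharsWireFormat {
    static byte[] writeChars(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(buf)) {
            dos.writeChars(s); // two bytes per char, high byte first, no BOM
        } catch (IOException e) {
            // Cannot happen with an in-memory stream, but the API declares it.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        byte[] wire = writeChars("hello world");
        System.out.println(wire.length);                    // 22
        System.out.println(wire[0] + " " + (char) wire[1]); // 0 h
        // Big-endian UTF-16 decoding recovers the text...
        System.out.println(new String(wire, StandardCharsets.UTF_16BE)); // hello world
        // ...little-endian decoding pairs the bytes the wrong way round.
        System.out.println(new String(wire, StandardCharsets.UTF_16LE).equals("hello world")); // false
    }
}
```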

Define Character Encoding of QWebElement's `toPlainText()`

I'm having trouble getting the hang of the character encoding while dealing with QWebKit's QWebElement and its toPlainText() function (*).
I have a QString with UTF-8 encoding holding the content of an HTML page, which was read from the local disk via QFile. Now I want to parse this page using QWebKit, so I defined a QWebFrame object as part of a QWebPage. With QWebFrame::setHtml() I passed the QString into the QWebKit environment.
QString rawReport = "some UTF8 encoded string read in previously";
QWebPage p;
QWebFrame *frame = p.mainFrame();
frame->setHtml(rawReport);
QWebElement report = frame->documentElement();
qDebug() << report.toPlainText();
But somehow qDebug() seems to get the encoding wrong: for example, German umlauts (äöüß) are shown garbled, and not even as their corresponding HTML entities.
I doubt it's qDebug's fault; it's more likely the encoding inside QWebElement. I read somewhere that QWebFrame::setHtml() expects UTF-8 encoding, but I'm almost sure that is the case here.
What am I missing? Is there a function or option to force QWebFrame/QWebElement to use a specific character encoding for both input and output?
[*] Using QWebElement::toOuterXml() or QWebElement::toInnerXml() shows the same encoding problem.
Have you tried QString's from...() functions (fromUtf8(), fromLatin1(), and so on) to find out how the string returned by toPlainText() is encoded?
The documentation of setHtml() states:
When using this method WebKit assumes that external resources such as JavaScript programs or style sheets are encoded in UTF-8 unless otherwise specified. For example, the encoding of an external script can be specified through the charset attribute of the HTML script tag. It is also possible for the encoding to be specified by web server.
I would thus try to change the charset specified in the html source (in the corresponding meta tag) that you are loading to explicitly specify that you are using UTF-8.
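Garbled umlauts are the typical symptom of UTF-8 bytes being read with a single-byte charset somewhere along the way. The Java sketch below only illustrates that pattern; ISO-8859-1 is an assumed stand-in for the wrong encoding, not a claim about what QtWebKit does internally, and the class name is hypothetical:

```java
import java.nio.charset.StandardCharsets;

// Illustration only: UTF-8 text decoded with the wrong single-byte charset.
class UmlautMojibake {
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // a-umlaut, o-umlaut, u-umlaut and sharp s
        String umlauts = "\u00E4\u00F6\u00FC\u00DF";
        String garbled = misdecode(umlauts);
        // Each of these is two bytes in UTF-8 (C3 xx), so each turns into
        // two Latin-1 characters, starting with A-tilde.
        System.out.println(garbled.length());                   // 8
        System.out.println(garbled.startsWith("\u00C3\u00A4")); // true
    }
}
```

If the text shows this "two odd characters per umlaut" pattern, the bytes are fine and only the declared or assumed encoding is wrong, which is what an explicit charset in the meta tag addresses.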

Placing image bytes into a String is not working?

I tried this in Flex 3 and ran into an issue uploading a JPG/PNG image: tracing readUTFBytes returns the correct byte length, but tmpFileContent is truncated. It only appears to upload 3 characters of data to the server through the PHP script, which makes the image unusable. I have no issue with non-image formats. What is wrong here?
var tmpFileContent:String = fileRef.data.readUTFBytes(fileRef.data.length);
Is a String capable of handling bytes?
I'm not sure what you're looking to do with the image, but you might want to read this:
http://livedocs.adobe.com/flex/3/html/help.html?content=Filesystem_15.html
You may also need an image encoder such as the JPEGEncoder: http://help.adobe.com/en_US/FlashPlatform/beta/reference/actionscript/3/mx/graphics/codec/JPEGEncoder.html
You could always encode using base64:
import mx.utils.Base64Encoder;
var enc:Base64Encoder = new Base64Encoder();
enc.encodeBytes(fileRef.data);
var base64data:String = enc.drain();
The method used in the tutorial is not going to work safely for anything but text files. An arbitrary binary format is likely to contain zeros. A zero (a byte whose value is 0) is generally considered a string terminator in many languages / platforms. This is also the case in Actionscript as this code shows:
var str:String = "abc\x00def";
trace(str);
The string will be truncated to "abc", since 0x00 is considered to mark the end of a string.
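Besides NUL bytes, readUTFBytes interprets the data as UTF-8, and arbitrary binary data is usually not valid UTF-8. A Java sketch of that second problem (an analogy, not ActionScript; the class name is hypothetical and the bytes are the usual start of a JFIF JPEG file):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: binary data pushed through a UTF-8 text conversion.
class BinaryAsUtf8 {
    // First bytes of a typical JPEG file (SOI marker, APP0 marker, "JFIF\0").
    static final byte[] JPEG_HEADER = {
        (byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0,
        0x00, 0x10, 'J', 'F', 'I', 'F', 0x00
    };

    // Decode bytes as UTF-8 text and encode them back again.
    static byte[] roundTrip(byte[] data) {
        // Invalid UTF-8 sequences are replaced by U+FFFD while decoding.
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] back = roundTrip(JPEG_HEADER);
        System.out.println(Arrays.equals(JPEG_HEADER, back)); // false: data corrupted
        String text = new String(JPEG_HEADER, StandardCharsets.UTF_8);
        System.out.println(text.indexOf('\uFFFD') >= 0);      // true: replacement chars
    }
}
```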
I think your best bet is to encode the content as base64, as maclema suggested. On the PHP side, decode it back before writing the file, with something like:
file_put_contents($myFilePath, base64_decode($fileData["filedata"]));
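The base64 round trip is what makes this safe: the encoded form is plain ASCII with no NUL bytes, and decoding restores every original byte. A Java sketch of the idea, with java.util.Base64 standing in for Flex's Base64Encoder and PHP's base64_decode (the class name is hypothetical; the bytes are the PNG signature plus the length field of the first chunk):

```java
import java.util.Arrays;
import java.util.Base64;

// Illustration only: base64 carries binary data safely through text channels.
class Base64RoundTrip {
    // PNG signature followed by the IHDR chunk length: note the zero bytes.
    static final byte[] PNG_START = {
        (byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A, 0x00, 0x00, 0x00, 0x0D
    };

    // Client side: turn arbitrary binary data into ASCII text.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Server side: recover the exact original bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        String text = encode(PNG_START);
        System.out.println(text);                                   // iVBORw0KGgoAAAAN
        System.out.println(text.indexOf('\0') < 0);                 // true: no NUL in the text
        System.out.println(Arrays.equals(PNG_START, decode(text))); // true: bytes preserved
    }
}
```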
Also, I couldn't remember whether file_put_contents is binary safe; according to the PHP manual, it is. If you prefer explicit control, you can use fopen('your_path', "wb"), fwrite() and fclose() to write the file. Notice the "b" in "wb", which stands for binary. If you don't pass that flag you'll probably have problems with some characters (newline and carriage return, for example).
Added:
Perhaps, following davr's suggestion, you could try sending the data ByteArray to see if AMFPHP handles it correctly.
PHP does allow embedded NULs in strings, as this code shows:
$str = "a\x00b";
var_dump(ord($str[0])); // 97
var_dump(ord($str[1])); // 0
var_dump(ord($str[2])); // 98
So, if AMFPHP converts the bytearray to a string and does not mangle it in the process, this could actually work.
// method saves files on the server
function uploadFiles($fileData) {
// new file path and name
// to not overwrite the files we add the microtime before the file name
$myFilePath = '../../_uploads/'.
preg_replace("/[^0-9]+/","_",microtime()).'_'.$fileData["filename"];
// writing on the disk
$fp = fopen($myFilePath,"wb");
if($fp) {
fwrite($fp,$fileData["filedata"]);
fclose($fp);
}
// returning response - is not used anywhere
return true;
}
Otherwise, try var_dump($fileData['filedata']) to see what type AMFPHP actually converts the data to. (Perhaps it uses an array, I'm not sure; given how strings work in PHP, much like a buffer of single-byte characters, I think it could just be using strings.)

In a BoundColumn in a DataGrid, how do I format a byte[] column as a string?

I'm passing a database reader object to a DataGrid, and it sees one of my columns as type byte[], but I happen to know that it should always be a printable string. How can I force the .NET data binding system to do that conversion? The only place I can see to put anything is BoundColumn.DataFormatString, but I can't find any indication of how to do what I need with that.
Edit: I know how to convert a byte[] to a string in general, but I don't know how to make the BoundColumn do it.
Because in this case I can edit the query string, I hacked around it by using PADR(column,0) as column in the SELECT. I'm still interested in what to do if I couldn't modify the query.
You can use System.Text.Encoding.UTF8.GetString(byte[]) to get the string (make sure you use the correct encoding in place of UTF8; Encoding provides ASCII, UTF7, UTF8, Unicode, and UTF32).
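Choosing the right encoding matters because the same bytes decode differently. A Java sketch of the idea (not .NET code; Java's StandardCharsets.UTF_16LE corresponds to what .NET calls Encoding.Unicode, and the column bytes and class name are made up for illustration):

```java
import java.nio.charset.StandardCharsets;

// Illustration only: the same column bytes decoded with two encodings.
class EncodingChoice {
    // "hi" stored as UTF-16 little-endian (what .NET calls Encoding.Unicode).
    static final byte[] COLUMN = {'h', 0x00, 'i', 0x00};

    public static void main(String[] args) {
        String right = new String(COLUMN, StandardCharsets.UTF_16LE);
        String wrong = new String(COLUMN, StandardCharsets.UTF_8);
        System.out.println(right.equals("hi")); // true
        // UTF-8 keeps the zero bytes as NUL characters: 'h', NUL, 'i', NUL.
        System.out.println(wrong.length());     // 4
    }
}
```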
