I need to display all the bytes from an ELF file in a QTextEdit, and I did not find any reasonable way to do this. At most I could print "?ELF??" and then nothing. The content of the ELF file is read into a char* array (this is a requirement, I can't change that), and yes, the content is definitely being read.
I am guessing that your code looks something like this:
char *elf = ReadElfFile();
QString str(elf); // Constructs a string initialized with the 8-bit string str.
QTextEdit edit(str);
The problem is that this QString constructor stops at the first NUL character, and an ELF file is full of them.
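If you want to make a QString that contains NULs, pass the length explicitly. Note that in Qt 5 and earlier the QString(const QByteArray &) constructor also stops copying at the first zero byte, so use a conversion function that takes a size, for example:
QString str = QString::fromLatin1(elf, length_of_elf); // explicit length keeps embedded NULs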
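The truncation is not specific to Qt: any constructor that treats a char* as a C string has to stop at the first zero byte. A minimal sketch of the same effect using only the standard library (std::string stands in for QString here, and kSample is a made-up buffer, not a real ELF header):

```cpp
#include <cstddef>
#include <string>

// Hypothetical sample buffer: an ELF-like magic followed by zero bytes and more data.
static const char kSample[] = {0x7F, 'E', 'L', 'F', 0x02, 0x01, 0x01,
                               0x00, 0x00, 'x', 'y'};
static const std::size_t kSampleSize = sizeof(kSample);

// Treats the buffer as a C string, so copying stops at the first NUL byte.
std::string fromCString(const char* data) {
    return std::string(data);
}

// Copies exactly `size` bytes, embedded NULs included.
std::string fromBuffer(const char* data, std::size_t size) {
    return std::string(data, size);
}
```

With this sample, fromCString(kSample) keeps only the 7 bytes before the first NUL, while fromBuffer(kSample, kSampleSize) keeps all 11.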
This nearly broke me too, so I'll post my solution for anyone interested.
Let's say I have a QByteArray named data that is filled like so:
data += file.readAll();
I'll then invoke an update of the QTextEdit where I'll do
QByteArray copy = data;
QString text = copy.replace((char)0x00, "\\0");
textEdit.setPlainText(text);
This way, all null bytes in the data will be displayed as the printable string \0.
Since I want changes of the textEdit to be reflected in my data, I have to parse this back using
QByteArray hex = textEdit.toPlainText().toUtf8().toHex().toUpper();
hex.replace("5C30", "00");
hex.replace("5C00", "5C30"); // oops, was escaped
data = QByteArray::fromHex(hex);
I'm using the hex format because I just could not get the replace to work with null byte characters. The code above first replaces all occurrences of the string \0 with null bytes in the data. Then it replaces any \ followed by a null byte back with \0 - which essentially means \\0 becomes \0.
It's not very elegant, but maybe it helps anyone ending up here to move on in the right direction. If you have improvements, please comment.
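One caveat with replacing on the hex string: a pattern such as 5C30 can also match across a byte boundary (the bytes 45 C3 0A contain it at an odd offset), so the round trip can corrupt data. A possible alternative is to work on the bytes directly and escape the backslash itself, which makes the mapping reversible. The sketch below is a different scheme from the one above, written with std::string instead of QByteArray, and the function names are made up for illustration:

```cpp
#include <cstddef>
#include <string>

// Escapes a byte buffer for display:
//   a backslash becomes two backslashes,
//   a NUL byte becomes a backslash followed by the digit 0.
std::string escapeNuls(const std::string& bytes) {
    std::string out;
    out.reserve(bytes.size());
    for (char c : bytes) {
        if (c == '\\') {
            out += "\\\\";
        } else if (c == '\0') {
            out += "\\0";
        } else {
            out += c;
        }
    }
    return out;
}

// Reverses escapeNuls(). Unknown escapes are copied through unchanged.
std::string unescapeNuls(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\\' && i + 1 < text.size()) {
            const char next = text[i + 1];
            if (next == '0') { out += '\0'; ++i; continue; }
            if (next == '\\') { out += '\\'; ++i; continue; }
        }
        out += text[i];
    }
    return out;
}
```

Because the backslash is escaped as well, a literal backslash followed by 0 in the data survives the round trip instead of being mistaken for an escaped NUL.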
Related
I am trying to read percentage encoded urls with umlauts, such as äüö,..., with Qt:
QString str = "Nu%CC%88rnberg";
qDebug() << QUrl::fromPercentEncoding(str.toUtf8());
But the output is Nu¨rnberg instead of Nürnberg. How can I correctly decode urls with umlauts in this form?
Regards,
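I have reproduced this issue, but I am a little confused by the result. First, if you want the precomposed letter ü, use %C3%BC, not %CC%88 (according to https://www.w3schools.com/tags/ref_urlencode.asp). %CC%88 is the UTF-8 encoding of U+0308 (combining diaeresis), so "u%CC%88" decodes to the letter u followed by a separate combining mark, which some outputs render as two characters. If you cannot change the URL, normalizing the decoded string with QString::normalized(QString::NormalizationForm_C) composes the pair into ü. Otherwise you need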
QString str = "N%C3%BCrnberg";
QString encoded = QUrl::fromPercentEncoding(str.toUtf8());
If you write it to the qDebug() stream, you may still see a different symbol (I guess this is because of your default system encoding). If you show it in a GUI element, you will get your ü symbol:
QMessageBox::information(this, "", encoded);
Here, this refers to the main window.
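To see why the two encodings behave differently, it can help to look at the raw bytes that percent-decoding produces. The following is a minimal sketch using only the standard library, not Qt's implementation; percentDecode and hexValue are names made up for this example:

```cpp
#include <cstddef>
#include <string>

// Returns the numeric value of a hex digit, or -1 if c is not a hex digit.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes %XX sequences into raw bytes.
// Malformed sequences are copied through unchanged.
std::string percentDecode(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size() + 0 &&
            hexValue(in[i + 1]) >= 0 && hexValue(in[i + 2]) >= 0) {
            out += static_cast<char>(hexValue(in[i + 1]) * 16 + hexValue(in[i + 2]));
            i += 2;
        } else {
            out += in[i];
        }
    }
    return out;
}
```

Decoding "Nu%CC%88rnberg" yields 10 bytes: the u stays a plain ASCII letter and is followed by the two bytes CC 88. Decoding "N%C3%BCrnberg" yields 9 bytes, with the single two-byte sequence C3 BC in place of the u. Both are valid UTF-8; they are just different Unicode representations of the same visible text.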
I don't understand what happens when I create a text stream and then do setCodec("some_encoding"), does it start assuming that the file I'm reading from is in some_encoding and when I do QTextStream::readAll return me a QString in some_encoding? Or does QTextStream::readAll return a QString in unicode?
Here's what I do:
QString read(const char* encoding)
{
    QTextStream stream(&file);
    stream.setCodec(encoding);
    return stream.readAll();
}
But I don't get a unicode string back. So, bottom line is, I want to know, how, having a file in some encoding, do I save all the contents as Unicode into a QString? If readAll() returns a string in the encoding specified, how do I convert that QString from that encoding to unicode?
It turns out this didn't have anything to do with encodings. I called stream.seek(0) before reading, and then it read everything correctly. I suspected that the problem was with encodings because when they're off you usually get question marks or empty strings everywhere; in this case I got an empty string.
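The same symptom can be reproduced without Qt: reading "everything" from a stream whose position is already at the end gives an empty string, whatever the encoding. A minimal sketch with standard streams (an analogy to QTextStream::seek(0), not Qt code; the function names are made up):

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Reads everything from the stream's current position to its end.
std::string readRest(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Rewinds to the start first, so earlier reads do not affect the result.
std::string readAllFromStart(std::istream& in) {
    in.clear();   // drop any error or EOF flags left by a previous read
    in.seekg(0);  // move the read position back to the beginning
    return readRest(in);
}
```

Calling readRest twice on the same stream returns the content once and then an empty string; readAllFromStart returns the full content each time.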
I am obtaining the content from a QTextEdit object by using the following code:
QString text=my_QTextEdit.toPlainText();
What encoding does QTextEdit use, and what encoding is used in the QString I get back from the toPlainText() call?
Thanks.
QTextEdit.toPlainText() returns a QString object, which is always a unicode character string (see documentation).
The QString class provides the functions toLatin1(), toAscii() and toUtf8(), which allow you to convert the string from Unicode to an 8-bit string that you can process further. So Qt handles the encoding and decoding of the string for you.
If you want to create a QString instance from a given byte string, you can use the functions fromAscii(), fromLatin1() or fromUtf8(). (toAscii() and fromAscii() are obsolete as of Qt 5; prefer the Latin-1 or UTF-8 variants in new code.)
All controls in Qt are enabled for 16-bit characters. That means that the content of a QTextEdit is Unicode, stored as UTF-16 (see also http://developer.nokia.com/Community/Discussion/showthread.php/215203-how-to-correctly-display-Unicodes-in-QPlainTextEdit).
When getting the content of a QTextEdit control (via toPlainText()), you get back a QString which contains Unicode.
From there on, you can convert to other formats as you like: toUtf8(), toUcs4(), ...
I have an application that is supposed to deal with all kinds of characters and at some point display information about them. I use Qt and its inherent Unicode support in QChar, QString etc.
Now I need the code point of a QChar in order to look up some data in http://unicode.org/Public/UNIDATA/UnicodeData.txt, but QChar's unicode() method only returns a ushort (unsigned short), which usually is a number from 0 to 65535 (or 0xFFFF). There are characters with code points > 0xFFFF, so how do I get these? Is there some trick I am missing or is this currently not supported by Qt/QChar?
Each QChar is a UTF-16 code unit, not a complete Unicode code point. Therefore, a non-BMP character is stored as a surrogate pair of two QChars.
The solution appears to lie in functions that are documented but not seen much on the Web. Take the code point as an integer, then call QChar::requiresSurrogates() to determine whether a single QChar is large enough. In this case it is not, so you need to create two QChars.
uint32_t cp = 155222; // U+25E56, a CJK character outside the BMP (4 bytes in UTF-8)
QString str;
if (QChar::requiresSurrogates(cp))
{
    // Split the code point into its UTF-16 high and low surrogate halves.
    QChar charArray[2];
    charArray[0] = QChar::highSurrogate(cp);
    charArray[1] = QChar::lowSurrogate(cp);
    str = QString(charArray, 2);
}
The resulting QString will contain the correct two code units to display your supplementary (non-BMP) character.
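For reference, the arithmetic behind surrogate pairs is small enough to write out. The sketch below uses plain C++ with made-up function names and shows both directions, including how to get the full code point back from two code units, which is what the question asks for. Qt has a static helper for the reverse direction as well, QChar::surrogateToUcs4().

```cpp
#include <cstdint>

// True if the code point lies outside the BMP and needs two UTF-16 code units.
bool needsSurrogates(std::uint32_t cp) {
    return cp >= 0x10000;
}

// High (leading) surrogate: 0xD800 plus the top 10 bits of (cp - 0x10000).
std::uint16_t highSurrogateOf(std::uint32_t cp) {
    return static_cast<std::uint16_t>(0xD800 + ((cp - 0x10000) >> 10));
}

// Low (trailing) surrogate: 0xDC00 plus the bottom 10 bits of (cp - 0x10000).
std::uint16_t lowSurrogateOf(std::uint32_t cp) {
    return static_cast<std::uint16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF));
}

// Recombines a surrogate pair into the original code point.
std::uint32_t codePointOf(std::uint16_t high, std::uint16_t low) {
    return 0x10000u + ((static_cast<std::uint32_t>(high) - 0xD800u) << 10)
                    + (static_cast<std::uint32_t>(low) - 0xDC00u);
}
```

For the code point 155222 (U+25E56) used above, this gives the pair 0xD857, 0xDE56, and codePointOf(0xD857, 0xDE56) returns 155222 again.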
Unicode characters beyond U+FFFF in Qt
QChar itself only supports Unicode characters up to U+FFFF.
QString supports Unicode characters beyond U+FFFF by concatenating two QChars (that is, by using UTF-16 encoding). However, the QString API doesn't help you much if you need to process characters beyond U+FFFF. As an example, a QString instance which contains the single Unicode character U+131F6 will return a size of 2, not 1.
I opened QTBUG-18868 about this problem back in 2011, but after more than three years (!) of discussion, it was finally closed as "out of scope" without any resolution.
Solution
You can, however, download and use these Unicode Qt string wrapper classes which have been attached to the Qt bug report. Licensed under the LGPL.
This download contains the wrapper classes QUtfString, QUtfChar, QUtfRegExp and QUtfStringList which supplement the existing Qt classes and allow you to do things like this:
QUtfString str;
str.append(0x1307C); // Some Unicode character beyond U+FFFF
Q_ASSERT(str.size() == 1);
Q_ASSERT(str[0] == 0x1307C);
str += 'a';
Q_ASSERT(str.size() == 2);
Q_ASSERT(str[1] == 'a');
Q_ASSERT(str.indexOf('a') == 1);
For further details about the implementation, usage and runtime complexity please see the API documentation included within the download.
I am having a problem where a Hebrew string is being displayed in reverse. I use QTableWidget to display some info, and here the string appears correctly using:
CString hebrewStr; hebrewStr.ToUTF8();
QString s = QString::fromUtf8( hebrewStr );
In another part of my program this same string is displayed on the screen, but not using Qt, and this is what is being shown in reverse:
CString hebrewStr;
hebrewStr.ToUTF8();
I have debugged this, and hebrewStr.ToUTF8() produces the exact same Unicode string in both cases, but the string is only displayed correctly in the QTableWidget. So I am wondering whether Qt automatically reverses a given Hebrew string (since it is a right-to-left language). Thanks!
Yes, in this case QString generates the full Unicode (wide-character) string from the UTF-8 encoded string. If you would like to do a similar thing in MFC, you should use CStringW and decode the string.
Use MultiByteToWideChar for UTF8 to CStringW conversion.
Connected question in StackOverflow.