Convert multibyte character array to QChar array - qt

I have two buffers (example sizes):
char c[512];
QChar q[256];
Assume 'c' contains a multibyte character string (UTF-8). I need to convert it to a QChar sequence and place it in 'q'.
I guess a good example of what I need could be MultiByteToWideChar function.
IMPORTANT: this operation shall not involve any explicit or implicit memory allocations, except for additional allocations on the stack, maybe.
Please, do not answer if you are not sure what the above means.

QChar contains an ushort as only member, so its size is sizeof(ushort).
In a QString context it represents UTF-16 code units, not full code points: a character outside the Basic Multilingual Plane takes two QChars.
So it's all about encoding here.
If you know your char const * is UTF-16 data in the same endianness / byte order as your system, simply copy it:
memcpy(q, c, 512);
If you want to initialize a QString with your const char * data, you could just interpret it as UTF-16 using QString::fromRawData():
QString strFromData = QString::fromRawData(reinterpret_cast<QChar*>(c), 256);
// where 256 is sizeof(c) * sizeof(char) / sizeof(QChar)
Then you don't even need the QChar q[256] array.
If you know your data is UTF-8, you should use QString::fromUtf8() and then simply access its inner memory with QString::constData().
Using QString with UTF-8 I don't know of any method to completely prevent heap allocations. But the mentioned way should only allocate twice: Once for the PIMPL of QString, once for the UTF-16 string data.
If your input data is encoded as UTF-8, the answer is no: you cannot convert it using Qt without heap allocations.
Proof: Looking at the source code of qtbase/src/corelib/codecs/qutfcodec.cpp we see that all functions for encoding / decoding create new QString / QByteArray instances. No function operates on two arrays as in your question.
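If Qt offers no in-place routine, the decoding can be done by hand. The following is a minimal sketch, not Qt code: utf8ToUtf16 is a hypothetical helper that writes UTF-16 code units into a caller-supplied char16_t buffer and uses only stack storage. It assumes that QChar is layout-compatible with a 16-bit code unit (as described above), so that a QChar array could be passed via reinterpret_cast<char16_t *>(q); verify that assumption for your Qt version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Qt): decodes srcLen bytes of UTF-8 into
// UTF-16 code units stored in dst. Returns the number of code units written,
// or -1 if the input is malformed or dst is too small. No heap allocation.
long utf8ToUtf16(const char *src, std::size_t srcLen,
                 char16_t *dst, std::size_t dstCap)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const std::uint32_t minCp[4] = {0x0, 0x80, 0x800, 0x10000};
    std::size_t out = 0;
    std::size_t i = 0;
    while (i < srcLen) {
        const unsigned char b0 = static_cast<unsigned char>(src[i]);
        std::uint32_t cp;
        std::size_t extra; // number of continuation bytes expected
        if (b0 < 0x80)                { cp = b0;        extra = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        else return -1;                      // invalid lead byte
        if (srcLen - i <= extra) return -1;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(src[i + k]);
            if ((b & 0xC0) != 0x80) return -1; // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;
        // Reject overlong forms, surrogates and values above U+10FFFF.
        if (cp < minCp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) return -1;
        if (cp < 0x10000) {
            if (out + 1 > dstCap) return -1;
            dst[out++] = static_cast<char16_t>(cp);
        } else {
            if (out + 2 > dstCap) return -1;
            cp -= 0x10000;
            dst[out++] = static_cast<char16_t>(0xD800 + (cp >> 10));
            dst[out++] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return static_cast<long>(out);
}
```

With the buffers from the question, the call would look roughly like utf8ToUtf16(c, strlen(c), reinterpret_cast<char16_t *>(q), 256), again under the layout assumption stated above.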

Related

How to best convert a std::string_view to a QString?

I have a library that gives me a string_view. What's the best way to get it into a QString (not a QStringView)?
I wrote QString::fromStdString(std::string(key).c_str()), but is that the best?
Drop the c_str(), you don't need it, since fromStdString() takes a std::string (hence the name):
QString::fromStdString(std::string(key))
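You cannot drop the explicit string construction, though: the std::string constructor that takes a std::string_view is explicit, so there is no implicit conversion and this does not compile:
QString::fromStdString(key) // error: no implicit conversion from std::string_view to std::string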
That being said, if the std::string_view is null-terminated (which is not guaranteed), you can use the QString constructor that accepts a char*:
QString(key.data())
Or, if the std::string_view is encoded in Latin-1, you can use:
QString::fromLatin1(key.data(), key.size())
Or, if encoded in UTF-8:
QString::fromUtf8(key.data(), key.size())
Or, if encoded in the user's default locale:
QString::fromLocal8Bit(key.data(), key.size())
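To see why the variants that pass key.size() are safer than the one relying on null termination, here is a small sketch in plain C++17 without Qt. toOwned is a hypothetical helper, and the buffer contents are made up for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical helper: copies exactly key.size() bytes into an owning string,
// without assuming that key.data() is null-terminated at key.size().
std::string toOwned(std::string_view key)
{
    return std::string(key.data(), key.size());
}

// Returns how many bytes a strlen-based API would read from key.data().
// This is only safe to call when the underlying buffer is null-terminated
// somewhere, which a std::string_view does not promise in general.
std::size_t bytesSeenByStrlen(std::string_view key)
{
    return std::strlen(key.data());
}
```

For a view of the first four bytes of the literal "name=value", toOwned yields "name", while a strlen-based reader runs on to the end of the literal. QString(key.data()) would pick up the extra bytes in the same way.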

How to display the content of an ELF file in QTextEdit?

I need to display all the bytes from an ELF file in a QTextEdit and I did not find any reasonable way to do this. The most I could print was "?ELF??", then nothing. The content of the ELF is read into a char* array (this is a requirement, I can't change that) and yes, the content is definitely read.
I am guessing that your code looks something like this:
char *elf = ReadElfFile();
QString str(elf); // Constructs a string initialized with the 8-bit string str.
QTextEdit edit(str);
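The problem is that the QString(const char *) constructor stops at the first NUL character, and the ELF file is full of them.
If you want to make a QString that contains NULs, pass the length explicitly, for example:
QString str = QString::fromLatin1(elf, length_of_elf);
Note that QString str(QByteArray(elf, length_of_elf)); only keeps embedded NULs in Qt 6. In Qt 5 the QString(const QByteArray &) constructor also stops at the first NUL byte.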
This just nearly broke me too, so I'll post my solution to anyone interested.
Let's say I have a QByteArray data that is filled like so
data += file.readAll();
I'll then invoke an update of the QTextEdit where I'll do
QByteArray copy = data;
QString text = copy.replace((char)0x00, "\\0");
textEdit.setPlainText(text);
This way, all null bytes in the data will be displayed as the printable string \0.
Since I want changes of the textEdit to be reflected in my data, I have to parse this back using
QByteArray hex = textEdit.toPlainText().toUtf8().toHex().toUpper();
hex.replace("5C30", "00");
hex.replace("5C00", "5C30"); // oops, was escaped
data = QByteArray::fromHex(hex);
I'm using the hex format because I just could not get the replace to work with null byte characters. The code above first replaces all occurrences of the string \0 with null bytes in the data. Then it replaces any \ followed by a null byte back with \0 - which essentially means \\0 becomes \0.
It's not very elegant, but maybe it helps anyone ending up here to move on in the right direction. If you have improvements, please comment.
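One possible improvement is to escape the backslash itself, so the round trip stays unambiguous without the detour through hex. The sketch below uses std::string in place of QByteArray so that it is self-contained. escapeNul and unescapeNul are hypothetical names, and the escaping scheme (\0 for a null byte, \\ for a backslash) is an assumption, not what the code above does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces each null byte with the two characters \0 and each backslash
// with two backslashes, so the result contains no null bytes.
std::string escapeNul(const std::string &raw)
{
    std::string out;
    out.reserve(raw.size());
    for (char ch : raw) {
        if (ch == '\0')      out += "\\0";
        else if (ch == '\\') out += "\\\\";
        else                 out += ch;
    }
    return out;
}

// Inverse of escapeNul. An unknown escape or a trailing backslash is
// copied through unchanged.
std::string unescapeNul(const std::string &text)
{
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\\' && i + 1 < text.size()) {
            const char next = text[i + 1];
            if (next == '0')  { out += '\0'; ++i; continue; }
            if (next == '\\') { out += '\\'; ++i; continue; }
        }
        out += text[i];
    }
    return out;
}
```

Because a literal backslash is always doubled, text that already contains the characters \0 survives the round trip, which the hex replacement has to patch up afterwards.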

Qt - How to convert a number into QChar

I have a qulonglong variable and I need to convert it into QChar.
For example, from number 65 I should get 'A'.
Or if there is a solution to make that directly into QString would be good too.
What you need is the QChar constructor.
QChar c((short) n);
Notice that QChar provides 16 bit characters:
The QChar class provides a 16-bit Unicode character. In Qt, Unicode
characters are 16-bit entities without any markup or structure. This
class represents such an entity. It is lightweight, so it can be used
everywhere. Most compilers treat it like a unsigned short.
qulonglong is a 64-bit integer, so you should be very careful with the conversion to short: any value above 0xFFFF is silently truncated.
qlonglong i = 65;
QString((char)i);
Or see the docs here.
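To make the narrowing explicit rather than silent, a range check can come before the cast. This is a minimal sketch in plain C++: toCodeUnit is a hypothetical helper, and char16_t stands in for the 16-bit value that would be handed to the QChar constructor.

```cpp
#include <cassert>

// Hypothetical helper: stores n in out if it fits in a single 16-bit code
// unit and is not a surrogate value. Leaves out untouched and returns false
// otherwise.
bool toCodeUnit(unsigned long long n, char16_t &out)
{
    if (n > 0xFFFFull) return false;                       // needs more than 16 bits
    if (n >= 0xD800ull && n <= 0xDFFFull) return false;    // surrogate range
    out = static_cast<char16_t>(n);
    return true;
}
```

A value such as 65536 + 65 would otherwise truncate to 65 and quietly turn into 'A'.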

Retrieve Unicode code points > U+FFFF from QChar

I have an application that is supposed to deal with all kinds of characters and at some point display information about them. I use Qt and its inherent Unicode support in QChar, QString etc.
Now I need the code point of a QChar in order to look up some data in http://unicode.org/Public/UNIDATA/UnicodeData.txt, but QChar's unicode() method only returns a ushort (unsigned short), which usually is a number from 0 to 65535 (or 0xFFFF). There are characters with code points > 0xFFFF, so how do I get these? Is there some trick I am missing or is this currently not supported by Qt/QChar?
Each QChar is a UTF-16 code unit, not a complete Unicode code point. Therefore, each non-BMP character consists of a surrogate pair, that is, two QChars.
The solution appears to lie in code that is documented but not seen much on the Web. Take the code point as a 32-bit value, then use QChar::requiresSurrogates() to determine whether a single QChar is large enough to hold it. In this case it is not, so you need to create two QChars.
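uint32_t cp = 155222; // U+25E56, a Japanese character that takes 4 bytes in UTF-8
QString str;
if (QChar::requiresSurrogates(cp))
{
    // Outside the BMP: encode as a high/low surrogate pair.
    QChar charArray[2];
    charArray[0] = QChar::highSurrogate(cp);
    charArray[1] = QChar::lowSurrogate(cp);
    str = QString(charArray, 2);
}
else
{
    // Inside the BMP: a single QChar is enough.
    str = QString(QChar(cp));
}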
The resulting QString will contain the correct information to display your supplementary character.
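For reference, the arithmetic behind the surrogate split, and the reverse step the question asks for (getting the code point back from two 16-bit units), can be sketched in plain C++ without Qt. The function names below are hypothetical stand-ins chosen for illustration, not the Qt API.

```cpp
#include <cassert>
#include <cstdint>

// True if cp lies outside the Basic Multilingual Plane.
bool needsPair(std::uint32_t cp) { return cp >= 0x10000; }

// High (leading) surrogate for a code point >= 0x10000.
char16_t leadUnit(std::uint32_t cp)
{
    return static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10));
}

// Low (trailing) surrogate for a code point >= 0x10000.
char16_t trailUnit(std::uint32_t cp)
{
    return static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF));
}

// Recombines a valid surrogate pair into the original code point.
std::uint32_t toCodePoint(char16_t lead, char16_t trail)
{
    return 0x10000
         + ((static_cast<std::uint32_t>(lead) - 0xD800) << 10)
         + (static_cast<std::uint32_t>(trail) - 0xDC00);
}
```

For the value 155222 (U+25E56) used above, this yields the pair 0xD857, 0xDE56, and recombining the pair gives 155222 again.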
Unicode characters beyond U+FFFF in Qt
QChar itself only supports Unicode characters up to U+FFFF.
QString supports Unicode characters beyond U+FFFF by concatenating two QChars (that is, by using UTF-16 encoding). However, the QString API doesn't help you much if you need to process characters beyond U+FFFF. As an example, a QString instance which contains the single Unicode character U+131F6 will return a size of 2, not 1.
I opened QTBUG-18868 about this problem back in 2011, but after more than three years (!) of discussion, it was finally closed as "out of scope" without any resolution.
Solution
You can, however, download and use these Unicode Qt string wrapper classes which have been attached to the Qt bug report. Licensed under the LGPL.
This download contains the wrapper classes QUtfString, QUtfChar, QUtfRegExp and QUtfStringList which supplement the existing Qt classes and allow you to do things like this:
QUtfString str;
str.append(0x1307C); // Some Unicode character beyond U+FFFF
Q_ASSERT(str.size() == 1);
Q_ASSERT(str[0] == 0x1307C);
str += 'a';
Q_ASSERT(str.size() == 2);
Q_ASSERT(str[1] == 'a');
Q_ASSERT(str.indexOf('a') == 1);
For further details about the implementation, usage and runtime complexity please see the API documentation included within the download.
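If all you need is the character count rather than a full wrapper class, the size mismatch described above can be handled by walking the UTF-16 code units and treating each surrogate pair as one character. This is a minimal sketch in plain C++: countCodePoints is a hypothetical helper, and char16_t stands in for QChar.

```cpp
#include <cassert>
#include <cstddef>

// Counts Unicode code points in a UTF-16 sequence of n code units.
// A high surrogate followed by a low surrogate counts as one code point;
// an unpaired surrogate is counted as one (invalid) code point.
std::size_t countCodePoints(const char16_t *units, std::size_t n)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const bool isHigh = units[i] >= 0xD800 && units[i] <= 0xDBFF;
        const bool nextIsLow = i + 1 < n &&
                               units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF;
        if (isHigh && nextIsLow)
            ++i; // skip the low half of the pair
        ++count;
    }
    return count;
}
```

For U+131F6, stored as the two units 0xD80C 0xDDF6, this reports one character where the size in code units is two.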

Printing out hex values of a char* array in C gives odd values for binary input

Here's an odd problem that's been stumping me for a bit.
The program is written in C89, and it reads a file into a char* array 16 bytes at a time (using fread and a size of sizeof(char)). The file is fopen'd with the "rb" flags. The array is then passed into a function that basically takes the 16 hex values and sticks them into a string, each value separated by a space.
Here's where the weirdness comes in. The function produces a nice hex dump, 16 bytes at a time, for a text file input that I have. But it screws up if I try it on a small bitmap image -- I end up with output in the string like ffffff88 instead of just 88.
The hex values are placed into the output string using sprintf("%02x ", input[i]); in a loop.
Why would this work properly for some files but not others?
In C, whether plain char is signed or unsigned is implementation-defined, and on most common platforms it is signed. When a char is passed to a variadic function such as printf or sprintf, it is promoted to int. If char is signed, that promotion sign-extends the value, so a byte of 0x80 (128) becomes 0xFFFFFF80, and so on.
So, the sign extension happens before the print formatter ever gets to look at the value. What this means is that
printf("%02X", (unsigned) input[i]);
won't solve your problem, because the value of input[i] is sign-extended before the cast is applied, so on platforms where char is signed the byte values 128 to 255 are read as -128 to -1 and become 0xFFFFFF80 to 0xFFFFFFFF, whereas
printf("%02X", ((unsigned char *) input)[i] );
will do the trick, but is kind of ungainly and hard to read. Best to make the type of input[] be unsigned char in the first place.
What you see is the result of sign extension from char to int. Using unsigned char * for the buffer, or casting each element to unsigned char before the implicit promotion to int takes place, should fix your problem.
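The fix can be sketched as a small hex-line formatter. It is written in C++ rather than C89 to keep it short, and hexLine is a hypothetical name. The key step is the cast to unsigned char before the value is widened.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats n bytes as two-digit lowercase hex values, each followed by a space.
std::string hexLine(const char *input, std::size_t n)
{
    std::string out;
    char cell[4]; // two hex digits, one space, terminating NUL
    for (std::size_t i = 0; i < n; ++i) {
        // Convert to unsigned char first so that bytes >= 0x80 are not
        // sign-extended when they are widened for the variadic call.
        const unsigned value = static_cast<unsigned char>(input[i]);
        std::snprintf(cell, sizeof cell, "%02x ", value);
        out += cell;
    }
    return out;
}
```

With this cast, the byte 0x88 from a bitmap is formatted as 88 rather than ffffff88, whether or not plain char is signed.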
