How to best convert a std::string_view to a QString? - qt

I have a library that gives me a string_view. What's the best way to get it into a QString (not a QStringView)?
I made QString::fromStdString(std::string(key).c_str()), but is that the best?

Drop the c_str(), you don't need it, since fromStdString() takes a std::string (hence the name):
QString::fromStdString(std::string(key))
You cannot drop the explicit string construction, though: the std::string constructor that accepts a std::string_view is explicit, so the following does not compile:
QString::fromStdString(key)
That being said, if the buffer behind the std::string_view is null-terminated (which is not guaranteed), you can use the QString constructor that accepts a const char*. Note that it ignores key.size() and reads up to the first null character:
QString(key.data())
Or, if the std::string_view is encoded in Latin-1, you can use:
QString::fromLatin1(key.data(), key.size())
Or, if encoded in UTF-8:
QString::fromUtf8(key.data(), key.size())
Or, if encoded in the user's default locale:
QString::fromLocal8Bit(key.data(), key.size())
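To illustrate why the null-termination caveat matters, here is a minimal sketch in plain C++ with no Qt dependency. The helper names toOwned and lengthSeenThroughCharPointer are hypothetical and exist only for this example. A view produced by substr() keeps pointing into the original buffer, so any API that only receives key.data() reads past the end of the view.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical helper: copies exactly view.size() bytes into an owning,
// null-terminated std::string. The std::string constructor taking a
// std::string_view is explicit, so the conversion is spelled out.
std::string toOwned(std::string_view view)
{
    return std::string(view);
}

// Hypothetical helper: the length a char*-based API would see if it were
// handed view.data() and scanned for the terminating '\0'.
// Only valid when the view points into a null-terminated buffer.
std::size_t lengthSeenThroughCharPointer(std::string_view view)
{
    assert(view.data() != nullptr);
    return std::strlen(view.data());
}
```

For a view key taken as the first three characters of "key=value", toOwned(key) yields "key", while a char*-based API sees all nine characters. The size-taking overloads such as fromUtf8(key.data(), key.size()) avoid that problem.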

Related

Qt - QString Numerical Format String (cformat), Options?

I have user-provided format string (e.g. "%.2f") and a QVariant type that I am attempting to combine to output into a (formatted) string.
I had gone down the path of using QString::asprintf(const char *cformat, ...) to achieve this, where I would supply the appropriate converted data type, like this:
QString result_str = QString::asprintf(disp_fmt.toUtf8(),variant_type.toUInt());
This works fine for the most part, especially when I have a floating point as the input. However, if my format string in this particular integer (.toUInt()) conversion case includes decimal formatting (e.g. "%.2f"), then I get a constant result of "0.00". This caught me by surprise as I expected to instead just get ".00" tacked onto the integer, as I have seen in other languages like Perl.
What am I missing here? Also, I know asprintf() was added fairly recently and the documentation already now advises to use QTextStream or arg() instead. I don't believe this to be an option, however, for me to use this style of format string. Thanks.
The format string is expecting a double, but you're providing an unsigned int. With a variadic function such a mismatch is undefined behavior, which is why you see "0.00" rather than a converted value. It works if you provide an actual double, like this:
QString result_str = QString::asprintf(disp_fmt.toUtf8(),variant_type.toDouble());
Also note that this behavior is identical to how the standard C library functions work (std::sprintf, etc.).
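The same rule can be shown with the standard library alone. This is a minimal sketch, assuming a user-supplied format that contains exactly one floating-point conversion. formatAsDouble is a hypothetical helper, not part of Qt or the C library. The integer is converted to double before it reaches the variadic call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats an unsigned value with a floating-point
// conversion such as "%.2f". The value is converted to double first,
// because a %f conversion reads a double from the variadic arguments.
std::string formatAsDouble(const char *fmt, unsigned value)
{
    assert(fmt != nullptr);
    char buffer[64];
    const int written = std::snprintf(buffer, sizeof buffer, fmt,
                                      static_cast<double>(value));
    if (written < 0)
        return std::string();   // snprintf reported an encoding error
    // snprintf returns the untruncated length; clamp it to the buffer.
    const std::size_t len = static_cast<std::size_t>(written) < sizeof buffer
                                ? static_cast<std::size_t>(written)
                                : sizeof buffer - 1;
    return std::string(buffer, len);
}
```

With the default "C" locale, formatAsDouble("%.2f", 42u) gives the ".00 tacked on" result the question was after.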

Convert multibyte character array to QChar array

I have two buffers (example sizes):
char c[512];
QChar q[256];
Assume 'c' contains a multibyte character string (UTF-8). I need to convert it to a QChar sequence and place it in 'q'.
I guess a good example of what I need could be MultiByteToWideChar function.
IMPORTANT: this operation shall not involve any explicit or implicit memory allocations, except for additional allocations on the stack, maybe.
Please, do not answer if you are not sure what the above means.
QChar contains a ushort as its only member, so its size is sizeof(ushort).
In a QString, each QChar represents one UTF-16 code unit, which is not necessarily a complete code point.
So it's all about encoding here.
If you know your char const * is UTF-16 data in the same endianness / byte order as your system, simply copy it:
memcpy(q, c, 512);
If you want to initialize a QString with your const char * data, you could just interpret it as UTF-16 using QString::fromRawData():
QString strFromData = QString::fromRawData(reinterpret_cast<QChar*>(c), 256);
// where 256 is sizeof(c) * sizeof(char) / sizeof(QChar)
Then you don't even need the QChar q[256] array.
If you know your data is UTF-8, you should use QString::fromUtf8() and then simply access its inner memory with QString::constData().
Using QString with UTF-8 I don't know of any method to completely prevent heap allocations. But the mentioned way should only allocate twice: Once for the PIMPL of QString, once for the UTF-16 string data.
If your input data is encoded as UTF-8, the answer is No: You cannot convert it using Qt.
Proof: Looking at the source code of qtbase/src/corelib/codecs/qutfcodec.cpp we see that all functions for encoding / decoding create new QString / QByteArray instances. No function operates on two arrays as in your question.
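If avoiding heap allocation really is a hard requirement, one option is to decode by hand into the caller's buffer. The following is a minimal sketch, not Qt code: utf8ToUtf16 is a hypothetical helper, and char16_t stands in for QChar on the assumption that both hold one 16-bit UTF-16 code unit. It replaces malformed sequences with U+FFFD but does not reject overlong encodings, so treat it as an illustration rather than a validated decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not a Qt API: decodes UTF-8 from `in` into UTF-16
// code units in `out` without allocating. Returns the number of code units
// written and stops early when the next character would not fit in `out`.
std::size_t utf8ToUtf16(const char *in, std::size_t inLen,
                        char16_t *out, std::size_t outCap)
{
    assert(in != nullptr && out != nullptr);
    std::size_t written = 0;
    std::size_t i = 0;
    while (i < inLen) {
        const unsigned char lead = static_cast<unsigned char>(in[i++]);
        std::uint32_t cp;
        int extra;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else                            { cp = 0xFFFD;      extra = 0; }

        for (int k = 0; k < extra; ++k) {
            if (i >= inLen
                || (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) {
                cp = 0xFFFD;  // truncated or malformed sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i++]) & 0x3F);
        }
        // Surrogates encoded as UTF-8 and out-of-range values are invalid.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;

        if (cp < 0x10000) {
            if (written + 1 > outCap) break;
            out[written++] = static_cast<char16_t>(cp);
        } else {
            if (written + 2 > outCap) break;
            cp -= 0x10000;  // split into a surrogate pair
            out[written++] = static_cast<char16_t>(0xD800 + (cp >> 10));
            out[written++] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return written;
}
```

Both buffers can live on the stack, as in the question. The return value tells the caller how many of the output slots are valid.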

Preserve non-ascii characters between std::string and QString

In my program the user can either provide a filename on the command line or using a QFileDialog. In the first case, I have a char* without any encoding information, in the second I have a QString.
To store the filename for later use (Recent Files), I need it as a QString. But to open the file with std::ifstream, I need a std::string.
Now the fun starts. I can do:
filename = QString::fromLocal8Bit(argv[1]);
later on, I can do:
std::string fn = filename.toLocal8Bit().constData();
This works for most characters, but not all. For example, the word Раи́са will look the same after going through this conversion, but, in fact, have different characters.
So while I can have a Раи́са.txt, and it will display Раи́са.txt, it will not find the file in the filesystem. Most letters work, but и́ doesn't.
(Note that it does work correctly when the file was chosen in the QFileDialog. It does not when it originated from the command line.)
Is there any better way to preserve the filename? Right now I obtain it in whatever native encoding, and can pass-on in the same encoding, without knowing it. At least so I thought.
'и́' is not an ASCII character, so it cannot be stored in a single char. How it is represented in argv[1] is therefore OS-dependent, but it always takes more than one char.
fromLocal8Bit() uses the same QTextCodec::codecForLocale() as toLocal8Bit(). And as you say, your std::string will hold "Раи́са.txt", so that's not the problem.
Depending on how your platform implements std::ifstream, though, it may treat each char as its own character and not go through the OS's translation. I expect that you are on Windows since you are seeing this problem. In that case you should use the std::fstream overloads that take a wide-character filename, which are Microsoft-specific: http://msdn.microsoft.com/en-us/library/4dx08bh4.aspx
You can get a std::wstring from a QString by using QString::toStdWString().
See here for more info: fstream::open() Unicode or Non-Ascii characters don't work (with std::ios::out) on Windows
EDIT:
A good cross-platform option for projects with access to it is Boost.Filesystem. ypnos mentions its file streams as specifically pertinent.

How to determine the encoding of the text in a QTextEdit in Qt?

I am obtaining the content from a QTextEdit object by using the following code:
QString text=my_QTextEdit.toPlainText();
What is the encoding that QTextEdit uses, and what encoding is used in the QString I get back from the toPlainText() call?
Thanks.
QTextEdit::toPlainText() returns a QString object, which is always a Unicode character string (see documentation).
The QString class provides the functions toLatin1(), toAscii() and toUtf8(), which allow you to convert the string from Unicode to an 8-bit string that you can process further. So Qt handles the encoding and decoding of the string for you. (toAscii() and fromAscii() are deprecated as of Qt 5; prefer the Latin-1 or UTF-8 variants.)
If you want to create a QString instance from a given byte string, you can use the functions fromAscii(), fromLatin1() or fromUtf8().
All controls in Qt are enabled for 16-bit characters. That means that the content of a QTextEdit is Unicode, stored as UTF-16 (see also http://developer.nokia.com/Community/Discussion/showthread.php/215203-how-to-correctly-display-Unicodes-in-QPlainTextEdit).
When getting the content of a QTextEdit control (via toPlainText()), you get back a QString which contains Unicode.
From there on, you can convert to other formats as you like: toUtf8(), toUcs4(), ...

Retrieve Unicode code points > U+FFFF from QChar

I have an application that is supposed to deal with all kinds of characters and at some point display information about them. I use Qt and its inherent Unicode support in QChar, QString etc.
Now I need the code point of a QChar in order to look up some data in http://unicode.org/Public/UNIDATA/UnicodeData.txt, but QChar's unicode() method only returns a ushort (unsigned short), which usually is a number from 0 to 65535 (or 0xFFFF). There are characters with code points > 0xFFFF, so how do I get these? Is there some trick I am missing or is this currently not supported by Qt/QChar?
Each QChar is a UTF-16 code unit, not a complete Unicode code point. Therefore, each non-BMP character is represented by a surrogate pair of two QChars.
The solution appears to lie in functions that are documented but not seen much on the Web. Take the code point as an integer and pass it to QChar::requiresSurrogates() to determine whether a single QChar is large enough. In this case it is not, so you need to create two QChars.
uint32_t cp = 155222; // U+25E56, a CJK ideograph outside the BMP (4 bytes in UTF-8)
QString str;
if (QChar::requiresSurrogates(cp))
{
    // Split the code point into a UTF-16 surrogate pair.
    QChar charArray[2];
    charArray[0] = QChar::highSurrogate(cp);
    charArray[1] = QChar::lowSurrogate(cp);
    str = QString(charArray, 2);
}
The resulting QString will contain the correct information to display your supplementary (non-BMP) character.
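For reference, the arithmetic behind a surrogate pair can be written out in plain C++. This is a sketch of the standard UTF-16 formulas, not Qt's source. highSurrogateOf, lowSurrogateOf and codePointOf are hypothetical names that only mirror what QChar::highSurrogate(), QChar::lowSurrogate() and QChar::surrogateToUcs4() are documented to do. The last one is the direction the question asks about: recovering the code point from two QChars.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the UTF-16 surrogate arithmetic; assumptions
// for illustration only, not copied from Qt.
char16_t highSurrogateOf(std::uint32_t cp)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    return static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10));
}

char16_t lowSurrogateOf(std::uint32_t cp)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    return static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF));
}

// Reverse direction: combine a high/low pair back into one code point.
std::uint32_t codePointOf(char16_t high, char16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000
         + ((static_cast<std::uint32_t>(high) - 0xD800) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00);
}
```

For the code point 155222 (U+25E56) used above, the pair is 0xD857 followed by 0xDE56, and combining the two halves gives 155222 again.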
Unicode characters beyond U+FFFF in Qt
QChar itself only supports Unicode characters up to U+FFFF.
QString supports Unicode characters beyond U+FFFF by concatenating two QChars (that is, by using UTF-16 encoding). However, the QString API doesn't help you much if you need to process characters beyond U+FFFF. As an example, a QString instance which contains the single Unicode character U+131F6 will return a size of 2, not 1.
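The size mismatch is a property of UTF-16 itself, so it can be reproduced without Qt. Here is a minimal sketch that uses std::u16string as a stand-in for QString's storage. countCodePoints is a hypothetical helper that counts each well-formed surrogate pair as one character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-16 string by
// treating each well-formed high/low surrogate pair as one character.
// Unpaired surrogates are counted individually.
std::size_t countCodePoints(const std::u16string &text)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t unit = text[i];
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        if (isHigh && i + 1 < text.size()
            && text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF) {
            ++i;  // skip the low half of the pair
        }
        ++count;
    }
    assert(count <= text.size());  // never more characters than code units
    return count;
}
```

A string holding only U+131F6 has size() == 2 but one code point, which matches the QString behavior described above.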
I opened QTBUG-18868 about this problem back in 2011, but after more than three years (!) of discussion, it was finally closed as "out of scope" without any resolution.
Solution
You can, however, download and use these Unicode Qt string wrapper classes which have been attached to the Qt bug report. Licensed under the LGPL.
This download contains the wrapper classes QUtfString, QUtfChar, QUtfRegExp and QUtfStringList which supplement the existing Qt classes and allow you to do things like this:
QUtfString str;
str.append(0x1307C); // Some Unicode character beyond U+FFFF
Q_ASSERT(str.size() == 1);
Q_ASSERT(str[0] == 0x1307C);
str += 'a';
Q_ASSERT(str.size() == 2);
Q_ASSERT(str[1] == 'a');
Q_ASSERT(str.indexOf('a') == 1);
For further details about the implementation, usage and runtime complexity please see the API documentation included within the download.
