QRegExp and Null Character in Qt

I want to search a binary file with a regular expression.
The search succeeds on text files but finds nothing in binary files, because QRegExp::indexIn() stops searching when it meets the null character (chr(0)).
What can I do to solve this problem?

QString can contain null characters; it is only its constructors that are inconsistent.
QString::fromUtf8(const char *str, int size = -1) uses the given size, while QString::fromUtf8(const QByteArray &str) effectively applies strlen instead of using the byte array's size. You can check this in the Qt source code.
QRegExp also supports null characters:
QString s(QChar(0));
QRegExp re(s);
qDebug() << re.indexIn(s); // will print 0, not -1
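The same pitfall can be reproduced outside Qt. The sketch below uses std::regex rather than QRegExp, and the helper names findInBytes and findInCString are made up for illustration. It assumes the data is held in a std::string built with an explicit length: a search over the full range sees bytes after a '\0', while a search through a C string ends at the first '\0'.
```cpp
#include <regex>
#include <string>

// Returns the offset of the first match of `pattern` in `data`, or -1.
// The iterator overload searches all data.size() bytes, including any '\0'.
long findInBytes(const std::string &data, const std::string &pattern) {
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(data.begin(), data.end(), m, re))
        return static_cast<long>(m.position(0));
    return -1;
}

// Same search, but through a C string, so it ends at the first '\0'.
long findInCString(const char *text, const std::string &pattern) {
    const std::regex re(pattern);
    std::cmatch m;
    if (std::regex_search(text, m, re))
        return static_cast<long>(m.position(0));
    return -1;
}
```
If a search fails only on binary input, it is worth checking whether the buffer was converted through a null-terminated path somewhere before the regular expression ever saw it.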

Related

How can I convert a QByteArray into a hex string?

I have the QByteArray below.
QByteArray ba;
ba[0] = 0x01;
ba[1] = 0x10;
ba[2] = 0x00;
ba[3] = 0x07;
I really have no idea how to convert this QByteArray into the string "01100007", on which I would then use QRegExp for pattern matching. How can I do that?

First of all, the QByteArray does not contain "hex values"; it contains bytes (as its name implies). A number is "hex" only when it is printed as text.
Your code should be:
QByteArray ba(4, 0); // array length 4, filled with 0
ba[0] = 0x01;
ba[1] = 0x10;
ba[2] = 0x00;
ba[3] = 0x07;
Anyway, to convert a QByteArray to a hex string, you are in luck: just use the QByteArray::toHex() method!
QByteArray ba_as_hex_string = ba.toHex();
Note that it returns 8-bit text, but you can just assign it to a QString without worrying much about encodings, since it is pure ASCII. If you want upper case A-F in your hexadecimal numbers instead of the default a-f, you can use QByteArray::toUpper() to convert the case.
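To make clear what such a conversion produces, here is a rough standard C++ sketch of the same idea. It is not Qt's implementation; toHexString is a hypothetical name, and std::vector<unsigned char> stands in for the byte array. Each byte becomes two lowercase hexadecimal digits.
```cpp
#include <string>
#include <vector>

// Converts each byte to two lowercase hex digits, high nibble first.
std::string toHexString(const std::vector<unsigned char> &bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char b : bytes) {
        out.push_back(digits[b >> 4]);   // upper four bits
        out.push_back(digits[b & 0x0F]); // lower four bits
    }
    return out;
}
```
The result is plain ASCII text, so a regular expression can be run over it like over any other string.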

QString has the following constructor:
QString(const QByteArray &ba)
But note that in C++ an integer literal with a leading 0 is octal, so values written as 01, 10, 00, 07 would be a mix of decimal and octal, and none of them would be hex. Hex literals need the 0x prefix.

QString to unicode std::string

I know there is plenty of information about converting QString to char*, but I still need some clarification in this question.
Qt provides QTextCodecs to convert QString (which internally stores characters in unicode) to QByteArray, allowing me to retrieve char* which represents the string in some non-unicode encoding. But what should I do when I want to get a unicode QByteArray?
QTextCodec* codec = QTextCodec::codecForName("UTF-8");
QString qstr = codec->toUnicode("Юникод");
std::string stdstr(reinterpret_cast<const char*>(qstr.constData()), qstr.size() * 2 ); // * 2 since unicode character is twice longer than char
qDebug() << QString(reinterpret_cast<const QChar*>(stdstr.c_str()), stdstr.size() / 2); // same
The above code prints "Юникод" as I expected. But I'd like to know if that is the right way to get at the unicode char* of the QString. In particular, the reinterpret_casts and size arithmetic in this technique look pretty ugly.

The below applies to Qt 5. Qt 4's behavior was different and, in practice, broken.
You need to choose:
Whether you want the 8-bit wide std::string, the 16-bit wide std::wstring, or some other type.
Which encoding you want in the target string.
Internally, QString stores UTF-16 encoded data, so any Unicode code point may be represented in one or two QChars.
Common cases:
Locally encoded 8-bit std::string (as in: system locale):
std::string(str.toLocal8Bit().constData())
UTF-8 encoded 8-bit std::string:
str.toStdString()
This is equivalent to:
std::string(str.toUtf8().constData())
UTF-16 or UCS-4 encoded std::wstring, 16- or 32 bits wide, respectively. The selection of 16- vs. 32-bit encoding is done by Qt to match the platform's width of wchar_t.
str.toStdWString()
U16 or U32 strings of C++11 - from Qt 5.5 onwards:
str.toStdU16String()
str.toStdU32String()
UTF-16 encoded 16-bit std::u16string - this hack is only needed up to Qt 5.4:
std::u16string(reinterpret_cast<const char16_t*>(str.constData()))
This encoding does not include byte order marks (BOMs).
It's easy to prepend BOMs to the QString itself before converting it:
QString src = ...;
src.prepend(QChar::ByteOrderMark);
#if QT_VERSION < QT_VERSION_CHECK(5,5,0)
auto dst = std::u16string(reinterpret_cast<const char16_t*>(src.constData()),
                          static_cast<std::size_t>(src.size()));
#else
auto dst = src.toStdU16String();
#endif
If you expect the strings to be large, you can skip one copy:
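const QString src = ...;
std::u16string dst;
dst.reserve(src.size() + 1); // BOM + text; std::u16string manages its own terminator
dst.push_back(char16_t(QChar::ByteOrderMark)); // append() has no single-character overload
dst.append(reinterpret_cast<const char16_t*>(src.constData()),
           src.size()); // size() excludes the terminator, so no embedded null is copied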
In both cases, dst is now portable to systems with either endianness.
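The reason a BOM helps is that a reader can use it to detect the byte order of the data it receives. A minimal sketch of that receiving side, in standard C++ and under the assumption that the UTF-16 data arrives as raw bytes in a std::string (decodeUtf16WithBom is a hypothetical helper, not a Qt function):
```cpp
#include <cstddef>
#include <string>

// Decodes UTF-16 bytes that start with a BOM into native char16_t units.
// Returns an empty string if the BOM is missing or the byte count is odd.
std::u16string decodeUtf16WithBom(const std::string &bytes) {
    if (bytes.size() < 2 || bytes.size() % 2 != 0)
        return {};
    const auto b0 = static_cast<unsigned char>(bytes[0]);
    const auto b1 = static_cast<unsigned char>(bytes[1]);
    bool bigEndian;
    if (b0 == 0xFE && b1 == 0xFF)
        bigEndian = true;   // U+FEFF stored high byte first
    else if (b0 == 0xFF && b1 == 0xFE)
        bigEndian = false;  // U+FEFF stored low byte first
    else
        return {};
    std::u16string out;
    out.reserve(bytes.size() / 2 - 1);
    for (std::size_t i = 2; i < bytes.size(); i += 2) {
        const auto hi = static_cast<unsigned char>(bytes[bigEndian ? i : i + 1]);
        const auto lo = static_cast<unsigned char>(bytes[bigEndian ? i + 1 : i]);
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}
```
The BOM itself is consumed and not returned, so the result contains only the text.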

Use this:
QString Widen(const std::string &stdStr)
{
return QString::fromUtf8(stdStr.data(), stdStr.size());
}
std::string Narrow(const QString &qtStr)
{
QByteArray utf8 = qtStr.toUtf8();
return std::string(utf8.data(), utf8.size());
}
In all cases, the std::string should hold UTF-8.

You can get the QByteArray from a UTF-16 encoded QString using this:
QTextCodec *codec = QTextCodec::codecForName("UTF-16");
QTextEncoder *encoderWithoutBom = codec->makeEncoder( QTextCodec::IgnoreHeader );
QByteArray array = encoderWithoutBom->fromUnicode( str );
This way you ignore the unicode byte order mark (BOM) at the beginning.
You can convert it to char * like:
int dataSize=array.size();
char * data= new char[dataSize];
for(int i=0;i<dataSize;i++)
{
data[i]=array[i];
}
Or simply:
char *data = array.data();

Finding a specific character in a file in Qt

How can I find a specific character in a QFile that contains text?
For example, I have ' $5000 ' written somewhere in my file. I want to find the "$" sign so that I know I've reached the number.
I tried using QString QTextStream::read(qint64 maxlen) by putting 1 as the maxlen :
QFile myfile("myfile.txt");
myfile.open(QIODevice::ReadWrite | QIODevice::Text);
QTextStream myfile_stream(&myfile);
while(! myfile_stream.atEnd())
{
if( myfile_stream.read(1) == '$')
{
qDebug()<<"found";
break;
}
}
and I get "error: invalid conversion from 'char' to 'const char*'".
I also tried using operator[], but apparently it can't be used on files.

Read in a line at a time and search the text that you've read in:
QTextStream stream(&myFile);
QString line;
do
{
line = stream.readLine();
if(line.contains("$"))
{
qDebug()<<"found";
break;
}
} while (!line.isNull());

The error message you've posted doesn't match the issue in your code. Possibly the error was caused by something else.
QTextStream::read returns QString. You can't compare QString and const char* directly, but operator[] can help:
QString s = stream.read(1);
if (s.count() == 1) {
if (s[0] == '$') {
//...
}
}
However, reading a file in very small pieces will be very slow. If your file is small enough, you can read it all at once:
QString s = stream.readAll();
int index = s.indexOf('$');
If your file is large, it's better to read it in chunks (1024 bytes, for example) and compute the index of the found character from the indexOf result and the amount of data already read.

A single char can be read with:
QTextStream myfile_stream(&myfile);
QChar c;
while (!myfile_stream.atEnd()) {
    myfile_stream >> c;
    // the check must be inside the loop so that every character is tested
    if (c == '$') {
        ...
    }
}

myfile_stream.read(1) - this is not good practice; you should not read from a file one byte at a time. Either read the entire file, or read it buffered or line by line if there is a risk that the file is too big to fit in memory.
The error you get is because you compare a QString for equality with a character literal, which is not going to work as expected. A string is a string even if there is only one character in it. As advised, use either the [] operator or, better for reading, QString::at() const, which is guaranteed to create no extra copy. You don't use it on the QFile or on the QTextStream, but on the QString returned from the read() method of the text stream attached to the file.
Once you have the text in memory, you can use the regular QString methods like indexOf() to search for the index of a contained character.

I want to find the "$" sign so I will realize that I've reached the number.
It sounds to me that you're searching for the '$' symbol because you're more interested in the dollar value that follows it. In this case, I suggest reading the file line by line and running each line through a QRegExp to extract any values you're looking for.
QRegExp dollarFind("\\$(\\d+)");
while (!myfile_stream.atEnd()) {
    QString line = myfile_stream.readLine();
    // indexIn() searches within the line; exactMatch() would require the whole line to match
    if (dollarFind.indexIn(line) != -1) {
        // cap(1) holds the digits captured after the '$'
        qDebug() << "Dollar value found:" << dollarFind.cap(1);
    }
}
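For comparison, here is a standard-library sketch of the same extraction that collects every dollar amount on a line rather than just one. It uses std::regex instead of QRegExp, and extractDollarValues is a hypothetical helper name.
```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the digit runs that follow each '$' in `line`, in order of appearance.
std::vector<std::string> extractDollarValues(const std::string &line) {
    static const std::regex dollarFind(R"(\$(\d+))");
    std::vector<std::string> values;
    for (std::sregex_iterator it(line.begin(), line.end(), dollarFind), end;
         it != end; ++it) {
        values.push_back((*it)[1].str()); // capture group 1: the digits
    }
    return values;
}
```
The iterator performs a search, not a whole-line match, so surrounding text on the line does not prevent a hit.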

What's the difference between QString and QLatin1String?

As in the title:
1. What's the difference between QString and QLatin1String?
2. When and where do I need to use each of them?
3. Given the following:
QString str;
str = "";
str = QLatin1String("");
Is "" == QLatin1String("")?

QString holds Unicode. A string literal "foo" is a byte sequence that could contain text in any encoding. When you assign a string literal to a QString, as in QString str = "foo", you implicitly convert from a byte sequence in an undefined encoding to a QString holding Unicode. In Qt 4, the QString(const char*) constructor assumes ASCII and converts as if you had typed QString str = QString::fromAscii("foo"). That breaks if you use non-ASCII literals in your source files (e.g., Japanese string literals in UTF-8) or pass character data from a char* or QByteArray you read from elsewhere (a file, socket, etc.).
Thus it's good practice to keep the Unicode QString world and the byte array QByteArray/char* world separate and to convert between the two only explicitly, clearly stating which encoding to use. One can define QT_NO_CAST_FROM_ASCII and QT_NO_CAST_TO_ASCII to enforce explicit conversions (I would always enable them when writing a parser of any sort).
Now, to assign a latin1 string literal to a QString variable using explicit conversion, one can use
QString foo = QString::fromLatin1("föö");
or
QString foo = QLatin1String("föö");
Both state that the literal is encoded in latin1 and allow "encoding-safe" conversions to unicode.
I find QLatin1String nicer to read and the QLatin1String docs explain why it will be also faster in some situations.
The main use for QLatin1String is wrapping string literals, or in some cases QByteArray or char* variables, that hold latin1 data for conversion. One wouldn't use QLatin1String for method arguments, member variables or temporaries; those should all be QString.

QString is Unicode based, while QLatin1String is US-ASCII/Latin-1 based.
Unicode is a superset of US-ASCII/Latin-1. If you only deal with US-ASCII/Latin-1 characters, the two are the same for you.
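The superset relationship is what makes the conversion cheap: every Latin-1 byte value is also the Unicode code point of the same character, so widening each byte is enough. A minimal sketch in standard C++ (latin1ToUtf16 is a hypothetical helper, not Qt's implementation):
```cpp
#include <string>

// Widens Latin-1 bytes to UTF-16 code units. Latin-1 values 0x00..0xFF
// map directly to Unicode code points U+0000..U+00FF.
std::u16string latin1ToUtf16(const std::string &latin1) {
    std::u16string out;
    out.reserve(latin1.size());
    for (char c : latin1) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return out;
}
```
The same shortcut does not work for UTF-8 input, where bytes at or above 0x80 are parts of multi-byte sequences.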
http://doc.qt.io/qt-4.8/qstring.html
http://doc.qt.io/qt-4.8/qlatin1string.html

How to convert QList<QByteArray> to QString in Qt?

I have a QList<QByteArray> that I want to print out in a QTextBrowser. QTextBrowser->append() takes a QString.
Despite a ton of searching online, I have not found a way to convert the data I have into a QString.

There are several functions to convert QByteArray to QString: QString::fromAscii(), QString::fromLatin1(), QString::fromUtf8() etc. for the most common ones, and QTextCodec for other encodings. Which one is the correct one depends on the encoding of the text data in the byte array.

Try:
for(int i=0; i<list.size(); ++i){
QString str(list[i].constData());
// use your string as needed
}

To go from QByteArray to QString, use:
const char * QByteArray::constData () const
Returns a pointer to the data stored in the byte array. The pointer can be used to access the bytes that compose the array. The data is '\0'-terminated. The pointer remains valid as long as the byte array isn't reallocated or destroyed.
This function is mostly useful to pass a byte array to a function that accepts a const char *.
You can then pass that pointer to this QString constructor:
QString ( const char * str )
