How to efficiently deserialize an NVP dictionary from std::string using Boost

Suppose I have a stringified representation of a name-value-pair dictionary with custom separators between names/values and name-value pairs, e.g. "foo:4|bar:-1" or "Alice=>cat;Bob=>dog".
This can be done with Boost's split(), but I have two questions: first, is that just reinventing the wheel, and second, are there more efficient alternatives, such as some form of customized deserialization?

Consider using Boost Spirit. Using the X3 version you'd write
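// needs <boost/spirit/home/x3.hpp> and <boost/fusion/adapted/std_pair.hpp>
// namespace x3 = boost::spirit::x3;
const std::string input = "foo:4|bar:-1"; // or "Alice=>cat;Bob=>dog";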
auto text = +~x3::char_(":|");
std::map<std::string, int> parsed;
if (parse(input.begin(), input.end(), (text >> ':' >> x3::int_) % '|', parsed)) {
std::cout << "parsed[bar]: " << parsed["bar"] << "\n";
}
The other form:
const std::string input = "Alice=>cat;Bob=>dog";
auto text = +(x3::char_ - ';' - "=>");
std::map<std::string, std::string> parsed;
if (parse(input.begin(), input.end(), (text >> "=>" >> text) % ';', parsed)) {
std::cout << "parsed[Bob]: " << parsed["Bob"] << "\n";
}
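If pulling in Spirit is not an option, the same two inputs can also be handled with the standard library alone. The following is only a sketch, assuming C++17 (for std::string_view). parse_pairs is a hypothetical helper, not part of Boost, and it keeps every value as a string, so numeric conversion is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <string_view>

// Hypothetical helper: split `input` into name/value pairs.
// `kv_sep` separates a name from its value, `pair_sep` separates pairs;
// both may be longer than one character. Pairs without `kv_sep` are skipped.
inline std::map<std::string, std::string>
parse_pairs(std::string_view input, std::string_view kv_sep, std::string_view pair_sep) {
    assert(!kv_sep.empty() && !pair_sep.empty());
    std::map<std::string, std::string> result;
    std::size_t start = 0;
    while (start <= input.size()) {
        // Find the end of the current pair (or the end of the input).
        std::size_t end = input.find(pair_sep, start);
        if (end == std::string_view::npos) end = input.size();
        std::string_view pair = input.substr(start, end - start);

        // Split the pair at the first name/value separator.
        std::size_t mid = pair.find(kv_sep);
        if (mid != std::string_view::npos) {
            result.emplace(std::string(pair.substr(0, mid)),
                           std::string(pair.substr(mid + kv_sep.size())));
        }
        start = end + pair_sep.size();
    }
    return result;
}
```

Called as parse_pairs("foo:4|bar:-1", ":", "|") or parse_pairs("Alice=>cat;Bob=>dog", "=>", ";"), it fills the map in a single pass over the input. Unlike the Spirit grammar, it does not reject malformed input; it silently skips pairs that have no separator.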

Related

QRegExp does not match even though regex101.com does

I need to extract some data from string with simple syntax. The syntax is this:
_IMPORT:[any text] - [HEX number] #[decimal number]
Therefore I created regex you can see below in the code:
//SYNTAX: _IMPORT:%1 - %2 #%3
static const QRegExp matchImportLink("^_IMPORT:(.*?) - ([A-Fa-f0-9]+) #([0-9]+)$");
QRegExp importLink(matchImportLink);
QString qtWtf(importLink.pattern());
const int index = importLink.indexIn(mappingName);
qDebug()<< "Input string: "<<mappingName;
qDebug()<< "Regular expression:"<<qtWtf;
qDebug()<< "Result: "<< index;
For some reason, that does not work, I get this output:
Input string: "_IMPORT:ddd - 92806f0f96a6dea91c37244128f7d00f #0"
Regular expression: "^_IMPORT:(.*?) - ([A-Fa-f0-9]+) #([0-9]+)$"
Result: -1
I even tried to remove the anchors ^ and $ but that didn't help and also is undesired. The annoying thing is that this regexp works perfectly if I copy the output in regex101.com, as you can see here: https://regex101.com/r/oT6cY3/1
Can anyone explain what is wrong here? Did I stumble upon Qt bug? I use Qt 5.6. Is there any workaround for this?
It seems that QRegExp does not recognize the quantifier *? as valid. Check your pattern with QRegExp::isValid(); in my case that was the reason it did not work. The documentation says that an invalid pattern never matches.
So the first thing I tried was dropping the ?, which matches your provided string and fills all capturing groups. Here is my code.
QString str("_IMPORT:ddd - 92806f0f96a6dea91c37244128f7d00f #0");
QRegExp exp("^_IMPORT:(.*) - ([A-Fa-f0-9]+) #([0-9]+)$");
qDebug() << "pattern:" << exp.pattern();
qDebug() << "valid:" << exp.isValid();
int pos = 0;
while ((pos = exp.indexIn(str, pos)) != -1) {
for (int i = 1; i <= exp.captureCount(); ++i)
qDebug() << "pos:" << pos << "len:" << exp.matchedLength() << "val:" << exp.cap(i);
pos += exp.matchedLength();
}
And here is the resulting output.
pattern: "^_IMPORT:(.*) - ([A-Fa-f0-9]+) #([0-9]+)$"
valid: true
pos: 0 len: 49 val: "ddd"
pos: 0 len: 49 val: "92806f0f96a6dea91c37244128f7d00f"
pos: 0 len: 49 val: "0"
Tested using Qt 5.6.1.
Also note that you can switch between greedy and minimal (non-greedy) matching with QRegExp::setMinimal(bool); passing true makes the quantifiers non-greedy.
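For comparison outside Qt, the lazy quantifier from the question is accepted by std::regex with its default ECMAScript grammar. This is a minimal sketch, assuming C++17. ImportLink and parse_import_link are hypothetical names introduced here for illustration, and std::regex_match is used so the ^ and $ anchors are implied.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type for "_IMPORT:%1 - %2 #%3".
struct ImportLink {
    std::string text;      // %1: free text
    std::string hex;       // %2: hexadecimal digits
    unsigned long number;  // %3: decimal number
};

// Returns the three fields, or std::nullopt if the whole string does not match.
inline std::optional<ImportLink> parse_import_link(const std::string& s) {
    // regex_match requires the entire string to match, so no anchors are needed.
    static const std::regex re(R"(_IMPORT:(.*?) - ([A-Fa-f0-9]+) #([0-9]+))");
    std::smatch m;
    if (!std::regex_match(s, m, re)) return std::nullopt;
    assert(m.size() == 4);  // whole match plus three capture groups
    return ImportLink{m[1].str(), m[2].str(), std::stoul(m[3].str())};
}
```

For the input string from the question, both the lazy (.*?) and the greedy (.*) form capture "ddd" as the first group, because the input contains only one " - " separator.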

Pass a string from ECL to C++

I'm trying to get into the fascinating world of Common Lisp embedded in C++. My problem is that I can't manage to read and print from c++ a string returned by a lisp function defined in ECL.
In C++ I have this function to run arbitrary Lisp expressions:
cl_object lisp(const std::string & call) {
return cl_safe_eval(c_string_to_object(call.c_str()), Cnil, Cnil);
}
I can do it with a number in this way:
ECL:
(defun return-a-number () 5.2)
read and print in C++:
auto x = ecl_to_float(lisp("(return-a-number)"));
std::cout << "The number is " << x << std::endl;
Everything is set up and works fine, but I don't know how to do it with a string instead of a number. This is what I have tried:
ECL:
(defun return-a-string () "Hello")
C++:
cl_object y = lisp("(return-a-string)");
std::cout << "A string: " << y << std::endl;
And the result of printing the string is this:
A string: 0x3188b00
that I guess is the address of the string.
Here is a debugger capture showing the contents of the y cl_object. The type of y->string.self is ecl_character.
(Starting from #coredump's answer that the string.self field provides the result.)
The string.self field is defined as type ecl_character* (ecl/object.h), which appears to be given in ecl/config.h as type int (although I suspect this is slightly platform dependent). Therefore, you will not be able to just print it as if it was a character array.
The way I found worked for me was to reinterpret it as a wchar_t (i.e. a unicode character). Unfortunately, I'm reasonably sure this isn't portable and depends both on how ecl is configured and the C++ compiler.
// basic check that this should work
static_assert(sizeof(ecl_character)==sizeof(wchar_t),"sizes must be the same");
std::wcout << "A string: " << reinterpret_cast<wchar_t*>(y->string.self) << std::endl;
// prints Hello, as required
// note the use of wcout
The alternative is to use the lisp type base-string which does use char (base-char in lisp) as its character type. The lisp code then reads
(defun return-a-base-string ()
(coerce "Hello" 'base-string))
(there may be more elegant ways to do the conversion to base-string but I don't know them).
To print in C++
cl_object y2 = lisp("(return-a-base-string)");
std::cout << "Another: " << y2->base_string.self << std::endl;
(note that you can't mix wcout and cout in the same program)
According to section 2.6 Strings of The ECL Manual, I think that the actual character array is found by accessing the string.self field of the returned object. Can you try the following?
std::cout << y->string.self << std::endl;
std::string str {""};
cl_object y2 = lisp("(return-a-base-string)");
//get dimension
int j = y2->string.dim;
//get pointer
ecl_character* selv = y2->string.self;
//do simple pointer addition
for(int i=0;i<j;i++){
str += (*(selv+i));
}
//do whatever you want to str
This code works when the string is built from ecl_characters.
From the documentation:
"ECL defines two C types to hold its characters: ecl_base_char and ecl_character.
When ECL is built without Unicode, they both coincide and typically match unsigned char, to cover the 256 codes that are needed.
When ECL is built with Unicode, the two types are no longer equivalent, with ecl_character being larger.
For your code to be portable and future proof, use both types to really express what you intend to do."
On my system the return-a-base-string is not needed, but I think it could be good to add for compatibility. I use the (ecl) embedded CLISP 16.1.2 version.
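Appending each ecl_character to a std::string as a single char, as in the loop above, only preserves characters that fit in one byte. The following sketch shows one way to keep wider characters by encoding them as UTF-8. It assumes a Unicode build in which each character holds a Unicode code point. code_point is a hypothetical 32-bit stand-in for ecl_character so that the example compiles without ECL, and to_utf8 is not an ECL function. Invalid code points are not checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for ecl_character in a Unicode build (assumed 32-bit).
using code_point = std::uint32_t;

// Append one code point to `out`, encoded as UTF-8 (no validation).
inline void append_utf8(std::string& out, code_point cp) {
    if (cp < 0x80) {                       // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert `dim` characters starting at `self` into a UTF-8 std::string.
// The parameter names mirror the string.self and string.dim fields used above.
inline std::string to_utf8(const code_point* self, std::size_t dim) {
    assert(self != nullptr || dim == 0);
    std::string out;
    out.reserve(dim);  // exact for ASCII, a lower bound otherwise
    for (std::size_t i = 0; i < dim; ++i) append_utf8(out, self[i]);
    return out;
}
```

With ECL, the arguments would presumably be y->string.self and y->string.dim, provided sizeof(ecl_character) matches the stand-in type. That depends on how ECL was configured.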
The following piece of code reads a string from Lisp, converts it to the C++ string types (std::string and C string), and stores the results in C++ variables:
// strings initializations: string and c-string
std::string str2 {""};
char str_c[99] = " ";
// text read from clisp, whatever clisp function that returns string type
cl_object cl_text = lisp("(coerce (text-from-lisp X) 'base-string)");
//cl_object cl_text = lisp("(text-from-lisp X)"); // no base string conversions
// catch dimension
int cl_text_dim = cl_text->string.dim;
// complete c-string char by char
for(int i=0;i<cl_text_dim;i++){
str_c[i] = ecl_char(cl_text,i); // ecl function to get char from cl_object
}
str_c[cl_text_dim] ='\0'; // end of the c-string
str2 = str_c; // get the string on the other string type
std::cout << "Dim: " << cl_text_dim << " C-String var: " << str_c << " String var: " << str2 << std::endl;
Copying character by character is slow, but it is the only way I know at the moment. Hope it helps. Greetings!

How can I simply parse a CSS like (!) file in my Qt application?

I have a document in a *.css-like (Cascading Style Sheets) format, but with its own keywords. It is effectively a personalized CSS (I call it *.pss) with its own tags and properties. Here is an excerpt:
/* CSS like style sheet file *.pss */
#include "otherStyleSheet.pss";
/* comment */
[propertyID="1230000"] {
fillColor : #f3f1ed;
minSize : 5;
lineWidth : 3;
}
/* sphere */
[propertyID="124???|123000"] {
lineType : dotted;
}
/* square */
[propertyID="125???"] {
lineType : thinline;
}
/* ring */
[propertyID="133???"] {
lineType : thickline;
[hasInnerRing=true] {
innerLineType : thinline;
}
}
I would like to parse it as easily as possible. Does Qt already offer something ready to use? What would be the easiest way?
Since *.css has its own keywords, I am NOT interested in CSS parsers.
After parsing the *.pss, I intend to store its properties in a model structure.
There's nothing public within Qt. You're of course free to use the Qt's private CSS parser - you can copy it and modify to fit your needs.
See qtbase/src/gui/text/qcssparser_p.h, in qtbase/src/gui/text.
The good news is that for the example you've shown above, the modifications would be very minor. Qt's CSS parser already supports @import, so the only additional bit of syntax you have is the nested selector syntax. Without that syntax, you can use QCss::Parser as-is. The parser was written in a flexible fashion, so you don't need to worry about formal CSS keywords: it will still let you access all the declarations, whether they make sense from the formal CSS point of view or not.
Iterating the parse tree is as simple as it gets:
int main() {
QCss::Parser parser(pss);
QCss::StyleSheet styleSheet;
if (!parser.parse(&styleSheet))
return 1;
for (auto rule : styleSheet.styleRules) {
qDebug() << "** Rule **";
for (auto sel : rule.selectors) {
for (auto bSel : sel.basicSelectors)
qDebug() << bSel;
}
for (auto decl : rule.declarations)
qDebug() << decl;
}
}
The output is what we'd expect:
** Rule **
BasicSelector "propertyID"="1230000"
Declaration "fillColor" = '#f3f1ed' % QColor(ARGB 1, 0.952941, 0.945098, 0.929412)
Declaration "minSize" = '5' % 5
Declaration "lineWidth" = '3'
** Rule **
BasicSelector "propertyID"="124???|123000"
Declaration "lineType" = 'dotted'
** Rule **
BasicSelector "propertyID"="125???"
Declaration "lineType" = 'thinline'
** Rule **
BasicSelector "propertyID"="133???"
Declaration "lineType" = 'thickline'
We have to implement the debug stream operators for QCss classes ourselves:
QDebug operator<<(QDebug dbg, const QCss::AttributeSelector & sel) {
QDebugStateSaver saver(dbg);
dbg.noquote().nospace() << "\"" << sel.name << "\"";
switch (sel.valueMatchCriterium) {
case QCss::AttributeSelector::MatchEqual:
dbg << "="; break;
case QCss::AttributeSelector::MatchContains:
dbg << "~="; break;
case QCss::AttributeSelector::MatchBeginsWith:
dbg << "^="; break;
case QCss::AttributeSelector::NoMatch:
break;
}
if (sel.valueMatchCriterium != QCss::AttributeSelector::NoMatch && !sel.value.isEmpty())
dbg << "\"" << sel.value << "\"";
return dbg;
}
QDebug operator<<(QDebug dbg, const QCss::BasicSelector & sel) {
QDebugStateSaver saver(dbg);
dbg.noquote().nospace() << "BasicSelector";
if (!sel.elementName.isEmpty())
dbg << " #" << sel.elementName;
for (auto & id : sel.ids)
dbg << " id:" << id;
for (auto & aSel : sel.attributeSelectors)
dbg << " " << aSel;
return dbg;
}
When traversing the declarations, QCss::Parser has already interpreted some standard values for us, e.g. colors, integers, etc.
QDebug operator<<(QDebug dbg, const QCss::Declaration & decl) {
QDebugStateSaver saver(dbg);
dbg.noquote().nospace() << "Declaration";
dbg << " \"" << decl.d->property << "\" = ";
bool first = true;
for (auto value : decl.d->values) {
if (!first) dbg << ", ";
dbg << "\'" << value.toString() << "\'";
first = false;
}
if (decl.d->property == "fillColor")
dbg << " % " << decl.colorValue();
else if (decl.d->property == "minSize") {
int i;
if (decl.intValue(&i)) dbg << " % " << i;
}
return dbg;
}
Finally, the boilerplate and the stylesheet to be parsed:
// https://github.com/KubaO/stackoverflown/tree/master/questions/css-like-parser-31583622
#include <QtGui>
#include <private/qcssparser_p.h>
const char pss[] =
"/* #include \"otherStyleSheet.pss\"; */ \
[propertyID=\"1230000\"] { \
fillColor : #f3f1ed; \
minSize : 5; \
lineWidth : 3; \
} \
\
/* sphere */ \
[propertyID=\"124???|123000\"] { \
lineType : dotted; \
} \
\
/* square */ \
[propertyID=\"125???\"] { \
lineType : thinline; \
} \
\
/* ring */ \
[propertyID=\"133???\"] { \
lineType : thickline; \
/*[hasInnerRing=true] { \
innerLineType : thinline; \
}*/ \
}";
Support for nested selectors/rules can be implemented by modifying the parser source. The change needed to make Parser::parseRuleset recursive is very minor. I'll leave this as the exercise for the reader :)
All in all, I'd think that reusing the existing parser is much easier than rolling your own, especially as your users will inevitably wish you to support more and more of the CSS spec.
I know two possibilities:
1. Use boost::spirit; good introductions to the boost::spirit parser framework are available.
2. Write your own recursive descent parser.
Since your personalized *.pss is not as complex as CSS (simple bracketing etc.), I would recommend option 2.
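A hand-written recursive descent parser for this format can stay quite small. The sketch below covers only what the excerpt in the question shows: /* */ comments, [selector] { ... } rules, name : value; declarations, and nested rules. It does not handle the #include line. Rule and PssParser are hypothetical names, and the grammar is inferred from the excerpt rather than from any specification.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Rule {
    std::string selector;  // text between '[' and ']'
    std::vector<std::pair<std::string, std::string>> declarations;
    std::vector<Rule> children;  // nested rules
};

class PssParser {
public:
    explicit PssParser(std::string text) : src_(std::move(text)) {}

    // Parse all top-level rules; throws std::runtime_error on malformed input.
    std::vector<Rule> parse() {
        std::vector<Rule> rules;
        skipBlanks();
        while (pos_ < src_.size()) {
            rules.push_back(parseRule());
            skipBlanks();
        }
        return rules;
    }

private:
    // Skip whitespace and /* ... */ comments.
    void skipBlanks() {
        for (;;) {
            while (pos_ < src_.size() &&
                   std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
            if (src_.compare(pos_, 2, "/*") != 0) return;
            std::size_t end = src_.find("*/", pos_ + 2);
            if (end == std::string::npos) throw std::runtime_error("unterminated comment");
            pos_ = end + 2;
        }
    }

    void expect(char c) {
        skipBlanks();
        if (pos_ >= src_.size() || src_[pos_] != c)
            throw std::runtime_error(std::string("expected '") + c + "'");
        ++pos_;
    }

    // Read up to (not including) the next `stop` character; trailing blanks are trimmed.
    std::string readUntil(char stop) {
        std::size_t end = src_.find(stop, pos_);
        if (end == std::string::npos) throw std::runtime_error("unexpected end of input");
        std::string out = src_.substr(pos_, end - pos_);
        pos_ = end;
        while (!out.empty() && std::isspace(static_cast<unsigned char>(out.back())))
            out.pop_back();
        return out;
    }

    // rule := '[' selector ']' '{' (rule | declaration)* '}'
    Rule parseRule() {
        Rule rule;
        expect('[');
        rule.selector = readUntil(']');
        expect(']');
        expect('{');
        for (;;) {
            skipBlanks();
            if (pos_ >= src_.size()) throw std::runtime_error("missing '}'");
            if (src_[pos_] == '}') { ++pos_; return rule; }
            if (src_[pos_] == '[') {
                rule.children.push_back(parseRule());  // recursion handles nesting
            } else {
                std::string name = readUntil(':');     // declaration := name ':' value ';'
                expect(':');
                skipBlanks();
                std::string value = readUntil(';');
                expect(';');
                rule.declarations.emplace_back(std::move(name), std::move(value));
            }
        }
    }

    std::string src_;
    std::size_t pos_ = 0;
};
```

Calling PssParser(text).parse() returns a tree of Rule objects that can then be copied into a model structure. Since parseRule calls itself for nested rules, [hasInnerRing=true] ends up as a child of its enclosing rule.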
Well, I'm guessing you don't want to be in the business of writing an Object parser, you would just be reinventing JSON, or YAML, or the like. So your best bet is to make your formatting conform to a known configuration or object notation language and then parse it with some library for the language you are using. With very minor modification, the format you describe above could become HOCON, which is a very nice superset of JSON, and has syntax much closer to what you are using:
https://github.com/typesafehub/config/blob/master/HOCON.md
You could then parse it with a HOCON parsing library, and voila, you would have in-memory objects you can model or store any way you please. I believe Qt is C++ based? There is a hocon library for C, I don't know about C++, and I'm guessing you would need to write a Qt plug-in to wrap the HOCON parsing from some other language.
The other option is to use a CSS->object parser like this one:
https://github.com/reworkcss/css
Which you may need to fork and modify to your needs. Either way, I'm guessing that to integrate into a Qt app you will need a plug-in that handles some call-out to a command-line process or other code module.

Do you lose the reserved space in a Qt string with the QString::clear() function?

I am working on a loop where I want to copy a few characters from a string to another. I know the limit is around 20 characters so I want to do this outside the loop:
QString name;
name.reserve(25);
That way I have a buffer ready to be filled and Qt avoids reallocating it every time a name is parsed. Only, to get the name I do something like this:
for(int i(0); i < 20 && *s != '\0'; ++i)
{
name += *s;
}
which means I have to reset the name each time. How can I do that and be sure that the reserved space doesn't get lost every time?
// will reserved memory be lost after this call?
name.clear();
// would that be more likely to keep the memory buffer?
name = "";
The documentation does not seem to say one way or the other.
The complete set of loops goes something like this:
QString name;
name.reserve(25);
for(QChar const *s(input.data()); *s != '\0'; ++s)
{
...snip...
if(<some condition>)
{
name.clear(); // losing reserved data here?
for(int i(0); i < 20 && *s != '\0'; ++i)
{
name += *s;
}
...snip...
}
...snip...
}
Calling QString::clear() will cause your reserved space to be lost. Consider the following:
QString s;
s.reserve(25);
qDebug() << "Before Clear: " << s.capacity();
s.clear();
qDebug() << "After Clear: " << s.capacity();
Output:
Before Clear: 25
After Clear: 0
The most efficient way to remove the contents of the string without losing your reserved space is to call QString::resize():
QString s;
s.reserve(25);
qDebug() << "Before Resize: " << s.capacity();
s.resize(0);
qDebug() << "After Resize: " << s.capacity();
Output:
Before Resize: 25
After Resize: 25
In the implementation of QString::resize(), a call to resize(0) for strings with reserved capacity amounts to setting the internal size value to 0 and setting the first character of the internal buffer to '\0'.

QImageReader::text() does not work properly

I need to read image description (QImage::text()) without reading whole image. I used QImageReader, but its behaviour is very strange. It does not return text() or textKeys() but QImage does.
Example code:
QImageReader imageReader(filename,0);
QImage image;
bool ok=imageReader.read(&image);
qDebug() << "KEYS_READER: " << imageReader.textKeys() << "\n";
qDebug() << "KEYS_IMAGE: " << image.textKeys() << "\n";
When I try to load data from image which has one text key it prints in the debug window:
KEYS_READER: ()
KEYS_IMAGE: ("UVFI")
But I need to retrieve keys without reading the whole image data.
