Is there a way to work around the limit of 255 types in a flatbuffers union?

I am using flatbuffers to serialize rows from sql tables. I have a Statement.fbs that defines a statement as Insert, Update, Delete, etc. The statement has a member "Row" that is a union of all sql table types. However, I have more than 255 tables and I get this error when compiling with flatc:
$ ~/flatbuffers/flatc --cpp -o gen Statement.fbs
error: /home/jkl/fbtest/allobjects.fbs:773: 18: error: enum value does not fit [0; 255]
I looked through the flatbuffers code and I see that an enum is automatically created for union types and that the underlying type of this enum is uint8_t.
I do not see any options for changing this behavior.
I am able to create an enum that handles all my tables by specifying the underlying type to be uint16 in my flatbuffer schema file.
The statement schema:
include "allobjects.fbs";
namespace Database;
enum StatementKind : byte { Unknown = 0, Insert, Update, Delete, Truncate }
table Statement {
kind:StatementKind;
truncate:[TableKind];
row:Row;
}
root_type Statement;
The allobjects Row union is a bit large to include here.
union Row {
TypeA,
TypeB,
TypeC,
Etc,
...
}
I suppose this is a design decision for flatbuffers that union types should only use one byte. I can accept that, but I would really like a workaround.

This sadly is a bit of a design mistake, and there is no workaround yet. Fixing this to be configurable is possible, but would be a fair bit of work given the amount of language ports that rely on it being a byte. See e.g. here: https://github.com/google/flatbuffers/issues/4209
Yes, using multiple unions is a clumsy workaround.
An alternative could be to define the type as an enum. Now you have the problem that you don't have a typesafe way to store the table, though. That could be achieved with a "nested flatbuffer", i.e. storing the union value as a vector of bytes, which you can then cheaply call GetRoot on with the correct type, once you inspected the enum.
Another option may be an enum + a union, if the number of unique kinds of records is < 256. For example, you may have multiple row types that have different names but whose contents are just a string, so they can be merged for the union type.
Another hack could be to declare a table RowBaseClass {} or whatever, which would be the type of the field, but you would never actually instantiate this table. You then cast back and forth to that type to store the actual table, depending on the language you're using.

The nested buffer solution to the 255 limit of unions is pretty straightforward.
allobjects.fbs:
namespace Database;
table Garbage {
gid:ulong;
type:string;
weight:uint;
}
... many more ...
Statement.fbs:
include "allobjects.fbs";
namespace Database;
enum StatementKind : byte { Unknown = 0, Insert, Update, Delete, Truncate }
// suppose this enum holds the > 255 Row types
enum TableKind : uint16 { Unknown = 0, Garbage, Etc... }
// this is the "union", but with a type enum beyond ubyte size
table Row {
kind:TableKind;
// this payload will be the nested flatbuffer
payload:[ubyte];
}
table Statement {
kind:StatementKind;
truncate:[TableKind];
row:Row;
}
root_type Statement;
main.cpp:
#include <iostream>
#include <string>
#include <vector>
#include "Statement_generated.h"
void encodeInsertGarbage(unsigned long gid,
const std::string& type,
unsigned int weight,
std::vector<uint8_t>& retbuf)
{
// build the Garbage row in its own builder; its finished buffer becomes the nested flatbuffer
flatbuffers::FlatBufferBuilder rowFbb;
// I used the "Direct" version so I didn't have to create a flatbuffer string object
auto garbage = Database::CreateGarbageDirect(rowFbb, gid, type.c_str(), weight);
rowFbb.Finish(garbage);
// the outer builder holds the Statement; only its root object is finished
flatbuffers::FlatBufferBuilder fbb;
// make [ubyte] by copying the encoded "Garbage" buffer into the outer builder
auto payload = fbb.CreateVector(rowFbb.GetBufferPointer(), rowFbb.GetSize());
// make the generic Row homebrewed union
auto obj = Database::CreateRow(fbb, Database::TableKind_Garbage, payload);
// create the Statement - 0 for "truncate" since that is not used for Insert
auto statement = Database::CreateStatement(fbb, Database::StatementKind_Insert, 0, obj);
fbb.Finish(statement);
// copy the resulting flatbuffer to output vector
// just for this test program, typically you write to a file or socket.
retbuf.assign(fbb.GetBufferPointer(), fbb.GetBufferPointer() + fbb.GetSize());
}
void decodeInsertGarbage(std::vector<uint8_t>& retbuf)
{
auto statement = Database::GetStatement(retbuf.data());
auto tableType = statement->row()->kind();
auto payload = statement->row()->payload();
// just using a simple "if" statement here, but a full solution
// could use an array of getters, indexed by TableKind, then
// wrap it up nice with a template function to cast the return type
// like rowGet<Garbage>(payload);
if (tableType == Database::TableKind_Garbage)
{
auto garbage = Database::GetGarbage(payload->Data());
std::cout << " gid: " << garbage->gid() << std::endl;
std::cout << " type: " << garbage->type()->c_str() << std::endl;
std::cout << " weight: " << garbage->weight() << std::endl;
}
}
int main()
{
std::vector<uint8_t> iobuf;
encodeInsertGarbage(0, "solo cups", 12, iobuf);
decodeInsertGarbage(iobuf);
return 0;
}
Output:
$ ./fbtest
gid: 0
type: solo cups
weight: 12
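The comment in decodeInsertGarbage suggests replacing the if statement with a table of getters indexed by TableKind. Below is a minimal sketch of that dispatch idea in plain C++, written without flatbuffers so it compiles on its own. TableKind, RowDispatcher and the handlers are hypothetical stand-ins; in a real program each handler would call the generated accessor (for example Database::GetGarbage) on the payload bytes. It uses an unordered_map instead of a plain array, so large or sparse enum values (here 300) need no pre-sized table.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the schema's TableKind enum; uint16 allows more than 255 kinds.
enum class TableKind : std::uint16_t { Unknown = 0, Garbage = 1, Etc = 300 };

// A handler receives the raw bytes of the nested buffer for one row type.
// In real code it would call the generated GetXxx(payload.data()) accessor.
using RowHandler = std::function<std::string(const std::vector<std::uint8_t>&)>;

class RowDispatcher {
public:
    // Associates one row kind with the code that knows how to decode it.
    void registerHandler(TableKind kind, RowHandler handler) {
        handlers_[static_cast<std::uint16_t>(kind)] = std::move(handler);
    }

    // Looks up the handler for this kind; returns an empty string for unknown kinds.
    std::string dispatch(TableKind kind, const std::vector<std::uint8_t>& payload) const {
        auto it = handlers_.find(static_cast<std::uint16_t>(kind));
        if (it == handlers_.end()) {
            return {};
        }
        return it->second(payload);
    }

private:
    std::unordered_map<std::uint16_t, RowHandler> handlers_;
};
```

A typed wrapper such as the rowGet<Garbage>(payload) mentioned in the comment could be layered on top of this, with each registered handler performing the cast for its own type.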

Related

Explaining pointers to a Javascript developer

I started to learn coding backwards: high level first. This has the obvious liability of missing some basic concepts that I should definitely know, and when I try to learn a low level language, it throws me.
I have tried many times to understand pointers, however the explanations rapidly go over my head, usually because all of the example code uses languages that use pointers, which I don't understand other things about, and then I spin.
I am most fluent (very fluent, at that) in Javascript.
How would you explain pointers to a sad Javascript developer like me? Could someone provide me a practical, real life example?
Maybe even showing how, if Javascript had pointers, you could do x, and a pointer is different than a raw variable because of y.
Here's an attempt at a self-contained answer from first principles.
Pointers are part of a type system that permit the implementation of reference semantics. Here's how. We suppose that our language has a type system, by which every variable is of a certain type. C is a good example, but many languages work like this. So we can have a bunch of variables:
int a = 10;
int b = 25;
Further, we assume that function arguments are always copied from the caller scope into the function scope. (This is also true for many real languages, though the details can quickly become subtle when the type system gets 'hidden' from the user, as in Java.) So let's have a function:
void foo(int x, int y);
When calling foo(a, b), the variables a and b are copied into local variables x and y corresponding to the formal parameters, and those copies are visible within the function scope. Whatever the function does with x and y has no effect on the variables a and b at the call site. The entire function call is opaque to the caller.
Now let's move on to pointers. A language with pointers contains, for every object type T, a related type T *, which is the type "pointer to T". Values of type T * are produced by taking the address of an existing object of type T. So a language that has pointers also needs to have a way to produce pointers, which is "taking the address of something". The purpose of a pointer is to store the address of an object.
But that's only one half of the picture. The other half is what to do with the address of an object. The main reason for caring about the address of an object is to be able to refer to the object whose address is being stored. This object is obtained by a second operation, suitably called dereferencing, which when applied to a pointer produces the object which is being "pointed to". Importantly, we do not get a copy of the object; we get the actual object.
In C, the address-of operator is spelled &, and the dereference operator is spelled *.
int * p = &a; // p stores the address of 'a'
*p = 12; // now a == 12
The first operand of the final assignment, *p, is the object a itself. Both a and *p are the same object.
Now why is this useful? Because we can pass pointers to functions to allow functions to change things outside the function's own scope. Pointers allow for indirection, and thus for referencing. You can tell the function about "something else". Here's the standard example:
void swap(int * p, int * q)
{
int tmp = *p;
*p = *q;
*q = tmp;
}
We can tell the function swap about our variables a and b by giving it the addresses of those variables:
swap(&a, &b);
In this way, we are using pointers to implement reference semantics for the function swap. The function gets to refer to variables elsewhere and can modify them.
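To see the difference between the copied arguments of foo and the pointer arguments of swap side by side, here is a small sketch. The function names setByValue, setThroughPointers and swapInts are made up for this illustration.

```cpp
// Receives copies of the caller's ints: assigning to x and y only changes the copies.
void setByValue(int x, int y) {
    x = 100;
    y = 200;
}

// Receives addresses: writing through *p and *q changes the caller's variables.
void setThroughPointers(int* p, int* q) {
    *p = 100;
    *q = 200;
}

// The swap from the text, under a name that cannot clash with std::swap.
void swapInts(int* p, int* q) {
    int tmp = *p;
    *p = *q;
    *q = tmp;
}
```

Calling setByValue(a, b) leaves a and b untouched, while swapInts(&a, &b) and setThroughPointers(&a, &b) modify them, because the caller handed over the addresses with &.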
The fundamental mechanism of reference semantics can be summarized as follows:
The caller takes the address of the object to be referred to:
T a;
mangle_me(&a);
The callee takes a pointer parameter and dereferences the pointer to access the referred-to value.
void mangle_me(T * p)
{
// use *p
}
Reference semantics are important for many aspects of programming, and many programming languages supply them in some way or another. For example, C++ adds native reference support to the language, largely removing the need for pointers. Go uses explicit pointers, but offers some notational "convenience" by sometimes automagically dereferencing a pointer. Java and Python "hide" pointer-ness inside their type system, e.g. the type of a variable is in some sense a pointer to the type of the object. In some languages, some types like ints are naked value types, and others (like lists and dictionaries) are "hidden-pointer-included" reference types. Your mileage may vary.
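As a sketch of the C++ reference support mentioned above, here is the same swap written once with reference parameters and once with pointers (both function names are hypothetical). With references, the caller writes swapByReference(a, b) without &, and the function body uses no *.

```cpp
// C++ references: x and y are other names for the caller's objects.
void swapByReference(int& x, int& y) {
    int tmp = x;
    x = y;
    y = tmp;
}

// The equivalent pointer version: the caller passes &a, &b and the body dereferences.
void swapByPointer(int* p, int* q) {
    int tmp = *p;
    *p = *q;
    *q = tmp;
}
```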
C++ rules are fairly simple and consistent. I actually find how Javascript handles object references and prototypes way more unintuitive.
Preface A: Why is Javascript A Bad Place To Start?
The first thing you need to fundamentally understand before you can tackle pointers is variables. You need to know what they are and how the computer keeps track of them.
Coming from a Javascript background you are used to every variable assigned to an object being a reference. That is, two variables can reference the same object. This is essentially pointers without any syntax to allow for more intricate use. You are also used to implicit copies of "basic" types like numbers. That is to say:
var a = MyObject;
var b = a;
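Now if you modify the object through b (for example by setting one of its properties), you see the change through a as well, because both variables refer to the same object. You would need to explicitly copy MyObject in order to have two variables pointing to different instances of it!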
var a = 5;
var b = a;
Now if you change b, a is not actually changed. This is because assigning a to b when a is a simple type will copy it automatically for you. You cannot get the same behavior as objects with simple numbers and vice versa, so when you want two variables to refer to the same number you have to wrap it in an object. There is no explicit way to indicate how you want to handle references vs copies for primitive types.
You can see how this inconsistent behavior, with no variation in syntax (but an extreme variation in behavior), can make the relationship between variables and what they contain muddy. For this reason I highly suggest banishing this mental model for a moment as we continue on our journey to understand explicit pointers.
Preface B: YOLO: Variable Lifetime On The Stack
So, let's talk from here on out in C++ terms. C++ is one of the most explicit languages in terms of what a variable is vs a pointer. C++ is a good entry point because it is low level enough to talk in terms of memory and lifespan, but high level enough to understand things at a decent level of abstraction.
So, in C++ when you create any variable it exists in a certain scope. There are two places a variable can be created: on the stack and on the heap.
The stack refers to the call stack of your application. Every function call pushes a new frame onto the stack (and pops it when the function returns), and every brace pair opens a new scope. When you create a local variable, it lives in the enclosing scope; when that scope ends, the variable is destroyed.
A simple example of scope:
#include <iostream>
#include <string>
struct ScopeTest{
ScopeTest(std::string a_name):
name(a_name){
std::cout << "Create " << name << std::endl;
}
~ScopeTest(){
std::cout << "Destroy " << name << std::endl;
}
ScopeTest(const ScopeTest &a_copied):
name(a_copied.name + "(copy)"){
std::cout << "Copy " << a_copied.name << std::endl;
}
std::string name;
};
ScopeTest getVariable(){ //Stack frame push
ScopeTest c("c"); //Create c
return c; //Copy c + Destroy c (unless the copy is elided)
}
int main(){
ScopeTest a("a"); //Create a
{
ScopeTest b("b"); //Create b
ScopeTest d = getVariable();
} //Destroy c(copy) + Destroy b
} //Destroy a
Output (assuming C++17 or later and a compiler that does not elide the copy of c):
Create a
Create b
Create c
Copy c
Destroy c
Destroy c(copy)
Destroy b
Destroy a
This should illustrate explicitly how a variable ties its life to the stack, how it is copied around, and when it dies. Compilers are allowed to skip the copy on return (named return value optimization), and GCC and Clang typically do so even without optimization flags; in that case the "Copy c" line disappears and the single object is reported as "Destroy c" when d goes out of scope.
Preface C: YOLO Variable Lifetime on the Heap
So, that's interesting conceptually, but variables can also be allocated outside of the stack. This is called "heap" memory because it is largely structureless. The issue with heap memory is that it has no automatic cleanup based on scope, so you need a way to tie it to some kind of "handle" to keep track of it.
I'll illustrate here:
{
new ScopeTest("a"); //Create a
} //Whoa, we haven't destroyed it! Now we are leaking memory!
So, clearly we can't just say "new X" without keeping track of it. The memory gets allocated, but doesn't tie itself to a lifespan so it lives forever (like a memory vampire!)
In Javascript you can just tie it to a variable and the object dies when the last reference to it dies. Later I'll talk about a more advanced topic in C++ which allows for that, but for now let's look at simple pointers.
In C++ when you allocate a variable with new, the best way to track it is to assign it to a pointer.
Preface D: Pointers and The Heap
As I suggested, we can track allocated memory on the heap with a pointer. Our previous leaky program can be fixed like so:
{
ScopeTest *a = new ScopeTest("a"); //Create a
delete a; //Destroy a
}
ScopeTest *a; creates a pointer, and assigning it to a new ScopeTest("a") gives us a handle we can actually use to clean up and refer to the variable which exists in heap memory. I know heap memory sounds kinda confusing, but it's basically a jumble of memory that you can point to and say "hey you, I want a variable with no lifespan, make one and let me point at it".
Every object created with the new keyword must be released by exactly one (and no more than one) delete, or it will live forever, using up memory. Deleting a null pointer (0) is a no-op, but deleting any other address more than once frees memory your program no longer owns, which is undefined behavior.
ScopeTest *a; declares a pointer. From here on out, any time you say "a" you are referring to a specific memory address. *a refers to the actual object at that memory address, and you can access its members with (*a).name. In C++, a->name is shorthand for (*a).name.
{
ScopeTest *a = new ScopeTest("a"); //Create a
std::cout << a << ": " << (*a).name << ", " << a->name << std::endl;
delete a; //Destroy a
}
Output for the above will look something like:
007FB430: a, a
Where 007FB430 is a hex representation of a memory address.
So in the purest sense, a pointer is literally a memory address and the ability to treat that address as a variable.
The Relationship Between Pointers and Variables
We don't just have to use pointers with heap-allocated memory, though! We can point a pointer at any memory, even memory living on the stack. Just be careful that your pointer doesn't outlive the memory it points to, or you'll have a dangling pointer, and using it is undefined behavior.
It is always the programmer's job to make sure a pointer is valid; C++ performs no checks to help you out when dealing with bare memory.
int a = 5; //variable named a has a value of 5.
int *pA = &a; //pointer named pA is now referencing the memory address of a (we reference "a" with & to get the address).
Now pA holds the same value as &a; that is to say, it is the address of a.
*pA refers to the same value as a.
You can treat *pA = 6; the same as a = 6. Observe (continuing from the above two lines of code):
std::cout << *pA << ", " << a << std::endl; //output 5, 5
a = 6;
std::cout << *pA << ", " << a << std::endl; //output 6, 6
*pA = 7;
std::cout << *pA << ", " << a << std::endl; //output 7, 7
You can see why pA is called a "pointer": it literally points to the same address in memory as a. So far we have been using *pA to dereference the pointer and access the value at the address it points to.
Pointers have a few interesting properties. One of them is that a pointer can be reassigned to point at a different object.
int b = 20;
pA = &b;
std::cout << *pA << ", " << a << ", " << b << std::endl; //output 20, 7, 20
*pA = 25;
std::cout << *pA << ", " << a << ", " << b << std::endl; //output 25, 7, 25
pA = &a;
std::cout << *pA << ", " << a << ", " << b << std::endl; //output 7, 7, 25
*pA = 8;
std::cout << *pA << ", " << a << ", " << b << std::endl; //output 8, 8, 25
b = 30;
pA = &b;
std::cout << *pA << ", " << a << ", " << b << std::endl; //output 30, 8, 30
So you can see that a pointer is really just a handle to a point in memory. This can be exceptionally useful in many cases, do not write it off just because this sample is simplistic.
Now, the next thing you need to know about pointers is that you can increment them, as long as the result stays within the same array (or one element past its end). The most common example is C strings. In modern C++ strings are stored in a container called std::string, and you should use that, but I will use an old C-style string to demonstrate array access with a pointer.
Pay close attention to ++letter. What this does is increment the memory address the pointer is looking at by the size of the type it is pointing to.
Let's break this down a bit more, re-read the above sentence a few times then continue on.
If I have a type where sizeof(T) == 4, every ++myPointerValue will move 4 bytes forward in memory to point to the next "value" of that type. This is part of why the pointer "type" matters.
char text[] { 'H', 'e', 'l', 'l', 'o', '\0' }; //could be char text[] = "Hello"; but I want to show the \0 explicitly
for (char* letter = &text[0]; *letter != '\0';++letter){
std::cout << "[" << *letter << "]";
}
std::cout << std::endl;
The above will loop over the string until it reaches a '\0' (null) character. Keep in mind this can be dangerous and is a common source of security holes: if code assumes an array is terminated by some value but receives one that is not, the loop runs past the end of the array and reads arbitrary memory. That's a high-level description, anyway.
For that reason it is much better to be explicit with string length and use safer methods such as std::string in regular use.
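To check the claim that ++ on a pointer advances by sizeof(T) bytes, a small sketch can measure the distance in bytes between p and p + 1 for a few element types. The helper name strideInBytes is hypothetical; it stays within a two-element array so the pointer arithmetic is well defined.

```cpp
#include <cstddef>

// Returns how many bytes separate p and p + 1 when p points into an array of T.
template <typename T>
std::ptrdiff_t strideInBytes() {
    T arr[2]{};          // two elements, so p + 1 is a valid pointer
    T* p = arr;
    T* next = p + 1;     // same effect as ++p: advances by one whole element
    // Viewing both addresses as bytes shows the distance in memory.
    return reinterpret_cast<const char*>(next) - reinterpret_cast<const char*>(p);
}
```

For char the stride is 1 byte; for int and double it matches sizeof(int) and sizeof(double) on the platform you compile for.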
Alright, and as a final example to put things into context: let's say I have several discrete "cells" that I want to link together into one coherent "list". The most natural implementation of this with non-contiguous memory is to use pointers to direct each node to the next one in the sequence.
With pointers you can create all sorts of complex data structures, trees, lists, and more!
struct Node {
int value = 0;
Node* previous = nullptr;
Node* next = nullptr;
};
struct List {
List(){
head = new Node();
tail = head;
}
~List(){
std::cout << "Destructor: " << std::endl;
Node* current = head;
while (current != nullptr){
Node* next = current->next;
std::cout << "Deleting: " << current->value << std::endl;
delete current;
current = next;
}
}
void Append(int value){
Node* previous = tail;
tail = new Node();
tail->value = value;
tail->previous = previous;
previous->next = tail;
}
void Print(){
std::cout << "Printing the List:" << std::endl;
for (Node* current = head; current != nullptr;current = current->next){
std::cout << current->value << std::endl;
}
}
Node* tail;
Node* head;
};
And putting it to use:
List sampleList;
sampleList.Append(5);
sampleList.Append(6);
sampleList.Append(7);
sampleList.Append(8);
sampleList.Print();
List may seem complicated at a glance, but I am not introducing any new concepts here. This is exactly the same things I covered above, just implemented with a purpose.
Homework for you to completely understand pointers would be to provide two methods in List:
Node* NodeForIndex(int index)
void InsertNodeAtIndex(int index, int value)
This list implementation is exceptionally poor. std::list is a much better example, but in most cases, due to data locality, you really want to stick with std::vector. Pointers are exceptionally powerful tools, and fundamental in computer science. You need to understand them to appreciate how the common data types you rely on every day are composed, and in time you will come to appreciate the explicit separation of value from pointer in C++.
Beyond simple pointers: std::shared_ptr
std::shared_ptr gives C++ the ability to deal with reference counted pointers. That is to say, it gives a similar behavior to Javascript object assignment (where an object is destroyed when the last reference to that object is set to null or destroyed).
std::shared_ptr is just like any other stack based variable. It ties its lifetime to the stack, and then holds a pointer to memory allocated on the heap. In this regard, it encapsulates the concept of a pointer in a safer manner than having to remember to delete.
Let's re-visit our earlier example that did leak memory:
{
new ScopeTest("a"); //Create a
} //Whoa, we haven't destroyed it! Now we are leaking memory!
With a shared_ptr we can do the following:
{
std::shared_ptr<ScopeTest> a(new ScopeTest("a")); //Create a
}//Destroy a
And, a little more complex:
{
std::shared_ptr<ScopeTest> showingSharedOwnership;
{
std::shared_ptr<ScopeTest> a(new ScopeTest("a")); //"Create a" (ref count 1)
showingSharedOwnership = a; //increments a's ref count by 1. (now 2)
} //the shared_ptr named a is destroyed, decrements ref count by 1. (now 1)
} //"Destroy a" showingSharedOwnership dies and decrements the ref count by 1. (now 0)
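The reference counts in the comments above can be observed with use_count(). A minimal sketch, using std::string in place of ScopeTest and std::make_shared (the usual alternative to passing new to the shared_ptr constructor); the function name sharedOwnershipCounts is hypothetical:

```cpp
#include <memory>
#include <string>
#include <vector>

// Records the reference count at each step of the nested-scope example.
std::vector<long> sharedOwnershipCounts() {
    std::vector<long> counts;
    std::shared_ptr<std::string> showingSharedOwnership;
    {
        auto a = std::make_shared<std::string>("a");  // ref count 1
        counts.push_back(a.use_count());
        showingSharedOwnership = a;                   // ref count 2
        counts.push_back(a.use_count());
    }                                                 // a is destroyed: ref count 1
    counts.push_back(showingSharedOwnership.use_count());
    showingSharedOwnership.reset();                   // last owner gone: object destroyed
    counts.push_back(showingSharedOwnership.use_count());  // an empty shared_ptr reports 0
    return counts;
}
```

The returned sequence is 1, 2, 1, 0, matching the comments in the nested example.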
I won't go too much further here, but this should open your mind to pointers.

QSqlRelationalTableModel - insert record greater than 256

I have a table node={id,name}, and a table segment={id,nodeFrom,nodeTo} in a SQLite db, where node.id and segment.id are AUTOINCREMENT fields.
I'm creating a QSqlTableModel for Node, as follows:
nodeModel = new QSqlTableModel(this,db);
nodeModel->setTable("Node");
nodeModel->setEditStrategy(QSqlTableModel::OnFieldChange);
and I use the following code for inserting nodes:
int addNode(QString name) {
QSqlRecord newRec = nodeModel->record();
newRec.setGenerated("id",false);
newRec.setValue("name",name);
if (not nodeModel->insertRecord(-1,newRec))
qDebug() << nodeModel->lastError();
if (not nodeModel->submit())
qDebug() << nodeModel->lastError();
return nodeModel->query().lastInsertId().toInt();
}
This seems to work. Now, for segments I define a QSqlRelationalTableModel, as follows:
segModel = new QSqlRelationalTableModel(this,db);
segModel->setTable("Segment");
segModel->setEditStrategy(QSqlTableModel::OnManualSubmit);
segModel->setRelation(segModel->fieldIndex("nodeFrom"),
QSqlRelation("Node","id","name"));
segModel->setRelation(segModel->fieldIndex("nodeTo"),
QSqlRelation("Node","id","name"));
And then I have the following code for inserting segments:
void addSegment(int nodeFrom, int nodeTo) {
QSqlRecord newRec = segModel->record();
newRec.setGenerated("id",false);
newRec.setValue(1,nodeFrom);
newRec.setValue(2,nodeTo);
if (not segModel->insertRecord(-1,newRec)) // (*)
qDebug() << segModel->lastError();
if (not segModel->submitAll())
qDebug() << segModel->lastError(); // (*)
}
I can add 280 nodes successfully using addNode(). I can also add segments successfully if nodeFrom<=256 and nodeTo<=256. For any segment referencing a node greater than 256 I get a
QSqlError("19", "Unable to fetch row", "Segment.nodeTo may not be NULL")
in one of the lines marked with a (*) of the addSegment function.
I've googled and found out that people are having other (apparently unrelated) problems when they hit the magical 256 record count. No solution seems to work with this particular problem.
What am I doing wrong?
Thanks!
The cause of this error lies in the QRelation::populateDictionary() method, which uses a loop like for (int i=0; i < model->rowCount(); ++i). If you use a database that does not report the size of a query result (e.g. SQLite), the model fetches rows lazily in batches, so rowCount() only reports the rows fetched so far, which is initially this magical 256.
You can solve this by fully populating the relation model for each relation column (nodeFrom and nodeTo) before calling data(...) or setData(...). As a first step, try:
setRelation(nodeFromCol, QSqlRelation("Node", "id", "name"));
QSqlTableModel *model = relationModel(nodeFromCol);
while(model->canFetchMore())
model->fetchMore();
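To illustrate the mechanism, here is a hypothetical, Qt-free simulation. LazyModel stands in for a query model whose driver cannot report the result size, so it loads rows in batches and rowCount() reports only what has been fetched. The batch size of 256 mirrors the number in the question, and ids are assumed to equal row index + 1. A dictionary built with a rowCount() loop misses the later rows until the model is drained with canFetchMore()/fetchMore().

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a lazily fetching query model.
class LazyModel {
public:
    LazyModel(std::vector<std::string> rows, int batch)
        : rows_(std::move(rows)), batch_(batch) {
        fetchMore();  // the first batch is loaded up front
    }
    int rowCount() const { return fetched_; }  // rows fetched so far, not the total
    bool canFetchMore() const { return fetched_ < static_cast<int>(rows_.size()); }
    void fetchMore() {
        fetched_ = std::min(fetched_ + batch_, static_cast<int>(rows_.size()));
    }
    const std::string& data(int row) const { return rows_.at(row); }

private:
    std::vector<std::string> rows_;
    int batch_;
    int fetched_ = 0;
};

// Builds an id -> name dictionary by looping up to rowCount(), as described above.
std::map<int, std::string> buildDictionary(const LazyModel& model) {
    std::map<int, std::string> dict;
    for (int i = 0; i < model.rowCount(); ++i) {
        dict[i + 1] = model.data(i);
    }
    return dict;
}
```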
Try this fix:
newRec.setValue(1,QVariant(nodeFrom));
newRec.setValue(2,QVariant(nodeTo));

Combine Thread Results with openmp

I have some problems combining the processing results I receive from several threads, and I'm not sure if I use OpenMP correctly. The code extract below shows the OpenMP portion of my code.
Parameters:
thread private:
it: map iterator (timestamp, userkey)
ite: map iterator ((timestamp,userkey)/int amount)
thread_result_map: typedef map < userkey(str),timestamp(str) >
when, who: matching regex (timestamp, userkey)
shared among threads:
log: char array
size: log.size()
identifier, timestamp, userkey: boost::regex patterns
combined_result_map: typedef map < thread_result_map, hits(int) >
#pragma omp parallel shared(log, size, identifier, timestamp, userkey) private(it, ite, str_time, str_key, vec_str_result, i, id, str_current, when, who, thread_result_map)
{
#pragma omp for
for (i = 0 ; i < size ; i++){
str_current.push_back(log[i]);
if (log[i] == '\n') {
if (boost::regex_search(str_current, identifier)){
boost::regex_search(str_current, when, timestamp);
str_time = when[0];
boost::regex_search(str_current, who, userkey);
str_key = who[0];
thread_result_map.insert(make_pair(str_time, str_key));
}
str_current = ""; //reset temp string
}
}
#pragma omp critical
{
for (it=thread_result_map.begin(); it!=thread_result_map.end(); it++) {
id = omp_get_thread_num();
cout << thread_result_map[it->first] <<
thread_result_map[it->second];
cout << "tID_" << id << " reducing" << endl;
}
}
}
As you can see, every thread has its own partition of the char array. It parses the array line by line, and if the current line matches "identifier", the timestamp and userkey are added to the thread's private result map (string/string).
Now after the loop I have several thread-private result maps. The combined_result_map is a map inside a map. The key is the key/value combination from a thread's result and the value is the number of occurrences of this combination.
I'm parsing only a portion of the timestamp so when in 1 hour the same userkey appears multiple times the hit counter will be increased.
The result should look something like this:
TIME(MMM/DD/HH/);USERKEY;HITS
May/25/13;SOMEKEY124345;3
So I have no problems combining hit amounts in the critical section (which I removed) by specifying combined+=results.
But how can I combine my result maps the same way? I know I have to iterate through the threads' maps, but when I put a "cout" inside the loop for testing, every thread calls it only once.
A test run on my local syslog gives me the following output when I set all the regex to "error" (to make sure every identified line will have a userkey and a timestamp with the same name):
Pattern for parsing Access String: error
Pattern for parsing Timestamp: error
Pattern for parsing Userkey: error
*** Parsing File /var/log/syslog
errortID_0 reducing
errortID_1 reducing
errortID_2 reducing
errortID_3 reducing
*** Ok! ________________ hits : 418 worktime: 0.0253871s
(The calculated hits come from thread private counters, that I removed in the code above)
So each of my 4 threads does a single cout and leaves the loop, although all together they should have 418 hits. What am I doing wrong? How do I iterate through my results from inside my OpenMP region?
Found the problem myself, sorry for asking stupid questions.
I was trying to add the same key multiple times, that's why map size didn't increase and every thread looped only once.
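One way to combine per-thread results without sharing a container during the loop is to give each thread its own map and merge it into the shared one once, inside a critical section. This is a minimal sketch under stated assumptions: the log has already been split into (timestamp, userkey) pairs, which also avoids the loop over the raw char array being partitioned in a way that cuts a line in two at a chunk boundary; countHits, toCsv, Event and HitMap are hypothetical names. Without OpenMP enabled, the pragmas are ignored and the code runs serially with the same result.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// (timestamp, userkey), in the same order as the pairs pushed into local_res.
using Event = std::pair<std::string, std::string>;
// userkey -> (timestamp -> hits)
using HitMap = std::map<std::string, std::map<std::string, int>>;

HitMap countHits(const std::vector<Event>& events) {
    HitMap combined;
    const long n = static_cast<long>(events.size());
    #pragma omp parallel
    {
        HitMap local;  // declared inside the region, so each thread has its own
        #pragma omp for
        for (long i = 0; i < n; ++i) {
            // operator[] creates missing entries; a new count starts at 0
            ++local[events[i].second][events[i].first];
        }
        #pragma omp critical
        {
            // one merge per thread into the shared result
            for (const auto& byKey : local)
                for (const auto& byTime : byKey.second)
                    combined[byKey.first][byTime.first] += byTime.second;
        }
    }
    return combined;
}

// Formats the result like the export file: "userkey";"timestamp";"hits"
std::string toCsv(const HitMap& hits) {
    std::ostringstream out;
    for (const auto& byKey : hits)
        for (const auto& byTime : byKey.second)
            out << '"' << byKey.first << "\";\"" << byTime.first << "\";\""
                << byTime.second << "\"\n";
    return out.str();
}
```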
Edit:
If anybody is interested in how to combine thread results, this is how I did it. Perhaps you can see something that could be improved.
I just changed the local thread result map to a vector of pairs (str,str).
This is the full working OpenMP code section. Perhaps it's useful for someone:
#pragma omp parallel shared(log, size, identifier, timestamp, userkey) private(it, ite, str_time, str_key, i, id, str_current, when, who, local_res)
{
#pragma omp for
for (i = 0 ; i < size ; i++){
str_current.push_back(log[i]);
if (log[i] == '\n') { // if char is newline character
if (boost::regex_search(str_current, identifier)){ // if current line is access string
boost::regex_search(str_current, when, timestamp); // get timestamp from string
str_time = when[0];
boost::regex_search(str_current, who, userkey); // get userkey from string
str_key = who[0];
local_res.push_back((make_pair(str_time, str_key))); // append key-value-pair(timestamp/userkey)
id = omp_get_thread_num();
//cout << "tID_" << id << " - adding pair - my local result map size is now: " << local_res.size() << endl;
}
str_current = "";
}
}
#pragma omp critical
{
id = omp_get_thread_num();
hits += local_res.size();
cout << "tID_" << id << " had HITS: " << local_res.size() << endl;
for (i = 0; i < local_res.size(); i++) {
acc_key = local_res[i].second;
acc_time = local_res[i].first;
// operator[] creates a missing userkey entry and a missing timestamp count (starting at 0),
// so a single increment covers new keys, new timestamps and existing counts
m_KeyDatesHits[acc_key][acc_time]++;
}
}
}
I did some tests and it's really running fast.
Using 4 threads, this searches a 150MB log file for access events, parses a custom user key and date from every event and combines the results in about 4 seconds.
At the end it creates an export list. This is the program output:
HELLO, welcome to LogMap 0.1!
C++/OpenMP Memory Map Parsing Engine
__________________
Number of processors available = 4
Number of threads = 4
Pattern for parsing Access String: GET /_openbooknow/key/
Pattern for parsing Timestamp: \d{2}/\w{3}/\d{4}
Pattern for parsing Userkey: [a-zA-Z0-9]{20,32}
* Parsing File /home/c0d31n/Desktop/access_log-test.txt
HITS: 169147
HITS: 169146
HITS: 169146
HITS: 169147
* Ok! ________ hits : 676586 worktime: 4.03816s
* new export file created: "./test.csv"
root#c0d3b0x:~/workspace/OpenBookMap/Release# cat test.csv
"1nDh0gV6eE3MzK0517aE6VIU0";"28/Mar/2011";"18813"
"215VIU1wBN2O2Fmd63MVmv6QTZy";"28/Mar/2011";"6272"
"36Pu0A2Wly3uYeIPZ4YPAuBy";"18/Mar/2011";"18816"
"36Pu0A2Wly3uYeIPZ4YPAuBy";"21/Mar/2011";"12544"
"36Pu0A2Wly3uYeIPZ4YPAuBy";"22/Mar/2011";"12544"
"36Pu0A2Wly3uYeIPZ4YPAuBy";"23/Mar/2011";"18816"
"9E1608JFGk2GZQ4ppe1Grtv2";"28/Mar/2011";"12544"
"pachCsiog05bpK0kDA3K2lhEY";"17/Mar/2011";"18029"
"pachCsiog05bpK0kDA3K2lhEY";"18/Mar/2011";"12544"
"pachCsiog05bpK0kDA3K2lhEY";"21/Mar/2011";"18816"
"pachCsiog05bpK0kDA3K2lhEY";"22/Mar/2011";"6272"
"pachCsiog05bpK0kDA3K2lhEY";"23/Mar/2011";"18816"
"pachCsiog05bpK0kDA3K2lhEY";"28/Mar/2011";"501760"

Pointer won't return with assigned address

I'm using Qt Creator 4.5 with GCC 4.3, and I have the following problem, which I'm not sure is related to Qt or to C++: I call a function with a char * as an input parameter. Inside that function I make a dynamic allocation and assign the address to the char *. The problem is that when the function returns, the pointer no longer points to this address.
bool FPSengine::putData (char CommandByte , int Index)
{
char *msgByte;
structSize=putDatagrams(CommandByte, Index, msgByte);
}
int FPSengine::putDatagrams (char CommandByte, int Index, char *msgByte)
{
int theSize;
switch ( CommandByte ) {
case (CHANGE_CONFIGURATION): {
theSize=sizeof(MsnConfigType);
msgByte=new char[theSize];
union MConfigUnion {
char cByte[sizeof(MsnConfigType)];
MsnConfigType m;
};
MConfigUnion * msnConfig=(MConfigUnion*)msgByte;
// ... do some assignments; I verify and everything is OK.
}
}
return theSize;
}
After putDatagrams() returns, msgByte in putData() contains a completely different address than the one assigned inside putDatagrams(). Why?
...
OK, thanks, I understand my mistake (a rookie mistake :( ). When you pass a pointer as an input parameter, you pass the address of your data but not the address of your pointer, so the function can't make your pointer point somewhere else: the parameter is actually a local copy, just like Index. The only way the data would have been returned successfully with a plain char * is by allocating the memory before the function call:
bool FPSengine::putData (char CommandByte , int Index)
{
char *msgByte;
msgByte=new char[sizeof(MsnConfigType)];
structSize=putDatagrams(CommandByte, Index, msgByte);
}
int FPSengine::putDatagrams (char CommandByte, int Index, char *msgByte)
{
int theSize;
switch ( CommandByte ) {
case (CHANGE_CONFIGURATION): {
theSize=sizeof(MsnConfigType);
union MConfigUnion {
char cByte[sizeof(MsnConfigType)];
MsnConfigType m;
};
MConfigUnion * msnConfig=(MConfigUnion*)msgByte;
// ... do some assignments; I verify and everything is OK.
}
}
return theSize;
}
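The distinction described above (the callee receives a copy of the pointer, so it can change the bytes the pointer refers to, but not the caller's pointer itself) can be checked with a small standalone sketch. The functions fillBuffer, reseatCopy and changeIndex are hypothetical and independent of FPSengine; std::memcpy stands in for the union-based assignments.

```cpp
#include <cstddef>
#include <cstring>

// Writes through the pointer it receives. The parameter is a copy, but it
// holds the same address as the caller's pointer, so the caller sees the bytes.
void fillBuffer(char* msgByte, std::size_t size) {
    const char pattern[] = "CFG";
    std::memcpy(msgByte, pattern, size < sizeof pattern ? size : sizeof pattern);
}

// Reassigns its own copy of the pointer, as the original putDatagrams() did.
// The caller's pointer keeps its old value. The block is freed here only so
// this sketch does not leak; in the original code that memory leaks.
void reseatCopy(char* msgByte) {
    msgByte = new char[16];
    delete[] msgByte;
}

// The same rule applies to an int parameter such as Index.
void changeIndex(int Index) {
    Index = 42;
    (void)Index;
}
```

Calling fillBuffer() on a buffer allocated by the caller changes its contents, while reseatCopy() and changeIndex() leave the caller's variables exactly as they were.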
There are two ways. The first is the C-style way, passing a pointer to the pointer:
int FPSengine::putDatagrams (char CommandByte, int Index, char **msgByte)
Note the second * for msgByte. Then inside of putDatagrams(), do:
*msgByte = new char[theSize];
In fact, anywhere in that function where you currently have msgByte, use *msgByte. When calling putDatagrams(), do:
structSize=putDatagrams(CommandByte, Index, &msgByte);
And the second way, since you're in C++, you could use pass-by-reference. Just change the signature of putDatagrams() to:
int FPSengine::putDatagrams (char CommandByte, int Index, char * &msgByte)
And you should be good. In this case, you shouldn't need to modify the caller or anything inside your putDatagrams() routine; only the declaration in the class definition has to be updated to match the new signature.
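Both signatures can be compared in a standalone sketch. The free functions allocViaPointer and allocViaReference are hypothetical stand-ins for putDatagrams(), and a fixed 8-byte payload replaces sizeof(MsnConfigType).

```cpp
#include <cstring>

// C style: the caller passes the address of its pointer (char**), and the
// callee assigns through it, so the caller's pointer is updated.
int allocViaPointer(char** msgByte) {
    const int theSize = 8;
    *msgByte = new char[theSize];
    std::memset(*msgByte, 'P', theSize);
    return theSize;
}

// C++ style: msgByte is a reference to the caller's pointer (char*&), so
// assigning to it assigns to the caller's variable directly.
int allocViaReference(char*& msgByte) {
    const int theSize = 8;
    msgByte = new char[theSize];
    std::memset(msgByte, 'R', theSize);
    return theSize;
}
```

The calls are allocViaPointer(&msgByte) and allocViaReference(msgByte) respectively. Either way the caller now owns the buffer and must release it with delete[]; returning a std::vector<char> or std::unique_ptr<char[]> instead would avoid the manual delete[].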
Well, yes. Everything in C++ is passed by value by default. The arguments in the call putDatagrams(a, b, c) are passed by value; you wouldn't expect assigning to Index inside the function to change the value of b at the call site. Your msgByte=new char[theSize]; just assigns to the local variable msgByte, overwriting the value that was passed in.
If you want to change a passed parameter so that the variable at the call site changes, you need to either pass by reference or (in this case) pass a "pointer to a pointer" (and dereference it once, so that you assign to the caller's actual pointer).

SQLite3: Insert BLOB with NULL characters in C++

I'm working on the development of a C++ API that uses custom-designed plugins to interface with different database engines using their APIs and specific SQL syntax.
Currently, I'm attempting to find a way of inserting BLOBs, but since the null character ('\0') terminates strings in C/C++, the BLOB becomes truncated when I construct the INSERT INTO query string. So far, I've worked with:
//...
char* sql;
void* blob;
int len;
//...
blob = some_blob_already_in_memory;
len = length_of_blob_already_known;
sql = sqlite3_malloc(2*len+1);
sql = sqlite3_mprintf("INSERT INTO table VALUES (%Q)", (char*)blob);
//...
I expect that, if it is at all possible to do it in the SQLite3 interactive console, it should be possible to construct the query string with properly escaped NULL characters. Maybe there's a way to do this with standard SQL which is also supported by SQLite SQL syntax?
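On the plain-SQL question: SQLite accepts BLOB literals written as X'...' with the bytes in hexadecimal (for example X'53514C697465'), and a hex string contains no '\0' bytes, so the query string survives C-string handling. This is a minimal sketch; toBlobLiteral is a hypothetical helper name, not part of the SQLite API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode len raw bytes (embedded '\0' allowed) as an
// SQLite BLOB literal of the form X'...'.
std::string toBlobLiteral(const void* blob, std::size_t len) {
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char* bytes = static_cast<const unsigned char*>(blob);
    std::string out = "X'";
    out.reserve(3 + 2 * len);  // "X'" + two hex digits per byte + "'"
    for (std::size_t i = 0; i < len; ++i) {
        out += hex[bytes[i] >> 4];    // high nibble
        out += hex[bytes[i] & 0x0F];  // low nibble
    }
    out += '\'';
    return out;
}
```

The result can be concatenated into the INSERT statement in place of the %Q argument. It doubles the size of the data, though, so binding the blob to a prepared statement is usually the better choice for large values.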
Surely someone must have faced the same situation before. I've googled and found some answers, but they were for other programming languages (Python).
Thank you in advance for your feedback.
Thank you all again for your feedback. This time I'm reporting how I solved the problem with the help of the suggestions provided here. Hopefully this will help others in the future.
As suggested by the first three posters, I used prepared statements, also because I was interested in getting the columns' data types, and a simple sqlite3_get_table() wouldn't do.
After preparing the SQL statement in the form of the following constant string:
INSERT INTO tabl VALUES(?,?,?,?);
what remains is binding the corresponding values. This is done by issuing one sqlite3_bind_blob() call per column. (I also resorted to sqlite3_bind_text() for other "simple" data types, because the API I'm working on can translate integers/doubles/etc. into a string.) So:
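#include <stdio.h>
#include <string.h>
#include <sqlite3.h>
int main(void) {
    /* blobvalue[i] points to the i-th blob, blobsize[i] is its length in bytes;
       a NULL pointer binds an SQL NULL. */
    const void* blobvalue[4] = { NULL, NULL, NULL, NULL };
    int blobsize[4] = { 0, 0, 0, 0 };
    const char* tail = NULL;
    const char* sql = "INSERT INTO tabl VALUES(?,?,?,?)";
    sqlite3_stmt* stmt = NULL;
    sqlite3* db = NULL;
    /* ... fill blobvalue[] and blobsize[] ... */
    if (sqlite3_open("sqlite.db", &db) != SQLITE_OK) {
        printf("Error message: %s\n", sqlite3_errmsg(db));
        sqlite3_close(db); /* the handle must be closed even if opening failed */
        return 1;
    }
    /* nByte may include the terminating '\0'; -1 would read up to it instead. */
    if (sqlite3_prepare_v2(db, sql, (int)strlen(sql) + 1, &stmt, &tail) != SQLITE_OK) {
        printf("Error message: %s\n", sqlite3_errmsg(db));
        sqlite3_close(db);
        return 1;
    }
    /* Parameters are numbered from 1. The explicit byte count means embedded
       '\0' bytes are stored; SQLITE_TRANSIENT makes SQLite copy the data. */
    for (int i = 0; i < 4; i++) {
        sqlite3_bind_blob(stmt, i + 1, blobvalue[i], blobsize[i], SQLITE_TRANSIENT);
    }
    if (sqlite3_step(stmt) != SQLITE_DONE) {
        printf("Error message: %s\n", sqlite3_errmsg(db));
    }
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}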
Note also that some functions (sqlite3_open_v2(), sqlite3_prepare_v2()) only exist in later SQLite versions (I suppose 3.5.x and later).
The SQLite table tabl in the file sqlite.db can be created with, for example:
CREATE TABLE tabl(a TEXT PRIMARY KEY, b TEXT, c TEXT, d TEXT);
You'll want to use this function with a prepared statement.
int sqlite3_bind_blob(sqlite3_stmt*, int, const void*, int n, void(*)(void*));
In C/C++, the standard way of dealing with embedded null characters in strings is to store either a pointer to the beginning of the string plus a length, or a pointer to the beginning of the string and one to its end.
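The pointer-plus-length idea can be seen with std::string, which stores its length separately and therefore keeps embedded '\0' bytes, whereas anything that relies on strlen() stops at the first one. A short sketch; both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Length as seen by C-string functions: counts bytes up to the first '\0'.
std::size_t cStringLength(const char* data) {
    return std::strlen(data);
}

// Building a std::string from a pointer and an explicit length copies every
// byte, including embedded '\0' characters.
std::string copyWithLength(const char* data, std::size_t len) {
    return std::string(data, len);
}
```

This is also why sqlite3_bind_blob() takes an explicit byte count n: SQLite copies n bytes instead of scanning for a terminator.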
You want to precompile the statement with sqlite3_prepare_v2() and then bind the blob using sqlite3_bind_blob(). Note that the statement you prepare will be INSERT INTO table VALUES (?).

Resources