QList crashes when size is large - qt

I am using a QList to store the data read from a SQL Table. The table has more than a million records. I need to get them in a list and then do some processing on the list.
QList<QVariantMap> list;
QString selectNewDB = QString("SELECT * FROM newDatabase.M106SRData");
QSqlQuery selectNewDBQuery = QSqlDatabase::database("CurrentDBConn").exec(selectNewDB);
while (selectNewDBQuery.next())
{
QSqlRecord selectRec = selectNewDBQuery.record();
QVariantMap varMap;
QString key;
QVariant value;
for (int i=0; i < selectRec.count(); ++i)
{
key = selectRec.fieldName(i);
value = selectRec.value(i);
varMap.insert(key, value);
}
list << varMap;
}
I get "qvector.h, line 534: Out of memory" error.
The program crashes when the list reaches 1197762 items. I tried using reserve() but it didn't help. Is QList limited to a specific size?

You've run out of memory: the C++ runtime has reported that it cannot allocate any more. It's not a problem with the Qt containers. They are indexed with int, so they are limited to at most 2^31-1 items, and you're nowhere near that.
At the very least:
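Use a QVector instead of a QList, as it can have lower overhead for the QVariantMap element.
Reserve the space if the query can report its size (QSqlQuery::size() returns -1 when the driver cannot tell): this avoids repeated reallocations and can almost halve the peak memory needed by the vector's own buffer!
Compile for a 64-bit target if you can.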
QVector<QVariantMap> list;
QString selectNewDB = QString("SELECT * FROM newDatabase.M106SRData");
QSqlQuery selectNewDBQuery = QSqlDatabase::database("CurrentDBConn").exec(selectNewDB);
auto const size = selectNewDBQuery.size();
if (size > 0) list.reserve(size);
while (selectNewDBQuery.next())
{
auto selectRec = selectNewDBQuery.record();
QVariantMap varMap;
for (int i=0; i < selectRec.count(); ++i)
{
auto const key = selectRec.fieldName(i);
auto const value = selectRec.value(i);
varMap.insert(key, value);
}
list.append(varMap);
}

You either don't have enough RAM or, more likely, are using a 32-bit Qt build, which cannot address more than 4 GB. Or maybe both. Size-wise, the container itself should be able to handle up to about 2 billion elements.
QList isn't helping either, as in your case it may store every element as a pointer and do an additional heap allocation for the actual variant map. So you end up with a sizeable additional heap allocation overhead.
And since the query result already holds a significant amount of data, it probably eats a decent amount of RAM itself.
Unless you have disabled the page file, running out of physical RAM should not on its own result in a crash: the system would just start paging, which ruins performance but keeps the program running. So you are likely hitting the memory limit of a 32-bit process, which may be as low as a mere 2 GB.
Aside from doing what Kuba suggested in his answer, you might want to split your query into smaller pieces: if possible, get the results with a few queries rather than one, and process them one batch at a time. This reduces the memory used by the query results, and the memory for each query can be freed once you are done with it.
There is also the option of saving RAM on QString, in case you have a lot of repeating strings. Because QString is implicitly shared, a bunch of identical strings can all use the same underlying data. You can take advantage of this by using a QSet to keep a collection of unique strings and to quickly check whether a string is already present. Then, instead of using the string from the query result, use the one from the set: all identical strings copied by value from the set will reuse the same string data. In contrast, your current approach uses n times the space for n duplicated strings.

Related

locks in OpenMP

Good day, everyone!
Not long ago, I managed to parallelize a recursive algorithm that searches for possible ways of combining some events. At the moment, the code is as follows:
//#include's
// function declarations
// declaring a global variable:
QVector<QVector<QVector<float>>> variant; (or "std::vector")
int main() {
// reads data from file
// data are converted and analyzed
// the variant variable containing the current best result is filled in (here - by pre-analysis)
#pragma omp parallel shared(variant)
#pragma omp master
// call the recursive algorithm that searches all variants:
PEREBOR(Tabl_1, a, i_a, ..., reс_depth);
return 0;
}
void PEREBOR(QVector<QVector<uint8_t>> Tabl_1, QVector<A_struct> a, uint8_t i_a, ..., uint8_t reс_depth)
{
// looking for the boundaries of the first cycle for some reasons
for (int i = quantity; i < another_quantity; i++) {
// the Tabl_1 is processed and modified to determine the number of steps in the subsequent for cycle
for (int k = 0; k < the_quantity_just_found; k++) {
if the recursion depth is not 1, we go down further: {
// add descent to the next recursion level to the call stack:
#pragma omp task
PEREBOR(Tabl_1_COPY, a, i_a, ..., reс_depth-1);
}
else (if we went down to the lowest level): {
if (condition fulfilled) // condition check - READ variant variable
variant = it_is_equal_to_that_,_to_that...;
else
continue;
}
}
}
}
At the moment this works really well: on six cores I get a speedup of more than 5.7 over the single-core version.
As you can see, with a sufficiently large number of threads there may be a failure caused by simultaneous reading and writing of the variant variable. I understand that it needs to be protected. At the moment, the only way out I see is to use lock functions. A critical section does not seem suitable, because variant is written in only one place in the code (at the lowest level of recursion), but read in many places.
So here is the question: if I use the following constructs:
omp_lock_t lock;
int main() {
...
omp_init_lock(&lock);
#pragma omp parallel shared(variant, lock)
...
}
...
else (if we went down to the lowest level): {
if (condition fulfilled) { // condition check - READ variant variable
omp_set_lock(&lock);
variant = it_is_equal_to_that_,_to_that...;
omp_unset_lock(&lock);
}
else
continue;
...
Will this lock protect the reading of the variable in all other places? Or will I need to manually check the lock status and pause the thread before reading elsewhere?
I would be very grateful for any help from the community!
In the OpenMP specification (section 1.4.1, The structure of the OpenMP memory model) you can read:
The OpenMP API provides a relaxed-consistency, shared-memory model. All OpenMP threads have access to a place to store and to retrieve variables, called the memory. In addition, each thread is allowed to have its own temporary view of the memory. The temporary view of memory for each thread is not a required part of the OpenMP memory model, but can represent any kind of intervening structure, such as machine registers, cache, or other local storage, between the thread and the memory. The temporary view of memory allows the thread to cache variables and thereby to avoid going to memory for every reference to a variable.
In practice this means that (as with any relaxed memory model) threads are guaranteed to have the same, consistent view of the values of shared variables only at well-defined points. In between such points, the temporary views may differ across threads.
Your code handles simultaneous writes to the same variable, but without additional measures there is no guarantee that another thread reads the correct value of the variable.
You have 3 options (note that each of these solutions not only handles simultaneous reads/writes but also provides a consistent view of the values of shared variables):
If your variable is of scalar type, the best solution is to use atomic operations. This is the fastest option, as atomic operations are typically supported by the hardware.
#pragma omp parallel
{
...
#pragma omp atomic read
tmp=variant;
....
#pragma omp atomic write
variant=new_value;
}
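Use the critical construct. This solution can be used if your variable is of a complex type (such as a class) and its read/write cannot be performed atomically. All unnamed critical constructs share the same lock, so the read and the write below exclude each other even though they sit in different places in the code. Note that this is much less efficient (slower) than an atomic operation.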
#pragma omp parallel
{
...
#pragma omp critical
tmp=variant;
....
#pragma omp critical
variant=new_value;
}
Use locks for each read/write of your variable. Your code is fine for the write, but you have to use the lock for the reads as well: omp_set_lock only blocks threads that also call omp_set_lock on the same lock, so a plain read elsewhere is not protected. This requires the most code, but in practice the result is the same as using the critical construct. Note that OpenMP implementations typically use locks to implement critical constructs.

OpenCL - Storing a large array in private memory

I have a large array of floats called source_array, with around 50,000 elements. I am currently trying to apply a collection of modifications to the array and evaluate the result. Basically, in pseudocode:
__kernel void doSomething (__global float *source_array, __global int *res, __global int *mod_value) { /* res: 0 or 1 per result */
// Modify values of source_array with mod_value;
// Evaluate the modified array.
}
So in the process I would need a variable to hold the modified array, because source_array should stay constant for all work items; if I modify it directly, it might interfere with other work items (not sure if I am right here).
The problem is that the array is too big for private memory, so I can't declare it in the kernel code. What should I do in this case?
I considered adding another parameter to the kernel to serve as a placeholder for the modified array, but again it would interfere with other work items.
Private "memory" on GPUs literally consists of registers, which generally are in short supply. So the __private address space in OpenCL is not suitable for this as I'm sure you've found.
Victor's answer is correct - if you really need temporary memory for each work item, you will need to create a (global) buffer object. If all work items need to independently mutate it, it will need a size of <WORK-ITEMS> * <BYTES-PER-ITEM> and each work-item will need to use its own slice of the buffer. If it's only temporary, you never need to copy it back to host memory.
However, this sounds like an access pattern that will work very inefficiently on GPUs. You will do much better if you decompose your problem differently. For example, you may be able to make whole work-groups coordinate work on some subrange of the array: copy the subrange into local (group-shared) memory, divide the work between the work items in the group, write the results back to global memory, then read the next subrange into local memory, and so on. Coordinating between work items in a group is much more efficient than each work item accessing a huge range of global memory. We can only help you with this algorithmic approach if you are more specific about the computation you are trying to perform.
Why not initialize this array in an OpenCL buffer backed by host memory? For example:
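#include <CL/cl.h>
#include <stdlib.h>

const size_t buffer_size = 50000 * sizeof(float);
/* malloc, new float[50000], or a static array such as {0.1f, 0.2f, ...} */
float *host_array_ptr = (float*)malloc(buffer_size);
/*
put your data into host_array_ptr here
*/
cl_int err_code;
/* CL_MEM_USE_HOST_PTR: the buffer is backed by host_array_ptr, which must stay valid while my_array is in use */
cl_mem my_array = clCreateBuffer(my_cl_context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, buffer_size, host_array_ptr, &err_code);
if (err_code != CL_SUCCESS) {
    /* handle the error */
}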
Then you can pass my_array to your kernel as a __global float * argument, with clSetKernelArg(kernel, index, sizeof(cl_mem), &my_array).

Why the size of the QList object is 4 bytes?

When I try to get the size of a QList object using sizeof(), it gives me 4 bytes.
I changed the number of items in the list, and it still gave 4 bytes.
I changed the type of the items in the list, and again it gave 4 bytes.
Why the change in the number or type of items does not affect the object size?
To get the number of items in the list, use the QList::size() not sizeof(). Something like this:
QList<MyObject> list;
list.append(object1);
list.append(object2);
int size = list.size();
sizeof() returns the size of the QList class, which, as is often the case for Qt classes, has only one member (in Qt 5): a pointer to a private structure. Assuming you are on a 32-bit OS, it takes 4 bytes to store an address, which is why sizeof(list) returns 4.
sizeof provides you with the compile-time size of the object itself, not of the data it may point to.
QList is a class containing a pointer to the data structure that actually holds a pointer to the heap-allocated data; of all this stuff, sizeof just knows (and cares) about that first bit - that QList contains just a pointer, hence it's 4 bytes big.
It's the exact same reason why if you do
int n;
std::cin>>n;
int *foo = new int[n];
std::cout<<sizeof(foo);
delete[] foo;
It'll always print 4 (or whatever the pointer size is on your machine), regardless of what n is set to.

Arduino Zero - Region Ram overflowed with stack

I have some code that uses nested Structs to store device parameters see below:
This is using an Arduino Zero (Atmel SAMD21).
The code declares storage for up to 3 networks, each network with 64 devices.
I would like to use 5 networks however when I increase the networks to 4 the code will not compile.
I get region RAM overflowed with stack / RAM overflowed by 4432 bytes.
I understand that this is taking more RAM than I have? I am looking to see whether there is a different method that achieves the same thing but fits.
struct device {
int stat;
bool changed;
char data[51];
char state[51];
char atime[14];
char btime[14];
};
struct outputs {
device fitting[64];
};
struct storage {
int deviceid =0;
int addstore =0;
bool set;
bool run_events = false;
char authkey[10];
outputs network[3];
} ;
storage data_store;
Well, the usual approaches are:
Consider whether all or any of the data is actually read-only and can thus be made const (which should move it to read-only memory; if that fails, you can usually force it by adding compiler-specific magic).
Figure out ways of representing the data using fewer bits. For instance, using 14 bytes for each of the two timestamps might seem excessive; switching these to 32-bit timestamps and generating the strings when needed would save around 70% of that space.
If there are duplicates, then perhaps each storage doesn't need three unique outputs, but can instead store pointers into a shared "pool" of unique configurations.
If not all 64 fittings are used, that array could also be refactored into having non-constant length.
It's hard to be more specific since I don't know your data or application well enough.
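Your struct is taking too much space. That's all. On the SAMD21 (a 32-bit ARM Cortex-M0+), an int is 4 bytes and chars and bools are 1 byte each, so your device struct takes 4 + 1 + 51 + 51 + 14 + 14 = 135 bytes, padded to 136 for alignment. Your outputs struct then takes 64 × 136 = 8704 bytes (8.5 KB), and four of them need 34,816 bytes. Your unit has 32 KB of RAM...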

How to free resources of QString when use it inside std::vector

I have a structure "rs" for every record of my dataset.
All records are in a vector "r".
My record count is in “rc”.
....
struct rs{
uint ip_i;//index
QString ip_addr;//ip address
};
std::vector <rs> r;//rows ordered by key
int rc;//row count
....
I would like to control this memory usage.
That's why I don't want to use r.insert and r.erase.
When I need to insert a record, I will:
Increase size of r by r.resize(..);r.shrink_to_fit() (if needed).
Shift elements of r to the right (if needed) by std::rotate.
Put new values: r[i].ip_i=...;r[i].ip_addr=...
When I need to delete a record, I will:
Shift elements of r to the left (if needed) by std::rotate.
For example, std::rotate(r.begin()+i,r.begin()+i+1,r.begin()+rc);.
Free resources of r[rc].ip_addr.
How do I free the resources of QString r[rc].ip_addr?
I've tried r[i].ip_addr.~QString() and got a runtime error.
Make r.resize() (if needed).
I don't want to lose memory because of QString copies that remain after deleting rows.
How can I control them?
Thanks.
QString handles all memory control for you. Just treat it as a regular object and you'll be fine. std::vector is OO-aware, so it will call destructors when freeing elements.
The only thing you should not do is use low-level memory manipulation routines like memcpy or memset. std::vector operations are safe.
If you really want to free a string for a record that is within the [0..size-1] range (that is, you do not actually decrease the size with resize() after moving elements), then calling r[i].ip_addr.clear() would suffice. Or better yet, introduce a clear() method in your structure that calls ip_addr.clear() (in case you add more fields that need to be cleared). But you can only call it on a valid record, of course, not one beyond your actual vector size (whatever the underlying capacity is, that's just an implementation detail). Calling r[i].ip_addr.~QString() by hand is what causes your runtime error: the string then gets destroyed twice, once by you and once more when the vector later assigns to or destroys that element.
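On a side note, it probably makes sense to use QList instead since you're using Qt anyway, unless you have specific reasons to use std::vector. As far as memory control goes, QList offers a reserve() method (std::vector has one too) that lets you reserve exactly as many elements as you need. Reserve once for the expected total rather than one extra slot before every insert, because calling reserve(size() + 1) per insertion can force a reallocation each time. Inserting then would look like
list.reserve(expectedTotal); // once, up front (expectedTotal is your own estimate)
list.insert(i, record);      // record: the rs value to insert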
