Arduino Zero - Region Ram overflowed with stack - arduino

I have some code that uses nested Structs to store device parameters see below:
This is using an Ardunio Zero ( Atmel SAMD21)
The declares Storeage with up to 3 networks each network with 64 devices.
I would like to use 5 networks however when I increase the networks to 4 the code will not compile.
I get region RAM overflowed with stack / RAM overflowed by 4432 bytes.
I understand that this is taking more ram then I have? I am looking to see if there is a solution using a different method to achieve the same thing but get it to fit?
struct device {
int stat;
bool changed;
char data[51];
char state[51];
char atime[14];
char btime[14];
};
struct outputs {
device fitting[64];
};
struct storage {
int deviceid =0;
int addstore =0;
bool set;
bool run_events = false;
char authkey[10];
outputs network[3];
} ;
storage data_store;

Well, the usual approches are:
Consider if all or any of the data is actually read-only, and thus can be made const (which should move it to read-only memory, if that fails you can usually force it by adding compiler-specific magic).
Figure out means of representing the data using fewer bits. For instance using 14 bytes for each of three timestamps might seem excessive; switching these to 32-bit timestamps and generating the strings when needed would save around 70%.
If there are duplicates, then perhaps each storage doesn't need three unique outputs, but can instead store pointers into a shared "pool" of unique configurations.
If not all 64 fittings are used, that array could also be refactored into having non-constant length.
It's hard to be more specific since I don't know your data or application well enough.

Your struct is taking too much place. That's all. Assuming chars, ints and bools are internally 1 byte each, your device struct takes 132 bytes. Then, your outputs struct takes 8448 bytes or 8.25Kb. Your unit has 32Kb of RAM...

Related

OpenCL - Storing a large array in private memory

I have a large array of float called source_array with the size of around 50.000. I am current trying to implement a collections of modifications on the array and evaluate it. Basically in pseudo code:
__kernel void doSomething (__global float *source_array, __global boolean *res. __global int *mod_value) {
// Modify values of source_array with mod_value;
// Evaluate the modified array.
}
So in the process I would need to have a variable to hold modified array, because source_array should be a constant for all work item, if i modify it directly it might interfere with another work item (not sure if I am right here).
The problem is the array is too big for private memory therefore I can't initialize in kernel code. What should I do in this case ?
I considered putting another parameter into the method, serves as place holder for modified array, but again it would intefere with another work items.
Private "memory" on GPUs literally consists of registers, which generally are in short supply. So the __private address space in OpenCL is not suitable for this as I'm sure you've found.
Victor's answer is correct - if you really need temporary memory for each work item, you will need to create a (global) buffer object. If all work items need to independently mutate it, it will need a size of <WORK-ITEMS> * <BYTES-PER-ITEM> and each work-item will need to use its own slice of the buffer. If it's only temporary, you never need to copy it back to host memory.
However, this sounds like an access pattern that will work very inefficiently on GPUs. You will do much better if you decompose your problem differently. For example, you may be able to make whole work-groups coordinate work on some subrange of the array - copy the subrange into local (group-shared) memory, the work is divided between the work items in the group, and the results are written back to global memory, and the next subrange is read to local, etc. Coordinating between work-items in a group is much more efficient than each work item accessing a huge range of global memory We can only help you with this algorithmic approach if you are more specific about the computation you are trying to perform.
Why not to initialize this array in OpenCL host memory buffer. I.e.
const size_t buffer_size = 50000 * sizeof(float);
/* cl_malloc, malloc or new float [50000] or = {0.1f,0.2f,...} */
float *host_array_ptr = (float*)cl_malloc(buffer_size);
/*
put your data into host_array_ptr hear
*/
cl_int err_code;
cl_mem my_array = clCreateBuffer( my_cl_context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, buffer_size, host_array_ptr, &err_code );
Then you can use this cl_mem my_array in OpenCL kernel
Find out more

Why the size of the QList object is 4 bytes?

When trying to get the size of a QList object using sizeof(), function it gave me 4 bytes.
I tried to change the number of items in the list, it also gave 4 bytes?
I tried to change the type of the items in the list, again it gave 4 bytes?
Why the change in the number or type of items does not affect the object size?
To get the number of items in the list, use the QList::size() not sizeof(). Something like this:
QList<MyObject> list;
list.append(object1);
list.append(object2);
int size = list.size();
sizeof() returns the size of the QList class, which is, as is often the case for Qt classes, made of only one member: a pointer to a private structure. Assuming you are on a 32bit OS, it takes 4 bytes to store an address, hence why you sizeof(list) returns 4.
sizeof provides you with the compile-time size of the object itself, not of the data it may point to.
QList is a class containing a pointer to the data structure that actually holds a pointer to the heap-allocated data; of all this stuff, sizeof just knows (and cares) about that first bit - that QList contains just a pointer, hence it's 4 bytes big.
It's the exact same reason why if you do
int n;
std::cin>>n;
int *foo = new int[n];
std::cout<<sizeof(foo);
delete[] foo;
It'll always print 4 (or whatever the pointer size is on your machine), regardless of what n is set to.

Pointer to a register on a 16 bit controller

How do you declare a pointer on a 16 bit Renesas RL78 microcontroller using IAR's EWB RL78 compiler to a register which has a 20 bit address?
Ex:
static int *ptr = (int *)0xF1000;
The above does not work because pointers are 16 bit addresses.
If the register in question is an on-chip peripheral, then it is likely that your toolchain already includes a processor header with all registers declared, in which case you should use that. If for some reason you cannot or do not wish to do that, then you could at least look at that to see how it declares such registers.
In any event you should at least declare the address volatile since it is not a regular memory location and may change beyond the control and knowledge of your code as part of the normal peripheral behaviour. Moreover you should use explicit sized data types and it is unlikely that this register is signed.
#include <stdint.h>
...
static volatile uint16_t* ptr = (uint16_t*)0xF1000u ;
Added following clarification of target architecture:
The IAR RL78 compiler supports two data models - near and far. From the IAR compiler manual:
● The Near data model can access data in the highest 64 Kbytes of data
memory
● The Far data model can address data in the entire 1 Mbytes of
data memory.
The near model is the default. The far model may be set using the compiler option: --data_model=far; this will globally change the pointer type to allow 20 bit addressing (pointers are 3 bytes long in this case).
Even without specifying the data model globally it is possible to override the default pointer type by explicitly specifying the pointer type using the keywords __near and __far. So in the example in the question the correct declaration would be:
static volatile uint16_t __far* ptr = (uint16_t*)0xF1000u ;
Note the position of the __far keyword is critical. Its position can be used to declare a pointer to far memory, or a pointer in far memory (or you can even declare both to and in far memory).
On an RL78, 0xF1000 in fact refers to the start of data flash rather then a register as stated in the question. Typically a pointer to a register would not be subject to alteration (which would mean it referred to a different register), so might reasonably be declared const:
static volatile uint16_t __far* const ptr = (uint16_t*)0xF1000u ;
Similarly to __far the position of const is critical to the semantics. The above prevents ptr from being modified but allows what ptr refers to to be modified. Being flash memory, this may not always be desirable or possible, so it is possible that it could reasonably be declared a const pointer to a const value.
Note that for RL78 Special Function Registers (SFR) the IAR compiler has a keyword __sfr specifically for addressing registers in the area 0xFFF00-0xFFFFF:
Example:
#pragma location=0xFFF20
__no_init volatile uint8_t __sfr PORT1; // PORT1 is located at address 0xFFF20
Alternative syntax using IAR specfic compiler extension:
__no_init volatile uint8_t __sfr PORT1 # 0xFFF20 ;

QList crashes when size is large

I am using a QList to store the data read from a SQL Table. The table has more than a million records. I need to get them in a list and then do some processing on the list.
QList<QVariantMap> list;
QString selectNewDB = QString("SELECT * FROM newDatabase.M106SRData");
QSqlQuery selectNewDBQuery = QSqlDatabase::database("CurrentDBConn").exec(selectNewDB);
while (selectNewDBQuery.next())
{
QSqlRecord selectRec = selectNewDBQuery.record();
QVariantMap varMap;
QString key;
QVariant value;
for (int i=0; i < selectRec.count(); ++i)
{
key = selectRec.fieldName(i);
value = selectRec.value(i);
varMap.insert(key, value);
}
list << varMap;
}
I get "qvector.h, line 534: Out of memory" error.
The program crashes when the list reaches the size of <1197762 items>. I tried using reserve() but it didn't work. Is QList limited to a specific size?
You've ran out of memory because the C++ runtime has reported that it cannot allocate any more memory. It's not a problem with Qt containers. The containers are limited to 2^31-1 items due to the size of int the use for the index. You're nowhere near that.
At the very least:
Use a QVector instead of QList as it has much lower overhead for the QVariantMap element.
Attempt to reserve the space if the query allows it: this will almost halve the memory requirements!
Compile for a 64 bit target if you can.
QVector<QVariantMap> list;
QString selectNewDB = QString("SELECT * FROM newDatabase.M106SRData");
QSqlQuery selectNewDBQuery = QSqlDatabase::database("CurrentDBConn").exec(selectNewDB);
auto const size = selectNewDBQuery.size();
if (size > 0) list.reserve(size);
while (selectNewDBQuery.next())
{
auto selectRec = selectNewDBQuery.record();
QVariantMap varMap;
for (int i=0; i < selectRec.count(); ++i)
{
auto const key = selectRec.fieldName(i);
auto const value = selectRec.value(i);
varMap.insert(key, value);
}
list.append(varMap);
}
You either don't have enough ram, or more likely are using a 32bit Qt build, which cannot utilize more than 4 GB of ram. Or maybe both. Size wise the container itself should be able to handle more than 2 billion elements.
QList ain't helping either, as in your case it will likely store every element as a pointer and do an additional heap allocation for the actual variant map. So you end up with a sizeable additional heap allocation overhead.
And since the query already contains a significant amount of data, it probably eats a decent amount of ram itself.
Unless you have disabled pagefile, running out of ram on its own should not result in a crash, as it would just start paging and ruin performance, but keep running, so you are likely hitting the memory limit for a 32 bit process, which may be as low as a mere 2 GB.
Aside from doing the things Kuba suggested in his answer, you might want to split your query into smaller pieces, and get the results in a few queries rather than one if possible, and process them one at a time, reducing the memory used by the query results and freeing the memory for a query once you are done with it.
There is also the option of saving on RAM from QString, in case you have a lot of repeating strings. As it is implicitly shared, you can have a bunch of identical strings that all use the same underlying data. You can take advantage of this, by using a QSet to keep a collection of unique strings and a quick check if a string is already present. Then instead of using the string from the query result, use the one from the set. All identical strings copied by value from the set will reuse the same string data. In contrast, your current approach will use n amount of space for every n duplicated strings.

Sharing Pointers Between Multiple Forked Processes

If I want to share something like a char **keys array between fork()'d processes using shm_open and mmap can I just stick a pointer to keys into a shared memory segment or do I have to copy all the data in keys into the shared memory segment?
All data you want to share has to be in the shared segment. This means that both the pointers and the strings have to be in the shared memory.
Sharing something which includes pointers can be cumbersome. This is because mmap doesn't guarantee that a given mapping will end up in the required address.
You can still do this, by two methods. First, you can try your luck with mmap and hope that the dynamic linker doesn't load something at your preferred address.
Second method is to use relative pointers. Inside a pointer, instead of storing a pointer to a string, you store the difference between the address of the pointer and the address of the string. Like so:
char **keys= mmap(NULL, ...);
char *keydata= (char*) keys + npointers * sizeof(char*);
strcpy(keydata, firstring);
keys[0]= (char*) (keydata - (char*) &keys[0]);
keydata+= strlen(firststring)+1;
When you want to access the string from the other process, you do the reverse:
char **keys= mmap(NULL, ...);
char *str= (char*) (&keys[0]) + (ptrdiff_t) keys[0];
It's a little cumbersome but it works regardless of what mmap returns.

Resources