Is there a limit for short strings where using the F() macro brings more RAM overhead then saving?
For (a contrived) example:
Serial.print(F("\n"));
Serial.print(F("Hi"));
Serial.print(F("there!"));
Serial.print(F("How do you doyou how?"));
Would any one of those be more efficient without the F()?
I imagine it uses some RAM to iterate over the string and copy it from PROGMEM to RAM. I guess the question is: how much? Also, is heap fragmentation a concern here?
I'm looking at this purely from SRAM-conserving perspective.
From a purely SRAM-conserving perspective all of your examples are identical in that no SRAM is used. At run-time some RAM is used, but only momentarily on the stack. Keep in mind that calling println() (w/o any parameters) uses some stack/RAM.
For a single character it will take up less space in flash if a char is passed into print or println. For example:
Serial.print('\n');
The char will be in flash (not static RAM).
Using
Serial.print(F("\n"));
will create a string in flash memory that is two bytes long (newline char + null terminator) and will additionally pass a pointer to that string to print which is probably two bytes long.
Additionally at runtime, using the F macro will result in two fetches ('\n' and the null terminator) from flash. While fetches from flash are fast, passing in a char results in zero fetches from flash, which is a tiny bit faster.
I don't think there is any minimum size of the string to be useful. If you look at how the outputting is implemented in Print.cpp:
size_t Print::print(const __FlashStringHelper *ifsh)
{
PGM_P p = reinterpret_cast<PGM_P>(ifsh);
size_t n = 0;
while (1) {
unsigned char c = pgm_read_byte(p++);
if (c == 0) break;
n += write(c);
}
return n;
}
You can see from there that only one byte of RAM is used at a time (plus a couple of variables), as it pulls the string from PROGMEM a byte at a time. These are all on the stack so there is no ongoing overhead.
I imagine it uses some RAM to iterate over the string and copy it from PROGMEM to RAM. I guess the question is: how much?
No, it doesn't as I showed above. It outputs a byte at a time. There is no copying (in bulk) of the string into RAM first.
Also, is heap fragmentation a concern here?
No, the code does not use the heap.
Related
I am using vloadn to load data and as a parameter I pass the range I want to read and it works, but I am wondering what's the behavior of vload4. If this might cause some unexpected issue or I am perfectly safe to do this. An example might be something like this:
__kernel void myKernel(__global float* data_ptr, int size)
{
float4 vec = vload4(0, data_ptr);
float sum = 0.f;
// data_ptr points to an array of 2 floats in global mem
if (size == 2) {
sum += vec.s1;
sum += vec.s0;
}
else if (size == 1) {
sum += vec.s0;
}
}
data_ptr is an array of 2 floats in global memory, but even though I am accessing only those 2 floats, I am loading 4 floats using vload4. The reason I am asking is that I want to use a single vloadn and decide afterwards how much of it I actually want to use and not to use vloadn based on size (e.g. for size==4 use vload4, for size==8 vload8 etc.
If it's still within data_ptr it will be fine; you don't have to use all the data you read. However, if you read past either end of the buffer that data_ptr points to you can have problems (memory read exception, for example, or some other device-dependent error). Note: Check the address alignment requirements for vload to see if you're allowed to read at any address or only multiple of with size.
I'm writing an OpenCL program that applies a convolution matrix on an image. Everything works fine if I store all pixel on an array image[height*width][4] (line 65,commented) (sorry, I speak Spanish, and I code mostly in Spanish). But, since the images I'm working with are really large, I need to allocate the memory dynamically. I execute the code, and I get a Segmentation fault error.
After some poor man's debugging, I found out the problem arises after executing the kernel and reading the output image back into the host, storing the data into the dynamically allocated array. I just can't access the data of the array without getting the error.
I think the problem is the way the clEnqueueReadImage function (line 316) writes the image data into the image array. This array was allocated dynamically, so it has no predefined "structure".
But I need a solution, and I can't find it, nor on my own or on Internet.
The C program and the OpenCL kernel are here:
https://gist.github.com/MigAGH/6dd0fddfa09f5aabe7eb0c2934e58cbe
Don't use pointers to pointers (unsigned char**). Use a regular pointer instead:
unsigned char* image = (unsigned char*)malloc(sizeof(unsigned char)*ancho*alto*4);
Then in the for loop:
for(i=0; i<ancho*alto; i++){
unsigned char* pixel = (unsigned char*)malloc(sizeof(unsigned char)*4);
fread (pixel, 4, 1, bmp_entrada);
image[i*4] = pixel[0];
image[i*4+1] = pixel[1];
image[i*4+2] = pixel[2];
image[i*4+3] = pixel[3];
free(pixel);
}
i've made this little program to test a little part of a bigger program.
int main()
{
char c[]="ddddddddddddd";
char *g= malloc(4*sizeof(char));
*g=NULL;
strcpy (g,c);
printf("Hello world %s!\n",g);
return 0;
}
I expected that the function would return "Hello World dddd" ,since the length of g is 4*sizeof(char), but it returns " Hello World ddddddddddddd ".Can you explain me Where I'm wrong ?
Don't do that, it's undefined behaviour.
The strcpy function will happily copy all those characters in c regardless of the size of g.
That's because it copies characters up to the first \0 in c. In this particular case it may corrupt your heap, or it may not, depending on the minimum size of things that get allocated in the heap (many have a "resolution" of sixteen bytes for example).
There are other functions you can use (though they're optional) if you want your code to be more robust, such as strncpy (provided you understand the limitations), or strcpy_s(), as detailed in Appendix K of the ISO C11 standard (and earlier iterations as well).
Or, if you can't use those for some reason, it's up to the developer to ensure they don't break the rules.
I'm trying to write an OpenCL implementation of memchr to help me learn how OpenCL works. What I'm planning to do is to assign each work item a chunk of memory to search. Then, inside each work item, it loops through the chunk searching for the character.
Especially if the buffer is large, I don't want the other threads to keep searching after an occurrence has already been found (assume there is only one occurrence of the character in any given buffer).
What I'm stuck on is how does a work item indicate, both to the host and other threads, when it has found the character?
Thanks,
One way you could do this is to use a global flag variable. You atomically set it to 1 when you find the value and other threads will check on that value when they are doing work.
For example:
__kernel test(__global int* buffer, __global volatile int* flag)
{
int tid = get_global_id(0);
int sx = get_global_size(0);
int i = tid;
while(buffer[i] != 8) //Whatever value we're trying to find.
{
int stop = atomic_add(&flag, 0); //Read the atomic value
if(stop)
break;
i = i + sx;
}
atomic_xchg(&flag, 1); //Set the atomic value
}
This might add more overhead than by just running the whole kernel (unless you are doing a lot of work on every iteration). In addition, this method won't work if each thread is just checking a single value in the array. Each thread must have multiple iterations of work.
Finally, I've seen instances where writing to an atomic variable doesn't immediately commit, so you need to check to see if this code will deadlock on your system because the write isn't committing.
Not used memcpy much but here's my code that doesn't work.
memcpy((PVOID)(enginebase+0x74C9D),(void *)0xEB,2);
(enginebase+0x74C9D) is a pointer location to the address of the bytes that I want to patch.
(void *)0xEB is the op code for the kind of jmp that I want.
Only problem is that this crashes the instant that the line tries to run, I don't know what I'm doing wrong, any incite?
The argument (void*)0xEB is saying to copy memory from address 0xEB; presumably you want something more like
unsigned char x = 0xEB;
memcpy((void*)(enginebase+0x74c9d), (void*)&x, 2);
in order to properly copy the value 0xEB to the destination. BTW, is 2 the right value to copy a single byte to program memory? Looks like it should be 1, since you're copying 1 byte. I'm also under the assumption that you can't just do
((char*)enginebase)[0x74c9d] = 0xEB;
for some reason? (I don't have any experience overwriting program memory intentionally)
memcpy() expect two pointers for the source and destination buffers. Your second argument is not a pointer but rather the data itself (it is the opcode of jnz, as you described it). If I understand correctly what you are trying to do, you should set an array with the opcode as its contetns, and provide memcpy() with the pointer to that array.
The program crashes b/c you try to reference a memory location out of your assigned space (address 0xEB).