OpenCL casting when accessing array - opencl

I was reading Apple's OpenCL reduction example, and noticed there's a macro for accessing array elements:
#define LOAD_GLOBAL_F1(s, i) \
((__global const float*)(s))[(size_t)(i)]
It was used like this:
float a = LOAD_GLOBAL_F1(input, i);
where input is of type __global const float * and i is size_t.
Why is float a = input[i] not used here? Thanks!

In this example the code code have been written not using the macros as you suggest. Why the macros are used is anyone's guess: code reused from somewhere else; a C based kernel test framework; the code author feeling it improved readability; etc., etc.

Related

Frama-c : Trouble understanding WP memory models

I'm looking for WP options/model that could allow me to prove basic C memory manipulations like :
memcpy : I've tried to prove this simple code :
struct header_src{
char t1;
char t2;
char t3;
char t4;
};
struct header_dest{
short t1;
short t2;
};
/*# requires 0<=n<=UINT_MAX;
# requires \valid(dest);
# requires \valid_read(src);
# assigns (dest)[0..n-1] \from (src)[0..n-1];
# assigns \result \from dest;
# ensures dest[0..n] == src[0..n];
# ensures \result == dest;
*/
void* Frama_C_memcpy(char *dest, const char *src, uint32_t n);
int main(void)
{
struct header_src p_header_src;
struct header_dest p_header_dest;
p_header_src.t1 = 'e';
p_header_src.t2 = 'b';
p_header_src.t3 = 'c';
p_header_src.t4 = 'd';
p_header_dest.t1 = 0x0000;
p_header_dest.t2 = 0x0000;
//# assert \valid(&p_header_dest);
Frama_C_memcpy((char*)&p_header_dest, (char*)&p_header_src, sizeof(struct header_src));
//# assert p_header_dest.t1 == 0x6265;
//# assert p_header_dest.t2 == 0x6463;
}
but the two last assert weren't verified by WP (with default prover Alt-Ergo). It can be proved thanks to Value analysis, but I mostly want to be able to prove the code not using abstract interpretation.
Cast pointer to int : Since I'm programming embedded code, I want to be able to specify something like:
#define MEMORY_ADDR 0x08000000
#define SOME_SIZE 10
struct some_struct {
uint8_t field1[SOME_SIZE];
uint32_t field2[SOME_SIZE];
}
// [...]
// some function body {
struct some_struct *p = (some_struct*)MEMORY_ADDR;
if(p == NULL) {
// Handle error
} else {
// Do something
}
// } body end
I've looked a little bit at WP's documentation and it seems that the version of frama-c that I use (Magnesium-20151002) has several memory model (Hoare, Typed , +cast, +ref, ...) but none of the given example were proved with any of the model above. It is explicitly said in the documentation that Typed model does not handle pointer-to-int casts. I've a lot of trouble to understand what's really going on under the hood with each wp-model. It would really help me if I was able to verify at least post-conditions of the memcpy function. Plus, I have seen this issue about void pointer that apparently are not very well handled by WP at least in the Magnesium version. I didn't tried another version of frama-c yet, but I think that newer version handle void pointer in a better way.
Thank you very much in advance for your suggestions !
memcpy
Reasoning about the result of memcpy (or Frama_C_memcpy) is out of range of the current WP plugin. The only memory model that would work in your case is Bytes (page 13 of the manual for Chlorine), but it is not implemented.
Independently, please note that your postcondition from Frama_C_memcpy is not what you want here. You are asserting the equality of the sets dest[0..n] and src[0..n]. First, you should stop at n-1. Second, and more importantly, this is far too weak, and is in fact not sufficient to prove the two assertions in the caller. What you want is a quantification on all bytes. See e.g. the predicate memcmp in Frama-C's stdlib, or the variant \forall int i; 0 <= i < n -> dest[i] == src[i];
By the way, this postcondition holds only if dest and src are properly separated, which your function does not require. Otherwise, you should write dest[i] == \at (src[i], Pre). And your requires are also too weak for another reason, as you only require the first character to be valid, not the n first ones.
Cast pointer to int
Basically, all current models implemented in WP are unable to reason on codes in which the memory is accessed with multiple incompatible types (through the use of unions or pointer casts). In some cases, such as Typed, the cast is detected syntactically, and a warning is issued to warn that the code cannot be analyzed. The Typed+Cast model is a variant of Typed in which some casts are accepted. The analysis is correct only if the pointer is re-cast into its original type before being used. The idea is to allow using some libc functions that require a cast to void*, but not much more.
Your example is again a bit different, because it is likely that MEMORY_ADDR is always addressed with type some_stuct. Would it be acceptable to change the code slightly, and change your function as taking a pointer to this type? This way, you would "hide" the cast to MEMORY_ADDR inside a function that would remain unproven.
I tried this example in the latest version of Frama-C (of course the format is modified a little bit).
for the memcpy case
Assertion 2 fails but assertion 3 is successfully proved (basically because the failure of assertion 2 leads to a False assumption, which proves everything).
So in fact both assertion cannot be proved, same as your problem.
This conclusion is sound because the memory models used in the wp plugin (as far as I know) has no assumption on the relation between fields in a struct, i.e. in header_src the first two fields are 8 bit chars, but they may not be nestedly organized in the physical memory like char[2]. Instead, there may be paddings between them (refer to wiki for detailed description). So when you try to copy bits in such a struct to another struct, Frama-C becomes completely confused and has no idea what you are doing.
As far as I am concerned, Frama-C does not support any approach to precisely control the memory layout, e.g. gcc's PACKED which forces the compiler to remove paddings.
I am facing the same problem, and the (not elegant at all) solution is, use arrays instead. Arrays are always nested, so if you try to copy a char[4] to a short[2], I think the assertion can be proved.
for the Cast pointer to int case
With memory model Typed+cast, the current version I am using (Chlorine-20180501) supports casting between pointers and uint64_t. You may want to try this version.
Moreover, it is strongly suggested to call Z3 and CVC4 through why3, whose performance is certainly better than Alt-Ergo.

OpenCL Programm freezes everytime private variables used

I got a kernel method, simplified looks like this.
__kernel void calculate (__global float *a, __global float *b, __global int *res) {
int workItem = get_global_id(0); // Syntax may not right, but you get the idea
int found = 0;
for (int i = 0; i<100000000; i++) {
float c = a[i]*3;
float d = b[i]*2;
if (c<d) {
found++;
}
}
res[workItem]= found;
}
So, nothing much but simple calculation and a very big loop, the problem is the programm freezes all the time when i run this code. I have to force reset the computer everytime this happens.
But if i make some changes, like this
if (true) {
found++;
}
Or
if (1<2) {
found++
}
Then the programm works like a charm, and very fast! So i wonder if is there any thing wrong with variables c and d ? I tried to use things like
__private float c= ..;
__private float d= ..;
It didnt work either.
I read the return code of every step while creating programm and kernel, so its not the problem.
What did I do wrong here ?
You have several issues here.
You are running a giant loop, presumably with a single work item, which is exactly what you want to avoid in OpenCL, as the entire concept of parallelism is lost. OpenCL will not magically make your code fast if you don't use it correctly.
Even if you are using multiple work items, you are not making use of the value you retrieve from get_global_id() except for output. But, as they're all using the same input, you'll get the same output for every single work item!
Work items and their associated global IDs are intended to allow you to partition your processing into discrete units, rather than one big monolithic loop. I suggest you look at some tutorials like this one to understand the concept better. Don't start writing your own code until you understand his.
As for why your program freezes your PC, I can only speculate without seeing your host code. Perhaps you are getting buffer overrun?

OpenCL Image reading not working on dynamically allocated array

I'm writing an OpenCL program that applies a convolution matrix on an image. Everything works fine if I store all pixel on an array image[height*width][4] (line 65,commented) (sorry, I speak Spanish, and I code mostly in Spanish). But, since the images I'm working with are really large, I need to allocate the memory dynamically. I execute the code, and I get a Segmentation fault error.
After some poor man's debugging, I found out the problem arises after executing the kernel and reading the output image back into the host, storing the data into the dynamically allocated array. I just can't access the data of the array without getting the error.
I think the problem is the way the clEnqueueReadImage function (line 316) writes the image data into the image array. This array was allocated dynamically, so it has no predefined "structure".
But I need a solution, and I can't find it, nor on my own or on Internet.
The C program and the OpenCL kernel are here:
https://gist.github.com/MigAGH/6dd0fddfa09f5aabe7eb0c2934e58cbe
Don't use pointers to pointers (unsigned char**). Use a regular pointer instead:
unsigned char* image = (unsigned char*)malloc(sizeof(unsigned char)*ancho*alto*4);
Then in the for loop:
for(i=0; i<ancho*alto; i++){
unsigned char* pixel = (unsigned char*)malloc(sizeof(unsigned char)*4);
fread (pixel, 4, 1, bmp_entrada);
image[i*4] = pixel[0];
image[i*4+1] = pixel[1];
image[i*4+2] = pixel[2];
image[i*4+3] = pixel[3];
free(pixel);
}

does Qt implement _countof or equivalent?

I really like using the _countof() macro in VS and I'm wondering if there is an OS-generic implementation of this in Qt.
For those unaware, _countof() gives you the number of elements in the array. so,
wchar_t buf[256];
_countof(buf) => 256 (characters)
sizeof(buf) => 512 (bytes)
It's very nice for use with, say, unicode strings, where it gives you character count.
I'm hoping Qt has a generic version.
_countof is probably defined like this:
#define _countof(arr) (sizeof(arr) / sizeof((arr)[0]))
You can use a definition like this with any compiler and OS.
If there is no such macro provided by Qt you can simply define a custom one yourself in one of your header files.
sth's code will work fine, but won't detect when you're trying to get the size of a pointer rather than an array. The MS solution does this (as danielweberdlc says), but it's possible to have this as a standard solution for C++:
#if defined(Q_OS_WIN)
#define ARRAYLENGTH(x) _countof(x)
#else // !Q_OS_WIN
template< typename T, std::size_t N >
inline std::size_t ARRAYLENGTH(T(&)[N]) { return N; }
#endif // !Q_OS_WIN
A more detailed description of this solution is given here.

How to reverse a QList?

I see qCopy, and qCopybackward but neither seems to let me make a copy in reverse order. qCopybackward only copies it in reverse order, but keeps the darn elements in the same order! All I want to do is return a copy of the list in reverse order. There has to be a function for that, right?
If you don't like the QTL, just use the STL. They might not have a Qt-ish API, but the STL API is rock-stable :) That said, qCopyBackward is just std::copy_backward, so at least they're consistent.
Answering your question:
template <typename T>
QList<T> reversed( const QList<T> & in ) {
QList<T> result;
result.reserve( in.size() ); // reserve is new in Qt 4.7
std::reverse_copy( in.begin(), in.end(), std::back_inserter( result ) );
return result;
}
EDIT 2015-07-21: Obviously (or maybe not), if you want a one-liner (and people seem to prefer that, looking at the relative upvotes of different answers after five years) and you have a non-const list the above collapses to
std::reverse(list.begin(), list.end());
But I guess the index fiddling stuff is better for job security :)
Reverse your QList with a single line:
for(int k = 0; k < (list.size()/2); k++) list.swap(k,list.size()-(1+k));
[Rewrite from original]
It's not clear if OP wants to know "How [do I] reverse a QList?" or actually wants a reversed copy. User mmutz gave the correct answer for a reversed copy, but if you just want to reverse the QList in place, there's this:
#include <algorithm>
And then
std::reverse(list.begin(), list.end());
Or in C++11:
std::reverse(std::begin(list), std::end(list));
The beauty of the C++ standard library (and templates in general) is that the algorithms and containers are separate. At first it may seem annoying that the standard containers (and to a lesser extent the Qt containers) don't have convenience functions like list.reverse(), but consider the alternatives: Which is more elegant: Provide reverse() methods for all containers, or define a standard interface for all containers that allow bidirectional iteration and provide one reverse() implementation that works for all containers that support bidirectional iteration?
To illustrate why this is an elegant approach, consider the answers to some similar questions:
"How do you reverse a std::vector<int>?":
std::reverse(std::begin(vec), std::end(vec));
"How do you reverse a std::deque<int>?":
std::reverse(std::begin(deq), std::end(deq));
What about portions of the container?
"How do you reverse the first seven elements of a QList?": Even if the QList authors had given us a convenience .reverse() method, they probably wouldn't have given us this functionality, but here it is:
if (list.size() >= 7) {
std::reverse(std::begin(list), std::next(std::begin(list), 7));
}
But it gets better: Because the iterator interface is the same as C pointer syntax, and because C++11 added the free std::begin() and std::end functions, you can do these:
"How do you reverse an array float x[10]?":
std::reverse(std::begin(x), std::end(x));
or pre C++11:
std::reverse(x, x + sizeof(x) / sizeof(x[0]));
(That is the ugliness that std::end() hides for us.)
Let's go on:
"How do you reverse a buffer float* x of size n?":
std::reverse(x, x + n);
"How do you reverse a null-terminated string char* s?":
std::reverse(s, s + strlen(s));
"How do you reverse a not-necessarily-null-terminated string char* s in a buffer of size n?":
std::reverse(s, std::find(s, s + n, '\0'));
Note that std::reverse uses swap() so even this will perform pretty much as well as it possibly could:
QList<QList<int> > bigListOfBigLists;
....
std::reverse(std::begin(bigListOfBigLists), std::end(bigListOfBigLists));
Also note that these should all perform as well as a hand-written loop since, when possible, the compiler will turn these into pointer arithmetic. Also, you can't cleanly write a reusable, generic, high-performance reverse function like this C.
#Marc Jentsch's answer is good. And if you want to get an additional 30% performance boost you can change his one-liner to:
for(int k=0, s=list.size(), max=(s/2); k<max; k++) list.swap(k,s-(1+k));
One a ThinkPad W520 with a QList of 10 million QTimers I got these numbers:
reversing list stack overflow took 194 ms
reversing list stack overflow with max and size took 136 ms
The boost is a result of
the expression (list.size()/2) being calculated only once when initializing the loop and not after every step
the expression list.size() in swap() is called only once when initializing the loop and not after every step
You can use the Java style iterator. Complete example here (http://doc.qt.digia.com/3.2/collection.html). Look for the word "reverse".
QList<int> list; // initial list
list << 1;
list << 2;
list << 3;
QList<int> rlist; // reverse list+
QListIterator<int> it(list);
while (it.hasPrevious()) {
rlist << it.previous();
}
Reversing a QList is going to be O(n) however you do it, since QList isn't guaranteed to have its data stored contiguously in memory (unlike QVector). You might consider just traversing the list in backwards order where you need to, or use something like a QStack which lets you retrieve the elements in the opposite order they were added.
For standard library lists it would look like this
std::list<X> result;
std::copy(list.rbegin(), list.rend(), std::back_inserter(result));
Unfortunately, Qt doesn't have rbegin and rend functions that return reverse iterators (the ones that go from the end of the container to its begnning). You may write them, or you can just write copying function on your own -- reversing a list is a nice excersize. Or you can note that QList is actually an array, what makes writing such a function trivial. Or you can convert the list to std::list, and use rbegin and rend. Choose whatever you like.
As of Qt 5.14 (circa 2020), QList provides a constructor that takes iterators, so you can just construct a reversed copy of the list with the reverse iterators of the source:
QList<int> backwards(forwards.rbegin(), forwards.rend());
Or if you want to be able to inline it, more generically (replace QList<I> with just I if you want to be super duper generic):
template <typename I> QList<I> reversed (const QList<I> &forwards) {
return QList<I>(forwards.rbegin(), forwards.rend());
}
Which lets you do fun one-liners with temporaries like:
QString badDay = "reporter covers whale exploding";
QString worseDay = reversed(badDay.split(' ')).join(' ');

Resources