I'm writing OpenCL code for an n-body algorithm. I'm getting an Invalid Context error when I try to execute it. The error occurs in the part of the code that enqueues the kernel for execution on the GPU. I've pasted my code here. If anyone can help me understand why I'm getting this error and help me solve it, I'd be grateful.
if (gpuSize) {
    /* launch the kernel on the second device (GPU) */
    ret = clEnqueueNDRangeKernel(
        accelState.queues[1],
        accelState.kernel,
        1,
        global_work_offset1,
        global_work_size1,
        NULL, /* let OpenCL determine localWorkSize */
        1, &enqEvents[noOfQEvents-1],
        &enqEvents[noOfEvents]
    );
    /* noOfEvents++; */
    checkResult(ret);
}
I'm getting the error on the last line, checkResult(ret), but as I understand it, there's a mismatch between my command queue accelState.queues[1] and something in the kernel? Any help would be much appreciated. Thank you.
The problem is clear: you are running a kernel from one context (context B) in a queue belonging to another context (context A).
That is not allowed; objects can only interact with objects from their own context. That applies to kernels, buffers, queues, events, etc.
However, HW resources such as devices can be used in multiple contexts.
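A minimal sketch of the fix (identifiers such as device, src, gws and the kernel name "nbody_step" are illustrative, not from the original code): create the queue, program and kernel from the same context, and the enqueue can no longer mix contexts:

cl_int err;
cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
/* queue, program and kernel are all created from the same ctx */
cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);
cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
err = clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
cl_kernel kernel = clCreateKernel(prog, "nbody_step", &err);
/* kernel and queue share ctx, so this cannot fail with CL_INVALID_CONTEXT */
size_t gws = 1024;
err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &gws, NULL, 0, NULL, NULL);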
I have a program utilizing OpenCL 2.0 because I want to take advantage of device-side enqueue. I have a test program that performs the following tasks on the host side:
1. Allocates 16 kilobytes of floating-point memory on the device and zeros it out.
2. Builds the OpenCL program below and creates a kernel for masterKernel().
3. Sets the first argument of masterKernel() (heap) to the memory allocated in step 1.
4. Enqueues masterKernel() via clEnqueueNDRangeKernel() with a work_dim of 1 and a global work size of 1 (so it only runs once, with get_global_id(0) always being zero).
5. Reads the memory back into the host and displays it.
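A condensed sketch of those host-side steps (error checks omitted; ctx, device, queue and program are assumed to exist from earlier setup):

cl_int err;
/* 1. allocate 16 KB on the device and zero it out */
cl_mem heap = clCreateBuffer(ctx, CL_MEM_READ_WRITE, 16 * 1024, NULL, &err);
float zeros[4096] = { 0 };
clEnqueueWriteBuffer(queue, heap, CL_TRUE, 0, sizeof(zeros), zeros, 0, NULL, NULL);
/* 2.-3. create the kernel and point its first argument at the buffer */
cl_kernel master = clCreateKernel(program, "masterKernel", &err);
clSetKernelArg(master, 0, sizeof(cl_mem), &heap);
/* 4. enqueue with work_dim = 1 and a global work size of 1 */
size_t gws = 1;
clEnqueueNDRangeKernel(queue, master, 1, NULL, &gws, NULL, 0, NULL, NULL);
/* 5. read the memory back into the host */
clEnqueueReadBuffer(queue, heap, CL_TRUE, 0, sizeof(zeros), zeros, 0, NULL, NULL);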
Here is the OpenCL code:
// This function was stripped down to nothing for testing purposes.
kernel void childKernel(global float* heap)
{
}

// Enqueues the child kernel.
kernel void masterKernel(global float* heap)
{
    ndrange_t ndRange = ndrange_1D(16); // Arbitrary, could be any number.
    if (get_global_id(0) == 0)
    {
        enqueue_kernel(get_default_queue(), 0, ndRange,
                       ^{ childKernel(heap); });
    }
}
The program builds successfully. However, when I try to run masterKernel(), the call to enqueue_kernel() here causes the host-side call to clEnqueueNDRangeKernel() to fail with an error code of CL_OUT_OF_RESOURCES. OpenCL's documentation says enqueue_kernel() should return CL_SUCCESS or CL_ENQUEUE_FAILURE depending on whether the block enqueues successfully or not. It does not say that clEnqueueNDRangeKernel() itself should fail. Here are some other things I've tried:
Commenting out the call to enqueue_kernel() causes the program to succeed.
Adding a line that sets heap[0] to any number causes the host-side program to reflect that change, so I know it's not a problem with how I'm feeding in the arguments.
Modifying the if statement to read something impossible like if(get_global_id(0) == 6000) still causes the error. This tells me the error is not caused by enqueue_kernel() actually executing (I verified get_global_size(0) == 1), but merely by its existing in the program at all.
Modifying the if statement to if(0) does make the error not happen.
Making it so childKernel() actually does something does not make the error go away.
I am not really sure what to try next. I know my device supports OpenCL 2.0. My device is an AMD Radeon R9 380 graphics card. I do not have access to any other OpenCL 2.0 capable hardware to test it on.
I ended up figuring this one out. The issue happened because I did not create a device-side queue (one with the flags CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT).
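For reference, a minimal sketch of creating such a device-side default queue with clCreateCommandQueueWithProperties (ctx and device are assumed from earlier setup; the queue size of 16 KB is an arbitrary illustration):

cl_queue_properties props[] = {
    CL_QUEUE_PROPERTIES,
    (cl_queue_properties)(CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE |
                          CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT),
    CL_QUEUE_SIZE, 16 * 1024,
    0
};
cl_int err;
cl_command_queue deviceQueue =
    clCreateCommandQueueWithProperties(ctx, device, props, &err);
/* get_default_queue() inside masterKernel() now refers to this queue */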
Reading the Qt signals & slots documentation, it seems that the only reason for a new-style connection to fail is:
"If there is already a duplicate (exact same signal to the exact same slot on the same objects), the connection will fail and connect will return false"
This means the connection was already successful the first time, and Qt::UniqueConnection does not allow duplicate connections.
Does this mean that a Qt 5 style connection will always succeed? Are there any other reasons for failure?
The new-style connect can still fail at runtime for a variety of reasons:
Either sender or receiver is a null pointer. Obviously this requires a check that can only happen at runtime.
The PMF you specified for a signal is not actually a signal. Lacking proper C++ reflection capabilities, all you can do at compile time is check that the signal is a non-static member function of the sender's class.
However, that's not enough to make it a signal: it also needs to be in a signals: section in your class definition. When moc sees your class definition, it will generate some metadata containing the information that that function is indeed a signal. So, at runtime, the pointer passed to connect is looked up in a table, and connect itself will fail if the pointer is not found (because you did not pass a signal).
The check in the previous point actually requires a comparison between pointers to member functions. It's a particularly tricky one, because it will typically involve different TUs:
1. one is the TU containing the moc-generated data (typically a moc_class.cpp file). In this TU there's the aforementioned table containing, amongst other things, pointers to the signals (which are just ordinary member functions).
2. the other is the TU where you actually invoke connect(sender, &Sender::signal, ...), which generates the pointer that gets looked up in the table.
Now, the two TUs may be in the same application, or perhaps one is in a library and the other in your application, or maybe in two libraries, etc.; your platform's ABI starts to come into play.
In theory, the pointers stored when doing 1. are identical to the pointers generated when doing 2.; in practice, we've found cases where this does not happen (cf. this bug report I filed some time ago, where older versions of GNU ld on ARM generated code that failed the comparison).
For Qt, this meant disabling certain optimizations and/or passing extra flags in the places where we know this happens and breaks user software. For instance, as of Qt 5.9, there is no support for -Bsymbolic* flags on GCC on anything but x86 and x86-64.
Of course, this does not mean we've found and fixed all the possible places. New compilers and more aggressive optimizations might trigger this bug again in the future, making connect return false even when everything is supposed to work.
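A small made-up illustration of the second point (Sender, sender and receiver are hypothetical names): a member function pointer of the right shape compiles, but the runtime table lookup only succeeds for functions declared under signals::

class Sender : public QObject {
    Q_OBJECT   // moc processes this class and registers its signals
signals:
    void ready();            // a real signal, present in moc's table
public:
    void notASignal() {}     // an ordinary member function of the same shape
};

Sender sender;
QObject receiver;
// Both calls compile; only the first connect succeeds at runtime:
bool ok  = QObject::connect(&sender, &Sender::ready, &receiver, &QObject::deleteLater);      // true
bool bad = QObject::connect(&sender, &Sender::notASignal, &receiver, &QObject::deleteLater); // false, with a runtime warning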
Yes, it can fail if either the sender or the receiver is not a valid object (nullptr, for example).
Example
QObject* obj1 = new QObject();
QObject* obj2 = new QObject();
// Will succeed
connect(obj1, &QObject::destroyed, obj2, &QObject::deleteLater);
delete obj1;
obj1 = nullptr;
// Will fail even if it compiles
connect(obj1, &QObject::destroyed, obj2, &QObject::deleteLater);
Do not try to register a pointer type. I had used the macro
#define QT_REG_TYPE(T) qRegisterMetaType<T>(#T)
with the pointer type CMyWidget*, and that was the problem. Using the type directly worked.
No, it's not always successful. The docs give an example here where connect returns false because the signal signature should not contain variable names:
// WRONG
QObject::connect(scrollBar, SIGNAL(valueChanged(int value)),
label, SLOT(setNum(int value)));
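For contrast, the corrected call from the docs simply drops the variable names from the signatures:

// CORRECT
QObject::connect(scrollBar, SIGNAL(valueChanged(int)),
                 label, SLOT(setNum(int)));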
I'm new to OpenCL and have followed this tutorial to get started. Before creating the cl::Context, the tutorial creates a static array of three cl_context_properties, without explaining what it is for, and passes it as the properties argument of the cl::Context constructor.
However, when looking at the reference page for cl::Context, there is no explanation of the properties parameter, but it does say that it "is reserved and must be zero". So why does the tutorial pass a non-zero value as that argument? What purpose does it serve? And if you were able to pass that argument before, how come it is suddenly "reserved"? Doesn't that make OpenCL non-backward-compatible?
The code compiles and runs fine both with and without the parameter. The only difference is that I get a warning that cprops is unused when I put NULL there instead of cprops.
Also, when I pass CL_DEVICE_TYPE_CPU | CL_DEVICE_TYPE_GPU as the type argument to the cl::Context constructor, my application crashes (SIGSEGV) when I later try to create a cl::Buffer with the context. Why? Am I not able to specify more than one device type to use simultaneously?
Update: by giving NULL as the properties argument to the cl::Context constructor, the variable platformList is suddenly not used for anything OpenCL-related anymore. The tutorial seems to use platformList to specify the platform for which the cl::Context should be created, but now the context is just created like this:
cl::Context context(
    CL_DEVICE_TYPE_GPU,
    NULL,
    NULL,
    NULL,
    &err);
so I don't get to specify the platform. Shouldn't I be able to do that? How come I can't, when the tutorial seemed to be doing exactly that?
On your first question, see the official OpenCL documentation for a description of this parameter: http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/. That page covers the C API, but the parameter is the same in the C++ API.
As to your second question: check the error result from creating the context to see why it doesn't like the type parameter you specified.
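A minimal sketch of both suggestions, assuming a platformList obtained via cl::Platform::get() as in the tutorial: pass the platform through cl_context_properties and inspect err afterwards:

std::vector<cl::Platform> platformList;
cl::Platform::get(&platformList);

cl_context_properties cprops[3] = {
    CL_CONTEXT_PLATFORM,
    (cl_context_properties)(platformList[0])(),  // the platform's raw handle
    0
};

cl_int err;
cl::Context context(CL_DEVICE_TYPE_GPU, cprops, NULL, NULL, &err);
if (err != CL_SUCCESS) {
    // e.g. CL_DEVICE_NOT_FOUND if this platform has no GPU device
}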
Using AMD's APP OpenCL implementation with the JOCL bindings, I'm trying to create a generic bracketing profiler using Java's automatic resource management (try-with-resources). The basic idea is:
class Timer implements AutoCloseable {
    ...
    Timer() {
        ...
        clEnqueueMarker( commandQueue, startEvent );
    }

    public void close() {
        cl_event stopEvent = new cl_event();
        clEnqueueMarker( commandQueue, stopEvent );
        clFinish( commandQueue );
        ... calculate and output times ...
    }
}
My problem is that profiling information is not available for the marker command events (stopEvent and startEvent). This is despite a) setting CL_QUEUE_PROFILING_ENABLE on the command queue and b) flushing and waiting on the command queue and verifying that the stop and start events are CL_COMPLETE with no errors.
So my question is, is profiling supported on marker commands in AMD OpenCL? If not, is it explicitly disallowed by the spec (I found nothing to this effect)?
Thanks.
I've re-checked the spec, and it seems to me that what you get is normal (though I had never paid much attention to that detail before). In section 5.12, about profiling, the standard states:
This section describes profiling of OpenCL functions that are enqueued
as commands to a command-queue. The specific functions being
referred to are: clEnqueue{Read|Write|Map}Buffer,
clEnqueue{Read|Write}BufferRect, clEnqueue{Read|Write|Map}Image,
clEnqueueUnmapMemObject, clEnqueueCopyBuffer, clEnqueueCopyBufferRect,
clEnqueueCopyImage, clEnqueueCopyImageToBuffer,
clEnqueueCopyBufferToImage, clEnqueueNDRangeKernel, clEnqueueTask and
clEnqueueNativeKernel.
So the clEnqueueMarker() function is not in the list, and I guess the CL_PROFILING_INFO_NOT_AVAILABLE value returned makes sense.
I just tried this, and it seems to work now. Tested on Windows 10 with an AMD 7870, and on Linux with Nvidia's Titan Black and Titan X cards.
The OpenCL 1.2 spec still contains the paragraph @CaptainObvious quoted. The clEnqueueMarker function is still missing from the list, but I can get profiling information without a problem.
The start and end times on marker events are always equal, which makes a lot of sense.
By the way, clEnqueueMarker is deprecated in OpenCL 1.2 and should be replaced with clEnqueueMarkerWithWaitList.
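A minimal sketch of that approach (queue is assumed to have been created with CL_QUEUE_PROFILING_ENABLE):

cl_event marker;
clEnqueueMarkerWithWaitList(queue, 0, NULL, &marker);
clFinish(queue);

cl_ulong start = 0, end = 0;
clGetEventProfilingInfo(marker, CL_PROFILING_COMMAND_START,
                        sizeof(start), &start, NULL);
clGetEventProfilingInfo(marker, CL_PROFILING_COMMAND_END,
                        sizeof(end), &end, NULL);
/* as noted above, start and end are expected to be equal on a marker */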
In my user-space Linux application, I have a thread which communicates with the main process through a pipe. Below is the code:
static void _notify_main(int cond)
{
    int r;
    int tmp = cond;

    r = write( _nfy_fd, &tmp, sizeof(tmp) );
    if (r < 0) /* the error code lives in errno, not in the return value */
        ERROR( "write failed: %d. %s\n", r, strerror(errno) );
}
Pretty straightforward. It had been working fine for quite a while, but recently the write call started failing with an "interrupted system call" error after the program was put under some stress testing.
Strangely, the data actually went through the pipe without a problem. Of course, I'd still like to get to the bottom of the error message and get rid of it.
Thanks,
The write(2) man page mentions:
Conforming to
SVr4, 4.3BSD, POSIX.1-2001.
Under SVr4 a write may be interrupted and return EINTR at any point, not just before any data is written.
I guess you were just lucky that it didn't occur before.
If you google for "interrupted system call", you will find this thread, which tells you to use siginterrupt() to auto-restart the write call.
From http://www.gnu.org/
A signal can arrive and be handled while an I/O primitive such as open
or read is waiting for an I/O device. If the signal handler returns,
the system faces the question: what should happen next?
POSIX specifies one approach: make the primitive fail right away. The
error code for this kind of failure is EINTR. This is flexible, but
usually inconvenient. Typically, POSIX applications that use signal
handlers must check for EINTR after each library function that can
return it, in order to try the call again. Often programmers forget to
check, which is a common source of error.
So you can handle the EINTR error yourself. Alternatively, you can use sigaction() to establish a signal handler and specify how that handler should behave: with the SA_RESTART flag, returning from the handler will resume the interrupted primitive; otherwise, returning from the handler will cause EINTR.
See: Interrupted Primitives.
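A minimal sketch of both options described above (the fd argument and SIGUSR1 are illustrative):

#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Option 1: handle EINTR explicitly by retrying the interrupted write. */
static void notify_main_retry(int fd, int cond)
{
    ssize_t r;
    do {
        r = write(fd, &cond, sizeof(cond));
    } while (r < 0 && errno == EINTR);
}

/* Option 2: install the handler with SA_RESTART so the kernel restarts
   the interrupted primitive instead of failing it with EINTR. */
static void on_signal(int signo) { /* ... */ }

static void install_restarting_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sa.sa_flags = SA_RESTART;
    sigaction(SIGUSR1, &sa, NULL);
}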