Limit on number of kernel arguments in OpenCL - opencl

I wanted to know if there is any limit on the number of arguments that can be passed to a kernel function in OpenCL. I am getting CL_INVALID_ARG_INDEX while setting the arguments. I am setting 9 arguments on the kernel function. Please help me in this regard.

You might try calling the following function: www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/clGetDeviceInfo.html
The only argument limits seem to concern:
- CL_DEVICE_MAX_CONSTANT_ARGS (arguments which are pointers to the __constant memory space)
- CL_DEVICE_MAX_READ_IMAGE_ARGS
- CL_DEVICE_MAX_WRITE_IMAGE_ARGS
- CL_DEVICE_MAX_PARAMETER_SIZE (the total size in bytes of all arguments passed to a kernel)
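Max read image arguments count should not be a problem, but max constant arguments count and max write image arguments count are only guaranteed to be at least 8 on all devices. It might be that you are passing 9 write images to a device that only accepts 8, for example. Also note what CL_INVALID_ARG_INDEX itself means: arg_index is not a valid argument index. Indices are 0-based and must be less than the number of arguments the kernel declares, so a kernel with 9 arguments accepts indices 0 to 8.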

Related

How can I query the size of an OpenCL kernel argument?

I want to query the size of an OpenCL kernel argument so that I can ensure that I send it a variable of the correct size. I am able to query lots of other properties of each kernel argument using clGetKernelArgInfo, as follows:
clGetKernelArgInfo(k, argc, CL_KERNEL_ARG_TYPE_NAME, sizeof(argType), &argType, &retSize);
This tells me the name of the type as a string, for example. But that's not good enough, especially in complex cases where the argument is a struct whose name is the same on host and device but whose packing, and therefore size, is different. The things that I can query, according to https://man.opencl.org/clGetKernelArgInfo.html, are:
CL_KERNEL_ARG_ADDRESS_QUALIFIER
CL_KERNEL_ARG_ACCESS_QUALIFIER
CL_KERNEL_ARG_TYPE_NAME
CL_KERNEL_ARG_TYPE_QUALIFIER
CL_KERNEL_ARG_NAME
Any ideas?
FYI, this is NOT a duplicate of Get OpenCL Kernel-argument information because that is asking how to use the argument query function, not asking how to query the argument size.
There's no standard way to check before setting the argument as far as I'm aware, but the clSetKernelArg call will return CL_INVALID_ARG_SIZE if the sizes don't match, so that should allow you to detect and handle errors accordingly:
CL_INVALID_ARG_SIZE if arg_size does not match the size of the data type for an argument that is not a memory object or if the argument is a memory object and arg_size != sizeof(cl_mem) or if arg_size is zero and the argument is declared with the __local qualifier or if the argument is a sampler and arg_size != sizeof(cl_sampler).
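As a sketch of one defensive measure (my own assumption, not something the OpenCL API offers), the host-side layout of a struct argument can at least be pinned at compile time, so the host half of the contract cannot drift silently. The struct `Params`, its fields and the helper `params_arg_size` are hypothetical; the kernel still has to declare the same fields in the same order, and a size mismatch on the device side is only caught by clSetKernelArg returning CL_INVALID_ARG_SIZE.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* int32_t */

/* Hypothetical struct argument. The kernel is assumed to declare:
 *   typedef struct { float scale; int count; float offset[2]; } Params;
 * OpenCL C's int is always 32 bits, so int32_t is used on the host. */
typedef struct {
    float   scale;
    int32_t count;
    float   offset[2];
} Params;

/* Fail the host build if the compiler inserts unexpected padding.
 * These checks say nothing about the device compiler; they only pin the
 * host layout to the values the kernel is expected to use. */
static_assert(sizeof(float) == 4, "OpenCL float is 32 bits");
static_assert(offsetof(Params, count) == 4, "unexpected padding before count");
static_assert(offsetof(Params, offset) == 8, "unexpected padding before offset");
static_assert(sizeof(Params) == 16, "unexpected size of Params");

/* The value to pass as arg_size to clSetKernelArg for this argument. */
size_t params_arg_size(void)
{
    return sizeof(Params);
}
```

On the host this would be used as `clSetKernelArg(kernel, index, params_arg_size(), &params)`, with `kernel`, `index` and `params` standing in for your own variables; checking that call's return value for CL_INVALID_ARG_SIZE remains the only runtime confirmation that the device agrees.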

Append OpenCL result to list / Reduce solution space

I have an OpenCL kernel with multiple work items. Let's assume, for discussion, that I have a 2-D workspace with x*y elements working on an equally sized, but sparse, array of input elements. Only a few of these input elements produce a result that I want to keep; most don't. I want to enqueue another kernel that takes only the kept results as input.
Is it possible in OpenCL to append results to some kind of list to pass them as input to another Kernel or is there a better idea to reduce the volume of the solution space? Furthermore: Is this even a good question to ask with the programming model of OpenCL in mind?
If the amount of result data is a small percentage of the input (i.e. 0-10%), what I would do is use local atomics and global atomics, with a global counter.
Data interface between Kernel1 <----> Kernel2:
int counter; // incremented with atomics so each result knows where to write
data_type results[N]; // stores the results; sized for the worst case (N = number of inputs), only the first counter entries are valid
Kernel1:
Create a kernel function that does the operation on the data.
Work items that do produce a result:
Save the result to local memory, and avoid data races by taking the slot from a local counter with local atomics (the local counter is zeroed by one work item, followed by a barrier).
After a barrier(CLK_LOCAL_MEM_FENCE), use work item 0 to reserve space for the whole group with one global atomic_add of the local count, then copy the group's local results to global memory.
Kernel2:
Work items whose ID is lower than counter do the work; the others just return.
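The two-level scheme can be sketched as a serial C simulation, where loops stand in for work-groups and work items and C11 atomics stand in for the OpenCL local and global atomics. This is not OpenCL code: `LOCAL_SIZE`, the predicate `produces_result` and the summing done in `kernel2_sum` are hypothetical placeholders for the real work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical work-group size used only for this simulation. */
#define LOCAL_SIZE 4

/* Hypothetical stand-in for "this work item produced a result worth keeping". */
static int produces_result(int x)
{
    return x > 100;
}

/* Serial simulation of Kernel1. Each group appends its hits to a local buffer
 * via a local atomic counter; then one work item reserves a contiguous range
 * of the global results with a single global atomic add.
 * Returns the final value of the global counter. */
int kernel1_compact(const int *input, size_t n, int *results, size_t capacity)
{
    atomic_int global_counter = 0;

    for (size_t group_start = 0; group_start < n; group_start += LOCAL_SIZE) {
        int local_results[LOCAL_SIZE];   /* models __local memory */
        atomic_int local_counter = 0;    /* models a zeroed __local counter */

        /* Every work item of the group: keep its result if it has one. */
        for (size_t lid = 0; lid < LOCAL_SIZE && group_start + lid < n; ++lid) {
            int x = input[group_start + lid];
            if (produces_result(x)) {
                int slot = atomic_fetch_add(&local_counter, 1);
                local_results[slot] = x;
            }
        }

        /* On a device, barrier(CLK_LOCAL_MEM_FENCE) would go here. */

        /* Work item 0: one global atomic per group reserves the space. */
        int count = atomic_load(&local_counter);
        int base = atomic_fetch_add(&global_counter, count);
        assert((size_t)base + (size_t)count <= capacity);
        for (int i = 0; i < count; ++i)
            results[base + i] = local_results[i];
    }
    return atomic_load(&global_counter);
}

/* Serial simulation of Kernel2, launched over more items than there are
 * results. Summing is a hypothetical stand-in for the real work. */
long kernel2_sum(const int *results, int counter, size_t global_size)
{
    long sum = 0;
    for (size_t gid = 0; gid < global_size; ++gid) {
        if (gid >= (size_t)counter)
            continue;   /* "the others just return" */
        sum += results[gid];
    }
    return sum;
}
```

Because the simulation runs groups one after another, the kept results come out in input order; on a real device the order between groups depends on scheduling, so Kernel2 should not rely on it.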

Number of Work items not matching get_global_size openCL

I have a strange situation in an OpenCL application:
Global_work_size = 1920x1080, Local_work_size = 512, Work_Dim = 1.
In my kernel, I'm able to see correct values for
get_global_size (2073600) and get_num_groups (4050).
However, get_global_id goes only up to 518399, which is 1/4 of the expected value.
Similarly, get_group_id goes only up to 1012, which is 1/4 of the expected value.
Because of this, the work-item IDs I use for indexing don't cover the whole range.
Any suggestions on how to solve this?
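In OpenCL 1.x, the global work size must be an integer multiple of the local work size in every dimension. A 2-D launch of 1920x1080 with 512 work items along the first dimension would violate that (1920 / 512 is not an integer). For the 1-D launch described here, however, 1920 * 1080 = 2073600 and 2073600 / 512 = 4050 exactly, which matches the reported get_num_groups, so the launch itself is valid. An exact factor of 4 then points at how the IDs are observed: for example, passing a buffer size in elements where clCreateBuffer or clEnqueueReadBuffer expect bytes would make only the first quarter of 4-byte values visible on the host.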
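The divisibility rule can be checked on the host before enqueueing, and a problem size that is not a multiple of the local size can be rounded up. A plain C sketch; the helper names `is_valid_1x_launch` and `round_up_global` are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* In OpenCL 1.x each global size must be a multiple of the local size. */
int is_valid_1x_launch(size_t global_size, size_t local_size)
{
    assert(local_size > 0);
    return global_size % local_size == 0;
}

/* Round a problem size up to the next multiple of the local size. The extra
 * work items must then be skipped in the kernel with a bounds check. */
size_t round_up_global(size_t problem_size, size_t local_size)
{
    assert(local_size > 0);
    return ((problem_size + local_size - 1) / local_size) * local_size;
}
```

For the numbers in the question, 1920 on its own is not a multiple of 512 (it rounds up to 2048), while the flattened 1-D size 2073600 already is (4050 groups of 512).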

OpenCL Sanitize Function Inputs

Some OpenCL functions crash the kernel if any of their arguments are non-standard floats like NAN or INFINITY. I am trying to stop these crashes from occurring by wrapping my arguments to remove bad values.
Here are three of my attempts to create argument sanitizers:
clean_arg = clamp(dirty_arg, 1, 10000); // crashes
clean_arg = convert_float(convert_int(dirty_arg)); // hacky
clean_arg = isnan(dirty_arg) ? 1 : dirty_arg; // verbose
Is there a better way to detect and remove undesirable floating-point values?
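One option, sketched in plain C with hypothetical helper names: test with isfinite, which rejects NaN and both infinities in one call, whereas the isnan attempt above lets infinities through. OpenCL C also provides isfinite as a built-in relational function, so the same expression should carry over into a kernel. One caveat to verify: a kernel built with -cl-finite-math-only or -cl-fast-relaxed-math may let the compiler assume no NaN or infinity exists, which can defeat such checks.

```c
#include <assert.h>
#include <math.h>

/* Replace NaN, +INFINITY and -INFINITY with a caller-chosen fallback. */
float sanitize_float(float x, float fallback)
{
    assert(isfinite(fallback));
    return isfinite(x) ? x : fallback;
}

/* Optionally also force the value into a range, as the clamp attempt intended.
 * The finiteness test runs first, so NaN never reaches the comparisons. */
float sanitize_and_clamp(float x, float fallback, float lo, float hi)
{
    assert(lo <= hi);
    float v = sanitize_float(x, fallback);
    return v < lo ? lo : (v > hi ? hi : v);
}
```

With the values from the first attempt, `sanitize_and_clamp(dirty_arg, 1.0f, 1.0f, 10000.0f)` maps any non-finite input to 1 and everything else into [1, 10000].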

Checking get_global_id in OpenCL Kernel Necessary?

I have noticed a number of kernel sources that look like this (found randomly by Googling):
__kernel void fill(__global float* array, unsigned int arrayLength, float val)
{
    if (get_global_id(0) < arrayLength)
    {
        array[get_global_id(0)] = val;
    }
}
My question is if that if-statement is actually necessary (assuming that "arrayLength" in this example is the same as the global work size).
In some of the more "professional" kernels I have seen, it is not present. It also seems to me that the hardware would do well to not assign kernels to nonsense coordinates.
However, I also know that processors work in groups. Hence, I can imagine that some processors of a group must do nothing (for example, if you have 1 group of size 16 and a work size of 41, then the group would process the first 16 work items, then the next 16, then the next 9, with 7 processors not doing anything; do they get dummy kernels?).
I checked the spec., and the only relevant mention of "get_global_id" is the same as the online documentation, which reads:
The global work-item ID specifies the work-item ID based on the number of global work-items specified to execute the kernel.
. . . based how?
So what is it? Is it safe to omit iff the array's size is a multiple of the work group size? What?
You have the right answer already, I think. If the global size of your kernel execution is the same as the array length, then this if statement is useless.
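In general, that type of check is only needed for cases where you've partitioned your data in such a way that you know you might execute extra work items relative to your array size. For example, in OpenCL 1.x a global size of 41 cannot be launched with a work-group size of 16 at all; you would round the global size up to 48, and the 7 extra work items then need the check to avoid writing past the end of the array. In my experience, you can almost always avoid such cases.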
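To see what the check does once the global size has been padded, here is a plain C simulation (not OpenCL) of the fill kernel from the question, where the loop stands in for the work items. The function name `fill_simulated` and its extra `global_size` parameter are hypothetical; 41 elements with a padded global size of 48 matches the example in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Serial simulation of the fill kernel, launched with a global size that may
 * exceed the array length. Returns how many work items actually wrote. */
size_t fill_simulated(float *array, unsigned int array_length, float val,
                      size_t global_size)
{
    assert(array != NULL);
    size_t writes = 0;
    for (size_t gid = 0; gid < global_size; ++gid) {   /* one pass per work item */
        if (gid < array_length) {                       /* the bounds check */
            array[gid] = val;
            ++writes;
        }
    }
    return writes;
}
```

Without the `if`, the last 7 simulated work items would write past element 40; with the global size equal to the array length, the check never fails and can be dropped.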
