I work interchangeably with 32-bit floats and 32-bit integers. I want two kernels that do exactly the same thing, one for integers and one for floats. At first I thought I could use templates or something similar, but it does not seem possible to define two kernels with the same name but different argument types.
import pyopencl as cl
import numpy as np
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prg = cl.Program(ctx, """
__kernel void arange(__global int *res_g)
{
int gid = get_global_id(0);
res_g[gid] = gid;
}
__kernel void arange(__global float *res_g)
{
int gid = get_global_id(0);
res_g[gid] = gid;
}
""").build()
Error:
<kernel>:8:15: error: conflicting types for 'arange'
__kernel void arange(__global float *res_g)
^
<kernel>:2:15: note: previous definition is here
__kernel void arange(__global int *res_g)
What is the most convenient way of doing this?
A preprocessor define, passed as a build option, can be used for that:
code = """
__kernel void arange(__global TYPE *res_g)
{
int gid = get_global_id(0);
res_g[gid] = gid;
}
"""
prg_int = cl.Program(ctx, code).build("-DTYPE=int")
prg_float = cl.Program(ctx, code).build("-DTYPE=float")
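If both variants should live in one program, another option is to generate the source on the Python side. The following is only a sketch: the suffixed kernel names (arange_int, arange_float) and the helper make_source are assumptions made for illustration, and the build call is left as a comment so the snippet runs without an OpenCL device.

```python
from string import Template

# One kernel body, parameterised by the C element type.
KERNEL_TEMPLATE = Template("""
__kernel void arange_${suffix}(__global ${ctype} *res_g)
{
    int gid = get_global_id(0);
    res_g[gid] = gid;
}
""")


def make_source(ctypes):
    """Return one OpenCL C source string with a uniquely named kernel per C type."""
    return "\n".join(
        KERNEL_TEMPLATE.substitute(suffix=ctype, ctype=ctype) for ctype in ctypes
    )


source = make_source(["int", "float"])

# With PyOpenCL and a context available, the source would be built once:
#   prg = cl.Program(ctx, source).build()
#   prg.arange_int(...) and prg.arange_float(...)
print(source)
```

Compared with two separate builds using -DTYPE, this compiles once and keeps both kernels on the same program object, at the cost of different kernel names.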
From this question and this question I managed to compile a minimal example of summing a vector into a single double inside OpenCL 1.2.
/* https://suhorukov.blogspot.com/2011/12/opencl-11-atomic-operations-on-floating.html */
inline void AtomicAdd(volatile __global double *source, const double operand) {
union { unsigned int intVal; double floatVal; } prevVal, newVal;
do {
prevVal.floatVal = *source;
newVal.floatVal = prevVal.floatVal + operand;
} while( atomic_cmpxchg((volatile __global unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal );
}
void kernel cost_function(__constant double* inputs, __global double* outputs){
int index = get_global_id(0);
if(0 == index){ outputs[0] = 0.0; }
barrier(CLK_GLOBAL_MEM_FENCE);
AtomicAdd(&outputs[0], inputs[index]); /* (1) */
//AtomicAdd(&outputs[0], 5.0); /* (2) */
}
This solution is incorrect, however: the result is always 0 when the buffer is read back. What might be the problem?
The code at /* (1) */ doesn't work, and neither does the code at /* (2) */, which is only there to test the logic independently of any inputs.
Is barrier(CLK_GLOBAL_MEM_FENCE); used correctly here to reset the output before any calculations are done on it?
According to the OpenCL 1.2 specification, atomic operations support single-precision floating-point numbers. Is this AtomicAdd a feasible way of extending that support to double precision, or am I missing something?
The device I am testing with supports cl_khr_fp64, of course.
Your AtomicAdd is incorrect. Namely, the 2 errors are:
In the union, intVal must be a 64-bit integer and not 32-bit integer.
Use the 64-bit atom_cmpxchg function and not the 32-bit atomic_cmpxchg function.
The correct implementation is:
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
inline void AtomicAdd(volatile __global double *source, const double operand) {
union { ulong u64; double f64; } prevVal, newVal;
do {
prevVal.f64 = *source;
newVal.f64 = prevVal.f64 + operand;
} while(atom_cmpxchg((volatile __global ulong*)source, prevVal.u64, newVal.u64) != prevVal.u64);
}
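To see why the 32-bit version always leaves 0 in the buffer, the bit patterns can be checked on the host. This is a single-threaded sketch under the assumption of a little-endian layout, where the 32-bit union member overlaps the low half of the double; the function names cas32_add and cas64_add are made up for illustration.

```python
import struct


def double_to_bits(x):
    """Return the 64-bit pattern of a double as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_double(bits):
    """Interpret a 64-bit unsigned integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def cas32_add(memory_bits, operand):
    """Mimic the original AtomicAdd: only the low 32 bits are exchanged."""
    new_bits = double_to_bits(bits_to_double(memory_bits) + operand)
    low_mask = 0xFFFFFFFF
    # The high 32 bits of the stored value are never written.
    return (memory_bits & ~low_mask) | (new_bits & low_mask)


def cas64_add(memory_bits, operand):
    """Mimic the corrected AtomicAdd: all 64 bits are exchanged."""
    return double_to_bits(bits_to_double(memory_bits) + operand)


start = double_to_bits(0.0)
broken = bits_to_double(cas32_add(start, 5.0))
fixed = bits_to_double(cas64_add(start, 5.0))
print(hex(double_to_bits(5.0)), broken, fixed)
```

The pattern of 5.0 is 0x4014000000000000, so its low 32 bits are all zero and the 32-bit exchange writes nothing visible, which matches the test case at /* (2) */.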
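barrier(CLK_GLOBAL_MEM_FENCE); is placed correctly here. Note that a barrier must be reached by every work-item of a work-group, so it must not sit in an if- or else-branch that only some work-items take. Also keep in mind that a barrier only synchronizes the work-items within one work-group, so with more than one work-group the reset of outputs[0] is not guaranteed to happen before other groups start adding.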
UPDATE: According to STREAMHPC, the original implementation you use is not guaranteed to produce correct results. There is an improved implementation:
void __attribute__((always_inline)) atomic_add_f(volatile global float* addr, const float val) {
union {
uint u32;
float f32;
} next, expected, current;
current.f32 = *addr;
do {
next.f32 = (expected.f32=current.f32)+val; // ...*val for atomic_mul_f()
current.u32 = atomic_cmpxchg((volatile global uint*)addr, expected.u32, next.u32);
} while(current.u32!=expected.u32);
}
#ifdef cl_khr_int64_base_atomics
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
void __attribute__((always_inline)) atomic_add_d(volatile global double* addr, const double val) {
union {
ulong u64;
double f64;
} next, expected, current;
current.f64 = *addr;
do {
next.f64 = (expected.f64=current.f64)+val; // ...*val for atomic_mul_d()
current.u64 = atom_cmpxchg((volatile global ulong*)addr, expected.u64, next.u64);
} while(current.u64!=expected.u64);
}
#endif
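The retry logic of the improved version can be followed step by step in a small host-side simulation. This is a sketch only: the Cell class and its pending list are hypothetical stand-ins for global memory and for a write by another work-item, and values are compared as floats rather than as bit patterns for brevity.

```python
class Cell:
    """One memory word with a compare-and-exchange, similar in spirit to atomic_cmpxchg."""

    def __init__(self, value):
        self.value = value
        self.pending = []  # hypothetical: additions by "other work-items" to inject

    def cmpxchg(self, expected, new):
        # Inject one interfering write before the exchange, if one is queued.
        if self.pending:
            self.value += self.pending.pop()
        old = self.value
        if old == expected:
            self.value = new
        return old  # like the OpenCL built-in, always return the old value


def atomic_add(cell, val):
    """Retry until the exchange succeeds; return the number of attempts."""
    current = cell.value
    attempts = 0
    while True:
        expected = current
        # On failure, the returned value becomes the next expectation,
        # so memory is not read again separately.
        current = cell.cmpxchg(expected, expected + val)
        attempts += 1
        if current == expected:
            return attempts


cell = Cell(1.0)
cell.pending = [10.0]  # another work-item adds 10.0 during the first attempt
attempts = atomic_add(cell, 5.0)
print(cell.value, attempts)
```

The first attempt fails because the stored value changed from 1.0 to 11.0; the second attempt uses 11.0 as the expectation and succeeds, so neither addition is lost.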
I use an Arduino Uno with Arduino IDE 1.8.3. I have two arrays, and I want to write a helper function that adds them element by element and returns the result to the main function, which prints it.
I want to use x (that is, sizeof(a)) as the array length, but that does not seem to be correct...
How do I solve this problem?
This is my code:
int a[]={1,2,3,4,5,6},b[]={1,1,1,1,1,1};
void setup() {
Serial.begin(9600);
int *p;
p = add(a,b);
for(int i=0;i<4;i++){
Serial.print(*(p+i));
}
}
void loop() {
}
int * add(int *a,int *b) {
int x = sizeof(a);
int y = sizeof(b);
static int z[4];
for(int i=0;i<4;i++) {
z[i]=a[i]+b[i];
}
return z;
}
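int* a does not know the size of the array: inside add, sizeof(a) is the size of the pointer, not of the array.
The easiest fix is to pass the size as an extra parameter.
The next problem is that your static result array cannot change its size dynamically.
static has additional problems anyway, in general, so let the caller provide the result buffer: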
int* add(const int *a,const int *b, int* result, byte size) {
for(byte i=0; i<size; i++) {
result[i]=a[i]+b[i];
}
return result;
}
Returning the result as the return value may be convenient.
Yesterday I got to run the unit tests of our current application on the new notebooks and got the CL_OUT_OF_RESOURCES error doing so. The code itself runs without errors on ATI cards and Intel CPUs.
What made me suspicious is that the M2000M supports 'OpenCL 1.2 CUDA'. Is this standard 'OpenCL 1.2', or does it differ so that I need to modify the code?
Here is the code:
__kernel void pointNormals(__global const uint* cellLinkIds, __global const uint* cellLinks,
__global const float3* cellnormals, __global float3* pointnormals,
const uint nrPoints)
{
const uint gid = get_global_id(0);
if(gid < nrPoints)
{
const uint first = select(cellLinkIds[gid-1], (uint)0, gid==0);
const uint last = cellLinkIds[gid];
float3 pointnormal = (float3)0.f;
for(uint i = first; i < last; ++i)
{
pointnormal += cellnormals[cellLinks[i]];
}
pointnormals[gid] = normalize(pointnormal);
}
}
Edit:
In the tests I get 6 errors: the first at the call to clWaitForEvents, the others from clEnqueueWriteBuffer.
Found the cause.
The line const uint first = select(cellLinkIds[gid-1], (uint)0, gid==0); caused an invalid memory access when gid is 0 (the first element, as far as I know).
I fixed it with const uint first = gid == 0 ? (uint)0 : cellLinkIds[gid - 1];. What I don't get is why AMD cards worked with that bug while Nvidia returned an error.
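The difference between the two lines can be reproduced on the host. select is an ordinary function, so all of its arguments are evaluated before the call, including cellLinkIds[gid-1] with gid-1 wrapped around as an unsigned value; the conditional operator only evaluates the branch it takes. The sketch below mimics this in Python; the load helper with its bounds check and the sample buffer contents are assumptions for illustration, since a real device does not necessarily report such a read.

```python
UINT_MASK = 0xFFFFFFFF


def select(a, b, c):
    """Mimic OpenCL's scalar select(a, b, c): b if c is true, otherwise a."""
    return b if c else a


def load(buffer, index):
    """Read buffer[index] with uint wraparound and an explicit bounds check."""
    index &= UINT_MASK
    if index >= len(buffer):
        raise IndexError(f"out-of-bounds read at index {index}")
    return buffer[index]


cell_link_ids = [3, 5, 9]  # made-up sample data
gid = 0

try:
    # Both arguments are evaluated before select runs, so the load happens.
    first = select(load(cell_link_ids, gid - 1), 0, gid == 0)
    eager_ok = True
except IndexError as exc:
    eager_ok = False
    print("select version:", exc)

# The conditional expression skips the load entirely when gid == 0.
first = 0 if gid == 0 else load(cell_link_ids, gid - 1)
print("ternary version:", first)
```

An out-of-bounds read is undefined behavior, so one driver silently tolerating it while another reports an error is consistent with the bug being in the kernel rather than in either implementation.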
Since kernel code in PyOpenCL has to be written in C, I have written a few C functions that need to be called from inside the kernel. Where should I store these functions, and how do I pass a global variable to them?
In PyOpenCL, my kernel code looks like this:
program = cl.Program(context, """
__kernel void Kernel_OVERLAP_BETWEEN_N_IP_GPU(__constant int *FBNs_array,__local int *Binary_IP, __local int *cc,__global const int *olp)
{
function1(int *x, int *y,__global const int *olp);
}
""").build()
Where should I write and store function1? Should I define it in the kernel source itself, or in some other file and provide a path? If it has to be defined elsewhere with a path, please give me some details; I am completely new to C.
Thanks
Like in C: define it before the kernel.
program = cl.Program(context, """
void function1(int *x, int *y)
{
//function1 code
}
__kernel void kernel_name()
{
int x = 0, y = 0;
function1(&x, &y); // call with arguments, not with a declaration
}""").build()
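A global pointer is passed through to the function as an ordinary argument:
program = cl.Program(context, """
void function1(int *x, int *y, __global const int *cc)
{
*x = 10; // write through the pointer so the caller sees the change
}
__kernel void kernel_name(__global const int *cc)
{
int x = 1;
int y[1] = {10};
function1(&x, y, cc); // now x == 10
}""").build()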
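For the "some other file" part of the question, one simple approach is to keep the helper functions in their own file and prepend its text to the kernel source before building. The following is a sketch under assumptions: the file name helpers.cl, the function load_program_source and the helper body are made up for illustration, and the build call is left as a comment so the snippet runs without an OpenCL device.

```python
import tempfile
from pathlib import Path

# Contents of a hypothetical helpers.cl file.
HELPERS = """
void function1(int *x, int *y, __global const int *olp)
{
    *x = olp[0] + *y;
}
"""

KERNEL = """
__kernel void kernel_name(__global const int *olp, __global int *out)
{
    int x = 0;
    int y = 1;
    function1(&x, &y, olp);
    out[get_global_id(0)] = x;
}
"""


def load_program_source(helper_path, kernel_source):
    """Prepend the helper file so its functions are defined before the kernel."""
    return Path(helper_path).read_text() + "\n" + kernel_source


with tempfile.TemporaryDirectory() as tmp:
    helper_file = Path(tmp) / "helpers.cl"
    helper_file.write_text(HELPERS)
    source = load_program_source(helper_file, KERNEL)

# With PyOpenCL and a context available:
#   program = cl.Program(context, source).build()
print(source)
```

OpenCL C also accepts #include in kernel source together with the -I build option for the include directory, which avoids the manual concatenation.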
Is it possible to reinterpret parameters that have been passed into an OpenCL kernel? For example, if I have an array of integers being passed in, but I want to interpret the integer at index 16 as a float (don't ask why!), I would have thought this would work:
__kernel void Test(__global float* im, __constant int* constArray)
{
float x = *( (__constant float*) &constArray[16] );
im[0] = x;
}
However, I get a CL_INVALID_COMMAND_QUEUE error when I next try to use the command queue, implying that the above code has performed an illegal operation.
Any suggestions as to what is wrong with the above, and/or how to achieve the reinterpretation?
I have now tried:
__kernel void Test(__global float* im, __constant int* constArray)
{
float x = as_float(0x3f800000);
im[0] = x;
}
and this does indeed give a 1.0f in im[0]. However,
__kernel void Test(__global float* im, __constant int* constArray)
{
float x = as_float(constArray[16]);
im[0] = x;
}
always results in zero in im[0] regardless of what is in constArray[16].
Regards,
Mark.
OpenCL includes the as_typen family of operators for reinterpret casting of values from one type to another. If I am understanding the question, you should be able to do something like
__kernel void Test(__global float* im, __constant int* constArray)
{
float x = as_float(constArray[16]);
im[0] = x;
}
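The same reinterpretation can be checked on the host, which helps to verify what value the kernel should produce for a given integer. This is a minimal sketch using only the standard library; the Python functions as_float and as_int are named after the OpenCL operators but are otherwise assumptions of this example.

```python
import struct


def as_float(i):
    """Reinterpret the low 32 bits of an integer as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", i & 0xFFFFFFFF))[0]


def as_int(f):
    """Reinterpret an IEEE 754 single as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", f))[0]


print(as_float(0x3F800000))  # bit pattern of 1.0f
print(hex(as_int(1.0)))
```

Storing as_int(value) at index 16 of the host-side integer array and reading it back through as_float in the kernel should then round-trip the float, provided the buffer really contains at least 17 integers.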