I've defined some data structures that implement a register protocol for a Modbus/RS-485 application. I'm compiling this for a Particle Electron board.
How do I add a varying datatype to a structure? I tried (void) as well. Is this even possible?
typedef struct {
    uint16_t registerAddress;
    uint8_t registerSize;
    void* dataType;
    char description[50];
} _rgRegister;
static const _rgRegister PressureParameterRegister[6] = {
    {0x038, 2, float, "Measured value"},
    {0x040, 1, ushort, "Parameter Id = 2 (pressure)"},
    {0x041, 1, ushort, "Units Id"},
    {0x042, 1, ushort, "Data Quality Id"},
    {0x043, 2, float, "Off line sentinel value (default = 0.0)"},
    {0x045, 1, char, "Available Units = 0x0005"}
};
The other option is I declare it as:
char datatype[10];
and pass it as:
_rgRegister.datatype = "float"
And then I have to have some switch statement that casts the data according to the datatype.
If the data type is limited, you can use an enum to represent the data type and a union to represent the data.
enum DataType { DT_CHAR, DT_USHORT, DT_INT, DT_FLOAT, /* ... */ };
typedef struct {
    uint16_t registerAddress;
    uint8_t registerSize;
    DataType dataType;
    union
    {
        char c;
        unsigned short us;
        int i;
        float f;
        /* ... */
    } data;
    char description[50];
} _rgRegister;
static const _rgRegister PressureParameterRegister[6] = {
    {0x038, 2, DT_FLOAT, {0}, "Measured value"},
    {0x040, 1, DT_USHORT, {0}, "Parameter Id = 2 (pressure)"},
    {0x041, 1, DT_USHORT, {0}, "Units Id"},
    {0x042, 1, DT_USHORT, {0}, "Data Quality Id"},
    {0x043, 2, DT_FLOAT, {0}, "Off line sentinel value (default = 0.0)"},
    {0x045, 1, DT_CHAR, {0}, "Available Units = 0x0005"}
};
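The switch statement the question anticipated then becomes a dispatch on the enum tag. A minimal sketch (printRegister is a made-up helper, not part of the original code):
#include <cstdio>
// Sketch: read the union member selected by the dataType tag.
void printRegister(const _rgRegister &reg)
{
    switch (reg.dataType) {
    case DT_FLOAT:  printf("%s = %f\n", reg.description, reg.data.f); break;
    case DT_USHORT: printf("%s = %hu\n", reg.description, reg.data.us); break;
    case DT_CHAR:   printf("%s = %c\n", reg.description, reg.data.c); break;
    default:        break;
    }
}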
If you have the option of using boost, you can use boost::any to simplify your code.
I have a program where I use a vector to simulate all the possible outcomes when counting cards in blackjack. There are only three possible values: -1, 0, and 1. There are 52 cards in a deck, so the vector has 52 elements, each assigned one of the values mentioned above. The program works when I scale down the size of the vector; at this size it still runs, but I get no output, and I get the warning "warning C4267: '=': conversion from 'size_t' to 'int', possible loss of data".
#include <iostream>
#include "subtracter.h"
#include <time.h>
#include <vector>
#include <random>
using namespace std;
int acecard = 4;
int twocard = 4;
int threecard = 4;
int fourcard = 4;
int fivecard = 4;
int sixcard = 4;
int sevencard = 4;
int eightcard = 4;
int ninecard = 4;
int tencard = 16;
// declares how many of each card there is
vector<int> cardvalues = {-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
// a vector that describes how many cards there are with a certain value
vector<int> deck = { acecard, twocard, threecard, fourcard, fivecard, sixcard, sevencard, eightcard, ninecard, tencard };
// a vector keeping track of how many of each cards there's left in the deck
int start()
{
int deckcount;
deckcount = 0;
int decksize;
decksize = cardvalues.size();
while (decksize >= 49)
{
deckcount += cardsubtracter(cardvalues);
};
return deckcount;
}
int cardcounting()
{
int deckcount;
deckcount = start();
deckcount += cardsubtracter(cardvalues);
return deckcount;
}
int main()
{
int value;
value = cardcounting();
int size;
size = cardvalues.size();
cout << value << "\n";
cout << size;
return 0;
}
#include <iostream>
#include <vector> // needed for the vector parameter below
#include <random>
using namespace std;
int numbergenerator(int x, int y)
{
int number;
random_device generator;
uniform_int_distribution<>distrib(x, y);
number = distrib(generator); //picks random element from vector
return number;
}
int cardsubtracter(vector<int> mynum)
{
int counter;
int size;
int number;
size = mynum.size() - 1;//gives the range of values to picked from the vectorlist
number = numbergenerator(0, size);//gives a random number to pick from the vectorlist
counter = mynum[number]; // uses the random number to pick a value from the vectorlist
mynum.erase(mynum.begin()+number); //removes that value from the vectorlist
return counter;
}
I looked up the maximum size of vectors and it said that vectors can hold up to 2^32 values with integers, which should work for this. So I also tried creating a new file and copying the code over to that, in case there was something wrong with this file.
There could be different reasons why a vector fails to deliver all 52 elements. Some possibilities are:
Insufficient memory: each element in a vector requires a certain amount of memory, and the total memory required for all 52 elements may exceed the available memory. This can happen if the elements are large, or if many other variables or data structures consume memory.
Data type limitations: the element type of the vector may not accommodate the values stored in it. For example, if the vector holds plain int, values beyond the int range will overflow or produce incorrect results.
Code errors: there may be errors in the code that prevent all 52 elements from being processed. For example, if the vector is consumed in a loop, there may be a mistake in the loop condition or in the indexing that causes the loop to terminate early, skip elements, or never terminate at all.
To determine the exact reason, it is necessary to examine the code, the data types involved, and the memory usage.
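As an illustration of the third point applied to the code above: cardsubtracter takes the vector by value, so erase() only shrinks a copy, and decksize is computed once before the loop, so while (decksize >= 49) can never terminate, which is consistent with seeing no output. A sketch of the fix (assuming the rest of the program stays as posted):
// Sketch: take the vector by reference so erase() shrinks the caller's
// vector, and re-read the size on every pass so the loop can end.
int cardsubtracter(vector<int> &mynum); // note the &, also in subtracter.h
int start()
{
    int deckcount = 0;
    while (cardvalues.size() >= 49) // recomputed each iteration
    {
        deckcount += cardsubtracter(cardvalues);
    }
    return deckcount;
}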
I know atomic functions with OpenCL 1.x are not recommended, but I just want to understand an atomic example.
The following kernel code does not work well; it produces random final values when computing the sum of all array values (a sum reduction):
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
void atom_add_double(volatile __local double *val, double delta)
{
union {
double f;
ulong i;
} old, new;
do
{
old.f = *val;
new.f = old.f + delta;
}
while (atom_cmpxchg((volatile __local ulong *)val, old.i, new.i) != old.i);
}
__kernel void sumGPU ( __global const double *input,
__local double *localInput,
__global double *finalSum
)
{
uint lid = get_local_id(0);
uint gid = get_global_id(0);
uint localSize = get_local_size(0);
uint groupid = get_group_id(0);
local double partialSum;
local double finalSumTemp;
// Initialize sums
if (lid==0)
{
partialSum = 0.0;
finalSumTemp = 0.0;
}
barrier(CLK_LOCAL_MEM_FENCE);
// Set in local memory
int idx = groupid * localSize + lid;
localInput[lid] = input[idx];
// Compute atom_add into each workGroup
barrier(CLK_LOCAL_MEM_FENCE);
atom_add_double(&partialSum, localInput[lid]);
// See and Check if barrier below is necessary
barrier(CLK_LOCAL_MEM_FENCE);
// Final sum of partialSums
if (lid==0)
{
atom_add_double(&finalSumTemp, partialSum);
*finalSum = finalSumTemp;
}
}
The version with the global id strategy works well, but the version above, which uses local memory (shared memory), doesn't give the expected results (the value of *finalSum is random for each execution).
Here are the buffers and kernel args that I have set in my host code:
// Write to buffers
ret = clEnqueueWriteBuffer(command_queue, inputBuffer, CL_TRUE, 0,
nWorkItems * sizeof(double), xInput, 0, NULL, NULL);
ret = clEnqueueWriteBuffer(command_queue, finalSumBuffer, CL_TRUE, 0,
sizeof(double), finalSumGPU, 0, NULL, NULL);
// Set the arguments of the kernel
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&inputBuffer);
clSetKernelArg(kernel, 1, local_item_size*sizeof(double), NULL);
clSetKernelArg(kernel, 2, sizeof(cl_mem), (void *)&finalSumBuffer);
Finally, I read finalSumBuffer to get the sum value.
I think my issue comes from the kernel code, but I can't find where the error is.
If anyone can see what's wrong, please let me know.
Thanks
UPDATE 1:
I have nearly managed to perform this reduction. Following the suggestions of huseyin tugrul buyukisik, I have modified the kernel code like this:
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
void atom_add_double(volatile __local double *val, double delta)
{
union {
double d;
ulong i;
} old, new;
do
{
old.d = *val;
new.d = old.d + delta;
}
while (atom_cmpxchg((volatile __local ulong *)val, old.i, new.i) != old.i);
}
__kernel void sumGPU ( __global const double *input,
__local double *localInput,
__local double *partialSum,
__global double *finalSum
)
{
uint lid = get_local_id(0);
uint gid = get_global_id(0);
uint localSize = get_local_size(0);
uint groupid = get_group_id(0);
// Initialize partial sums
if (lid==0)
partialSum[groupid] = 0.0;
barrier(CLK_LOCAL_MEM_FENCE);
// Set in local memory
int idx = groupid * localSize + lid;
localInput[lid] = input[idx];
// Compute atom_add into each workGroup
barrier(CLK_LOCAL_MEM_FENCE);
atom_add_double(&partialSum[groupid], localInput[lid]);
// See and Check if barrier below is necessary
barrier(CLK_LOCAL_MEM_FENCE);
// Compute final sum
if (lid==0)
*finalSum += partialSum[groupid];
}
As huseyin said, I don't need atomic functions for the final sum of all partial sums.
So at the end I did:
// Compute final sum
if (lid==0)
*finalSum += partialSum[groupid];
But unfortunately, the final sum doesn't give the expected value, and the value is random (for example, with nwork-items = 1024 and size-WorkGroup = 16, I get random values on the order of [1e+3 - 1e+4] instead of the expected 5.248e+05).
Here is the setting of the arguments in the host code:
// Set the arguments of the kernel
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&inputBuffer);
clSetKernelArg(kernel, 1, local_item_size*sizeof(double), NULL);
clSetKernelArg(kernel, 2, nWorkGroups*sizeof(double), NULL);
clSetKernelArg(kernel, 3, sizeof(cl_mem), (void *)&finalSumBuffer);
Can you see where my error is in the kernel code?
Thanks
Not an error, but a logic issue:
atom_add_double(&finalSumTemp, partialSum);
is executed only once per group (by the zero-local-indexed thread).
So you are just doing
finalSumTemp = partialSum
so atomics are not needed here.
There is a race condition on
*finalSum = finalSumTemp;
between workgroups, where each zero-indexed local thread writes to the same address. So this should either be an atomic addition (for learning purposes), or the per-group results could be written to different cells and added on the host side, such as sum_group1+sum_group2+... = total sum.
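A minimal sketch of that different-cells variant (it reuses the atom_add_double helper and its pragma from the question; the buffer name groupSums is made up):
// Each work-group's zero-indexed thread stores its partial sum in its own
// slot of groupSums, so no cross-group atomics are needed; the host then
// adds groupSums[0..nWorkGroups-1] itself.
__kernel void sumGroups ( __global const double *input,
                          __local double *localInput,
                          __local double *partialSum,
                          __global double *groupSums
                        )
{
    uint lid = get_local_id(0);
    uint localSize = get_local_size(0);
    uint groupid = get_group_id(0);
    if (lid==0)
        partialSum[0] = 0.0;
    barrier(CLK_LOCAL_MEM_FENCE);
    localInput[lid] = input[groupid * localSize + lid];
    barrier(CLK_LOCAL_MEM_FENCE);
    atom_add_double(&partialSum[0], localInput[lid]);
    barrier(CLK_LOCAL_MEM_FENCE);
    if (lid==0)
        groupSums[groupid] = partialSum[0]; // plain store, one cell per group
}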
int idx = groupid * localSize + lid;
localInput[lid] = input[idx];
Using groupid here is suspicious for multi-device summation, because each device has its own global range and workgroup id indexing, so two devices could have the same group id values for two different groups. Some device-related offset should be used when multiple devices are involved, such as:
idx= get_global_id(0) + deviceOffset[deviceId];
Also, if an atomic operation is unavoidable and is performed exactly N times, it could be moved to a single thread (such as the 0-indexed thread) and looped N times (probably faster) in a second kernel, unless that atomic operation's latency can't be hidden by other means.
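A sketch of that second-kernel idea (the kernel and argument names are made up):
// Hypothetical second kernel: a single work-item loops over the per-group
// partial sums, so no atomics are needed for the final total.
__kernel void sumPartials ( __global const double *groupSums,
                            __global double *finalSum,
                            const int nGroups
                          )
{
    if (get_global_id(0) == 0)
    {
        double total = 0.0;
        for (int i = 0; i < nGroups; i++)
            total += groupSums[i];
        *finalSum = total;
    }
}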
I have an array of 100 elements, and what I want to do is copy these 100 elements into every nth element of another array.
Let's say n was 3
The new array would have [val1 0 0 val2 0 0 val3 0 0 ...] after the values were copied to every nth element. Now in OpenCL, I tried creating a pointer that points to the current index, and I simply add n to this value every time. However, the current index always keeps the same value. Below is the code I have.
__kernel void ddc(__global float *inputArray, __global float *outputArray, __const int interpolateFactor, __global int *currentIndex){
int i = get_global_id(0);
outputArray[currentIndex[0]] = inputArray[i];
currentIndex[0] = currentIndex[0] + (interpolateFactor - 1);
printf("index %i \n", currentIndex[0]);
}
Host code for the currentIndex part:
int *index;
index = (int*)malloc(2*sizeof(int));
index[0] = 0;
cl_mem currentIndex;
currentIndex = clCreateBuffer(
context,
CL_MEM_WRITE_ONLY,
2 * sizeof(int),
NULL,
&status);
status = clEnqueueWriteBuffer(
cmdQueue,
currentIndex,
CL_FALSE,
0,
2 * sizeof(int),
index,
0,
NULL,
NULL);
printf("Index enqueueWriteBuffer status: %i \n", status);
status |= clSetKernelArg(
kernel,
4,
sizeof(cl_mem),
&currentIndex);
printf("Kernel Arg currentIndex Factor status: %i \n", status);
If you are wondering why I am using an array with two elements, it's because I wasn't sure how to just reference a single variable. I just implemented it the same way I had the input and output array working. When I run the kernel with an interpolateFactor of 3, currentIndex is always printing 2.
So if I understood right, what you want to do is save the next index that should be used into currentIndex. This will not work: the value will not instantly update for the other work items. If you wanted to do it this way, you would have to execute all the kernels sequentially.
What you could do is
__kernel void ddc(__global float *inputArray, __global float *outputArray, __const int interpolateFactor, int start){
    int i = get_global_id(0);
    outputArray[start + i*interpolateFactor] = inputArray[i]; // writes slots 0, n, 2n, ...
}
assuming you might want to start from a spot other than 0. Otherwise you could just ditch start completely.
To get it working like that you do
int start = 0;
status |= clSetKernelArg(
kernel,
3, // This should be 3 right? Might have been the problem to begin with.
sizeof(int),
&start);
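The kernel would then be launched with one work-item per input element; a sketch of that call (cmdQueue and kernel follow the question's host code, and 100 is the input size from the question):
// One work-item per input element; each writes its own strided slot,
// so no shared index variable is needed.
size_t globalSize = 100;
status = clEnqueueNDRangeKernel(cmdQueue, kernel, 1, NULL,
                                &globalSize, NULL, 0, NULL, NULL);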
Hopefully this helps.
I would like to send an array of strings from the master to a slave thread using the Message Passing Interface (MPI).
i.e. String[] str = new String[10]
str[0] = "XXX" ... etc
How can I do that while avoiding sending each of the elements in this array as a separate chain of characters?
I succeeded in sending an array of integers in one send operation, but I don't know how to do that for an array of strings.
I don't know Java, but I'll give you the C answer. The concepts, particularly the two approaches one might take to solve this, are the same in any language, though.
Imagine if this were a simple c-string (some characters terminated with '\0'). There are two approaches:
over-provision memory and receive up to some limit,
or send a message indicating how much data to expect.
Do you have a maximum length (e.g. PATH_MAX or something like that)? If you do not need every byte of memory, you could do
MPI_Send(str, strlen(str)+1, MPI_CHAR, slave_rank, slave_tag, MPI_COMM_WORLD); /* +1 so the '\0' arrives too */
and you'd pair that with
MPI_Recv(str, MAX_LENGTH, MPI_CHAR, master_rank, slave_tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
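If you'd rather not rely on the terminating '\0', the receiver can instead ask the status how many characters actually arrived. A sketch using MPI_Get_count:
MPI_Status status;
MPI_Recv(str, MAX_LENGTH, MPI_CHAR, master_rank, slave_tag, MPI_COMM_WORLD, &status);
int received;
MPI_Get_count(&status, MPI_CHAR, &received); /* actual number of chars received */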
If you don't like having slop at the end, you'll have to do it in two messages:
len=strlen(str) + 1; /* +1 for the NULL byte */
MPI_Send(&len, 1, MPI_INT, slave_rank, slave_tag, MPI_COMM_WORLD);
MPI_Send(str, len, MPI_CHAR, slave_rank, slave_tag, MPI_COMM_WORLD);
and you'd match that with
MPI_Recv(&len, 1, MPI_INT, master_rank, slave_tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
payload = malloc(len);
MPI_Recv(payload, len, MPI_CHAR, master_rank, slave_tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
Sending arrays of strings, especially if of varying sizes, is quite an involved process. There are several options, but the most MPI-friendly one is to use the packing and unpacking facilities of MPI, exposed in mpiJava as Comm.Pack, Comm.Unpack, and Comm.Pack_size.
You could do something of the sort:
Sender
byte[][] bytes = new byte[nStr][];
int[] lengths = new int[nStr];
int bufLen = MPI.COMM_WORLD.Pack_size(1, MPI.INT);
bufLen += MPI.COMM_WORLD.Pack_size(nStr, MPI.INT);
for (int i = 0; i < nStr; i++) {
bytes[i] = str[i].getBytes(Charset.forName("UTF-8"));
lengths[i] = bytes[i].length;
bufLen += MPI.COMM_WORLD.Pack_size(lengths[i], MPI.BYTE);
}
byte[] buf = new byte[bufLen];
int position = 0;
int nStrArray[] = new int[1];
nStrArray[0] = nStr;
position = MPI.COMM_WORLD.Pack(nStrArray, 0, 1, MPI.INT,
buf, position);
position = MPI.COMM_WORLD.Pack(lengths, 0, nStr, MPI.INT,
buf, position);
for (int i = 0; i < nStr; i++) {
position = MPI.COMM_WORLD.Pack(bytes[i], 0, lengths[i], MPI.BYTE,
buf, position);
}
MPI.COMM_WORLD.Send(buf, 0, bufLen, MPI.PACKED, rank, 0);
Having string lengths in an auxiliary array and packing it at the beginning of the message simplifies the receiver logic.
Receiver
Assumes that the sender is rank 0.
Status status = MPI.COMM_WORLD.Probe(0, 0);
int bufLen = status.Get_count(MPI.PACKED);
byte[] buf = new byte[bufLen];
MPI.COMM_WORLD.Recv(buf, 0, bufLen, MPI.PACKED, status.source, status.tag);
int position = 0;
int nStrArray[] = new int[1];
position = MPI.COMM_WORLD.Unpack(buf, position,
nStrArray, 0, 1, MPI.INT);
int nStr = nStrArray[0];
int lengths[] = new int[nStr];
position = MPI.COMM_WORLD.Unpack(buf, position,
lengths, 0, nStr, MPI.INT);
String[] str = new String[nStr];
for (int i = 0; i < nStr; i++) {
byte[] bytes = new byte[lengths[i]];
position = MPI.COMM_WORLD.Unpack(buf, position,
bytes, 0, lengths[i], MPI.BYTE);
str[i] = new String(bytes, "UTF-8");
}
Disclaimer: I don't have MPJ Express installed and my Java knowledge is very limited. The code is based on the mpiJava specification, the MPJ Express JavaDocs, and some examples found on the Internet.
I have a native kernel setup but I don't know how to convert its void* argument into anything useful. In the native kernel of this snippet, how would I get the int (7) or the int[] (16 ints set to 0)?
void __stdcall nativeKernel(void * args)
{
int a1 = (*(int*)args);
cout << "a1-->: "<< a1 << endl; // gibberish
}
void kernelCaller()
{
const int dim1Size = 16;
int dim1[dim1Size] = {};
cl_int status = 0;
cl_mem mem_d1 = clCreateBuffer(*context, 0, sizeof(int)*dim1Size, NULL, &status);
clEnqueueWriteBuffer(*queue, mem_d1, CL_TRUE, 0, sizeof(int)*dim1Size, dim1, 0, NULL, NULL);
const void* args[2] = {(void*)7, NULL};
cl_mem mem_list[1] = {mem_d1};
const void* args_mem_loc[1] = {&args[1]};
cl_event run;
status = clEnqueueNativeKernel(*queue, nativeKernel, args, 2, 1, mem_list, args_mem_loc, 0, NULL, &run);
status = clEnqueueReadBuffer(*queue, mem_d1, CL_TRUE, 0, sizeof(int)*dim1Size, dim1, 1, &run, NULL);
for(auto i = 0; i != dim1Size; i++)
cout << dim1[i] << " ";
}
Instead of fighting with void*, I would suggest using a struct.
create your parameter structure like:
struct myparams{
    int a;
    int b[3];
};
and then create and fill one struct myparams in your program and pass its address to the kernel caller:
struct myparams params;
params.a=3;
status = clEnqueueNativeKernel(*queue, nativeKernel, (void*)&params, sizeof(params), 1, mem_list, args_mem_loc, 0, NULL, &run);
(Note that the fourth argument, cb_args, must be the size of the parameter block so the runtime copies the whole struct.)
and in the nativeKernel just unbox the void* into your parameter struct:
struct myparams *params = (struct myparams*)args;
Beware: in the example above I passed a pointer to a stack variable... you might not want that ;)
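For completeness, a minimal sketch of what the native kernel body could then look like (the field names follow the hypothetical myparams above):
// Unbox the void* into the parameter struct and read its fields.
void __stdcall nativeKernel(void *args)
{
    struct myparams *params = (struct myparams*)args;
    cout << "a: " << params->a << endl; // prints 3 for the fill above
    cout << "b[0]: " << params->b[0] << endl;
}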