OpenCL invalid command queue error when copying from private memory to global - opencl

I am trying to fix an error in the program and I pinpointed it to the really small area.
Whenever I am trying to copy data from private memory of the device into the global memory, command queue gets invalidated, and clFinish() returns an error.
Consider a simple example:
kernel void example(global int *data, const int width) {
int id = get_global_id(0);
if (id == 0) {
int copy[width]; // private memory?
for (int i = 0; i < width; i++) {
copy[i] = data[i]; // works
data[i] = copy[i]; // works
}
// whenever this loop is here
// i get invalid command queue from clFinish
for (int i = 0; i < width; i++) {
data[i] = copy[i];
}
}
}
So can somebody explain to me why is that the reason?
Thank you

If the width does exceed the maximum size, the private memory will be fine. I recommend you to run the kernel with width=8/16, for example, and see the result. If you used to pass a large value for width. It might not be possible to hold all data in the private memory. They are registers and have very limited size.

Related

Receiving a string through UART in STM32F4

I've written this code to receive a series of char variable through USART6 and have them stored in a string. But the problem is first received value is just a junk! Any help would be appreciated in advance.
while(1)
{
//memset(RxBuffer, 0, sizeof(RxBuffer));
i = 0;
requestRead(&dt, 1);
RxBuffer[i++] = dt;
while (i < 11)
{
requestRead(&dt, 1);
RxBuffer[i++] = dt;
HAL_Delay(5);
}
function prototype
static void requestRead(char *buffer, uint16_t length)
{
while (HAL_UART_Receive_IT(&huart6, buffer, length) != HAL_OK)
HAL_Delay(10);
}
First of all, the HAL_Delay seems to be redundant. Is there any particular reason for it?
The HAL_UART_Receive_IT function is used for non-blocking mode. What you have written seems to be more like blocking mode, which uses the HAL_UART_Receive function.
Also, I belive you need something like this:
Somewhere in the main:
// global variables
volatile uint8_t Rx_byte;
volatile uint8_t Rx_data[10];
volatile uint8_t Rx_indx = 0;
HAL_UART_Receive_IT(&huart1, &Rx_byte, 1);
And then the callback function:
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
if (huart->Instance == UART1) { // Current UART
Rx_data[Rx_indx++] = Rx_byte; // Add data to Rx_Buffer
}
HAL_UART_Receive_IT(&huart1, &Rx_byte, 1);
}
The idea is to receive always only one byte and save it into an array. Then somewhere check the number of received bytes or some pattern check, etc and then process the received frame.
On the other side, if the number of bytes is always same, you can change the "HAL_UART_Receive_IT" function and set the correct bytes count.

AsyncTCP on ESP32 and Odd Heap/Socket Issues w/SOFTAP

I'm struggling with an issue where an ESP32 is running as a AP with AsyncTCP connecting multiple ESP32 clients. The AP receives some JSON data and replies with some JSON data. Without the handleData() function, the code runs 100% fine with no issues. Heap is static when no clients connect and issues only occur when clients start connecting.
Can anyone see anything with my code that could be causing heap corruption or other memory weirdness?
static void handleData(void* arg, AsyncClient* client, void *data, size_t len) {
int i = 0, j = 0;
char clientData[CLIENT_DATA_MAX];
char packetData[len];
char *packetBuf;
packetBuf = (char *)data;
clientData[0] = '\0';
for (i=0;i <= len;i++) {
packetData[j] = packetBuf[i]; //packetBuf[i];
if ((packetData[j] == '\n') || (i == len)) {
packetData[j] = '\0';
if ((j > 0) && (packetData[0] != '\n') && (packetData[0] != '\r')) {
// See sensorData() below...
parseData.function(packetData, clientData);
if (clientData != NULL) {
// TCP reply to client
if (client->space() > 32 && client->canSend()) {
client->write(clientData);
}
}
}
j = 0;
} else
j++;
}
}
void sensorData(void *data, void *retData) {
StaticJsonDocument<CLIENT_DATA_MAX> fields;
StaticJsonDocument<CLIENT_DATA_MAX> output;
char sensor[15] = "\0";
char MAC[18] = "\0";
char value[20] = "\0";
bool sendOK = false;
memcpy((char *)retData, "\0", 1);
DeserializationError error = deserializeJson(fields, (char *)data, CLIENT_DATA_MAX);
if (error) {
DEBUG_PRINTLN(F("deserializeJson() failed"));
return;
}
if (fields["type"])
strcpy(sensor, fields["type"]);
switch (sensor[0]) {
case 'C':
if (fields["value"])
strcpy(value, fields["value"]);
sendOK = true;
break;
case 'T': //DEBUG_PRINT(F("Temp "));
setExtTempSensor(fields["value"]);
sendOK = true;
break;
case 'N':
output["IT"] = intTempC; //Internal temp
output["B1"] = battLevels[0];
serializeJson(output, (char *)retData, CLIENT_DATA_MAX-1);
break;
}
if (sendOK) {
output["Resp"] = "Ok";
serializeJson(output, (char *)retData, CLIENT_DATA_MAX-1);
}
strcat((char *)retData, "\n");
}
static void handleNewClient(void* arg, AsyncClient* client) {
client->setRxTimeout(1000);
client->setAckTimeout(500);
client->onData(&handleData, NULL);
client->onError(&handleError, NULL);
client->onDisconnect(&handleDisconnect, NULL);
client->onTimeout(&handleTimeOut, NULL);
}
void startServer() {
server = new AsyncServer(WIFI_SERVER_PORT);
server->onClient(&handleNewClient, &server)
}
Using AsyncTCP on the ESP32 was having multiple issues. Heap issues, socket issues, assert issues, ACK timeouts, connection timeouts, etc. Swapping to AsyncUDP using the exact same code as shown above with romkey's changes, resolved all of my issues. (Just using romkey's fixes did not fix the errors I was having with AsyncTCP.) I don't believe the issue is with AsyncTCP but with ESP32 libraries.
Either you should declare packetData to be of length len + 1 or your for loop should iterate until i < len. Because the index starts at 0, packetData[len] is actually byte len + 1, so you'll overwrite something random when you store something in packetData[len] if the array is only len chars long.That something random may be the pointer stored in packetBuf, which could easily cause heap corruption.
You should always use strncpy() and never strcpy(). Likewise use strncat() rather than strcat(). Don't depend on having done the math correctly or on sizes not changing as your code evolves. strncpy() and strncat() will guard against overflows. You'll need to pass a length into sensorData() to do that, but sensorData() shouldn't be making assumptions about the available length of retData.
Your test
if (clientData != NULL) {
will never fail because clientData is the address of array and cannot change. I'm not sure what you're trying to test for here but this if will always succeed.
You can just write:
char sensor[15] = "";
you don't need to explicitly assign a string with a null byte in it.
And
memcpy((char *)retData, "\0", 1);
is equivalent to
((char *)retData)[0] = '\0';
What's the point of declaring retData to be void * in the arguments to sensorData()? Your code starts out with it being a char* before calling sensorData() and uses it as a char* inside sensorData(). void * is meant to be an escape hatch for passing around pointers without worrying about their type. You don't need that here and end up needing to extra casts back to char* because of it. Just declare the argument to be char* and don't worry about casting it again.
You didn't share the code that calls handleData() so there may well be issues outside of these functions.

Arduino Dynamic Two-dimensional array

I'm working on an Arduino project where I need to build (and work with) a two-dimensional array at runtime. I've been poking around looking for a solution, but I've had no luck. I found an example of a dynamic one-dimentional array helper here: http://playground.arduino.cc/Code/DynamicArrayHelper, so i've been trying to adopt that code for my use. I created a library using the following code:
My Header file:
#ifndef Dynamic2DArray_h
#define Dynamic2DArray_h
#include "Arduino.h"
class Dynamic2DArray
{
public:
Dynamic2DArray( bool sorted );
//Add an integer pair to the array
bool add( int v1, int v2);
//Clear out (empty) the array
bool clear();
//Get the array item in the specified row, column
int getValue(int row, int col);
//Get the number of rows in the array
int length();
private:
int _rows;
void * _slots;
bool _sorted;
void _sort();
};
#endif
The library's code:
#include "Arduino.h"
#include "Dynamic2DArray.h"
#define ARRAY_COLUMNS 2
int _rows;
void * _slots;
bool _sorted;
Dynamic2DArray::Dynamic2DArray(bool sorted) {
//Set our local value indicating where we're supposed to
//sort or not
_sorted = sorted;
//Initialize the row count so it starts at zero
_rows = 0;
}
bool Dynamic2DArray::add( int v1, int v2) {
//Add the values to the array
//implementation adapted from http://playground.arduino.cc/Code/DynamicArrayHelper
//Allocate memory based on the size of the current array rows plus one (the new row)
int elementSize = sizeof(int) * ARRAY_COLUMNS;
//calculate how much memory the current array is using
int currentBufferSize = elementSize * _rows;
//calculate how much memory the new array will use
int newBufferSize = elementSize * (_rows + 1);
//allocate memory for the new array (which should be bigger than the old one)
void * newArray = malloc ( newBufferSize );
//Does newArray not point to something (a memory address)?
if (newArray == 0) {
//Then malloc failed, so return false
return false;
}
// copy the data from the old array, to the new array
for (int idx = 0; idx < currentBufferSize ; idx++)
{
((byte*)newArray)[idx] = ((byte *)_slots)[idx];
}
// free the original array
if (_slots != NULL)
{
free(_slots);
}
// clear the newly allocated memory space (the new row)
for (int idx = currentBufferSize; idx < newBufferSize; idx++)
{
((byte *)newArray)[idx] = 0;
}
// Store the number of rows the memory is allocated for
_rows = ++_rows;
// set the array to the newly created array
_slots = newArray;
//Free up the memory used by the new array
free(newArray);
//If the array's supposed to be sorted,
//then sort it
if (_sorted) {
_sort();
}
// success
return true;
};
int Dynamic2DArray::length() {
return _rows;
};
bool Dynamic2DArray::clear() {
//Free up the memory allocated to the _slots array
free(_slots);
//And zero out the row count
_rows = 0;
};
int Dynamic2DArray::getValue(int row, int col) {
//do we have a valid row/col?
if ((row < _rows) && (col < ARRAY_COLUMNS)) {
//Return the array value at that row/col
return _slots[row][col];
} else {
//No? Then there's nothing we can do here
return -1;
}
};
//Sorted probably doesn't matter, I can probably ignore this one
void _sort() {
}
The initial assignment of the _slots value is giving me problems, I don't know how to define it so this code builds. The _slots variable is supposed to point to the dynamic array, but I've got it wrong.
When I try to compile the code into my project's code, I get the following:
Arduino: 1.8.0 (Windows 10), Board: "Pro Trinket 3V/12MHz (USB)"
sketch\Dynamic2DArray.cpp: In member function 'int Dynamic2DArray::getValue(int, int)':
sketch\Dynamic2DArray.cpp:83:22: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
return _slots[row][col];
^
Dynamic2DArray.cpp:83: error: 'void*' is not a pointer-to-object type
Can someone please help me fix this code? I've posted the files to https://github.com/johnwargo/Arduino-Dynamic-2D-Array-Lib.
The code you took was for a 1D dynamic array; the modifications for a 2D array are too tricky. Give up these horrors.
I think there is no reason you use dynamic array. You can assume that size max is ROW_MAX * COL_MAX, so you can define a static array int array[ROW_MAX][COL_MAX].
on one hand if you defined a dynamic array, you could free space when you dont use it anymore and take advantage of it for other work. I dont know if this is your case.
on the other hand if you define a static array (on UNO), you have 32kB available on program space, instead of 2kB available on RAM.
Because of the difference 32kB / 2kB, there are very few chances you can get bigger array with dynamic allocation.

Multiple read-write synchronization issues in opencl local and global memories

I have an opencl kernel that finds the maximum ASCII character in a string.
The problem is I cannot synchronize the multiple read-writes to global and local memories.
I am trying to update a local_maximum character in shared memory, and at the end of the workgroup (last thread), the global_maximum character, by comparing it with the local_maximum. The threads are writing one over another, I guess.
eg: Input string: "pirates of the carribean".
Output String: 'r' (but it should be 's').
Please have a look at the code and give a solution as to what I can do to get everything synchronized. I am sure people having sound knowledge can understand the code. Optimization tips are welcome.
The code is below:
__kernel void find_highest_ascii( __global const char* data, __global char* result, unsigned int size, __local char* localMaxC )
{
//creating variables and initialising..
unsigned int i, localSize, globalSize, j;
char privateMaxC,temp,temp1;
i = get_global_id(0);
localSize = get_local_size(0);
globalSize = get_global_size(0);
privateMaxC = '\0';
if(i<size){
if(i == 0)
read_mem_fence( CLK_LOCAL_MEM_FENCE );
*localMaxC = '\0';
mem_fence( CLK_LOCAL_MEM_FENCE);
////////////////////////////////////////////////////
/////UPDATING PRIVATE MAX CHARACTER/////////////////
////////////////////////////////////////////////////
for( j = i; j<size; j+=globalSize )
{
if( data[j] > privateMaxC )
{
privateMaxC = data[j];
}
}
///////////////////////////////////////////////////
///////////////////////////////////////////////////
////UPDATING SHARED MAX CHARACTER//////////////////
///////////////////////////////////////////////////
temp = *localMaxC;
read_mem_fence( CLK_LOCAL_MEM_FENCE );
if(privateMaxC>temp)
{
*localMaxC = privateMaxC;
write_mem_fence( CLK_LOCAL_MEM_FENCE );
temp = privateMaxC;
}
//////////////////////////////////////////////////
//UPDATING GLOBAL MAX CHARACTER.
temp1 = *result;
if(( (i+1)%localSize == 0 || i==size-1) && (temp > temp1 ))
{
read_mem_fence( CLK_GLOBAL_MEM_FENCE );
*result = temp;
write_mem_fence( CLK_GLOBAL_MEM_FENCE );
}
}
}
You are correct that threads will be overwriting each other's values, since your code is riddled with race conditions. In OpenCL, there is no way to synchronise between work-items that are in different work-groups. Instead of trying to achieve this kind of synchronisation with explicit fences, you can make your code much simpler by using the built-in atomic functions instead. In particular, there is an atomic_max built-in which solves your problem perfectly.
So, instead of the code you currently have to update both your local and global memory maximum values, just do something like this:
kernel void ascii_max(global int *input, global int *output, int size,
local int *localMax)
{
int i = get_global_id(0);
int l = get_local_id(0);
// Private reduction
int privateMax = '\0';
for (int idx = i; idx < size; idx+=get_global_size(0))
{
privateMax = max(privateMax, input[idx]);
}
// Local reduction
atomic_max(localMax, privateMax);
barrier(CLK_LOCAL_MEM_FENCE);
// Global reduction
if (l == 0)
{
atomic_max(output, *localMax);
}
}
This will require you to update your local memory scratch space and final result to use 32-bit integer values, but on the whole is a significantly cleaner approach to solving this problem (not to mention it actually works).
NON-ATOMIC SOLUTION
If you really don't want to use atomics, then you can implement a bog-standard reduction using local memory and work-group barriers. Here's an example:
kernel void ascii_max(global int *input, global int *output, int size,
local int *localMax)
{
int i = get_global_id(0);
int l = get_local_id(0);
// Private reduction
int privateMax = '\0';
for (int idx = i; idx < size; idx+=get_global_size(0))
{
privateMax = max(privateMax, input[idx]);
}
// Local reduction
localMax[l] = privateMax;
for (int offset = get_local_size(0)/2; offset > 1; offset>>=1)
{
barrier(CLK_LOCAL_MEM_FENCE);
if (l < offset)
{
localMax[l] = max(localMax[l], localMax[l+offset]);
}
}
// Store work-group result in global memory
if (l == 0)
{
output[get_group_id(0)] = max(localMax[0], localMax[1]);
}
}
This compares pairs of elements at a time using local memory as a scratch space. Each work-group will produce a single result, which is stored in global memory. If your data-set is small, you could run this with a single work-group (i.e. make global and local sizes the same), and this will work just fine. If it is larger, you could run a two-stage reduction by running this kernel twice, e.g.:
size_t N = ...; // something big
size_t local = 128;
size_t global = local*local; // Must result in at most 'local' number of work-groups
// First pass - run many work-groups using temporary buffer as output
clSetKernelArg(kernel, 1, sizeof(cl_mem), d_temp);
clEnqueueNDRangeKernel(..., &global, &local, ...);
// Second pass - run one work-group with temporary buffer as input
global = local;
clSetKernelArg(kernel, 0, sizeof(cl_mem), d_temp);
clSetKernelArg(kernel, 1, sizeof(cl_mem), d_output);
clEnqueueNDRangeKernel(..., &global, &local, ...);
I'll leave it to you to run them and decide which approach would be best for your own data-set.

Program fails when trying to add a pointer to an array inside a function (C)

I cannot get this code to work properly. When I try to compile it, one of three things will happen: Either I'll get no errors, but when I run the program, it immediately locks up; or it'll compile fine, but says 'Segmentation fault' and exits when I run it; or it gives warnings when compiled:
"conflicting types for ‘addObjToTree’
previous implicit declaration of ‘addObjToTree’ was here"
but then says 'Segmentation fault' and exits when I try to run it.
I'm on Mac OS X 10.6 using gcc.
game-obj.h:
typedef struct itemPos {
float x;
float y;
} itemPos;
typedef struct gameObject {
itemPos loc;
int uid;
int kind;
int isEmpty;
...
} gameObject;
internal-routines.h:
void addObjToTree (gameObject *targetObj, gameObject *destTree[]) {
int i = 0;
int stop = 1;
while (stop) {
if ((*destTree[i]).isEmpty == 0)
i++;
else if ((*destTree[i]).isEmpty == 1)
stop = 0;
else
;/*ERROR*/
}
if (stop == 0) {
destTree[i] = targetObj;
}
else
{
;/*ERROR*/
}
}
/**/
void initFS_LA (gameObject *target, gameObject *tree[], itemPos destination) {
addObjToTree(target, tree);
(*target).uid = 12981;
(*target).kind = 101;
(*target).isEmpty = 0;
(*target).maxHealth = 100;
(*target).absMaxHealth = 200;
(*target).curHealth = 100;
(*target).skill = 1;
(*target).isSolid = 1;
(*target).factionID = 555;
(*target).loc.x = destination.x;
(*target).loc.y = destination.y;
}
main.c:
#include "game-obj.h"
#include "internal-routines.h"
#include <stdio.h>
int main()
{
gameObject abc;
gameObject jkl;
abc.kind = 101;
abc.uid = 1000;
itemPos aloc;
aloc.x = 10;
aloc.y = 15;
gameObject *masterTree[3];
masterTree[0] = &(abc);
initFS_LA(&jkl, masterTree, aloc);
printf("%d\n",jkl.factionID);
return 0;
}
I don't understand why it doesn't work. I just want addObjToTree(...) to add a pointer to a gameObject in the next free space of masterTree, which is an array of pointers to gameObject structures. even weirder, if I remove the line addObjToTree(target, tree); from initFS_LA(...) it works perfectly. I've already created a function that searches masterTree by uid and that also works fine, even if I initialize a new gameObject with initFS_LA(...) (without the addObjToTree line.) I've tried rearranging the functions within the header file, putting them into separate header files, prototyping them, rearranging the order of #includes, explicitly creating a pointer variable instead of using &jkl, but absolutely nothing works. Any ideas? I appreciate any help
If I see this correctly, then you don't initialize elements 1 and 2 of the masterTree array anywhere. Then, your addObjToTree() function searches the - uninitialized - array for a free element.
Declaring a variable like gameObject *masterTree[3]; in C does not zero-initialize the array. Add some memset (masterTree, 0, sizeof (masterTree)); to initialize.
Note that you're declaring an array of pointers to structs here, not an array of structs (see also here), so you also need to adjust your addObjToTree() to check for a NULL-pointer instead of isEmpty.
It would also be good practice to pass the length of that array to that function to avoid buffer overruns.
If you want an array of structs, then you need to declare it as gameObject masterTree[3]; and the parameter in your addObjToTree() becomes gameObject *tree.

Resources