Using bluetooth with qt in linux - qt

I've wrote a program in C to connect the pc with a device by bluetooth. The program runs from terminal and the data received is shown in terminal as well. So far so good.
Now I've created a gui in qt, in which the main aim is to present the information which was before shown in terminal, now in qwtplots.
Well, I can so far connect the device with pc with the gui, but when I request the information form the device, it is shown in the terminal but the gui starts non responding.
here's the slot that requests the information from the device:
// Main Bluetooth
void gui::main_b()
{
// BLUETOOTH STUFF
int status, bytes_read;
int conta = 0;
FILE *data = NULL;
fd_set readmask;
struct timeval tv;
char buf[101];
int v, v1, v2;
tv.tv_sec = 0;
tv.tv_usec = 100000;
// Standard messages
char *startstr = "#START,0060,FF,12;";
write (sock, startstr, strlen (startstr));
data = fopen ("data.txt", "w");
while (conta < 100)
{
int i;
memset (buf, 0, 100);
FD_ZERO (&readmask);
FD_SET (sock, &readmask);
if (select (255, &readmask, NULL, NULL, &tv) > 0)
{
if (FD_ISSET (sock, &readmask))
{
int numb;
numb = read (sock, buf, 100);
// 12 bits
if (ui->comboBox->currentIndex() == 1)
{
if (numb == 14)
{
conta++;
//printf ("received %d bytes:\n", numb);
// print of counter
//printf ("%d,", buf[0]);
fprintf (data, "%d,", buf[0]);
for (i = 1; i < numb-1; i += 3)
{
v1 = buf[i] | ((buf[i + 1] & 0x0F) << 8);
v2 = buf[i + 2];
v2 = (v2 << 4) | ((buf[i + 1] & 0xf0) >> 4);
printf ("%d,%d,", v1, v2);
//fprintf (data, "%d,%d,", v1, v2);
}
printf ("\n");
//fprintf (data, "\n");
}
}
}
}
}
fclose (data);
}
so, when i click the button which calls this slot, it will never let me use the gui again.
This works in terminal.
thanks in advance.

Instead of your own select, you should use QSocketNotifier class and give your own file handles for Qt event loop.
You can also use this overload of QFile::open to turn your socket into a QIODevice instance.
Third choice is to put your own select loop into a different thread, so it does not block the Qt main event loop. But that is going to bring quite a lot of extra complexity, so I'd do that only as a last resort.

You are running the while loop in the same thread as the GUI so the event queue is blocked. You have two choices:
During the loop, call QCoreApplication::processEvents(). This forces the event queue to be processed.
Separate the while loop logic into it's own thread.
The first one is much simpler, but is generally considered inefficient, as just all about all computers have multiple cores.

Related

How to use read() to get the adress of a pointer?

I have 2 programs communicating with each other through a fifo, one's the writer the other's the reader.
The writer sends a pointer to a struct containing information.
The reader should receive the pointer and be able to see the information inside the struct.
Header file:
typedef struct req{
int _code;
char _client_pipe[PIPENAME];
char _box_name[BOXNAME];
} request;
/*writes to pipe tx a pointer with information*/
void send_request(int tx, request *r1) {
ssize_t ret = write(tx, &r1, sizeof(r1));
if (ret < 0) {
fprintf(stdout, "ERROR: %s\n", ERROR_WRITING_PIPE);
exit(EXIT_FAILURE);
}
}
/*Returns a pointer to a struct containing the request*/
request *serialize(int code, char* client_pipe, char* box_name){
request *r1 = (request*) malloc(sizeof(request));
r1->_code = code;
strcpy(r1->_client_pipe, client_pipe);
strcpy(r1->_box_name, box_name);
return r1;
}
Program writer:
int main(int argc, char **argv){
(void *) argc; // in my program i used argc, but for this problem it's not important hence why the //typecast to void
char register_pipe[PIPENAME];
char personal_pipe[PIPENAME];
char box_name[BOXNAME];
strcpy(register_pipe, argv[1]);
strcpy(personal_pipe, argv[2]);
strcpy(box_name, argv[3]);
int reg_pipe = open(register_pipe, O_WRONLY);
if (reg_pipe == -1) {
fprintf(stdout, "ERROR: %s\n", UNEXISTENT_PIPE);
return -1;
}
send_request(reg_pipe, serialize(1, personal_pipe, box_name));
}
Program reader:
char register_pipe[PIPENAME];
strcpy(register_pipe, argv[1]);
if(mkfifo(register_pipe, 0644) < 0)
exit(EXIT_FAILURE);
if ((reg_pipe = open(register_pipe, O_RDONLY)) < 0){
exit(EXIT_FAILURE);
}
if ((reg_pipe = open(register_pipe, O_RDONLY)) < 0){
exit(EXIT_FAILURE);
}
request* buffer = (request*) malloc(sizeof(request)); //this might be the issue but not sure
ssize_t broker_read= read(reg_pipe, buffer, 256); //is not reading correctly
printf("%d, %s, %s\n", buffer->_code, buffer->_client_pipe, buffer->_box_name);
So if i start program reader and set register pipe as "reg", this will create the register pipe and wait for someone to join it.
Then if i start the program writer like ./writer reg personal box
this will open the reg pipe correctly, create a struct of type request and then sent it to the reader.
The reader should receive a pointer to a struct req set like:
_code = 1;
_client_pipe[PIPENAME] = "personal";
_box_name[BOXNAME] = "box";
The reader is in fact receiving but for some reason it's not receiving correctly.
If i try to print like in the last line, it will output some random numbers and letters.
How can i fix this?
You would need to have that structure exist inside a shared memory region that you have arranged to be mapped into both processes at the same address.
Without some such arrangement, each process has a private address space, so an address known to process A is meaningless to process B.
How to make such an arrangement is very much dependent upon you operating system, and perhaps even variant of said operating system.
You will likely find it easier to just copy the structure, as opposed to its address, via the fifo.

Receiving a string through UART in STM32F4

I've written this code to receive a series of char variable through USART6 and have them stored in a string. But the problem is first received value is just a junk! Any help would be appreciated in advance.
while(1)
{
//memset(RxBuffer, 0, sizeof(RxBuffer));
i = 0;
requestRead(&dt, 1);
RxBuffer[i++] = dt;
while (i < 11)
{
requestRead(&dt, 1);
RxBuffer[i++] = dt;
HAL_Delay(5);
}
function prototype
static void requestRead(char *buffer, uint16_t length)
{
while (HAL_UART_Receive_IT(&huart6, buffer, length) != HAL_OK)
HAL_Delay(10);
}
First of all, the HAL_Delay seems to be redundant. Is there any particular reason for it?
The HAL_UART_Receive_IT function is used for non-blocking mode. What you have written seems to be more like blocking mode, which uses the HAL_UART_Receive function.
Also, I belive you need something like this:
Somewhere in the main:
// global variables
volatile uint8_t Rx_byte;
volatile uint8_t Rx_data[10];
volatile uint8_t Rx_indx = 0;
HAL_UART_Receive_IT(&huart1, &Rx_byte, 1);
And then the callback function:
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
if (huart->Instance == UART1) { // Current UART
Rx_data[Rx_indx++] = Rx_byte; // Add data to Rx_Buffer
}
HAL_UART_Receive_IT(&huart1, &Rx_byte, 1);
}
The idea is to receive always only one byte and save it into an array. Then somewhere check the number of received bytes or some pattern check, etc and then process the received frame.
On the other side, if the number of bytes is always same, you can change the "HAL_UART_Receive_IT" function and set the correct bytes count.

AsyncTCP on ESP32 and Odd Heap/Socket Issues w/SOFTAP

I'm struggling with an issue where an ESP32 is running as a AP with AsyncTCP connecting multiple ESP32 clients. The AP receives some JSON data and replies with some JSON data. Without the handleData() function, the code runs 100% fine with no issues. Heap is static when no clients connect and issues only occur when clients start connecting.
Can anyone see anything with my code that could be causing heap corruption or other memory weirdness?
static void handleData(void* arg, AsyncClient* client, void *data, size_t len) {
int i = 0, j = 0;
char clientData[CLIENT_DATA_MAX];
char packetData[len];
char *packetBuf;
packetBuf = (char *)data;
clientData[0] = '\0';
for (i=0;i <= len;i++) {
packetData[j] = packetBuf[i]; //packetBuf[i];
if ((packetData[j] == '\n') || (i == len)) {
packetData[j] = '\0';
if ((j > 0) && (packetData[0] != '\n') && (packetData[0] != '\r')) {
// See sensorData() below...
parseData.function(packetData, clientData);
if (clientData != NULL) {
// TCP reply to client
if (client->space() > 32 && client->canSend()) {
client->write(clientData);
}
}
}
j = 0;
} else
j++;
}
}
void sensorData(void *data, void *retData) {
StaticJsonDocument<CLIENT_DATA_MAX> fields;
StaticJsonDocument<CLIENT_DATA_MAX> output;
char sensor[15] = "\0";
char MAC[18] = "\0";
char value[20] = "\0";
bool sendOK = false;
memcpy((char *)retData, "\0", 1);
DeserializationError error = deserializeJson(fields, (char *)data, CLIENT_DATA_MAX);
if (error) {
DEBUG_PRINTLN(F("deserializeJson() failed"));
return;
}
if (fields["type"])
strcpy(sensor, fields["type"]);
switch (sensor[0]) {
case 'C':
if (fields["value"])
strcpy(value, fields["value"]);
sendOK = true;
break;
case 'T': //DEBUG_PRINT(F("Temp "));
setExtTempSensor(fields["value"]);
sendOK = true;
break;
case 'N':
output["IT"] = intTempC; //Internal temp
output["B1"] = battLevels[0];
serializeJson(output, (char *)retData, CLIENT_DATA_MAX-1);
break;
}
if (sendOK) {
output["Resp"] = "Ok";
serializeJson(output, (char *)retData, CLIENT_DATA_MAX-1);
}
strcat((char *)retData, "\n");
}
static void handleNewClient(void* arg, AsyncClient* client) {
client->setRxTimeout(1000);
client->setAckTimeout(500);
client->onData(&handleData, NULL);
client->onError(&handleError, NULL);
client->onDisconnect(&handleDisconnect, NULL);
client->onTimeout(&handleTimeOut, NULL);
}
void startServer() {
server = new AsyncServer(WIFI_SERVER_PORT);
server->onClient(&handleNewClient, &server)
}
Using AsyncTCP on the ESP32 was having multiple issues. Heap issues, socket issues, assert issues, ACK timeouts, connection timeouts, etc. Swapping to AsyncUDP using the exact same code as shown above with romkey's changes, resolved all of my issues. (Just using romkey's fixes did not fix the errors I was having with AsyncTCP.) I don't believe the issue is with AsyncTCP but with ESP32 libraries.
Either you should declare packetData to be of length len + 1 or your for loop should iterate until i < len. Because the index starts at 0, packetData[len] is actually byte len + 1, so you'll overwrite something random when you store something in packetData[len] if the array is only len chars long.That something random may be the pointer stored in packetBuf, which could easily cause heap corruption.
You should always use strncpy() and never strcpy(). Likewise use strncat() rather than strcat(). Don't depend on having done the math correctly or on sizes not changing as your code evolves. strncpy() and strncat() will guard against overflows. You'll need to pass a length into sensorData() to do that, but sensorData() shouldn't be making assumptions about the available length of retData.
Your test
if (clientData != NULL) {
will never fail because clientData is the address of array and cannot change. I'm not sure what you're trying to test for here but this if will always succeed.
You can just write:
char sensor[15] = "";
you don't need to explicitly assign a string with a null byte in it.
And
memcpy((char *)retData, "\0", 1);
is equivalent to
((char *)retData)[0] = '\0';
What's the point of declaring retData to be void * in the arguments to sensorData()? Your code starts out with it being a char* before calling sensorData() and uses it as a char* inside sensorData(). void * is meant to be an escape hatch for passing around pointers without worrying about their type. You don't need that here and end up needing to extra casts back to char* because of it. Just declare the argument to be char* and don't worry about casting it again.
You didn't share the code that calls handleData() so there may well be issues outside of these functions.

OpenCL, Understanding VectorAdd program

I'm new to OpenCL, with very limited background in C/C++.
I've been given this OpenCL program that adds two vectors, and supposed to figure out how it works. It comes from Intel:
https://www.intel.com/content/www/us/en/programmable/support/support-resources/design-examples/design-software/opencl/vector-addition.html
Would it be correct to say: each kernel uses 1 element from A and 1 element from B to calculate 1 element of Z?
To me, it looks like it determines the number of devices (num_devices), and essentially divides the problem size (N) by num_devices, to determine the number of elements per device (n_per_device[]). Then it creates arrays of random numbers for each device (input_a[] and input_b[]) with n_per_device number of elements.
Then these arrays are used by the kernel, where addition of the whole array is performed and stored as Z.
For example, say if the number of devices available is 1000, and problem size (N) is 1,000,000; the n_per_device is 1000 (and since there is no remainder it is the same for all), and it would generate 1000 arrays of input_a and input_b, with 1000 elements in each. Then a respective pair of arrays of 1000 elements are taken by the kernel and added together - in other words each execution of the kernel adds 1000 elements?
Am I following anything, or totally wrong here?
The kernel is:
// ACL kernel for adding two input vectors
__kernel void vectorAdd(__global const float *x,
__global const float *y,
__global float *restrict z)
{
// get index of the work item
int index = get_global_id(0);
// add the vector elements
z[index] = x[index] + y[index];
}
The host (main) code is (sorry it is long, not sure what's not important):
///////////////////////////////////////////////////////////////////////////////////
// This host program executes a vector addition kernel to perform:
// C = A + B
// where A, B and C are vectors with N elements.
//
// This host program supports partitioning the problem across multiple OpenCL
// devices if available. If there are M available devices, the problem is
// divided so that each device operates on N/M points. The host program
// assumes that all devices are of the same type (that is, the same binary can
// be used), but the code can be generalized to support different device types
// easily.
//
// Verification is performed against the same computation on the host CPU.
///////////////////////////////////////////////////////////////////////////////////
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "CL/opencl.h"
#include "AOCL_Utils.h"
using namespace aocl_utils;
// OpenCL runtime configuration
cl_platform_id platform = NULL;
unsigned num_devices = 0;
scoped_array<cl_device_id> device; // num_devices elements
cl_context context = NULL;
scoped_array<cl_command_queue> queue; // num_devices elements
cl_program program = NULL;
scoped_array<cl_kernel> kernel; // num_devices elements
scoped_array<cl_mem> input_a_buf; // num_devices elements
scoped_array<cl_mem> input_b_buf; // num_devices elements
scoped_array<cl_mem> output_buf; // num_devices elements
// Problem data.
const unsigned N = 1000000; // problem size
scoped_array<scoped_aligned_ptr<float> > input_a, input_b; // num_devices elements
scoped_array<scoped_aligned_ptr<float> > output; // num_devices elements
scoped_array<scoped_array<float> > ref_output; // num_devices elements
scoped_array<unsigned> n_per_device; // num_devices elements
// Function prototypes
float rand_float();
bool init_opencl();
void init_problem();
void run();
void cleanup();
// Entry point.
int main() {
// Initialize OpenCL.
if(!init_opencl()) {
return -1;
}
// Initialize the problem data.
// Requires the number of devices to be known.
init_problem();
// Run the kernel.
run();
// Free the resources allocated
cleanup();
return 0;
}
/////// HELPER FUNCTIONS ///////
// Randomly generate a floating-point number between -10 and 10.
float rand_float() {
return float(rand()) / float(RAND_MAX) * 20.0f - 10.0f;
}
// Initializes the OpenCL objects.
bool init_opencl() {
cl_int status;
printf("Initializing OpenCL\n");
if(!setCwdToExeDir()) {
return false;
}
// Get the OpenCL platform.
platform = findPlatform("Altera");
if(platform == NULL) {
printf("ERROR: Unable to find Altera OpenCL platform.\n");
return false;
}
// Query the available OpenCL device.
device.reset(getDevices(platform, CL_DEVICE_TYPE_ALL, &num_devices));
printf("Platform: %s\n", getPlatformName(platform).c_str());
printf("Using %d device(s)\n", num_devices);
for(unsigned i = 0; i < num_devices; ++i) {
printf(" %s\n", getDeviceName(device[i]).c_str());
}
// Create the context.
context = clCreateContext(NULL, num_devices, device, NULL, NULL, &status);
checkError(status, "Failed to create context");
// Create the program for all device. Use the first device as the
// representative device (assuming all device are of the same type).
std::string binary_file = getBoardBinaryFile("vectorAdd", device[0]);
printf("Using AOCX: %s\n", binary_file.c_str());
program = createProgramFromBinary(context, binary_file.c_str(), device, num_devices);
// Build the program that was just created.
status = clBuildProgram(program, 0, NULL, "", NULL, NULL);
checkError(status, "Failed to build program");
// Create per-device objects.
queue.reset(num_devices);
kernel.reset(num_devices);
n_per_device.reset(num_devices);
input_a_buf.reset(num_devices);
input_b_buf.reset(num_devices);
output_buf.reset(num_devices);
for(unsigned i = 0; i < num_devices; ++i) {
// Command queue.
queue[i] = clCreateCommandQueue(context, device[i], CL_QUEUE_PROFILING_ENABLE, &status);
checkError(status, "Failed to create command queue");
// Kernel.
const char *kernel_name = "vectorAdd";
kernel[i] = clCreateKernel(program, kernel_name, &status);
checkError(status, "Failed to create kernel");
// Determine the number of elements processed by this device.
n_per_device[i] = N / num_devices; // number of elements handled by this device
// Spread out the remainder of the elements over the first
// N % num_devices.
if(i < (N % num_devices)) {
n_per_device[i]++;
}
// Input buffers.
input_a_buf[i] = clCreateBuffer(context, CL_MEM_READ_ONLY,
n_per_device[i] * sizeof(float), NULL, &status);
checkError(status, "Failed to create buffer for input A");
input_b_buf[i] = clCreateBuffer(context, CL_MEM_READ_ONLY,
n_per_device[i] * sizeof(float), NULL, &status);
checkError(status, "Failed to create buffer for input B");
// Output buffer.
output_buf[i] = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
n_per_device[i] * sizeof(float), NULL, &status);
checkError(status, "Failed to create buffer for output");
}
return true;
}
// Initialize the data for the problem. Requires num_devices to be known.
void init_problem() {
if(num_devices == 0) {
checkError(-1, "No devices");
}
input_a.reset(num_devices);
input_b.reset(num_devices);
output.reset(num_devices);
ref_output.reset(num_devices);
// Generate input vectors A and B and the reference output consisting
// of a total of N elements.
// We create separate arrays for each device so that each device has an
// aligned buffer.
for(unsigned i = 0; i < num_devices; ++i) {
input_a[i].reset(n_per_device[i]);
input_b[i].reset(n_per_device[i]);
output[i].reset(n_per_device[i]);
ref_output[i].reset(n_per_device[i]);
for(unsigned j = 0; j < n_per_device[i]; ++j) {
input_a[i][j] = rand_float();
input_b[i][j] = rand_float();
ref_output[i][j] = input_a[i][j] + input_b[i][j];
}
}
}
void run() {
cl_int status;
const double start_time = getCurrentTimestamp();
// Launch the problem for each device.
scoped_array<cl_event> kernel_event(num_devices);
scoped_array<cl_event> finish_event(num_devices);
for(unsigned i = 0; i < num_devices; ++i) {
// Transfer inputs to each device. Each of the host buffers supplied to
// clEnqueueWriteBuffer here is already aligned to ensure that DMA is used
// for the host-to-device transfer.
cl_event write_event[2];
status = clEnqueueWriteBuffer(queue[i], input_a_buf[i], CL_FALSE,
0, n_per_device[i] * sizeof(float), input_a[i], 0, NULL, &write_event[0]);
checkError(status, "Failed to transfer input A");
status = clEnqueueWriteBuffer(queue[i], input_b_buf[i], CL_FALSE,
0, n_per_device[i] * sizeof(float), input_b[i], 0, NULL, &write_event[1]);
checkError(status, "Failed to transfer input B");
// Set kernel arguments.
unsigned argi = 0;
status = clSetKernelArg(kernel[i], argi++, sizeof(cl_mem), &input_a_buf[i]);
checkError(status, "Failed to set argument %d", argi - 1);
status = clSetKernelArg(kernel[i], argi++, sizeof(cl_mem), &input_b_buf[i]);
checkError(status, "Failed to set argument %d", argi - 1);
status = clSetKernelArg(kernel[i], argi++, sizeof(cl_mem), &output_buf[i]);
checkError(status, "Failed to set argument %d", argi - 1);
// Enqueue kernel.
// Use a global work size corresponding to the number of elements to add
// for this device.
//
// We don't specify a local work size and let the runtime choose
// (it'll choose to use one work-group with the same size as the global
// work-size).
//
// Events are used to ensure that the kernel is not launched until
// the writes to the input buffers have completed.
const size_t global_work_size = n_per_device[i];
printf("Launching for device %d (%d elements)\n", i, global_work_size);
status = clEnqueueNDRangeKernel(queue[i], kernel[i], 1, NULL,
&global_work_size, NULL, 2, write_event, &kernel_event[i]);
checkError(status, "Failed to launch kernel");
// Read the result. This the final operation.
status = clEnqueueReadBuffer(queue[i], output_buf[i], CL_FALSE,
0, n_per_device[i] * sizeof(float), output[i], 1, &kernel_event[i], &finish_event[i]);
// Release local events.
clReleaseEvent(write_event[0]);
clReleaseEvent(write_event[1]);
}
// Wait for all devices to finish.
clWaitForEvents(num_devices, finish_event);
const double end_time = getCurrentTimestamp();
// Wall-clock time taken.
printf("\nTime: %0.3f ms\n", (end_time - start_time) * 1e3);
// Get kernel times using the OpenCL event profiling API.
for(unsigned i = 0; i < num_devices; ++i) {
cl_ulong time_ns = getStartEndTime(kernel_event[i]);
printf("Kernel time (device %d): %0.3f ms\n", i, double(time_ns) * 1e-6);
}
// Release all events.
for(unsigned i = 0; i < num_devices; ++i) {
clReleaseEvent(kernel_event[i]);
clReleaseEvent(finish_event[i]);
}
// Verify results.
bool pass = true;
for(unsigned i = 0; i < num_devices && pass; ++i) {
for(unsigned j = 0; j < n_per_device[i] && pass; ++j) {
if(fabsf(output[i][j] - ref_output[i][j]) > 1.0e-5f) {
printf("Failed verification # device %d, index %d\nOutput: %f\nReference: %f\n",
i, j, output[i][j], ref_output[i][j]);
pass = false;
}
}
}
printf("\nVerification: %s\n", pass ? "PASS" : "FAIL");
}
// Free the resources allocated during initialization
void cleanup() {
for(unsigned i = 0; i < num_devices; ++i) {
if(kernel && kernel[i]) {
clReleaseKernel(kernel[i]);
}
if(queue && queue[i]) {
clReleaseCommandQueue(queue[i]);
}
if(input_a_buf && input_a_buf[i]) {
clReleaseMemObject(input_a_buf[i]);
}
if(input_b_buf && input_b_buf[i]) {
clReleaseMemObject(input_b_buf[i]);
}
if(output_buf && output_buf[i]) {
clReleaseMemObject(output_buf[i]);
}
}
if(program) {
clReleaseProgram(program);
}
if(context) {
clReleaseContext(context);
}
}
There are a few sub-questions here, so let me try and address them individually. I'm going to be slightly pedantic on terminology; I'm not doing that to be snarky but hopefully this will help you make more sense of documentation, examples, etc.:
Would it be correct to say: each kernel uses 1 element from A and 1 element from B to calculate 1 element of Z?
The kernel is just the code that will run on the OpenCL device. Typically, a kernel is scheduled to run (using clEnqueueNDRangeKernel()) with multiple work-items. With just one work item, there is not much point in bothering with OpenCL at all; the performance benefit comes from massive parallelism. In any case, your quoted statement is correct for each individual work-item processing this kernel. If you run this kernel with 1000 work items, 1000 elements from A will be processed with 1000 elements from B to calculate 1000 elements of Z. The order this happens in is deliberately undefined, and at least groups of elements will be operated on concurrently.
To me, it looks like it determines the number of devices (num_devices), and essentially divides the problem size (N) by num_devices, to determine the number of elements per device (n_per_device[]). Then it creates arrays of random numbers for each device (input_a[] and input_b[]) with n_per_device number of elements.
Yes, it looks like that to me too.
For example, say if the number of devices available is 1000,
I would just like to point out that you will pretty much never have this many OpenCL devices in a system. The granularity of a single OpenCL device is typically "one GPU," or "all the CPU cores in the system," or "one FPGA accelerator card."
So a "normal" amount of devices on a desktop system is 1, 2, or maybe up to about 4 (e.g. CPU + iGPU + dual discrete GPUs). Big irons with many accelerator cards might have ~16 or so. If you're attempting to accelerate some code in a desktop (or small server) application, you'll usually just pick one device that's likely to be the most appropriate for your problem and run with that. Distributing workload evenly across heterogenous devices is a hard problem for anything but the most basic algorithms.
and problem size (N) is 1,000,000; the n_per_device is 1000 (and since there is no remainder it is the same for all), and it would generate 1000 arrays of input_a and input_b, with 1000 elements in each. Then a respective pair of arrays of 1000 elements are taken by the kernel and added together -
Yes.
in other words each execution of the kernel adds 1000 elements?
Again, this is where using the term "kernel" isn't precise enough. In your example, you would enqueue 1000 work items to execute the kernel on each of the 1000 devices.

processing + bitWrite + arduino

I am working with an Arduino and Processing with the Arduino library.
I get the error "The function bitWrite(byte, int, int) does not exist.";
it seams that processing + Arduino bitWrite function are not working together.
its raised due to this line:
arduino.bitWrite(data,desiredPin,desiredState);
my goal in this project is modifying a music reactive sketch to work with shift registers.
Here is my full code:
Arduino_Shift_display
import ddf.minim.*;
import ddf.minim.analysis.*;
import processing.serial.*;
import cc.arduino.*;
int displayNum = 8;
Arduino arduino;
//Set these in the order of frequency - 0th pin is the lowest frequency,
//while the final pin is the highest frequency
int[] lastFired = new int[displayNum];
int datapin = 2;
int clockpin = 3;
int latchpin = 4;
int switchpin = 7;
byte data = 0;
//Change these to mess with the flashing rates
//Sensitivity is the shortest possible interval between beats
//minTimeOn is the minimum time an LED can be on
int sensitivity = 75;
int minTimeOn = 50;
String mode;
String source;
Minim minim;
AudioInput in;
AudioPlayer song;
BeatDetect beat;
//Used to stop flashing if the only signal on the line is random noise
boolean hasInput = false;
float tol = 0.005;
void setup(){
// shift register setup
arduino.pinMode(datapin, arduino.OUTPUT);
arduino.pinMode(clockpin, arduino.OUTPUT);
arduino.pinMode(latchpin, arduino.OUTPUT);
arduino.digitalWrite(switchpin, arduino.HIGH);
//Uncomment the mode/source pair for the desired input
//Shoutcast radio stream
//mode = "radio";
//source = "http://scfire-ntc-aa05.stream.aol.com:80/stream/1018";
//mode = "file";
//source = "/path/to/mp3";
mode = "mic";
source = "";
size(512, 200, P2D);
minim = new Minim(this);
arduino = new Arduino(this, Arduino.list()[1]);
minim = new Minim(this);
if (mode == "file" || mode == "radio"){
song = minim.loadFile(source, 2048);
song.play();
beat = new BeatDetect(song.bufferSize(), song.sampleRate());
beat.setSensitivity(sensitivity);
} else if (mode == "mic"){
in = minim.getLineIn(Minim.STEREO, 2048);
beat = new BeatDetect(in.bufferSize(), in.sampleRate());
beat.setSensitivity(sensitivity);
}
}
void shiftWrite(int desiredPin, int desiredState)
// This function lets you make the shift register outputs
// HIGH or LOW in exactly the same way that you use digitalWrite().
// Like digitalWrite(), this function takes two parameters:
// "desiredPin" is the shift register output pin
// you want to affect (0-7)
// "desiredState" is whether you want that output
// to be HIGH or LOW
// Inside the Arduino, numbers are stored as arrays of "bits",
// each of which is a single 1 or 0 value. Because a "byte" type
// is also eight bits, we'll use a byte (which we named "data"
// at the top of this sketch) to send data to the shift register.
// If a bit in the byte is "1", the output will be HIGH. If the bit
// is "0", the output will be LOW.
// To turn the individual bits in "data" on and off, we'll use
// a new Arduino commands called bitWrite(), which can make
// individual bits in a number 1 or 0.
{
// First we'll alter the global variable "data", changing the
// desired bit to 1 or 0:
arduino.bitWrite(data,desiredPin,desiredState);
// Now we'll actually send that data to the shift register.
// The shiftOut() function does all the hard work of
// manipulating the data and clock pins to move the data
// into the shift register:
arduino.shiftOut(datapin, clockpin, MSBFIRST, data);
// Once the data is in the shift register, we still need to
// make it appear at the outputs. We'll toggle the state of
// the latchPin, which will signal the shift register to "latch"
// the data to the outputs. (Latch activates on the high-to
// -low transition).
arduino.digitalWrite(latchpin, arduino.HIGH);
arduino.digitalWrite(latchpin, arduino.LOW);
}
void draw(){
if (mode == "file" || mode == "radio"){
beat.detect(song.mix);
drawWaveForm((AudioSource)song);
} else if (mode == "mic"){
beat.detect(in.mix);
drawWaveForm((AudioSource)in);
}
if (hasInput){ //hasInput is set within drawWaveForm
for (int i=0; i<displayNum-1; i++){
if ( beat.isRange( i+1, i+1, 1) ){
shiftWrite(i, 1);
lastFired[i] = millis();
} else {
if ((millis() - lastFired[i]) > minTimeOn){
shiftWrite(i, 0);
}
}
}
}
} //End draw method
//Display the input waveform
//This method sets 'hasInput' - if any sample in the signal has a value
//larger than 'tol,' there is a signal and the lights should flash.
//Otherwise, only noise is present and the lights should stay off.
void drawWaveForm(AudioSource src){
background(0);
stroke(255);
hasInput = false;
for(int i = 0; i < src.bufferSize() - 1; i++)
{
line(i, 50 + src.left.get(i)*50, i+1, 50 + src.left.get(i+1)*50);
line(i, 150 + src.right.get(i)*50, i+1, 150 + src.right.get(i+1)*50);
if (!hasInput && (abs(src.left.get(i)) > tol || abs(src.right.get(i)) > tol)){
hasInput = true;
}
}
}
void resetPins(){
for (int i=0; i<ledPins.length; i++){
arduino.digitalWrite(ledPins[i], Arduino.LOW);
}
}
void stop(){
resetPins();
if (mode == "mic"){
in.close();
}
minim.stop();
super.stop();
}
BeatListener
class BeatListener implements AudioListener
{
private BeatDetect beat;
private AudioPlayer source;
BeatListener(BeatDetect beat, AudioPlayer source)
{
this.source = source;
this.source.addListener(this);
this.beat = beat;
}
void samples(float[] samps)
{
beat.detect(source.mix);
}
void samples(float[] sampsL, float[] sampsR)
{
beat.detect(source.mix);
}
}
You can achieve the same thing using standard bitwise operators. To turn a bit on:
data |= 1 << bitNumber;
The right-hand side (1 << bitNumber) is a bit-shift operation to create a suitable bit-mask. It takes the single '1' bit and moves it left until it reaches the desired position. The bitwise-or assignment (|=) combines that new bit-mask with the existing bits in data. This turns the desired bit on, but leaves the rest untouched.
The code to turn a bit off is slightly different:
data &= ~(1 << bitNumber);
You can see the same bit-shift operation here. However, it's preceded by the unary negation operator (~). This swaps all the 1's for 0's, and all the 0's for 1's. The result is the exact opposite of the bit-mask we used before. You can't do a bitwise-or operation this time though, or else you'll turn all the other bits on. The bitwise-and assignment (&=) is used instead to combine this mask with the data variable. This ensures the desired bit is turned off, and the rest are untouched.
In your code, desiredPin is the equivalent of bitNumber.
A full explanation of how bitwise operations work can be quite lengthy. I'd recommend looking for a good tutorial online if you need more help with that.
There are also the bitSet and bitClear Arduino macros that make the code a little more readable than bit shifting and using AND and OR. The format is either bitSet(what_to_modify,bit_number) and bitClear(what_to_modify,bit_number). These translate into very efficient code and can be used to manipulate both, variables and hardware registers. So for example, if you wanted to turn on pin 13 on the Arduino UNO, you would first need to look up that Arduino pin 13 is actually pin 5 on PORTB of the Atmel atmega328 chip. So the command would be:
bitSet(PORTB,5);

Resources