OpenVX Histogram - openvx

I am trying to use OpenVX Histogram (as per Spec 1.1) and little puzzled in the usage part. My understanding is like this (Please correct me):
vx_size numBins = 10;
vx_uint32 offset = 10;
vx_uint32 range = 256;
// Create Object
vx_distribution vx_dist = vxCreateDistribution (Context, numBins, offset,
range);
// Create Node
vx_status status = vxHistogramNode (context, img, vx_dist);
Spec says that vxHistogramNode() takes vx_distribution as [out], does that means vxHistogramNode() creates object internally? If the answer is 'Yes' than how would i pass numBins, Offset and range of my choice?
Also. how can i access the output of histogram result?

The out means that the node will write the result to provided data object. So you pass your object to the node, run the graph and then read the result:
// Create Object
vx_size numBins = 10;
vx_uint32 offset = 10;
vx_uint32 range = 256;
vx_distribution vx_dist = vxCreateDistribution (Context, numBins, offset, range);
vx_graph graph = vxCreateGraph(context);
vxHistogramNode(graph, img, vx_dist);
vxVerifyGraph(graph);
vxProcessGraph(graph);
// Read the data
vx_map_id map_id;
vx_int32 *ptr;
vxMapDistribution(vx_dist, &map_id, (void **)&ptr, VX_READ_ONLY, VX_MEMORY_TYPE_HOST, 0);
// use ptr, like ptr[0]
vxUnmapDistribution(vx_dist, map_id);

Related

Heap Corruption error when calling C from R, can't find the source issue

UPDATE3: Problem is solved but I'm leaving the code here as-is for future reference--I've posted an answer below with the final state of the code in case people wanted to see the final product.
UPDATE2: Refactored to use R_alloc instead of calloc for automated cleanup. Unfortunately the problem persists.
UPDATE: If I add this line right before UNPROTECT(1):
Rprintf("%p %p %p", (void *)rans, (void *)fm, (void *)corrs);
then the function executes with no corrupted heap error. Maybe there's a background garbage collection call that corrupts one of the pointers prior to execution finishing, resulting in a write to a garbage pointer? Important to note here that if I don't print out all three of the pointer addresses, the error comes back.
Also I'm running this on an M1 Mac and compiling with clang via R CMD SHLIB, in case Apple silicon is to blame.
I'm at my wits end trying to debug this issue, and I figured I'd turn to SO for help. I'm writing a function in C to optimize some parts of my R code, and I'm getting a Heap Corruption Error when running the function many times. The function trimCovar() is called from R using the .Call("trimCovar", ...) interface.
I'm having a lot of difficulty debugging this for a few reasons:
I'm on OSX, so I can't use Valgrind
C function depends on inputs from R, so I can't debug the C code on its own
Heap corruption only occurs when calling the function many times within an R function
(just running .Call directly a bunch of times has no errors)
Error point is inconsistent
I start with two sets of vectors, and I condense them into a frequency matrix, where each column is a position in the vector set, and each row is a particular character that appears. I concatenate them into one matrix prior to passing in because it makes pre-processing easier. An toy example of the frequency matrix would be:
INPUT:
v1_1 = 101
v1_2 = 011
v2_1 = 111
v2_2 = 110
Frequency Matrix:
position: | 1_1 | 1_2 | 1_3 | 2_1 | 2_2 | 2_3 |
0: 0.5 0.5 0.0 0.0 0.0 0.5
1: 0.5 0.5 1.0 1.0 1.0 0.5
The goal is to find the NV highest correlated positions across the vector sets, which I do by calculating pairwise KL divergence of positions. These are stored in a linked list sorted in ascending order, and at the end I take the positions corresponding to the first NV entries. The R code I have can deparse everything else, so I really just need a vector of positions at the end (duplicates are allowed).
The function takes in 5 arguments:
fMAT: a frequency matrix (RObject, so gets read in as a flat vector)
fSP : columns in matrix corresponding to positions from the first vector set
sSP : same as fSP but for second vector set
NV : Number of values to return
NR : Number of columns in fMAT
The error returned is:
R(95564,0x104858580) malloc: Heap corruption detected, free list is damaged at 0x600000f10040
*** Incorrect guard value: 4626885667169763328
R(95564,0x104858580) malloc: *** set a breakpoint in malloc_error_break to debug
This only happens when I run an R function that calls this 10+ times, so I'm assuming that I'm just missing one or two small hanging pointers corrupting a memory reference. I've tried running this with gc() called in R immediately after each call, but it doesn't fix the problem. I'm not really sure what else to do at this point, I've tried using lldb but I'm not really sure how to use that program. From running lots of print statements I've determined that it usually crashes in the main loop (identified in code below), but it's inconsistent on when it crashes. I've also tried saving off erroneous inputs--I can rerun them individually with no issues, so it must be something relatively small that only appears over many runs.
Happy to provide more details if it would help. Code is listed at the bottom.
The only thing being allocated here are linked list nodes, and I thought I had free()'d them all prior to returning. I've also double checked the input values, so I'm 99.99% sure that I'm never referencing out of bounds on firstSeqPos, secondSeqPos, ans, or fm. I've also triple checked the R code surrounding this and can confidently say it is not the source of this error.
I haven't coded in C in a long time so I feel like I'm missing something obvious. If I really have to I can try to get ahold of a Linux box to run valgrind, but if there's another option I'd prefer it. Thanks in advance!
Code:
#include <R.h>
#include <Rdefines.h>
#include <Rinternals.h>
#include <math.h>
#include <stdlib.h>
#include <stdbool.h>
typedef struct node {
double data;
int i1;
int i2;
struct node *next;
} node;
// Linked list
// data is the correlation value,
// i1 the position from first vector set,
// i2 the position from second vector set
node *makeNewNode(double data, int i1, int i2){
node *newNode;
newNode = (node *)R_alloc(1, sizeof(node));
newNode->data = data;
newNode->i1 = i1;
newNode->i2 = i2;
newNode->next = NULL;
return(newNode);
}
//insert link in sorted order (ascending)
void insertSorted(node **head, node *toInsert, int maxSize) {
int ctr = 0;
if ((*head) == NULL || (*head)->data >= toInsert->data){
toInsert->next = *head;
*head = toInsert;
} else {
node *temp = *head;
while (temp->next != NULL && temp->next->data < toInsert->data){
temp = temp->next;
if (ctr == maxSize){
// Performance optimization, if we aren't inserting in the first NR
// positions then we can just skip since we only care about the NR
// lowest scores overall
return;
}
ctr += 1;
}
toInsert->next = temp->next;
temp->next = toInsert;
}
}
// MAIN FUNCTION CALLED FROM R
// (This is the one that crashes)
SEXP trimCovar(SEXP fMAT, SEXP fSP, SEXP sSP, SEXP NV, SEXP NR){
// Converting input SEXPs into C-compatible values
int nv = asInteger(NV);
int nr = asInteger(NR);
int sp1l = length(fSP);
int sp2l = length(sSP);
int *firstSeqPos = INTEGER(coerceVector(fSP, INTSXP));
int *secondSeqPos = INTEGER(coerceVector(sSP, INTSXP));
double *fm = REAL(fMAT);
int colv1, colv2;
// Using a linked list for efficient insert
node *corrs = NULL;
int cv1, cv2;
double p1, p2, score=0;
// USUALLY FAILS IN THIS LOOP
for ( int i=0; i<sp1l; i++ ){
cv1 = firstSeqPos[i];
colv1 = (cv1 - 1) * nr;
for ( int j=0; j<sp2l; j++ ){
cv2 = secondSeqPos[j];
colv2 = (cv2 - 1) * nr;
// KL Divergence
score = 0;
for ( int k=0; k<nr; k++){
p1 = fm[colv1 + k];
p2 = fm[colv2 + k];
if (p1 != 0 && p2 != 0){
score += p1 * log(p1 / p2);
}
}
// Add result into LL
node *newNode = makeNewNode(score, cv1, cv2);
insertSorted(&corrs, newNode, nv);
}
R_CheckUserInterrupt();
}
SEXP ans;
PROTECT(ans = allocVector(INTSXP, 2*nv));
int *rans = INTEGER(ans);
int ctr=0;
int pos1, pos2;
node *ptr = corrs;
for ( int i=0; i<nv; i++){
rans[2*i] = ptr->i1;
rans[2*i+1] = ptr->i2;
ptr = ptr->next;
}
UNPROTECT(1);
return(ans);
}
int *firstSeqPos = INTEGER(coerceVector(fSP, INTSXP));
int *secondSeqPos = INTEGER(coerceVector(sSP, INTSXP));
This is not good. The SEXPs returned by the 2 calls to coerceVector() need to be protected. However it's usually considered better practice to do this coercion at the R level right before entering the .Call entry point. Note that if fSP and sSP are integer matrices, there's no need to coerce them to integer as they are already seen as integer vectors at the C level. This also avoids a possibly expensive copy (as.integer() in R and coerceVector() in C both trigger a full copy of the matrix data).
The question was answered above, but I received a couple messages from people asking for the final code, so I'm going to include it as an answer to preserve the original question. There's a couple optimizations here (thanks to #hpages for help and troubleshooting regarding these):
Original code fails because the output of coerceVector() wasn't protected with PROTECT(). I've refactored the R code to check for integer inputs prior to calling this C function to avoid this function call and be more efficient with memory (see the accepted answer for more details).
Original code uses R_alloc(), which gives responsibility to R to clean up memory at the end of the function call. However, this introduces substantial memory overhead during the runtime of the function, since memory allocated to nodes not inserted into the linked list aren't cleared until the end of the function call.
Allocation with calloc() isn't as simple as switching over and calling free() at the end of the function, since we have to guard the case where the user interrupts execution of the program. If an interrupt signal is thrown prior to the end of the function, we'll never free the memory.
Final C Code:
#include <R.h>
#include <Rdefines.h>
#include <Rinternals.h>
#include <math.h>
#include <stdlib.h>
#include <stdbool.h>
typedef struct node {
double data;
int i1;
int i2;
struct node *next;
} node;
// Defining the head as a static so that we can access it globally
// Important for ensuring clean up in case of interrupt
static node *corrs = NULL;
// Function to clean up memory allocations in case of interrupt
void cleanupFxn(){
node *ptr = corrs;
// Free allocated memory in linked list
while (corrs != NULL){
ptr = corrs;
corrs = corrs->next;
free(ptr);
}
}
node *makeNewNode(double data, int i1, int i2){
node *newNode;
// very important to use calloc here so we have control of when we free it
// R_alloc() memory won't be freed until after function finishes execution
newNode = (node *)calloc(1, sizeof(node));
newNode->data = data;
newNode->i1 = i1;
newNode->i2 = i2;
newNode->next = NULL;
return(newNode);
}
// insert link in sorted order
// returns a bool corresponding to if we inserted
bool insertSorted(node **head, node *toInsert, int maxSize) {
int ctr = 0;
if ((*head) == NULL || (*head)->data >= toInsert->data){
toInsert->next = *head;
*head = toInsert;
return(true);
} else {
node *temp = *head;
while (temp->next != NULL && temp->next->data < toInsert->data){
temp = temp->next;
if (ctr == maxSize){
// Performance optimization, if we aren't inserting in the first NR
// positions then we can just skip since we only care about the NR
// lowest scores overall. Saves a huge amount of time and memory.
return(false);
}
ctr += 1;
}
toInsert->next = temp->next;
temp->next = toInsert;
return(true);
}
}
SEXP trimCovar(SEXP fMAT, SEXP fSP, SEXP sSP, SEXP NV, SEXP NR){
// Converting inputs into C-compatible forms
int nv = asInteger(NV);
int nr = asInteger(NR);
int sp1l = length(fSP);
int sp2l = length(sSP);
// Note here we're not using coerceVector() anymore
// typechecking done on R side
int *firstSeqPos = INTEGER(fSP);
int *secondSeqPos = INTEGER(sSP);
double *fm = REAL(fMAT);
int colv1, colv2;
// Using a linked list for efficient insert
corrs = NULL;
int cv1, cv2;
double p1, p2, score=0;
bool success;
for ( int i=0; i<sp1l; i++ ){
cv1 = firstSeqPos[i];
colv1 = (cv1 - 1) * nr;
for ( int j=0; j<sp2l; j++ ){
cv2 = secondSeqPos[j];
colv2 = (cv2 - 1) * nr;
score = 0;
for ( int k=0; k<nr; k++){
p1 = fm[colv1 + k];
p2 = fm[colv2 + k];
if (p1 != 0 && p2 != 0){
score += p1 * log(p1 / p2);
}
}
node *newNode = makeNewNode(score, cv1, cv2);
success = insertSorted(&corrs, newNode, nv);
// If we don't insert, free the associated memory
// I'm checking for NULL here just out of an abundance of caution
if (!success && newNode != NULL){
free(newNode);
newNode = NULL;
}
}
R_CheckUserInterrupt();
}
SEXP ans;
PROTECT(ans = allocVector(INTSXP, 2*nv));
int *rans = INTEGER(ans);
node *ptr=corrs;
for ( int i=0; i<nv; i++){
rans[2*i] = ptr->i1;
rans[2*i+1] = ptr->i2;
ptr = ptr->next;
}
// Free allocated memory in linked list
cleanupFxn();
UNPROTECT(1);
return(ans);
}
Assuming the C file is named trimCovar.c, we'd compile with R CMD SHLIB trimCovar.c.
R Code to run this function:
dyn.load("trimCovar.so")
# Wrapped into a function with on.exit(...) to ensure cleanup
# in the event the user or system interrupts execution early
CorrComp_C <- function(fm, fsp, ssp, nv, nr){
# type checking to ensure input to C is integer vector
# (could probably do more type checking here, mainly for illustration)
stopifnot(is(fsp, 'integer'))
stopifnot(is(ssp, 'integer'))
on.exit(.C("cleanupFxn"))
a <- .Call('trimCovar', fm, fsp, ssp, nv, nr)
return(a)
}

Two pointers pointing to a same memory and realloc failing

void container_row_change(struct brick_win_size *win, int character)
{
row_container *container = &(win->container[win->current_row]);
/*int offset = win->current_column;
char *data = win->container[win->current_row]. data;
if(offset < 0 || offset >= win->col)
offset = container->size;
data = realloc(data, win->container[win->current_row].size + 2 );
memmove(&data[offset + 1], &data[offset], (win->container[win->current_row].size - offset + 1));
data[offset] = character;
win->current_column++;
container->size ++;
*/
int offset = win->current_column;
if(offset < 0 || offset >= win->col)
offset = container->size;
win->container[win->current_row].data = realloc(win->container[win->current_row]. data, win->container[win->current_row]. size + 2);
memmove(&(win->container[win->current_row].data[offset + 1]), &(win->container[win->current_row].data[offset]), win->container[win->current_row].size - offset + 1);
win->container[win->current_row].data[offset] =character;
win->current_column++;
win->container[win->current_row].size ++;
}
Can anybody tell why the commented line fails and not the other one although both are same?
I am wondering is there any errors in the way I assign pointers and reallocate it
In the commented out code, you realloc the local pointer data and never update the .data pointer field in the data structure, so it continues to point at the old (now freed) memory, leading to corruption when you try to use it.
Add the line win->container[win->current_row].data = data; to the commented out code and it will actually be equivalent to the later code.
Note that in either case, if realloc fails, you'll crash -- you should be checking for failure and doing something appropriate (probably printing an error message and trying to exit gracefully).

How do I sample isampler3d in webgl2?

two question. first off, how could I set a particular value in a 3d texture to 1, lets say the y coordinate of the element at index 1,1,1 in the following Int16Array so I could later read it. I think it'd go something like this:
var data = new Int16Array(size * size * size);
data.fill(0);
// ??? (somehow I'd set values of the data array at index 1,1,1 but I'm unsure how)
data ??? = 1;
gl.texImage3D(
gl.TEXTURE_3D,
0,
gl.R16I,
size,
size,
size,
0,
gl.RED_INTEGER,
gl.SHORT,
data);
secondly, later in my fragment shader, how could I grab that value using the GLSL texture function. I think it'd go something like this:
uniform isampler3d t_sampler;
...
ivec4 value = texture( t_sampler , vec3( 1.0 , 1.0 , 1.0 ) );
if( value.y == 1 ){
// do some special stuff
}
any help would be appreciated. again I'm just trying to create my texture using a data array I create and then read that value in the frag shader.
fyi this code is running but failing to get to the "do some special stuff" part.
thanks
// ??? (somehow I'd set values of the data array at index 1,1,1 but I'm unsure how)
data ??? = 1;
const width = ??
const height = ??
const depth = ??
const numChannels = 1; // 1 for RED, 2 for RG, 3 for RGB, 4 for RGBA
const sliceSize = width * height * numChannels;
const rowSize = width * numChannels;
const x = 1;
const y = 1;
const z = 1;
const offset = z * sliceSize + y * rowSize + x;
data[offset] = redValue;
If there are more channels, for example RGBA then
data[offset + 0] = redValue;
data[offset + 1] = greenValue;
data[offset + 2] = blueValue;
data[offset + 3] = alphaValue;
how could I grab that value using the GLSL texture function
To get a specific value from a texture you can use texelFetch with pixel/texel coordinates.
uniform isampler3d t_sampler;
...
int x = 1;
int y = 1;
int z = 1;
int mipLevel = 0;
ivec4 value = texelFetch(t_sampler, ivec3(x, y, z), mipLevel);
if( value.y == 1 ){
// do some special stuff
}
Be sure to check the JavaScript console for errors. In your case you probably need to set filtering to NEAREST since you're not providing mips and since integer textures can not be filtered.

why fundamental Frequency and magnitude are not null when microphone is off?

I would like to make real time audio processing with Qt and display the spectrum using FFTW3.
What I've done in steps:
I capture any sound from computer device and fill it into the buffer.
I assign sound samples to double array
I compute the fundamental frequency.
when I'm display the fundamental frequency and Magnetitude when the microphone is on but no signal(silence) , the fundamental frequency is not what I expected , the code don't always return zero , sometimes the code returns 1500Hz,2000hz as frequency
and when the microphone is off (mute) the code don't return zero as fundamamental frequency but returns a number between 0 and 9000Hz. Any help woulbd be appreciated
here is my code
QByteArray *buffer;
QAudioInput *audioInput;
audioInput = new QAudioInput(format, this);
//Check the number of samples in input buffer
qint64 len = audioInput->bytesReady();
//Limit sample size
if(len > 4096)
len = 4096;
//Read sound samples from input device to buffer
qint64 l = input->read(buffer.data(), len);
int input_size= BufferSize;
int output_size = input_size; //input_size/2+1;
fftw_plan p3;
double in[output_size];
fftw_complex out[output_size];
short *outdata = (short*)m_buffer.data();// assign sample into short array
int data_size = size_t(outdata);
int data_size1 = sizeof(outdata);
int count = 0;
double w = 0;
for(int i(chanelNumber); i < output_size/2; i= i + 2) //fill array in
{
w= 0.5 * (1 - cos(2*M_PI*i/output_size)); // Hann Windows
double x = 0;
if(i < data_size){
x = outdata[i];
}
if(count < output_size){
in[count] = x;// fill Array In with sample from buffer
count++;
}
}
for(int i=count; i<output_size; i++){
in[i] = 0;
}
p3 = fftw_plan_dft_r2c_1d(output_size, in, out, FFTW_ESTIMATE);// create Plan
fftw_execute(p3);// FFT
for (int i = 0; i < (output_size/2); i++) {
long peak=0;
double Amplitudemax=0;
double r1 = out[i][0] * out[i][0];
double im1 = out[i][3] * out[i][4];
double t1 = r1 + im1;
//double t = 20*log(sqrt(t1));
double t = sqrt(t1)/(double)(output_size/2);
double f = (double)i*8000 / ((double)output_size/2);
if(Magnitude > AmplitudeMax)
{
AmplitudeMax = Magnitude;
Peak =2* i;
}
}
fftw_destroy_plan(p3);
return Peak*(static_cast<double>(8000)/output_Size);
What you think is silence might contain some small amount of noise. The FFT of random noise will also appear random, and thus have a random magnitude peak. But it is possible that noise might come from equipment or electronics in the environment (fans, flyback transformers, etc.), or the power supply to your ADC or mic, thus showing some frequency biases.
If the noise level is low enough, normally one checks the level of the magnitude peak, compares it against a threshold, and cuts off frequency estimation reporting below this threshold.

Optimizing kernel shuffled keys code - OpenCL

I have just started getting into OpenCL and going through the basics of writing a kernel code. I have written a kernel code for calculating shuffled keys for points array. So, for a number of points N, the shuffled keys are calculated in 3-bit fashion, where x-bit at depth d (0
xd = 0 if p.x < Cd.x
xd = 1, otherwise
The Shuffled xyz key is given as:
x1y1z1x2y2z2...xDyDzD
The Kernel code written is given below. The point is inputted in a column major format.
__constant float3 boundsOffsetTable[8] = {
{-0.5,-0.5,-0.5},
{+0.5,-0.5,-0.5},
{-0.5,+0.5,-0.5},
{-0.5,-0.5,+0.5},
{+0.5,+0.5,-0.5},
{+0.5,-0.5,+0.5},
{-0.5,+0.5,+0.5},
{+0.5,+0.5,+0.5}
};
uint setBit(uint x,unsigned char position)
{
uint mask = 1<<position;
return x|mask;
}
__kernel void morton_code(__global float* point,__global uint*code,int level, float3 center,float radius,int size){
// Get the index of the current element to be processed
int i = get_global_id(0);
float3 pt;
pt.x = point[i];pt.y = point[size+i]; pt.z = point[2*size+i];
code[i] = 0;
float3 newCenter;
float newRadius;
if(pt.x>center.x) code = setBit(code,0);
if(pt.y>center.y) code = setBit(code,1);
if(pt.z>center.z) code = setBit(code,2);
for(int l = 1;l<level;l++)
{
for(int i=0;i<8;i++)
{
newRadius = radius *0.5;
newCenter = center + boundOffsetTable[i]*radius;
if(newCenter.x-newRadius<pt.x && newCenter.x+newRadius>pt.x && newCenter.y-newRadius<pt.y && newCenter.y+newRadius>pt.y && newCenter.z-newRadius<pt.z && newCenter.z+newRadius>pt.z)
{
if(pt.x>newCenter.x) code = setBit(code,3*l);
if(pt.y>newCenter.y) code = setBit(code,3*l+1);
if(pt.z>newCenter.z) code = setBit(code,3*l+2);
}
}
}
}
It works but I just wanted to ask if I am missing something in the code and if there is an way to optimize the code.
Try this kernel:
__kernel void morton_code(__global float* point,__global uint*code,int level, float3 center,float radius,int size){
// Get the index of the current element to be processed
int i = get_global_id(0);
float3 pt;
pt.x = point[i];pt.y = point[size+i]; pt.z = point[2*size+i];
uint res;
res = 0;
float3 newCenter;
float newRadius;
if(pt.x>center.x) res = setBit(res,0);
if(pt.y>center.y) res = setBit(res,1);
if(pt.z>center.z) res = setBit(res,2);
for(int l = 1;l<level;l++)
{
for(int i=0;i<8;i++)
{
newRadius = radius *0.5;
newCenter = center + boundOffsetTable[i]*radius;
if(newCenter.x-newRadius<pt.x && newCenter.x+newRadius>pt.x && newCenter.y-newRadius<pt.y && newCenter.y+newRadius>pt.y && newCenter.z-newRadius<pt.z && newCenter.z+newRadius>pt.z)
{
if(pt.x>newCenter.x) res = setBit(res,3*l);
if(pt.y>newCenter.y) res = setBit(res,3*l+1);
if(pt.z>newCenter.z) res = setBit(res,3*l+2);
}
}
}
//Save the result
code[i] = res;
}
Rules to optimize:
Avoid Global memory (you were using "code" directly from global memory, I changed that), you should see 3x increase in performance now.
Avoid Ifs, use "select" instead if it is possible. (See OpenCL documentation)
Use more memory inside the kernel. You don't need to operate at bit level. Operation at int level would be better and could avoid huge amount of calls to "setBit". Then you can construct your result at the end.
Another interesting thing. Is that if you are operating at 3D level, you can just use float3 variables and compute the distances with OpenCL operators. This can increase your performance quite a LOT. BUt also requires a complete rewrite of your kernel.

Resources