I am trying to extract NARF keypoints and descriptors from raw point cloud data using pcl::NarfKeypoint and pcl::NarfDescriptor.
For visualization, I can simply plot my keypoints together with the range image generated from the original point cloud.
The problem is visualizing the descriptors together with the keypoints.
As far as I understand, pcl::NarfKeypoint computes the indices of the keypoints, and with getVector3fMap() one can simply visualize them with pcl::visualization.
For the descriptors, however, the output (pcl::Narf36) is x, y, z, roll, pitch, yaw and, more importantly, descriptor[36].
Does anyone know how to visualize the descriptors together with the keypoints in PCL?
Do we really need to use the 36 values in descriptor[36] to address this problem?
My sample code:
// --------------------------------
// -----Extract NARF keypoints-----
// --------------------------------
clock_t begin = clock();
pcl::RangeImageBorderExtractor range_image_border_extractor;
pcl::NarfKeypoint narfKp (&range_image_border_extractor);
narfKp.setRangeImage (&range_image);
narfKp.getParameters().support_size = support_size;
narfKp.getParameters().calculate_sparse_interest_image = true;
narfKp.getParameters().use_recursive_scale_reduction = true;
pcl::PointCloud<int> keyPoIdx;
narfKp.compute (keyPoIdx);
cout << "range image = " << range_image << "\n \n";
cout << "keypoint = "<< keyPoIdx <<"\n";
cout << "time to compute NARF keyPoints = " << (float)(clock() - begin) / CLOCKS_PER_SEC << " [sec] \n";
// --------------------------------
// ----Extract NARF descriptors----
// --------------------------------
vector<int> desIdx;
desIdx.resize(keyPoIdx.points.size());
for (unsigned int i = 0; i < desIdx.size(); i++)
{
desIdx[i] = keyPoIdx.points[i];
}
pcl::NarfDescriptor narfDes (&range_image, &desIdx);
narfDes.getParameters().support_size = support_size;
narfDes.getParameters().rotation_invariant = true; // cause more descriptors than keypoints
pcl::PointCloud<pcl::Narf36> outputNarfDes;
narfDes.compute(outputNarfDes);
cout << "Extracted "<< outputNarfDes.size() <<" descriptors for " << keyPoIdx.points.size() << " keypoints.\n";
//------------------------------------------------------------------ //
//-----------------------Visualization-------------------------------//
// ----------------------------------------------------------------- //
// ----------------------------------------------
// -----Show keypoints in range image widget-----
// ----------------------------------------------
//for (size_t i=0; i<keyPoIdx.points.size (); ++i)
//range_image_widget.markPoint (keyPoIdx.points[i]%range_image.width,
//keyPoIdx.points[i]/range_image.width);
// ---------------------------------------
// -----Show Descriptors in 3D viewer-----
// ---------------------------------------
pcl::PointCloud<pcl::PointXYZ>::Ptr descriptors_ptr (new pcl::PointCloud<pcl::PointXYZ>);
pcl::PointCloud<pcl::PointXYZ>& desVIZ = *descriptors_ptr;
desVIZ.points.resize(outputNarfDes.size());
cout << "descriptor index size = " << desVIZ.points.size() << "\n";
for (size_t i=0; i < desVIZ.points.size(); ++i)
//for (size_t i=0; i<desIdx.size(); ++i)
{
// ??????????????? MY PROBLEM ???????????????????
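// Each pcl::Narf36 stores its own position (x, y, z), so copy it directly
desVIZ.points[i].x = outputNarfDes.points[i].x;
desVIZ.points[i].y = outputNarfDes.points[i].y;
desVIZ.points[i].z = outputNarfDes.points[i].z;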
// ??????????????? MY PROBLEM ???????????????????
}
pcl::visualization::PointCloudColorHandlerCustom<pcl::PointXYZ> des_color_handler (descriptors_ptr, 200, 0, 50);
viewer.addPointCloud<pcl::PointXYZ> (descriptors_ptr, des_color_handler, "descriptors");
viewer.setPointCloudRenderingProperties (pcl::visualization::PCL_VISUALIZER_POINT_SIZE, 10, "descriptors");
// -------------------------------------
// -----Show keypoints in 3D viewer-----
// -------------------------------------
pcl::PointCloud<pcl::PointXYZ>::Ptr keypoints_ptr (new pcl::PointCloud<pcl::PointXYZ>);
pcl::PointCloud<pcl::PointXYZ>& keyPo = *keypoints_ptr;
keyPo.points.resize(keyPoIdx.points.size());
for (size_t i=0; i<keyPoIdx.points.size(); ++i)
{
keyPo.points[i].getVector3fMap () = range_image.points[keyPoIdx.points[i]].getVector3fMap ();
}
pcl::visualization::PointCloudColorHandlerCustom<pcl::PointXYZ> keypoints_color_handler (keypoints_ptr, 0, 200, 0);
viewer.addPointCloud<pcl::PointXYZ> (keypoints_ptr, keypoints_color_handler, "keypoints");
viewer.setPointCloudRenderingProperties (pcl::visualization::PCL_VISUALIZER_POINT_SIZE, 5, "keypoints");
//--------------------
// -----Main loop-----
//--------------------
while (!viewer.wasStopped ())
{
range_image_widget.spinOnce (); // process GUI events
viewer.spinOnce ();
pcl_sleep(0.01);
}
You can visualize the roll/pitch/yaw by converting it to a vector. Refer to this answer for details. This vector can either be used as the normal of each point (that way, for each view, only keypoints sharing the same orientation will have a color), or you may try to draw arrows at the positions of the keypoints.
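A minimal sketch of the roll/pitch/yaw-to-vector conversion, in plain C++ with no PCL dependency. It assumes the Z-Y-X convention used by pcl::getTransformation, R = Rz(yaw) * Ry(pitch) * Rx(roll), and that a pcl::Narf36 pose follows the same convention. Which local axis (x, y or z) best represents a NARF keypoint's orientation is also an assumption, so try the axes against your data. The helpers rpyToAxis and arrowTip are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Returns column `axis` (0 = x, 1 = y, 2 = z) of R = Rz(yaw) * Ry(pitch) * Rx(roll),
// i.e. the direction that local axis points to after the rotation.
// The Z-Y-X convention is an assumption (it matches pcl::getTransformation).
std::array<float, 3> rpyToAxis(float roll, float pitch, float yaw, int axis)
{
  assert(axis >= 0 && axis <= 2);
  const float cr = std::cos(roll),  sr = std::sin(roll);
  const float cp = std::cos(pitch), sp = std::sin(pitch);
  const float cy = std::cos(yaw),   sy = std::sin(yaw);
  const float R[3][3] = {
    {cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr},
    {sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr},
    {-sp,     cp * sr,                cp * cr}
  };
  return {R[0][axis], R[1][axis], R[2][axis]};
}

// Hypothetical helper: end point of an arrow of length `len` starting at the
// keypoint (x, y, z) and pointing along the chosen local axis.
std::array<float, 3> arrowTip(float x, float y, float z,
                              float roll, float pitch, float yaw,
                              float len, int axis = 2)
{
  const std::array<float, 3> d = rpyToAxis(roll, pitch, yaw, axis);
  return {x + len * d[0], y + len * d[1], z + len * d[2]};
}
```

With PCL you could, for example, pass the keypoint and its arrowTip as two pcl::PointXYZ to PCLVisualizer::addArrow or addLine. Alternatively, you could store the direction in normal_x/normal_y/normal_z of a pcl::PointNormal cloud and show it with addPointCloudNormals.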
To visualize the descriptor itself, you can project it into a three-dimensional space using PCA. The three resulting coordinates can then be used to set the color of your keypoints.
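A minimal sketch of that idea in plain C++, with no PCL or Eigen dependency. Each descriptor is an std::array<float, 36> standing in for pcl::Narf36::descriptor. The top three principal components are found by power iteration with deflation, which is a simple stand-in for a proper eigen-solver such as Eigen's SelfAdjointEigenSolver. Each projected coordinate is then rescaled to 0..255. The function name descriptorsToColors is hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Descriptor = std::array<float, 36>;  // stand-in for pcl::Narf36::descriptor
using RGB = std::array<unsigned char, 3>;

// Hypothetical helper: colour = projection onto the top 3 principal components.
std::vector<RGB> descriptorsToColors(const std::vector<Descriptor>& descs, int iters = 200)
{
  assert(iters > 0);
  const std::size_t n = descs.size(), d = 36;
  std::vector<RGB> colors(n, RGB{128, 128, 128});
  if (n == 0) return colors;

  // Mean of all descriptors.
  std::vector<double> mean(d, 0.0);
  for (const Descriptor& v : descs)
    for (std::size_t j = 0; j < d; ++j) mean[j] += v[j];
  for (double& m : mean) m /= static_cast<double>(n);

  // Unnormalised scatter (covariance) matrix, d x d, row-major.
  std::vector<double> cov(d * d, 0.0);
  for (const Descriptor& v : descs)
    for (std::size_t a = 0; a < d; ++a)
      for (std::size_t b = 0; b < d; ++b)
        cov[a * d + b] += (v[a] - mean[a]) * (v[b] - mean[b]);

  for (int k = 0; k < 3; ++k) {
    // Power iteration for the dominant eigenvector of the (deflated) matrix.
    std::vector<double> x(d), y(d);
    for (std::size_t a = 0; a < d; ++a) x[a] = 1.0 / (a + 1.0);  // arbitrary start
    for (int it = 0; it < iters; ++it) {
      double norm = 0.0;
      for (std::size_t a = 0; a < d; ++a) {
        y[a] = 0.0;
        for (std::size_t b = 0; b < d; ++b) y[a] += cov[a * d + b] * x[b];
        norm += y[a] * y[a];
      }
      norm = std::sqrt(norm);
      if (norm < 1e-12) { x.assign(d, 0.0); break; }  // no variance left
      for (std::size_t a = 0; a < d; ++a) x[a] = y[a] / norm;
    }
    // Eigenvalue estimate (Rayleigh quotient), then deflate: cov -= lambda * x x^T.
    double lambda = 0.0;
    for (std::size_t a = 0; a < d; ++a)
      for (std::size_t b = 0; b < d; ++b) lambda += x[a] * cov[a * d + b] * x[b];
    for (std::size_t a = 0; a < d; ++a)
      for (std::size_t b = 0; b < d; ++b) cov[a * d + b] -= lambda * x[a] * x[b];

    // Project every descriptor onto this component and rescale to 0..255.
    std::vector<double> p(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < d; ++j) p[i] += (descs[i][j] - mean[j]) * x[j];
    double lo = p[0], hi = p[0];
    for (double v : p) { lo = std::min(lo, v); hi = std::max(hi, v); }
    if (hi - lo < 1e-12) continue;  // constant channel stays at 128
    for (std::size_t i = 0; i < n; ++i)
      colors[i][k] = static_cast<unsigned char>(std::lround(255.0 * (p[i] - lo) / (hi - lo)));
  }
  return colors;
}
```

The resulting colours could be copied into the r, g, b fields of a pcl::PointXYZRGB cloud built from the keypoint positions. Principal components are only defined up to sign, so the colour scale along each channel may come out flipped.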
One of my OpenCL helper functions writes to global memory. In one place the write works fine and the kernel executes normally. However, when the same write is moved to after the get_population_seed() check, it freezes/crashes the kernel and my program can't continue.
The values in this function differ between work-items (I use an NDRange of 2^16), so the loop bounds differ as well. Because of the conditionals, not all work-items execute the same code path.
Why exactly is this an issue? Am I missing some kind of memory barrier or something?
void add_world_seeds(yada yada yada...., const uint global_id, __global long* world_seeds)
for (; indexer < (1 << 16); indexer += increment) {
long k = (indexer << 16) + c;
long target2 = (k ^ e) >> 16;
long second_addend = get_partial_addend(k, x, z) & MASK_16;
if (ctz(target2 - second_addend) < mult_trailing_zeroes) { continue; }
long a = (((first_mult_inv * (target2 - second_addend)) >> mult_trailing_zeroes) ^ (J1_MUL >> 32)) & mask;
for (; a < (1 << 16); a += increment) {
world_seeds[global_id] = (a << 32) + k; //WORKS HERE
if (get_population_seed((a << 32) + k, x, z) != population_seed_state) { continue; }
world_seeds[global_id] = (a << 32) + k; //DOES NOT WORK HERE
}
}
There was in fact a bug causing undefined behavior in the code. The main reversal kernel had an argument called "increment", and inside that same kernel I declared another variable also called increment. It compiled fine, but it produced wrong results all over the place and caused memory crashes.
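The same pitfall can be reproduced in plain C++. OpenCL C inherits C's block-scoping rules, so the behaviour is analogous. In C and C++, redeclaring a parameter in the function's outermost block is a compile error. A declaration in a nested block, however (for example inside a loop or if body), silently hides the parameter, which is presumably how the kernel still compiled. The function names below are hypothetical. GCC's and Clang's -Wshadow flag warns about this pattern.

```cpp
#include <cassert>

// Hypothetical illustration of the bug: the loop stride parameter `increment`
// is hidden by a local with the same name declared in a nested block.
int count_steps_shadowed(int start, int limit, int increment)
{
  int steps = 0;
  {
    int increment = 1;  // shadows the parameter; -Wshadow warns here
    for (int i = start; i < limit; i += increment) ++steps;
  }
  return steps;
}

// Same loop without the shadowing local: the caller's stride is used.
int count_steps_fixed(int start, int limit, int increment)
{
  assert(increment > 0);  // a non-positive stride would never terminate
  int steps = 0;
  for (int i = start; i < limit; i += increment) ++steps;
  return steps;
}
```

With a stride of 4 over [0, 16), the shadowed version silently takes 16 steps instead of 4. In a kernel, a mix-up like this changes which work-item writes where.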
I am new to CUDA and cuFFT. When I try to recover the input of cufftExecR2C(...) by applying the corresponding cufftExecC2R(...), something goes wrong: the recovered data is not identical to the original data.
Here is the code; I used CUDA 9.0.
#include "device_launch_parameters.h"
#include "cuda_runtime.h"
#include "cuda.h"
#include "cufft.h"
#include <iostream>
#include <sys/time.h>
#include <cstdio>
#include <cmath>
using namespace std;
// cuda error check
#define gpuErrchk(ans) {gpuAssrt((ans), __FILE__, __LINE__);}
inline void gpuAssrt(cudaError_t code, const char* file, int line, bool abort=true) {
if (code != cudaSuccess) {
fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
if (abort) {
exit(code);
}
}
}
// ifft scale for cufft
__global__ void IFFTScale(int scale_, cufftReal* real) {
int idx = threadIdx.x + blockIdx.x * blockDim.x;
real[idx] *= 1.0 / scale_;
}
void batch_1d_irfft2_test() {
const int BATCH = 3;
const int DATASIZE = 4;
/// RFFT
// --- Host side input data allocation and initialization
cufftReal *hostInputData = (cufftReal*)malloc(DATASIZE*BATCH*sizeof(cufftReal));
for (int i = 0; i < BATCH; ++ i) {
for (int j = 0; j < DATASIZE; ++ j) {
hostInputData[i * DATASIZE + j] = (cufftReal)(i * DATASIZE + j + 1);
}
}
// DEBUG:print host input data
cout << "print host input data" << endl;
for (int i = 0; i < BATCH; ++ i) {
for (int j = 0; j < DATASIZE; ++ j) {
cout << hostInputData[i * DATASIZE + j] << ", ";
}
cout << endl;
}
cout << "=====================================================" << endl;
// --- Device side input data allocation and initialization
cufftReal *deviceInputData;
gpuErrchk(cudaMalloc((void**)&deviceInputData, DATASIZE * BATCH * sizeof(cufftReal)));
// --- Device side output data allocation
cufftComplex *deviceOutputData;
gpuErrchk(cudaMalloc(
(void**)&deviceOutputData,
(DATASIZE / 2 + 1) * BATCH * sizeof(cufftComplex)));
// Host side input data copied to Device side
cudaMemcpy(deviceInputData,
hostInputData,
DATASIZE * BATCH * sizeof(cufftReal),
cudaMemcpyHostToDevice);
// --- Batched 1D FFTs
cufftHandle handle;
int rank = 1; // --- 1D FFTs
int n[] = {DATASIZE}; // --- Size of the Fourier transform
int istride = 1, ostride = 1; // --- Distance between two successive input/output elements
int idist = DATASIZE, odist = DATASIZE / 2 + 1; // --- Distance between batches
int inembed[] = { 0 }; // --- Input size with pitch (ignored for 1D transforms)
int onembed[] = { 0 }; // --- Output size with pitch (ignored for 1D transforms)
int batch = BATCH; // --- Number of batched executions
cufftPlanMany(
&handle,
rank,
n,
inembed, istride, idist,
onembed, ostride, odist,
CUFFT_R2C,
batch);
cufftExecR2C(handle, deviceInputData, deviceOutputData);
// **************************************************************************
/// IRFFT
cufftReal *deviceOutputDataIFFT;
gpuErrchk(cudaMalloc((void**)&deviceOutputDataIFFT, DATASIZE * BATCH * sizeof(cufftReal)));
// --- Batched 1D IFFTs
cufftHandle handleIFFT;
int n_ifft[] = {DATASIZE / 2 + 1}; // --- Size of the Fourier transform
idist = DATASIZE / 2 + 1; odist = DATASIZE; // --- Distance between batches
cufftPlanMany(
&handleIFFT,
rank,
n_ifft,
inembed, istride, idist,
onembed, ostride, odist,
CUFFT_C2R,
batch);
cufftExecC2R(handleIFFT, deviceOutputData, deviceOutputDataIFFT);
/* scale
// dim3 dimGrid(512);
// dim3 dimBlock(max((BATCH * DATASIZE + 512 - 1) / 512, 1));
// IFFTScale<<<dimGrid, dimBlock>>>((DATASIZE - 1) * 2, deviceOutputData);
*/
// host output data for ifft
cufftReal *hostOutputDataIFFT = (cufftReal*)malloc(DATASIZE*BATCH*sizeof(cufftReal));
cudaMemcpy(hostOutputDataIFFT,
deviceOutputDataIFFT,
DATASIZE * BATCH * sizeof(cufftReal),
cudaMemcpyDeviceToHost);
// print IFFT recovered host output data
cout << "print host output IFFT data" << endl;
for (int i=0; i<BATCH; i++) {
for (int j=0; j<DATASIZE; j++) {
cout << hostOutputDataIFFT[i * DATASIZE + j] << ", ";
}
printf("\n");
}
cufftDestroy(handle);
cufftDestroy(handleIFFT);
gpuErrchk(cudaFree(deviceOutputData));
gpuErrchk(cudaFree(deviceInputData));
gpuErrchk(cudaFree(deviceOutputDataIFFT));
free(hostOutputDataIFFT);
free(hostInputData);
}
int main() {
batch_1d_irfft2_test();
return 0;
}
I compile the 'rfft_test.cu' file with nvcc -o rfft_test rfft_test.cu -lcufft. The result is as below:
print host input data
1, 2, 3, 4,
5, 6, 7, 8,
9, 10, 11, 12,
=====================================================
print IFFT recovered host output data
6, 8.5359, 15.4641, 0,
22, 24.5359, 31.4641, 0,
38, 40.5359, 47.4641, 0,
Regarding scaling for cufftExecC2R(...): I commented out the IFFTScale() kernel, so I expected the recovered output to be DATASIZE * input_batched_1d_data. Even so, the result is not as expected.
I have checked the cuFFT manual and my code several times, and I also searched the NVIDIA forums and Stack Overflow, but I didn't find a solution. Any help is greatly appreciated.
Thanks in advance.
The size of your inverse transform is incorrect: it should be DATASIZE, not DATASIZE/2+1.
The following sections of the cuFFT docs should help:
https://docs.nvidia.com/cuda/cufft/index.html#data-layout
https://docs.nvidia.com/cuda/cufft/index.html#multi-dimensional
"In C2R mode an input array $(x_1, x_2, \ldots, x_{\lfloor N/2 \rfloor + 1})$ of only non-redundant complex elements is required." Here N is the transform size you pass to the plan function.
I'm trying to implement some draws using a Pólya urn scheme in Rcpp. Basically, I have a matrix I'm drawing from, and a second matrix with weights proportional to the probabilities. After each draw, I need to increase the weight of whichever cell I drew.
I was running into some indexing errors, which led me to examine the sampling more generally, and I found that my weight matrix was getting modified by RcppArmadillo::sample. Two questions: (1) Is this behavior I should have expected, or is it a bug I should report somewhere? (2) Any ideas for a workaround? Here's a reproducible example:
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp ;
// [[Rcpp::export]]
void sampler(int N, int inc, NumericMatrix& weight_matrix, int reps) {
IntegerVector wm_tmp = seq_along(weight_matrix);
Rcout << "Initial weight_matrix:\n" << weight_matrix << "\n";
int x_ind;
for(int i = 0; i < reps; ++i) {
x_ind = RcppArmadillo::sample(wm_tmp, 1, true, weight_matrix)(0) - 1;
Rcout << "Weight matrix after sample: (rep = " << i << ")\n" << weight_matrix << "\n";
Rcout << "x_ind: " << x_ind << "\n";
// get indices
weight_matrix[x_ind] = weight_matrix[x_ind] + inc;
Rcout << "Add increment of " << inc << " to weight_matrix:\n" << weight_matrix << "\n";
}
}
//
// // [[Rcpp::export]]
// IntegerVector seq_cpp(IntegerMatrix x) {
// IntegerVector tmp = seq_along(x);
// IntegerVector ret = RcppArmadillo::sample(tmp, 2, true);
// return ret;
// }
/*** R
weight_matrix <- matrix(1, 5, 2)
sampler(5, 1, weight_matrix, 3)
weight_matrix <- matrix(1, 5, 2)
sampler(5, 0, weight_matrix, 3)
*/
Thanks!
That is known and documented behaviour.
You could either:
i) use Rcpp::clone() to create a distinct copy of your SEXP (i.e. your NumericMatrix), or
ii) use an Armadillo matrix instead and pass it as const arma::mat& m.
There are architectural reasons, having to do with the way R organizes its data structures, which mean that we cannot give you fast access (no copies!) and also protect against writes.
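As an illustration only, here is a Pólya-urn loop in plain C++ (not RcppArmadillo::sample) built on std::discrete_distribution, which copies the weights it is constructed from. Following option (i) above, the sampler mutates a private copy and takes the caller's weights by const reference. In Rcpp, the equivalent copy would come from Rcpp::clone(weight_matrix). The function name polya_draws is hypothetical.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical Pólya-urn sampler: returns `reps` draws of 0-based cell indices,
// each with probability proportional to the current weight of that cell, and
// adds `inc` to the drawn cell's weight after every draw.
std::vector<int> polya_draws(const std::vector<double>& weights, double inc,
                             int reps, std::mt19937& rng)
{
  assert(!weights.empty());
  std::vector<double> w = weights;  // private, mutable copy; caller's data untouched
  std::vector<int> draws;
  for (int r = 0; r < reps; ++r) {
    // Rebuilt each round because the weights change after every draw.
    std::discrete_distribution<int> pick(w.begin(), w.end());
    const int idx = pick(rng);
    draws.push_back(idx);
    w[idx] += inc;  // reinforce the drawn cell
  }
  return draws;
}
```

Rebuilding the distribution on every draw costs O(number of cells) per draw. That is fine for small matrices, but a large urn would need a structure that supports weight updates.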
I wrote an implementation of Prim's algorithm, but whenever I run it, it gives me the same matrix back; it isn't minimizing anything. Can anyone check the code and let me know why it isn't minimizing my matrix?
#include <iostream>
#include <stdlib.h>
#include <time.h>
#include <vector>
#include <list>
#include <cstdlib>
#include <iomanip>
#include <limits.h>
using namespace std; // cout, cin, endl and setw are used unqualified below
int minKey(int n,int key[], bool mst[])
{
// Initialize min value
int min = INT_MAX, min_index;
for (int i = 0; i < n; i++)
if (mst[i] == false && key[i] < min)
min = key[i], min_index = i;
return min_index;
}
void print(int n,int **matrix)
{
for(int i=0; i<n; i++)
{
for(int j=0; j<n; j++) // print the matrix
{
cout << setw(2) << matrix[i][j] << " ";
}
cout << endl;
}
}
int **gen_random_graph(int n)
{
srand(time(0));
int **adj_matrix = new int*[n];
for(int i = 0; i < n; i++)
{
adj_matrix[i] = new int[n]; // allocate each row of the N x N matrix exactly once
}
for(int u = 0; u < n; u++)
{
for (int v = u; v < n; v++) //decide whether it has an edge or not
{
bool edgeOrNot = rand() % 2;
adj_matrix[u][v] = adj_matrix[v][u] = edgeOrNot;
cout << u << " " << v << " " << adj_matrix[u][v] << endl;
if(adj_matrix[u][v] == true)
{
adj_matrix[v][u] = true;
if(u == v) //We can't have i = j in an undirected graph
{
adj_matrix[u][v] = -1;
}
cout << u << " " << v << " " << adj_matrix[u][v] << endl;
}
else
{
adj_matrix[v][u] = adj_matrix[u][v] = -1;
cout << u << " " << v << " " << adj_matrix[u][v] << "else" << endl;
}
}
}
for(int i = 0; i < n; i++)
{
for(int j = i; j < n; j++) //create the N x N with edges and sets the weight between the edge randomly
{
if(adj_matrix[i][j] == true)
{
int weight = rand() % 10 + 1;
adj_matrix[i][j] = adj_matrix[j][i] = weight;
cout << " ( " << i << "," << j << " ) " << "weight: " << adj_matrix[i][j] << endl;
}
}
}
print(n,adj_matrix);
return (adj_matrix);
}
void solve_mst_prim_matrix(int n, int **matrix)
{
int parent[n]; // Array to store constructed MST
int key[n]; // Key values used to pick minimum weight edge in cut
bool mstSet[n]; // mstSet[i] is true once vertex i is included in the MST
// Initialize all keys as INFINITE
for (int i = 0; i < n; i++)
{
key[i] = INT_MAX, mstSet[i] = false;
}
// Always include the first vertex in the MST.
key[0] = 0; // Make key 0 so that this vertex is picked as first vertex
parent[0] = -1; // First node is always root of MST
// The MST will have n vertices
for (int count = 0; count < n-1; count++)
{
// Pick the minimum key vertex from the set of vertices
// not yet included in MST
int u = minKey(n,key, mstSet);
// Add the picked vertex to the MST Set
mstSet[u] = true;
// Update key value and parent index of the adjacent vertices of
// the picked vertex. Consider only those vertices which are not yet
// included in MST
for (int v = 0; v < n; v++)
// matrix[u][v] is non zero only for adjacent vertices of u
// mstSet[v] is false for vertices not yet included in MST
// Update the key only if matrix[u][v] is smaller than key[v]
if (matrix[u][v] && mstSet[v] == false && matrix[u][v] < key[v])
parent[v] = u, key[v] = matrix[u][v];
}
cout << endl;
print(n,matrix);
}
int main()
{
int N;
cout << "Enter number of vertices" << endl;
cin >> N;
int **matrix = gen_random_graph(N);
solve_mst_prim_matrix(N, matrix);
return 0;
}
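Correct me if I'm wrong, but from reading your code, you never change any value of **matrix in your solve_mst_prim_matrix function, so it just prints the same matrix again. The tree that Prim's algorithm builds ends up in parent[] (edge parent[v]-v, with weight key[v]), and you never print it.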
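For reference, here is a standalone sketch that uses std::vector instead of raw pointers (the name prim_mst_edges is hypothetical). It returns the MST as the parent-to-child edges held in parent[], which is what the question's function computes but never prints. It also treats any non-positive entry as "no edge". This matters because gen_random_graph marks missing edges and the diagonal with -1. Since -1 is non-zero and smaller than every key, the original `matrix[u][v] && mstSet[v] == false && matrix[u][v] < key[v]` test accepts those non-edges as key -1.

```cpp
#include <cassert>
#include <climits>
#include <utility>
#include <vector>

// Prim's algorithm on an adjacency matrix; a non-positive entry means "no edge".
// Returns the MST edges as (parent, child) pairs.
std::vector<std::pair<int, int>> prim_mst_edges(const std::vector<std::vector<int>>& w)
{
  const int n = static_cast<int>(w.size());
  if (n == 0) return {};
  std::vector<int> key(n, INT_MAX), parent(n, -1);
  std::vector<bool> in_mst(n, false);
  key[0] = 0;  // start the tree at vertex 0
  for (int count = 0; count < n; ++count) {
    // Pick the cheapest reachable vertex not yet in the tree.
    int u = -1;
    for (int v = 0; v < n; ++v)
      if (!in_mst[v] && key[v] != INT_MAX && (u == -1 || key[v] < key[u])) u = v;
    if (u == -1) break;  // remaining vertices are unreachable (disconnected graph)
    in_mst[u] = true;
    // Relax the edges leaving u.
    for (int v = 0; v < n; ++v)
      if (w[u][v] > 0 && !in_mst[v] && w[u][v] < key[v]) {
        key[v] = w[u][v];
        parent[v] = u;
      }
  }
  std::vector<std::pair<int, int>> edges;
  for (int v = 1; v < n; ++v)
    if (parent[v] != -1) edges.emplace_back(parent[v], v);
  return edges;
}
```

To print the tree, loop over the returned pairs and output each edge with its weight w[parent][child], instead of printing the unchanged input matrix.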
I'm very interested in cryptography, and since I also like programming, I decided to write a little program to encrypt files using the XTEA encryption algorithm.
I took inspiration from Wikipedia and wrote this function to do the encryption (to save space, I won't post the deciphering function, as it is almost the same):
void encipher(long *v, long *k)
{
long v0 = v[0], v1 = v[1];
long sum = 0;
long delta = 0x9e3779b9;
short rounds = 32;
for(uint32 i = 0; i<rounds; i++)
{
v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + k[sum & 3]);
sum += delta;
v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + k[(sum>>11) & 3]);
}
v[0] = v1;
v[1] = v1;
}
To use it, I wrote this code:
long data[2]; // v0 and v1, 64bits
data[0] = 1;
data[1] = 1;
long key[4]; // 4 * 4 bytes = 16bytes = 128bits
*key = 123; // sets the key
cout << "READ: \t\t" << data[0] << endl << "\t\t" << data[1] << endl;
encipher(data, key);
cout << "ENCIPHERED: \t" << data[0] << endl << "\t\t" << data[1] << endl;
decipher(data, key);
cout << "DECIPHERED: \t" << data[0] << endl << "\t\t" << data[1] << endl;
I always get either a run-time crash or a wrong deciphered text.
I do understand the basics of the program, but I don't really know what is wrong with my code. Why are the enciphered data[0] and data[1] the same? And why is the deciphered data completely different from the starting data? Am I using the wrong types?
I hope you can help me solve my problem :)
Jan
The problem is here:
v[0] = v1; // should be v[0] = v0
v[1] = v1;
Also, you only set the first 4 bytes of the key. The remaining 12 bytes are uninitialized.
Try something like this:
key[0] = 0x12345678;
key[1] = 0x90ABCDEF;
key[2] = 0xFEDCBA09;
key[3] = 0x87654321;
The fixed code gives me this output:
READ: 1
1
ENCIPHERED: -303182565
-1255815002
DECIPHERED: 1
1
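Here is a sketch of the corrected pair with fixed-width unsigned types, following the XTEA reference code on Wikipedia. The question's version uses long, which causes two problems. First, long is signed, so overflow in += is undefined behaviour and right shifts of negative values are implementation-defined. Second, long is 64 bits wide on some platforms (for example 64-bit Linux). Either way, it does not reliably behave like the 32-bit words XTEA is defined on. All four key words are initialized explicitly.

```cpp
#include <cassert>
#include <cstdint>

// XTEA encryption of one 64-bit block (two 32-bit words). Unsigned arithmetic
// wraps modulo 2^32 and >> is a logical shift, as the algorithm expects.
void encipher(unsigned int num_rounds, std::uint32_t v[2], const std::uint32_t key[4])
{
  const std::uint32_t delta = 0x9E3779B9;
  std::uint32_t v0 = v[0], v1 = v[1], sum = 0;
  for (unsigned int i = 0; i < num_rounds; ++i) {
    v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
    sum += delta;
    v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
  }
  v[0] = v0;  // write back v0, not v1
  v[1] = v1;
}

// Inverse: runs the rounds backwards, starting from sum = delta * num_rounds.
void decipher(unsigned int num_rounds, std::uint32_t v[2], const std::uint32_t key[4])
{
  const std::uint32_t delta = 0x9E3779B9;
  std::uint32_t v0 = v[0], v1 = v[1], sum = delta * num_rounds;
  for (unsigned int i = 0; i < num_rounds; ++i) {
    v1 -= (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
    sum -= delta;
    v0 -= (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
  }
  v[0] = v0;
  v[1] = v1;
}
```

Usage would look like const std::uint32_t key[4] = {0x12345678, 0x90ABCDEF, 0xFEDCBA09, 0x87654321}; followed by encipher(32, data, key) and decipher(32, data, key) with the same round count. The printed ciphertext will be unsigned, so it won't look like the negative numbers above.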