Using btSoftBodyHelpers::CreateFromTriMesh with trimesh - bullet

I've been trying for a while to get support for softbodies in my project,
I have already added all primitives, including static triangle meshes as you can see below:
I've now been trying to implement the softbodies.
I do have triangle shapes as I mentioned, and I thought I could re-use the triangulation code to
create softbody objects with the function:
btSoftBody* psb = btSoftBodyHelpers::CreateFromTriMesh(.....);
I successfully did this with the bunny mesh that's hardcoded, but now I want to insert any trinangulated mesh into this function.
But I'm a bit lost figuring out exactly what parameters to send in (how to get the right parameters from my triangulated mesh).
Do anyone of you have a example of this? (not a hardcoded one, but from a
btTriangleMesh *mTriMesh = new btTriangleMesh();
type object? )
It does work with the predefined type shapes that bullet has, so my update loop and all that works fine.

This is for version 2.81 (assuming vertices are stored as PHY_FLOAT and indices as PHY_INTEGER):
btTriangleMesh *mTriMesh = new btTriangleMesh();
// ...
const btVector3 meshScaling = mTriMesh->getScaling();
btAlignedObjectArray<btScalar> vertices;
btAlignedObjectArray<int> triangles;
for (int part=0;part< mTriMesh->getNumSubParts(); part++)
{
const unsigned char * vertexbase;
const unsigned char * indexbase;
int indexstride;
int stride,numverts,numtriangles;
PHY_ScalarType type, gfxindextype;
mTriMesh->getLockedReadOnlyVertexIndexBase(&vertexbase,numverts,type,stride,&indexbase,indexstride,numtriangles,gfxindextype,part);
for (int gfxindex=0; gfxindex < numverts; gfxindex++)
{
float* graphicsbase = (float*)(vertexbase+gfxindex*stride);
vertices.push_back(graphicsbase[0]*meshScaling.getX());
vertices.push_back(graphicsbase[1]*meshScaling.getY());
vertices.push_back(graphicsbase[2]*meshScaling.getZ());
}
for (int gfxindex=0;gfxindex < numtriangles; gfxindex++)
{
unsigned int* tri_indices= (unsigned int*)(indexbase+gfxindex*indexstride);
triangles.push_back(tri_indices[0]);
triangles.push_back(tri_indices[1]);
triangles.push_back(tri_indices[2]);
}
}
btSoftBodyWorldInfo worldInfo;
// Setup worldInfo...
// ....
btSoftBodyHelper::CreateFromTriMesh(worldInfo, &vertices[0], &triangles[0], triangles.size()/3 /*, randomizeConstraints = true*/);
A slower, more general approach is to iterate the mesh using mTriMesh->InternalProcessAllTriangles() but that will make your mesh a soup.

Related

Why does thrust::device_vector not seem to have a chance to hold raw pointers to other device_vectors?

I have a question that I found many threads in, but none did explicitly answer my question.
I am trying to have a multidimensional array inside the kernel of the GPU using thrust. Flattening would be difficult, as all the dimensions are non-homogeneous and I go up to 4D. Now I know I cannot have device_vectors of device_vectors, for whichever underlying reason (explanation would be welcome), so I tried going the way over raw-pointers.
My reasoning is, a raw pointer points onto memory on the GPU, why else would I be able to access it from within the kernel. So I should technically be able to have a device_vector, which holds raw pointers, all pointers that should be accessible from within the GPU. This way I constructed the following code:
thrust::device_vector<Vector3r*> d_fluidmodelParticlePositions(nModels);
thrust::device_vector<unsigned int***> d_allFluidNeighborParticles(nModels);
thrust::device_vector<unsigned int**> d_nFluidNeighborsCrossFluids(nModels);
for(unsigned int fluidModelIndex = 0; fluidModelIndex < nModels; fluidModelIndex++)
{
FluidModel *model = sim->getFluidModelFromPointSet(fluidModelIndex);
const unsigned int numParticles = model->numActiveParticles();
thrust::device_vector<Vector3r> d_neighborPositions(model->getPositions().begin(), model->getPositions().end());
d_fluidmodelParticlePositions[fluidModelIndex] = CudaHelper::GetPointer(d_neighborPositions);
thrust::device_vector<unsigned int**> d_fluidNeighborIndexes(nModels);
thrust::device_vector<unsigned int*> d_nNeighborsFluid(nModels);
for(unsigned int pid = 0; pid < nModels; pid++)
{
FluidModel *fm_neighbor = sim->getFluidModelFromPointSet(pid);
thrust::device_vector<unsigned int> d_nNeighbors(numParticles);
thrust::device_vector<unsigned int*> d_neighborIndexesArray(numParticles);
for(unsigned int i = 0; i < numParticles; i++)
{
const unsigned int nNeighbors = sim->numberOfNeighbors(fluidModelIndex, pid, i);
d_nNeighbors[i] = nNeighbors;
thrust::device_vector<unsigned int> d_neighborIndexes(nNeighbors);
for(unsigned int j = 0; j < nNeighbors; j++)
{
d_neighborIndexes[j] = sim->getNeighbor(fluidModelIndex, pid, i, j);
}
d_neighborIndexesArray[i] = CudaHelper::GetPointer(d_neighborIndexes);
}
d_fluidNeighborIndexes[pid] = CudaHelper::GetPointer(d_neighborIndexesArray);
d_nNeighborsFluid[pid] = CudaHelper::GetPointer(d_nNeighbors);
}
d_allFluidNeighborParticles[fluidModelIndex] = CudaHelper::GetPointer(d_fluidNeighborIndexes);
d_nFluidNeighborsCrossFluids[fluidModelIndex] = CudaHelper::GetPointer(d_nNeighborsFluid);
}
Now the compiler won't complain, but accessing for example d_nFluidNeighborsCrossFluids from within the kernel will work, but return wrong values. I access it like this (again, from within a kernel):
d_nFluidNeighborsCrossFluids[iterator1][iterator2][iterator3];
// Note: out of bounds indexing guaranteed to not happen, indexing is definitely right
The question is, why does it return wrong values? The logic behind it should work in my opinion, since my indexing is correct and the pointers should be valid addresses from within the kernel.
Thank you already for your time and have a great day.
EDIT:
Here is a minimal reproducable example. For some reason the values appear right despite of having the same structure as my code, but cuda-memcheck reveals some errors. Uncommenting the two commented lines leads me to my main problem I am trying to solve. What does the cuda-memcheck here tell me?
/* Part of this example has been taken from code of Robert Crovella
in a comment below */
#include <thrust/device_vector.h>
#include <stdio.h>
template<typename T>
static T* GetPointer(thrust::device_vector<T> &vector)
{
return thrust::raw_pointer_cast(vector.data());
}
__global__
void k(unsigned int ***nFluidNeighborsCrossFluids, unsigned int ****allFluidNeighborParticles){
const unsigned int i = blockIdx.x*blockDim.x + threadIdx.x;
if(i > 49)
return;
printf("i: %d nNeighbors: %d\n", i, nFluidNeighborsCrossFluids[0][0][i]);
//for(int j = 0; j < nFluidNeighborsCrossFluids[0][0][i]; j++)
// printf("i: %d j: %d neighbors: %d\n", i, j, allFluidNeighborParticles[0][0][i][j]);
}
int main(){
const unsigned int nModels = 2;
const int numParticles = 50;
thrust::device_vector<unsigned int**> d_nFluidNeighborsCrossFluids(nModels);
thrust::device_vector<unsigned int***> d_allFluidNeighborParticles(nModels);
for(unsigned int fluidModelIndex = 0; fluidModelIndex < nModels; fluidModelIndex++)
{
thrust::device_vector<unsigned int*> d_nNeighborsFluid(nModels);
thrust::device_vector<unsigned int**> d_fluidNeighborIndexes(nModels);
for(unsigned int pid = 0; pid < nModels; pid++)
{
thrust::device_vector<unsigned int> d_nNeighbors(numParticles);
thrust::device_vector<unsigned int*> d_neighborIndexesArray(numParticles);
for(unsigned int i = 0; i < numParticles; i++)
{
const unsigned int nNeighbors = i;
d_nNeighbors[i] = nNeighbors;
thrust::device_vector<unsigned int> d_neighborIndexes(nNeighbors);
for(unsigned int j = 0; j < nNeighbors; j++)
{
d_neighborIndexes[j] = i + j;
}
d_neighborIndexesArray[i] = GetPointer(d_neighborIndexes);
}
d_nNeighborsFluid[pid] = GetPointer(d_nNeighbors);
d_fluidNeighborIndexes[pid] = GetPointer(d_neighborIndexesArray);
}
d_nFluidNeighborsCrossFluids[fluidModelIndex] = GetPointer(d_nNeighborsFluid);
d_allFluidNeighborParticles[fluidModelIndex] = GetPointer(d_fluidNeighborIndexes);
}
k<<<256, 256>>>(GetPointer(d_nFluidNeighborsCrossFluids), GetPointer(d_allFluidNeighborParticles));
if (cudaGetLastError() != cudaSuccess)
printf("Sync kernel error: %s\n", cudaGetErrorString(cudaGetLastError()));
cudaDeviceSynchronize();
}
A device_vector is a class definition. That class has various methods and operators associated with it. The thing that allows you to do this:
d_nFluidNeighborsCrossFluids[...]...;
is a square-bracket operator. That operator is a host operator (only). It is not usable in device code. Issues like this give rise to the general statements that "thrust::device_vector is not usable in device code." The device_vector object itself is generally not usable. However the data it contains is usable in device code, if you attempt to access it via a raw pointer.
Here is an example of a thrust device vector that contains an array of pointers to the data contained in other device vectors. That data is usable in device code, as long as you don't attempt to make use of the thrust::device_vector object itself:
$ cat t1509.cu
#include <thrust/device_vector.h>
#include <stdio.h>
template <typename T>
__global__ void k(T **data){
printf("the first element of vector 1 is: %d\n", (int)(data[0][0]));
printf("the first element of vector 2 is: %d\n", (int)(data[1][0]));
printf("the first element of vector 3 is: %d\n", (int)(data[2][0]));
}
int main(){
thrust::device_vector<int> vector_1(1,1);
thrust::device_vector<int> vector_2(1,2);
thrust::device_vector<int> vector_3(1,3);
thrust::device_vector<int *> pointer_vector(3);
pointer_vector[0] = thrust::raw_pointer_cast(vector_1.data());
pointer_vector[1] = thrust::raw_pointer_cast(vector_2.data());
pointer_vector[2] = thrust::raw_pointer_cast(vector_3.data());
k<<<1,1>>>(thrust::raw_pointer_cast(pointer_vector.data()));
cudaDeviceSynchronize();
}
$ nvcc -o t1509 t1509.cu
$ cuda-memcheck ./t1509
========= CUDA-MEMCHECK
the first element of vector 1 is: 1
the first element of vector 2 is: 2
the first element of vector 3 is: 3
========= ERROR SUMMARY: 0 errors
$
EDIT: In the mcve you have now posted, you point out that an ordinary run of the code appears to give correct results, but when you use cuda-memcheck, errors are reported. You have a general design problem that will cause this.
In C++, when an object is defined within a curly-braces region:
{
{
Object A;
// object A is in-scope here
}
// object A is out-of-scope here
}
// object A is out of scope here
k<<<...>>>(anything that points to something in object A); // is illegal
and you exit that region, the object defined within the region is now out of scope. For objects with constructors/destructors, this usually means the destructor of the object will be called when it goes out-of-scope. For a thrust::device_vector (or std::vector) this will deallocate any underlying storage associated with that vector. That does not necessarily "erase" any data, but attempts to use that data are illegal and would be considered UB (undefined behavior) in C++.
When you establish pointers to such data inside an in-scope region, and then go out-of-scope, those pointers no longer point to anything that would be legal to access, so attempts to dereference the pointer would be illegal/UB. Your code is doing this. Yes, it does appear to give the correct answer, because nothing is actually erased on deallocation, but the code design is illegal, and cuda-memcheck will highlight that.
I suppose one fix would be to pull all this stuff out of the inner curly-braces, and put it at main scope, just like the d_nFluidNeighborsCrossFluids device_vector is. But you might also want to rethink your general data organization strategy and flatten your data.
You should really provide a minimal, complete, verifiable/reproducible example; yours is neither minimal, nor complete, nor verifiable.
I will, however, answer your side-question:
I know I cannot have device_vectors of device_vectors, for whichever underlying reason (explanation would be welcome)
While a device_vector regards a bunch of data on the GPU, it's a host-side data structure - otherwise you would not have been able to use it in host-side code. On the host side, what it holds should be something like: The capacity, the size in elements, the device-side pointer to the actual data, and maybe more information. This is similar to how an std::vector variable may refer to data that's on the heap, but if you create the variable locally the fields I mentioned above will exist on the stack.
Now, those fields of the device vector that are located in host memory are not generally accessible from the device-side. In device-side code you would typically use the raw pointer to the device-side data the device_vector manages.
Also, note that if you have a thrust::device_vector<T> v, each use of operator[] means a bunch of separate CUDA calls to copy data to or from the device (unless there's some caching going on under the hoold). So you really want to avoid using square-brackets with this structure.
Finally, remember that pointer-chasing can be a performance killer, especially on a GPU. You might want to consider massaging your data structure somewhat in order to make it amenable to flattening.

Wrong results after using FFTW_EXHAUSTIVE flag for 2D complex FFTW

I'm using FFTW to compute a 2D complex to complex FFT using this code:
#include <stdlib.h>
#include "defines.h"
#include <math.h>
#include <fftw3.h>
int main(void)
{
fftw_complex *in,*out;
fftw_plan plan;
int rows=64;
int cols=64;
int i;
in = (fftw_complex*)fftw_malloc(sizeof(fftw_complex)*rows*cols);
out = (fftw_complex*)fftw_malloc(sizeof(fftw_complex)*rows*cols);
for (i=0; i<rows*cols; i++)
{
in[i][0] = input_data[2*i];
in[i][1] = input_data[2*i+1];;
}
printf("### Setting plan ###\n");
plan = fftw_plan_dft_2d(rows, cols, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
printf("### Executing plan ###\n");
fftw_execute(plan);
for ( i = 0; i <rows*cols; i++ )
{
printf ( "RE = %f \t IM = %f\n",in[i][0], in[i][1] );
}
fftw_destroy_plan(plan);
fftw_free(in);
fftw_free(out);
return 0;
}
Now, I changed the FFTW flag from ESTIMATE to EXHAUSTIVE in order to allow the planner to choose the optimal algorithm for this 2D FFT but I got an all-zeros result. Can someone tell me what is wrong?
Using the flag FFTW_ESTIMATE, the function fftw_plan_dft_2d() tries to guess which FFT algorithm is the fastest without running any of them. Using the flag FFTW_EXHAUSTIVE, that function runs every possible algorithm and select the fastest one.
The problem is that the input is overwritten in the process.
The solution is to populate the input array after the plan creation!
See documentation of planner flags:
Important: the planner overwrites the input array during planning unless a saved plan (see Wisdom) is available for that problem, so you should initialize your input data after creating the plan. The only exceptions to this are the FFTW_ESTIMATE and FFTW_WISDOM_ONLY flags, as mentioned below.

Using PCL with 2D Laser scanners

I've been researching for a while now on creating a point cloud from laser scans, but I'm running into a few issues:
First of all, PCL doesn't have io support for hokuyo lasers, so I'm planning on using the hokuyoaist library for that.
The main problem I have is how to convert from 2D laser data to a point cloud (pointcloud2) so I can work with the PCL library. I am aware of some packages in ROS that do this, but I really don't want to get near to ROS doing this.
Thanks in advance,
Marwan
You could use something like (untested, but should get you going):
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <vector>
// Get Hokuyo data
int numberOfDataPoints = 0; // you need to fill this
std::vector<double> hokuyoDataX,hokuyoDataY;
for(int i=0;i<numberOfDataPoints;i++)
{
hokuyoDataX.push_back(...); // you need to fill this for the x of point i
hokuyoDataY.push_back(...); // you need to fill this for the y of point i
}
// Define new cloud object
pcl::PointCloud<pcl::PointXY>::Ptr cloud (new pcl::PointCloud<pcl::PointXY>);
cloud->is_dense = true; // no NaN and INF expected.
cloud->width = hokuyoDataX.size();
cloud->height = 1;
cloud->points.resize(hokuyoDataX.size());
// Now fill the pointcloud
for(int i=0; i<hokuyoDataX.size(); i++)
{
cloud->points[i].x = hokuyoDataX[i];
cloud->points[i].y = hokuyoDataY[i];
}

array struct with pointer accessing members sequentially

I am still learning about pointers and structs, but I hoping someone might know if it is possible to access individual members sequentially by use of a pointer?
Typedef record_data {
float a;
float b;
float c;
}records,*Sptr;
records lists[5];
Sptr ptr;
Example: assign all members of the 5 lists with a value of float 1.0
// instead of this
(void)testworks(void){
int i;
float j=1.0;
ptr = &lists[i]
ptr->lists[0].a = j;
ptr->lists[0].b = j;
ptr->lists[0].c = j;
ptr->lists[1].a = j;
// ... and so on
ptr->lists[4].c = j;
}
// want to do this
(void)testwannado(void){
int a,i;
float j=1.0;
ptr = &lists[i]
for (a=0;a<5;a++){ // step through typedef structs
for (i=0;i<3;i++){ // step through members
???
}
}
Forgive my errors in this example below, but it represents the closest thing I can think of for want I am trying to accomplish.
int *mptr;
mptr = &(ptr->lists[0].a) // want to assign a pointer to members so all 3 members can be used...
*mptr++ = j; // so I can do something like this.
This wasn't compiled, so any other errors are unintentional.
You generally don't want to do that. Structure members should be accessed individually. You can run into a lot of portability problems by assuming the memory layout of how multiple consecutive structure members are placed in memory. And most (C-like) languages do not give you a way to "introspect" through the members of a structure.

Efficient conversion of AVFrame to QImage

I need to extract frames from a video in my Qt based application. Using ffmpeg libraries I am able to fetch frames as AVFrames which I need to convert to QImage to use in other parts of my application. This conversion needs to be efficient. So far it seems sws_scale() is the right function to use but I am not sure what source and destination pixel formats are to be specified.
Came up with the following 2-step process that first converts a decoded AVFame to another AVFrame in RGB colorspace and then to QImage. It works and is reasonably fast.
src_frame = get_decoded_frame();
AVFrame *pFrameRGB = avcodec_alloc_frame(); // intermediate pframe
if(pFrameRGB==NULL) {
;// Handle error
}
int numBytes= avpicture_get_size(PIX_FMT_RGB24,
is->video_st->codec->width, is->video_st->codec->height);
uint8_t *buffer = (uint8_t*)malloc(numBytes);
avpicture_fill((AVPicture*)pFrameRGB, buffer, PIX_FMT_RGB24,
is->video_st->codec->width, is->video_st->codec->height);
int dst_fmt = PIX_FMT_RGB24;
int dst_w = is->video_st->codec->width;
int dst_h = is->video_st->codec->height;
// TODO: cache following conversion context for speedup,
// and recalculate only on dimension changes
SwsContext *img_convert_ctx_temp;
img_convert_ctx_temp = sws_getContext(
is->video_st->codec->width, is->video_st->codec->height,
is->video_st->codec->pix_fmt,
dst_w, dst_h, (PixelFormat)dst_fmt,
SWS_BICUBIC, NULL, NULL, NULL);
QImage *myImage = new QImage(dst_w, dst_h, QImage::Format_RGB32);
sws_scale(img_convert_ctx_temp,
src_frame->data, src_frame->linesize, 0, is->video_st->codec->height,
pFrameRGB->data,
pFrameRGB->linesize);
uint8_t *src = (uint8_t *)(pFrameRGB->data[0]);
for (int y = 0; y < dst_h; y++)
{
QRgb *scanLine = (QRgb *) myImage->scanLine(y);
for (int x = 0; x < dst_w; x=x+1)
{
scanLine[x] = qRgb(src[3*x], src[3*x+1], src[3*x+2]);
}
src += pFrameRGB->linesize[0];
}
If you find a more efficient approach, let me know in the comments
I know, it's too late, but maybe someone will find it useful. From here I got the clue of doing the same conversion, which looks a bit shorter.
So I created QImage which is reused for every decoded frame:
QImage img( width, height, QImage::Format_RGB888 );
Created frameRGB:
frameRGB = av_frame_alloc();
//Allocate memory for the pixels of a picture and setup the AVPicture fields for it.
avpicture_alloc( ( AVPicture *) frameRGB, AV_PIX_FMT_RGB24, width, height);
After the the first frame is decoded I create conversion context SwsContext this way (it will be used for all the next frames):
mImgConvertCtx = sws_getContext( codecContext->width, codecContext->height, codecContext->pix_fmt, width, height, AV_PIX_FMT_RGB24, SWS_BICUBIC, NULL, NULL, NULL);
And finally for every decoded frame conversion is performed:
if( 1 == framesFinished && nullptr != imgConvertCtx )
{
//conversion frame to frameRGB
sws_scale(imgConvertCtx, frame->data, frame->linesize, 0, codecContext->height, frameRGB->data, frameRGB->linesize);
//setting QImage from frameRGB
for( int y = 0; y < height; ++y )
memcpy( img.scanLine(y), frameRGB->data[0]+y * frameRGB->linesize[0], mWidth * 3 );
}
See the link for the specifics.
A simpler approach, I think:
void takeSnapshot(AVCodecContext* dec_ctx, AVFrame* frame)
{
SwsContext* img_convert_ctx;
img_convert_ctx = sws_getContext(dec_ctx->width,
dec_ctx->height,
dec_ctx->pix_fmt,
dec_ctx->width,
dec_ctx->height,
AV_PIX_FMT_RGB24,
SWS_BICUBIC, NULL, NULL, NULL);
AVFrame* frameRGB = av_frame_alloc();
avpicture_alloc((AVPicture*)frameRGB,
AV_PIX_FMT_RGB24,
dec_ctx->width,
dec_ctx->height);
sws_scale(img_convert_ctx,
frame->data,
frame->linesize, 0,
dec_ctx->height,
frameRGB->data,
frameRGB->linesize);
QImage image(frameRGB->data[0],
dec_ctx->width,
dec_ctx->height,
frameRGB->linesize[0],
QImage::Format_RGB888);
image.save("capture.png");
}
Today, I have tested directly pass the image->bit() to swscale and finally it works, so it doesn't need to copy to memory. For example:
/* 1. Get frame and QImage to show */
struct my_frame *frame = get_frame(source);
QImage *myImage = new QImage(dst_w, dst_h, QImage::Format_RGBA8888);
/* 2. Convert and write into image buffer */
uint8_t *dst[] = {myImage->bits()};
int linesizes[4];
av_image_fill_linesizes(linesizes, AV_PIX_FMT_RGBA, frame->width);
sws_scale(myswscontext, frame->data, (const int*)frame->linesize,
0, frame->height, dst, linesizes);
I just discovered that scanLine is just seeking thru the buffer.. all you need is use AV_PIX_FMT_RGB32 for the AVFrame and QImage::FORMAT_RGB32 for the QImage.
Then after decoding just do a memcpy
memcpy(img.scanLine(0), pFrameRGB->data[0], pFrameRGB->linesize[0] * pFrameRGB->height());
I had problems with the other proposed solutions as :
They did not mention freeing either AVFrame, SwsContext or the allocated buffers, which caused massive memory leaks (I had thousands of frames to handle). These problems couldn't all be solved easily as QImage relies on the underlying data, and does not copy it. If freeing the buffer directly, the QImage points to freed data and breaks. This could be solved by using QImage's cleanupFunction to free the buffer once the image is no longer needed, but with other problems it wasn't good anyways.
In some cases one of the suggestions, of passing QImage.bits directly to sws_scale, would not work as QImage are minimum 32 bit aligned. Therefore for certain dimensions it would not match the expected width by sws_scale and output each line shifted a little bit.
A third problem is that they used deprecated AVPicture elements.
I listed the problem in another question Converting an AVFrame to QImage with conversion of pixel format and in the end found a solution using a temporary buffer, which could be copied to the QImage, and then safely freed.
So see my answer for a fully working, efficient, and with no deprecated function calls, implementation : https://stackoverflow.com/a/68212609/7360943

Resources