I'm trying to implement geometry instancing in my engine using Uniform Buffer Objects.
The idea is simple: I write the state of all visible entities for the current scene into a single UBO.
Then, after command sorting:
Without instancing, I bind the range of the current instance within InstancingUBO (glBindBufferRange).
With instancing, I bind two buffers over their full ranges: InstancingUBO and a second dynamic buffer, InstancingIndexesUBO.
Due to alignment requirements, each instance entry is exactly 256 bytes.
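The binding code looks roughly like this (just a sketch: the binding-point constants and buffer handles are placeholder names, not my real engine identifiers):
// Sketch of the per-draw UBO binding (INSTANCE_BINDING, instancingUbo, etc. are placeholders).
const GLsizeiptr kInstanceSize = 256; // one SInstance entry, padded to the UBO offset alignment

if (!useInstancing) {
    // bind only the 256-byte slice of the current instance
    glBindBufferRange(GL_UNIFORM_BUFFER, INSTANCE_BINDING, instancingUbo,
                      instanceIndex * kInstanceSize, kInstanceSize);
} else {
    // bind both buffers over their full ranges
    glBindBufferBase(GL_UNIFORM_BUFFER, INSTANCING_BINDING, instancingUbo);
    glBindBufferBase(GL_UNIFORM_BUFFER, INSTANCING_INDEXES_BINDING, instancingIndexesUbo);
}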
The vertex shader (pseudocode) looks like this:
"
#version 300 es
layout(location = LOCATION_POSITION) in vec4 mesh_position;
layout(location = LOCATION_NORMAL) in vec3 mesh_normal;
LAYOUT_LOCATION(4) out highp vec3 vertex_worldPosition;
LAYOUT_LOCATION(5) out mediump vec3 vertex_worldNormal;
#define MAX_INSTANCES 128
struct SInstance {
mat4 worldFromModelMatrix;
mat3 worldFromModelNormalMatrix;
vec4 weights;
uint features;
uint layers;
uint Id;
float parameter;
vec4 _padding[7];
};
layout (std140) uniform InstanceUBO {
SInstance instance;
} objectUniforms;
layout (std140) uniform InstancingUBO {
SInstance instances[MAX_INSTANCES];
} objectUniformsInstances;
layout (std140) uniform InstancingIndexesUBO {
uvec4 instances[MAX_INSTANCES/4];
} instancesUniforms;
#ifdef HAS_INSTANCING
uint getInstanceIndex()
{
int inst = gl_InstanceID;
uint instIdx = instancesUniforms.instances[inst >> 2][inst & 3];
return instIdx;
}
mat4 getModelMatrix()
{
uint instance = getInstanceIndex();
return objectUniformsInstances.instances[instance].worldFromModelMatrix;
}
mat3 getNormalMatrix()
{
uint instance = getInstanceIndex();
return objectUniformsInstances.instances[instance].worldFromModelNormalMatrix;
}
#else
mat4 getModelMatrix()
{
return objectUniforms.instance.worldFromModelMatrix;
}
mat3 getNormalMatrix()
{
return objectUniforms.instance.worldFromModelNormalMatrix;
}
#endif
void main()
{
vertex_worldNormal = mesh_normal * getNormalMatrix();
vec4 worldPosition = mesh_position * getModelMatrix();
vertex_worldPosition = worldPosition.xyz;
gl_Position = viewUniforms.viewProjMatrix * worldPosition;
}
"
I'm not sure what the problem is, but it turns out the code behaves completely differently on different devices/platforms.
For example, the code above works perfectly with an OpenGL 4.5 Core context. It also works in WebGL2 on Firefox, Mac Chrome and Mac Safari.
But it doesn't work in Windows Chrome or any Windows Chromium-based browser (Edge, Opera): the instanced objects are simply missing.
To make it work in Windows Chromium I have to change the instancing UBO declarations like this:
layout (std140) uniform InstancingUBO {
SInstance instances[2];
} objectUniformsInstances;
layout (std140) uniform InstancingIndexesUBO {
uvec4 instances[2];
} instancesUniforms;
So I declare 2 instances in both instancing UBOs instead of 128.
But then it stops working in Windows Firefox, Mac Chrome and Mac Safari.
And it still doesn't work in mobile iOS Safari.
It looks like the problem is related to dynamic indexing into InstancingUBO.
Is this unsupported by the WebGL2 standard?
How can I make UBO-based instancing work on all platforms/browsers, including mobile?
I'm writing a renderer from scratch using OpenCL and I have a compilation problem with my kernel, with the error:
CL_BUILD_PROGRAM : error: program scope variable must reside in constant address space static float* objects;
The problem is that this program compiles on my desktop (with Nvidia drivers) but doesn't build on my laptop (also with Nvidia drivers). I also have the exact same kernel file in another project that works fine on both computers...
Does anyone have an idea what I could be doing wrong?
To clarify, I'm coding a raymarcher whose kernel takes a list of objects "encoded" in a float array. That array is needed throughout the program, which is why I want it accessible to the whole kernel.
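For reference, on the host side I pack the scene into that float array roughly like this (simplified sketch; the tag values, attribute lengths and fields shown here are placeholders, not the real project's layout):
// Hypothetical host-side encoding matching how the kernel walks the array.
#include <vector>

constexpr float SPHERE = 1.0f, PLANE = 2.0f;   // illustrative type tags
constexpr int SPHERE_ATR_LENGTH = 8;           // type, cx, cy, cz, radius, r, g, b
constexpr int PLANE_ATR_LENGTH  = 8;           // type, nx, ny, nz, d, r, g, b

std::vector<float> encodeScene() {
    std::vector<float> objects;
    objects.push_back(0.0f);                   // objects[0]: total length, patched below
    float sphere[SPHERE_ATR_LENGTH] = {SPHERE, 0, 1, 0, 0.5f, 1, 0, 0};
    float plane [PLANE_ATR_LENGTH]  = {PLANE,  0, 1, 0, 0,    1, 1, 1};
    objects.insert(objects.end(), sphere, sphere + SPHERE_ATR_LENGTH);
    objects.insert(objects.end(), plane,  plane  + PLANE_ATR_LENGTH);
    objects[0] = static_cast<float>(objects.size()); // kernel reads this as arr_length
    return objects;                            // uploaded with clCreateBuffer + clEnqueueWriteBuffer
}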
Here is the simplified kernel code:
float* objects;
float4 getDistCol(float3 position) {
int arr_length = (int)objects[0]; // first element stores the total length of the encoded array
float4 distCol = {INFINITY, 0, 0, 0};
int index = 1;
while (index < arr_length) {
float objType = objects[index];
if (compare(objType, SPHERE)) {
// Treats the part of the buffer as a sphere
index += SPHERE_ATR_LENGTH;
} else if (compare(objType, PLANE)) {
//Treats the part of the buffer as a plane
index += PLANE_ATR_LENGTH;
} else {
float4 errCol = {500, 1, 0, 0};
return errCol;
}
}
return distCol; // nothing matched: return the accumulated result (INFINITY if no hit)
}
__kernel void mkernel(__global int *image, __constant int *dimension,
__constant float *position, __constant float *aimDir, __global float *objs) {
objects = objs;
// Gets ray direction and other setup
// ...
// ...
float4 distCol = RayMarch(ro, rd);
float3 impact = rd*distCol.x + ro;
float3 col = distCol.yzw * GetLight(impact);
image[dimension[0]*dimension[1] - idx*dimension[1] + idy] = toInt(col);
}
getDistCol(float3 position) gets called a lot, by many different functions, and I would like to avoid having to pass my float buffer to every function that needs to call getDistCol()...
There are no "static" variables allowed in OpenCL C that you can declare outside of kernels and use across kernels. Some compilers might still tolerate this, others might not. Nvidia recently changed their OpenCL compiler from LLVM 3.4 to NVVM 7 in a driver update, so you may have two different compilers on your desktop and laptop GPUs.
In your case, the solution is to hand the global kernel parameter pointer over to the function:
float4 getDistCol(float3 position, __global float *objects) {
int arr_length = objects[0]; // access objects normally, as you would in the kernel
// ...
}
kernel void mkernel(__global int *image, __constant int *dimension, __constant float *position, __constant float *aimDir, __global float *objs) {
// ...
getDistCol(position, objs); // hand global objs pointer over to function
// ...
}
Free-standing program-scope variables are only allowed in the constant address space, which is useful for large lookup tables. They are cached in the L2 cache, so read-only access is potentially faster. Example:
constant float objects[1234] = {
1.0f, 2.0f, ...
};
Background:
Attempting to write a game where 'FTL' travel is unaffected by gravity and acceleration is instant.
How do I calculate where a planet will be, given the planet's Kepler orbit, the ship's current position, and its maximum FTL speed (in m/s)?
I can get the position of the planet for a given DateTime, but I'm struggling to figure out how to calculate where a planet will be, and where to send the ship to, without chasing the planet around the orbit.
I would iterate:
1. compute the distance between the planet and the ship's current position
2. from that, compute how much time your ship needs to reach the target if the target were static (not moving); let's call this time t
3. compute the planet position at actual_time + t and compute t for this new position
4. remember the last t (call it t0), then compute a new t in the same way as in #1 but for the position of the planet after t
5. loop back to #2
6. stop if fabs(t-t0) < accuracy
This iterative solution should get closer to the correct t with each iteration, unless your planet moves too fast and/or the ship is really too far away or too slow (the initial t is a significant part of, or even bigger than, the planet's year). In such a case you usually first jump into the star system and then jump to the planet (like in the original Elite).
For obscure planetary motion (like a very small orbital period) you would need different methods, but realize that such a case implies either a planet very near its star or a very heavy central mass like a black hole...
In code with constant FTL speed it would look like this:
vec3 pp,ps=vec3(?,?,?); // planet and ship positions
double t,t0,time;
time=actual_time(); t=0.0;
for (int i=0;i<100;i++) // just avoiding infinite loop in case t/planet_orbit_period>=~0.5
{
t0=t;
pp = planet_position(time+t);
t=Length(pp-ps)/ship_FTL_speed;
if (fabs(t-t0)*ship_FTL_speed<=ship_safe_FTL_distance) break;
}
This should converge pretty quickly (5-10 iterations should be enough). Now t holds the travel time and pp holds the position your ship should head to. However, if i>=100, no solution was found, so you first need to move closer to the system, use a faster FTL drive, or use a different method; but that should not be the case for typical in-system FTL, as the travel time should be far less than the target's orbital period...
btw. this might interest you:
Is it possible to make realistic n-body solar system simulation in matter of size and mass?
[Edit1] slower-than-FTL translation drive
I gave it a bit of thought and changed the algorithm a bit. First it checks all positions along the whole planet period with some step (100 points per period) and remembers the candidate whose travel time best matches the planet's arrival time, regardless of how many planet periods pass during the travel. Then it simply "recursively" checks around the best location with a smaller and smaller angular step. Here is a preview of the result:
And the updated source (full VCL app code, so just use/port what you need and ignore the rest):
//$$---- Form CPP ----
//---------------------------------------------------------------------------
#include <vcl.h>
#include <math.h>
#pragma hdrstop
#include "win_main.h"
#include "GLSL_math.h" // just for vec3
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TMain *Main;
//---------------------------------------------------------------------------
// constants
const double deg=M_PI/180.0;
const double t_day=60.0*60.0*24.0;
// view
double view_x0=0.0;
double view_y0=0.0;
double zoom=1.0;
// simulation
double sim_t=0.0,sim_dt=0.01*t_day;
//---------------------------------------------------------------------------
void toscr(double &x,double &y)
{
x*=zoom; x+=view_x0;
y*=zoom; y+=view_y0;
}
//---------------------------------------------------------------------------
class planet // Kepler body simplified to 2D axis aligned. For a fully 3D orbit add the missing orbital parameters and equations
{
public:
// input parameters
double a,b,t0,T; // major axis,minor axis, time where M=E=0.0 deg, orbital period
// computed parameters
double c1,c2,e;
void ld(double _a,double _b,double _t0,double _T)
{
// copy input orbital parameters
a=_a;
b=_b;
t0=_t0;
T=_T;
// prepare orbital constants
e=sqrt(1.0-((b*b)/(a*a))); // eccentricity
if (e>=1.0) e=0; // wrong e
c1=sqrt((1.0+e)/(1.0-e)); // some helper constants computation
c2=a*(1-e*e);
//b=a*sqrt(1.0-(e*e));
}
vec3 position(double t) // actual position relative to center mass of the system
{
int q;
vec3 p;
double E,V,r,M;
// compute mean orbital position M [rad] from time t
M=(t-t0)/T;
M-=floor(M);
M*=2.0*M_PI;
// compute real orbital position E [rad] from M
for (E=M,q=0;q<20;q++) E=M+e*sin(E);// Kepler's equation
// heliocentric ellipse
V=2.0*atan(c1*tan(E/2.0));
r=c2/(1.0+e*cos(V));
p.x=r*cos(V);
p.y=r*sin(V);
p.z=0.0;
return p;
}
void draw_orbit(TCanvas *scr)
{
int i;
double ang,x,y,r,V,E;
x=a; y=0; toscr(x,y);
for (i=2,E=0.0;i;E+=3.6*deg)
{
if (E>=2.0*M_PI) { E=0.0; i=0; }
V=2.0*atan(c1*tan(E/2.0));
r=c2/(1.0+e*cos(V));
x=r*cos(V);
y=r*sin(V);
toscr(x,y);
if (i==2){ scr->MoveTo(x,y); i=1; }
else scr->LineTo(x,y);
}
}
};
//---------------------------------------------------------------------------
class ship // Space ship with translation propulsion
{
public:
vec3 pos,dir; // position and translation direction
double spd,tim; // translation speed and time to translate or 0.0 if no translation
ship() { pos=vec3(0.0,0.0,0.0); dir=pos; spd=0.0; tim=0.0; }
void update(double dt) // simulate dt time step has passed
{
if (tim<=0.0) return;
if (dt>tim) { dt=tim; tim=0.0; }
else tim-=dt;
pos+=spd*dt*dir;
}
void intercept(planet &pl) // set course for planet pl intercept
{
if (spd<=0.0) { tim=0.0; return; }
const double d=1000000.0; // safe distance to target
/*
// [Iteration]
int i;
vec3 p;
double t0;
for (tim=0.0,i=0;i<100;i++)
{
t0=tim;
p=pl.position(sim_t+tim);
tim=length(p-pos)/spd;
if (fabs(tim-t0)*spd<=d) break;
}
dir=normalize(p-pos);
*/
// [search]
vec3 p;
int i;
double tt,t,dt,a0,a1,T;
// find orbital position with min error (coarse)
for (a1=-1.0,t=0.0,dt=0.01*pl.T;t<pl.T;t+=dt)
{
p=pl.position(sim_t+t); // try time t
tt=length(p-pos)/spd;
a0=tt-t; if (a0<0.0) continue; // ignore overshoots
a0/=pl.T; // remove full periods from the difference
a0-=floor(a0);
a0*=pl.T;
if ((a0<a1)||(a1<0.0)) { a1=a0; tim=tt; } // remember best option
}
// find orbital position with min error (fine)
for (i=0;i<3;i++) // recursive increase of accuracy
for (a1=-1.0,t=tim-dt,T=tim+dt,dt*=0.1;t<T;t+=dt)
{
p=pl.position(sim_t+t); // try time t
tt=length(p-pos)/spd;
a0=tt-t; if (a0<0.0) continue; // ignore overshoots
a0/=pl.T; // remove full periods from the difference
a0-=floor(a0);
a0*=pl.T;
if ((a0<a1)||(a1<0.0)) { a1=a0; tim=tt; } // remember best option
}
// direction
p=pl.position(sim_t+tim);
dir=normalize(p-pos);
}
};
//---------------------------------------------------------------------------
planet pl;
ship sh;
//---------------------------------------------------------------------------
void TMain::draw()
{
if (!_redraw) return;
double x,y,r=3;
vec3 p;
// clear buffer
bmp->Canvas->Brush->Color=clBlack;
bmp->Canvas->FillRect(TRect(0,0,xs,ys));
// Star
bmp->Canvas->Pen->Color=clYellow;
bmp->Canvas->Brush->Color=clYellow;
x=0; y=0; toscr(x,y);
bmp->Canvas->Ellipse(x-r,y-r,x+r,y+r);
// planet
bmp->Canvas->Pen->Color=clDkGray;
pl.draw_orbit(bmp->Canvas);
bmp->Canvas->Pen->Color=clAqua;
bmp->Canvas->Brush->Color=clAqua;
p=pl.position(sim_t);
x=p.x; y=p.y; toscr(x,y);
bmp->Canvas->Ellipse(x-r,y-r,x+r,y+r);
// ship
bmp->Canvas->Pen->Color=clRed;
bmp->Canvas->Brush->Color=clRed;
p=sh.pos;
x=p.x; y=p.y; toscr(x,y);
bmp->Canvas->Ellipse(x-r,y-r,x+r,y+r);
// render backbuffer
Main->Canvas->Draw(0,0,bmp);
_redraw=false;
}
//---------------------------------------------------------------------------
__fastcall TMain::TMain(TComponent* Owner) : TForm(Owner)
{
pl.ld(1000000000.0,350000000.0,0.0,50.0*t_day);
sh.pos=vec3(-3500000000.0,-800000000.0,0.0);
sh.spd=500.0; // [m/s]
sh.intercept(pl);
bmp=new Graphics::TBitmap;
bmp->HandleType=bmDIB;
bmp->PixelFormat=pf32bit;
pyx=NULL;
_redraw=true;
}
//---------------------------------------------------------------------------
void __fastcall TMain::FormDestroy(TObject *Sender)
{
if (pyx) delete[] pyx;
delete bmp;
}
//---------------------------------------------------------------------------
void __fastcall TMain::FormResize(TObject *Sender)
{
xs=ClientWidth; xs2=xs>>1;
ys=ClientHeight; ys2=ys>>1;
bmp->Width=xs;
bmp->Height=ys;
if (pyx) delete[] pyx;
pyx=new int*[ys];
for (int y=0;y<ys;y++) pyx[y]=(int*) bmp->ScanLine[y];
_redraw=true;
view_x0=xs-(xs>>3);
view_y0=ys2;
zoom=double(xs2)/(2.5*pl.a);
// draw(); Sleep(5000);
}
//---------------------------------------------------------------------------
void __fastcall TMain::FormPaint(TObject *Sender)
{
_redraw=true;
}
//---------------------------------------------------------------------------
void __fastcall TMain::tim_redrawTimer(TObject *Sender)
{
for (int i=0;i<10;i++)
{
sh.update(sim_dt);
sim_t+=sim_dt;
if (sh.tim<=0.0) sim_dt=0.0; // stop simulation when jump done
}
if (sim_dt>0.0) _redraw=true;
draw();
}
//---------------------------------------------------------------------------
The important stuff is in the two classes planet and ship. sim_t is the actual simulated time and sim_dt is the simulated time step. In this case the simulation stops after the ship reaches its destination. ship::tim is the remaining travel time, computed in ship::intercept() along with the direction for the preset speed. update() should be called on each simulated time step...
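For completeness, here is a minimal console-only sketch showing how the two classes can be driven without the VCL rendering (it assumes the same vec3/length helpers from GLSL_math.h and reuses the globals and parameters from the app above):
// Minimal console driver for the planet/ship classes above (no VCL, no drawing).
#include <cstdio>

int main()
{
    planet pl;
    ship sh;
    pl.ld(1000000000.0, 350000000.0, 0.0, 50.0*t_day); // a, b, t0, T (same values as the app)
    sh.pos = vec3(-3500000000.0, -800000000.0, 0.0);
    sh.spd = 500.0;                                     // [m/s]
    sh.intercept(pl);                                   // computes sh.dir and sh.tim from sim_t

    double traveled = 0.0;
    while (sh.tim > 0.0)                                // step until the jump is finished
    {
        sh.update(sim_dt);
        sim_t    += sim_dt;
        traveled += sim_dt;
    }
    vec3 p = pl.position(sim_t);                        // where the planet actually is on arrival
    printf("arrived after %.2f days, miss distance %.0f m\n",
           traveled/t_day, double(length(p - sh.pos)));
    return 0;
}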
I've been trying for a while to get support for soft bodies in my project.
I have already added all the primitives, including static triangle meshes.
I've now been trying to implement the softbodies.
I do have triangle shapes as I mentioned, and I thought I could re-use the triangulation code to
create softbody objects with the function:
btSoftBody* psb = btSoftBodyHelpers::CreateFromTriMesh(.....);
I successfully did this with the hardcoded bunny mesh, but now I want to feed any triangulated mesh into this function.
But I'm a bit lost figuring out exactly what parameters to send in (how to get the right parameters from my triangulated mesh).
Does anyone have an example of this? (not a hardcoded one, but from a
btTriangleMesh *mTriMesh = new btTriangleMesh();
type object? )
It does work with the predefined type shapes that bullet has, so my update loop and all that works fine.
This is for version 2.81 (assuming vertices are stored as PHY_FLOAT and indices as PHY_INTEGER):
btTriangleMesh *mTriMesh = new btTriangleMesh();
// ...
const btVector3 meshScaling = mTriMesh->getScaling();
btAlignedObjectArray<btScalar> vertices;
btAlignedObjectArray<int> triangles;
for (int part=0;part< mTriMesh->getNumSubParts(); part++)
{
const unsigned char * vertexbase;
const unsigned char * indexbase;
int indexstride;
int stride,numverts,numtriangles;
PHY_ScalarType type, gfxindextype;
mTriMesh->getLockedReadOnlyVertexIndexBase(&vertexbase,numverts,type,stride,&indexbase,indexstride,numtriangles,gfxindextype,part);
for (int gfxindex=0; gfxindex < numverts; gfxindex++)
{
float* graphicsbase = (float*)(vertexbase+gfxindex*stride);
vertices.push_back(graphicsbase[0]*meshScaling.getX());
vertices.push_back(graphicsbase[1]*meshScaling.getY());
vertices.push_back(graphicsbase[2]*meshScaling.getZ());
}
for (int gfxindex=0;gfxindex < numtriangles; gfxindex++)
{
unsigned int* tri_indices= (unsigned int*)(indexbase+gfxindex*indexstride);
triangles.push_back(tri_indices[0]);
triangles.push_back(tri_indices[1]);
triangles.push_back(tri_indices[2]);
}
}
btSoftBodyWorldInfo worldInfo;
// Setup worldInfo...
// ....
btSoftBody* psb = btSoftBodyHelpers::CreateFromTriMesh(worldInfo, &vertices[0], &triangles[0], triangles.size()/3 /*, randomizeConstraints = true*/);
A slower, more general approach is to iterate the mesh using mTriMesh->InternalProcessAllTriangles() but that will make your mesh a soup.
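If you go the InternalProcessAllTriangles() route, a rough sketch could look like this (every triangle gets its own three vertices, hence the soup):
// Collect a triangle soup via the generic callback interface (Bullet 2.x).
struct SoupCollector : public btInternalTriangleIndexCallback
{
    btAlignedObjectArray<btScalar> vertices;
    btAlignedObjectArray<int>      triangles;

    virtual void internalProcessTriangleIndex(btVector3* triangle, int partId, int triangleIndex)
    {
        for (int v = 0; v < 3; v++)
        {
            triangles.push_back(vertices.size() / 3); // duplicated vertices: a soup
            vertices.push_back(triangle[v].getX());
            vertices.push_back(triangle[v].getY());
            vertices.push_back(triangle[v].getZ());
        }
    }
};

SoupCollector soup;
btVector3 aabbMin(-BT_LARGE_FLOAT, -BT_LARGE_FLOAT, -BT_LARGE_FLOAT);
btVector3 aabbMax( BT_LARGE_FLOAT,  BT_LARGE_FLOAT,  BT_LARGE_FLOAT);
mTriMesh->InternalProcessAllTriangles(&soup, aabbMin, aabbMax);
btSoftBody* psb = btSoftBodyHelpers::CreateFromTriMesh(worldInfo,
        &soup.vertices[0], &soup.triangles[0], soup.triangles.size()/3);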
While writing some signal processing in CUDA I recently made huge progress in optimizing it. By using 1D textures and adjusting my access patterns I managed to get a 10× performance boost. (I had previously tried transaction-aligned prefetching from global into shared memory, but the non-uniform access patterns happening later messed up the warp-to-shared-memory bank association, I think.)
So now I'm facing the question of how CUDA textures and texture bindings interact with asynchronous memcpy.
Consider the following kernel:
texture<...> mytexture;
__global__ void mykernel(float *pOut)
{
pOut[threadIdx.x] = tex1Dfetch(mytexture, threadIdx.x);
}
The kernel is launched in multiple streams
extern void *sourcedata;
#define N_CUDA_STREAMS ...
cudaStream_t stream[N_CUDA_STREAMS];
void *d_pOut[N_CUDA_STREAMS];
void *d_texData[N_CUDA_STREAMS];
for(int k_stream = 0; k_stream < N_CUDA_STREAMS; k_stream++) {
cudaStreamCreate(&stream[k_stream]);
cudaMalloc(&d_pOut[k_stream], ...);
cudaMalloc(&d_texData[k_stream], ...);
}
/* ... */
for(int i_datablock = 0; i_datablock < n_datablocks; i_datablock++) {
int const k_stream = i_datablock % N_CUDA_STREAMS;
cudaMemcpyAsync(d_texData[k_stream], (char*)sourcedata + i_datablock * blocksize, ..., stream[k_stream]);
cudaBindTexture(0, &mytexture, d_texData[k_stream], ...);
mykernel<<<..., stream[k_stream]>>>((float*)d_pOut[k_stream]);
}
Now what I wonder about is: since there is only one texture reference, what happens when I bind a buffer to the texture while other streams' kernels are accessing that texture? cudaBindTexture doesn't take a stream parameter, so I'm worried that by binding the texture to another device pointer while running kernels are asynchronously accessing said texture, I'll divert their accesses to the other data.
The CUDA documentation doesn't say anything about this. If I have to disentangle this to allow concurrent access, it seems I'd have to create a number of texture references and use a switch statement to choose between them, based on the stream number passed as a kernel launch parameter.
Unfortunately CUDA doesn't allow putting arrays of texture references on the device side, i.e. the following does not work:
texture<...> texarray[N_CUDA_STREAMS];
Layered textures are not an option, because the amount of data I have only fits within a plain 1D texture not bound to a CUDA array (see table F-2 in the CUDA 4.2 C Programming Guide).
Indeed you cannot bind or unbind the texture while it is still being used by a kernel running in a different stream.
Since the number of streams doesn't need to be large to hide the asynchronous memcpys (2 would already do), you could use C++ templates to give each stream its own texture:
texture<float, 1, cudaReadModeElementType> mytexture1;
texture<float, 1, cudaReadModeElementType> mytexture2;
template<int TexSel> __device__ float myTex1Dfetch(int x);
template<> __device__ float myTex1Dfetch<1>(int x) { return tex1Dfetch(mytexture1, x); }
template<> __device__ float myTex1Dfetch<2>(int x) { return tex1Dfetch(mytexture2, x); }
template<int TexSel> __global__ void mykernel(float *pOut)
{
pOut[threadIdx.x] = myTex1Dfetch<TexSel>(threadIdx.x);
}
int main(void)
{
float *out_d[2];
// ...
mykernel<1><<<blocks, threads, 0, stream[0]>>>(out_d[0]);
mykernel<2><<<blocks, threads, 0, stream[1]>>>(out_d[1]);
// ...
}
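On the host side you then bind each stream's staging buffer to its own texture reference right before the launch, roughly like this (a sketch reusing the buffer names from your question; sizes and error checking simplified):
// Per-stream upload, bind and launch; each texture reference belongs to one stream.
for (int i_datablock = 0; i_datablock < n_datablocks; i_datablock++) {
    int const k_stream = i_datablock % 2;   // two streams, matching mytexture1/mytexture2
    cudaMemcpyAsync(d_texData[k_stream], (char*)sourcedata + i_datablock * blocksize,
                    blocksize, cudaMemcpyHostToDevice, stream[k_stream]);
    if (k_stream == 0) {
        cudaBindTexture(0, mytexture1, d_texData[0], blocksize);
        mykernel<1><<<blocks, threads, 0, stream[0]>>>((float*)d_pOut[0]);
    } else {
        cudaBindTexture(0, mytexture2, d_texData[1], blocksize);
        mykernel<2><<<blocks, threads, 0, stream[1]>>>((float*)d_pOut[1]);
    }
}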
I am trying to write a modern OpenGL (programmable pipeline) program using the Qt SDK. The Qt OpenGL examples show only fixed-pipeline implementations, and the documentation on how to initialize a shader program is very poor. This is the best example they have on how to set up a shader program and load shaders: http://doc.trolltech.com/4.6/qglshaderprogram.html#details
This is not very descriptive, as one can see.
I tried to follow this doc and can't get the shader program working. I get a segmentation fault when the program tries to assign attributes to the shaders. I think the problem is that I access the context in the wrong way, but I can't find any reference on how to set up or retrieve the rendering context. My code goes like this:
static GLfloat const triangleVertices[] = {
60.0f, 10.0f, 0.0f,
110.0f, 110.0f, 0.0f,
10.0f, 110.0f, 0.0f
};
QColor color(0, 255, 0, 255);
int vertexLocation =0;
int matrixLocation =0;
int colorLocation =0;
QGLShaderProgram *pprogram=0;
void OpenGLWrapper::initShaderProgram(){
QGLContext context(QGLFormat::defaultFormat());
QGLShaderProgram program(context.currentContext());
pprogram=&program;
program.addShaderFromSourceCode(QGLShader::Vertex,
"attribute highp vec4 vertex;\n"
"attribute mediump mat4 matrix;\n"
"void main(void)\n"
"{\n"
" gl_Position = matrix * vertex;\n"
"}");
program.addShaderFromSourceCode(QGLShader::Fragment,
"uniform mediump vec4 color;\n"
"void main(void)\n"
"{\n"
" gl_FragColor = color;\n"
"}");
program.link();
program.bind();
vertexLocation= pprogram->attributeLocation("vertex");
matrixLocation= pprogram->attributeLocation("matrix");
colorLocation= pprogram->uniformLocation("color");
}
And here is the rendering loop:
void OpenGLWrapper::paintGL()
{
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
QMatrix4x4 pmvMatrix;
pmvMatrix.ortho(rect());
pprogram->enableAttributeArray(vertexLocation);
pprogram->setAttributeArray(vertexLocation, triangleVertices, 3);
pprogram->setUniformValue(matrixLocation, pmvMatrix);
pprogram->setUniformValue(colorLocation, color);
glDrawArrays(GL_TRIANGLES, 0, 3);
pprogram->disableAttributeArray(vertexLocation);
}
Can anybody help with this setup? Thanks a lot.
You create a local program variable and let your pprogram pointer point to its address. But when initShaderProgram returns, the local program's lifetime ends and pprogram points to garbage, hence the segfault when you try to use it. You should instead create the program dynamically and let Qt handle the memory management:
pprogram = new QGLShaderProgram(context.currentContext(), this);
This assumes OpenGLWrapper derives somehow from QObject; if not, you need to delete the program manually in the destructor (or use a smart pointer, or whatever).
Otherwise your initialization code looks quite reasonable. Your matrix variable should be a uniform and not an attribute, but I'm willing to classify this as a typo. You should also not keep the program bound for its whole lifetime, as this is equivalent to a call to glUseProgram. You should instead call bind() (and release(), which does glUseProgram(0)) in your render routine.
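Put together, the relevant parts could look roughly like this (just a sketch, assuming OpenGLWrapper is a QGLWidget subclass and initShaderProgram() is called from initializeGL()):
void OpenGLWrapper::initShaderProgram()
{
    // Heap-allocated, parented to the widget so Qt cleans it up.
    pprogram = new QGLShaderProgram(context(), this);
    pprogram->addShaderFromSourceCode(QGLShader::Vertex,
        "attribute highp vec4 vertex;\n"
        "uniform mediump mat4 matrix;\n"        // uniform, not attribute
        "void main(void) { gl_Position = matrix * vertex; }");
    pprogram->addShaderFromSourceCode(QGLShader::Fragment,
        "uniform mediump vec4 color;\n"
        "void main(void) { gl_FragColor = color; }");
    pprogram->link();
    vertexLocation = pprogram->attributeLocation("vertex");
    matrixLocation = pprogram->uniformLocation("matrix");
    colorLocation  = pprogram->uniformLocation("color");
}

void OpenGLWrapper::paintGL()
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    QMatrix4x4 pmvMatrix;
    pmvMatrix.ortho(rect());
    pprogram->bind();                           // glUseProgram(...)
    pprogram->enableAttributeArray(vertexLocation);
    pprogram->setAttributeArray(vertexLocation, triangleVertices, 3);
    pprogram->setUniformValue(matrixLocation, pmvMatrix);
    pprogram->setUniformValue(colorLocation, color);
    glDrawArrays(GL_TRIANGLES, 0, 3);
    pprogram->disableAttributeArray(vertexLocation);
    pprogram->release();                        // glUseProgram(0)
}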
In my experience the Qt wrappers for OpenGL objects are rather poor and limited. I just made a thin wrapper for straight OpenGL objects (made cross-platform and easy via GLEW) and made the usual OpenGL calls in QGLWidget. It worked without problems, after struggling for a while with Qt's equivalents.