I want to draw a plane like the one in the picture.
Right now I use a vertex buffer and DrawPrimitive with D3DPT_LINESTRIP, but it does not give the effect I want.
Is there a more effective way to do this?
Please give me some advice. Thank you.
This could be an option. It is not optimal, but it will produce that grid:
void DrawGrid (float32 Size, CColor Color, int32 GridX, int32 GridZ)
{
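// Nothing to draw if the size or the cell counts are not positive
// (this also avoids a division by zero below)
if( Size <= 0 || GridX <= 0 || GridZ <= 0 )
return;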
// Calculate the data
DWORD grid_color_aux = Color.GetUint32Argb();
float32 GridXStep = Size / GridX;
float32 GridZStep = Size / GridZ;
float32 halfSize = Size * 0.5f;
// Set the attributes to the paint device
m_pD3DDevice->SetTexture(0,NULL);
m_pD3DDevice->SetFVF(CUSTOMVERTEX::getFlags());
// Draw the lines of the X axis (an integer counter avoids accumulated float error,
// which could otherwise drop the last line)
for( int32 n = 0; n <= GridX ; ++n )
{
float32 i = -halfSize + n * GridXStep;
CUSTOMVERTEX v[] =
{ {i, 0.0f, -halfSize, grid_color_aux}, {i, 0.0f, halfSize, grid_color_aux} };
m_pD3DDevice->DrawPrimitiveUP( D3DPT_LINELIST,1, v,sizeof(CUSTOMVERTEX));
}
// Draw the lines of the Z axis
for( int32 n = 0; n <= GridZ ; ++n )
{
float32 i = -halfSize + n * GridZStep;
CUSTOMVERTEX v[] =
{ {-halfSize, 0.0f, i, grid_color_aux}, {halfSize, 0.0f, i, grid_color_aux} };
m_pD3DDevice->DrawPrimitiveUP( D3DPT_LINELIST,1, v,sizeof(CUSTOMVERTEX));
}
}
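The CUSTOMVERTEX struct (D3DFVF_CUSTOMVERTEX is not shown here; for this position-plus-color layout it would be D3DFVF_XYZ | D3DFVF_DIFFUSE):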
struct CUSTOMVERTEX
{
float32 x, y, z;
DWORD color;
static unsigned int getFlags()
{
return D3DFVF_CUSTOMVERTEX;
}
};
Note: this is only a grid of lines, so you also need to draw a solid plane underneath it to get a result that looks like what you want.
You can use DrawPrimitive with D3DPT_TRIANGLESTRIP for the plane, then draw the indexed lines afterwards with D3DPT_LINELIST and a depth bias. This way, even though the lines lie on the plane, you won't get any z-fighting.
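The per-line DrawPrimitiveUP calls in the first answer can be replaced by one batched line list. The following is a minimal sketch of building that vertex array on the CPU; GridVertex and buildGridLines are hypothetical names, and the vertex layout is assumed to match the position-plus-color CUSTOMVERTEX above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical vertex type mirroring CUSTOMVERTEX (position + ARGB color).
struct GridVertex {
    float x, y, z;
    std::uint32_t color;
};

// Builds every grid line as one line list: two vertices per line,
// (cellsX + 1) lines parallel to Z and (cellsZ + 1) lines parallel to X.
std::vector<GridVertex> buildGridLines(float size, std::uint32_t color,
                                       int cellsX, int cellsZ) {
    assert(size > 0.0f && cellsX > 0 && cellsZ > 0);
    const float half = size * 0.5f;
    std::vector<GridVertex> v;
    v.reserve(2 * static_cast<std::size_t>(cellsX + 1 + cellsZ + 1));

    // Lines at constant x, running along the Z axis.
    for (int i = 0; i <= cellsX; ++i) {
        const float x = -half + size * i / cellsX;
        v.push_back({x, 0.0f, -half, color});
        v.push_back({x, 0.0f, half, color});
    }
    // Lines at constant z, running along the X axis.
    for (int j = 0; j <= cellsZ; ++j) {
        const float z = -half + size * j / cellsZ;
        v.push_back({-half, 0.0f, z, color});
        v.push_back({half, 0.0f, z, color});
    }
    return v;
}
```

The result would be copied into a vertex buffer once and drawn with a single D3DPT_LINELIST call whose primitive count is the vertex count divided by two.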
I would recommend the book Introduction to 3D programming with DirectX. It explains how to do this in great
detail in Chapter 8, Section 4.
Update
See rationale at the end of my question below
Using WebGL2 I can access a texel by its denormalized coordinates (sorry, I don't know the right lingo for this). That means I don't have to scale them down to 0-1 like I do in texture2D().
However the input to the fragment shader is still the vec2/3 in normalized values.
Is there a way to declare in/out variables in the Vertex and Frag shaders so that I don't have to scale the coordinates?
somewhere in vertex shader:
...
out vec2 TextureCoordinates;
somewhere in frag shader:
...
in vec2 TextureCoordinates;
I would like for TextureCoordinates to be ivec2 and already scaled.
This question and all my other questions on WebGL are related to general-purpose computing using WebGL. We are trying to do tensor (multi-D matrix) operations using WebGL.
We map our data in a few ways to a Texture. The simplest approach we follow is -- assuming we can access our data as a flat array -- to lay it out along the texture's width and go up the texture's height until we're done.
Since our thinking, logic, and calculations are all based on tensor/matrix indices -- inside the fragment shader -- we'd have to map back to/from the X-Y texture coordinates to indices. The intermediate step here is to calculate an offset for a given position of a texel. Then from that offset we can calculate the matrix indices from its strides.
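As a rough illustration of the mapping described above, here is a CPU-side sketch in C++ that goes from a flat offset to a texel position and to tensor indices via strides. The rank-3 shape, the row-major layout, and all function names are assumptions made for the example, not part of our actual shader code.

```cpp
#include <array>
#include <cassert>

// Integer texel position (s = column, t = row), as in the shader code.
struct Texel {
    int s, t;
};

// Row-major strides for a rank-3 tensor of the given shape (assumed layout).
std::array<int, 3> stridesOf(const std::array<int, 3>& shape) {
    return {shape[1] * shape[2], shape[2], 1};
}

// Flat offset -> texel position for a texture `width` texels wide.
Texel offsetToTexel(int offset, int width) {
    assert(width > 0 && offset >= 0);
    const int t = offset / width;
    return {offset - t * width, t};
}

// Texel position -> flat offset.
int texelToOffset(Texel c, int width) {
    return c.t * width + c.s;
}

// Flat offset -> tensor indices, peeling off one stride at a time.
std::array<int, 3> offsetToIndices(int offset, const std::array<int, 3>& strides) {
    std::array<int, 3> idx{};
    for (int d = 0; d < 3; ++d) {
        idx[d] = offset / strides[d];
        offset -= idx[d] * strides[d];
    }
    return idx;
}
```

For a 2x3x4 tensor stored in a texture 8 texels wide, offset 17 lands on texel (1, 2) and on tensor indices (1, 1, 1).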
Calculating an offset in WebGL 1 for very large textures seems to take much longer than in WebGL 2 using the integer coordinates. See below:
WebGL 1 offset calculation
int coordsToOffset(vec2 coords, int width, int height) {
float s = coords.s * float(width);
float t = coords.t * float(height);
int offset = int(t) * width + int(s);
return offset;
}
vec2 offsetToCoords(int offset, int width, int height) {
int t = offset / width;
int s = offset - t*width;
vec2 coords = (vec2(s,t) + vec2(0.5,0.5)) / vec2(width, height);
return coords;
}
WebGL 2 offset calculation in the presence of int coords
int coordsToOffset(ivec2 coords, int width) {
return coords.t * width + coords.s;
}
ivec2 offsetToCoords(int offset, int width) {
int t = offset / width;
int s = offset - t*width;
return ivec2(s,t);
}
It should be clear that for a series of large texture operations we're saving hundreds of thousands of operations just on the offset/coords calculation.
It's not clear why you want to do what you're trying to do. It would be better to ask something like "I'm trying to draw an image/implement post processing glow/do ray tracing/... and to do that I want to use un-normalized texture coordinates because " and then we can tell you if your solution is going to work and how to solve it.
In any case, passing int, unsigned int, ivec2/3/4, or uvec2/3/4 as a varying is supported, but interpolating them is not. You have to declare them as flat.
Still, you can pass un-normalized values as float or vec2/3/4 and then convert to int or ivec2/3/4 in the fragment shader.
The other issue is that you get no filtering with texelFetch, the function that takes texel coordinates instead of normalized texture coordinates. It just returns the exact value of a single texel. It does not support filtering like the normal texture function.
Example:
function main() {
const gl = document.querySelector('canvas').getContext('webgl2');
if (!gl) {
return alert("need webgl2");
}
const vs = `#version 300 es
in vec4 position;
in ivec2 texelcoord;
out vec2 v_texcoord;
void main() {
v_texcoord = vec2(texelcoord);
gl_Position = position;
}
`;
const fs = `#version 300 es
precision mediump float;
in vec2 v_texcoord;
out vec4 outColor;
uniform sampler2D tex;
void main() {
outColor = texelFetch(tex, ivec2(v_texcoord), 0);
}
`;
// compile shaders, link program, look up locations
const programInfo = twgl.createProgramInfo(gl, [vs, fs]);
// create buffers (via gl.createBuffer, gl.bindBuffer, gl.bufferData)
const bufferInfo = twgl.createBufferInfoFromArrays(gl, {
position: {
numComponents: 2,
data: [
-.5, -.5,
.5, -.5,
0, .5,
],
},
texelcoord: {
numComponents: 2,
data: new Int32Array([
0, 0,
15, 0,
8, 15,
]),
}
});
// make a 16x16 texture
const ctx = document.createElement('canvas').getContext('2d');
ctx.canvas.width = 16;
ctx.canvas.height = 16;
for (let i = 23; i > 0; --i) {
ctx.fillStyle = `hsl(${i / 23 * 360 | 0}, 100%, ${i % 2 ? 25 : 75}%)`;
ctx.beginPath();
ctx.arc(8, 15, i, 0, Math.PI * 2, false);
ctx.fill();
}
const tex = twgl.createTexture(gl, { src: ctx.canvas });
gl.useProgram(programInfo.program);
twgl.setBuffersAndAttributes(gl, programInfo, bufferInfo);
// no need to set uniforms since they default to 0
// and only one texture which is already on texture unit 0
gl.drawArrays(gl.TRIANGLES, 0, 3);
}
main();
<canvas></canvas>
<script src="https://twgljs.org/dist/4.x/twgl-full.min.js"></script>
So in response to your updated question it's still not clear what you want to do. Why do you want to pass varyings to the fragment shader? Can't you just do whatever math you want in the fragment shader itself?
Example:
uniform sampler2D tex;
out float result;
// sum all the values in the texture
vec4 sum4 = vec4(0);
ivec2 texDim = textureSize(tex, 0);
for (int y = 0; y < texDim.y; ++y) {
for (int x = 0; x < texDim.x; ++x) {
sum4 += texelFetch(tex, ivec2(x, y), 0);
}
}
result = sum4.x + sum4.y + sum4.z + sum4.w;
Example2
uniform isampler2D indices;
uniform sampler2D data;
out float result;
// sum only the values in data pointed to by indices
vec4 sum4 = vec4(0);
ivec2 texDim = textureSize(indices, 0);
for (int y = 0; y < texDim.y; ++y) {
for (int x = 0; x < texDim.x; ++x) {
ivec2 index = texelFetch(indices, ivec2(x, y), 0).xy;
sum4 += texelFetch(data, index, 0);
}
}
result = sum4.x + sum4.y + sum4.z + sum4.w;
Note that I'm also not an expert in GPGPU, but I have a hunch the code above is not the fastest way, because I believe parallelization happens based on output. The code above has only 1 output, so presumably no parallelization. It would be easy to change it so that it takes a block ID, tile ID, or area ID as input and computes just the sum for that area. Then you'd write out a larger texture with the sum of each block and finally sum the block sums.
Also, dependent and non-uniform texture reads are a known perf issue. The first example reads the texture in order. That's cache friendly. The second example reads the texture in a random order (specified by indices), which is not cache friendly.
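The block-wise reduction suggested above can be sketched on the CPU as follows. This is only an illustration of the two-pass structure in C++; the function names and the square tile shape are assumptions. In a shader, each entry of the first pass would be one output texel and could therefore run in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Pass 1: one partial sum per tile of block x block texels.
std::vector<float> blockSums(const std::vector<float>& data,
                             int width, int height, int block) {
    assert(width > 0 && height > 0 && block > 0);
    assert(static_cast<int>(data.size()) == width * height);
    const int tilesX = (width + block - 1) / block;
    const int tilesY = (height + block - 1) / block;
    std::vector<float> sums(static_cast<std::size_t>(tilesX) * tilesY, 0.0f);

    for (int ty = 0; ty < tilesY; ++ty) {
        for (int tx = 0; tx < tilesX; ++tx) {
            float s = 0.0f;
            // Clamp the tile to the texture so partial edge tiles work.
            const int yEnd = std::min(height, (ty + 1) * block);
            const int xEnd = std::min(width, (tx + 1) * block);
            for (int y = ty * block; y < yEnd; ++y)
                for (int x = tx * block; x < xEnd; ++x)
                    s += data[y * width + x];
            sums[ty * tilesX + tx] = s;
        }
    }
    return sums;
}

// Pass 2: add up the (much smaller) array of partial sums.
float sumAll(const std::vector<float>& sums) {
    float total = 0.0f;
    for (float s : sums) total += s;
    return total;
}
```

The second pass can itself be repeated with blocks until a single value remains.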
I'm trying to blur the alpha channel of a QImage. My current implementation uses the deprecated 'alphaChannel' method and is slow.
QImage blurImage(const QImage & image, double radius)
{
QImage newImage = image.convertToFormat(QImage::Format_ARGB32);
QImage alpha = newImage.alphaChannel();
QImage blurredAlpha = alpha;
for (int x = 0; x < alpha.width(); x++)
{
for (int y = 0; y < alpha.height(); y++)
{
uint color = calculateAverageAlpha(x, y, alpha, radius);
blurredAlpha.setPixel(x, y, color);
}
}
newImage.setAlphaChannel(blurredAlpha);
return newImage;
}
I was also trying to implement it using QGraphicsBlurEffect, but it doesn't affect alpha.
What is the proper way to blur a QImage alpha channel?
I have faced a similar question about pixel read/write access:
Invert your loops. An image is laid out in memory as a succession of rows, so you should iterate over the height first, then over the width.
Use QImage::scanLine to access the data, rather than the expensive QImage::pixel and QImage::setPixel. Pixels in a scan line (i.e. a row) are guaranteed to be consecutive.
Your code will look like this:
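// Assumes image is in QImage::Format_ARGB32, i.e. 4 bytes per pixel.
// Note: the average should be computed from an unmodified copy of the image,
// otherwise already-blurred pixels feed back into later averages.
for (int ii = 0; ii < image.height(); ii++) {
uchar* scan = image.scanLine(ii);
const int depth = 4; // bytes per pixel
for (int jj = 0; jj < image.width(); jj++) {
// each pixel is in fact an ARGB quadruplet
QRgb* rgbpixel = reinterpret_cast<QRgb*>(scan + jj*depth);
QColor color(*rgbpixel);
int alpha = calculateAverageAlpha(ii, jj, color, image);
color.setAlpha(alpha);
// write the pixel back with its new alpha
*rgbpixel = color.rgba();
}
}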
You can go further and optimize the computation of the alpha average. Let's look at the sum of the pixels within a radius r. Call s(x,y) the sum of the alpha values in the (2r+1) x (2r+1) window centered on (x,y). When you move one pixel in either direction, a single line of pixels enters the window while a single line leaves it. Say you move horizontally. If l(x,y) is the sum of the vertical line of 2r+1 pixels centered on (x,y), you have
s(x + 1, y) = s(x, y) + l(x + r + 1, y) - l(x - r, y)
This allows you to efficiently compute a matrix of sums (then averages, by dividing by the number of pixels) in a first pass.
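Here is a minimal sketch of the running-sum idea in plain C++, working on a flat array of alpha values rather than on a QImage. It uses the separable form (a horizontal pass followed by a vertical pass), which is a variation on the column-sum formula above, and it averages only over the pixels that fall inside the image. The function names are made up for the example, and the integer division in each pass introduces a small rounding error.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// 1D box average with a sliding window of radius r: each step adds the
// value entering the window and removes the one leaving it.
std::vector<int> slidingAverage(const std::vector<int>& in, int r) {
    assert(r >= 0);
    const int n = static_cast<int>(in.size());
    std::vector<int> out(in.size());
    long sum = 0;
    int count = 0;
    // Prime the window for x = 0, which covers [0, min(r, n - 1)].
    for (int i = 0; i <= r && i < n; ++i) { sum += in[i]; ++count; }
    for (int x = 0; x < n; ++x) {
        out[x] = static_cast<int>(sum / count);
        const int entering = x + 1 + r;
        const int leaving = x - r;
        if (entering < n) { sum += in[entering]; ++count; }
        if (leaving >= 0) { sum -= in[leaving]; --count; }
    }
    return out;
}

// 2D box blur of a row-major alpha plane: rows first, then columns.
std::vector<int> boxBlurAlpha(const std::vector<int>& alpha,
                              int width, int height, int r) {
    assert(width > 0 && height > 0);
    assert(static_cast<int>(alpha.size()) == width * height);
    std::vector<int> tmp(alpha.size()), out(alpha.size());
    for (int y = 0; y < height; ++y) {
        std::vector<int> row(alpha.begin() + y * width,
                             alpha.begin() + (y + 1) * width);
        const std::vector<int> blurred = slidingAverage(row, r);
        std::copy(blurred.begin(), blurred.end(), tmp.begin() + y * width);
    }
    for (int x = 0; x < width; ++x) {
        std::vector<int> col(height);
        for (int y = 0; y < height; ++y) col[y] = tmp[y * width + x];
        const std::vector<int> blurred = slidingAverage(col, r);
        for (int y = 0; y < height; ++y) out[y * width + x] = blurred[y];
    }
    return out;
}
```

The cost per pixel no longer depends on the radius, which is the point of the optimization.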
I suspect this kind of optimization is already implemented in a much better way in libraries such as OpenCV, so I would encourage you to use existing OpenCV functions if you wish to save time.
I have just started getting into OpenCL and am going through the basics of writing kernel code. I have written a kernel that calculates shuffled keys for an array of points. For N points, the shuffled keys are calculated in 3-bit fashion, where the x-bit at depth d (0
xd = 0 if p.x < Cd.x
xd = 1, otherwise
The Shuffled xyz key is given as:
x1y1z1x2y2z2...xDyDzD
The kernel code is given below. The points are passed in column-major format.
__constant float3 boundsOffsetTable[8] = {
{-0.5,-0.5,-0.5},
{+0.5,-0.5,-0.5},
{-0.5,+0.5,-0.5},
{-0.5,-0.5,+0.5},
{+0.5,+0.5,-0.5},
{+0.5,-0.5,+0.5},
{-0.5,+0.5,+0.5},
{+0.5,+0.5,+0.5}
};
uint setBit(uint x,unsigned char position)
{
uint mask = 1<<position;
return x|mask;
}
__kernel void morton_code(__global float* point,__global uint*code,int level, float3 center,float radius,int size){
// Get the index of the current element to be processed
int i = get_global_id(0);
float3 pt;
pt.x = point[i];pt.y = point[size+i]; pt.z = point[2*size+i];
code[i] = 0;
float3 newCenter;
float newRadius;
if(pt.x>center.x) code = setBit(code,0);
if(pt.y>center.y) code = setBit(code,1);
if(pt.z>center.z) code = setBit(code,2);
for(int l = 1;l<level;l++)
{
for(int i=0;i<8;i++)
{
newRadius = radius *0.5;
newCenter = center + boundsOffsetTable[i]*radius;
if(newCenter.x-newRadius<pt.x && newCenter.x+newRadius>pt.x && newCenter.y-newRadius<pt.y && newCenter.y+newRadius>pt.y && newCenter.z-newRadius<pt.z && newCenter.z+newRadius>pt.z)
{
if(pt.x>newCenter.x) code = setBit(code,3*l);
if(pt.y>newCenter.y) code = setBit(code,3*l+1);
if(pt.z>newCenter.z) code = setBit(code,3*l+2);
}
}
}
}
It works, but I wanted to ask whether I am missing something in the code and whether there is a way to optimize it.
Try this kernel:
__kernel void morton_code(__global float* point,__global uint*code,int level, float3 center,float radius,int size){
// Get the index of the current element to be processed
int i = get_global_id(0);
float3 pt;
pt.x = point[i];pt.y = point[size+i]; pt.z = point[2*size+i];
uint res;
res = 0;
float3 newCenter;
float newRadius;
if(pt.x>center.x) res = setBit(res,0);
if(pt.y>center.y) res = setBit(res,1);
if(pt.z>center.z) res = setBit(res,2);
for(int l = 1;l<level;l++)
{
for(int i=0;i<8;i++)
{
newRadius = radius * 0.5f;
newCenter = center + boundsOffsetTable[i]*radius;
if(newCenter.x-newRadius<pt.x && newCenter.x+newRadius>pt.x && newCenter.y-newRadius<pt.y && newCenter.y+newRadius>pt.y && newCenter.z-newRadius<pt.z && newCenter.z+newRadius>pt.z)
{
if(pt.x>newCenter.x) res = setBit(res,3*l);
if(pt.y>newCenter.y) res = setBit(res,3*l+1);
if(pt.z>newCenter.z) res = setBit(res,3*l+2);
}
}
}
//Save the result
code[i] = res;
}
Rules to optimize:
Avoid global memory. You were updating "code" directly in global memory; I changed that so the result is built in a private variable and written back once. You should see roughly a 3x increase in performance now.
Avoid ifs; use "select" instead where possible (see the OpenCL documentation).
Use more private memory inside the kernel. You don't need to operate at bit level. Operating at int level would be better and would avoid a huge number of calls to "setBit". Then you can construct your result at the end.
Another interesting point: since you are operating in 3D, you can use float3 variables directly and compute the distances with the OpenCL vector operators. This can increase your performance considerably, but it also requires a complete rewrite of your kernel.
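To illustrate building the key in a local integer and writing it once, here is a small CPU sketch in C++. It is not a drop-in replacement for the kernel: it assumes the key layout x1y1z1...xDyDzD with level 1 in the most significant bits, it descends directly into the child cell containing the point instead of testing all eight children, and Point3 and mortonKey are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct Point3 {
    float x, y, z;
};

// Shuffled (Morton) key of p inside a cube centered at `center` with
// half-width `radius`. At each level the bit is 0 if the coordinate is
// below the current cell center and 1 otherwise.
std::uint32_t mortonKey(Point3 p, Point3 center, float radius, int levels) {
    assert(levels >= 1 && levels <= 10);  // 3 bits per level must fit in 32 bits
    std::uint32_t key = 0;
    float half = radius;
    for (int d = 0; d < levels; ++d) {
        const std::uint32_t xb = p.x >= center.x;
        const std::uint32_t yb = p.y >= center.y;
        const std::uint32_t zb = p.z >= center.z;
        // Append this level's three bits to the local result.
        key = (key << 3) | (xb << 2) | (yb << 1) | zb;
        // Move to the center of the child cell that contains the point.
        half *= 0.5f;
        center.x += xb ? half : -half;
        center.y += yb ? half : -half;
        center.z += zb ? half : -half;
    }
    return key;
}
```

In a kernel, the ternaries map naturally onto "select", and the only global write is the final store of the key.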
I am trying to use (more or less) the same code that I used with TBB (Threading Building Blocks).
I don't have a great deal of experience with OpenCL, but I think most of the main code is correct. I believe the errors are in the .cl file, where it does the math.
Here is my mandelbrot code in TBB:
Mandelbrot TBB
Here is my code in OpenCL
Mandelbrot OpenCL
Any help would be greatly appreciated.
I changed the code in the kernel, and it ran fine. My new kernel code is the following:
// voronoi kernels
//
// local memory version
//
kernel void voronoiL(write_only image2d_t outputImage)
{
// get id of element in array
int x = get_global_id(0);
int y = get_global_id(1);
int w = get_global_size(0);
int h = get_global_size(1);
float4 result = (float4)(0.0f,0.0f,0.0f,1.0f);
float MinRe = -2.0f;
float MaxRe = 1.0f;
float MinIm = -1.5f;
float MaxIm = MinIm+(MaxRe-MinRe)*h/w;
float Re_factor = (MaxRe-MinRe)/(w-1);
float Im_factor = (MaxIm-MinIm)/(h-1);
float MaxIterations = 50;
//C imaginary
float c_im = MaxIm - y*Im_factor;
//C real
float c_re = MinRe + x*Re_factor;
//Z real
float Z_re = c_re, Z_im = c_im;
bool isInside = true;
bool col2 = false;
bool col3 = false;
int iteration =0;
for(int n=0; n<MaxIterations; n++)
{
// Z - real and imaginary
float Z_re2 = Z_re*Z_re, Z_im2 = Z_im*Z_im;
// escape test: |Z|^2 = Z_re^2 + Z_im^2 > 4, i.e. |Z| > 2
if(Z_re2 + Z_im2 > 4)
{
if(n >= 0 && n <= (MaxIterations/2-1))
{
col2 = true;
isInside = false;
break;
}
else if(n >= MaxIterations/2 && n <= MaxIterations-1)
{
col3 = true;
isInside = false;
break;
}
}
Z_im = 2*Z_re*Z_im + c_im;
Z_re = Z_re2 - Z_im2 + c_re;
iteration++;
}
if(col2)
{
result = (float4)(iteration*0.05f,0.0f, 0.0f, 1.0f);
}
else if(col3)
{
result = (float4)(255, iteration*0.05f, iteration*0.05f, 1.0f);
}
else if(isInside)
{
result = (float4)(0.0f, 0.0f, 0.0f, 1.0f);
}
write_imagef(outputImage, (int2)(x, y), result);
}
You can also find it here:
https://docs.google.com/file/d/0B6DBARvnB__iUjNSTWJubFhUSDA/edit
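For checking the kernel's output, a scalar C++ version of the same per-pixel iteration can be useful. This is a sketch that mirrors the loop in the kernel above; the function name is made up, and the coloring step is left out.

```cpp
#include <cassert>

// Number of iterations completed before |Z|^2 exceeds 4, starting from
// Z = c as the kernel does. Returns maxIterations if the point never escapes.
int escapeIterations(float cRe, float cIm, int maxIterations) {
    assert(maxIterations > 0);
    float zRe = cRe;
    float zIm = cIm;
    for (int n = 0; n < maxIterations; ++n) {
        const float zRe2 = zRe * zRe;
        const float zIm2 = zIm * zIm;
        if (zRe2 + zIm2 > 4.0f) {
            return n;
        }
        // Z = Z^2 + c, using the squares computed before either part changes.
        zIm = 2.0f * zRe * zIm + cIm;
        zRe = zRe2 - zIm2 + cRe;
    }
    return maxIterations;
}
```

Comparing this count with the kernel's "iteration" value for a few pixels is a quick way to tell math errors from host-side setup errors.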
See this link. It was developed by Eric Bainville. The CPU code, both native and with OpenCL, is not optimal (it does not use SSE/AVX), but I think the GPU code may be good. For the CPU you can speed up the code quite a bit by using AVX and operating on eight pixels at once.
http://www.bealto.com/mp-mandelbrot.html
I am trying to display a mathematical surface f(x,y) defined on a regular XY mesh using OpenGL and C++ in an efficient manner:
struct XYRegularSurface {
double x0, y0;
double dx, dy;
int nx, ny;
XYRegularSurface(int nx_, int ny_) : nx(nx_), ny(ny_) {
z = new float[nx*ny];
}
~XYRegularSurface() {
delete [] z;
}
float& operator()(int ix, int iy) {
return z[ix*ny + iy];
}
float x(int ix, int iy) {
return x0 + ix*dx;
}
float y(int ix, int iy) {
return y0 + iy*dy;
}
float zmin();
float zmax();
float* z;
};
Here is my OpenGL paint code so far:
void color(QColor & col) {
float r = col.red()/255.0f;
float g = col.green()/255.0f;
float b = col.blue()/255.0f;
glColor3f(r,g,b);
}
void paintGL_XYRegularSurface(XYRegularSurface &surface, float zmin, float zmax) {
float x, y, z;
QColor col;
glBegin(GL_QUADS);
for(int ix = 0; ix < surface.nx - 1; ix++) {
for(int iy = 0; iy < surface.ny - 1; iy++) {
x = surface.x(ix,iy);
y = surface.y(ix,iy);
z = surface(ix,iy);
col = rainbow(zmin, zmax, z);color(col);
glVertex3f(x, y, z);
x = surface.x(ix + 1, iy);
y = surface.y(ix + 1, iy);
z = surface(ix + 1,iy);
col = rainbow(zmin, zmax, z);color(col);
glVertex3f(x, y, z);
x = surface.x(ix + 1, iy + 1);
y = surface.y(ix + 1, iy + 1);
z = surface(ix + 1,iy + 1);
col = rainbow(zmin, zmax, z);color(col);
glVertex3f(x, y, z);
x = surface.x(ix, iy + 1);
y = surface.y(ix, iy + 1);
z = surface(ix,iy + 1);
col = rainbow(zmin, zmax, z);color(col);
glVertex3f(x, y, z);
}
}
glEnd();
}
The problem is that this is slow, nx=ny=1000 and fps ~= 1.
How do I optimize this to be faster?
EDIT: following your suggestion (thanks!) regarding VBO
I added:
float* XYRegularSurface::xyz() {
float* data = new float[3*nx*ny];
long i = 0;
for(int ix = 0; ix < nx; ix++) {
for(int iy = 0; iy < ny; iy++) {
data[i++] = x(ix,iy);
data[i++] = y(ix,iy);
data[i++] = z[ix*ny + iy];
}
}
return data;
}
I think I understand how I can create a VBO, initialize it to xyz() and send it to the GPU in one go, but how do I use the VBO when drawing? I understand that this can be done either in the vertex shader or with glDrawElements. I assume the latter is easier? If so: I do not see any QUAD mode in the documentation for glDrawElements!?
Edit2:
So I can loop through all nx*ny quads and draw each by:
GL_UNSIGNED_INT indices[4];
// ... set indices
glDrawElements(GL_QUADS, 1, GL_UNSIGNED_INT, indices);
?
1/. Use display lists, to cache GL commands - avoiding recalculation of the vertices and the expensive per-vertex call overhead. If the data is updated, you need to look at client-side vertex arrays (not to be confused with VAOs). Now ignore this option...
2/. Use vertex buffer objects. Available as of GL 1.5.
Since you need VBOs for core profile anyway (i.e., modern GL), you can at least get to grips with this first.
Well, you've asked a rather open ended question. I'd suggest using modern (3.0+) OpenGL for everything. The point of just about any new OpenGL feature is to provide a faster way to do things. Like everyone else is suggesting, use array (vertex) buffer objects and vertex array objects. Use an element array (index) buffer object too. Most GPUs have a 'post-transform cache', which stores the last few transformed vertices, but this can only be used when you call the glDraw*Elements family of functions. I also suggest you store a flat mesh in your VBO, where y=0 for each vertex. Sample the y from a heightmap texture in your vertex shader. If you do this, whenever the surface changes you will only need to update the heightmap texture, which is easier than updating the VBO. Use one of the floating point or integer texture formats for a heightmap, so you aren't restricted to having your values be between 0 and 1.
If so: I do not see any QUAD mode in the documentation for glDrawElements!?
If you want quads make sure you're looking at the GL 2.1-era docs, not the new stuff.
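Since GL_QUADS is not available in the core profile, another option is to split each grid cell into two triangles and draw the whole surface with one indexed call. Below is a minimal sketch of building that index array in C++. It assumes the vertex order produced by xyz() in the question (vertex (ix, iy) at position ix*ny + iy); buildGridIndices is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Two triangles per grid cell, for a vertex array where the vertex at
// (ix, iy) is stored at index ix * ny + iy.
std::vector<std::uint32_t> buildGridIndices(int nx, int ny) {
    assert(nx >= 2 && ny >= 2);
    std::vector<std::uint32_t> idx;
    idx.reserve(static_cast<std::size_t>(6) * (nx - 1) * (ny - 1));
    for (int ix = 0; ix < nx - 1; ++ix) {
        for (int iy = 0; iy < ny - 1; ++iy) {
            const std::uint32_t v00 = static_cast<std::uint32_t>(ix * ny + iy);
            const std::uint32_t v10 = static_cast<std::uint32_t>((ix + 1) * ny + iy);
            const std::uint32_t v11 = v10 + 1;  // (ix + 1, iy + 1)
            const std::uint32_t v01 = v00 + 1;  // (ix, iy + 1)
            // Same corner order as the quad in the question, split in two.
            idx.insert(idx.end(), {v00, v10, v11, v00, v11, v01});
        }
    }
    return idx;
}
```

With the indices uploaded to an element array buffer, the surface would be drawn with glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_INT, 0), which also lets the GPU reuse transformed vertices shared between neighbouring cells.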