A game uses software rendering to draw a full-screen paletted (8-bit) image in memory.
What's the fastest way to put that image on the screen, using Direct3D?
Currently I convert the paletted image to RGB in software, then put it on a D3DUSAGE_DYNAMIC texture (which is locked with D3DLOCK_DISCARD).
Is there a faster way? E.g. using shaders to perform palettization?
Related questions:
Fast paletted screen blit with OpenGL - same question with OpenGL
How do I improve Direct3D streaming texture performance? - similar question from SDL author
Create a D3DFMT_L8 texture containing the paletted image, and a 256x1 D3DFMT_X8R8G8B8 texture containing the palette.
HLSL shader code:
uniform sampler2D image;   // the paletted frame, uploaded as D3DFMT_L8
uniform sampler1D palette; // the 256x1 D3DFMT_X8R8G8B8 palette, set on stage 1 below

float4 main(in float2 coord : TEXCOORD) : COLOR
{
    // Remap the palette index so that entry 255 does not wrap around to entry 0.
    return tex1D(palette, tex2D(image, coord).r * (255./256) + (0.5/256));
}
Note that the luminance (palette index) is adjusted with a multiply-add operation. This is necessary because palette index 255 is treated as white (maximum luminance), which becomes 1.0 when sampled as a float. Reading the palette texture at that coordinate would wrap around (only the fractional part is used) and return the first palette entry instead. The multiply-add maps index i (sampled as i/255) to (i + 0.5)/256, the center of the i-th texel of the 256-wide palette, so index 255 lands at 255.5/256 rather than at 1.0.
Compile it with:
fxc /Tps_2_0 PaletteShader.hlsl /FhPaletteShader.h
Use it like this:
// ... create and populate the texture and paletteTexture objects (see the sketch below) ...
d3dDevice->CreatePixelShader((DWORD*)g_ps20_main, &shader);
// ...
d3dDevice->SetTexture(0, texture);        // the L8 image texture, as before
d3dDevice->SetTexture(1, paletteTexture); // the palette lookup texture
d3dDevice->SetPixelShader(shader);
// ... draw texture to screen as textured quad as usual ...
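The elided setup and per-frame upload could look roughly like this. This is only a minimal sketch; width, height, sourcePixels and paletteEntries stand in for the game's own framebuffer and palette data (needs d3d9.h):

// Create the two textures once at startup.
IDirect3DTexture9* texture = NULL;
IDirect3DTexture9* paletteTexture = NULL;
d3dDevice->CreateTexture(width, height, 1, D3DUSAGE_DYNAMIC, D3DFMT_L8,
                         D3DPOOL_DEFAULT, &texture, NULL);
d3dDevice->CreateTexture(256, 1, 1, 0, D3DFMT_X8R8G8B8,
                         D3DPOOL_MANAGED, &paletteTexture, NULL);

// Each frame: copy the 8-bit framebuffer into the L8 texture, row by row
// because the locked surface's pitch may be larger than the image width.
D3DLOCKED_RECT rect;
texture->LockRect(0, &rect, NULL, D3DLOCK_DISCARD);
for (int y = 0; y < height; ++y)
    memcpy((BYTE*)rect.pBits + y * rect.Pitch, sourcePixels + y * width, width);
texture->UnlockRect(0);

// Whenever the palette changes: upload the 256 palette entries.
paletteTexture->LockRect(0, &rect, NULL, 0);
memcpy(rect.pBits, paletteEntries, 256 * sizeof(D3DCOLOR));
paletteTexture->UnlockRect(0);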
You could write a simple pixel shader to handle the palettization. Create an L8 dynamic texture and copy your palettized image into it, and create a palette lookup texture (or an array of colors in constant memory). Then just render a full-screen quad with the palettized image set as a texture and a pixel shader that performs the palette lookup from the lookup texture or constant buffer.
That said, performing the palette conversion on the CPU shouldn't be very expensive on a modern CPU. Are you sure that is your performance bottleneck?
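For scale, the CPU conversion the question already performs is essentially one table lookup per pixel; a rough sketch (the buffer names are placeholders for the game's own data):

// One 32-bit table lookup per pixel: 'indices' is the 8-bit frame,
// 'palette32' the 256-entry XRGB palette, 'dest' the locked texture memory.
for (int i = 0; i < width * height; ++i)
    dest[i] = palette32[indices[i]];

On a modern CPU a loop like this is typically limited by memory bandwidth rather than arithmetic.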
Related
I have the following OpenCL kernel code:
kernel void generateImage(global write_only image2d_t output_image)
{
const int2 pos = {get_global_id(0), get_global_id(1)};
write_imagef(output_image, (int2)(pos.x, pos.y), (float4)(1.0f, 0.0f, 0.0f, 0.0f));
}
How can I read the generated image on the CPU side so I can render it? I am using plain C. Also, a link to a good tutorial would be great.
The clEnqueueReadImage() function is an image object's equivalent to a buffer object's clEnqueueReadBuffer() function - with similar semantics. The main difference is that (2D) images have a "pitch" - this is the number of bytes by which you advance in memory if you move 1 pixel along the y axis. (This is not necessarily equal to width times bytes per pixel but can be larger if your destination has special storage/alignment requirements.)
The alternative, much as is the case with buffer objects, is to memory-map the image using clEnqueueMapImage().
How you further process the image once your host program can access it depends on what you're trying to do and what platform you're developing for.
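As a rough sketch, assuming the image was created with a CL_RGBA / CL_UNORM_INT8 format (so one pixel is 4 bytes) and that the command queue and image objects already exist, a blocking read could look like this (C-compatible; needs CL/cl.h and stdlib.h):

size_t width = 640, height = 480;          /* example dimensions -- use your image's */
size_t origin[3] = {0, 0, 0};              /* start at the top-left corner */
size_t region[3] = {width, height, 1};     /* whole image; depth must be 1 for 2D */
unsigned char *pixels = (unsigned char *)malloc(width * height * 4);

/* row_pitch = 0 means "tightly packed", i.e. width * 4 bytes per host row */
cl_int err = clEnqueueReadImage(queue, output_image, CL_TRUE, origin, region,
                                0, 0, pixels, 0, NULL, NULL);
if (err != CL_SUCCESS) { /* handle the error */ }
/* 'pixels' now holds the RGBA data and can be handed to whatever API you use
   for display (e.g. uploaded as a texture or blitted to a window). */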
I'm trying to create a simple shader for my lighting system, and right now I'm adding support for normal mapping. Without the normal map the lighting works perfectly: the normals forwarded from the vertex shader are correct, and I'm also reading the normals from the normal map correctly. I've tried adding the vertex normal and the normal map's normal, and that doesn't work; multiplying them doesn't work either. Here's how I'm reading the normal map:
vec4 normalHeight = texture2D(m_NormalMap, texCoord);
vec3 normals = normalize((normalHeight.xyz * vec3(2.0) - vec3(1.0)));
So I have the correct vertex normals, and the normals from the normal map. How should I combine these to get the correct normals?
It depends on how you store your normal maps. If they are in world space to begin with (this is rather rare) and your scene never changes, you can look them up the way you have them. Typically, however, they are in tangent space. Tangent space is a vector space that uses the object's normal, and the rate of change in the (s,t) texture coordinates to properly transform the normals on a surface with arbitrary orientation.
Tangent space normal maps usually appear bluish to the naked eye, whereas world space normal maps are every color of the rainbow (and need to be biased and scaled because half of the colorspace is supposed to represent negative vectors) :)
If you want to understand tangent space better, complete with implementation on deriving the basis vectors, see this link.
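If it helps, the basis derivation boils down to solving the triangle's edge vectors against its texture-coordinate deltas. A rough sketch with a hypothetical minimal Vec3 type, not tied to any particular engine:

// Per-triangle tangent/bitangent from positions and texture coordinates.
struct Vec3 { float x, y, z; };
static Vec3 sub(Vec3 a, Vec3 b) { Vec3 r = {a.x - b.x, a.y - b.y, a.z - b.z}; return r; }
static Vec3 scale(Vec3 a, float s) { Vec3 r = {a.x * s, a.y * s, a.z * s}; return r; }

void tangentBasis(Vec3 p0, Vec3 p1, Vec3 p2,
                  float u0, float v0, float u1, float v1, float u2, float v2,
                  Vec3 *tangent, Vec3 *bitangent)
{
    Vec3 e1 = sub(p1, p0), e2 = sub(p2, p0);   // position deltas along two edges
    float du1 = u1 - u0, dv1 = v1 - v0;        // texcoord deltas along the same edges
    float du2 = u2 - u0, dv2 = v2 - v0;
    float r = 1.0f / (du1 * dv2 - du2 * dv1);  // assumes non-degenerate UVs
    // T points along increasing u on the surface, B along increasing v.
    *tangent   = scale(sub(scale(e1, dv2), scale(e2, dv1)), r);
    *bitangent = scale(sub(scale(e2, du1), scale(e1, du2)), r);
}

T, B and the interpolated vertex normal N then form the matrix that takes the normal-map vector from tangent space into the space your lighting is computed in, so in the fragment shader you transform the unpacked normal by that matrix instead of adding it to or multiplying it with the vertex normal.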
Does your normal map not contain the adjusted normals? If yes, then you just need to read the texture in the fragment shader and you should have your normal, like so:
vec4 normalHeight = texture2D(m_NormalMap, texCoord);
vec3 normal = normalize(normalHeight.xyz);
If you're trying to account for negative values, then you should be multiplying by the scalar rather than the vector.
vec3 normal = normalize( (normalHeight.xyz * 2.0) - 1.0 );
I realize that both mipmaps and integral images have the problem that the resulting pixel value is not the integral of an arbitrary polygon in original texture space. Integrating over an axis-aligned rectangle in texture coordinates using an integral image requires 4 texture lookups. Using mipmaps, OpenGL interpolates across the 4 adjacent pixel values in the mipmap, so that is also 4 memory lookups. With an integral image you need less memory (no extra pre-resized images, only an integral image instead of the original) and no level-of-detail determination. Of course this can be implemented with shaders, but why was the (now deprecated) fixed-function pipeline ever designed with mipmap support and no integral image support?
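To make the comparison concrete, the four-lookup rectangle average with an integral image looks roughly like this (a sketch only; sat is assumed to hold inclusive, row-major prefix sums of the source image of width w):

// Average over the axis-aligned rectangle [x0,x1] x [y0,y1] (inclusive).
double boxAverage(const double *sat, int w, int x0, int y0, int x1, int y1)
{
    double A = (x0 > 0 && y0 > 0) ? sat[(y0 - 1) * w + (x0 - 1)] : 0.0;
    double B = (y0 > 0)           ? sat[(y0 - 1) * w + x1]       : 0.0;
    double C = (x0 > 0)           ? sat[y1 * w + (x0 - 1)]       : 0.0;
    double D =                      sat[y1 * w + x1];
    double sum = D - B - C + A;                      // the four lookups
    return sum / ((x1 - x0 + 1) * (y1 - y0 + 1));    // divide by rectangle area
}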
Using an integral image you need less memory
I very much doubt that this statement is true.
From what I understand, the values of an integral image can get quite large, therefore requiring a floating-point representation, which will use a lot more space than a typical 24-bit mipmapped texture (a full mipmap chain only adds about a third to the size of an image) and/or be less precise and create noise during interpolation. Also, floating-point images were not used very often with the fixed-function pipeline, and GPUs may have been a lot slower with them.
If you used integers for the picture instead, the bit depth required for the integral image would still grow with resolution: roughly log2(width) + log2(height) + 8 bits for an 8-bit source, so a 256x256 image already needs 24 bits per color channel, and the requirement keeps rising for higher-resolution images.
but why was the (now being deprecated) fixed function pipeline ever designed with mipmap support and no integral image support?
Because the access and interpolation of mipmaps could be built as rather simple hardwired circuits. Ever wondered why texture dimensions had to be powers of two? So that the mipmapping calculations could be implemented as a series of bit shifts and additions. Also, accessing the neighbouring elements in a Gaussian pyramid requires fewer memory accesses than evaluating the integral. And there's your main problem: fill rate, i.e. video memory bandwidth, has always been a bottleneck of GPUs.
I would like to create a QPixmap to draw on using a QPainter. The QPixmap should support transparency without using premultiplied color channels.
Currently I do this by creating a QPixmap with the desired dimensions and filling it with a QColor that has been set to zero for each channel (including alpha).
tex = QtGui.QPixmap(width, height)
c = QtGui.QColor(0)
c.setAlpha(0)
tex.fill(c)
This adds transparency to the QPixmap. However, if I draw to the QPixmap using a QPainter, the drawn color values are premultiplied by the alpha value of the source. I don't want this, because the QPixmap is later used as a texture in a QGLWidget, and when it is rendered the alpha channel of the QPixmap (now the alpha of the source drawn with the QPainter) is multiplied against the color channels again, so the alpha ends up being applied twice.
If I use a QImage with format QtGui.QImage.Format_ARGB32 in place of the QPixmap, then the color channels are not premultiplied and the alpha is applied only once. However this is too slow during rendering. I have tried to draw on QImages with the above format and then convert to QPixmaps, but got the same result (premultiplied color channels again being multiplied by the alpha channel). The Trolltech docs say,
Depending on the system, QPixmap is stored using a RGB32 or a premultiplied alpha format. If the image has an alpha channel, and if the system allows, the preferred format is premultiplied alpha.
I am using X (Linux). Is there any way to force a QPixmap to not premultiply the color channels when that QPixmap has an alpha channel?
I ended up using QImages and optimizing my code by minimizing the number of QGLWidget::bindTexture calls I was making. I still have not received a satisfactory answer about how premultiplied QPixmaps can be used as semi-transparent textures, but I'm satisfied with my program's performance and won't be checking this thread anymore.
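For anyone who finds this later, the QImage route can look roughly like this (shown as C++ Qt 4 for brevity; the PyQt calls have the same names, and glWidget stands in for your QGLWidget):

// Draw into a non-premultiplied ARGB32 QImage and upload it as a texture.
QImage tex(width, height, QImage::Format_ARGB32);
tex.fill(qRgba(0, 0, 0, 0));             // fully transparent, no premultiplication

QPainter painter(&tex);
// ... draw with the painter as before ...
painter.end();

GLuint id = glWidget->bindTexture(tex);  // QGLWidget converts the image for GL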
How do I blur a 3D object in Papervision3D, and save the newly created object as a new 3D model? (This could help with sky/cloud generation.)
Like in the 2D example image (source: narod.ru), I want to turn a rectangle into a blurry shape.
Set useOwnContainer to true, then add the filter:
your3DObject.useOwnContainer = true;
your3DObject.filters = [new BlurFilter(4,4,2)];
When you set useOwnContainer to true, a new 2D DisplayObject is created to render the 3D projection into, and you can apply any of the usual DisplayObject properties to it.
Andy Zupko has a good post about this and render layers.
Using this will cost your processor a bit, so use it wisely. For example, in the twigital I worked on at disturb media we used one Glow for the layer that holds all the characters, not individual render layers for each character. On other projects we 'baked' the filters into bitmaps and used those; this meant a bit more memory, but freed up the processor for other tasks.
HTH
I'm not familiar with Papervision 3D, but blurring in 3D is normally just blurring in 2D. You pick the object you want blurred, determine the blurring you want for that object, then apply a 2D blur before compositing other objects into the scene.
This is a cheat because in principle, different parts of the object may need different degrees of (depth of field) blurring. But it's not the only cheat in 3D graphics.
That said, there are other approaches. Ray-tracing can give true depth-of-field effects (if you're willing to pay the render-time costs). It's also possible to apply a blur to a 3D "voxel" grid instead of a 2D pixel grid - though I imagine that's more useful for smoothing shapes from e.g. medical scanners than for giving depth-of-field effects.
Blur is a 2D operation; try rendering the object into a texture and blurring that texture.