In my 2D game, I have static and dynamic objects. There can be multiple cameras. My problem: Determine objects that intersect with the current camera's view rectangle.
Currently, I simply iterate over all existing objects (not caring wheter dynamic or static) and do an AABB check with the cameras view rect on them. This seems acceptable for very dynamic objects, but not for static objects, where there can be tens of thousands of them (static level geometry scattered over the whole scene).
I have looked into multiple data structures which could solve my problem:
Quadtree
This was the first thing I considered, however the problem is that it would force my scenes to be of fixed size. (Acceptable for static, but not for dynamic objects)
Dynamic AABB tree
Seems good, but the overhead for rebalancing it seems just too great for many dynamic objects.
Spatial hash
The main problem here for me was that if you zoom out with the camera a lot, a huge number of mostly non-existing spatial hash buckets had to be queried, causing low performance.
In general, my criterias for a good solution of this problem are:
Dynamic size: The solution must not cause the scene size to be limited, or require heavy recomputation for resizing
Good query performance (for the camera)
Good support of very dynamic objects: The computations needed to handle objects with constantly changing position should be good:
The maximum sane number of dynamic objects in my game at one time probably is at 5000. Consider they all change their position every frame. Is there even a data structure which can be faster, considering the frequent insertions and deletions, than comparing the AABBs of the objects with the camera every frame?
Don't try to find the silver bullet. Just split your scene into dynamic and static parts and use different algorithms for them.
Quad trees are obviously suitable for static geometry with fixed
bounds.
Spatial hashes are ideal for sets of objects with similar sizes
(particle systems, for example).
AFAIK dynamic AABB trees are rarely used for occlusion culling, their
main purpose is the broad phase of collision detection.
And as you noticed, bruteforce culling is normal for dynamic objects
if the number of them is not really big.
static level geometry scattered over the whole scene
If your scene is highly-sparse, you can divide it into islands, i.e. create a list of scene parts with "good density".
Related
I have some vertex data. Positions, normals, texture coordinates. I probably loaded it from a .obj file or some other format. Maybe I'm drawing a cube. But each piece of vertex data has its own index. Can I render this mesh data using OpenGL/Direct3D?
In the most general sense, no. OpenGL and Direct3D only allow one index per vertex; the index fetches from each stream of vertex data. Therefore, every unique combination of components must have its own separate index.
So if you have a cube, where each face has its own normal, you will need to replicate the position and normal data a lot. You will need 24 positions and 24 normals, even though the cube will only have 8 unique positions and 6 unique normals.
Your best bet is to simply accept that your data will be larger. A great many model formats will use multiple indices; you will need to fixup this vertex data before you can render with it. Many mesh loading tools, such as Open Asset Importer, will perform this fixup for you.
It should also be noted that most meshes are not cubes. Most meshes are smooth across the vast majority of vertices, only occasionally having different normals/texture coordinates/etc. So while this often comes up for simple geometric shapes, real models rarely have substantial amounts of vertex duplication.
GL 3.x and D3D10
For D3D10/OpenGL 3.x-class hardware, it is possible to avoid performing fixup and use multiple indexed attributes directly. However, be advised that this will likely decrease rendering performance.
The following discussion will use the OpenGL terminology, but Direct3D v10 and above has equivalent functionality.
The idea is to manually access the different vertex attributes from the vertex shader. Instead of sending the vertex attributes directly, the attributes that are passed are actually the indices for that particular vertex. The vertex shader then uses the indices to access the actual attribute through one or more buffer textures.
Attributes can be stored in multiple buffer textures or all within one. If the latter is used, then the shader will need an offset to add to each index in order to find the corresponding attribute's start index in the buffer.
Regular vertex attributes can be compressed in many ways. Buffer textures have fewer means of compression, allowing only a relatively limited number of vertex formats (via the image formats they support).
Please note again that any of these techniques may decrease overall vertex processing performance. Therefore, it should only be used in the most memory-limited of circumstances, after all other options for compression or optimization have been exhausted.
OpenGL ES 3.0 provides buffer textures as well. Higher OpenGL versions allow you to read buffer objects more directly via SSBOs rather than buffer textures, which might have better performance characteristics.
I found a way that allows you to reduce this sort of repetition that runs a bit contrary to some of the statements made in the other answer (but doesn't specifically fit the question asked here). It does however address my question which was thought to be a repeat of this question.
I just learned about Interpolation qualifiers. Specifically "flat". It's my understanding that putting the flat qualifier on your vertex shader output causes only the provoking vertex to pass it's values to the fragment shader.
This means for the situation described in this quote:
So if you have a cube, where each face has its own normal, you will need to replicate the position and normal data a lot. You will need 24 positions and 24 normals, even though the cube will only have 8 unique positions and 6 unique normals.
You can have 8 vertexes, 6 of which contain the unique normals and 2 of normal values are disregarded, so long as you carefully order your primitives indices such that the "provoking vertex" contains the normal data you want to apply to the entire face.
EDIT: My understanding of how it works:
I'm currently considering the implementation of a 2D data structure to allow me to store and draw objects in correct Z-Order (GDI+, entities are drawn in call order). The requirements are loosely:
Ability to add new objects to the top of any depth index
Ability to remove arbitrary object
(Ability to move object to the top of new depth index, accomplished by 2 points above)
Fast in-order and reverse-order traversal
As the main requirement is speed of traversal across the full data, the first thing that came to mind was an array like structure, eg. vector. It also easily allows for pushing new objects (removing objects not so great..). This works perfectly fine for our requirements, as it just so happens that the bulk of drawable entities don't change, and the ones that do sit at the top end of the order.
However it got me thinking of the implications for more dynamic requirements:
A vector will resize itself as required -> as the 'depth' vectors would need to be maintained contiguously in memory (top-level vector enforces it), this could lead to some pretty expensive vector resizes. Worst case all vectors need to be moved to new memory location, average case requiring all vectors up the chain to be moved.
Vectors will often hold a buffer at the end for adding new objects -> traversal could still easily force a cache miss while jumping between 'depth' vectors, rendering the top-level vector's contiguous memory less beneficial
Could someone confirm that these observations are indeed correct, making a vector a mostly very expensive structure for storing larger dynamic data sets?
From my thoughts above, I end up deducing that while traversing the whole dataset, specifically jumping between different vectors in the top-level vector, you might as well use any other data structure with inferior traversal complexity, or similar random access complexity (linked_list; map). Traversal would effectively be the same, as we might as well assume the cache misses will happen anyway, and we save ourselves a lot of bother by not keeping the depth vectors contiguously in memory.
Would that indeed be a good solution? If I'm not mistaken, on a 1D problem space, this would come down to what's more important traversal or addition/removal, vector or linked-list. On a 2D space I'm not so sure it is so black and white.
I'm wondering what sort of application requires good traversal across a 2D space, without compromising data addition/removal, and what sort of data structures are used there.
P.S. I just noticed I'm completely ignoring space-complexity, so might as well keep on ignoring it (unless you feel like adding more insight :D)
Your first assumption is somewhat incorrect.
Instead of thinking of vectors as the blob of memory itself, think of it as a pointer to automatically managed blob of memory and some metadata to keep track of it. A vector itself is a fixed size, the memory it keeps track of isn't. (See this example, note that the size of the vector object is constant: https://ideone.com/3mwjRz)
A vector of vectors can be thought of as an array of pointers. Resizing what the pointers point to doesn't mean you need to resize the array that contains them. The promise of items being contiguous still holds: the parent array has all of the pointers adjacent to each other and each pointer points to a contiguous chunk of memory. However, it's not guaranteed that the end of arr[0][N-1] is adjacent to the beginning of arr[1][0]. (To this end, your second point is correct.)
I guess that a Linked List would be more appropriate as you will always be traversing the whole list (vectors are good for random access). Linked lists inserts and removal are very cheap and the traversal isn't that different from a vector traversal. Maybe you should consider a Doubly Linked List as you want to traverse it in both ways.
in these days, i'm trying program on mobile gpu(adreno)
the algorithm what i use for image processing has 'randomness' for memory access.
it refers some pixels in 'fixed' range for filtering.
BUT, i cant know exactly which pixel will be referred(depends on image)
as far as i understood. if multiple thread access local memory bank
it causes bank conflict. so in my case it should make bank conflict.
MY question: Can i eliminate bank conflict at random memory access?
or can i reduce them?
Assuming that the distances of your randomly accessed pixels is somehow normal distributed, you could think of tiling your image into subimages.
What I mean: instead of working with a (lets say) 1024x1024 image, you might have 4x4 images of size 256x256. Each of them is kept together in memory, so "near" pixel access stays within the same image object. Only the far distance operations need to access different subimages.
A second option: instead of using CLImage objects, try to save your data into an array. The data in the array can be stored in a Z-order curve sorting. This also leads to a reduced spatially distribution (compared to row-order-sorting)
But of course, this depends strongly on your image size.
There are a variety of ways to deal with bank conflicts - the size of the elements you are working with, the strides between lines and shifting the coordinates around to different memory addresses. It's never going to be as good as non-random / conflict free though and so what you will notice is depending on the image - you will see significantly different compute times.
See http://cuda-programming.blogspot.com/2013/02/bank-conflicts-in-shared-memory-in-cuda.html
The question is 2D specific.
I am a constantly updating texture, which is a render target for one of my layers. The update is a whole redraw of the texture and is performed by drawing sprites and outputting text. The operation is performed frequently, consumes quite a lot of CPU and I have, of course, optimized the number of the redraws to keep it down.
Is there a way to buffer these operations in Direct3D? Because currently I have to repeatedly construct a chain of sprite/text operations. Lets assume any game performing a world update - how do they overcome this tedious work? Maybe by creating more layers?
The best thing for me would be creating a modifiable draw chain object, but I haven't found anything like this in Direct3D.
There are a few general methods you might look into:
Batching: Order and combine draws to perform as few calls as possible, and draw as many objects between state changes as you can.
Cache: Keep as much geometry in vertex buffers as you can. With 2D, this gets more interesting, since most things are textured quads. In that case...
Shaders: It may be possible to write a vertex shader that takes a float4 giving the X/Y position/size of your quad, then use that to draw 4 vertexes. You won't need to perform full matrix state changes then, just update 4 floats in your shader (skips all the view calculations, 75% less memory and math). To help make sure the right settings are being used with the shaders, ...
State Blocks: Save a state block for each type of sprite, with all the colors, modes, and shaders bound. Then simply apply the state block, bind your textures, set your coordinates, and draw. At best, you can get each sprite down to 4 calls. Even still...
Cull: It's best not to draw something at all. If you can do simple screen bounds-checking (can be faster than the per-poly culling that will be done otherwise), sorting and basic occlusion (flag sprites with transparency). With 2D, most culling checks are very very cheap. Sort and clip and cull wherever you can.
As far as actually buffering, the driver will handle that for you, when and where it's appropriate. State blocks can effect buffering, by delivering all the modes in a single call (I forget whether it's good or bad, though I believe they can be beneficial). Cutting down the calls to:
if (sprite.Visible && Active(sprite) && OnScreen(sprite))
{
states[sprite.Type]->Apply();
device->BindTexture(sprite.Texture);
device->SetVertexShaderF(sprite.PositionSize);
device->Draw(quad);
}
is very likely to help with CPU use.
Let's say I have circular objects. Each object has a diameter of 64 pixels.
The cells of my quad tree are let's say 96x96 pixels.
Everything will be fine and working well when I check collision from the cell a circle is residing in + all it's neighbor cells.
BUT what if I have one circle that has a diameter of 512 pixels? It would cover many cells and thus this would be a problem when checking only the neighbor cells. But I can't re-size my quad-tree-grid every time a much larger object is inserted into the tree...
Instead och putting objects into a single cell put them in all cells they collide with. That way you can just test each cell individually. Use pointers to the object so you dont create copies. Also you only need to do this with leavenodes, so no need to combine data contained in higher nodes with lower ones.
This an interesting problem. Maybe you can extend the node or the cell with a tree height information? If you have an object bigger then the smallest cell nest it with the tree height. That's what map's application like google or bing maps does.
Here a link to a similar solution: http://www.gamedev.net/topic/588426-2d-quadtree-collision---variety-in-size. I was confusing the screen with the quadtree. You can check collision with a simple recusion.
Oversearching
During the search, and starting with the largest objects first...
Test Object.Position.X against QuadTreeNode.Centre.X, and also
test Object.Position.Y against QuadTreeNode.Centre.Y;
... Then, by taking the Absolute value of the difference, treat the object as lying within a specific child node whenever the absolute value is NOT more than the radius of the object...
... that is, when some portion of the object intrudes into that quad : )
The same can be done with AABB (Axis Aligned Bounding Boxes)
The only real caveat here is that VERY large objects that cover most of the screen, will force a search of the entire tree. In these cases, a different approach may be called for.
Of course, this only takes care of the object that everything else is being tested against. To ensure that all the other large objects in the world are properly identified, you will need to alter your quadtree slightly...
Use Multiple Appearances
In this variation on the QuadTree we ONLY place objects in the leaf nodes of the QuadTree, as pointers. Larger objects may appear in multiple leaf nodes.
Since some objects have multiple appearances in the tree, we need a way to avoid them once they've already been tested against.
So...
A simple Boolean WasHit flag can avoid testing the same object multiple times in a hit-test pass... and a 'cleanup' can be run on all 'hit' objects so that they are ready for the next test.
Whilst this makes sense, it is wasteful if performing all-vs-all hit-tests
So... Getting a little cleverer, we can avoid having any cleanup at all by using a Pointer 'ptrLastObjectTestedAgainst' inside of each object in the scene. This avoids re-testing the same objects on this run (the pointer is set after the first encounter)
It does not require resetting when testing a new object against the scene (the new object has a different pointer value than the last one). This avoids the need to reset the pointer as you would with a simple Bool flag.
I've used the latter approach in scenes with vastly different object sizes and it worked well.
Elastic QuadTrees
I've also used an 'elastic' QuadTree. Basically, you set a limit on how many items can IDEALLY fit in each QuadTreeNode - But, unlike a standard QuadTree, you allow the code to override this limit in specific cases.
The overriding rule here is that an object may NOT be placed into a Node that cannot hold it ENTIRELY... with the top node catching any objects that are larger than the screen.
Thus, small objects will continue to 'fall through' to form a regular QuadTree but large objects will not always fall all the way through to the leaf node - but will instead expand the node that last fitted them.
Think of the non-leaf nodes as 'sieving' the objects as they fall down the tree
This turns out to be a very efficient choice for many scenarios : )
Conclusion
Remember that these standard algorithms are useful general tools, but they are not a substitute for thinking about your specific problem. Do not fall into the trap of using a specific algorithm or library 'just because it is well known' ... your application is unique, and it may benefit from a slightly different approach.
Therefore, don't just learn to apply algorithms ... learn from those algorithms, and apply the principles themselves in novel and fitting ways. These are NOT the only tools, nor are they necessarily the best fit for your application.
Hope some of those ideas helped.