buffering direct3d draw operations - 2d

The question is 2D specific.
I am a constantly updating texture, which is a render target for one of my layers. The update is a whole redraw of the texture and is performed by drawing sprites and outputting text. The operation is performed frequently, consumes quite a lot of CPU and I have, of course, optimized the number of the redraws to keep it down.
Is there a way to buffer these operations in Direct3D? Because currently I have to repeatedly construct a chain of sprite/text operations. Lets assume any game performing a world update - how do they overcome this tedious work? Maybe by creating more layers?
The best thing for me would be creating a modifiable draw chain object, but I haven't found anything like this in Direct3D.

There are a few general methods you might look into:
Batching: Order and combine draws to perform as few calls as possible, and draw as many objects between state changes as you can.
Cache: Keep as much geometry in vertex buffers as you can. With 2D, this gets more interesting, since most things are textured quads. In that case...
Shaders: It may be possible to write a vertex shader that takes a float4 giving the X/Y position/size of your quad, then use that to draw 4 vertexes. You won't need to perform full matrix state changes then, just update 4 floats in your shader (skips all the view calculations, 75% less memory and math). To help make sure the right settings are being used with the shaders, ...
State Blocks: Save a state block for each type of sprite, with all the colors, modes, and shaders bound. Then simply apply the state block, bind your textures, set your coordinates, and draw. At best, you can get each sprite down to 4 calls. Even still...
Cull: It's best not to draw something at all. If you can do simple screen bounds-checking (can be faster than the per-poly culling that will be done otherwise), sorting and basic occlusion (flag sprites with transparency). With 2D, most culling checks are very very cheap. Sort and clip and cull wherever you can.
As far as actually buffering, the driver will handle that for you, when and where it's appropriate. State blocks can effect buffering, by delivering all the modes in a single call (I forget whether it's good or bad, though I believe they can be beneficial). Cutting down the calls to:
if (sprite.Visible && Active(sprite) && OnScreen(sprite))
{
states[sprite.Type]->Apply();
device->BindTexture(sprite.Texture);
device->SetVertexShaderF(sprite.PositionSize);
device->Draw(quad);
}
is very likely to help with CPU use.

Related

How are you supposed to update a texture per frame in Vulkan?

I'm trying to work with 2D in vulkan along with 3D. So right now testing out updating a texture for every frame as whatever 2D is going on. I've gotten something of a texture updater working, the problem is that it's very slow and probably not the way it's supposed to be done. Is there any better way of getting this done? The code is based on the https://vulkan-tutorial.com/ code.
https://vulkan-tutorial.com/code/26_depth_buffering.cpp
void UpdateTexture()
{
vkDeviceWaitIdle(device);
vkFreeMemory(device, textureImageMemory, nullptr);
VkBuffer stagingBuffer;
VkDeviceMemory stagingBufferMemory;
createBuffer(imageSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer, stagingBufferMemory);
void* data;
vkMapMemory(device, stagingBufferMemory, 0, imageSize, 0, &data);
memcpy(data, pixel2.data(), static_cast<size_t>(imageSize));
vkUnmapMemory(device, stagingBufferMemory);
createImage(texWidth, texHeight, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, textureImage, textureImageMemory);
transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
copyBufferToImage(stagingBuffer, textureImage, static_cast<uint32_t>(texWidth), static_cast<uint32_t>(texHeight));
transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);
vkDestroyBuffer(device, stagingBuffer, nullptr);
vkFreeMemory(device, stagingBufferMemory, nullptr);
createTextureImageView();
createDescriptorPool();
createDescriptorSets();
createCommandBuffers();
}
This code looks like a direct translation of some OpenGL code, and not particularly good/modern OpenGL code at that.
There's a lot wrong in this code, but most of it boils down to over-synchronization.
First, you should always view any call to vkDeviceWaitIdle as the wrong thing to do. The only exception would be when you are preparing to destroy the VkDevice itself. There is no other reason to do a full CPU/GPU sync like that.
Presumably, this synchronization exists so that you can be sure the GPU is finished using the image before modifying it. This is the wrong thing to do. You should instead employ multiple-buffering. That is, you should have two images that you use. One is currently being used in a rendering process, while the other is being transferred into.
Instead of doing a full device sync, you instead synchronize with the batch you sent two frames ago. That is, if you're wanting to transfer data for use by frame 10, then you must first do a fence-sync operation with the batch you sent in frame 8. Frame 9 is still being processed, but frame 8 is probably done by now. So the synchronization shouldn't hurt too much.
Second, never allocate memory in the middle of an operation like this. Memory gets allocated early in your application, and you leave it allocated until it's time to destroy your application. If you need a staging buffer, then keep it around and reuse it in subsequent frames. Make sure to allocate sufficient storage up-front.
Whatever your createBuffer call is doing, it seems very much like a bad idea. Vulkan is not OpenGL; Vulkan separated memory from buffers/textures that use it for a reason. Creating APIs that hide this separation basically throws all of that away.
Similarly, never unmap memory, unless you're about to destroy that memory object. There's no problem in Vulkan (or OpenGL) with leaving a piece of memory mapped indefinitely. Just map the entire memory's range and leave it mapped. Indeed, you could just pass the mapped pointer directly to your image loader, depending on how the memory get written by the image loading code (if it tries to read data from this pointer, they could be trouble).
Lastly, the commands doing the transfer need to be synchronized with the commands that consume the image. How this happens depends on which queues are being used to do the transfer.
And of course, if you want optimal performance, you may want to check to see if your implementation can read from linear images in your shader. If it can, then you may not need staging at all; you can just write the data directly to the memory in Vulkan's image format, and use it directly.
Employing all of the above is going to add a lot of complexity to your application. But that's how it's supposed to work.
A naive way consists in using the CPU to define the update depending on the time or data and then update the data for the shader, such as a MVP transformation matrix. But this is inefficient with lots of syncing and too low refresh rates, and also overloading the cpu in a loop.
So people recommend using many buffers sometimes mentioning old drivers. If someone can clarify it, that would be nice. I have a naive and probably wrong guess. If they know exactly the frame rate, then they can calculate the time for each frame and dispatch several frames in advance. But it confuses me because the frame rate is dynamic, especially for new screens with the FreeSync functionality that have dynamic refresh rates.
I have thought of a third possibility. One can use the clock directly in the shader. GL_EXT_shader_realtime_clock provides clockRealtimeEXT. It has no defined unit, and will wrap when exceeding the maximum value. But it is said "globally coherent by all invocations on the GPU". During initialization, you can measure its rate using a uniform buffer, and then assume the rate will be constant. And also manage the wrapping.
Then if you can write your shaders as a function of time, for example in a translation, that would be efficient. You just need the initial data. Remember that one must avoid if conditions in shaders.

How do I generate a waypoint map in a 2D platformer without expensive jump simulations?

I'm working on a game (using Game Maker: Studio Professional v1.99.355) that needs to have both user-modifiable level geometry and AI pathfinding based on platformer physics. Because of this, I need a way to dynamically figure out which platforms can be reached from which other platforms in order to build a node graph I can feed to A*.
My current approach is, more or less, this:
For each platform consider each other platform in the level.
For each of those platforms, if it is obviously unreachable (due to being higher than the maximum jump height, for example) do not form a link and move on to next platform.
If a link seems possible, place an ai_character instance on the starting platform and (within the current step event) simulate a jump attempt.
3.a Repeat this jump attempt for each possible starting position on the starting platform.
If this attempt is successful, record the data necessary to replicate it in real time and move on to the next platform.
If not, do not form a link.
Repeat for all platforms.
This approach works, more or less, and produces a link structure that when visualised looks like this:
linked platforms (Hyperlink because no rep.)
In this example the mostly-concealed pink ghost in the lower right corner is trying to reach the black and white box. The light blue rectangles are just there to highlight where recognised platforms are, the actual platforms are the rows of grey boxes. Link lines are green at the origin and red at the destination.
The huge, glaring problem with this approach is that for a level of only 17 platforms (as shown above) it takes over a second to generate the node graph. The reason for this is obvious, the yellow text in the screen centre shows us how long it took to build the graph: over 24,000(!) simulated frames, each with attendant collision checks against every block - I literally just run the character's step event in a while loop so everything it would normally do to handle platformer movement in a frame it now does 24,000 times.
This is, clearly, unacceptable. If it scales this badly at a mere 17 platforms then it'll be a joke at the hundreds I need to support. Heck, at this geometric time cost it might take years.
In an effort to speed things up, I've focused on the other important debugging number, the tests counter: 239. If I simply tried every possible combination of starting and destination platforms, I would need to run 17 * 16 = 272 tests. By figuring out various ways to predict whether a jump is impossible I have managed to lower the number of expensive tests run by a whopping 33 (12%!). However the more exceptions and special cases I add to the code the more convinced I am that the actual problem is in the jump simulation code, which brings me at long last to my question:
How would you determine, with complete reliability, whether it is possible for a character to jump from one platform to another, preferably without needing to simulate the whole jump?
My specific platform physics:
Jumps are fixed height, unless you hit a ceiling.
Horizontal movement has no acceleration or inertia.
Horizontal air control is allowed.
Further info:
I found this video, which describes a similar problem but which doesn't provide a good solution. This is literally the only resource I've found.
You could limit the amount of comparisons by only comparing nearby platforms. I would probably only check the horizontal distance between platforms, and if it is wider than the longest jump possible, then don't bother checking for a link between those two. But you might have done this since you checked for the max height of a jump.
I glanced at the video and it gave me an idea. Instead of looking at all platforms to find which jumps are impossible, what if you did the opposite? Try placing an AI character on all platforms and see which other platforms they can reach. That's certainly easier to implement if your enemies can't change direction in midair though. Oh well, brainstorming is the key to finding something.
Several ideas you could try out:
Limit the amount of comparisons you need to make by using a spatial data structure, like a quad tree. This would allow you to severely limit how many platforms you're even trying to check. This is mostly the same as what you're currently doing, but a bit more generic.
Try to pre-compute some jump trajectories ahead of time. This will not catch all use cases that you have - as you allow for full horizontal control - but might allow you to catch some common cases more quickly
Consider some kind of walkability grid instead of a link generation scheme. When geometry is modified, compute which parts of the level are walkable and which are not, with some resolution (something similar to the dimensions of your agent might be good starting point). You could also filter them with a height, so that grid tiles that are higher than your jump height, and you can't drop from a higher place on to them, are marked as unwalkable. Then, when you compute your pathfinding, as part of your pathfinding step you can compute when you start a jump, if a path is actually executable ('start a jump, I can go vertically no more than 5 tiles, and after the peak of the jump, i always fall down vertically with some speed).

How to achieve realistic reflection with threejs

I am trying to render as realistically as possible a scene in which a point light hits an object and bounces off with the same angle wrt the normal of the face (angle of incidence = angle of reflection) and illuminates the scene elsewhere.
Now, I know reflection in threejs is normally dealt with CubeCamera-material as per the examples I found online, but it doesn't quite apply to my case, for I may be observing the scene from a point in which I might not be able to observe the reflection of the object on the mirror-like surface of another one.
Consider this example prototype I'm working on: if the box that is protruding from the wall in the scene had a mirror-like material (accomplished with a CubeCamera), I wouldn't be able to see the green cube's reflection on the bottom face unless the camera was at a specific position; in real life, however, if an object illuminated by a light source passes in the vicinity of another one, it will in part light it as if it were a light source itself (depending on the object's index of reflectivity, of course) and such phenomenon should be visible from any point of view the object receiving indirect lighting is visible from.
Hence I came up with the idea of adding a PointLight to the cube, but this of course produces undesirable effects on the surroundings.
I will try to illustrate my goal with the following sequence:
1) Here, the far side of what I will henceforth refer to as balcony is correctly dark, while the areas marked with a red 'x' are the consequence of the cube having a child PointLight which shines in all directions.
2) Here, the balcony's far face is still dark and the bottom one is receiving even more light as the cube passes by, which is desirable, but the wall behind the cube should actually be dark (I haven't added shadows yet, I first want to get the lighting right), as well as the ground beneath it and the lamp post.
3) Finally, when the cube has passed the balcony, it's just plain wrong for the balcony's side and bottom face to be illuminated, for we all now that a reflected ray does not bounce back the way it came from. Same applies to the lamp post.
Now I realize that all the mistakes that occur are due to the fact that the cube emits light itself, what I'm hoping you can help me with is determining a way to produce physically accurate reflected rays.
I would like to avoid using ambient light or other hacks to simulate real-life scenarios and stick to physics as much as possible; I suspect what I want to achieve is very computationally heavy to render, let alone animate in a real-time use case, but that's not an issue for I'm merely trying to develop a proof-of-concept, not something that should necessarily perform fast.
From what I gather, I should probably be writing custom vertex and fragment shaders for the materials receiving indirect illumination, right? Unfortunately I wouldn't know where to begin, can anyone point me in the right direction? Cheers.
If you do not want to go to the Volumetric rendering then you have 3 options (I know of)
ray-tracing
you have to use ray-trace rendering (back ray-trace) to achieve this. This will also cover shadows,transparent materials,reflected illumination and much more if coded properly. Unless you want to do also precise atmospheric scattering then this is the way.
back raytracing is one (or 3) ray(s) per each screen pixel. It is much faster but not that precise.. (still precise enough)
raytracing is one ray per each 3D angular unit (steradian) of space per each light source. It is slow but precise (if ray density is high enough).
If the casted ray hits any obstacle then its color is changed (due to obstacle property) and new ray is casted as reflected light ray. If material is transparent then also refracted ray is casted ... Each hit or refraction affect light intensity so you stop when intensity is lower then some treshold or on some layer of recursion (limit max number of refractions per ray) to avoid infinite loops and you can manipulate performance/quality ...
standard polygon rendering
With this approach (I think you are using it right now) you have to improvise. The reflection and illumination effects can be done similar to shadowing techniques. For each surface you have to render the scene in reflected direction. The same can be done with shadows but then you just rendering to the light direction or use shadow map instead. If you have insane number of reflective surfaces then this approach is not the way also to achieve reflection of refraction you have to render recursively making it multiple rendering pass per polygon which is also insane.
cubemap
You can use cube map per each object. It is similar to bullet 2 but the insanity is done just once while generating cubemaps instead of per frame ... If you have too much objects then this is also not the way. You can use cube map only for objects with reflective surfaces to make it manageable. Also if the objects are moving then you have to re-generate cubemaps once in a while ...

Problem with huge objects in a quad tree

Let's say I have circular objects. Each object has a diameter of 64 pixels.
The cells of my quad tree are let's say 96x96 pixels.
Everything will be fine and working well when I check collision from the cell a circle is residing in + all it's neighbor cells.
BUT what if I have one circle that has a diameter of 512 pixels? It would cover many cells and thus this would be a problem when checking only the neighbor cells. But I can't re-size my quad-tree-grid every time a much larger object is inserted into the tree...
Instead och putting objects into a single cell put them in all cells they collide with. That way you can just test each cell individually. Use pointers to the object so you dont create copies. Also you only need to do this with leavenodes, so no need to combine data contained in higher nodes with lower ones.
This an interesting problem. Maybe you can extend the node or the cell with a tree height information? If you have an object bigger then the smallest cell nest it with the tree height. That's what map's application like google or bing maps does.
Here a link to a similar solution: http://www.gamedev.net/topic/588426-2d-quadtree-collision---variety-in-size. I was confusing the screen with the quadtree. You can check collision with a simple recusion.
Oversearching
During the search, and starting with the largest objects first...
Test Object.Position.X against QuadTreeNode.Centre.X, and also
test Object.Position.Y against QuadTreeNode.Centre.Y;
... Then, by taking the Absolute value of the difference, treat the object as lying within a specific child node whenever the absolute value is NOT more than the radius of the object...
... that is, when some portion of the object intrudes into that quad : )
The same can be done with AABB (Axis Aligned Bounding Boxes)
The only real caveat here is that VERY large objects that cover most of the screen, will force a search of the entire tree. In these cases, a different approach may be called for.
Of course, this only takes care of the object that everything else is being tested against. To ensure that all the other large objects in the world are properly identified, you will need to alter your quadtree slightly...
Use Multiple Appearances
In this variation on the QuadTree we ONLY place objects in the leaf nodes of the QuadTree, as pointers. Larger objects may appear in multiple leaf nodes.
Since some objects have multiple appearances in the tree, we need a way to avoid them once they've already been tested against.
So...
A simple Boolean WasHit flag can avoid testing the same object multiple times in a hit-test pass... and a 'cleanup' can be run on all 'hit' objects so that they are ready for the next test.
Whilst this makes sense, it is wasteful if performing all-vs-all hit-tests
So... Getting a little cleverer, we can avoid having any cleanup at all by using a Pointer 'ptrLastObjectTestedAgainst' inside of each object in the scene. This avoids re-testing the same objects on this run (the pointer is set after the first encounter)
It does not require resetting when testing a new object against the scene (the new object has a different pointer value than the last one). This avoids the need to reset the pointer as you would with a simple Bool flag.
I've used the latter approach in scenes with vastly different object sizes and it worked well.
Elastic QuadTrees
I've also used an 'elastic' QuadTree. Basically, you set a limit on how many items can IDEALLY fit in each QuadTreeNode - But, unlike a standard QuadTree, you allow the code to override this limit in specific cases.
The overriding rule here is that an object may NOT be placed into a Node that cannot hold it ENTIRELY... with the top node catching any objects that are larger than the screen.
Thus, small objects will continue to 'fall through' to form a regular QuadTree but large objects will not always fall all the way through to the leaf node - but will instead expand the node that last fitted them.
Think of the non-leaf nodes as 'sieving' the objects as they fall down the tree
This turns out to be a very efficient choice for many scenarios : )
Conclusion
Remember that these standard algorithms are useful general tools, but they are not a substitute for thinking about your specific problem. Do not fall into the trap of using a specific algorithm or library 'just because it is well known' ... your application is unique, and it may benefit from a slightly different approach.
Therefore, don't just learn to apply algorithms ... learn from those algorithms, and apply the principles themselves in novel and fitting ways. These are NOT the only tools, nor are they necessarily the best fit for your application.
Hope some of those ideas helped.

Best solution for 2D occlusion culling

In my 2D game, I have static and dynamic objects. There can be multiple cameras. My problem: Determine objects that intersect with the current camera's view rectangle.
Currently, I simply iterate over all existing objects (not caring wheter dynamic or static) and do an AABB check with the cameras view rect on them. This seems acceptable for very dynamic objects, but not for static objects, where there can be tens of thousands of them (static level geometry scattered over the whole scene).
I have looked into multiple data structures which could solve my problem:
Quadtree
This was the first thing I considered, however the problem is that it would force my scenes to be of fixed size. (Acceptable for static, but not for dynamic objects)
Dynamic AABB tree
Seems good, but the overhead for rebalancing it seems just too great for many dynamic objects.
Spatial hash
The main problem here for me was that if you zoom out with the camera a lot, a huge number of mostly non-existing spatial hash buckets had to be queried, causing low performance.
In general, my criterias for a good solution of this problem are:
Dynamic size: The solution must not cause the scene size to be limited, or require heavy recomputation for resizing
Good query performance (for the camera)
Good support of very dynamic objects: The computations needed to handle objects with constantly changing position should be good:
The maximum sane number of dynamic objects in my game at one time probably is at 5000. Consider they all change their position every frame. Is there even a data structure which can be faster, considering the frequent insertions and deletions, than comparing the AABBs of the objects with the camera every frame?
Don't try to find the silver bullet. Just split your scene into dynamic and static parts and use different algorithms for them.
Quad trees are obviously suitable for static geometry with fixed
bounds.
Spatial hashes are ideal for sets of objects with similar sizes
(particle systems, for example).
AFAIK dynamic AABB trees are rarely used for occlusion culling, their
main purpose is the broad phase of collision detection.
And as you noticed, bruteforce culling is normal for dynamic objects
if the number of them is not really big.
static level geometry scattered over the whole scene
If your scene is highly-sparse, you can divide it into islands, i.e. create a list of scene parts with "good density".

Resources