Why doesn't Vulkan use the "standard cartesian" coordinate system? - coordinate-systems

Vulkan uses a coordinate system where (-1, -1) is in the top left quadrant, instead of the bottom left quadrant as in the standard cartesian coordinate system one typically learns about in school. So (-1, 1) is in the bottom left quadrant in Vulkan's coordinate system.
(image from: http://vulkano.rs/guide/vertex-input)
What are the advantages of using Vulkan's coordinate system? One plain advantage I can see is pedagogical: it forces people to realize that coordinate systems are arbitrary, and one can easily map between them. However, I doubt that's the design reason.
So what are the design reasons for this choice?

Many coordinate systems in computer graphics put the origin at the top-left and point the y axis down.
This is because in early televisions and monitors, the electron beam that draws the picture starts at the top-left of the screen and progresses downward.
The pixels on the screen were generally made by reading memory in sequential addresses as the beam moved down the screen, and modulating that electron beam in accordance with each byte read in sequence. So the y axis corresponds to time, which corresponds to memory address.
Even today, virtually all representations of a bitmap in memory, or in a bitmapped file, start at the top-left.
It is natural when drawing bitmaps in such a medium to use a coordinate system that starts at the top-left too.
Things become a little more complicated when you use a bottom-left origin because finding the byte that corresponds to a pixel requires a little more math and needs to account for the height of the bitmap. There is usually just no reason to introduce the extra complexity.
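To make the "little more math" concrete, here is a minimal sketch. It assumes a tightly packed bitmap stored top row first; the function names and the bytes_per_pixel parameter are made up for illustration and do not come from any particular API.

```python
def offset_top_left(x, y, width, bytes_per_pixel=1):
    """Byte offset of pixel (x, y) when y = 0 is the top row in memory."""
    return (y * width + x) * bytes_per_pixel


def offset_bottom_left(x, y, width, height, bytes_per_pixel=1):
    """Same bitmap, but the caller's y = 0 is the bottom row.

    The row index has to be flipped first, which needs the bitmap height.
    """
    flipped_y = height - 1 - y
    return (flipped_y * width + x) * bytes_per_pixel
```

The only difference is the extra flip and the extra height argument, but every pixel access has to carry it.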
When you start to introduce matrix transformations however, it becomes much more convenient to work with an upward-pointing y axis, because that lets you use all the vector algebra you learned in school without having to reverse the y axis and all the rotations in your thinking.
So what you'll usually find is that when you are working in a system that lets you do matrix operations, translations, rotations, etc., then you will have an upward-pointing y axis. At some point deep inside, however, the calculations will transform the coordinates into a downward-pointing y axis for the low-level operations.

One of the common sources of confusion and bugs in OpenGL was that NDC and window coordinates had y increasing upwards, which is opposite of the convention used in nearly all window systems and many (but not all) image formats, where y is [0..1] increasing downwards. Developers ended up having to insert a y-flip in their transformation pipeline in many cases, and it wasn't always clear when they did and didn't.
So Vulkan decided to make it so you could load an image from a y-downwards image format directly into memory and draw it to the screen without any explicit y flips, to avoid this source of errors.
Other coordinate systems were then chosen to be consistent with that, in the sense that the y direction never flips direction in the standard Vulkan transformation pipeline. That meant that clip space vertex coordinates also had y increasing downwards.
This ended up meaning that Vulkan clip coordinates have a different orientation than D3D clip coordinates, which was an annoyance for developers supporting both APIs. So the VK_KHR_maintenance1 extension adds the ability to specify a negative viewport height, which essentially introduces a y-flip to the clip-space to framebuffer coordinate transform. (D3D has essentially always had an implicit y-flip here.)
This is how I remember the reasoning in the Vulkan Working Group, anyway. I don't think there's an authoritative public source anywhere.
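The negative viewport height mentioned above can be sketched as follows. This is only the y part of the viewport transform, written the way I understand the Vulkan specification to define it (offset = y + height/2, scale = height/2); the function name and FB_HEIGHT are assumptions for illustration, not API calls.

```python
def ndc_to_framebuffer_y(y_ndc, viewport_y, viewport_height):
    """Map NDC y in [-1, 1] to framebuffer y (pixels, increasing downwards)."""
    scale = viewport_height / 2.0
    offset = viewport_y + viewport_height / 2.0
    return scale * y_ndc + offset


FB_HEIGHT = 600  # hypothetical framebuffer height in pixels

# Default viewport: NDC y = -1 lands on the top edge of the framebuffer.
default_top = ndc_to_framebuffer_y(-1.0, 0.0, FB_HEIGHT)

# Flipped viewport: origin moved to the bottom edge and the height negated,
# so NDC y = -1 now lands on the bottom edge instead.
flipped_top = ndc_to_framebuffer_y(-1.0, FB_HEIGHT, -FB_HEIGHT)
```

With the flipped viewport, y = +1 maps to row 0, which matches the y-up clip space that D3D-style content expects.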

Related

Why does software often use an inverse coordinate system compared to regular math coord system?

As you likely know, in C#, the origin (0,0) of a plane is the upper-left corner. Going right and/or down counts as positive, while going left and/or up counts as negative. Opposed to this is the regular math coordinate system:
(0,0)=mid of plane, going up/right = +, down/left = -.
It's kinda counter-intuitive and can be annoying sometimes, since we've been used to the regular math coordinates for years, and you have to recalculate coordinates as well.
Is this a fundamental design flaw? And do you get used to it after a while? And which other languages use a different coordinate system like C#?
It is not C# but the display that uses an inverse coordinate system. This comes from the days when the display was a CRT and the image was drawn top to bottom, left to right. That is why the coordinate systems that operating systems use match that.
Languages like C# are just wrapping the underlying OS's API and that is why C# uses it too.
The mathematical graph plane is a virtual thing, which expands in all directions without limits.
The screen is a real thing, which can not really expand at all.
Instead we use the concept of scrolling and we are used to doing it from a starting point down.
So conceptually the graphics systems all use the same system as a (left-to-right and top-to-bottom) text block or page in a book. It is about how we scroll to expand/advance the display area.
But it could be defined in any other way; after all, negative coordinates do make sense, as opposed to, say, a negative line number.
If you don't like the coordinate system on the screen, you can create wrapper methods to re-map the coordinates any way you like.
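Such a wrapper could look roughly like this. It is a minimal sketch, assuming you want the "math" origin in the centre of the screen with y pointing up; the function names are hypothetical.

```python
def math_to_screen(x, y, screen_width, screen_height):
    """Math coords (origin at centre, y up) -> screen coords (top-left, y down)."""
    return (screen_width / 2 + x, screen_height / 2 - y)


def screen_to_math(sx, sy, screen_width, screen_height):
    """Inverse mapping: screen coords back to centred, y-up coords."""
    return (sx - screen_width / 2, screen_height / 2 - sy)
```

For an 800x600 window, math_to_screen(0, 0, 800, 600) gives the centre pixel, and positive y moves up the screen.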

Why does OpenGL provide support for mipmaps but not integral images?

I realize both mipmaps and integral images have the problem that the resulting pixel value is not the integral of an arbitrary polygon in original texture space. Integrating over an axis-aligned rectangle in texture coordinates using integral images requires 4 texture lookups. Using mipmaps, OpenGL interpolates across the 4 adjacent pixel values in the mipmap, so that is also 4 memory lookups. Using an integral image you need less memory (no extra pre-resized images, only an integral image instead of the original) and no level determination. Of course this can be implemented through shaders, but why was the (now being deprecated) fixed function pipeline ever designed with mipmap support and no integral image support?
"Using an integral image you need less memory"
I very much doubt that this statement is true.
From what I understand, the values of an integral image can get quite large, therefore requiring a floating point representation, which will use a lot more space than a typical 24-bit image with mipmaps (a full mipmap chain adds only about a third to the size of an image) and/or be less precise and create noise during interpolation. Also, floating point images were not really used that often with the fixed function pipeline, and GPUs may have been a lot slower with floating point images.
If you used integers for the picture, the bit depth required for the integral image would grow with the image size (bit depth = log2(width) + log2(height) + 8 for an all-white 8-bit image, which means a 256x256 image would already need 24 bits per color channel instead of 8), and it keeps growing with higher resolution images.
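For reference, here is a minimal sketch of an integral image (summed-area table) and the four-lookup rectangle sum the question refers to. It uses plain Python lists and a padded zero row and column; the function names are made up for illustration. The last lines show how large the entries get for an all-white 8-bit channel.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column.

    table[y][x] holds the sum of img over rows < y and columns < x.
    """
    h, w = len(img), len(img[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def rect_sum(table, x0, y0, x1, y1):
    """Sum over the half-open rectangle [x0, x1) x [y0, y1): four lookups."""
    return table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0]


# Worst case for one 8-bit channel: an all-white 256x256 image.
white = [[255] * 256 for _ in range(256)]
largest = integral_image(white)[256][256]  # equals 256 * 256 * 255
```

The bottom-right entry is the sum of the whole image, so the storage per texel has to cover width * height * 255 rather than 255.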
"but why was the (now being deprecated) fixed function pipeline ever designed with mipmap support and no integral image support?"
Because the access and interpolation of mipmaps could be built as rather simple hardwired circuits. Ever wondered why texture dimensions had to be powers of two? To implement mipmapping calculations as a series of bit shifts and additions. Also, accessing the neighbouring elements in a Gaussian pyramid requires fewer memory accesses than evaluating the integral. And there's your main problem: fillrate, i.e. video memory bandwidth, has always been a bottleneck of GPUs.

Implementing z-axis in a 2D side-scroller

I'm making a side scroller similar to Castle Crashers and right now I'm using SAT for collision detection. That works great, but I want to simulate level "depth" by allowing objects to move up and down on the screen, basically along a z-axis (like this screenshot http://favoniangamers.files.wordpress.com/2009/07/castle-crashers-ps3.jpg). This isn't an isometric game, but rather uses parallax scrolling.
I added a z component to my vector class, and I plan to cull collisions based on the 'thickness' of a shape and its z position. I'm just not sure how to calculate the positions of shapes for rendering or how to add jumping with gravity. How do I calculate the max y value (for the ground) as the z position changes? Basically it's the relationship between the z and y axes that confuses me.
I'd appreciate links to resources if anyone knows of any on this topic.
Thanks!
It's actually possible to make your collision detection algorithm dimensionally agnostic. Just have a collision detector that works along one dimension, use that to check each dimension, and your answer to "are these colliding or not" is the logical AND of the collision detection along each of the dimensions.
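A minimal sketch of that idea, assuming each shape is approximated by an axis-aligned box given as one (min, max) interval per axis. Note the assumption: the per-axis AND is exact only for axis-aligned boxes, so rotated SAT shapes would still need their own test in the x/y plane. The function names are hypothetical.

```python
def overlaps_1d(min_a, max_a, min_b, max_b):
    """True if the intervals [min_a, max_a] and [min_b, max_b] overlap."""
    return min_a <= max_b and min_b <= max_a


def boxes_collide(box_a, box_b):
    """Each box is a sequence of (min, max) pairs, one pair per axis.

    Works for any number of dimensions: collide only if every axis overlaps.
    """
    return all(
        overlaps_1d(a_min, a_max, b_min, b_max)
        for (a_min, a_max), (b_min, b_max) in zip(box_a, box_b)
    )
```

Adding depth then just means adding a third (min, max) pair for z, where max - min is the "thickness" of the shape.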
Your game should be organised to keep the interaction of game objects and the rendering of the game to the screen completely separate. You can think of these two sections of the program as the "model" and the "view". In the model, you have a full 3D world, with 3 axes. You can't go halvesies on this point without some level of pain. Your model must be proper 3D.
The view will read the location of all the game objects, and project them onto the screen using the camera definition. For this part you don't need a full 3D rendering engine. The correct technical term for the perspective you're talking about is "oblique", and it can be seen in many ancient Chinese and Japanese scroll paintings and prints; in particular, look for images of "The Tale of Genji".
The on screen position of an object (including the ground surface!) goes something like this:
DEPTH_RATIO=0.5;
view_x=model_x-model_z*DEPTH_RATIO-camera_x;
view_y=model_y+model_z*DEPTH_RATIO-camera_y;
You can modify this for a straight orthographic front projection:
DEPTH_RATIO=0.5;
view_x=model_x-camera_x;
view_y=model_y+model_z*DEPTH_RATIO-camera_y;
And of course don't forget to cull objects outside the volume defined by the camera.
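The two projections and the culling step above could be written out like this. It is a sketch that follows the formulas as given; the function names, the tuple layout, and the rectangular on_screen test are my own assumptions.

```python
DEPTH_RATIO = 0.5  # how far one unit of depth shifts an object on screen


def project_oblique(model, camera, depth_ratio=DEPTH_RATIO):
    """Oblique projection: depth shifts the object along both screen axes."""
    mx, my, mz = model
    cx, cy = camera
    return (mx - mz * depth_ratio - cx, my + mz * depth_ratio - cy)


def project_front(model, camera, depth_ratio=DEPTH_RATIO):
    """Straight-on variant: depth only shifts the object vertically."""
    mx, my, mz = model
    cx, cy = camera
    return (mx - cx, my + mz * depth_ratio - cy)


def on_screen(view, width, height):
    """Cheap culling test against the visible rectangle."""
    vx, vy = view
    return 0 <= vx < width and 0 <= vy < height
```

The ground is just another model object at a fixed model_y, so its on-screen y for a given depth comes out of the same function.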
You can also use this mechanism to handle the positioning of parallax layers for you. This is, of course, a matter of changing your camera to a 1-point perspective projection instead of an orthographic projection. You don't have to use this to change the rendered size of your sprites, but it will help you manage the x position of objects realistically. If you're up for a challenge, you could even mix projections: use 1-point perspective for deep backgrounds, and the orthographic stuff for the foreground.
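One way to read the parallax suggestion is sketched below. It assumes the usual perspective divide with a hypothetical focal_length, where depth 0 is the foreground plane; none of these names come from the answer.

```python
def parallax_x(model_x, camera_x, depth, focal_length=1.0):
    """One-point perspective applied to the x axis only.

    depth = 0 is the foreground plane; larger depth is further away,
    so far layers move less on screen when the camera scrolls.
    """
    scale = focal_length / (focal_length + depth)
    return (model_x - camera_x) * scale
```

A layer at depth 3 with focal_length 1 scrolls at a quarter of the foreground speed, which is the familiar parallax effect.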
You should separate the conceptual Y axis used by your physics calculations (collision detection etc.) from the Y axis you actually draw on the screen. That way it becomes less confusing.
Just do the calculations as normal, pretending there is no relationship between the Y and Z axes. Then, when you actually draw the object on the screen, you simulate the Z axis using the Y axis:
screen_Y = Y + Z/some_fudge_factor;
Actually, this is how real 3d engines work. After all the world calculations are done the X, Y and Z coordinates are mapped onto screen_X and screen_Y via a function (usually a bit more complicated than the equation above, but just a bit).
For example, to implement a pseudo-isometric view in your game, you can even apply Z to the screen_X axis so objects are displaced diagonally instead of vertically.

Moving sprites between tiles in an Isometric world

I'm looking for information on how to move (and animate) 2D sprites across an isometric game world, but have their movement animated smoothly as they travel from tile to tile, as opposed to having them jump from the confines of one tile to the confines of the next.
An example of this would be in the Transport Tycoon Game, where the trains and carriages are often half in one tile and half in the other.
Drawing the sprites in the right place isn't too difficult. The projection formulas are:
screen_x = sprite_x - sprite_y
screen_y = (sprite_x + sprite_y) / 2 + sprite_z
sprite_x and sprite_y are fixed point values (or floating point if you want). Usually, the precision of the fixed point is the number of pixels on a tile, so if your tile graphic was 32x16 (a projected 32x32 square) you would have 5 bits of precision, i.e. 1/32nd of a tile.
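A small sketch of those formulas with 5 fractional bits. The to_fixed helper and the constant names are hypothetical, and how the resulting units map onto your tile art in pixels is an assumption you would need to check against your own graphics.

```python
FRACTION_BITS = 5                    # 1/32 of a tile, as described above
UNITS_PER_TILE = 1 << FRACTION_BITS  # 32 sub-tile units per tile


def to_fixed(tile, sub=0):
    """Combine a whole tile index and a sub-tile offset (0..31)."""
    return (tile << FRACTION_BITS) | sub


def project(sprite_x, sprite_y, sprite_z=0):
    """Isometric projection from fixed point world coords to the screen."""
    screen_x = sprite_x - sprite_y
    screen_y = (sprite_x + sprite_y) // 2 + sprite_z
    return screen_x, screen_y
```

Because the sub-tile offset is part of the coordinate, a sprite halfway between two tiles simply has sub = 16 and lands halfway between them on screen.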
The really hard part is to sort the sprites into an order that renders correctly. If you use OpenGL for drawing, you can use a z-buffer to make this really easy. Using GDI, DirectX, etc., it is really hard. Transport Tycoon doesn't correctly render the sprites in all instances. The original Transport Tycoon had the most horrendous rendering engine you've ever seen. It implemented the three zoom levels as three instantiations of a massive MASM macro. TT was written entirely in assembler. I know, because I ported it to the Mac many years ago (and did a cool version for the PS1 dev kit as well; it needed 6Mb though).
P.S. One of the small bungalow graphics in the game was based on the house Chris Sawyer was living in at the time. We were tempted to add a Ferrari parked in the driveway for the Mac version as that was the car he bought with the royalties.
Look up how to do linear interpolation (it's a pretty simple formula). You can then use this to parameterise the transition on a single [0, 1] range. You then simply have a state in your sprites to store the facts:
That they are moving
Start and end points
Start and end times (or start time and duration)
and then each frame you can draw it in the correct position using an interpolation from the start point to the end point. Once you have exceeded the duration, the sprite then gets updated to be not-moving and positioned in the end point/tile.
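The state and per-frame update described above might look like this. It is a minimal sketch: the MovingSprite class and its method names are invented for illustration, positions are (x, y) tuples, and times are in whatever unit your game loop uses.

```python
def lerp(a, b, t):
    """Linear interpolation: a at t = 0, b at t = 1."""
    return a + (b - a) * t


class MovingSprite:
    def __init__(self, position):
        self.position = position
        self.moving = False

    def start_move(self, end, start_time, duration):
        """Record start and end points plus start time and duration."""
        self.start, self.end = self.position, end
        self.start_time, self.duration = start_time, duration
        self.moving = True

    def update(self, now):
        """Call once per frame; returns the position to draw at."""
        if self.moving:
            t = max(0.0, (now - self.start_time) / self.duration)
            if t >= 1.0:
                # Duration exceeded: snap to the end tile and stop.
                self.position, self.moving = self.end, False
            else:
                self.position = (
                    lerp(self.start[0], self.end[0], t),
                    lerp(self.start[1], self.end[1], t),
                )
        return self.position
```

The interpolated position can then be fed through whatever projection you use to get screen coordinates.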
Why are you thinking it'll jump from tile to tile? You can position your sprite at any x,y co-ordinate.
First create your background screen buffer and then place your sprites on top of it.

Coordinate system Transitions

I have a game world with lots of irregular objects with varying coordinate systems controlling how objects on their surface work. However the camera and these objects can leave and move out into open empty space, where a normal Cartesian coordinate system is used. How do I manage mapping between the two?
One idea I had would be to wrap these objects in a bounding volume such as a sphere or box, within which said coordinate system would be used. However, this becomes problematic if those bounding volumes overlap, at which point I'm unsure whether the idea is fundamentally flawed or a solution can be found, since these objects are moving and could overlap at some point.
I think you should place all your objects in the Cartesian 'empty space' coordinate system by composing each irregular object's coordinate system with its position matrix.
It adds a level, but will make everything easier.
Regarding the use of bounds, I had an idea where the object would use the coordinate system of the smallest bounds it occupied, and then transform according to the hierarchy of systems from top to bottom.
Thus, let's say, stick figures on a cylinder adjacent to a large object would follow the cylinder rather than flitting between the two objects and their coordinate systems.
Regardless of the local coordinate system around each of the irregular objects, all points will still map to global world coordinates at one point or another, because eventually, when you want to render your objects, they'll have to get mapped into world space and then camera space. You can use the same object-space-to-world-space transform matrices to do the mapping.
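As a rough illustration of mapping through a hierarchy of systems into world space, here is a 2D sketch with 3x3 homogeneous matrices in plain Python. It assumes each local system is an affine one (a rotation plus a translation); the helper names and the asteroid example are hypothetical.

```python
import math


def mat_mul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def rotation(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transform_point(m, x, y):
    """Apply a 3x3 homogeneous transform to the point (x, y)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Object-to-world: place the object at (100, 50), rotated by 90 degrees.
asteroid_to_world = mat_mul(translation(100.0, 50.0), rotation(math.pi / 2))

# A figure standing at (10, 0) in the asteroid's local system.
figure_world = transform_point(asteroid_to_world, 10.0, 0.0)
```

For a deeper hierarchy, multiply the parent's object-to-world matrix by the child's local matrix in the same way, top to bottom.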
You can use Lamé coefficients to transform between the dimensions of different coordinate systems.
You can transform any kind of coordinate systems, your own as well. The only condition is to have orthogonal dimensions (every dimension has to be independent from other dimensions).
Here is some document I found: link text.
Hope it helps.

Resources