Description:
The goal of my current project is to determine the location of an "object" using just its 3D coordinates.
To achieve that, I figured it would be best to turn off the "Fill" mode of my camera (a ZED 2 from Stereolabs), because I want some hard edges in my depth image.
The Problem:
The depth image is distorted to a major degree by the proximity of other "objects".
The following image shows the depth image from the side; it is viewing some bars in front of a smooth wood wall. The wall is mostly plain, so everything is fine here.
I blacked out the color image and myself; do not worry about those parts.
When I put my hand or another object in front of the wood wall, parts that are bigger than my actual hand get "pulled" towards the camera around the location of the hand or other object. These parts seem to "stick" to other elevated parts in the proximity, as the area between the bars and my arm gets pulled forward entirely.
Question(s):
Is this normal?
Is there an easy way to get rid of it?
What is the reason behind it?
My own assumption(s):
I feel like this is some sort of approximation of unknown parts.
Hopefully. I'm glad the camera was calibrated by default, as that is usually a pain to do right.
Because the new object placed in front of the wall hides more of the scene, there are more areas that the camera cannot see with both lenses; maybe it just "guesses" that the area in between is not so far off, due to some underlying algorithm that makes the image smoother.
First of all, I would advise you to change the depth mode while keeping the sensing mode in STANDARD:
ULTRA: offers the highest depth range and better preserves Z-accuracy along the sensing range.
QUALITY: has a strong filtering stage giving smooth surfaces.
PERFORMANCE: designed to be smooth, can miss some details.
From your description, it seems like you are using the PERFORMANCE mode.
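For reference, here is a minimal sketch of that configuration, assuming the ZED SDK 3.x Python API (pyzed); field and enum names may differ slightly between SDK versions:

    import pyzed.sl as sl

    zed = sl.Camera()

    init_params = sl.InitParameters()
    init_params.depth_mode = sl.DEPTH_MODE.ULTRA           # instead of PERFORMANCE
    init_params.coordinate_units = sl.UNIT.METER
    init_params.depth_maximum_distance = 5.0               # keep the depth range tight

    if zed.open(init_params) != sl.ERROR_CODE.SUCCESS:
        raise RuntimeError("could not open the ZED camera")

    runtime_params = sl.RuntimeParameters()
    runtime_params.sensing_mode = sl.SENSING_MODE.STANDARD  # no hole filling ("Fill" off)

    depth = sl.Mat()
    if zed.grab(runtime_params) == sl.ERROR_CODE.SUCCESS:
        zed.retrieve_measure(depth, sl.MEASURE.DEPTH)        # raw depth, gaps left as-is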
The ZED camera uses a matching algorithm to generate the disparity/depth map, which is closed source. I recently contacted Stereolabs about it and they said: "We cannot disclose this information to you because it's internal information and proprietary to Stereolabs."
Other works on the ZED camera showed some limitations in depth sensing, especially when there is variation in lighting and shadows ("Depth Data Error Modeling of the ZED 3D Vision Sensor from Stereolabs").
In addition to this, the depth error is directly proportional to the distance of the object from the camera, so make sure to set your depth range properly.
Related
I have noticed that every computer graphics system I have ever used uses a left-handed coordinate system with its origin in the upper left corner. Cairo, Java, Microsoft XYZ, and most graphics programs all use this system. I assume they all date back to a common ancestor, but I can't find any references about this.
If I had to guess I'd say it came from VGA graphics mode, using the same coordinates as text, which were naturally based on how the English language is read top-down, left-right, with the "second line" below the "first line"... but I'm making that up.
Was anyone around to tell the tale, or can point me in the direction of the correct history book?
It's an old convention, and the reasons might be a bit apocryphal. Here are some hypotheses I've found:
It's derived from CRT electron beam sweep behavior.
Scanning from top to bottom means you don't have to wait for an entire frame to be sent first, you just begin scanning as soon as you begin receiving data. (Which raises the question again, why scan from top to bottom)
It allows a right-handed coordinate system with the Z axis going into the screen rather than coming out of it.
Annoyingly, Cocoa and Quartz use lower-left origin.
I doubt that it is an old convention kept only for legacy reasons.
Upper-left has the advantage that no language writing system goes from bottom to top. So with an upper-left origin it is easier:
To place multiline text
To work with pages of unknown or infinite height
If the page height is changed (i.e. a bigger or smaller device), with a bottom-left origin you have to translate every object's coordinates, while with an upper-left origin you don't (see the sketch after this list).
The last point also extends to dynamic placement and layouts, where a graphics object's coordinates are offsets from its parent.
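To make the page-height point concrete, here is a small sketch (Python, with made-up numbers): with an upper-left origin the stored y-coordinates stay valid when the page grows, while with a lower-left origin every y has to be shifted by the height difference.

    # Hypothetical layout: three items placed 20/40/60 px below the top edge.
    items_top_left = [(10, 20), (10, 40), (10, 60)]   # (x, y), y measured from the top

    old_height, new_height = 200, 300                  # the page grows by 100 px

    # Upper-left origin: nothing to do, the same coordinates still mean
    # "20/40/60 px below the top edge".
    items_after_resize_top_left = items_top_left

    # Lower-left origin: the same visual placement requires shifting every y
    # by the change in page height.
    items_bottom_left = [(x, old_height - y) for (x, y) in items_top_left]
    items_after_resize_bottom_left = [(x, y + (new_height - old_height))
                                      for (x, y) in items_bottom_left]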
No idea. I don't think there is a definitive answer. It's likely that when people still had console based machines it made sense to go from the top left corner down to the bottom right. It's how a lot of people in the world read, as you've said. It makes sense to put the origin there.
http://en.wikipedia.org/wiki/Memory-mapped_I/O
The Wikipedia article has some information about memory-mapped displays. Say, for example, we dedicate a part of our memory to turning pixels on the screen on and off, let address 0 be the upper-left part of the screen, and move across in chunks, turning pixels on and off depending on whether they are set in memory. That's basically what the first article is saying.
I don't know if they let address 0 be the upper left hand side of a display but it makes sense and it might have just carried over.
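As a toy illustration of that addressing scheme (the dimensions are made up, not from any particular hardware): with address 0 in the upper-left corner, a pixel's linear address is simply row * width + column, independent of the total screen height.

    WIDTH, HEIGHT = 320, 200
    framebuffer = bytearray(WIDTH * HEIGHT)   # one byte per pixel

    def set_pixel(x, y, value):
        # Upper-left origin: addresses grow left-to-right, then row by row downward.
        framebuffer[y * WIDTH + x] = value

    set_pixel(0, 0, 255)      # upper-left corner      -> address 0
    set_pixel(319, 0, 255)    # end of first scanline  -> address 319
    set_pixel(0, 1, 255)      # start of second scanline -> address 320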
I was also wondering about the same question. Here is another source:
The origin is always in the upper left. And that comes from the fact that TVs, kind of when they were first built, scan from left to right and then top to bottom. So it doesn't work the same way you saw kind of in high school geometry, where the origin was always in the lower left....
An Introduction to Interactive Programming in Python
First of: Thanks for taking the time to help me with my problem. It is much appreciated :)
I am building a natural user interface. I'd like the interface to detect several (up to 40) objects lying on it. The interface should detect if the objects are moved on its canvas. It is not important what the actual object on the surface is (e.g. "bottle") or what color it has; only the shape and the placement of the object are of interest (e.g. "circle").
So far I'm using a webcam connected to my computer and Processing's blob functionality to detect the objects on the surface of the interface (see picture 1). This has some major disadvantages for what I am trying to accomplish:
I do not want the user to see the camera or any alternative device, because this distracts the user's attention. Actually the surface should be completely dark.
Whenever I reach with my hand to rearrange the objects on the interface, the blob detection gets very busy and recognizes objects (my hand) which are not touching the canvas directly. This problem can hardly be tackled using a Kinect, because the depth functionality does not work through glass/acrylic glass – correct me if I am wrong.
It would be nice to install a few LEDs on the canvas controlled by an Arduino. Unfortunately, the light of the LEDs would disturb the blob detection.
Because of the camera’s focal length, the table needs to be unnecessarily high (60 cm / 23 inch).
Do you have any idea on an alternative device/technology to detect the objects? Would be nice if the device would work well with Processing and Arduino.
Thanks in advance! :)
Possibilities:
Use reflective tinted glass so that the surface appears dark or reflective.
Illuminate the area where you place the webcam with an array of IR LEDs.
I would suggest colour based detection and contouring of the objects.
If you are using colour-based detection, convert the frames to the HSV and YCrCb colour spaces. These are much better for segmenting the required area when doing colour-based detection.
I recommend you check out https://github.com/atduskgreg/opencv-processing. This interfaces OpenCV with Processing, so you get a lot of OpenCV's functionality in Processing.
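For what it's worth, here is a rough sketch of that colour-based segmentation in Python with OpenCV 4.x (the opencv-processing library linked above exposes similar calls in Processing); the file name and the HSV bounds are made up and would need tuning for your lighting and materials:

    import cv2
    import numpy as np

    frame = cv2.imread("table_snapshot.png")            # hypothetical input frame
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

    # Hypothetical hue/saturation/value bounds for the objects' colour.
    lower = np.array([20, 80, 80])
    upper = np.array([35, 255, 255])
    mask = cv2.inRange(hsv, lower, upper)

    # Contour the segmented blobs and keep only object-sized ones.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 200:                    # drop small/noisy blobs
            x, y, w, h = cv2.boundingRect(c)
            print("object centre at", x + w // 2, y + h // 2)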
One possibility:
Use a webcam with infrared capability (such as a security camera with built-in IR illumination). Apparently some normal webcams can be converted to IR use by removing a filter, I have no idea how common that is.
Make the tabletop out of some material that is IR-transparent, but opaque or nearly so to visible light. (Look at the lens on most any IR remote control for an example.)
This doesn't help much with #2, unfortunately. Perhaps you can be a bit pickier about the size/shape of the blobs you recognize as being your objects?
If you only need a few distinct points of illumination for #3, you could put laser diodes under the table, out of the path of the camera - that should make a visible spot on top, if the tabletop material isn't completely opaque. If you need arbitrary positioning of the lights - perhaps a projector on the ceiling, pointing down?
Look into OpenCV. It's an open source computer vision project.
In addition to existing ideas (which are great), I'd like to suggest trying TUIO Processing.
Once you have the camera setup (with the right field of view/lens/etc. based on your physical constraints) you could probably get away with sticking TUIO markers to the bottom of your objects.
The software will detect the markers; you'll differentiate the objects by ID, but also be able to get position/rotation/etc., and your hands will not be part of that.
Scenario
I have a 3D environment which contains a 3D scene and a '2D' scene.
The 3D scene contains a cube and a perspective camera.
The '2D' scene contains 4 round objects and an orthographic camera. These round objects can be moved around by the user, which is why the orthographic camera is used; with a perspective camera the round objects could be moved 'in depth' (along the z-axis) and would change in size, and I want them to maintain their size.
Depending on the positioning of the round objects, the corners of the cube in the 3D scene should be aligned with the positions of the round objects, while maintaining perspective.
Edit:
What I am trying to accomplish is this: based on an image of a room, the user uses those round objects to define the dimensions of the room. Based on those dimensions a hidden cube is positioned to act as a bounding box. The next step would be to add 3D objects to the scene while maintaining the perspective of the room.
I tried explaining this scenario in a picture:
Problems
Basically I have no clue where to start.
The round objects are in a '2D' environment because of the orthographic camera, therefore I have no depth value, which I think I need.
I think I need some perspective transformation based on the camera positions/settings? There are all sorts of matrices that could be produced, but I don't know how to implement them.
Sources i studied
http://www.graphicsmill.com/docs/gm/affine-and-projective-transformations.htm
below is a similar situation
https://math.stackexchange.com/questions/296794/finding-the-transform-matrix-from-4-projected-points-with-javascript
Cannot post more links because of my reputation
I hope someone can make this clear or point me in the right direction
Counting the real degrees of freedom, I would say that you don't have enough data. Imagine the projective camera of the 3D scene as an actual pinhole camera. Then the image that camera creates on its film, sensor or whatever is described by at least 9 parameters:
3 parameters for the position of the camera in space,
2 parameters for the direction the camera is looking at,
1 parameter rotating the camera + sensor around their optical axis,
1 parameter determining the distance from pinhole to sensor, and
2 parameters translating the sensor in its plane.
On the other hand, knowing a projective transformation from one plane to another, e.g. using my answer to the question you already referenced, will only yield 8 geometrically meaningful parameters. So you cannot hope to reconstruct the camera position from that, and therefore you cannot find the image of the 3D scene that would fit your markers. The Wikipedia article on 3D pose estimation writes that
Most implementations of POSIT only work on non-coplanar points (in other words, it won't work with flat objects or planes).[3]
That being said, you gave an example of where someone is actually doing this! So how do they do it? Honestly, I'm not sure, but they would have to make use of some additional knowledge or extra assumptions. For example, if they knew details about their camera (focal length, relative position between lens and sensor, or something like that), that could provide the required data. Since these apps tend to work on mobile devices, I think it rather likely that they might have either an API to request these things or a database where they can be looked up for the more common devices.
Judging from your question, you don't have that. Neither do you have all the vertical edges of the cube depicted vertically parallel to one another, which would have been another possible way to add more information. You have to come up with one more piece of information in order to allow for a hopefully unique solution.
Of course, without more information the system is just underspecified. It's not hard to find a transformation matrix which does what you requested. Actually, the answer I referenced is placed in a setup where a 2D-to-2D map is to be modeled using a 3D transformation matrix. You can do the same and be done with it. But your users might become frustrated, since the transformation they obtain might do completely wrong things to the out-of-plane direction, and there is no knob to tune that towards the correct behavior.
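To illustrate the 8-parameter point, here is a minimal sketch (plain Python/NumPy, with made-up coordinates) that fits the plane-to-plane homography from the 4 marker positions. It gives you a perfectly usable 2D-to-2D map, but note that nothing in it recovers the 9 camera parameters listed above:

    import numpy as np

    def homography_from_points(src, dst):
        # Solve for the 8 free entries of a 3x3 homography H (h33 fixed to 1)
        # mapping the four 2D points src[i] onto dst[i].
        A, b = [], []
        for (x, y), (u, v) in zip(src, dst):
            A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
            A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
        h = np.linalg.solve(np.array(A, float), np.array(b, float))
        return np.append(h, 1.0).reshape(3, 3)

    # Hypothetical example: unit-square cube-face corners vs. the screen
    # positions of the four draggable round markers.
    H = homography_from_points([(0, 0), (1, 0), (1, 1), (0, 1)],
                               [(12, 30), (200, 25), (190, 180), (20, 170)])
    print(H)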
I'm working on a game (using Game Maker: Studio Professional v1.99.355) that needs to have both user-modifiable level geometry and AI pathfinding based on platformer physics. Because of this, I need a way to dynamically figure out which platforms can be reached from which other platforms in order to build a node graph I can feed to A*.
My current approach is, more or less, this:
1. For each platform, consider each other platform in the level.
2. For each of those platforms, if it is obviously unreachable (due to being higher than the maximum jump height, for example) do not form a link and move on to the next platform.
3. If a link seems possible, place an ai_character instance on the starting platform and (within the current step event) simulate a jump attempt.
3.a. Repeat this jump attempt for each possible starting position on the starting platform.
4. If this attempt is successful, record the data necessary to replicate it in real time and move on to the next platform.
5. If not, do not form a link.
6. Repeat for all platforms.
This approach works, more or less, and produces a link structure that when visualised looks like this:
linked platforms (Hyperlink because no rep.)
In this example the mostly-concealed pink ghost in the lower right corner is trying to reach the black and white box. The light blue rectangles are just there to highlight where recognised platforms are, the actual platforms are the rows of grey boxes. Link lines are green at the origin and red at the destination.
The huge, glaring problem with this approach is that for a level of only 17 platforms (as shown above) it takes over a second to generate the node graph. The reason for this is obvious: the yellow text in the screen centre shows us how long it took to build the graph: over 24,000(!) simulated frames, each with attendant collision checks against every block. I literally just run the character's step event in a while loop, so everything it would normally do to handle platformer movement in one frame it now does 24,000 times.
This is, clearly, unacceptable. If it scales this badly at a mere 17 platforms then it'll be a joke at the hundreds I need to support. Heck, at this geometric time cost it might take years.
In an effort to speed things up, I've focused on the other important debugging number, the tests counter: 239. If I simply tried every possible combination of starting and destination platforms, I would need to run 17 * 16 = 272 tests. By figuring out various ways to predict whether a jump is impossible I have managed to lower the number of expensive tests run by a whopping 33 (12%!). However the more exceptions and special cases I add to the code the more convinced I am that the actual problem is in the jump simulation code, which brings me at long last to my question:
How would you determine, with complete reliability, whether it is possible for a character to jump from one platform to another, preferably without needing to simulate the whole jump?
My specific platform physics:
Jumps are fixed height, unless you hit a ceiling.
Horizontal movement has no acceleration or inertia.
Horizontal air control is allowed.
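Given those physics, a closed-form filter is possible before any simulation: with a fixed jump height and full air control at a constant horizontal speed, the horizontal range is just that speed times the total airtime down to the target height. A rough sketch with hypothetical parameter values, assuming the rise decelerates under the same gravity as the fall; it ignores ceilings and obstacles along the arc, so it is only a necessary condition, not a guarantee:

    import math

    JUMP_HEIGHT = 96.0   # fixed apex height of a jump, in pixels (assumed value)
    GRAVITY = 0.5        # gravity, pixels per frame^2 (assumed value)
    MAX_HSPEED = 4.0     # horizontal speed, full air control, no inertia (assumed)

    def might_reach(dx, dy):
        """dx, dy: target position relative to the take-off point, y positive up."""
        if dy > JUMP_HEIGHT:
            return False                     # target is above the fixed jump apex
        # Time to the apex, plus time to fall from the apex down to the target height.
        t_up = math.sqrt(2 * JUMP_HEIGHT / GRAVITY)
        t_down = math.sqrt(2 * (JUMP_HEIGHT - dy) / GRAVITY)
        airtime = t_up + t_down              # latest moment we are at the target height
        # With full air control the reachable horizontal range is just speed * time.
        return abs(dx) <= MAX_HSPEED * airtime

Anything that fails this test can be skipped outright; anything that passes still needs the detailed check (or a short simulation), since intervening geometry can block the arc.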
Further info:
I found this video, which describes a similar problem but which doesn't provide a good solution. This is literally the only resource I've found.
You could limit the amount of comparisons by only comparing nearby platforms. I would probably only check the horizontal distance between platforms, and if it is wider than the longest jump possible, then don't bother checking for a link between those two. But you might have done this since you checked for the max height of a jump.
I glanced at the video and it gave me an idea. Instead of looking at all platforms to find which jumps are impossible, what if you did the opposite? Try placing an AI character on all platforms and see which other platforms they can reach. That's certainly easier to implement if your enemies can't change direction in midair though. Oh well, brainstorming is the key to finding something.
Several ideas you could try out:
Limit the amount of comparisons you need to make by using a spatial data structure, like a quad tree. This would allow you to severely limit how many platforms you're even trying to check. This is mostly the same as what you're currently doing, but a bit more generic.
Try to pre-compute some jump trajectories ahead of time. This will not catch all the use cases that you have - as you allow for full horizontal control - but might allow you to catch some common cases more quickly.
Consider some kind of walkability grid instead of a link-generation scheme. When geometry is modified, compute which parts of the level are walkable and which are not, at some resolution (something similar to the dimensions of your agent might be a good starting point). You could also filter by height, so that grid tiles that are higher than your jump height, and that you can't drop onto from a higher place, are marked as unwalkable. Then, as part of your pathfinding step, when you start a jump you can check whether the path is actually executable ("start a jump, I can go vertically no more than 5 tiles, and after the peak of the jump I always fall down vertically with some speed").
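A very small sketch of that walkability-grid idea, with hypothetical names: solid(cx, cy) is assumed to tell whether a grid cell is occupied by level geometry, and a cell counts as standable if it is empty with a solid cell directly below it.

    GRID_W, GRID_H = 64, 36      # level size in cells (assumed)
    MAX_JUMP_TILES = 5           # fixed jump height expressed in tiles (assumed)

    def build_walkability(solid):
        standable = [[False] * GRID_W for _ in range(GRID_H)]
        for cy in range(GRID_H - 1):
            for cx in range(GRID_W):
                if not solid(cx, cy) and solid(cx, cy + 1):
                    standable[cy][cx] = True
        return standable

    def jump_height_ok(y0, y1):
        # Crude per-step filter: with the row index growing downward, the target
        # may not be more than MAX_JUMP_TILES above the start; dropping down any
        # distance is fine. Horizontal range and obstacles along the arc still
        # need their own checks during pathfinding.
        return (y0 - y1) <= MAX_JUMP_TILES

The grid only has to be rebuilt when the user edits the geometry, so the per-query cost during pathfinding stays small.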
I am trying to render as realistically as possible a scene in which a point light hits an object and bounces off with the same angle wrt the normal of the face (angle of incidence = angle of reflection) and illuminates the scene elsewhere.
Now, I know reflection in three.js is normally dealt with using a CubeCamera and its material, as per the examples I found online, but it doesn't quite apply to my case, for I may be observing the scene from a point from which I cannot see the reflection of the object on the mirror-like surface of another one.
Consider this example prototype I'm working on: if the box that protrudes from the wall in the scene had a mirror-like material (accomplished with a CubeCamera), I wouldn't be able to see the green cube's reflection on the bottom face unless the camera was at a specific position. In real life, however, if an object illuminated by a light source passes in the vicinity of another one, it will in part light it as if it were a light source itself (depending on the object's reflectivity, of course), and such a phenomenon should be visible from any point of view from which the object receiving the indirect lighting is visible.
Hence I came up with the idea of adding a PointLight to the cube, but this of course produces undesirable effects on the surroundings.
I will try to illustrate my goal with the following sequence:
1) Here, the far side of what I will henceforth refer to as the balcony is correctly dark, while the areas marked with a red 'x' are the consequence of the cube having a child PointLight which shines in all directions.
2) Here, the balcony's far face is still dark and the bottom one is receiving even more light as the cube passes by, which is desirable, but the wall behind the cube should actually be dark (I haven't added shadows yet, I first want to get the lighting right), as well as the ground beneath it and the lamp post.
3) Finally, when the cube has passed the balcony, it's just plain wrong for the balcony's side and bottom face to be illuminated, for we all know that a reflected ray does not bounce back the way it came from. The same applies to the lamp post.
Now I realize that all the mistakes that occur are due to the fact that the cube emits light itself, what I'm hoping you can help me with is determining a way to produce physically accurate reflected rays.
I would like to avoid using ambient light or other hacks to simulate real-life scenarios and stick to physics as much as possible; I suspect what I want to achieve is very computationally heavy to render, let alone animate in a real-time use case, but that's not an issue for I'm merely trying to develop a proof-of-concept, not something that should necessarily perform fast.
From what I gather, I should probably be writing custom vertex and fragment shaders for the materials receiving indirect illumination, right? Unfortunately I wouldn't know where to begin, can anyone point me in the right direction? Cheers.
If you do not want to go for volumetric rendering, then you have 3 options (that I know of):
ray-tracing
You have to use ray-trace rendering (back ray-tracing) to achieve this. This will also cover shadows, transparent materials, reflected illumination and much more if coded properly. Unless you also want to do precise atmospheric scattering, this is the way.
Back ray-tracing casts one (or 3) ray(s) per screen pixel. It is much faster but not as precise (still precise enough).
Ray-tracing casts one ray per 3D angular unit (steradian) of space per light source. It is slow but precise (if the ray density is high enough).
If a cast ray hits an obstacle, its colour is changed (according to the obstacle's properties) and a new ray is cast as the reflected light ray. If the material is transparent, a refracted ray is also cast ... Each hit or refraction affects the light intensity, so you stop when the intensity drops below some threshold or at some recursion depth (limit the maximum number of reflections/refractions per ray) to avoid infinite loops; this is also where you can trade quality for performance ...
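A bare-bones sketch of that recursion (Python/NumPy, with a hypothetical scene object whose intersect() returns None or a hit record carrying point, normal, color and reflectivity; refraction is left out for brevity):

    import numpy as np

    MAX_DEPTH = 5            # recursion limit: max number of bounces per ray
    MIN_INTENSITY = 0.01     # stop when the surviving intensity drops below this

    def reflect(d, n):
        # Mirror direction d about the unit normal n (angle in = angle out).
        return d - 2.0 * np.dot(d, n) * n

    def trace(scene, origin, direction, intensity=1.0, depth=0):
        if depth >= MAX_DEPTH or intensity < MIN_INTENSITY:
            return np.zeros(3)                    # contribution is negligible, stop
        hit = scene.intersect(origin, direction)  # closest obstacle along the ray
        if hit is None:
            return np.zeros(3)                    # ray escaped the scene
        # Local contribution, attenuated by how much light survived so far.
        color = intensity * np.asarray(hit.color, dtype=float)
        # Spawn the reflected ray and recurse with reduced intensity.
        bounced = reflect(direction, hit.normal)
        color += trace(scene, hit.point + 1e-4 * bounced, bounced,
                       intensity * hit.reflectivity, depth + 1)
        return color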
standard polygon rendering
With this approach (I think you are using it right now) you have to improvise. The reflection and illumination effects can be done similarly to shadowing techniques: for each reflective surface you have to render the scene in the reflected direction. The same can be done with shadows, but then you just render towards the light direction, or use a shadow map instead. If you have an insane number of reflective surfaces then this approach is not the way; also, to achieve reflection of refraction you have to render recursively, making it multiple rendering passes per polygon, which is also insane.
cubemap
You can use a cube map per object. It is similar to option 2, but the insanity is done just once while generating the cubemaps instead of per frame ... If you have too many objects then this is also not the way. You can use cube maps only for objects with reflective surfaces to make it manageable. Also, if the objects are moving, you have to re-generate the cubemaps once in a while ...