VNDetectFaceLandmarksRequest 50% slower when using regionOfInterest - apple-vision

I am doing real-time face recognition on a video stream.
Right now it's a bit slow, so I decided to use the regionOfInterest of my VNDetectFaceLandmarksRequest to reduce the area of the image in which the algorithm has to look for a face.
The underlying idea is that the face will be in more or less the same position in two consecutive frames, so I use the previous faceObservation result with a transform.
In this case, drift is 0.05 (meaning that the face is allowed to move by at most 5% of the frame size, since the bounding box is in normalized coordinates).
My calculation is the following, and the bounding box seems correct:
CGRect(x: faceObservation.boundingBox.origin.x - self.drift, y: faceObservation.boundingBox.origin.y - self.drift, width: faceObservation.boundingBox.width + self.drift, height: faceObservation.boundingBox.height + self.drift * 2)
However, I noticed that the request is 50% slower when I set the regionOfInterest.
This doesn't make sense to me.
Am I doing something wrong, or is my assumption incorrect?
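For reference, padding a box by drift on every side grows each dimension by 2 * drift, and the result has to stay inside the normalized unit square that regionOfInterest is expressed in. A minimal sketch of that arithmetic, written in plain JavaScript rather than Swift; padRegion and the plain-object box are hypothetical names used only for illustration and are not part of the Vision API:

```javascript
// Expand a normalized box (all values in 0...1) by `drift` on every side,
// clamping the result to the unit square.
function padRegion(box, drift) {
  const x0 = Math.max(0, box.x - drift);
  const y0 = Math.max(0, box.y - drift);
  const x1 = Math.min(1, box.x + box.width + drift);
  const y1 = Math.min(1, box.y + box.height + drift);
  return { x: x0, y: y0, width: x1 - x0, height: y1 - y0 };
}

// Usage: a 0.2 x 0.2 box padded by 0.05 becomes roughly 0.3 x 0.3.
const region = padRegion({ x: 0.4, y: 0.4, width: 0.2, height: 0.2 }, 0.05);
console.log(region);
```

This only illustrates the geometry; it says nothing about why the request would be slower with a region of interest set.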

Related

Perceived width of a decal depending on the rotation angle of the wall

I am creating a raycasting game from scratch using JavaScript canvas.
Part of the challenge (for me) is to decorate walls with random images (pictures). I have already implemented drawing of walls, floor and ceiling, and sprites.
While drawing walls, I store for each x (depicting screen coordinate) the distance to the wall (Z-BUFFER), the height of the wall (H-BUFFER) and actual coordinates of the pixel in the underlying 2D grid (GRID_BUFFER).
My approach for painting the decals (pictures) on the wall is then the following (after identifying a list of decals that could theoretically be visible):
The distance to the decal's position is calculated (the position is defined as the middle of the grid face that faces the observer).
The screen coordinate decalScreenX is calculated based on the transformation matrix from grid coordinates to screen coordinates. This works correctly:
let decalScreenX = Math.floor((RAYCAST.SCREEN_WIDTH / 2) * (1 + CAMERA.transformX /CAMERA.transformDepth));
Then I retrieve the image data for the decal in question and get its width and height.
Based on the distance and the observed angle, I calculate the perceived width of the decal. This is where the real issue lies, as I can see that I don't calculate this width accurately.
With all this information, it is then easy to calculate the left and right screen coordinates (where to begin and where to end drawing the decal), use H-BUFFER to calculate the height factor, and use GRID_BUFFER to draw only on the grid belonging to this decal.
I thought of the width calculation in terms of the decal being rotated away from the player direction vector by an angle, if the player direction is not opposite to the direction in which the decal faces (example):
or, if the player direction is directly opposite to the direction of the decal, this angle is 0° (example):
My first approach was to use the dot product of the reversed player direction and the decal facing direction, thus getting the cosine of the angle between the vectors, and to use this as a factor to reduce the perceived width:
let CosA = PLAYER.dir.mirror().dot(decal.facingDir);
let widthScale = CosA * (CAMERA.transformDepth / decal.distance);
The problem with this solution is that when the two directions are perpendicular, the factor is 0 and the decal is not drawn; but since the walls are drawn with perspective, this should not be the case. So I began improvising. I defined the CAMERA.minPerspective factor as seen below. The field of view (FOV) is 70°.
CAMERA.minPerspective = Math.cos(Math.radians((90 + this.FOV) / 2));
My intuition was (as I lack the knowledge of perspective and geometry, alas) that for small angles the factor should remain 1, and for angles close to 90° there should be some minimal factor so that the decal remains visible. So I came up with this "improved" code:
let CosA = PLAYER.dir.mirror().dot(decal.facingDir);
let FACTOR = Math.min(1, CosA + CAMERA.minPerspective);
FACTOR = Math.max(FACTOR, CAMERA.minPerspective);
let widthScale = FACTOR * (CAMERA.transformDepth / decal.distance);
This works considerably better, but it has some flaws. Visually, for angles of 0-50° the reduction is too great. This can be observed if I use decals wide enough that they should cover the complete grid surface (see image below; left of the stairs the wall underneath is visible; the decal should cover the complete grid, but it doesn't, because the FACTOR is too small).
I have searched Stack Overflow and the rest of the Web for a better solution, but it seems that my lack of knowledge of geometry also prevents me from recognizing proper solutions if they are presented outside this context.
So, please: there are probably deterministic solutions for calculating the perceived width without running the raycasting phase again, or by using the information I am able to store in the raycasting phase. While JavaScript is used in the code examples, I consider this question not to be specific to any programming language.
I have found a solution that retains (or even improves) the simplicity and time complexity of the approach in the question.
I have added two points to the decal definition: leftDrawStart and rightStartDraw. Those are easy to calculate at the point of decal instantiation, based on the real sprite (decal) width and the definition of the grid (block) size. While doing this calculation, I consider leftDrawStart from the camera perspective (not grid coordinates).
When rendering the decal, I use the transformation matrix (as in the question, code example below) to calculate the screen coordinates for leftDrawStart and rightStartDraw from their grid coordinates:
transform(spritePos) {
    let invDet = 1.0 / (CAMERA.dir.x * PLAYER.dir.y - PLAYER.dir.x * CAMERA.dir.y);
    CAMERA.transformX = invDet * (PLAYER.dir.y * spritePos.x - PLAYER.dir.x * spritePos.y);
    CAMERA.transformDepth = invDet * (-CAMERA.dir.y * spritePos.x + CAMERA.dir.x * spritePos.y);
}
I distinguish between the calculated absolute drawStartX and drawEndX and their adjusted values, which are clamped to the screen boundaries; the function returns early if the decal is completely offscreen.
Finally, the perceived width of the decal is not even required, since the texture position can be calculated as the ratio of (current drawing stripe - absolute drawing start) to (absolute drawing end - absolute drawing start):
let texX = (((stripe - drawStartX_abs) / (drawEndX_abs - drawStartX_abs)) * imageData.width) | 0;
The approach is completely accurate and considerably faster than an approach where decal casting is incorporated into the raycasting step.
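The steps of this answer can be put together in a small self-contained sketch. It assumes that the decal end points are given relative to the player, and that CAMERA.dir in the answer plays the role of the camera plane vector; projectToScreenX, textureColumn and the parameter names are hypothetical and only mirror the formulas quoted above:

```javascript
// Project a point (given relative to the player) to a screen column,
// using the same inverse camera matrix as the transform() method above.
function projectToScreenX(point, playerDir, cameraPlane, screenWidth) {
  const invDet = 1.0 / (cameraPlane.x * playerDir.y - playerDir.x * cameraPlane.y);
  const transformX = invDet * (playerDir.y * point.x - playerDir.x * point.y);
  const transformDepth = invDet * (-cameraPlane.y * point.x + cameraPlane.x * point.y);
  return Math.floor((screenWidth / 2) * (1 + transformX / transformDepth));
}

// Map a screen stripe between the unclamped (absolute) decal edges
// to a texture column, as in the texX formula above.
function textureColumn(stripe, drawStartXAbs, drawEndXAbs, textureWidth) {
  return (((stripe - drawStartXAbs) / (drawEndXAbs - drawStartXAbs)) * textureWidth) | 0;
}

// Usage: player looks along +x, decal edges are 2 units ahead.
const playerDir = { x: 1, y: 0 };
const cameraPlane = { x: 0, y: 0.5 };
const startAbs = projectToScreenX({ x: 2, y: -0.5 }, playerDir, cameraPlane, 640);
const endAbs = projectToScreenX({ x: 2, y: 0.5 }, playerDir, cameraPlane, 640);
// Clamp only the loop bounds; keep the absolute values for the texture lookup.
const from = Math.max(0, startAbs);
const to = Math.min(639, endAbs - 1);
console.log(from, to, textureColumn(from, startAbs, endAbs, 64));
```

Keeping the unclamped edge coordinates for the texture lookup is what lets a partially offscreen decal show the correct part of its image.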

SVG cover strip modeling software

I am in the middle of a refactoring for a client whose 2D modeling software needs to be rewritten. The old logic for scaling down things that do not fit in the canvas is poor. Can anyone provide a proper mathematical formula to scale down a vector drawing based on the canvas size? The most important thing is that the ratio between lines is kept when scaling down.
A single formula is not required; I can take suggestions in any programming language.
Example image:
In case someone models a cover strip 2000mm wide, the drawn line should be downscaled to fit in the canvas. In this case, pixels and millimeters are proportional.
I have tried exponential downscaling like this, but that does not take the canvas size into account in any way.
20mm^0.85=12.76mm
10mm^0.85=7.07mm
5mm^0.85=3.92mm
I know this is more of a mathematical question, but it comes from a programming problem.
Thank you for your time.
Since you are not specifying any language, I will outline the procedure. It is very easy to implement, for instance, in JavaScript. Let canvas.width and canvas.height be the width and height of the canvas, and object.width and object.height the width and height of the object.
Start by calculating scx = object.width / canvas.width and scy = object.height / canvas.height.
If you only want to downscale (never upscale), then: If both scx and scy are lower than 1, then do nothing (the object fits). In any other case, the largest value max(scx, scy) is your scale factor. You must divide object.width and object.height by that scale factor.
If you always want to fit the object to the canvas, then the largest value max(scx, scy) is your scale factor. You must divide object.width and object.height by that scale factor.
Just one more piece of advice: you can easily set a margin (actually padding) by using a lower canvas.width and canvas.height. Say you use 90% of the actual sizes. Then you can set the origin point at 5% of the width and the height, and you know that no object will be closer than 5% to any canvas edge.
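The procedure above could be sketched as follows. The function name fitToCanvas and its options (allowUpscale, padding) are hypothetical; padding is the fraction of the canvas reserved on each side, e.g. 0.05 for the 5% example:

```javascript
// Scale an object uniformly so that it fits the canvas, keeping proportions.
function fitToCanvas(object, canvas, { allowUpscale = false, padding = 0 } = {}) {
  // Usable area after reserving `padding` on each side.
  const availableWidth = canvas.width * (1 - 2 * padding);
  const availableHeight = canvas.height * (1 - 2 * padding);

  const scx = object.width / availableWidth;
  const scy = object.height / availableHeight;

  // The larger ratio decides the scale; values below 1 mean the object already fits.
  let factor = Math.max(scx, scy);
  if (!allowUpscale && factor < 1) factor = 1;

  return {
    factor,
    width: object.width / factor,
    height: object.height / factor,
    originX: canvas.width * padding,
    originY: canvas.height * padding,
  };
}

// Usage: a 2000 x 100 strip on an 800 x 600 canvas is divided by 2.5.
console.log(fitToCanvas({ width: 2000, height: 100 }, { width: 800, height: 600 }));
```

Because every length is divided by the same factor, the ratio between any two lines in the drawing is preserved.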

Creating seamless rotated background image

I want to repeat a background image that is rotated. Trying to make it seamless is destroying my soul.
Starting with something simple, consider each image is laid out like bricks. Creating a seamless repeating background image is pretty simple:
(the red area is the crop). You can see this working as expected at http://jsfiddle.net/mPqfB.
Now let's say I want to rotate the image by 45 degrees:
Unfortunately, the same crop no longer works, as you can see on http://jsfiddle.net/mPqfB/1.
I'm trying to figure out how to crop the image correctly so that we have a seamless repeat. There's probably some fairly trivial maths involved to do this but I can't for the life of me figure it out.
[Update]
I'm attempting to follow @oezi's calculations, so to make things easier I have created an image with dimensions 100px x 50px.
Therefore:
Least Common Multiple = 100
Hypotenuse² = 100² + 100² = 20000
Now I'm assuming this doesn't mean we have to create an image of 20000px x 20000px. I'm hoping that @oezi can clarify how he performs his resizing.
If a² + b² = c², then c = √(a² + b²).
Then we can conclude that our crop should be 141px?
Finally, this doesn't actually explain where we take the crop from.
[Update 2]
It does look like this is how the resize should be created. Taking a 141px x 141px crop of the image yielded the correct results - http://jsfiddle.net/EfuV2/
As far as where to crop from, it doesn't actually matter!
If the rotation is exactly 45 degrees, you'll have to find the least common multiple of the width and height of your unrotated pattern.
In your case, that's 15100 (width 100 and height 151).
It would be much better to scale your pattern to width 100 and height 150, so that the least common multiple is only 300.
Take that number and some math (Pythagorean theorem). Assume your number is the length of the two short sides of a right triangle and calculate the length of the hypotenuse: that's your result (make a square image of that size to get your pattern).
In your case, that's 21355.
With resizing, it's ~424.
Note that this is just typed straight from my head because I can't try it out practically at the moment, but I'm really sure it's correct.
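The arithmetic of this answer can be checked with a few lines of JavaScript. This is only a sketch of the calculation described above for a 45 degree rotation; gcd, lcm and rotatedTileSize are hypothetical helper names:

```javascript
// Greatest common divisor (Euclid's algorithm).
function gcd(a, b) {
  while (b !== 0) {
    [a, b] = [b, a % b];
  }
  return a;
}

// Least common multiple of the pattern's width and height.
function lcm(a, b) {
  return (a / gcd(a, b)) * b;
}

// Side of the square tile: hypotenuse of a right triangle whose two
// short sides both have the length lcm(width, height).
function rotatedTileSize(width, height) {
  const side = lcm(width, height);
  return Math.hypot(side, side);
}

console.log(Math.round(rotatedTileSize(100, 151))); // 21355
console.log(Math.round(rotatedTileSize(100, 150))); // 424
console.log(Math.round(rotatedTileSize(100, 50)));  // 141
```

The last line corresponds to the 100px x 50px image from the question's update.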
edit: a fast (and messy) test got me to this:
http://i.imgur.com/rZuu9.jpg
http://jsfiddle.net/mPqfB/2/ (click the image-link first, otherwise jsfiddle doesn't show the image)
I accidentally made the pattern only 423 in height and the rotation isn't perfect (I don't have Photoshop here), but it's good enough to prove that my math is correct.
The trick is to crop the pattern at points where the section being cut off matches the section remaining on the opposite side of the crop area (see example cuts in blue). It'll probably take some trial and error to get it right but you should be able to do it easily enough.

What is the math behind -webkit-perspective?

"Simple" question that I can't find the answer to -- What does -webkit-perspective actually do mathematically? (I know the effect it has, it basically acts like a focal-length control) e.g. what does -webkit-perspective: 500 mean?!?
I need to find the on-screen location of something that's been moved using, among other things, -webkit-perspective
The CSS 3D Transforms Module working draft gives the following explanation:
perspective(<number>)
specifies a perspective projection matrix. This matrix maps a viewing cube onto a pyramid whose base is infinitely far away from the viewer and whose peak represents the viewer's position. The viewable area is the region bounded by the four edges of the viewport (the portion of the browser window used for rendering the webpage between the viewer's position and a point at a distance of infinity from the viewer). The depth, given as the parameter to the function, represents the distance of the z=0 plane from the viewer. Lower values give a more flattened pyramid and therefore a more pronounced perspective effect. The value is given in pixels, so a value of 1000 gives a moderate amount of foreshortening and a value of 200 gives an extreme amount. The matrix is computed by starting with an identity matrix and replacing the value at row 3, column 4 with the value -1/depth. The value for depth must be greater than zero, otherwise the function is invalid.
This is something of a start, if not entirely clear. The first sentence leads me to believe the perspective projection matrix article on Wikipedia might be of some help, although in the comments on this post it is revealed there might be some slight differences between the CSS Working Group's conventions and those found in Wikipedia, so please check those out to save yourself a headache.
Check out http://en.wikipedia.org/wiki/Perspective_projection#Diagram
After reading the previous comments and doing some research and testing I'm pretty sure this is correct.
Notice that this is same for the Y coord too.
Transformed X = Original X * ( Perspective / ( Perspective - Z translation ) )
eg.
Div is 500px wide
Perspective is 10000px
Transform is -5000px in Z direction
Transformed Width = 500 * ( 10000 / ( 10000 - ( -5000 ) ) )
Transformed Width = 500 * ( 10000 / 15000) = 500 * (2/3) = 333px
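The formula above can be written as a small function. This is a minimal sketch, assuming that coordinates are measured from the perspective origin and that the only transform involved is a translation along Z; perspectiveScale and projectCoordinate are hypothetical names:

```javascript
// Scale factor applied to X and Y for an element translated by `z` pixels,
// given a perspective distance in pixels.
function perspectiveScale(perspective, z) {
  if (z >= perspective) {
    // The element is at or behind the viewer; the formula no longer applies.
    throw new RangeError("z must be smaller than the perspective distance");
  }
  return perspective / (perspective - z);
}

// Works the same way for an X coordinate, a Y coordinate or a width.
function projectCoordinate(value, perspective, z) {
  return value * perspectiveScale(perspective, z);
}

// The worked example: 500px wide, perspective 10000px, moved -5000px in Z.
console.log(Math.round(projectCoordinate(500, 10000, -5000))); // 333
```

A positive Z translation moves the element toward the viewer, so the factor becomes larger than 1 and the element appears bigger.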
@Domenic Oddly enough, the description "The matrix is computed by starting with an identity matrix and replacing the value at row 3, column 4 with the value -1/depth." has since been removed from the CSS 3D Transforms Module working draft. Perhaps there were some inaccuracies in this description.
As to the question of what the number in perspective(<number>) means, I think it can be seen as the distance between the position of the imagined camera and your computer screen.

Gaussian Falloff Format for Mesh Manipulation

The return value below is described as a Gaussian falloff. I am not seeing e or powers of 2, so I am not sure how this is related to a Gaussian falloff, or whether it is the wrong kind of falloff for me to use to get a nice smooth deformation on my mesh:
Mathf.Clamp01 (Mathf.Pow (360.0, -Mathf.Pow (distance / inRadius, 2.5) - 0.01))
where Mathf.Clamp01 returns a value between 0 and 1.
inRadius is the size of the distortion and distance is determined by:
sqrMagnitude = (vertices[i] - position).sqrMagnitude;
// Early out if too far away
if (sqrMagnitude > sqrRadius)
continue;
distance = Mathf.Sqrt(sqrMagnitude);
vertices is a list of mesh vertices, and position is the point of mesh manipulation/deformation.
My question is two parts:
1) Is the above actually a Gaussian falloff? It is exponential, but there does not seem to be the crucial e or power of 2... (Updated: I see how the graph seems to decrease smoothly in a Gaussian-like way. Perhaps this function is not the cause of problem 2 below.)
2) My mesh is not deforming smoothly enough. Given the above parameters, would you recommend a different Gaussian falloff?
Don't know about meshes etc., but let's look at the math:
f = 360^(-0.01 - (d/r)^2.5) looks similar enough to a Gaussian function to produce a "falloff".
I'll take the exponent apart to show a point:
f = 360^(-(d/r)^2.5) * 360^(-0.01) = 0.9428 * 360^(-(d/r)^2.5)
if d --> +inf then f --> 0
if d --> +0 then f --> 0.9428
The exponent of 360 is always negative (assuming 'distance' and 'inRadius' are always positive) and grows in magnitude almost cubically (power of 2.5) with distance, thus the function is "falling off" and doing it pretty fast.
Conclusion: the function is not Gaussian, because it behaves badly for negative input (a negative number raised to the power 2.5 is not a real number) and probably for other reasons. It does exhibit the "falloff" behavior you are looking for.
Changing r will change the speed of the falloff. When d == r, f = (1/360) * 0.9428.
The function will never go over 0.9428 or below zero, so the clamping in the code has no effect.
I don't see any specific reason for the constant 360; changing it changes the slope a bit.
cheers!
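To compare the two shapes numerically, here is a small JavaScript sketch. snippetFalloff reproduces the expression from the question with Math in place of Mathf; gaussianFalloff is a textbook Gaussian added only for comparison, and its sigmaFraction parameter (sigma as a fraction of the radius) is an assumption, not something taken from the question:

```javascript
// The falloff from the question: clamp01(360^(-(d/r)^2.5 - 0.01)).
function snippetFalloff(distance, radius) {
  const value = Math.pow(360.0, -Math.pow(distance / radius, 2.5) - 0.01);
  return Math.min(1, Math.max(0, value));
}

// A true Gaussian, exp(-d^2 / (2 * sigma^2)), with sigma tied to the radius.
function gaussianFalloff(distance, radius, sigmaFraction = 1 / 3) {
  const sigma = radius * sigmaFraction;
  return Math.exp(-(distance * distance) / (2 * sigma * sigma));
}

// Print both curves side by side for d = 0 ... r.
const radius = 1;
for (let i = 0; i <= 4; i++) {
  const d = (i / 4) * radius;
  console.log(d.toFixed(2), snippetFalloff(d, radius).toFixed(4), gaussianFalloff(d, radius).toFixed(4));
}
```

Plotting or printing the two columns makes it easier to judge whether a different falloff shape, or a different radius, is what the deformation needs.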
