Fixed point multiplication in assembly (x86)


I want to multiply and divide an unsigned 8.8 fixed-point number in the
ax register with 1.00125 and store the result in ax as well.
I know that fixed point multiplication/division requires some extra steps
but I have no idea how to implement those in assembly.
Help is greatly appreciated.

If you care about accuracy, 1.00125 can't be stored exactly in any binary fixed-point or binary floating-point format because it's a recurring fraction in binary (in binary it's 1.000000000101000111101011100001010001111010111...b where that 00001010001111010111 sequence repeats forever). For this reason I'd convert it into the rational number 801/800, and then do x * 1.00125 = (x * 801) / 800 (possibly with "round to nearest" on the division).
If you don't care about accuracy, then the more bits you can use for "1.00125" the closer the result will be to the correct answer. With 8 bits ("1.7 fixed point") the closest you can get is 1.0000000b, which means you can just skip the multiplication (x * 1.00125 = x). With 16 bits ("1.15 fixed point") the closest you can get is 1.000000000101001b (or 1.001251220703125 in decimal).
However, you can cheat more. Specifically, you can significantly increase accuracy with the same number of bits by doing (x * 1) + (x * 0.00125). E.g. instead of having a 16-bit constant like 1.000000000101001b (where 9 bits are zeros), you can have a 16-bit constant like 0.0000000001010001111010111b (where the 16 bits are the last 16 bits without any of the leading zeros). In this case the constant is very close (like 0.00124999880) rather than "less close" (like 1.001251220703125 was).
Ironically, with only 16 bits, this "0.00125" is more accurate than a 32-bit floating point representation of 1.00125 can be.
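To make the comparison between the two 16-bit constants concrete, here is a small Python sketch (not part of the original answer; the variable names are made up for illustration). It uses exact rational arithmetic to derive both constants and their errors against 801/800.

```python
from fractions import Fraction

TARGET = Fraction(801, 800)  # exactly 1.00125

# "1.15 fixed point": the whole constant 1.00125 scaled by 2**15.
whole_const = round(TARGET * 2**15)
whole_err = abs(Fraction(whole_const, 2**15) - TARGET)

# Split form: only the 0.00125 part, scaled by 2**25 so that the
# significant bits still fit in a 16-bit register.
frac_const = round((TARGET - 1) * 2**25)
frac_err = abs(1 + Fraction(frac_const, 2**25) - TARGET)

print(whole_const, float(whole_err))
print(frac_const, float(frac_err))
```

The second error comes out roughly a thousand times smaller than the first, which is the gain from dropping the leading zero bits.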
So.. in assembly (assuming everything is unsigned) it might look like:
;ax = x << 8 (or x as an 8.8 fixed point number)
mov cx,ax ;cx = x << 8
mov bx,41943 ;bx = 41943 = 0.00124999880 << 25
mul bx ;dx:ax = (x << 8) * (0.00124999880 << 25) = x * 0.00124999880 << 33
;dx = x * 0.00124999880 << 17
shr dx,9 ;dx = x * 0.00124999880 << 17 >> 9 = x * 0.00124999880 << 8, carry flag = last bit shifted out
adc dx,0 ;Round up to nearest (add 1 if last bit shifted out was set)
add dx,cx ;dx = x << 8 + x * 0.00124999880 << 8 = x * 1.00124999880 << 8
mov ax,dx ;ax = x * 1.00124999880 << 8 (16-bit addressing has no [dx+cx] form)
Of course the problem here is that converting it back to "8.8 fixed point" ruins most of the accuracy anyway. To keep most of the accuracy, you could use a 32-bit result ("8.24 fixed point") instead. This might look like:
;ax = x << 8 (or x as an 8.8 fixed point number)
mov cx,ax ;cx = x << 8
mov bx,41943 ;bx = 41943 = 0.00124999880 << 25
mul bx ;dx:ax = (x << 8) * (0.00124999880 << 25) = x * 0.00124999880 << 33
add ax,1 << 8 ;To cause the following shift to round to nearest
adc dx,0
shrd ax,dx,9
shr dx,9 ;dx:ax = x * 0.00124999880 << 33 >> 9 = x * 0.00124999880 << 24
;cx:0 = x << 24
add dx,cx ;dx:ax = x << 24 + x * 0.00124999880 << 24 = x * 1.00124999880 << 24
The other problem is that there's potential overflow. E.g. if x was 0xFF.FF (or about 255.996) the result would be something like 256.32 which is too big to fit in an "8.8" or "8.24" or "8.anything" fixed point format. To avoid that problem you can just increase the number of integer bits (and reduce the accuracy by 1 bit) - e.g. make the result "9.7 fixed point", or "9.23 fixed point".
The important points here are:
a) For "fixed point" calculations, every operation (multiplication, division, addition, ...) causes the decimal point to move.
b) Because the decimal point keeps moving, it's best to adopt a standard notation for where the decimal point is at each step. My way is to include an "explicit shift" in the comments (e.g. "x << 8" rather than just "x"). This "explicit shift documented in the comments" makes it easy to determine where the decimal point moves, and if/how much you need to shift left/right to convert to a different fixed point format.
c) For good code, you need to pay attention to accuracy and overflow, and this causes the decimal point to move around even more (and makes the use of a "standard notation for where the decimal point is" more important).
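As a cross-check of the first routine (the one that returns an 8.8 result), the register steps can be modelled in Python. This is only a sketch under the same assumptions as the assembly (unsigned input, constant 41943); the function name is hypothetical.

```python
def mul_1_00125_q8_8(x_q8_8):
    """Model of the 16-bit routine; input and output are unsigned 8.8 values."""
    cx = x_q8_8 & 0xFFFF             # cx = x << 8
    product = cx * 41943             # mul bx: dx:ax = x * 0.00125 << 33
    dx = product >> 16               # high word: x * 0.00125 << 17
    carry = (dx >> 8) & 1            # last bit shifted out by "shr dx,9"
    dx = (dx >> 9) + carry           # shr dx,9 then adc dx,0 (round to nearest)
    return (dx + cx) & 0xFFFF        # wraps on overflow, like a 16-bit add

x = 100 << 8                         # 100.0 as 8.8 fixed point
print(mul_1_00125_q8_8(x) / 256)     # result converted back to a plain number
```

For inputs that don't overflow, the model stays within one unit in the last place of the exact value x * 801 / 800.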

An easy solution is to just use the x87 floating point unit to do the multiplication. Assuming real mode with nasm (untested):
example:
push bp
mov bp, sp ; establish stack frame
push ax
push ax ; make space for quotient
fild word [bp-2] ; load number
fld st0 ; duplicate top of stack
fmul dword [factor] ; compute product
fistp word [bp-2]
fmul dword [invfac] ; compute quotient
fistp word [bp-4]
pop dx ; quotient
pop ax ; product
pop bp ; tear down stack frame
ret
factor dd 1.00125
invfac dd 0.99875156 ; 1/1.00125 = 800/801
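This leaves the quotient in dx and the product in ax. Rounding is done according to the rounding mode configured in the x87 FPU (which should be rounding to nearest by default). Note that fild word and fistp word treat the value as a signed 16-bit integer, so as written this only works for inputs and results below 0x8000 (128.0 in 8.8).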

One thing to understand about fixed point multiplication is that the number of fractional bits in the result is the number of fractional bits in operand 1 plus the number in operand 2.
Thus, when multiplying two numbers with zero fractional bits, we get a result with zero fractional bits.
And when multiplying two numbers with 8 fractional bits each, we get a number with 16 fractional bits.
So the result needs to be scaled back down as needed (here, shifted right by 8 to return to 8 fractional bits).
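A minimal Python sketch of that rule for the 8.8 case, with rounding added on the final shift (the helper names are hypothetical):

```python
FRAC_BITS = 8

def to_q8_8(value):
    """Convert a plain number to 8.8 fixed point."""
    return round(value * (1 << FRAC_BITS))

def mul_q8_8(a, b):
    """Multiply two 8.8 values and return an 8.8 value."""
    wide = a * b                      # product has 16 fractional bits
    wide += 1 << (FRAC_BITS - 1)      # add half an output unit: round to nearest
    return wide >> FRAC_BITS          # scale back down to 8 fractional bits

a = to_q8_8(1.5)
b = to_q8_8(2.25)
print(mul_q8_8(a, b) / (1 << FRAC_BITS))
```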

Related

How to calculate the length of a curve of a math function?

I got curve C and I want to compute the curve length between its 2 points A,B:
f(x) = x² + 2x
C( x,f( x))
A(-2.0,f(-2.0))
B( 0.5,f( 0.5))
so x=<-2.0,0.5>
How to calculate the curve length between points A,B ?
Of course I want to know how to calculate it on sheets :)
Thank you ;)

You can simply compute n points along the curve and add up the distances between them, approximating your curve with many small line segments. That is actually how the curve integral is defined, in the limit as the number of points goes to infinity. Without any higher math we can set n to some big enough value and add the segments up in an O(n) for loop. For example in C++ like this:
#include <math.h>
double f(double x){ return (x*x)+x+x; } // your function
double length(double x0,double x1,int n) // length of f(x) x=<x0,x1>
    {
    int e;
    double x,dx,y,dy,l;
    y=f(x0); dx=(x1-x0)/double(n-1); l=0.0; // start y and length
    for (e=1,x=x0+dx;e;x+=dx) // loop through whole x range
        {
        if (x>=x1) { x=x1; e=0; } // end?
        dy=y; y=f(x); dy=y-dy; // y=f(x) and dy is y-oldy
        l+=sqrt((dx*dx)+(dy*dy)); // add line length
        }
    return l; // return length
    }
use like this:
cout << length(-2.0,0.5,10) << endl;
cout << length(-2.0,0.5,100) << endl;
cout << length(-2.0,0.5,1000) << endl;
cout << length(-2.0,0.5,10000) << endl;
cout << length(-2.0,0.5,100000) << endl;
cout << length(-2.0,0.5,1000000) << endl;
When the results start saturating, stop increasing n, as you have found your solution (with some error, of course). Here are the results on my machine:
4.57118083390485
4.30516477250995
4.30776425810517
4.30551273287911
4.30528771762491
4.30526521739629
So we can round the answer to, for example, 4.305 ...
Of course, if you compute the curve integral algebraically instead, you can obtain a precise answer in O(1), provided the function is integrable.

Here is Python 3 code that will approximate the length of an arc of a function graph. It is designed for continuous functions, though no computer program can do the infinitely many calculations needed to get the true result.
"""Compute the arc length of the curve defined by y = x**2 + 2*x for
-2 <= x <= 0.5 without using calculus.
"""
from math import hypot

def arclength(f, a, b, tol=1e-6):
    """Compute the arc length of function f(x) for a <= x <= b. Stop
    when two consecutive approximations are closer than the value of
    tol.
    """
    nsteps = 1  # number of steps to compute
    oldlength = 1.0e20
    length = 1.0e10
    while abs(oldlength - length) >= tol:
        nsteps *= 2
        fx1 = f(a)
        xdel = (b - a) / nsteps  # space between x-values
        oldlength = length
        length = 0
        for i in range(1, nsteps + 1):
            fx0 = fx1  # previous function value
            fx1 = f(a + i * (b - a) / nsteps)  # new function value
            length += hypot(xdel, fx1 - fx0)  # length of small line segment
    return length

def f(x):
    return x**2 + 2*x

print(arclength(f, -2.0, 0.5, 1e-10))
You can set the "tolerance" for the result. This routine basically follows the mathematical definition of the length of an arc. It approximates the curve with joined line segments and calculates the combined length of the segments. The number of segments is doubled until two consecutive length approximations are closer than the given tolerance. In the graphic below the blue segments are added together, then the red segments, and so on. Since a line is the shortest distance between two points, all the approximations and the final answer will be less than the true answer (unless round-off or other errors occur in the calculations).
The answer given by that code is
4.3052627174649505
The result from calculus, reduced to a decimal number, is
4.305262717478898
so my result is a little low, as expected, and is within the desired tolerance.
My routine does have some features to reduce computations and improve accuracy, but more could be done. Ask if you need more, such as the calculus closed form of the answer. Warning--that answer involves the inverse hyperbolic sine function.
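For reference, the closed form mentioned above can be sketched as follows. This is my own working rather than the answerer's code: with y' = 2x + 2 and the substitution u = 2x + 2, the length is half the integral of sqrt(1 + u²), whose antiderivative is (u·sqrt(1 + u²) + asinh(u)) / 2. The function names are hypothetical.

```python
from math import asinh, sqrt

def antiderivative(u):
    """Antiderivative of sqrt(1 + u**2) with respect to u."""
    return 0.5 * (u * sqrt(1.0 + u * u) + asinh(u))

def exact_arclength(a, b):
    """Arc length of y = x**2 + 2*x for a <= x <= b.

    y' = 2*x + 2; substituting u = 2*x + 2 gives dx = du / 2.
    """
    return 0.5 * (antiderivative(2 * b + 2) - antiderivative(2 * a + 2))

print(exact_arclength(-2.0, 0.5))
```

For this interval the expression reduces to (3·sqrt(10) + asinh(3) + 2·sqrt(5) + asinh(2)) / 4, which agrees with the decimal value quoted above.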

Hash Value for 3D Vector

Is there a way to represent a 3D Vector as a definite number? I mean that two vectors with different values can't ever have the same hash value. I'm sure there already is a question about this but I haven't found it unfortunately. Thanks for your help.
EDIT:
I know this algorithm for 2D vectors which is pretty good (I think): (x + y) * (x + y + 1) / 2 + y

The best approach to get a hash for a vector of floats is to convert it to a string of bytes or characters and calculate a hash on it. An example of this is given using numpy and python in the following answer:
Most efficient property to hash for numpy array.
This will work efficiently for large numbers of vectors, but you cannot guarantee that you will not get collisions due to the simple fact of mapping three floats onto an integer. However there are a number of hashing algorithms available in the python hashlib library to choose from, you might need to experiment. An option in C++ is Boost::Hash.

See the pigeonhole principle: in the same way you can't fit 100 pigeons into 10 holes without sharing, you can't uniquely convert 3 values into 1 value (with all values of the same size). There will have to be duplicates.
Now, if you could have a number with 3x as many bits as the vector values, the problem becomes fairly easy:
// convert x, y and z to the range 0-...
x -= minimum possible value
y -= minimum possible value
z -= minimum possible value
mult = (maximum possible value - minimum possible value) + 1
hash = x * mult * mult + y * mult + z
If you're having trouble understanding the above, just take the example of the range of the values being 0-99. We'd multiply x by 100*100 = 10000 and y by 100, so the hash would be a decimal value with (at most) 6 digits with x, y and z all next to each other, guaranteed to not overlap:
x = 12
y = 34
z = 56
hash = 123456
Now this same idea will hold for any maximum value by just changing the base / radix.
If there isn't any overlap in some base, each unique combination of values of x, y and z will result in a unique hash.
This is by far the simplest approach, although it doesn't produce a particularly good hash, so it depends what you want to use it for - there might be a way to uniquely convert this number to another number which will be a good hash.
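The packing scheme above can be sketched in Python as follows. It assumes integer components in a known inclusive range [lo, hi]; the function names are hypothetical. Python integers are unbounded, so the "3x as many bits" requirement is met automatically.

```python
def pack3(x, y, z, lo, hi):
    """Pack three integers from [lo, hi] into one collision-free integer."""
    mult = hi - lo + 1                      # number of distinct values per component
    for v in (x, y, z):
        if not lo <= v <= hi:
            raise ValueError("component out of range")
    return ((x - lo) * mult + (y - lo)) * mult + (z - lo)

def unpack3(code, lo, hi):
    """Inverse of pack3, which shows that no information was lost."""
    mult = hi - lo + 1
    code, z = divmod(code, mult)
    x, y = divmod(code, mult)
    return x + lo, y + lo, z + lo

print(pack3(12, 34, 56, 0, 99))
print(unpack3(pack3(-5, 0, 7, -50, 49), -50, 49))
```

Because the mapping has an inverse, two different vectors in the range can never produce the same number.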

Responding to this post a little late, and perhaps this isn't what you're looking for, but I figured I would chime in with another answer.
You could use the function you mentioned, (x + y) * (x + y + 1) / 2 + y , and do it recursively, ex. f( f(x,y) , z).
You can also use other pairing functions as well and use the same method (https://en.wikipedia.org/wiki/Pairing_function).
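A minimal sketch of that nested pairing in Python, assuming non-negative integer components (the Cantor pairing function is only a bijection on those); the function names are hypothetical. Note that the values grow quickly, so a fixed-width integer type would overflow for large coordinates.

```python
def cantor_pair(x, y):
    """Cantor pairing function for non-negative integers."""
    return (x + y) * (x + y + 1) // 2 + y

def cantor_triple(x, y, z):
    """Pair the first two components, then pair the result with the third."""
    return cantor_pair(cantor_pair(x, y), z)

print(cantor_triple(1, 2, 3))
```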
For my problem, I wanted a function that would order vectors based on their location. The order itself didn't matter, only that a close value means a similar vector. What I ended up doing was:
double get_pairing(double x, double y, double z) {
    double normalizer = 0.0;
    if (x < 0) {
        normalizer += (3.0 * MAX_COORD_VAL);
    }
    if (y < 0) {
        normalizer += (6.0 * MAX_COORD_VAL);
    }
    if (z < 0) {
        normalizer += (9.0 * MAX_COORD_VAL);
    }
    double g = x + y + z - normalizer + (21 * MAX_COORD_VAL);
    return g;
}
This orders vectors based on whether they have negative coordinate values and whether they have large coordinate values.
This works assuming you have a max coordinate value.

Can someone explain the process of multi-precision multiplication in assembly?

Refer to the figure below and this link.
Regarding Figure 2.3, I understand why M (multiplier) and N (multiplicand) are in those orders that are listed in "Partial product..M..L" that's in the rightmost column. It comes from how we were normally taught to multiply:
I understand why the figure is 64-bits long, because it's 32-bits times 32-bits.
I understand the addresses go from P~P+7 that way because the H.O. bit of the final product starts at P and L.O. bit of the final product ends at P+7.
I understand why each large rectangle is split into an upper and lower half, because the HCS12 can only handle a maximum of 16-bits times 16-bits at a time.
My problem: The way each small rectangle (lower and upper halves) are arranged is confusing me. Apparently, it's supposed to mimic the simplified multiplication process, which I can understand how is being done. I just don't understand entirely how it translates into the figure. The link from my first line also shows a similar process. I don't want to guess or assume what I think is happening. Can someone please explain in large detail (preferably steps) how you figure out which small rectangle goes into which column and row; or in other words, can you tell me how the multiplication process translates into the figure?

The equation you have is
( MH<<16 + ML ) x ( NH<<16 + NL )
with << meaning "shift left by". Note that a shift left by 16 is equivalent to a multiplication by 65536, and two shifts by 16 are equivalent to one by 32.
If you multiply this out, you get
ML x NL +
MH<<16 x NL +
ML x NH<<16 +
MH<<16 x NH<<16
If you pull the shifts out:
(ML x NL) << 0 +
(MH x NL) << 16 +
(ML x NH) << 16 +
(MH x NH) << 32
Now the shift amounts show the number of bits each block is shifted left by in the graphic.
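The four shifted partial products can be checked with a short Python sketch. This only illustrates the arithmetic, not the HCS12 code from the question, and the function name is hypothetical.

```python
MASK16 = 0xFFFF

def mul32_by_parts(m, n):
    """Multiply two 32-bit values using only 16-bit x 16-bit products."""
    mh, ml = m >> 16, m & MASK16      # upper and lower halves of M
    nh, nl = n >> 16, n & MASK16      # upper and lower halves of N
    return ((ml * nl)                 # lands in the lowest 32 bits
            + ((mh * nl) << 16)       # middle column, shifted by 16
            + ((ml * nh) << 16)       # middle column, shifted by 16
            + ((mh * nh) << 32))      # lands in the highest 32 bits

print(hex(mul32_by_parts(0x12345678, 0x9ABCDEF0)))
```

Each 16 x 16 product is 32 bits wide, which is why every block in the figure spans two 16-bit columns, and why the two middle blocks overlap the same columns and need carry propagation when added.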

comparing two angles

Given four points in the plane, A,B,X,Y, I wish to determine which of the following two angles is smaller ∢ABX or ∢ABY.
The angle ∢ABX is defined as the angle of BX, when AB is translated to lie on the open segment (-∞,0]. Intuitively when saying ∢ABX I mean the angle you get when you turn left after visiting vertex B.
I'd rather not use cos or sqrt, in order to preserve accuracy and to keep the computation cheap (the code would run on an embedded system).
In the case where A=(-1,0),B=(0,0), I can compare the two angles ∢ABX and ∢ABY, by calculating the dot product of the vectors X,Y, and watch its sign.
What I can do in this case is:
Determine whether or not ABX turns right or left
If ABX turns left, check whether or not Y and A are on the same side of the line through segment BX. If they are, ∢ABX is smaller than ∢ABY.
If ABX turns right, then Y and A on the same side of BX means that ∢ABX is larger than ∢ABY.
But this seems too complicated to me.
Any simpler approach?

Here's some pseudocode. Doesn't detect the case when both angles are the same. Also doesn't deal with angle orientation, e.g. assumes all angles are <= 180 degrees.
v0 = A-B
v1 = X-B
v2 = Y-B
dot1 = dot(v0, v1)
dot2 = dot(v0, v2)
if(dot1 > 0)
    // ABX is acute
    if(dot2 < 0)
        // ABX is smaller
    else if(dot1 * dot1 / dot(v1,v1) > dot2 * dot2 / dot(v2,v2))
        // ABX is smaller
    else
        // ABY is smaller
else
    // ABX is 90 degrees or more
    if(dot2 > 0)
        // ABY is smaller
    else if(dot1 * dot1 / dot(v1,v1) > dot2 * dot2 / dot(v2,v2))
        // ABY is smaller
    else
        // ABX is smaller
Note that much of this agonizing pain goes away if you allow taking two square roots.
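For comparison, here is a Python sketch of the same idea that also avoids the divisions by cross-multiplying the squared cosines. It is my own variant rather than the answerer's code: it treats both angles as unsigned (0 to 180 degrees), returns False on ties, and uses a hypothetical function name. With integer coordinates the comparison is exact.

```python
def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def abx_is_smaller(a, b, x, y):
    """True if the unsigned angle ABX is strictly smaller than ABY."""
    v0 = (a[0] - b[0], a[1] - b[1])
    v1 = (x[0] - b[0], x[1] - b[1])
    v2 = (y[0] - b[0], y[1] - b[1])
    d1, d2 = dot(v0, v1), dot(v0, v2)
    # cos^2 comparison without division:
    # d1^2 / |v1|^2 > d2^2 / |v2|^2  <=>  d1^2 * |v2|^2 > d2^2 * |v1|^2
    lhs = d1 * d1 * dot(v2, v2)
    rhs = d2 * d2 * dot(v1, v1)
    if d1 > 0:                        # ABX is acute
        return d2 < 0 or lhs > rhs
    if d2 > 0:                        # ABX is not acute but ABY is
        return False
    return lhs < rhs                  # neither is acute: smaller |cos| wins

print(abx_is_smaller((1, 0), (0, 0), (1, 1), (0, 1)))
```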

Center the origin on B by doing
X = X - B
Y = Y - B
A = A - B
EDIT: you also need to normalise the 3 vectors
A = A / |A|
X = X / |X|
Y = Y / |Y|
Find the two angles by doing
acos(A dot X)
acos(A dot Y)
===
I don't understand the point of the loss of precision. You are just comparing, not modifying in any way the coordinates of the points...

You might want to check out Rational Trigonometry. The ideas of distance and angle are replaced by quadrance and spread, which don't involve sqrt and cos. See the bottom of that webpage to see how spread between two lines is calculated. The subject has its own website and even a youtube channel.

I'd rather not use cos or sqrt, in order to preserve accuracy.
This makes no sense whatsoever.
But this seems too complicated to me.
This seems utterly wrong headed to me.
Take the difference between two vectors and look at the signs of the components.
The thing you'll have to be careful about is what "smaller" means. That idea isn't very precise as stated. For example, if one point A is in quadrant 4 (x-component > 0 and y-component < 0) and the other point B is in quadrant 1 (x-component > 0 and y-component > 0), what does "smaller" mean? The angle of the vector from the origin to A is between 3π/2 and 2π (or, measured the other way, between -π/2 and zero); the angle of the vector from the origin to B is between zero and π/2. Which one is "smaller"?

I am not sure if you can get away without using sqrt.
Simple:
AB = (A-B)/|A-B|
XB = (X-B)/|X-B|
YB = (Y-B)/|Y-B|
if(dot(XB,AB) > dot(YB,AB)){
    // angle ABY is greater
}
else
{
    ...
}

Use the law of cosines: a**2 + b**2 - 2*a*b*cos(phi) = c**2
where a = |ab|, b = |bx| (|by|), c = |ax| (|ay|) and phi is your angle ABX (ABY)

Vertex shader world transform, why do we use 4 dimensional vectors?

From this site: http://www.toymaker.info/Games/html/vertex_shaders.html
We have the following code snippet:
// transformations provided by the app, constant Uniform data
float4x4 matWorldViewProj: WORLDVIEWPROJECTION;
// the format of our vertex data
struct VS_OUTPUT
{
float4 Pos : POSITION;
};
// Simple Vertex Shader - carry out transformation
VS_OUTPUT VS(float4 Pos : POSITION)
{
VS_OUTPUT Out = (VS_OUTPUT)0;
Out.Pos = mul(Pos,matWorldViewProj);
return Out;
}
My question is: why does the struct VS_OUTPUT have a 4 dimensional vector as its position? Isn't position just x, y and z?

Because you need the w coordinate for perspective calculation. After you output from the vertex shader, DirectX performs a perspective divide by dividing by w.
Essentially if you have 32768, -32768, 32768, 65536 as your output vertex position then after w divide you get 0.5, -0.5, 0.5, 1. At this point the w can be discarded as it is no longer needed. This information is then passed through the viewport matrix which transforms it to usable 2D coordinates.
Edit: If you look at how a matrix multiplication is performed using the projection matrix you can see how the values get placed in the correct places.
Taking the projection matrix specified in D3DXMatrixPerspectiveLH
2*zn/w 0 0 0
0 2*zn/h 0 0
0 0 zf/(zf-zn) 1
0 0 zn*zf/(zn-zf) 0
And applying it to a random x, y, z, 1 (Note for a vertex position w will always be 1) vertex input value you get the following
x' = ((2*zn/w) * x) + (0 * y) + (0 * z) + (0 * w)
y' = (0 * x) + ((2*zn/h) * y) + (0 * z) + (0 * w)
z' = (0 * x) + (0 * y) + ((zf/(zf-zn)) * z) + ((zn*zf/(zn-zf)) * w)
w' = (0 * x) + (0 * y) + (1 * z) + (0 * w)
Instantly you can see that w and z are different. The w coord now just contains the z coordinate passed to the projection matrix. z contains something far more complicated.
So, assume we have an input position of (2, 1, 5, 1), a zn (Z-Near) of 1, a zf (Z-Far) of 10, a w (width) of 1 and an h (height) of 1.
Passing these values through we get
x' = ((2 * 1)/1) * 2
y' = ((2 * 1)/1) * 1
z' = ((10/(10-1)) * 5) + (((1 * 10)/(1-10)) * 1)
w' = 5
expanding that we then get
x' = 4
y' = 2
z' ≈ 4.44
w' = 5
We then perform final perspective divide and we get
x'' = 0.8
y'' = 0.4
z'' ≈ 0.89
w'' = 1
And now we have our final coordinate position. This assumes that x and y ranges from -1 to 1 and z ranges from 0 to 1. As you can see the vertex is on-screen.
As a bizarre bonus you can see that if |x'| or |y'| or |z'| is larger than |w'| or z' is less than 0 that the vertex is offscreen. This info is used for clipping the triangle to the screen.
Anyway, I think that's a pretty comprehensive answer :D
Edit2: Be warned, I am using ROW major matrices. Column major matrices are transposed.
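The worked example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic (row vector times matrix, then divide by w), not Direct3D code; the function names are hypothetical and the matrix layout is the one listed earlier in this answer.

```python
def perspective_lh(w, h, zn, zf):
    """Left-handed perspective matrix in the row-vector layout shown above."""
    return [
        [2 * zn / w, 0.0, 0.0, 0.0],
        [0.0, 2 * zn / h, 0.0, 0.0],
        [0.0, 0.0, zf / (zf - zn), 1.0],
        [0.0, 0.0, zn * zf / (zn - zf), 0.0],
    ]

def transform(vec, mat):
    """Row vector (x, y, z, w) times a 4x4 matrix."""
    return [sum(vec[r] * mat[r][c] for r in range(4)) for c in range(4)]

def perspective_divide(clip):
    """Divide every component by w, which is what the pipeline does after the shader."""
    w = clip[3]
    return [c / w for c in clip]

clip = transform([2.0, 1.0, 5.0, 1.0], perspective_lh(1.0, 1.0, 1.0, 10.0))
ndc = perspective_divide(clip)
print(clip)
print(ndc)
```

Without the fourth component there would be nowhere to carry the view-space z through to the divide, which is the reason the shader outputs a float4.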

Rotation is specified by a 3 x 3 matrix and translation by a vector. You can perform both transforms in a "single" operation by combining them into a single 3 x 4 matrix (3 rows, 4 columns):
rx1 rx2 rx3 tx1
ry1 ry2 ry3 ty1
rz1 rz2 rz3 tz1
However as this isn't square there are various operations that can't be performed (inversion for one). By adding an extra row (that does nothing):
0 0 0 1
all these operations become possible (if not easy).
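A small Python sketch of such a combined matrix, using the column-vector layout shown above (translation in the fourth column). The rotation is taken about the z axis purely for illustration, and the function names are hypothetical.

```python
from math import cos, sin, pi

def rigid_transform(angle, tx, ty, tz):
    """Rotation about z by `angle`, then translation, as one 4x4 matrix."""
    c, s = cos(angle), sin(angle)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],         # the extra row that makes the matrix square
    ]

def apply(mat, vec):
    """4x4 matrix times column vector (x, y, z, w)."""
    return [sum(mat[r][c] * vec[c] for c in range(4)) for r in range(4)]

m = rigid_transform(pi / 2, 10.0, 0.0, 0.0)
point = apply(m, [1.0, 0.0, 0.0, 1.0])       # w = 1: rotated and translated
direction = apply(m, [1.0, 0.0, 0.0, 0.0])   # w = 0: rotated only
print(point)
print(direction)
```

The w component of the input decides whether the translation column takes effect, which is another reason positions are carried as 4-component vectors.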
As Goz explains in his answer by making the "1" a non identity value the matrix becomes a perspective transformation.

Clipping is an important part of this process, as it helps to visualize what happens to the geometry. The clipping stage essentially discards any point in a primitive that is outside of a 2-unit cube centered around the origin (OK, you have to reconstruct primitives that are partially clipped but that doesn't matter here).
It would be possible to construct a matrix that directly mapped your world space coordinates to such a cube, but gradual movement from the far plane to the near plane would be linear. That is to say that a move of one foot (towards the viewer) when one mile away from the viewer would cause the same increase in size as a move of one foot when several feet from the camera.
However, if we have another coordinate in our vector (w), we can divide the vector component-wise by w, and our primitives won't exhibit the above behavior, but we can still make them end up inside the 2-unit cube above.
For further explanations see http://www.opengl.org/resources/faq/technical/depthbuffer.htm#0060 and http://en.wikipedia.org/wiki/Transformation_matrix#Perspective_projection.
A simple answer would be to say that if you don't tell the pipeline what w is then you haven't given it enough information about your projection. This can be verified directly without understanding what the pipeline does with it...
As you probably know the 4x4 matrix can be split into parts based on what each part does. The 3x3 matrix at the top left is altered when you do rotation or scale operations. The fourth column is altered when you do a translation. If you ever inspect a perspective matrix, it alters the bottom row of the matrix. If you then look at how a Matrix-Vector multiplication is done, you see that the bottom row of the matrix ONLY affects the resultant w component of the vector. So if you don't tell the pipeline about w it won't have all your information.
