BMI270 - LSB 2 MPS2 conversion formula - data-conversion

Github: \BMI270-Sensor-API-master\BMI270-Sensor-API-master\bmi270_legacy_examples\accel_gyro\accel_gyro.c
/*!
 * @brief This function converts lsb to meter per second squared for 16 bit accelerometer at
 * range 2G, 4G, 8G or 16G.
 */
static float lsb_to_mps2(int16_t val, float g_range, uint8_t bit_width)
{
   float half_scale = ((float)(1 << bit_width) / 2.0f);
   return (GRAVITY_EARTH * val * g_range) / half_scale;
}
Formula usage
  /* Converting lsb to meter per second squared for 16 bit accelerometer at 2G range. */
  x = lsb_to_mps2(sensor_data.acc.x, 2, bmi2_dev.resolution);
  bmi2_dev.resolution = 16
Hello All,
I am trying to understand why the conversion formula above uses a 'half scale' range instead of the 'full scale' range.
Thank you in advance
Best regards
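A quick numeric check of why half scale works (a minimal sketch, not part of the API; GRAVITY_EARTH is assumed to be standard gravity, 9.80665 m/s², and the function is reproduced from the snippet above to make the check standalone): a signed 16-bit sample spans -32768..+32767, so the selected ±2 g range maps onto ±32768 counts. The divisor is half of the full 2^16 code range because the other half of the codes covers the negative side.

#include <stdio.h>
#include <stdint.h>

#define GRAVITY_EARTH (9.80665f) /* assumed value of the example's constant */

static float lsb_to_mps2(int16_t val, float g_range, uint8_t bit_width)
{
    float half_scale = ((float)(1 << bit_width) / 2.0f); /* 2^16 / 2 = 32768 */
    return (GRAVITY_EARTH * val * g_range) / half_scale;
}

int main(void)
{
    printf("%f\n", lsb_to_mps2(32767, 2.0f, 16));  /* +32767 counts -> ~ +19.61 m/s^2 (+2 g) */
    printf("%f\n", lsb_to_mps2(-32768, 2.0f, 16)); /* -32768 counts -> exactly -19.61 m/s^2 (-2 g) */
    return 0;
}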

Related

Can someone tell me what 4/32768.0 means?

I am using an Arduino Nano 33 BLE with the Arduino_LSM9DS1 library.
I am trying to understand the equation, but I don't get it.
The library computes data[0]*4/32768, with a divisor of 32768.
It should be a 16-bit register, so shouldn't the scale be 2^16 = 65536, or do they use ±32768 here?
And what exactly is the 4? Why do they use this range and not 8 or 16?
Can someone explain it to me?
And how exactly do I get the acceleration, and in which unit?
int LSM9DS1Class::readAcceleration(float& x, float& y, float& z)
{
  int16_t data[3];

  if (!readRegisters(LSM9DS1_ADDRESS, LSM9DS1_OUT_X_XL, (uint8_t*)data, sizeof(data))) {
    x = NAN;
    y = NAN;
    z = NAN;
    return 0;
  }

  x = data[0] * 4.0 / 32768.0;
  y = data[1] * 4.0 / 32768.0;
  z = data[2] * 4.0 / 32768.0;

  return 1;
}
The documentation states that:
Accelerometer range is set at [-4,+4] g, ±0.122 mg
So the value returned by the function readAcceleration is in the range [-4,4], representing -4 g to 4 g.
g is the gravitational acceleration = 9.81 m/s²
The code you're showing is the implementation of the function readAcceleration. As I understand it, the raw acceleration data is represented as a 16-bit signed integer (between −32,768 and 32,767), which is then normalized (divided by 32,768) and multiplied by 4 to put it in the correct range of [-4,4].
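As a small worked example of that normalization (a sketch of mine, not library code; the 9.81 factor is taken from the answer above):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int16_t raw = 16384;                /* example raw sample: half of positive full scale */
    float g    = raw * 4.0f / 32768.0f; /* normalize to the [-4, 4] g range -> 2.0 */
    float mps2 = g * 9.81f;             /* convert g to m/s^2 -> 19.62 */
    printf("%.2f g = %.2f m/s^2\n", g, mps2);
    return 0;
}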

Bias value and range of the exponent of floating point

I have realized that I did not pay enough attention to the floating-point part of the IEEE 754 standard back at my university desk. Even though I am not currently struggling with embedded work, I feel incompetent and unable to claim the title of engineer while I lack some of the math and a complete grasp of the standard.
What I know is:
0 and 255 are special exponent-field values, used to express zero and infinity.
There is an implicit 1 used to extend the 23-bit mantissa to 24 bits,
except when the exponent field is all 0s (no implicit bit); if the exponent field is all 1s and the mantissa is 0, the value is infinity, and if the exponent field is all 1s and the mantissa is nonzero, it is NaN (not a number).
What I don't understand is:
How can we mention -126 and 127, inclusively? How are the 254 possible values sectioned into that inclusive range?
Why is 127 selected as the bias value?
Some sources explain the range as [-126..127] but others as [-125..128]. It is really intricate and perplexing.
How can we say the minimum is 2^{-126} rather than 2^{-125}, as the second kind of source would suggest? (I have not been able to get my head around it, though I keep struggling :)
Isn't using the modulo (remainder) operator with the bias value more logical than subtraction, i.e. 2^{e%127}? (correction thanks to chux)
Exponent range
For a 32-bit float the raw exponent rexp is 8 bits, <0,255>, and the bias is 127. Excluding the special cases {0,255} we get <1,254>; applying the bias:
expmin = 1-127 = -126
expmax = 254-127 = +127
Denormal values have no implicit 1, so for the minimal number the mantissa is 1, and since the exponent must point to the LSB of the mantissa we need to shift a few bits more:
expmin = 0-127-(23-1) = -149
The normal max value has the maximal mantissa, and with the exponent pointing to the LSB of the mantissa the maximal shift is 127-23 = 104, so:
max = ((2^24)-1)*(2^104) = (2^24)*(2^104) - (2^104) = 2^128 - 2^104
so the real range (denormals included) of float is:
<2^-149, 2^+128)
<1.40e-45, 3.40e+38)
In most specs and docs only the exponent range for normalized numbers is shown, so:
<2^-126, 2^+127>
<1.175e-38, 1.701e38>
Here is a small C++/VCL example dissecting 32- and 64-bit floats:
//$$---- Form CPP ----
//---------------------------------------------------------------------------
#include <vcl.h>
#include <math.h>
#pragma hdrstop
#include "Unit1.h"
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
//---------------------------------------------------------------------------
typedef unsigned __int32 U32;
typedef __int32 S32;
//---------------------------------------------------------------------------
// IEEE 754 double MSW masks
const U32 _f64_sig    =0x80000000; // sign
const U32 _f64_exp    =0x7FF00000; // exponent
const U32 _f64_exp_sig=0x40000000; // exponent sign
const U32 _f64_exp_bia=0x3FF00000; // exponent bias
const U32 _f64_exp_lsb=0x00100000; // exponent LSB
const U32 _f64_exp_pos=        20; // exponent LSB bit position
const U32 _f64_man    =0x000FFFFF; // mantissa
const U32 _f64_man_msb=0x00080000; // mantissa MSB
const U32 _f64_man_bits=       52; // mantissa bits
const double _f64_lsb = 1.7e-308;  // abs min number
// IEEE 754 single masks <2^-149,2^+128) <1.40e-45,3.40e+38)
const U32 _f32_sig    =0x80000000; // sign
const U32 _f32_exp    =0x7F800000; // exponent
const U32 _f32_exp_sig=0x40000000; // exponent sign
const U32 _f32_exp_bia=0x3F800000; // exponent bias
const U32 _f32_exp_lsb=0x00800000; // exponent LSB
const U32 _f32_exp_pos=        23; // exponent LSB bit position
const U32 _f32_man    =0x007FFFFF; // mantissa
const U32 _f32_man_msb=0x00400000; // mantissa MSB
const U32 _f32_man_bits=       23; // mantissa bits
const float _f32_lsb = 3.4e-38;    // abs min number
//---------------------------------------------------------------------------
void f64_disect(double x)
{
    const int h=1; // may be platform dependent MSB/LSB order
    const int l=0;
    union _f64
    {
        double f; // 64bit floating point
        U32 u[2]; // 2x32 bit uint
    } f64;
    AnsiString txt="";
    U32 man[2];
    S32 exp,bias;
    char sign='+';
    f64.f=x;
    bias=_f64_exp_bia>>_f64_exp_pos;
    if (f64.u[h]&_f64_sig) sign='-';
    exp =(f64.u[h]&_f64_exp)>>_f64_exp_pos;
    exp-=bias;
    man[h]=f64.u[h]&_f64_man;
    man[l]=f64.u[l];
    if (exp==-bias) // zero, denormalized
    {
        exp-=_f64_man_bits-1; // change exp pointing from msb to lsb (ignoring implicit bit)
        txt=AnsiString().sprintf("%c%06X%08Xh>>%4i",sign,man[h],man[l],-exp);
    }
    else if (exp==+bias+1) // Inf,NaN
    {
        if ((man[h]|man[l])==0) txt=AnsiString().sprintf("%cInf ",sign); // parentheses needed: | binds looser than ==
        else                    txt=AnsiString().sprintf("%cNaN ",sign);
        man[h]=0; man[l]=0; exp=0;
    }
    else{
        exp-=_f64_man_bits;   // change exp pointing from msb to lsb
        man[h]|=_f64_exp_lsb; // implicit msb mantissa bit for normalized numbers
        if (exp<0) txt=AnsiString().sprintf("%c%06X%08Xh>>%4i",sign,man[h],man[l],-exp);
        else       txt=AnsiString().sprintf("%c%06X%08Xh<<%4i",sign,man[h],man[l],+exp);
    }
    // reconstruct man,exp back to double
    double y=double(man[l])*pow(2.0,exp);
    y+=double(man[h])*pow(2.0,exp+32.0);
    Form1->mm_log->Lines->Add(AnsiString().sprintf("%21.10lf = %s = %21.10lf",x,txt,y));
}
//---------------------------------------------------------------------------
void f32_disect(double x)
{
    union _f32 // float bits access
    {
        float f; // 32bit floating point
        U32 u;   // 32 bit uint
    } f32;
    AnsiString txt="";
    U32 man;
    S32 exp,bias;
    char sign='+';
    f32.f=x;
    bias=_f32_exp_bia>>_f32_exp_pos;
    if (f32.u&_f32_sig) sign='-';
    exp =(f32.u&_f32_exp)>>_f32_exp_pos;
    exp-=bias;
    man =f32.u&_f32_man;
    if (exp==-bias) // zero, denormalized
    {
        exp-=_f32_man_bits-1; // change exp pointing from msb to lsb (ignoring implicit bit)
        txt=AnsiString().sprintf("%c%06Xh>>%3i",sign,man,-exp);
    }
    else if (exp==+bias+1) // Inf,NaN
    {
        if (man==0) txt=AnsiString().sprintf("%cInf ",sign);
        else        txt=AnsiString().sprintf("%cNaN ",sign);
        man=0; exp=0;
    }
    else{
        exp-=_f32_man_bits; // change exp pointing from msb to lsb
        man|=_f32_exp_lsb;  // implicit msb mantissa bit for normalized numbers
        if (exp<0) txt=AnsiString().sprintf("%c%06Xh>>%3i",sign,man,-exp);
        else       txt=AnsiString().sprintf("%c%06Xh<<%3i",sign,man,+exp);
    }
    // reconstruct man,exp back to float
    float y=float(man)*pow(2.0,exp);
    Form1->mm_log->Lines->Add(AnsiString().sprintf("%21.10f = %s = %21.10f",x,txt,y));
}
//---------------------------------------------------------------------------
//--- Builder: --------------------------------------------------------------
//---------------------------------------------------------------------------
__fastcall TForm1::TForm1(TComponent* Owner):TForm(Owner)
{
    mm_log->Lines->Add("[Float]\r\n");
    f32_disect(123*pow(2.0,-127-22)); // Denormalized
    f32_disect(+0.0);      // Zero
    f32_disect(-0.0);      // Zero
    f32_disect(+0.0/0.0);  // NaN
    f32_disect(-0.0/0.0);  // NaN
    f32_disect(+1.0/0.0);  // Inf
    f32_disect(-1.0/0.0);  // Inf
    f32_disect(+123.456);  // Normalized
    f32_disect(-0.000123); // Normalized
    mm_log->Lines->Add("\r\n[Double]\r\n");
    f64_disect(123*pow(2.0,-127-22)); // Denormalized
    f64_disect(+0.0);      // Zero
    f64_disect(-0.0);      // Zero
    f64_disect(+0.0/0.0);  // NaN
    f64_disect(-0.0/0.0);  // NaN
    f64_disect(+1.0/0.0);  // Inf
    f64_disect(-1.0/0.0);  // Inf
    f64_disect(+123.456);  // Normalized
    f64_disect(-0.000123); // Normalized
    mm_log->Lines->Add("\r\n[Fixed]\r\n");
    const int n=10;
    float fx=12.345,fy=4.321,fm=1<<n;
    int x=int(fx*fm);
    int y=int(fy*fm);
    mm_log->Lines->Add(AnsiString().sprintf("%7.3f + %7.3f = %8.3f = %8.3f",fx,fy,fx+fy,float(int((x+y)   ))/fm));
    mm_log->Lines->Add(AnsiString().sprintf("%7.3f - %7.3f = %8.3f = %8.3f",fx,fy,fx-fy,float(int((x-y)   ))/fm));
    mm_log->Lines->Add(AnsiString().sprintf("%7.3f * %7.3f = %8.3f = %8.3f",fx,fy,fx*fy,float(int((x*y)>>n))/fm));
    mm_log->Lines->Add(AnsiString().sprintf("%7.3f / %7.3f = %8.3f = %8.3f",fx,fy,fx/fy,float(int((x/y)<<n))/fm+float(int(((x%y)<<n)/y))/fm));
}
//---------------------------------------------------------------------------
Which might help you understand a bit more... If you're interested, then look also at this:
print 32bit float using only integer arithmetic
exponent bias
It was selected as the middle between the range edges:
bias = (0+255)/2 = 127
so that the ranges for positive and negative exponents are as symmetric as possible.
modulo
Using exp = rexp % 127 will never give you negative values from an unsigned rexp, no matter what, not to mention that division is a slow operation (at least at the time the spec was created)... That is why exp = rexp - bias.
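As a minimal sketch of that decoding in C (my own illustration, using the binary32 field layout discussed here):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    float f = 1.0f;
    uint32_t u;
    memcpy(&u, &f, sizeof u);          /* reinterpret the float's bits */
    uint32_t sign = u >> 31;           /* 1 sign bit */
    uint32_t rexp = (u >> 23) & 0xFF;  /* 8 raw (biased) exponent bits */
    uint32_t man  = u & 0x7FFFFF;      /* 23 mantissa bits */
    int exp = (int)rexp - 127;         /* exp = rexp - bias */
    printf("sign=%u rexp=%u exp=%d man=%06X\n",
           (unsigned)sign, (unsigned)rexp, exp, (unsigned)man);
    /* for 1.0f this prints: sign=0 rexp=127 exp=0 man=000000 */
    return 0;
}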
How can we mention -126 and 127, inclusively? How are the 254 possible values sectioned into that inclusive range?
IEEE 754-2008 3.3 says emin, the minimum exponent, for any format shall be 1−emax, where emax is the maximum exponent. Table 3.2 in that clause says emax for the 32-bit format (named “binary32”) shall be 127. So emin is 1−127 = −126.
There is no mathematical constraint that forces this. The relationship is chosen as a matter of preference. I recall there being a desire to have slightly more positive exponents than negative but do not recall the justification for this.
Why is 127 selected as the bias value?
Once the bounds above are selected, 127 is necessarily the value needed to encode them in eight bits (as codes 1-254 while leaving 0 and 255 as special codes).
Some sources explain the range as [-126..127] but others as [-125..128]. It is really intricate and perplexing.
Given bits of a binary32 that are the sign bit S, the eight exponent bits E (which are a binary representation of a number e), and the 23 significand bits F (which are a binary representation of a number f), and given 0 < e < 255, then the following are equivalent to each other:
The number represented is (−1)^S • 2^(e−127) • (1 + f/2^23).
The number represented is (−1)^S • 2^(e−127) • 1.F₂.
The number represented is (−1)^S • 2^(e−126) • (½ + f/2^24).
The number represented is (−1)^S • 2^(e−126) • .1F₂.
The number represented is (−1)^S • 2^(e−150) • (2^23 + f).
The number represented is (−1)^S • 2^(e−150) • 1F.₂.
The difference between the first two is just that the first takes the significand bits F, treats them as a binary numeral to get a number f, then divides that number by 2^23 and adds 1, whereas the second uses the 23 significand bits F to write a 24-bit numeral "1.F", which it then interprets as a binary numeral. These two methods produce the same value.
The difference between the first pair and the second pair is that the first prepares a significand in the half-open interval [1, 2), whereas the second prepares a significand in the half-open interval [½, 1) and adjusts the exponent to compensate. The product is the same.
The difference between the first pair and the third pair is also one of scaling. The third pair scales the significand so that it is an integer. The first form is most commonly seen in discussions of floating-point numbers, but the third form is useful for mathematical proofs because number theory generally works with integers. This form is also mentioned in IEEE 754 in passing, also in clause 3.3.
How can we say the minimum is 2^{-126} rather than 2^{-125}, as the second kind of source would suggest? (I have not been able to get my head around it, though I keep struggling :)
The minimum positive normal value has S bit 0, E bits 00000001, and F bits 00000000000000000000000. In the first form, this represents +1 • 2^(1−127) • 1 = 2^−126. In the second form, it represents +1 • 2^(1−126) • ½ = 2^−126. In the third form, it represents +1 • 2^(1−150) • 2^23 = 2^−126. So the form is irrelevant; the values represented are the same.
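To check this concretely (a small sketch of mine; the bit pattern 0x00800000 is exactly S=0, E=00000001, F=0):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    uint32_t bits = 0x00800000u; /* S=0, E=00000001, F=all zeros */
    float f;
    memcpy(&f, &bits, sizeof f);
    printf("%.8e\n", f); /* prints 1.17549435e-38, which is 2^-126 */
    return 0;
}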
Isn't using the remainder operator with the bias value more logical than subtraction, i.e. 2^{e%127}?
No. That would cause the exponent field values 1 and 128 to map to the same value, and that would waste some encodings. There is no benefit to that.
Additionally, the encoding format is such that all positive floating-point numbers are in the same order as their encodings: increasing the encoding increases the value represented, and vice versa. This relationship would not hold with any sort of wrapped interpretation of the exponent field. (Unfortunately, this is flipped for negative numbers, so comparing the encodings of floating-point numbers as pure integers does not give the same results as comparing the floating-point numbers.)
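A short demonstration of that ordering property (my sketch; finite floats only):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

static uint32_t bits_of(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

int main(void)
{
    float a = 1.5f, b = 2.25f;
    /* positive floats: integer order of the encodings matches float order */
    printf("%d %d\n", a < b, bits_of(a) < bits_of(b));     /* 1 1 */
    /* negative floats: the integer comparison is flipped */
    printf("%d %d\n", -a < -b, bits_of(-a) < bits_of(-b)); /* 0 1 */
    return 0;
}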

Convert temperature data from sensor to degrees Celsius

I have a sensor named LSM303DLHC. It has two temperature registers, but I can't figure out how to convert their contents to degrees Celsius.
The two registers are:
TEMP_OUT_H_M register // high reg
TEMP11 | TEMP10 | TEMP9 | TEMP8 | TEMP7 | TEMP6 | TEMP5 | TEMP4
TEMP_OUT_L_M register //low reg
TEMP3 | TEMP2 | TEMP1 | TEMP0 | 0 | 0 | 0 | 0
In datasheet say: "TEMP[11:0] Temperature data (8 LSB/deg - 12-bit resolution)"
My current code is
uint8_t high_reg = read(TEMP_OUT_H_M); // value = 0x03
uint8_t low_reg  = read(TEMP_OUT_L_M); // value = 0x40
int16_t temp = ((uint16_t)high_reg << 8) | (uint16_t)low_reg; // temp = 0x0340 = 832
float mTemp = temp / 256.0f; // = 3.25 (note: 256.0f, plain integer 256 would truncate to 3)
mTemp = mTemp + 20; // = 23.25 (°C), I add 20 more
But I don't understand where the 20 °C offset comes from. The datasheet never mentions it.
Thanks for your answers. It turns out that the temperature sensor only determines a comparative temperature, to compensate for variation. It is not meant for absolute temperature. They should add that information to the datasheet. I just wasted two days of my life on that.
My try...
First, I note that you are taking the whole 8-bit TEMP_OUT_L_M register, while as you described only the upper 4 bits of it hold data.
So build the 12-bit value first. I use Python and the SMBus library:
temph = i2cbus.read_byte_data(i2caddress, TEMP_OUT_H_M) << 4
templ = i2cbus.read_byte_data(i2caddress, TEMP_OUT_L_M) >> 4
tempread = temph + templ # already in decimal
Then you can go ahead with the transformation: see page 11, section 2.2, "Temperature sensor characteristics": 8 LSB/°C, 12-bit resolution and Vdd = 2.5 V.
Then it is clear that:
°C = (read_value * VDD * 10^(log2(LSB/°C))) / ((resolution - 1) * (10 * °C/LSB))
In the LSM303, following the Python code:
# temperature = (tempread * 2.5 * 1000)/((2^12 - 1) * (10/8)), better written as:
temperature = (tempread * 2500)/(4095 * 1.25)
In your case you have read 0x0340, which as a 12-bit value is 0x034, in decimal: 52.
temperature = (52 * 2500) / (4095 * 1.25) = 25.396825
I also noticed that:
The maximum safe register value to read corresponds to 85 °C = 0x55, so we had better build the value from the 4 LSBs of TEMP_OUT_H_M and the 4 MSBs of TEMP_OUT_L_M.
In further tests the LSM303 survived near 125 °C for a while without permanent damage, but it is good practice to use this reading to put the magnetometer and accelerometer into sleep mode when the temperature reaches 80 °C.
My opinion is that TEMP is 10 bits plus one for the sign (the maximum value you can read is 0x3FF), so:
0x03FF - 0x0340 = 0x0BF
0x0BF / 8 = 23.875 in decimal (≈ 0x17)
As said, don't forget the two's complement in your computation.
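Putting the answers together, a hedged sketch in C (not vendor code: it assumes the 12-bit value is two's complement at 8 LSB/°C, as the datasheet quote and the last answer suggest, and it ignores the relative-vs-absolute caveat above):

#include <stdio.h>
#include <stdint.h>

static float lsm303_temp_degc(uint8_t high_reg, uint8_t low_reg)
{
    /* TEMP11..TEMP0 sit in bits 15..4; the low nibble is always zero,
       so dividing by 16 extracts the signed 12-bit value exactly */
    int16_t raw16 = (int16_t)(((uint16_t)high_reg << 8) | low_reg);
    int16_t raw12 = raw16 / 16;
    return raw12 / 8.0f; /* 8 LSB per degree Celsius */
}

int main(void)
{
    printf("%.3f\n", lsm303_temp_degc(0x03, 0x40)); /* 0x034 = 52 -> 6.500 (relative) */
    return 0;
}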

Inverse sqrt for fixed point

I am looking for the best inverse square root algorithm for fixed-point 16.16 numbers. The code below is what I have so far (but basically it takes the square root and divides by the original number, and I would like to get the inverse square root without a division). If it changes anything, the code will be compiled for ARMv5TE.
uint32_t INVSQRT(uint32_t n)
{
    uint64_t op, res, one;
    op = ((uint64_t)n<<16);
    res = 0;
    one = (uint64_t)1 << 46;
    while (one > op) one >>= 2;
    while (one != 0)
    {
        if (op >= res + one)
        {
            op -= (res + one);
            res += (one<<1);
        }
        res >>= 1;
        one >>= 2;
    }
    res <<= 16;
    res /= n;
    return res;
}
The trick is to apply Newton's method to the problem x - 1/y^2 = 0. So, given x, solve for y using the iterative scheme
y_{n+1} = y_n * (3 - x*y_n^2)/2
The divide by 2 is just a bit shift, or at worst a multiply by 0.5. This scheme converges to y = 1/sqrt(x), exactly as requested, and without any true divides at all.
The only problem is that you need a decent starting value for y. As I recall, there are limits on the estimate y for the iterations to converge.
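To illustrate, here is the iteration in plain floating point first (a sketch of mine; the fixed-point code below does the same thing with shifts and integer multiplies):

#include <stdio.h>

/* Newton's method for f(y) = 1/y^2 - x: y' = y * (3 - x*y*y) / 2 */
static double rsqrt_newton(double x, double y0, int iters)
{
    double y = y0;
    int i;
    for (i = 0; i < iters; i++)
        y = y * (3.0 - x * y * y) * 0.5;
    return y;
}

int main(void)
{
    /* starting guess 0.1 for x = 25; converges toward 1/sqrt(25) = 0.2 */
    printf("%.9f\n", rsqrt_newton(25.0, 0.1, 5));
    return 0;
}

With a bad starting value the iteration diverges (for this scheme y0 must lie in (0, sqrt(3/x))), which is why the answers below spend their effort on the initial guess.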
ARMv5TE processors provide a fast integer multiplier, and a "count leading zeros" instruction. They also typically come with moderately sized caches. Based on this, the most suitable approach for a high-performance implementation appears to be a table lookup for an initial approximation, followed by two Newton-Raphson iterations to achieve fully accurate results. We can speed up the first of these iterations further with additional pre-computation that is incorporated into the table, a technique used by Cray computers forty years ago.
The function fxrsqrt() below implements this approach. It starts out with an 8-bit approximation r to the reciprocal square root of the argument a, but instead of storing r, each table element stores 3*r (in the lower ten bits of the 32-bit entry) and r**3 (in the upper 22 bits of the 32-bit entry). This allows the quick computation of the first iteration as
r1 = 0.5 * (3 * r - a * r**3). The second iteration is then computed in the conventional way as r2 = 0.5 * r1 * (3 - r1 * (r1 * a)).
To be able to perform these computations accurately, regardless of the magnitude of the input, the argument a is normalized at the start of the computation, in essence representing it as a 2.32 fixed-point number multiplied by a scale factor of 2^scal. At the end of the computation the result is denormalized according to the formula 1/sqrt(2^(2n)) = 2^(-n). By rounding up results whose most significant discarded bit is 1, accuracy is improved, resulting in almost all results being correctly rounded. The exhaustive test reports: results too low: 639, too high: 1454, not correctly rounded: 2093.
The code makes use of two helper functions: __clz() determines the number of leading zero bits in a non-zero 32-bit argument. __umulhi() computes the 32 most significant bits of a full 64-bit product of two unsigned 32-bit integers. Both functions should be implemented either via compiler intrinsics or by using a bit of inline assembly. In the code below I am showing portable implementations well suited to ARM CPUs, along with inline assembly versions for x86 platforms. On ARMv5TE platforms __clz() should map to the CLZ instruction, and __umulhi() should map to UMULL.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>

#define USE_OWN_INTRINSICS 1

#if USE_OWN_INTRINSICS
/* x86 versions using GNU inline assembly */
static inline int __clz (uint32_t a)
{
    int r;
    __asm__ ("bsrl %1,%0\n\t" : "=r"(r): "r"(a));
    return 31 - r;
}
static inline uint32_t __umulhi (uint32_t a, uint32_t b)
{
    uint32_t r;
    __asm__ ("movl %1,%%eax\n\tmull %2\n\tmovl %%edx,%0\n\t"
             : "=r"(r) : "r"(a), "r"(b) : "eax", "edx");
    return r;
}
#else // USE_OWN_INTRINSICS
/* portable versions */
int __clz (uint32_t a)
{
    uint32_t r = 32;
    if (a >= 0x00010000) { a >>= 16; r -= 16; }
    if (a >= 0x00000100) { a >>=  8; r -=  8; }
    if (a >= 0x00000010) { a >>=  4; r -=  4; }
    if (a >= 0x00000004) { a >>=  2; r -=  2; }
    r -= a - (a & (a >> 1));
    return r;
}
uint32_t __umulhi (uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}
#endif // USE_OWN_INTRINSICS
/*
* For each sub-interval in [1, 4), use an 8-bit approximation r to reciprocal
* square root. To speed up subsequent Newton-Raphson iterations, each entry in
* the table combines two pieces of information: The least-significant 10 bits
* store 3*r, the most-significant 22 bits store r**3, rounded from 24 down to
* 22 bits such that accuracy is optimized.
*/
uint32_t rsqrt_tab [96] =
{
0xfa0bdefa, 0xee6af6ee, 0xe5effae5, 0xdaf27ad9,
0xd2eff6d0, 0xc890aec4, 0xc10366bb, 0xb9a71ab2,
0xb4da2eac, 0xadce7ea3, 0xa6f2b29a, 0xa279a694,
0x9beb568b, 0x97a5c685, 0x9163027c, 0x8d4fd276,
0x89501e70, 0x8563da6a, 0x818ac664, 0x7dc4fe5e,
0x7a122258, 0x7671be52, 0x72e44a4c, 0x6f68fa46,
0x6db22a43, 0x6a52623d, 0x67041a37, 0x65639634,
0x622ffe2e, 0x609cba2b, 0x5d837e25, 0x5bfcfe22,
0x58fd461c, 0x57838619, 0x560e1216, 0x53300a10,
0x51c72e0d, 0x50621a0a, 0x4da48204, 0x4c4c2e01,
0x4af789fe, 0x49a689fb, 0x485a11f8, 0x4710f9f5,
0x45cc2df2, 0x448b4def, 0x421505e9, 0x40df5de6,
0x3fadc5e3, 0x3e7fe1e0, 0x3d55c9dd, 0x3d55d9dd,
0x3c2f41da, 0x39edd9d4, 0x39edc1d4, 0x38d281d1,
0x37bae1ce, 0x36a6c1cb, 0x3595d5c8, 0x3488f1c5,
0x3488fdc5, 0x337fbdc2, 0x3279ddbf, 0x317749bc,
0x307831b9, 0x307879b9, 0x2f7d01b6, 0x2e84ddb3,
0x2d9005b0, 0x2d9015b0, 0x2c9ec1ad, 0x2bb0a1aa,
0x2bb0f5aa, 0x2ac615a7, 0x29ded1a4, 0x29dec9a4,
0x28fabda1, 0x2819e99e, 0x2819ed9e, 0x273c3d9b,
0x273c359b, 0x2661dd98, 0x258ad195, 0x258af195,
0x24b71192, 0x24b6b192, 0x23e6058f, 0x2318118c,
0x2318718c, 0x224da189, 0x224dd989, 0x21860d86,
0x21862586, 0x20c19183, 0x20c1b183, 0x20001580
};
/* This function computes the reciprocal square root of its 16.16 fixed-point
* argument. After normalization of the argument it uses the most significant
* bits of the argument for a table lookup to obtain an initial approximation
* accurate to 8 bits. This is followed by two Newton-Raphson iterations with
* quadratic convergence. Finally, the result is denormalized and some simple
* rounding is applied to maximize accuracy.
*
* To speed up the first NR iteration, for the initial 8-bit approximation r0
* the lookup table supplies 3*r0 along with r0**3. A first iteration computes
* a refined estimate r1 = 1.5 * r0 - x * r0**3. The second iteration computes
* the final result as r2 = 0.5 * r1 * (3 - r1 * (r1 * x)).
*
* The accuracy for all arguments in [0x00000001, 0xffffffff] is as follows:
* 639 results are too small by one ulp, 1454 results are too big by one ulp.
* A total of 2093 results deviate from the correctly rounded result.
*/
uint32_t fxrsqrt (uint32_t a)
{
    uint32_t s, r, t, scal;
    /* handle special case of zero input */
    if (a == 0) return ~a;
    /* normalize argument */
    scal = __clz (a) & 0xfffffffe;
    a = a << scal;
    /* initial approximation */
    t = rsqrt_tab [(a >> 25) - 32];
    /* first NR iteration */
    r = (t << 22) - __umulhi (t, a);
    /* second NR iteration */
    s = __umulhi (r, a);
    s = 0x30000000 - __umulhi (r, s);
    r = __umulhi (r, s);
    /* denormalize and round result */
    r = ((r >> (18 - (scal >> 1))) + 1) >> 1;
    return r;
}
/* reference implementation, 16.16 reciprocal square root of non-zero argument */
uint32_t ref_fxrsqrt (uint32_t a)
{
    double arg = a / 65536.0;
    double rsq = sqrt (1.0 / arg);
    uint32_t r = (uint32_t)(rsq * 65536.0 + 0.5);
    return r;
}
int main (void)
{
    uint32_t arg = 0x00000001;
    uint32_t res, ref;
    uint32_t err, lo = 0, hi = 0;
    do {
        res = fxrsqrt (arg);
        ref = ref_fxrsqrt (arg);
        err = 0;
        if (res < ref) {
            err = ref - res;
            lo++;
        }
        if (res > ref) {
            err = res - ref;
            hi++;
        }
        if (err > 1) {
            printf ("!!!! arg=%08x res=%08x ref=%08x\n", arg, res, ref);
            return EXIT_FAILURE;
        }
        arg++;
    } while (arg);
    printf ("results too low: %u too high: %u not correctly rounded: %u\n",
            lo, hi, lo + hi);
    return EXIT_SUCCESS;
}
I have a solution that I characterize as "fast inverse sqrt, but for 32-bit fixed point". No table, no reference, just straight to the point with a good guess.
If you want, jump to the source code below, but beware of a few things:
(x * y)>>16 can be replaced with any fixed-point multiplication scheme you want.
This does not require 64-bit long words; I just use them for ease of demonstration. The long words prevent overflow in the multiplications. A fixed-point math library would have fixed-point multiplication functions that handle this better.
The initial guess is pretty good, so you get relatively precise results in the first incantation.
The code is more verbose than needed, for demonstration.
Values less than 65536 (<1) and greater than 32767<<16 cannot be used.
This is generally not faster than using a square root table and division if your hardware has a divide instruction. If it does not, this avoids divisions.
int fxisqrt(int input){
    if(input <= 65536){
        return 1;
    }
    long xSR = input>>1;
    long pushRight = input;
    long msb = 0;
    long shoffset = 0;
    long yIsqr = 0;
    long ysqr = 0;
    long fctrl = 0;
    long subthreehalf = 0;
    while(pushRight >= 65536){
        pushRight >>= 1;
        msb++;
    }
    shoffset = (16 - ((msb)>>1));
    yIsqr = 1<<shoffset;
    //y = (y * (98304 - ( ( (x>>1) * ((y * y)>>16 ) )>>16 ) ) )>>16; applied twice
    //Incantation 1
    ysqr = (yIsqr * yIsqr)>>16;
    fctrl = (xSR * ysqr)>>16;
    subthreehalf = 98304 - fctrl;
    yIsqr = (yIsqr * subthreehalf)>>16;
    //Incantation 2 - increases precision greatly, but may not be necessary
    ysqr = (yIsqr * yIsqr)>>16;
    fctrl = (xSR * ysqr)>>16;
    subthreehalf = 98304 - fctrl;
    yIsqr = (yIsqr * subthreehalf)>>16;
    return yIsqr;
}

How to make ARGB transparency using bitwise operators

I need to blend two pixels with transparency:
pixel1: {A, R, G, B} - foreground pixel
pixel2: {A, R, G, B} - background pixel
A, R, G, B are byte values; each color channel is represented by one byte.
Currently I'm calculating the blend as:
newR = pixel2_R * alpha / 255 + pixel1_R * (255 - alpha) / 255
newG = pixel2_G * alpha / 255 + pixel1_G * (255 - alpha) / 255
newB = pixel2_B * alpha / 255 + pixel1_B * (255 - alpha) / 255
but it is too slow.
I need to do it with bitwise operators (AND, OR, XOR, negation, bit shifts).
I want to do it on Windows Phone 7 XNA.
---attached C# code---
public static uint GetPixelForOpacity(uint reduceOpacityLevel, uint pixelBackground, uint pixelForeground, uint pixelCanvasAlpha)
{
    byte surfaceR = (byte)((pixelForeground & 0x00FF0000) >> 16);
    byte surfaceG = (byte)((pixelForeground & 0x0000FF00) >> 8);
    byte surfaceB = (byte)((pixelForeground & 0x000000FF));

    byte sourceR = (byte)((pixelBackground & 0x00FF0000) >> 16);
    byte sourceG = (byte)((pixelBackground & 0x0000FF00) >> 8);
    byte sourceB = (byte)((pixelBackground & 0x000000FF));

    uint newR = sourceR * pixelCanvasAlpha / 256 + surfaceR * (255 - pixelCanvasAlpha) / 256;
    uint newG = sourceG * pixelCanvasAlpha / 256 + surfaceG * (255 - pixelCanvasAlpha) / 256;
    uint newB = sourceB * pixelCanvasAlpha / 256 + surfaceB * (255 - pixelCanvasAlpha) / 256;

    return (uint)255 << 24 | newR << 16 | newG << 8 | newB;
}
You can't do an 8 bit alpha blend using only bitwise operations, unless you basically re-invent multiplication with basic ops (8 shift-adds).
There are two methods, as mentioned in other answers: use 256 instead of 255, or use a lookup table. Both have issues, but you can mitigate them. It really depends on what architecture you're doing this on: the relative speed of multiply, divide, shift, add and memory loads. In any case:
Lookup table: a trivial 256x256 lookup table is 64KB. This will thrash your data cache and end up being very slow. I wouldn't recommend it unless your CPU has an abysmally slow multiplier but does have low-latency RAM. You can improve performance by throwing away some alpha bits, e.g. A>>3, resulting in 32x256 = 8KB of lookup, which has a better chance of fitting in cache.
Use 256 instead of 255: the idea being that dividing by 256 is just a shift right by 8. This will be slightly off and tend to round down, darkening the image slightly: e.g. if R=255, A=255, then (R*A)/256 = 254. You can cheat a little and do this: (R*A+R+A)/256, or just (R*A+R)/256 or (R*A+A)/256, which gives 255 for that case. Or, scale A to 0..256 first, e.g. A = (256*A)/255. That's just one expensive divide-by-255 instead of six. Then (R*A)/256 = 255.
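As a sketch of that divide-by-256 trick (my own illustration, one channel, using the (R*A+R+A)-style correction described above; the mid-range results can differ from the exact /255 blend by a small amount, while the endpoint cases come out exact):

#include <stdio.h>
#include <stdint.h>

/* blend background bg with foreground fg at alpha a, dividing by 256
   (a plain shift) instead of 255; adding bg+fg before the shift keeps
   the endpoints exact (255 stays 255) */
static uint8_t blend_channel(uint8_t bg, uint8_t fg, uint8_t a)
{
    unsigned inv = 255u - a;
    unsigned sum = (unsigned)bg * a + (unsigned)fg * inv;
    return (uint8_t)((sum + bg + fg) >> 8);
}

int main(void)
{
    printf("%u\n", blend_channel(255, 0, 255));   /* all background: 255 */
    printf("%u\n", blend_channel(0, 255, 0));     /* all foreground: 255 */
    printf("%u\n", blend_channel(100, 200, 128)); /* exact /255 gives ~150 */
    return 0;
}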
I don't think it can be done with the same precision using only those operators. Your best bet is, I reckon, using a LUT (as long as the LUT can fit in the CPU cache, otherwise it might even be slower).
// allocate the LUT (64KB)
unsigned char lut[256*256] __cacheline_aligned; // __cacheline_aligned is a GCC-ism

// macro to access the LUT
#define LUT(pixel, alpha) (lut[(alpha)*256+(pixel)])

// precompute the LUT
for (int alpha_value=0; alpha_value<256; alpha_value++) {
    for (int pixel_value=0; pixel_value<256; pixel_value++) {
        LUT(pixel_value, alpha_value) = (unsigned char)((double)(pixel_value) * (double)(alpha_value) / 255.0);
    }
}
// in the loop
unsigned char ialpha = 255-alpha;
newR = LUT(pixel2_R, alpha) + LUT(pixel1_R, ialpha);
newG = LUT(pixel2_G, alpha) + LUT(pixel1_G, ialpha);
newB = LUT(pixel2_B, alpha) + LUT(pixel1_B, ialpha);
Otherwise you should try vectorizing your code. But to do that, you should at least provide us with more info on your CPU architecture and compiler. Keep in mind that your compiler might be able to vectorize automatically, if given the right options.
