A negative unsigned int?

I'm trying to wrap my head around the TrueType specification. On this page, in the section 'cmap' format 4, the parameter idDelta is listed as an unsigned 16-bit integer (UInt16). Yet, further down, a few examples are given, and there idDelta is given the values -9, -18, -27 and 1. How is this possible?

This is not a bug in the spec. The reason they show negative numbers in the idDelta row for the examples is that "All idDelta[i] arithmetic is modulo 65536" (quoted from the section just above). Here's how that works.
The formula to get the glyph index is
glyphIndex = idDelta[i] + c
where c is the character code. Since this expression must be computed modulo 65536, it's equivalent to the following expression if you were using integers larger than 2 bytes:
glyphIndex = (idDelta[i] + c) % 65536
idDelta is a u16, so let's say it had the max value 65535 (0xFFFF) and c were 2; then glyphIndex would be equal to c - 1, since:
0xFFFF + 2 = 0x10001
0x10001 % 0x10000 = 1
You can think of this as a 16-bit integer wrapping around to 0 when an overflow occurs.
Now remember that a modulo can be computed by repeated subtraction, keeping what remains. In this case, since idDelta is only 16 bits, at most one subtraction is ever needed, because the maximum value you can get from adding two 16-bit integers is 0x1FFFE, which is smaller than 0x20000 (2 × 0x10000). That means a shortcut is to subtract 65536 (0x10000) instead of performing the modulo:
glyphIndex = (idDelta[i] - 0x10000) + c
And this is what the example shows as the values in the table. Here's an actual example from a .ttf file I've decoded:
I want the index for the character code 97 (lowercase 'a').
97 is greater than 32 and smaller than 126, so we use index 2 of the mappings.
idDelta[2] == 65507
glyphIndex = (65507 + 97) % 65536 === 68 which is the same as (65507 - 65536) + 97 === 68
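As a minimal sketch, the same calculation can be written in C with an unsigned 16-bit type, since uint16_t wraps around on overflow and therefore gives the modulo-65536 behaviour for free (the idDelta and character code values are the ones from the decoded example above; everything else is illustrative):

#include <stdint.h>
#include <stdio.h>

/* Modulo-65536 arithmetic happens automatically with uint16_t,
 * because unsigned overflow wraps around. */
static uint16_t glyph_index(uint16_t idDelta, uint16_t c)
{
    return (uint16_t)(idDelta + c);   /* same as (idDelta + c) % 65536 */
}

int main(void)
{
    printf("%u\n", (unsigned)glyph_index(65507, 97));   /* prints 68 */
    return 0;
}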

The definition and use of idDelta on that page is not consistent. In the struct subheader it is defined as an int16, while a little earlier the same subheader is listed as UInt16*4.
It's probably a bug in the spec.
If you look at actual implementations, like this one from perl Tk, you'll see that idDelta is usually given as signed:
typedef struct SUBHEADER {
    USHORT firstCode;      /* First valid low byte for subHeader. */
    USHORT entryCount;     /* Number valid low bytes for subHeader. */
    SHORT  idDelta;        /* Constant adder to get base glyph index. */
    USHORT idRangeOffset;  /* Byte offset from here to appropriate
                            * glyphIndexArray. */
} SUBHEADER;
Or see the implementation from libpdfxx:
struct SubHeader
{
    USHORT firstCode;
    USHORT entryCount;
    SHORT  idDelta;
    USHORT idRangeOffset;
};
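As an illustration of why the signed reading matters in practice, here is a minimal parsing sketch (not taken from either library; the helper name is made up). TrueType tables are big-endian, and the only twist is reinterpreting the third field as signed:

#include <stdint.h>

/* Hypothetical helper: read a big-endian 16-bit value from a byte buffer. */
static uint16_t read_u16_be(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

struct SubHeader {
    uint16_t firstCode;
    uint16_t entryCount;
    int16_t  idDelta;        /* signed, matching the implementations above */
    uint16_t idRangeOffset;
};

static struct SubHeader parse_subheader(const uint8_t *p)
{
    struct SubHeader h;
    h.firstCode     = read_u16_be(p + 0);
    h.entryCount    = read_u16_be(p + 2);
    h.idDelta       = (int16_t)read_u16_be(p + 4);   /* reinterpret as signed */
    h.idRangeOffset = read_u16_be(p + 6);
    return h;
}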

Related

How does the modulus shorten a number to only 16 digits in this example?

I am learning Solidity by following various tutorials online. One of these tutorials is Cryptozombies. In this tutorial we create a zombie that has "dna" (a number). These numbers must be only 16 digits long to work with the code. We do this by defining
uint dnaDigits = 16;
uint dnaModulus = 10 ** dnaDigits;
At some point in the lesson, we define a function that generates "random" dna by passing a string to the keccak256 hash function.
function _generateRandomDna(string memory _str) private view returns (uint) {
    uint rand = uint(keccak256(abi.encodePacked(_str)));
    return rand % dnaModulus;
}
I am trying to figure out how the output of keccak256 % 10^16 is always a 16-digit integer. I guess part of the problem is that I don't exactly understand what keccak256 outputs. What I (think) I know is that it outputs a 256-bit number. A bit is either 0 or 1, so there are 2^256 possible outputs from this function?
If you need more information please let me know. I think I included everything relevant from the code.
Whatever keccak256(abi.encodePacked(_str)) returns, it is converted into a uint because of the cast uint(...).
uint rand = uint(keccak256(abi.encodePacked(_str)));
Once it's a uint, it's simple math, because it's a modulus:
xxxxx % 100 always < 100
I see now that this code is meant to produce a number that has no more than 16 digits, not exactly 16 digits.
Take x % 10^16
If x < 10^16, then the remainder is x, and x is by definition less than 10^16, which is the smallest possible number with 17 digits; i.e. every number less than 10^16 has 16 or fewer digits.
If x = 10^16, then the remainder is 0, which has fewer than 16 digits.
If x > 10^16, then either 10^16 divides x exactly or it doesn't. If it does, the remainder is zero, which has fewer than 16 digits; if it doesn't, the remainder is strictly less than 10^16, which always has 16 or fewer digits.
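To make that concrete, here's a tiny sketch in C; 10^16 fits comfortably in an unsigned 64-bit integer, so uint64_t stands in for Solidity's much wider uint (the sample values are arbitrary):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint64_t dnaModulus = 10000000000000000ULL;   /* 10^16 */
    uint64_t samples[] = { 3ULL, 9999999999999999ULL,
                           10000000000000000ULL, 18446744073709551615ULL };

    for (int i = 0; i < 4; i++) {
        /* the remainder is always in [0, 10^16 - 1], i.e. at most 16 digits */
        printf("%llu %% 10^16 = %llu\n",
               (unsigned long long)samples[i],
               (unsigned long long)(samples[i] % dnaModulus));
    }
    return 0;
}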

Understanding Arduino Binary to Decimal Conversions

I was looking at some code today for integrating a real-time clock with an Arduino, and it had some binary-to-decimal (and vice versa) conversion code that I don't fully understand.
The code in question is below:
byte decToBcd(byte val)
{
    return ( (val/10*16) + (val%10) );
}

byte bcdToDec(byte val)
{
    return ( (val/16*10) + (val%16) );
}
ex: decToBcd(12);
I really fail to grasp how this works. I am not sure I understand the math, or if some sort of assumptions are being taken advantage of.
Would someone mind explaining how exactly the math and data types below are supposed to work? If possible, touch on why the value "16" is used in the conversions instead of "8", since we are supposed to be working with a byte value.
For context, the full code can be found here: http://www.codingcolor.com/microcontrollers/an-arduino-lcd-clock-using-a-chronodot-rtc/
The key hint here is BCD - Binary-coded decimal - in the function name. In BCD each decimal digit is represented by four bits (half of a byte). As a result the maximum (decimal) number you can store using BCD notation is 99: 9 in the upper nibble (half of the byte) and 9 in the lower nibble.
Let's take a look at number 12 as an example. Number 12 looks as follows in the binary notation:
12 = %00001100
However in BCD it looks as follows:
12 = %00010010
because
0001 0010
   1    2
Now if you look at the decToBcd function val%10 is responsible for calculating the value of the ones place (i.e. the last digit). Since this goes to the lower part of the byte we don't need to do anything special here. val/10*16 first calculates the value of the tens place - val/10. However since the value has to go to the upper half of the byte it needs to be shifted up by four bits - hence *16. Another (in my opinion more readable) way of writing this function would be:
((val / 10) << 4) | (val % 10)
The bcdToDec does the reverse conversion.
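For a quick sanity check, here is the same pair of functions in a small, self-contained C program (byte is just an unsigned char here; the Arduino-specific parts are omitted), exercising the value 12 used above:

#include <stdio.h>

typedef unsigned char byte;

byte decToBcd(byte val)
{
    return ( (val/10*16) + (val%10) );
}

byte bcdToDec(byte val)
{
    return ( (val/16*10) + (val%16) );
}

int main(void)
{
    byte bcd = decToBcd(12);
    printf("decToBcd(12)   = 0x%02X\n", bcd);         /* 0x12, i.e. %00010010 */
    printf("bcdToDec(0x12) = %d\n", bcdToDec(bcd));    /* 12 */
    return 0;
}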
RTCs usually store the year in 1 byte as 2 digits only, i.e. 2014 is stored as 14.
Some of them store it as an offset from the year 1970, so 2014 = 44.
So the maximum it can hold is 99 in both cases.

bit-shift operation in accelerometer code

I'm programming my Arduino micro controller and I found some code for accepting accelerometer sensor data for later use. I can understand all but the following code. I'd like to have some intuition as to what is happening but after all my searching and reading I can't wrap my head around what is going on and truly understand.
I have taken a class in C++ and we did very little with bitwise operations or bit shifting or whatever you'd like to call it. Let me try to explain what I think I understand and you can correct me where it is needed.
So:
I think we are storing a value in x, pretty sure in fact.
It appears that the data in array "buff", slot number 1, is being set to the datatype of integer.
The value in slot 1 is being bit shifted 8 places to the left.(does this point to buff slot 0?)
This new value is being compared to the data in buff slot 0 and if either bits are true then the bit in the data stored in x will also be true so, 0 and 1 = 1, 0 and 0 = 0 and 1 and 0 = 1 in the end stored value.
The code does this for all three axis: x, y, z but I'm not sure why...I need help. I want full understanding before I progress.
//each axis reading comes in 10 bit resolution, ie 2 bytes.
// Least Significant Byte first!!
//thus we are converting both bytes in to one int
x = (((int)buff[1]) << 8) | buff[0];
y = (((int)buff[3]) << 8) | buff[2];
z = (((int)buff[5]) << 8) | buff[4];
This code is being used to convert the raw accelerometer data (in an array of 6 bytes) into three 10-bit integer values. As the comment says, the data is LSB first. That is:
buff[0] // least significant 8 bits of x data
buff[1] // most significant 2 bits of x data
buff[2] // least significant 8 bits of y data
buff[3] // most significant 2 bits of y data
buff[4] // least significant 8 bits of z data
buff[5] // most significant 2 bits of z data
It's using bitwise operators to put the two parts together into a single variable. The (int) typecasts are unnecessary and (IMHO) confusing. This simplified expression:
x = (buff[1] << 8) | buff[0];
Takes the data in buff[1], and shifts it left 8 bits, and then puts the 8 bits from buff[0] in the space so created. Let's label the 10 bits a through j for example's sake:
buff[0] = cdefghij
buff[1] = 000000ab
Then:
buff[1] << 8 = ab00000000
And:
buff[1] << 8 | buff[0] = abcdefghij
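With concrete numbers (made up here purely for illustration), the same combination looks like this:

#include <stdio.h>

int main(void)
{
    unsigned char buff[2] = { 0x34, 0x02 };  /* LSB first: low byte, then high byte */
    int x = (buff[1] << 8) | buff[0];        /* 0x0200 | 0x34 = 0x0234 */
    printf("x = %d (0x%04X)\n", x, (unsigned)x);   /* 564 (0x0234) */
    return 0;
}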
The value in slot 1 is being bit shifted 8 places to the left.(does this point to buff slot 0?)
Nah. Bitwise operators ain't pointer arithmetic; don't confuse the two. Shifting by N places to the left is (roughly) equivalent to multiplying by 2 to the Nth power (except for some corner cases in C, but let's not talk about those yet).
This new value is being compared to the data in buff slot 0 and if either bits are true then the bit in the data stored in x will also be true
No. | is not the logical OR operator (that would be ||) but the bitwise OR one. All the code does is combining the two bytes in buff[0] and buff[1] into a single 2-byte integer, where buff[1] denotes the MSB of the number.
The device result is in 6 bytes and the bytes need to be rearranged into 3 integers (having values that can only take up 10 bits at most).
So the first two bytes look like this:
00: xxxx xxxx <- binary value
01: ???? ??xx
The ??? part isn't part of the result because the xxx part comprises the 10 bits. I guess the hardware is built in such a way that the ??? part is all zero bits.
To get this into a single integer variable, we need all 8 of the low bits plus the upper-order 2 bits, shifted left by 8 positions so they don't interfere with the low-order 8 bits. The bitwise OR (|, the vertical bar) will join those two parts into a single integer that looks like this:
x: ???? ??xx xxxx xxxx <- binary value of a single 16 bit integer
Actually it doesn't matter how big the 'int' is (in bits) as the remaining bits (beyond that 16) will be zero in this case.
To expand on and clarify the reply by Carl Norum:
The (int) typecast is required because the source is a byte. The bitshift is performed on the source datatype before the result is saved into X. Therefore it must be cast to at least 16 bits (an int) in order to bitshift 8 bits and retain all the data before the OR operation is executed and the result saved.
What the code is not telling you is whether this should be an unsigned int or whether there is a sign in the bit data. I'd expect negative data is possible with an accelerometer.

How to efficiently convert a few bytes into an integer between a range?

I'm writing something that reads bytes (just a List<int>) from a remote random number generation source that is extremely slow. For that and my personal requirements, I want to retrieve as few bytes from the source as possible.
Now I am trying to implement a method which signature looks like:
int getRandomInteger(int min, int max)
I have two theories how I can fetch bytes from my random source, and convert them to an integer.
Approach #1 is naive. Fetch (max - min) / 256 bytes and add them up. It works, but it's going to fetch a lot of bytes from the slow random number generator source I have. For example, if I want to get a random integer between zero and a million, it's going to fetch almost 4000 bytes... that's unacceptable.
Approach #2 sounds ideal to me, but I'm unable to come up with the algorithm. It goes like this:
Lets take min: 0, max: 1000 as an example.
Calculate ceil(rangeSize / 256) which in this case is ceil(1000 / 256) = 4. Now fetch one (1) byte from the source.
Scale this one byte from the 0-255 range to 0-3 range (or 1-4) and let it determine which group we use. E.g. if the byte was 250, we would choose the 4th group (which represents the last 250 numbers, 750-1000 in our range).
Now fetch another byte and scale from 0-255 to 0-250 and let that determine the position within the group we have. So if this second byte is e.g. 120, then our final integer is 750 + 120 = 870.
In that scenario we only needed to fetch 2 bytes in total. However, it's much more complex as if our range is 0-1000000 we need several "groups".
How do I implement something like this? I'm okay with Java/C#/JavaScript code or pseudo code.
I'd also like the result not to lose entropy/randomness, so I'm slightly worried about scaling integers.
Unfortunately your Approach #1 is broken. For example, if min is 0 and max is 510, you'd add 2 bytes. There is only one way to get a result of 0: both bytes zero. The chance of this is (1/256)^2. However, there are many ways to get other values, say 100 = 100+0, 99+1, 98+2... So the chance of a 100 is much larger: 101 × (1/256)^2.
The more-or-less standard way to do what you want is to:
Let R = max - min + 1 -- the number of possible random output values
Let N = 2^k >= mR, m>=1 -- a power of 2 at least as big as some multiple of R that you choose.
loop
b = a random integer in 0..N-1 formed from k random bits
while b >= mR -- reject b values that would bias the output
return min + floor(b/m)
This is called the method of rejection. It throws away randomly selected binary numbers that would bias the output. If max-min+1 happens to be a power of 2, then you'll have zero rejections.
If you have m=1 and max-min+1 is just one more than a biggish power of 2, then rejections will be near half. In this case you'd definitely want a bigger m.
In general, bigger m values lead to fewer rejections, but of course they require slightly more bits per number. There is a probabilistically optimal algorithm to pick m.
Some of the other solutions presented here have problems, but I'm sorry right now I don't have time to comment. Maybe in a couple of days if there is interest.
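For concreteness, here is a minimal sketch of the rejection method with m = 1 in C, assuming a hypothetical next_random_byte() that pulls one byte at a time from the slow source:

#include <stdint.h>

/* Hypothetical: returns one uniformly random byte from the slow source. */
extern uint8_t next_random_byte(void);

/* Uniform random integer in [min, max] using the method of rejection (m = 1). */
int getRandomInteger(int min, int max)
{
    uint32_t R = (uint32_t)(max - min + 1);   /* number of possible output values */

    /* smallest k with 2^k >= R, i.e. the number of random bits we need */
    unsigned k = 0;
    while (((uint32_t)1 << k) < R)
        k++;

    uint32_t b;
    do {
        unsigned bytes = (k + 7) / 8;         /* assemble k bits from whole bytes */
        b = 0;
        for (unsigned i = 0; i < bytes; i++)
            b = (b << 8) | next_random_byte();
        b &= ((uint32_t)1 << k) - 1;          /* keep only k bits */
    } while (b >= R);                         /* reject values that would bias the output */

    return min + (int)b;
}

With the asker's example range of 0..1000 (R = 1001), this needs k = 10 bits, i.e. two bytes per attempt, and rejects only the values 1001..1023.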
3 bytes (together) give you a random integer in the range 0..16777215. You can use 20 bits from this value to get the range 0..1048575 and throw away values > 1000000.
range 1 to r
find 'a' such that 256^a >= r
get 'a' bytes into array A[]
num = 0
for i = 0 to len(A)-1
    num += A[i] << (8*i)
next
random number = num mod range
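A direct C translation of that pseudocode might look like the sketch below (with the caveat from the earlier answer: the final mod step is slightly biased unless the range divides 256^a exactly, or unless you add rejection on top); next_random_byte() is the same hypothetical byte source as above:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical: returns one uniformly random byte from the slow source. */
extern uint8_t next_random_byte(void);

/* Random value in [0, r-1] built from the fewest whole bytes that cover r.
 * Warning: plain "num % r" is slightly biased unless r divides 256^a. */
uint32_t random_mod(uint32_t r)
{
    size_t a = 0;
    uint64_t span = 1;
    while (span < r) {                 /* find a such that 256^a >= r */
        span *= 256;
        a++;
    }

    uint64_t num = 0;
    for (size_t i = 0; i < a; i++)
        num |= (uint64_t)next_random_byte() << (8 * i);

    return (uint32_t)(num % r);
}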
Your random source gives you 8 random bits per call. For an integer in the range [min,max] you would need ceil(log2(max-min+1)) bits.
Assume that you can get random bytes from the source using some function:
bool RandomBuf(BYTE* pBuf, size_t nLen); // fill buffer with nLen random bytes
Now you can use the following function to generate a random value in a given range:
// --------------------------------------------------------------------------
// produce a uniformly-distributed integral value in range [nMin, nMax]
// T is char/BYTE/short/WORD/int/UINT/LONGLONG/ULONGLONG
template <class T> T RandU(T nMin, T nMax)
{
    static_assert(std::numeric_limits<T>::is_integer, "RandU: integral type expected");
    if (nMin > nMax)
        std::swap(nMin, nMax);
    if (0 == (T)(nMax - nMin + 1))   // all range of type T
    {
        T nR;
        return RandomBuf((BYTE*)&nR, sizeof(T)) ? *(T*)&nR : nMin;
    }
    ULONGLONG nRange = (ULONGLONG)nMax - (ULONGLONG)nMin + 1;    // number of discrete values
    UINT nRangeBits = (UINT)ceil(log((double)nRange) / log(2.)); // bits for storing nRange discrete values
    ULONGLONG nR;
    do
    {
        if (!RandomBuf((BYTE*)&nR, sizeof(nR)))
            return nMin;
        nR = nR >> ((sizeof(nR) << 3) - nRangeBits);  // keep nRangeBits random bits
    }
    while (nR >= nRange);   // ensure value in range [0..nRange-1]
    return nMin + (T)nR;    // [nMin..nMax]
}
Since you are always getting a multiple of 8 bits, you can save the extra bits between calls (for example you may need only 9 bits out of 16). It requires some bit manipulation, and it is up to you to decide if it is worth the effort.
You can save even more if you use 'half bits': let's assume that you want to generate numbers in the range [1..5]. You'll need log2(5) = 2.32 bits for each random value. Using 32 random bits you can actually generate floor(32/2.32) = 13 random values in this range, though it requires some additional effort.

Split a non-power-of-two based int

I know that you can split a power-of-two number in half like so:
halfintR = some32bitint & 0xFFFF
halfintL = some32bitint >> 16
can you do the same thing for an integer which is bounded by a non-power of two space?
(say that you want your range to be limited to the set of integers that will fit into a 4-digit base-52 space, unsigned)
You could use the following
rightDigits = number % 2704 // 52 squared
leftDigits = number / 2704
Well, of course. & 0xffff is the same as % 0x10000 and >> 16 is the same as / 0x10000. It is just that division by a power of two is more efficient when done with bit operations like shifting and masking. Division works with any number (within the range of representation).
Once you realize that & and >> are used for doing modulo and division calculations respectively, you can write what you want as:
lower = some4DigitsNumberBase52 % (52 * 52)
upper = some4DigitsNumberBase52 / (52 * 52)
This is the basis for doing base calculations. You can also derive the solution from the algorithm that displays a number in a specific base: how do you come up with the rightmost two digits and the leftmost two digits?
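For illustration, here is a small sketch in C that splits a 4-digit base-52 value into its two leftmost and two rightmost digits and verifies the result by recombining them (the sample value is arbitrary):

#include <stdio.h>
#include <assert.h>

int main(void)
{
    const unsigned base = 52;
    const unsigned halfSpan = base * base;              /* 2704, i.e. 52 squared */
    unsigned value = 123456;                            /* any value < 52^4 = 7311616 */

    unsigned upper = value / halfSpan;                  /* leftmost two base-52 digits */
    unsigned lower = value % halfSpan;                  /* rightmost two base-52 digits */

    assert(upper * halfSpan + lower == value);          /* recombination check */
    printf("upper = %u, lower = %u\n", upper, lower);   /* 45, 1776 */
    return 0;
}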

Resources