How to encrypt 32bit integer? [duplicate]


I need an algorithm that can do a one-to-one mapping (ie. no collision) of a 32-bit signed integer onto another 32-bit signed integer.
My real concern is enough entropy so that the output of the function appears to be random. Basically I am looking for a cipher similar to XOR Cipher but that can generate more arbitrary-looking outputs. Security is not my real concern, although obscurity is.
Edit for clarification purpose:
The algorithm must be symmetric, so that I can reverse the operation without a keypair.
The algorithm must be bijective: every 32-bit input number must generate a unique 32-bit number.
The output of the function must be obscure enough: adding only one to the input should have a big effect on the output.
Example expected result:
F(100) = 98456
F(101) = -758
F(102) = 10875498
F(103) = 986541
F(104) = 945451245
F(105) = -488554
Just like MD5, changing one thing may change lots of things.
I am looking for a mathematical function, so manually mapping integers is not a solution for me. For those who are asking, algorithm speed is not very important.

Use any 32-bit block cipher! By definition, a block cipher maps every possible input value in its range to a unique output value, in a reversible fashion, and by design, it's difficult to determine what any given value will map to without the key. Simply pick a key, keep it secret if security or obscurity is important, and use the cipher as your transformation.
For an extension of this idea to non-power-of-2 ranges, see my post on Secure Permutations with Block Ciphers.
Addressing your specific concerns:
The algorithm is indeed symmetric. I'm not sure what you mean by "reverse the operation without a keypair". If you don't want to use a key, hardcode a randomly generated one and consider it part of the algorithm.
Yup - by definition, a block cipher is bijective.
Yup. It wouldn't be a good cipher if that were not the case.
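As a rough illustration of the idea, here is a minimal sketch of a 32-bit Feistel network in Python. It is not a vetted cipher: the round function (a truncated SHA-256 of key, round number and half-block), the round count and the names encrypt32/decrypt32 are all assumptions made for this example. What it does show is that a Feistel construction is a bijection on 32-bit values no matter what the round function is, and that it can be reversed with the same key.

```python
import hashlib

MASK16 = 0xFFFF

def _round(half, key, i):
    # Hypothetical round function: any deterministic function works here,
    # because a Feistel network is invertible regardless of what F does.
    data = key + bytes([i]) + half.to_bytes(2, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")

def encrypt32(x, key, rounds=4):
    """Map an unsigned 32-bit value to another unsigned 32-bit value."""
    left, right = (x >> 16) & MASK16, x & MASK16
    for i in range(rounds):
        left, right = right, left ^ _round(right, key, i)
    return (left << 16) | right

def decrypt32(y, key, rounds=4):
    """Undo encrypt32 by running the rounds in reverse order."""
    left, right = (y >> 16) & MASK16, y & MASK16
    for i in reversed(range(rounds)):
        left, right = right ^ _round(left, key, i), left
    return (left << 16) | right
```

The functions work on unsigned values; a signed 32-bit input can be converted first with x & 0xFFFFFFFF and the result converted back afterwards.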

I will try to explain my solution to this on a much simpler example, which then can be easily extended for your large one.
Say i have a 4 bit number. There are 16 distinct values. Look at it as if it was a four dimensional cube:
(Figure: a four-dimensional cube. Source: ams.org)
Every vertex represents one of those numbers, and every bit represents one dimension. So it's basically XYZW, where each of the dimensions can only have the value 0 or 1. Now imagine you use a different order of dimensions, for example XZYW. Each of the vertices has now changed its number!
You can do this for any number of dimensions: just permute those dimensions. If security is not your concern, this could be a nice, fast solution for you. On the other hand, I don't know if the output will be "obscure" enough for your needs, and certainly, once someone has seen a large number of mapped values, the mapping can be reversed (which may be an advantage or a disadvantage, depending on your needs).
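A minimal sketch of this dimension (bit) permutation in Python follows. The helper names and the use of a seeded shuffle to pick the order are assumptions for illustration; any fixed reordering of the bit positions works the same way.

```python
import random

def make_permutation(nbits, seed):
    """Return a list where entry i is the new position of input bit i."""
    order = list(range(nbits))
    random.Random(seed).shuffle(order)
    return order

def permute_bits(x, order):
    # Move bit `src` of the input to position `dst` of the output.
    y = 0
    for src, dst in enumerate(order):
        y |= ((x >> src) & 1) << dst
    return y

def unpermute_bits(y, order):
    # Inverse mapping: move bit `dst` back to position `src`.
    x = 0
    for src, dst in enumerate(order):
        x |= ((y >> dst) & 1) << src
    return x
```

Because the bits are only moved around, the number of set bits never changes, so 0 and the all-ones value always map to themselves.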

The following paper gives you 4 or 5 mapping examples, giving you functions rather than building mapped sets: www.cs.auckland.ac.nz/~john-rugis/pdf/BijectiveMapping.pdf

If your goal is simply to get a seemingly random permutation of numbers of a roughly defined size, then there is another possible way: reduce the set of numbers to a prime number.
Then you can use a mapping of the form
f(i) = (i * a + b) % p
and if p is indeed a prime, this will be a bijection for all a != 0 and all b. It will look fairly random for larger a and b.
For example, in my case for which I stumbled on this question, I used 1073741789 as a prime for the range of numbers smaller than 1 << 30. That makes me lose only 35 numbers, which is fine in my case.
My encoding is then
((n + 173741789) * 507371178) % 1073741789
and the decoding is
(n * 233233408 + 1073741789 - 173741789) % 1073741789
Note that 507371178 * 233233408 % 1073741789 == 1, so those two numbers are inverses of each other in the field of integers modulo 1073741789 (you can find inverses in such fields with the extended Euclidean algorithm).
I chose a and b fairly arbitrarily, I merely made sure they are roughly half the size of p.
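The encoding and decoding above can be sketched in Python as follows. The constant names P, A and B are just labels chosen here for the numbers from the answer. Rather than hardcoding the inverse, this sketch lets Python compute it with the three-argument pow (Python 3.8 or later), which avoids copying a constant incorrectly.

```python
P = 1073741789          # prime just below 1 << 30
A = 507371178           # multiplier, must not be 0 modulo P
B = 173741789           # additive offset
A_INV = pow(A, -1, P)   # modular inverse of A, so A * A_INV % P == 1

def encode(n):
    """Map n in [0, P) to another value in [0, P)."""
    return ((n + B) * A) % P

def decode(m):
    """Inverse of encode."""
    return (m * A_INV + P - B) % P
```

In languages with fixed-width integers the product needs a 64-bit intermediate, since both factors can be close to 2^30.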

Apart from generating random lookup-tables, you can use a combination of functions:
XOR
symmetric bit permutation (for example shift 16 bits, or flip 0-31 to 31-0, or flip 0-3 to 3-0, 4-7 to 7-4, ...)
more?

Can you use a random generated lookup-table? As long as the random numbers in the table are unique, you get a bijective mapping. It's not symmetric, though.
One 16 GB lookup-table for all 32 bit values is probably not practical, but you could use two separate 16-bit lookup tables for the high-word and the low word.
PS: I think you can generate a symmetric bijective lookup table, if that's important. The algorithm would start with an empty LUT:
+----+ +----+
| 1 | -> | |
+----+ +----+
| 2 | -> | |
+----+ +----+
| 3 | -> | |
+----+ +----+
| 4 | -> | |
+----+ +----+
Pick the first element, assign it a random mapping. To make the mapping symmetric, assign the inverse, too:
+----+ +----+
| 1 | -> | 3 |
+----+ +----+
| 2 | -> | |
+----+ +----+
| 3 | -> | 1 |
+----+ +----+
| 4 | -> | |
+----+ +----+
Pick the next number, again assign a random mapping, but pick a number that's not been assigned yet. (i.e. in this case, don't pick 1 or 3). Repeat until the LUT is complete. This should generate a random bijective symmetric mapping.
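One way to sketch this in Python is shown below. It is a slight variation on the procedure described: instead of picking partners one at a time, it shuffles all entries and pairs up neighbours, which gives the same kind of table (every entry's partner maps back to it). The function names, and the combination of two 16-bit tables in apply_tables, are assumptions for illustration.

```python
import random

def random_involution(n, seed=None):
    """Build a table t of size n with t[t[i]] == i for every i."""
    rng = random.Random(seed)
    items = list(range(n))
    rng.shuffle(items)
    table = list(range(n))  # if n is odd, the leftover entry maps to itself
    for a, b in zip(items[0::2], items[1::2]):
        table[a], table[b] = b, a
    return table

def apply_tables(x, hi_table, lo_table):
    """Map a 32-bit value using one 16-bit table per half-word."""
    return (hi_table[(x >> 16) & 0xFFFF] << 16) | lo_table[x & 0xFFFF]
```

Since each table is its own inverse, calling apply_tables twice with the same tables returns the original value.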

Take a number, multiply it by 9, reverse the digits, divide by 9.
123 <> 1107 <> 7011 <> 779
256 <> 2304 <> 4032 <> 448
1028 <> 9252 <> 2529 <> 281
Should be obscure enough !!
Edit : it is not a bijection for 0 ending integer
900 <> 8100 <> 18 <> 2
2 <> 18 <> 81 <> 9
You can always add a specific rule like :
Take a number, divide it by 10 x times, multiply by 9, reverse the digits, divide by 9, multiply by 10^x.
And so
900 <> 9 <> 81 <> 18 <> 2 <> 200
200 <> 2 <> 18 <> 81 <> 9 <> 900
W00t it works !
Edit 2 : For more obscurity, you can add an arbitrary number at the start and subtract it at the end.
900 < +256 > 1156 < *9 > 10404 < invert > 40401 < /9 > 4489 < -256 > 4233
123 < +256 > 379 < *9 > 3411 < invert > 1143 < /9 > 127 < -256 > -129

Here is my simple idea:
You can move around the bits of the number, as PeterK proposed, but you can have a different permutation of bits for each number, and still be able to decipher it.
The cipher goes like this:
Treat the input number as an array of bits I[0..31], and the output as O[0..31].
Prepare an array K[0..63] of 64 randomly generated numbers. This will be your key.
Take the bit of the input number from the position determined by the first random number (I[K[0] mod 32]) and place it at the beginning of your result (O[0]). Now, to decide which bit to place at O[1], use the previously used bit. If it is 0, use K[1] to generate the position in I from which to take; if it is 1, use K[2] (which simply means skipping one random number).
Now this will not work well, as you may take the same bit twice. In order to avoid it, renumber the bits after each iteration, omitting the used bits. To generate the position from which to take O[1] use I[K[p] mod 31], where p is 1 or 2, depending on the bit O[0], as there are 31 bits left, numbered from 0 to 30.
To illustrate this, I'll give an example:
We have a 4-bit number, and 8 random numbers: 25, 5, 28, 19, 14, 20, 0, 18.
I: 0111 O: ____
_
25 mod 4 = 1, so we'll take bit whose position is 1 (counting from 0)
I: 0_11 O: 1___
_
We've just taken a bit of value 1, so we skip one random number and use 28. There are 3 bits left, so to count position we take 28 mod 3 = 1. We take the first (counting from 0) of the remaining bits:
I: 0__1 O: 11__
_
Again we skip one number, and take 14. 14 mod 2 = 0, so we take the 0th bit:
I: ___1 O: 110_
_
Now it doesn't matter, but the previous bit was 0, so we take 20. 20 mod 1 = 0:
I: ____ O: 1101
And this is it.
Deciphering such a number is easy, one just has to do the same things. The position at which to place the first bit of the code is known from the key, the next positions are determined by the previously inserted bits.
This obviously has all the disadvantages of anything which just moves the bits around (for example 0 becomes 0, and MAXINT becomes MAXINT), but it seems harder to find out how someone has encrypted the number without knowing the key, which has to be secret.
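The scheme can be sketched in Python like this. The function names and the convention that position 0 is the leftmost (most significant) bit are assumptions taken from the 4-bit example above; the key needs at least 2 * nbits - 1 entries so that skipping never runs past its end.

```python
def _to_bits(x, nbits):
    # Position 0 is the leftmost (most significant) bit, as in the example.
    return [(x >> (nbits - 1 - i)) & 1 for i in range(nbits)]

def _from_bits(bits):
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def encode(x, key, nbits=32):
    in_bits = _to_bits(x, nbits)
    remaining = list(range(nbits))   # input positions not yet used
    out_bits, k = [], 0
    for _ in range(nbits):
        idx = remaining.pop(key[k] % len(remaining))
        bit = in_bits[idx]
        out_bits.append(bit)
        k += 2 if bit else 1         # a 1 bit skips one key entry
    return _from_bits(out_bits)

def decode(y, key, nbits=32):
    remaining = list(range(nbits))
    in_bits, k = [0] * nbits, 0
    for bit in _to_bits(y, nbits):
        idx = remaining.pop(key[k] % len(remaining))
        in_bits[idx] = bit           # put the bit back where it came from
        k += 2 if bit else 1
    return _from_bits(in_bits)
```

With the key from the example, encode(0b0111, [25, 5, 28, 19, 14, 20, 0, 18], nbits=4) follows exactly the steps listed above and yields 0b1101.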

If you don't want to use proper cryptographic algorithms (perhaps for performance and complexity reasons) you can instead use a simpler cipher like the Vigenère cipher. This cipher was actually described as le chiffre indéchiffrable (French for 'the unbreakable cipher').
Here is a simple C# implementation that shifts values based on a corresponding key value:
void Main()
{
var clearText = Enumerable.Range(0, 10);
var key = new[] { 10, 20, Int32.MaxValue };
var cipherText = Encode(clearText, key);
var clearText2 = Decode(cipherText, key);
}
IEnumerable<Int32> Encode(IEnumerable<Int32> clearText, IList<Int32> key) {
return clearText.Select((i, n) => unchecked(i + key[n%key.Count]));
}
IEnumerable<Int32> Decode(IEnumerable<Int32> cipherText, IList<Int32> key) {
return cipherText.Select((i, n) => unchecked(i - key[n%key.Count]));
}
This algorithm does not create a big shift in the output when the input is changed slightly. However, you can use another bijective operation instead of addition to achieve that.

Draw a large circle on a large sheet of paper. Write all the integers from 0 to MAXINT clockwise from the top of the circle, equally spaced. Write all the integers from 0 to MININT anti-clockwise, equally spaced again. Observe that MININT is next to MAXINT at the bottom of the circle. Now make a duplicate of this figure on both sides of a piece of stiff card. Pin the stiff card to the circle through the centres of both. Pick an angle of rotation, any angle you like. Now you have a 1-1 mapping which meets some of your requirements, but is probably not obscure enough. Unpin the card, flip it around a diameter, any diameter. Repeat these steps (in any order) until you have a bijection you are happy with.
If you have been following closely it shouldn't be difficult to program this in your preferred language.
For Clarification following the comment: If you only rotate the card against the paper then the method is as simple as you complain. However, when you flip the card over the mapping is not equivalent to (x+m) mod MAXINT for any m. For example, if you leave the card unrotated and flip it around the diameter through 0 (which is at the top of the clock face) then 1 is mapped to -1, 2 to -2, and so forth. (x+m) mod MAXINT corresponds to rotations of the card only.

Split the number in two (16 most significant bits and 16 least significant bits) and consider the bits in the two 16-bit results as cards in two decks. Mix the decks forcing one into the other.
So if your initial number is b31,b30,...,b1,b0 you end up with b15,b31,b14,b30,...,b1,b17,b0,b16. It's fast and quick to implement, as is the inverse.
If you look at the decimal representation of the results, the series looks pretty obscure.
You can manually map 0 -> maxvalue and maxvalue -> 0 to avoid them mapping onto themselves.
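A minimal Python sketch of this interleaving is given below, with function names chosen here for illustration. Bit b_i of the low half lands at output position 2i + 1 and bit b_(16+i) of the high half at position 2i, which reproduces the order b15,b31,b14,b30,...,b0,b16.

```python
def shuffle_bits(x):
    """Interleave the low and high 16-bit halves of a 32-bit value."""
    y = 0
    for i in range(16):
        y |= ((x >> i) & 1) << (2 * i + 1)       # low-half bit b_i
        y |= ((x >> (16 + i)) & 1) << (2 * i)    # high-half bit b_(16+i)
    return y

def unshuffle_bits(y):
    """Inverse of shuffle_bits."""
    x = 0
    for i in range(16):
        x |= ((y >> (2 * i + 1)) & 1) << i
        x |= ((y >> (2 * i)) & 1) << (16 + i)
    return x
```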

Related

Implementation of Speck cipher

I am trying to implement the speck cipher as specified here: Speck Cipher. On page 18 of the document you can find some speck pseudo-code I want to implement.
It seems that I have a problem understanding the pseudo-code. As you can find there, x and y are plaintext words with length n. l[m-2],...l[0], k[0] are key words (as words, they have length n, right?). When you do the key expansion, you iterate for i from 0 to T-2, where T is the number of rounds (for example 34). However, I get an IndexOutOfBoundsException, because the array with the l's has only m-2 positions and not T-2.
Can someone clarify what the key expansion does and how?

Ah, I get where the confusion lies:
l[m-2],...l[0], k[0]
these are the input key words, in other words, they represent the key. These are not declarations of the size of the arrays, as you might expect if you're a developer.
Then the subkeys in array k should be derived, using array l for intermediate values.
According to the formulas, taking the largest i, i.e. i_max = T - 2 you get a highest index for array l of i_max + m - 1 = T - 2 + m - 1 = T + m - 3 and therefore a size of the array of one more: T + m - 2. The size of a zero-based array is always the index of the last element - plus one, after all.
Similarly, for subkey array k you get a highest index of i_max + 1, which is T - 2 + 1 or T - 1. Again, the size of the array is one more, so there are T elements in k. This makes a lot of sense if you require T round keys :)
Note that it seems possible to simply redo the subkey derivation for each round if you require a minimum of RAM. The entire l array doesn't seem necessary either. For software implementations that doesn't matter a single iota of course.
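To make the array sizes concrete, here is a sketch of the key expansion in Python. The recurrence and the default parameters (16-bit words, 22 rounds, rotation amounts 7 and 2, which are what I recall for Speck32/64) are written from memory of the specification and should be checked against the document before use; the function name is made up for this example.

```python
def speck_key_schedule(key_words, n=16, rounds=22, alpha=7, beta=2):
    """key_words is (l[m-2], ..., l[0], k[0]), in the order the spec writes it."""
    mask = (1 << n) - 1
    m = len(key_words)

    def ror(v, r):
        return ((v >> r) | (v << (n - r))) & mask

    def rol(v, r):
        return ((v << r) | (v >> (n - r))) & mask

    k = [key_words[-1]]                    # k[0]
    l = list(reversed(key_words[:-1]))     # l[0] .. l[m-2]
    for i in range(rounds - 1):            # i = 0 .. T-2
        # The two lists grow as we go: this appends l[i+m-1] and k[i+1].
        l.append(((k[i] + ror(l[i], alpha)) & mask) ^ i)
        k.append(rol(k[i], beta) ^ l[i + m - 1])
    return k, l
```

The point of the sketch is the sizes: the inputs only seed the first m-1 entries of l and the first entry of k, and the loop grows them to T + m - 2 and T entries respectively.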

F#: integer (%) integer - Is Calculated How?

So in my text book there is this example of a recursive function using f#
let rec gcd = function
| (0,n) -> n
| (m,n) -> gcd(n % m,m);;
with this function my text book gives the example by executing:
gcd(36,116);;
and since m = 36 and not 0, it of course goes for the second clause like this:
gcd(116 % 36,36)
gcd(8,36)
gcd(36 % 8,8)
gcd(4,8)
gcd(8 % 4,4)
gcd(0,4)
and now hits the first clause stating this entire thing is = 4.
What I don't get is this (%) percentage sign/operator, or whatever it is called in this connection. For instance, I don't get how
116 % 36 = 8
I have turned this so many times in my head now and I can't figure how this can turn into 8?
I know this is probably a silly question for those of you who knows this but I would very much appreciate your help the same.

% is a questionable version of modulo, which is the remainder of an integer division.
For positive operands, you can think of % as the remainder of the division. See for example Wikipedia on Euclidean Division. Consider 9 % 4: 4 fits into 9 twice. But two times four is only eight. Thus, there is a remainder of one.
If there are negative operands, % effectively ignores the signs to calculate the remainder and then uses the sign of the dividend as the sign of the result. This corresponds to the remainder of an integer division that rounds to zero, i.e. -2 / 3 = 0.
This is a mathematically unusual definition of division and remainder that has some bad properties. Normally, when calculating modulo n, adding or subtracting n on the input has no effect. Not so for this operator: 2 % 3 is not equal to (2 - 3) % 3.
I usually have the following defined to get useful remainders when there are negative operands:
/// Euclidean remainder, the proper modulo operation
let inline (%!) a b = (a % b + b) % b
So far, this operator was valid for all cases I have encountered where a modulo was needed, while the raw % repeatedly wasn't. For example:
When filling rows and columns from a single index, you could calculate rowNumber = index / nCols and colNumber = index % nCols. But if index and colNumber can be negative, this mapping becomes invalid, while Euclidean division and remainder remain valid.
If you want to normalize an angle to [0, 2pi), angle %! (2. * System.Math.PI) does the job, while the "normal" % might give you a headache.
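The difference between the two remainders can be sketched in Python. Python's own % already rounds toward negative infinity, so the truncating behaviour of F#'s % is reproduced by hand here; the function names are made up for this example, and euclid_rem mirrors the (a % b + b) % b definition above for a positive b.

```python
def trunc_rem(a, b):
    """Remainder of division that rounds toward zero (what F#'s % does)."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return a - b * q

def euclid_rem(a, b):
    """Non-negative remainder for positive b: (a % b + b) % b."""
    return trunc_rem(trunc_rem(a, b) + b, b)
```

For example, trunc_rem(2, 3) is 2 but trunc_rem(2 - 3, 3) is -1, whereas euclid_rem gives 2 in both cases.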

Because
116 / 36 = 3
116 - (3*36) = 8

Basically, the % operator, known as the modulo operator, divides one number by another and gives you the remainder that is left once it can't divide any further. Usually, the first time you use it is to check whether a number is even or odd, by doing something like this in F#
let firstUsageModulo = 55 %2 =0 // false because leaves 1 not 0
When it leaves 8 the first time, it means that 36 went into 116 three whole times (3 * 36 = 108) and 116 - 108 = 8 was left over.

Just to help you in future with similar problems: in IDEs such as Xamarin Studio and Visual Studio, if you hover the mouse cursor over an operator such as % you should get a tooltip, thus:
(Screenshot: tooltip for the modulo operator)
Even if you don't understand the tool tip directly, it'll give you something to google.

bit-shift operation in accelerometer code

I'm programming my Arduino micro controller and I found some code for accepting accelerometer sensor data for later use. I can understand all but the following code. I'd like to have some intuition as to what is happening but after all my searching and reading I can't wrap my head around what is going on and truly understand.
I have taken a class in C++ and we did very little with bitwise operations or bit shifting or whatever you'd like to call it. Let me try to explain what I think I understand and you can correct me where it is needed.
So:
I think we are storing a value in x, pretty sure in fact.
It appears that the data in array "buff", slot number 1, is being set to the datatype of integer.
The value in slot 1 is being bit shifted 8 places to the left.(does this point to buff slot 0?)
This new value is being compared to the data in buff slot 0 and if either bits are true then the bit in the data stored in x will also be true so, 0 and 1 = 1, 0 and 0 = 0 and 1 and 0 = 1 in the end stored value.
The code does this for all three axis: x, y, z but I'm not sure why...I need help. I want full understanding before I progress.
//each axis reading comes in 10 bit resolution, ie 2 bytes.
// Least Significant Byte first!!
//thus we are converting both bytes in to one int
x = (((int)buff[1]) << 8) | buff[0];
y = (((int)buff[3]) << 8) | buff[2];
z = (((int)buff[5]) << 8) | buff[4];

This code is being used to convert the raw accelerometer data (in an array of 6 bytes) into three 10-bit integer values. As the comment says, the data is LSB first. That is:
buff[0] // least significant 8 bits of x data
buff[1] // most significant 2 bits of x data
buff[2] // least significant 8 bits of y data
buff[3] // most significant 2 bits of y data
buff[4] // least significant 8 bits of z data
buff[5] // most significant 2 bits of z data
It's using bitwise operators to put the two parts together into a single variable. The (int) typecasts are unnecessary and (IMHO) confusing. This simplified expression:
x = (buff[1] << 8) | buff[0];
Takes the data in buff[1], and shifts it left 8 bits, and then puts the 8 bits from buff[0] in the space so created. Let's label the 10 bits a through j for example's sake:
buff[0] = cdefghij
buff[1] = 000000ab
Then:
buff[1] << 8 = ab00000000
And:
buff[1] << 8 | buff[0] = abcdefghij
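The same shift-and-OR step can be tried out in Python, where it is easy to print the intermediate values. The function name read_axes and the sample bytes are made up for illustration; the byte layout is the one listed above.

```python
def read_axes(buff):
    """Combine six bytes (LSB first per axis) into three integers."""
    x = (buff[1] << 8) | buff[0]
    y = (buff[3] << 8) | buff[2]
    z = (buff[5] << 8) | buff[4]
    return x, y, z

# Example: x has high bits 0b10 and low byte 0b11010110.
sample = bytes([0b11010110, 0b00000010, 0x00, 0x01, 0xFF, 0x03])
x, y, z = read_axes(sample)
print(bin(x))  # the two high bits followed by the eight low bits
```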

The value in slot 1 is being bit shifted 8 places to the left.(does this point to buff slot 0?)
Nah. Bitwise operators ain't pointer arithmetic, don't confuse the two. Shifting by N places to the left is (roughly) equivalent with multiplying by 2 to the Nth power (except some corner cases in C, but let's not talk about those yet).
This new value is being compared to the data in buff slot 0 and if either bits are true then the bit in the data stored in x will also be true
No. | is not the logical OR operator (that would be ||) but the bitwise OR one. All the code does is combining the two bytes in buff[0] and buff[1] into a single 2-byte integer, where buff[1] denotes the MSB of the number.

The device result is in 6 bytes and the bytes need to be rearranged into 3 integers (having values that can only take up 10 bits at most).
So the first two bytes look like this:
00: xxxx xxxx <- binary value
01: ???? ??xx
The ??? part isn't part of the result because the xxx part comprise the 10 bits. I guess the hardware is built in such a way that the ??? part is all zero bits.
To get this into a single integer variable, we need all 8 of the low bits plus the upper-order 2 bits, shifted left by 8 positions so they don't interfere with the low-order 8 bits. The bitwise OR (| - vertical bar) will join those two parts into a single integer that looks like this:
x: ???? ??xx xxxx xxxx <- binary value of a single 16 bit integer
Actually it doesn't matter how big the 'int' is (in bits) as the remaining bits (beyond that 16) will be zero in this case.

to expand and clarify the reply by Carl Norum.
The (int) typecast makes explicit what the shift needs: the source is a byte, and the bitshift must be performed in a type of at least 16 bits (an int) in order to shift by 8 bits and retain all the data before the OR operation is executed and the result saved into x. In C and C++ a byte operand is promoted to int before the shift anyway, so the cast documents this rather than changing the result.
What the code is not telling you is if this should be an unsigned int or if there is a sign in the bit data. I'd expect -ve data is possible with an Accelerometer.

How to efficiently convert a few bytes into an integer between a range?

I'm writing something that reads bytes (just a List<int>) from a remote random number generation source that is extremely slow. For that and my personal requirements, I want to retrieve as few bytes from the source as possible.
Now I am trying to implement a method which signature looks like:
int getRandomInteger(int min, int max)
I have two theories how I can fetch bytes from my random source, and convert them to an integer.
Approach #1 is naive. Fetch (max - min) / 256 bytes and add them up. It works, but it's going to fetch a lot of bytes from the slow random number generator source I have. For example, if I want to get a random integer between zero and a million, it's going to fetch almost 4000 bytes... that's unacceptable.
Approach #2 sounds ideal to me, but I'm unable come up with the algorithm. it goes like this:
Lets take min: 0, max: 1000 as an example.
Calculate ceil(rangeSize / 256) which in this case is ceil(1000 / 256) = 4. Now fetch one (1) byte from the source.
Scale this one byte from the 0-255 range to 0-3 range (or 1-4) and let it determine which group we use. E.g. if the byte was 250, we would choose the 4th group (which represents the last 250 numbers, 750-1000 in our range).
Now fetch another byte and scale from 0-255 to 0-250 and let that determine the position within the group we have. So if this second byte is e.g. 120, then our final integer is 750 + 120 = 870.
In that scenario we only needed to fetch 2 bytes in total. However, it's much more complex as if our range is 0-1000000 we need several "groups".
How do I implement something like this? I'm okay with Java/C#/JavaScript code or pseudo code.
I'd also like the result not to lose entropy/randomness, so I'm slightly worried about scaling integers.

Unfortunately your Approach #1 is broken. For example if min is 0 and max 510, you'd add 2 bytes. There is only one way to get a 0 result: both bytes zero. The chance of this is (1/256)^2. However there are many ways to get other values, say 100 = 100+0, 99+1, 98+2... So the chance of a 100 is much larger: 101(1/256)^2.
The more-or-less standard way to do what you want is to:
Let R = max - min + 1 -- the number of possible random output values
Let N = 2^k >= mR, m>=1 -- a power of 2 at least as big as some multiple of R that you choose.
loop
b = a random integer in 0..N-1 formed from k random bits
while b >= mR -- reject b values that would bias the output
return min + floor(b/m)
This is called the method of rejection. It throws away randomly selected binary numbers that would bias the output. If max-min+1 happens to be a power of 2, then you'll have zero rejections.
If you have m=1 and max-min+1 is just one more than a biggish power of 2, then rejections will be near half. In this case you'd definitely want bigger m.
In general, bigger m values lead to fewer rejections, but of course they require slightly more bits per number. There is a probabilistically optimal algorithm to pick m.
Some of the other solutions presented here have problems, but I'm sorry right now I don't have time to comment. Maybe in a couple of days if there is interest.
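The rejection method described above can be sketched in Python as follows. The function name and the next_byte callback (a stand-in for the slow random source, returning one value in 0..255 per call) are assumptions for illustration. k is chosen as the smallest exponent with 2^k >= mR, and whole bytes are fetched with the surplus bits masked off.

```python
def get_random_integer(lo, hi, next_byte, m=1):
    """Return a uniform integer in [lo, hi] using bytes from next_byte()."""
    r = hi - lo + 1                 # R: number of possible outputs
    limit = m * r                   # accept b only if b < mR
    k = (limit - 1).bit_length()    # smallest k with 2**k >= limit
    nbytes = (k + 7) // 8
    while True:
        b = 0
        for _ in range(nbytes):
            b = (b << 8) | next_byte()
        b &= (1 << k) - 1           # keep exactly k random bits
        if b < limit:               # otherwise reject and draw again
            return lo + b // m
```

For the range 0..1000 this uses k = 10 bits, so two bytes per attempt, and an attempt is rejected with probability 23/1024.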

3 bytes (together) give you random integer in range 0..16777215. You can use 20 bits from this value to get range 0..1048575 and throw away values > 1000000

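range 1 to r
256^a >= r
first find 'a'
get 'a' number of bytes into array A[]
num=0
for i=0 to len(A)-1
  num+=(A[i]*256^i)   (i.e. A[i] shifted left by 8*i bits)
next
random number = (num mod r) + 1
(note: the mod step is slightly biased unless 256^a is a multiple of r)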

Your random source gives you 8 random bits per call. For an integer in the range [min,max] you would need ceil(log2(max-min+1)) bits.
Assume that you can get random bytes from the source using some function:
bool RandomBuf(BYTE* pBuf , size_t nLen); // fill buffer with nLen random bytes
Now you can use the following function to generate a random value in a given range:
// --------------------------------------------------------------------------
// produce a uniformly-distributed integral value in range [nMin, nMax]
// T is char/BYTE/short/WORD/int/UINT/LONGLONG/ULONGLONG
template <class T> T RandU(T nMin, T nMax)
{
    static_assert(std::numeric_limits<T>::is_integer, "RandU: integral type expected");
    if (nMin > nMax)
        std::swap(nMin, nMax);
    if (nMin == nMax) // single value: needs no random bits (and would shift by 64 below)
        return nMin;
    if (0 == (T)(nMax - nMin + 1)) // all range of type T
    {
        T nR;
        return RandomBuf((BYTE*)&nR, sizeof(T)) ? *(T*)&nR : nMin;
    }
    ULONGLONG nRange = (ULONGLONG)nMax - (ULONGLONG)nMin + 1; // number of discrete values
    UINT nRangeBits = (UINT)ceil(log((double)nRange) / log(2.)); // bits for storing nRange discrete values
    ULONGLONG nR;
    do
    {
        if (!RandomBuf((BYTE*)&nR, sizeof(nR)))
            return nMin;
        nR = nR >> ((sizeof(nR) << 3) - nRangeBits); // keep nRangeBits random bits
    }
    while (nR >= nRange); // ensure value in range [0..nRange-1]
    return nMin + (T)nR; // [nMin..nMax]
}
Since you are always getting a multiple of 8 bits, you can save extra bits between calls (for example you may need only 9 bits out of 16 bits). It requires some bit manipulation, and it is up to you to decide if it is worth the effort.
You can save even more, if you'll use 'half bits': Let's assume that you want to generate numbers in the range [1..5]. You'll need log2(5)=2.32 bits for each random value. Using 32 random bits you can actually generate floor(32/2.32)= 13 random values in this range, though it requires some additional effort.

Arbitrary-precision arithmetic Explanation

I'm trying to learn C and have come across the inability to work with REALLY big numbers (i.e., 100 digits, 1000 digits, etc.). I am aware that there exist libraries to do this, but I want to attempt to implement it myself.
I just want to know if anyone has or can provide a very detailed, dumbed down explanation of arbitrary-precision arithmetic.

It's all a matter of adequate storage and algorithms to treat numbers as smaller parts. Let's assume you have a compiler in which an int can only be 0 through 99 and you want to handle numbers up to 999999 (we'll only worry about positive numbers here to keep it simple).
You do that by giving each number three ints and using the same rules you (should have) learned back in primary school for addition, subtraction and the other basic operations.
In an arbitrary precision library, there's no fixed limit on the number of base types used to represent our numbers, just whatever memory can hold.
Addition for example: 123456 + 78:
12 34 56
      78
-- -- --
12 35 34
Working from the least significant end:
initial carry = 0.
56 + 78 + 0 carry = 134 = 34 with 1 carry
34 + 00 + 1 carry = 35 = 35 with 0 carry
12 + 00 + 0 carry = 12 = 12 with 0 carry
This is, in fact, how addition generally works at the bit level inside your CPU.
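The carry-propagating addition above can be sketched in Python. The representation (a list of base-100 "digits", least significant first) and the function name are choices made for this example, not the layout of any particular library.

```python
BASE = 100  # each list element holds a value from 0 through 99

def add(a, b):
    """Add two numbers stored as base-100 digit lists, least significant first."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        total = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        result.append(total % BASE)   # digit that stays in this position
        carry = total // BASE         # 0 or 1, passed to the next position
    if carry:
        result.append(carry)          # the sum grew by one digit
    return result
```

With this layout 123456 is [56, 34, 12] and 78 is [78], and adding them walks through exactly the three steps listed above.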
Subtraction is similar (using subtraction of the base type and borrow instead of carry), multiplication can be done with repeated additions (very slow) or cross-products (faster) and division is trickier but can be done by shifting and subtraction of the numbers involved (the long division you would have learned as a kid).
I've actually written libraries to do this sort of stuff using the largest power of ten whose square still fits into an integer, which prevents overflow when multiplying two ints together. For example, a 16-bit int is limited to 0 through 99, giving at most 9,801 (<32,768) when squared, and a 32-bit int uses 0 through 9,999, giving at most 99,980,001 (<2,147,483,648). This greatly eased the algorithms.
Some tricks to watch out for.
1/ When adding or multiplying numbers, pre-allocate the maximum space needed then reduce later if you find it's too much. For example, adding two 100-"digit" (where digit is an int) numbers will never give you more than 101 digits. Multiply a 12-digit number by a 3 digit number will never generate more than 15 digits (add the digit counts).
2/ For added speed, normalise (reduce the storage required for) the numbers only if absolutely necessary - my library had this as a separate call so the user can decide between speed and storage concerns.
3/ Addition of a positive and negative number is subtraction, and subtracting a negative number is the same as adding the equivalent positive. You can save quite a bit of code by having the add and subtract methods call each other after adjusting signs.
4/ Avoid subtracting big numbers from small ones since you invariably end up with numbers like:
         10
         11-
-- -- -- --
99 99 99 99 (and you still have a borrow).
Instead, subtract 10 from 11, then negate it:
11
10-
--
 1 (then negate to get -1).
Here are the comments (turned into text) from one of the libraries I had to do this for. The code itself is, unfortunately, copyrighted, but you may be able to pick out enough information to handle the four basic operations. Assume in the following that -a and -b represent negative numbers and a and b are zero or positive numbers.
For addition, if signs are different, use subtraction of the negation:
-a + b becomes b - a
a + -b becomes a - b
For subtraction, if signs are different, use addition of the negation:
a - -b becomes a + b
-a - b becomes -(a + b)
Also special handling to ensure we're subtracting small numbers from large:
small - big becomes -(big - small)
Multiplication uses entry-level math as follows:
475(a) x 32(b) = 475 x (30 + 2)
= 475 x 30 + 475 x 2
= 4750 x 3 + 475 x 2
= 4750 + 4750 + 4750 + 475 + 475
The way in which this is achieved involves extracting each of the digits of 32 one at a time (backwards) then using add to calculate a value to be added to the result (initially zero).
ShiftLeft and ShiftRight operations are used to quickly multiply or divide a LongInt by the wrap value (10 for "real" math). In the example above, we add 475 to zero 2 times (the last digit of 32) to get 950 (result = 0 + 950 = 950).
Then we left shift 475 to get 4750 and right shift 32 to get 3. Add 4750 to zero 3 times to get 14250 then add to result of 950 to get 15200.
Left shift 4750 to get 47500, right shift 3 to get 0. Since the right shifted 32 is now zero, we're finished and, in fact 475 x 32 does equal 15200.
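The shift-and-add loop can be sketched in Python as below. For brevity it uses plain Python integers to stand in for the LongInt type, with a *= 10 playing the role of ShiftLeft and b //= 10 the role of ShiftRight; it assumes both operands are non-negative, and the function name is made up for this example.

```python
def multiply(a, b):
    """Multiply non-negative integers using only addition and decimal shifts."""
    result = 0
    while b != 0:
        digit = b % 10          # last digit of the multiplier
        for _ in range(digit):  # add the shifted multiplicand `digit` times
            result += a
        a *= 10                 # ShiftLeft: 475 -> 4750
        b //= 10                # ShiftRight: 32 -> 3
    return result
```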
Division is also tricky but based on early arithmetic (the "gazinta" method for "goes into"). Consider the following long division for 12345 / 27:
         457
     +-------
  27 | 12345    27 is larger than 1 or 12 so we first use 123.
       108      27 goes into 123 4 times, 4 x 27 = 108, 123 - 108 = 15.
       ---
        154     Bring down 4.
        135     27 goes into 154 5 times, 5 x 27 = 135, 154 - 135 = 19.
        ---
         195    Bring down 5.
         189    27 goes into 195 7 times, 7 x 27 = 189, 195 - 189 = 6.
         ---
           6    Nothing more to bring down, so stop.
Therefore 12345 / 27 is 457 with remainder 6. Verify:
457 x 27 + 6
= 12339 + 6
= 12345
This is implemented by using a draw-down variable (initially zero) to bring down the segments of 12345 one at a time until it's greater or equal to 27.
Then we simply subtract 27 from that until we get below 27 - the number of subtractions is the segment added to the top line.
When there are no more segments to bring down, we have our result.
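A minimal Python sketch of the draw-down procedure follows. It works on decimal digits of ordinary Python integers rather than on a real multi-word type, assumes a non-negative dividend and a positive divisor, and uses a function name invented for this example.

```python
def long_divide(dividend, divisor):
    """Return (quotient, remainder) using draw-down and repeated subtraction."""
    quotient, draw = 0, 0
    for ch in str(dividend):
        draw = draw * 10 + int(ch)       # bring down the next digit
        count = 0
        while draw >= divisor:           # how many times does divisor "go into" it?
            draw -= divisor
            count += 1
        quotient = quotient * 10 + count # next digit on the top line
    return quotient, draw
```

Leading steps where the draw-down value is still below the divisor simply contribute a zero digit, which is why the example starts producing digits only at 123.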
Keep in mind these are pretty basic algorithms. There are far better ways to do complex arithmetic if your numbers are going to be particularly large. You can look into something like GNU Multiple Precision Arithmetic Library - it's substantially better and faster than my own libraries.
It does have the rather unfortunate misfeature in that it will simply exit if it runs out of memory (a rather fatal flaw for a general purpose library in my opinion) but, if you can look past that, it's pretty good at what it does.
If you cannot use it for licensing reasons (or because you don't want your application just exiting for no apparent reason), you could at least get the algorithms from there for integrating into your own code.
I've also found that the bods over at MPIR (a fork of GMP) are more amenable to discussions on potential changes - they seem a more developer-friendly bunch.

While re-inventing the wheel is extremely good for your personal edification and learning, it's also an extremely large task. I don't want to dissuade you, as it's an important exercise and one that I've done myself, but you should be aware that there are subtle and complex issues at work that larger packages address.
Take multiplication, for example. Naively, you might think of the 'schoolboy' method, i.e. write one number above the other, then do long multiplication as you learned in school. For example:
    123
  x  34
  -----
    492
 + 3690
 ------
   4182
However, this method is slow for large inputs: it is O(n^2), where n is the number of digits. Instead, modern bignum packages use either a discrete Fourier transform or a number-theoretic transform to turn this into an essentially O(n ln(n)) operation.
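To make the quadratic cost concrete, here is a minimal sketch of the schoolbook method on arrays of decimal digits. The layout (least significant digit first, one digit per uint8_t) and the name schoolbook_multiply are assumptions chosen for illustration, not how any particular library does it.

```c
#include <stddef.h>
#include <stdint.h>

/* Schoolbook multiplication of little-endian base-10 digit arrays.
   a has na digits, b has nb digits; out must have room for na + nb digits.
   The two nested loops are where the O(n^2) cost comes from. */
void schoolbook_multiply(const uint8_t *a, size_t na,
                         const uint8_t *b, size_t nb, uint8_t *out)
{
    for (size_t i = 0; i < na + nb; i++)
        out[i] = 0;

    for (size_t j = 0; j < nb; j++) {           /* one row per digit of b */
        unsigned carry = 0;
        for (size_t i = 0; i < na; i++) {
            unsigned t = out[i + j] + a[i] * b[j] + carry;
            out[i + j] = (uint8_t)(t % 10);     /* digit that stays in place */
            carry = t / 10;                     /* carry into the next column */
        }
        out[j + na] = (uint8_t)carry;           /* top digit of this row */
    }
}
```

With a = {3, 2, 1} and b = {4, 3} (that is, 123 and 34), out ends up as {2, 8, 1, 4, 0}, which reads as 4182 from the most significant end.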
And this is just for integers. When you get into more complicated functions on some type of real representation of number (log, sqrt, exp, etc.) things get even more complicated.
If you'd like some theoretical background, I highly recommend reading the first chapter of Yap's book, "Fundamental Problems of Algorithmic Algebra". As already mentioned, GMP is an excellent bignum library. For real numbers, I've used MPFR and liked it.

Don't reinvent the wheel: it might turn out to be square!
Use a third party library, such as GNU MP, that is tried and tested.

You do it in basically the same way you do with pencil and paper...
the number is represented in a buffer (array) able to take on an arbitrary size (which means using malloc and realloc) as needed
you implement basic arithmetic as much as possible using language-supported structures, and deal with carries and moving the radix point manually
you scour numerical analysis texts to find efficient algorithms for dealing with more complex functions
you only implement as much as you need.
Typically you will use as your basic unit of computation
bytes containing either 0-99 or 0-255
16-bit words containing either 0-9999 or 0-65535
32-bit words containing...
...
as dictated by your architecture.
The choice of binary or decimal base depends on your desire for maximum space efficiency, human readability, and the presence or absence of Binary Coded Decimal (BCD) math support on your chip.
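As one possible illustration of the buffer-plus-manual-carry idea, the sketch below stores a number as 16-bit words each holding 0-9999, least significant word first, and grows the buffer with realloc when a carry needs a new word. The BigNum type and both function names are hypothetical, and signs and radix points are left out.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical arbitrary-size unsigned integer: little-endian limbs,
   each a 16-bit word holding 0-9999 (base 10000). */
typedef struct {
    uint16_t *limbs;
    size_t    used;      /* limbs currently in use */
    size_t    capacity;  /* limbs allocated */
} BigNum;

/* Append a new most-significant limb, growing the buffer with realloc.
   Returns 0 on success, -1 if memory runs out (the number is unchanged). */
int bignum_push_limb(BigNum *n, uint16_t limb)
{
    if (n->used == n->capacity) {
        size_t new_cap = n->capacity ? n->capacity * 2 : 4;
        uint16_t *p = realloc(n->limbs, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        n->limbs = p;
        n->capacity = new_cap;
    }
    n->limbs[n->used++] = limb;
    return 0;
}

/* Add a small value (0-9999) in place, carrying manually between limbs. */
int bignum_add_small(BigNum *n, uint16_t value)
{
    uint32_t carry = value;
    for (size_t i = 0; i < n->used && carry != 0; i++) {
        uint32_t t = n->limbs[i] + carry;
        n->limbs[i] = (uint16_t)(t % 10000);
        carry = t / 10000;
    }
    return carry ? bignum_push_limb(n, (uint16_t)carry) : 0;
}
```

Starting from an empty BigNum (limbs set to NULL, used and capacity zero), pushing 9999 and then adding 1 leaves two limbs, 0 and 1, which is 10000. Reporting allocation failure through the return value lets the caller decide what to do instead of exiting.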

You can do it with high-school-level mathematics, though more advanced algorithms are used in practice. For example, to add two 1024-byte numbers:
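unsigned char first[1024], second[1024], result[1025];
unsigned char carry = 0;
unsigned int sum = 0;
/* Index 0 holds the least significant byte. */
for (size_t i = 0; i < 1024; i++)
{
    sum = first[i] + second[i] + carry;
    result[i] = (unsigned char)(sum & 0xFF); /* low 8 bits are this digit */
    carry = (unsigned char)(sum >> 8);       /* anything above them is the carry (0 or 1) */
}
result[1024] = carry; /* the final carry goes in the extra byte */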
The result has to be one place bigger than the operands so that adding the maximum values still fits. Look at this:
   9
 + 9
 ---
  18
TTMath is a great library if you want to learn. It is built using C++. The above example was a silly one, but this is how addition and subtraction are done in general!
A good reference about the subject is Computational complexity of mathematical operations. It tells you how much space is required for each operation you want to implement. For example, if you have two N-digit numbers, then you need 2N digits to store the result of multiplication.
As Mitch said, it is by far not an easy task to implement! I recommend you take a look at TTMath if you know C++.

One of the ultimate references (IMHO) is Knuth's TAOCP Volume II. It explains lots of algorithms for representing numbers and arithmetic operations on these representations.
@Book{Knuth:taocp:2,
  author = {Knuth, Donald E.},
  title = {The Art of Computer Programming},
  volume = {2: Seminumerical Algorithms, second edition},
  year = {1981},
  publisher = {\Range{Addison}{Wesley}},
  isbn = {0-201-03822-6},
}

Assuming that you wish to write big integer code yourself, this can be surprisingly simple to do, speaking as someone who did it recently (though in MATLAB). Here are a few of the tricks I used:
I stored each individual decimal digit as a double number. This makes many operations simple, especially output. While it does take up more storage than you might wish, memory is cheap here, and it makes multiplication very efficient if you can convolve a pair of vectors efficiently. Alternatively, you can store several decimal digits in a double, but beware then that convolution to do the multiplication can cause numerical problems on very large numbers.
Store a sign bit separately.
Addition of two numbers is mainly a matter of adding the digits, then checking for a carry at each step.
Multiplication of a pair of numbers is best done as convolution followed by a carry step, at least if you have a fast convolution code on tap.
Even when you store the numbers as a string of individual decimal digits, division (also mod/rem ops) can be done to gain roughly 13 decimal digits at a time in the result. This is much more efficient than a divide that works on only 1 decimal digit at a time.
To compute an integer power of an integer, compute the binary representation of the exponent. Then use repeated squaring operations to compute the powers as needed.
Many operations (factoring, primality tests, etc.) will benefit from a powermod operation. That is, when you compute mod(a^p,N), reduce the result mod N at each step of the exponentiation where p has been expressed in a binary form. Do not compute a^p first, and then try to reduce it mod N.
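The last two points (binary exponentiation and reducing mod N at every step) can be sketched in C as follows. To keep it self-contained, this version assumes the modulus fits in 32 bits so that every intermediate product fits in a uint64_t; a real big-integer powermod would use its own multiply and mod routines in the same loop. The name powermod is borrowed from the text, not from a specific library.

```c
#include <assert.h>
#include <stdint.h>

/* Compute (base^exp) mod m by repeated squaring, walking the binary
   representation of exp from the least significant bit and reducing
   mod m after every multiplication. Requires m != 0. */
uint32_t powermod(uint32_t base, uint64_t exp, uint32_t m)
{
    assert(m != 0);
    uint64_t result = 1 % m;          /* handles m == 1, where everything is 0 */
    uint64_t b = base % m;

    while (exp != 0) {
        if (exp & 1)                  /* this bit of the exponent is set */
            result = (result * b) % m;
        b = (b * b) % m;              /* square for the next bit */
        exp >>= 1;
    }
    return (uint32_t)result;
}
```

Because both operands are always below m, no product exceeds 64 bits, which is the small-scale analogue of never computing a^p in full before reducing.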

Here's a simple (naive) example I did in PHP.
I implemented "Add" and "Multiply" and used them for an exponent example.
http://adevsoft.com/simple-php-arbitrary-precision-integer-big-num-example/
Code snippet:
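// Add two big integers given as decimal strings
function ba($a, $b)
{
    if( $a === "0" ) return $b;
    else if( $b === "0") return $a;
    // Strip leading zeros (multi-digit input only), then split into
    // 9-digit chunks, least significant chunk first
    $aa = str_split(strrev(strlen($a)>1?ltrim($a,"0"):$a), 9);
    $bb = str_split(strrev(strlen($b)>1?ltrim($b,"0"):$b), 9);
    $rr = Array();
    $maxC = max(Array(count($aa), count($bb)));
    // Restore digit order inside each chunk and pad both arrays with
    // one extra chunk to receive the final carry
    $aa = array_pad(array_map("strrev", $aa),$maxC+1,"0");
    $bb = array_pad(array_map("strrev", $bb),$maxC+1,"0");
    for( $i=0; $i<=$maxC; $i++ )
    {
        $t = str_pad((string) ($aa[$i] + $bb[$i]), 9, "0", STR_PAD_LEFT);
        if( strlen($t) > 9 )
        {
            // Overflowed 9 digits: carry the leading digit into the next chunk
            $aa[$i+1] = ba($aa[$i+1], substr($t,0,1));
            $t = substr($t, 1);
        }
        array_unshift($rr, $t);
    }
    // Chunks are zero-padded, so the result may have leading zeros
    return implode($rr);
}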
