RSA exponent size - encryption

I am learning about the RSA algorithm. I perform the algorithm on very small prime numbers and use online Big Integer calculators to perform the encryption and decryption and everything works just fine.
My question is about the size of the exponent we create and when it comes to bigger numbers, it seems infeasible to calculate.
For example, the algorithm starts with picking two prime numbers p and q. You compute n = p × q and then the totient of n. Next you pick a number 'e' such that 1 < e < totient(n) and e is coprime to the totient.
Then to perform an encryption you take, say, the ASCII character 'A', which is 65, and raise it to the power of e (65^e).
The online big integer calculator started getting very slow and sluggish (over a minute to calculate) when e was bigger than about 100,000 (6 digits)
My question is then, for the working RSA algorithm, what size (number of digits) number does that algorithm pick?
One thought I had was it was possible the online calculator that I was using was not using the best method for exponents? This is the calculator I am using: http://www.javascripter.net/math/calculators/100digitbigintcalculator.htm

Let's say M is the modulus. Yes, you could first compute intermediate = 65^e and then compute intermediate mod M. But intermediate would be an enormous integer: if e equals 65537, its decimal representation contains 118813 digits.
BUT, thanks to a very basic modular arithmetic theorem,
(65^e) mod M = ((((65 mod M) * 65) mod M) * 65) mod M [...] (e times)
(the theorem states that in a quotient ring, the n-th power of the class of an element is the class of the n-th power of the element)
As you can see, this does not need any very big integer library, since after each arithmetic product, you use mod M that returns an integer between 0 and M-1. So, you only have to compute arithmetic products of integers less than M.
As an example, here is a simple shell script (bash) that computes 65^65537 mod 991*997. As you can see, no need to get a big number library:
#!/bin/bash
# set RSA parameters
m=65 # message to encode
M=$((991*997)) # modulus (both 991 and 997 are prime numbers)
e=65537 # public exponent (coprime with 990*996, thus compliant with RSA algorithm)
# compute (m^e) mod M by e successive multiply-and-reduce steps
# (a C-style loop is used because brace expansion such as {1..$e} does not expand variables in bash)
ret=1
for ((i = 0; i < e; i++))
do
    ret=$(((ret*m)%M))
done
# display the result
echo $ret
It immediately returns 784933, thus 65^65537 mod 991*997 = 784933
The biggest integer computed with your calculation method has 118813 digits, but the biggest integer handled by this shell script has 12 digits or fewer ((M-1)^2 has 12 digits).
According to these explanations, we can now answer your question:
My question is then, for the working RSA algorithm, what size (number of digits) number does that algorithm pick?
With the above explanations, you can see that the maximum number of digits in the decimal representation of the integers you have to manipulate is floor(1+log10((M-1)^2)), because you will, at most, compute a product of two integers between 0 and M-1.
Note that 1+log10((M-1)^2) = 1+2*log10(M-1) < 2+2*log10(M) = 2*(1+log10(M)). Also note that floor(1+log10(M)) is the number of digits of M.
Therefore, the number of digits your library has to handle correctly is at most twice the number of digits of the modulus (if you compute the exponentiation using integer multiplications the way explained here).
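As a rough check of this bound, here is a small Python sketch of the same multiply-then-reduce loop. Python is used only because its integers never overflow, and the function name `modexp_track` is made up for this illustration. It records the largest intermediate product ever formed so that its digit count can be compared with twice the digit count of M.

```python
def modexp_track(m, e, M):
    """Compute (m**e) % M with e multiply-and-reduce steps.

    Also returns the largest intermediate product that was formed,
    so the size bound discussed above can be inspected.
    """
    ret = 1
    largest = 0
    for _ in range(e):
        product = ret * m              # both factors are below M when m < M
        largest = max(largest, product)
        ret = product % M              # reduce back into the range 0..M-1
    return ret, largest


M = 991 * 997
result, largest = modexp_track(65, 65537, M)
# Print the result, the digit count of the largest intermediate, and the bound.
print(result, len(str(largest)), 2 * len(str(M)))
```

The loop still performs e multiplications; only the size of each operand is kept small.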

Related

Smallest perfect square divisible by all elements of an array (with large numbers)

Given an array A[] with n elements, the task is to find S mod (10^9+7), in which S is the smallest perfect square which is divisible by all the elements A[i] (1<=i<=n) of the given array.
So, the problem is very easy if the values of A[i] and n are small. But in this case, I don't know what to do when A[i] can be up to 10^7 and n can be up to 10^5. Any help would be appreciated!

The smallest integer X which is a multiple of all the A_i is called the least common multiple of the A_i. It's also true that every common multiple of the A_i is divisible by X. So S is divisible by X, or equivalently S is a multiple of X.
The LCM can be computed fairly efficiently by the algorithms mentioned in the wikipedia article, but remember our final goal is S, a perfect square, not X. Also, the size of X (and S) is likely to be enormous given the constraints in your problem.
Thus I think the correct approach is to use a modified Sieve of Eratosthenes (or just obtain from some online source a list of primes up to 3163) to completely factor all the A_i simultaneously into their prime power factorizations. Since the A_i <= 10^7 you need only include primes <= 10^3.5; whatever remains of an A_i after dividing out those primes is either 1 or a single larger prime. Now, with each A_i factored into its prime power factorization use the prime factorization method to find the LCM, but still retain this in prime power format, in other words don't yet multiply everything together. Next, scan through each of the powers and add 1 to any odd powers. Now you have the prime power factorization of S. Iterate through these prime powers, multiplying each one into the product and taking the product mod (10^9+7) at each step.
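The steps above could look roughly like the following Python sketch. It assumes every A_i is at most 10^7 and uses plain trial division by the primes up to 3163 rather than a modified sieve, so it shows the idea and is not tuned for n = 10^5. The names `primes_up_to` and `smallest_square_multiple_mod` are invented for this example.

```python
MOD = 10**9 + 7


def primes_up_to(limit):
    """Sieve of Eratosthenes returning all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit**0.5) + 1):
        if sieve[p]:
            # Cross out every multiple of p starting at p*p.
            sieve[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def smallest_square_multiple_mod(values, mod=MOD):
    """Smallest perfect square divisible by all values, reduced mod `mod`.

    Assumes every value is a positive integer <= 10**7.
    """
    primes = primes_up_to(3163)        # 3163 > sqrt(10**7)
    max_exp = {}                       # prime -> exponent in the LCM
    for a in values:
        for p in primes:
            if p * p > a:
                break
            count = 0
            while a % p == 0:
                a //= p
                count += 1
            if count > max_exp.get(p, 0):
                max_exp[p] = count
        if a > 1:                      # leftover is a single prime factor
            max_exp[a] = max(max_exp.get(a, 0), 1)
    result = 1
    for p, exp in max_exp.items():
        if exp % 2:
            exp += 1                   # round odd exponents up to even
        result = result * pow(p, exp, mod) % mod
    return result


print(smallest_square_multiple_mod([4, 6]))   # LCM is 12 = 2^2 * 3, so S = 36
```

For the full constraints, a sieve that stores the smallest prime factor of every number up to 10^7 would probably factor each A_i faster than trial division.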

Using MPFR And Adding - How many Digits are Correct?

I have a pretty easy question (I think). As much as I've tried, I can not find an answer to this question.
I am creating a function for which I want the user to enter two numbers. The first is the number of terms of a certain infinite series to add together. The second is the number of digits the user would like the truncated sum to be accurate to.
Say the terms of the sequence are a_i. How much precision n would be required in mpfr to guarantee that the sum of these a_i, from i=0 up to the user's entered value, is accurate to the number of digits the user needs?
By the way, I'm adding the a_i in a naive way.
Any help will be much appreciated.
Thanks,
Rick

You can convert between decimal digits of precision, d, and binary digits of precision, b, with logarithms
b = d × log(10) / log(2)
A little rearranging shows why
b × log(2) = d × log(10)
log(2^b) = log(10^d)
2^b = 10^d
Each term of the series (and each addition) will introduce a rounding error at the least significant digit so, assuming each of the t terms involves n (two argument) arithmetic operations, you will want to add an extra
log(t * (n+2))/log(2)
bits.
You'll need to round the number of bits of precision up to be sure that you have enough room for your decimal digits of precision
b = ceil((d*log(10.0) + log(t*(n+2)))/log(2.0));
Finally, you should be aware that the terms may introduce cancellation errors, in which case this simple calculation will dramatically underestimate the required number of bits, even assuming I've got it right in the first place ;-)
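To get a feel for the numbers, the formula can be evaluated with a few lines of Python. This is only the arithmetic from this answer; it does not call MPFR, and the function name `required_bits` is an assumption made for the example. It inherits the same caveat about cancellation.

```python
import math


def required_bits(d, t, n):
    """Bits of precision suggested by the formula above.

    d: decimal digits wanted in the result
    t: number of terms being summed
    n: two-argument arithmetic operations needed per term
    """
    return math.ceil((d * math.log(10.0) + math.log(t * (n + 2))) / math.log(2.0))


# 50 decimal digits, 1000 terms, 3 operations per term
print(required_bits(50, 1000, 3))
```

The returned value would then be passed as the precision when initialising the MPFR variables.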

How do computers evaluate huge numbers?

If I enter a value, for example
1234567 ^ 98787878
into Wolfram Alpha it can provide me with a number of details. This includes decimal approximation, total length, last digits etc. How do you evaluate such large numbers? As I understand it a programming language would have to have a special data type in order to store the number, let alone add it to something else. While I can see how one might approach the addition of two very large numbers, I can't see how huge numbers are evaluated.
10^2 could be calculated through repeated addition. However a number such as the example above would require a gigantic loop. Could someone explain how such large numbers are evaluated? Also, how could someone create a custom large datatype to support large numbers in C# for example?

Well, it's quite easy and you could have done it yourself.
Number of digits can be obtained via logarithm:
since `A^B = 10 ^ (B * log(A, 10))`
we can compute (A = 1234567; B = 98787878) in our case that
`B * log(A, 10) = 98787878 * log(1234567, 10) = 601767807.4709646...`
integer part + 1 (601767807 + 1 = 601767808) is the number of digits
First, say, five, digits can be gotten via logarithm as well;
now we should analyze fractional part of the
B * log(A, 10) = 98787878 * log(1234567, 10) = 601767807.4709646...
f = 0.4709646...
first digits are 10^f (decimal point removed) = 29577...
Last, say, five, digits can be obtained as a corresponding remainder:
last five digits = A^B rem 10^5
A rem 10^5 = 1234567 rem 10^5 = 34567
A^B rem 10^5 = ((A rem 10^5)^B) rem 10^5 = (34567^98787878) rem 10^5 = 45009
last five digits are 45009
You may find BigInteger.ModPow (C#) very useful here
Finally
1234567^98787878 = 29577...45009 (601767808 digits)
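The same three steps can be sketched in Python instead of C#. The helper name `describe_power` is hypothetical, and the `decimal` module is used with 40 significant digits (an arbitrary choice) so that the fractional part of B * log10(A) keeps enough precision.

```python
from decimal import Decimal, getcontext


def describe_power(a, b, k=5):
    """Return (number of digits, first k digits, last k digits) of a**b.

    The power itself is never computed in full.
    """
    getcontext().prec = 40                 # working precision, chosen arbitrarily
    x = Decimal(b) * Decimal(a).log10()    # log10(a**b)
    int_part = int(x)
    frac = x - int_part
    num_digits = int_part + 1
    mantissa = Decimal(10) ** frac         # a value in [1, 10)
    first = str(mantissa).replace(".", "")[:k]
    last = str(pow(a % 10**k, b, 10**k)).zfill(k)   # modular exponentiation
    return num_digits, first, last


print(describe_power(1234567, 98787878))
```

Python's three-argument `pow` plays the role of BigInteger.ModPow here.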

There are usually libraries providing a bignum datatype for arbitrarily large integers (e.g. mapping digits k*n...(k+1)*n-1, k=0..<some m depending on n and number magnitude> to a machine word of size n and redefining arithmetic operations). For C#, you might be interested in BigInteger.
Exponentiation can be recursively broken down:
pow(a,2*b) = pow(a,b) * pow(a,b);
pow(a,2*b+1) = pow(a,b) * pow(a,b) * a;
There are also number-theoretic results that have engendered special algorithms to determine properties of large numbers without actually computing them (to be precise: their full decimal expansion).
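The two identities for pow(a,2*b) and pow(a,2*b+1) translate almost directly into a recursive function. A minimal Python sketch follows; `fast_pow` is just an illustrative name, and Python's built-in integers stand in for the bignum type.

```python
def fast_pow(a, b):
    """Compute a**b for an integer b >= 0 by repeated squaring."""
    if b == 0:
        return 1
    half = fast_pow(a, b // 2)     # pow(a, b) where the exponent is 2*b or 2*b+1
    if b % 2 == 0:
        return half * half         # pow(a, 2*b)   = pow(a, b) * pow(a, b)
    return half * half * a         # pow(a, 2*b+1) = pow(a, b) * pow(a, b) * a


print(fast_pow(3, 13))
```

The recursion depth is about log2(b), so only that many squarings are needed instead of b multiplications.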

To compute how many digits there are, one uses the following expression:
decimal_digits(n) = 1 + floor(log_10(n))
This gives:
decimal_digits(1234567^98787878) = 1 + floor(log_10(1234567^98787878))
= 1 + floor(98787878 * log_10(1234567))
= 1 + floor(98787878 * 6.0915146640862625)
= 1 + floor(601767807.4709647)
= 601767808
The trailing k digits are computed by doing exponentiation mod 10^k, which keeps the intermediate results from ever getting too large.
The approximation will be computed using a (software) floating-point implementation that effectively evaluates a^(98787878 log_a(1234567)) to some fixed precision for some number a that makes the arithmetic work out nicely (typically 2 or e or 10). This also avoids the need to actually work with millions of digits at any point.

There are many libraries for this and the capability is built-in in the case of python. You seem primarily concerned with the size of such numbers and the time it may take to do computations like the exponent in your example. So I'll explain a bit.
Representation
You might use an array to hold all the digits of large numbers. A more efficient way would be to use an array of 32 bit unsigned integers and store "32 bit chunks" of the large number. You can think of these chunks as individual digits in a number system with 2^32 distinct digits or characters. I used an array of bytes to do this on an 8-bit Atari800 back in the day.
Doing math
You can obviously add two such numbers by looping over all the digits and adding elements of one array to the other and keeping track of carries. Once you know how to add, you can write code to do "manual" multiplication by multiplying digits and putting the results in the right place and a lot of addition - but software will do all this fairly quickly. There are faster multiplication algorithms than the one you would use manually on paper as well. Paper multiplication is O(n^2), whereas faster methods exist: Karatsuba is roughly O(n^1.585) and FFT-based methods approach O(n*log(n)). As for the exponent, you can of course multiply by the same number millions of times but each of those multiplications would be using the previously mentioned function for doing multiplication. There are faster ways to do exponentiation that require far fewer multiplies. For example you can compute x^16 by computing (((x^2)^2)^2)^2 which involves only 4 actual (large integer) multiplications.
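As an illustration of the representation and of addition with carries, here is a small Python sketch that stores a number as a list of 32-bit chunks, least significant first. The helper names are made up, and Python's own big integers are used only to convert to and from the chunk lists for checking.

```python
BASE = 2**32   # each list element is one "digit" in base 2^32


def to_limbs(value):
    """Split a non-negative integer into 32-bit chunks, least significant first."""
    limbs = []
    while value:
        limbs.append(value % BASE)
        value //= BASE
    return limbs or [0]


def from_limbs(limbs):
    """Rebuild the integer from its 32-bit chunks."""
    value = 0
    for limb in reversed(limbs):
        value = value * BASE + limb
    return value


def add_limbs(x, y):
    """Add two chunk lists the way one adds digit columns on paper."""
    result = []
    carry = 0
    for i in range(max(len(x), len(y))):
        total = carry
        if i < len(x):
            total += x[i]
        if i < len(y):
            total += y[i]
        result.append(total % BASE)   # the chunk that stays in this column
        carry = total // BASE         # 0 or 1, carried to the next column
    if carry:
        result.append(carry)
    return result


a, b = 2**100 + 12345, 2**70 - 1
print(from_limbs(add_limbs(to_limbs(a), to_limbs(b))) == a + b)
```

Multiplication can be built on the same layout by multiplying chunk pairs and accumulating the partial products with carries.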
In practice
It's fun and educational to try writing these functions yourself, but in practice you will want to use an existing library that has been optimized and verified.

I think a part of the answer is in the question itself :) To store these expressions, you can store the base (or mantissa), and exponent separately, like scientific notation goes. Extending to that, you cannot possibly evaluate the expression completely and store such large numbers, although, you can theoretically predict certain properties of the consequent expression. I will take you through each of the properties you talked about:
Decimal approximation: Can be calculated by evaluating simple log values.
Total number of digits for expression a^b, can be calculated by the formula
Digits = floor(1 + log10(a^b)) = floor(1 + b * log10(a)), where floor(x) is the largest integer not exceeding x. E.g. the number of digits in 10^5 is 6.
Last digits: These can be calculated from the fact that the last digits of successive powers repeat in a cycle. E.g. in the units place, 7, 9, 3, 1 repeats for 7^x, so if x%4 is 0, the last digit is 1.
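The repeating pattern in the units place is easy to see with a few lines of Python. The function name `last_digit_cycle` is made up for this illustration.

```python
def last_digit_cycle(base):
    """List the units digits of base^1, base^2, ... until they start repeating."""
    seen = []
    digit = base % 10
    while digit not in seen:
        seen.append(digit)
        digit = digit * base % 10   # units digit of the next power
    return seen


print(last_digit_cycle(7))   # [7, 9, 3, 1]
```

For more than one trailing digit, the same idea works modulo 10^k, and modular exponentiation avoids walking through the cycle step by step.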
Can someone create a custom datatype for large numbers, I can't say, but I am sure, the number won't be evaluated and stored.

Efficient Multiplication of Varying-Length #s [Conceptual]

EDIT
So it seems I "underestimated" what varying length numbers meant. I didn't even think about situations where the operands are 100 digits long. In that case, my proposed algorithm is definitely not efficient. I'd probably need an implementation whose complexity depends on the # of digits in each operand as opposed to its numerical value, right?
As suggested below, I will look into the Karatsuba algorithm...
Write the pseudocode of an algorithm that takes in two arbitrary length numbers (provided as strings), and computes the product of these numbers. Use an efficient procedure for multiplication of large numbers of arbitrary length. Analyze the efficiency of your algorithm.
I decided to take the (semi) easy way out and use the Russian Peasant Algorithm. It works like this:
a * b = a/2 * 2b if a is even
a * b = (a-1)/2 * 2b + b if a is odd
My pseudocode is:
rpa(x, y){
    if x is 1
        return y
    if x is even
        return rpa(x/2, 2y)
    if x is odd
        return rpa((x-1)/2, 2y) + y
}
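For reference, a runnable version of this pseudocode might look like the following Python sketch. It works on native integers rather than digit strings, and the x == 0 case is an addition that the pseudocode above does not have.

```python
def rpa(x, y):
    """Russian Peasant multiplication of non-negative integers x and y."""
    if x == 0:                      # extra base case, not in the pseudocode
        return 0
    if x == 1:
        return y
    if x % 2 == 0:
        return rpa(x // 2, 2 * y)
    return rpa((x - 1) // 2, 2 * y) + y


print(rpa(13, 17))   # 221
```

Each call halves x, so the number of recursive calls grows with the number of bits of x, but every halving, doubling and addition is itself an operation on a full-length number.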
I have 3 questions:
Is this efficient for arbitrary length numbers? I implemented it in C and tried varying length numbers. The run-time was near-instant in all cases so it's hard to tell empirically...
Can I apply the Master Theorem to understand the complexity...?
a = # subproblems in recursion = 1 (max 1 recursive call across all states)
n / b = size of each subproblem = n / 1 -> b = 1 (problem doesn't change size...?)
f(n^d) = work done outside recursive calls = 1 -> d = 0 (the addition when a is odd)
a = 1, b^d = 1, a = b^d -> complexity is in n^d*log(n) = log(n)
this makes sense logically since we are halving the problem at each step, right?
What might my professor mean by providing arbitrary length numbers "as strings". Why do that?
Many thanks in advance

What might my professor mean by providing arbitrary length numbers "as strings". Why do that?
This actually changes everything about the problem (and makes your algorithm incorrect).
It means that 1234 is provided as 1,2,3,4 and you cannot operate directly on the whole number. You need to analyze your algorithm in terms of #additions, #multiplications, #divisions.
You should expect a division to be a bit more expensive than a multiplication, and a multiplication to be a lot more expensive than an addition. So a good algorithm tries to reduce the number of divisions and multiplications.
Check out the Karatsuba algorithm (P.S. don't copy it, that's not what your teacher wants); it is one of the fastest for this specification.
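To make the "operate digit by digit" point concrete, here is a Python sketch of the plain schoolbook method on decimal strings. This is not Karatsuba: it performs one single-digit multiplication per pair of digits, so about len(a) * len(b) of them. The function name `multiply_strings` is an assumption for the example.

```python
def multiply_strings(a, b):
    """Multiply two non-negative decimal strings with the schoolbook method."""
    x = [int(ch) for ch in reversed(a)]   # digits, least significant first
    y = [int(ch) for ch in reversed(b)]
    result = [0] * (len(x) + len(y))
    for i, xi in enumerate(x):
        carry = 0
        for j, yj in enumerate(y):
            total = result[i + j] + xi * yj + carry
            result[i + j] = total % 10    # digit that stays in this column
            carry = total // 10           # carried into the next column
        result[i + len(y)] += carry
    while len(result) > 1 and result[-1] == 0:
        result.pop()                      # strip leading zeros
    return "".join(str(d) for d in reversed(result))


print(multiply_strings("1234", "5678"))
```

Counting the inner-loop iterations gives the O(n^2) cost for two n-digit inputs, which is the baseline Karatsuba improves on.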

Add 3): Native integers are limited in how large (or small) numbers they can represent (32- or 64-bit integers for example). To represent arbitrary length numbers you can choose strings, because then you are not really limited by this. The problem is then, of course, that your arithmetic units are not really made to add strings ;-)

Fast exponentiation when only first k digits are required?

This is actually for a programming contest, but I've tried really hard and haven't got even the faintest clue how to do this.
Find the first and last k digits of n^m where n and m can be very large ~ 10^9.
For the last k digits I implemented modular exponentiation.
For the first k I thought of using the binomial theorem up to certain powers but that involves quite a lot of computation for factorials and I'm not sure how to find an optimal point at which n^m can be expanded as (x+y)^m.
So is there any known method to find the first k digits without performing the entire calculation?
Update 1 <= k <= 9 and k will always be <= digits in n^m

Not sure, but the identity n^m = exp10(m log10(n)) = exp(q (m log(n)/q)) where q = log(10) comes to mind, along with the fact that the first K digits of exp10(x) = the first K digits of exp10(frac(x)) where frac(x) = the fractional part of x = x - floor(x).
To be more explicit: the first K digits of n^m are the first K digits of its mantissa = exp(frac(m log(n)/q) * q), where q = log(10).
Or you could even go further in this accounting exercise, and use exp((frac(m log(n)/q)-0.5) * q) * sqrt(10), which also has the same mantissa (+ hence same first K digits) so that the argument of the exp() function is centered around 0 (and between +/- 0.5 log 10 = 1.151) for speedy convergence.
(Some examples: suppose you wanted the first 5 digits of 2^100. This equals the first 5 digits of exp((frac(100 log(2)/q)-0.5)*q)*sqrt(10) = 1.267650600228226. The actual value of 2^100 is 1.267650600228229e+030 according to MATLAB, I don't have a bignum library handy. For the mantissa of 2^1,000,000,000 I get 4.612976044195602 but I don't really have a way of checking.... There's a page on Mersenne primes where someone's already done the hard work; 2^20996011 - 1 = 125,976,895,450... and my formula gives 1.259768950493908 calculated in MATLAB which fails after the 9th digit.)
I might use Taylor series (for exp and log, not for n^m) along with their error bounds, and keep adding terms until the error bounds drop below the first K digits. (normally I don't use Taylor series for function approximation -- their error is optimized to be most accurate around a single point, rather than over a desired interval -- but they do have the advantage that they're mathematically simple, and you can increase accuracy to arbitrary precision simply by adding additional terms)
For logarithms I'd use whatever your favorite approximation is.
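The centred formula can be tried out with ordinary double-precision floats in Python. This is only a sketch: with m near 10^9 the fractional part of m log(n)/q loses precision, so a double is not guaranteed to deliver all 9 leading digits, as the Mersenne example shows. The function name `leading_digits` is made up.

```python
import math


def leading_digits(n, m, k):
    """First k digits of n**m via exp((frac(m*log(n)/q) - 0.5) * q) * sqrt(10)."""
    q = math.log(10)
    x = m * math.log(n) / q            # log10(n**m)
    frac = x - math.floor(x)           # fractional part, in [0, 1)
    mantissa = math.exp((frac - 0.5) * q) * math.sqrt(10)   # in [1, 10)
    return int(mantissa * 10 ** (k - 1))


print(leading_digits(2, 100, 5), str(2**100)[:5])
```

For the full input range, the logarithm and exponential would have to be evaluated with more than double precision, for example with the series and error bounds suggested above.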

Well. We want to calculate a^b and get only its first n digits.
Calculate it by the following iterations:
You have a^{i+1} = a^i * a.
Calculate each a^i approximately rather than exactly.
The point is that the relative error of a^b is less
than b times the relative error of a.
You want the final relative error to be less than 10^{-n}.
Thus the relative error at each step may be 10^{-n}/b.
Remove the last digits at each step.
For example, a=2, b=16, n=1. Final relative error is 10^{-n} = 0,1.
Relative error on each step is 0,1/16 > 0,001.
Thus 3 digits is important on each step.
If n = 2, you must save 4 digits.
2 (1), 4 (2), 8 (3), 16 (4), 32 (5), 64 (6), 128 (7), 256 (8), 512 (9), 1024 (10) --> 102,
204 (11), 408 (12), 816 (13), 1632 (14) -> 163, 326 (15), 652 (16).
Answer: 6.
This algorithm has a complexity of O(b). But it is easy to change it to get
O(log b)
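The O(b) version of this procedure can be sketched in Python as follows. The number of working digits kept at each step, n + ceil(log10(b)), is an assumption read off the worked example (3 digits for n = 1, b = 16 and 4 digits for n = 2); the function name is invented. Because digits are truncated, results very close to a digit boundary could in principle come out wrong.

```python
import math


def first_digits_truncated(a, b, n):
    """Approximate the first n digits of a**b, keeping few digits per step."""
    keep = n + math.ceil(math.log10(b))   # working digits (assumed rule)
    limit = 10 ** keep
    value = a
    while value >= limit:
        value //= 10                      # drop trailing digits
    for _ in range(b - 1):
        value *= a
        while value >= limit:
            value //= 10                  # drop trailing digits again
    return int(str(value)[:n])


print(first_digits_truncated(2, 16, 1))   # 6, matching the example above
```

Replacing the loop with repeated squaring of the truncated value would give the O(log b) variant.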

Suppose you truncate at each step? Not sure how accurate this would be, but, e.g., take
n=11
m=some large number
and you want the first 2 digits.
recursively:
11 x 11 -> 121, truncate -> 12 (1 truncation or rounding)
then take truncated value and raise again
12 x 11 -> 132 truncate -> 13
repeat,
13 (132 truncated) x 11 -> 143.
...
and finally append as many 0's as the number of truncations you've done.

Have you taken a look at exponentiation by squaring? You might be able to modify one of the methods such that you only compute what's necessary.
In my last algorithms class we had to implement something similar to what you're doing and I vaguely remember that page being useful.
