Multiplication using FFT in integer rings

Multiplication using FFT in integer rings - math

I need to multiply long integer numbers with an arbitrary BASE of the digits using FFT in integer rings. Operands are always of length n = 2^k for some k, and the convolution vector has 2n components, therefore I need a 2n'th primitive root of unity.
I'm not particularly concerned with efficiency issues, so I don't want to use Strassen & Schönhage's algorithm - just computing basic convolution, then some carries, and that's nothing else.
Even though it seems simple to many mathematicians, my understanding of algebra is really bad, so I have lots of questions:
What are essential differences or nuances between performing the FFT in integer rings modulo 2^n + 1 (perhaps composite) and in integer FIELDS modulo some prime p?
I ask this because 2 is a (2n)th primitive root of unity in such a ring, because 2^n == -1 (mod 2^n+1). In contrast, integer field would require me to search for such a primitive root.
But maybe there are other nuances which will prevent me from using rings of such a form for the FFT.
If I picked integer rings, what are sufficient conditions for the existence of 2^n-th root of unity in this field?
All other 2^k-th roots of unity of smaller order could be obtained by squaring this root, right?..
What essential restrictions are imposed on the multiplication by the modulo of the ring? Maybe on their length, maybe on the numeric base, maybe even on the numeric types used for multiplication.
I suspect that there may be some loss of information if the coefficients of the convolution are reduced by the modulo operation. Is it true and why?.. What are general conditions that will allow me to avoid this?
Is there any possibility that just primitive-typed dynamic lists (i.e. long) will suffice for FFT vectors, their product and the convolution vector? Or should I transform the coefficients to BigInteger just in case (and what is the "case" when I really should)?
If a general answer to these question takes too long, I would be particularly satisfied by an answer under the following conditions. I've found a table of primitive roots of unity of order up to 2^30 in the field Z_70383776563201:
http://people.cis.ksu.edu/~rhowell/calculator/roots.html
So if I use 2^30th root of unity to multiply numbers of length 2^29, what are the precision/algorithmic/efficiency nuances I should consider?..
Thank you so much in advance!
I am going to award a bounty to the best answer - please consider helping out with some examples.

First, an arithmetic clue about your identity: 70383776563201 = 1 + 65550 * 2^30. And that long number is prime. There's a lot of insight into your modulus on the page How the FFT constants were found.
Here's a fact of group theory you should know. The multiplicative group of integers modulo N is the product of cyclic groups whose orders are determined by the prime factors of N. When N is prime, there's one cycle. The orders of the elements in such a cyclic group, however, are related to the prime factors of N - 1. 70383776563201 - 1 = 2^31 * 3^1 * 5^2 * 11 * 13, and the exponents give the possible orders of elements.
(1) You don't need a primitive root necessarily, you need one whose order is at least large enough. There are some probabilistic algorithms for finding elements of "high" order. They're used in cryptography for ensuring you have strong parameters for keying materials. For numbers of the form 2^n+1 specifically, they've received a lot of factoring attention and you can go look up the results.
(2) The sufficient (and necessary) condition for an element of order 2^n is illustrated by the example modulus. The condition is that some prime factor p of the modulus has to have the property that 2^n | p - 1.
(3) Loss of information only happens when elements aren't multiplicatively invertible, which isn't the case for the cyclic multiplicative group of a prime modulus. If you work in a modular ring with a composite modulus, some elements are not so invertible.
(4) If you want to use arrays of long, you'll be essentially rewriting your big-integer library.

Suppose we need to calculate two n-bit integer multiplication where
n = 2^30;
m = 2*n; p = 2^{n} + 1
Now,
w = 2, x =[w^0,w^1,...w^{m-1}] (mod p).
The issue, for each x[i], it will be too large and we cannot do w*a_i in O(1) time.

Related

Divisibility function in SML

I've been struggling with the basics of functional programming lately. I started writing small functions in SML, so far so good. Although, there is one problem I can not solve. It's on Project Euler (https://projecteuler.net/problem=5) and it simply asks for the smallest natural number that is divisible from all the numbers from 1 - n (where n is the argument of the function I'm trying to build).
Searching for the solution, I've found that through prime factorization, you analyze all the numbers from 1 to 10, and then keep the numbers where the highest power on a prime number occurs (after performing the prime factorization). Then you multiply them and you have your result (eg for n = 10, that number is 2520).
Can you help me on implementing this to an SML function?
Thank you for your time!

Since coding is not a spectator sport, it wouldn't be helpful for me to give you a complete working program; you'd have no way to learn from it. Instead, I'll show you how to get started, and start breaking down the pieces a bit.
Now, Mark Dickinson is right in his comments above that your proposed approach is neither the simplest nor the most efficient; nonetheless, it's quite workable, and plenty efficient enough to solve the Project Euler problem. (I tried it; the resulting program completed instantly.) So, I'll go with it.
To start with, if we're going to be operating on the prime decompositions of positive integers (that is: the results of factorizing them), we need to figure out how we're going to represent these decompositions. This isn't difficult, but it's very helpful to lay out all the details explicitly, so that when we write the functions that use them, we know exactly what assumptions we can make, what requirements we need to satisfy, and so on. (I can't tell you how many times I've seen code-writing attempts where different parts of the program disagree about what the data should look like, because the exact easiest form for one function to work with was a bit different from the exact easiest form for a different function to work with, and it was all done in an ad hoc way without really planning.)
You seem to have in mind an approach where a prime decomposition is a product of primes to the power of exponents: for example, 12 = 22 × 31. The simplest way to represent that in Standard ML is as a list of pairs: [(2,2),(3,1)]. But we should be a bit more precise than this; for example, we don't want 12 to sometimes be [(2,2),(3,1)] and sometimes [(3,1),(2,2)] and sometimes [(3,1),(5,0),(2,2)]. So, we can say something like "The prime decomposition of a positive integer is represented as a list of prime–exponent pairs, with the primes all being positive primes (2,3,5,7,…), the exponents all being positive integers (1,2,3,…), and the primes all being distinct and arranged in increasing order." This ensures a unique, easy-to-work-with representation. (N.B. 1 is represented by the empty list, nil.)
By the way, I should mention — when I tried this out, I found that everything was a little bit simpler if instead of storing exponents explicitly, I just repeated each prime the appropriate number of times, e.g. [2,2,3] for 12 = 2 × 2 × 3. (There was no single big complication with storing exponents explicitly, it just made a lot of little things a bit more finicky.) But the below breakdown is at a high level, and applies equally to either representation.
So, the overall algorithm is as follows:
Generate a list of the integers from 1 to 10, or 1 to 20.
This part is optional; you can just write the list by hand, if you want, so as to jump into the meatier part faster. But since your goal is to learn the basics of functional programming, you might as well do this using List.tabulate [documentation].
Use this to generate a list of the prime decompositions of these integers.
Specifically: you'll want to write a factorize or decompose function that takes a positive integer and returns its prime decomposition. You can then use map, a.k.a. List.map [documentation], to apply this function to each element of your list of integers.
Note that this decompose function will need to keep track of the "next" prime as it's factoring the integer. In some languages, you would use a mutable local variable for this; but in Standard ML, the normal approach is to write a recursive helper function with a parameter for this purpose. Specifically, you can write a function helper such that, if n and p are positive integers, p ≥ 2, where n is not divisible by any prime less than p, then helper n p is the prime decomposition of n. Then you just write
local
fun helper n p = ...
in
fun decompose n = helper n 2
end
Use this to generate the prime decomposition of the least common multiple of these integers.
To start with, you'll probably want to write a lcmTwoDecompositions function that takes a pair of prime decompositions, and computes the least common multiple (still in prime-decomposition form). (Writing this pairwise function is much, much easier than trying to create a multi-way least-common-multiple function from scratch.)
Using lcmTwoDecompositions, you can then use foldl or foldr, a.k.a. List.foldl or List.foldr [documentation], to create a function that takes a list of zero or more prime decompositions instead of just a pair. This makes use of the fact that the least common multiple of { n1, n2, …, nN } is lcm(n1, lcm(n2, lcm(…, lcm(nN, 1)…))). (This is a variant of what Mark Dickinson mentions above.)
Use this to compute the least common multiple of these integers.
This just requires a recompose function that takes a prime decomposition and computes the corresponding integer.

How to perform mathematical operations on large numbers

I have a question about working on very big numbers. I'm trying to run RSA algorithm and lets's pretend i have 512 bit number d and 1024 bit number n. decrypted_word = crypted_word^d mod n, isn't it? But those d and n are very large numbers! Non of standard variable types can handle my 512 bit numbers. Everywhere is written, that rsa needs 512 bit prime number at last, but how actually can i perform any mathematical operations on such a number?
And one more think. I can't use extra libraries. I generate my prime numbers with java, using BigInteger, but on my system, i have only basic variable types and STRING256 is the biggest.

Suppose your maximal integer size is 64 bit. Strings are not that useful for doing math in most languages, so disregard string types. Now choose an integer of half that size, i.e. 32 bit. An array of these can be interpreted as digits of a number in base 232. With these, you can do long addition and multiplication, just like you are used to with base 10 and pen and paper. In each elementary step, you combine two 32-bit quantities, to produce both a 32-bit result and possibly some carry. If you do the elementary operation in 64-bit arithmetic, you'll have both of these as part of a single 64-bit variable, which you'll then have to split into the 32-bit result digit (via bit mask or simple truncating cast) and the remaining carry (via bit shift).
Division is harder. But if the divisor is known, then you may get away with doing a division by constant using multiplication instead. Consider an example: division by 7. The inverse of 7 is 1/7=0.142857…. So you can multiply by that to obtain the same result. Obviously we don't want to do any floating point math here. But you can also simply multiply by 14286 then omit the last six digits of the result. This will be exactly the right result if your dividend is small enough. How small? Well, you compute x/7 as x*14286/100000, so the error will be x*(14286/100000 - 1/7)=x/350000 so you are on the safe side as long as x<350000. As long as the modulus in your RSA setup is known, i.e. as long as the key pair remains the same, you can use this approach to do integer division, and can also use that to compute the remainder. Remember to use base 232 instead of base 10, though, and check how many digits you need for the inverse constant.
There is an alternative you might want to consider, to do modulo reduction more easily, perhaps even if n is variable. Instead of expressing your remainders as numbers 0 through n-1, you could also use 21024-n through 21024-1. So if your initial number is smaller than 21024-n, you add n to convert to this new encoding. The benefit of this is that you can do the reduction step without performing any division at all. 21024 is equivalent to 21024-n in this setup, so an elementary modulo reduction would start by splitting some number into its lower 1024 bits and its higher rest. The higher rest will be right-shifted by 1024 bits (which is just a change in your array indexing), then multiplied by 21024-n and finally added to the lower part. You'll have to do this until you can be sure that the result has no more than 1024 bits. How often that is depends on n, so for fixed n you can precompute that (and for large n I'd expect it to be two reduction steps after addition but hree steps after multiplication, but please double-check that) whereas for variable n you'll have to check at runtime. At the very end, you can go back to the usual representation: if the result is not smaller than n, subtract n. All of this should work as described if n>2512. If not, i.e. if the top bit of your modulus is zero, then you might have to make further adjustments. Haven't thought this through, since I only used this approach for fixed moduli close to a power of two so far.
Now for that exponentiation. I very much suggest you do the binary approach for that. When computing xd, you start with x, x2=x*x, x4=x2*x2, x8=…, i.e. you compute all power-of-two exponents. You also maintain some intermediate result, which you initialize to one. In every step, if the corresponding bit is set in the exponent d, then you multiply the corresponding power into that intermediate result. So let's say you have d=11. Then you'd compute 1*x1*x2*x8 because d=11=1+2+8=10112. That way, you'll need only about 1024 multiplications max if your exponent has 512 bits. Half of them for the powers-of-two exponentiation, the other to combine the right powers of two. Every single multiplication in all of this should be immediately followed by a modulo reduction, to keep memory requirements low.
Note that the speed of the above exponentiation process will, in this simple form, depend on how many bits in d are actually set. So this might open up a side channel attack which might give an attacker access to information about d. But if you are worried about side channel attacks, then you really should have an expert develop your implementation, because I guess there might be more of those that I didn't think about.

You may write some macros you may execute under Microsoft for functions like +, -, x, /, modulo, x power y which work generally for any integer of less than ten or hundred thousand digits (the practical --not theoretical-- limit being the internal memory of your CPU). Please note the logic is exactly the same as the one you got at elementary school.
E.g.: p= 1819181918953471 divider of (2^8091) - 1, q = ((2^8091) - 1)/p, mod(2^8043 ; q ) = 23322504995859448929764248735216052746508873363163717902048355336760940697615990871589728765508813434665732804031928045448582775940475126837880519641309018668592622533434745187004918392715442874493425444385093718605461240482371261514886704075186619878194235490396202667733422641436251739877125473437191453772352527250063213916768204844936898278633350886662141141963562157184401647467451404036455043333801666890925659608198009284637923691723589801130623143981948238440635691182121543342187092677259674911744400973454032209502359935457437167937310250876002326101738107930637025183950650821770087660200075266862075383130669519130999029920527656234911392421991471757068187747362854148720728923205534341236146499449910896530359729077300366804846439225483086901484209333236595803263313219725469715699546041162923522784170350104589716544529751439438021914727772620391262534105599688603950923321008883179433474898034318285889129115556541479670761040388075352934137326883287245821888999474421001155721566547813970496809555996313854631137490774297564881901877687628176106771918206945434350873509679638109887831932279470631097604018939855788990542627072626049281784152807097659485238838560958316888238137237548590528450890328780080286844038796325101488977988549639523988002825055286469740227842388538751870971691617543141658142313059934326924867846151749777575279310394296562191530602817014549464614253886843832645946866466362950484629554258855714401785472987727841040805816224413657036499959117701249028435191327757276644272944743479296268749828927565559951441945143269656866355210310482235520220580213533425016298993903615753714343456014577479225435915031225863551911605117029393085632947373872635330181718820669836830147312948966028682960518225213960218867207825417830016281036121959384707391718333892849665248512802926601676251199711698978725399048954325887410317060400620412797240129787158839164969382498537742579233544463501470239575760940937130926062252501116458281610468726777710383038372260777522143500312913040987942762244940009811450966646527814576364565964518092955053720983465333258335601691477534154940549197873199633313223848155047098569827560014018412679602636286195283270106917742919383395056306107175539370483171915774381614222806960872813575048014729965930007408532959309197608469115633821869206793759322044599554551057140046156235152048507130125695763956991351137040435703946195318000567664233417843805257728.
The last step took about 0.1 sec.
wpjo (willibrord oomen on academia.edu)

Big-O running time for functions

Find the big-O running time for each of these functions:
T(n) = T(n - 2) + n²
Our Answers: n², n³
T(n) = 3T(n/2) + n
Our Answers: O(n log n), O(nlog₂3)
T(n) = 2T(n/3) + n
Our Answers: O(n log base 3 of n), O(n)
T(n) = 2T(n/2) + n^3
Our Answers: O(n³ log₂n), O(n³)
So we're having trouble deciding on the right answers for each of the questions.
We all got different results and would like an outside opinion on what the running time would be.
Thanks in advance.

A bit of clarification:
The functions in the questions appear to be running time functions as hinted by their T() name and their n parameter. A more subtle hint is the fact that they are all recursive and recursive functions are, alas, a common occurrence when one produces a function to describe the running time of an algorithm (even when the algorithm itself isn't formally using recursion). Indeed, recursive formulas are a rather inconvenient form and that is why we use the Big O notation to better summarize the behavior of an algorithm.
A running time function is a parametrized mathematical expression which allows computing a [sometimes approximate] relative value for the running time of an algorithm, given specific value(s) for the parameter(s). As is the case here, running time functions typically have a single parameter, often named n, and corresponding to the total number of items the algorithm is expected to work on/with (for e.g. with a search algorithm it could be the total number of records in a database, with a sort algorithm it could be the number of entries in the unsorted list and for a path finding algorithm, the number of nodes in the graph....). In some cases a running time function may have multiple arguments, for example, the performance of an algorithm performing some transformation on a graph may be bound to both the total number of nodes and the total number of vertices or the average number of connections between two nodes, etc.
The task at hand (for what appears to be homework, hence my partial answer), is therefore to find a Big O expression that qualifies the upper bound limit of each of running time functions, whatever the underlying algorithm they may correspond to. The task is not that of finding and qualifying an algorithm to produce the results of the functions (this second possibility is also a very common type of exercise in Algorithm classes of a CS cursus but is apparently not what is required here.)
The problem is therefore more one of mathematics than of Computer Science per se. Basically one needs to find the limit (or an approximation thereof) of each of these functions as n approaches infinity.
This note from Prof. Jeff Erikson at University of Illinois Urbana Champaign provides a good intro to solving recurrences.
Although there are a few shortcuts to solving recurrences, particularly if one has with a good command of calculus, a generic approach is to guess the answer and then to prove it by induction. Tools like Excel, a few snippets in a programming languages such as Python or also MATLAB or Sage can be useful to produce tables of the first few hundred values (or beyond) along with values such as n^2, n^3, n! as well as ratios of the terms of the function; these tables often provide enough insight into the function to find the closed form of the function.
A few hints regarding the answers listed in the question:
Function a)
O(n^2) is for sure wrong:
a quick inspection of the first few values in the sequence show that n^2 is increasingly much smaller than T(n)
O(n^3) on the other hand appears to be systematically bigger than T(n) as n grows towards big numbers. A closer look shows that O(n^3) is effectively the order of the Big O notation for this function, but that O(n^3 / 6) is a more precise notation which systematically exceed the value of T(n) [for bigger values of n, and/or as n tends towards infinity] but only by a minute fraction compared with the coarser n^3 estimate.
One can confirm that O(n^3 / 6) is it, by induction:
T(n) = T(n-2) + n^2 // (1) by definition
T(n) = n^3 / 6 // (2) our "guess"
T(n) = ((n - 2)^3 / 6) + n^2 // by substitution of T(n-2) by the (2) expression
= (n^3 - 2n^2 -4n^2 -8n + 4n - 8) / 6 + 6n^2 / 6
= (n^3 - 4n -8) / 6
= n^3/6 - 2n/3 - 4/3
~= n^3/6 // as n grows towards infinity, the 2n/3 and 4/3 factors
// become relatively insignificant, leaving us with the
// (n^3 / 6) limit expression, QED

RSA - bitlength of p and q

I'm just trying to understand the key generation part of RSA, and more specifically, selecting the p and q primes. Given a target bit length for the modulus, n, what range I should be generating p and q in?
The modulus, n, is the product of p and q, where p and q are both prime numbers. I've read that p and q should be relatively close to each other, and somewhere around sqrt(n). If the target bit length is, for example, 32 bits (very small I realise), then does that follow that p and q should be a random prime of a maximum 16 bits?
Thanks for any clarification
Rob

For a 32-bit modulus the question is a bit academic: your primary aim in choosing p and q is to make the product hard to factorize, but finding the prime factorisation of a number smaller than 2^32 is so easy that there's little point worrying about the sizes of p and q in this case. Note that the mathematics will work just fine so long as p and q are distinct primes.
For something more realistic, like a 1024-bit modulus, then yes, you're pretty safe choosing two 512-bit primes p and q at random: that is, choose p and q uniformly from the set of all primes in the range [2^511, 2^512]. There's a notion of 'strong primes', which are primes designed to avoid particular possible known attacks---for example, you'll see recommendations that p and q should be chosen so that p-1 and q-1 have large factors, to guard against easy factorizations using Pollard's 'p-1' algorithm. However, these recommendations don't really apply to large moduli and state-of-the-art factorization algorithms (GNFS, ECM). There are other possible cases that in theory could give an easy factorization, but they're so unlikely to turn up in practice from random choices of p and q that they're not worth worrying about.
Summary: just choose two random primes with equal bitlength, and you're done.
A couple of additional comments and things to think about:
Of course, if you do choose two 512-bit primes, you'll end up with either a 1023-bit or a 1024-bit modulus; that's probably not worth worrying about, but if you really cared about getting exactly a 1024-bit modulus you could either restrict the range of p and q further, say to [1.5 * 2^511, 2^512], or just throw out any 1023-bit modulus and try again.
Don't deliberately choose p and q so that they're near each other: if p and q are truly close to each other (e.g., less than 10^10 apart, say), then their product pq is easily factorized by Fermat's method. But if you're choosing random primes p and q in the range [2^511, 2^512], this isn't going to happen with any sort of realistic probability.
When choosing a prime at random, a tempting strategy is to pick a random (odd) integer in the range [2^511, 2^512] and then increment it until you find the first prime. But note that that does not give a uniform choice amongst all primes: primes occurring after a large gap would be more likely to come up than other primes. A better strategy is just to keep picking random odd numbers and keep the first one that's a prime (or more likely, a strong probable prime to so many randomly-chosen bases that you can be sure in practice that it's prime).
Make sure you've got a really good cryptographic source of random numbers on hand for your prime number generation.

As a programmer how would you explain imaginary numbers?

As a programmer I think it is my job to be good at math but I am having trouble getting my head round imaginary numbers. I have tried google and wikipedia with no luck so I am hoping a programmer can explain in to me, give me an example of a number squared that is <= 0, some example usage etc...

I guess this blog entry is one good explanation:
The key word is rotation (as opposed to direction for negative numbers, which are as stranger as imaginary number when you think of them: less than nothing ?)
Like negative numbers modeling flipping, imaginary numbers can model anything that rotates between two dimensions “X” and “Y”. Or anything with a cyclic, circular relationship

Problem: not only am I a programmer, I am a mathematician.
Solution: plow ahead anyway.
There's nothing really magical to complex numbers. The idea behind their inception is that there's something wrong with real numbers. If you've got an equation x^2 + 4, this is never zero, whereas x^2 - 2 is zero twice. So mathematicians got really angry and wanted there to always be zeroes with polynomials of degree at least one (wanted an "algebraically closed" field), and created some arbitrary number j such that j = sqrt(-1). All the rules sort of fall into place from there (though they are more accurately reorganized differently-- specifically, you formally can't actually say "hey this number is the square root of negative one"). If there's that number j, you can get multiples of j. And you can add real numbers to j, so then you've got complex numbers. The operations with complex numbers are similar to operations with binomials (deliberately so).
The real problem with complexes isn't in all this, but in the fact that you can't define a system whereby you can get the ordinary rules for less-than and greater-than. So really, you get to where you don't define it at all. It doesn't make sense in a two-dimensional space. So in all honesty, I can't actually answer "give me an exaple of a number squared that is <= 0", though "j" makes sense if you treat its square as a real number instead of a complex number.
As for uses, well, I personally used them most when working with fractals. The idea behind the mandelbrot fractal is that it's a way of graphing z = z^2 + c and its divergence along the real-imaginary axes.

You might also ask why do negative numbers exist? They exist because you want to represent solutions to certain equations like: x + 5 = 0. The same thing applies for imaginary numbers, you want to compactly represent solutions to equations of the form: x^2 + 1 = 0.
Here's one way I've seen them being used in practice. In EE you are often dealing with functions that are sine waves, or that can be decomposed into sine waves. (See for example Fourier Series).
Therefore, you will often see solutions to equations of the form:
f(t) = A*cos(wt)
Furthermore, often you want to represent functions that are shifted by some phase from this function. A 90 degree phase shift will give you a sin function.
g(t) = B*sin(wt)
You can get any arbitrary phase shift by combining these two functions (called inphase and quadrature components).
h(t) = Acos(wt) + iB*sin(wt)
The key here is that in a linear system: if f(t) and g(t) solve an equation, h(t) will also solve the same equation. So, now we have a generic solution to the equation h(t).
The nice thing about h(t) is that it can be written compactly as
h(t) = Cexp(wt+theta)
Using the fact that exp(iw) = cos(w)+i*sin(w).
There is really nothing extraordinarily deep about any of this. It is merely exploiting a mathematical identity to compactly represent a common solution to a wide variety of equations.

Well, for the programmer:
class complex {
public:
double real;
double imaginary;
complex(double a_real) : real(a_real), imaginary(0.0) { }
complex(double a_real, double a_imaginary) : real(a_real), imaginary(a_imaginary) { }
complex operator+(const complex &other) {
return complex(
real + other.real,
imaginary + other.imaginary);
}
complex operator*(const complex &other) {
return complex(
real*other.real - imaginary*other.imaginary,
real*other.imaginary + imaginary*other.real);
}
bool operator==(const complex &other) {
return (real == other.real) && (imaginary == other.imaginary);
}
};
That's basically all there is. Complex numbers are just pairs of real numbers, for which special overloads of +, * and == get defined. And these operations really just get defined like this. Then it turns out that these pairs of numbers with these operations fit in nicely with the rest of mathematics, so they get a special name.
They are not so much numbers like in "counting", but more like in "can be manipulated with +, -, *, ... and don't cause problems when mixed with 'conventional' numbers". They are important because they fill the holes left by real numbers, like that there's no number that has a square of -1. Now you have complex(0, 1) * complex(0, 1) == -1.0 which is a helpful notation, since you don't have to treat negative numbers specially anymore in these cases. (And, as it turns out, basically all other special cases are not needed anymore, when you use complex numbers)

If the question is "Do imaginary numbers exist?" or "How do imaginary numbers exist?" then it is not a question for a programmer. It might not even be a question for a mathematician, but rather a metaphysician or philosopher of mathematics, although a mathematician may feel the need to justify their existence in the field. It's useful to begin with a discussion of how numbers exist at all (quite a few mathematicians who have approached this question are Platonists, fyi). Some insist that imaginary numbers (as the early Whitehead did) are a practical convenience. But then, if imaginary numbers are merely a practical convenience, what does that say about mathematics? You can't just explain away imaginary numbers as a mere practical tool or a pair of real numbers without having to account for both pairs and the general consequences of them being "practical". Others insist in the existence of imaginary numbers, arguing that their non-existence would undermine physical theories that make heavy use of them (QM is knee-deep in complex Hilbert spaces). The problem is beyond the scope of this website, I believe.
If your question is much more down to earth e.g. how does one express imaginary numbers in software, then the answer above (a pair of reals, along with defined operations of them) is it.

I don't want to turn this site into math overflow, but for those who are interested: Check out "An Imaginary Tale: The Story of sqrt(-1)" by Paul J. Nahin. It talks about all the history and various applications of imaginary numbers in a fun and exciting way. That book is what made me decide to pursue a degree in mathematics when I read it 7 years ago (and I was thinking art). Great read!!

The main point is that you add numbers which you define to be solutions to quadratic equations like x2= -1. Name one solution to that equation i, the computation rules for i then follow from that equation.
This is similar to defining negative numbers as the solution of equations like 2 + x = 1 when you only knew positive numbers, or fractions as solutions to equations like 2x = 1 when you only knew integers.

It might be easiest to stop trying to understand how a number can be a square root of a negative number, and just carry on with the assumption that it is.
So (using the i as the square root of -1):
(3+5i)*(2-i)
= (3+5i)*2 + (3+5i)*(-i)
= 6 + 10i -3i - 5i * i
= 6 + (10 -3)*i - 5 * (-1)
= 6 + 7i + 5
= 11 + 7i
works according to the standard rules of maths (remembering that i squared equals -1 on line four).

An imaginary number is a real number multiplied by the imaginary unit i. i is defined as:
i == sqrt(-1)
So:
i * i == -1
Using this definition you can obtain the square root of a negative number like this:
sqrt(-3)
== sqrt(3 * -1)
== sqrt(3 * i * i) // Replace '-1' with 'i squared'
== sqrt(3) * i // Square root of 'i squared' is 'i' so move it out of sqrt()
And your final answer is the real number sqrt(3) multiplied by the imaginary unit i.

A short answer: Real numbers are one-dimensional, imaginary numbers add a second dimension to the equation and some weird stuff happens if you multiply...

If you're interested in finding a simple application and if you're familiar with matrices,
it's sometimes useful to use complex numbers to transform a perfectly real matrice into a triangular one in the complex space, and it makes computation on it a bit easier.
The result is of course perfectly real.

Great answers so far (really like Devin's!)
One more point:
One of the first uses of complex numbers (although they were not called that way at the time) was as an intermediate step in solving equations of the 3rd degree.
link
Again, this is purely an instrument that is used to answer real problems with real numbers having physical meaning.

In electrical engineering, the impedance Z of an inductor is jwL, where w = 2*pi*f (frequency) and j (sqrt(-1))means it leads by 90 degrees, while for a capacitor Z = 1/jwc = -j/wc which is -90deg/wc so that it lags a simple resistor by 90 deg.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex