Decimal expansions using lazy sequences in Clojure - math

Is there a package which represents decimal expansions in Clojure using lazy sequences?
For example, syntax like
(defn r `(B N x_1 x_2 x_3 ...))
could represent a real number r in base B, with decimal expansion (in math notation)
r = N . x_1 x_2 x_3 ...
with integer significand N and decimal digits 0 ≤ x_i ≤ B-1.
If the type were "smart" enough, it could handle different decimal expansions of real numbers as valid inputs, such as (10 0 9 9 9 ...) and (10 1), and consistently output decimal expansions in the latter form. It should also be able to handle overflowing digits, like reducing (10 0 15) to (10 1 5).
Is there any obstruction to working with a lazy-sequence representation of real numbers instead of the usual decimal expansion? I don't know how efficient it would be in contrast to floating-point, but it would be convenient for doing rigorous precise arithmetic involving real numbers. For example, I think there are algorithms which recursively compute the decimal expansions of π and e.

TL;DR
The short answer is no: there is no such library, and I doubt there ever will be one. It is possible to compute numbers to greater accuracy than IEEE double precision, but representing them as a sequence of single digits is immensely wasteful of memory and, in the general case, impossible to do entirely lazily. For instance, try to compute (+ '(0 9 0 ...) '(0 9 1 ...)) lazily, term by term: a carry produced by a later digit can change a digit that has already been emitted.
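The laziness obstruction can be made concrete with a small Python sketch. It assumes base 10 and two digit streams for numbers in [0, 1); the function name integer_part_of_sum and the max_digits cut-off are hypothetical, introduced only for illustration.

```python
from itertools import chain, islice, repeat

def integer_part_of_sum(xs, ys, max_digits):
    """Integer part of 0.x1x2x3... + 0.y1y2y3... (base 10), reading digits lazily.

    Returns 0 or 1 as soon as the digits read so far settle the question,
    or None if max_digits digit pairs were consumed without an answer.
    """
    for x, y in islice(zip(xs, ys), max_digits):
        pair = x + y
        if pair < 9:
            return 0   # even with an incoming carry this position cannot overflow
        if pair > 9:
            return 1   # this position carries, and every earlier pair summed to 9
        # pair == 9: the outcome depends entirely on digits not yet seen
    return None

# 0.33000... + 0.67000...: settled after two digits.
settled = integer_part_of_sum(chain([3, 3], repeat(0)), chain([6, 7], repeat(0)), 1000)

# 0.333... + 0.666...: every pair sums to 9, so no finite prefix decides it.
stuck = integer_part_of_sum(repeat(3), repeat(6), 1000)
```

However large max_digits is made, the second call never gets past the first output digit, which is the sense in which digit-by-digit addition cannot be fully lazy.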
The Long Version
When "computing" (approximating) the value of a real number or expression to machine precision, the operation computed is the Taylor series expansion of the desired expression to N terms, continuing until the value of the (N+1)th term is less than machine precision, at which point the approximation is stopped because the hardware format cannot represent any more information.
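As a rough illustration of that stopping rule, here is a Python sketch that sums the Taylor series of exp(x) until the next term falls below machine epsilon relative to the running total. The function name exp_taylor is hypothetical, and real math libraries use more refined methods than a plain Taylor sum.

```python
import math
import sys

def exp_taylor(x, eps=sys.float_info.epsilon):
    """Sum x**n / n! until the next term no longer affects a double-precision total."""
    total, term, n = 0.0, 1.0, 0
    while abs(term) > eps * abs(total):
        total += term
        n += 1
        term *= x / n   # turns x**(n-1)/(n-1)! into x**n/n!
    return total, n

approx_e, terms_used = exp_taylor(1.0)
difference = abs(approx_e - math.e)
```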
Typically you will only see the 32 and 64 bit IEEE floating point standards, however the IEEE floating point specification extends out to a whopping 128 bits of representation.
For the sake of argument, let's assume that someone extends clojure.core.math to have some representation arbitrary-precision-number, being a software floating point implementation against a backing ByteArray which through a protocol appears for all intents and purposes to be a normal java.lang.Number. All that this representation achieves is to push the machine epsilon (representational error limit) out even lower than the roughly 2.2e-16 (2^-52) offered by IEEE DOUBLE/64. Building such a software floating point system is entirely viable and relatively well explored. However I am not aware of a Java/Clojure implementation thereof.
True arbitrary precision is not possible because we have finite-memory machines to build upon; therefore at some point we must compromise between performance, memory and precision. Given some library which can correctly and generally represent an arbitrary Taylor series evaluation as a sequence of decimal digits, I claim that the overwhelming majority of operations on such arbitrary numbers will be truncated to some precision P, because sooner or later they must be compared against a fixed-precision representation such as a float or double, those being the industry standards for floating point representation.
To blow this well and truly out of the water, at a distance of 1 light-year an angular deviation of 1e-100 degrees would result in a navigational error of approximately 1.65117369558e-86 metres. This means that the existing machine epsilon of roughly 2.2e-16 with IEEE DOUBLE/64 is entirely acceptable even for interstellar navigation.
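The order of magnitude of that figure can be checked with a few lines of Python. This is a sketch using the small-angle approximation (error = distance x angle in radians); the rounded light-year constant is an assumption of the sketch, so the trailing digits differ slightly from the value quoted above.

```python
import math

LIGHT_YEAR_M = 9.4607e15             # metres per light-year, rounded (assumption)

angle_rad = math.radians(1e-100)     # angular deviation of 1e-100 degrees
error_m = LIGHT_YEAR_M * angle_rad   # small-angle approximation of the offset
```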
As you mentioned computing the decimal terms of Pi or other interesting series as a lazy sequence, here one could achieve headway only because the goal is the representation and investigation of a series/sequence rather than the addition, subtraction, multiplication and soforth between two or more such representations.

Related

Is it possible to remove the max value of a number

I have a mathematical formula which generates prime numbers. The numbers grow exponentially, and in 7 iterations the value hits inf and then nan.
Is there a way to remove those limits or is there a language that doesn't have limits?
Many languages such as Python 3 can handle arbitrarily large integers (limited only by RAM), so you can certainly play around with integers having thousands of digits. For example, it took less than a second to compute 10,000! = 284625968091705451890641321211... (with 35,660 digits hidden in the ...). In most languages, floating point numbers tend to be limited to what you can represent with 64 bits, though there are various libraries for arbitrary-precision floating point numbers. In no case can you exceed all limits.
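A short Python sketch of both behaviours: integers grow as needed, while 64-bit floats overflow to inf and then produce nan. The digit count is obtained with math.log10 rather than str(), because recent Python versions limit int-to-string conversion length by default.

```python
import math

n = math.factorial(10_000)                   # exact integer, limited only by memory
digit_count = int(math.log10(n)) + 1         # number of decimal digits
leading = n // 10 ** (digit_count - 15)      # first 15 decimal digits

overflowed = 1e308 * 10                      # exceeds the double range
not_a_number = overflowed - overflowed       # inf - inf has no meaningful value
```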
If you are using C or C++ the GNU MP Bignum Library allows you to do arbitrary precision integer and floating point arithmetic.

What determines which system is used to translate a base 10 number to decimal and vice-versa?

There are a lot of ways to store a given number in a computer. This site lists 5
unsigned
sign magnitude
one's complement
two's complement
biased (not commonly known)
I can think of another. Encode everything in Ascii and write the number with the negative sign (45) and period (46) if needed.
I'm not sure if I'm mixing apples and oranges but today I heard how computers store numbers using single and double precision floating point format. In this everything is written as a power of 2 multiplied by a fraction. This means numbers that aren't powers of 2 like 9 are written as a power of 2 multiplied by a fraction e.g. 9 ➞ 16*9/16. Is that correct?
Who decides which system is used? Is it up to the hardware of the computer or the program? How do computer algebra systems handle transcendental numbers like π on a finite machine? It seems like things would be a lot easier if everything's coded in Ascii and the negative sign and the decimal is placed accordingly e.g. -15.2 would be 45 49 53 46 50 (in base 10)
➞
101101 110001 110101 101110 110010
Well there are many questions here.
The main reason why the system you imagined is bad is the lack of entropy. An ASCII character is 8 bits, so instead of 2^32 possible integers, you could represent only 4 characters in 32 bits, so 10000 integer values (+ 1000 negative ones if you want). Even if you reduce to 12 codes (0-9, -, .) you still need 4 bits to store them. So, 10^8+10^7 integer values, still much less than 2^32 (remember, 2^10 ~ 10^3). Using binary is optimal, because our bits only have 2 values. Any base that is a power of 2 also makes sense, hence octal and hex -- but ultimately they're just binary with bits packed per 3 or 4 for readability. If you forget about the sign (just use one bit) and the decimal separator, you get BCD: Binary Coded Decimals, which are usually coded on 4 bits per digit, though a version on 8 bits called uncompressed BCD also seems to exist. I'm sure with a bit of research you can find fixed or floating point numbers using BCD.
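The counts in this argument are easy to reproduce. A minimal Python sketch, assuming a 32-bit word and counting non-negative integers only (the variable names are illustrative):

```python
import math

WORD_BITS = 32

ascii_values = 10 ** (WORD_BITS // 8)    # four 8-bit characters, digits only
nibble_values = 10 ** (WORD_BITS // 4)   # eight 4-bit digit codes (BCD style)
binary_values = 2 ** WORD_BITS           # every bit pattern is a distinct integer

bits_per_decimal_digit = math.log2(10)   # information actually carried by one digit
```

One decimal digit carries about 3.32 bits of information, so spending 4 or 8 bits on it leaves part of every code unused.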
Putting the sign in front is exactly sign magnitude (without the entropy problem, since it has a constant size of 1 bit).
You're roughly right on the fraction in floating point numbers. These numbers are written with a mantissa m and an exponent e, and their value is m x 2^e. If you represent an integer that way, say 8, it would be 1x2^3, so the fraction is 1 = 8/2^3. With 9 the fraction is 9/2^3 = 1.125, which is still exactly representable in binary. For a value such as 0.1 the fraction is not exactly representable, so instead we write the closest number we can with the available bits. That is what we do as well with irrational (and thus also transcendental) numbers like Pi: we approximate.
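The mantissa/exponent split can be inspected directly in Python. Note that math.frexp normalises the mantissa into [0.5, 1) rather than [1, 2), which matches the 9 = 16 x 9/16 form from the question; Fraction(float) gives the exact rational value that is actually stored.

```python
import math
from fractions import Fraction

mantissa, exponent = math.frexp(9.0)        # 9.0 == mantissa * 2**exponent

nine_is_exact = Fraction(9.0) == 9          # small integers are stored exactly
stored_tenth = Fraction(0.1)                # exact value of the double nearest 0.1
tenth_is_exact = stored_tenth == Fraction(1, 10)
```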
You're not solving anything with this system, even for floating point values. The denominator is going to be a power of 10 instead of a power of 2, which seems more natural to you, because it is the usual way we write rounded numbers, but is not in any way more valid or more accurate. ** Take 1/6 for example, you cannot represent it with a finite number of digits in the form a/10^b. *
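Whether a rational number has a finite expansion in a given base can be tested mechanically: a fraction in lowest terms terminates exactly when every prime factor of its denominator divides the base. A small Python sketch (the function name has_finite_expansion is hypothetical):

```python
from fractions import Fraction
from math import gcd

def has_finite_expansion(value: Fraction, base: int) -> bool:
    """True if value can be written as a / base**b for some integers a and b."""
    q = value.denominator            # Fraction keeps itself in lowest terms
    g = gcd(q, base)
    while g > 1:
        while q % g == 0:            # strip every factor shared with the base
            q //= g
        g = gcd(q, base)
    return q == 1

sixth_in_decimal = has_finite_expansion(Fraction(1, 6), 10)
tenth_in_decimal = has_finite_expansion(Fraction(1, 10), 10)
tenth_in_binary = has_finite_expansion(Fraction(1, 10), 2)
```

Base 10 terminates for a few more fractions than base 2 (such as 1/10), but 1/6 terminates in neither.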
The most popular representation for negative numbers is 2's complement, because of its nice properties when adding negative and positive numbers.
Standards committees (argue a lot internally and eventually) decide what complex number formats like floating points look like, and how to consistently treat corner cases. E.g. should dividing by 0 yield NaN ? Infinity ? An exception ? You should check out the IEEE : www.ieee.org . Some committees are not even agreeing yet, for example on how to represent intervals for interval arithmetic. Eventually it's the people who make the processors who get the final word on how bits are interpreted into a number. But sticking to standards allows for portability and compatibility between different processors (or coprocessors, what if your GPU used a different number format ? You'd have more to do than just copy data around).
Many alternatives to floating point values exist, like fixed point or arbitrary precision numbers, logarithmic number systems, rational arithmetic...
* Since 2 divides 10, you might argue that all the numbers representable by a/2^b can be a5^b/10^b, so less numbers need to be approximated. That only covers a minuscule family (an ideal, really) of the rational numbers, which are an infinite set of numbers. So it still doesn't solve the need for approximations for many rational, as well as all irrational numbers (as Pi).
** In fact, because of the fact that we use the powers of 2 we pack more significant digits after the decimal separator than we would with powers of 10 (for a same number of bits). That is, 2^-(53+e), the smallest bit of the mantissa of a double with exponent e, is much smaller than what you can reach with 53 bits of ASCII or 4-bit base 10 digits : at best 10^-4 * 2^-e

How to perform mathematical operations on large numbers

I have a question about working with very big numbers. I'm trying to run the RSA algorithm, and let's pretend I have a 512-bit number d and a 1024-bit number n. decrypted_word = crypted_word^d mod n, isn't it? But those d and n are very large numbers! None of the standard variable types can handle my 512-bit numbers. Everywhere it is written that RSA needs a 512-bit prime number at least, but how can I actually perform any mathematical operations on such a number?
And one more thing. I can't use extra libraries. I generate my prime numbers with Java, using BigInteger, but on my system I have only basic variable types, and STRING256 is the biggest.
Suppose your maximal integer size is 64 bit. Strings are not that useful for doing math in most languages, so disregard string types. Now choose an integer of half that size, i.e. 32 bit. An array of these can be interpreted as digits of a number in base 2^32. With these, you can do long addition and multiplication, just like you are used to with base 10 and pen and paper. In each elementary step, you combine two 32-bit quantities, to produce both a 32-bit result and possibly some carry. If you do the elementary operation in 64-bit arithmetic, you'll have both of these as part of a single 64-bit variable, which you'll then have to split into the 32-bit result digit (via bit mask or simple truncating cast) and the remaining carry (via bit shift).
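The schoolbook scheme can be sketched in Python as follows. Python integers are unbounded, so the 64-bit limit is only simulated here: every intermediate value t is kept below 2^64, which is what a port to fixed-width types would rely on. The helper names (to_limbs, from_limbs, add_limbs, mul_limbs) are hypothetical.

```python
BASE_BITS = 32
MASK = (1 << BASE_BITS) - 1

def to_limbs(n):
    """Split a non-negative integer into base-2**32 digits, least significant first."""
    limbs = []
    while n:
        limbs.append(n & MASK)
        n >>= BASE_BITS
    return limbs

def from_limbs(limbs):
    return sum(limb << (BASE_BITS * i) for i, limb in enumerate(limbs))

def add_limbs(a, b):
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        t = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(t & MASK)        # low 32 bits: the result digit
        carry = t >> BASE_BITS      # remaining bit: the carry
    if carry:
        out.append(carry)
    return out

def mul_limbs(a, b):
    out = [0] * (len(a) + len(b))
    for i, x in enumerate(a):
        carry = 0
        for j, y in enumerate(b):
            t = out[i + j] + x * y + carry   # at most 2**64 - 1, fits a 64-bit word
            out[i + j] = t & MASK
            carry = t >> BASE_BITS
        out[i + len(b)] = carry
    return out
```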
Division is harder. But if the divisor is known, then you may get away with doing a division by constant using multiplication instead. Consider an example: division by 7. The inverse of 7 is 1/7=0.142857…. So you can multiply by that to obtain the same result. Obviously we don't want to do any floating point math here. But you can also simply multiply by 14286 then omit the last five digits of the result. This will be exactly the right result if your dividend is small enough. How small? Well, you compute x/7 as x*14286/100000, so the error will be x*(14286/100000 - 1/7)=x/350000. The fractional part of x/7 can be as large as 6/7, so the truncated result is only guaranteed to be right while the error stays below 1/7, i.e. as long as x<50000. As long as the modulus in your RSA setup is known, i.e. as long as the key pair remains the same, you can use this approach to do integer division, and can also use that to compute the remainder. Remember to use base 2^32 instead of base 10, though, and check how many digits you need for the inverse constant.
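The safe range of such a reciprocal constant is worth checking by brute force rather than trusting a back-of-the-envelope bound. A small Python sketch for the divide-by-7 example (the function name div7_by_multiply is hypothetical):

```python
def div7_by_multiply(x):
    """Approximate x // 7 by multiplying with 14286 and dropping five decimal digits."""
    return (x * 14286) // 100_000

# Smallest dividend for which the shortcut disagrees with true integer division.
first_mismatch = next(x for x in range(10**6) if div7_by_multiply(x) != x // 7)
```

An exhaustive search puts the first mismatch at x = 50000, so the constant 14286 is only safe for dividends below that; a larger range needs a longer inverse constant.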
There is an alternative you might want to consider, to do modulo reduction more easily, perhaps even if n is variable. Instead of expressing your remainders as numbers 0 through n-1, you could also use 2^1024-n through 2^1024-1. So if your initial number is smaller than 2^1024-n, you add n to convert to this new encoding. The benefit of this is that you can do the reduction step without performing any division at all. 2^1024 is equivalent to 2^1024-n in this setup, so an elementary modulo reduction would start by splitting some number into its lower 1024 bits and its higher rest. The higher rest will be right-shifted by 1024 bits (which is just a change in your array indexing), then multiplied by 2^1024-n and finally added to the lower part. You'll have to do this until you can be sure that the result has no more than 1024 bits. How often that is depends on n, so for fixed n you can precompute that (and for large n I'd expect it to be two reduction steps after addition but three steps after multiplication, but please double-check that) whereas for variable n you'll have to check at runtime. At the very end, you can go back to the usual representation: if the result is not smaller than n, subtract n. All of this should work as described if n>2^512. If not, i.e. if the top bit of your modulus is zero, then you might have to make further adjustments. Haven't thought this through, since I only used this approach for fixed moduli close to a power of two so far.
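A simplified Python sketch of this division-free reduction follows. It uses the ordinary 0 to n-1 range for the final result instead of the shifted encoding described above, loops until the value fits in k bits rather than precomputing the number of steps, and is only efficient when n is close to 2^k. The name reduce_mod and the parameter k (the bit width, 1024 in the text) are hypothetical.

```python
def reduce_mod(value, n, k):
    """Compute value mod n for 0 < n < 2**k using only shifts, masks, multiply and add."""
    c = (1 << k) - n                  # 2**k is congruent to c modulo n
    mask = (1 << k) - 1
    while value >> k:                 # something is left above the low k bits
        low, high = value & mask, value >> k
        value = low + high * c        # same residue, strictly smaller value
    while value >= n:                 # back to the usual 0..n-1 representation
        value -= n
    return value

n = 2**64 - 59                        # example modulus close to a power of two
product = (n - 1) * (n - 2)
reduced = reduce_mod(product, n, 64)
```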
Now for that exponentiation. I very much suggest you do the binary approach for that. When computing x^d, you start with x, x^2=x*x, x^4=x^2*x^2, x^8=…, i.e. you compute all power-of-two exponents. You also maintain some intermediate result, which you initialize to one. In every step, if the corresponding bit is set in the exponent d, then you multiply the corresponding power into that intermediate result. So let's say you have d=11. Then you'd compute 1*x^1*x^2*x^8 because d=11=1+2+8=1011 in binary. That way, you'll need only about 1024 multiplications max if your exponent has 512 bits. Half of them for the powers-of-two exponentiation, the other to combine the right powers of two. Every single multiplication in all of this should be immediately followed by a modulo reduction, to keep memory requirements low.
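The binary method described here can be sketched in a few lines of Python. The % operator stands in for whichever modulo reduction is available on the target system, and the name mod_pow is hypothetical.

```python
def mod_pow(x, d, n):
    """Compute x**d mod n by square-and-multiply, reducing after every multiplication."""
    result = 1 % n
    power = x % n                     # holds x**(2**i) mod n for the current bit i
    while d > 0:
        if d & 1:                     # this bit of the exponent is set
            result = (result * power) % n
        power = (power * power) % n   # next power-of-two exponent
        d >>= 1
    return result

example = mod_pow(7, 11, 1000)        # 11 = 1 + 2 + 8, so uses x^1 * x^2 * x^8
```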
Note that the speed of the above exponentiation process will, in this simple form, depend on how many bits in d are actually set. So this might open up a side channel attack which might give an attacker access to information about d. But if you are worried about side channel attacks, then you really should have an expert develop your implementation, because I guess there might be more of those that I didn't think about.
You may write some macros you may execute under Microsoft for functions like +, -, x, /, modulo, x power y which work generally for any integer of less than ten or hundred thousand digits (the practical --not theoretical-- limit being the internal memory of your CPU). Please note the logic is exactly the same as the one you got at elementary school.
E.g.: p= 1819181918953471 divider of (2^8091) - 1, q = ((2^8091) - 1)/p, mod(2^8043 ; q ) = 233225049958594489297642487352160527465088733631637179020483553367609406976159908715897287655088134346657328040319280454485827759404751268378805196413090186685926225334347451870049183927154428744934254443850937186054612404823712615148867040751866198781942354903962026677334226414362517398771254734371914537723525272500632139167682048449368982786333508866621411419635621571844016474674514040364550433338016668909256596081980092846379236917235898011306231439819482384406356911821215433421870926772596749117444009734540322095023599354574371679373102508760023261017381079306370251839506508217700876602000752668620753831306695191309990299205276562349113924219914717570681877473628541487207289232055343412361464994499108965303597290773003668048464392254830869014842093332365958032633132197254697156995460411629235227841703501045897165445297514394380219147277726203912625341055996886039509233210088831794334748980343182858891291155565414796707610403880753529341373268832872458218889994744210011557215665478139704968095559963138546311374907742975648819018776876281761067719182069454343508735096796381098878319322794706310976040189398557889905426270726260492817841528070976594852388385609583168882381372375485905284508903287800802868440387963251014889779885496395239880028250552864697402278423885387518709716916175431416581423130599343269248678461517497775752793103942965621915306028170145494646142538868438326459468664663629504846295542588557144017854729877278410408058162244136570364999591177012490284351913277572766442729447434792962687498289275655599514419451432696568663552103104822355202205802135334250162989939036157537143434560145774792254359150312258635519116051170293930856329473738726353301817188206698368301473129489660286829605182252139602188672078254178300162810361219593847073917183338928496652485128029266016762511997116989787253990489543258874103170604006204127972401297871588391649693824985377425792
33544463501470239575760940937130926062252501116458281610468726777710383038372260777522143500312913040987942762244940009811450966646527814576364565964518092955053720983465333258335601691477534154940549197873199633313223848155047098569827560014018412679602636286195283270106917742919383395056306107175539370483171915774381614222806960872813575048014729965930007408532959309197608469115633821869206793759322044599554551057140046156235152048507130125695763956991351137040435703946195318000567664233417843805257728.
The last step took about 0.1 sec.
wpjo (willibrord oomen on academia.edu)

Truncating 64-bit IEEE doubles to 61-bits in a safe fashion

I am developing a programming language, September, which uses a tagged variant type as its main value type. 3 bits are used for the type (integer, string, object, exception, etc.), and 61 bits are used for the actual value (the actual integer, pointer to the object, etc.).
Soon, it will be time to add a float type to the language. I almost have the space for a 64-bit double, so I wanted to make use of doubles for calculations internally. Since I'm actually 3 bits short for storage, I would have to round the doubles off after each calculation - essentially resulting in a 61-bit double with a mantissa or exponent shorter by 3 bits.
But! I know floating point is fraught with peril and doing things which sound sensible on paper can produce disastrous results with FP math, so I have an open-ended question to the experts out there:
Is this approach viable at all? Will I run into serious error-accumulation problems in long-running calculations by rounding at each step? Is there some specific way in which I could do the rounding in order to avoid that? Are there any special values that I won't be able to treat that way (subnormals come to mind)?
Ideally, I would like my floats to be as well-behaved as a native 61-bit double would be.
I would recommend borrowing bits from the exponent field of the double-precision format. This is the method described in this article (that you would modify to borrow 3 bits from the exponent instead of 1). With this approach, all computations that do not use very large or very small intermediate results behave exactly as the original double-precision computation would. Even computations that run into the subnormal region of the new format behave exactly as they would if a 1+8+52 61-bit format had been standardized by IEEE.
By contrast, naively borrowing any number of bits at all from the significand introduces many double-rounding problems, which are all the more frequent when you are rounding from a 52-bit significand to a significand with only a few bits removed. Borrowing one bit from the significand as you suggest in an edit to your question would be the worst, with half the operations statistically producing double-rounded results that are different from what the ideal “native 61-bit double” would have produced. This means that instead of being accurate to 0.5ULP, the basic operations would be accurate to 3/4ULP, a dramatic loss of accuracy that would derail many of the existing, finely-designed numerical algorithms that expect 0.5ULP.
Three is a significant number of bits to borrow from an exponent that only has 11, though, and you could also consider using the single-precision 32-bit format in your language (calling the single-precision operations from the host).
Lastly, I give visibility here to another solution found by Jakub: borrow the three bits from the significand, and simulate round-to-odd for the intermediate double-precision computation before converting to the nearest number in 49-explicit-significand-bit, 11-exponent-bit format. If this way is chosen, it may be useful to remark that the rounding itself to 49 bits of significand can be achieved with the following operations:
if ((repr & 7) == 4)
    repr += (repr & 8) >> 1; /* midpoint case: round to even */
else
    repr += 4;
repr &= ~(uint64_t)7; /* round to the nearest */
Despite working on the integer having the same representation as the double being considered, the above snippet works even if the number goes from normal to subnormal, from subnormal to normal, or from normal to infinite. You will of course want to set a tag in the three bits that have been freed as above. To recover a standard double-precision number from its unboxed representation, simply clear the tag with repr &= ~(uint64_t)7;.
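For experimenting with the rounding step outside C, here is a Python port working on the 64-bit integer representation of a double. The helper names (double_to_bits, bits_to_double, round_low3) are hypothetical; the explicit 64-bit mask stands in for uint64_t wrap-around.

```python
import math
import struct

MASK64 = (1 << 64) - 1

def double_to_bits(x):
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def bits_to_double(bits):
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def round_low3(bits):
    """Round away the three lowest bits: to nearest, ties to even."""
    if (bits & 7) == 4:
        bits += (bits & 8) >> 1   # midpoint: go up only if the kept part is odd
    else:
        bits += 4                 # otherwise add half and truncate
    return bits & ~7 & MASK64

# The largest finite double rounds up into the bit pattern of +infinity.
rounded_max = bits_to_double(round_low3(double_to_bits(1.7976931348623157e308)))
```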
This is a summary of my own research and information from the excellent answer by @Pascal Cuoq.
There are two places where we can truncate the 3 bits we need: the exponent, and the mantissa (significand). Both approaches run into problems which have to be explicitly handled in order for the calculations to behave as if we used a hypothetical native 61-bit IEEE format.
Truncating the mantissa
We shorten the mantissa by 3 bits, resulting in a 1s+11e+49m format. When we do that, performing calculations in double-precision and then rounding after each computation exposes us to double rounding problems. Fortunately, double rounding can be avoided by using a special rounding mode (round-to-odd) for the intermediate computations. There is an academic paper describing the approach and proving its correctness for all doubles - as long as we truncate at least 2 bits.
Portable implementation in C99 is straightforward. Since round-to-odd is not one of the available rounding modes, we emulate it by using fesetround(FE_TOWARD_ZERO), and then setting the last bit if the FE_INEXACT exception occurs. After computing the final double this way, we simply round to nearest for storage.
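Why round-to-odd removes the double-rounding problem can be seen in a toy model. The Python sketch below treats exact results as plain non-negative integers and "precision" as the number of low bits dropped, which is an assumption of the sketch rather than a model of real doubles; the names rne and rodd are hypothetical.

```python
def rne(x, k):
    """Round x to a multiple of 2**k: to nearest, ties to even."""
    q, r = divmod(x, 1 << k)
    half = 1 << (k - 1)
    if r > half or (r == half and q & 1):
        q += 1
    return q << k

def rodd(x, k):
    """Truncate x to a multiple of 2**k; if anything was lost, force the last kept bit to 1."""
    q, r = divmod(x, 1 << k)
    if r:
        q |= 1
    return q << k

x = 65                                  # just above the midpoint 64 of the 128-grid
direct = rne(x, 7)                      # single rounding: 128
double_rounded = rne(rne(x, 4), 7)      # 65 -> 64 -> tie -> 0: wrong
via_round_to_odd = rne(rodd(x, 4), 7)   # 65 -> 80 -> 128: matches direct
```

The sticky odd bit records that the intermediate value was inexact, so the second rounding can never mistake it for an exact tie.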
The format of the resulting float loses about 1 significant (decimal) digit compared to a full 64-bit double (from 15-17 digits to 14-16).
Truncating the exponent
We take 3 bits from the exponent, resulting in a 1s+8e+52m format. This approach (applied to a hypothetical introduction of 63-bit floats in OCaml) is described in an article. Since we reduce the range, we have to handle out-of-range exponents on both the positive side (by simply 'rounding' them to infinity) and the negative side. Doing this correctly on the negative side requires biasing the inputs to any operation in order to ensure that we get subnormals in the 64-bit computation whenever the 61-bit result needs to be subnormal. This has to be done a bit differently for each operation, since what matters is not whether the operands are subnormal, but whether we expect the result to be (in 61-bit).
The resulting format has significantly reduced range since we borrow a whopping 3 out of 11 bits of the exponent. The range goes down from 10^-308...10^308 to about 10^-38...10^38. Seems OK for computation, but we still lose a lot.
Comparison
Both approaches yield a well-behaved 61-bit float. I'm personally leaning towards truncating the mantissa, for three reasons:
the "fix-up" operations for round-to-odd are simpler, do not differ from operation to operation, and can be done after the computation
there is a proof of mathematical correctness of this approach
giving up one significant digit seems less impactful than giving up a big chunk of the double's range
Still, for some uses, truncating the exponent might be more attractive (especially if we care more about precision than range).

Integer divide by Zero and Float (Real no.) divide by Zero

If I run following line of code, I get DIVIDE BY ZERO error
1. System.out.println(5/0);
which is the expected behavior.
Now I run the below line of code
2. System.out.println(5/0F);
here there is no DIVIDE BY ZERO error, rather it shows INFINITY
In the first line I am dividing two integers and in the second two real numbers.
Why does dividing by zero for integers give a DIVIDE BY ZERO error while in the case of real numbers it gives INFINITY?
I am sure it is not specific to any programming language.
(EDIT: The question has been changed a bit - it specifically referred to Java at one point.)
The integer types in Java don't have representations of infinity, "not a number" values etc - whereas IEEE-754 floating point types such as float and double do. It's as simple as that, really. It's not really a "real" vs "integer" difference - for example, BigDecimal represents real numbers too, but it doesn't have a representation of infinity either.
EDIT: Just to be clear, this is language/platform specific, in that you could create your own language/platform which worked differently. However, the underlying CPUs typically work the same way - so you'll find that many, many languages behave this way.
EDIT: In terms of motivation, bear in mind that for the infinity case in particular, there are ways of getting to infinity without dividing by zero - such as dividing by a very, very small floating point number. In the case of integers, there's obviously nothing between zero and one.
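Python makes a useful contrast here, since it is one of the languages that chooses differently from Java: float division by zero raises an exception, yet IEEE-754 infinity still appears through overflow or division by a very small number. A minimal sketch:

```python
import math

overflow = 1e308 * 10            # result too large for a double
tiny_divisor = 5.0 / 1e-320      # dividing by a very small (subnormal) number

try:
    5.0 / 0.0
    raised = False
except ZeroDivisionError:
    raised = True                # Python raises here instead of returning infinity
```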
Also bear in mind that the cases in which integers (or decimal floating point types) are used typically don't need the concept of infinity, or "not a number" results - whereas in scientific applications (where float/double are more typically useful), "infinity" (or at least, "a number which is too large to sensibly represent") is still a potentially valid result.
This is specific to one programming language or a family of languages. Not all languages allow integers and floats to be used in the same expression. Not all languages have both types (for example, ECMAScript implementations like JavaScript have no notion of an integer type externally). Not all languages have syntax like this to convert values inline.
However, there is an intrinsic difference between integer arithmetic and floating-point arithmetic. In integer arithmetic, you must define that division by zero is an error, because there are no values to represent the result. In floating-point arithmetic, specifically that defined in IEEE-754, there are additional values (combinations of sign bit, exponent and mantissa) for the mathematical concept of infinity and meta-concepts like NaN (not a number).
So we can assume that the / operator in this programming language is generic, that it performs integer division if both operands are of the language's integer type; and that it performs floating-point division if at least one of the operands is of a float type of the language, whereas the other operands would be implicitly converted to that float type for the purpose of the operation.
In real-number math, division of a number by a number close to zero is equivalent to multiplying the first number by a number whose absolute value is very large (x / (1 / y) = x * y). So it is reasonable that the result of dividing by zero should be (defined as) infinity, as the range of the floating-point value would be exceeded.
Implementation details are to be found in that programming language's specification.

Resources