Exact arithmetic operations with fractions in R

I want to implement (in the R language) an algorithm which starts with vectors of integers and then proceeds, step by step, to operate on them using only the arithmetic operations (+, -, * and /).
I'm concerned about the instability of division by small numbers, so I would like to work (within R's types) with exact fraction representations of all results. Then, when I want to see the value of a vector, I would like to see it as a vector of exact fractions, reflecting all the operations performed.
I have been searching the R documentation and packages, and I have found the packages "FRACTION" and "fractional". They do something related, but not exact arithmetic: they seem to work with floating-point numbers, fix some tolerance, and then turn those truncated values into a fractional representation.
So I am asking for help.
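For illustration, here is a minimal sketch of the desired behaviour using Python's standard fractions module (Python only because it makes the idea compact; in R itself, the gmp package's bigq rationals are reported to provide exact fraction arithmetic, though that is a pointer to check rather than something demonstrated here):
from fractions import Fraction

# Start from integer vectors; every later result stays an exact fraction.
a = [Fraction(x) for x in (1, 2, 3)]
b = [Fraction(x) for x in (7, 11, 13)]

# Element-wise division: no rounding, each entry is a reduced ratio.
q = [x / y for x, y in zip(a, b)]
print(q)       # [Fraction(1, 7), Fraction(2, 11), Fraction(3, 13)]
print(sum(q))  # 556/1001, still exact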

Related

Does the documentation or language definition for R remark on the intended usage of R's integers?

A common comment that I see made about R's integer type is that it's only really intended for communication with C code. Does any statement like this appear in any official part of R's documentation? I often catch myself making vectors like integer(10) under the impression that they'll be more efficient for my purposes, only to remember this folklore and reconsider whether I should ever be using integers for code that never tries to communicate with C code.
I don't think so. This folklore probably comes from the fact that R is pretty loose about typing and coercion, so it's easy to end up with a floating-point variable by accident.
Integer types certainly save memory:
> object.size(seq(1e8))
400000048 bytes
> object.size(seq(1e8)+0.1)
800000048 bytes
I haven't tried benchmarking to see if R uses faster routines for integer vs floating-point arithmetic, but you could.
I haven't looked carefully through all of R's documentation, but the only slightly relevant comment that turns up in a full-text search for "integer" in the R language definition is:
In most cases, the difference between an integer and a numeric value will be unimportant as R will do the right thing when using the numbers. There are, however, times when we would like to explicitly create an integer value for a constant. We can do this by calling the function as.integer or using various other techniques ...
I did a grep integer *.texi in the doc/manual directory of the R source tree and didn't (in a quick skim) notice anything else that looked relevant.
Following Ben Bolker's advice, I checked the seven R manuals. In addition to Ben's answer, I found the following:
For most purposes the user will not be concerned if the “numbers” in a numeric vector are integers, reals or even complex. Internally calculations are done as double precision real numbers, or double precision complex numbers if the input data are complex.
An Introduction to R Section 2.2
Writing R Extensions gives lots of guidance for making R communicate with C and Fortran, but it doesn't say anything about the intent of the integer type.
The last place to check is the Full Reference Manual. You would have to be mad to do so - the word "integer" occurs over 1000 times. However, a quick look at the index reveals the documentation for the integer class. This gives us the answer in such plain English that I should not be forgiven for having missed it:
Integer vectors exist so that data can be passed to C or Fortran code which expects them, and so that (small) integer data can be represented exactly and compactly.

Is there a way to display more than 25 digits when outputting in Scilab?

I'm working with Scilab 5.5.2 and when using the format command I can display at most 25 digits of a number. Is there a way to somehow display more than this number?
Scilab operates with double-precision floating-point numbers; it does not support variable-precision arithmetic. Double precision means a relative error of %eps, which is 2^-52, approximately 2e-16.
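As a quick cross-check (a Python aside, since any IEEE-754 double environment shares this machine epsilon):
import sys
print(sys.float_info.epsilon)   # 2.220446049250313e-16
print(2.0 ** -52)               # the same value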
This means you can't even get 25 correct decimal digits: when using format(25) you get garbage at the end. For example,
format(25); sqrt(3)
returns 1.732050807568877 1931766
I separated the last 7 digits here because they are wrong; the correct value of sqrt(3) begins with
1.732050807568877 2935274
Of course, if you don't mind the digits being wrong, you can have as many as you want:
strcat([sprintf('%.15f', sqrt(3)), "1111111111111111111111111111111"])
returns 1.7320508075688771111111111111111111111111111111.
But if you want arbitrary-precision real numbers, Scilab is not the right tool for the job (correction: phuclv pointed out the Multiple Precision Arithmetic Toolbox, which might work for you). Among free software packages, the mpmath Python library implements arbitrary-precision real arithmetic: it can be used directly or via SageMath or SymPy. Commercial packages (Matlab, Maple, Mathematica) support variable precision too.
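For instance, a short mpmath sketch reproducing the sqrt(3) example above with the precision the question asks for:
from mpmath import mp

mp.dps = 30        # thirty decimal digits of working precision
print(mp.sqrt(3))  # 1.7320508075688772935274... - matches the correct digits quoted above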
As for Scilab, I recommend using formatted print commands such as fprintf or sprintf, because they actually care about the output being meaningful. Example: printf('%.25f', sqrt(3)) returns
1.7320508075688772000000000
with garbage replaced by zeros. The last nonzero digit is still off by 1, but at least it's not meaningless.
Scilab uses the double-precision floating-point type, which has 53 bits of mantissa and is only precise to ~15-17 significant digits. There's no reason to print digits beyond that.
If 25 digits of accuracy are needed, then you can use a quadruple-precision or double-double arithmetic library such as the ATOMS Multiple Precision Arithmetic Toolbox.
If you need even more precision, then the only way is to use an arbitrary-precision library like mpscilab, Xnum, etc.

Division with really big numbers

I was just wondering what different strategies there are for division when dealing with big numbers. By big numbers, I mean ~50-digit numbers.
e.g.
9237639100273856744937827364095876289200667937278 / 8263744826271827396629934467882946252671
When both numbers are big, long division seems to lose its usefulness...
I thought one possibility would be to step through multiples of the divisor until you exceed the dividend, but if the dividend in the example above were divided by a small number, e.g. 4, that would be a huge number of calculations to do.
So, is there a simple, clean way to do this?
What language / platform do you use? This is most likely already solved, so you don't need to implement it from scratch. E.g. Haskell has the Integer type, Java the java.math.BigInteger class, .NET the System.Numerics.BigInteger structure, etc.
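For instance, in Python the example from the question needs nothing beyond the built-in int, which is arbitrary precision (a sketch, reusing the numbers above):
a = 9237639100273856744937827364095876289200667937278
b = 8263744826271827396629934467882946252671
q, r = divmod(a, b)    # exact integer quotient and remainder
print(q, r)
print(q * b + r == a)  # True: verifies the division identity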
If your question is really a theoretical one, I suggest you read Knuth, The Art of Computer Programming, Volume 2, Section 4.3.1. What you are looking for is called "Algorithm D" there. Here is a C implementation of that algorithm along with a short explanation:
http://hackers-delight.org.ua/059.htm
Long division is not very complicated if you are working with binary representations of your numbers, and it is probably the most efficient algorithm.
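To make that concrete, here is a hedged Python sketch of shift-and-subtract (restoring) binary long division; it does one small step per bit of the dividend:
def binary_long_division(dividend, divisor):
    """Shift-and-subtract long division for non-negative integers."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = remainder = 0
    # Walk the dividend's bits from most significant to least.
    for i in range(dividend.bit_length() - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder

# Matches Python's built-in divmod on a small example.
assert binary_long_division(1000, 7) == divmod(1000, 7)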
If you don't need a very exact result, you can use logarithms and exponentials.
The exponential is the function f(x) = e^x, where e is a mathematical constant equal to 2.71828182845...
The logarithm (written ln) is the inverse of the exponential.
Since ln(a/b) = ln(a) - ln(b), to calculate a/b you need to:
Calculate ln(a) and ln(b) [by library function, logarithm table or other methods]
Subtract them: temp = ln(a) - ln(b)
Calculate the exponential e^temp
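A hedged Python sketch of this approximate route, using the numbers from the question:
import math

a = 9237639100273856744937827364095876289200667937278
b = 8263744826271827396629934467882946252671
# math.log accepts big Python ints directly.
approx = math.exp(math.log(a) - math.log(b))
print(approx)  # a float approximation; at most ~15 of its digits are meaningful
print(a // b)  # the exact integer quotient, for comparison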

Integer divide by Zero and Float (Real no.) divide by Zero

If I run the following line of code, I get a DIVIDE BY ZERO error
1. System.out.println(5/0);
which is the expected behavior.
Now I run the line of code below
2. System.out.println(5/0F);
Here there is no DIVIDE BY ZERO error; instead it shows INFINITY.
In the first line I am dividing two integers and in the second two real numbers.
Why does dividing by zero with integers give a DIVIDE BY ZERO error, while with real numbers it gives INFINITY?
I am sure it is not specific to any programming language.
(EDIT: The question has been changed a bit - it specifically referred to Java at one point.)
The integer types in Java don't have representations of infinity, "not a number" values etc - whereas IEEE-754 floating point types such as float and double do. It's as simple as that, really. It's not really a "real" vs "integer" difference - for example, BigDecimal represents real numbers too, but it doesn't have a representation of infinity either.
EDIT: Just to be clear, this is language/platform specific, in that you could create your own language/platform which worked differently. However, the underlying CPUs typically work the same way - so you'll find that many, many languages behave this way.
EDIT: In terms of motivation, bear in mind that for the infinity case in particular, there are ways of getting to infinity without dividing by zero - such as dividing by a very, very small floating point number. In the case of integers, there's obviously nothing between zero and one.
Also bear in mind that the cases in which integers (or decimal floating-point types) are used typically don't need the concept of infinity or "not a number" results - whereas in scientific applications (where float/double are more typically useful), "infinity" (or at least "a number too large to sensibly represent") is still a potentially valid result.
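How language-specific the surface behaviour is can be seen in Python, which (unlike Java) raises an error even for float division by zero, yet still reaches IEEE-754 infinity through overflow, e.g. by dividing by a very small number:
import math

try:
    5.0 / 0.0                # Python raises here even for floats - a language choice
except ZeroDivisionError as e:
    print("float division:", e)

x = 1.0 / 5e-324             # dividing by the smallest positive double overflows
print(x, math.isinf(x))      # inf True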
This is specific to one programming language or a family of languages. Not all languages allow integers and floats to be used in the same expression. Not all languages have both types (for example, ECMAScript implementations like JavaScript have no notion of an integer type externally). Not all languages have syntax like this to convert values inline.
However, there is an intrinsic difference between integer arithmetic and floating-point arithmetic. In integer arithmetic, you must define that division by zero is an error, because there are no values to represent the result. In floating-point arithmetic, specifically that defined in IEEE-754, there are additional values (combinations of sign bit, exponent and mantissa) for the mathematical concept of infinity and meta-concepts like NaN (not a number).
So we can assume that the / operator in this programming language is generic: it performs integer division if both operands are of the language's integer type, and floating-point division if at least one operand is of a float type, with the other operand implicitly converted to that float type for the purpose of the operation.
In real-number math, dividing by a number close to zero is equivalent to multiplying by a number whose absolute value is very large (x / (1/y) = x * y). So it is reasonable that the result of dividing by zero is (defined as) infinity, since the precision of the floating-point value would be exceeded.
Implementation details are to be found in the programming language's specification.

When is it appropriate to use floating precision data types?

It's clear that one shouldn't use floating precision when working with, say, monetary amounts since the variation in precision leads to inaccuracies when doing calculations with that amount.
That said, what are use cases when that is acceptable? And, what are the general principles one should have in mind when deciding?
Floating point numbers should be used for what they were designed for: computations where what you want is a fixed precision, and you only care that your answer is accurate to within a certain tolerance. If you need an exact answer in all cases, you're best using something else.
Here are three domains where you might use floating point:
Scientific Simulations
Science apps require a lot of number crunching, and often use sophisticated numerical methods to solve systems of differential equations. You're typically talking double-precision floating point here.
Games
Think of games as a simulation where it's ok to cheat. If the physics is "good enough" to seem real then it's ok for games, and you can make up in user experience what you're missing in terms of accuracy. Games usually use single-precision floating point.
Stats
Like science apps, statistical methods need a lot of floating point. Many of the numerical methods are the same; the application domain is just different. You find a lot of statistics and Monte Carlo simulations in financial applications and in any field where you're analyzing a lot of survey data.
Floating point isn't trivial, and for most business applications you really don't need to know all these subtleties. You're fine just knowing that you can't represent some decimal numbers exactly in floating point, and that you should be sure to use some decimal type for prices and things like that.
If you really want to get into the details and understand all the tradeoffs and pitfalls, check out the classic What Every Computer Scientist Should Know About Floating-Point Arithmetic, or pick up a book on Numerical Analysis or Applied Numerical Linear Algebra if you're really adventurous.
I'm guessing you mean "floating point" here. The answer is, basically, any time the quantities involved are approximate, measured, rather than precise; any time the quantities involved are larger than can be conveniently represented precisely on the underlying machine; any time the need for computational speed overwhelms exact precision; and any time the appropriate precision can be maintained without other complexities.
For more details of this, you really need to read a numerical analysis book.
Short story is that if you need exact calculations, DO NOT USE floating point.
Don't use floating point numbers as loop indices: Don't get caught doing:
for (double d = 0.1; d < 1.0; d += 0.1)
{ /* Some Code... */ }
You will be surprised.
Don't use floating point numbers as keys to any sort of map because you can never count on equality behaving like you may expect.
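A short Python demonstration of both pitfalls (the loop above rewritten, plus the map-key problem):
d = 0.0
steps = 0
while d < 1.0:
    d += 0.1
    steps += 1
print(steps)  # 11, not 10: ten 0.1 steps sum to 0.9999999999999999

print(0.1 + 0.2 == 0.3)                  # False
prices = {0.3: "expected"}
print(prices.get(0.1 + 0.2, "missing"))  # missing - the key lookup fails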
Most real-world quantities are inexact, and typically we know their numeric properties with a lot less precision than a typical floating-point value. In almost all cases, the C types float and double are good enough.
It is necessary to know some of the pitfalls. For example, testing two floating-point numbers for equality is usually not what you want, since all it takes is a single bit of inaccuracy to make the comparison non-equal. tgamblin has provided some good references.
The usual exception is money, which is calculated exactly according to certain conventions that don't translate well to binary representations. Part of this is the constants used: you'll never see a pi% interest rate, or a 22/7% interest rate, but you might well see a 3.14% interest rate. In other words, the numbers used are typically expressed in exact decimal fractions, not all of which are exact binary fractions. Further, the rounding in calculations is governed by conventions that also don't translate well into binary. This makes it extremely difficult to precisely duplicate financial calculations with standard floating point, and therefore people use other methods for them.
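A hedged Python illustration of the difference, reusing the 3.14% rate mentioned above:
from decimal import Decimal

print(0.1 + 0.1 + 0.1)     # 0.30000000000000004 in binary floating point
print(Decimal("0.1") * 3)  # 0.3 exactly, in a decimal type

rate = Decimal("3.14") / 100     # a 3.14% interest rate, exact in decimal
print(Decimal("100.00") * rate)  # 3.140000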
It's appropriate to use floating point types when dealing with scientific or statistical calculations. These will typically only have, say, 3-8 significant digits of accuracy anyway.
As to whether to use single or double precision floating point types, this depends on your need for accuracy and how many significant digits you need. Typically though people just end up using doubles unless they have a good reason not to.
For example if you measure distance or weight or any physical quantity like that the number you come up with isn't exact: it has a certain number of significant digits based on the accuracy of your instruments and your measurements.
For calculations involving anything like this, floating point numbers are appropriate.
Also, if you're dealing with irrational numbers floating point types are appropriate (and really your only choice) eg linear algebra where you deal with square roots a lot.
Money is different because you typically need to be exact and every digit is significant.
I think you should ask the other way around: when should you not use floating point. For most numerical tasks, floating point is the preferred data type, as you can (almost) forget about overflow and other kind of problems typically encountered with integer types.
One way to look at the floating point data type is that its precision is independent of magnitude: whether the number is very small or very big (within an acceptable range, of course), the number of meaningful digits is approximately the same.
One drawback is that floating point numbers have some surprising properties: x == x can be False (if x is NaN), and they do not follow the usual mathematical rules (distributivity, that is x*(y + z) != x*y + x*z). Depending on the values of x, y, and z, this can matter.
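Both quirks are easy to reproduce; a quick Python check:
x = float("nan")
print(x == x)  # False: NaN is not equal to itself

a, b, c = 100.0, 0.1, 0.2
print(a * (b + c))                   # 30.000000000000004
print(a * b + a * c)                 # 30.0
print(a * (b + c) == a * b + a * c)  # False: distributivity fails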
From Wikipedia:
Floating-point arithmetic is at its best when it is simply being used to measure real-world quantities over a wide range of scales (such as the orbital period of Io or the mass of the proton), and at its worst when it is expected to model the interactions of quantities expressed as decimal strings that are expected to be exact.
Floating point is fast but inexact. If that is an acceptable trade off, use floating point.
