The number of ways of choosing k objects from n, i.e. the binomial coefficient n!/(k!(n-k)!), is an integer when n and k are integers. How can I calculate this guaranteeing that the result is both correct and of integer type? The choose function returns a double even with integer arguments:
> typeof(choose(4L, 2L))
[1] "double"
as does manual calculation, e.g. n-choose-2 = n(n-1)/2
> typeof((4L * (4L - 1L)) / 2L)
[1] "double"
Of course I can coerce to an integer with as.integer() but I'm nervous about machine precision:
> as.integer(3.999999999999999)
[1] 3
> as.integer(3.9999999999999999)
[1] 4
round() (with the default digits=0) rounds to the nearest integer, but returns a value of double type. If I could be certain that supplying an integer stored in double format to as.integer(round(...)) is guaranteed to round to the correct integer, never being tripped up by machine precision, then as.integer(round(choose(n, k))) would be acceptable. Is this the case? Or is there an alternative to choose() that will return an integer for integer arguments?
One way is to use the VeryLargeIntegers package. The function is:
binom(n, k)
e.g. binom(1000,50) or even binom(10000000,50)
It's also worth learning how to create very large integers directly, e.g. as.vli('1234567890123456789')
https://www.rdocumentation.org/packages/VeryLargeIntegers/versions/0.1.8/topics/06.%20Binomial%20coefficients
The package is not completely bug-free, and larger computations will take a while.
Dr Jo.
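Do not worry about the conversion; machine precision will not be a problem here. A double represents every whole number up to 2^53 in magnitude exactly, so round() returns an exact whole number and as.integer() converts it without loss whenever it fits in the integer range. As for the L suffix: it marks a constant as an integer, not a double. The syntax is a little odd, because in R it does not denote a long value, and such a constant cannot have a decimal point.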
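The as.integer(round(choose(n, k))) idea from the question can be sketched with an explicit range check. This is only a minimal sketch: choose_int() is a hypothetical helper name, not a base R function, and it assumes you would rather get an error than an NA when the coefficient does not fit in R's 32-bit integer type.

```r
# choose_int() is a hypothetical helper, not part of base R.
choose_int <- function(n, k) {
  val <- round(choose(n, k))  # whole number, still stored as a double
  # Refuse values that an R integer (32-bit) cannot hold.
  if (any(abs(val) > .Machine$integer.max, na.rm = TRUE)) {
    stop("binomial coefficient exceeds .Machine$integer.max")
  }
  as.integer(val)
}

choose_int(4L, 2L)          # 6
typeof(choose_int(4L, 2L))  # "integer"
```

For coefficients beyond the integer range, the result has to stay a double (exact up to 2^53) or move to an arbitrary-precision package such as the one mentioned above.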
To my eye, c(4,4) is obviously a vector of integers, but typeof(c(4,4)) reports that c(4,4) is a double. Why is this?
Because it is numeric and not integer. If we need integer, we can use
v1 <- c(4L, 4L)
Or convert to integer with as.integer
v2 <- as.integer(c(4, 4))
and then check the class
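The check mentioned here might look like the following sketch, which reuses the v1 and v2 definitions from above and only calls base R functions:

```r
v1 <- c(4L, 4L)
v2 <- as.integer(c(4, 4))

class(v1)         # "integer"
class(v2)         # "integer"
identical(v1, v2) # TRUE: same values, same storage type
typeof(c(4, 4))   # "double": plain constants are doubles
```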
According to ?numeric
numeric is identical to double (and real). It creates a double-precision vector of the specified length with each element equal to 0.
Also in ?integer
Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9: doubles can hold much larger integers exactly.
The use of L is specified in ?NumericConstants
An numeric constant immediately followed by L is regarded as an integer number when possible (and with a warning if it contains a ".").
Theorem:
The required number of digits to represent the positive integer $S$ in base $t$ is $\lfloor \log_t S \rfloor + 1$ ($\lfloor \cdot \rfloor$: floor function).
I wondered what the required number of digits (in base 2) is to represent the maximum positive double (floating-point) number on a computer. I have a 64-bit OS with 32-bit R on it. Hence, I did:
.Machine$double.xmax # 1.797693e+308
typeof(.Machine$double.xmax) # double
floor(log(.Machine$double.xmax, 2))+1 # 1025
.Machine$integer.max # 2147483647
class(.Machine$integer.max) # integer
floor(log(.Machine$integer.max, 2))+1 # 31; (1 bit for sign bit)
So, the theory is OK for integers.
(1) But what about the double equivalent of the theorem? I.e., what is the required number of digits (in base t) to represent the double in base t?
(2) This may be difficult with real numbers with decimals. So, perhaps, one may know the equivalent of the theorem for decimalless reals (that is ">2147483647").
In particular, where does the 1025 above come from?
(3) Would I get 63 if I used 64-bit OS and 64-bit R for the following?
floor(log(.Machine$integer.max, 2))+1 # 63??; (1 bit for sign bit??)
Ad 3) I don't know about doubles, but the internal representation of integers is still 32 bits even on 64-bit systems. If you want to go bigger, you need to use a package for that, for example 'bit64'.
You will get more detailed information with help(double) and help(integer)
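On the 1025 from the question, here is a hedged sketch of what seems to be going on, using only fields of .Machine. The largest double is (2 - 2^-52) * 2^1023, which is just below 2^1024, so written out as a whole number it has 1024 binary digits. Its base-2 logarithm is so close to 1024 that the computed value appears to round up to 1024, and floor(...) + 1 then reports 1025. The storage itself is still 64 bits, because a double keeps a sign, an exponent and a 53-bit significand rather than all the digits.

```r
xmax <- .Machine$double.xmax

.Machine$double.digits        # 53: bits of precision in the significand
.Machine$double.max.exp       # 1024: xmax lies just below 2^1024
xmax == (2 - 2^-52) * 2^1023  # TRUE: significand of all ones, top exponent

# Only integer vectors follow the "digits = bits stored" reasoning directly.
floor(log2(.Machine$integer.max)) + 1  # 31
```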
I often see the symbol 1L (or 2L, 3L, etc.) appear in R code. What's the difference between 1L and 1? 1==1L evaluates to TRUE. Why is 1L used in R code?
So, @James and @Brian explained what 3L means. But why would you use it?
Most of the time it makes no difference - but sometimes you can use it to get your code to run faster and consume less memory. A double ("numeric") vector uses 8 bytes per element. An integer vector uses only 4 bytes per element. For large vectors, that's less wasted memory and less to wade through for the CPU (so it's typically faster).
Mostly this applies when working with indices.
Here's an example where adding 1 to an integer vector turns it into a double vector:
x <- 1:100
typeof(x) # integer
y <- x+1
typeof(y) # double, twice the memory size
object.size(y) # 840 bytes (on win64)
z <- x+1L
typeof(z) # still integer
object.size(z) # 440 bytes (on win64)
...but also note that working excessively with integers can be dangerous:
1e9L * 2L # Works fine; fast lean and mean!
1e9L * 4L # Ooops, overflow!
...and as @Gavin pointed out, the range for integers is roughly -2e9 to 2e9.
A caveat though is that this applies to the current R version (2.13). R might change this at some point (64-bit integers would be sweet, which could enable vectors of length > 2e9). To be safe, you should use .Machine$integer.max whenever you need the maximum integer value (and negate that for the minimum).
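A small sketch of the overflow behaviour and of the .Machine$integer.max advice. It assumes current 32-bit integer vectors; the variable names are only for illustration.

```r
imax <- .Machine$integer.max

# Integer overflow does not wrap around: R returns NA, normally with a warning.
over <- suppressWarnings(imax + 1L)
is.na(over)             # TRUE
typeof(over)            # "integer"

# Converting to double first avoids the overflow.
as.numeric(imax) + 1    # 2147483648

-imax                   # most negative integer R can store
```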
From the Constants Section of the R Language Definition:
We can use the ‘L’ suffix to qualify any number with the intent of making it an explicit integer.
So ‘0x10L’ creates the integer value 16 from the hexadecimal representation. The constant 1e3L
gives 1000 as an integer rather than a numeric value and is equivalent to 1000L. (Note that the
‘L’ is treated as qualifying the term 1e3 and not the 3.) If we qualify a value with ‘L’ that is
not an integer value, e.g. 1e-3L, we get a warning and the numeric value is created. A warning
is also created if there is an unnecessary decimal point in the number, e.g. 1.L.
The L suffix specifies the integer type, rather than the double type that plain numeric constants have.
> str(1)
num 1
> str(1L)
int 1
To explicitly create an integer value for a constant, you can call the function as.integer or, more simply, use the "L" suffix.
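To tie this back to the original question of why 1 == 1L is TRUE even though the two differ, here is a short sketch using only base R:

```r
1 == 1L                       # TRUE: == compares values after coercion
identical(1, 1L)              # FALSE: identical() also compares storage type
identical(as.integer(1), 1L)  # TRUE: both routes give the same integer

typeof(1)                     # "double"
typeof(1L)                    # "integer"
typeof(0x10L)                 # "integer", value 16
```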
I want to preface this by saying I'm an absolute programming beginner, so please excuse how basic this question is.
I'm trying to get a better understanding of "atomic" classes in R and maybe this goes for classes in programming in general. I understand the difference between a character, logical, and complex data classes, but I'm struggling to find the fundamental difference between a numeric class and an integer class.
Let's say I have a simple vector of integers, x <- c(4, 5, 6, 6); it would make sense for this to be of integer class. But when I do class(x) I get [1] "numeric". If I then convert this vector to integer class with x <- as.integer(x), it returns exactly the same numbers, except that the class is different.
My question is why this is the case: why is the default class for a set of integers numeric, and what are the advantages and/or disadvantages of storing a set of integers as numeric instead of integer?
There are multiple classes that are grouped together as "numeric" classes, the 2 most common of which are double (for double precision floating point numbers) and integer. R will automatically convert between the numeric classes when needed, so for the most part it does not matter to the casual user whether the number 3 is currently stored as an integer or as a double. Most math is done using double precision, so that is often the default storage.
Sometimes you may want to specifically store a vector as integers if you know that they will never be converted to doubles (used as ID values or indexing) since integers require less storage space. But if they are going to be used in any math that will convert them to double, then it will probably be quickest to just store them as doubles to begin with.
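The automatic conversion described above can be seen with typeof(). This is a minimal sketch; the %/% line is one way to keep the n-choose-2 style of calculation in integer storage, under the assumption that the intermediate product does not overflow.

```r
typeof(1L + 1L)            # "integer": integer arithmetic stays integer
typeof(1L + 1)             # "double": mixing with a double promotes the result
typeof(4L * 3L / 2L)       # "double": / always returns a double
typeof((4L * 3L) %/% 2L)   # "integer": integer division keeps the type
(4L * 3L) %/% 2L           # 6
```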
Patrick Burns on Quora says:
First off, it is perfectly feasible to use R successfully for years
and not need to know the answer to this question. R handles the
differences between the (usual) numerics and integers for you in the
background.
> is.numeric(1)
[1] TRUE
> is.integer(1)
[1] FALSE
> is.numeric(1L)
[1] TRUE
> is.integer(1L)
[1] TRUE
(Putting capital 'L' after an integer forces it to be stored as an
integer.)
As you can see "integer" is a subset of "numeric".
> .Machine$integer.max
[1] 2147483647
> .Machine$double.xmax
[1] 1.797693e+308
Integers only go to a little more than 2 billion, while the other
numerics can be much bigger. They can be bigger because they are
stored as double precision floating point numbers. This means that
the number is stored in two pieces: the exponent (like 308 above,
except in base 2 rather than base 10), and the "significand" (like
1.797693 above).
Note that 'is.integer' is not a test of whether you have a whole
number, but a test of how the data are stored.
One thing to watch out for is that the colon operator, :, will return integers if the start and end points are whole numbers. For example, 1:5 creates an integer vector of numbers from 1 to 5. You don't need to append the letter L.
> class(1:5)
[1] "integer"
Reference: https://www.quora.com/What-is-the-difference-between-numeric-and-integer-in-R
To quote the help page (try ?integer), bolded portion mine:
Integer vectors exist so that data can be passed to C or Fortran code which expects them, and so that (small) integer data can be represented exactly and compactly.
Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9: doubles can hold much larger integers exactly.
As the help page says, R's integers are signed 32-bit numbers that take up 4 bytes each. They can hold values between -2147483647 and +2147483647; the bit pattern for -2147483648 is reserved for NA.
R's numeric is identical to a 64-bit double conforming to the IEEE 754 standard. R has no single precision data type. (source: help pages of numeric and double). A double can store all integers between -2^53 and 2^53 exactly without losing precision.
We can see the data type sizes, including the overhead of a vector (source):
> object.size(1:1000)
4040 bytes
> object.size(as.numeric(1:1000))
8040 bytes
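The 2^53 limit mentioned above can be checked directly. A brief sketch, relying only on ordinary double arithmetic:

```r
# Below 2^53, neighbouring whole numbers are distinct doubles.
(2^53 - 1) == (2^53 - 2)   # FALSE

# At 2^53 the spacing between doubles becomes 2, so adding 1 is lost.
2^53 == 2^53 + 1           # TRUE

# Values beyond the integer range cannot be coerced: the result is NA.
is.na(suppressWarnings(as.integer(2^31)))  # TRUE
```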
As I understand it, we do not declare variables with a data type in R, so by default any number written without the L suffix is a numeric.
If you wrote:
> x <- c(4L, 5L, 6L, 6L)
> class(x)
[1] "integer"
Example of Integer:
> x<- 2L
> print(x)
Example of Numeric (kind of like double/float from other programming languages)
> x<-3.4
> print(x)
Numeric is an umbrella term for several types of classes (e.g. double and integer). Integers are numbers which do not have decimal points and thus are stored with minimal space in memory. Use the integer class only when doing computations with such numbers, otherwise revert to numeric.
My question is: suppose you have written an algorithm that reports its number of iterations, and you would like to print that number. But the output always shows many decimal places, like the following:
64.00000000
Is it possible to get an integer by type casting in R? How would you do it?
There are some gotchas in coercing to integer mode. Presumably you have a variety of numbers in some structure. If you are working with a matrix, then the print routine will display all the numbers at the same precision. However, you can change that level. If you calculated this result with an arithmetic process, it may actually be slightly less than 64 but display as that value.
> 64.00000000-.00000099999
[1] 64
> 64.00000000-.0000099999
[1] 63.99999
So assuming you want all the values in whatever structure this is part of, to be displayed as integers, the safest would be:
round(64.000000, 0)
... since this could happen, otherwise.
> as.integer(64.00000000-.00000000009)
[1] 63
The other gotcha is that the range of value for integers is considerably less than the range of floating point numbers.
The function is.integer can be used to test for integer mode.
> is.integer(3)
[1] FALSE
> is.integer(3L)
[1] TRUE
Neither round nor trunc will return a vector in integer mode:
> is.integer(trunc(3.4))
[1] FALSE
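Putting these gotchas together, one possible sketch is to round first and coerce second. The variable names are illustrative, and the value is assumed to lie within the integer range.

```r
x <- 64 - 9e-11                 # prints as 64 but is slightly smaller

as.integer(x)                   # 63: as.integer() truncates toward zero
is.integer(round(x))            # FALSE: round() still returns a double

count <- as.integer(round(x))   # round first, then change the storage type
count                           # 64
is.integer(count)               # TRUE
sprintf("%d", count)            # "64": prints without decimal places
```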
Instead of trying to convert the output into an integer, find out why it is not an integer in the first place, and fix it there.
Did you initialize it as an integer, e.g. num.iterations <- 0L or num.iterations <- integer(1) or did you make the mistake of setting it to 0 (a numeric)?
When you incremented it, did you add 1 (a numeric) or 1L (an integer)?
If you are not sure, go through your code and check your variable's type using the class function.
Fixing the problem at the root could save you a lot of trouble down the line. It could also make your code more efficient as numerous operations are faster on integers than numerics (an example).
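A minimal sketch of the difference, with hypothetical counter names: incrementing by 1L keeps the counter an integer, while incrementing by 1 silently turns it into a double.

```r
n_iter <- 0L                       # integer counter
for (i in seq_len(5)) n_iter <- n_iter + 1L
typeof(n_iter)                     # "integer"
n_iter                             # 5

n_dbl <- 0L                        # starts as an integer...
n_dbl <- n_dbl + 1                 # ...but adding the double 1 promotes it
typeof(n_dbl)                      # "double"
```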
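The function as.integer() truncates toward zero, so for non-negative values you can add 0.5 before converting to get the nearest integer (for negative values this shifts the result the wrong way, so use round() first instead):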
dd<-64.00000000
as.integer(dd+0.5)
If you have a numeric matrix you wish to coerce to an integer matrix (e.g., you are creating a set of dummy variables from a factor), as.integer(matrix_object) will coerce the matrix to a vector, which is not what you want. Instead, you can use storage.mode(matrix_object) <- "integer" to maintain the matrix form.
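A short sketch of the difference on a small matrix; the object names are only for illustration.

```r
m <- matrix(c(1, 0, 0, 1), nrow = 2)   # numeric (double) matrix

v <- as.integer(m)
dim(v)                        # NULL: the matrix shape is lost

storage.mode(m) <- "integer"  # changes the storage type in place
dim(m)                        # 2 2: still a matrix
typeof(m)                     # "integer"
```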