R: How can I calculate large numbers in n-choose-k? [duplicate] - r

This question already has answers here:
How would you program Pascal's triangle in R?
(2 answers)
How to work with large numbers in R?
(1 answer)
Closed 6 years ago.
For a class assignment, I need to create a function that calculates n choose k. I did just that, and it works fine with small numbers (e.g. 6 choose 2), but I'm supposed to get it to work with 200 choose 50, where it naturally doesn't.
The answer is too large and R outputs NaN or Inf, saying:
> q5(200, 50)
[1] "NaN"
Warning message:
In factorial(n) : value out of range in 'gammafn'
I tried using logs and exponents, but it doesn't cut it.
q5 <- function (n, k) {
answer <- log(exp( factorial(n) / ( (factorial(k)) * (factorial(n - k)) )))
paste0(answer)
}

The answer to the actual question is that R cannot show numbers it cannot represent, and some of the terms in your equation are too big to represent: factorial(200) is far beyond the largest double (about 1.8e308), so the computation fails. However, there are approximations to the factorial that can be used; they work with logarithms, which grow much more slowly.
The most famous one, Stirling's approximation, was not accurate enough, but Ramanujan's approximation came to the rescue :)
ramanujan <- function(n){
n*log(n) - n + log(n*(1 + 4*n*(1+2*n)))/6 + log(pi)/2
}
nchoosek <- function(n,k){
factorial(n)/(factorial(k)*factorial(n-k))
}
bignchoosek <- function(n,k){
exp(ramanujan(n) - ramanujan(k) - ramanujan(n-k))
}
nchoosek(20,5)
# [1] 15504
bignchoosek(20,5)
# [1] 15504.06
bignchoosek(200,50)
# [1] 4.538584e+47
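For comparison, base R can do the same log-domain computation without an approximation formula. The following is only a minimal sketch: it assumes nothing beyond base R's lgamma(), choose() and lchoose(), and the name logNchoosek is made up for illustration.

```r
# log(n!) equals lgamma(n + 1), so the log of the binomial coefficient is a
# difference of three lgamma() values, none of which overflows for these inputs.
logNchoosek <- function(n, k) {
  lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
}

exp(logNchoosek(20, 5))    # close to 15504, up to floating-point rounding
exp(logNchoosek(200, 50))  # about 4.5e+47

# Base R also ships the same quantity directly:
choose(200, 50)            # binomial coefficient as a double
lchoose(200, 50)           # its natural logarithm
```

Note that the result is still a double, so only about 15 to 16 significant digits are meaningful; for the exact integer an arbitrary-precision package is needed.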

You can try this too:
q5 <- function (n, k) {
# nchoosek = (n-k+1)(n-k+2)...n / (1*2*...*k)
return(prod(sapply(1:k, function(i)(n-k+i)/(i))))
}
q5(200, 50)
#[1] 4.538584e+47
Or in the log domain:
q5 <- function (n, k) {
# ln (nchoosek) = ln(n-k+1) + ln(n-k+2) + ...+ ln(n) - ln(1) -ln(2) - ...- ln(k)
return(exp(sum(sapply(1:k, function(i)(log(n-k+i) - log(i))))))
}
q5(200, 50)
#[1] 4.538584e+47
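The product form can also be written without sapply, since R arithmetic is vectorized. This is a sketch, not a drop-in replacement: the name nchoosekProd is hypothetical, and it assumes integer inputs with 0 <= k <= n. Using seq_len(k) instead of 1:k matters for k = 0, because 1:0 counts down (1, 0) rather than being empty.

```r
nchoosekProd <- function(n, k) {
  k <- min(k, n - k)   # symmetry: choose(n, k) == choose(n, n - k)
  i <- seq_len(k)      # empty when k == 0, so prod() returns 1
  prod((n - k + i) / i)
}

nchoosekProd(6, 2)     # 15
nchoosekProd(5, 0)     # 1
nchoosekProd(200, 50)  # about 4.5e+47
```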

The packages for large numbers:
Brobdingnag package for "Very large numbers in R":
https://cran.r-project.org/web/packages/Brobdingnag/index.html
Paper: https://www.researchgate.net/publication/251996764_Very_large_numbers_in_R_Introducing_package_Brobdingnag
library(Brobdingnag)
googol <- as.brob(10)^100 # googol:=10^100
googol
# [1] +exp(230.26) # exponential notation is convenient for very large numbers
gmp package for Multiple Precision Arithmetic (big integers and rationals, prime number tests, matrix computation):
https://cran.r-project.org/web/packages/gmp/index.html

This solution calculates the complete row of the Pascal triangle:
x <- 1
print(x)
for (i in 1:200) { x <- c(0, x) + c(x, 0); print(x) }
x[51] ### 200 choose 50
## > x[51]
## [1] 4.538584e+47
(as I proposed for How would you program Pascal's triangle in R?)
If you want to speed up the code, leave out the print(x) calls (output is a relatively slow operation).
To put the code in a function we can do
nchoosek <- function(n,k) {
x <- 1
for (i in 1:n) x <- c(0, x) + c(x, 0)
x[k+1] ### n choose k
}
nchoosek(200, 50) ### testing the function
## [1] 4.538584e+47
Here is a more refined version of my function:
nchoosek <- function(n, k) {
if (k==0) return(1)
if (k+k > n) k <- n-k
if (k==0) return(1)
x <- 1
for (i in 1:k) x <- c(0, x) + c(x, 0)
for (i in 1:(n-k)) x <- x + c(0, head(x, -1))
tail(x, 1)
}
nchoosek(200, 50) ### testing the function
## [1] 4.538584e+47

Related

How to solve an equation for a given variable in R?

This is the equation: a <- x * t - 2 * x. I want to solve this equation for t.
So basically, set a = 0 and solve for t. I am new to the R packages for solving equations. I need a package that solves for complex roots. The original equations I am working with have real and imaginary roots. I am looking for an algebraic solution only, not a numerical one.
I tried:
a <- x * t - 2 * x
solve(a,t)
I run into an error:
Error in solve.default(a, t) : 'a' (1000 x 1) must be square
You can use Ryacas to get the solution as an expression of x:
library(Ryacas)
x <- Sym("x")
t <- Sym("t")
Solve(x*t-2*x == 0, t)
# Yacas vector:
# [1] t == 2 * x/x
As you can see, the solution is t=2 (assuming x is not zero).
Let's try a less trivial example:
Solve(x*t-2*x == 1, t)
# Yacas vector:
# [1] t == (2 * x + 1)/x
If you want to get a function which provides the solution as a function of x, you can do:
solution <- Solve(x*t-2*x == 1, t)
f <- function(x){}
body(f) <- yacas(paste0("t Where ", solution))$text
f
# function (x)
# (2 * x + 1)/x
You might be looking for optimize:
a=function(x,t) x*t-2*x
optimize(a,lower=-100,upper=100,t=10)
optimize(a,lower=-100,upper=100,x=2)
If you need more help, I need a reproducible example.
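Note that optimize() searches for a minimum of the function, not for a zero. If a numerical root is acceptable (the question asks for an algebraic one, so this may not apply), base R's uniroot() is the root finder. A minimal sketch, assuming x is fixed at 2 and the root lies in [-100, 100]:

```r
a <- function(x, t) x * t - 2 * x

# uniroot() needs a function of one variable whose sign changes on the interval.
root <- uniroot(function(t) a(x = 2, t = t),
                lower = -100, upper = 100, tol = 1e-9)
root$root   # numerical approximation of the t for which a(2, t) is 0
```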

Large number digit sum

I am trying to create a function that computes the sum of digits of large numbers, of the order of 100^100. The approach described in this question does not work, as shown below. I tried to come up with a function that does the job, but have not been able to get very far.
The inputs would be of the form a^b, where 1 < a, b < 100 and a and b are integers. So, in that sense, I am open to making digitSumLarge a function that accepts two arguments.
digitSumLarge <- function(x) {
pow <- floor(log10(x)) + 1L
rem <- x
i <- 1L
num <- integer(length = pow)
# Individually isolate each digit starting from the largest and store it in num
while(rem > 0) {
num[i] <- rem%/%(10^(pow - i))
rem <- rem%%(10^(pow - i))
i <- i + 1L
}
return(num)
}
# Function in the highest voted answer of the linked question.
digitsum <- function(x) sum(floor(x / 10^(0:(nchar(x) - 1))) %% 10)
Consider the following tests:
x <- c(1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9)
as.numeric(paste(x, collapse = ''))
# [1] 1.234568e+17
sum(x)
# 90
digitSumLarge(as.numeric(paste(x, collapse = '')))
# 85
digitsum(as.numeric(paste(x, collapse = '')))
# 81, with warning message about loss of accuracy
Is there any way I can write such a function in R?
You need arbitrary precision numbers. With R's numerics (double precision floats), a^b can only be represented with limited precision, and not exactly for sufficiently large input.
library(gmp)
a <- as.bigz(13)
b <- as.bigz(67)
sum(as.numeric(strsplit(as.character(a^b), split = "")[[1]]))
#[1] 328
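If installing gmp is not an option, the same digit sum can be sketched in base R by keeping the running power as a vector of decimal digits and multiplying it by a repeatedly (schoolbook multiplication with carries). The helper name digitSumPow is hypothetical, and the sketch assumes a and b are small non-negative integers as in the question.

```r
digitSumPow <- function(a, b) {
  digits <- 1                       # decimal digits of a^0, least significant first
  for (step in seq_len(b)) {
    digits <- digits * a            # multiply every digit; entries may now exceed 9
    carry <- 0
    for (i in seq_along(digits)) {  # propagate carries from low to high
      total <- digits[i] + carry
      digits[i] <- total %% 10
      carry <- total %/% 10
    }
    while (carry > 0) {             # append new leading digits
      digits <- c(digits, carry %% 10)
      carry <- carry %/% 10
    }
  }
  sum(digits)
}

digitSumPow(2, 10)   # 1024  -> 1 + 0 + 2 + 4 = 7
digitSumPow(2, 15)   # 32768 -> 3 + 2 + 7 + 6 + 8 = 26
```

Every intermediate value stays far below 2^53, so ordinary doubles are exact here.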

Non-conformable arguments in R

For practice, I am rewriting in R an algorithm I originally wrote in C++, called the Finite Difference Method. I am pretty new to R, so I don't know all the rules regarding vector/matrix multiplication. For some reason I am getting a non-conformable arguments error when I do this:
ST_u <- matrix(0,M,1)
ST_l <- matrix(0,M,1)
for(i in 1:M){
Z <- matrix(gaussian_box_muller(i),M,1)
ST_u[i] <- (S0 + delta_S)*exp((r - (sigma*sigma)/(2.0))*T + sigma*sqrt(T)%*%Z)
ST_l[i] <- (S0 - delta_S)*exp((r - (sigma*sigma)/(2.0))*T + sigma*sqrt(T)%*%Z)
}
I get this error:
Error in sqrt(T) %*% Z : non-conformable arguments
Here is my whole code:
gaussian_box_muller <- function(n){
theta <- runif(n, 0, 2 * pi)
rsq <- rexp(n, 0.5)
x <- sqrt(rsq) * cos(theta)
return(x)
}
d_j <- function(j, S, K, r, v,T) {
return ((log(S/K) + (r + (-1^(j-1))*0.5*v*v)*T)/(v*(T^0.5)))
}
call_delta <- function(S,K,r,v,T){
return (S * dnorm(d_j(1, S, K, r, v, T))-K*exp(-r*T) * dnorm(d_j(2, S, K, r, v, T)))
}
Finite_Difference <- function(S0,K,r,sigma,T,M,delta_S){
ST_u <- matrix(0,M,1)
ST_l <- matrix(0,M,1)
for(i in 1:M){
Z <- matrix(gaussian_box_muller(i),M,1)
ST_u[i] <- (S0 + delta_S)*exp((r - (sigma*sigma)/(2.0))*T + sigma*sqrt(T)%*%Z)
ST_l[i] <- (S0 - delta_S)*exp((r - (sigma*sigma)/(2.0))*T + sigma*sqrt(T)%*%Z)
}
Delta <- matrix(0,M,1)
totDelta <- 0
for(i in 1:M){
if(ST_u[i] - K > 0 && ST_l[i] - K > 0){
Delta[i] <- ((ST_u[i] - K) - (ST_l[i] - K))/(2*delta_S)
}else{
Delta <- 0
}
totDelta = totDelta + exp(-r*T)*Delta[i]
}
totDelta <- totDelta * 1/M
Var <- 0
for(i in 1:M){
Var = Var + (Delta[i] - totDelta)^2
}
Var = Var*1/M
cat("The Finite Difference Delta is : ", totDelta)
call_Delta_a <- call_delta(S,K,r,sigma,T)
bias <- abs(call_Delta_a - totDelta)
cat("The bias is: ", bias)
cat("The Variance of the Finite Difference method is: ", Var)
MSE <- bias*bias + Var
cat("The marginal squared error is thus: ", MSE)
}
S0 <- 100.0
delta_S <- 0.001
K <- 100.0
r <- 0.05
sigma <- 0.2
T <- 1.0
M <- 10
result1 <- Finite_Difference(S0,K,r,sigma,T,M,delta_S)
I can't seem to figure out the problem, any suggestions would be greatly appreciated.
In R, the %*% operator is reserved for multiplying two conformable matrices. As one special case, you can also use it to multiply a vector by a matrix (or vice versa), if the vector can be treated as a row or column vector that conforms to the matrix; as a second special case, it can be used to multiply two vectors to calculate their inner product.
However, one thing it cannot do is perform scalar multiplication. Scalar multiplication of vectors or matrices always uses the plain * operator. Specifically, in the expression sqrt(T) %*% Z, the first term sqrt(T) is a scalar, and the second Z is a matrix. If what you intend to do here is multiply the matrix Z by the scalar sqrt(T), then this should just be written sqrt(T) * Z.
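The difference is easy to see on a small example. This is only an illustration with made-up values (the 3 x 1 matrix Z and the scalar s are not taken from the program in the question):

```r
Z <- matrix(c(0.5, -1.2, 0.3), nrow = 3, ncol = 1)  # made-up 3 x 1 matrix
s <- sqrt(4)                                        # a scalar (length-one vector)

s * Z        # element-wise: every entry of Z is multiplied by 2

# %*% requires conformable dimensions, so the same operands fail here.
res <- tryCatch(s %*% Z, error = function(e) conditionMessage(e))
res          # the text of the error message

# %*% does work when the shapes line up, e.g. (1 x 3) times (3 x 1):
t(Z) %*% Z   # 1 x 1 matrix holding the sum of squares of Z
```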
When I made this change, your program still didn't work because of another bug -- S is used but never defined -- but I don't understand your algorithm well enough to attempt a fix.
A few other comments on the program not directly related to your original question:
The first loop in Finite_Difference looks suspicious: gaussian_box_muller(i) generates a vector of length i as i varies in the loop from 1 up to M, and forcing these vectors into a column matrix of length M to generate Z is probably not doing what you want. It will "reuse" the values in a cycle to populate the matrix. Try these to see what I mean:
matrix(gaussian_box_muller(1),10,1) # all one value
matrix(gaussian_box_muller(3),10,1) # cycle of three values
You also use loops in many places where R's vector operations would be easier to read and (typically) faster to execute. For example, your definition of Var is equivalent to:
Var <- sum((Delta - totDelta)^2)/M
and the definitions of Delta and totDelta could also be written in this simplified fashion.
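As a rough sketch of what that could look like, assuming Delta is meant to be zero whenever either price is at or below K (as the if/else in the loop suggests). The input values below are invented purely for illustration; in the real program ST_u and ST_l come from the simulation.

```r
# Illustrative inputs only (T is used as a variable name to match the question).
K <- 100; r <- 0.05; T <- 1; delta_S <- 0.001; M <- 4
ST_u <- c(105.001, 98.001, 110.001, 100.0005)
ST_l <- c(104.999, 97.999, 109.999,  99.9985)

# Element-wise condition replaces the if/else inside the loop.
in_money <- ST_u > K & ST_l > K
Delta    <- ifelse(in_money, (ST_u - ST_l) / (2 * delta_S), 0)

# Discounted average replaces the accumulation loop.
totDelta <- sum(exp(-r * T) * Delta) / M
Var      <- sum((Delta - totDelta)^2) / M
```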
I'd suggest Googling for "vector and matrix operations in r" or something similar and reading some tutorials. Vector arithmetic in particular is idiomatic R, and you'll want to learn it early and use it often.
You might find it helpful to consider the rnorm function to generate random Gaussians.
Happy R-ing!

R functions that operate on data frame columns [duplicate]

This question already has answers here:
Applying a function on each row of a data frame in R
(3 answers)
Closed 9 years ago.
I have the following function:
calculateAngle <- function(x, y)
{
v <- c(x, y)
a <- c(1, 0)
theta <- acos( sum(a*v) / ( sqrt(sum(a * a)) * sqrt(sum(v * v)) ) )
if(v[[2]] < 0)
{
return(-1 * theta)
}
else
{
return(theta)
}
}
It takes an x and a y value and calculates the angle between the vector (x, y) and the vector (1, 0). Now, this function works fine in these examples:
> calculateAngle(0, 1)
[1] 1.570796
> calculateAngle(0, -1)
[1] -1.570796
> calculateAngle(0, -10)
[1] -1.570796
> calculateAngle(rnorm(1), rnorm(1))
[1] -0.2600444
But when I try to pass it the columns of a dataframe, it returns a single value when what I want is the angle for each row.
df <- data.frame(x=rnorm(10), y=rnorm(10))
df$angle <- calculateAngle(df$x, df$y)
Help is appreciated.
(Warning: this is the lazy answer because I don't feel like spending more than 5 seconds on this!)
calculateAnglev <- Vectorize(calculateAngle,c('x','y'))
> calculateAnglev(runif(2),runif(2))
[1] 0.2738694 0.8039875
i.e. this should not be mistaken for a substitute for true vectorization, performance-wise.
Your code for theta is overly complicated. For example, the term sqrt(sum(a * a)) is always 1, and sum(a*v) is always x. Also, sum(v*v) = x^2+y^2; using that form, we get a version that also works for vector arguments:
calculateAngle <- function(x, y)
{
theta <- acos( x / sqrt(x^2+y^2))
# negate where y < 0, as in the original; sign(y)*theta would give 0 whenever y == 0
ifelse(y < 0, -theta, theta)
}
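Base R's atan2(y, x) computes the signed angle between the positive x-axis and the vector (x, y) and is already vectorized, so it may serve as an alternative here. A minimal sketch; the name calculateAngle2 is made up for illustration:

```r
calculateAngle2 <- function(x, y) atan2(y, x)  # note the argument order: y first

calculateAngle2(0, 1)     # pi/2, i.e. 1.570796
calculateAngle2(0, -10)   # -pi/2, i.e. -1.570796

df <- data.frame(x = rnorm(10), y = rnorm(10))
df$angle <- calculateAngle2(df$x, df$y)        # one angle per row
```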

Recursively Inverting a linear system - getting huge errors (precision)

I have a linear system of the form Ax = b, where A is an upper-triangular matrix. The structure of A is defined as follows:
comp.Amat <- function(i,j,prob) ifelse(i > j, 0, dbinom(x=i, size=j, prob=prob))
prob <- 1/4
A <- outer(1:50, 1:50 , FUN=function(r,c) comp.Amat(r,c,prob) )
The entries in A are binomial probabilities, and the issue is that the diagonal entries quickly approach 0 as the size of A grows.
If we define the vector b as follows as well:
b <- seq(1,50,1);
Then solve(a=A,b=b) - gives an error:
" system is computationally singular: reciprocal condition number = 1.07584e-64"
That makes sense, since the diagonal entries are almost 0, so the matrix becomes non-invertible.
As a workaround, I have written the following recursive function, which first computes the value for the last diagonal entry and then substitutes that value into the previous rows. Since each entry in the matrix is dbinom(i, j, prob) for j >= i, I can get a solution this way.
solve.for.x.custom <- function(A, b, prob)
{
n =length(A[1,])
m =length(A[,1])
x = seq(1,n, 1);
x[x> 0] = -1000;
calc.inv.Aii <- function(i,j, prob)
{
res = (1 / (prob*(1-prob)))^i;
return(res);
}
for (i in m:1 )
{
if(i ==m)
{
rhs =0;
}else
{
rhs=0;
for(j in m:(i+1))
{
rhs = dbinom(x=i,size=j,prob=prob)*x[j] + rhs;
}
}
x[i] = (b[i] - rhs)*calc.inv.Aii(i,i);
}
print(x)
return(x)
}
My problem is that when I multiply this solution x' by the matrix A, the errors (Ax' - b) are huge. Since I have an analytical solution (each entry x_i can be expressed in terms of binomial probabilities multiplied by previously computed values), the error I should get is 0 in each row.
I understand that (1 / (1/a)) may not be exactly equal to a in floating point. However, the current errors are really big (-1.13817489781529e+168).
x_prime=solve.for.x.custom(A, b, prob)
A%*%x_prime - b
#output
[,1]
[1,] -1.13817489781529e+168
[2,] 2.11872209742428e+167
[3,] -1.58403954589004e+166
[4,] 6.52328959209082e+164
[5,] -1.69562573261261e+163
[6,] 3.00614551450976e+161
***
[49,] -7.58010305220250e+08
[50,] 9.65162608741321e+03
I would really appreciate it if you could suggest any fixes or more efficient methods. I set the size of A and b to 50, but I intend to grow them, in which case the error will increase as well.
If your matrix A is upper triangular you probably want to use backsolve(A, b) rather than solve(A, b).
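A small sketch of what that looks like, using the same construction of A as in the question but with n = 5 (an assumption made here so that double precision is still adequate). backsolve() only performs back substitution; it does not by itself remove the ill-conditioning seen at n = 50.

```r
prob <- 1/4
n <- 5   # small size chosen for illustration
A <- outer(1:n, 1:n,
           function(i, j) ifelse(i > j, 0, dbinom(x = i, size = j, prob = prob)))
b <- 1:n

# The diagonal entries are dbinom(i, i, prob), which equals prob^i.
diag(A)

x <- backsolve(A, b)       # uses only the upper triangle of A
max(abs(A %*% x - b))      # residual; small for this well-behaved size
```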
You can do arbitrary-precision arithmetic in R with Rmpfr, though this requires writing a compatible version of backsolve. With the code further below we get:
> print(max(abs(b - .b)), digits=5)
1 'mpfr' number of precision 1024 bits
[1] 2.9686e-267
There is one important caveat though: the values in A may not be accurate enough, since they come from dbinom rather than being computed with mpfr objects. Depending on your end goal, you may need to write your own version of dbinom using Rmpfr.
library(Rmpfr)
logcomp.Amat <- function(i,j,prob) ifelse(i > j, -Inf, dbinom(x=i, size=j, prob=prob, log=TRUE))
nbits <- 1024
.backsolve <- function(A, b) {
n <- length(b)
x <- mpfr(numeric(n), nbits)
for(i in rev(seq_len(n))) {
known <- i + seq_len(n - i)
z <- if(length(known) > 0) sum(A[i,known] * x[known]) else 0
x[i] <- (b[i] - z) / A[i,i]
}
return(x)
}
logA <- outer(1:50, 1:50, logcomp.Amat, prob=1/4)
b <- 1:50
A <- exp(mpfr(logA, nbits))
b <- mpfr(b, nbits)
x <- .backsolve(A, b)
.b <- as.vector(A %*% x)
