Operations on long numbers in R

I aim to use maximum likelihood methods (usually about 10^5 iterations) with a probability distribution that produces very big integers and very small floating-point values that cannot be stored in a numeric or any other standard float type.
I thought I would use as.bigq from the gmp package. My issue is that one can only add, subtract, multiply and divide two objects of class/type bigq, while my distribution actually involves logarithm, power, gamma and confluent hypergeometric functions.
What is my best option to deal with this issue?
Should I use another package?
Should I code all these functions for bigq objects?
Coding these functions in R might make them very slow, right?
How do I write the logarithm function using only the +, -, *, / operators? Should I approximate it with a Taylor series expansion?
How do I write the power function using only the +, -, *, / operators when the exponent is not an integer?
How do I write the confluent hypergeometric function (the equivalent of Mathematica's Hypergeometric1F1Regularized[..] function)?
I could eventually write these functions in C and call them from R, but it sounds like a lot of complicated work for little gain, especially if I also have to use the gmp library in C to handle these big numbers.

Most likely all of your problems can be solved with Rmpfr, which allows you to use all of the functions returned by getGroupMembers("Math") with arbitrary precision.
Vignette: http://cran.r-project.org/web/packages/Rmpfr/vignettes/Rmpfr-pkg.pdf
Simple example of what it can do:
library(Rmpfr)
test <- mpfr(rnorm(100, mean = 0, sd = 0.0001), 240)   # 100 small draws held as 240-bit numbers
Reduce("*", test)   # the product of 100 numbers of size ~1e-4 would underflow a double, but not an mpfr
I don't THINK it has hypergeometric functions though...
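If you do need 1F1, one option (my own sketch, not part of Rmpfr: the helper names and truncation rule are made up, and the plain power series is only well behaved for moderate |z|) is to build it from mpfr arithmetic plus gamma():
library(Rmpfr)
# Confluent hypergeometric 1F1(a; b; z) via its power series, in mpfr arithmetic
hyp1f1_mpfr <- function(a, b, z, prec = 240, nmax = 1000) {
  a <- mpfr(a, prec); b <- mpfr(b, prec); z <- mpfr(z, prec)
  tol <- mpfr(2, prec)^(8 - prec)
  term <- mpfr(1, prec)                              # n = 0 term
  total <- term
  for (n in 0:(nmax - 1)) {
    term <- term * (a + n) / (b + n) * z / (n + 1)   # (a)_n/(b)_n * z^n/n! recurrence
    total <- total + term
    if (abs(term) < tol * abs(total)) break
  }
  total
}
# Regularized variant, 1F1(a; b; z) / Gamma(b), like Hypergeometric1F1Regularized
hyp1f1_reg_mpfr <- function(a, b, z, prec = 240)
  hyp1f1_mpfr(a, b, z, prec) / gamma(mpfr(b, prec))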


Is there a LAPACK function for zeroing out the upper / lower corner of a matrix?

Some LAPACK functions (like dgeqrf) return a matrix where the answer is upper triangular, but there is some auxiliary information stored below the diagonal. I'm wondering if there's a function that will zero out the below-diagonal entries.
General Problem
No, there is no such function in standard BLAS/LAPACK.
If you are willing to move away from using BLAS/LAPACK functions directly (with all the potential issues and side effects), you may find linear algebra packages that make such operations easier. For example, Eigen provides triangular views, while other packages have their own ways of doing this.
If you have to use BLAS/LAPACK directly, you will have to zero it out yourself.
QR-decomposition
I assume that you don't need the Q from the QR decomposition and only care about R. With that, you want to keep it in place, clean it up, and avoid copying it into separately allocated storage.
Technically, you can do it using dormqr and setting the matrix C to a zero matrix. However, this is not efficient, as you would be performing unneeded linear algebra operations and storing another dense matrix. You are certainly better off doing a manual loop to clean up, if that is actually required, or copying R into another place (similar to how it's done here).
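For what it's worth, here is a rough R-level illustration of that clean-up (my own addition, not the LAPACK-level code the question is about; R's qr() uses the same compact storage, with R above the diagonal and Householder data below, so lower.tri() does the zeroing):
set.seed(1)
A <- matrix(rnorm(20), nrow = 5, ncol = 4)
qr_obj <- qr(A, LAPACK = TRUE)   # $qr: R in the upper triangle, reflector data below
R1 <- qr_obj$qr
R1[lower.tri(R1)] <- 0           # zero out the below-diagonal entries in place
R2 <- qr.R(qr_obj)               # built-in extraction, which copies instead
all.equal(R1[1:4, ], R2)         # TRUE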

How to set the variables as integers in a Genetic Algorithm Function in GA package in R

Have a look at the documentation of the GA package in R:
ftp://cran.r-project.org/pub/R/web/packages/GA/GA.pdf
The only solution I have found so far is to use a ceiling, floor, or round function in order to convert the decimal value to an integer within the fitness function--before passing it to the actual function being optimized.
This, however, slows down the GA function greatly.
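A rough sketch of that approach (my own illustration: the objective f() is a stand-in, and the lower/upper argument names may differ between GA versions):
library(GA)
f <- function(x) -sum((x - c(3, 7))^2)   # toy objective, maximized at the integer point (3, 7)
fitness_int <- function(x) {
  x <- round(x)                          # map the real-valued chromosome to integers
  f(x)
}
res <- ga(type = "real-valued", fitness = fitness_int,
          lower = c(0, 0), upper = c(10, 10),
          popSize = 50, maxiter = 100)
round(res@solution)                      # recover the integer solution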
There is another genetic algorithm that might be useful if you only wish to use integers. It is called GeneticAlg.int from the gramEvol package.

Is there an equivalent to matlab's rcond() function in Julia?

I'm porting some Matlab code that uses rcond() to test for singularity, as also recommended here (for Matlab singularity testing).
I see that there is a cond() function in Julia (as also in Matlab), but rcond() doesn't appear to be available by default:
ERROR: rcond not defined
I'd assume that rcond(), like the Matlab version, is more efficient than 1/cond(). Is there such a function in Julia, perhaps in an add-on module?
Julia calculates the condition number using the ratio of the maximum to the minimum singular value (got to love open source, no more MATLAB black boxes!).
Julia doesn't have an rcond function in Base, and I'm unaware of one in any package. If it did, it'd just be the ratio of the minimum to the maximum instead. I'm not sure why it's efficient in MATLAB, but it's quite possible that whatever the reason is, it doesn't carry through to Julia.
Matlab's rcond is an optimization based on the fact that it's an estimate of the condition number for square matrices. In my testing, and given that its help mentions LAPACK's 1-norm estimator, it appears to use LAPACK's dgecon.f. In fact, this is exactly what Julia does when you ask for the condition number of a square matrix with the 1- or Inf-norm.
So you can simply define
rcond(A::StridedMatrix) = 1/cond(A,1)
You can save Julia from twice-inverting LAPACK's results by manually combining cond(::StridedMatrix) and cond(::LU), but the savings here will almost certainly be immeasurable. Where there is a measurable savings, however, is that you can directly take the norm(A) instead of reconstructing a matrix similar to A through its LU factorization.
rcond(A::StridedMatrix) = LAPACK.gecon!('1', lufact(A).factors, norm(A, 1))
In my tests, this behaves identically to Matlab's rcond (2014b), and provides a decent speedup.

Parallelized multidimensional numerical integration in R?

I am working on a problem that needs a numerical integration of a bivariate function, where each evaluation of the function takes about 1 minute. Since numerical integration on a single core would evaluate the function thousands to tens of thousands of times, I would like to parallelize the calculation. Right now I am using a brute-force approach that calculates a naive grid of points and adds them up with appropriate area multipliers. This is definitely not efficient, and I suspect any modern multidimensional numerical integration algorithm would achieve the same precision with far fewer function evaluations. There are many packages in R that calculate 2-d integrals much more efficiently and accurately (e.g. R2Cuba), but I haven't found anything that can be easily parallelized on a cluster with SGE-managed job queues. Since this is only a small part of a bigger research problem, I would like to see if this can be done with reasonable effort before I try to parallelize one of the cubature-rule based methods in R myself.
I have found that using sparse grids achieves the best compromise between speed and accuracy in multi-dimensional integration, and it is easily parallelized on the cluster because it doesn't involve any sequential steps. It won't be as accurate as other sequentially adaptive integration algorithms, but it is much better than the naive method because it provides a much sparser grid of points to calculate on each core.
The following R code deals with 2-dimensional integration, but can be easily modified for higher dimensions. The apply function towards the end can be easily parallelized on a cluster.
sg.int <- function(g, ..., lower, upper)
{
  require("SparseGrid")
  lower <- floor(lower)
  upper <- ceiling(upper)
  if (any(lower > upper)) stop("lower must be smaller than upper")
  # one unit integration cell per integer box between lower and upper
  gridss <- as.matrix(expand.grid(seq(lower[1], upper[1] - 1, by = 1),
                                  seq(lower[2], upper[2] - 1, by = 1)))
  # sparse-grid nodes and weights on the unit square
  sp.grid <- createIntegrationGrid('KPU', dimension = 2, k = 5)
  # shift the unit-square nodes into each cell (sweep adds the cell corner to every row)
  nodes <- sweep(sp.grid$nodes, 2, gridss[1, ], "+")
  weights <- sp.grid$weights
  for (i in 2:nrow(gridss))
  {
    nodes <- rbind(nodes, sweep(sp.grid$nodes, 2, gridss[i, ], "+"))
    weights <- c(weights, sp.grid$weights)
  }
  # evaluate the integrand at every node; this apply() is the step to parallelize
  gx.sp <- apply(nodes, 1, g, ...)
  val.sp <- gx.sp %*% weights
  val.sp
}
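As a sketch of that last point (my own addition: it uses the parallel package as a stand-in for SGE job dispatch, and mclapply() only forks on Unix-alikes), the serial apply() could be swapped for something like:
library(parallel)
par_eval <- function(g, nodes, weights, ncores = 4, ...) {
  # evaluate the integrand at each node on a separate core, then combine
  gx <- unlist(mclapply(seq_len(nrow(nodes)),
                        function(i) g(nodes[i, ], ...),
                        mc.cores = ncores))
  drop(gx %*% weights)
}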

Optimization in R with arbitrary constraints

I have done it in Excel but need to run a proper simulation in R.
I need to minimize a function F(x) (x is a vector) subject to the constraints that sum(x) = 1, all values in x are in [0,1], and another function G(x) > G_0.
I have tried optim and constrOptim. Neither of them gives you this option.
The problem you are referring to is (presumably) a non-linear optimization with non-linear constraints. This is one of the most general optimization problems.
The package I have used for these purposes is called nloptr: see here. In my experience, it is both versatile and fast. You can specify both equality and inequality constraints by setting eval_g_eq and eval_g_ineq, respectively. If the Jacobians are known explicitly (can be derived analytically), specify them for faster convergence; otherwise, a numerical approximation is used.
Use this list as a general reference to optimization problems.
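As a rough sketch of that setup (my own illustration: F(), G() and G0 below are placeholders for your actual functions), the derivative-free ISRES algorithm handles both constraint types:
library(nloptr)
F  <- function(x) sum((x - 0.5)^2)   # placeholder objective
G  <- function(x) prod(x + 1)        # placeholder constraint function
G0 <- 1.5
eval_g_eq   <- function(x) sum(x) - 1      # sum(x) == 1
eval_g_ineq <- function(x) G0 - G(x)       # nloptr wants g(x) <= 0, i.e. G(x) >= G0
res <- nloptr(x0 = rep(1/3, 3),
              eval_f = F,
              lb = rep(0, 3), ub = rep(1, 3),
              eval_g_eq = eval_g_eq,
              eval_g_ineq = eval_g_ineq,
              opts = list(algorithm = "NLOPT_GN_ISRES",
                          xtol_rel = 1e-8, maxeval = 1e5))
res$solution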
Write the set of equations using Lagrange multipliers, then solve them using the R command nlm.
You can do this with the OpenMx package (currently hosted at the site listed below; a 2.0 release on CRAN is planned for this year).
It is a general-purpose package mostly used for structural equation modelling, but it handles nonlinear constraints.
For your case, build an mxModel() with your algebras expressed in mxAlgebra() calls and the constraints in mxConstraint() calls.
When you mxRun() the model, the algebras will be solved subject to the constraints, if possible.
http://openmx.psyc.virginia.edu/
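A very rough sketch of that workflow (my own illustration using the 2.x-style API with mxFitFunctionAlgebra(); the objective and the constraint are stand-ins, not your F or G):
library(OpenMx)
model <- mxModel("constrOpt",
  mxMatrix("Full", nrow = 1, ncol = 3, free = TRUE, values = 1/3,
           lbound = 0, ubound = 1, name = "x"),   # the decision vector, bounded to [0, 1]
  mxAlgebra(sum((x - 0.5)^2), name = "obj"),      # placeholder objective to minimize
  mxAlgebra(sum(x), name = "sumx"),
  mxConstraint(sumx == 1, name = "unitSum"),      # the sum(x) = 1 constraint
  mxFitFunctionAlgebra("obj"))
fit <- mxRun(model)
fit$output$estimate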
