Apply a function rowwise in Rcpp/RcppArmadillo

I have a function which takes a vector as input and outputs a scalar, and I want to apply this function to a number of observations. The data is structured in a matrix (rows are the observations, columns the variables) and the function is:
// [[Rcpp::export]]
double gaussianweight(arma::vec x, arma::mat H) {
double c = std::pow(2 * arma::datum::pi, -0.5 * x.n_rows); // (2*pi)^(-d/2), d = length of x
double s = std::pow(arma::det(H), -1);                     // 1 / det(H)
arma::mat Hinv = arma::inv(H);
return(c * s * std::exp(-0.5 * arma::dot(Hinv * x, Hinv * x)));
}
to every row vector of an arma::mat X. How would I do that efficiently? A loop that loops over the rows of X, or are there better solutions? I use R most of the time and am used to avoiding loops whenever possible. I tried the .each_row() operations but had no luck...
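A plain loop over the rows is the natural answer here: unlike in R, a loop in compiled C++ carries no interpreter overhead, so there is nothing to gain from avoiding it. What does pay off is hoisting the constant, the determinant, and the inverse out of the loop so they are computed once rather than per row. A minimal sketch, assuming the gaussianweight() logic above (apply_rows is an illustrative name):
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
arma::vec apply_rows(const arma::mat& X, const arma::mat& H) {
  double c = std::pow(2 * arma::datum::pi, -0.5 * X.n_cols); // d = number of variables
  double s = 1.0 / arma::det(H);
  arma::mat Hinv = arma::inv(H);           // computed once, not per row
  arma::vec out(X.n_rows);
  for (arma::uword i = 0; i < X.n_rows; ++i) {
    arma::vec hx = Hinv * X.row(i).t();    // row vector -> column vector
    out(i) = c * s * std::exp(-0.5 * arma::dot(hx, hx));
  }
  return out;
}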

Related

Negative subscripts in matrix indexing

In Rcpp/RcppArmadillo I want to do the following: from an n x n matrix A, I would like to extract a submatrix A[-j, -j], where j is a vector of indices. In R it goes like
A = matrix(1:16, 4, 4)
j = c(2, 3)
A[-j, -j]
Seems that this functionality is not available in Rcpp or RcppArmadillo - sorry if I have overlooked something. One approach in R is
pos = setdiff(1:nrow(A), j)
A[pos, pos]
That will carry over to RcppArmadillo, but it seems awkward having to create the vector pos as the complement of j, and I am not sure how to do it efficiently.
Does anyone have an idea for an efficient implementation / or a piece of code to share?
The Armadillo docs have the functions .shed_rows()/.shed_cols(), which take an argument that "... contains the indices of rows/columns/slices to remove". From my reading, removing both rows and columns will take two calls.
Using your example
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
arma::mat fun(arma::mat X, arma::uvec row, arma::uvec col) {
X.shed_cols(col); // remove columns
X.shed_rows(row); // remove rows
return(X);
}
/***R
A = matrix(1:16, 4, 4)
j = c(2, 3)
A[-j, -j]
# minus one for zero indexing
fun(A, j-1, j-1)
*/
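If you do want the explicit complement vector pos (so that the A[pos, pos] approach carries over directly), it can be built in one pass with a 0/1 mask and arma::find. A sketch under the same setup (complement and fun2 are illustrative names):
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// build the complement of j in 0..(n-1)
arma::uvec complement(arma::uword n, const arma::uvec& j) {
  arma::uvec mark(n, arma::fill::zeros);
  mark.elem(j).ones();            // flag the indices to drop
  return arma::find(mark == 0);   // keep the rest
}
// [[Rcpp::export]]
arma::mat fun2(const arma::mat& X, const arma::uvec& j) {
  arma::uvec pos = complement(X.n_rows, j);
  return X(pos, pos);             // non-contiguous submatrix view
}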

Solving linear systems using Cramer's rule

So I searched the internet looking for programs implementing Cramer's rule and there were a few, but apparently these examples were for fixed-size matrices only, like 2x2 or 4x4.
However, I am looking for a way to solve an NxN matrix. So I started and reached the point of asking the user for the size of the matrix and asked the user to input the values of the matrix, but then I don't know how to move on from here.
I guess my next step is to apply Cramer's rule and get the answers, but I just don't know how. This is the step I'm missing. Can anybody help me, please?
First, you need to calculate the determinant of your equation system's matrix - that is, the matrix that consists of the coefficients (from the left-hand side of the equations) - let it be D.
Then, to calculate the value of a certain variable, you need to take the matrix of your system (from the previous step), replace the coefficients of the corresponding column with constant terms (from the right-hand side), calculate the determinant of resulting matrix - let it be C, and divide C by D.
A bit more about the replacement from the previous step: say your matrix is 3x3, so you have a system of equations where every a coefficient is multiplied by x, every b by y, every c by z, and the ds are the constant terms. So, to calculate y, you replace the coefficients that are multiplied by y (the bs in this case) with the ds.
You perform the second step for every variable and your system gets solved.
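For instance, for the 2x2 system a1*x + b1*y = d1, a2*x + b2*y = d2, this gives x = det([d1 b1; d2 b2]) / det([a1 b1; a2 b2]) and y = det([a1 d1; a2 d2]) / det([a1 b1; a2 b2]).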
You can find an example in https://rosettacode.org/wiki/Cramer%27s_rule#C
Although the specific example deals with a 4X4 matrix the code is written to accommodate any size square matrix.
What you need is to calculate the determinant: Cramer's rule reduces solving an NxN system to computing determinants of NxN matrices.
If N is not big, you can use Cramer's rule (see the code below), which is quite straightforward. However, this method is not efficient; if your N is big, you need to resort to other methods, such as LU decomposition.
Assuming your data is double and the result can be held by a double:
#include <stdlib.h>
#include <stdio.h>
/* determinant via Laplace (cofactor) expansion along the first row */
double det(double * matrix, int n) {
  if (n <= 1) return matrix[0];
  double *subMatrix = (double*)malloc((n - 1) * (n - 1) * sizeof(double));
  double result = 0.0;
  for (int i = 0; i < n; ++i) {
    /* build the minor: drop row 0 and column i */
    for (int j = 0; j < n - 1; ++j) {
      for (int k = 0; k < i; ++k)
        subMatrix[j*(n - 1) + k] = matrix[(j + 1)*n + k];
      for (int k = i + 1; k < n; ++k)
        subMatrix[j*(n - 1) + (k - 1)] = matrix[(j + 1)*n + k];
    }
    /* alternate signs along the first row */
    if (i % 2 == 0)
      result += matrix[0*n + i] * det(subMatrix, n - 1);
    else
      result -= matrix[0*n + i] * det(subMatrix, n - 1);
  }
  free(subMatrix);
  return result;
}
int main() {
double matrix[ ] = { 1,2,3,4,5,6,7,8,2,6,4,8,3,1,1,2 };
printf("%lf\n", det(matrix, 4));
return 0;
}
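Going from det() to the full solution takes one more loop: for every variable, copy the coefficient matrix, overwrite the corresponding column with the constant terms, and divide the two determinants, exactly as described in the first answer. A sketch building on the det() above (solve_cramer is an illustrative name; it does not guard against D == 0):
#include <string.h>
/* assumes det() and the includes from the code above */
void solve_cramer(const double *A, const double *b, double *x, int n) {
  double *tmp = (double*)malloc(n * n * sizeof(double));
  memcpy(tmp, A, n * n * sizeof(double));
  double D = det(tmp, n);                 /* determinant of the system */
  for (int col = 0; col < n; ++col) {
    memcpy(tmp, A, n * n * sizeof(double));
    for (int row = 0; row < n; ++row)     /* replace column col with b */
      tmp[row * n + col] = b[row];
    x[col] = det(tmp, n) / D;             /* C / D */
  }
  free(tmp);
}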

RcppArmadillo matrix product performance

Can someone explain to me why the calculations become so much slower when I add arma::mat P(X * arma::inv(X.t() * X) * X.t()); to my code? The mean execution time grew by a factor of 164 the last time I benchmarked the code.
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
//[[Rcpp::export]]
List test1(DataFrame data, Language formula, String y_name) {
Function model_matrix("model.matrix");
NumericMatrix x_rcpp = model_matrix(formula, data);
NumericVector y_rcpp = data[y_name];
arma::mat X(x_rcpp.begin(), x_rcpp.nrow(), x_rcpp.ncol());
arma::colvec Y(y_rcpp.begin(), y_rcpp.size());
arma::colvec coef = inv(X.t() * X) * X.t() * Y;
arma::colvec resid = Y - X * coef;
arma::colvec fitted = X * coef;
DataFrame data_res = DataFrame::create(_["Resid"] = resid,
_["Fitted"] = fitted);
return List::create(_["Results"] = coef,
_["Data"] = data_res);
}
//[[Rcpp::export]]
List test2(DataFrame data, Language formula, String y_name) {
Function model_matrix("model.matrix");
NumericMatrix x_rcpp = model_matrix(formula, data);
NumericVector y_rcpp = data[y_name];
arma::mat X(x_rcpp.begin(), x_rcpp.nrow(), x_rcpp.ncol());
arma::colvec Y(y_rcpp.begin(), y_rcpp.size());
arma::colvec coef = inv(X.t() * X) * X.t() * Y;
arma::colvec resid = Y - X * coef;
arma::colvec fitted = X * coef;
arma::mat P(X * arma::inv(X.t() * X) * X.t());
DataFrame data_res = DataFrame::create(_["Resid"] = resid,
_["Fitted"] = fitted);
return List::create(_["Results"] = coef,
_["Data"] = data_res);
}
/*** R
data <- data.frame(Y = rnorm(10000), X1 = rnorm(10000), X2 = rnorm(10000), X3 = rnorm(10000))
microbenchmark::microbenchmark(test1(data, Y~X1+X2+X3, "Y"),
test2(data, Y~X1+X2+X3, "Y"), times = 10)
*/
Best regards,
Jakob
What you are doing is awfully close to fastLm(), which I revised many times over the years. From that we can draw a few conclusions:
Don't form X (X'X)^(-1) X' directly. Use solve().
Don't ever work off a formula object. Use a matrix and vector for X and y.
Here is a benchmark example illustrating how parsing the formula destroys all gains from the matrix algebra.
As an aside, R itself uses pivoted decompositions for rank-deficient matrices. That helps with degenerate cases; in many "normal" cases you should be OK.
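A minimal sketch of what that advice looks like in practice, with the model matrix built once in R and only plain numeric objects crossing into C++ (fastLm_sketch is an illustrative name, not the actual fastLm() source):
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
Rcpp::List fastLm_sketch(const arma::mat& X, const arma::colvec& y) {
  arma::colvec coef  = arma::solve(X, y);  // least squares via solve(), no explicit inverse
  arma::colvec resid = y - X * coef;
  return Rcpp::List::create(Rcpp::_["coefficients"] = coef,
                            Rcpp::_["residuals"]    = resid);
}
/*** R
X <- model.matrix(Y ~ X1 + X2 + X3, data)  # the formula is parsed once, in R
fastLm_sketch(X, data$Y)
*/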
Great question. I'm not entirely sure why the time increases so much, outside of a few notes that I've made. So, be warned.
Consider that the n being used here is 10000 with p being 3.
Let's look at the operations requested. We'll start with the coef or beta_hat operation:
Beta_[p x 1] = (X^T_[p x n] * X_[n x p])^(-1) * X^T_[p x n] * Y_[n x 1]
Looking at the P or projection / hat matrix:
P_[n x n] = X_[n x p] * (X^T_[p x n] * X_[n x p])^(-1) * X^T_[p x n]
So, the n x n matrix here is substantially larger than the prior p x 1 result. Matrix multiplication is generally governed by O(n^3) (the naive schoolbook multiplication). So, potentially, this alone can explain the large increase in time.
Outside of that, there are repeated calculations involving
(X^T_[p x n] * X_[n x p])^(-1) * X^T_[p x n]
within test2, causing it to be computed twice. The main issue here is that the inverse is the most expensive operation.
Also, regarding the use of inv, the API entry indicates that:
if matrix A is known to be symmetric positive definite, using inv_sympd() is faster
if matrix A is known to be diagonal, use inv( diagmat(A) )
to solve a system of linear equations, such as Z = inv(X)*Y, using solve() is faster and more accurate
The third point is of particular interest in this case, as it gives a more optimized routine: inv(X.t() * X) * X.t() => solve(X.t() * X, X.t())
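Applied to test2(), the substitution is a one-line change (a sketch, reusing X from the question):
// instead of: arma::mat P(X * arma::inv(X.t() * X) * X.t());
arma::mat P = X * arma::solve(X.t() * X, X.t());  // no explicit inverse
Note that this still materialises the n x n matrix P, so the dominant cost of the final product remains; the gain is in accuracy and in replacing the explicit inverse with a p x p solve.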

How can I get Z*Z^T using GSL, where Z is column vector?

I am looking through the GSL functions to calculate Z*Z^T, where Z is an n*1 column vector, but I could not find a fitting function; any help is much appreciated.
GSL supports BLAS (basic linear algebra subprograms),
see http://www.gnu.org/software/gsl/manual/html_node/GSL-BLAS-Interface.html.
The functions are classified by the complexity of the operation:
level 1: vector-vector operations
level 2: matrix-vector operations
level 3: matrix-matrix operations
Most functions come in different versions for float, double and complex numbers. Your operation is basically an outer product of the vector Z with itself.
You can initialize the vector as a column vector (here double precision numbers):
gsl_matrix * Z = gsl_matrix_calloc (n,1);
and then use the BLAS function gsl_blas_dgemm to compute
Z * Z^T. The first two arguments of this function determine whether or not the input matrices should be transposed before the matrix multiplication:
gsl_blas_dgemm (CblasNoTrans, CblasTrans, 1.0, Z, Z, 0.0, C);
Here's a working test program (you may need to link it against -lgsl -lgslcblas):
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_blas.h>
int main(int argc, char ** argv)
{
size_t n = 4;
gsl_matrix * Z = gsl_matrix_calloc (n,1);
gsl_matrix * C = gsl_matrix_calloc (n,n);
gsl_matrix_set(Z,0,0,1);
gsl_matrix_set(Z,1,0,2);
gsl_matrix_set(Z,2,0,0);
gsl_matrix_set(Z,3,0,1);
gsl_blas_dgemm (CblasNoTrans,
CblasTrans, 1.0, Z, Z, 0.0, C);
int i,j;
for (i = 0; i < n; i++)
{
for (j = 0; j < n; j++)
{
printf ("%g\t", gsl_matrix_get (C, i, j));
}
printf("\n");
}
gsl_matrix_free(Z);
gsl_matrix_free(C);
return 0;
}
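As an aside, Z*Z^T is exactly a rank-1 update, so the level-2 routine gsl_blas_dger can compute it directly from gsl_vector inputs, without storing Z as an n x 1 matrix. A minimal sketch with the same values as above (C starts at zero, so the update leaves exactly Z*Z^T in C):
#include <gsl/gsl_vector.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_blas.h>
int main(void)
{
  size_t n = 4;
  gsl_vector * z = gsl_vector_calloc(n);   /* Z as a plain vector */
  gsl_matrix * C = gsl_matrix_calloc(n, n);
  gsl_vector_set(z, 0, 1);
  gsl_vector_set(z, 1, 2);
  gsl_vector_set(z, 3, 1);                 /* z = (1, 2, 0, 1)^T */
  gsl_blas_dger(1.0, z, z, C);             /* C = 1.0 * z z^T + C */
  gsl_vector_free(z);
  gsl_matrix_free(C);
  return 0;
}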

Calculate Bessel function in MATLAB using the J(m+1) = 2*m*J(m) - J(m-1) formula

I tried to implement the Bessel function using that formula; this is the code:
function result = Bessel(num)
% recurrence J(m+1) = 2*m*J(m) - J(m-1), evaluated at x = 1
if num == 0
  result = besselj(0, 1);
elseif num == 1
  result = besselj(1, 1);
else
  result = 2*(num-1)*Bessel(num-1) - Bessel(num-2);
end
But if I compare it with MATLAB's own besselj function, I get wildly different values.
For example, if I type Bessel(20) it gives me 3.1689e+005 as the result; if instead I type besselj(20,1) it gives me 3.8735e-025, a totally different result.
Such recurrence relations are nice in mathematics but numerically unstable when implementing algorithms using limited-precision representations of floating-point numbers.
Consider the following comparison:
x = 0:20;
y1 = arrayfun(@(n)besselj(n,1), x); %# builtin function
y2 = arrayfun(@Bessel, x); %# your function
semilogy(x,y1, x,y2), grid on
legend('besselj','Bessel')
title('J_\nu(z)'), xlabel('\nu'), ylabel('log scale')
So you can see how the computed values start to differ significantly after 9.
According to MATLAB:
BESSELJ uses a MEX interface to a Fortran library by D. E. Amos.
and gives the following as references for their implementation:
D. E. Amos, "A subroutine package for Bessel functions of a complex
argument and nonnegative order", Sandia National Laboratory Report,
SAND85-1018, May, 1985.
D. E. Amos, "A portable package for Bessel functions of a complex
argument and nonnegative order", Trans. Math. Software, 1986.
The forward recurrence relation you are using is not stable. To see why, consider that the values of BesselJ(n, x) become smaller and smaller, by about a factor of 1/(2n) at each step. You can see this by looking at the first term of the Taylor series for J.
So, what you're doing is subtracting a large number from a multiple of a somewhat smaller number to get an even smaller number. Numerically, that's not going to work well.
Look at it this way. We know the result is of the order of 10^-25, and we start out with numbers that are of the order of 1. So, in order to get even one accurate digit out of this, we would have to know the first two numbers with at least 25 digits of precision. We clearly don't, and the recurrence actually diverges.
Using the same recurrence relation to go backwards, from high orders to low orders, is stable. When you start with correct values for J(20,1) and J(19,1), you can calculate all orders down to 0 with full accuracy as well. Why does this work? Because now the numbers are getting larger in each step. You're subtracting a very small number from an exact multiple of a larger number to get an even larger number.
You can just modify the code below, which is for the spherical Bessel function. It is well tested and works for all arguments and order ranges. I am sorry it is in C#.
// assumes a Complex type with fields real/imag and helpers sin, cos, abs
public static Complex bessel(int n, Complex z)
{
if (n == 0) return sin(z) / z;
if (n == 1) return sin(z) / (z * z) - cos(z) / z;
if (n <= System.Math.Abs(z.real))
{
Complex h0 = bessel(0, z);
Complex h1 = bessel(1, z);
Complex ret = 0;
for (int i = 2; i <= n; i++)
{
ret = (2 * i - 1) / z * h1 - h0;
h0 = h1;
h1 = ret;
if (double.IsInfinity(ret.real) || double.IsInfinity(ret.imag)) return double.PositiveInfinity;
}
return ret;
}
else
{
// pick a starting order v high enough that downward recursion reaches n with full accuracy
double u = 2.0 * abs(z.real) / (2 * n + 1);
double a = 0.1;
double b = 0.175;
int v = n - (int)System.Math.Ceiling((System.Math.Log(0.5e-16 * (a + b * u * (2 - System.Math.Pow(u, 2)) / (1 - System.Math.Pow(u, 2))), 2)));
Complex ret = 0;
while (v > n - 1)
{
ret = z / (2 * v + 1.0 - z * ret);
v = v - 1;
}
Complex jnM1 = ret;
while (v > 0)
{
ret = z / (2 * v + 1.0 - z * ret);
jnM1 = jnM1 * ret;
v = v - 1;
}
return jnM1 * sin(z) / z;
}
}
