Stan syntax to create vector of products from parameters without sampling the product - stan

I have a latent variable model in which I produce a product term. The product term is the product of two latent variables whose scores are sampled. Currently, my model is sampling the product term. This has drastically increased the number of parameters in my model.
My original model was in a non-matrix formulation:
vector [N] mueta;
matrix [N ,2] xi ;
mueta = b1[1] +
b1[2]*xi[,1] +
b1[3]*xi[,2] +
b1[4]*(xi[,2].*xi[,1]) ;
I changed it to a matrix formulation where xi[,1] is an N-length vector of 1s (intercept), xi[,2:3] are factor scores, and xi[,4] is an interaction effect.
vector [N] mueta;
xi[,1] = rep_vector(1, N);
xi[,2:3] = zi * diag_pre_multiply(sigmaxi,L1)' ;
xi[,4] = (xi[,2].*xi[,3]);
mueta = xi * b1 ;
The first model does not sample the product of the xi matrix; the second formulation does. Is there a way for me to specify this in Stan so that xi[,4] is not sampled and is just a generated value from the product of the sampled scores of the two factors?

I have to formulate this as an answer because I can't format code in a comment. I'd suggest declaring xi one size bigger and calculating this as
vector[N] mueta;
xi[ , 1] = rep_vector(1, N);
xi[ , 2:3] = zi * diag_pre_multiply(sigmaxi, L1)' ;
xi[ , 4] = xi[ , 2] .* xi[ , 3];
mueta = xi * b1;
If xi[ , 2] and xi[ , 3] are data, then you can also precompute their elementwise product. So this can be:
transformed data {
  vector[N] intercept = rep_vector(1, N);
  vector[N] xi2_3 = xi[ , 2] .* xi[ , 3];
...
vector[N] mueta
  = append_col(intercept,
               append_col(zi * diag_pre_multiply(sigmaxi, L1)',
                          xi2_3))
    * b1;
It'd be even better to reorganize the predictors so that you have append_col(intercept, xi2_3) defined as a transformed data variable.
It's probably possible to go further and just directly define the elements of mueta (mu_eta?) without first constructing a matrix.

It looks like I solved my own issue. I wanted to post this answer for others who may have a similar problem.
vector [N] mueta;
xi[,1] = rep_vector(1, N);
xi[,2:3] = zi * diag_pre_multiply(sigmaxi,L1)' ;
mueta = (append_col(xi,(xi[,2].*xi[,3])) * b1) ;
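To see why appending the product column at the point of use gives the same linear predictor as the original non-matrix formulation, here is a minimal plain-Python sketch of the algebra only. The numbers and the helper names `mueta_explicit` and `mueta_matrix` are hypothetical and are not part of the Stan model.

```python
# Hypothetical toy values; only the algebra mirrors the Stan snippets.
b1 = [0.5, 1.0, -2.0, 0.25]  # intercept, two slopes, interaction
scores = [(0.3, -1.2), (1.5, 0.4), (-0.7, 2.0)]  # factor scores per row


def mueta_explicit(b, f1, f2):
    # Non-matrix formulation: b1 + b2*f1 + b3*f2 + b4*(f1 .* f2)
    return b[0] + b[1] * f1 + b[2] * f2 + b[3] * (f1 * f2)


def mueta_matrix(b, f1, f2):
    # Matrix formulation: build the row [1, f1, f2, f1*f2] and take row . b
    row = [1.0, f1, f2, f1 * f2]
    return sum(x * c for x, c in zip(row, b))


explicit = [mueta_explicit(b1, f1, f2) for f1, f2 in scores]
matrix = [mueta_matrix(b1, f1, f2) for f1, f2 in scores]
print(explicit)
print(matrix)
```

In both versions the product column is a deterministic function of the two score columns, so it adds no quantities of its own.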

Related

To understand how Gram-Schmidt Process is translated into this piece of code as the implementation

Trying to understand Gram-Schmidt process from this explanation:
http://mlwiki.org/index.php/Gram-Schmidt_Process
The steps of the calculation make sense to me. However the Python implementation included in the same article doesn't seem to be aligned.
import numpy as np

def normalize(v):
    return v / np.sqrt(v.dot(v))

n = len(A)
A[:, 0] = normalize(A[:, 0])
for i in range(1, n):
    Ai = A[:, i]
    for j in range(0, i):
        Aj = A[:, j]
        t = Ai.dot(Aj)
        Ai = Ai - t * Aj
    A[:, i] = normalize(Ai)
From the code above, we see that it computes the dot product of V1 and b; however, the (V1, V1) term is not used as the denominator (refer to the equation below). How is the equation below translated into the code inside the for loop?
This is exactly what the code does.
Basically, it normalizes each previous vector (column of A), projects the current vector onto it, and subtracts that projection from the current vector.
Every vector is normalized as soon as it is computed, which keeps the calculation neat: for a unit vector the denominator (V1, V1) equals 1, so it can be dropped.
The V2 equation above doesn't normalize the previous vector, hence the difference.
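As a quick check of that point, the following plain-Python sketch compares the textbook projection (with the v.v denominator) against the projection onto the already-normalized vector. The helper names are mine and the vectors are just a small example.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_general(x, v):
    """Projection of x onto v using the textbook formula (x.v / v.v) * v."""
    coeff = dot(x, v) / dot(v, v)
    return [coeff * c for c in v]


def project_unit(x, q):
    """Projection of x onto q; valid only when q has unit length."""
    coeff = dot(x, q)
    return [coeff * c for c in q]


v1 = [1.0, 1.0, 1.0, 1.0]
x2 = [0.0, 1.0, 1.0, 1.0]

norm = math.sqrt(dot(v1, v1))
q1 = [c / norm for c in v1]  # normalized copy of v1

p_general = project_general(x2, v1)
p_unit = project_unit(x2, q1)
print(p_general)
print(p_unit)
```

Both calls give the same projection, which is why the loop in the question can skip the division once the previous columns have unit length.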
Try this vectorized implementation.
I would also suggest going through David C. Lay's book for the theory.
import numpy as np

def replace_zero(array):
    # replace zeros with ones so the array can be used safely as a divisor
    for i in range(len(array)):
        if array[i] == 0:
            array[i] = 1
    return array

def gram_schmidt(A, norm=True, row_vect=False):
    """Orthonormalizes vectors by the Gram-Schmidt process

    Parameters
    -----------
    A : ndarray,
        Matrix having vectors in its columns
    norm : bool,
        Do you need normalized vectors?
    row_vect : bool,
        Does matrix A have vectors in its rows?

    Returns
    -------
    G : ndarray,
        Matrix of orthogonal vectors

    Gram-Schmidt Process
    --------------------
    The Gram-Schmidt process is a simple algorithm for
    producing an orthogonal or orthonormal basis for any
    nonzero subspace of R^n.
    Given a basis {x1,....,xp} for a nonzero subspace W of R^n,
    define
    v1 = x1
    v2 = x2 - (x2.v1/v1.v1) * v1
    v3 = x3 - (x3.v1/v1.v1) * v1 - (x3.v2/v2.v2) * v2
    .
    .
    .
    vp = xp - (xp.v1/v1.v1) * v1 - (xp.v2/v2.v2) * v2 - .......
         .... - (xp.v(p-1) / v(p-1).v(p-1) ) * v(p-1)
    Then {v1,.....,vp} is an orthogonal basis for W.
    In addition,
    Span {v1,.....,vk} = Span {x1,.....,xk} for 1 <= k <= p

    References
    ----------
    Linear Algebra and Its Applications - By David C. Lay
    """
    if row_vect:
        # if true, transpose it to make a column vector matrix
        A = A.T
    no_of_vectors = A.shape[1]
    G = A[:, 0:1].copy()  # copy the first vector in matrix
    # 0:1 is done to be consistent with dimensions - [[1,2,3]]
    # iterate from 2nd vector to number of vectors
    for i in range(1, no_of_vectors):
        # calculates weights (coefficients) for every vector in G
        numerator = A[:, i].dot(G)
        denominator = np.diag(np.dot(G.T, G))  # to get elements in diagonal
        weights = np.squeeze(numerator / denominator)
        # projected vector onto subspace G
        projected_vector = np.sum(weights * G,
                                  axis=1,
                                  keepdims=True)
        # orthogonal vector to subspace G
        orthogonalized_vector = A[:, i:i+1] - projected_vector
        # now add the orthogonal vector to our set
        G = np.hstack((G, orthogonalized_vector))
    if norm:
        # to get orthonormal vectors (unit orthogonal vectors)
        # replace zero with 1 to deal with division by 0 if the matrix has a 0 vector
        # or the normalization value comes out to be zero
        G = G / replace_zero(np.linalg.norm(G, axis=0))
    if row_vect:
        return G.T
    return G

G = np.array([[1,0,0],[1,1,0],[1,1,1],[1,1,1]])
gram_schmidt(G)
Output:
array([[ 0.5 , -0.8660254 , 0. ],
[ 0.5 , 0.28867513, -0.81649658],
[ 0.5 , 0.28867513, 0.40824829],
[ 0.5 , 0.28867513, 0.40824829]])

Renewal Function for Weibull Distribution

The renewal function for Weibull distribution m(t) with t = 10 is given as below.
I want to find the value of m(t). I wrote the following R code to compute m(t):
last_term = NULL
gamma_k = NULL
n = 50
for(k in 1:n){
  gamma_k[k] = gamma(2*k + 1)/factorial(k)
}
for(j in 1:(n-1)){
  prev = gamma_k[n-j]
  last_term[j] = gamma(2*j + 1)/factorial(j)*prev
}
final_term = NULL
find_value = function(n){
  for(i in 2:n){
    final_term[i] = gamma_k[i] - sum(last_term[1:(i-1)])
  }
  return(final_term)
}
all_k = find_value(n)
af_sum = NULL
m_t = function(t){
  for(k in 1:n){
    af_sum[k] = (-1)^(k-1) * all_k[k] * t^(2*k)/gamma(2*k + 1)
  }
  return(sum(na.omit(af_sum)))
}
m_t(20)
The output is m(t) = 2.670408e+93. Is my iterative procedure correct? Thanks.
I don't think it will work. First, let's move Γ(2k+1) from the denominator of m(t) into A_k. Then A_k will behave roughly as 1/k!.
In the numerator of the m(t) terms there is t^(2k), so roughly speaking you're computing a sum with terms
100^k/k!
From Stirling's formula,
k! ~ k^k, making the terms
(100/k)^k
so yes, they will start to decrease and converge to something, but only after the 100th term.
Anyway, here is the code; you could try to improve it, but it breaks at k~70.
N <- 20
A <- rep(0, N)
# compute A_k/gamma(2k+1) terms
ps <- 0.0 # previous sum
A[1] = 1.0
for(k in 2:N) {
  ps <- ps + A[k-1]*gamma(2*(k-1) + 1)/factorial(k-1)
  A[k] <- 1.0/factorial(k) - ps/gamma(2*k+1)
}
print(A)
t <- 10.0
t2 <- t*t
r <- 0.0
for(k in 1:N){
  r <- r + (-t2)^k*A[k]
}
print(-r)
UPDATE
Ok, I calculated Ak as in your question, got the same answer. I want to estimate terms Ak/Γ(2k+1) from m(t), I believe it will be pretty much dominated by 1/k! term. To do that I made another array k!*Ak/Γ(2k+1), and it should be close to one.
Code
N <- 20
A <- rep(0.0, N)
psum <- function( pA, k ) {
  ps <- 0.0
  if (k >= 2) {
    jmax <- k - 1
    for(j in 1:jmax) {
      ps <- ps + (gamma(2*j+1)/factorial(j))*pA[k-j]
    }
  }
  ps
}
# compute A_k/gamma(2k+1) terms
A[1] = gamma(3)
for(k in 2:N) {
  A[k] <- gamma(2*k+1)/factorial(k) - psum(A, k)
}
print(A)
B <- rep(0.0, N)
for(k in 1:N) {
  B[k] <- (A[k]/gamma(2*k+1))*factorial(k)
}
print(B)
shows that
I got the same Ak values as you did.
Bk is indeed very close to 1
It means that the term A_k/Γ(2k+1) could be replaced by 1/k! to get a quick estimate of what we might get (with replacement)
m(t) ~= - Sum(k=1, k=Infinity) (-1)^k (t^2)^k / k! = 1 - Sum(k=0, k=Infinity) (-t^2)^k / k!
This is actually a well-known sum: it is equal to exp() with a negative argument (well, you have to add the term for k=0)
m(t) ~= 1 - exp(-t^2)
Conclusions
The approximate value is positive. It will probably stay positive, even though A_k/Γ(2k+1) is a bit different from 1/k!.
We're talking about 1 - exp(-100), which is 1 - 3.72*10^(-44)! And we're trying to compute it precisely by summing and subtracting values on the order of 10^100 or even higher. Even with MPFR I don't think this is possible.
Another approach is needed
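To illustrate the cancellation problem on the simplified series (with A_k/Γ(2k+1) replaced by 1/k!, as estimated above), here is a small Python sketch. It sums the same alternating series once in double precision and once with exact rational arithmetic, and compares both with the closed form. The function names and the cutoff of 400 terms are my own choices, not part of the original code.

```python
import math
from fractions import Fraction


def series_float(t, n_terms=400):
    """Sum -(-t^2)^k / k! for k = 1..n_terms in double precision."""
    x = -float(t * t)
    term, total = 1.0, 0.0
    for k in range(1, n_terms + 1):
        term *= x / k  # term is now (-t^2)^k / k!
        total -= term
    return total


def series_exact(t, n_terms=400):
    """Same partial sum, computed with exact rational arithmetic."""
    x = Fraction(-t * t)
    term, total = Fraction(1), Fraction(0)
    for k in range(1, n_terms + 1):
        term *= x / k
        total -= term
    return total


t = 10
closed_form = -math.expm1(-t * t)  # 1 - exp(-t^2)
naive = series_float(t)
exact = float(series_exact(t))
print("closed form :", closed_form)
print("exact sum   :", exact)
print("float sum   :", naive)
```

The exact sum agrees with 1 - exp(-t^2), while the double-precision sum is dominated by rounding error in the huge intermediate terms, which is the behaviour described in the conclusions above.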
OK, so I ended up going down a pretty different road on this. I have implemented a simple discretization of the integral equation which defines the renewal function:
m(t) = F(t) + integrate (m(t - s)*f(s), s, 0, t)
The integral is approximated with the rectangle rule. Approximating the integral for different values of t gives a system of linear equations. I wrote a function to generate the equations and extract a matrix of coefficients from it. After looking at some examples, I guessed a rule to define the coefficients directly and used that to generate solutions for some examples. In particular I tried shape = 2, t = 10, as in OP's example, with step = 0.1 (so 101 equations).
I found that the result agrees pretty well with an approximate result which I found in a paper (Baxter et al., cited in the code). Since the renewal function is the expected number of events, for large t it is approximately equal to t/mu where mu is the mean time between events; this is a handy way to know if we're anywhere in the neighborhood.
I was working with Maxima (http://maxima.sourceforge.net), which is not efficient for numerical stuff, but which makes it very easy to experiment with different aspects. At this point it would be straightforward to port the final, numerical stuff to another language such as Python.
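As a rough idea of what such a port could look like, here is a minimal Python sketch of the numerical part only. It uses the same quadrature weights as discretize_renewal in the Maxima program (half weight at both endpoints) and solves the lower-triangular system by forward substitution instead of LU decomposition. All function names are mine, and the density helper assumes shape >= 1 so that f(0) is finite.

```python
import math


def weibull_pdf(x, shape, scale=1.0):
    """Weibull density; this sketch assumes shape >= 1 so that f(0) is finite."""
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def weibull_cdf(x, shape, scale=1.0):
    return 1.0 - math.exp(-((x / scale) ** shape))


def renewal_discretized(shape, scale, t_max, n):
    """Approximate m(k*dt), k = 0..n, from m(t) = F(t) + int_0^t m(t-s) f(s) ds."""
    dt = t_max / n
    f = [weibull_pdf(k * dt, shape, scale) for k in range(n + 1)]
    F = [weibull_cdf(k * dt, shape, scale) for k in range(n + 1)]
    m = [0.0] * (n + 1)
    m[0] = F[0]
    for k in range(1, n + 1):
        # interior points get weight dt, the two endpoints get dt/2
        inner = sum(m[k - j] * f[j] for j in range(1, k)) * dt
        rhs = F[k] + inner + m[0] * f[k] * dt / 2
        # the m[k] * f[0] * dt/2 term is moved to the left-hand side
        m[k] = rhs / (1.0 - f[0] * dt / 2)
    return m


def renewal_large_t(t, shape, scale=1.0):
    """Large-t approximation t/mu + sigma^2/(2 mu^2) - 1/2."""
    mu = scale * math.gamma(1 + 1 / shape)
    var = scale ** 2 * math.gamma(1 + 2 / shape) - mu ** 2
    return t / mu + var / (2 * mu ** 2) - 0.5


m = renewal_discretized(shape=2, scale=1.0, t_max=10.0, n=100)
approx = renewal_large_t(10.0, shape=2)
print("discretized m(10):", m[-1])
print("large-t approx   :", approx)
```

With shape = 2, t = 10 and 100 steps, the two printed numbers should land in the same neighborhood.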
Thanks to OP for suggesting the problem, and S. Pappadeux for insightful discussions. Here is the plot I got comparing the discretized approximation (red) with the approximation for large t (blue). Trying some examples with different step sizes, I saw that the values tend to increase a little as step size gets smaller, so I think the red line is probably a little low, and the blue line might be more nearly correct.
Here is my Maxima code:
/* discretize weibull renewal function and formulate system of linear equations
* copyright 2020 by Robert Dodier
* I release this work under terms of the GNU General Public License
*
* This is a program for Maxima, a computer algebra system.
* http://maxima.sourceforge.net/
*/
"Definition of the renewal function m(t):" $
renewal_eq: m(t) = F(t) + 'integrate (m(t - s)*f(s), s, 0, t);
"Approximate integral equation with rectangle rule:" $
discretize_renewal (delta_t, k) :=
if equal(k, 0)
then m(0) = F(0)
else m(k*delta_t) = F(k*delta_t)
+ m(k*delta_t)*f(0)*(delta_t / 2)
+ sum (m((k - j)*delta_t)*f(j*delta_t)*delta_t, j, 1, k - 1)
+ m(0)*f(k*delta_t)*(delta_t / 2);
make_eqs (n, delta_t) :=
makelist (discretize_renewal (delta_t, k), k, 0, n);
make_vars (n, delta_t) :=
makelist (m(k*delta_t), k, 0, n);
"Discretized integral equation and variables for n = 4, delta_t = 1/2:" $
make_eqs (4, 1/2);
make_vars (4, 1/2);
make_eqs_vars (n, delta_t) :=
[make_eqs (n, delta_t), make_vars (n, delta_t)];
load (distrib);
subst_pdf_cdf (shape, scale, e) :=
subst ([f = lambda ([x], pdf_weibull (x, shape, scale)), F = lambda ([x], cdf_weibull (x, shape, scale))], e);
matrix_from (eqs, vars) :=
(augcoefmatrix (eqs, vars),
[submatrix (%%, length(%%) + 1), - col (%%, length(%%) + 1)]);
"Substitute Weibull pdf and cdf for shape = 2 into discretized equation:" $
apply (matrix_from, make_eqs_vars (4, 1/2));
subst_pdf_cdf (2, 1, %);
"Just the right-hand side matrix:" $
rhs_matrix_from (eqs, vars) :=
(map (rhs, eqs),
augcoefmatrix (%%, vars),
[submatrix (%%, length(%%) + 1), col (%%, length(%%) + 1)]);
"Generate the right-hand side matrix, instead of extracting it from equations:" $
generate_rhs_matrix (n, delta_t) :=
[delta_t * genmatrix (lambda ([i, j], if i = 1 and j = 1 then 0
elseif j > i then 0
elseif j = i then f(0)/2
elseif j = 1 then f(delta_t*(i - 1))/2
else f(delta_t*(i - j))), n + 1, n + 1),
transpose (makelist (F(k*delta_t), k, 0, n))];
"Generate numerical right-hand side matrix, skipping over formulas:" $
generate_rhs_matrix_numerical (shape, scale, n, delta_t) :=
block ([f, F, numer: true], local (f, F),
f: lambda ([x], pdf_weibull (x, shape, scale)),
F: lambda ([x], cdf_weibull (x, shape, scale)),
[genmatrix (lambda ([i, j], delta_t * if i = 1 and j = 1 then 0
elseif j > i then 0
elseif j = i then f(0)/2
elseif j = 1 then f(delta_t*(i - 1))/2
else f(delta_t*(i - j))), n + 1, n + 1),
transpose (makelist (F(k*delta_t), k, 0, n))]);
"Solve approximate integral equation (shape = 3, t = 1) via LU decomposition:" $
fpprintprec: 4 $
n: 20 $
t: 1;
[AA, bb]: generate_rhs_matrix_numerical (3, 1, n, t/n);
xx_by_lu: linsolve_by_lu (ident(n + 1) - AA, bb, floatfield);
"Iterative solution of approximate integral equation (shape = 3, t = 1):" $
xx: bb;
for i thru 10 do xx: AA . xx + bb;
xx - (AA.xx + bb);
xx_iterative: xx;
"Should find iterative and LU give same result:" $
xx_diff: xx_iterative - xx_by_lu[1];
sqrt (transpose(xx_diff) . xx_diff);
"Try shape = 2, t = 10:" $
n: 100 $
t: 10 $
[AA, bb]: generate_rhs_matrix_numerical (2, 1, n, t/n);
xx_by_lu: linsolve_by_lu (ident(n + 1) - AA, bb, floatfield);
"Baxter, et al., Eq. 3 (for large values of t) compared to discretization:" $
/* L.A. Baxter, E.M. Scheuer, D.J. McConalogue, W.R. Blischke.
* "On the Tabulation of the Renewal Function,"
* Technometrics, vol. 24, no. 2 (May 1982).
* H(t) is their notation for the renewal function.
*/
H(t) := t/mu + sigma^2/(2*mu^2) - 1/2;
tx_points: makelist ([float (k/n*t), xx_by_lu[1][k, 1]], k, 1, n);
plot2d ([H(u), [discrete, tx_points]], [u, 0, t]), mu = mean_weibull(2, 1), sigma = std_weibull(2, 1);

Rcpparmadillo matrixproduct performance

Can someone explain to me why the calculations become so much slower when I add arma::mat P(X * arma::inv(X.t() * X) * X.t()); to my code? The mean run time grew by a factor of 164 the last time I benchmarked the code.
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;

//[[Rcpp::export]]
List test1(DataFrame data, Language formula, String y_name) {
  Function model_matrix("model.matrix");
  NumericMatrix x_rcpp = model_matrix(formula, data);
  NumericVector y_rcpp = data[y_name];
  arma::mat X(x_rcpp.begin(), x_rcpp.nrow(), x_rcpp.ncol());
  arma::colvec Y(y_rcpp.begin(), y_rcpp.size());
  arma::colvec coef = inv(X.t() * X) * X.t() * Y;
  arma::colvec resid = Y - X * coef;
  arma::colvec fitted = X * coef;
  DataFrame data_res = DataFrame::create(_["Resid"] = resid,
                                         _["Fitted"] = fitted);
  return List::create(_["Results"] = coef,
                      _["Data"] = data_res);
}

//[[Rcpp::export]]
List test2(DataFrame data, Language formula, String y_name) {
  Function model_matrix("model.matrix");
  NumericMatrix x_rcpp = model_matrix(formula, data);
  NumericVector y_rcpp = data[y_name];
  arma::mat X(x_rcpp.begin(), x_rcpp.nrow(), x_rcpp.ncol());
  arma::colvec Y(y_rcpp.begin(), y_rcpp.size());
  arma::colvec coef = inv(X.t() * X) * X.t() * Y;
  arma::colvec resid = Y - X * coef;
  arma::colvec fitted = X * coef;
  arma::mat P(X * arma::inv(X.t() * X) * X.t());
  DataFrame data_res = DataFrame::create(_["Resid"] = resid,
                                         _["Fitted"] = fitted);
  return List::create(_["Results"] = coef,
                      _["Data"] = data_res);
}

/*** R
data <- data.frame(Y = rnorm(10000), X1 = rnorm(10000), X2 = rnorm(10000), X3 = rnorm(10000))
microbenchmark::microbenchmark(test1(data, Y~X1+X2+X3, "Y"),
                               test2(data, Y~X1+X2+X3, "Y"), times = 10)
*/
Best regards,
Jakob
What you are doing is awfully close to fastLm(), which I have revised many times over the years. From that we can draw a few conclusions:
Don't compute X (X' X)^(-1) X' directly. Use solve().
Don't ever work off a formula object. Use a matrix and vector for X and y.
Here is a benchmark example illustrating how parsing the formula destroys all gains from the matrix algebra.
As an aside, R itself has pivoted operations for rank-deficient matrices. That helps with deformed matrices; in many "normal" cases you should be OK.
Great question. I'm not entirely sure what causes the slowdown beyond the few notes I've made below. So, be warned.
Consider that the n being used here is 10000 and the p is 3.
Let's look at the operations requested. We'll start with the coef or beta_hat operation:
Beta_[p x 1] = (X^T_[p x n] * X_[n x p])^(-1) * X^T_[p x n] * Y_[n x 1]
Looking at the P or projection / hat matrix:
P_[n x n] = X_[n x p] * (X^T_[p x n] * X_[n x p])^(-1) * X^T_[p x n]
So, the n x n matrix here is substantially larger than the prior p x 1 result. Matrix multiplication is generally governed by O(n^3) (the naive schoolbook multiplication). So, potentially, this can explain the large increase in time.
Outside of that, there is a repeated calculation involving
(X^T_[p x n] * X_[n x p])^(-1) * X^T_[p x n]
within test2, which causes it to be recomputed. The main issue here is the inverse being the most expensive operation.
Also, regarding the use of inv, the API entry indicates that:
if matrix A is known to be symmetric positive definite, using inv_sympd() is faster
if matrix A is known to be diagonal, use inv( diagmat(A) )
to solve a system of linear equations, such as Z = inv(X)*Y, using solve() is faster and more accurate
The third point is of particular interest in this case, as it gives a more optimized routine for inv(X.t() * X)*X.t() => solve(X.t() * X, X.t())
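To put rough numbers on the size argument, here is a small Python sketch that counts multiply-adds for the two matrix chains under the naive schoolbook rule, evaluated strictly left to right. It is only an operation count under those assumptions: the helper names are hypothetical, the shared X'X product and the p x p inverse are left out, and Armadillo may evaluate the expressions differently.

```python
def matmul_cost(a_shape, b_shape):
    """Multiply-add count and result shape for an (r x k) times (k x c) product."""
    (r, k), (k2, c) = a_shape, b_shape
    assert k == k2, "inner dimensions must match"
    return r * k * c, (r, c)


def chain_cost(shapes):
    """Total cost of multiplying a chain of matrices strictly left to right."""
    total = 0
    current = shapes[0]
    for nxt in shapes[1:]:
        cost, current = matmul_cost(current, nxt)
        total += cost
    return total, current


n, p = 10000, 3

# coef = (X'X)^(-1) * X' * Y : (p x p) (p x n) (n x 1)
coef_cost, coef_shape = chain_cost([(p, p), (p, n), (n, 1)])

# P = X * (X'X)^(-1) * X' : (n x p) (p x p) (p x n)
hat_cost, hat_shape = chain_cost([(n, p), (p, p), (p, n)])

print("coef:", coef_shape, coef_cost)
print("P   :", hat_shape, hat_cost)
print("ratio:", hat_cost / coef_cost)
```

The count also shows that P has n^2 entries that must be allocated and filled, which by itself is far more work than anything needed for the coefficients.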

Keeping exact wave form in memory

Let's say I have a program that calculates the value of the sine wave at time t. The sine wave is of the form sin(f*t + phi). Amplitude is 1.
If I only have one sin term all is fine. I can easily calculate the value at any time t.
But, at runtime, the wave form becomes modified when it combines with other waves. sin(f1 * t + phi1) + sin(f2 * t + phi2) + sin(f3 * t + phi3) + ...
The simplest solution is to have a table with columns for phi and f, iterate over all rows, and sum the results. But to me it feels that once I reach thousands of rows, the computation will become slow.
Is there a different way of doing this? Like combining all the sines into one statement/formula?
If you have a Fourier series (i.e. f_i = i f for some f) you can use the Clenshaw recurrence relation which is significantly faster than computing all the sines (but it might be slightly less accurate).
In your case you can consider the sequence:
f_k = exp( i ( k f t + phi_k) ) , where i is the imaginary unit.
Notice that Im(f_k) = sin( k f t + phi_k ), that is your sequence.
Also
f_k = exp( i ( k f t + phi_k) ) = exp( i k f t ) exp( i phi_k )
Hence you have a_k = exp(i phi_k). You can precompute these values and store them in an array. For simplicity from now on assume a_0 = 0.
Now, exp( i (k + 1) f t) = exp(i k f t) * exp(i f t), so alpha_k = exp(i f t) and beta_k = 0.
You can now apply the recurrence formula, in C++ you can do something like this:
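#include <complex>
#include <vector>
using namespace std;

complex<double> clenshaw_fourier(double f, double t, const vector< complex<double> > & a )
{
    const complex<double> i(0.0, 1.0); // imaginary unit
    const complex<double> alpha = exp(f * t * i);
    complex<double> b = 0;
    // backward recurrence: b = a[k] + alpha * b
    for (int k = a.size() - 1; k > 0; --k)
        b = a[k] + alpha * b;
    return a[0] + alpha * b;
}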
Assuming that a[k] == exp( i phi_k ).
The real part of the answer is the sum of cos(k f t + phi_k), while the imaginary part is the sum of sin(k f t + phi_k).
As you can see this only uses addition and multiplications, except for exp(f * t * i) that is only computed once.
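For a quick sanity check of the recurrence, here is a minimal Python sketch that compares it with the direct sum of sines. The function names and the test phases are my own, and the coefficients are indexed from k = 1, which corresponds to the a_0 = 0 assumption above.

```python
import cmath
import math


def sum_of_sines_direct(f, t, phases):
    """Direct evaluation of sum_k sin(k*f*t + phi_k), k = 1..n."""
    return sum(math.sin(k * f * t + phi)
               for k, phi in enumerate(phases, start=1))


def precompute_coefficients(phases):
    """a_k = exp(i*phi_k); independent of t, so computed only once."""
    return [cmath.exp(1j * phi) for phi in phases]


def sum_of_sines_recurrence(f, t, coeffs):
    """Evaluate Im(sum_k a_k * alpha^k) with one exp() call per t."""
    alpha = cmath.exp(1j * f * t)
    b = 0j
    for a_k in reversed(coeffs):
        b = a_k + alpha * b
    # b now holds a_1 + a_2*alpha + ... + a_n*alpha^(n-1)
    return (alpha * b).imag


phases = [0.37 * k for k in range(1, 501)]
coeffs = precompute_coefficients(phases)
direct = sum_of_sines_direct(2.0, 0.123, phases)
fast = sum_of_sines_recurrence(2.0, 0.123, coeffs)
print(direct, fast)
```

The two results should agree up to floating-point rounding, while the recurrence needs only one complex multiply-add per term for each new value of t.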
There are different bases (plural of basis) that can be advantageous (i.e. compact) for representing different waveforms. The most common and well-known one is that which you mention, called the Fourier basis usually. Daubechies wavelets for example are a relatively recent addition that cope with more discontinuous waveforms much better than a Fourier basis does. But this is really a math topic and probably if you post on Math Overflow you will get better answers.

Given a list of coefficients, create a polynomial

I want to create a polynomial with given coefficients. This seems very simple but what I have found till now did not appear to be the thing I desired.
For example in such an environment;
n = 11
K = GF(4,'a')
R = PolynomialRing(GF(4,'a'),"x")
x = R.gen()
a = K.gen()
v = [1,a,0,0,1,1,1,a,a,0,1]
Given a list/vector v of length n (I will set this n and v at the beginning), I want to get the polynomial v(x) as the sum of v[i]*x^i.
(Actually, after getting this v(x) from above, I am going to build the quotient ring GF(4,'a')[x] / < x^n - v(x) >.) Then I will write:
S = R.quotient(x^n-v(x), 'y')
y = S.gen()
But I couldn't write it.
This is a frequently asked question in many places so it is better to leave it here as an answer although the answer I have is so simple:
I just wrote R(v) and it gave me the polynomial:
sage
n = 11
K = GF(4,'a')
R = PolynomialRing(GF(4,'a'),"x")
x = R.gen()
a = K.gen()
v = [1,a,0,0,1,1,1,a,a,0,1]
R(v)
x^10 + a*x^8 + a*x^7 + x^6 + x^5 + x^4 + a*x + 1
Basically (that is, ignoring the specifics of your polynomial ring) you have a list/vector v of length n and you require a polynomial which is the sum of all v[i]*x^i. Note that this sum equals the matrix product V.X where V is a one row matrix (essentially equal to the vector v) and X is a column matrix consisting of powers of x. In Maxima you could write
v: [1,a,0,0,1,1,1,a,a,0,1]$
n: length(v)$
V: matrix(v)$
X: genmatrix(lambda([i,j], x^(i-1)), n, 1)$
V.X;
The output is
x^10+a*x^8+a*x^7+x^6+x^5+x^4+a*x+1
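The same "coefficient list in, polynomial out" idea can be sketched in plain Python. This is only an illustration of the sum v[i]*x^i, not of Sage's R(v). It works over the integers modulo a prime rather than GF(4), the value 2 is a hypothetical stand-in for the generator a, and both helper functions are my own.

```python
def poly_eval(coeffs, x, modulus):
    """Evaluate sum(coeffs[i] * x**i) modulo `modulus` using Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % modulus
    return result


def poly_to_string(coeffs, var="x"):
    """Render the coefficient list (lowest degree first) as a readable polynomial."""
    terms = []
    for i in range(len(coeffs) - 1, -1, -1):
        c = coeffs[i]
        if c == 0:
            continue
        if i == 0:
            terms.append(str(c))
        else:
            power = var if i == 1 else f"{var}^{i}"
            terms.append(power if c == 1 else f"{c}*{power}")
    return " + ".join(terms) if terms else "0"


# 2 stands in for `a`; arithmetic is modulo 5 here, NOT in GF(4)
v = [1, 2, 0, 0, 1, 1, 1, 2, 2, 0, 1]
text = poly_to_string(v)
value = poly_eval(v, 3, 5)
print(text)
print(value)
```

The printed polynomial has the same shape as the Sage output above, with 2 in place of a.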
