STAN - Real parameter given a categorical vector - stan

I am a beginner with Stan (and with Stack Overflow, by the way; I have absolutely no idea how to pretty-print a data frame here, sorry).
Let's say I want to fit the following model:
y ~ normal(I*S + P*J, sigma)
P ~ normal(dP, 1)
(to simplify this example, I fix the standard deviation around P at 1)
where, on one hand, I is an n-by-p predictor matrix and S holds the corresponding regression coefficients (length p),
and, on the other hand, dP and J can each take only 3 different values, but my data frame is constructed such that they look like this (in R):
dp <- c(0,0,0,0,0,0,0,2,2,2,2,2,2,2,1,1,1,1,1,1)
J <- c(5.2,5.2,....,2.3,2.3,....,7.5,7.5,...)
The parameters are S, P, and sigma.
I do not want Stan to vary every component of P: dp encodes 3 types of data, and I only want three different values of P, one for each distinct value of dp.
However, each row of my data frame contains different values of I.
Edit: said another way, for each row k I want:
y[k] ~ I[k,1]*S[1] + I[k,2]*S[2] + ... + real_value_P * J[k]
How can I achieve that?
Here is my code:
data {
  int<lower=1> NR;     // number of rows
  int<lower=1> NC;     // number of columns
  matrix[NR, NC] I;    // predictor I
  vector[NR] dP;
  vector[NR] J;
  vector[NR] y;        // outcome
}
parameters {
  real<lower=0> sigma; // error SD
  vector[NC] S;
  vector[NC] P;
}
model {
  P ~ normal(dP, 1);
  y ~ normal(I*S + P*J, sigma);
}
I am not sure I have been really clear; statistics is still a tough subject for me, and my model is a bit more complicated than presented here.
Thanks

The "trick" is to indicate in a vector ("indices" here) which value of P each row should use (1,1,1,1,1,...,2,2,2,2,2,...,3,3,3,3,3), and then to loop over it in transformed parameters to assign the correct value:
transformed parameters {
  vector[NR] JP; // J*P
  for (k in 1:NR) {
    JP[k] = True_P[indices[k]] * J[k];
  }
}
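For completeness, here is one way to build that indices vector in R from dp (a minimal sketch, assuming dp takes exactly the values 0, 1, and 2):
# factor() sorts the distinct values (0 < 1 < 2), so 0 -> 1, 1 -> 2, 2 -> 3
indices <- as.integer(factor(dp))
P <- sort(unique(dp)) # the three distinct dP values, in matching order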
Hence the complete code:
data {
  int<lower=1> NR;   // number of rows
  int<lower=1> NC;   // number of columns
  matrix[NR, NC] I;  // predictor I
  int indices[NR];   // group index (1, 2, or 3) for each row
  vector[3] P;       // the three distinct values of dP
  vector[NR] dP;     // no longer needed: its three distinct values are in P
  vector[NR] J;
  vector[NR] y;      // outcome
}
parameters {
  real<lower=0> sigma;   // error SD
  vector<lower=0>[NC] S; // regression coefficients for predictors I
  vector[3] True_P;      // one value of P per group
}
transformed parameters {
  vector[NR] JP; // J*P
  for (k in 1:NR) {
    JP[k] = True_P[indices[k]] * J[k];
  }
}
model {
  for (k in 1:3) {
    P[k] ~ normal(True_P[k], 1);
  }
  y ~ normal(I*S + JP, sigma);
}
generated quantities {
  // the posterior predictive distribution could go here
}
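A minimal sketch of fitting this from R with rstan (the file name and the data list below are illustrative, not from the question):
library(rstan)
stan_data <- list(NR = nrow(I), NC = ncol(I), I = I,
                  indices = indices, P = sort(unique(dp)),
                  J = J, y = y)
fit <- stan(file = "three_P.stan", data = stan_data)
print(fit, pars = c("S", "True_P", "sigma"))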

Related

How to use an if condition to select only the values different from NA in a matrix?

I am trying to apply a function to the elements of an array that are different from NA. I tried to use an if statement with the !is.na function, but I get an error message saying that the "argument is of length zero". Would someone have an idea of how to fix that error, or an alternative way to select only the non-NA values of the matrix?
F <- function(x, a, b, c, d) {
  f <- a*(tanh(b*(x - c)) - d)
  return(f)
}
nlon <- 3241 ; nlat <- 1680
p1 <- 3221 ; p2 <- 1103
pr_new <- matrix(0, nlat, nlon) # for the example
lim <- 10
for (n in 1:nlon) {
  a <- -0.5; b <- 1; c <- 0; d <- 1 #Parameters of F
  if (n < p1) { #left side of the step
    for (m in nlat - lim:nlat) {
      if (!is.na(c(pr_new[m, n]))) { #no calculation on the NA values
        pr_new[m, n] <- F(n, a, b, c, d)
      }
    }
  } else { #right side of the step
    if (is.na(c(pr_new[p2, n]))) { #if we are on the upper step
      for (m in p2 - 1:p2 - 1 - lim) {
        if (!is.na(c(pr_new[m, n]))) { #no calculation on the NA values
          pr_new[m, n] <- F(m, a, b, c, d)
        }
      }
    } else { #if we are on the lower step
      for (m in p2:p2 - lim) {
        if (!is.na(c(pr_new[m, n]))) { #no calculation on the NA values
          pr_new[m, n] <- F(m, a, b, c, d)
        }
      }
    }
  }
}
You can find out what value a loop index had when the error occurred by simply typing the loop index's name at the console:
Error in if (!is.na(c(pr_new[m, n]))) { : argument is of length zero
> m
[1] 0 # R uses 1-based indexing, so an index of 0 is out of range
> n
[1] 1
> str( p2:p2 - lim ) # demonstrating the error
 num 1093
The comment from @zephryl was correct, but it identified only one of the three places where a similar error was made.
for (m in nlat-lim:nlat){ ...
for (m in p2-1:p2-1-lim){ ...
for (m in p2:p2-lim){ ...
In each of these, an expression using both colons and minus signs has been incorrectly constructed, because ":" has higher operator precedence than binary minus. You can find the operator precedence rules on the ?Syntax help page.
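As a quick illustration of that precedence rule (a tiny sketch with arbitrary values):
10:10 - 3   # parsed as (10:10) - 3, yielding the single value 7
10:(10 - 3) # the intended descending sequence 10 9 8 7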
If you correct those three errors, you get code that runs without error:
for (n in 1:nlon){
  a= -0.5; b=1; c=0; d=1 #Parameters of F
  if (n<p1){ #left side of the step
    for (m in (nlat-lim):nlat ){ # fix #1
      if (!is.na(c(pr_new[m,n]))){ #no calculation on the NA values
        pr_new[m,n]=F(n,a,b,c,d)
      }
    }
  }
  else{ #right side of the step
    if (is.na(c(pr_new[p2,n]))) { #if we are on the upper step
      for (m in (p2-1):(p2-1-lim) ){ # fix #2
        if (!is.na(c(pr_new[m,n]))){ #no calculation on the NA values
          pr_new[m,n]=F(m,a,b,c,d)
        }
      }
    }
    else { #if we are on the lower step
      for (m in p2:(p2-lim) ){ # fix # 3
        if (!is.na(c(pr_new[m,n]))){ #no calculation on the NA values
          pr_new[m,n]=F(m,a,b,c,d)
        }
      }
    }
  }
}
Regarding the tangential "answer" from a new user of low rep, I did test the theory that ChatGPT might return a similar answer. I have tried ChatGPT on some questions and noticed that it not only returned incorrect answers but was also unable to learn from its mistakes when they were reported to it. When the title and body of this question were given to ChatGPT, it gave an almost identical answer to the now-deleted one.
The which function can be used to return a vector of indices from an array or matrix, but it is most useful when the arr.ind parameter is set to its non-default value: ..., arr.ind = TRUE. And na.omit can be used to remove cases from matrices; it will, however, remove the entire row for any row that contains even a single NA.
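For instance, a small sketch:
m <- matrix(c(1, NA, 3, 4), nrow = 2)
which(is.na(m), arr.ind = TRUE) # row/column positions of the NA entries
#>      row col
#> [1,]   2   1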

solving matrices using Cramer's rule

So I searched the internet looking for programs implementing Cramer's rule, and there were a few, but apparently those examples were only for fixed-size matrices like 2x2 or 4x4.
However, I am looking for a way to solve an NxN matrix. I started by asking the user for the size of the matrix and for its values, but then I don't know how to move on from there.
I guess my next step is to apply Cramer's rule and get the answers, but I just don't know how. This is the step I'm missing. Can anybody help me, please?
First, you need to calculate the determinant of your equation system's matrix, that is, the matrix consisting of the coefficients from the left-hand side of the equations; call it D.
Then, to calculate the value of a certain variable, take the matrix of your system (from the previous step), replace the coefficients of the corresponding column with the constant terms (from the right-hand side), calculate the determinant of the resulting matrix, call it C, and divide C by D.
A bit more about the replacement in the previous step: say your matrix is 3x3, so you have a system of equations where every a coefficient is multiplied by x, every b by y, and every c by z, and the ds are the constant terms. So, to calculate y, you replace the coefficients that are multiplied by y (the bs in this case) with the ds.
Perform the second step for every variable and your system is solved.
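As a compact illustration of those two steps, here is a sketch in R using the built-in det() (it assumes det(A) is nonzero):
cramer_solve <- function(A, d) {
  D <- det(A) # determinant of the coefficient matrix
  sapply(seq_len(ncol(A)), function(j) {
    Aj <- A
    Aj[, j] <- d # replace column j with the constant terms
    det(Aj) / D
  })
}
# usage: cramer_solve(matrix(c(2, 1, 1, 3), 2, 2), c(3, 5)) solves the 2x2 system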
You can find an example at https://rosettacode.org/wiki/Cramer%27s_rule#C
Although that specific example deals with a 4x4 matrix, the code is written to accommodate a square matrix of any size.
What you need is to calculate determinants: Cramer's rule reduces solving an NxN system to computing determinants of NxN matrices.
If N is not big, you can use Cramer's rule (see the code below), which is quite straightforward. However, this method is not efficient; if your N is big, you need to resort to other methods, such as LU decomposition.
This assumes your data are doubles and the result fits in a double.
#include <stdio.h>
#include <stdlib.h>

/* Determinant by cofactor expansion along the first row. */
double det(double *matrix, int n) {
    if (1 >= n) return matrix[0];
    double *subMatrix = (double*)malloc((n - 1) * (n - 1) * sizeof(double));
    double result = 0.0;
    for (int i = 0; i < n; ++i) {
        /* build the minor that omits row 0 and column i */
        for (int j = 0; j < n - 1; ++j) {
            for (int k = 0; k < i; ++k)
                subMatrix[j*(n - 1) + k] = matrix[(j + 1)*n + k];
            for (int k = i + 1; k < n; ++k)
                subMatrix[j*(n - 1) + (k - 1)] = matrix[(j + 1)*n + k];
        }
        if (i % 2 == 0)
            result += matrix[0*n + i] * det(subMatrix, n - 1);
        else
            result -= matrix[0*n + i] * det(subMatrix, n - 1);
    }
    free(subMatrix);
    return result;
}

int main() {
    double matrix[] = { 1,2,3,4,5,6,7,8,2,6,4,8,3,1,1,2 };
    printf("%lf\n", det(matrix, 4));
    return 0;
}

meaning of brm regression parameters

I am using the brms package to build a multilevel model with a Gaussian process on the predictor x. The model looks like this: make_stancode(y ~ gp(x, cov = "exp_quad", by = groups) + (1 | groups), data = dat), so a GP on the x predictor plus a multilevel group variable. In my case I have 5 groups. I've been looking at the generated code (below) and I'm trying to figure out the meanings and dimensions of some of the parameters.
I see that M_1 is the number of groups.
My questions are:
What is the meaning of N_1? Is it the same as the number of observations, N? It is used here: vector[N_1] z_1[M_1]; // unscaled group-level effects
For Kgp_1 and Mgp_1 (int Kgp_1; and int Mgp_1;), if I have 5 groups, are both Kgp_1 and Mgp_1 equal to 5? If so, why are two variables used?
// generated with brms 1.10.0
functions {
  /* compute a latent Gaussian process
   * Args:
   *   x: array of continuous predictor values
   *   sdgp: marginal SD parameter
   *   lscale: length-scale parameter
   *   zgp: vector of independent standard normal variables
   * Returns:
   *   a vector to be added to the linear predictor
   */
  vector gp(vector[] x, real sdgp, real lscale, vector zgp) {
    matrix[size(x), size(x)] cov;
    cov = cov_exp_quad(x, sdgp, lscale);
    for (n in 1:size(x)) {
      // deal with numerical non-positive-definiteness
      cov[n, n] = cov[n, n] + 1e-12;
    }
    return cholesky_decompose(cov) * zgp;
  }
}
data {
  int<lower=1> N; // total number of observations
  vector[N] Y; // response variable
  int<lower=1> Kgp_1;
  int<lower=1> Mgp_1;
  vector[Mgp_1] Xgp_1[N];
  int<lower=1> Igp_1[Kgp_1];
  int<lower=1> Jgp_1_1[Igp_1[1]];
  int<lower=1> Jgp_1_2[Igp_1[2]];
  int<lower=1> Jgp_1_3[Igp_1[3]];
  int<lower=1> Jgp_1_4[Igp_1[4]];
  int<lower=1> Jgp_1_5[Igp_1[5]];
  // data for group-level effects of ID 1
  int<lower=1> J_1[N];
  int<lower=1> N_1;
  int<lower=1> M_1;
  vector[N] Z_1_1;
  int prior_only; // should the likelihood be ignored?
}
transformed data {
}
parameters {
  real temp_Intercept; // temporary intercept
  // GP hyperparameters
  vector<lower=0>[Kgp_1] sdgp_1;
  vector<lower=0>[Kgp_1] lscale_1;
  vector[N] zgp_1;
  real<lower=0> sigma; // residual SD
  vector<lower=0>[M_1] sd_1; // group-level standard deviations
  vector[N_1] z_1[M_1]; // unscaled group-level effects
}
transformed parameters {
  // group-level effects
  vector[N_1] r_1_1 = sd_1[1] * (z_1[1]);
}
model {
  vector[N] mu = rep_vector(0, N) + temp_Intercept;
  mu[Jgp_1_1] = mu[Jgp_1_1] + gp(Xgp_1[Jgp_1_1], sdgp_1[1], lscale_1[1], zgp_1[Jgp_1_1]);
  mu[Jgp_1_2] = mu[Jgp_1_2] + gp(Xgp_1[Jgp_1_2], sdgp_1[2], lscale_1[2], zgp_1[Jgp_1_2]);
  mu[Jgp_1_3] = mu[Jgp_1_3] + gp(Xgp_1[Jgp_1_3], sdgp_1[3], lscale_1[3], zgp_1[Jgp_1_3]);
  mu[Jgp_1_4] = mu[Jgp_1_4] + gp(Xgp_1[Jgp_1_4], sdgp_1[4], lscale_1[4], zgp_1[Jgp_1_4]);
  mu[Jgp_1_5] = mu[Jgp_1_5] + gp(Xgp_1[Jgp_1_5], sdgp_1[5], lscale_1[5], zgp_1[Jgp_1_5]);
  for (n in 1:N) {
    mu[n] = mu[n] + (r_1_1[J_1[n]]) * Z_1_1[n];
  }
  // priors including all constants
  target += student_t_lpdf(sdgp_1 | 3, 0, 10)
            - 1 * student_t_lccdf(0 | 3, 0, 10);
  target += normal_lpdf(lscale_1 | 0, 0.5)
            - 1 * normal_lccdf(0 | 0, 0.5);
  target += normal_lpdf(zgp_1 | 0, 1);
  target += student_t_lpdf(sigma | 3, 0, 10)
            - 1 * student_t_lccdf(0 | 3, 0, 10);
  target += student_t_lpdf(sd_1 | 3, 0, 10)
            - 1 * student_t_lccdf(0 | 3, 0, 10);
  target += normal_lpdf(z_1[1] | 0, 1);
  // likelihood including all constants
  if (!prior_only) {
    target += normal_lpdf(Y | mu, sigma);
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = temp_Intercept;
}
If you use make_standata(...) on the same formula, you can see the data that would be passed on to Stan. From there, you can piece together what some of the variables do. If I use the lme4::sleepstudy dataset as a proxy for your data, I get:
library(brms)
dat <- lme4::sleepstudy
dat$groups <- dat$Subject
dat$y <- dat$Reaction
dat$x <- dat$Days
s_data <- make_standata(
y ~ gp(x, cov = "exp_quad", by= groups) + (1| groups), data = dat)
s_data$N_1
#> 18
For N_1, I get 18, which is the number of levels of groups in this dataset.
For Kgp_1 and Mgp_1 (int Kgp_1; and int Mgp_1;), if I have 5 groups, are both Kgp_1 and Mgp_1 equal to 5? If so, why are two variables used?
s_data$Mgp_1
#> 1
s_data$Kgp_1
#> 18
It looks like Kgp_1 is again the number of groups. Mgp_1 appears to be the dimension of the GP input, i.e. the number of covariates inside gp() (just x here, so 1); it sets the length of the vectors in vector[Mgp_1] Xgp_1[N];.
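One way to probe this (a sketch; the second covariate x2 is made up): add another variable inside gp() and re-run make_standata, and Mgp_1 should then track the number of GP covariates:
dat$x2 <- rnorm(nrow(dat))
s_data2 <- make_standata(
  y ~ gp(x, x2, cov = "exp_quad", by = groups) + (1 | groups), data = dat)
s_data2$Mgp_1 # expected to be 2 if Mgp_1 counts the covariates inside gp()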

Partially observed parameter in Stan

I am trying to migrate some code from JAGS to Stan. Say I have the following dataset:
N <- 10
nchoices <- 3
ncontrols <- 3
toydata <- list("y" = rbinom(N, nchoices - 1, .5),
                "controls" = matrix(runif(N * ncontrols), N, ncontrols),
                "N" = N,
                "nchoices" = nchoices,
                "ncontrols" = ncontrols)
and that I want to run a multinomial logit with the following code (taken from section 9.5 of the documentation):
data {
  int N;
  int nchoices;
  int y[N];
  int ncontrols;
  vector[ncontrols] controls[N];
}
parameters {
  matrix[nchoices, ncontrols] beta;
}
model {
  for (k in 1:nchoices)
    for (d in 1:ncontrols)
      beta[k,d] ~ normal(0,100);
  for (n in 1:N)
    y[n] ~ categorical(softmax(beta * controls[n]));
}
I now want to fix the first row of beta to zero. In JAGS I would simply declare in the model block that
for (i in 1:ncontrols) {
  beta[1,i] <- 0
}
but I am not sure about how to do this in Stan. I have tried many combinations along the lines of section 6.2 of the documentation (Partially Known Parameters) like, for instance,
parameters {
  matrix[nchoices, ncontrols] betaNonObs;
}
transformed parameters {
  matrix[nchoices, ncontrols] beta;
  for (i in 1:ncontrols) beta[1][i] <- 0
  for (k in 2:nchoices) beta[k] <- betaNonObs[k - 1]
}
but none of them work. Any suggestions?
It would be helpful to mention the error message. In this case, if beta is declared to be a matrix, then the syntax you want is the R-like syntax
beta[1,i] <- 0.0; // you also omitted the semicolon
To answer your broader question, I believe you were on the right track with your last approach. I would create a matrix of parameters in the parameters block called free_beta and copy those elements into another matrix declared in the model block called beta, which has one extra row at the top for the fixed zeros. Like this:
data {
  int N;
  int nchoices;
  int y[N];
  int ncontrols;
  vector[ncontrols] controls[N];
}
parameters {
  matrix[nchoices-1, ncontrols] free_beta;
}
model {
  // copy free_beta into beta
  matrix[nchoices, ncontrols] beta;
  for (d in 1:ncontrols)
    beta[1,d] <- 0.0;
  for (k in 2:nchoices)
    for (d in 1:ncontrols)
      beta[k,d] <- free_beta[k-1,d];
  // priors on free_beta, which execute faster this way
  for (k in 1:(nchoices-1))
    row(free_beta,k) ~ normal(0.0, 100.0);
  // likelihood
  for (n in 1:N)
    y[n] ~ categorical(softmax(beta * controls[n]));
}
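A quick way to try this from R (a sketch; the file name is illustrative, and note that categorical() expects outcomes in 1:nchoices, so the simulated y from the question should be shifted up by one):
library(rstan)
toydata$y <- toydata$y + 1 # rbinom() generates 0-based outcomes
fit <- stan(file = "multilogit.stan", data = toydata,
            chains = 4, iter = 2000)
print(fit, pars = "free_beta")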

Math Problem: Scale a graph so that it matches another

I have 2 tables of values and want to scale the first one so that it matches the second one as well as possible. Both have the same length. If both are drawn as graphs in a diagram, they should be as close to each other as possible. But I do not want quadratic weights, just simple linear weights.
My problem is that I have no idea how to actually compute the best scaling factor, because of the Abs function.
Some pseudocode:
//given:
float[] table1 = ...;
float[] table2 = ...;
//wanted:
float factor = ???; // I have no idea how to compute this
float remainingDifference = 0;
for (int i = 0; i < length; i++)
{
    float scaledValue = table1[i] * factor;
    //Sum up the differences. I use the Abs function because negative differences are differences too.
    remainingDifference += Abs(scaledValue - table2[i]);
}
I want to compute the scaling factor so that the remainingDifference is minimal.
Simple linear weights are hard, like you said.
a_n = first sequence
b_n = second sequence
c = scaling factor
Your residual function is (sums run from i=1 to N, the number of points):
SUM( |a_i - c*b_i| )
Taking the derivative with respect to c yields:
d/dc SUM( |a_i - c*b_i| )
= SUM( -b_i * (a_i - c*b_i)/|a_i - c*b_i| )
Setting this to 0 and solving for c is hard. I don't think there's an analytic way of doing it. You may want to try https://math.stackexchange.com/ to see if they have any bright ideas.
However, if you work with quadratic weights, it becomes significantly simpler:
d/dc SUM( (a_i - c*b_i)^2 )
= SUM( -2*b_i*(a_i - c*b_i) )
= -2 * ( SUM(a_i*b_i) - c*SUM(b_i^2) ) = 0
=> SUM(a_i*b_i) - c*SUM(b_i^2) = 0
=> c = SUM(a_i*b_i) / SUM(b_i^2)
I strongly suggest the latter approach if you can.
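In R, both estimators are short (a sketch; here a is the target sequence and b the one being scaled, as above):
l2_factor <- function(a, b) sum(a * b) / sum(b * b) # closed form for quadratic weights
l1_factor <- function(a, b) {
  # minimising sum(|a - c*b|) = sum(|b| * |a/b - c|) is a weighted median of a/b
  ok <- b != 0
  r <- a[ok] / b[ok]; w <- abs(b[ok])
  o <- order(r)
  r <- r[o]; w <- w[o]
  r[which(cumsum(w) >= sum(w) / 2)[1]] # first ratio where half the total weight is reached
}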
I would suggest trying some variant of Newton-Raphson.
Construct a function Diff(k) that measures the difference in area between your two graphs between fixed markers A and B.
Mathematically, I guess it would be integral( x = A to B ){ |f(x) - k*g(x)| } dx.
Realistically you can just subtract the sampled values: if you range from x = -10 to 10 and have a data point for f(i) and g(i) at each integer i in [-10, 10] (i.e. 21 data points), you just take sum( i = -10 to 10 ){ |f(i) - k*g(i)| }.
Basically you would expect this function to look like a parabola: there will be an optimum k, and deviating slightly from it in either direction will increase the overall area difference, and the bigger the deviation, the bigger the gap.
So this should be a pretty smooth function (if you have a lot of data points).
You want to minimise Diff(k), so you want to find where its derivative is zero, i.e. where d/dk Diff(k) = 0.
So just run Newton-Raphson on this new function D'(k).
Kick it off at k = 1 and it should home in on a solution pretty fast; that is probably going to give you an optimal computation time.
If you want something simpler, start with some k1 and k2 whose slopes D'(k1) and D'(k2) lie on either side of 0.
Say D'(1.5) = -3 and D'(2.9) = 7; then pick a k about 3/10 of the way (10 = 7 - (-3)) between 1.5 and 2.9,
and depending on whether the slope there is positive or negative, use it as the new k2 or k1; rinse and repeat.
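A minimal sketch of that bracketing idea in R (it assumes the slope of Diff is negative at k1 and positive at k2; plain bisection is used for simplicity):
diff_l1 <- function(k, f, g) sum(abs(f - k * g)) # the area difference Diff(k)
slope <- function(k, f, g, h = 1e-6) (diff_l1(k + h, f, g) - diff_l1(k - h, f, g)) / (2 * h)
fit_factor <- function(f, g, k1 = 0, k2 = 10, iters = 60) {
  for (i in seq_len(iters)) {
    k <- (k1 + k2) / 2
    if (slope(k, f, g) > 0) k2 <- k else k1 <- k # keep the sign change bracketed
  }
  (k1 + k2) / 2
}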
In case anyone stumbles upon this in the future, here is some code (C++).
The trick is to first sort the samples by the scaling factor that would give the best fit for each pair of samples, then iterate in from both ends toward the factor that yields the minimum absolute deviation (L1 norm).
Everything except the sort has a linear run time, so the overall runtime is O(n log n).
/*
 * Find x so that the sum over std::abs(pA[i]-pB[i]*x) from i=0 to (n-1) is minimal
 * Then return x
 */
#include <algorithm>
#include <cmath>
#include <memory>

float linearFit(const float* pA, const float* pB, int n)
{
    /*
     * Algebraic solution is not possible for the general case
     * => iterative algorithm
     */
    if (n < 0)
        throw "linearFit has invalid argument: expected n >= 0";
    if (n == 0)
        return 0; //If there is nothing to fit, any factor is a perfect fit (sum is always 0)
    if (n == 1)
        return pA[0] / pB[0]; //return x so that pA[0] = pB[0]*x
    //If you don't like this, use a std::vector :P
    std::unique_ptr<float[]> targetValues_(new float[n]);
    std::unique_ptr<int[]> indices_(new int[n]);
    //Get proper pointers:
    float* targetValues = targetValues_.get(); //The value for x that would cause pA[i] = pB[i]*x
    int* indices = indices_.get(); //Indices of useful (not nan and not infinity) target values
    //The code above guarantees n > 1, so it is safe to get these pointers:
    int m = 0; //Number of useful target values
    for (int i = 0; i < n; i++)
    {
        float a = pA[i];
        float b = pB[i];
        float targetValue = a / b;
        targetValues[i] = targetValue;
        if (std::isfinite(targetValue))
        {
            indices[m++] = i;
        }
    }
    if (m <= 0)
        return 0;
    if (m == 1)
        return targetValues[indices[0]]; //If there is only one target value, then it has to be the best one.
    //sort the indices by target value
    std::sort(indices, indices + m, [&](int ia, int ib){
        return targetValues[ia] < targetValues[ib];
    });
    //Start from the extremes and meet at the optimal solution somewhere in the middle:
    int l = 0;
    int r = m - 1;
    // m >= 2 is guaranteed => l < r
    float penaltyFactorL = std::abs(pB[indices[l]]);
    float penaltyFactorR = std::abs(pB[indices[r]]);
    while (l < r)
    {
        if (l == r - 1 && penaltyFactorL == penaltyFactorR)
        {
            break;
        }
        if (penaltyFactorL < penaltyFactorR)
        {
            l++;
            if (l < r)
            {
                penaltyFactorL += std::abs(pB[indices[l]]);
            }
        }
        else
        {
            r--;
            if (l < r)
            {
                penaltyFactorR += std::abs(pB[indices[r]]);
            }
        }
    }
    //return the best target value
    if (l == r)
        return targetValues[indices[l]];
    else
        return (targetValues[indices[l]] + targetValues[indices[r]]) * 0.5;
}
