Evaluating log-likelihood of unseen data in rstan - r

I understand I can calculate the log likelihood of each sample during sampling, e.g.
...
model {
for (i in 1:N) {
(y[i] - 1) ~ bernoulli(p[i, 2]);
}
}
generated quantities {
vector[N] log_lik;
for (i in 1:N){
log_lik[i] = bernoulli_lpmf((y[i] - 1) | p[i, 2]);
}
}
After fitting, I can then extract log likelihood using the loo package:
log_lik_m <- extract_log_lik(stan_fit)
But I want to evaluate log likelihood of unseen data. This is possible in brms:
ll <- log_lik(fit_star, newdata = new_df)
But I would like to do this with rstan, since I can't easily define my model in brms (I am assuming).
For reference, I am trying to use Estimated LFO-CV to evaluate and compare my time-series model.
(e.g. https://github.com/paul-buerkner/LFO-CV-paper/blob/master/sim_functions.R#L186)
(https://mc-stan.org/loo/articles/loo2-lfo.html)

Thanks to the link from @dipetkov, I solved this myself. I didn't use the exact methods in the link, but came up with an alternative. You can call Stan functions from R to compute the log likelihood for your model, even with unseen data (and it's very fast!).
First, I put everything in my transformed parameters block into a function in Stan's functions block. Then, I created a second function that wraps the first one and evaluates the log likelihood for given observations and provided parameter estimates (I then removed my generated quantities block). rstan has a function, expose_stan_functions, which adds all functions in the Stan functions block to the R environment.
You can then call the log likelihood function you made to evaluate your model with any observations (previously seen or unseen), along with a set of parameter estimates.
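As a rough sketch of the last step, here is what evaluating the log likelihood of unseen data over a set of posterior draws could look like in plain R. The function bernoulli_log_lik is a hypothetical stand-in for whatever function you expose with rstan::expose_stan_functions(), and the posterior draws and new observations are made up for illustration.

```r
# Hypothetical stand-in for a function exposed from the Stan functions block:
# log likelihood of y (coded 1/2, as in the question) given success probabilities p.
bernoulli_log_lik <- function(y, p) {
  dbinom(y - 1, size = 1, prob = p, log = TRUE)
}

# Made-up posterior draws: S draws of the success probability
# for each of N_new unseen observations.
set.seed(1)
S <- 200
N_new <- 5
p_draws <- matrix(rbeta(S * N_new, 2, 2), nrow = S, ncol = N_new)
y_new <- c(1, 2, 2, 1, 2)

# S x N_new matrix: one row per posterior draw, one column per new observation.
log_lik_new <- t(apply(p_draws, 1, function(p) bernoulli_log_lik(y_new, p)))

# Log pointwise predictive density: log of the posterior mean of the likelihood,
# computed with the log-mean-exp trick for numerical stability.
log_mean_exp <- function(v) max(v) + log(mean(exp(v - max(v))))
lpd_new <- apply(log_lik_new, 2, log_mean_exp)
```

The resulting matrix has the same draws-by-observations layout as the one returned by extract_log_lik, so it can be summarised in the same way.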

Related

What does the summary function do to the output of regsubsets?

Let me preface this by saying that I do think this question is a coding question, not a statistics question. It would almost surely be closed over at Stats.SE.
The leaps package in R has a useful function for model selection called regsubsets which, for any given size of a model, finds the variables that produce the minimum residual sum of squares. Now I am reading the book Linear Models with R, 2nd Ed., by Julian Faraway. On pages 154-5, he has an example of using the AIC for model selection. The complete code to reproduce the example runs like this:
data(state)
statedata = data.frame(state.x77, row.names=state.abb)
require(leaps)
b = regsubsets(Life.Exp~.,data=statedata)
rs = summary(b)
rs$which
AIC = 50*log(rs$rss/50) + (2:8)*2
plot(AIC ~ I(1:7), ylab="AIC", xlab="Number of Predictors")
The rs$which command produces the output of the regsubsets function and allows you to select the model once you've plotted the AIC and found the number of parameters that minimizes the AIC. But here's the problem: while the typed-up example works fine, I'm having trouble with the wrong number of elements in the array when I try to use this code and adapt it to other data. For example:
require(faraway)
data(odor, package='faraway')
b=regsubsets(odor~temp+gas+pack+
I(temp^2)+I(gas^2)+I(pack^2)+
I(temp*gas)+I(temp*pack)+I(gas*pack),data=odor)
rs=summary(b)
rs$which
AIC=50*log(rs$rss/50) + (2:10)*2
produces a warning message:
Warning message:
In 50 * log(rs$rss/50) + (2:10) * 2 :
longer object length is not a multiple of shorter object length
Sure enough, length(rs$rss)=8, but length(2:10)=9. Now what I need to do is model selection, which means I really ought to have an RSS value for each model size. But if I choose b$rss in the AIC formula, it doesn't work with the original example!
So here's my question: what is summary() doing to the output of the regsubsets() function? The number of RSS values is not only not the same, but the values themselves are not the same.
Ok, so you know the help page for regsubsets says
regsubsets returns an object of class "regsubsets" containing no
user-serviceable parts. It is designed to be processed by
summary.regsubsets.
You're about to find out why.
The code in regsubsets calls Alan Miller's Fortran 77 code for subset selection. That is, I didn't write it and it's in Fortran 77. I do understand the algorithm. In 1996 when I wrote leaps (and again in 2017 when I made a significant modification) I spent enough time reading the code to understand what the variables were doing, but regsubsets mostly followed the structure of the Fortran driver program that came with the code.
The rss field of the regsubsets object has that name because it stores a variable called RSS in the Fortran code. This variable is not the residual sum of squares of the best model. RSS is computed in the setup phase, before any subset selection is done, by the subroutine SSLEAPS, which is commented 'Calculates partial residual sums of squares from an orthogonal reduction from AS75.1.' That is, RSS describes the RSS of the models with no selection fitted from left to right in the design matrix: the model with just the leftmost variable, then the leftmost two variables, and so on. There's no reason anyone would need to know this if they're not planning to read the Fortran so it's not documented.
The code in summary.regsubsets extracts the residual sum of squares in the output from the $ress component of the object, which comes from the RESS variable in the Fortran code. This is an array whose [i,j] element is the residual sum of squares of the j-th best model of size i.
All the model criteria are computed from $ress in the same loop of summary.regsubsets, which can be edited down to this:
for (i in ll$first:min(ll$last, ll$nvmax)) {
for (j in 1:nshow) {
vr <- ll$ress[i, j]/ll$nullrss
rssvec <- c(rssvec, ll$ress[i, j])
rsqvec <- c(rsqvec, 1 - vr)
adjr2vec <- c(adjr2vec, 1 - vr * n1/(n1 + ll$intercept -
i))
cpvec <- c(cpvec, ll$ress[i, j]/sigma2 - (n1 + ll$intercept -
2 * i))
bicvec <- c(bicvec, (n1 + ll$intercept) * log(vr) +
i * log(n1 + ll$intercept))
}
}
cpvec gives you the same information as AIC, but if you want AIC it would be straightforward to do the same loop and compute it.
regsubsets has a nvmax parameter to control the "maximum size of subsets to examine". By default this is 8. If you increase it to 9 or higher, your code works.
Please note though, that the 50 in your AIC formula is the sample size (i.e. 50 states in statedata). So for your second example, this should be nrow(odor), so 15.
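To avoid hard-coding both the sample size and the 2:8 sequence, the AIC formula from the book can be wrapped in a small helper. This is only a sketch: aic_from_rss is a made-up name, and the RSS values here come from three hand-picked lm fits on the state data (so it runs without leaps) rather than from a best-subsets search. With a regsubsets summary rs, the call would presumably be aic_from_rss(rs$rss, nrow(odor)).

```r
# AIC up to an additive constant: n * log(RSS / n) + 2 * (number of parameters).
# Assumes rss[i] belongs to a model with i predictors plus an intercept.
aic_from_rss <- function(rss, n) {
  n_par <- seq_along(rss) + 1
  n * log(rss / n) + 2 * n_par
}

statedata <- data.frame(state.x77, row.names = state.abb)

# Illustrative models with 1, 2 and 3 predictors (not a best-subsets search).
fits <- list(
  lm(Life.Exp ~ Murder, data = statedata),
  lm(Life.Exp ~ Murder + HS.Grad, data = statedata),
  lm(Life.Exp ~ Murder + HS.Grad + Frost, data = statedata)
)
rss <- sapply(fits, function(m) sum(residuals(m)^2))

aic <- aic_from_rss(rss, n = nrow(statedata))
aic
```

Because the penalty vector is built from length(rss), it always has the same length as the RSS vector, so the recycling warning cannot occur.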

Two calculation formulas of density (pdf) of a bivariate normal distribution returning different results

With the code I’m calculating the density of a bivariate normal distribution. Here I use two formulas which should return the same result.
The first formula uses the dmvnorm of the mvtnorm package and the second formula uses the formula from Wikipedia (https://en.wikipedia.org/wiki/Multivariate_normal_distribution).
When the standard deviation of both distributions equals one (the covariance matrix has only ones on the primary diagonal), the results are the same. But when you change the two diagonal entries of the covariance matrix to two or one third, the results differ.
I have (I hope) read the help page properly, and also this document (https://cran.r-project.org/web/packages/mvtnorm/vignettes/MVT_Rnews.pdf).
I also found this question on Stack Overflow (How to calculate multivariate normal distribution function in R), because perhaps my covariance matrix is defined incorrectly.
But so far I couldn't find an answer.
So my question: why does my code return different results when the standard deviation does not equal one?
I hope I gave enough information, but if something is missing please comment and I will edit my question.
Many thanks in advance!
And now my code:
library(mvtnorm) # for loading the package if necessary
mu=c(0,0)
rho=0
sigma=c(1,1) # the standard deviation which should be changed to two or one third or… to see the different results
S=matrix(c(sigma[1],0,0,sigma[2]),ncol=2,byrow=TRUE)
x=rmvnorm(n=100,mean=mu,sigma=S)
dim(x) # for control
x[1:5,] # for visualization
# defining a function
Comparison=function(Points=x,mean=mu,sigma=S,quantity=4) {
for (i in 1:quantity) {
print(paste0("The ",i," random point"))
print(Points[i,])
print("The following two results should be the same")
print("Result from the function 'dmvnorm' out of package 'mvtnorm'")
print(dmvnorm(Points[i,],mean=mu,sigma=sigma,log=FALSE))
print("Result from equation out of wikipedia")
print(1/(2*pi*S[1,1]*S[2,2]*(1-rho^2)^(1/2))*exp((-1)/(2*(1-rho^2))*(Points[i,1]^2/S[1,1]^2+Points[i,2]^2/S[2,2]^2-(2*rho*Points[i,1]*Points[i,2])/(S[1,1]*S[2,2]))))
print("----")
print("----")
} # end for-loop
} # end function
# execute the function and compare the results
Comparison(Points=x,mean=mu,sigma=S,quantity=4)
Remember that S is the variance-covariance matrix, so its diagonal entries are variances. The formula you use from Wikipedia is written in terms of the standard deviations, not the variances. Hence you need to plug the square roots of the diagonal entries into the formula. This is also why it works when you choose 1 as the diagonal entries (both the variance and the SD are 1).
See your modified code below:
library(mvtnorm) # for loading the package if necessary
mu=c(0,0)
rho=0
sigma=c(2,1) # the standard deviation which should be changed to two or one third or… to see the different results
S=matrix(c(sigma[1],0,0,sigma[2]),ncol=2,byrow=TRUE)
x=rmvnorm(n=100,mean=mu,sigma=S)
dim(x) # for control
x[1:5,] # for visualization
# defining a function
Comparison=function(Points=x,mean=mu,sigma=S,quantity=4) {
for (i in 1:quantity) {
print(paste0("The ",i," random point"))
print(Points[i,])
print("The following two results should be the same")
print("Result from the function 'dmvnorm' out of package 'mvtnorm'")
print(dmvnorm(Points[i,],mean=mu,sigma=sigma,log=FALSE))
print("Result from equation out of wikipedia")
SS <- sqrt(S)
print(1/(2*pi*SS[1,1]*SS[2,2]*(1-rho^2)^(1/2))*exp((-1)/(2*(1-rho^2))*(Points[i,1]^2/SS[1,1]^2+Points[i,2]^2/SS[2,2]^2-(2*rho*Points[i,1]*Points[i,2])/(SS[1,1]*SS[2,2]))))
print("----")
print("----")
} # end for-loop
} # end function
# execute the function and compare the results
Comparison(Points=x,mean=mu,sigma=S,quantity=4)
So the comment where you define sigma is not correct. Judging by how you construct S, sigma in your code holds the variances, not the standard deviations.
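The same point can be checked without mvtnorm. The sketch below uses two made-up helper functions: bvn_density implements the bivariate formula in terms of standard deviations and correlation, and mvn_density implements the general matrix formula in terms of the covariance matrix. The numbers are arbitrary.

```r
# Bivariate normal density written with standard deviations and correlation.
bvn_density <- function(x, mu, sd, rho) {
  z1 <- (x[1] - mu[1]) / sd[1]
  z2 <- (x[2] - mu[2]) / sd[2]
  q <- (z1^2 - 2 * rho * z1 * z2 + z2^2) / (1 - rho^2)
  exp(-q / 2) / (2 * pi * sd[1] * sd[2] * sqrt(1 - rho^2))
}

# General multivariate normal density written with the covariance matrix.
mvn_density <- function(x, mu, Sigma) {
  d <- x - mu
  quad <- sum(d * solve(Sigma, d))
  exp(-quad / 2) / sqrt((2 * pi)^length(x) * det(Sigma))
}

mu <- c(0, 0)
sd <- c(2, 1 / 3)   # standard deviations
rho <- 0.3
# The covariance matrix holds variances on the diagonal: sd^2, not sd.
Sigma <- matrix(c(sd[1]^2,             rho * sd[1] * sd[2],
                  rho * sd[1] * sd[2], sd[2]^2), ncol = 2, byrow = TRUE)
x <- c(0.5, -0.2)

bvn_density(x, mu, sd, rho)     # matches the matrix formula
mvn_density(x, mu, Sigma)
bvn_density(x, mu, sd^2, rho)   # variances plugged in as SDs: a different number
```

The first two values agree; the third differs unless both standard deviations equal 1.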

Runtime error in JAGS

I'm attempting to do this in JAGS:
z[l] ~ dbeta(0.5,0.5)
y[i,l] ~ z[l]*dnorm(0,10000) + inprod(1-z[l],dnegbin(exp(eta_star[i,l]),alpha[l]))
(dnorm(0,10000) models a Dirac delta in 0: see here if you are interested in the model).
But I get:
RUNTIME ERROR:
Incorrect number of arguments in function dnegbin
But if I do this:
y[i,l] ~ dnegbin(exp(eta_star[i,l]),alpha[l])
It runs just fine. I suspect that I cannot multiply a value by a distribution, so I imagined that something like this could work:
z[l] ~ dbeta(0.5,0.5)
pointmass_0[l] ~ dnorm(0,10000)
y[i,l] ~ dnegbin(exp(eta_star[i,l]),alpha[l])
y_star[i,l] = z[l]*pointmass_0[l]+inprod(1-z[l],y[i,l])
If I run that I get:
ystar[1,1] is a logical node and cannot be observed
You are looking to fit a zero-inflated negative binomial model. You can do this in JAGS if you use the "ones trick", a pseudo-likelihood method that can be used when the distribution of your outcome variables is not one of the standard distributions in JAGS but you can still write down an expression for the likelihood.
The "ones trick" consists of creating pseudo-observations with the value 1. These are then modeled as Bernoulli random variables with probability parameter Lik/C, where Lik is the likelihood of your observations and C is a large constant to ensure that Lik/C << 1.
data {
C <- 10000
for (i in 1:N) {
one[i,1] <- 1
}
}
model {
for (i in 1:N) {
one[i,1] ~ dbern(lik[i,1]/C)
lik[i,1] <- (y[i,1]==0)*z[1] + (1 - z[1]) * lik.NB[i,1]
lik.NB[i,1] <- dnegbin(y[i,1], exp(eta_star[i,1]), alpha[1])
}
z[1] ~ dbeta(0.5,0.5)
}
Note that the name dnegbin is overloaded in JAGS. There is a distribution that has two parameters and a function that takes three arguments and returns the likelihood. We are using the latter.
I am thinking of adding zero-inflated versions of count distributions to JAGS, since the above construction is quite awkward for the user, whereas zero-inflated distributions are quite easy to implement internally in JAGS.
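To see what the ones trick computes, the mixture likelihood can be evaluated directly in R. This is a sketch with made-up parameter values; zinb_lik is a hypothetical helper, and it assumes that R's dnbinom(y, size = r, prob = p) uses the same (p, r) parameterization as dnegbin in JAGS.

```r
# Zero-inflated negative binomial likelihood:
# a point mass at zero with weight z, mixed with a negative binomial.
zinb_lik <- function(y, z, p, r) {
  (y == 0) * z + (1 - z) * dnbinom(y, size = r, prob = p)
}

y <- c(0, 0, 3, 7, 0, 1)                    # made-up counts
lik <- zinb_lik(y, z = 0.3, p = 0.4, r = 2)

# Ones trick: each pseudo-observation of 1 is Bernoulli with probability lik / C.
C <- 10000
bern_prob <- lik / C

# Log likelihood of the pseudo-observations; it equals the true log likelihood
# shifted by the constant -length(y) * log(C), so inference is unaffected.
pseudo_loglik <- sum(dbinom(1, size = 1, prob = bern_prob, log = TRUE))
true_loglik <- sum(log(lik))
```

The constant C only has to be large enough that every lik/C is a valid probability.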
I too would like to know a better way to handle this situation.
One cheesy solution is to add a stochastic node
ystarstar[i,j] ~ dnorm(ystar[i,j],10000000)
(i.e. a Normal distribution with a very high precision, or a Dirac delta in your terminology) to the model.

R function for Likelihood

I'm trying to analyze repairable systems reliability using growth models.
I have already fitted a Crow-AMSAA model, but I wonder if there is any package or code in R for fitting a Generalized Renewal Process (Kijima Model I or Model II)
and finding its parameters Beta, Lambda (or alpha) and q
(or some other model for the mean cumulative function, MCF).
The equation number 15 of this article gives an expression for the
Log-likelihood
I tried to create the function like this:
likelihood.G1=function(theta,x){
# x is a vector with the failure times, theta vector of parameters
a=theta[1] #Alpha
b=theta[2] #Beta
q=theta[3] #q
logl2=log(b/a) # First part of the equation
for (i in 1:length(x)){
logl2=logl2 +(b-1)*log(x[i]/(a*(1+q)^(i-1))) -(x[i]/(a*(1+q)^(i-1)))^b
}
return(-logl2) #Negative of the log-likelihood
}
And then use some routine to minimize -log(L):
theta=c(0.5,1.2,0.8) #Start parameters (lambda,beta,q)
nlm(likelihood.G1,theta, x=Data)
Or also
optim(theta,likelihood.G1,method="BFGS",x=Data)
However, there seems to be some mistake, since the parameters it returns make no sense.
Any ideas what I'm doing wrong?
Thanks
Looking at equation (16) of the paper you reference and comparing it with your code it looks like you are missing one term in the for loop. It seems that each data point contributes to three terms of the log-likelihood but in your code (inside the loop) you only have two terms (not considering the updating term)
Specifically, your code does not include the 4th term in equation (16), nor the 7th term, and so on. This is at least one error in the code. An extra consideration is that α and β are constrained to be greater than zero. I am not sure whether the solver you are using takes this constraint into account.
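One common way to respect positivity constraints with an unconstrained optimizer is to optimize over the logarithms of the parameters. The sketch below shows this for an ordinary two-parameter Weibull likelihood on simulated data; it is a stand-in to illustrate the reparameterization, not the GRP likelihood from the paper, and the names are made up.

```r
set.seed(42)
x <- rweibull(500, shape = 1.5, scale = 2)   # simulated data, not from the question

# Negative log-likelihood in terms of log(scale) and log(shape), so that any
# real-valued theta maps to strictly positive parameters.
negloglik_log <- function(theta, x) {
  a <- exp(theta[1])   # scale > 0
  b <- exp(theta[2])   # shape > 0
  -sum(dweibull(x, shape = b, scale = a, log = TRUE))
}

fit <- optim(c(0, 0), negloglik_log, x = x)  # Nelder-Mead by default
est <- exp(fit$par)                          # back-transform to (scale, shape)
est
```

An alternative is to keep the original parameters and pass lower bounds to optim with method = "L-BFGS-B".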

Likelihood maximization in R

In R, I wrote a log-likelihood function containing two recursive calculations. The log-likelihood function works properly (it gives an answer for known parameter values), but when I try to maximize it using optim(), it takes too much time. How can I optimize the code? Thanks in advance for any ideas.
This is the log-likelihood function for a markov regime switching model with a dependence structure using copula functions.
Named g in the for loop:
Named p in the for loop:
Named f in the codes:
Some data:
u <- cbind(rt(100,10),rt(100,13))
f function:
f=function(u,p,e1,e2){
s=diag(2);s[1,2]=p
ff=dcopula.gauss(cbind(pt(u[,1],e1),pt(u[,2],e2)),Sigma=s)*dt(u[,1],e1)*dt(u[,2],e2)
return(ff)
}
log-likelihood function:
loglik=function(x){
p11<-x[1];p12<-x[2];p21<-x[3];p22<-x[4];p31<-x[5];p32<-x[6];r<-x[7];a1<-x[8];a2<-x[9];s<-x[10];b1<-x[11];b2<-x[12];t<-x[13];c1<-x[14];c2<-x[15]
p1=c(numeric(nrow(u)));p2=c(numeric(nrow(u)));p3=c(numeric(nrow(u)))
g=c(numeric(nrow(u)))
p1_0=.3
p2_0=.3
g[1]<-(p1_0*f(u,r,a1,a2)[1])+(p2_0*f(u,s,b1,b2)[1])+((1-(p1_0+p2_0))*f(u,t,c1,c2)[1])
p1[1]<-((p1_0*p11*f(u,r,a1,a2)[1])+(p2_0*p21*f(u,r,a1,a2)[1])+((1-(p1_0+p2_0))*p31*f(u,r,a1,a2)[1]))/g[1]
p2[1]<-((p1_0*p12*f(u,s,b1,b2)[1])+(p2_0*p22*f(u,s,b1,b2)[1])+((1-(p1_0+p2_0))*p32*f(u,s,b1,b2)[1]))/g[1]
p3[1]<-((p1_0*(1-(p11+p12))*f(u,t,c1,c2)[1])+(p2_0*(1-(p21+p22))*f(u,t,c1,c2)[1])+((1-(p1_0+p2_0))*(1-(p31+p32))*f(u,t,c1,c2)[1]))/g[1]
for(i in 2:nrow(u)){
g[i]<-(p1[i-1]*p11*f(u,r,a1,a2)[i])+(p1[i-1]*p12*f(u,s,b1,b2)[i])+(p1[i-1]*(1-(p11+p12))*f(u,t,c1,c2)[i])+
(p2[i-1]*p21*f(u,r,a1,a2)[i])+(p2[i-1]*p22*f(u,s,b1,b2)[i])+(p2[i-1]*(1-(p21+p22))*f(u,t,c1,c2)[i])+
(p3[i-1]*p31*f(u,r,a1,a2)[i])+(p3[i-1]*p32*f(u,s,b1,b2)[i])+(p3[i-1]*(1-(p31+p32))*f(u,t,c1,c2)[i])
p1[i]<-((p1[i-1]*p11*f(u,r,a1,a2)[i])+(p1[i-1]*p12*f(u,s,b1,b2)[i])+(p1[i-1]*(1-(p11+p12))*f(u,t,c1,c2)[i]))/g[i]
p2[i]<-((p2[i-1]*p21*f(u,r,a1,a2)[i])+(p2[i-1]*p22*f(u,s,b1,b2)[i])+(p2[i-1]*(1-(p21+p22))*f(u,t,c1,c2)[i]))/g[i]
p3[i]<-((p3[i-1]*p31*f(u,r,a1,a2)[i])+(p3[i-1]*p32*f(u,s,b1,b2)[i])+(p3[i-1]*(1-(p31+p32))*f(u,t,c1,c2)[i]))/g[i]
}
return(-sum(log(g)))
}
Optimization:
library(QRM)
library(copula)
start=list(0,1,0,0,0,0,1,9,7,-1,10,13,1,6,4)
##
optim(start,loglik,lower=c(rep(0,6),-1,1,1,-1,1,1,-1,1,1),
upper=c(rep(1,6),1,Inf,Inf,1,Inf,Inf,1,Inf,Inf),
method="L-BFGS-B") -> fit
This looks like a question for Stack-Overflow.
Something that springs to my mind is:
Define a vector containing the values f(.,.,.,.) in order to avoid doing k*nrow(u) evaluations of the same function and simply call those entries of interest.
It seems like the loop could be replaced by matrix and/or vector products. However, without further information it is unclear what the code is doing and it would take eons to extract this information from the code.
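A sketch of both suggestions, under stated assumptions: the regime densities are evaluated once for all rows and stored in a matrix, and the recursion is written as a standard forward (Hamilton-type) filter using a vector-matrix product. This is the textbook form of the filter and is not necessarily identical to the recursion in the question. The normal densities stand in for f(u, r, a1, a2) and friends, and all names and numbers are made up.

```r
# dens: n x K matrix, dens[i, k] = density of observation i under regime k
# P:    K x K transition matrix (rows sum to 1)
# p0:   initial regime probabilities
regime_negloglik <- function(dens, P, p0) {
  n <- nrow(dens)
  g <- numeric(n)
  pred <- p0                         # predicted regime probabilities
  for (i in seq_len(n)) {
    joint <- pred * dens[i, ]        # regime-wise contribution of observation i
    g[i] <- sum(joint)               # likelihood of observation i given the past
    filt <- joint / g[i]             # filtered regime probabilities
    pred <- as.vector(filt %*% P)    # one-step-ahead prediction
  }
  -sum(log(g))
}

set.seed(1)
y <- rnorm(100)
# Evaluate each regime density once for all rows, outside the loop.
dens <- cbind(dnorm(y, 0, 1), dnorm(y, 0, 2), dnorm(y, 1, 1))
P <- matrix(c(0.8, 0.1, 0.1,
              0.1, 0.8, 0.1,
              0.1, 0.1, 0.8), nrow = 3, byrow = TRUE)
p0 <- c(0.3, 0.3, 0.4)

nll <- regime_negloglik(dens, P, p0)
```

With this layout each call of the objective needs only three vectorized density evaluations, instead of several inside every loop iteration.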
