Using a parameter shared by a normal distribution and a separate multinormal distribution - Stan

Question
I have two measurements: one is a scalar measurement x and the other is a two-element vector (y, z). The sampling distribution of the former is a normal distribution, and the sampling distribution of the latter is a multinormal distribution. The location parameters for x and for y are assumed to be the same, mu_1. In other words, both x and y provide information about the common parameter mu_1.
Here is my model:
\begin{aligned}
x &\sim \text{Normal}(\mu_1, \sigma_1^2) \\
(y, z) &\sim \text{MultiNormal}((\mu_1, \mu_2), \Sigma)
\end{aligned}
How can I specify this in Stan so that the parameter mu_1 is used in both sampling distributions, and I can thus get a posterior distribution for mu_1?
My confusion here stems from the fact that mu_1 is a scalar parameter in the first distribution but part of a parameter vector in the second. Also, I used a normal distribution and a multinormal distribution as examples just for simplicity; I'm looking for a solution that will work for other combinations of sampling distributions as well.

You can do this with a transformed parameters block, where the vector parameter mu_1_2 = (mu_1, mu_2) is created using previously-defined scalar parameters mu_1 and mu_2.
Here's a basic example:
data {
  real x;
  vector[2] y_z;
  real<lower=0> sigma_x;      // standard deviation of x
  cov_matrix[2] sigma_y_z;    // covariance matrix of (y, z)
}
parameters {
  real mu_1;
  real mu_2;
}
transformed parameters {
  vector[2] mu_1_2;
  mu_1_2[1] = mu_1;
  mu_1_2[2] = mu_2;
}
model {
  x ~ normal(mu_1, sigma_x);  // normal() takes the standard deviation, not the variance
  y_z ~ multi_normal(mu_1_2, sigma_y_z);
}

Related

Random number simulation in R

I have been going through some random number simulation equations and found that R doesn't have a built-in generator for the Pareto distribution.
An rpareto function I found is:
rpareto <- function(n,a,l){
rp <- l*((1-runif(n))^(-1/a)-1)
rp
}
Can someone explain the intuitive meaning behind this?
It's a well known result that if X is a continuous random variable with CDF F(.), then Y = F(X) has a Uniform distribution on [0, 1].
This result can be used to draw random samples of any continuous random variable whose CDF is known: generate u, a Uniform(0, 1) random variable and then determine the value of x for which F(x) = u.
In specific cases, there may well be more efficient ways of sampling from F(.), but this will always work as a fallback.
It's likely (I haven't checked the accuracy of the code myself, but it looks about right) that the body of your function solves F(x) = u for known u in order to generate a random variable with a Pareto distribution. You can check it with a little algebra after getting the CDF from this Wikipedia page.
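For example, assuming the intended distribution is the Pareto Type II (Lomax) with scale l and shape a, the CDF is F(x) = 1 - (1 + x/l)^(-a); solving u = F(x) gives x = l*((1 - u)^(-1/a) - 1), which is exactly the expression in the function body. A quick numerical sketch of that check (illustrative, not part of the original answer):
## Sketch: compare rpareto() draws against the assumed Lomax CDF F(x) = 1 - (1 + x/l)^(-a)
rpareto <- function(n, a, l) {
  l * ((1 - runif(n))^(-1/a) - 1)   # inverse-CDF sampling: solve u = F(x) for x
}
set.seed(1)
a <- 3; l <- 2
draws <- rpareto(1e5, a, l)
xs <- c(0.5, 1, 2, 5)
cbind(empirical = sapply(xs, function(x) mean(draws <= x)),
      analytic  = 1 - (1 + xs / l)^(-a))   # the two columns should agree closely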

How do you generate samples from the logistic distribution using the inverse-CDF method?

My question is how to generate a sample in R from the logistic distribution with the inverse-CDF method. The logistic density is p(θ) = exp(θ)/(1 + exp(θ))^2
Here is the algorithm for that method:
1: for t = 1 to T do
2: sample q(t) ∼ Unif(0, 1)
3: θ(t) ← F^−1(q(t))
4: end for
Here is my code, but it just generates a vector where every entry is the same number. The histogram of the result should look like the (log-concave) logistic density, but it obviously doesn't. What is the problem?
First define T as the number of draws to take from the uniform distribution:
T<-100000
sample_q<-runif(T,0,1)
It seems like plogis will give you the cumulative distribution function, so I suppose I can just take its inverse:
generate_samples_from_logistic_CDF <- function(p) {
for(t in length(T))
cdf<-plogis((1+exp(p)/(exp(p))))
inverse_cdf<-(1/cdf)
return(inverse_cdf)
}
Then generate_samples_from_logistic_CDF(sample_q) should produce the samples, but instead it gives me the same value for everything.
Since the inverse CDF is already coded in R as qlogis(), this should work:
qlogis(runif(100000))
or if you want to do it "by hand" rather than using the built-in qlogis(), you can use R <- runif(100000); log(R/(1-R))
Note that rlogis(100000) should be more efficient.
One of your confusions is that "inverse" in the algorithm description above doesn't mean the multiplicative inverse or reciprocal (i.e. 1/x), but rather the function inverse (which in this case is log(q/(1-q))).
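Putting that together, a minimal corrected sketch of the function from the question (using the built-in qlogis(); the body could equally be log(q/(1-q))):
## Corrected sketch: apply the *function* inverse of the logistic CDF to the uniform draws
generate_samples_from_logistic_CDF <- function(q) {
  qlogis(q)   # equivalently: log(q / (1 - q))
}
T <- 100000
sample_q <- runif(T, 0, 1)
theta <- generate_samples_from_logistic_CDF(sample_q)
hist(theta, breaks = 50)   # should look like a logistic density, not a single value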

Runtime error in JAGS

I'm attempting to do this in JAGS:
z[l] ~ dbeta(0.5,0.5)
y[i,l] ~ z[l]*dnorm(0,10000) + inprod(1-z[l],dnegbin(exp(eta_star[i,l]),alpha[l]))
(dnorm(0,10000) models a Dirac delta in 0: see here if you are interested in the model).
But I get:
RUNTIME ERROR:
Incorrect number of arguments in function dnegbin
But if I do this:
y[i,l] ~ dnegbin(exp(eta_star[i,l]),alpha[l])
It runs just fine. I suspect the problem is that I cannot multiply a distribution by a value, so I imagine that something like this could work:
z[l] ~ dbeta(0.5,0.5)
pointmass_0[l] ~ dnorm(0,10000)
y[i,l] ~ dnegbin(exp(eta_star[i,l]),alpha[l])
y_star[i,l] = z[l]*pointmass_0[l]+inprod(1-z[l],y[i,l])
If I run that I get:
ystar[1,1] is a logical node and cannot be observed
You are looking to model a zero-inflated negative binomial model. You can do this in JAGS if you use the "ones trick", a pseudo-likelihood method that can be used when the distribution of your outcome variables is not one of the standard distributions in JAGS but you can still write down an expression for the likelihood.
The "ones trick" consists of creating pseudo-observations with the value 1. These are then modeled as Bernoulli random variables with probability parameter Lik/C, where Lik is the likelihood of your observations and C is a large constant chosen to ensure that Lik/C << 1.
data {
  C <- 10000
  for (i in 1:N) {
    one[i,1] <- 1
  }
}
model {
  for (i in 1:N) {
    one[i,1] ~ dbern(lik[i,1]/C)
    lik[i,1] <- (y[i,1]==0)*z[1] + (1 - z[1]) * lik.NB[i,1]
    lik.NB[i,1] <- dnegbin(y[i,1], exp(eta_star[i,1]), alpha[1])
  }
  z[1] ~ dbeta(0.5,0.5)
}
Note that the name dnegbin is overloaded in JAGS. There is a distribution that has two parameters and a function that takes three arguments and returns the likelihood. We are using the latter.
I am thinking of adding zero-inflated versions of count distributions to JAGS, since the above construction is quite awkward for the user, whereas zero-inflated distributions are quite easy to implement internally in JAGS.
I too would like to know a better way to handle this situation.
One cheesy solution is to add a stochastic node
ystarstar[i,j] ~ dnorm(ystar[i,j],10000000)
(i.e. a Normal distribution with a very high precision, or a Dirac delta in your terminology) to the model.

Gaussian mixture modeling with mle2/optim

I have an mle2 model that I've developed here just to demonstrate the problem. I generate values from two separate Gaussian distributions, x1 and x2, combine them together to form x = c(x1,x2), and then create an MLE that attempts to re-classify x values as belonging to the left or the right of a specific x value via the xsplit parameter.
The problem is that the parameters found are not ideal. Specifically, xsplit is always returned as whatever its starting value is, and if I change its starting value (e.g., to 4 or 9) there are huge differences in the resulting log-likelihood.
Here is the completely reproducible example:
set.seed(1001)
library(bbmle)
x1 = rnorm(n=100,mean=4,sd=0.8)
x2 = rnorm(n=100,mean=12,sd=0.4)
x = c(x1,x2)
hist(x,breaks=20)
ff = function(m1,m2,sd1,sd2,xsplit) {
outs = rep(NA,length(xvals))
for(i in seq(1,length(xvals))) {
if(xvals[i]<=xsplit) {
outs[i] = dnorm(xvals[i],mean=m1,sd=sd1,log=T)
}
else {
outs[i] = dnorm(xvals[i],mean=m2,sd=sd2,log=T)
}
}
-sum(outs)
}
# change xsplit starting value here to 9 and 4
# and realize the difference in log likelihood
# Why isn't mle finding the right value for xsplit?
mo = mle2(ff,
start=list(m1=1,m2=2,sd1=0.1,sd2=0.1,xsplit=9),
data=list(xvals=x))
#print mo to see log likelihood value
mo
#plot the result
c=coef(mo)
m1=as.numeric(c[1])
m2=as.numeric(c[2])
sd1=as.numeric(c[3])
sd2=as.numeric(c[4])
xsplit=as.numeric(c[5])
leftx = x[x<xsplit]
rightx = x[x>=xsplit]
y1=dnorm(leftx,mean=m1,sd=sd1)
y2=dnorm(rightx,mean=m2,sd=sd2)
points(leftx,y1*40,pch=20,cex=1.5,col="blue")
points(rightx,y2*90,pch=20,cex=1.5,col="red")
How can I modify my mle2 to capture the correct parameters, specifically for xsplit?
Mixture models present a lot of technical challenges (symmetry under relabeling of components, etc.); unless you have very specific needs, you might be better off using one of the large number of special-purpose mixture modeling packages that have been written for R (just library("sos"); findFn("{mixture model}") or findFn("{mixture model} Gaussian")).
However, in this case, you have a more specific problem, which is that the goodness-of-fit/likelihood surface of the xsplit parameter is "bad" (i.e. the derivative is zero almost everywhere). In particular, if you consider a pair of points x1, x2 in your data set that are neighbours, the likelihood is exactly the same for any splitting parameter between x1 and x2 (because any of those values splits the data set into the same two components). That means the likelihood surface is piecewise flat, which makes it almost impossible for any sensible optimizer, even those such as Nelder-Mead that don't explicitly depend on derivatives. Your choices are:
(1) use some sort of brute-force stochastic optimizer (such as method="SANN" in optim());
(2) take xsplit out of your function and profile over it (i.e. for each possible choice of xsplit, optimize over the other four parameters);
(3) smooth your splitting criterion (i.e. fit a logistic probability of belonging to one component or the other);
(4) use a special-purpose mixture model fitting algorithm, as recommended above.
set.seed(1001)
library(bbmle)
x1 = rnorm(n=100,mean=4,sd=0.8)
x2 = rnorm(n=100,mean=12,sd=0.4)
x = c(x1,x2)
Your ff function can be written more compactly:
## ff can be written more compactly:
ff2 <- function(m1,m2,sd1,sd2,xsplit) {
p <- xvals<=xsplit
-sum(dnorm(xvals,mean=ifelse(p,m1,m2),
sd=ifelse(p,sd1,sd2),log=TRUE))
}
## ML estimation
mo <- mle2(ff2,
start=list(m1=1,m2=2,sd1=0.1,sd2=0.1,xsplit=9),
data=list(xvals=x))
## refit with a different starting value for xsplit
mo2 <- update(mo,start=list(m1=1,m2=2,sd1=0.1,sd2=0.1,xsplit=4))
## not used here, but maybe handy
plotfun <- function(mo,xvals=x,sizes=c(40,90)) {
c <- coef(mo)
hist(xvals,col="gray")
p <- xvals <= c["xsplit"]
y <- with(as.list(coef(mo)),
dnorm(xvals,mean=ifelse(p,m1,m2),
sd=ifelse(p,sd1,sd2))*sizes[ifelse(p,1,2)])
points(xvals,y,pch=20,cex=1.5,col=c("blue","red")[ifelse(p,1,2)])
}
plot(slice(mo),ylim=c(-0.5,10))
plot(slice(mo2),ylim=c(-0.5,10))
I cheated a little bit to extract just the xsplit parameter:
Likelihood surface around xsplit=9:
Likelihood surface around xsplit=4:
Also see p. 243 of Bolker 2008.
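As an illustration of option (2) above (not part of the original answer), here is a rough sketch that profiles the negative log-likelihood over a grid of xsplit values, optimizing the other four parameters with optim() at each fixed split. The standard deviations are fitted on the log scale to keep them positive, and the grid limits and starting values are arbitrary:
## Sketch of option (2): profile the negative log-likelihood over xsplit
profile_xsplit <- function(splits, xvals) {
  sapply(splits, function(s) {
    nll <- function(par) {   # par = (m1, m2, log(sd1), log(sd2))
      p <- xvals <= s
      -sum(dnorm(xvals, mean = ifelse(p, par[1], par[2]),
                 sd = exp(ifelse(p, par[3], par[4])), log = TRUE))
    }
    optim(c(mean(xvals), mean(xvals), 0, 0), nll)$value   # minimized NLL at this split
  })
}
splits <- seq(3, 12.5, by = 0.5)
plot(splits, profile_xsplit(splits, x), type = "b",
     xlab = "xsplit", ylab = "negative log-likelihood")
The profile makes the flat steps in the likelihood visible, and its minimum identifies a sensible split without asking the optimizer to move xsplit itself.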
Update: smoothing
As I mentioned above, one solution is to make the boundary between the two mixture components smooth, or gradual, rather than sharp. I used a logistic function plogis() with midpoint at xsplit and a scale arbitrarily set to 2 (you could try to make it sharper; in principle you could make it an adjustable parameter, but if you do that you'll probably run into trouble again because the optimizer may want to make it infinite ...) In other words, rather than saying that all observations with x<xsplit are definitely in component 1 and all observations with x>xsplit are definitely in component 2, we say that observations that are equal to xsplit have a 50/50 probability of falling in either component, with the certainty of being in component 1 increasing as x decreases below xsplit. A logistic function with a very large scaling parameter approximates the sharp-split model previously attempted; generally you want to make the scaling parameter "large enough" to get a reasonable split and small enough not to run into numeric problems. (If you make the scale too large, the computed probabilities will underflow/overflow to 0 or 1 and you'll be back where you started...)
This is my second or third try; I had to do considerable fiddling (bounding values away from 0, or between 0 and 1, and fitting the standard deviations on a log scale), but the results seem reasonable. If I don't use clamp() on the logistic (plogis) function then I get 0 or 1 probabilities; if I don't use clamp() (one-sided) on the Normal probabilities then they can underflow to zero -- in either case I get infinite or NaN outcomes. Fitting the standard deviations on the log scale works better because one doesn't run into problems when the optimizer tries negative values for the standard deviation ...
## bound x values between lwr and upr
clamp <- function(x,lwr=0.001,upr=0.999) {
pmin(upr,pmax(lwr,x))
}
ff3 <- function(m1,m2,logsd1,logsd2,xsplit) {
p <- clamp(plogis(2*(xvals-xsplit)))
-sum(log((1-p)*clamp(dnorm(xvals,m1,exp(logsd1)),upr=Inf)+
p*clamp(dnorm(xvals,m2,exp(logsd2)),upr=Inf)))
}
xvals <- x
ff3(1,2,0.1,0.1,4)
mo3 <- mle2(ff3,
start=list(m1=1,m2=2,logsd1=-1,logsd2=-1,xsplit=4),
data=list(xvals=x))
## Coefficients:
## m1 m2 logsd1 logsd2 xsplit
## 3.99915532 12.00242510 -0.09344953 -1.13971551 8.43767997
The results look reasonable.

Probability transformation using R

I want to turn a continuous random variable X with CDF F_X(x) into a continuous random variable Y with CDF F_Y(y), and am wondering how to implement it in R.
For example, perform a probability transformation on data following a normal distribution (X) to make it conform to a desired Weibull distribution (Y).
(E.g., x = 0 has CDF F_X(0) = 0.5; if F_Y(y) = 0.5 corresponds to y = 5, then x = 0 maps to y = 5, etc.)
There are many built-in distribution functions in R; those starting with 'p' transform to a uniform and those starting with 'q' transform from a uniform. So the transform in your example can be done by:
y <- qweibull( pnorm( x ), 2, 6.0056 )
Then just change the functions and/or parameters for other cases.
The distr package may also be of interest for additional capabilities.
In general, you can transform an observation x on X to an observation y on Y by:
1. getting the probability of X ≤ x, i.e. F_X(x);
2. then determining which observation y has the same probability, i.e. you want P(Y ≤ y) = F_Y(y) to equal F_X(x).
This gives F_Y(y) = F_X(x), and therefore y = F_Y^{-1}(F_X(x)), where F_Y^{-1} is better known as the quantile function Q_Y. The overall transformation from X to Y is summarized as Y = Q_Y(F_X(X)).
In your particular example, from the R help, the distribution function for the normal distribution is pnorm and the quantile function for the Weibull distribution is qweibull, so you want to call pnorm first, then qweibull on the result.
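As a small illustrative sketch of Y = Q_Y(F_X(X)), reusing the Weibull parameters (shape 2, scale 6.0056) from the earlier answer:
## Sketch: transform standard-normal draws into Weibull(shape = 2, scale = 6.0056) draws
set.seed(42)
x <- rnorm(10000)                                    # observations on X
y <- qweibull(pnorm(x), shape = 2, scale = 6.0056)   # y = Q_Y(F_X(x))
hist(y, breaks = 50, freq = FALSE)                   # should match the target density
curve(dweibull(x, shape = 2, scale = 6.0056), add = TRUE, col = "red")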
