Sampling a multidimensional posterior distribution using the MCMC Metropolis-Hastings algorithm in R

I am quite new to sampling posterior distributions (and therefore to the Bayesian approach) using an MCMC technique based on the Metropolis-Hastings algorithm.
I am using the mcmc library in R for this. My distribution is multidimensional. To check that the Metropolis algorithm works for multivariate distributions, I first ran it successfully on a multivariate Student-t distribution (package mvtnorm, function dmvt).
Now I want to apply the same thing to my own multivariate distribution (2 variables, x and y), but it doesn't work; I get the error: Error in X[, 1] : incorrect number of dimensions
Here is my code:
library(mcmc)
library(mvtnorm)
my.seed <- 123
logprior<-function(X,...)
{
ifelse( (-50.0 <= X[,1] & X[,1]<=50.0) & (-50.0 <= X[,2] & X[,2]<=50.0), return(0), return(-Inf))
}
logpost<-function(X,...)
{
log.like <- log( exp(-((X[,1]^2 + X[,2]^2 - 4)/10 )^2) * sin(4*atan(X[,2]/X[,1])) )
log.prior<-logprior(X)
log.post<-log.like + log.prior # if flat prior, the posterior distribution is the likelihood one
return (log.post)
}
x <- seq(-5,5,0.15)
y <- seq(-5,5,0.15)
X<-cbind(x,y)
#out <- metrop(function(X) dmvt(X, df=3, log=TRUE), 0, blen=100, nbatch=100) ; this works
out <- metrop(function(X) logpost(X), c(0,0), blen=100, nbatch=100)
out <- metrop(out)
out$accept
I tried to follow the same format as in the working dmvt example, but it still fails with the error mentioned above.
Note that applying logpost to the matrix X directly works perfectly.
Thanks in advance for your help.

The metrop function evaluates logpost on one state at a time, so it passes a plain numeric vector (here of length 2), not a matrix like X. Hence the solution is to change X[,1] and X[,2] to X[1] and X[2], respectively, in logprior as well as in logpost.
I ran it like this, and it leads to other issues (X[2]/X[1] is NaN for the initial state c(0,0)), but those have more to do with your specific likelihood model and are out of the scope of your question.
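A minimal sketch of the corrected functions with vector indexing. It assumes that points where the question's likelihood expression is zero, negative or NaN should simply get zero posterior density (my assumption about the model, not something stated in the question); the starting value c(1, 0.5) is likewise an arbitrary choice with a finite log density:

```r
library(mcmc)

# Flat log-prior on the square [-50, 50] x [-50, 50].
# X is ONE state: a numeric vector c(x, y), not a matrix.
logprior <- function(X) {
  if (all(X >= -50 & X <= 50)) 0 else -Inf
}

logpost <- function(X) {
  lp <- logprior(X)
  if (!is.finite(lp)) return(-Inf)
  # The question's unnormalized likelihood, indexed with X[1] and X[2]
  like <- exp(-((X[1]^2 + X[2]^2 - 4) / 10)^2) * sin(4 * atan(X[2] / X[1]))
  # Assumption: where this expression is NaN or <= 0 it cannot be a density,
  # so those points get zero posterior density (log density -Inf)
  if (!is.finite(like) || like <= 0) return(-Inf)
  log(like) + lp
}

set.seed(123)
# c(0, 0) gives atan(0/0) = NaN, so start from a point with finite log density
out <- metrop(logpost, initial = c(1, 0.5), nbatch = 100, blen = 100)
out <- metrop(out)   # continue the run from where the first one stopped
out$accept
```

Returning -Inf (rather than NaN) tells metrop to reject such proposals, while the initial state must have a finite log density.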

Related

RJAGS - How to pass more complex functions in BUGS file

My goal is basically to migrate this code to R.
All the preprocessing of the datasets has already been done; now, however, I am stuck writing the "model" file. As a first attempt, and for the sake of clarity, I wrote the code shown below in R.
What I want to do is run an MCMC to estimate the parameter R_t, given the daily reported data for Italy.
The main steps are:
Sample an array parameter, log(R_t), from a Gaussian random walk (RW) distribution:
Gauss_RandomWalk <- function(N, x0, mu, variance) {
z <- cumsum(rnorm(n=N, mean=mu, sd=sqrt(variance)))
t <- 1:N
x <- (x0 + t*mu + z)
return(x)
}
log_R_t <- Gauss_RandomWalk(tot_dates, 0., 0., 0.035**2)
R_t_candidate <- exp(log_R_t)
Compute quantities that are functions of these sampled parameters, namely the number of infections. The dependence is simple, since it is just linear algebra:
infections <- rep(0. , tot_dates)
infections[1] <- exp(seed)
for (t in 2:tot_dates){
infections[t] <- sum(R_t_candidate * infections * gt_to_convolution[t-1,])
}
Convolve the array just computed with a delay distribution (onset + reporting delay), then rescale it by the exposure variable:
test_adjusted_positive <- convolve(infections, delay_distribution_df$density, type = "open")
test_adjusted_positive <- test_adjusted_positive[1:tot_dates]
positive <- round(test_adjusted_positive*exposure)
Compute the likelihood, i.e. the probability of observing the data (the daily confirmed cases) given the sampled log(R_t) parameter, from which the variable positive is computed:
likelihood <- dnbinom(round(Italian_data$daily_confirmed), mu = positive, size = 1/6)
Finally, here is my BUGS model file:
model {
#priors as a Gaussian RW
log_rt[1] ~ dnorm(0, 0.035)
log_rt[2] ~ dnorm(0, 0.035)
for (t in 3:tot_dates) {
log_rt[t] ~ dnorm(log_rt[t-1] + log_rt[t-2], 0.035)
R_t_candidate[t] <- exp(log_rt[t])
}
# data likelihood
for (t in 2:tot_dates) {
infections[t] <- sum(R_t_candidate * infections * gt_to_convolution[t-1,])
}
test_adjusted_positive <- convolve(infections, delay_distribution)
test_adjusted_positive <- test_adjusted_positive[1:tot_dates]
positive <- test_adjusted_positive*exposure
for (t in 2:tot_dates) {
confirmed[t] ~ dnbinom( obs[t], positive[t], 1/6)
}
}
where gt_to_convolution is a constant matrix, tot_dates is a constant value and exposure is a constant array.
When I try to compile it with:
data <- NULL
data$obs <- round(Italian_data$daily_confirmed)
data$tot_dates <- n_days
data$delay_distribution <- delay_distribution_df$density
data$exposure <- exposure
data$gt_to_convolution <- gt_to_convolution
inits <- NULL
inits$log_rt <- rep(0, tot_dates)
library (rjags)
library (coda)
set.seed(1995)
model <- "MyModel.bug"
jm <- jags.model(model , data, inits)
It raises the following error:
Compiling model graph
Resolving undeclared variables
Allocating nodes
Deleting model
Error in jags.model(model, data, inits) : RUNTIME ERROR:
Compilation error on line 19.
Possible directed cycle involving test_adjusted_positive
So I am not even able to start debugging it. I'm fairly sure something more general is wrong, but I cannot figure out what or why.
At this point, I think the best choice would be to implement a Metropolis algorithm myself for the likelihood above, but I would much prefer to use an already tested framework such as BUGS/JAGS, which is why I am asking for help.
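The "Possible directed cycle" message is consistent with the line test_adjusted_positive <- test_adjusted_positive[1:tot_dates]: a BUGS/JAGS model describes a graph, not a sequence of assignments, so each node must be defined exactly once and cannot be redefined in terms of itself. As far as I know, JAGS also has no convolve() function. One possible workaround, sketched here under those assumptions, is to precompute a constant delay matrix in R (make_delay_matrix and the toy numbers are hypothetical), pass it as data like gt_to_convolution, and define each element once in the model. The R code only checks that the matrix product reproduces the truncated convolution. Note that R's convolve() needs rev() on its second argument to give the usual convolution, which the question's type = "open" call omits:

```r
# Hypothetical helper: n x n lower-triangular matrix with D[t, i] = delay[t - i + 1]
# (zero when the lag t - i is beyond the delay support), so that D %*% infections
# is the convolution of infections with delay, truncated to the first n days.
make_delay_matrix <- function(delay, n) {
  D <- matrix(0, nrow = n, ncol = n)
  for (t in 1:n) {
    for (i in 1:t) {
      lag <- t - i
      if (lag < length(delay)) D[t, i] <- delay[lag + 1]
    }
  }
  D
}

# Made-up toy inputs, only to check the construction
infections <- c(10, 20, 30, 40, 50)
delay <- c(0.5, 0.3, 0.2)
D <- make_delay_matrix(delay, length(infections))

via_matrix <- as.vector(D %*% infections)
# Usual convolution in R: convolve(x, rev(y), type = "open"), then truncate
via_convolve <- convolve(infections, rev(delay), type = "open")[seq_along(infections)]
all.equal(via_matrix, via_convolve)
```

In the model file the same idea would read test_adjusted_positive[t] <- inprod(infections[1:tot_dates], delay_mat[t, 1:tot_dates]) inside a loop over t, with delay_mat supplied in data (untested). The infections recursion probably needs the same care, e.g. summing only over infections[1:(t-1)], so that infections[t] does not depend on itself.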

How can I update a function that calculates Welch's test for polynomial trends?

I am trying to reproduce the SPSS output for the significance of a linear trend among means when equal variances are not assumed.
I have gratefully used code from http://www-personal.umich.edu/~gonzo/coursenotes/file3.pdf to create a function for calculating separate variances, which, based on my searching, I understand to be the “equal variances not assumed” output in SPSS.
My problem/goal:
I am only assessing orthogonal polynomial trends (mostly linear). I want to adapt the function so that the contrast argument accepts pre-made contrast matrices, rather than my specifying the coefficients manually each time (room for typos!).
Code for creating the function from the notes:
sepvarcontrast <- function(dv, group, contrast) {
means <- c(by(dv, group, mean))
vars <- c(by(dv, group, var))
ns <- c(by(dv, group, length))
ihat <- contrast %*% means
t.denominator <- sqrt(contrast^2 %*% (vars/ns))
t.welch <- ihat/ t.denominator
num.contrast <- ifelse(is.null(dim(contrast)),1,dim(contrast)[1])
df.welch <- rep(0, num.contrast)
if (is.null(dim(contrast))) contrast <- t(as.matrix(contrast))
for (i in 1:num.contrast) {
num <- (contrast[i,]^2 %*% (vars))^2
den <- sum((contrast[i,]^2 * vars)^2 / (ns-1))
df.welch[i] <- num/den
}
p.welch <- 2*(1- pt(abs(t.welch), df.welch))
result <- list(ihat = ihat, se.ihat = t.denominator, t.welch = t.welch,
df.welch = df.welch, p.welch = p.welch)
return(result)
}
I would like to be able to use the function like this:
# Create a polynomial contrast matrix for 5 groups, then save
contr.mat5 <- contr.poly(5)
# Calculate separate variance
sepvarcontrast(dv, group, contrast = contr.mat5)
I have tried those exact commands but receive Error in contrast %*% means : non-conformable arguments. I have played around with the code but can't get it to work.
All suggestions are appreciated! I am still learning how to create a reprex...
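The error arises because contr.poly(5) returns a 5 x 4 matrix with one contrast per column, while the function expects one contrast per row, so contrast %*% means multiplies a 5 x 4 matrix by a length-5 vector. Passing t(contr.poly(5)) should fix the conformability. Separately, as far as I can tell the quoted degrees-of-freedom loop omits the division by ns, which only matters with unequal group sizes. The sketch below is a hypothetical revision that transposes contr.poly()-style input automatically and uses the standard Welch-Satterthwaite form; the example data are made up:

```r
# Welch (separate-variance) t test for one or more contrasts among group means.
# `contrast` can be a coefficient vector, a matrix with one contrast per row,
# or a k x (k-1) matrix with one contrast per column, as returned by contr.poly(k).
sepvarcontrast <- function(dv, group, contrast) {
  means <- c(by(dv, group, mean))
  vars  <- c(by(dv, group, var))
  ns    <- c(by(dv, group, length))
  k <- length(means)
  if (is.null(dim(contrast))) contrast <- matrix(contrast, nrow = 1)
  # contr.poly() stores contrasts in columns; transpose so each row is a contrast
  if (ncol(contrast) != k && nrow(contrast) == k) contrast <- t(contrast)
  stopifnot(ncol(contrast) == k)
  ihat    <- as.vector(contrast %*% means)
  se.ihat <- as.vector(sqrt(contrast^2 %*% (vars / ns)))
  t.welch <- ihat / se.ihat
  # Welch-Satterthwaite degrees of freedom, one value per contrast
  df.welch <- apply(contrast, 1, function(cc) {
    w <- cc^2 * vars / ns
    sum(w)^2 / sum(w^2 / (ns - 1))
  })
  p.welch <- 2 * pt(abs(t.welch), df.welch, lower.tail = FALSE)
  data.frame(ihat, se.ihat, t.welch, df.welch, p.welch,
             row.names = rownames(contrast))
}

# Hypothetical example data: 5 groups with unequal sizes and variances
set.seed(1)
group <- factor(rep(1:5, times = c(8, 10, 12, 9, 11)))
dv <- rnorm(length(group), mean = as.numeric(group), sd = as.numeric(group))
res <- sepvarcontrast(dv, group, contrast = contr.poly(5))
res
```

Each row of res corresponds to one polynomial trend (.L, .Q, .C, ^4), and a single column such as contr.poly(5)[, 1] still works as a plain coefficient vector.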

How to estimate the lambdas of Poisson-distributed samples in R and draw a kernel estimate of the density of the estimator based on them?

So I have 500 simulated Poisson-distributed samples, each with n=100.
1) How can I estimate lambda for each of these samples separately in R?
2) How can I draw a kernel estimate of the density function of the estimator of lambda, based on the 500 estimated lambdas? (My guess is that it involves the "KernSmooth" package and its function "bkfe", but I fail to program it properly anyway.)
taskpois <- function(size, leng){
taskmlepois <- numeric(leng)
for (i in 1:leng){
randompois <- rpois(size, 6)
taskmlepois[i] <- mean(randompois)
}
return(taskmlepois)
}
tasksample <- taskpois(size=100, leng=500)
As the comments suggest, it seems you're pretty close already.
ltarget <- 2
set.seed(101)
lambdavec <- replicate(500,mean(rpois(100,lambda=ltarget)))
dd <- density(lambdavec)
plot(dd,main="",las=1,bty="l")
We might as well add the expected result based on asymptotic theory:
curve(dnorm(x,mean=2,sd=sqrt(2/100)),add=TRUE,col=2)
We can add another line that shows that the variation among the densities of different experiments is pretty large relative to the difference between the theoretical and observed density from the first experiment:
lambdavec2 <- replicate(500,mean(rpois(100,lambda=ltarget)))
lines(density(lambdavec2),col=4)
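Since the maximum likelihood estimate of a Poisson lambda is the sample mean, each per-sample estimate is just mean() of that sample. If you specifically want KernSmooth, note that its density estimator is bkde(); bkfe() estimates density functionals, mainly for bandwidth selection. A minimal sketch using lambda = 6 as in the question, with dpik() for a plug-in bandwidth (the seed and plotting options are arbitrary choices):

```r
library(KernSmooth)

set.seed(101)
# 500 samples of size 100; the MLE of lambda for each sample is its mean
lambda_hat <- replicate(500, mean(rpois(100, lambda = 6)))

h <- dpik(lambda_hat)                                       # plug-in bandwidth
kde <- bkde(lambda_hat, kernel = "normal", bandwidth = h)   # list with $x, $y

plot(kde, type = "l", las = 1, bty = "l",
     xlab = expression(hat(lambda)), ylab = "Density")
# Asymptotic approximation: N(lambda, lambda / n)
curve(dnorm(x, mean = 6, sd = sqrt(6 / 100)), add = TRUE, col = 2)
```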

Error in Gradient Descent Calculation

I tried to write a function that performs gradient descent for a linear regression model. However, the answers I get do not match the answers from the normal equation method.
My sample data is:
df <- data.frame(c(1,5,6),c(3,5,6),c(4,6,8))
with c(4,6,8) being the y values.
lm_gradient_descent <- function(df,learning_rate, y_col=length(df),scale=TRUE){
n_features <- length(df) #n_features is the number of features in the data set
#using mean normalization to scale features
if(scale==TRUE){
for (i in 1:(n_features)){
df[,i] <- (df[,i]-mean(df[,i]))/sd(df[,i])
}
}
y_data <- df[,y_col]
df[,y_col] <- NULL
par <- rep(1,n_features)
df <- merge(1,df)
data_mat <- data.matrix(df)
#we need a temp_arr to store each iteration of parameter values so that we can do a
#simultaneous update
temp_arr <- rep(0,n_features)
diff <- 1
while(diff>0.0000001){
for (i in 1:(n_features)){
temp_arr[i] <- par[i]-learning_rate*sum((data_mat%*%par-y_data)*df[,i])/length(y_data)
}
diff <- par[1]-temp_arr[1]
print(diff)
par <- temp_arr
}
return(par)
}
Running this function,
lm_gradient_descent(df,0.0001,,0)
the results I got were
c(0.9165891,0.6115482,0.5652970)
when I use the normal equation method, I get
c(2,1,0).
Hope someone can shed some light on where I went wrong in this function.
You used the stopping criterion
old parameters - new parameters <= 0.0000001
First of all I think there's an abs() missing if you want to use this criterion (though my ignorance of R may be at fault).
But even if you use
abs(old parameters - new parameters) <= 0.0000001
this is not a good stopping criterion: it only tells you that progress has slowed down, not that it's already sufficiently accurate. Try instead simply to iterate for a fixed number of iterations. Unfortunately it's not that easy to give a good, generally applicable stopping criterion for gradient descent here.
It seems that you have not implemented a bias term. In a linear model like this, you always want to have an additional additive constant, i.e., your model should be like
w_0 + w_1*x_1 + ... + w_n*x_n.
Without the w_0 term, you usually won't get a good fit.
I know this is a couple of weeks old at this point, but I'm going to take a stab at it for several reasons, namely:
I'm relatively new to R, so deciphering your code and rewriting it is good practice for me.
I'm working on a different gradient descent problem, so this is all fresh to me.
I need the Stack Overflow points.
As far as I can tell, you never got a working answer.
First, regarding your data structures. You start with a data frame, rename a column, strip out a vector, then strip out a matrix. It would be a lot easier to start with an X matrix (capitalized, since its component features are referred to as x_i) and a y response vector.
X <- cbind(c(1,5,6),c(3,5,6))
y <- c(4,6,8)
We can easily see what the desired solutions are, with and without scaling, by fitting a linear model with lm(). (Note: we only scale the features X, not the response y.)
> lm(y~X)
Call:
lm(formula = y ~ X)
Coefficients:
(Intercept) X1 X2
-4 -1 3
> lm(y~scale(X))
Call:
lm(formula = y ~ scale(X))
Coefficients:
(Intercept) scale(X)1 scale(X)2
6.000 -2.646 4.583
Regarding your code, one of the beauties of R is built-in matrix multiplication, which is significantly faster than using loops.
lm_gradient_descent <- function(X, y, learning_rate, scale=TRUE){
if(scale==TRUE){X <- scale(X)}
X <- cbind(1, X)
theta <- rep(0, ncol(X)) #your old temp_arr
diff <- 1
old.error <- sum( (X %*% theta - y)^2 ) / (2*length(y))
while(diff>0.000000001){
theta <- theta - learning_rate * t(X) %*% (X %*% theta - y) / length(y)
new.error <- sum( (X %*% theta - y)^2 ) / (2*length(y))
diff <- abs(old.error - new.error)
old.error <- new.error
}
return(theta)
}
And to show it works...
> lm_gradient_descent(X, y, .01, 0)
[,1]
[1,] -3.9360685
[2,] -0.9851775
[3,] 2.9736566
vs expected of (-4, -1, 3)
For what it's worth, while I agree with @cfh that I would prefer a loop with a fixed number of iterations, I'm actually not sure you need the abs function: if diff < 0, your function is not converging.
Finally, rather than using something like old.error and new.error, I'd suggest using a vector that records all errors. You can then plot that vector to see how quickly your function converges.
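Following both suggestions (a fixed number of iterations and a vector that records the cost at every step), a minimal sketch could look like this. The function name lm_gradient_descent_hist, the learning rate 0.5 and the 10000 iterations are my own choices for this toy data; the two scaled features are highly correlated, so smaller learning rates converge noticeably more slowly:

```r
lm_gradient_descent_hist <- function(X, y, learning_rate, n_iter = 10000, scale = TRUE) {
  if (scale) X <- scale(X)
  X <- cbind(1, X)               # column of 1s for the intercept (bias) term
  theta <- rep(0, ncol(X))
  errors <- numeric(n_iter)      # cost after each iteration
  for (k in seq_len(n_iter)) {
    residual <- X %*% theta - y
    # batch gradient step: theta - alpha * X'(X theta - y) / n
    theta <- theta - learning_rate * crossprod(X, residual) / length(y)
    errors[k] <- sum((X %*% theta - y)^2) / (2 * length(y))
  }
  list(theta = as.vector(theta), errors = errors)
}

X <- cbind(c(1, 5, 6), c(3, 5, 6))
y <- c(4, 6, 8)
fit <- lm_gradient_descent_hist(X, y, learning_rate = 0.5)
fit$theta                        # compare with coef(lm(y ~ scale(X)))
plot(fit$errors, type = "l", xlab = "iteration", ylab = "cost")
```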

Easy way of counting precision, recall and F1-score in R

I am using an rpart classifier in R. I want to test the trained classifier on test data. This part is fine: I can use the predict.rpart function.
But I also want to calculate precision, recall and F1 score.
My question is - do I have to write functions for those myself, or is there any function in R or any of CRAN libraries for that?
Using the caret package:
library(caret)
y <- ... # factor of positive / negative cases
predictions <- ... # factor of predictions
precision <- posPredValue(predictions, y, positive="1")
recall <- sensitivity(predictions, y, positive="1")
F1 <- (2 * precision * recall) / (precision + recall)
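For a binary problem, posPredValue() gives precision and sensitivity() gives recall. The base-R sketch below computes the same quantities from raw counts on made-up labels (with "1" as the positive class), which can be handy for checking the caret results:

```r
# Made-up labels for illustration; "1" is the positive class
y           <- factor(c("1", "1", "1", "1", "0", "0", "0", "0", "1", "0"), levels = c("0", "1"))
predictions <- factor(c("1", "1", "1", "0", "1", "0", "0", "0", "0", "0"), levels = c("0", "1"))

tp <- sum(predictions == "1" & y == "1")   # true positives:  3
fp <- sum(predictions == "1" & y == "0")   # false positives: 1
fn <- sum(predictions == "0" & y == "1")   # false negatives: 2

precision <- tp / (tp + fp)                # 0.75
recall    <- tp / (tp + fn)                # 0.6
F1 <- (2 * precision * recall) / (precision + recall)
F1                                         # 2/3, same as 2*tp / (2*tp + fp + fn)
```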
A generic function that works for binary and multi-class classification without using any package is:
f1_score <- function(predicted, expected, positive.class="1") {
predicted <- factor(as.character(predicted), levels=unique(as.character(expected)))
expected <- as.factor(expected)
cm = as.matrix(table(expected, predicted))
precision <- diag(cm) / colSums(cm)
recall <- diag(cm) / rowSums(cm)
f1 <- ifelse(precision + recall == 0, 0, 2 * precision * recall / (precision + recall))
#Assuming that F1 is zero when it's not possible compute it
f1[is.na(f1)] <- 0
#Binary F1 or Multi-class macro-averaged F1
ifelse(nlevels(expected) == 2, f1[positive.class], mean(f1))
}
Some comments about the function:
It's assumed that an F1 of NA is zero.
positive.class is used only for binary F1.
For multi-class problems, the macro-averaged F1 is computed.
If predicted and expected have different levels, predicted receives the levels of expected.
The ROCR library calculates all these and more (see also http://rocr.bioinf.mpi-sb.mpg.de):
library (ROCR);
...
y <- ... # logical array of positive / negative cases
predictions <- ... # array of predictions
pred <- prediction(predictions, y);
# Recall-Precision curve
RP.perf <- performance(pred, "prec", "rec");
plot (RP.perf);
# ROC curve
ROC.perf <- performance(pred, "tpr", "fpr");
plot (ROC.perf);
# ROC area under the curve
auc.tmp <- performance(pred,"auc");
auc <- as.numeric(auc.tmp@y.values)
...
Just to update this, as I only came across this thread now: the confusionMatrix function in caret computes all of these things for you automatically.
cm <- confusionMatrix(prediction, reference = test_set$label)
# extract F1 score for all classes
cm[["byClass"]][ , "F1"] #for multiclass classification problems
You can substitute any of the following for "F1" to extract the relevant values as well:
"Sensitivity", "Specificity", "Pos Pred Value", "Neg Pred Value", "Precision", "Recall", "F1", "Prevalence", "Detection Rate", "Detection Prevalence", "Balanced Accuracy"
I think this behaves slightly differently when you're only doing a binary classification problem, but in both cases all of these values are computed for you and stored inside the confusionMatrix object, under $byClass.
confusionMatrix() from the caret package can be used with the optional argument positive, which specifies the factor level to be treated as the positive class.
confusionMatrix(predicted, Funded, mode = "prec_recall", positive="1")
This code also gives additional values such as F1, Accuracy, etc.
I noticed the comment about F1 score being needed for binary classes. I suspect that it usually is. But a while ago I wrote this in which I was doing classification into several groups denoted by number. This may be of use to you...
calcF1Scores=function(act,prd){
#treats the vectors like classes
#act and prd must be whole numbers
df=data.frame(act=act,prd=prd);
scores=list();
for(i in seq(min(act),max(act))){
tp=nrow(df[df$prd==i & df$act==i,]);
fp=nrow(df[df$prd==i & df$act!=i,]);
fn=nrow(df[df$prd!=i & df$act==i,]);
f1=(2*tp)/(2*tp+fp+fn)
scores[[i]]=f1;
}
print(scores)
return(scores);
}
print(mean(unlist(calcF1Scores(c(1,1,3,4,5),c(1,2,3,4,5)))))
print(mean(unlist(calcF1Scores(c(1,2,3,4,5),c(1,2,3,4,5)))))
We can simply get the F1 value from caret's confusionMatrix function:
result <- confusionMatrix(Prediction, Lable)
# View confusion matrix overall
result
# F1 value
result$byClass["F1"] # by name rather than position; byClass[7] is the same element
You can also use confusionMatrix() from the caret package. Its output includes, among others, Sensitivity (also known as recall) and Pos Pred Value (also known as precision). F1 can then be computed, as stated above, as:
F1 <- (2 * precision * recall) / (precision + recall)
library(caret)
result <- confusionMatrix(Prediction, label)
#This shows all the measures you need including precision, recall and F1
result$byClass
