Maximize simulated likelihood in R -- is there something obvious I am not getting?

I am trying to maximize a simulated likelihood in a discrete choice model (Lerman and Manski (1981)) by simulating frequencies and using them as probabilities (which I cannot compute directly). However, R never manages to find an optimum (the maximization always returns the starting values). As a minimal example, here is my code for a very simple probit estimation:
### simulate data
set.seed(5849)
N <- 2000
b.cons <- 8
b.x <- 10
x <- cbind(rep(1, N), runif(N)) #"observed variables"
e <- rnorm(N) # "unobserved error"
k <- runif(N)*10+7 # threshold: something random, but high enough to guarantee some variation in i
t <- x%*%c(b.cons, b.x)+e
i <- 1*(k>t) #participation dummy
### likelihood function
R <- 1000 # number of draws
err <- matrix(rnorm(R*N), N, R) # draw error terms (outside of likelihood function to speed up estimation)
# estimate b.i, sig.i
probit.sim <- function(params, I, K, X) {
part =matrix(NA, N, R)
T = X%*%params%*%rep(1, R) + err
for (i in 1:R) part[,i] = K>T[,i]
pr.i1 = rowSums(part)/R
pr.i1[pr.i1==0] <- 0.001
pr.i1[pr.i1==1] <- 0.999
pr.i0 = 1-pr.i1
llik = t(I)%*%log(pr.i1) + t(1-I)%*%log(pr.i0)
-llik
}
### maximize likelihood
optim(c(1,1), probit.sim, I = i, K = k, X = x)
Is it because the probabilities are not smooth enough? Is there a way to maximize functions that are not very smooth? On a graph, the maximum still seems pretty clear to the eye... Or am I missing something else entirely?
I am very much a beginner, so I thank you in advance for any helpful advice!
(I would also appreciate any reference that actually goes into the details of how to program such a simulated maximum likelihood function; most references I have seen remain very theoretical about it.)
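One plausible explanation, offered as a sketch rather than a definitive diagnosis: at the starting values c(1,1) essentially every simulated draw satisfies K > T, so each simulated frequency equals 1, is clamped to 0.999, and the objective is flat around the start. optim's default Nelder-Mead simplex then sees identical values at all its initial vertices and stops. A commonly suggested remedy (discussed, for example, in Train's "Discrete Choice Methods with Simulation") is to replace the 0/1 indicator by a smooth kernel. The code below is a minimal illustration under those assumptions; `probit.smooth`, `lambda` and `eps` are hypothetical names that do not appear in the question, and the smaller N and R are chosen only to keep the example quick.

```r
set.seed(5849)
N <- 500                                   # smaller than in the question, for speed
R <- 200
x <- cbind(1, runif(N))                    # constant and one regressor
k <- runif(N) * 10 + 7                     # thresholds
i <- as.numeric(k > x %*% c(8, 10) + rnorm(N))
err <- matrix(rnorm(N * R), N, R)          # fixed draws, reused at every evaluation

# Logit-smoothed simulator: plogis((K - T) / lambda) replaces the indicator K > T.
# As lambda shrinks towards 0 this approaches the raw frequency simulator.
probit.smooth <- function(params, I, K, X, err, lambda = 1, eps = 1e-12) {
  idx <- as.vector(X %*% params)           # length-N linear index
  pr1 <- rowMeans(plogis(((K - idx) - err) / lambda))
  pr1 <- pmin(pmax(pr1, eps), 1 - eps)     # guard against log(0)
  -sum(I * log(pr1) + (1 - I) * log(1 - pr1))
}

# The smoothed objective is no longer flat around the starting values
f_start <- probit.smooth(c(1, 1),   i, k, x, err)
f_near  <- probit.smooth(c(1.1, 1), i, k, x, err)

fit <- optim(c(1, 1), probit.smooth, I = i, K = k, X = x, err = err)
fit$par
```

The smoothing parameter trades bias against smoothness, so the estimates should be checked for sensitivity to `lambda`; starting closer to plausible values (where the simulated frequencies vary) is a complementary fix.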

Related

RJAGS - How to pass more complex functions in BUGS file

My goal is to basically migrate this code to R.
All the preprocessing wrt datasets has been already done, now however I am stuck in writing the "model" file. As a first attempt, and for the sake of clarity, I wrote the code which is shown below in R language.
What I want to do is run an MCMC to obtain an estimate of the parameter R_t, given the daily reported data for Italy.
The main steps that have been pursued are:
Sample an array parameter, namely the log(R_t), from a Gaussian RW distribution
Gauss_RandomWalk <- function(N, x0, mu, variance) {
z <- cumsum(rnorm(n=N, mean=mu, sd=sqrt(variance)))
t <- 1:N
x <- (x0 + t*mu + z)
return(x)
}
log_R_t <- Gauss_RandomWalk(tot_dates, 0., 0., 0.035**2)
R_t_candidate <- exp(log_R_t)
Compute some quantities, that are function of this sampled parameters, namely the number of infections. This dependence is quite simple, since it is linear algebra:
infections <- rep(0. , tot_dates)
infections[1] <- exp(seed)
for (t in 2:tot_dates){
infections[t] <- sum(R_t_candidate * infections * gt_to_convolution[t-1,])
}
Convolve the array I have just computed with a delay distribution (onset+reporting delay), finally rescaling it by the exposure variable:
test_adjusted_positive <- convolve(infections, delay_distribution_df$density, type = "open")
test_adjusted_positive <- test_adjusted_positive[1:tot_dates]
positive <- round(test_adjusted_positive*exposure)
Compute the Likelihood, which is proportional to the probability that a certain set of data was observed (i.e. daily confirmed cases), by sampling the aforementioned log(R_t) parameter from which the variable positive is computed.
likelihood <- dnbinom(round(Italian_data$daily_confirmed), mu = positive, size = 1/6)
Finally, here we come to my BUGS model file:
model {
#priors as a Gaussian RW
log_rt[1] ~ dnorm(0, 0.035)
log_rt[2] ~ dnorm(0, 0.035)
for (t in 3:tot_dates) {
log_rt[t] ~ dnorm(log_rt[t-1] + log_rt[t-2], 0.035)
R_t_candidate[t] <- exp(log_rt[t])
}
# data likelihood
for (t in 2:tot_dates) {
infections[t] <- sum(R_t_candidate * infections * gt_to_convolution[t-1,])
}
test_adjusted_positive <- convolve(infections, delay_distribution)
test_adjusted_positive <- test_adjusted_positive[1:tot_dates]
positive <- test_adjusted_positive*exposure
for (t in 2:tot_dates) {
confirmed[t] ~ dnbinom( obs[t], positive[t], 1/6)
}
}
where gt_to_convolution is a constant matrix, tot_dates is a constant value and exposure is a constant array.
When trying to compile it through:
data <- NULL
data$obs <- round(Italian_data$daily_confirmed)
data$tot_dates <- n_days
data$delay_distribution <- delay_distribution_df$density
data$exposure <- exposure
data$gt_to_convolution <- gt_to_convolution
inits <- NULL
inits$log_rt <- rep(0, tot_dates)
library (rjags)
library (coda)
set.seed(1995)
model <- "MyModel.bug"
jm <- jags.model(model , data, inits)
It raises the following error:
Compiling model graph
Resolving undeclared variables
Allocating nodes
Deleting model
Error in jags.model(model, data, inits) : RUNTIME ERROR:
Compilation error on line 19.
Possible directed cycle involving test_adjusted_positive
Hence I am not even able to start debugging it, even though I'm pretty sure something more general is wrong; I just cannot figure out what or why.
At this point, I think the best choice would be to implement a Metropolis algorithm myself according to the likelihood above, but obviously I would much prefer to use an already tested framework such as BUGS/JAGS, which is why I am asking for help.
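Some hedged observations on the model file. In the BUGS language a node may be defined only once, and test_adjusted_positive is assigned twice, the second time in terms of itself, which is one plausible source of the reported cycle; infections[t] being defined through a sum over the whole infections vector may be another. Note also that dnorm in JAGS is parameterised by mean and precision rather than standard deviation, and I am not aware of a convolve function in JAGS, so the delay convolution would probably have to be written as an explicit sum over lags. The R sketch below shows that explicit sum; `delay_convolve` is a hypothetical helper, and the same index pattern could be carried over into nested for loops in the model file. Per the R documentation, the usual convolution corresponds to convolve(x, rev(y), type = "open"), which is used here as a check.

```r
# Explicit delay convolution: out[t] = sum over lags j >= 0 of infections[t - j] * delay[j + 1]
delay_convolve <- function(infections, delay) {
  n <- length(infections)
  out <- numeric(n)
  for (t in seq_len(n)) {
    s <- seq_len(min(t, length(delay)))    # usable lags at time t
    out[t] <- sum(infections[t - s + 1] * delay[s])
  }
  out
}

# Toy inputs (made up for illustration only)
infections <- c(1, 2, 3, 4, 5)
delay      <- c(0.5, 0.3, 0.2)

manual  <- delay_convolve(infections, delay)
builtin <- convolve(infections, rev(delay), type = "open")[seq_along(infections)]
manual
```

Because each out[t] depends only on infections at times up to t, writing it this way gives every time point its own node and avoids redefining a vector in terms of itself.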

Implement a Monte Carlo Simulation Method to Estimate an Integral in R

I am trying to implement a Monte Carlo simulation method to estimate an integral in R. However, I still get the wrong answer. My code is as follows:
f <- function(x){
((cos(x))/x)*exp(log(x)-3)^3
}
t <- integrate(f,0,1)
n <- 10000 #Assume we conduct 10000 simulations
int_gral <- Monte_Car(n)
int_gral
You are not doing Monte Carlo here. Monte Carlo is a simulation method that approximates integrals by sums or means of functions of random variables.
You should do something along these lines (you may want to verify that the mean of the f output really approximates your integral):
f <- function(n){
x <- runif(n)
return(
((cos(x))/x)*exp(log(x)-3)^3
)
}
int_gral <- mean(f(10000))
What your code does is take a number n and return ((cos(n))/n)*exp(log(n)-3)^3; there is no randomness in that.
Update
Now, to get a more precise estimate, you need to replicate this step K times. Rather than using a loop, you can use the replicate function:
K <- 100
dist <- data.frame(
int = replicate(K, mean(f(10000)))
)
You get a distribution of estimators for your integral :
library(ggplot2)
ggplot(dist) + geom_histogram(aes(x = int, y = after_stat(density)))
and you can use mean to have a numerical value:
mean(dist$int)
# [1] 2.95036e-05
You can evaluate the precision of your estimates with
sd(dist$int)
# [1] 2.296033e-07
Here it is small because the sample size in each replication (10,000) is already large, so every single estimate is already quite precise.
I have changed the code as follows. Could you confirm that I am doing the right thing?
f <- function(x){
((cos(x))/x)*exp(log(x)-3)^3
}
set.seed(234)
n<-10000
for (i in 1:10000) {
x<-runif(n)
I<-sum(f(x))/n
}
I
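A note on the revised loop, with a hedged sketch: `I` is overwritten on every pass, so only the estimate from the last iteration survives and the other 9,999 passes are discarded. Storing each estimate allows averaging them and gauging their spread. The integrand also simplifies algebraically to x^2 * cos(x) * exp(-9), whose integral over [0, 1] is (2*cos(1) - sin(1)) * exp(-9), roughly 2.951e-05, which gives an exact value to compare against. The names `est`, `K` and `exact` are illustrative.

```r
f <- function(x) {
  ((cos(x)) / x) * exp(log(x) - 3)^3      # equals x^2 * cos(x) * exp(-9) for x > 0
}

set.seed(234)
n <- 10000                                 # uniform draws per estimate
K <- 100                                   # number of independent estimates
est <- numeric(K)
for (k in seq_len(K)) {
  x <- runif(n)                            # U(0, 1) draws, so mean(f(x)) estimates the integral
  est[k] <- mean(f(x))                     # keep every estimate instead of overwriting
}

exact <- (2 * cos(1) - sin(1)) * exp(-9)   # closed form via integration by parts
c(estimate = mean(est), mc_sd = sd(est), exact = exact)
```

The value of `mc_sd` describes the spread of a single 10,000-draw estimate; the average of the K estimates is correspondingly more precise.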

Perceptron in R not converging

I am trying to understand neural networks better, so I am trying to implement a simple perceptron from scratch in R. I know that this is very inefficient, as there are many libraries that do this extremely well optimized, but my goal is to understand the basics of neural networks better and work my way forward to more complex models.
I have created some artificial test data with a very simple linear decision boundary and split this into a training set and a test set. I then ran a logistic regression on the training data, checked the predictions on the test set and got over 99% accuracy, which was to be expected given the simple nature of the data. I then tried implementing a perceptron with 2 inputs, 1 neuron, 1000 iterations, a learning rate of 0.1 and a sigmoid activation function.
I would expect to get very similar accuracy to the logistic regression model, but my results are a lot worse (around 70% correct classifications in the training set), so I definitely did something wrong. The predictions only seem to get better during the first couple of iterations and then just go back and forth around a specific value (I tried many different learning rates, with no success). I'm attaching my script and I'm thankful for any advice! I think the problem lies in the calculation of the error or the weight adjustment, but I can't put my finger on it...
### Reproducible Example for StackOverflow
#### Setup
# loading libraries
library(data.table)
# remove scientific notation
options(scipen = 999)
# setting seed for random number generation
seed <- 123
#### Selfmade Test Data
# input points
x1 <- runif(10000,-100,100)
x2 <- runif(10000,-100,100)
# setting decision boundary to create output
output <- vector()
output[0.5*x1 + -1.2*x2 >= 50] <- 0
output[0.5*x1 + -1.2*x2 < 50] <- 1
# combining to dataframe
points <- cbind.data.frame(x1,x2,output)
# plotting all data points
plot(points$x1,points$x2, col = as.factor(points$output), main = "Self-created data", xlab = "x1",ylab = "x2")
# split into test and training sets
trainsize = 0.2
set.seed(seed)
train_rows <- sample(1:dim(points)[1], size = trainsize * dim(points)[1])
train <- points[train_rows,]
test <- points[-c(train_rows),]
# plotting training set only
plot(train$x1,train$x2, col = as.factor(train$output), main = "Self-created data (training set)", xlab = "x1",ylab = "x2")
#### Approaching the problem with logistic regression
# building model
train_logit <- glm(output ~ x1 + x2, data = train, family = "binomial", maxit = 10000)
summary(train_logit)
# testing performance in training set
table(round(train_logit$fitted.values) == train$output)
# testing performance of train_logit model in test set
table(test$output == round(predict(train_logit,test[,c(1,2)], type = "response")))
# We get 100% accuracy in the training set and near 100% accuracy in the test set
#### Approaching Problem with a Perceptron from scratch
# setting inputs, outputs and weights
inputs <- as.matrix(train[,c(1,2)])
output <- as.matrix(train[,3])
set.seed(123456)
weights <- as.matrix(runif(dim(inputs)[2],-1,1))
## Defining activation function + derivative
# defining sigmoid and its derivative
sigmoid <- function(x) {1 / (1 + exp(-x))}
sig_dir <- function(x){sigmoid(x)*(1 - sigmoid(x))}
## Perceptron Initial Settings
bias <- 1
# number of iterations
iterations <- 1000
# setting learning rate
alpha <- 0.1
## Perceptron
# creating vectors for saving results per iteration
weights_list <- list()
weights_list[[1]] <- weights
errors_vec <- vector()
outputs_vec <- vector()
# saving results across iterations
weights_list_all <- list()
outputs_list <- list()
errors_list <- list()
# looping through the backpropagation algorithm "iteration" # times
for (j in 1:iterations) {
# Loop for backpropagation with updating weights after every datapoint
for (i in 1:dim(train)[1]) {
# taking the weights from the last iteration of the outer loop as a starting point
if (j > 1) {
weights_list[[1]] <- weights
}
# Feed Forward (Should we really round this?!)
output_pred <- round(sigmoid(sum(inputs[i,] * as.numeric(weights)) + bias))
error <- output_pred - output[i]
# Backpropagation (Do I need the sigmoid derivative AND a learning rate? Or should I only take one of them?)
weight_adjustments <- inputs[i,] * (error * sig_dir(output_pred)) * alpha
weights <- weights - weight_adjustments
# saving progress for later plots
weights_list[[i + 1]] <- weights
errors_vec[i] <- error
outputs_vec[[i]] <- output_pred
}
# saving results for each iteration
weights_list_all[[j]] <- weights_list
outputs_list[[j]] <- outputs_vec
errors_list[[j]] <- errors_vec
}
#### Formatting Diagnostics for easier plotting
# implementing empty list to transform weightslist
WeightList <- list()
# collapsing individual weightslist into dataframes
for (i in 1:iterations) {
WeightList[[i]] <- t(data.table::rbindlist(weights_list_all[i]))
}
# pasting dataframes together
WeightFrame <- do.call(rbind.data.frame, WeightList)
colnames(WeightFrame) <- paste("w",1:dim(WeightFrame)[2], sep = "")
# pasting dataframes together
ErrorFrame <- do.call(rbind.data.frame, errors_list)
OutputFrame <- do.call(rbind.data.frame, outputs_list)
##### Plotting Results
# Development of Mean Error per iteration
plot(rowMeans(abs(ErrorFrame)),
type = "l",
xlab = "Sum of absolute Error terms")
# Development of Weights over time
plot(WeightFrame$w1, type = "l",xlim = c(1,dim(train)[1]), ylim = c(min(WeightFrame),max(WeightFrame)), ylab = "Weights", xlab = "Iterations")
lines(WeightFrame$w2, col = "green")
# lines(WeightFrame$w3, col = "blue")
# lines(WeightFrame$w4, col = "red")
# lines(WeightFrame$w5, col = "orange")
# lines(WeightFrame$w6, col = "cyan")
# lines(WeightFrame$w7, col = "magenta")
# Empty vector for number of correct categorizations per iteration
NoCorr <- vector()
# Computing percentage of correct predictions per iteration
colnames(OutputFrame) <- paste("V",1:dim(OutputFrame)[2], sep = "")
Output_mat <- as.matrix(OutputFrame)
for (i in 1:iterations) {
NoCorr[i] <- sum(output == Output_mat[i,]) / nrow(train)
}
# plotting number of correct predictions per iteration
plot(NoCorr, type = "l")
# Performance in training set after last iteration
table(output,round(OutputFrame[iterations,]))
First of all, welcome to the world of Neural Networks :).
Secondly, I want to recommend a great article, which I personally used to get a better understanding of backpropagation and the whole NN learning process: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/. It might be a bit rough to get through in places, and for the general implementation I think it is much easier to follow pseudocode from a NN book. However, for understanding what is going on, this article is very nice!
Thirdly, I will hopefully solve your problem :)
You already ask in your comment whether you should really round output_pred. Yes, you should, if you want to use output_pred to make a prediction! For learning, however, rounding is generally a bad idea. If you round during learning, then an output that was rounded up from 0.51 to 1 with target output 1 will not learn anything, because the rounded output equals the target and therefore looks perfect. However, 0.99 would have been a much better prediction than 0.51, so there is definitely something to learn!
I am not 100% sure whether this solves all your problems (I'm not an R programmer) and gets your accuracy up to 99%, but it should solve some of them, and hopefully the intuition is also clear :)
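To make the point about rounding concrete, here is a minimal sketch of a single sigmoid neuron trained on the unrounded output. It is not the asker's script: it additionally assumes that the inputs are rescaled to [-1, 1] (to avoid saturating the sigmoid) and that the bias is learned rather than fixed at 1, and the names `X`, `w`, `b` and `accuracy` are illustrative. With a cross-entropy loss the gradient with respect to the pre-activation is simply prediction minus target, so the sigmoid derivative is not needed on top of the learning rate; with a squared-error loss it would be multiplied in, evaluated at the pre-activation rather than at the rounded output.

```r
set.seed(123)
n  <- 2000
x1 <- runif(n, -100, 100)
x2 <- runif(n, -100, 100)
y  <- as.numeric(0.5 * x1 - 1.2 * x2 < 50)   # same decision boundary as in the question

X <- cbind(x1, x2) / 100                      # rescale inputs to [-1, 1] (assumption)
sigmoid <- function(z) 1 / (1 + exp(-z))

w <- c(0, 0)                                  # weights
b <- 0                                        # bias, learned like a weight (assumption)
alpha <- 0.1                                  # learning rate

for (epoch in 1:20) {
  for (i in sample(n)) {                      # one update per data point, random order
    p   <- sigmoid(sum(X[i, ] * w) + b)       # keep the output unrounded while learning
    err <- p - y[i]                           # cross-entropy gradient w.r.t. pre-activation
    w   <- w - alpha * err * X[i, ]
    b   <- b - alpha * err
  }
}

pred <- round(sigmoid(as.vector(X %*% w) + b))  # round only when predicting
accuracy <- mean(pred == y)
accuracy
```

Rounding appears only in the last step, where a class label is needed; during training the continuous output keeps the error signal informative.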

Sample from a custom likelihood function

I have the following likelihood function which I used in a rather complex model (in practice on a log scale):
library(plyr)
dcustom=function(x,sd,L,R){
R. = (log(R) - log(x))/sd
L. = (log(L) - log(x))/sd
ll = pnorm(R.) - pnorm(L.)
return(ll)
}
df=data.frame(Range=seq(100,500),sd=rep(0.1,401),L=200,U=400)
df=mutate(df, Likelihood = dcustom(Range, sd,L,U))
with(df,plot(Range,Likelihood,type='l'))
abline(v=200)
abline(v=400)
In this function, the sd is predetermined and L and R are "observations" (very much like the endpoints of a uniform distribution), so all 3 of them are given. The above function provides a large likelihood (1) if the model estimate x (derived parameter) is in between the L-R range, a smooth likelihood decrease (between 0 and 1) near the bounds (of which the sharpness is dependent on the sd), and 0 if it is too much outside.
This function works very well to obtain estimates of x, but now I would like to do the inverse: draw a random x from the above function. If I did this many times, the histogram of the draws should follow the shape of the curve plotted above.
The ultimate goal is to do this in C++, but I think it would be easier for me if I could first figure out how to do this in R.
There's some useful information online that helps me start (http://matlabtricks.com/post-44/generate-random-numbers-with-a-given-distribution, https://stats.stackexchange.com/questions/88697/sample-from-a-custom-continuous-distribution-in-r) but I'm still not entirely sure how to do it and how to code it.
I presume (not sure at all!) the steps are:
transform likelihood function into probability distribution
calculate the cumulative distribution function
inverse transform sampling
Is this correct and if so, how do I code this? Thank you.
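The three steps listed in the question can be sketched numerically on a grid. This is only an illustration under stated assumptions: the curve is treated as an unnormalised density in x, the support is truncated to [100, 600] (where nearly all of the mass lies for these parameter values), and `grid`, `dens`, `cdf` and `draws` are names introduced here.

```r
dcustom <- function(x, sd, L, R) {
  pnorm((log(R) - log(x)) / sd) - pnorm((log(L) - log(x)) / sd)
}

# Step 1: evaluate the curve on a fine grid and normalise it
grid <- seq(100, 600, length.out = 5001)
dens <- dcustom(grid, sd = 0.1, L = 200, R = 400)

# Step 2: cumulative distribution function on the grid
cdf <- cumsum(dens) / sum(dens)

# Step 3: inverse transform sampling, interpolating the inverse CDF at uniform draws
set.seed(1)
u <- runif(10000)
draws <- approx(x = cdf, y = grid, xout = u, ties = "ordered", rule = 2)$y

hist(draws, breaks = 50, freq = FALSE, main = "Inverse-CDF draws")
abline(v = c(200, 400))
```

The same grid-based logic (tabulate, accumulate, invert by interpolation) should translate fairly directly to C++; the accuracy depends on the grid resolution and on how much mass the truncation discards.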
One idea might be to use the Metropolis-Hastings algorithm to obtain a sample from the distribution, given all the other parameters and your likelihood.
# Metropolis-Hastings algorithm
set.seed(2018)
n_sample <- 100000
posterior_sample <- rep(NA, n_sample)
x <- 300 # starting value: I chose 300 based on your likelihood plot
for (i in 1:n_sample){
lik <- dcustom(x = x, sd = 0.1, L = 200, R =400)
# propose a value for x (you can adjust the stepsize with the sd)
x.proposed <- x + rnorm(1, 0, sd = 20)
lik.proposed <- dcustom(x = x.proposed, sd = 0.1, L = 200, R = 400)
r <- lik.proposed/lik # this is the acceptance ratio
# accept the proposed value with probability min(1, r)
if (runif(1) < r) {
x <- x.proposed
}
# record the current state at every iteration; after a rejection the previous value is repeated
posterior_sample[i] <- x
}
# plotting the density
approximate_distr <- posterior_sample
d <- density(approximate_distr)
plot(d, main = "Sample from distribution")
abline(v=200)
abline(v=400)
# If you now want to sample just a few values (for example, 5) you could use
sample(approximate_distr,5)
#[1] 281.7310 371.2317 378.0504 342.5199 412.3302 (output of the original run; exact values depend on the chain)

Estimating AR(1) coefficient using metropolis-Hastings algorithm (MCMC) in R

I am trying to write a program to estimate AR(1) coefficients using the Metropolis-Hastings algorithm. My R code is as follows:
set.seed(101)
#loglikelihood
logl <- function(b,data) {
ly = length(data)
-sum(dnorm(data[-1],b[1]+b[2]*data[1:(ly-1)],(b[3])^2,log=TRUE))
}
#proposal function
proposalfunction <- function(param,s){
return(rnorm(3,mean = param, sd= s))
}
#MH sampler
MCMC <- function(startvalue, iterations,data,s){
i=1
chain = array(dim = c(iterations+1,3))
chain[i,] = startvalue
while (i <= iterations){
proposal = proposalfunction(chain[i,],s)
probab = exp(logl(proposal,data = data) - logl(chain[i,],data = data))
if(!is.na(probab)){
if (runif(1) <= min(1,probab)){
chain[i+1,] = proposal
}else{
chain[i+1,] = chain[i,]
}
i=i+1
}else{
cat('\r !')
}
}
acceptance = round((1-mean(duplicated(chain)))*100,1)
print(acceptance)
return(chain)
}
#example
#generating data
data <- arima.sim(list(order = c(1,0,0), ar = 0.7), n = 2000,sd = sqrt(1))
r=MCMC(c(0,.7,1),50000,data,s=.00085)
In the example, I should get zero for the mean, 0.7 for the coefficient and 1 for the error variance, but every time I run this code I get completely different values. I tried adjusting the proposal scale, but I still get results that are far from the true values. The figure below shows the results.
You've flipped the sign in your log-likelihood function. This is an easy mistake to make because maximum likelihood estimation usually proceeds by minimizing the negative log-likelihood, but the requirement in MCMC is to be working with the likelihood itself (not its inverse).
Also:
dnorm() takes the standard deviation as its third argument, not the variance.
You can simplify your code slightly by using head(data,-1) to get all but the last element in a vector. So your log-likelihood would be:
sum(dnorm(data[-1],b[1]+b[2]*head(data,-1),b[3],log=TRUE))
You're probably hurting yourself by fixing the candidate distribution to be independent Normal with equal SDs for all variables; allowing them to differ might help (although the posterior SDs are not as different as I thought they might be, about {0.02,0.016,0.0077}, so this might not be such a big problem).
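Putting these points together, a corrected sampler might look like the sketch below. It is an illustration, not the asker's final code: it assumes flat priors, rejects non-positive standard deviations outright, compares on the log scale to avoid overflow in exp(), and uses the posterior SDs quoted above as per-parameter proposal scales. The names `mh`, `y` and `post_mean` are introduced here.

```r
set.seed(101)
y <- as.numeric(arima.sim(list(order = c(1, 0, 0), ar = 0.7), n = 2000, sd = 1))

# Log-likelihood (not its negative); b = (intercept, AR coefficient, error SD)
logl <- function(b, y) {
  if (b[3] <= 0) return(-Inf)               # an SD must be positive
  sum(dnorm(y[-1], mean = b[1] + b[2] * head(y, -1), sd = b[3], log = TRUE))
}

mh <- function(start, iterations, y, s) {
  chain <- matrix(NA_real_, iterations + 1, 3)
  chain[1, ] <- start
  lp_cur <- logl(start, y)
  for (i in seq_len(iterations)) {
    prop    <- rnorm(3, mean = chain[i, ], sd = s)   # symmetric random-walk proposal
    lp_prop <- logl(prop, y)
    if (log(runif(1)) < lp_prop - lp_cur) {          # accept with probability min(1, ratio)
      chain[i + 1, ] <- prop
      lp_cur <- lp_prop
    } else {
      chain[i + 1, ] <- chain[i, ]                   # a rejection repeats the current state
    }
  }
  chain
}

chain <- mh(c(0, 0.7, 1), 5000, y, s = c(0.02, 0.016, 0.0077))
post_mean <- colMeans(chain[-(1:1000), ])            # discard the first 1000 draws as burn-in
post_mean
```

The third component is the error standard deviation, so an error variance estimate is obtained by squaring the sampled values.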
