MLE issues in R

I am new to R and taught myself what I know of it based on the other languages I know. I am in a student research position and must use R to find the maximum likelihood estimate of the given likelihood function:
Where g, m_i, x_ij, n_ij, and mu_i are known. I have to maximize over theta_i, but I am not sure how, since I am mostly self-taught. I do know that I should end up with six estimated values of theta, however. I have tried doing research online about using mle, but I am not far enough into statistics to understand what the websites are talking about. Any help in figuring out what I am doing wrong would be greatly appreciated. I am unsure how to attach Excel files, so I apologize for not being able to include the data tables.
In trying to teach myself this and working with the professor, we get this error:
Error in do.call("minuslogl", l) : could not find function "minuslogl"
Below is the code I have written up to this point:
library(stats4)
#####################################################################
#Likelihood Model ####CHECK
###################################################################
BB <- function(LITTERS, responses, fetuses, mu, theta) {
  total <- 0
  for (i in 1:6) {
    firstSum <- 0
    for (j in 1:LITTERS) {
      secondSum <- 0
      # log(mu[i] + k * theta[i])
      insideFirst <- 0
      for (k in 0:(responses[i, j] - 1)) {
        insideFirst <- insideFirst + log10(mu[i] + k * theta[i])
      }
      # log(1 - mu[i] + k * theta[i])
      insideSecond <- 0
      for (k in 0:(fetuses[i, j] - responses[i, j] - 1)) {
        insideSecond <- insideSecond + log10(1 - mu[i] + k * theta[i])
      }
      # log(1 + k * theta[i])
      insideThird <- 0
      for (k in 0:(fetuses[i, j] - 1)) {
        insideThird <- insideThird + log10(1 + k * theta[i])
      }
      secondSum <- insideFirst + insideSecond - insideThird
      firstSum <- firstSum + secondSum
    }
    total <- total + firstSum
  }
  return(total)
}
###################################################################
#Number of litters
LITTERS.M <- 25
doses <- c(0, 30, 45, 60, 75, 90)
#Retrieves the litter sizes (fetuses)
litterSize.dose0 <- get.Litter.Sizes(dose0, LITTERS.M)
litterSize.dose30 <- get.Litter.Sizes(dose30, LITTERS.M)
litterSize.dose45 <- get.Litter.Sizes(dose45, LITTERS.M)
litterSize.dose60 <- get.Litter.Sizes(dose60, LITTERS.M)
litterSize.dose75 <- get.Litter.Sizes(dose75, LITTERS.M)
litterSize.dose90 <- get.Litter.Sizes(dose90, LITTERS.M)
litterSize <- c(litterSize.dose0, litterSize.dose30, litterSize.dose45, litterSize.dose60, litterSize.dose75, litterSize.dose90)
litterSizes <- matrix(litterSize, nrow = 6, ncol = LITTERS.M)
#Start of Linear Regression for AB By first estimating AB
estimate.dose0 <- get.estimate.AB(dose0)
estimate.dose30 <- get.estimate.AB(dose30)
estimate.dose45 <- get.estimate.AB(dose45)
estimate.dose60 <- get.estimate.AB(dose60)
estimate.dose75 <- get.estimate.AB(dose75)
estimate.dose90 <- get.estimate.AB(dose90)
rProbR <- c(estimate.dose0, estimate.dose30, estimate.dose45, estimate.dose60,
estimate.dose75, estimate.dose90)
ab <- c(get.Log.Estimate(estimate.dose0), get.Log.Estimate(estimate.dose30), get.Log.Estimate(estimate.dose45),
get.Log.Estimate(estimate.dose60), get.Log.Estimate(estimate.dose75), get.Log.Estimate(estimate.dose90))
#Fit to Linear Regression
toFit <- data.frame(rProbR, ab)
linearRegression <- lm(ab ~ rProbR, data=toFit)
#Get Coefficients of linear regression of AB
AApproximation = linearRegression$coefficients[1]
BApproximation = linearRegression$coefficients[2]
#Get probability response for each dose group (P(D[i]))
probabilityResponse.dose0 <- get.Probability.Response.Logistic(AApproximation + BApproximation * 0)
probabilityResponse.dose30 <- get.Probability.Response.Logistic(AApproximation + BApproximation * 30)
probabilityResponse.dose45 <- get.Probability.Response.Logistic(AApproximation + BApproximation * 45)
probabilityResponse.dose60 <- get.Probability.Response.Logistic(AApproximation + BApproximation * 60)
probabilityResponse.dose75 <- get.Probability.Response.Logistic(AApproximation + BApproximation* 75)
probabilityResponse.dose90 <- get.Probability.Response.Logistic(AApproximation + BApproximation * 90)
probabilityResponses <- c(probabilityResponse.dose0, probabilityResponse.dose30, probabilityResponse.dose45, probabilityResponse.dose60, probabilityResponse.dose75, probabilityResponse.dose90)
#Generate number of responses for each litter (Responses)
litterResponses.dose0 <- rbinom(LITTERS.M, litterSize.dose0, probabilityResponse.dose0)
litterResponses.dose30 <- rbinom(LITTERS.M, litterSize.dose30, probabilityResponse.dose30)
litterResponses.dose45 <- rbinom(LITTERS.M, litterSize.dose45, probabilityResponse.dose45)
litterResponses.dose60 <- rbinom(LITTERS.M, litterSize.dose60, probabilityResponse.dose60)
litterResponses.dose75 <- rbinom(LITTERS.M, litterSize.dose75, probabilityResponse.dose75)
litterResponses.dose90 <- rbinom(LITTERS.M, litterSize.dose90, probabilityResponse.dose90)
litterResponse <- c(litterResponses.dose0, litterResponses.dose30, litterResponses.dose45, litterResponses.dose60, litterResponses.dose75, litterResponses.dose90)
litterResponses <- matrix(litterResponse, 6, LITTERS.M)
backgroundResponseProb <- get.Probability.Response.Logistic(AApproximation + BApproximation * 0)
backgroundResponseProb <- backgroundResponseProb + .001
mle(BB(LITTERS.M, litterResponses, litterSizes, probabilityResponse, theta=0))

I haven't gone through the entire code (you should try to provide a minimal reproducible example), but the error you're getting is caused by not using the mle function correctly.
The first argument to mle should be a function that takes the candidate parameters as arguments and returns the negative log-likelihood of the data at those parameter values. The second argument is a named list of starting values. Look at ?mle for more details.
Here's a minimal example for fitting normally distributed data:
library(stats4)
y <- rnorm(100, 5, 3) ## Example data
mllNorm <- function(mean, log.sd) {-sum(dnorm(y, mean, exp(log.sd), log=TRUE))} ## Minus Gaussian log-likelihood
mle.fit <- mle(mllNorm, start=list(mean=1, log.sd=1)) ## MLE
print(mle.fit)
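Applied to your question, a rough (untested) sketch would be to wrap BB so that mle sees a function of the six thetas and gets back the negative log-likelihood. This assumes the objects LITTERS.M, litterResponses, litterSizes and probabilityResponses from your script exist, and that the starting values keep every log10 argument positive:
nLL <- function(theta1, theta2, theta3, theta4, theta5, theta6) {
  theta <- c(theta1, theta2, theta3, theta4, theta5, theta6)
  # BB returns a (log10) log-likelihood, so negate it for mle
  -BB(LITTERS.M, litterResponses, litterSizes, probabilityResponses, theta)
}
fit <- mle(nLL, start = list(theta1 = 0.01, theta2 = 0.01, theta3 = 0.01,
                             theta4 = 0.01, theta5 = 0.01, theta6 = 0.01))
summary(fit)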
As a more general tip, I highly recommend the maxLik package for MLE. It offers a more flexible interface, prettier output, and more optimization options.
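For example, a rough maxLik version of the same Gaussian fit could look like this (note that maxLik maximizes, so the function returns the log-likelihood rather than its negative):
library(maxLik)
llNorm <- function(par) {
  # Gaussian log-likelihood, with the sd parameterized on the log scale
  sum(dnorm(y, mean = par["mean"], sd = exp(par["log.sd"]), log = TRUE))
}
maxlik.fit <- maxLik(llNorm, start = c(mean = 1, log.sd = 1))
summary(maxlik.fit)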

Related

Dirichlet Process manually

I am implementing the Dirichlet Mixture Model using the EM algorithm in R, but am experiencing issues with the results. I generated two binomial distributions with fractions of (70%, 30%) and means of (0.05, 0.18), and trimmed 5% of the data set near 0. However, I am using a Beta distribution for clustering instead of a binomial distribution. Additionally, I am updating the mean and variance of the distributions rather than the alpha and beta parameters in order to impose constraints on the variance of each distribution.
I expected to obtain results similar to the ground truth settings, but instead I am getting pi values of (1, 0) and means of (0.09, 0.21). I am not sure if there are errors in my EM algorithm implementation or issues with parameter initialization.
I am including my R code for the data generation and DMM below. I would appreciate any help in identifying the cause of the problem and suggestions for how to resolve it.
library(dplyr)
library(data.table)
library(tidyverse)
set.seed(42)
#read count
cover <- 100
#Ground Truth Setting
subclone_f <- c(0.7, 0.3) # Ground Truth Setting - proportion
subclone_vaf <- c(0.05, 0.18) # Ground Truth Setting - mean
n_muts <- 45000
n_clone <-length(subclone_f)
#generating the virtual mutation notation: subclonal if 2, clonal if 1
mut_type <- sample.int(2, n_muts, prob = subclone_f, replace = TRUE)
mut_type
#generating binomial read counts for the given coverage
mut_reads <- rbinom(n_muts, cover, prob = subclone_vaf[mut_type]) %>% data.frame()
mut_reads
vaf <- (mut_reads/cover) %>% data.frame()
# Truncate the low count reads
n <- 0.95 * nrow(vaf) # cut-off setting
vaf_trim <- sapply(vaf, function(x) sort(x, decreasing = TRUE)[1:n])
colnames(vaf_trim) <- c("vaf")
hist(vaf_trim, breaks=seq(0,0.75,by=0.0001))
# Mixture Model
# Parameter Initialization (for 2 subclonality)
pi <- c(0.5, 0.5) # Mixture proportion weight: sums up to 1
alpha <- c(2,3)
beta <- c(20,5)
Mu[1] <- alpha[1] / (alpha[1] + beta[1])
Mu[2] <- alpha[2] / (alpha[2] + beta[2])
var[1] <- alpha[1]*beta[1] / ((alpha[1] + beta[1])^2 * (alpha[1] + beta[1] +1))
var[2] <- alpha[2]*beta[2] / ((alpha[2] + beta[2])^2 * (alpha[2] + beta[2] +1))
tau <-c(0.05, 0.05)
loglike[1] <- 0.5
loglike[2] <- 0.5
k <- 2
Nu <- 1 / (alpha + beta + 1) # controls the variance; same across the distributions. Originally I wanted to use the same Nu for both distributions, but I don't know how to do that.
n_cluster <- nrow(data.frame(pi))
logdbeta <- function(x, alpha, beta) {
  sum(sapply(x, function(x) {dbeta(x, alpha, beta, log = TRUE)}))
}
estBetaParams <- function(mu, var) {
  alpha <- ((1 - mu) / var - (1 / mu)) * mu^2
  beta <- alpha * (1 / mu - 1)
  return(params = list(alpha = alpha, beta = beta))
}
# Loop for the EM algorithm
while (abs(loglike[k] - loglike[k-1]) >= 0.00001) {
  # E step
  total <- (pi[1] * dbeta(vaf_trim, alpha[1], beta[1])) + (pi[2] * dbeta(vaf_trim, alpha[2], beta[2]))
  tau1 <- pi[1] * (dbeta(vaf_trim, alpha[1], beta[1])) / total
  tau2 <- pi[2] * (dbeta(vaf_trim, alpha[2], beta[2])) / total
  # M step
  pi[1] <- sum(tau1) / length(vaf_trim)   # Update pi (weight)
  pi[2] <- sum(tau2) / length(vaf_trim)
  Mu[1] <- sum(tau1 * vaf_trim) / sum(tau1)   # Update Mu
  Mu[2] <- sum(tau2 * vaf_trim) / sum(tau2)
  #Nu <- alpha + beta
  Nu <- 1 / (alpha + beta + 1)
  # Our main aim was to share the same coefficient for all distributions
  var[1] <- Mu[1] * (1 - Mu[1]) * Nu[1]   # Update variance
  var[2] <- Mu[2] * (1 - Mu[2]) * Nu[2]
  # Update in terms of alpha and beta
  estBetaParams(Mu[1], var[1])
  estBetaParams(Mu[2], var[2])
  # Maximize the log-likelihood
  loglike[k+1] <- sum(tau1 * (log(pi[1]) + logdbeta(vaf_trim, Mu[1], var[1]))) +
    sum(tau2 * (log(pi[2]) + logdbeta(vaf_trim, Mu[2], var[2])))
  k <- k + 1
}
# Print estimates
EM <- data.table(param = c("pi", "Mean"), pi = pi, Mean = Mu)
knitr::kable(EM)

R cannot replicate results from Stata for log-likelihood maximization

I have been working on this algorithm all week, and I do not seem to find the problem.
I employ Stata's NLSUR command for a simple QUAIDS maximization. Stata requires me to write a program evaluator in which I parametrize my system of equations and impose the parametric restrictions. Those with experience in such implementations will find the code below familiar. Stata's NLSUR then uses MLE to find the parameters. This is not a problem, so I use these results (which are correct) to check my log-likelihood optimization in R.
R is a bit trickier, because it requires me to write a script similar to the Stata one, but I additionally need to specify the log-likelihood, and hence the variance-covariance matrix, inside a function. This function is shown below. I have tried the BBoptim() solver and the results for the parameters do not match; only those in the variance-covariance matrix do. I also wrote my own gradient descent algorithm and checked it with other toy examples before checking it with this function. My gradient descent algorithm works on the toy examples, but not on my function. I believe the problem is my function, which is meant to mirror the one used in Stata's NLSUR.
I should also point out that when I plug Stata's estimated parameters into my own function with the same data, I get the same log-likelihood value. I also predict values with Stata and with my R function and get the same values. I have also checked that the parameter restrictions are imposed, and they seem to be.
Could you help me figure out what I am doing wrong? The reason I need this specification is that I will be incorporating truncations into the log-likelihood, but I started step by step by first solving the simpler problem without the truncations, using only the log-likelihood.
quaids.loglike <- function(param, s1, s2, s3, lnp1, lnp2, lnp3, lnp4, lnw) {
  # alphas
  a1 <- param[1]
  a2 <- param[2]
  a3 <- param[3]
  a4 <- (1 - a1 - a2 - a3)
  # betas
  b1 <- param[4]
  b2 <- param[5]
  b3 <- param[6]
  b4 <- (-b1 - b2 - b3)
  # gammas
  g11 <- param[7]
  g12 <- param[8]
  g13 <- param[9]
  g14 <- (-g11 - g12 - g13)
  g21 <- g12
  g22 <- param[10]
  g23 <- param[11]
  g24 <- (-g21 - g22 - g23)
  g31 <- g13
  g32 <- g23
  g33 <- param[12]
  g34 <- (-g31 - g32 - g33)
  g41 <- g14
  g42 <- g24
  g43 <- g34
  g44 <- (-g41 - g42 - g43)
  # lambdas
  l1 <- param[13]
  l2 <- param[14]
  l3 <- param[15]
  # Sigmas
  sig11 <- param[16]
  sig12 <- param[17]
  sig13 <- param[19]
  sig21 <- sig12
  sig22 <- param[18]
  sig23 <- param[20]
  sig31 <- sig13
  sig32 <- sig23
  sig33 <- param[21]
  # Lnpindex (ln a(p), where a0 = 5)
  lnpindex <- 5 + a1*lnp1 + a2*lnp2 + a3*lnp3 + a4*lnp4
  lnpindex <- lnpindex + 0.5*(g11*lnp1*lnp1 + g12*lnp1*lnp2 + g13*lnp1*lnp3 + g14*lnp1*lnp4)
  lnpindex <- lnpindex + 0.5*(g21*lnp2*lnp1 + g22*lnp2*lnp2 + g23*lnp2*lnp3 + g24*lnp2*lnp4)
  lnpindex <- lnpindex + 0.5*(g31*lnp3*lnp1 + g32*lnp3*lnp2 + g33*lnp3*lnp3 + g34*lnp3*lnp4)
  lnpindex <- lnpindex + 0.5*(g41*lnp4*lnp1 + g42*lnp4*lnp2 + g43*lnp4*lnp3 + g44*lnp4*lnp4)
  #for(i in as.character(1:4)) {
  #  for(j in as.character(1:4)) {
  #    gij <- get(paste0("g", i, j))
  #    lnpi <- get(paste0("lnp", i))
  #    lnpj <- get(paste0("lnp", j))
  #    lnpindex <- lnpindex + 0.5*gij*lnpi*lnpj
  #  }
  #}
  # b(p) price index
  bofp <- lnp1*b1 + lnp2*b2 + lnp3*b3 + lnp4*b4
  #for(i in as.character(1:4)) {
  #  lnpi <- get(paste0("lnp", i))
  #  bi <- get(paste0("b", i))
  #  bofp <- bofp + lnpi*bi
  #}
  bofp <- exp(bofp)
  # The parametric shares
  u1 <- a1 + g11*lnp1 + g12*lnp2 + g13*lnp3 + g14*lnp4 + b1*(lnw - lnpindex) + (l1/bofp)*(lnw - lnpindex)^2
  u2 <- a2 + g21*lnp1 + g22*lnp2 + g23*lnp3 + g24*lnp4 + b2*(lnw - lnpindex) + (l2/bofp)*(lnw - lnpindex)^2
  u3 <- a3 + g31*lnp1 + g32*lnp2 + g33*lnp3 + g34*lnp4 + b3*(lnw - lnpindex) + (l3/bofp)*(lnw - lnpindex)^2
  U <- c(mean(u1, na.rm = TRUE), mean(u2, na.rm = TRUE), mean(u3, na.rm = TRUE))
  # The vcov matrix:
  sigma <- c(sig11, sig12, sig13, sig21, sig22, sig23, sig31, sig32, sig33)
  sigma <- matrix(sigma, 3, 3)
  # The shares
  S <- cbind(s1, s2, s3)
  # the individual log-likelihood
  ll <- dmvnorm(S, U, sigma = sigma, log = TRUE)
  return(sum(ll))
}
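For reference, a minimal sketch of how I hand this function to a general-purpose optimizer is below (optim is used here purely for illustration; in my actual runs I used BBoptim and my own gradient descent). Here dat stands for the data frame holding the share, log-price and log-expenditure columns, start_par for a feasible 21-element starting vector (one whose implied 3x3 covariance matrix is positive definite), and dmvnorm is assumed to come from the mvtnorm package:
library(mvtnorm)  # assumed source of dmvnorm() used inside quaids.loglike
fit <- optim(par = start_par,                      # hypothetical starting values
             fn = quaids.loglike,
             s1 = dat$s1, s2 = dat$s2, s3 = dat$s3,
             lnp1 = dat$lnp1, lnp2 = dat$lnp2, lnp3 = dat$lnp3, lnp4 = dat$lnp4,
             lnw = dat$lnw,
             method = "BFGS",
             control = list(fnscale = -1, maxit = 10000))  # fnscale = -1 makes optim maximize
fit$value  # maximized log-likelihood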

R: Maximum Likelihood Estimation of an exponential mixture using optim

I'm trying to get the parameters w, lambda_1, lambda_2 and p of a bi-exponential mixture model, using a log-likelihood function and the optim function in R. The model is the following:
Here is the code
biexpLL <- function(theta, y) {
  # define parameters
  w <- theta[1]
  lambda_1 <- theta[2]
  a <- theta[3]
  lambda_2 <- theta[4]
  # likelihood function with dexp
  l <- w * dexp((y - a), rate = 1/lambda_1) + (1 - w) * dexp((y - a), rate = 1/lambda_2)
  - sum(log(l))
}
# Generate some fake data
w <- 0.7
n <- 500
lambda_1 <- 2
lambda_2 <- 0.2
set.seed(45)
biexp_data <- (w * rexp(n, 1/lambda_1) + (1 - w) * rexp(n, 1/lambda_2))
# Optimization
optim(par = c(0.5, 0.1, 0.001, 0.2),
      fn = biexpLL,
      y = biexp_data)
#$par
#[1] -94789220.4 16582.9 -333331.7 134744336.2
The parameters are very different from the ones used to generate the fake data! What am I doing wrong?
The original code is prone to warnings and errors because the parameters can easily wander into invalid values. For example, we need w in [0, 1] and lambda > 0. Also, if a is larger than a data point, the density becomes zero and hence the log-likelihood becomes infinite.
The code below uses a few tricks to handle these cases.
w is mapped into the range [0, 1] by the logistic function.
The lambdas are mapped to positive values by the exponential function.
A tiny value is added to the likelihood to deal with cases of zero likelihood.
Also, the data-generation process has been changed so that each sample is drawn from one of the exponential distributions with the given probability w.
Finally, the sample size has been increased, since the result was not stable with n = 500.
biexpLL <- function(theta, y) {
  # define parameters
  w <- 1/(1+exp(-theta[1]))
  lambda_1 <- exp(theta[2])
  a <- theta[3]
  lambda_2 <- exp(theta[4])
  # likelihood function with dexp
  l <- w * dexp((y - a), rate = 1/lambda_1) + (1 - w) * dexp((y - a), rate = 1/lambda_2)
  - sum(log(l + 1e-9))
}
# Generate some fake data
w <- 0.7
n <- 5000
lambda_1 <- 2
lambda_2 <- 0.2
set.seed(45)
n1 <- round(n*w)
n2 <- n - n1
biexp_data <- c(rexp(n1, rate = 1/lambda_1),
                rexp(n2, rate = 1/lambda_2))
# Optimization
o <- optim(par = c(0.5, 0.1, 0.001, 0.2),
           fn = biexpLL,
           y = biexp_data)
1/(1+exp(-o$par[1]))
exp(o$par[2])
o$par[3]
exp(o$par[4])
In my environment I obtained the results below.
They seem reasonably close to the simulation parameters (note that the two lambda values are swapped).
> 1/(1+exp(-o$par[1]))
[1] 0.3458264
> exp(o$par[2])
[1] 0.1877655
> o$par[3]
[1] 3.738172e-05
> exp(o$par[4])
[1] 2.231844
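A two-component mixture like this is only identified up to relabelling the components, so as a small convenience one might reorder the recovered parameters before comparing them with the true values, for example:
est <- c(w = 1/(1+exp(-o$par[1])), lambda_1 = exp(o$par[2]),
         a = o$par[3], lambda_2 = exp(o$par[4]))
if (est["lambda_1"] < est["lambda_2"]) {
  # swap the two components and flip the mixing weight accordingly
  est[c("lambda_1", "lambda_2")] <- est[c("lambda_2", "lambda_1")]
  est["w"] <- 1 - est["w"]
}
est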
Note that for mixture models of this kind, people often use the EM algorithm to optimize the likelihood instead of direct optimization like this. You may want to have a look at it as well; a rough sketch follows.
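As an illustration only (ignoring the shift parameter a, and reusing biexp_data from above), a hand-rolled EM iteration for a two-component exponential mixture could look like this:
em_biexp <- function(y, w = 0.5, m1 = mean(y) / 2, m2 = mean(y) * 2,
                     tol = 1e-8, max_iter = 1000) {
  for (iter in 1:max_iter) {
    # E step: posterior probability that each observation came from component 1
    d1 <- w * dexp(y, rate = 1 / m1)
    d2 <- (1 - w) * dexp(y, rate = 1 / m2)
    r1 <- d1 / (d1 + d2)
    # M step: weighted updates of the mixing weight and the two component means
    w_new  <- mean(r1)
    m1_new <- sum(r1 * y) / sum(r1)
    m2_new <- sum((1 - r1) * y) / sum(1 - r1)
    if (max(abs(c(w_new - w, m1_new - m1, m2_new - m2))) < tol) break
    w <- w_new; m1 <- m1_new; m2 <- m2_new
  }
  list(w = w, mean1 = m1, mean2 = m2, iterations = iter)
}
em_biexp(biexp_data)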
I have been able to get the parameters with the R package DEoptim:
library(DEoptim)
biexpLL <- function(theta, y) {
  w <- theta[1]
  lambda_1 <- theta[2]
  lambda_2 <- theta[3]
  l <- w * dexp(y, rate = 1 / lambda_1) + (1 - w) * dexp(y, rate = 1 / lambda_2)
  log_Lik <- -sum(log(l))
  if (is.infinite(log_Lik)) {
    return(10 ^ 30)
  } else {
    return(log_Lik)
  }
}
w <- 0.7
n <- 500
lambda_1 <- 2
lambda_2 <- 0.2
set.seed(45)
indicator <- rbinom(n = 500, size = 1, prob = w)
biexp_data <- (indicator * rexp(n, 1 / lambda_1) + (1 - indicator) * rexp(n, 1 / lambda_2))
obj_DEoptim <- DEoptim(fn = biexpLL, lower = c(0, 0, 0), upper = c(1, 1000, 1000), control = list(itermax = 1000, parallelType = 1), y = biexp_data)
obj_DEoptim$optim$bestmem
par1 par2 par3
0.7079678 2.2906098 0.2026040

Bridge sampling Monte Carlo method in RStudio for variance gamma

I am trying to use bridge sampling in RStudio to simulate paths of the variance gamma process. My code is:
sigma = 0.5054
theta = 0.2464
nu = 0.1184
mu=1
N=2^(k)
k=5
V_<-rep(NA,252)
V_[0]<-0
G_[N]<-rgamma(1, shape=N*1/nu, scale=nu)
G_<-0
V<-rnorm(theta*G[N],sigma^2*G[N])
for(l in 1:k){
  n<-2^(k-l)
  for(j in 1:2^i-1){
    i<-(2*j-1)*n
    d1<-(n)*mu^2/nu
    d2<-(n)*mu^2/nu
    Y<-rbeta(1,d1,d2)
    G_[i]<-G_[i-1]+(G[i+n]-G[i-n])*Y
    G[i]
    print(G_[i])
    Z<-rnorm(0,(G_[i+n]-G_[i])*sigma^2*Y)
    V_[i]<-Y*V_[i+n]+(1-Y)*V_[i-n]+Z
    print(V_[i])
  }
}
ts.plot(V[i])
I'm not sure what I've done wrong. The algorithm I am trying to follow is shown in the picture below:
Based on your code, a numerical sequence was simulated, and it can be roughly validated by using VarianceGamma::vgFit to estimate the parameters back from the simulated path.
Note that the time index starts from 1 due to R's indexing. The square root of the variance is used as the standard deviation in rnorm. Also, I probably shouldn't add the change due to the interest rate vgC at the end, since it is not included in your algorithm; please set it to 0 if it doesn't make sense.
Simulation by Brownian bridge:
# Brownian-Gamma Bridge Sampling (BGBS) of a VG process
set.seed(1)
M <- 10
nt <- 2^M + 1 #number of observations
T <- nt - 1 #total time
T_ <- seq(0, T, length.out=nt) #fixed time increments
#random time increments
#T_ = c(0, runif(nt-2), 1)
#T_ = sort(T_) * T
r <- 1 + 0.2 #interest rate
vgC <- (r-1)
sigma <- 0.5054
theta <- 0.2464
nu <- 0.1184
V_ <- G_ <- rep(NA,nt)
V_[1] <- 0
G_[1] <- 0
G_[nt] <- rgamma(1, shape=T/nu, scale=nu)
V_[nt] <- rnorm(1, theta*G_[nt], sqrt(sigma^2*G_[nt]))
for (k in 1:M) {
  n <- 2^(M-k)
  for (j in 1:2^(k-1)) {
    i <- (2*j-1) * n
    Y <- rbeta(1, (T_[i+1]-T_[i-n+1])/nu, (T_[i+n+1]-T_[i+1])/nu)
    G_[i+1] <- G_[i-n+1] + (G_[i+n+1] - G_[i-n+1]) * Y
    Z <- rnorm(1, sd=sqrt((G_[i+n+1] - G_[i+1]) * sigma^2 * Y))
    V_[i+1] <- Y * V_[i+n+1] + (1-Y) * V_[i-n+1] + Z
  }
}
V_ <- V_ + vgC*T_ # changes due to interest rate
plot(T_, V_)
The estimated parameters roughly match the real ones:
#Estimated parameters:
library(VarianceGamma)
dV <- V_[2:nt] - V_[1:(nt-1)]
vgFit(dV)
> vgC sigma theta nu
> 0.2996 0.5241 0.1663 0.1184
#Real parameters:
c(vgC, sigma, theta, nu)
> vgC sigma theta nu
> 0.2000 0.5054 0.2464 0.1184
EDIT
As you commented, there is another similar algorithm, and it can be implemented in a similar way.
Your code could be modified as below:
set.seed(1)
M <- 7
nt <- 2^M + 1
T <- nt - 1
T_ <- seq(0, T, length.out=nt)
sigma=0.008835
theta= -0.003856
nu=0.263743
vgc=0.004132
V_ <- G_ <- rep(1,nt)
G_[T+1] <- rgamma(1, shape=T/nu, scale=nu) #
V_[T+1] <- rnorm(1, theta*G_[T+1], sqrt(sigma^2*G_[T+1])) #
V_[1] <- 0
G_[1] <- 0
for (m in 1:M){ #
  Y <- rbeta(1,T/(2^m*nu), T/(2^m*nu))
  for (j in 1:2^(m-1)){ #
    i <- (2*j-1)
    G_[i*T/(2^m)+1] = G_[(i-1)*T/(2^m)+1]+(-G_[(i-1)*T/(2^m)+1]+G_[(i+1)*T/(2^m)+1])*Y #
    b=G_[T*(i+1)/2^m+1] - G_[T*(i)/2^m+1] #
    Z_i <- rnorm(1, sd=b*sigma^2*Y)
    #V_[i] <- Y* V_[i+1] + (1-Y)*V_[i-1] + Z_i
    V_[i*T/(2^m)+1] <- Y* V_[(i+1)*T/(2^m)+1] + (1-Y)*V_[(i-1)*T/(2^m)+1] + Z_i
  }
}
V_ <- V_ + vgc*T_
V_
ts.plot(V_, main="BRIDGE", xlab="Time increment")
Ryan again: I have found another algorithm for bridge sampling which I tried on my own, but I am not convinced that my answers are correct. I have added my code, output, and algorithm below, as well as the output I think it should look like. I have used a similar format to your code:
set.seed(1)
M <- 7
nt <- 2^M + 1 #number of observations
T <- nt - 1 #total time
T_ <- seq(0, T, length.out=nt) #fixed time increments
sigma=0.008835
theta= -0.003856
nu=0.263743
vgc=0.004132
V_ <- G_ <- rep(1,nt)
G_[T] <- rgamma(1, shape=T/nu, scale=nu)
V_[T] <- rnorm(1, theta*G_[T], sqrt(sigma^2*G_[T]))
V_[1] <- 0
G_[1] <- 0
for (m in 2:M){
  Y <- rbeta(1,T/(2^m*nu), T/(2^m*nu))
  for (j in 2:2^(m-1)){
    i <- (2*j-1)
    G_[i*T/(2^m)] = G_[(i-1)*T/(2^m)]+(G_[(i-1)*T/(2^m)]+G_[(i+1)*T/(2^m)])*Y
    b=G_[T*(i)/2^m] - G_[T*(i-1)/2^m]
    Z_i <- rnorm(1, sd=b*sigma^2*Y)
    V_[i] <- Y* V_[i+1] + (1-Y)*V_[i-1] + Z_i
  }
}
V_ <- V_ + vgc*T_ # changes due to interest rate
V_
ts.plot(V_, main="BRIDGE", xlab="Time increment")
However, this is how the plot from my output looks (figure 1):
But since the variance gamma process is a jump process with finite activity, the path should look like this (just an image from Google of variance gamma paths; the sequential-sampling one looks like this). My aim is to compare it to bridge sampling for simulating paths, yet my output looks really different. Please let me know your thoughts, and whether there is an issue in my code. Here is the algorithm for it, much like the one above but slightly different:

Linear regression gradient descent algorithms in R produce varying results

I am trying to implement linear regression in R from scratch, without using any packages or libraries, on the following data:
UCI Machine Learning Repository, Bike-Sharing-Dataset
The linear regression itself was easy enough; here is the code:
data <- read.csv("Bike-Sharing-Dataset/hour.csv")
# Select the useable features
data1 <- data[, c("season", "mnth", "hr", "holiday", "weekday", "workingday", "weathersit", "temp", "atemp", "hum", "windspeed", "cnt")]
# Split the data
trainingObs<-sample(nrow(data1),0.70*nrow(data1),replace=FALSE)
# Create the training dataset
trainingDS<-data1[trainingObs,]
# Create the test dataset
testDS<-data1[-trainingObs,]
x0 <- rep(1, nrow(trainingDS)) # column of 1's
x1 <- trainingDS[, c("season", "mnth", "hr", "holiday", "weekday", "workingday", "weathersit", "temp", "atemp", "hum", "windspeed")]
# create the x- matrix of explanatory variables
x <- as.matrix(cbind(x0,x1))
# create the y-matrix of dependent variables
y <- as.matrix(trainingDS$cnt)
m <- nrow(y)
solve(t(x)%*%x)%*%t(x)%*%y
The next step is to implement batch-update gradient descent, and this is where I am running into problems. I don't know where the errors are coming from or how to fix them; the code runs, but the values it produces are radically different from the results of the regression, and I am unsure why.
The two versions of batch-update gradient descent that I have implemented are as follows (the results of both algorithms differ from one another and from the results of the regression):
# Gradient descent 1
gradientDesc <- function(x, y, learn_rate, conv_threshold, n, max_iter) {
  plot(x, y, col = "blue", pch = 20)
  m <- runif(1, 0, 1)
  c <- runif(1, 0, 1)
  yhat <- m * x + c
  MSE <- sum((y - yhat) ^ 2) / n
  converged = F
  iterations = 0
  while(converged == F) {
    ## Implement the gradient descent algorithm
    m_new <- m - learn_rate * ((1 / n) * (sum((yhat - y) * x)))
    c_new <- c - learn_rate * ((1 / n) * (sum(yhat - y)))
    m <- m_new
    c <- c_new
    yhat <- m * x + c
    MSE_new <- sum((y - yhat) ^ 2) / n
    if(MSE - MSE_new <= conv_threshold) {
      abline(c, m)
      converged = T
      return(paste("Optimal intercept:", c, "Optimal slope:", m))
    }
    iterations = iterations + 1
    if(iterations > max_iter) {
      abline(c, m)
      converged = T
      return(paste("Optimal intercept:", c, "Optimal slope:", m))
    }
  }
  return(paste("MSE=", MSE))
}
AND:
grad <- function(x, y, theta) { # note that for readability, I redefined theta as a column vector
  gradient <- 1/m * t(x) %*% (x %*% theta - y)
  return(gradient)
}
grad.descent <- function(x, maxit, alpha){
  theta <- matrix(rep(0, length = ncol(x)), ncol = 1)
  for (i in 1:maxit) {
    theta <- theta - alpha * grad(x, y, theta)
  }
  return(theta)
}
If someone could explain why these two functions are producing different results I would greatly appreciate it. I also want to make sure that I am in fact implementing the gradient descent correctly.
Lastly, how can I plot the results of the descent with varying learning rates and superimpose this data over the results of the regression itself?
EDIT
Here are the results of running the two algorithms with alpha = .005 and 10,000 iterations:
1)
> gradientDesc(trainingDS, y, 0.005, 0.001, 32, 10000)
[1] "Optimal intercept: 2183458.95872599 Optimal slope: 62417773.0184353"
2)
> print(grad.descent(x, 10000, .005))
[,1]
x0 8.3681113
season 19.8399837
mnth -0.3515479
hr 8.0269388
holiday -16.2429750
weekday 1.9615369
workingday 7.6063719
weathersit -12.0611266
temp 157.5315413
atemp 138.8019732
hum -162.7948299
windspeed 31.5442471
To give you an example of how to write functions like this in a slightly better way, consider the following:
gradientDesc <- function(x, y, learn_rate, conv_threshold, max_iter) {
  n <- nrow(x)
  m <- runif(ncol(x), 0, 1) # m is a vector of dimension ncol(x), 1
  yhat <- x %*% m # since x already contains a constant, no need to add another one
  MSE <- sum((y - yhat) ^ 2) / n
  converged = F
  iterations = 0
  while(converged == F) {
    m <- m - learn_rate * (1/n * t(x) %*% (yhat - y))
    yhat <- x %*% m
    MSE_new <- sum((y - yhat) ^ 2) / n
    if(abs(MSE - MSE_new) <= conv_threshold) {
      converged = T
    }
    iterations = iterations + 1
    MSE <- MSE_new
    if(iterations >= max_iter) break
  }
  return(list(converged = converged,
              num_iterations = iterations,
              MSE = MSE_new,
              coefs = m))
}
For comparison:
ols <- solve(t(x)%*%x)%*%t(x)%*%y
Now,
out <- gradientDesc(x,y, 0.005, 1e-7, 200000)
data.frame(ols, out$coefs)
ols out.coefs
x0 33.0663095 35.2995589
season 18.5603565 18.5779534
mnth -0.1441603 -0.1458521
hr 7.4374031 7.4420685
holiday -21.0608520 -21.3284449
weekday 1.5115838 1.4813259
workingday 5.9953383 5.9643950
weathersit -0.2990723 -0.4073493
temp 100.0719903 147.1157262
atemp 226.9828394 174.0260534
hum -225.7411524 -225.2686640
windspeed 12.3671942 9.5792498
Here, x refers to your x as defined in your first code chunk. Note the similarity between the coefficients. However, also note that
out$converged
[1] FALSE
so you could increase the accuracy by increasing the number of iterations or by playing around with the step size. It might also help to scale your variables first, for example as sketched below.
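A quick sketch of scaling the non-intercept columns before rerunning the same function:
x_scaled <- x
x_scaled[, -1] <- scale(x[, -1])   # centre and standardise every predictor except the intercept column
out_scaled <- gradientDesc(x_scaled, y, 0.005, 1e-7, 200000)
out_scaled$converged
out_scaled$coefs
The coefficients then refer to the standardised predictors and would need to be converted back to the original scale before comparing them with the OLS fit on the raw data.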
