nonlinear least squares algorithm code with R

I have a question about minimizing the sum of squared residuals to estimate "theta" in the regression function below. I do not want to use any built-in functions or packages in R; I want to write the iterative algorithm myself.
The regression function is: y_k = exp(-theta * |x_k|) + e_k, for k = 1, ..., n.
Here is my code, but it gives me the following error for some sets of x and y. Thanks in advance for your suggestions!
Error in if (abs(dif) < 10^(-5)) break :
  missing value where TRUE/FALSE needed
Code:
theta <- -sum(log(abs(y))) / sum(abs(x))

# Algorithm:
rep <- 1
while (rep <= 1000) {
  Ratio <- sum((abs(x) * exp(-theta * abs(x))) * (y - exp(-theta * abs(x)))) /
    sum((abs(x)^2 * exp(-theta * abs(x))) * (y - 2 * exp(-theta * abs(x))))
  if (is.na(Ratio)) {
    thetanew <- theta
  } else {
    thetanew <- theta + Ratio
  }
  dif <- thetanew - theta
  theta <- thetanew
  if (abs(dif) < 10^(-5)) break
  rep <- rep + 1
}
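One possible direction (a sketch, not a verified fix): a plausible cause of the error is that Ratio becomes infinite when the denominator crosses zero, theta is then pushed to +/-Inf, and on the next pass dif = Inf - Inf is NaN when it reaches the abs(dif) test. A plain Gauss-Newton step for the same model has a denominator that cannot change sign, and an is.finite() guard keeps a non-finite step from ever reaching the convergence test. Assuming x and y are numeric vectors of the same length:

theta <- -sum(log(abs(y))) / sum(abs(x))   # same starting value as above
for (rep in 1:1000) {
  f    <- exp(-theta * abs(x))             # fitted values at the current theta
  num  <- sum(abs(x) * f * (y - f))        # gradient-related term
  den  <- sum(x^2 * f^2)                   # Gauss-Newton denominator, never negative
  step <- -num / den                       # Gauss-Newton increment
  if (!is.finite(step)) break              # guard against 0/0 or overflow
  theta <- theta + step
  if (abs(step) < 1e-5) break
}
theta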

Related

Implement a Monte Carlo Simulation Method to Estimate an Integral in R

I am trying to implement a Monte Carlo simulation method to estimate an integral in R. However, I still get the wrong answer. My code is as follows:
f <- function(x) {
  ((cos(x)) / x) * exp(log(x) - 3)^3
}
t <- integrate(f, 0, 1)
n <- 10000 # assume we conduct 10000 simulations
int_gral <- Monte_Car(n)
int_gral
You are not doing Monte Carlo here. Monte Carlo is a simulation method that approximates integrals using sums/means of random variables.
You should do something in this flavour (you might have to verify that it is correct to say that the mean of the f output approximates your integral):
f <- function(n) {
  x <- runif(n)
  return(((cos(x)) / x) * exp(log(x) - 3)^3)
}
int_gral <- mean(f(10000))
What your code does is take a number n and return ((cos(n))/n)*exp(log(n)-3)^3; there is no randomness in that.
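For completeness, the justification hinted at above: since the integral runs over (0, 1), it can be written as an expectation with respect to U ~ Uniform(0, 1), i.e. integral_0^1 f(x) dx = E[f(U)], so by the law of large numbers mean(f(runif(n))) converges to the integral as n grows.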
Update
Now, to get a more precise estimate, you need to replicate this step K times. Rather than using a loop, you can use the replicate() function:
K <- 100
dist <- data.frame(
  int = replicate(K, mean(f(10000)))
)
You get a distribution of estimators for your integral:
library(ggplot2)
ggplot(dist) + geom_histogram(aes(x = int, y = ..density..))
and you can use mean to have a numerical value:
mean(dist$int)
# [1] 2.95036e-05
You can evaluate the precision of your estimates with
sd(dist$int)
# [1] 2.296033e-07
Here it is small because n is already large, which already gives you good precision in the first step.
I have managed to change the code as follows. Kindly confirm that I am doing the right thing.
Regards.
f <- function(x) {
  ((cos(x)) / x) * exp(log(x) - 3)^3
}
set.seed(234)
n <- 10000
for (i in 1:10000) {
  x <- runif(n)
  I <- sum(f(x)) / n
}
I
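A caveat on this revised code (a hedged note, not a confirmation): the loop overwrites I on every pass, so after 10000 iterations only the last replicate is kept, and the loop adds nothing over a single call to mean(f(runif(n))). To actually pool replicates, store each estimate and average them, which is equivalent to the replicate() approach above:

set.seed(234)
n <- 10000                     # draws per replicate
K <- 100                       # number of replicates, as in the answer above
estimates <- numeric(K)
for (i in 1:K) {
  x <- runif(n)
  estimates[i] <- mean(f(x))   # one Monte Carlo estimate of the integral
}
mean(estimates)                # pooled estimate
sd(estimates)                  # spread across replicates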

Differentiation of mixture Gaussians in R

Is there any way to differentiate a mixture of Gaussians in R? To get the density of a Gaussian mixture distribution, I have used the following code:
dnorm_mix <- function(x, weights, means, sds) {
  value <- 0
  for (i in 1:length(weights)) {
    value <- value + weights[i] * dnorm(x, mean = means[i], sd = sds[i])
  }
  return(value)
}
Can anyone help me find expressions for the first and second derivatives of this? I've tried to use the built-in 'deriv' function in R, but it gives an error: 'dnorm_mix' is not in the derivatives table.
Thanks!
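No answer is attached to this question here, but one possible approach (an analytic sketch rather than deriv) is to differentiate each component directly, using d/dx dnorm(x, m, s) = -((x - m)/s^2) * dnorm(x, m, s) and d^2/dx^2 dnorm(x, m, s) = ((x - m)^2/s^4 - 1/s^2) * dnorm(x, m, s), and take the same weighted sums:

# First derivative of the mixture density (same arguments as dnorm_mix above)
dnorm_mix_d1 <- function(x, weights, means, sds) {
  value <- 0
  for (i in seq_along(weights)) {
    value <- value - weights[i] * (x - means[i]) / sds[i]^2 *
      dnorm(x, mean = means[i], sd = sds[i])
  }
  value
}

# Second derivative of the mixture density
dnorm_mix_d2 <- function(x, weights, means, sds) {
  value <- 0
  for (i in seq_along(weights)) {
    value <- value + weights[i] *
      ((x - means[i])^2 / sds[i]^4 - 1 / sds[i]^2) *
      dnorm(x, mean = means[i], sd = sds[i])
  }
  value
}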

Non-linear optimization for exponential function with linear constraints

I am trying to solve a non-linear optimization problem using the function donlp2 in R. My goal is to find the maximum value of the following function:
442.8658*(x1+1)^(0.008752747)*(y1+1)^(0.555782)+(x2+1)^(0.008752747)*(y2+1)^(0.555782)
There are no non-linear constraints. The linear constraints are listed below:
x1+x2<=20000;
y1+y2<=20000;
x1<=4662.41;
x2<=149339;
y1<=14013.94;
y2<=1342738;
x1>=0;
x2>=0;
y1>=0;
y2>=0;
Below is my code:
p <- c(rep(0, 4))
par.l <- c(rep(0, 4))
par.u <- c(4662.41, 149339, 14013.94, 1342738)
fn <- function(par) {
  x1 <- par[1]; y1 <- par[3]
  x2 <- par[2]; y2 <- par[4]
  y <- 1 / (442.8658 * (x1 + 1)^(0.008752747) * (y1 + 1)^(0.555782)
            + (x2 + 1)^(0.008752747) * (y2 + 1)^(0.555782))
}
A <- matrix(c(rep(c(1, 0), 2), rep(c(0, 1), 2)), nrow = 2)
lin.l <- c(-Inf, 20000)
lin.u <- c(-Inf, 20000)
ret <- donlp2(p, fn, par.u = par.u, par.l = par.l, A = A, lin.l = lin.l, lin.u = lin.u)
I searched and found some related posts saying that donlp2 only finds the minimum of a function, which is why I took the reciprocal in the objective function.
The code ran without errors, but I have concerns about the results, since I can easily find other values that give a greater outcome, i.e. the minimization of the objective function is not actually achieved.
I also found that when I change the initial value or the lower bound of x1, x2, y1, y2, the results change dramatically. For example, if I set p <- c(rep(0,4)), par.l <- c(rep(1,4)) instead of p <- c(rep(0,4)), par.l <- c(rep(0,4)), the results change from
$par
[1] 2.410409e+00 5.442753e-03 1.000000e+04 1.000000e+04
to
$par
[1] 2331.748 74670.025 3180.113 16819.887
Any ideas? I appreciate your suggestions and help!
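One point worth noting: for a strictly positive objective, minimizing the reciprocal is in principle equivalent to maximizing the function, but the reciprocal is badly scaled here, and minimizing the negated objective is the more usual choice. As a cross-check of the donlp2 result, here is a hedged sketch using base R's constrOptim() (not donlp2), with the same linear constraints written in the ui %*% par - ci >= 0 form that constrOptim expects and a strictly feasible starting point:

# Negated objective: minimizing this maximizes the original function
neg_fn <- function(par) {
  x1 <- par[1]; x2 <- par[2]; y1 <- par[3]; y2 <- par[4]
  -(442.8658 * (x1 + 1)^0.008752747 * (y1 + 1)^0.555782 +
      (x2 + 1)^0.008752747 * (y2 + 1)^0.555782)
}

ui <- rbind(
  c(-1, -1,  0,  0),   # x1 + x2 <= 20000
  c( 0,  0, -1, -1),   # y1 + y2 <= 20000
  c(-1,  0,  0,  0),   # x1 <= 4662.41
  c( 0, -1,  0,  0),   # x2 <= 149339
  c( 0,  0, -1,  0),   # y1 <= 14013.94
  c( 0,  0,  0, -1),   # y2 <= 1342738
  diag(4)              # x1, x2, y1, y2 >= 0
)
ci <- c(-20000, -20000, -4662.41, -149339, -14013.94, -1342738, rep(0, 4))

res <- constrOptim(c(1, 1, 1, 1), neg_fn, grad = NULL, ui = ui, ci = ci)
res$par     # candidate maximizer
-res$value  # corresponding maximum of the original objective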

sampling a multidimensional posterior distribution using the MCMC Metropolis-Hastings algorithm in R

I am quite new to sampling posterior distributions (that is, the Bayesian approach) using an MCMC technique based on the Metropolis-Hastings algorithm.
I am using the mcmc library in R for this. My distribution is multidimensional. In order to check whether this metrop algorithm works for a multivariate distribution, I ran it successfully on a multidimensional Student-t distribution (package mvtnorm, function dmvt).
Now I want to apply the same thing to my multivariate distribution (2 variables x and y), but it doesn't work; I get the error: Error in X[, 1] : incorrect number of dimensions
Here is my code:
library(mcmc)
library(mvtnorm)

my.seed <- 123

logprior <- function(X, ...) {
  ifelse((-50.0 <= X[,1] & X[,1] <= 50.0) & (-50.0 <= X[,2] & X[,2] <= 50.0),
         return(0), return(-Inf))
}

logpost <- function(X, ...) {
  log.like <- log(exp(-((X[,1]^2 + X[,2]^2 - 4) / 10)^2) * sin(4 * atan(X[,2] / X[,1])))
  log.prior <- logprior(X)
  log.post <- log.like + log.prior # with a flat prior, the posterior is the likelihood
  return(log.post)
}

x <- seq(-5, 5, 0.15)
y <- seq(-5, 5, 0.15)
X <- cbind(x, y)

# out <- metrop(function(X) dmvt(X, df = 3, log = TRUE), 0, blen = 100, nbatch = 100) # this works
out <- metrop(function(X) logpost(X), c(0, 0), blen = 100, nbatch = 100)
out <- metrop(out)
out$accept
So I tried to respect the same kind of format as for the MWE, but it still doesn't work, as I get the error mentioned before.
Another thing is that applying logpost to X directly works perfectly.
Thanks in advance for your help, best
The metrop function passes individual samples, and therefore a simple vector to logpost, not a matrix (which is what X is). Hence, the solution is to change X[,1] and X[,2] to X[1] and X[2], respectively.
I ran it like this, and it leads to other issues (X[2]/X[1] is NaN for the initialization), but that has more to do with your specific likelihood model and is out of the scope of your question.
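A minimal sketch of that fix (vector indexing; the starting point is moved off the origin so X[2]/X[1] is defined, and, purely so the sketch runs, non-positive values of the unnormalized density are treated as zero probability, which is an assumption about the intended model rather than part of the original question):

logprior <- function(X, ...) {
  if (-50 <= X[1] && X[1] <= 50 && -50 <= X[2] && X[2] <= 50) 0 else -Inf
}

logpost <- function(X, ...) {
  dens <- exp(-((X[1]^2 + X[2]^2 - 4) / 10)^2) * sin(4 * atan(X[2] / X[1]))
  if (!is.finite(dens) || dens <= 0) return(-Inf)  # assumption: zero density where sin(.) <= 0
  log(dens) + logprior(X)
}

out <- metrop(logpost, c(1, 1), blen = 100, nbatch = 100)
out <- metrop(out)
out$accept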

non-conformable arrays in calculation of the regression model

YC is a matrix of dimension 39 x 1.
XC is a matrix of dimension 39 x 700.
First I define a function that computes the coefficients by multiple linear regression, because I am trying to get the predicted y for LOOCV. The error is: Error in a + b1 %*% xp : non-conformable arrays.
mlr <- function(y, x, xp) {
  ybar <- mean(y)
  xbar <- apply(x, 1, mean)
  xc <- scale(x, center = TRUE, scale = FALSE)
  b <- solve(t(xc) %*% xc) %*% t(xc) %*% y
  b1 <- t(b)
  a <- ybar - b %*% xbar
  xp <- matrix(nrow = 2, ncol = 1)
  yp_LOOCV <- a + b1 %*% xp
  return(yp_LOOCV)
}
RMSECV <- sqrt(sum((total$YC - yp_LOOCV)^2) / 39)
The second part plugs the function into the LOOCV loop to calculate the RMSE of the data. The reason I did not use a package or the lm() command is that I will reuse the loop repeatedly afterwards when applying PCA and PLR. Could you please see what else I should modify to correct the error?
YC <- data.frame(YC)
XPred <- cbind(total$X1590, total$X1724)
n <- 0
yp_LOOCV <- as.vector(0)

# Running the LOOCV iteratively ('for' loop).
for (i in 1:39) {
  n <- 1 + n
  YC1 <- YC[-n, ]
  XC1 <- XPred[-n, ]
  Yout <- YC[n]
  Xout <- XPreduced[n]
  yp_LOOCV <- mlr(YC1, XC1, Xout)
}
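No answer is attached here, so treat this as a hedged sketch only: the non-conformable error is consistent with xbar being row means (length 39) rather than column means, and with xp being overwritten by an empty 2 x 1 matrix inside the function. A version in which the dimensions line up, assuming total$YC holds the response and the two spectral columns are the predictors, might look like this:

mlr <- function(y, x, xp) {
  ybar <- mean(y)
  xbar <- colMeans(x)                       # column means, one per predictor
  xc   <- scale(x, center = TRUE, scale = FALSE)
  b    <- solve(t(xc) %*% xc, t(xc) %*% y)  # regression coefficients
  a    <- ybar - sum(b * xbar)              # intercept (scalar)
  a + sum(b * xp)                           # predicted y for the held-out row
}

YC    <- total$YC                           # assumed response vector (length 39)
XPred <- cbind(total$X1590, total$X1724)    # assumed predictor columns
n     <- length(YC)
yp_LOOCV <- numeric(n)

for (i in 1:n) {
  yp_LOOCV[i] <- mlr(YC[-i], XPred[-i, , drop = FALSE], XPred[i, ])
}

RMSECV <- sqrt(sum((YC - yp_LOOCV)^2) / n)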
