Errors when attempting constrained optimisation using optim() - r

I have been using the Excel solver to handle the following problem
solve for a b and c in the equation:
y = a*b*c*x/((1 - c*x)*(1 - c*x + b*c*x))
subject to the constraints
0 < a < 100
0 < b < 100
0 < c < 100
f(x[1]) < 10
f(x[2]) > 20
f(x[3]) < 40
where I have about 10 (x,y) value pairs. I minimize the sum of abs(y - f(x)). And I can constrain both the coefficients and the range of values for the result of my function at each x.
I tried nls (without trying to impose the constraints) and while Excel provided estimates for almost any starting values I cared to provide, nls almost never returned an answer.
I switched to using optim, but I'm having trouble applying the constraints.
This is where I have gotten so far-
best = function(p,x,y){sum(abs(y - p[1]*p[2]*p[3]*x/((1 - p[3]*x)*(1 - p[3]*x + p[2]*p[3]*x))))}
p = c(1,1,1)
x = c(.1,.5,.9)
y = c(5,26,35)
optim(p,best,x=x,y=y)
I did this to add the first set of constraints-
optim(p,best,x=x,y=y,method="L-BFGS-B",lower=c(0,0,0),upper=c(100,100,100))
I get the error "ERROR: ABNORMAL_TERMINATION_IN_LNSRCH"
and end up with a higher value of the error ($value). So it seems like I am doing something wrong. I couldn't figure out how to apply my other set of constraints at all.
Could someone provide me a basic idea how to solve this problem that a non-statistician can understand? I looked at a lot of posts and looked in a few R books. The R books stopped at the simplest use of optim.

The absolute value makes the objective non-differentiable wherever a residual is zero:
you may want to use a square instead,
especially for gradient-based methods (such as L-BFGS-B).
The denominator of your function can be zero.
The fact that the parameters appear in products
and that you allow them to be (arbitrarily close to) zero
can also cause problems.
You can try with other optimizers
(complete list on the optimization task view),
until you find one for which the optimization converges.
x0 <- c(.1,.5,.9)
y0 <- c(5,26,35)
p <- c(1,1,1)
lower <- 0*p
upper <- 100 + lower
f <- function(p,x=x0,y=y0) sum(
(
y - p[1]*p[2]*p[3]*x / ( (1 - p[3]*x)*(1 - p[3]*x + p[2]*p[3]*x) )
)^2
)
library(dfoptim)
nmkb(p, f, lower=lower, upper=upper) # Converges
library(Rvmmin)
Rvmmin(p, f, lower=lower, upper=upper) # Does not converge
library(DEoptim)
DEoptim(f, lower, upper) # Does not converge
library(NMOF)
PSopt(f, list(min=lower, max=upper))[c("xbest", "OFvalue")] # Does not really converge
DEopt(f, list(min=lower, max=upper))[c("xbest", "OFvalue")] # Does not really converge
library(minqa)
bobyqa(p, f, lower, upper) # Does not really converge
As a last resort, you can always use a grid search.
library(NMOF)
r <- gridSearch( f,
lapply(seq_along(p), function(i) seq(lower[i],upper[i],length=200))
)
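A coarse grid search can also be sketched in base R with expand.grid. This is only an illustration under stated assumptions: the grid resolution (30 points per parameter) and the restriction of the third parameter to (0, 1], which keeps 1 - c*x away from zero for the x values used here, are choices made for this example and are not part of the original answer.

```r
x0 <- c(.1, .5, .9)
y0 <- c(5, 26, 35)

# Squared-error objective, same form as f in the answer
f <- function(p, x = x0, y = y0) {
  pred <- p[1] * p[2] * p[3] * x /
    ((1 - p[3] * x) * (1 - p[3] * x + p[2] * p[3] * x))
  sum((y - pred)^2)
}

# Assumed grid: 30 values per parameter; c is limited to (0, 1] so that
# the factor 1 - c*x stays positive for every x in x0
grid <- expand.grid(
  a = seq(0.5, 100, length.out = 30),
  b = seq(0.5, 100, length.out = 30),
  c = seq(0.01, 1, length.out = 30)
)

# Evaluate the objective on every grid row
vals <- apply(as.matrix(grid), 1, f)

# Guard against non-finite values (division by zero) before picking the best
vals[!is.finite(vals)] <- Inf
best <- grid[which.min(vals), ]

best
min(vals)
```

The best grid point is only as good as the grid is fine, so it is most useful as a starting value for one of the optimizers listed above.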

How do I minimize a linear least squares function in R?

I'm reading Deep Learning by Goodfellow et al. and am trying to implement gradient descent as shown in Section 4.5 Example: Linear Least Squares. This is page 92 in the hard copy of the book.
The algorithm can be viewed in detail at https://www.deeplearningbook.org/contents/numerical.html with R implementation of linear least squares on page 94.
I've tried implementing in R, and the algorithm as implemented converges on a vector, but this vector does not seem to minimize the least squares function as required. Adding epsilon to the vector in question frequently produces a "minimum" less than the minimum outputted by my program.
options(digits = 15)
dim_square = 2 ### set dimension of square matrix
# Generate a random matrix A and a random vector b
set.seed(1234)
A = matrix(nrow = dim_square, ncol = dim_square, byrow = T, rlnorm(dim_square ^ 2)/10)
b = rep(rnorm(1), dim_square)
# having fixed A and b, select x randomly
x = rnorm(dim_square) # vector length of dim_square--supposed to be arbitrary
f = function(x, A, b){
total_vector = A %*% x + b # this is the function that we want to minimize
total = 0.5 * sum(abs(total_vector) ^ 2) # L2 norm squared
return(total)
}
f(x,A,b)
# how close do we want to get?
epsilon = 0.1
delta = 0.01
value = (t(A) %*% A) %*% x - t(A) %*% b
L2_norm = (sum(abs(value) ^ 2)) ^ 0.5
steps = vector()
while(L2_norm > delta){
x = x - epsilon * value
value = (t(A) %*% A) %*% x - t(A) %*% b
L2_norm = (sum(abs(value) ^ 2)) ^ 0.5
print(L2_norm)
}
minimum = f(x, A, b)
minimum
minimum_minus = f(x - 0.5*epsilon, A, b)
minimum_minus # less than the minimum found by gradient descent! Why?
On page 94 of the pdf appearing at https://www.deeplearningbook.org/contents/numerical.html, I am trying to find the values of the vector x such that f(x) is minimized. However, as demonstrated by minimum and minimum_minus in my code, minimum is not the actual minimum, as it exceeds minimum_minus.
Any idea what the problem might be?
Original Problem
Finding the value of x that minimizes the L2 norm of Ax - b is, for an invertible A, equivalent to finding the value of x such that Ax - b = 0, or x = (A^-1)*b. This is because the L2 norm is the Euclidean norm, more commonly known as the distance formula. By definition, distance cannot be negative, so its minimum is zero whenever Ax = b has a solution.
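As a rough check of this statement, the sketch below runs plain gradient descent on 0.5 * ||Ax - b||^2 and compares the result with solve(A, b). The matrix A and vector b are hypothetical, well-conditioned values chosen for this illustration, not the random ones from the question. Note that the objective and the gradient both use Ax - b here, whereas the f in the question evaluates A %*% x + b while its update step uses the gradient of the Ax - b form.

```r
# Hypothetical example data (not the A and b generated in the question)
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(1, 2)

# Objective and its gradient, both written in terms of A x - b
f    <- function(x) 0.5 * sum((A %*% x - b)^2)
grad <- function(x) drop(t(A) %*% (A %*% x - b))

x       <- c(0, 0)   # starting point
epsilon <- 0.1       # step size
delta   <- 1e-8      # tolerance on the gradient norm
iter    <- 0

# Iterate until the gradient norm is small (with a cap on the iterations)
while (sqrt(sum(grad(x)^2)) > delta && iter < 10000) {
  x    <- x - epsilon * grad(x)
  iter <- iter + 1
}

x_direct <- solve(A, b)   # direct solution of A x = b

rbind(gradient_descent = x, direct = x_direct)
f(x)
```

With a consistent objective, perturbing the converged x in any direction should no longer decrease f.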
This algorithm, as implemented, actually comes quite close to estimating x. However, because of recursive subtraction and rounding one quickly runs into the problem of underflow, resulting in massive oscillation, below:
[Figure: value of the L2 norm as a function of step size]
[Figure: output of the above algorithm vs. the solve function in R]
Above we have the results of A %*% x followed by A %*% min_x, with x estimated by the implemented algorithm and min_x estimated by the solve function in R.
The problem of underflow, well known to those familiar with numerical analysis, is probably best tackled by the programmers of lower-level libraries best equipped to tackle it.
To summarize, the algorithm appears to work as implemented. Important to note, however, is that not every function will have a minimum (think of a straight line), and also be aware that this algorithm should only be able to find a local, as opposed to a global minimum.

Nested integration for incomplete convolution of gauss densities

Let g(x) = 1/sqrt(2*pi) * exp(-x^2 / 2) be the density of the normal distribution with mean 0 and standard deviation 1. In some calculation on paper appeared integrals of the form
where c>0 is a positive number.
Since I could not evaluate this by hand, I had the idea to approximate and plot it. I tried this in R, because R provides the dnorm function and a function to do integrals.
You see that I need to integrate numerically n times, where n shall be chosen in the call of a plot function. My code has a for-loop to create those "incomplete" convolutions iteratively.
For example even with n=3 and c=1 this gives me an error. n=2 (thus it's one integration) works.
N = 3
ngauss <- function(x) dnorm(x , mean = 0, sd = 1)
convoluts <- list()
convoluts[[1]] <- ngauss
for (i in 2:N) {
h <- function(y) {
g <- function(z) {ngauss(y-z)*convoluts[[i-1]](z)}
return(integrate(g, lower = -1, upper = 1)$value)
}
h <- Vectorize(h)
convoluts[[i]] <- h
}
convoluts[[3]](0)
What I get is:
Error: evaluation nested too deeply: infinite recursion /
options(expressions=)?
I understand that this is a hard computation, but for "small" n something similar should be possible.
Maybe someone can help me to fix my code or provide a recommendation how I can implement this in a better way. Another language that is more appropriate for this would be also okay.
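The issue appears to be in how i is looked up from inside the functions rather than in integrate itself. Every h refers to the same variable i, and its value is only read when h is called. After the loop has finished, i equals N, so with N = 3 the expression convoluts[[i - 1]] inside convoluts[[2]] points to convoluts[[2]] again and the recursion never terminates. Fixing the value of i at the time each function is created, using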
h <- evalq(function(y) {
g <- function(z) {ngauss(y - z) * convoluts[[i - 1]](z)}
integrate(g, lower = -1, upper = 1)$value
}, list(i = i))
does the job and, say, setting N <- 6 quickly gives
convoluts[[N]](0)
# [1] 0.03423872
As your integration is simply the pdf of a sum of N independent standard normals (which then follows N(0, N)), we may also verify this approach by setting lower = -Inf and upper = Inf. Then with N <- 4 we have
dnorm(0, sd = sqrt(N))
# [1] 0.1994711
convoluts[[N]](0)
# [1] 0.1994711
So, for practical purposes, when c = Inf, you are way better off using dnorm rather than manual computations.
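An alternative way to pin down the previous function at each step is a small factory function that forces its arguments. This is only a sketch: make_conv and cutoff are names made up for this illustration (cutoff plays the role of c in the question) and do not appear in the original code.

```r
ngauss <- function(x) dnorm(x, mean = 0, sd = 1)

# Hypothetical helper: returns the "incomplete" convolution of ngauss with prev
make_conv <- function(prev, cutoff = 1) {
  force(prev)     # evaluate the arguments now, not when h is first called
  force(cutoff)
  h <- function(y) {
    g <- function(z) ngauss(y - z) * prev(z)
    integrate(g, lower = -cutoff, upper = cutoff)$value
  }
  Vectorize(h)    # integrate() needs an integrand that accepts vectors
}

N <- 3
convoluts <- vector("list", N)
convoluts[[1]] <- ngauss
for (i in 2:N) {
  convoluts[[i]] <- make_conv(convoluts[[i - 1]])
}
convoluts[[N]](0)

# Sanity check with infinite limits: one convolution gives the N(0, 2) density
full <- make_conv(ngauss, cutoff = Inf)
c(numeric = full(0), exact = dnorm(0, sd = sqrt(2)))
```

Because each closure keeps its own prev, the result no longer depends on the value of i after the loop has finished.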

Monte carlo integration not working?

I wish to integrate (1/y)*(2/(1+(log(y))^2)) from 0 to 1. Wolfram alpha tells me this should be pi. But when I do monte carlo integration in R, I keep getting 3.00 and 2.99 after trying over 10 times. This is what I have done:
y=runif(10^6)
f=(1/y)*(2/(1+(log(y))^2))
mean(f)
I copied the exact function into Wolfram Alpha to check that the integral should be pi.
I tried to check if my y is properly distributed by checking its mean and plotting a histogram, and it seems to be ok. Could there be something wrong with my computer?
Edit: Maybe someone else could copy my code and run it themselves, to confirm that it isn't my computer acting up.
Ok, first let's start with a simple transformation, -log(y) -> x, making the integral
I = S 2/(1+x^2) dx, x in [0...infinity]
where S is integration sign.
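The transformed integral can be checked deterministically before any sampling is done. The sketch below assumes the substitution x = -log(y), so that dx = -dy/y and the limits y = 0..1 become x = infinity..0, and uses R's integrate on the result.

```r
# Integrand after the substitution x = -log(y)
transformed <- function(x) 2 / (1 + x^2)

# integrate() accepts Inf as an upper limit
res <- integrate(transformed, lower = 0, upper = Inf)

c(value = res$value, pi = pi)
```

This confirms the target value of pi independently of the Monte Carlo estimate.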
So the function 1/(1+x^2) is falling monotonically and reasonably fast. We need some reasonable PDF to sample points in the [0...infinity] interval, such that most of the region where the original function is significant is covered. We will use an exponential distribution with a free parameter k which we will use to optimize sampling.
I = S (2/(1+x^2))*(exp(k*x)/k) * k*exp(-k*x) dx, x in [0...infinity]
So, we have k*exp(-k*x) as a properly normalized PDF in the range of [0...infinity]. The function to average is (2/(1+x^2))*exp(k*x)/k. We know that sampling from the exponential distribution with rate k is basically -log(U(0,1))/k, so code to do that is very simple
k <- 0.05
# exponential distribution sampling from uniform vector
Fx <- function(x) {
-log(x) / k
}
# integrand
Fy <- function(x) {
( 2.0 / (1.0 + x*x) )*exp(k*x) / k
}
set.seed(12345)
n <- 10^6L
s <- runif(n)
# one could use rexp() as well instead of Fx
# x <- rexp(n, k)
x <- Fx(s)
f <- Fy(x)
q <- mean(f)
print(q)
Result is equal to 3.145954, for seed 22345 result is equal to 3.135632, for seed 32345 result is equal to 3.146081.
UPDATE
Going back to original function [0...1] is quite simple
UPDATE II
changed per prof.Bolker suggestion

Error in Gradient Descent Calculation

I tried to write a function to calculate gradient descent for a linear regression model. However, the answers I was getting do not match the answers I get using the normal equation method.
My sample data is:
df <- data.frame(c(1,5,6),c(3,5,6),c(4,6,8))
with c(4,6,8) being the y values.
lm_gradient_descent <- function(df,learning_rate, y_col=length(df),scale=TRUE){
n_features <- length(df) #n_features is the number of features in the data set
#using mean normalization to scale features
if(scale==TRUE){
for (i in 1:(n_features)){
df[,i] <- (df[,i]-mean(df[,i]))/sd(df[,i])
}
}
y_data <- df[,y_col]
df[,y_col] <- NULL
par <- rep(1,n_features)
df <- merge(1,df)
data_mat <- data.matrix(df)
#we need a temp_arr to store each iteration of parameter values so that we can do a
#simultaneous update
temp_arr <- rep(0,n_features)
diff <- 1
while(diff>0.0000001){
for (i in 1:(n_features)){
temp_arr[i] <- par[i]-learning_rate*sum((data_mat%*%par-y_data)*df[,i])/length(y_data)
}
diff <- par[1]-temp_arr[1]
print(diff)
par <- temp_arr
}
return(par)
}
Running this function,
lm_gradient_descent(df,0.0001,,0)
the results I got were
c(0.9165891,0.6115482,0.5652970)
when I use the normal equation method, I get
c(2,1,0).
Hope someone can shed some light on where I went wrong in this function.
You used the stopping criterion
old parameters - new parameters <= 0.0000001
First of all I think there's an abs() missing if you want to use this criterion (though my ignorance of R may be at fault).
But even if you use
abs(old parameters - new parameters) <= 0.0000001
this is not a good stopping criterion: it only tells you that progress has slowed down, not that it's already sufficiently accurate. Try instead simply to iterate for a fixed number of iterations. Unfortunately it's not that easy to give a good, generally applicable stopping criterion for gradient descent here.
It seems that you have not implemented a bias term. In a linear model like this, you always want to have an additional additive constant, i.e., your model should be like
w_0 + w_1*x_1 + ... + w_n*x_n.
Without the w_0 term, you usually won't get a good fit.
I know this is a couple of weeks old at this point but I'm going to take a stab at it for several reasons, namely
Relatively new to R so deciphering your code and rewriting it is good practice for me
Working on a different Gradient Descent problem so this is all fresh to me
Need the stackflow points and
As far as I can tell you never got a working answer.
First, regarding your data structures. You start with a dataframe, rename a column, strip out a vector, then strip out a matrix. It would be a lot easier to just start with an X matrix (capitalized since its component 'features' are referred to as x_i) and a y solution vector.
X <- cbind(c(1,5,6),c(3,5,6))
y <- c(4,6,8)
We can easily see what the desired solutions are, with and without scaling by fitting a linear fit model. (NOTE We only scale X/features and not y/solutions)
> lm(y~X)
Call:
lm(formula = y ~ X)
Coefficients:
(Intercept) X1 X2
-4 -1 3
> lm(y~scale(X))
Call:
lm(formula = y ~ scale(X))
Coefficients:
(Intercept) scale(X)1 scale(X)2
6.000 -2.646 4.583
With regards to your code, one of the beauties of R is that it can perform matrix multiplication which is significantly faster than using loops.
lm_gradient_descent <- function(X, y, learning_rate, scale=TRUE){
if(scale==TRUE){X <- scale(X)}
X <- cbind(1, X)
theta <- rep(0, ncol(X)) #your old temp_arr
diff <- 1
old.error <- sum( (X %*% theta - y)^2 ) / (2*length(y))
while(diff>0.000000001){
theta <- theta - learning_rate * t(X) %*% (X %*% theta - y) / length(y)
new.error <- sum( (X %*% theta - y)^2 ) / (2*length(y))
diff <- abs(old.error - new.error)
old.error <- new.error
}
return(theta)
}
And to show it works...
> lm_gradient_descent(X, y, .01, 0)
[,1]
[1,] -3.9360685
[2,] -0.9851775
[3,] 2.9736566
vs expected of (-4, -1, 3)
For what it's worth, while I agree with @cfh that I would prefer a loop with a defined number of iterations, I'm actually not sure you need the abs function. If diff < 0 then your function is not converging.
Finally, rather than using something like old.error and new.error, I'd suggest using a vector that records all errors. You can then plot that vector to see how quickly your function converges.
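A minimal sketch of that last suggestion, combined with a fixed number of iterations, could look like the following. The function name gd_with_history, the n_iter argument and its value are assumptions made for this illustration; the update rule is the same as in lm_gradient_descent above, without scaling.

```r
# Hypothetical variant: fixed iteration count, error recorded at every step
gd_with_history <- function(X, y, learning_rate, n_iter = 1000) {
  X <- cbind(1, X)                 # add the intercept column
  theta <- rep(0, ncol(X))
  errors <- numeric(n_iter)        # one cost value per iteration
  for (k in seq_len(n_iter)) {
    residual <- X %*% theta - y
    theta <- theta - learning_rate * t(X) %*% residual / length(y)
    errors[k] <- sum((X %*% theta - y)^2) / (2 * length(y))
  }
  list(theta = drop(theta), errors = errors)
}

X <- cbind(c(1, 5, 6), c(3, 5, 6))
y <- c(4, 6, 8)

fit <- gd_with_history(X, y, learning_rate = 0.01, n_iter = 5000)
fit$theta

# Cost per iteration: a curve that keeps falling means it is still converging
plot(fit$errors, type = "l", xlab = "iteration", ylab = "cost")
```

If the curve rises instead of falling, the learning rate is too large for the data.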

How extreme values of a functional can be found using R?

I have a functional like this :
(LaTex formula: $v[y]=\int_0^2 (y'^2+23yy'+12y^2+3ye^{2t})dt$)
with given start and end conditions y(0)=-1, y(2)=18.
How can I find extreme values of this functional in R? I realize how it can be done for example in Excel but didn't find appropriate solution in R.
Before trying to solve such a task in a numerical setting, it might be better to lean back and think about it for a moment.
This is a problem typically treated in the mathematical discipline of "variational calculus". A necessary condition for a function y(t) to be an extremum of the functional (ie. the integral) is the so-called Euler-Lagrange equation, see
Calculus of Variations at Wolfram Mathworld.
Applying it to f(t, y, y') as the integrand in your request, I get (please check, I can easily have made a mistake)
y'' - 12*y - 3/2*exp(2*t) = 0
You can go now and find a symbolic solution for this differential equation (with the help of a textbook, or some CAS), or solve it numerically with the help of an R package such as 'deSolve'.
PS: Solving this as an optimization problem based on discretization is possible, but may lead you on a long and stony road. I remember solving the "brachistochrone problem" to a satisfactory accuracy only by applying several hundred variables (not in R).
Here is a numerical solution in R. First the functional:
f<-function(y,t=head(seq(0,2,len=length(y)),-1)){
len<-length(y)-1
dy<-diff(y)*len/2
y0<-(head(y,-1)+y[-1])/2
2*sum(dy^2+23*y0*dy+12*y0^2+3*y0*exp(2*t))/len
}
Now the function that does the actual optimization. The best results I got were using the BFGS optimization method, and parametrizing using dy rather than y:
findMinY<-function(points=100, ## number of points of evaluation
boundary=c(-1,18), ## boundary values
y0=NULL, ## optional initial value
method="Nelder-Mead", ## optimization method
dff=T) ## if TRUE, optimizes based on dy rather than y
{
t<-head(seq(0,2,len=points),-1)
if(is.null(y0) || length(y0)!=points)
y0<-seq(boundary[1],boundary[2],len=points)
if(dff)
y0<-diff(y0)
else
y0<-y0[-1]
y0<-head(y0,-1)
ff<-function(z){
if(dff)
y<-c(cumsum(c(boundary[1],z)),boundary[2])
else
y<-c(boundary[1],z,boundary[2])
f(y,t)
}
res<-optim(y0,ff,control=list(maxit=1e9),method=method)
cat("Iterations:",res$counts,"\n")
ymin<-res$par
if(dff)
c(cumsum(c(boundary[1],ymin)),boundary[2])
else
c(boundary[1],ymin,boundary[2])
}
With 500 points of evaluation, it only takes a few seconds with BFGS:
> system.time(yy<-findMinY(500,method="BFGS"))
Iterations: 90 18
user system elapsed
2.696 0.000 2.703
The resulting function looks like this:
plot(seq(0,2,len=length(yy)),yy,type='l')
And now a solution that numerically integrates the Euler equation.
As @HansWerner pointed out, this problem boils down to applying the Euler-Lagrange equation to the integrand in OP's question, and then solving that differential equation, either analytically or numerically. In this case the relevant ODE is
y'' - 12*y = 3/2*exp(2*t)
subject to:
y(0) = -1
y(2) = 18
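For reference, the ODE follows from the Euler-Lagrange equation in a few lines. Here F denotes the integrand of the functional; the symbol is used only for this derivation and is unrelated to the R function F defined for bvpcol.

```latex
\begin{aligned}
F(t, y, y') &= y'^2 + 23\,y\,y' + 12\,y^2 + 3\,y\,e^{2t} \\
\frac{\partial F}{\partial y} &= 23\,y' + 24\,y + 3\,e^{2t} \\
\frac{\partial F}{\partial y'} &= 2\,y' + 23\,y
\quad\Rightarrow\quad
\frac{d}{dt}\frac{\partial F}{\partial y'} = 2\,y'' + 23\,y' \\
0 &= \frac{\partial F}{\partial y} - \frac{d}{dt}\frac{\partial F}{\partial y'}
   = 24\,y + 3\,e^{2t} - 2\,y''
\quad\Rightarrow\quad
y'' - 12\,y = \tfrac{3}{2}\,e^{2t}
\end{aligned}
```

The 23yy' term cancels because it is a total derivative, d/dt(23y^2/2), and therefore only contributes through the fixed boundary values.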
So this is a boundary value problem, best approached using bvpcol(...) in package bvpSolve.
library(bvpSolve)
F <- function(t, y.in, pars){
dy <- y.in[2]
d2y <- 12*y.in[1] + 1.5*exp(2*t)
return(list(c(dy,d2y)))
}
init <- c(-1,NA)
end <- c(18,NA)
t <- seq(0, 2, by = 0.01)
sol <- bvpcol(yini = init, yend = end, x = t, func = F)
y = function(t){ # analytic solution...
b <- sqrt(12)
a <- 1.5/(4-b*b)
u <- exp(2*b)
C1 <- ((18*u + 1) - a*(exp(4)*u-1))/(u*u - 1)
C2 <- -1 - a - C1
return(a*exp(2*t) + C1*exp(b*t) + C2*exp(-b*t))
}
par(mfrow=c(1,2))
plot(t,y(t), type="l", xlim=c(0,2),ylim=c(-1,18), col="red", main="Analytical Solution")
plot(sol[,1],sol[,2], type="l", xlim=c(0,2),ylim=c(-1,18), xlab="t", ylab="y(t)", main="Numerical Solution")
It turns out that in this very simple example, there is an analytical solution:
y(t) = a * exp(2*t) + C1 * exp(sqrt(12)*t) + C2 * exp(-sqrt(12)*t)
where a = -3/16 and C1 and C2 are determined to satisfy the boundary conditions. As the plots show, the numerical and analytic solution agree completely, and also agree with the solution provided by @mrip.
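The analytic solution can also be checked without plotting. The sketch below re-states the same formula under the hypothetical name y_exact, evaluates the two boundary conditions, and estimates the ODE residual with a central finite difference; the step size h and the test points tt are choices made for this illustration.

```r
# Same closed form as y(t) above, under a different (hypothetical) name
y_exact <- function(t) {
  b  <- sqrt(12)
  a  <- 1.5 / (4 - b^2)        # coefficient of the particular solution, -3/16
  u  <- exp(2 * b)
  C1 <- ((18 * u + 1) - a * (exp(4) * u - 1)) / (u^2 - 1)
  C2 <- -1 - a - C1
  a * exp(2 * t) + C1 * exp(b * t) + C2 * exp(-b * t)
}

# Boundary conditions: should be -1 and 18
c(y_exact(0), y_exact(2))

# Residual of y'' - 12*y - 1.5*exp(2*t), with y'' from a central difference
h  <- 1e-4
tt <- seq(0.1, 1.9, by = 0.1)
d2 <- (y_exact(tt + h) - 2 * y_exact(tt) + y_exact(tt - h)) / h^2
residual <- d2 - 12 * y_exact(tt) - 1.5 * exp(2 * tt)
max(abs(residual))
```

The residual is not exactly zero because the second derivative is approximated numerically, but it should be small compared with the terms of the equation.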
