I am working with the R programming language.
I defined the following function:
f1 <- function(x) {
i <<- i+1
vals[[i]] <<- x
final_value = x[1]^2 + x[2]^2
}
and then optimized this function:
i <- 0
vals <- list()
res <- optim(c(1,1), f1, method="CG")
I am trying to understand the outputs of the "optim" (https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/optim) function:
$par
[1] -4.103643e-07 -4.103643e-07
$value
[1] 3.367978e-13
$counts
function gradient
26 11
$convergence
[1] 0
$message
NULL
In particular, I am trying to understand the "counts" element. The documentation gives the following explanation:
counts:
A two-element integer vector giving the number of calls to fn and gr respectively. This excludes those calls needed to compute the Hessian, if requested, and any calls to fn to compute a finite-difference approximation to the gradient.
I would have thought that "counts" would refer to the number of iterations it took for the function to be optimized. But the total number of iterations I see does not match the output of "counts":
> vals
[[1]]
[1] 1 1
[[2]]
[1] 1.001 1.000
[[3]]
[1] 0.999 1.000
[[4]]
[1] 1.000 1.001
....
[[68]]
[1] -1.000410e-03 -4.103643e-07
[[69]]
[1] -4.103643e-07 9.995896e-04
[[70]]
[1] -4.103643e-07 -1.000410e-03
For instance, it appears that 70 iterations were used to optimize this function, but the "counts" element suggests that only 26 were used.
Can someone please help me understand what "counts" refers to, and what the difference between 70 and 26 represents here?
Thanks!
As the documentation states, optim does not count every call to your function. Because no gradient gr was supplied, the CG method approximates the gradient by finite differences, and the extra calls to fn needed for that approximation are excluded from counts (as are any calls used to compute the Hessian, if one is requested). That is exactly what your recorded values show: entries such as c(1.001, 1.000), c(0.999, 1.000) and c(1.000, 1.001) are the probe points of a central-difference approximation around the current point (optim's default step size ndeps is 1e-3). With 2 parameters and 2 probes per parameter, each of the 11 gradient evaluations costs 4 extra calls to fn, so your tracker sees 26 + 11 * 4 = 70 calls in total, while res$counts[["function"]] reports only the 26 calls made directly for function values. So counts is not the number of iterations; it is the number of "counted" calls to fn and gr.
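To illustrate (a minimal sketch, reusing the f1, i and vals defined in the question): if an analytic gradient is supplied, CG no longer needs finite-difference evaluations, so the number of calls recorded in vals should match counts["function"].
gr1 <- function(x) c(2*x[1], 2*x[2])   # analytic gradient of x1^2 + x2^2
i <- 0
vals <- list()
res2 <- optim(c(1,1), f1, gr = gr1, method = "CG")
res2$counts     # the "function" entry ...
length(vals)    # ... should now equal the number of recorded calls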
I want to discount a number of cashflows and find the "fair" interest rate of a financial instrument. That is, I want to set the interest rate in such a way that
$$P=\sum_{i=1}^{T} e^{-r i} c_i,$$
where P is some value. As a toy example, I came up with this:
value <- c(1,2,3,4,5,6,7,8,9,10)
discounted_value <- c()
for(i in 1:10){
discounted_value[i] <- value[i]*exp(-r*i)
}
should_be_equal_to <- 50
I want to find an r such that the sum of the discounted_value vector equals 50 (should_be_equal_to). Does anyone have an idea how to solve this? I would prefer not to do a manual grid search over r.
This is essentially a one-dimensional minimization problem, similar to least squares: you define a function that calculates the absolute difference between the target value (50) and the sum of the discounted values for a given r:
fn <- function(value, r) {
  # absolute gap between the target (50) and the discounted sum for this r
  delta <- 50 - sum(value * exp(-r * seq_along(value)))
  abs(delta)
}
Then you minimize this function with the Brent method, providing bounds for r (note that because value is passed by name, the parameter being optimized is matched to fn's remaining argument r):
optim(par = 0.5, fn = fn, value = value, method = "Brent", lower = 0, upper = 1)
$par
[1] 0.01369665
$value
[1] 2.240363e-07
$counts
function gradient
NA NA
$convergence
[1] 0
$message
NULL
And you can try the optimized parameter:
r = 0.01369665
discounted_value = value*exp(-r*seq_along(value))
sum(discounted_value)
[1] 50
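As an aside (my addition, not part of the original answer): because the discounted sum decreases monotonically in r, the signed gap has a single root on the interval used above, so the same equation can also be solved directly with uniroot.
h <- function(r, value, target = 50) sum(value*exp(-r*seq_along(value))) - target
uniroot(h, interval = c(0, 1), value = value)$root
# should agree with the optim() result, roughly 0.0137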
I want to transform my Excel Solver model into a model in R. I need to find 3 sets of coordinates which minimize the distance to 5 other given coordinates. I've written a function which builds a distance matrix and outputs the minimal distance from each input point to the given coordinates. I want to minimize this output by changing the input, i.e. I want to find the coordinates such that the sum of minimal distances is minimized. I tried several methods; see the code below. (Yes, my distance matrix function might be somewhat clunky, but I had to reduce the input to one variable in order to run some algorithms such as nloptr, which would give warnings otherwise.) I've also looked at some other questions (such as GRG Non-Linear Least Squares (Optimization)), but they did not change/improve the solution.
# First half of p holds the x coordinates, the second half the y coordinates (yes, that's clunky)
p<-c(2,4,6,5,3,2) # initial points
x_given <- c(2,2.5,4,4,5)
y_given <- c(9,5,7,1,2)
f <- function(Coordinates){
# Predefining
Term_1 <- NULL
Term_2 <- NULL
x <- NULL
Distance <- NULL
min_prob <- NULL
l <- length(Coordinates)
l2 <- length(x_given)
half_length <- l/2
s <- l2*half_length
Distance_Matrix <- matrix(c(rep(1,s)), nrow=half_length)
# Creating the distance matrix
for (k in 1:half_length){
for (i in 1:l2){
Term_1[i] <- (Coordinates[k]-x_given[i])^2
Term_2[i] <- (Coordinates[k+half_length]-y_given[i])^2
Distance[i] <- sqrt(Term_1[i]+Term_2[i])
Distance_Matrix[k,i] <- Distance[i]
}
}
d <- Distance_Matrix
# Find the minimum in each row; that's what we want to obtain and minimize
for (l in 1:nrow(d)){
min_prob[l] <- min(d[l,])
}
som<-sum(min_prob)
return(som)
}
# Minimise
sol<-optim(p,f)
x<-sol$par[1:3]
y<-sol$par[4:6]
plot(x_given,y_given)
points(x,y,pch=19)
The solution, however, is clearly not optimal. I've tried the nloptr function, but I'm not sure which algorithm to use. Which algorithm could I use, or how could I program another function to solve this problem? Thanks in advance (and sorry for the long, detailed question).
Look at the output of optim. It reached the iteration limit and had not yet converged.
> optim(p, f)
$`par`
[1] 2.501441 5.002441 5.003209 5.001237 1.995857 2.000265
$value
[1] 0.009927249
$counts
function gradient
501 NA
$convergence
[1] 1
$message
NULL
Although the result is not that different, you will need to increase the number of iterations to get convergence. If that is still unacceptable, try different starting values (see the sketch after the output below).
> optim(p, f, control = list(maxit = 1000))
$`par`
[1] 2.502806 4.999866 5.000000 5.003009 1.999112 2.000000
$value
[1] 0.005012449
$counts
function gradient
755 NA
$convergence
[1] 0
$message
NULL
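To follow up on the advice about starting values (a sketch I am adding, assuming the f, x_given and y_given defined in the question): one simple approach is to restart the optimizer from several random starting points and keep the best result.
set.seed(42)
starts <- replicate(20, runif(6, 0, 10), simplify = FALSE)   # 20 random candidate coordinate vectors
fits <- lapply(starts, function(s) optim(s, f, control = list(maxit = 1000)))
best <- fits[[which.min(sapply(fits, `[[`, "value"))]]
best$par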
I am trying to use the L-BFGS-B method in optim() to find the minimum of the following function:
ip <- function(x) log(mean(exp(return*x))), where "return" is a vector of constants.
First, I gave no boundary constraints: rst1 <- optim(-1, ip, method="L-BFGS-B"), and it produced a reasonable answer (x = -118.44, ip.min = -0.00017), which could be justified both by theory and by an Excel calculation. The message in the result was
CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL.
As x must be less than zero in theory, I then added boundary constraints to the optimizer: rst2 <- optim(-1, ip, method="L-BFGS-B", lower=-Inf, upper=0). However, this time it only returned the value at the initial parameter (-1), which is obviously not the minimum. The message in the result was
CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH.
I then tried other boundary constraints, and no matter what they were, as long as any boundary constraints were added, optim always returned the value at the initial parameter and failed to find the minimum.
Does anyone know why this happens? Many thanks.
Example:
rtntxt<-"
return
9.15051E-05
9.67217E-07
1.34187E-05
-0.000105801
0.000111004
0.000228786
3.84068E-06
0.000388639
-0.000122291
-7.73028E-05
4.97595E-05
-3.97503E-05
1.86449E-05
-0.000137739
-0.000180709
-1.07254E-05
3.89723E-05
"
rtn<-read.table(text=rtntxt,header=TRUE)
ip<-function(x) log(mean(exp(rtn$return*x)))
rst1<-optim(-1,ip,method="L-BFGS-B") #no boundaries
rst2<-optim(-1,ip,method="L-BFGS-B",lower=-Inf,upper=0) #with boundaries
Plot:
x<- -10000:10000
n<-length(x)
s<-numeric(n)
for(i in 1:n) s[i]<-ip(x[i])
plot(x,s)
x[which(s==min(s))] #rst1(no boundaries) is correct
min(s)
I am not sure how you got that result: if I correct your code for the misspelling, I still get similar answers, not the answer you reported:
ip<-function(x) log(mean(exp(return(x))))
rst1<-optim(-1,ip,method="L-BFGS-B")
# > rst1
# $`par`
# [1] -1.820444e+13
#
# $value
# [1] -1.820444e+13
#
# $counts
# function gradient
# 20 20
#
# $convergence
# [1] 0
#
# $message
# [1] "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL"
#
rst2<-optim (-1,ip,method="L-BFGS-B",lower=-Inf,upper=0)
# $`par`
# [1] -1.80144e+13
#
# $value
# [1] -1.80144e+13
#
# $counts
# function gradient
# 3 3
#
# $convergence
# [1] 0
#
# $message
# [1] "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL"
Moreover, to check whether there could be a mistake in my code, I tried to plot the values of your function for -1:-100000, but it does not look like there is an optimum where you say there is. Check your code/post, and if you know approximately where the optimum is, try plotting the function around it (that would be my advice). Cheers!
plot(x = -1:-100000, y = ip(-1:-100000))
Here's my setup
obs1<-c(1,1,1)
obs2<-c(0,1,2)
obs3<-c(0,0,3)
absoluteError<-function(obs,x){
return(sum(abs(obs-x)))
}
Example:
> absoluteError(obs2,1)
[1] 2
For a random vector of observations, I'd like to find a minimizer, x, which minimizes the absolute error between the observation values and a vector of all x. For instance, clearly the argument that minimizes absoluteError(obs1,x) is x=1 because this results in an error of 0. How do I find a minimizer for a random vector of observations? I'd imagine this is a linear programming problem, but I've never implemented one in R before.
The median of obs is a minimizer for the absolute error. The following is a sketch of how one might try proving this:
Let the median of a set of n observations, obs, be m. Call the absolute error between obs and m f(obs,m).
Case n is odd:
Consider f(obs,m+delta) where delta is some non-zero number. Suppose delta is positive; then at least (n-1)/2 + 1 observations (those at or below m) each contribute delta more error than they do to f(obs,m), while the remaining (n-1)/2 observations each contribute at most delta less. So f(obs,m+delta) - f(obs,m) >= delta > 0. The same argument works if delta is negative, so f(obs,m+delta) > f(obs,m) for any non-zero delta, and the median m is the unique minimizer in this case.
Case n is even:
Basically the same logic as above, except that in this case any number between (and including) the two innermost numbers in the set is a minimizer.
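As a quick numerical check (a sketch I am adding, using absoluteError and the obs vectors from the question): the minimizer found by optimize() should approximately match the median in each case.
check <- function(o) {
  fit <- optimize(function(x) absoluteError(o, x), interval = range(o) + c(-1, 1))
  c(minimizer = fit$minimum, median = median(o))
}
check(obs1)
check(obs2)
check(obs3)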
I am not sure this answer is correct, and even if it is I am not sure this is what you want. Nevertheless, I am taking a stab at it.
I think you are talking about 'Least absolute deviations', a form of regression that differs from 'Least Squares'.
If so, I found this R code for solving Least absolute deviations regression:
fabs=function(beta0,x,y){
  b0=beta0[1]   # intercept
  b1=beta0[2]   # slope
  n=length(x)
  llh=0
  for(i in 1:n){
    r2=(y[i]-b0-b1*x[i])   # residual for observation i
    llh=llh + abs(r2)      # llh accumulates the sum of absolute residuals (despite the name)
  }
  llh
}
g=optim(c(1,1),fabs,x=x,y=y)
I found the code here:
http://www.stat.colostate.edu/~meyer/hw12ans.pdf
Assuming you are talking about Least absolute deviations, you might not be interested in the above code if you want a solution in R from scratch rather than a solution that uses optim.
The above code is for a regression line with an intercept and one slope. I modified the code as follows to handle a regression with just an intercept:
y <- c(1,1,1)
x <- 1:length(y)
fabs=function(beta0,x,y){
b0=beta0[1]
b1=0
n=length(x)
llh=0
for(i in 1:n){
r2=(y[i]-b0-b1*x[i])
llh=llh + abs(r2)
}
llh
}
# The commands to get the estimator
g = optim(c(1),fabs,x=x,y=y, method='Brent', lower = (min(y)-5), upper = (max(y)+5))
g
I was not familiar with (i.e., had not heard of) Least absolute deviations until tonight. So, hopefully my modifications are fairly reasonable.
With y <- c(1,1,1) the parameter estimate is 1 (which I think you said is the correct answer):
$par
[1] 1
$value
[1] 1.332268e-15
$counts
function gradient
NA NA
$convergence
[1] 0
$message
NULL
With y <- c(0,1,2) the parameter estimate is 1:
$par
[1] 1
$value
[1] 2
$counts
function gradient
NA NA
$convergence
[1] 0
$message
NULL
With y <- c(0,0,3) the parameter estimate is 0 (which you said is the correct answer):
$par
[1] 8.613159e-10
$value
[1] 3
$counts
function gradient
NA NA
$convergence
[1] 0
$message
NULL
If you want R code from scratch, there is additional R code in the file at the link above which might be helpful.
Alternatively, perhaps it might be possible to extract the relevant code from the source file.
Alternatively, perhaps someone else can provide the desired code (and correct any errors on my part) in the next 24 hours.
If you come up with code from scratch please post it as an answer as I would love to see it myself.
lad=function(x,y){
  # sum of absolute deviations for a line with intercept beta[1] and slope beta[2]
  SAD = function(beta, x, y) {
    return(sum(abs(y - (beta[1] + beta[2] * x))))
  }
  # use the least-squares fit as starting values for the optimizer
  d=lm(y~x)
  ans1 = optim(par=c(d$coefficients[1], d$coefficients[2]), method="Nelder-Mead", fn=SAD, x=x, y=y)
  coe=setNames(ans1$par, c("(Intercept)", substitute(x)))
  fitted=setNames(ans1$par[1]+ans1$par[2]*x, c(1:length(x)))
  res=setNames(y-fitted, c(1:length(x)))
  results = list(coefficients=coe, fitted.values=fitted, residuals=res)
  class(results)="lad"
  return(results)
}
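Here is a hypothetical usage example for the lad() helper above (the data are made up for illustration):
set.seed(1)
x <- 1:20
y <- 2 + 0.5*x + rnorm(20)
fit <- lad(x, y)
fit$coefficients   # LAD intercept and slope estimates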
I would like to solve an equation of the form below, where X is the only unknown variable and f() is a multivariate Student t density.
More precisely, I have a k-dimensional integral of a Student t density, which yields a probability, and I know that this probability equals q. The lower bound of every integral is -Inf, and the upper bounds of the last k-1 dimensions are given; the only unknown is the upper bound of the first integral. With one equation and one unknown, there should be a solution. I tried to solve it in R: I used Dynamic Conditional Correlation to obtain a correlation matrix with which to specify my t distribution, plugged this correlation matrix into the multivariate t density "dmvt", and used the "adaptIntegrate" function from the "cubature" package to construct a function to pass to "uniroot" to solve for the upper bound of the first integral. But I am having difficulties getting what I want (I hope my question is clear). My code is below; somebody told me there is a problem with it, but I cannot find what the issue is. Many thanks in advance for your help.
I know how to deal with a one-dimensional integral, but how can a multi-dimensional integral equation like this be solved in R (e.g. in the 2-dimensional case)?
$$\int_{-\infty}^{X}\int_{-\infty}^{Y_{1}}\cdots\int_{-\infty}^{Y_{k}} f(x,y_{1},\cdots,y_{k})\, dx\, dy_{1}\cdots dy_{k} = q$$
This code fails:
require(cubature)
require(mvtnorm)
corr <- matrix(c(1,0.8,0.8,1),2,2)
f <- function(x){ dmvt(x,sigma=corr,df=3) }
g <- function(y) adaptIntegrate(f,
lowerLimit = c( -Inf, -Inf),
upperLimit = c(y, -0.1023071))$integral-0.0001
uniroot( g, c(-2, 2))
Since mvtnorm includes a pmvt function that computes the CDF of the multivariate t distribution, you don't need to do the integral by brute force. (mvtnorm also includes a quantile function qmvt, but only for "equicoordinate" values.)
So:
library(mvtnorm)
g <- function(y1_upr,y2_upr=-0.123071,target=1e-4,df=3) {
pmvt(upper=c(y1_upr,y2_upr),df=df)-target
}
uniroot(g,c(-10000,0))
## $root
## [1] -17.55139
##
## $f.root
## [1] -1.699876e-11
## attr(,"error")
## [1] 1e-15
## attr(,"msg")
## [1] "Normal Completion"
##
## $iter
## [1] 18
##
## $estim.prec
## [1] 6.103516e-05
##
Double-check:
pmvt(upper=c(-17.55139,-0.123071),df=3)
## [1] 1e-04
## attr(,"error")
## [1] 1e-15
## attr(,"msg")
## [1] "Normal Completion"