How to put mathematical constraints on the GenSA function in R

I am currently trying to use the simulated annealing package GenSA in order to minimize the function below:
efficientFunction <- function(v) {
t(v) %*% Cov_Mat %*% v
}
Where Cov_Mat is a covariance matrix obtained from 4 assets and v is a weight vector of dimension 4.
I'm trying to solve the Markowitz asset allocation problem this way, and I would like to know how I could introduce a mathematical constraint such as requiring the sum of all coefficients to equal 1:
sum(v) = 1
Moreover, since I intend to rely on the GenSA function, I would like to use something like this with the constraint:
v <- c(0.25, 0.25, 0.25, 0.25)
dimension <- 4
lower <- rep(0, dimension)
upper <- rep(1, dimension)
out <- GenSA(v, lower = lower, upper = upper, fn = efficientFunction)
I have found in this paper (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.97.6091&rep=rep1&type=pdf)
how to handle such a constraint within the simulated annealing algorithm, but I don't know how I could implement it in R.
I'd be very grateful for any advice. It is my first time using SO, so don't hesitate to tell me if I have the wrong approach in the way I ask questions.

A possible approach would be to make use of so-called Lagrange multipliers (cf., http://en.wikipedia.org/wiki/Lagrange_multiplier). For example, set
efficientFunction <- function(v) {
lambda <- 100
t(v) %*% Cov_Mat %*% v + lambda * abs( sum(v) - 1 )
}
so that, in order to minimize the objective function efficientFunction, the resulting parameters also minimize the penalty term lambda * abs(sum(v) - 1). The Lagrange multiplier lambda is set to an arbitrary but sufficiently high level.
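Put together, a minimal self-contained sketch of this penalty approach might look as follows (the random returns and the resulting Cov_Mat are placeholders for your own data):
library(GenSA)
# Placeholder covariance matrix for 4 assets; substitute your own Cov_Mat
set.seed(1)
returns <- matrix(rnorm(400), ncol = 4)
Cov_Mat <- cov(returns)
efficientFunction <- function(v) {
  lambda <- 100   # penalty weight: large enough to enforce sum(v) == 1
  drop(t(v) %*% Cov_Mat %*% v) + lambda * abs(sum(v) - 1)
}
out <- GenSA(par = rep(0.25, 4), fn = efficientFunction,
             lower = rep(0, 4), upper = rep(1, 4))
out$par
sum(out$par)   # should be very close to 1 because of the penalty term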

So GenSA itself doesn't appear to offer a way to set such constraints directly. However, you can reparameterize your function to enforce the constraint. How about
efficientFunction <- function(v) {
v <- v/sum(v)
t(v) %*% Cov_Mat %*% v
}
Here we normalize the values of v so that they will sum to 1. Then, when we get the output parameters, we need to perform the same transformation
out <- GenSA(v, lower = lower, upper = upper, fn = efficientFunction)
out$par/sum(out$par)
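Reusing the placeholder Cov_Mat from the sketch above, a minimal run of this reparameterization might look like this (the guard against an all-zero vector is my own addition, not part of the original answer):
efficientFunction <- function(v) {
  s <- sum(v)
  if (s == 0) return(1e10)   # guard against the all-zero corner of the box
  v <- v / s                 # weights are forced to sum to 1 inside the objective
  drop(t(v) %*% Cov_Mat %*% v)
}
out <- GenSA(par = rep(0.25, 4), fn = efficientFunction,
             lower = rep(0, 4), upper = rep(1, 4))
weights <- out$par / sum(out$par)   # same normalisation applied to the result
weights
sum(weights)   # 1 by construction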

Related

Nested integration for incomplete convolution of gauss densities

Let g(x) = 1/sqrt(2*pi) * exp(-x^2/2) be the density of the normal distribution with mean 0 and standard deviation 1. In some calculations on paper, integrals of the following form appeared:
f_1 = g,   f_n(x) = integral from -c to c of g(x - z) * f_{n-1}(z) dz,
where c > 0 is a positive number.
Since I could not evaluate this by hand, I had the idea to approximate and plot it. I tried this in R, because R provides the dnorm function and a function to do integrals.
You see that I need to integrate numerically n times, where n is chosen in the call to a plot function. My code uses a for-loop to build those "incomplete" convolutions iteratively.
For example, even with n = 3 and c = 1 this gives me an error; n = 2 (i.e. a single integration) works.
N = 3
ngauss <- function(x) dnorm(x , mean = 0, sd = 1)
convoluts <- list()
convoluts[[1]] <- ngauss
for (i in 2:N) {
h <- function(y) {
g <- function(z) {ngauss(y-z)*convoluts[[i-1]](z)}
return(integrate(g, lower = -1, upper = 1)$value)
}
h <- Vectorize(h)
convoluts[[i]] <- h
}
convoluts[[3]](0)
What I get is:
Error: evaluation nested too deeply: infinite recursion /
options(expressions=)?
I understand that this is a hard computation, but for "small" n something similar should be possible.
Maybe someone can help me to fix my code or provide a recommendation how I can implement this in a better way. Another language that is more appropriate for this would be also okay.
The issue appears to be in how the functions created inside the loop handle variables in their environments. In particular, each h captures the loop variable i itself rather than its value in that iteration, so after the loop every h sees the final i and ends up calling itself, hence the infinite recursion. Fixing the value of i inside each closure, as in
h <- evalq(function(y) {
g <- function(z) {ngauss(y - z) * convoluts[[i - 1]](z)}
integrate(g, lower = -1, upper = 1)$value
}, list(i = i))
does the job and, say, setting N <- 6 quickly gives
convoluts[[N]](0)
# [1] 0.03423872
As your integration is simply the pdf of a sum of N independent standard normals (which then follows N(0, N)), we may also verify this approach by setting lower = -Inf and upper = Inf. Then with N <- 4 we have
dnorm(0, sd = sqrt(N))
# [1] 0.1994711
convoluts[[N]](0)
# [1] 0.1994711
So, for practical purposes, when c = Inf, you are way better off using dnorm rather than manual computations.
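For reference, a self-contained version of the corrected loop with the finite limits from the question (just the answer's fix spliced back into the original code) is:
N <- 3
ngauss <- function(x) dnorm(x, mean = 0, sd = 1)
convoluts <- list()
convoluts[[1]] <- ngauss
for (i in 2:N) {
  h <- evalq(function(y) {
    g <- function(z) ngauss(y - z) * convoluts[[i - 1]](z)
    integrate(g, lower = -1, upper = 1)$value
  }, list(i = i))
  convoluts[[i]] <- Vectorize(h)   # integrate() needs a vectorized integrand
}
convoluts[[N]](0)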

R: conditional expected value

Hello everybody (this is my first post here)!
I'm having a problem with finding the conditional expected value for a given distribution.
Suppose that we need to find E(x | x > 0.5), where x has a GEV (generalised extreme value) distribution with density dgev(x, xi, sigma, mu). What I was trying to do was:
library(evir)
func1 <- function(x) {x*dgev(x, xi, sigma, mu)}
integral <- integrate(func1, lower = 0.5, upper = 10000, subdivisions = 10000)
cond.exp.val <- as.numeric(integral[1])/(1-q)
where q is the value that gives qgev(q, xi, sigma, mu) = 0.5, used for normalisation.
The result greatly depends on the 'upper' parameter of the integrate() function, and for higher values of this parameter the integral diverges. As my distribution parameters are
xi <- 0.81
sigma <- 0.0067
mu <- 0.0072
this integration should be feasible and convergent. Do you have any idea what I am doing wrong, or is there any built-in R function that can calculate such a conditional expected value?
Generally, you are advised to use Inf rather than a large number when integrating the right tail of a density. See details in ?integrate. I took your description of q as being a value obtained by iteration and I stopped when I got within 4 decimal places of 0.5 using q <- 0.99315:
qgev(.99315, xi, sigma, mu)
[1] 0.4998413
You also used the wrong extraction from your integral variable; use either "[[" or "$" when working with lists:
func1 <- function(x) {x*dgev(x, xi, sigma, mu)}
integral <- integrate(func1, lower = 0.5, upper = Inf, subdivisions = 10000)
(cond.exp.val <- integral[[1]]/(1-.99315)) # `as.numeric` not needed
#[1] 2.646068
I have concerns that your description of how to get q was misleading, since values above 1 should not be an expectation derived from a statistical PDF.
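As a side note, assuming pgev takes its parameters in the same order as the dgev/qgev calls above, the normalising tail probability can be computed directly instead of iterating on q; a hedged sketch:
library(evir)
xi    <- 0.81
sigma <- 0.0067
mu    <- 0.0072
p_tail <- 1 - pgev(0.5, xi, sigma, mu)   # P(X > 0.5); check ?pgev for the argument order
func1  <- function(x) x * dgev(x, xi, sigma, mu)
integral <- integrate(func1, lower = 0.5, upper = Inf, subdivisions = 10000)
integral$value / p_tail   # conditional expectation E(X | X > 0.5)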

Error in optim(): searching for global minimum for a univariate function

I am trying to optimize a function in R.
The function is the log-likelihood of the negative binomial distribution when estimating only the mu parameter. This should not be a problem since the function clearly has just one maximum, but I am not able to reach the desired result.
The function to be optimized is:
EMV <- function(data, par) {
Mi <- par
Phi <- 2
N <- NROW(data)
Resultado <- log(Mi/(Mi + Phi))*sum(data) + N*Phi*log(Phi/(Mi + Phi))
return(Resultado)
}
The data is a vector of negative binomial draws with mu = 2 and theta = 2 (rnegbin() comes from the MASS package):
library(MASS)
data <- rnegbin(10000, mu = 2, theta = 2)
When I plot the function with mu as the variable, using the following code:
x <- seq(0.1, 100, 0.02)
z <- EMV(data,0.1)
for (aux in x) {z <- rbind(z, EMV(data,aux))}
z <- z[2:NROW(z)]
plot(x,z)
the resulting curve has its maximum value of z close to the parameter value of 2:
x[which.max(z)]
But the optimization does not work with BFGS:
Error in optim(par = theta, fn = EMV, data = data, method = "BFGS") :
non-finite finite-difference value [1]
And it does not converge to the right value using SANN, for example:
$par
[1] 5.19767e-05
$value
[1] -211981.8
$counts
function gradient
10000 NA
$convergence
[1] 0
$message
NULL
The questions are:
What am I doing wrong?
Is there a way to tell optim that the parameter should be greater than 0?
Is there a way to tell optim that I want to maximize the function? (I am afraid optim is minimizing and heading to a very small value where the function is smallest.)
Minimization or Maximization?
Although ?optim says it can do maximization, that is only mentioned in brackets, so minimization is the default:
fn: A function to be minimized (or maximized) ...
Thus, if we want to maximize an objective function, we multiply it by -1 and then minimize it. This is quite a common situation: in statistics we often want to find the maximum log-likelihood, so to use optim() we have no choice but to minimize the negative log-likelihood.
Which method to use?
If we are only doing 1D minimization, we should use method "Brent". This method lets us specify a lower bound and an upper bound for the search region, and the search is confined to that interval. Such a specification can help you constrain your parameter: for example, if you don't want mu to be smaller than 0, just set lower = 0.
When we move to 2D or higher dimensions, we should resort to "BFGS". In this case, if we want to constrain one of our parameters, say a, to be positive, we can take the log transform log_a = log(a) and reparameterize the objective function in terms of log_a. Now log_a is free of constraints. The same goes when we want to constrain multiple parameters to be positive.
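For instance, a minimal sketch of that log-reparameterization for the mu parameter of this problem (EMV_log and the starting value are my own naming, not part of the question) might be:
EMV_log <- function(log_mu, data) {
  Mi  <- exp(log_mu)   # back-transform, so Mi is always positive
  Phi <- 2
  N   <- NROW(data)
  Resultado <- log(Mi/(Mi + Phi))*sum(data) + N*Phi*log(Phi/(Mi + Phi))
  -Resultado           # negate so that optim's minimization maximizes the likelihood
}
fit <- optim(par = 0, fn = EMV_log, data = data, method = "BFGS")   # start at log(mu) = 0
exp(fit$par)   # estimate of mu back on the original scale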
How to change your code?
EMV <- function(data, par) {
Mi <- par
Phi <- 2
N <- NROW(data)
Resultado <- log(Mi/(Mi + Phi))*sum(data) + N*Phi*log(Phi/(Mi + Phi))
return(-1 * Resultado)
}
optim(par = theta, fn = EMV, data = data, method = "Brent", lower = 0, upper = 1E5)
The help file for optim says: "By default optim performs minimization, but it will maximize if control$fnscale is negative." So if you either multiply your function output by -1 or change the control object input, you should get the right answer.
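For completeness, a minimal sketch of the fnscale route, keeping the question's original EMV (the version returning Resultado rather than -Resultado) and an arbitrary starting value:
optim(par = 1, fn = EMV, data = data, method = "Brent",
      lower = 0, upper = 1E5, control = list(fnscale = -1))   # maximizes EMV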

Maximize a target number by optimizing a weighting vector

I am trying to maximize the number N_ent through a 1x42 weighting vector (weight).
N_ent is calculated with the following function:
N_ent <- exp(-sum((((solve(pca$rotation[])) %*% t(weight))^2)*
(pca$sdev^2)/(sum((((solve(pca$rotation[])) %*% t(weight))^2)*
(pca$sdev^2)))*log((((solve(pca$rotation[])) %*% t(weight))^2)*
(pca$sdev^2)/(sum((((solve(pca$rotation[])) %*% t(weight))^2)*(pca$sdev^2))))))
Though it looks quite complicated, the equation works fine and supplies me with N_ent = 1.0967 when equal weights of 0.0238 (1/42 = 0.0238) are used.
Further, none of the weights may be below -0.1 or above 1.
I am new to R and have struggled to use both the optim() function (which ignored my constraints) and the constrOptim() function, encountering the error
Error in match.arg(method) : 'arg' must be of length 1
when optim() was used and
Error in ui %*% theta : non-conformable arguments
when constrOptim() was used.
Any help on how to set up the code for such an optimization problem would be greatly appreciated.
Here is a solution using the nloptr library.
library(nloptr)
pca <- dget('pca.csv')
#random starting point
w0 <- runif(42, -0.1, 1)
#things that do not depend on weight
rotinv <- solve(pca$rotation)
m2 <- pca$sdev^2
#function to maximize
N_ent <- function(w) {
m1 <- (rotinv %*% w)^2
-exp(-sum(m1 * m2 / sum(m1 * m2) * log(m1 * m2 / sum(m1 * m2))))
}
#call optimization function
optres <- nloptr(w0, N_ent, lb = rep(-0.1, 42), ub = rep(1, 42),
opts = list('algorithm' = 'NLOPT_LN_NEWUOA_BOUND', 'print_level' = 2, 'maxeval' = 1000, 'xtol_rel' = 0))
You can view the result via optres$solution. For your particular problem I found that the NLOPT_LN_NEWUOA_BOUND algorithm gives the best result of 42. You can view all available algorithms with nloptr.print.options(). Note that _XN_ in an algorithm's name indicates that it does not require derivatives. In your case the derivative computation is not that difficult; you can provide it and use the algorithms with _XD_ in their names.
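A quick way to sanity-check the result afterwards (using the objects defined in the answer above):
w_best <- optres$solution
-N_ent(w_best)   # undo the sign flip to recover the maximized entropy value
range(w_best)    # should stay within the bounds -0.1 and 1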

Compute multiple Integral and plot them (with R)

I'm having trouble computing and then plotting multiple integrals. It would be great if you could help me.
So I have this function
> f = function(x, mu = 30, s = 12){dnorm(x, mu, s)}
which I want to integrate multiple times, from each z in 1:100 up to +Inf, and then plot with x = z and y = auc:
> auc = integrate(f, z, Inf)
R returns:
Warning message:
In if (is.finite(lower)) { :
the condition has length > 1 and only the first element will be used
I have tried a loop:
while(z < 100){
z = 1
auc = integrate(f,z,Inf)
z = z+1}
Doesn't work either ... don't know what to do
(I'm new to R, so I'm sorry if this is really easy.)
Thanks for your help :) !
There is no need to do the integrating by hand. pnorm gives the integral from negative infinity to the input for the normal density. You can get the upper tail instead by modifying the lower.tail parameter
z <- 1:100
y <- pnorm(z, mean = 30, sd = 12, lower.tail = FALSE)
plot(z, y)
If you're looking to integrate more complex functions then using integrate will be necessary - but if you're just looking to find probabilities for distributions then there will most likely be a function built in that does the integration for you directly.
Your problem is actually somewhat subtle, and in a certain sense gets to the core of how R works, so here is a slightly longer explanation.
R is a "vectorized" language, which means that just about everything works on vectors. If I have 2 vectors A and B, then A+B is the element-by-element sum of A and B. Nearly all R functions work this way also. If X is a vector, then Y <- exp(X) is also a vector, where each element of Y is the exponential of the corresponding element of X.
The function integrate(...) is one of the few functions in R that is not vectorized. So when you write:
f <- function(x, mu = 30, s = 12){dnorm(x, mu, s)}
auc <- integrate(f, z, Inf)
the integrate(...) function does not know what to do with z when it is a vector. So it takes the first element and complains. Hence the warning message.
There is a special function in R, Vectorize(...) that turns scalar functions into vectorized functions. You would use it this way:
f <- function(x, mu = 30, s = 12){dnorm(x, mu, s)}
auc <- Vectorize(function(z) integrate(f,z,Inf)$value)
z <- 1:100
plot(z,auc(z), type="l") # plot lines
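Equivalently, if you prefer not to wrap the call in Vectorize, a plain sapply over z does the same thing (a small alternative sketch, not part of the answer above):
f <- function(x, mu = 30, s = 12) dnorm(x, mu, s)
z <- 1:100
auc <- sapply(z, function(zz) integrate(f, zz, Inf)$value)
plot(z, auc, type = "l")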
