Optimizing in R with constraints

I have a function f of two variables which I want to minimize under the constraint x[1] + x[2] = 1. Here it is:
f <- function(x){
  y <- 4*sin(x[1]) + 3*cos(x[2])
  return(y)
}
I have read here that optim() can do the job, but how do I include my constraint?

After adding the constraint x[1] + x[2] = 1, the function becomes a univariate function, which you can rewrite as:
f <- function(x){
  4*sin(x) + 3*cos(1 - x)
}
optimize() can be used for one-dimensional optimization.
opt <- optimize(f, c(0, 10))
opt
# $minimum
# [1] 4.468871
#
# $objective
# [1] -6.722745
curve(f, 0, 10)
with(opt, points(minimum, objective, col = "red", pch = 16))
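If you'd rather not rewrite f by hand, a wrapper can impose the constraint on the original two-variable function and give the same result (a minimal sketch; f2 and g are names I'm introducing):
f2 <- function(x) 4*sin(x[1]) + 3*cos(x[2])  # the original two-variable f
g <- function(x1) f2(c(x1, 1 - x1))          # constraint x[1] + x[2] = 1 baked in
optimize(g, c(0, 10))                        # same minimum as the rewritten form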

Related

How to find the value of time `t>0` so that `h(t)=0.01` in R?

This builds on the answer to the question: How to get the value of `t` so that my function `h(t)=epsilon` for a fixed `epsilon`?.
My question:
Consider a GOE random matrix, and compute its eigenvectors v_i and eigenvalues lambda_i. Given initial data x_0, I want to find the hitting time for a fixed epsilon = 0.01, t_n := \inf\{t>0: h_1(t)\ge \epsilon\}. Here the function h_1(t) is given by
h_1(t) = |\langle x_0, v_n\rangle| / \sqrt{\sum_{j=1}^{n} \langle x_0, v_j\rangle^2 e^{-4(\lambda_j - \lambda_n)t}}
I have written the code for this setting and for the function h_1(t):
#make this example reproducible
set.seed(100001)
n <- 300
#Sample GOE random matrix
A <- matrix(rnorm(n*n, mean=0, sd=1), n, n)
G <- (A + t(A))/sqrt(2*n)
ev <- eigen(G)
l <- ev$values
v <- ev$vectors
#mean and covariance of the multivariate distribution
mean <- rep(0, n)
var <- diag(n)
#simulate the multivariate normal distribution
initial <- MASS::mvrnorm(n=1000, mu=mean, Sigma=var) #1000 random vectors
#normalize each initial vector so the initial data are uniformly distributed on the unit sphere
xmats <- lapply(1:1000, function(i) initial[i, ]/norm(initial[i, ], type="2"))
h1t <- function(t, x_0) {
  h10 <- c(x_0 %*% v[, n])
  denom <- vapply(t, function(.t) {
    sum((x_0 %*% v)^2 * exp(-4*(l - l[n]) * .t))
  }, numeric(1L))
  abs(h10) / sqrt(denom)
}
I used the answer in that question for finding the hitting time:
find_t <- function(x, epsilon = 0.01, range = c(-50, 50)) {
  uniroot(function(t) h1t(t, x) - epsilon, range,
          tol = .Machine$double.eps)$root
}
res <- lapply(xmats, find_t)
The output res:
[[995]]
[1] -0.2698699
[[996]]
[1] -0.3138642
[[997]]
[1] -0.4417028
[[998]]
[1] -0.04204563
[[999]]
[1] -0.4150783
[[1000]]
[1] -0.3695955
Question:
But this output res contains negative values. How can I fix that?
If I plot the graph of my function h_1(t), we can see that for epsilon = 0.01 the value of time t should be positive... So it seems that something is wrong in the function find_t.
h1t <- function(t, x_0 = unlist(xmats[1000])) {
  h10 <- c(x_0 %*% v[, n])
  denom <- vapply(t, function(.t) {
    sum((x_0 %*% v)^2 * exp(-4*(l - l[n]) * .t))
  }, numeric(1L))
  abs(h10) / sqrt(denom)
}
plot(h1t, 0, 200)
Update:
I found that if I choose n = 1000 for the size of the matrix, I get an error:
Error in uniroot(function(t) h1t(t, x) - epsilon, range, tol = .Machine$double.eps) :
f() values at end points not of opposite sign
There is nothing wrong with your res, as the following graph shows. The horizontal line is drawn at y == epsilon == 0.01.
You are mistaking the abscissa for the ordinate, that's all.
res <- lapply(xmats, find_t)
curve(h1t, -1, 1, ylim = c(0, 1))
abline(h = 0.01, v = res[[1000]], col = "red", lty = "dashed")
Created on 2022-11-29 with reprex v2.0.2
As for the update with n = 1000: the hitting time is only defined for t > 0, and h1t is strictly increasing there, but
h1t(0)
#> [1] 0.07184164
so in its domain there is no t for which h1t(t) == 0.01.
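If you also want find_t to respect the constraint t > 0 and fail gracefully in such cases, one option is to search only over positive t and return NA when there is no sign change (a sketch, not from the original answer; find_t_pos is a name I'm introducing):
find_t_pos <- function(x, epsilon = 0.01, t_max = 50) {
  f <- function(t) h1t(t, x) - epsilon
  if (f(0) * f(t_max) > 0) return(NA_real_)  # no root in (0, t_max]
  uniroot(f, c(0, t_max), tol = .Machine$double.eps)$root
}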

R - fast interpolation between CDF quantiles

Say we have a data.frame where the columns represent the quantiles for a given set of probabilities. Each row represents a different subject and the quantiles vary by subject. The goal is to take n_draws for each subject.
n <- 1e5
alphas <- c(.05, .25, .5, .75, .95)
n_draws <- 100
dt <- data.frame(quantile_05 = runif(n),
                 quantile_25 = runif(n, min = 10, max = 20),
                 quantile_5  = runif(n, min = 30, max = 40),
                 quantile_75 = runif(n, min = 50, max = 60),
                 quantile_95 = runif(n, min = 70, max = 80))
R has stats::approx. The issue is that it can only be applied to one row at a time.
draws <- apply(X = dt, MARGIN = 1, function(q){
  stats::approx(x = alphas, y = q, yleft = 0, rule = 2, method = "linear",
                xout = seq(.0001, .99999, length.out = n_draws))$y
})
Naturally, one way to speed this up is parallelization:
library(parallel)
cl <- makeCluster(8)
clusterExport(cl, c('alphas', 'n_draws'))
draws <- parApply(cl = cl, X = dt, MARGIN = 1, function(q){
  stats::approx(x = alphas, y = q, yleft = 0, rule = 2, method = "linear",
                xout = seq(.0001, .99999, length.out = n_draws))$y
})
This parallel code is several times faster on my machine. I am curious if anyone has suggestions for further speed ups.
To be a little more clear, this is just a simplification of what I'm using in real life. I have more than 5 quantiles. I want to use this concept to estimate not only other quantiles, but also the mean. Further, I'd like to use the draws to model other quantities like max(y - 10, 0) (or y - any arbitrary value).
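For instance, given the draws matrix from the apply() call above (one column per subject), per-subject estimates of those quantities could be computed like this (a sketch; 10 is just the example threshold):
est_mean <- colMeans(draws)                  # estimated mean per subject
est_excess <- colMeans(pmax(draws - 10, 0))  # estimated E[max(y - 10, 0)] per subject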
Your proposed method is rather inefficient for the desired goal. You end up storing 100 points for each desired ECDF, and you will still need to write a function to extract a desired draw. Instead, I suggest you consider approxfun: it returns a more compact representation, individual functions whose associated environments contain the knots for later calculation. The calculation is done by a C routine accessed through the internal helper function .approxfun.
Demonstrating the internals (up to a point) that I'm suggesting:
out <- approxfun(x = alphas, y = unlist(dt[1, ]), yleft = 0, rule = 2, method = "linear")
# So out is now a single instance using the knots in the first row
out
#function (v)
#.approxfun(x, y, v, method, yleft, yright, f, na.rm)
#<bytecode: 0x558366535968>
#<environment: 0x5583690a04f8>
ls(environment(out))
#[1] "f" "method" "na.rm" "x" "y" "yleft" "yright"
environment(out)$x
#[1] 0.05 0.25 0.50 0.75 0.95
environment(out)$y
#[1] 0.4038727 17.7069735 33.4438595 57.2753257 77.2024894
If you wanted the estimated 55th percentile for the first case, you could get it with:
out(55/100)
#[1] 38.21015
And now that I've suggested a way to speed up your creation of this list of functions, I'm not even sure it's worth it. I think you could just leave that dt dataframe in place and call approxfun when needed. But that's your call.
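If you do want the whole list of functions up front, here is a minimal sketch (funs is a name I'm introducing; unlist() avoids passing a one-row data.frame):
# one interpolating function per subject; each closure stores its own knots
funs <- lapply(seq_len(nrow(dt)), function(i)
  approxfun(x = alphas, y = unlist(dt[i, ]), yleft = 0, rule = 2, method = "linear"))
funs[[1]](0.55)  # estimated 55th percentile for the first subject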
Note: This is essentially the method used by the ecdf function:
ecdf
function (x)
{
x <- sort(x)
n <- length(x)
if (n < 1)
stop("'x' must have 1 or more non-missing values")
vals <- unique(x)
rval <- approxfun(vals, cumsum(tabulate(match(x, vals)))/n,
method = "constant", yleft = 0, yright = 1, f = 0, ties = "ordered")
class(rval) <- c("ecdf", "stepfun", class(rval))
assign("nobs", n, envir = environment(rval))
attr(rval, "call") <- sys.call()
rval
}
<bytecode: 0x558364a0f360>
<environment: namespace:stats>
And it's possible that you might want to use the ecdf function because it has some class-associated functions.
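Another route, since every row shares the same alphas: vectorize the interpolation itself. Compute the interval indices and weights for xout once with findInterval(), then interpolate all rows with matrix arithmetic. This is a sketch of the idea, not a drop-in replacement; note it clamps at both ends (rule = 2 on both sides) instead of using yleft = 0, and it returns subjects in rows rather than columns:
xout <- seq(.0001, .99999, length.out = n_draws)
i <- findInterval(xout, alphas, all.inside = TRUE)     # bracketing interval per xout
w <- (xout - alphas[i]) / (alphas[i + 1] - alphas[i])  # linear weight within interval
w <- pmin(pmax(w, 0), 1)                               # clamp outside [0.05, 0.95]
Q <- as.matrix(dt)                                     # n x 5 matrix of knots
draws2 <- Q[, i] * rep(1 - w, each = nrow(Q)) +        # n x n_draws result
  Q[, i + 1] * rep(w, each = nrow(Q))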

How to find the max value of a concave function in R

Can anyone help me rewrite the code without using a for loop? The goal is to find the spend value that maximizes calcProfit.
In this example, `calcVolume` and `calcProfit` are dummy functions. In my real project:
they are complicated;
`calcProfit` will be concave;
running this loop takes > 1 minute (thus undesirable for users).
set.seed(123)
spend = 150000
unit.x <- spend/10 # x axis (unit)
max.x <- spend*2 # x axis (max)
calcVolume <- function(spend) {
  runif(1, 0, 1)*spend
}
calcProfit <- function(vol) {
  runif(1, 0, 1)*vol
}
out <- as.data.frame(matrix(data = NA, nrow = 21, ncol = 3))
cnt <- 1
for (step.x in seq(0, max.x, by = unit.x)) {
  out[cnt, 1] <- step.x
  out[cnt, 2] <- calcVolume(step.x)
  out[cnt, 3] <- calcProfit(out[cnt, 2])
  cnt <- cnt + 1
}
If you have the functions involved in closed form, then the composite can be optimized with optimize.
In the code below I define an auxiliary function f.
calcVolume <- function(x) {
sin(x)
}
# only calcProfit needs to be concave
calcProfit <- function(x) {
log(x)
}
f <- function(x){
calcProfit(calcVolume(x))
}
M <- optimize(f, c(0, 3), maximum = TRUE)
M
#> $maximum
#> [1] 1.57078
#>
#> $objective
#> [1] -1.381308e-10
curve(f, 0, 3)
points(M$maximum, M$objective, col = "red", pch = 16)
Created on 2022-12-06 with reprex v2.0.2
Ignoring the order in which the random numbers are generated, you can try this to avoid the loop.
V1 <- seq(0, max.x, by = unit.x)
V2 <- calcVolume(V1)
V3 <- calcProfit(V2)
out <- data.frame(V1,V2,V3)
You can use which.max to find the spend value that maximizes calcProfit:
out[which.max(out$V3),]$V1

How to solve "non-numeric argument.." error in numerical integration?

I want to calculate the following integral in R:
\int_2^6 \int_0^3 \int_0^5 \log(z)\,(xyz + x + 2y + z)\,dx\,dy\,dz
I tried the Vectorize and integrate functions, but I got this error:
Error in (log(z)) * (InIntegl2) : non-numeric argument to binary operator
fxyz= function(x,y,z) { (x*y*z)+z+x+2*y}
InIntegl1 = Vectorize(function(x) { integrate(fxyz, 0,5)$value})
InIntegl2 = Vectorize(function(y) { integrate( InIntegl1, 0,3)$value})
InIntegl3 = Vectorize(function(z) { integrate((log(z))*(InIntegl2), 2,6)$value})
Integral = integrate(InIntegl3 , 2, 6)$value
The first integral must be parameterized by y and z and the second by z. Then we can perform the final integration.
int1 <- Vectorize(function(y, z) integrate(fxyz, 0, 5, y = y, z = z)$value)
int2 <- Vectorize(function(z) integrate(int1, 0, 3, z = z)$value)
integrate(function(z) log(z) * int2(z), 2, 6)$value
## [1] 2071.71
In the spirit of Numerical Triple Integration in R:
integrate(Vectorize(function(z) {
  log(z)*integrate(Vectorize(function(y) {
    integrate(function(x) { x*y*z + x + 2*y + z }, 0, 5)$value
  }), 0, 3)$value
}), 2, 6)
Package cubature can solve triple integrals with one call.
library(cubature)
f <- function(X){
  x <- X[1]
  y <- X[2]
  z <- X[3]
  log(z)*(x*y*z + x + 2*y + z)
}
loLim <- c(0, 0, 2)
hiLim <- c(5, 3, 6)
tol <- .Machine$double.eps^0.5
hcubature(f, loLim, hiLim, tol = tol)
#$integral
#[1] 2071.71
#
#$error
#[1] 2.059926e-05
#
#$functionEvaluations
#[1] 165
#
#$returnCode
#[1] 0
If only the integral's value is needed,
hcubature(f, loLim, hiLim, tol = tol)$integral
#[1] 2071.71
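As a cross-check (my own sketch, not part of the original answers): the integrand separates, so the triple integral reduces to products of one-dimensional integrals, with A = the integral of log(z) and B = the integral of z*log(z), both over [2, 6]:
A <- (6*log(6) - 6) - (2*log(2) - 2)   # antiderivative z*log(z) - z
B <- (18*log(6) - 9) - (2*log(2) - 1)  # antiderivative z^2/2*log(z) - z^2/4
# xyz term: 12.5*4.5*B; x term: 12.5*3*A; 2y term: 5*9*A; z term: 5*3*B
(56.25 + 15)*B + (37.5 + 45)*A
#> [1] 2071.71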

How to combine for loop and uniroot in R?

To start, I have two functions:
ef <- function(x, a){
  if(a == 0){
    return(x)
  } else {
    return(1 - exp(-a*(5 + x)))
  }
}
f1 <- function(x) ef(x, a) - 0.75*ef(2.5, a) - 0.25*ef(-1, a)
If a is 2 (i.e. a <- 2), then the root should be:
uniroot(f1, c(-5, 0), tol = 0.0001)$root
Now my question is: how do I calculate the root x of the function when a changes from 0.05 to 3 in steps of 0.05?
I think it's more flexible to put a into f1() as an argument.
f1 <- function(x, a) ef(x, a)-0.75*ef(2.5, a)-0.25*ef(-1, a)
Then use sapply() to operate each value in the sequence seq(0.05, 3, 0.05):
sapply(seq(0.05, 3, 0.05), function(A){
  uniroot(f1, c(-10, 10), tol = 0.0001, extendInt = "yes", a = A)$root
})
# [1] 1.565924900 1.503659791 1.438426382 1.370549617 1.300423929
# [6] 1.228478774 1.155273229 1.081323809 1.007194271 0.933431003 ...
The argument extendInt = "yes" avoids the error thrown when f1() does not have opposite signs at the endpoints. In addition, I prefer the apply family over a for loop in this case. You can read this for the reason.
Edit: for loop solution
a <- seq(0.05, 3, 0.05)
root <- numeric()
for(i in 1:length(a)){
  root[i] <- uniroot(f1, c(-10, 10), tol = 0.0001, extendInt = "yes", a = a[i])$root
}
At the end of the loop, the variable root will store all the roots. You can check whether the outputs of the two solutions are equal.
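For example (a quick sketch; roots_sapply is a name I'm introducing for the sapply() result from above):
roots_sapply <- sapply(seq(0.05, 3, 0.05), function(A){
  uniroot(f1, c(-10, 10), tol = 0.0001, extendInt = "yes", a = A)$root
})
all.equal(root, roots_sapply)
#> [1] TRUE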
