Multiple integration with functions of variables in the limits - r

I need to numerically integrate the following:
I tried to use cubature and pracma but they don't seem to support functional integration limits. I found a attempt to use cubature by:
library(cubature)
integrand <- function(arg) {
x <- arg[1]
y <- arg[2]
z <- arg[3]
w <- arg[4]
v<- arg[5]
ff <- dnorm(x, 10,2)*dnorm(y, 10,2)*dnorm(z, 10,2)*dnorm(w, 10,2)* dnorm(v, 10,2)* (x+y+z+w+v<=52)
return(ff)
}
R <- cuhre(f = integrand,
lowerLimit=c(0,0,0,0,0),
upperLimit=c(20,20,20,20,20),
relTol = 1e-5, absTol= 1e-5)
But the author doesn't guarantee that it's correct to do it.
Is there a way to numerically integrate multiple integrals with functional limits in R?

The domain of integration is the canonical simplex scaled by the factor 42. To evaluate an integral on a simplex, use the SimplicialCubature package:
integrand <- function(arg) {
x <- arg[1]
y <- arg[2]
z <- arg[3]
w <- arg[4]
v <- arg[5]
dnorm(x, 10, 2) *
dnorm(y, 10, 2) *
dnorm(z, 10, 2) *
dnorm(w, 10, 2) *
dnorm(v, 10, 2)
}
library(SimplicialCubature)
Simplex <- 42 * CanonicalSimplex(5)
Here is the command to run:
adaptIntegrateSimplex(integrand, S = Simplex)
# $integral
# [1] 0.03252553
#
# $estAbsError
# [1] 0.3248119
#
# $functionEvaluations
# [1] 9792
#
# $returnCode
# [1] 1
#
# $message
# [1] "error: maxEvals exceeded - too many function evaluations"
The algorithm has reached the maximal number of function evaluations and the estimated absolute error is 0.3248119, while the estimated value of the integral is 0.03252553. This is a big error.
We can increase the maximum number of function evaluations allowed. Taking 1e6, the computation is a bit slow and we get:
adaptIntegrateSimplex(integrand, S = Simplex, maxEvals = 1e6)
# $integral
# [1] 0.03682535
#
# $estAbsError
# [1] 0.001004083
#
# $functionEvaluations
# [1] 999811
#
# $returnCode
# [1] 1
#
# $message
# [1] "error: maxEvals exceeded - too many function evaluations"
The estimated error has decreased to 0.001004083, quite better.
Note that we can approximate this integral by using simulations, because this integral is the measure of the simplex under a multivariate normal distribution:
library(mvtnorm)
Sigma <- 2^2 * diag(5)
Mean <- rep(10, 5)
set.seed(666)
sims <- rmvnorm(1e6, mean = Mean, sigma = Sigma)
f <- function(X){ # test whether 0 < x < 42, 0 < x + y < 42, 0 < x + y + z < 42, ...
all(X > 0 & cumsum(X) < 42)
}
mean(apply(sims, 1, f))
# 0.037083

Related

Pi Estimator in R

The code below estimates pi in R, now I am trying to find the minimum number of terms N_Min
you would have to include in your estimate of pie to make it accurate to three decimal places.
pi_Est<- function(NTerms){
NTerms = 5 # start with an estimate of just five terms
pi_Est = 0 # initialise the value of pi to zero
Sum_i = NA # initialise the summation variable to null
for(ii in 1:NTerms)
{
Sum_i[ii] = (-1)^(ii+1)/(2*ii - 1) # this is the series equation for calculating pi
}
Sum_i = 4*Sum_i # multiply by four as required in the formula (see lecture notes)
pi_Est = sum(Sum_i)
cat('\nThe estimate of pi with terms = ', NTerms ,' is ',pi_Est)
}
First of all, I would change some things about your function. Instead of getting it to print out a message, get it to return a value. Otherwise it becomes very difficult to do anything with its output, including testing it for convergence to pi.
Also, no matter what the value of NTerms is you feed this function, you are immediately over-writing NTerms inside the function.
You could rewrite the function like this:
pi_Est <- function(NTerms) {
pi_Est <- 0
Sum_i <- numeric()
for(ii in seq(NTerms))
{
Sum_i[ii] <- (-1)^(ii+1)/(2*ii - 1)
}
return(sum(4 * Sum_i))
}
And to show it converges to pi, let's test it with 50,000 terms:
pi_Est(50000)
#> [1] 3.141573
Now, if we want to find the first value of NTerms that is correct to 3 decimal places, we are going to need to be able to call this function on a vector of NTerms - at the moment it is only working on a single number. So let's define the function f that vectorizes pi_Est:
f <- Vectorize(pi_Est)
Now, let's create the estimate for all values of NTerms between 1 and 2,000 and store them in a vector:
estimates <- f(1:2000)
We can see that the values of estimates seem to oscillate round and converge to pi if we plot the first 100 values:
plot(estimates[1:100], type = 'l')
abline(h = pi)
Our answer is just the first value which, when rounded to three decimal places, is the same as pi rounded to three decimal places:
result <- which(round(estimates, 3) == round(pi, 3))[1]
result
#> [1] 1103
And we can check this is correct by feeding 1103 into our original function:
pi_Est(result)
#> [1] 3.142499
You will see that this gives us 3.142, which is the same as pi rounded to 3 decimal places.
Created on 2022-01-31 by the reprex package (v2.0.1)
1000 terms are required to make the estimate accurate to within 0.001:
pi_Est1 <- function(n) {
if (n == 0) return(0)
neg <- 1/seq(3, 2*n + 1, 4)
if (n%%2) neg[length(neg)] <- 0
4*sum(1/seq(1, 2*n, 4) - neg)
}
pi_Est2 <- function(tol) {
for (i in ceiling(1/tol + 0.5):0) {
est <- pi_Est1(i)
if (abs(est - pi) > tol) break
est1 <- est
}
list(NTerms = i + 1, Estimate = est1)
}
tol <- 1e-3
pi_Est2(tol)
#> $NTerms
#> [1] 1000
#>
#> $Estimate
#> [1] 3.140593
tol - abs(pi - pi_Est2(tol)$Estimate)
#> [1] 2.500001e-10
tol - abs(pi - pi_Est1(pi_Est2(tol)$NTerms - 1))
#> [1] -1.00075e-06
Created on 2022-01-31 by the reprex package (v2.0.1)
Perhaps we can try the code below
pi_Est <- function(digits = 3) {
s <- 0
ii <- 1
repeat {
s <- s + 4 * (-1)^(ii + 1) / (2 * ii - 1)
if (round(s, digits) == round(pi, digits)) break
ii <- ii + 1
}
list(est = s, iter = ii)
}
and you will see
> pi_Est()
$est
[1] 3.142499
$iter
[1] 1103
> pi_Est(5)
$est
[1] 3.141585
$iter
[1] 130658
Why not use a single line of code for the calculation?
Pi <- tail(cumsum(4*(1/seq(1,4*50000000,2))*rep(c(1,-1), 50000000)),1)

Optimization of budget allocation in R (formerly Excel Solver)

i translated a Problem I had in Excel into R. I want to allocate a fixed Budget in a form that "Gesamt" (which is returned by the function) is maximized.
NrwGes <- function(Budget, Speed, maxnrw, cpcrp) {
BudgetA <- Budget[1]
BudgetB <- Budget[2]
BudgetC <- Budget[3]
BudgetD <- Budget[4]
BudgetE <- Budget[5]
MaxNRW <- c(90, 40, 40, 25, 15)
Speed <- c(0.9, 0.9, 0.9, 0.9, 0.9)
cpcrp <- c(6564, 4494, 3962, 4525, 4900)
TV <- BudgetA*1000/cpcrp[1]
Catchup <- BudgetB*1000/cpcrp[2]
YT <- BudgetC*1000/cpcrp[3]
FB <- BudgetD*1000/cpcrp[4]
Display <- BudgetE*1000/cpcrp[5]
a <- TV^Speed[1]/(1+abs((TV)^Speed[1]-1)/(MaxNRW[1]*0.98))
b <- Catchup^Speed[2]/(1+abs((Catchup)^Speed[2]-1)/(MaxNRW[2]*0.98))
c <- YT^Speed[3]/(1+abs((YT)^Speed[3] -1)/(MaxNRW[3]*0.98))
d <- FB^Speed[4]/(1+abs((FB)^Speed[4]-1)/(MaxNRW[4]*0.98))
e <- Display^Speed[5]/(1+abs((Display)^Speed[5]-1)/(MaxNRW[5]*0.93))
Gesamt <- a+(100-a)/100*b+((100-a)/100*(100-b)/100*c)+((100-a)/100*(100-b)/100*(100-c)/100*d)+((100-a)/100*(100-b)/100*(100-c)/100*(100-d)/100*e)
return(Gesamt)
}
I have a total Budget (i.e 5000), which can be allocated differently to maximize "Gesamt". Examples:
NrwGes(c(5000, 0, 0, 0, 0)) # 72.16038
NrwGes(c(2000, 1500, 1000, 500, 0)) # 84.23121
Brute Forcing or grid search is not an option since this will be done 15-20 times and the algorithm will be applied to an R-Shiny App.
Try optim with the L-BFGS-U method (which allows for bounds) and a lower bound of 0. Then project the input components onto a vector which sums to 5000 passing that to NrwGes. fscale = -1 says to maximize rather than minimize. The final allocation will be proj(res$par) as shown at the bottom. No packages are used.
proj <- function(x) 5000 * x / sum(x)
st <- proj(rep(1, 5))
f <- function(x) NrwGes(proj(x))
res <- optim(st, f, lower = 0 * st, method = "L-BFGS-B", control = list(fnscale = -1))
giving:
> res
$`par`
[1] 2107.8438 482.5702 468.9409 268.0808 142.4305
$value
[1] 86.64285
$counts
function gradient
14 14
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
> proj(res$par) # final allocation
[1] 3037.3561 695.3729 675.7334 386.2984 205.2391
An option is nloptr package :
library(nloptr)
# we use NLOPT_LN_COBYLA algorithm because it doesn't need gradient functions
opts <- list(algorithm="NLOPT_LN_COBYLA",
xtol_rel=1.0e-8,
maxeval=10000)
# objective function (negative because nloptr always minimize)
objFun <- function(x){ -NrwGes(x) }
# sum of budget <= 5000 (in the form g(x) <= 0)
g <- function(x){ sum(x) - 5000 }
res <- nloptr(x0=rep.int(0,5), # initial solution (all zeros)
eval_f=objFun,
lb=rep.int(0,5), # lowerbounds = 0
ub=rep.int(5000,5), # upperbounds = 5000
eval_g_ineq=g,
opts=opts)
Result :
> res
Call:
nloptr(x0 = rep.int(0, 5), eval_f = objFun, lb = rep.int(0, 5),
ub = rep.int(5000, 5), eval_g_ineq = g, opts = opts)
Minimization using NLopt version 2.4.2
NLopt solver status: 4 ( NLOPT_XTOL_REACHED: Optimization stopped because xtol_rel
or xtol_abs (above) was reached. )
Number of Iterations....: 261
Termination conditions: xtol_rel: 1e-08 maxeval: 10000
Number of inequality constraints: 1
Number of equality constraints: 0
Optimal value of objective function: -86.6428477187536
Optimal value of controls: 3037.382 695.3725 675.7232 386.2929 205.2291
N.B. you can access to solution, objective of res using res$solution, res$objective etc.

Constraint Optimization with one parameter included in the constraint of the other

I want to calculate the following
So I want to find Theta and Sigma that maximizes the function.
The constraints are:
> Theta>-Sigma
> -1<Sigma<1
So one of my problem is that I dont know how to deal with the fact that one parameter is included in the constraint of the other Parameter, that I want to optimize over.
I tried with optim(), constrOptim and dfoptim!
Using optim():
k=8
i=1:(k-1)
x=c(5,0.2)
n=24
nj=c(3,4,8,1,1,4,2,1)
EPPF <- function(x,n,nj) {
y=(x[1]+1):(x[1]+1+(n-1)-1)
z=-(prod(x[1]+i*x[2])/(prod(y))*prod(sapply(nj, hfun)))
return(z)}
hfun <- function(p){
h=(1-x[2]):((1-x[2])+p-1)
hfun=prod(h)
return(hfun)
}
> optim(c(6,0.3), fn=EPPF,method = "L-BFGS-B", n=n,nj=nj, lower = c(-x[1],-1), upper = c(Inf,1))
$par
[1] 6.0 0.3
$value
[1] -1.258458e-15
$counts
function gradient
2 2
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
I also tried using a constructor function:
make.EPPF <- function(n,nj,fixed=c(FALSE,FALSE)){
params <-fixed
function(p) {
hfun <- function(y){
h=(1-sigma):((1-sigma)+y-1)
hfun=prod(h)
return(hfun)
}
params[!fixed] <- p
theta <- params[1]
sigma <- params[2]
y=(theta+1):(theta+1+(n-1)-1)
z=(prod(theta+i*sigma)/(prod(y))*prod(sapply(nj, hfun)))
z
}
}
EPPF <-make.EPPF (n,nj)
> optim(c(theta=6, sigma=0.5), fn=EPPF,method = "L-BFGS-B",lower = c(-sigma,-1), upper = c(Inf,1))
Error in optim(c(theta = 6, sigma = 0.5), fn = EPPF, method = "L-BFGS-B", :
object 'sigma' not found
Using constrOptim():
> A <- matrix(c(1,1,0,1,0,-1),3,2,byrow=T)
> b <- c(0,-1,-1)
>
> constrOptim(c(3,0.3),EPPF,NULL,A,b, control=list(fnscale=-1))
$par
[1] 3.0 0.3
$value
[1] 9.712117e-16
$counts
[1] 0
$convergence
[1] 0
$message
NULL
$outer.iterations
[1] 1
$barrier.value
[1] 7.313452e-05
Using Package dfoptim:
> library(dfoptim)
> nmkb(x=c(6,0.3), EPPF, lower=c(-x[2],-1), upper=c(Inf, 1 ))
Error in par < lower :
comparison (3) is possible only for atomic and list types
Either there is for some reasons no convergence or some other Errors.
I am relativ new to programming and R and would really appreciate if someone could help me. Thanks!
These are 3 linear inequality constraints:
sigma + theta > 0
sigma + 1 > 0
-sigma + 1 > 0
You can do this in maxLik. But note that maxLik maximizes the function, hence remove the '-' in front of z. Here is the code that works for me (using Rscript):
k=8
i=1:(k-1)
x=c(5,0.2)
n=24
nj=c(3,4,8,1,1,4,2,1)
EPPF <- function(x,n,nj) {
theta <- x[1]
sigma <- x[2]
y=(x[1]+1):(x[1]+1+(n-1)-1)
z <- prod(x[1]+i*x[2])/(prod(y))*prod(sapply(nj, hfun))
z <- log(z)
return(z)
}
hfun <- function(p){
h=(1-x[2]):((1-x[2])+p-1)
hfun=prod(h)
return(hfun)
}
library(maxLik)
constraints <- list(ineqA=matrix(c(1,0,0,1,1,-1),3,2), ineqB=c(0,1,1))
m <- maxBFGS(EPPF, start=c(6,0.3), constraints=constraints, n=n, nj=nj)
print(summary(m))
I also took logarithm of the result as this leads to more "human" numbers. Otherwise you have to re-tune the stopping conditions. The answer seems to be -1, 1.

root values of simultaneous nonlinear equations in R

I've been trying to code this problem:
https://sg.answers.yahoo.com/question/index?qid=20110127015240AA9RjyZ
I believe there is a R function somewhere to solve for the root values of the following equations:
(x+3)^2 + (y-50)^2 = 1681
(x-11)^2 + (y+2)^2 = 169
(x-13)^2 + (y-34)^2 = 625
I tried using the 'solve' function but they're only for linear equations(?)
Also tried 'nls'
dt = data.frame(a=c(-3,11,13), b = c(50, -2, 34), c = c(1681,169,625))
nls(c~(x-a)^2 + (y-b)^2, data = dt, start = list(x = 1, y = 1))
but getting an error all the time. (and yes I already tried changing the max iteration)
Error in nls(c ~ (x - a)^2 + (y - b)^2, data = dt, start = list(x = 1, :
number of iterations exceeded maximum of 50
how do you solve the root values in R?
nls does not work with zero residual data -- see ?nls where this is mentioned. nlxb in the nlmrt package is mostly similar to nls in terms of input arguments and does support zero residual data. Using dt from the question just replace nls with nlxb:
library(nlmrt)
nlxb(c~(x-a)^2 + (y-b)^2, data = dt, start = list(x = 1, y = 1))
giving:
nlmrt class object: x
residual sumsquares = 2.6535e-20 on 3 observations
after 5 Jacobian and 6 function evaluations
name coeff SE tstat pval gradient JSingval
x 6 7.21e-12 8.322e+11 7.649e-13 -1.594e-09 96.93
y 10 1.864e-12 5.366e+12 1.186e-13 -1.05e-08 22.45
You cannot always solve three equations for two variables.You can solve two equations for two variables and test if the solution satisfies the third equation.
Use package nleqslv as follows.
library(nleqslv)
f1 <- function(z) {
f <- numeric(2)
x <- z[1]
y <- z[2]
f[1] <- (x+3)^2 + (y-50)^2 - 1681
f[2] <- (x-11)^2 + (y+2)^2 - 169
f
}
f2 <- function(z) {
x <- z[1]
y <- z[2]
(x-13)^2 + (y-34)^2 - 625
}
zstart <- c(0,0)
z1 <- nleqslv(zstart,f1)
z1
f2(z1$x)
which gives you the following output:
>z1
$x
[1] 6 10
$fvec
[1] 7.779818e-09 7.779505e-09
$termcd
[1] 1
$message
[1] "Function criterion near zero"
$scalex
[1] 1 1
$nfcnt
[1] 9
$njcnt
[1] 1
$iter
[1] 9
>f2(z1$x)
[1] 5.919242e-08
So a solution has been found and the solution follows from the vector z$x. Inserting z$x in function f2 also gives almost zero.
So a solution has been found.
You could also try package BB.
Just go through rootSolve package and you will be done:
https://cran.r-project.org/web/packages/rootSolve/vignettes/rootSolve.pdf

view values used by function boot to bootstrap estimates

I have written the code below to obtain a bootstrap estimate of a mean. My objective is to view the numbers selected from the data set, ideally in the order they are selected, by the function boot in the boot package.
The data set only contains three numbers: 1, 10, and 100 and I am only using two bootstrap samples.
The estimated mean is 23.5 and the R code below indicates that the six numbers included one '1', four '10' and one '100'. However, there are 30 possible combinations of those numbers that would have resulted in a mean of 23.5.
Is there a way for me to determine which of those 30 possible combinations is the combination that actually appeared in the two bootstrap samples?
library(boot)
set.seed(1234)
dat <- c(1, 10, 100)
av <- function(dat, i) { sum(dat[i])/length(dat[i]) }
av.boot <- boot(dat, av, R = 2)
av.boot
#
# ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
# Call:
# boot(data = dat, statistic = av, R = 2)
#
#
# Bootstrap Statistics :
# original bias std. error
# t1* 37 -13.5 19.09188
#
mean(dat) + -13.5
# [1] 23.5
# The two samples must have contained one '1', four '10' and one '100',
# but there are 30 possibilities.
# Which of these 30 possible sequences actual occurred?
# This code shows there must have been one '1', four '10' and one '100'
# and shows the 30 possible combinations
my.combos <- expand.grid(V1 = c(1, 10, 100),
V2 = c(1, 10, 100),
V3 = c(1, 10, 100),
V4 = c(1, 10, 100),
V5 = c(1, 10, 100),
V6 = c(1, 10, 100))
my.means <- apply(my.combos, 1, function(x) {( (x[1] + x[2] + x[3])/3 + (x[4] + x[5] + x[6])/3 ) / 2 })
possible.samples <- my.combos[my.means == 23.5,]
dim(possible.samples)
n.1 <- rowSums(possible.samples == 1)
n.10 <- rowSums(possible.samples == 10)
n.100 <- rowSums(possible.samples == 100)
n.1[1]
n.10[1]
n.100[1]
length(unique(n.1)) == 1
length(unique(n.10)) == 1
length(unique(n.100)) == 1
I think you can determine the numbers sampled and the order in which they are sampled with the code below. You have to extract the function ordinary.array from the boot package and paste that function into your R code. Then specify the values for n, R and strata, where n is the number of observations in the data set and R is the number of replicate samples you want.
I do not know how general this approach is, but it worked with a couple of simple examples I tried, including the example below.
library(boot)
set.seed(1234)
dat <- c(1, 10, 100, 1000)
av <- function(dat, i) { sum(dat[i])/length(dat[i]) }
av.boot <- boot(dat, av, R = 3)
av.boot
#
# ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
# Call:
# boot(data = dat, statistic = av, R = 3)
#
#
# Bootstrap Statistics :
# original bias std. error
# t1* 277.75 -127.5 132.2405
#
#
mean(dat) + -127.5
# [1] 150.25
# boot:::ordinary.array
ordinary.array <- function (n, R, strata)
{
inds <- as.integer(names(table(strata)))
if (length(inds) == 1L) {
output <- sample.int(n, n * R, replace = TRUE)
dim(output) <- c(R, n)
}
else {
output <- matrix(as.integer(0L), R, n)
for (is in inds) {
gp <- seq_len(n)[strata == is]
output[, gp] <- if (length(gp) == 1)
rep(gp, R)
else bsample(gp, R * length(gp))
}
}
output
}
# I think the function ordinary.array determines which elements
# of the data are sampled in each of the R samples
set.seed(1234)
ordinary.array(n=4,R=3,1)
# [,1] [,2] [,3] [,4]
# [1,] 1 3 1 3
# [2,] 3 4 1 3
# [3,] 3 3 3 3
#
# which equals:
((1+100+1+100) / 4 + (100+1000+1+100) / 4 + (100+100+100+100) / 4) / 3
# [1] 150.25

Resources