Problems in Numerical Integration through R [closed] - r

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I have the following function
f(x)∝|x| exp(-1/2 |x| )+1/(1+(x-40)^4 ),xϵR
I want to find out E(X) and E(X^3) through Simpson's method (numerical integration), Standard Monte Carlo approach, Acceptance-rejection sampling, Importance sampling, Metropolis-Hasting Algorithm, Gibbs sampling and then Bayesian model using MCMC (I have not decided yet).
How can I validate my results obtained from different methods?
I have tried to solve E(X) mathematically but fail to find any close form. This function can be divided over different parts as
absolute(x)*double exponential density + another function utilizing higher power (4) of X in inverse form.
Due to absolute (x) and range [-Inf, Inf] We always have to divide it over [-Inf, 0] and [0, Inf]. Through Integration by parts I was able to see first part as (absolute (x) + (x^2/2) over infinite range) + Integral of this part can't be found mathematically.
So I make use of the following code to get numerical integration result as
Library(stats)
integrand <- function(x) {x*(abs(x)* exp(-0.5*abs(x))+(1/(1+(x-40)^4)))}
integrate(integrand, lower = -Inf, upper = Inf)
thus the result is E(X)= 88.85766 with absolute error < 0.004
The results which I obtain from these methods are not similar for instance
(i) Through Simpsons method I got E(X) = 0.3222642 and E(X^3)=677.0711..
simpson_v2 <- function(fun, a, b, n=100) {
# numerical integral using Simpson's rule
# assume a < b and n is an even positive integer
if (a == -Inf & b == Inf) {
f <- function(t) (fun((1-t)/t) + fun((t-1)/t))/t^2
s <- simpson_v2(f, 0, 1, n)
} else if (a == -Inf & b != Inf) {
f <- function(t) fun(b-(1-t)/t)/t^2
s <- simpson_v2(f, 0, 1, n)
} else if (a != -Inf & b == Inf) {
f <- function(t) fun(a+(1-t)/t)/t^2
s <- simpson_v2(f, 0, 1, n)
} else {
h <- (b-a)/n
x <- seq(a, b, by=h)
y <- fun(x)
y[is.nan(y)]=0
s <- y[1] + y[n+1] + 2*sum(y[seq(2,n,by=2)]) + 4 *sum(y[seq(3,n-1, by=2)])
s <- s*h/3
}
return(s)
}
EX <- function(x) x*(abs(x)* exp(-0.5*abs(x))+(1/(1+(x-40)^4)))
simpson_v2(EX, -Inf, Inf, n=100)
EX3 <- function(x) (x^3)*(abs(x)* exp(-0.5*abs(x))+(1/(1+(x-40)^4)))
simpson_v2(EX3, -Inf, Inf, n=100)
(ii) Importance Sampling
My proposal density is Normal with mean=0 and standard deviation =4. The summary of the Importance sampling process I am applying is as follows
Suppose I can't sample from f(x) which is true as it has no well-known form and no built-in function is available in R to use for sampling. So, I propose another log cancave tail distribution N(0, 4) to take samples such that instead of estimating E(x) I estimate E(x*f(x)/N(0,1)). I use the following code for this which takes 100000 samples from N(0,4)
X <- rnorm(1e5, sd=4)
Y <- (X)*(abs(X)*exp(-0.5*abs(X))+(1/(1+(x-40)^4)))/(dnorm(X, sd=4))
mean(Y)
Since this code needs random sampling from Normal distribution therefore, each time I got different answers but it is something around -0.1710694 which is almost similar to 0.3222642. I got from Simpsons method. But these results are very different E(X)= 88.85766 from integrate(). Note that integrate() use the Adaptive quadrature method. Is this method different from Simpsons and Importance sampling? What similarity in results I should expect while comparing these methods

First, EX and EX3 definition is wrong, you miss minus under exponent
Well, here are some simplifications
If you integrate this part x*(abs(x)* exp(-0.5*abs(x))), from -infinity...infinity result would be 0
If you integrate this part x^3*(abs(x)* exp(-0.5*abs(x))), from -infinity...infinity result would be 0
Integral x^3/(1+(x-40)^4) from -infinity...infinity would be infinity, I would venture, you'll get logarithms which are infinite at infinity, see http://integrals.wolfram.com/index.jsp?expr=%28xxx%29%2F%281+%2B+%28x-a%29%5E4%29&random=false
Integral x/(1+(x-40)^4) looks like something resembling inverse tangent, though online integrator provides ugly output http://integrals.wolfram.com/index.jsp?expr=x%2F%281+%2B+%28x-a%29%5E4%29&random=false
UPDATE
Looks like your EX would be 40*\pi / \sqrt{2}
And EX3 is not infinity, I might be wrong here
UPDATE 2
Yep, EX3 is finite, should be a^2*EX + \pi*a*3/\sqrt{2}, where a is equal to 40
UPDATE 3
As noted, there is also a normalization required to get true values of EX and EX3
N = 8 + \pi/\sqrt{2}
Computed integrals to be divided by N to get proper moments.

Related

How do I minimize a linear least squares function in R?

I'm reading Deep Learning by Goodfellow et al. and am trying to implement gradient descent as shown in Section 4.5 Example: Linear Least Squares. This is page 92 in the hard copy of the book.
The algorithm can be viewed in detail at https://www.deeplearningbook.org/contents/numerical.html with R implementation of linear least squares on page 94.
I've tried implementing in R, and the algorithm as implemented converges on a vector, but this vector does not seem to minimize the least squares function as required. Adding epsilon to the vector in question frequently produces a "minimum" less than the minimum outputted by my program.
options(digits = 15)
dim_square = 2 ### set dimension of square matrix
# Generate random vector, random matrix, and
set.seed(1234)
A = matrix(nrow = dim_square, ncol = dim_square, byrow = T, rlnorm(dim_square ^ 2)/10)
b = rep(rnorm(1), dim_square)
# having fixed A & B, select X randomly
x = rnorm(dim_square) # vector length of dim_square--supposed to be arbitrary
f = function(x, A, b){
total_vector = A %*% x + b # this is the function that we want to minimize
total = 0.5 * sum(abs(total_vector) ^ 2) # L2 norm squared
return(total)
}
f(x,A,b)
# how close do we want to get?
epsilon = 0.1
delta = 0.01
value = (t(A) %*% A) %*% x - t(A) %*% b
L2_norm = (sum(abs(value) ^ 2)) ^ 0.5
steps = vector()
while(L2_norm > delta){
x = x - epsilon * value
value = (t(A) %*% A) %*% x - t(A) %*% b
L2_norm = (sum(abs(value) ^ 2)) ^ 0.5
print(L2_norm)
}
minimum = f(x, A, b)
minimum
minimum_minus = f(x - 0.5*epsilon, A, b)
minimum_minus # less than the minimum found by gradient descent! Why?
On page 94 of the pdf appearing at https://www.deeplearningbook.org/contents/numerical.html
I am trying to find the values of the vector x such that f(x) is minimized. However, as demonstrated by the minimum in my code, and minimum_minus, minimum is not the actual minimum, as it exceeds minimum minus.
Any idea what the problem might be?
Original Problem
Finding the value of x such that the quantity Ax - b is minimized is equivalent to finding the value of x such that Ax - b = 0, or x = (A^-1)*b. This is because the L2 norm is the euclidean norm, more commonly known as the distance formula. By definition, distance cannot be negative, making its minimum identically zero.
This algorithm, as implemented, actually comes quite close to estimating x. However, because of recursive subtraction and rounding one quickly runs into the problem of underflow, resulting in massive oscillation, below:
Value of L2 Norm as a function of step size
Above algorithm vs. solve function in R
Above we have the results of A %% x followed by A %% min_x, with x estimated by the implemented algorithm and min_x estimated by the solve function in R.
The problem of underflow, well known to those familiar with numerical analysis, is probably best tackled by the programmers of lower-level libraries best equipped to tackle it.
To summarize, the algorithm appears to work as implemented. Important to note, however, is that not every function will have a minimum (think of a straight line), and also be aware that this algorithm should only be able to find a local, as opposed to a global minimum.

Nested integration for incomplete convolution of gauss densities

Let g(x) = 1/(2*pi) exp ( - x^2 / 2) be the density of the normal distribution with mean 0 and standard deviation 1. In some calculation on paper appeared integrals of the form
where c>0 is a positive number.
Since I could not evaluate this by hand, I had the idea to approximate and plot it. I tried this in R, because R provides the dnorm function and a function to do integrals.
You see that I need to integrate numerically n times, where n shall be chosed by the call of a plot function. My code has an for-loop to create those "incomplete" convolutions iterativly.
For example even with n=3 and c=1 this gives me an error. n=2 (thus it's one integration) works.
N = 3
ngauss <- function(x) dnorm(x , mean = 0, sd = 1)
convoluts <- list()
convoluts[[1]] <- ngauss
for (i in 2:N) {
h <- function(y) {
g <- function(z) {ngauss(y-z)*convoluts[[i-1]](z)}
return(integrate(g, lower = -1, upper = 1)$value)
}
h <- Vectorize(h)
convoluts[[i]] <- h
}
convoluts[[3]](0)
What I get is:
Error: evaluation nested too deeply: infinite recursion /
options(expressions=)?
I understand that this is a hard computation, but for "small" n something similar should possible.
Maybe someone can help me to fix my code or provide a recommendation how I can implement this in a better way. Another language that is more appropriate for this would be also okay.
The issue appears to be in how integrate deals with variables in different environments. In particular, it doesn't really deal with i correctly in each iteration. Instead using
h <- evalq(function(y) {
g <- function(z) {ngauss(y - z) * convoluts[[i - 1]](z)}
integrate(g, lower = -1, upper = 1)$value
}, list(i = i))
does the job and, say, setting N <- 6 quickly gives
convoluts[[N]](0)
# [1] 0.03423872
As your integration is simply the pdf of a sum of N independent standard normals (which then follows N(0, N)), we may also verify this approach by setting lower = -Inf and upper = Inf. Then with N <- 4 we have
dnorm(0, sd = sqrt(N))
# [1] 0.1994711
convoluts[[N]](0)
# [1] 0.1994711
So, for practical purposes, when c = Inf, you are way better off using dnorm rather than manual computations.

Fitting an inverse function

I have a function which looks like:
g(x) = f(x) - a^b / f(x)^b
g(x) - known function, data vector provided.
f(x) - hidden process.
a,b - parameters of this function.
From the above we get the relation:
f(x) = inverse(g(x))
My goal is to optimize parameters a and b such that f(x) would be as close as possible
to a normal distribution. If we look on a f(x) Q-Q normal plot (attached), my purpose is to minimize the distance between f(x) to the straight line which represents the normal distribution, by optimizing parameters a and b.
I wrote the below code:
g_fun <- function(x) {x - a^b/x^b}
inverse = function (f, lower = 0, upper = 2000) {
function (y) uniroot((function (x) f(x) - y), lower = lower, upper = upper)[1]
}
f_func = inverse(function(x) g_fun(x))
enter code here
# let's made up an example
# g(x) values are known
g <- c(-0.016339, 0.029646, -0.0255258, 0.003352, -0.053258, -0.018971, 0.005172,
0.067114, 0.026415, 0.051062)
# Calculate f(x) by using the inverse of g(x), when a=a0 and b=b0
for (i in 1:10) {
f[i] <- f_fun(g[i])
}
I have two question:
How to pass parameters a and b to the functions?
How to perform this optimization task, meaning find a and b such that f(x) would approximate normal distribution.
Not sure how you were able to produce the Q-Q plot since your provided examples do not work. You are not specifying the values of a and b and you are defining f_func but calling f_fun. Anyway here is my answer to your questions:
How to pass parameters a and b to the functions? - Just pass them as
arguments to the functions.
How to perform this optimization task, meaning find a and b such that f(x) would approximate normal distribution? - The same way any optimization task is done. Define a cost function, then minimize it.
Here is the revised code: I have added a and b as parameters, removed the inverse function and incorporated it inside f_func, which can now take vector input so no need for a for loop.
g_fun <- function(x,a,b) {x - a^b/x^b}
f_func = function(y,a,b,lower = 0, upper = 2000){
sapply(y,function(z) { uniroot(function(x) g_fun(x,a,b) - z, lower = lower, upper = upper)$root})
}
# g(x) values are known
g <- c(-0.016339, 0.029646, -0.0255258, 0.003352, -0.053258, -0.018971, 0.005172,
0.067114, 0.026415, 0.051062)
f <- f_func(g,1,1) # using a = 1 and b = 1
#[1] 0.9918427 1.0149329 0.9873386 1.0016774 0.9737270 0.9905320 1.0025893
#[8] 1.0341199 1.0132947 1.0258569
f_func(g,2,10)
[1] 1.876408 1.880554 1.875578 1.878138 1.873094 1.876170 1.878304 1.884049
[9] 1.880256 1.882544
Now for the optimization part, it depends on what you mean by f(x) would approximate normal distribution. You can compare mean square error from the qq-line if you want. Also since you say approximate, how close is good enough? You can go with shapiro.test and keep searching till you find p-value below 0.05 (be ware that there may not be a solution)
shapiro.test(f_func(g,1,2))$p
[1] 0.9484821
cost <- function(x,y) shapiro.test(f_func(g,x,y))$p
Now that we have a cost function how do we go about minimizing it. There are many many different ways to do numerical optimization. Take a look at optim function http://stat.ethz.ch/R-manual/R-patched/library/stats/html/optim.html.
optim(c(1,1),cost)
This final line does not work, but without proper data and context this is as far as I can go. Hope this helps.

Errors when attempting constrained optimisation using optim()

I have been using the Excel solver to handle the following problem
solve for a b and c in the equation:
y = a*b*c*x/((1 - c*x)(1 - c*x + b*c*x))
subject to the constraints
0 < a < 100
0 < b < 100
0 < c < 100
f(x[1]) < 10
f(x[2]) > 20
f(x[3]) < 40
where I have about 10 (x,y) value pairs. I minimize the sum of abs(y - f(x)). And I can constrain both the coefficients and the range of values for the result of my function at each x.
I tried nls (without trying to impose the constraints) and while Excel provided estimates for almost any starting values I cared to provide, nls almost never returned an answer.
I switched to using optim, but I'm having trouble applying the constraints.
This is where I have gotten so far-
best = function(p,x,y){sum(abs(y - p[1]*p[2]*p[3]*x/((1 - p[3]*x)*(1 - p[3]*x + p[2]*p[3]*x))))}
p = c(1,1,1)
x = c(.1,.5,.9)
y = c(5,26,35)
optim(p,best,x=x,y=y)
I did this to add the first set of constraints-
optim(p,best,x=x,y=y,method="L-BFGS-B",lower=c(0,0,0),upper=c(100,100,100))
I get the error ""ERROR: ABNORMAL_TERMINATION_IN_LNSRCH"
and end up with a higher value of the error ($value). So it seems like I am doing something wrong. I couldn't figure out how to apply my other set of constraints at all.
Could someone provide me a basic idea how to solve this problem that a non-statistician can understand? I looked at a lot of posts and looked in a few R books. The R books stopped at the simplest use of optim.
The absolute value introduces a singularity:
you may want to use a square instead,
especially for gradient-based methods (such as L-BFGS).
The denominator of your function can be zero.
The fact that the parameters appear in products
and that you allow them to be (arbitrarily close to) zero
can also cause problems.
You can try with other optimizers
(complete list on the optimization task view),
until you find one for which the optimization converges.
x0 <- c(.1,.5,.9)
y0 <- c(5,26,35)
p <- c(1,1,1)
lower <- 0*p
upper <- 100 + lower
f <- function(p,x=x0,y=y0) sum(
(
y - p[1]*p[2]*p[3]*x / ( (1 - p[3]*x)*(1 - p[3]*x + p[2]*p[3]*x) )
)^2
)
library(dfoptim)
nmkb(p, f, lower=lower, upper=upper) # Converges
library(Rvmmin)
Rvmmin(p, f, lower=lower, upper=upper) # Does not converge
library(DEoptim)
DEoptim(f, lower, upper) # Does not converge
library(NMOF)
PSopt(f, list(min=lower, max=upper))[c("xbest", "OFvalue")] # Does not really converge
DEopt(f, list(min=lower, max=upper))[c("xbest", "OFvalue")] # Does not really converge
library(minqa)
bobyqa(p, f, lower, upper) # Does not really converge
As a last resort, you can always use a grid search.
library(NMOF)
r <- gridSearch( f,
lapply(seq_along(p), function(i) seq(lower[i],upper[i],length=200))
)

In R, how do I find the optimal variable to minimise the correlation between two datasets [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
In R, how do I find the optimal variable to maximize or minimize correlation between several datasets
This can be done in Excel, but my dataset has gotten too large. In excel, I would use solver.
I have 5 variables and I want to recreate a weighted average of these 5 variables so that they have the lowest correlation to a 6th variable.
Column A,B,C,D,E = random numbers
Column F = random number (which I want to minimise the correlation to)
Column G = Awi1+Bwi2+C*2i3+D*wi4+wi5*E
where wi1 to wi5 are coefficients resulted from solver In a separate cell, I would have correl(F,G)
This is all achieved with the following constraints in mind:
1. A,B,C,D, E have to be between 0 and 1
2. A+B+C+D+E= 1
I'd like to print the results of this so that I can have an efficient frontier type chart.
How can I do this in R? Thanks for the help.
I looked at the other thread mentioned by Vincent and I think I have a better solution. I hope it is correct. As Vincent points out, your biggest problem is that the optimization tools for such non-linear problems do not offer a lot of flexibility for dealing with your constraints. Here, you have two types of constraints: 1) all your weights must be >= 0, and 2) they must sum to 1.
The optim function has a lower option that can take care of your first constraint. For the second constraint, you have to be a bit creative: you can force your weights to sum to one by scaling them inside the function to be minimized, i.e. rewrite your correlation function as function(w) cor(X %*% w / sum(w), Y).
# create random data
n.obs <- 100
n.var <- 6
X <- matrix(runif(n.obs * n.var), nrow = n.obs, ncol = n.var)
Y <- matrix(runif(n.obs), nrow = n.obs, ncol = 1)
# function to minimize
correl <- function(w)cor(X %*% w / sum(w), Y)
# inital guess
w0 <- rep(1 / n.var, n.var)
# optimize
opt <- optim(par = w0, fn = correl, method = "L-BFGS-B", lower = 0)
optim.w <- opt$par / sum(opt$par)

Resources