Why do results from optim() depend on initial values?

In R, I am using the function optim() to find the minimum of an objective function of two variables. The real objective functions I'm working with are quite complex, so I tried to familiarize myself with a simpler objective function first. The simplest way to run optim() is optim(par, fn), where par is a vector of initial values for the algorithm and fn is the function to minimize. I find that the answer I get depends heavily on the initial values I supply. However, the function I used is so simple that I'm worried I am misunderstanding either the input or the output of optim().
The objective function I am using is:
f <- function(x) {
  abs(x[1]) * abs(x[2])
}
When I run optim(c(-1.2,1),f) and optim(c(-1.2,10),f) I get drastically different output for the optimal arguments (par) and the minimum (value). Does anyone have an idea why this would be so?

In this case your objective function has infinitely many optimal points (not just different local minima). Any point where one of the parameters is zero is just as good as any other point where a parameter is (near) zero. I'm not sure if you were expecting (0,0), but (0,34) has the same value and can also be considered optimal.

An associated function is:
g <- function(x, y) abs(x)*abs(y)
We can visualize the level sets of g with contour and plot the points involved. First, a reasonable grid given your initial and final conditions (the final points noted from running optim):
x <- seq(-1.5, 0, by=.1)
y <- seq(0, 11, by=1)
A matrix of the values of g over this grid:
m <- outer(x, y, g)
Plot the results, including the results of optim. Note that the values at x==0 or y==0 are optimal.
contour(x, y, m)
o1 <- optim(c(-1.2,1),f)$par
o2 <- optim(c(-1.2,10),f)$par
segments(-1.2, 1, o1[1], o1[2], col='red')
segments(-1.2, 10, o2[1], o2[2], col='red')
# Add zero lines as contour does not draw them
segments(0, 11, 0, 0)
segments(-1.5, 0, 0, 0)
Each segment shows a straight line from the initial condition (left side) to the zero of the function that was found (right side). The optimization itself does not follow a straight line, but this makes it plausible that quite different results will be reached from different starting points.
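As a quick sanity check (a sketch, not part of the original answer), printing the two optim results side by side shows that both runs reach a value of essentially zero; they simply stop at different points on the optimal set:
optim(c(-1.2, 1), f)[c("par", "value")]
optim(c(-1.2, 10), f)[c("par", "value")]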

Related

Using FFT in R to Determine Density Function for IID Sum

The goal is to compute the density function of a sum of n IID random variables from the density function of one of these random variables by:
Transforming the density function into the characteristic function via fft
Raising the characteristic function to the n-th power
Transforming the resulting characteristic function back into the density function of interest via fft(inverse=TRUE)
The below is my naive attempt at this:
sum_of_n <- function(density, n, xstart, xend, power_of_2) {
  x <- seq(from = xstart, to = xend, by = (xend - xstart)/(2^power_of_2 - 1))
  y <- density(x)
  fft_y <- fft(y)
  fft_sum_of_y <- fft_y^n
  sum_of_y <- Re(fft(fft_sum_of_y, inverse = TRUE))
  return(sum_of_y)
}
In the above, density is an arbitrary density function: for example
density <- function(x){return(dgamma(x = x, shape = 2, rate = 1))}
n indicates the number of IID random variables being summed. xstart and xend are the start and end of the approximate support of the random variable. power_of_2 is the base-2 logarithm of the length of the numeric vectors used; as I understand it, power-of-two lengths increase the efficiency of the fft algorithm.
I understand, at least partially, why the above does not work as intended in general. Firstly, the values themselves will not be scaled correctly, as fft(inverse=TRUE) does not normalize by default. However, I find that the values are still not correct when I divide by the length of the vector, i.e.
sum_of_y <- sum_of_y / length(sum_of_y)
which, based on my admittedly limited understanding of fft, is the normalizing calculation. Secondly, the resulting vector will be out of phase due to (someone correct me on this if I am wrong) the shifting of the zero frequency that occurs when fft is performed. I have tried using, for example, pracma's fftshift and ifftshift, but they do not appear to address this problem correctly. For symmetric distributions, e.g. the normal, this is not difficult to address, since the phase shift is typically exactly half the vector length, so that an operation like
sum_of_y <- c(sum_of_y[(length(y)/2+1):length(y)], sum_of_y[1:(length(y)/2)])
works as a correction. However, for asymmetric distributions like the gamma distribution above, this fails.
In conclusion, are there adjustments to the code above that will result in an appropriately scaled and appropriately shifted final density function for the IID sum?
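One possible set of adjustments, sketched below under stated assumptions rather than as a definitive answer: work with approximate probability masses density(x)*dx instead of raw density values, make the grid wide enough to hold the support of the sum (roughly n times the support of one variable, which avoids the wrap-around that circular convolution otherwise causes), and normalize the inverse FFT by the vector length. Because the grid starts at zero, index arithmetic already lines up with the x values, so no fftshift is needed. The sketch assumes the single-variable density is supported on approximately [0, xend], as the gamma example is, so it is not a drop-in replacement for the original signature.
sum_of_n <- function(density, n, xend, power_of_2) {
  m  <- 2^power_of_2
  x  <- seq(from = 0, to = n * xend, length.out = m)   # grid wide enough for the sum
  dx <- x[2] - x[1]
  p  <- density(x) * dx                                # approximate point masses
  p_sum <- Re(fft(fft(p)^n, inverse = TRUE)) / m       # n-fold circular convolution
  list(x = x, y = p_sum / dx)                          # back on the density scale
}
# check against a known result: the sum of 3 Gamma(2, 1) variables is Gamma(6, 1)
dens <- function(x) dgamma(x, shape = 2, rate = 1)
out  <- sum_of_n(dens, n = 3, xend = 20, power_of_2 = 12)
plot(out$x, out$y, type = "l")
lines(out$x, dgamma(out$x, shape = 6, rate = 1), col = "red", lty = 2)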

Why is my approximation too large using Composite Simpson's rule in R (numerical integration)?

I am trying to approximate the following integral, using numerical integration in R:

integral from t = 20 to t = 72 of exp( - integral from x = 0 to x = t of mu(x) dx ) dt

where the function mu is defined by:

mu(x) = 0.000500 + 10^(5.728 + 0.038*(x + 48) - 10)
To do this, I have implemented the Composite Simpson's rule as a function in R, which takes as parameters a function (integrand), the integration interval ([a,b]) and the number of subintervals desired (n).
I have tested my code on various mathematical functions, and it seems to work just fine. However, when I try to approximate the integral shown above, the approximation comes out too large.
My method has been to first define the inner integral in terms of its Composite Simpson approximation as a function of t in R. Then, use the Composite Simpson's rule again, in order to calculate the outer integral by viewing the inner approximation as the function to be integrated.
When doing this, the inner approximation is correct when calculated by itself, as expected, but the approximation of the entire expression becomes too large, and I can't seem to figure out why.
I am comparing the approximations to those given by Maple; the inner expression calculated by itself, using t=20, should give 0.8157191, and the entire expression should be 12.837. R correctly calculates 0.8157191, but gives 32.9285 for the entire expression.
I have tried simplifying, using numerous different mathematical functions, and making the functions independent of t in R, but everything seems to result in the same error. So, to sum things up: why is only the outer integral being approximated incorrectly?
I would be greatly appreciative of any hints or pointers - I have included my code illustrating the problem here:
compositesimpson <- function(integrand, a, b, n) {
  h <- (b-a)/n                            # THE DEFINITE INTERVAL IS SCALED BY
                                          # THE DESIRED NUMBER OF SUBINTERVALS
  xi <- seq.int(a, b, length.out = n+1)   # DIVIDES THE DEFINITE INTERVAL INTO THE
  xi <- xi[-1]                            # DESIRED NUMBER OF SUBINTERVALS,
  xi <- xi[-length(xi)]                   # EXCLUDING a AND b
  # THE APPROXIMATION ITSELF
  approks <- (h/3)*(integrand(a) + 2*sum(integrand(xi[seq.int(2, length(xi), 2)])) +
                    4*sum(integrand(xi[seq.int(1, length(xi), 2)])) + integrand(b))
  return(approks)
}
# SHOULD YIELD -826.5755 BY Maple, SO THE FUNCTION IS WORKING HERE
ftest <- function(x) {
  return(exp(2*x)*sin(3*x))
}
compositesimpson(ftest, -4, 4, 100000)

# MU FUNCTION FOR TESTING
mu.01.kvinde <- function(x){ 0.000500 + 10^(5.728 + 0.038*(x+48) - 10)}

# INNER INTEGRAL AS A FUNCTION OF ITS APPROXIMATION
indreintegrale.person1 <- function(t){
  indre <- exp(-compositesimpson(mu.01.kvinde, 0, t, 100000))
  return(indre)
}
indreintegrale.person1(20)  # YIELDS 0.8157191, WHICH IS CORRECT
compositesimpson(indreintegrale.person1, 20, 72, 100000)  # YIELDS 32.9285,
                                                          # BUT SHOULD BE 12.837 ACCORDING TO MAPLE
This has to do with trying to rely on vectorisation at two levels of nested calls, and it is not doing what you want it to. For example, compare:
indreintegrale.person1(20)
#> [1] 0.8157191
indreintegrale.person1(c(20, 72))
#> [1] 0.8157191 0.4801160
indreintegrale.person1(72)
#> [1] 2.336346e-10
I think the middle answer is wrong, but the other two are right.
Quickest fix, make this replacement:
indreintegrale.person1 <- function(t) {
  sapply(t, function(t2) exp(-compositesimpson(mu.01.kvinde, 0, t2, 100000)))
}
and it now gives the answer you expect (but takes a bit longer to calculate!).
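An equivalent alternative (a sketch, not from the original answer) is base R's Vectorize(), which wraps the scalar-only function so that compositesimpson() can call it on a whole vector of quadrature nodes at once:
indreintegrale.person1 <- Vectorize(function(t) {
  exp(-compositesimpson(mu.01.kvinde, 0, t, 100000))
})
compositesimpson(indreintegrale.person1, 20, 72, 100000)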

Getting incorrect values of theta while trying to implement stochastic gradient descent

I am trying to implement the Stochastic Gradient Descent algorithm for logistic regression. I have written a small train function whose job is to compute the theta values / coefficients. But the values of theta come out incorrect: they are the same as the ones they were initialised to. I could not understand the reason for this. Is this not the correct way to implement stochastic gradient descent?
Here is the code I wrote for it:
train <- function(data, labels, alpha = 0.0009) {
  theta <- seq(from = 0, to = 1, length.out = nrow(data))
  label <- label[, shuffle]
  data <- data[, shuffle]
  for (i in seq(1:ncol(data))) {
    h = hypothesis(x, theta)
    theta <- theta - (alpha * ((h - y) * data[, i]))
  }
  return(theta)
}
Please note that each column in the data frame is one training example. There are 20K columns and 456 rows, so 20K input examples for training. The corresponding data frame named labels holds the correct value for each training example: for example, column 45 in data has its corresponding y value in column 45 of labels.
In the regression above, I am trying to train the model to predict between label 1 and label 0, so labels is a data frame consisting of 0s and 1s.
I can't debug this for you without a minimal, complete, and verifiable example, but I can offer you a tool to help you debug it:
add browser() in the body of your function like this:
train <- function(data, labels, alpha = 0.001) {
  browser()
  # ... the rest of your function
Call train with your data. This will open up a browser session. You can enter help (not the function, just help) to get the commands to navigate in the browser, but in general, use n and s to step through the statements (s will step into a nested function call, n will step over). If you do this in RStudio, you can keep an eye on your environment tab to see what the values for, e.g., theta are, and see a current traceback. You can also evaluate any R expression, e.g., tail(theta) in the executing environment. Q exits the browser.
I'd recommend exploring what hypothesis returns in particular (I'd be surprised if it's not almost always 1). But I think you have other issues causing the undesired behavior you described (the return value for theta isn't changing from its initial assignment).
EDIT:
Fix the typo: label should be labels each time.
Compare the sum of your return with the sum of theta as it is initialized, and you'll see that the return value is not the same as your initialized theta. Hope that helped!
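For illustration only, here is a hedged sketch of what the corrected loop might look like. hypothesis() is assumed to be a logistic sigmoid (it is not shown in the question), shuffle is assumed to be a random permutation of the columns (the question's snippet references it but never defines it), and labels is assumed to be a one-row data frame of 0/1 outcomes, as described above:
hypothesis <- function(x, theta) 1 / (1 + exp(-sum(theta * x)))   # assumed form

train <- function(data, labels, alpha = 0.0009) {
  theta <- seq(from = 0, to = 1, length.out = nrow(data))
  shuffle <- sample(ncol(data))      # visit the training examples in random order
  labels <- labels[, shuffle]        # 'labels', not 'label'
  data <- data[, shuffle]
  for (i in seq_len(ncol(data))) {
    x <- data[, i]                   # the i-th training example
    y <- labels[[i]]                 # its 0/1 label
    h <- hypothesis(x, theta)
    theta <- theta - alpha * (h - y) * x   # one stochastic gradient step
  }
  theta
}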

Finding x-value at intersection between a linear and nonlinear equation in R

I have two functions: one for a line (y) and another for a curve (hnc). I would like to determine the one x-value at which the two functions intersect.
sigma = 0.075
mu = 0
r=0.226
theta=0.908
H=0.16
hnc <- function(x) (1/(sigma*sqrt(2*pi)))*(exp(-(x^2)/(2*(sigma^2))))
y <- function(x) 2*pi*x+(pi*r^2/((360/theta)/H))
curve(hnc,0,r,n=100,col="blue")
plot(y,0,r,add=T,col="red")
I have tried using the nleqslv package, but this results in two separate x-values that do not agree (perhaps because I am using it incorrectly).
int <- function(x) {
  z <- numeric(2)
  z[1] <- (1/(sigma*sqrt(2*pi)))*(exp(-(x[1]^2)/(2*(sigma^2))))
  z[2] <- 2*pi*x[2]+(pi*r^2/((360/theta)/H))
  z
}
nleqslv(c(0.14,0.14),int,method="Broyden")
Any help would be much appreciated!
Thanks,
Eric
Using optimize here to find the minimum of a function of a single variable seems to work well:
xx <- optimize(function(x) abs(hnc(x)-y(x)), c(.10,.20))$minimum
abline(v=xx, lty=2)
You are not using nleqslv in the correct way. It is meant for solving a system of nonlinear equations with as many variables as there are equations.
You have two functions and you want to determine their intersection, which in your case consists of a single value of x.
You need to define a new function like this
g <- function(x) hnc(x) - y(x)
Then you can use uniroot to find a zero of g(x) like this:
uniroot(g,c(0,1))
The root found will be 0.1417802, which corresponds with the graph in the first answer.
Minimizing won't always work to find a point of intersection; if there is no point of intersection you will get misleading results.
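As a quick cross-check (not part of either answer), both original functions can be evaluated at the root returned by uniroot; they should give (nearly) the same value there:
root <- uniroot(g, c(0, 1))$root
c(hnc(root), y(root))   # the two curves agree at the intersection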

Computing integral of a line plot in R

I have two positive-valued vectors x, y of the same length in R. Using plot(x, y, "l", ...) gives me a continuous line plot in two dimensions out of my finite vectors x and y. Is there a way to compute a definite integral over some range of this line plot in R?
edit1: I've looked into the integrate function in R. I'm not sure, however, how to make a function out of my two vectors to pass to it, as the vectors are both finite.
edit2: For some more background, the length of x and y is about 10,000. I've written a function to find periods, [xi, xj], of abnormalities in the data I'm observing. For each of these abnormalities, I've used plot to see what's going on in these snippets of my data. Now I need to compute statistics concerning the values of the integrals in these abnormal periods, so I'm trying to get as accurate a number as possible to match my graphs. x is a time variable, and I've taken very fine intervals of time.
You can do the integration with integrate(). To create a function out of your vectors x and y, you need to interpolate between the values. approxfun() does exactly that.
integrate takes a function and two bounds.
approxfun takes two vectors x and y just like those you have.
So my solution would be:
integrate(approxfun(x,y), range(x)[1], range(x)[2])
The approxfun function will take 2 vectors and return a function that gives the linear interpolation between the points. This can then be passed to functions like integrate. The splinefun function will also do interpolation, but based on a spline rather than piecewise linear.
In the piecewise linear case the integral is just the sum of the trapezoids, so it may be faster/simpler to sum the areas of the trapezoids directly (the width, i.e. the difference in consecutive x values, times the average of the corresponding pair of y values).
I landed here much later, but for future visitors, here is some code implementing the suggestion from Greg Snow's answer for piecewise linear functions:
line_integral <- function(x, y) {
  dx <- diff(x)
  end <- length(y)
  my <- (y[1:(end - 1)] + y[2:end]) / 2
  sum(dx * my)
}
# example
x <- c(0, 2, 3, 4, 5, 5, 6)
y <- c(0, 0, 1, -2, -1, 0, 0)
plot(x, y, "l")
line_integral(x, y)
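To address the original question of integrating only over an abnormal period [xi, xj], one simple approach (a sketch, not from the answers above) is to subset the vectors and reuse line_integral. It ignores the partial segments at the two endpoints, which matters little when x has ~10,000 finely spaced points:
integrate_range <- function(x, y, xi, xj) {
  keep <- x >= xi & x <= xj          # keep only the sample points inside the period
  line_integral(x[keep], y[keep])
}
integrate_range(x, y, 2, 5)          # area under the example curve between x = 2 and x = 5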
