Problem minimizing function in Matlab (fmincon) - math

I have a function which calculates the acoustic strength of a fish depending on the incident angle of the wavefront on the fish. I also have some in situ measurements of acoustic strength. What I'm trying to do is figure out which normal distribution of angles results in the model data matching up most closely with the in situ data.
To do this, I'm trying to use the Matlab function fmincon to minimize the following function:
function f = myfun(x)
TS_krm = KRM(normrnd(x(1),x(2),100,1), L);
f = sum((TS_insitu - TS_krm).^2);
So what this function does is calculates the sum of squared residuals, which I want to minimize. To do this, I try using fmincon:
x = fmincon(#myfun, [65;8], [], [], [], [], [0;0], [90;20], [], options);
Thus, I'm using a starting orientation with a mean of 65 degrees and a standard deviation of 8. I'm also setting the mean angle bounds to be from 0 to 90 degrees and the standard deviation bounds to be from 0 to 20 degrees.
Yet it doesn't seem to be properly finding the mean and standard deviation angles which minimize the function. Usually it outputs something right around N(65,8), almost like it isn't really trying many other values far from the starting points.
Any ideas on what I can do to make this work? I know I can set the TolX and TolFun settings, but I'm not really sure what those do and what effect they'd have. If it helps, the typical values that I'm dealing with are usually around -45 dB.
Thanks!

you should look at the order of magnitude of the values of f for different inputs. it might influence the values you need to put in TolFun (the tolerance of the minimization algorithm to changes in f). for example, if TolFun = 1e-6 and the difference between f(45) and f(64) is 1e-7, the algorithm might stop at 65.
also, i think the algorith that you are using assume that the functions is differentiable (it uses derivatives to find "where to go next"), not sure this is the case in your function. if it is not, you should use simplex to find the minimum.

Related

How to increase precision of solution of nlm-solver

Given is a function F1:
F1 <- function(C1,C2,C3,...,x,u_target) {
# a lot of equations follow
...
u_actual - u_target
}
F1 returns the result of the very last equation
u_actual - u_target
I want to determine the value for the parameter x in a way that the result of the last equation converges to zero. With
nlm(f=F1,p=c(0),C1=C1,C2=C2,...,stepmax=0.001,ndigit=8)
I get a result, but not a satisfying one:
u_actual = 0.1316566
u_target = 0.1
I played a lot with the arguments of the nlm command (gradtol,stepmax,iterlim etc.), but I was not able to get a better result. I also tried optim, optimize and uniroot, but was not able to get them run at all.
u and x show a negative exponential development. With decreasing x, u increases exponential. If x is zero, u results in a finite value. x also has an upper boundary, which is unknown. So I guessed it would be promising if the iteration starts at the lower boundary (zero) and increases step by step. However, whether I decrease or increase the value of stepmax, the result is not getting better.
I would appreciate any hint from the r-community.
Thank you very much.
PS: in matlab a colleague uses fsolve(#(x) F1(x,u_target,C1,C2,...),0), and it works fine.

R: one-dimensional optimization

I would like to use optimize(), or something similar, to search for a minimum / maximum value of a function. However I am unsure of about the exact range over which the function should be optimized, which is a required parameter for the function 'optimze()' (e.g. optimize(f=FUN,interval=c(lowerBound,upperBound))).
In this optimization problem, I am able to estimate a value a that is "close" to the optimal solution, but "closeness" depends on the situation.
Is there a function in R that can use the initial value a that does not require that the interval over which the function is optimized to be specified up front?
When you say you're not sure about the lower limit, I suspect that this means that the parameter you are trying to estimate is not bounded below.
If this the case, one trick is to transform the function so that there is a lower bound on the parameter.
This trivial function has a minimum at x=4:
fun <- function(x) -exp(-(x - 4)^2) + 8
which we can find via:
optimize(f=fun,interval=c(0,8))
#> $minimum
#> [1] 4
but let's pretend for a moment that we're not sure if there is a lower limit or not, and that we know that the upper limit is 8. R will throw an error if we try:
optimize(f=fun,interval=c(-Inf,8))
because the bounds must be finite. In this case, we can use the exponential transformation (exp()) which maps
the real numbers to the positive numbers, like so:
optimize(f=function(x)fun(log(x)),
interval=exp(c(-Inf,8)))
#> $minimum
#> [1] 54.59815
and then to get the root, you just need to back transform the above the solution via:
log(54.59815)
#> 4
If you don't know either the upper or lower bound on the underlying parameter, then you can use the log-odds transformation in place of the log():
function(x) log(x/(1-x))
and it's inverse in place of exp():
function(y) exp(y)/(1 + exp(y))
Note that the log-odds transformation maps the real numbers onto the unit interval, so the interval parameter becomes 0:1.
These solutions do have some numerical limitations (e.g. if we had set interval=exp(c(-Inf,16)) in the first solution, we would have gotten an error). Tip, you can re-scale these transformations to center around a given point a which can reduce the numerical limitations.

Can a very large (or very small) value in feature vector using SVC bias results? [scikit-learn]

I am trying to better understand how the values of my feature vector may influence the result. For example, let's say I have the following vector with the final value being the result (this is a classification problem using an SVC, for example):
0.713, -0.076, -0.921, 0.498, 2.526, 0.573, -1.117, 1.682, -1.918, 0.251, 0.376, 0.025291666666667, -200, 9, 1
You'll notice that most of the values center around 0, however, there is one value that is orders of magnitude smaller, -200.
I'm concerned that this value is skewing the prediction and is being weighted unfairly heavier than the rest simply because the value is so much different.
Is this something to be concerned about when creating a feature vector? Or will the statistical test I use to evaluate my vector control for this large (or small) value based on the training set I provide it with? Are there methods available in sci-kit learn specifically that you would recommend to normalize the vector?
Thank you for your help!
Yes, it is something you should be concerned about. SVM is heavily influenced by any feature scale variances, so you need a preprocessing technique in order to make it less probable, from the most popular ones:
Linearly rescale each feature dimension to the [0,1] or [-1,1] interval
Normalize each feature dimension so it has mean=0 and variance=1
Decorrelate values by transformation sigma^(-1/2)*X where sigma = cov(X) (data covariance matrix)
each can be easily performed using scikit-learn (although in order to achieve the third one you will need a scipy for matrix square root and inversion)
I am trying to better understand how the values of my feature vector may influence the result.
Then here's the math for you. Let's take the linear kernel as a simple example. It takes a sample x and a support vector sv, and computes the dot product between them. A naive Python implementation of a dot product would be
def dot(x, sv):
return sum(x_i * sv_i for x_i, sv_i in zip(x, sv))
Now if one of the features has a much more extreme range than all the others (either in x or in sv, or worse, in both), then the term corresponding to this feature will dominate the sum.
A similar situation arises with the polynomial and RBF kernels. The poly kernel is just a (shifted) power of the linear kernel:
def poly_kernel(x, sv, d, gamma):
return (dot(x, sv) + gamma) ** d
and the RBF kernel is the square of the distance between x and sv, times a constant:
def rbf_kernel(x, sv, gamma):
diff = [x_i - sv_i for x_i, sv_i in zip(x, sv)]
return gamma * dot(diff, diff)
In each of these cases, if one feature has an extreme range, it will dominate the result and the other features will effectively be ignored, except to break ties.
scikit-learn tools to deal with this live in the sklearn.preprocessing module: MinMaxScaler, StandardScaler, Normalizer.

Help understanding unipolar transfer function

There is a question I am stuck on using the following formula for the unipolar transfer function:
f(net)= 1
__________
-net
1 + e
The example has the following:
out = 1
____________ = 0.977
-3.75
1 + e
How do we arrive at 0.977?
What is e?
e = 2.71828... is the base of natural logarithms. It's a mathematical constant that comes up in many different equations, similar to π. You will see it all the time when doing exponents and logarithms.
Plug it into your equation and you get 0.977.
While factually correct the other responses merely provide the value of e and confirm the underlying computation. This type of sigmoid functions is so ubiquitous to neural networks that some additional insight may be welcome.
Essentially the exponential function (e to the x power), has a very characteristic curve:
Mostly flat at zero (very slightly above zero, actually), from - infinity to about -2
incrementally sharp turn towards the vertical, between about -2 and +4
quasi "vertical", with values in excess of 150 and increasingly huge, from +5 to infinity
As a result exponential curves are very useful for producing "S-shaped" functions; BTW, "S" is Sigma in Greek which supplied the etymology for "sigmoid". Such functions are often patterned on the formula shown in the question:
1/(1 + e^-x)
where x is the variable. Typically such functions also include constants aimed at stretching the range (the input zone where changes in x are significant) and/or at modifying the curve in this middle zone.
The result of such functions is that up to a particular value of the input, the function is quasi constant, then, for a particular range of inputs, the function provides a increasing output, and finally past the upper value of the range, the function is quasi constant. Also looking in more details, such Sigmoids have a point of inflection which correspond to a reversing of the rate of change of the ouptut and which also marks an area of the curve, on either side, where the changes are the slowest, relatively.
In turn, such S-shaped curves (1) are very useful to normalize the output of neural network neurons, or more generally, to normalize various numeric values during processes of various nature. Intuitively these correspond to a "sweet spot" or a "sweet range" of the underlying neuron or device.
(1) Or also, possibly, "step-down" shaped curves, i.e. curves with a mostly constant high value, a decreasing value within the mid-range, and a low mostly constant value thereafter.
e is Euler's number == 2.718281828....
If you raise e to the -3.75 power, add one to it, and take the inverse, you'll get precisely 0.977022630....
'e' is the base for the natural logarithm function, the value of which is equivalent to the sum of the infinite series 1/n! for n from 0 to infinity. It is available in the C standard library or the java Math package as the exp() function.
If you evaluate 1/(1+exp(-3.75)) you will get 0.977

approximation methods

I attached image:
(source: piccy.info)
So in this image there is a diagram of the function, which is defined on the given points.
For example on points x=1..N.
Another diagram, which was drawn as a semitransparent curve,
That is what I want to get from the original diagram,
i.e. I want to approximate the original function so that it becomes smooth.
Are there any methods for doing that?
I heard about least squares method, which can be used to approximate a function by straight line or by parabolic function. But I do not need to approximate by parabolic function.
I probably need to approximate it by trigonometric function.
So are there any methods for doing that?
And one idea, is it possible to use the Least squares method for this problem, if we can deduce it for trigonometric functions?
One more question!
If I use the discrete Fourier transform and think about the function as a sum of waves, so may be noise has special features by which we can define it and then we can set to zero the corresponding frequency and then perform inverse Fourier transform.
So if you think that it is possible, then what can you suggest in order to identify the frequency of noise?
Unfortunately many solutions here presented don't solve the problem and/or they are plain wrong.
There are many approaches and they are specifically built to solve conditions and requirements you must be aware of !
a) Approximation theory: If you have a very sharp defined function without errors (given by either definition or data) and you want to trace it exactly as possible, you are using
polynominal or rational approximation by Chebyshev or Legendre polynoms, meaning that you
approach the function by a polynom or, if periodical, by Fourier series.
b) Interpolation: If you have a function where some points (but not the whole curve!) are given and you need a function to get through this points, you can use several methods:
Newton-Gregory, Newton with divided differences, Lagrange, Hermite, Spline
c) Curve fitting: You have a function with given points and you want to draw a curve with a given (!) function which approximates the curve as closely as possible. There are linear
and nonlinear algorithms for this case.
Your drawing implicates:
It is not remotely like a mathematical function.
It is not sharply defined by data or function
You need to fit the curve, not some points.
What do you want and need is
d) Smoothing: Given a curve or datapoints with noise or rapidly changing elements, you only want to see the slow changes over time.
You can do that with LOESS as Jacob suggested (but I find that overkill, especially because
choosing a reasonable span needs some experience). For your problem, I simply recommend
the running average as suggested by Jim C.
http://en.wikipedia.org/wiki/Running_average
Sorry, cdonner and Orendorff, your proposals are well-minded, but completely wrong because you are using the right tools for the wrong solution.
These guys used a sixth polynominal to fit climate data and embarassed themselves completely.
http://scienceblogs.com/deltoid/2009/01/the_australians_war_on_science_32.php
http://network.nationalpost.com/np/blogs/fullcomment/archive/2008/10/20/lorne-gunter-thirty-years-of-warmer-temperatures-go-poof.aspx
Use loess in R (free).
E.g. here the loess function approximates a noisy sine curve.
(source: stowers-institute.org)
As you can see you can tweak the smoothness of your curve with span
Here's some sample R code from here:
Step-by-Step Procedure
Let's take a sine curve, add some
"noise" to it, and then see how the
loess "span" parameter affects the
look of the smoothed curve.
Create a sine curve and add some noise:
period <- 120 x <- 1:120 y <-
sin(2*pi*x/period) +
runif(length(x),-1,1)
Plot the points on this noisy sine curve:
plot(x,y, main="Sine Curve +
'Uniform' Noise") mtext("showing
loess smoothing (local regression
smoothing)")
Apply loess smoothing using the default span value of 0.75:
y.loess <- loess(y ~ x, span=0.75,
data.frame(x=x, y=y))
Compute loess smoothed values for all points along the curve:
y.predict <- predict(y.loess,
data.frame(x=x))
Plot the loess smoothed curve along with the points that were already
plotted:
lines(x,y.predict)
You could use a digital filter like a FIR filter. The simplest FIR filter is just a running average. For more sophisticated treatment look a something like a FFT.
This is called curve fitting. The best way to do this is to find a numeric library that can do it for you. Here is a page showing how to do this using scipy. The picture on that page shows what the code does:
(source: scipy.org)
Now it's only 4 lines of code, but the author doesn't explain it at all. I'll try to explain briefly here.
First you have to decide what form you want the answer to be. In this example the author wants a curve of the form
f(x) = p0 cos (2π/p1 x + p2) + p3 x
You might instead want the sum of several curves. That's OK; the formula is an input to the solver.
The goal of the example, then, is to find the constants p0 through p3 to complete the formula. scipy can find this array of four constants. All you need is an error function that scipy can use to see how close its guesses are to the actual sampled data points.
fitfunc = lambda p, x: p[0]*cos(2*pi/p[1]*x+p[2]) + p[3]*x # Target function
errfunc = lambda p: fitfunc(p, Tx) - tX # Distance to the target function
errfunc takes just one parameter: an array of length 4. It plugs those constants into the formula and calculates an array of values on the candidate curve, then subtracts the array of sampled data points tX. The result is an array of error values; presumably scipy will take the sum of the squares of these values.
Then just put some initial guesses in and scipy.optimize.leastsq crunches the numbers, trying to find a set of parameters p where the error is minimized.
p0 = [-15., 0.8, 0., -1.] # Initial guess for the parameters
p1, success = optimize.leastsq(errfunc, p0[:])
The result p1 is an array containing the four constants. success is 1, 2, 3, or 4 if ths solver actually found a solution. (If the errfunc is sufficiently crazy, the solver can fail.)
This looks like a polynomial approximation. You can play with polynoms in Excel ("Add Trendline" to a chart, select Polynomial, then increase the order to the level of approximation that you need). It shouldn't be too hard to find an algorithm/code for that.
Excel can show the equation that it came up with for the approximation, too.

Resources