Help understanding unipolar transfer function - math

There is a question I am stuck on using the following formula for the unipolar transfer function:
f(net) = 1 / (1 + e^-net)
The example has the following:
out = 1 / (1 + e^-3.75) = 0.977
How do we arrive at 0.977?
What is e?

e = 2.71828... is the base of natural logarithms. It's a mathematical constant that comes up in many different equations, similar to π. You will see it all the time when doing exponents and logarithms.
Plug it into your equation and you get 0.977.
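If it helps, here is a quick check in Python (just a sketch to verify the arithmetic; the function name is mine):

import math

# Unipolar (logistic) transfer function: f(net) = 1 / (1 + e^-net)
def f(net):
    return 1.0 / (1.0 + math.exp(-net))

print(round(f(3.75), 3))  # 0.977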

While factually correct, the other responses merely provide the value of e and confirm the underlying computation. This type of sigmoid function is so ubiquitous in neural networks that some additional insight may be welcome.
Essentially the exponential function (e to the x power) has a very characteristic curve:
mostly flat at zero (very slightly above zero, actually), from minus infinity to about -2
an increasingly sharp turn towards the vertical, between about -2 and +4
quasi "vertical", with values around 150 and increasingly huge, from +5 to infinity
As a result, exponential curves are very useful for producing "S-shaped" functions; as an aside, the Greek counterpart of "S" is sigma, which supplied the etymology of "sigmoid". Such functions are often patterned on the formula shown in the question:
1/(1 + e^-x)
where x is the variable. Typically such functions also include constants aimed at stretching the range (the input zone where changes in x are significant) and/or at modifying the curve in this middle zone.
The result of such functions is that, up to a particular value of the input, the function is quasi-constant; then, for a particular range of inputs, the function produces an increasing output; and finally, past the upper value of that range, the function is quasi-constant again. Looking in more detail, such sigmoids have a point of inflection where the growth of the output's rate of change reverses: the output changes fastest around that point, and the changes become relatively slower on either side as the curve approaches its plateaus.
In turn, such S-shaped curves (1) are very useful for normalizing the output of neural network neurons or, more generally, for normalizing various numeric values in processes of various kinds. Intuitively, the middle range corresponds to a "sweet spot" or "sweet range" of the underlying neuron or device.
(1) Or also, possibly, "step-down" shaped curves, i.e. curves with a mostly constant high value, a decreasing value within the mid-range, and a low mostly constant value thereafter.
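To make the plateaus and the middle "sweet range" concrete, here is a small Python sketch (my own illustration; the function name is arbitrary) that evaluates 1/(1 + e^-x) at a few points:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Quasi-constant near 0 on the far left, S-shaped rise in the middle,
# quasi-constant near 1 on the far right.
for x in (-10, -4, -2, 0, 2, 4, 10):
    print(x, round(sigmoid(x), 4))
# -10 0.0   -4 0.018   -2 0.1192   0 0.5   2 0.8808   4 0.982   10 1.0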

e is Euler's number == 2.718281828....
If you raise e to the -3.75 power, add one to it, and take the inverse, you'll get precisely 0.977022630....

'e' is the base of the natural logarithm; its value is the sum of the infinite series 1/n! for n from 0 to infinity. The exponential function e^x is available in the C standard library and in Java's Math class as the exp() function.
If you evaluate 1/(1+exp(-3.75)) you will get 0.977

Related

lpsolve - infeasible solution, but I have an example of one

I'm trying to solve this in LPSolve IDE:
/* Objective function */
min: x + y;
/* Variable bounds */
r_1: 2x = 2y;
r_2: x + y = 1.11 x y;
r_3: x >= 1;
r_4: y >= 1;
but the response I get is:
Model name: 'LPSolver' - run #1
Objective: Minimize(R0)
SUBMITTED
Model size: 4 constraints, 2 variables, 5 non-zeros.
Sets: 0 GUB, 0 SOS.
Using DUAL simplex for phase 1 and PRIMAL simplex for phase 2.
The primal and dual simplex pricing strategy set to 'Devex'.
The model is INFEASIBLE
lp_solve unsuccessful after 2 iter and a last best value of 1e+030
How come this can happen when x=1.801801802 and y=1.801801802 are possible solutions here?
How To Find The Solution
Let's do some math.
Your problem is:
min x+y
s.t. 2x = 2y
x + y = 1.11 x y
x >= 1
y >= 1
The first constraint 2x = 2y can be simplified to x=y. We now substitute throughout the problem:
min 2*x
s.t. 2*x = 1.11 x^2
x >= 1
And rearrange:
min 2*x
s.t. 1.11 x^2-2*x=0
x >= 1
From geometry we know that 1.11 x^2 - 2x is an upward-opening parabola whose minimum lies below zero. Therefore it has exactly two roots, which are given by the quadratic formula: 200/111 and 0.
Only one of these satisfies the second constraint: 200/111.
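A quick numeric check (a Python sketch, not part of the original argument) confirms those roots and the objective value at the feasible one:

import math

# Roots of 1.11*x^2 - 2*x = 0 via the quadratic formula
a, b, c = 1.11, -2.0, 0.0
disc = math.sqrt(b*b - 4*a*c)
print((-b - disc) / (2*a), (-b + disc) / (2*a))   # 0.0 and 1.8018... (= 200/111)
print(2 * 200/111)                                # objective x + y = 2x, about 3.6036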
Why Can't I Find This Solution With My Solver
The easy way out is to say it's because the x^2 term (x*y before the substitution) is nonlinear. But it goes a little deeper than that. Nonlinear problems can be easy to solve as long as they are convex. A convex problem is one whose constraints form a single, contiguous space such that any line drawn between two points in the space stays within the boundaries of the space.
Your problem is not convex. The original constraint x + y = 1.11 x y defines a curve containing infinitely many points, and no two of those points can be connected by a straight line that stays in the space defined by the constraint, because that space is curved. (After the substitution, the equality constraint 1.11 x^2 - 2x = 0 leaves just two isolated feasible points, which is not convex either.) If the constraint were instead 1.11 x^2 - 2x <= 0, the space would be convex, because any two of its points could be connected by a straight line that stays inside it.
Nonconvex problems are part of a broader class of problems called NP-hard. This means that there is not (and perhaps cannot be) any easy way of solving them. We have to be smart.
Solvers that can handle mixed-integer programming (MIP/MILP) can solve many non-convex problems efficiently, as can other techniques such as genetic algorithms. But, beneath the hood, these techniques all rely on glorified guess-and-check.
So your solver fails because the problem is nonconvex and your solver is neither smart enough to use MIP to guess-and-check its way to a solution nor smart enough to use the quadratic equation.
How Then Can I Solve The Problem?
In this particular instance, we are able to use mathematics to quickly find a solution because, although the problem is nonconvex, it is part of a class of special cases. Deep thinking by mathematicians has given us a simple way of handling this class.
But consider a few generalizations of the problem:
(a) a x^3+b x^2+c x+d=0
(b) a x^4+b x^3+c x^2+d x+e =0
(c) a x^5+b x^4+c x^3+d x^2+e x+f=0
(a) has up to three real solutions which must be checked (exact solutions are tricky), (b) has up to four (trickier), and (c) has up to five. The formulas for (a) and (b) are much more complex than the quadratic formula, and mathematicians have shown that there is no general formula for (c) that can be expressed using "elementary operations". Instead, we have to resort to glorified guess-and-check.
So the techniques we used to solve your problem don't generalize very well. This is what it means to live in the realm of the nonconvex and NP-hard, and it's a good reason to fund research in mathematics, computer science, and related fields.
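To give a flavour of the "glorified guess-and-check" that general solvers fall back on, here is a small Python sketch (my own illustration, with made-up coefficients) of bisection applied to a quintic of the form in (c):

# Bisection: repeatedly halve a bracket [lo, hi] known to contain a root.
def p(x):
    # example quintic: x^5 - 3x^3 + x - 1 (coefficients chosen arbitrarily)
    return x**5 - 3*x**3 + x - 1

lo, hi = 0.0, 3.0            # p(lo) < 0 and p(hi) > 0, so a root lies in between
for _ in range(60):          # each pass halves the uncertainty
    mid = (lo + hi) / 2
    if p(lo) * p(mid) <= 0:
        hi = mid
    else:
        lo = mid
print((lo + hi) / 2)         # an approximate real root of the quintic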

Calculate Normals from Heightmap

I am trying to convert a heightmap into a matrix of normals using central differencing, which will later correspond to the steepness at a given point.
I found several links with correct results but without explaining the math behind them.
T
L O R
B
From this link I realised I can just do:
Vec3 normal = Vec3(2*(R-L), 2*(B-T), -4).Normalize();
The thing is that I don't know where the 2* and the -4 come from.
In this explanation of central differencing I see that we should divide that value by 2, but I still don't know how to connect all of this.
What I really want to know is the linear algebra definition behind this.
I have a heightmap, I want to compute the central differences, and I want to obtain the normal vector to use later for measuring the steepness.
PS: the Z-axis is the height.
From vector calculus, the normal of a surface f(x, y, z) = 0 is given by the gradient operator:
n = grad f = (df/dx, df/dy, df/dz)
A height map h(x, y) is a special form of the function f:
f(x, y, z) = h(x, y) - z, which gives n = (dh/dx, dh/dy, -1)
For a discretized height map, assuming that the grid size is 1, the first-order (central-difference) approximations to the two derivative terms above are given by:
dh/dx ≈ (R - L) / 2
dh/dy ≈ (B - T) / 2
since the x step from L to R is 2, and the same for y. The resulting vector ((R - L)/2, (B - T)/2, -1) is exactly the formula you had, divided through by 4. When the vector is normalized, the factor of 4 cancels.
(No linear algebra was harmed in the writing of this answer)
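As a concrete illustration, here is a Python sketch of the same computation on a heightmap array (hypothetical names; it assumes Z is the height and a grid spacing of 1):

import math

def normal_at(h, x, y):
    # Central-difference normal at grid cell (x, y) of heightmap h[y][x]
    L = h[y][x - 1]          # left neighbour
    R = h[y][x + 1]          # right neighbour
    T = h[y - 1][x]          # top neighbour
    B = h[y + 1][x]          # bottom neighbour
    nx, ny, nz = 2.0 * (R - L), 2.0 * (B - T), -4.0
    length = math.sqrt(nx*nx + ny*ny + nz*nz)
    return (nx / length, ny / length, nz / length)

# Tiny example: a ramp rising along x, flat along y
h = [[0, 1, 2],
     [0, 1, 2],
     [0, 1, 2]]
print(normal_at(h, 1, 1))    # slope only in x: roughly (0.707, 0.0, -0.707)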

Simple 2 or 3 parameters float PRNG formula that changes faster than the float resolution and produces white noise?

I'm looking for a 2 or 3 parameters math formula with the following characteristics:
Simple (the fewest amount of operations the better)
Random output (non-periodic)
Normalized (Meaning the output will never be outside a given range; doesn't matter the range since once I know the range I can just divide and add/subtract to get it into the 0 to 1 range I'm looking for)
White noise (the more samples you get the more evenly distributed the outputs get across the range of possible output values, with no gaps or hotspots, to the extent permitted by the floating-point standard)
Random all the way down (no gradual changes between output values even if the inputs are changed by the smallest amount the float standard will allow. I understand that given the nature of randomness, it is possible two output values might be close together once in a while, but that must only happen by coincidence, and not because of smoothness or periodicity)
Uses only the operations listed below (but of course, any operations that can be done by a combination of the ones listed below are also allowed)
I need this because I need a good source of controllable randomness for some experiments I'm doing with Cycles material nodes in Blender. And since that is where the formula will be implemented, the only operations I have available are:
Addition
Subtraction
Multiplication
Division
Power (X to the power of Y)
Logarithm (I think it's X Log Y; I'm not very familiar with the logarithm operation, so I'm not 100% sure if that is enough to specify which type of logarithm it is; let me know if you need more information about it)
Sine
Cosine
Tangent
Arcsine
Arccosine
Arctangent (not Atan2, but that can be created by combining operations if necessary)
Minimum (Returns the lowest of 2 numbers)
Maximum (Returns the highest of 2 numbers)
Round (Returns the closest round number to the input)
Less-than (Returns 1 if X is less than Y, zero otherwise)
Greater-than (Returns 1 if X is more than Y, zero otherwise)
Modulo (Produces a sawtooth pattern of period Y; for positive X values it's in the 0 to Y range, and for negative values of X it's in the -Y to zero range)
Absolute (strips the sign of the input value, makes it positive if it was negative, doesn't do anything if it's already positive)
There is no iteration nor looping functionality available (and of course, branching can only be done by calculating all the branches and then doing something like multiplying the results of the branches not meant to be taken by zero and then adding the results of all of them together).
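For what it's worth, the branchless "select" described above can be expressed with just the less-than, multiply, and add operations; here is a Python sketch of the idea (the actual node setup would mirror this expression):

def less_than(x, y):
    return 1.0 if x < y else 0.0       # stand-in for the Less-than node

def select(cond, a, b):
    # cond is 0 or 1: compute both branches, keep only the selected one
    return cond * a + (1.0 - cond) * b

x = 0.3
print(select(less_than(x, 0.5), 10.0, 20.0))   # 10.0, because x < 0.5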

Convergence of R density() function to a delta function

I'm a bit puzzled by the behavior of the R density() function in an edge case...
Suppose I add more and more points with x=0 into a simulated data set. What I expect is that the density estimate will very quickly converge (I'm being deliberately vague about what that means...) to a delta function at x=0. In practice, the fit certainly gets narrower, but very slowly, as shown by this sequence of plots:
plot(density(c(0,0)), xlim=c(-2,2))
plot(density(c(0,0,0,0)), xlim=c(-2,2))
plot(density(c(rep(0,10000))), xlim=c(-2,2))
plot(density(c(rep(0,10000000))), xlim=c(-2,2))
But if you add a tiny bit of noise to the simulated data, the behavior is much better:
plot(density(0.0000001*rnorm(10000000) + c(rep(0,10000000))), xlim=c(-2,2))
Just let sleeping dogs lie? Or am I missing something about the usage of density()?
Per ?bw.nrd0, the default bandwidth selector for density:
bw.nrd0 implements a rule-of-thumb for choosing the bandwidth of a Gaussian kernel density estimator. It defaults to 0.9 times the minimum of the standard deviation and the interquartile range divided by 1.34 times the sample size to the negative one-fifth power (= Silverman's ‘rule of thumb’, Silverman (1986, page 48, eqn (3.31)) unless the quartiles coincide when a positive result will be guaranteed.
When your data is constant, then the quartiles coincide, so the last clause guaranteeing a positive result kicks in. This basically means that the bandwidth chosen is not a continuous function of the spread of the data, at zero.
To illustrate:
> bw.nrd0(rep(0, 1e6))
[1] 0.05678616
> bw.nrd0(rnorm(1e6, s=1e-6))
[1] 5.672872e-08
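For non-degenerate data, then, the bandwidth simply follows Silverman's rule of thumb quoted above. A rough Python sketch of that rule (ignoring the fallback for coinciding quartiles, and using Python's quantile definition rather than R's):

import statistics

def silverman_bw(xs):
    # 0.9 * min(sd, IQR/1.34) * n^(-1/5)
    n = len(xs)
    sd = statistics.stdev(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)

# With constant data both sd and IQR are 0, so this naive version returns 0 --
# exactly the case bw.nrd0 guards against by guaranteeing a positive result.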
Actually (...tail between legs...) I now realize that my entire question was misguided. Being fairly new to R, I had instantly assumed that density() tries to fit Gaussians of different widths to the data points, optimizing both the number of Gaussians and their individual widths. But in fact it is doing something much simpler. It just smears out each data point, and adds up the smears to give a smoothed estimate of the data. density() is just a simple smoothing algorithm. So, yes indeed, RTFM :)

Differentiation in MATLAB

I need to find the acceleration of an object. The formula given in the text is a = d^2(L)/d(T)^2, where L = length and T = time.
I calculated this in MATLAB using this equation:
a = (1/(T3-T1))*(((L3-L2)/(T3-T2))-((L2-L1)/(T2-T1)))
or
a = (v2-v1)/(T2-T1)
but I'm not getting the right answers. Can anybody tell me how to find a by some other method in MATLAB?
This has nothing to do with MATLAB; you are just trying to numerically differentiate a function twice. Depending on the behaviour of the higher (3rd, 4th) derivatives of the function, this will or will not yield reasonable results. You will also have to expect an error of order |T3 - T1|^2 with a formula like the one you are using, assuming L is four times differentiable. Instead of using intervals of different sizes, you may try symmetric approximations like
v(x) = (L(x+h) - L(x-h)) / (2h)
a(x) = (L(x-h) - 2 L(x) + L(x+h)) / h^2
From what I recall from my numerical math lectures this is better suited for numerical calculation of higher order derivatives. You will still get an error of order
C |h|^2, with C = O( ||d^4 L / dt^4 || )
with ||.|| denoting the supremum norm of a function (that is, the fourth derivative of L needs to be bounded). If that is the case, you can use the formula to work out how small h has to be in order to produce a result you are willing to accept. Note, though, that this is just the theoretical error, which follows from an analysis of the Taylor expansion of L; see [1] or [2] -- this is where I got it from a moment ago -- or any other introductory book on numerical mathematics. You may get additional errors depending on the quality of the evaluation of L; also, if |L(x-h) - L(x)| is very small, the numerical subtraction may be ill-conditioned.
[1] Knabner, Angermann; Numerik partieller Differentialgleichungen; Springer
[2] http://math.fullerton.edu/mathews/n2003/numericaldiffmod.html
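Language aside, the symmetric formulas are easy to try out; here is a Python sketch (the test function and step size are just examples):

def first_derivative(L, x, h):
    # symmetric first difference, error O(h^2)
    return (L(x + h) - L(x - h)) / (2.0 * h)

def second_derivative(L, x, h):
    # symmetric second difference, error O(h^2)
    return (L(x - h) - 2.0 * L(x) + L(x + h)) / h**2

# Sanity check on L(t) = t^3: exact v(2) = 12 and a(2) = 12
L = lambda t: t**3
print(first_derivative(L, 2.0, 1e-4))    # ~12.0
print(second_derivative(L, 2.0, 1e-4))   # ~12.0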
