How to perform feature normalization in Julia?

I have written a "featureNormalize" function for Andrew Ng's machine learning course in Julia.
It works for the input features, but when I normalize a single example in order to predict an output value, e.g.
predict = featureNormalize([1650 3])
The output is
1×2 Matrix{Float64}:
NaN NaN
using Statistics  # provides mean and std
function featureNormalize(X)
mu = mean(X,dims=1)
sigma = std(X,dims=1)
X_norm = (X .- mu) ./ sigma
return X_norm
end

You need a data set with at least two examples in order to compute std: by default std divides by n - 1, so with a single row it returns NaN. If you really need to handle this corner case, you can introduce an if-else branch in the function body.
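For the prediction case in the question, a common approach is to compute the mean and standard deviation once on the training set and reuse them for any new example, rather than re-normalizing the single row on its own. Here is a minimal sketch of that idea in Python, standard library only. The names fit_scaler and apply_scaler and the training rows are made up for illustration; they are not from the course or from any package.

```python
from statistics import mean, stdev

def fit_scaler(rows):
    """Return per-column mean and sample standard deviation (needs >= 2 rows)."""
    cols = list(zip(*rows))
    mu = [mean(c) for c in cols]
    # stdev uses the n - 1 denominator, like Julia's default std;
    # with a single row Python raises StatisticsError instead of returning NaN
    sigma = [stdev(c) for c in cols]
    return mu, sigma

def apply_scaler(row, mu, sigma):
    """Normalize one example with statistics computed elsewhere."""
    return [(v - m) / s for v, m, s in zip(row, mu, sigma)]

# Hypothetical training data: (area, number of rooms)
train = [[2104, 3], [1600, 3], [2400, 4], [1416, 2]]
mu, sigma = fit_scaler(train)

# The new example is scaled with the training statistics, so no NaN appears
x_new = apply_scaler([1650, 3], mu, sigma)
```

The same split (one function that learns the factors, one that applies them) is what the BetaML example in the next answer does with getScaleFactors and scale.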

You may want to consider that sometimes you need to skip some features from the normalisation or that you may need to revert the normalisation (when you normalise y).
Here is an example that accounts for these two cases (from BetaML.jl):
"""
getScaleFactors(x;skip)
Return the scale factors (for each dimension) in order to scale a matrix X (n,d)
such that each dimension has mean 0 and variance 1.
# Parameters
- `x`: the (n × d) dimension matrix to scale on each dimension d
- `skip`: an array of dimension index to skip the scaling [def: `[]`]
# Return
- A tuple whose first element is the shift and the second the multiplicative
term to make the scale.
"""
function getScaleFactors(x;skip=[])
μ = mean(x,dims=1)
σ² = var(x,corrected=false,dims=1)
sfμ = - μ
sfσ² = 1 ./ sqrt.(σ²)
for i in skip
sfμ[i] = 0
sfσ²[i] = 1
end
return (sfμ,sfσ²)
end
"""
scale(x,scaleFactors;rev)
Perform a linear scaling of x using scaling factors `scaleFactors`.
# Parameters
- `x`: The (n × d) dimension matrix to scale on each dimension d
- `scaleFactors`: A tuple of the constant and multiplicative scaling factor
respectively [def: the scaling factors needed to scale x to mean 0 and variance 1]
- `rev`: Whether to invert the scaling [def: `false`]
# Return
- The scaled matrix
# Notes:
- Also available `scale!(x,scaleFactors)` for in-place scaling.
- Retrieve the scale factors with the `getScaleFactors()` function
"""
function scale(x,scaleFactors=(-mean(x,dims=1),1 ./ sqrt.(var(x,corrected=false,dims=1))); rev=false )
if (!rev)
y = (x .+ scaleFactors[1]) .* scaleFactors[2]
else
y = (x ./ scaleFactors[2]) .- scaleFactors[1]
end
return y
end
function scale!(x,scaleFactors=(-mean(x,dims=1),1 ./ sqrt.(var(x,corrected=false,dims=1))); rev=false)
if (!rev)
x .= (x .+ scaleFactors[1]) .* scaleFactors[2]
else
x .= (x ./ scaleFactors[2]) .- scaleFactors[1]
end
return nothing
end

Related

R Derivatives of an Inverse

I have an expression that contains several parts. However, for simplicity, consider only the following part as MWE:
Let's assume we have the inverse of a matrix Y that I want to differentiate w.r.t. x.
Y is given as I - (x * b * t(b)), where I is the identity matrix, x is a scalar, and b is a vector.
According to The Matrix Cookbook, Eq. 59, the partial derivative of an inverse is dY^-1/dx = -Y^-1 (dY/dx) Y^-1.
Normally I would use the function D from the stats package to calculate derivatives. That is not possible in this case, because functions such as solve (needed to express the inverse of Y) and t() are not in D's table of derivatives.
What is the best workaround to circumvent this problem? Are there any other recommended packages that can handle such input?
Example that doesn't work:
f0 <- expression(solve(I - (x * b %*% t(b))))
D(f0, "x")
Example that works:
f0 <- expression(x^3)
D(f0, "x")
3 * x^2
I assume that the question is how to get an explicit expression for the derivative of the inverse of Y with respect to x. In the first section we compute it and in the second section we double check it by computing it numerically and show that the two approaches give the same result.
The span of b and its orthogonal complement (the null space of b') are both eigenspaces of Y. We can readily verify this by noting that Yb = (1-(b'b)x)b, and that if z is orthogonal to b then Yz = z. This also shows that the corresponding eigenvalues are 1 - x(b'b) with multiplicity 1 and 1 with multiplicity n-1 (since the orthogonal complement of b has that dimension).
A symmetric matrix can be expanded into the sum of each eigenvalue times the projection onto its eigenspace, so we can express Y as shown below, where bb'/b'b is the projection onto the eigenspace spanned by b and the factor multiplying it is the eigenvalue. The remaining terms do not involve x, because their eigenvalue is 1 regardless of x and the orthogonal complement of b does not depend on x either.
Y = (1-x(b'b))(bb')/(b'b) + terms not involving x
The inverse of Y is formed by taking the reciprocals of the eigenvalues so:
Yinv = 1/(1-x(b'b)) * (bb')/(b'b) + terms not involving x
and the derivative of that wrt x is:
(b'b) / (1 - x(b'b))^2 * (bb')/(b'b)
Cancelling the b'b and writing the derivative in terms of R code:
1/(1 - x*sum(b*b))^2*outer(b, b)
Double check
Using specific values for b and x we can verify it against the numeric derivative as follows:
library(numDeriv)
x <- 1
b <- 1:3
# Y inverse as a function of x
Yinv <- function(x) solve(diag(3) - x * outer(b, b))
all.equal(matrix(jacobian(Yinv, x = 1), 3),
1/(1 - x*sum(b*b))^2*outer(b, b))
## [1] TRUE
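The same double check can be sketched in Python without any packages. This is only an illustration under my own assumptions: the helpers inverse, y_inv and analytic_derivative are hypothetical names, the inverse is a plain Gauss-Jordan elimination, and the numeric derivative is a central difference with a hand-picked step.

```python
def inverse(M):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    # Augment M with the identity matrix
    A = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [row[n:] for row in A]

def y_inv(x, b):
    """Inverse of Y = I - x * b b'."""
    n = len(b)
    Y = [[(1.0 if i == j else 0.0) - x * b[i] * b[j] for j in range(n)]
         for i in range(n)]
    return inverse(Y)

def analytic_derivative(x, b):
    """Closed form from the answer: b b' / (1 - x b'b)^2."""
    scale = 1.0 / (1.0 - x * sum(v * v for v in b)) ** 2
    return [[scale * bi * bj for bj in b] for bi in b]

x, b, h = 1.0, [1.0, 2.0, 3.0], 1e-6
plus, minus = y_inv(x + h, b), y_inv(x - h, b)
# Central difference approximation of d(Yinv)/dx, entry by entry
numeric = [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(3)] for i in range(3)]
exact = analytic_derivative(x, b)
max_err = max(abs(numeric[i][j] - exact[i][j]) for i in range(3) for j in range(3))
```

With these values b'b = 14, so the closed form is simply outer(b, b) / 169, and max_err should be far below the size of the entries.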

Is there any way to bound the region searched by NLsolve in Julia?

I'm trying to find one of the roots of a nonlinear (roughly quartic) equation.
The equation always has four roots, a pair of them close to zero, a large positive, and a large negative root. I'd like to identify either of the near zero roots, but nlsolve, even with an initial guess very close to these roots, seems to always converge on the large positive or negative root.
A plot of the function essentially looks like a constant negative value, with a (very narrow) even-ordered pole near zero, and gradually rising to cross zero at the large positive and negative roots.
Is there any way I can limit the region searched by nlsolve, or do something to make it more sensitive to the presence of this pole in my function?
EDIT:
Here's some example code reproducing the problem:
using NLsolve
function f!(F,x)
x = x[1]
F[1] = -15000 + x^4 / (x+1e-5)^2
end
# nlsolve will find the root at -122
nlsolve(f!,[0.0])
As output, I get:
Results of Nonlinear Solver Algorithm
* Algorithm: Trust-region with dogleg and autoscaling
* Starting Point: [0.0]
* Zero: [-122.47447713915808]
* Inf-norm of residuals: 0.000000
* Iterations: 15
* Convergence: true
* |x - x'| < 0.0e+00: false
* |f(x)| < 1.0e-08: true
* Function Calls (f): 16
* Jacobian Calls (df/dx): 6
We can find the exact roots in this case by transforming the objective function into a polynomial:
using PolynomialRoots
roots([-1.5e-6,-0.3,-15000,0,1])
produces
4-element Array{Complex{Float64},1}:
122.47449713915809 - 0.0im
-122.47447713915808 + 0.0im
-1.0000000813048448e-5 + 0.0im
-9.999999186951818e-6 + 0.0im
I would love a way to identify the pair of roots around the pole at x = -1e-5 without knowing the exact form of the objective function.
EDIT2:
Trying out Roots.jl :
using Roots
f(x) = -15000 + x^4 / (x+1e-5)^2
find_zero(f,0.0) # finds +122... root
find_zero(f,(-1e-4,0.0)) # error, not a bracketing interval
find_zeros(f,-1e-4,0.0) # finds 0-element Array{Float64,1}
find_zeros(f,-1e-4,0.0,no_pts=6) # finds root slightly less than -1e-5
find_zeros(f,-1e-4,0.0,no_pts=10) # finds 0-element Array{Float64,1}, sensitive to value of no_pts
I can get find_zeros to work, but it's very sensitive to the no_pts argument and the exact values of the endpoints I pick. Doing a loop over no_pts and taking the first non-empty result might work, but something more deterministic to converge would be preferable.
EDIT3 :
Here's applying the tanh transformation suggested by Bogumił
using NLsolve
function f_tanh!(F,x)
x = x[1]
x = -1e-4 * (tanh(x)+1) / 2
F[1] = -15000 + x^4 / (x+1e-5)^2
end
nlsolve(f_tanh!,[100.0]) # doesn't converge
nlsolve(f_tanh!,[1e5]) # doesn't converge
using Roots
function f_tanh(x)
x = -1e-4 * (tanh(x)+1) / 2
return -15000 + x^4 / (x+1e-5)^2
end
find_zeros(f_tanh,-1e10,1e10) # 0-element Array
find_zeros(f_tanh,-1e3,1e3,no_pts=100) # 0-element Array
find_zero(f_tanh,0.0) # convergence failed
find_zero(f_tanh,0.0,max_evals=1_000_000,maxfnevals=1_000_000) # convergence failed
EDIT4 : This combination of techniques identifies at least one root somewhere around 95% of the time, which is good enough for me.
using Peaks
using Primes
using Roots
# randomize pole location
a = 1e-4*rand()
f(x) = -15000 + x^4 / (x+a)^2
# do an initial sample to find the pole location
l = 1000
minval = -1e-4
maxval = 0
m = []
sample_r = []
while l < 1e6
sample_r = range(minval,maxval,length=l)
rough_sample = f.(sample_r)
m = maxima(rough_sample)
if length(m) > 0
break
else
l *= 10
end
end
guess = sample_r[m[1]]
# functions to compress the range around the estimated pole
cube(x) = (x-guess)^3 + guess
uncube(x) = cbrt(x-guess) + guess
f_cube(x) = f(cube(x))
shift = l ÷ 1000
low = sample_r[m[1]-shift]
high = sample_r[m[1]+shift]
# search only over prime no_pts, so no samplings divide into each other
# possibly not necessary?
for i in primes(500)
z = find_zeros(f_cube,uncube(low),uncube(high),no_pts=i)
if length(z)>0
println(i)
println(cube.(z))
break
end
end
More could be said if you provided more information about your problem.
However, in general:
- It seems that your problem is univariate, in which case you can use Roots.jl, where find_zero and find_zeros give the interface you ask for (i.e. they let you specify the search region).
- If a problem is multivariate, you have several options for expressing the bounds in the problem specification for nlsolve (as it by default does not allow to specify a bounding box AFAICT). The simplest is to use a variable transformation. E.g. you can apply an ai * tanh(xi) + bi transformation, selecting ai and bi for each variable so that it is bounded to the desired interval.
The first problem you have in your definition is that the way you define f it never crosses 0 near the two roots you are looking for because Float64 does not have enough precision when you write 1e-5. You need to use greater precision of computations:
julia> using Roots
julia> f(x) = -15000 + x^4 / (x+1/big(10.0^5))^2
f (generic function with 1 method)
julia> find_zeros(f,big(-2*10^-5), big(-8*10^-6), no_pts=100)
2-element Array{BigFloat,1}:
-1.000000081649671426108658262468117284940444265467160592853348997523986352593615e-05
-9.999999183503552405580084054429938261707450678661727461293670518591720605751116e-06
and set no_pts to be sufficiently large to find intervals bracketing the roots.
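To make the "find a bracketing interval first" idea concrete without depending on no_pts, here is a rough Python sketch. It is not what Roots.jl does internally; it is one possible recipe under the assumption that the function has a single narrow even-ordered pole inside the sampled window. The helper names refine_peak, bracket_outward and bisect are made up for this example. The steps are: take the largest coarse sample as a pole estimate, sharpen it with a ternary search, walk outward from the pole until the sign flips, then bisect.

```python
import math

A = 1e-5  # pole offset used in the question's example

def f(x):
    """Objective from the question; returns +inf exactly at the pole."""
    d = x + A
    if d == 0.0:
        return math.inf
    return -15000.0 + x**4 / d**2

def refine_peak(g, lo, hi, iters=200):
    """Ternary search for the maximum of g on [lo, hi], assuming g is unimodal there."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

def bracket_outward(g, p, direction, start=1e-16, factor=2.0, max_steps=100):
    """Step away from the pole p until g turns negative; return (positive_end, negative_end)."""
    step = start
    prev = p + direction * step
    if g(prev) <= 0:
        raise ValueError("g is not positive next to the pole estimate")
    for _ in range(max_steps):
        step *= factor
        cur = p + direction * step
        if g(cur) < 0:
            return prev, cur
        prev = cur
    raise ValueError("no sign change found")

def bisect(g, a, b, iters=200):
    """Plain bisection on a bracket [a, b] where g(a) and g(b) have opposite signs."""
    fa = g(a)
    for _ in range(iters):
        m = 0.5 * (a + b)
        fm = g(m)
        if (fm > 0) == (fa > 0):
            a, fa = m, fm
        else:
            b = m
    return 0.5 * (a + b)

# Coarse sampling of the window from the question
n = 1000
xs = [-1e-4 + i * (1e-4 / n) for i in range(n + 1)]
vals = [f(x) for x in xs]
i = max(range(n + 1), key=vals.__getitem__)

pole = refine_peak(f, xs[max(i - 1, 0)], xs[min(i + 1, n)])
left_root = bisect(f, *bracket_outward(f, pole, -1.0))
right_root = bisect(f, *bracket_outward(f, pole, +1.0))
```

The starting offset of 1e-16 and the doubling factor are guesses that happen to suit this scale; for a pole at a different magnitude they would need to be chosen relative to the pole estimate.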

Yun's algorithm

I would like to try to implement Yun's algorithm for square-free factorization of polynomials. From Wikipedia (f is the polynomial):
a0 = gcd(f, f'); b1 = f/a0; c1 = f'/a0; d1 = c1 - b1'; i = 1
repeat
ai = gcd(bi, di); bi+1 = bi/ai; ci+1 = di/ai; i = i + 1; di = ci - bi'
until b = 1
However, I'm not sure about the second step. I would like to use it for polynomials with integer coefficients (not necessarily monic or primitive). Is it possible to realize the division b1 = f/a0 using just integers?
I found the code for synthetic division:
def extended_synthetic_division(dividend, divisor):
    '''Fast polynomial division by using Extended Synthetic Division. Also works with non-monic polynomials.'''
    # dividend and divisor are both polynomials, which are here simply lists of coefficients. Eg: x^2 + 3x + 5 will be represented as [1, 3, 5]
    out = list(dividend) # Copy the dividend
    normalizer = divisor[0]
    for i in range(len(dividend)-(len(divisor)-1)):
        out[i] /= normalizer # for general polynomial division (when polynomials are non-monic),
                             # we need to normalize by dividing the coefficient with the divisor's first coefficient
        coef = out[i]
        if coef != 0: # useless to multiply if coef is 0
            for j in range(1, len(divisor)): # in synthetic division, we always skip the first coefficient of the divisor,
                                             # because it is only used to normalize the dividend coefficients
                out[i + j] += -divisor[j] * coef
    # The resulting out contains both the quotient and the remainder, the remainder being the size of the divisor (the remainder
    # has necessarily the same degree as the divisor since it is what we couldn't divide from the dividend), so we compute the index
    # where this separation is, and return the quotient and remainder.
    separator = -(len(divisor)-1)
    return out[:separator], out[separator:] # return quotient, remainder.
The problem for me is the line out[i] /= normalizer. Would it always work with integer (floor) division for Yun's b1 = f/a0? Is it always possible to divide f by gcd(f, f') exactly? Is the remainder out[separator:] always going to be zero?
The fact that the division p/GCD(p, p') will always work (i.e. be "exact", with no remainder in Z) follows from the definition of the GCD. For any polynomials p and q, GCD(p, q) divides both p and q exactly. That is why it is called the GCD, i.e. the Greatest Common Divisor:
A greatest common divisor of p and q is a polynomial d that divides p and q and such that every common divisor of p and q also divides d.
P.S. it makes more sense to ask such purely mathematical questions at the more specialized https://math.stackexchange.com/
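A small sketch of that exactness, in Python with exact rational arithmetic (fractions.Fraction). The helper names trim, derivative, poly_divmod and poly_gcd are made up for illustration, and working over the rationals with a monic GCD is my assumption; this does not address keeping every intermediate coefficient an integer. Polynomials are coefficient lists with the highest degree first, as in the question.

```python
from fractions import Fraction

def trim(p):
    """Drop leading zero coefficients; the zero polynomial becomes []."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return list(p[i:])

def derivative(p):
    """Formal derivative of a polynomial given highest degree first."""
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]

def poly_divmod(num, den):
    """Long division over the rationals; returns (quotient, remainder)."""
    rem = [Fraction(c) for c in trim(num)]
    den = [Fraction(c) for c in trim(den)]
    if not den:
        raise ZeroDivisionError("division by the zero polynomial")
    quot = []
    while len(rem) >= len(den):
        coef = rem[0] / den[0]
        quot.append(coef)
        # Subtract coef * den, aligned at the leading term
        for j in range(len(den)):
            rem[j] -= coef * den[j]
        rem = rem[1:]  # the leading term is now zero
    return quot, trim(rem)

def poly_gcd(p, q):
    """Euclidean algorithm; the result is normalized to be monic."""
    p, q = trim(p), trim(q)
    while q:
        _, r = poly_divmod(p, q)
        p, q = q, r
    lead = Fraction(p[0])
    return [Fraction(c) / lead for c in p]

# f = (x - 1)^2 (x + 2) = x^3 - 3x + 2, which has a repeated factor
f = [1, 0, -3, 2]
a0 = poly_gcd(f, derivative(f))      # first step of Yun's algorithm
b1, remainder = poly_divmod(f, a0)   # b1 = f / a0
```

Here a0 comes out as x - 1, b1 as x^2 + x - 2, and the remainder is the empty list, i.e. zero.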

Vectorizing code to calculate (squared) Mahalanobis Distance

EDIT 2: this post seems to have been moved from CrossValidated to StackOverflow due to it being mostly about programming, but that means my fancy MathJax doesn't work anymore. Hopefully this is still readable.
Say I want to calculate the squared Mahalanobis distance between two vectors x and y with covariance matrix S. This is a fairly simple function defined by
M2(x, y; S) = (x - y)^T * S^-1 * (x - y)
With python's numpy package I can do this as
# x, y = numpy.ndarray of shape (n,)
# s_inv = numpy.ndarray of shape (n, n)
diff = x - y
d2 = diff.T.dot(s_inv).dot(diff)
or in R as
diff <- x - y
d2 <- t(diff) %*% s_inv %*% diff
In my case, though, I am given
m by n matrix X
n-dimensional vector mu
n by n covariance matrix S
and want to find the m-dimensional vector d such that
d_i = M2(x_i, mu; S) ( i = 1 .. m )
where x_i is the ith row of X.
This is not difficult to accomplish using a simple loop in python:
d = numpy.zeros((m,))
for i in range(m):
    diff = x[i,:] - mu
    d[i] = diff.T.dot(s_inv).dot(diff)
Of course, because the outer loop runs in python instead of in native code in the numpy library, it's not as fast as it could be. n and m are about 3-4 and several hundred thousand respectively, and I'm doing this somewhat often in an interactive program, so a speedup would be very useful.
Mathematically, the only way I've been able to formulate this using basic matrix operations is
d = diag( X' * S^-1 * X'^T )
where
x'_i = x_i - mu
which is simple to write a vectorized version of, but this is unfortunately outweighed by the inefficiency of calculating a 10-billion-plus element matrix and only taking the diagonal... I believe this operation should be easily expressible using Einstein notation, and thus could hopefully be evaluated quickly with numpy's einsum function, but I haven't even begun to figure out how that black magic works.
So, I would like to know: is there either a nicer way to formulate this operation mathematically (in terms of simple matrix operations), or could someone suggest some nice vectorized (python or R) code that does this efficiently?
BONUS QUESTION, for the brave
I don't actually want to do this once, I want to do it k ~ 100 times. Given:
m by n matrix X
k by n matrix U
Set of n by n covariance matrices each denoted S_j (j = 1..k)
Find the m by k matrix D such that
D_i,j = M2(x_i, u_j; S_j)
Where i = 1..m, j = 1..k, x_i is the ith row of X and u_j is the jth row of U.
I.e., vectorize the following code:
# s_inv is (k x n x n) array containing "stacked" inverses
# of covariance matrices
d = numpy.zeros( (m, k) )
for j in range(k):
    for i in range(m):
        diff = x[i, :] - u[j, :]
        d[i, j] = diff.T.dot(s_inv[j, :, :]).dot(diff)
First off, it seems like maybe you're getting S and then inverting it. You shouldn't do that; it's slow and numerically inaccurate. Instead, you should get the Cholesky factor L of S so that S = L L^T; then
M^2(x, y; L L^T)
= (x - y)^T (L L^T)^-1 (x - y)
= (x - y)^T L^-T L^-1 (x - y)
= || L^-1 (x - y) ||^2,
and since L is triangular L^-1 (x - y) can be computed efficiently.
As it turns out, scipy.linalg.solve_triangular will happily do a bunch of these at once if you reshape it properly:
L = np.linalg.cholesky(S)
y = scipy.linalg.solve_triangular(L, (X - mu[np.newaxis]).T, lower=True)
d = np.einsum('ij,ij->j', y, y)
Breaking that down a bit, y[i, j] is the ith component of L^-1 (X_j - \mu). The einsum call then does
d_j = \sum_i y_{ij} y_{ij}
= \sum_i y_{ij}^2
= || y_j ||^2,
like we need.
Unfortunately, solve_triangular won't vectorize across its first argument, so you should probably just loop there. If k is only about 100, that's not going to be a significant issue.
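To see the identity M^2 = || L^-1 (x - y) ||^2 at work on a single pair of vectors, here is a plain-Python sketch with no numpy. The functions cholesky, forward_solve and mahalanobis2 are written out by hand purely for illustration (in practice you would use the numpy/scipy calls above), and the 2x2 covariance matrix is an arbitrary example.

```python
import math

def cholesky(S):
    """Lower-triangular L with S = L L^T, for a symmetric positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L y = b by forward substitution (L lower triangular)."""
    y = []
    for i in range(len(b)):
        s = sum(L[i][k] * y[k] for k in range(i))
        y.append((b[i] - s) / L[i][i])
    return y

def mahalanobis2(x, mu, S):
    """Squared Mahalanobis distance via the Cholesky factor, without forming S^-1."""
    L = cholesky(S)
    y = forward_solve(L, [xi - mi for xi, mi in zip(x, mu)])
    return sum(v * v for v in y)

S = [[4.0, 2.0], [2.0, 3.0]]
d2 = mahalanobis2([3.0, 2.0], [1.0, 1.0], S)
```

For this example the explicit inverse is S^-1 = (1/8) [[3, -2], [-2, 4]], and (2, 1) S^-1 (2, 1)^T = 1, which is what d2 should be up to rounding.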
If you are actually given S^-1 rather than S, then you can indeed do this with einsum more directly. Since S is quite small in your case, it's also possible that actually inverting the matrix and then doing this would be faster. As soon as n is a nontrivial size, though, you're throwing away a lot of numerical accuracy by doing this.
To figure out what to do with einsum, write everything in terms of components. I'll go straight to the bonus case, writing S_j^-1 = T_j for notational convenience:
D_{ij} = M^2(x_i, u_j; S_j)
= (x_i - u_j)^T T_j (x_i - u_j)
= \sum_k (x_i - u_j)_k ( T_j (x_i - u_j) )_k
= \sum_k (x_i - u_j)_k \sum_l (T_j)_{k l} (x_i - u_j)_l
= \sum_{k l} (X_{i k} - U_{j k}) (T_j)_{k l} (X_{i l} - U_{j l})
So, if we make arrays X of shape (m, n), U of shape (k, n), and T of shape (k, n, n), then we can write this as
diff = X[np.newaxis, :, :] - U[:, np.newaxis, :]
D = np.einsum('jik,jkl,jil->ij', diff, T, diff)
where diff[j, i, k] = X[i, k] - U[j, k].
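If the index string is hard to read, it may help to see the contraction written as explicit loops. The following plain-Python sketch computes the same quantity as the component formula, D_{ij} = \sum_{k l} (X_{ik} - U_{jk}) (T_j)_{kl} (X_{il} - U_{jl}); the function name pairwise_mahalanobis2 and the tiny inputs are made up for illustration, and it is of course far slower than einsum.

```python
def pairwise_mahalanobis2(X, U, T):
    """D[i][j] = (x_i - u_j)^T T_j (x_i - u_j), with T_j the inverse covariance for center j."""
    m, k, n = len(X), len(U), len(X[0])
    D = [[0.0] * k for _ in range(m)]
    for i in range(m):          # output index i (rows of X)
        for j in range(k):      # output index j (rows of U, slices of T)
            diff = [X[i][a] - U[j][a] for a in range(n)]
            # The two summed indices of the einsum are a and b here
            D[i][j] = sum(diff[a] * T[j][a][b] * diff[b]
                          for a in range(n) for b in range(n))
    return D

X = [[0.0, 0.0], [3.0, 4.0]]
U = [[0.0, 0.0], [3.0, 4.0]]
T = [[[1.0, 0.0], [0.0, 1.0]],   # identity: plain squared Euclidean distance
     [[2.0, 0.0], [0.0, 2.0]]]   # twice the identity
D = pairwise_mahalanobis2(X, U, T)
```

With the identity as T_0 the first column is the squared Euclidean distance to the origin (0 and 25), and with 2I as T_1 the second column is twice the squared distance to (3, 4), i.e. 50 and 0.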
Dougal nailed this one with an excellent and detailed answer, but thought I'd share a small modification that I found increases efficiency in case anyone else is trying to implement this. Straight to the point:
Dougal's method was as follows:
def mahalanobis2(X, mu, sigma):
    L = np.linalg.cholesky(sigma)
    y = scipy.linalg.solve_triangular(L, (X - mu[np.newaxis,:]).T, lower=True)
    return np.einsum('ij,ij->j', y, y)
A mathematically equivalent variant I tried is
def mahalanobis2_2(X, mu, sigma):
    # Cholesky decomposition of inverse of covariance matrix
    # (Doing this in either order should be equivalent)
    linv = np.linalg.cholesky(np.linalg.inv(sigma))
    # Just do regular matrix multiplication with this matrix
    y = (X - mu[np.newaxis,:]).dot(linv)
    # Same as above, but note different index at end because the matrix
    # y is transposed here compared to above
    return np.einsum('ij,ij->i', y, y)
Ran both versions head-to-head 20x using identical random inputs and recorded the times (in milliseconds). For X as a 1,000,000 x 3 matrix (mu and sigma 3 and 3x3) I get:
Method 1 (min/max/avg): 30/62/49
Method 2 (min/max/avg): 30/47/37
That's about a 30% speedup for the 2nd version. I'm mostly going to be running this in 3 or 4 dimensions but to see how it scaled I tried X as 1,000,000 x 100 and got:
Method 1 (min/max/avg): 970/1134/1043
Method 2 (min/max/avg): 776/907/837
which is about the same improvement.
I mentioned this in a comment on Dougal's answer but adding here for additional visibility:
The first pair of methods above take a single center point mu and covariance matrix sigma and calculate the squared Mahalanobis distance to each row of X. My bonus question was to do this multiple times with many sets of mu and sigma and output a two-dimensional matrix. The set of methods above can be used to accomplish this with a simple for loop, but Dougal also posted a more clever example using einsum.
I decided to compare these methods with each other by using them to solve the following problem: Given k d-dimensional normal distributions (with centers stored in rows of k by d matrix U and covariance matrices in the last two dimensions of the k by d by d array S), find the density at the n points stored in rows of the n by d matrix X.
The density of a multivariate normal distribution is a function of the squared Mahalanobis distance of the point to the mean. Scipy has an implementation of this as scipy.stats.multivariate_normal.pdf to use as a reference. I ran all three methods against each other 10x using identical random parameters each time, with d=3, k=96, n=5e5. Here are the results, in points/sec:
[Method]: (min/max/avg)
Scipy: 1.18e5/1.29e5/1.22e5
Fancy 1: 1.41e5/1.53e5/1.48e5
Fancy 2: 8.69e4/9.73e4/9.03e4
Fancy 2 (cheating version): 8.61e4/9.88e4/9.04e4
where Fancy 1 is the better of the two methods above and Fancy2 is Dougal's 2nd solution. Since the Fancy 2 needs to calculate the inverses of all the covariance matrices I also tried a "cheating version" where it was passed these as a parameter, but it looks like that didn't make a difference. I had planned on including the non-vectorized implementation but that was so slow it would have taken all day.
What we can take away from this is that using Dougal's first method is about 20% faster than however Scipy does it. Unfortunately despite its cleverness the 2nd method is only about 60% as fast as the first. There are probably some other optimizations that can be done but this is already fast enough for me.
I also tested how this scaled with higher dimensionality. With d=100, k=96, n=1e4:
Scipy: 7.81e3/7.91e3/7.86e3
Fancy 1: 1.03e4/1.15e4/1.08e4
Fancy 2: 3.75e3/4.10e3/3.95e3
Fancy 2 (cheating version): 3.58e3/4.09e3/3.85e3
Fancy 1 seems to have an even bigger advantage this time. Also worth noting that Scipy threw a LinAlgError 8/10 times, probably because some of my randomly-generated 100x100 covariance matrices were close to singular (which may mean that the other two methods are not as numerically stable, I did not actually check the results).

How to obtain the numerical solution of these differential equations with matlab

I have differential equations derived from epidemic spreading. I want to obtain the numerical solutions. Here's the equations,
t is the independent variable and ranges over [0, 100].
The initial value is
y1 = 0.99; y2 = 0.01; y3 = 0;
At first, I planned to solve these with the ode45 function in matlab; however, I don't know how to express the series and the combination. So I'm asking for help here.
The problem is how to express the right side of the equations as the odefun, which is a parameter in the ode45 function.
Matlab has functions to calculate binomial coefficients (number of combinations) and the finite series can be expressed just as matrix multiplication. I'll demonstrate how that works for the sum in the first equation. Note the use of the element-wise "dotted" forms of the arithmetic operators.
Calculate a row vector coefs with the constant coefficients in the sum as:
octave-3.0.0:33> a = 0:20;
octave-3.0.0:34> coefs = log2(a * 0.05 + 1) .* bincoeff(20, a);
The variables get combined into another vector:
octave-3.0.0:35> y1 = 0.99;
octave-3.0.0:36> y2 = 0.01;
octave-3.0.0:37> z = (y2 .^ a) .* ((1 - y2) .^ a) .* (y1 .^ a);
And the sum is then just evaluated as the inner product:
octave-3.0.0:38> coefs * z'
The other sums are similar.
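For readers without Octave at hand, the same sum can be sketched in Python using math.comb for the binomial coefficients. This mirrors the Octave lines above term by term and makes no claim about the original equations beyond that; the function names series_sum and log_weight are made up for illustration.

```python
import math

def series_sum(y1, y2, weight, n=20):
    """Sum over a = 0..n of weight(a) * C(n, a) * y2^a * (1 - y2)^a * y1^a."""
    t = y2 * (1.0 - y2) * y1   # the three powers of a share the same exponent
    return sum(weight(a) * math.comb(n, a) * t**a for a in range(n + 1))

def log_weight(a):
    """Constant factor used in the Octave snippet: log2(0.05 * a + 1)."""
    return math.log2(a * 0.05 + 1.0)

value = series_sum(0.99, 0.01, log_weight)
```

A quick sanity check of the summation itself: with weight(a) = 1 the sum collapses to (1 + t)^20 by the binomial theorem, where t = y2 (1 - y2) y1.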
function demo(a_in)
    X = [0;0;0];
    T = 0:.1:100;
    a = a_in; % for nested scope
    % ode45 takes a function handle and returns the times first, then the states
    [Tout, Xout] = ode45(@myFunc, T, X);
    function [dxdt] = myFunc(t, x)
        % nested function accesses "a"
        dxdt = 0*x + a;
        % Todo: real value of dxdt.
    end
end
What about this, and you simply need to fill in the dxdt from your math above? It remains to be seen if the numerical roundoff matters...
Edit: there's a serious issue due to the 1=y1+y2+y3 constraint. Is that even allowed, since you have an IVP with 3 initial values given and 3 first-order ODEs? If that constraint is a natural consequence of the equations, it may not be needed.
