Mahout - Cosine distance between vectors > 1 - vector

I'm using Mahout's CosineDistanceMeasure class in order to calculate the distance between vectors, represented as DenseVectors.
DenseVector vector1 = ... //initialized to some values
DenseVector vector2 = ... //initialized to some values
CosineDistanceMeasure cos = new CosineDistanceMeasure();
cos.distance(vector1, vector2);
Now, for some pairs of vectors, distance() returns values greater than 1, while I thought cosine distance was supposed to lie between 0 and 1.
Can anybody explain this behaviour?
Thank you in advance!
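One likely explanation, sketched in Python rather than Java: cosine distance is usually defined as 1 minus the cosine similarity, and the similarity ranges over [-1, 1], so the distance ranges over [0, 2]. It stays within [0, 1] only when the vectors have no negative components. The function below is a hypothetical stand-in written for illustration, assuming CosineDistanceMeasure follows this usual definition; it is not Mahout code.

```python
import math

def cosine_distance(v1, v2):
    """Return 1 - cos(angle between v1 and v2). Illustrative helper, not Mahout's API."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return 1.0 - dot / (norm1 * norm2)

# Same direction: distance close to 0.
same = cosine_distance([1.0, 2.0], [2.0, 4.0])
# Orthogonal vectors: distance 1.
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
# Opposite directions (negative components involved): distance close to 2.
opposite = cosine_distance([1.0, 2.0], [-1.0, -2.0])
print(same, orthogonal, opposite)
```

If the vectors can contain negative values, results above 1 are therefore expected rather than a bug.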

Related

Where can I find the Scilab balanc() function to calculate the similarity transform to program it in Maxima

I'm trying to program the z-transform in wxMaxima, which doesn't provide it. Rather than working from the definition, I want to follow the Scilab approach: Scilab first converts the transfer function to state space, then discretizes the system, and finally converts the result to a z transfer function. I need this for some algebraic calculations to analyze the stability of a system as a function of the sample period.
Right now I'm stuck on the function balanc(), which finds a similarity transform X such that
Ab = X^(-1) . A . X
has approximately equal row and column norms.
Most of my wxMaxima code so far has been written by translating the Scilab code. Currently I'm writing the tf2ss() function, and inside that function balanc() is called. The problem is that I couldn't find the code for that function in the Scilab installation directory. I've searched books and papers, but every example starts with the Ab matrix given as an input to the problem. Scilab instead takes only the A matrix as input and calculates both Ab and X. So I need help writing this function exactly as Scilab has it, to be able to compare all the steps that I'm doing.
Finally, wxMaxima has a function to calculate similarity transforms, but it doesn't give the same output as Scilab, which suggests to me that the two use different criteria to calculate the similarity transform.
Note: I've tried to do the calculations in wxMaxima with Ab and X as matrices of symbolic elements, but the resulting system of equations has too many variables and couldn't be solved.
Thanks in advance for the help in doing this.
In Scilab, balanc() is hard-coded and based on LAPACK's dgebal (see the Fortran source at Netlib). The operations in the algorithm are quite simple (computing inf- and 2-norms, swapping columns or rows of a matrix), so maybe it could easily be translated?
A more readable version of the algorithm can be found on page 3 (Algorithm 2) of the following document: https://arxiv.org/abs/1401.5766.
Here is a Scilab implementation of Algorithm 3:
function [A,X]=bal(Ain)
    A = Ain;
    n = size(A,1);
    X = ones(n,1);
    β = 2; // multiply or divide by radix preserves precision
    p = 2; // eventually change to 1-norm
    converged = 0;
    while converged == 0
        converged = 1;
        for i=1:n
            c = norm(A(:,i),p);
            r = norm(A(i,:),p);
            s = c^p+r^p;
            f = 1;
            while c < r/β
                c = c*β;
                r = r/β;
                f = f*β;
            end
            while c >= r*β
                c = c/β;
                r = r*β;
                f = f/β;
            end
            if (c^p+r^p) < 0.95*s
                converged = 0;
                X(i) = f*X(i);
                A(:,i) = f*A(:,i);
                A(i,:) = A(i,:)/f;
            end
        end
    end
    X = diag(X);
endfunction
On this example the above implementation gives the same balanced matrix:
--> A=rand(5,5,"normal"); A(:,1)=A(:,1)*1024; A(2,:)=A(2,:)/1024
A =
897.30729 -1.6907865 -1.0217046 -0.9181476 -0.1464695
-0.5430253 -0.0011318 -0.0000356 -0.001277 -0.00038
-774.96457 3.1685332 0.1467254 -0.410953 -0.6165827
155.22118 0.1680727 -0.2262445 -0.3402948 1.6098294
1423.0797 -0.3302511 0.5909125 -1.2169245 -0.7546739
--> [Ab,X]=balanc(A)
Ab =
897.30729 -0.8453932 -32.694547 -14.690362 -9.3740507
-1.0860507 -0.0011318 -0.0022789 -0.0408643 -0.0486351
-24.217643 0.0495083 0.1467254 -0.2054765 -1.2331655
9.7013239 0.0052523 -0.452489 -0.3402948 6.4393174
22.23562 -0.0025801 0.2954562 -0.3042311 -0.7546739
X =
0.03125 0. 0. 0. 0.
0. 0.015625 0. 0. 0.
0. 0. 1. 0. 0.
0. 0. 0. 0.5 0.
0. 0. 0. 0. 2.
--> [Ab,X]=bal(A)
Ab =
897.30729 -0.8453932 -32.694547 -14.690362 -9.3740507
-1.0860507 -0.0011318 -0.0022789 -0.0408643 -0.0486351
-24.217643 0.0495083 0.1467254 -0.2054765 -1.2331655
9.7013239 0.0052523 -0.452489 -0.3402948 6.4393174
22.23562 -0.0025801 0.2954562 -0.3042311 -0.7546739
X =
1. 0. 0. 0. 0.
0. 0.5 0. 0. 0.
0. 0. 32. 0. 0.
0. 0. 0. 16. 0.
0. 0. 0. 0. 64.
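The two X matrices printed above are not identical: they differ by a common factor of 32, which cancels in X^(-1) . A . X, so both describe the same balancing. The following Python sketch checks this with the values copied from the session above; similarity_scale is a helper name introduced here for illustration, not a Scilab or LAPACK function.

```python
# Matrix A and the diagonals of X, copied from the Scilab session above.
a = [
    [897.30729, -1.6907865, -1.0217046, -0.9181476, -0.1464695],
    [-0.5430253, -0.0011318, -0.0000356, -0.001277, -0.00038],
    [-774.96457, 3.1685332, 0.1467254, -0.410953, -0.6165827],
    [155.22118, 0.1680727, -0.2262445, -0.3402948, 1.6098294],
    [1423.0797, -0.3302511, 0.5909125, -1.2169245, -0.7546739],
]
x_balanc = [0.03125, 0.015625, 1.0, 0.5, 2.0]  # diagonal of X from balanc(A)
x_bal = [1.0, 0.5, 32.0, 16.0, 64.0]           # diagonal of X from bal(A)

def similarity_scale(a, x):
    """Return X^(-1) A X for a diagonal X given by the list x."""
    n = len(a)
    return [[a[i][j] * x[j] / x[i] for j in range(n)] for i in range(n)]

ab_balanc = similarity_scale(a, x_balanc)
ab_bal = similarity_scale(a, x_bal)

# Element-wise ratio between the two diagonals: the same factor everywhere.
ratios = [xb / xc for xb, xc in zip(x_bal, x_balanc)]
max_diff = max(abs(p - q) for rp, rq in zip(ab_balanc, ab_bal) for p, q in zip(rp, rq))
print(ratios, max_diff)
```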

Julia error using convex package with diagind function

I'm trying to solve the problem
d = 0.5 * ||X - \Sigma||_{Frobenius Norm} + 0.01 * ||XX||_{1},
where X is a symmetric positive definite matrix whose diagonal elements should all be 1. XX is the same as X except that its diagonal is 0. \Sigma is known; I want to minimize d over X.
My code is as follows:
using Convex
m = 5;
A = randn(m, m);
x = Semidefinite(5);
xx=x;
xx[diagind(xx)].=0;
obj=vecnorm(A-x,2)+sumabs(xx)*0.01;
pro= minimize(obj, [x >= 0]);
pro.constraints+=[x[diagind(x)].=1];
solve!(pro)
MethodError: no method matching diagind(::Convex.Variable)
I tried to solve the optimization problem by constraining the diagonal elements of the matrix, but it seems the diagind function does not work here. How can I solve this problem?
I think the following does what you want:
m = 5
Σ = randn(m, m)
X = Semidefinite(m)
XX = X - diagm(diag(X))
obj = 0.5 * vecnorm(X - Σ, 2) + 0.01 * sum(abs(XX))
constraints = [X >= 0, diag(X) == 1]
pro = minimize(obj, constraints)
solve!(pro)
The operations used here are:
diag extracts the diagonal of a matrix, as a vector
diagm constructs a diagonal matrix out of a vector
So, to have XX be X with zero diagonal, we subtract the diagonal of X from it. And to constrain X having diagonal 1, we compare its diagonal with 1, using ==.
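To make the two operations concrete, here is a small plain-Python sketch on an ordinary numeric matrix (no Convex variables involved). The helpers diag, diagm and subtract mirror the Julia names but are re-implemented here purely for illustration.

```python
def diag(m):
    """Extract the diagonal of a square matrix as a list."""
    return [m[i][i] for i in range(len(m))]

def diagm(v):
    """Build a diagonal matrix from a list of values."""
    n = len(v)
    return [[v[i] if i == j else 0 for j in range(n)] for i in range(n)]

def subtract(a, b):
    """Element-wise difference of two matrices of equal shape."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# Same construction as XX = X - diagm(diag(X)): zero diagonal, rest unchanged.
xx = subtract(x, diagm(diag(x)))
print(xx)  # [[0, 2, 3], [4, 0, 6], [7, 8, 0]]
```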
It is a good idea to work with immutable values as far as possible instead of modifying things in place; I don't know whether Convex even supports in-place modification.

Inverse of matrix and numerical integration in R

In R I am trying to
1) get a general form of an inverse of a matrix (I mean a matrix with parameters instead of specific numbers),
2) then use this to compute an integral.
I mean, I've got a matrix P with a parameter theta. I need to add and subtract something, then take the inverse and multiply it by a vector, which gives me a vector pil. I then take pil term by term, multiply each term by a function that again depends on theta, and integrate the result from 0 to infinity.
I tried the following, but it doesn't work: I know the result should be pst =
(0.3021034 0.0645126 0.6333840)
c<-0.1
g<-0.15
integrand1 <- function(theta) {
  pil1 <- function(theta) {
    P<-matrix(c(
      1-exp(-theta), 1-exp(-theta),1-exp(-theta),exp(-theta),0,0,0,exp(-theta),exp(-theta)
    ),3,3);
    pil<-(rep(1,3))%*%solve(diag(1,3)-P+matrix(1,3,3));
    return(pil[[1]])
  }
  q<-pil1(theta)*(c^g/gamma(g)*theta^(g-1)*exp(-c*theta))
  return(q)
}
(pst1<-integrate(integrand1, lower = 0, upper = Inf)$value)
#0.4144018
This was just for the first term of the vector pst, because I didn't know how to write a for loop for this.
Please, do you have any idea why it doesn't work and how to make it work?
Functions used in integrate should be vectorized as stated in the help.
At the end of your code add this
integrand2 <- Vectorize(integrand1)
integrate(integrand2, lower = 0, upper = Inf)$value
#[1] 0.3021034
The result is the first element of your expected result.
You will have to present more information about the input to get your expected vector.
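Vectorization matters because integrate evaluates the integrand at many points in a single call: it passes a vector of theta values and expects a vector of the same length back. The Python sketch below mimics that contract with a toy midpoint rule; midpoint_integrate, vectorize and scalar_integrand are hypothetical helpers written for this illustration, not R or library functions.

```python
def vectorize(f):
    """Wrap a scalar function so it accepts a list and returns a list."""
    def wrapped(xs):
        return [f(x) for x in xs]
    return wrapped

def midpoint_integrate(f_vec, lower, upper, n=1000):
    """Toy midpoint rule that, like R's integrate, calls f once with all nodes."""
    h = (upper - lower) / n
    nodes = [lower + (i + 0.5) * h for i in range(n)]
    values = f_vec(nodes)  # a single call with the whole list of nodes
    if len(values) != len(nodes):
        raise ValueError("integrand must return one value per node")
    return h * sum(values)

def scalar_integrand(theta):
    """Written for a single number, like integrand1 in the question."""
    return theta * theta

# Passing scalar_integrand directly would fail, because it receives a list.
result = midpoint_integrate(vectorize(scalar_integrand), 0.0, 1.0)
print(result)  # close to 1/3
```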

Euclidean distance between two n-dimensional vectors

What's an easy way to find the Euclidean distance between two n-dimensional vectors in Julia?
Here is a simple way
n = 10
x = rand(n)
y = rand(n)
d = norm(x-y) # The euclidean (L2) distance
For Manhattan/taxicab/L1 distance, use norm(x-y,1)
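For readers who want to see what the two norms compute, here is a minimal sketch in plain Python (not Julia). The function names euclidean and manhattan are chosen for this illustration only.

```python
import math

def euclidean(x, y):
    """L2 distance: square root of the sum of squared component differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """L1 distance: sum of absolute component differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]
print(euclidean(x, y))  # 5.0
print(manhattan(x, y))  # 7.0
```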
This is easily done thanks to the lovely Distances package:
Pkg.add("Distances") #if you don't have it
using Distances
one7d = rand(7)
two7d = rand(7)
dist = euclidean(one7d,two7d)
Also if you have say 2 matrices of 9d col vectors, you can get the distances between each corresponding pair using colwise:
thousand9d1 = rand(9,1000)
thousand9d2 = rand(9,1000)
dists = colwise(Euclidean(), thousand9d1, thousand9d2)
#returns: 1000-element Array{Float64,1}
You can also compare to a single vector e.g. the origin (if you want the magnitude of each column vector)
origin9 = zeros(9)
mags = colwise(Euclidean(), thousand9d1, origin9)
#returns: 1000-element Array{Float64,1}
Other distances are also available:
Squared Euclidean
Cityblock
Chebyshev
Minkowski
Hamming
Cosine
Correlation
Chi-square
Kullback-Leibler divergence
Jensen-Shannon divergence
Mahalanobis
Squared Mahalanobis
Bhattacharyya
Hellinger
More details are available on the package's GitHub page.

Calculate the length of a segment of a quadratic bezier

I use this algorithm to calculate the length of a quadratic bezier:
http://www.malczak.linuxpl.com/blog/quadratic-bezier-curve-length/
However, what I wish to do is calculate the length of the bezier from 0 to t where 0 < t < 1
Is there any way to modify the formula used in the link above to get the length of the first segment of a bezier curve?
Just to clarify, I'm not looking for the distance between q(0) and q(t) but the length of the arc that goes between these points.
(I don't wish to use adaptive subdivision to approximate the length)
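Since I was sure a similar closed-form solution would exist for the variable-t case, I extended the solution given in the link.
Starting from the equation in the link, with the upper limit changed from 1 to t:
L(t) = \int_0^t \sqrt{A x^2 + B x + C} \, dx
Which we can write as
L(t) = \sqrt{A} \int_0^t \sqrt{x^2 + 2 b x + c} \, dx
Where b = B/(2A) and c = C/A.
Then substituting v = x + b, so that the limits become b and u = t + b, we get
L(t) = \sqrt{A} \int_b^u \sqrt{v^2 + k} \, dv
Where k = c - b^2
Now we can use the integral identity from the link to obtain:
L(t) = \frac{\sqrt{A}}{2} \left[ u \sqrt{u^2 + k} - b \sqrt{b^2 + k} + k \ln \left| \frac{u + \sqrt{u^2 + k}}{b + \sqrt{b^2 + k}} \right| \right]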
So, in summary, the required steps are:
Calculate A,B,C as in the original equation.
Calculate b = B/(2A) and c = C/A
Calculate u = t + b and k = c - b^2
Plug these values into the equation above.
[Edit by Spektre] I just managed to implement this in C++, so here is the code (it works correctly, matching naively obtained arc lengths):
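// Needs sqrt, log and fabs from the math header. Note: with glibc, <cmath>
// declares Bessel functions named y0 and y1 in the global namespace, so there
// these globals may have to be renamed or moved into a struct or namespace.
float x0,x1,x2,y0,y1,y2; // control points of Bezier curve
float get_l_analytic(float t) // get arclength from parameter t=<0,1>
{
    float ax,ay,bx,by,A,B,C,b,c,u,k,L;
    // coefficients of the derivative: P'(t) = (bx,by) + 2*t*(ax,ay)
    ax=x0-x1-x1+x2;
    ay=y0-y1-y1+y2;
    bx=x1+x1-x0-x0;
    by=y1+y1-y0-y0;
    // |P'(t)|^2 = A*t^2 + B*t + C
    A=4.0*((ax*ax)+(ay*ay)); // A is 0 when P1 is the midpoint of P0 and P2
    B=4.0*((ax*bx)+(ay*by));
    C= (bx*bx)+(by*by);
    b=B/(2.0*A);
    c=C/A;
    u=t+b;
    k=c-(b*b);
    L=0.5*sqrt(A)*
    (
        (u*sqrt((u*u)+k))
        -(b*sqrt((b*b)+k))
        +(k*log(fabs((u+sqrt((u*u)+k))/(b+sqrt((b*b)+k)))))
    );
    return L;
}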
There is still room for improvement, as some terms are computed more than once ...
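As a cross-check of the closed form, here is a Python sketch of the same formula together with a brute-force polyline estimate. The control points are arbitrary sample values, the function names are made up for this example, and the sketch assumes a non-degenerate curve (A > 0 and control points not collinear).

```python
import math

def bezier_point(p0, p1, p2, t):
    """Point on the quadratic Bezier curve at parameter t."""
    s = 1.0 - t
    return (s * s * p0[0] + 2 * s * t * p1[0] + t * t * p2[0],
            s * s * p0[1] + 2 * s * t * p1[1] + t * t * p2[1])

def arc_length_analytic(p0, p1, p2, t):
    """Closed-form arc length from 0 to t, following the steps listed above."""
    ax = p0[0] - 2 * p1[0] + p2[0]
    ay = p0[1] - 2 * p1[1] + p2[1]
    bx = 2 * (p1[0] - p0[0])
    by = 2 * (p1[1] - p0[1])
    A = 4 * (ax * ax + ay * ay)
    B = 4 * (ax * bx + ay * by)
    C = bx * bx + by * by
    b = B / (2 * A)
    c = C / A
    u = t + b
    k = c - b * b
    su = math.sqrt(u * u + k)
    sb = math.sqrt(b * b + k)
    return 0.5 * math.sqrt(A) * (u * su - b * sb + k * math.log(abs((u + su) / (b + sb))))

def arc_length_polyline(p0, p1, p2, t, n=20000):
    """Naive estimate: sum the lengths of n short chords between 0 and t."""
    total = 0.0
    prev = bezier_point(p0, p1, p2, 0.0)
    for i in range(1, n + 1):
        cur = bezier_point(p0, p1, p2, t * i / n)
        total += math.hypot(cur[0] - prev[0], cur[1] - prev[1])
        prev = cur
    return total

p0, p1, p2 = (0.0, 0.0), (1.0, 2.0), (3.0, 0.0)  # sample control points
analytic = arc_length_analytic(p0, p1, p2, 0.5)
numeric = arc_length_polyline(p0, p1, p2, 0.5)
print(analytic, numeric)
```

The two values should agree to several decimal places; the polyline estimate is always slightly shorter because chords undercut the curve.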
While there may be a closed form expression, this is what I'd do:
Use De Casteljau's algorithm to split the Bézier curve into the 0 to t part, and use the algorithm from the link to calculate its length.
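A minimal Python sketch of that split for a quadratic curve, assuming 2D control points stored as tuples (the names lerp, split_left and bezier_point are chosen here for illustration): the left part keeps P0, and its other two control points come from repeated linear interpolation at t.

```python
def lerp(p, q, t):
    """Linear interpolation between points p and q."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)

def bezier_point(p0, p1, p2, t):
    """Evaluate a quadratic Bezier curve with De Casteljau's algorithm."""
    return lerp(lerp(p0, p1, t), lerp(p1, p2, t), t)

def split_left(p0, p1, p2, t):
    """Control points of the sub-curve covering the parameter range 0..t."""
    q1 = lerp(p0, p1, t)
    q2 = lerp(q1, lerp(p1, p2, t), t)  # this is also the curve point at t
    return p0, q1, q2

p0, p1, p2 = (0.0, 0.0), (1.0, 2.0), (3.0, 0.0)  # sample control points
left = split_left(p0, p1, p2, 0.4)

# The sub-curve at parameter s matches the original curve at parameter s * 0.4.
end_of_left = bezier_point(*left, 1.0)
middle_of_left = bezier_point(*left, 0.5)
print(left, end_of_left, middle_of_left)
```

The full-curve length formula from the link can then be applied to the three control points in left.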
You just have to evaluate the integral not between 0 and 1 but between 0 and t. You can use the symbolic toolbox of your choice to do that if you're not into the math. For instance:
http://integrals.wolfram.com/index.jsp?expr=Sqrt\[a*x*x%2Bb*x%2Bc\]&random=false
Evaluate the result for x = t and x = 0 and subtract them.
