Which method of matrix determinant calculation is this?

This is the approach John Carmack uses to calculate the determinant of a 4x4 matrix. From my investigations, I have determined that it starts out like the Laplace expansion theorem but then goes on to calculate 3x3 determinants, which doesn't seem to agree with any papers I've read.
// 2x2 sub-determinants
float det2_01_01 = mat[0][0] * mat[1][1] - mat[0][1] * mat[1][0];
float det2_01_02 = mat[0][0] * mat[1][2] - mat[0][2] * mat[1][0];
float det2_01_03 = mat[0][0] * mat[1][3] - mat[0][3] * mat[1][0];
float det2_01_12 = mat[0][1] * mat[1][2] - mat[0][2] * mat[1][1];
float det2_01_13 = mat[0][1] * mat[1][3] - mat[0][3] * mat[1][1];
float det2_01_23 = mat[0][2] * mat[1][3] - mat[0][3] * mat[1][2];
// 3x3 sub-determinants
float det3_201_012 = mat[2][0] * det2_01_12 - mat[2][1] * det2_01_02 + mat[2][2] * det2_01_01;
float det3_201_013 = mat[2][0] * det2_01_13 - mat[2][1] * det2_01_03 + mat[2][3] * det2_01_01;
float det3_201_023 = mat[2][0] * det2_01_23 - mat[2][2] * det2_01_03 + mat[2][3] * det2_01_02;
float det3_201_123 = mat[2][1] * det2_01_23 - mat[2][2] * det2_01_13 + mat[2][3] * det2_01_12;
return ( - det3_201_123 * mat[3][0] + det3_201_023 * mat[3][1] - det3_201_013 * mat[3][2] + det3_201_012 * mat[3][3] );
Could someone explain to me how this approach works or point me to a good write up which uses the same approach?
If it matters, this matrix is row-major.

It seems to be the method that involves using minors. The mathematical background can be found on Wikipedia at
Basically, you reduce the matrix to something smaller and easier to compute, and sum those results up (it involves some (-1) factors, which should be described on the page I linked to).

He uses the standard formula where you can compute, in pseudocode,
det(M) = sum(M[0, i] * det(M.minor[0, i]) * (-1)^i)
Here minor[0, i] is the matrix you obtain by crossing out the 0-th row and the i-th column of your original matrix, and (-1)^i stands for the i-th power of -1.
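The pseudocode translates almost directly into a recursive function. Below is a minimal Python sketch. It assumes the matrix is given as a list of row lists, and `minor` and `det` are illustrative names. It works for small matrices, but its cost grows factorially with size, so it is not how large determinants are computed in practice.

```python
def minor(m, row, col):
    """Matrix obtained by crossing out `row` and `col` (m is a list of row lists)."""
    return [r[:col] + r[col + 1:] for i, r in enumerate(m) if i != row]


def det(m):
    """Determinant by cofactor (Laplace) expansion along row 0."""
    n = len(m)
    if n == 1:
        return m[0][0]
    # sum over columns i of M[0, i] * det(minor[0, i]) * (-1)^i
    return sum(m[0][i] * det(minor(m, 0, i)) * (-1) ** i for i in range(n))


print(det([[1, 2], [3, 4]]))                    # -2
print(det([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))   # 18
```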
The same formula will work (up to an overall sign of (-1)^r when you expand along row r) if you take a different row, or if you loop over a column instead. If you think about how det is defined, it's pretty self-explanatory. Note how for a 2x2 matrix this becomes:
det(M) = M[0, 0] * M[1, 1] * (+1) + M[0, 1] * M[1, 0] * (-1)
or, by row 1 rather than 0,
-det(M) = M[1, 0] * M[0, 1] * (+1) + M[1, 1] * M[0, 0] * (-1)
You should recognize the standard formula for the determinant of a 2x2 matrix.
Similarly, for a 3x3 matrix N = [[a, b, c], [d, e, f], [g, h, i]], this leads to the formula
det(N) = a * det([[e, f], [h, i]]) - b * det([[d, f], [g, i]]) + c * det([[d, e], [g, h]])
which of course becomes the textbook formula
a*e*i + b*f*g + c*d*h - c*e*g - a*f*h - b*d*i
once you expand each of the 2x2 determinants.
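Now take a 4x4 matrix X. To compute det(X) you need the determinants of 4 minors, each of them a 3x3 matrix. You can expand those further, so that everything is expressed through determinants of 2x2 matrices with some coefficients. That is exactly what the quoted code does. It expands along row 3 into four 3x3 minors built from rows 0-2, and it expands each of those along row 2. All the remaining 2x2 determinants come from rows 0 and 1. There are only 6 ways to choose 2 of the 4 columns, so those 6 2x2 determinants are computed once and shared between the 3x3 minors. You should really try it yourself, the same way as above for 3x3 matrices.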
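To see the sharing concretely, here is a Python transcription of the quoted C code (the name `det4_shared_minors` is hypothetical). It is cross-checked against the Leibniz permutation formula, which is a different but equivalent definition of the determinant. Integer entries keep the comparison exact.

```python
from itertools import permutations


def det4_shared_minors(mat):
    """Transcription of the quoted C code for a 4x4 matrix (list of row lists)."""
    def d2(c0, c1):
        # 2x2 determinant of rows 0-1, columns c0 and c1
        return mat[0][c0] * mat[1][c1] - mat[0][c1] * mat[1][c0]

    # the six 2x2 sub-determinants, computed once
    d01, d02, d03 = d2(0, 1), d2(0, 2), d2(0, 3)
    d12, d13, d23 = d2(1, 2), d2(1, 3), d2(2, 3)
    # four 3x3 minors of rows 0-2, each expanded along row 2 (signs + - +)
    d012 = mat[2][0] * d12 - mat[2][1] * d02 + mat[2][2] * d01
    d013 = mat[2][0] * d13 - mat[2][1] * d03 + mat[2][3] * d01
    d023 = mat[2][0] * d23 - mat[2][2] * d03 + mat[2][3] * d02
    d123 = mat[2][1] * d23 - mat[2][2] * d13 + mat[2][3] * d12
    # expansion along row 3 with cofactor signs - + - +
    return (-d123 * mat[3][0] + d023 * mat[3][1]
            - d013 * mat[3][2] + d012 * mat[3][3])


def det_leibniz(m):
    """Reference: sum over permutations p of sign(p) * prod m[row][p[row]]."""
    n = len(m)
    total = 0
    for p in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        term = (-1) ** inversions
        for row, col in enumerate(p):
            term *= m[row][col]
        total += term
    return total


M = [[1, 2, 3, 4], [5, 6, 7, 8], [2, 6, 4, 8], [3, 1, 1, 2]]
print(det4_shared_minors(M), det_leibniz(M))  # 72 72
```

Each of the six 2x2 values feeds two different 3x3 minors. Computing each minor independently would therefore evaluate every 2x2 determinant twice, and the sharing avoids that.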


Replicating `np.einsum` result via normal matrix operations

I have implemented a TCB spline in Python via NumPy. The critical piece of the code appears below:
np.einsum('km,km,kl,lm->m',xdiffpow_knot, h_pow_knot[:,i], hermite_matrix, lag_knot[:,i])
where k and l both have size 4 (k indexes the powers 0 to 3, and l indexes the 4 control points used by TCB splines), and m is the length of the array x I want to interpolate.
I used np.einsum at the time because I couldn't figure out the matrix operations needed to do it without einsum. It seems as though I'm left with an extra m in the result (note that the first two km terms are simply element-wise multiplication).
Now I'm reimplementing in Julia without einsum (so I can take advantage of algorithmic differentiation in ForwardDiff, ReverseDiff, etc.). How do I replicate the above einsum via matrix operations?
What have I tried?
Thinking solely about the dimensions involved and making the dot product work, it feels as though I'm missing an m-element vector. The only m-element vector that makes sense is a vector of ones, which I believe would act as a summation. But I'd like to validate that this is correct in theory before I rely on it.
Full Code. It's Ugly...
import numpy as np
HERMITE_MATRIX = np.array([[ 2.,-2., 1., 1.],
                           [-3., 3.,-2.,-1.],
                           [ 0., 0., 1., 0.],
                           [ 1., 0., 0., 0.]])
def hermite(x_knot, y_knot, tension=0.0, continuity=0.0, bias=0.0, weight_fwd_knot=None, hermite_matrix=HERMITE_MATRIX):
    order = 3
    h_knot = np.diff(x_knot)
    h_pow_knot = (1.0 / h_knot) ** np.arange(order, -1, -1)[:,None]
    is_first_knot = np.isclose(x_knot, x_knot[0])
    is_last_knot = np.isclose(x_knot, x_knot[-1])
    y_next_knot = np.roll(y_knot,-1)
    y_prev_knot = np.roll(y_knot, 1)
    x_next_knot = np.roll(x_knot,-1)
    x_prev_knot = np.roll(x_knot, 1)
    if weight_fwd_knot is None:
        weight_fwd_knot = np.where(is_last_knot, 0.0, np.where(is_first_knot, 1.0, (x_next_knot - x_knot)/(x_next_knot - x_prev_knot)))
    weight_bak_knot = 1.0 - weight_fwd_knot
    dydxfwd_knot = np.where(is_last_knot, (y_knot - y_prev_knot)/(x_knot - x_prev_knot), (y_next_knot - y_knot)/(x_next_knot - x_knot))
    dydxbak_knot = np.where(is_first_knot, (y_next_knot - y_knot)/(x_next_knot - x_knot), (y_knot - y_prev_knot)/(x_knot - x_prev_knot))
    dy_in_knot = (1 - tension) * ((1 + continuity) * (1 - bias) * dydxfwd_knot * weight_fwd_knot + (1 - continuity) * (1 + bias) * dydxbak_knot * weight_bak_knot)
    dy_out_knot = (1 - tension) * ((1 - continuity) * (1 - bias) * dydxfwd_knot * weight_fwd_knot + (1 + continuity) * (1 + bias) * dydxbak_knot * weight_bak_knot)
    lag_knot = np.array([y_knot[:-1], y_next_knot[:-1], dy_out_knot[:-1] * h_knot, dy_in_knot[1:] * h_knot])
    def f(x):
        i = np.maximum(np.minimum(np.searchsorted(x_knot, x, side="right") - 1, x_knot.size - 2), 0)
        xdiffpow_knot = (x - x_knot[i]) ** np.arange(order, -1, -1)[:,None]
        return np.einsum('km,km,kl,lm->m',xdiffpow_knot, h_pow_knot[:,i], hermite_matrix, lag_knot[:,i])
    return f
>>> a = np.array([[1,2],[3,4]])
>>> np.einsum('km,km,kl,lm->m',a,a,a,a)
array([142, 392])
This can be computed using basic linear algebra operations.
Observe that the 'kl,lm' part is ordinary matrix multiplication and can be computed first to yield 'km':
>>> x = np.matmul(a,a)
Now the remaining 'km,km,km->m' is element-wise multiplication followed by a sum over index 'k':
>>> y = a * a * x
>>> y
array([[ 7, 40],
[135, 352]])
>>> np.sum(y, axis=0)
array([142, 392])
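Here is a quick numerical check of this decomposition. It uses random arrays shaped like the ones in the question (k = l = 4, and m = 7 is arbitrary). It also shows that the summation can be written as a product with a vector of ones. That vector has k elements, because the sum runs over k, not m. The array names are stand-ins, not the spline's real data.

```python
import numpy as np

rng = np.random.default_rng(0)
k, l, m = 4, 4, 7                    # k = l = 4 as in the question; m chosen arbitrarily
A = rng.standard_normal((k, m))      # stands in for xdiffpow_knot
B = rng.standard_normal((k, m))      # stands in for h_pow_knot[:, i]
H = rng.standard_normal((k, l))      # stands in for hermite_matrix
L = rng.standard_normal((l, m))      # stands in for lag_knot[:, i]

reference = np.einsum('km,km,kl,lm->m', A, B, H, L)

Y = A * B * (H @ L)                  # 'kl,lm->km', then element-wise products
via_sum = Y.sum(axis=0)              # sum over k
via_ones = np.ones(k) @ Y            # the same sum, as a product with a k-vector of ones
```

Both `via_sum` and `via_ones` use only matrix products, element-wise products and a reduction. An automatic-differentiation library should be able to handle such operations, which matters for the port away from einsum.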

How to find where an equation equals zero

Say I have a function and I find the second derivative like so:
xyr <- D(expression(14252/(1+exp((-1/274.5315)*(x-893)))), 'x')
D2 <- D(xyr, 'x')
it gives me back the following (typeof 'language'):
-(14252 * (exp((-1/274.5315) * (x - 893)) * (-1/274.5315) * (-1/274.5315))/(1 +
exp((-1/274.5315) * (x - 893)))^2 - 14252 * (exp((-1/274.5315) *
(x - 893)) * (-1/274.5315)) * (2 * (exp((-1/274.5315) * (x -
893)) * (-1/274.5315) * (1 + exp((-1/274.5315) * (x - 893)))))/((1 +
exp((-1/274.5315) * (x - 893)))^2)^2)
How do I find where this is equal to 0?
Using a graph or a solver for this is a little clumsy, since your initial function has the form:
f(x) = c / ( 1 + exp(ax+b) )
Differentiate twice and solve f''(x) = 0:
f''(x) = c * a^2 * exp(ax+b) * [-1 + exp(ax+b)] / ((1+exp(ax+b))^3)
This is equivalent to the numerator being 0. Since c * a^2 is nonzero and exp() and 1+exp() are always positive, the only term that can be equal to zero is:
exp(ax+b) - 1 = 0
x = -b/a
Here a =-1/274.5315, b=a*(-893). So x=893.
Just maths ;)
From an applied mathematician's point of view, it's always better to have a closed-form (or semi-closed-form) solution than to use a solver or optimization: you gain in both speed and accuracy.
From a pure mathematician's point of view, it's more elegant!
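Here is a small Python check of the closed form. It assumes the same constants as the question (c = 14252, a = -1/274.5315, b = -893a). It compares f''(x) = c * a^2 * e * (e - 1) / (1 + e)^3, with e = exp(ax + b), against a central finite difference, and it confirms the sign change at x = -b/a. The function names are illustrative.

```python
import math

c = 14252.0
a = -1 / 274.5315
b = a * (-893)          # so that a*x + b = a*(x - 893)


def f(x):
    return c / (1 + math.exp(a * x + b))


def f2_closed(x):
    # f''(x) = c * a^2 * e * (e - 1) / (1 + e)^3 with e = exp(a*x + b)
    e = math.exp(a * x + b)
    return c * a**2 * e * (e - 1) / (1 + e) ** 3


def f2_numeric(x, h=1e-2):
    # central second difference, only used to cross-check the closed form
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


x_inflection = -b / a   # where exp(a*x + b) = 1
print(x_inflection)     # 893 up to rounding
```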
You can use uniroot after creating a function from your derivative expression:
f = function(x) eval(D2)
uniroot(f,c(0,1000)) # The second argument is the interval over which you want to search roots.
#$root
#[1] 893
#$f.root
#[1] -2.203307e-13
#$iter
#[1] 7
#$init.it
#[1] NA
#$estim.prec
#[1] 6.103516e-05
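uniroot is a bracketing root finder, and R documents it as based on Brent's method. For readers outside R, a plain bisection shows the same idea in Python. `bisect_root` is a hypothetical helper, and the second derivative is written in closed form here rather than produced by symbolic differentiation.

```python
import math


def bisect_root(g, lo, hi, tol=1e-10, max_iter=200):
    """Root of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    glo, ghi = g(lo), g(hi)
    if glo == 0:
        return lo
    if ghi == 0:
        return hi
    if (glo > 0) == (ghi > 0):
        raise ValueError("g(lo) and g(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        gmid = g(mid)
        if gmid == 0 or hi - lo < tol:
            return mid
        if (gmid > 0) == (glo > 0):
            lo, glo = mid, gmid     # root lies in [mid, hi]
        else:
            hi = mid                # root lies in [lo, mid]
    return 0.5 * (lo + hi)


def d2(x):
    # second derivative of 14252 / (1 + exp((-1/274.5315) * (x - 893)))
    a = -1 / 274.5315
    e = math.exp(a * (x - 893))
    return 14252 * a**2 * e * (e - 1) / (1 + e) ** 3


root = bisect_root(d2, 0, 1000)   # same search interval as the uniroot call
print(root)
```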

Quadratic Bezier Curve: Calculate t given x

Good day. I am using a quadratic Bezier curve with the following configuration:
Start Point P1 = (1, 2)
Anchor Point P2 = (1, 8)
End Point P3 = (10, 8)
I know that, given a t, I can solve for x and y using the following equations:
t = 0.5; // given example value
x = (1 - t) * (1 - t) * P1.x + 2 * (1 - t) * t * P2.x + t * t * P3.x;
y = (1 - t) * (1 - t) * P1.y + 2 * (1 - t) * t * P2.y + t * t * P3.y;
where P1.x is the x coordinate of P1, and so on.
What I've tried so far: given an x value, I calculate t using WolframAlpha, then I plug that t into the y equation and get my x and y point.
However, I want to automate finding t and then y. I have a formula to get x and y given t, but I don't have a formula to get t from x. I'm a bit rusty with my algebra, and expanding the first equation to isolate t doesn't look too easy.
Does anyone have a formula to get t based on x? My google search skills are failing me as of now.
I think it's also worth noting that my Bezier curve faces right.
Any help will be very much appreciated. Thanks.
The problem is that what you want to solve is not a function in general:
for any t there is just one (x,y) pair,
but for any x there can be 0, 1, 2 or infinitely many solutions for t.
I would do this iteratively:
you can already get any point p(t) = Bezier(t), so iterate t to minimize the distance |p(t).x - x|;
find all local minima of d = |p(t).x - x|:
when d starts rising again, set dt *= -0.1 and stop once |dt| < 1e-6 (or any other threshold). Also stop if t leaves the interval <0,1>, and remember each solution in a list. Then restore the original t and dt and reset the local-minimum search variables;
process all the local minima:
eliminate any whose distance is bigger than some threshold/accuracy, then compute y and do what you need with the point.
It is much slower than the algebraic approach, but you can use it for curves of any degree, not just quadratic ones.
Cubic curves are the usual choice, and doing this algebraically with them is a nightmare.
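Here is a minimal Python sketch in the spirit of this answer. It assumes the curve's x(t) is available as a function. A coarse scan of t in [0, 1] locates the local minima of d = |x(t) - x|, and each one is refined with the step-reversal rule dt *= -0.1. The sample count, the tolerances and the function names are illustrative choices, not part of the original answer.

```python
def bezier_x(t, p0x, p1x, p2x):
    """x(t) of a quadratic Bezier curve."""
    return (1 - t) ** 2 * p0x + 2 * (1 - t) * t * p1x + t ** 2 * p2x


def refine_min(d, t, dt, eps=1e-12):
    """Step t by dt while d decreases; when d rises, reverse and shrink (dt *= -0.1)."""
    dcur = d(t)
    while abs(dt) > eps:
        tn = t + dt
        dn = d(tn) if 0.0 <= tn <= 1.0 else float("inf")   # stay inside <0,1>
        if dn < dcur:
            t, dcur = tn, dn
        else:
            dt *= -0.1
    return t, dcur


def solve_t_for_x(x_of_t, x, samples=100, accept=1e-6):
    """All t in [0, 1] where x_of_t(t) matches x within `accept`."""
    def d(t):
        return abs(x_of_t(t) - x)

    ts = [i / samples for i in range(samples + 1)]
    ds = [d(t) for t in ts]
    solutions = []
    for i in range(samples + 1):
        left = ds[i - 1] if i > 0 else float("inf")
        right = ds[i + 1] if i < samples else float("inf")
        if ds[i] <= left and ds[i] <= right:                 # coarse local minimum
            t, dist = refine_min(d, ts[i], 1.0 / samples)
            # keep only true solutions, skipping duplicates of ones already found
            if dist < accept and all(abs(t - s) > 1e-6 for s in solutions):
                solutions.append(t)
    return sorted(solutions)


# the question's x coordinates: x(t) = 1 + 9 t^2, so x = 5 gives t = 2/3
print(solve_t_for_x(lambda t: bezier_x(t, 1, 1, 10), 5.0))
```

Because only `x_of_t` is evaluated, the same search works unchanged for a cubic curve's x(t).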
Look at your Bernstein polynomials B[i]; you have...
x = SUM_i ( B[i](t) * P[i].x )
B[0](t) = t^2 - 2*t + 1
B[1](t) = -2*t^2 + 2*t
B[2](t) = t^2
...so you can rearrange (assuming I did this right)...
0 = (P[0].x - 2*P[1].x + P[2].x) * t^2 + (-2*P[0].x + 2*P[1].x) * t + P[0].x - x
Now you can use the quadratic formula to find out whether real (not complex) solutions for t exist, and what they are. If the t^2 coefficient is 0, the equation is linear in t and can be solved directly.
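Here is a direct Python sketch of the quadratic-formula approach (the function names are hypothetical). It keeps only the roots in [0, 1] and falls back to the linear case when the t^2 coefficient vanishes. The question's control points serve as a check.

```python
import math


def bezier_t_for_x(p0x, p1x, p2x, x, eps=1e-12):
    """Real roots t in [0, 1] of (p0x - 2 p1x + p2x) t^2 + (-2 p0x + 2 p1x) t + p0x - x = 0."""
    a = p0x - 2 * p1x + p2x
    b = -2 * p0x + 2 * p1x
    c = p0x - x
    if abs(a) < eps:                     # x(t) is linear in t
        if abs(b) < eps:
            return []                    # x(t) is constant: no unique t
        roots = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []                    # complex roots: the curve never reaches this x
        sq = math.sqrt(disc)
        roots = [(-b + sq) / (2 * a), (-b - sq) / (2 * a)]
    return sorted({t for t in roots if 0.0 <= t <= 1.0})


def bezier_point(p0, p1, p2, t):
    x = (1 - t) ** 2 * p0[0] + 2 * (1 - t) * t * p1[0] + t ** 2 * p2[0]
    y = (1 - t) ** 2 * p0[1] + 2 * (1 - t) * t * p1[1] + t ** 2 * p2[1]
    return x, y


P1, P2, P3 = (1, 2), (1, 8), (10, 8)     # the question's points
for t in bezier_t_for_x(P1[0], P2[0], P3[0], 3.25):
    print(t, bezier_point(P1, P2, P3, t))
```

With these points the x equation reduces to x = 1 + 9t^2. So x = 3.25 gives t = 0.5 and the point (3.25, 6.5).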
import numpy as np
import matplotlib.pyplot as plt
# Control points
p0 = (1000, 2500); p1 = (2000, -1500); p2 = (5000, 3000)
# x-coordinates to fit
xcoord = [1750., 2750., 3950., 4760., 4900.]
# t variable with as few points as needed, considering accuracy. I found 30 is good enough
t = np.linspace(0, 1, 30)
# calculate coordinates of quadratic Bezier curve
x = (1 - t) * (1 - t) * p0[0] + 2 * (1 - t) * t * p1[0] + t * t * p2[0]
y = (1 - t) * (1 - t) * p0[1] + 2 * (1 - t) * t * p1[1] + t * t * p2[1]
# find the closest points to each x-coordinate. Interpolate y-coordinate
ycoord = []
for ind in xcoord:
    for jnd in range(len(x) - 1):
        if x[jnd] <= ind <= x[jnd + 1]:
            ytemp = (ind - x[jnd]) * (y[jnd + 1] - y[jnd]) / (x[jnd + 1] - x[jnd]) + y[jnd]
            ycoord.append(ytemp)
            break
plt.xlim(0, 6000)
plt.ylim(-2000, 4000)
plt.plot(p0[0], p0[1], 'kx', p1[0], p1[1], 'kx', p2[0], p2[1], 'kx')
plt.plot((p0[0], p1[0]), (p0[1], p1[1]), 'k:', (p1[0], p2[0]), (p1[1], p2[1]), 'k:')
plt.plot(x, y, 'r', x, y, 'k:')
plt.plot(xcoord, ycoord, 'rs')
plt.show()
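For these control points x(t) is strictly increasing, because dx/dt = 2[(p1x - p0x)(1 - t) + (p2x - p1x)t] is positive. In that case the nested loop is equivalent to NumPy's `np.interp`, which performs the same piecewise-linear interpolation in one vectorized call. Here is a sketch under that assumption. It stops with an error if x(t) is not increasing.

```python
import numpy as np

p0 = (1000, 2500); p1 = (2000, -1500); p2 = (5000, 3000)
xcoord = np.array([1750., 2750., 3950., 4760., 4900.])
t = np.linspace(0, 1, 30)
x = (1 - t) ** 2 * p0[0] + 2 * (1 - t) * t * p1[0] + t ** 2 * p2[0]
y = (1 - t) ** 2 * p0[1] + 2 * (1 - t) * t * p1[1] + t ** 2 * p2[1]

# np.interp expects increasing sample positions; this only holds for monotonic x(t)
if not np.all(np.diff(x) > 0):
    raise ValueError("x(t) is not increasing; use the loop or the quadratic formula instead")
ycoord = np.interp(xcoord, x, y)
print(ycoord)
```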

How to find the interception coordinates of a moving target in 3D space?

Assume I have a spaceship (source) and an asteroid (target) somewhere near it.
I know, in 3D space (XYZ vectors):
My ship's position (sourcePos) and velocity (sourceVel).
The asteroid's position (targetPos) and velocity (targetVel).
(e.g. sourcePos = [30, 20, 10]; sourceVel = [30, 20, 10]; targetPos = [600, 400, 200]; targetVel = [300, 200, 100])
I also know that:
The ship's velocity is constant.
The asteroid's velocity is constant.
My ship's projectile speed (projSpd) is constant.
My ship's projectile trajectory, after being shot, is linear (/straight).
(e.g. projSpd = 2000.00)
How can I calculate the interception coordinates I need to shoot at in order to hit the asteroid?
This question is based on this Yahoo Answers page.
I also searched for similar problems on Google and here on SO, but most of the answers are for 2D space. For the few 3D ones, neither the explanations nor the pseudocode explain what is doing what and/or why, so I couldn't understand them well enough to apply them to my code. Here are some of the pages I visited:
Danik Games Devlog, Blitz3D Forums thread, UnityAnswers, StackOverflow #1, StackOverflow #2
I really can't figure out the maths / execution flow on the linked pages as they are, unless someone dissects it (further) into what is doing what, and why;
provides properly commented pseudocode for me to follow;
or at least points me to links that actually explain how the equations work, instead of just throwing even more random numbers and unfollowable equations at my already-confused psyche.
I find the easiest approach to these kinds of problems is to make sense of them first; a basic high-school level of maths will help too.
Solving this problem is essentially solving 2 equations with 2 variables which are unknown to you:
The direction you want to find for your projectile (V, a unit vector)
The time of impact (t)
The variables you know are:
The target's position (P0)
The target's direction of motion (V0, a unit vector)
The target's speed (s0)
The projectile's origin (P1)
The projectile's speed (s1)
Okay, so the 1st equation is basic. The impact point is the same for both the target and the projectile. It is equal to the starting point of each object plus a certain length along its direction vector. This length is determined by the object's speed and the time of impact. Here's the equation:
P0 + (t * s0 * V0) = P1 + (t * s1 * V)
Notice that there are two unknowns here, V and t, so we can't solve this equation yet. On to the 2nd equation.
The 2nd equation is also quite intuitive. The point of impact's distance from the origin of the projectile is equal to the speed of the projectile multiplied by the time passed:
We'll take a mathematical expression of the point of impact from the 1st equation:
P0 + (t * s0 * V0) <-- point of impact
The point of origin is P1
The distance between these two must be equal to the speed of the projectile multiplied by the time passed (distance = speed * time).
The formula for distance is: (x0 - x1)^2 + (y0 - y1)^2 = distance^2, and so the equation will look like this:
((P0.x + s0 * t * V0.x) - P1.x)^2 + ((P0.y + s0 * t * V0.y) - P1.y)^2 = (s1 * t)^2
(You can easily expand this for 3 dimensions)
Notice that here you have an equation with only ONE unknown variable: t! We can find t here, then substitute it into the previous equation to find the vector V.
Let me save you some pain by expanding this formula for you (if you really want to, you can do it yourself):
a = (s0 * s0) * ((V0.x * V0.x) + (V0.y * V0.y)) - (s1 * s1)
b = 2 * s0 * ((P0.x * V0.x) + (P0.y * V0.y) - (P1.x * V0.x) - (P1.y * V0.y))
c = (P0.x * P0.x) + (P0.y * P0.y) + (P1.x * P1.x) + (P1.y * P1.y) - (2 * P1.x * P0.x) - (2 * P1.y * P0.y)
t1 = (-b + sqrt((b * b) - (4 * a * c))) / (2 * a)
t2 = (-b - sqrt((b * b) - (4 * a * c))) / (2 * a)
Now, notice - we will get 2 values for t here.
One or both may be negative or NaN (NaN happens when the value under the square root is negative). Since t denotes time, and time can't be NaN or negative, you'll need to discard such values of t.
It could very well be that both t's are bad, in which case the projectile cannot hit the target (the target is faster and out of range). It could also be that both t's are valid and positive. In that case you'll want to choose the smaller of the two, since it's preferable to hit the target sooner rather than later.
t = smallestWhichIsntNegativeOrNan(t1, t2)
Now that we've found the time of impact, let's find the direction the projectile should fly in. Back to our 1st equation:
P0 + (t * s0 * V0) = P1 + (t * s1 * V)
Now, t is no longer a missing variable, so we can solve this quite easily. Just tidy up the equation to isolate V:
V = (P0 - P1 + (t * s0 * V0)) / (t * s1)
V.x = (P0.x - P1.x + (t * s0 * V0.x)) / (t * s1)
V.y = (P0.y - P1.y + (t * s0 * V0.y)) / (t * s1)
And that's it, you're done!
Assign the vector V to the projectile and it will go to where the target will be rather than where it is now.
I really like this problem, since it takes math equations we learnt in high school, where everyone said "why are we learning this?? we'll never use it in our lives!!", and gives them a pretty awesome and practical application.
I hope this helps you, or anyone else who's trying to solve this.
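The same derivation carries over to 3D: replace the x/y terms with dot products. Below is a minimal Python sketch (the function name `intercept` is hypothetical). It takes the target's full velocity vector, i.e. s0 * V0, so s0 does not appear separately. It ignores the ship's own velocity (sourceVel) and applies the question's example values.

```python
import math


def intercept(target_pos, target_vel, source_pos, proj_speed):
    """Return (t, hit point, projectile velocity) for a straight shot, or None.

    Solves |target_pos + t*target_vel - source_pos| = proj_speed * t for the
    smallest t > 0; target_vel is the full velocity vector (s0 * V0).
    """
    d = [p - q for p, q in zip(target_pos, source_pos)]          # P0 - P1
    a = sum(v * v for v in target_vel) - proj_speed ** 2         # s0^2 - s1^2
    b = 2 * sum(di * v for di, v in zip(d, target_vel))          # 2 (P0 - P1) . (s0 V0)
    c = sum(di * di for di in d)                                 # |P0 - P1|^2
    if abs(a) < 1e-12:                                           # equal speeds: linear equation
        candidates = [-c / b] if b != 0 else []
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return None                                          # no real solution
        sq = math.sqrt(disc)
        candidates = [(-b + sq) / (2 * a), (-b - sq) / (2 * a)]
    valid = [t for t in candidates if t > 0]
    if not valid:
        return None
    t = min(valid)                                               # hit as early as possible
    hit = [p + t * v for p, v in zip(target_pos, target_vel)]
    # s1 * V = (P0 - P1 + t * s0 * V0) / t, i.e. the projectile velocity
    proj_vel = [(h - s) / t for h, s in zip(hit, source_pos)]
    return t, hit, proj_vel


print(intercept([600, 400, 200], [300, 200, 100], [30, 20, 10], 2000.0))
```

For the example this gives t of about 0.4373 and an aim point of about (731.18, 487.45, 243.73).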
If you want the projectile to hit the asteroid, it should be shot at the point interceptionPos that satisfies the equation:
|interceptionPos - sourcePos| / |interceptionPos - targetPos| = projSpd / |targetVel|
where |x| is the length of vector x.
In other words, the target and the projectile take an equal amount of time to reach this point.
This problem can be solved with geometry and trigonometry, so let's draw it.
Let A be the asteroid position, S the ship, and I the interception point.
Here we have:
AI = |targetVel| * t
SI = projSpd * t
AS = |targetPos - sourcePos|
The directions of vectors AS and AI are known, so you can easily calculate the cosine of angle SAI with simple vector math (take definitions from here and here). Then apply the law of cosines to angle SAI. It yields a quadratic equation in t that is easy to solve (no solutions means your projectile is too slow to catch the asteroid). Just pick the smallest positive solution t; your point to shoot at will be
targetPos + t * targetVel
I hope you can write the code to solve it yourself. If something is unclear, please ask in the comments.
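The law-of-cosines route leads to a quadratic of the same kind. Here is a Python sketch with a hypothetical function name. cos(SAI) comes from the dot product of AS and the asteroid's velocity. The triangle then gives (|v|^2 - projSpd^2) t^2 - 2 |AS| |v| cos(SAI) t + |AS|^2 = 0, and the sketch takes the smallest positive root.

```python
import math


def intercept_time_law_of_cosines(target_pos, target_vel, source_pos, proj_speed):
    """Smallest t > 0 with SI^2 = AS^2 + AI^2 - 2*AS*AI*cos(SAI),
    where AI = |v| * t and SI = proj_speed * t; None if there is no such t."""
    as_vec = [s - a for s, a in zip(source_pos, target_pos)]      # vector from A to S
    AS = math.sqrt(sum(c * c for c in as_vec))
    v = math.sqrt(sum(c * c for c in target_vel))                 # asteroid speed
    if AS == 0:
        return 0.0                                                # ship is at the asteroid
    if v == 0:
        return AS / proj_speed                                    # stationary asteroid
    cos_sai = sum(p * q for p, q in zip(as_vec, target_vel)) / (AS * v)
    qa = v * v - proj_speed ** 2
    qb = -2 * AS * v * cos_sai
    qc = AS * AS
    if abs(qa) < 1e-12:                                           # equal speeds: linear
        roots = [-qc / qb] if qb != 0 else []
    else:
        disc = qb * qb - 4 * qa * qc
        if disc < 0:
            return None
        sq = math.sqrt(disc)
        roots = [(-qb + sq) / (2 * qa), (-qb - sq) / (2 * qa)]
    positive = [t for t in roots if t > 0]
    return min(positive) if positive else None


t = intercept_time_law_of_cosines([600, 400, 200], [300, 200, 100], [30, 20, 10], 2000.0)
print(t)  # point to shoot at: targetPos + t * targetVel
```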
I got a solution. Notice that the ship position and the asteroid line (position and velocity) define a 3D plane in which the intercept point lies. In my notation below, | [x,y,z] | denotes the magnitude of the vector, i.e. Sqrt(x^2+y^2+z^2).
Notice that if the asteroid travels with targetSpd = |[300,200,100]| = 374.17, then reaching the intercept point (still unknown, called hitPos) takes time t = |hitPos-targetPos|/targetSpd. This is the same time the projectile needs to reach the intercept point, t = |hitPos - sourcePos|/projSpd. In this example the asteroid moves directly away from the ship along the line of sight (targetPos - sourcePos = [570,380,190] is parallel to targetVel). So |hitPos - sourcePos| = |targetPos - sourcePos| + |hitPos - targetPos|, and the two equations give the time to intercept. For other directions of motion this shortcut does not hold, and the general quadratic must be solved instead:
t = |targetPos-sourcePos|/(projSpd - targetSpd)
= |[600,400,200]-[30,20,10]|/(2000 - |[300,200,100]|)
= 710.91 / ( 2000-374.17 ) = 0.43726
Now the location of the interception point is found by
hitPos = targetPos + targetVel * t
= [600,400,200] + [300,200,100] * 0.43726
= [731.18, 487.45, 243.73 ]
Now that I know the hit position, I can calculate the direction of the projectile as
projDir = (hitPos-sourcePos)/|hitPos-sourcePos|
= [701.18, 467.45, 233.73]/874.52 = [0.8018, 0.5345, 0.2673]
Together the projDir and projSpd define the projectile velocity vector.
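The worked numbers can be reproduced in a few lines of Python. The sketch also tests the condition under which the shortcut t = |targetPos - sourcePos| / (projSpd - targetSpd) is valid: the asteroid must move directly away from the ship along the line of sight, which means a zero cross product and a positive dot product. The variable names mirror the question's.

```python
import math


def norm(v):
    return math.sqrt(sum(c * c for c in v))


source_pos = [30, 20, 10]
target_pos = [600, 400, 200]
target_vel = [300, 200, 100]
proj_spd = 2000.0

los = [t - s for t, s in zip(target_pos, source_pos)]     # line of sight: [570, 380, 190]
cross = [los[1] * target_vel[2] - los[2] * target_vel[1],
         los[2] * target_vel[0] - los[0] * target_vel[2],
         los[0] * target_vel[1] - los[1] * target_vel[0]]
# the shortcut only applies if the asteroid moves straight away along the line of sight
moving_along_los = all(c == 0 for c in cross) and sum(p * q for p, q in zip(los, target_vel)) > 0

t = norm(los) / (proj_spd - norm(target_vel))
hit_pos = [p + v * t for p, v in zip(target_pos, target_vel)]
to_hit = [h - s for h, s in zip(hit_pos, source_pos)]
proj_dir = [c / norm(to_hit) for c in to_hit]
print(moving_along_los, round(t, 5), [round(c, 2) for c in hit_pos], [round(c, 4) for c in proj_dir])
```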
Credit to Gil Moshayof's answer, as it really was what I worked from to build this. They did two dimensions and I did three, so I'll share my Unity code in case it helps anyone. It's a little long-winded and redundant, but that helps me read it and know what's going on.
Vector3 CalculateIntercept(Vector3 targetLocation, Vector3 targetVelocity, Vector3 interceptorLocation, float interceptorSpeed)
{
    Vector3 A = targetLocation;
    float Ax = targetLocation.x;
    float Ay = targetLocation.y;
    float Az = targetLocation.z;
    float As = targetVelocity.magnitude;
    Vector3 Av = Vector3.Normalize(targetVelocity);
    float Avx = Av.x;
    float Avy = Av.y;
    float Avz = Av.z;
    Vector3 B = interceptorLocation;
    float Bx = interceptorLocation.x;
    float By = interceptorLocation.y;
    float Bz = interceptorLocation.z;
    float Bs = interceptorSpeed;
    float t = 0;
    // Quadratic a*t^2 + b*t + c = 0 from |A + t*As*Av - B| = Bs*t
    float a = (
        Mathf.Pow(As, 2) * Mathf.Pow(Avx, 2) +
        Mathf.Pow(As, 2) * Mathf.Pow(Avy, 2) +
        Mathf.Pow(As, 2) * Mathf.Pow(Avz, 2) -
        Mathf.Pow(Bs, 2)
    );
    if (a == 0)
    {
        Debug.Log("Quadratic formula not applicable");
        return Vector3.zero; // no valid intercept (success returns a velocity, not a position)
    }
    // b = 2 * As * (Av . (A - B))
    float b = 2 * (
        As * Avx * Ax +
        As * Avy * Ay +
        As * Avz * Az -
        As * Avx * Bx -
        As * Avy * By -
        As * Avz * Bz
    );
    // c = |A - B|^2
    float c = (
        Mathf.Pow(Ax, 2) +
        Mathf.Pow(Ay, 2) +
        Mathf.Pow(Az, 2) -
        2 * Ax * Bx -
        2 * Ay * By -
        2 * Az * Bz +
        Mathf.Pow(Bx, 2) +
        Mathf.Pow(By, 2) +
        Mathf.Pow(Bz, 2)
    );
    // Use Mathf.Sqrt: Mathf.Pow(x, (1 / 2)) uses integer division, so the exponent is 0
    float t1 = (-b + Mathf.Sqrt(Mathf.Pow(b, 2) - (4 * a * c))) / (2 * a);
    float t2 = (-b - Mathf.Sqrt(Mathf.Pow(b, 2) - (4 * a * c))) / (2 * a);
    Debug.Log("t1 = " + t1 + "; t2 = " + t2);
    // Pick the smallest positive, finite root
    if (t1 <= 0 || t1 == Mathf.Infinity || float.IsNaN(t1))
    {
        if (t2 <= 0 || t2 == Mathf.Infinity || float.IsNaN(t2))
            return Vector3.zero; // no valid intercept
        t = t2;
    }
    else if (t2 <= 0 || t2 == Mathf.Infinity || float.IsNaN(t2) || t2 > t1)
    {
        t = t1;
    }
    else
    {
        t = t2;
    }
    Debug.Log("t = " + t);
    Debug.Log("Bs = " + Bs);
    // Unit direction of the interceptor: (A - B + t*As*Av) / (t*Bs)
    float Bvx = (Ax - Bx + (t * As * Avx)) / (t * Bs);
    float Bvy = (Ay - By + (t * As * Avy)) / (t * Bs);
    float Bvz = (Az - Bz + (t * As * Avz)) / (t * Bs);
    Vector3 Bv = new Vector3(Bvx, Bvy, Bvz);
    Debug.Log("||Bv|| = (Should be 1) " + Bv.magnitude);
    return Bv * Bs;
}
I followed the problem formulation as described by Gil Moshayof's answer, but found that there was an error in the simplification of the quadratic formula. When I did the derivation by hand I got a different solution.
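The following is what worked for me when finding the intersection in 2D:
#include <algorithm>
#include <cmath>
#include <limits>
#include <utility>
// Assumed helper (its definition was not shown): smallest positive, finite root, or NaN if none.
double choose_best_time(double t1, double t2)
{
    bool ok1 = std::isfinite(t1) && t1 > 0;
    bool ok2 = std::isfinite(t2) && t2 > 0;
    if (ok1 && ok2) return std::min(t1, t2);
    if (ok1) return t1;
    if (ok2) return t2;
    return std::numeric_limits<double>::quiet_NaN();
}
// Vector3 is the caller's own vector type; only .x and .y are used here.
std::pair<double, double> find_2D_intersect(Vector3 sourcePos, double projSpd, Vector3 targetPos, double targetSpd, double targetHeading)
{
    double P0x = targetPos.x;
    double P0y = targetPos.y;
    double s0 = targetSpd;
    double V0x = std::cos(targetHeading);
    double V0y = std::sin(targetHeading);
    double P1x = sourcePos.x;
    double P1y = sourcePos.y;
    double s1 = projSpd;
    // quadratic formula
    double a = (s0 * s0)*((V0x * V0x) + (V0y * V0y)) - (s1 * s1);
    double b = 2 * s0 * ((P0x * V0x) + (P0y * V0y) - (P1x * V0x) - (P1y * V0y));
    double c = (P0x * P0x) + (P0y * P0y) + (P1x * P1x) + (P1y * P1y) - (2 * P1x * P0x) - (2 * P1y * P0y);
    double t1 = (-b + std::sqrt((b * b) - (4 * a * c))) / (2 * a);
    double t2 = (-b - std::sqrt((b * b) - (4 * a * c))) / (2 * a);
    double t = choose_best_time(t1, t2);
    double intersect_x = P0x + t * s0 * V0x;
    double intersect_y = P0y + t * s0 * V0y;
    return std::make_pair(intersect_x, intersect_y);
}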

MySQL Math - Is it possible to calculate a correlation in a query?

In a MySQL (5.1) database table there is data that represents:
how long a user takes to perform a task and
how many items the user handled during the task.
Would MySQL support correlating the data, or do I need to use PHP/C# to calculate it?
Where would I find a good formula to calculate correlation (it's been a long time since I last did this)?
Here's a rough implementation of the sample correlation coefficient as described in:
Wikipedia - Correlation and Dependence
create table sample( x float not null, y float not null );
insert into sample values (1, 10), (2, 4), (3, 5), (6,17);
select @ax := avg(x),
       @ay := avg(y),
       @div := (stddev_samp(x) * stddev_samp(y))
from sample;
select sum( ( x - @ax ) * (y - @ay) ) / ((count(x) -1) * @div) from sample;
+---------------------------------------------------------+
| sum( ( x - @ax ) * (y - @ay) ) / ((count(x) -1) * @div) |
+---------------------------------------------------------+
|                                       0.700885077729073 |
+---------------------------------------------------------+
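To double-check the SQL result outside the database, here is the same two-pass computation in Python on the same four rows. It computes the means first, then divides the sum of products by (n - 1) times the product of the sample standard deviations.

```python
import math

xs = [1, 2, 3, 6]
ys = [10, 4, 5, 17]          # same rows as the `sample` table

n = len(xs)
ax = sum(xs) / n             # like @ax
ay = sum(ys) / n             # like @ay


def stddev_samp(vals, mean):
    # sample standard deviation (divides by n - 1), like MySQL's STDDEV_SAMP
    return math.sqrt(sum((v - mean) ** 2 for v in vals) / (len(vals) - 1))


div = stddev_samp(xs, ax) * stddev_samp(ys, ay)   # like @div
r = sum((x - ax) * (y - ay) for x, y in zip(xs, ys)) / ((n - 1) * div)
print(r)
```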
Single-Pass Solution
The Pearson correlation coefficient is often written in two forms: a "population" form using averages and a "sample" form using sums. The normalizing factors cancel, so both give the same value. These are single-pass and, I believe, correct formulas for both:
-- Methods for calculating the two Pearson correlation coefficients
SELECT
  -- For Population
  (avg(x * y) - avg(x) * avg(y)) /
  (sqrt(avg(x * x) - avg(x) * avg(x)) * sqrt(avg(y * y) - avg(y) * avg(y)))
  AS correlation_coefficient_population,
  -- For Sample
  (count(*) * sum(x * y) - sum(x) * sum(y)) /
  (sqrt(count(*) * sum(x * x) - sum(x) * sum(x)) * sqrt(count(*) * sum(y * y) - sum(y) * sum(y)))
  AS correlation_coefficient_sample
FROM your_table;
I developed and tested this as T-SQL. The code that generated the test data didn't translate to MySQL, but the formulas should. Make sure your x and y are decimal values; integer math can significantly affect these calculations.
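Here is a Python sketch of the single-pass idea (`pearson_single_pass` is a hypothetical name). One loop accumulates the six running sums, and both forms are evaluated from them. On the four rows of the `sample` table, both agree with the two-pass result. One caveat: these sum-of-squares formulas can lose precision through cancellation when the values are large compared with their spread.

```python
import math


def pearson_single_pass(pairs):
    """Accumulate n, sum x, sum y, sum x^2, sum y^2, sum xy in a single pass."""
    n = sx = sy = sxx = syy = sxy = 0.0
    for x, y in pairs:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    # "population" form, written with averages
    pop = (sxy / n - (sx / n) * (sy / n)) / (
        math.sqrt(sxx / n - (sx / n) ** 2) * math.sqrt(syy / n - (sy / n) ** 2))
    # "sample" form, written with sums
    samp = (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy))
    return pop, samp


pop, samp = pearson_single_pass([(1, 10), (2, 4), (3, 5), (6, 17)])
print(pop, samp)   # equal up to floating-point rounding
```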
