FastICA and Blind Signal Separation Mathematics Question - math

I'm trying to make my own implementation of the FastICA algorithm based on the paper here: http://www.cs.helsinki.fi/u/ahyvarin/papers/NN00new.pdf.
I need some help w/ the math though.
In the middle of page 14 there is an equation that looks somewhat like
w+ = E{ xg(w^Tx) } - E{ g[prime]( w^T x)} w
What does the E mean? Back from my probability days I recall that it is the "expected value" of a random variable but it doesn't make sense to me what the expected value of a vector is.
Thanks,
mj

ICA is interesting stuff. I used it some in my graduate research, but I didn't dig in too much under the hood; I just downloaded the FastICA implementation for MatLab and used that.
Anyway, you are correct that E{...} denotes expected value. The elements of the vector x represent the individual signals. Strictly speaking, x is a time series and should be written x(t), but the convention in ICA is to treat x instead as a random variable. In that context, of course, the idea of expected value makes sense. For example E{x} would just be the mean value of x (taken to be zero in ICA as the signals have been centered).
The authors of the paper you linked also have a book on ICA. It's outrageously expensive on Amazon, but if you can find a copy at, say, a nearby university library, it might be worth a look. It's been several years, but I remember it as being as gentle an introduction as one could hope for given the mathematics.

Related

Constructing Taylor Series from a Recursive function in Pari-GP

This is a continuation of my questions:
Declaring a functional recursive sequence in Matlab
Is there a more efficient way of nesting logarithms?
Nesting a specific recursion in Pari-GP
But I'll keep this question self contained. I have made a coding project for myself; which is to program a working simple calculator for a tetration function I've constructed. This tetration function is holomorphic, and stated not to be Kneser's solution (as to all the jargon, ignore); long story short, I need to run the numbers; to win over the nay-sayers.
As to this, I have to use Pari-GP; as this is a fantastic language for handling large numbers and algebraic expressions. As we are dealing with tetration (think numbers of the order e^e^e^e^e^e); this language is, of the few that exist, the best for such affairs. It is the favourite when doing iterated exponential computations.
Now, the trouble I am facing is odd. It is not so much that my code doesn't work; it's that it's overflowing because it should over flow (think, we're getting inputs like e^e^e^e^e^e; and no computer can handle it properly). I'll post the first batch of code, before I dive deeper.
The following code works perfectly; and does everything I want. The trouble is with the next batch of code. This produces all the numbers I want.
\\This is the asymptotic solution to tetration. z is the variable, l is the multiplier, and n is the depth of recursion
\\Warning: z with large real part looks like tetration; and therefore overflows very fast. Additionally there are singularities which occur where l*(z-j) = (2k+1)*Pi*I.
\\j,k are integers
beta_function(z,l,n) =
{
my(out = 0);
for(i=0,n-1,
out = exp(out)/(exp(l*(n-i-z)) +1));
out;
}
\\This is the error between the asymptotic tetration and the tetration. This is pretty much good for 200 digit accuracy if you need.
\\modify the 0.000000001 to a bigger number to make this go faster and receive less precision. When graphing 0.0001 is enough
\\Warning: This will blow up at some points. This is part of the math; these functions have singularities/branch cuts.
tau(z,l,n)={
if(1/real(beta_function(z,l,n)) <= 0.000000001, //this is where we'll have problems; if I try to grab a taylor series with this condition we error out
-log(1+exp(-l*z)),
log(1 + tau(z+1,l,n)/beta_function(z+1,l,n)) - log(1+exp(-l*z))
)
}
\\This is the sum function. I occasionally modify it; to make better graphs, but the basis is this.
Abl(z,l,n) = {
beta_function(z,l,n) + tau(z,l,n)
}
Plugging this in, you get the following expressions:
Abl(1,log(2),100)
realprecision = 28 significant digits (20 digits displayed)
%109 = 0.15201551563214167060
exp(Abl(0,log(2),100))
%110 = 0.15201551563214167060
Abl(1+I,2+0.5*I,100)
%111 = 0.28416643148885326261 + 0.80115283113944703984*I
exp(Abl(0+I,2+0.5*I,100))
%112 = 0.28416643148885326261 + 0.80115283113944703984*I
And so on and so forth; where Abl(z,l,n) = exp(Abl(z-1,l,n)). There's no problem with this code. Absolutely none at all; we can set this to 200 precision and it'll still produce correct results. The graphs behave exactly as the math says they should behave. The problem is, in my construction of tetration (the one we actually want); we have to sort of paste together the solutions of Abl(z,l,n) across the value l. Now, you don't have to worry about any of that at all; but, mathematically, this is what we're doing.
This is the second batch of code; which is designed to "paste together" all these Abl(z,l,n) into one function.
//This is the modified asymptotic solution to the Tetration equation.
beta(z,n) = {
beta_function(z,1/sqrt(1+z),n);
}
//This is the Tetration function.
Tet(z,n) ={
if(1/abs(beta_function(z,1/sqrt(1+z),n)) <= 0.00000001,//Again, we see here this if statement; and we can't have this.
beta_function(z,1/sqrt(1+z),n),
log(Tet(z+1,n))
)
}
This code works perfectly for real-values; and for complex values. Some sample values,
Tet(1+I,100)
%113 = 0.12572857262453957030 - 0.96147559586703141524*I
exp(Tet(0+I,100))
%114 = 0.12572857262453957030 - 0.96147559586703141524*I
Tet(0.5,100)
%115 = -0.64593666417664607364
exp(Tet(0.5,100))
%116 = 0.52417133958039107545
Tet(1.5,100)
%117 = 0.52417133958039107545
We can also effectively graph this object on the real-line. Which just looks like the following,
ploth(X=0,4,Tet(X,100))
Now, you may be asking; What's the problem then?
If you try and plot this function in the complex plane, it's doomed to fail. The nested logarithms produce too many singularities near the real line. For imaginary arguments away from the real-line, there's no problem. And I've produced some nice graphs; but the closer you get to the real line; the more it misbehaves and just short circuits. You may be thinking; well then, the math is wrong! But, no, the reason this is happening is because Kneser's tetration is the only tetration that is stable about the principal branch of the logarithm. Since this tetration IS NOT Kneser's tetration, it's inherently unstable about the principal branch of the logarithm. Of course, Pari just chooses the principal branch. So when I do log(log(log(log(log(beta(z+5,100)))))); the math already says this will diverge. But on the real line; it's perfectly adequate. And for values of z with an imaginary argument away from zero, we're fine too.
So, how I want to solve this, is to grab the Taylor series at Tet(1+z,100); which Pari-GP is perfect for. The trouble?
Tet(1+z,100)
*** at top-level: Tet(1+z,100)
*** ^------------
*** in function Tet: ...unction(z,1/sqrt(1+z),n))<=0.00000001,beta_fun
*** ^---------------------
*** _<=_: forbidden comparison t_SER , t_REAL.
The numerical comparison I've done doesn't translate to a comparison between t_SER and t_REAL.
So, my question, at long last: what is an effective strategy at getting the Taylor series of Tet(1+z,100) using only real inputs. The complex inputs near z=0 are erroneous; the real values are not. And if my math is right; we can take the derivatives along the real-line and get the right result. Then, we can construct a Tet_taylor(z,n) which is just the Taylor Series expansion. Which; will most definitely have no errors when trying to graph.
Any help, questions, comments, suggestions--anything, is greatly appreciated! I really need some outside eyes on this.
Thanks so much if you got to the bottom of this post. This one is bugging me.
Regards, James
EDIT:
I should add that a Tet(z+c,100) for some number c is the actual tetration function we want. There is a shifting constant I haven't talked about yet. Nonetheless; this is spurious to the question, and is more a mathematical point.
This is definitely not an answer - I have absolutely no clue what you are trying to do. However, I see no harm in offering suggestions. PARI has a built in type for power series (essentially Taylor series) - and is very good at working with them (many operations are supported). I was originally going to offer some suggestions on how to get a Taylor series out of a recursive definition using your functions as an example - but in this case, I'm thinking that you are trying to expand around a singularity which might be doomed to failure. (On your plot it seems as x->0, the result goes to -infinity???)
In particular if I compute:
log(beta(z+1, 100))
log(log(beta(z+2, 100)))
log(log(log(beta(z+3, 100))))
log(log(log(log(beta(z+4, 100)))))
...
The different series are not converging to anything. Even the constant term of the series is getting smaller with each iteration, so I am not entirely sure there is even a Taylor series expansion about x = 0.
Questions/suggestions:
Should you be expanding about a different point? (say where the curve
crosses the x-axis).
Does the Taylor series satisfy some recursive relation? For example: A(z) = log(A(z+1)). [This doesn't work, but perhaps there is another way to write it].
I suspect my answer is unlikely to be satisfactory - but then again your question is more mathematical than a practical programming problem.
So I've successfully answered my question. I haven't programmed in so long; I'm kind of shoddy. But I figured it out after enough coffee. I created 3 new functions, which allow me to grab the Taylor series.
\\This function attempts to find the number of iterations we need.
Tet_GRAB_k(A,n) ={
my(k=0);
while( 1/real(beta(A+k,n)) >= 0.0001, k++);
return(k);
}
\\This function will run and produce the same results as Tet; but it's slower; but it let's us estimate Taylor coefficients.
\\You have to guess which k to use for whatever accuracy before overflowing; which is what the last function is good for.
Tet_taylor(z,n,k) = {
my(val = beta(z+k,n));
for(i=1,k,val = log(val));
return(val);
}
\\This function produces an array of all the coefficients about a value A.
TAYLOR_SERIES(A,n) = {
my(ser = vector(40,i,0));
for(i=1,40, ser[i] = polcoeff(Tet_taylor(A+z,n,Tet_GRAB_k(A,n)),i-1,z));
return(ser);
}
After running the numbers, I'm confident this works. The Taylor series is converging; albeit rather slowly and slightly less accurately than desired; but this will have to do.
Thanks to anyone who read this. I'm just answering this question for completeness.

In how many functions maxterm/minterm of n binary variables can be expressed

I have confusion. Semantically we can construct 2^(2^n) boolean functions, but I read in Digital Electronics Morris Mano that we can construct 2^2n combinations of minterm/maxterm. How?
Samsamp, could you point to a specific place in the book or even better provide an exact quote? In the copy I found over the Internet I was not able to find such a claim after a fast glance over. The closest thing I found is:
Since the function can be either I or 0 for each minterm, and since there
are 2^n min terms, one can calculate the possible functions that can be formed with n variables to be 2^2^n.
which looks OK to me.

Ada mod and rem implementation

When looking into exactly what the difference is between mod and rem (something I admittedly should have done years ago, I found little on the matter. https://en.wikipedia.org/wiki/Modulo_operation states there are a few different divisions that can be used, and also states which sign the result has for each. If there's any statement about which division is performed in the ARM, I must've missed it. I assume it's Euclidian, but I want to be sure.
edit:
So I had missed this: http://www.adaic.org/resources/add_content/standards/05rm/html/RM-4-5-5.html which covers the relations. However, in the relation for mod: A = B*N + (A mod B)
The only mention of N is "in addition, for some signed integer value N". Where does N come from?
As said in the comments, http://www.ada-auth.org/standards/12rm/html/RM-4-5-5.html well explains the fundamental differences in behavior. The tables lower down in the reference manual were of great help. What I eventually came to the conclusion of (and implemented for various fractional types) is that rem uses truncated division, and mod uses floored division. I will edit this answer should I be shown wrong.

How to quantitatively measure how simplified a mathematical expression is

I am looking for a simple method to assign a number to a mathematical expression, say between 0 and 1, that conveys how simplified that expression is (being 1 as fully simplified). For example:
eval('x+1') should return 1.
eval('1+x+1+x+x-5') should returns some value less than 1, because it is far from being simple (i.e., it can be further simplified).
The parameter of eval() could be either a string or an abstract syntax tree (AST).
A simple idea that occurred to me was to count the number of operators (?)
EDIT: Let simplified be equivalent to how close a system is to the solution of a problem. E.g., given an algebra problem (i.e. limit, derivative, integral, etc), it should assign a number to tell how close it is to the solution.
The closest metaphor I can come up with it how a maths professor would look at an incomplete problem and mentally assess it in order to tell how close the student is to the solution. Like in a math exam, were the student didn't finished a problem worth 20 points, but the professor assigns 8 out of 20. Why would he come up with 8/20, and can we program such thing?
I'm going to break a stack-overflow rule and post this as an answer instead of a comment, because not only I'm pretty sure the answer is you can't (at least, not the way you imagine), but also because I believe it can be educational up to a certain degree.
Let's assume that a criteria of simplicity can be established (akin to a normal form). It seems to me that you are very close to trying to solve an analogous to entscheidungsproblem or the halting problem. I doubt that in a complex rule system required for typical algebra, you can find a method that gives a correct and definitive answer to the number of steps of a series of term reductions (ipso facto an arbitrary-length computation) without actually performing it. Such answer would imply knowing in advance if such computation could terminate, and so contradict the fact that automatic theorem proving is, for any sufficiently powerful logic capable of representing arithmetic, an undecidable problem.
In the given example, the teacher is actually either performing that computation mentally (going step by step, applying his own sequence of rules), or gives an estimation based on his experience. But, there's no generic algorithm that guarantees his sequence of steps are the simplest possible, nor that his resulting expression is the simplest one (except for trivial expressions), and hence any quantification of "distance" to a solution is meaningless.
Wouldn't all this be true, your problem would be simple: you know the number of steps, you know how many steps you've taken so far, you divide the latter by the former ;-)
Now, returning to the criteria of simplicity, I also advice you to take a look on Hilbert's 24th problem, that specifically looked for a "Criteria of simplicity, or proof of the greatest simplicity of certain proofs.", and the slightly related proof compression. If you are philosophically inclined to further understand these subjects, I would suggest reading the classic Gödel, Escher, Bach.
Further notes: To understand why, consider a well-known mathematical artefact called the Mandelbrot fractal set. Each pixel color is calculated by determining if the solution to the equation z(n+1) = z(n)^2 + c for any specific c is bounded, that is, "a complex number c is part of the Mandelbrot set if, when starting with z(0) = 0 and applying the iteration repeatedly, the absolute value of z(n) remains bounded however large n gets." Despite the equation being extremely simple (you know, square a number and sum a constant), there's absolutely no way to know if it will remain bounded or not without actually performing an infinite number of iterations or until a cycle is found (disregarding complex heuristics). In this sense, every fractal out there is a rough approximation that typically usages an escape time algorithm as an heuristic to provide an educated guess whether the solution will be bounded or not.

Recursive hypothesis-building with ambiguites - what's it called?

There's a problem I've encountered a lot (in the broad fields of data analyis or AI). However I can't name it, probably because I don't have a formal CS background. Please bear with me, I'll give two examples:
Imagine natural language parsing:
The flower eats the cow.
You have a program that takes each word, and determines its type and the relations between them. There are two ways to interpret this sentence:
1) flower (substantive) -- eats (verb) --> cow (object)
using the usual SVO word order, or
2) cow (substantive) -- eats (verb) --> flower (object)
using a more poetic world order. The program would rule out other possibilities, e.g. "flower" as a verb, since it follows "the". It would then rank the remaining possibilites: 1) has a more natural word order than 2), so it gets more points. But including the world knowledge that flowers can't eat cows, 2) still wins. So it might return both hypotheses, and give 1) a score of 30, and 2) a score of 70.
Then, it remembers both hypotheses and continues parsing the text, branching off. One branch assumes 1), one 2). If a branch reaches a contradiction, or a ranking of ~0, it is discarded. In the end it presents ranked hypotheses again, but for the whole text.
For a different example, imagine optical character recognition:
** **
** ** *****
** *******
******* **
* ** **
** **
I could look at the strokes and say, sure this is an "H". After identifying the H, I notice there are smudges around it, and give it a slightly poorer score.
Alternatively, I could run my smudge recognition first, and notice that the horizontal line looks like an artifact. After removal, I recognize that this is ll or Il, and give it some ranking.
After processing the whole image, it can be Hlumination, lllumination or Illumination. Using a dictionary and the total ranking, I decide that it's the last one.
The general problem is always some kind of parsing / understanding. Examples:
Natural languages or ambiguous languages
OCR
Path finding
Dealing with ambiguous or incomplete user imput - which interpretations make sense, which is the most plausible?
I'ts recursive.
It can bail out early (when a branch / interpretation doesn't make sense, or will certainly end up with a score of 0). So it's probably some kind of backtracking.
It keeps all options in mind in light of ambiguities.
It's based off simple rules at the bottom can_eat(cow, flower) = true.
It keeps a plausibility ranking of interpretations.
It's recursive on a meta level: It can fork / branch off into different 'worlds' where it assumes different hypotheses when dealing with the next part of data.
It'll forward the individual rankings, probably using bayesian probability, to dependent hypotheses.
In practice, there will be methods to train this thing, determine ranking coefficients, and there will be cutoffs if the tree becomes too big.
I have no clue what this is called. One might guess 'decision tree' or 'recursive descent', but I know those terms mean different things.
I know Prolog can solve simple cases of this, like genealogies and finding out who is whom's uncle. But you have to give all the data in code, and it doesn't seem convienent or powerful enough to do this for my real life cases.
I'd like to know, what is this problem called, are there common strategies for dealing with this? Is there good literature on the topic? Are there libraries for ideally C(++), Python, were you can just define a bunch of rules, and it works out all the rankings and hypotheses?
I don't think there is one answer that fits all the bullet points you have. But I hope my links will lead you closer to an answer or might give you a different question.
I think the closest answer is Bayesian network since you have probabilities affecting each other as I understand it, it is also related to Conditional probability and Fuzzy Logic
You also describe a bit of genetic programming as well as Artificial Neural Networks
I can name drop some more topics which might be related:
http://en.wikipedia.org/wiki/Rule-based_programming
http://en.wikipedia.org/wiki/Expert_system
http://en.wikipedia.org/wiki/Knowledge_engineering
http://en.wikipedia.org/wiki/Fuzzy_system
http://en.wikipedia.org/wiki/Bayesian_inference

Resources