I am working on a project that involves evolving an FNN's topology at runtime, and I want to offload the network's output calculation to my GPU. I am currently using Unity (soon switching to plain C# libraries or something similar without Unity), so I use HLSL-style compute shaders.
Since I am very restricted by HLSL's syntax (no arrays/matrices with dynamic index ranges, and the dot-product and matrix functions only work on the built-in types like float2, with those "vectors" capped at a length of 4, i.e. float4), I am looking for a formula with which I can calculate the output of the FNN in a single calculation (obviously still including a loop). My shader has the weights, the input layer, the biases and the structure of the NN each as a separate structured buffer. I need a formula that calculates the output without dot products or matrix-vector multiplications, since those are very hard and tortuous to implement here. I have been trying to find such a formula for days, hopelessly...
Does anyone have a clue? Sadly there are not many resources on the internet concerning HLSL and this problem.
Thanks a lot!
EDIT:
Okay, I got some hints that my actual math-related question is not quite clear. Since I am kind of an artist myself, I made a little illustration:
There are of course multiple ways to calculate the output O. I want to note the following:
v3 = v1*w1 + v2*w2
v4 = v1*w3 + v2*w4
v5 = v3*w5 + v4*w6 = w5*(v1*w1 + v2*w2) + w6*(v1*w3 + v2*w4)
This is an easy way to calculate the final output O in one step:
O1 = w5*(v1*w1 + v2*w2) + w6*(v1*w3 + v2*w4).
There are other quite cool ways to calculate the output in one step. Looking at the weights as matrices, e.g.
m1 = [[w1, w2], [w3, w4]] and m2 = [[w5, w6]].
This way we can calculate O as
L1 = m1 * I and O = m2 * L1
or in one step
O = m2*(m1*I)
which is the more elegant way. I would prefer it this way, but I can't use matrix-vector multiplications or any other linear algebra since I am restricted by my programming tools, so I have to stay with this shape:
O1 = w5*(v1*w1 + v2*w2) + w6*(v1*w3 + v2*w4).
Now, this would be really easy if I had a neural network with a fixed topology. However, since it evolves during runtime, I have to find a function/formula with which I can calculate O independently of the topology. All the information I have is ONE list of ALL weights (w = [w1, w2, w3, w4, w5, w6]), a list of inputs (i = [v1, v2]) and the structure of the NN (s = [2, 2, 1] - 2 input nodes, one hidden layer with 2 nodes, and one output node).
However, I can't think of an algorithm that calculates the output efficiently from the given information.
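For reference, here is the kind of loop I am imagining, written as a Python sketch rather than HLSL just to show the index arithmetic. The weight ordering (layer by layer and, within a layer, node by node) is an assumption, and biases/activation functions are left out:

```python
def forward(weights, inputs, structure):
    """Feed-forward pass from a flat weight list, the inputs and the layer sizes only.

    weights   -- flat list [w1, w2, ...], assumed ordered layer by layer and,
                 within a layer, node by node (one weight per previous-layer node)
    inputs    -- input activations, e.g. [v1, v2]
    structure -- layer sizes, e.g. [2, 2, 1]
    Biases and activation functions are omitted to keep the sketch short.
    """
    activations = list(inputs)
    offset = 0                                   # index of the next unused weight
    for layer in range(1, len(structure)):
        next_activations = []
        for node in range(structure[layer]):
            total = 0.0
            for prev in range(structure[layer - 1]):
                total += activations[prev] * weights[offset]
                offset += 1
            next_activations.append(total)
        activations = next_activations
    return activations

# Example from the question: s = [2, 2, 1], w = [w1..w6], i = [v1, v2]
# forward([w1, w2, w3, w4, w5, w6], [v1, v2], [2, 2, 1]) returns [O1]
```

In HLSL the same running-offset bookkeeping would, I think, map directly onto indices into the structured buffers, since nothing beyond integer indexing and multiply-accumulate is needed.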
I'm not looking for a specific line of code - just built-in functions or common packages that may help me do the following. Basically, something like: write up some code and use this function. I'm stuck on how to actually optimize - should I use SGD?
I have two variables, X and Y. I want to separate the observations into 4 groups based on Y so that the L2 criterion, $\sum_{j=1}^{4}\sum_{i \in G_j}\left(X_{ji} - \overline{X}_{j}\right)^2$ (where group $G_j$ is defined by the cutoffs on Y and $\overline{X}_{j}$ is the mean of X within that group), is minimized, subject to the constraint that there are at least n observations in each group.
How would one go about solving this? I'd imagine you can't do this with the optim function? Basically the algorithm needs to move 3 values around (there are 3 cutoff points for Y) until the L2 criterion is minimized, subject to each group containing at least n observations.
Thanks
You could try optim and simply add a penalty if the constraints are not satisfied: since you are minimising, add zero if all the constraints are okay and a positive number otherwise.
If that does not work, then since you are only looking for three cutoff points, I'd probably try a grid search, i.e. compute the objective function for different combinations of cutoff points, throw away those that violate the constraints, and keep the best remaining solution.
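In case it helps, here is a rough sketch of the grid-search-plus-penalty idea in Python rather than R, purely to show the structure; the quantile grid and the function names are made up for illustration:

```python
import itertools
import numpy as np

def objective(x, y, cutoffs, min_size):
    """Within-group sum of squared deviations of x; groups are defined by cutoffs on y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    groups = np.digitize(y, cutoffs)              # assigns each observation to one of 4 groups
    total = 0.0
    for g in range(len(cutoffs) + 1):
        xg = x[groups == g]
        if len(xg) < min_size:                    # penalty: infeasible cutoffs are discarded
            return np.inf
        total += np.sum((xg - xg.mean()) ** 2)
    return total

def grid_search(x, y, min_size, n_grid=30):
    """Try every ordered triple of candidate cutoffs taken from quantiles of y."""
    candidates = np.unique(np.quantile(y, np.linspace(0.05, 0.95, n_grid)))
    best, best_value = None, np.inf
    for cut in itertools.combinations(candidates, 3):
        value = objective(x, y, np.array(cut), min_size)
        if value < best_value:
            best, best_value = cut, value
    return best, best_value
```

With optim (or scipy's optimizers) the same objective function can be used directly; the penalty simply makes infeasible cutoffs unattractive.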
I am trying the R package apcluster on a set of objects that I want to cluster, but I'm running into performance/memory problems, and I suspect I'm not doing it right. I'd like to hear your opinion, please.
In short: I have a set of about 13000 objects. Each object is associated with a set of 2 to 5 'features'. The similarity (by which I want to cluster, eventually) between any two objects i and j is equal to the number of features they have in common divided by the total number of distinct features they 'span'. E.g. if i = {a,b,c} and j = {c,d}, then sim[i,j] = 1/4 = 0.25, because they have only 1 feature in common ({c}) and in total they describe 4 distinct features ({a,b,c,d}).
Calculating my NxN similarity matrix is not a problem in theory: it can be done using set operations if each object's features are stored as a list; or features can be pivoted to a matrix of 1's and 0's, where each column is a feature, and then R's function dist with method="binary" does the trick.
In practice, however, the first problem is that such similarity calculations are extremely slow. For 13 K objects there are about 84.5 M similarities to calculate, which doesn't sound so bad for a modern computer, so I don't understand why it should take a few hours. And the set-operation version, which should be quicker as far as I can tell, is actually much slower than dist. [Another package called fingerprint is supposed to deal with such cases more efficiently, but so far I haven't been able to make it work; it gives a lot of errors when trying to create what they call 'featvec' objects.]
The other thing to consider is that the 2-5 features per object are not very repetitive. There may be a group of 100 or so objects with at least one feature in common between them, but then none of the other 12.9 K objects has any feature in common with these 100 objects. The consequence is that the pivoted feature matrix is very sparse (if we consider 0's as empty). There are about 4000 columns in the pivoted matrix, and each row has at most 5 1's. I wonder if this is negatively impacting the performance of dist, in that it has to multiply through a lot of 0's that could instead be ignored.
Does it seem normal to you that it should take a few hours to apply dist to a matrix like the one I described? Can you suggest a different way to calculate the similarity that takes advantage of the sparseness of the matrix?
Anyway, I managed to get the output from dist, but it had class 'dist' and was a distance matrix, not a similarity matrix, so I had to use 1 - as.matrix(distance_matrix) to build the similarity matrix apcluster needs as input.
That's when I got the first 'memory' problem. R said the vector could not be allocated due to its size. I tried the usual tricks, but in the end I could not get more than 4 GB, and my matrices are (apparently) bigger.
I overcame this by assigning each new matrix back to its old 'self' each time.
And then when I submitted this painstakingly put together similarity matrix to apcluster, again the vector size error popped up, as if the first thing apcluster did was create some other large object from what I had fed it.
I had a look at as.Sparse... in apcluster, but it does not seem to help a lot, considering that you have to calculate the full matrix first anyway.
In the end the only thing that worked a little bit was 'leveraged affinity propagation' by apclusterL, which however is an approximation.
Does anybody know if and how I could do this better? E.g. is it wise to pivot the data first, or should I stick to list and set operations? Or, can the fact that the initial matrix is sparse be used to compute directly a sparse similarity matrix, rather than compute it fully and reduce it to sparse later?
Any advice would be greatly appreciated. Thanks!
BTW, yes, I saw this thread: Cluster Analysis in R on large sparse matrix; it does not seem to have been answered conclusively.
The R interpreter is really slow.
So you should use R mostly to "drive" your program, and implement all the computation-heavy stuff in C or FORTRAN.
You didn't show the code you are using, but I guess it involves nested for loops? Try to rewrite it without any for loops in R, or rewrite it in C.
But no matter what, AP clustering will always remain very slow. It involves many passes over O(n²) matrices, i.e. it scales very badly.
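For the similarity step in particular, the loop-free version can be written as one sparse matrix product: if A is the 0/1 object-by-feature matrix, A * A^T gives every pairwise intersection count at once, and the union follows from the row sums. A small sketch of that idea (in Python/scipy here only for illustration; R's Matrix package offers the same kind of sparse tcrossprod):

```python
import numpy as np
from scipy import sparse

def feature_overlap_similarity(A):
    """Pairwise |intersection| / |union| for a sparse 0/1 object-by-feature matrix A."""
    A = sparse.csr_matrix(A, dtype=np.float64)
    inter = np.asarray((A @ A.T).todense())        # inter[i, j] = number of shared features
    sizes = np.asarray(A.sum(axis=1)).ravel()      # number of features per object
    union = sizes[:, None] + sizes[None, :] - inter
    return inter / union

# The example from the question: i = {a, b, c}, j = {c, d}  ->  similarity 1/4
A = np.array([[1, 1, 1, 0],
              [0, 0, 1, 1]])
print(feature_overlap_similarity(A)[0, 1])         # 0.25
```

Note that the result is still a dense NxN matrix, so for 13 K objects it occupies on the order of a gigabyte; this removes the per-pair loop but not the memory cost that apcluster itself incurs.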
I'm looking for something that I guess is rather sophisticated and might not exist publicly, but hopefully it does.
I basically have a database with lots of items which all have values (y) that correspond to other values (x). E.g. one of these items might look like:
x | 1 | 2 | 3 | 4 | 5
y | 12 | 14 | 16 | 8 | 6
This is just a random example. Now, there are thousands of these items, all with their own set of x and y values. The spacing between one x value and the next is not fixed and may differ for every item.
What I'm looking for is a library where I can plug in all these sets of Xs and Ys and have it return things like the most common item (sets of x and y that follow a comparable curve/progression), and the ability to check whether a certain set is at least x% comparable with another set.
By comparable I mean the slope of the curve if you were to draw a graph of the data. So, not the actual static values, but rather the detection of events, such as a sharp increase followed by a slow decrease, etc.
Due to my limited experience in mathematics I'm not quite sure what the thing I'm looking for is called, and so I have trouble explaining what I need. Hopefully I gave enough pointers for someone to point me in the right direction.
I'm mostly interested in a library for JavaScript, but if there is no such thing, any library would help; maybe I can try to port what I need.
About Markov Cluster(ing) again, of which I happen to be the author, and your application. You mention you are interested in trend similarity between objects. This is typically computed using Pearson correlation. If you use the mcl implementation from http://micans.org/mcl/, you'll also obtain the program 'mcxarray'. This can be used to compute Pearson correlations between e.g. rows in a table. It might be useful to you. It is able to handle missing data - in a simplistic approach, it just computes correlations on those indices for which values are available for both. If you have further questions I am happy to answer them -- with the caveat that I usually like to cc replies to the mcl mailing list so that they are archived and available for future reference.
What you're looking for is an implementation of Markov clustering. It is often used for finding groups of similar sequences. Porting it to JavaScript, well... If you're really serious about this analysis, drop JavaScript as soon as possible and move to R. JavaScript is not meant for this kind of computation and is far too slow for it. R is a statistical package with a great deal already implemented. It is also designed specifically for very fast matrix calculations, and most of the language is vectorized (meaning you don't need for-loops to apply a function over a vector of values; it happens automatically).
For the markov clustering, check http://www.micans.org/mcl/
An example of an implementation : http://www.orthomcl.org/cgi-bin/OrthoMclWeb.cgi
Now you also need to define a "distance" between your sets. As you are interested in the events and not the values, you could give every item an extra attribute: a vector of the differences y[i] - y[i-1] (in R: diff(y)). The distance between two items can then be calculated as the sum of squared differences between those difference vectors.
This allows you to construct a distance matrix for your items, and on that you can call the mcl algorithm. Unless you work on Linux, you'll have to port it.
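To make the diff-based distance concrete, here is a tiny sketch (Python only for illustration; in R it is literally diff(y) followed by a sum of squares):

```python
import numpy as np

def event_distance(y1, y2):
    """Distance between two series based on their step-to-step changes, not their raw values."""
    d1, d2 = np.diff(y1), np.diff(y2)          # analogue of R's diff(y)
    return float(np.sum((d1 - d2) ** 2))       # sum of squared differences of the changes

# Two curves with the same shape but a constant vertical offset are at distance 0:
print(event_distance([12, 14, 16, 8, 6], [22, 24, 26, 18, 16]))   # 0.0
```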
What you want to do is ANOVA, or ANalysis Of VAriance. If you run the numbers through an ANOVA test, it'll give you information about the dataset that will help you compare one item to another. I was unable to locate a JavaScript library that performs ANOVA, but there are plenty of programs capable of it. Excel can perform ANOVA via a plugin. R is a free stats package that can also perform ANOVA.
Hope this helps.
Something simple is (assuming all the graphs have 5 points, and x = 1,2,3,4,5 always)
Take u1 = the first point of y, i.e. y1
Take u2 = y2 - y1
...
Take u5 = y5 - y4
Now consider the vector u as a point in 5-dimensional space. You can use simple clustering algorithms, like k-means.
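A short sketch of that approach, assuming scikit-learn's k-means is acceptable (any k-means implementation would do; the sample data is invented):

```python
import numpy as np
from sklearn.cluster import KMeans

def to_feature_vector(y):
    """u = (y1, y2 - y1, ..., y5 - y4), as suggested above."""
    y = np.asarray(y, dtype=float)
    return np.concatenate(([y[0]], np.diff(y)))

series = [
    [12, 14, 16, 8, 6],     # the example curve from the question
    [13, 15, 17, 9, 7],     # same shape, shifted up by one
    [1, 2, 4, 8, 16],       # a very different shape
]
X = np.array([to_feature_vector(y) for y in series])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)               # the two similar curves should share a label
```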
EDIT: You should not aim for something too complicated as long as you stay with JavaScript. If you want to go with Java, I can suggest something based on PCA (requiring singular value decomposition, which is too complicated to implement efficiently in JS).
Basically, it goes like this: take, as before, a (possibly large) linear representation of the data, perhaps differences of components of x and of y, or absolute values. For instance you could take
u = (x1, x2 - x1, ..., x5 - x4, y1, y2 - y1, ..., y5 - y4)
You compute the vector u for each sample. Call ui the vector u for the ith sample. Now, form the matrix
M_{ij} = dot product of ui and uj
and compute its SVD. Now the N most significant singular values (i.e. those above some "similarity threshold") give you N clusters.
The corresponding columns of the matrix U in the SVD give you an orthonormal family B_k, k = 1..N. The squared ith component of B_k gives you the probability that the ith sample belongs to cluster K.
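For concreteness, here is a rough numerical sketch of that recipe (numpy used just to keep it short; the similarity threshold is treated as a free parameter, and the probabilistic reading of the squared components is taken from the description above):

```python
import numpy as np

def svd_clusters(samples, threshold):
    """samples: list of (x, y) pairs of equal length; threshold: smallest singular value kept."""
    # Linear representation u = (x1, x2 - x1, ..., y1, y2 - y1, ...) as described above
    feats = np.array([
        np.concatenate(([x[0]], np.diff(x), [y[0]], np.diff(y)))
        for x, y in samples
    ])
    M = feats @ feats.T                       # M[i, j] = dot product of u_i and u_j
    U, s, _ = np.linalg.svd(M)
    keep = s > threshold                      # the N "significant" singular values
    return U[:, keep] ** 2                    # row i: membership weights of sample i per cluster
```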
If it is OK to use Java, you really should have a look at Weka. It is possible to access all of its features via Java code. Maybe you'll find a Markov clustering there, but if not, they have a lot of other clustering algorithms and it's really easy to use.
I am using Latent semantic analysis for text similarity. I have 2 questions.
How do I select the K value for dimension reduction?
I have read in many places that LSI works for words with similar meanings, for example car and automobile. How is that possible? What is the magic step I am missing here?
The typical choice for k is 300. Ideally, you set k based on an evaluation metric that uses the reduced vectors. For example, if you're clustering documents, you could select the k that maximizes the clustering solution score. If you don't have a benchmark to measure against, then I would set k based on how big your data set is. If you only have 100 documents, then you wouldn't expect to need several hundred latent factors to represent them. Likewise, if you have a million documents, then 300 may be too small. However, in my experience the resulting vectors are fairly robust to large changes in k, provided that k is not too small (i.e., k = 300 does about as well as k = 1000).
You might be confusing LSI with Latent Semantic Analysis (LSA). They're very related techniques, with the difference being that LSI operates on documents, and LSA operates on words. Both approaches use the same input (a term x document matrix). There are several good open source LSA implementations if you would like to try them. The LSA wikipedia page has a comprehensive list.
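To make the "pick k against a downstream measure" advice concrete, here is a minimal sketch with scikit-learn (the toy corpus and the choice of a plain count vectoriser are assumptions for illustration only):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the car needs a new engine",
    "the automobile engine was repaired",
    "we cooked dinner at home",
    "dinner at home was great",
]

X = CountVectorizer().fit_transform(docs)               # term-document count matrix
for k in (2, 3):                                        # candidate values of k
    reduced = TruncatedSVD(n_components=k, random_state=0).fit_transform(X)
    print(k)
    print(cosine_similarity(reduced).round(2))          # compare documents in the reduced space
```

With a real corpus you would replace the printout with whatever evaluation score you trust (clustering quality, retrieval accuracy, and so on) and keep the k that maximises it.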
Try a couple of different values from [1..n] and see what works for whatever task you are trying to accomplish.
Make a word-word correlation matrix [i.e. cell (i,j) holds the number of docs in which words i and j co-occur] and use something like PCA on it.
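A toy sketch of that co-occurrence idea (every value below is invented purely for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy term-document matrix: rows = words, columns = documents, 1 = word appears in the doc
words = ["car", "automobile", "engine", "dinner"]
term_doc = np.array([
    [1, 0, 1, 0],   # car
    [0, 1, 1, 0],   # automobile
    [1, 1, 1, 0],   # engine
    [0, 0, 0, 1],   # dinner
])

cooc = term_doc @ term_doc.T        # cell (i, j) = number of docs in which words i and j co-occur
embedding = PCA(n_components=2).fit_transform(cooc)
for word, vec in zip(words, embedding.round(2)):
    print(word, vec)                # "car" and "automobile" land closer to each other than to "dinner"
```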
Disclaimer
This is not strictly a programming question, but most programmers sooner or later have to deal with math (especially algebra), so I think the answer could turn out to be useful to someone else in the future.
Now the problem
I'm trying to check if m vectors of dimension n are linearly independent. If m == n you can just build a matrix using the vectors and check if the determinant is != 0. But what if m < n?
Any hints?
See also this video lecture.
Construct a matrix from the vectors (one row per vector) and perform Gaussian elimination on it. If any of the matrix rows cancels out (reduces to all zeros), the vectors are not linearly independent.
The trivial case is when m > n: in that case they cannot be linearly independent.
Construct a matrix M whose rows are the vectors and determine the rank of M. If the rank of M is less than m (the number of vectors), then there is a linear dependence. In the algorithm that determines the rank of M you can stop as soon as you obtain one row of zeros, but running it to completion has the added bonus of providing the dimension of the space spanned by the vectors. Oh, and the algorithm to determine the rank of M is merely Gaussian elimination.
Take care with numerical instability. See the warning at the beginning of chapter two of Numerical Recipes.
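As a practical shortcut, the rank test above is a single call in most numerical libraries; a small sketch with numpy, whose matrix_rank is based on the SVD and therefore also softens the numerical-stability issue:

```python
import numpy as np

def are_independent(vectors):
    """vectors: m rows of length n; they are independent iff the matrix has rank m."""
    M = np.array(vectors, dtype=float)
    return np.linalg.matrix_rank(M) == M.shape[0]

print(are_independent([[1, 0, 0], [0, 1, 0]]))     # True  (m = 2 < n = 3)
print(are_independent([[1, 2, 3], [2, 4, 6]]))     # False (second row = 2 * first row)
```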
If m < n, you will have to do some operation on them (there are multiple possibilities: Gaussian elimination, orthogonalization, etc.; almost any transformation that can be used for solving equations will do) and check the result (e.g. Gaussian elimination => a zero row or column, orthogonalization => a zero vector, SVD => a zero singular value).
However, note that this is an awkward question for a programmer to ask and an awkward problem for a program to solve, because every linearly dependent set of m <= n vectors has a linearly independent set arbitrarily close to it (i.e. the problem is numerically unstable).
I have been working on this problem recently.
Previously I found some algorithms for Gaussian or Gauss-Jordan elimination, but most of them only apply to square matrices, not to general matrices.
For a general matrix, one of the best references might be this:
http://rosettacode.org/wiki/Reduced_row_echelon_form#MATLAB
You can find both pseudo-code and source code in various languages.
As for me, I translated the Python source code to C++, because the C++ code provided at the above link is somewhat complex and was awkward to integrate into my simulation.
Hope this will help you, and good luck ^^
If computing power is not a problem, probably the best way is to find singular values of the matrix. Basically you need to find eigenvalues of M'*M and look at the ratio of the largest to the smallest. If the ratio is not very big, the vectors are independent.
Another way to check that m row vectors are linearly independent, when put into a matrix M of size m x n, is to compute
det(M * M^T)
i.e. the determinant of an m x m square matrix. It will be zero if and only if M has some dependent rows. However, Gaussian elimination should in general be faster.
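A quick sketch of the det(M * M^T) test (numpy here only for illustration; the tolerance is an arbitrary choice, since comparing a floating-point determinant to exactly zero is fragile):

```python
import numpy as np

def independent_by_gram(vectors, tol=1e-10):
    """m row vectors of length n are independent iff det(M M^T) is (numerically) nonzero."""
    M = np.array(vectors, dtype=float)
    gram = M @ M.T                              # m x m matrix of pairwise dot products
    return abs(np.linalg.det(gram)) > tol

print(independent_by_gram([[1, 0, 0], [0, 1, 0]]))     # True
print(independent_by_gram([[1, 2, 3], [2, 4, 6]]))     # False
```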
Sorry man, my mistake...
The source code provided at the above link turns out to be incorrect - at least the Python code I tested, and the C++ code I transformed from it, do not generate the right answer all the time (though for the example at the above link, the result is correct).
To test the Python code, simply replace mtx with
[30,10,20,0],[60,20,40,0]
and the returned (incorrect) result is:
[1,0,0,0],[0,1,2,0]
Nevertheless, I found a way out: this time I translated the MATLAB source code of the rref function to C++. You can run MATLAB and use the type rref command to get the source code of rref.
Just note that if you are working with really large or really small values, make sure to use the long double datatype in C++. Otherwise, the result will be truncated and inconsistent with the MATLAB result.
I have been conducting large simulations in ns2, and all the observed results are sound.
Hope this will help you and anyone else who has encountered the problem...
A very simple way, though not the most computationally efficient, is to simply remove random rows until m = n and then apply the determinant trick:
m < n: remove rows (make the vectors shorter) until the matrix is square, and then
m = n: check if the determinant is 0 (as you said)
m > n (the number of vectors is greater than their length): they are linearly dependent (always).
The reason, in short, is that the homogeneous system Av = 0 (with the vectors as the columns of A) then has more unknowns than equations, so it always has a nontrivial solution. For a better explanation, see Wikipedia, which explains it better than I can.