I'd like to understand better how outer works and how to vectorize functions. Below is a minimal example of what I am trying to do: I have a set of numbers 2, 3, 4. For each combination (a, b) of them, I create a diagonal matrix with a a b b b on the diagonal and then do something with it, e.g. calculate its determinant (this is just for demonstration purposes). The results of the calculation should be written into a 3 by 3 matrix, one field for each combination.
The code below isn't working. Apparently outer (or my.func) doesn't understand that I don't want the whole lambdas vector to be passed in at once; you can see this when you uncomment the included print command.
lambdas <- c(1:2)
my.func <- function(lambda1,lambda2){
# print(diag(c(rep(lambda1,2),rep(lambda2,2))))
det(diag(c(rep(lambda1,2),rep(lambda2,2))))
}
det.summary <- outer(lambdas,lambdas, FUN="my.func")
How do I need to modify my function or the call of outer so things behave like I'd like to?
I guess I need to vectorize my function somehow, but I don't know how, or how the outer call would then be processed differently.
Edit:
I've changed the size of the matrices to make it a bit less messy. I'd like to generate four diagonal 4 by 4 matrices with the following diagonals (the corresponding parameters lambda1, lambda2 are in brackets):
1 1 1 1 (1,1), 1 1 2 2 (1,2), 2 2 1 1 (2,1), 2 2 2 2 (2,2).
Then I want to calculate their determinants (an arbitrary choice here) and put the results into a matrix whose first column corresponds to lambda1=1, the second to lambda1=2, and whose rows correspond to the choice of lambda2. det.summary should be a 2 by 2 matrix with the following values:
1 4
4 16
as these are the determinants of the diagonal matrices listed above.
What do you know, there is a Vectorize function (capital "V")!
outer(lambdas,lambdas, Vectorize(my.func))
# [,1] [,2]
# [1,] 1 4
# [2,] 4 16
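As you figured out (and as it took me a while to figure out too), outer requires the function to be vectorized: it calls FUN only once, passing two vectors that together hold every (X, Y) combination, and expects one result per combination. In some ways this is the opposite of the *apply functions, which effectively vectorize an operation by feeding the function each value in turn. Vectorize bridges the gap by wrapping my.func so that it is called once per pair, as shown above.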
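To see why outer needs a vectorized function, here is a rough Python model of what it does. Python stands in for R here, and outer, vectorize and my_func below are hypothetical stand-ins, not R's implementation. The point it illustrates: FUN is called once, on two expanded vectors, and must return one value per (x, y) pair; element [i, j] of the result is FUN(X[i], Y[j]), so rows follow the first argument.

```python
import math

def outer(xs, ys, fun):
    """Sketch of R's outer(): ONE call to fun on expanded vectors, then reshape."""
    # R expands X as rep(X, times = length(Y)) and Y as rep(Y, each = length(X)).
    big_x = [x for _ in ys for x in xs]
    big_y = [y for y in ys for _ in xs]
    flat = fun(big_x, big_y)
    if not isinstance(flat, list) or len(flat) != len(big_x):
        raise ValueError("fun must return one value per (x, y) pair")
    # Fold column-major, so result[i][j] == fun(xs[i], ys[j]).
    n = len(xs)
    return [[flat[i + n * j] for j in range(len(ys))] for i in range(n)]

def vectorize(f):
    """Rough analogue of R's Vectorize(): call f once per element pair."""
    return lambda xs, ys: [f(x, y) for x, y in zip(xs, ys)]

def my_func(lambda1, lambda2):
    """det(diag(c(rep(lambda1, 2), rep(lambda2, 2)))); like rep(), also accepts lists."""
    as_list = lambda v: v if isinstance(v, list) else [v]
    diagonal = as_list(lambda1) * 2 + as_list(lambda2) * 2
    return math.prod(diagonal)  # determinant of a diagonal matrix = product of the diagonal
```

Calling outer(lambdas, lambdas, my_func) without the wrapper raises the error: my_func then builds one 16-element diagonal from the whole vectors and returns a single number, which mirrors the failure described in the question.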
Let's say I have a vector V and I want to turn it into multiple m x n matrices (or otherwise get multiple m x n matrices out of it).
For the most basic example: turn V = collect(1:75) into three 5x5 matrices.
As far as I am aware, this can be done by first using reshape(V, 5, :) and then looping through the result. Is there a better way in Julia without using a loop?
If possible, a solution that can easily switch between row-major and column-major results is preferable.
TL;DR
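m, n, n_matrices = 4, 2, 5
V = collect(1:m*n*n_matrices)
V = reshape(V, m, n, :)        # m x n x n_matrices array; V[:,:,k] is the k-th m x n matrix, filled column by column
V = permutedims(V, [2,1,3])    # swap the first two dims: each V[:,:,k] becomes its n x m transpose, i.e. filled row by row
display(V)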
From my limited knowledge about Julia:
When doing V = collect(1:m*n*n_matrices), you allocate a contiguous array in memory. From V you wish to create a container of m by n matrices. You can achieve this with reshape(V, m, n, :) and then access the first matrix with V[:,:,1]. The "container" in this case is just another array dimension (so you have a three-dimensional array), which we interpret as "an array of matrices" (but you could also interpret it as a box). You can then transpose every matrix in the array by swapping the first two dimensions: permutedims(V, [2,1,3]).
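The same reshape-then-permute idea can be sketched with plain index arithmetic. This is Python, not Julia, and split_into_matrices is a hypothetical helper: it reads each block of m*n values either column-major (Julia's layout, what reshape(V, m, n, :) gives) or row-major (what reshape(V, n, m, :) followed by permutedims(V, [2,1,3]) gives).

```python
def split_into_matrices(v, m, n, row_major=False):
    """Split flat list v into len(v) // (m*n) matrices of size m x n (lists of rows)."""
    size = m * n
    if len(v) % size:
        raise ValueError("len(v) must be a multiple of m*n")
    mats = []
    for k in range(len(v) // size):
        block = v[k * size:(k + 1) * size]
        if row_major:
            # row-major: row i is the i-th run of n consecutive values
            mats.append([block[i * n:(i + 1) * n] for i in range(m)])
        else:
            # column-major (Julia's layout): element (i, j) sits at offset i + m*j
            mats.append([[block[i + m * j] for j in range(n)] for i in range(m)])
    return mats
```

With the TL;DR numbers (m, n = 4, 2) the first column-major block is [[1, 5], [2, 6], [3, 7], [4, 8]]; its transpose is split_into_matrices(v, 2, 4, row_major=True)[0], i.e. [[1, 2, 3, 4], [5, 6, 7, 8]], which is why permutedims yields row-major n x m matrices.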
How this works
From what I understand, n-dimensional arrays in Julia are stored contiguously in memory as long as you don't do any "skipping" (e.g. V[1:2:end]). For example, the 2 x 4 matrix A:
1 3 5 7
2 4 6 8
is stored in memory as just 1 2 3 4 5 6 7 8. The matrix is simply an interpretation of that data: the first two numbers make up the first column, the next two make up the second column, and so on. The reshape function only changes that interpretation. So reshape(A, 4, 2) reads the same memory as "the first four values make the first column, the next four values make the second column", and we get:
1 5
2 6
3 7
4 8
We are basically doing the same thing here, but with an extra dimension.
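A small Python model of this idea (ColMajorView is a hypothetical class, not a Julia API, and indices are 0-based): a flat buffer plus a shape, where element (i, j) of a rows x cols matrix sits at offset i + rows*j, and reshape builds a new view over the same buffer without copying anything.

```python
class ColMajorView:
    """A flat buffer plus a shape, read in column-major order (as Julia does)."""

    def __init__(self, buffer, rows, cols):
        if rows * cols != len(buffer):
            raise ValueError("shape does not match buffer length")
        self.buffer, self.rows, self.cols = buffer, rows, cols

    def __getitem__(self, ij):
        i, j = ij  # 0-based row and column
        return self.buffer[i + self.rows * j]

    def reshape(self, rows, cols):
        # No data is copied: the new view shares the same buffer.
        return ColMajorView(self.buffer, rows, cols)

    def to_rows(self):
        return [[self[i, j] for j in range(self.cols)] for i in range(self.rows)]
```

Because both views share one buffer, writing through one is visible through the other, which matches the "reshape only reinterprets memory" description above.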
From my observations, permutedims also allocates a new array and copies the data into it, whereas reshape reuses the original memory. Feel free to correct me if I am wrong.
Old answer:
I don't know much about Julia, but in Python using NumPy I would have done something like this:
np.reshape(V, (-1, m, n))   # NumPy: V[k] is the k-th m x n matrix, filled row by row
EDIT: As @BatWannaBe states, the result is technically one array (but three-dimensional). You can always interpret a three-dimensional array as a container of 2D arrays, which from my understanding is what you asked for.
I would like to check whether 2 vectors are the same in APL. Right now I am using the following solution (comparing element by element, summing the results and comparing the sum with the length of vector a):
a←1 2 3
b←1 2 3
(+/a=b)=⍴a ⍝ it needs to return 0 or 1
Is there any quicker or more idiomatic solution?
You can use the match function (≡), which compares its arguments as a whole, rather than equals (=), which is a scalar function that compares corresponding elements of its arguments:
a←1 2 3
b←1 2 3 4 5
c←1 2 3
a≡b
0
a≡c
1
The match primitive, as mentioned above, returns 1 if the arguments are exactly identical: they have the same rank, shape, data type and content. In a few cases match will return a false negative because of data-type issues (e.g. a division that produces a floating-point result even though it is within the comparison tolerance of an integer), or because a scalar will not match a 1-element vector.
^/a=b
will return 1 if all elements of a test equal to the corresponding elements of b, but it will fail with a LENGTH ERROR if a and b have different lengths, and it uses scalar extension, so if a is 1 1 1 and b is the scalar 1, the result will be 1.
Match is usually better for this, and it is also more efficient on large arrays.
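A rough Python model of the two approaches, where plain lists stand in for APL vectors and ints for scalars (an assumption: real APL arrays also carry rank and type, which this ignores). equal, all_equal and match are hypothetical names. Note that, unlike ≡, Python's == treats 1 and 1.0 as equal, so the data-type caveat above does not carry over.

```python
def equal(a, b):
    """Element-wise = with APL-style scalar extension."""
    a_vec, b_vec = isinstance(a, list), isinstance(b, list)
    if a_vec and b_vec:
        if len(a) != len(b):
            raise ValueError("LENGTH ERROR")
        return [int(x == y) for x, y in zip(a, b)]
    if a_vec:
        return [int(x == b) for x in a]  # scalar b is extended to a's length
    if b_vec:
        return [int(a == y) for y in b]  # scalar a is extended to b's length
    return int(a == b)

def all_equal(a, b):
    """Like ^/a=b: AND-reduce the element-wise comparison."""
    result = equal(a, b)
    return int(all(result)) if isinstance(result, list) else result

def match(a, b):
    """Rough analogue of a≡b: compares whole values (length and content), never errors."""
    return int(a == b)
```

The scalar-extension case shows the difference: all_equal([1, 1, 1], 1) gives 1, while match([1, 1, 1], 1) gives 0.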
I have a vector
v<-c(1,2,3)
I need add the numbers in the vector in the following fashion
1,1+2,1+2+3
producing a second vector
v1<-c(1,3,6)
This is probably quite simple, but I am a bit stuck.
Use the cumulative sum function:
cumsum(v)
#[1] 1 3 6
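For comparison, the same running total in Python is itertools.accumulate, whose default operation is addition:

```python
from itertools import accumulate

v = [1, 2, 3]
v1 = list(accumulate(v))  # running totals: 1, 1+2, 1+2+3
```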
Say I have a matrix:
m<-matrix(1:5,4,5)
m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 4 3 2
[2,] 2 1 5 4 3
[3,] 3 2 1 5 4
[4,] 4 3 2 1 5
Now, when I do
image(m)
I get unexpected output. And so I need to do:
image(t(m)[,4:1])
to get it the "right" way. What is the point?
Others have pointed out that what you are seeing is consistent with the documentation; here are a couple of thoughts on why it works that way.
The image function was not originally designed to plot images/graphics, but to represent tabular information graphically, so the ordering was intended to be consistent with other graphing conventions rather than to make clip art look correct. This means that rotating or mirroring the image does not make it "wrong"; it is just a different view, and the view that followed the plotting rules was chosen.
It also tried to be consistent with other graphing functions and the philosophy that they were based on. For a scatter plot we use plot(x,y) with x being the horizontal axis, but when we do table(x,y) the x variable forms the rows of the resulting table. Both of these facts are consistent with common practice (the explanatory variable is generally the row variable in a table since numbers are easier to compare vertically). So the image function uses the rows of the matrix (the x variable if it came from the table function) as the predictor/explanatory variable on the horizontal axis. It is also customary for values in plots to increase going left to right and bottom to top (but in tables it is more common to increase going top to bottom).
From the help file:
Notice that image interprets the z matrix as a table of f(x[i], y[j]) values, so that the x axis corresponds to row number and the y axis to column number, with column 1 at the bottom, i.e. a 90 degree counter-clockwise rotation of the conventional printed layout of a matrix.
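A Python model of the rule quoted from the help file (image_screen and fix_for_image are hypothetical helpers; nothing is plotted): image draws z[i][j] at x = i and y = j, so listing the cells from the top of the plot down shows the rotation, and t(m)[, nrow(m):1] undoes it. For the 4-row matrix in the question, nrow(m):1 is the 4:1 used there.

```python
def image_screen(z):
    """Cells as image() would place them; screen row 0 is the TOP of the plot.
    z[i][j] is drawn at x = i (left to right) and y = j (bottom to top)."""
    nx, ny = len(z), len(z[0])
    return [[z[x][ny - 1 - r] for x in range(nx)] for r in range(ny)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def fix_for_image(m):
    """Python analogue of R's t(m)[, nrow(m):1]: transpose, then reverse the column order."""
    return [row[::-1] for row in transpose(m)]

# The matrix printed in the question, as a list of rows.
m = [
    [1, 5, 4, 3, 2],
    [2, 1, 5, 4, 3],
    [3, 2, 1, 5, 4],
    [4, 3, 2, 1, 5],
]
```

image_screen(m) puts column 5 ([2, 3, 4, 5]) across the top and column 1 across the bottom, i.e. the counter-clockwise rotation, while image_screen(fix_for_image(m)) reproduces the printed layout.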
I want to differentiate data vectors to find those that are similar. For example:
A=[4,5,6,7,8];
B=[4,5,6,6,8];
C=[4,5,6,7,7];
D=[1,2,3,9,9];
E=[1,2,3,9,8];
In the example above, I want to detect that vectors A, B and C are similar (not identical) to each other and that D and E are similar to each other. The result should be something like: A, B, C are similar and D, E are similar, but the group A, B, C is not similar to the group D, E. Can MATLAB do this?
I was thinking of using some classification algorithm or k-means, ROC, etc., but I'm not sure which one would be best.
Any suggestions? Thanks in advance.
One of my new favourite methods for this sort of thing is agglomerative clustering.
First, concatenate all your vectors into a matrix, where each row is a separate vector. This makes such methods much easier to use:
F = [A; B; C; D; E];
Then the linkages can be found:
Z = linkage(F, 'ward', 'euclidean');
This can be plotted using:
dendrogram(Z);
This shows a tree, where each leaf at the bottom is one of the original vectors. Lengths of the branches show similarities and dissimilarities.
As you can see, 1, 2 and 3 are shown to be very close, as are 4 and 5. This even gives a measure of closeness: vectors 1 and 3 are deemed closer than vectors 2 and 3, since C differs from A in only one component but from B in two.
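For intuition about what agglomerative clustering does, here is a minimal Python sketch. It uses single linkage with Euclidean distance and stops merging at a hypothetical threshold; that is a simpler rule than the 'ward' linkage used above, so its merge heights will differ from the MATLAB dendrogram, although for this data it recovers the grouping described above ({A, B, C} and {D, E}).

```python
import math
from itertools import combinations

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def single_linkage(vectors, threshold):
    """Repeatedly merge the two closest clusters until the closest pair is farther than threshold.
    Cluster distance = smallest distance between any member of one and any member of the other."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for (p, cp), (q, cq) in combinations(enumerate(clusters), 2):
            d = min(euclidean(vectors[i], vectors[j]) for i in cp for j in cq)
            if best is None or d < best[0]:
                best = (d, p, q)
        d, p, q = best
        if d > threshold:
            break
        clusters[p] = clusters[p] + clusters[q]  # p < q, so deleting q leaves p in place
        del clusters[q]
    return clusters

F = [[4, 5, 6, 7, 8], [4, 5, 6, 6, 8], [4, 5, 6, 7, 7], [1, 2, 3, 9, 9], [1, 2, 3, 9, 8]]
```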
If all the vectors you are comparing are of the same length, a suitable norm on pairwise differences may well be enough. The norm to choose will depend on your particular criteria of closeness, of course, but with the examples you show, simply summing the absolute values of the components of the pairwise differences gives:
     A   B   C   D   E
A    0   1   1  12  11
B        0   2  13  12
C            0  13  12
D                0   1
E                    0
which doesn't need a particularly well-tuned threshold to work.
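The table above can be reproduced with a few lines of Python (a sketch; the threshold value is a hypothetical choice). Within-group distances are at most 2 and between-group distances at least 11, so any threshold from 2 up to (but not including) 11 separates the groups.

```python
from itertools import combinations

vectors = {
    "A": [4, 5, 6, 7, 8],
    "B": [4, 5, 6, 6, 8],
    "C": [4, 5, 6, 7, 7],
    "D": [1, 2, 3, 9, 9],
    "E": [1, 2, 3, 9, 8],
}

def l1_distance(u, v):
    """Sum of absolute component-wise differences (the 1-norm of u - v)."""
    return sum(abs(a - b) for a, b in zip(u, v))

# Upper triangle of the table: one entry per unordered pair.
distances = {(p, q): l1_distance(vectors[p], vectors[q]) for p, q in combinations(vectors, 2)}

threshold = 5  # hypothetical cut-off
similar_pairs = [pair for pair, d in distances.items() if d <= threshold]
```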
You can use pdist(), which gives you the pairwise distances.
Various distance (opposite of similarity) metrics are already implemented, 'euclidean' seems appropriate for your situation, although you may want to try out the effect of different metrics.
Here is the solution I propose, based on your data:
Z = [A;B;C;D;E];
Y = pdist(Z);                  % pairwise Euclidean distances, as a condensed vector
matrix = squareform(Y);        % expand to a symmetric 5x5 distance matrix
matrix_round = round(matrix);
Now that we have the distance matrix, we can set a threshold based on the maximum value and decide which threshold is most appropriate.
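A Python sketch of the pdist/squareform/round steps, for readers without MATLAB. pdist and squareform here are hypothetical re-implementations, not MATLAB's functions: pdist returns the n(n-1)/2 pairwise Euclidean distances in MATLAB's pair order, and squareform expands them into a symmetric matrix, inferring n from the vector length.

```python
import math
from itertools import combinations

def pdist(rows):
    """Pairwise Euclidean distances in MATLAB's order: (1,2), (1,3), ..., (1,n), (2,3), ..."""
    return [math.dist(u, v) for u, v in combinations(rows, 2)]

def squareform(condensed):
    """Expand a condensed distance vector into a symmetric matrix with a zero diagonal."""
    # len(condensed) == n*(n-1)/2, so solve for n.
    n = round((1 + math.sqrt(1 + 8 * len(condensed))) / 2)
    out = [[0.0] * n for _ in range(n)]
    for d, (i, j) in zip(condensed, combinations(range(n), 2)):
        out[i][j] = out[j][i] = d
    return out

Z = [
    [4, 5, 6, 7, 8],  # A
    [4, 5, 6, 6, 8],  # B
    [4, 5, 6, 7, 7],  # C
    [1, 2, 3, 9, 9],  # D
    [1, 2, 3, 9, 8],  # E
]
Y = pdist(Z)
matrix = squareform(Y)
# Python's round() sends exact halves to the even neighbour, unlike MATLAB's round();
# no distance here ends in exactly .5, so the results agree.
matrix_round = [[round(d) for d in row] for row in matrix]
```

With this data every within-group entry of matrix_round is 1 and every between-group entry is 6, so a threshold anywhere between those values separates the two groups.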
It would be nice to create some cluster plot showing the differences between them.
Best regards