for loop for scalar product with a matrix - r

I'm trying to fill a 10 x 1500 matrix with a loop.
I have to fill that matrix with 150 small 10 x 10 matrixes. I have tried to implement this with a double loop, but unsuccessfully. My problem is that each 10*10 matrix is the result of a scalar product.
At the begin it seems to be easy, but then I realized I couldn't figure out the sizes of the 10 x 1500 matrix with the 150 small 10*10 matrixes.
Here is what I did:
es_var is a 1 x 150 matrix, which I converted to a vector to simplify the scalar product (at least in my opinion).
diax is a 10 x 10 matrix.
I want to multiply each value of the es_var vector per the whole diag 10*10 matrix.
I am having troubles because I don't manage to input R in filling 10 rows per time. Thus in the end I get a 10*1500 matrix, but it is the same 10*10 time matrix repeated 150 times.
Here is my code
es_var1 = as.vector(es_var)
v = matrix(0, 10, 10*N)
for (i in 1:N){
v[,] = es_var1[i] * diax
}
Can somebody help in figuring out this, please? I spent the whole day trying it. And I need to do that without using in build functions since this is a small part of a big math demonstration I have to implement.

If I understand your requirement correctly, you can accomplish this with the following line:
v <- matrix(diax,10,1500)*rep(es_var1,each=100);
This constructs a 10x1500 matrix with the 10x10 diax matrix as the initial values, cycled sufficiently to cover the complete 10x1500 size. Then, to apply the es_var1 multiplication, you can replicate each of its elements 100 times, such that they will naturally align with each consecutive 10x10 small matrix during vectorized multiplication.

Related

Julia: Turn Vector into multiple m x n matrices without a loop

Let's say I have a vector V, and I want to either turn this vector into multiple m x n matrices, or get multiple m x n matrices from this Vector V.
For the most basic example: Turn V = collect(1:75) into 3 5x5 matrices.
As far as I am aware this can be done by first using reshape reshape(V, 5, :) and then looping through it. Is there a better way in Julia without using a loop?
If possible, a solution that can easily change between row-major and column-major results is preferrable.
TL:DR
m, n, n_matrices = 4, 2, 5
V = collect(1:m*n*n_matrices)
V = reshape(V, m, n, :)
V = permutedims(V, [2,1,3])
display(V)
From my limited knowledge about Julia:
When doing V = collect(1:m*n), you initialize a contiguous array in memory. From V you wish to create a container of m by n matrices. You can achieve this by doing reshape(V, m, n, :), then you can access the first matrix with V[:,:,1]. The "container" in this case is just another array (thus you have a three dimensional array), which in this case we interpret as "an array of matrices" (but you could also interpret it as a box). You can then transpose every matrix in your array by swapping the first two dimensions like this: permutedims(V, [2,1,3]).
How this works
From what I understand; n-dimensional arrays in Julia are contiguous arrays in memory when you don't do any "skipping" (e.g. V[1:2:end]). For example the 2 x 4 matrix A:
1 3 5 7
2 4 6 8
is in memory just 1 2 3 4 5 6 7 8. You simply interpret the data in a specific way, where the first two numbers makes up the first column, then the second two numbers makes the next column so on so forth. The reshape function simply specifies how you want to interpret the data in memory. So if we did reshape(A, 4, 2) we basically interpret the numbers in memory as "the first four values makes the first column, the second four values makes the second column", and we would get:
1 5
2 6
3 7
4 8
We are basically doing the same thing here, but with an extra dimension.
From my observations it also seems to be that permutedims in this case reallocates memory. Also, feel free to correct me if I am wrong.
Old answer:
I don't know much about Julia, but in Python using NumPy I would have done something like this:
reshape(V, :, m, n)
EDIT: As #BatWannaBe states, the result is technically one array (but three dimensional). You can always interpret a three dimensional array as a container of 2D arrays, which from my understanding is what you ask for.

functions for matrices / cluster analysis

I have the following problem. Maybe you can help me!
I have 60 matrices (60 trials). Each of those matrices is 16*1000 fileds big (16 angles and 1000 timestamps). The 16 angels are bodyangels.
Now I want to calculate the euclicid distance for each of the combinations (1770). So I would get 1770 matrices which are are 16*1000 fileds big.
The list for every combinations I would get through this formula:
>comb <- combinations(n=60, r=2, v=n, set=TRUE, repeats.allowed=FALSE)
The formula which I want to apply to each of these combinations is:
> dab <- sqrt(sum((a-b)^2)) # a and b are two matrices
I tried to create a function, which is fortunately only for single values, but not for whole matrices.:
>dist.fun <- function(x,y)
>{
>z <- sqrt(sum((x)-(y))^2)
> return(z)
>}
Out of those distances I want to create an euclidic distance matrix to do a cluster analysis.
>plot(hclust((as.dist(m)),method="ward.D2")) # m is the euclidic disdance matrix
I hope, anyone can help me with this problem. The data is biomechanical data from gymnasts, which I want to investigate in terms of variant and invariant components and prototypes.

How to use pointDistance with a very large vector

I've got a big problem.
I've got a large raster (rows=180, columns=480, number of cells=86400)
At first I binarized it (so that there are only 1's and 0's) and then I labelled the clusters.(Cells that are 1 and connected to each other got the same label.)
Now I need to calculate all the distances between the cells, that are NOT 0.
There are quiet a lot and that's my big problem.
I did this to get the coordinates of the cells I'm interested in (get the positions (i.e. cell numbers) of the cells, that are not 0):
V=getValues(label)
Vu=c(1:max(V))
pos=which(V %in% Vu)
XY=xyFromCell(label,pos)
This works very well. So XY is a matrix, which contains all the coordinates (of cells that are not 0). But now I'm struggling. I need to calculate the distances between ALL of these coordinates. Then I have to put each one of them in one of 43 bins of distances. It's kind of like this (just an example):
0<x<0.2 bin 1
0.2<x<0.4 bin2
When I use this:
pD=pointDistance(XY,lonlat=FALSE)
R says it's not possible to allocate vector of this size. It's getting too large.
Then I thought I could do this (create an empty data frame df or something like that and let the function pointDistance run over every single value of XY):
for (i in 1:nrow(XY))
{pD=PointDistance(XY,XY[i,],lonlat=FALSE)
pDbin=as.matrix(table(cut(pD,breaks=seq(0,8.6,by=0.2),Labels=1:43)))
df=cbind(df,pDbin)
df=apply(df,1,FUN=function(x) sum(x))}
It is working when I try this with e.g. the first 50 values of XY.
But when I use that for the whole XY matrix it's taking too much time.(Sometimes this XY matrix contains 10000 xy-coordinates)
Does anyone have an idea how to do it faster?
I don't know if this will works fast or not. I recommend you try this:
Let say you have dataframe with value 0 or 1 in each cell. To find coordinates all you have to do is write the below code:
cord_matrix <- which(dataframe == 1, arr.ind = TRUE)
Now, you get the coordinate matrix with row index and column index.
To find the euclidean distance use dist() function. Go through it. It will look like this:
dist_vector <- dist(cord_matrix)
It will return lower triangular matrix. can be transformed into vector/symmetric matrix. Now all you have to do is calculating bins according to your requirement.
Let me know if this works within the specific memory space.

compute matrix distance using dynamic programming

I have a matrix composing values 0, 1, and 2. 99% of the values are 0. The matrix has 1 million rows and 700 columns. There will be at least one non-zero values each row.
I need to compute the distance between each pair of columns using this formula for distance between column x and y:
D=(Sum(|xi-yi|)/2L for i from 1 to L, L=1 million, i.e. the number of rows.
I wrote a piece of R code but it's taking too long to compute, is it possible to use dynamic programing to do it faster? Here is my code:
#mac is the matrix
nCols=ncol(mac)
nRows=nrow(mac)
#the pairwise distance matrix
distMat=matrix(data=-1,nrow=nCols,ncol=nCols)
abs.dist=function(x){return(abs(x[1]-x[2]))}
for(i in 1:(nCols-1)){
for(j in (i+1):nCols){
d1=apply(mac[,c(i,j),1,abs.dist)
k=sum(d1)/(2*nRows)
distMat[i,j]=k
distMat[j,i]=k
}
}
for(i in 1:nCols) distMat[i,i]=0
Thanks a lot for any help?
I will just summarize what is in the comments already:
#mac is the matrix
nCols=ncol(mac)
nRows=nrow(mac)
#the pairwise distance matrix
distMat=matrix(data=-1,nrow=nCols,ncol=nCols)
for(i in 1:(nCols-1)){
for(j in (i+1):nCols){
d1=abs(mac[,i]-mac[,j])
k=sum(d1)/(2*nRows)
distMat[i,j]=k
distMat[j,i]=k
}
}
diag(distMat) <- 0
This is approximately 100 times faster for a 2000x500 matrix.
It took about half a minute for a 1e6x700 matrix.
Computing a distance matrix means you need (n^2-n)/2 operations. I'm not surprised it is taking a while.
Since you need all pairs, these calculations have to be done independently. Dynamic programming will not help. DP helps when you build the solution from smaller parts. Everything here is independent so DP won't help (as far as I know).
You said most entries are 0. Try looking at a sparse matrix library. This blog post may give you some ideas for doing this in R.

R looping over two vectors

I have created two vectors in R, using statistical distributions to build the vectors.
The first is a vector of locations on a string of length 1000. That vector has around 10 values and is called mu.
The second vector is a list of numbers, each one representing the number of features at each location mentioned above. This vector is called N.
What I need to do is generate a random distribution for all features (N) at each location (mu)
After some fiddling around, I found that this code works correctly:
for (i in 1:length(mu)){
a <- rnorm(N[i],mu[i],20)
feature.location <- c(feature.location,a)
}
This produces the right output - a list of numbers of length sum(N), and each number is a location figure which correlates with the data in mu.
I found that this only worked when I used concatenate to get the values into a vector.
My question is; why does this code work? How does R know to loop sum(N) times but for each position in mu? What role does concatenate play here?
Thanks in advance.
To try and answer your question directly, c(...) is not "concatenate", it's "combine". That is, it combines it's argument list into a vector. So c(1,2,3) is a vector with 3 elements.
Also, rnorm(n,mu,sigma) is a function that returns a vector of n random numbers sampled from the normal distribution. So at each iteration, i,
a <- rnorm(N[i],mu[i],20)
creates a vector a containing N[i] random numbers sampled from Normal(mu[i],20). Then
feature.location <- c(feature.location,a)
adds the elements of that vector to the vector from the previous iteration. So at the end, you have a vector with sum(N[i]) elements.
I guess you're sampling from a series of locations, each a variable no. of times.
I'm guessing your data looks something like this:
set.seed(1) # make reproducible
N <- ceiling(10*runif(10))
mu <- sample(seq(1000), 10)
> N;mu
[1] 3 4 6 10 3 9 10 7 7 1
[1] 206 177 686 383 767 496 714 985 377 771
Now you want to take a sample from rnorm of length N(i), with mean mu(i) and sd=20 and store all the results in a vector.
The method you're using (growing the vector) is not recommended as it will be re-copied in memory each time an element is added. (See Circle 2, although for small examples like this, it's not so important.)
First, initialize the storage vector:
f.l <- NULL
for (i in 1:length(mu)){
a <- rnorm(n=N[i], mean=mu[i], sd=20)
f.l <- c(f.l, a)
}
Then, each time, a stores your sample of length N[i] and c() combines it with the existing f.l by adding it to the end.
A more efficient approach is
unlist(mapply(rnorm, N, mu, MoreArgs=list(sd=20)))
Which vectorizes the loop. Unlist is used as mapply returns a list of vectors of varying lengths.

Resources