Dimension Reduction with SVD in R - r

I am trying to use SVD in R for dimension Reduction of a Matrix. I am able to find D, U, V matrix for "MovMat" Matrix. I want to reduce some dimensions that their values in D matrix is less than a "treshhold".
I wrote the code below. But I do not know how I can find values less than threshold in "MovMat" Matrix.
library(cluster)
library(fpc)
# "MovMat" is a users-movies Matrix.
# It is contain the rating score which each user gives for each movie.
svdAllDimensions = svd(MovMat)
d=diag(svd$d) # Finding D, U, V
u=svd$u
v=svd$v

I assigned the values of D which is less than the Threshold and again multiply D, V, U with each other and find new matrix with less dimension.
for(i in rowOfD){
for(j in columnOfD){
if (i==j){
if(d[i,j]<Threshold){
d[i,j] = 0
}
}
}
}

Related

Find limiting distribution of transition matrix and plot in R

may I know how I can find and plot the results of the limiting distribution or a unique stationary distribution of a transition matrix in R? (my goal is to have a unique and constant result instead of a random result)
This is the P matrix used:
P=matrix(c(0.2,0.3,0.5,0.1,0.8,0.1,0.4,0.2,0.4),nrow=3,ncol=3,byrow=TRUE)
I misspoke in my earlier answer. Either the sums of the rows or the column need to all be 1 for a transition matrix. It depends on whether you are using v'P or Pv to transition to the next step.
I'll use Pv.
For the limiting distribution to be stable, we must have:
Pv = v, or (P - I)v = 0. So the limiting distribution is an eigenvector with eigenvalue 1. Then to be sure it's a distribution sum(v) == 1.
Since your matrix has rows that sum to 1, not columns, we need to use the transpose of the matrix to calculate the eigenvalues:
e <- eigen(t(P))$vectors[, 1]
e <- e / sum(e)
Gives:
e
[1] 0.1960784 0.5490196 0.2549020
To check this:
P=matrix(c(0.2,0.3,0.5,0.1,0.8,0.1,0.4,0.2,0.4),nrow=3,ncol=3, byrow = TRUE)
ans <- e
for (i in 1:1000) {
ans <- ans %*% P
}
ans
ans
[,1] [,2] [,3]
[1,] 0.1960784 0.5490196 0.254902
Same, so it's stable.
I'm not clear as to what you wanted to plot.

Pointwise multiplication and right matrix division

I'm currently trying to recreate this Matlab function in R:
function X = uniform_sphere_points(n,d)
% X = uniform_sphere_points(n,d)
%
%function generates n points unformly within the unit sphere in d dimensions
z= randn(n,d);
r1 = sqrt(sum(z.^2,2));
X=z./repmat(r1,1,d);
r=rand(n,1).^(1/d);
X = X.*repmat(r,1,d);
Regarding the the right matrix division I installed the pracma package. My R code right now is:
uniform_sphere_points <- function(n,d){
# function generates n points uniformly within the unit sphere in d dimensions
z = rnorm(n, d)
r1 = sqrt(sum(z^2,2))
X = mrdivide(z, repmat(r1,1,d))
r = rnorm(1)^(1/d)
X = X * matrix(r,1,d)
return(X)
}
But it is not really working since I always end with a non-conformable arrays error in R.
This operation for sampling n random points from the d-dimensional unit sphere could be stated in words as:
Construct a n x d matrix with entries drawn from the standard normal distribution
Normalize each row so it has (2-norm) magnitude 1
For each row, compute a random value by taking a draw from the uniform distribution (between 0 and 1) and raise that value to the 1/d power. Multiply all elements in the row by that value.
The following R code does these operations:
unif.samp <- function(n, d) {
z <- matrix(rnorm(n*d), nrow=n, ncol=d)
z * (runif(n)^(1/d) / sqrt(rowSums(z^2)))
}
Note that in the second line of code I have taken advantage of the fact that multiplying a n x d matrix in R by a vector of length n will multiply each row by the corresponding value in that vector. This saves us the work of using repmat to construct matrices of exactly the same size as our original matrix for these sorts of row-specific operations.

Find projection matrix to create zero-sum vector

I have subspace of vectors w whose elements sum to 0.
I would like to find a projection matrix Z such that it projects any x vector onto the subspace w (i.e., a subspace where the vector sums to 0).
Is there an R function to do this?
The question did not specify how w is provided but if w is a matrix with full rank spanning the space w then
Z <- w %*% solve(crossprod(w), t(w))
If w has orthogonal columns then the above line reduces to:
Z <- tcrossprod(w)
Another possibility is to use the pracma package in which case w need not be of full rank:
library(pracma)
Z <- tcrossprod(orth(w))
If w were the space of all n-vectors that sum to zero then:
Z <- diag(n) - matrix(1, n, n) / n
Note Have revised after re-reading question.

Find K nearest neighbors, starting from a distance matrix

I'm looking for a well-optimized function that accepts an n X n distance matrix and returns an n X k matrix with the indices of the k nearest neighbors of the ith datapoint in the ith row.
I find a gazillion different R packages that let you do KNN, but they all seem to include the distance computations along with the sorting algorithm within the same function. In particular, for most routines the main argument is the original data matrix, not a distance matrix. In my case, I'm using a nonstandard distance on mixed variable types, so I need to separate the sorting problem from the distance computations.
This is not exactly a daunting problem -- I obviously could just use the order function inside a loop to get what I want (see my solution below), but this is far from optimal. For example, the sort function with partial = 1:k when k is small (less than 11) goes much faster, but unfortunately returns only sorted values rather than the desired indices.
Try to use FastKNN CRAN package (although it is not well documented). It offers k.nearest.neighbors function where an arbitrary distance matrix can be given. Below you have an example that computes the matrix you need.
# arbitrary data
train <- matrix(sample(c("a","b","c"),12,replace=TRUE), ncol=2) # n x 2
n = dim(train)[1]
distMatrix <- matrix(runif(n^2,0,1),ncol=n) # n x n
# matrix of neighbours
k=3
nn = matrix(0,n,k) # n x k
for (i in 1:n)
nn[i,] = k.nearest.neighbors(i, distMatrix, k = k)
Notice: You can always check Cran packages list for Ctrl+F='knn'
related functions:
https://cran.r-project.org/web/packages/available_packages_by_name.html
For the record (I won't mark this as the answer), here is a quick-and-dirty solution. Suppose sd.dist is the special distance matrix. Suppose k.for.nn is the number of nearest neighbors.
n = nrow(sd.dist)
knn.mat = matrix(0, ncol = k.for.nn, nrow = n)
knd.mat = knn.mat
for(i in 1:n){
knn.mat[i,] = order(sd.dist[i,])[1:k.for.nn]
knd.mat[i,] = sd.dist[i,knn.mat[i,]]
}
Now knn.mat is the matrix with the indices of the k nearest neighbors in each row, and for convenience knd.mat stores the corresponding distances.

Alternative way to loop for faster computing

Let's say I have a n*p dataframe.
I have computed a list of n matrix of p*p dimensions (named listMat in the R script belowed), in which each matrix is the distance matrix between the p variables for each of the n respondants.
I want to compute a n*n matrix called normMat, with each elements corresponding to the norm of the difference between each pairwise distance matrix. For exemple : normMat[1,2] will be the norm of a matrix named "diffMat" where diffMat is the difference between the 1st distance matrix and the 2nd distance matrix of the list of Matrix "listMat".
I wrote the following script which works fine, but i'm wondering if there is a more efficient way to write it, to avoid the loops (using for exemple lapply, etc ..) and make the script execution go faster.
# exemple of n = 3 distances matrix between p = 5 variables
x <- abs(matrix(rnorm(1:25),5,5))
y <- abs(matrix(rnorm(1:25),5,5))
z <- abs(matrix(rnorm(1:25),5,5))
listMat <- list(x, y, z)
normMat <- matrix(NA,n,n)
for (numRow in 1:n){
for (numCol in 1:n){
diffMat <- listMat[[numRow]] - listMat[[numCol]]
normMat[numRow, numCol] <- norm(diffMat, type="F")
}
}
Thanks for your help.
Try:
normMat <- function(x, y) {
norm(x-y, type="F")
}
sapply(listMat, function(x) sapply(listMat, function(y) normMat(x,y)))

Resources