Find limiting distribution of transition matrix and plot in R

How can I find and plot the limiting distribution (i.e. the unique stationary distribution) of a transition matrix in R? (My goal is to get a unique, constant result rather than a random one.)
This is the P matrix used:
P=matrix(c(0.2,0.3,0.5,0.1,0.8,0.1,0.4,0.2,0.4),nrow=3,ncol=3,byrow=TRUE)

I misspoke in my earlier answer. For a transition matrix, either the row sums or the column sums must all be 1, depending on whether you use v'P or Pv to transition to the next step.
I'll use Pv.
For the limiting distribution to be stable, we must have:
Pv = v, or (P - I)v = 0. So the limiting distribution is an eigenvector of P with eigenvalue 1, normalized so that sum(v) == 1 to make it a proper distribution.
Since your matrix has rows that sum to 1, not columns, we need the transpose of the matrix to compute that eigenvector:
e <- eigen(t(P))$vectors[, 1]   # eigenvector for eigenvalue 1 (the leading eigenvalue of a stochastic matrix)
e <- e / sum(e)                 # normalize so the entries sum to 1
Gives:
e
[1] 0.1960784 0.5490196 0.2549020
To check this:
P <- matrix(c(0.2,0.3,0.5, 0.1,0.8,0.1, 0.4,0.2,0.4), nrow=3, ncol=3, byrow=TRUE)
ans <- e
for (i in 1:1000) {
  ans <- ans %*% P
}
ans
[,1] [,2] [,3]
[1,] 0.1960784 0.5490196 0.254902
Same, so it's stable.
I'm not clear as to what you wanted to plot.
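As for the plot, one possible interpretation (my own guess, not something the asker confirmed) is to show how an arbitrary starting distribution converges to the limiting distribution under repeated transitions; a minimal sketch:
P <- matrix(c(0.2,0.3,0.5, 0.1,0.8,0.1, 0.4,0.2,0.4), nrow=3, byrow=TRUE)
v <- c(1, 0, 0)                        # arbitrary starting distribution
steps <- 20
trace <- matrix(NA, nrow = steps, ncol = 3)
for (i in 1:steps) {
  v <- v %*% P                         # row-vector convention, since the rows of P sum to 1
  trace[i, ] <- v
}
matplot(trace, type = "l", lty = 1, xlab = "step", ylab = "probability",
        main = "Convergence to the limiting distribution")
abline(h = c(0.1960784, 0.5490196, 0.2549020), lty = 2)   # the stationary distribution found above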

Related

R - Standardize matrix to have unit diagonals

I am seeking to generate the matrix below:
Θ = B + δ·I_p ∈ R^(p×p), where I_p is the identity matrix; each off-diagonal entry of B (a symmetric matrix) is generated independently and equals 0.5 with probability 0.1 or 0 with probability 0.9. The parameter δ > 0 is chosen such that Θ is positive definite. The matrix is then standardized to have unit diagonals (transforming it from a covariance matrix to a correlation matrix).
I think that I have most of the code, but I'm unsure of how to standardize the matrix to have unit diagonals syntactically in R (and theoretically, why that is a useful feature of a matrix).
# set number of cols/rows
p <- 5
set.seed(123)
# generate matrix B: entries are 0.5 with probability 0.1, else 0
B <- matrix(sample(c(0, 0.5), p^2, replace=TRUE, prob=c(0.9, 0.1)), p)
# copy the upper triangle into the lower triangle to make B symmetric
i <- lower.tri(B)
B[i] <- t(B)[i]
diag(B) <- rep(0, p)
# find the parameter delta such that Theta is positive definite
(delta <- -min(eigen(B, symmetric=TRUE, only.values=TRUE)$values))
# set theta, using a value slightly larger than delta (here delta is 2.8802)
theta <- B + 2.89*(diag(p))
# now to standardize the matrix to have unit diagonals ?
There are many ways to do this, but the following is very fast in timing experiments:
v <- 1/sqrt(diag(theta))
B <- theta * outer(v, v)
This divides all rows and columns by their standard deviations, which are the square roots of the diagonal elements.
It will fail whenever any diagonal is zero (or negative): but in that case such standardization isn't possible. Computing the square roots and their reciprocals first allows you to learn as soon as possible--with minimal computation--whether the procedure will succeed.
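As a cross-check (my addition, not part of the original answer), base R's cov2cor() performs the same covariance-to-correlation standardization, so the two results should agree up to floating-point error:
v <- 1/sqrt(diag(theta))
B.outer <- theta * outer(v, v)   # the outer-product standardization from above
B.cc    <- cov2cor(theta)        # stats::cov2cor, the built-in equivalent
all.equal(B.outer, B.cc)         # expected: TRUE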
BTW, a direct way to compute B in the first part of your code (which has a zero diagonal) is
B <- as.matrix(structure(sample(c(0, 1/2), p*(p-1)/2, replace=TRUE, prob=c(.9, .1)),
                         Size=p, Diag=TRUE, class="dist"))
This eliminates the superfluous sampling: only the p(p-1)/2 off-diagonal entries of the lower triangle are drawn.

Implementing KNN with different distance metrics using R

I am working on a dataset in order to compare the effect of different distance metrics. I am using the KNN algorithm.
The KNN algorithm in R uses the Euclidean distance by default, so I wrote my own. I would like to find the number of correct class-label matches between the nearest neighbor and the target.
I prepared the data first, then called it (wdbc_n) and chose K=1, using the Euclidean distance as a test.
library(philentropy)
knn <- function(xmat, k, method){
  n <- nrow(xmat)
  if (n <= k) stop("k can not be more than n-1")
  neigh <- matrix(0, nrow = n, ncol = k)
  for(i in 1:n) {
    ddist <- distance(xmat, method)
    neigh[i, ] <- order(ddist)[2:(k + 1)]
  }
  return(neigh)
}
wdbc_nn <-knn(wdbc_n ,1,method="euclidean")
I am hoping to get results similar to the paper "On the Surprising Behavior of Distance Metrics in High Dimensional Space" (https://bib.dbvis.de/uploadedFiles/155.pdf, page 431, Table 3).
My question is:
Am I right or wrong with the code?
Any suggestions or references that will guide me will be highly appreciated.
EDIT
My data (breast-cancer-wisconsin, wdbc) has dimension
569 32
After normalizing and removing the id and target column the dimension is
dim(wdbc_n)
569 30
The train and test split is given by
wdbc_train<-wdbc_n[1:469,]
wdbc_test<-wdbc_n[470:569,]
Am I right or wrong with the code?
Your code is wrong.
The call to the distance function took about 3 seconds each time on my fairly recent PC, so I only ran the first 30 rows with k=3, and noticed that every row of the neigh matrix was identical. Why is that? Take a look at this line:
ddist<- distance(xmat, method)
Each iteration feeds the whole xmat matrix to the distance function, then uses only the first row of the resulting matrix. This computes the distances between all training-set rows, does so n times, and discards every row except the first, which is not what you want. The knn algorithm is supposed to calculate, for each row in the test set, the distance to each row in the training set.
Let's take a look at the documentation for the distance function:
distance(x, method = "euclidean", p = NULL, test.na = TRUE, unit = "log", est.prob = NULL)

x: a numeric data.frame or matrix (storing probability vectors) or a numeric data.frame or matrix storing counts (if est.prob is specified).
(...)
Value: in case nrow(x) = 2, a single distance value; in case nrow(x) > 2, a distance matrix storing distance values for all pairwise probability vector comparisons.
In your specific case (knn classification), you want to use the 2 row version.
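A tiny sanity check of that 2-row behaviour (my own example, not from the original answer):
library(philentropy)
x <- rbind(c(1, 0, 0), c(0, 1, 0))
distance(x, method = "euclidean")   # a single value: sqrt(2) ~ 1.414214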
One last thing: you used order, which returns the positions of the sorted distances in the ddist vector, not the distances themselves. Since I think what you want is the distances, use sort instead of order.
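A quick illustration of the difference between the two (my addition):
d <- c(0.9, 0.1, 0.5)
order(d)   # 2 3 1       -> the positions of the values, smallest distance first
sort(d)    # 0.1 0.5 0.9 -> the values themselves, smallest first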
Based on your code and the example in Lantz (2013) that it seems to follow, here is a complete working solution. I took the liberty of adding a few lines to make it a standalone program.
Standalone working solution(s)
library(philentropy)

normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}

knn <- function(train, test, k, method){
  n.test <- nrow(test)
  n.train <- nrow(train)
  if (n.train + n.test <= k) stop("k cannot be more than n-1")
  neigh <- matrix(0, nrow = n.test, ncol = k)
  ddist <- NULL
  for(i in 1:n.test) {
    for(j in 1:n.train) {
      # make a 2-row matrix combining the current test and train rows
      xmat <- rbind(test[i,], train[j,])
      # calculate the distance and store it in the ddist vector
      # (the third argument of distance() is the Minkowski power p, so k is not passed here)
      ddist[j] <- distance(as.data.frame(xmat), method)
    }
    # note: [2:(k + 1)] skips the smallest distance; see the remark after the FNN comparison below
    neigh[i, ] <- sort(ddist)[2:(k + 1)]
  }
  return(neigh)
}
wbcd <- read.csv("https://resources.oreilly.com/examples/9781784393908/raw/ac9fe41596dd42fc3877cfa8ed410dd346c43548/Machine%20Learning%20with%20R,%20Second%20Edition_Code/Chapter%2003/wisc_bc_data.csv")
rownames(wbcd) <- wbcd$id
wbcd$id <- NULL
wbcd_n <- as.data.frame(lapply(wbcd[2:31], normalize))
wbcd_train<-wbcd_n[1:469,]
wbcd_test<-wbcd_n[470:549,]
wbcd_nn <-knn(wbcd_train, wbcd_test ,3, method="euclidean")
Do note that this solution might be slow because of the many calls to the distance function (one per test/train row pair). However, since we are only feeding 2 rows at a time into the distance function, the execution time stays manageable.
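If speed becomes a problem, a vectorized alternative for the Euclidean case (my own sketch, not part of the original answer) computes all test/train distances in one shot without philentropy:
euclid.all <- function(test, train) {
  test <- as.matrix(test); train <- as.matrix(train)
  # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, computed for all pairs at once
  d2 <- outer(rowSums(test^2), rowSums(train^2), "+") - 2 * test %*% t(train)
  sqrt(pmax(d2, 0))                  # pmax guards against tiny negative rounding errors
}
# k nearest distances per test row (k = 3 as above):
# t(apply(euclid.all(wbcd_test, wbcd_train), 1, function(r) sort(r)[1:3]))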
Now, does it work?
The first two test rows using the custom knn function:
[,1] [,2] [,3]
[1,] 0.3887346 0.4051762 0.4397497
[2,] 0.2518766 0.2758161 0.2790369
Let us compare with the equivalent function in the FNN package:
library(FNN)
alt.class <- get.knnx(wbcd_train, wbcd_test, k=3, algorithm = "brute")
alt.class$nn.dist
[,1] [,2] [,3]
[1,] 0.3815984 0.3887346 0.4051762
[2,] 0.2392102 0.2518766 0.2758161
Conclusion: not too shabby. One caveat: the distances agree with FNN but are shifted by one position. Because the test rows are not part of the training set, there is no zero self-distance to skip, so indexing with [1:k] instead of [2:(k + 1)] in the custom function keeps the true nearest neighbour and reproduces the FNN distances exactly.

Dimension Reduction with SVD in R

I am trying to use SVD in R for dimension reduction of a matrix. I am able to find the D, U, and V matrices for the "MovMat" matrix. I want to drop the dimensions whose values in the D matrix are less than a threshold.
I wrote the code below, but I do not know how to find and handle the values that are less than the threshold.
library(cluster)
library(fpc)
# "MovMat" is a users-movies Matrix.
# It is contain the rating score which each user gives for each movie.
svdAllDimensions = svd(MovMat)
d=diag(svd$d) # Finding D, U, V
u=svd$u
v=svd$v
I want to set to zero the values of D that are less than the threshold and then multiply U, D, and V back together to obtain a new matrix of lower rank.
for(i in seq_len(nrow(d))){
  for(j in seq_len(ncol(d))){
    if (i == j){
      if(d[i,j] < Threshold){
        d[i,j] = 0
      }
    }
  }
}
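For what it's worth, a minimal sketch of the thresholding and reconstruction (my own suggestion; it assumes MovMat and Threshold are already defined, since no data is given):
s <- svd(MovMat)
d.new <- s$d
d.new[d.new < Threshold] <- 0                    # zero out the small singular values
D <- diag(d.new, nrow = length(d.new))
MovMat.lowrank <- s$u %*% D %*% t(s$v)           # same size as MovMat, but lower rank
# To actually drop the weak dimensions, keep only the strong singular vectors:
keep <- s$d >= Threshold
scores <- MovMat %*% s$v[, keep, drop = FALSE]   # users represented in the reduced space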

PCA Feature selection using R

I am a biologist. The output of my experiment contains a large number of features, stored as columns of a table with 563 rows. There are 8603 feature columns, which is quite a lot.
When I try to do a PCA analysis in R, it gives "out of memory" errors.
I have also tried doing princomp in pieces, but it does not seem to work for our approach.
I tried using the script given in the link
http://www.r-bloggers.com/introduction-to-feature-selection-for-bioinformaticians-using-r-correlation-matrix-filters-pca-backward-selection/
but it still does not work :(
I am trying to use the following code
bumpus <- read.table("http://www.ndsu.nodak.edu/ndsu/doetkott/introsas/rawdata/bumpus.html",
skip=20, nrows=49,
col.names=c("id","total","alar","head","humerus","sternum"))
boxplot(bumpus, main="Boxplot of Bumpus' data") ## in this step it is showing the ERROR
# we first standardize the data:
bumpus.scaled <- data.frame( apply(bumpus,2,scale) )
boxplot(bumpus.scaled, main="Boxplot of standardized Bumpus' data")
pca.res <- prcomp(bumpus.scaled, retx=TRUE)
pca.res
# note:
# PC.1 is some kind of average of all the measurements
# => measure of size of the bird
# PC.2 has a negative weight for 'sternum'
# and positive weights for 'alar', 'head' and 'humerus'
# => measure of shape of the bird
# first two principal components:
pca.res$x[,1:2]
plot(pca.res$x[,1:2], pch="", main="PC.1 and PC.2 for Bumpus' data (blue=survived, red=died)")
text(pca.res$x[,1:2], labels=c(1:49), col=c(rep("blue",21),rep("red",28)))
abline(v=0, lty=2)
abline(h=0, lty=2)
# compare to segment plot:
windows()
palette(rainbow(12, s = 0.6, v = 0.75))
stars(bumpus, labels=c(1:49), nrow=6, key.loc=c(20,-1),
main="Segment plot of Bumpus' data", draw.segment=TRUE)
# compare to biplot:
windows()
biplot(pca.res, scale=0)
# what do the arrows mean?
# consider the arrow for sternum:
abline(0, pca.res$rotation[5,2]/pca.res$rotation[5,1])
# consider the arrow for head:
abline(0, pca.res$rotation[3,2]/pca.res$rotation[3,1])
But the second line
boxplot(bumpus, main="Boxplot of Bumpus' data") ## shows an error
The error is
Error: cannot allocate vector of size 1.4 Mb
In addition: There were 27 warnings (use warnings() to see them)
Please help!
In cases where the number of features is huge or exceeds the number of observations, it is well advised to calculate the principal components based on the transposed dataset. This is especially true in your case because the default implies calculating an 8603 x 8603 covariance matrix, which by itself already consumes well over 500 MB of memory (oh well, this isn't too much, but hey...).
Assuming that the rows of your matrix X correspond to observations and the columns correspond to features, center your data and then perform PCA on the transpose of the centered X. There won't be more eigenpairs than the number of observations anyway. Finally, multiply each resulting eigenvector by X^T. You do not need to do the latter for the eigenvalues (see further below for a detailed explanation):
What you want
This code demonstrates the implementation of PCA on the transposed dataset and compares the results of prcomp and the "transposed PCA":
pca.reduced <- function(X, center=TRUE, retX=TRUE) {
  # Note that the data must first be centered on the *original* dimensions
  # because the centering of the 'transposed covariance' is meaningless for
  # the dataset. This is also why Sigma must be computed dependent on N
  # instead of simply using cov().
  if (center) {
    mu <- colMeans(X)
    X <- sweep(X, 2, mu, `-`)
  }
  # From now on we're looking at the transpose of X:
  Xt <- t(X)
  aux <- svd(Xt)
  V <- Xt %*% aux$v
  # Normalize the columns of V.
  V <- apply(V, 2, function(x) x / sqrt(sum(x^2)))
  # Done.
  list(X = if (retX) X %*% V else NULL,
       V = V,
       sd = aux$d / sqrt(nrow(X)-1),
       mean = if (center) mu else NULL)
}
# Example data (low-dimensional, but sufficient for this example):
X <- cbind(rnorm(1000), rnorm(1000) * 5, rnorm(1000) * 3)
original <- prcomp(X, scale=FALSE)
transposed <- pca.reduced(X)
# See what happens:
> print(original$sdev)
[1] 4.6468136 2.9240382 0.9681769
> print(transposed$sd)
[1] 4.6468136 2.9240382 0.9681769
>
> print(original$rotation)
PC1 PC2 PC3
[1,] -0.0055505001 0.0067322416 0.999961934
[2,] -0.9999845292 -0.0004024287 -0.005547916
[3,] 0.0003650635 -0.9999772572 0.006734371
> print(transposed$V)
[,1] [,2] [,3]
[1,] 0.0055505001 0.0067322416 -0.999961934
[2,] 0.9999845292 -0.0004024287 0.005547916
[3,] -0.0003650635 -0.9999772572 -0.006734371
Details
To see why it is possible to work on the transposed matrix consider the
following:
The general form of the eigenvalue equation is
A x = λ x (1)
Without loss of generality, let M be a centered "copy" of your original
dataset X. Substitution of M^T M for A yields
M^T M x = λ x (2)
Multiplication of this equation by M yields
M M^T M x = λ M x (3)
Consequent substitution of y = M x yields
M M^T y = λ y (4)
One can already see that y corresponds to an eigenvector of the "covariance" matrix of the transposed dataset. (Note that M M^T is not a true covariance matrix, since the dataset X was centered along its columns and not its rows; also, scaling must be done by the number of samples, i.e. the rows of M, and not the number of features, i.e. the columns of M or rows of M^T.)
It can also be seen that the eigenvalues are the same for M M^T and M^T M.
Finally, one last multiplication by M^T results in
(M^T M) M^T y = λ M^T y (5)
where M^T M is the original covariance matrix.
From equation (5) it follows that M^T y is an eigenvector of M^T M with
eigenvalue λ.
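A quick numeric check of that argument (my addition, not part of the original answer): the nonzero eigenvalues of M M^T and M^T M coincide.
set.seed(1)
M <- scale(matrix(rnorm(20), nrow = 5), center = TRUE, scale = FALSE)  # 5 x 4, column-centered
ev.features     <- eigen(t(M) %*% M, only.values = TRUE)$values        # 4 eigenvalues of M^T M
ev.observations <- eigen(M %*% t(M), only.values = TRUE)$values        # 5 eigenvalues of M M^T
round(ev.features, 6)
round(ev.observations, 6)   # the same nonzero values, padded with (near-)zero eigenvalues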

Determining if a matrix is diagonalizable in the R Programming Language

I have a matrix and I would like to know if it is diagonalizable. How do I do this in the R programming language?
If you have a given matrix, m, then one way is to take the eigenvector matrix times the diagonal matrix of the eigenvalues times the inverse of the eigenvector matrix. That should give back the original matrix. In R that looks like:
m <- matrix( c(1:16), nrow = 4)
p <- eigen(m)$vectors
d <- diag(eigen(m)$values)
p %*% d %*% solve(p)
m
so in that example p %*% d %*% solve(p) should be the same as m
You can implement the full algorithm to check whether the matrix reduces to a Jordan form or a diagonal one (see e.g. this document). Or you can take the quick and dirty way: for an n-dimensional square matrix, use eigen(M)$values and check that there are n distinct values. For random matrices this almost always suffices: degenerate eigenvalues occur with probability 0.
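A short sketch of that quick-and-dirty check (the rounding tolerance is my own choice):
M <- matrix(rnorm(9), nrow = 3)
vals <- eigen(M, only.values = TRUE)$values
# n distinct eigenvalues are sufficient (though not necessary) for diagonalizability:
length(unique(round(vals, 8))) == nrow(M)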
P.S.: Based on a simple observation by JD Long below, I recalled that a necessary and sufficient condition for diagonalizability is that the eigenvectors span the original space. To check this, just verify that the eigenvector matrix has full rank (no zero eigenvalue). So here is the code:
diagflag = function(m, tol=1e-10){
  x = eigen(m)$vectors
  y = min(abs(eigen(x)$values))
  return(y > tol)
}
# nondiagonalizable matrix
m1 = matrix(c(1,1,0,1),nrow=2)
# diagonalizable matrix
m2 = matrix(c(-1,1,0,1),nrow=2)
> m1
[,1] [,2]
[1,] 1 0
[2,] 1 1
> diagflag(m1)
[1] FALSE
> m2
[,1] [,2]
[1,] -1 0
[2,] 1 1
> diagflag(m2)
[1] TRUE
You might want to check out this page for some basic discussion and code. You'll need to search for "diagonalized" which is where the relevant portion begins.
All matrices that are symmetric across the diagonal are diagonalizable by orthogonal matrices. In fact, if you want diagonalizability only by orthogonal matrix conjugation, i.e. D = PAP' where P' stands for the transpose of P, then symmetry across the diagonal, i.e. A_{ij} = A_{ji}, is exactly equivalent to diagonalizability.
If the matrix is not symmetric, then diagonalizability means not D = PAP' but merely D = PAP^{-1}, and we do not necessarily have P' = P^{-1}, which is the condition of orthogonality.
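A small numeric illustration of the symmetric case (my own example, not from the answer): with the eigenvectors as the columns of P, t(P) plays the role of solve(P).
A <- matrix(c(2, 1, 1, 3), nrow = 2)          # symmetric across the diagonal
e <- eigen(A, symmetric = TRUE)
P <- e$vectors                                # orthogonal, so t(P) equals solve(P)
all.equal(t(P) %*% A %*% P, diag(e$values))   # diagonalization check, TRUE up to rounding
all.equal(t(P), solve(P))                     # orthogonality check, TRUE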
You need to do something more substantial, and there is probably a better way, but you could just compute the eigenvectors and check that their rank equals the total dimension.
See this discussion for a more detailed explanation.
