How to create a matrix with probability distribution in R - r

I want to create a matrix in R with element [-1,0,1] with probability [1/6, 2/3, 1/6] respectively. The probability may change during runtime. for static probability I have got the output but the problem is dynamic change in the probability.
for example, If I create a matrix for the above probability with [sqrt(3),0,-sqrt(3)], the required output is.
Note: The Probability should not be static as mentioned. It may vary during runtime.
Kindly help to solve this.

Supposing you want a 2x3 matrix:
matrix(sample(c(-1,0,1), size=6, replace=TRUE, prob=c(1/6,2/3,1/6)), nrow=2)
So you sample from the values you want, with probabilities defined in prob. This is just a vector, but you can make it into a matrix of the desired shape using matrix afterwards. Replace the probabilities by a variable instead of values to not make it static.
If the numbers should be distributed according to a certain scheme rather than randomly drawn according to a probability, replicate the vector elements and shuffle them:
matrix(sample(rep(c(-1,0,1), times=c(1,4,1))), nrow=2)

You can try this to generate a mxn matrix:
sample.dynamic.matrix <- function(pop.symbols, probs, m, n) {
samples <- sample(pop.symbols, m*n, prob = probs, replace=TRUE)
return(matrix(samples, nrow=m))
}
set.seed(123)
sample.dynamic.matrix(-1:1, c(1/6,2/3,1/6), 2, 3)
# [,1] [,2] [,3]
#[1,] 0 0 -1
#[2,] 1 -1 0

Related

diagonal replacement in an r correlogram

I am rather new to R. I am trying to replace the main diagonal of a correlogram (that's consisted of ones obviously). I have created the vectors for the the correlogram, and have used the cor() function from the cocron package to create the correlogram. I also created a list with the values that i want instead of the ones in the correlogram, consisted of internal reliabilities of the correlogram vectors.
library(cocron)
library(fmsb)
# defining correlated variables
JOB_ins = subset(df,select=c("q9","Rq10_new","q11","q12"))
INT_to_quit = subset(df,select=c("q13","q14","Rq15_new","q16"))
Employability = subset(df,select=c("q17","q18","q19","q20"))
Mobility_pref = subset(df,select=c("Rq21","Rq22","Rq23","Rq24","Rq25"))
Career_self_mgmt = subset(df,select=c("q26","q27","q28","q29","q30"
,"q31","q32","q33"))
# subsetting dataframes
x = subset(df,select=c(JOB_ins, INT_to_quit, Employability
,Mobility_pref,Career_self_mgmt))
#creating a correlation matrix
corrmat = cor(x)
#creating Cronbach Alpha reliabilities vector for diagonal replacement
dlist=list(round(CronbachAlpha(JOB_ins),2),round(CronbachAlpha(Int_to_quit),2)
,round(CronbachAlpha(Employability),2)
,round(CronbachAlpha(Mobility_pref),2)
,round(CronbachAlpha(Career_self_mgmt),2))
#replacing the main diagonal
diag(corrmat)=dlist
Doing that I do replace the main diagonal but It seems I also turn my correlogram from a matrix to a vector. Any idea how do I keep that from happening or reverse that?
First, you can use a vector instead of a list, replace list(round(CronbachAlpha(JOB_ins),2),...) by c(round(CronbachAlpha(JOB_ins),2),...)
Second, you can convert a vector to a matrix easily. Example:
matrix(c(1,2,3,4), nrow = 2) will convert the c(1,2,3,4) vector into the following 2x2 matrix:
[,1] [,2]
[1,] 1 3
[2,] 2 4

create a random non-singular matrix reliably

How can I create a matrix of pseudo-random values that is guaranteed to be non-singular? I tried the code below, but it failed. I suppose I could just loop until I got one by chance but I would prefer a more elegant "R-like" solution if anyone has an idea.
library(matrixcalc)
exampledf<- matrix(ceiling(runif(16,0,50)), ncol=4)
is.singular.matrix(exampledf) #this may or may not return false
using a while loop:
exampledf<-NULL
library(matrixcalc)
while(is.singular.matrix(exampledf)!=TRUE){
exampledf<- matrix(ceiling(runif(16,0,50)), ncol=4)
}
I suppose one method that guarantees (not is fairly likely, but actually guarantees) that the matrix is non-singular, is to start from a known non-singular matrix and apply the basic linear operations used for example in Gaussian Elimination: 1. add / subtract a multiple of one row from another row or 2. multiply row by a constant.
Depending on how "random" and how dense you want your matrix to be you can start from the identity matrix and multiply all elements with a random constant. Afterwards, you can apply a randomly selected set of operations from above, that will result in a non singular matrix. You can even apply a predefined set of operations, but using a randomly selected constant at each step.
An alternative could be to start from an upper triangular matrix for which the product of main diagonal entries is not zero. This is because the determinant of a triangular matrix is the product of the elements on the main diagonal. This effectively boils down to generating N random numbers, placing them on the main diagonal, and setting the rest of the entries (above the main diagonal) to whatever you like. If you want the matrix to be fully dense, add the first row to every other row of the matrix.
Of course this approach (like any other probably would) assumes that the matrix is relatively numerically stable and the singularity will not be affected by precision errors (as you know the precision of data types in all programming languages is limited). You would do well to avoid very small / very large values which can make the method numerically unstable.
It should be fairly unlikely that this will produce a singular matrix:
Mat1 <- matrix(rnorm(100), ncol=4)
Mat2 <- matrix(rnorm(100), ncol=4)
crossprod(Mat1,Mat2)
[,1] [,2] [,3] [,4]
[1,] 0.8138 5.112 2.945 -5.003
[2,] 4.9755 -2.420 1.801 -4.188
[3,] -3.8579 8.791 -2.594 3.340
[4,] 7.2057 6.426 2.663 -1.235
solve( crossprod(Mat1,Mat2) )
[,1] [,2] [,3] [,4]
[1,] -0.11273 0.15811 0.05616 0.07241
[2,] 0.03387 0.01187 0.07626 0.02881
[3,] 0.19007 -0.60377 -0.40665 0.17771
[4,] -0.07174 -0.31751 -0.15228 0.14582
inv1000 <- replicate(1000, {
Mat1 <- matrix(rnorm(100), ncol=4)
Mat2 <- matrix(rnorm(100), ncol=4)
try(solve( crossprod(Mat1,Mat2)))} )
str(inv1000)
#num [1:4, 1:4, 1:1000] 0.1163 0.0328 0.3424 -0.227 0.0347 ...
max(inv1000)
#[1] 451.6
> inv100000 <- replicate(100000, {Mat1 <- matrix(rnorm(100), ncol=4)
+ Mat2 <- matrix(rnorm(100), ncol=4)
+ is.singular.matrix( crossprod(Mat1,Mat2))} )
> sum(inv100000)
[1] 0

Replace 0 value in covariance matrix (pmvnorm)

so I'm using pmvnorm and a cycle for, as the elements in the covariance matrix can change according to the value of some parameters:
y<-c(0,0,0,0,0,0,0,0,0,0)
....
library(mvtnorm)
mu=c(18,12.72,(18*(c-d)+12.72*f))
covariance=matrix(c(5.7,0,5.7*(c-d),0,30.38,30.38*f^2,5.7*(c-d),30.38*f,(5.7*(c-d)^2+30.38*f^2)),3)
H=c(15,-Inf,-Inf)
L=c(Inf,15,g)
for(i in 1:10)
y[i]=pmvnorm(mean=mu,sigma=covariance,lower=H,upper=L)
where c,d,f etc were already defined.
It works but,in some cases I have the third r.v that has 0 variance and it appears an error. Is it possible to replace in the covariance matrix 0 value with very small value (as 1e-06?)
Thank you
If you just want to replace 0s with a very small value (as 1e-06)
covariance = matrix(c(0,2,3,0), ncol = 2)
covariance[covariance == 0] <- 1e-06
covariance
If this doesnt help Pascal is right, some details about your parameter could be helpful to look in to the positive definiteness problem of your covariance matrices.

Impossible to create correlated variables from this correlation matrix?

I would like to generate correlated variables specified by a correlation matrix.
First I generate the correlation matrix:
require(psych)
require(Matrix)
cor.table <- matrix( sample( c(0.9,-0.9) , 2500 , prob = c( 0.8 , 0.2 ) , repl = TRUE ) , 50 , 50 )
k=1
while (k<=length(cor.table[1,])){
cor.table[1,k]<-0.55
k=k+1
}
k=1
while (k<=length(cor.table[,1])){
cor.table[k,1]<-0.55
k=k+1
}
ind<-lower.tri(cor.table)
cor.table[ind]<-t(cor.table)[ind]
diag(cor.table) <- 1
This correlation matrix is not consistent, therefore, eigenvalue decomposition is impossible.
TO make it consistent I use nearPD:
c<-nearPD(cor.table)
Once this is done I generate the correlated variables:
fit<-principal(c, nfactors=50,rotate="none")
fit$loadings
loadings<-matrix(fit$loadings[1:50, 1:50],nrow=50,ncol=50,byrow=F)
loadings
cases <- t(replicate(50, rnorm(10)) )
multivar <- loadings %*% cases
T_multivar <- t(multivar)
var<-as.data.frame(T_multivar)
cor(var)
However the resulting correlations are far from anything that I specified initially.
Is it not possible to create such correlations or am I doing something wrong?
UPDATE from Greg Snow's comment it became clear that the problem is that my initial correlation matrix is unreasonable.
The question then is how can I make the matrix reasonable. The goal is:
each of the 49 variables should correlate >.5 with the first variable.
~40 of the variables should have a high >.8 correlation with each other
the remaining ~9 variables should have a low or negative correlation with each other.
Is this whole requirement impossible ?
Try using the mvrnorm function from the MASS package rather than trying to construct the variables yourself.
**Edit
Here is a matrix that is positive definite (so it works as a correlation matrix) and comes close to your criteria, you can tweak the values from there (all the Eigen values need to be positive, so you can see how changing a number affects things):
cor.mat <- matrix(0.2,nrow=50, ncol=50)
cor.mat[1,] <- cor.mat[,1] <- 0.55
cor.mat[2:41,2:41] <- 0.9
cor.mat[42:50, 42:50] <- 0.25
diag(cor.mat) <- 1
eigen(cor.mat)$values
Some numerical experimentation based on your specifications above suggests that the generated matrix will never (what never? well, hardly ever ...) be positive definite, but it also doesn't look far from PD with these values (making lcor below negative will almost certainly make things worse ...)
rmat <- function(n=49,nhcor=40,hcor=0.8,lcor=0) {
m <- matrix(lcor,n,n) ## fill matrix with 'lcor'
## select high-cor variables
hcorpos <- sample(n,size=nhcor,replace=FALSE)
## make all of these highly correlated
m[hcorpos,hcorpos] <- hcor
## compute min real part of eigenvalues
min(Re(eigen(m,only.values=TRUE)$values))
}
set.seed(101)
r <- replicate(1000,rmat())
## NEVER pos definite
max(r)
## [1] -1.069413e-15
par(las=1,bty="l")
png("eighist.png")
hist(log10(abs(r)),breaks=50,col="gray",main="")
dev.off()

Determining if a matrix is diagonalizable in the R Programming Language

I have a matrix and I would like to know if it is diagonalizable. How do I do this in the R programming language?
If you have a given matrix, m, then one way is the take the eigen vectors times the diagonal of the eigen values times the inverse of the original matrix. That should give us back the original matrix. In R that looks like:
m <- matrix( c(1:16), nrow = 4)
p <- eigen(m)$vectors
d <- diag(eigen(m)$values)
p %*% d %*% solve(p)
m
so in that example p %*% d %*% solve(p) should be the same as m
You can implement the full algorithm to check if the matrix reduces to a Jordan form or a diagonal one (see e.g., this document). Or you can take the quick and dirty way: for an n-dimensional square matrix, use eigen(M)$values and check that they are n distinct values. For random matrices, this always suffices: degeneracy has prob.0.
P.S.: based on a simple observation by JD Long below, I recalled that a necessary and sufficient condition for diagonalizability is that the eigenvectors span the original space. To check this, just see that eigenvector matrix has full rank (no zero eigenvalue). So here is the code:
diagflag = function(m,tol=1e-10){
x = eigen(m)$vectors
y = min(abs(eigen(x)$values))
return(y>tol)
}
# nondiagonalizable matrix
m1 = matrix(c(1,1,0,1),nrow=2)
# diagonalizable matrix
m2 = matrix(c(-1,1,0,1),nrow=2)
> m1
[,1] [,2]
[1,] 1 0
[2,] 1 1
> diagflag(m1)
[1] FALSE
> m2
[,1] [,2]
[1,] -1 0
[2,] 1 1
> diagflag(m2)
[1] TRUE
You might want to check out this page for some basic discussion and code. You'll need to search for "diagonalized" which is where the relevant portion begins.
All symmetric matrices across the diagonal are diagonalizable by orthogonal matrices. In fact if you want diagonalizability only by orthogonal matrix conjugation, i.e. D= P AP' where P' just stands for transpose then symmetry across the diagonal, i.e. A_{ij}=A_{ji}, is exactly equivalent to diagonalizability.
If the matrix is not symmetric, then diagonalizability means not D= PAP' but merely D=PAP^{-1} and we do not necessarily have P'=P^{-1} which is the condition of orthogonality.
you need to do something more substantial and there is probably a better way but you could just compute the eigenvectors and check rank equal to total dimension.
See this discussion for a more detailed explanation.

Resources