Diagonal replacement in an R correlogram

I am rather new to R. I am trying to replace the main diagonal of a correlogram (which, of course, consists of ones). I have created the vectors for the correlogram and used the cor() function to create the correlogram. I also created a list of the values I want in place of the ones: the internal reliabilities (Cronbach's alpha) of the correlogram vectors.
library(cocron)
library(fmsb)
# defining correlated variables
JOB_ins = subset(df,select=c("q9","Rq10_new","q11","q12"))
INT_to_quit = subset(df,select=c("q13","q14","Rq15_new","q16"))
Employability = subset(df,select=c("q17","q18","q19","q20"))
Mobility_pref = subset(df,select=c("Rq21","Rq22","Rq23","Rq24","Rq25"))
Career_self_mgmt = subset(df,select=c("q26","q27","q28","q29","q30"
,"q31","q32","q33"))
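# combining the scales: one score per scale (assumed here: the mean of its items),
# so that corrmat is 5 x 5, matching the five reliabilities
x = data.frame(JOB_ins = rowMeans(JOB_ins), INT_to_quit = rowMeans(INT_to_quit),
               Employability = rowMeans(Employability),
               Mobility_pref = rowMeans(Mobility_pref),
               Career_self_mgmt = rowMeans(Career_self_mgmt))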
#creating a correlation matrix
corrmat = cor(x)
#creating Cronbach Alpha reliabilities vector for diagonal replacement
dlist=list(round(CronbachAlpha(JOB_ins),2),round(CronbachAlpha(INT_to_quit),2)
,round(CronbachAlpha(Employability),2)
,round(CronbachAlpha(Mobility_pref),2)
,round(CronbachAlpha(Career_self_mgmt),2))
#replacing the main diagonal
diag(corrmat)=dlist
Doing that does replace the main diagonal, but it also seems to turn my correlogram from a matrix into a vector. Any idea how I can keep that from happening, or reverse it?

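First, use a vector instead of a list: replace list(round(CronbachAlpha(JOB_ins),2),...) with c(round(CronbachAlpha(JOB_ins),2),...). Assigning a list into a numeric matrix coerces corrmat to a list, which is why it no longer behaves like your correlation matrix; a numeric vector keeps corrmat a numeric matrix. If dlist already exists, diag(corrmat) = unlist(dlist) works as well.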
Second, you can easily convert a vector into a matrix. For example, matrix(c(1,2,3,4), nrow = 2) converts the vector c(1,2,3,4) into the following 2x2 matrix:
     [,1] [,2]
[1,]    1    3
[2,]    2    4
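A minimal sketch of the difference between the two replacements. The data frame and the alpha values are made up (hypothetical stand-ins for the scale scores and the CronbachAlpha() results):

```r
# Hypothetical data: three random variables standing in for the scale scores
set.seed(1)
x <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))

# Hypothetical reliabilities standing in for the rounded Cronbach alphas
alphas <- c(0.81, 0.77, 0.90)

# Replacing the diagonal with a numeric vector keeps a numeric matrix
corrmat <- cor(x)
diag(corrmat) <- alphas
print(is.numeric(corrmat))  # TRUE
print(round(corrmat, 2))

# Replacing it with a list coerces the whole object to a list
corrmat_list <- cor(x)
diag(corrmat_list) <- as.list(alphas)
print(is.list(corrmat_list))  # TRUE

# A list built earlier can still be used after unlist()
corrmat_fixed <- cor(x)
diag(corrmat_fixed) <- unlist(as.list(alphas))
print(identical(corrmat, corrmat_fixed))  # TRUE
```

The unlist() route is handy when the reliabilities were already collected in a list, as in the question.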

Related

Multiplying a matrix by a vector results in a matrix

I have a document-term matrix:
document_term_matrix <- as.matrix(DocumentTermMatrix(corpus, control = list(stemming = FALSE, stopwords=FALSE, minWordLength=3, removeNumbers=TRUE, removePunctuation=TRUE )))
For this document-term matrix, I've calculated the local and global term weightings as follows:
lw_tf <- lw_tf(document_term_matrix)
gw_idf <- gw_idf(document_term_matrix)
lw_tf is a matrix with the same dimensions as the document-term matrix (n x m), and gw_idf is a vector of length n. However, when I run:
tf_idf <- lw_tf * gw_idf
The dimensions of tf_idf are again n x m.
Originally, I would not have expected this multiplication to work, as the dimensions are not conformable. However, given this output, I now expect gw_idf to be m x m. Is this indeed the case? And if so, what happened to the gw_idf vector of length n?
Matrix multiplication is done in R with %*%, not * (the latter is element-wise multiplication). Your reasoning is partially correct; you were just using the wrong operator.
As for matrix multiplication: it is only possible if the number of columns of the first matrix equals the number of rows of the second matrix. The result has as many rows as the first matrix and as many columns as the second.
In your case, that means a 1 x n matrix multiplied by an n x m matrix, which results in a 1 x m matrix. You can check this with the following example:
a <- matrix(runif(100, 0 , 1), nrow = 1, ncol = 100)
b <- matrix(runif(100 * 200, 0, 1), nrow = 100, ncol = 200)
c <- a %*% b
dim(c)
[1] 1 200
Now, about your specific case: I don't have the package that builds the document-term matrix (it would be nice of you to provide an easily reproducible example!), but the behaviour follows from how * works. If you multiply an n x m matrix element-wise (with *, as mentioned at the beginning) by a vector of length n, R does not require conformable dimensions; it recycles the vector down each column, because matrices are stored column by column. Row i of lw_tf is therefore multiplied by gw_idf[i], and the result keeps the n x m shape of lw_tf. gw_idf itself is still a vector of length n; it is not turned into an m x m matrix.
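A small sketch with made-up numbers (lw and gw are hypothetical stand-ins for lw_tf and gw_idf) showing that * recycles a length-n vector down the columns, which gives the same result as scaling the rows with diag(gw) %*% lw:

```r
# Hypothetical stand-ins: lw is n x m (3 x 2), gw has length n (3)
lw <- matrix(1:6, nrow = 3, ncol = 2)   # filled column by column: 1,2,3 | 4,5,6
gw <- c(10, 100, 1000)

# Element-wise product: gw is recycled down each column,
# so row i of lw is multiplied by gw[i]
elementwise <- lw * gw
print(elementwise)
#      [,1] [,2]
# [1,]   10   40
# [2,]  200  500
# [3,] 3000 6000

# The same row scaling written as a true matrix product
explicit <- diag(gw) %*% lw
print(all.equal(elementwise, explicit))  # TRUE

# A real matrix product of the vector with lw: (1 x n) %*% (n x m) is 1 x m
print(dim(gw %*% lw))  # 1 2
```

So the n x m shape of tf_idf comes from lw_tf, and each term's row is weighted by its global weight.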

Using sapply instead of a for loop

I'm working on a project where we need to average each number in a matrix with the numbers around it. For example, imagine a 3x3 matrix such as
[[1,2,3],
[4,5,6],
[7,8,9]].
Step 1 is to add padding around the matrix. Let's say we add one layer of padding, giving a 5x5 matrix
[[0,0,0,0,0],
[0,1,2,3,0],
[0,4,5,6,0],
[0,7,8,9,0],
[0,0,0,0,0]].
matrix(c(0,0,0,0,0,0,1,2,3,0,0,4,5,6,0,0,7,8,9,0,0,0,0,0,0), nrow=5, ncol=5, byrow=T)
Then we average over each window and filter, getting the final 3x3 matrix. The first row/first column of this matrix should be (1+2+4+5)/9 = 1.33, since the 3x3 window around the 1 also contains five padding zeros.
Right now my code works and looks like this:
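filter.m <- matrix(0, nrow(pad.m) - 2*k, ncol(pad.m) - 2*k)  # output, assumed pre-allocated
for(row in (k+1):(nrow(pad.m) - k)){
  for(col in (k+1):(ncol(pad.m) - k)) {
    y <- pad.m[seq(row-k, row+k), seq(col-k, col+k)]  # (2k+1) x (2k+1) window
    filter.m[row-k, col-k] <- mean(y)
  }
}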
where k is the number of padding layers and pad.m is our padded matrix. Unfortunately, my professor says this is too unwieldy and prefers sapply over two for loops. I was wondering how I could subset and iterate through the matrix with sapply.
Use tensorflow. You can use either a convolutional layer or a pooling layer. Example:
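library(tensorflow)
mymat <- matrix(c(0,0,0,0,0,0,1,2,3,0,0,4,5,6,0,0,7,8,9,0,0,0,0,0,0), nrow=5, ncol=5, byrow=T) # Your padded matrix
# reshape into a 4D tensor: (batch, height, width, channels) = (1, 5, 5, 1)
matrix1 <- tf$constant( array(mymat, dim=c(1,nrow(mymat),ncol(mymat),1)), dtype="float64" )
# 2x2 average pooling, stride 1, "SAME" padding
pool1 <- tf$nn$avg_pool(matrix1, c(1L,2L,2L,1L), c(1L,1L,1L,1L), "SAME")
# session-based evaluation: TensorFlow 1.x API (tf$Session is not in the top-level namespace of TensorFlow 2.x)
sess <- tf$Session()
sess$run(tf$global_variables_initializer())
res <- pool1$eval(session=sess)
sess$close()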
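The above takes the average over 2x2 regions. In your example, the window around each corner cell contains only a 2x2 block of non-zero values, which are summed and divided by 9, so multiplying the 2x2 averages by 4/9 reproduces your values for the four corner cells: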
res <- res[1,,,]
(res * 4/9)[-1,][,-1][-(3:4),][,-(3:4)]
         [,1]     [,2]
[1,] 1.333333 1.777778
[2,] 2.666667 3.111111
The above is just formatting the array output back to matrix.
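Since the question asks for sapply specifically, here is a minimal sketch that replaces the two for loops with nested sapply calls. It is an alternative to the TensorFlow approach, reuses the padded matrix and k from the question, and computes the full 3x3 result; window_mean() is a hypothetical helper name.

```r
# The padded 5 x 5 matrix from the question, with k = 1 layer of zeros
pad.m <- matrix(c(0,0,0,0,0, 0,1,2,3,0, 0,4,5,6,0, 0,7,8,9,0, 0,0,0,0,0),
                nrow = 5, ncol = 5, byrow = TRUE)
k <- 1

# Row and column positions (in the padded matrix) of the original cells
rows <- (k + 1):(nrow(pad.m) - k)
cols <- (k + 1):(ncol(pad.m) - k)

# Mean of the (2k+1) x (2k+1) window centred on pad.m[i, j]
window_mean <- function(i, j) {
  mean(pad.m[(i - k):(i + k), (j - k):(j + k)])
}

# The inner sapply returns one column (all rows for a fixed j); the outer
# sapply binds those columns, so filter.m[r, s] == window_mean(rows[r], cols[s])
filter.m <- sapply(cols, function(j) sapply(rows, function(i) window_mean(i, j)))
print(round(filter.m, 3))
#       [,1]  [,2]  [,3]
# [1,] 1.333 2.333 1.778
# [2,] 3.000 5.000 3.667
# [3,] 2.667 4.333 3.111
```

Using vapply(cols, function(j) ..., FUN.VALUE = numeric(length(rows))) for the outer call would additionally check that every column has the expected length.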

Pearson coefficient per rows on large matrices

I'm currently working with a large matrix (4 cols and around 8000 rows).
I want to perform a correlation analysis using Pearson's correlation coefficient between the different rows composing this matrix.
I would like to proceed as follows:
Find Pearson's correlation coefficient between rows 1 and 2, then between rows 1 and 3, and so on through the remaining rows.
Then find Pearson's correlation coefficient between rows 2 and 3, then between rows 2 and 4, and so on. Note that I won't compute the coefficient with row 1 again...
For coefficients above 0.7 or below -0.7, I would like to write the corresponding row names, plus the coefficient, to a separate file. E.g.:
row 230 - row 5812 - 0.76
I wrote the following code for this. Unfortunately, it takes far too long to run (I estimated almost a week :( ).
arist1p <- NULL   # collects the matching pairs (initialisation assumed)
for (i in 1:7999) {
  print("Analyzing row:")
  print(i)
  for (j in (i+1):8000) {
    value <- cor(alpha1k[i,], alpha1k[j,], use = "everything", method = "pearson")
    if (value > 0.7 | value < (-0.7)) {
      aristi <- c(row.names(alpha1k)[i], row.names(alpha1k)[j], value)
      arist1p <- rbind(arist1p, aristi)
    }
  }
}
So my question is whether there's any way to do this faster. I read about running these calculations in parallel, but I have no clue how to make that work. I hope I made myself clear enough; thank you in advance!
As Roland pointed out, you can use the matrix version of cor() to simplify your task. cor() correlates the columns of a matrix, so just transpose your matrix to get a "row" comparison.
mydf <- data.frame(a = c(1,2,3,1,2,3,1,2,3,4), b = rep(5, 10), c = c(1:10))
cor_mat <- cor(t(mydf)) # correlation of your transposed matrix
idx <- which(abs(cor_mat) > 0.7 & upper.tri(cor_mat), arr.ind = TRUE) # relevant indexes; upper triangle = each pair once, no diagonal
cbind(idx, cor_mat[idx]) # combine coordinates and the correlation
Note that use = "everything" and method = "pearson" are the defaults for cor(), so there is no need to specify them.
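A sketch of the full task with this approach, assuming alpha1k is a numeric matrix with row names (simulated here with 500 rows so it runs quickly). For 8,000 rows the correlation matrix has 64 million entries, roughly 512 MB of doubles, which is large but often manageable. write.csv() and the temporary output path are illustrative choices.

```r
# Hypothetical stand-in for alpha1k: 500 rows x 4 columns with row names
set.seed(42)
alpha1k <- matrix(rnorm(500 * 4), ncol = 4,
                  dimnames = list(paste0("row", 1:500), NULL))

# cor() correlates columns, so transpose to correlate rows
cor_mat <- cor(t(alpha1k))

# Keep each pair once (upper triangle, i < j), which also drops the diagonal
idx <- which(abs(cor_mat) > 0.7 & upper.tri(cor_mat), arr.ind = TRUE)

# Build the "row i - row j - r" table from the question
result <- data.frame(row_i = rownames(cor_mat)[idx[, "row"]],
                     row_j = rownames(cor_mat)[idx[, "col"]],
                     r     = round(cor_mat[idx], 2))

# Write to a separate file (a temporary path here; substitute your own)
out_file <- tempfile(fileext = ".csv")
write.csv(result, out_file, row.names = FALSE)
head(result)
```

This replaces the roughly 32 million separate cor() calls and the repeated rbind() of the loop with a single vectorised cor() call plus one which().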

How to create a matrix with probability distribution in R

I want to create a matrix in R with elements -1, 0 and 1, drawn with probabilities 1/6, 2/3 and 1/6 respectively. The probabilities may change during runtime. With static probabilities I get the output, but the problem is changing the probabilities dynamically.
For example, if I create a matrix for the above probabilities with the values [sqrt(3), 0, -sqrt(3)], the required output is.
Note: the probabilities should not be static; as mentioned, they may vary during runtime.
Kindly help to solve this.
Supposing you want a 2x3 matrix:
matrix(sample(c(-1,0,1), size=6, replace=TRUE, prob=c(1/6,2/3,1/6)), nrow=2)
So you sample from the values you want, with the probabilities given in prob. The result is just a vector, but you can reshape it into a matrix of the desired shape with matrix() afterwards. To keep the probabilities dynamic, pass them in a variable instead of hard-coding the values.
If the numbers should be distributed according to a certain scheme rather than randomly drawn according to a probability, replicate the vector elements and shuffle them:
matrix(sample(rep(c(-1,0,1), times=c(1,4,1))), nrow=2)
You can try this to generate an m x n matrix:
sample.dynamic.matrix <- function(pop.symbols, probs, m, n) {
samples <- sample(pop.symbols, m*n, prob = probs, replace=TRUE)
return(matrix(samples, nrow=m))
}
set.seed(123)
sample.dynamic.matrix(-1:1, c(1/6,2/3,1/6), 2, 3)
#      [,1] [,2] [,3]
# [1,]    0    0   -1
# [2,]    1   -1    0
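A sketch of the runtime change using the values [sqrt(3), 0, -sqrt(3)] from the question and the same helper as in the answer above. The second probability vector is arbitrary, chosen only to illustrate a change between calls, and the frequency check at the end is an informal sanity check.

```r
# Same helper as in the answer above: m x n draws from pop.symbols with weights probs
sample.dynamic.matrix <- function(pop.symbols, probs, m, n) {
  samples <- sample(pop.symbols, m * n, prob = probs, replace = TRUE)
  matrix(samples, nrow = m)
}

set.seed(1)
vals <- c(sqrt(3), 0, -sqrt(3))   # values from the question

# Probabilities held in a variable, so they can change at runtime
p <- c(1/6, 2/3, 1/6)
A <- sample.dynamic.matrix(vals, p, 4, 5)

# Later the probabilities change (these new values are arbitrary)
p <- c(1/4, 1/2, 1/4)
B <- sample.dynamic.matrix(vals, p, 4, 5)

# With many draws, the observed frequencies approach the requested probabilities
big <- sample.dynamic.matrix(vals, c(1/6, 2/3, 1/6), 300, 300)
print(round(table(big) / length(big), 3))
```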

Using mat2listw function in R to create spatial weights matrix

I am attempting to create a weights object in R with the mat2listw function. I have a very large spatial weights matrix (roughly 22,000 x 22,000) that was created in Excel and read into R, and I'm now trying to run:
library(spdep)
SW=mat2listw(matrix)
I am getting the following error:
Error in if (any(x<0)) stop ("values in x cannot be negative"): missing value where TRUE/FALSE needed.
What's going wrong here? My current matrix is all 0's and 1's, with no missing values and no negative elements. What am I missing?
I'd appreciate any advice. Thanks in advance for your help!
Here is a simple test related to your previous comment:
library(spdep)
m1 <-matrix(rbinom(100, 1, 0.5), ncol =10, nrow = 10) #create a random 10 * 10 matrix
m2 <- m1 # create a duplicate of the first matrix
m2[5,4] <- NA # assign an NA value in the second matrix
SW <- mat2listw(m1) # create weight list matrix
SW2 <- mat2listw(m2) # create weight list matrix
The first matrix does not fail, but the second one does. So the real question is why your weights matrix contains NAs. Have you considered creating the spatial weights matrix in R directly, using dnearneigh() or a similar function?
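A base-R sketch of how one might track down those NAs before calling mat2listw(). The 6 x 6 matrix W is made up for illustration, and replacing NA with 0 is only appropriate if a missing cell really means "not neighbours" (an assumption). Blank cells, or stray header rows and columns in the Excel sheet, can turn into NAs on import.

```r
# Hypothetical 6 x 6 binary weights matrix with one NA, as if read from Excel
W <- matrix(c(0,1,0,0,0,1,
              1,0,1,0,0,0,
              0,1,0,1,0,0,
              0,0,1,0,1,0,
              0,0,0,1,0,1,
              1,0,0,0,1,0), nrow = 6, byrow = TRUE)
W[5, 4] <- NA

# How many NAs are there, and where?
print(sum(is.na(W)))
na_pos <- which(is.na(W), arr.ind = TRUE)
print(na_pos)

# Other import problems worth ruling out: non-square shape, non-numeric storage
stopifnot(nrow(W) == ncol(W), is.numeric(W))

# One possible fix, only if a missing cell really means "not neighbours"
W_clean <- W
W_clean[is.na(W_clean)] <- 0
print(anyNA(W_clean))  # FALSE
```

Once anyNA() returns FALSE, the cleaned matrix can be passed to mat2listw().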
