Reorder symmetric matrix in R

EDITED: Suppose I have a symmetric matrix such as the one below.
dat<-c(NA,2,3,4,5,2,NA,8,9,10,3,8,NA,14,15,4,9,14,NA,20,5,10,15,20,NA)
x<-matrix(dat,nrow = 5,dimnames = list(c("A","B","C","D","E"),c("A","B","C","D","E")))
x
I'm trying to find out whether R can reorder the matrix so that the highest values lie closer to the diagonal, with the maximum value of each column of the lower triangle as the first item in the diagonal, while the matrix stays symmetric. This is a problem in card sorting.
Here is the desired output:
result<-c(NA,20,15,10,5,20,NA,14,9,4,15,14,NA,8,3,10,9,8,NA,2,5,4,3,2,NA)
y<-matrix(result,nrow = 5,dimnames = list(c("E","D","C","B","A"),c("E","D","C","B","A")))
y

I had a similar requirement when examining a matrix containing similarities between documents.
k <- apply(x, 1, max, na.rm=TRUE)
order <- sort(k, decreasing=TRUE, index.return=TRUE)$ix
x[order, order]
I use max on each row to find the maximum value per row. na.rm ensures that the diagonal is not considered. sort then provides the desired order as a vector. Reorganising the matrix according to that order is as simple as x[order, order].
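As a minimal, self-contained sketch of the same idea, the snippet below rebuilds the matrix from the question and uses order() directly instead of sort(..., index.return=TRUE). The names labs, ord and y are only illustrative. Note that rows D and E both have a row maximum of 20, so their relative position depends on how the tie is broken and may not match the E, D order shown in the desired output.

```r
# Rebuild the symmetric matrix from the question
dat <- c(NA, 2, 3, 4, 5, 2, NA, 8, 9, 10, 3, 8, NA, 14, 15,
         4, 9, 14, NA, 20, 5, 10, 15, 20, NA)
labs <- c("A", "B", "C", "D", "E")
x <- matrix(dat, nrow = 5, dimnames = list(labs, labs))

# Row-wise maximum, ignoring the NA diagonal
k <- apply(x, 1, max, na.rm = TRUE)

# Permutation that puts the rows with the largest maximum first
ord <- order(k, decreasing = TRUE)

# Applying the same permutation to rows and columns keeps the matrix symmetric
y <- x[ord, ord]
y
identical(y, t(y))
```

Using a different variable name than order also avoids shadowing the base function of the same name.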

Related

Locate % of times that the second highest value appears for each column in R data frame

I have a dataframe in R as follows:
set.seed(123)
df <- as.data.frame(matrix(rnorm(20*5,mean = 0,sd=1),20,5))
I want to find the percentage of times that the highest value of each row appears in each column, which I can do as follows:
A <- table(names(df)[max.col(df)])/nrow(df)
Then the percentage of times that the second highest value of each row appears in each column can be found as follows:
df2 <- as.data.frame(t(apply(df,1,function(r) {
r[which.max(r)] <- 0.001
return(r)})))
B <- table(names(df2)[max.col(df2)])/nrow(df2)
How can I calculate in R the following?
C<- The percentage of times that the first and the second highest values
appear in the first two columns of `df` simultaneously
I would do it like this:
# compute reverse rank
df.rank <- ncol(df) - t(apply(df, 1, rank)) + 1
A <- colMeans(df.rank == 1)
B <- colMeans(df.rank == 2)
C <- mean(apply(df.rank[, 1:2], 1, prod)==2)
First I compute the reverse rank, which is analogous to using decreasing=TRUE with sort() or order(). A and B are then rather straightforward. Please note that your original approach omits zeros for columns where no (second) maximum value appears, which may cause problems in later usage.
For C, I take only the first two columns of the rank matrix and compute their product for every row. If the two largest values are in the first two columns, those ranks are 1 and 2 in some order, so the product has to be 2. Without ties, no other pair of ranks multiplies to 2.
Also, if ties might appear in your data set you should consider selecting the appropriate ties.method argument for rank.
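To make the rank approach easier to follow, here is a small sketch on a hand-made 3 x 3 data frame (the values are made up for illustration, not taken from the question). It passes ties.method = "first" through apply to rank, which is one possible choice that keeps the ranks distinct when ties occur.

```r
# Toy data: three rows, three columns
df <- data.frame(V1 = c(3, 1, 9), V2 = c(2, 5, 8), V3 = c(1, 4, 7))

# Reverse rank per row: 1 marks the largest value, 2 the second largest
df.rank <- ncol(df) - t(apply(df, 1, rank, ties.method = "first")) + 1

A <- colMeans(df.rank == 1)   # share of rows where each column holds the maximum
B <- colMeans(df.rank == 2)   # share of rows where each column holds the runner-up
C <- mean(apply(df.rank[, 1:2], 1, prod) == 2)  # top two both in columns 1 and 2

A
B
C
```

In this toy data the first and third rows have their two largest values in the first two columns, so C is 2/3, and column V3 gets an explicit 0 in A instead of being dropped.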

Using apply function to obtain only those rows that pass a threshold in R

I am trying to apply a filter on my data (which is in the form of matrix) with say 10 columns, 200 rows.
I want to retain only those rows where the coefficient of variation is greater than a threshold. But with the code I have, it seems to be printing the coefficient of variation for the rows passing the threshold. I want it to just test whether a row passes the threshold, but print the original data points in the matrix.
covar <- function(x) ( sd(x)/mean(x) )
evar <- apply(myMatrix,1,covar)
myMatrix_filt_var <-myMatrix[evar>2,]
The threshold I set here is 2.
What am I doing wrong? Sorry, I'm just learning R.
Thanks!
If m is your matrix, then,
m[apply(m, 1, function(x) sd(x)/mean(x) > 2), ]
should give you the filtered matrix. The idea is to compute the coefficient of variation for every row and test whether it is > 2 inside the function passed to apply. This returns a logical vector, and indexing with it as m[logical_vector, ] keeps exactly those rows where the condition is TRUE.
You can use na.rm = TRUE if you want to remove NA values while calculating sd and mean.
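A slightly more defensive sketch of the same filter is shown below. It assumes simulated data in place of the real matrix, and the names cv, keep and m_filt are only illustrative. The na.rm = TRUE arguments skip missing values, and drop = FALSE keeps the result a matrix even if only one row passes.

```r
set.seed(42)
# Simulated stand-in for the real data: 200 rows, 10 columns
m <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)

# Coefficient of variation of one row, ignoring NA values
cv <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# Logical vector: TRUE for rows whose coefficient of variation exceeds 2
keep <- apply(m, 1, cv) > 2

# Keep the original values of the passing rows; drop = FALSE preserves the matrix shape
m_filt <- m[keep, , drop = FALSE]
dim(m_filt)
```

Keeping the logical vector in its own variable also makes it easy to check how many rows pass with sum(keep).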

calculation of block quantities in a matrix in R

I have a very general question on data manipulations in R, and I am seeking a convenient and fast way. Suppose I have a matrix of dimension (R)-by-(nxm), i.e. R rows and n times m columns.
set.seed(999)
n = 5; m = 10; R = 100
ncol = m*n
mat = matrix(rnorm(n*m*R), nrow=R, ncol=ncol)
Now I want to have a new matrix (call it new.mat) of dimension (R)-by-(m), i.e. given a certain row of mat, I want to calculate a number (say sum) for the first n elements, then a number for the next n elements, and so on. In this way, the first row of mat ends up with m numbers. The same thing is done for every other row of mat.
For the given example above, the 1st element of the 1st row of the new matrix new.mat should be sum(mat[1,1:5]), the 2nd element is sum(mat[1,6:10]), and the last element is sum(mat[1,46:50]). The 2nd row of new.mat is (sum(mat[2,1:5]), sum(mat[2,6:10]),...).
If possible, avoiding for loops is preferred. Thank you!
rowsum is a useful function here. You will have to do a bit of transposing to get what you want.
You need to create a grouping vector that is something like c(1,1,1,1,1,2,2,2,2,2,....,10,10,10,10,10):
grp <- rep(seq_len(ceiling(ncol(mat)/5)), each = 5, length.out = ncol(mat))
# this will also work, but may be less clear why.
# grp <- (seq_len(ncol(mat))-1) %/%5
rowsum computes column sums across rows of a numeric matrix-like object for each level of a grouping variable.
You are looking for row sums across columns, so you will have to transpose your results (and your input):
t(rowsum(t(mat),grp))
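The following sketch reruns the example from the question end to end and checks the result against the definition. The cross-check via a three-dimensional array is an alternative added here for comparison, not part of the answer above, and the names new.mat and alt are illustrative.

```r
set.seed(999)
n <- 5; m <- 10; R <- 100
mat <- matrix(rnorm(n * m * R), nrow = R, ncol = n * m)

# One group label per column: 1,1,1,1,1,2,2,2,2,2,...,10,10,10,10,10
grp <- rep(seq_len(m), each = n)

# rowsum() sums rows by group, so transpose in and transpose back out
new.mat <- t(rowsum(t(mat), grp))
dim(new.mat)

# Check the first block against the definition in the question
all.equal(unname(new.mat[1, 1]), sum(mat[1, 1:5]))

# Cross-check: view mat as an R x n x m array and sum over the middle dimension
alt <- apply(array(mat, dim = c(R, n, m)), c(1, 3), sum)
all.equal(unname(new.mat), alt)
```

The array view works because R stores matrices in column-major order, so each consecutive block of n columns becomes one slice along the third dimension.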

generate all possible column combinations and create one matrix for each of them in R

I have a matrix like this one:
myarray=cov(matrix(rexp(200),50,10))
I would like to generate all possible combinations of columns and compute the correlation matrix for each combination, if possible, using column numbers instead of names. In a second step I would like to compute the determinant of each matrix so maybe there is an efficient way to do it.
Here is one way:
list.of.matrices <- apply(expand.grid(rep(list(c(FALSE, TRUE)), ncol(myarray))),
1, function(j)myarray[, j, drop = FALSE])
length(list.of.matrices)
# [1] 1024
Then do something like:
result <- sapply(list.of.matrices, function_of_your_choice)
but note that det can only be applied to square matrices... Please clarify.
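If the goal is a determinant for every combination, one possible reading is to take the same indices for rows and columns, so that each piece is a square submatrix of the covariance matrix. The sketch below works under that assumption. It uses column numbers via combn, converts each submatrix to a correlation matrix with cov2cor, and skips the empty and single-column subsets. The names subsets and dets are illustrative, and rexp(500) is used instead of the question's rexp(200) so that the 50 x 10 matrix is filled without recycling.

```r
set.seed(1)
myarray <- cov(matrix(rexp(500), 50, 10))
p <- ncol(myarray)

# All subsets of column numbers of size 2..p, as a flat list of integer vectors
subsets <- unlist(
  lapply(2:p, function(k) combn(p, k, simplify = FALSE)),
  recursive = FALSE
)

# Determinant of the correlation matrix of each square submatrix
dets <- vapply(
  subsets,
  function(j) det(cov2cor(myarray[j, j, drop = FALSE])),
  numeric(1)
)

length(subsets)
head(dets)
```

With 10 columns this gives 2^10 - 10 - 1 = 1013 subsets, so the approach stays cheap here, but the count doubles with every added column.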

converting irregular grid to regular grid

I have a set of observations on an irregular grid. I want to have them on a regular grid with a resolution of 5. This is an example:
d <- data.frame(x=runif(1e3, 0, 30), y=runif(1e3, 0, 30), z=runif(1e3, 0, 30))
## interpolate xy grid to change irregular grid to regular
library(akima)
d2 <- with(d,interp(x, y, z, xo=seq(0, 30, length = 500),
yo=seq(0, 30, length = 500), duplicate="mean"))
How can I get d2 into the SpatialPixelsDataFrame class, which has 3 columns: the coordinates and the interpolated values?
You can use code like this (thanks to the comment by @hadley):
d3 <- data.frame(x=d2$x[row(d2$z)],
y=d2$y[col(d2$z)],
z=as.vector(d2$z))
The idea here is that a matrix in R is just a vector with a bit of extra information about its dimensions. The as.vector call drops that information, turning the 500x500 matrix into a linear vector of length 500*500=250000. The subscript operator [ does the same, so although row and col originally return a matrix, that is treated as a linear vector as well. So in total, you have three matrices, turn them all to linear vectors with the same order, use two of them to index the x and y vectors, and combine the results into a single data frame.
My original solution didn't use row and col, but instead rep to formulate the x and y columns. It is a bit more difficult to understand and remember, but might be a bit more efficient, and give you some insight useful for more difficult applications.
d3 <- data.frame(x=rep(d2$x, times=500),
y=rep(d2$y, each=500),
z=as.vector(d2$z))
For this formulation, you have to know that a matrix in R is stored in column-major order. The second element of the linearized vector therefore is d2$z[2,1], so the row number will change between two subsequent values, while the column number will remain the same for a whole column. Consequently, you want to repeat the x vector as a whole, but repeat each element of y by itself. That's what the two rep calls do.
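To see that the two formulations agree, here is a tiny sketch on a hand-made 3 x 2 grid. The list d2 below is a made-up stand-in with the same x, y and z components as the interp result, so it runs without akima. The names a and b are illustrative.

```r
# Stand-in for the interp() result: x has 3 values, y has 2, z is 3 x 2
d2 <- list(x = c(0, 5, 10), y = c(0, 5), z = matrix(1:6, nrow = 3, ncol = 2))

# Formulation 1: index x by row number and y by column number
a <- data.frame(x = d2$x[row(d2$z)],
                y = d2$y[col(d2$z)],
                z = as.vector(d2$z))

# Formulation 2: repeat x as a whole, repeat each element of y
b <- data.frame(x = rep(d2$x, times = length(d2$y)),
                y = rep(d2$y, each = length(d2$x)),
                z = as.vector(d2$z))

a
all.equal(a, b)

# Column-major storage: the second linearized element is z[2, 1]
as.vector(d2$z)[2] == d2$z[2, 1]
```

Using length(d2$y) and length(d2$x) instead of the hard-coded 500 keeps the second formulation valid when the grid is not square.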
