How to repeat in R

I am new to R. I have a vector H = c(0.6, 0.045, 3) and I want to create a matrix A whose number of rows I can choose myself, where every row is this vector (0.6, 0.045, 3), like this:
A
0.6 0.045 3
0.6 0.045 3
0.6 0.045 3
0.6 0.045 3
...

You can specify the number of rows and columns in the matrix() function; byrow = TRUE fills the matrix row by row, so every row is a copy of the vector.
vec <- c(0.6,0.045,3)
nr <- 4
matrix(vec, nrow = nr, ncol = length(vec), byrow = TRUE)
# [,1] [,2] [,3]
#[1,] 0.6 0.045 3
#[2,] 0.6 0.045 3
#[3,] 0.6 0.045 3
#[4,] 0.6 0.045 3
Another option is to use replicate(), which returns one copy of the vector per column, so transpose the result with t():
t(replicate(nr, vec))
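A minimal sketch, assuming you want a reusable function: repeat_rows() is a hypothetical helper name, not part of base R. It wraps the matrix() call above and checks it against replicate() and a third variant using rep(..., each = ), which relies on R filling matrices column by column.

```r
# Hypothetical helper: stack n copies of vec as the rows of a matrix
repeat_rows <- function(vec, n) {
  # byrow = TRUE fills row by row, so each row is one copy of vec
  matrix(vec, nrow = n, ncol = length(vec), byrow = TRUE)
}

vec <- c(0.6, 0.045, 3)
A <- repeat_rows(vec, 4)

# replicate() builds one column per copy; t() turns those columns into rows
A_replicate <- t(replicate(4, vec))

# Column-major fill: column j receives 4 copies of vec[j]
A_rep_each <- matrix(rep(vec, each = 4), nrow = 4)

identical(A, A_replicate)
identical(A, A_rep_each)
A
```

The byrow version is probably the most readable; the rep(..., each = ) form is handy when you want to avoid the extra transpose.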

Related

row sum based on matrix with logicals

I have two matrices that look similar to this example:
> matrix(1:9, nrow = 3, ncol = 3)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> matrix(rexp(9), 3) < 1
[,1] [,2] [,3]
[1,] TRUE TRUE FALSE
[2,] FALSE TRUE FALSE
[3,] FALSE FALSE TRUE
I want to sum the entries of each row, but only the entries where the logical matrix of the same size is TRUE; all other entries should be left out of the sum. Every row has at least one TRUE in the second matrix.
The result should look like this
[,1]
[1,] 12
[2,] 5
[3,] 9
Thanks for the help.
Multiplying your TRUE/FALSE matrix by the other one zeroes out every element where the logical matrix is FALSE, because TRUE is treated as 1 and FALSE as 0. You can then sum by row.
m1 <- matrix(1:9, nrow = 3, ncol = 3)
m2 <- matrix(rexp(9), 3) < 1
matrix(rowSums(m1 * m2), ncol = 1)
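Alternatively, replace the elements you want to exclude with NA and use rowSums() with na.rm = TRUE. Note that replace(m1, m2, NA) blanks out the cells where m2 is TRUE, so this answer builds m2 with the opposite comparison (>= 1, see the data below).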
matrix(rowSums(replace(m1, m2, NA), na.rm = TRUE))
# [,1]
#[1,] 12
#[2,] 5
#[3,] 9
Or use ifelse(), which likewise puts 0 in the cells where m2 is TRUE:
matrix(rowSums(ifelse(m2, 0, m1)))
data
m1 <- matrix(1:9, nrow = 3, ncol = 3)
m2 <- matrix(rexp(9), 3) >= 1
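Because rexp() is random, the mask in the question cannot be regenerated exactly. The sketch below types the printed logical matrix in by hand (keep is a name chosen here) and runs all three approaches so that TRUE means "include this cell", as the question asks. With this particular mask, row 1 sums to 1 + 4 = 5.

```r
m1 <- matrix(1:9, nrow = 3, ncol = 3)

# The logical matrix printed in the question, typed in by hand
# (filled column by column, like every R matrix)
keep <- matrix(c(TRUE, FALSE, FALSE,
                 TRUE, TRUE, FALSE,
                 FALSE, FALSE, TRUE), nrow = 3)

# 1. Multiplication: TRUE acts as 1, FALSE as 0
res_mult <- matrix(rowSums(m1 * keep), ncol = 1)

# 2. NA replacement: blank out the cells we do NOT want, then skip NAs
res_na <- matrix(rowSums(replace(m1, !keep, NA), na.rm = TRUE), ncol = 1)

# 3. ifelse(): keep m1 where TRUE, use 0 elsewhere
res_ifelse <- matrix(rowSums(ifelse(keep, m1, 0)), ncol = 1)

res_mult
```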

Efficient way to generate a coincidence matrix

I want to generate a simple coincidence matrix. I've looked for R packages but have not found one that does this calculation so far; perhaps the English term for this matrix differs from the Portuguese one. So here is what I need to do.
I have a matrix:
[,1] [,2] [,3] [,4]
[1,] 1 1 2 1
[2,] 1 2 3 1
[3,] 2 3 1 2
[4,] 1 2 3 3
The coincidence matrix is calculated by comparing two rows element by element and turning the result into a dissimilarity with the formula:
Diss = 1 - (Coincidences / (Coincidences + Discordance))
So the resulting matrix is symmetric, with dimensions 4x4 and diagonal elements equal to 0. In the example, A(1,2) would be:
A(1,2) = 1 - (2 / 4) = 0.5
A(1,3) = 1 - (0/4) = 1.0
And so on...
I have created a function to generate this matrix:
cs_matrix <- function (x) {
  cs.mat <- matrix(rep(0,dim(x)[1]^2), ncol = dim(x)[1])
  for (i in 1:dim(x)[1]){
    for (j in 1:dim(x)[1]){
      cs.mat[i,j] <- 1 - (sum(x[i,] == x[j,]) / dim(x)[2])
    }
  }
  return(cs.mat)
}
The function works, but my actual data set has 2560 observations of 4 variables, which produces a 2560 x 2560 coincidence matrix, and the calculation takes quite some time. Is there a more efficient way to calculate this, or a package that already calculates this dissimilarity? The matrix will later be used in cluster analysis.
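I think you can use outer(). Note that Vectorize() still calls add() once per pair of rows, so this mostly tidies the code rather than removing the per-pair cost: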
add <- function(x, y) sum(mat[x, ] == mat[y,])
nr <- seq_len(nrow(mat))
mat1 <- 1 - outer(nr, nr, Vectorize(add))/ncol(mat)
mat1
# [,1] [,2] [,3] [,4]
#[1,] 0.00 0.50 1 0.75
#[2,] 0.50 0.00 1 0.25
#[3,] 1.00 1.00 0 1.00
#[4,] 0.75 0.25 1 0.00
If the diagonal elements need to be 1, use diag(mat1) <- 1.
data
mat <- structure(c(1, 1, 2, 1, 1, 2, 3, 2, 2, 3, 1, 3, 1, 1, 2, 3), .Dim = c(4L,4L))
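A sketch of a more vectorized variant, assuming every column is compared with ==: instead of comparing each pair of rows, it loops over the (few) columns and lets outer() compare all rows at once, adding up the matches. coincidence_diss() is a hypothetical name. For 2560 rows each intermediate is a 2560 x 2560 matrix (roughly 50 MB of doubles), which should fit on most machines. If your variables are categorical, cluster::daisy() with metric = "gower" on factor columns should give the same proportion-of-mismatches dissimilarity, but check it against this function before relying on it.

```r
# Dissimilarity = 1 - (number of matching columns) / (number of columns),
# accumulated one column at a time instead of one pair of rows at a time
coincidence_diss <- function(x) {
  n <- nrow(x)
  matches <- matrix(0, nrow = n, ncol = n)
  for (k in seq_len(ncol(x))) {
    # n x n logical matrix: TRUE where rows i and j agree in column k
    matches <- matches + outer(x[, k], x[, k], "==")
  }
  1 - matches / ncol(x)
}

mat <- structure(c(1, 1, 2, 1, 1, 2, 3, 2, 2, 3, 1, 3, 1, 1, 2, 3), .Dim = c(4L, 4L))
d <- coincidence_diss(mat)
d
```

The loop runs only ncol(x) times (4 in your case), so the per-pair R function calls disappear.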

How to find column index for top n values of a matrix efficiently?

Given a matrix m, is there a direct method to find the top k values of m (here, the k smallest) and then find exactly which row and column each of them belongs to? I couldn't find an answer on SO, hence this question.
My attempt so far:
set.seed(1729)
k=5 #top 5
m = matrix(round(runif(30),digits = 2),nr=10)
idx <- which(matrix(m %in% head(sort(m), k), nr = nrow(m)), arr.ind = TRUE)
print(m)
[,1] [,2] [,3]
[1,] 0.59 0.54 0.57
[2,] 0.44 0.43 0.32
[3,] 0.57 0.08 0.29
[4,] 0.35 0.58 0.24
[5,] 0.86 0.52 0.53
[6,] 0.41 0.78 0.17
[7,] 0.51 0.47 0.26
[8,] 0.15 0.81 0.49
[9,] 0.85 0.64 0.64
[10,] 1.00 0.78 0.95
print(idx)
row col
[1,] 8 1
[2,] 3 2
[3,] 4 3
[4,] 6 3
[5,] 7 3
I am not sure this is efficient, because I sort all values of the matrix rather than picking out only those k values. Assume k << length(m).
Is there an efficient way to do this for a large matrix m? And is there a method that handles duplicates, for example when one wants the top k column names?
For example, with the matrix mm below I need to identify the 2 columns holding the smallest values. Here I expect columns 1 and 2:
mm = matrix(c(6,6,7,8,7,9,8,8,9), 3)
print(mm)
[,1] [,2] [,3]
[1,] 6 8 8
[2,] 6 7 8
[3,] 7 9 9
idx <- which(matrix(mm %in% head(sort(mm), 2), nr = nrow(mm)), arr.ind = TRUE)
print(idx)
row col
[1,] 1 1
[2,] 2 1
But here I get only one column, namely 1. In this case the output should be two different columns holding the smallest values, namely 1 and 2.
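One way to read "the 2 columns with the least values", assuming a column is ranked by its smallest entry: take each column's minimum and order those. Duplicates inside one column then cannot crowd out other columns. col_min and top_cols are names chosen for this sketch.

```r
mm <- matrix(c(6, 6, 7, 8, 7, 9, 8, 8, 9), 3)
k <- 2

# Smallest value in each column (6, 7 and 8 for mm)
col_min <- apply(mm, 2, min)

# Column indices ordered by their minimum; keep the first k
top_cols <- head(order(col_min), k)
top_cols
```

If the matrix has column names, colnames(mm)[top_cols] gives the names instead of the indices.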
Here's a comparison of the OP's approach, @Barker's suggestion to substitute in R's partial sorting, and a way using quantile():
# example data
set.seed(1729)
n = 1e6
k = 50
m = matrix(runif(n), nr=10)
# illustration of the quantile way
which(m <= quantile(m, k/length(m)), arr.ind = TRUE)
# or...
library(data.table)
setDT(melt(m))[ value <= quantile(value, k/.N) ]
# Var1 Var2 value
# 1: 8 4945 1.471722e-06
# 2: 1 7025 1.856475e-05
# 3: 9 7480 4.518987e-05
# 4: 10 8378 1.877453e-05
# 5: 2 9043 3.262958e-05
# 6: 7 9925 1.327880e-05
# 7: 5 13571 5.097035e-05
# ...
# benchmark
microbenchmark::microbenchmark(times = 30,
idx = idx <- which(matrix(m %in% head(sort(m), k), nr = nrow(m)), arr.ind = TRUE),
dtq = dtq <- setDT(melt(m))[ value <= quantile(value, k/.N) ],
idxp = idxp <- which(matrix(m %in% head(sort(m, partial = 1:k), k), nr = nrow(m)), arr.ind = TRUE),
idxq = idxq <- which(m <= quantile(m, k/length(m)), arr.ind = TRUE)
)
# verifying, requires data.table 1.9.7+
fsetequal(as.data.table(idx), dtq[, .(row = Var1, col = Var2)])
fsetequal(as.data.table(idxp), dtq[, .(row = Var1, col = Var2)])
fsetequal(as.data.table(idxq), dtq[, .(row = Var1, col = Var2)])
which gives
Unit: milliseconds
expr min lq mean median uq max neval cld
idx 145.01260 148.10571 155.27124 149.97761 152.45523 206.27179 30 d
dtq 30.05910 33.11280 44.83088 35.02334 37.78721 90.92545 30 b
idxp 114.69501 118.23185 127.37992 119.50131 121.33241 175.41117 30 c
idxq 13.02406 14.47907 22.81266 16.41707 18.28308 68.53364 30 a
I took out the OP's rounding for this example. Tweaking the parameters n and k may lead to a different ranking of the approaches. My preferred way would be setDT(melt(m))[order(value, partial = 1:k)] but it looks like that's not available in R yet.
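For the OP's concern about sorting everything, here is a sketch that uses sort()'s partial argument to find only the k-th smallest value, then which() with arr.ind = TRUE to locate every cell at or below it. This is a swapped-in variant, not one of the benchmarked approaches above, and it has not been timed here; like the quantile approach, ties at the threshold can return more than k cells.

```r
set.seed(1729)
k <- 5
m <- matrix(round(runif(30), digits = 2), nrow = 10)

# Partial sort: only guarantees that position k holds the k-th smallest value,
# so the remaining elements do not have to be fully ordered
kth <- sort(m, partial = k)[k]

# Every cell at or below that threshold (ties can add extra cells)
idx_partial <- which(m <= kth, arr.ind = TRUE)

# The OP's full-sort approach, for comparison
idx_full <- which(matrix(m %in% head(sort(m), k), nrow = nrow(m)), arr.ind = TRUE)
identical(idx_partial, idx_full)
```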

Moving Data from Matrix A to Matrix B in R

I want to move some data (2500 cells, to be precise) from matrix A to matrix B in R.
For example, move cell (i, j) of matrix A to cell (i, j) of matrix B, where i and j each take a fixed set of values (50 each, to be precise), and then replace cell (i, j) in matrix A with 0.
Since I am new to programming, can anyone help me with the code?
Thanks in Advance
Regards
You can first define a two-column coordinate matrix of the cells you want to move, where the first column is the row index and the second column is the column index; indexing a matrix with such a matrix selects one cell per row of the index. As an example, suppose you want to replace the cells c(2,1), c(2,2) and c(1,2) in a 3x3 matrix B with the values from a 3x3 matrix A:
ind <- cbind(c(2,2,1), c(1,2,2))
A <- matrix(1:9, ncol = 3)
B <- matrix(NA, ncol = 3, nrow = 3)
B[ind] <- A[ind]; A[ind] <- 0
B
[,1] [,2] [,3]
[1,] NA 4 NA
[2,] 2 5 NA
[3,] NA NA NA
A
[,1] [,2] [,3]
[1,] 1 0 7
[2,] 0 0 8
[3,] 3 6 9
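For the 50 x 50 case in the question, assuming the 2500 cells are every combination of 50 row indices and 50 column indices, you can index the rectangular block directly, or build the two-column index matrix with expand.grid(). The index values and matrix sizes below are placeholders, not taken from the question.

```r
set.seed(1)
A <- matrix(runif(100 * 100), nrow = 100)
B <- matrix(NA_real_, nrow = 100, ncol = 100)
rows <- 1:50      # placeholder row indices
cols <- 51:100    # placeholder column indices
A_orig <- A       # kept only to compare the two options

# Option 1: block indexing covers every (row, col) combination, 50 x 50 = 2500 cells
B[rows, cols] <- A[rows, cols]
A[rows, cols] <- 0

# Option 2: an explicit two-column index matrix, one row per cell
ind <- as.matrix(expand.grid(row = rows, col = cols))
A2 <- A_orig
B2 <- matrix(NA_real_, nrow = 100, ncol = 100)
B2[ind] <- A2[ind]
A2[ind] <- 0

nrow(ind)
```

Option 2 is the one to use if the 2500 cells are not a full rectangle, because you can list any (row, col) pairs in ind.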

How to search through sequentially numbered matrix variables in R

I have a question pertaining to R.
I have some sequentially numbered matrices (all of the same dimensions) and I want to search them all and produce a final matrix that contains (for each matrix element) the number of times a defined threshold was exceeded.
As an example, I could choose a threshold of 0.7 and I could have the following three matrices.
matrix1
[,1] [,2] [,3]
[1,] 0.38 0.72 0.15
[2,] 0.58 0.37 0.09
[3,] 0.27 0.55 0.22
matrix2
[,1] [,2] [,3]
[1,] 0.19 0.78 0.72
[2,] 0.98 0.65 0.46
[3,] 0.72 0.57 0.76
matrix3
[,1] [,2] [,3]
[1,] 0.39 0.68 0.31
[2,] 0.40 0.05 0.92
[3,] 1.00 0.43 0.21
My desired output would then be
[,1] [,2] [,3]
[1,] 0 2 1
[2,] 1 0 1
[3,] 2 0 1
If I do this:
test <- matrix1 >= 0.7
test[test==TRUE] = 1
then I get a matrix that has a 1 where the threshold is exceeded, and 0 where it's not. So this is a key step in what I want to do:
test=
[,1] [,2] [,3]
[1,] 0 1 0
[2,] 0 0 0
[3,] 0 0 0
My thought is to write a loop that performs this calculation on each matrix and adds up the "test" results to get the final matrix I want. But I'm not sure about two things: first, how to use a counter in the variable name "matrix"; and second, whether there is a more efficient way than a loop.
So I'm thinking of something like this:
output = matrix(0,3,3)
for (i in 1:3) {
  test <- matrixi >= 0.7
  test[test==TRUE] = 1
  output = output + test
}
Of course, this doesn't work because matrixi does not translate to matrix1, matrix2, etc.
I really appreciate your help!!!
If you stored your matrices in a list you would find the manipulations easier:
lst <- list(matrix(c(0.38, 0.58, 0.27, 0.72, 0.37, 0.55, 0.15, 0.09, 0.22), nrow=3),
matrix(c(0.19, 0.98, 0.72, 0.78, 0.65, 0.57, 0.72, 0.46, 0.76), nrow=3),
matrix(c(0.39, 0.40, 1.00, 0.68, 0.05, 0.43, 0.31, 0.92, 0.21), nrow=3))
Reduce("+", lapply(lst, ">=", 0.7))
# [,1] [,2] [,3]
# [1,] 0 2 1
# [2,] 1 0 1
# [3,] 2 0 1
Here, lapply(lst, ">=", 0.7) returns a list containing x >= 0.7 for every matrix x stored in lst. Reduce() with "+" then adds them all up, with TRUE counting as 1 and FALSE as 0.
If you just have three matrices, you could just do something like lst <- list(matrix1, matrix2, matrix3). However, if you have a lot more (let's say 100, numbered 1 through 100), it's probably easier to do lst <- lapply(1:100, function(x) get(paste0("matrix", x))) or lst <- mget(paste0("matrix", 1:100)).
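To answer the loop question directly: get() turns a string such as "matrix2" into the object of that name, so the loop from the question works once matrixi is replaced by get(paste0("matrix", i)). A sketch using the three matrices from the question (the list approach above is usually cleaner):

```r
matrix1 <- matrix(c(0.38, 0.58, 0.27, 0.72, 0.37, 0.55, 0.15, 0.09, 0.22), nrow = 3)
matrix2 <- matrix(c(0.19, 0.98, 0.72, 0.78, 0.65, 0.57, 0.72, 0.46, 0.76), nrow = 3)
matrix3 <- matrix(c(0.39, 0.40, 1.00, 0.68, 0.05, 0.43, 0.31, 0.92, 0.21), nrow = 3)

output <- matrix(0, 3, 3)
for (i in 1:3) {
  # get() looks up the object named "matrix1", "matrix2", ...
  test <- get(paste0("matrix", i)) >= 0.7
  # Adding a logical matrix counts TRUE as 1 and FALSE as 0
  output <- output + test
}
output
```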
For 100 matrices, each of size 100 x 100 (based on your comment this is roughly the size of your use case), the Reduce approach with a list seems to be a bit faster than the rowSums approach with an array, though both are quick:
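# Setup test data
set.seed(144)
for (i in seq(100)) {
  assign(paste0("matrix", i), matrix(rnorm(10000), nrow=100))
}
# The two compared functions were not included in the answer; these are
# assumed reconstructions of the list/Reduce and array/rowSums approaches
sum.josilber <- function() {
  lst <- mget(paste0("matrix", 1:100), envir = globalenv())
  Reduce("+", lapply(lst, ">=", 0.7))
}
sum.gavin <- function() {
  arr <- do.call("cbind", mget(paste0("matrix", 1:100), envir = globalenv()))
  dim(arr) <- c(100, 100, 100)
  rowSums(arr >= 0.7, dims = 2)
}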
all.equal(sum.josilber(), sum.gavin())
# [1] TRUE
library(microbenchmark)
microbenchmark(sum.josilber(), sum.gavin())
# Unit: milliseconds
# expr min lq median uq max neval
# sum.josilber() 6.534432 11.11292 12.47216 17.13995 160.1497 100
# sum.gavin() 11.421577 16.54199 18.62949 23.09079 165.6413 100
If you put the matrices in an array, this is easy to do without a loop. Here's an example:
## dummy data
set.seed(1)
m1 <- matrix(runif(9), ncol = 3)
m2 <- matrix(runif(9), ncol = 3)
m3 <- matrix(runif(9), ncol = 3)
Stick these into an array:
arr <- array(c(m1, m2, m3), dim = c(3,3,3))
Now each matrix is like a plate and the array is a stack of these plates.
Do as you did and convert the array into an indicator array (you don't need to save this step; it could be done inline in the next call). Use >= instead of > if you want to match the test in the question exactly:
ind <- arr > 0.7
This gives:
> ind
, , 1
[,1] [,2] [,3]
[1,] FALSE TRUE TRUE
[2,] FALSE FALSE FALSE
[3,] FALSE TRUE FALSE
, , 2
[,1] [,2] [,3]
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE TRUE
[3,] FALSE TRUE TRUE
, , 3
[,1] [,2] [,3]
[1,] FALSE FALSE FALSE
[2,] TRUE FALSE FALSE
[3,] TRUE FALSE FALSE
Now use the rowSums() function to compute the values you want
> rowSums(ind, dims = 2)
[,1] [,2] [,3]
[1,] 0 1 1
[2,] 1 0 1
[3,] 1 2 1
Note that, somewhat confusingly, rowSums() sums over the dimensions after the first dims ones, i.e. here over dimension dims + 1 = 3. In this case we are summing the values down through the stack of plates (the array) for each of the 3 x 3 cells, hence the 9 values in the output.
If your objects are separate matrices, you can get them into array form like this:
arr2 <- do.call("cbind", mget(c("m1","m2","m3")))
dim(arr2) <- c(3,3,3) # c(nrow(m1), ncol(m1), nmat)
> all.equal(arr, arr2)
[1] TRUE
For larger problems (more matrices), use something like:
nmat <- 200 ## number matrices
matrices <- paste0("m", seq_len(nmat))
arr <- do.call("cbind", mget(matrices))
dim(arr) <- c(dim(m1), nmat)
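Base R's simplify2array() with higher = TRUE can also stack a list of equally sized matrices into a 3-d array, as an alternative to the cbind() and dim() steps. This sketch uses the three matrices from the question and >= to match the asker's threshold test; the result should be the desired output from the question.

```r
matrix1 <- matrix(c(0.38, 0.58, 0.27, 0.72, 0.37, 0.55, 0.15, 0.09, 0.22), nrow = 3)
matrix2 <- matrix(c(0.19, 0.98, 0.72, 0.78, 0.65, 0.57, 0.72, 0.46, 0.76), nrow = 3)
matrix3 <- matrix(c(0.39, 0.40, 1.00, 0.68, 0.05, 0.43, 0.31, 0.92, 0.21), nrow = 3)
lst <- list(matrix1, matrix2, matrix3)

# With higher = TRUE, a list of equally sized matrices becomes a
# nrow x ncol x length(lst) array (one "plate" per matrix)
arr <- simplify2array(lst, higher = TRUE)

# Same array built with the cbind() + dim() approach, for comparison
arr2 <- do.call("cbind", lst)
dim(arr2) <- c(3, 3, 3)

# Count, for each cell, how many plates meet the threshold
counts <- rowSums(arr >= 0.7, dims = 2)
counts
```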
