For loop using variable name to fill array in R

Within a function in R, I need to fill in a specific array. It works when I write out all the lines one by one by hand, but I was wondering whether it is possible to use a for loop, as the real array will be much bigger than the example below.
A simplified example of what I am trying to do:
dt <- data.frame(prob_name = c("q_1", "q_2", "p_1", "p_2", "p_3"),
prob=c(100,200,0.07, 0.08, 0.09))
dt <- setNames(data.frame(t(dt[,-1])), dt[,1])
trans_mat <- array(0, dim = c(2, 2, 3))
for (i in 1:nrow(dt)) {
trans_mat[1, 2, i] <- p_i
}
I want those specific places in the array to be filled with the corresponding probability, so the array will be
1) 0, 0.07
0, 0
2) 0, 0.08
0, 0
etc
Is there a way to do this with a for loop (the loop does not recognize the "i" in "p_i"), or do I have to write it all out like
trans_mat[1,2,1] <- p_1
Thanks in advance!

Loop over the sequence along the third dimension of 'trans_mat' instead of over nrow(dt), because 'dt' has only one row. Inside the loop, build the column name by pasting "p_" and i together, extract that column with [[, and do the assignment:
for(i in seq(dim(trans_mat)[3])) trans_mat[1, 2, i] <- dt[[paste0("p_", i)]]
-output
> trans_mat
, , 1
[,1] [,2]
[1,] 0 0.07
[2,] 0 0.00
, , 2
[,1] [,2]
[1,] 0 0.08
[2,] 0 0.00
, , 3
[,1] [,2]
[1,] 0 0.09
[2,] 0 0.00
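The loop can also be avoided altogether. The following is a minimal sketch, assuming the same one-row dt as in the question and that the columns are named p_1, p_2, ... in the same order as the slices of the array; the helper name p_cols is made up for illustration.

```r
# one-row data frame as in the question (after the transpose step)
dt <- data.frame(q_1 = 100, q_2 = 200, p_1 = 0.07, p_2 = 0.08, p_3 = 0.09)
trans_mat <- array(0, dim = c(2, 2, 3))

# build the column names "p_1", "p_2", "p_3" (p_cols is a hypothetical name)
p_cols <- paste0("p_", seq_len(dim(trans_mat)[3]))

# pull all values at once and assign them to the [1, 2, ] position of every slice
trans_mat[1, 2, ] <- unlist(dt[1, p_cols])

trans_mat[1, 2, ]
```

Leaving the third index empty addresses the same cell in every slice, so one assignment replaces the whole loop.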

Using replace in sapply.
sapply(dt[1, 3:5], \(x) replace(array(0, c(2, 2)), 3, x), simplify='array')
# , , p_1
#
# [,1] [,2]
# [1,] 0 0.07
# [2,] 0 0.00
#
# , , p_2
#
# [,1] [,2]
# [1,] 0 0.08
# [2,] 0 0.00
#
# , , p_3
#
# [,1] [,2]
# [1,] 0 0.09
# [2,] 0 0.00
Data:
dt <- structure(list(q_1 = 100, q_2 = 200, p_1 = 0.07, p_2 = 0.08,
p_3 = 0.09), class = "data.frame", row.names = c(NA, -1L))
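The 3 passed to replace() above is a linear index: R stores matrices column by column, so in a 2 x 2 matrix position 3 is row 1, column 2. A small check of this, with the value 0.07 used only for illustration:

```r
m <- array(0, c(2, 2))

# replace() with a single index works on the column-major linear position
m <- replace(m, 3, 0.07)   # same effect as m[3] <- 0.07

m[1, 2]                          # the element that was set
which(m != 0, arr.ind = TRUE)    # row/column of the only nonzero entry
```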

Related

calculate frequency or percentage matrix in R

If I have the following:
mm <- matrix(0, 4, 3)
mm<-apply(mm, c(1, 2), function(x) sample(c(0, 1), 1))
> mm
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 0
[3,] 0 0 0
[4,] 1 0 1
How do I output a matrix that gives, for each pair of different columns, the frequency or percentage of rows in which both columns equal 1? For example, there are two rows out of 4 where column 1 and column 2 both equal 1 (= 0.5), and 1 row out of 4 where column 2 and column 3 both equal 1 (= 0.25), so in this case I'd need:
[,1] [,2] [,3]
[1,] 1 0.5 0.5
[2,] 0.5 1 0.25
[3,] 0.5 0.25 1
I am not interested in comparing the same columns, so by default the diagonal remains at 1.
I thought I might get somewhere with cor(mm), hoping there was a way to output co-frequencies or co-percentages instead of correlation coefficients, but this appears not to be the case. The final output should still be an N by N matrix (N being the number of columns), like the one cor() returns:
> cor(mm)
[,1] [,2] [,3]
[1,] 1.0000000 0.5773503 0.5773503
[2,] 0.5773503 1.0000000 0.0000000
[3,] 0.5773503 0.0000000 1.0000000
but obviously these are correlation coefficients; I just want co-frequencies or co-percentages instead.
A base R solution uses crossprod, i.e.,
r <- `diag<-`(crossprod(mm)/nrow(mm),1)
such that
> r
[,1] [,2] [,3]
[1,] 1.0 0.50 0.50
[2,] 0.5 1.00 0.25
[3,] 0.5 0.25 1.00
DATA
mm <- structure(c(1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1), .Dim = 4:3)
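Why crossprod works here: crossprod(mm) is t(mm) %*% mm, so entry (i, j) is sum(mm[, i] * mm[, j]), which for a 0/1 matrix is the number of rows where both columns equal 1. A small sketch checking this on the data above (the names counts and by_hand are made up for illustration):

```r
mm <- matrix(c(1, 1, 0, 1,
               1, 1, 0, 0,
               1, 0, 0, 1), nrow = 4)

# entry (i, j) counts the rows where columns i and j are both 1
counts <- crossprod(mm)

# the same count for columns 1 and 2, done explicitly
by_hand <- sum(mm[, 1] == 1 & mm[, 2] == 1)

# turn counts into proportions and reset the diagonal to 1
r <- counts / nrow(mm)
diag(r) <- 1
r
```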
set.seed(123)
mm <- matrix(0, 4, 3)
mm<-apply(mm, c(1, 2), function(x) sample(c(0, 1), 1))
combinations <- expand.grid(1:ncol(mm), 1:ncol(mm))
matrix(unlist(Map(function(x, y) {
  if (x == y) {
    res <- 1
  } else {
    res <- sum(mm[, x] * mm[, y]) / nrow(mm)
  }
  res
}, combinations[, 1], combinations[, 2])), 3)
# [,1] [,2] [,3]
# [1,] 1.00 0.25 0.0
# [2,] 0.25 1.00 0.5
# [3,] 0.00 0.50 1.0

How to divide a matrix by the sum of rows and it has zeros

I am pretty new to R and I have a loop which gives sometimes a matrix like this:
1 2
FALSE 0 0
TRUE 0 2
I need to do as follows:
If the two cells in a single row have zeros replace them by 0.5
If one of the cells is not zero divide by the sum of the row
so the result of this will be:
1 2
FALSE 0.5 0.5
TRUE 0 1
Any idea please?
Thank you
If your matrix is x,
(x <- matrix(c(0, 0, 0, 2), 2))
# [,1] [,2]
# [1,] 0 0
# [2,] 0 2
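# despite its name, zero_rows is TRUE for rows with at least one nonzero entry
zero_rows <- as.logical(rowSums(x != 0))
# divide each such row by its own row sum (drop = FALSE keeps a matrix if only one row matches)
x[zero_rows, ] <- x[zero_rows, , drop = FALSE] / rowSums(x[zero_rows, , drop = FALSE])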
x[rowSums(x) == 0, ] <- rep(0.5, ncol(x))
x
# [,1] [,2]
# [1,] 0.5 0.5
# [2,] 0.0 1.0
This will work for a matrix (two-dimensional array) of arbitrary size.
@akrun's suggested edit, constructing zero_rows with rowSums(x != 0) instead of apply(x, 1, function(r) 0 %in% r), should make this even more efficient.
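For matrices with more than one nonzero row, each row has to be divided by its own sum. A minimal sketch of that, assuming non-negative entries as in the question; the function name normalize_rows is made up for illustration:

```r
# hypothetical helper: row-normalise a non-negative matrix,
# filling all-zero rows with 0.5 as the question asks
normalize_rows <- function(x) {
  rs <- rowSums(x)
  # a vector of length nrow(x) is recycled down the columns,
  # so element [i, j] is divided by rs[i]
  out <- x / rs
  # all-zero rows produced 0/0 = NaN above; overwrite them
  out[rs == 0, ] <- 0.5
  out
}

x <- rbind(c(0, 0),
           c(0, 2),
           c(1, 3))
normalize_rows(x)
```

Dividing by rowSums(x) directly relies on R's column-major recycling, which is why no explicit loop or apply() call is needed.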
Let x <- matrix(c(0, 0, 0, 2), 2)
t(apply(x,1,function(y)if(all(!y))replace(y,!y,0.5)else if(any(!y))y/sum(y) else y))
[,1] [,2]
[1,] 0.5 0.5
[2,] 0.0 1.0
x = matrix(c(0, 0, 0, 2), 2)
t(apply(x, 1L, function(y) if (all(y == 0)) rep(0.5, length(y)) else y / sum(y)))
# [,1] [,2]
#[1,] 0.5 0.5
#[2,] 0.0 1.0

How to generate L-length binary code permutation matrix

Apologies for the title. Here is my problem.
I need to generate all L-length binary code permutations.
For L = 2, it looks like:
00
01
10
11
My idea is to divide the problem into two parts:
write a function f that transforms an integer to its binary form
loop over 1:2^L
Here is my function for step 1:
kbitpermute <- function(input = 2^16 - 1, k = 16){
ret = matrix(0, ncol = k, nrow = 1)
for(i in 1:k){
ret[1,k-i+1] = input & 1
input = bitwShiftR(input, 1)
}
return(ret)
}
And then I use lapply() to obtain the permutation, like:
temp3 = lapply(1:2^2 - 1, kbitpermute, k = 2)
temp3
[[1]]
[,1] [,2]
[1,] 0 0
[[2]]
[,1] [,2]
[1,] 0 1
[[3]]
[,1] [,2]
[1,] 1 1
[[4]]
[,1] [,2]
[1,] 1 1
It seems to work, but how do I transform the output of lapply into matrix form? And is there an easier way to do the whole task?
=== update ===
I found a way to do the transformation with unlist:
temp3 = t(matrix(unlist(lapply(1:2^2 - 1, kbitpermute, k = 2)), nrow = 2, ncol=2^2))
But is there an easier way?
=== update ===
Corrected version of kbitpermute, with special thanks to @Batanichek:
kbitpermute <- function(input = 2^16 - 1, k = 16){
ret = matrix(0, ncol = k, nrow = 1)
for(i in 1:k){
ret[1,k-i+1] = bitwAnd(input, 1)
input = bitwShiftR(input, 1)
}
return(ret)
}
> t(sapply(1:2^2 - 1, kbitpermute, k = 2))
[,1] [,2]
[1,] 0 0
[2,] 0 1
[3,] 1 0
[4,] 1 1
Try something like this:
test=function(x,k){
a=rev(as.integer(intToBits(x))[1:k])
return(a)
}
x=2
t(sapply(1:2^x - 1, test,k=x))
# [,1] [,2]
#[1,] 0 0
#[2,] 0 1
#[3,] 1 0
#[4,] 1 1
Time comparison
x=15
system.time(t(sapply(1:2^x - 1, test,k=x)))
#   user  system elapsed
# 0.62 0.21 1.35
system.time(t(sapply(1:2^x - 1, kbitpermute,k=x)))
#   user  system elapsed
# 1.84 0.35 2.48
(P.S. My system does not use an English locale, so the system.time() headers are translated here.)
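As an alternative to converting each integer separately, the whole table can be built in one call. This is a sketch using a different technique (expand.grid), not the approach of the answer above; the names L and codes are chosen for illustration.

```r
L <- 2

# all combinations of L binary digits; expand.grid varies the FIRST
# column fastest, so the columns are reversed to get counting order
codes <- as.matrix(expand.grid(rep(list(0:1), L)))[, L:1, drop = FALSE]
dimnames(codes) <- NULL   # drop the Var1, Var2, ... column names

codes
```

Each row is one binary code, and row k holds the binary representation of k - 1, matching the 00, 01, 10, 11 order in the question.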

R function to compute deviation matrix

I've written a function to compute a matrix where each column is the corresponding input matrix column minus the column mean.
# compute the deviation matrix
deviation <- function(X) {
one <- rep(1, nrow(X))
n <- ncol(X)
d <- matrix(data = NA, nrow = nrow(X), ncol = ncol(X))
for(i in seq.int(from = 1, to = n)) {
d[,i] <- X[,i] - mean(X[,i], na.rm = TRUE) * one
}
d
}
Could this function be written more idiomatically in R (using functional programming, perhaps)?
Use sweep and colMeans:
sweep(mat, 2, colMeans(mat))
By default, sweep applies the subtraction function (-), so here it subtracts the column means computed by colMeans from the values in each column (MARGIN = 2). This gives the same result:
mat <- matrix(1:12,nrow=3)
deviation(mat)
# [,1] [,2] [,3] [,4]
#[1,] -1 -1 -1 -1
#[2,] 0 0 0 0
#[3,] 1 1 1 1
sweep(mat, 2, colMeans(mat))
# [,1] [,2] [,3] [,4]
#[1,] -1 -1 -1 -1
#[2,] 0 0 0 0
#[3,] 1 1 1 1
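Base R's scale() can do the same centering. The sketch below compares it with the sweep call; note that the question's function uses na.rm = TRUE, so colMeans(mat, na.rm = TRUE) would be the matching call when the data contain NAs. The names by_sweep and by_scale are made up for illustration.

```r
mat <- matrix(1:12, nrow = 3)

# subtract column means with sweep, as in the answer
by_sweep <- sweep(mat, 2, colMeans(mat))

# scale() with scale = FALSE only centres the columns;
# it also attaches a "scaled:center" attribute holding the means
by_scale <- scale(mat, center = TRUE, scale = FALSE)

# identical values once that extra attribute is ignored
all.equal(by_sweep, by_scale, check.attributes = FALSE)
```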

How to search through sequentially numbered matrix variables in R

I have a question pertaining to R.
I have some sequentially numbered matrices (all of the same dimensions) and I want to search them all and produce a final matrix that contains (for each matrix element) the number of times a defined threshold was exceeded.
As an example, I could choose a threshold of 0.7 and I could have the following three matrices.
matrix1
[,1] [,2] [,3]
[1,] 0.38 0.72 0.15
[2,] 0.58 0.37 0.09
[3,] 0.27 0.55 0.22
matrix2
[,1] [,2] [,3]
[1,] 0.19 0.78 0.72
[2,] 0.98 0.65 0.46
[3,] 0.72 0.57 0.76
matrix3
[,1] [,2] [,3]
[1,] 0.39 0.68 0.31
[2,] 0.40 0.05 0.92
[3,] 1.00 0.43 0.21
My desired output would then be
[,1] [,2] [,3]
[1,] 0 2 1
[2,] 1 0 1
[3,] 2 0 1
If I do this:
test <- matrix1 >= 0.7
test[test==TRUE] = 1
then I get a matrix that has a 1 where the threshold is exceeded, and 0 where it's not. So this is a key step in what I want to do:
test=
[,1] [,2] [,3]
[1,] 0 1 0
[2,] 0 0 0
[3,] 0 0 0
My thought is to make a loop that performs this calculation on each matrix and adds up the "test" results, giving the final matrix I want. But I'm not sure about two things: first, how to use a counter in the variable name "matrix", and second, whether there's a more efficient way than using a loop.
So I'm thinking of something like this:
output = matrix(0,3,3)
for i in 1:3 {
test <- matrixi >= 0.7
test[test==TRUE] = 1
output = output + test }
Of course, this doesn't work because matrixi does not translate to matrix1, matrix2, etc.
I really appreciate your help!!!
If you stored your matrices in a list you would find the manipulations easier:
lst <- list(matrix(c(0.38, 0.58, 0.27, 0.72, 0.37, 0.55, 0.15, 0.09, 0.22), nrow=3),
matrix(c(0.19, 0.98, 0.72, 0.78, 0.65, 0.57, 0.72, 0.46, 0.76), nrow=3),
matrix(c(0.39, 0.40, 1.00, 0.68, 0.05, 0.43, 0.31, 0.92, 0.21), nrow=3))
Reduce("+", lapply(lst, ">=", 0.7))
# [,1] [,2] [,3]
# [1,] 0 2 1
# [2,] 1 0 1
# [3,] 2 0 1
Here, the lapply(lst, ">=", 0.7) returns a list with x >= 0.7 called for every matrix x stored in lst. Then Reduce called with + sums them all up.
If you just have three matrices, you could just do something like lst <- list(matrix1, matrix2, matrix3). However, if you have a lot more (let's say 100, numbered 1 through 100), it's probably easier to do lst <- lapply(1:100, function(x) get(paste0("matrix", x))) or lst <- mget(paste0("matrix", 1:100)).
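Putting the pieces together on the question's own data: a short sketch that collects the numbered variables with mget and then counts the threshold exceedances. The variable name output follows the question; everything else is as in the answer above.

```r
matrix1 <- matrix(c(0.38, 0.58, 0.27, 0.72, 0.37, 0.55, 0.15, 0.09, 0.22), nrow = 3)
matrix2 <- matrix(c(0.19, 0.98, 0.72, 0.78, 0.65, 0.57, 0.72, 0.46, 0.76), nrow = 3)
matrix3 <- matrix(c(0.39, 0.40, 1.00, 0.68, 0.05, 0.43, 0.31, 0.92, 0.21), nrow = 3)

# look the variables up by their generated names
lst <- mget(paste0("matrix", 1:3))

# one logical matrix per input, then add them element-wise
output <- Reduce("+", lapply(lst, ">=", 0.7))
output
```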
For 100 matrices, each of size 100 x 100 (based on your comment this is roughly the size of your use case), the Reduce approach with a list seems to be a bit faster than the rowSums approach with an array, though both are quick:
# Setup test data
set.seed(144)
for (i in seq(100)) {
assign(paste0("matrix", i), matrix(rnorm(10000), nrow=100))
}
all.equal(sum.josilber(), sum.gavin())
# [1] TRUE
library(microbenchmark)
microbenchmark(sum.josilber(), sum.gavin())
# Unit: milliseconds
# expr min lq median uq max neval
# sum.josilber() 6.534432 11.11292 12.47216 17.13995 160.1497 100
# sum.gavin() 11.421577 16.54199 18.62949 23.09079 165.6413 100
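The benchmark calls sum.josilber() and sum.gavin(), whose definitions do not appear above. The following is only a plausible reconstruction, assuming the first wraps the list/Reduce approach and the second the array/rowSums approach; the bodies, the variables env, n_mat and nms, and the smaller test data are all assumptions.

```r
# environment holding matrix1, matrix2, ... (the global environment
# in an interactive session)
env <- environment()
n_mat <- 5

set.seed(144)
for (i in seq_len(n_mat)) {
  assign(paste0("matrix", i), matrix(rnorm(16), nrow = 4), envir = env)
}
nms <- paste0("matrix", seq_len(n_mat))

# hypothetical reconstruction: list of matrices + Reduce
sum.josilber <- function() {
  Reduce("+", lapply(mget(nms, envir = env), ">=", 0.7))
}

# hypothetical reconstruction: 3-d array + rowSums
sum.gavin <- function() {
  arr <- do.call("cbind", mget(nms, envir = env))
  dim(arr) <- c(4, 4, n_mat)
  rowSums(arr >= 0.7, dims = 2)
}

all.equal(sum.josilber(), sum.gavin())
```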
If you put the matrices in an array, this is easy to do without a loop. Here's an example:
## dummy data
set.seed(1)
m1 <- matrix(runif(9), ncol = 3)
m2 <- matrix(runif(9), ncol = 3)
m3 <- matrix(runif(9), ncol = 3)
Stick these into an array
arr <- array(c(m1, m2, m3), dim = c(3,3,3))
Now each matrix is like a plate and the array is a stack of these plates.
Do as you did and convert the array into an indicator array (you don't need to save this step, it could be done inline in the next call)
ind <- arr > 0.7
This gives:
> ind
, , 1
[,1] [,2] [,3]
[1,] FALSE TRUE TRUE
[2,] FALSE FALSE FALSE
[3,] FALSE TRUE FALSE
, , 2
[,1] [,2] [,3]
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE TRUE
[3,] FALSE TRUE TRUE
, , 3
[,1] [,2] [,3]
[1,] FALSE FALSE FALSE
[2,] TRUE FALSE FALSE
[3,] TRUE FALSE FALSE
Now use the rowSums() function to compute the values you want
> rowSums(ind, dims = 2)
[,1] [,2] [,3]
[1,] 0 1 1
[2,] 1 0 1
[3,] 1 2 1
Note that, somewhat confusingly, rowSums() with dims = 2 treats the first two dimensions as the "rows" and sums over the remaining ones (dimension dims + 1 onwards). In this case, we are summing the values down through the stack of plates (the array) for each cell of the 3 x 3 grid, hence the 9 values in the output.
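To make the role of dims concrete, here is a small sketch comparing rowSums(ind, dims = 2) with an explicit apply() over the first two dimensions. It rebuilds the same dummy array as above; the names by_rowsums and by_apply are made up for illustration.

```r
set.seed(1)
# 27 draws fill the 3 x 3 x 3 array plate by plate, like c(m1, m2, m3)
arr <- array(runif(27), dim = c(3, 3, 3))
ind <- arr > 0.7

# keep dimensions 1 and 2, sum over dimension 3
by_rowsums <- rowSums(ind, dims = 2)

# the same computation spelled out: for every [i, j], sum ind[i, j, ]
by_apply <- apply(ind, c(1, 2), sum)

all.equal(by_rowsums, by_apply)
```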
If you need to get your objects into the array form, you can do this via
arr2 <- do.call("cbind", mget(c("m1","m2","m3")))
dim(arr2) <- c(3,3,3) # c(nrow(m1), ncol(m1), nmat)
> all.equal(arr, arr2)
[1] TRUE
For larger problems (more matrices) use something like
nmat <- 200 ## number of matrices
matrices <- paste0("m", seq_len(nmat))
arr <- do.call("cbind", mget(matrices))
dim(arr) <- c(dim(m1), nmat)
