Extension/Optimisation of code: from one to several iterations - r

For a symmetric matrix of pairwise distances pdm, where each row/column represents a point, and a vector of distances r, I do the following for each point:
# some small toy data
# note. real data is bigger, e.g. ~15k points.
pdm <- matrix(data = c(0, 4, 3,
4, 0, 2,
3, 2, 0),
nrow = 3, ncol = 3)
r <- seq(0, 5, .5)
length(r)
#> [1] 11
# index m corresponds to the order of points.
m <- c(1, 2, 3)
# change format
pdml <- as.list(as.data.frame(pdm))
# ---- 1
# procedure for first point (1)
a <- list()
for(i in seq_along(r)) {
a[[i]] <- ifelse(0 < pdml[[1]] & pdml[[1]] <= r[i], 1, 0)
a[[i]] <- which(a[[i]] != 0)
# if-statement is needed since which() produces annoying integer(0) entries
if(identical(a[[i]], integer(0))) a[[i]] <- 0
a[[i]] <- sum(m[1] * m[a[[i]]])
}
# change format
do.call(rbind, a)
#> [,1]
#> [1,] 0
#> [2,] 0
#> [3,] 0
#> [4,] 0
#> [5,] 0
#> [6,] 0
#> [7,] 3
#> [8,] 3
#> [9,] 5
#> [10,] 5
#> [11,] 5
# ---- 2
# procedure for second point (2),
# ... adaption: pdml[[2]] and m[2]
Created on 2022-08-09 by the reprex package (v2.0.1)
Desired Output
After the calculation is done, I would like to calculate the average for each distance $r_i$ across all points.
Can somebody please provide a solution that extends my approach to all points, or show an alternative that is more efficient? Any recommendation on how to improve my question is also much appreciated.
Note: if it makes things easier, using only the upper/lower half of pdm is of course also an option.

If you precalculate the possible products you want to sum with tcrossprod(m),
you can simplify the calculation to a couple of matrix operations:
# Input data
m <- c(1, 2, 3)
d <- matrix(
data = c(
0, 4, 3,
4, 0, 2,
3, 2, 0
),
nrow = 3,
ncol = 3
)
r <- seq(0, 5) # Reduced for simplicity
# Possible summands
v <- tcrossprod(m) * (d != 0)
v
#> [,1] [,2] [,3]
#> [1,] 0 2 3
#> [2,] 2 0 6
#> [3,] 3 6 0
# The calculation
a <- sapply(r, \(r) colSums(v * (d <= r)))
a
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0 0 0 3 5 5
#> [2,] 0 0 6 6 8 8
#> [3,] 0 0 6 9 9 9
And since you said you then wanted the mean for each distance, over points:
colMeans(a)
#> [1] 0.000000 0.000000 4.000000 6.000000 7.333333 7.333333
A slightly more obscure but potentially faster way to find a would be
with 3-d arrays:
colSums(outer(v, rep(1, length(r))) * outer(d, r, `<=`))
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0 0 0 3 5 5
#> [2,] 0 0 6 6 8 8
#> [3,] 0 0 6 9 9 9
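As a sanity check, the matrix version can be compared against a direct loop over all points. The following is only a sketch on the toy data from the question; the helper name loop_one_point is made up for illustration and is not part of the original code.

```r
# Toy data from the question
m <- c(1, 2, 3)
d <- matrix(c(0, 4, 3,
              4, 0, 2,
              3, 2, 0), nrow = 3, ncol = 3)
r <- seq(0, 5, 0.5)

# Hypothetical helper: the question's loop for a single point p.
# Summing over an empty index gives 0, so no integer(0) special case is needed.
loop_one_point <- function(p) {
  vapply(r, function(ri) {
    nb <- which(d[, p] > 0 & d[, p] <= ri)
    sum(m[p] * m[nb])
  }, numeric(1))
}

# One row per point, one column per distance
a_loop <- t(vapply(seq_along(m), loop_one_point, numeric(length(r))))

# Matrix version from the answer
v <- tcrossprod(m) * (d != 0)
a_mat <- sapply(r, function(ri) colSums(v * (d <= ri)))

stopifnot(isTRUE(all.equal(a_loop, a_mat)))

# Average per distance across all points
avg <- colMeans(a_mat)
```

The first row of a_loop reproduces the column printed in the question for point 1 (0 up to r = 2.5, then 3, 3, 5, 5, 5).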


High Dimensional Array

I have a 13 dimensional array:
MatrixQ<-array(0,dim=c(2,2,2,2,2,2,2,2,2,2,2,2,3))
How can I refer to the ith dimension of it in a loop?
for (i in 1:13)
{
# I want to assign the number i to the ith dimension of the array
}
What command should I use?
Thanks,
To my knowledge, there is no easy way of doing this built into R. However, you can take advantage of the fact that an array is actually stored as a vector with a dimension (dim) attribute. You can therefore actually access elements directly without using multiple commas. The difficulty is working out which indices of the underlying vector represent the slice of the array you want. This just requires a bit of maths.
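The "bit of maths" can be sketched on a small array. R stores arrays in column-major order, so the position of an element in the underlying vector follows from its subscripts and the cumulative products of the dimensions. The names arr, strides and sub below are illustrative only.

```r
# A small 3-D array whose values equal their own linear index
arr <- array(seq_len(24), dim = c(2, 3, 4))
dims <- dim(arr)

# Step size in the underlying vector for each dimension: 1, 2, 6
strides <- cumprod(c(1, dims[-length(dims)]))

# Linear index of element [1, 2, 3]
sub <- c(1, 2, 3)
lin <- 1 + sum((sub - 1) * strides)   # 1 + 0 + 2 + 12 = 15

stopifnot(arr[lin] == arr[1, 2, 3])
```

The same stride idea is what the run and skip lengths in the function below are built from.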
I think you're looking for something like this:
at <- function(Array, Dimension, Slice, index = FALSE)
{
if(!is.array(Array))
stop("'at()' can only be called on arrays")
if(!(is.numeric(Dimension) && Dimension > 0))
stop("Invalid value of Dimension supplied to 'at()'")
# Get the numbers of dimensions and elements in our Array
n <- length(Array)
dims <- dim(Array)
n_dims <- length(dims)
if(Dimension > n_dims | Dimension < 1)
stop("Invalid dimension chosen in 'at()'")
if(max(Slice) > dims[Dimension])
stop("Invalid slice chosen for given dimension")
final_result <- numeric()
for(i in seq_along(Slice))
{
run_length <- cumprod(c(1, dims)[-(n_dims + 1)])[Dimension]
skip_length <- run_length * (dims[Dimension] - 1)
# Now we simply make a repeating pattern of membership / non-membership
pattern <- rep_len(c(rep(T, run_length), rep(F, skip_length)), n)
shifted_pattern <- c(rep(FALSE, run_length * (Slice[i] - 1)), pattern)
loop_result <- which(shifted_pattern[seq(n)])
final_result <- c(final_result, loop_result)
}
if(index == FALSE)
{
dims[Dimension] <- length(Slice)
return(array(Array[final_result], dim = dims))
}
return(sort(final_result))
}
Here's how you can use it. Start with an array (let's have just 3 dimensions)
my_array <- array(0, dim = c(2, 3, 4))
my_array
#> , , 1
#>
#> [,1] [,2] [,3]
#> [1,] 0 0 0
#> [2,] 0 0 0
#>
#> , , 2
#>
#> [,1] [,2] [,3]
#> [1,] 0 0 0
#> [2,] 0 0 0
#>
#> , , 3
#>
#> [,1] [,2] [,3]
#> [1,] 0 0 0
#> [2,] 0 0 0
#>
#> , , 4
#>
#> [,1] [,2] [,3]
#> [1,] 0 0 0
#> [2,] 0 0 0
Now I can get the indices of the underlying array that represent the 2nd matrix of the third dimension (i.e. my_array[, , 2]) like this:
at(my_array, 3, 2, index = T)
# [1] 7 8 9 10 11 12
That means if I write anything to my_array[c(7, 8, 9, 10, 11, 12)], it will change the elements in the appropriate slice of the array:
my_array[at(my_array, 3, 2, index = T)] <- 69
my_array
#> , , 1
#>
#> [,1] [,2] [,3]
#> [1,] 0 0 0
#> [2,] 0 0 0
#>
#> , , 2
#>
#> [,1] [,2] [,3]
#> [1,] 69 69 69
#> [2,] 69 69 69
#>
#> , , 3
#>
#> [,1] [,2] [,3]
#> [1,] 0 0 0
#> [2,] 0 0 0
#>
#> , , 4
#>
#> [,1] [,2] [,3]
#> [1,] 0 0 0
#> [2,] 0 0 0
And of course this means we can do a loop where we can select dimensions. Here, we'll first reset my_array to zeros, then we'll put the dimension number into each element in the first slice within that dimension. Notice that some cells get overwritten in the process if they belong to the first slice of multiple dimensions.
# Reset my_array first
my_array[] <- 0
for(i in 1:3)
{
my_array[at(my_array, i, 1, index = T)] <- i;
}
my_array
#> , , 1
#>
#> [,1] [,2] [,3]
#> [1,] 3 3 3
#> [2,] 3 3 3
#>
#> , , 2
#>
#> [,1] [,2] [,3]
#> [1,] 2 1 1
#> [2,] 2 0 0
#>
#> , , 3
#>
#> [,1] [,2] [,3]
#> [1,] 2 1 1
#> [2,] 2 0 0
#>
#> , , 4
#>
#> [,1] [,2] [,3]
#> [1,] 2 1 1
#> [2,] 2 0 0
Although the example is for a 3-D array, this should work for any number of dimensions.
The reason for the index = T is that by omitting that, we can directly get the slice we want without using indexing:
at(my_array, 3, 1:2)
#> , , 1
#>
#> [,1] [,2] [,3]
#> [1,] 3 3 3
#> [2,] 3 3 3
#>
#> , , 2
#>
#> [,1] [,2] [,3]
#> [1,] 2 1 1
#> [2,] 2 0 0
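As an alternative sketch, base R's slice.index() returns, for every element, its position along a chosen dimension, so comparing it with a slice number gives a logical mask over the whole array. This is a different technique from at() above, shown here only for comparison on the same 3-D example.

```r
my_array <- array(0, dim = c(2, 3, 4))

# Linear indices of my_array[, , 2]
idx <- which(slice.index(my_array, 3) == 2)

# Write the dimension number into the first slice of each dimension
for (i in seq_along(dim(my_array))) {
  my_array[slice.index(my_array, i) == 1] <- i
}
my_array
```

The loop should leave the array in the same state as the at() loop shown earlier, with later dimensions overwriting earlier ones where the first slices overlap.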
For the example in the OP's answer, you could use this to do as follows:
Indexlist2 <- numeric()
for(j in 2:4)
{
if(StartPoint2QQQ[j] == 1)
{
j_matches <- which(at(MatrixQQQ, j, 1) == max(at(MatrixQQQ, j, 1)), arr.ind = T)
Indexlist2 <- rbind(Indexlist2, j_matches)
}
}
Now we have
Indexlist2
#> dim1 dim2 dim3 dim4
#> [1,] 2 2 1 2
#> [2,] 2 2 2 1
MatrixQQQ[Indexlist2]
#> [1] 5.0 0.2
at <- function(array, d = 1, slice = 1)
{
# Use the function's own argument rather than a global object `a`
consec <- cumprod(c(1, dim(array)))[-(length(dim(array)) + 1)]
skip_n <- consec * (dim(array) - 1)
basic_pattern <- rep_len(c(rep(TRUE, consec[d]), rep(FALSE, skip_n[d])), length(array))
which(c(rep(FALSE, consec[d] * (slice - 1)), basic_pattern)[seq_along(array)])
}
StartPoint1QQQ <- array(c(1,1,0,0))
StartPoint2QQQ <- array(c(1,1,0,0))+1
MatrixQQQ<-array(0,dim=c(2,2,2,2))
MatrixQQQ[2,2,1,1] <- 0.1
MatrixQQQ[2,2,2,1] <- 0.2
MatrixQQQ[2,2,1,2] <- 5
for(j in 2:4)
{
if(StartPoint2QQQ[j] == 1)
{
Indexlist2 <- which( MatrixQQQ[at(MatrixQQQ, j, 1)] == max( MatrixQQQ[at(MatrixQQQ, j, 1)]), arr.ind = TRUE)
}
}
#I simplified my higher dimension problem. MatrixQQQ is my 4 dimensional matrix that stores
#scores for each state (there are 4 states). Here the first state is time, and 2nd-4th states are cost.
#I care for cost only for now, so states 2:4 are of interest. I want to get a list of indices
# in which the slice is 1 and among them find the ones with maximum MatrixQQQ value
#The expected value here is 2,2,1,2 because its MatrixQQQ value is the highest, 5.
IMAX <- 2
MatrixQ<-array(0,dim=c(2,2,2,2,2,2,2,2,2,2,2,2,3))
MatrixQ[2,2,1,1,1,1,1,1,1,2,1,2,IMAX] <- 5000
j <- 10
which(MatrixQ == max(MatrixQ), arr.ind = T)
which(at(MatrixQ, j, 2) == max(at(MatrixQ, j, 2)), arr.ind = T)
######ANSWER I GET:###########################
####which(MatrixQ == max(MatrixQ), arr.ind = T)
#### dim1 dim2 dim3 dim4 dim5 dim6 dim7 dim8 dim9 dim10 dim11 dim12 dim13
####[1,] 2 2 1 1 1 1 1 1 1 2 1 2 2
####> which(at(MatrixQ, j, 2) == max(at(MatrixQ, j, 2)), arr.ind = T)
#### dim1 dim2 dim3 dim4 dim5 dim6 dim7 dim8 dim9 dim10 dim11 dim12 dim13
####[1,] 2 2 1 1 1 1 1 1 1 1 1 2 2
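The second result reports dim10 = 1 because at() with index = FALSE returns a sub-array whose dimension j has length 1, so the arr.ind positions are relative to that sub-array and not to MatrixQ. One possible way to get positions in the full array is to work with linear indices and convert them back with arrayInd(). The sketch below uses slice.index() instead of at() so that it runs on its own; idx, hit and pos are illustrative names.

```r
IMAX <- 2
MatrixQ <- array(0, dim = c(rep(2, 12), 3))
MatrixQ[2,2,1,1,1,1,1,1,1,2,1,2,IMAX] <- 5000
j <- 10

# Linear indices of the elements lying in slice 2 of dimension j
idx <- which(slice.index(MatrixQ, j) == 2)

# Among those, keep the ones holding the maximum value
hit <- idx[MatrixQ[idx] == max(MatrixQ[idx])]

# Convert back to subscripts of the full 13-dimensional array
pos <- arrayInd(hit, dim(MatrixQ))
pos
```

With at(), passing index = TRUE and subsetting MatrixQ by the returned indices would follow the same pattern.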

Solving underdetermined linear systems with R

R can solve underdetermined linear systems:
A = matrix((1:12)^2,3,4,T)
B = 1:3
qr(A)$rank # 3
qr.solve(A, B) # solutions will have one zero, not necessarily the same one
# 0.1875 -0.5000 0.3125 0.0000
solve(qr(A, LAPACK = TRUE), B)
# 0.08333333 -0.18750000 0.00000000 0.10416667
(It returns one solution among infinitely many.)
However, if the rank (here 2) is lower than the number of rows (here 3), it won't work:
A = matrix(c((1:8)^2,0,0,0,0),3,4,T)
B = c(1,2,0)
A
# [,1] [,2] [,3] [,4]
# [1,] 1 4 9 16
# [2,] 25 36 49 64
# [3,] 0 0 0 0
qr.solve(A, B) # Error in qr.solve(A, B) : singular matrix
solve(qr(A, LAPACK = TRUE), B) # Error in qr.coef(a, b) : error code 3
but this system does have a solution!
I know that the general solution is to use SVD or generalized/pseudo inverse of A (see this question and its answers), but:
Is there a way with solve or qr.solve to automatically reduce the system AX=B to an equivalent system CX=D with only rank(A) rows, for which qr.solve(C, D) would simply work out of the box?
Example:
C = matrix(c((1:8)^2),2,4,T)
D = c(1,2)
qr.solve(C, D)
# -0.437500 0.359375 0.000000 0.000000
qr.coef along with qr seem to do the job:
(A <- matrix(c((1:8)^2, 0, 0, 0, 0), nrow = 3, ncol = 4, byrow = TRUE))
# [,1] [,2] [,3] [,4]
# [1,] 1 4 9 16
# [2,] 25 36 49 64
# [3,] 0 0 0 0
(B <- c(1, 2, 0))
# [1] 1 2 0
(X0 <- qr.coef(qr(A), B))
# [1] -0.437500 0.359375 NA NA
X0[is.na(X0)] <- 0
X0
# [1] -0.437500 0.359375 0.000000 0.000000
# Verification:
A %*% X0
# [,1]
# [1,] 1
# [2,] 2
# [3,] 0
Second example:
(A<-matrix(c(1, 2, 0, 0, 1, 2, 0, 0, 1, 2, 1, 0), nrow = 3, ncol = 4, byrow = TRUE))
# [,1] [,2] [,3] [,4]
# [1,] 1 2 0 0
# [2,] 1 2 0 0
# [3,] 1 2 1 0
(B<-c(1, 1, 2))
# [1] 1 1 2
qr.solve(A, B)
# Error in qr.solve(A, B) : singular matrix 'a' in solve
(X0 <- qr.coef(qr(A), B))
# [1] 1 NA 1 NA
X0[is.na(X0)] <- 0
X0
# [1] 1 0 1 0
A %*% X0
# [,1]
# [1,] 1
# [2,] 1
# [3,] 2
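The steps above can be wrapped in a small helper that also checks the residual, since replacing NA by 0 only yields a solution when the system is consistent. This is a sketch: the function name solve_consistent and the tolerance are assumptions, not part of base R.

```r
# Hypothetical helper: one particular solution of A x = B via qr.coef,
# with aliased coefficients set to 0 and a residual check
solve_consistent <- function(A, B, tol = 1e-8) {
  x <- qr.coef(qr(A), B)
  x[is.na(x)] <- 0
  if (max(abs(A %*% x - B)) > tol)
    stop("no exact solution found: the system appears to be inconsistent")
  x
}

A <- matrix(c((1:8)^2, 0, 0, 0, 0), nrow = 3, ncol = 4, byrow = TRUE)
B <- c(1, 2, 0)
x <- solve_consistent(A, B)

# Residual should be numerically zero
max(abs(A %*% x - B))
```

If B were changed so that the zero row of A had a non-zero right-hand side, the helper would be expected to stop instead of returning a least-squares style answer.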

Construct and invert a matrix from a list using R

I have a relationship matrix generated from GCTA, that I can import into R using the following function
ReadGRMBin=function(prefix, AllN=F, size=4){
sum_i=function(i){
return(sum(1:i))
}
BinFileName=paste(prefix,".grm.bin",sep="")
NFileName=paste(prefix,".grm.N.bin",sep="")
IDFileName=paste(prefix,".grm.id",sep="")
id = read.table(IDFileName)
n=dim(id)[1]
BinFile=file(BinFileName, "rb");
grm=readBin(BinFile, n=n*(n+1)/2, what=numeric(0), size=size)
NFile=file(NFileName, "rb");
if(AllN==T){
N=readBin(NFile, n=n*(n+1)/2, what=numeric(0), size=size)
}
else N=readBin(NFile, n=1, what=numeric(0), size=size)
i=sapply(1:n, sum_i)
return(list(diag=grm[i], off=grm[-i], id=id, N=N))
}
It then lists the off diagonal and diagonal.
$ diag: num [1:850] 0.878 0.815 1.11 1.161 1.062 ...
$ off : num [1:360825] 0.0181 -0.0304 -0.0663 -0.0211 -0.0583 ...
$ n : int 850
I wish to build a GRM from this that I can invert, ideally with the output in the form row, column, value.
I have tried the following code, but it doesn't place the off-diagonal values in the correct positions:
m <- matrix(NA, ncol = length(grm$diag), nrow = length(grm$diag))
m[lower.tri(m)] <- grm$off
m[upper.tri(m)] <- t(m)[upper.tri(t(m))]
diag(m) <- grm$diag
m
want=cbind(which(!is.na(m),arr.ind = TRUE),na.omit(as.vector(m)))
Instead of placing the off-diagonal values in the order
2 1, 3 1, 3 2, 4 1, 4 2 etc.
it fills them down the columns as
2 1, 3 1, 4 1, 5 1, 6 1 etc.
So the resulting matrix (shortened) ends up like this:
[,1] [,2] [,3] [,4] [,5]
[1,] 0.87798703 0.018129893 -0.03044302 -0.066282429 -0.02106927
[2,] 0.01812989 0.814602911 0.07577287 -0.004078172 -0.03182918
[3,] -0.03044302 0.075772874 1.10976517 -0.055698857 -0.03960679
[4,] -0.06628243 -0.004078172 -0.05569886 1.160611629 -0.01021352
[5,] -0.02106927 -0.031829182 -0.03960679 -0.010213521 1.06245303
whereas the desired result is this:
[,1] [,2] [,3] [,4] [,5]
[1,] 0.87798703 0.018129893 -0.03044302 -0.02106927 -0.04011643
[2,] 0.01812989 0.814602911 -0.06628243 -0.00582625 -0.06237402
[3,] -0.03044302 -0.06628243 1.10976517 0.1315616 -0.1601102
[4,] -0.02106927 -0.00582625 0.1315616 1.160611629 -0.1388046
[5,] -0.04011643 -0.06237402 -0.1601102 -0.1388046 1.06245303
If you know how to amend the above code to give the wanted format, it would be much appreciated.
The desired end result would be the inverse of the matrix in long format if possible, for example (placeholder values). Thanks
1 1 12456
1 2 78910
1 3 34568
1 4 68942
One simple solution is to adapt your code to fill the upper triangle before the lower (since it is the upper triangle that should be filled in column order):
grm = list(
diag = 1:5 * 11,
off = 0:9)
m <- diag(grm$diag)
m[upper.tri(m)] <- grm$off
m[lower.tri(m)] <- t(m)[lower.tri(t(m))]
m
# [,1] [,2] [,3] [,4] [,5]
# [1,] 11 0 1 3 6
# [2,] 0 22 2 4 7
# [3,] 1 2 33 5 8
# [4,] 3 4 5 44 9
# [5,] 6 7 8 9 55
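For the second part of the question (the inverse in row, column, value form), a minimal sketch could look like this. It uses small made-up values chosen so that the matrix is invertible, and assumes the real GRM is non-singular; solve() will fail otherwise.

```r
# Made-up 3 x 3 example in the same list layout as the GRM
grm <- list(diag = c(2, 3, 4),
            off  = c(0.5, 0.1, 0.2))

# Build the symmetric matrix, filling the upper triangle first
m <- diag(grm$diag)
m[upper.tri(m)] <- grm$off
m[lower.tri(m)] <- t(m)[lower.tri(m)]

# Invert (assumes m is non-singular)
m_inv <- solve(m)

# Long format: one line per (row, column) pair
long <- data.frame(row   = as.vector(row(m_inv)),
                   col   = as.vector(col(m_inv)),
                   value = as.vector(m_inv))
long <- long[order(long$row, long$col), ]
head(long)
```

For a symmetric inverse, keeping only the rows with row <= col would halve the output if the downstream software accepts a triangular listing.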

R genalg package rbga.bin evaluation function

I am trying to solve the TSP (Traveling Salesman Problem) using rbga.bin from the genalg package. I have a matrix that stores the distances between the cities,
like this:
[,1] [,2] [,3] [,4]
[1,] 0 2 10 4
[2,] 0 0 12 12
[3,] 0 0 0 5
[4,] 0 0 0 0
but I'm not able to code a proper evaluation function (even though I already saw some examples in documentation and on the web). I know a chromosome will be passed as a parameter to the evaluation function, but I don't know what operations to do to return a proper value.
Basically, you are asking how to evaluate the length of a path given the endpoints on that path and a distance matrix (for path 1-3-2-4, you want d13+d32+d24+d41). You can do this with matrix indexing and the sum function. Let's consider your distance matrix and solution 1-3-2-4 (since the TSP is typically posed in a symmetric form, I've made it symmetric):
(d <- matrix(c(0, 2, 10, 4, 2, 0, 12, 12, 10, 12, 0, 5, 4, 12, 5, 0), nrow=4))
# [,1] [,2] [,3] [,4]
# [1,] 0 2 10 4
# [2,] 2 0 12 12
# [3,] 10 12 0 5
# [4,] 4 12 5 0
sln <- c(1, 3, 2, 4)
Now you can grab matrix indexes from your solution and pull the distances, summing them for your final evaluation:
(idx <- cbind(sln, c(tail(sln, -1), sln[1])))
# [1,] 1 3
# [2,] 3 2
# [3,] 2 4
# [4,] 4 1
d[idx]
# [1] 10 12 12 4
sum(d[idx])
# [1] 38
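The indexing steps above can be collected into a small function. This is only a sketch: the name tour_length is made up, and how a binary chromosome from rbga.bin is decoded into a visiting order such as sln is not covered here and would have to be decided separately.

```r
# Hypothetical helper: length of the closed tour that visits the cities
# in the order given by sln and then returns to the first city
tour_length <- function(sln, d) {
  idx <- cbind(sln, c(sln[-1], sln[1]))  # consecutive (from, to) pairs
  sum(d[idx])
}

d <- matrix(c(0, 2, 10, 4, 2, 0, 12, 12, 10, 12, 0, 5, 4, 12, 5, 0), nrow = 4)

len_a <- tour_length(c(1, 3, 2, 4), d)  # 10 + 12 + 12 + 4
len_b <- tour_length(c(1, 2, 3, 4), d)  # 2 + 12 + 5 + 4
```

An evaluation function would then return this length for the decoded tour, so that shorter tours score better under minimisation.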

Filtering out rows in a matrix containing only 0 in R

I have the matrix
m <- matrix(c(1, 0, 3, 4, 0, 6), 3)
I need to filter out rows where both columns have a value of 0, in effect returning the matrix:
m <- matrix(c(1, 3, 4, 6), 2)
I have tried
m[m[, 1] > 0 & m[, 2] > 0]
but this returns a vector instead of a matrix stripped of rows with only 0. This should be simple but I am stuck.
Thanks,
-Elizabeth
If you have many columns, you can square the entries and use rowSums:
m
[,1] [,2]
[1,] 1 4
[2,] 0 0
[3,] 3 6
m^2
[,1] [,2]
[1,] 1 16
[2,] 0 0
[3,] 9 36
rowSums(m^2)
[1] 17 0 45
m[rowSums(m^2)>0,]
[,1] [,2]
[1,] 1 4
[2,] 3 6
You are just missing a "," in your own solution.
Use
m[m[,1]>0 & m[,2]>0,]
and it will work:
> m[m[,1]>0 & m[,2]>0,]
[,1] [,2]
[1,] 1 4
[2,] 3 6
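Note that m[,1]>0 & m[,2]>0 also drops rows in which only one column is 0, as well as rows with negative values. If the goal is strictly to drop rows where every entry is 0, a sketch along these lines may be closer to the question (keep and m2 are illustrative names):

```r
m <- matrix(c(1, 0, 3, 4, 0, 6), 3)

# Keep a row if it has at least one non-zero entry
keep <- rowSums(m != 0) > 0
res <- m[keep, , drop = FALSE]   # drop = FALSE keeps a matrix even for one row

# A row with a single zero is kept by this test
m2 <- matrix(c(1, 0, 0, 4, 0, 6), 3)
res2 <- m2[rowSums(m2 != 0) > 0, , drop = FALSE]
```

Here res2 keeps the row (0, 6), which the element-wise > 0 test would remove.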
