I have a data frame of 300x300 elements. Each element is either -1 or +1:
[,1] [,2] [,3]
[1,] 1 -1 -1
[2,] 1 1 1
[3,] -1 -1 1
[4,] 1 1 -1
What I want is to iterate over my data frame and multiply each value by every neighbouring value.
For example:
For element [1,1] in my original data frame I want the product of [1,1], [1,2] and [2,1]
For element [2,2] in my original data frame I want the product of [2,2], [1,2], [2,1], [2,3] and [3,2].
I have tried to create 4 new data frames, each shifted one element up, down, left and right, respectively:
x_up <- shift(x, 1, dir='up')
x_up <- as.array(x_up)
dim(x_up) <- dims
x_down <- shift(x, 1, dir='down')
x_down <- as.array(x_down)
dim(x_down) <- dims
x_left <- shift(x, 1, dir='left')
x_left <- as.array(x_left)
dim(x_left) <- dims
x_right <- shift(x, 1, dir='right')
x_right <- as.array(x_right)
dim(x_right) <- dims
where x is my original data frame.
I can see that with this approach the new data frames are not correctly shifted; some of them are identical. I checked this with identical().
Is there another approach to my problem?
Edit:
shift() is from the 'binhf' package.
I think there's probably a smarter way to do this, but the standard approach would be to iterate over each element and multiply it with its neighbours.
Starting with:
mat <- matrix(c(1, 1, -1, 1, -1, 1, -1, 1, -1, 1, 1, -1), ncol=3)
To avoid indexing problems past the last row and column, add a row and a column of 1s as margins (a 1 is neutral when multiplying; if you were summing, they would have to be 0s, for example).
mat2 <- addmargins(mat, FUN=function(x) 1)
Now you create an empty matrix to hold the output, and then iterate over the elements and multiply the neighbors.
out <- matrix(nrow=nrow(mat), ncol=ncol(mat))
for (i in 1:nrow(mat)) {
  for (j in 1:ncol(mat)) {
    out[i,j] <- prod(mat[i,j], mat2[i-1, j], mat2[i, j-1], mat2[i+1, j], mat2[i, j+1])
  }
}
Resulting in:
> out
[,1] [,2] [,3]
[1,] -1 1 1
[2,] -1 1 -1
[3,] 1 1 1
[4,] -1 1 -1
This took less than a second for a 300x300 matrix, so it might be enough for you.
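A fully vectorized variant of the same idea is also possible: pad the matrix with 1s and multiply the four shifted sub-matrices elementwise. Here is a minimal sketch (the function name neighbour_prod is my own, not from any package):
# Sketch: neighbour products without loops.
# Pad with 1s (neutral for multiplication), then multiply the
# four shifted views of the padded matrix elementwise.
neighbour_prod <- function(x) {
  n <- nrow(x); m <- ncol(x)
  p <- matrix(1, n + 2, m + 2)
  p[2:(n + 1), 2:(m + 1)] <- x
  x *
    p[1:n,       2:(m + 1)] *  # neighbour above
    p[3:(n + 2), 2:(m + 1)] *  # neighbour below
    p[2:(n + 1), 1:m      ] *  # neighbour to the left
    p[2:(n + 1), 3:(m + 2)]    # neighbour to the right
}
neighbour_prod(mat)  # same result as out above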
This should do the trick:
ind <- which(x==x, arr.ind=TRUE)  # index matrix of all positions
# find distances (we need distances of 1 or 0)
dist.mat <- as.matrix(dist(ind))
inds2mult <- apply(dist.mat, 1, function(ii) which(ii <= 1))
# get the product over each list element in inds2mult
# and reform into a matrix of the appropriate shape
matrix(
  sapply(inds2mult, function(ii) prod(unlist(x)[ii])),
  ncol=ncol(x))
# [,1] [,2] [,3]
#[1,] -1 1 1
#[2,] -1 1 -1
#[3,] 1 1 1
#[4,] -1 1 -1
To get around memory issues with large matrices in the call to dist, you can try the fields.rdist.near function (with a delta value of 1) from the fields package:
x <- matrix(rep(-1, 300*300), ncol=300)
ind <- which(x==x, arr.ind=TRUE) # index matrix
library(fields)
ind.list <- fields.rdist.near(ind, delta=1) # took my computer ~ 15 - 20 seconds
inds2mult <- tapply(ind.list$ind[,2], ind.list$ind[,1], list)
matrix(
  sapply(inds2mult, function(ii) prod(unlist(x)[ii])),
  ncol=ncol(x))
The delta argument from the fields.rdist.near help page:
Threshhold distance. All pairs of points that separated by more
than delta in distance are ignored.
I have a positive definite symmetric matrix, generated using the following code:
library(clusterGeneration)  # provides genPositiveDefMat
set.seed(123)
m <- genPositiveDefMat(
  dim = 3,
  covMethod = "unifcorrmat",
  rangeVar = c(0,1) )
x <- as.matrix(m$Sigma)
diag(x) <- 1
x
#Output
[,1] [,2] [,3]
[1,] 1.0000000 -0.2432303 -0.4110525
[2,] -0.2432303 1.0000000 -0.1046602
[3,] -0.4110525 -0.1046602 1.0000000
Now, I want to run the matrix through iterations and in each iteration I want to replace the symmetric pair with NA. For example,
Iteration 1:
x[1,2] = x[2,1] <- NA
Iteration 2:
x[1,3] = x[3,1] <- NA
and so on....
My idea was to check this using a for loop.
Prototype:
for (r in 1:nrow(x)) {
  for (c in 1:ncol(x)) {
    if (x[r,c] == x[c,r]) {
      x[r,c] = x[c,r] <- NA
    } else {
      x[r,c]
    }
  }
}
The issue with my code is that for row 1 and column 1 the values are equal, hence they get set to NA (which is wrong). Also, the moment it is not NA it comes out of the loop.
Appreciate any help here.
Thanks
If you need the replacement done iteratively, you can use the indexes of the values selected by upper.tri(x)/lower.tri(x) to do the replacements pair-by-pair. That will allow you to pass the results to a function before/after each replacement, e.g.:
mat <- x  # the symmetric matrix from the question
idx <- which(lower.tri(mat), arr.ind=TRUE)
sel <- cbind(
  replace(mat, , seq_along(mat))[ idx ],
  replace(mat, , seq_along(mat))[ idx[,2:1] ]
)
# [,1] [,2]
#[1,] 2 4 ##each row represents the lower/upper pair
#[2,] 3 7
#[3,] 6 8
for (i in seq_len(nrow(sel))) {
  mat[ sel[i,] ] <- NA
  print(mean(mat, na.rm=TRUE))
}
#[1] 0.2812249
#[1] 0.5581359
#[1] 1
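Equivalently, you can work with (row, col) coordinates instead of linear indices. A minimal sketch of the same iteration built on which(upper.tri(x), arr.ind=TRUE):
idx2 <- which(upper.tri(x), arr.ind=TRUE)  # one (row, col) entry per symmetric pair
for (i in seq_len(nrow(idx2))) {
  x[idx2[i, 1], idx2[i, 2]] <- NA  # upper-triangle element
  x[idx2[i, 2], idx2[i, 1]] <- NA  # its mirrored lower-triangle element
  print(mean(x, na.rm=TRUE))
}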
I want to find the minimum value in my distance matrix in order to program the single linkage algorithm for cluster analysis in R. But the output doesn't show the coordinates (row number and column number) that identify the minimum.
I tried the "which" command to solve this.
This seems to be the right approach:
> x <- matrix(c(1, 2, 0, 4), nrow=2, ncol=2)
> which(x == min(x), arr.ind=TRUE)
row col
[1,] 1 2
I tried it with my case, but there is no output:
> which(distance.matrix.euc==min(distance.matrix.euc), arr.ind=TRUE)
row col
I expect R to show me the coordinates of the minimum value in the distance matrix, but it shows nothing.
Do you have an idea what's wrong?
If you create the distance.matrix.euc with the dist function in R, then its class will be dist, not a matrix.
set.seed(2)
x <- matrix(sample(1:10, 6, replace = FALSE), nrow=3)
x
# [,1] [,2]
# [1,] 5 1
# [2,] 6 10
# [3,] 9 7
distance_matrix <- dist(x)
distance_matrix
# 1 2
# 2 9.055385
# 3 7.211103 4.242641
class(distance_matrix)
# [1] "dist"
As #akrun suggested, you can convert your distance matrix into the matrix class. Then the which command returns the closest points.
min_dist <- min(distance_matrix)
distance_matrix <- as.matrix(distance_matrix)
which(distance_matrix==min_dist, arr.ind=TRUE)
# row col
# 3 3 2
# 2 2 3
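One caveat: take the minimum before converting (as above), or exclude the diagonal afterwards, because as.matrix puts 0s on the diagonal, so min() of the full matrix would simply return 0:
min(distance_matrix)                              # 0, from the diagonal
min(distance_matrix[upper.tri(distance_matrix)])  # 4.242641, the smallest real distance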
I have two matrices, call them A (n x 2) and B (q x 2). I'd like to get an n x q x 2 array C, such that C[1,5,] represents the difference between the first row of A and the fifth row of B: C[1,5,1] = A[1,1] - B[5,1] and C[1,5,2] = A[1,2] - B[5,2].
I'm trying to do this via the outer function, but it also gives me the "non-diagonal" subtractions; i.e. it also computes A[1,1] - B[5,2] and A[1,2] - B[5,1], which I am not interested in. Does anyone have a fast, easy way to do this?
Current code
>diffs <- outer(A,B,FUN ='-')
>diffs[1,,5,]
[,1] [,2]
[1,] **-0.3808701** 0.7591052
[2,] 0.2629293 **1.4029046**
I've added the stars to indicate what I actually want.
Thanks for any help in advance
(EDIT)
Here's a simpler case for illustrative purposes
> A <- matrix(1:10, nrow = 5, ncol = 2)
> B <- matrix(4:9, nrow = 3, ncol = 2)
> A
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
> B
[,1] [,2]
[1,] 4 7
[2,] 5 8
[3,] 6 9
>diffs <- outer(A,B,FUN ='-')
>diffs[1,,3,] == (A[1,] - B[3,])
[,1] [,2]
[1,] TRUE FALSE
[2,] FALSE TRUE
>diffs[1,,3,]
[,1] [,2]
[1,] -5 -8
[2,] 0 -3
Before worrying about the shape of the output I think we should make sure we're getting the correct values.
A <- matrix(1:10, nrow=5, ncol=2)
B <- matrix(4:9, nrow=3, ncol=2)
# long-winded method
dia_long <- c(
c(A[1,] - B[1,]),
c(A[1,] - B[2,]),
c(A[1,] - B[3,]),
c(A[2,] - B[1,]),
c(A[2,] - B[2,]),
c(A[2,] - B[3,]),
c(A[3,] - B[1,]),
c(A[3,] - B[2,]),
c(A[3,] - B[3,]),
c(A[4,] - B[1,]),
c(A[4,] - B[2,]),
c(A[4,] - B[3,]),
c(A[5,] - B[1,]),
c(A[5,] - B[2,]),
c(A[5,] - B[3,]))
# loop method
comb <- expand.grid(1:nrow(A), 1:nrow(B))
dia_loop <- list()
for (i in 1:nrow(comb)) {
  dia_loop[[i]] <- A[comb[i, 1], ] - B[comb[i, 2], ]
}
dia_loop <- unlist(dia_loop)
# outer/apply method
dia_outer <- apply(outer(A, B, FUN='-'), c(3, 1), diag)
# they all return the same values
all.identical <- function(l) {
  all(sapply(2:length(l), FUN=function(x) identical(l[1], l[x])))
}
all.identical(lapply(list(dia_long, dia_loop, dia_outer), sort))
# TRUE
table(dia_long)
# dia_long
# -5 -4 -3 -2 -1 0 1 2 3
# 1 2 4 5 6 5 4 2 1
Are these the values you are looking for?
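For completeness, if the goal is the n x q x 2 array C with C[i,q,] = A[i,] - B[q,], filling one slice per column avoids the unwanted cross terms altogether. A sketch of that approach:
# Sketch: one outer() call per column, so only "diagonal"
# differences are ever computed.
C <- array(0, dim = c(nrow(A), nrow(B), ncol(A)))
for (k in seq_len(ncol(A))) {
  C[, , k] <- outer(A[, k], B[, k], FUN = '-')
}
C[1, 3, ]
# [1] -5 -3   (i.e. A[1,] - B[3,])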
My solution: use nested lapply and sapply functions to extract the diagonals. I then needed to do some post-processing (not related to this specific problem) before turning the result into an array. It should be noted that this is a q x 2 x n array, which turned out to be better for my purposes; it could be permuted with aperm from here, though, to solve the original question.
A <- matrix(1:10, nrow = 5, ncol = 2)
B <- matrix(4:9, nrow = 3, ncol = 2)
diffs <- outer(A,B, FUN = '-')
diffs <- lapply(X = 1:nrow(A), FUN = function(y) {
  # q x 2 matrix of diagonals for row y of A
  t(sapply(1:nrow(B), FUN = function(x) diag(diffs[y, , x, ])))
})
diffs <- array(unlist(diffs), dim = c(nrow(B), 2, nrow(A)))
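To recover the n x q x 2 layout from the original question, the last step is a dimension permutation:
aperm(diffs, c(3, 1, 2))  # dim n x q x 2, so [i, q, ] is A[i,] - B[q,]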
I have a function that finds the nearest values for each row in a matrix and reports a list with the indices of the nearest rows. However, I want it to exclude rows that are +1 away in the first AND +1 away in the second column from a particular row of interest (-1 in the first and -1 in the second column should also be removed). Moreover, +1 in the first column combined with -1 in the second column with respect to the row of interest should also be avoided.
As an example, if I want things closest to c(2, 1), it should accept c(3,1), c(2,2) or c(1,1), but NOT c(3,2) and not c(1,0).
Basically, for a row to be reported, either column 1 or column 2 should be exactly 1 away from the row of interest, but not both.
The input looks like this:
x
v1 v2
[1,] 3 1
[2,] 2 1
[3,] 3 2
[4,] 1 2
[5,] 8 5
myfunc(x)
The output looks like this. Notice that the closest things to row 2 ($V2 in the output) are rows 1, 3 and 4. The answer should only be row 1, though.
$V1
[1] 2 3
$V2
[1] 1 3 4
$V3
[1] 1 2
$V4
[1] 2
$V5
integer(0)
Here is myfunc
myfunc = function(t){
  d1 <- dist(t[,1])
  d2 <- dist(t[,2])
  dF <- as.matrix(d1) <= 1 & as.matrix(d2) <= 1
  diag(dF) <- NA
  colnames(dF) <- NULL
  dF2 <- lapply(as.data.frame(dF), which)
  return(dF2)
}
Basically, the rows that you want to find should differ from your reference element by +1 or -1 in one column and be identical in the other column. That means that the sum over the absolute values of the differences is exactly one. For your example c(2, 1), this works as follows:
c(3, 1): difference is c(1, 0), thus sum(abs(c(1, 0))) = 1 + 0 = 1
c(1, 1): difference is c(-1, 0), thus sum(abs(c(-1, 0))) = 1 + 0 = 1
etc.
The following function checks exactly this:
myfunc <- function(x) {
  do_row <- function(r) {
    r_mat <- matrix(rep(r, length = length(x)), ncol = ncol(x), byrow = TRUE)
    abs_dist <- abs(r_mat - x)
    return(which(rowSums(abs_dist) == 1))
  }
  return(apply(x, 1, do_row))
}
do_row() does the job for a single row, and then apply() is used to do this with each row. For your example, I get:
myfunc(x)
## [[1]]
## [1] 2 3
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 1
##
## [[4]]
## integer(0)
##
## [[5]]
## integer(0)
Using sweep(), one can write a shorter function:
myfunc2 <- function(x) {
  apply(x, 1, function(r) which(rowSums(abs(sweep(x, 2, r))) == 1))
}
But this seems harder to understand, and it turns out to be slower by about a factor of two for your matrix x. (I have also tried it with a large matrix, and there the efficiency seems about the same.)
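A quick way to reproduce such a timing comparison, assuming the microbenchmark package is installed:
library(microbenchmark)
microbenchmark(myfunc(x), myfunc2(x), times = 100L)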
I have the following problem:
I have a list of matrices of indices.
Every column of a matrix shows which row indices should be set to 1 for that specific column.
All the other values should be 0.
I know the size of the output matrices, and there are no duplicated values within a column.
For example, the following matrix should be translated as follows:
m_in = matrix(c(1,3,5,7,3,4), nrow =2)
m_out = matrix(c(1,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,1,1,0,0,0), nrow = 7)
I wrote code that works, but it would be great if I could achieve this without loops in a more efficient/clever way.
Index <- matrix(20, 100, data = sample(1:200))
Vector <- c(2,3,5,8,20)
ListIndices <- sapply(Vector, function(x) Index[0:x,])
emptylistlist <- list()
for (i in 1:length(ListIndices)) {
  emptylistlist[[i]] <- matrix(nrow = 200, ncol = 100, data = 0)  # create once per list element
  for (j in 1:100) {
    emptylistlist[[i]][ListIndices[[i]][, j], j] <- 1
  }
}
We can try sparseMatrix from library(Matrix) and then wrap it with as.matrix.
library(Matrix)
as.matrix(sparseMatrix(i= c(m_in), j= c(col(m_in)), x=1))
# [,1] [,2] [,3]
#[1,] 1 0 0
#[2,] 0 0 0
#[3,] 1 0 1
#[4,] 0 0 1
#[5,] 0 1 0
#[6,] 0 0 0
#[7,] 0 1 0
If there is a list of matrices, then we can use lapply:
lapply(lst, function(y) as.matrix(sparseMatrix(i= c(y), j= c(col(y)), x= 1)))
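One detail to watch (my addition, not part of the original answer): without an explicit size, sparseMatrix infers the dimensions from the largest index that occurs, so trailing all-zero rows would be dropped. For the 200 x 100 matrices in the question, the dims argument pins the output size:
lapply(ListIndices, function(y)
  as.matrix(sparseMatrix(i = c(y), j = c(col(y)), x = 1,
                         dims = c(200, ncol(y)))))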
The typical way is with matrix assignment:
m_out = matrix(0L, max(m_in), ncol(m_in))
m_out[cbind(c(m_in), c(col(m_in)))] <- 1L
How it works: The syntax for matrix assignment M[IND] <- V is described at help("[<-").
Each row of IND is a pair of (row, column) positions in M.
Elements of M at those positions will be overwritten with (corresponding elements of) V.
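A tiny self-contained illustration of this kind of assignment:
M <- matrix(0, 3, 3)
IND <- cbind(c(1, 3), c(2, 1))  # positions (1,2) and (3,1)
M[IND] <- c(7, 9)
M
#      [,1] [,2] [,3]
# [1,]    0    7    0
# [2,]    0    0    0
# [3,]    9    0    0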
As far as the list of matrices goes, an array would be more natural:
set.seed(1)
Index <- matrix(20, 100, data = sample(1:200))
Vector <- c(2,3,5,8,20)
idx <- sapply(Vector, function(x)Index[0:x,])
# "ListIndices" is too long a name
a_out = array(0L, dim = c(
  max(unlist(idx)),
  max(sapply(idx, ncol)),
  length(idx)))
a_out[cbind(
  unlist(idx),
  unlist(lapply(idx, col)),
  rep(seq_along(idx), lengths(idx))
)] <- 1L
The syntax is the same as for matrix assignment.
Seeing as the OP has so many zeros and so few ones, a sparse matrix, as in #akrun's answer, makes the most sense (or a sparse array, if such a thing has been implemented).