Flatten (reassign values of) hotspots in a matrix - r

I have a matrix in the following format:
set.seed(1)
m = matrix(sample(c(0,0,0,1),25,rep=T), nrow=5)
m[13] = 4
print(m)
[,1] [,2] [,3] [,4] [,5]
[1,] 0 1 0 0 1
[2,] 0 1 0 0 0
[3,] 0 0 4 1 0
[4,] 1 0 0 0 0
[5,] 0 0 1 1 0
Consider [3,3] to be a hotspot that we want to 'flatten' by spreading its value across the nearest neighbouring cells that have a value of zero. In this case that means assigning 1 to cells [2,3], [3,2] and [4,3] so that [3,3] can also be reduced to 1:
[,1] [,2] [,3] [,4] [,5]
[1,] 0 1 0 0 1
[2,] 0 1 1 0 0
[3,] 0 1 1 1 0
[4,] 1 0 1 0 0
[5,] 0 0 1 1 0
Is anyone aware of a matrix/raster operation that can achieve this efficiently, while preserving the sum total of all the cells?

I got interested in this question, so I made an attempt. There probably exists a "rastery" tool for what you're trying to do, but I'm not aware of one.
First, a helper function that finds the indices of elements of a square surrounding a specific element in a matrix:
find_neighbors = function(i, j, n)
{
    tmp = expand.grid(replicate(2, -n:n, simplify = FALSE))
    tmp2 = tmp[rowSums(abs(tmp) < n) < 2, ]
    inds = cbind(tmp2[, 1] + i, tmp2[, 2] + j)
    ## order so that up/down/right/left are filled before diagonal neighbors
    inds[order(rowSums(abs(cbind(inds[, 1] - i, inds[, 2] - j)))), ]
}
E.g.:
m1 = matrix(0, 7, 8)
m1
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#[1,] 0 0 0 0 0 0 0 0
#[2,] 0 0 0 0 0 0 0 0
#[3,] 0 0 0 0 0 0 0 0
#[4,] 0 0 0 0 0 0 0 0
#[5,] 0 0 0 0 0 0 0 0
#[6,] 0 0 0 0 0 0 0 0
#[7,] 0 0 0 0 0 0 0 0
m1[find_neighbors(3, 4, 1)] = 1
m1[find_neighbors(3, 4, 2)] = 2
m1[find_neighbors(3, 4, 3)] = 3
m1
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#[1,] 3 2 2 2 2 2 3 0
#[2,] 3 2 1 1 1 2 3 0
#[3,] 3 2 1 0 1 2 3 0
#[4,] 3 2 1 1 1 2 3 0
#[5,] 3 2 2 2 2 2 3 0
#[6,] 3 3 3 3 3 3 3 0
#[7,] 0 0 0 0 0 0 0 0
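As a cross-check on the picture above: the ring that find_neighbors(i, j, n) returns consists of the cells whose Chebyshev distance, max(|row - i|, |col - j|), from (i, j) equals n. A minimal sketch, assuming the same 7 x 8 matrix and centre (3, 4); the name d is only illustrative:

```r
m1 = matrix(0, 7, 8)
# Chebyshev distance of every cell from the centre (3, 4)
d = pmax(abs(row(m1) - 3), abs(col(m1) - 4))
# the example above only fills rings 1 to 3, so blank out anything farther away
d[d > 3] = 0
d
```

This should reproduce the pattern of 1s, 2s and 3s printed above without calling find_neighbors at all.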
And the function that flattens the hotspots. There is a nested loop: the outer "for" loops over the hotspots and the inner one iteratively flattens each hotspot into rings of neighbours at increasing distance. The inner loop is exited as soon as the spot is flattened.
ff = function(mat, thres = 1)
{
    wh = which(mat > thres, arr.ind = TRUE)  #row/column indices of the hotspots
    for(r in seq_len(nrow(wh))) {
        for(n in seq_len(max(c(dim(mat) - wh[r, ], wh[r, ] - 1)))) {
            if(mat[wh[r, , drop = FALSE]] <= thres) break  #stop flattening if we are done
            inds = find_neighbors(wh[r, 1], wh[r, 2], n)  #get indices of neighbours
            inds = inds[!((rowSums(inds <= 0) > 0) |      #use valid indices..
                          inds[, 1] > nrow(mat) |
                          inds[, 2] > ncol(mat)), , drop = FALSE]  #..and stay a matrix if one row is left
            inds = inds[mat[inds] < thres, , drop = FALSE]  #use indices that are allowed to take values
            tofill = nrow(inds) * thres  #how many 'units' need to be taken from the hotspot?
            #in case the neighbours of the hotspot are > 0, just increase the value of the hotspot
            mat[wh[r, , drop = FALSE]] = mat[wh[r, , drop = FALSE]] + sum(mat[inds])
            #do we have enough 'units' in the hotspot?
            if(mat[wh[r, , drop = FALSE]] <= tofill) tofill = mat[wh[r, , drop = FALSE]] - thres
            if(tofill > 0) {
                if(tofill < thres) {
                    mat[inds[1, , drop = FALSE]] = tofill
                    mat[wh[r, , drop = FALSE]] = mat[wh[r, , drop = FALSE]] - tofill
                    next
                }
                nr = tofill %/% thres
                mat[inds[seq_len(nr), , drop = FALSE]] = thres
                if((tofill %% thres) > 0) mat[inds[nr + 1, , drop = FALSE]] = tofill %% thres
                mat[wh[r, , drop = FALSE]] = mat[wh[r, , drop = FALSE]] - tofill
            }
        }
    }
    mat
}
And an example:
mm = matrix(0, 11, 9); mm[8, 2] = 12; mm[6, 7] = 4
mm
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] 0 0 0 0 0 0 0 0 0
# [2,] 0 0 0 0 0 0 0 0 0
# [3,] 0 0 0 0 0 0 0 0 0
# [4,] 0 0 0 0 0 0 0 0 0
# [5,] 0 0 0 0 0 0 0 0 0
# [6,] 0 0 0 0 0 0 4 0 0
# [7,] 0 0 0 0 0 0 0 0 0
# [8,] 0 12 0 0 0 0 0 0 0
# [9,] 0 0 0 0 0 0 0 0 0
#[10,] 0 0 0 0 0 0 0 0 0
#[11,] 0 0 0 0 0 0 0 0 0
ff(mm)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] 0 0 0 0 0 0 0 0 0
# [2,] 0 0 0 0 0 0 0 0 0
# [3,] 0 0 0 0 0 0 0 0 0
# [4,] 0 0 0 0 0 0 0 0 0
# [5,] 0 0 0 0 0 0 1 0 0
# [6,] 0 1 0 0 0 1 1 0 0
# [7,] 1 1 1 0 0 0 1 0 0
# [8,] 1 1 1 1 0 0 0 0 0
# [9,] 1 1 1 0 0 0 0 0 0
#[10,] 0 1 0 0 0 0 0 0 0
#[11,] 0 0 0 0 0 0 0 0 0
ff(mm, 3)
ff(mm, 5)
ff(mm, 1500)
Hope any of these will be helpful.

Possible outline of an attack.
1) find the hotspots:
hotind <- which (m > 1, arr.ind=TRUE)
2) loop over the rows of hotind to spread:
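for (j in seq_len(nrow(hotind))) {
    hotpoint <- hotind[j, ]
    share <- m[hotpoint[1], hotpoint[2]] / 4
    # for example, divvy up the hot value among four neighbouring cells
    # (this outline assumes the hotspot is not on the border of the matrix)
    m[hotpoint[1] - 1, hotpoint[2] - 1] <- share
    m[hotpoint[1] + 1, hotpoint[2] - 1] <- share
    m[hotpoint[1] - 1, hotpoint[2] + 1] <- share
    m[hotpoint[1] + 1, hotpoint[2] + 1] <- share
    m[hotpoint[1], hotpoint[2]] <- 1 # or your choice of final value
}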
It sure "feels" to me like there's a way to do this with a smoothing convolution kernel approach, so here's hoping someone posts a slicker method.
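Along the lines of the outline above, here is a minimal sketch that keeps the total unchanged. It assumes integer cell values and a target value of 1: each hotspot hands one unit to each of its nearest zero cells, measured by Manhattan distance. flatten_hotspots is a hypothetical name, not a function from any package, and this is a plain loop rather than a convolution approach.

```r
# hypothetical helper: move the excess of each hotspot, one unit at a time,
# onto the nearest cells that are currently zero
flatten_hotspots <- function(m) {
    hot <- which(m > 1, arr.ind = TRUE)
    for (k in seq_len(nrow(hot))) {
        i <- hot[k, 1]
        j <- hot[k, 2]
        zeros <- which(m == 0, arr.ind = TRUE)
        # Manhattan distance from the hotspot to every zero cell
        d <- abs(zeros[, 1] - i) + abs(zeros[, 2] - j)
        # cannot hand out more units than there are zero cells
        n_take <- min(m[i, j] - 1, nrow(zeros))
        take <- zeros[order(d)[seq_len(n_take)], , drop = FALSE]
        m[take] <- 1
        m[i, j] <- m[i, j] - n_take
    }
    m
}

# the matrix from the question, typed in by hand
m <- matrix(c(0, 1, 0, 0, 1,
              0, 1, 0, 0, 0,
              0, 0, 4, 1, 0,
              1, 0, 0, 0, 0,
              0, 0, 1, 1, 0), nrow = 5, byrow = TRUE)
flat <- flatten_hotspots(m)
flat
sum(flat) == sum(m)
```

Ties between equally distant zero cells are broken by the column-major order in which which() reports them, so a different tie-breaking rule may be preferable for real raster data.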

Related

Set values along a diagonal in a matrix

I am trying to use the matrix() and diag() functions to create the following pattern, but with a 100 x 100 matrix rather than 5 x 5.
5 x 5 matrix:
| 0 1 0 0 0 |
| 1 0 1 0 0 |
| 0 1 0 1 0 |
| 0 0 1 0 1 |
| 0 0 0 1 0 |
In other words, I want to have two diagonals with values of 1, one to the left of the main diagonal, and one to the right of the main diagonal.
The diag() function (actually the diag<- function) can be used for assignment:
mat <- matrix( 0, 100,100)
diag(mat) <- 1
mat[1:10,1:10]
#-----------
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 0 0 0 0 0 0 0 0 0
[2,] 0 1 0 0 0 0 0 0 0 0
[3,] 0 0 1 0 0 0 0 0 0 0
[4,] 0 0 0 1 0 0 0 0 0 0
[5,] 0 0 0 0 1 0 0 0 0 0
[6,] 0 0 0 0 0 1 0 0 0 0
[7,] 0 0 0 0 0 0 1 0 0 0
[8,] 0 0 0 0 0 0 0 1 0 0
[9,] 0 0 0 0 0 0 0 0 1 0
[10,] 0 0 0 0 0 0 0 0 0 1
You, however, want the sub-diagonal and super-diagonal to be assigned values, so use logical expressions with col and row:
mat <- matrix( 0, 100,100)
mat[row(mat)==col(mat)-1] <- 1
mat[row(mat)==col(mat)+1] <- 1
mat[1:10,1:10]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 1 0 0 0 0 0 0 0 0
[2,] 1 0 1 0 0 0 0 0 0 0
[3,] 0 1 0 1 0 0 0 0 0 0
[4,] 0 0 1 0 1 0 0 0 0 0
[5,] 0 0 0 1 0 1 0 0 0 0
[6,] 0 0 0 0 1 0 1 0 0 0
[7,] 0 0 0 0 0 1 0 1 0 0
[8,] 0 0 0 0 0 0 1 0 1 0
[9,] 0 0 0 0 0 0 0 1 0 1
[10,] 0 0 0 0 0 0 0 0 1 0
(This method does not depend on having a square matrix. I have a vague memory that there is a faster method that does not require using row and col. Each of those functions returns a matrix of the same dimensions as its argument, which can be expensive for very large objects.)
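One way to avoid row and col, sketched here under the assumption of a square n x n matrix: R stores matrices column by column, so cell (i, i + 1) has linear index i * (n + 1) and cell (i + 1, i) has linear index 2 + (i - 1) * (n + 1). The names step, super and sub are only illustrative.

```r
n <- 100
mat <- matrix(0, n, n)
step <- n + 1
# linear indices of the super-diagonal cells (i, i + 1)
super <- seq(from = n + 1, by = step, length.out = n - 1)
# linear indices of the sub-diagonal cells (i + 1, i)
sub <- seq(from = 2, by = step, length.out = n - 1)
mat[c(super, sub)] <- 1
mat[1:5, 1:5]
```

Only two index vectors of length n - 1 are created, instead of two full n x n index matrices.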
For the main diagonal, the row and column indices are the same. For the other diagonals, there is a difference of 1 between the row index and column index. Generate those indices directly and assign values in those indices.
sz = 5
m = matrix(0, sz, sz)
inds1 = cbind(r = 1:(sz-1), c = 2:sz)
inds2 = cbind(r = 2:sz, c = 1:(sz-1))
m[inds1] = 1
m[inds2] = 1
m
# OR, to make it concise
m = matrix(0, sz, sz)
inds = rbind(cbind(1:(sz-1), 2:sz), cbind(2:sz, 1:(sz-1)))
replace(m, inds, 1)
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0 1 0 0 0
#[2,] 1 0 1 0 0
#[3,] 0 1 0 1 0
#[4,] 0 0 1 0 1
#[5,] 0 0 0 1 0
We could create a function using a math trick which would work for any square matrix.
get_off_diagonal_1s <- function(n) {
#Create a matrix with all 0's
mat <- matrix(0, ncol = n, nrow = n)
#Subtract row indices by column indices
inds = row(mat) - col(mat)
#Replace values where inds is 1 or -1
mat[inds == 1 | inds == -1] = 1
mat
}
get_off_diagonal_1s(5)
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0 1 0 0 0
#[2,] 1 0 1 0 0
#[3,] 0 1 0 1 0
#[4,] 0 0 1 0 1
#[5,] 0 0 0 1 0
get_off_diagonal_1s(8)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#[1,] 0 1 0 0 0 0 0 0
#[2,] 1 0 1 0 0 0 0 0
#[3,] 0 1 0 1 0 0 0 0
#[4,] 0 0 1 0 1 0 0 0
#[5,] 0 0 0 1 0 1 0 0
#[6,] 0 0 0 0 1 0 1 0
#[7,] 0 0 0 0 0 1 0 1
#[8,] 0 0 0 0 0 0 1 0

Creating matrix with probabilities depending on index column

How do I create a 10x10 matrix containing only 1's (heads) and 0's (tails), where the probability of heads is 1 divided by the index of the column?
I tried several things, but none of them worked, which is really frustrating. I tried to do it with a vector and a for loop.
mat <- matrix(sample(c(0,1), 100, replace=TRUE, prob=c(1/h, 1-(1/h)), 10))
But now the only question is how to define h.
Here is an option using sapply
n_col <- 10
n_row <- 10
set.seed(1)
# sapply() binds the sampled columns into an n_row x n_col matrix
mat <- sapply(1:n_col, function(x) {
    sample(x = c(1, 0),
           size = n_row,
           replace = TRUE,
           prob = c(1/x, 1 - 1/x))
})
mat
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 0 1 0 1 0 1 0 0 0
# [2,] 1 0 0 0 0 1 0 0 0 0
# [3,] 1 1 0 0 0 0 0 0 0 0
# [4,] 1 0 0 0 0 0 0 0 0 0
# [5,] 1 1 0 1 0 0 0 0 0 0
# [6,] 1 0 0 0 0 0 0 1 0 0
# [7,] 1 1 0 1 0 0 0 0 0 0
# [8,] 1 1 0 0 0 0 0 0 0 0
# [9,] 1 0 1 0 0 0 0 0 0 0
#[10,] 1 1 0 0 0 0 1 1 0 0
Hope it helps.
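A vectorised alternative, as a sketch: rbinom() accepts a vector of probabilities, so repeating 1 / (column index) down each column gives one Bernoulli draw per cell. The variable name p is only illustrative, and the draws will differ from the sapply output above because the random numbers are consumed differently.

```r
n_col <- 10
n_row <- 10
set.seed(1)
# probability of heads for every cell, in column-major order
p <- rep(1 / seq_len(n_col), each = n_row)
mat <- matrix(rbinom(n_row * n_col, size = 1, prob = p),
              nrow = n_row, ncol = n_col)
mat
```

Column 1 has probability 1, so it should consist entirely of 1's, as in the sapply version.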

Obtain matrices by switching a one and a zero - local search

Let's start with the following matrix.
M <- matrix(c(0,0,0,1,0,0,1,1,
0,0,1,0,0,1,1,0,
0,0,0,0,0,1,1,1,
0,0,0,1,1,0,1,0,
0,0,0,1,1,1,0,0,
0,0,1,0,1,0,0,1),nrow = 8,ncol = 6)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 0
[2,] 0 0 0 0 0 0
[3,] 0 1 0 0 0 1
[4,] 1 0 0 1 1 0
[5,] 0 0 0 1 1 1
[6,] 0 1 1 0 1 0
[7,] 1 1 1 1 0 0
[8,] 1 0 1 0 0 1
I want to obtain a set of matrices by switching ones and zeros. For each column, starting from column 1, I want to obtain a set of matrices by switching the 1 in (4,1) with the 0 in (1,1), (2,1), (3,1), (5,1), (6,1), and then do the same for the 1s in (7,1) and (8,1). Then continue with the other columns. There are altogether 90 matrices (15 for each column, 15*6) after switching. This is just an example. I have matrices of bigger size. How do I generalize this for other cases?
Here's a solution. You could wrap the whole thing up into a function. It produces a list of lists of matrices, results, where results[[i]] is a list of matrices with the ith column switched.
column_switcher = function(x) {
ones = which(x == 1)
zeros = which(x == 0)
results = matrix(rep(x, length(ones) * length(zeros)), nrow = length(x))
counter = 1
for (one in ones) {
for (zero in zeros) {
results[one, counter] = 0
results[zero, counter] = 1
counter = counter + 1
}
}
return(results)
}
switched = lapply(1:ncol(M), function(col) column_switcher(M[, col]))
results = lapply(seq_along(switched), function(m_col) {
lapply(1:ncol(switched[[m_col]]), function(i) {
M[, m_col] = switched[[m_col]][, i]
return(M)
})
})
results[[1]]
# [[1]]
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 0 0 0 0 0
# [2,] 0 0 0 0 0 0
# [3,] 0 1 0 0 0 1
# [4,] 0 0 0 1 1 0
# [5,] 0 0 0 1 1 1
# [6,] 0 1 1 0 1 0
# [7,] 1 1 1 1 0 0
# [8,] 1 0 1 0 0 1
#
# [[2]]
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 0 0 0 0 0 0
# [2,] 1 0 0 0 0 0
# [3,] 0 1 0 0 0 1
# [4,] 0 0 0 1 1 0
# [5,] 0 0 0 1 1 1
# [6,] 0 1 1 0 1 0
# [7,] 1 1 1 1 0 0
# [8,] 1 0 1 0 0 1
#
# ...
Checking the length of the list and the lengths of the sublists, they're all there.
length(results)
# [1] 6
lengths(results)
# [1] 15 15 15 15 15 15
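If a flat list of all neighbouring matrices is more convenient than a list of lists, the same idea can be sketched with expand.grid. swap_neighbours is a hypothetical name; the sketch assumes a 0/1 matrix and works for any number of ones and zeros per column.

```r
# hypothetical helper: every matrix reachable by swapping one 1 with one 0
# inside a single column, returned as one flat list
swap_neighbours <- function(M) {
    out <- list()
    for (j in seq_len(ncol(M))) {
        # all (zero, one) position pairs in column j; 'zero' varies fastest
        pairs <- expand.grid(zero = which(M[, j] == 0),
                             one = which(M[, j] == 1))
        for (k in seq_len(nrow(pairs))) {
            N <- M
            N[pairs$one[k], j] <- 0
            N[pairs$zero[k], j] <- 1
            out[[length(out) + 1]] <- N
        }
    }
    out
}

M <- matrix(c(0,0,0,1,0,0,1,1,
              0,0,1,0,0,1,1,0,
              0,0,0,0,0,1,1,1,
              0,0,0,1,1,0,1,0,
              0,0,0,1,1,1,0,0,
              0,0,1,0,1,0,0,1), nrow = 8, ncol = 6)
neighbours <- swap_neighbours(M)
length(neighbours)
```

Each swap leaves the column sums unchanged, which is a quick sanity check on the result.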

Create a binary adjacency matrix from a vector of indices

Suppose I have a vector that looks like this:
x <- sample(5, 500, replace = TRUE)
so that each element corresponds to some index from 1 through 5.
What's an efficient way to create a binary adjacency matrix from this vector? To elaborate, the matrix A should be such that A[i,j] = 1 if x[i] = x[j] and 0 otherwise.
In one line, you could do
outer(x, x, function(x, y) as.integer(x==y))
which returns
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 0 0 0 0 0 1 0 0 0
[2,] 0 1 1 1 0 1 0 0 1 0
[3,] 0 1 1 1 0 1 0 0 1 0
[4,] 0 1 1 1 0 1 0 0 1 0
[5,] 0 0 0 0 1 0 0 0 0 0
[6,] 0 1 1 1 0 1 0 0 1 0
[7,] 1 0 0 0 0 0 1 0 0 0
[8,] 0 0 0 0 0 0 0 1 0 0
[9,] 0 1 1 1 0 1 0 0 1 0
[10,] 0 0 0 0 0 0 0 0 0 1
or, in two lines
myMat <- outer(x, x, "==")
myMat[] <- as.integer(myMat)
Check that they're the same.
identical(myMat, outer(x, x, function(x, y) as.integer(x==y)))
[1] TRUE
data
set.seed(1234)
x <- sample(5, 10, replace = TRUE)
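Another way to look at the same matrix, as a sketch: build an indicator matrix with one column per distinct value and multiply it by its own transpose. The names Z and A are only illustrative; tcrossprod(Z) computes Z %*% t(Z).

```r
set.seed(1234)
x <- sample(5, 10, replace = TRUE)
# n x k indicator matrix: Z[i, g] is 1 when x[i] equals the g-th distinct value
Z <- outer(x, sort(unique(x)), "==") * 1
# A[i, j] counts the values shared by x[i] and x[j], which is 1 or 0 here
A <- tcrossprod(Z)
A
```

For a long x with few distinct values, Z is much smaller than the n x n result, although the outer() one-liner above is simpler to read.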

Convert a string into a similarity matrix

I have a number of strings in an idiosyncratic format, representing sets. In R, I'd like to convert them into a similarity matrix.
For example, a string showing that 1+2 comprise a set, 3 is alone in a set, and 4,5, and 6 comprise a set is:
"1+2,3,4+5+6"
For the example above, I'd like to be able to produce
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 0 0 0 0
[2,] 1 1 0 0 0 0
[3,] 0 0 1 0 0 0
[4,] 0 0 0 1 1 1
[5,] 0 0 0 1 1 1
[6,] 0 0 0 1 1 1
It seems like this should be a painfully simple task. How would I go about it?
Here's an approach:
out <- lapply(unlist(strsplit("1+2,3,4+5+6", ",")), function(x) {
as.numeric(unlist(strsplit(x, "\\+")))
})
x <- table(unlist(out), rep(seq_along(out), sapply(out, length)))
matrix(x %*% t(x), nrow(x))
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 1 0 0 0 0
## [2,] 1 1 0 0 0 0
## [3,] 0 0 1 0 0 0
## [4,] 0 0 0 1 1 1
## [5,] 0 0 0 1 1 1
## [6,] 0 0 0 1 1 1
Pseudocode:
Split at , to get an array of strings, each describing a set.
For each element of the array:
Split at + to get an array of set members
Mark every possible pairing of members of this set on the matrix
You can create a matrix in R with:
m = mat.or.vec(6, 6)
By default, the matrix should initialize with all entries 0. You can assign new values with:
m[2,3] = 1
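The pseudocode above can be sketched in R as follows. similarity_from_string is a hypothetical name, and the sketch assumes that set members are positive integers and that the matrix size is the largest member that appears in the string.

```r
# hypothetical helper that follows the pseudocode step by step
similarity_from_string <- function(s) {
    # split at "," to get one string per set, then at "+" to get the members
    parts <- strsplit(strsplit(s, ",", fixed = TRUE)[[1]], "+", fixed = TRUE)
    sets <- lapply(parts, function(members) as.integer(trimws(members)))
    n <- max(unlist(sets))
    m <- matrix(0, n, n)
    # mark every pairing of members of the same set
    for (members in sets) m[members, members] <- 1
    m
}

res <- similarity_from_string("1+2,3,4+5+6")
res
```

Because the members are parsed as whole numbers rather than single characters, this also handles members with more than one digit.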
Here's another approach:
# write a simple function
similarity <- function(string){
sets <- gsub("\\+", ":", strsplit(string, ",")[[1]])
n <- as.numeric(tail(strsplit(gsub("[[:punct:]]", "", string), "")[[1]], 1))
mat <- mat.or.vec(n, n)
ind <- suppressWarnings(lapply(sets, function(x) eval(parse(text=x))))
for(i in 1:length(ind)){
mat[ind[[i]], ind[[i]]] <- 1
}
return(mat)
}
# Use that function
> similarity("1+2,3,4+5+6")
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 0 0 0 0
[2,] 1 1 0 0 0 0
[3,] 0 0 1 0 0 0
[4,] 0 0 0 1 1 1
[5,] 0 0 0 1 1 1
[6,] 0 0 0 1 1 1
# Using other string
> similarity("1+2,3,5+6+7, 8")
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 1 0 0 0 0 0 0
[2,] 1 1 0 0 0 0 0 0
[3,] 0 0 1 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 1 1 1 0
[6,] 0 0 0 0 1 1 1 0
[7,] 0 0 0 0 1 1 1 0
[8,] 0 0 0 0 0 0 0 1
