Adding adjustable random noise to a matrix in R

I have a matrix-generating function that produces a lower triangle of 1s and an upper triangle of 0s.
Is it possible to add some adjustable random noise (from some distribution that yields random 0s and 1s) to the output matrix, so that random 0s replace some of the 1s in the lower triangle and random 1s replace some of the 0s in the upper triangle?
lower_mat <- function(r, c) {
m <- matrix(0, nrow=r,ncol=c)
m[lower.tri(m)] <- 1
m
}
lower_mat(5,4)
# [,1] [,2] [,3] [,4]
# [1,] 0 0 0 0
# [2,] 1 0 0 0
# [3,] 1 1 0 0
# [4,] 1 1 1 0
# [5,] 1 1 1 1

If you want to swap a fixed number of positions between the lower and upper triangles, you could do
swap_upper_lower <- function(m, n) {
tops <- which(upper.tri(m))
bots <- which(lower.tri(m))
stopifnot(length(bots)>=n && length(tops)>=n)
tops <- sample(tops, n)
bots <- sample(bots, n)
vals <- m[tops]
m[tops] <- m[bots]
m[bots] <- vals
m
}
mm <- lower_mat(5,4)
swap_upper_lower(mm, 3)
That will swap 3 values between the lower triangle and the upper triangle.
If you would prefer to think of it as swapping the positions of 0s and 1s, you could instead do
swap_0_1 <- function(m, n) {
ones <- which(m==1)
zers <- which(m==0)
stopifnot(length(ones)>=n && length(zers)>=n)
ones <- sample(ones, n)
zers <- sample(zers, n)
vals <- m[ones]
m[ones] <- m[zers]
m[zers] <- vals
m
}
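Note that this treats values on the diagonal differently from the other function: upper.tri() and lower.tri() exclude the diagonal, so swap_upper_lower never touches it, whereas swap_0_1 counts the diagonal 0s as candidates for swapping.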
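If the goal is noise controlled by a single probability rather than a fixed number of swaps, one possible sketch is to draw a Bernoulli flip for every cell and invert the cells where the draw is 1. The name flip_noise and its arguments p and keep_diag are made up for this illustration; lower_mat is repeated from the question so the snippet runs on its own.

```r
# lower_mat() copied from the question so the example is self-contained
lower_mat <- function(r, c) {
  m <- matrix(0, nrow = r, ncol = c)
  m[lower.tri(m)] <- 1
  m
}

# Hypothetical helper: flip each cell independently with probability p
flip_noise <- function(m, p, keep_diag = TRUE) {
  stopifnot(p >= 0, p <= 1)
  flips <- matrix(rbinom(length(m), size = 1, prob = p), nrow = nrow(m))
  if (keep_diag) flips[row(m) == col(m)] <- 0  # leave the diagonal untouched
  abs(m - flips)  # 1 -> 0 and 0 -> 1 wherever flips == 1
}

set.seed(42)
flip_noise(lower_mat(5, 4), p = 0.2)
```

With p = 0 the matrix comes back unchanged, and larger values of p flip more cells on average, so p acts as the noise dial.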

Related

Finding a pattern in a binary matrix with R

I have an n x n symmetric binary matrix and I want to find the largest rectangle (by area) with 0 at the top-left and bottom-right corners and 1 at the top-right and bottom-left corners. If I just do it with loops, checking all the rectangles from the biggest to the smallest, it takes "days" for n=100. Does anyone have an idea how to do it efficiently?
Thanks a lot!

Thanks for your answers. The matrices I use are adjacency matrices of random Erdos-Renyi graphs, but any random symmetric binary matrix will do for testing. So far I have used 4 nested loops:
switch<-function(Mat)
{
n=nrow(Mat)
for (i in 1:(n-1)) {
for(j in seq(n,i+1,by=-1)) {
for(k in 1:(n-1)) {
if ((k==i)||(k==j) || (Mat[i,k]==1)||(Mat[j,k]==0)) next
for(l in seq(n,k+1,by=-1)) {
if ((l==i)||(l==j)|| (Mat[i,l]==0)||(Mat[j,l]==1)) next
return(c(i,j,k,l))
}
}
}
}
}

Here's an approach that you can try for now. It doesn't require symmetry, and it treats all nonzero elements like ones for efficiency.
It loops over the ones, assuming that there are fewer ones than zeros. (You would want to loop over zeros in the reverse case with fewer zeros than ones.)
This approach probably isn't optimal, since it loops over all of the ones even if the largest box is identified early. You can devise a clever stopping condition to short-circuit the loop in that case.
But it is still fast for n = 100, requiring less than half of a second on my machine, even when ones and zeros occur in roughly equal proportion (the worst case):
f <- function(X) {
if (!is.logical(X)) {
storage.mode(X) <- "logical"
}
J <- which(X, arr.ind = TRUE, useNames = FALSE)
i <- J[, 1L]
j <- J[, 2L]
nmax <- 0L
res <- NULL
for (k in seq_along(i)) {
i0 <- i[k]
j0 <- j[k]
ok <- i < i0 & j > j0
if (any(ok)) {
i1 <- i[ok]
j1 <- j[ok]
ok <- !(X[i0, j1] | X[i1, j0])
if (any(ok)) {
i1 <- i1[ok]
j1 <- j1[ok]
n <- (i0 - i1 + 1L) * (j1 - j0 + 1L)
w <- which.max(n)
if (n[w] > nmax) {
nmax <- n[w]
res <- c(i0 = i0, j0 = j0, i1 = i1[w], j1 = j1[w])
}
}
}
}
res
}
mkX <- function(n) {
X <- matrix(sample(0:1, n * n, TRUE), n, n)
X[upper.tri(X)] <- t(X)[upper.tri(X)]
X
}
set.seed(1L)
X <- mkX(6L)
X
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0 1 0 0 1 0
## [2,] 1 0 1 1 0 0
## [3,] 0 1 0 1 1 1
## [4,] 0 1 1 0 0 0
## [5,] 1 0 1 0 0 1
## [6,] 0 0 1 0 1 0
f(X)
## i0 j0 i1 j1
## 5 1 1 5
Y <- mkX(100L)
microbenchmark::microbenchmark(f(Y))
## Unit: milliseconds
## expr min lq mean median uq max neval
## f(Y) 310.139 318.3363 327.8116 321.4109 326.5088 391.9081 100
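To sanity-check a faster routine on small inputs, a plain brute-force reference can help. The sketch below is not part of the answer above; brute_box is a hypothetical name, and it assumes the same corner convention (ones at bottom-left and top-right, zeros at top-left and bottom-right) and a matrix with at least two rows and two columns.

```r
# Hypothetical brute-force reference: try every pair of rows and every pair of columns
brute_box <- function(X) {
  X <- X != 0                      # treat all nonzero entries as ones
  best <- NULL
  best_area <- 0L
  for (i1 in seq_len(nrow(X) - 1L)) {          # top row
    for (i0 in (i1 + 1L):nrow(X)) {            # bottom row
      for (j0 in seq_len(ncol(X) - 1L)) {      # left column
        for (j1 in (j0 + 1L):ncol(X)) {        # right column
          if (X[i0, j0] && X[i1, j1] && !X[i1, j0] && !X[i0, j1]) {
            area <- (i0 - i1 + 1L) * (j1 - j0 + 1L)
            if (area > best_area) {
              best_area <- area
              best <- c(i0 = i0, j0 = j0, i1 = i1, j1 = j1)
            }
          }
        }
      }
    }
  }
  best
}

# The 6 x 6 example matrix printed above, typed in by hand
X <- matrix(c(0, 1, 0, 0, 1, 0,
              1, 0, 1, 1, 0, 0,
              0, 1, 0, 1, 1, 1,
              0, 1, 1, 0, 0, 0,
              1, 0, 1, 0, 0, 1,
              0, 0, 1, 0, 1, 0), 6, 6, byrow = TRUE)
brute_box(X)
```

On this matrix the brute-force search finds the same corners as f(X). It is far too slow for n = 100, so it is only meant as a cross-check on small cases.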

Create a list of matrices with 1's / 0's based on a list of matrices with the index

I have the following problem:
I have a list of matrices containing indices.
Every column of a matrix shows which row indices should be equal to 1 for that specific column.
All the other values should be equal to 0.
I do know the size of the output matrices and there are no duplicated values in a column.
For example the following matrix should be translated as follows:
m_in = matrix(c(1,3,5,7,3,4), nrow =2)
m_out = matrix(c(1,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,1,1,0,0,0), nrow = 7)
I wrote code that works, but it would be great if I could achieve this without loops, in a more efficient/clever way.
Index <- matrix(20, 100, data = sample(1:200))
Vector <- c(2,3,5,8,20)
ListIndices <- sapply(Vector, function(x)Index[0:x,])
emptylistlist <- list()
for (i in 1: length(ListIndices)){
for (j in 1 : 100){
emptylistlist[[i]] <- matrix(nrow = 200, ncol = 100, data = 0)
emptylistlist[[i]][ListIndices[[i]],j]<-1
}
}

We can try sparseMatrix from library(Matrix) and then wrap it with as.matrix.
library(Matrix)
as.matrix(sparseMatrix(i= c(m_in), j= c(col(m_in)), x=1))
# [,1] [,2] [,3]
#[1,] 1 0 0
#[2,] 0 0 0
#[3,] 1 0 1
#[4,] 0 0 1
#[5,] 0 1 0
#[6,] 0 0 0
#[7,] 0 1 0
If there is a list of matrices, then we can use lapply
lapply(lst, function(y) as.matrix(sparseMatrix(i= c(y), j= c(col(y)), x= 1)))

The typical way is with matrix assignment:
m_out = matrix(0L, max(m_in), ncol(m_in))
m_out[cbind(c(m_in), c(col(m_in)))] <- 1L
How it works: The syntax for matrix assignment M[IND] <- V is described at help("[<-").
Each row of IND is a pair of (row, column) positions in M.
Elements of M at those positions will be overwritten with (corresponding elements of) V.
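To make the index matrix concrete, here is a small sketch on the m_in from the question. The intermediate name ind is only for illustration.

```r
m_in <- matrix(c(1, 3, 5, 7, 3, 4), nrow = 2)

# One row per entry of m_in: (row position in m_out, column position in m_out)
ind <- cbind(c(m_in), c(col(m_in)))
ind

m_out <- matrix(0L, max(m_in), ncol(m_in))
m_out[ind] <- 1L   # each row of ind addresses exactly one cell
m_out
```

Printing ind shows the pairs (1,1), (3,1), (5,2), (7,2), (3,3), (4,3), which are exactly the cells set to 1.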
As far as the list of matrices goes, an array would be more natural:
set.seed(1)
Index <- matrix(20, 100, data = sample(1:200))
Vector <- c(2,3,5,8,20)
idx <- sapply(Vector, function(x)Index[0:x,])
# "ListIndices" is too long a name
a_out = array(0L, dim=c(
max(unlist(idx)),
max(sapply(idx,ncol)),
length(idx)))
a_out[ cbind(
unlist(idx),
unlist(lapply(idx,col)),
rep(seq_along(idx),lengths(idx))
)] <- 1L
The syntax is the same as for matrix assignment.
Seeing as the OP has so many zeros and so few ones, a sparse matrix, as in @akrun's answer, makes the most sense, or a sparse array, if such a thing has been implemented.

Trying to output a list of lists from a loop in r

I am trying to do something that I am sure should be quite simple. I want a function that takes a list of number pairs (pairedList) and a vector (botList) and returns one vector per pair, each of length(botList). Every element of such a vector should be 0, except at the two index positions given by the pair, where it should be 1.
#generating mock data to simulate my application:
pair1 <- c(2,4)
pair2 <- c(1,3)
pair3 <- c(5,6)
pairedList <- c(pair1, pair2, pair3)
botList <- c(1:length(pairedList))
Here is what the output should ultimately look like:
[1] 0 1 0 1 0 0
[1] 1 0 1 0 0 0
[1] 0 0 0 0 1 1
The code below lets me print the vectors correctly (by replacing the line in the if block with print(prob) and commenting out the final print statement):
library(gtools)
test <- function() {
#initialising empty list
output <- list()
for (i in botList) {
x <- rep(0, length(pairedList))
ind <- pairedList[i:(i+1)]
ind.inv <- sort(ind, decreasing=T)
val <- rep(1,length(ind))
new.x <- vector(mode="numeric",length(x)+length(val))
new.x <- new.x[-ind]
new.x[ind] <- val
prob <- new.x
if (odd(i)) {
output[i] <- prob
}
print(output)
}
}
However I need to return this list of vectors from my function rather than printing it and when I do so, I get the following output and am met with an error and a number of warnings:
[[1]]
[1] 0
[[1]]
[1] 0
[[1]]
[1] 0
[[2]]
NULL
[[3]]
[1] 1
[[1]]
[1] 0
[[2]]
NULL
[[3]]
[1] 1
[[1]]
[1] 0
[[2]]
NULL
[[3]]
[1] 1
[[4]]
NULL
[[5]]
[1] 0
Error in new.x[-ind] : only 0's may be mixed with negative subscripts
In addition: Warning messages:
1: In output[i] <- prob :
number of items to replace is not a multiple of replacement length
2: In output[i] <- prob :
number of items to replace is not a multiple of replacement length
3: In output[i] <- prob :
number of items to replace is not a multiple of replacement length
My question is:
How can I change my code to output what I need from this function? I thought this was going to be a five minute job, and after hours on this one little thing I am stuck!
Thanks in advance

Something you can try, although there must be nicer ways:
# create a list with all the "pair1", "pair2", ... objects
l_pairs <- mget(ls(pattern="^pair\\d+"))
# compute maximum number among the values of pair., it determines the number of columns of the results
n_max <- max(unlist(l_pairs))
# finally, create for each pair. a vector of 0s and put 1s at the positions specified in pair.
res <- t(sapply(l_pairs, function(x){y <- rep(0, n_max); y[x]<-1; y}))
res
# [,1] [,2] [,3] [,4] [,5] [,6]
#pair1 0 1 0 1 0 0
#pair2 1 0 1 0 0 0
#pair3 0 0 0 0 1 1

You could use row/col indexing
m1 <- matrix(0, ncol=max(pairedList), nrow=3)
m1[cbind(rep(1:nrow(m1),each=2), pairedList)] <- 1
m1
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 0 1 0 1 0 0
#[2,] 1 0 1 0 0 0
#[3,] 0 0 0 0 1 1

James, the following should work. I've just tested it.
pair1 <- c(2,4)
pair2 <- c(1,3)
pair3 <- c(5,6)
pairedList <- c(pair1, pair2, pair3)
botList <- c(1:(length(pairedList)/2))
library(gtools)
test <- function(pairedList, botList) {
#initialising empty list
output <- list()
for (i in botList) {
x <- rep(0, length(pairedList))
ind <- pairedList[(2*i - 1):(2*i)] # the i-th pair sits at positions 2i-1 and 2i of the flattened vector
ind.inv <- sort(ind, decreasing=T)
val <- rep(1,length(ind))
new.x <- vector(mode="numeric",length(x)+length(val))
new.x <- new.x[-ind]
new.x[ind] <- val
prob <- new.x
output[[i]] <- prob
print(prob)
}
return(output)
}
The reason for the strange error is that botList was being created with length 6 rather than length 3. Also, to store a whole vector as a single list element you need double brackets [[ ]] rather than single brackets [ ]; with single brackets R tries to replace one element with several values, hence the warnings.
Once the function has returned the list, rbind its elements together as follows:
output <- test(pairedList, botList)
result <- do.call(rbind,output)
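The difference between [ ] and [[ ]] on a list can be seen in isolation with a short sketch. The variable names out, same and bad are just for illustration.

```r
# Storing a numeric vector in one list slot
out <- list()
out[[1]] <- c(0, 1, 0, 1, 0, 0)        # [[ ]] stores the whole vector as element 1

same <- list()
same[1] <- list(c(0, 1, 0, 1, 0, 0))   # [ ] also works if the right-hand side is a list

bad <- list()
# [ ] with a bare vector warns and keeps only the first value
suppressWarnings(bad[1] <- c(0, 1, 0, 1, 0, 0))

length(out[[1]])   # 6
length(bad[[1]])   # 1
```

The last case is what produced the single 0s and 1s in the printed output of the question.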

R: Multiplying elements of data frame with neighbours

I have a data frame of 300x300 elements. Each of them is either -1 or +1:
[,1] [,2] [,3]
[1,] 1 -1 -1
[2,] 1 1 1
[3,] -1 -1 1
[4,] 1 1 -1
What I want is to iterate over my data frame, and multiply each value with every neighbouring value.
I.e:
For element [1,1] in my original data frame I want the product of [1,1], [1,2] and [2,1]
For element [2,2] in my original data frame I want the product of [2,2], [1,2], [2,1], [2,3] and [3,2].
I have tried to create 4 new data frames, each shifted 1 element to the right, left, up and down, respectively:
x_up <- shift(x, 1, dir='up')
x_up <- as.array(x_up)
dim(x_up) <- dims
x_down <- shift(x, 1, dir='down')
x_down <- as.array(x_down)
dim(x_down) <- dims
x_left <- shift(x, 1, dir='left')
x_left <- as.array(x_left)
dim(x_left) <- dims
x_right <- shift(x, 1, dir='right')
x_right <- as.array(x_right)
dim(x_right) <- dims
where x is my original data frame.
I can see that with this approach the new data frames are not shifted correctly; several of them are identical. I checked this with identical().
Is there another approach to my problem?
Edit:
shift() is from the 'binhf' package.

I think there's probably a smarter way to do this, but the standard approach would be iterating over each element and multiplying its surroundings.
Starting with:
mat <- matrix(c(1, 1, -1, 1, -1, 1, -1, 1, -1, 1, 1, -1), ncol=3)
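To avoid out-of-bounds indexing at the bottom and right edges, add a row and a column of 1s as margins (1 is neutral for multiplication; if you were summing, you would pad with 0s instead). The top and left edges need no padding, because an index of 0 selects nothing and the product of an empty vector is 1.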
mat2 <- addmargins(mat, FUN=function(x) 1)
Now you create an empty matrix to hold the output, and then iterate over the elements and multiply the neighbors.
out <- matrix(nrow=nrow(mat), ncol=ncol(mat))
for (i in 1:nrow(mat)) {
for (j in 1:ncol(mat)) {
out[i,j] <- prod(mat[i,j], mat2[i-1, j], mat2[i, j-1], mat2[i+1, j], mat2[i, j+1])
}
}
Resulting in:
> out
[,1] [,2] [,3]
[1,] -1 1 1
[2,] -1 1 -1
[3,] 1 1 1
[4,] -1 1 -1
This took less than a second for a 300x300 matrix, so it might be enough for you.
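For what it's worth, the same padding idea can be written without the double loop by multiplying shifted views of a padded matrix. This is only a sketch under the assumption that the input is a numeric matrix (or something as.matrix can convert); neighbour_prod is a made-up name.

```r
# Hypothetical vectorized variant: pad with 1s on all four sides, then multiply shifted blocks
neighbour_prod <- function(x) {
  x <- as.matrix(x)
  nr <- nrow(x)
  nc <- ncol(x)
  p <- matrix(1, nr + 2L, nc + 2L)           # border of 1s is neutral for products
  p[2:(nr + 1L), 2:(nc + 1L)] <- x
  r  <- 2:(nr + 1L)                          # rows of the original cells inside p
  cc <- 2:(nc + 1L)                          # columns of the original cells inside p
  p[r, cc, drop = FALSE] *
    p[r - 1L, cc, drop = FALSE] *            # neighbour above
    p[r + 1L, cc, drop = FALSE] *            # neighbour below
    p[r, cc - 1L, drop = FALSE] *            # neighbour to the left
    p[r, cc + 1L, drop = FALSE]              # neighbour to the right
}

mat <- matrix(c(1, 1, -1, 1, -1, 1, -1, 1, -1, 1, 1, -1), ncol = 3)
neighbour_prod(mat)
```

On the 4 x 3 example this gives the same matrix as the loop version above.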

This should do the trick:
ind <- which(x==x, arr.ind=TRUE) # index matrix
# find distances (need distances of 1 or 0)
dist.mat <- as.matrix(dist(ind))
inds2mult <- apply(dist.mat, 1, function(ii) which(ii <= 1))
# get product of each list element in inds2mult
# and reform into appropriate matrix
matrix(
sapply(inds2mult, function(ii) prod(unlist(x)[ii])),
ncol=ncol(x))
# [,1] [,2] [,3]
#[1,] -1 1 1
#[2,] -1 1 -1
#[3,] 1 1 1
#[4,] -1 1 -1
To get around memory issues with large matrices in the call to dist, you can try the fields.rdist.near function (with a delta value of 1) from the fields package:
x <- matrix(rep(-1, 300*300), ncol=300)
ind <- which(x==x, arr.ind=TRUE) # index matrix
library(fields)
ind.list <- fields.rdist.near(ind, delta=1) # took my computer ~ 15 - 20 seconds
inds2mult <- tapply(ind.list$ind[,2], ind.list$ind[,1], list)
matrix(
sapply(inds2mult, function(ii) prod(unlist(x)[ii])),
ncol=ncol(x))
The delta argument from the fields.rdist.near help page:
Threshhold distance. All pairs of points that separated by more
than delta in distance are ignored.

How to generate matrices where A) each row has a single value of one; B) rows sum to one

This is a two-part problem. The first part is to create an N x N square matrix in which exactly one randomly chosen element in each row is 1 and all other elements are zero (i.e. the sum of the elements in each row is 1).
The second part is to create an N x N square matrix in which the elements of each row sum to 1, but each element follows a distribution, e.g. a normal distribution.
Related questions include (Create a matrix with conditional sum in each row -R)
Matlab seems to do what I want automatically (Why this thing happens with random matrix such that all rows sum up to 1?), but I am looking for a solution in r.
Here is what I tried:
# PART 1
N <- 50
x <- matrix(0,N,N)
lapply(1:N, function(y){
x[y,sample(N,1)]<- 1
})
(I get zeroes still)
# PART 2
N <- 50
x <- matrix(0,N,N)
lapply(1:N, function(y){
x[y,]<- rnorm(N)
})
(It needs scaling)

Here's another loop-free solution that uses two-column matrix indexing with the "[<-" function. It builds a two-column index matrix whose first column is simply an ascending sequence giving the row positions, and whose second column (the one that picks the column positions) holds random integers. (It's a vectorized version of Matthew's "easiest method", and I suspect it would be faster since there is only one call to sample.):
M <- matrix(0,N,N)
M[ cbind(1:N, sample(1:N, N, rep=TRUE))] <- 1
> rowSums(M)
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
If you didn't specify rep=TRUE, then colSums(M) would have all been ones as well, but that was not what you requested. It does mean the rank of your resultant matrix may be less than N. If you left out the rep=TRUE the matrix would be full rank.
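As a quick illustration of the sampling-without-replacement case, the sketch below builds such a matrix and checks the row sums, column sums and rank. The name P is arbitrary.

```r
set.seed(1)
N <- 6
P <- matrix(0, N, N)
# sample(N) is a random permutation of 1..N, so every column is picked exactly once
P[cbind(1:N, sample(N))] <- 1

rowSums(P)   # all ones
colSums(P)   # all ones as well
qr(P)$rank   # equals N, i.e. full rank
```

The result is a permutation matrix, which is why it is always full rank.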

Here you see why lapply doesn't always replace a loop. You're trying to iterate through the rows of x and modify the matrix, but what you're modifying is a copy of the x from the global environment.
The easiest fix is to use a for loop:
for (y in 1:N) {
x[y,sample(N,1)]<- 1
}
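The copy behaviour can be checked directly. In this small sketch (the names x_lapply and x_loop are only for illustration) the matrix modified inside lapply stays all zero, while the for loop changes it as intended.

```r
set.seed(1)
N <- 5

x_lapply <- matrix(0, N, N)
invisible(lapply(1:N, function(y) {
  x_lapply[y, sample(N, 1)] <- 1   # modifies a local copy that is discarded afterwards
}))
sum(x_lapply)      # still 0

x_loop <- matrix(0, N, N)
for (y in 1:N) {
  x_loop[y, sample(N, 1)] <- 1     # modifies x_loop in the current environment
}
rowSums(x_loop)    # one 1 per row
```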
The apply family should be used for its return value, rather than for functions with side effects.
One way to do this is to return the rows and then rbind them into a matrix. The second example (Part 2) is shown here, as it maps more naturally onto an apply-style call:
do.call(rbind, lapply((1:N), function(i) rnorm(N)))
However, this is more readable:
matrix(rnorm(N*N), N, N)
Now to scale this to have row sums equal to 1. You use the fact that a matrix is column-oriented and that vectors are recycled, meaning that you can divide a matrix M by rowSums(M). Using a more reasonable N=5:
m <- matrix(rnorm(N*N), N, N)
m/rowSums(m)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.1788692 0.5398464 0.24980924 -0.01282655 0.04430168
## [2,] 0.4176512 0.2564463 0.11553143 0.35432975 -0.14395871
## [3,] 0.3480568 0.7634421 -0.38433940 0.34175983 -0.06891932
## [4,] 1.1807180 -0.0192272 0.16500179 -0.31201400 -0.01447859
## [5,] 1.1601173 -0.1279919 -0.07447043 0.20865963 -0.16631458
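To convince yourself that the recycling really divides each row by its own sum, you can compare against an explicit sweep() over rows and check the row sums. This is just a verification sketch; the names scaled and scaled2 are arbitrary.

```r
set.seed(1)
N <- 5
m <- matrix(rnorm(N * N), N, N)

# Recycling runs down the columns, so element [i, j] is divided by rowSums(m)[i]
scaled <- m / rowSums(m)

# Same operation spelled out: divide along margin 1 (rows)
scaled2 <- sweep(m, 1, rowSums(m), "/")

all.equal(scaled, scaled2)
all.equal(rowSums(scaled), rep(1, N))
```

Note that with normally distributed entries the scaled values can be negative or larger than 1, as in the output above; only the row sums are constrained.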

No-loop solution :)
n <- 5
# on which column in each row insert 1s
s <- sample(n,n,TRUE)
# indexes for each row
w <- seq(1,n*n,by=n)-1
index <- s+w
# vector of 0s
vec <- integer(n*n)
# put 1s
vec[index] <- 1
# voila :)
matrix(vec,n,byrow = T)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 0 0 0
[2,] 0 0 0 1 0
[3,] 0 0 0 0 1
[4,] 1 0 0 0 0
[5,] 1 0 0 0 0