Create an incidence matrix with restrictions in R (igraph)

I would like to create an N x M incidence matrix for a bipartite graph (N = M = 200), subject to the following restrictions:
Each column i (i = 1, ..., 200) has a column sum of g = 10.
Each row has a row sum of h = 10.
No multi-edges: the entries of the incidence matrix can only be 0 or 1.
So far I have:
M <- 200  # number of rows
N <- 200  # number of columns
g <- 10
I <- matrix(sample(0:1, M*N, replace=TRUE, prob=c(1-g/N, g/N)), M, N)
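To see why this attempt does not meet the restrictions, here is a small check (the seed is an arbitrary choice). Because every cell is an independent draw with probability g/N, the row and column sums equal g only on average; individual sums scatter around 10.

```r
# The approach from the question: each cell is an independent 0/1 draw
# with P(1) = g/N, so each row/column sum is random with mean g.
set.seed(1)
M <- 200; N <- 200; g <- 10
I <- matrix(sample(0:1, M * N, replace = TRUE, prob = c(1 - g/N, g/N)), M, N)

range(rowSums(I))   # spread of the row sums
range(colSums(I))   # spread of the column sums
mean(rowSums(I))    # close to, but generally not exactly, g
```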
Does anybody have a solution?

Here's one way to do what you want. First the algorithm idea, then its implementation in R.
Two-step algorithm idea
You want a matrix of 0's and 1's in which every row and every column adds up to 10.
Step 1: First, create a trivial solution as follows:
The first 10 rows have 1's for the first 10 elements, then 190 zeros.
The second set of ten rows has 1's from the 11th to the 20th element, and so on.
In other words, a feasible solution is a 200x200 matrix of all 0's with twenty dense 10x10 blocks of 1's embedded along the diagonal.
Step 2: Shuffle entire rows and entire columns.
Permuting whole rows and whole columns preserves every row sum and every column sum.
Implementation in R
I use a smaller 16x16 matrix to demonstrate. In this case, let's say we want each row and each column to add up to 4. (The row/column sum has to divide the matrix dimension evenly.)
n <- 4 # size of each diagonal block (and the target row/column sum)
i <- c(1,1,1,1) # one column of a 4x4 block of 1's
z <- c(0,0,0,0) # one column of a 4x4 block of 0's
#create a feasible solution to start with:
m <- matrix(c(rep(c(i,z,z,z),n),
rep(c(z,i,z,z),n),
rep(c(z,z,i,z),n),
rep(c(z,z,z,i),n)), 16,16)
#shuffle (Run the two lines following as many times as you like)
m <- m[sample(16), ] #shuffle rows
m <- m[ ,sample(16)] #shuffle columns
#verify that the sum conditions are not violated
colSums(m); rowSums(m)
#solution
print(m)
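The demo above hard-codes the 16x16 case. Here is a sketch of the same two steps at the question's size (N = 200, g = 10), using kronecker() to build the block-diagonal start. The helper name make_regular_incidence is hypothetical, and the sketch assumes a square matrix whose dimension is a multiple of g.

```r
# Two-step construction: an N x N 0/1 matrix whose rows and columns all sum to g.
# Assumes N is a multiple of g (here N = 200, g = 10).
make_regular_incidence <- function(N, g) {
  stopifnot(N %% g == 0)
  # Step 1: N/g dense g x g blocks of 1's placed along the diagonal
  m <- kronecker(diag(N / g), matrix(1, g, g))
  # Step 2: permute whole rows and whole columns; every row/column sum is unchanged
  m[sample(N), sample(N)]
}

set.seed(2024)
I <- make_regular_incidence(200, 10)
table(rowSums(I))
table(colSums(I))
```

Shuffling once is enough here: applying further random permutations does not produce anything a single random permutation could not.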
Hope that helps you move forward with your bipartite igraph.
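One caveat: viewed as a bipartite graph, permuting rows and columns only relabels vertices, so every result is still 20 disjoint complete 10x10 blocks. If a more varied graph is wanted, a swapped-in technique not used in the answer is repeated edge swaps (checkerboard swaps): take two 1-entries (r1, c1) and (r2, c2) and, if (r1, c2) and (r2, c1) are both 0, move the 1's there. Each swap keeps every row and column sum and never creates a multi-edge. This is a sketch: the helper name checkerboard_swaps and the number of swaps are arbitrary assumptions, and no claim is made about how many swaps are needed for the result to be well mixed.

```r
# Randomize a 0/1 matrix while keeping all row and column sums fixed.
checkerboard_swaps <- function(m, n_swaps) {
  edges <- which(m == 1, arr.ind = TRUE)   # one row per 1-entry: (row, col)
  for (s in seq_len(n_swaps)) {
    k <- sample(nrow(edges), 2)            # two distinct 1-entries
    r1 <- edges[k[1], 1]; c1 <- edges[k[1], 2]
    r2 <- edges[k[2], 1]; c2 <- edges[k[2], 2]
    # Swap only if the target cells are empty (no duplicate edges)
    if (r1 != r2 && c1 != c2 && m[r1, c2] == 0 && m[r2, c1] == 0) {
      m[r1, c1] <- 0; m[r2, c2] <- 0
      m[r1, c2] <- 1; m[r2, c1] <- 1
      edges[k[1], 2] <- c2                 # entry k[1] is now (r1, c2)
      edges[k[2], 2] <- c1                 # entry k[2] is now (r2, c1)
    }
  }
  m
}

set.seed(7)
start <- kronecker(diag(20), matrix(1, 10, 10))   # Step 1 of the answer
I <- checkerboard_swaps(start, n_swaps = 20000)
```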

Related

R: looking for a more efficient way of running rbinom() taking probabilities from a large matrix

I have a 20000x90 matrix M of probabilities ranging between 0 and 1. I want to use these probabilities to return a 20000x90 matrix of 1's and 0's (i.e. for each probability p in the matrix I want to run rbinom(1,1,p)).
This is my current solution which works:
apply(M,1:2,function(x) rbinom(1,1,x))
However, it is quite slow and I am wondering if there is a faster way of achieving the same thing.
rbinom is vectorised, so you can pass it the whole vector (or matrix) of probabilities; you just need to change n, the number of observations, from 1 to the number of values in M.
An example:
# with loop
set.seed(74309281)
M = matrix(runif(20000*90), nrow=20000, ncol=90)
a1 = apply(M,1:2,function(x) rbinom(1,1,x))
# without loop
set.seed(74309281)
M = matrix(runif(20000*90), nrow=20000, ncol=90)
a2 = matrix(rbinom(length(M),1,M), nrow=nrow(M), ncol=ncol(M))
all.equal(a1, a2)
# [1] TRUE
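A small variation on the vectorised version, shown on a smaller matrix so it runs quickly: assigning into a copy of M with `[]<-` keeps M's dimensions (and dimnames, if any) without restating nrow and ncol. The draws are identical because both versions make one rbinom() call with the same seed.

```r
set.seed(74309281)
M <- matrix(runif(2000 * 9), nrow = 2000, ncol = 9)

# Vectorised draw, reshaped with matrix() as in the answer
set.seed(1)
a2 <- matrix(rbinom(length(M), 1, M), nrow = nrow(M), ncol = ncol(M))

# Same draws written into a copy of M; `[]<-` keeps M's dim attribute
set.seed(1)
a3 <- M
a3[] <- rbinom(length(M), 1, M)

all(a2 == a3)
```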

nested loop matrix index in R

I am learning matrix indexing in R, and the following is what I want to achieve. I am doing this purely to improve my R skills.
Here is the kind of matrix I am working with:
m <- matrix(1, 100, 10)
I have a matrix with 100 rows and 10 columns in which every element is 1. For column 1, I want to replace the 1's in rows 1 to 10 with 0. For column 2, I want to replace the 1's in rows 11 to 20 with 0, for column 3 rows 21 to 30, and so on up to column 10. Here is my attempt:
m <- matrix(1, 100, 10)
for(j in 1:10){
for(i in (j-1)*10+1: j*10){
m[i,j] <-0
}
}
I was quite confident that my logic was correct, but every time I run my code I get the error "subscript out of bounds". I have tried for a couple of days and could not resolve this problem. I would appreciate any hints or direct solutions. Many thanks in advance.
Your version fails because of operator precedence: `:` binds more tightly than `*` and `+`, so (j-1)*10+1: j*10 is parsed as (j-1)*10 + (1:j)*10, which for j = 10 gives the row indices 100, 110, ..., 190, past the 100 rows of m. You could also use just one loop variable, which I think is easier. For each column j, compute the lower and upper row indices and assign 0 to that range.
for(j in 1:10){
row_lower <- (j-1)*10+1
row_upper <- j*10
m[row_lower: row_upper, j] <- 0
}
This sets your specified range in each column to 0.
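The same assignment can be done without a loop by indexing with a two-column (row, column) matrix: rows 1-10 pair with column 1, rows 11-20 with column 2, and so on. The sketch below compares it with the loop from the answer.

```r
m <- matrix(1, 100, 10)

# Loop version from the answer, for comparison
m_loop <- m
for (j in 1:10) {
  m_loop[((j - 1) * 10 + 1):(j * 10), j] <- 0
}

# Loop-free version: each row of the index matrix is one (row, column) cell
m_idx <- m
m_idx[cbind(1:100, rep(1:10, each = 10))] <- 0

identical(m_loop, m_idx)
```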

Convert a one column matrix to n x c matrix

I have an (n*c + n + c)-by-1 matrix. I want to drop the last n + c rows and convert the rest into an n-by-c matrix. Below is what I've tried, but it returns a matrix in which each row repeats the same element. I'm not sure why. Could someone help me out, please?
tmp=x[1:n*c,]
Membership <- matrix(tmp, nrow=n, ncol=c)
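If x is a plain vector of length n*c + n + c, you don't need the comma when you extract.
You should do tmp=x[1:(n*c)] (for a one-column matrix, x[1:(n*c), ] also works).
Note the importance of the parentheses: `:` has higher precedence than `*`, so tmp=x[1:n*c] takes the range 1 to n, multiplies it by c to get the indices c, 2c, ..., nc, and extracts only those n elements. matrix() then recycles these n values across all c columns, which is why every row repeats a single value.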
For example, you want to avoid:
(1:100)[1:5*5]
[1] 5 10 15 20 25
You can also avoid the index arithmetic altogether:
matrix(head(x, n*c), ncol=c)
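A small worked example of the precedence problem and both fixes, with n = 2 and c = 3. Here k stands in for the question's c (renamed so it does not mask base::c), and x is a made-up 11 x 1 matrix holding 1, 2, ..., 11.

```r
n <- 2; k <- 3                                   # k plays the role of c
x <- matrix(seq_len(n * k + n + k), ncol = 1)    # an (n*k + n + k) x 1 matrix

# Without parentheses: 1:n*k is (1:n)*k = c(3, 6), so only n values are taken,
# and matrix() recycles them across all k columns (each row repeats one value)
bad <- matrix(x[1:n*k, ], nrow = n, ncol = k)

# With parentheses: the first n*k values, filled column by column
good <- matrix(x[1:(n*k), ], nrow = n, ncol = k)
good_head <- matrix(head(x, n*k), ncol = k)

bad
good
```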

Convert equal interval of vector to rows of matrix

I've imported a table that contains the travel times for an n x n origin-destination (OD) cost matrix. Travel times are zero when the origin and destination are the same.
For example, an OD cost matrix of 25 origins and 25 destinations (625 elements) would have zero values running down the diagonal. In a vector, the value 0 occurs at the 0th element, 26th element, 51st element, etc.
I've read the travel times in as a vector, and I'd like to reshape it into a matrix in which every element on the diagonal is zero. Does anyone have any ideas on how to do this?
Code:
### READ and PREPARE DATA ###
# Read the OD cost matrix (data.table::fread would be faster for large files)
od_table <- read.table('DMatrix.txt', sep=',', header=TRUE, na.strings="NA", stringsAsFactors=FALSE)
v <- od_table$Total_TravelTime   # numeric vector of travel times
n <- sqrt(length(v))             # 25 for a 25 x 25 OD matrix
D <- matrix(v, nrow=n)
The resulting matrix has zero values along the first row only.
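Zeros landing in a single row suggest the rows of the file are not in the column-major order that matrix() assumes. One order-independent approach, sketched here under the assumption that the table also has origin and destination ID columns (the names OriginID and DestinationID are hypothetical, as is the synthetic data), is to place each travel time by its (origin, destination) pair.

```r
# Synthetic stand-in for the imported table: one row per OD pair, shuffled
n <- 25
set.seed(42)
true_D <- matrix(round(runif(n * n, 1, 60), 1), n, n)
diag(true_D) <- 0
od_table <- expand.grid(OriginID = 1:n, DestinationID = 1:n)
od_table$Total_TravelTime <- true_D[cbind(od_table$OriginID, od_table$DestinationID)]
od_table <- od_table[sample(nrow(od_table)), ]

# Fill the matrix by (row = origin, column = destination), ignoring row order
D <- matrix(NA_real_, n, n)
D[cbind(od_table$OriginID, od_table$DestinationID)] <- od_table$Total_TravelTime

all(diag(D) == 0)
```

If the IDs are not the integers 1 to n, they can first be mapped to positions, e.g. with match(od_table$OriginID, sort(unique(od_table$OriginID))).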

how to select a matrix column based on column name

I have a table with shortest paths obtained with:
g<-barabasi.game(200)
geodesic.distr <- table(shortest.paths(g))
geodesic.distr
# 0 1 2 3 4 5 6 7
# 117 298 3002 2478 3342 3624 800 28
I then build a matrix with 100 rows and the same number of columns as length(geodesic.distr):
geo<-matrix(0, nrow=100, ncol=length(unlist(labels(geodesic.distr))))
colnames(geo) <- unlist(labels(geodesic.distr))
Now I run 100 experiments in which I create preferential-attachment networks:
for(i in seq(1:100)){
bar <- barabasi.game(vcount(g))
geodesic.distr <- table(shortest.paths(bar))
distance <- unlist(labels(geodesic.distr))
for(ii in distance){
geo[i,ii]<-WHAT HERE?
}
}
and for each experiment I'd like to store in the matrix how many paths of each length I found.
My question is: how do I select the right column based on the column name? In my case, some names produced by the simulated network may not be present in the original one, so I need to find not only the right column by its name but also the closest one. (Suppose my max value is 7 and I end up with a path of length 9, which is not present in the geo matrix; then I want to add it to the column named 7.)
There is actually a problem with your approach. The length of the geodesic.distr table is stochastic, and you are allocating a matrix to store 100 realizations based on a single run. What if one of the 100 runs gives you a longer geodesic.distr vector? I assume you want to make the allocated matrix bigger in this case. Or, even better, you want to run the 100 realizations first and allocate the matrix once you know its size.
Another potential problem is that with table(shortest.paths(bar)) you are (by default) considering undirected distances, so you will end up with a symmetric distance matrix and count every distance (except self-distances) twice. This may or may not be what you want.
Anyway, here is a simple way, with the matrix allocated after the 100 runs:
dists <- lapply(1:100, function(x) {
bar <- barabasi.game(vcount(g))
table(shortest.paths(bar))
})
maxlen <- max(sapply(dists, length))
geo <- t(sapply(dists, function(d) c(d, rep(0, maxlen-length(d)))))
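If you do want to keep the fixed columns from the original run, as the question asks, columns can be selected by name with geo[i, names], and lengths beyond the largest column can be folded into it by capping them with pmin(). A sketch with a made-up distance table (the counts are invented for illustration; lengths 8 and 9 have no column):

```r
geo <- matrix(0, nrow = 3, ncol = 8, dimnames = list(NULL, as.character(0:7)))

# Hypothetical distance table from one simulated run
d <- table(c(0, 1, 1, 2, 7, 8, 9, 9))
lens <- as.integer(names(d))

# Cap each length at the largest existing column, then sum counts per column name
max_len <- max(as.integer(colnames(geo)))
capped <- as.character(pmin(lens, max_len))
counts <- tapply(as.vector(d), capped, sum)

i <- 1
geo[i, names(counts)] <- geo[i, names(counts)] + counts
geo[i, ]
```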
