Build a matrix depending on two random walks in a graph - r

I am working on a project and have been stuck at this point for a week. I have tried many ideas, but every attempt to code my algorithm has failed.
Suppose we have the following simple graph:
the edges in order are: 1--3, 1--4, 3--2
For each edge, a random walk is defined on each vertex to move to one of its neighbors, like this:
For the first edge, v1=1 ,v2=3, n1=3,4 and n2=1,2 in order, so the possible moves from v1 and v2 are:
1 to 3,3 to 1
1 to 4,3 to 1
1 to 3,3 to 2
1 to 4,3 to 2
For the second edge, v1=1 ,v2=4, n1=3,4 and n2=1 in order,so the possible moves from v1 and v2 are:
1 to 3,4 to 1
1 to 4,4 to 1
For the third edge, v1=3 ,v2=2, n1=1,2 and n2=3 in order,so the possible moves from v1 and v2 are:
3 to 1,2 to 3
3 to 2,2 to 3
For the whole graph there are just 8 possible moves so I have 8 variables to construct the constraints matrix
Let us denote the moves by x's (according to their order of occurrences); i.e
(1 to 3,3 to 1) to be represented by x_1
(1 to 4,3 to 1) to be represented by x_2
:
(3 to 1,2 to 3) to be represented by x_7
(3 to 2,2 to 3) to be represented by x_8
I want to build the required constraints matrix depending on these moves. The number of constraints equals \sum_i (number of neighbors of v1(i) + number of neighbors of v2(i)), which is 10 in our graph.
My algorithm to build this matrix is:
Step1: 1) select 1st edge, fix v1, v2, n2
2) change n1 and fill the 1st row of the matrix by 1's in the place of the resulted moves and 0 if there is no similar move on the graph until you finish all elements in n1.
Step2: move to the 2nd row of the matrix and select the 2nd element of n2 and
1) loop over n1
2) fill the 2nd row by 1's in the place of the resulted moves until you finish all elements in n1.
Step3: since you selected all elements in n1 and n2 for the vertices in the first edge move to a new row in the matrix
Step4: Select next edges and do the same work done before until you finish all edges.
Step5: select the 1st edge again and do the same work but while fixing v1,v2 &n1, loop over n2
The resulted matrix according to this algorithm will be:
1 1 0 0 0 0 0 0
0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0
0 0 0 0 0 0 1 1
1 0 1 0 0 0 0 0
0 1 0 1 0 0 0 0
0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 1
What I failed to do is this: how do I let the matrix know that there is a move, so that it stores 1 in its position, and stores 0 in its position if there is no move?
My code is:
library(igraph)
graph<-matrix(c(1,3,1,4,3,2),ncol=2,byrow=TRUE)
g<-graph.data.frame(d = graph, directed = FALSE)
countercol<-0
for (edge in 1:length(E(g))){
v1<-ends(graph = g, es = edge)[1]
v2<-ends(graph = g, es = edge)[2]
n1<-neighbors(g,v1,mode=c("all"))
n2<-neighbors(g,v2,mode=c("all"))
countercol=countercol+(length(n1)*length(n2))
}
counterrow<-0
for (edge in 1:length(E(g))){
v1<-ends(graph = g, es = edge)[1]
v2<-ends(graph = g, es = edge)[2]
n1<-neighbors(g,v1,mode=c("all"))
n2<-neighbors(g,v2,mode=c("all"))
counterrow=counterrow+(length(n1)+length(n2))
}
for (edge in 1:length(E(g))){
v1<-ends(graph = g, es = edge)[1]
v2<-ends(graph = g, es = edge)[2]
n1<-neighbors(g,v1,mode=c("all"))
n2<-neighbors(g,v2,mode=c("all"))
...
...
...
}
I am not looking for someone to write the code, what I want is the idea to let the program differentiate between the possible moves and store 1's and 0's in the suitable position for the resulted move.
Many Many thanks for any kind of help

Here's a solution consisting of two parts
edgeMoves <- function(e) {
umoves <- sapply(ends(graph = g, es = e), neighbors, graph = g, mode = "all", simplify = FALSE)
do.call(paste, c(expand.grid(mapply(function(x, y)
paste(x, names(y), sep =" to "), ends(graph = g, es = e), umoves, SIMPLIFY = FALSE)), sep = ", "))
}
edgeConstraints <- function(e) {
v <- ends(graph = g, es = e)
n1 <- names(neighbors(g, v[1], mode = "all"))
n2 <- names(neighbors(g, v[2], mode = "all"))
t(cbind(sapply(n2, function(nn2) moves %in% paste0(v[1], " to ", n1, ", ", v[2], " to ", nn2)),
sapply(n1, function(nn1) moves %in% paste0(v[1], " to ", nn1, ", ", v[2], " to ", n2))))
}
moves <- do.call(c, sapply(E(g), edgeMoves))
moves
# [1] "1 to 3, 3 to 1" "1 to 4, 3 to 1" "1 to 3, 3 to 2"
# [4] "1 to 4, 3 to 2" "1 to 3, 4 to 1" "1 to 4, 4 to 1"
# [7] "3 to 1, 2 to 3" "3 to 2, 2 to 3"
do.call(rbind, sapply(E(g), edgeConstraints)) * 1
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# 1 1 1 0 0 0 0 0 0
# 2 0 0 1 1 0 0 0 0
# 3 1 0 1 0 0 0 0 0
# 4 0 1 0 1 0 0 0 0
# 1 0 0 0 0 1 1 0 0
# 3 0 0 0 0 1 0 0 0
# 4 0 0 0 0 0 1 0 0
# 3 0 0 0 0 0 0 1 1
# 1 0 0 0 0 0 0 1 0
# 2 0 0 0 0 0 0 0 1
The row order is different, but I suspect that it is not a problem. Also, for a single edge you may use edgeMoves(e) and edgeConstraints(e) * 1.
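For comparison, here is a minimal base-R sketch of the same idea without string matching. It assumes that moves belonging to different edges are always separate variables, as in the example, so the constraint matrix is block diagonal with one block per edge. The helpers nbrs and edge_block are made-up names for illustration, not igraph functions.

```r
# Edge list of the example graph: 1--3, 1--4, 3--2
edges <- matrix(c(1, 3, 1, 4, 3, 2), ncol = 2, byrow = TRUE)

# Sorted neighbours of vertex v (hypothetical helper, base R only)
nbrs <- function(v) sort(c(edges[edges[, 1] == v, 2], edges[edges[, 2] == v, 1]))

# Constraint rows for one edge; columns are that edge's moves, n1 varying fastest
edge_block <- function(v1, v2) {
  n1 <- nbrs(v1)
  n2 <- nbrs(v2)
  mv <- expand.grid(to1 = n1, to2 = n2)
  rbind(1L * outer(n2, mv$to2, "=="),   # fix one element of n2, loop over n1
        1L * outer(n1, mv$to1, "=="))   # fix one element of n1, loop over n2
}

blocks <- lapply(seq_len(nrow(edges)),
                 function(i) edge_block(edges[i, 1], edges[i, 2]))

# Place the blocks along the diagonal of a zero matrix
A <- matrix(0L, sum(sapply(blocks, nrow)), sum(sapply(blocks, ncol)))
r <- 0
k <- 0
for (b in blocks) {
  A[r + seq_len(nrow(b)), k + seq_len(ncol(b))] <- b
  r <- r + nrow(b)
  k <- k + ncol(b)
}
dim(A)
```

The rows come out grouped per edge, in the same order as the output above. Each entry is 1 exactly when the move in that column uses the neighbour that the row keeps fixed.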

Related

Replace specific numbers that follow each other in a matrix

I am working with survey data and I would like to replace specific values that follow each other in a data frame.
For example
v1 v2 v3 v4 v5
0 2 0 0 55
0 0 3 0 1
3 0 1 1 2
0 2 0 2 0
If I replace (0,2,0) with 1's and the rest of the data frame with 0's, the new matrix will look like
v1 v2 v3 v4 v5
1 1 1 0 0
0 0 0 0 0
0 0 0 0 0
1 1 1 1 1
How can I do this for a specific sequence of any length n, e.g. (1,3) or (1,2,4,5,8,2)?
As others have pointed out, you need to clarify your question a bit to make sure that we are answering it correctly.
My assumptions are that, you are matching a pattern (c(0,2,0)) in the example you show, and you are only matching the pattern across rows. That is, it cannot wrap from row 2, column 5 to row 3, column 1, nor will it check matches in the columns only.
If those assumptions are correct, then the following function will work. It replicates the example you provided, and returns a matrix. You can modify the replacement value (rep_with) and the fill value (fill_val), 1 and 0 in your example, respectively, through the optional parameters. This function could also be improved to be more elegant, but I think it works.
Code
replace_pattern <- function(x, pattern, rep_with = 1, fill_val = 0)
{
n <- length(pattern)
if (n > ncol(x))
stop("pattern is longer than number of columns")
new_x <- matrix(fill_val, nrow = nrow(x), ncol = ncol(x))
# loop over each row
for (rr in seq_len(nrow(x))) {
# start matching the pattern at the entry = length of pattern
# and look backwards
for (cc in n:ncol(x)) {
cur_cols <- (cc - n + 1):cc
cur_vals <- x[rr, cur_cols]
# if it matches the pattern, replace the values with specified value
if (isTRUE(all.equal(cur_vals, pattern, check.attributes = FALSE))) {
new_x[rr, cur_cols] <- rep_with
}
}
}
new_x
}
Testing
Using your example, and setting it to xx
xx:
v1 v2 v3 v4 v5
0 2 0 0 55
0 0 3 0 1
3 0 1 1 2
0 2 0 2 0
And then calling replace_pattern(xx, c(0, 2, 0)) returns the second matrix you provided.
x2 <- xx
x2[2, 2] <- 1
replace_pattern(x2, c(1, 3))
Returns:
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 0 1 1 0 0
[3,] 0 0 0 0 0
[4,] 0 0 0 0 0
And
xx3 <- rbind(xx, c(1, 2, 4, 5, 8))
replace_pattern(xx3, c(1, 2, 4, 5, 8))
Returns:
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 0 0 0 0 0
[3,] 0 0 0 0 0
[4,] 0 0 0 0 0
[5,] 1 1 1 1 1
Finally, replace_pattern(xx, c(1,2,4,5,8,2)) will fail because the pattern is longer than the number of columns.
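To make the test reproducible, here is a small self-contained sketch. It builds xx from the table in the question and uses a compact variant of the same sliding-window idea, testing each window position across all rows at once. The name mark_pattern is made up for this sketch, and it compares with == rather than all.equal, so it assumes exactly representable values such as the integers in the example.

```r
# The example data from the question
xx <- matrix(c(0, 2, 0, 0, 55,
               0, 0, 3, 0, 1,
               3, 0, 1, 1, 2,
               0, 2, 0, 2, 0),
             nrow = 4, byrow = TRUE,
             dimnames = list(NULL, paste0("v", 1:5)))

# Mark every row-wise occurrence of `pattern` with 1, everything else with 0
mark_pattern <- function(x, pattern) {
  n <- length(pattern)
  stopifnot(n <= ncol(x))
  out <- matrix(0, nrow(x), ncol(x), dimnames = dimnames(x))
  for (start in seq_len(ncol(x) - n + 1)) {
    cols <- start:(start + n - 1)
    # TRUE for each row whose current window equals the pattern
    hit <- apply(x[, cols, drop = FALSE], 1, function(r) all(r == pattern))
    out[hit, cols] <- 1
  }
  out
}

mark_pattern(xx, c(0, 2, 0))
```

Row 4 ends up all 1's because the pattern matches twice there, in columns 1 to 3 and again in columns 3 to 5, and the two matches overlap.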

Biclustering in R

I want to apply biclustering to a binary matrix in R. There is a nice package called "biclust" available, but it does not do or display everything that I want.
I have a binary matrix which looks like the following:
1 0 0 1 0 1 0
0 0 0 0 0 0 0
0 0 1 0 1 0 0
1 0 0 1 0 1 0
0 0 1 0 1 0 0
1 0 0 1 0 1 0
0 0 0 0 0 0 0
And my goal is to bicluster (and display) this as following (may be colored):
1 1 1 0 0 0 0
1 1 1 0 0 0 0
1 1 1 0 0 0 0
0 0 0 1 1 0 0
0 0 0 1 1 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
Set up code:
# install.packages("biclust") (if necessary)
library("biclust")
testMatrix <- matrix(c(1,0,0,1,0,1,0,
0,0,0,0,0,0,0,
0,0,1,0,1,0,0,
1,0,0,1,0,1,0,
0,0,1,0,1,0,0,
1,0,0,1,0,1,0,
0,0,0,0,0,0,0),
nrow = 7,
ncol = 7,
byrow = TRUE)
I applied the biclust function of the "biclust" R package:
testCluster <- biclust(x = testMatrix, method=BCBimax())
and indeed I get the two clusters expected:
An object of class Biclust
call:
biclust(x = testMatrix, method = BCBimax())
Number of Clusters found: 2
First 2 Cluster sizes:
BC 1 BC 2
Number of Rows: 3 2
Number of Columns: 3 2
I can both display the clusters separately by:
drawHeatmap(x = testMatrix, bicResult = testCluster, number = 1) # shown in picture below
drawHeatmap(x = testMatrix, bicResult = testCluster, number = 2)
and I can display the entire clustered matrix (one cluster at upper left corner) by:
drawHeatmap2(x = testMatrix, bicResult = testCluster, number = 1) # shown in picture below
drawHeatmap2(x = testMatrix, bicResult = testCluster, number = 2)
So far so good, but I want:
Colors of display switched. Now the 1 is red and the 0 is green.
I want to see the rows and columns of the original matrix. Currently only the row and column numbers of the specific cluster are shown (with drawHeatmap), and no row and column numbers are shown for the entire clustered matrix (drawHeatmap2).
I want a nicely ordered clustered matrix. Now only the cluster specified in drawHeatmap2 is shown in the upper left corner, but for the rest of the matrix I also want the other clusters nicely ordered from the upper left corner to the lower right corner for the rest of the matrix.
Are these changes possible (with the "biclust" package)? Or is it better to do it in another way with R?
Change the drawHeatmap() function in the biclust package source:
trace("drawHeatmap", edit = TRUE)
Change the following:
(a) Switch red and green - switch the rvect and gvect in call rgb()
(b) Original rownames instead of new - change the 'labels=' arguments to 'labels=bicCols' and 'labels=bicRows'.
Print rownumbers: before axis about rows: cat(bicRows).
Save rownumbers to file - before axis about rows: write(bicRows, file="FILENAME.txt")
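For the third wish (all clusters ordered along the diagonal), one option is to reorder the matrix yourself in base R. The sketch below is only an illustration: the two membership matrices are typed in by hand to match the two clusters reported above, and the assumption is that in practice they would be taken from the Biclust result object (its RowxNumber slot and the transpose of its NumberxCol slot). The r1..r7 and c1..c7 labels and the helper first_cluster are made up for this sketch.

```r
testMatrix <- matrix(c(1,0,0,1,0,1,0,
                       0,0,0,0,0,0,0,
                       0,0,1,0,1,0,0,
                       1,0,0,1,0,1,0,
                       0,0,1,0,1,0,0,
                       1,0,0,1,0,1,0,
                       0,0,0,0,0,0,0),
                     nrow = 7, byrow = TRUE,
                     dimnames = list(paste0("r", 1:7), paste0("c", 1:7)))

# Assumed memberships (rows/columns x clusters), typed in for illustration
row_member <- cbind(BC1 = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE),
                    BC2 = c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE))
col_member <- row_member  # the example happens to be symmetric

# Index of the first cluster each row/column belongs to; Inf if in none
first_cluster <- function(m) {
  apply(m, 1, function(z) if (any(z)) which(z)[1] else Inf)
}

# order() keeps ties in their original order, so unclustered items go last
row_ord <- order(first_cluster(row_member))
col_ord <- order(first_cluster(col_member))

ordered <- testMatrix[row_ord, col_ord]
ordered
```

Because the dimnames travel with the rows and columns, the printed matrix still shows which original rows and columns ended up in each block.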

Basic R, how to populate a vector with results from a function

So I have a list of coordinates that I perform a chull on.
X <- matrix(stats::rnorm(100), ncol = 2)
hpts <- chull(X)
chull would return something like "[1] 1 3 44 16 43 9 31 41". I want to then multiply X by another vector to return only the values of X that are in the result set of chull. So for example [-2.1582511,-2.1761699,-0.5796294]*[1,0,1,...] = [-2.1582511,0,-0.5796294...] would be the result. I just don't know how to populate the second vector correctly.
Y <- matrix(0, ncol = 1,nrow=50) #create a vector with nothing
# how do I fill vector y with a 1 or 0 based on the results from chull what do I do next?
X[,1] * Y
X[,2] * Y
Thanks,
To return only the values of X that are in the result set of hpts, use
> X[hpts]
## [1] 2.1186262 0.5038656 -0.4360200 -0.8511972 -2.6542077 -0.3451074 1.0771153
## [8] 2.2306497
I read it like "X such that hpts", or "the values of hpts that are in X"
Of course, these values of X are different from yours, due to my values of rnorm
To get a vector of 1s and 0s signifying results use
> Y <- ifelse(X[,1] %in% X[hpts], 1, 0)
> Y
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0
## [44] 0 1 0 0 1 0 1
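Since chull returns row indices, the 0/1 vector can also be filled by position instead of by matching values. A minimal sketch (the seed and the name X_hull are only there for illustration):

```r
set.seed(1)  # arbitrary seed so the run is repeatable
X <- matrix(stats::rnorm(100), ncol = 2)
hpts <- grDevices::chull(X)

# Start with all zeros, then switch on the rows that lie on the hull
Y <- integer(nrow(X))
Y[hpts] <- 1L

# Y is recycled down each column, so every non-hull row becomes 0
X_hull <- X * Y

# Both coordinates of the hull points, in hull order
X[hpts, ]
```

Indexing by position avoids relying on the values in the first column being unique, and X[hpts, ] returns both columns rather than only the first.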

splitting dataframe with collated points in to individuals in R

I have a dataframe (.txt) which looks like this [where "dayX" = the day of death in a survival assay in fruit flies, the numbers beneath are the number of flies that died in that treatment combination on that day, X or A are treatments, m & f are also treatments, the first number is the line, the second number is the block]
line day1 day2 day3 day4 day5
1 Xm1.1 0 0 0 2 0
2 Xm1.2 0 0 1 0 0
3 Xm2.1 1 1 0 0 0
4 Xm2.2 0 0 0 3 1
5 Xf1.1 0 3 0 0 1
6 Xf1.2 0 0 1 0 0
7 Xf2.1 2 0 2 0 0
8 Xf2.2 1 0 1 0 0
9 Am1.1 0 0 0 0 2
10 Am1.2 0 0 1 0 0
11 Am2.1 0 2 0 0 1
12 Am2.2 0 2 0 0 0
13 Af1.1 3 0 0 1 0
14 Af1.2 0 1 3 0 0
15 Af2.1 0 0 0 1 0
16 Af2.2 1 0 0 0 0
and want it to become this using R->
XA mf line block individual age
1 X m 1 1 1 4
2 X m 1 1 2 4
3 X m 1 2 1 3
and so on...
the resulting dataframe collects the "age" value from the day the individual died, as scored in the upper dataframe, for example there were two flies that died on the 4th day (day4) in treatment Xm1.1 therefore R creates two rows, one containing information extracted regarding the first individual and thus being labelled as individual "1", then another row with the same information except labelled as individual "2".. if a 3rd individual died in the same treatment on day 5, there would be a third row which is the same as the above two rows except the "age" would be "5" and individual would be "3". When it moves on to the next treatment row, in this case Xm1.2, the first individual to die within that treatment set would be labelled as individual "1" (which in this case dies on day 3). In my example there is a total of 38 deaths, therefore I am trying to get R to build a df which is 38*6 (excl. headers).
is there a way to take my dataframe [the real version is approx 50*640 with approx 50 individuals per unique combination of X/A, m/f, line (1:40), block (1-4) so ~32000 individual deaths] to an end dataframe of 6*~32000 in an automated way?
both of these example dataframes can be built using this code if it helps you to try out solutions:
test<-data.frame(1:16);colnames(test)=("line")
test$line=c("Xm1.1","Xm1.2","Xm2.1","Xm2.2","Xf1.1","Xf1.2","Xf2.1","Xf2.2","Am1.1","Am1.2","Am2.1","Am2.2","Af1.1","Af1.2","Af2.1","Af2.2")
test$day1=rep(0,16);test$day2=rep(0,16);test$day3=rep(0,16);test$day4=rep(0,16);test$day5=rep(0,16)
test$day4[1]=2;test$day3[2]=1;test$day2[3]=1;test$day4[4]=3;test$day5[5]=1;
test$day3[6]=1;test$day1[7]=2;test$day1[8]=1;test$day5[9]=3;test$day3[10]=1;
test$day2[11]=2;test$day2[12]=2;test$day4[13]=1;test$day3[14]=3;test$day4[15]=1;
test$day1[16]=1;test$day3[7]=2;test$day3[8]=1;test$day2[5]=3;test$day1[3]=1;
test$day5[11]=1;test$day5[9]=2;test$day5[4]=1;test$day1[13]=3;test$day2[14]=1;
test2=data.frame(rep(1:3),rep(1:3),rep(1:3),rep(1:3),rep(1:3),rep(1:3))
colnames(test2)=c("XA","mf","line","block","individual","age")
test2$XA[1]="X";test2$mf[1]="m";test2$line[1]=1;test2$block[1]=1;test2$individual[1]=1;test2$age[1]=4;
test2$XA[2]="X";test2$mf[2]="m";test2$line[2]=1;test2$block[2]=1;test2$individual[2]=2;test2$age[2]=4;
test2$XA[3]="X";test2$mf[3]="m";test2$line[3]=1;test2$block[3]=2;test2$individual[3]=1;test2$age[3]=3;
apologies for the awfully long way of making this dummy dataset, suffering from sleep deprivation and jetlag and haven't used R for months, if you run the code in R you will hopefully see better what I aim to do
-------------------------------------------------------------------------------------
By Rg255:
Currently stuck at this, derived from @Arun's answer (I have added the strsplit(as.character(dt$line), "") section to get around one error):
df=read.table("C:\\Users\\...\\data.txt",header=T)
require(data.table)
head(df[1:20])
dt <- as.data.table(df)
dt <- dt[, {dd <- unlist(.SD, use.names = FALSE);
list(individual = sequence(dd[dd>0]),
age = rep(which(dd>0), dd[dd>0])
)}, by=line]
out <- as.data.table(data.frame(do.call(rbind, strsplit(as.character(dt$line), ""))[, c(1:3,5)], stringsAsFactors=FALSE))
setnames(out, c("XA", "mf", "line", "block"))
out[, `:=`(line = as.numeric(line), block = as.numeric(block))]
out <- cbind(out, dt[, list(individual, age)])
Produces the following output:
> df=read.table("C:\\Users\\..\\data.txt",header=T)
> require(data.table)
> head(df[1:20])
line Day4 Day6 Day8 Day10 Day12 Day14 Day16 Day18 Day20 Day22 Day24 Day26 Day28 Day30 Day32 Day34 Day36 Day38 Day40
1 Xm1.1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 4 2
2 Xm2.1 0 0 0 0 0 0 0 0 0 2 0 0 0 1 2 1 0 2 0
3 Xm3.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1
4 Xm4.1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 2 3 8
5 Xm5.1 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 3 3 3 6
6 Xm6.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
> dt <- as.data.table(df)
> dt <- dt[, {dd <- unlist(.SD, use.names = FALSE);
+ list(individual = sequence(dd[dd>0]),
+ age = rep(which(dd>0), dd[dd>0])
+ )}, by=line]
> out <- as.data.table(data.frame(do.call(rbind, strsplit(as.character(dt$line), ""))[, c(1:3,5)], stringsAsFactors=FALSE))
Warning message:
In function (..., deparse.level = 1) :
number of columns of result is not a multiple of vector length (arg 1)
> setnames(out, c("XA", "mf", "line", "block"))
> out[, `:=`(line = as.numeric(line), block = as.numeric(block))]
Error in `[.data.table`(out, , `:=`(line = as.numeric(line), block = as.numeric(block))) :
LHS of := must be a single column name, when with=TRUE. When with=FALSE the LHS may be a vector of column names or positions.
In addition: Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion
> out <- cbind(out, dt[, list(individual, age)])
>
Here goes a data.table solution. The line column must have unique values.
require(data.table)
df <- read.table("data.txt", header=TRUE, stringsAsFactors=FALSE)
dt <- as.data.table(df)
dt <- dt[, {dd <- unlist(.SD, use.names = FALSE);
list(individual = sequence(dd[dd>0]),
age = rep(which(dd>0), dd[dd>0])
)}, by=line]
out <- as.data.table(data.frame(do.call(rbind,
strsplit(gsub("([[:alpha:]])([[:alpha:]])([0-9]+)\\.([0-9]+)$",
"\\1 \\2 \\3 \\4", dt$line), " ")), stringsAsFactors=FALSE))
setnames(out, c("XA", "mf", "line", "block"))
out[, `:=`(line = as.numeric(line), block = as.numeric(block))]
out <- cbind(out, dt[, list(individual, age)])
This works on your data.txt file.
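The same two steps (expand the counts into one row per individual, then split the label into its parts) can be sketched in base R on the first three rows of the example. This is an illustrative variant, not a replacement for the data.table code; the intermediate names counts, long and re are made up here.

```r
# First three rows of the example data
test <- data.frame(line = c("Xm1.1", "Xm1.2", "Xm2.1"),
                   day1 = c(0, 0, 1), day2 = c(0, 0, 1), day3 = c(0, 1, 0),
                   day4 = c(2, 0, 0), day5 = c(0, 0, 0),
                   stringsAsFactors = FALSE)

counts <- as.matrix(test[, -1])

# One row per dead fly: repeat each day index by the number of deaths that day
long <- do.call(rbind, lapply(seq_len(nrow(counts)), function(i) {
  dd <- counts[i, ]
  data.frame(id = rep(test$line[i], sum(dd)),
             individual = seq_len(sum(dd)),
             age = rep(seq_along(dd), dd),
             stringsAsFactors = FALSE)
}))

# Split labels such as "Xm1.2" into treatment, sex, line and block
re <- "^([[:alpha:]])([[:alpha:]])([0-9]+)\\.([0-9]+)$"
out <- data.frame(XA = sub(re, "\\1", long$id),
                  mf = sub(re, "\\2", long$id),
                  line = as.numeric(sub(re, "\\3", long$id)),
                  block = as.numeric(sub(re, "\\4", long$id)),
                  individual = long$individual,
                  age = long$age,
                  stringsAsFactors = FALSE)
out
```

Note that in both versions age is the position of the day column, not the number in its name. With columns named Day4, Day6, ... as in the real file, the day number would have to be parsed from the column names instead.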

Incidence Matrix for Finding Neighbor (both left and right) in Circular Blocks (column)

Consider a matrix with r rows and c columns containing integers between 0 and v-1; in the following example, r=4, c=2, and v=6.
L <- c(0,1,1,2,0,1,2,3)
(x <- matrix(L,nrow=4,ncol=2,byrow = TRUE))
## 0 1
## 1 2
## 0 1
## 2 3
The goal is to generate an incidence matrix with r*c rows and v columns, as follows:
each row corresponds to one element of the original matrix (in column-major order, i.e. in the example here the 4th row corresponds to x[4,1] and the 5th row corresponds to x[1,2])
find the "neighbors" above and below each element, wrapping around (cyclically) from the top to the bottom of the matrix; count the number of neighbor elements for each value of v.
For example, the first element in the matrix (x[1,1]) has neighbours 1 (below) and 2 ("above", i.e. wrapped around to the bottom of the column); thus we enter 1 in columns 2 and 3 of row 1, matching the corresponding elements of 0:(v-1). The rest of the row is set to zero:
rownames 0 1 2 3 4 5
[1] 0 1 1 0 0 0
The next element (x[2,1]) has 0 on both sides (above and below), so the first column (corresponding to 0) is set to 2, with the rest of the elements equal to zero.
[2] 2 0 0 0 0 0
The full matrix for the example above is:
rownames 0 1 2 3 4 5
[1] 0 1 1 0 0 0
[2] 2 0 0 0 0 0
[3] 0 1 1 0 0 0
[4] 2 0 0 0 0 0
[5] 0 0 1 1 0 0
[6] 0 2 0 0 0 0
[7] 0 0 1 1 0 0
[8] 0 2 0 0 0 0
The row sums are each 2.
L =c(0,1,1,2,0,1,2,3)
x=matrix(L,nrow=4,ncol=2,byrow = TRUE)
There might be a cleaner way to do this:
wrapind <- function(i,n)
ifelse((r <- i %% n) == 0, n, r)
n <- nrow(x)
v <- 6
incmat <- matrix(0,ncol=v,nrow=prod(dim(x)),
dimnames=list(NULL,0:(v-1)))
k <- 1
for (i in seq(ncol(x)))
for (j in seq(nrow(x))) {
cat(i,j,k,"\n") ## unnecessary
tt <- table(as.character(x[wrapind(c(j-1,j+1),n),i]))
incmat[k,names(tt)] <- tt
k <- k+1
}
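A possible loop-light variant, offered only as a sketch: shift the whole matrix cyclically by one row in each direction, then use matrix indexing to add 1 at (element, neighbour value + 1). It assumes the entries are integers in 0:(v-1); the names above, below, idx and pos are made up here.

```r
x <- matrix(c(0, 1, 1, 2, 0, 1, 2, 3), nrow = 4, ncol = 2, byrow = TRUE)
v <- 6
n <- nrow(x)

# Cyclic shifts: above[i, ] is the row before i, below[i, ] the row after i
above <- x[c(n, seq_len(n - 1)), , drop = FALSE]
below <- x[c(seq_len(n)[-1], 1), , drop = FALSE]

incmat <- matrix(0L, nrow = length(x), ncol = v,
                 dimnames = list(NULL, 0:(v - 1)))
idx <- seq_len(length(x))  # elements of x in column-major order

for (nb in list(above, below)) {
  # (row, column) pairs: element index and its neighbour's value (+1 for 1-based)
  pos <- cbind(idx, as.vector(nb) + 1)
  incmat[pos] <- incmat[pos] + 1L
}
incmat
```

The two shifts are added in separate passes on purpose: when both neighbours have the same value, the second pass increments the cell a second time, which gives the 2's in the expected output.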

Resources