I am trying to recreate the Biclique Communities method (Lehmann, Schwartz, & Hansen, 2008) in R, which relies on the definition of a Ka,b biclique. The example below shows two adjacent K2,2 bicliques: the first is {A,B,1,2} and the second is {B,C,2,3}. I would like to identify these bicliques in R so that I can apply the method to a broader dataset.
I have included my attempt so far in R and I am stuck with the following two issues:
If I use the standard walktrap.community, it recognises the communities but does not allow the set {B,2} to belong to both bicliques
If I use an updated clique.community function, it does not seem to identify the bicliques, or I am not understanding its output correctly (or both)
Example code:
library(igraph)
clique.community <- function(graph, k) {
  # Clique percolation: find all cliques of size k, connect cliques
  # that share k-1 vertices, and report the connected components.
  clq <- cliques(graph, min = k, max = k)
  edges <- c()
  for (i in seq_along(clq)) {
    for (j in seq_along(clq)) {
      # Two k-cliques are adjacent when their union has k+1 vertices,
      # i.e. they overlap in exactly k-1 vertices.
      if (length(unique(c(clq[[i]], clq[[j]]))) == k + 1) {
        edges <- c(edges, c(i, j))
      }
    }
  }
  # Build the clique adjacency graph and split it into components.
  clq.graph <- simplify(graph(edges))
  V(clq.graph)$name <- seq_len(vcount(clq.graph))
  comps <- decompose.graph(clq.graph)
  # Each community is the union of the vertices of its member cliques.
  lapply(comps, function(x) {
    unique(unlist(clq[V(x)$name]))
  })
}
users <- c('A', 'A', 'B', 'B', 'B', 'C', 'C')
resources <- c(1, 2, 1, 2, 3, 2, 3)
cluster <- data.frame(users, resources)
# Incidence table (users x resources) and the corresponding bipartite graph.
inc <- as.data.frame.matrix(table(cluster))
g <- graph.incidence(inc)
clique.community(g, 2)
walktrap.community(g)
I managed to find a script for this in the Sisob workbench:
computeBicliques <- function(graph, k, l) {
  # Enumerate K(k,l) bicliques in a bipartite graph: take every k-subset
  # of the first vertex mode and keep it if the vertices' common
  # neighbourhood contains at least l vertices of the second mode.
  vMode1 <- c()
  if (!is.null(V(graph)$type)) {
    vMode1 <- which(!V(graph)$type)
    # Only vertices of degree >= l can belong to a K(k,l) biclique.
    vMode1 <- intersect(vMode1, which(degree(graph) >= l))
  }
  nb <- get.adjlist(graph)
  bicliques <- list()
  if (length(vMode1) >= k) {
    comb <- combn(vMode1, k)
    i <- 1
    sapply(1:ncol(comb), function(c) {
      # Intersect the neighbourhoods of the k chosen vertices.
      commonNeighbours <- c()
      isFirst <- TRUE
      sapply(comb[, c], function(n) {
        if (isFirst) {
          isFirst <<- FALSE
          commonNeighbours <<- nb[[n]]
        } else {
          commonNeighbours <<- intersect(commonNeighbours, nb[[n]])
        }
      })
      if (length(commonNeighbours) >= l) {
        bicliques[[i]] <<- list(m1 = comb[, c], m2 = commonNeighbours)
        # Advance the index only when a biclique is stored, so the
        # result list contains no NULL gaps.
        i <<- i + 1
      }
    })
  }
  bicliques
}
Beware that this solution becomes inefficient very quickly, even for small (dense) graphs and modest values of k and l, because comb <- combn(vMode1, k) grows extremely large.
A more efficient solution can be found in the "biclique" package that is in development at https://github.com/YupingLu/biclique.
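For reference, here is a minimal sketch of how the Sisob function can be exercised on the toy graph g built above (users form the first mode, so V(g)$type is already set by graph.incidence). It should return the two K2,2 bicliques described at the top of the question, with members reported as vertex indices:
# Look for K2,2 bicliques: pairs of users sharing at least two resources.
computeBicliques(g, k = 2, l = 2)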
I was wondering how to find the smallest circumcircle of an irregular polygon. I have worked with spatial polygons in R.
I want to reproduce some of the FRAGSTATS metrics in vector mode, because I had a hard time with the 'landscapemetrics' package for a huge amount of data. Specifically, I would like to implement the related circumscribing circle metric (http://www.umass.edu/landeco/research/fragstats/documents/Metrics/Shape%20Metrics/Metrics/P11%20-%20CIRCLE.htm). So far, I could not find a formula or script for the smallest circumcircle.
All your comments are more than welcome.
Thank you.
As I mentioned in a comment, I don't know of existing R code for this, but a brute-force search should be fast enough if you don't have too many points that need to be in the circle. I just wrote this one. The center() function is based on code from Wikipedia for drawing a circle around a triangle; circumcircle() is the function you want, found by brute-force search through all circles that pass through 2 or 3 points of the set. On my laptop it takes about 4 seconds to handle 100 points. If you have somewhat bigger sets, you can probably get tolerable results by translating it to C++, but the running time grows like n^4, so you'll need a better algorithm for a really large set.
center <- function(D) {
  # Centre of the circle through the 0, 1, 2 or 3 points in the rows of D.
  if (NROW(D) == 0)
    matrix(numeric(), ncol = 2)
  else if (NROW(D) == 1)
    D
  else if (NROW(D) == 2) {
    # Two points: the centre is their midpoint.
    (D[1,] + D[2,])/2
  } else if (NROW(D) == 3) {
    # Three points: standard circumcentre formula, with D[1,] as origin.
    B <- D[2,] - D[1,]
    C <- D[3,] - D[1,]
    Dprime <- 2*(B[1]*C[2] - B[2]*C[1])
    if (Dprime == 0) {
      # Collinear points: drop the middle one and recurse on the pair.
      drop <- which.max(c(sum((B-C)^2), sum(C^2), sum(B^2)))
      center(D[-drop,])
    } else
      c((C[2]*sum(B^2) - B[2]*sum(C^2))/Dprime,
        (B[1]*sum(C^2) - C[1]*sum(B^2))/Dprime) + D[1,]
  } else
    center(circumcircle(D))
}
radius <- function(D, U = center(D)) {
  # Distance from the centre to any point on the boundary.
  sqrt(sum((D[1,] - U)^2))
}
circumcircle <- function(P) {
  # Smallest enclosing circle of the rows of P, found by brute force:
  # try every circle through 2 or 3 of the points and keep the smallest
  # one that contains all the others.
  n <- NROW(P)
  if (n < 3)
    return(P)
  P <- P[sample(n),]  # random order helps the pruning below
  bestset <- NULL
  bestrsq <- Inf
  # Brute force search over 2-point (diametral) circles
  for (i in 1:(n-1)) {
    for (j in (i+1):n) {
      D <- P[c(i,j),]
      U <- center(D)
      rsq <- sum((D[1,] - U)^2)
      if (rsq >= bestrsq)
        next
      failed <- FALSE
      for (k in (1:n)[-j][-i]) {
        Pk <- P[k,,drop = FALSE]
        if (sum((Pk - U)^2) > rsq) {
          failed <- TRUE  # a point lies outside this circle
          break
        }
      }
      if (!failed) {
        bestset <- c(i,j)
        bestrsq <- rsq
      }
    }
  }
  # Look for the best 3-point set
  for (i in 1:(n-2)) {
    for (j in (i+1):(n-1)) {
      for (l in (j+1):n) {
        D <- P[c(i,j,l),]
        U <- center(D)
        rsq <- sum((D[1,] - U)^2)
        if (rsq >= bestrsq)
          next
        failed <- FALSE
        for (k in (1:n)[-l][-j][-i]) {
          Pk <- P[k,,drop = FALSE]
          if (sum((Pk - U)^2) > rsq) {
            failed <- TRUE
            break
          }
        }
        if (!failed) {
          bestset <- c(i,j,l)
          bestrsq <- rsq
        }
      }
    }
  }
  P[bestset,]
}
showP <- function(P, ...) {
  # Plot the points, labelled by row number.
  plot(P, asp = 1, type = "n", ...)
  text(P, labels = seq_len(nrow(P)))
}
showD <- function(D) {
  # Draw the circle defined by the 2 or 3 points in D.
  U <- center(D)
  r <- radius(D, U)
  theta <- seq(0, 2*pi, len = 100)
  lines(U[1] + r*cos(theta), U[2] + r*sin(theta))
}
n <- 100
P <- cbind(rnorm(n), rnorm(n))
D <- circumcircle(P)
showP(P)
showD(D)
The final two calls plot the 100 points and draw the smallest enclosing circle around them.
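As a quick sanity check, using only the functions defined above, every point should lie inside or on the returned circle up to numerical tolerance:
# Squared distance of every point from the centre must not exceed r^2.
U <- center(D)
r <- radius(D, U)
all(sqrt(rowSums(sweep(P, 2, U)^2)) <= r + 1e-8)  # should be TRUE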
I have an issue with the get.edge.ids() function in igraph in R. I need to pass an odd number of vertices to it and get the edge IDs between them, but unfortunately it only accepts pairwise vertices. Sample code to generate a directed graph:
Graph <- erdos.renyi.game(20, 100 , directed=TRUE, loops=FALSE)
This is how I call get.edge.ids():
get.edge.ids(Graph, c("1", "2", "3"))
I expect to get all possible edge IDs between these vertices, but it doesn't work. I developed a function for this purpose, but it is not fast enough. Here is the function:
insideOfCommEdgeIDs <- function(graph, vertices) {
  out <- matrix()
  if (length(vertices) < 2) return(NULL)
  # Check every ordered pair of vertices and record the edge id
  # whenever the pair is actually connected.
  for (i in vertices) {
    for (j in vertices) {
      if (are_adjacent(graph, i, j)) {
        out <- rbind(out, get.edge.ids(graph, c(i, j), directed = TRUE))
      }
    }
  }
  return(out[!is.na(out)])
}
Is there any way to do this faster?
You can use the %--% operator to query edges between two sets of vertex indices (it matches edges regardless of direction; %->% and %<-% are the directed variants) and then use as_ids() to get the edge indices.
Please note, I'm using igraph version 1.2.4.2, so I'm using sample_gnm() rather than erdos.renyi.game().
library(igraph)
set.seed(1491)
Graph <- sample_gnm(20, 100 , directed = TRUE, loops = FALSE)
as_ids(E(Graph)[c(1, 2, 3) %--% c(1, 2, 3)])
#> [1] 6 12
This matches the output from your custom function (defined exactly as in your question):
insideOfCommEdgeIDs(Graph, c(1, 2, 3))
#> [1] 6 12
Created on 2020-04-10 by the reprex package (v0.3.0)
I don't understand how the method predict.naiveBayes works, given that it contains two apparently misspelled function calls, namely isnumeric[attribs[v]] and islogical[attribs[v]].
In my opinion, these should be is.numeric(attribs[v]) and is.logical(attribs[v]), respectively.
Code below:
...
L <- sapply(1:nrow(newdata), function(i) {
  ndata <- newdata[i, ]
  L <- log(object$apriori) + apply(log(sapply(seq_along(attribs),
    function(v) {
      nd <- ndata[attribs[v]]
      if (is.na(nd)) rep(1, length(object$apriori)) else {
        prob <- if (isnumeric[attribs[v]]) {
          msd <- object$tables[[v]]
          msd[, 2][msd[, 2] <= eps] <- threshold
          dnorm(nd, msd[, 1], msd[, 2])
        } else object$tables[[v]][, nd + islogical[attribs[v]]]
        prob[prob <= eps] <- threshold
        prob
      }
    })), 1, sum)
  if (type == "class")
    L
  else {
    ## Numerically unstable:
    ## L <- exp(L)
    ## L / sum(L)
    ## instead, we use:
    sapply(L, function(lp) {
      1/sum(exp(L - lp))
    })
  }
})
...
Everything works fine when I use the naive Bayes classifier from the package, but these apparent inconsistencies puzzle me. Can anyone clear up my doubts?
Just two lines above your code excerpt there is basically what you expect:
isnumeric <- sapply(newdata, is.numeric)
islogical <- sapply(newdata, is.logical)
That is, isnumeric and islogical are not functions; they are validly defined logical vectors, with one named element per column of newdata.
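A tiny illustration with a made-up data frame (the column names are only for this example): subsetting such as isnumeric[attribs[v]] is then an ordinary named-vector lookup, and islogical[attribs[v]] coerces to 0 or 1 in the expression nd + islogical[attribs[v]].
newdata <- data.frame(a = 1:3, b = letters[1:3], c = c(TRUE, FALSE, TRUE))
isnumeric <- sapply(newdata, is.numeric)
islogical <- sapply(newdata, is.logical)
isnumeric
#>     a     b     c
#>  TRUE FALSE FALSE
isnumeric["a"]  # the same kind of lookup as isnumeric[attribs[v]]
#>    a
#> TRUE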
I am having trouble with my code, which is supposed to determine whether a given natural number is highly composite, i.e. has more divisors than any smaller natural number. So far, I have come up with this:
ex33 <- function(x) {
  # All divisors of x, by trial division.
  fact <- function(x) {
    x <- as.integer(x)
    div <- seq_len(abs(x))
    factors <- div[x %% div == 0L]
    return(factors)
  }
  k <- length(fact(x))
  check <- NULL
  # Proper divisors of x always have fewer divisors than x itself,
  # so they can safely be skipped.
  tocheck <- c(1:(x-1))[-fact(x)]
  for (i in tocheck) {
    l <- length(fact(i))
    if (l >= k) {
      # A smaller number has at least as many divisors:
      # x is not highly composite.
      check[i] <- 1
      break
    } else {
      check[i] <- 0
    }
  }
  if (1 %in% check) {
    return(FALSE)
  } else {
    return(TRUE)
  }
}
I know this is quite inefficient and slow, but I could not find another algorithm to speed this function up.
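One standard alternative (a sketch, not taken from the question) is a divisor-counting sieve: instead of factorising each number separately, let every i contribute one divisor to each of its multiples, so all divisor counts up to x are produced in one pass.
is_highly_composite <- function(x) {
  # d[n] ends up holding the number of divisors of n.
  d <- integer(x)
  for (i in seq_len(x)) {
    idx <- seq(i, x, by = i)  # multiples of i up to x
    d[idx] <- d[idx] + 1L
  }
  # Highly composite: strictly more divisors than every smaller number.
  all(d[seq_len(x - 1)] < d[x])
}
is_highly_composite(12)  # should be TRUE
is_highly_composite(10)  # should be FALSE (6 already has 4 divisors)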
I have this piece of code:
library("GO.db")
lookParents <- function(x) {
  # "is_a" parents of GO term x, taken from the parents list xx built below.
  parents <- subset(xx[x][[1]], labels(xx[x][[1]]) == "is_a")
  for (parent in parents) {
    m[index, 1] <<- Term(x)
    m[index, 2] <<- Term(parent)
    m[index, 3] <<- -log2(go_freq[x, 1]/go_freq_all)
    m[index, 4] <<- log2(go1_freq2[x])
    m[index, 5] <<- x
    m[index, 6] <<- parent
    index <<- index + 1
  }
  if (is.null(parents)) {
    return(c())
  } else {
    return(parents)
  }
}
getTreeMap <- function(GOlist, xx, m) {
  print(paste("Input list has", length(GOlist), "terms", sep = " "))
  count <- 1
  for (go in GOlist) {
    # Walk up the ontology: handle the term itself, then keep popping
    # parents off the queue until it is empty.
    parents <- lookParents(go)
    if (count %% 100 == 0) {
      print(count)  # progress indicator
    }
    while (length(parents) != 0) {
      x <- parents[1]
      parents <- parents[-1]
      parents <- c(lookParents(x), parents)
    }
    count <- count + 1
  }
}
xx <- c(as.list(GOBPANCESTOR), as.list(GOCCANCESTOR), as.list(GOMFANCESTOR))
go1_freq2 <- table(as.character(unlist(xx[go1])))
xx <- c(as.list(GOBPPARENTS), as.list(GOCCPARENTS), as.list(GOMFPARENTS))
m <- as.data.frame(matrix(nrow=1,ncol=6))
m[1,] <- c("all", "null", 0, 0, "null","null")
##biological processes
index <- 2
getTreeMap(BP, xx, m)
but it is really slow. BP is simply a vector of GO terms. Do you have any performance suggestions? I would just like to make it run faster.
I suggest the following improvements (see the sketch after this list):
add your functions to Rprofile.site and byte-compile them with cmpfun
use foreach with %dopar% instead of a normal for loop
delete the variables you don't need anymore and then call the garbage collector
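A minimal sketch of how these suggestions could fit together, assuming the doParallel backend. One caveat: writes to the global m via <<- cannot propagate back from parallel workers, so lookParents would have to be refactored to return its rows, which are then combined with rbind (and globals it reads, such as xx, exported to the workers).
library(compiler)
library(foreach)
library(doParallel)

# Byte-compile the hot function (suggestion 1).
lookParents_c <- cmpfun(lookParents)

# Process the GO terms in parallel (suggestion 2). Each iteration must
# return its own result; workers cannot share state through <<-.
cl <- makeCluster(parallel::detectCores() - 1)
registerDoParallel(cl)
rows <- foreach(go = BP, .combine = rbind,
                .packages = "GO.db",
                .export = c("lookParents_c", "xx")) %dopar% {
  lookParents_c(go)  # assumes a refactored version that returns rows
}
stopCluster(cl)

# Drop objects you no longer need, then collect (suggestion 3).
rm(cl)
gc()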