I have this graph:
df<-data.frame(x=c('a','b','c'),y=c('d','c','f'))
g<-graph.data.frame(df,directed=F)
Is there a way to return two lists of vertices according to which subgraph they belong to?
I'd like to get to this output:
vertex id
1 a 1
2 d 1
3 b 2
4 c 2
5 f 2
Thank you
See clusters(). By the way, what you are looking for are the connected components of the graph. (The igraph terminology is confusing here.)
data.frame(vertex=V(g)$name, id=clusters(g)$membership)
# vertex id
# 1 a 1
# 2 b 2
# 3 c 2
# 4 d 1
# 5 f 2
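If actual lists of vertex names are wanted rather than a data frame, a minimal sketch is to split the names by component membership. This assumes a recent igraph, where components() and graph_from_data_frame() are the current names for clusters() and graph.data.frame(); the variable names comp and vertex_lists are only illustrative.

```r
library(igraph)

df <- data.frame(x = c("a", "b", "c"), y = c("d", "c", "f"))
g <- graph_from_data_frame(df, directed = FALSE)

# components() returns membership (one id per vertex), csize and no
comp <- components(g)

# One character vector of vertex names per connected component
vertex_lists <- split(V(g)$name, comp$membership)
vertex_lists
```

Each element of vertex_lists then holds the vertex names of one component, so the two subgraphs come back as two separate vectors.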
How does one count the characters in a single string in the order in which they appear? Below is a minimal example:
x <- "abbccdddaab"
My first thought was this, but it counts the characters irrespective of order:
table(unlist(strsplit(x, "\\b")))
a b c d
3 3 2 3
But the desired output is:
a b c d a b
1 2 2 3 2 1
I would imagine the solution would require a for loop?
We can use rle instead of table. rle checks whether adjacent elements are the same and returns a list with the values and the lengths of each run:
out <- rle(strsplit(x, "\\b")[[1]])
setNames(out$lengths, out$values)
# a b c d a b
# 1 2 2 3 2 1
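To make the run-length idea concrete, here is a rough base R sketch of what rle does conceptually: mark where a new run starts, number the runs, and count the members of each. The names chars, starts, run_id and result are illustrative, and the split uses "" rather than "\\b".

```r
x <- "abbccdddaab"
chars <- strsplit(x, "")[[1]]

# TRUE wherever a character differs from its predecessor;
# the first character always starts a run
starts <- c(TRUE, chars[-1] != chars[-length(chars)])

# Consecutive run number for every character
run_id <- cumsum(starts)

# Count the characters in each run and label it with its character
result <- setNames(tabulate(run_id), chars[starts])
result
# a b c d a b
# 1 2 2 3 2 1
```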
Using data.table::rleid:
x <- "abbccdddaab"
tmp <- strsplit(x, "\\b")[[1]]
table(data.table::rleid(tmp))
#1 2 3 4 5 6
#1 2 2 3 2 1
This is an example.
df <- data.frame(item=letters[1:5], n=c(3,2,2,1,1))
df
item n
1 a 3
2 b 2
3 c 2
4 d 1
5 e 1
Items need to be grouped so that each group has a total sample size of at least 4.
This would be the solution if you follow the sorting of df.
item n cluster
1 a 3 1
2 b 2 1
3 c 2 2
4 d 1 2
5 e 1 2
How to get all possible unique solutions?
Further, the code should also not allow any clusters to have a sample size less than 4.
Below, we have a brute force approach using the package partitions. The idea is that we find every partition of the rows of df. We then sum each group and check to see that the requirement has been met.
df <- data.frame(item=letters[1:5], n=c(3,2,2,1,1))
minSize <- 4
funGetClusters <- function(df, minSize) {
    # Enumerate every set partition of the row indices
    allParts <- partitions::listParts(nrow(df))
    # Keep only partitions in which every group sums to at least minSize
    goodInd <- which(sapply(allParts, function(p) {
        all(sapply(p, function(x) sum(df$n[x])) >= minSize)
    }))
    allParts[goodInd]
}

clusterBreakdown <- funGetClusters(df, minSize)

allDfs <- lapply(clusterBreakdown, function(p) {
    copyDf <- df
    copyDf$cluster <- 1L
    clustInd <- 2L
    # Give every group after the first its own cluster id
    for (i in p[-1]) {
        copyDf$cluster[i] <- clustInd
        clustInd <- clustInd + 1L
    }
    copyDf
})
Here is the output:
allDfs
[[1]]
item n cluster
1 a 3 1
2 b 2 1
3 c 2 1
4 d 1 1
5 e 1 1
[[2]]
item n cluster
1 a 3 1
2 b 2 2
3 c 2 2
4 d 1 1
5 e 1 1
[[3]]
item n cluster
1 a 3 2
2 b 2 1
3 c 2 1
4 d 1 2
5 e 1 1
[[4]]
item n cluster
1 a 3 2
2 b 2 1
3 c 2 1
4 d 1 1
5 e 1 2
[[5]]
item n cluster
1 a 3 2
2 b 2 1
3 c 2 2
4 d 1 1
5 e 1 1
[[6]]
item n cluster
1 a 3 2
2 b 2 2
3 c 2 1
4 d 1 1
5 e 1 1
It should be noted that there is a combinatorial explosion as the number of rows increases. For example, with just 10 rows we would have to test 115975 different partitions.
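The number of set partitions of n rows is the Bell number B(n). As a rough illustration of the growth, here is a small base R sketch that computes it with the Bell triangle; the helper name bell_number is hypothetical and the function assumes n >= 1.

```r
# Hypothetical helper: number of set partitions of n items (n >= 1),
# computed with the Bell triangle
bell_number <- function(n) {
  row <- 1
  for (i in seq_len(n - 1)) {
    new_row <- numeric(length(row) + 1)
    # Each row starts with the last entry of the previous row
    new_row[1] <- row[length(row)]
    # Each further entry adds the entry to its left and the one above that
    for (j in seq_along(row)) {
      new_row[j + 1] <- new_row[j] + row[j]
    }
    row <- new_row
  }
  row[length(row)]
}

bell_number(5)   # 52 partitions for the 5-row example
bell_number(10)  # 115975
```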
As @chinsoon comments, RcppAlgos could be a good choice for an acceptable solution for larger cases. Disclaimer: I am the author. I have answered similar questions with much larger inputs and have had good success.
Allocating tasks to parallel workers so that expected cost is roughly equal
Split a set into n unequal subsets with the key deciding factor being that the elements in the subset aggregate and equal a predetermined amount?
@AllanCameron also has a great answer and a nice methodology for attacking this problem. You should give that a read as well.
Lastly, the following vignette by Robin K. S. Hankin (author of the partitions package) and Luke J. West is not only a great read, but very applicable to problems like the one presented here.
Set Partitions in R
Suppose I have a network like this, with multiple subgraphs.
How can I keep only the subgraph with the largest number of vertices and remove the rest? In this case I want to keep the subgraph on the left and remove the 3-vertex one on the lower right. Thanks!
Given
set.seed(1)
g <- sample_gnp(20, 1 / 20)
plot(g)
we wish to keep the subgraph with 6 vertices. Using
(clu <- components(g))
# $membership
# [1] 1 2 3 4 5 4 5 5 6 7 8 9 10 3 5 11 5 3 12 5
# $csize
# [1] 1 1 3 2 6 1 1 1 1 1 1 1
# $no
# [1] 12
gMax <- induced_subgraph(g, V(g)[clu$membership == which.max(clu$csize)])
we then get
plot(gMax)
This assumes that there is a single largest connected subgraph. Otherwise the "first" one will be chosen.
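If ties matter, one possible variation keeps every component whose size equals the maximum. This is a sketch on a hypothetical toy graph, not part of the original answer; the names toy, largest_ids and largest are illustrative.

```r
library(igraph)

# Hypothetical toy graph: two components of size 3 and one of size 2
toy <- make_graph(c(1, 2, 2, 3, 4, 5, 5, 6, 7, 8), directed = FALSE)

clu <- components(toy)

# Ids of all components that share the maximum size
largest_ids <- which(clu$csize == max(clu$csize))

# One induced subgraph per tied-largest component
largest <- lapply(largest_ids, function(id) {
  induced_subgraph(toy, V(toy)[clu$membership == id])
})

length(largest)
```

The result is a list of graphs, so the caller can decide which of the tied components to keep.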
I have a weighted directed graph with three strongly connected components (SCCs).
The SCCs are obtained with the igraph::clusters function:
library(igraph)
SCC<- clusters(graph, mode="strong")
SCC$membership
[1] 9 2 7 7 8 2 6 2 2 5 2 2 2 2 2 1 2 4 2 2 2 3 2 2 2 2 2 2 2 2
SCC$csize
[1] 1 21 1 1 1 1 2 1 1
SCC$no
[1] 9
I want to visualize the SCCs with circles and a colored background, as in the graph below. Is there any way to do this in R? Thanks!
Take a look at the mark.groups argument of plot.igraph. Something like the following will do the trick:
# Create some toy data
set.seed(1)
library(igraph)
graph <- erdos.renyi.game(20, 1/20)
# Do the clustering
SCC <- clusters(graph, mode="strong")
# Add colours and use the mark.groups argument
V(graph)$color <- rainbow(SCC$no)[SCC$membership]
plot(graph, mark.groups = split(1:vcount(graph), SCC$membership))
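Note that erdos.renyi.game() returns an undirected graph by default, so strong and weak components coincide in the toy data above. A small sketch with a hand-built directed graph shows the same mark.groups idea on genuinely strongly connected components; the edge list and the names dg, scc and scc_groups are illustrative assumptions.

```r
library(igraph)

# Hypothetical directed graph: cycle A -> B -> C -> A, a bridge C -> D,
# and a two-vertex cycle D <-> E
edges <- data.frame(
  from = c("A", "B", "C", "C", "D", "E"),
  to   = c("B", "C", "A", "D", "E", "D")
)
dg <- graph_from_data_frame(edges, directed = TRUE)

scc <- components(dg, mode = "strong")

# One vector of vertex ids per strongly connected component
scc_groups <- split(seq_len(vcount(dg)), scc$membership)

# Colour vertices by component and shade each component
V(dg)$color <- rainbow(scc$no)[scc$membership]
plot(dg, mark.groups = scc_groups)
```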
I have an issue with adding edge labels to a weighted igraph graph in R.
The data frame of the graph is:
df <- read.table(text=
"From, To, Weight
A,B,1
B,C,2
B,F,3
C,D,5
B,F,4
C,D,6
D,E,7
E,B,8
E,B,9
E,C,10
E,F,11", sep=',',header=TRUE)
# From To Weight
# 1 A B 1
# 2 B C 2
# 3 B F 3
# 4 C D 5
# 5 B F 4
# 6 C D 6
# 7 D E 7
# 8 E B 8
# 9 E B 9
# 10 E C 10
# 11 E F 11
and I use:
g<-graph.data.frame(df,directed = TRUE)
plot(g)
to plot the following graph :
One can see that the edge labels (for example, from E to B) are superimposed.
(The same problem appears for the edges C-D and B-F.)
I'd like to know how to separate these labels so that each
weight is shown on its own edge.
Try the qgraph package. qgraph builds on igraph and does a lot of the work for you in the background.
install.packages("qgraph")
library(qgraph)
qgraph(df, edge.labels = TRUE)
Hope this helps.
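If staying within igraph is preferred, one possible sketch (an alternative to the qgraph suggestion, not taken from the answer) passes the weights as edge.label and curves parallel edges with curve_multiple(). It assumes the Weight column from the question and a reasonably recent igraph, in which edge labels should follow the curved edges.

```r
library(igraph)

# Same edge list as in the question, built directly as a data frame
df <- data.frame(
  From   = c("A", "B", "B", "C", "B", "C", "D", "E", "E", "E", "E"),
  To     = c("B", "C", "F", "D", "F", "D", "E", "B", "B", "C", "F"),
  Weight = c(1, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11)
)
g <- graph_from_data_frame(df, directed = TRUE)

# curve_multiple() gives parallel edges different curvatures,
# so their labels are drawn apart instead of on top of each other
plot(g,
     edge.label  = E(g)$Weight,
     edge.curved = curve_multiple(g))
```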