How to plot a bipartite graph in R

How do I plot a network of type bipartite in R? Similar to this:
I have similar data but with weights for both genes and diseases and SARS. This network is an example. I have different kind of attributes. I followed a link here. But due to my little knowledge in this topic, I could not get much out of it. Thanks in advance for any help.

From the ?bipartite_graph help:
Bipartite graphs have a type vertex attribute in igraph, this is boolean and FALSE for the vertices of the first kind and TRUE for vertices of the second kind.
So you could do something like this (igraph 1.0.1):
library(igraph)
set.seed(123)
# generate random bipartite graph.
g <- sample_bipartite(10, 5, p=.4)
# check the type attribute:
V(g)$type
# define color and shape mappings.
col <- c("steelblue", "orange")
shape <- c("circle", "square")
plot(g,
vertex.color = col[as.numeric(V(g)$type)+1],
vertex.shape = shape[as.numeric(V(g)$type)+1]
)
Check also ?bipartite.
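As a small complement to the snippet above, here is a sketch that puts the two vertex kinds on separate rows. It assumes a reasonably recent igraph where layout_as_bipartite() is available; the function reads the logical type vertex attribute by default. The colors are arbitrary choices.

```r
library(igraph)

set.seed(123)
# Random bipartite graph: 10 vertices of the first kind, 5 of the second.
g <- sample_bipartite(10, 5, p = 0.4)

# Two-row layout: one row per value of the logical `type` attribute.
l <- layout_as_bipartite(g)

# `type` is logical, so adding 1 turns FALSE/TRUE into index 1/2.
plot(g,
     layout = l,
     vertex.color = c("steelblue", "orange")[V(g)$type + 1])
```

The layout is a plain two-column coordinate matrix, so it can be rescaled or flipped before it is passed to plot().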
Using the example provided by the OP in the comments: since the graph is multipartite and given the provided data format, I would first create a bipartite graph, then add the additional edges. Note that although is_bipartite() returns TRUE for the resulting graph, the type attribute is stored as numeric instead of logical, so it may not work properly with other bipartite functions.
set.seed(123)
V1 <- sample(LETTERS[1:10], size = 10, replace = TRUE)
V2 <- sample(1:10, size = 10, replace = TRUE)
d <- data.frame(V1 = V1, V2 = V2, weights = runif(10))
d
> d
V1 V2 weights
1 C 10 0.8895393
2 H 5 0.6928034
3 E 7 0.6405068
4 I 6 0.9942698
5 J 2 0.6557058
6 A 9 0.7085305
7 F 3 0.5440660
8 I 1 0.5941420
9 F 4 0.2891597
10 E 10 0.1471136
g <- graph_from_data_frame(d, directed = FALSE)
V(g)$label <- V(g)$name # set labels.
# create a graph connecting central node FOO to each V2.
e <- expand.grid(V2 = unique(d$V2), V2 = "FOO")
> e
V2 V2
1 10 FOO
2 5 FOO
3 7 FOO
4 6 FOO
5 2 FOO
6 9 FOO
7 3 FOO
8 1 FOO
9 4 FOO
g2 <- graph_from_data_frame(e, directed = FALSE)
# join the two graphs.
g <- g + g2
# set type.
V(g)$type <- 1
V(g)[name %in% 1:10]$type <- 2
V(g)[name %in% "FOO"]$type <- 3
V(g)$type
> V(g)$type
[1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3
col <- c("steelblue", "orange", "green")
shape <- c("circle", "square", "circle")
library(rTRM) # Bioconductor package containing layout.concentric()
# the first element in the list for concentric is the central node.
l <- layout.concentric(g, concentric = list("FOO", 1:10, LETTERS[1:10]))
plot(g,
layout = l,
vertex.color = col[V(g)$type],
vertex.shape = shape[V(g)$type],
edge.width = E(g)$weights * 5 # optional, plot edges width proportional to weights.
)
The function layout.concentric() is in (my) package rTRM, available from Bioconductor. It is a simple implementation I wrote to do exactly what you want. I am not sure whether the latest igraph version offers the same functionality (it may).
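If installing a Bioconductor package is not an option, concentric coordinates can be computed by hand. The sketch below uses a hypothetical helper, concentric_coords(), which is not part of igraph or rTRM. It assumes the innermost ring holds a single vertex, because every member of the first ring is placed at the origin.

```r
# Hypothetical helper (not from igraph or rTRM): place vertices on
# concentric circles. `rings` is a list of name vectors, innermost first.
concentric_coords <- function(vertex_names, rings) {
  coords <- matrix(0, nrow = length(vertex_names), ncol = 2,
                   dimnames = list(vertex_names, c("x", "y")))
  for (i in seq_along(rings)) {
    members <- intersect(as.character(rings[[i]]), vertex_names)
    k <- length(members)
    if (k == 0) next
    radius <- i - 1                          # ring 1 sits at the centre
    angle <- 2 * pi * (seq_len(k) - 1) / k   # evenly spaced angles
    coords[members, ] <- cbind(radius * cos(angle), radius * sin(angle))
  }
  coords
}

# Toy vertex names, in the order the graph would store them.
nm <- c("A", "B", "C", "1", "2", "FOO")
l <- concentric_coords(nm, list("FOO", 1:2, c("A", "B", "C")))
l
```

Rows come back in the order of vertex_names, so with an igraph object the result of concentric_coords(V(g)$name, ...) can be passed directly as the layout argument of plot().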

For the example you provided, I would recommend using the x and y attributes for visualizing a bipartite graph. E.g.:
V(g)$x <- c(1, 1, 1, 2, 2, 2, 2)
V(g)$y <- c(3, 2, 1, 3.5, 2.5, 1.5, 0.5)
V(g)$shape <- shape[as.numeric(V(g)$type) + 1]
V(g)$color <- c('red', 'blue', 'green', 'steelblue', 'steelblue', 'steelblue', 'steelblue')
E(g)$color <- 'gray'
E(g)$color[E(g)['A' %--% V(g)]] <- 'red'
E(g)$color[E(g)['B' %--% V(g)]] <- 'blue'
E(g)$color[E(g)['C' %--% V(g)]] <- 'green'
plot(g)
EDIT: added code to give the vertices and edges different colors for clarity.
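The coordinates above are typed in by hand. A sketch of deriving them from the type attribute instead is shown below; the small graph and its vertex names are made up for illustration, and make_bipartite_graph() is assumed to be available (igraph 1.0 or later).

```r
library(igraph)

# Illustrative graph: 3 vertices of the first kind, 4 of the second.
g <- make_bipartite_graph(
  types = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),
  edges = c(1, 4,  1, 5,  2, 5,  2, 6,  3, 6,  3, 7)
)
V(g)$name <- c("A", "B", "C", "d1", "d2", "d3", "d4")

# One column per kind: x = 1 for FALSE, x = 2 for TRUE.
V(g)$x <- ifelse(V(g)$type, 2, 1)

# Within each kind, count downwards so the first vertex is drawn on top.
V(g)$y <- ave(seq_len(vcount(g)), V(g)$type,
              FUN = function(i) rev(seq_along(i)))

plot(g)  # plot() picks up the x and y vertex attributes as the layout
```

This keeps the layout in sync with the graph when vertices are added or removed, at the cost of less control over the vertical spacing than hand-picked values give.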

Or you can use the multigraph package.
library(multigraph)
swomen <- read.dl(file = "http://moreno.ss.uci.edu/davis.dat")
bmgraph(swomen, layout = "force", seed = 1, cex = 3, tcex = .8, pch = c(19, 15), lwd = 2,
vcol = 2:3, ecol = 8, rot = 65)
which can produce the binomial projection of the two-mode data set.

Related

Generating Random Graphs According to Some Conditions

Consider the following points : 1,2,3,4,5,6,7,8,9,10
I can make a directed random graph from these numbers :
library(igraph)
from = sample(1:10, 10, replace=FALSE)
to = sample(1:10, 10, replace=FALSE)
graph_data = data.frame(from,to)
graph_data$id = as.factor(1:10)
graph <- graph.data.frame(graph_data, directed=T)
graph = simplify(graph)
V(graph)$color <- ifelse(graph_data$id == 1, "red", "orange")
plot(graph, layout=layout.circle, edge.arrow.size = 0.2)
I would like to make changes (i.e. "conditions") to the above graph so that:
None of the "nodes" are isolated (e.g. node # 6 is "isolated")
Each node has exactly one "incoming edge" and one "outgoing edge" (i.e. if you are on some "node" in this graph and keep following the arrows in the forward direction, you will eventually visit every other "node" exactly once and finally come back to the "node" you started from)
The "last node" (i.e. the "node" in the "to" column of the last row) is always connected to the "first node" (i.e. the "node" in the "from" column of the first row)
In this case, this means that :
from to
1 9 4
2 8 2
3 2 1
4 3 7
5 6 6
6 1 5
7 10 3
8 5 9
9 4 8
10 7 10
An additional row would need to be added so that "node 10" has to be connected to "node 9".
I can do this manually (e.g. graph_data[nrow(graph_data) + 1,] = c("10","9", "11", "orange")) but can someone please show me how to automatically add all these conditions to the random graph being generated?
Thank you!
That should solve your problem
library(tidyverse)
library(igraph)
set.seed(123)
n=15
data = tibble(d = paste(1:n))
relations = tibble(
from = sample(data$d),
to = lead(from, default=from[1]),
)
graph = graph_from_data_frame(relations, directed=T, vertices = data)
V(graph)$color <- ifelse(data$d == "1", "red", "orange")
plot(graph, layout=layout.circle, edge.arrow.size = 0.2)
Update 1
library(tidyverse)
library(igraph)
set.seed(123)
n=15
data = tibble(d = paste(1:n))
relations = tibble(
from = sample(data$d),
to = lead(from, default=from[1]),
)
graph = graph_from_data_frame(relations, directed=T, vertices = data)
V(graph)$color <- ifelse(data$d == relations$from[1], "red", "orange")
plot(graph, layout=layout.circle, edge.arrow.size = 0.2)
Here you are
I think the following meets your requirements. If you generate from and to like this:
from <- sample(1:10, 10, replace = FALSE)
to <- from[c(2:10, 1)]
Then repeat the rest of your code, you get:
This produces a cyclical graph where the node labels are a random permutation of the numbers 1 to 10.
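To check that an edge list built this way meets the three conditions from the question, one can follow the arrows from the first node and count the nodes visited. This is a base R sketch; the seed and the names succ and visited are illustrative.

```r
set.seed(42)
n <- 10

# Random visiting order; each node points to the next, the last to the first.
from <- sample(n)
to <- from[c(2:n, 1)]

# succ[v] is the node that v points to.
succ <- integer(n)
succ[from] <- to

# Walk forward from the first node until we return to it.
visited <- integer(0)
node <- from[1]
repeat {
  visited <- c(visited, node)
  node <- succ[node]
  if (node == from[1]) break
}

length(visited) == n    # TRUE when the walk covers every node exactly once
to[n] == from[1]        # TRUE: the last node links back to the first
```

Because to is a rotation of from, every node appears once in each column, which gives each node one incoming and one outgoing edge.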

How to order boxes in boxplots by the medians of a numerical variable in a dataframe in base R

I have a dataframe with three variables; one ("group") is a factor with two levels, one ("word") is a character vector, and one ("duration") is numeric. For example:
DATA <- data.frame(
group = c(rep("prefinal",10), rep("final", 10)),
word = c(sample(LETTERS[1:5], 10, replace = T), sample(LETTERS[1:5], 10, replace = T)),
duration = rnorm(20)
)
DATA
group word duration
1 prefinal C 0.16378771
2 prefinal E 0.13370196
3 prefinal A 0.69112398
4 prefinal B 0.21499187
5 prefinal D -0.28998279
6 prefinal D -2.00353522
7 prefinal A 0.37842555
8 prefinal E 1.62326170
9 prefinal A -0.26294929
10 prefinal B -0.54276322
11 final D 1.32772171
12 final E -1.84902285
13 final C 0.01058158
14 final E 1.49529743
15 final B 0.55291290
16 final A -0.35484820
17 final D -0.16822110
18 final A 0.88667458
19 final E 0.70889916
20 final B 1.12217332
I'd like to depict the durations of the words by group in boxplots:
boxplot(DATA$duration ~ DATA$group + DATA$word,
xaxt="n",
col = rep(c("blue", "red"), 5))
axis(1, at = seq(from=1.5, to= 10.5, by=2), labels = sort(unique(DATA$word)), cex.axis = 0.9)
R seems to order the boxes in alphabetical order (of the "word" variable) by default.
EDIT:
However I'd prefer that the boxes be sorted by the median durations (in descending order) the items in the "word" variable have in the "prefinal" group. How can that be achieved?
You can reorder the levels of DATA$word according to their median. The - before DATA$duration sorts them in descending order.
DATA$word <- reorder(DATA$word, -DATA$duration, FUN = median)
boxplot(DATA$duration ~ DATA$group + DATA$word,
xaxt="n",
col = rep(c("blue", "red"), 5))
axis(1, at = seq(from=1.5, to= 10.5, by=2), labels = levels(DATA$word), cex.axis = 0.9)
You can do the same for the subgroup of prefinal. But it requires an additional step:
ordered_levels <- levels(with(DATA[DATA$group == "prefinal",], reorder(word, -duration, FUN = median)))
DATA$word <- factor(DATA$word, levels = ordered_levels)
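The same ordering can be computed explicitly with tapply(), which makes it easy to inspect the medians that drive the order. This is a sketch on made-up data (each word appears twice per group); group is turned into a factor here so that "prefinal" boxes are drawn first, which is an assumption and not part of the original code.

```r
set.seed(1)
DATA <- data.frame(
  group = factor(rep(c("prefinal", "final"), each = 10),
                 levels = c("prefinal", "final")),
  word = rep(LETTERS[1:5], 4),
  duration = rnorm(20)
)

# Median duration per word, using the "prefinal" rows only.
pre <- DATA[DATA$group == "prefinal", ]
med <- tapply(pre$duration, pre$word, median)

# Words sorted by descending prefinal median.
ordered_levels <- names(med)[order(med, decreasing = TRUE)]
DATA$word <- factor(DATA$word, levels = ordered_levels)

boxplot(duration ~ group + word, data = DATA,
        xaxt = "n", col = rep(c("blue", "red"), 5))
axis(1, at = seq(1.5, 9.5, by = 2), labels = levels(DATA$word), cex.axis = 0.9)
```

Printing med[ordered_levels] shows the medians in the order the boxes appear, which is a quick way to confirm the plot is sorted as intended.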

merge edgelists to a single graph

I have three edgelists that share the same nodes. I want to merge them into one graph and separate these edges by colors and weights. I have provided a small example of what I want to do:
df1
a b 1 blue
b c 0.361973313 blue
a d 0.343729742 blue
df2
a c 0.264800107 green
a a 0.228507399 green
c d 0.22202394 green
df3
d d 0.179089391 red
d a 0.173410831 red
c b 0.093636709 red
The dataframes above are my edgelists. As you can see, multiple edges and loops are allowed. One way that came to mind to merge these edges into a single graph was to make an empty graph and then add the edges separately, but I couldn't do it. Any idea?
g <- make_empty_graph(n = 0, directed = F)
g <- g + vertices(c("a","b", "c","d"))
g<- g+ edges(c( "a", "b", "b", "c",
"a", "d"),color="blue")
Here's how to do that using graph_from_data_frame. You also have to use set_edge_attr to set the attributes. Finally, your weights are very close to another, so the difference is hard to see. I changed one weight to 5 to show that it works.
df1 <- read.table(text="from to weight color
a b 1 blue
b c 0.361973313 blue
a d 0.343729742 blue",
header=TRUE,stringsAsFactors=FALSE)
df2 <- read.table(text="from to weight color
a c 0.264800107 green
a a 0.228507399 green
c d 5 green",
header=TRUE,stringsAsFactors=FALSE)
df <- rbind(df1,df2)
g <- graph_from_data_frame(df[,1:2])%>%
set_edge_attr("weight",value=df$weight) %>%
set_edge_attr("color",value=df$color)
plot(g, edge.width = E(g)$weight)
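A shorter variant, sketched below with all three edge lists from the question (weights rounded), relies on graph_from_data_frame() storing every column after the first two as an edge attribute. The graph is built as undirected here, which is an assumption based on the question's make_empty_graph(directed = F) call.

```r
library(igraph)

df1 <- data.frame(from = c("a", "b", "a"), to = c("b", "c", "d"),
                  weight = c(1, 0.362, 0.344), color = "blue",
                  stringsAsFactors = FALSE)
df2 <- data.frame(from = c("a", "a", "c"), to = c("c", "a", "d"),
                  weight = c(0.265, 0.229, 0.222), color = "green",
                  stringsAsFactors = FALSE)
df3 <- data.frame(from = c("d", "d", "c"), to = c("d", "a", "b"),
                  weight = c(0.179, 0.173, 0.094), color = "red",
                  stringsAsFactors = FALSE)

edges <- rbind(df1, df2, df3)

# Columns 3+ (weight, color) become edge attributes automatically.
g <- graph_from_data_frame(edges, directed = FALSE)

# Scale the weights so the width differences are visible.
plot(g, edge.width = E(g)$weight * 5)
```

Since the attribute is named color, plot() uses it for the edges without further arguments; loops and multiple edges are kept as they are.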
You can use the graph_from_edgelist function
library(igraph)
df1 <- data.frame(
'from' = c('a','b','a'),
'to' = c('b','c','d'),
'weight' = c(0.3, 0.2, 0.5),
'colour' = c('blue','blue','blue'))
df2 <- data.frame(
'from' = c('a','a','c'),
'to' = c('c','a','d'),
'weight' = c(0.3, 0.2, 0.5),
'colour' = c('green','green','green'))
edges <- rbind(df1, df2)
gp <- graph_from_edgelist(
as.matrix(edges[,c('from', 'to')]))
Edit:
For your properties, you can use set_edge_attr as detailed in P Lapointe's answer. Extending my code accordingly looks like this:
gp <- set_edge_attr(gp, "weight", value = edges$weight)
gp <- set_edge_attr(gp, "colour", value = edges$colour)
plot(gp, edge.width = E(gp)$weight, edge.color = E(gp)$colour) #Stolen from P Lapointe

Color side bar dendrogram plot

Initially I was trying to add the horizontal color side bar to the dendrogram plot (NOT to the whole heat map) using colored_bars from dendextend.
The code below (THANK YOU for your help Tal!) works pretty well. The only issue remaining is how to control the distance of the bar from the leaves labels and the bar width?
Here is an example, data and code
Data (4 variables, 5 cases)
df <- read.table(header=T, text="group class v1 v2
1 A 1 3.98 23.2
2 A 2 5.37 18.5
3 C 1 4.73 22.1
4 B 1 4.17 22.3
5 C 2 4.47 22.4
")
car_type <- factor(df[,c(1)]) # groups codes (A,B,C)
cols_4 <- heat.colors(3)
col_car_type <- cols_4[car_type]
matrix<-data.matrix(df[,c(3,4)])
rnames<-df[,2]
row.names(matrix)<-rnames
matrix<-data.matrix(df[,c(3,4)])
row.names(matrix)<-rnames
dend<-hclust(dist(matrix))
labels_colors(dend) <- col_car_type[order.dendrogram(dend)] # Error in order.dendrogram(dend) : 'order.dendrogram' requires a dendrogram
# But I dont think the line above is doing anything so it can be removed...
plot(dend)
colored_bars(col_car_type, dend)
This is possible to do using dendextend.
First, to install the latest dendextend version you can use:
install.packages("dendextend")
Here is an example using mtcars:
## mtcars example
library(dendextend)
# Create the dend:
dend <- as.dendrogram(hclust(dist(mtcars)))
# Create a vector giving a color for each car to which company it belongs to
car_type <- rep("Other", length(rownames(mtcars)))
is_x <- grepl("Merc", rownames(mtcars))
car_type[is_x] <- "Mercedes"
is_x <- grepl("Mazda", rownames(mtcars))
car_type[is_x] <- "Mazda"
is_x <- grepl("Toyota", rownames(mtcars))
car_type[is_x] <- "Toyota"
car_type <- factor(car_type)
n_car_types <- length(unique(car_type))
cols_4 <- colorspace::rainbow_hcl(n_car_types, c = 70, l = 50)
col_car_type <- cols_4[car_type]
# extra: showing the various clusters cuts
k234 <- cutree(dend, k = 2:4)
# color labels by car company:
labels_colors(dend) <- col_car_type[order.dendrogram(dend)]
# color branches based on cutting the tree into 4 clusters:
dend <- color_branches(dend, k = 4)
### plots
par(mar = c(12,4,1,1))
plot(dend)
colored_bars(cbind(k234[,3:1], col_car_type), dend, rowLabels = c(paste0("k = ", 4:2), "Car Type"))
# horiz version:
par(mar = c(4,1,1,12))
plot(dend, horiz = TRUE)
colored_bars(cbind(k234[,3:1], col_car_type), dend, rowLabels = c(paste0("k = ", 4:2), "Car Type"), horiz = TRUE)
legend("topleft", legend = levels(car_type), fill = cols_4)
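On the remaining question of bar distance and width: colored_bars() takes y_shift (where the bar sits on the height axis) and y_scale (how tall the bar is). The sketch below is illustrative only; the numeric values were picked for this subset of mtcars and will need adjusting to the height scale of other dendrograms.

```r
library(dendextend)

# Small example: the first 10 cars only.
cars10 <- mtcars[1:10, ]
dend <- as.dendrogram(hclust(dist(cars10)))

# One color per car, in the original row order.
bar_cols <- ifelse(grepl("Merc", rownames(cars10)), "firebrick", "grey70")

par(mar = c(10, 4, 1, 1))
plot(dend)

# y_shift moves the bar down, away from the leaf labels;
# y_scale stretches the bar vertically (its "width" for a horizontal bar).
colored_bars(bar_cols, dend, rowLabels = "Mercedes",
             y_shift = -60, y_scale = 20)
```

Both arguments are in the units of the dendrogram's height axis, so a tree with larger merge heights needs proportionally larger values.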

Creating a cumulative step graph in R

Say I have this example data frame
set.seed(12345)
n1 <- 3
n2 <- 10
n3 <- 60
times <- seq(0, 100, 0.5)
individual <- c(rep(1, n1),
rep(2, n2),
rep(3, n3))
events <- c(sort(sample(times, n1)),
sort(sample(times, n2)),
sort(sample(times, n3)))
df <- data.frame(individual = individual, events = events)
Which gives
> head(df, 10)
individual events
1 1 72.0
2 1 75.5
3 1 87.5
4 2 3.0
5 2 14.5
6 2 16.5
7 2 32.0
8 2 45.5
9 2 50.0
10 2 70.5
I would like to plot a cumulative step graph of the events so that I get one line per individual which goes up by 1 each time an event is "encountered".
So, for instance individual 1 will be 0 up to 72.0, then go up to 1, until 75.5 when it becomes 2 and up to 3 at 87.5 to the end of the graph.
What would be the easiest way to do that?
df$step <- 1
library(plyr)
df <- ddply(df,.(individual),transform,step=cumsum(step))
plot(step~events,data=df[df$individual==1,],type="s",xlim=c(0,max(df$events)),ylim=c(0,max(df$step)),xlab="time",ylab="step")
lines(step~events,data=df[df$individual==2,],type="s",col=2)
lines(step~events,data=df[df$individual==3,],type="s",col=3)
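A base R sketch of the same idea without plyr is given below. It also pads each line with a point at time 0 and one at the last event time, so every line starts at zero and runs to the end of the graph as described in the question. The data is regenerated in the same spirit as the question's example but is not guaranteed to match it value for value.

```r
set.seed(12345)
times <- seq(0, 100, 0.5)
counts <- c(3, 10, 60)   # events per individual

df <- data.frame(
  individual = rep(1:3, counts),
  events = unlist(lapply(counts, function(k) sort(sample(times, k))))
)

# Running event count within each individual.
df$step <- ave(df$events, df$individual, FUN = seq_along)

t_end <- max(df$events)
plot(0, 0, type = "n", xlim = c(0, t_end), ylim = c(0, max(df$step)),
     xlab = "time", ylab = "step")
for (id in unique(df$individual)) {
  sub <- df[df$individual == id, ]
  # Start at (0, 0) and hold the final count until the end of the axis.
  lines(c(0, sub$events, t_end), c(0, sub$step, max(sub$step)),
        type = "s", col = id)
}
```

With type = "s" the line moves horizontally first and then vertically, so each jump happens exactly at the event time.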
There is also the stepfun function in the stats package. Using that, you could use the plot method for that object class:
sdf <- split(df, individual)
plot(1, 1, type = "n", xlim = c(0, max(events)), ylim = c(0, max(table(individual))),
ylab = "step", xlab = "time")
sfun <- lapply(sdf, function(x){
sf <- stepfun(sort(x$events), seq_len(nrow(x) + 1) - 1)
plot(sf, add = TRUE, col = unique(x$individual), do.points = FALSE)
})
Use ggplot2:
library(ggplot2)
# Add step height information with sequence and rle
df$step <- sequence(rle(df$individual)$lengths)
# plot
df$individual <- factor(df$individual)
ggplot(df, aes(x=events, group=individual, colour=individual, y=step)) +
geom_step()
