merge edgelists to a single graph - r

I have three edgelists that have the same nodes. I want to merge them into one graph and separate these edges by colors and weights. I have provided a small example of what I want to do:
df1
a b 1 blue
b c 0.361973313 blue
a d 0.343729742 blue
df2
a c 0.264800107 green
a a 0.228507399 green
c d 0.22202394 green
df3
d d 0.179089391 red
d a 0.173410831 red
c b 0.093636709 red
The data frames above are my edgelists. As you can see, multiple edges and loops are allowed. One way that came to my mind to merge these edges into a single graph was to make an empty graph and then add the edges separately, but I couldn't do it. Any idea?
g <- make_empty_graph(n = 0, directed = F)
g <- g + vertices(c("a","b", "c","d"))
g<- g+ edges(c( "a", "b", "b", "c",
"a", "d"),color="blue")

Here's how to do that using graph_from_data_frame. You also have to use set_edge_attr to set the attributes. Finally, your weights are very close to one another, so the difference is hard to see. I changed one weight to 5 to show that it works.
df1 <- read.table(text="from to weight color
a b 1 blue
b c 0.361973313 blue
a d 0.343729742 blue",
header=TRUE,stringsAsFactors=FALSE)
df2 <- read.table(text="from to weight color
a c 0.264800107 green
a a 0.228507399 green
c d 5 green",
header=TRUE,stringsAsFactors=FALSE)
df <- rbind(df1,df2)
g <- graph_from_data_frame(df[,1:2])%>%
set_edge_attr("weight",value=df$weight) %>%
set_edge_attr("color",value=df$color)
plot(g, edge.width = E(g)$weight)
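As a possible shortcut, graph_from_data_frame treats every column after the first two as an edge attribute, so the weight and color columns can be carried over in one call. The sketch below is only an illustration: it assumes the column names from, to, weight and color, uses directed = FALSE because the question builds an undirected graph, and scales the edge widths by 5 purely for visibility.

```r
library(igraph)

# The three edgelists from the question, each with its own color.
df1 <- data.frame(from = c("a", "b", "a"), to = c("b", "c", "d"),
                  weight = c(1, 0.361973313, 0.343729742),
                  color = "blue", stringsAsFactors = FALSE)
df2 <- data.frame(from = c("a", "a", "c"), to = c("c", "a", "d"),
                  weight = c(0.264800107, 0.228507399, 0.22202394),
                  color = "green", stringsAsFactors = FALSE)
df3 <- data.frame(from = c("d", "d", "c"), to = c("d", "a", "b"),
                  weight = c(0.179089391, 0.173410831, 0.093636709),
                  color = "red", stringsAsFactors = FALSE)

edges <- rbind(df1, df2, df3)

# Columns 3 and 4 (weight, color) become edge attributes automatically.
g <- graph_from_data_frame(edges, directed = FALSE)

# Loops and multiple edges are kept; widths are scaled only for visibility.
plot(g, edge.width = E(g)$weight * 5)
```

Because the edge attribute is called color, plot() picks it up without an explicit edge.color argument.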

You can use the graph_from_edgelist function
library(igraph)
df1 <- data.frame(
'from' = c('a','b','a'),
'to' = c('b','c','d'),
'weight' = c(0.3, 0.2, 0.5),
'colour' = c('blue','blue','blue'))
df2 <- data.frame(
'from' = c('a','a','c'),
'to' = c('c','a','d'),
'weight' = c(0.3, 0.2, 0.5),
'colour' = c('green','green','green'))
edges <- rbind(df1, df2)
gp <- graph_from_edgelist(
as.matrix(edges[,c('from', 'to')]))
Edit:
For your properties, you can use set_edge_attr as detailed by @P Lapointe. Extending my code accordingly looks like this:
gp <- set_edge_attr(gp, "weight", value = edges$weight)
gp <- set_edge_attr(gp, "color", value = as.character(edges$colour)) # plot() reads the "color" edge attribute
plot(gp, edge.width = E(gp)$weight) # Stolen from P Lapointe

Related

Subset graph based on edges weight

I have a graph, G=(V,E) with several attributes including an edge weight attribute. I'm trying to create a subgraph based on a condition where weights are higher than x.
I've tried the standard R subsetting option with g <- E(g)[weight > max(weight)*.10], but I always get a vector.
I'm not sure what I'm doing wrong here.
Maybe you want something like this
library(igraph)
set.seed(1)
m <- matrix(sample(c(.5, 2, 5), 100, replace=T, prob = c(.6,.3,.1)), nc=10, dimnames = rep(list(letters[1:10]), 2))
g <- graph_from_adjacency_matrix(m, weighted=T, diag=F, mode="undirected")
coords <- layout.auto(g)
par(mfrow = c(1,3))
plot(g, layout=coords, edge.width = E(g)$weight)
s1 <- subgraph.edges(g, E(g)[E(g)$weight>2], del=F)
plot(s1, layout=coords, edge.width = E(s1)$weight)
s2 <- delete_vertices(s1, degree(s1, mode = "in")==0)
plot(s2, layout=coords[V(g)$name%in%V(s2)$name,], edge.width = E(s2)$weight)
That would be because you replaced your graph g with just subsetted edges. If you want to remove edges below the threshold weight, you can use:
g_sub <- delete.edges(g, E(g)[weight <= max(weight)*.10])
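To make the difference concrete, here is a small sketch on a toy ring graph (the graph and its weights are made up for illustration). Indexing E(g) gives an edge sequence, while delete_edges gives back a graph with all vertices kept.

```r
library(igraph)

# Toy graph: a ring of 5 vertices with hand-picked weights.
g <- make_ring(5)
E(g)$weight <- c(1, 5, 2, 10, 0.5)

threshold <- max(E(g)$weight) * 0.10

# Indexing E(g) returns an edge sequence, not a graph.
heavy <- E(g)[E(g)$weight > threshold]

# Deleting the light edges returns a graph and keeps every vertex.
g_sub <- delete_edges(g, E(g)[E(g)$weight <= threshold])
```

If isolated vertices should go as well, delete_vertices(g_sub, which(degree(g_sub) == 0)) is one way to drop them afterwards.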

Is dplyr's left_join correct way to attach a data.frame to a SpatialPolygonDataFrame in R?

Merging extra data (frames) to spatial objects in R can be tricky (as explained here, or here)
Searching for a solution on how to correctly do the job I found this SO question listing several methods. dplyr's left_join was not listed there. I spotted it being used in Robin's tutorial.
My question is - is this a correct method to use? Are there any use cases (different number of rows? different row names? sorting? etc.) where this solution would fail?
Here is some reproducible code illustrating the methods I found / came across:
# libraries
library("spdep"); library("sp"); library("dplyr")
library("maptools") # provides readShapePoly()
# spatial data
c <- readShapePoly(system.file("etc/shapes/columbus.shp", package="spdep")[1])
m <- c@data
c@data <- subset(c@data, select = c("POLYID", "INC"))
c@data$INC2 <- c@data$INC
c@data$INC <- NULL
ex <- subset(c, c$POLYID <= 2) # polygons with messed up data in merged df
c <- subset(c, c$POLYID < 49) # remove one polygon from shape so that df has one poly too many
# messing up merge data
m <- subset(m, POLYID != 1) # exclude polygon
m <- subset(m, select = c("POLYID", "INC")) # only two vars
rownames(m) <- m$POLYID - 2 # change rownames
m$POLYID[m$POLYID == 2] <- 0 # wrong ID
m <- m[order(m$INC),] # different sort
m$POLYID2 <- m$POLYID # duplicated to check dplyr
# left_join solution
s1 <- c
s1@data <- left_join(s1@data, m)
plot(c)
plot(s1, col = "red", density = 40, angle = 0, add = TRUE)
plot(ex, col= NA, border = "green", add = TRUE)
View(s1@data)
# match solution
s2 <- c
s2@data = data.frame(s2@data, m[match(s2@data[,"POLYID"], m[,"POLYID"]),])
plot(c)
plot(s2, col = "red", density = 40, angle = 0, add = TRUE)
plot(ex, col= NA, border = "green", add = TRUE)
View(s2@data)
# sp solution
s3 <- c
s3 <- sp::merge(s3, m, by="POLYID")
plot(c)
plot(s3, col = "red", density = 40, angle = 0, add = TRUE)
plot(ex, col= NA, border = "green", add = TRUE)
View(s3@data)
# inner join solution
s4 <- c
s4@data <- inner_join(s4@data, m)
plot(c)
plot(s4, col = "red", density = 40, angle = 0, add = TRUE)
plot(ex, col= NA, border = "green", add = TRUE)
View(s4@data)
# rebuild solution???
s5 <- c
s5.df <- as(s5, "data.frame")
s5.df1 <- merge(s5.df, m, sort=FALSE, by.x="POLYID", by.y="POLYID", all.x=TRUE, all.y=TRUE)
s51 <- SpatialPolygonsDataFrame(as(s5, "SpatialPolygons"), data=s5.df1)
plot(c)
plot(s51, col = "red", density = 40, angle = 0, add = TRUE)
plot(ex, col= NA, border = "green", add = TRUE)
Left join seems to do the job, the same as sp::merge and match (I do hope the order is not messed up, so that, for instance, plotted polygons end up associated with different values after the merge?). None of the solutions actually removes the two polygons with missing data, but I presume this is correct behaviour in R?
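One way to test the ordering worry directly is to compare the ID column before and after the join, since the polygons are matched to the data rows by position. The sketch below uses plain data frames with made-up values (polys and extra are hypothetical stand-ins for the data slot and for m) and base R only. It shows that a match-based join keeps the left-hand order, while base merge sorts by the key by default.

```r
# Hypothetical stand-ins: 'polys' plays the role of the data slot,
# 'extra' the table to attach (one ID missing, one ID unmatched).
polys <- data.frame(POLYID = c(3, 1, 2))
extra <- data.frame(POLYID = c(2, 3, 4), INC = c(20, 30, 40))

# match-based join: one output row per polygon, in the original order.
idx <- match(polys$POLYID, extra$POLYID)
joined <- cbind(polys, INC = extra$INC[idx])

# base merge: same content, but rows are sorted by the key by default.
merged <- merge(polys, extra, by = "POLYID", all.x = TRUE)

# The check worth running after any join on a spatial object's data:
order_kept <- identical(joined$POLYID, polys$POLYID)
```

The same identical() check can be applied to the result of left_join or any other method before assigning it back to the data slot.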

centering x-axis label under grouped bar plots

I have two 3x2 matrices, A and B:
A:
n2 n3
Part1 1 5
Part2 2 6
Part3 3 7
B:
n2 n3
Part1 5 1
Part2 6 2
Part3 7 3
and I want to create a stacked bar plot, which I do via:
d1 <- read.csv("A.csv", header=T, dec=".",sep = " ")
d1 <- subset(d1, select = c(n2, n3))
d2 <- read.csv("B.csv", header=T, dec=".",sep = " ")
d2 <- subset(d2, select = c(n2, n3))
d <- cbind(d1[,1],d2[,1],d1[,2],d2[,2])
barplot(d, col=c("lawngreen","firebrick","deepskyblue"),
space=c(0,0,0.2,0), xaxt = "n", yaxt="n",ylim = c(0, 25))
x_axis_range <- c(2,3)
x_axis_labels <- c("2", "3")
axis(1,at = x_axis_range, labels = x_axis_range)
y_axis_range <- c(0,2,4,6,25)
y_axis_labels <- c("0","2","4","6","25")
axis(2,at = y_axis_range, labels = y_axis_labels, las=2)
and it produces:
However, I want 2 and 3 to appear centered under each group of bars. How can I do it?
Change the code that produces the x-axis to
x_axis_range <- c(1, 3.2)
x_axis_labels <- c("2", "3")
axis(1,at = x_axis_range, labels = x_axis_labels)
The bars are drawn with a width of one. In addition, you set the space between the second and third bar to 0.2. Thus, the left edges of the bars are at 0, 1, 2.2 and 3.2, and the two groups span 0 to 2 and 2.2 to 4.2. To place the labels at the centre of each group, you therefore need to put them at 1 and 3.2, which is what the above definition of x_axis_range does.
I still left the labels to be 2 and 3, but of course you can change them to whatever you want.
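The same positions can also be computed instead of worked out by hand, because barplot() returns the midpoint of every bar. A minimal sketch, with the matrix typed in directly from A and B rather than read from the CSV files, and with a null PDF device so that it also runs non-interactively:

```r
# Columns: A n2, B n2, A n3, B n3 (values taken from the two matrices).
d <- cbind(c(1, 2, 3), c(5, 6, 7), c(5, 6, 7), c(1, 2, 3))

pdf(NULL)  # null device; drop this line to draw on screen

# barplot() returns the x position of each bar's midpoint.
mids <- barplot(d, col = c("lawngreen", "firebrick", "deepskyblue"),
                space = c(0, 0, 0.2, 0), xaxt = "n", ylim = c(0, 25))

# Centre of each group = mean of the midpoints of its two bars.
centers <- c(mean(mids[1:2]), mean(mids[3:4]))
axis(1, at = centers, labels = c("2", "3"))

invisible(dev.off())
```

This keeps the labels centred even if the spacing or bar widths are changed later.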

How to plot a bipartite graph in R

How do I plot a network of type bipartite in R? Similar to this:
I have similar data but with weights for both genes and diseases and SARS. This network is an example. I have different kind of attributes. I followed a link here. But due to my little knowledge in this topic, I could not get much out of it. Thanks in advance for any help.
From the ?bipartite_graph help:
Bipartite graphs have a type vertex attribute in igraph, this is boolean and FALSE for the vertices of the first kind and TRUE for vertices of the second kind.
So you could do something like this (igraph 1.0.1):
library(igraph)
set.seed(123)
# generate random bipartite graph.
g <- sample_bipartite(10, 5, p=.4)
# check the type attribute:
V(g)$type
# define color and shape mappings.
col <- c("steelblue", "orange")
shape <- c("circle", "square")
plot(g,
vertex.color = col[as.numeric(V(g)$type)+1],
vertex.shape = shape[as.numeric(V(g)$type)+1]
)
Check also ?bipartite.
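If a two-row layout is wanted, igraph's layout_as_bipartite() places the two vertex types on separate rows based on the type attribute. A small sketch on a hand-made graph (the vertices and edges are invented for illustration):

```r
library(igraph)

# Three vertices of the first kind (FALSE) and two of the second (TRUE).
g <- make_bipartite_graph(
  types = c(FALSE, FALSE, FALSE, TRUE, TRUE),
  edges = c(1, 4,  2, 4,  2, 5,  3, 5)
)

# One row of coordinates per vertex; each type gets its own row in the plot.
l <- layout_as_bipartite(g)

col <- c("steelblue", "orange")
shape <- c("circle", "square")
plot(g, layout = l,
     vertex.color = col[as.numeric(V(g)$type) + 1],
     vertex.shape = shape[as.numeric(V(g)$type) + 1])
```

Edge weights, if present, can be passed through edge.width in the same plot() call.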
Using the example provided by the OP in the comments: since the graph is multipartite, and given the provided data format, I would first create a bipartite graph and then add the additional edges. Note that although the resulting graph returns TRUE for is_bipartite(), the type attribute is specified as numeric instead of logical and may not work properly with other bipartite functions.
set.seed(123)
V1 <- sample(LETTERS[1:10], size = 10, replace = TRUE)
V2 <- sample(1:10, size = 10, replace = TRUE)
d <- data.frame(V1 = V1, V2 = V2, weights = runif(10))
d
> d
V1 V2 weights
1 C 10 0.8895393
2 H 5 0.6928034
3 E 7 0.6405068
4 I 6 0.9942698
5 J 2 0.6557058
6 A 9 0.7085305
7 F 3 0.5440660
8 I 1 0.5941420
9 F 4 0.2891597
10 E 10 0.1471136
g <- graph_from_data_frame(d, directed = FALSE)
V(g)$label <- V(g)$name # set labels.
# create a graph connecting central node FOO to each V2.
e <- expand.grid(V2 = unique(d$V2), V2 = "FOO")
> e
V2 V2
1 10 FOO
2 5 FOO
3 7 FOO
4 6 FOO
5 2 FOO
6 9 FOO
7 3 FOO
8 1 FOO
9 4 FOO
g2 <- graph.data.frame(e, directed = FALSE)
# join the two graphs.
g <- g + g2
# set type.
V(g)$type <- 1
V(g)[name %in% 1:10]$type <- 2
V(g)[name %in% "FOO"]$type <- 3
V(g)$type
> V(g)$type
[1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3
col <- c("steelblue", "orange", "green")
shape <- c("circle", "square", "circle")
library(rTRM) # Bioconductor package containing layout.concentric()
# the first element in the list for concentric is the central node.
l <- layout.concentric(g, concentric = list("FOO", 1:10, LETTERS[1:10]))
plot(g,
layout = l,
vertex.color = col[V(g)$type],
vertex.shape = shape[V(g)$type],
edge.width = E(g)$weights * 5 # optional, plot edges width proportional to weights.
)
The function layout.concentric() is in (my) package rTRM, available from Bioconductor. It is really a simple implementation I wrote to do exactly what you want. I am not completely sure whether the latest igraph version has the same functionality though (it may be).
For the example you provided, I would recommend using the x and y attributes for visualizing a bipartite graph. E.g.:
V(g)$x <- c(1, 1, 1, 2, 2, 2, 2)
V(g)$y <- c(3, 2, 1, 3.5, 2.5, 1.5, 0.5)
V(g)$shape <- shape[as.numeric(V(g)$type) + 1]
V(g)$color <- c('red', 'blue', 'green', 'steelblue', 'steelblue', 'steelblue', 'steelblue')
E(g)$color <- 'gray'
E(g)$color[E(g)['A' %--% V(g)]] <- 'red'
E(g)$color[E(g)['B' %--% V(g)]] <- 'blue'
E(g)$color[E(g)['C' %--% V(g)]] <- 'green'
plot(g)
EDIT: added code to give the vertices and edges different colors for clarity.
Or you can use the multigraph package.
library(multigraph) # also loads multiplex, which provides read.dl()
swomen <- read.dl(file = "http://moreno.ss.uci.edu/davis.dat")
bmgraph(swomen, layout = "force", seed = 1, cex = 3, tcex = .8, pch = c(19, 15), lwd = 2,
vcol = 2:3, ecol = 8, rot = 65)
that can produce the binomial projection of the two-mode data set

How to create a multi-dimensional barchart

I'm trying to create a bar chart with lattice, which has two groupings. The first grouping is stacked, whereas the second is not. For example:
a <- factor(rep(c(1,2), times = 6))
b <- factor(rep(c(1,2,3), times = 4))
c <- factor(rep(c(1,2,3,4), times = 3))
d <- factor(rep(c("true", "false"), each = 6))
e <- factor(rep(c("yes", "no", "may be"), each = 4))
value <- c(5,8,2,4,1,8,9,3,5,6,3,12)
At the moment I'm doing the following:
a <- factor(rep(c(1,2), times = 6))
b <- factor(rep(c(1,2,3), times = 4))
c <- factor(rep(c(1,2,3,4), times = 3))
d <- factor(rep(c("true", "false"), each = 6))
e <- factor(rep(c("yes", "no", "may be"), each = 4))
value <- c(5,8,2,4,1,8,9,3,5,6,3,12)
barchart(value ~ a | b + c,
groups = d, stack = FALSE,
auto.key=TRUE,
scales = list(x = "free"))
This results in nlevels(b)*nlevels(c) sets of barplots, each with nlevels(a) sets of bars. Each set of bars has a bar for "true" and a bar for "false". What I would also like to add is the stacked value of e, such that each "true" bar will be divided into three sections: the bottom one will be for "yes", then "no" and then "may be", and the same with the "false" bar.
I realise that the graph will be quite complex, however it is the best way to represent the data which I have. Adding e in the formula, as in b + c + e is not an option, as I already have a set of plots and I need to keep to the same format, as they are related to each other. On the other hand having 6 bars in each set will make readability much harder.
Thanks!
ggplot2 will do the job relatively easily, if using lattice isn't a hard requirement for you. I took the liberty of expanding your data set so that all of the combinations of a, b, c, d, and e would be present.
# Load required packages
require(ggplot2)
require(plyr)
# Make factors with the same levels as in the original post
# but 100x longer, and in random order so all combinations are present
a <- sample(factor(rep(c(1,2), times = 600)))
b <- sample(factor(rep(c(1,2,3), times = 400)))
c <- sample(factor(rep(c(1,2,3,4), times = 300)))
d <- sample(factor(rep(c("true", "false"), each = 600)))
e <- sample(factor(rep(c("yes", "no", "may be"), each = 400)))
value <- runif(1200)
# Put them in a data frame
df <- data.frame(a=a, b=b, c=c, d=d, e=e, value=value)
# Calculate the sum of the value columns for each unique combination of a, b, c, d, and e
# I think this is what you'd like - am not totally sure
ds <- ddply(df, c("a", "b", "c", "d", "e"), summarise, sum.value=sum(value, na.rm=TRUE))
# Make the plot
ggplot(ds, aes(x=d, y=sum.value, fill=e)) + geom_bar(stat="identity") +
facet_grid(a~b+c) +
theme(axis.text.x=element_text(angle=-90))
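A note on the summing step: sum() only understands na.rm for dropping missing values. Any other named argument is passed through ... and added to the total as a number. The toy values below are made up to show the effect:

```r
x <- c(1, 2)

# 'na.omit' is not an argument of sum(); TRUE is coerced to 1 and added.
wrong <- sum(x, na.omit = TRUE)

# 'na.rm' is the argument that drops missing values before summing.
right <- sum(c(1, NA, 2), na.rm = TRUE)
```

Here wrong is 4 rather than 3, so every per-group total would be off by one.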
