Assign colour to chosen communities in a plot - r

In a plot, I need to colour two specific communities. Take the following data frame:
A B C D E F G
A 0 1 0 1 0 1 0
B 1 0 1 1 0 1 0
C 0 1 0 0 0 0 0
D 1 1 0 0 1 1 0
E 0 0 0 1 0 1 0
F 1 1 0 1 1 0 1
G 0 0 0 0 0 1 0
ob <- read.csv("...ties.csv",sep = ",", header = TRUE, row.names = 1)
m <- as.matrix(ob)
g <- graph.adjacency(m, mode="undirected", weighted = T, add.rownames = T)
First, I detect the communities (com) of my graph g using edge.betweenness:
com <- edge.betweenness.community(g)
V(g)$memb <- com$membership
This operation produces a number of communities, com[[1]], com[[2]], etc. I plot the resulting graph, with each community in its own colour, using the following code:
plot(g, vertex.color=membership(com))
Now, how do I colour only two chosen communities, say com[[1]] and com[[2]], keeping the rest of the nodes homogeneous?

I had to tweak your adjacency matrix so that more than 1 community showed up.
library(igraph)
ob <- read.table(text="
A B C D E F G
A 0 1 0 1 0 1 0
B 1 0 1 1 0 1 0
C 0 0 0 0 0 0 0
D 1 1 0 0 1 0 0
E 0 0 0 1 0 1 1
F 0 1 0 0 1 0 1
G 0 0 0 0 1 1 0", header=TRUE)
m <- as.matrix(ob)
g <- graph.adjacency(m, mode="undirected", weighted = T, add.rownames = T)
com <- edge.betweenness.community(g)
V(g)$memb <- com$membership
cols <- membership(com)        # one community id per vertex
cols[cols != 3] <- 1           # give every community except 3 the same colour
plot(g, vertex.color = cols)
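The code above highlights a single community. To colour two chosen communities and keep the rest homogeneous, one option is to map each chosen community id to its own colour and give every other vertex a neutral one. This is only a minimal sketch in base R: the memb vector, the chosen ids, and the colour names are illustrative assumptions, not values taken from the graph above.

```r
# Hypothetical membership vector: one community id per vertex
# (with igraph this could come from as.vector(membership(com)))
memb <- c(A = 1, B = 1, C = 2, D = 1, E = 3, F = 3, G = 3)

# Communities to highlight, each with its own colour (illustrative choices)
highlight <- c("1" = "tomato", "3" = "steelblue")

# Start with a neutral colour for every vertex, then overwrite the chosen ones
cols <- rep("grey80", length(memb))
names(cols) <- names(memb)
chosen <- as.character(memb) %in% names(highlight)
cols[chosen] <- highlight[as.character(memb)[chosen]]

print(cols)
# The vector can then be passed to the plot: plot(g, vertex.color = cols)
```

Keeping the colours in a named lookup vector means that choosing different communities only requires changing highlight.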

Related

Calculate degree of a subgraph using r igraph

I know the degree of my global graph, but now I need to find the degrees of nodes within a subgraph. For example, John has four friends in his school, but three friends in his class. How do I instruct igraph to count those three friends in his class, but not the rest in his school?
My global graph
library(igraph)
school <- read.table(text="
A B C D E F G
A 0 1 0 1 0 1 1
B 1 0 1 1 0 1 0
C 0 0 0 0 0 0 1
D 1 1 0 0 1 0 0
E 0 0 0 1 0 1 1
F 0 1 0 0 1 0 1
G 1 0 1 0 1 1 0", header=TRUE)
mat <- as.matrix(school)
g <- graph.adjacency(mat, mode="undirected", add.rownames = T)
My affiliation matrix for classes P, Q, and R
x <- read.table(text="
P Q R
A 1 1 0
B 0 0 1
C 0 0 0
D 1 0 1
E 1 1 0
F 0 1 0
G 1 1 1", header=TRUE)
inc <- as.matrix(x)
ginc <- graph.incidence(inc)
My subgraph for class P
class_nodes <- names(which(inc[,"P"] == 1))
class_adj <- mat[class_nodes, class_nodes]
class_graph <- graph.adjacency(class_adj, mode = "undirected")
I need to calculate the degree of nodes in subgraph "class_graph", but counting only their ties within the subgraph, not the global graph.
You can find all the nodes in class P as follows (we specifically extract the names so that we can look them up in a different graph object):
V(ginc)[.nei("P")]$name
Then you can extract just that subset of connections from the main graph with
subg <- induced.subgraph(g, V(ginc)[.nei("P")]$name)
and you can calculate the degree of those nodes with
degree(subg)
# A D E G
# 2 2 2 2
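The result can be cross-checked without igraph by summing rows of the adjacency matrix restricted to the class members. A minimal sketch, assuming that a tie counts if it is recorded in either direction (the matrix in the question is not perfectly symmetric); the names sym and within_degree are introduced here for illustration only.

```r
# Global adjacency matrix from the question
school <- read.table(text = "
A B C D E F G
A 0 1 0 1 0 1 1
B 1 0 1 1 0 1 0
C 0 0 0 0 0 0 1
D 1 1 0 0 1 0 0
E 0 0 0 1 0 1 1
F 0 1 0 0 1 0 1
G 1 0 1 0 1 1 0", header = TRUE)
mat <- as.matrix(school)

# Members of class P (rows of the affiliation matrix with P == 1)
class_nodes <- c("A", "D", "E", "G")

# Assumption: a tie is present if it is recorded in either direction
sym <- pmax(mat, t(mat))

# Within-class degree: count ties only among the class members
within_degree <- rowSums(sym[class_nodes, class_nodes])
print(within_degree)
# A D E G
# 2 2 2 2
```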

How to plot chosen vertices and total number of edges

I'd like to plot two selected nodes and all their edges, not only the ones that connect these two nodes directly. For example:
library(igraph)
o <- read.table(text="
A B C D E F G
A 0 1 0 1 0 1 1
B 1 0 1 1 0 1 0
C 0 0 0 0 0 0 1
D 1 1 0 0 1 0 0
E 0 0 0 1 0 1 1
F 0 1 0 0 1 0 1
G 1 0 1 0 1 1 0", header=TRUE)
mat <- as.matrix(o)
g <- graph.adjacency(mat, mode="undirected", weighted = T, add.rownames = T)
I'm able to choose two nodes of g using the code below, but the plot includes only the edges that connect them directly. I want all of their edges.
g2 <- induced_subgraph(g, c("A","E"))
plot(g2)
How do I plot the two vertices, and all of their edges? Also, how do I choose path distance for each vertex?
Thanks!
library(igraph)
o <- read.table(text="
A B C D E F G
A 0 1 0 1 0 1 1
B 1 0 1 1 0 1 0
C 0 0 0 0 0 0 1
D 1 1 0 0 1 0 0
E 0 0 0 1 0 1 1
F 0 1 0 0 1 0 1
G 1 0 1 0 1 1 0", header=TRUE)
mat <- as.matrix(o)
g <- graph.adjacency(mat, mode="undirected", weighted = T, add.rownames = T)
plot(g)
# get 1st connections of A and E (their friends)
# specify vertices of interest
vertices_input <- c("A", "E")
# get the ids of the edges from g that correspond to those vertices
ids <- as.numeric(E(g)[from(vertices_input)])
# keep only those edges from g as a sub-graph g2
g2 <- subgraph.edges(g, ids)
plot(g2)
# get up to 2nd connections of A and E (friends of friends)
# get names of vertices of your sub-graph g2 (main vertices and 1st connections)
nms <- V(g2)$name
# get the ids of the edges from g that correspond to main vertices and 1st connections
ids <- as.numeric(E(g)[from(nms)])
# keep only those edges from g as a sub-graph g3
g3 <- subgraph.edges(g, ids)
plot(g3)
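The answer reaches the second step by repeating the first. The same expansion can be written for any path distance k. Below is a rough base R sketch of that idea working directly on the adjacency matrix; within_k is a hypothetical helper written for this illustration, and ties are assumed to count in either direction. igraph's ego() function also takes an order argument for this kind of neighbourhood query.

```r
o <- read.table(text = "
A B C D E F G
A 0 1 0 1 0 1 1
B 1 0 1 1 0 1 0
C 0 0 0 0 0 0 1
D 1 1 0 0 1 0 0
E 0 0 0 1 0 1 1
F 0 1 0 0 1 0 1
G 1 0 1 0 1 1 0", header = TRUE)
mat <- as.matrix(o)
sym <- pmax(mat, t(mat))  # assumption: tie present if recorded in either direction

# Hypothetical helper: names of all vertices within k steps of the start set
within_k <- function(adj, start, k) {
  reached <- start
  frontier <- start
  for (step in seq_len(k)) {
    # columns that have at least one tie to the current frontier
    hits <- colSums(adj[frontier, , drop = FALSE]) > 0
    frontier <- setdiff(colnames(adj)[hits], reached)
    if (length(frontier) == 0) break
    reached <- c(reached, frontier)
  }
  reached
}

one_step <- within_k(sym, c("A", "E"), 1)   # A, E and their friends
two_steps <- within_k(sym, c("A", "E"), 2)  # plus friends of friends
print(one_step)
print(two_steps)
```

The returned names could be passed to induced_subgraph(g, one_step), with the caveat that an induced subgraph also keeps the ties among the neighbours themselves, which subgraph.edges in the answer above does not.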

Measure weight of communities for different subgraphs

I detect communities in my adjacency matrix. In parallel, I create an affiliation matrix using the vertices of the same matrix. How do I measure the weight of the communities in each of the columns of the affiliation matrix?
Take the following adjacency matrix:
A B C D E F G
A 0 1 0 1 0 1 0
B 1 0 1 1 0 1 0
C 0 1 0 0 0 0 0
D 1 1 0 0 1 1 0
E 0 0 0 1 0 1 0
F 1 1 0 1 1 0 1
G 0 0 0 0 0 1 0
I identify the communities:
com <- edge.betweenness.community(g)
V(g)$memb <- com$membership
Now take the following affiliation matrix:
P R Q
A 1 1 0
B 1 0 1
C 1 1 0
D 0 1 0
E 1 0 1
F 0 0 1
G 1 1 0
How do I count the number of vertices corresponding to community [[1]] which are affiliated to the "P" in the affiliation matrix?
You can do sum(m[com[[1]],"P"]>0), given that m holds your affiliation matrix. Or lapply(com, function(x) colSums(m[x, ])) for all communities.
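The same counts for every community and every affiliation can be obtained in one step with base R's rowsum. This is a sketch under an assumption: the memb vector below is a made-up membership standing in for membership(com), since the actual community assignment is not shown here.

```r
# Affiliation matrix from the question
m <- as.matrix(read.table(text = "
P R Q
A 1 1 0
B 1 0 1
C 1 1 0
D 0 1 0
E 1 0 1
F 0 0 1
G 1 1 0", header = TRUE))

# Hypothetical community membership, standing in for membership(com)
memb <- c(A = 1, B = 1, C = 1, D = 1, E = 2, F = 2, G = 2)

# One row per community, one column per affiliation
counts <- rowsum(m, group = memb[rownames(m)])
print(counts)
#   P R Q
# 1 3 3 1
# 2 2 1 2
```

Indexing memb by rownames(m) keeps the two objects aligned even if their vertex order differs.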

Transform data frame

I have a questionnaire with an open-ended question like "Please name up to ten animals", which gives me the following data frame (where each letter stands for an animal):
nrow <- 1000
list <- vector("list", nrow)
for(i in 1:nrow){
na <- rep(NA, sample(1:10, 1))
list[[i]] <- sample(c(letters, na), 10, replace=FALSE)
}
df <- data.frame()
df <- rbind(df, do.call(rbind, list))
head(df)
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
# 1 r <NA> a j w e i h u z
# 2 t o e x d v <NA> z n c
# 3 f y e s n c z i u k
# 4 y <NA> v j h z p i c q
# 5 w s v f <NA> c g b x e
# 6 p <NA> a h v x k z o <NA>
How can I transform this data frame to look like the following data frame? Remember that I don't actually know the column names.
r <- 1000
c <- length(letters)
t1 <- matrix(rbinom(r*c,1,0.5),r,c)
colnames(t1) <- letters
head(t1)
# a b c d e f g h i j k l m n o p q r s t u v w x y z
# [1,] 0 1 0 1 0 0 0 1 0 0 1 1 1 1 0 0 0 1 0 1 0 1 1 0 1 0
# [2,] 1 1 1 1 0 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 1 0 1 0 1
# [3,] 0 1 0 0 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0
# [4,] 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 1 0 1 0 1 1 0 0
# [5,] 1 0 1 1 1 1 1 1 1 0 1 1 0 0 0 0 1 1 0 1 1 0 0 1 0 0
# [6,] 1 1 0 1 1 0 0 1 0 0 1 0 0 0 0 0 1 1 1 0 0 0 1 1 0 1
animals <- as.character(unique(unlist(df)))
animals <- animals[!is.na(animals)]  # drop NA so it does not become a column
td <- data.frame(t(apply(df, 1, function(x) as.numeric(animals %in% x))))
colnames(td) <- animals
If you already have a vector of animal names, such as letters or colnames(t1), you can use it in place of the values computed from df.
You can do the following using tidyr, which could be much faster than other approaches, though I like the approach by @germcd very much. You may need to tinker with the filter step, which removes NAs as well as a blank space that may be an artifact of the simulated data you provided:
require(tidyr)
require(dplyr)  # provides filter()
## Add an ID for each record:
df$id <- 1:nrow(df)
out <- (df %>%
  gather(column, animal, -id) %>%
  filter(animal != " ") %>%
  spread(animal, column)
)
head(out)
This code gathers the unnamed columns into a long format, removes any blank entries or missing data, and then spreads by the unique values of the animal column. This also has the potentially desirable property of preserving the column order in which the animals were named. If that is not desirable, you could easily convert the resulting animal columns to numeric:
out_num <- out
out_num[,-1] <- as.numeric((!is.na(out[,-1])))
head(out_num)
You can try mtabulate from the "qdapTools" package:
library(qdapTools)
head(mtabulate(as.data.frame(t(df))))
# c d i l m o r v x y a f s t k p u b h j n q e g w z
# 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 2 0 1 0 0 1 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
# 3 0 0 1 0 0 0 1 0 1 1 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0
# 4 1 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
# 5 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 0 1 1 0 0 0 0
# 6 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 1 1 0 1 0 1 0 0 0 0
There are, of course, many other options.
For example, cSplit_e from my "splitstackshape" package (with the downside that, inefficiently, you need to paste the values together before you can split them):
library(splitstackshape)
library(dplyr)
As ones and zeroes:
df %>%
mutate(combined = apply(., 1, function(x) paste(na.omit(x), collapse = ","))) %>%
cSplit_e("combined", ",", mode = "binary", type = "character", fill = 0) %>%
select(starts_with("combined_")) %>%
head
# combined_a combined_b combined_c combined_d combined_e combined_f combined_g combined_h combined_i
# 1 0 0 1 1 0 0 0 0 1
# 2 1 0 0 1 0 1 0 0 0
# 3 1 0 0 0 0 0 0 0 1
# 4 0 1 1 0 0 0 0 1 1
# 5 0 1 0 1 0 0 0 1 0
# 6 0 1 0 0 0 0 0 0 0
# combined_j combined_k combined_l combined_m combined_n combined_o combined_p combined_q combined_r
# 1 0 0 1 1 0 1 0 0 1
# 2 0 0 0 1 0 0 0 0 0
# 3 0 1 0 0 0 0 1 0 1
# 4 1 0 1 0 1 0 0 0 0
# 5 0 1 0 0 1 0 1 1 1
# 6 1 1 0 1 0 0 0 1 0
# combined_s combined_t combined_u combined_v combined_w combined_x combined_y combined_z
# 1 0 0 0 1 0 1 1 0
# 2 1 1 0 0 0 0 0 0
# 3 0 1 1 0 0 1 1 0
# 4 0 0 1 0 0 0 1 0
# 5 1 0 0 0 0 0 0 0
# 6 1 1 1 0 0 0 0 0
As the original values:
df %>%
mutate(combined = apply(., 1, function(x) paste(na.omit(x), collapse = ","))) %>%
cSplit_e("combined", ",", mode = "value", type = "character", fill = "") %>%
select(starts_with("combined_")) %>%
head
# combined_a combined_b combined_c combined_d combined_e combined_f combined_g combined_h combined_i
# 1 c d i
# 2 a d f
# 3 a i
# 4 b c h i
# 5 b d h
# 6 b
# combined_j combined_k combined_l combined_m combined_n combined_o combined_p combined_q combined_r
# 1 l m o r
# 2 m
# 3 k p r
# 4 j l n
# 5 k n p q r
# 6 j k m q
# combined_s combined_t combined_u combined_v combined_w combined_x combined_y combined_z
# 1 v x y
# 2 s t
# 3 t u x y
# 4 u y
# 5 s
# 6 s t u
Alternatively, you can use "reshape2":
library(reshape2)
## The values
dcast(melt(as.matrix(df), na.rm = TRUE),
Var1 ~ value, value.var = "value")
## ones and zeroes
dcast(melt(as.matrix(df), na.rm = TRUE),
Var1 ~ value, value.var = "value", fun.aggregate = length)
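For completeness, the ones-and-zeroes version can also be sketched in base R alone by cross-tabulating the row index against the value. The small answers data frame below is made up for illustration; with the real data, df would take its place.

```r
# Small illustrative data frame (hypothetical answers, NA = nothing named)
answers <- data.frame(
  V1 = c("cat", "dog", "cat"),
  V2 = c("dog", NA, "emu"),
  V3 = c(NA, NA, "dog"),
  stringsAsFactors = FALSE
)

m <- as.matrix(answers)
keep <- !is.na(m)

# Cross-tabulate respondent (row index) against the animal named;
# the factor levels keep respondents who named nothing at all
ind <- table(
  respondent = factor(row(m)[keep], levels = seq_len(nrow(m))),
  animal = m[keep]
)

# Convert counts to 0/1 so that an animal named twice still gives 1
ind <- (unclass(ind) > 0) * 1L
print(ind)
```

The columns come out in alphabetical order rather than in the order the animals were first named.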

How to do row-wise subtraction and replace a specific number with zero?

Step 1: I have a simplified dataframe like this:
df1 <- data.frame(B=c(1,0,1), C=c(1,1,0),
                  D=c(1,0,1), E=c(1,1,0), F=c(0,0,1),
                  G=c(0,1,0), H=c(0,0,1), I=c(0,1,0))
B C D E F G H I
1 1 1 1 1 0 0 0 0
2 0 1 0 1 0 1 0 1
3 1 0 1 0 1 0 1 0
Step 2: I want to do row-wise subtraction, i.e. (row1 - row2), (row1 - row3) and (row2 - row3)
row1-row2 1 0 1 0 0 -1 0 -1
row1-row3 0 1 0 1 -1 0 -1 0
row2-row3 -1 1 -1 1 -1 1 -1 1
Step 3: replace all -1 with 0
row1-row2 1 0 1 0 0 0 0 0
row1-row3 0 1 0 1 0 0 0 0
row2-row3 0 1 0 1 0 1 0 1
Could you show me how to do this?
I like using the plyr library for things like this, with the combn function generating all possible pairs of rows/columns.
require(plyr)
combos <- combn(nrow(df1), 2)
adply(combos, 2, function(x) {
out <- data.frame(df1[x[1] , ] - df1[x[2] , ])
out[out == -1] <- 0
return(out)
}
)
Results in:
X1 B C D E F G H I
1 1 1 0 1 0 0 0 0 0
2 2 0 1 0 1 0 0 0 0
3 3 0 1 0 1 0 1 0 1
If necessary, you can drop the first column; plyr spits that out automagically for you.
Similar questions:
Sum pairwise rows with R?
Chi Square Analysis using for loop in R
Compare one row to all other rows in a file using R
For the record, I would do this:
cmb <- combn(seq_len(nrow(df1)), 2)
out <- df1[cmb[1,], ] - df1[cmb[2,], ]
out[out < 0] <- 0
rownames(out) <- apply(cmb, 2,
function(x) paste("row", x[1], "-row", x[2], sep = ""))
This yields (the last line above is a bit of sugar, and may not be needed):
> out
B C D E F G H I
row1-row2 1 0 1 0 0 0 0 0
row1-row3 0 1 0 1 0 0 0 0
row2-row3 0 1 0 1 0 1 0 1
This is fully vectorised and exploits indices to extend/extract the elements of df1 required for the row-by-row operation.
> df2 <- rbind(df1[1,]-df1[2,], df1[1,]-df1[3,], df1[2,]-df1[3,])
> df2
B C D E F G H I
1 1 0 1 0 0 -1 0 -1
2 0 1 0 1 -1 0 -1 0
21 -1 1 -1 1 -1 1 -1 1
> df2[df2==-1] <- 0
> df2
B C D E F G H I
1 1 0 1 0 0 0 0 0
2 0 1 0 1 0 0 0 0
21 0 1 0 1 0 1 0 1
If you'd like to change the name of the rows to those in your example:
> rownames(df2) <- c('row1-row2', 'row1-row3', 'row2-row3')
> df2
B C D E F G H I
row1-row2 1 0 1 0 0 0 0 0
row1-row3 0 1 0 1 0 0 0 0
row2-row3 0 1 0 1 0 1 0 1
Finally, if the number of rows is not known ahead of time, the following should do the trick:
df1 <- data.frame(B=c(1,0,1), C=c(1,1,0), D=c(1,0,1), E=c(1,1,0), F=c(0,0,1), G=c(0,1,0), H=c(0,0,1), I=c(0,1,0))
n <- nrow(df1)
ret <- data.frame()
# loop over every pair of rows (i, j) with i < j
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    diff <- df1[i, ] - df1[j, ]
    rownames(diff) <- paste('row', i, '-row', j, sep = '')
    ret <- rbind(ret, diff)
  }
}
ret[ret == -1] <- 0  # replace every -1 with 0
print(ret)
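Step 3 amounts to clamping negative differences at zero, which pmax can express directly on a matrix. The following is a sketch of that variant, not a replacement for the answers above; it assumes all columns are numeric so that the data frame converts cleanly to a matrix.

```r
df1 <- data.frame(B = c(1, 0, 1), C = c(1, 1, 0), D = c(1, 0, 1), E = c(1, 1, 0),
                  F = c(0, 0, 1), G = c(0, 1, 0), H = c(0, 0, 1), I = c(0, 1, 0))
m <- as.matrix(df1)

# All pairs of row indices (i, j) with i < j, one pair per column
cmb <- combn(nrow(m), 2)

# Subtract row j from row i for every pair, then clamp negatives at 0
out <- pmax(m[cmb[1, ], , drop = FALSE] - m[cmb[2, ], , drop = FALSE], 0)
rownames(out) <- paste0("row", cmb[1, ], "-row", cmb[2, ])
print(out)
```

Unlike ret[ret == -1] <- 0, the pmax call also clamps differences below -1, which matters only if the data are not strictly 0/1.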
