igraph in R: Add edges between vertices with shared attributes - r

I'm trying to create a graph in R using igraph based on rules. I have a graph with nodes, each of which has several attributes. I'd like to add edges based on those attributes. Toy example:
library(igraph)
make_empty_graph() %>%
add_vertices(
nv = 5,
attr = list(
this_attr = sample(c("a", "b"), 5, replace = TRUE)
)
) %>%
{something here to add edges where this_attr is the same}
This appears to be a solution if I were using Gremlin in Python, but I don't grok it/igraph enough to translate to igraph: Gremlin: adding edges between nodes having the same property
If tidygraph would make this easier, that'd be an acceptable dependency.
Any help would be appreciated.
Edit: This works but feels super messy.
g <- igraph::make_empty_graph() %>%
igraph::add_vertices(
nv = 5,
attr = list(
sample_attr = sample(c("a", "b"), 5, replace = TRUE)
)
)
g %>%
igraph::vertex_attr() %>%
unname() %>%
purrr::map(
function(this_attribute) {
unique(this_attribute) %>%
purrr::map(
function(this_value) {
utils::combn(
which(this_attribute == this_value), 2
) %>%
as.integer()
}
) %>% unlist()
}
) %>%
unlist() %>%
igraph::add_edges(g, .)
Something similar but cleaner would be fantastic.

I don't think igraph has anything as succinct as the Gremlin example, where a single general statement connects any vertex (A) to any vertex (B) if they share an attribute. However, R provides a bunch of ways to do this with matrices (as @Julius showed) and data frames. Below is how I'd go about this problem with igraph and R.
Given the following graph:
set.seed(4321)
g <- make_empty_graph() %>%
add_vertices(nv = 5, attr = list(sample_attr = sample(c("a", "b"), 5, replace = TRUE)))
We can make a data frame with information taken from the vertices and then left_join it to itself using the attribute column. I'm assuming direction doesn't matter here and that we want to get rid of duplicates. If that is the case, then simply filter the node columns using a < operator.
edge_list <- data.frame(
#id = V(g)$name #if it has a name.....
id = 1:vcount(g), #if no name exists, then the order of a vertex represents an id
attr = V(g)$sample_attr #the first item in this vector corresponds to the first vertex/node
) %>%
dplyr::left_join(., ., by = 'attr') %>% #join the data frame with itself
dplyr::filter(id.x < id.y) #remove self pointing edges and duplicates
# 1 %--% 2 equals 2 %--% 1 connection and are duplicates
Once we have the edge list in a data frame, we need to convert the pair of node columns into a pairwise vector. This can be done by converting the columns into a matrix, transposing the matrix so that the rows become columns, then flattening the matrix into a single (pairwise) vector.
edge_vector <- edge_list %>%
dplyr::select(id.x, id.y) %>% #select only the node/vertex columns
as.matrix %>% #convert into a matrix so we can make a pairwise vector
t %>% #transpose matrix because matrices convert to vectors by columns
c #now we have a pairwise vector
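To see why the transpose matters, here is a minimal base R sketch. The `pairs` data frame is a made-up stand-in for the `id.x`/`id.y` columns above, not output from the sampled graph.

```r
# Hypothetical edge list: edges (1,3), (1,4) and (3,4)
pairs <- data.frame(id.x = c(1, 1, 3), id.y = c(3, 4, 4))

# R flattens matrices column by column, so without t() the
# "from" ids come first and the pairs are scrambled
wrong <- c(as.matrix(pairs))
wrong
# [1] 1 1 3 3 4 4

# Transposing first keeps each from/to pair adjacent,
# which is the layout add_edges() expects
right <- c(t(as.matrix(pairs)))
right
# [1] 1 3 1 4 3 4
```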
Now, all we need to do is add the pairwise vector and the associated attributes to the graph.
g <- add_edges(g,
edge_vector,
attr = list(this_attr = edge_list$attr)) #order of pairwise vector matches order of edgelist
Let's plot this to see if it worked.
set.seed(4321)
plot(g,
vertex.label = V(g)$sample_attr,
vertex.color = ifelse(V(g)$sample_attr == 'a', 'pink', 'skyblue'),
edge.arrow.size = 0)
Another potential solution is to start with a data frame instead of an empty graph. The data frame would represent a node list that we can join to itself and create an edge list.
set.seed(4321)
node_list <- data.frame(id = 1:5,
attr= sample(c('a', 'b'), 5, replace = T))
edge_list <- merge(node_list, node_list, by = 'attr') %>% #base R merge
.[.$id.x < .$id.y, c('id.x', 'id.y', 'attr')] #rearrange columns in base so first two are node ids
g <- graph_from_data_frame(d = edge_list, directed = F, vertices = node_list)
set.seed(4321)
plot(g,
vertex.label = V(g)$attr,
vertex.color = ifelse(V(g)$attr == 'a', 'pink', 'skyblue'),
edge.arrow.size = 0)
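For comparison, the `combn()` idea from the question can be written a bit more compactly with `split()`. This is only a sketch: `same_attr_edges` is a hypothetical helper, not an igraph function, and the attribute vector is fixed by hand instead of sampled. The length check is there because `combn(x, 2)` treats a single positive integer `x` as `seq_len(x)`, which would produce wrong pairs for a group with only one vertex.

```r
library(igraph)

# Hypothetical helper: flat edge vector linking every pair of
# vertices that share the same attribute value
same_attr_edges <- function(attr) {
  groups <- split(seq_along(attr), attr)   # vertex ids per attribute value
  pairs <- lapply(groups, function(ids) {
    if (length(ids) < 2) return(integer(0))  # nothing to connect
    as.vector(combn(ids, 2))                 # columns of combn() are pairs
  })
  unlist(pairs, use.names = FALSE)
}

attr <- c("a", "b", "a", "a", "b")          # fixed values for illustration
g <- make_empty_graph(n = 5, directed = FALSE)
V(g)$this_attr <- attr
g <- add_edges(g, same_attr_edges(attr))
ecount(g)
# [1] 4
```

Vertices 1, 3 and 4 share "a" (three pairs) and vertices 2 and 5 share "b" (one pair), which gives the four edges.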

Given a graph,
g <- make_empty_graph() %>%
add_vertices(nv = 5, attr = list(this_attr = sample(c("a", "b"), 5, replace = TRUE)))
we can first define this adjacency matrix in terms of the attribute
(auxAdj <- tcrossprod(table(1:gorder(g), V(g)$this_attr)) - diag(gorder(g)))
# 1 2 3 4 5
# 1 0 1 1 1 0
# 2 1 0 1 1 0
# 3 1 1 0 1 0
# 4 1 1 1 0 0
# 5 0 0 0 0 0
and use it to add edges as in
g <- add_edges(g, c(t(which(auxAdj == 1, arr.ind = TRUE))))
where
c(t(which(auxAdj == 1, arr.ind = TRUE)))
# [1] 2 1 3 1 4 1 1 2 3 2 4 2 1 3 2 3 4 3 1 4 2 4 3 4
meaning that we want edges (2,1), (3,1), (4,1) and so on.
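Note that the vector above lists every pair in both directions. If the graph is meant to be undirected and each pair should appear only once, one possible variation is to keep only the upper triangle of the matrix. The sketch below assumes a hand-picked attribute vector and uses `outer()` in place of `tcrossprod(table(...))`; it is an alternative construction, not the original answer's method.

```r
library(igraph)

attr <- c("a", "a", "a", "a", "b")         # fixed values for illustration
g <- make_empty_graph(n = 5, directed = FALSE)
V(g)$this_attr <- attr

# TRUE where two vertices share the attribute value
same <- outer(attr, attr, "==")
# Drop the diagonal and the lower triangle so only pairs i < j remain
same[!upper.tri(same)] <- FALSE

edge_vec <- c(t(which(same, arr.ind = TRUE)))
edge_vec
# [1] 1 2 1 3 2 3 1 4 2 4 3 4

g <- add_edges(g, edge_vec)
ecount(g)
# [1] 6
```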

Related

Return list index from a list of tidygraph objects in R?

So, I have a list of multiple tidygraph objects and what I'm trying to do is return the index of a specific tidygraph object, selected by the user. Hopefully my example below will explain the problem.
(ASIDE: I have attempted a solution that I show below, but at the moment it is super slow to run. I'm hoping to come up with a different, faster solution.)
To begin, I create some data to turn into tidygraph objects, then I create the tidygraph objects and put them all together into a list:
library(tidygraph)
# create some data for the tbl_graph
nodes <- data.frame(name = c("Hadley", "David", "Romain", "Julia"),
level = c(1,1,1,1),
rank = c(1,1,1,1))
nodes1 <- data.frame(name = c("Hadley", "David", "Romain", "Julia"),
level = c(1,1,1,1),
rank = c(2,2,2,2))
nodes2 <- data.frame(name = c("Hadley", "David", "Romain", "Julia"),
level = c(1,1,1,1),
rank = c(3,3,3,3))
nodes3 <- data.frame(name = c("Hadley", "David", "Romain", "Julia"),
level = c(2,2,2,2),
rank = c(1,1,1,1))
edges <- data.frame(from = c(1, 1, 1, 2, 3, 3, 4, 4, 4),
to = c(2, 3, 4, 1, 1, 2, 1, 2, 3))
# create the tbl_graphs
tg <- tbl_graph(nodes = nodes, edges = edges)
tg_1 <- tbl_graph(nodes = nodes1, edges = edges)
tg_2 <- tbl_graph(nodes = nodes2, edges = edges)
tg_3 <- tbl_graph(nodes = nodes3, edges = edges)
# put into list
myList <- list(tg, tg_1, tg_2, tg_3)
For clarity, looking at the 1st list element looks like this:
> myList[1]
[[1]]
# A tbl_graph: 4 nodes and 9 edges
#
# A directed simple graph with 1 component
#
# Node Data: 4 × 3 (active)
name level rank
<chr> <dbl> <dbl>
1 Hadley 1 1
2 David 1 1
3 Romain 1 1
4 Julia 1 1
#
# Edge Data: 9 × 2
from to
<int> <int>
1 1 2
2 1 3
3 1 4
# … with 6 more rows
We can see that each object has a variable called level and another called rank. What I'm trying to do is return the list index of an object by selecting the level and rank number. So, for example, if I select level = 1 and rank = 2, my function would return the index of the object with those values (in this case the 2nd list element). My attempted solution to this is below, but it's a very slow process. I was wondering if there is a better way to achieve what I want?
My Attempted Solution
In my solution, I begin by turning each of the tidygraph objects into a tibble to make them easier to manipulate, and this is why my function is so slow. In my data, I could have up to 200,000 tidygraph objects in a list, so going through them and converting them all to tibbles is a very slow process. I do that like so:
# separating out the list to make it easier to manipulate
list_obj <- lapply(myList, function(x){
edges <- tidygraph::activate(x, edges) %>% tibble::as_tibble()
nodes <- tidygraph::activate(x, nodes) %>% tibble::as_tibble()
return(list(edges = edges, nodes = nodes))
} )
And then this is the function I actually use to extract the index of the chosen object:
# this function returns the tree index asked for by user
getTreeListNumber <- function(listObj, level, rank){
res <- 0
listNumber <- NA
for(i in 1:length(listObj)){
res <- level %in% listObj[[i]]$nodes$level && rank %in% listObj[[i]]$nodes$rank
if(res == TRUE){
listNumber <- i
}
}
return(listNumber)
}
For example:
> getTreeListNumber(list_obj, level = 1, rank = 2)
[1] 2
By selecting the level and rank, the function returns the object's index within the list. But is there a faster way to achieve this result?
You may try -
getTreeListNumber <- function(listObj, level, rank){
which(sapply(listObj, function(x) {
nodes <- tidygraph::activate(x, nodes) %>% tibble::as_tibble()
all(nodes$level == level & nodes$rank == rank)
}))
}
getTreeListNumber(myList, 1, 2)
#[1] 2
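Since a `tbl_graph` is also an igraph object, another option that may avoid the tibble conversion altogether is to read the node attributes directly with `igraph::vertex_attr()`. The following is a sketch on a reduced, made-up two-graph list; `find_graph_index` is a hypothetical name, and whether this is actually faster on 200,000 graphs would need to be measured.

```r
library(tidygraph)

# Reduced stand-in data: two graphs that differ only in rank
nodes_a <- data.frame(name = c("Hadley", "David"), level = c(1, 1), rank = c(1, 1))
nodes_b <- data.frame(name = c("Hadley", "David"), level = c(1, 1), rank = c(2, 2))
edges <- data.frame(from = 1, to = 2)
graphs <- list(tbl_graph(nodes = nodes_a, edges = edges),
               tbl_graph(nodes = nodes_b, edges = edges))

# Hypothetical helper: index of every graph whose nodes all carry
# the requested level and rank, read straight from the igraph object
find_graph_index <- function(graph_list, level, rank) {
  hit <- vapply(graph_list, function(g) {
    all(igraph::vertex_attr(g, "level") == level) &&
      all(igraph::vertex_attr(g, "rank") == rank)
  }, logical(1))
  which(hit)
}

find_graph_index(graphs, level = 1, rank = 2)
# [1] 2
```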

Select data frame values row-wise using a variable of column names

Suppose I have a data frame that looks like this:
dframe = data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
# x y
# 1 1 4
# 2 2 5
# 3 3 6
And a vector of column names, one per row of the data frame:
colname = c('x', 'y', 'x')
For each row of the data frame, I would like to select the value from the corresponding column in the vector. Something similar to dframe[, colname] but for each row.
Thus, I want to obtain c(1, 5, 3) (i.e. row 1: col "x"; row 2: col "y"; row 3: col "x")
My favourite old matrix-indexing will take care of this. Just pass a 2-column matrix with the respective row/column index:
rownames(dframe) <- seq_len(nrow(dframe))
dframe[cbind(rownames(dframe),colname)]
#[1] 1 5 3
Or, if you don't want to add rownames:
dframe[cbind(seq_len(nrow(dframe)), match(colname,names(dframe)))]
#[1] 1 5 3
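One caveat worth knowing: indexing a data frame with a matrix goes through `as.matrix()`, so if the columns have different types the result is coerced to a common type. A small sketch with a made-up data frame (`mixed` and `picks` are illustrative names, not from the question):

```r
# Made-up data frame with one numeric and one character column
mixed <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"),
                    stringsAsFactors = FALSE)
picks <- c("x", "y", "x")

# Same row/column index matrix as in the answer above
idx <- cbind(seq_len(nrow(mixed)), match(picks, names(mixed)))
res <- mixed[idx]
res
# [1] "1" "b" "3"
```

With all-numeric columns, as in the question, no coercion happens and the result stays numeric.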
Another option is mapply, passing the row numbers of dframe and the vector of column names so that each call returns the value for one row:
dframe = data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
colname = c('x', 'y', 'x')
mapply(function(x,y)dframe[x,y],1:nrow(dframe), colname)
#[1] 1 5 3
The next option may not be very intuitive, but if someone wants a solution in a dplyr chain, one way using gather is:
library(tidyverse)
data.frame(colname = c('x', 'y', 'x'), stringsAsFactors = FALSE) %>%
rownames_to_column() %>%
left_join(dframe %>% rownames_to_column() %>%
gather(colname, value, -rowname),
by = c("rowname", "colname" )) %>%
select(rowname, value)
# rowname value
# 1 1 1
# 2 2 5
# 3 3 3

Using visNetwork to dynamically update nodes in R

The snapshot below was created using the "visNetwork" package. My problem is that I have to hard-code the edges, and after using visHierarchicalLayout() the nodes no longer appear in order. Please help me with a dynamic approach so that, no matter how many numbers there are, I get consecutive numbers in order without hard-coding. Thanks.
library(visNetwork)
nodes <- data.frame(id = 1:7, label = 1:7)
edges <- data.frame(from = c(1,2,3,4,5,6),
to = c(2,3,4,5,6,7))
visNetwork(nodes, edges, width = "100%") %>%
visEdges(arrows = "to") %>%
visHierarchicalLayout()
Using the level attribute does the job: it aligns the network based on the order given.
library(visNetwork)
nodes <- data.frame(id = 1:7, label = 1:7, level = 1:7)
# Extract the id
num <- nodes$id
# Repeat the numbers
num2 <- rep(num, each = 2)
# Remove the first and last numbers
num3 <- num2[c(-1, -length(num2))]
#Create a data frame
edges <- as.data.frame(matrix(num3, ncol = 2, byrow = TRUE))
names(edges) <- c("from", "to")
visNetwork(nodes, edges, width = "100%") %>%
visEdges(arrows = "to") %>%
visHierarchicalLayout()
If I understand your question correctly, you want to create the edges data frame based on the id in the nodes data frame. Here is one option.
# Extract the id
num <- nodes$id
# Repeat the numbers
num2 <- rep(num, each = 2)
# Remove the first and last numbers
num3 <- num2[c(-1, -length(num2))]
# Create a data frame
edges <- as.data.frame(matrix(num3, ncol = 2, byrow = TRUE))
names(edges) <- c("from", "to")
edges
# from to
# 1 1 2
# 2 2 3
# 3 3 4
# 4 4 5
# 5 5 6
# 6 6 7
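The same chain of consecutive edges can also be built with `head()` and `tail()`, which may be easier to read than the repeat-and-reshape steps. A minimal base R sketch, assuming the node ids are already in the desired order:

```r
nodes <- data.frame(id = 1:7, label = 1:7, level = 1:7)
ids <- nodes$id

# Each id points to the one that follows it:
# drop the last id for "from", drop the first id for "to"
edges <- data.frame(from = head(ids, -1), to = tail(ids, -1))
edges
#   from to
# 1    1  2
# 2    2  3
# 3    3  4
# 4    4  5
# 5    5  6
# 6    6  7
```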

Deduplicating a data frame when the order of values may differ in R

Let's say I have a data.frame that looks like this:
df = data.frame(from=c(1, 1, 2, 1),
to=c(2, 3, 1, 4),
title=c("A", "B", "A", "A"),
stringsAsFactors=F)
df is an object that holds all of the various connections for a network graph. I also have a second data.frame, which is the simplified graph data:
df2 = data.frame(from=c(1, 1, 3),
to=c(2, 4, 1),
stringsAsFactors=F)
What I need is to pull the title values from df into df2. I can't simply dedup df because a) from and to can be in different orders, and b) title is not unique between connections. The current condition I have is:
df2$title = df$title[df2$from == df$from & df2$to == df$to]
However, this results in too few rows due to the order of from and to being reversed in row 2 of df2. If I introduce an OR condition, then I get too many results because the connection between 1 and 2 will be matched twice.
My question, then, is how do I effectively "dedup" the title variable to append it to df2?
The expected outcome is this:
from to title
1 1 2 A
2 1 4 A
3 3 1 B
library(dplyr);
merge(mutate(df2, from1 = pmin(from, to), to1 = pmax(from, to)),
mutate(df, from1 = pmin(from, to), to1 = pmax(from, to)),
by = c("from1", "to1"), all.x = T) %>%
select(from1, to1, title) %>% unique()
# from1 to1 title
#1 1 2 A
#3 1 3 B
#4 1 4 A
Another way we can try: the edgeSort function produces the same key for two edges that connect the same pair of vertices, regardless of order, and match is then used to look up the matching edge in df.
edgeSort <- function(df) apply(df, 1, function(row) paste0(sort(row[1:2]), collapse = ", "))
df2$title <- df$title[match(edgeSort(df2), edgeSort(df))]
df2
from to title
1 1 2 A
2 1 4 A
3 3 1 B
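A vectorised variant of the same idea builds the order-independent key with `pmin()` and `pmax()` instead of `apply()`, which also avoids converting each row to character before sorting. This is a sketch; `edge_key` is a hypothetical helper name.

```r
df <- data.frame(from = c(1, 1, 2, 1),
                 to = c(2, 3, 1, 4),
                 title = c("A", "B", "A", "A"),
                 stringsAsFactors = FALSE)
df2 <- data.frame(from = c(1, 1, 3),
                  to = c(2, 4, 1))

# Hypothetical helper: smaller id first, larger id second,
# so (1, 2) and (2, 1) produce the same key
edge_key <- function(d) paste(pmin(d$from, d$to), pmax(d$from, d$to), sep = "-")

# match() returns the first matching row of df for each row of df2
df2$title <- df$title[match(edge_key(df2), edge_key(df))]
df2
#   from to title
# 1    1  2     A
# 2    1  4     A
# 3    3  1     B
```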
I guess you can do it in base R by 2 merge statements:
step1 <- merge(df2, df, all.x = TRUE)
step2 <- merge(df2[is.na(step1$title),], df, all.x = TRUE, by.x = c("to", "from"), by.y = c("from", "to"))
rbind(step1[!is.na(step1$title),], step2)
from to title
1 1 2 A
2 1 4 A
3 3 1 B

Can't sort column with R

I want to build, from the dataset, a list of words and their frequencies. I did that and saved the result in a variable named 'mylist'. Now I want to sort the list by word frequency and create a barplot of the 10 most frequent words.
But I have not managed to sort it. I tried many ways to convert 'mylist' to a data.frame or data.table, but the frequency column still stays a list.
To sum up, I have the DT variable, which has 2 columns: 'x' contains the words and its type is character.
The second column, 'v', contains the frequencies and is a list.
I can't manage to sort it by frequency.
Please help me.
library(ggplot2)
library(MASS)
#get the data
data.uri = "http://www.crowdflower.com/wp-content/uploads/2016/03/gender-classifier-DFE-791531.csv"
pwd = getwd()
data.file.name = "gender.csv"
data.file = file.path(pwd, data.file.name)
download.file(data.uri, data.file)
data = read.csv(data.file.name)
#manipulate the data
data <- data[data$X_unit_id < 815719694,]
print(data$X_unit_id)
#get all female has white sidebar
female_colors <- subset(data, data$gender=="female")
female_colors$fav_number
#get all male fav_numbers
male_colors <- subset(data, data$gender=="male")
male_colors$fav_number
text_male = subset(data, data$gender=="male")
text_male = text_male$text
print(text_male[1])
print(length(text_male))
v <- text_male[1:length(text_male)]
print(v)
print (v[1])
count_of_list = 0;
x = list()
for ( i in v) {
# Merge the two lists.
x <- c(x,unlist(strsplit(i," ")))
}
count = 0;
mylist = list()
for (word in x){
for (xWord in x){
if (word == xWord)
count = count + 1;
}
key <- word
value <- count
mylist[[ key ]] <- value
count = 0;
}
library(data.table)
require(data.table)
DT = data.table(x=c(names(mylist)),v=c(mylist))
DT
As suggested in the comments, a reproducible example would make it easier to write an answer that helps you. I will suggest an approach anyway. Try to adapt this procedure to your data.
Convert your list to a dataframe and use order:
df <- as.data.frame(your.data)
df <- data.frame(id = c("B", "A", "D", "C"), y = c(6, 8, 1, 5))
df
id y
1 B 6
2 A 8
3 D 1
4 C 5
df2 <- df[order(df$id), ]
df2
id y
2 A 8
1 B 6
4 C 5
3 D 1
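Applied to a named list of counts like the question's `mylist`, the same idea might look as follows. This is a sketch under the assumption that each list element holds a single number; the three-word list is a made-up stand-in. The key step is `unlist()`, which turns the list into a plain numeric vector so the column can be ordered.

```r
# Stand-in for the question's mylist: word -> count
mylist <- list(bat = 2, ant = 3, dog = 1)

# unlist() gives a numeric vector, so freq is an ordinary column
counts <- data.frame(word = names(mylist),
                     freq = unlist(mylist),
                     row.names = NULL,
                     stringsAsFactors = FALSE)

# Sort by frequency, highest first
sorted <- counts[order(counts$freq, decreasing = TRUE), ]
sorted
#   word freq
# 2  ant    3
# 1  bat    2
# 3  dog    1
```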
It looks like you're using a cumbersome way to calculate the word counts. Something like this is faster and simpler:
library(dplyr)
foo <- c("ant", "ant", "bat", "dog","egg","ant","bat")
bar <- rnorm(7, 5, 2)
df <- data.frame(foo, bar)
group_by(df, foo) %>% summarise(n = n()) %>% arrange(desc(n))
foo n
(fctr) (int)
1 ant 3
2 bat 2
3 dog 1
4 egg 1
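A base R route to the same result, including the top-10 barplot the question asks for, could use `table()` and `sort()`. This is a sketch on made-up text; `texts` stands in for the question's `text_male` column.

```r
# Made-up stand-in for the text column
texts <- c("the cat sat", "the dog sat", "the end")

# Split every string on spaces and count each word
words <- unlist(strsplit(texts, " ", fixed = TRUE))
freq <- sort(table(words), decreasing = TRUE)

# Keep at most the 10 most frequent words
top <- head(freq, 10)
word_counts <- data.frame(word = names(top),
                          n = as.integer(top),
                          stringsAsFactors = FALSE)
word_counts

# Bar chart of the most frequent words
barplot(word_counts$n, names.arg = word_counts$word)
```

Here "the" appears three times and "sat" twice, so they come first; the remaining words each appear once.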
