How to specify the labels of vertices in R

I have a matrix as below:
jerry peter king
jerry 1 0 0
peter 0 1 0
king 1 1 1
Now I am trying to draw a graph representing the matrix with the code below:
t <- read.table("../data/table.dat");
adjm <- data.matrix(t);
g1 <- graph.adjacency(adjm,add.colnames=NULL);
plot(g1, main="social network", vertex.color="white", edge.color="grey", vertex.size=8,
vertex.frame.color="yellow");
The labels of the vertices are the ids, so my question is: how do I set the vertex labels from the dimnames of the matrix?
I have tried the code
vertex.label=attr(adjm,"dimnames")
but I get the wrong graph.

There are 2 ways to do this:
When you create the graph object, assign the names to a vertex attribute called label. This is the default that plot.igraph() looks for when plotting.
g1 <- graph.adjacency(adjm,add.colnames='label')
Use the V iterator to extract the name vertex attribute, which is how they are stored if you use add.colnames=NULL.
plot(g1, main="social network", vertex.color="white", edge.color="grey", vertex.size=8, vertex.frame.color="yellow", vertex.label=V(g1)$name)
Either way will give you your desired result.
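For reference, the attempt in the question fails because attr(adjm,"dimnames") is a list holding both the row names and the column names, not a single character vector of labels. Here is a minimal self-contained sketch of the second approach. It assumes the matrix from the question is typed in by hand rather than read from table.dat, and it uses graph_from_adjacency_matrix(), the newer name for graph.adjacency():

```r
library(igraph)

# The matrix from the question, entered by hand (assumption: table.dat
# would produce the same values and dimnames).
people <- c("jerry", "peter", "king")
adjm <- matrix(c(1, 0, 0,
                 0, 1, 0,
                 1, 1, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(people, people))

# With the default add.colnames = NULL, the column names are stored
# in the vertex attribute "name".
g1 <- graph_from_adjacency_matrix(adjm)
labels <- V(g1)$name

# Pass a character vector (not the dimnames list) as vertex.label.
plot(g1, main = "social network", vertex.color = "white",
     edge.color = "grey", vertex.size = 8,
     vertex.frame.color = "yellow", vertex.label = labels)
```

Passing rownames(adjm) or colnames(adjm) directly as vertex.label should work equally well here, since both hold the same three names.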

Related

How to draw graph of relationships from binary matrix

I'm trying to create a graph with relationships of people involved in the 9/11 attacks, but I don't understand the input very much. I use loops to group the hijackers (hijacker1 knows hijacker2; hijacker5 knows hijacker3 etc.) but it doesn't work for me.
The result of my work should be a relationship graph as on this page: LINK
I use data in csv format: Data to download
The data schema is shown below. There are three files available, but if I understand correctly, the first file contains enough data for what I want (?)
Hijackers ASSOCIATES
Hijackers ATTR
Hijackers PRIORITY_CONTACT
HName1 HName2 HName3
HName1 0 1 0
HName2 1 0 1
HName3 0 1 0
...
I would like to draw a relationship diagram and extract information about which of the hijackers had the most relationships (Should I use betweenness() from igraph library?).
Here's an approach with igraph:
First, let's grab the data and make it into an adjacency matrix:
temp <- tempfile(fileext = ".zip")
download.file("https://sites.google.com/site/ucinetsoftware/datasets/covert-networks/911hijackers/9%2011%20Hijackers%20CSV.zip?attredirects=0&d=1",
temp,
mode = "wb")
data <- read.csv(unz(temp,"CSV/9_11_HIJACKERS_ASSOCIATES.csv"))
my.rownames <- data$X
data2 <- sapply(data[,-1], as.numeric)
rownames(data2) <- my.rownames
Adj <- as.matrix(data2)
Now the easy parts. We can convert the adjacency matrix into an igraph graph, compute the vertex degree, and add that data to the graph.
library(igraph)
Graph <- graph_from_adjacency_matrix(Adj)
V(Graph)$vertex_degree <- degree(Graph)
Finally we can plot the graph with the vertex size being proportional to the degree:
plot.igraph(Graph,
vertex.size = V(Graph)$vertex_degree,
layout=layout.fruchterman.reingold, main="Hijacker Relationships")
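On the side question of which hijacker had the most relationships: degree() counts direct ties, while betweenness() measures how often a vertex lies on shortest paths between other vertices, so degree is the more direct answer. The sketch below uses the 3 x 3 toy matrix from the question (the HName labels are placeholders, not the real data). It builds the graph with mode = "undirected", which assumes the ties are mutual. Without that argument, a symmetric matrix yields two directed edges per pair and every degree doubles.

```r
library(igraph)

# Toy matrix copied from the question; HName1..HName3 are placeholders.
nm <- c("HName1", "HName2", "HName3")
Adj <- matrix(c(0, 1, 0,
                1, 0, 1,
                0, 1, 0),
              nrow = 3, byrow = TRUE, dimnames = list(nm, nm))

# Assumption: relationships are mutual, so build an undirected graph.
G <- graph_from_adjacency_matrix(Adj, mode = "undirected")

deg <- degree(G)                       # number of direct ties per vertex
most_connected <- names(deg)[deg == max(deg)]
most_connected                         # "HName2"

btw <- betweenness(G)                  # a different notion of importance
```

Using deg == max(deg) rather than which.max() keeps all vertices in case of a tie.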

Why does the matrix made from as_adjacency_matrix(g) for undirected graphs sometimes show entries which are 2 instead of 0 or 1?

In igraph in R, if we use as_adjacency_matrix(g) on an undirected graph g, sometimes I see an entry of 2 versus just getting 0 or 1. I am used to only getting 0's or 1's but what does a 2 mean and how can I get rid of it?
as_adj() returns a matrix in which every edge between two vertices is counted. If you have more than one edge between two vertices, the matrix will contain numbers above one. Compare these two examples:
library(igraph)
# Single edges give weights of 1
g <- make_empty_graph(directed=FALSE) %>% add_vertices(3) %>% add_edges(c(1,2,2,3,3,1))
plot(g)
( as_adj(g) )
# Multiple links give weights > 1
g <- g %>% add_edges(c(1,2,2,3,2,3))
( as_adj(g) )
plot(g)
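To get rid of the 2s, one option is to collapse the duplicate edges before building the matrix. The sketch below uses simplify(), which by default removes multiple edges (and self-loops). Whether dropping the duplicates is appropriate depends on what they mean in your data, so treat this as one possible approach:

```r
library(igraph)

# Undirected graph with a duplicated 1-2 edge.
g <- make_graph(c(1, 2,  2, 3,  3, 1,  1, 2), directed = FALSE)

has_multi <- any(which_multiple(g))          # TRUE: there is a repeated edge
A <- as_adjacency_matrix(g, sparse = FALSE)  # entry [1, 2] counts both edges

# Collapse multi-edges (and loops), then rebuild the matrix.
gs <- simplify(g)
As <- as_adjacency_matrix(gs, sparse = FALSE)
```

If you only need a 0/1 matrix and want to leave the graph untouched, clamping the matrix itself (for example with pmin(A, 1)) is an alternative.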

DiagrammeR: Devise a graph from a dataframe

Objective (in the R environment): extract nodes and edges from a data frame and use them to model a graph.
I am trying to learn how to work with DiagrammeR, or any other graph modeling library, in order to get a graph such as the one below (you can follow the link [The GRAPH1]) from a data frame:
The data frame:
a b c classes
1 2 0 a
0 0 2 b
0 1 0 c
I have used DiagrammeR library and defined nodes and edges manually by these commands:
library(DiagrammeR)
grViz("
digraph boxes_and_circles{
#add the node statement
node[shape=box]
a; b; c;
#add the edge statement
a->a [label=1]; a-> b[label=2]; b->c[label=2]; c->b[label=1]
graph [nodesep=0.1]
}
")
Could you help me to understand how I can get the nodes and edges automatically? Thank you in advance.
You can do this with the igraph package. Your data frame is an adjacency matrix and igraph contains a function to make that into a graph. My code below adds a layout to position the vertices in the positions that you indicated in your sample graph.
## Your data
df = read.table(text="a b c classes
1 2 0 a
0 0 2 b
0 1 0 c",
header=TRUE)
library(igraph)
g = graph_from_adjacency_matrix(as.matrix(df[,1:3]), weighted=TRUE)
LO = matrix(c(0,0,0,3,2,1), ncol=2)
plot(g, layout=LO, edge.label=E(g)$weight, vertex.shape="rectangle",
vertex.color="white", edge.curved=c(0,0,0.15,0.15))
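If you would rather stay with DiagrammeR, the nodes and edges can be pulled out of the data frame with base R and pasted into a DOT string. The following is only a sketch: the names m, idx, edges and dot are illustrative, and the final step of passing dot to grViz() is an assumption that is not run here.

```r
df <- read.table(text = "a b c classes
1 2 0 a
0 0 2 b
0 1 0 c", header = TRUE)

# Numeric part as a matrix, with the 'classes' column as row names.
m <- as.matrix(df[, 1:3])
rownames(m) <- as.character(df$classes)

# Every non-zero cell is an edge: row = source, column = target.
idx <- which(m != 0, arr.ind = TRUE)
edges <- data.frame(from  = rownames(m)[idx[, "row"]],
                    to    = colnames(m)[idx[, "col"]],
                    label = m[idx])

# Assemble a DOT description equivalent to the hand-written one.
dot <- paste0(
  "digraph boxes_and_circles {\n",
  "  node [shape=box]\n",
  paste0("  ", edges$from, " -> ", edges$to,
         " [label=", edges$label, "]", collapse = "\n"),
  "\n}"
)
cat(dot)
# DiagrammeR::grViz(dot)  # assumed final step, not executed in this sketch
```

The edges come out in column-major order, which does not matter to Graphviz.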

Get clusters from PCA in R

I have a PCA that shows two really big clusters and I don't know how to figure out which of my samples are in each cluster.
If it helps, I'm using prcomp to generate the PCA:
pca1 <- autoplot(prcomp(df), label = TRUE, label.size = 2)
My approach has been to attempt to cluster the PCA output using kmeans with 2 groups to get the clusters:
pca <- prcomp(df, scale.=TRUE)
clust <- kmeans(pca$x[,1:2], centers=2)$cluster
I can then make a beautiful plot, but I am still lost as to which samples are in each cluster. For reference, here is the plot generated if I graph the kmeans output:
As you can see in the first PCA plot, the labels literally say which sample each dot is. My ideal output would be a two column txt file with the sample name in one column, and the group it belongs to in the other column.
All that aside, if there is a better way, please let me know.
Thanks in advance.
Here is a chunk of my data:
a b c b e
Sample_1013 312011 624559 625898 534309 220415
Sample_1046 474774 949458 951145 843049 366136
Sample_104 645363 1290450 1292520 919474 272200
Sample_1057 267319 534685 535294 690574 422645
Sample_106 414065 830571 834527 657354 234130
Sample_107 299289 602483 603756 566256 262153
In my question, clust is the name of the output from my kmeans:
clust <- kmeans(pca$x[,1:2], centers=2)$cluster
I typed clust into the terminal and got which samples belong to each group:
> clust
Sample_1013 Sample_1046 Sample_104 Sample_1057 Sample_106 Sample_107
1 1 1 1 1 1
Sample_1098 Sample_109 Sample_1109 Sample_1129 Sample_1130 Sample_1140
1 1 1 1 1 1
Sample_1149 Sample_115 Sample_118 Sample_1220 Sample_1223 Sample_1225
1 1 1 1 1 1
Hopefully this helps someone.
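Since clust is a named vector, the two-column text file asked for in the question can be built from its names and values. The following is a sketch only. It uses the six rows shown in the question as toy data, assumes the column names a to e (the question's header repeats b), and writes to a temporary file whose path is a placeholder:

```r
# Six sample rows from the question; column names a..e are an assumption.
x <- matrix(c(312011,  624559,  625898, 534309, 220415,
              474774,  949458,  951145, 843049, 366136,
              645363, 1290450, 1292520, 919474, 272200,
              267319,  534685,  535294, 690574, 422645,
              414065,  830571,  834527, 657354, 234130,
              299289,  602483,  603756, 566256, 262153),
            ncol = 5, byrow = TRUE,
            dimnames = list(c("Sample_1013", "Sample_1046", "Sample_104",
                              "Sample_1057", "Sample_106", "Sample_107"),
                            c("a", "b", "c", "d", "e")))

set.seed(1)  # kmeans starts from random centers
pca   <- prcomp(x, scale. = TRUE)
clust <- kmeans(pca$x[, 1:2], centers = 2)$cluster

# One row per sample: its name and the cluster it was assigned to.
out <- data.frame(sample = names(clust), cluster = unname(clust))

out_file <- tempfile(fileext = ".txt")  # placeholder path
write.table(out, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Note that the cluster numbers 1 and 2 are arbitrary labels and can swap between runs unless the seed is fixed.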

How to randomly select 2 vertices from a graph in R?

I'm new to R, and I'm trying to randomly select 2 vertices from a graph.
What I've done so far is:
First, set up a graph
edgePath <- "./project1/data/smalledges.csv"
edgesMatrix <- as.matrix(read.csv(edgePath, header = TRUE, colClasses = "character"))
graph <- graph.edgelist(edgesMatrix)
The smalledges.csv file looks like this:
from to
4327231 2587908
Then I get all the vertices from the graph into a list:
vList <- as.list(get.data.frame(graph, what = c("vertices")))
After that, I try to use:
sample(vList, 2)
But what I've got is an error:
cannot take a sample larger than the population when 'replace = FALSE'
I guess it's because R thinks what I want is 2 random lists, so I tried this:
sample(vList, 2, replace = TRUE)
And then I've got 2 large lists... BUT THAT'S NOT WHAT I WANTED! So guys, how can I randomly select 2 vertices from my graph? Thanks!
It is not clear from your question whether you want just the vertices, or a sub-graph containing those vertices. Here's an example of both.
library(igraph)
set.seed(1) # for reproducible example
g <- erdos.renyi.game(10, 0.3)
par(mfrow=c(1,3), mar=c(1,1,1,1))
set.seed(1) # for reproducible plot
plot(g)
# random sample of vertices
smpl <- sample(1:vcount(g),5)
V(g)[smpl] # 5 random vertices
# Vertex sequence:
# [1] 9 5 7 2 4
# change the color of only those vertices
V(g)[smpl]$color="lightgreen" # make them light green
set.seed(1) # for reproducible plot
plot(g)
# create a sub-graph with only those vertices, retaining edge structure
sub.g <- induced.subgraph(g,V(g)[smpl])
plot(sub.g)
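As for the error in the question: get.data.frame(graph, what = "vertices") returns a data frame with a single column, so as.list() turns it into a list of length one and sample() cannot draw two elements from it. Sampling from the vector of vertex names avoids this. Below is a sketch with a tiny hand-made edge list. Only the first pair of ids comes from the question; the other ids are made up for illustration.

```r
library(igraph)

# Hypothetical edge list; only the first row appears in the question.
edgesMatrix <- matrix(c("4327231", "2587908",
                        "2587908", "1111111",
                        "1111111", "4327231"),
                      ncol = 2, byrow = TRUE)
g <- graph_from_edgelist(edgesMatrix)

set.seed(1)                      # for a reproducible draw
picked <- sample(V(g)$name, 2)   # two distinct vertex names
picked_vs <- V(g)[picked]        # the matching vertex sequence
```

Here graph_from_edgelist() is the newer name for graph.edgelist(), and picked_vs can be passed on to functions such as induced_subgraph().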
