Plotting the most represented nodes in my igraph object in R

I have the fblog dataset, which is about French political party blogs and is an igraph object.
I just want to plot the most represented parties (nodes) in the set.
I used degree as below,
but now I don't know how to use it to plot them.
I want to show only the 20 most important parties (nodes) in my graph and plot them.
I hope you can help me.
deg_g <- sort(igraph::degree(fblog, mode = "all", normalized = TRUE), decreasing = TRUE)
class(deg_g)
UU <- deg_g[1:20]

In order to get the subgraph, you need to know which nodes have the highest degree, not what their degree is. Once you have that, you can just use induced_subgraph.
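library(igraph)
library(sand)  # provides the fblog data
data(fblog)
fblog = upgrade_graph(fblog)  # update the stored graph to the current igraph format
# order() returns vertex ids sorted by degree, not the degree values themselves
DEG <- order(igraph::degree(fblog, mode = "all", normalized = TRUE),
             decreasing = TRUE)
HighDeg = induced_subgraph(fblog, DEG[1:20])
plot(HighDeg)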
I am sure that you can lay out the graph to make it prettier, but this is the subgraph that you requested.
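If the graph carries vertex names (for fblog, check V(fblog)$name), the names of the question's sorted vector UU can also be passed to induced_subgraph() directly. The sketch below uses igraph's built-in Zachary karate-club graph as a stand-in for fblog, gives it hypothetical names v1, v2, ..., and keeps the top 10 rather than 20. Here the name-based route and the answer's order()-based route pick the same vertices because there are no degree ties at the cutoff. Note that induced_subgraph() keeps vertices in their original order, so per-vertex values for plotting are looked up by name rather than taken in UU's sorted order.

```r
library(igraph)

# Stand-in for fblog: the built-in Zachary karate-club graph (34 vertices)
g <- make_graph("Zachary")
# Hypothetical vertex names; fblog's own names would play this role
V(g)$name <- paste0("v", seq_len(vcount(g)))

k <- 10  # how many top vertices to keep (the question uses 20)

# Route 1: the question's sorted degree vector; its names identify the vertices
deg_g <- sort(degree(g, mode = "all", normalized = TRUE), decreasing = TRUE)
UU <- deg_g[1:k]
top_by_name <- induced_subgraph(g, names(UU))

# Route 2: the answer's order(), which returns vertex ids directly
DEG <- order(degree(g, mode = "all", normalized = TRUE), decreasing = TRUE)
top_by_id <- induced_subgraph(g, DEG[1:k])

# induced_subgraph() keeps the original vertex order, so look sizes up by name
plot(top_by_name,
     layout = layout_with_fr(top_by_name),
     vertex.size = 5 + 50 * deg_g[V(top_by_name)$name])
```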


How to get the best polygon point pattern data in spatstat analysis in R

I have a dataset of spatial locations. I want to do a point pattern analysis of this data using the spatstat package in R, and I want the best-fitting polygonal area for the analysis instead of a rectangular area. The code I have is
original_data = read.csv("/home/hudamoh/PhD_Project_Moh_Huda/Dataset_files/my_coordinates.csv")
plot(original_data$row, original_data$col)
which results in a plot that looks like this
Setting up the point pattern data:
point_pattern_data = ppp(original_data$row, original_data$col, c(0, 77), c(0, 116))
plot(point_pattern_data)
summary(point_pattern_data)
resulting in a plot that looks like this
The observed data has considerable white space, which I want to remove to get a better analysis area. Therefore, I want the window of the point pattern to be a polygon instead of a rectangle. The vertices of the polygon are the (x, y) pairs below, chosen to avoid white space as much as possible.
x = c(3,1,1,0.5,0.5,1,2,2.5,5.5, 16,21,28,26,72,74,76,75,74,63,58,52,47,40)
y = c(116,106,82.5,64,40,35,25,17.5,5,5,5,10,8,116,100,50,30,24,17,10,15,15,8)
I found the vertices above manually by looking at the plot below (with grid lines):
plot(original_data$row,original_data$col)
grid(nx = 40, ny = 25,
lty = 2, # Grid line type
col = "gray", # Grid line color
lwd = 2) # Grid line width
So I want the window of the point pattern to be a polygon. The code is:
my_data_poly = owin(poly = list(x = c(3,1,1,0.5,0.5,1,2,2.5,5.5, 16,21,28,26,72,74,76,75,74,63,58,52,47,40), y = c(116,106,82.5,64,40,35,25,17.5,5,5,5,10,8,116,100,50,30,24,17,10,15,15,8)))
plot(my_data_poly)
but it results in an error. The error is
I got rid of the error by swapping the x and y values:
my_data_poly = owin(poly = list(x = c(116,106,82.5,64,40,35,25,17.5,5,5,5,10,8,116,100,50,30,24,17,10,15,15,8), y = c(3,1,1,0.5,0.5,1,2,2.5,5.5, 16,21,28,26,72,74,76,75,74,63,58,52,47,40)))
plot(my_data_poly)
It results in a plot
However, this is not what I want. How can I get the observed area as a polygonal window for point pattern analysis?
This should be a reasonable solution to the problem.
library(sp)
# x = row, y = col, matching the ppp() call in the question
poly <- Polygon(cbind(original_data$row,
                      original_data$col))
This will create a polygon from your points. The sp package documentation explains the package in more detail.
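If hand-picked vertices are not essential, a convex hull gives a polygon automatically. The sketch below uses base R's chull(), which returns the hull vertex indices in clockwise order, and reverses them because owin() expects anticlockwise vertices. Because the question's CSV file is not available, it runs on simulated stand-in data (an assumption). A convex hull cannot follow concave edges, so some white space may remain. spatstat's convexhull.xy() computes the same hull and returns it as an owin object directly.

```r
# Hypothetical stand-in for original_data (the real CSV is not available)
set.seed(1)
original_data <- data.frame(row = runif(200, 0, 77), col = runif(200, 0, 116))

# chull() returns the indices of the hull points in clockwise order
hull <- chull(original_data$row, original_data$col)
# owin() wants anticlockwise vertices, so reverse the order
hull <- rev(hull)
hull_x <- original_data$row[hull]
hull_y <- original_data$col[hull]

plot(original_data$row, original_data$col)
polygon(hull_x, hull_y, border = "red")

# With spatstat loaded, the hull could serve as the window, e.g.
# win <- owin(poly = list(x = hull_x, y = hull_y))
```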
We don’t have access to the point data you read in from file, but if you just want to fix the polygonal window that is not a problem.
You need to traverse the vertices of your polygon sequentially and anti-clockwise.
owin() connects each vertex you give to the next one, in the order given. Here are your vertices, labelled by their index:
library(spatstat)
x = c(3,1,1,0.5,0.5,1,2,2.5,5.5, 16,21,28,26,72,74,76,75,74,63,58,52,47,40)
y = c(116,106,82.5,64,40,35,25,17.5,5,5,5,10,8,116,100,50,30,24,17,10,15,15,8)
vert <- ppp(x, y, window = owin(c(0,80),c(0,120)))
plot.ppp(vert, main = "", show.window = FALSE, chars = NA)
text(vert)
Point number 13 is towards the bottom left and point 14 is in the top right, which produces the crossing in the polygon.
Moving the order around seems to help:
xnew <- c(x[1:11], x[13:12], x[23:14])
ynew <- c(y[1:11], y[13:12], y[23:14])
p <- owin(poly = cbind(xnew, ynew))
plot(p, main = "")
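The anticlockwise requirement can be checked before calling owin() with the shoelace formula: the signed area of a vertex sequence is positive for anticlockwise order and negative for clockwise order. The helper signed_area() below is a hypothetical base-R sketch, not part of spatstat. It does not detect self-intersections like the one between vertices 13 and 14, so plotting the numbered vertices as above is still worthwhile.

```r
# Vertices from the question, and the reordered version from this answer
x <- c(3, 1, 1, 0.5, 0.5, 1, 2, 2.5, 5.5, 16, 21, 28, 26, 72, 74, 76, 75, 74, 63, 58, 52, 47, 40)
y <- c(116, 106, 82.5, 64, 40, 35, 25, 17.5, 5, 5, 5, 10, 8, 116, 100, 50, 30, 24, 17, 10, 15, 15, 8)
xnew <- c(x[1:11], x[13:12], x[23:14])
ynew <- c(y[1:11], y[13:12], y[23:14])

# Shoelace formula: positive signed area means anticlockwise order
signed_area <- function(x, y) {
  j <- c(seq_along(x)[-1], 1)  # index of the next vertex, wrapping around
  sum(x * y[j] - x[j] * y) / 2
}

signed_area(xnew, ynew)            # positive: anticlockwise
signed_area(rev(xnew), rev(ynew))  # negative: clockwise
```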
From the plot of the data you provided, it is unclear that point pattern analysis is really appropriate.
The main assumption underlying point process modelling as implemented in spatstat is that the locations of events (points) are random, and that the process that generated the random locations is of interest.
Your points seem to lie on a grid, so you may need another tool for your analysis.
Of course, spatstat has a lot of functionality for simply handling and summarising data like this, so you may still find useful tools in there.

Mark.groups in Igraph Error: unknown vertex.names selected

I am trying to create a network plot in igraph where:
communities are marked by a color overlay, as created by mark.groups;
nodes are colored by a node attribute, deu;
nodes are shaped by a node attribute, topic_type.
For this, I created an igraph object, graph_deu.
Now I try the following code:
set.seed(2)
plot(graph_deu,
mark.groups=list(c(33,1,34,2,36,53,54,56,42,43,55,57,18), c(35,48,50,27), c(38,45,46,47,49,28,25)),
mark.col=c("lemonchiffon", "slategray1", "thistle1"),
mark.border = NA,
edge.width =E(graph_deu)$weight,
vertex.size = deu_deg,
vertex.color = deu,
vertex.shape = topic_type,
vertex.label = node_labels,
vertex.label.cex=1.5
)
And I get the error:
Error in simple_vs_index(x, ii, na_ok) : Unknown vertex selected.
It seems that igraph cannot find the vertices specified in mark.groups, but I have no idea why, as they are all correctly numbered.
Then, to avoid mark.groups, I tried another option: plotting the community object (mod2) directly. However, in this case nodes are colored by community and not by the attribute deu:
plot(mod2, graph_deu,
edge.width =E(graph_deu)$weight,
vertex.size = deu_deg,
vertex.color = deu,
vertex.shape = topic_type,
vertex.label = node_labels,
vertex.label.cex=1.5)
This produces a network where vertices are colored by community, not by the deu attribute. What I would like: the communities circled by the semi-transparent overlay, but each node colored individually by its deu attribute.
Your help will be much appreciated. This is my first post on Stack Overflow, so if I should provide more code to reproduce the problem, I am happy to share it; I hope, though, that my igraph object is enough for the problem at hand.
Your graph has only 24 nodes, but you are referring to nodes using numbers higher than that, e.g. 36, 53, 54. If you use a number, igraph assumes it is the id of the node, so these numbers don't make sense for this graph. What you mean is the nodes with the names "36", "53", "54". The names are strings, not numbers. What you need to do is find the node ids that correspond to these names; I show one way to do that below. Also, your plot statement refers to a number of variables that you did not provide, so I commented them out here.
graph_deu = upgrade_graph(graph_deu)
plot(graph_deu)
Group1 = as.numeric(V(graph_deu)[as.character(c(33,1,34,2,36,53,54,56,42,43,55,57,18))])
Group2 = as.numeric(V(graph_deu)[as.character(c(35,48,50,27))])
Group3 = as.numeric(V(graph_deu)[as.character(c(38,45,46,47,49,28,25))])
set.seed(2)
plot(graph_deu,
mark.groups=list(Group1, Group2, Group3),
mark.col=c("lemonchiffon", "slategray1", "thistle1"),
mark.border = NA,
edge.width =E(graph_deu)$weight,
# vertex.size = deu_deg,
# vertex.color = deu,
# vertex.shape = topic_type,
# vertex.label = node_labels,
vertex.label.cex=1.5
)
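The name/id distinction can be reproduced on a small hypothetical graph whose vertex names look like numbers. With 7 vertices, only ids 1 to 7 are valid, so a number such as 33 is treated as an id (which is what produces "Unknown vertex selected"), while the string "33" is looked up as a name. Both as.numeric(V(g)[names]), as in the answer, and match(names, V(g)$name) convert names into the vertex ids that mark.groups expects.

```r
library(igraph)

# Hypothetical toy graph whose vertex names look like numbers
el <- data.frame(from = c("33", "1", "34", "35", "48"),
                 to   = c("1", "34", "2", "48", "50"))
g <- graph_from_data_frame(el, directed = FALSE)
vcount(g)  # 7 vertices, so the valid numeric ids are 1 to 7

# V(g)[33] would treat 33 as an id; V(g)["33"] looks up the name instead
group1 <- as.numeric(V(g)[c("33", "1", "34", "2")])  # as in the answer
group2 <- match(c("35", "48", "50"), V(g)$name)       # same idea via match()

set.seed(2)
plot(g,
     mark.groups = list(group1, group2),
     mark.col = c("lemonchiffon", "slategray1"),
     mark.border = NA)
```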

(R language) Understanding what is a "weighted" graph

I am using R and the igraph library to learn about network graph data. In particular, I am trying to understand the concept of a "weighted graph". From what I have read, the "weights" are generally associated with the edges of the graph. But can "weights" ever be associated with the nodes? (Nodes are sometimes also referred to as "vertices".)
Suppose I have two datasets: one for the nodes and one for the edges.
library(igraph)
library(visNetwork)
Nodes <-data.frame(
"Source" = c("123","124","125","122","111", "126"),
"Salary" = c("100","150","200","200","100", "100"),
"Debt" = c("10","15","20","20","10", "10"),
"Savings" = c("1000","1500","2000","2000","1000", "1000")
)
Nodes$Salary= as.numeric(Nodes$Salary)
Nodes$Debt = as.numeric(Nodes$Debt)
Nodes$Savings = as.numeric(Nodes$Savings)
mydata <-data.frame(
"source" = c("123","124","123","125","123"),
"target" = c("126", "123", "125", "122", "111"),
"color" = c("red","red","green","blue","red"),
"food" = c("pizza","pizza","cake","pizza","cake")
)
Normally, I would have made a simple binary graph for this data, in which the entire analysis would only involve two columns:
#make graph
graph <- graph_from_data_frame(mydata[,c(1:2)], directed=FALSE)
simple_graph<- simplify(graph)
plot(simple_graph)
#do some clustering on the graph#
fc <- fastgreedy.community(simple_graph)
V(simple_graph)$community <- fc$membership
nodes <- data.frame(id = V(simple_graph)$name, title = V(simple_graph)$name, group = V(simple_graph)$community)
nodes <- nodes[order(nodes$id, decreasing = F),]
edges <- get.data.frame(simple_graph, what="edges")[1:2]
visNetwork(nodes, edges) %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)
Now, I want to explore the concept of a "weighted graph". I want to make a graph such that I can use the financial information (salary, debt, savings) for each node in the analysis. The way I see it, this would assign a notion of "weight" to the nodes and not the edges, correct?
A very basic approach to this problem would be to take the average of salary, debt and savings for each node and treat this average as a weight. This way, we could begin to ask questions such as "are nodes with larger average financial amounts more likely to form relationships with one another than nodes with smaller average financial amounts?" (In network science, I believe this concept is referred to as "homophily".)
Thus, we can modify the data about the nodes (calculating the average financial amount for each node):
nodes_avg = data.frame(ID=Nodes[,1], Means=rowMeans(Nodes[,-1]))
Now, we need to create a new graph in which this averaged financial information is considered as a "weight". This is where I begin to get confused.
This way does not work:
set_vertex_attr(simple_graph, Weight, index = V(graph), nodes_avg$Means)
Error in as.igraph.vs(graph, index) :
Cannot use a vertex sequence from another graph.
I tried the following command, but I got a warning message:
E(simple_graph)$weight <- nodes_avg$Means
Warning message:
In eattrs[[name]][index] <- value :
number of items to replace is not a multiple of replacement length
Finally, I tried this command, but I don't think it is using the averaged financial amounts as node weights:
weighted_graph <- graph_from_data_frame(mydata, directed=TRUE, vertices=nodes_avg)
Does anyone know how I can make a "weighted_graph" with the averaged financial amounts, and then run a clustering algorithm on the network graph that takes the node weights into consideration? Something like this:
simple_weighted_graph<- simplify(weighted_graph)
plot(simple_weighted_graph)
#do some clustering on the weighted_graph#
fc <- fastgreedy.community(simple_weighted_graph)
V(simple_weighted_graph)$community <- fc$membership
nodes <- data.frame(id = V(simple_weighted_graph)$name, title = V(simple_weighted_graph)$name, group = V(simple_weighted_graph)$community)
nodes <- nodes[order(nodes$id, decreasing = F),]
edges <- get.data.frame(simple_weighted_graph, what="edges")[1:2]
visNetwork(nodes, edges) %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)
Or is this not possible? That is, are weighted graphs only made using edge weights and never node weights, so that graph clustering cannot take node weights into account?
Thanks
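A sketch of the mechanics, under stated assumptions. Node weights are stored as vertex attributes. The set_vertex_attr() attempt fails because index uses V(graph) from a different graph (and the attribute name should be a quoted string such as "Weight"), and the E(simple_graph)$weight attempt warns because it assigns 6 node values to 5 edges. The graph_from_data_frame(..., vertices = nodes_avg) attempt does store the means, as V(weighted_graph)$Means. igraph's community functions such as cluster_fast_greedy() accept edge weights through their weights argument, not node weights, so the sketch (a) stores the means as a vertex attribute matched by vertex name, (b) measures homophily on that attribute with assortativity(), and (c) converts node weights into edge weights with a hypothetical similarity rule, 1 / (1 + |difference|). Rule (c) is an illustrative assumption, not a standard definition.

```r
library(igraph)

Nodes <- data.frame(
  Source  = c("123", "124", "125", "122", "111", "126"),
  Salary  = c(100, 150, 200, 200, 100, 100),
  Debt    = c(10, 15, 20, 20, 10, 10),
  Savings = c(1000, 1500, 2000, 2000, 1000, 1000)
)
mydata <- data.frame(
  source = c("123", "124", "123", "125", "123"),
  target = c("126", "123", "125", "122", "111")
)
nodes_avg <- data.frame(ID = Nodes$Source, Means = rowMeans(Nodes[, -1]))

simple_graph <- simplify(graph_from_data_frame(mydata, directed = FALSE))

# (a) Use V() of the *same* graph, look values up by vertex name (the graph's
# vertex order differs from nodes_avg), and keep the returned graph
simple_graph <- set_vertex_attr(
  simple_graph, "Weight", index = V(simple_graph),
  value = nodes_avg$Means[match(V(simple_graph)$name, nodes_avg$ID)]
)

# (b) Homophily on a numeric vertex attribute: assortativity coefficient
r <- assortativity(simple_graph, V(simple_graph)$Weight, directed = FALSE)

# (c) Hypothetical rule: similar neighbours get edge weights near 1
ep <- ends(simple_graph, E(simple_graph))  # endpoint names of each edge
w  <- setNames(V(simple_graph)$Weight, V(simple_graph)$name)
E(simple_graph)$weight <- unname(1 / (1 + abs(w[ep[, 1]] - w[ep[, 2]])))

fc <- cluster_fast_greedy(simple_graph, weights = E(simple_graph)$weight)
membership(fc)
```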

compare communities from graphs with different number of vertices

I am calculating Louvain communities on graphs of communication data, where vertices represent performers on a big project. The graphs represent different communication methods (e.g., email, phone).
We want to try to identify teams of performers from their communication data. Since performers prefer different communication methods, the graphs differ in size, and each may contain vertices that are not present in the other. When I try to compare the community objects from the respective graphs, igraph::compare() throws an error. See the toy reprex below.
I considered a dplyr::full_join() or inner_join() of the vertex lists before constructing the graph and community objects to make them the same size, but I worry about the impact of doing so on the resulting cluster_louvain() solutions.
Any ideas on how I can compare the community objects to one another from these different communication methods? Thanks in advance!
library(tidyverse, warn.conflicts = FALSE)
library(igraph, warn.conflicts = FALSE)
nodes <- as_tibble(list(id = c("sample1", "sample2", "sample3")))
edge <- as_tibble(list(from = "sample1",
to = "sample2"))
net <- graph_from_data_frame(d = edge, vertices = nodes, directed = FALSE)
com <- cluster_louvain(net)
nodes2 <- as_tibble(list(id = c("sample1", "sample21", "sample22", "sample23")))
edge2 <- as_tibble(list(from = c("sample1", "sample21"),
to = c("sample21", "sample22")))
net2 <- graph_from_data_frame(d = edge2, vertices = nodes2, directed = FALSE)
com2 <- cluster_louvain(net2)
# # uncomment to see graph plots
# plot.igraph(net, mark.groups = com)
# plot.igraph(net2, mark.groups = com2)
compare(com, com2)
#> Error in i_compare(comm1, comm2, method): At community.c:3106 : community membership vectors have different lengths, Invalid value
Created on 2019-02-22 by the reprex package (v0.2.1)
I don't believe you will be able to compare clusterings of two different graphs that contain different sets of nodes. Practically, you can't do it in igraph, and conceptually it's hard, because clusterings are compared by considering all pairs of nodes in a graph and checking whether each pair is placed in the same cluster or in different clusters by each of the two clustering approaches. If both approaches typically put the same nodes together and the same nodes apart, they are considered more similar.[1]
I suppose another valid approach would be to evaluate how similar the clustering schemes are for only the nodes in the intersection of the two graphs. You'll have to decide what makes more sense in your setting. I'll show how to do it using the union of nodes rather than the intersection.
So you need the same nodes in both graphs in order to make the comparison. In fact, I think the easiest way to do it is to put all the nodes in one graph and give the edges different types. Then you can compute your clusters for each edge type separately and make the comparison. The reprex below is hopefully clear:
# repeat your set-up
library(tidyverse, warn.conflicts = FALSE)
library(igraph, warn.conflicts = FALSE)
nodes <- as_tibble(list(id = c("sample1", "sample2", "sample3")))
edge <- as_tibble(list(from = "sample1",
to = "sample2"))
nodes2 <- as_tibble(list(id = c("sample1","sample21", "sample22","sample23")))
edge2 <- as_tibble(list(from = c("sample1", "sample21"),
to = c("sample21", "sample22")))
# approach from a single graph
# concatenate edges
edges <- rbind(edge, edge2)
# create an edge attribute indicating network type
edges$type <- c("phone", "email", "email")
# the set of nodes (across both graphs)
nodes <- unique(rbind(nodes, nodes2))
g <- graph_from_data_frame(d = edges, vertices = nodes, directed = F)
# We cluster over the graph without the email edges
com_phone <- cluster_louvain(g %>% delete_edges(E(g)[E(g)$type=="email"]))
plot(g, mark.groups = com_phone)
# Now we can cluster over the graph without the phone edges
com_email <- cluster_louvain(g %>% delete_edges(E(g)[E(g)$type=="phone"]))
plot(g, mark.groups = com_email)
# Now we can compare
compare(com_phone, com_email)
#> [1] 0.7803552
As you can see from the plots, we pick out the same initial clustering structure you found in the separate graphs, with the addition of the extra isolated nodes.
[1] Obviously this is a pretty vague explanation. The default method used in compare is the variation of information (method = "vi"), from this paper, which has a nice discussion.
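For the intersection alternative mentioned in the answer, one sketch is to cluster each graph separately and compare the membership vectors on the shared vertices only; compare() accepts plain membership vectors as well as communities objects. The two graphs below are hypothetical toys (two triangles each, sharing vertices a to e), and restricting to shared vertices is an assumption about what "similar" should mean in your setting. Because both graphs split the shared vertices the same way, the variation of information is 0 and the NMI is 1.

```r
library(igraph)

# Hypothetical toys: each graph is two triangles; a-e are shared,
# f exists only in the phone graph and x only in the email graph
phone <- graph_from_literal(a-b, b-c, c-a, d-e, e-f, f-d)
email <- graph_from_literal(a-b, b-c, c-a, d-e, e-x, x-d)

set.seed(1)
m_phone <- setNames(as.integer(membership(cluster_louvain(phone))), V(phone)$name)
m_email <- setNames(as.integer(membership(cluster_louvain(email))), V(email)$name)

# Keep only the vertices present in both graphs, in the same order
shared <- intersect(names(m_phone), names(m_email))
a <- m_phone[shared]
b <- m_email[shared]

vi  <- compare(a, b, method = "vi")   # 0 means identical partitions
nmi <- compare(a, b, method = "nmi")  # 1 means identical partitions
```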

Visualizing relation between two objects using R and export to HTML

I am using R to visualize the relations between, say, 5-6 different nodes. A graph is probably the best way to do it. The issue is that there are far too many edges: there can be a hundred edges between two vertices, which makes the graph look very cluttered. I want the edge names to be displayed, but with a hundred edge names displayed, they overlap each other and are not comprehensible.
So, I have two questions:
1. Is there any other way in which I can represent the data? Some kind of chart, perhaps?
2. I want to export the final output to HTML, using d3.js or a similar library, keeping the edge names and some other similar information intact. What would be the best plugin to use in that case?
I am using the igraph library to create the graph in R.
I also tried using the networkD3 library to export it to an HTML and make it interactive.
library(igraph)
library(networkD3)
graph <- graph_from_data_frame(edges, directed = TRUE, vertices = vertex)
plot(graph, edge.label = E(graph)$name)
wc <- cluster_walktrap(graph)
members <- membership(wc)
graph_d3 <- igraph_to_networkD3(graph, group = members)
graph_forceNetwork <- forceNetwork(Links = graph_d3$links, Nodes = graph_d3$nodes,
Source = 'source',
Target = 'target',
NodeID = 'name',
Group = 'group',
zoom = TRUE,
fontSize = 20)
Right now, it is a graph with only two vertices and about 60-70 edges between them, so I did not use any particular layout.
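One way to reduce the clutter is to collapse parallel edges into a single edge that carries a count and the concatenated edge names, using simplify() with edge.attr.comb. The sketch below builds a hypothetical two-vertex graph (5 edges from A to B, 2 from B to A); in a directed graph the two directions stay separate. The collapsed edge table from as_data_frame() is also a compact, chart-friendly view of the same information. For HTML export, networkD3::saveNetwork() or visNetwork::visSave() can write an interactive widget to a file; which suits best depends on how much edge detail you need to keep.

```r
library(igraph)

# Hypothetical edge list: many named relations between the same two vertices
edges <- data.frame(
  from = c(rep("A", 5), rep("B", 2)),
  to   = c(rep("B", 5), rep("A", 2)),
  name = paste0("rel", 1:7)
)
g <- graph_from_data_frame(edges, directed = TRUE)
E(g)$count <- 1  # each original edge counts once

# Merge parallel edges: sum the counts and concatenate the edge names
h <- simplify(g, edge.attr.comb = list(
  count = "sum",
  name  = function(x) paste(x, collapse = ", ")
))

plot(h, edge.label = paste0(E(h)$count, " edges"), edge.curved = 0.2)
as_data_frame(h, what = "edges")  # one row per vertex pair and direction
```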
