choose neighborhood that have specific edge weights in igraph R - r

I have a weighted, undirected network with weights 1 and 2. I need to choose the neighbors of all the vertices that are up to 5 step away. However the 5 steps should include 2 edges with weight=2. For example, if all the 5 edges have weight 1, these neighbors should be excluded.
Question: How do I choose neighbors that are connected with specific edge weights?
Code:
matrix= matrix(as.integer(runif(100,0,3)), 10, 10)
matrix
ntwrk=graph.adjacency(matrix,weighted=TRUE, mode="undirected")
neighborhood(ntwrk,5)
Now, I need to figure out which one of those include edges with weight=2. Then, I need to keep only those neighbors, and measure the neighborhood size with neighborhood.size

The result of neighborhood is a list of of list of edges. You can use lapply and filter each list using the attribute weight.
res.tokeep <- lapply(res, function(x) which(E(ntwrk)[x]$weight==2))
Here a complete example , where I plot the graph before and after the weight filter.
library(igraph)
set.seed(1)
mat = matrix(as.integer(runif(10*10,0,3)), 10, 10)
ntwrk=graph.adjacency(mat,weighted=TRUE, mode="undirected")
res <- neighborhood(ntwrk,5)
op <- par(mfrow=c(1,2))
E(ntwrk)$label <- E(ntwrk)$weight
plot(ntwrk)
res.tokeep <- lapply(res, function(x) which(E(ntwrk)[x]$weight==2))
res.todelete <- lapply(res, function(x) which(E(ntwrk)[x]$weight!=2))
ntwrk <- delete.vertices(ntwrk, unique(unlist(res.todelete)))
plot(ntwrk)
par(mfrow=op)

Related

How do I see the two nodes of most weighted egdes in R igraph?

I have an igraph network object constructed in R and generated weight information for each edge. I want to see the nodes of the most weighted edges (descending). What codes should I use to do that? Thank you!
# create an igraph project of user interaction network and check descriptives.
library(igraph)
#edge list
EL = read.csv("(file path omitted)user_interaction_structure.csv")
head(EL)
#node list: I do not have a node list
#construct an igraph oject
g <- graph_from_data_frame(EL, directed = TRUE, vertices = NULL)
#check the edge and node number of the network
gsize(g)
vcount(g)
#check nodes based on degree (descending)
deg <- igraph::degree(g)
dSorted <-sort.int(deg,decreasing=TRUE,index.return=FALSE)
dSorted
#check edges based on weight
E(g)
#the network will contain loop edges and multiple edges
#simplify multiple edges
g_simple <- graph.adjacency(get.adjacency(g),weighted=TRUE)
#check edge weight
E(g_simple)$weight
#igraph can generate a matrix
g_simple[]
Then I wanted to see who were interacting heavily with whom (the nodes of the edges with the largest weight),so I tried
e_top_weights <- order(order(E(g_simple))$weight, decreasing=TRUE)
but it did not work.
I think what you want is the igraph function strength(), which gives the sum of the weights of the edges incident to each node. Here's an example:
library(igraph)
# A small graph we can visualize
g <- make_ring(5)
# Assign each edge an increasing weight, to make things
# easy
edgeweights<- 1:ecount(g)
E(g)$weight <- edgeweights
# The strength() function sums the weights of edges incident
# to each node
strengths <- strength(g)
# We can collect the top two strengths by sorting the
# strengths vector, then asking for which elements of the
# strengths vector are equal to or greater than the second
# largest element.
toptwo <- which(strengths >= sort(strengths, decreasing = TRUE)[2])
## [1] 4 5
# Assign nodes a color blue that is more saturated when nodes
# have greater strength.
cr <- colorRamp(c(rgb(0,0,1,.1), rgb(0,0,1,1)), alpha = TRUE)
colors <- cr(strengths/max(strengths))
V(g)$color <- apply(colors, 1, function(row) rgb(row[1], row[2], row[3], row[4], maxColorValue = 255))
# Plot to confirm
plot(g, edge.width = edgeweights)
Edit
Here are two different ways to find the two nodes (the "from" node and the "to" node) which are the ends of the edge with the maximum weight:
## 1
edge_df <- as_data_frame(g, "edges")
edge_df[which(edge_df$weight == max(edge_df$weight)), c("from", "to")]
## 2
max_weight_edge <- E(g)[which(E(g)$weight == max(E(g)$weight))]
ends(g, es = max_weight_edge)

How to calculate the number of vertices contracted into one graph?

I have a few large igraph objects that represent social networks. All nodes have various attributes, among them sector which is a factor variable. I have contracted this large network into a small where vertices represent groups and edges have the sum of individual edges in the original network. The label attribute in the second network represents the sector attribute in the first.
groupnet <- contract(g, as.integer(as.factor(V(g)$sector)), "ignore")
E(groupnet)$weight <- 1
groupnet <- simplify(groupnet, edge.attr.comb = list(weight = "sum"))
V(groupnet)$label <- levels(as.factor(V(g)$sector))
I would like to add another attribute to the second object V(groupnet)$groupsize that represents the number of original vertices that were contracted into groupnet. I have tried it with the following code but it did not work:
V(groupnet)$groupsize <- length(V(g)$sector[V(g)$sector == V(groupnet)$label])
How can I do this properly?
table() could be helpful here. Try out:
set.seed(1234)
library(igraph)
g <- make_ring(1000)
V(g)$sector <- factor(sample(LETTERS, 100, replace = T))
V(g)$sector
## contracted network
groupnet <- contract(g, as.integer(as.factor(V(g)$sector)), "ignore")
E(groupnet)$weight <- 1
V(groupnet)$label <- levels(as.factor(V(g)$sector))
## number of original vertices that were contracted into groupnet
# the tip is to see that table(V(g)$sector) provides the number of vertices per sector and
# its output is also arranged like V(groupnet)
table(V(g)$sector)
V(groupnet)
# solution
V(groupnet)$groupsize <- as.numeric(table(V(g)$sector))

Clustering based on connectivity of points

I have 1 million records of lat long [5 digits precision] and Route. I want to cluster those data points.
I dont want to use standard k-means clustering as I am not sure how many clsuters [tried Elbow method but not convinced].
Here is my Logic -
1) I want to reduce width of lat long from 5 digits to 3 digits.
2) Now lat longs which are in range of +/- 0.001 are to be clustered in once cluster. Calculate centroid of cluster.
But in doing so I am unable to find good algorithm and R Script to execute my thought code.
Can any one please help me in above problem.
Thanks,
Clustering can be done based on connected components.
All points that are in +/-0.001 distance to each other can be connected so we will have a graph that contains subgraphs that each may be a single poin or a series of connected points(connected components)
then connected components can be found and their centeroid can be calculated.
Two packages required for this task :
1.deldir to form triangulation of points and specify which points are adaject to each other and to calculate distances between them.
2 igraph to find connected components.
library(deldir)
library(igraph)
coords <- data.frame(lat = runif(1000000),long=runif(1000000))
#round to 3 digits
coords.r <- round(coords,3)
#remove duplicates
coords.u <- unique(coords.r)
# create triangulation of points. depends on the data may take a while an consume more memory
triangulation <- deldir(coords.u$long,coords.u$lat)
#compute distance between adjacent points
distances <- abs(triangulation$delsgs$x1 - triangulation$delsgs$x2) +
abs(triangulation$delsgs$y1 - triangulation$delsgs$y2)
#remove edges that are greater than .001
edge.list <- as.matrix(triangulation$delsgs[distances < .0011,5:6])
if (length(edge.list) == 0) { #there is no edge that its lenght is less than .0011
coords.clustered <- coords.u
} else { # find connected components
#reformat list of edges so that if the list is
# 9 5
# 5 7
#so reformatted to
# 3 1
# 1 2
sorted <- sort(c(edge.list), index.return = TRUE)
run.length <- rle(sorted$x)
indices <- rep(1:length(run.length$lengths),times=run.length$lengths)
edge.list.reformatted <- edge.list
edge.list.reformatted[sorted$ix] <- indices
#create graph from list of edges
graph.struct <- graph_from_edgelist(edge.list.reformatted, directed = FALSE)
# cluster based on connected components
clust <- components(graph.struct)
#computation of centroids
coords.connected <- coords.u[run.length$values, ]
centroids <- data.frame(lat = tapply(coords.connected$lat,factor(clust$membership),mean) ,
long = tapply(coords.connected$long,factor(clust$membership),mean))
#combine clustered points with unclustered points
coords.clustered <- rbind(coords.u[-run.length$values,], centroids)
# round the data and remove possible duplicates
coords.clustered <- round(coords.clustered, 3)
coords.clustered <- unique(coords.clustered)
}

R, determine shortest path

I have a graph and need the shortest distance between all nodes.
Now I made the following function,
shortestPath <- function(streets, length)
{
streets <- matrix(streets, byrow=TRUE, ncol=2) # from -> to
g <- graph.data.frame(as.data.frame(streets)) # create graph, see plot(g)
return <- shortest.paths(g, weights = length) # return routes lengths
}
Here streets is a vector which contains data where we have an edge and length is (obviously) the length of the edge.
I have the following graph where each edge has length two, note that the graph has to be undirected.
You can use the following data to reproduce the problem.
# Data
edges <- c(1,2, 2,3, 3,4, 4,5, 2,6, 3,7, 4,8, 6,8);
length <- rep(2,8);
aantalNodes <- 8;
# Determine shortest path
routes <- matrix(shortestPath(edges,length), byrow=FALSE, ncol=aantalNodes);
We clearly see that the shortest path between node 6 and node 8 has length 2, however, this function returns length 4. What's going wrong? I'm already tinker with it for two days. Looking forward for you help!
You may want to have a look at the rownames and colnames of shortestPath(edges,length). It's really rather revealing...
res <- shortestPath(edges,length)
res[order(as.integer(rownames(res))),
order(as.integer(colnames(res)))]

2nd Degree Connections in igraph

I think have this working correctly, but I am looking to mimic something similar to Facebook's Friend suggestion. Simply, I am looking to find 2nd degree connections (friends of your friends that you do not have a connection with). I do want to keep this as a directed graph and identify the 2nd degree outward connections (the people your friends connect to).
I believe my dummy code is achieving this, but since the reference is on indices and not vertex labels, I was hoping you could help me modify the code to return useable names.
### create some fake data
library(igraph)
from <- sample(LETTERS, 50, replace=T)
to <- sample(LETTERS, 50, replace=T)
rel <- data.frame(from, to)
head(rel)
### lets plot the data
g <- graph.data.frame(rel)
summary(g)
plot(g, vertex.label=LETTERS, edge.arrow.size=.1)
## find the 2nd degree connections
d1 <- unlist(neighborhood(g, 1, nodes="F", mode="out"))
d2 <- unlist(neighborhood(g, 2, nodes="F", mode="out"))
d1;d2;
setdiff(d2,d1)
Returns
> setdiff(d2,d1)
[1] 13
Any help you can provide will be great. Obviously I am looking to stay within R.
You can index back into the graph vertices like:
> V(g)[setdiff(d2,d1)]
Vertex sequence:
[1] "B" "W" "G"
Also check out ?V for ways to get at this type of info through direct indexing.
You can use the adjacency matrix $G$ of the graph $g$ (no latex here?). One of the properties of the adjacency matrix is that its nth power gives you the number of $n$-walks (paths of length n).
G <- get.adjacency(g)
G2 <- G %*% G # G2 contains 2-walks
diag(G2) <- 0 # take out loops
G2[G2!=0] <- 1 # normalize G2, not interested in multiplicity of walks
g2 <- graph.adjacency(G2)
An edge in graph g2 represents a "friend-of-a-friend" bond.

Resources