I have data which provides information on flows by actor c, broken down by inputs originating from source s, to partner p.
Network data normally carries only one kind of information: flows from A->B, B->C, etc.
However, my data shows which part of the flow A->B then goes on to C, and which part goes on to D.
The data is structured as a three-column edgelist (source, country, partner) plus a value column.
source <- c("A", "D", "B", "B")
country <- c("B", "B", "A", "A")
partner <- c("C", "C", "C", "D")
value <- c(5, 0, 2, 4)  # numeric, so the flow sizes can be used as weights
df <- data.frame(source, country, partner, value)
df
I don't really see how to use this as network data. However, if anyone has an idea how to make use of this more fine-grained network, that would be amazing (:
best,
moritz
Maybe like this:
library(igraph)
g <- graph_from_data_frame(
rbind(
setNames(df[, c(1, 2, 4)], c("from", "to", "value")),
setNames(df[, c(1, 3, 4)], c("from", "to", "value"))
)
)
plot(g)
I am not sure if the code below is what you want:
library(igraph)
v <- c(3,1)
g <- Reduce(union,lapply(v, function(k) graph_from_data_frame(df[-k])))
Calling plot(g) then draws the combined graph.
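If the goal is to keep track of which A->B leg continues to C and which continues to D, one possible sketch is to split every row into two legs and tag both with a shared identifier. The column names flow_id and leg are hypothetical and not part of the original data, and value is assumed to be numeric here.

```r
library(igraph)

# Same toy data as in the question, with value stored as numbers (assumption).
df <- data.frame(
  source  = c("A", "D", "B", "B"),
  country = c("B", "B", "A", "A"),
  partner = c("C", "C", "C", "D"),
  value   = c(5, 0, 2, 4)
)

# Hypothetical identifier: one id per source -> country -> partner triple.
df$flow_id <- seq_len(nrow(df))

# Each triple becomes two directed edges that share the same flow_id.
legs <- rbind(
  data.frame(from = df$source, to = df$country, value = df$value,
             flow_id = df$flow_id, leg = "source_to_country"),
  data.frame(from = df$country, to = df$partner, value = df$value,
             flow_id = df$flow_id, leg = "country_to_partner")
)

g <- graph_from_data_frame(legs, directed = TRUE)

# Recover both legs of one flow, here the fourth row (B -> A -> D).
flow4 <- as_edgelist(g)[E(g)$flow_id == 4, ]
flow4
```

Because both legs carry the same flow_id, the ordinary edge list stays usable for standard network measures while the finer-grained routing can still be filtered out when needed.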
I'd like to color the nodes of a graph based on an attribute in the original dataframe. But I think I haven't "carried through" that aesthetic variable to the graph.
Example here that works:
library(dplyr)
library(igraph)
library(ggraph)
data <-
tibble(
from = c("a", "a", "a", "b", "b", "c"),
to = c(1, 2, 3, 1, 4, 2),
type = c("X", "Y", "Y", "X", "Y", "X")
)
graph <-
graph_from_data_frame(data)
ggraph(graph,
layout = "fr") +
geom_node_point() +
geom_edge_link()
I'd like something like geom_node_point(aes(color = type)), but haven't made type findable in the graph?
The issue here is that you added the type column as an edge-attribute whereas geom_node_point expects a vertex-attribute (see ?graph_from_data_frame: Additional columns are considered as edge attributes.).
Another issue is that type is not consistent for either node column (e.g. a is associated with type X and also Y, the same is true for node 2).
To address the first issue you could add additional vertex information to the vertices argument of the graph_from_data_frame function.
The simplest solution to address both issues is to add the type attribute after creating the graph:
data <-
tibble(
from = c("a", "a", "a", "b", "b", "c"),
to = c(1, 2, 3, 1, 4, 2)
)
graph <- graph_from_data_frame(data)
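V(graph)$type <- bipartite_mapping(graph)$type
The bipartite_mapping function (bipartite.mapping in older igraph versions) assigns TRUE to the vertices of one node set and FALSE to those of the other, so every vertex gets exactly one consistent type.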
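For the other route mentioned above, passing vertex information through the vertices argument, a minimal sketch could look like this. The vertex table and its kind column are made up for illustration; the first column of the vertex data frame must hold the vertex names used in the edge list.

```r
library(igraph)

# Edge list from the question, with the numeric ids written as strings.
edges <- data.frame(
  from = c("a", "a", "a", "b", "b", "c"),
  to   = c("1", "2", "3", "1", "4", "2")
)

# Hypothetical vertex table: one row per vertex, one consistent label each.
vertices <- data.frame(
  name = c("a", "b", "c", "1", "2", "3", "4"),
  kind = c("letter", "letter", "letter",
           "number", "number", "number", "number")
)

# Extra columns of `vertices` become vertex attributes of the graph.
graph <- graph_from_data_frame(edges, vertices = vertices)

V(graph)$kind
```

With the attribute stored on the vertices, geom_node_point(aes(color = kind)) in ggraph should be able to find it.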
Let's assume that we have the following toy data:
library(tidyverse)
data <- tibble(
subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
id1 = c("a", "a", "b", "a", "a", "a", "b", "a", "a", "b"),
id2 = c("b", "c", "c", "b", "c", "d", "c", "b", "c", "c")
)
which represents network relationships for each subject. For example, there are three unique subjects in the data, and the network for the first subject can be written as a sequence of relations:
a -- b, a -- c, b -- c
The task is to compute centralities for each network. Using a for loop this is straightforward:
library(igraph)
# Get unique subjects
subjects_uniq <- unique(data$subject)
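# Compute centrality of nodes for each graph, keeping one result per subject
centralities <- vector("list", length(subjects_uniq))
for (i in seq_along(subjects_uniq)) {
  # Filter on the subject id itself, not on the loop counter
  current_data <- data %>% filter(subject == subjects_uniq[i]) %>% select(-subject)
  current_graph <- current_data %>% graph_from_data_frame(directed = FALSE)
  centralities[[i]] <- eigen_centrality(current_graph)$vector
}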
Question: My dataset is huge, so I wonder how to avoid an explicit for loop. Should I use apply() and its modern cousins (maybe map() in the purrr package)? Any suggestions are greatly welcome.
Here is an option using map
library(tidyverse)
library(igraph)
map(subjects_uniq, ~data %>%
filter(subject == .x) %>%
select(-subject) %>%
graph_from_data_frame(directed = FALSE) %>%
{eigen_centrality(.)$vector})
#[[1]]
#a b c
#1 1 1
#[[2]]
# a b c d
#1.0000000 0.8546377 0.8546377 0.4608111
#[[3]]
#a b c
#1 1 1
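Since the dataset is described as huge, it may also help to avoid filtering the full table once per subject. A rough sketch with split() and lapply(), which partitions the rows a single time, is shown here; it uses a plain data.frame instead of a tibble, which is my own simplification.

```r
library(igraph)

# Same toy data as above, as a base data.frame.
data <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  id1 = c("a", "a", "b", "a", "a", "a", "b", "a", "a", "b"),
  id2 = c("b", "c", "c", "b", "c", "d", "c", "b", "c", "c")
)

# Partition the edge columns by subject in one pass.
per_subject <- split(data[c("id1", "id2")], data$subject)

# One undirected graph and one centrality vector per subject.
centralities <- lapply(per_subject, function(edges) {
  g <- graph_from_data_frame(edges, directed = FALSE)
  eigen_centrality(g)$vector
})

centralities
```

The result is a list named by subject, so centralities[["2"]] gives the vector for the second subject.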
I'm working with public transit data in the GTFS standard and have been building edge lists of origin stop to target stop in a sequence across an entire route. I've put some sample R code below to show a sample of the data and graph.
library(igraph)
# edgelist with two nodes with outdegree > 1.
edgelist <- data.frame(source = c("Z","A", "B", "C", "D", "E", "F", "F", "A"),
target = c("A","B", "C", "D", "E", "F", "G", "H", "I"),
edge_sequence = c(0,1, 2, 3, 4, 5, 6, NA , NA),
source_node_out_degree = c(1,1, 1, 1, 1, 1, 2, 2, 2),
group = factor(c(1,1,1,1,1,1,1,2,2)))
# i would like to remove edges within my sequence that have an outdegree of
# one and merge the original source with the
plot(graph.data.frame(edgelist), edge.arrow.size = 0.3)
Below is the edgelist I would like to generate. In this example I've reduced the connection from A to F to a single edge because (a) it lies along the sequence and (b) only nodes with an out-degree of one lie between A and F.
# the expected edgelist after simplifying the network. Connecting nodes that
# have outdegree > 1 on the sequence of edges.
new_expected_edgelist <- data.frame(source = c("Z","A", "F", "F", "A"),
target = c("A","F", "G", "H", "I"))
# edges with outdegree == 1 have been reduced.
plot(graph.data.frame(new_expected_edgelist), edge.arrow.size = 0.3)
This would let me simplify my network so that I visualize only the edges shared between multiple public transit routes. Some routes extend for many stops without connecting to any other route, which makes the complexity of the network harder to visualize.
You can use the contract.vertices command:
g<-graph.data.frame(edgelist)
h<-contract.vertices(g,c(1,2,3,3,3,3,3,8,9,10))
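To get from the contracted graph to the expected edge list, the self-loops created by the merge still have to be removed. The following is only a sketch under a few assumptions of mine: it uses contract(), the newer name for contract.vertices, a gap-free mapping, and it names the merged vertex after its last member so that B..F ends up labelled F.

```r
library(igraph)

# Same stops as in the question, keeping only the two endpoint columns.
edgelist <- data.frame(
  source = c("Z", "A", "B", "C", "D", "E", "F", "F", "A"),
  target = c("A", "B", "C", "D", "E", "F", "G", "H", "I")
)
g <- graph_from_data_frame(edgelist)

# Vertex order in g is Z, A, B, C, D, E, F, G, H, I.
# Assumption: B..F collapse into one vertex (id 3); ids are kept gap-free.
mapping <- c(1, 2, 3, 3, 3, 3, 3, 4, 5, 6)

# Name each merged vertex after its last member.
h <- contract(g, mapping,
              vertex.attr.comb = list(name = function(x) x[length(x)]))

# Contracting turns B->C, C->D, ... into self-loops; drop them.
h <- simplify(h, remove.multiple = TRUE, remove.loops = TRUE)

simplified_edges <- as_edgelist(h)
simplified_edges
```

The mapping is written by hand here; for a real GTFS network it would have to be derived from the out-degrees along each route sequence.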
I'm writing a function to aggregate a dataframe, and it needs to be generally applicable to a wide variety of datasets. One step in this function is dplyr's filter function, used to select from the data only the ad campaign types relevant to the task at hand. Since I need the function to be flexible, I want ad_campaign_types as an input, but this makes filtering kind of hairy, as so:
aggregate_data <- function(ad_campaign_types) {
raw_data %>%
filter(ad_campaign_type == ad_campaign_types) -> agg_data
agg_data
}
new_data <- aggregate_data(ad_campaign_types = c("campaign_A", "campaign_B", "campaign_C"))
I would think the above would work, but while it runs, oddly enough it returns only a small fraction of what the filtered dataset should be. Is there a better way to do this?
Another tiny example of reproducible code:
ad_types <- c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d")
revenue <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
data <- as.data.frame(cbind(ad_types, revenue))
# Now, filtering to select only ad types "a", "b", and "d",
# which should leave us with only 7 values
new_data <- filter(data, ad_types == c("a", "b", "d"))
nrow(new_data)
[1] 3
To match against several values, use the %in% operator:
filter(data, ad_types %in% c("a", "b", "d"))
You can also use a "not in" criterion:
filter(data, !(ad_types %in% c("a", "b", "d")))
However notice that %in%'s behavior is a little bit different than ==:
> c(2, NA) == 2
[1] TRUE NA
> c(2, NA) %in% 2
[1] TRUE FALSE
Some find one of these more intuitive than the other, but you have to keep the difference in mind.
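As for why == returned only 3 rows in the question: == compares element by element and recycles the shorter vector, so each row is tested against just one of the three values. A small base R sketch makes this visible (the variable names are my own):

```r
ad_types <- c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d")
wanted <- c("a", "b", "d")

# What == effectively compares against: `wanted` repeated to length 10.
recycled <- rep_len(wanted, length(ad_types))

# Position-by-position comparison, as done by ad_types == wanted.
eq_hits <- ad_types == recycled

# Set membership: is each element one of the wanted values?
in_hits <- ad_types %in% wanted

sum(eq_hits)
sum(in_hits)
```

Only the rows where the recycled value happens to line up with the data survive the == comparison, while %in% keeps all seven rows of types a, b and d.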
As for using multiple different criteria simply use chains of criteria with and/or statements:
filter(mtcars, cyl > 2 & wt < 2.5 & gear == 4)
Tim is correct for filtering a dataframe. However, if you want to make a function with dplyr, you need to follow the instructions at this webpage: https://rpubs.com/hadley/dplyr-programming.
The code I would suggest:
library(tidyverse)
ad_types <- c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d")
revenue <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
df <- tibble(ad_types = as.factor(ad_types), revenue = revenue)  # data_frame() is deprecated
aggregate_data <- function(df, ad_types, my_list) {
  ad_types <- enquo(ad_types)  # Capture the column name as a quosure
  df %>%
    filter((!!ad_types) %in% my_list)  # Unquote it inside filter(); !! replaces the older UQ()
}
new_data <- aggregate_data(df = df, ad_types = ad_types,
my_list = c("a", "b", "c"))
That should work!
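With more recent versions of dplyr and rlang, the capture and unquote steps can be combined with the embrace operator {{ }}. A minimal sketch, assuming a dplyr version that supports it; the function name filter_by_values and its argument names are made up for illustration:

```r
library(dplyr)

df <- tibble(
  ad_types = c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d"),
  revenue  = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
)

# Hypothetical helper: `column` is passed unquoted, `keep` is a plain vector.
filter_by_values <- function(data, column, keep) {
  data %>%
    filter({{ column }} %in% keep)  # {{ }} forwards the column name to filter()
}

new_data <- filter_by_values(df, ad_types, c("a", "b", "d"))
nrow(new_data)
```

Passing the data frame as an argument, rather than relying on a global raw_data, also keeps the function usable across datasets.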
I implemented the FR test here and now I would like to test it by means of visualizing the resulting minimum spanning trees in R. The vertices and edges should be plotted in a coordinate system.
Moreover, I want to set the color of every dot (depending on which sample it belongs to) and express a possible third dimension through the size of the dots.
This is what I have got so far:
library(ggplot2)
nodes <- data.frame(cbind(c("A", "A", "A", "B", "B", "B"), c(1,2,3,8,2,1), c(6,3,1,4,5,6)))
edges <- data.frame(cbind(c("A", "A", "A"), c("A", "B", "B"), c(1,3,2), c(6,1,5), c(2,8,1), c(3,4,6)))
p <- ggplot() +
geom_point(nodes, aes(x=nodes[,2], y=nodes[,3])) +
geom_line(edges)
p
I also think igraph would be best here...
nodes <- data.frame(a=c("A", "A", "A", "B", "B", "B"), b=c(1,2,3,8,2,1),
d=c(6,3,1,4,5,6))
# cbind() coerced your node coordinates to character, so I have removed it here
edges <- data.frame(a=c("A", "A", "A"), b=c("A", "B", "B"), d=c(1,3,2),
e=c(6,1,5), f=c(2,8,1), g=c(3,4,6))
Here is an example using your data as above. The vertex colours are taken from colouring, and the positions come from the coordinate matrix coords, which is passed as the layout:
library(igraph)
from <- c(rep(edges[,3],3),rep(edges[,4],2),edges[,5])
to <- c(edges[,4],edges[,5],edges[,6],edges[,5],edges[,6],edges[,6])
myedges <- data.frame(from,to)
actors <- data.frame(acts=c(1,2,3,4,5,6,8))
colouring <- sample(colours(), 7)
sizes <- sample(15,7)
coords<-cbind(x=runif(7,0,1),y=runif(7,0,1))
myg <- graph.data.frame(myedges, vertices=actors, directed=FALSE)
V(myg)$colouring <- colouring
V(myg)$sizes <- sizes
plot(myg,vertex.color=V(myg)$colouring,vertex.size=V(myg)$sizes,
layout=coords,edge.color="#55555533")
For plotting a spanning tree there are also many options, e.g.
d <- c(1,2,3)
E(myg)$colouring <- "#55555533"
E(myg, path=d)$colouring <- "red"
V(myg)[ d ]$colouring <- "red"
plot(myg,vertex.color=V(myg)$colouring,vertex.size=V(myg)$sizes
,edge.width=3,layout=coords,edge.color=E(myg)$colouring )
with axes:
plot(myg,vertex.color=V(myg)$colouring,vertex.size=V(myg)$sizes
,edge.width=3,layout=coords,edge.color=E(myg)$colouring, axes=TRUE )
Use rescale=FALSE to keep the original axis scale.
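If you would rather stay in ggplot2, as in the question, the edges can be drawn with geom_segment and the nodes with geom_point. This is only a sketch: it assumes that columns 3 to 6 of edges hold the start and end coordinates (x, y, xend, yend) of each edge, which is my reading of the data, and the column names are my own.

```r
library(ggplot2)

# Node table: sample membership plus plain numeric coordinates.
nodes <- data.frame(
  sample = c("A", "A", "A", "B", "B", "B"),
  x = c(1, 2, 3, 8, 2, 1),
  y = c(6, 3, 1, 4, 5, 6)
)

# Assumed meaning of the edge columns: start point and end point.
edges <- data.frame(
  x    = c(1, 3, 2),
  y    = c(6, 1, 5),
  xend = c(2, 8, 1),
  yend = c(3, 4, 6)
)

# Draw the edges first so the points sit on top of the segments.
p <- ggplot() +
  geom_segment(data = edges,
               aes(x = x, y = y, xend = xend, yend = yend),
               colour = "grey50") +
  geom_point(data = nodes, aes(x = x, y = y, colour = sample), size = 3)

p
```

A third dimension could be shown by adding a numeric column to nodes and mapping it to size inside aes() instead of using the fixed size = 3.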