BFS father attribute in igraph is wrong - r

I am using the igraph package and I am not sure whether this is a bug, but the $father output sometimes makes no sense, specifically after I rename the vertices by setting the name vertex attribute.
h<-make_tree(10)
#with normal vertex attributes
graph.bfs(h, root="1", neimode='out', order=TRUE, father=TRUE,unreachable=FALSE) #father output seems correct
plot(h,layout=layout_as_tree)
#with renamed vertex attributes
set.seed(1)
h<-set.vertex.attribute(h, "name", value=sample(1:10,10))
plot(h,layout=layout_as_tree)
graph.bfs(h, root="3", neimode='out', order=TRUE, father=TRUE,unreachable=FALSE) #father output seems wrong
I obtain the output below:
#with normal vertex attributes
$order
+ 10/10 vertices, from ff55a96:
[1] 1 2 3 4 5 6 7 8 9 10
$rank
NULL
$father
+ 10/10 vertices, from ff55a96:
[1] NA 1 1 2 2 3 3 4 4 5
#with renamed vertex attributes
$order
+ 10/10 vertices, named, from 170f7a0:
[1] 3 4 5 7 2 8 9 6 10 1
$rank
NULL
$father
+ 10/10 vertices, named, from 170f7a0:
[1] 3 4 5 7 2 8 9 6 10 1
I do not understand why the father output is wrong in the renamed case. For example, the first element should be NA, but it's not.
Can someone explain what is happening? And how do I fix this so that my father elements look like those in the first case?

It's a bit strange, but for some reason the bfs function assigns the vertex names directly to the names of the father vector. See lines 54-55 of the source code:
if (father)
names(res$father) <- V(graph)$name
Clearly, this simply overwrites the names of res$father with the vector of names in the graph. Note that this conditional statement is only reached when the option igraph_opt("add.vertex.names") is TRUE.
So we can avoid this behavior by setting the global option for adding vertex names to FALSE.
> igraph_options()$add.vertex.names
[1] TRUE
> igraph_options(add.vertex.names=FALSE)
> igraph_options()$add.vertex.names
[1] FALSE
Now it should work:
h<-make_tree(10)
set.seed(1)
h<-set_vertex_attr(h, "name", value=sample(1:10,10))
bfs(h, root=1, neimode='out', order=TRUE, rank=TRUE, father=TRUE,unreachable=FALSE)
Output:
$root
[1] 1
$neimode
[1] "out"
$order
+ 10/10 vertices, named:
[1] 3 4 5 7 2 8 9 6 10 1
$rank
[1] 1 2 3 4 5 6 7 8 9 10
$father
+ 10/10 vertices, named:
[1] <NA> 3 3 4 4 5 5 7 7 2
$pred
NULL
$succ
NULL
$dist
NULL
It might be worth raising this on the igraph GitHub, since this seems (at least to me) like undesirable behavior.
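To see what the corrected $father vector contains, the mapping from positional father ids to vertex names can be reproduced by hand in base R. This is only a sketch: father_ids is copied from the unnamed-graph output in the question and vertex_names from the labels printed for the renamed graph; both variable names are made up for illustration.

```r
# Father of each vertex by position, as printed for the unnamed tree.
father_ids <- c(NA, 1, 1, 2, 2, 3, 3, 4, 4, 5)

# Vertex names in position order, as printed for the renamed graph.
vertex_names <- c("3", "4", "5", "7", "2", "8", "9", "6", "10", "1")

# Translate each father id into the name of that vertex; NA (the root) stays NA.
father_names <- vertex_names[father_ids]
print(father_names)
```

The result lines up with the $father output obtained after switching the option off: the root has no father, and every other entry is the name of the parent vertex.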

Related

R - Print list in file and recover list

I have a list that looks like this:
> indices
$`48-168`
$`48-168`$`1`
[1] 1 2 3 4 5 6 7 8 9 10
$`60-180`
$`60-180`$`1`
[1] 1 2 3 4 5 6 7 8 9 10
$`180-300`
$`180-300`$`1`
[1] 1 2
$`180-300`$`4`
[1] 4 5 6 7 8 9 10
$`180-300`$`3`
[1] 3
I want to print it somehow in a file and then recover the same list later.
I thought of printing the object given by unlist(as.relistable(obj)) and using relist later, but then I do not know how to recover the information from the file.
Given that your data is not particularly well structured, you might want to just use save() here, and save the original R list object:
save(indices, file="/path/to/your/file.txt")
When you want to load indices again, use the load() function:
load(file="/path/to/your/file.txt")
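As an alternative sketch (not part of the answer above), base R's saveRDS() and readRDS() store a single object and let you pick the variable name when reading it back, whereas load() always restores the original name. The temporary file path and the name restored are placeholders for illustration.

```r
# Rebuild a list shaped like the one in the question.
indices <- list(
  `48-168`  = list(`1` = 1:10),
  `60-180`  = list(`1` = 1:10),
  `180-300` = list(`1` = 1:2, `4` = 4:10, `3` = 3L)
)

# Write the object to a temporary file (placeholder path).
path <- tempfile(fileext = ".rds")
saveRDS(indices, file = path)

# Read it back under any name and check that nothing was lost.
restored <- readRDS(path)
print(identical(restored, indices))
```

Both approaches write R's binary serialization format, so the file is not human-readable regardless of the extension chosen.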

Problems with igraph, statnet and GEPHI?

I'm working with some graph models in Gephi, Python and R, and by chance I decided to compare the results they gave me.
That is where I ran into the following problem. When calculating betweenness centrality with Gephi and with R (using igraph and statnet), the three give me different results (igraph and statnet differ only slightly from each other). Since I am working with a very large network, I decided to take a small network and perform the calculation by hand.
The small network has the following edge list:
source target
1 2
1 3
1 4
2 3
3 4
4 5
4 6
5 6
5 8
5 7
6 8
6 7
7 8
7 9
I then checked what results R and Gephi gave for it. Gephi gives me the same results as the hand calculation.
But R does not (neither igraph nor statnet):
> library('igraph')
> data <- read.csv(file.choose())
> set.seed(123456)
> graph_1<-graph.data.frame(data)
> summary(graph_1)
IGRAPH cfa51db DN-- 9 14 --
+ attr: name (v/c)
> graph_1
IGRAPH cfa51db DN-- 9 14 --
+ attr: name (v/c)
+ edges from cfa51db (vertex names):
[1] 1->2 1->3 1->4 2->3 3->4 4->5 4->6 5->6 5->8 5->7 6->8 6->7 7->8 7->9
> betweenness(graph_1)
1 2 3 4 5 6 7 8 9
0 0 6 15 6 6 6 0 0
> detach("package:igraph", unload=TRUE)
> library(statnet)
> library(intergraph)
> graph_2<-asNetwork(graph_1)
> betweenness(graph_2)
[1] 0 0 6 15 6 6 6 0 0
Am I doing something wrong in my R code, or is R using a different algorithm to calculate betweenness centrality?
Thank you :)
You are computing two different things.
First, to make your example reproducible, here is code that all of us can use to make your example.
library(igraph)
EL = matrix(c(1,2, 1,3, 1,4, 2,3, 3,4, 4,5, 4,6, 5,6, 5,8,
5,7, 6,8, 6,7, 7,8, 7,9), ncol=2, byrow=TRUE)
graph_1 = graph_from_edgelist(EL)
Now, using your code, I get the same result.
betweenness(graph_1)
[1] 0 0 6 15 6 6 6 0 0
However,
betweenness(graph_1, directed=FALSE)
[1] 3 0 3 15 6 6 7 0 0
gives the same result as you got from Gephi.
The help page ?betweenness says:
directed
Logical, whether directed paths should be considered while
determining the shortest paths.
Clearly, Gephi has different defaults than R.
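The comparison can be put side by side in one short script. This is a sketch that assumes the igraph package is installed; the names el, g, g_undirected, b_directed and b_undirected are chosen here for illustration. It also builds the graph as undirected from the start, which is another way to avoid relying on the directed argument.

```r
library(igraph)

# Edge list from the question, one edge per row.
el <- matrix(c(1,2, 1,3, 1,4, 2,3, 3,4, 4,5, 4,6, 5,6, 5,8,
               5,7, 6,8, 6,7, 7,8, 7,9), ncol = 2, byrow = TRUE)

# graph_from_edgelist() creates a directed graph unless told otherwise.
g <- graph_from_edgelist(el)

b_directed   <- betweenness(g)                    # follows edge directions
b_undirected <- betweenness(g, directed = FALSE)  # ignores edge directions

# Alternative: make the graph undirected when constructing it.
g_undirected <- graph_from_edgelist(el, directed = FALSE)

print(rbind(directed = b_directed, undirected = b_undirected))
print(all.equal(betweenness(g_undirected), b_undirected))
```

Reading the edge list as undirected up front is probably the safer choice when the data has no meaningful direction, because every later centrality call then uses the same interpretation.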

How to get the ID of each node from topological sort?

I have a network (a directed acyclic graph):
dag_1 <- barabasi.game(20)
I applied a topological sort:
top1 <- topo_sort(dag_1)
top1
+ 20/20 vertices, from 0ee5d26:
[1] 5 8 11 13 14 15 16 17 18 20 4 7 12 19 2 10 9 6 3 1
If I type top1 and hit enter, the results are above. I need to access the vector
5 8 11 13, ..., 1
I tried top1[1] and top1[[1]]. Neither of them gave me the vector.
How can I get it?
top1 is an object of class igraph.vs, so indexing it, e.g. top1[1:10], returns another vertex sequence of the graph rather than a plain vector. To get a plain vector of the vertex ids, use:
as.vector(top1)
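igraph also provides as_ids() for this conversion. The sketch below assumes a reasonably recent igraph version and uses sample_pa(), the current name for barabasi.game(); the variable ids is introduced only for illustration. Because the graph is random, the actual order will differ from run to run.

```r
library(igraph)

# A random directed graph grown by preferential attachment (acyclic by construction).
dag_1 <- sample_pa(20)

# Topological order as a vertex sequence (class igraph.vs).
top1 <- topo_sort(dag_1)

# Convert the vertex sequence to a plain vector of vertex ids.
ids <- as_ids(top1)
print(ids)
print(class(ids))
```

For a graph with a name vertex attribute, as_ids() returns the names instead of the numeric ids, which is often what you want when reporting results.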

Short(er) notation of selecting a part of a data.frame or other objects in R

I always get angry at my R code when I have to process dataframes, i.e. filtering out certain rows. The code gets very illegible as I tend to choose meaningful, but long, names for my objects. An example:
all.mutations.extra.large.name <- read.delim(filename)
head(all.mutations.extra.large.name)
id gene pos aa consequence V
ENSG00000105732 ZN574_HUMAN 81 x/N missense_variant 3
ENSG00000125879 OTOR_HUMAN 7 V/3 missense_variant 2
ENSG00000129194 SOX15_HUMAN 20 N/T missense_variant 3
ENSG00000099204 ABLM1_HUMAN 33 H/R missense_variant 2
ENSG00000103335 PIEZ1_HUMAN 11 Q/R missense_variant 3
ENSG00000171533 MAP6_HUMAN 39 A/G missense_variant 3
all.mutations.extra.large.name <- all.mutations.extra.large.name[which(all.mutations.extra.large.name$gene == "ZN574_HUMAN"), ]
So in order to kick out all the rows I am not interested in, I need to reference the object all.mutations.extra.large.name three times. Repeating this kind of step for different columns makes the code really difficult to understand.
Therefore my question: is there a way to filter rows by a criterion without referencing the object three times? Something like this would be beautiful: myobj[,gene=="ZN574_HUMAN"]
You can use subset for that:
subset(all.mutations.extra.large.name, gene == "ZN574_HUMAN")
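A small self-contained sketch of the difference, using a made-up three-row data frame called mutations (only the gene and pos columns from the question are reproduced):

```r
# Toy data frame with a few of the rows shown in the question.
mutations <- data.frame(
  gene = c("ZN574_HUMAN", "OTOR_HUMAN", "SOX15_HUMAN"),
  pos  = c(81, 7, 20),
  stringsAsFactors = FALSE
)

# subset() evaluates the condition inside the data frame, so the object is named once.
by_subset <- subset(mutations, gene == "ZN574_HUMAN")

# Bracket indexing needs the object name twice and a trailing comma to select rows.
by_bracket <- mutations[mutations$gene == "ZN574_HUMAN", ]

print(by_subset)
print(isTRUE(all.equal(by_subset, by_bracket)))
```

The documentation for subset() notes that it is intended for interactive use; inside functions, bracket indexing tends to be the more predictable choice.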
Several options:
all.mutations.extra.large.name <- data.frame(a=1:5, b=2:6)
within(all.mutations.extra.large.name, a[a < 3] <- 0)
a b
1 0 2
2 0 3
3 3 4
4 4 5
5 5 6
transform(all.mutations.extra.large.name, b = b^2)
a b
1 1 4
2 2 9
3 3 16
4 4 25
5 5 36
Also check ?attach if you would like to avoid repetitive typing like all.mutations.extra.large.name$foo.

returning a list in R and functional programming behavior

I have a basic question regarding functional programming in R.
Given a function that returns a list, such as:
myF <- function(x){
return (list(a=11,b=x))
}
why does 'a' always have the same length in the returned list when the function is called with a range or vector?
Ex:
myF(1:10)
returns:
$a
[1] 11
$b
[1] 1 2 3 4 5 6 7 8 9 10
How can one change the behavior so that the 'a' element has the same length as b's?
I am actually working with a bunch of S4 objects that I cannot easily convert to a list (using as.list), so _apply is not my first choice.
Thanks for any insight or help!
EDIT (Added further explanations)
I am not necessarily looking to just pad 'a' to make its length equal to b's. However, using the solution
as.list(data.frame(a=myA,b=x)) pads 'a' with the same value that was computed first.
myF <- function(x){
myA = ceiling(runif(1, max=100))
return (as.list(data.frame(a=myA
,b=x)))
}
myF(1:10)
$a
[1] 79 79 79 79 79 79 79 79 79 79
$b
[1] 1 2 3 4 5 6 7 8 9 10
I still am not sure why that happens!
Thanks
Are you just looking to have 11 repeated so that a is the same length as b? If so:
> myF <- function(x){
+ return (list(a=rep(11,length(x)),b=x))
+ }
> myF(1:10)
$a
[1] 11 11 11 11 11 11 11 11 11 11
$b
[1] 1 2 3 4 5 6 7 8 9 10
EDIT based on OP's clarification/comments. If you want 'a' to instead be a random vector with length equal to 'b':
> myF <- function(x){
+ return (list(a=ceiling(runif(length(x),max=100)),b=x))
+ }
> myF(1:10)
$a
[1] 4 31 8 45 25 74 36 95 64 32
$b
[1] 1 2 3 4 5 6 7 8 9 10
I don't quite understand what you mean by not being able to use as.list. You should be able to get a version of your function satisfying the requirement that all components of the list be equally long by doing:
myF <- function(x){
return(as.list(data.frame(a=11,b=x)))
}
EDIT:
The reason list does not work the way you expect is that list applied to a number of lists/vectors/etc. is just that, a list of those lists/vectors/etc.; it does not "inspect" their structure.
What I think you want is the additional semantics that the vectors contained in the list should match up and produce a set of "rows", each with one corresponding element from each one of your vectors. This is exactly what a data frame is supposed to be (and indeed, I think, how a data frame is represented in R). The final as.list call does little more than change what type it's tagged as.
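The difference can be checked directly with lengths(). This is a minimal sketch in base R; the names x, as_list and as_df are made up for illustration.

```r
x <- 1:5

# list() stores each argument as-is: 'a' keeps length 1.
as_list <- list(a = 11, b = x)

# data.frame() recycles the single value of 'a' to match the 5 rows of 'b'.
as_df <- as.list(data.frame(a = 11, b = x))

print(lengths(as_list))
print(lengths(as_df))
```

This is also why the question's version with runif(1) shows one number repeated: the random value is drawn once, and data.frame() then recycles that single value across all rows.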
EDIT2:
Note that if I'm wrong above (and that's not the general behaviour you want) then Mac's solution is more appropriate, as it gives you exactly the behaviour that both vectors should have the same length, without implying that they should "line up".
In that case, using a data.frame would both confuse anyone reading the code (as a data.frame implies you think of your vectors as matching up) and force any additional elements you add to the list to be converted into vectors of the appropriate length (which may or may not be what you want).
In case I did not understand you correctly last time, here is another possibility:
If you want to generate a second vector, given some function/expression, of the same length as your argument you could do something like:
myF <- function(x){
return (list(a=replicate(length(x),f),b=x))
}
in your example f could be runif(1, max=100), though in the specific case of runif you could explicitly tell it to generate a vector of appropriate length by calling runif(length(x), max=100) inside the function.
replicate simply re-evaluates f the number of times you request, and gives you the vector of all the results.
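To make the contrast concrete, here is a small base R sketch comparing rep(), replicate() and a vectorised runif() call. The seed and the variable names are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed so repeated runs give the same draws
n <- 5

# rep() evaluates its argument once and repeats that single value.
recycled <- rep(ceiling(runif(1, max = 100)), n)

# replicate() re-evaluates the expression n times, giving n independent draws.
reevaluated <- replicate(n, ceiling(runif(1, max = 100)))

# For runif() specifically, asking for n values directly is the simplest option.
vectorised <- ceiling(runif(n, max = 100))

print(recycled)
print(reevaluated)
print(vectorised)
```

replicate() is the more general tool because it works for any expression, while the vectorised call is usually faster when the function supports it.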
It appears that your function is "hard coding" a. So no matter what you specify it will always give 11.
If for example you changed the function to:
myF <- function(x){ return (list(a=x,b=x)) }
myF(1:10)
$a
[1] 1 2 3 4 5 6 7 8 9 10
$b
[1] 1 2 3 4 5 6 7 8 9 10
a is allowed to change like b.
or
myF <- function(x,y){ return (list(a=y,b=x)) }
myF(10:1,1:10)
$a
[1] 1 2 3 4 5 6 7 8 9 10
$b
[1] 10 9 8 7 6 5 4 3 2 1
Now a is allowed to change independent of b.
