Suppose I want to make a plot with the following data:
pairs <- c(1, 2, 2, 3, 2, 4, 2, 5, 2, 6, 2, 7, 2, 8, 2, 9, 2, 10, 2, 11, 4,
14, 4, 15, 6, 13, 6, 19, 6, 28, 6, 36, 7, 16, 7, 23, 7, 26, 7, 33,
7, 39, 7, 43, 8, 35, 8, 40, 9, 21, 9, 22, 9, 25, 9, 27, 9, 33, 9,
38, 10, 12, 10, 18, 10, 20, 10, 32, 10, 34, 10, 37, 10, 44, 10, 45,
10, 46, 11, 17, 11, 24, 11, 29, 11, 30, 11, 31, 11, 33, 11, 41, 11,
42, 11, 47, 14, 50, 14, 52, 14, 54, 14, 55, 14, 56, 14, 57, 14, 58,
14, 59, 14, 60, 14, 61, 15, 48, 15, 49, 15, 51, 15, 53, 15, 62, 15,
63)
g <- graph( pairs )
plot( g,layout = layout.reingold.tilford )
I get a plot like the one below:
As you can see, the spaces between some of the vertices are so small that the vertices overlap.
1. I wonder if there is a way to change the spacing between vertices.
2. In addition, is the spacing between vertices arbitrary? For example, vertices 3, 4, and 5 are very close to each other, but 5 and 6 are far apart.
EDIT:
For my 2nd question, I guess the spacing is dependent on the number of nodes below. E.g., 10 and 11 are farther from each other than 8 and 9 are because there are more children below 10 and 11 than there are below 8 and 9.
I bet there is a better solution, but I cannot find it. Here is my approach. Since there seems to be no general parameter for width, you have to adjust several parameters manually to obtain the desired output.
My approach is primarily to resize some plot elements and to adjust the margins so that the available space is used as fully as possible. The most important parameter here is asp, which controls the aspect ratio of the plot (since this plot is better wide than tall, an aspect ratio even below 0.5 works well). Other tricks are to reduce the size of the vertices and the fonts. Here is the code:
plot( g, layout = layout.reingold.tilford,
edge.width = 1,
edge.arrow.width = 0.3,
vertex.size = 5,
edge.arrow.size = 0.5,
vertex.size2 = 3,
vertex.label.cex = 1,
asp = 0.35,
margin = -0.1)
That produces this plot:
Another approach would be to set the graphics device to PDF (or JPEG, etc.) and then set rescale to FALSE. With the RStudio viewer this cuts off a large part of the plot, but with other graphics devices it might work well (no guarantee).
Anyway, if in doubt about how to use these parameters (which are sometimes very tricky), type help(igraph.plotting).
For the second part of the question I am not sure: looking inside the function I cannot work out a precise answer, but I guess that the space between elements on the same level is calculated from the child elements they have. Say, the spacing around 3, 4 and 5 depends on their children and sub-children, since those require space on the levels below.
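The guess about child counts can at least be checked against the data. The sketch below is plain Python rather than igraph, and it does not reproduce the Reingold-Tilford layout itself; it only counts how many vertices sit below each child of vertex 2 in the edge list from the question. The helper name descendants is made up for this illustration.

```python
from collections import defaultdict

# Edge list from the question: consecutive values form (parent, child) pairs.
pairs = [1, 2, 2, 3, 2, 4, 2, 5, 2, 6, 2, 7, 2, 8, 2, 9, 2, 10, 2, 11, 4,
         14, 4, 15, 6, 13, 6, 19, 6, 28, 6, 36, 7, 16, 7, 23, 7, 26, 7, 33,
         7, 39, 7, 43, 8, 35, 8, 40, 9, 21, 9, 22, 9, 25, 9, 27, 9, 33, 9,
         38, 10, 12, 10, 18, 10, 20, 10, 32, 10, 34, 10, 37, 10, 44, 10, 45,
         10, 46, 11, 17, 11, 24, 11, 29, 11, 30, 11, 31, 11, 33, 11, 41, 11,
         42, 11, 47, 14, 50, 14, 52, 14, 54, 14, 55, 14, 56, 14, 57, 14, 58,
         14, 59, 14, 60, 14, 61, 15, 48, 15, 49, 15, 51, 15, 53, 15, 62, 15,
         63]

# Map every parent to the list of its direct children.
children = defaultdict(list)
for parent, child in zip(pairs[0::2], pairs[1::2]):
    children[parent].append(child)


def descendants(node):
    """Return the set of all vertices reachable below `node` (hypothetical helper)."""
    seen = set()
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, []))
    return seen


# Number of vertices below each child of vertex 2 (vertices 3 to 11).
sizes = {v: len(descendants(v)) for v in range(3, 12)}
print(sizes)
# {3: 0, 4: 18, 5: 0, 6: 4, 7: 6, 8: 2, 9: 6, 10: 9, 11: 9}
```

Vertices 10 and 11 each have 9 vertices below them, against 2 and 6 for vertices 8 and 9, which is at least consistent with the idea that larger subtrees push siblings further apart.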
I created a graph G whose node view is <0, 1, 2, ..., 100>.
I randomly removed 20 nodes, and the node view of the new graph is missing the nodes I removed. For example:
node view <0,1,3,5,6,7,9 ...100>
However, I want the new graph to have a node view like the following:
<0,1,2....80>
Is there any solution? I tried relabeling and copying the same graph, but neither worked.
PS. My nodes have an attribute label equal to either 0 or 1,
and I want to preserve it.
Here is one approach you can take. After removing your nodes from the graph you can relabel the remaining nodes using nx.relabel_nodes to get the node view you want. See example below:
import networkx as nx
import numpy as np
#Creating random graph
N_nodes=50
G=nx.erdos_renyi_graph(N_nodes,p=0.25)
#Removing random nodes
N_del_nodes=10
del_node_list=np.random.choice(N_nodes,size=N_del_nodes,replace=False)
G.remove_nodes_from(del_node_list)
print('Node view without relabelling:' +str(G.nodes))
#Relabelling graph
label_mapping={old:new for new,old in enumerate(G.nodes)}
G_rel=nx.relabel_nodes(G, label_mapping)
print('Node view with relabelling:' +str(G_rel.nodes))
And the output gives:
Node view without relabelling:[0, 1, 2, 5, 6, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 30, 31, 32, 33, 34, 36, 37, 38, 40, 41, 44, 45, 46, 47, 48, 49]
Node view with relabelling:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
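Since the question also asks about keeping the label attribute, here is a small sketch of an alternative, assuming a networkx version that provides nx.convert_node_labels_to_integers. The tiny path graph and the attribute name old_id are made up for this illustration; node attributes are carried over to the renumbered graph.

```python
import networkx as nx

# Small example graph with a 0/1 "label" attribute on every node.
G = nx.path_graph(6)
nx.set_node_attributes(G, {n: n % 2 for n in G.nodes}, name="label")

# Remove two nodes, leaving gaps in the node ids: 0, 1, 3, 5.
G.remove_nodes_from([2, 4])

# Renumber the remaining nodes 0..n-1; the original id is kept in "old_id".
H = nx.convert_node_labels_to_integers(G, label_attribute="old_id")

print(list(H.nodes))             # [0, 1, 2, 3]
print(dict(H.nodes(data=True)))  # attributes, including "label", are preserved
```

nx.relabel_nodes, as used above, keeps node attributes as well, so either route should work for the stated requirement.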
Suppose I have a graph that looks like this:
Is there a way to count only the nodes that have links? So instead of 6, it would count 5, since one node has no link.
Also, if I open a graph with read_edgelist and afterwards call number_of_nodes, does the function count all nodes whether or not they have links, or only those that have links/edges, since I opened it with read_edgelist? Thank you for your help.
If you want to filter out isolated nodes, you can iterate through the graph's nodes and keep only those that have neighbors.
With a list comprehension:
import networkx as nx
G = nx.fast_gnp_random_graph(40, 0.05, directed=False, seed=1)
print([n for n in G.nodes if len(list(G.neighbors(n))) > 0])
or filter function:
print(list(filter(lambda n: len(list(G.neighbors(n))) > 0, G.nodes)))
Both will print the same:
[0, 1, 2, 3, 4, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 38, 39]
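If I understand the second part correctly: number_of_nodes returns the number of all nodes in the graph, not only those that have edges. Note, though, that an edge list can only describe nodes that appear in at least one edge, so a graph created by read_edgelist contains no isolated nodes unless you add them afterwards.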
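To get the count itself rather than the list, a minimal sketch could look like this. The six-node graph is made up to match the description in the question, and nx.parse_edgelist stands in for read_edgelist so that no file is needed.

```python
import networkx as nx

# Five linked nodes plus one isolated node, as described in the question.
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 5)])
G.add_node(6)  # no edges attached

# Keep only nodes whose degree is greater than zero.
linked = [n for n, degree in G.degree() if degree > 0]
print(len(linked))           # 5
print(G.number_of_nodes())   # 6, isolated nodes are counted too
print(list(nx.isolates(G)))  # [6]

# A graph parsed from edge-list lines only contains nodes that appear in an edge.
H = nx.parse_edgelist(["1 2", "2 3", "3 4", "4 5"], nodetype=int)
print(H.number_of_nodes())   # 5
```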
I use the DiagrammeR library in R to create and render binary trees. I find it very simple to use, and it creates high-quality renders. However, creating a tree that is not full (perfect) generates messy renders.
Here is what I get when my tree has 16 leaves (h = 4):
Fully binary tree
To be clear, every node label is the row name of nodes data.frame which indicates the sequence of nodes passed to the graph:
nodes$label = rownames(nodes)
And here is what I get if I add one node [32] from node [31] - either manually or by add_node() and add_edge() functions:
Non-perfect binary tree
As you can see, everything gets messy. I would like node [32] to sit directly under node [31], connected by a straight vertical edge. Is that even possible with this library? I can't figure out the proper order of nodes in the nodes data.frame.
Here is what my full code looks like:
library(DiagrammeR)
from = c(1, 1, 2, 2, 3, 3, 4, 4, 7, 7, 10, 10, 11, 11, 14, 14, 17, 17, 18, 18, 19, 19, 22, 22, 25, 25, 26, 26, 29, 29)
to = c(2, 17, 3, 10, 4, 7, 5, 6, 8, 9, 11, 14, 12, 13, 15, 16, 18, 25, 19, 22, 20, 21, 23, 24, 26, 29, 27, 28, 30, 31)
h=4
n = 2^(h+1)-1
edges = data.frame(from, to)
nodes = data.frame(id = 1:n, label=1:n, shape='circle')
g1 = create_graph(nodes, edges)
render_graph(g1, layout='tree', title='g1')
# add node [32] and edge [31-32]
edges2 = rbind(edges, c(31, 32))
nodes2 = nodes
nodes2[32, 1:2] = 32
nodes2[32, 3] = 'circle'
g2 = create_graph(nodes2, edges2)
render_graph(g2, layout='tree', title='g2')
I have a dataset with 50 thousand rows that I want to sort according to the values in one of the columns. The numbers in the column go from 1 to 30, and when I do the following
data=data[order(data$columnname),]
it gets sorted so that the order of the values is like this
1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 3, 30, 4, 5, 6, 7, 8, 9
How could I sort it so that it is like this
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
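It seems to me that your column is not numeric (it is probably character or factor). Try this:
# as.character first, so a factor column is converted by its labels, not its internal level codes
data$columnname<-as.numeric(as.character(data$columnname))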
data=data[order(data$columnname),]
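The ordering in the question is what you get when numbers are compared as text. The same effect can be reproduced in a few lines of Python; this is only an illustration of the cause, not of how R's order works internally.

```python
# The values 1..30 stored as text, as they would be in a character column.
values = [str(n) for n in range(1, 31)]

# Text comparison goes character by character, so "10" sorts before "2".
as_text = sorted(values)
print(as_text[:12])  # ['1', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '2']

# Converting to numbers for the comparison restores the expected order.
as_numbers = sorted(values, key=int)
print(as_numbers[:5])  # ['1', '2', '3', '4', '5']
```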
How can I calculate the mean of the top 4 observations in my column?
c(12, 13, 15, 1, 5, 9, 34, 50, 60, 50, 60, 4, 6, 8, 12)
For instance, in the above I would have (50+60+50+60)/4 = 55. I only know how to use the quantile, but it does not work for this.
Any ideas?
Since you're interested in only the top 4 items, you can use partial sort instead of full sort. If your vector is huge, you might save quite some time:
x <- c(12, 13, 15, 1, 5, 9, 34, 50, 60, 50, 60, 4, 6, 8, 12)
idx <- seq(length(x)-3, length(x))
mean(sort(x, partial=idx)[idx])
# [1] 55
Try this:
vec <- c(12, 13, 15, 1, 5, 9, 34, 50, 60, 50, 60, 4, 6, 8, 12)
mean(sort(vec, decreasing=TRUE)[1:4])
gives
[1] 55
Maybe something like this:
v <- c(12, 13, 15, 1, 5, 9, 34, 50, 60, 50, 60, 4, 6, 8, 12)
mean(head(sort(v, decreasing=TRUE), 4))
First, you sort the vector so that the largest values come first. Then with head you take the first 4 values of that vector, and finally you take their mean.
To be different! Also, please try to do some research on your own before posting.
x <- c(12, 13, 15, 1, 5, 9, 34, 50, 60, 50, 60, 4, 6, 8, 12)
mean(tail(sort(x), 4))
Just to show that you can use quantile in this exercise:
mean(quantile(x,1-(0:3)/length(x),type=1))
#[1] 55
However, the other answers are clearly more efficient.
You could use the order function. Order by -x to give the values in descending order, and just average the first 4:
x <- c(12, 13, 15, 1, 5, 9, 34, 50, 60, 50, 60, 4, 6, 8, 12)
mean(x[order(-x)][1:4])
[1] 55
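For comparison, the partial-selection idea from the first answer can be sketched in Python using only the standard library. heapq.nlargest picks the top values without sorting the whole list; the variable names are chosen for this illustration.

```python
import heapq
from statistics import mean

x = [12, 13, 15, 1, 5, 9, 34, 50, 60, 50, 60, 4, 6, 8, 12]

# Select the 4 largest values without fully sorting x.
top4 = heapq.nlargest(4, x)
top4_mean = mean(top4)

print(top4, top4_mean)  # [60, 60, 50, 50] 55
```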