How to create a non-perfect binary tree with DiagrammeR? - r

I use the DiagrammeR library in R to create and render binary trees. I find it very simple to use and it creates high-quality renders. However, creating a tree that is not full (perfect) generates messy renders.
Here is what I get when my tree has 16 leaves (h = 4):
[Figure: full binary tree]
To be clear, every node label is the row name in the nodes data.frame, which indicates the order in which the nodes are passed to the graph:
nodes$label = rownames(nodes)
And here is what I get if I add one node [32] below node [31], either manually or with the add_node() and add_edge() functions:
[Figure: non-perfect binary tree]
As you can see, everything gets messy. I would like node [32] to sit directly under node [31], connected by a straight vertical edge. Is that even possible with this library? I can't figure out the proper order of nodes in the nodes data.frame.
Here is my full code:
library(DiagrammeR)
from = c(1, 1, 2, 2, 3, 3, 4, 4, 7, 7, 10, 10, 11, 11, 14, 14, 17, 17, 18, 18, 19, 19, 22, 22, 25, 25, 26, 26, 29, 29)
to = c(2, 17, 3, 10, 4, 7, 5, 6, 8, 9, 11, 14, 12, 13, 15, 16, 18, 25, 19, 22, 20, 21, 23, 24, 26, 29, 27, 28, 30, 31)
h=4
n = 2^(h+1)-1
edges = data.frame(from, to)
nodes = data.frame(id = 1:n, label=1:n, shape='circle')
g1 = create_graph(nodes, edges)
render_graph(g1, layout='tree', title='g1')
# add node [32] and edge [31-32]
edges2 = rbind(edges, c(31, 32))
nodes2 = nodes
nodes2[32, 1:2] = 32
nodes2[32, 3] = 'circle'
g2 = create_graph(nodes2, edges2)
render_graph(g2, layout='tree', title='g2')

Related

I want to calculate a formula in R

I have a dataset that starts like this:
In dput it is
structure(list(20, TRUE, c(0, 0, 1, 1, 1, 1, 2, 3, 4, 4, 4, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7), c(8, 1, 0, 8, 9, 5,
8, 10, 10, 5, 7, 4, 11, 12, 6, 13, 14, 15, 16, 17, 18, 4, 5,
19, 4, 17), c(1, 0, 2, 5, 3, 4, 6, 7, 9, 10, 8, 11, 14, 12, 13,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25), c(2, 1, 11, 21,
24, 5, 9, 22, 14, 10, 0, 3, 6, 4, 7, 8, 12, 13, 15, 16, 17, 18,
19, 25, 20, 23), c(0, 2, 6, 7, 8, 11, 21, 24, 26, 26, 26, 26,
26, 26, 26, 26, 26, 26, 26, 26, 26), c(0, 1, 2, 2, 2, 5, 8, 9,
10, 13, 14, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26), list(c(1,
0, 1), structure(list(), names = character(0)), list(name = c("1",
"3", "5", "6", "8", "9", "12", "19", "2", "4", "7", "10", "11",
"14", "15", "16", "17", "18", "20", "13")), list(`Number of messages` = c(157,
1058, 2481, 833, 178, 119, 66, 222, 20, 343, 3, 4991, 47, 11,
83, 26, 10, 19, 33, 84, 51, 589, 79, 37, 110, 55))), <environment>), class = "igraph")
So far I have the following lines of code:
Datensatz <- read_xlsx("...")
Netzwerkgraph <- graph.data.frame(Datensatz[,1:3], directed = TRUE)
actors<-Datensatz$From
relations<-Datensatz$To
weight<-Datensatz$`Number of messages`
How can I calculate the following formula in R with my data set?
I've tried the following code:
Function <- function(i,j,x,y,z){
i <- actors
j <- relations
w <- weight
for(i in 1:20)
print (-1/(cumsum 1:length(actors, i)(w,i+1))logb(x,base=2)*1/(cumsum 1:length(actors, i)*w,i+1))
}
It isn't entirely clear how you wish to apply the given formula to your example data set, that is, exactly what inputs you are using and what outputs you wish to achieve. Hence, it also isn't clear if the following approach will be sufficient for your purposes. Here is my interpretation thus far.
If one interprets each unique value in the "from" column as being a node i, then it appears that you wish to calculate the sum of messages to each j in the "to" column for each sender i in the "from" column. One approach might then be to calculate all such sums by sender first and then run them all through a simple function that accepts the sum along with some lambda constant.
I used a lambda value of "2" below arbitrarily for illustrative purposes. Additionally, while the formula references a time t, there does not appear to be a time component in your example data set; time isn't represented in this approach. The output would presumably represent the expression for each node at a single point in time.
#written in R version 4.2.1
require(data.table)
##Example data frame
df = data.frame(from = c(1,1,3,3,3), to = c(2,3,1,2,4),nm = c(157,1058,2481,833,178))
df = data.table(df)
df
from to nm
1: 1 2 157
2: 1 3 1058
3: 3 1 2481
4: 3 2 833
5: 3 4 178
##Calculate the sum of messages by sender in "from" column
nf = df[,sum(nm), by = from]
colnames(nf) = c("from","message_total")
nf
from message_total
1: 1 1215
2: 3 3492
## Function
## inputs to function are the total number of messages of a sender in
## "from" column (called cit) and some lambda constant
icit = function(cit,lambda = 2){
-(1/(cit + lambda))*log(1/((cit + lambda)), base = 2)
}
##Find vector of values for each sender in the data set
ans = NULL
for(i in 1:dim(nf)[1]){
ans[i] = icit(nf$message_total[i])
}
ans
[1] 0.008421622 0.003368822
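For readers working outside R, the same two steps (sum the messages per sender, then evaluate the expression on each total) can be sketched in Python. This is only an illustration under the interpretation given above, not a definitive implementation of the formula: the names rows, message_totals, icit, and lam are chosen here for the example, and lam = 2 is the same arbitrary constant.

```python
import math
from collections import defaultdict

# Example edge list mirroring the R data frame above:
# (from, to, number of messages)
rows = [(1, 2, 157), (1, 3, 1058), (3, 1, 2481), (3, 2, 833), (3, 4, 178)]


def message_totals(rows):
    """Sum the number of messages per sender (the "from" column)."""
    totals = defaultdict(int)
    for sender, _receiver, n_messages in rows:
        totals[sender] += n_messages
    return dict(totals)


def icit(cit, lam=2):
    """Evaluate -(1 / (cit + lam)) * log2(1 / (cit + lam)).

    lam is an arbitrary illustrative constant, as in the R code above.
    """
    p = 1 / (cit + lam)
    return -p * math.log2(p)


totals = message_totals(rows)
values = {sender: icit(total) for sender, total in totals.items()}
print(totals)
print(values)
```

The per-sender totals and values should agree with the R output above up to rounding.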

missing nodes in node view of graph after randomly removed some nodes

I created a graph G whose node view is <0, 1, 2, ..., 100>.
I randomly removed 20 nodes, and the node view of the new graph is missing the nodes I removed. For example:
node view <0,1,3,5,6,7,9 ...100>
However, I want the new graph to have a node view like the following:
<0,1,2....80>
Is there any solution? I tried relabeling and copying the graph, but neither worked.
PS. My nodes have an attribute label equal to either 0 or 1,
and I want to preserve it.
Here is one approach you can take. After removing your nodes from the graph you can relabel the remaining nodes using nx.relabel_nodes to get the node view you want. See example below:
import networkx as nx
import numpy as np
#Creating random graph
N_nodes=50
G=nx.erdos_renyi_graph(N_nodes,p=0.25)
#Removing random nodes
N_del_nodes=10
del_node_list=np.random.choice(N_nodes,size=N_del_nodes,replace=False)
G.remove_nodes_from(del_node_list)
print('Node view without relabelling:' +str(G.nodes))
#Relabelling graph
label_mapping={old:new for new,old in enumerate(G.nodes)}
G_rel=nx.relabel_nodes(G, label_mapping)
print('Node view with relabelling:' +str(G_rel.nodes))
And the output gives:
Node view without relabelling:[0, 1, 2, 5, 6, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 30, 31, 32, 33, 34, 36, 37, 38, 40, 41, 44, 45, 46, 47, 48, 49]
Node view with relabelling:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
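Since the question also asks to keep the 0/1 node attribute, here is a minimal sketch showing that nx.relabel_nodes carries node attributes over to the relabeled copy. The example graph, the attribute name "label", and the seed are assumptions made for illustration only.

```python
import random
import networkx as nx

# Small example graph whose nodes carry a "label" attribute of 0 or 1
G = nx.path_graph(10)
for n in G.nodes:
    G.nodes[n]["label"] = n % 2

# Remove a few nodes at random (seeded so the run is repeatable)
rng = random.Random(0)
removed = rng.sample(sorted(G.nodes), 3)
G.remove_nodes_from(removed)

# Map the surviving nodes, in sorted order, onto 0..k-1
mapping = {old: new for new, old in enumerate(sorted(G.nodes))}
G_rel = nx.relabel_nodes(G, mapping)  # returns a relabeled copy by default

print(sorted(G_rel.nodes))
# Each new node keeps the "label" value of the old node it replaced
attrs_kept = all(
    G_rel.nodes[new]["label"] == G.nodes[old]["label"]
    for old, new in mapping.items()
)
print(attrs_kept)
```

nx.convert_node_labels_to_integers(G) is a built-in alternative that also renumbers nodes consecutively, if the exact old-to-new mapping does not matter.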

Find Number of Nodes which have Links NetworkX

Suppose I have a graph that looks like this:
Is there a way to count only the nodes that have links? So instead of 6, it would count 5, since one node doesn't have a link.
Also, if I open a graph with read_edgelist and afterwards use the number_of_nodes function, does the function count all nodes, whether or not they have links, or only those that have links/edges, since I opened it with read_edgelist? Thank you for your help.
If you want to filter out isolated nodes, you can iterate through the graph's nodes and keep only those that have neighbors.
With a list comprehension:
import networkx as nx
G = nx.fast_gnp_random_graph(40, 0.05, directed=False, seed=1)
print([n for n in G.nodes if len(list(G.neighbors(n))) > 0])
or with the filter function:
print(list(filter(lambda n: len(list(G.neighbors(n))) > 0, G.nodes)))
Both will print the same:
[0, 1, 2, 3, 4, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 38, 39]
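If I understand the second part correctly: number_of_nodes returns the number of all nodes in the graph, not only those that have edges. Note that a graph built by read_edgelist only contains nodes that appear in at least one edge, so in that case the two counts coincide.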
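As a small sketch of the counting itself, the degree view gives the nodes that have at least one edge, and nx.isolates lists the ones that have none. The six-node graph below is made up to match the "6 versus 5" example in the question, and nx.parse_edgelist stands in for read_edgelist so that no file is needed.

```python
import networkx as nx

# Six nodes, one of which (node 5) has no edges
G = nx.Graph()
G.add_nodes_from(range(6))
G.add_edges_from([(0, 1), (1, 2), (2, 3), (3, 4)])

total = G.number_of_nodes()  # counts every node, isolated or not
connected = [n for n, deg in G.degree() if deg > 0]
isolated = list(nx.isolates(G))
print(total, len(connected), isolated)

# A graph parsed from edge-list lines only contains nodes that appear in an edge
H = nx.parse_edgelist(["0 1", "1 2", "2 3", "3 4"], nodetype=int)
print(H.number_of_nodes())
```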

changing the spacing between vertices in iGraph in R

Suppose I want to make a plot with the following data:
pairs <- c(1, 2, 2, 3, 2, 4, 2, 5, 2, 6, 2, 7, 2, 8, 2, 9, 2, 10, 2, 11, 4,
14, 4, 15, 6, 13, 6, 19, 6, 28, 6, 36, 7, 16, 7, 23, 7, 26, 7, 33,
7, 39, 7, 43, 8, 35, 8, 40, 9, 21, 9, 22, 9, 25, 9, 27, 9, 33, 9,
38, 10, 12, 10, 18, 10, 20, 10, 32, 10, 34, 10, 37, 10, 44, 10, 45,
10, 46, 11, 17, 11, 24, 11, 29, 11, 30, 11, 31, 11, 33, 11, 41, 11,
42, 11, 47, 14, 50, 14, 52, 14, 54, 14, 55, 14, 56, 14, 57, 14, 58,
14, 59, 14, 60, 14, 61, 15, 48, 15, 49, 15, 51, 15, 53, 15, 62, 15,
63)
g <- graph( pairs )
plot( g,layout = layout.reingold.tilford )
I get a plot like the one below:
As you can see the spaces between some of the vertices are so small that these vertices overlap.
1. I wonder if there is a way to change the spacing between vertices.
2. In addition, is the spacing between vertices arbitrary? For example, Vertices 3, 4, and 5 are very close to each other, but 5 and 6 are far apart.
EDIT:
For my 2nd question, I guess the spacing is dependent on the number of nodes below. E.g., 10 and 11 are farther from each other than 8 and 9 are because there are more children below 10 and 11 than there are below 8 and 9.
I bet there is a better solution, but I cannot find it. Here is my approach. Since there seems to be no general parameter for width, you have to adjust parameters manually to obtain the desired output.
My approach is primarily to resize some elements of the plot and to adjust the margins so that the available space is used as well as possible. The most important parameter here is asp, which controls the aspect ratio of the plot (since I guess this plot is better wide than tall, an aspect ratio of even less than 0.5 is appropriate). Other tricks are to reduce the size of the vertices and fonts. Here is the code:
plot( g, layout = layout.reingold.tilford,
edge.width = 1,
edge.arrow.width = 0.3,
vertex.size = 5,
edge.arrow.size = 0.5,
vertex.size2 = 3,
vertex.label.cex = 1,
asp = 0.35,
margin = -0.1)
That produces this plot:
Another approach would be to set the graphics device to PDF (or JPEG, etc.) and then set rescale to FALSE. With the RStudio viewer this cuts off a large part of the plot, but with other graphics devices it might work well (no guarantee).
In any case, if you have doubts about how to use these parameters (which are sometimes very tricky), type help(igraph.plotting).
For the second part of the question I am not sure. Looking inside the function, I cannot work out a precise answer, but I guess that the space between elements on the same level is calculated from the child elements they have; say, 3, 4, and 5 end up closer together because of the children and sub-children they have, while elements with more descendants require more space.

Error in data.frame() arguments imply differing number of rows: 1, 11, 10, 3, 5, 4, 9, 2, 6, 7, 8, 12, 22, 13, 16, 14, 15, 19, 17, 20, 18, 28, 2

I am using this command in RStudio to split the data in one column:
CTE.info <- data.frame(strsplit(as.character(CTE$V11),'|',fixed=TRUE))
But, I am getting the error:
Error in data.frame("orderItems", "79542;2;24.000;24.000;5.310", "Credit;1;-15.000;-15.000;.000", :
arguments imply differing number of rows: 1, 11, 10, 3, 5, 4, 9, 2, 6, 7, 8, 12, 22, 13, 16, 14, 15, 19, 17, 20, 18, 28, 24
Could someone assist and let me know how this can be sorted out?
You can pad the list elements to the same length (shorter ones are filled with NA) and it should work.
lst <- strsplit(as.character(CTE$V11),'|',fixed=TRUE)
d1 <- data.frame(lapply(lst, `length<-`, max(lengths(lst))))
colnames(d1) <- paste0('V', seq_along(d1))
data
CTE <- data.frame(V11= c('a|b|c', 'a|b', 'a|b|c|d'))
