Large dataset for a directed, labeled graph

I am looking for a large dataset for a labeled graph.
The graph should have the following characteristics:
labeled nodes and edges (ideally, several nodes/edges share the same label, i.e. labels are not unique)
directed (preferred, but not strictly required)
I have already searched but found nothing that matches my problem. The closest I found is https://snap.stanford.edu/data/, but none of the graphs there meet my requirements.

Related

What is this type of graph called and how can it be plotted in R

I have a graph that shows the percentage of elements from one node that are transferred to another node across two different stages. In principle, the number of nodes in one stage need not equal the number of nodes in the next stage. I would like to know the name of this type of graph and whether it is possible to create it in R.
This is a Sankey diagram:
https://r-graph-gallery.com/sankey-diagram.html
You can find more information on how to create one here:
https://plotly.com/r/sankey-diagram/

Single linkage hierarchical clustering - boxplots on height of the branches to detect outliers

Before running k-means clustering for consumer segmentation, I want to identify and delete outliers in my sample. I tried hierarchical clustering with the single linkage algorithm. The problem is that my sample has more than 800 cases, and in my plot (a single linkage dendrogram) the labels are printed on top of each other and are therefore unreadable, so I cannot clearly identify the outliers just by looking at the graph.
Here they say you can create boxplots based on the branch distance to identify outliers more objectively. I thought that would also be a good way to make the row numbers of the outliers in my dataset readable, but I am struggling to create the boxplots.
https://link.springer.com/article/10.1186/s12859-017-1645-5/figures/3
Does anyone know how to write the code to get boxplots based on the height of the branches?
This is the code I use for clustering (the resulting plot is attached):
dr_dist <- dist(dr_ma_cluster[, 148:154])       # distances on columns 148 to 154
hc_dr <- hclust(dr_dist, method = "single")     # single linkage
plot(hc_dr, labels = row.names(dr_ma_cluster))  # dendrogram labeled with row names
This is my failed attempt at the boxplot, as I don't know how to address the branch height:
> boxplot(hc_dr)
Error in x[floor(d)] + x[ceiling(d)] :
non-numeric argument for binary operator
> boxplot(hc_dr[,c(148:154)])
Error in hc_dr[, c(148:154)] : Incorrect number of dimensions
Here is another way to plot the tree (together with an automated outlier detection approach), but it is even less readable with large datasets:
Delete outliers automatically of a calculated agglomerative hierarchical clustering data
Thanks for any help!!
boxplot(hc_dr$height), as suggested by StupidWolf, was the simple thing I was looking for.
Unfortunately, I did not manage to label the outlier dots with the row names from the original data frame. The row names of the branch height table were useless because they were assigned in ascending order.
hang = 0.0001 gave the dendrogram a better look, but the labels were still unreadable because they still overlapped.
If anyone has a similar problem, see "R Shiny, zoomable dendrogram program".
The code given in the answer there was very easy to adapt and results in a zoomable dendrogram, which makes it easy to identify the relevant cases (the outliers). For details, search for dendextend, as proposed by csgroen.
Together, the boxplot and this tool let me identify the row names of the outliers after single linkage clustering so that I could delete them before k-means clustering.

Graphviz: How to include multiple graphs in the same graph?

In Jupyter notebook, I am writing code that deals with a graph. It involves a series of transformations on the given graph. I am using graphviz to render the graphs inline. I can only render one graph at a time.
How do I render more than one graph side by side so that I can see successive transformations of the graph?
I know that 'subgraph' can be used to cluster different components of the graph. But I can't use it because it draws connections between all those subgraphs.
"But I can't use it because it draws connections between all those subgraphs."
Sounds like the problem is that you have nodes with the same names across different subgraphs.
GraphViz has no per-subgraph namespacing mechanism. Therefore, you will need to make all node names unique, even across subgraphs. You could do this by, for example, prefixing every node name with a unique subgraph ID.
Note that node labels don't need to be the same as node names. For more information, see:
graphviz: subgraph has same node, how to unique
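The prefixing idea can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name combine_graphs, the "g0_"-style prefixes, the "step N" cluster labels, and the sample edge lists are all made up for this example, and it assumes node names contain no double quotes. It uses only the Python standard library and produces DOT text in which each input graph becomes its own cluster, every node gets a unique name, and the visible label keeps the original name.

```python
def combine_graphs(graphs):
    """Build one DOT source string from several edge lists.

    `graphs` is a list of edge lists; each edge is a (source, target) pair
    of node names. Names may repeat across graphs. They are made unique by
    prefixing them with a per-graph ID, while the label keeps the original.
    """
    lines = ["digraph combined {"]
    for i, edges in enumerate(graphs):
        prefix = f"g{i}_"  # hypothetical naming scheme: g0_, g1_, ...
        # Subgraph names starting with "cluster" are drawn as boxed groups.
        lines.append(f"  subgraph cluster_{i} {{")
        lines.append(f'    label="step {i}";')
        # Declare each node once: unique name, original name as label.
        nodes = sorted({name for edge in edges for name in edge})
        for name in nodes:
            lines.append(f'    "{prefix}{name}" [label="{name}"];')
        # Edges only ever reference the prefixed names of their own graph,
        # so no edge can cross from one cluster into another.
        for src, dst in edges:
            lines.append(f'    "{prefix}{src}" -> "{prefix}{dst}";')
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


# Two versions of a graph that reuse the node names "a" and "c".
dot_source = combine_graphs([
    [("a", "b"), ("b", "c")],
    [("a", "c")],
])
print(dot_source)
```

In a notebook, assuming the graphviz Python package is installed, passing the string to graphviz.Source(dot_source) as the last expression in a cell should render all clusters in one picture.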

How to make vertex shapes different in R plot.igraph?

I'm using R and the package igraph to plot a network. The many vertices in the graph belong to 50 groups. For each group, I would like to use a unique color/vertex shape combination to distinguish it from the others.
Is there a way to plot some vertices as, say, circles and others as squares according to their attributes?

Graphviz: break flat but sparsely connected graph into multiple rows?

How can I break a flat but sparsely connected Graphviz graph into multiple rows?
Graphviz yields a graph of about 4 ranks, but over 9000 nodes wide. However, since the graph is sparsely connected, we could break it into rows of, for example, 1000 nodes each, so that it fits in nine rows on one page. How can this be done?
Not looking for unflatten, but rather something like line breaks in a text editor (is it clear what I am looking for?).
Edit: PDF with example graph here
Do you mean something like Figure 9 ("Graph with constrained ranks") in section 2.5 ("Node and Edge Placement") of the GraphViz documentation?
http://www.graphviz.org/pdf/dotguide.pdf

Resources