Dendrogram in R by complete linkage not spaced properly

Dendrogram in R by complete linkage not spaced properly - r

I am a newbie with R. To speak in a very understandable way, What I want to achieve is a dendrogram like this
What I want the dendrogram to look like
and how I get it is like this,
How I am getting it
This is the code, that I ran,
tb <- read.csv("COM_PDT.csv", row.names = 1)
> d = as.dist(tb)
> hc.c <- hclust(d)
> plot(hc.c, hang = -1)
And here is the data set,
,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48
1,,0,0,0,0,0,0,0,3,1,4,0,4,0,3,0,0,4,0,0,0,4,0,0,1,1,4,3,0,3,3,0,1,0,4,4,0,0,0,0,1,0,3,0,0,0,4,1
2,0,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,2,3,1,0,0,0,1,0,0,0,0
3,0,0,,2,0,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1
4,0,0,2,,1,1,1,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,1,0,1,0,1,1,0,1,0,0,0,0,0,0,0,0,2,0,0,0,0,0,1
5,0,0,0,1,,3,0,0,0,1,0,1,0,0,1,0,2,0,2,1,1,0,3,2,3,0,0,1,2,1,1,1,2,2,0,0,2,1,1,1,1,2,0,0,2,2,0,0
6,0,0,0,1,3,,0,1,0,2,0,2,0,1,1,1,3,0,3,1,1,0,4,2,3,0,0,1,3,1,1,0,3,3,0,0,2,0,0,0,2,1,0,0,3,3,0,1
7,0,0,1,1,0,0,,3,0,0,0,2,0,3,0,2,1,0,0,1,3,0,0,2,0,1,0,0,1,0,0,0,0,1,0,0,2,0,0,0,1,1,0,0,1,1,0,2
8,0,0,1,1,0,1,3,,0,1,0,3,0,4,0,3,2,0,1,1,2,0,1,1,0,1,0,0,2,0,0,0,1,2,0,0,1,0,0,0,2,1,0,0,2,2,0,3
9,3,0,0,0,0,0,0,0,,1,3,0,3,0,2,0,0,3,0,0,0,3,0,0,0,1,3,2,0,2,2,0,1,0,3,3,0,0,0,0,1,0,3,0,0,0,3,1
10,1,0,0,1,1,2,0,1,1,,1,1,1,1,1,1,1,1,1,0,0,1,2,0,1,0,1,2,1,2,2,0,3,1,1,1,0,0,0,0,2,1,1,1,1,1,1,2
11,4,0,0,0,0,0,0,0,3,1,,0,4,0,3,0,0,4,0,0,0,4,0,0,1,1,4,3,0,3,3,0,1,0,3,4,0,0,0,0,1,0,3,0,0,0,4,1
12,0,0,0,0,1,2,2,3,0,1,0,,0,3,1,2,3,0,2,2,3,0,2,2,1,0,0,0,3,0,0,0,2,3,0,0,2,0,0,0,3,0,0,0,3,3,0,2
13,4,0,0,0,0,0,0,0,3,1,4,0,,0,3,0,0,4,0,0,0,4,0,0,1,1,4,3,0,3,3,0,1,0,3,4,0,0,0,0,1,0,3,0,0,0,4,1
14,0,0,1,1,0,1,3,4,0,1,0,3,0,,0,3,2,0,1,1,2,0,1,1,0,1,0,0,2,0,0,0,1,2,0,0,1,0,0,0,2,1,0,0,2,2,0,3
15,3,0,0,0,1,1,0,0,2,1,3,1,3,0,,0,1,3,1,1,1,3,1,1,2,1,3,2,1,2,2,0,2,1,2,3,1,0,0,0,2,0,2,0,1,1,3,1
16,0,0,1,1,0,1,2,3,0,1,0,2,0,3,0,,2,0,1,2,1,0,1,1,0,2,0,0,2,0,0,0,1,2,0,0,1,0,0,0,2,1,0,0,2,2,0,3
17,0,0,0,0,2,3,1,2,0,1,0,3,0,2,1,2,,0,3,2,2,0,3,3,2,0,0,0,4,0,0,0,2,4,0,0,3,0,0,0,3,0,0,0,4,4,0,2
18,4,0,0,0,0,0,0,0,3,1,4,0,4,0,3,0,0,,0,0,0,4,0,0,1,1,4,3,0,3,3,0,1,0,3,4,0,0,0,0,1,0,3,0,0,0,4,1
19,0,0,0,0,2,3,0,1,0,1,0,2,0,1,1,1,3,0,,1,1,0,3,2,2,0,0,0,3,0,0,1,2,3,0,0,2,1,0,1,2,0,0,0,3,3,0,1
20,0,0,0,1,1,1,1,1,0,0,0,2,0,1,1,2,2,0,1,,2,0,1,2,1,1,0,0,2,0,0,0,1,2,0,0,2,0,0,0,2,0,0,0,2,2,0,1
21,0,0,0,0,1,1,3,2,0,0,0,3,0,2,1,1,2,0,1,2,,0,1,3,1,0,0,0,2,0,0,0,1,2,0,0,3,0,0,0,2,0,0,0,2,2,0,1
22,4,0,0,0,0,0,0,0,3,1,4,0,4,0,3,0,0,4,0,0,0,,0,0,1,1,4,3,0,3,3,0,1,0,3,4,0,0,0,0,1,0,3,0,0,0,4,1
23,0,0,0,1,3,4,0,1,0,2,0,2,0,1,1,1,3,0,3,1,1,0,,2,3,0,0,1,3,1,1,0,3,3,0,0,2,0,0,0,2,1,0,0,3,3,0,1
24,0,0,0,0,2,2,2,1,0,0,0,2,0,1,1,1,3,0,2,2,3,0,2,,2,0,0,0,3,0,0,0,1,3,0,0,4,0,0,0,2,0,0,0,3,3,0,1
25,1,0,0,1,3,3,0,0,0,1,1,1,1,0,2,0,2,1,2,1,1,1,3,2,,0,1,2,2,2,2,0,2,2,0,1,2,0,0,0,1,1,0,0,2,2,1,0
26,1,0,1,1,0,0,1,1,1,0,1,0,1,1,1,2,0,1,0,1,0,1,0,0,0,,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1,1,0,0,0,1,1
27,4,0,0,0,0,0,0,0,3,1,4,0,4,0,3,0,0,4,0,0,0,4,0,0,1,1,,3,0,3,3,0,1,0,3,4,0,0,0,0,1,0,3,0,0,0,4,1
28,3,0,0,1,1,1,0,0,2,2,3,0,3,0,2,0,0,3,0,0,0,3,1,0,2,0,3,,0,4,4,0,2,0,2,3,0,0,0,0,1,1,2,0,0,0,3,1
29,0,0,0,0,2,3,1,2,0,1,0,3,0,2,1,2,4,0,3,2,2,0,3,3,2,0,0,0,,0,0,0,2,4,0,0,3,0,0,0,3,0,0,0,4,4,0,2
30,3,0,0,1,1,1,0,0,2,2,3,0,3,0,2,0,0,3,0,0,0,3,1,0,2,0,3,4,0,,4,0,2,0,2,3,0,0,0,0,1,1,2,0,0,0,3,1
31,3,0,0,1,1,1,0,0,2,2,3,0,3,0,2,0,0,3,0,0,0,3,1,0,2,0,3,4,0,4,,0,2,0,2,3,0,0,0,0,1,1,2,0,0,0,3,1
32,0,2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,,0,0,0,0,0,4,3,3,0,1,0,0,0,0,0,0
33,1,0,0,1,2,3,0,1,1,3,1,2,1,1,2,1,2,1,2,1,1,1,3,1,2,0,1,1,2,1,2,0,,2,1,1,1,0,0,0,3,1,1,0,2,2,1,2
34,0,0,0,0,2,3,1,2,0,1,0,3,0,2,1,2,4,0,3,2,2,0,3,3,2,0,0,0,4,0,0,0,2,,0,0,3,0,0,0,3,0,0,0,4,4,0,2
35,4,0,0,0,0,0,0,0,3,1,4,0,4,0,3,0,0,4,0,0,0,4,0,0,1,1,4,3,0,3,3,0,1,0,,4,0,0,0,0,1,0,3,0,0,0,4,1
36,4,0,0,0,0,0,0,0,3,1,4,0,4,0,3,0,0,4,0,0,0,4,0,0,1,1,4,3,0,3,3,0,1,0,3,,0,0,0,0,1,0,3,0,0,0,4,1
37,0,0,0,0,2,2,2,1,0,0,0,2,0,1,1,1,3,0,2,2,3,0,2,4,2,0,0,0,3,0,0,0,1,3,0,0,,0,0,0,2,0,0,0,3,3,0,1
38,0,2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,,3,3,0,1,0,0,0,0,0,0
39,0,3,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,3,,2,0,1,0,0,0,0,0,0
40,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,3,2,,0,2,0,0,0,0,0,0
41,1,0,0,0,1,2,1,2,1,2,1,3,1,2,2,2,3,1,2,2,2,1,2,2,1,0,1,1,3,1,1,0,3,3,1,1,2,0,0,0,,0,1,0,3,3,1,3
42,0,0,1,2,2,1,1,1,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,1,1,0,0,0,0,1,1,1,0,0,0,0,1,1,2,0,,0,0,0,0,0,1
43,3,0,0,0,0,0,0,0,3,1,3,0,3,0,2,0,0,3,0,0,0,3,0,0,0,1,3,2,0,2,2,0,1,0,3,3,0,0,0,0,1,0,,0,0,0,3,1
44,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,0,0,0,0
45,0,0,0,0,2,3,1,2,0,1,0,3,0,2,1,2,4,0,3,2,2,0,3,3,2,0,0,0,4,0,0,0,2,4,0,0,3,0,0,0,3,0,0,0,,4,0,2
46,0,0,0,0,2,3,1,2,0,1,0,3,0,2,1,2,4,0,3,2,2,0,3,3,2,0,0,0,4,0,0,0,2,4,0,0,3,0,0,0,3,0,0,0,4,,0,2
47,4,0,0,0,0,0,0,0,3,1,4,0,4,0,3,0,0,4,0,0,0,4,0,0,1,1,4,3,0,3,3,0,1,0,3,4,0,0,0,0,1,0,3,0,0,0,,1
48,1,0,1,1,0,1,2,3,1,2,1,2,1,3,1,3,2,1,1,1,1,1,1,1,0,1,1,1,2,1,1,0,2,2,1,1,1,0,0,0,3,1,1,0,2,2,1,
Please help me to get a clean dendrogram that is neatly spaced and the end nodes properly at the floor of the graph!

I think you need to write dist(tb), not as.dist(tb). That will help the branching appearance of how it is being plotted. Change hang = to adjust the labeling, though initially I haven't been able to produce exactly how the labels are in your desired plot.
d = dist(tb)
hc.c <- hclust(d)
plot(hc.c, hang = -1)

Related

FactorMiner plot.HCPC function for cluster labeling

This is the function that is part of FactorMiner package
https://github.com/cran/FactoMineR/blob/master/R/plot.HCPC.R
As an example this is the code I ran
res.pca <- PCA(iris[, -5], scale = TRUE)
hc <- HCPC(res.pca, nb.clust=-1,)
plot.HCPC(hc, choice="3D.map", angle=60)
hc$call$X$clust <- factor(hc$call$X$clust, levels = unique(hc$call$X$clust))
plot(hc, choice="map")
The difference is when i run this hc$call$X$clust <- factor(hc$call$X$clust, levels = unique(hc$call$X$clust))
before plot.HCPC this doesn't change the annotation in the figure but when I do the same thing before I ran this plot(hc, choice="map") it is reflected in the final output.
When i see the plot.HCPC function this is the line of the code that does embed the cluster info into the figure
for(i in 1:nb.clust) leg=c(leg, paste("cluster",levs[i]," ", sep=" "))
legend("topleft", leg, text.col=as.numeric(levels(X$clust)),cex=0.8)
My question I have worked with small function where I understand when i edit or modify which one goes where and does what here in this case its a complicated function at least to me so Im not sure how do I modify that part and get what I would like to see.
I would like to see in case of my 3D dendrogram each of the cluster are labelled with group the way we can do in complexheatmap where we can annotate that are in row or column with a color code so it wont matter what the order in the data-frame we can still identify(it's just visual thing I know but I would like to learn how to modify these)

iGraph - Spacing between verticies

I have a dataset called data. The data is not that important, but every interaction has a name. I want to create a graph in iGraph with the following code:
tab <- count(data, B, S, K)
factors <- table(interaction(tab$B, tab$K),interaction(tab$S,tab$K))
graph1 <- graph_from_incidence_matrix(factors)
plot(graph1, vertex.size = 40, layout = layout.bipartite)
However, I get the following:
All the names of interactions are completely mixed together. I can make it a little more readable by lowering the vertex.size, but I want to find a solution to my problem.
I want to create more space between the verticies, but I cannot seem to find the right way.
I have tried creating a manual graph by using tkplot, but it is annoying that I manually have to sort them each time.
Best regards

Suppress graph output of a function [duplicate]

I am trying to turn off the display of plot in R.
I read Disable GUI, graphics devices in R but the only solution given is to write the plot to a file.
What if I don't want to pollute the workspace and what if I don't have write permission ?
I tried options(device=NULL) but it didn't work.
The context is the package NbClust : I want what NbClust() returns but I do not want to display the plot it does.
Thanks in advance !
edit : Here is a reproducible example using data from the rattle package :)
data(wine, package="rattle")
df <- scale (wine[-1])
library(NbClust)
# This produces a graph output which I don't want
nc <- NbClust(df, min.nc=2, max.nc=15, method="kmeans")
# This is the plot I want ;)
barplot(table(nc$Best.n[1,]),
xlab="Numer of Clusters", ylab="Number of Criteria",
main="Number of Clusters Chosen by 26 Criteria")

You can wrap the call in
pdf(file = NULL)
and
dev.off()
This sends all the output to a null file which effectively hides it.

Luckily it seems that NbClust is one giant messy function with some other functions in it and lots of icky looking code. The plotting is done in one of two places.
Create a copy of NbClust:
> MyNbClust = NbClust
and then edit this function. Change the header to:
MyNbClust <-
function (data, diss = "NULL", distance = "euclidean", min.nc = 2,
max.nc = 15, method = "ward", index = "all", alphaBeale = 0.1, plotetc=FALSE)
{
and then wrap the plotting code in if blocks. Around line 1588:
if(plotetc){
par(mfrow = c(1, 2))
[etc]
cat(paste(...
}
and similarly around line 1610. Save. Now use:
nc = MyNbClust(...etc....)
and you see no plots unless you add plotetc=TRUE.
Then ask the devs to include your patch.

How to incresase igraph distance for each edge in R?

I have a CSV file need to draw a graph.
The graph contains nodes and edges.
Therefore, I used the following code to do it.
start.time <- Sys.time()
#Loading Packages
library(igraph)
library(readr)
library(haven)
#import data
df = read.csv('../../Pre_Draw_Graph_for_R.csv', header = TRUE, encoding = 'UTF-8')
#Creating an iGraph Style Edge List
df_Edge_List <- df
#Creating Graph
df_graph = graph.data.frame(df_Edge_List, directed = TRUE)
#df Network: First Try
#Layout Options
set.seed(3500)
layout1 <- layout.fruchterman.reingold(df_graph)
#Node or vertex Options: Color
V(df_graph)$color <- "yellow"
V(df_graph)[degree(df_graph, mode = "in") > 500]$color <- "red"
#Edge Options: Size
E(df_graph)$color <- "grey"
#Plotting
plot(df_graph, vertex.label=NA)
#plot(df_graph)
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
I can output the following result, but the result has some problems. It shows that every node is very crowded.
I hope to increase the graph distance, but I used a lot of methods already. I cannot fix the problem. I hope to get the following result, but I cannot do it now. I want to make it clear to see the graph.
I used the following to show dput result.
dput(df_graph, file = "G.R")
Due to the data is too big, I used google link to share it
https://drive.google.com/file/d/1wdF8ZKFde8bDSrFrN9e9KfD3tm6dN_s0/view?usp=sharing
Can anyone help me? Thanks

You can get most of the way to a nice graph with a few simple changes. First, change the vertex labeling to numbers as in your second graph. Also, use layout_components so that the components are separated from each other.
library(igraph)
set.seed(1234)
LOC = layout_components(df_graph)
plot(df_graph, layout=LOC, vertex.label=1:vcount(df_graph))
This is OK, but the arrows are a mess. There are two reasons for that: the arrow heads are too big and most (all?) vertices have loops back to themselves. Let's remove the loops and reduce the size of the arrows.
This looks a lot more useable. If you want anything nicer, you will need to start moving around badly placed vertices by hand. That is a lot more work, but you get a nicer picture.
LOC[ 5,] = c(-1.5,-5)
LOC[ 9,] = c(-3,-2)
LOC[16,] = c(-3,14)
LOC[21,] = c(-3,18)
LOC[25,] = c(9,18)
LOC[29,] = c(-9,18)
LOC[35,] = c(-6,15)
LOC[30,] = c(-9,12)
LOC[31,] = c(-6,18)
LOC[37,] = c(1,14)
LOC[38,] = c(-9,15)
LOC[44,] = c(1,18)
plot(DFGS, layout=LOC, vertex.label=1:vcount(df_graph),
edge.arrow.size = 0.5)
More could be done with the hand-editing, but I will leave that to you.

Error plotting Kohonen maps in R?

I was reading through this blog post on R-bloggers and I'm confused by the last section of the code and can't figure it out.
http://www.r-bloggers.com/self-organising-maps-for-customer-segmentation-using-r/
I've attempted to recreate this with my own data. I have 5 variables that follow an exponential distribution with 2755 points.
I am fine with and can plot the map that it generates:
plot(som_model, type="codes")
The section of the code I don't understand is the:
var <- 1
var_unscaled <- aggregate(as.numeric(training[,var]),by=list(som_model$unit.classif),FUN = mean, simplify=TRUE)[,2]
plot(som_model, type = "property", property=var_unscaled, main = names(training)[var], palette.name=coolBlueHotRed)
As I understand it, this section of the code is suppose to be plotting one of the variables over the map to see what it looks like but this is where I run into problems. When I run this section of the code I get the warning:
Warning message:
In bgcolors[!is.na(showcolors)] <- bgcol[showcolors[!is.na(showcolors)]] :
number of items to replace is not a multiple of replacement length
and it produces the plot:
Which just some how doesn't look right...
Now what I think it has come down to is the way the aggregate function has re-ordered the data. The length of var_unscaled is 789 and the length of som_model$data, training[,var] and unit.classif are all of length 2755. I tried plotting the aggregated data, the result was no warning but an unintelligible graph (as expected).
Now I think it has done this because unit.classif has a lot of repeated numbers inside it and that's why it has reduced in size.
The question is, do I worry about the warning? Is it producing an accurate graph? What exactly is the "Property"'s section looking for in the plot command? Is there a different way I could "Aggregate" the data?

I think that you have to create the palette color. If you put the argument
coolBlueHotRed <- function(n, alpha = 1) {rainbow(n, end=4/6, alpha=alpha)[n:1]}
and then try to get a plot, for example
plot(som_model, type = "count", palette.name = coolBlueHotRed)
the end is succesful.
This link can help you: http://rgm3.lab.nig.ac.jp/RGM/R_rdfile?f=kohonen/man/plot.kohonen.Rd&d=R_CC

I think that not all of the cells on your map have points inside.
You have 30 by 30 map and about 2700 points. In average it's about 3 points per cell. With high probability some cells have more than 3 points and some cells are empty.
The code in the post on R-bloggers works well when all of the cells have points inside.
To make it work on your data try change this part:
var <- 1
var_unscaled <- aggregate(as.numeric(training[, var]), by = list(som_model$unit.classif), FUN = mean, simplify = TRUE)[, 2]
plot(som_model, type = "property", property = var_unscaled, main = names(training)[var], palette.name = coolBlueHotRed)
with this one:
var <- 1
var_unscaled <- aggregate(as.numeric(data.temp[, data.classes][, var]),
by = list(som_model$unit.classif),
FUN = mean,
simplify = T)
v_u <- rep(0, max(var_unscaled$Group.1))
v_u[var_unscaled$Group.1] <- var_unscaled$x
plot(som_model,
type = "property",
property = v_u,
main = colnames(data.temp[, data.classes])[var],
palette.name = coolBlueHotRed)
Hope it helps.

Just add these functions to your script:
coolBlueHotRed <- function(n, alpha = 1) {rainbow(n, end=4/6, alpha=alpha)[n:1]}
pretty_palette <- c("#1f77b4","#ff7f0e","#2ca02c", "#d62728","#9467bd","#8c564b","#e377c2")