display clusters in radial format - r

I have a list of clusters, say cluster 1 to cluster 3, along with
their membership, as in the example below. I would like to display the clusters in radial format. I was thinking of using the as.phylo function
in the ape package to display this, but that requires creating an hclust object. If anyone knows how to do this, by creating an hclust object or otherwise, it would be much appreciated.
Many Thanks!
cl var numberOfCluster
1 a 1
1 b 1
1 c 1
1 d 1
1 a 2
1 b 2
2 c 2
2 d 2
3 a 3
1 b 3
2 c 3
2 d 3
Thanks very much!

(This is a copy of my answer to a similar question from "crossvalidated")
Assuming you can create an hclust object (from variables on which a distance measure can be defined), this can be done by combining two packages: circlize and dendextend.
The plot can be made using the circlize_dendrogram function, which gives more refined control than the "fan" layout of the plot.phylo function.
# install.packages("dendextend")
# install.packages("circlize")
library(dendextend)
library(circlize)
# create a dendrogram
hc <- hclust(dist(datasets::mtcars))
dend <- as.dendrogram(hc)
# modify the dendrogram to have some colors in the branches and labels
dend <- dend %>%
color_branches(k=4) %>%
color_labels
# plot the radial plot
par(mar = rep(0,4))
# circlize_dendrogram(dend, dend_track_height = 0.8)
circlize_dendrogram(dend, labels_track_height = NA, dend_track_height = .4)

Related

Grouping Set of Points to a Pre Defined Point

I'm looking to create a model that classifies a set of points that are near a pre-defined point.
For example, let's say I have points:
X Y
1 1
1 2
1 3
2 1
2 3
3 1
3 2
3 3
6 6
8 7
8 5
9 3
10 7
My goal is to identify which points are closest to predefined point (2,2) and ideally output which points those are.
I tried using KNN, but I could not figure out how to get the KNN model to train results near (2,2). Any guidance to how I may accomplish this would be awesome. :)
Plot of Points
df <- data.frame( x = c(1,1,1,2,2,2,3,3,3,6,8,8,9,10), y = c(1,2,3,1,2,3,1,2,3,6,7,5,3,7))
df
goal_point <- c(x=2,y=2)
goal_point
You might approach this by calculating distance from goal as a feature.
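# Euclidean distance from each point to the goal point
df$dist = sqrt((df$x - goal_point["x"])^2 +
               (df$y - goal_point["y"])^2)
# Two clusters: points near the goal and points far from it
df$clust = kmeans(df, 2)$cluster
library(ggplot2)
# factor() gives one discrete color per cluster instead of a gradient
ggplot(df, aes(x, y, color = factor(clust))) +
  geom_point()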
In this case kmeans is using x, y, and distance from goal. You could also use just distance from goal by using df$clust = kmeans(df[,3], 2)$cluster, which would lead here to the same clustering.
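If the aim is simply to list the points nearest (2,2), the distance column can also be used directly, without clustering. The following is a minimal sketch rather than part of the answer above; the values of `k` and `radius` are arbitrary assumptions chosen for illustration.

```r
# Points from the question, plus the predefined goal point
df <- data.frame(x = c(1,1,1,2,2,2,3,3,3,6,8,8,9,10),
                 y = c(1,2,3,1,2,3,1,2,3,6,7,5,3,7))
goal_point <- c(x = 2, y = 2)

# Euclidean distance from every point to the goal
df$dist <- sqrt((df$x - goal_point[["x"]])^2 +
                (df$y - goal_point[["y"]])^2)

# Option 1: the k nearest points (k = 5 is an arbitrary choice)
k <- 5
nearest_k <- df[order(df$dist), ][seq_len(k), ]

# Option 2: every point within a given radius (1.5 is an arbitrary choice)
radius <- 1.5
within_radius <- df[df$dist <= radius, ]

nearest_k
within_radius
```

Ordering by distance gives a ranked list, while the radius filter gives a yes/no membership; which one fits depends on whether "near" means a fixed number of neighbours or a fixed distance.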

divide not rectangle plot into subplots within spatstat package in R

I have data that contains information about sub-plots with different numbers and their corresponding species types (more than 3 species within each subplot). Every species has X and Y coordinates.
> df
subplot species X Y
1 1 Apiaceae 268675 4487472
2 1 Ceyperaceae 268672 4487470
3 1 Vitaceae 268669 4487469
4 2 Ceyperaceae 268665 4487466
5 2 Apiaceae 268662 4487453
6 2 Magnoliaceae 268664 4487453
7 3 Magnoliaceae 268664 4487453
8 3 Apiaceae 268664 4487456
9 3 Vitaceae 268664 4487458
With these data, I have created a ppp object for the points of each subplot within the window of the overall (big) plot.
grp <- factor(df$subplot)
# Ripley-Rasson estimate of the window enclosing the points
win <- ripras(df$X, df$Y)
p.patt <- ppp(df$X, df$Y, window = win, marks = grp)
Now I want to divide the plot into 3 x 3 equal sub-plots, because there are 9 subplots. The overall plot is not rectangular; it looks similar to a rhombus when I plot it.
I could use the quadrats() function as below, but it divides my plot into unequal subplots. Some are quadrilaterals, others are triangles, etc., which I don't want. I want all the subplots to be equal-sized quadrilaterals (dividing the plot by lines parallel to its sides). Can anyone guide me on this?
divide <-quadrats(p.patt,3,3)
plot(divide)
Thank you!
Could you break up the plot canvas into 3x3, then run each plot?
> par(mfrow=c(3,3))
> # run code for plot 1
> # run code for plot 2
...
> # run code for plot 9
To return back to one plot on the canvas type
> par(mfrow=c(1,1))
This is a question about the spatstat package.
You can use the function quantess to divide the window into tiles of equal area. If you want the tile boundaries to be vertical lines, and you want 7 tiles, use
B <- quantess(Window(p.patt), "x", 7)
where p.patt is your point pattern.

DiagrammeR: Devise a graph from a dataframe

Objective (In the R environment): extract nodes and edges from a dataframe to use them for modeling a graph!!
I am trying to learn how to work with DiagrammeR or any other graph modeling library in order to get a graph such as the one linked below ([The GRAPH1]) from a data frame:
The data frame:
a b c classes
1 2 0 a
0 0 2 b
0 1 0 c
I have used DiagrammeR library and defined nodes and edges manually by these commands:
library(DiagrammeR)
grViz("
digraph boxes_and_circles{
#add the node statement
node[shape=box]
a; b; c;
#add the edge statement
a->a [label=1]; a-> b[label=2]; b->c[label=2]; c->b[label=1]
graph [nodesep=0.1]
}
")
Could you help me to understand how I can get the nodes and edges automatically? Thank you in advance.
You can do this with the igraph package. Your data frame is an adjacency matrix and igraph contains a function to make that into a graph. My code below adds a layout to position the vertices in the positions that you indicated in your sample graph.
## Your data
df = read.table(text="a b c classes
1 2 0 a
0 0 2 b
0 1 0 c",
header=TRUE)
library(igraph)
g = graph_from_adjacency_matrix(as.matrix(df[,1:3]), weighted=TRUE)
LO = matrix(c(0,0,0,3,2,1), ncol=2)
plot(g, layout=LO, edge.label=E(g)$weight, vertex.shape="rectangle",
vertex.color="white", edge.curved=c(0,0,0.15,0.15))
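If you would rather stay with DiagrammeR, the DOT source passed to grViz can be assembled from the same adjacency matrix. The following is a minimal sketch, not part of the igraph answer above: it assumes rows are source nodes and columns are target nodes, and the helper names `m`, `idx`, `edges` and `dot` are introduced here for illustration only.

```r
df <- read.table(text = "a b c classes
1 2 0 a
0 0 2 b
0 1 0 c", header = TRUE)

# Adjacency matrix: rows are source nodes, columns are target nodes
m <- as.matrix(df[, 1:3])
rownames(m) <- as.character(df$classes)

# One row per non-zero cell, as (row index, column index)
idx <- which(m != 0, arr.ind = TRUE)
edges <- data.frame(from  = rownames(m)[idx[, "row"]],
                    to    = colnames(m)[idx[, "col"]],
                    label = m[idx])

# Assemble the DOT source from the node names and the edge list
dot <- paste0(
  "digraph boxes_and_circles {\n",
  "  graph [nodesep = 0.1]\n",
  "  node [shape = box]\n",
  "  ", paste(colnames(m), collapse = "; "), ";\n",
  paste0("  ", edges$from, " -> ", edges$to,
         " [label = ", edges$label, "]", collapse = "\n"),
  "\n}"
)
cat(dot)

# Render only if DiagrammeR is installed
if (requireNamespace("DiagrammeR", quietly = TRUE)) DiagrammeR::grViz(dot)
```

Building the edge list as a data frame first keeps the DOT string construction separate from the data extraction, so the same `edges` table could be reused with other graph libraries.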

Color the individuals of a R PCoA plot by groups

Should be a simple question, but I haven't found exactly how to do it so far.
I have a matrix as follow:
sample var1 var2 var3 etc.
1 5 7 3 1
2 0 1 6 8
3 7 6 8 9
4 5 3 2 4
I performed a PCoA using vegan and plotted the results. Now my problem is that I want to color the samples according to a pre-defined group:
group sample
1 1
1 2
2 3
2 4
How can I import the groups and then plot the points colored according to the group they belong to? It looks simple but I have been scratching my head over this.
Thanks!
Seb
You said you used vegan for the PCoA, which I assume means the wcmdscale function. By default, vegan::wcmdscale returns only a scores matrix, like the standard stats::cmdscale, but if you add certain arguments (such as eig = TRUE) you get a full wcmdscale result object with dedicated plot and points methods, and you can do:
plot(<pcoa-result>, type="n") # no reproducible example: edit like needed
points(<pcoa-result>, col = group) # no reproducible example: group must be visible
If you have a modern vegan (2.5.x) the following also works:
library(magrittr)
plot(<full-pcoa-result>, type = "n") %>% points("sites", col = group)

Get clusters from PCA r

I have a PCA that shows two really big clusters and I don't know how to figure out which of my samples are in each cluster.
If it helps, I'm using prcomp to generate the PCA:
pca1 <- autoplot(prcomp(df), label = TRUE, label.size = 2)
My approach has been to attempt to cluster the PCA output using kmeans with 2 groups to get the clusters:
pca <- prcomp(df, scale.=TRUE)
clust <- kmeans(pca$x[,1:2], centers=2)$cluster
I can then make a beautiful plot, but I am still lost as to which samples are in each cluster. For reference, here is the plot generated when I graph the kmeans output:
As you can see in the first PCA plot, the labels literally say which sample each dot is. My ideal output would be a two column txt file with the sample name in one column, and the group it belongs to in the other column.
All that aside, if there is a better way, please let me know.
Thanks in advance.
Here is a chunk of my data:
a b c b e
Sample_1013 312011 624559 625898 534309 220415
Sample_1046 474774 949458 951145 843049 366136
Sample_104 645363 1290450 1292520 919474 272200
Sample_1057 267319 534685 535294 690574 422645
Sample_106 414065 830571 834527 657354 234130
Sample_107 299289 602483 603756 566256 262153
In my question, clust is the name of the output from my kmeans:
clust <- kmeans(pca$x[,1:2], centers=2)$cluster
I typed clust into the terminal and got which samples belong to each group:
> clust
Sample_1013 Sample_1046 Sample_104 Sample_1057 Sample_106 Sample_107
1 1 1 1 1 1
Sample_1098 Sample_109 Sample_1109 Sample_1129 Sample_1130 Sample_1140
1 1 1 1 1 1
Sample_1149 Sample_115 Sample_118 Sample_1220 Sample_1223 Sample_1225
1 1 1 1 1 1
Hopefully this helps someone.
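To get the two-column text file asked for in the question, the named vector returned by kmeans can be turned into a data frame and written out. This is a minimal sketch under assumptions: the random matrix stands in for the real data, which is not available here, and the output path `out_file` is hypothetical.

```r
# Stand-in data (the real `df` is not available): 6 samples, 5 variables
set.seed(1)
df <- matrix(rnorm(30), nrow = 6,
             dimnames = list(paste0("Sample_", 1:6), letters[1:5]))

pca   <- prcomp(df, scale. = TRUE)
clust <- kmeans(pca$x[, 1:2], centers = 2)$cluster

# kmeans() keeps the row names, so names(clust) are the sample names
out <- data.frame(sample = names(clust), cluster = unname(clust))

# Tab-separated, two columns, no quotes or row numbers
out_file <- file.path(tempdir(), "clusters.txt")  # hypothetical output path
write.table(out, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Samples belonging to cluster 1
out$sample[out$cluster == 1]
```

Note that the cluster numbers assigned by kmeans are arbitrary labels and can swap between runs unless a seed is set.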
