plotting hypervolumes from hypervolume() in 3d - r

I am using the hypervolume package to construct hypervolumes in n-dimensional space.
I can't figure out how to tweak 3d plots when plot() is operated over an object of class 'Hypervolume' or 'HypervolumeList'. The arguments are listed in the package documentation for function 'plot.HypervolumeList' (pg 21).
Example using data included with package:
# Call data and run routine that will create a HypervolumeList
require(hypervolume)
data(finch)
doHypervolumeFinchDemo = T
demo(finch)
# Create 3d plot of HypervolumeList
plot(hv_finches_list, pairplot = F)
Specific questions follow. Included code that I tried--but doesn't work.
How do I change the size of the data points?
plot(hv_finches_list, pairplot = F, cex.data = 5)
How do I choose whether to plot the uniformly random points?
plot(hv_fiches_list, pairplot = F, showrandom = T)
Is it possible to add centroids or contours to the 3d plot?
plot(hv_finches_list, pairplot = F, showcentroid = T)

Related

Is there a way to remove points from a Mclust classification plot in R?

I am trying to plot the GMM of my dataset using the Mclust package in R. While the plotting is a success, I do not want points to show in the final plot, just the ellipses. For a reference, here is the plot I have obtained:
GMM Plot
But, I want the resulting plot to have only the ellipses, something like this:
GMM desired plot
I have been looking at the Mclust plot page in: https://rdrr.io/cran/mclust/man/plot.Mclust.html and looking at the arguments of the function, I see there is a scope of adding other graphical parameters. Looking at the documentation of the plot function, there is a parameter called type = 'n' which might help to do what I want but when I write it, it produces the following error:
Error in plot.default(data[, 1], data[, 2], type = "n", xlab = xlab, ylab = ylab, :
formal argument "type" matched by multiple actual arguments
For reference, this is the code I used for the first plot:
library(mclust)
Data1_2 <- Mclust(Data, G=15)
summary(Data1_2, parameters = TRUE, classification = TRUE)
plot(Data1_2, what="classification")
The code I tried using for getting the graph below is:
Data1_4 <- Mclust(Data, G=8)
summary(Data1_4, parameters = TRUE, classification = TRUE)
plot(Data1_4, what="classification", type = "n")
Any help on this matter will be appreciated. Thanks!
If you look under the source code of plot.Mclust, it calls plot.Mclust.classification which in turn calls coordProj for the dot and ellipse plot. Inside this function, the size is controlled by the option CEX= and shape PCH=.
So for your purpose, do:
library(mclust)
clu = Mclust(iris[,1:4], G = 3, what="classification")
plot(clu,what="classification",CEX=0)

Visualizing PCA with large number of variables in R using ggbiplot

I am trying to visualize a PCA that includes 87 variables.
prc <-prcomp(df[,1:87], center = TRUE, scale. = TRUE)
ggbiplot(prc, labels = rownames(df[,1:87]), var.axes = TRUE)
When I create the biplot, many of the vectors overlap with each other, making it impossible to read the labels. I was wondering if there is any way to only show some of the labels at a time. For example, I think it'd be useful if I could create a few separate biplots with each one showing only a subset of the labels on the vectors.
This question seems closely related, but I don't know if it translates to the latest version of ggbiplot. I'm also not sure how to modify the original functions.
A potential solution is to use the factoextra package to visualize your PCA results. The fviz_pca_biplot() function includes a repel argument. When repel = TRUE the plot labels are spread out to minimize overlap. There are also select.var options mentioned in the documentation, such as select.var = list(contrib=5) to display only the 5 most influential vectors. Also a select.var = list(name) option that seems to allow for the specification of a specific subset of variables that you want shown.
# read data
df <- mtcars[, c(1:7,10:11)]
# perform PCA
library("FactoMineR")
res.pca <- PCA(df, graph = FALSE)
# visualize
library(factoextra)
fviz_pca_biplot(res.pca, repel = TRUE, select.var = list(contrib = 5))

How do I change the labels of a dendogram using the fviz_dend function from the "factoextra" package in R?

I am creating a shiny app that showcases different clustering techniques such as hierarchical and k-means clustering using a cereal dataset as an example. I am using the fviz_dend function from the "factoextra" package to create my dendogram. However, when I do this the dendogram does not show the names of the cereals as the labels, rather it shows the numerical representation instead. Is there a way to change the numerical values to labels? I am attaching below a picture of my current dendogram using the fviz_dend function and a picture of a dendogram I made using the plot function in base R. Note that the dendogram in created by the plot function has the labels of the cereals as I need them (what I am trying to achieve).
Dendogram Created Using fviz_dend:
### Code for dendogram using fvizdend
hc <- hclust(dist(scale(xv), method = input$dmeth), method = input$meth)
fviz_dend(hc, k = input$clustgroup, cex = 0.5, k_colors = c("#2E9FDF", "#00AFBB", "#E7B800", "#FC4E07"),
color_labels_by_k = T, rect = T, show_labels = T)
Dendogram Created Using plot function:
hc <- hclust(dist(scale(xv), method = input$dmeth), method = input$meth)
plot(hc, labels = xv$Brand)
Have you tried setting the row.names of xv with the labels?
rownames(xv) <- xv$Brand
Change the labels of the hclust object(hc in your case) and then plot the dendrogram.
hc$labels <- xv$Brand
library("factoextra")
fviz_dend(hc, cex = 0.5)

How to plot an nmds with coloured/symbol points based on SIMPROF

Hi so i am trying to plot my nmds of a assemblage data which is in a bray-curtis dissimilarity matrix in R. I have been able to apply ordielipse(),ordihull() and even change the colours based on group factors created by cutree() of a hclst()
e.g using the dune data from the vegan package
data(dune)
Dune.dis <- vegdist(Dune, method = "bray)
Dune.mds <- metaMDS(Dune, distance = "bray", k=2)
#hierarchical cluster
clua <- hclust(Dune.dis, "average")
plot(clua, hang = -1)
# set groupings
rect.hclust(clua, 4)
grp <- cutree(clua, 4)
#plot mds
plot(Dune.mds, display = "sites", type = "text", cex = 1.5)
#show groupings
ordielipse(Dune.mds, group = grp, border =1, col ="red", lwd = 3)
or even colour the points just by the cutree
colvec <- c("red2", "cyan", "deeppink3", "green3")
colvec[grp]
plot(Dune.mds, display = "sites", type = "text", cex = 1.5) #or use type = "points"
points(P4.mds, col = colvec[c2], bg =colvec[c2], pch=21)
However what i really want to do is use the SIMPROF function using the package "clustsig" to then colour the points based on significant groupings - this is more of a technical coding language thing - i am sure there is a way to create a string of factors but i am sure there is a more efficient way to do it
heres my code so far for that:
simp <- simprof(Dune.dis, num.expected = 1000, num.simulated = 999, method.cluster = "average", method.distance = "braycurtis", alpha = 0.05, sample.orientation = "row")
#plot dendrogram
simprof.plot(simp, plot = TRUE)
Now i am just not sure how do the next step to plot the nmds using the groupings defined by the SIMPROF - how do i make the SIMPROF results a factor string without literally typing it my self it myself?
Thanks in advance.
You wrote you know how to get colours from an hclust object with cutree. Then read the documentation of clustsig::simprof. This says that simprof returns an hclust object within its result object. It also returns numgroups which is the suggested number of clusters. Now you have all information you need to use the cutree of hclust you already know. If your simprof result is called simp, use cutree(simp$hclust, simp$numgroups) to extract the integer vector corresponding to the clustsig::simprof result, and use this to colours.
I have never used simprof or clustsig, but I gathered all this information from its documentation.

How to get a good dendrogram using R

I am using R to do a hierarchical cluster analysis using the Ward's squared euclidean distance. I have a matrix of x columns(stations) and y rows(numbers in float), the first row contain the header(stations' names). I want to have a good dendrogram where the name of the station appear at the bottom of the tree as i am not able to interprete my result. My aim is to find those stations which are similar. However using the following codes i am having numbers (100,101,102,...) for the lower branches.
Yu<-read.table("yu_s.txt",header = T, dec=",")
library(cluster)
agn1 <- agnes(Yu, metric = "euclidean", method="ward", stand = TRUE)
hcd<-as.dendrogram(agn1)
par(mfrow=c(3,1))
plot(hcd, main="Main")
plot(cut(hcd, h=25)$upper,
main="Upper tree of cut at h=25")
plot(cut(hcd, h=25)$lower[[2]],
main="Second branch of lower tree with cut at h=25")
A nice collection of examples are present here (http://gastonsanchez.com/blog/how-to/2012/10/03/Dendrograms.html)
Two methods:
with hclust from base R
hc<-hclust(dist(mtcars),method="ward")
plot(hc)
Default plot
ggplot
with ggplot and ggdendro
library(ggplot2)
library(ggdendro)
# basic option
ggdendrogram(hc, rotate = TRUE, size = 4, theme_dendro = FALSE)

Resources