Supplementary variables in ggbiplot PCA - r

I am trying to draw PCA results with ggbiplot. How can I draw supplementary variables?
I found this discussion for MCA results, but I would like to have the arrows as well...
data(wine)
wine.pca <- PCA(wine, scale. = TRUE, quanti.sup = c(4,5))
plot(wine.pca)
ggbiplot(wine.pca)
Besides, this code gives me an error:
1: In sweep(pcobj$ind$coord, 2, 1/(d * nobs.factor), FUN = "*") :
STATS is longer than the extent of 'dim(x)[MARGIN]'
2: In sweep(v, 2, d^var.scale, FUN = "*") :
STATS is longer than the extent of 'dim(x)[MARGIN]'

I tried your code and could not reproduce your error, but I ran into other problems. Searching for PCA() showed that the function comes from the FactoMineR package. After reading its documentation, I changed scale. to scale.unit and quanti.sup to quali.sup, and pointed quali.sup at the columns that actually hold the categorical variables.
library(FactoMineR)
library(ggbiplot)
data(wine)
wine.pca <- PCA(wine, scale.unit = TRUE, quali.sup = c(1,2))
plot(wine.pca)
ggbiplot(wine.pca)
That should give the correct output.
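To find out which columns belong in quali.sup, it can help to list the non-numeric columns first. The following is a minimal base R sketch; the data frame df and its column names are hypothetical stand-ins, not the wine data from FactoMineR.

```r
# Hypothetical data frame that mixes categorical and numeric columns.
df <- data.frame(
  label = factor(c("x", "y", "x")),
  soil  = c("s1", "s2", "s1"),
  acid  = c(1.2, 3.4, 2.2),
  sugar = c(0.5, 0.7, 0.9)
)

# Indices of the columns that are not numeric; these are the
# candidates for a quali.sup-style argument.
quali_cols <- which(!vapply(df, is.numeric, logical(1)))
quali_cols
```

The same check applied to your own data frame shows which column positions hold categorical variables.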

Related

Is there a way to remove points from a Mclust classification plot in R?

I am trying to plot the GMM of my dataset using the Mclust package in R. While the plotting is a success, I do not want points to show in the final plot, just the ellipses. For a reference, here is the plot I have obtained:
[Image: GMM plot]
But, I want the resulting plot to have only the ellipses, something like this:
[Image: desired GMM plot]
I have been looking at the Mclust plot page at https://rdrr.io/cran/mclust/man/plot.Mclust.html. From the function's arguments, I see that additional graphical parameters can be passed. The documentation of the plot function lists a parameter, type = 'n', that might do what I want, but when I use it I get the following error:
Error in plot.default(data[, 1], data[, 2], type = "n", xlab = xlab, ylab = ylab, :
formal argument "type" matched by multiple actual arguments
For reference, this is the code I used for the first plot:
library(mclust)
Data1_2 <- Mclust(Data, G=15)
summary(Data1_2, parameters = TRUE, classification = TRUE)
plot(Data1_2, what="classification")
The code I tried in order to get the desired plot is:
Data1_4 <- Mclust(Data, G=8)
summary(Data1_4, parameters = TRUE, classification = TRUE)
plot(Data1_4, what="classification", type = "n")
Any help on this matter will be appreciated. Thanks!
If you look at the source code of plot.Mclust, it calls plot.Mclust.classification, which in turn calls coordProj to draw the points and ellipses. Inside this function, the point size is controlled by the option CEX= and the point shape by PCH=.
So for your purpose, do:
library(mclust)
clu = Mclust(iris[,1:4], G = 3)
plot(clu,what="classification",CEX=0)

ggbiplot worked previously with prcomp, now will not

ggbiplot used to work with no problems using prcomp but now does not. All I receive is the following error code:
Error in plot_label(p = p, data = plot.data, label = label, label.label = label.label, :
Unsupported class: prcomp
I have installed ggbiplot using dependencies=TRUE and tried everything else suggested in other posts about similar issues, but I still get this message.
Any help is appreciated.
mypca <- prcomp(mydata, center=TRUE, scale.=TRUE)
ggbiplot(mypca, center=TRUE, scale.=TRUE)
Error in plot_label(p = p, data = plot.data, label = label, label.label = label.label, :
Unsupported class: prcomp
I don't think ggbiplot has a center or a scale. argument. Are you confusing the arguments of prcomp with those of ggbiplot?
The following works just fine:
library(ggbiplot)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
ggbiplot(pca)
Tested on ggbiplot_0.55.
Try calling the function with an explicit namespace, in case another attached package masks ggbiplot:
ggbiplot::ggbiplot(mypca)
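A namespace-qualified call helps when more than one attached package exports a function with the same name. As a rough check, base R's find() lists every place on the search path where a name is defined. The helper where_defined below is a hypothetical wrapper written for illustration, and the sd example only shows the shape of the output.

```r
# Hypothetical helper: list every attached package or environment
# that defines a function with the given name.
where_defined <- function(fn_name) find(fn_name, mode = "function")

# Example with a base function.
where_defined("sd")

# In the session that shows the error, run the same check for ggbiplot.
# More than one entry would suggest that the function is masked.
# where_defined("ggbiplot")
```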

Error encountered: Plotting PCA figure via ggbiplot

I am very new to R and am trying to plot a PCA figure of my data using ggbiplot, so please bear with me if my question does not make sense to you. Basically, I was following the tutorial I found here, except that I was using my own data set.
Everything was fine until I tried to use the code below to plot a figure:
g <- ggbiplot(ir.pca, obs.scale = 1, var.scale = 1,
groups = ir.ppm, ellipse = TRUE,
circle = TRUE)
Then I encountered the following error:
Error in names(ell)[1:2] <- c("xvar", "yvar") :
'names' attribute [2] must be the same length as the vector [0]
After that, I edited my code to use the default setting for groups =, which should be NULL as I recall.
g <- ggbiplot(ir.pca, obs.scale = 1, var.scale = 1,
ellipse = TRUE,
circle = TRUE)
With the edited code, I was able to plot the PCA figure, but it cannot categorize the observations into different groups as I wanted. Although I still do not know the meaning of the error: Error in names(ell)[1:2] <- c("xvar", "yvar") : 'names' attribute [2] must be the same length as the vector [0], I suspect that it may have something to do with my factor ir.ppm.
Here is all the code I used before I encountered the error.
ppm3 = read.csv("normalize_GasPhase_heatmap_no_ID_transpose.csv", header = TRUE, row.names = 1)
ppm3_1 <- ppm3[,1:30]
ir.ppm <- ppm3[,31]
ir.pca <- prcomp(ppm3_1, center = TRUE, scale. = TRUE)
library(ggbiplot)
g <- ggbiplot(ir.pca, obs.scale = 1, var.scale = 1, groups = ir.ppm, ellipse = TRUE, circle = TRUE)
In total, I have 6 observations and 31 variables in my raw data ppm3.
I have been browsing questions about plotting PCA figures with ggbiplot on Stack Overflow, but it seems that not many people have encountered the same problem. I would really appreciate it if anyone could offer me some help. Thank you.
You only have one observation for each level of your factor ir.ppm. You need more observations per level in order to display ellipses.
One workaround is to remove the ellipse option like this:
g <- ggbiplot(ir.pca, obs.scale = 1, var.scale = 1,
groups = ir.ppm,
circle = TRUE)
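Before requesting ellipses, it can be worth counting the observations in each group. The sketch below uses a hypothetical factor named groups with one observation per level. The min_n threshold of 3 is an assumption chosen for illustration, not a documented ggbiplot setting.

```r
# Hypothetical grouping factor: six observations, one per level,
# mirroring the situation described in the question.
groups <- factor(c("g1", "g2", "g3", "g4", "g5", "g6"))

# Assumption: a group needs at least min_n points for an ellipse;
# 3 is an illustrative threshold, not a documented ggbiplot setting.
min_n <- 3

# Count observations per level and flag the levels that are too small.
group_sizes <- table(groups)
small_groups <- names(group_sizes)[group_sizes < min_n]
use_ellipse <- length(small_groups) == 0

group_sizes
use_ellipse
```

If use_ellipse is FALSE, leaving out the ellipse option as shown above avoids the problem.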

mosaic()-function of the vcd package: error in adding text in the cells

I have created a mosaic plot using mosaic function in the vcd package. Now I wish to add some annotations using labeling_cells. Unfortunately, I get an error. The problem might be that it is not the standard Titanic example...
library("grid"); library("vcd")
dataset <- read.table("http://bit.ly/1aJTI1C")
# prepare data for plot as a "Structured Contingency Table"
data1 <- structable(dauer ~ groesse + ort, dataset)
# basic plot
mosaic(data1,
# separate the two elements of the plot
split_vertical = c(T, T, F),
# put the names in the right places and adds boxes
labeling_args = list(tl_labels = TRUE,
tl_varnames = FALSE,
boxes = TRUE),
# keep the grid viewports open so that labels can be added afterwards
pop=FALSE
)
# structure that matches plot, but it does not help
#match<-t(data1)
# try to add labels
labeling_cells(text = data1, clip = FALSE)(data1)
This results in:
# Error in ifelse(abbreviate_varnames, sapply(seq_along(dn), function(i) abbreviate(dn[i], :
# replacement has length zero
# In addition: Warning message:
# In rep(no, length.out = length(ans)) :
# 'x' is NULL so the result will be NULL
Another problem I have is that the boxes do not fit the labels. If you have a hint for that just let me know as well!
It's my first question here, so please excuse potential errors!
Thanks a lot!
Fixed upstream in vcd 1.4-4, but note that you can simply use
mosaic(data1, labeling = labeling_values)
Yes, this is quite confusing and ought to be fixed in labeling_cells(). For some reason the data in the labeling should be a regular table, not a structable. I'll raise this with David, the principal author of mosaic() and package maintainer.
Once you know this, it is easy to work around, though:
labeling_cells(text = as.table(data1), clip = FALSE)(as.table(data1))

How can I plot a biplot for LDA in r?

I did a linear discriminant analysis using the function lda() from the package MASS. Now I would like to plot a biplot like the one in the ade4 package (for LDA). Do you know how I can do this?
If I try to use the biplot() function it doesn't work. For example, if I use the Iris data and make LDA:
dis2 <- lda(as.matrix(iris[, 1:4]), iris$Species)
then I can plot it using the function plot(), but if I use the function biplot() it doesn't work:
biplot(dis2)
Error in nrow(y) : argument "y" is missing, with no default
How can I plot the arrows of variables?
I wrote the following function to do this:
lda.arrows <- function(x, myscale = 1, tex = 0.75, choices = c(1,2), ...){
## adds `biplot` arrows to an lda using the discriminant function values
heads <- coef(x)
arrows(x0 = 0, y0 = 0,
x1 = myscale * heads[,choices[1]],
y1 = myscale * heads[,choices[2]], ...)
text(myscale * heads[,choices], labels = row.names(heads),
cex = tex)
}
For your example:
library(MASS)
dis2 <- lda(as.matrix(iris[, 1:4]), iris$Species)
plot(dis2, asp = 1)
lda.arrows(dis2, col = 2, myscale = 2)
The length of the arrows is arbitrary relative to the lda plot (but not to each other, of course!). If you want longer or shorter arrows, change the value of myscale accordingly. By default, this plots arrows for the first and second axes. If you want to plot other axes, change choices to reflect this.
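To see what the arrows represent, you can inspect the coefficient matrix that the function reads with coef(). This is a small sketch on the iris example, assuming the MASS package is installed.

```r
library(MASS)

# Fit the same LDA as in the example.
dis2 <- lda(as.matrix(iris[, 1:4]), iris$Species)

# One row per variable, one column per discriminant axis.
# Each row gives the (unscaled) arrow head for that variable.
heads <- coef(dis2)
dim(heads)
colnames(heads)
```

The choices argument of lda.arrows selects which of these columns are used for the x and y coordinates.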
My understanding is that biplots of linear discriminant analyses can be done. They are implemented in the R package ggbiplot (see https://github.com/vqv/ggbiplot/tree/experimental) and in the package ggord (see https://github.com/fawda123/ggord). For your example:
install.packages("devtools")
library(devtools)
install_github("fawda123/ggord")
library(MASS)
library(ggord)
ord <- lda(Species ~ ., iris, prior = rep(1, 3)/3)
ggord(ord, iris$Species)
Also, the book "Biplots in Practice" by M. Greenacre has a chapter (chapter 11) on this topic, and its Figure 11.5 shows a biplot of a linear discriminant analysis of the iris dataset.
You can achieve this using the ggord package from GitHub. The dataset used is the iris dataset.
library(MASS)
# Assumption: IR is the built-in iris data set (it is not defined in the original snippet)
IR <- iris
# --- data partition --- #
set.seed(555)
IRSam <- sample.int(n = nrow(IR), size = floor(.60*nrow(IR)), replace = FALSE, prob = NULL)
IRTrain <- IR[IRSam,]
IRTest <- IR[-IRSam,]
# --- model fit --- #
# Assumption: IR.lda is an LDA fitted on the training set (the fit is not shown in the original snippet)
IR.lda <- lda(Species ~ ., data = IRTrain)
# --- prediction --- #
p <- predict(IR.lda, IRTrain)
# --- plotting a biplot --- #
library(devtools)
# install_github('fawda123/ggord')  # installs ggord from GitHub; requires devtools
library(ggord)
ggord(IR.lda, IRTrain$Species, ylim=c(-5,5), xlim=c(-10,10))
