The Question is easy. I'd like to biplot the results of PCA(mydata), which I did with FactoMineR. As it seems I can only display ether the variables or the individuals with the built in ploting device:
plot.PCA(pca1, choix="ind/var").
I know it can be easily done with the result of princomp(), but I really like how FactoMineR handles the NA's and it seems easier to me in many ways.
Is there a way? I saw it somehow done with ggplot2 but again only with the result of princomp(), and I have no idea how to change the code so that it works with PCA().
I also saw a solution for doing the individual and variable plot separately with ggplot2 (Look at the bottom), but how do I combine those?
Maybe the solution is somehow in the first link, but then I don't really get it :/.
I hope I made myself clear!
Greetings
Lukas
You can also do it with ggord, and code is easy to adapt to other ordination objects if you add additional S3 methods for them :
install.packages("devtools")
library(devtools)
install_github("fawda123/ggord")
library(ggord)
library(FactoMineR)
ord <- PCA(iris[, 1:4], graph = FALSE)
ggord(ord, iris$Species)
(Different kinds of scaling are possible though, e.g. row principal (form biplot), column principal (covariance biplot), symmetric biplot, etc, which are currently not supported by goord. Though it would be easy to edit the ggord.PCA S3 method or goord.default method to support this.)
Hi you can adapt the code of your first link for PCA objects from FactoMineR like this :
PCbiplot2 <- function(res.pca, x="Dim.1", y="Dim.2") {
if(!require(ggplot2)) install.packages("ggplot2")
# res.pca being a PCA object
data <- data.frame(obsnames=row.names(res.pca$ind$coord), res.pca$ind$coord)
plot <- ggplot(data, aes_string(x=x, y=y)) + geom_text(alpha=.4, size=3, aes(label=obsnames))
plot <- plot + geom_hline(aes(0), size=.2) + geom_vline(aes(0), size=.2)
datapc <- data.frame(varnames=rownames(res.pca$var$coord), res.pca$var$coord)
mult <- min(
(max(data[,y]) - min(data[,y])/(max(datapc[,y])-min(datapc[,y]))),
(max(data[,x]) - min(data[,x])/(max(datapc[,x])-min(datapc[,x])))
)
datapc <- transform(datapc,
v1 = .7 * mult * (get(x)),
v2 = .7 * mult * (get(y))
)
plot <- plot + coord_equal() + geom_text(data=datapc, aes(x=v1, y=v2, label=varnames), size = 5, vjust=1, color="red")
plot <- plot + geom_segment(data=datapc, aes(x=0, y=0, xend=v1, yend=v2), arrow=arrow(length=unit(0.2,"cm")), alpha=0.75, color="red")
plot
}
library(FactoMineR)
fit2 <- PCA(USArrests, graph=F)
PCbiplot2(fit2)
Tell me if it works !
Edit : add libraries like jlhoward suggests
Related
I need to plot an ordination plot showing only let s say the 20 most abundant species.
I tried to do the sum of the species colunm and then select only a certain sum value:
abu <- colSums(dune)
abu
sol <- metaMDS(dune)
sol
plot(sol, type="text", display="species", select = abu > 40)
I get this error: select is not a graphical parameter
I would expect to see only small number of species but it does not happen,
how do you show only a small number of species in the NMDS plot?
This is not straightforward. You are getting an error because select is not a parameter for the plot. Unfortunately, the result of the analysis is not a data.frame that could be handled easily (e.g. with tidyverse), and even more unfortunately, the plot() function called is not your standard plot, but a method defined specifically for objects of this class. The authors of this method did not foresee your need, and therefore, we must make the plot manually. But to do that, we need to understand what is plotting and how.
Let us find out more about the object sol:
class(sol)
# [1] "metaMDS" "monoMDS"
methods(class="metaMDS")
# [1] goodness nobs plot points print scores sppscores<- text
Oh good, we have a plot method. After a moment of digging, we find it in the vegan package (not exported, so we need to access it via vegan:::plot.metaMDS). It appears to be a wrapper around a function called ordiplot. We edit the function with edit() to figure out what it is doing. Essentially, it boils down to the following (with loads of unnecessary code):
Y <- scores(sol, display="species")
plot(Y, type="n")
text(Y[,1], Y[,2], rownames(Y), col="red")
This is, more or less, your plot. Choosing the species to show is now trivial, but first we must make sure that rows of Y are in the same order as columns of dune:
all(colnames(dune) == rownames(Y))
Y.sel <- Y[colSums(dune) > 40, ]
plot(Y.sel[,1], Y.sel[,2], type="n", xlim=c(-.8, .8), ylim=c(-.4, .4))
text(Y.sel[,1], Y.sel[,2], rownames(Y.sel), col="red")
We can of course make a much nicer plot. For example, with ggplot (it is definitely possible to make a much nicer plot with base R as well). We could actually show the abundance of the plants using the size esthetics:
library(ggplot2)
library(ggrepel)
Y <- data.frame(Y)
Y$abundance <- colSums(dune)
Y$labels <- rownames(Y)
ggplot(Y, aes(x=NMDS1, y=NMDS2, size=abundance)) +
geom_point() + geom_text_repel(aes(label=labels)) +
theme_minimal()
To filter the species by abundance, we now can do the following:
library(tidyverse)
Y %>% filter(abundance > 40) %>%
ggplot(Y, aes(x=NMDS1, y=NMDS2, size=abundance)) +
geom_point() + geom_text_repel(aes(label=labels)) +
theme_minimal()
Trying to add geom_points to an autolayer() line ("fitted" in pic), which is a wrapper part of autoplot() for ggplot2 in Rob Hyndmans forecast package (there's a base autoplot/autolayer in ggplot2 too so same likely applies there).
Problem is (I'm no ggplot2 expert, and autoplot wrapper makes it trickier) the geom_point() applies fine to the main call, but how do I apply similar to the autolayer (fitted values)?
Tried type="b" like normal geom_line() but it's not an object param in autolayer().
require(fpp2)
model.ses <- ets(mdeaths, model="ANN", alpha=0.4)
model.ses.fc <- forecast(model.ses, h=5)
forecast::autoplot(mdeaths) +
forecast::autolayer(model.ses.fc$fitted, series="Fitted") + # cannot set to show points, and type="b" not allowed
geom_point() # this works fine against the main autoplot call
This seems to work:
library(forecast)
library(fpp2)
model.ses <- ets(mdeaths, model="ANN", alpha=0.4)
model.ses.fc <- forecast(model.ses, h=5)
# Pre-compute the fitted layer so we can extract the data out of it with
# layer_data()
fitted_layer <- forecast::autolayer(model.ses.fc$fitted, series="Fitted")
fitted_values <- fitted_layer$layer_data()
plt <- forecast::autoplot(mdeaths) +
fitted_layer +
geom_point() +
geom_point(data = fitted_values, aes(x = timeVal, y = seriesVal))
There might be a way to make forecast::autolayer do what you want directly but this solution works. If you want the legend to look right, you'll want to merge the input data and fitted values into a single data.frame.
I have a dataset of leaf trait measurements made at multiple sites at two contrasting seasons. I am interested to explore the association/line fit between a pair of traits and to differentiate the seasons at each site.
Rather than a linear regression, I would prefer to use the Standardised Major Axis approach within the smatr package:
e.g. sma.site1 <- sma(TraitA ~ TraitB * Visit, data=subset(myfile, Site=="Site1")) # testing the null hypothesis of common slopes for the two Visits (Seasons) at a given Site.
I can produce a handy lattice plot in ggplot2 with a separate panel for each Site and the points differentiated by Visit:
e.g. qplot(TraitB, TraitA, data=myfile, colour=Visit) + facet_wrap(~Site, ncol=2)
However, if I add trend lines fitted with the additional argument in ggplot2:
+ geom_smooth(aes(group=Visit), method="lm", se=F)
……, those lines are not a good match for the sma coefficients.
What I would like to do is fit the lines suggested by the sma test onto the ggplot lattice. Is there an easy, or efficient, way to do that?
I know that I can subset the data, produce a plot for each site, add the relevant lines with + geom_abline() and then stitch the separate plots up together with grid.arrange(). But that feels very long-winded.
I would be grateful for any pointers.
I don't know anything about the smatr package but you should be able to tweak this to get the right values. Since you provided no data I used the leaf data from the example in the pkg. The basic idea is to pull out the slope & intercept from the returned sma object and then facet the geom_abline. I may be misinterpreting the object, though.
library(smatr)
library(ggplot2)
data(leaflife)
do.call(rbind, lapply(unique(leaflife$site), function(x) {
obj <- sma(longev~lma*rain, data=subset(leaflife, site=x))
data.frame(site=x,
intercept=obj$coef[[1]][1, 1],
slope=obj$coef[[1]][2, 1])
})) -> fits
gg <- ggplot(leaflife)
gg <- gg + geom_point(aes(x=lma, y=longev, color=soilp))
gg <- gg + geom_abline(data=fits, aes(slope=slope, intercept=intercept))
gg <- gg + facet_wrap(~site, ncol=2)
gg
I just saw this question and am not sure if you are still interested in this. I run the code by hrbrmstr, and found actually the only thing you need to change is:
obj <- sma(longev~lma*rain, data=subset(leaflife, site == x))
then you can get the plot with four lines for each group.
and also
I'm trying to plot a scatter-plot with two layers. The reason is I want to represent the size of the points by its number of answers. Then I need to have a smooth-curve layed over it. So I use two datasets to achieve this.
The problem is, when I lay the second layer with the smoother using the original dataset, then the smoother is shifted by one point on the x-scale to the left.
Does anyone know, how to correct this in the R code? Is there maybe something wrong in it?
I thought about to add 1 to the x variable, but I don't want to have to go this far.
library(ggplot2)
q.tab <- xtabs(~x + y, mydata)
q.df <- as.data.frame(q.tab)
pointsize <- q.df$Freq
qplot(x, y, data=q.df) + geom_point(aes(size=as.factor(pointsize)))
+ geom_smooth(data=mydata, method="loess", span=1))
With ggplot2 , when you think in terms of layer it is better to use ggplot function and not qplot.
I generate your data (sample function is very convenient to generate data)
mydata$x <- sample(1:10,100,replace=TRUE)
mydata$y <- sample(1:10,100,replace=TRUE)
q.tab <- xtabs(~x + y, mydata)
q.df <- as.data.frame(q.tab)
ggplot version:
library(ggplot2)
ggplot(data=mydata,aes(x,y,size=Freq)) +
geom_point() +
geom_smooth( method="loess", span=1)
qplot version:
qplot(data=mydata,x=x,y=y,size=Freq,geom='point')+
geom_smooth( method="loess", span=1)
I am trying to produce something similar to densityplot() from the lattice package, using ggplot2 after using multiple imputation with the mice package. Here is a reproducible example:
require(mice)
dt <- nhanes
impute <- mice(dt, seed = 23109)
x11()
densityplot(impute)
Which produces:
I would like to have some more control over the output (and I am also using this as a learning exercise for ggplot). So, for the bmi variable, I tried this:
bar <- NULL
for (i in 1:impute$m) {
foo <- complete(impute,i)
foo$imp <- rep(i,nrow(foo))
foo$col <- rep("#000000",nrow(foo))
bar <- rbind(bar,foo)
}
imp <-rep(0,nrow(impute$data))
col <- rep("#D55E00", nrow(impute$data))
bar <- rbind(bar,cbind(impute$data,imp,col))
bar$imp <- as.factor(bar$imp)
x11()
ggplot(bar, aes(x=bmi, group=imp, colour=col)) + geom_density()
+ scale_fill_manual(labels=c("Observed", "Imputed"))
which produces this:
So there are several problems with it:
The colours are wrong. It seems my attempt to control the colours is completely wrong/ignored
There are unwanted horizontal and vertical lines
I would like the legend to show Imputed and Observed but my code gives the error invalid argument to unary operator
Moreover, it seems like quite a lot of work to do what is accomplished in one line with densityplot(impute) - so I wondered if I might be going about this in the wrong way entirely ?
Edit: I should add the fourth problem, as noted by #ROLO:
.4. The range of the plots seems to be incorrect.
The reason it is more complicated using ggplot2 is that you are using densityplot from the mice package (mice::densityplot.mids to be precise - check out its code), not from lattice itself. This function has all the functionality for plotting mids result classes from mice built in. If you would try the same using lattice::densityplot, you would find it to be at least as much work as using ggplot2.
But without further ado, here is how to do it with ggplot2:
require(reshape2)
# Obtain the imputed data, together with the original data
imp <- complete(impute,"long", include=TRUE)
# Melt into long format
imp <- melt(imp, c(".imp",".id","age"))
# Add a variable for the plot legend
imp$Imputed<-ifelse(imp$".imp"==0,"Observed","Imputed")
# Plot. Be sure to use stat_density instead of geom_density in order
# to prevent what you call "unwanted horizontal and vertical lines"
ggplot(imp, aes(x=value, group=.imp, colour=Imputed)) +
stat_density(geom = "path",position = "identity") +
facet_wrap(~variable, ncol=2, scales="free")
But as you can see the ranges of these plots are smaller than those from densityplot. This behaviour should be controlled by parameter trim of stat_density, but this seems not to work. After fixing the code of stat_density I got the following plot:
Still not exactly the same as the densityplot original, but much closer.
Edit: for a true fix we'll need to wait for the next major version of ggplot2, see github.
You can ask Hadley to add a fortify method for this mids class. E.g.
fortify.mids <- function(x){
imps <- do.call(rbind, lapply(seq_len(x$m), function(i){
data.frame(complete(x, i), Imputation = i, Imputed = "Imputed")
}))
orig <- cbind(x$data, Imputation = NA, Imputed = "Observed")
rbind(imps, orig)
}
ggplot 'fortifies' non-data.frame objects prior to plotting
ggplot(fortify.mids(impute), aes(x = bmi, colour = Imputed,
group = Imputation)) +
geom_density() +
scale_colour_manual(values = c(Imputed = "#000000", Observed = "#D55E00"))
note that each ends with a '+'. Otherwise the command is expected to be complete. This is why the legend did not change. And the line starting with a '+' resulted in the error.
You can melt the result of fortify.mids to plot all variables in one graph
library(reshape)
Molten <- melt(fortify.mids(impute), id.vars = c("Imputation", "Imputed"))
ggplot(Molten, aes(x = value, colour = Imputed, group = Imputation)) +
geom_density() +
scale_colour_manual(values = c(Imputed = "#000000", Observed = "#D55E00")) +
facet_wrap(~variable, scales = "free")