Plotting More than 2 Factors - r

Suppose I ran a factor analysis and got 5 relevant factors. Now I want to graphically represent the loadings of these factors on the variables. Can anybody please tell me how to do it? I can do it with 2 factors, but not when the number of factors is more than 2.
The two-factor plot is given in "Modern Applied Statistics with S", Fig 11.13. I want to create a similar graph but with more than 2 factors. Here is a snapshot of the figure mentioned above:
The x and y axes are the 2 factors.
Regards,
Ari

Beware: this is not the answer you are looking for and might also be incorrect; it is just my subjective opinion.
I think you have run into the problem of sketching several dimensions on a two-dimensional screen or sheet of paper.
I would say there is no sense in plotting the loadings of more factors or PCs, but if you really insist: display the first two (based on eigenvalues) or extract only 2 factors. Or you could reduce the dimension by other methods (e.g. MDS).
Displaying the loadings of 3 factors in a 3-dimensional graph would hardly be clear, let alone more factors.
UPDATE: I had a dream about trying to be more on topic :)
You could easily show the projections of each pair of factors, as @joran pointed out (I am not dealing with rotation here):
f <- factanal(mtcars, factors=3)
pairs(f$loadings)
This way you could show even more factors and be able to tweak the plot also, e.g.:
f <- factanal(mtcars, factors=5)
pairs(f$loadings, col=1:ncol(mtcars), upper.panel=NULL, main="Factor loadings")
par(xpd=TRUE)
legend('topright', bty='n', pch='o', col=1:ncol(mtcars), attr(f$loadings, 'dimnames')[[1]], title="Variables")
Of course you could also add rotation vectors by customizing the lower triangle, or show them in the upper one and attach the legend on the right or below, etc.
Or just plot the variables in a 3D scatterplot if you have no more than 3 factors:
library(scatterplot3d)
f <- factanal(mtcars, factors=3)
scatterplot3d(as.data.frame(unclass(f$loadings)), main="3D factor loadings", color=1:ncol(mtcars), pch=20)
Note: in my humble opinion, variable names should not be put on the plots as labels but might go in a separate legend, especially with 3D plots.

It looks like there's a package for this:
http://factominer.free.fr/advanced-methods/multiple-factor-analysis.html
Comes with sample code, and multiple factors. Load the FactoMineR package and play around.
Good overview here:
http://factominer.free.fr/docs/article_FactoMineR.pdf
Graph from their webpage:
You can also look at the factor analysis object and see if you can't extract the values and plot them manually using ggplot2 or base graphics.
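As a minimal sketch of that last suggestion, assuming a fit from stats::factanal on the built-in mtcars data (the choice of three factors and the grouped bar chart are my own illustration, not something taken from FactoMineR), the loadings can be pulled out as a plain matrix and drawn with base graphics:

```r
# Fit a 3-factor model on a built-in dataset (illustrative choice)
f <- factanal(mtcars, factors = 3)

# Drop the "loadings" class to get a plain numeric matrix:
# one row per variable, one column per factor
L <- unclass(f$loadings)

# Grouped bars: one group per variable, one bar per factor
barplot(t(L), beside = TRUE, las = 2,
        legend.text = colnames(L),
        ylab = "Loading", main = "Factor loadings by variable")
abline(h = 0)
```

Because every factor gets its own bar rather than its own axis, the same call should carry over to five factors without needing extra dimensions.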

As daroczig mentions, each set of factor loadings gets its own dimension. So plotting in five dimensions is not only difficult, but often inadvisable.
You can, though, use a scatterplot matrix to display each pair of factor loadings. Using the example you cite from Venables & Ripley:
#Reproducing factor analysis from Venables & Ripley
#Note I'm only doing three factors, not five
library(MASS) #provides eqscplot
data(ability.cov)
ability.FA <- factanal(covmat = ability.cov, factors = 3, rotation = "promax")
load <- loadings(ability.FA)
#Rotation matrix returned by factanal
rot <- ability.FA$rotmat
#Pairs of factor loadings to plot
ind <- combn(1:3,2)
par(mfrow = c(2,2))
nms <- row.names(load)
#Loop over pairs of factors and draw each plot
for (i in 1:3){
  #Empty equal-scale plot for this pair of factors
  eqscplot(load[,ind[1,i]],load[,ind[2,i]],xlim = c(-1,1),
           ylim = c(-0.5,1.5),type = "n",
           xlab = paste("Factor",as.character(ind[1,i])),
           ylab = paste("Factor",as.character(ind[2,i])))
  #Variable names at their loadings
  text(load[,ind[1,i]],load[,ind[2,i]],labels = nms)
  #Rotated axes for this pair
  arrows(c(0,0),c(0,0),rot[ind[,i],ind[,i]][,1],
         rot[ind[,i],ind[,i]][,2],length = 0.1)
}
which for me results in the following plot:
Note that I had to play a little with the x and y limits, as well as the various other fiddly bits. Your data will be different and will require different adjustments. Also, plotting each pair of factor loadings with five factors will make for a rather busy collection of scatterplots.

Related

How to change color of data points to match treatment groups?

I am currently using RStudio to generate a 3D plot for my PCA using data imported from SPSS.
Currently, I have 10 treatment groups, each with 5 subjects. I would like to plot a 3d plot where each treatment group is represented by a color, and each subject in the same treatment group has the same color.
It is also vital that none of these colors are repeated.
I am able to generate the 3D plot; however, 2 treatment groups use the same color.
Can anyone help me rectify this issue so there would be no repeating colors for different treatment groups?
Here's the code that I'm using.
library(rgl) # provides plot3d(), text3d() and lines3d()
db = file.choose()
hpca = read.table(db, header=TRUE)
pc <- princomp(hpca[,2:7], cor=TRUE, scores=TRUE)
plot3d(pc$scores[,1:3], col=hpca$group, size = 6)
text3d(pc$scores[,1:3],texts=hpca$ï..tag)
text3d(pc$loadings[,1:3], texts=rownames(pc$loadings), col="red")
coords <- NULL
for (i in 1:nrow(pc$loadings)) {
coords <- rbind(coords, rbind(c(0,0,0),pc$loadings[i,1:3]))
}
lines3d(coords, col="red", lwd=4)
P.S. I am completely new to R programming and most of this code is copied from an online guide, so it would be extremely helpful if you could show me exactly where to include the changes.
Many thanks in advance!
You are setting the colour to be the group value. That sometimes works, because group values are usually factors and factors are stored as integer values, but it is kind of hit and miss.
It is better to compute the colour explicitly. For example, you can get a vector of 10 colours using
cols <- rainbow(10)
and then use it as
plot3d(..., col=cols[as.numeric(gp)])
if gp contains a factor with 10 levels.
There are several functions in base R to select colours (see ?rainbow) and others in contributed packages for different palettes.
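A minimal sketch of that idea, using a made-up grouping factor gp in place of hpca$group (10 groups of 5 subjects, as in the question). The likely reason two groups shared a colour is that R's default palette() has only 8 entries, so integer colours 9 and 10 wrap around to 1 and 2; I have not seen the actual data, so treat that as an assumption.

```r
# Hypothetical stand-in for hpca$group: 10 treatment groups, 5 subjects each
gp <- factor(rep(paste0("T", 1:10), each = 5))

# One distinct colour per factor level
cols <- rainbow(nlevels(gp))

# Map each subject to its group's colour
point_cols <- cols[as.integer(gp)]

# Base-graphics check of the mapping; with rgl you would pass
# col = point_cols to plot3d() instead
plot(seq_along(gp), rep(1, length(gp)), col = point_cols, pch = 19,
     xlab = "Subject", ylab = "", yaxt = "n")
```

In the question's code this would mean replacing col=hpca$group with col=cols[as.integer(factor(hpca$group))].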

How do I remove the second x and y axes in R?

Hopefully a simple question today:
I'm plotting an RDA (in R Studio) and would like to remove the second X and Y (top and right) axes . Purely for aesthetic purposes, but still. The code I'm using is below. I've managed to remove the first axes (I'll replace them with something nicer later) with xaxt="n" and yaxt="n", but it still puts the others in.
The question: How do I remove the top and right axes from a plot in R?
To make this example reproducible you will need two data frames of equal length called "bio" and "abio" respectively.
library (vegan) ##not sure which package I'm actually employing
library(MASS) ##these are just my defaults
rdaY1<-rda(bio,abio) #any dummy data will do so long as they're of equal length
par(bg="transparent",new=FALSE)
plot(rdaY1,type="n",bty="n",main="Y1. P<0.001 R2=XXX",
ylab="XXX% variance explained",
xlab="XXX% variance explained",
col.main="black",col.lab="black", col.axis="white",
xaxt="n",yaxt="n",axes=FALSE)
abline(h=0,v=0,col="black",lwd=1)
points(rdaY1,display="species",col="gray",pch=20)
#text(rdaY1,display="species",col="gray")
points(rdaY1,display="cn",col="black",lwd=2)
text(rdaY1,display="cn",col="black")
UPDATE: Using comments below I've played around with various ways to get rid of the axes and it seems like that second "points" command where I call for the vectors to be plotted is the problem. Any ideas?
bty="L" worked for me. I generated some random data using rnorm() to test:
library(vegan)
mat <- matrix(rnorm(100), nrow = 10)
pl <- rda(mat)
plot(pl, bty="L")
Here's the result.
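For an ordinary base-graphics plot, a hedged sketch of the same idea is to suppress the default axes and box, then draw only the bottom and left axes yourself. The data here are random, and I have not checked how vegan's plot method for rda objects treats each of these arguments, so take it as an illustration of the base graphics calls only:

```r
set.seed(42)
x <- rnorm(30)
y <- rnorm(30)

# Draw the points without any axes or surrounding box
plot(x, y, axes = FALSE, xlab = "Axis 1", ylab = "Axis 2", pch = 20)

# Add back only the bottom (side 1) and left (side 2) axes;
# axis() invisibly returns the tick positions it used
x_ticks <- axis(1)
y_ticks <- axis(2)

# L-shaped frame: lines on the left and bottom only
box(bty = "l")
```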

How to extract coordinates to plot line segments connecting legend keys in ggplot2?

I've long puzzled over a concise way to communicate the significance of an interaction between numeric and categorical variables in a line plot (response on the Y-axis, numeric predictor variable on the X-axis, and each level of the categorical variable drawn as a line of a different color or pattern on those axes). I finally came up with the idea of drawing the traditional "brackets and p-values" connecting legend keys instead of lines of data.
Here is a mockup of what I mean:
library(ggplot2);
mydat <- do.call(rbind,lapply(1:3,function(ii) data.frame(
y=seq(0,10)*c(.695,.78,1.39)[ii]+c(.322,.663,.847)[ii],
a=factor(ii-1),b=0:10)));
myplot <- ggplot(data=mydat,aes(x=b,y=y,colour=a,group=a)) +
geom_line()+theme(legend.position=c(.1,.9));
# Plotting with p-value bracket:
myplot +
# The three line segments making up the bracket
geom_segment(x=1.2,xend=1.2,y=13.8,yend=13) +
geom_segment(x=1.1,xend=1.2,y=13,yend=13) +
geom_segment(x=1.1,xend=1.2,y=13.8,yend=13.8) +
# The text accompanying the bracket.
geom_text(label='p < 0.001',x=2,y=13.4);
This is less cluttered than trying to plot brackets someplace on the line-plot itself.
The problem is that the x and y values for the geom_segments and geom_text were obtained by trial and error and for another dataset these coordinates would be completely wrong. That's a problem if I'm trying to write a function whose purpose is to automate the process of pulling these contrasts out of models and plotting them (kind of like the effects package, but with more flexibility about how to represent the data).
My question is: is there a way to somehow pull the actual coordinates of each box comprising the legend and convert them to the scale used by geom_segment and geom_text, or manually specify the coordinates of each box when creating the myplot object, or reliably predict where the individual boxes will be and convert them to the plot's scale given that myplot$theme$legend.position returns 0.1 0.9?
I'd like to do this within ggplot2, because it's robust, elegant, and perfect for all the other things I want to do with my script. I'm open to using additional packages that extend ggplot2 and I'm also open to other approaches to visually indicating significance level on line-plots. However, suggestions that amount to "you shouldn't even do that" are not constructive-- because whether or not I personally agree with you, my collaborators and their editors don't read Stackoverflow (unfortunately).
Update:
This question kind of simplifies to: if the myplot$theme$legend.key.height is in lines and myplot$theme$legend.position seems to be roughly in fractions of the overall plot area (but not exactly) how can I convert these to the units in which the x and y axes are delineated, or alternatively, convert the x and y axis scales to the units of legend.key.height and legend.position?
I don't know the answer to your question as posed. But another approach to conveying the information, quick to implement if less fancy, is to change the names of the levels so that the level names include significance codes. In your first example, you could use
levels(mydat$a) <- list("0" = "0", "1 *" = "1", "2 *" = "2")
And then the legend will reflect this:
With more levels and combos of significance, you could probably work out a set of symbols. Then mention in your figure legend the p level reflected in each set of symbols.
This might be a related way to convey the information: The figure below is produced by rxnNorm in HandyStuff here. Unfortunately, this is another non-answer as I have not been able to make this work with the new version of ggplot2. Hopefully I can figure it out soon.
My answer is not using ggplot2, but the lattice package. I think dotplot is what I would use if I want to compare a continuous variable versus categorical variables.
Here I use dotplot in 2 ways: one where each level of a gets its own panel, and another where I reproduce your plot with the lines grouped by a.
library(lattice)
library(latticeExtra) ## to get ggplot2 theme
#y versus levels of B, in different panel of A
p1 <- dotplot(b~y|a ,
data = mydat,
groups = a,
type = c("p", "h"),
main = "interaction between numeric and categorical variables ",
xlab = "continuous value",
par.settings = ggplot2like())
#y versus levels of B , grouped by a(color and line are defined by a)
p2 <- dotplot(b~y, groups= a ,
data = mydat,
type = c("l"),
main = "interaction between numeric and categorical variables ",
xlab = "continuous value",
par.settings = ggplot2like())
library(gridExtra) ## to arrange many grid plots
grid.arrange(p1,p2)

R - logistic curve plot with aggregate points

Let's say I have the following dataset
bodysize=rnorm(20,30,2)
bodysize=sort(bodysize)
survive=c(0,0,0,0,0,1,0,1,0,0,1,1,0,1,1,1,0,1,1,1)
dat=as.data.frame(cbind(bodysize,survive))
I'm aware that the glm plot function has several nice plots to show you the fit,
but I'd nevertheless like to create an initial plot with:
1) raw data points
2) the logistic curve, and both
3) predicted points and
4) aggregate points for a number of predictor levels
library(Hmisc)
plot(bodysize,survive,xlab="Body size",ylab="Probability of survival")
g=glm(survive~bodysize,family=binomial,dat)
curve(predict(g,data.frame(bodysize=x),type="resp"),add=TRUE)
points(bodysize,fitted(g),pch=20)
All fine up to here.
Now I want to plot the real-data survival rates for given levels of the predictor (body size):
dat$bd<-cut2(dat$bodysize,g=5,levels.mean=T)
AggBd<-aggregate(dat$survive,by=list(dat$bd),data=dat,FUN=mean)
plot(AggBd,add=TRUE)
#Doesn't work
I've tried to match AggBd to the dataset used for the model and all sort of other things but I simply can't plot the two together. Is there a way around this?
I basically want to superimpose the last plot on the same axes.
Besides this specific task, I often wonder how to superimpose different plots that show different variables but have a similar scale/range on two-dimensional plots. I would really appreciate your help.
The first column of AggBd is a factor; you need to convert its levels to numeric before you can add the points to the plot.
AggBd$size <- as.numeric (levels (AggBd$Group.1))[AggBd$Group.1]
To add the points to the existing plot, use points:
points (AggBd$size, AggBd$x, pch = 3)
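Putting the pieces together, here is a self-contained sketch that avoids the factor conversion altogether by computing the bin means directly. It swaps Hmisc::cut2 for base cut() with quantile breaks, which is my substitution rather than what the question used; the seed and the names bin and agg are also mine.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
bodysize <- sort(rnorm(20, 30, 2))
survive <- c(0,0,0,0,0,1,0,1,0,0,1,1,0,1,1,1,0,1,1,1)
dat <- data.frame(bodysize, survive)

# Logistic regression of survival on body size
g <- glm(survive ~ bodysize, family = binomial, data = dat)

# Five bins with roughly equal counts (base R stand-in for cut2(g = 5))
breaks <- quantile(dat$bodysize, probs = seq(0, 1, length.out = 6))
dat$bin <- cut(dat$bodysize, breaks = breaks, include.lowest = TRUE)

# Mean body size and observed survival rate within each bin
agg <- data.frame(size = as.numeric(tapply(dat$bodysize, dat$bin, mean)),
                  rate = as.numeric(tapply(dat$survive, dat$bin, mean)))

# Raw data, fitted curve, fitted points, then the aggregate points on top
plot(dat$bodysize, dat$survive, xlab = "Body size",
     ylab = "Probability of survival")
curve(predict(g, data.frame(bodysize = x), type = "response"), add = TRUE)
points(dat$bodysize, fitted(g), pch = 20)
points(agg$size, agg$rate, pch = 3, col = "red")
```

The general pattern for overlaying is the same each time: one plot() call sets up the axes, and everything else is added with points(), lines() or curve(..., add = TRUE).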
You are best off specifying your y-axis explicitly. You could also use par(new=TRUE):
plot(bodysize,survive,xlab="Body size",ylab="Probability of survival")
g=glm(survive~bodysize,family=binomial,dat)
curve(predict(g,data.frame(bodysize=x),type="resp"),add=TRUE)
points(bodysize,fitted(g),pch=20)
#then
par(new=TRUE)
#
plot(AggBd$Group.1,AggBd$x,pch=30)
Obviously, remove or change the axis ticks to prevent overlap, e.g.
plot(AggBd$Group.1,AggBd$x,pch=30,xaxt="n",yaxt="n",xlab="",ylab="")
giving:

How to plot data grouped by a factor, but not as a boxplot

In R, given a vector
casp6 <- c(0.9478638, 0.7477657, 0.9742675, 0.9008372, 0.4873001, 0.5097587, 0.6476510, 0.4552577, 0.5578296, 0.5728478, 0.1927945, 0.2624068, 0.2732615)
and a factor:
trans.factor <- factor (rep (c("t0", "t12", "t24", "t72"), c(4,3,3,3)))
I want to create a plot where the data points are grouped as defined by the factor. So the categories should be on the x-axis, values in the same category should have the same x coordinate.
Simply doing plot(trans.factor, casp6) does almost what I want, it produces a boxplot, but I want to see the individual data points.
require(ggplot2)
qplot(trans.factor, casp6)
You can do it with ggplot2, using facets. When I read "I want to create a plot where the data points are grouped as defined by the factor", the first thing that came to my mind was facets.
But in this particular case, a faster alternative would be:
plot(as.numeric(trans.factor), casp6)
And you can play with plot options afterwards (type, fg, bg...), but I recommend sticking with ggplot2, since it has much cleaner code, great functionality, you can avoid overplotting... etc. etc.
Learn how to deal with factors. You got a boxplot when evaluating plot(trans.factor, casp6) because trans.factor is of class factor (ironically, you even named it in such a manner), and it was passed before a continuous (numeric) variable in the plot() call. Hence plot() "feels" the need to subset the data and draw a boxplot for each part (if you pass the continuous variable first, you'll get an ordinary graph, right?). ggplot2, on the other hand, interprets the factor in a different way, as "an ordinary" numeric variable (this holds for the syntax provided by Jonathan Chang; you must specify a geom when doing something more complex in ggplot2).
But, let's presuppose that you have one continuous variable and a factor, and you want to apply histogram on each part of continuous variable, defined by factor levels. This is where the things become complicated with base graph capabilities.
# create dummy data
set.seed(23)
x <- rnorm(200, 23, 2.3)
g <- factor(round(runif(200, 1, 4)))
By using base graphs (package:graphics):
par(mfrow = c(1, 4))
tapply(x, g, hist)
ggplot2 way:
qplot(x, facets = . ~ g)
Try to do this with graphics in one line of code (semicolons and custom functions are considered cheating!):
qplot(x, log(x), facets = . ~ g)
Let's hope that I haven't bored you to death, but helped you!
Kind regards,
aL3xa
I find the following solution:
stripchart(casp6~trans.factor,data.frame(casp6,trans.factor),pch=1,vertical=T)
simple and direct.
(See e.g. http://www.mail-archive.com/r-help@r-project.org/msg34176.html)
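If several points in a group share nearly the same value, a small variation on this (my own sketch, not from the linked thread) is to jitter the points horizontally and mark each group mean:

```r
casp6 <- c(0.9478638, 0.7477657, 0.9742675, 0.9008372, 0.4873001,
           0.5097587, 0.6476510, 0.4552577, 0.5578296, 0.5728478,
           0.1927945, 0.2624068, 0.2732615)
trans.factor <- factor(rep(c("t0", "t12", "t24", "t72"), c(4, 3, 3, 3)))

# One column of points per factor level, with a little horizontal jitter
stripchart(casp6 ~ trans.factor, vertical = TRUE, pch = 1,
           method = "jitter", jitter = 0.1, ylab = "casp6")

# Group means, drawn as crosses at the group positions 1..4
means <- tapply(casp6, trans.factor, mean)
points(seq_along(means), means, pch = 3, cex = 1.5)
```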
You may be able to get close to what you want using lattice graphics by doing:
library(lattice)
xyplot(casp6 ~ trans.factor,
scales = list(x = list(at = 1:4, labels = levels(trans.factor))))
I think there's a better solution (I wrote it for a workshop a few days ago), but it slipped my mind. Here's an ugly substitute with base graphics. Feel free to annotate the x axis ad libitum. Personally, I like Greg's solution.
plot(0, 0, xlim = c(1, 4), ylim = range(casp6), type = "n")
points(casp6 ~ trans.factor)
No extra package needed
I'm a bit late to the party, but I found that you can get the desired result very easily with the standard plot function -- simply convert the factor to a numeric value:
plot(as.numeric(trans.factor), casp6)
10 year old question...but if you want a neat base R solution:
plot(trans.factor, casp6, border=NA, outline=FALSE)
points(trans.factor, casp6)
The first line sets up the plot but draws nothing. The second adds the points. This is slightly neater than the solutions that force x to be numeric.

Resources