Generate correct legend with auto.key in xyplot - r

I realize this is perhaps more of a data frame issue than an xyplot question, but here goes.
I have a data frame dat that has 108 rows and 5 columns. dat$Treatment is a factor with 5 levels. I want to create an xyplot with ONLY the data where dat$Treatment=="Control". Since I didn't know a better way to do it, I created tmp as shown below. xyplot plots the correct graph, with only the data in the rows where dat$Treatment=="Control". However, the legend displays all the data, for example those where dat$Treatment=="High dose".
Where is auto.key getting that from? I thought my tmp data frame didn't even have it. Can someone please help me understand?
library(lattice)
tmp <- dat[dat$Treatment=="Control",]
xyplot(tmp[,5] ~ Day, groups=tmp$Animal, data=tmp,
       type="b", ylab="Tumor volume",
       par.settings=simpleTheme(col=1:8,
                                pch=20,
                                cex=1.3,
                                lwd=2,
                                lty="dotted"),
       auto.key=list(title="Animal", x=.05, y=.95,
                     corner=c(0,1), border=TRUE, lines=TRUE, points=FALSE, type="b"))

I'm not too familiar with the lattice package, so others with more experience will have to weigh in. My guess is that you're seeing this behavior because of how R is handling dat$Treatment. I'm guessing this variable is stored as a factor, with levels you don't want to include in the plot. As a rough first step, I'd try saving the new data frame (as you have), but additionally run the following command:
tmp$Treatment = as.factor(as.character(tmp$Treatment))
This should save the Treatment variable as a factor with only one level. My guess is that the xyplot function looks up the levels of that factor when it plots. As a related example, consider the following:
data(iris)
iris.2 = iris[iris$Species == "setosa",]
table(iris.2$Species)
iris.2$Species = factor(as.character(iris.2$Species))
table(iris.2$Species)
Here, the two tables are reported differently because we've resaved the Species variable as a new factor. Hope this helps --

auto.key gets its values from the levels of the factor variables. When you subset a factor variable, all the levels are maintained (so that later you can tell which levels are missing from a particular subset). If you want to remove levels that aren't used in your subset, you can use
tmp <- droplevels(dat[dat$Treatment=="Control",])
This way auto.key will never see the other factor levels.
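To see the effect without the original data, here is a minimal sketch using the built-in iris data set as a stand-in (the names sub_keep and sub_drop are made up for illustration). It only shows how subsetting keeps unused levels and how droplevels() removes them:

```r
# Subsetting rows does not touch the factor's level set
sub_keep <- iris[iris$Species == "setosa", ]
levels(sub_keep$Species)   # all three species are still listed

# droplevels() removes the levels that no longer occur in the data
sub_drop <- droplevels(sub_keep)
levels(sub_drop$Species)   # only "setosa" remains
```

Anything that builds a legend from levels(), as auto.key does, would then see only the levels that are actually present.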

Related

R CCA Plot: How to get rid of overlapping labels in plot()?

I am doing a CCA on a table of abundance data of species and environmental parameters, very similar to the doubs data, package ade4.
cil.cca.r<-cca(cil~.,envm.r)
Where cil is a table with abundance data with 16 observations (sample sites) and 76 variables (abundances) and envm.r is my table with 16 observations (sample sites) and 11 variables (environmental parameters).
My problem is that I have a lot of species and I want the names of the species in the plot to be more readable.
Or, if this doesn't work, I would need to find out which species, if you will allow me to put it this way, "prefer" an environmental parameter such as PK/ml. How can I get at that information when the species names are not readable?
The command I started with is the following:
plot(cil.cca.r,scaling=1,display=c("sp","lc","cn"),main="Triplot CCA spe ~envm.r - scaling 1",repel = TRUE )
Next I also tried to separate the commands to make it more readable.
plot(cil.cca.rh, type="n",xlim=c(-2,2))
text(cil.cca.rh, "species", col="red", cex=0.6)
text(cil.cca.rh, dis="cn",cex=0.7,adj=0.8,col="blue")
I don't get anywhere with trying out the parameters.
Then I found the function orditorp()
plot(cil.cca.rh, type="n",xlim=c(-2,2))
orditorp(cil.cca.rh, "species", col="red", cex=0.6)
orditorp(cil.cca.rh,display="sites",cex=0.8,adj=0.1,col="darkgreen",air=0.7)
Plot with orditorp
This makes the plot look nicer, but I still don't get the information about my species, since a lot of them are drawn as points.
I hoped for some solution like package ggrepel, e.g. function repel() in ggplot2, but this doesn't work for a CCA.
Error "ggplot2 doesn't know how to deal with data of class cca"....
I also hoped for some function like fviz_pca_ind() of package factoextra
but for CCA data.
But I couldn't find anything and would be very grateful for solutions either for better plotting of the labels or for getting the species names with their locations!
Of course, what I could do is give the species shorter names or numbers instead of names, but it would be nicer to see at a glance which species it is...
Or perhaps "zooming in" on part of the graphic?

qqnorm plotting for multiple subsets

I am very new to R. I have figured out how to make qqnorm plots on a subset of my dataframe. However, I would like to make qqnorm plots on subsets that are defined by two factors (one factor has 48 categories (brain_region) and each of those categories can be further subdivided by another factor, which has three levels (GroupID)). I have tried the following:
by(t, t[,"GroupID"], function(x) tapply(t$FA,t$brain_region,qqnorm))
but it does not seem to be working. I'm also not sure if this is the best approach, as I'm new to this program.
I would also like to save each of the separately generated qqnorm plots with the x axis labeled "FA" and the title showing the specific level of each of the two factors (brain region/GroupID). Thank you very much for any help.
Plotting is one of the few things where apply isn't the optimal solution. ggplot offers you enough possibilities to get this done, as shown in this answer.
Plotting all levels in one go
If you use base plots, you are better off using a for loop for this. Plus, if you want to draw several plots on the same graphics device, you can use e.g. par(mfrow=) or layout (see the help page ?layout).
Let's take the built-in data set iris as an example:
data(iris)
op <- par(mfrow=c(1,3))
for(i in levels(iris$Species)){
  tmp <- with(iris, Petal.Width[Species==i])
  qqnorm(tmp,xlab="Petal.Width",main=i)
  qqline(tmp)
}
par(op)
rm(i,tmp)
gives :
Don't forget to clean up your workspace after using a for loop. Not really obligatory, but it can prevent serious confusion later on.
Combine two factors
In order to get this done for 2 factor levels at the same time, you can either construct a nested for-loop, or combine both factors into a single factor. Take the dataset mtcars:
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am <- factor(mtcars$am,
labels=c('automatic','manual'))
To combine both levels, you can use this simple construct :
mtcars$combined <- factor(paste(mtcars$cyl,mtcars$am,sep='/'))
And then do the same again. With two for loops, your code would look like the code below. Be warned though that this only works if you have data for every combination of the factors, and you don't have too many levels. If you have a lot of levels, you had better save the plots using e.g. png() (see ?png for info) instead of plotting them all on the same graphics device.
lcyl <- levels(mtcars$cyl)
lam <- levels(mtcars$am)
par(mfrow=c(length(lam),length(lcyl)))
for(i in lam){
  for(j in lcyl){
    tmp <- with(mtcars,mpg[am==i & cyl==j])
    # label the axis with the variable actually plotted (mpg)
    qqnorm(tmp,xlab="mpg",
           main=paste(i,j,sep="/"))
    qqline(tmp)
  }
}
gives :
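For the "save each plot separately" part of the question, one possible sketch is below. It is not taken from the answer above: it assumes mtcars as a stand-in for the real data, writes to tempdir(), and uses pdf() because that device is available in every R build (png() can be swapped in the same way). The names dat, groups, out_dir and files are made up for illustration.

```r
# Stand-in data: mtcars with two factors, as in the example above
dat <- mtcars
dat$cyl <- factor(dat$cyl)
dat$am  <- factor(dat$am, labels = c("automatic", "manual"))

# One numeric vector per factor combination; drop = TRUE skips empty combinations
groups <- split(dat$mpg, interaction(dat$cyl, dat$am, sep = "_"), drop = TRUE)

out_dir <- tempdir()          # assumed output location
files <- character(0)
for (nm in names(groups)) {
  f <- file.path(out_dir, paste0("qq_", nm, ".pdf"))
  pdf(f)                      # open one file per combination
  qqnorm(groups[[nm]], xlab = "mpg", main = nm)
  qqline(groups[[nm]])
  dev.off()                   # close the file before starting the next one
  files <- c(files, f)
}
```

Because split() with drop = TRUE leaves out empty combinations, this variant should not fail when some pairs of levels have no data.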

R - logistic curve plot with aggregate points

Let's say I have the following dataset
bodysize=rnorm(20,30,2)
bodysize=sort(bodysize)
survive=c(0,0,0,0,0,1,0,1,0,0,1,1,0,1,1,1,0,1,1,1)
dat=as.data.frame(cbind(bodysize,survive))
I'm aware that the glm plot function has several nice plots to show you the fit,
but I'd nevertheless like to create an initial plot with:
1) raw data points
2) the logistic curve
3) predicted points
4) and aggregate points for a number of predictor levels
library(Hmisc)
plot(bodysize,survive,xlab="Body size",ylab="Probability of survival")
g=glm(survive~bodysize,family=binomial,dat)
curve(predict(g,data.frame(bodysize=x),type="resp"),add=TRUE)
points(bodysize,fitted(g),pch=20)
All fine up to here.
Now I want to plot the real data survival rates for given levels of x1
dat$bd<-cut2(dat$bodysize,g=5,levels.mean=T)
AggBd<-aggregate(dat$survive,by=list(dat$bd),data=dat,FUN=mean)
plot(AggBd,add=TRUE)
#Doesn't work
I've tried to match AggBd to the dataset used for the model and all sorts of other things, but I simply can't plot the two together. Is there a way around this?
I basically want to superimpose the last plot on the same axes.
Besides this specific task, I often wonder how to superimpose different plots that show different variables but have a similar scale/range on two-dimensional plots. I would really appreciate your help.
The first column of AggBd is a factor, so you need to convert its levels to numeric before you can add the points to the plot.
AggBd$size <- as.numeric (levels (AggBd$Group.1))[AggBd$Group.1]
To add the points to the existing plot, use points:
points (AggBd$size, AggBd$x, pch = 3)
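If you would rather not depend on Hmisc, the same idea can be sketched with base R only. Note this swaps in a different binning technique: cut() with quantile breaks instead of cut2(), and tapply() instead of aggregate(). The seed and the names brks, bin_x and bin_y are assumptions for illustration.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible
bodysize <- sort(rnorm(20, 30, 2))
survive  <- c(0,0,0,0,0,1,0,1,0,0,1,1,0,1,1,1,0,1,1,1)
dat <- data.frame(bodysize, survive)
g <- glm(survive ~ bodysize, family = binomial, data = dat)

# Five bins with roughly equal counts, defined by quantiles of the predictor
brks <- quantile(dat$bodysize, probs = seq(0, 1, 0.2))
dat$bin <- cut(dat$bodysize, breaks = brks, include.lowest = TRUE)

# Numeric x position (mean body size) and observed survival rate per bin
bin_x <- tapply(dat$bodysize, dat$bin, mean)
bin_y <- tapply(dat$survive, dat$bin, mean)

plot(dat$bodysize, dat$survive, xlab = "Body size", ylab = "Probability of survival")
curve(predict(g, data.frame(bodysize = x), type = "response"), add = TRUE)
points(dat$bodysize, fitted(g), pch = 20)
points(bin_x, bin_y, pch = 3, cex = 1.5)  # aggregate points on the same axes
```

Since bin_x is already numeric, points() can add the aggregates directly without converting factor levels.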
You are best off specifying your y-axis, and perhaps using par(new=TRUE):
plot(bodysize,survive,xlab="Body size",ylab="Probability of survival")
g=glm(survive~bodysize,family=binomial,dat)
curve(predict(g,data.frame(bodysize=x),type="resp"),add=TRUE)
points(bodysize,fitted(g),pch=20)
#then
par(new=TRUE)
#
plot(AggBd$Group.1,AggBd$x,pch=30)
Obviously, remove or change the axis ticks to prevent overlap, e.g.
plot(AggBd$Group.1,AggBd$x,pch=30,xaxt="n",yaxt="n",xlab="",ylab="")
giving:

Plotting More than 2 Factors

Suppose I ran a factor analysis and got 5 relevant factors. Now I want to graphically represent the loadings of these factors on the variables. Can anybody please tell me how to do it? I can do it with 2 factors, but I am not able to when there are more than 2 factors.
The 2 factor plotting is given in "Modern Applied Statistics with S", Fig 11.13. I want to create similar graph but with more than 2 factors. Please find the snap of the Fig mentioned above:
X & y axes are the 2 factors.
Regards,
Ari
Beware: this is not the answer you are looking for and might also be incorrect; it is my subjective opinion.
I think you have run into the problem of sketching several dimensions on a two-dimensional screen/paper.
I would say there is no sense in plotting the loadings of more factors or PCs, but if you really insist: display the first two (based on eigenvalues) or create only 2 factors. Or you could reduce the dimension by other methods (e.g. MDS).
Displaying the loadings of 3 factors in a 3-dimensional graph would hardly be clear, let alone more factors.
UPDATE: I had a dream about trying to be more ontopic :)
You could easily show projections of each pair of factors, as @joran pointed out, like this (I am not dealing with rotation here):
f <- factanal(mtcars, factors=3)
pairs(f$loadings)
This way you could show even more factors and be able to tweak the plot also, e.g.:
f <- factanal(mtcars, factors=5)
pairs(f$loadings, col=1:ncol(mtcars), upper.panel=NULL, main="Factor loadings")
par(xpd=TRUE)
legend('topright', bty='n', pch='o', col=1:ncol(mtcars), attr(f$loadings, 'dimnames')[[1]], title="Variables")
Of course you could also add rotation vectors by customizing the lower triangle, or show them in the upper one and attach the legend on the right/below etc.
Or just point the variables on a 3D scatterplot if you have no more than 3 factors:
library(scatterplot3d)
f <- factanal(mtcars, factors=3)
scatterplot3d(as.data.frame(unclass(f$loadings)), main="3D factor loadings", color=1:ncol(mtcars), pch=20)
Note: in my humble opinion, variable names should not be put on the plots as labels but might go into a separate legend, especially with 3D plots.
It looks like there's a package for this:
http://factominer.free.fr/advanced-methods/multiple-factor-analysis.html
Comes with sample code, and multiple factors. Load the FactoMineR package and play around.
Good overview here:
http://factominer.free.fr/docs/article_FactoMineR.pdf
Graph from their webpage:
You can also look at the factor analysis object and see if you can't extract the values and plot them manually using ggplot2 or base graphics.
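As a rough sketch of the "extract the values and plot them manually" idea: the loadings of a factanal fit can be turned into an ordinary data frame with unclass(). The example assumes the built-in ability.cov data and two factors purely for illustration; the names fa and ld are made up.

```r
# Fit a small factor model on the built-in ability.cov covariance list
fa <- factanal(covmat = ability.cov, factors = 2)

# loadings() returns a matrix with class "loadings"; unclass() makes it a plain matrix
ld <- as.data.frame(unclass(loadings(fa)))
ld$variable <- rownames(ld)

# Base graphics: empty frame first, then variable names at their loadings
plot(ld$Factor1, ld$Factor2, type = "n", xlab = "Factor 1", ylab = "Factor 2")
text(ld$Factor1, ld$Factor2, labels = ld$variable)
```

Once the loadings are in a data frame like ld, they can be handed to ggplot2 or any other plotting function in the usual way.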
As daroczig mentions, each set of factor loadings gets its own dimension. So plotting in five dimensions is not only difficult, but often inadvisable.
You can, though, use a scatterplot matrix to display each pair of factor loadings. Using the example you cite from Venables & Ripley:
#Reproducing factor analysis from Venables & Ripley
#Note I'm only doing three factors, not five
library(MASS)  #provides eqscplot()
data(ability.cov)
ability.FA <- factanal(covmat = ability.cov, factors = 3, rotation = "promax")
load <- loadings(ability.FA)
rot <- ability.FA$rotmat
#Pairs of factor loadings to plot
ind <- combn(1:3,2)
par(mfrow = c(2,2))
nms <- row.names(load)
#Loop over pairs of factors and draw each plot
for (i in 1:3){
  eqscplot(load[,ind[1,i]],load[,ind[2,i]],xlim = c(-1,1),
           ylim = c(-0.5,1.5),type = "n",
           xlab = paste("Factor",as.character(ind[1,i])),
           ylab = paste("Factor",as.character(ind[2,i])))
  text(load[,ind[1,i]],load[,ind[2,i]],labels = nms)
  arrows(c(0,0),c(0,0),rot[ind[,i],ind[,i]][,1],
         rot[ind[,i],ind[,i]][,2],length = 0.1)
}
which for me results in the following plot:
Note that I had to play a little with the x and y limits, as well as the various other fiddly bits. Your data will be different and will require different adjustments. Also, plotting each pair of factor loadings with five factors will make for a rather busy collection of scatterplots.

How to plot data grouped by a factor, but not as a boxplot

In R, given a vector
casp6 <- c(0.9478638, 0.7477657, 0.9742675, 0.9008372, 0.4873001, 0.5097587, 0.6476510, 0.4552577, 0.5578296, 0.5728478, 0.1927945, 0.2624068, 0.2732615)
and a factor:
trans.factor <- factor (rep (c("t0", "t12", "t24", "t72"), c(4,3,3,3)))
I want to create a plot where the data points are grouped as defined by the factor. So the categories should be on the x-axis, values in the same category should have the same x coordinate.
Simply doing plot(trans.factor, casp6) does almost what I want: it produces a boxplot, but I want to see the individual data points.
require(ggplot2)
qplot(trans.factor, casp6)
You can do it with ggplot2, using facets. When I read "I want to create a plot where the data points are grouped as defined by the factor", the first thing that came to my mind was facets.
But in this particular case, a faster alternative would be:
plot(as.numeric(trans.factor), casp6)
And you can play with plot options afterwards (type, fg, bg...), but I recommend sticking with ggplot2, since it has much cleaner code, great functionality, you can avoid overplotting... etc. etc.
Learn how to deal with factors. You got a boxplot when evaluating plot(trans.factor, casp6) because trans.factor is of class factor (ironically, you even named it in such a manner), and because trans.factor was passed before the continuous (numeric) variable in the plot() call. Hence plot() "feels" the need to subset the data and draw a boxplot for each part (if you pass the continuous variable first, you'll get an ordinary graph, right?). ggplot2, on the other hand, interprets the factor in a different way, as "an ordinary" numeric variable (this holds for the syntax provided by Jonathan Chang; you must specify a geom when doing something more complex in ggplot2).
But, let's presuppose that you have one continuous variable and a factor, and you want to apply histogram on each part of continuous variable, defined by factor levels. This is where the things become complicated with base graph capabilities.
# create dummy data
set.seed(23)
x <- rnorm(200, 23, 2.3)
g <- factor(round(runif(200, 1, 4)))
By using base graphs (package:graphics):
par(mfrow = c(1, 4))
tapply(x, g, hist)
ggplot2 way:
qplot(x, facets = . ~ g)
Try to do this with graphics in one line of code (semicolons and custom functions are considered cheating!):
qplot(x, log(x), facets = . ~ g)
Let's hope that I haven't bored you to death, but helped you!
Kind regards,
aL3xa
I find the following solution:
stripchart(casp6~trans.factor,data.frame(casp6,trans.factor),pch=1,vertical=TRUE)
simple and direct.
(Refer e.g. to http://www.mail-archive.com/r-help@r-project.org/msg34176.html)
You may be able to get close to what you want using lattice graphics by doing:
library(lattice)
xyplot(casp6 ~ trans.factor,
scales = list(x = list(at = 1:4, labels = levels(trans.factor))))
I think there's a better solution (I wrote it for a workshop a few days ago), but it slipped my mind. Here's an ugly substitute with base graphics. Feel free to annotate the x axis ad libitum. Personally, I like Greg's solution.
plot(0, 0, xlim = c(1, 4), ylim = range(casp6), type = "n")
points(casp6 ~ trans.factor)
No extra package needed
I'm a bit late to the party, but I found that you can get the desired result very easily with the standard plot function -- simply convert the factor to a numeric value:
plot(as.numeric(trans.factor), casp6)
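One drawback of the numeric conversion is that the x axis then shows 1 to 4 instead of the category names. A small sketch of how the labels could be put back with axis(), using the data from the question (the axis title and the variable name x are just placeholders):

```r
casp6 <- c(0.9478638, 0.7477657, 0.9742675, 0.9008372, 0.4873001, 0.5097587,
           0.6476510, 0.4552577, 0.5578296, 0.5728478, 0.1927945, 0.2624068,
           0.2732615)
trans.factor <- factor(rep(c("t0", "t12", "t24", "t72"), c(4, 3, 3, 3)))

# Integer codes of the factor give one x position per level
x <- as.numeric(trans.factor)

# Suppress the default numeric axis, then draw one tick per level
plot(x, casp6, xaxt = "n", xlab = "trans.factor",
     xlim = c(0.5, nlevels(trans.factor) + 0.5))
axis(1, at = seq_along(levels(trans.factor)), labels = levels(trans.factor))
```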
10 year old question...but if you want a neat base R solution:
plot(trans.factor, casp6, border=NA, outline=FALSE)
points(trans.factor, casp6)
The first line sets up the plot but draws nothing. The second adds the points. This is slightly neater than the solutions that force x to be numeric.
