I'm trying to carry out a PCA on my dataset, and I can plot the loadings with the base plot function. I want to plot them with ggplot instead, but I keep getting this error: "ggplot2 doesn't know how to deal with data of class loadings".
I'm using the princomp function and plotting my first component against my second component. I need to colour the points on the basis of an external factor which has the same rownames as the dataframe I'm carrying out the PCA on. I've tried to do the same thing as this tutorial with prcomp, but with pca$loadings, and it didn't work. I need to plot my first PC against my second PC in ggplot.
xy <- princomp(iris[,-5])
plot(xy$loadings[,"Comp.1"], xy$loadings[,"Comp.2"], col=iris$Species)
ggplot(xy, aes(x=xy$loadings[,1], y=xy$loadings[,2]))+geom_point()
ggplot(as.data.frame(xy$loadings[,1:2]), aes(x=Comp.1, y=Comp.2))+geom_point()
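If the aim is to colour by iris$Species, note that the factor has one value per observation (150 rows), whereas the loadings have one row per variable (4 rows). A minimal sketch, assuming the per-observation scores are what should be coloured; the names scores_df, loadings_df, p_scores and p_loadings are made up for illustration:

```r
library(ggplot2)

xy <- princomp(iris[, -5])

# Scores: one row per observation, so the external factor lines up row by row
scores_df <- as.data.frame(xy$scores[, 1:2])
scores_df$Species <- iris$Species
p_scores <- ggplot(scores_df, aes(x = Comp.1, y = Comp.2, colour = Species)) +
  geom_point()

# Loadings: one row per variable; unclass() drops the "loadings" class
# so that as.data.frame() receives a plain matrix
loadings_df <- as.data.frame(unclass(xy$loadings)[, 1:2])
loadings_df$variable <- rownames(loadings_df)
p_loadings <- ggplot(loadings_df, aes(x = Comp.1, y = Comp.2, label = variable)) +
  geom_point() +
  geom_text(vjust = -0.5)

print(p_scores)
print(p_loadings)
```

The key step in both cases is handing ggplot a data frame rather than the princomp object or the loadings object itself.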
I expect this kind of scatter plot.
However, whenever I try to apply it to my data, I get this.
I just used this code, and this is my data.
And I also confirmed they are numeric class.
ggplot(selected.df, aes(x, y))
making a right plot.
Those variables were not numeric.
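A small sketch of how columns that look numeric can turn out not to be, and how to convert them before plotting. The contents of selected.df here are invented stand-in values, since the original data is not shown:

```r
# Hypothetical stand-in for selected.df: x is character, y is a factor
selected.df <- data.frame(
  x = c("1.5", "2.0", "3.25"),
  y = factor(c("10", "20", "15")),
  stringsAsFactors = FALSE
)

# Check the actual column classes
col_classes <- sapply(selected.df, class)
print(col_classes)

# Character to numeric is direct; a factor must go through as.character()
# first, otherwise as.numeric() returns the internal level codes
selected.df$x <- as.numeric(selected.df$x)
selected.df$y <- as.numeric(as.character(selected.df$y))

# After conversion, ggplot(selected.df, aes(x, y)) + geom_point()
# treats both axes as continuous
```

Running str(selected.df) before plotting is a quick way to catch this.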
I want to plot several variables and their respective correlation coefficients using the function pairs().
It works well, though I would like to put all the axes' legends on the bottom and left side of the plot.
By default, they are changing side every two plots as you can see on this example:
pairs(~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width, data=iris)
If anyone has an answer with and without ggplot2 R package, that would be perfect.
Use GGally
library(GGally)
ggpairs(data=iris)
Or just the continuous columns
ggpairs(data = iris[, 1:4])
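For the version without ggplot2, I don't know of a single pairs() argument that does this, so here is a rough sketch that builds the grid by hand with par(mfrow = ...) and only draws axes on the bottom row and the left column. The function name pairs_outer_axes is made up for this example, and it assumes all columns are numeric:

```r
pairs_outer_axes <- function(data) {
  vars <- names(data)
  n <- length(vars)
  # Small inner margins, larger outer margins to hold the axes
  old_par <- par(mfrow = c(n, n), mar = c(0.3, 0.3, 0.3, 0.3), oma = c(4, 4, 1, 1))
  on.exit(par(old_par))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      x <- data[[j]]
      y <- data[[i]]
      if (i == j) {
        # Diagonal panel: empty plot with the variable name in the middle
        plot(range(x), range(y), type = "n", axes = FALSE, xlab = "", ylab = "")
        text(mean(range(x)), mean(range(y)), vars[i])
      } else {
        plot(x, y, axes = FALSE, xlab = "", ylab = "")
      }
      box()
      if (i == n) axis(1)  # x axis only on the bottom row
      if (j == 1) axis(2)  # y axis only on the left column
    }
  }
  invisible(n * n)
}

n_panels <- pairs_outer_axes(iris[, 1:4])
```

This gives up the conveniences of pairs() (panel functions, formula interface), so it is only a starting point.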
I want to create a matrix plot but using one of the categorical variables as the color. I used the following code for the matrix:
pairs(salesintl)
It gave me the matrix plot just fine (see the output here).
Then I revised the code to:
pairs(salesintl, col=salesintl$Status)
It returns an empty matrix plot (see output here).
It is like an empty frame with no content. salesintl$status is a factor with 2 levels.
What did I do wrong?
Thanks,
If you are happy to use the ggplot world rather than base R graphics, then:
library(GGally)
ggpairs(salesintl, aes(color = status))
is the simplest expression I've found.
I also like that ggpairs will accept categorical variables, and gives something similar to mosaicplots (when categorical vs categorical) or boxplots (when categorical vs numeric).
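If you prefer to stay in base R, one thing worth checking is the column name: R is case-sensitive, so salesintl$Status is NULL when the column is actually called status. A sketch with an invented stand-in for salesintl (the real data is not shown, so the column names and values below are assumptions), mapping the factor levels to explicit colours:

```r
set.seed(1)
# Hypothetical stand-in for the real salesintl data
salesintl <- data.frame(
  sales  = rnorm(30, 100, 15),
  cost   = rnorm(30, 60, 10),
  units  = rpois(30, 20),
  status = factor(sample(c("domestic", "export"), 30, replace = TRUE))
)

# A wrongly capitalised name silently returns NULL instead of an error
wrong_case <- salesintl$Status

# Map each factor level to an explicit colour, one colour per row
level_colours <- c("steelblue", "tomato")
point_cols <- level_colours[as.integer(salesintl$status)]

# Plot only the numeric columns, coloured by status
pairs(salesintl[, c("sales", "cost", "units")], col = point_cols, pch = 19)
```

Whether the capitalisation is what caused the empty plot in the question is a guess; the explicit colour vector at least makes the mapping easy to inspect.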
I am trying to do PCA with R.
My data has 10,000 columns and 90 rows.
I used the prcomp function to do PCA.
Trying to prepare a biplot with the prcomp results, I ran into the problem that the 10,000 plotted vectors cover my datapoints. Is there any option for the biplot to hide the vectors' representation?
OR
I can use plot to get the PCA results. But I am not sure how to label these points according to my datapoints, which are numbered 1 to 90.
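# read the tab-separated data (no header row)
Sample <- read.table(file.choose(), header=FALSE, sep="\t")
# scale each column
Sample.scaled <- data.frame(apply(Sample, 2, scale))
# drop columns that contain NA after scaling
Sample.scaled.2 <- data.frame(t(na.omit(t(Sample.scaled))))
pca.Sample <- prcomp(Sample.scaled.2, retx=TRUE)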
pdf("Sample_plot.pdf")
plot(pca.Sample$x)
dev.off()
If you do a help(prcomp) or ?prcomp, the help file tells us all the things contained in the prcomp() object returned by the function. We just need to pick which things we want to plot and do it with some function that gives us more control than biplot().
A more general trick for cases when the help file doesn't clarify things is to do a str() on the prcomp object (in your case pca.Sample) to see all its parts and find what we want ( str() compactly displays the internal structure of an R object. )
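As a quick illustration of that kind of inspection, names() gives a shorter overview than str(). This sketch uses the built-in USArrests data, and p_example is just a name picked for the example:

```r
p_example <- prcomp(USArrests, scale. = TRUE)

# Top-level components of the prcomp object
component_names <- names(p_example)
print(component_names)

# The scores live in $x: one row per observation, one column per PC
score_dims <- dim(p_example$x)
print(score_dims)
print(head(rownames(p_example$x)))
```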
Here is an example with some of R's sample data:
# do a pca of arrests in different states
p <- prcomp(USArrests, scale. = TRUE)
str(p) gives me something ugly and too long to include, but I can see that p$x has the states as rownames and their locations on the principal components as columns. Armed with this, we can plot it any way we want, such as with plot() and text() (for labels):
# plot and add labels
plot(p$x[,1],p$x[,2])
text(p$x[,1],p$x[,2],labels=rownames(p$x))
If we are making a scatterplot with many observations, the labels may not be readable. We therefore might want to only label more extreme values, which we can identify with quantile():
#make a new dataframe with the info from p we want to plot
df <- data.frame(PC1=p$x[,1],PC2=p$x[,2],labels=rownames(p$x))
#make sure labels are not factors, so we can easily reassign them
df$labels <- as.character(df$labels)
# use quantile() to identify which ones are within 25-75 percentile on both
# PC and blank their labels out
df[ df$PC1 > quantile(df$PC1)["25%"] &
    df$PC1 < quantile(df$PC1)["75%"] &
    df$PC2 > quantile(df$PC2)["25%"] &
    df$PC2 < quantile(df$PC2)["75%"], ]$labels <- ""
# plot
plot(df$PC1,df$PC2)
text(df$PC1,df$PC2,labels=df$labels)
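Applied to the situation in the question (90 rows numbered 1 to 90, no arrows wanted), the same idea might look like the sketch below. The data here is random and much narrower than the real 10,000 columns, purely so the example runs quickly:

```r
set.seed(42)
# Stand-in data: 90 observations, 200 variables (the real data has 10,000)
Sample <- matrix(rnorm(90 * 200), nrow = 90)
pca.Sample <- prcomp(Sample, retx = TRUE)

scores <- pca.Sample$x

# type = "n" sets up the axes without drawing points, so only labels appear
plot(scores[, 1], scores[, 2], type = "n", xlab = "PC1", ylab = "PC2")
text(scores[, 1], scores[, 2], labels = seq_len(nrow(scores)))
```

Because only the scores are plotted, none of the variable vectors from biplot() are drawn.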
Let's say I have the following dataset
bodysize=rnorm(20,30,2)
bodysize=sort(bodysize)
survive=c(0,0,0,0,0,1,0,1,0,0,1,1,0,1,1,1,0,1,1,1)
dat=as.data.frame(cbind(bodysize,survive))
I'm aware that the glm plot function has several nice plots to show you the fit,
but I'd nevertheless like to create an initial plot with:
1) raw data points
2) the logistic curve
3) predicted points
4) aggregate points for a number of predictor levels
library(Hmisc)
plot(bodysize,survive,xlab="Body size",ylab="Probability of survival")
g=glm(survive~bodysize,family=binomial,dat)
curve(predict(g,data.frame(bodysize=x),type="resp"),add=TRUE)
points(bodysize,fitted(g),pch=20)
All fine up to here.
Now I want to plot the observed survival rates for given levels of body size.
dat$bd<-cut2(dat$bodysize,g=5,levels.mean=T)
AggBd<-aggregate(dat$survive,by=list(dat$bd),data=dat,FUN=mean)
plot(AggBd,add=TRUE)
#Doesn't work
I've tried to match AggBd to the dataset used for the model and all sort of other things but I simply can't plot the two together. Is there a way around this?
I basically want to superimpose the last plot along the same axes.
Besides this specific task, I often wonder how to superimpose different plots that show different variables but have a similar scale/range on two-dimensional plots. I would really appreciate your help.
The first column of AggBd is a factor, so you need to convert its levels to numeric before you can add the points to the plot.
AggBd$size <- as.numeric (levels (AggBd$Group.1))[AggBd$Group.1]
To add the points to the existing plot, use points:
points (AggBd$size, AggBd$x, pch = 3)
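For reference, here is a sketch of the whole workflow that avoids the factor conversion by computing the bin means directly. It swaps Hmisc::cut2 for base cut() with quantile breaks, which is my substitution and not what the question used; the names bin, bin_mean_size and bin_survival are made up for the example:

```r
set.seed(1)
bodysize <- sort(rnorm(20, 30, 2))
survive <- c(0,0,0,0,0,1,0,1,0,0,1,1,0,1,1,1,0,1,1,1)
dat <- data.frame(bodysize, survive)

g <- glm(survive ~ bodysize, family = binomial, data = dat)

# Raw data, fitted curve and fitted values
plot(dat$bodysize, dat$survive, xlab = "Body size", ylab = "Probability of survival")
curve(predict(g, data.frame(bodysize = x), type = "response"), add = TRUE)
points(dat$bodysize, fitted(g), pch = 20)

# Five quantile-based bins (assumption: base cut() instead of Hmisc::cut2)
breaks <- quantile(dat$bodysize, probs = seq(0, 1, by = 0.2))
dat$bin <- cut(dat$bodysize, breaks = breaks, include.lowest = TRUE)

# Mean body size and observed survival rate per bin, both numeric
bin_mean_size <- tapply(dat$bodysize, dat$bin, mean)
bin_survival  <- tapply(dat$survive, dat$bin, mean)

# Add the aggregated points to the existing plot
points(bin_mean_size, bin_survival, pch = 3, col = "red")
```

Since points() draws into the current coordinate system, the aggregated values share the axes of the first plot automatically.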
You are best off specifying your y-axis limits explicitly. You could also use par(new=TRUE):
plot(bodysize,survive,xlab="Body size",ylab="Probability of survival")
g=glm(survive~bodysize,family=binomial,dat)
curve(predict(g,data.frame(bodysize=x),type="resp"),add=TRUE)
points(bodysize,fitted(g),pch=20)
#then
par(new=TRUE)
#
plot(AggBd$Group.1,AggBd$x,pch=30)
Obviously, remove or change the axis ticks to prevent overlap, e.g.
plot(AggBd$Group.1,AggBd$x,pch=30,xaxt="n",yaxt="n",xlab="",ylab="")
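On the more general question of overlaying two plots with par(new=TRUE): the second plot() call sets up its own coordinate system, so the two layers only line up if both calls get the same xlim and ylim. A minimal sketch with invented numbers (x1, y1, x2, y2 are illustrative, not from the question):

```r
# Layer 1: a smooth curve; layer 2: a few aggregated points
x1 <- seq(26, 34, length.out = 20)
y1 <- plogis((x1 - 30) / 1.5)
x2 <- c(27, 29, 31, 33)
y2 <- c(0.1, 0.3, 0.7, 0.9)

# Shared limits computed once and passed to both plot() calls
x_lim <- range(x1, x2)
y_lim <- c(0, 1)

plot(x1, y1, type = "l", xlim = x_lim, ylim = y_lim,
     xlab = "Body size", ylab = "Probability of survival")

# Overlay: same limits, no second set of axes or labels
par(new = TRUE)
plot(x2, y2, pch = 3, xlim = x_lim, ylim = y_lim,
     axes = FALSE, xlab = "", ylab = "")
```

When both layers are simple points or lines, points() or lines() on the first plot is usually less error-prone than par(new=TRUE).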