How can I stop sapply dropping my barplot titles?

I want to make a barplot for each factor variable in my data set. To do this I've been running sapply(data[sapply(data, class)=='factor'],function(x) barplot(table(x))). To my annoyance, the plots keep their factor level labels, but none of them has a title. How can I fix this without titling each graph by hand?
Currently, I'm getting humorously vague, untitled graphs.

How about this?
## extract the names of the factor columns
fvars <- names(data)[which(sapply(data,inherits,"factor"))]
## apply barplot() with main= set to each column name
lapply(fvars, function(x) barplot(table(data[[x]]), main=x))
Example data:
data <- mtcars
for (i in c("vs","am","gear","carb")) data[[i]] <- factor(data[[i]])
Note that this creates all the plots at once. If you're working in a GUI with a plot history (RStudio or RGui), you can page back through the graphs. Otherwise, you might want to call par(mfrow=c(nr,nc)) (filling in the number of rows and columns) to set up subplots before you start.
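The numbers that get printed are the bar midpoints returned by barplot() (see ?barplot), collected into a list by lapply(). If you don't want to see them, wrap the whole lapply() call in invisible(); wrapping only the barplot() call won't help, because lapply() still returns its list visibly.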
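The same idea also fits the original sapply()-style one-liner. As a sketch using the mtcars example data from above, Map() walks over the factor columns and their names in parallel, so each name can become the title, and par(mfrow=c(2,2)) puts the four plots on one page. The names fdata and mids are only illustrative.

```r
# Example data from above: mtcars with four columns turned into factors
data <- mtcars
for (i in c("vs", "am", "gear", "carb")) data[[i]] <- factor(data[[i]])

# Keep only the factor columns (fdata is an illustrative name)
fdata <- data[vapply(data, is.factor, logical(1))]

# 2 x 2 grid of subplots; keep the old settings so they can be restored
op <- par(mfrow = c(2, 2))

# Map() hands each column *and* its name to the function, so the name
# can be used as the title; assigning the result keeps the list of
# bar midpoints from being printed
mids <- Map(function(x, nm) barplot(table(x), main = nm),
            fdata, names(fdata))

par(op)  # restore the previous layout
```

Because Map() takes its names from the first argument, mids is a list named after the factor columns, holding the midpoints of each barplot.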

Related

R plot() - why do my points appear as horizontal lines

I'm trying to make a plot in R. My x-axis is a week number converted to factor and my y-axis is an amount.
When I run plot() instead of dots I get horizontal lines.
Why does this happen?
Here is a sample dataset:
df <- data.frame(fin_week=as.factor(seq(1,20, by =1)), amount=(rnorm(20)^2)*100)
plot(df)
Looking at the documentation, it's because the first column is a factor. plot(df) dispatches to plot.data.frame(), which, for a two-column data frame, plots the first column against the second. Since the first column is a factor, that call ends up in plot.factor(), which draws a boxplot of amount for each level of fin_week. Each week has only one observation, so every boxplot collapses to a single horizontal line.
Try plot.default(df) instead and you should get the scatter plot.
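Another option, sketched below, is to convert the factor back to numbers yourself so that plot() sees a numeric x. The set.seed() call is an addition only to make the random example reproducible.

```r
set.seed(1)  # added only so the random example is reproducible
df <- data.frame(fin_week = as.factor(seq(1, 20, by = 1)),
                 amount = (rnorm(20)^2) * 100)

# Convert the factor back to numbers: go through character first,
# otherwise as.numeric() returns the internal level codes
week_num <- as.numeric(as.character(df$fin_week))

# With a numeric x, plot() uses plot.default and draws points
plot(week_num, df$amount, xlab = "fin_week", ylab = "amount")
```

Going through as.character() matters in general: here the level codes happen to equal the week numbers, but that is not true for factors whose levels don't start at 1 or have gaps.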

qqnorm plotting for multiple subsets

I am very new to R. I have figured out how to make qqnorm plots on a subset of my dataframe. However, I would like to make qqnorm plots on subsets that are defined by two factors (one factor has 48 categories (brain_region) and each of those categories can be further subdivided by another factor, which has three levels (GroupID)). I have tried the following:
by(t, t[,"GroupID"], function(x) tapply(t$FA,t$brain_region,qqnorm))
but it does not seem to be working. I'm also not sure if this is the best approach, as I'm new to this program.
I would also like to save each separately generated qqnorm plot with the x axis labeled "FA" and a title showing the specific level of each of the two factors (brain region/GroupID). Thank you very much for any help.
Plotting is one of the few tasks where the apply family isn't the best tool. ggplot offers enough options to get this done, as shown in this answer.
Plotting all levels in one go
If you use base graphics, a for loop is the better choice here. Plus, if you want several plots on the same graphics device, you can use e.g. par(mfrow=) or layout() (see the help page ?layout).
Let's take the built-in data set iris as an example:
data(iris)
op <- par(mfrow=c(1,3))
for(i in levels(iris$Species)){
tmp <- with(iris, Petal.Width[Species==i])
qqnorm(tmp,xlab="Petal.Width",main=i)
qqline(tmp)
}
par(op)
rm(i,tmp)
Don't forget to clean up your workspace after using a for loop. Not really obligatory, but it can prevent serious confusion later on.
Combine two factors
To do this for two factors at the same time, you can either construct a nested for loop or combine both factors into a single factor. Take the dataset mtcars:
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am <- factor(mtcars$am,
labels=c('automatic','manual'))
To combine both factors, you can use this simple construct:
mtcars$combined <- factor(paste(mtcars$cyl,mtcars$am,sep='/'))
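Base R's interaction() builds the same kind of combined factor in one call. The sketch below (an alternative, not part of the original answer) uses it and then loops over the combined levels just as in the iris example, with mpg as the variable being checked.

```r
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am <- factor(mtcars$am, labels = c("automatic", "manual"))

# interaction() joins the levels like paste() + factor();
# drop = TRUE keeps only the combinations that actually occur
mtcars$combined <- interaction(mtcars$cyl, mtcars$am, sep = "/", drop = TRUE)

op <- par(mfrow = c(2, 3))  # 6 combinations -> 2 rows x 3 columns
for (lev in levels(mtcars$combined)) {
  tmp <- mtcars$mpg[mtcars$combined == lev]
  qqnorm(tmp, xlab = "mpg", main = lev)
  qqline(tmp)
}
par(op)
rm(lev, tmp)  # clean up the loop variables
```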
And then do the same again. With two for loops, your code would look like the code below. Be warned, though, that this only works if you have data for every combination of the factors and you don't have too many levels. If you have a lot of levels, you had better save the plots using e.g. png() (see ?png for info) instead of plotting them all on the same graphics device.
lcyl <- levels(mtcars$cyl)
lam <- levels(mtcars$am)
op <- par(mfrow=c(length(lam),length(lcyl)))
for(i in lam){
for(j in lcyl){
tmp <- with(mtcars,mpg[am==i & cyl==j])
qqnorm(tmp,xlab="mpg",
main=paste(i,j,sep="/"))
qqline(tmp)
}
}
par(op)
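Since the question also asks to save each plot separately, here is a sketch that writes one file per combination instead of drawing everything on one device. It groups the values with split() and a list of factors; the output folder (tempdir()) and the file-name pattern are placeholders. pdf() is used because it is available in every R build, and png() is called the same way where supported. For the original data, FA, GroupID and brain_region would take the place of mpg, am and cyl.

```r
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am <- factor(mtcars$am, labels = c("automatic", "manual"))

# One vector of mpg values per am/cyl combination; drop = TRUE skips
# combinations without data, which would otherwise make qqnorm() fail
groups <- split(mtcars$mpg, list(mtcars$am, mtcars$cyl), sep = "/", drop = TRUE)

outdir <- tempdir()  # placeholder: use your own output folder
files <- character(0)
for (nm in names(groups)) {
  # "/" cannot appear in a file name, so swap it for "_"
  f <- file.path(outdir, paste0("qq_", gsub("/", "_", nm), ".pdf"))
  pdf(f)
  qqnorm(groups[[nm]], xlab = "mpg", main = nm)
  qqline(groups[[nm]])
  dev.off()
  files <- c(files, f)
}
```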

save multiple plots in R as a .jpg file, how?

I am very new to R and I am using it for my probability class. I searched for this question here, but what I found doesn't seem to match what I want to do. (If there is already an answer, please tell me.)
The problem is that I want to save multiple plots of histograms in the same file. For example, if I do this in the R prompt, I get what I want:
library(PASWR)
data(Grades)
attach(Grades) # Grades has gpa and sat variables
par(mfrow=c(2,1))
hist(gpa)
hist(sat)
So I get both histograms in the same plot. But if I want to save it as a jpeg:
library(PASWR)
data(Grades)
attach(Grades) # Grades has gpa and sat variables
par(mfrow=c(2,1))
jpeg("hist_gpa_sat.jpg")
hist(gpa)
hist(sat)
dev.off()
It saves the file, but with just one plot... Why? How can I fix this?
Thanks.
Also, pointers to a good article or tutorial on plotting with gplot and related tools would be appreciated, thanks.
Swap the order of these two lines:
par(mfrow=c(2,1))
jpeg("hist_gpa_sat.jpg")
so that you have:
jpeg("hist_gpa_sat.jpg")
par(mfrow=c(2,1))
hist(gpa)
hist(sat)
dev.off()
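That way you open the jpeg device before doing anything related to plotting. Graphics parameters set with par() belong to the device that is active when you call it: in your version, mfrow=c(2,1) was applied to the screen device, and jpeg() then opened a new device with the default one-plot layout, so the second histogram started a new page and overwrote the first in the file.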
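A quick way to see why the order matters: each device keeps its own graphics parameters, so a layout set on one device does not carry over to a device opened afterwards. The sketch below uses pdf(NULL), a device that produces no output, as a stand-in for both the screen and the jpeg file.

```r
pdf(NULL)                          # stands in for the screen device
par(mfrow = c(2, 1))               # two-row layout on *this* device
first_layout <- par("mfrow")

pdf(NULL)                          # stands in for jpeg("hist_gpa_sat.jpg")
second_layout <- par("mfrow")      # a fresh device starts with 1 x 1

graphics.off()                     # close both devices
print(first_layout)   # [1] 2 1
print(second_layout)  # [1] 1 1
```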
You could also have a look at the function layout. With this, you can arrange plots more freely. This example gives you a 2 column layout of plots with 3 rows.
The first row is occupied by one plot, the second row by 2 plots and the third row again by one plot. This can come in very handy.
x <- rnorm(1000)
jpeg("normdist.jpg")
layout(mat=matrix(c(1,1,2,3,4,4),nrow=3,ncol=2,byrow=TRUE))
boxplot(x, horizontal=TRUE)
hist(x)
plot(density(x))
plot(x)
dev.off()
Check ?layout to see how the matrix 'mat' (layout's first argument) is interpreted.
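For the layout() example above, printing the matrix makes its meaning visible: each number is a plot, in the order the plots are drawn, and a number repeated across a row makes that plot span both columns. layout.show() outlines the resulting regions; pdf(NULL) is used here only so the sketch runs without a screen.

```r
m <- matrix(c(1, 1, 2, 3, 4, 4), nrow = 3, ncol = 2, byrow = TRUE)
print(m)
#      [,1] [,2]
# [1,]    1    1
# [2,]    2    3
# [3,]    4    4

pdf(NULL)               # off-screen device so the sketch runs anywhere
n_plots <- layout(m)    # layout() returns the number of figure regions
layout.show(n_plots)    # outline the regions, numbered in drawing order
invisible(dev.off())
```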

PCA Biplot : A way to hide vectors to see all data points clearly

I am trying to do PCA with R.
My data has 10,000 columns and 90 rows.
I used the prcomp function to do PCA.
Trying to prepare a biplot with the prcomp results, I ran into the problem that the 10,000 plotted vectors cover my datapoints. Is there any option for the biplot to hide the vectors' representation?
OR
I can use plot to get the PCA results. But I am not sure how to label these points according to my datapoints, which are numbered 1 to 90.
Sample<-read.table(file.choose(),header=FALSE,sep="\t")
Sample.scaled<-data.frame(apply(Sample,2,scale))
Sample.scaled.2<-data.frame(t(na.omit(t(Sample.scaled))))
pca.Sample<-prcomp(Sample.scaled.2,retx=TRUE)
pdf("Sample_plot.pdf")
plot(pca.Sample$x)
dev.off()
If you do a help(prcomp) or ?prcomp, the help file tells us all the things contained in the prcomp() object returned by the function. We just need to pick which things we want to plot and do it with some function that gives us more control than biplot().
A more general trick, for cases when the help file doesn't clarify things, is to call str() on the prcomp object (in your case pca.Sample) to see all its parts and find what we want (str() compactly displays the internal structure of an R object).
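For a quicker overview than str(), names() lists just the components. A sketch on the same USArrests data used in the example below:

```r
p <- prcomp(USArrests, scale. = TRUE)

# Top-level components of a prcomp object (sdev, rotation, center, scale, x)
print(names(p))

# p$x holds the scores: one row per observation, one column per component
print(dim(p$x))
# [1] 50  4
```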
Here is an example with some of R's sample data:
# do a pca of arrests in different states
p<-prcomp(USArrests, scale = TRUE)
str(p) gives me something ugly and too long to include, but I can see that p$x has the states as rownames and their locations on the principal components as columns. Armed with this, we can plot it any way we want, such as with plot() and text() (for labels):
# plot and add labels
plot(p$x[,1],p$x[,2])
text(p$x[,1],p$x[,2],labels=rownames(p$x))
If we are making a scatterplot with many observations, the labels may not be readable. We therefore might want to only label more extreme values, which we can identify with quantile():
#make a new dataframe with the info from p we want to plot
df <- data.frame(PC1=p$x[,1],PC2=p$x[,2],labels=rownames(p$x))
#make sure labels are not factors, so we can easily reassign them
df$labels <- as.character(df$labels)
# use quantile() to find the points that lie within the 25th-75th percentile
# range on both PCs, and blank out their labels
df[ df$PC1 > quantile(df$PC1)["25%"] &
df$PC1 < quantile(df$PC1)["75%"] &
df$PC2 > quantile(df$PC2)["25%"] &
df$PC2 < quantile(df$PC2)["75%"],]$labels <- ""
# plot
plot(df$PC1,df$PC2)
text(df$PC1,df$PC2,labels=df$labels)
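To address the two original questions directly, here is a sketch with the same USArrests stand-in. biplot() on a prcomp object passes extra arguments on to biplot.default(), whose var.axes = FALSE drops the arrows; the empty ylabs hide the variable names as well (for your data that would be one empty string per column). The second option labels each point with its row number, which matches observations numbered 1 to 90.

```r
p <- prcomp(USArrests, scale. = TRUE)

# Option 1: biplot() without the variable arrows or their labels
biplot(p, var.axes = FALSE, ylabs = rep("", ncol(USArrests)))

# Option 2: plot the scores on the first two components yourself and
# label every point with its row number (1, 2, ..., n)
scores <- p$x[, 1:2]
plot(scores, type = "n")                      # axes only, no points yet
text(scores, labels = seq_len(nrow(scores)))  # row numbers as labels
```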

Producing statistics over levels

I've generated a set of levels from my dataset, and now I want to find a way to sum the rest of the data columns in order to plot it while plotting my first column. Something like:
levelSet <- cut(frame$x1, "cutting")
boxplot(frame$x1~levelSet)
for (l in levelSet)
{
x2Sum<-sum(frame$x2[levelSet==l])
}
or maybe the inside of the loop should look like:
lines(sum(frame$x2[levelSet==l]))
Any thoughts? I am new to R and can't seem to get the hang of the indexing and ~ notation so far.
I know R doesn't work this way, but I'd like functionality that 'looks' like
hist(frame$x2~levelSet)
## Or
hist(frame$x2, breaks = levelSet)
To plot a histogram, boxplot, etc. over a level set:
Try the lattice package:
library(lattice)
histogram(~x2|equal.count(x1),data=frame)
Substitute shingle for equal.count to set your own break points.
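The difference between the two, as a sketch on a hypothetical data frame: equal.count() picks overlapping intervals that each hold roughly the same number of points, while shingle() with an intervals matrix uses exactly the ranges you give it. lattice is a recommended package that comes with standard R installations.

```r
library(lattice)

# Hypothetical stand-in for `frame`
set.seed(1)
frame <- data.frame(x1 = runif(200, 0, 10), x2 = rnorm(200))

# Four automatically chosen, overlapping ranges of x1
p_auto <- histogram(~ x2 | equal.count(x1, number = 4), data = frame)

# Your own break points: one row per interval, columns are (lower, upper)
brk <- cbind(c(0, 2.5, 5, 7.5), c(2.5, 5, 7.5, 10))
p_manual <- histogram(~ x2 | shingle(x1, intervals = brk), data = frame)

# lattice plots are objects; print() draws them
print(p_manual)
```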
ggplot2 would also work nicely for this.
To put a histogram over a boxplot:
par(mfrow=c(2,1))
hist(frame$x2)
boxplot(frame$x2)
You can also use the layout() command to fine-tune the arrangement.
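For the per-level sums the question asks about, no explicit loop is needed: tapply() applies sum() to x2 within each level of the cut() factor. A sketch with a made-up frame, where breaks = 4 stands in for the unspecified "cutting" argument:

```r
# Hypothetical data frame standing in for `frame`
set.seed(42)
frame <- data.frame(x1 = seq(0.1, 10, by = 0.1), x2 = rpois(100, 5))

# Bin x1 into 4 equal-width intervals
levelSet <- cut(frame$x1, breaks = 4)

# Sum x2 within each bin: tapply() does the per-level loop for you
x2Sum <- tapply(frame$x2, levelSet, sum)

# Boxplot of x1 per bin above a barplot of the per-bin sums of x2
op <- par(mfrow = c(2, 1))
boxplot(frame$x1 ~ levelSet)
barplot(x2Sum, main = "Sum of x2 per level of x1")
par(op)
```

The same pattern works for any summary: swap sum for mean, length, etc.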
