r producing multiple violin plots with one graphic device - r

I have a data frame that looks like this:
bin_with_regard_to_strand CLONE3
31 0.14750872
33 0.52735917
28 0.48559060
. .
. .
I want to use this data frame to generate violin plots in such a way that all of the values in CLONE3 corresponding to a given value of bin_with_regard_to_strand will generate one plot.
Further, I want all of the plots to appear in the same graphic device (I'm using R-studio, and I want all of the plots to appear in one plot window).
Theoretically I could do this with:
vioplot(df$CLONE3[which(df$bin_with_regard_to_strand==1)],
df$CLONE3[which(df$bin_with_regard_to_strand==2)]...)
but since bin_with_regard_to_strand has 60 different values, this seems a bit ridiculous.
I tried using tapply:
tapply(df$CLONE3, df$bin_with_regard_to_strand,vioplot)
But that would open 60 different windows (one for each plot).
I also tried the add parameter:
tapply(df$CLONE3, df$bin_with_regard_to_strand,vioplot(add=TRUE))
but that generated a single plot with the data from all values of bin_with_regard_to_strand (separated by lines).
Is there a way to do this?

You could use par(mfrow=c(rows, columns)) (see ?par for details).
(see also ?layout for more complex arrangements)
d <- lapply(1:6, function(x)runif(100)) # generate some example data
library("vioplot")
par(mfrow=c(3, 2)) # use a 3x2 (rows x columns) layout
lapply(d, vioplot) # call plot for each list element
par(mfrow=c(1, 1)) # reset layout
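Applied to a data frame shaped like the one in the question, the same idea might look like the sketch below. The simulated df (6 bins of 50 values each) and the 2x3 grid are assumptions for illustration; with 60 bins you would pick a larger grid or spread the plots over several pages.

```r
library(vioplot)

# Simulated stand-in for the question's data frame (assumption: 6 bins, 50 values each)
set.seed(1)
df <- data.frame(bin_with_regard_to_strand = rep(1:6, each = 50),
                 CLONE3 = runif(300))

# One numeric vector per bin
groups <- split(df$CLONE3, df$bin_with_regard_to_strand)

# 2x3 grid with small margins; par() returns the old settings so they can be restored
old_par <- par(mfrow = c(2, 3), mar = c(2, 2, 2, 1))
for (g in names(groups)) {
  vioplot(groups[[g]], names = g)  # one violin per panel
  title(main = paste("bin", g))    # panel title added with base graphics
}
par(old_par)
```

Saving the value returned by par() and restoring it afterwards avoids hard-coding the reset to c(1, 1).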

An alternative to mfrow is layout, which is very handy for organizing your plots. You just create a matrix of plot indices. Note that 60 violin plots is a lot for one page; maybe you should split them across 2 pages.
The code below is parameterized by N (the number of plots):
library(vioplot)
N <- 60
par(mar=rep(2,4))
layout(matrix(1:N, nrow=10, byrow=TRUE))
dat <- data.frame(bin_with_regard_to_strand=gl(N,10),CLONE3=rnorm(10*N))
with(dat ,
tapply(CLONE3,bin_with_regard_to_strand ,vioplot))

This is an old question, but I thought I would put out a different solution for getting vioplot to make multiple violin plots on the same graph (i.e. the same axes), rather than in separate panels like the above answers.
Basically, use do.call to apply vioplot to a list of data. Ultimately, vioplot is not very well written (can't even set the title, axis names, etc.). I usually prefer base R, but this is a case where ggplot2 is probably the way to go.
x<-rnorm(1000)
fac<-rep(c(1:10),each=100)
listOfData<-tapply(x,fac,function(x){x},simplify=FALSE)
names(listOfData)[[1]]<-"x" #because vioplot requires a 'x' argument
do.call(vioplot,listOfData)
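For comparison, here is a rough sketch of the ggplot2 route mentioned above, using geom_violin. The simulated data frame and its column names (borrowed from the question) are assumptions for illustration, not the asker's real data.

```r
library(ggplot2)

# Simulated stand-in data: 10 bins of 100 values each (assumption)
set.seed(1)
df <- data.frame(bin_with_regard_to_strand = rep(1:10, each = 100),
                 CLONE3 = rnorm(1000))

# One violin per bin, all on the same axes; factor() makes the bin discrete
p <- ggplot(df, aes(x = factor(bin_with_regard_to_strand), y = CLONE3)) +
  geom_violin() +
  labs(x = "bin_with_regard_to_strand", y = "CLONE3",
       title = "CLONE3 by bin")
print(p)
```

Titles and axis labels are set through labs(), which covers the main complaint about vioplot here.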

Related

How to check if a column has numeric or categorical levels in R?

I am trying to plot 9 barplots in a 3X3 matrix in R using base-R wrapped inside a for loop. (I am working on a workhorse solution for visualizing every column before I begin working on manipulating data) Below is the code:
library(ISLR);
library(ggplot2);
# load wage data
data(Wage)
par(mfrow=c(3,3))
for(i in 1:(dim(Wage)[2]-2)){
plot(Wage[,i],main = paste0(names(Wage)[i]),las = 2)
}
Unfortunately, this doesn't work properly for the first 2 columns, because they are numeric and actually need a histogram. I understand that I need to fit an if-else condition somewhere inside the for() statement, but that is giving me errors. Below is the output, where the first 2 columns are plotted wrong. (Age and year are actually numeric, and I may need to put them on the x-axis instead of defaulting them to y.)
Could you kindly suggest an edit/hack? I also learnt that I can't use par() when wrapping ggplot inside a for loop, so I had to use base R; otherwise ggplot would have been great aesthetically.
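One possible shape for that if-else, as a minimal sketch: check each column with is.numeric() and draw a histogram for numeric columns and a bar plot of counts otherwise. The helper column_kind and the small simulated data frame are hypothetical stand-ins (not the ISLR Wage data), used so the example runs on its own.

```r
# Hypothetical helper: classify a column as "numeric" or "categorical"
column_kind <- function(col) if (is.numeric(col)) "numeric" else "categorical"

# Simulated stand-in for a mixed-type data frame (assumption)
set.seed(1)
dat <- data.frame(
  age = round(runif(100, 18, 80)),
  year = sample(2003:2009, 100, replace = TRUE),
  education = factor(sample(c("HS", "College", "Advanced"), 100, replace = TRUE))
)

old_par <- par(mfrow = c(1, 3))
for (nm in names(dat)) {
  col <- dat[[nm]]
  if (column_kind(col) == "numeric") {
    hist(col, main = nm, xlab = nm)          # numeric: histogram
  } else {
    barplot(table(col), main = nm, las = 2)  # categorical: bar plot of counts
  }
}
par(old_par)
```

The same branch should drop into the original loop over Wage columns, with par(mfrow=c(3,3)) kept as it was.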

How to stop hist from printing all the information into the console

I have a list of 4 data sets and I want to plot histograms based on each element of my list (each data set) into one graph. Let's say I would like to plot histograms of column means for each matrix. Here is a code that I am using:
my.data <- replicate(3, list(replicate(10, rnorm(20))))
lapply(my.data, function(x){hist(colMeans(x))})
I know how to plot multiple graphs on one figure but I don't know how to suppress printing of the histogram information.
As suggested by @Rich Scriven, invisible(lapply(...)) solves the problem. I thought I would re-post the answer from the comments here so that my question does not hang in the air as unanswered.
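Spelled out with the example data from the question, that looks roughly like this (the 1x3 panel layout is an assumption added for illustration):

```r
# Same example data as in the question: a list of three 20x10 matrices
my.data <- replicate(3, list(replicate(10, rnorm(20))))

old_par <- par(mfrow = c(1, 3))
# hist() returns an object describing the histogram; lapply() collects these
# into a list that would be auto-printed. invisible() suppresses that printing.
invisible(lapply(my.data, function(x) hist(colMeans(x))))
par(old_par)
```

The plots are still drawn, because drawing is a side effect of hist(); only the console output goes away.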

R: data.frame to vector

Let me preface this question by saying that I know very little about R. I'm importing a text file into R using read.table("file.txt", T). The text file is in the general format:
header1 header2
a 1
a 4
b 3
b 2
Each a is an observation from a sample and similarly each b is an observation from a different sample. I want to calculate various statistics of the sets of a and b which I'm doing with tapply(header2, header1, mean). That works fine.
Now I need to do some qqnorm plots of a and b and draw a line on each with qqline. I can use tapply(header2, header1, qqnorm) to make quantile plots of each, BUT using tapply(header2, header1, qqline) draws both best-fit lines on the last quantile plot. Programmatically that makes sense, but it doesn't help me.
So my question is: how can I convert the data frame to two vectors (one for all a and one for all b)? Does that make sense? Basically, in the above example, I'd want to end up with two vectors: a=(1,4) and b=(3,2).
Thanks!
Create a function that does both. You won't be able (easily at least) to revert to an old graphics device.
e.g.
with(dd, tapply(header2,header1, function(x) {qqnorm(x); qqline(x)}))
You could use data.table here for coding elegance (and speed)
You can pass the equivalent of a body of a function that is evaluated within the scope of the data.table e.g.
library(data.table)
DT <- data.table(dd)
DT[, {qqnorm(header2)
qqline(header2)}, by=header1]
You don't really want to pollute your global environments with lots of objects (that will be inefficient).
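If you do want the per-sample vectors the question literally asks for, split() returns them as a named list without creating separate global objects. A minimal sketch, assuming a data frame dd with the columns from the question:

```r
# Small stand-in for the data read by read.table (assumption)
dd <- data.frame(header1 = c("a", "a", "b", "b"),
                 header2 = c(1, 4, 3, 2))

# Named list with one vector per sample: vecs$a and vecs$b
vecs <- split(dd$header2, dd$header1)

old_par <- par(mfrow = c(1, length(vecs)))
for (nm in names(vecs)) {
  qqnorm(vecs[[nm]], main = nm)  # quantile plot for this sample
  qqline(vecs[[nm]])             # line drawn on the plot just created
}
par(old_par)
```

Here vecs$a holds the values 1 and 4, and vecs$b holds 3 and 2, matching the vectors described in the question.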

Plotting Multiple Graphs using R

I currently have a dataset which has a format of: (x, y, type)
I've used the code that is found on the example of plotting with Postgres through R.
My question is: How would I get R to generate multiple graphs for each unique "type" column?
I'm new to R, so my apologies if this is something that is extremely easy and I just lack an understanding of loops in R.
So let's say we have this data:
(1,1,T), (1,2,T), (1,3,T), (1,4,T), (1,5,T), (1,6,T),
(1,1,A), (1,2,B), (1,3,B), (1,4,B), (1,5,A), (1,6,A),
(1,1,B), (1,2,B), (1,3,C), (1,4,C), (1,5,C), (1,6,C),
It would plot 4 individual graphs on the page, one for each of the types T, A, B, and C. [Plotting x,y]
How would I do that with R when the data coming in may look like the data above?
While the other post has some good info, there's a faster way to do all that. So assuming your data frame or matrix is called DF and is in the form above (where each (1,2,B) or whatever is a row), then:
by(DF, DF[,3], function(x) plot(x[,1], x[,2], main=unique(x[,3])))
And that's it.
If you'd like all four plots to be on the same page, you can first change the graphing parameter option:
par(mfrow=c(2,2))
And go back to the default with par(mfrow=c(1,1)) when you're done.
I'm quite fond of the ggplot2 package, which does the same thing that user1717913 suggests, but with slightly different syntax (it does a lot of other things very nicely, which is why I like it.)
test <- data.frame(x=rep(1,18),y=rep(1:6,3),type=c("T","T","T","T","T","T","A","B","B","B","A","A","B","B","C","C","C","C"))
require(ggplot2)
ggplot(test, aes(x=x, y=y)) + #define the data that the plot will use, and which variables go where
geom_point() + #plot it with points
facet_wrap(~type) #facet it by the type variable
R is really cool in that there's a bazillion (that's a technical term) different ways to do most things. The way I would do it is to split the data along the groups, and then plot by group.
To do that, the split command is what you want (I'll assume your data is in an object called data):
data.splitted <- split(data, data$type)
Now the data will have this form (let's assume you have 3 types, A, B, and C):
data.splitted
$A
  x y type
  1 4 A
  3 6 A
$B
  x y type
  3 3 B
  2 1 B
$C
  x y type
  4 5 C
  5 2 C
and so on. You would reference the "4" in the y column of group A like so:
data.splitted$A$y[1] or data.splitted[[1]][[2]][1] Hopefully seeing them both together makes enough sense.
Now that we have the data split, we're getting closer.
We still need to tell R that we want to plot a bunch of graphs to the same window. Now, this is just one way to go about it. You could also tell it to write each graph to an image file, or a pdf, or whatever you want.
groups <- names(data.splitted) puts your different types into a variable for reference later.
par(mfcol=c(length(groups),1))
Using mfcol fills the grid of graphs column by column (vertically); the mfrow option fills it row by row (horizontally). The c() just combines input. The length(groups) returns the total number of groups.
Now we can work on the for-loop.
for(i in seq_along(data.splitted)){ # This tells it what i is iterating from and to.
# It can start and stop wherever, or be a
# sequence, ascending or descending,
# the sky is the limit.
tempx <- data.splitted[[i]][["x"]] # This just saves us
tempy <- data.splitted[[i]][["y"]] # a bunch of typing. Note the quoted column names.
plot(tempx, tempy, main=groups[i]) # Plot it and make the title the type.
rm(tempx, tempy) # Remove our temporary variables for the next run through.
}
So you see, it's not too bad when you break it down into its components. You can do pretty much anything this way. I have a project I'm working on right now, where I'm doing this for 18 lidar metrics that I calculated using another for loop.
Commands to read up on:
split, plot, data.frame, "[",
par(mfrow=___) and par(mfcol=___)
Here are a few helpful links to get you started. The most helpful one of all is built right into R, though: a ? followed by a command name will bring up the HTML help for that command in your browser.
Good luck!

performing a calculation with a `paste`d vector reference

So I have some lidar data that I want to calculate some metrics for (I'll attach a link to the data in a comment).
I also have ground plots that I have extracted the lidar points around, so that I have a couple hundred points per plot (19 plots). Each point has X, Y, Z, height above ground, and the associated plot.
I need to calculate a bunch of metrics on the plot level, so I created plotsgrouped with split(plotpts, plotpts$AssocPlot).
So now I have a data frame with a "page" for each plot, so I can calculate all my metrics by the "plot page". This works just dandy for individual plots, but I want to automate it. (yes, I know there's only 19 plots, but it's the principle of it, darn it! :-P)
So far, I've got a for loop going that calculates the metrics and puts the results in a data frame called Results. I pulled the names of the groups into a list called groups as well.
for(i in 1:length(groups)){
Results$Plot[i] <- groups[i]
Results$Mean[i] <- mean(plotsgrouped$PLT01$Z)
Results$Std.Dev.[i] <- sd(plotsgrouped$PLT01$Z)
Results$Max[i] <- max(plotsgrouped$PLT01$Z)
Results$75%Avg.[i] <- mean(plotsgrouped$PLT01$Z[plotsgrouped$PLT01$Z <= quantile(plotsgrouped$PLT01$Z, .75)])
Results$50%Avg.[i] <- mean(plotsgrouped$PLT01$Z[plotsgrouped$PLT01$Z <= quantile(plotsgrouped$PLT01$Z, .50)])
...
and so on.
The problem arises when I try to do something like:
Results$mean[i] <- mean(paste("plotsgrouped", groups[i],"Z", sep="$"))
mean() doesn't recognize the paste as a reference to the vector plotsgrouped$PLT27$Z, and instead fails. I've deduced that it's because it sees the quotes and thinks, "Oh, you're just some text, I can't get the mean of you." or something to that effect.
Btw, groups is a list of the 19 plot names: PLT01-PLT27 (non-consecutive sometimes) and FTWR, so I can't simply put a sequence for the numeric part of the name.
Anyone have an easier way to iterate across my test plots and get arbitrary metrics?
I feel like I have all the right pieces, but just don't know how they go together to give me what I want.
Also, if anyone can come up with a better title for the question, feel free to post it or change it or whatever.
Try with:
for(i in seq_along(groups)) {
Results$Plot[i] <- groups[i] # character names of the groups
tempZ <- plotsgrouped[[groups[i]]][["Z"]] # [[ ]] accepts a name held in a variable; $ does not
Results$Mean[i] <- mean(tempZ)
Results$Std.Dev.[i] <- sd(tempZ)
Results$Max[i] <- max(tempZ)
# names starting with a digit or containing % must be quoted with backticks
Results$`75%Avg.`[i] <- mean(tempZ[tempZ <= quantile(tempZ, .75)])
Results$`50%Avg.`[i] <- mean(tempZ[tempZ <= quantile(tempZ, .50)])
}
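The loop can also be replaced by applying one function to every element of the split list. The sketch below is one way to do that, not the only one; the simulated plotpts data (three plots, 50 points each) and the helper name metrics are assumptions for illustration.

```r
# Simulated stand-in for the lidar points (assumption: 3 plots, 50 points each)
set.seed(42)
plotpts <- data.frame(AssocPlot = rep(c("PLT01", "PLT27", "FTWR"), each = 50),
                      Z = runif(150, 0, 30))
plotsgrouped <- split(plotpts, plotpts$AssocPlot)

# Hypothetical helper: all per-plot metrics for one vector of heights
metrics <- function(z) {
  c(Mean = mean(z),
    Std.Dev. = sd(z),
    Max = max(z),
    `75%Avg.` = mean(z[z <= quantile(z, .75)]),
    `50%Avg.` = mean(z[z <= quantile(z, .50)]))
}

# sapply() gives one column per plot; t() turns that into one row per plot
Results <- as.data.frame(t(sapply(plotsgrouped, function(p) metrics(p$Z))))
Results
```

The plot names end up as the row names of Results, so no separate groups list or index variable is needed.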
