How to adjust x labels in R boxplot - r

This is my code to create a boxplot in R that has 4 boxplots in one.
psnr_x265_256 <- c(39.998,39.998, 40.766, 38.507,38.224,40.666,38.329,40.218,44.746,38.222)
psnr_x264_256 <- c(39.653, 38.106,37.794,36.13,36.808,41.991,36.718,39.26,46.071,36.677)
psnr_xvid_256 <- c(33.04564,33.207269,32.715427,32.104696,30.445141,33.135261,32.669766, 31.657039,31.53103,31.585865)
psnr_mpeg2_256 <- c(32.4198,32.055051,31.424819,30.560274,30.740421,32.484694, 32.512268,32.04659,32.345848, 31)
all_errors = cbind(psnr_x265_256, psnr_x264_256, psnr_xvid_256,psnr_mpeg2_256)
modes = cbind(rep("PSNR",10))
journal_linear_data <-data.frame(psnr_x265_256, psnr_x264_256, psnr_xvid_256,psnr_mpeg2_256)
yvars <- c("psnr_x265_256","psnr_x264_256","psnr_xvid_256","psnr_mpeg2_256")
xvars <- c("x265","x264","xvid","mpeg2")
bmp(filename="boxplot_PSNR_256.bmp")
boxplot(journal_linear_data[,yvars], xlab=xvars, ylab="PSNR")
dev.off()
This is the image I get.
I want to have the corresponding values for each boxplot in x axis "x265","x264","xvid","mpeg2".
Do you have any idea how to fix this?

There are multiple ways of changing the labels for your boxplot variables. Probably the simplest way is changing the column names of your data frame:
colnames(journal_linear_data) <- c("x265","x264","xvid","mpeg2")
Even simpler: you could do this right at the creation of your data frame too:
journal_linear_data <- data.frame(x265=psnr_x265_256, x264=psnr_x264_256, xvid=psnr_xvid_256, mpeg2=psnr_mpeg2_256)
If your labels are not shown or overlap because there is too little space, try rotating the x labels using the las parameter, e.g. las=2 or las=3.
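Another option, if you would rather not rename the columns, is the names argument of boxplot(). The sketch below reuses the data from the question; the axis title "Codec" is only an assumed label, and bp is a name introduced here to hold the return value.

```r
psnr_x265_256 <- c(39.998, 39.998, 40.766, 38.507, 38.224, 40.666, 38.329, 40.218, 44.746, 38.222)
psnr_x264_256 <- c(39.653, 38.106, 37.794, 36.13, 36.808, 41.991, 36.718, 39.26, 46.071, 36.677)
psnr_xvid_256 <- c(33.04564, 33.207269, 32.715427, 32.104696, 30.445141, 33.135261, 32.669766, 31.657039, 31.53103, 31.585865)
psnr_mpeg2_256 <- c(32.4198, 32.055051, 31.424819, 30.560274, 30.740421, 32.484694, 32.512268, 32.04659, 32.345848, 31)
journal_linear_data <- data.frame(psnr_x265_256, psnr_x264_256, psnr_xvid_256, psnr_mpeg2_256)

xvars <- c("x265", "x264", "xvid", "mpeg2")

# names= sets one tick label per box; xlab= is only the title of the whole axis
# ("Codec" is an assumed axis title, not taken from the question)
bp <- boxplot(journal_linear_data, names = xvars, xlab = "Codec", ylab = "PSNR")
```

This also shows why the original call did not work: xlab expects a single axis title, not one label per box.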

Related

Dotchart with secondary axis

I'm trying to produce a dotchart with a secondary axis on top. However once I plot the second dotchart (with a par(new=T)), I can't figure out how not to display the axis ticks over the previous ones in axis side=1. Here's my code with mock data:
y1_i <- c(2,8,2,14,2)
y2_i <- c(15,17,28,22,30)
y1_f <- c(4,9,11,16,7)
y2_f <- c(13,11,16,11,21)
y=c(y1_i,y2_i,y1_f,y2_f)
x <- c("AAEG","AALO","AGAM","ACHR","AALB")
y1=c(y1_i,y1_f)
y2=c(y2_i,y2_f)
dotchart(y1_i,labels=x,xlab="N50 length",xlim = c(0,max(y1)))
par(new=T)
dotchart(y2_i,labels=x,xlim = c(0,max(y2)))
axis(side=3)
Also, if possible, I would like to add a second data set which would be slightly pushed vertically above the first dataset (to not overlap it), but still corresponding to the same y-axis categories.
Thank you for any suggestion :)
Found it, by using dotchart2 from the Hmisc package
library(Hmisc)
y1_i <- c(2,8,2,14,2)
y2_i <- c(15,17,28,22,30)
y1_f <- c(4,9,11,16,7)
y2_f <- c(13,11,16,11,21)
y=c(y1_i,y2_i,y1_f,y2_f)
x <- c("AAEG","AALO","AGAM","ACHR","AALB")
y1=c(y1_i,y1_f)
y2=c(y2_i,y2_f)
dotchart2(y1_i,labels=x,xlab="N50 length",xlim = c(0,max(y1)))
par(new=T)
dotchart2(y2_i,labels=x,xlim = c(0,max(y2)),xlab="Scaffold number",lines=F,xaxis=F)
axis(side=3) # xlab is not an argument of axis(), so it is dropped here
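For the second part of the question (a second data set nudged vertically so it does not overlap the first), here is one possible sketch in plain base graphics rather than dotchart2. It assumes y1_f and y2_f are the second data set for the bottom and top axis respectively. The offset of 0.15, the plotting symbols and the colors are arbitrary choices, and pos, off and op are names introduced for this example.

```r
y1_i <- c(2, 8, 2, 14, 2)
y2_i <- c(15, 17, 28, 22, 30)
y1_f <- c(4, 9, 11, 16, 7)
y2_f <- c(13, 11, 16, 11, 21)
x <- c("AAEG", "AALO", "AGAM", "ACHR", "AALB")

pos <- seq_along(x)   # one row per category
off <- 0.15           # vertical nudge between the two data sets (arbitrary)
yl  <- c(0.5, length(x) + 0.5)

op <- par(mar = c(5, 6, 5, 2))  # extra room for labels and the top axis

# First variable, read against the bottom axis
plot(y1_i, pos - off, xlim = c(0, max(y1_i, y1_f)), ylim = yl,
     yaxt = "n", xlab = "N50 length", ylab = "", pch = 19)
points(y1_f, pos + off, pch = 1)
axis(side = 2, at = pos, labels = x, las = 1)
abline(h = pos, lty = 3, col = "gray")

# Second variable, overlaid with its own x scale and no axes of its own
par(new = TRUE)
plot(y2_i, pos - off, xlim = c(0, max(y2_i, y2_f)), ylim = yl,
     axes = FALSE, xlab = "", ylab = "", pch = 17, col = "red")
points(y2_f, pos + off, pch = 2, col = "red")
axis(side = 3)
mtext("Scaffold number", side = 3, line = 2.5)

par(op)  # restore the previous margins
```

Because the second plot() call uses axes = FALSE, no ticks are drawn over the bottom axis; only axis(side = 3) adds the secondary scale.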

Gantt plot in base r - modifying plot properties

I would like to ask a follow-up question related to the answer given in this post [Gantt style time line plot (in base R) ] on Gantt plots in base r. I feel like this is worth a new question as I think these plots have a broad appeal. I'm also hoping that a new question would attract more attention. I also feel like I need more space than the comments of that question to be specific.
The following code was given by @digEmAll. It takes a dataframe with columns referring to a start time, end time, and grouping variable and turns that into a Gantt plot. I have modified @digEmAll's function very slightly to get the bars/segments in the Gantt plot to be contiguous to one another rather than having a gap. Here it is:
plotGantt <- function(data, res.col='resources',
start.col='start', end.col='end', res.colors=rainbow(30))
{
#slightly enlarge Y axis margin to make space for labels
op <- par('mar')
par(mar = op + c(0,1.2,0,0))
minval <- min(data[,start.col])
maxval <- max(data[,end.col])
res.colors <- rev(res.colors)
resources <- sort(unique(data[,res.col]),decreasing=T)
plot(c(minval,maxval),
c(0.5,length(resources)+0.5),
type='n', xlab='Duration',ylab=NA,yaxt='n' )
axis(side=2,at=1:length(resources),labels=resources,las=1)
for(i in 1:length(resources))
{
yTop <- i+0.5
yBottom <- i-0.5
subset <- data[data[,res.col] == resources[i],]
for(r in 1:nrow(subset))
{
color <- res.colors[((i-1)%%length(res.colors))+1]
start <- subset[r,start.col]
end <- subset[r,end.col]
rect(start,yBottom,end,yTop,col=color)
}
}
par(op) # reset the plotting margins
}
Here are some sample data. You will notice that I have four groups 1-4. However, not all dataframes have all four groups. Some only have two, some only have 3.
mydf1 <- data.frame(startyear=2000:2009, endyear=2001:2010, group=c(1,1,1,1,2,2,2,1,1,1))
mydf2 <- data.frame(startyear=2000:2009, endyear=2001:2010, group=c(1,1,2,2,3,4,3,2,1,1))
mydf3 <- data.frame(startyear=2000:2009, endyear=2001:2010, group=c(4,4,4,4,4,4,3,2,3,3))
mydf4 <- data.frame(startyear=2000:2009, endyear=2001:2010, group=c(1,1,1,2,3,3,3,2,1,1))
Here I run the above function, but specify four colors for plotting:
plotGantt(mydf1, res.col='group', start.col='startyear', end.col='endyear',
res.colors=c('red','orange','yellow','gray99'))
plotGantt(mydf2, res.col='group', start.col='startyear', end.col='endyear',
res.colors=c('red','orange','yellow','gray99'))
plotGantt(mydf3, res.col='group', start.col='startyear', end.col='endyear',
res.colors=c('red','orange','yellow','gray99'))
plotGantt(mydf4, res.col='group', start.col='startyear', end.col='endyear',
res.colors=c('red','orange','yellow','gray99'))
These are the plots:
What I would like to do is modify the function so that:
1) it will plot on the y-axis all four groups regardless of whether they actually appear in the data or not.
2) Have the same color associated with each group for every plot regardless of how many groups there are. As you can see, mydf2 has four groups and all four colors are plotted (1-red, 2-orange, 3-yellow, 4-gray). The same colors are plotted with the same groups for mydf3, as that only contains groups 2,3,4 and the colors are picked in reverse order. However mydf1 and mydf4 have different colors plotted for each group as they do not have any group 4's. Gray is still the first color chosen but now it is used for the lowest plotted group (group2 in mydf1 and group3 in mydf4).
It appears to me that the main thing I need to work on is the vector 'resources' inside the function, and have that not just contain the unique groups but all. When I try manually overriding to make sure it contains all the groups, e.g. doing something as simple as resources <-as.factor(1:4) then I get an error:
'Error in rect(start, yBottom, end, yTop, col = color) : cannot mix zero-length and non-zero-length coordinates'
Presumably the for loop does not know how to plot data that do not exist for groups that don't exist.
I hope that this is a replicable/readable question and it's clear what I'm trying to do.
EDIT: I realize that to solve the color problem, I could just specify the colors for the 3 groups that exist in each of these sample dfs. However, my intention is to use this plot as an output to a function whereby it wouldn't be known ahead of time if all of the groups exist for a particular df.
I slightly modified your function to account for NA in start and end dates:
plotGantt <- function(data, res.col='resources',
start.col='start', end.col='end', res.colors=rainbow(30))
{
#slightly enlarge Y axis margin to make space for labels
op <- par('mar')
par(mar = op + c(0,1.2,0,0))
minval <- min(data[,start.col],na.rm=T)
maxval <- max(data[,end.col],na.rm=T)
res.colors <- rev(res.colors)
resources <- sort(unique(data[,res.col]),decreasing=T)
plot(c(minval,maxval),
c(0.5,length(resources)+0.5),
type='n', xlab='Duration',ylab=NA,yaxt='n' )
axis(side=2,at=1:length(resources),labels=resources,las=1)
for(i in 1:length(resources))
{
yTop <- i+0.5
yBottom <- i-0.5
subset <- data[data[,res.col] == resources[i],]
for(r in 1:nrow(subset))
{
color <- res.colors[((i-1)%%length(res.colors))+1]
start <- subset[r,start.col]
end <- subset[r,end.col]
rect(start,yBottom,end,yTop,col=color)
}
}
par(mar=op) # reset the plotting margins
invisible()
}
In this way, if you simply append all your possible group values to your data you'll get them printed on the y axis. e.g. :
mydf1 <- data.frame(startyear=2000:2009, endyear=2001:2010,
group=c(1,1,1,1,2,2,2,1,1,1))
# add all the group values you want to print with NA dates
mydf1 <- rbind(mydf1,data.frame(startyear=NA,endyear=NA,group=1:4))
plotGantt(mydf1, res.col='group', start.col='startyear', end.col='endyear',
res.colors=c('red','orange','yellow','gray99'))
About the colors, at the moment the ordered res.colors are applied to the sorted groups; so the 1st color in res.colors is applied to 1st (sorted) group and so on...
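If you want each group to keep its color no matter which groups are present, one option is to look the color up by group value instead of by position. The following is only a sketch of that idea, not the function from the original post: plotGanttFixed, all.groups and col.map are hypothetical names, and it assumes you know the full set of possible groups in advance.

```r
plotGanttFixed <- function(data, all.groups, res.colors,
                           res.col = "resources",
                           start.col = "start", end.col = "end") {
  stopifnot(length(res.colors) == length(all.groups))
  # One fixed color per group value, looked up by name rather than by position
  col.map <- setNames(res.colors, as.character(all.groups))
  # Same layout as the original: highest group at the bottom, group 1 on top
  groups <- sort(all.groups, decreasing = TRUE)

  op <- par(mar = par("mar") + c(0, 1.2, 0, 0))
  on.exit(par(op))  # reset the plotting margins when the function exits

  plot(range(c(data[[start.col]], data[[end.col]]), na.rm = TRUE),
       c(0.5, length(groups) + 0.5),
       type = "n", xlab = "Duration", ylab = "", yaxt = "n")
  axis(side = 2, at = seq_along(groups), labels = groups, las = 1)

  for (i in seq_along(groups)) {
    rows <- data[data[[res.col]] == groups[i], , drop = FALSE]
    # Groups absent from the data keep their row but draw no bars
    if (nrow(rows) > 0) {
      rect(rows[[start.col]], i - 0.5, rows[[end.col]], i + 0.5,
           col = col.map[[as.character(groups[i])]])
    }
  }
  invisible(col.map)
}

mydf1 <- data.frame(startyear = 2000:2009, endyear = 2001:2010,
                    group = c(1, 1, 1, 1, 2, 2, 2, 1, 1, 1))
cm <- plotGanttFixed(mydf1, all.groups = 1:4,
                     res.colors = c("red", "orange", "yellow", "gray99"),
                     res.col = "group", start.col = "startyear",
                     end.col = "endyear")
```

With this version group 1 is red and group 2 is orange in every plot, and groups 3 and 4 still appear on the y axis as empty rows, so there is no need to append NA rows to the data.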

Assigning "beanplot" object to variable in R

I have found that the beanplot is the best way to represent my data. I want to look at multiple beanplots together to visualize my data. Each of my plots contains 3 variables, so each one looks something like what would be generated by this code:
library(beanplot)
a <- rnorm(100)
b <- rnorm(100)
c <- rnorm(100)
beanplot(a, b ,c ,ylim = c(-4, 4), main = "Beanplot",
col = c("#CAB2D6", "#33A02C", "#B2DF8A"), border = "#CAB2D6")
(Would have just included an image but my reputation score is not high enough, sorry)
I have 421 of these that I want to put into one long PDF (EDIT: One plot per page is fine, this was just poor wording on my part). The approach I have taken was to first generate the beanplots in a for loop and store them in a list at each iteration. Then I will use the multiplot function (from the R Cookbook page on multiplot) to display all of my plots on one long column so I can begin my analysis.
The problem is that the beanplot function does not appear to be set up to assign plot objects as a variable. Example:
library(beanplot)
a <- rnorm(100)
b <- rnorm(100)
plot1 <- beanplot(a, b, ylim = c(-5,5), main = "Beanplot",
col = c("#CAB2D6", "#33A02C", "#B2DF8A"), border = "#CAB2D6")
plot1
If you then type plot1 into the R console, you will get back two of the plot parameters but not the plot itself. This means that when I store the plots in the list, I am unable to graph them with multiplot. It will simply return the plot parameters and a blank plot.
This behavior does not seem to be the case with qplot for example which will return a plot when you recall the stored plot. Example:
library(ggplot2)
a <- rnorm(100)
b <- rnorm(100)
plot2 <- qplot(a,b)
plot2
There is no equivalent to the beanplot that I know of in ggplot. Is there some sort of workaround I can use for this issue?
Thank you.
You can simply open a PDF device with pdf() and keep the default parameter onefile=TRUE. Then call all your beanplot()s, one after the other, and close the device with dev.off(). They will all be in one PDF document, each one on a separate page.
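A minimal sketch of that loop is below. The data are random stand-ins (5 plots instead of 421), and datasets, out_file and use_bean are names made up for the example. The boxplot() fallback is there only so the sketch still runs when the beanplot package is not installed.

```r
set.seed(1)
n_plots <- 5  # stand-in for the 421 plots in the question
datasets <- lapply(seq_len(n_plots), function(i) {
  list(a = rnorm(100), b = rnorm(100), c = rnorm(100))
})

use_bean <- requireNamespace("beanplot", quietly = TRUE)
out_file <- file.path(tempdir(), "beanplots.pdf")

pdf(out_file, onefile = TRUE)  # onefile = TRUE is the default
for (i in seq_along(datasets)) {
  d <- datasets[[i]]
  if (use_bean) {
    # Each high-level plot call starts a new page in the PDF
    beanplot::beanplot(d$a, d$b, d$c, ylim = c(-4, 4),
                       main = paste("Beanplot", i),
                       col = c("#CAB2D6", "#33A02C", "#B2DF8A"),
                       border = "#CAB2D6")
  } else {
    boxplot(d, ylim = c(-4, 4), main = paste("Plot", i))
  }
}
dev.off()  # the file is only complete after the device is closed
```

There is no need to store the plots in a list first: base-graphics functions such as beanplot() draw directly on the open device instead of returning a plot object.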

Densityplots using colwise - different colors for each line?

I need a plot of different density lines, each in a different color. This is an example code (but much smaller), using the built-in data.frame USArrests. I hope it is ok to use it?
library(plyr) # provides colwise()
colors <- heat.colors(3)
plot(density(USArrests[,2], bw=1, kernel="epanechnikov", na.rm=TRUE),col=colors[1])
lines1E <- function(x)lines(density(x,bw=1,kernel="epanechnikov",na.rm=TRUE))
lines1EUSA <- colwise(lines1E)(USArrests[,3:4])
Currently the code with colwise() produces just one color. How can I get each line in a different color? Or is there a better way to plot several density lines with different colors?
I don't quite follow your example, so I've created my own example data set. First, create a matrix with three columns:
m = matrix(rnorm(60), ncol=3)
Then plot the density of the first column:
plot(density(m[,1]), col=2)
Using your lines1E function as a template:
lines1E = function(x) {lines(density(x))}
We can add multiple curves to the plot:
colwise(lines1E)(as.data.frame(m[ ,2:3]))
Personally, I would just use:
##Added in NA for illustration
m = matrix(rnorm(60), ncol=3)
m[1,] = NA
plot(density(m[,1], na.rm=T))
sapply(2:ncol(m), function(i) lines(density(m[,i], na.rm=T), col=i))
to get:
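Applied to the USArrests columns from the question, the same idea could look like the sketch below. It keeps the question's bw=1 and Epanechnikov kernel; the shared axis limits, the title and the legend are additions made for the example, and dens, xr and ymax are names introduced here.

```r
cols <- 2:4  # Assault, UrbanPop, Rape
colors <- heat.colors(length(cols))

# One density estimate per column, with the settings used in the question
dens <- lapply(USArrests[, cols], density,
               bw = 1, kernel = "epanechnikov", na.rm = TRUE)

# Common limits so that no curve is cut off
xr   <- range(sapply(dens, function(d) range(d$x)))
ymax <- max(sapply(dens, function(d) max(d$y)))

plot(xr, c(0, ymax), type = "n", xlab = "Value", ylab = "Density",
     main = "USArrests densities")
for (k in seq_along(dens)) {
  lines(dens[[k]], col = colors[k])  # the color is picked by index
}
legend("topright", legend = names(dens), col = colors, lty = 1)
```

Looping over an index is what makes the per-line color possible; colwise() only passes the column values to the function, so the function cannot tell which column it is drawing.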

Stacked barplot is opposite order to legend?

A minor question about plotting stacked barplot in R.
The stacked bars represent the series bottom-to-top.
But the legend always shows the series top-to-bottom. I think that is also true with ggplot2::geom_bar
Is there any nicer idiom than using rev(...) twice inside either legend() or barplot() as in:
exports <- data.frame(100*rbind('Americas'=runif(6),'Asia'=runif(6),'Other'=runif(6)))
colnames(exports) <- 2004:2009
series_we_want <- c(1,2,3)
barplot( as.matrix(exports[series_we_want,]), col=mycolors, ...)
legend(x="topleft", legend=rev(rownames(exports)[series_we_want]), col=rev(mycolors) ...)
(If you omit one of the rev()'s the output is obviously meaningless. This seems like a case for an enhancement: adding a single flag such as yflip=TRUE or yreverse=TRUE.)
This is what I got using your code:
exports <- data.frame(100*rbind('Americas'=runif(6),'Asia'=runif(6),'Other'=runif(6)))
colnames(exports) <- 2004:2009
series_we_want <- c(1,2,3)
barplot( as.matrix(exports[series_we_want,]))
legend(x="topleft", legend=rev(rownames(exports)[series_we_want]))
try this:
exports <- data.frame(100*rbind('Americas'=runif(6),'Asia'=runif(6),'Other'=runif(6)))
colnames(exports) <- 2004:2009
series_we_want <- c(1,2,3)
test_data<-as.matrix(exports[series_we_want,]) # note the comma: select rows, not columns
barplot( test_data,
legend.text=as.character(rev(rownames(exports)[series_we_want])),
args.legend = list(x="topleft"))
This seems to produce the legend in the opposite order of what you have.
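As far as I can tell, barplot() already reverses the legend entries for vertical stacked bars when it builds the legend itself, so the legend reads top-to-bottom in the same order as the stack, with the colors matched, and no rev() is needed. Here is a small sketch under that assumption; mycolors is a made-up palette and the extra ylim headroom is only there to keep the legend clear of the bars.

```r
set.seed(42)
exports <- 100 * rbind(Americas = runif(6), Asia = runif(6), Other = runif(6))
colnames(exports) <- 2004:2009

mycolors <- c("steelblue", "orange", "darkgreen")  # hypothetical palette

# legend.text = TRUE uses the row names of the matrix as legend labels
mids <- barplot(exports, col = mycolors,
                ylim = c(0, 1.4 * max(colSums(exports))),
                legend.text = TRUE,
                args.legend = list(x = "topleft"))
```

The topmost legend entry is then "Other", which is also the topmost segment of each bar.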
