R: multiple boxplots in one graph with values (quantiles)

How can I create multiple boxplots in R with the values shown on the plot?
Right now I'm using this code:
boxplot(Data_frame[ ,2] ~ Data_frame[ ,3])
I tried adding this:
boxplot(Data_frame[ ,2] ~ Data_frame[ ,3])
text(y=fivenum(Data_frame$x), labels=fivenum(Data_frame$x), x=1.25)
But only the first boxplot gets values. How can I show the values for all boxplots in one graph?
Thank you so much!

As far as I understand your question (it is not clear how the fivenum summary should be displayed) here is one solution. It presents the summary using the top axis.
x <- data.frame(
Time = c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3),
Value = c(5,10,15,20,30,50,70,80,100,5,7,9,11,15,17,19,17,19,100,200,300,400,500,700,1000,200))
boxplot(x$Value ~ x$Time)
fivenums <- aggregate(x$Value, by=list(Time=x$Time), FUN=fivenum)
labels <- apply(fivenums[,-1], 1, function(x) paste(x, collapse = ", "))
axis(3, at=fivenums[,1],labels=labels, las=1, col.axis="red")
Of course, you can also adjust the font size or rotation of this summary. You can also insert a line break into each label so that it takes up less width.
Edit
To get what you posted in the comment below, you can add
text(x = 3 + 0.5, y = fivenums[3,-1], labels=fivenums[3,-1])
This labels the third boxplot with its values; however, it won't be readable for the other boxplots.
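If the goal is to label every box rather than just one, something along these lines might work. It is only a sketch: it relies on boxplot() invisibly returning the statistics it draws (bp$stats, one column per box). Those are the whisker ends and hinges, so they can differ from fivenum()'s minimum and maximum when a group has outliers. The log = "y" scale, the extra xlim padding and the 0.45 offset are my own assumptions, chosen so the labels do not collide for this example data.

```r
pdf(NULL)  # draw to a null device so the sketch also runs non-interactively

x <- data.frame(
  Time  = rep(1:3, times = c(9, 9, 8)),
  Value = c(5, 10, 15, 20, 30, 50, 70, 80, 100,
            5, 7, 9, 11, 15, 17, 19, 17, 19,
            100, 200, 300, 400, 500, 700, 1000, 200))

# boxplot() returns the drawn statistics: a 5 x (number of boxes) matrix
bp <- boxplot(Value ~ Time, data = x, log = "y", xlim = c(0.5, 3.9))

# col(bp$stats) is the box index of every statistic, so a single text() call
# labels all boxes; the 0.45 shift puts each label to the right of its box
text(x = col(bp$stats) + 0.45, y = bp$stats, labels = bp$stats, cex = 0.7)

invisible(dev.off())
```

Without the log scale the third group dominates the y-axis and the labels of the first two boxes overlap, which is the readability problem mentioned above.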

Related

abline will not put line in correct position

I am quite new to programming/R and I'm having a very unusual problem. I've made a scatterplot and I would simply like to draw the x and y axes through 0 on the plot. However, when I use abline the lines are slightly off. I managed to get them to 0 using trial and error, but that makes plotting other lines impossible.
library('car')
scatterplot(cost~qaly, reg.line=FALSE, smooth=FALSE, spread=FALSE,
boxplots='xy', span=0.5, xlab="QALY", ylab="COST", main="Bootstrap",
cex=0.5, data=scat2, xlim=c(-.05,.05), grid=FALSE)
abline(v = 0, h = 0)
This gives lines which are slightly to the left and below 0.
here is an image of what this returns:
(I can't post an image since I'm new apparently)
I found that these values put the lines on 0:
abline(v=0.003)
abline(h=3000)
Thanks in advance for the help!
Using @Laterow's example to reproduce the issue:
require(car)
set.seed(10)
x <- rnorm(1000); y <- rnorm(1000)
scatterplot(y ~ x)
abline(v=0, h=0)
scatterplot seems to be resetting the par settings on exit. You can roughly check this by calling locator(1) and clicking on some point, e.g., for {-3,-3} I get
# $x
# [1] -2.469414
#
# $y
# [1] -2.223922
Option 1
As @joran points out, reset.par = FALSE is the easiest way
scatterplot(y ~ x, reset.par = FALSE)
abline(v=0, h=0)
Option 2
In ?scatterplot, it says that ... is passed to plot meaning you can use plot's very useful panel.first and panel.last arguments (among others).
scatterplot(y ~ x, panel.first = {grid(); abline(v = 0)}, grid = FALSE)
Note that if you were to do the basic
scatterplot(y ~ x, panel.first = abline(v = 0))
you would be unable to see the line because the default scatterplot grid covers it up, so you can turn that off, plot a grid first then do the abline.
You could also do the abline in panel.last, but this would be on top of your points, so maybe not as desirable.
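The par-reset effect can be illustrated without car. The following is only a sketch: plot_then_reset is a hypothetical stand-in for any function that changes par() while drawing and restores it on exit; it is not how scatterplot is actually implemented. It compares the plot region in effect while drawing with the one left behind afterwards.

```r
pdf(NULL)  # null device, so this also runs non-interactively
set.seed(10)
x <- rnorm(100); y <- rnorm(100)

# Hypothetical stand-in: widens the margins while drawing and, by default,
# restores the previous par() settings on exit
plot_then_reset <- function(reset.par = TRUE) {
  op <- par(mar = c(5, 8, 8, 2))
  if (reset.par) on.exit(par(op))
  plot(x, y)
  par("plt")  # plot region (as a fraction of the figure) used for drawing
}

plt_drawn <- plot_then_reset(reset.par = TRUE)
plt_after <- par("plt")  # no longer the region the points were drawn in

plt_kept <- plot_then_reset(reset.par = FALSE)
plt_now  <- par("plt")   # still the region the points were drawn in
abline(v = 0, h = 0)     # so these lines pass through the plotted origin

invisible(dev.off())
```

When the region changes after drawing, later calls such as abline map user coordinates onto a different part of the device, which would explain lines that look shifted.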

Heatmap like plot with Lattice

I cannot figure out how the lattice levelplot works. I have played with it for some time now, but could not find a reasonable solution.
Sample data:
Data <- data.frame(x=seq(0,20,1),y=runif(21,0,1))
Data.mat <- data.matrix(Data)
Plot with levelplot:
rgb.palette <- colorRampPalette(c("darkgreen","yellow", "red"), space = "rgb")
levelplot(Data.mat, main="", xlab="Time", ylab="", col.regions=rgb.palette(100),
cuts=100, at=seq(0,1,0.1), ylim=c(0,2), scales=list(y=list(at=NULL)))
This is the outcome:
Since I do not understand how levelplot really works, I cannot make it work. What I would like is for the colour strips to fill the whole window at the corresponding x (Time).
An alternative solution using another method would also be welcome.
Basically, I'm trying to plot increasing risk over time, where red is the highest risk (= 1). I would like to visualize the sequence of possible increases or clustering of risk over time.
From ?levelplot we're told that if the first argument is a matrix then "'x' provides the 'z' vector described above, while its rows and columns are interpreted as the 'x' and 'y' vectors respectively.", so
> m = Data.mat[, 2, drop=FALSE]
> dim(m)
[1] 21 1
> levelplot(m)
plots a levelplot with 21 columns and 1 row, where the levels are determined by the values in m. The formula interface might look like
> df <- data.frame(x=1, y=1:21, z=runif(21))
> levelplot(z ~ y + x, df)
(these approaches do not quite result in the same image).
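For the strip layout asked about in the question, the formula interface could be used roughly as follows. This is a sketch under my own assumptions: the constant strip column is a hypothetical helper that puts all cells in one row, and aspect = "fill" plus hiding the y-axis are choices I made so the cells fill the panel.

```r
library(lattice)
pdf(NULL)  # null device, so this also runs non-interactively
set.seed(1)

Data <- data.frame(x = 0:20, y = runif(21))
Data$strip <- 1  # hypothetical constant column: a single row of cells

rgb.palette <- colorRampPalette(c("darkgreen", "yellow", "red"), space = "rgb")

p <- levelplot(y ~ x * strip, data = Data,
               at = seq(0, 1, length.out = 101),  # 100 colour intervals on [0, 1]
               col.regions = rgb.palette(100),
               aspect = "fill",                   # let the cells fill the panel
               xlab = "Time", ylab = "",
               scales = list(y = list(draw = FALSE)))
print(p)

invisible(dev.off())
```

Here y is the colour value (the risk), x is time, and each time step gets one full-height cell.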
Unfortunately I don't know much about lattice, but I noted your "Alternative solution with other method", so may I suggest another possibility:
library(plotrix)
color2D.matplot(t(Data[ , 2]), show.legend = TRUE, extremes = c("yellow", "red"))
Heaps of things to do to make it prettier. Still, a start. Of course it is important to consider the breaks in your time variable. In this very simple attempt, regular intervals are implicitly assumed, which happens to be the case in your example.
Update
Following the advice in the 'Details' section in ?color2D.matplot: "The user will have to adjust the plot device dimensions to get regular squares or hexagons, especially when the matrix is not square". Admittedly, this is quite an ugly solution.
windows(width = 10, height = 2.5)  # Windows only; dev.new() is the portable equivalent
par(mar = c(5.1, 4.1, 0, 2.1))     # set margins after opening the device they apply to
color2D.matplot(t(Data[ , 2]),
show.legend = TRUE,
axes = TRUE,
xlab = "",
ylab = "",
extremes = c("yellow", "red"))

superpose a histogram and an xyplot

I'd like to superpose a histogram and an xyplot representing the cumulative distribution function using r's lattice package.
I've tried to accomplish this with custom panel functions, but can't seem to get it right--I'm getting hung up on one plot being univariate and one being bivariate I think.
Here's an example with the two plots I want stacked vertically:
set.seed(1)
x <- rnorm(100, 0, 1)
discrete.cdf <- function(x, decreasing=FALSE){
x <- x[order(x, decreasing=decreasing)]
result <- data.frame(rank=1:length(x),x=x)
result$cdf <- result$rank/nrow(result)
return(result)
}
my.df <- discrete.cdf(x)
chart.hist <- histogram(~x, data=my.df, xlab="")
chart.cdf <- xyplot(100*cdf~x, data=my.df, type="s",
ylab="Cumulative Percent of Total")
graphics.off()
trellis.device(width = 6, height = 8)
print(chart.hist, split = c(1,1,1,2), more = TRUE)
print(chart.cdf, split = c(1,2,1,2))
I'd like these superposed in the same frame, rather than stacked.
The following code doesn't work, nor do any of the simple variations of it that I have tried:
xyplot(cdf~x,data=cdf,
panel=function(...){
panel.xyplot(...)
panel.histogram(~x)
})
You were on the right track with your custom panel function. The trick is passing the correct arguments to the panel.* functions. For panel.histogram, this means not passing a formula and supplying an appropriate value to the breaks argument:
EDIT Proper percent values on y-axis and type of plots
xyplot(100*cdf~x,data=my.df,
panel=function(...){
panel.histogram(..., breaks = do.breaks(range(x), nint = 8),
type = "percent")
panel.xyplot(..., type = "s")
})
This answer is just a placeholder until a better answer comes.
The hist() function from the graphics package has an option called add. The following does what you want in the "classical" way:
plot( my.df$x, my.df$cdf * 100, type= "l" )
hist( my.df$x, add= TRUE )
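One thing to watch with the base approach is that hist() draws counts while the CDF is in percent. A possible way around that, sketched here with base graphics only (not lattice), is to compute the histogram without plotting it, convert the counts to percent, and draw both on the same 0 to 100 axis. ecdf() stands in for the hand-written discrete.cdf; the grey fill and axis labels are my own choices.

```r
pdf(NULL)  # null device, so this also runs non-interactively
set.seed(1)
x <- rnorm(100, 0, 1)

h   <- hist(x, plot = FALSE)           # bins only, nothing drawn yet
pct <- 100 * h$counts / sum(h$counts)  # bar heights as percent of total
Fn  <- ecdf(x)                         # empirical CDF as a step function

# Empty frame with a shared percent axis, then bars, then the CDF steps
plot(h$mids, pct, type = "n", xlim = range(h$breaks), ylim = c(0, 100),
     xlab = "x", ylab = "Percent of Total")
rect(h$breaks[-length(h$breaks)], 0, h$breaks[-1], pct, col = "grey85")
lines(sort(x), 100 * Fn(sort(x)), type = "s")

invisible(dev.off())
```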

Error bar ticks |--o--| don't draw for more than three conditions in dotplot

This question is an unexpected follow-up to "Draw vertical ending of error bar line in dotplot". While that question was successfully resolved, there is a caveat: when I introduce more than three conditions to the dotplot, it doesn't draw the vertical ticks |--o--| at the ends of the error bars.
As @Josh suggested in the comments, I inserted browser() as the first line of the function that draws the updated panel.Dotplot to see what goes wrong, but it didn't reveal anything that helps me solve it. Here is example code for a four-condition Dotplot() with the updated panel.Dotplot function that doesn't work. It works if you decrease the number of conditions (see the answer to the question quoted above):
require(Hmisc)
#Fake conditions
mean = c(1:18)
lo = mean-0.2
up = mean+0.2
name = c("a","b","c")
cond1 = c("A","B","C")
cond2 = c(rep("E1",9),rep("E2",9))
d = data.frame (name = rep(name,6), mean, lo, up,
cond1=rep(cond1,each=3,times=2), cond2)
# Create the customized panel function
mypanel.Dotplot <- function(x, y, ...) {
panel.Dotplot(x,y,...)
tips <- attr(x, "other")
panel.arrows(x0 = tips[,1], y0 = y,x1 = tips[,2],
y1 = y,length = 0.1, unit = "native",
angle = 90, code = 3)
}
#Draw Dotplot - `panel.Dotplot` doesn't change anything
setTrellis()
Dotplot(name ~ Cbind(mean,lo,up) | cond1 * cond2, data=d, ylab="", xlab="",col=1,
panel = mypanel.Dotplot)
The error bars are in fact being rendered, but are not visible due to their very short length (± 0.2 units). Increasing the error to ± 1 results in the following (I've also increased the length specified in panel.arrows - i.e. the error bar cap length - to 0.5):
If your true data is so precise relative to the range of x-values then you might want to consider smaller points (so they aren't as prone to obscuring the error bars) or a layout that exaggerates the x axis. For example, the following uses your original error of ± 0.2 units, and your original arrow cap length of 0.1:
Dotplot(name ~ Cbind(mean,lo,up) | cond1 * cond2, data=d, ylab="", xlab="",
col=1, panel = mypanel.Dotplot, pch=20, cex=0.4, layout=c(1, 6), strip=FALSE,
strip.left=strip.custom(par.strip.text=list(cex=0.75), bg=0, fg=0))
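The same idea (small points, short caps) can be tried without Hmisc. The following is a lattice-only sketch, not the Dotplot code above: the data frame is a cut-down, made-up version of d, and lo and up are passed through xyplot to a hypothetical custom panel function that draws the capped bars with panel.arrows before the points.

```r
library(lattice)
pdf(NULL)  # null device, so this also runs non-interactively

# Made-up data: three names in two conditions, error of +/- 0.2 units
d <- data.frame(name = rep(c("a", "b", "c"), 2),
                mean = 1:6,
                cond = rep(c("A", "B"), each = 3))
d$lo <- d$mean - 0.2
d$up <- d$mean + 0.2

p <- xyplot(factor(name) ~ mean | cond, data = d, lo = d$lo, up = d$up,
            panel = function(x, y, subscripts, lo, up, ...) {
              # capped error bars first: angle = 90 and code = 3 give |--|
              panel.arrows(lo[subscripts], y, up[subscripts], y,
                           angle = 90, code = 3, length = 0.05)
              # small points on top so they do not hide the short bars
              panel.xyplot(x, y, pch = 20, cex = 0.4, ...)
            })
print(p)

invisible(dev.off())
```

The subscripts argument picks out the rows of lo and up that belong to the current panel.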

lattice or latticeExtra combine multiple plots different yscaling (log10 and non-transformed)

I have a multi-variable time series where some of the variables have rather large ranges. I wish to make a single-page plot with multiple stacked plots, one per variable, where some of the variables have a log10 y-axis scaling. I am relatively new to lattice and have not been able to figure out how to effectively mix the log10 scaling with non-transformed axes and get a publication-quality plot. If print.trellis is used, the plots are not aligned and the padding needs some work; if c.trellis is used, the layout is good, but the y-scaling from only one plot is used. Any suggestions for an efficient solution, where I can replicate the output of c.trellis using the different y-scaling for each (original) object?
Example below:
require(lattice)
require(latticeExtra)
# make data.frame
d.date <- as.POSIXct(c("2009-12-15", "2010-01-15", "2010-02-15", "2010-03-15", "2010-04-15"))
CO2dat <- c(100,200,1000,9000,2000)
pHdat <- c(10,9,7,6,7)
tmp <- data.frame(date=d.date ,CO2dat=CO2dat ,pHdat=pHdat)
# make plots
plot1 <- xyplot(pHdat ~ date, data=tmp
, ylim=c(5,11)
, ylab="pHdat"
, xlab="Date"
, origin = 0, border = 0
, scales=list(y=list(alternating=1))
, panel = function(...){
panel.xyarea(...)
panel.xyplot(...)
}
)
# make plot with log y scale
plot2 <- xyplot(CO2dat ~ date, data=tmp
, ylim=c(10,10^4)
, ylab="CO2dat"
, xlab="Date"
, origin = 0, border = 0
, scales=list(y=list(alternating=1,log=10))
, yscale.components = yscale.components.log10ticks
, panel = function(...){
panel.xyarea(...)
panel.xyplot(...)
# plot CO2air uatm
panel.abline(h=log10(390),col="blue",type="l",...)
}
)
# plot individual figures using split
print(plot2, split=c(1,1,1,2), more=TRUE)
print(plot1, split=c(1,2,1,2), more=F)
# combine plots (more convenient)
comb <- c(plot1, plot2, x.same=F, y.same=F, layout = c(1, 2))
# plot combined figure
update(comb, ylab = c("pHdat","log10 CO2dat"))
Using @joran's idea, I can get the axes to be closer but not exact; also, reducing the padding gets the plots closer together but changes the aspect ratio. In the picture below I've reduced the padding, perhaps by too much, to show the inexactness; if the plots were wanted this close together, you'd clearly want to remove the x-axis labels on the top as well.
I looked into the code that sets up the layout, and the margin on the left side is calculated from the width of the labels, so @joran's idea is probably the only thing that will work when printing with split, unless one were to rewrite the plot.trellis command. Perhaps the c method could work, but I haven't yet found a way to set the scale components separately depending on the panel. That does seem more promising though.
mtheme <- standard.theme("pdf")
mtheme$layout.heights$bottom.padding <- -10
plot1b <- update(plot1, scales=list(y=list(alternating=1, at=5:10, labels=paste(" ",c(5:10)))))
plot2b <- update(plot2, par.settings=mtheme)
pdf(file="temp.pdf")
print(plot2b, split=c(1,1,1,2), more=TRUE)
print(plot1b, split=c(1,2,1,2), more=F)
dev.off()  # close the pdf device so temp.pdf is written out
