Display groups with different borders in histogram with panel.superpose - r

This answer shows how to use groups and panel.superpose to display overlapping histograms in the same panel, assigning different colors to each histogram. In addition, I want to give each histogram a different border color. (This will allow me to display one histogram as solid bars without a border, overlaid with a transparent, all-border histogram. The example below is a little different for the sake of clarity.)
Although it's possible to use border= to use different border colors in the plot, they are not assigned to groups as fill colors are with col=. If you give border= a sequence of colors, it seems to cycle through them one bar at a time. If the two histograms overlap, the effect is a bit silly (see below).
Is there a way to give each group a specific border color?
# This illustrates the problem: Assignment of border colors to bars ignores grouping:
# make some data
foo.df <- data.frame(x=c(rnorm(10),rnorm(10)+2), cat=c(rep("A", 10),rep("B", 10)))
# plot it
histogram(~ x, groups=cat, data=foo.df, ylim=c(0,75), breaks=seq(-3, 5, 0.5), lwd=2,
panel=function(...)panel.superpose(..., panel.groups=panel.histogram,
col=c("transparent", "cyan"),
border=c(rep("black", 3), rep("red", 3))))
Note that you can't just count how many bars there are in each group and provide those numbers to rep in the border setting. If the two histograms overlap, at least one of the histograms will use two border colors.
(It's the panel.superpose code that places the groups on the same panel and that assigns the colors. I don't have a deep understanding of it.)

panel.histogram() doesn't have a formal groups= argument, and if you examine its code, you'll see that it handles any supplied groups= argument differently, and in a less standard way, than the panel.*() functions that do. The upshot of that design decision is that (as you've found) it's not generally easy to pass it vectors of graphical parameters specifying per-group appearance.
As a workaround, I'd suggest using latticeExtra's +() and as.layer() functions to overlay a number of separate histogram() plots, one for each group. Here's how you might do that:
library(lattice)
library(latticeExtra)
## Split your data by group into separate data.frames
foo.df <- data.frame(x=c(rnorm(10),rnorm(10)+2), cat=c(rep("A", 10),rep("B", 10)))
foo.A <- subset(foo.df, cat=="A")
foo.B <- subset(foo.df, cat=="B")
## Use calls to `+ as.layer()` to layer each group's histogram onto previous ones
histogram(~ x, data=foo.A, ylim=c(0,75), breaks=seq(-3, 5, 0.5),
lwd=2, col="transparent", border="black") +
as.layer(
histogram(~ x, data=foo.B, ylim=c(0,75), breaks=seq(-3, 5, 0.5),
lwd=2, col="cyan", border="red")
)
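With more than two groups, the same layering can be written once and folded over the groups. The following is only a sketch under that assumption: the vectors fills and borders and the helper make_hist are names introduced here for illustration, and the seed is fixed just so that every simulated value falls inside the breaks.

```r
library(lattice)
library(latticeExtra)

set.seed(1)  # fixed seed so every value falls inside the breaks below
foo.df <- data.frame(x = c(rnorm(10), rnorm(10) + 2),
                     cat = c(rep("A", 10), rep("B", 10)))

# Hypothetical per-group settings, keyed by group name
fills   <- c(A = "transparent", B = "cyan")
borders <- c(A = "black",       B = "red")

# One histogram per group, all sharing the same breaks and y-limits
make_hist <- function(d, fill, border) {
  histogram(~ x, data = d, ylim = c(0, 75), breaks = seq(-3, 5, 0.5),
            lwd = 2, col = fill, border = border)
}
groups <- split(foo.df, foo.df$cat)
layers <- Map(make_hist, groups, fills[names(groups)], borders[names(groups)])

# Fold the list: the first plot is the base, the rest are added as layers
combined <- Reduce(function(base, extra) base + as.layer(extra), layers)
print(combined)
```

Because every layer is built with the same breaks and ylim, the bars of the different groups stay comparable.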

Related

Issues with colour in plots [duplicate]

I am making a scatter plot of two variables and would like to colour the points by a factor variable. Here is some reproducible code:
data <- iris
plot(data$Sepal.Length, data$Sepal.Width, col=data$Species)
This is all well and good but how do I know what factor has been coloured what colour??
data<-iris
plot(data$Sepal.Length, data$Sepal.Width, col=data$Species)
legend(7,4.3,levels(data$Species),col=1:nlevels(data$Species),pch=1) # one colour per factor level
should do it for you. But I prefer ggplot2 and would suggest that for better graphics in R.
The palette() command tells you the colours and their order when col = somefactor. It can also be used to set the colours. (The output below is from R before version 4.0.0, which changed the default palette.)
palette()
[1] "black" "red" "green3" "blue" "cyan" "magenta" "yellow" "gray"
In order to see that in your graph you could use a legend.
legend('topright', legend = levels(iris$Species), col = 1:3, cex = 0.8, pch = 1)
You'll notice that I only specified the new colours with 3 numbers. This will work like using a factor. I could have used the factor originally used to colour the points as well. This would make everything logically flow together... but I just wanted to show you can use a variety of things.
You could also be specific about the colours. Try ?rainbow for starters and go from there. You can specify your own or have R do it for you. As long as you use the same method for each you're OK.
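The factor-to-colour mapping described above can be checked directly. This is a small sketch rather than anything definitive; colour_key is a name made up for the illustration, and the colour strings it prints depend on your R version's default palette.

```r
# Default palette: integer colour codes index into this vector
pal <- palette()

# A factor passed as `col` is used through its integer codes 1, 2, 3, ...
species_levels <- levels(iris$Species)
colour_key <- data.frame(
  level  = species_levels,
  code   = seq_along(species_levels),
  colour = pal[seq_along(species_levels)],
  stringsAsFactors = FALSE
)
print(colour_key)

# The same codes drive both the points and the legend
plot(iris$Sepal.Length, iris$Sepal.Width, col = as.integer(iris$Species))
legend("topright", legend = colour_key$level, col = colour_key$code, pch = 1)
```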
Like Maiasaura, I prefer ggplot2. The transparent reference manual is one of the reasons.
However, this is one quick way to get it done.
require(ggplot2)
data(diamonds)
qplot(carat, price, data = diamonds, colour = color)
# example taken from Hadley's ggplot2 book
And cause someone famous said, plot related posts are not complete without the plot, here's the result:
Here's a couple of references:
qplot.R example,
note basically this uses the same diamond dataset I use, but crops the data before to get better performance.
http://ggplot2.org/book/
the manual: http://docs.ggplot2.org/current/
There are two ways that I know of to color plot points by factor and then also have a corresponding legend automatically generated. I'll give examples of both:
Using ggplot2 (generally easier)
Using R's built-in plotting functionality in combination with the colorRampPalette function (trickier, but many people prefer/need R's built-in plotting facilities)
For both examples, I will use the ggplot2 diamonds dataset. We'll be using the numeric columns diamonds$carat and diamonds$price, and the factor/categorical column diamonds$color. You can load the dataset with the following code if you have ggplot2 installed:
library(ggplot2)
data(diamonds)
Using ggplot2 and qplot
It's a one liner. Key item here is to give qplot the factor you want to color by as the color argument. qplot will make a legend for you by default.
qplot(
x = carat,
y = price,
data = diamonds,
color = color # color by the factor column named color (I know, confusing)
)
Your output should look like this:
Using R's built in plot functionality
Using R's built in plot functionality to get a plot colored by a factor and an associated legend is a 4-step process, and it's a little more technical than using ggplot2.
First, we will make a colorRampPalette function. colorRampPalette() returns a new function that will generate a vector of colors. In the snippet below, calling color_pallet_function(5) would return a vector of 5 colors on a scale from red to orange to blue:
color_pallet_function <- colorRampPalette(
colors = c("red", "orange", "blue"),
space = "Lab" # Option used when colors do not represent a quantitative scale
)
Second, we need to make a list of colors, with exactly one color per diamond color. This is the mapping we will use both to assign colors to individual plot points, and to create our legend.
num_colors <- nlevels(diamonds$color)
diamond_color_colors <- color_pallet_function(num_colors)
Third, we create our plot. This is done just like any other plot you've likely done, except we index the list of colors we made by the factor and pass the result as our col argument. As long as we always use this same list, our mapping between colors and the levels of diamonds$color will be consistent across our R script.
plot(
x = diamonds$carat,
y = diamonds$price,
xlab = "Carat",
ylab = "Price",
pch = 20, # solid dots increase the readability of this data plot
col = diamond_color_colors[diamonds$color]
)
Fourth and finally, we add our legend so that someone reading our graph can clearly see the mapping between the plot point colors and the actual diamond colors.
legend(
x ="topleft",
legend = paste("Color", levels(diamonds$color)), # for readability of legend
col = diamond_color_colors,
pch = 19, # solid circle, like pch=20 but slightly larger
cex = .7 # scale the legend to look attractively sized
)
Your output should look like this:
Nifty, right?
The col argument in the plot function assigns colors automatically to a vector of integers. If you convert iris$Species to numeric, you get a vector of 1s, 2s and 3s, so you can apply this as:
plot(iris$Sepal.Length, iris$Sepal.Width, col=as.numeric(iris$Species))
Suppose you want red, blue and green instead of the default colors, then you can simply adjust it:
plot(iris$Sepal.Length, iris$Sepal.Width, col=c('red', 'blue', 'green')[as.numeric(iris$Species)])
You can probably see how to further modify the code above to get any unique combination of colors.
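One possible variation, sketched here as an assumption rather than as part of the answer above, is to key the colours by level name instead of by position, so the mapping does not depend on the order of the factor levels. The vector species_cols and its colour choices are illustrative only.

```r
# Hypothetical colour choice, keyed by level name rather than by position
species_cols <- c(setosa = "red", versicolor = "blue", virginica = "green")

# Look up each observation's colour by its species name
point_cols <- unname(species_cols[as.character(iris$Species)])

plot(iris$Sepal.Length, iris$Sepal.Width, col = point_cols, pch = 1)
legend("topright", legend = names(species_cols), col = species_cols, pch = 1)
```

The legend reuses the same named vector, so points and key cannot drift apart.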
The lattice library is another good option. Here I've added a legend on the right side and jittered the points because some of them overlapped.
xyplot(Sepal.Width ~ Sepal.Length, groups=Species, data=iris,
auto.key=list(space="right"),
jitter.x=TRUE, jitter.y=TRUE)

Rescaling colors palette in r

In R I have a cloud of data around zero and some data around 1. I want to "rescale" my heat colors to distinguish the lower numbers. This has to be done in a continuous, rainbow-like way; I don't want "discrete colors". I tried using breaks in image.plot, but it doesn't work.
image.plot(X,Y,as.matrix(mymatrix),col=heat.colors(800),asp=1,scale="none")
I tried :
lowerbreak=seq(min(values),quantile2,len=80)
highbreak=seq(quantile2+0.0000000001,max(values),len=20)
brks=c(lowerbreak,highbreak) # 'break' is a reserved word in R, so it cannot be a variable name
ii <- cut(values, breaks = brks,
include.lowest = TRUE)
colors <- colorRampPalette(c("lightblue", "blue"))(99)[ii]
Here's an approach using the "squash" library. With makecmap(), you specify your colour values and breaks, and you can also specify that it should be log stretched using the base parameter. It's a bit complex, but gives you granular control. I use it to colorize skewed data, where I need more definition in the "low end".
To achieve the rainbow palette, I used the built-in "jet" colour function, but you can use any colour set - I give an example for creating a greyscale ramp with "colorRampPalette".
Whatever ramp you use, it will take some playing with the base value to optimize for your data.
install.packages("squash")
library("squash")
#choose your colour thresholds - outliers will be RED
minval=0 #lowest value to get a colour
maxval=2.0 #highest value to get a colour
n.cols=100 #how many colours do you want in your palette?
col.int=1/n.cols
#create your palette
colramp=makecmap(x=seq(minval,maxval,col.int),
n=n.cols,
breaks=prettyLog,
symm=F,
base=10,#to give ramp a log(base) stretch
colFn=jet,
col.na="red",
right=F,
include.lowest=T)
# If you don't like the colFn options in "makecmap", define your own!
# Here's an example in greyscale; pass this to "colFn" above
user.colfn=colorRampPalette(c("black","white"))
Example for using colramp in a plot (assuming you've already created colramp as above somewhere in your program):
varx=1:100
vary=1:100
plot(varx,vary,col=colramp$colors) #colors is the 2nd vector in the colramp list
To select specific colours, subset from the list via, e.g., colors[1:20] (if you try this with the example above, the first colors will repeat 5 times - not really useful but you get the logic and can play around).
In my case, I had a grid of values that I wanted to turn into a coloured raster image (i.e. colour mapping some continuous data). Here's example code for that, using a made up matrix:
#create a "dummy matrix"
matx=matrix(data=c(rep(2,50),rep(0,500),rep(0.5,500),rep(1,500),rep(1.5,500)),nrow=50,ncol=41,byrow=F)
#transpose the matrix
# the output of "savemat" is rotated 90 degrees to the left
# so savemat(maty) will be a colorized version of (matx)
maty=t(matx)
#savemat creates an image using colramp
savemat(x=maty,
filename="/Users/KeeganSmith/Desktop/matx.png",
map=colramp,
outlier="red",
dev="png",
do.dev.off=T)
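For comparison, the quantile-and-cut idea attempted in the question can be done in base R alone. This is a rough sketch on simulated data, not a replacement for the squash approach above; values, brks, pal and bin are names assumed for the illustration.

```r
set.seed(42)
# Simulated stand-in for the data: a dense cloud near 0 and a few values near 1
values <- c(abs(rnorm(900, sd = 0.02)), rnorm(100, mean = 1, sd = 0.05))

# Breaks at evenly spaced quantiles: narrow bins where the data are dense
brks <- unique(quantile(values, probs = seq(0, 1, length.out = 101)))

# One colour per bin, then look up the bin of every value
pal  <- heat.colors(length(brks) - 1)
bin  <- cut(values, breaks = brks, include.lowest = TRUE, labels = FALSE)
cols <- pal[bin]

plot(values, col = cols, pch = 20)
```

With about 100 bins the ramp looks close to continuous, while most of the colour range is spent on the crowded low values.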
When using colorRampPalette, you can set the bias argument to emphasise low (or high) values.
Something like colorRampPalette(heat.colors(100),bias=3) will focus the 'ramp' on the lower values, helping them to be more visually distinguishable. (colorRampPalette returns a function, so call the result with the number of colours you want.)
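To see what bias does, it can help to draw a default ramp and a biased ramp next to each other. The sketch below assumes 20 output colours purely for display; pal_even and pal_low are illustrative names.

```r
base_cols <- heat.colors(100)

# bias only has an effect when there are more than two input colours
pal_even <- colorRampPalette(base_cols)(20)
pal_low  <- colorRampPalette(base_cols, bias = 3)(20)

# Draw each ramp as a strip of 20 coloured cells
op <- par(mfrow = c(2, 1), mar = c(1, 1, 2, 1))
image(1:20, 1, as.matrix(1:20), col = pal_even, axes = FALSE,
      xlab = "", ylab = "", main = "bias = 1 (default)")
image(1:20, 1, as.matrix(1:20), col = pal_low, axes = FALSE,
      xlab = "", ylab = "", main = "bias = 3")
par(op)
```

In the second strip the colour should change fastest at the low end, which is what makes small values easier to tell apart.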

How to superimpose a histogram on each panel

I would like to superimpose, on each lattice histogram panel, an additional histogram (which will be the same in each panel). I want the overlaid histogram to have solid borders but empty fill (col), to allow comparison with the underlying histograms.
That is, the end result will be a series of panels, each with a different colored histogram, and each with the same extra outline histogram on top of the colored histogram.
Here's something that I tried, but it just produces empty panels:
foo.df <- data.frame(x=rnorm(40), categ=c(rep("A", 20), rep("B", 20)))
bar.df <- data.frame(x=rnorm(20))
histogram(~ x | categ, data=foo.df,
panel=function(...){histogram(...);
histogram(~ x, data=bar.df, col=NULL)})
(My guess is that I need to use panel.superpose, but this function is somewhat confusing. Sarkar's book doesn't explain how to use it, and the R help page has no examples. I'm finding it difficult to make sense of the panel.superpose help page without already having a basic understanding. There are a very small number of examples that I've found on the web, but I have been unable to figure out what aspects of those examples apply to my case. This answer is surely relevant, but I don't understand its use of panel.groups, and the example overlays three different groups from a single dataframe, whereas I want to repeatedly overlay the same data on multiple panels that also have different data .)
I continued working on this problem, and came up with an answer. I had been on the right track but got several crucial details wrong. Comments in the code below spell out important points.
# Main data, which will be displayed as solid histograms, different in each panel:
foo.df <- data.frame(y=rnorm(40), cat=c(rep("A", 20), rep("B", 20)))
# Comparison data: This will be displayed as an outline histogram in each panel:
bar.df <- data.frame(y=rnorm(30)-2)
# Define some vectors that we'll use in the histogram call.
# These have to be adjusted for the data by trial and error.
# Usually, panel.histogram will figure out reasonable default values for these.
# However, the two calls to panel.histogram below may figure out different values,
# producing pairs of histograms that aren't comparable.
bks <- seq(-5,3,0.5) # breaks that define the bar bins
yl <- c(0,50) # height of plot
# The key is to coordinate breaks in the two panel.histogram calls below.
# The first one inherits the breaks from the top-level call through '...' .
# Using "..." in the second call generates an error, so I specify parameters explicitly.
# It's not necessary to specify type="percent" at the top level, since that's the default,
# but it is necessary to specify it in the second panel.histogram call.
histogram(~ y | cat, data=foo.df, ylim=yl, breaks=bks, type="percent", border="cyan",
panel=function(...){panel.histogram(...)
panel.histogram(x=bar.df$y, col="transparent",
type="percent", breaks=bks)})
# col="transparent" is what makes the second set of bars into outlines.
# In the first set of bars, I set the border color to be the same as the value of col
# (cyan by default) rather than using border="transparent" because otherwise a filled
# bar with the same number of points as an outline bar will be slightly smaller.
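Instead of finding the breaks by trial and error, they could be derived from the pooled data so that both panel.histogram calls are guaranteed to cover every value. This is a sketch under the assumption that a fixed bin width of 0.5 is wanted; all_y and bin_width are names introduced here, and ylim would still need to be set by hand.

```r
set.seed(1)
foo.df <- data.frame(y = rnorm(40), cat = c(rep("A", 20), rep("B", 20)))
bar.df <- data.frame(y = rnorm(30) - 2)

# Pool both samples so one set of breaks covers every value
all_y <- c(foo.df$y, bar.df$y)
bin_width <- 0.5  # assumed bin width, matching the example above

# Round the range outwards to whole numbers, then step by the bin width
bks <- seq(floor(min(all_y)), ceiling(max(all_y)), by = bin_width)
print(bks)
```

The resulting bks can be passed as breaks= in both places where the example above uses it.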

R lattice barchart: How to write the total sum on each bar in multiple panels?

I have a lattice bar chart with multiple panels and I would like to add the sum of each bar on top of the bars (e.g. (70) on top of the first bar on the top left, (20) on the second one, (150) on the third one etc.).
There is a similar question here but I could not find a way to adapt that code for my plot. Unlike in that example, what I would like to do is to add the 'total sum' of men and women on top of each bar vertical bar. I also could not label them separately using ltext as shown here. Any suggestion, using ltext or any other way, would be very helpful.
civ1<-c("Single","Single","Marr","Marr","Single","Single","Marr","Marr","Single","Single","Marr","Marr","Single","Single","Marr","Marr")
Sex<-rep(c("women","men"),8)
Year<-rep(c(rep(1990,4),rep(2000,4)),2)
Type1<-c(rep("Traditional",8),rep("Dual-earner",8))
Earn1<-c(seq(10, 160, by = 10))
df<-data.frame(civ1=factor(civ1),Sex=factor(Sex),Year=factor(Year),Type1=factor(Type1),Earn1=Earn1) # build the frame directly so Earn1 stays numeric and the rest are factors
my.key<-list(space="bottom",text=list(c("Women","Men"),col=c("black","black")), columns=2,points=T,pch=15,col=c("darkgray","lightgray"),cex=0.8)
labels=c("70","20","150","110")
print(figure1<-barchart(Earn1~civ1|Year+Type1,df,groups=Sex, ylim=c(0,350),horizontal=F,col=c("darkgray","lightgray"),cex=0.8,ylab="Earnings",stack=T,layout=c(2,2),key=my.key,
par.settings = list(strip.background=list(col=c("white","lightyellow")),
panel=function(x,y,subscripts...){
panel.grid(h=-1,v=0)
panel.barchart(...)
ltext(1,200, labels[subscripts]) #not working!
})))
I see several problems. First, your panel= parameter is inside your par.settings parameter which is incorrect. It should be passed to barchart directly. Then you have some syntax problems with a missing comma and I'm not sure how your labels were intended to work with only 4 values. Anyway, the following code should work.
barchart(
Earn1~civ1|Year+Type1,df,
groups=Sex,
ylim=c(0,350), cex=0.8, ylab="Earnings",
horizontal=F, stack=T, layout=c(2,2),
col=c("darkgray","lightgray"),
key=my.key,
par.settings = list(strip.background=list(col=c("white","lightyellow"))),
panel=function(x,y,subscripts,...){
panel.grid(h=-1,v=0)
panel.barchart(x,y,subscripts=subscripts,...)
t <- aggregate(y~x, data.frame(x,y), FUN=sum)
panel.text(t$x,t$y, labels=t$y, pos=3)
}
)
Aside from fixing the problems described above, I've used aggregate() to calculate the total for each column and used those values to plot the text labels at the appropriate spot. The resulting plot is below.
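To make the aggregate() step concrete, here is what it computes for a single panel, outside of lattice. The values are the four rows of the example data for Year 1990 and Type "Traditional"; totals is just an illustrative name.

```r
# Values for one panel (Year 1990, Type "Traditional") of the example data
x <- factor(c("Single", "Single", "Marr", "Marr"))
y <- c(10, 20, 30, 40)   # stacked segments: women, men, women, men

# Sum the segments within each bar; this gives both label height and label text
totals <- aggregate(y ~ x, data.frame(x, y), FUN = sum)
print(totals)
```

Inside the panel function the same call runs once per panel, so each panel gets its own totals.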

Combine / overlay different types of graphs with "lattice" and "latticeExtra"

I'm trying to combine or that is to say overlay a barchart with a xyplot (with regression line) with two variables whose values are quite different.
Here's my data: https://www.dropbox.com/s/aacbkmo577uagjs/example.csv
There are the two numeric variables "rb" and "rae" and several factor variables (sample.size, effect.size, allocation.design, true.dose) that are to be displayed in panels according to the code below. The variable "rae" should be displayed in a barchart (ideally in faint colors in the background), whereas the variable "rb" is to be displayed in a xyplot with a regression line. There are two main questions:
(1) How to combine / overlay both types of graphs?
(2) How to customize axes labels (different scales for y-scale)
For (1), I know how to combine different types of graphs with ggplot2, but it should be also possible with lattice, am I right? I tried "doubleYScale", which doesn't seem to work.
For (2), I only accomplished to use "relation='free'" for the y-scale in the "scales"-option (see code). This is nice since the focus is on the important range of the values. However, it would be more appropriate if axes-labels are additionally drawn on the left and right outside (for "rae" and "rb", respectively).
Here's the code so far (modified by Dieter Menne to be self-contained)
library(lattice)
library(latticeExtra)
df.dose <- read.table("example.csv", header=TRUE, sep=",")
df.dose <- transform(df.dose,
sample.size=as.factor(sample.size),
true.dose = as.factor(true.dose))
rae.plot <- xyplot(
rae ~ sample.size | allocation.design*true.dose,
df.dose, as.table=TRUE,
groups = type,
lty = 1, jitter.x=TRUE,
main="RAE",
scales=list(y=list(draw=F, relation="free", tck=.5)),
panel = function(x,y) {
panel.xyplot(x,y,jitter.x=TRUE)
panel.lmline(x,y, col="darkgrey", lwd=1.5)
})
useOuterStrips(rae.plot)
rb.plot <- barchart(
rb ~ sample.size | allocation.design*as.factor(true.dose),
df.dose, as.table=TRUE,
groups = type,
key=list(
text=list(levels(as.factor(df.dose$type)))),
scales=list(y=list(draw=F, relation="free", tck=.5)),
main="RB")
useOuterStrips(rb.plot)
print(useOuterStrips(rae.plot), split=c(1,1,1,2),more=TRUE)
print(useOuterStrips(rb.plot), split=c(1,2,1,2), more=FALSE)
will print both on one page; it's easier than in ggplot2.
scales=list(
y=list(alternating=1,tck=c(1,0)),
x=list(alternating=1,tck=c(1,0)))
xyplot (... scales=scales)
