How to use the pairs function combined with layout in R? - r

I am trying to add a simple legend to a customized pairs graph.
Here is the reproducible code (without my customized pairs function):
layout(cbind(1,2),width=c(1,1))
layout.show(2)
pairs(USJudgeRatings)
Why is the pairs function "erasing" my layout information?

A warning contained in the help for layout is
These functions are totally incompatible with the other mechanisms for arranging plots on a device: par(mfrow), par(mfcol)
Unfortunately, pairs uses mfrow for arranging the plots.
Using the hints from Duncan Murdoch and Uwe Ligges on R-help, you can set oma to a reasonable value to leave room for a legend on the side, e.g.:
pairs(iris[1:4], main = "Anderson's Iris Data -- 3 species",
pch = 21, bg = c("red", "green3", "blue")[iris$Species],
oma=c(4,4,6,12))
# allow plotting of the legend outside the figure region
# (ie within the space left by making the margins big)
par(xpd=TRUE)
legend(0.85, 0.7, as.vector(unique(iris$Species)),
fill=c("red", "green3", "blue"))

Related

Issues with colour in plots [duplicate]

I am making a scatter plot of two variables and would like to colour the points by a factor variable. Here is some reproducible code:
data <- iris
plot(data$Sepal.Length, data$Sepal.Width, col=data$Species)
This is all well and good, but how do I know which factor level has been given which colour?
data<-iris
plot(data$Sepal.Length, data$Sepal.Width, col=data$Species)
legend(7,4.3,levels(data$Species),col=1:nlevels(data$Species),pch=1) # one colour per factor level
should do it for you. But I prefer ggplot2 and would suggest that for better graphics in R.
The palette() function tells you the colours and their order when col = somefactor. It can also be used to set the colours.
palette()
[1] "black" "red" "green3" "blue" "cyan" "magenta" "yellow" "gray"
In order to see that in your graph you could use a legend.
legend('topright', legend = levels(iris$Species), col = 1:3, cex = 0.8, pch = 1)
You'll notice that I only specified the new colours with 3 numbers. This will work like using a factor. I could have used the factor originally used to colour the points as well. This would make everything logically flow together... but I just wanted to show you can use a variety of things.
You could also be specific about the colours. Try ?rainbow for starters and go from there. You can specify your own or have R do it for you. As long as you use the same method for each you're OK.
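As a minimal sketch of "use the same method for each": generate one colour per level once, then reuse that vector for both the points and the legend. rainbow() is only one possible choice here, and the names species and cols are just for illustration.

```r
species <- iris$Species

# One colour per factor level (rainbow is an arbitrary choice of generator)
cols <- rainbow(nlevels(species))

# Indexing by the factor uses its integer codes, so level i gets cols[i]
plot(iris$Sepal.Length, iris$Sepal.Width, col = cols[species], pch = 1,
     xlab = "Sepal length", ylab = "Sepal width")

# The legend reuses the same vector, in level order
legend("topright", legend = levels(species), col = cols, pch = 1, cex = 0.8)
```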
Like Maiasaura, I prefer ggplot2. The transparent reference manual is one of the reasons.
However, this is one quick way to get it done.
require(ggplot2)
data(diamonds)
qplot(carat, price, data = diamonds, colour = color)
# example taken from Hadley's ggplot2 book
And because, as someone famous said, plot-related posts are not complete without the plot, here's the result:
Here's a couple of references:
qplot.R example,
note basically this uses the same diamond dataset I use, but crops the data before to get better performance.
http://ggplot2.org/book/
the manual: http://docs.ggplot2.org/current/
There are two ways that I know of to color plot points by factor and then also have a corresponding legend automatically generated. I'll give examples of both:
Using ggplot2 (generally easier)
Using R's built-in plotting functionality in combination with the colorRampPalette function (trickier, but many people prefer/need R's built-in plotting facilities)
For both examples, I will use the ggplot2 diamonds dataset. We'll be using the numeric columns diamonds$carat and diamonds$price, and the factor/categorical column diamonds$color. You can load the dataset with the following code if you have ggplot2 installed:
library(ggplot2)
data(diamonds)
Using ggplot2 and qplot
It's a one liner. Key item here is to give qplot the factor you want to color by as the color argument. qplot will make a legend for you by default.
qplot(
x = carat,
y = price,
data = diamonds,
color = diamonds$color # color by factor color (I know, confusing)
)
Your output should look like this:
Using R's built in plot functionality
Using R's built in plot functionality to get a plot colored by a factor and an associated legend is a 4-step process, and it's a little more technical than using ggplot2.
First, we will make a colorRampPalette function. colorRampPalette() returns a new function that will generate a vector of colors. In the snippet below, calling color_pallete_function(5) would return a vector of 5 colors on a scale from red to orange to blue:
color_pallete_function <- colorRampPalette(
colors = c("red", "orange", "blue"),
space = "Lab" # Option used when colors do not represent a quantitative scale
)
Second, we need to make a list of colors, with exactly one color per diamond color. This is the mapping we will use both to assign colors to individual plot points, and to create our legend.
num_colors <- nlevels(diamonds$color)
diamond_color_colors <- color_pallete_function(num_colors)
Third, we create our plot. This is done just like any other plot you've likely done, except we index the vector of colors we made by the factor and pass the result as our col argument. As long as we always use this same vector, our mapping between colors and the levels of diamonds$color will be consistent across our R script.
plot(
x = diamonds$carat,
y = diamonds$price,
xlab = "Carat",
ylab = "Price",
pch = 20, # solid dots increase the readability of this data plot
col = diamond_color_colors[diamonds$color]
)
Fourth and finally, we add our legend so that someone reading our graph can clearly see the mapping between the plot point colors and the actual diamond colors.
legend(
x ="topleft",
legend = paste("Color", levels(diamonds$color)), # for readability of legend
col = diamond_color_colors,
pch = 19, # solid circle like pch=20, just larger
cex = .7 # scale the legend to look attractively sized
)
Your output should look like this:
Nifty, right?
The col argument in the plot function maps a vector of integers to colors automatically. If you convert iris$Species to numeric, you get a vector of 1s, 2s and 3s, so you can apply this as:
plot(iris$Sepal.Length, iris$Sepal.Width, col=as.numeric(iris$Species))
Suppose you want red, blue and green instead of the default colors, then you can simply adjust it:
plot(iris$Sepal.Length, iris$Sepal.Width, col=c('red', 'blue', 'green')[as.numeric(iris$Species)])
You can probably see how to further modify the code above to get any unique combination of colors.
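One variation on the indexing trick above, as a rough sketch: a named colour vector keyed by level name, so the mapping does not depend on the order of the factor levels. The names species_cols and point_cols are made up for this illustration.

```r
# Hypothetical mapping from level name to colour
species_cols <- c(setosa = "red", versicolor = "blue", virginica = "green")

# Look up each point's colour by the species name rather than the integer code
point_cols <- species_cols[as.character(iris$Species)]

plot(iris$Sepal.Length, iris$Sepal.Width, col = point_cols,
     xlab = "Sepal length", ylab = "Sepal width")

# The legend is built from the same named vector, so it cannot drift out of sync
legend("topright", legend = names(species_cols), col = species_cols, pch = 1)
```

Because the points and the legend both read from species_cols, changing a colour in one place updates both.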
The lattice library is another good option. Here I've added a legend on the right side and jittered the points because some of them overlapped.
library(lattice)
xyplot(Sepal.Width ~ Sepal.Length, groups=Species, data=iris,
auto.key=list(space="right"),
jitter.x=TRUE, jitter.y=TRUE)

add multiple legends to a heatmap in R

I want to add legends for my categorical variables to the graph, but when I used "topright" it did not work for me (see the image below). Basically, my idea is to fill the blank area on the side of the heatmap with the legends for my categorical variables. My code looks like this:
heatmap.3(performance, Colv =NA,RowSideColors=row_annotation,col=my_palette)
par(lend = 1) # square line ends for the color legend
legend("topright", # location of the legend on the heatmap plot
legend = c("category1", "category2", "category3"), # category labels
col = c("gray", "blue", "black"), # color key
lty= 1, # line style
lwd = 10 # line width
)
I also want to put multiple legends on the plot, but I don't know how to specify their positions using x and y, as there are no coordinates in my plot.
Thank you so much!
A simple fix to your problem would be to add the option inset=# after your "topright" specification, where # is a number (or a pair of numbers for x and y) giving the distance from the margins as a fraction of the plot region. This controls where the legend is placed relative to your graph.
Instead of using the keyword position "topright", you may want to restructure your code around explicit positions. For instance, on devices that support them, xpos and ypos set where the device window opens (note that this positions the window, not the legend):
dev.new(xpos=#,ypos=#)
Or, if the plot were drawn with ggplot2 rather than base graphics, you could consider the theme setting:
legend.position=c(#,#)
Try these options even though you say you have no access to coordinates, which is rare for graphical utilities in R, but may well be a future edit to heatmap.3. You could also use locator(1) to point and click with your mouse where you want the legend to appear, for example legend(locator(1), ...).
In general, the legend function is called as:
legend(x, y = NULL, legend, ...)
If you have any more questions about the legend function in R, type help(legend) at the R command line (in RStudio, for instance, if you use that).
To address your question on multiple legends, please consult: Plotting multiple legends
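A rough sketch of the inset idea with two legends, using a plain image() plot as a stand-in because heatmap.3 is not available here. The margin size and the inset value of -0.35 are guesses that would need tuning, and with heatmap functions that arrange their panels via layout() the keyword positions may refer to the last panel drawn rather than the whole figure.

```r
set.seed(1)
m <- matrix(rnorm(60), nrow = 10)          # stand-in data, not the asker's

# Widen the right margin and allow drawing outside the plot region
op <- par(mar = c(5, 4, 4, 9), xpd = TRUE)
image(t(m), axes = FALSE)
usr <- par("usr")

# A negative x inset pushes the legend out into the right margin
leg1 <- legend("topright", inset = c(-0.35, 0), title = "Annotation 1",
               legend = c("category1", "category2", "category3"),
               col = c("gray", "blue", "black"), lty = 1, lwd = 10, cex = 0.8)

# A second legend anchored at a different corner
leg2 <- legend("bottomright", inset = c(-0.35, 0), title = "Annotation 2",
               legend = c("low", "high"), fill = c("white", "red"), cex = 0.8)
par(op)
```

Each legend() call is independent, so any number of legends can be stacked this way by choosing different keywords or inset values.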

R-plot a centered legend at outer margins of multiple plots

I want to plot a centered legend outside of the plotting area in a device with multiple plots. There have been many questions (with slight variations) on SO about changing the position of the legend in an R plot.
For example:
1) R - Common title and legend for combined plots
2) Common legend for multiple plots in R
3) Plot a legend outside of the plotting area in base graphics?
etc.
What I understood from the above questions is that I have to set the option xpd = T or xpd = NA to plot legends in the outer margins. However, when I try this, it somehow does not work for me:
par(mfrow=c(1,2),oma=c(0,3,0,0),xpd=TRUE)
plot(c(5,10),col=c("red","blue"),pch=20,cex=2,bty="n",xlab="",ylab="")
barplot(c(5,10),col=c("red","blue"))
mtext(text="My two plots",side=3,cex=2,outer=TRUE,line=-3)
legend("top",legend=c("A", "B"),fill=c("red","blue"),ncol=2,xpd=NA,bty="n") # Option 1
legend(x=0.01,y=11,legend=c("A", "B"),fill=c("red","blue"),ncol=2,xpd=TRUE,bty="n") # Option 2
Now my question is: how exactly does xpd work? I am unable to figure out why the legend is not placed outside the plot area with xpd=T.
I apologize in advance if some consider this as a duplicate of the above questions !!
Help is much appreciated
Ashwin
Option #1 is likely the route you should take, with xpd=NA. It does not automatically place the legend in the outer margins, but it allows you to place the legend anywhere you want. So, for example, you could use this code to place the legend at the top of the page, approximately centered.
legend(x=-1.6, y=11.6, legend=c("A", "B"), fill=c("red", "blue"), ncol=2, xpd=NA, bty="n")
I chose these x and y values by trial and error. But, you could define a function that overlays a single (invisible) plot on top of the ones you created. Then you can use legend("top", ...). For example
reset <- function() {
par(mfrow=c(1, 1), oma=rep(0, 4), mar=rep(0, 4), new=TRUE)
plot(0:1, 0:1, type="n", xlab="", ylab="", axes=FALSE)
}
reset()
legend("top", legend=c("A", "B"), fill=c("red", "blue"), ncol=2, bty="n")
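On how xpd works: according to ?par, xpd=FALSE clips drawing to the plot region, xpd=TRUE to the figure region, and xpd=NA to the whole device. It only controls clipping and does not move anything. As an alternative to trial and error, here is a sketch that converts a device-relative position into the user coordinates of the current plot with grconvertX() and grconvertY(). The 0.98 vertical position and the variable names are arbitrary choices for illustration.

```r
par(mfrow = c(1, 2), oma = c(0, 0, 3, 0))
plot(c(5, 10), col = c("red", "blue"), pch = 20, cex = 2, xlab = "", ylab = "")
barplot(c(5, 10), col = c("red", "blue"))

# Device centre (0.5) and a point near the top (0.98) in normalized device
# coordinates, expressed in the user coordinates of the last plot drawn
x_mid <- grconvertX(0.5, from = "ndc", to = "user")
y_top <- grconvertY(0.98, from = "ndc", to = "user")

# xjust = 0.5 centres the legend on x_mid; xpd = NA lifts the clipping
legend(x_mid, y_top, legend = c("A", "B"), fill = c("red", "blue"),
       ncol = 2, bty = "n", xjust = 0.5, yjust = 1, xpd = NA)
```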
I also had a hard time getting coordinates in the margins. I think I found a solution: you can obtain coordinates for the legend using the getCoords() function.
See also the legend_margin function from the plotfunctions package.
Combining the solution from Jean V. Adams with one of these functions should get you there.
Hope that works :)

Scatterplot with color groups - base R plot

I have a simple scatter plot
x<-rnorm(100)
y<-rnorm(100)
z<-rnorm(100)
I want to plot the plot(x,y) but the color of the points should be color coded based on z.
Also, I would like to be able to define how many groups (and thus colours) z should have, and this grouping should be robust to outliers (maybe split the z density into n equal-density groups).
Till now I do this manually, is there any way to do this automatically?
Note: I want to do this with base R not with ggplot.
You can pass a vector of colours to the col parameter, so it is just a matter of defining your z groups in a way that makes sense for your application. There is the cut() function in base, or cut2() in Hmisc which offers a bit more flexibility. To assist in picking reasonable colour palettes, the RColorBrewer package is invaluable. Here's a quick example after defining x,y,z:
z.cols <- cut(z, 3, labels = c("pink", "green", "yellow"))
plot(x,y, col = as.character(z.cols), pch = 16)
You can obviously add a legend manually. Unfortunately, I don't think all types of plots accept vectors for the col argument, but type = "p" obviously works. For instance, plot(x,y, type = "l", col = as.character(z.cols)) comes out as a single colour for me. For these plots, you can add different colours with lines() or segments() or whatever low-level plotting command you need to use. See the answer by @Andrie for doing this with type = "l" plots in base graphics here.
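Since the question asks for groups that are resistant to outliers, one possible approach is to cut z at its quantiles instead of at equal-width breaks, so each group holds roughly the same number of points. This is only a sketch: n_groups, z_group and group_cols are made-up names, and the blue-to-red ramp is an arbitrary palette.

```r
set.seed(1)
x <- rnorm(100)
y <- rnorm(100)
z <- rnorm(100)

n_groups <- 4  # number of colour groups, chosen by the user

# Quantile breaks give (roughly) equal-sized groups regardless of outliers
breaks <- quantile(z, probs = seq(0, 1, length.out = n_groups + 1))
z_group <- cut(z, breaks = breaks, include.lowest = TRUE)

# One colour per group, reused for points and legend
group_cols <- colorRampPalette(c("blue", "red"))(n_groups)

plot(x, y, col = group_cols[z_group], pch = 16)
legend("topright", legend = levels(z_group), col = group_cols,
       pch = 16, title = "z", cex = 0.8)
```

Changing n_groups is enough to get a different number of colours; heavily tied z values could produce duplicate breaks, which cut() would reject.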

How can I add a legend to a goodness of fit plot in R?

I'm using goodfit from vcd package to produce goodness of fit plots.
I would like to add a legend stating the bars are the actual counts and the dots (connected by the line) are the fit using e.g. Poisson and ML.
legend does not work. How can I easily add a legend to this plot?
Thanks!
The plot function for goodfit objects is using the grid graphics system (see ?rootogram and getAnywhere(rootogram.default)).
You have two options:
use the rather limited grid.legend function (from package grid).
embed a base graphics legend in the grid plot using the gridBase package.
Here is a simple example for the first option:
library("vcd")
dummy <- rnbinom(200, size=1.5, prob=0.8)
gf <- goodfit(dummy, type="nbinomial", method="MinChisq")
plot(gf)
pushViewport(viewport(x=unit(0.8, "npc"),
y=unit(0.8, "npc"),
width=stringWidth("Legend x"),
height=unit(6, "line"),
name="vp1"))
grid.legend(labels=c("Legend 1", "Legend 2"), pch=1:2)
popViewport()
Modifying @rcs's answer to use grid_legend (in the vcd package along with goodfit), which is intended for users (grid.legend is an undocumented internal function), and to show a legend specifically geared to this plot. It would be nice to use fill=c(NA,"gray") as in legend in base graphics, but it's not implemented in grid_legend.
library("vcd")
dummy <- rnbinom(200, size=1.5, prob=0.8)
gf <- goodfit(dummy, type="nbinomial", method="MinChisq")
plot(gf)
grid_legend(x=unit(0.8, "npc"),
y=unit(0.8, "npc"),
labels=c("est NBinom (MinChiSq)","obs"),
title="",
pch=c(16,15),col=c("red","gray"))
It is hard to tell without a specific example (AFAIK it is not a limitation with goodfit), but I would check a few things with legend:
You can place a legend with "topright", "bottomleft", etc for the argument x.
You can query the x and y axis limits with par("usr"). If the plot is in log scale and you want to place the legend at the maximum value of y, you have to use 10^par("usr")[4], and so on.
Pass the argument xpd=NA to see if you are placing the legend outside of the plotting region and see if you need to set xjust or yjust.
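A small sketch of the par("usr") tip for a base graphics plot with a log-scaled y axis (it does not apply to the grid-based goodfit plot itself). The data and the label are placeholders.

```r
# Placeholder data on a log-scaled y axis
plot(1:10, (1:10)^2, log = "y", xlab = "x", ylab = "y")

# usr is c(x1, x2, y1, y2); on a log axis the y limits are log10 values
usr <- par("usr")

# Back-transform the upper y limit and anchor the legend's top-left corner
# at the top-left corner of the plot region
legend(x = usr[1], y = 10^usr[4], legend = "observed", pch = 1,
       xjust = 0, yjust = 1)
```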
