I am making a scatter plot of two variables and would like to colour the points by a factor variable. Here is some reproducible code:
data <- iris
plot(data$Sepal.Length, data$Sepal.Width, col=data$Species)
This is all well and good, but how do I know which level of the factor was given which colour?
data<-iris
plot(data$Sepal.Length, data$Sepal.Width, col=data$Species)
legend(7, 4.3, levels(data$Species), col = 1:nlevels(data$Species), pch = 1)
should do it for you. But I prefer ggplot2 and would suggest that for better graphics in R.
The function palette() tells you the colours, in order, that are used when col = somefactor. It can also be used to set the colours.
palette()
[1] "black" "red" "green3" "blue" "cyan" "magenta" "yellow" "gray"
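(That is the default palette in R versions before 4.0.0; R 4.0.0 and later print a different default palette.) To see the mapping in your graph, you could use a legend.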
legend('topright', legend = levels(iris$Species), col = 1:3, cex = 0.8, pch = 1)
You'll notice that I specified the legend colours with just three numbers. This works the same way as using a factor, because integer colours are also looked up in palette(). I could have used the factor originally used to colour the points as well, which would make everything flow together logically, but I wanted to show that a variety of inputs work.
You could also be specific about the colours: try ?rainbow for starters and go from there. You can specify your own or have R generate them for you. As long as you use the same colour vector for both the points and the legend, the two will match.
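To read the mapping off programmatically instead of by eye, you can index palette() with the factor's level positions, since col = <factor> uses the factor's integer codes as palette indices. A minimal sketch, assuming the palette has not been changed in the current session and the factor has no more levels than the palette has colours (8 by default):

```r
# With col = <factor>, base graphics uses the factor's integer codes
# (1, 2, 3, ...) as indices into palette().
species <- iris$Species

# One palette entry per level, named by level, so the mapping is explicit
species_cols <- palette()[seq_len(nlevels(species))]
names(species_cols) <- levels(species)
print(species_cols)

plot(iris$Sepal.Length, iris$Sepal.Width, col = species)
legend("topright", legend = names(species_cols), col = species_cols, pch = 1)
```

Because the legend is built from the same named vector, the colour next to each species name is the one its points were drawn in.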
Like Maiasaura, I prefer ggplot2. The transparent reference manual is one of the reasons.
However, this is one quick way to get it done.
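library(ggplot2)
data(diamonds)
qplot(carat, price, data = diamonds, colour = color)
# example taken from Hadley's ggplot2 book
# note: qplot() is deprecated as of ggplot2 3.4.0; ggplot(diamonds, aes(carat, price, colour = color)) + geom_point() is the current equivalent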
And because someone famous said plot-related posts are not complete without the plot, here's the result:
[Figure: price vs. carat for the diamonds data, points coloured by the color factor, with legend]
Here are a couple of references:
qplot.R example (note: it uses the same diamonds dataset as above, but subsets the data first for better performance)
http://ggplot2.org/book/
the manual: http://docs.ggplot2.org/current/
There are two ways that I know of to color plot points by factor and then also have a corresponding legend automatically generated. I'll give examples of both:
Using ggplot2 (generally easier)
Using R's built-in plotting functionality in combination with the colorRampPalette function (trickier, but many people prefer or need R's built-in plotting facilities)
For both examples, I will use the ggplot2 diamonds dataset. We'll be using the numeric columns diamonds$carat and diamonds$price, and the factor/categorical column diamonds$color. You can load the dataset with the following code if you have ggplot2 installed:
library(ggplot2)
data(diamonds)
Using ggplot2 and qplot
It's a one-liner. The key here is to give qplot the factor you want to color by as the color argument. qplot will make a legend for you by default.
qplot(
x = carat,
y = price,
data = diamonds,
color = color # color by the factor column named "color" (I know, confusing)
)
Your output should look like this:
[Figure: price vs. carat, points coloured by diamond color, with an automatic legend]
Using R's built-in plot functionality
Using R's built-in plot functionality to get a plot colored by a factor with an associated legend is a 4-step process, and it's a little more technical than using ggplot2.
First, we will make a palette function with colorRampPalette(). colorRampPalette() returns a new function that generates a vector of colors. In the snippet below, calling color_palette_function(5) would return a vector of 5 colors (as hex strings) on a scale from red to orange to blue:
color_palette_function <- colorRampPalette(
colors = c("red", "orange", "blue"),
space = "Lab" # interpolate in CIE Lab color space rather than RGB
)
Second, we need to make a vector of colors, with exactly one color per diamond color. This is the mapping we will use both to assign colors to individual plot points and to create our legend.
num_colors <- nlevels(diamonds$color)
diamond_color_colors <- color_palette_function(num_colors)
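To see what the first two steps produce without loading the diamonds data, here is a self-contained sketch. The count of 5 is arbitrary, and iris$Species stands in for diamonds$color purely for illustration:

```r
# Step 1: colorRampPalette() returns a *function*, not colours
color_palette_function <- colorRampPalette(
  colors = c("red", "orange", "blue"),
  space = "Lab"
)

# Calling that function with n gives n interpolated colours as "#RRGGBB" strings
five_colors <- color_palette_function(5)
print(five_colors)

# Step 2 with a factor that is always available: one colour per level
species_colors <- color_palette_function(nlevels(iris$Species))
names(species_colors) <- levels(iris$Species)
print(species_colors)
```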
Third, we create our plot. This is done just like any other plot you've likely made, except that our col argument indexes the vector of colors we made with the factor diamonds$color. Indexing by a factor uses its integer codes, so each level always picks the same color. As long as we always use this same vector, our mapping between plot colors and diamonds$color will be consistent across our R script.
plot(
x = diamonds$carat,
y = diamonds$price,
xlab = "Carat",
ylab = "Price",
pch = 20, # solid dots increase the readability of this data plot
col = diamond_color_colors[diamonds$color]
)
Fourth and finally, we add our legend so that someone reading our graph can clearly see the mapping between the plot point colors and the actual diamond colors.
legend(
x ="topleft",
legend = paste("Color", levels(diamonds$color)), # for readability of legend
col = diamond_color_colors,
pch = 19, # solid circle; pch = 20 is the same shape, just smaller
cex = .7 # shrink the legend to 70% of its default size
)
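Your output should look like this:
[Figure: base R scatter plot of price vs. carat, points coloured by diamond color, legend in the top left]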
Nifty, right?
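If you do this often, the four steps can be wrapped in one function. The sketch below is a hypothetical helper (plot_by_factor is not from any package) and uses iris so it runs without ggplot2; the default colour endpoints and legend position are illustrative choices:

```r
# Hypothetical helper (not from any package): the four base-R steps in one call.
plot_by_factor <- function(x, y, f, colors = c("red", "orange", "blue"),
                           legend_pos = "topleft", ...) {
  f <- as.factor(f)
  # Steps 1-2: build a palette function and take one colour per level
  level_cols <- colorRampPalette(colors, space = "Lab")(nlevels(f))
  # Step 3: index the colour vector by the factor's integer codes
  plot(x, y, col = level_cols[f], pch = 20, ...)
  # Step 4: legend built from the very same colour vector
  legend(legend_pos, legend = levels(f), col = level_cols, pch = 19, cex = 0.7)
  # Return the level -> colour mapping invisibly for later reuse
  invisible(setNames(level_cols, levels(f)))
}

mapping <- plot_by_factor(iris$Sepal.Length, iris$Sepal.Width, iris$Species,
                          xlab = "Sepal length", ylab = "Sepal width")
print(mapping)
```

Returning the mapping lets you reuse exactly the same colours in later plots of the same factor.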
The col argument in the plot function maps a vector of integers to colours automatically (the integers index into palette()). If you convert iris$Species to numeric, notice you get a vector of 1s, 2s and 3s, so you can apply this as:
plot(iris$Sepal.Length, iris$Sepal.Width, col=as.numeric(iris$Species))
Suppose you want red, blue and green instead of the default colors, then you can simply adjust it:
plot(iris$Sepal.Length, iris$Sepal.Width, col=c('red', 'blue', 'green')[as.numeric(iris$Species)])
You can probably see how to further modify the code above to get any unique combination of colors.
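Strictly, the as.numeric() call is optional: indexing a vector with a factor already uses the factor's integer codes, not its labels. A small sketch with a made-up factor:

```r
f <- factor(c("b", "a", "c", "a"))   # levels are sorted: "a", "b", "c"
cols <- c("red", "green", "blue")

# Indexing by a factor uses its integer codes (a = 1, b = 2, c = 3) ...
by_factor <- cols[f]
# ... which is the same as converting to numeric first.
by_codes <- cols[as.numeric(f)]

print(by_factor)  # "green" "red" "blue" "red"
```

Note that the codes follow the (alphabetical, by default) level order, not the order in which values first appear in the data.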
The lattice package is another good option. Here I've added a legend on the right side and jittered the points because some of them overlapped.
library(lattice)
xyplot(Sepal.Width ~ Sepal.Length, groups=Species, data=iris,
auto.key=list(space="right"),
jitter.x=TRUE, jitter.y=TRUE)
plot(iris$Sepal.Length, iris$Sepal.Width, col = iris$Species)
I know that I can use the legend() function to set my legend manually. However, I have no idea which colour was assigned to each species in my data. Is there an automatic way to get plot() to add a legend?
As @rawr says, palette() determines the colour sequence used. If you specify colours as integers, R also looks them up in palette(). Thus
with(iris,plot(Sepal.Length, Sepal.Width, col = Species))
legend("topright",legend=levels(iris$Species),col=1:3, pch=1)
works nicely.
Base R doesn't have an auto-legend facility: the ggplot2 package does.
library(ggplot2)
ggplot(iris,aes(Sepal.Length,Sepal.Width,colour=Species))+geom_point()
gives you a plot with an automatic legend (use theme_set(theme_bw()) if you don't like the grey background).
The built-in lattice package can also do automatic legends:
library(lattice)
xyplot(Sepal.Width~Sepal.Length,groups=Species,data=iris,auto.key=TRUE)
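If you want your own colours in lattice and still want the automatic key to match, one way is to set them through par.settings: auto.key builds its key from the superpose.symbol settings, so points and key stay in sync. A sketch; the three colour names are arbitrary choices:

```r
library(lattice)

species_cols <- c("darkgreen", "darkblue", "orange")

# auto.key draws its symbols from superpose.symbol, so setting colours
# there (rather than passing col= directly) keeps points and key in sync.
p <- xyplot(Sepal.Width ~ Sepal.Length, groups = Species, data = iris,
            par.settings = list(superpose.symbol = list(col = species_cols,
                                                        pch = 16)),
            auto.key = list(space = "right"))
print(p)
```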
This answer shows how to use groups and panel.superpose to display overlapping histograms in the same panel, assigning different colors to each histogram. In addition, I want to give each histogram a different border color. (This will allow me to display one histogram as solid bars without a border, overlaid with a transparent, all-border histogram. The example below is a little different for the sake of clarity.)
Although it's possible to use border= to use different border colors in the plot, they are not assigned to groups the way fill colors are with col=. If you give border= a sequence of colors, it seems to cycle through them one bar at a time. If the two histograms overlap, the effect is a bit silly (see below).
Is there a way to give each group a specific border color?
# This illustrates the problem: assignment of border colors to bars ignores grouping
library(lattice)
# make some data
foo.df <- data.frame(x=c(rnorm(10),rnorm(10)+2), cat=c(rep("A", 10),rep("B", 10)))
# plot it
histogram(~ x, groups=cat, data=foo.df, ylim=c(0,75), breaks=seq(-3, 5, 0.5), lwd=2,
panel=function(...)panel.superpose(..., panel.groups=panel.histogram,
col=c("transparent", "cyan"),
border=c(rep("black", 3), rep("red", 3))))
Note that you can't just count how many bars there are in each group and provide those numbers to rep in the border setting. If the two histograms overlap, at least one of the histograms will use two border colors.
(It's the panel.superpose code that places the groups on the same panel and that assigns the colors. I don't have a deep understanding of it.)
panel.histogram() doesn't have a formal groups= argument, and if you examine its code, you'll see that it handles any supplied groups= argument differently, and in a less standard way, than panel.*() functions that do. The upshot of that design decision is that (as you've found) it's not in general easy to pass in vectors of graphical parameters specifying per-group appearance.
As a workaround, I'd suggest using latticeExtra's +() and as.layer() functions to overlay a number of separate histogram() plots, one for each group. Here's how you might do that:
library(lattice)
library(latticeExtra)
## Split your data by group into separate data.frames
foo.df <- data.frame(x=c(rnorm(10),rnorm(10)+2), cat=c(rep("A", 10),rep("B", 10)))
foo.A <- subset(foo.df, cat=="A")
foo.B <- subset(foo.df, cat=="B")
## Use calls to `+ as.layer()` to layer each group's histogram onto previous ones
histogram(~ x, data=foo.A, ylim=c(0,75), breaks=seq(-3, 5, 0.5),
lwd=2, col="transparent", border="black") +
as.layer(
histogram(~ x, data=foo.B, ylim=c(0,75), breaks=seq(-3, 5, 0.5),
lwd=2, col="cyan", border="red")
)
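If base graphics are an option, a different technique avoids the problem entirely: loop over the groups and draw one hist() per group, so each call gets exactly one fill and one border colour. This is a swapped-in base-R approach, not lattice; the sketch assumes shared breaks across groups are acceptable and uses an illustrative y-limit:

```r
set.seed(1)
foo.df <- data.frame(x = c(rnorm(10), rnorm(10) + 2),
                     cat = rep(c("A", "B"), each = 10))

# Shared breaks so the bars of both groups line up; pretty() covers the data range
brks <- pretty(range(foo.df$x), n = 12)

# One fill and one border colour per group, keyed by group name
fills   <- c(A = "transparent", B = "cyan")
borders <- c(A = "black",       B = "red")

groups <- split(foo.df$x, foo.df$cat)
hists <- list()
for (g in names(groups)) {
  hists[[g]] <- hist(groups[[g]], breaks = brks,
                     col = fills[[g]], border = borders[[g]],
                     add = g != names(groups)[1],  # overlay every group after the first
                     main = "", xlab = "x", ylim = c(0, 10))
}
```

Each group's border colour now applies to all of that group's bars, wherever they overlap.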
I know this could be a simple question, but I'm really struggling.
Using the iris dataset as an example, I can use the following code to perform a plot with different colors depending on the different species:
plot(iris$Sepal.Length, iris$Sepal.Width, col=iris$Species)
legend('topright', legend=c("setosa", "virginica", "versicolor"))
but I'm not able to get the same colors used in the plot call into the legend.
Furthermore, the dataset I'm using has several unique values. Is there a way to add the colors without specifying them manually?
ggplot adds the legend automatically according to the color mapping in aes(), but I need to do the same thing with base R's plot().
Is there a simple solution?
Thanks
Your plot call uses the default colors corresponding to the values of iris$Species, i.e. 1, 2, 3 (remember it's a factor; see the output of as.numeric(iris$Species)).
For only a few classes the easy solution is to do something like:
cols <- c("darkgreen", "darkblue", "orange")
plot(iris$Sepal.Length, iris$Sepal.Width, col=cols[iris$Species], pch=20)
legend('topright', legend=levels(iris$Species),
col= cols, pch=20)
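A variation that guards against the mismatch in the question (where the hand-typed legend order differs from the level order) is to key each colour by level name and look the colours up by name. A sketch using the same three colours:

```r
# Key each colour by level name, so the legend cannot drift out of sync
# with the points, whatever order the levels are in.
cols <- c(setosa = "darkgreen", versicolor = "darkblue", virginica = "orange")

point_cols <- cols[as.character(iris$Species)]   # look up by name, not position
plot(iris$Sepal.Length, iris$Sepal.Width, col = point_cols, pch = 20)
legend("topright", legend = names(cols), col = cols, pch = 20)
```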
A more general solution is to use a palette of colours, using functions such as heat.colors, gray.colors or rainbow, for instance
cols <- heat.colors(5) # 5 is the number of colors in the palette
Fancier palettes are found in the RColorBrewer package.
Also, the colorRampPalette function allows you to interpolate between the colours of those palettes to get a smoother result. For instance:
library(RColorBrewer)
colorRampPalette(brewer.pal(9,"Blues"))(100)
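For a dataset with many classes, the point is to generate the colour vector from nlevels() instead of typing it, then reuse it for both points and legend. A sketch using mtcars$carb (6 distinct values) as a stand-in for "several unique values"; rainbow is just one of the palette functions mentioned above:

```r
# A factor with more classes than you would want to colour by hand
carb <- factor(mtcars$carb)

# One colour per level, generated rather than typed out
cols <- rainbow(nlevels(carb))

plot(mtcars$wt, mtcars$mpg, col = cols[carb], pch = 20,
     xlab = "Weight", ylab = "MPG")
legend("topright", legend = levels(carb), col = cols, pch = 20,
       title = "carb")
```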
You can try this:
plot(Sepal.Width~Sepal.Length, data=iris, col=Species)
legend('topright', legend=levels(iris$Species), col=1:3, pch=1)
This is probably a basic question. I’ve produced a plot that displays the home ranges for different lemurs. Great! Hard part done. But they are all lime green. How can I choose a different colour for each of my 5 IDs? It seems like it should be simple, but I can’t find anything online. Would anyone be able to suggest something?
I’ve pasted my code below
library(adehabitat) # provides mcp() and the plot method for its result
dd <- read.csv(file.choose(), header = TRUE)
xy <- dd[,c("X","Y")]
id <- dd[,"ID"]
hr<- mcp(xy,id,percent=95)
plot(hr,
main="95% Minimum Convex Polygon",
xlab="X Coordinate",
ylab="Y Coordinate")
Once I have 5 separate colours for my 5 IDs (frodo, bilbo, merry, pippin, sam), it would also be great to create a legend displaying the colours and the related IDs. I was playing around with the following code:
legend('topright', names(hr)[-1] ,
lty=1, col=c('red', 'blue', 'green', 'brown'), bty='o', cex=1.5)
But that seems to display a legend for the x, y coordinates, not for the IDs displayed in the plot. Can anyone tell me what I'm doing wrong?
Edit: I got it! The argument "col=" doesn't work for these polygons; it's "colpol=". Thanks for all the help.
The hr object has a class of "area" and "data.frame". There is an area method for plot. It has a colpol argument. See ?plot.area when adehabitat is loaded:
plot(hr, colpol=c('red', 'blue', 'green', 'brown') )
Originally it was not clear that you wanted to color the 4 (not 5) areas produced. I thought you wanted the points colored by group, which is what this produced.
If you know that ID is already a factor, then the factor call is not needed. as.numeric applied to a factor turns it into an integer ranging from 1 to the number of levels, and that is being used as an index into that vector of 5 colors. If you want to see the names of all 657 colors available, just type colors(). Refer to ?colors for additional links for managing color palettes.
As pointed out, we don't have the data or the mcp function to see what the hr object gets plotted as. If the plot method for that object is not assigning individual colors for the points, then do this instead:
points(xy[,1], xy[,2],
col = c("red", "green", "blue", "orange", "sandybrown")[as.numeric(factor(dd[,"ID"]))]
)
Is this what you are looking for?
ids <- factor(hr$ID)
cols <- rainbow(nlevels(ids)) # one colour per ID
plot(hr$X, hr$Y, main="95% Minimum Convex Polygon", xlab="X Coordinate",
ylab="Y Coordinate",
col = cols[ids], # colour by ID
pch = as.numeric(ids)) # and a different symbol per ID
legend('topleft', levels(ids), lty=1,
col = cols, pch = seq_len(nlevels(ids)),
bty='o', cex=1.5)
I have a simple scatter plot
x<-rnorm(100)
y<-rnorm(100)
z<-rnorm(100)
I want to draw plot(x, y), but with the points colour-coded based on z.
Also, I would like to have the ability to define how many groups (and thus colours) z should have. And that this grouping should be resistant to outliers (maybe split the z density into n equal density groups).
Until now I have done this manually; is there a way to do it automatically?
Note: I want to do this with base R not with ggplot.
You can pass a vector of colours to the col parameter, so it is just a matter of defining your z groups in a way that makes sense for your application. There is the cut() function in base, or cut2() in Hmisc which offers a bit more flexibility. To assist in picking reasonable colour palettes, the RColorBrewer package is invaluable. Here's a quick example after defining x,y,z:
z.cols <- cut(z, 3, labels = c("pink", "green", "yellow"))
plot(x,y, col = as.character(z.cols), pch = 16)
You can obviously add a legend manually. Unfortunately, I don't think all types of plots accept vectors for the col argument, but type = "p" obviously works. For instance, plot(x,y, type = "l", col = as.character(z.cols)) comes out as a single colour for me. For these plots, you can add different colours with lines() or segments() or whatever low-level plotting command you need to use. See the answer by @Andrie for doing this with type = "l" plots in base graphics here.
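For the "resistant to outliers" part of the question, one option (a sketch, not the only way) is to cut z at its quantiles rather than into equal-width intervals, so each colour group holds roughly the same number of points and a few extreme values cannot stretch one bin over most of the range. n_groups is an illustrative parameter and heat.colors is just one palette choice:

```r
set.seed(42)
x <- rnorm(100)
y <- rnorm(100)
z <- rnorm(100)

n_groups <- 4  # how many colour groups to split z into

# Quantile breaks give (roughly) equal numbers of points per group
z_breaks <- quantile(z, probs = seq(0, 1, length.out = n_groups + 1))
z_group  <- cut(z, breaks = z_breaks, include.lowest = TRUE)

# One colour per group; index by the factor and reuse for the legend
z_cols <- heat.colors(n_groups)
plot(x, y, col = z_cols[z_group], pch = 16)
legend("topright", legend = levels(z_group), col = z_cols, pch = 16,
       title = "z", cex = 0.8)
```

With heavily tied data the quantile breaks can coincide, in which case cut() will complain about non-unique breaks; continuous data like this does not have that problem.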