colors in legend with many groups - r

I know it could be a simple question, but I'm really struggling..
Using the iris dataset as an example, I can use the following code to perform a plot with different colors depending on the different species:
plot(iris$Sepal.Length, iris$Sepal.Width, col=iris$Species)
legend('topright', legend=c("setosa", "virginica", "versicolor"))
but actually I'm not able to add the same colors used in the plot call in the legend.
Furthermore, the dataset I'm using has several unique values. Is there a way to add the colors without specifying them manually?
ggplot adds automatically the legend according to the color used in the aes option, but I need to do the same thing with the plot package.
Is there a simple solution?
Thanks

Your plot call uses the default colors that correspond to the integer codes of iris$Species (1, 2, 3). Remember that it is a factor; see the output of as.numeric(iris$Species).
For only a few classes the easy solution is to do something like:
cols <- c("darkgreen", "darkblue", "orange")
plot(iris$Sepal.Length, iris$Sepal.Width, col=cols[iris$Species], pch=20)
legend('topright', legend=levels(iris$Species),
col= cols, pch=20)
A more general solution is to use a palette of colours, generated with functions such as heat.colors, gray.colors or rainbow, for instance
cols <- heat.colors(5) # 5 is the number of colors in the palette
More fancy palettes are found in the RColorBrewer package.
Also, the colorRampPalette function lets you interpolate between the colours of those palettes to get a smoother result. For instance:
library(RColorBrewer)
colorRampPalette(brewer.pal(9,"Blues"))(100)
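To tie this back to the question's case of many groups, here is a minimal sketch that generates one colour per factor level and reuses the same vector for both the points and the legend. It assumes rainbow() is an acceptable palette; any function that returns n colours could be swapped in.

```r
# One colour per level; the same vector drives both the points and the legend.
groups <- iris$Species
cols <- rainbow(nlevels(groups))  # assumption: rainbow() as the palette

# Indexing cols by the factor uses its integer codes, so level i gets cols[i].
plot(iris$Sepal.Length, iris$Sepal.Width, col = cols[groups], pch = 20)
legend("topright", legend = levels(groups), col = cols, pch = 20)
```

Because the legend labels come from levels(groups) and the colours from the same cols vector, nothing has to be typed by hand when the number of groups changes.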

you can try this
plot(Sepal.Width~Sepal.Length, data=iris, col=Species)
legend('topright', legend=levels(iris$Species), col=1:3, pch=1)

Related

Issues with colour in plots [duplicate]

I am making a scatter plot of two variables and would like to colour the points by a factor variable. Here is some reproducible code:
data <- iris
plot(data$Sepal.Length, data$Sepal.Width, col=data$Species)
This is all well and good but how do I know what factor has been coloured what colour??
data<-iris
plot(data$Sepal.Length, data$Sepal.Width, col=data$Species)
legend(7, 4.3, levels(data$Species), col=1:nlevels(data$Species), pch=1)
should do it for you. But I prefer ggplot2 and would suggest that for better graphics in R.
The command palette tells you the colours and their order when col = somefactor. It can also be used to set the colours as well.
palette()
[1] "black" "red" "green3" "blue" "cyan" "magenta" "yellow" "gray"
In order to see that in your graph you could use a legend.
legend('topright', legend = levels(iris$Species), col = 1:3, cex = 0.8, pch = 1)
You'll notice that I only specified the new colours with 3 numbers. This will work like using a factor. I could have used the factor originally used to colour the points as well. This would make everything logically flow together... but I just wanted to show you can use a variety of things.
You could also be specific about the colours. Try ?rainbow for starters and go from there. You can specify your own or have R do it for you. As long as you use the same method for each you're OK.
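As a rough sketch of using palette() to set the colours rather than just read them: the three colour names below are arbitrary choices for illustration, and the previous palette is restored at the end so later plots are unaffected.

```r
old_palette <- palette()                       # remember the current palette
palette(c("darkgreen", "darkblue", "orange"))  # colours 1, 2, 3 from now on

# col = a factor is looked up in the palette by integer code
plot(iris$Sepal.Length, iris$Sepal.Width, col = iris$Species, pch = 20)
legend("topright", legend = levels(iris$Species),
       col = seq_len(nlevels(iris$Species)), pch = 20)

palette(old_palette)                           # restore the previous palette
```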
Like Maiasaura, I prefer ggplot2. The transparent reference manual is one of the reasons.
However, this is one quick way to get it done.
require(ggplot2)
data(diamonds)
qplot(carat, price, data = diamonds, colour = color)
# example taken from Hadley's ggplot2 book
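And because someone famous said that plot-related posts are not complete without the plot, here's the result:
[Figure: scatter plot of price against carat for the diamonds data, points coloured by color, with a legend]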
Here's a couple of references:
qplot.R example,
note basically this uses the same diamond dataset I use, but crops the data before to get better performance.
http://ggplot2.org/book/
the manual: http://docs.ggplot2.org/current/
There are two ways that I know of to color plot points by factor and then also have a corresponding legend automatically generated. I'll give examples of both:
Using ggplot2 (generally easier)
Using R's built-in plotting functionality in combination with the colorRampPalette function (trickier, but many people prefer/need R's built-in plotting facilities)
For both examples, I will use the ggplot2 diamonds dataset. We'll be using the numeric columns diamonds$carat and diamonds$price, and the factor/categorical column diamonds$color. You can load the dataset with the following code if you have ggplot2 installed:
library(ggplot2)
data(diamonds)
Using ggplot2 and qplot
It's a one-liner. The key is to pass qplot the factor you want to color by as the color argument; qplot will make a legend for you by default.
qplot(
  x = carat,
  y = price,
  data = diamonds,
  color = color # color by the factor column named color (I know, confusing)
)
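Your output should be a scatter plot of price against carat, with the points colored by diamond color and a legend added automatically.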
Using R's built in plot functionality
Using R's built in plot functionality to get a plot colored by a factor and an associated legend is a 4-step process, and it's a little more technical than using ggplot2.
First, we will make a color palette function. colorRampPalette() returns a new function that generates a vector of colors. In the snippet below, calling color_pallet_function(5) would return a vector of 5 colors interpolated from red through orange to blue:
color_pallet_function <- colorRampPalette(
  colors = c("red", "orange", "blue"),
  space = "Lab" # Option used when colors do not represent a quantitative scale
)
Second, we need to make a list of colors, with exactly one color per diamond color. This is the mapping we will use both to assign colors to individual plot points, and to create our legend.
num_colors <- nlevels(diamonds$color)
diamond_color_colors <- color_pallet_function(num_colors)
Third, we create our plot. This is done just like any other plot you've likely done, except we index the vector of colors we made by the factor and pass the result as our col argument. As long as we always use this same vector, our mapping between colors and the levels of diamonds$color will be consistent across our R script.
plot(
x = diamonds$carat,
y = diamonds$price,
xlab = "Carat",
ylab = "Price",
pch = 20, # solid dots increase the readability of this data plot
col = diamond_color_colors[diamonds$color]
)
Fourth and finally, we add our legend so that someone reading our graph can clearly see the mapping between the plot point colors and the actual diamond colors.
legend(
x ="topleft",
legend = paste("Color", levels(diamonds$color)), # for readability of legend
col = diamond_color_colors,
pch = 19, # solid circle, like pch = 20 but larger
cex = .7 # scale the legend to look attractively sized
)
Your output should be the same scatter plot, now with a legend in the top-left corner that maps each point color to a diamond color.
Nifty, right?
The col argument of the plot function maps a vector of integers to colors automatically. If you convert iris$Species to numeric, you get a vector of 1s, 2s and 3s, so you can apply this as:
plot(iris$Sepal.Length, iris$Sepal.Width, col=as.numeric(iris$Species))
Suppose you want red, blue and green instead of the default colors, then you can simply adjust it:
plot(iris$Sepal.Length, iris$Sepal.Width, col=c('red', 'blue', 'green')[as.numeric(iris$Species)])
You can probably see how to further modify the code above to get any unique combination of colors.
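A variation on the same idea, offered only as a sketch: give the colour vector names that match the factor levels and index it by name instead of by integer code. The mapping then does not depend on the order of the levels. The colour choices are the ones used above; the variable names cols and point_cols are made up for the example.

```r
# Named colour vector: the names must match the factor levels exactly.
cols <- c(setosa = "red", versicolor = "blue", virginica = "green")

# Look colours up by level name rather than by integer code.
point_cols <- cols[as.character(iris$Species)]

plot(iris$Sepal.Length, iris$Sepal.Width, col = point_cols, pch = 20)
legend("topright", legend = names(cols), col = cols, pch = 20)
```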
The lattice package is another good option. Here I've added a legend on the right side and jittered the points because some of them overlapped.
library(lattice)
xyplot(Sepal.Width ~ Sepal.Length, groups=Species, data=iris,
auto.key=list(space="right"),
jitter.x=TRUE, jitter.y=TRUE)

How to specify color in base R for certain variables

I am having to use base R and I am only really familiar with ggplot2. I have told R to colour my plot by site, which it has done... but unlike ggplot2 it doesn't automatically give me a legend to tell me which colour it has assigned to which site. Is there either a way to tell R to plot a site in a specific colour or a way to make it create a legend. I tried creating a legend but it seemed I had to input exactly what colour, shape etc should be on the legend which defeats the point.
Thanks.
plot(mod1.xval, xval=TRUE, resid=TRUE, xlab="Observed elevation (m AHD)", ylab= "Inferred elevation (m AHD)",npls=2,col=spec$Site)
Here is an example with the iris data set.
Species would be your Site. You color the points by Species and then build the legend, in which the colors are the integer codes 1 to nlevels(iris$Species), in the same order as levels(iris$Species).
plot(Sepal.Length~ Sepal.Width, iris, col=Species, pch=16)
legend('topleft', col=1:nlevels(iris$Species), legend=levels(iris$Species), pch=16)
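If this is needed repeatedly, the two calls can be wrapped in a small function. The sketch below is a hypothetical helper, plot_by_group, which is not part of base R. It only handles plain x/y scatter plots, so it would not apply directly to a model-specific plot method like the one in the question.

```r
# Hypothetical helper (not part of base R): scatter plot coloured by a
# grouping variable, with a matching legend built from the factor levels.
plot_by_group <- function(x, y, group, palette_fun = rainbow, pch = 16,
                          legend_pos = "topleft", ...) {
  group <- droplevels(as.factor(group))  # ignore unused levels
  cols <- palette_fun(nlevels(group))    # one colour per level
  plot(x, y, col = cols[group], pch = pch, ...)
  legend(legend_pos, legend = levels(group), col = cols, pch = pch)
  invisible(setNames(cols, levels(group)))  # return the level -> colour map
}

mapping <- plot_by_group(iris$Sepal.Width, iris$Sepal.Length, iris$Species,
                         xlab = "Sepal.Width", ylab = "Sepal.Length")
```

The returned named vector can be reused to colour other plots of the same groups consistently.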

R: adding a plot legend in R

plot(iris$Sepal.Length, iris$Sepal.Width, col = iris$Species)
I know that I can use the legend() function to manually set my legend. However, I have no idea which color was assigned to the different species in my data? Is there an automatic way to get plot() to add a legend?
As @rawr says, palette() determines the colour sequence used. If you use integers to specify colours, it will also look at palette(). Thus
with(iris,plot(Sepal.Length, Sepal.Width, col = Species))
legend("topright",legend=levels(iris$Species),col=1:3, pch=1)
works nicely.
Base R doesn't have an auto-legend facility: the ggplot2 package does.
library(ggplot2)
ggplot(iris,aes(Sepal.Length,Sepal.Width,colour=Species))+geom_point()
gives you a plot with an automatic legend (use theme_set(theme_bw()) if you don't like the grey background).
The built-in lattice package can also do automatic legends:
library(lattice)
xyplot(Sepal.Width~Sepal.Length,groups=Species,data=iris,auto.key=TRUE)

Use R to make a barplot with bar colors determined by the height of the bar?

I would like to use R to make a barplot of ~100,000 numerical entries. The plot will be dense, which is what I want. So far I am using the following code:
sample_var <- c(2,5,3,2,3,2,6,10,20,...) #Filled with 100,000 entries
barplot(sample_var)
The resulting plot is just what I want, but I would like to make a conditional formatting statement so that bars less than 5 will be black, bars >= 5 and <= 10 are green, and bars > 10 are red.
Any help is appreciated!
Update: Looking at other solutions, "easily" was an overstatement. However, I'll leave my answer here for reference. Look at my other answer for a solution which does not require ggplot2.
You can use the ggplot2 package to produce that plot easily, using the bar geometry and identity statistic.
library(ggplot2)
sample_var <- log(runif(10000) + 1)
ggplot(data.frame(x=seq_along(sample_var), y=sample_var), aes(x=x, y=y, fill=y)) + geom_bar(stat="identity")
If you want a simple answer, how about using the next vector as colors.
colors = as.character(cut(sample_var,breaks=c(0,5,10,20),labels=c('black','green','red')))
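By default cut() uses right-closed intervals (right = TRUE), so with these breaks a value of exactly 5 is labelled 'black', and values of 0 or above 20 become NA. See ?cut for the right and include.lowest arguments.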
But more importantly, do not make a barplot of 100000 entries.
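To make the behaviour of cut() at the boundaries concrete, here is a small sketch on four test values. The breaks use -Inf and Inf instead of 0 and 20 so that no value falls outside the range; that choice is an assumption of this example, not part of the answer above.

```r
x <- c(4, 5, 10, 11)
breaks <- c(-Inf, 5, 10, Inf)
labs <- c("black", "green", "red")

# Default right = TRUE: intervals are (-Inf, 5], (5, 10], (10, Inf]
right_closed <- as.character(cut(x, breaks = breaks, labels = labs))
# "black" "black" "green" "red"

# right = FALSE: intervals are [-Inf, 5), [5, 10), [10, Inf)
left_closed <- as.character(cut(x, breaks = breaks, labels = labs, right = FALSE))
# "black" "green" "red" "red"
```

Neither setting gives exactly "less than 5, 5 to 10 inclusive, greater than 10" in one call, because that grouping needs an interval closed on both sides.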
You can use ?ifelse to create a vector of colors and include that in the call to barplot. To make it possible for the colors to show up with so many bars, do not include a border around your bars (h/t to @musically_ut).
set.seed(1) # this will allow you to get exactly the same data
# this generates data to use for the example plot:
sample_var <- rpois(100000, lambda=5)
cols <- ifelse(sample_var < 5, "black",
ifelse(sample_var <= 10, "green", "red"))
barplot(sample_var, col=cols, border=NA)
I find nested ifelse() calls ugly and so generally use findInterval to do selections from disjoint choices over a range of intervals. This is an alternative to @gung's answer:
set.seed(1)
sample_var <- rpois(100000, lambda=5)
cols <- c("black", "green", "red")[findInterval(sample_var, c(-Inf, 5, 10, Inf))]
barplot(sample_var, col=cols, border=NA)
This has the advantage that it's very easy to change the cutpoints and colors. (No need to put in an image; it's identical to gung's image.)
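Since the colours here encode ranges rather than factor levels, a legend has to be labelled by hand. A minimal sketch, using a smaller sample of 200 values so the bars stay visible, with labels written to match findInterval's left-closed intervals:

```r
set.seed(1)
sample_var <- rpois(200, lambda = 5)  # smaller sample than above, for legibility

breaks <- c(-Inf, 5, 10, Inf)
cols <- c("black", "green", "red")

# findInterval() returns 1 for x < 5, 2 for 5 <= x < 10, 3 for x >= 10
bar_cols <- cols[findInterval(sample_var, breaks)]

barplot(sample_var, col = bar_cols, border = NA)
legend("topright", legend = c("< 5", "5 to < 10", ">= 10"), fill = cols)
```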
Adding a separate answer which does not use ggplot2 but the native R functions.
You can use the palette functions in R to generate a gradient to suit your granularity:
sample_var <- log(runif(100000) + 1)
max.colors <- 1000
cols <- heat.colors(max.colors)
barplot(sample_var, col=cols[ max.colors - floor((max.colors - 1) * sample_var / max(sample_var)) ], border=NA)
There are some artefacts of condensing 100,000 lines into 800 or so pixels visible here. Some of the bars (periodically) are absent.

Scatterplot with color groups - base R plot

I have a simple scatter plot
x<-rnorm(100)
y<-rnorm(100)
z<-rnorm(100)
I want to plot the plot(x,y) but the color of the points should be color coded based on z.
Also, I would like to have the ability to define how many groups (and thus colours) z should have. And that this grouping should be resistant to outliers (maybe split the z density into n equal density groups).
Till now I do this manually, is there any way to do this automatically?
Note: I want to do this with base R not with ggplot.
You can pass a vector of colours to the col parameter, so it is just a matter of defining your z groups in a way that makes sense for your application. There is the cut() function in base, or cut2() in Hmisc which offers a bit more flexibility. To assist in picking reasonable colour palettes, the RColorBrewer package is invaluable. Here's a quick example after defining x,y,z:
z.cols <- cut(z, 3, labels = c("pink", "green", "yellow"))
plot(x,y, col = as.character(z.cols), pch = 16)
You can obviously add a legend manually. Unfortunately, I don't think all types of plots accept vectors for the col argument, but type = "p" obviously works. For instance, plot(x,y, type = "l", col = as.character(z.cols)) comes out as a single colour for me. For these plots, you can add different colours with lines() or segments() or whatever the low level plotting command you need to use is. See the answer by @Andrie for doing this with type = "l" plots in base graphics here.
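For the equal-density grouping the question asks about, one option is to cut z at its quantiles so that every group holds the same number of points, which also keeps outliers from stretching the bins. This is a sketch only: n_groups and the blue-to-orange ramp are arbitrary choices, and it assumes z has no ties at the quantile boundaries.

```r
set.seed(42)
x <- rnorm(100)
y <- rnorm(100)
z <- rnorm(100)

n_groups <- 4  # number of colour groups, chosen freely

# Quantile breaks give groups of equal size instead of equal width.
breaks <- quantile(z, probs = seq(0, 1, length.out = n_groups + 1))
z_group <- cut(z, breaks = breaks, include.lowest = TRUE)

cols <- colorRampPalette(c("blue", "orange"))(n_groups)

plot(x, y, col = cols[z_group], pch = 16)
legend("topright", legend = levels(z_group), col = cols, pch = 16,
       title = "z", cex = 0.8)
```

Changing n_groups is enough to change both the number of colours and the legend entries.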
