I am using plot3d() from rgl to make 3D scatter plots from three columns of a data frame. I am also using a fourth column, colorby (a factor whose levels are, say, 10, 11 and 12), to color the points in the plot.
When using plot() to make 2D plots, I first set palette(rainbow(3)) and then col = colorby within plot() followed by legend(). The plot, colors and legend work fine.
However, when I repeat the last part for plot3d(), the colors get mixed up: the levels are not assigned the same colors as in plot(). Moreover, if I use legend3d("topright", legend = levels(colorby), col = rainbow(3)) to create a legend, it looks the same as the 2D legend, but the coloring in the 3D plot is clearly wrong.
Where am I going wrong?
This looks correct to me:
df <- data.frame(x=1:9, y=9:1, z=101:109, colorby=gl(3,3))
palette(rainbow(3))
plot(x~y, df, col = df$colorby)
library(rgl)
with(df, plot3d(x,y,z, col = colorby))
legend3d("topright", legend = levels(df$colorby), col = levels(df$colorby), pch=19)
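A more explicit variant, as a minimal sketch: instead of relying on the palette, build the colour vector yourself by indexing with the factor, and hand the same per-level colours to the legend. The names level_cols and point_cols are made up for illustration, and the rgl calls are guarded so the snippet still runs where rgl is not installed or no display is available.

```r
df <- data.frame(x = 1:9, y = 9:1, z = 101:109, colorby = gl(3, 3))

# One colour per factor level, in level order (illustrative names)
level_cols <- rainbow(nlevels(df$colorby))
names(level_cols) <- levels(df$colorby)

# Indexing by the integer codes means level i always gets colour i
point_cols <- unname(level_cols[as.integer(df$colorby)])

# Only draw when rgl is available and a display can be opened
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(df$x, df$y, df$z, col = point_cols)
  rgl::legend3d("topright", legend = names(level_cols),
                col = level_cols, pch = 19)
}
```

Because the points and the legend both read from level_cols, they cannot drift apart, whatever the current palette() happens to be.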
Related
I am making a scatter plot of two variables and would like to colour the points by a factor variable. Here is some reproducible code:
data <- iris
plot(data$Sepal.Length, data$Sepal.Width, col=data$Species)
This is all well and good, but how do I know which factor level has been given which colour?
data<-iris
plot(data$Sepal.Length, data$Sepal.Width, col=data$Species)
legend(7,4.3,levels(data$Species),col=1:nlevels(data$Species),pch=1)
should do it for you. But I prefer ggplot2 and would suggest that for better graphics in R.
The command palette tells you the colours and their order when col = somefactor. It can also be used to set the colours.
palette()
[1] "black" "red" "green3" "blue" "cyan" "magenta" "yellow" "gray"
In order to see that in your graph you could use a legend.
legend('topright', legend = levels(iris$Species), col = 1:3, cex = 0.8, pch = 1)
You'll notice that I only specified the new colours with 3 numbers. This will work like using a factor. I could have used the factor originally used to colour the points as well. This would make everything logically flow together... but I just wanted to show you can use a variety of things.
You could also be specific about the colours. Try ?rainbow for starters and go from there. You can specify your own or have R do it for you. As long as you use the same method for each you're OK.
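As a rough sketch of "use the same method for each": set a palette, colour the points by the factor, and build the legend from the level indices, so both go through the same palette. The three colour names and the variable species_cols are arbitrary choices for illustration.

```r
# Set a custom palette; col = <factor> then indexes into it by level code.
# palette() returns the previous palette, so it can be restored afterwards.
old_palette <- palette(c("darkorange", "purple", "steelblue"))

plot(iris$Sepal.Length, iris$Sepal.Width, col = iris$Species, pch = 1)
legend("topright", legend = levels(iris$Species),
       col = seq_len(nlevels(iris$Species)), pch = 1, cex = 0.8)

# Read back which colour each species received
species_cols <- setNames(palette()[seq_len(nlevels(iris$Species))],
                         levels(iris$Species))

palette(old_palette)  # restore the previous palette
```

Printing species_cols gives a quick lookup of level to colour without having to read it off the plot.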
Like Maiasaura, I prefer ggplot2. The transparent reference manual is one of the reasons.
However, this is one quick way to get it done.
require(ggplot2)
data(diamonds)
qplot(carat, price, data = diamonds, colour = color)
# example taken from Hadley's ggplot2 book
And because someone famous said that plot-related posts are not complete without the plot, here's the result:
Here's a couple of references:
qplot.R example,
note basically this uses the same diamond dataset I use, but crops the data before to get better performance.
http://ggplot2.org/book/
the manual: http://docs.ggplot2.org/current/
There are two ways that I know of to color plot points by factor and then also have a corresponding legend automatically generated. I'll give examples of both:
Using ggplot2 (generally easier)
Using R's built-in plotting functionality in combination with the colorRampPalette function (trickier, but many people prefer/need R's built-in plotting facilities)
For both examples, I will use the ggplot2 diamonds dataset. We'll be using the numeric columns diamonds$carat and diamonds$price, and the factor/categorical column diamonds$color. You can load the dataset with the following code if you have ggplot2 installed:
library(ggplot2)
data(diamonds)
Using ggplot2 and qplot
It's a one liner. Key item here is to give qplot the factor you want to color by as the color argument. qplot will make a legend for you by default.
qplot(
x = carat,
y = price,
data = diamonds,
color = diamonds$color # color by factor color (I know, confusing)
)
Your output should look like this:
Using R's built in plot functionality
Using R's built in plot functionality to get a plot colored by a factor and an associated legend is a 4-step process, and it's a little more technical than using ggplot2.
First, we will make a colorRampPalette function. colorRampPalette() returns a new function that generates a vector of colors. In the snippet below, calling color_palette_function(5) would return a vector of 5 colors on a scale from red to orange to blue:
color_palette_function <- colorRampPalette(
colors = c("red", "orange", "blue"),
space = "Lab" # interpolate in CIE Lab space, which is approximately perceptually uniform
)
Second, we need to make a vector of colors, with exactly one color per diamond color. This is the mapping we will use both to assign colors to individual plot points, and to create our legend.
num_colors <- nlevels(diamonds$color)
diamond_color_colors <- color_palette_function(num_colors)
Third, we create our plot. This is done just like any other plot you've likely done, except that we index the vector of colors we made by the factor and pass the result as our col argument. As long as we always use this same vector, our mapping between colors and diamonds$color will be consistent across our R script.
plot(
x = diamonds$carat,
y = diamonds$price,
xlab = "Carat",
ylab = "Price",
pch = 20, # solid dots increase the readability of this data plot
col = diamond_color_colors[diamonds$color]
)
Fourth and finally, we add our legend so that someone reading our graph can clearly see the mapping between the plot point colors and the actual diamond colors.
legend(
x ="topleft",
legend = paste("Color", levels(diamonds$color)), # for readability of legend
col = diamond_color_colors,
pch = 19, # same as pch=20, just larger
cex = .7 # scale the legend to look attractively sized
)
Your output should look like this:
Nifty, right?
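The four steps can also be folded into one small helper. This is only a sketch: the function name colors_by_factor is hypothetical, and it is shown on the built-in iris data so that it runs without ggplot2 installed.

```r
# Hypothetical helper: one colour per level, plus one colour per observation
colors_by_factor <- function(f, ramp_colors = c("red", "orange", "blue")) {
  f <- as.factor(f)
  ramp <- colorRampPalette(ramp_colors, space = "Lab")
  level_cols <- setNames(ramp(nlevels(f)), levels(f))
  list(levels = level_cols,
       points = unname(level_cols[as.integer(f)]))
}

cols <- colors_by_factor(iris$Species)

# Points and legend both draw on the same level-to-colour mapping
plot(iris$Petal.Length, iris$Petal.Width, pch = 20, col = cols$points,
     xlab = "Petal length", ylab = "Petal width")
legend("topleft", legend = names(cols$levels), col = cols$levels, pch = 20)
```

Returning both vectors from one call keeps the plot and the legend tied to a single mapping, which is the point of steps two to four above.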
The col argument of the plot function maps a vector of integers to colors automatically. If you convert iris$Species to numeric, you get a vector of 1s, 2s and 3s, so you can use it like this:
plot(iris$Sepal.Length, iris$Sepal.Width, col=as.numeric(iris$Species))
Suppose you want red, blue and green instead of the default colors, then you can simply adjust it:
plot(iris$Sepal.Length, iris$Sepal.Width, col=c('red', 'blue', 'green')[as.numeric(iris$Species)])
You can probably see how to further modify the code above to get any unique combination of colors.
The lattice library is another good option. Here I've added a legend on the right side and jittered the points because some of them overlapped.
library(lattice)
xyplot(Sepal.Width ~ Sepal.Length, group=Species, data=iris,
auto.key=list(space="right"),
jitter.x=TRUE, jitter.y=TRUE)
Using the iris dataset, I am trying to find a way to get a legend in coplot when the colour of the points is defined by a variable (Species in this example).
In other words, I want a legend that tells me which shape and colour represent which Species.
The following is the script:
coplot(Sepal.Width~Sepal.Length|Petal.Width*Petal.Length, data = iris,
number=c(3,3),overlap=.5,col=as.numeric(iris$Species),
pch=as.numeric(iris$Species)+1)
this is the produced graph:
coplot(Sepal.Width~Sepal.Length|Petal.Width*Petal.Length, data = iris,
number=c(3,3),overlap=.5,col=as.numeric(iris$Species),
pch=as.numeric(iris$Species)+1)
legend("topright", pch = unique(as.numeric(iris$Species)+1),
col = unique(as.numeric(iris$Species)),
legend = unique(iris$Species))
You just have to adjust legend position to what fits better to your figure size.
I plot several lines on a graph using matplot:
matplot(cumsum(as.data.frame(daily.pnl)),type="l")
This gives me default colours for each line - which is fine,
But I now want to add a legend that reflects those same colours - how can I achieve that?
PLEASE NOTE - I am trying NOT to specify the colours to matplot in the first place.
legend(0,0,legend=spot.names,lty=1)
Gives me all the same colour.
The default col argument of matplot is 1:6, recycled across the columns of your data frame, so for up to six columns the line colours are simply the column indices. You can add the legend like this:
nn <- ncol(daily.pnl)
legend("top", colnames(daily.pnl),col=seq_len(nn),cex=0.8,fill=seq_len(nn))
Using the cars data set as an example, here is the complete code to add a legend. It is better to use layout to place the legend neatly.
daily.pnl <- cars
nn <- ncol(daily.pnl)
layout(matrix(c(1,2),nrow=1), widths=c(4,1))
par(mar=c(5,4,4,0)) #No margin on the right side
matplot(cumsum(as.data.frame(daily.pnl)),type="l")
par(mar=c(5,0,4,2)) #No margin on the left side
plot(c(0,1),type="n", axes=FALSE, xlab="", ylab="")
legend("center", colnames(daily.pnl),col=seq_len(nn),cex=0.8,fill=seq_len(nn))
I have tried to reproduce what you are looking for using the iris dataset. I get the plot with the following expression:
matplot(cumsum(iris[,1:4]), type = "l")
Then, to add a legend, you can specify the default lines colour and type, i.e., numbers 1:4 as follows:
legend(0, 800, legend = colnames(iris)[1:4], col = 1:4, lty = 1:4)
Now you have the same in the legend and in the plot. Note that you might need to change the coordinates for the legend accordingly.
I like @agstudy's trick to get a nice legend.
For the sake of comparison, I took @agstudy's example and plotted it with ggplot2:
The first step is to "melt" the data-set
require(reshape2)
df <- data.frame(x=1:nrow(cars), cumsum(data.frame(cars)))
df.melted <- melt(df, id="x")
The second step looks rather simple in comparison to the solution with matplot
require(ggplot2)
qplot(x=x, y=value, color=variable, data=df.melted, geom="line")
Interestingly, @agstudy's solution does the trick, but only for n ≤ 6.
Here we have a matrix with 8 columns. The colours of the first 6 labels are correct.
The 7th and 8th are wrong. The colours in the plot restart from the beginning (black, red, ...), whereas in the legend they continue (yellow, grey, ...).
Still haven't figured out why this is the case. I'll maybe update this post with my findings.
matplot(x = lambda, y = t(ridge$coef), type = "l", main="Ridge regression", xlab="λ", ylab="Coefficient-value", log = "x")
nr = nrow(ridge$coef)
legend("topright", rownames(ridge$coef), col=seq_len(nr), cex=0.8, lty=seq_len(nr), lwd=2)
Just discovered that matplot uses linetypes 1:5 and colors 1:6 to establish the appearance of the lines. If you want to create a legend try the following approach:
## Plot multiple columns of the data frame 'GW' with matplot
cstart = 10 # from column
cend = cstart + 20 # to column
nr <- cstart:cend
ltyp <- rep_len(1:5, length(nr)) # the line types matplot uses, recycled to one per column
cols <- rep_len(1:6, length(nr)) # the cols matplot uses, recycled to one per column
matplot(x,GW[,nr],type='l')
legend("bottomright", as.character(nr), col=cols, cex=0.8, lty=ltyp, ncol=3)
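To see the recycling in isolation, here is a small self-contained sketch on synthetic data (the matrix m and its column names are invented for illustration). With 8 columns, matplot's default colours 1:6 wrap around after the sixth line and the default line types 1:5 after the fifth, so the legend has to wrap in the same way.

```r
set.seed(1)  # synthetic random walks, for illustration only
m <- apply(matrix(rnorm(80), ncol = 8), 2, cumsum)
colnames(m) <- paste0("series", 1:8)

n <- ncol(m)
cols <- rep_len(1:6, n)  # matplot's default col = 1:6, recycled
ltys <- rep_len(1:5, n)  # matplot's default lty = 1:5, recycled

matplot(m, type = "l")
legend("topleft", legend = colnames(m), col = cols, lty = ltys,
       cex = 0.8, ncol = 2)
```

Using seq_len(n) for the legend instead would ask for colours 7 and 8 of the palette, which is why the last two labels in the ridge regression example above do not match their lines.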
I'm using prcomp to do PCA analysis in R, I want to plot my PC1 vs PC2 with different color text labels for each of the two categories,
I do the plot with:
plot(pca$x, main = "PC1 Vs PC2", xlim=c(-120,+120), ylim = c(-70,50))
then to draw in all the text with the different colors I've tried:
text(pca$x[,1][1:18], pca$[,1][1:18], labels=rownames(cava), col="green",
adj=c(0.3,-0.5))
text(pca$x[,1][19:35], pca$[,1][19:35], labels=rownames(cava), col="red",
adj=c(0.3,-0.5))
But R seems to plot two numbers on top of each other instead of one. I know that pca$x[,1][1:18] selects the correct points, because if I use it to plot the points it works and produces the same plot as plot(pca$x).
It would be great if anyone could help me plot the labels for the two categories, or
even plot the points in different colors to make the two categories easy to tell apart.
You need to specify your x and y coordinates a bit differently:
text(pca$x[1:18,1], pca$x[1:18,2] ...)
This means take the first 18 rows and the first column (which is PC1) for the x coord, etc.
I'm surprised what you did doesn't throw an error.
If you want the points themselves colored, you can do it this way:
plot(pca$x, main = "PC1 Vs PC2", col = c(rep("green", 18), rep("red", 17)))
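For coloured text labels rather than coloured points, one option is to keep the group membership in a factor and index a colour vector with it, so a single text() call handles every category. This is a sketch on stand-in data: the original cava data is not available, so iris is used instead, and group and group_cols are illustrative names.

```r
# Stand-in data: the numeric iris columns, with Species as the grouping factor
pca <- prcomp(iris[, 1:4], scale. = TRUE)
group <- iris$Species
level_cols <- c("green", "red", "blue")
group_cols <- level_cols[as.integer(group)]  # one colour per row

# Empty plot first, then one label per row, coloured by group
plot(pca$x[, 1:2], type = "n", main = "PC1 vs PC2")
text(pca$x[, 1], pca$x[, 2], labels = rownames(iris),
     col = group_cols, cex = 0.7)
legend("topright", legend = levels(group), text.col = level_cols, bty = "n")
```

Because labels and col both have one entry per row, each label lands on its own point; passing the full set of row names to a text() call that only covers a subset of rows is one way labels end up mismatched.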
I have produced a figure with 4 density plots as shown below, using sm.density.compare in R:
How can I change the colour of the lines? Also of interest is how I can change the type of the line? i.e. dotted, solid
You can use the same parameters as in other base plotting functions - col= for the colors and lty= for the line types.
library(sm)
y <- rnorm(100)
g <- rep(1:2, rep(50,2))
sm.density.compare(y, g,col=c("blue","black"),lty=c(4,6))