If I generate a ggplot by:
x <- rnorm( 10^3, mean=0, sd=1)
y <- rnorm( 10^3, mean=0, sd=1)
z=x^2+y^2
df <- data.frame(x,y,z)
ggplot(df)+geom_point(aes(x,y,color=z))
By default, this is plotted on a blue scale. How can I combine different colors to make a new scale?
There is an almost limitless number of ways to set colors in ggplot, so many in fact that it can get confusing. Here are a couple of examples to get you started. See documentation here, here, and here for many more options. IMO this site gives an excellent overview of color options in ggplot.
As @rawr points out in the comment, the options all involve some version of scale_color_*.
scale_color_gradient(...) associates colors with low and high values of the color scale variable and interpolates between them.
ggp <- ggplot(df)+geom_point(aes(x,y,color=z))
ggp + scale_color_gradient(low="red", high="blue")
scale_color_gradientn(...) takes a color palette as argument (e.g., a vector of colors) and interpolates between those. Color palettes can be defined manually or using one of the many tools in R. For example, the RColorBrewer package provides access to the color schemes on www.colorbrewer.org.
library(RColorBrewer) # for brewer.pal(...)
ggp + scale_color_gradientn(colours=rev(brewer.pal(9,"YlOrRd")))
library(colorRamps) # for matlab.like(...)
ggp + scale_color_gradientn(colours=matlab.like(10))
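To combine arbitrary colours into a new scale, as the question asks, the colour vector can also be written by hand. The sketch below is only an illustration: the three anchor colours and the positions passed to `values` are arbitrary choices, not anything from the original example. `values` places each colour at a position between 0 and 1 along the scale, so the anchors need not be evenly spaced.

```r
library(ggplot2)

set.seed(1)  # only so the sketch is reproducible
x <- rnorm(10^3)
y <- rnorm(10^3)
df <- data.frame(x, y, z = x^2 + y^2)

# Three hand-picked anchor colours (illustrative); scale_color_gradientn()
# interpolates between them. `values` puts "gold" a quarter of the way
# along the scale instead of in the middle.
p <- ggplot(df) +
  geom_point(aes(x, y, color = z)) +
  scale_color_gradientn(colours = c("navy", "gold", "firebrick"),
                        values = c(0, 0.25, 1))
print(p)
```

Dropping `values` spaces the anchor colours evenly, which is the behaviour of the RColorBrewer and colorRamps calls above.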
scale_color_gradient2(...) produces a divergent color scale, designed for data that has a natural midpoint (your example doesn't...).
ggp +
scale_color_gradient2(low="blue",mid="green",high="red",midpoint=5,limits=c(0,10))
This really just scratches the surface. For example, there is another set of tools in ggplot to deal with discrete color scales.
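As a rough illustration of the discrete case, one possibility is to bin the continuous variable and use a ColorBrewer palette through scale_color_brewer(...). The break points and the column name `z_bin` below are assumptions made for this sketch.

```r
library(ggplot2)

set.seed(1)  # only so the sketch is reproducible
x <- rnorm(10^3)
y <- rnorm(10^3)
df <- data.frame(x, y, z = x^2 + y^2)

# Bin the continuous variable into 4 classes (break points are arbitrary)
df$z_bin <- cut(df$z, breaks = c(0, 1, 2, 4, Inf), include.lowest = TRUE)

# A factor on the colour aesthetic gets a discrete scale
p <- ggplot(df) +
  geom_point(aes(x, y, color = z_bin)) +
  scale_color_brewer(palette = "YlOrRd")
print(p)
```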
I'm trying to create a picture with points (actually bars, but whatever) in two distinct colours with parallel saturated-to-unsaturated colour scales, with corresponding colourbar legends. I'm most of the way there, but there are a few minor points I can't handle yet.
tl;dr the color scales I get from a red-to-white gradient and a saturated-red-to-completely-unsaturated gradient are not identical.
Set up data: y will determine both y-axis position and degree of saturation, w will determine binary colour choice.
set.seed(101)
dd <- data.frame(x=1:100,y=rnorm(100))
dd$w <- as.logical(sample(0:1,size=nrow(dd),
replace=TRUE))
Get packages:
library(ggplot2)
library(cowplot)
library(gridExtra)
I can get the plot I want by allowing alpha (transparency) to vary with y, but the legend is ugly:
g0 <- ggplot(dd,aes(x,y))+
geom_point(size=8,aes(alpha=y,colour=w))+
scale_colour_manual(values=c("red","blue"))
## + scale_alpha(guide="colourbar") ## doesn't work
I can draw each half of the points by themselves to get a legend similar to what I want:
g1 <- ggplot(dd[!dd$w,],aes(x,y))+
geom_point(size=8,aes(colour=y))+
scale_colour_gradient(low="white",high="red",name="not w")+
expand_limits(x=range(dd$x),y=range(dd$y))
g2 <- ggplot(dd[dd$w,],aes(x,y))+
geom_point(size=8,aes(colour=y))+
scale_colour_gradient(low="white",high="blue",name="w")+
expand_limits(x=range(dd$x),y=range(dd$y))
Now I can use tools from cowplot to pick off the legends and combine them with the original plot:
g1_leg <- get_legend(g1)
g2_leg <- get_legend(g2)
g0_noleg <- g0 + theme(legend.position='none')
ggdraw(plot_grid(g0_noleg,g1_leg,g2_leg,nrow=1,rel_widths=c(1,0.2,0.2)))
This is most of the way there, but:
ideally I'd like to squash the two colourbars together (I know I can probably do that with sufficient grid-hacking ...)
the colours don't quite match; the legend colours are slightly warmer than the point colours ...
Ideas? Or other ways of achieving the same goal?
I am trying to build a type of color density plot similar to the one here:
https://stats.stackexchange.com/questions/26676/generating-visually-appealing-density-heat-maps-in-r
But with a different type of data going into it. My real data has many more rows, but as an example the code below builds a data frame of X, Y, and Score, and I want a color density plot using these static X, Y buckets. Is that possible?
X=seq(0,10,by=1)
Y=seq(50,60,by=1)
total=expand.grid(X,Y)
nrow(total)
total$score=runif(nrow(total), min=0, max=100)
range(total$score)
head(total)
my_palette <- colorRampPalette(c("blue", "yellow", "red"))(n = 100)
col_breaks = c(seq(0,100,length=100))
col=data.frame(as.character(my_palette),col_breaks)
col$num=row.names(col)
head(col)
col$col_breaks=round(col$col_breaks,0)
names(col)[1]="hex"
total$round=round(total$score)
total$color=as.character(col$hex[match(total$round,col$col_breaks)])
plot(total$Var1,total$Var2,col=total$color,xlim=c(0,10),ylim=c(50,60))
I am not trying to hexbin or otherwise confine the data into boxes (I figured that out using conditional rect() calls with colors). I am wondering whether, with this type of data, the heat can take a more free-flowing shape, similar to this:
Or does it need to be continuous data to do something like that?
If I understand your question correctly, I think you can do this in ggplot.
Basically you can use geom_raster to fill in the tiles with an interpolate option so it won't look "blocky". You can then set the gradient to what you want. So for example, based on the sample data you gave me I have set the low, mid, high colours to be blue, white and red respectively. It would simply be the following code:
library(ggplot2)
ggplot(total, aes(x=Var1, y=Var2)) +
geom_raster(aes(fill=score), interpolate=TRUE) +
scale_fill_gradient2(limits=c(0,100), low="blue", mid="white", high="red", midpoint = 50)
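If you would rather keep the blue-yellow-red ramp from the question, a variant along the same lines might use scale_fill_gradientn instead. This is only a sketch: it rebuilds the toy grid from the question, and it names the grid columns `X` and `Y` rather than the default `Var1` and `Var2`.

```r
library(ggplot2)

# Same kind of toy grid as in the question, with named columns
set.seed(1)  # only so the sketch is reproducible
total <- expand.grid(X = 0:10, Y = 50:60)
total$score <- runif(nrow(total), min = 0, max = 100)

# interpolate = TRUE smooths between cells; the three colours are
# interpolated across the 0-100 score range
p <- ggplot(total, aes(x = X, y = Y)) +
  geom_raster(aes(fill = score), interpolate = TRUE) +
  scale_fill_gradientn(colours = c("blue", "yellow", "red"),
                       limits = c(0, 100))
print(p)
```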
Let's say I have this data.frame:
df <- data.frame(x = rep(1, 20), y = runif(20, 10, 20))
and I want to plot df$y vs. df$x.
Since the x values are constant, points that have identical or close y values will be plotted on top of each other in a simple scatterplot, which kind of hides the density of points at such y-values. One solution for that situation is of course to use a violin plot.
I'm looking for another solution - plotting clusters of points instead of the individual points, which will therefore look similar to a bubble plot. In a bubble plot however, a third dimension is required in order to make the bubbles meaningful, which I don't have in my data. Does anyone know of an R function/package that takes points as input (and probably a defined radius), clusters them, and plots them?
You can jitter the x values:
plot(jitter(df$x),df$y)
You could try a hexbin plot, using either the hexbin package or stat_binhex in ggplot2.
http://cran.r-project.org/web/packages/hexbin/
http://docs.ggplot2.org/0.9.3/stat_binhex.html
The other standard approach (vs. jitter) is to use a partially transparent color, so that overlapping points will appear darker than "lone" points.
De gustibus, etc.
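In base graphics, the transparency idea might look like the sketch below. The alpha level of 0.2 and the name `col_transp` are arbitrary choices for illustration.

```r
set.seed(1)  # only so the sketch is reproducible
df <- data.frame(x = rep(1, 20), y = runif(20, 10, 20))

# adjustcolor() adds an alpha channel to a named colour;
# alpha.f = 0.2 is an arbitrary choice
col_transp <- adjustcolor("black", alpha.f = 0.2)

# pch = 16 is a filled circle, so overlapping points build up darker
plot(df$x, df$y, pch = 16, cex = 2, col = col_transp)
```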
Using transparency is another solution. E.g.:
ggplot(df, aes(x=x, y=y)) +
geom_point(alpha=0.2, size=3)
When there is only one x value, a density plot:
ggplot(df, aes(x=y)) +
stat_density(geom="line")
or a violin plot:
ggplot(df, aes(x=x, y=y)) +
geom_violin()
might also be options for displaying your data.
Look at the sunflowerplot function (and the xyTable function that it uses to count overlapping points).
You could also use the my.symbols function from the TeachingDemos package with the results of xyTable to use other shapes (polygrams, for example).
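A minimal base-R sketch of that idea follows. Rounding y to whole numbers stands in for the "radius" mentioned in the question and is purely an assumption here, as is the name `y_binned`.

```r
set.seed(1)  # only so the sketch is reproducible
df <- data.frame(x = rep(1, 20), y = runif(20, 10, 20))

# Round y so that nearby points fall on the same value
# (whole-number rounding is an arbitrary choice of "radius")
y_binned <- round(df$y)

# xyTable() counts how many points share each (x, y) location
tab <- xyTable(df$x, y_binned, digits = 6)

# Sunflower plot: overlapping points are drawn as petals
sunflowerplot(df$x, y_binned)

# Bubble-style alternative: circle area proportional to the count
symbols(tab$x, tab$y, circles = sqrt(tab$number), inches = 0.2)
```

The `number` component of the xyTable result gives the bubbles their size, which supplies the third dimension the question was missing.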
I'm looking at behavior of different groups of people (called Clusters in this data set) and their preference for the type of browser they use. I want to create a bar graph that shows the percentage of each cluster that is using each type of browser.
Here is some code to generate a similar dataset (please ignore that the percentages for each cluster will not add up to 1):
browserNames <- c("microsoft","mozilla","google")
clusterNames <- c("Cluster 1","Cluster 2","Cluster 3")
percentages <- runif(n=length(browserNames)*length(clusterNames),min=0,max=1)
myData<-as.data.frame(list(browserNames=rep(browserNames,3),
clusterNames=rep(clusterNames,each=3),
percentages=percentages))
Here's the code I've been able to come up with so far to get the graph I desire:
ggplot(myData, aes(x=browserNames, y=percentages, fill=factor(clusterNames))) +
geom_bar(stat="identity",position="dodge") +
scale_y_continuous(name="Percent Weight", labels=scales::percent)
I want the fill for each cluster to be a gradient fill with high and low values that I determine. So, in this example, I would like to be able to set 3 high and low values for each cluster that is represented.
I've had trouble with the different scale_fill commands, and I'm new enough to ggplot that I am pretty sure I'm probably just doing it wrong. Any ideas?
Edit: Here is a picture of what I'm looking for:
(Original image available at https://www.dropbox.com/s/py6hifejqz7k54v/gradientExample.bmp)
Is this close to what you had in mind??
# color set depends on browser
library(RColorBrewer) # for brewer.pal(...)
gg <- with(myData, myData[order(browserNames,percentages),])
gg$colors <- 1:9
colors <- c(brewer.pal(3,"Reds"),brewer.pal(3,"Greens"),brewer.pal(3,"Blues"))
ggplot(gg, aes(x=browserNames, y=percentages,
fill=factor(colors), group=factor(clusterNames))) +
geom_bar(stat="identity",position="dodge", color="grey70") +
scale_fill_manual("Cluster", values=colors,
breaks=c(3,6,9), labels=c("Google","Microsoft","Mozilla"))
# color set depends on cluster
library(RColorBrewer) # for brewer.pal(...)
gg <- with(myData, myData[order(clusterNames,percentages),])
gg$colors <- 1:9
col <- c(brewer.pal(3,"Reds"),brewer.pal(3,"Greens"),brewer.pal(3,"Blues"))
ggplot(gg, aes(x=browserNames, y=percentages,
fill=factor(colors), group=factor(clusterNames))) +
geom_bar(stat="identity",position="dodge", color="grey70") +
scale_fill_manual("Cluster", values=col,
breaks=c(3,6,9), labels=c("Cluster1","Cluster2","Cluster3"))
I have an R data frame (df), which I am plotting as a bar graph in ggplot2 and coloring based on a column in the data frame (df$type). Right now, I am using the default coloring pattern (scale_fill_brewer) to assign colors.
How can I assign the color black to one value (df$type == -1) and use scale_fill_brewer to assign the rest of the colors? (All other df$type values are within a set of integers from 1 to X, where X is the number of unique values.)
So far, I have been able to do this manually by figuring out the set of colors scale_fill_brewer uses for N different items, then prepending the color black and passing that to scale_fill_manual.
rhg_cols1<- c("#000000","#F8766D","#7CAE00","#00BFC4","#C77CFF" )
ggplot(y=values,data=df, aes(x=name, fill=factor(type))) +
geom_bar()+ scale_fill_manual(values = rhg_cols1)
The problem is that I need a solution that works without manually assigning colors, i.e. without using a hex color calculator to figure out the hex values of scale_fill_brewer.
something like:
ggplot(y=values,data=df, aes(x=name, fill=factor(type))) +
geom_bar()+ scale_fill_brewer(value(-1, "black")
Thank you!
EDIT: The solution must work for more than 30 colors and work for "Set2" of ColorBrewer
The package RColorBrewer contains the palettes and you can use the function brewer.pal to return a colour palette of your choice.
For example, a sequential blue palette of 5 colours:
library(RColorBrewer)
my.cols <- brewer.pal(5, "Blues")
my.cols
[1] "#EFF3FF" "#BDD7E7" "#6BAED6" "#3182BD" "#08519C"
You can get a list of valid palette names in the ?brewer.pal help files. These names correspond with the names at the ColorBrewer website.
You can now use or modify the results and pass these to ggplot using scale_fill_manual as you suggested:
my.cols[1] <- "#000000"
library(ggplot2)
df <- data.frame(x=1:5, type=1:5)
ggplot(df, aes(x=x, fill=factor(type))) +
geom_bar()+
scale_fill_manual(values = my.cols)
If you need to distinguish among this many (30+) different categories you probably need to back up and spend some more time thinking about the project strategically: it will be nearly impossible to come up with a set of 30 colo(u)rs that are actually distinguishable (especially in a way that is independent of platform/rendering channel).
There is basically no solution that will work with Set2 and 30+ colours. Some of the CB palettes (Set3 and Paired; library(RColorBrewer); display.brewer.all(n=12)) allow as many as 12 colours.
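If Set2 really is required, one workaround is to interpolate between its 8 colours with colorRampPalette. This is a sketch under that assumption, not a recommendation: the interpolated colours will be much harder to tell apart than the originals, and `set2_cols` is a hypothetical helper name.

```r
library(RColorBrewer)

# Set2 tops out at 8 colours
max_n <- brewer.pal.info["Set2", "maxcolors"]

# Hypothetical helper: black first, then n - 1 colours taken from Set2,
# interpolating between the 8 originals when more are requested
set2_cols <- function(n) {
  base <- brewer.pal(max_n, "Set2")
  rest <- if (n - 1 <= max_n) {
    base[seq_len(n - 1)]
  } else {
    colorRampPalette(base)(n - 1)
  }
  c("#000000", rest)
}

cols <- set2_cols(31)   # 1 black + 30 interpolated colours
```

Because -1 sorts first among the levels of factor(type), passing the result to scale_fill_manual(values = ...) should map black to that level.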
edit: the OP wants to do exploratory data analysis with good, distinguishable colours that won't break if there happen to be a lot of categories. I would suggest something along these lines:
library(RColorBrewer)
my.cols <- function(n) {
black <- "#000000"
if (n <= 9) {
c(black,brewer.pal(n-1, "Set2"))
} else {
c(black,hcl(h=seq(0,(n-2)/(n-1),
length=n-1)*360,c=100,l=65,fixup=TRUE))
}
}
library(ggplot2)
## one point per colour, so the number of factor levels matches the palette size
g <- function(n) {
ggplot(data.frame(z=1:n),aes(z,z,colour=factor(z)))+geom_point()+
theme(legend.position="none")
}
g(9) + scale_colour_manual(values=my.cols(9))
g(10) + scale_colour_manual(values=my.cols(10))
## check that we successfully recreated ggplot2 internals
## g(10)+scale_colour_discrete()
I think this works reasonably well (you could substitute Set3 and a cutoff of 13 colours if you preferred). The only drawback (that I can think of) is the discontinuity between the plots with 9 and 10 colours.
Coming up with a better solution for picking sets of N distinguishable colours in a programmatic way is going to be pretty hard ...