Increasing contrast of ggplot2 scale_colour_gradient in R?

I use scale_colour_gradient/scale_colour_gradient2 to colour the points of a scatterplot along a gradient. The gradient runs from red to dark red, or from black to red, as in:
ggplot(iris) + geom_point(aes(x=Sepal.Width, y=Sepal.Length, colour=Sepal.Length)) + scale_colour_gradient(low="red", high="darkred")
I often put scale_colour_gradient on a log scale, since the variable represents ratios. My question is: how can I increase the contrast along the scale, i.e. make distinct parts of the scale easier to tell apart? The scale is always continuous in my case (real numbers). Any relevant pointers would help.

It seems black-to-red or red-to-darkred gives you very little color space to work with. You can use hex codes to assign more specific colors to your low and high settings, and add a black background to improve contrast. For instance:
+ scale_colour_gradient2(low="#22FF00", mid="white", high="#FF0000", midpoint=median(iris$Sepal.Length)) + theme(panel.grid=element_blank(), panel.background=element_rect(fill="black"))
gives you much more contrast. Note that I am using scale_color_gradient2, which allows you to set a midpoint color and ascribe it to a summary statistic of the data (here, I used the median). I also used two colors at relatively opposite ends of the spectrum. Adding the above to your code produces:
But aside from playing around with the specific colors until you're satisfied (http://www.rapidtables.com/web/color/RGB_Color.htm and iwanthue are good resources for picking colors), I don't know of a way to set a gradient so that contrast is maximized throughout without creating some ungodly complex rainbow of colors. As you probably know, on a linear gradient the color difference between any two points in your data is directly proportional to the difference between their values, so varying that relationship across different parts of your gradient probably isn't desirable, and to my knowledge it is not possible in ggplot2.
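For what it's worth, ggplot2 does have one knob for uneven contrast: scale_colour_gradientn() takes a values argument giving the position (on the data range rescaled to 0-1) of each colour stop, so you can crowd stops where your data are dense. A minimal sketch on iris that places the stops at the quartiles of Sepal.Length; the five colours and the quartile choice are arbitrary assumptions, not a recommended palette:

```r
library(ggplot2)

# Colour stops at the min, quartiles and max of the variable being mapped
x <- iris$Sepal.Length
stops <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))

# `values` expects positions on the rescaled 0-1 range of the data
stop_pos <- unname((stops - min(x)) / (max(x) - min(x)))

p <- ggplot(iris) +
  geom_point(aes(x = Sepal.Width, y = Sepal.Length, colour = Sepal.Length)) +
  scale_colour_gradientn(
    colours = c("#22FF00", "yellow", "white", "orange", "#FF0000"),
    values  = stop_pos
  )
# print(p) to draw the plot
```

Each quarter of the data gets its own stretch of the colour range, which spreads out dense regions much like the rank trick, while the legend stays in the original units.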
EDIT: Another way to improve contrast is to color by the rank order of your variable (in this example, Sepal.Length) instead of the variable itself. This produces a uniform distribution, which "spreads out" your data by placing quantiles at equal distances. HOWEVER, this may produce a misleading visualization: if your data are highly skewed, identical or near-identical values can end up with fairly contrasting colors. So use with caution.
Compare with above:
iris <- iris[with(iris, order(Sepal.Length)),]
iris$rank <- 1:150
ggplot(iris) + geom_point(aes(x=Sepal.Width, y=Sepal.Length, colour=rank)) + scale_colour_gradient2(low="#22FF00", mid="white", high="#FF0000", midpoint=median(iris$rank)) + theme(panel.grid=element_blank(), panel.background=element_rect(fill="black"))
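If exact ties are a concern, one small variation (a suggestion, not part of the answer above) is to use rank() instead of 1:150: with ties.method = "average", identical Sepal.Length values share a single rank and therefore a single colour. It does not help with near-identical values, so the caveat above still applies.

```r
library(ggplot2)

# Average ranks: tied values get the same rank, hence the same colour
iris_ranked <- iris
iris_ranked$rank <- rank(iris_ranked$Sepal.Length, ties.method = "average")

# Sanity check: each distinct Sepal.Length value maps to exactly one rank
ranks_per_value <- tapply(iris_ranked$rank, iris_ranked$Sepal.Length,
                          function(r) length(unique(r)))

p <- ggplot(iris_ranked) +
  geom_point(aes(x = Sepal.Width, y = Sepal.Length, colour = rank)) +
  scale_colour_gradient2(low = "#22FF00", mid = "white", high = "#FF0000",
                         midpoint = median(iris_ranked$rank)) +
  theme(panel.grid = element_blank(),
        panel.background = element_rect(fill = "black"))
```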
Also, I realize that red-to-green is the least colorblind-safe choice of colors possible, so you will want to choose colors so that your spectrum doesn't rely on telling red from green.
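One colour-blind-friendly option, assuming ggplot2 3.0.0 or later, is the built-in viridis family: scale_colour_viridis_c() maps a continuous variable onto a perceptually uniform palette that avoids a red/green contrast. A sketch on the same iris example:

```r
library(ggplot2)

p <- ggplot(iris) +
  geom_point(aes(x = Sepal.Width, y = Sepal.Length, colour = Sepal.Length)) +
  # option = "D" is viridis itself; "A" (magma), "B" (inferno), "C" (plasma) also exist
  scale_colour_viridis_c(option = "D")

# Colours actually assigned to the points, as hex strings
point_cols <- layer_data(p)$colour
```

Viridis runs from dark purple to yellow, so it suits the default grey panel; on the black background used above its darkest end would be hard to see.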

Related

How to customize bin graph in R

I am trying to customize some of the graphs I am creating with R. The code I'm using looks something like this:
c1<-runif(5000, min=0, max=100)
c2<-sample(1:12, 5000, replace=TRUE)
dd<-data.frame(c1,c2)
g<-ggplot(dd, aes(c2, c1)) +
geom_bin2d(binwidth=c(2,5)) +
scale_fill_gradientn(colors=c("yellow", "red", "black")) +
xlim(-1,13) + ylim(-5,55)
g
How can I make the graph span exactly 0 to 12 on the x axis and 0 to 50 on the y axis? (I tried setting these values, but that cut off part of the bins, and the huge grey border around the graph is very ugly.) Essentially I need a white background (which I know how to do) with the axes in black and adjacent to the actual plot.
Is there also a way to "smooth" the color gradient between each bin? The only way I know how to do it is reducing the number of bins but I need exactly those numbers.
Try this:
g + scale_x_continuous(expand=expansion()) + scale_y_continuous(expand=expansion())
Check the documentation for scale_x_continuous, in particular the sections for expand and expansion:
expand: For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function ‘expansion()’ to generate the values for the ‘expand’ argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.
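An alternative sketch for the limits part of the question, assuming ggplot2 (theme_classic() is my choice for the white background with black axis lines): zoom with coord_cartesian() instead of xlim()/ylim(). Limits set through xlim()/ylim() discard observations outside the range before binning, which is what cuts the edge bins; coord_cartesian() only zooms the view, and expand = FALSE removes the padding. The set.seed() call is added only to make the example reproducible.

```r
library(ggplot2)

set.seed(1)  # reproducible example data
c1 <- runif(5000, min = 0, max = 100)
c2 <- sample(1:12, 5000, replace = TRUE)
dd <- data.frame(c1, c2)

g <- ggplot(dd, aes(c2, c1)) +
  geom_bin2d(binwidth = c(2, 5)) +
  scale_fill_gradientn(colours = c("yellow", "red", "black")) +
  # zoom without dropping data, so edge bins are computed from all points;
  # expand = FALSE removes the default padding around the panel
  coord_cartesian(xlim = c(0, 12), ylim = c(0, 50), expand = FALSE) +
  # white background with black axis lines along the panel edges
  theme_classic()
```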

Coloring scatter plot in terms of an intensity in R

We would like to explain a continuous variable Y in terms of X1,X2,X3,X4,X5 (continuous grades from 0/20 to 20/20).
When plotting Y vs. X1, I would like to color the points in terms of the means (X1+X2+X3+X4+X5)/5 to see if the candidates that are bad in X1 are globally bad.
So a mean of 0 would be blood red, shading gradually to bright green for a mean of 20 (and this should also be indicated in a legend). How could one proceed?
Here is how I usually draw my scatter plots:
scatter.smooth(x=data$X1, y=data$Y1, main="Y1 ~ X1", xlab="X1", ylab="Y1")
Even better would be that each point is colored in terms of its X1 value, and has a colored "ring" around it corresponding to the color of its mean.
If you have no objection to ggplot, it'll make these things easy.
To shade by a continuous variable, aes(color=myvar) has the behavior you want straight out of the box.
Customize the colors with +scale_color_gradient(low='red', high='green')
To do the rings, draw two sets of points: first one with size 3 (or whatever) in the ring color, then dot the centers with a point of size 1.
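A sketch of that two-layer idea, using simulated grades because the original data is not shown (the data frame grades and its column mean_grade are hypothetical). Because X1 and the mean both live on the same 0-20 scale, one colour scale with fixed limits can serve both the rings and the centres, and its legend doubles as the key:

```r
library(ggplot2)

# Hypothetical data: five grades on a 0-20 scale and an outcome Y
set.seed(42)
n <- 100
grades <- as.data.frame(matrix(runif(n * 5, 0, 20), ncol = 5,
                               dimnames = list(NULL, paste0("X", 1:5))))
grades$Y <- grades$X1 + rnorm(n)
grades$mean_grade <- rowMeans(grades[, paste0("X", 1:5)])

p <- ggplot(grades, aes(x = X1, y = Y)) +
  # outer ring: a large point coloured by the candidate's mean grade
  geom_point(aes(colour = mean_grade), size = 4) +
  # centre dot: a smaller point drawn on top, coloured by X1 itself
  geom_point(aes(colour = X1), size = 1.5) +
  # both variables share the 0-20 range, so one scale covers both layers
  scale_colour_gradient(low = "red", high = "green", limits = c(0, 20),
                        name = "Grade (0-20)") +
  # smoothed trend, similar in spirit to scatter.smooth()
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, colour = "black")
```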

Rescaling a color palette in R

In R I have a cloud of data around zero and some data around 1, and I want to "rescale" my heat colors to better distinguish the lower numbers. This has to be a smooth rainbow-style ramp; I don't want "discrete colors". I tried using breaks in image.plot, but it doesn't work.
image.plot(X,Y,as.matrix(mymatrix),col=heat.colors(800),asp=1,scale="none")
I tried:
lowerbreak=seq(min(values),quantile2,len=80)
highbreak=seq(quantile2+0.0000000001,max(values),len=20)
brks=c(lowerbreak,highbreak) # "break" is a reserved word in R and cannot be used as a variable name
ii <- cut(values, breaks = brks,
include.lowest = TRUE)
colors <- colorRampPalette(c("lightblue", "blue"))(99)[ii]
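For comparison, the uneven-breaks idea itself can work with base R's image(), which accepts a breaks argument as long as there is exactly one more break than colours and the breaks are strictly increasing. A minimal sketch with made-up data (values, mymatrix and the 95% split point are all assumptions), using 80 narrow bins below the split and 20 wide ones above it:

```r
set.seed(1)
# Hypothetical data: a dense cloud near 0 plus a few values near 1
values <- c(abs(rnorm(2400, 0, 0.02)), runif(100, 0.9, 1))
mymatrix <- matrix(values, nrow = 50)

q <- quantile(values, 0.95)                             # assumed split point
lowerbreak <- seq(min(values), q, length.out = 81)      # 80 narrow bins up to q
highbreak  <- seq(q, max(values), length.out = 21)[-1]  # 20 wide bins above q
brks <- c(lowerbreak, highbreak)                        # 101 increasing breaks

pal <- heat.colors(length(brks) - 1)                    # one colour per bin
image(mymatrix, breaks = brks, col = pal, asp = 1)
```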
Here's an approach using the "squash" package. With makecmap(), you specify your colour values and breaks, and you can also ask for a log stretch using the base parameter. It's a bit complex, but gives you granular control. I use it to colorize skewed data where I need more definition at the "low end".
To achieve the rainbow palette, I used the built-in "jet" colour function, but you can use any colour set - I give an example for creating a greyscale ramp with "colorRampPalette".
Whatever ramp you use, it will take some playing with the base value to optimize for your data.
install.packages("squash")
library("squash")
#choose your colour thresholds - outliers will be RED
minval=0 #lowest value to get a colour
maxval=2.0 #highest value to get a colour
n.cols=100 #how many colours do you want in your palette?
col.int=1/n.cols
#create your palette
colramp=makecmap(x=seq(minval,maxval,col.int),
n=n.cols,
breaks=prettyLog,
symm=F,
base=10,#to give ramp a log(base) stretch
colFn=jet,
col.na="red",
right=F,
include.lowest=T)
# If you don't like the colFn options in "makecmap", define your own!
# Here's an example in greyscale; pass this to "colFn" above
user.colfn=colorRampPalette(c("black","white"))
Example for using colramp in a plot (assuming you've already created colramp as above somewhere in your program):
varx=1:100
vary=1:100
plot(varx,vary,col=colramp$colors) #colors is the 2nd element of the colramp list
To select specific colours, subset the colors element, e.g. colramp$colors[1:20] (if you plot the 100 points above with only these 20 colours, each colour is recycled 5 times; not really useful, but you get the logic and can play around).
In my case, I had a grid of values that I wanted to turn into a coloured raster image (i.e. colour mapping some continuous data). Here's example code for that, using a made up matrix:
#create a "dummy matrix"
matx=matrix(data=c(rep(2,50),rep(0,500),rep(0.5,500),rep(1,500),rep(1.5,500)),nrow=50,ncol=41,byrow=F)
#transpose the matrix
# the output of "savemat" is rotated 90 degrees to the left
# so savemat(maty) will be a colorized version of (matx)
maty=t(matx)
#savemat creates an image using colramp
savemat(x=maty,
filename="/Users/KeeganSmith/Desktop/matx.png",
map=colramp,
outlier="red",
dev="png",
do.dev.off=T)
When using colorRampPalette, you can set the bias argument to emphasise low (or high) values.
Something like colorRampPalette(heat.colors(100),bias=3) will focus the 'ramp' on the lower values, making them more visually distinguishable.
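To see what bias does, here is a small base-R comparison; the helper low_end_spread() and the skewed matrix z are hypothetical, for illustration only. The helper measures how far, in RGB space, the first ten colours of a palette travel: with bias = 3 the low end of the ramp sweeps through much more of the heat.colors range, which is what makes small values easier to tell apart.

```r
# 100-colour palettes built from heat.colors(), unbiased and with bias = 3
base_pal   <- colorRampPalette(heat.colors(100))(100)
biased_pal <- colorRampPalette(heat.colors(100), bias = 3)(100)

# Total RGB distance between consecutive colours among the first k of a palette
low_end_spread <- function(pal, k = 10) {
  rgb_mat <- t(col2rgb(pal[seq_len(k)]))   # k x 3 matrix of 0-255 values
  sum(sqrt(rowSums(diff(rgb_mat)^2)))      # sum of step-to-step distances
}

low_end_spread(base_pal)
low_end_spread(biased_pal)

# Hypothetical skewed data, mostly close to 0
z <- outer(1:50, 1:50, function(i, j) exp(-(i + j) / 10))
image(z, col = biased_pal)
```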

Geom line plotting with gradient intensity based peak concentrations

I have some data of "heartbeats" - measured in amps - (let's say) that beat over a period of time (seconds). Here are some lines:
time amps
5.32632 0.0291289784
5.334 0.0271881307
5.33424 0.0463933055
5.33624 0.0149292168
5.33888 0.0668341603
5.33924 0.0384420334
5.3402 0.028831443
5.34036 0.0386542207
5.34052 0.0146365606
5.34136 0.0374055127
5.3414 0.0544995649
5.34168 0.0342488711
5.34184 0.0197212594
5.34212 0.2039598122
5.34232 0.0565000587
5.34236 0.0332496556
5.34256 0.0346007892
5.3426 0.0325735156
5.343 0.0317928565
5.34316 0.034084553
5.3438 0.0875207643
5.34436 0.0356283179
5.34452 0.0306993392
5.34456 0.0288807644
5.3448 0.0165046742
5.34504 0.0282299051
5.3452 0.0533351795
5.3458 0.05287876
5.346 0.1192851075
5.346 0.0318748452
5.34648 0.022514099
5.34652 0.0295305232
These heart beats peak at a certain frequency and with different intensities, followed by moments of rest. I'm attaching a plot I made with ggplot that shows all the data.
My question: I want to make a color gradient (visually in ggplot) based on peak clusters, so that the more peaks are clustered together, the darker they appear in the plot, and areas with fewer peaks appear lighter. A key with this gradient in the top right/left corner would also be nice.
I used geom_line to make the plot; here's my code:
beats <- read.csv("beat_intesity.csv")
p <- ggplot(beats, aes(x=time, y=amps))
p + geom_line()
You can see some regions have more peaks than others. I'm not sure if this is a very easy problem. Thank you in advance.
The simplest measure of intensity I can think of is a running average (stats:: is spelled out because dplyr, if loaded, masks filter):
ma <- function(x, n=5){as.numeric(stats::filter(x, rep(1/n, n), sides=2))}
Take a reasonable threshold, say 0.1.
beats$peaks <- ma(beats$amps > 0.1)
Map the resulting variable to colour.
ggplot(beats, aes(x=time, y=amps, color=peaks)) + geom_line()
Play around with scale_colour_gradient to choose the desired colour palette.
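Putting the pieces together on the 32 sample rows posted in the question (only those rows, so the result is illustrative): the running average uses weights 1/n, so each value is the share of the surrounding five points above 0.1. The grey-to-black palette (darker where peaks cluster) and the legend placement are assumptions to adjust.

```r
library(ggplot2)

# The sample rows posted in the question
beats <- data.frame(
  time = c(5.32632, 5.334, 5.33424, 5.33624, 5.33888, 5.33924, 5.3402, 5.34036,
           5.34052, 5.34136, 5.3414, 5.34168, 5.34184, 5.34212, 5.34232, 5.34236,
           5.34256, 5.3426, 5.343, 5.34316, 5.3438, 5.34436, 5.34452, 5.34456,
           5.3448, 5.34504, 5.3452, 5.3458, 5.346, 5.346, 5.34648, 5.34652),
  amps = c(0.0291289784, 0.0271881307, 0.0463933055, 0.0149292168,
           0.0668341603, 0.0384420334, 0.028831443, 0.0386542207,
           0.0146365606, 0.0374055127, 0.0544995649, 0.0342488711,
           0.0197212594, 0.2039598122, 0.0565000587, 0.0332496556,
           0.0346007892, 0.0325735156, 0.0317928565, 0.034084553,
           0.0875207643, 0.0356283179, 0.0306993392, 0.0288807644,
           0.0165046742, 0.0282299051, 0.0533351795, 0.05287876,
           0.1192851075, 0.0318748452, 0.022514099, 0.0295305232)
)

# Centred running mean over n points (NA where the window runs off the ends)
ma <- function(x, n = 5) as.numeric(stats::filter(x, rep(1 / n, n), sides = 2))

# Share of the surrounding 5 points whose amplitude exceeds 0.1
beats$peaks <- ma(as.numeric(beats$amps > 0.1))

p <- ggplot(beats, aes(x = time, y = amps, colour = peaks)) +
  geom_line() +
  # light grey for isolated points, black where peaks cluster
  scale_colour_gradient(low = "grey80", high = "black", na.value = "grey80",
                        name = "Peak density") +
  # align the key with the top of the panel
  theme(legend.justification = "top")
```

On the full recording a wider window (larger n) is probably needed so that clusters, not single spikes, drive the colour.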

Geom_ribbon() just turns the graph blank

I have a data frame weekly.mean.values with the following structure:
week:mean:ci.lower:ci.upper
Where week is a factor; mean, ci.lower and ci.upper are numeric. For each week, there is only one mean, and one ci.lower or ci.upper.
I was trying to plot a shaded area inside of the 95% confidence interval around the mean, with the following code:
ggplot(weekly.mean.values,aes(x=week,y=mean)) +
geom_line() +
geom_ribbon(aes(ymin=ci.lower,ymax=ci.upper))
The plot, however, came out blank (that is only with x-axis and y-axis present, but no lines, or points, let alone shaded areas).
If I removed the geom_ribbon part, I did get a line. I know that this should be a very simple task but I don't know why I couldn't get geom_ribbon to plot what I wanted. Any hint would be truly appreciated.
I realize this thread is super old, but Google still finds it.
The answer is that you need to set ymin and ymax to values derived from the data you are plotting on the y-axis. If you set them to scalar values, the ribbon covers the entire plot from top to bottom.
You can use
ymin=0
ymax=mean
to go from 0 to your y-point or even
ymin=mean-1
ymax=mean+1
to have the ribbon cover a strip encompassing your actual data.
I may be missing something, but the ribbon is filled with grey20 by default. You are plotting this layer on top of the data, so no wonder it obscures it. It is also possible that the axis limits derived from the data provided to the initial ggplot() call will not be sufficient to contain the confidence interval ribbon; in that case, I would not be surprised to see a grey/blank plot.
To see if this is the problem, try altering your geom_ribbon() line to:
geom_ribbon(aes(ymin=ci.lower,ymax=ci.upper), alpha = 0.5)
which will plot the ribbon with transparency, which should show the data underneath if the problem is what I think it is.
If so, set the x and y limits to the range of the data +/- the confidence interval you wish to plot and swap the order of the layers (i.e. draw the line on top of the ribbon), and use transparency in the ribbon to show the grid through it.
From ggplot's docs for geom_ribbon (2.1.0):
For each continuous x value, geom_interval displays a y interval. geom_area is a special case of geom_ribbon, where the minimum of the range is fixed to 0.
In this case, x values cannot be factors for geom_ribbon. One solution would be to convert week from a factor to a numeric. e.g.
ggplot(weekly.mean.values,aes(x=as.numeric(week),y=mean)) +
geom_line() +
geom_ribbon(aes(ymin=ci.lower,ymax=ci.upper))
geom_line should handle the switch from factor to numeric without incident, although the X axis scale may display differently.
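Another option that keeps week as a factor, assuming the usual ggplot2 grouping rules: with a discrete x, each level forms its own group, so geom_line() and geom_ribbon() have only one point per group to connect and draw nothing. Mapping group = 1 puts all weeks in a single group. The sketch below uses a made-up weekly.mean.values, draws the ribbon first and makes it semi-transparent so the line stays visible:

```r
library(ggplot2)

# Hypothetical stand-in for weekly.mean.values, with week as a factor
weekly.mean.values <- data.frame(
  week = factor(paste0("W", 1:8), levels = paste0("W", 1:8)),
  mean = c(10, 12, 11, 14, 13, 15, 16, 15)
)
weekly.mean.values$ci.lower <- weekly.mean.values$mean - 2
weekly.mean.values$ci.upper <- weekly.mean.values$mean + 2

# group = 1 joins all factor levels into one line/ribbon
p <- ggplot(weekly.mean.values, aes(x = week, y = mean, group = 1)) +
  # ribbon first and semi-transparent, so the line is drawn on top of it
  geom_ribbon(aes(ymin = ci.lower, ymax = ci.upper), alpha = 0.3) +
  geom_line()
```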
