I'm doing multiple plots split by one variable and in each plot, colour code based on another variable.
library(data.table)
library(ggplot2)
set.seed(12345)
dates = seq(as.Date("2000-01-01"), as.Date("2016-01-01"), by = 1)
dd = data.table(date = dates, value = rnorm(length(dates)))
dd[, year := lubridate::year(date)]
dd[, c := cut(value, c(-Inf, -3, 3, Inf))]
for (thisyear in 2000:2015) {
ggplot(dd[year == thisyear]) +
geom_ribbon(aes(x = date, ymin = -Inf, ymax = Inf, fill = c), alpha = 0.1)
}
dd[, length(unique(c)), by = year]
year V1
1: 2000 1
2: 2001 2
3: 2002 2
4: 2003 3
5: 2004 3
....
Now the colours in different plots will be inconsistent, since not every year contains the same set of cut levels. Even worse, if one year has only (-Inf,-3] values (unlikely here, of course) and another year has only (3, Inf] values, both will be coloured red in their respective plots.
How can I specify that (-Inf,-3] is always blue and (-3,3] is always green?
One way to manually specify the colors is to create a column in your data frame that holds the plot color for each row.
For example:
# scatter plot
dd$color <- ifelse(dd$value <= 3, 'blue', 'green')
ggplot(dd, aes(date, value)) + geom_point(colour=dd$color)
# ribbon plot
thisyear <- '2001'
dd_year <- dd[year == thisyear,]
ggplot(dd_year, aes(date, group=color, colour=color)) +
geom_ribbon(aes(ymin=value - 1, ymax=value + 1, fill=color), alpha=0.5) +
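# The color column already holds literal colour names, so use them as-is.
# (values = unique(dd_year$color) depends on row order and can swap the two colours.)
scale_fill_identity() +
scale_colour_identity()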
This would result in all points <= 3 being colored blue, and the remaining ones green.
Not the most interesting example, perhaps, since only one data point gets colored green here, but it should look like this:
You can create a named vector of colors to pass to scale_fill_manual. This allows you to choose the colors of each group as well as ensuring that each plot has the same colors among groups.
colors = c("blue", "green", "red")
names(colors) = levels(dd$c)
colors
#> (-Inf,-3]    (-3,3]  (3, Inf]
#>    "blue"   "green"     "red"
Now the same plot, but with scale_fill_manual added.
for (thisyear in 2000:2015) {
print(ggplot(dd[year == thisyear]) +
geom_ribbon(aes(x = date, y = value, ymin = -Inf, ymax = Inf, fill = c), alpha = 0.1) +
scale_fill_manual(values = colors))
}
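To check that a named vector really pins each level to one colour, one can inspect the fill values ggplot2 computes for a subset in which only a single level occurs. This is a minimal sketch on simulated data; the names grp, sub, p and fills are made up for the illustration, and drop = FALSE is an optional extra that asks the scale to keep unused levels.

```r
library(ggplot2)

set.seed(12345)
value <- rnorm(1000)
grp <- cut(value, c(-Inf, -3, 3, Inf))

# Named vector: names are the factor levels, values are the colours
colors <- setNames(c("blue", "green", "red"), levels(grp))

# A subset in which only the middle level "(-3,3]" occurs
sub <- data.frame(x = 1:10, grp = grp[abs(value) < 3][1:10])

p <- ggplot(sub, aes(x, 1, fill = grp)) +
  geom_col() +
  scale_fill_manual(values = colors, drop = FALSE)

# Fill colours actually used by the layer
fills <- unique(unname(ggplot_build(p)$data[[1]]$fill))
print(fills)
```

If the names are matched as expected, the only fill used should be the one assigned to "(-3,3]", even though it is the second entry of the vector.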
I was recently attempting to plot some filled points with an additional white border around the standard black border. I was unable to find a posted solution that did not rely on custom point shapes but still plotted all components of each point together.
After far too much time, I came up with the following solution. This solution works by duplicating each point and plotting a slightly larger white point behind it. This is similar to the solutions listed here (R ggplot2: How to draw geom_points that have a solid color and a transparent stroke and are colored depending on color?). However, my solution allows both borders to overlap adjacent points rather than plotting one set of borders behind the other.
Reproducible Example:
## Create test dataframe from built in mpg data frame and add scatter
library(ggplot2) # provides the mpg data set and the plotting functions
test_mpg = mpg
test_mpg$displ = test_mpg$displ + runif(length(test_mpg$displ), min = 0, max = 0.5) # Add random values between 0 and 0.5 to test_mpg$displ
test_mpg$cty = test_mpg$cty + runif(length(test_mpg$displ), min = 0, max = 5) # Add random values between 0 and 5 to test_mpg$cty
### Add second border color
## Step 1: Create new dataset with duplicate rows.
rowind = seq(1,nrow(test_mpg)) # Create array of row indices
rowind2 = rep(rowind, each = 2, times = 1) # Create new array with doubled row indices (i.e. 1,2,3 becomes 1,1,2,2,3,3)
newdata = test_mpg[rowind2,] # Create new dataset with doubled rows
newdata$odd_row = seq_len(nrow(newdata)) %% 2 # Create column indicating even and odd rows. Odd rows = 1 and even rows = 0
## Step 2: Create columns with alternating aesthetics
# Create point size aesthetic column with alternating values
newdata$pointsize = NA # Create new column
newdata[newdata$odd_row == 1,]$pointsize = 2.0 # Set odd numbered rows to the desired outer point size. This is plotted behind the main point and determines the amount of secondary border visible.
newdata[newdata$odd_row == 0,]$pointsize = 1.5 # Set even numbered rows to the desired main point size
# Create point color aesthetic column with alternating values
newdata$pointcolor = NA # Create new column
newdata[newdata$odd_row == 1,]$pointcolor = "white" # Set odd numbered rows to the desired outer border color.
newdata[newdata$odd_row == 0,]$pointcolor = "black" # Set even numbered rows to the desired main point border color
## Step 3: Plot your data
# Plot your data with point size set to newdata$pointsize and point color set to newdata$pointcolor. These should be set outside of the aes()
test = ggplot(newdata, aes(displ, cty, fill = drv, shape = drv)) +
scale_fill_manual(values=c("green","blue","yellow")) +
scale_shape_manual(values= c(23, 24, 25)) +
geom_point(alpha = 1, size = newdata$pointsize, stroke = 0.75, color = newdata$pointcolor) +
theme_bw()
test
Plot from above code
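Steps 1 and 2 above can probably be written more compactly with rep(). The following is only a sketch of the same row-duplication idea, not a different method; n is a helper name introduced here, and the data is converted with as.data.frame() purely so that base-style indexing behaves predictably.

```r
library(ggplot2)  # for the mpg data set

test_mpg <- as.data.frame(mpg)
n <- nrow(test_mpg)

# Each row twice in a row: 1, 1, 2, 2, 3, 3, ...
newdata <- test_mpg[rep(seq_len(n), each = 2), ]

# First copy = larger white point (drawn first),
# second copy = main point with the black border
newdata$pointsize  <- rep(c(2.0, 1.5), times = n)
newdata$pointcolor <- rep(c("white", "black"), times = n)

head(newdata[, c("displ", "cty", "pointsize", "pointcolor")], 4)
```

The resulting newdata can be passed to the same ggplot() call as in Step 3.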
I am not at all sure that this is what you want, but the borders do not overlap.
The code below calls geom_point twice, which is what the question you link to tries to avoid.
suppressPackageStartupMessages({
library(ggplot2)
library(dplyr)
})
set.seed(2022)
mpg %>%
mutate(displ = displ + runif(length(displ), min = 0, max = 0.5),
cty = cty + runif(length(displ), min = 0, max = 5)) %>%
ggplot(aes(displ, cty, fill = drv, shape = drv)) +
geom_point(size = 2, color = "white", fill = "white") +
geom_point(size = 1.5, color = "black", stroke = 0.75) +
scale_fill_manual(values = c("green", "blue", "yellow")) +
scale_shape_manual(values = c(23, 24, 25)) +
theme_bw()
Created on 2022-06-19 by the reprex package (v2.0.1)
I have data (from Excel) with ranges on the y-axis (also calculated in Excel) and cell counts on the x-axis, and I would like to draw a horizontal line at a specific value within the ranges, like a reference line. I tried geom_hline(yintercept = 450), but I suspect that is naive and does not work that way for a number inside a range. Are there any better suggestions? :)
plot.new()
library(ggplot2)
d <- read.delim("C:/Users/35389/Desktop/R.txt", sep = "\t")
head(d)
d <- cbind(row.names(d), data.frame(d), row.names=NULL)
d
g <- ggplot(d, aes(d$CTRL,d$Bin.range))+ geom_col()
g + geom_hline(yintercept = 450)
First of all, have a look at my comments.
Second, this is how I suggest you proceed: don't calculate those ranges in Excel. Let ggplot do it for you.
Say, your data is like this:
df <- data.frame(x = runif(100, 0, 500))
head(df)
#> x
#>1 322.76123
#>2 57.46708
#>3 223.31943
#>4 498.91870
#>5 155.05416
#>6 107.27830
Then you can make a plot like this:
library(ggplot2)
ggplot(df) +
geom_histogram(aes(x = x),
boundary = 0,
binwidth = 50,
fill = "steelblue",
colour = "white") +
geom_vline(xintercept = 450, colour = "red", linetype = 2, size = 1) +
coord_flip()
We don't have your data, but the following data frame is of a similar structure:
d <- data.frame(CTRL = sample(100, 10),
Bin.range = paste(0:9 * 50, 0:9 * 50 + 49.9, sep = "-"))
The first thing to note is that your y axis does not have your ranges ordered correctly. You have 50-99.9 at the top of the y axis. This is because your ranges are stored as characters and ggplot will automatically arrange these alphabetically, not numerically. So you need to reorder the factor levels of your ranges:
d$Bin.range <- factor(d$Bin.range, d$Bin.range)
When you create your plot, don't use d$Bin.range, but instead just use Bin.range. ggplot knows to look for this variable in the data frame you have passed.
g <- ggplot(d, aes(CTRL, Bin.range)) + geom_col()
If you want to draw a horizontal line, your two options are to specify the y axis label at which you want to draw the line (i.e. yintercept = "400-449.9") or, which is what I suspect you want, use a numeric value of 9.5 which will put it between the top two values:
g + geom_hline(yintercept = 9.5, linetype = 2)
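The value 9.5 comes from the fact that a discrete axis places its levels at the integer positions 1, 2, 3, and so on, so each bin spans half a unit either side of its position. As a rough sketch, the position for an arbitrary numeric value could be computed from the bin labels like this; target, width, lower, i and pos are names introduced for the illustration, and a constant bin width of 50 is assumed.

```r
d <- data.frame(CTRL = sample(100, 10),
                Bin.range = paste(0:9 * 50, 0:9 * 50 + 49.9, sep = "-"))
d$Bin.range <- factor(d$Bin.range, levels = d$Bin.range)

target <- 450   # value on the original numeric scale
width  <- 50    # bin width (assumed constant)

# Lower edge of each bin, parsed from labels such as "400-449.9"
lower <- as.numeric(sub("-.*$", "", levels(d$Bin.range)))

# Bin i is centred on position i, so it spans i - 0.5 to i + 0.5
i   <- findInterval(target, lower)
pos <- i - 0.5 + (target - lower[i]) / width
pos
#> [1] 9.5
```

The result can then be used as g + geom_hline(yintercept = pos, linetype = 2).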
I want to create a line plot in ggplot2 in which the panel background color alternates between white and grey based on the X axis values.
In this case DOY is day of year, and I would like the background to switch at each day.
I included some basic sample code. Basically, I want DOY 1-2 to be white, DOY 2-3 to be grey, and so forth.
Any help is appreciated, thanks in advance.
DOY <- c(1, 2, 3, 4, 5)
Max <- c(200, 225, 250, 275, 300)
sample <- data.frame(DOY, Max)
ggplot()+
geom_line(data=sample, aes(x=DOY, y=Max), color = "black")
One way to approach this is to add a new variable (called e.g. stripe) to the data, which alternates based on the value of DOY. Then you can use that variable as the basis for filled, transparent rectangles.
I'm assuming that DOY is a sequence of integers with interval = 1, so we can assign on the basis of whether DOY is odd or even.
(Note: sample - not a great variable name as there's a function of that name).
library(dplyr)
library(ggplot2)
sample %>%
mutate(stripe = factor(ifelse(DOY %% 2 == 0, 1, 0))) %>%
ggplot(aes(DOY, Max)) +
geom_point() +
geom_rect(aes(xmax = DOY + 1,
xmin = DOY,
ymin = min(Max),
ymax = Inf,
fill = stripe), alpha = 0.4) +
scale_fill_manual(values = c("white", "grey50")) +
theme_bw() +
guides(fill = "none") # fill = FALSE is deprecated in recent ggplot2 versions
Result:
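A variation on the same idea, in case DOY is not a clean run of integers or the stripes should cover the full panel height: build the rectangles in a separate data frame, one row per interval between consecutive DOY values, and draw them behind the line. This is only a sketch; sample_df, stripes and p are names chosen here, and the alternation is by interval index instead of by odd or even DOY.

```r
library(ggplot2)

sample_df <- data.frame(DOY = c(1, 2, 3, 4, 5),
                        Max = c(200, 225, 250, 275, 300))

# One rectangle per interval between consecutive DOY values
stripes <- data.frame(xmin = head(sample_df$DOY, -1),
                      xmax = tail(sample_df$DOY, -1))
# Alternate 1, 0, 1, 0, ... by interval index
stripes$stripe <- factor(seq_len(nrow(stripes)) %% 2)

p <- ggplot(sample_df, aes(DOY, Max)) +
  geom_rect(data = stripes,
            aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf,
                fill = stripe),
            inherit.aes = FALSE, alpha = 0.4) +
  geom_line(color = "black") +
  # Named values: first interval (stripe = 1) white, second (stripe = 0) grey
  scale_fill_manual(values = c("1" = "white", "0" = "grey50"),
                    guide = "none") +
  theme_bw()
p
```

Drawing geom_rect before geom_line keeps the line on top of the shading.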
I would like to make this plot:
Plot 1: The plot that I wanted
My data looks like this:
> head(ranges_example)
  labels Minimum Maximum error
1    One    -275    -240     1
2    Two    -265    -210     1
3  Three    -260    -215     1
4   Four    -273    -230     1
5   Five     NaN    -200     1
6    Six     NaN    -240     1
But, alas, I had to make that plot in illustrator by modifying the plot that I did make in R, this one:
Plot 2: The plot that I got
And I made it using geom_linerange, specifically:
ggplot() +
geom_linerange(data = ranges_example,
mapping=aes(x = labels, ymin = Minimum, ymax = Maximum,
lwd = 1, color = error, alpha = 0.5),
position = position_dodge(width = 1)) +
scale_y_continuous(c(-240, -300)) +
coord_flip()
Plot 2 is good enough for this once--it takes maybe 15 minutes to turn it into Plot 1 in Illustrator--but I'll probably need to make a good few more of these.
The reason why I don't just remove the position_dodge statement is that then it just blends the colors together, like this:
I need them to be their own, distinct colors so that it's easy to tell them apart. The different shades mean different things and I need to be able to easily distinguish between and alter them.
How can I create a plot that looks more like Plot 1 right out of the box?
library(dplyr)   # arrange(), %>%
library(tibble)  # tribble()
library(ggplot2)
# Example data
ranges_example <- tribble(
  ~labels, ~Minimum, ~Maximum, ~error,
  "One", -275, -240, 1,
  "Two", -265, -210, 1,
  "One", -285, -215, 2,
  "Two", -275, -190, 2,
  "One", -300, -200, 3,
  "Two", -290, -180, 3)
# Sort by descending error so the widest ranges are drawn first
# and the narrower ones are plotted on top of them
ggplot() +
  geom_linerange(data = ranges_example %>% arrange(-error),
                 mapping = aes(x = labels, ymin = Minimum, ymax = Maximum,
                               lwd = 1, color = error)) +
  scale_y_continuous(c(-240, -300)) +
  scale_color_continuous(high = "lightgreen", low = "forestgreen") +
  coord_flip() +
  theme_classic()
I'd like to make small returns in this plot more visible. The most appropriate function seems to be scale_colour_gradient2, but this washes out the small returns, which happen most often. Using limits helped but I couldn't work out how to set oob (out of bounds) so it would just have a "saturated" value rather than be grey. And the log transform just made small values stand out. Has someone else figured out how to do this elegantly?
library(zoo)
library(ggplot2)
library(tseries)
spx <- get.hist.quote(instrument="^gspc", start="2000-01-01",
end="2013-12-14", quote="AdjClose",
provider="yahoo", origin="1970-01-01",
compression="d", retclass="zoo")
spx.rtn <- diff(log(spx$AdjClose)) * 100
rtn.data <- data.frame(x=time(spx.rtn),yend=spx.rtn)
p <- ggplot(rtn.data) +
geom_segment(aes(x=x,xend=x,y=0,yend=yend,colour=yend)) +
xlab("") + ylab("S&P 500 Daily Return %") +
theme(legend.position="none",axis.title.x=element_blank())
# low returns invisible
p + scale_colour_gradient2(low="blue",high="red")
# extreme values are grey
p + scale_colour_gradient2(low="blue",high="red",limits=c(-3,3))
# log transform returns has opposite problem
max_val <- max(log(abs(spx.rtn)))
values <- seq(-max_val, max_val, length = 11)
library(scales) # brewer_pal() is exported by scales, not RColorBrewer
p + scale_colour_gradientn(colours = brewer_pal(type="div",pal="RdBu")(11),
values = values
, rescaler = function(x, ...) sign(x)*log(abs(x)), oob = identity)
Here is another possibility, using scale_colour_gradientn. Mapping of colours is set using values = rescale(...) so that resolution is higher for values close to zero. I had a look at some colour scales here: http://colorbrewer2.org. I chose a 5-class diverging colour scheme, RdBu, from red to blue via near-white. There might be other scales that suit your needs better, this is just to show the basic principles.
# check the colours
library(RColorBrewer)
# cols <- brewer_pal(pal = "RdBu")(5) # not valid in 1.1-2
cols <- brewer.pal(n = 5, name = "RdBu")
cols
# [1] "#CA0020" "#F4A582" "#F7F7F7" "#92C5DE" "#0571B0"
# show_col(cols) # not valid in 1.1-2
display.brewer.pal(n = 5, name = "RdBu")
Using rescale, -10 corresponds to blue #0571B0; -1 = light blue #92C5DE; 0 = light grey #F7F7F7; 1 = light red #F4A582; 10 = red #CA0020. Values between -1 and 1 are interpolated between light blue and light red, etc. Thus, mapping is not linear and resolution is higher for small values.
library(ggplot2)
library(scales) # needed for rescale
ggplot(rtn.data) +
geom_segment(aes(x = x, xend = x, y = 0, yend = yend, colour = yend)) +
xlab("") + ylab("S&P 500 Daily Return %") +
scale_colour_gradientn(colours = rev(cols), # reversed so that -10 maps to blue and 10 to red
values = rescale(c(-10, -1, 0, 1, 10)),
guide = "colorbar", limits=c(-10, 10)) +
theme(legend.position = "none", axis.title.x = element_blank())
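To see why this gives more resolution near zero, it may help to look at the numbers that rescale() actually hands to the values argument. A small sketch; breaks and pos are names introduced here.

```r
library(scales)

# Data values at which the five colours are anchored
breaks <- c(-10, -1, 0, 1, 10)

# rescale() maps them linearly onto [0, 1], which is what `values` expects
pos <- rescale(breaks)
pos
#> [1] 0.00 0.45 0.50 0.55 1.00
```

Three of the five colours are anchored between 0.45 and 0.55, so the middle tenth of the data range gets the full sweep from light blue to light red.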
How about:
p + scale_colour_gradient2(low="blue",high="red",mid="purple")
or
p + scale_colour_gradient2(low="blue",high="red",mid="darkgrey")
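For the "saturated" out-of-bounds behaviour asked about in the question, one option that seems to fit is passing oob = scales::squish together with limits, so that values beyond the limits are clamped to the nearest limit instead of becoming grey. The sketch below uses simulated heavy-tailed data as a stand-in for the real returns (rtn.sim and p.sim are made-up names, and the limits of -3 and 3 are taken from the question).

```r
library(ggplot2)
library(scales)

set.seed(1)
# Simulated stand-in for the daily returns (assumption: not the real S&P data)
rtn.sim <- data.frame(
  x = seq(as.Date("2013-01-01"), by = "day", length.out = 200),
  yend = rt(200, df = 3)
)

p.sim <- ggplot(rtn.sim) +
  geom_segment(aes(x = x, xend = x, y = 0, yend = yend, colour = yend)) +
  # squish clamps out-of-range values to the limits before colours are assigned
  scale_colour_gradient2(low = "blue", high = "red",
                         limits = c(-3, 3), oob = squish) +
  theme(legend.position = "none")
p.sim

# What squish does to the values themselves
clamped <- squish(c(-5, -1, 0, 2, 7), range = c(-3, 3))
clamped
#> [1] -3 -1  0  2  3
```

Narrower limits spread the colour range over the small returns, while the extreme days simply take the end colours.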