Factor / level order with ggplot2 legend - r

I'm having a difficult time understanding how factor labels interact with legends. How do I set the order of levels so that the ggplot2 system creates the legend I envision?
In the code below, the plot associates the reds with the negative side of the interval and the blues with the positive.
require(ggplot2)
require(RColorBrewer)
set.seed(1492) #discovery!
sample = data.frame(x = c(1:20), y = 1, obs = runif(20, -150, 150))
the.breaks = seq(-100, 100, by = 20)
sample$interval = factor(findInterval(sample$obs, vec = the.breaks, all.inside = TRUE),
labels = the.breaks, levels = c(1:length(the.breaks)))
pal = rev(brewer.pal(11, "RdBu"))
p = ggplot(sample, aes(x, y, colour = interval))
p = p + geom_point(size = 10)
p = p + scale_colour_manual(values = pal, limits = the.breaks, labels = the.breaks)
p = p + guides(colour = guide_legend(override.aes = list(size = 3, shape = 19)))
p
This works fine, but I really don't like the
pal = rev(brewer.pal(11, "RdBu"))
statement - seems inelegant. I'd like to be able to replace the use of scale_colour_manual with something like
p = p + scale_colour_brewer(palette="RdBu", type="div", limits = the.breaks, labels = the.breaks)
but when I do, the Reds get associated with the positive end.

You can order your factor levels in any way you like; this should help you get the right colours for them. Try this explanation, or any other write-up on ordering factor levels. Note that the opts command in the blog post is outdated.
Also check out this question.
And this one as well.
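As a minimal sketch of the level-reordering idea, here is the interval factor from the question rebuilt with base R only, then given reversed levels. The names obs, interval and interval_rev are just for illustration. Reversing the levels changes which level comes first (and so which palette colour it is paired with), but it also flips the order of the legend entries, so it may or may not be what you want. If your ggplot2 version has the direction argument on scale_colour_brewer, direction = -1 may be the more direct route; I have not run that against the question's data.

```r
# Same interval factor as in the question, built with base R only.
set.seed(1492)
obs <- runif(20, -150, 150)
the.breaks <- seq(-100, 100, by = 20)
interval <- factor(findInterval(obs, vec = the.breaks, all.inside = TRUE),
                   levels = seq_along(the.breaks), labels = the.breaks)

# Reverse the level order; the data values themselves are unchanged.
interval_rev <- factor(interval, levels = rev(levels(interval)))

levels(interval)[1]      # "-100"
levels(interval_rev)[1]  # "100"
```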

Related

Divide the y axis so that the part with a score <25 occupies the majority of the plot in ggplot

I want to divide the y axis of the attached figure so that the part with a score <25 occupies the majority of the figure, while the remainder takes up a minor upper part.
I searched for this and understand that I should use scale_y_discrete(limits. I tried p<- p+scale_y_continuous(breaks = 1:20, labels = c(1:20,"//",40:100)) but it doesn't work yet.
I used the attached data, and this is my code:
library(ggpubr)  # provides ggscatter()
p <- ggscatter(data, x = "Year", y = "Score",
               color = "grey", shape = 21, size = 3,  # Points color, shape and size
               add.params = list(color = "blue", fill = "lightgray"),  # Customize reg. line
               add = "loess",  # reg.line
               conf.int = TRUE,
               cor.coef = FALSE, cor.method = "pearson",
               xlab = "Year", ylab = "Score")
p <- p + coord_cartesian(xlim = c(1980, 2020))
p
Here is as close as I could get to a fake axis break with a resized upper area of the plot. I still think it's a bad idea, and if this were my plot I'd much prefer a more straightforward axis transform.
First, we'd need a function that generates a transform that squeezes all values above some threshold:
library(ggplot2)
library(scales)
# Define new transform
my_transform <- function(threshold = 25, squeeze_factor = 10) {
  force(threshold)
  force(squeeze_factor)
  my_transform <- trans_new(
    name = "trans_squeeze",
    transform = function(x) {
      ifelse(x > threshold,
             ((x - threshold) * (1 / squeeze_factor)) + threshold,
             x)
    },
    inverse = function(x) {
      ifelse(x > threshold,
             ((x - threshold) * squeeze_factor) + threshold,
             x)
    }
  )
  return(my_transform)
}
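To see what this transform does to the numbers, here is a small base-R check of the same two formulas, outside of ggplot2. The function names squeeze and unsqueeze and the example scores are made up for illustration; the arithmetic is the same as in the transform and inverse above.

```r
# Base-R copies of the two functions passed to trans_new() in the answer,
# with the same defaults (threshold = 25, squeeze_factor = 10).
squeeze <- function(x, threshold = 25, squeeze_factor = 10) {
  ifelse(x > threshold, (x - threshold) / squeeze_factor + threshold, x)
}
unsqueeze <- function(x, threshold = 25, squeeze_factor = 10) {
  ifelse(x > threshold, (x - threshold) * squeeze_factor + threshold, x)
}

scores <- c(0, 10, 25, 35, 75, 100)
squeeze(scores)              # 0 10 25 26 30 32.5
unsqueeze(squeeze(scores))   # recovers the original scores
```

With these defaults a 0 to 100 axis is drawn over 32.5 transformed units, of which the 0 to 25 part takes 25, i.e. roughly three quarters of the axis height.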
Next we apply that transformation to the y-axis and add a fake axis break. I've used vanilla ggplot2 code as I find the ggscatter() approach confusing.
ggplot(data, aes(Year, Score)) +
geom_point(color = "grey", shape = 21, size = 3) +
geom_smooth(method = "loess", fill = "lightgray") +
# Add fake axis lines
annotate("segment", x = -Inf, xend = -Inf,
y = c(-Inf, Inf), yend = c(24.5, 25.5)) +
# Apply transform to y-axis
scale_y_continuous(trans = my_transform(25, 10),
breaks = seq(0, 80, by = 10)) +
scale_x_continuous(limits = c(1980, 2020), oob = oob_keep) +
theme_classic() +
# Turn real y-axis line off
theme(axis.line.y = element_blank())
You might find it informative to read Hadley Wickham's view on discontinuous axes. People sometimes mock weird y-axes.

Display custom axis labels in ggplot2

I'd like to plot a histogram and a density on the same plot. What I would like to add to the following is a custom y-axis label, something like sprintf("[%s] %s", ..density.., ..count..), i.e. two numbers at one tick value. Is it possible to obtain this with scale_y_continuous, or do I need to work around it somehow?
Below is my current progress using scales::trans_new and sec_axis. sec_axis is acceptable, but the most desirable output is the one in the image below.
set.seed(1)
var <- rnorm(4000)
binwidth <- 2 * IQR(var) / length(var) ^ (1 / 3)
count_and_proportion_label <- function(x) {
sprintf("%s [%.2f%%]", x, x/sum(x) * 100)
}
ggplot(data = data.frame(var = var), aes(x = var, y = ..count..)) +
geom_histogram(binwidth = binwidth) +
geom_density(aes(y = ..count.. * binwidth)) +
scale_y_continuous(
# this way
trans = trans_new(name = "count_and_proportion",
format = count_and_proportion_label,
transform = function(x) x,
inverse = function(x) x),
# or this way
sec.axis = sec_axis(trans = ~./sum(.),
labels = percent,
name = "proportion (in %)")
)
I've also tried to build the breaks beforehand from the graphics::hist output, but the two histograms differ.
bins <- (max(var) - min(var))/binwidth
hdata <- hist(var, breaks = bins, right = FALSE)
# hist generates different bins than `ggplot2`
At the end I would like to get something like this:
Would it be acceptable to add percentage as a secondary axis? E.g.
your_plot + scale_y_continuous(sec.axis = sec_axis(~.*2, name = "[%]"))
Perhaps it would be possible to overlay the secondary axis on the primary one, but I'm not sure how you would go about doing that.
You can achieve your desired output by creating a custom set of labels, and adding it to the plot:
library(tidyverse)
library(ggplot2)
set.seed(1)
var <- rnorm(400)
bins <- .1
df <- data.frame(yvals = seq(0, 20, 5), labels = c("[0%]", "[10%]", "[20%]", "[30%]", "[40%]"))
df <- df %>% tidyr::unite("custom_labels", labels, yvals, sep = " ", remove = TRUE)
ggplot(data = data.frame(var = var), aes(x = var, y = ..count..)) +
geom_histogram(aes(y = ..count..), binwidth = bins) +
geom_density(aes(y = ..count.. * bins), color = "black", alpha = 0.7) +
ylab("[density] count") +
scale_y_continuous(breaks = seq(0, 20, 5), labels = df$custom_labels)
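The labels in the answer above are typed in by hand. As a rough sketch, they could instead be computed from the break values and the total number of observations. The helper name count_and_percent_label and its n_total argument are hypothetical, not part of ggplot2 or scales, and the total of 200 is only an example value.

```r
# Hypothetical helper (not part of ggplot2): label each count break as
# "[percent] count", with the percentage taken relative to n_total.
count_and_percent_label <- function(breaks, n_total) {
  sprintf("[%.1f%%] %s", 100 * breaks / n_total, breaks)
}

brks <- seq(0, 20, 5)
count_and_percent_label(brks, n_total = 200)
# "[0.0%] 0" "[2.5%] 5" "[5.0%] 10" "[7.5%] 15" "[10.0%] 20"
```

The resulting character vector has one entry per break, so it should be usable as scale_y_continuous(breaks = brks, labels = count_and_percent_label(brks, length(var))). Note that the percentage is relative to the total count, not to sum(x) of the breaks as in the label function from the question.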

Plotting points and lines separately in R with ggplot

I'm trying to plot 2 sets of data points and a single line in R using ggplot.
The issue I'm having is with the legend.
As can be seen in the attached image, the legend applies the lines to all 3 data sets even though only one of them is plotted with a line.
I have melted the data into one long frame, but this still requires me to filter the data sets for each individual call to geom_line() and geom_path().
I want to graph the melted data, plotting a line based on one data set, and points on the remaining two, with a complete legend.
Here is the sample script I wrote to produce the plot:
xseq <- 1:100
x <- rnorm(n = 100, mean = 0.5, sd = 2)
x2 <- rnorm(n = 100, mean = 1, sd = 0.5)
x.lm <- lm(formula = x ~ xseq)
x.fit <- predict(x.lm, newdata = data.frame(xseq = 1:100), type = "response", se.fit = TRUE)
my_data <- data.frame(x = xseq, ypoints = x, ylines = x.fit$fit, ypoints2 = x2)
## Now try and plot it
melted_data <- melt(data = my_data, id.vars = "x")
p <- ggplot(data = melted_data, aes(x = x, y = value, color = variable, shape = variable, linetype = variable)) +
geom_point(data = filter(melted_data, variable == "ypoints")) +
geom_point(data = filter(melted_data, variable == "ypoints2")) +
geom_path(data = filter(melted_data, variable == "ylines"))
pushViewport(viewport(layout = grid.layout(1, 1))) # One on top of the other
print(p, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
You can set them manually like this:
We set linetype = "solid" for the first item and "blank" for others (no line).
Similarly for first item we set no shape (NA) and for others we will set whatever shape we need (I just put 7 and 8 there for an example). See e.g. http://www.r-bloggers.com/how-to-remember-point-shape-codes-in-r/ to help you to choose correct shapes for your needs.
If you are happy with dots then you can use my_shapes = c(NA,16,16) and scale_shape_manual(...) is not needed.
my_shapes = c(NA,7,8)
ggplot(data = melted_data, aes(x = x, y = value, color=variable, shape=variable )) +
geom_path(data = filter(melted_data, variable == "ylines") ) +
geom_point(data = filter(melted_data, variable %in% c("ypoints", "ypoints2"))) +
scale_colour_manual(values = c("red", "green", "blue"),
guide = guide_legend(override.aes = list(
linetype = c("solid", "blank","blank"),
shape = my_shapes))) +
scale_shape_manual(values = my_shapes)
But I am very curious whether there is a more automated way. Hopefully someone can post a better answer.
This post relied quite heavily on this answer: ggplot2: Different legend symbols for points and lines

How to make a color scale with sharp transition in ggplot2

I am trying to create a color scale with a sharp color transition at one point. What I am currently doing is:
test <- data.frame(x = c(1:20), y = seq(0.01, 0.2, by = 0.01))
cutoff <- 0.10
ggplot(data = test,
aes(x = as.factor(x), y = y, fill = log(y), width = 1, binwidth = 0)) +
geom_bar(stat = "identity") +
scale_fill_gradientn(colours = c("red", "red", "yellow", "green"),
values = rescale(log(c(0.01, cutoff - 0.0000000000000001, cutoff, 0.2))),
breaks = c(log(cutoff)), label = c(cutoff))
It is producing the plots I want. But the position of the break in colorbar somehow varies depending on the cutoff. Sometimes below the value, sometimes above, sometimes on the line. Here are some plots with different cutoffs (0.05, 0.06, 0.1):
What am I doing wrong? Alternatively, is there a better way to create such a color scale?
Have you looked into scale_colour_steps or scale_colour_stepsn?
Using the n.breaks option of scale_colour_stepsn, you should be able to specify the number of breaks you want and get sharper transitions.
Be sure to use ggplot2 > 3.3.2
In case you are still interested in a solution for this, you can add guide = guide_colourbar(nbin = <some arbitrarily large number>) to scale_fill_gradientn(). This increases the number of bins used by the colourbar legend, which makes the transition look sharper.
# illustration using nbin = 1000, & weighted colours below the cutoff
plot.cutoff <- function(cutoff){
p <- ggplot(data = test,
aes(x = as.factor(x), y = y, fill = log(y))) +
geom_col(width = 1) +
scale_fill_gradientn(colours = c("red4", "red", "yellow", "green"),
values = scales::rescale(log(c(0.01, cutoff - 0.0000000000000001,
cutoff, 0.2))),
breaks = c(log(cutoff)),
label = c(cutoff),
guide = guide_colourbar(nbin = 1000))
return(p)
}
cowplot::plot_grid(plot.cutoff(0.05),
plot.cutoff(0.06),
plot.cutoff(0.08),
plot.cutoff(0.1),
ncol = 2)
(If you find the above insufficiently sharp at very high resolutions, you can also set raster = FALSE in guide_colourbar(), which turns off interpolation & draws rectangles instead.)
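A rough way to see why nbin matters, under the assumption that the colourbar samples the fill range at nbin evenly spaced values: the drawn transition can only land between two neighbouring samples, so with few bins it may sit visibly above or below the true cutoff. The helper names bar_positions and bar_step are made up for this sketch, and 20 is just an example of a coarse bar.

```r
# Assumption: the colourbar samples the fill range at `nbin` evenly spaced values.
bar_positions <- function(limits, nbin) {
  seq(limits[1], limits[2], length.out = nbin)
}

# Distance between neighbouring samples, i.e. how far the drawn transition
# can sit from the true cutoff.
bar_step <- function(limits, nbin) diff(limits) / (nbin - 1)

limits <- log(c(0.01, 0.2))   # fill range used in the question
bar_step(limits, 20)          # coarse bar
bar_step(limits, 1000)        # fine bar

# Samples that bracket a given cutoff on the coarse bar
cutoff <- log(0.06)
pos <- bar_positions(limits, 20)
bracket <- c(below = max(pos[pos <= cutoff]), above = min(pos[pos > cutoff]))
bracket
```

Where the cutoff falls inside that bracket changes with the cutoff value, which would explain why the break looked shifted for some cutoffs and not for others.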
I think it is slightly tricky to achieve an exact, discrete cutoff point in the continuous color scale using scale_fill_gradientn. A quick alternative would be to use scale_fill_gradient, set the cutoff with limits, and set the color of 'out-of-bounds' values with na.value.
Here's a slightly simpler example than in your question:
# some data
df <- data.frame(x = factor(1:10), y = 1, z = 1:10)
# a cutoff point
lo <- 4
ggplot(df, aes(x = x, y = y, fill = z)) +
geom_bar(stat = "identity") +
scale_fill_gradient(low = "yellow", high = "green",
limits = c(lo, max(df$z)), na.value = "red")
As you see, the values below your cutpoint will not appear in the legend, but one may consider including a large chunk of red a waste of "legend band width" anyway. You might just add a verbal description of the red bars in the figure caption instead.
You may also wish to differentiate between values below a lower cutpoint and above an upper cutpoint. For example, set 'too low' values to blue and 'too high' values to red. Here I use findInterval to differentiate between low, mid and high values.
# some data
set.seed(2)
df <- data.frame(x = factor(1:10), y = 1, z = sample(1:10))
# lower and upper limits
lo <- 3
hi <- 8
# create a grouping variable based on the break points
df$grp <- findInterval(df$z, c(lo, hi), rightmost.closed = TRUE)
ggplot(df, aes(x = x, y = y, fill = z)) +
geom_bar(stat = "identity") +
scale_fill_gradient(low = "yellow", high = "green", limits = c(lo, hi), na.value = "red") +
geom_bar(data = df[df$grp == 0, ], fill = "blue", stat = "identity")
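To make the grouping step concrete, here is what findInterval returns for a few hand-picked values with the same lo and hi. The vector z below is illustrative only and is not the data from the plot.

```r
lo <- 3
hi <- 8
z <- c(1, 3, 5, 8, 10)   # illustrative values, not the plotted data

grp <- findInterval(z, c(lo, hi), rightmost.closed = TRUE)
data.frame(z = z, grp = grp)
# grp is 0 for z < lo, 1 for lo <= z <= hi, and 2 for z > hi
```

In the plot code above, the grp == 0 rows are the ones redrawn in blue, while values above hi fall outside the scale limits and pick up the red na.value.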

Non-linear color distribution over the range of values in a geom_raster

I'm faced with the following problem: a few extreme values are dominating the colour scale of my geom_raster plot. An example is probably clearer (note that this example only works with a recent ggplot2 version; I use 0.9.2.1):
library(ggplot2)
library(reshape)
theme_set(theme_bw())
m_small_sd = melt(matrix(rnorm(10000), 100, 100))
m_big_sd = melt(matrix(rnorm(100, sd = 10), 10, 10))
new_xy = m_small_sd[sample(nrow(m_small_sd), nrow(m_big_sd)), c("X1","X2")]
m_big_sd[c("X1","X2")] = new_xy
m = data.frame(rbind(m_small_sd, m_big_sd))
names(m) = c("x", "y", "fill")
ggplot(m, aes_auto(m)) + geom_raster() + scale_fill_gradient2()
Right now I solve this by setting the values over a certain quantile equal to that quantile:
qn = quantile(m$fill, c(0.01, 0.99), na.rm = TRUE)
m = within(m, { fill = ifelse(fill < qn[1], qn[1], fill)
fill = ifelse(fill > qn[2], qn[2], fill)})
This does not really feel like an optimal solution. What I would like to do is have a non-linear mapping of colors to the range of values, i.e. more colors present in the area with more observations. In spplot I could use classIntervals from the classInt package to calculate the appropriate class boundaries:
library(sp)
library(classInt)
gridded(m) = ~x+y
col = c("#EDF8B1", "#C7E9B4", "#7FCDBB", "#41B6C4",
"#1D91C0", "#225EA8", "#0C2C84", "#5A005A")
at = classIntervals(m$fill, n = length(col) + 1)$brks
spplot(m, at = at, col.regions = col)
To my knowledge it is not possible to hardcode this mapping of colors to class intervals like I can in spplot. I could transform the fill axis, but as there are negative values in the fill variable that will not work.
So my question is: are there any solutions to this problem using ggplot2?
Seems that ggplot (0.9.2.1) and scales (0.2.2) bring all you need (for your original m):
library(scales)
qn = quantile(m$fill, c(0.01, 0.99), na.rm = TRUE)
qn01 <- rescale(c(qn, range(m$fill)))
ggplot(m, aes(x = x, y = y, fill = fill)) +
geom_raster() +
scale_fill_gradientn (
colours = colorRampPalette(c("darkblue", "white", "darkred"))(20),
values = c(0, seq(qn01[1], qn01[2], length.out = 18), 1)) +
theme(legend.key.height = unit (4.5, "lines"))
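The key step is the values vector, which places 18 of the 20 colours between the 1% and 99% quantiles and leaves the two end colours for the extremes. Here is a small base-R sketch of how that vector comes out, using simulated data as a stand-in for m$fill. The rescale01 helper is my own stand-in for scales::rescale with its default arguments, and the seed is arbitrary.

```r
set.seed(42)  # arbitrary seed, only for reproducibility of this sketch
fill <- c(rnorm(10000), rnorm(100, sd = 10))  # stand-in for m$fill

# Base-R stand-in for scales::rescale(): map onto [0, 1] using the full range
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

qn <- quantile(fill, c(0.01, 0.99), na.rm = TRUE)
qn01 <- unname(rescale01(c(qn, range(fill))))  # first two entries: quantile positions

n_colours <- 20
values <- c(0, seq(qn01[1], qn01[2], length.out = n_colours - 2), 1)

length(values)     # 20, one position per colour
qn01[2] - qn01[1]  # fraction of the fill range that carries 18 of the 20 colours
```

The positions must be increasing and run from 0 to 1, and there must be as many of them as there are colours, which is why the sequence length is tied to the palette length.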
