Include NA values in plot and size, fill legend - r

I have NA values in a data set, which I would like to include in my ggplot as well as in the legend. I thought this would be easily done by specifying the na.value = "somecolour" option of the fill scale, as shown e.g. in this post. However, for my example the code runs without plotting any of the NAs and without adding an entry for them to the legend. Instead, rows with missing values are automatically removed. Here's some code for illustration:
set.seed(42)
lat <- rnorm(10, 54, 12)
long <- rnorm(10, 44, 12)
val <- rnorm(6, 10, 3)
val <- c(val,NA,NA,NA,NA)
df <- as.data.frame(cbind(long, lat, val))
library(ggplot2)
library(scales)
ggplot() +
geom_point(data=df, aes(x=lat, y=long, size=val, fill=val),shape=21,alpha=0.6) +
scale_size_continuous(range = c(2, 12), breaks=pretty_breaks(4)) +
scale_fill_distiller(direction = -1, palette="RdYlBu", breaks=pretty_breaks(4),na.value = "black") +
guides(fill = guide_legend(), size = guide_legend()) +
theme_minimal()
What am I doing wrong?

The problem comes from mapping size in aes: scale_size_continuous has no way to assign a size to NA values, so those points are dropped.
My solution would be to plot the NA values separately (not perfect, but it works). To add them to the legend, map a dummy value inside aes so that a guide is created for them.
One remaining problem is that the NA legend doesn't align nicely with the non-NA legend. To adjust the alignment, we plot another set of invisible NA points that are roughly as large as the biggest non-NA points.
ggplot(df, aes(lat, long, size = val, fill = val)) +
geom_point(shape = 21,alpha = 0.6) +
geom_point(data = subset(df, is.na(val)), aes(shape = "NA"),
size = 1, fill = "black") +
geom_point(data = subset(df, is.na(val)), aes(shape = "NA"),
size = 14, alpha = 0) +
scale_size_continuous(range = c(2, 12), breaks = pretty_breaks(4)) +
scale_fill_distiller(direction = -1, palette = "RdYlBu", breaks = pretty_breaks(4)) +
labs(shape = " val\n",
fill = NULL,
size = NULL) +
guides(fill = guide_legend(),
size = guide_legend(),
shape = guide_legend(order = 1)) +
theme_minimal() +
theme(legend.spacing.y = unit(-0.4, "cm"))
PS: requires ggplot2_3.0.0.
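To see what each layer in the workaround receives, it can help to split the data by missingness first. The following is a minimal sketch using base R only; it rebuilds the question's data frame, and the names na_rows and ok_rows are illustrative and not part of the original answer.

```r
set.seed(42)
lat  <- rnorm(10, 54, 12)
long <- rnorm(10, 44, 12)
val  <- c(rnorm(6, 10, 3), rep(NA, 4))
df   <- data.frame(long, lat, val)

# Rows the size scale cannot draw (val is NA) and rows it can draw
na_rows <- subset(df, is.na(val))
ok_rows <- subset(df, !is.na(val))

nrow(na_rows)  # 4
nrow(ok_rows)  # 6
```

Passing a subset like na_rows as data to a second geom_point() with a constant size is what the extra layers in the answer do.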

Related

How to remove zig-zag pattern in marginal distribution plot of integer values in R?

I am including marginal distribution plots on a scatterplot of a continuous and an integer variable. However, in the marginal distribution plot of the integer variable (y-axis) there is a zig-zag pattern that shows up because the y-values are all integers. Is there any way to increase the "width" (not sure that's the right term) of the bins/values the function calculates the distribution density over?
The goal is to get rid of that zig-zag pattern that develops because the y-values are integers.
library(GlmSimulatoR)
library(ggplot2)
library(patchwork)
### Create right-skewed dataset that has one continuous variable and one integer variable
set.seed(123)
df1 <- data.frame(matrix(ncol = 2, nrow = 1000))
x <- c("int","cont")
colnames(df1) <- x
df1$int <- round(rgamma(1000, shape = 1, scale = 1),0)
df1$cont <- round(rgamma(1000, shape = 1, scale = 1),1)
p1 <- ggplot(data = df1, aes(x = cont, y = int)) +
geom_point(shape = 21, size = 2, color = "black", fill = "black", stroke = 1, alpha = 0.4) +
xlab("Continuous Value") +
ylab("Integer Value") +
theme_bw() +
theme(panel.grid = element_blank(),
text = element_text(size = 16),
axis.text.x = element_text(size = 16, color = "black"),
axis.text.y = element_text(size = 16, color = "black"))
dens1 <- ggplot(df1, aes(x = cont)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none")
dens2 <- ggplot(df1, aes(x = int)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none") +
coord_flip()
dens1 + plot_spacer() + p1 + dens2 +
plot_layout(ncol = 2, nrow = 2, widths = c(6,1), heights = c(1,6))
From ?geom_density:
adjust: A multiplicate [sic] bandwidth adjustment. This makes it possible
to adjust the bandwidth while still using the a bandwidth
estimator. For example, ‘adjust = 1/2’ means use half of the
default bandwidth.
So as a start try e.g. geom_density(..., adjust = 2) (bandwidth twice as wide as default) and go from there.
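The effect of adjust can be checked numerically before touching the plot. The sketch below uses stats::density() directly, on the assumption that geom_density() hands its adjust argument on to that function (my understanding of stat_density(), not something stated in the answer). The variable names are illustrative.

```r
set.seed(123)
int_vals <- round(rgamma(1000, shape = 1, scale = 1), 0)

d_default <- density(int_vals)              # default bandwidth estimator
d_wide    <- density(int_vals, adjust = 2)  # same estimator, bandwidth doubled

# The bandwidth actually used is the default one times `adjust`
all.equal(d_wide$bw, 2 * d_default$bw)  # TRUE
```

A wider bandwidth smooths over the gaps between neighbouring integers, which is what removes the zig-zag; too large a value will also flatten the real shape of the distribution, so it is worth trying a few values.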

how to change / specify fill color which exceeds the limits of a gradient bar?

In ggplot2/geom_tile, how can I change the fill color for values which exceed the limits?
As the image shows, region_4 and region_5 are outside the limits (1, 11), so their fill color is the default grey. How can I change region_4 to 'darkblue' and region_5 to 'black'? Thanks!
library(tidyverse)
library(RColorBrewer)
tile_data <- data.frame(category=letters[1:5],
region=paste0('region_',1:5),
sales=c(1,2,5,0.1,300))
tile_data %>% ggplot(aes(x=category,
y=region,
fill=sales))+
geom_tile()+
scale_fill_gradientn(limits=c(1,11),
colors=brewer.pal(11,'Spectral'))+ # 'Spectral' has at most 11 colors
theme_minimal()
If you want to keep the gradient scale and have two additional discrete values for off limits above and below, I think the easiest way would be to have separate fill scales for "in-limit" and "off-limit" values. This can be done with separate calls to geom_tile on subsets of your data and with packages such as {ggnewscale}.
I think it then would make sense to place the discrete "off-limits" at the respective extremes of your gradient color bar. You need then three geom_tile calls and three scale_fill calls, and you will need to specify the guide order within each scale_fill call. You will then need to play around with the legend margins, but it's not a big problem to make it look OK.
library(tidyverse)
library(RColorBrewer)
tile_data <- data.frame(
category = letters[1:5],
region = paste0("region_", 1:5),
sales = c(1, 2, 5, 0.1, 300)
)
ggplot(tile_data, aes(
x = category,
y = region,
fill = sales
)) +
geom_tile(data = filter(tile_data, sales <= 11 & sales >=1)) +
scale_fill_gradientn(NULL,
limits = c(1, 11),
colors = brewer.pal(11, "Spectral"),
guide = guide_colorbar(order = 2)
) +
ggnewscale::new_scale_fill() +
geom_tile(data = filter(tile_data, sales > 11), mapping = aes(fill = sales > 11)) +
scale_fill_manual("Sales", values = "black", labels = "> 11", guide = guide_legend(order = 1)) +
ggnewscale::new_scale_fill() +
geom_tile(data = filter(tile_data, sales < 1), mapping = aes(fill = sales < 1)) +
scale_fill_manual(NULL, values = "darkblue", labels = "< 1", guide = guide_legend(order = 3)) +
theme_minimal() +
theme(legend.spacing.y = unit(-6, "pt"),
legend.title = element_text(margin = margin(b = 10)))
Created on 2021-11-22 by the reprex package (v2.0.1)
Alternatively, you can try scales::squish: define the limits and squish the out-of-bounds (oob) values into the scale:
p = tile_data %>% ggplot(aes(x=category,y=region,fill=sales))+ geom_tile()
p + scale_fill_gradientn(colors = brewer.pal(11,"Spectral"),
limits = c(1,11),oob=scales::squish)
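What squishing does to the data can be sketched in base R. This is an illustration of the idea rather than the scales implementation: each value is clamped to the nearest limit, and for finite input scales::squish(sales, range = c(1, 11)) should return the same values.

```r
sales  <- c(1, 2, 5, 0.1, 300)
limits <- c(1, 11)

# Clamp each value to the nearest limit; in-range values are unchanged
squished <- pmin(pmax(sales, limits[1]), limits[2])
squished  # 1 2 5 1 11
```

Note the trade-off: with this approach region_4 and region_5 take the end colors of the gradient, so they can no longer be told apart from values that sit exactly on the limits. If they need their own colors, the separate-scales answer above is the better fit.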

R bubble plot using ggplot manually selecting the colour and axis names

I am using ggplot to create a bubble plot with this code:
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
theme_bw() +
theme() +
scale_size(range = c(1, 50)) +
ylim(0,100)
It is working perfectly apart from 2 things:
For each name (fill) I would like to manually specify the colour used (via a dataframe that maps name to colour) - this is to provide consistency across multiple figures.
I would like to substitute the numbers on the y for text labels (for several reasons I cannot use the text labels from the outset due to ordering issues)
I have tried several methods using scale_color_manual() and scale_y_continuous respectively and I am getting nowhere! Any help would be very gratefully received!
Thanks
Since you have not specified an example df, I created one of my own.
To manually specify the color, you have to use scale_fill_manual with a named vector as the argument of values.
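A named vector of this kind can be built straight from a lookup data frame. The sketch below assumes a two-column mapping with columns called name and color, as in the example data further down; setNames() is just a compact alternative to assigning names() afterwards.

```r
# Lookup table mapping each level of `name` to a colour (example values)
name_color <- data.frame(
  name  = c("a", "b"),
  color = c("blue", "red"),
  stringsAsFactors = FALSE
)

# Values are the colours, names are the levels they belong to
gcolors <- setNames(name_color$color, name_color$name)
gcolors[["b"]]  # "red"
```

Because the colours are matched by name rather than by position, the same vector can be reused across figures even when some levels are missing from a particular data set.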
Edit 2
This appears to do what you want. We use scale_y_continuous. The breaks argument specifies the vector of positions, while the labels argument specifies the labels which should appear at those positions. Since we already created the vectors mean and order_label when creating the data frame (see the code under Edit 1), we simply pass those vectors as arguments.
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_y_continuous(breaks = mean, labels = order_label)
Edit 1
From your comment, it appears that you want to label the circles. One option would be to use geom_text. Code below. You may need to experiment with values of nudge_y to get the position correct.
order <- c(1, 2)
mean <- c(0.75, 0.3)
n <- c(180, 200)
name <- c("a", "b")
order_label <- c("New York", "London")
df <- data.frame(order, mean, n, name, order_label, stringsAsFactors = FALSE)
color <- c("blue", "red")
name_color <- data.frame(name, color, stringsAsFactors = FALSE)
gcolors <- name_color[, 2]
names(gcolors) <- name_color[, 1]
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
geom_text(aes(label = order_label), size = 3, hjust = "inward",
nudge_y = 0.03) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank()) +
ylab(NULL)
Original Answer
It is not clear what you mean by "substitute the numbers on the y for text labels". In the example below, I have formatted the y-axis as a percentage using the scales::percent_format() function. Is this similar to what you want?
order <- c(1, 2)
mean <- c(0.75, 0.3)
n <- c(180, 200)
name <- c("a", "b")
df <- data.frame(order, mean, n, name, stringsAsFactors = FALSE)
color <- c("blue", "red")
name_color <- data.frame(name, color, stringsAsFactors = FALSE)
gcolors <- name_color[, 2]
names(gcolors) <- name_color[, 1]
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_y_continuous(labels = scales::percent_format())
Thanks, for all your help, this worked perfectly:
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_x_continuous(breaks = order, labels = order_label)

Error message when trying to add extra geom_point layer to ggplot lineplot

I am trying to make a line graph using the following code:
ggplot(out2, aes(factor(out2$term, levels=unique(as.character(out2$term)) ),estimate, group = 1)) +
geom_line(aes(group = 1), size = 1.2) +
mytheme2 +
geom_point(shape = 21, colour = "black", fill = "white", size = 5, stroke = 2) +
scale_shape(solid = FALSE) +
theme(axis.text.x = element_text(angle = 50, hjust = 1, size = 15, family = "serif")) +
scale_x_discrete(labels = labels1) +
theme(plot.title = element_text(hjust = 0.5)) +
geom_ribbon(data=out2,aes(ymin=conf.low,ymax=conf.high),alpha=0.1)
Which gives me this graph:
However, based on a variable in the data frame called p.val, I would like to add one asterisk if the value of p.val is less than .05, and two asterisks if the value is less than .001.
I tried to add a line at the bottom of the code to achieve this:
ggplot(out2, aes(factor(out2$term, levels=unique(as.character(out2$term)) ),estimate, group = 1)) +
geom_line(aes(group = 1), size = 1.2) +
mytheme2 +
geom_point(shape = 21, colour = "black", fill = "white", size = 5, stroke = 2) +
scale_shape(solid = FALSE) +
theme(axis.text.x = element_text(angle = 50, hjust = 1, size = 15, family = "serif")) +
scale_x_discrete(labels = labels1) +
#labs(y= "Standardized regression coefficient", x = "TAT threshold (Lux) minutes") +
#labs(title = "Sensitivity Analyses showing standardized regression coefficients for models with a range of \nTAT Light Thresholds (lux), Sleep Quality, Activity Level and BMI predicting T1 Hyperactivity.") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_ribbon(data=out2,aes(ymin=conf.low,ymax=conf.high),alpha=0.1) +
geom_point(data=out2[out2$p.value > 0.05,], color="red", size=3)
However, this gives me the error message:
Error: Aesthetics must be either length 1 or the same as the data (6): x, y, group
You are passing the last geom_point() layer a data frame that is a smaller subset of the original out2. That layer still inherits the x aesthetic from ggplot(), which is hard-coded as out2$term and therefore has the length of the full data, so the lengths no longer match; hence the error.
It might be easier if you built a column in your data frame for the significance label first and then used geom_text() to layer it on instead of geom_point().
out2$signif_label <- ifelse(out2$p.value < .05, "*", "")
out2$signif_label <- ifelse(out2$p.value < .001, "**", out2$signif_label)
then add this instead of the last geom_point()
geom_text(aes(label = signif_label), color = "red", size = 3)
If you assign data in the initial ggplot(data = ,...) call then all subsequent layers will try to inherit the same data, so we don't need to assign it again, unless it's different.
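The labelling rule can be tried on its own before adding it to the plot. This is a small sketch with made-up p-values (the vector p_value is hypothetical); it nests the two ifelse() calls from above into one expression, testing the stricter threshold first.

```r
# Hypothetical p-values: highly significant, significant, not significant
p_value <- c(0.0004, 0.03, 0.2)

# Check the stricter threshold first so that "**" is not overwritten by "*"
signif_label <- ifelse(p_value < .001, "**",
                       ifelse(p_value < .05, "*", ""))
signif_label  # "**" "*" ""
```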

How to merge legends for color and shape when geom_hline has a separate (additional) entry in the color legend?

I have the following code, which produces the following plot:
cols <- brewer.pal(n = 3, name = 'Dark2')
p4 <- ggplot(all.m, aes(x=xval, y=yval, colour = Approach, ymax = 0.95)) + theme_bw() +
geom_errorbar(aes(ymin= yval - se, ymax = yval + se), width=5, position=pd) +
geom_line(position=pd) +
geom_point(aes(shape=Approach, colour = Approach), size = 4) +
geom_hline(aes(yintercept = cp.best$slope, colour = "C2P"), show_guide = FALSE) +
scale_color_manual(name="Approach", breaks=c("C2P", "P2P", "CP2P"), values = cols[c(1,3,2)]) +
scale_y_continuous(breaks = seq(0.4, 0.95, 0.05), "Test AUROC") +
scale_x_continuous(breaks = seq(10, 150, by = 20), "# Number of Patient Samples in Training")
p4 <- p4 + theme(legend.direction = 'horizontal',
legend.position = 'top',
plot.margin = unit(c(5.1, 7, 4.5, 3.5)/2, "lines"),
text = element_text(size=15), axis.title.x=element_text(vjust=-1.5), axis.title.y=element_text(vjust=2))
p4 <- p4 + guides(colour=guide_legend(override.aes=list(shape=c(NA,17,16))))
p4
When I try show_guide = FALSE in geom_point, the shape of the point in the upper legend are all set to default solid circles.
How can I make the lower legend disappear without affecting the upper legend?
This is a solution, complete with reproducible data:
library("ggplot2")
library("grid")
library("RColorBrewer")
cp2p <- data.frame(xval = 10 * 2:15, yval = cumsum(c(0.55, rnorm(13, 0.01, 0.005))), Approach = "CP2P", stringsAsFactors = FALSE)
p2p <- data.frame(xval = 10 * 1:15, yval = cumsum(c(0.7, rnorm(14, 0.01, 0.005))), Approach = "P2P", stringsAsFactors = FALSE)
pd <- position_dodge(0.1)
cp.best <- list(slope = 0.65)
all.m <- rbind(p2p, cp2p)
all.m$Approach <- factor(all.m$Approach, levels = c("C2P", "P2P", "CP2P"))
all.m$se <- rnorm(29, 0.1, 0.02)
all.m[nrow(all.m) + 1, ] <- all.m[nrow(all.m) + 1, ] # Creates a new row filled with NAs
all.m$Approach[nrow(all.m)] <- "C2P"
cols <- brewer.pal(n = 3, name = 'Dark2')
p4 <- ggplot(all.m, aes(x=xval, y=yval, colour = Approach, ymax = 0.95)) + theme_bw() +
geom_errorbar(aes(ymin= yval - se, ymax = yval + se), width=5, position=pd) +
geom_line(position=pd) +
geom_point(aes(shape=Approach, colour = Approach), size = 4, na.rm = TRUE) +
geom_hline(aes(yintercept = cp.best$slope, colour = "C2P")) +
scale_color_manual(values = c(C2P = cols[1], P2P = cols[2], CP2P = cols[3])) +
scale_shape_manual(values = c(C2P = NA, P2P = 16, CP2P = 17)) +
scale_y_continuous(breaks = seq(0.4, 0.95, 0.05), "Test AUROC") +
scale_x_continuous(breaks = seq(10, 150, by = 20), "# Number of Patient Samples in Training")
p4 <- p4 + theme(legend.direction = 'horizontal',
legend.position = 'top',
plot.margin = unit(c(5.1, 7, 4.5, 3.5)/2, "lines"),
text = element_text(size=15), axis.title.x=element_text(vjust=-1.5), axis.title.y=element_text(vjust=2))
p4
The trick is to make sure that all of the desired levels of all.m$Approach appear in all.m, even if one of them gets dropped out of the graph. The warning about the omitted point is suppressed by the na.rm = TRUE argument to geom_point.
Short answer:
Just add a dummy geom_point layer (transparent points) where shape is mapped to the same level as in geom_hline.
geom_point(aes(shape = "int"), alpha = 0)
Longer answer:
Whenever possible, ggplot merges / combines legends of different aesthetics. For example, if colour and shape are mapped to the same variable, then the two legends are combined into one.
I illustrate this using simple data set with 'x', 'y' and a grouping variable 'grp' with two levels:
df <- data.frame(x = rep(1:2, 2), y = 1:4, grp = rep(c("a", "b"), each = 2))
First we map both color and shape to 'grp'
ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
geom_line() +
geom_point(size = 4)
Fine, the legends for the aesthetics, color and shape, are merged into one.
Then we add a geom_hline. We want it to have a separate color from the geom_lines and to appear in the legend. Thus, we map color to a variable, i.e. put color inside aes of geom_hline. In this case we do not map the color to a variable in the data set, but to a constant. We may give the constant a desired name, so we don't need to rename the legend entries afterwards.
ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
geom_line() +
geom_point(size = 4) +
geom_hline(aes(yintercept = 2.5, color = "int"))
Now two legends appear, one for the color aesthetics of geom_line and geom_hline, and one for the shape of the geom_points. The reason for this is that the "variable" which color is mapped to now contains three levels: the two levels of 'grp' in the original data, plus the level 'int' which was introduced in the geom_hline aes. Thus, the levels in the color scale differ from those in the shape scale, and by default ggplot can't merge the two scales into one legend.
How to combine the two legends?
One possibility is to introduce the same additional level for shape as for color by using a dummy geom_point layer with transparent points (alpha = 0), so that the two aesthetics contain the same levels:
ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
geom_line() +
geom_point(size = 4) +
geom_hline(aes(yintercept = 2.5, color = "int")) +
geom_point(aes(shape = "int"), alpha = 0) # <~~~~ a blank geom_point
Another possibility is to convert the original grouping variable to a factor, and add the "geom_hline level" to the original levels. Then use drop = FALSE in scale_shape_discrete to include "unused factor levels from the scale":
df$grp <- factor(df$grp, levels = c(as.character(unique(df$grp)), "int"))
ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
geom_line() +
geom_point(size = 4) +
geom_hline(aes(yintercept = 2.5, color = "int")) +
scale_shape_discrete(drop = FALSE)
Then, as you already know, you may use the guides function to "override" the shape aesthetics in the legend, and remove the shape from the geom_hline entry by setting it to NA:
guides(colour = guide_legend(override.aes = list(shape = c(16, 17, NA))))
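The factor-level approach can be checked without drawing anything. The sketch below rebuilds the small example data and confirms that the extra level exists but is unused, which is the situation drop = FALSE is meant to handle.

```r
df <- data.frame(x = rep(1:2, 2), y = 1:4,
                 grp = rep(c("a", "b"), each = 2),
                 stringsAsFactors = FALSE)

# Add the level used by geom_hline to the grouping factor
df$grp <- factor(df$grp, levels = c(unique(df$grp), "int"))

levels(df$grp)  # "a" "b" "int"
table(df$grp)   # a: 2, b: 2, int: 0
```

With the same three levels on both the color and the shape scale, the two legends have matching entries and can be merged into one.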

Resources