R bubble plot using ggplot: manually selecting the colour and axis names

I am using ggplot to create a bubble plot with this code:
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
theme_bw() +
theme() +
scale_size(range = c(1, 50)) +
ylim(0,100)
It works perfectly apart from two things:
1. For each name (fill), I would like to manually specify the colour used (via a data frame that maps name to colour), to keep the colours consistent across multiple figures.
2. I would like to replace the numbers on the y-axis with text labels (for several reasons I cannot use the text labels from the outset, because of ordering issues).
I have tried several methods using scale_color_manual() and scale_y_continuous(), respectively, and I am getting nowhere! Any help would be very gratefully received!
Thanks

Since you have not specified an example df, I created one of my own.
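To manually specify the colour, use scale_fill_manual() with a named vector as the values argument. Because name is mapped to fill (and shape 21 draws its interior with fill), scale_color_manual() has no effect on these points.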
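A minimal sketch of building that named vector in one step from a lookup data frame, using base R's setNames(). The table name_color and its third row ("c" = "darkgreen") are illustrative assumptions. Because scale_fill_manual() matches named colours to the data by name rather than by position, the same vector can be reused across figures.

```r
# Illustrative lookup table mapping each name to a fixed colour
name_color <- data.frame(
  name  = c("a", "b", "c"),
  color = c("blue", "red", "darkgreen"),
  stringsAsFactors = FALSE
)

# Names are the data values, elements are the colours to use for them
gcolors <- setNames(name_color$color, name_color$name)
print(gcolors)

# The result is then passed as scale_fill_manual(values = gcolors)
```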
Edit 2
This appears to do what you want. We use scale_y_continuous(): the breaks argument specifies the vector of positions, and the labels argument specifies the labels that should appear at those positions. Since we already created these vectors when building the data frame (see the code under Edit 1), we simply pass them as arguments.
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_y_continuous(breaks = mean, labels = order_label)
Edit 1
From your comment, it appears that you want to label the circles. One option is to use geom_text(), as in the code below. You may need to experiment with the value of nudge_y to get the position right.
library(ggplot2)
order <- c(1, 2)
mean <- c(0.75, 0.3)
n <- c(180, 200)
name <- c("a", "b")
order_label <- c("New York", "London")
df <- data.frame(order, mean, n, name, order_label, stringsAsFactors = FALSE)
color <- c("blue", "red")
name_color <- data.frame(name, color, stringsAsFactors = FALSE)
gcolors <- name_color[, 2]
names(gcolors) <- name_color[, 1]
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
geom_text(aes(label = order_label), size = 3, hjust = "inward",
nudge_y = 0.03) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank()) +
ylab(NULL)
Original Answer
It is not clear what you mean by "substitute the numbers on the y for text labels". In the example below, I have formatted the y-axis as a percentage using the scales::percent_format() function. Is this similar to what you want?
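The formatter returned by scales::percent_format() is itself a function that turns proportions into strings, so you can check what the axis labels will look like before plotting. The accuracy = 1 argument (rounding to whole percentages) is an assumption added here, not part of the answer.

```r
library(scales)

# percent_format() returns a labelling function; call it on the data values
fmt <- percent_format(accuracy = 1)
axis_labels <- fmt(c(0.75, 0.3))
print(axis_labels)
```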
library(ggplot2)
order <- c(1, 2)
mean <- c(0.75, 0.3)
n <- c(180, 200)
name <- c("a", "b")
df <- data.frame(order, mean, n, name, stringsAsFactors = FALSE)
color <- c("blue", "red")
name_color <- data.frame(name, color, stringsAsFactors = FALSE)
gcolors <- name_color[, 2]
names(gcolors) <- name_color[, 1]
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_y_continuous(labels = scales::percent_format())

Thanks for all your help; this worked perfectly:
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_x_continuous(breaks = order, labels = order_label)
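For reference, here is a self-contained sketch of the approach that worked, assembled from the pieces above. The two-row data frame is the answer's illustrative example, not the asker's real data. Because breaks and labels are paired by position, both are taken from the same data frame columns.

```r
library(ggplot2)

# Illustrative data from the answer above
df <- data.frame(
  order = c(1, 2),
  mean = c(0.75, 0.3),
  n = c(180, 200),
  name = c("a", "b"),
  order_label = c("New York", "London"),
  stringsAsFactors = FALSE
)

# Named vector: fill colour looked up by the value of `name`
gcolors <- c(a = "blue", b = "red")

p <- ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
  geom_point(shape = 21) +
  scale_fill_manual(values = gcolors) +
  scale_size(limits = range(df$n)) +
  # Each x break gets the label in the same position of order_label
  scale_x_continuous(breaks = df$order, labels = df$order_label)

# print(p) draws the plot
```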

Related

ggplot get a color for each value

I have a spatial dataset containing values from 0 to 10. I want every number (11 numbers) to have a unique color from a gradient. The simple plot function does the trick (assigning one color to each value), but my default is ggplot, which I also want to use here. ggplot only uses ten colors for some reason, and I cannot figure out why. I think I might just be using the wrong scale_* function.
Reproducible example:
library(raster)
library(ggplot2)
#Colors
cols <- colorRampPalette(c("yellow", "red", "darkred", "black"))
# Create Raster
r <- raster(ncol=100, nrow=100)
r[] <- sample(0:10, 10000, replace = TRUE)
# Plot simple
plot(r, col=cols(11)) # 11 colors seen here
# Convert to df
r <- as.data.frame(r, xy=TRUE)
# Plot with ggplot
X <- ggplot(data = r) + geom_raster(aes(x = x, y = y, fill = layer), interpolate = FALSE) +
scale_fill_stepsn(colors=cols(11), breaks=seq(0,10,1), show.limits=TRUE)
print(X) # only 10 colors seen here
In scale_fill_stepsn(), the breaks mark the boundaries of each bin. With a sequence of 11 breaks you only get ten bins (if you have 11 fence posts, you only have 10 stretches of fence between them), so two of your eleven values end up sharing a colour. You need to add one more break to the sequence so that every level gets its own bin:
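The fence-post argument can be checked with base R's cut(). This is only an analogy for how a binned scale assigns values to intervals, not ggplot2's internal code: eleven integer values cannot each get their own interval when only ten intervals exist.

```r
values <- 0:10

# 11 break points (0..10) define only 10 intervals; with the default
# right = TRUE they are [0,1], (1,2], ..., (9,10], so 0 and 1 share one
bins_10 <- cut(values, breaks = 0:10, include.lowest = TRUE)

# 12 break points (0..11) define 11 intervals; right = FALSE makes
# them [0,1), [1,2), ..., [10,11], so each integer sits in its own bin
bins_11 <- cut(values, breaks = 0:11, include.lowest = TRUE, right = FALSE)

c(nlevels(bins_10), nlevels(bins_11))                # 10 11
c(length(unique(bins_10)), length(unique(bins_11)))  # 10 11
```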
ggplot(data = r) +
geom_raster(aes(x = x, y = y, fill = layer), interpolate = FALSE) +
scale_fill_stepsn(colors = cols(11), breaks = seq(0, 11, 1),
show.limits = TRUE) +
coord_equal()
An alternative is to use a manual scale, which I think makes more sense here. As I understand it, you are treating the fill color as a discrete variable, so the labels should correspond to the levels rather than to the boundaries between bins, as implied by scale_fill_stepsn().
ggplot(data = r) +
geom_raster(aes(x = x, y = y, fill = factor(layer, 10:0))) +
scale_fill_manual(values = rev(cols(11)), name = 'layer') +
coord_equal()
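A variation on the manual scale, sketched on a small synthetic grid so it runs without the raster package (grid_df is an assumption, not the question's data): naming the palette by level means each value is matched to its colour by name, independent of the order of the factor levels.

```r
library(ggplot2)

cols <- colorRampPalette(c("yellow", "red", "darkred", "black"))

# One colour per value 0..10, named by the value it belongs to
pal <- setNames(cols(11), as.character(0:10))

# Synthetic 11 x 2 grid in which every value from 0 to 10 appears
grid_df <- expand.grid(x = 1:11, y = 1:2)
grid_df$layer <- rep(0:10, times = 2)

p <- ggplot(grid_df, aes(x = x, y = y, fill = factor(layer, levels = 0:10))) +
  geom_raster() +
  scale_fill_manual(values = pal, name = "layer") +
  coord_equal()
```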
EDIT
To get the legend at the bottom, try:
ggplot(data = r) +
geom_raster(aes(x = x, y = y, fill = factor(layer, 0:10))) +
scale_fill_manual(values = cols(11), name = 'layer ') +
coord_equal() +
guides(fill = guide_legend(label.position = 'top', nrow = 1)) +
theme(legend.position = 'bottom',
legend.spacing.x = unit(0, 'mm'),
legend.title = element_text(hjust = 3, vjust = 0.25))

Change ggplot bar chart fill colors

With this data:
df <- data.frame(value =c(20, 50, 90),
group = c(1, 2,3))
I can get a bar chart:
df %>% ggplot(aes(x = group, y = value, fill = value)) +
geom_col() +
coord_flip()+
scale_fill_viridis_c(option = "C") +
theme(legend.position = "none")
But I would like the color inside each bar to vary along its length, according to the values from 0 up to that bar's value.
I have managed to do this using geom_raster:
ggplot() +
geom_raster(aes(x = c(0:20), y = .9, fill = c(0:20)),
interpolate = TRUE) +
geom_raster(aes(x = c(0:50), y = 2, fill = c(0:50)),
interpolate = TRUE) +
geom_raster(aes(x = c(0:90), y = 3.1, fill = c(0:90)),
interpolate = TRUE) +
scale_fill_viridis_c(option = "C") +
theme(legend.position = "none")
This approach is not efficient when I have many groups in real data. Any suggestions to get it done more efficiently would be appreciated.
I found the accepted answer to a previous similar question, but "These numbers needs to be adjusted depending on the number of x values and range of y". I was looking for an approach that I do not have to adjust numbers based on data. David Gibson's answer fits my purpose.
It does not look like this is supported natively in ggplot. I was able to get something close by adding extra rows to the data, with values ranging from 0 to value, and then using geom_tile(), separating the bars by specifying width.
library(tidyverse)
df <- data.frame(value = c(20, 50, 90),
group = c(1, 2, 3))
df_expanded <- df %>%
rowwise() %>%
summarise(group = group,
value = list(0:value)) %>%
unnest(cols = value)
df_expanded %>%
ggplot() +
geom_tile(aes(
x = group,
y = value,
fill = value,
width = 0.9
)) +
coord_flip() +
scale_fill_viridis_c(option = "C") +
theme(legend.position = "none")
If this is too pixelated, you can increase the number of rows generated by replacing list(0:value) with list(seq(0, value, by = 0.1)).
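If you would rather avoid the rowwise()/summarise()/unnest() pipeline, the same expansion can be sketched in base R with lapply() and rep(). This is an alternative, not the answer's method; the step variable plays the role of by in seq(0, value, by = 0.1).

```r
df <- data.frame(value = c(20, 50, 90), group = c(1, 2, 3))

step <- 1  # use e.g. 0.1 for a smoother gradient

# One sequence 0, step, ..., value per bar
steps <- lapply(df$value, function(v) seq(0, v, by = step))

# Repeat each group id once per generated step, then flatten the sequences
df_expanded <- data.frame(
  group = rep(df$group, lengths(steps)),
  value = unlist(steps)
)

nrow(df_expanded)  # 21 + 51 + 91 = 163
```

df_expanded has the same columns as the tidyverse version, so it can be passed to the same geom_tile() call.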
This is a real hack using ggforce. That package has a geom, geom_link(), that can draw colour gradients, but it is for a line segment. I've simply increased the size to make the line segment look like a bar. I made all the bars the same length to get the correct gradient, then covered part of each bar with the background colour so that each bar appears to be the correct length. I had to hide the grid lines, however. :-)
library(tidyverse)
library(ggforce)
df %>%
ggplot() +
geom_link(aes(x = 0, xend = max(value), y = group, yend = group, color = after_stat(index)), size = 30) +
geom_link(aes(x = value, xend = max(value), y = group, yend = group), color = "grey", size = 31) +
scale_color_viridis_c(option = "C") +
theme(legend.position = "none", panel.background = element_rect(fill = "grey"),
panel.grid = element_blank()) +
ylim(0.5, max(df$group)+0.5 )

Aesthetics must be either length 1 or the same as the data (1): x, y, label

I'm working on some data on party polarization (something like this) and used geom_dumbbell from ggalt and ggplot2. I keep getting the same aes error and other solutions in the forum did not address this as effectively. This is my sample data.
library(dplyr)
df <- tibble(policy=c("Not enough restrictions on gun ownership", "Climate change is an immediate threat", "Abortion should be illegal"),
Democrats=c(0.54, 0.82, 0.30),
Republicans=c(0.23, 0.38, 0.40),
diff=sprintf("%+d", as.integer((Democrats-Republicans)*100)))
I wanted to keep the order of the policies in the plot, so I converted policy to a factor; I also wanted % to be shown only on the first label.
df <- arrange(df, desc(diff))
df$policy <- factor(df$policy, levels=rev(df$policy))
percent_first <- function(x) {
x <- sprintf("%d%%", round(x*100))
# keep the % sign only on the first label (x[-1] is empty when length(x) == 1)
x[-1] <- sub("%$", "", x[-1])
x
}
Then I used ggplot that rendered something close to what I wanted.
gg2 <- ggplot()
gg2 <- gg + geom_segment(data = df, aes(y=country, yend=country, x=0, xend=1), color = "#b2b2b2", size = 0.15)
# making the dumbbell
gg2 <- gg + geom_dumbbell(data=df, aes(y=country, x=Democrats, xend=Republicans),
size=1.5, color = "#B2B2B2", point.size.l=3, point.size.r=3,
point.color.l = "#9FB059", point.color.r = "#EDAE52")
I then wanted the dumbbell to read Democrat and Republican on top to label the two points (like this). This is where I get the error.
gg2 <- gg + geom_text(data=filter(df, country=="Government will not control gun violence"),
aes(x=Democrats, y=country, label="Democrats"),
color="#9fb059", size=3, vjust=-2, fontface="bold", family="Calibri")
gg2 <- gg + geom_text(data=filter(df, country=="Government will not control gun violence"),
aes(x=Republicans, y=country, label="Republicans"),
color="#edae52", size=3, vjust=-2, fontface="bold", family="Calibri")
Any thoughts on what I might be doing wrong?
I think it would be easier to build your own "dumbbells" with geom_path() and geom_point(). Working with your df and changing the variable references "country" to "policy":
library(tidyverse)
# gather data into long form to make ggplot happy
df2 <- gather(df,"party", "value", Democrats:Republicans)
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
# our dumbell
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
# the text labels
geom_text(aes(label = party), vjust = -1.5) + # use vjust to shift the text up so it does not overlap the points
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red")) + # named vector to map colors to values in df2
scale_x_continuous(limits = c(0,1), labels = scales::percent) # scales::percent formats proportions as percentages, no string pasting needed
This produces a plot with some overlapping labels. I think you could avoid that by using just the first letter of party, like this:
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
geom_text(aes(label = gsub("^(\\D).*", "\\1", party)), vjust = -1.5) + # just the first letter instead
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red"),
guide = "none") +
scale_x_continuous(limits = c(0,1), labels = scales::percent)
To label only the top issue with the party names:
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
geom_text(data = filter(df2, policy == "Not enough restrictions on gun ownership"),
aes(label = party), vjust = -1.5) +
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red")) +
scale_x_continuous(limits = c(0,1), labels = scales::percent)
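In current tidyr, gather() is superseded by pivot_longer(). A sketch of the same reshaping step, using a cut-down copy of the question's data (two policies only, as an illustrative assumption):

```r
library(tidyr)

df <- data.frame(
  policy = c("Not enough restrictions on gun ownership",
             "Climate change is an immediate threat"),
  Democrats = c(0.54, 0.82),
  Republicans = c(0.23, 0.38),
  stringsAsFactors = FALSE
)

# Long form: one row per policy and party, as gather() produced above
df2 <- pivot_longer(df, cols = c(Democrats, Republicans),
                    names_to = "party", values_to = "value")
```

The resulting df2 has the policy, party and value columns that the plotting code above expects.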

How to merge legends for color and shape when geom_hline has a separate (additional) entry in the color legend?

I have the following code, which produces the following plot:
cols <- brewer.pal(n = 3, name = 'Dark2')
p4 <- ggplot(all.m, aes(x=xval, y=yval, colour = Approach, ymax = 0.95)) + theme_bw() +
geom_errorbar(aes(ymin= yval - se, ymax = yval + se), width=5, position=pd) +
geom_line(position=pd) +
geom_point(aes(shape=Approach, colour = Approach), size = 4) +
geom_hline(aes(yintercept = cp.best$slope, colour = "C2P"), show_guide = FALSE) +
scale_color_manual(name="Approach", breaks=c("C2P", "P2P", "CP2P"), values = cols[c(1,3,2)]) +
scale_y_continuous(breaks = seq(0.4, 0.95, 0.05), "Test AUROC") +
scale_x_continuous(breaks = seq(10, 150, by = 20), "# Number of Patient Samples in Training")
p4 <- p4 + theme(legend.direction = 'horizontal',
legend.position = 'top',
plot.margin = unit(c(5.1, 7, 4.5, 3.5)/2, "lines"),
text = element_text(size=15), axis.title.x=element_text(vjust=-1.5), axis.title.y=element_text(vjust=2))
p4 <- p4 + guides(colour=guide_legend(override.aes=list(shape=c(NA,17,16))))
p4
When I try show_guide = FALSE in geom_point, the shapes of the points in the upper legend are all set to the default solid circles.
How can I make the lower legend disappear without affecting the upper legend?
This is a solution, complete with reproducible data:
library("ggplot2")
library("grid")
library("RColorBrewer")
cp2p <- data.frame(xval = 10 * 2:15, yval = cumsum(c(0.55, rnorm(13, 0.01, 0.005))), Approach = "CP2P", stringsAsFactors = FALSE)
p2p <- data.frame(xval = 10 * 1:15, yval = cumsum(c(0.7, rnorm(14, 0.01, 0.005))), Approach = "P2P", stringsAsFactors = FALSE)
pd <- position_dodge(0.1)
cp.best <- list(slope = 0.65)
all.m <- rbind(p2p, cp2p)
all.m$Approach <- factor(all.m$Approach, levels = c("C2P", "P2P", "CP2P"))
all.m$se <- rnorm(29, 0.1, 0.02)
all.m[nrow(all.m) + 1, ] <- all.m[nrow(all.m) + 1, ] # Creates a new row filled with NAs
all.m$Approach[nrow(all.m)] <- "C2P"
cols <- brewer.pal(n = 3, name = 'Dark2')
p4 <- ggplot(all.m, aes(x=xval, y=yval, colour = Approach, ymax = 0.95)) + theme_bw() +
geom_errorbar(aes(ymin= yval - se, ymax = yval + se), width=5, position=pd) +
geom_line(position=pd) +
geom_point(aes(shape=Approach, colour = Approach), size = 4, na.rm = TRUE) +
geom_hline(aes(yintercept = cp.best$slope, colour = "C2P")) +
scale_color_manual(values = c(C2P = cols[1], P2P = cols[2], CP2P = cols[3])) +
scale_shape_manual(values = c(C2P = NA, P2P = 16, CP2P = 17)) +
scale_y_continuous(breaks = seq(0.4, 0.95, 0.05), "Test AUROC") +
scale_x_continuous(breaks = seq(10, 150, by = 20), "# Number of Patient Samples in Training")
p4 <- p4 + theme(legend.direction = 'horizontal',
legend.position = 'top',
plot.margin = unit(c(5.1, 7, 4.5, 3.5)/2, "lines"),
text = element_text(size=15), axis.title.x=element_text(vjust=-1.5), axis.title.y=element_text(vjust=2))
p4
The trick is to make sure that all of the desired levels of all.m$Approach appear in all.m, even if one of them gets dropped out of the graph. The warning about the omitted point is suppressed by the na.rm = TRUE argument to geom_point.
Short answer:
Just add a dummy geom_point layer (transparent points) where shape is mapped to the same level as in geom_hline.
geom_point(aes(shape = "int"), alpha = 0)
Longer answer:
Whenever possible, ggplot merges / combines legends of different aesthetics. For example, if colour and shape are mapped to the same variable, the two legends are combined into one.
I illustrate this using a simple data set with 'x', 'y' and a grouping variable 'grp' with two levels:
df <- data.frame(x = rep(1:2, 2), y = 1:4, grp = rep(c("a", "b"), each = 2))
First we map both color and shape to 'grp'
ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
geom_line() +
geom_point(size = 4)
Fine, the legends for the aesthetics, color and shape, are merged into one.
Then we add a geom_hline. We want it to have a separate color from the geom_lines and to appear in the legend. Thus, we map color to a variable, i.e. put color inside aes of geom_hline. In this case we do not map the color to a variable in the data set, but to a constant. We may give the constant a desired name, so we don't need to rename the legend entries afterwards.
ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
geom_line() +
geom_point(size = 4) +
geom_hline(aes(yintercept = 2.5, color = "int"))
Now two legends appear: one for the color aesthetic of geom_line and geom_hline, and one for the shape of geom_point. The reason is that the "variable" which color is mapped to now contains three levels: the two levels of 'grp' in the original data, plus the level 'int' introduced in the aes of geom_hline. Thus, the levels in the color scale differ from those in the shape scale, and by default ggplot can't merge the two scales into one legend.
How to combine the two legends?
One possibility is to introduce the same additional level for shape as for color, using a dummy geom_point layer with transparent points (alpha = 0), so that the two aesthetics contain the same levels:
ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
geom_line() +
geom_point(size = 4) +
geom_hline(aes(yintercept = 2.5, color = "int")) +
geom_point(aes(shape = "int"), alpha = 0) # <~~~~ a blank geom_point
Another possibility is to convert the original grouping variable to a factor, and add the "geom_hline level" to the original levels. Then use drop = FALSE in scale_shape_discrete to include "unused factor levels from the scale":
df$grp <- factor(df$grp, levels = c(unique(as.character(df$grp)), "int"))
ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
geom_line() +
geom_point(size = 4) +
geom_hline(aes(yintercept = 2.5, color = "int")) +
scale_shape_discrete(drop = FALSE)
Then, as you already know, you may use the guides function to "override" the shape aesthetics in the legend, and remove the shape from the geom_hline entry by setting it to NA:
guides(colour = guide_legend(override.aes = list(shape = c(16, 17, NA))))
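Putting the factor-level option and the guides() override together gives the following self-contained sketch on the same toy data. It only combines the snippets above into one call and assumes a reasonably recent ggplot2.

```r
library(ggplot2)

df <- data.frame(x = rep(1:2, 2), y = 1:4, grp = rep(c("a", "b"), each = 2))

# Add the hline's level to the grouping factor so shape knows about it too
df$grp <- factor(df$grp, levels = c("a", "b", "int"))

p <- ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
  geom_line() +
  geom_point(size = 4) +
  geom_hline(aes(yintercept = 2.5, color = "int")) +
  # Keep the unused "int" level in the shape scale so both scales match
  scale_shape_discrete(drop = FALSE) +
  # No point symbol for the hline entry in the merged legend
  guides(colour = guide_legend(override.aes = list(shape = c(16, 17, NA))))
```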

Change text/labels ggplot legend

I believe this question is slightly different from similar ones asked here before because of the use of scale_fill_brewer(). I'm working on a choropleth similar to this one: https://gist.github.com/233134
That looks like this:
and the legend like:
I like it, but I want to change the labels in the legend from cut-style labels, i.e. (2, 4], to something friendlier like '2% to 4%' or '2% - 4%'. I've seen elsewhere that it's easy to change the labels inside of scale_... as seen here. I can't figure out where to put the labels= argument. I could of course recode choropleth$rate_d, but that seems inefficient. Where should I put the argument labels=c(A, B, C, D...)?
Here's the piece of the code of interest (for the full code, use the link above):
choropleth$rate_d <- cut(choropleth$rate, breaks = c(seq(0, 10, by = 2), 35))
# Once you have the data in the right format, recreating the plot is straight
# forward.
ggplot(choropleth, aes(long, lat, group = group)) +
geom_polygon(aes(fill = rate_d), colour = alpha("white", 1/2), size = 0.2) +
geom_polygon(data = state_df, colour = "white", fill = NA) +
scale_fill_brewer(palette = "PuRd")
Thank you in advance for your assistance.
EDIT: Using DWin's method (I should have posted this error, as this is what I ran up against before):
> ggplot(choropleth, aes(long, lat, group = group)) +
+ geom_polygon(aes(fill = rate_d), colour = alpha("white", 1/2), size = 0.2) +
+ geom_polygon(data = state_df, colour = "white", fill = NA) +
+ scale_fill_brewer(pal = "PuRd", labels = lev4)
Error: Labels can only be specified in conjunction with breaks
Besides adding a modified version of the levels, you also need to set the 'breaks' parameter to scale_fill_brewer:
lev <- levels(choropleth$rate_d) # "(2,4]" used as the test case in the outputs below
lev2 <- gsub("\\,", "% to ", lev)
lev3 <- gsub("\\]$", "%", lev2)
lev3
# [1] "(2% to 4%"
lev4 <- gsub("\\(|\\)", "", lev3)
lev4
# [1] "2% to 4%"
ggplot(choropleth, aes(long, lat, group = group)) +
geom_polygon(aes(fill = rate_d), colour = alpha("white", 1/2), size = 0.2) +
geom_polygon(data = state_df, colour = "white", fill = NA) +
scale_fill_brewer(palette = "PuRd", breaks = lev, labels = lev4)
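The three gsub() calls can also be collapsed into a single sub() with capture groups. A sketch on made-up rates (the rate vector is an assumption); cut() keeps all six levels even when some are empty, so the new labels line up with the levels one to one.

```r
rate <- c(1.5, 3, 5.2, 7.9, 9.1, 20)  # illustrative values
rate_d <- cut(rate, breaks = c(seq(0, 10, by = 2), 35))

lev <- levels(rate_d)
# "(2,4]" -> "2% to 4%": capture the two endpoints, drop the brackets
lev_nice <- sub("^\\((.*),(.*)\\]$", "\\1% to \\2%", lev)
print(lev_nice)
```

The result would then go into scale_fill_brewer(palette = "PuRd", breaks = lev, labels = lev_nice), with the breaks given as the factor levels of the discrete fill scale.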
