Change text/labels ggplot legend - r

I believe this question is slightly different than similar ones asked on here before because of the use of scale_fill_brewer(. I'm working on a choropleth similar to this one https://gist.github.com/233134
That looks like this:
and the legend like:
I like it but want to change the labels on the legend from cut looking labels ie (2, 4] to something more friendly like '2% to 4%' or '2% - 4%'. I've seen elsewhere it;s easy to change the labels inside of scale_... as seen here. I can't seem to figure out where to put the labels= argument. I of course could re code choropleth$rate_d but that seems to be inefficient. Where should I put the argument labels=c(A, B, C, D...)?
Here's the piece of the code of interest (for the full code use the link above)
choropleth$rate_d <- cut(choropleth$rate, breaks = c(seq(0, 10, by = 2), 35))
# Once you have the data in the right format, recreating the plot is straight
# forward.
ggplot(choropleth, aes(long, lat, group = group)) +
geom_polygon(aes(fill = rate_d), colour = alpha("white", 1/2), size = 0.2) +
geom_polygon(data = state_df, colour = "white", fill = NA) +
scale_fill_brewer(pal = "PuRd")
Thank you in advance for your assistance.
EDIT: USing DWin's method (should have posted this error as this is what I ran up against before)
> ggplot(choropleth, aes(long, lat, group = group)) +
+ geom_polygon(aes(fill = rate_d), colour = alpha("white", 1/2), size = 0.2) +
+ geom_polygon(data = state_df, colour = "white", fill = NA) +
+ scale_fill_brewer(pal = "PuRd", labels = lev4)
Error: Labels can only be specified in conjunction with breaks

Besides adding a modified version of the levels, you also need to set the 'breaks' parameter to scale_fill_brewer:
lev = levels(rate_d) # used (2, 4] as test case
lev2 <- gsub("\\,", "% to ", lev)
lev3 <- gsub("\\]$", "%", lev2)
lev3
[1] "(2% to 4%"
lev4 <- gsub("\\(|\\)", "", lev3)
lev4
[1] "2% to 4%"
ggplot(choropleth, aes(long, lat, group = group)) +
geom_polygon(aes(fill = rate_d), colour = alpha("white", 1/2), size = 0.2) +
geom_polygon(data = state_df, colour = "white", fill = NA) +
scale_fill_brewer(pal = "PuRd", labels = lev4, , breaks=seq(0, 10, by = 2) )

Related

How do I add a legend to identify vertical lines in ggplot?

I have a chart that shows mobile usage by operating system. I'd like to add vertical lines to identify when those operating systems were released. I'll go through the chart and then the code.
The chart -
The code -
dev %>%
group_by(os) %>%
mutate(monthly_change = prop - lag(prop)) %>%
ggplot(aes(month, monthly_change, color = os)) +
geom_line() +
geom_vline(xintercept = as.numeric(ymd("2013-10-01"))) +
geom_text(label = "KitKat", x = as.numeric(ymd("2013-10-01")) + 80, y = -.5)
Instead of adding the text in the plot, I'd like to create a legend to identify each of the lines. I'd like to give each of them its own color and then have a legend to identify each. Something like this -
Can I make my own custom legend like that?
1) Define a data frame that contains the line data and then use geom_vline with it. Note that BOD is a data frame that comes with R.
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
geom_vline(aes(xintercept = xintercept, color = Lines), line.data, size = 1) +
scale_colour_manual(values = line.data$color)
2) Alternately put the labels right on the plot itself to avoid an extra legend. Using the line.data frame above. This also has the advantage of avoiding possible multiple legends with the same aesthetic.
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
annotate("text", line.data$xintercept, max(BOD$demand), hjust = -.25,
label = line.data$Lines) +
geom_vline(aes(xintercept = xintercept), line.data, size = 1)
3) If the real problem is that you want two color legends then there are two packages that can help.
3a) ggnewscale Any color geom that appears after invoking new_scale_color will get its own scale.
library(ggnewscale)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
scale_colour_manual(values = c("red", "orange")) +
new_scale_color() +
geom_vline(aes(xintercept = xintercept, colour = line.data$color), line.data,
size = 1) +
scale_colour_manual(values = line.data$color)
3b) relayer The experimental relayer package (only on github) allows one to define two color aethetics, color and color2, say, and then have separate scales for each one.
library(dplyr)
library(relayer)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
geom_vline(aes(xintercept = xintercept, colour2 = line.data$color), line.data,
size = 1) %>% rename_geom_aes(new_aes = c("colour" = "colour2")) +
scale_colour_manual(aesthetics = "colour", values = c("red", "orange")) +
scale_colour_manual(aesthetics = "colour2", values = line.data$color)
You can definitely make your own custom legend, but it is a bit complicated, so I'll take you through it step-by-step with some fake data.
The fake data contained 100 samples from a normal distribution (monthly_change for your data), 5 groupings (similar to the os variable in your data) and a sequence of dates from a random starting point.
library(tidyverse)
library(lubridate)
y <- rnorm(100)
df <- tibble(y) %>%
mutate(os = factor(rep_len(1:5, 100)),
date = seq(from = ymd('2013-01-01'), by = 1, length.out = 100))
You already use the colour aes for your call to geom_line, so you will need to choose a different aes to map onto the calls to geom_vline. Here, I use linetype and a call to scale_linetype_manual to manually edit the linetype legend to how I want it.
ggplot(df, aes(x = date, y = y, colour = os)) +
geom_line() +
# set `xintercept` to your date and `linetype` to the name of the os which starts
# at that date in your `aes` call; set colour outside of the `aes`
geom_vline(aes(xintercept = min(date),
linetype = 'os 1'), colour = 'red') +
geom_vline(aes(xintercept = median(date),
linetype = 'os 2'), colour = 'blue') +
# in the call to `scale_linetype_manual`, `name` will be the legend title;
# set `values` to 1 for each os to force a solid vertical line;
# use `guide_legend` and `override.aes` to change the colour of the lines in the
# legend to match the colours in the calls to `geom_vline`
scale_linetype_manual(name = 'lines',
values = c('os 1' = 1,
'os 2' = 1),
guide = guide_legend(override.aes = list(colour = c('red',
'blue'))))
And there you go, a nice custom legend. Please do remember next time that if you can provide your data, or a minimally reproducible example, we can better answer your question without having to generate fake data.

R bubble plot using ggplot manually selecting the colour and axis names

I using ggplot to create a bubble plot. With this code:
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
theme_bw() +
theme() +
scale_size(range = c(1, 50)) +
ylim(0,100)
It is working perfectly apart from 2 things:
For each name (fill) I would like to manually specify the colour used (via a dataframe that maps name to colour) - this is to provide consistency across multiple figures.
I would like to substitute the numbers on the y for text labels (for several reasons I cannot use the text labels from the outset due to ordering issues)
I have tried several methods using scale_color_manual() and scale_y_continuous respectively and I am getting nowhere! Any help would be very gratefully received!
Thanks
Since you have not specified an example df, I created one of my own.
To manually specify the color, you have to use scale_fill_manual with a named vector as the argument of values.
Edit 2
This appears to do what you want. We use scale_y_continuous. The breaks argument specifies the vector of positions, while the labels argument specifies the labels which should appear at those positions. Since we already created the vectors when creating the data frame, we simply pass those vectors as arguments.
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_y_continuous(breaks = mean, labels = order_label)
Edit 1
From your comment, it appears that you want to label the circles. One option would be to use geom_text. Code below. You may need to experiment with values of nudge_y to get the position correct.
order <- c(1, 2)
mean <- c(0.75, 0.3)
n <- c(180, 200)
name <- c("a", "b")
order_label <- c("New York", "London")
df <- data.frame(order, mean, n, name, order_label, stringsAsFactors = FALSE)
color <- c("blue", "red")
name_color <- data.frame(name, color, stringsAsFactors = FALSE)
gcolors <- name_color[, 2]
names(gcolors) <- name_color[, 1]
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
geom_text(aes(label = order_label), size = 3, hjust = "inward",
nudge_y = 0.03) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank()) +
ylab(NULL)
Original Answer
It is not clear what you mean by "substitute the numbers on the y for text labels". In the example below, I have formatted the y-axis as a percentage using the scales::percent_format() function. Is this similar to what you want?
order <- c(1, 2)
mean <- c(0.75, 0.3)
n <- c(180, 200)
name <- c("a", "b")
df <- data.frame(order, mean, n, name, stringsAsFactors = FALSE)
color <- c("blue", "red")
name_color <- data.frame(name, color, stringsAsFactors = FALSE)
gcolors <- name_color[, 2]
names(gcolors) <- name_color[, 1]
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_y_continuous(labels = scales::percent_format())
Thanks, for all your help, this worked perfectly:
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_x_continuous(breaks = order, labels = order_label)

ggplot generating two legends when only one is wanted

In R I'm trying to generate a plot where I want to apply unique colors, line types, transparencies, and line thicknesses by case grouping. As currently implemented two legend plots are generated instead of one. The second legend plot is the only one that I can change the legend title. Presumably I've made a mistake any help would be greatly appreciated.
Ultimately I want to generate a single legend and have the style changes and labeling changes take effect.
library(ggplot2)
temp_df <- data.frame(year = integer(50), value = numeric(50), case = character(50))
temp_df$year <- 1:50
temp_df$value <- runif(50)
temp_df$case <- "A"
df <- temp_df
temp_df$value <- runif(50)
temp_df$case <- "B"
df <- rbind(df, temp_df)
LineTypes <- c("solid", "dotted")
colors <- c("red", "black")
linealphas <- c(1, .8)
linesizes <- c(1, 2)
Plot <- ggplot(df, aes(x = year, y = value, group = case))+
geom_line(aes(linetype = case, color = case, size = case, alpha = case))+
scale_linetype_manual(values = LineTypes)+
scale_color_manual(values = colors)+
scale_y_continuous(limits = c(0, 1), labels = scales::percent)+
scale_alpha_manual(values = linealphas)+
scale_size_manual(values = linesizes)+
xlab("Year")+
ylab("Percentage%")+
labs(color = "Scenario")+
theme_minimal()
Plot
If you want ggplot to merge the legends then they all have to have the same title. You can specify the legend title with the name argument in the scales:
ggplot(df, aes(x = year, y = value, group = case))+
geom_line(aes(linetype = case, color = case, size = case, alpha = case)) +
scale_linetype_manual(values = LineTypes, name = "Scenario")+
scale_color_manual(values = colors, name = "Scenario")+
scale_y_continuous(limits = c(0, 1), labels = scales::percent)+
scale_alpha_manual(values = linealphas, name = "Scenario")+
scale_size_manual(values = linesizes, name = "Scenario")+
xlab("Year")+
ylab("Percentage%")+
theme_minimal()
A coworker pointed out a resolution to me the key was to remove the guides so that only one of styles that I had defined was being used for the legend.
guides(size = FALSE)+
guides(alpha = FALSE)+
guides(linetype = FALSE)+
His explanation for this was that R doesn't recognize that the vector of factors defining the properties of the plot are necessarily related. As a result it will generate multiple legends when only one is desired.
library(ggplot2)
temp_df<-data.frame(year=integer(50),value=numeric(50),case=character(50))
temp_df$year<-1:50
temp_df$value<-runif(50)
temp_df$case<-"A"
df<-temp_df
temp_df$value<-runif(50)
temp_df$case<-"B"
df<-rbind(df,temp_df)
LineTypes<-c("solid","dotted")
colors<-c("red","black")
linealphas<-c(1,.8)
linesizes<-c(1,2)
Plot<-ggplot(df,aes(x=year,y=value,group=case))+
geom_line(aes(linetype=case, color=case, size=case, alpha =case))+
scale_linetype_manual(values=LineTypes)+
scale_color_manual(values=colors)+
scale_y_continuous(limits=c(0,1),labels = scales::percent)+
scale_alpha_manual(values=linealphas)+
scale_size_manual(values=linesizes)+
xlab("Year")+
ylab("Percentage%")+
labs(color = "Scenario")+
guides(size = FALSE)+
guides(alpha = FALSE)+
guides(linetype = FALSE)+
theme_minimal()
Plot
Can't you just remove the line "labs(color = "Scenario")"?
This is the plot that gets generated. Not sure if it's missing anything that you need.
The result seems fine to me:

Aesthetics must be either length 1 or the same as the data (1): x, y, label

I'm working on some data on party polarization (something like this) and used geom_dumbbell from ggalt and ggplot2. I keep getting the same aes error and other solutions in the forum did not address this as effectively. This is my sample data.
df <- data_frame(policy=c("Not enough restrictions on gun ownership", "Climate change is an immediate threat", "Abortion should be illegal"),
Democrats=c(0.54, 0.82, 0.30),
Republicans=c(0.23, 0.38, 0.40),
diff=sprintf("+%d", as.integer((Democrats-Republicans)*100)))
I wanted to keep order of the plot, so converted policy to factor and wanted % to be shown only on the first line.
df <- arrange(df, desc(diff))
df$policy <- factor(df$policy, levels=rev(df$policy))
percent_first <- function(x) {
x <- sprintf("%d%%", round(x*100))
x[2:length(x)] <- sub("%$", "", x[2:length(x)])
x
}
Then I used ggplot that rendered something close to what I wanted.
gg2 <- ggplot()
gg2 <- gg + geom_segment(data = df, aes(y=country, yend=country, x=0, xend=1), color = "#b2b2b2", size = 0.15)
# making the dumbbell
gg2 <- gg + geom_dumbbell(data=df, aes(y=country, x=Democrats, xend=Republicans),
size=1.5, color = "#B2B2B2", point.size.l=3, point.size.r=3,
point.color.l = "#9FB059", point.color.r = "#EDAE52")
I then wanted the dumbbell to read Democrat and Republican on top to label the two points (like this). This is where I get the error.
gg2 <- gg + geom_text(data=filter(df, country=="Government will not control gun violence"),
aes(x=Democrats, y=country, label="Democrats"),
color="#9fb059", size=3, vjust=-2, fontface="bold", family="Calibri")
gg2 <- gg + geom_text(data=filter(df, country=="Government will not control gun violence"),
aes(x=Republicans, y=country, label="Republicans"),
color="#edae52", size=3, vjust=-2, fontface="bold", family="Calibri")
Any thoughts on what I might be doing wrong?
I think it would be easier to build your own "dumbbells" with geom_segment() and geom_point(). Working with your df and changing the variable refences "country" to "policy":
library(tidyverse)
# gather data into long form to make ggplot happy
df2 <- gather(df,"party", "value", Democrats:Republicans)
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
# our dumbell
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
# the text labels
geom_text(aes(label = party), vjust = -1.5) + # use vjust to shift text up to no overlap
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red")) + # named vector to map colors to values in df2
scale_x_continuous(limits = c(0,1), labels = scales::percent) # use library(scales) nice math instead of pasting
Produces this plot:
Which has some overlapping labels. I think you could avoid that if you use just the first letter of party like this:
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
geom_text(aes(label = gsub("^(\\D).*", "\\1", party)), vjust = -1.5) + # just the first letter instead
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red"),
guide = "none") +
scale_x_continuous(limits = c(0,1), labels = scales::percent)
Only label the top issue with names:
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
geom_text(data = filter(df2, policy == "Not enough restrictions on gun ownership"),
aes(label = party), vjust = -1.5) +
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red")) +
scale_x_continuous(limits = c(0,1), labels = scales::percent)

Selecting entries in legend of ggplot in R

I'm creating a figure using ggplot. That figure has 27 lines that I want to show but not emphasize, and two lines, mean and weighted mean, that I want to emphasize. I would like only these last two lines to appear into the legend of the plot. Here is my code:
p_plot <- ggplot(data = dta, aes(x = date, y = premium, colour = State)) +
geom_line(, show_guide=FALSE) +
scale_color_manual(values=c(rep("gray60", 27)))
p_plot <- p_plot + geom_line(aes(y = premium.m), colour = "blue", size = 1.25,
show_guide=TRUE) + geom_line(aes(y = premium.m.w), colour = "red",
size = 1.25, show_guide=c(TRUE)) + ylab("Pe/pg")
p_plot
The show_guide = FALSE statement in the first geom_line seems to be overridden by the other show_guide=TRUE statements. How can I limit the number of entries in the legend of my figures to the lines "premium.m" and "premium.m.w"? Thank you.
I think this should answer your question: (the code's been slightly modified but the concept is the same)
dta <- data.frame(date = rep(seq.Date(as.Date("2010-01-01"), as.Date("2010-12-01"), "months"), 26),
premium = rnorm(12*26),
State = rep(letters, each = 12))
library(ggplot2)
p_plot <- ggplot(data = dta) +
geom_line(aes(x = date, y = premium, group = State), colour = "grey60")
p_plot + geom_line(aes(x = unique(date), y = as.numeric(tapply(premium, date, mean)), colour = "mean"),
size = 1.25) +
geom_line(aes(x = unique(date), y = as.numeric(tapply(premium, date, median)), colour = "median"),
size = 1.25) + ylab("Pe/pg") + scale_color_discrete("stats")
p_plot
However, this is just a (ugly) workaround and far from the best practice for data visualisation (especially for the purposes ggplot has been implemented for). Anyway, I could provide you with a more elegant solution if you edited your question adding more details.

Resources