R: ggplot2 removing some legend entries

library(reshape2)
library(ggplot2)
df <- data.frame(time = 1:10,
                 x1 = rnorm(10),
                 x2 = rnorm(10),
                 x3 = rnorm(10),
                 y1 = rnorm(10),
                 y2 = rnorm(10))
df <- melt(df, id = "time")
ggplot(df, aes(x = time, y = value, color = variable, group = variable,
               size = variable, linetype = variable)) +
  geom_line() +
  scale_linetype_manual(values = c(rep(1, 3), 2, 2)) +
  scale_size_manual(values = c(rep(.3, 3), 2, 2)) +
  scale_color_manual(values = c(rep("grey", 3), "red", "green")) +
  theme_minimal()
This example might not be very representative, but imagine, for example, running a bunch of regression models that are individually unimportant and just contribute to the overall picture, while I want to emphasize only the actual and averaged fit series. So the x variables are not important and should not appear in the legend.
I've tried setting scale_color_discrete(breaks = c("y1", "y2")), as suggested in some other posts. The problem is that all of the aesthetics are already controlled by manual scales, and adding another discrete scale overrides properties that are already set for the graph (and messes up the whole thing). So ideally I'd like to see exactly the same graph, but with only y1 and y2 displayed in the legend.

You can try subsetting the data set by the variable name and plotting them separately.
p <- ggplot(df, aes(x = time, y = value, color = variable,
                    group = variable, size = variable, linetype = variable)) +
  # Emphasized series: styled through mapped aesthetics, so they get legend entries
  geom_line(data = df[which(substr(df$variable, 1, 1) == 'y'), ]) +
  scale_linetype_manual(values = c(2, 2)) +
  scale_size_manual(values = c(2, 2)) +
  scale_color_manual(values = c("red", "green")) +
  theme_minimal() +
  # Background series: colour, size and linetype are fixed outside aes()
  geom_line(data = df[which(substr(df$variable, 1, 1) == 'x'), ],
            aes(x = time, y = value, group = variable),
            color = "grey", size = 0.3, linetype = 1)
# Aesthetics set to fixed values outside aes() are not mapped to a scale,
# so this layer adds no entries to the legend.
p
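Alternatively, the manual scales accept a breaks argument themselves, so the original single-layer plot can be kept and only the legend entries restricted. The sketch below regenerates data shaped like the question's (random values, seeded). The values are given as named vectors because, in recent ggplot2 versions, unnamed values combined with character breaks may be matched to the breaks rather than to all series; naming them keeps each series on its intended style. Newer ggplot2 releases (3.4.0 and later) prefer linewidth over size for lines and may warn about size.

```r
library(ggplot2)
library(reshape2)

# Toy data shaped like the question's example (random values, seeded).
set.seed(1)
wide <- data.frame(time = 1:10,
                   x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10),
                   y1 = rnorm(10), y2 = rnorm(10))
long <- melt(wide, id.vars = "time")

# Only these series get legend entries; all five lines are still drawn.
keep <- c("y1", "y2")

# Named values tie each style to its series, independent of `breaks`.
cols  <- c(x1 = "grey", x2 = "grey", x3 = "grey", y1 = "red", y2 = "green")
sizes <- c(x1 = 0.3, x2 = 0.3, x3 = 0.3, y1 = 2, y2 = 2)
ltys  <- c(x1 = 1, x2 = 1, x3 = 1, y1 = 2, y2 = 2)

p_breaks <- ggplot(long, aes(x = time, y = value, colour = variable,
                             group = variable, size = variable,
                             linetype = variable)) +
  geom_line() +
  # Identical breaks on all three scales keep them merged into one legend.
  scale_colour_manual(values = cols, breaks = keep) +
  scale_size_manual(values = sizes, breaks = keep) +
  scale_linetype_manual(values = ltys, breaks = keep) +
  theme_minimal()
p_breaks
```

Because every aesthetic is still mapped in a single layer, the drawn plot stays the same as the question's; only the legend is restricted.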

Related

ggplot2 fill legend does not display the correct "fill" color

I have been confused by this problem for a long time. A simple data frame is constructed as follows:
data <- data.frame(
  x = 1:5,
  y = 5:1,
  fill = c(rep("pink", 3), rep("blue", 2)),
  shape = c(rep(21, 3), rep(22, 2))
)
Suppose I want to show a legend for the fill:
uniFill <- unique(data$fill)
p <- ggplot(data, mapping = aes(x = x, y = y, fill = fill)) +
  geom_point(shape = data$shape) +
  # show legend so that I do not call `scale_fill_identity()`
  scale_fill_manual(values = uniFill,
                    labels = uniFill,
                    breaks = uniFill)
p
The plot itself is fine; however, the legend is not correct.
My guess was that different shapes (21 to 25) cannot be merged, so I partitioned the data into two subsets: the first with shape 21 and the second with shape 22.
data1 <- data[1:3, ]
data2 <- data[4:5, ]
# > data1$shape
# [1] 21 21 21
# > data2$shape
# [1] 22 22
ggplot(mapping = aes(x = x, y = y, fill = fill)) +
  geom_point(data = data1, shape = data1$shape) +
  geom_point(data = data2, shape = data2$shape) +
  scale_fill_manual(values = uniFill,
                    labels = uniFill,
                    breaks = uniFill)
Unfortunately, the legend did not change. Then I changed the shape from a vector to a scalar:
ggplot(mapping = aes(x = x, y = y, fill = fill)) +
  geom_point(data = data1, shape = 21) +
  geom_point(data = data2, shape = 22) +
  scale_fill_manual(values = uniFill,
                    labels = uniFill,
                    breaks = uniFill)
Finally, the fill legend is correct.
So what is happening here? Is it a bug? Is it possible to use a single layer with different shapes (21 to 25)?
One possible workaround is to add a guides() component:
p +
  guides(fill = guide_legend(override.aes = list(fill = uniFill,
                                                 shape = 21)))
But I am more interested in why the legend of p does not work.
The main reason the legend in your first example does not work is that you did not map shape inside aes().
I have a couple of other suggestions. Do not store colours in your data frame; instead, add a column of codes that drives the aesthetics, and define your fill and shape values explicitly. Each of the scales needs to have the same name (in this case "Legend") so that ggplot2 merges them into a single legend.
Give this edit a try.
data <- data.frame(
  x = 1:5,
  y = 5:1,
  fill = c(rep("p", 3), rep("b", 2))
)
uniFill <- c("p" = "pink", "b" = "blue")
uniShape <- c("p" = 21, "b" = 22)
p <- ggplot(data,
            mapping = aes(x = x,
                          y = y,
                          fill = fill,
                          shape = fill)) +
  geom_point() +
  # show legend so that I do not call `scale_fill_identity()`
  scale_fill_manual("Legend", values = uniFill,
                    labels = uniFill) +
  scale_shape_manual("Legend", values = uniShape,
                     labels = uniFill)
p
(Edit) If your fill and shape aesthetics do not line up, I don't see any other way than to use guides() and two separate legends. Note that if your attribute column is descriptive, you do not need to set the labels, and your code will be cleaner (compare the shape and fill scales below).
data <- data.frame(
  x = 1:5,
  y = 5:1,
  fill = c(rep("p", 3), rep("b", 2)),
  shape = c(rep("circles", 2), rep("squares", 3))
)
uniFill <- c("p" = "pink", "b" = "blue")
uniShape <- c("circles" = 21, "squares" = 22)
p <- ggplot(data,
            mapping = aes(x = x,
                          y = y,
                          fill = fill,
                          shape = shape)) +
  geom_point() +
  # show legend so that I do not call `scale_fill_identity()`
  scale_fill_manual("Legend fill", values = uniFill,
                    labels = uniFill) +
  scale_shape_manual("Legend shape", values = uniShape) +
  guides(fill = guide_legend("Legend fill", override.aes = list(shape = 21)))
p
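If you would rather keep the question's original data frame, with literal colour names and shape codes, a single layer can still draw both shapes: map shape inside aes() and use identity scales so the stored values are used as-is. This is a sketch, not the answerer's method: scale_fill_identity(guide = "legend") turns on the fill legend that identity scales hide by default, and override.aes gives the legend keys a fillable shape, since the keys may otherwise fall back to the default point shape, which ignores fill.

```r
library(ggplot2)

data <- data.frame(
  x = 1:5,
  y = 5:1,
  fill = c(rep("pink", 3), rep("blue", 2)),
  shape = c(rep(21, 3), rep(22, 2))
)

p_id <- ggplot(data, aes(x = x, y = y, fill = fill, shape = shape)) +
  geom_point(size = 4) +
  # Use the stored colour names and shape codes directly.
  scale_fill_identity(guide = "legend") +
  scale_shape_identity() +
  # Give the fill legend keys a shape that actually shows fill.
  guides(fill = guide_legend(override.aes = list(shape = 21)))
p_id
```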

Implement data from several data frames in one bubble plot

I have a data frame with the dimensions 625616 x 12. I would like to illustrate the data with a bubble plot. To illustrate my situation I will use the mtcars data set.
mtcars$cyl = as.factor(mtcars$cyl)
bp = ggplot(as.data.frame(mtcars), aes(x = wt, y = mpg, size = qsec)) + geom_point(shape = 21)
bp
Analogous to my data frame, this command uses 3 of the 12 columns. Ideally, I would like to add another set of bubbles in a different colour (columns 4-6) to this bubble plot.
I tried using the add argument:
bp2 = ggplot(as.data.frame(mtcars), aes(x = wt2, y = mpg2, size = qsec2)) + geom_point(shape = 21)
plot(bp2, add = T)
Unfortunately, that doesn't work either.
If you have different x, y and size variables in the same data set, you can map them separately in the aesthetics of each geom_point() layer:
df <- data.frame(x1 = rnorm(20), y1 = rnorm(20),
x2 = rnorm(20), y2 = rnorm(20),
z1 = rnorm(20), z2 = rnorm(20))
ggplot(df) +
geom_point(aes(x = x1, y = y1, size = z1), col = "red") +
geom_point(aes(x = x2, y = y2, size = z2), col = "blue")
If you have two distinct data sets, you can pass each one to its own geom through the data argument:
ggplot() +
geom_point(aes(x = x1, y = y1, size = z1), col = "red", data = df1) +
geom_point(aes(x = x2, y = y2, size = z2), col = "blue",data = df2)
Edit based on your comment: you can change the overall size of the points with, e.g., scale_size_continuous(range = c(0, 10)), adjusting the upper limit (10) as needed.
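A self-contained version of the two-data-set case follows. df1 and df2 are hypothetical stand-ins for the two groups of columns, generated with random numbers. Mapping a constant label inside aes(colour = ...) instead of a fixed col = ... is one way to get a legend entry per data set, and scale_size_continuous(range = ...) sets the smallest and largest bubble sizes as described above.

```r
library(ggplot2)

# Hypothetical stand-ins for the two groups of columns.
set.seed(1)
df1 <- data.frame(x1 = rnorm(20), y1 = rnorm(20), z1 = runif(20))
df2 <- data.frame(x2 = rnorm(20), y2 = rnorm(20), z2 = runif(20))

bp <- ggplot() +
  # A constant string mapped to colour creates one legend entry per layer.
  geom_point(aes(x = x1, y = y1, size = z1, colour = "set 1"),
             data = df1, shape = 21) +
  geom_point(aes(x = x2, y = y2, size = z2, colour = "set 2"),
             data = df2, shape = 21) +
  scale_colour_manual(values = c("set 1" = "red", "set 2" = "blue")) +
  # Smallest and largest bubble sizes; both layers share this size scale.
  scale_size_continuous(range = c(1, 10))
bp
```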

R bubble plot using ggplot manually selecting the colour and axis names

I am using ggplot to create a bubble plot with this code:
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
theme_bw() +
theme() +
scale_size(range = c(1, 50)) +
ylim(0,100)
It is working perfectly apart from 2 things:
For each name (fill) I would like to manually specify the colour used (via a dataframe that maps name to colour) - this is to provide consistency across multiple figures.
I would like to replace the numbers on the y axis with text labels (I cannot use the text labels from the outset because of ordering issues).
I have tried several methods using scale_color_manual() and scale_y_continuous respectively and I am getting nowhere! Any help would be very gratefully received!
Thanks
Since you have not specified an example df, I created one of my own.
To specify the colours manually, use scale_fill_manual() with a named vector as the values argument. (scale_color_manual() has no effect here, because the bubbles are coloured through the fill aesthetic.)
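For the consistency-across-figures requirement, the named vector can be built once from a lookup data frame with setNames() and reused in every plot. A small sketch; the lookup table, its third entry and its colours are hypothetical.

```r
library(ggplot2)

# Hypothetical lookup table shared by every figure: name -> colour.
lookup <- data.frame(name = c("a", "b", "c"),
                     colour = c("blue", "red", "darkgreen"),
                     stringsAsFactors = FALSE)
gcolors <- setNames(lookup$colour, lookup$name)

# A figure that contains only some of the names still gets matching colours.
df <- data.frame(order = 1:2, mean = c(0.75, 0.3), n = c(180, 200),
                 name = c("c", "a"), stringsAsFactors = FALSE)

p_fill <- ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
  geom_point(shape = 21) +
  scale_fill_manual(values = gcolors)
p_fill
```

Because the values are matched by name rather than by position, the colour for a given name does not depend on which names happen to appear in a particular figure.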
Edit 2
This appears to do what you want. We use scale_y_continuous. The breaks argument specifies the vector of positions, while the labels argument specifies the labels which should appear at those positions. Since we already created the vectors when creating the data frame, we simply pass those vectors as arguments.
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_y_continuous(breaks = mean, labels = order_label)
Edit 1
From your comment, it appears that you want to label the circles. One option would be to use geom_text. Code below. You may need to experiment with values of nudge_y to get the position correct.
order <- c(1, 2)
mean <- c(0.75, 0.3)
n <- c(180, 200)
name <- c("a", "b")
order_label <- c("New York", "London")
df <- data.frame(order, mean, n, name, order_label, stringsAsFactors = FALSE)
color <- c("blue", "red")
name_color <- data.frame(name, color, stringsAsFactors = FALSE)
gcolors <- name_color[, 2]
names(gcolors) <- name_color[, 1]
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
geom_text(aes(label = order_label), size = 3, hjust = "inward",
nudge_y = 0.03) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank()) +
ylab(NULL)
Original Answer
It is not clear what you mean by "substitute the numbers on the y for text labels". In the example below, I have formatted the y-axis as a percentage using the scales::percent_format() function. Is this similar to what you want?
order <- c(1, 2)
mean <- c(0.75, 0.3)
n <- c(180, 200)
name <- c("a", "b")
df <- data.frame(order, mean, n, name, stringsAsFactors = FALSE)
color <- c("blue", "red")
name_color <- data.frame(name, color, stringsAsFactors = FALSE)
gcolors <- name_color[, 2]
names(gcolors) <- name_color[, 1]
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_y_continuous(labels = scales::percent_format())
Thanks for all your help; this worked perfectly:
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_x_continuous(breaks = order, labels = order_label)

Side by side violin plots for multiple iteration

Here is my data
set.seed(42)
dat = data.frame(iter = rep(1:3, each = 10),
variable = rep(rep(letters[1:2], each = 5), 3),
value = rnorm(30))
I know I can draw violin plots for a and b with
library(ggplot2)
ggplot(data = dat, aes (x = variable, y = value)) + geom_violin()
But how do I draw violin plots for each iteration of a and b, so that there are three plots for a next to three plots for b? I have done this previously with base graphics, but I am looking for a better solution, since the number of iterations and the number of 'a's and 'b's keep changing.
There are two possible ways: map the iteration to fill, or use facet_wrap() (or facet_grid()).
With fill:
ggplot(data = dat, aes (x = variable, y = value, fill = as.factor(iter))) + geom_violin(position = "dodge")
Or using facet_wrap:
ggplot(data = dat, aes (x = as.factor(iter), y = value)) + geom_violin(position = "dodge") + facet_wrap(~variable)
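Both versions should adapt when the number of iterations changes, because the grouping comes from the data rather than from hard-coded positions. As a quick sketch, layer_data() can confirm that the fill version draws one violin per (variable, iteration) pair; iter_f is a helper column introduced here so the factor conversion happens once.

```r
library(ggplot2)

set.seed(42)
dat <- data.frame(iter = rep(1:3, each = 10),
                  variable = rep(rep(letters[1:2], each = 5), 3),
                  value = rnorm(30))
# Convert the iteration to a factor once so it can drive fill and dodging.
dat$iter_f <- factor(dat$iter)

p_violin <- ggplot(dat, aes(x = variable, y = value, fill = iter_f)) +
  geom_violin(position = position_dodge(width = 0.9))
p_violin

# ggplot2 groups by every discrete aesthetic: one group per variable/iteration pair.
n_violins <- length(unique(layer_data(p_violin)$group))
n_violins
```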
Maybe there is a better way but in this kind of situation I usually create a new variable:
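library(ggplot2)
library(dplyr)  # provides %>% and mutate()
set.seed(42)
dat = data.frame(iter = rep(1:3, each = 10),
                 variable = rep(rep(letters[1:2], each = 5), 3),
                 value = rnorm(30))
# Encode variable and iteration as one number: a/iter 1 -> 110, ..., b/iter 3 -> 230
dat <- dat %>% mutate(x_axis = as.factor(as.numeric(factor(variable))*100 + 10*iter))
# Rename the levels; only the middle level of each group ("a", "b") gets a tick label
levels(dat$x_axis) <- c("a1", "a", "a3", "b2", "b", "b3")
ggplot(data = dat,
       aes(x = x_axis, y = value, fill = variable)) +
  geom_violin() +
  scale_x_discrete(breaks = c("a", "b"))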
Result is:

Condition a ..count.. summation on the faceting variable

I'm trying to annotate a bar chart with the percentage of observations falling into that bucket, within a facet. This question is very closely related to this question:
Show % instead of counts in charts of categorical variables. However, faceting introduces a wrinkle. The answer to the related question is to use stat_bin with the text geom and construct the label as follows:
stat_bin(geom="text", aes(x = bins,
y = ..count..,
label = paste(round(100*(..count../sum(..count..)),1), "%", sep="")
)
This works fine for an un-faceted plot. However, with facets, this sum(..count..) is summing over the entire collection of observations without regard for the facets. The plot below illustrates the issue---note that the percentages do not sum to 100% within a panel.
Here is the actual code for the figure above:
g.invite.distro <- ggplot(data = df.exp) +
geom_bar(aes(x = invite_bins)) +
facet_wrap(~cat1, ncol=3) +
stat_bin(geom="text", aes(x = invite_bins,
y = ..count..,
label = paste(round(100*(..count../sum(..count..)),1), "%", sep="")
),
vjust = -1, size = 3) +
theme_bw() +
scale_y_continuous(limits = c(0, 3000))
UPDATE: As requested, here's a small example reproducing the issue:
df <- data.frame(x = c('a', 'a', 'b','b'), f = c('c', 'd','d','d'))
ggplot(data = df) + geom_bar(aes(x = x)) +
stat_bin(geom = "text", aes(
x = x,
y = ..count.., label = ..count../sum(..count..)), vjust = -1) +
facet_wrap(~f)
Update: geom_bar() requires stat = "identity" when the bar heights are supplied as a y column, as in the code below.
Sometimes it's easier to obtain summaries outside the call to ggplot.
df <- data.frame(x = c('a', 'a', 'b', 'b'), f = c('c', 'd', 'd', 'd'))
# Load packages
library(ggplot2)
library(plyr)
# Obtain summary. 'Freq' is the count, 'pct' is the percent within each 'f'
m <- ddply(data.frame(table(df)), .(f), mutate, pct = round(Freq/sum(Freq) * 100, 1))
# Plot the data using the summary data frame
ggplot(data = m, aes(x = x, y = Freq)) +
  geom_bar(stat = "identity", width = .7) +
  # refer to the pct column by name inside aes() rather than via m$pct
  geom_text(aes(label = paste0(pct, "%")), vjust = -1, size = 3) +
  facet_wrap(~ f, ncol = 2) +
  theme_bw() +
  scale_y_continuous(limits = c(0, 1.2 * max(m$Freq)))
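If you would rather keep the computation inside ggplot, a known workaround is to divide each count by its own panel's total inside after_stat(). This is a sketch assuming ggplot2 3.3.0 or later (for after_stat()); it relies on the PANEL column that ggplot2 keeps in the computed stat data, which is an internal detail rather than a documented interface. stat_count is used instead of stat_bin because x is discrete.

```r
library(ggplot2)

df <- data.frame(x = c('a', 'a', 'b', 'b'), f = c('c', 'd', 'd', 'd'))

p_pct <- ggplot(df, aes(x = x)) +
  geom_bar() +
  # The computed data holds every panel's counts together; tapply() gives
  # the total per PANEL, and indexing by PANEL lines each total up with its row.
  geom_text(stat = "count",
            aes(label = after_stat(paste0(
              round(100 * count / tapply(count, PANEL, sum)[PANEL], 1), "%"))),
            vjust = -1, size = 3) +
  facet_wrap(~ f)
p_pct
```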
