Selecting entries in the legend of a ggplot in R

I'm creating a figure using ggplot. The figure has 27 lines that I want to show but not emphasize, and two lines, mean and weighted mean, that I want to emphasize. I would like only these last two lines to appear in the legend of the plot. Here is my code:
p_plot <- ggplot(data = dta, aes(x = date, y = premium, colour = State)) +
geom_line(, show_guide=FALSE) +
scale_color_manual(values=c(rep("gray60", 27)))
p_plot <- p_plot + geom_line(aes(y = premium.m), colour = "blue", size = 1.25,
show_guide=TRUE) + geom_line(aes(y = premium.m.w), colour = "red",
size = 1.25, show_guide=c(TRUE)) + ylab("Pe/pg")
p_plot
The show_guide = FALSE statement in the first geom_line seems to be overridden by the other show_guide=TRUE statements. How can I limit the number of entries in the legend of my figures to the lines "premium.m" and "premium.m.w"? Thank you.

I think this should answer your question (the code has been slightly modified, but the concept is the same):
library(ggplot2)
dta <- data.frame(date = rep(seq.Date(as.Date("2010-01-01"), as.Date("2010-12-01"), "months"), 26),
premium = rnorm(12*26),
State = rep(letters, each = 12))
# summary data: one row per date and statistic; passing it as the layer's own data
# avoids a length mismatch between 12 summary values and the 312 rows of dta
stats <- rbind(
data.frame(date = unique(dta$date), premium = as.numeric(tapply(dta$premium, dta$date, mean)), stat = "mean"),
data.frame(date = unique(dta$date), premium = as.numeric(tapply(dta$premium, dta$date, median)), stat = "median"))
p_plot <- ggplot(data = dta) +
geom_line(aes(x = date, y = premium, group = State), colour = "grey60") +
geom_line(data = stats, aes(x = date, y = premium, colour = stat), size = 1.25) +
ylab("Pe/pg") + scale_color_discrete("stats")
p_plot
However, this is just an (ugly) workaround and far from best practice for data visualisation (especially for the purposes ggplot was designed for). I could provide a more elegant solution if you edit your question to add more details.
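A possibly tidier variant, as a minimal sketch on the same simulated data and assuming ggplot2 3.3.0 or later (for the `fun` argument of stat_summary()): the grey lines get a fixed colour, so they produce no legend entry, while each summary line maps colour to a constant label, so only those labels appear in the legend. stat_summary() only sees the y values, so a weighted mean would still need a precomputed data frame as in the answer. Newer ggplot2 versions prefer `linewidth` over `size` for lines and may warn about `size`.

```r
library(ggplot2)

set.seed(1)
dta <- data.frame(date = rep(seq.Date(as.Date("2010-01-01"), as.Date("2010-12-01"), "months"), 26),
                  premium = rnorm(12 * 26),
                  State = rep(letters, each = 12))

p <- ggplot(dta, aes(x = date, y = premium)) +
  # background lines: grouped by State but with a fixed colour, so no legend entry
  geom_line(aes(group = State), colour = "grey60") +
  # summary lines: colour is mapped to a constant label, so each gets a legend key
  stat_summary(aes(colour = "mean"), fun = mean, geom = "line", size = 1.25) +
  stat_summary(aes(colour = "median"), fun = median, geom = "line", size = 1.25) +
  scale_colour_manual("stats", values = c(mean = "blue", median = "red")) +
  ylab("Pe/pg")
p
```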

Related

Join lines of different aesthetics in ggplot2

I'm trying to solve the following problem: how to join lines of different aesthetics/groups in ggplot and maintain the color of a certain group. It is best to see the examples below for a better understanding of what I mean.
Consider the following code:
set.seed(870123)
library(ggplot2)
example_data <- data.frame(Date = seq.Date(as.Date("2020-01-01"),length.out = 20,by = "month"),
Series = rnorm(20) + 1,
Grouping = c(rep(c("Group1"),10),rep(c("Group2"),10)))
print(ggplot(example_data,aes(x = Date, y = Series)) +
geom_line(aes(color = Grouping)) +
geom_point(aes(color = Grouping)) +
scale_color_manual(values = c(Group1 = "blue", Group2 = "red")))
Which generates the plot:
My problem is that ggplot won't join the lines where the grouping variable changes. For example, I would like a red line joining the last blue point to the first red point. One way to solve such an issue would be to not specify any color aesthetic for the geom_line portion, as below, but this is NOT what I want:
print(ggplot(example_data,aes(x = Date, y = Series)) +
geom_line() +
geom_point(aes(color = Grouping)) +
scale_color_manual(values = c(Group1 = "blue", Group2 = "red")))
The only way I am able to achieve what I want is with the really bad solution demonstrated below...
### Dirty Solution
temporary <- example_data[as.Date("2020-10-01") == example_data$Date,]
temporary$Grouping <- "Group2"
example_data <- rbind(example_data,temporary)
print(ggplot(example_data,aes(x = Date, y = Series)) +
geom_line(aes(color = Grouping)) +
geom_point(aes(color = Grouping)) +
scale_color_manual(values = c(Group1 = "blue", Group2 = "red")))
This produces the plot I want, but I would like to achieve it with a better solution.
Is there any way to achieve what I want with ggplot2? I wasn't able to find a better solution.
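One option is to draw the line as individual segments with geom_segment(), connecting each point to the previous one with dplyr's lag(). Each segment runs from a point back to its predecessor and takes that point's colour, so the segment joining the last blue point and the first red point is red. The first point has no predecessor, so ggplot2 drops that segment with a missing-values warning.
library(ggplot2); library(dplyr)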
ggplot(example_data,aes(x = Date, y = Series)) +
# use lag to connect backwards, lead to connect forwards
geom_segment(aes(color = Grouping, xend = lag(Date), yend = lag(Series))) +
geom_point(aes(color = Grouping)) +
scale_color_manual(values = c(Group1 = "blue", Group2 = "red"))
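A shorter alternative, as a sketch: give geom_line() a single group with `group = 1`. In current ggplot2 versions, a solid path whose colour varies is drawn as separate pieces, each coloured by the point it starts from, so here the connecting piece takes Group1's colour (blue). If the connector must be red, the lag() version is the one to use.

```r
library(ggplot2)

set.seed(870123)
example_data <- data.frame(Date = seq.Date(as.Date("2020-01-01"), length.out = 20, by = "month"),
                           Series = rnorm(20) + 1,
                           Grouping = c(rep("Group1", 10), rep("Group2", 10)))

p <- ggplot(example_data, aes(x = Date, y = Series)) +
  # group = 1 forces one path through all points; colour still varies per piece
  geom_line(aes(color = Grouping, group = 1)) +
  geom_point(aes(color = Grouping)) +
  scale_color_manual(values = c(Group1 = "blue", Group2 = "red"))
p
```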

Adding a label to a straight line in each facet of a facet wrapped ggplot

I've been struggling with one last bit of code to make this graph really work for me and my audience. I have a bar chart with two lines (one acting as a rolling average, the other as the peak of that rolling average). I want to label the peak line with its value once in each facet, where the value differs between facets. Here's some stripped-down data and code:
library(ggplot2)
tdf <- data.frame(a=as.POSIXct(c("2019-10-15 08:00:00","2019-10-15 09:00:00","2019-10-15 10:00:00","2019-10-15 08:00:00","2019-10-15 09:00:00","2019-10-15 10:00:00")),
b=as.Date(c("2019-09-02","2019-09-02","2019-09-02","2019-09-03","2019-09-03","2019-09-03")),
m1=c(0.2222222,0.3636364, 0.2307692, 0.4000000, 0.3428571, 0.3529412),
m2=c(0.2222222,0.2929293, 0.2972028, 0.3153846, 0.3714286, 0.3529412),
m3=c(0.2929293, 0.2929293, 0.2929293, 0.3529412,0.3529412,0.3529412))
g <- ggplot(data = tdf, aes(x = a, y = m1)) +
geom_bar(stat = "identity", alpha = 0.75, fill = 352) +
xlab("time of day") +
ylab("metric name") +
ggtitle("Graph Title") +
scale_x_datetime(breaks = scales::date_breaks("1 hours"),
date_labels = "%H")+
scale_y_continuous(breaks = c(0,.10,.20,.30,.40,.50,.60,.70,.80,.90,1.0),
labels = scales::percent) +
theme_minimal()
# add line for m2
g <- g +
geom_line(data = tdf,
aes(x = a, y = m2),
color = "blue",
size = 1.2)
# add line for m3
g <- g + geom_line(data=tdf,
aes(x = a, y = m3),
color = "#d95f02",
size = 0.6,
linetype = "dashed")
# last attempt to label the line results in an error: Invalid input: time_trans works with objects of class POSIXct
#g <- g+geom_text(aes(x=-Inf, y=Inf, label=median(tdf$m3)), size=2, hjust=-0.5, vjust= 1.4,inherit.aes=FALSE)
# facet wrap
g <- g + facet_wrap(~b, ncol = 5, scales = "fixed")
I've seen a few techniques, but none of them seem to deal with a time x-axis in the facets, with each facet having a different date. I'm reasonably certain the problem is related to the date, but I have no clue how to make the text appear in each facet anyway.
You just need to pass a different dataset to the labeling layer, one that still contains your faceting variable. This works using dplyr:
library(dplyr)
g <- g +
geom_text(data = tdf %>%
group_by(b) %>%
summarize(median = median(m3)),
aes(x = as.POSIXct(-Inf, origin="1970-01-01"),
y = Inf,
label = median),
size = 2,
hjust = -0.5,
vjust = 1.4,
inherit.aes = FALSE)
We also have to explicitly convert x to a date-time (POSIXct) value, because the x axis uses a time scale; that mismatch is what the time_trans error in the question was about.
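The same idea without dplyr, as a sketch on a trimmed copy of the example data: aggregate() builds one label row per facet and keeps the facet column b, so facet_wrap() places each row in its own panel. Here the label is drawn at the height of the dashed line rather than at the top of the panel; the x position (the earliest time) and the scales::percent() formatting are illustrative choices, not part of the answer above.

```r
library(ggplot2)

tdf <- data.frame(a = as.POSIXct(c("2019-10-15 08:00:00", "2019-10-15 09:00:00", "2019-10-15 10:00:00",
                                   "2019-10-15 08:00:00", "2019-10-15 09:00:00", "2019-10-15 10:00:00")),
                  b = as.Date(c("2019-09-02", "2019-09-02", "2019-09-02",
                                "2019-09-03", "2019-09-03", "2019-09-03")),
                  m1 = c(0.2222222, 0.3636364, 0.2307692, 0.4000000, 0.3428571, 0.3529412),
                  m3 = c(0.2929293, 0.2929293, 0.2929293, 0.3529412, 0.3529412, 0.3529412))

# one row per facet: the facet variable b plus the value to print
labs <- aggregate(m3 ~ b, data = tdf, FUN = median)
# x must be a date-time to match the POSIXct axis; here, the earliest time
labs$a <- min(tdf$a)

p <- ggplot(tdf, aes(x = a, y = m1)) +
  geom_col(alpha = 0.75) +
  geom_line(aes(y = m3), colour = "#d95f02", linetype = "dashed") +
  # label sits just above the dashed line, at the left edge of each panel
  geom_text(data = labs, aes(x = a, y = m3, label = scales::percent(m3)),
            hjust = 0, vjust = -0.5, size = 3, inherit.aes = FALSE) +
  facet_wrap(~b)
p
```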

How to correct the position of labels on a piechart in ggplot? Also, how do I produce a 3D piechart?

One of the values in my dataset is zero, and I think that is why I am not able to position the labels correctly in my pie chart.
library(ggplot2)
library(dplyr)
# Sample dataset
Averages <- data.frame(Parameters = c("Cars","Motorbike","Bicycle","Airplane","Ships"), Values = c(15.00,2.81,50.84,51.86,0.00))
mycols <- c("#0073C2FF", "#EFC000FF", "#868686FF", "#CD534CFF","#FF9999")
duty_cycle_pie <- Averages %>% ggplot(aes(x = "", y = Values, fill = Parameters)) +
geom_bar(width = 1, stat = "identity", color = "white") +
coord_polar("y", start = 0)+
geom_text(aes(y = cumsum(Values) - 0.7*Values,label = round(Values*100/sum(Values),2)), color = "white")+
scale_fill_manual(values = mycols)
The labels are not placed correctly. Please also tell me how I can get a 3D piechart.
Welcome to Stack Overflow. I am happy to help; however, I must note that piecharts are highly debatable and 3D piecharts are considered bad practice.
https://www.darkhorseanalytics.com/blog/salvaging-the-pie
https://en.wikipedia.org/wiki/Misleading_graph#3D_Pie_chart_slice_perspective
Additionally, if the names of your variables reflect your actual dataset (Averages), a piechart would not be appropriate, as the pieces do not seem to describe parts of a whole. For example, the average value of Bicycle is 50.84 and that of Airplane is 51.86; having these shown as 42% and 43% is confusing, and a barchart would be easier to follow.
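The percentages the pie chart shows are just each value's share of the total (120.51 here); a quick check with the values from the question:

```r
Values <- c(Cars = 15.00, Motorbike = 2.81, Bicycle = 50.84, Airplane = 51.86, Ships = 0.00)
# share of the total, in percent, as it would appear on the pie
shares <- round(Values / sum(Values) * 100, 2)
shares
```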
Nonetheless, the label placement problem can be solved with position_stack().
library(tidyverse)
Averages <-
data.frame(
Parameters = c("Cars","Motorbike","Bicycle","Airplane","Ships"),
Values = c(15.00,2.81,50.84,51.86,0.00)
) %>%
mutate(
# this will ensure the slices go biggest to smallest (a best practice)
Parameters = fct_reorder(Parameters, Values),
label = round(Values/sum(Values) * 100, 2)
)
mycols <- c("#0073C2FF", "#EFC000FF", "#868686FF", "#CD534CFF","#FF9999")
Averages %>%
ggplot(aes(x = "", y = Values, fill = Parameters)) +
geom_bar(width = 1, stat = "identity", color = "white") +
coord_polar("y", start = 0) +
geom_text(
aes(y = Values, label = label),
color = "black",
position = position_stack(vjust = 0.5)
) +
scale_fill_manual(values = mycols)
To move the labels towards the outside of the pie, you can look into ggrepel:
https://stackoverflow.com/a/44438500/4650934
For my earlier point, I might try something like this instead of a piechart:
ggplot(Averages, aes(Parameters, Values)) +
geom_col(aes(y = 100), fill = "grey70") +
geom_col(fill = "navyblue") +
coord_flip()

How do I add a legend to identify vertical lines in ggplot?

I have a chart that shows mobile usage by operating system. I'd like to add vertical lines to identify when those operating systems were released. Here is the code:
library(dplyr); library(lubridate); library(ggplot2)
dev %>%
group_by(os) %>%
mutate(monthly_change = prop - lag(prop)) %>%
ggplot(aes(month, monthly_change, color = os)) +
geom_line() +
geom_vline(xintercept = as.numeric(ymd("2013-10-01"))) +
geom_text(label = "KitKat", x = as.numeric(ymd("2013-10-01")) + 80, y = -.5)
Instead of adding text to the plot, I'd like to give each vertical line its own color and have a legend that identifies each of them.
Can I make my own custom legend like that?
1) Define a data frame that contains the line data and then use geom_vline with it. Note that BOD is a data frame that comes with R.
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
geom_vline(aes(xintercept = xintercept, color = Lines), line.data, size = 1) +
scale_colour_manual(values = line.data$color)
2) Alternatively, put the labels right on the plot itself to avoid an extra legend, reusing the line.data frame above. This also avoids ending up with multiple legends for the same aesthetic.
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
annotate("text", line.data$xintercept, max(BOD$demand), hjust = -.25,
label = line.data$Lines) +
geom_vline(aes(xintercept = xintercept), line.data, size = 1)
3) If the real problem is that you want two color legends, there are two packages that can help.
3a) ggnewscale: any color geom that appears after a call to new_scale_color() gets its own scale.
library(ggnewscale)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
scale_colour_manual(values = c("red", "orange")) +
new_scale_color() +
geom_vline(aes(xintercept = xintercept, colour = Lines), line.data,
size = 1) +
scale_colour_manual(values = line.data$color)
3b) relayer: the experimental relayer package (only on GitHub) allows one to define two color aesthetics, say color and color2, and then have separate scales for each one.
library(dplyr)
library(relayer)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
geom_vline(aes(xintercept = xintercept, colour2 = Lines), line.data,
size = 1) %>% rename_geom_aes(new_aes = c("colour" = "colour2")) +
scale_colour_manual(aesthetics = "colour", values = c("red", "orange")) +
scale_colour_manual(aesthetics = "colour2", values = line.data$color)
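With unnamed values, scale_colour_manual() assigns colours to legend entries in level order (alphabetical here: "lower", "upper"), which only happens to match line.data$color. A sketch of option 1 with a named vector, which ties each entry to its colour explicitly and survives reordering; `line.cols` is just an illustrative name:

```r
library(ggplot2)

line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
                        color = c("red", "blue"), stringsAsFactors = FALSE)

# names are legend entries, values are colours
line.cols <- setNames(line.data$color, line.data$Lines)

p <- ggplot(BOD, aes(Time, demand)) +
  geom_point() +
  geom_vline(aes(xintercept = xintercept, colour = Lines), data = line.data, size = 1) +
  scale_colour_manual(values = line.cols)
p
```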
You can definitely make your own custom legend, but it is a bit complicated, so I'll take you through it step-by-step with some fake data.
The fake data contains 100 samples from a normal distribution (monthly_change in your data), 5 groupings (similar to the os variable in your data) and a sequence of 100 consecutive dates starting at 2013-01-01.
library(tidyverse)
library(lubridate)
y <- rnorm(100)
df <- tibble(y) %>%
mutate(os = factor(rep_len(1:5, 100)),
date = seq(from = ymd('2013-01-01'), by = 1, length.out = 100))
You already use the colour aes for your call to geom_line, so you will need to choose a different aes to map onto the calls to geom_vline. Here, I use linetype and a call to scale_linetype_manual to manually edit the linetype legend to how I want it.
ggplot(df, aes(x = date, y = y, colour = os)) +
geom_line() +
# set `xintercept` to your date and `linetype` to the name of the os which starts
# at that date in your `aes` call; set colour outside of the `aes`
geom_vline(aes(xintercept = min(date),
linetype = 'os 1'), colour = 'red') +
geom_vline(aes(xintercept = median(date),
linetype = 'os 2'), colour = 'blue') +
# in the call to `scale_linetype_manual`, `name` will be the legend title;
# set `values` to 1 for each os to force a solid vertical line;
# use `guide_legend` and `override.aes` to change the colour of the lines in the
# legend to match the colours in the calls to `geom_vline`
scale_linetype_manual(name = 'lines',
values = c('os 1' = 1,
'os 2' = 1),
guide = guide_legend(override.aes = list(colour = c('red',
'blue'))))
And there you go, a nice custom legend. Next time, please remember that if you provide your data, or a minimal reproducible example, we can answer your question better without having to generate fake data.
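With more than a couple of release dates, the same linetype trick can be driven by a small data frame instead of one geom_vline() call per line. A sketch, assuming a hypothetical `releases` table (the names, dates and colours are made up) and the fake data rebuilt with base R dates: colour is set per row outside aes(), and override.aes repeats the colours in legend order, which is the sorted order of `release` here.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(y = rnorm(100),
                 os = factor(rep_len(1:5, 100)),
                 date = seq(as.Date("2013-01-01"), by = "day", length.out = 100))

# hypothetical release dates: one row per vertical line, already sorted by name
releases <- data.frame(release = c("os 1", "os 2"),
                       date = as.Date(c("2013-01-15", "2013-03-01")),
                       col = c("red", "blue"))

p <- ggplot(df, aes(x = date, y = y, colour = os)) +
  geom_line() +
  # one layer draws every release line; colour is set (not mapped) per row
  geom_vline(data = releases, aes(xintercept = date, linetype = release),
             colour = releases$col) +
  scale_linetype_manual(name = "lines",
                        # a solid line for every release
                        values = setNames(rep(1, nrow(releases)), releases$release),
                        # legend keys follow the sorted release names
                        guide = guide_legend(override.aes = list(colour = releases$col)))
p
```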

ggplot outline jitter datapoints

I'm trying to create a scatterplot where the points are jittered (geom_jitter), but I also want to create a black outline around each point. Currently I'm doing it by adding 2 geom_jitters, one for the fill and one for the outline:
beta <- paste("beta == ", "0.15")
ggplot(aes(x=xVar, y = yVar), data = data) +
geom_jitter(size=3, alpha=0.6, colour=my.cols[2]) +
theme_bw() +
geom_abline(intercept = 0.0, slope = 0.145950, size=1) +
geom_vline(xintercept = 0, linetype = "dashed") +
annotate("text", x = 2.5, y = 0.2, label=beta, parse=TRUE, size=5)+
xlim(-1.5,4) +
ylim(-2,2)+
geom_jitter(shape = 1,size = 3,colour = "black")
However, the outlines do not line up with the filled points.
Because jitter randomly offsets the data, the 2 geom_jitters are not in line with each other. How do I ensure the outlines are in the same place as the fill points?
I've seen threads about this (e.g. Is it possible to jitter two ggplot geoms in the same way?), but they're pretty old, and I'm not sure whether anything new has been added to ggplot that would solve this issue.
The code above works if I use the regular geom_point instead of geom_jitter, but I have too many overlapping points for that to be useful.
EDIT:
The solution in the posted answer works. However, it doesn't quite work for some of my other graphs, where I'm binning by another variable and using it to plot different colours:
ggplot(aes(x=xVar, y = yVar, color=group), data = data) +
geom_jitter(size=3, alpha=0.6, shape=21, fill="skyblue") +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_colour_brewer(name = "Title", direction = -1, palette = "Set1") +
xlim(-1.5,4) +
ylim(-2,2)
My group variable has 3 levels, and I want to colour each group level by a different colour in the brewer Set1 palette. The current solution just colours everything skyblue. What should I fill by to ensure I'm using the correct colour palette?
You don't actually have to use two layers; you can use a point shape with a hole in it, such as shape 21, whose outline is controlled by colour and whose interior is controlled by fill:
# some random data
set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))
ggplot(aes(x = x, y = y), data = df) + geom_jitter(shape = 21, fill = 'skyblue')
The colour, size, and stroke aesthetics let you customize the exact look.
Edit:
For grouped data, set the fill aesthetic to the grouping variable, and use scale_fill_* functions to set color scales:
# more random data
set.seed(47)
df <- data.frame(x = runif(100), y = rnorm(100), group = sample(letters[1:3], 100, replace = TRUE))
ggplot(aes(x=x, y = y, fill=group), data = df) +
geom_jitter(size=3, alpha=0.6, shape=21) +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_fill_brewer(name = "Title", direction = -1, palette = "Set1")
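For the original two-layer approach, newer ggplot2 versions also give position_jitter() a `seed` argument: layers that share the same seed and the same data get identical offsets, so the outline lands exactly on the filled point. A sketch with the same kind of random data; the jitter width, height and seed are arbitrary choices.

```r
library(ggplot2)

set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))

# the same seed gives both layers identical random offsets
jit <- position_jitter(width = 0.1, height = 0.1, seed = 123)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point(size = 3, alpha = 0.6, colour = "skyblue", position = jit) +
  geom_point(shape = 1, size = 3, colour = "black", position = jit)
p
```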
