How to add individual hlines for each bar in a plot? - r

Given a data frame and a plot as follows:
library(dplyr)
library(ggplot2)
dat <- data.frame(grp = c("a", "b", "c"),
val = c(30, 20, 10),
avg = c(25, 15, 5))
dat %>%
ggplot(aes(x = grp, y = val)) +
geom_bar(stat = "identity")
How do I amend the code above to place a unique horizontal reference line (avg) on each bar as shown below:

This can be achieved with geom_segment. I first convert grp to a numeric; then, since the default width of a bar is .9, I put x .45 to the left of each bar's position and xend .45 to the right:
library(ggplot2)
dat <- data.frame(grp = c("a", "b", "c"),
val = c(30, 20, 10),
avg = c(25, 15, 5))
ggplot(dat, aes(x = grp, y = val)) +
geom_bar(stat = "identity") +
geom_segment(aes(y = avg, yend = avg,
x = as.numeric(factor(grp)) - .45,
xend = as.numeric(factor(grp)) + .45), color = "red")
EDIT Thanks to a comment by @tjebo: as hard-coding is rarely a good idea, one could set the width via a variable:
w <- .9
...
geom_segment(aes(y = avg, yend = avg,
x = as.numeric(factor(grp)) - w/2,
xend = as.numeric(factor(grp)) + w/2), color = "red")
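As a minimal sketch of the same idea, the segment endpoints can be precomputed as ordinary columns, which makes the arithmetic easy to inspect. The column names pos, xmin and xmax are made up for this illustration, and w is also passed to geom_col() so the bars and the lines share one width.

```r
library(ggplot2)

dat <- data.frame(grp = c("a", "b", "c"),
                  val = c(30, 20, 10),
                  avg = c(25, 15, 5))

w <- 0.9  # bar width; 0.9 is the default used by geom_bar()/geom_col()

# Discrete x values sit at 1, 2, 3, ... in factor-level order
dat$pos  <- as.numeric(factor(dat$grp))
dat$xmin <- dat$pos - w / 2  # left edge of each bar (hypothetical column name)
dat$xmax <- dat$pos + w / 2  # right edge of each bar (hypothetical column name)

p <- ggplot(dat, aes(x = grp, y = val)) +
  geom_col(width = w) +
  geom_segment(aes(x = xmin, xend = xmax, y = avg, yend = avg), color = "red")
p
```

Printing dat[, c("grp", "xmin", "xmax")] is a quick way to check that each line spans exactly one bar.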

Related

Overlay two plots from different dataframes in R

I would like to overlay two ggplots from different data sources. I don't think a left_join will work because the dataframes are of two different lengths and would potentially change the underlying plots. [Maybe?]
library(tidyverse)
set.seed(123)
player_df <- tibble(name = rep(c("A","B","C","D"), each = 10, times = 1),
pos = rep(c("DEF","DEF","MID","MID"), each = 10, times = 1),
load = c(rnorm(10, mean = 200, sd = 100),
rnorm(10, mean = 300, sd = 50),
rnorm(10, mean = 400, sd = 100),
rnorm(10, mean = 500, sd = 50)))
p1 <- player_df %>%
ggplot(aes(x = load, y = name)) +
geom_point()
pos_df <- tibble(pos = rep(c("DEF","MID"), each = 30, times = 1),
load = (c(rnorm(30, mean = 250, sd = 100),
rnorm(30, mean = 350, sd = 100))))
p2 <- pos_df %>%
ggplot(aes(x = load, y = pos)) +
geom_boxplot()
p1
p2
# add p2 to every p1 player plot by pos
I would like p1 to have the corresponding p2 - by pos - appear behind it. So... add the matching p2 boxplot to each p1 scatterplot.
It's not really advisable to attempt to superimpose two plots on each other. A ggplot is made of layers already, so usually it's just a case of superimposing one geom on another. This can be difficult if (as in your case) one of the axes has different labels. However, with a little work it is possible to wrangle your data so that it all sits on a single plot. In your case, you could do something like:
levs <- c("A", "DEF", "B", "C", "MID", "D")
ggplot(within(pos_df, pos <- factor(pos, levs)), aes(x = load, y = pos)) +
geom_boxplot(width = 2.3) +
geom_point(data = within(player_df, pos <- factor(name, levs))) +
scale_y_discrete(limits = c("A", "DEF", "B", " ", "C", "MID", "D"))
Dug into ggplot a bit and re-engineered a boxplot bit by bit.
# manually calculate stats that are used in boxplots
pos_df_summary <- pos_df %>%
group_by(pos, .drop = FALSE) %>%
summarise(min = fivenum(load)[1],
Q1 = fivenum(load)[2],
median = fivenum(load)[3],
Q3 = fivenum(load)[4],
max = fivenum(load)[5]
)
# add the boxplot data to each player
joined_df <- player_df %>%
left_join(., pos_df_summary, by = "pos") %>%
distinct(name, .keep_all = TRUE)
# plot
ggplot(data = NULL, aes(group = name)) +
# create the line from min to max
geom_segment(data = joined_df, aes(y = name, yend = name, x=min, xend=max), color="black") +
#create the box with median line
geom_crossbar(data = joined_df,
aes(y = name, xmin = Q1, xmax = Q3, x = median, fill = "NA"),
color = "black",
fatten = 1) +
scale_fill_manual(values = "white") +
# add the points from the player_df
geom_point(data = player_df,
aes(x = load, y = name, group=name),
color = "red",
show.legend=FALSE) +
theme(legend.position = "none")
There may be some extraneous code in here as I cobbled it from some other resources. Specifically, I'm not sure what the aes(group = name) in the ggplot() call does exactly.
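One thing to keep in mind with the hand-built version: its line runs from the minimum to the maximum, whereas geom_boxplot() by default stops each whisker at the most extreme value within 1.5 times the IQR of the box and draws anything beyond as separate points. The base R sketch below illustrates the difference on made-up data (the value 900 is added on purpose as an extreme observation). boxplot.stats() is used here only as a stand-in for the usual whisker rule, not as a claim about what ggplot2 calls internally.

```r
set.seed(123)
x <- c(rnorm(30, mean = 250, sd = 100), 900)  # 900 is a deliberately extreme value

five <- fivenum(x)        # min, lower hinge, median, upper hinge, max
bs   <- boxplot.stats(x)  # whiskers limited to 1.5 * IQR beyond the hinges

five       # the last element is the raw maximum (900)
bs$stats   # the last element is the upper whisker end, below 900
bs$out     # values beyond the whiskers, here including 900
```

If the plot should match a standard boxplot, the whisker ends from a rule like this could be used for the segment instead of min and max.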

How to produce neat label positions in the ggplot2 line chart?

I have a line chart built using ggplot2. It looks like the following:
Lines are close to each other and data labels are overlapping. It is not convenient. It would be better if light red labels were below the line and green labels where there is room for them. Something of the sort:
This post is helpful. However, I do not know in advance for which line it would be better to put labels above and for which it would be better to keep them below. Therefore I am looking for a generic solution.
ggrepel does a great job of organizing labels, but I cannot figure out how to make it work in my case. I tried different parameters. Here is one of the simplest variants (not the best looking):
Questions:
Is there any way in R to make the chart look like the one in the 2nd picture?
I think ggrepel computes the best label position taking into account the size of the chart. If I export the chart to PowerPoint, for example, the size of the PowerPoint chart might be different from the size used to get optimal data label positions. Is there any way to pass the size of the chart to ggrepel?
Here is the code I used to generate the data and charts:
library(ggplot2)
library(ggrepel)
set.seed(1)
x = rep(1:20, 3)
y = c(runif(20, 10, 11),
runif(20, 11, 12),
runif(20, 12, 13))
z = rep(c("a", "b", "c"), each = 20)
df = data.frame(x = x, y = y, z = z)
ggplot(data = df, aes(x = x, y = y, group = z, color = z)) +
geom_line() +
geom_text(aes(label = round(y, 1)), nudge_y = 1) +
ylim(c(0, 20))
ggplot(data = df, aes(x = x, y = y, group = z, color = z)) +
geom_line() +
geom_text_repel(aes(label = round(y, 1)), nudge_y = 1) +
ylim(c(0, 20))
Changing the theme to theme_bw() and removing gridlines with {ggExtra}'s removeGridX() gets the plot closer to your second image. I also increased the size of the lines, limited the axes, and changed geom_text_repel to geom_label_repel to improve readability.
library(ggplot2)
library(ggrepel)
library(ggExtra)
set.seed(1)
x = rep(1:20, 3)
y = c(runif(20, 10, 11),
runif(20, 11, 12),
runif(20, 12, 13))
z = rep(c("a", "b", "c"), each = 20)
df = data.frame(x = x, y = y, z = z)
ggplot(data = df, aes(x = x, y = y, group = z, color = z)) +
theme_bw() + removeGridX() +
geom_line(size = 2) +
geom_label_repel(aes(label = round(y, 1)),
nudge_y = 0.5,
point.size = NA,
segment.color = NA,
min.segment.length = 0.1,
key_glyph = draw_key_path) +
scale_x_continuous(breaks=seq(0,20,by=1)) +
scale_y_continuous(breaks = seq(0, 14, 2), limits = c(0, 14))
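On the second question (chart size): as far as I know, ggrepel works out label positions when the plot is drawn, so they depend on the size of the device being drawn to. A possible approach, sketched below, is to save the plot at the dimensions it will have on the slide, and to fix the seed argument of geom_text_repel() so the layout is repeatable. The output path and the 10 x 5.6 inch size are placeholders, not values from the question.

```r
library(ggplot2)
library(ggrepel)

set.seed(1)
df <- data.frame(x = rep(1:20, 3),
                 y = c(runif(20, 10, 11),
                       runif(20, 11, 12),
                       runif(20, 12, 13)),
                 z = rep(c("a", "b", "c"), each = 20))

p <- ggplot(df, aes(x = x, y = y, group = z, color = z)) +
  geom_line() +
  geom_text_repel(aes(label = round(y, 1)), seed = 42)  # fixed seed for a repeatable layout

# Render at the final size so labels are placed for that size (placeholder path and size)
out_file <- file.path(tempdir(), "chart.png")
ggsave(out_file, p, width = 10, height = 5.6, units = "in", dpi = 150)
```

Inserting the saved image at its native size, rather than resizing it afterwards, should keep the labels where they were placed.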

Error bar sizing skewed when using plotly

I have a chart which has an error bar on it:
However, when I put the chart inside a plotly wrapper, the error bar sizing gets messed up, as shown below:
Does anyone have a solution for keeping the error bar width the same size as the bar, as shown in plot 1, but while keeping the plot rendering with plotly?
library(tidyverse)
library(plotly)
dat <- data.frame(peeps= c("Bill", "Bob", "Becky"),
vals = c(10, 15, 12),
goals = c(8, 13, 10),
grp = c("Bears", "Bears", "Mongoose") %>% as.factor)
p1 <- dat %>%
ggplot(aes(x = peeps, y = vals, fill = grp)) +
geom_bar(stat = "identity") +
geom_errorbar(data = dat,
aes(ymin = goals, ymax = goals),
color = "blue",
size = 1,
linetype = 1) +
scale_y_continuous(expand = c(0, 0)) +
coord_flip()
p1
ggplotly(p1) %>%
layout(legend = list(orientation = "h",
xanchor = "center",
y = -0.15,
x = 0.5))
Using geom_segment() instead of geom_errorbar() is a work-around for this problem.
dat <- data.frame(peeps= c("Bill", "Bob", "Becky") %>% as.factor,
vals = c(10, 15, 12),
goals = c(8, 13, 10),
grp = c("Bears", "Bears", "Mongoose"),
rowid = 1:3)
p1 <- ggplot(data = dat, aes(x = peeps, y = vals, fill = grp, order = rowid)) +
geom_col() +
geom_segment(aes(
x = as.numeric(peeps)-0.45,
xend = as.numeric(peeps)+0.45,
y = goals, yend = goals),
color = "blue",
size = 1) +
scale_y_continuous(expand = c(0, 0)) +
coord_flip()
ggplotly(p1) %>%
layout(legend = list(orientation = "h",
xanchor = "center",
y = -0.15,
x = 0.5))

How to extend line across entire violin plot

Dataframe as example:
library(tidyverse)
set.seed(123)
df <- data.frame("b" = runif(1000, min = 2, max = 10),
"c" = runif(1000, min = 2, max = 10),
"d" = runif(1000, min = 2, max = 10))
df_2 <- data.frame(id = c("b", "c", "d"),
cutoff = c(5, 3, 5),
stringsAsFactors = FALSE)
df <-
pivot_longer(
df,
cols = c("b", "c", "d"),
names_to = "id",
values_to = "value"
) %>%
left_join(df_2, by = "id")
I can now make a violin plot (or a boxplot, same issue) with a line overlaid:
df %>%
ggplot(aes(x = id)) +
geom_violin(aes(y = value)) +
geom_line(aes(x = id, y = cutoff, group = 1), color = "red")
What I'd like though is three lines (don't need to be connected) each of which extend across the entire width of a single violin, at the cutoff value specified in df_2.
I can do this manually with geom_segment, but is there a better, more programmatic way?
df %>%
ggplot(aes(x = id)) +
geom_violin(aes(y = value)) +
geom_segment(aes(x = 0.55, xend = 1.45, y = 5, yend = 5), color = "blue") +
geom_segment(aes(x = 1.55, xend = 2.45, y = 3, yend = 3), color = "blue") +
geom_segment(aes(x = 2.55, xend = 3.45, y = 5, yend = 5), color = "blue")
I understand that at some fundamental level the x-axis is ordered by factor level, with b = 1, c = 2 etc., so asking for a line intersecting x = 0.9 would require specifying corresponding y value. In another sense though, ggplot2 clearly knows (in some sense) that the region above x = 0.9 (that is, y values intersected by a vertical line at x = 0.9) is associated with factor level b because the corresponding violin for b overlaps that region. Is there a way to get at that information?
You can use geom_errorbar(). So change your second block to:
df %>%
ggplot(aes(x = id)) +
geom_violin(aes(y = value)) +
geom_errorbar(aes(x = id, ymin = cutoff,ymax = cutoff), color = "red")
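One caveat: after the join, df has 1000 rows per id, so this layer draws the same bar 1000 times per violin. Below is a sketch of a variant that passes a three-row cutoff table to the layer instead. The names long and cutoffs are made up for the example, and width = 0.9 is an assumption chosen to match the default violin width.

```r
library(ggplot2)

set.seed(123)
# Long-format values, one block of 1000 rows per id (hypothetical stand-in for df)
long <- data.frame(id = rep(c("b", "c", "d"), each = 1000),
                   value = runif(3000, min = 2, max = 10))

# One row per id, as in df_2
cutoffs <- data.frame(id = c("b", "c", "d"),
                      cutoff = c(5, 3, 5))

p <- ggplot(long, aes(x = id)) +
  geom_violin(aes(y = value)) +
  # ymin == ymax collapses the error bar to a single horizontal line
  geom_errorbar(data = cutoffs,
                aes(ymin = cutoff, ymax = cutoff),
                width = 0.9, color = "red")
p
```

Because x = id is inherited from the ggplot() call, each line is placed on its own violin without any manual x positions.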

Add count as label to points in geom_count

I used geom_count to visualise overlaying points as sized groups, but I also want to add the actual count as a label to the plotted points, like this:
However, to achieve this, I had to create a new data frame containing the counts and use these data in geom_text as shown here:
#Creating two data frames
data <- data.frame(x = c(2, 2, 2, 2, 3, 3, 3, 3, 3, 4),
y = c(1, 2, 2, 2, 2, 2, 3, 3, 3, 3),
id = c("a", "b", "b", "b", "c",
"c", "d", "d", "d", "e"))
data2 <- data %>%
group_by(id) %>%
summarise(x = mean(x), y = mean(y), count = n())
# Creating the plot
ggplot(data = data, aes(x = x, y = y)) +
geom_count() +
scale_size_continuous(range = c(10, 15)) +
geom_text(data = data2,
aes(x = x, y = y, label = count),
color = "#ffffff")
Is there a more elegant way to achieve this (i.e. without the need for the second data frame)? I know that you can access the count in geom_count using ..n.., yet when I try to access it in geom_text, it does not work.
Are you expecting this:
ggplot(data %>%
group_by(id) %>%
summarise(x = mean(x), y = mean(y), count = n()),
aes(x = x, y = y)) + geom_point(aes(size = count)) +
scale_size_continuous(range = c(10, 15)) +
geom_text(aes(label = count),
color = "#ffffff")
update:
If using geom_count is a must, then the expected output can be achieved using:
p <- ggplot(data = data, aes(x = x, y = y)) +
geom_count() + scale_size_continuous(range = c(10, 15))
p + geom_text(data = ggplot_build(p)$data[[1]],
aes(x, y, label = n), color = "#ffffff")
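To see what the built layer data actually holds, here is a small sketch using layer_data(), which returns the computed data for one layer (the same table as ggplot_build(p)$data[[1]]). The variable name counts is made up for the example.

```r
library(ggplot2)

data <- data.frame(x = c(2, 2, 2, 2, 3, 3, 3, 3, 3, 4),
                   y = c(1, 2, 2, 2, 2, 2, 3, 3, 3, 3))

p <- ggplot(data, aes(x = x, y = y)) +
  geom_count() +
  scale_size_continuous(range = c(10, 15))

# Computed data for layer 1: one row per distinct (x, y), with the count in column n
counts <- layer_data(p, 1)
counts[, c("x", "y", "n")]

p + geom_text(data = counts, aes(x, y, label = n), color = "#ffffff")
```

The table has one row per plotted point, so the labels line up with the counted points without a separate summarise step.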
Here is a solution for data with discrete values:
f <- ggplot(data = STest, aes(x = x, y = y)) +
geom_count() +
scale_x_discrete(labels = c("strong decrease", "decrease", "no change", "increase", "strong increase", "no opinion")) +
scale_y_discrete(labels = c("strong decrease", "decrease", "no change", "increase", "strong increase", "no opinion"))
# build the same plot object (f) that the labels are added to
f + geom_text(data = ggplot_build(f)$data[[1]],
aes(x, y, label = n, vjust = -2))
A much easier way to change the legend title is to use the labs() function, so in this case it would be ... + labs(size = "Count") + ....
That should be all you need.
