Placing tick marks between bars in ggplot2 - r

Using the diamonds data set in the ggplot2 package, I can generate the following chart.
library(ggplot2)
library(dplyr)
diamond.summary <-
diamonds %>%
mutate(carat = ifelse(runif(nrow(.)) < 0.05, NA_real_, carat)) %>%
group_by(carat_quintile = ntile(carat, 5)) %>%
summarise(avg_price = mean(price))
diamond.summary %>%
filter(!is.na(carat_quintile)) %>%
ggplot(aes(carat_quintile, avg_price)) +
geom_bar(stat = "identity",
color = "black",
width = 1) +
scale_x_continuous("Carat percentile",
breaks = 1:6 - 0.5,
labels = seq(0,100, by = 20)) +
scale_y_continuous(expand = c(0,0),
limits = c(0, 1.1* max(diamond.summary$avg_price)))
So far, so easy. However, I would also like to display the average price of the entries with a missing carat alongside the chart, similar to the following:
diamond.summary %>%
mutate(Facet = is.na(carat_quintile),
carat_quintile_noNA = ifelse(Facet, "Unknown", carat_quintile)) %>%
ggplot(aes(x = carat_quintile_noNA, y = avg_price, fill = Facet)) +
geom_bar(stat = "identity") +
facet_grid(~Facet, scales = "free_x", space = "free_x") +
scale_x_discrete(breaks = (0:6) - 0.5)
However, when I try to perform the same trick using scale_x_continuous, I get the error Discrete value supplied to continuous scale. When I try to use scale_x_discrete(breaks = c(0:6 + 0.5)) for example, the axis ticks and labels disappear.
My question is, how can I get the same faceted chart above with the tick marks in the first panel placed as in the first chart in this post? Advice about chart design could be an acceptable solution, but I don't think all problems like this are solvable with a redesign.

The trick is to convert the x variable to a numeric one, assigning a magic number to the unknown category. (ggplot2 will not plot bars with true NA values.) Then use scale_x_continuous:
diamond.summary %>%
mutate(Facet = is.na(carat_quintile),
carat_quintile_noNA = ifelse(Facet, "Unknown", carat_quintile),
##
## 99 is a magic number. For our plot, it just has
## to be larger than 5. The value 6 would be a natural
## choice, but this means that the x tick marks would
## overflow into the 'unknown' facet. You could
## choose 7 to avoid this, but any large number works.
## I used 99 to make it clear that it's magic.
numeric = ifelse(Facet, 99, carat_quintile)) %>%
ggplot(aes(x = numeric, y = avg_price, fill = Facet)) +
geom_bar(stat = "identity", width = 1) +
facet_grid(~Facet, scales = "free_x", space = "free_x") +
scale_x_continuous(breaks = c(0:5 + 0.5, 99),
labels = c(paste0(c(0:5) * 20, "%"), "Unknown"))
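The hard-coded breaks and labels above follow a simple pattern, so they can be generated for any number of bins. This is a minimal sketch, assuming equal bins drawn at x = 1..n with width = 1 and the unknown bar placed at a magic position; `axis_breaks_with_unknown` is a hypothetical helper, not part of ggplot2.

```r
# Hypothetical helper: ticks at the bar edges (0.5, 1.5, ..., n + 0.5),
# labelled 0% ... 100%, plus one extra tick at the magic "Unknown" position.
axis_breaks_with_unknown <- function(n_bins = 5, unknown_at = 99) {
  edges <- seq_len(n_bins + 1) - 0.5
  pct <- paste0(round(100 * (seq_len(n_bins + 1) - 1) / n_bins), "%")
  list(breaks = c(edges, unknown_at),
       labels = c(pct, "Unknown"))
}

ax <- axis_breaks_with_unknown(5)
ax$breaks
ax$labels
```

The result can then be passed as `scale_x_continuous(breaks = ax$breaks, labels = ax$labels)`; with `n_bins = 5` it reproduces the vectors written out by hand above.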

One solution is to approach this a bit differently and reposition the bars instead of the ticks, using position_nudge.
library(ggplot2)
library(dplyr)
diamond.summary <-
diamonds %>%
mutate(carat = ifelse(runif(nrow(.)) < 0.05, NA_real_, carat)) %>%
group_by(carat_quintile = ntile(carat, 5)) %>%
summarise(avg_price = mean(price))
# nudge bars to the left
diamond.summary %>%
filter(!is.na(carat_quintile)) %>%
ggplot(aes(carat_quintile, avg_price)) +
geom_bar(stat = "identity",
color = "black",
width = 1,
position = position_nudge(x = -1)) +
scale_x_continuous("Carat percentile",
breaks = 1:6 - 0.5,
labels = seq(0,100, by = 20)) +
scale_y_continuous(expand = c(0,0),
limits = c(0, 1.1* max(diamond.summary$avg_price)))
# nudge bars to the right
diamond.summary %>%
filter(!is.na(carat_quintile)) %>%
ggplot(aes(carat_quintile, avg_price)) +
geom_bar(stat = "identity",
color = "black",
width = 1,
position = position_nudge(x = 1)) +
scale_x_continuous("Carat percentile",
breaks = 1:6 - 0.5,
labels = seq(0,100, by = 20)) +
scale_y_continuous(expand = c(0,0),
limits = c(0, 1.1* max(diamond.summary$avg_price)))
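To see exactly where position_nudge moves the bars relative to the fixed breaks, you can inspect the computed layer data with ggplot_build(). A minimal sketch on five dummy bars at x = 1..5 (the heights are arbitrary placeholders, not the diamonds averages):

```r
library(ggplot2)

# Five bars at x = 1..5, as in the carat-quintile example (heights are arbitrary).
bars <- data.frame(x = 1:5, y = c(3, 4, 5, 6, 7))

nudged <- ggplot(bars, aes(x, y)) +
  geom_col(width = 1, position = position_nudge(x = -1)) +
  scale_x_continuous(breaks = 1:6 - 0.5)

# ggplot_build() exposes the layer data, including the bar edges
# after the nudge has been applied.
edges <- ggplot_build(nudged)$data[[1]][, c("xmin", "xmax")]
edges
```

Each bar keeps its width of 1 and is shifted one unit left, so the bar edges still coincide with the half-integer breaks while the bars now sit one position earlier.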

Related

Change ggplot bar chart fill colors

With this data:
df <- data.frame(value =c(20, 50, 90),
group = c(1, 2,3))
I can get a bar chart:
df %>% ggplot(aes(x = group, y = value, fill = value)) +
geom_col() +
coord_flip()+
scale_fill_viridis_c(option = "C") +
theme(legend.position = "none")
But I would like the color of each bar to vary along its length, according to the corresponding values from 0 up to value.
I have managed to change them using geom_raster:
ggplot() +
geom_raster(aes(x = c(0:20), y = .9, fill = c(0:20)),
interpolate = TRUE) +
geom_raster(aes(x = c(0:50), y = 2, fill = c(0:50)),
interpolate = TRUE) +
geom_raster(aes(x = c(0:90), y = 3.1, fill = c(0:90)),
interpolate = TRUE) +
scale_fill_viridis_c(option = "C") +
theme(legend.position = "none")
This approach is not efficient when I have many groups in real data. Any suggestions to get it done more efficiently would be appreciated.
I found the accepted answer to a previous, similar question, but there "These numbers needs to be adjusted depending on the number of x values and range of y". I was looking for an approach where I do not have to adjust numbers based on the data. David Gibson's answer fits my purpose.
This does not seem to be supported natively in ggplot. I was able to get something close by adding extra rows (ranging from 0 to value) to the data, then using geom_tile and separating the tiles by specifying width.
library(tidyverse)
df <- data.frame(value = c(20, 50, 90),
group = c(1, 2, 3))
df_expanded <- df %>%
rowwise() %>%
summarise(group = group,
value = list(0:value)) %>%
unnest(cols = value)
df_expanded %>%
ggplot() +
geom_tile(aes(
x = group,
y = value,
fill = value
),
width = 0.9) +
coord_flip() +
scale_fill_viridis_c(option = "C") +
theme(legend.position = "none")
If this is too pixelated, you can increase the number of rows generated by replacing list(0:value) with list(seq(0, value, by = 0.1)).
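The row expansion can also be done in base R, with the step size as a parameter instead of an edit to the pipeline. A minimal sketch; `expand_bars` is a hypothetical helper that produces the same `group`/`value` columns as the rowwise()/summarise()/unnest() chain above:

```r
# Hypothetical helper: one row per step from 0 up to each bar's value.
expand_bars <- function(df, step = 1) {
  pieces <- lapply(seq_len(nrow(df)), function(i) {
    v <- seq(0, df$value[i], by = step)
    data.frame(group = df$group[i], value = v)
  })
  do.call(rbind, pieces)
}

df <- data.frame(value = c(20, 50, 90), group = c(1, 2, 3))
df_expanded <- expand_bars(df)          # same rows as list(0:value)
df_fine <- expand_bars(df, step = 0.1)  # ten times as many rows, smoother gradient
nrow(df_expanded)
```

Either data frame can be fed to the same geom_tile() call; geom_tile() derives the tile height from the spacing of y when no height is given, so the finer step should simply produce thinner tiles.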
This is a real hack using ggforce. This package has a geom that can take color gradients but it is for a line segment. I've just increased the size to make the line segment look like a bar. I made all the bars the same length to get the correct gradient, then covered a portion of each bar over with the same color as the background color to make them appear to be the correct length. Had to hide the grid lines, however. :-)
library(ggforce)
df %>%
ggplot() +
geom_link(aes(x = 0, xend = max(value), y = group, yend = group, color = after_stat(index)), size = 30) +
geom_link(aes(x = value, xend = max(value), y = group, yend = group), color = "grey", size = 31) +
scale_color_viridis_c(option = "C") +
theme(legend.position = "none", panel.background = element_rect(fill = "grey"),
panel.grid = element_blank()) +
ylim(0.5, max(df$group)+0.5 )

Shape annotations across facets in ggplotly()

I have a Shiny dashboard which includes a line graph that tracks the number of visitors in mon-thur and fri-sun periods per month for three years.
I originally also had an annotation which shaded the parts of the graph which occur during the Covid pandemic in Australia, i.e. 2020-03-01 to present. When ggplotly is called on the ggplot, it strips the annotations out. What I want to do is add the shading from 2020-03-01 to present back in. I've tried adding
%>% layout(
shapes = list(
list(type = "rect",
fillcolor = "blue", line = list(color = "blue"), opacity = 0.9,
x0 = "2020-03-01", x1 = Inf,
y0 = 0, y1 = Inf
)
)
)
after the ggplotly() call, but it doesn't do anything.
I also tried following the code in this question, but the shading doesn't start at the correct date, and it's also only on the first facet.
Reproducible code example:
date <- c("2019-01-01","2019-01-01","2019-02-01","2019-02-01","2019-03-01","2019-03-01","2019-04-01",
"2019-04-01","2019-05-01","2019-05-01","2019-06-01","2019-06-01","2019-07-01","2019-07-01",
"2019-08-01","2019-08-01","2019-09-01","2019-09-01","2019-10-01","2019-10-01","2019-11-01",
"2019-11-01","2019-12-01","2019-12-01","2020-01-01","2020-01-01","2020-02-01","2020-02-01",
"2020-03-01","2020-03-01","2020-04-01","2020-04-01","2020-05-01","2020-05-01","2020-06-01",
"2020-06-01","2020-07-01","2020-07-01","2020-08-01","2020-08-01","2020-09-01","2020-09-01",
"2020-10-01","2020-10-01","2020-11-01","2020-11-01","2020-12-01","2020-12-01","2021-01-01",
"2021-01-01","2021-02-01","2021-02-01","2021-03-01","2021-03-01","2021-04-01","2021-04-01",
"2021-05-01","2021-05-01","2021-06-01","2021-06-01","2019-01-01","2019-01-01","2019-02-01",
"2019-02-01","2019-03-01","2019-03-01","2019-04-01","2019-04-01","2019-05-01","2019-05-01",
"2019-06-01","2019-06-01","2019-07-01","2019-07-01","2019-08-01","2019-08-01","2019-09-01",
"2019-09-01","2019-10-01","2019-10-01","2019-11-01","2019-11-01","2019-12-01","2019-12-01",
"2020-01-01","2020-01-01","2020-02-01","2020-02-01","2020-03-01","2020-03-01","2020-04-01",
"2020-04-01","2020-05-01","2020-05-01","2020-06-01","2020-06-01","2020-07-01","2020-07-01",
"2020-08-01","2020-08-01","2020-09-01","2020-09-01","2020-10-01","2020-10-01","2020-11-01",
"2020-11-01","2020-12-01","2020-12-01","2021-01-01","2021-01-01","2021-02-01","2021-02-01",
"2021-03-01","2021-03-01","2021-04-01","2021-04-01","2021-05-01","2021-05-01","2021-06-01",
"2021-06-01")
location <- rep(c("1001", "1002"), c(60, 60))
daytype <- rep(c("mon-thur", "fri-sat"), 60)
visitors <- c(5694,6829,3087,4247,2814,4187,5310,6408,5519,5934,2817,4080,6762,6595,5339,6669,
4863,6137,8607,11974,4909,9103,7986,9493,15431,13044,6176,5997,6458,7694,5990,5419,
5171,8149,6091,7971,10677,10468,7782,7627,7210,9526,8554,9844,8262,9218,9418,9038,
13031,13418,7408,10621,6908,8122,8851,8861,7940,9179,5992,7026,7939,6923,8209,7815,
8190,7085,9136,7905,9784,8454,9467,9092,9183,8436,9029,8927,8828,8323,7679,7112,
1885,3156,6932,5530,6077,4975,4922,4008,5549,4557,3932,3395,4865,4820,5090,4529,
5407,4262,4858,4200,5101,4761,5108,4413,5209,4116,5405,4445,4140,2985,5589,4684,
5322,4540,4898,4214,5266,4188,5184,4555)
total <- data.frame(location, date, daytype, visitors)
library(dplyr)
library(ggplot2)
library(plotly)
library(scales)
mon_year_vis <- total %>%
ggplot(mapping = aes(
x = as.Date(date),
y = visitors,
group = daytype,
color = daytype
)) +
geom_line() +
geom_point(show.legend = FALSE, size = 1) +
scale_y_continuous(labels = comma) +
facet_wrap( ~location, ncol = 1, scales = "free") +
scale_x_date(date_labels = "%b-%y",
breaks = "3 month",
limits = range)
ggplotly(mon_year_vis)
This task is a bit more complex than it appears, because you use scales = "free" in the facet_wrap call. Because of this you need a small helper data frame that holds per-facet (not global) limits for the shaded areas, and then use ggplot2::geom_rect; otherwise you could use ggplot2::annotate (for completeness I list this option too). Bear in mind that plotly seems to have issues with Inf as a coordinate limit, at least when using plotly::ggplotly. (I omit the lines up to the declaration of your total variable.)
# libraries needed to make things work
library(dplyr)
library(ggplot2)
library(plotly)
library(scales)
ggplot2::geom_rect
# needed for coordinates of shadowed area
helper <- total %>%
dplyr::group_by(location) %>%
dplyr::summarise(mv = max(visitors) , md = max(as.Date(date))) %>%
dplyr::ungroup()
mon_year_vis <- total %>%
ggplot(mapping = aes(
x = as.Date(date),
y = visitors,
group = daytype,
color = daytype
)) +
# insert the geom_rect before the lines so that plotly gets the layer order right
geom_rect(data = helper, aes(xmin = as.Date("2020-03-01"), xmax = md, ymin = 0, ymax = mv), alpha = 0.3, fill="blue", inherit.aes = FALSE) +
geom_line() +
geom_point(show.legend = FALSE, size = 1) +
scale_y_continuous(labels = comma) +
facet_wrap( ~location, ncol = 1, scales = "free") +
scale_x_date(date_labels = "%b-%y",
breaks = "3 month",
limits = range)
ggplotly(mon_year_vis)
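To confirm that the helper-based geom_rect gives each facet its own rectangle rather than one global one, you can inspect the built layer data. A minimal self-contained sketch on made-up toy data (`toy` and `helper_toy` are illustrative names, and only ggplot2 is assumed); it checks the ggplot side only and does not exercise the ggplotly conversion:

```r
library(ggplot2)

# Toy data: two facets with very different y ranges (values are made up).
toy <- data.frame(
  location = rep(c("1001", "1002"), each = 4),
  date = rep(as.Date(c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01")), 2),
  visitors = c(100, 120, 90, 80, 1000, 1100, 900, 950)
)

# Per-facet limits: the same idea as `helper` above, built with base R.
mv <- sapply(split(toy$visitors, toy$location), max)
helper_toy <- data.frame(location = names(mv), mv = unname(mv), md = max(toy$date))

p <- ggplot(toy, aes(date, visitors)) +
  geom_rect(data = helper_toy,
            aes(xmin = as.Date("2020-03-01"), xmax = md, ymin = 0, ymax = mv),
            fill = "blue", alpha = 0.3, inherit.aes = FALSE) +
  geom_line() +
  facet_wrap(~location, ncol = 1, scales = "free")

# One rectangle per panel, each with that panel's own maximum.
rects <- ggplot_build(p)$data[[1]]
rects[, c("PANEL", "ymax")]
```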
ggplot2::annotate
mon_year_vis2 <- total %>%
ggplot(mapping = aes(
x = as.Date(date),
y = visitors,
group = daytype,
color = daytype
)) +
# insert the annotate before the lines so that plotly gets the layer order right;
# annotate() does not see the plot data, so refer to total's columns explicitly
# (the same rectangle, with global limits, is drawn in every facet)
annotate("rect", xmin = as.Date("2020-03-01"), xmax = max(as.Date(total$date)), ymin = 0, ymax = max(total$visitors), alpha = 0.2, fill = "blue") +
geom_line() +
geom_point(show.legend = FALSE, size = 1) +
scale_y_continuous(labels = comma) +
facet_wrap( ~location, ncol = 1, scales = "free") +
scale_x_date(date_labels = "%b-%y",
breaks = "3 month",
limits = range)
ggplotly(mon_year_vis2)
The two resources I have used are: 1 2

ggplot piecharts on a ggmap: labels destroy the small plots

My ggmap on which I would like small piecharts with labels is generated with the code:
p <-
get_googlemap(
"Poland",
maptype = "roadmap",
zoom = 6,
color = "bw",
crop = T,
style = 'feature:all|element:labels|visibility:off' #'feature:administrative.country|element:labels|visibility:off' or 'feature:all|element:labels|visibility:off'
) %>%
ggmap() + coord_cartesian() +
scale_x_continuous(limits = c(14, 24.3), expand = c(0, 0)) +
scale_y_continuous(limits = c(48.8, 55.5), expand = c(0, 0))
I am trying to plot my small ggplot piecharts on a ggmap following the answer
R::ggplot2::geom_points: how to swap points with pie charts?
I prepare data as follows:
df <-
df %>% mutate(Ours = Potential * MS, Others = Potential - Ours) %>%
na.omit() %>% filter(Potential > 0) %>%
select(-L.p., -MS) %>%
group_by(Miasto) %>%
summarise_each_(vars = c("Potential", "Ours", "Others"),
funs = funs(Sum = "sum")) %>%
left_join(coordinatesTowns, by = c("Miasto" = "address")) %>%
distinct(Miasto, .keep_all = T) %>%
select(-X) %>% ungroup()
df <-df %>% gather(key=component, value=sales, c(Ours_Sum,Others_Sum)) %>%
group_by(lon, lat,Potential_Sum)
My data then looks like this:
tibble::tribble(
~Miasto, ~Potential_Sum, ~lon, ~lat, ~component, ~sales,
"Bialystok", 100, 23.16433, 53.13333, "Ours_Sum", 70,
"Bialystok", 100, 23.16433, 53.13333, "Others_Sum", 30,
"Bydgoszcz", 70, 18.00762, 53.1235, "Ours_Sum", 0,
"Bydgoszcz", 70, 18.00762, 53.1235, "Others_Sum", 70,
"Gdansk", 50, 18.64637, 54.35205, "Ours_Sum", 25,
"Gdansk", 50, 18.64637, 54.35205, "Others_Sum", 75,
"Katowice", 60, 19.02754, 50.25842, "Ours_Sum", 20,
"Katowice", 60, 19.02754, 50.25842, "Others_Sum", 40
)
The group_by in the last line is essential for generating the plots that will be pasted into my map. (I suspect this may be the cause of the problems described below.)
Instead of totals, I would like to provide labels for each share in a pie chart.
In this answer I found the syntax that should add labels to the pie charts: https://stackoverflow.com/a/22804400/3480717
Below is the code from my script. The line with geom_text (commented out with a hash), if uncommented, makes my plots disappear and produces a long list of warnings (16 entries) covering all the small plots:
1: Removed 1 rows containing missing values (geom_col).
I presume the reason may lie in the last step of the data preparation, where the data is grouped for plotting.
The line I marked with a hash is the problem: with the hash in place the plots are correct, but if I include the line to get the desired labels on the slices, the plots disappear or become very narrow vertical slices.
df.grobs <- df %>%
do(subplots = ggplot(., aes(1, sales, fill = component)) +
geom_bar(position = "fill", alpha = 0.5, colour = "white", stat="identity") +
# geom_text( aes(label = round(sales), y=sales), position = position_stack(vjust = 0.5), size = 2.5) +
coord_polar(theta = "y") +
scale_fill_manual(values = c("green", "red"))+
theme_void()+ guides(fill = F)) %>%
mutate(subgrobs = list(annotation_custom(ggplotGrob(subplots),
x = lon-Potential_Sum/300, y = lat-Potential_Sum/300,
xmax = lon+Potential_Sum/300, ymax = lat+Potential_Sum/300)))
df.grobs
df.grobs %>%
{p +
.$subgrobs +
geom_col(data = df,
aes(0,0, fill = component),
colour = "white")+ geom_text(data=df, aes(label = Miasto),nudge_y = -0.15, size=2.5)}
Why is the line marked with a hash (if uncommented) destroying the plot instead of adding labels? It seems to completely redefine aesthetics.
EDIT: I modified the marked line so that label = sales and y = sales. Now if I comment the line out, the plots are generated; if I uncomment it, the labels are generated in the correct position but without the plots. Why can't I have both?
Short answer:
I think the problem is actually in your earlier line:
geom_bar(position = "fill", alpha = 0.5, colour = "white", stat="identity") +
If you change the position from fill to stack (i.e. the default), it should work properly (at least it did on mine).
Long(-winded) explanation:
Let's use a summarised version of the mtcars dataset to reproduce the problem:
dfm <- mtcars %>% group_by(cyl) %>% summarise(disp = mean(disp)) %>% ungroup()
# correct pie chart
ggplot(dfm, aes(x = 1, y = disp, label = factor(cyl), fill = factor(cyl))) +
geom_bar(stat = "identity", position = "stack") +
geom_text(position = position_stack(vjust = 0.5)) +
coord_polar(theta = "y") + theme_void()
# "empty" pie chart
ggplot(dfm, aes(x = 1, y = disp, label = factor(cyl), fill = factor(cyl))) +
geom_bar(stat = "identity", position = "fill") +
geom_text(position = position_stack(vjust = 0.5)) +
coord_polar(theta = "y") + theme_void()
Why does changing geom_bar's position affect this? If we look at the plot before the coord_polar step, things may become clearer:
ggplot(dfm, aes(x = 1, y = disp, label = factor(cyl), fill = factor(cyl))) +
geom_bar(stat = "identity", position = "stack") +
geom_text(position = position_stack(vjust = 0.5))
Check the bar chart's y-axis. The bars & the labels are correctly positioned.
Now the version with position = "fill":
ggplot(dfm, aes(x = 1, y = disp, label = factor(cyl), fill = factor(cyl))) +
geom_bar(stat = "identity", position = "fill") +
geom_text(position = position_stack(vjust = 0.5))
Your bar chart now occupies the range 0-1 on the y-axis, while your labels continue to occupy the original full range, which is much larger. Thus when you convert the chart to polar coordinates, the bar chart is squeezed to a tiny slice that becomes practically invisible.
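If you want to keep position = "fill" (for example, so that every small pie is normalised to the same total), an alternative is to give geom_text the matching position_fill(vjust = 0.5) instead of position_stack(), so that bars and labels share the same 0-1 range. A minimal sketch on the same mtcars summary, assuming a ggplot2 version whose position_fill() accepts vjust (2.2.0 or later); dfm is rebuilt with aggregate() here only to keep the block self-contained:

```r
library(ggplot2)

# Same summary as dfm above, built with base R instead of dplyr.
dfm <- aggregate(disp ~ cyl, data = mtcars, FUN = mean)

# Bars use position = "fill"; labels use the matching position_fill(),
# so both layers live on the same 0-1 scale.
p_fill <- ggplot(dfm, aes(x = 1, y = disp, label = factor(cyl), fill = factor(cyl))) +
  geom_bar(stat = "identity", position = "fill") +
  geom_text(position = position_fill(vjust = 0.5)) +
  coord_polar(theta = "y") +
  theme_void()

built <- ggplot_build(p_fill)
label_y <- built$data[[2]]$y   # label positions (slice midpoints)
bar_top <- max(built$data[[1]]$ymax)
label_y
```

In the question's subplot code the analogous change would be geom_text(aes(label = round(sales)), position = position_fill(vjust = 0.5)), leaving geom_bar(position = "fill") as it is; this is untested against the ggmap pipeline.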

How to separately label and scale double y-axis in ggplot2?

I have a test dataset like this:
df_test <- data.frame(
proj_manager = c('Emma','Emma','Emma','Emma','Emma','Alice','Alice'),
proj_ID = c(1, 2, 3, 4, 5, 6, 7),
stage = c('B','B','B','A','C','A','C'),
value = c(15,15,20,20,20,70,5)
)
Preparation for viz:
input <- select(df_test, proj_manager, proj_ID, stage, value) %>%
filter(proj_manager=='Emma') %>%
do({
proj_value_by_manager = sum(distinct(., proj_ID, value)$value);
mutate(., proj_value_by_manager = proj_value_by_manager)
}) %>%
group_by(stage) %>%
do({
sum_value_byStage = sum(distinct(.,proj_ID,value)$value);
mutate(.,sum_value_byStage= sum_value_byStage)
}) %>%
mutate(count_proj = length(unique(proj_ID)))
commapos <- function(x, ...) {
format(abs(x), big.mark = ",", trim = TRUE,
scientific = FALSE, ...) }
Visualization:
ggplot (input, aes(x=stage, y = count_proj)) +
geom_bar(stat = 'identity')+
geom_bar(aes(y=-proj_value_by_manager),
stat = "identity", fill = "Blue") +
scale_y_continuous(labels = commapos)+
coord_flip() +
ylab('') +
geom_text(aes(label= sum_value_byStage), hjust = 5) +
geom_text(aes(label= count_proj), hjust = -1) +
labs(title = "Emma: 4 projects| $90M Values \n \n Commitment|Projects") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, linetype =1)
My questions are:
Why are the y-values not showing up correctly? E.g. C is labeled 20, but its bar nearly reaches 100 on the scale.
How can I adjust the position of the labels so that each sits on top of its bar?
How can I rescale the y axis so that both the very short 'count of projects' bars and the long 'project value' bars are displayed well?
Thank you all for the help!
I think your issues are coming from the fact that:
(1) Your dataset has duplicated values. This causes geom_bar to add all of them together. For example there are 3 obs for B where proj_value_by_manager = 90 which is why the blue bar extends to 270 for that group (they all get added).
(2) In your second geom_bar you use y = -proj_value_by_manager, but in the geom_text that labels it you use sum_value_byStage. That's why the blue bar for A extends to 90 (since proj_value_by_manager is 90) while the label reads 20.
To get what I believe is the chart you want, you could do:
# Q1: de-duplicated dataset, so geom_bar doesn't erroneously stack repeated rows
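The arithmetic behind points (1) and (2) can be reproduced directly from df_test. A minimal base R sketch: the blue bar for each stage is the number of rows in that stage times proj_value_by_manager, while the labels show the per-stage sum:

```r
df_test <- data.frame(
  proj_manager = c('Emma','Emma','Emma','Emma','Emma','Alice','Alice'),
  proj_ID = c(1, 2, 3, 4, 5, 6, 7),
  stage = c('B','B','B','A','C','A','C'),
  value = c(15,15,20,20,20,70,5)
)
emma <- df_test[df_test$proj_manager == "Emma", ]

# Value of all of Emma's distinct projects, repeated on every one of her rows.
proj_value_by_manager <- sum(emma$value[!duplicated(emma$proj_ID)])  # 90

# geom_bar(stat = "identity") stacks one segment per row, so each stage's
# blue bar is (rows in that stage) * proj_value_by_manager.
rows_per_stage <- lengths(split(emma$proj_ID, emma$stage))
stacked_blue <- rows_per_stage * proj_value_by_manager  # A: 90, B: 270, C: 90

# The labels, by contrast, show the per-stage project value.
sum_value_byStage <- sapply(split(emma$value, emma$stage), sum)  # A: 20, B: 50, C: 20
```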
input2 <- input[!duplicated(input[,-c(2,4)]),]
ggplot (input2, aes(x=stage, y = count_proj)) +
geom_bar(stat = 'identity')+
geom_bar(aes(y=-sum_value_byStage), #Q1: changed so this y-value matches your label
stat = "identity", fill = "Blue") +
scale_y_continuous(labels = commapos)+
coord_flip() +
ylab('') +
geom_text(aes(label= sum_value_byStage, y = -sum_value_byStage), hjust = 1) + #Q2: Added in y-value for label and hjust so it will be on top
geom_text(aes(label= count_proj), hjust = -1) +
labs(title = "Emma: 4 projects| $90M Values \n \n Commitment|Projects") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, linetype =1)
For your last question, there is no good way to display both of these on one axis. One option would be to rescale the small values and still label them with their true counts (1 or 3). However, I didn't do this because, once the blue bars are scaled down, the other bars look OK to me.
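The rescaling option could look roughly like this: multiply the project counts by a constant so they are visible next to the values, but label them with the true count. A minimal sketch, assuming the de-duplicated per-stage numbers for Emma (counts 1/3/1, values 20/50/20, as computed in the question) and a hypothetical scale factor k = 10; since the axis would then mix two units, the sketch hides its labels and relies on the bar labels instead:

```r
library(ggplot2)

# One row per stage: the de-duplicated values for Emma.
stage_summary <- data.frame(
  stage = c("A", "B", "C"),
  count_proj = c(1, 3, 1),
  sum_value_byStage = c(20, 50, 20)
)

k <- 10  # hypothetical scale factor for the project counts; tune to taste

p_rescaled <- ggplot(stage_summary, aes(x = stage)) +
  geom_col(aes(y = count_proj * k)) +                     # scaled-up counts
  geom_col(aes(y = -sum_value_byStage), fill = "blue") +  # values to the left
  geom_text(aes(y = count_proj * k, label = count_proj), hjust = -0.5) +  # true count
  geom_text(aes(y = -sum_value_byStage, label = sum_value_byStage), hjust = 1.2) +
  geom_hline(yintercept = 0) +
  scale_y_continuous(labels = NULL) +  # axis units are mixed, so hide the labels
  coord_flip() +
  ylab("")

built <- ggplot_build(p_rescaled)
```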

Create abbreviated legends manually for long X labels in ggplot2

I would like to create a simple bar chart with ggplot2, but my x variable contains long strings, so the labels overlap.
Here is some fake data and the plot:
library(dplyr)
library(tidyr)
library(ggplot2)
set.seed(42)
datas <- data.frame(label = sprintf("aLongLabel%d", 1:8),
ok = sample(seq(0, 1, by = 0.1), 8, replace = TRUE)) %>%
mutate(err = abs(ok - 1)) %>%
gather(type, freq, ok, err)
datas %>%
ggplot(aes(x = label, y = freq)) +
geom_bar(aes(fill = type), stat = "identity")
I would like to replace the labels by shorter ones and create a legend to show the matches.
What I've tried :
I use the shape aesthetic in geom_point, which creates a legend with shapes (and plots shapes, which I hide with alpha = 0). Then I change the shapes with scale_shape_manual and replace the x labels with scale_x_discrete. With guides I override the alpha parameter of my shapes so they won't be invisible in the legend.
# label is a character column (R >= 4.0 no longer creates factors by default),
# so use its sorted unique values rather than levels()
leg.txt <- sort(unique(datas$label))
x.labels <- setNames(LETTERS[seq_along(leg.txt)], leg.txt)
datas %>%
ggplot(aes(x = label, y = freq)) +
geom_bar(aes(fill = type), stat = "identity") +
geom_point(aes(shape = label), alpha = 0) +
scale_shape_manual(name = "Labels", values = x.labels) +
guides(shape = guide_legend(override.aes = list(size = 5, alpha = 1))) +
scale_x_discrete(name = "Label", labels = x.labels)
It gives me the expected output but I feel like this is very hacky.
Does ggplot2 provide a way to do this more directly? Thanks.
Rotation solution suggested by Pascal
Rotate the labels and align them to the edge:
datas %>%
ggplot(aes(x = label, y = freq)) +
geom_bar(aes(fill = type), stat = "identity") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
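A variant of the asker's idea that avoids the invisible geom_point layer is to keep the short codes on the axis via a named vector in scale_x_discrete(labels = ...) and print the lookup as a plot caption with labs(caption = ...). This is a sketch, not a built-in ggplot2 legend: the caption is a single plain text line, and `long`, `key` and `key_caption` are illustrative names. The data is rebuilt without tidyr only to keep the block self-contained:

```r
library(ggplot2)

set.seed(42)
datas <- data.frame(label = sprintf("aLongLabel%d", 1:8),
                    ok = sample(seq(0, 1, by = 0.1), 8, replace = TRUE))
datas$err <- abs(datas$ok - 1)

# Long format without tidyr: one row per label and type.
long <- data.frame(label = rep(datas$label, 2),
                   type = rep(c("ok", "err"), each = nrow(datas)),
                   freq = c(datas$ok, datas$err))

# Named lookup: full label -> short code (A, B, ...).
full <- sort(unique(long$label))
key <- setNames(LETTERS[seq_along(full)], full)
key_caption <- paste(key, names(key), sep = " = ", collapse = "; ")

p_key <- ggplot(long, aes(x = label, y = freq, fill = type)) +
  geom_col() +
  scale_x_discrete(name = "Label", labels = key) +  # named vector matched to breaks
  labs(caption = key_caption)
```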

Resources