Problem with the x-axis labels in ggplot2 using n.breaks

I use n.breaks to get a labeled x-axis tick mark for each cluster. This works well for 4, 5, and 6 clusters, but when I tried it with two clusters it no longer works.
I build the graphs like this:
country_plot <- ggplot(Data) + aes(x = Cluster) +
theme(legend.title = element_blank(), axis.title.y = element_blank()) +
geom_bar(aes(fill = country), stat = "count", position = "fill", width = 0.85) +
scale_fill_manual(values = color_map_3, drop = F) +
scale_x_continuous(n.breaks = max(unique(Data$Cluster))) + scale_y_continuous(labels = percent) +
ggtitle("Country")
and export it like this:
ggsave("country_plot.png", plot = country_plot, device = "png", width = 16, height = 8, units = "cm")
When it works it looks something like this:
But with two clusters I get something like this, with only one tick mark, labeled 2.5, beyond the actual bars:
I manually checked the return value of
max(unique(Data$Cluster))
and it returns 2, which, as I understand it, should produce two x-axis marks labeled 1 and 2, as it does with more clusters.
edit:
mutate(country = factor(country, levels = 1:3)) %>%
mutate(country =fct_recode(country,!!!country_factor_naming))%>%
mutate(Gender = factor(Gender, levels = 1:2)) %>%
mutate(Gender = fct_recode(Gender, !!!gender_factor_naming))%>%

If I understand correctly, the issue is caused by Cluster being treated as a continuous variable. It needs to be turned into a factor.
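A minimal sketch of that conversion in base R. The Data frame below is a hypothetical stand-in with two clusters, since the real data is not shown; only the Cluster column matters here.

```r
# Hypothetical stand-in for the question's Data (the real data is not shown)
Data <- data.frame(
  Cluster = c(1, 1, 2, 2, 2),
  country = c("A", "B", "A", "C", "B")
)

# A numeric Cluster is mapped to a continuous x scale.
# As a factor, each cluster becomes its own level, so the discrete
# x scale draws exactly one labeled tick per cluster.
Data$Cluster <- factor(Data$Cluster, levels = sort(unique(Data$Cluster)))

levels(Data$Cluster)
```

In a dplyr pipeline like the one in the question's edit, the equivalent step would be mutate(Cluster = factor(Cluster)). The scale_x_continuous(n.breaks = ...) line then has to be dropped (or replaced with scale_x_discrete()), because a continuous scale cannot be applied to a factor.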
Here is a minimal, reproducible example using the mtcars dataset that reproduces the unwanted behaviour:
First attempt (continuous x-axis)
library(ggplot2)
library(scales)
ggplot(mtcars) +
aes(x = gear, fill = factor(vs)) +
geom_bar(stat = "count", position = "fill", width = 0.85) +
scale_y_continuous(labels = percent)
In this example, gear takes over the role of Cluster and is assigned to the x-axis.
There are unwanted labeled tick marks at x = 2.5, 3.5, 4.5, 5.5, which are due to the continuous scale.
Second attempt (continuous x-axis with n.breaks given)
ggplot(mtcars) +
aes(x = gear, fill = factor(vs)) +
geom_bar(stat = "count", position = "fill", width = 0.85) +
scale_x_continuous(n.breaks = length(unique(mtcars$gear))) +
scale_y_continuous(labels = percent)
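Specifying n.breaks in scale_x_continuous() does not make the x-axis discrete. According to the ggplot2 documentation, n.breaks only guides the number of major breaks, and the algorithm may choose a slightly different number to get nice break labels, so asking for 2 breaks does not guarantee ticks at 1 and 2.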
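If you would rather keep gear (or Cluster) numeric, another option, not part of the original answer, is to pass the break positions themselves instead of a target count. A sketch, assuming the same mtcars setup as above:

```r
library(ggplot2)
library(scales)

# Keep gear numeric (continuous scale) but supply the breaks explicitly,
# one per observed value, instead of a target count via n.breaks
p <- ggplot(mtcars) +
  aes(x = gear, fill = factor(vs)) +
  geom_bar(position = "fill", width = 0.85) +
  scale_x_continuous(breaks = sort(unique(mtcars$gear))) +
  scale_y_continuous(labels = percent)

p
```

With explicit breaks the tick positions no longer depend on the break algorithm, so two clusters give ticks at exactly 1 and 2.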
Third attempt (discrete x-axis, gear as factor)
When gear is turned into a factor, we get a labeled tick mark for each factor level:
ggplot(mtcars) +
aes(x = factor(gear), fill = factor(vs)) +
geom_bar(stat = "count", position = "fill", width = 0.85) +
scale_y_continuous(labels = percent)

Related

ggplot2 with side by side and proportional fill

I have data that looks like this:
My goal is to have a barplot grid as follows: Each plot will be specific to 1 race_ethnicity group. The x-axis in each plot will be the different age_bin groups. For each age_bin, there will be two bars: 1 for men, and 1 for women. For each bar, I want it to be filled with the proportion of Likely/(Unlikely + Likely). Preferably, each bar would have a height of 1 and a line cut through it so Likely% of that bar is one color with a label. This is what I currently have:
I am running into issues with 1) using a predefined proportion as the fill, and 2) having two different "fills" (one for biological sex, one for the predefined proportion).
Thanks to anyone who can help with this. My code is currently the following:
ggplot(data=who_votes_data, aes(x=age_bin,y=1, fill=gender)) +
geom_bar(stat='identity',aes(fill = gender), position = position_dodge2()) +
facet_wrap(~race_ethnicity, nrow = 2, scales = "free") +
geom_text(aes(label=paste0(sprintf("%1.1f", prop*100),"%"), y=prop),
colour="white") +
labs(x = expression("Age Group"), y= ("Proportion of Likely Voters"),
title = "Proportion of Likely Voters Across Age Groups, Race/Ethnicity, and Sex",
caption="Figure 1") + theme(plot.caption = element_text(hjust = 0.5, vjust = -0.5, size = 18))
https://docs.google.com/spreadsheets/d/1a7433iwXNSwcuXDJOvqsxNDN6oaYULVlyw22E41JROU/edit?usp=sharing
Updated Code:
library(tidyverse)
library(ggplot2)
df<- read.csv("samplevotes.csv")
df %>%
group_by(race_ethnicity, age_bin, gender) %>%
summarise(Likely = sum(Likely),
Unlikely = sum(Unlikely),
proportion = Likely/(Likely+Unlikely)) %>% ungroup() %>%
ggplot(aes(x = age_bin, y = proportion, fill = gender)) +
geom_bar(stat = "identity", position = "dodge") +
facet_wrap(~race_ethnicity, nrow = 2) +
geom_text(aes(label=paste0(sprintf("%1.1f", proportion*100),"%"), y=proportion), position = position_dodge(width = 1), colour="Black", size = 2.2) +
labs(x = expression("Age Group"), y= ("Proportion of Likely Voters"), title = "Proportion of Likely Voters Across Age Groups, Race/Ethnicity, and Sex", caption="Figure 1") +
theme(plot.caption = element_text(hjust = 0.5, vjust = -0.5, size = 18))
Here is the code I would use; it is the same pipeline shown under "Updated Code" above, with some changes based on the way the data was combined.
Here is what it looks like
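To check the proportions independently of the tidyverse, the same group-wise calculation (sum Likely and Unlikely per race_ethnicity, age_bin and gender, then take Likely/(Likely+Unlikely)) can be done in base R with aggregate(). The rows below are made-up numbers in the shape of the shared sheet, not the real data.

```r
# Hypothetical rows in the shape of the question's sheet (made-up numbers)
votes <- data.frame(
  race_ethnicity = c("group_1", "group_1", "group_1", "group_1"),
  age_bin        = c("18-29", "18-29", "18-29", "30-44"),
  gender         = c("Female", "Female", "Male", "Female"),
  Likely         = c(3, 1, 3, 5),
  Unlikely       = c(1, 3, 1, 0)
)

# Sum both counts within each race_ethnicity x age_bin x gender cell
totals <- aggregate(cbind(Likely, Unlikely) ~ race_ethnicity + age_bin + gender,
                    data = votes, FUN = sum)

# Proportion of likely voters per cell, as in the dplyr summarise() step
totals$proportion <- totals$Likely / (totals$Likely + totals$Unlikely)
totals
```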

geom_bar not displaying mean values

I'm currently trying to plot mean values of a variable pt for each combination of species/treatments in my experiments. This is the code I'm using:
ggplot(data = data, aes(x=treat, y=pt, fill=species)) +
geom_bar(position = "dodge", stat="identity") +
labs(x = "Treatment",
y = "Proportion of Beetles on Treated Side",
fill = "Species") +
theme(legend.position = "right")
As you can see, the plot seems to assume the mean of my 5N and 95E treatments are 1.00, which isn't correct. I have no idea where the problem could be here.
I took a stab at what you are asking using the tidyverse, which includes ggplot2:
library(tidyverse)
dat %>%
group_by(treat, species) %>%
summarise(mean_pt = mean(pt)) %>%
ungroup() %>%
ggplot(aes(x = treat, y = mean_pt, fill = species, group = species)) +
geom_bar(position = "dodge", stat = "identity")+
labs(x = "Treatment",
y = "Proportion of Beetles on Treated Side",
fill = "Species") +
theme(legend.position = "right") +
geom_text(aes(label = round(mean_pt, 3)), size = 3, hjust = 0.5, vjust = 3, position = position_dodge(width = 1))
Here dat is your actual dataset. I calculated mean_pt because that is what you are trying to plot, and I added a geom_text layer so you can see the computed values and compare them with what you expected.
From my understanding, this won't plot the means of your y variable by default. Have you calculated the means for each treatment? If not, I'd recommend adding a column to your dataframe that contains the mean. I'm sure there's an easier way to do this, but try:
data$means <- rep(NA, nrow(data))
for (x in seq_len(nrow(data))) {
# mean of pt over all rows sharing row x's treatment and species
data$means[x] <- mean(data$pt[data$treat == data$treat[x] & data$species == data$species[x]])
}
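A vectorized alternative to the loop, using base R's ave(), which applies a function within groups and returns a vector aligned with the original rows. The beetles data frame is a hypothetical example with the question's column names, not the real experiment.

```r
# Hypothetical data with the question's column names (not the real experiment)
beetles <- data.frame(
  treat   = rep(c("5N", "95E"), each = 4),
  species = rep(c("sp1", "sp2"), times = 4),
  pt      = c(0.2, 0.6, 1.0, 0.8, 0.4, 0.5, 0.6, 1.0)
)

# Mean of pt within each treat x species combination, repeated on every row
beetles$means <- ave(beetles$pt, beetles$treat, beetles$species, FUN = mean)
beetles
```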
Then try replacing
geom_bar(position = "dodge", stat="identity")
with
geom_col(position = "dodge")
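If your y variable already contains one mean per treatment and species, switching geom_bar to geom_col as shown will work (geom_col is the same as geom_bar with stat = "identity"). Note that geom_bar with stat = "identity" does not average anything: with position = "dodge", rows that share the same treatment and species are drawn on top of each other, so only the tallest value (here 1.00) is visible, and with the default position = "stack" they would be summed instead.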
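A third option, not from either answer, is to let ggplot2 compute the means itself with stat_summary(). This sketch assumes ggplot2 3.3.0 or later (older versions use fun.y instead of fun) and uses hypothetical data with the question's column names.

```r
library(ggplot2)

# Hypothetical data in the shape of the question (not the real experiment)
beetles <- data.frame(
  treat   = rep(c("5N", "95E"), each = 4),
  species = rep(c("sp1", "sp2"), times = 4),
  pt      = c(0.2, 0.6, 1.0, 0.8, 0.4, 0.5, 0.6, 1.0)
)

# stat_summary() computes mean(pt) for every treat x species group
# before drawing, so no pre-computed means column is needed
p <- ggplot(beetles, aes(x = treat, y = pt, fill = species)) +
  stat_summary(fun = mean, geom = "bar", position = "dodge") +
  labs(x = "Treatment",
       y = "Proportion of Beetles on Treated Side",
       fill = "Species")
p
```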

ggplot piecharts on a ggmap: labels destroy the small plots

My ggmap on which I would like small piecharts with labels is generated with the code:
p <-
get_googlemap(
"Poland",
maptype = "roadmap",
zoom = 6,
color = "bw",
crop = T,
style = 'feature:all|element:labels|visibility:off' #'feature:administrative.country|element:labels|visibility:off' or 'feature:all|element:labels|visibility:off'
) %>%
ggmap() + coord_cartesian() +
scale_x_continuous(limits = c(14, 24.3), expand = c(0, 0)) +
scale_y_continuous(limits = c(48.8, 55.5), expand = c(0, 0))
I am trying to plot my small ggplot piecharts on a ggmap following the answer
R::ggplot2::geom_points: how to swap points with pie charts?
I prepare data as follows:
df <-
df %>% mutate(Ours = Potential * MS, Others = Potential - Ours) %>%
na.omit() %>% filter(Potential > 0) %>%
select(-L.p., -MS) %>%
group_by(Miasto) %>%
summarise(across(c(Potential, Ours, Others), sum, .names = "{.col}_Sum")) %>%
left_join(coordinatesTowns, by = c("Miasto" = "address")) %>%
distinct(Miasto, .keep_all = T) %>%
select(-X) %>% ungroup()
df <-df %>% gather(key=component, value=sales, c(Ours_Sum,Others_Sum)) %>%
group_by(lon, lat,Potential_Sum)
My data then looks like this:
tibble::tribble(
~Miasto, ~Potential_Sum, ~lon, ~lat, ~component, ~sales,
"Bialystok", 100, 23.16433, 53.13333, "Ours_Sum", 70,
"Bialystok", 100, 23.16433, 53.13333, "Others_Sum", 30,
"Bydgoszcz", 70, 18.00762, 53.1235, "Ours_Sum", 0,
"Bydgoszcz", 70, 18.00762, 53.1235, "Others_Sum", 70,
"Gdansk", 50, 18.64637, 54.35205, "Ours_Sum", 25,
"Gdansk", 50, 18.64637, 54.35205, "Others_Sum", 75,
"Katowice", 60, 19.02754, 50.25842, "Ours_Sum", 20,
"Katowice", 60, 19.02754, 50.25842, "Others_Sum", 40
)
The last group_by line is essential for generating the plots that will be pasted into my map. (I suspect this may be the cause of the problems described below.)
Instead of totals, I would like to label each share of a pie chart.
In this answer I found syntax that should add labels to the pie charts: https://stackoverflow.com/a/22804400/3480717
Below is the code from my script. If the line with geom_text (commented out with a hash) is uncommented, my plots disappear and I get a long list of warnings (16 entries, covering all the small plots):
1: Removed 1 rows containing missing values (geom_col).
I suspect the cause may be the last line of the data preparation, which groups the data for plotting.
The line marked with a hash is the problem. With the line commented out, the plots are correct; with it included (to get the desired labels on the slices), the plots disappear or become very narrow vertical slices.
df.grobs <- df %>%
do(subplots = ggplot(., aes(1, sales, fill = component)) +
geom_bar(position = "fill", alpha = 0.5, colour = "white", stat="identity") +
# geom_text( aes(label = round(sales), y=sales), position = position_stack(vjust = 0.5), size = 2.5) +
coord_polar(theta = "y") +
scale_fill_manual(values = c("green", "red"))+
theme_void() + guides(fill = "none")) %>%
mutate(subgrobs = list(annotation_custom(ggplotGrob(subplots),
x = lon-Potential_Sum/300, y = lat-Potential_Sum/300,
xmax = lon+Potential_Sum/300, ymax = lat+Potential_Sum/300)))
df.grobs
df.grobs %>%
{p +
.$subgrobs +
geom_col(data = df,
aes(0,0, fill = component),
colour = "white")+ geom_text(data=df, aes(label = Miasto),nudge_y = -0.15, size=2.5)}
Why is the line marked with a hash (if uncommented) destroying the plot instead of adding labels? It seems to completely redefine aesthetics.
EDIT: I modified the marked line so that label=sales and y=sales. Now, if I comment the line out, the plots are generated; if I uncomment it, the labels are generated in the correct positions but without the plots. Why can't I have both?
Short answer:
I think the problem is actually in your earlier line:
geom_bar(position = "fill", alpha = 0.5, colour = "white", stat="identity") +
If you change the position from fill to stack (i.e. the default), it should work properly (at least it did for me).
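Applied to one small plot, the fix looks like the sketch below. It uses only the Bialystok rows from the tribble in the question and leaves out the ggmap and annotation_custom() parts, so it shows a single standalone pie rather than the full map.

```r
library(ggplot2)

# One city's rows from the tribble in the question (Bialystok)
bialystok <- data.frame(
  component = c("Ours_Sum", "Others_Sum"),
  sales     = c(70, 30)
)

# Bars stacked (not filled), so bars and labels share the same y scale
pie <- ggplot(bialystok, aes(1, sales, fill = component)) +
  geom_bar(position = "stack", alpha = 0.5, colour = "white", stat = "identity") +
  geom_text(aes(label = round(sales)), position = position_stack(vjust = 0.5), size = 2.5) +
  coord_polar(theta = "y") +
  scale_fill_manual(values = c("green", "red")) +
  theme_void() +
  guides(fill = "none")
pie
```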
Long(-winded) explanation:
Let's use a summarised version of the mtcars dataset to reproduce the problem:
library(dplyr)
library(ggplot2)
dfm <- mtcars %>% group_by(cyl) %>% summarise(disp = mean(disp)) %>% ungroup()
# correct pie chart
ggplot(dfm, aes(x = 1, y = disp, label = factor(cyl), fill = factor(cyl))) +
geom_bar(stat = "identity", position = "stack") +
geom_text(position = position_stack(vjust = 0.5)) +
coord_polar(theta = "y") + theme_void()
# "empty" pie chart
ggplot(dfm, aes(x = 1, y = disp, label = factor(cyl), fill = factor(cyl))) +
geom_bar(stat = "identity", position = "fill") +
geom_text(position = position_stack(vjust = 0.5)) +
coord_polar(theta = "y") + theme_void()
Why does changing geom_bar's position affect this? If we look at the plot before the coord_polar step, things may become clearer:
ggplot(dfm, aes(x = 1, y = disp, label = factor(cyl), fill = factor(cyl))) +
geom_bar(stat = "identity", position = "stack") +
geom_text(position = position_stack(vjust = 0.5))
Check the bar chart's y-axis. The bars & the labels are correctly positioned.
Now the version with position = "fill":
ggplot(dfm, aes(x = 1, y = disp, label = factor(cyl), fill = factor(cyl))) +
geom_bar(stat = "identity", position = "fill") +
geom_text(position = position_stack(vjust = 0.5))
Your bar chart now occupies the range 0-1 on the y-axis, while your labels continue to occupy the original full range, which is much larger. Thus when you convert the chart to polar coordinates, the bar chart is squeezed to a tiny slice that becomes practically invisible.
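If you would rather keep position = "fill" for the wedges, a sketch of another option (not tested in the original answer) is to put the labels on the same 0 to 1 scale with position_fill(vjust = 0.5) in geom_text. The summary is computed here with base R's aggregate() instead of dplyr, which gives the same mean displacement per cylinder count.

```r
library(ggplot2)

# Mean displacement per cylinder count, same values as dfm above
dfm <- aggregate(disp ~ cyl, data = mtcars, FUN = mean)

# Both layers use a "fill" position, so bars and labels live on the 0-1 scale
p <- ggplot(dfm, aes(x = 1, y = disp, label = factor(cyl), fill = factor(cyl))) +
  geom_bar(stat = "identity", position = "fill") +
  geom_text(position = position_fill(vjust = 0.5)) +
  coord_polar(theta = "y") +
  theme_void()
p
```

In the question's subplot code this would mean keeping geom_bar(position = "fill", ...) and using geom_text(aes(label = round(sales)), position = position_fill(vjust = 0.5), size = 2.5).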

How to separately label and scale double y-axis in ggplot2?

I have a test dataset like this:
df_test <- data.frame(
proj_manager = c('Emma','Emma','Emma','Emma','Emma','Alice','Alice'),
proj_ID = c(1, 2, 3, 4, 5, 6, 7),
stage = c('B','B','B','A','C','A','C'),
value = c(15,15,20,20,20,70,5)
)
Preparation for viz:
library(dplyr)
library(ggplot2)
input <- select(df_test, proj_manager, proj_ID, stage, value) %>%
filter(proj_manager=='Emma') %>%
do({
proj_value_by_manager = sum(distinct(., proj_ID, value)$value);
mutate(., proj_value_by_manager = proj_value_by_manager)
}) %>%
group_by(stage) %>%
do({
sum_value_byStage = sum(distinct(.,proj_ID,value)$value);
mutate(.,sum_value_byStage= sum_value_byStage)
}) %>%
mutate(count_proj = length(unique(proj_ID)))
commapos <- function(x, ...) {
format(abs(x), big.mark = ",", trim = TRUE,
scientific = FALSE, ...) }
Visualization:
ggplot (input, aes(x=stage, y = count_proj)) +
geom_bar(stat = 'identity')+
geom_bar(aes(y=-proj_value_by_manager),
stat = "identity", fill = "Blue") +
scale_y_continuous(labels = commapos)+
coord_flip() +
ylab('') +
geom_text(aes(label= sum_value_byStage), hjust = 5) +
geom_text(aes(label= count_proj), hjust = -1) +
labs(title = "Emma: 4 projects| $90M Values \n \n Commitment|Projects") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, linetype =1)
My questions are:
Why are the y-values not showing up correctly? For example, C is labeled 20, but its bar nearly reaches 100 on the scale.
How to adjust the position of labels so that it sits on the top of its bar?
How to re-scale the y axis so that both the very short bar of 'count of project' and long bar of 'Project value' can be well displayed?
Thank you all for the help!
I think your issues are coming from the fact that:
(1) Your dataset has duplicated values, and geom_bar stacks (adds) all of them together. For example, there are 3 rows for B where proj_value_by_manager = 90, which is why the blue bar extends to 270 for that group.
(2) In your second geom_bar you use y = -proj_value_by_manager, but in the geom_text that labels it you use sum_value_byStage. That's why the blue bar for A extends to 90 (since proj_value_by_manager is 90) while the label reads 20.
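Both points can be checked with a few lines of base R on the question's df_test. The sketch reproduces the per-stage sums that a stacked geom_bar(stat = "identity") draws, next to the per-stage values used for the labels. Here a plain sum equals the question's distinct()-based sum because every proj_ID appears only once.

```r
# df_test from the question
df_test <- data.frame(
  proj_manager = c('Emma','Emma','Emma','Emma','Emma','Alice','Alice'),
  proj_ID = c(1, 2, 3, 4, 5, 6, 7),
  stage = c('B','B','B','A','C','A','C'),
  value = c(15,15,20,20,20,70,5)
)
emma <- df_test[df_test$proj_manager == "Emma", ]

# proj_value_by_manager is Emma's total (90), repeated on every row
emma$proj_value_by_manager <- sum(emma$value)

# geom_bar(stat = "identity") stacks rows that share an x, so each bar is a per-stage sum
stacked_total  <- tapply(emma$proj_value_by_manager, emma$stage, sum)
value_by_stage <- tapply(emma$value, emma$stage, sum)

stacked_total   # what the blue bars show
value_by_stage  # what the sum_value_byStage labels report
```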
To get what I believe is the chart you want, you could do:
#Q1: de-duplicated dataset, so geom_bar doesn't erroneously stack repeated rows
input2 <- input[!duplicated(input[,-c(2,4)]),]
ggplot (input2, aes(x=stage, y = count_proj)) +
geom_bar(stat = 'identity')+
geom_bar(aes(y=-sum_value_byStage), #Q1: changed so this y-value matches your label
stat = "identity", fill = "Blue") +
scale_y_continuous(labels = commapos)+
coord_flip() +
ylab('') +
geom_text(aes(label= sum_value_byStage, y = -sum_value_byStage), hjust = 1) + #Q2: Added in y-value for label and hjust so it will be on top
geom_text(aes(label= count_proj), hjust = -1) +
labs(title = "Emma: 4 projects| $90M Values \n \n Commitment|Projects") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, linetype =1)
For your last question, there is no good way to display both on one scale. One option would be to rescale the small values and still label them with their original counts (e.g. 1 or 3). I didn't do this, because once the blue bars are scaled down the other bars look OK to me.

How to add percentage or count labels above percentage bar plot?

Using ggplot2 1.0.0, I followed the instructions in the post below to figure out how to plot percentage bar plots across factors:
Sum percentages for each facet - respect "fill"
test <- data.frame(
test1 = sample(letters[1:2], 100, replace = TRUE),
test2 = sample(letters[3:8], 100, replace = TRUE)
)
library(ggplot2)
library(scales)
ggplot(test, aes(x= test2, group = test1)) +
geom_bar(aes(y = ..density.., fill = factor(..x..))) +
facet_grid(~test1) +
scale_y_continuous(labels=percent)
However, I cannot seem to get a label for either the total count or the percentage above each of the bar plots when using geom_text.
What is the correct addition to the above code that also preserves the percentage y-axis?
Staying within ggplot, you might try
ggplot(test, aes(x= test2, group=test1)) +
geom_bar(aes(y = ..density.., fill = factor(..x..))) +
geom_text(aes( label = format(100*..density.., digits=2, drop0trailing=TRUE),
y= ..density.. ), stat= "bin", vjust = -.5) +
facet_grid(~test1) +
scale_y_continuous(labels=percent)
For counts, change ..density.. to ..count.. in geom_bar and geom_text
UPDATE for ggplot 2.x
ggplot2 2.0.0 made many changes, including one that broke the original version of this code: it changed the default stat used by geom_bar. Instead of calling stat_bin, as before, to bin the data, geom_bar now calls stat_count to count observations at each location. stat_count returns prop, the proportion of each group's total count that falls at that location, rather than density.
The code below has been modified to work with this new release of ggplot2. I've included two versions, both of which show the height of the bars as a percentage of counts. The first displays the proportion of the count above the bar as a percent while the second shows the count above the bar. I've also added labels for the y axis and legend.
library(ggplot2)
library(scales)
#
# Displays bar heights as percents with percentages above bars
#
ggplot(test, aes(x= test2, group=test1)) +
geom_bar(aes(y = ..prop.., fill = factor(..x..)), stat="count") +
geom_text(aes( label = scales::percent(..prop..),
y= ..prop.. ), stat= "count", vjust = -.5) +
labs(y = "Percent", fill="test2") +
facet_grid(~test1) +
scale_y_continuous(labels=percent)
#
# Displays bar heights as percents with counts above bars
#
ggplot(test, aes(x= test2, group=test1)) +
geom_bar(aes(y = ..prop.., fill = factor(..x..)), stat="count") +
geom_text(aes(label = ..count.., y= ..prop..), stat= "count", vjust = -.5) +
labs(y = "Percent", fill="test2") +
facet_grid(~test1) +
scale_y_continuous(labels=percent)
The plot from the first version is shown below.
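In current ggplot2 releases (3.3.0 and later), the ..prop.. and ..x.. notation is deprecated in favour of after_stat(). A sketch of the first version rewritten that way; the set.seed(1) call is only there to make the random test data reproducible and is not part of the original answer.

```r
library(ggplot2)
library(scales)

set.seed(1)  # arbitrary seed so the random test data is reproducible
test <- data.frame(
  test1 = sample(letters[1:2], 100, replace = TRUE),
  test2 = sample(letters[3:8], 100, replace = TRUE)
)

# after_stat(prop) is the within-group (here within-facet) share of the count
p <- ggplot(test, aes(x = test2, group = test1)) +
  geom_bar(aes(y = after_stat(prop), fill = factor(after_stat(x)))) +
  geom_text(aes(label = percent(after_stat(prop)), y = after_stat(prop)),
            stat = "count", vjust = -0.5) +
  labs(y = "Percent", fill = "test2") +
  facet_grid(~test1) +
  scale_y_continuous(labels = percent)
p
```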
This is easier to do if you pre-summarize your data. For example:
library(ggplot2)
library(scales)
library(dplyr)
set.seed(25)
test <- data.frame(
test1 = sample(letters[1:2], 100, replace = TRUE),
test2 = sample(letters[3:8], 100, replace = TRUE)
)
# Summarize to get counts and percentages
test.pct = test %>% group_by(test1, test2) %>%
summarise(count=n()) %>%
mutate(pct=count/sum(count))
ggplot(test.pct, aes(x=test2, y=pct, colour=test2, fill=test2)) +
geom_bar(stat="identity") +
facet_grid(. ~ test1) +
scale_y_continuous(labels=percent, limits=c(0,0.27)) +
geom_text(data=test.pct, aes(label=paste0(round(pct*100,1),"%"),
y=pct+0.012), size=4)
FYI, you can also put the labels inside the bars, for example by changing the last line of code to: y=pct*0.5), size=4, colour="white")
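The same pre-summary can be sketched in base R: table() gives the counts per test1 x test2 cell, and prop.table(..., margin = 1) turns each test1 row into proportions, which matches the pct column computed by the dplyr pipeline above.

```r
set.seed(25)
test <- data.frame(
  test1 = sample(letters[1:2], 100, replace = TRUE),
  test2 = sample(letters[3:8], 100, replace = TRUE)
)

# Counts per test1 x test2 cell, then proportions within each test1 row
counts <- table(test$test1, test$test2)
pct <- prop.table(counts, margin = 1)
round(100 * pct, 1)
```

as.data.frame(pct) turns the result into a long data frame (columns Var1, Var2, Freq) that can be passed to ggplot() in place of test.pct.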
I've used all of your code and came up with this. First assign your ggplot to a variable, i.e. p <- ggplot(...) + geom_bar(...) etc. Then you could do the following. You don't need to summarize much, since ggplot_build() already gives you all of this. Note that this relies on the density column that geom_bar produced (via stat_bin) before ggplot2 2.0.0, and on plyr's ldply() and mapvalues(). I'll leave the formatting to you. Good luck.
library(plyr)
library(dplyr)
dat <- ggplot_build(p)$data %>% ldply() %>% select(group,density) %>%
do(data.frame(xval = rep(1:6, times = 2),test1 = mapvalues(.$group, from = c(1,2), to = c("a","b")), density = .$density))
p + geom_text(data=dat, aes(x = xval, y = (density + .02), label = percent(density)), colour="black", size = 3)
