I've got a data frame with two categorical variables called verified and procedure.
I'd like to make a bar chart with procedure on the x-axis, and the corresponding percentages rather than counts on the y-axis. Furthermore, I'd like for verified to be the fill of the bars.
The problem's that when I've tried using the fill argument it hasn't worked. My current code gets me bars that are all grey with a black line (despite the absence of a fill argument the black line seems to indicate the levels of verified???). Instead I'd like the levels to be in different colours.
Thanks!
starting point (df):
df <- data.frame(verified=c("small","large","small","small","large","small","small","large","small"),procedure=c(1,2,1,2,1,2,2,2,2))
current code:
library(dplyr)
library(ggplot2)
df %>%
count(procedure,verified) %>%
mutate(prop = round((n / sum(n)) * 100, 2)) %>%
group_by(procedure) %>%
ggplot(aes(x = procedure, y = prop)) +
geom_bar(stat = "identity",colour="black")
Just add fill = verified to your initial aes(), or to an aes() inside geom_bar():
# common elements
g_df <- df %>%
count(procedure, verified) %>%
mutate(prop = round((n / sum(n)) * 100, 2)) %>%
group_by(procedure)
# fill added to initial aes
g1 <- ggplot(g_df, aes(x = procedure, y = prop, fill = verified)) +
geom_bar(stat = "identity", colour = "black")
# fill added to geom_bar
g2 <- ggplot(g_df, aes(x = procedure, y = prop)) +
geom_bar(aes(fill = verified), stat = "identity", colour = "black")
Both g1 and g2 produce the same plot below
As suggested by eipi10 in the comments on my answer, you could clean up the x-axis by making procedure a factor. A modification of their code is below.
df %>%
count(procedure, verified) %>%
mutate(prop = 100 * n / sum(n)) %>% # scale to 0-100 so the "percent" axis label is accurate
ggplot(aes(x = factor(procedure), y = prop, fill = verified)) +
geom_bar(stat = "identity", colour = "black") +
labs(x = "procedure", y = "percent")
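If the percentages are meant to add up to 100 within each procedure rather than across the whole table, the group_by() has to come before the mutate(). That is an assumption about what is wanted, not something the question states. A minimal sketch using the same df (within_proc and p are names made up for this example):

```r
library(dplyr)
library(ggplot2)

df <- data.frame(
  verified  = c("small", "large", "small", "small", "large",
                "small", "small", "large", "small"),
  procedure = c(1, 2, 1, 2, 1, 2, 2, 2, 2)
)

# Group first, so sum(n) is taken per procedure instead of over all rows
within_proc <- df %>%
  count(procedure, verified) %>%
  group_by(procedure) %>%
  mutate(prop = round(100 * n / sum(n), 2)) %>%
  ungroup()

# Each bar now stacks to 100
p <- ggplot(within_proc,
            aes(x = factor(procedure), y = prop, fill = verified)) +
  geom_col(colour = "black") +
  labs(x = "procedure", y = "percent")
```

Printing p draws the chart. geom_col() is shorthand for geom_bar(stat = "identity").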
I have a dataset that has the variables "SEXO" (M or F) and "Class" (0 or 1). I want to create a bar plot using ggplot2 that shows, for each sex, the distribution of Class as a percentage. I was able to get the plot, but I can't seem to get the labels working on the bars themselves. I don't want to change the labels on the axis, I just want to get the % shown on the plot for each SEXO.
This is the code I have been using:
ggplot(data = df, aes(x = SEXO, fill = Class)) + geom_bar(position = 'fill')
I also attach an image of the plot produced by the code:
This would be the ideal outcome:
Here is an example using the mtcars dataset: calculate the percentage per group, then place those values in the bars by mapping them to label in geom_text, like this:
library(ggplot2)
library(dplyr)
mtcars %>%
group_by(vs, am) %>% # vs first, so the percentages are computed within each bar (vs)
summarise(cnt = n()) %>%
mutate(perc = round(cnt/sum(cnt), 2)) %>%
ggplot(aes(x = factor(vs), fill = factor(am), y = perc)) +
geom_col(position = 'fill') +
geom_text(aes(label = paste0(perc*100,"%"), y = perc), position = position_stack(vjust = 0.5), size = 3) +
labs(fill = 'Class', x = 'vs') +
scale_y_continuous(limits = c(0,1))
#> `summarise()` has grouped output by 'vs'. You can override using the `.groups`
#> argument.
Created on 2022-11-02 with reprex v2.0.2
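The same idea applied to the column names from the question might look like the sketch below. The toy data frame is invented for illustration (the real data is not shown in the question), and labelled and p are hypothetical names.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the asker's data
toy <- data.frame(
  SEXO  = c("M", "M", "M", "F", "F", "F", "F", "M"),
  Class = factor(c(0, 1, 1, 0, 0, 1, 0, 0))
)

# Share of each Class within each SEXO
labelled <- toy %>%
  count(SEXO, Class) %>%
  group_by(SEXO) %>%
  mutate(perc = n / sum(n)) %>%
  ungroup()

# Bars stack to 1 per SEXO; labels sit in the middle of each segment
p <- ggplot(labelled, aes(x = SEXO, y = perc, fill = Class)) +
  geom_col() +
  geom_text(aes(label = paste0(round(100 * perc), "%")),
            position = position_stack(vjust = 0.5), size = 3)
```

Because perc already sums to 1 within each SEXO, position = 'fill' is not needed here; plain stacking gives the same bar heights.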
Is there a way to order the bars in geom_bar() when y is just the count of x?
Example:
ggplot(dat) +
geom_bar(aes(x = feature_1))
I tried using reorder() but it requires a defined y variable within aes().
Made up data:
dfexmpl <- data.frame(stringsAsFactors = FALSE,
group = c("a","a","a","a","a","a",
"a","a","a","b","b","b","b","b","b","b","b","b",
"b","b","b","b","b","b"))
plot code - reorder is doing the work of arranging by count:
dfexmpl %>%
group_by(group) %>%
mutate(count = n()) %>%
ggplot(aes(x = reorder(group, -count))) + # count is only used for ordering
geom_bar() # default stat = "count"; stat = "identity" would stack the repeated per-row counts
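An alternative that skips the helper column, assuming the forcats package is available, is fct_infreq(), which orders factor levels by how often they occur (most frequent first). A minimal sketch on the same made-up data:

```r
library(ggplot2)
library(forcats)

dfexmpl <- data.frame(group = c(rep("a", 9), rep("b", 15)))

# fct_infreq() reorders the levels by frequency, so geom_bar() can
# keep counting the rows itself
p <- ggplot(dfexmpl, aes(x = fct_infreq(group))) +
  geom_bar() +
  labs(x = "group")
```

Wrapping the call in fct_rev() should flip the order to least frequent first.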
I needed to add some partial boxplots to the following plot:
library(tidyverse)
foo <- tibble(
time = 1:100,
group = sample(c("a", "b"), 100, replace = TRUE) %>% as.factor()
) %>%
group_by(group) %>%
mutate(value = rnorm(n()) + 10 * as.integer(group)) %>%
ungroup()
foo %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE)
I would like to add a grid of (2 x 4 = 8) boxplots (4 per group) to the plot above. Each boxplot should cover a consecutive selection of 25 (or n) points (in each group). I.e., the first two boxplots represent the points between the 1st and the 25th (one boxplot below for group a, and one boxplot above for group b). Next to them come two other boxplots for the points between the 26th and 50th, and so on. If they are not in a perfect grid (which I suppose would be both more challenging to obtain and uglier) it would be even better: I would prefer them to "follow" their corresponding smooth line!
That all without using facets (because I have to insert them in a plot which is already facetted :-))
I tried to
bar <- foo %>%
group_by(group) %>%
mutate(cut = 12.5 * (time %/% 25)) %>%
ungroup()
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(x = cut))
but it doesn't work.
I tried to call geom_boxplot() using group instead of x
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(group = cut))
But it draws the boxplots without considering the groups and even loses the colors (and adding a redundant call including color = group doesn't help).
Finally, I decided to try it roughly:
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(data = filter(bar, group == "a"), aes(group = cut)) +
geom_boxplot(data = filter(bar, group == "b"), aes(group = cut))
And it works (maintaining even the correct colors from the main aes)!
Does someone know if it is possible to obtain it using a single call to geom_boxplot()?
Thanks!
This was interesting! I haven't tried to use geom_boxplot with a continuous x before and didn't know how it behaved. I think what is happening is that an explicit group aesthetic replaces ggplot2's default grouping (the interaction of all discrete variables in the plot, here colour), so each boxplot mixes both groups and cannot take a single colour from either the inherited or the repeated colour aesthetic. I think this workaround does the trick: we combine the group and cut variables into group_cut, which takes 8 different values (one for each desired boxplot). Now we can map aes(group = group_cut) and get the desired output. I don't think this is particularly intuitive and it might be worth raising on GitHub, since usually we expect aesthetics to combine nicely (e.g. combining colour and linetype works fine).
library(tidyverse)
bar <- tibble(
time = 1:100,
group = sample(c("a", "b"), 100, replace = TRUE) %>% as.factor()
) %>%
group_by(group) %>%
mutate(
value = rnorm(n()) + 10 * as.integer(group),
cut = 12.5 * ((time - 1) %/% 25), # modified this to prevent an extra boxplot
group_cut = str_c(group, cut)
) %>%
ungroup()
bar %>%
ggplot(aes(x = time, y = value, colour = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(group = group_cut), position = "identity")
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Created on 2019-08-13 by the reprex package (v0.3.0)
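A variation on the same workaround, as a sketch only: base R's interaction() can build the combined grouping inside aes(), so no extra column is needed. The data below is a deterministic stand-in (alternating groups, fixed seed) rather than the random sample above, and bar2 and p are names invented for this example.

```r
library(ggplot2)

set.seed(1)
bar2 <- data.frame(
  time  = 1:100,
  group = factor(rep(c("a", "b"), 50))
)
bar2$value <- rnorm(100) + 10 * as.integer(bar2$group)
bar2$cut   <- 12.5 * ((bar2$time - 1) %/% 25)

# One boxplot per (group, cut) pair; colour is still inherited from the
# main aes because every pair contains a single group
p <- ggplot(bar2, aes(x = time, y = value, colour = group)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  geom_boxplot(aes(group = interaction(group, cut)), position = "identity")
```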
I have a dataset and I need to plot a bar chart of the counts of the different outcomes of a certain column. For this example I am using the mtcars dataset.
When I first attempted this, I found that the labels on the bars were getting cut off at the top, so I used the expand_limits() function to give them more space. As I want to be able to use this code for refreshed data, the limits might change, which is why I've used the max() function.
mtcars_cyl_counts <- as.data.frame(table(mtcars$cyl))
colnames(mtcars_cyl_counts)[1:2] <- c("cyl", "counts")
mtcars_cyl_counts %>%
arrange(desc(counts)) %>%
ggplot(aes(x = reorder(cyl, -counts), y = counts)) +
geom_bar(stat = "identity") +
geom_text(aes(label = comma(counts), vjust = -0.5), size = 3) +
expand_limits(y = max((mtcars_cyl_counts$counts) * 1.05))
This works fine, but it seemed unnecessarily cumbersome to create a separate table of counts, and made some of the future code more complicated, so I redid this:
mtcars %>%
group_by(cyl) %>%
summarize(counts = n()) %>%
arrange(-counts) %>%
mutate(cyl = factor(cyl, cyl)) %>%
ggplot() +
geom_bar(aes(x = cyl, y = counts), stat = "identity") +
geom_text(aes(x = cyl, y = counts, label = comma(counts), vjust = -0.5), size = 3) +
expand_limits(y = max((counts) * 1.05))
However, this returns the following error:
Error in data.frame(..., stringsAsFactors = FALSE) :
object 'counts' not found
I get that 'counts' is not technically in the mtcars dataset (which is why it also doesn't work if I use mtcars$counts), but it's what I've used elsewhere in the code for y definitions.
So, is there a way to write this so that it works, or an alternative way to expand the vertical limits in a way that will adapt to different datasets?
(NB: with these examples, the bar labels don't get cut off because they aren't very big, but for the purpose of this I just need the limits expanded so that detail is not critically important to the working...)
If this helps Megan,
mtcars %>%
count(cyl) %>%
arrange(-n) %>%
mutate(cyl = factor(cyl, cyl)) %>%
ggplot(aes(cyl, n)) +
geom_text(vjust = -0.5, aes(label = n)) +
geom_bar(stat = "identity") +
expand_limits(y = max(table(mtcars$cyl) * 1.05))
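Another option, offered as a sketch rather than the definitive fix, is to let the scale add the headroom itself. expansion() (ggplot2 3.3.0 or later) pads the axis by a fraction of the data range, so nothing has to be looked up from outside the pipeline. The 10% figure is an arbitrary choice for this example.

```r
library(dplyr)
library(ggplot2)

p <- mtcars %>%
  count(cyl) %>%
  mutate(cyl = reorder(factor(cyl), -n)) %>%   # order bars by descending count
  ggplot(aes(cyl, n)) +
  geom_col() +
  geom_text(aes(label = n), vjust = -0.5, size = 3) +
  # no padding below the bars, 10% of the range above them for the labels
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))
```

Since the padding is relative, it should adapt when the data is refreshed and the counts change.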
I'm trying to change my (stacked) bar width according to the counts (or proportion) of the categories. As an example I used the diamonds dataset. I want to see a varying width according to the frequency of each category (of the variable cut). I first created a variable cut_prop and then plotted with the following code:
library(tidyverse)
cut_prop = diamonds %>%
group_by(cut) %>%
summarise(cut_prop = n()/nrow(diamonds))
diamonds = left_join(diamonds, cut_prop)
ggplot(data = diamonds,
aes(x = cut, fill = color)) +
geom_bar(aes(width=cut_prop), position = "fill") +
theme_minimal() +
coord_flip()
Which gave me the following barplot:
R gives the warning "Ignoring unknown aesthetics: width" and obviously doesn't take the proportion of each category into account for the width of the bars. Can anyone help me out here? Thanks!
I think this works. Starting where you left off...
df <- diamonds %>%
count(cut, color, cut_prop) %>%
group_by(cut) %>%
mutate(freq = n / sum(n)) %>%
ungroup
ggplot(data = df,
aes(x = cut, fill = color, y = freq, width = cut_prop)) +
geom_bar(stat = "identity") +
theme_minimal() +
coord_flip()
Essentially, I calculate the proportions myself instead of using position = "fill", then use stat = "identity" rather than stat = "count".
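For anyone not starting from the joined data frame, here is a self-contained sketch of the same approach that computes both proportions in one pipeline. The names widths and cut_n are made up for this example, and whether width is honoured as an aesthetic without a warning may depend on the ggplot2 version.

```r
library(dplyr)
library(ggplot2)

widths <- diamonds %>%
  count(cut, color) %>%
  group_by(cut) %>%
  mutate(freq  = n / sum(n),   # share of each color within a cut
         cut_n = sum(n)) %>%   # number of diamonds in that cut
  ungroup() %>%
  mutate(cut_prop = cut_n / sum(n))   # share of each cut overall

p <- ggplot(widths,
            aes(x = cut, y = freq, fill = color, width = cut_prop)) +
  geom_col() +
  theme_minimal() +
  coord_flip()
```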