Fill by group is not showing up in ggplot

I am trying to make a 100% stacked area chart showing the distribution of two rider types (casual vs. member) across start hours from 0 to 24. However, the plot does not show separate fills for each group.
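My summary table is built like this:
library(dplyr)
start_hour_dist <- clean_trips %>%
group_by(start_hour, member_casual) %>%
summarise(n = n(), .groups = "drop_last") %>% # result is still grouped by start_hour
mutate(percentage = n / sum(n)) # share of each rider type within each hour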
My code for the plot is:
ggplot(start_hour_dist, mapping = aes(x=start_hour, y=percentage, fill=member_casual)) +
geom_area()
However, when I run it, the chart has no fill by group and looks like this:
[Image: the resulting chart, without group fills]
How can I make the plot look something like this?
[Image: example stacked area chart from the R Graph Gallery]
Thanks!
Ben

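The problem is likely that start_hour is a character vector. With a discrete x, ggplot's default grouping is the interaction of all discrete variables, including x itself, so each group holds a single point and geom_area() has no area to draw. Convert the column to an integer first. For example: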
library(tidyverse)
df <- tibble(start_hour = sprintf("%02d", rep(0:23, each = 2)),
member_casual = rep(c("member", "casual"), times = 24),
percentage = runif(48))
df |>
ggplot(mapping = aes(
x = start_hour,
y = percentage,
fill = member_casual
)) +
geom_area()
This re-creates your blank graph:
Converting the column to an integer first fixes it; position = "fill" additionally rescales each hour's stack to 100%:
df |>
mutate(start_hour = as.integer(start_hour)) |>
ggplot(mapping = aes(
x = start_hour,
y = percentage,
fill = member_casual
)) +
geom_area(position = "fill")
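The same fix can be applied to the summary table itself, before plotting. The sketch below uses a made-up clean_trips with a character start_hour (an assumption about the real data) and converts it with as.integer(). It also uses tidyr::complete(), which is not part of the answer above, to add zero rows for any hour where one rider type is missing, so both areas have a value at every hour.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for clean_trips: start_hour stored as text, as suspected above
set.seed(42)
clean_trips <- tibble(
  start_hour = sprintf("%02d", sample(0:23, 500, replace = TRUE)),
  member_casual = sample(c("member", "casual"), 500, replace = TRUE)
)

start_hour_dist <- clean_trips |>
  mutate(start_hour = as.integer(start_hour)) |>              # numeric x for geom_area()
  count(start_hour, member_casual) |>
  complete(start_hour, member_casual, fill = list(n = 0L)) |>  # zero-fill missing combos
  group_by(start_hour) |>
  mutate(percentage = n / sum(n)) |>                           # share within each hour
  ungroup()

p <- ggplot(start_hour_dist, aes(x = start_hour, y = percentage, fill = member_casual)) +
  geom_area()
```

Because percentage already sums to 1 within each hour here, plain geom_area() gives the 100% stack; position = "fill" is only needed when the y values are not pre-normalised.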

Related

Plotting a line graph by datetime with a histogram/bar graph by date

I'm relatively new to R and could really use some help with some pretty basic ggplot2 work.
I'm trying to visualize the number of submissions on one graph: the running total as a line, and the daily total as a histogram (or bar graph) on top of it. I'm not sure how to set the breaks or bins of the histogram so that it takes the submission datetime column and makes each bar a daily total.
I tried adding a column that converts the datetime into just date and plots based on that, but I'd really like the line graph to include the time.
Here's what I have so far:
df <- df %>%
mutate(datetime = lubridate::mdy_hm(datetime))%>%
mutate(date = lubridate::as_date(datetime))
#sort by datetime
df <- df %>%
arrange(datetime)
#add total number of submissions
df <- df %>%
mutate(total = row_number())
#ggplot
line_plus_histo <- df%>%
ggplot() +
geom_histogram(data = df, aes(x=datetime)) +
geom_line(data = df, aes(x=datetime, y=total), col = "red") +
stat_bin(data = df, aes(x=date), geom = "bar") +
labs(
title="Submissions by Day",
x="Date",
y="Submissions",
legend=NULL)
line_plus_histo
As you can see, I'm also calculating the running total by sorting by time and then adding a column with the row number. If there is a better method, I'd really appreciate help with that too.
Here is my current line plus histogram of submissions over time:
[Image: current line plus histogram of submissions over time]
Here's the pastebin link with my data
You can extend your data manipulation like this:
library(dplyr)
library(lubridate)
df <- df |>
mutate(datetime = mdy_hm(datetime)) |>
arrange(datetime) |>
mutate(midday = floor_date(datetime, unit = "day") + hours(12)) |> # noon of each day, used to centre the daily bars
mutate(totals = row_number()) |> # running total of submissions
group_by(midday) |>
mutate(N = n()) |> # number of submissions on that day
ungroup()
Then use midday for the bars and datetime for the line:
library(ggplot2)
df %>%
ggplot() +
geom_bar(aes(x = midday)) + # one bar per day, counted by geom_bar()
geom_line(aes(x = datetime, y = totals), col = "red") +
labs(
title = "Submissions by Day",
x = "Date",
y = "Submissions")
PS. Sorry for the Polish locale on the x axis.
PS2. With geom_bar() it looks much better than with a histogram.
Created on 2022-02-03 by the reprex package (v2.0.1)
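To check the two quantities separately, the sketch below builds a small made-up set of timestamps in the question's "m/d/Y H:M" format (an assumption about the pastebin data), computes the running total with row_number(), and builds a one-row-per-day table with count(). A pre-counted table like daily can be drawn with geom_col() instead of letting geom_bar() count the rows.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Hypothetical submissions, same text format as the question's datetime column
subs <- tibble(datetime = c("02/01/2022 09:15", "02/01/2022 17:40",
                            "02/02/2022 08:05", "02/03/2022 11:30",
                            "02/03/2022 13:00", "02/03/2022 21:45")) |>
  mutate(datetime = mdy_hm(datetime)) |>
  arrange(datetime) |>
  mutate(totals = row_number(),                                # running total
         midday = floor_date(datetime, unit = "day") + hours(12))

# One row per day with the number of submissions on that day
daily <- count(subs, midday, name = "n_day")

p <- ggplot() +
  geom_col(data = daily, aes(x = midday, y = n_day)) +
  geom_line(data = subs, aes(x = datetime, y = totals), colour = "red") +
  labs(title = "Submissions by Day", x = "Date", y = "Submissions")
```

Keeping the daily counts in their own table makes it easy to inspect or label them, while the line still uses the full datetime.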

Reorder vertical axis alphabetically and change position of binary variable of stacked percent bar graph (ggplot2)

I have a dataset with two variables: 1) ID, 2) Infection Status (Binary:1/0).
I would like to use ggplot2 to create a stacked percentage bar graph with the IDs on the vertical axis (sorted alphabetically, with A at the top) and the percentage on the horizontal axis. I can't find code that sorts the IDs alphabetically automatically; my original dataset has many categories, so arranging them manually would be difficult.
I would also like the infected category (1) to be red and to the left of the blue non-infected category (0). Is it also possible to change the legend label from "Non_Infected" to "Non-Infected"?
Finally, I would like each displayed ID to include the number of times it appears in the dataset, e.g. "A (n=6)", "B (n=3)".
My sample code is as follows:
ID <- c("A","A","A","A","A","A",
"B","B","B",
"C","C","C","C","C","C","C",
"D","D","D","D","D","D","D","D","D")
Infection <- sample(c(1, 0), size = length(ID), replace = T)
df <- data.frame(ID, Infection)
library(ggplot2)
library(dplyr)
library(reshape2)
df.plot <- df %>%
group_by(ID) %>%
summarize(Infected = sum(Infection)/n(),
Non_Infected = 1-Infected)
df.plot %>%
melt() %>%
ggplot(aes(x = ID, y = value, fill = variable)) + geom_bar(stat = "identity", position = "stack") +
xlab("ID") +
ylab("Percent Infection") +
scale_fill_discrete(guide = guide_legend(title = "Infection Status")) +
coord_flip()
Right now I managed to get this output:
I hope to get this:
Thank you so much!
First, add a per-ID count to the summary data frame:
df.plot <- df %>%
group_by(ID) %>%
summarize(Infected = sum(Infection)/n(),
Non_Infected = 1-Infected,
count = n())
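Then append the count to each ID label, turn the infection status into a factor (its level order sets the stacking order, so Infected sits next to the axis, on the left after flipping), use forcats::fct_rev() to reverse the ID order (coord_flip() draws the first level at the bottom, so reversing puts A at the top), and use scale_fill_manual() to set the colours and legend labels.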
df.plot %>%
mutate(ID = paste0(ID, " (n=", count, ")")) %>%
select(-count) %>%
melt() %>%
mutate(variable = factor(variable, levels = c("Non_Infected", "Infected"))) %>%
ggplot(aes(x = forcats::fct_rev(ID), y = value, fill = variable)) +
geom_bar(stat = "identity", position = "stack") +
xlab("ID") +
ylab("Percent Infection") +
scale_fill_manual("Infection Status",
values = c("Infected" = "#F8766D", "Non_Infected" = "#00BFC4"),
labels = c("Non-Infected", "Infected"))+
coord_flip()
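If you prefer not to depend on reshape2, the same reshaping can be done with tidyr::pivot_longer(); this is a swap for melt(), not what the answer above uses. The sketch below uses fixed made-up data instead of sample() so the resulting ID order and proportions can be checked.

```r
library(dplyr)
library(tidyr)
library(forcats)
library(ggplot2)

# Hypothetical fixed data, so the output is reproducible
df <- data.frame(ID = c("B", "A", "A", "C", "B", "A"),
                 Infection = c(1, 0, 1, 1, 0, 0))

df.long <- df |>
  group_by(ID) |>
  summarize(Infected = mean(Infection),            # share of rows with Infection == 1
            Non_Infected = 1 - Infected,
            count = n()) |>
  mutate(ID = paste0(ID, " (n=", count, ")")) |>
  select(-count) |>
  pivot_longer(c(Infected, Non_Infected),
               names_to = "variable", values_to = "value") |>
  mutate(variable = factor(variable, levels = c("Non_Infected", "Infected")),
         ID = fct_rev(ID))                         # reversed so A is on top after coord_flip()

p <- ggplot(df.long, aes(x = ID, y = value, fill = variable)) +
  geom_col() +
  scale_fill_manual("Infection Status",
                    values = c(Infected = "#F8766D", Non_Infected = "#00BFC4"),
                    labels = c("Non-Infected", "Infected")) +
  coord_flip()
```

The unnamed labels are matched to the factor levels in order (Non_Infected, then Infected), so they must be kept in that order.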

Back to back bar chart with three levels: Can I center the plot?

I wish to create a back-to-back bar chart. In my data, I have the number of species observations (n) from 2017 and 2018. Some species occurred only in 2017, others in both years, and some only in 2018. I want to show this in a graph centered on the number of species occurring in both years, across multiple sites (a, b, c).
First, I create a data set:
n <- sample(1:50, 9)
reg <- c(rep("2017", 3), rep("Both",3), rep("2018", 3))
plot <- c(rep(c("a", "b", "c"), 3))
d4 <- data.frame(n, reg, plot)
I have tried two ways of plotting it with ggplot2:
library(ggplot2)
ggplot(d4, aes(plot, n, fill = reg)) +
geom_col() +
coord_flip()
ggplot(d4, aes(x = plot, y = n, fill = reg))+
coord_flip()+
geom_bar(stat = "identity", width = 0.75)
I get a plot similar to what I want. However, I would like the blue "Both" bar to sit between the 2017 and 2018 bars. More importantly, I would like the "Both" bar centered in the middle of the plot, with the 2017 bar extending to the left and the 2018 bar to the right. My question is similar to the one linked below, but since I have three levels rather than four, I cannot use the same approach.
Creating a stacked bar chart centered on zero using ggplot
I'm not sure this is the best way, but here is one approach:
library(dplyr)
d4pos <- d4 %>%
filter(reg != 2018) %>%
group_by(reg, plot) %>%
summarise(total = sum(n)) %>%
ungroup() %>%
mutate(total = total * ifelse(reg == "Both", .5, 1))
d4neg <- d4 %>%
filter(reg != 2017) %>%
group_by(reg, plot) %>%
summarise(total = - sum(n)) %>%
ungroup() %>%
mutate(total = total * ifelse(reg == "Both", .5, 1))
ggplot(data = d4pos, aes(x = plot, y = total, fill = reg)) +
geom_bar(stat = "identity") +
geom_bar(data = d4neg, stat = "identity", aes(x = plot, y = total, fill = reg)) +
coord_flip()
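I generate two data frames of totals: one contains 2017 and half of Both, the other contains 2018 and the other half of Both. The 2018 totals are negated so they plot on the negative side. Note that after coord_flip() this puts 2017 on the right and 2018 on the left; to get 2017 on the left, as in the question, swap the two filter() conditions.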
The output looks like this:
EDIT
If you want to have positive values in both directions for the horizontal axis, you can do something like this:
ggplot(data = d4pos, aes(x = plot, y = total, fill = reg)) +
geom_bar(stat = "identity") +
geom_bar(data = d4neg, stat = "identity", aes(x = plot, y = total, fill = reg)) +
scale_y_continuous(labels = abs) + # label negative values as positive, whatever the data range
coord_flip()
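A single-data-frame variant of the same idea, with 2017 on the left as the question asks, is sketched below. The fixed n values are made up for reproducibility. One geom_col() layer is enough because position_stack() stacks positive and negative values separately, and since "Both" sorts after "2017" and "2018" it should sit next to zero on each side.

```r
library(dplyr)
library(ggplot2)

# Hypothetical fixed counts in the same layout as the question's d4
d4 <- data.frame(n = c(10, 20, 30, 8, 6, 4, 12, 14, 16),
                 reg = rep(c("2017", "Both", "2018"), each = 3),
                 plot = rep(c("a", "b", "c"), 3))

# Each "Both" count is split in half, one half per side; the left side is negated
d4_split <- bind_rows(
  d4 |> filter(reg != "2018") |> mutate(side = "left"),
  d4 |> filter(reg != "2017") |> mutate(side = "right")
) |>
  mutate(total = ifelse(reg == "Both", n / 2, n) * ifelse(side == "left", -1, 1))

p <- ggplot(d4_split, aes(x = plot, y = total, fill = reg)) +
  geom_col() +
  scale_y_continuous(labels = abs) +
  coord_flip()
```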

bar chart of row freq ggplot2

I have the following data:
dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = T, sep = ",")
I'm trying to make stacked bar plots of the taxa frequencies, computed across each row (not down a column): one bar per index (s1, s2, s3, s4) and one bar per group (g1, g2), with each taxon as a stacked segment. I can only figure out how to plot one taxon, not all three stacked on top of each other.
Here are some examples of what I'm trying to make:
These were made in Google Sheets, so they don't look like ggplot output, but making them in R with ggplot2 would be easier because the real dataset is larger.
You need to reshape the data first. Here is my solution, broken down by plot:
For first plot
library(tidyverse)
##For first plot
prepare_data_1 <- dataf %>% select(index, taxa1:taxa3) %>%
gather(taxa,value, -index) %>%
mutate(index = str_trim(index)) %>%
group_by(index) %>% mutate(prop = value/sum(value))
##Plot 1
prepare_data_1 %>%
ggplot(aes(x = index, y = prop, fill = fct_rev(taxa))) + geom_col()
For second plot
##For second plot
prepare_data_2 <- dataf %>% select(group, taxa1:taxa3) %>%
gather(taxa,value, -group) %>%
mutate(group = str_trim(group)) %>%
group_by(group) %>% mutate(prop = value/sum(value))
##Plot 2
prepare_data_2 %>%
ggplot(aes(x = group, y = prop, fill = fct_rev(taxa))) + geom_col()
Alternatively, reshape the data with reshape2::melt() before plotting:
library(reshape2)
library(ggplot2)
dfm = melt(dataf, id.vars=c("index","group"),
measure.vars=c("taxa1","taxa2","taxa3"),
variable.name="variable", value.name="values")
ggplot(dfm, aes(x = index, y = values, group = variable)) +
geom_col(aes(fill=variable)) +
geom_text(aes(label = values), position = position_stack(vjust = .5), size = 3) +
theme_gray() + # complete theme first, otherwise it resets the axis text tweak below
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25))
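gather() still works but is superseded in tidyr by pivot_longer(). The sketch below redoes both reshapes with pivot_longer(); unlike prepare_data_2 above, it sums each taxon within a group before computing the share, so the group plot gets exactly one segment per taxon.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = TRUE, sep = ",")

long <- dataf |>
  pivot_longer(taxa1:taxa3, names_to = "taxa", values_to = "value")

# Share of each taxon across the row, per index
by_index <- long |>
  group_by(index) |>
  mutate(prop = value / sum(value)) |>
  ungroup()

# Taxa summed within each group first, then converted to shares
by_group <- long |>
  group_by(group, taxa) |>
  summarise(value = sum(value), .groups = "drop_last") |>  # still grouped by group
  mutate(prop = value / sum(value)) |>
  ungroup()

p1 <- ggplot(by_index, aes(x = index, y = prop, fill = taxa)) + geom_col()
p2 <- ggplot(by_group, aes(x = group, y = prop, fill = taxa)) + geom_col()
```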

ggplot2() bar chart fill argument

I've got a data frame with two categorical variables called verified and procedure.
I'd like to make a bar chart with procedure on the x-axis, and the corresponding percentages rather than counts on the y-axis. Furthermore, I'd like for verified to be the fill of the bars.
The problem is that using the fill argument hasn't worked. My current code gives bars that are all grey with black outlines (even without a fill argument, the black lines seem to separate the levels of verified?). Instead, I'd like the levels in different colours.
Thanks!
starting point (df):
df <- data.frame(verified=c("small","large","small","small","large","small","small","large","small"),procedure=c(1,2,1,2,1,2,2,2,2))
current code:
library(dplyr)
library(ggplot2)
df %>%
count(procedure,verified) %>%
mutate(prop = round((n / sum(n))*100, 2)) %>%
group_by(procedure) %>%
ggplot(aes(x = procedure, y = prop)) +
geom_bar(stat = "identity",colour="black")
Just add fill = verified to your initial aes() or inside geom_bar():
# common elements
g_df <- df %>%
count(procedure, verified) %>%
mutate(prop = round((n / sum(n)) * 100, 2)) %>%
group_by(procedure)
# fill added to initial aes
g1 <- ggplot(g_df, aes(x = procedure, y = prop, fill = verified)) +
geom_bar(stat = "identity", colour = "black")
# fill added to geom_bar
g2 <- ggplot(g_df, aes(x = procedure, y = prop)) +
geom_bar(aes(fill = verified), stat = "identity", colour = "black")
Both g1 and g2 produce the same plot.
As eipi10 suggested in the comments to my answer, you can clean up the x axis by making procedure a factor. Here is a modification of their code:
df %>%
count(procedure, verified) %>%
mutate(prop = n / sum(n)) %>%
ggplot(aes(x = factor(procedure), y = prop, fill = verified)) +
geom_bar(stat = "identity", colour = "black") +
scale_y_continuous(labels = scales::percent) + # prop is a fraction, so label it as a percentage
labs(x = "procedure", y = "percent")
to produce
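Note that prop = n / sum(n) is the share of all rows, so the bars add up to 100% across the whole plot. If instead each procedure's bar should add up to 100% (an interpretation of "percentages", not something the question states), group by procedure before computing the share, as in this sketch:

```r
library(dplyr)
library(ggplot2)

df <- data.frame(verified = c("small", "large", "small", "small", "large",
                              "small", "small", "large", "small"),
                 procedure = c(1, 2, 1, 2, 1, 2, 2, 2, 2))

# Share within each procedure: group BEFORE dividing by sum(n)
by_proc <- df |>
  count(procedure, verified) |>
  group_by(procedure) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

p <- ggplot(by_proc, aes(x = factor(procedure), y = prop, fill = verified)) +
  geom_col(colour = "black") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "procedure", y = "percent")
```

The same per-procedure proportions can also be drawn straight from the raw rows with ggplot(df, aes(x = factor(procedure), fill = verified)) + geom_bar(position = "fill").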
