Adding points to stacked barplot based on conditions in ggplot - r

I have a stacked bar plot which shows the frequency of size distributions of different kinds of buildings across several companies.
I am replicating this chart for each company in the dataset, and for each company I would like to add a dot in the stacks that represent the sizes of that company's buildings. I was able to set up the bar chart and add points to the graph. However, I’m unable to figure out how to place the dots in the correct places.
Here’s my code:
library(ggplot2)
Data <- data.frame(
  Location = c("HQ", "Plant", "Warehouse", "Office", "HQ", "Plant",
               "Warehouse", "Office", "HQ", "Plant", "Warehouse", "Office"),
  Company = c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c"),
  Staff = c("Small", "Medium", "Large", "Medium", "Small", "Medium",
            "Medium", "Large", "Large", "Large", "Small", "Small"))
ggplot(Data, aes(x = Location, fill = Staff)) +
  geom_bar(position = 'fill') +
  geom_point(aes(y = stat(..count../sum(..count..)),
                 color = Staff), stat = 'count',
             position = position_fill(0.5), size = 5) +
  scale_color_manual(values = 'black', limits = "Medium")
Here’s what I have so far.
I would like to figure out how to do this for company "a" so the chart looks something like this:
I think that, to start, I would need to create a vector that holds the staff bins for that company: Point<-as.character(Data$Staff[Data$Company=="a"])

The following should fit your requirements. (I've also wrapped it in a function for convenience in switching between companies.)
library(dplyr)
library(ggplot2)
show.company.point <- function(company.name) {
  p <- ggplot(Data, aes(x = Location, fill = Staff)) +
    geom_bar(position = 'fill') +
    geom_point(
      # count rows per Location/Staff cell and flag the cells
      # that contain the chosen company
      data = . %>%
        mutate(alpha = Company == company.name) %>%
        group_by(Location, Staff) %>%
        summarise(n = n(),
                  alpha = any(alpha)) %>%
        ungroup(),
      aes(y = n, alpha = alpha, group = Staff),
      # vjust = 0.5 centers each point within its stack segment
      position = position_fill(0.5, reverse = FALSE),
      size = 5, show.legend = FALSE) +
    ggtitle(paste("Company", company.name)) +
    scale_alpha_identity()
  return(p)
}
show.company.point("a")
show.company.point("b")
show.company.point("c")
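To see what the point layer receives, the summary step can be reproduced outside the plot. The following is a minimal base R sketch of that step, not the answer's own code: the helper column `is.company` and the object `counts` are names made up for illustration.

```r
# Same toy data as in the question
Data <- data.frame(
  Location = rep(c("HQ", "Plant", "Warehouse", "Office"), times = 3),
  Company  = rep(c("a", "b", "c"), each = 4),
  Staff    = c("Small", "Medium", "Large", "Medium",
               "Small", "Medium", "Medium", "Large",
               "Large", "Large", "Small", "Small")
)

company.name <- "a"

# Hypothetical helper column: TRUE where the row belongs to the chosen company
Data$is.company <- Data$Company == company.name

# One row per Location/Staff cell: how many buildings fall in the cell,
# and whether any of them belongs to the chosen company
n     <- aggregate(is.company ~ Location + Staff, data = Data, FUN = length)
alpha <- aggregate(is.company ~ Location + Staff, data = Data, FUN = any)

counts <- data.frame(n[c("Location", "Staff")],
                     n     = n$is.company,
                     alpha = alpha$is.company)
print(counts)
```

Each row of `counts` corresponds to one segment of a stacked bar. In the plot, `n` gives the segment height before `position_fill()` rescales it, and `alpha` is passed through `scale_alpha_identity()`, so only the cells flagged TRUE should show a visible point.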

Related

How to insert color code for two geom_step functions in the same grid

I am currently working on a comparison between two inventory levels, and I want to plot two step graphs in the same grid with a color code. This is my code.
Intento1<-data.frame(Fecha, NivelI)
Intento2<-data.frame(Fecha, Nivel2)
#Printing the step graphs in one grid
ggplot()+geom_step(Intento1, mapping=aes(x=Fecha, y=NivelI))+geom_step(Intento2, mapping=aes(x=Fecha, y=Nivel2))
It works fine for plotting both graphs in the same grid, and I could also give each graph a different color, but I couldn't add the little colored labels (the legend) that normally appear on the right. All support is appreciated.
For example, given dummy data like this (data.table() comes from the data.table package):
dummy <- data.table(
Fecha = seq(as.Date("2020/1/1"), as.Date("2020/1/31"), "day")
)
dummy$NivelI = runif(31, 0, 10)
dummy$Nivel2 = runif(31, 0, 10)
reshaping to long format with reshape2::melt and then plotting, as below, will work (%>% comes from magrittr or dplyr).
dummy %>%
melt(id.vars = "Fecha") %>%
ggplot(aes(Fecha, value, group = variable, color = variable)) +
geom_step() + guides(color = guide_legend(title = "aaa"))
In your case, if Fecha, NivelI and Nivel2 are vectors, build a data frame shaped like dummy:
df <- data.frame(
Fecha,
NivelI,
Nivel2
)
then
df %>%
melt(id.vars = "Fecha") %>%
ggplot(aes(Fecha, value, group = variable, color = variable)) +
geom_step() + guides(color = guide_legend(title = "aaa"))
where "aaa" will be your legend name.
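If the two series stay in separate data frames as in the question, a legend can also be obtained without reshaping by mapping color to a constant label inside aes(). This is a different technique from the melt approach above, shown only as a minimal sketch; the seed, the random values, and the two color names are illustrative assumptions.

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
Fecha    <- seq(as.Date("2020-01-01"), as.Date("2020-01-31"), by = "day")
Intento1 <- data.frame(Fecha, NivelI = runif(31, 0, 10))
Intento2 <- data.frame(Fecha, Nivel2 = runif(31, 0, 10))

p <- ggplot() +
  # a constant string inside aes() becomes a legend entry
  geom_step(data = Intento1, aes(x = Fecha, y = NivelI, color = "NivelI")) +
  geom_step(data = Intento2, aes(x = Fecha, y = Nivel2, color = "Nivel2")) +
  # assign the actual colors to those labels and name the legend
  scale_color_manual(name = "aaa",
                     values = c(NivelI = "steelblue", Nivel2 = "firebrick")) +
  labs(y = "value")

print(p)
```

The legend appears because color is an aesthetic mapping here; a color set outside aes() is a fixed property of the layer and produces no legend.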

How do you make a line graph with multiple lines from multiple variables in R

I have two dataframes and I want to plot a comparison between them. The plot and dataframes look like so
df2019 <- data.frame(Role = c("A","B","C"),Women_percent = c(65,50,70),Men_percent = c(35,50,30), Women_total =
c(130,100,140), Men_total = c(70,100,60))
df2016 <- data.frame(Role= c("A","B","C"),Women_percent = c(70,45,50),Men_percent = c(30,55,50),Women_total =
c(140,90,100), Men_total = c(60,110,100))
all_melted <- reshape2::melt(
rbind(cbind(df2019, year=2019), cbind(df2016, year=2016)),
id=c("year", "Role"))
There's no reason I need the data in melted form; I only did it because I was plotting bar graphs with it. Now I need a line graph, and I don't know how to make line graphs from melted data, or how to keep the 2019/2016 tag if the data is not melted. When I try to make a line graph, I don't know how to specify which "variable" will be used. I want the lines to show the Women and Men percent values, and the labels to show the totals. (In this picture the geom_text shows the percent values; I want it to use the total values.)
Crucially, I want the linetype to be dotted in 2016 and for the legend to show that.
I think it would be simplest to rbind the two frames after labelling them with their year, then reshape the result so that you have columns for role, year, gender, percent and total.
I would then use a bit of alpha scale trickery to hide the points and labels from 2016:
library(tidyr)    # pivot_longer(), pivot_wider(), and the %>% pipe
library(ggplot2)
df2016$year <- 2016
df2019$year <- 2019
rbind(df2016, df2019) %>%
  pivot_longer(cols = 2:5, names_sep = "_", names_to = c("Gender", "Type")) %>%
  pivot_wider(names_from = Type) %>%
  ggplot(aes(Role, percent, color = Gender,
             linetype = factor(year),
             group = paste(Gender, year))) +
  geom_line(size = 1.3) +
  # alpha is rescaled to 0 for 2016 and 1 for 2019,
  # so only the 2019 points and labels are visible
  geom_point(size = 10, aes(alpha = year)) +
  geom_text(aes(label = total, alpha = year), colour = "black") +
  scale_colour_manual(values = c("#07aaf6", "#ef786f")) +
  scale_alpha(range = c(0, 1), guide = guide_none()) +
  scale_linetype_manual(values = c(2, 1)) +
  labs(y = "Percent", color = "Gender", linetype = "Year")
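It can help to look at the reshaped table on its own before plotting. The sketch below runs only the two tidyr steps on the question's data, without the pipe; the object names `combined`, `long`, and `tidy` are made up for illustration.

```r
library(tidyr)

df2019 <- data.frame(Role = c("A", "B", "C"),
                     Women_percent = c(65, 50, 70), Men_percent = c(35, 50, 30),
                     Women_total = c(130, 100, 140), Men_total = c(70, 100, 60))
df2016 <- data.frame(Role = c("A", "B", "C"),
                     Women_percent = c(70, 45, 50), Men_percent = c(30, 55, 50),
                     Women_total = c(140, 90, 100), Men_total = c(60, 110, 100))

df2016$year <- 2016
df2019$year <- 2019
combined <- rbind(df2016, df2019)

# Split names such as "Women_percent" into Gender = "Women", Type = "percent"
long <- pivot_longer(combined, cols = 2:5,
                     names_sep = "_", names_to = c("Gender", "Type"))

# Spread Type back out so that percent and total become separate columns
tidy <- pivot_wider(long, names_from = Type)
print(tidy)
```

The result has one row per role, year, and gender, with `percent` available for the y axis and `total` available for the text labels.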

Barplot side by side and line charts in the same plot

I want to create in R a plot which contains side by side bars and line charts as follows:
I tried:
Total <- c(584,605,664,711,759,795,863,954,1008,1061,1117,1150)
Infected <- c(366,359,388,402,427,422,462,524,570,560,578,577)
Recovered <- c(212,240,269,301,320,359,385,413,421,483,516,548)
Death <- c(6,6,7,8,12,14,16,17,17,18,23,25)
day <- itemizeDates(startDate="01.04.20", endDate="12.04.20")
df <- data.frame(Day=day, Infected=Infected, Recovered=Recovered, Death=Death, Total=Total)
value_matrix = matrix(, nrow = 2, ncol = 12)
value_matrix[1,] = df$Recovered
value_matrix[2,] = df$Death
plot(c(1:12), df$Total, ylim=c(0,1200), xlim=c(1,12), type = "b", col="peachpuff", xaxt="n", xlab = "", ylab = "")
points(c(1:12), df$Infected, type = "b", col="red")
barplot(value_matrix, beside = TRUE, col = c("green", "black"), width = 0.35, add = TRUE)
But the bar chart does not fit the line chart. I guess it would be easier to use ggplot2, but don't know how. Could anyone help me? Thanks a lot in advance!
With ggplot2, the margins are handled nicely for you, but you'll need the data in two separate long forms. Reshape from wide to long with tidyr::gather, tidyr::pivot_longer, reshape2::melt, reshape, or whatever you prefer.
library(tidyr)
library(ggplot2)
df <- data.frame(
Total = c(584,605,664,711,759,795,863,954,1008,1061,1117,1150),
Infected = c(366,359,388,402,427,422,462,524,570,560,578,577),
Recovered = c(212,240,269,301,320,359,385,413,421,483,516,548),
Death = c(6,6,7,8,12,14,16,17,17,18,23,25),
day = seq(as.Date("2020-04-01"), as.Date("2020-04-12"), by = 'day')
)
ggplot(
tidyr::gather(df, Population, count, Total:Infected),
aes(day, count, color = Population, fill = Population)
) +
geom_line() +
geom_point() +
geom_col(
data = tidyr::gather(df, Population, count, Recovered:Death),
position = 'dodge', show.legend = FALSE
)
Another way to do it is to gather twice before plotting. Not sure if this is easier or harder to understand, but you get the same thing.
df %>%
tidyr::gather(Population, count, Total:Infected) %>%
tidyr::gather(Resolution, count2, Recovered:Death) %>%
ggplot(aes(x = day, y = count, color = Population)) +
geom_line() +
geom_point() +
geom_col(
aes(y = count2, color = Resolution, fill = Resolution),
position = 'dodge', show.legend = FALSE
)
You can actually plot the lines and points without reshaping by making separate calls for each, but to dodge bars (or get legends), you'll definitely need to reshape.
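As a rough sketch of that last point, the lines and points can be drawn straight from the wide data frame with one pair of layers per column. The fixed colors are an arbitrary choice for illustration, and no legend is produced this way.

```r
library(ggplot2)

df <- data.frame(
  Total    = c(584, 605, 664, 711, 759, 795, 863, 954, 1008, 1061, 1117, 1150),
  Infected = c(366, 359, 388, 402, 427, 422, 462, 524, 570, 560, 578, 577),
  day      = seq(as.Date("2020-04-01"), as.Date("2020-04-12"), by = "day")
)

# One geom_line()/geom_point() pair per column; colors are set outside aes(),
# so they are fixed layer properties rather than mapped aesthetics
p <- ggplot(df, aes(x = day)) +
  geom_line(aes(y = Total), color = "peachpuff3") +
  geom_point(aes(y = Total), color = "peachpuff3") +
  geom_line(aes(y = Infected), color = "red") +
  geom_point(aes(y = Infected), color = "red") +
  labs(y = "count")

print(p)
```

This gets repetitive as the number of series grows, which is one reason the long-format approach is usually preferred.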

Back to back bar chart with three levels: Can I center the plot?

I wish to create a back-to-back bar chart. In my data, I have a number of species observations (n) from 2017 and 2018. Some species occurred only in 2017, others occurred in both years, and some occurred only in 2018. I wish to depict this in a graph centered on the number of species occurring in both years, across multiple sites (a, b, c).
First, I create a data set:
n <- sample(1:50, 9)
reg <- c(rep("2017", 3), rep("Both",3), rep("2018", 3))
plot <- c(rep(c("a", "b", "c"), 3))
d4 <- data.frame(n, reg, plot)
I use ggplot to try to plot my graph - I have tried two ways:
library(ggplot2)
ggplot(d4, aes(plot, n, fill = reg)) +
geom_col() +
coord_flip()
ggplot(d4, aes(x = plot, y = n, fill = reg))+
coord_flip()+
geom_bar(stat = "identity", width = 0.75)
I get a plot similar to what I want. However, I would like the blue 'both' bar to sit between the 2017 and 2018 bars. Further, and this is my main problem, I would like to center the 'both' bar in the middle of the plot. The 2017 column should extend to the left and the 2018 column to the right. My question is somewhat similar to the one linked below; however, as I have only three levels rather than four in my graph, I cannot use the same approach.
Creating a stacked bar chart centered on zero using ggplot
I'm not sure this is the best way to do that, but here is a way to do that:
library(dplyr)
# positive side: 2017 plus half of "Both"
d4pos <- d4 %>%
  filter(reg != "2018") %>%
  group_by(reg, plot) %>%
  summarise(total = sum(n)) %>%
  ungroup() %>%
  mutate(total = total * ifelse(reg == "Both", .5, 1))
# negative side: 2018 plus the other half of "Both", with the sign flipped
d4neg <- d4 %>%
  filter(reg != "2017") %>%
  group_by(reg, plot) %>%
  summarise(total = - sum(n)) %>%
  ungroup() %>%
  mutate(total = total * ifelse(reg == "Both", .5, 1))
ggplot(data = d4pos, aes(x = plot, y = total, fill = reg)) +
  geom_bar(stat = "identity") +
  geom_bar(data = d4neg, stat = "identity", aes(x = plot, y = total, fill = reg)) +
  coord_flip()
I generate two data frames for the total of each group. One contains the 2017 and (half of) Both, and the other contains the rest. The value for the 2018 data frame is flipped to plot on the negative side.
The output looks like this:
EDIT
If you want to have positive values in both directions for the horizontal axis, you can do something like this:
ggplot(data = d4pos, aes(x = plot, y = total, fill = reg)) +
geom_bar(stat = "identity") +
geom_bar(data = d4neg, stat = "identity", aes(x = plot, y = total, fill = reg)) +
scale_y_continuous(breaks = seq(-50, 50, by = 25),
labels = abs(seq(-50, 50, by = 25))) +
coord_flip()
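The splitting idea can be checked numerically: for each site, the positive and negative parts together should span exactly the original total, with the "Both" segment straddling zero. A minimal base R sketch, assuming a seed (not in the original post) and using made-up names `half`, `pos`, `neg`, and `span`:

```r
set.seed(42)  # assumption: seed added only so the sketch is reproducible
d4 <- data.frame(
  n    = sample(1:50, 9),
  reg  = rep(c("2017", "Both", "2018"), each = 3),
  plot = rep(c("a", "b", "c"), times = 3)
)

# "Both" contributes half of its count to each side
d4$half <- ifelse(d4$reg == "Both", 0.5, 1)

pos <- subset(d4, reg != "2018")   # 2017 and Both
pos$total <- pos$n * pos$half

neg <- subset(d4, reg != "2017")   # 2018 and Both
neg$total <- -neg$n * neg$half

# Full width of each site's bar, from the negative end to the positive end
span <- tapply(pos$total, pos$plot, sum) - tapply(neg$total, neg$plot, sum)
print(span)
print(tapply(d4$n, d4$plot, sum))  # should match span
```

Because the two halves of "Both" are equal in size and opposite in sign, that segment is centered on zero regardless of how large the 2017 and 2018 counts are.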

ggplot2 multiple time-series plots

I'm just learning ggplot, so my apologies if this is a really basic question. I have data that has been aggregated by year, with a few different qualities to slice on (the code below generates sample data). I'm trying to show a few different charts: one that shows the overall value of a given metric, then a couple that show the same metric split across the qualities, but it's not going right. Ideally, I want to make the plot once, then call the geom layer for each of the individual charts. The code also includes examples of how I want the charts to look.
I'm starting to think this is a data structure issue, but really can't figure it out.
Secondary question - My years are formatted as integers, is that the best way to do that here, or should I convert them to dates?
library(data.table)
library(ggplot2)
#Generate Sample Data - Yearly summarized data
BaseData <- data.table(expand.grid(dataYear = rep(2010:2017),
Program = c("A","B","C"),
Indicator = c("0","1")))
set.seed(123)
BaseData$Metric1 <- runif(nrow(BaseData),min = 10000,100000)
BaseData$Metric2 <- runif(nrow(BaseData),min = 10000,100000)
BaseData$Metric3 <- runif(nrow(BaseData),min = 10000,100000)
BP <- ggplot(BaseData, aes(dataYear,Metric1))
BP + geom_area() #overall Aggregate
BP + geom_area(position = "stack", aes(fill = Program)) #Stacked by Program
BP + geom_area(position = "stack", aes(fill = Indicator)) #stacked by Indicator
#How I want them to look
##overall Aggregate
BP.Agg <- BaseData[,.(Metric1 = sum(Metric1)),
by = dataYear]
ggplot(BP.Agg,aes(dataYear, Metric1))+geom_area()
##Stacked by Program
BP.Pro <- BaseData[,.(Metric1 = sum(Metric1)),
by = .(dataYear,
Program)]
ggplot(BP.Pro,aes(dataYear, Metric1, fill = Program))+geom_area(position = "stack")
##stacked by Indicator
BP.Ind <- BaseData[,.(Metric1 = sum(Metric1)),
by = .(dataYear,
Indicator)]
ggplot(BP.Ind,aes(dataYear, Metric1, fill = Indicator))+geom_area(position = "stack")
I was right: it was an easy fix. I should have used stat_summary instead of geom_area. Here are the correct layers to add:
# ggplot2 >= 3.3.0 uses fun; older versions call the same argument fun.y
BP + stat_summary(fun = sum, geom = "area")
BP + stat_summary(fun = sum, geom = "area", position = "stack", aes(fill = Program, group = Program))
BP + stat_summary(fun = sum, geom = "area", position = "stack", aes(fill = Indicator, group = Indicator))
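To check that the summary layer draws the same values as the hand-aggregated tables from the question, the computed layer data can be compared with a manual sum per year. This is a sketch under assumptions: it uses a plain data.frame instead of data.table, only Metric1, and the `fun` argument from ggplot2 3.3.0 or later; `drawn` and `by.hand` are illustrative names.

```r
library(ggplot2)

set.seed(123)
BaseData <- expand.grid(dataYear  = 2010:2017,
                        Program   = c("A", "B", "C"),
                        Indicator = c("0", "1"))
BaseData$Metric1 <- runif(nrow(BaseData), min = 10000, max = 100000)

# Overall aggregate: stat_summary() sums Metric1 within each year
p <- ggplot(BaseData, aes(dataYear, Metric1)) +
  stat_summary(fun = sum, geom = "area")

# Values ggplot2 computes for the first (and only) layer
drawn <- layer_data(p)

# The same aggregation done by hand
by.hand <- aggregate(Metric1 ~ dataYear, data = BaseData, FUN = sum)

print(cbind(by.hand, drawn_y = drawn$y[order(drawn$x)]))
```

The plain geom_area() call in the question instead draws every raw row, which is why the unaggregated charts looked wrong.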
