Ordering of items within a stacked geom_bar - r

I want, for reasons which seem good to me, to plot a stacked bar chart with the bars in a specific, data-dependent order. For reasons which are obscure to me, it does not seem to work. Specifically, I can readily arrange the rows of my dataframe in the right order and make the column of names identifying the bars an ordered factor, so the bars appear in the order I desire. However, the graph does not list the columns of the dataframe (the stacked segments) in the order I desire.
An example
tab <- structure(list(Item = c("Personal", "Peripheral", "Communication", "Multimedia", "Office", "Social Media"), `Not at all` = c(3.205128, 18.709677, 5.844156, 31.578947, 20.666667, 25.827815), Somewhat = c(30.76923, 23.87097, 24.67532, 18.42105, 30, 16.55629), `Don't know` = c(0.6410256, 2.5806452, 1.9480519, 11.1842105, 2.6666667, 5.9602649), Confident = c(32.69231, 29.67742, 33.11688, 17.10526, 23.33333, 27.15232), `Very confident` = c(32.69231, 25.16129, 34.41558, 21.71053, 23.33333, 24.50331)), .Names = c("Item", "Not at all", "Somewhat", "Don't know", "Confident", "Very confident"), row.names = c(NA, -6L), class = "data.frame")
Title <- 'Plot title'
ResponseLevels <- c("Not at all", "Somewhat", "Don't know", "Confident", "Very confident") # Labels for bars
pal.1 <- brewer.pal(length(ResponseLevels), 'BrBG') # Colours, one per response level
tab <- tab %>% arrange(.[,2]) # Sort by first columns of responses
tab$Item <- factor(tab$Item, levels = tab$Item[order(tab[,2])], ordered = TRUE) # Reorder factor levels
tab.m <- melt(tab, id = 'Item')
tab.m$col <- rep(pal.1, each = nrow(tab)) # Set colours, one block of rows per response level
g <- ggplot(data = tab.m, aes(x = Item, y = value, fill = col)) +
geom_bar(position = "stack", stat = "identity", aes(group = variable)) +
coord_flip() +
scale_fill_identity("Percent", labels = ResponseLevels,
breaks = pal.1, guide = "legend") +
labs(title = Title, y = "", x = "") +
theme(plot.title = element_text(size = 14, hjust = 0.5)) +
theme(axis.text.y = element_text(size = 16,hjust = 0)) +
theme(legend.position = "bottom")
g
The stacked pieces of the bars run from right to left, from 'Not at all' to 'Very confident'. The items are in the correct order, from 'Multimedia' to 'Personal', ordered by the proportion of those who said 'Not at all' to each item.
What I want to get is this graph with the responses ordered the other way, the same way as the legend, that is from 'Not at all' on the left, to 'Very confident' on the right. I cannot figure out how this ordering is set, nor how to change it.
I've read through the 'similar questions', but can see no answer to this specific query. Suggestions, using ggplot, not base R graphics, welcome.
OK, building on the useful and much appreciated answer from allstaire, I tried the following:
library(tidyverse)
tab <- structure(list(Item = c("Personal", "Peripheral", "Communication", "Multimedia", "Office", "Social Media"), `Not at all` = c(3.205128, 18.709677, 5.844156, 31.578947, 20.666667, 25.827815), Somewhat = c(30.76923, 23.87097, 24.67532, 18.42105, 30, 16.55629), `Don't know` = c(0.6410256, 2.5806452, 1.9480519, 11.1842105, 2.6666667, 5.9602649), Confident = c(32.69231, 29.67742, 33.11688, 17.10526, 23.33333, 27.15232), `Very confident` = c(32.69231, 25.16129, 34.41558, 21.71053, 23.33333, 24.50331)), .Names = c("Item", "Not at all", "Somewhat", "Don't know", "Confident", "Very confident"), row.names = c(NA, -6L), class = "data.frame")
tab <- tab %>% select(1,6,5,4,3,2,1) ## Re-order the columns of tab
tab.m <- tab %>% arrange(`Not at all`) %>%
mutate(Item = factor(Item, levels = Item[order(`Not at all`)])) %>%
gather(variable, value, -Item, factor_key = TRUE)
ggplot(data = tab.m, aes(x = Item, y = value, fill = variable)) +
geom_col() +
coord_flip() +
scale_fill_brewer("Percent", type = 'cat', palette = 'BrBG',
guide = guide_legend(reverse = TRUE)) +
labs(title = 'Plot title', y = NULL, x = NULL) +
theme(legend.position = "bottom")
And this is exactly the graph I want, so my pressing problem is solved.
However, if I say instead
ggplot(data = tab.m, aes(x = Item, y = value, fill = variable)) +
geom_col() +
coord_flip() +
scale_fill_brewer("Percent", type = 'cat', palette = 'BrBG',
guide = guide_legend(reverse = FALSE)) +
labs(title = 'Plot title', y = NULL, x = NULL) +
theme(legend.position = "bottom")
The picture I get is this
Here the body of the chart is correct, but the legend is going in the wrong direction.
This solves my problem, but does not quite answer my question. I start with a dataframe, and to get what I want I have to reverse the order of the data columns, and reverse the guide legend. This evidently works, but it's perverse.
So, how does a stacked bar chart decide in what order to present the stacked items? It's clearly related to their order in the melted dataset, but simply changing the order leaves the legend going in the wrong direction. Looking at the melted dataset, tab.m, from top to bottom, the responses are in the order 'Very confident' to 'Not at all', but the default legend is the reverse order 'Not at all' to 'Very confident'.

If you pass guide_legend instead of just a string, you can set its reverse parameter to TRUE. Simplifying a bit,
library(tidyverse)
tab <- structure(list(Item = c("Personal", "Peripheral", "Communication", "Multimedia", "Office", "Social Media"), `Not at all` = c(3.205128, 18.709677, 5.844156, 31.578947, 20.666667, 25.827815), Somewhat = c(30.76923, 23.87097, 24.67532, 18.42105, 30, 16.55629), `Don't know` = c(0.6410256, 2.5806452, 1.9480519, 11.1842105, 2.6666667, 5.9602649), Confident = c(32.69231, 29.67742, 33.11688, 17.10526, 23.33333, 27.15232), `Very confident` = c(32.69231, 25.16129, 34.41558, 21.71053, 23.33333, 24.50331)), .Names = c("Item", "Not at all", "Somewhat", "Don't know", "Confident", "Very confident"), row.names = c(NA, -6L), class = "data.frame")
tab.m <- tab %>% arrange(`Not at all`) %>%
mutate(Item = factor(Item, levels = Item[order(`Not at all`)])) %>%
gather(variable, value, -Item, factor_key = TRUE)
ggplot(data = tab.m, aes(x = Item, y = value, fill = variable)) +
geom_col() +
coord_flip() +
scale_fill_brewer("Percent", palette = 'BrBG',
guide = guide_legend(reverse = TRUE)) +
labs(title = 'Plot title', y = NULL, x = NULL) +
theme(legend.position = "bottom")
For the edit:
Bar order is determined by factor level order, which in the above is determined by column order because gather is used to create the factor, though coord_flip makes this less obvious. It's easy to reverse the level order with levels<- or by reassembling the factor, though. To keep the colors with the same levels, pass direction = -1 to scale_fill_brewer to reverse their order as well.
tab.m <- tab %>% arrange(`Not at all`) %>%
mutate(Item = factor(Item, levels = Item[order(`Not at all`)])) %>%
gather(variable, value, -Item, factor_key = TRUE) %>%
mutate(variable = factor(variable, levels = rev(levels(variable)), ordered = TRUE))
ggplot(data = tab.m, aes(x = Item, y = value, fill = variable)) +
geom_col() +
coord_flip() +
scale_fill_brewer("Percent", palette = 'BrBG', direction = -1,
guide = guide_legend(reverse = TRUE)) +
labs(title = 'Plot title', y = NULL, x = NULL) +
theme(legend.position = "bottom")
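As a possible alternative to reversing the factor levels, position_stack() has a reverse argument that flips the stacking order within each bar. The sketch below uses a small made-up data frame (toy, not the asker's data) to show the idea; whether it fits the original plot is an assumption.

```r
library(ggplot2)

# Made-up data: two items, three response levels in their natural order
toy <- data.frame(
  Item     = rep(c("A", "B"), each = 3),
  variable = factor(rep(c("Low", "Mid", "High"), times = 2),
                    levels = c("Low", "Mid", "High")),
  value    = c(20, 30, 50, 40, 40, 20)
)

# reverse = TRUE stacks the first factor level nearest the axis origin,
# so after coord_flip() "Low" sits on the left, like the legend
p <- ggplot(toy, aes(x = Item, y = value, fill = variable)) +
  geom_col(position = position_stack(reverse = TRUE)) +
  coord_flip() +
  theme(legend.position = "bottom")
p
```

With this approach neither the column order nor the legend needs to be reversed by hand.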

Related

Changing boxplot width (measuring multiple categorical variables) for categorical conditions with missing data

As a preliminary disclaimer, I am still very new to R (this is the first analysis I've performed independently), and am hoping this is a reproducible example.
I have a dataset measuring the d.13.C and d.18.O values of various enamel samples through time and space. I want to represent trends within Families across space and time. I have a boxplot I generated in ggplot2 that does this, but I'm running into a few problems:
d %>%
mutate(across(Member, factor, levels = c("UpperBurgi", "KBS", "Okote"))) %>%
mutate(across(Dep_context, factor, levels = c("Lacustrine", "Deltaic", "Fluvial "))) %>%
ggplot(aes(x = Member, y = d.13.C)) +
geom_boxplot(aes(x = Member, y = d.13.C, col = Dep_context, fill = Dep_context), alpha = 0.5, lwd = 1) +
facet_wrap(~Family) +
scale_fill_brewer(palette = "Dark2") +
scale_color_brewer(palette = "Dark2") +
theme_bw()
It produces something like this:
Since my data is not evenly distributed (not every depositional context is represented in each geologic member in each family), the boxplots for each depositional environment are different. I would like them to all be the same width, regardless of if the data is present or not (e.g., equivalent to the size of the ones in Bovidae in the KBS Member).
I've tried messing around with width = in the geom_boxplot call, I've tried using theme() to change aspects of the grid, and I've tried the drop = FALSE call, but that didn't change anything. I've also tried faceting my member and depositional environment, but that did not look as appealing and seemed clunkier. Is there a way to accomplish this, or is faceting the way to go?
I provided my dataframe below. *note: it's a subset since otherwise, the output was too long.
dput(head(d))
structure(list(CA = c("6", "1", "104", "105", "6A", "6A"), Member = c("KBS",
"Okote", "KBS", "KBS", "KBS", "KBS"), Dep_context = c("Deltaic",
"Fluvial ", "Fluvial ", "Fluvial ", "Deltaic", "Deltaic"), Family = c("Equidae",
"Equidae", "Equidae", "Equidae", "Equidae", "Equidae"), Tribe = c("",
"", "", "", "", ""), Genus = c("Equus", "Equus", "Equus", "Equus",
"Equus", "Equus"), d.13.C = c(-0.3, -0.7, 0.7, -0.9, -0.1, -0.8
), d.18.O = c(0, 1.6, 4, 2.6, 1.8, 0.2), Age.range = c("1.87-1.56",
"1.56-1.38", "1.87-1.56", "1.87-1.56", "1.87-1.56", "1.87-1.56"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
You could use position_dodge2 with preserve = "single" to keep the boxplot width the same across different groups like this:
library(ggplot2)
library(dplyr)
d %>%
mutate(across(Member, factor, levels = c("UpperBurgi", "KBS", "Okote"))) %>%
mutate(across(Dep_context, factor, levels = c("Lacustrine", "Deltaic", "Fluvial "))) %>%
ggplot(aes(x = Member, y = d.13.C)) +
geom_boxplot(aes(x = Member, y = d.13.C, col = Dep_context, fill = Dep_context), alpha = 0.5, lwd = 1,
position = position_dodge2(preserve = "single")) +
facet_wrap(~Family) +
scale_fill_brewer(palette = "Dark2") +
scale_color_brewer(palette = "Dark2") +
theme_bw()
Created on 2023-02-08 with reprex v2.0.2
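Because the dput above is only a six-row subset, a self-contained sketch with made-up data (toy is hypothetical) may make the effect easier to check. It inspects the box extents that ggplot2 computes, which should be equal when preserve = "single" is used.

```r
library(ggplot2)

# Made-up data: "KBS" has two depositional contexts, "Okote" only one
set.seed(1)
toy <- data.frame(
  Member      = rep(c("KBS", "KBS", "Okote"), each = 10),
  Dep_context = rep(c("Deltaic", "Fluvial", "Fluvial"), each = 10),
  d.13.C      = rnorm(30)
)

p <- ggplot(toy, aes(x = Member, y = d.13.C, fill = Dep_context)) +
  geom_boxplot(position = position_dodge2(preserve = "single"))

# One row per box; compare the widths after the position adjustment
built <- layer_data(p, 1)
built$xmax - built$xmin
```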

How to adjust barchart in ggplotly?

I have a bar chart with a dual axis to visualize 3 numerical variables. All of this works nicely in ggplot. However, when I convert the ggplot to ggplotly, there are issues:
in the legend, there are strange 1s (highlighted in yellow)
in the hover, there are double values (highlighted in yellow)
changes of hjust=0, vjust=-0.5 in geom_text are not reflected on the plot
Could anybody help me fix these issues?
df <- data.frame (model = c("A", "B", "C","D","E","F"),
share = c(12,20,15,9,60,20),
sale = c(16,25,18,14,67,28),
cost = c(14,19,28,24,57,28))
library(tidyverse) # load before arrange()/desc() are used below
library(plotly)    # provides ggplotly()
#set levels of model by cost
df$model <- factor(df$model, levels = arrange(df, desc(cost))$model)
df_long <- df %>%
pivot_longer(
cols = -model
)
plt <- ggplot(df_long, aes(x = model, y= value, label=value))+
geom_col(data = filter(df_long, name != "cost"), aes(fill=name), position = position_dodge())+
scale_fill_manual(values = c("blue", "grey"))+
geom_line(data = filter(df_long, name == "cost"), aes(color = name, group = 1), size = 1)+
scale_color_manual(values = "red")+
geom_text(data = filter(df_long, name == "cost"), size = 3,hjust=0, vjust=-0.5)+
geom_label(data = filter(df_long, name == "cost"), hjust=0, vjust=-0.5)+
scale_y_continuous(
name = "Sale and Share",
sec.axis = sec_axis(~., name ="Cost")
)+
theme_minimal()+
theme(legend.title=element_blank())
ggplotly(plt)
Building further on @Quinten's answer: to handle the duplicated 'name name' and 'value value' entries, try
tooltip = c("value", "name", "model")
plt1$x$layout$legend$title$text <- "name"
Full code is
plt <-
ggplot(df_long, aes(x = model, y= value, label = NA))+
geom_col(data = filter(df_long, name != "cost"), aes(fill=name), position = position_dodge())+
scale_fill_manual(values = c("blue", "grey"))+
geom_line(data = filter(df_long, name == "cost"), aes( group = 1, color = name), size = 1)+
scale_color_manual(values = "red")+
#geom_text(data = filter(df_long, name == "cost"), size = 3,hjust=0, vjust=-0.5)+
geom_label(data = filter(df_long, name == "cost"), hjust=0, vjust=-0.5)+
scale_y_continuous(
name = "Sale and Share",
sec.axis = sec_axis(~., name ="Cost")
)+
theme_minimal()
plt1 <- ggplotly(plt, tooltip = c("value", "name", "model"))
for (i in 1:length(plt1$x$data)){
if (!is.null(plt1$x$data[[i]]$name)){
plt1$x$data[[i]]$name = gsub("\\(","",str_split(plt1$x$data[[i]]$name,",")[[1]][1])
}
}
plt1$x$layout$legend$title$text <- "name"
It looks like you've got some great information so far. This addresses all of the things you identified. Although, at this point, it would be a LOT easier to just make the plot in Plotly!
The first thing I did is comment out the call for geom_text and geom_label. Plotly doesn't tend to play well here. It is going back into the plot, but not here.
Next, I built your plot and looked at the names and legend groups that were assigned by the conversion. This doesn't change anything—this is just looking.
plt2 <- plotly_build(plt)
invisible(
lapply(1:length(plt2$x$data),
function(j) {
message(j, " ", plt2$x$data[[j]]$name, " & ",
plt2$x$data[[j]]$legendgroup)
})
)
# 1 (sale,1) & (sale,1)
# 2 (share,1) & (share,1)
# 3 (cost,1) & (cost,1)
@Quinten addressed this issue, but this is how you can just look. Once I saw what Plotly "made", I knew what I needed to change.
This code changes these strings. It also prints the update to the console so that you can inspect what you expect.
invisible(
lapply(1:length(plt2$x$data),
function(j) {
x <- plt2$x$data[[j]]$name # find the name
y <- str_extract(x, "[a-z]+") # remove anything that's not a letter
plt2$x$data[[j]]$name <<- y # put it back
plt2$x$data[[j]]$legendgroup <<- y
message(j, " ", plt2$x$data[[j]]$name, " & ",
plt2$x$data[[j]]$legendgroup)
})
)
# 1 sale & sale
# 2 share & share
# 3 cost & cost
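The pattern "[a-z]+" works for these three names but would truncate a name containing capitals, digits or spaces. A minimal base R sketch of a more general cleanup is shown here; clean_trace_name is a hypothetical helper, and it assumes the converted names always have the "(name,1)" form shown above.

```r
# Hypothetical helper: turn "(sale,1)" into "sale"; other strings pass through unchanged
clean_trace_name <- function(x) {
  sub("^\\(([^,]*),.*\\)$", "\\1", x)
}

clean_trace_name(c("(sale,1)", "(share,1)", "(cost,1)"))
```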
You can use this sort of look/change/check to validate the information that ends up in the tooltips, as well. Instead of $name or $legendgroup, you'll look at $text.
This next chunk of code doesn't check the input and print it out (I figured that would be redundant). This just changes it. (I did use that process to build this though.)
tx = " "
invisible(
lapply(1:length(plt2$x$data),
function(k){
tx <<- plt2$x$data[[k]]$text # tooltip for each trace
lapply(1:length(tx),
function(m) {
tr <- strsplit(tx[[m]], "<br />") # tooltip for each point
tr2 <- unique(tr[[1]]) # remove redundancy
str <- paste0(tr2, collapse = "<br />")
tx[[m]] <<- str # put it back together
})
plt2$x$data[[k]]$text <<- tx # change the plot
})
)
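The core of the chunk above is the split / unique / paste step applied to each tooltip string. As a small illustration, here it is in isolation; dedupe_tooltip is a hypothetical helper and the input string is made up to resemble a doubled tooltip.

```r
# Hypothetical helper: drop repeated "<br />"-separated lines from tooltip strings
dedupe_tooltip <- function(txt, sep = "<br />") {
  vapply(strsplit(txt, sep, fixed = TRUE),
         function(parts) paste(unique(parts), collapse = sep),
         character(1))
}

dedupe_tooltip("model: A<br />value: 14<br />value: 14<br />name: cost<br />name: cost")
```

Because the helper is vectorised, it could be applied to a whole trace's $text vector in one call.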
Now on to the labels: if you want a background or border, you have to use annotations in Plotly. Like annotations in the ggplot package, annotations in Plotly have fewer 'rules' per se.
You have an odd order for the model, so that has to be addressed as well. When data moves between ggplot and Plotly, things tend to go awry, so it's unlikely that you'd be able to connect to the original data.
One thing to keep in mind, I used paper space for the x-axis. The default paper space (domain) in Plotly is [0,1]. Your graph is evenly spaced along the x, with your values in the middle of each of the six categories, so everything on the x is in terms of 1/6th space.
So first, put the data in order as it needs to appear in the plot. Then add the annotations (labels) to the plot. I've also removed the name of the legend here.
# to add labels, we need to have the order the data appears on the plot
df2 <- df_long %>%
arrange(desc(value)) %>%
filter(name == "cost")
plt2 %>%
layout(legend = list(title = "")) %>% # remove legend name
add_annotations(x = c(1/12, 1/6 + 1/12, 1/3 + 1/12, # using domain for x-axis
1/2 + 1/12, 2/3 + 1/12, 5/6 + 1/12),
y = df2$value,
text = df2$value,
xshift = 20, # shift right 20 px
yshift = 15, # shift up 15 px
hoverinfo = "skip",
bgcolor = "white",
bordercolor = "black",
xref = "paper", yref = "y", # cat x, use domain for annot x
showarrow = F)
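The hand-typed x positions above are the centres of six equal slices of the [0, 1] paper domain. Assuming the categories really are evenly spaced across the full domain, as described, the same values can be computed for any number of categories:

```r
n <- 6                               # number of categories on the x-axis
x_paper <- (seq_len(n) - 0.5) / n    # centre of each category in paper coordinates
x_paper
```

The resulting vector could be passed as x in add_annotations() in place of the literal fractions.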
After all of that, here's your plot.
This is straight plotly. I think the labels would look a bit better with padding (which can be added).
df_long %>%
filter(name != "cost") %>%
plot_ly(x = ~model, y = ~value, color = ~name, type = "bar",
customdata = ~name, colors = c("blue", "gray"),
hovertemplate = paste0("Model: %{x}<br>Value: %{y}<br>",
"Name: %{customdata}<extra></extra>")) %>%
add_lines(inherit = F, data = df, x = ~model,
y = ~cost, color = I("red"),
name = "cost",
hovertemplate = paste0("Model: %{x}<br>Value: %{y}<br>",
"Name: cost<extra></extra>")) %>%
add_annotations(data = df, x = ~model, y = ~cost, text = ~cost,
bgcolor = "white", bordercolor = "black",
xshift = 15, yshift = 15, showarrow = F) %>%
layout(barmode = "group")
Pretty much the same as the converted plot.
Legend issue:
Using the code in this post: "Strange formatting of legend in ggplotly in R". You can change the legend in ggplotly like this:
library(plotly)
myplot = ggplotly(plt)
for (i in 1:length(myplot$x$data)){
if (!is.null(myplot$x$data[[i]]$name)){
myplot$x$data[[i]]$name = gsub("\\(","",str_split(myplot$x$data[[i]]$name,",")[[1]][1])
}
}
myplot

Ggplot - always place 'Total' bar as the farthest right bar using geom_col

I am creating a chart that looks like the below. Problem is that I'd like the grey 'total' bar to always be on the far right hand side.
Current code is below, can anyone please amend/provide any additional code to create this effect?
#plot with reorder
PrevalencePlot <- ggplot(ICSTable4, aes(x = reorder(value, Area), y = value, fill = Statistical_Significance)) +
geom_col() +
scale_fill_manual(values = colours)+
geom_errorbar(aes(ymin=errorbarlowerplot, ymax=errorbarhigherplot),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
theme_bw() +
geom_text(aes(label = valuelabel), vjust = 2.5, colour = "black")+
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))
If anyone is able to help then the below data frame could be used to generate the principle I think? Thank you!
df <- data.frame(Area = c("Area1", "Area2", "Area3", "Area4", "Total"),
Value = c(1, 3, 7, 5, 4)
)
Building on the minimal example data, we can make a spartan version of the plot that addresses the question of ordering the values and placing a selected column at the end.
df <- data.frame(Area = c("Area1", "Area2", "Area3", "Area4", "Total"),
value = c(1, 3, 7, 5, 4),
Statistical_Significance = c("higher", "lower", "lower", "higher", NA))
It's easier to create the order of the columns before plotting, as we need to create the factors based on the order of value and then reposition the target column ("Total").
library(dplyr)   # for %>%
library(ggplot2)
df <- df %>%
dplyr::arrange(desc(value)) %>% #arrange by value
dplyr::mutate(Area = forcats::as_factor(Area)) %>% # factor that defines order on x-axis
dplyr::mutate(Area = forcats::fct_relevel(Area, "Total", after = Inf)) # reposition "Total" column
ggplot(df, aes(x = Area, y = value, fill = Statistical_Significance)) +
geom_col() +
theme_bw()
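If forcats is not available, the same level order can be built in base R. This is only a sketch of the idea using the example data from the question: sort the areas by value, then move "Total" to the end of the level vector.

```r
df <- data.frame(Area  = c("Area1", "Area2", "Area3", "Area4", "Total"),
                 value = c(1, 3, 7, 5, 4))

ord <- df$Area[order(-df$value)]            # areas by descending value
lev <- c(setdiff(ord, "Total"), "Total")    # move "Total" to the last position
df$Area <- factor(df$Area, levels = lev)

levels(df$Area)
```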

How to Clean up Donut Chart?

I need to clean up the donut chart below, by getting rid of the double data labels, and rounding the percents. I would like the final graph to have one data label with a % and no decimal place. Anyone know how I accomplish this?
df <- data.frame(
group = c("DoDEA", "ERAC", "Evaluator", "Project Director", "Project Staff - Finance",
"Project Staff - Other"),
value = c(3,7, 10, 43, 12, 25))
head(df)
ggdonutchart(df, label = "group")
ggdonutchart(df, "value",
color = "white",fill = "group",
palette = c("yellow", "steelblue3","slategray3", "dodgerblue4",
"yellowgreen", "sienna2") ) +
labs(fill = "") + # this deletes "group" from the legend
geom_text(aes(label = scales::percent(value/sum(value))),
position = position_stack(vjust = 0.5))
You can add a new column to your dataframe to create the labels. This code will put the % on a new line. Then add these labels in the ggdonutchart function using the label argument. The function then has additional arguments to adjust the label position.
library(ggpubr) # provides ggdonutchart()
df <- data.frame(
group = c("DoDEA", "ERAC", "Evaluator", "Project Director", "Project Staff - Finance",
"Project Staff - Other"),
value = c(3,7, 10, 43, 12, 25))
# create labels
df$label <- paste(df$group,
scales::percent(df$value/sum(df$value), accuracy = 1), sep = "\n")
ggdonutchart(df, "value",
label = "label", # add labels
color = "white",fill = "group",
palette = c("yellow", "steelblue3","slategray3", "dodgerblue4",
"yellowgreen", "sienna2")) +
theme(legend.position = "none")
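To see what the rounded labels look like before plotting, the percentages can be computed on their own. This sketch uses base R instead of scales::percent, which is a swapped-in technique rather than what the answer uses; for these values the output is the same.

```r
value <- c(3, 7, 10, 43, 12, 25)

# Whole-number percentages with a trailing "%"
pct <- paste0(round(100 * value / sum(value)), "%")
pct
```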

geom_text in the right of a heatmap in ggplot

I created a heatmap with this dataframe:
datos<- data.frame(
stringsAsFactors = FALSE,
country_name = c("Argentina","Bolivia",
"Brazil","Chile","Colombia","Paraguay","Peru","Uruguay",
"Argentina","Bolivia","Brazil","Chile","Colombia",
"Paraguay","Peru","Uruguay","Argentina","Bolivia",
"Brazil","Chile"),
year = c("1961","1961","1961","1961",
"1961","1961","1961","1961","1962","1962","1962",
"1962","1962","1962","1962","1962","1963","1963",
"1963","1963"),
crec = c(1,1,1,1,1,1,1,1,0,1,1,
1,1,1,1,0,0,1,1,1)
)
colors<-c("red","blue")
chart<- ggplot(datos,aes(x=year,y=country_name,fill=factor(crec))) +
geom_tile(color=gris,size=0.01)+
scale_fill_manual(values=colors)+
scale_y_discrete(limits = crisis$country_name)+
guides(fill=FALSE)
I would like to add a geom_text to the right of the last year for each country, so I can show how many red squares each country has. I think geom_text would be good, but I am not sure how to create one for each country.
text<- data.frame(
stringsAsFactors = FALSE,
country_name = c("Colombia","Bolivia","Chile",
"Peru","Brazil","Paraguay","Uruguay","Argentina"),
label = c("0 years","0 years","0 years",
"0 years","0 years","0 years","1 years","2 years")
)
library(dplyr)
# get the maximum year per country
text = text %>%
left_join(
datos %>% group_by(country_name) %>%
summarize(year = max(year))
)
chart +
geom_text(
data = text,
aes(label = label, x = year, y = country_name),
# left justified, but nudged to the right
hjust = 0, nudge_x = 0.55,
inherit.aes = FALSE
) +
# give a little extra room for the text
scale_x_discrete(expand = expansion(mult = 0, add = c(0, 1)))
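The label data frame in the question appears to have been typed by hand. A possible way to derive it from the data is sketched below with a small made-up subset; it assumes the red squares correspond to crec == 0, which follows from the colour order c("red", "blue") applied to factor(crec).

```r
# Made-up subset with the same columns as datos
datos_small <- data.frame(
  country_name = c("Argentina", "Uruguay", "Argentina", "Uruguay", "Argentina"),
  year         = c("1961", "1961", "1962", "1962", "1963"),
  crec         = c(1, 1, 0, 0, 0)
)

# Count the crec == 0 years per country and build the label text
counts <- aggregate(crec ~ country_name, data = datos_small,
                    FUN = function(x) sum(x == 0))
counts$label <- paste(counts$crec, "years")
counts[, c("country_name", "label")]
```

The result has the same two columns as the hand-written text data frame, so it could be joined to the maximum year per country in the same way.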
