dodge columns in ggplot2 - r

I am trying to create a picture that summarises my data. Data is about prevalence of drug use obtained from different practices form different countries. Each practice has contributed with a different amount of data and I want to show all of this in my picture.
Here is a subset of the data to work on:
gr<-data.frame(matrix(0,36))
gr$drug<-c("a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b")
gr$practice<-c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r")
gr$country<-c("c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3","c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3")
gr$prevalence<-c(9.14,5.53,16.74,1.93,8.51,14.96,18.90,11.18,15.00,20.10,24.56,22.29,19.41,20.25,25.01,25.87,29.33,20.76,18.94,24.60,26.51,13.37,23.84,21.82,23.69,20.56,30.53,16.66,28.71,23.83,21.16,24.66,26.42,27.38,32.46,25.34)
gr$prop<-c(0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406,0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406)
gr$low.CI<-c(8.27,4.80,12.35,1.83,7.22,14.53,18.25,10.56,14.28,18.76,24.25,21.72,18.62,19.83,24.36,25.22,28.80,20.20,17.73,23.15,21.06,13.12,21.79,21.32,22.99,19.76,29.60,15.41,28.39,23.25,20.34,24.20,25.76,26.72,31.92,24.73)
gr$high.CI<-c(10.10,6.37,22.31,2.04,10.00,15.40,19.56,11.83,15.74,21.52,24.87,22.86,20.23,20.68,25.67,26.53,29.86,21.34,20.21,26.10,32.79,13.63,26.02,22.33,24.41,21.39,31.48,17.98,29.04,24.43,22.01,25.12,27.09,28.05,33.01,25.95)
The code I wrote is this
p<-ggplot(data=gr, aes(x=factor(drug), y=as.numeric(gr$prevalence), ymax=max(high.CI),position="dodge",fill=practice,width=prop))
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
p + theme_bw()+
geom_bar(stat="identity",position = position_dodge(0.9)) +
labs(x="Drug",y="Prevalence") +
geom_errorbar(ymax=gr$high.CI,ymin=gr$low.CI,position=position_dodge(0.9),width=0.25,size=0.25,colour="black",aes(x=factor(drug), y=as.numeric(gr$prevalence), fill=practice)) +
ggtitle("Drug usage by country and practice") +
scale_fill_manual(values = colour)+ guides(fill=F)
The figure I obtain is this one where bars are all on top of each other while I want them "dodge".
I also obtain the following warning:
ymax not defined: adjusting position using y instead
Warning message:
position_dodge requires non-overlapping x intervals
Ideally I would get each bar near one another, with their error bars in the middle of its bar, all organised by country.
Also should I be concerned about the warning (which I clearly do not fully understand)?
I hope this makes sense. I hope I am close enough, but I don't seem to be going anywhere, some help would be greatly appreciated.
Thank you

ggplot's geom_bar() accepts the width parameter, but doesn't line them up neatly against one another in dodged position by default. The following workaround references the solution here:
library(dplyr)
# calculate x-axis position for bars of varying width
gr <- gr %>%
group_by(drug) %>%
arrange(practice) %>%
mutate(pos = 0.5 * (cumsum(prop) + cumsum(c(0, prop[-length(prop)])))) %>%
ungroup()
x.labels <- gr$practice[gr$drug == "a"]
x.pos <- gr$pos[gr$drug == "a"]
ggplot(gr,
aes(x = pos, y = prevalence,
fill = country, width = prop,
ymin = low.CI, ymax = high.CI)) +
geom_col(col = "black") +
geom_errorbar(size = 0.25, colour = "black") +
facet_wrap(~drug) +
scale_fill_manual(values = c("c1" = "gray79",
"c2" = "gray60",
"c3" = "gray39"),
guide = F) +
scale_x_continuous(name = "Drug",
labels = x.labels,
breaks = x.pos) +
labs(title = "Drug usage by country and practice", y = "Prevalence") +
theme_classic()

There is a lot of information you are trying to convey here - to contrast drug A and drug B across countries using the barplots and accounting for proportions, you might use the facet_grid function. Try this:
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
gr$drug <- paste("Drug", gr$drug)
p<-ggplot(data=gr, aes(x=factor(practice), y=as.numeric(prevalence),
ymax=high.CI,ymin = low.CI,
position="dodge",fill=practice, width=prop))
p + theme_bw()+ facet_grid(drug~country, scales="free") +
geom_bar(stat="identity") +
labs(x="Practice",y="Prevalence") +
geom_errorbar(position=position_dodge(0.9), width=0.25,size=0.25,colour="black") +
ggtitle("Drug usage by country and practice") +
scale_fill_manual(values = colour)+ guides(fill=F)
The width is too small in the C1 country and as you indicated the one clinic is quite influential.
Also, you can specify your aesthetics with the ggplot(aes(...)) and not have to reset it and it is not needed to include the dataframe objects name in the aes function within the ggplot call.

Related

Add error bar to ggplot2 stacked barplot, without using dodge

I can find examples of adding error bars to dodged barplots (e.g. here).
However, is it possible to denote both a stacked barplot, with a single error bar at the top of each bar showing overall error? For example, like this middle plot below? How would I add the red error bars?
My basic ggplot2 code is here:
ggplot(sample, aes(x=strategy_short, y=baseline, fill=income)) +
geom_bar(position="stack", stat="identity") +
facet_grid(~scenario_capacity)
And my data are below:
income,scenario_capacity,strategy_short,baseline,high,low
LIC,50_gb_month,4G_f,0.260317022,0.326222444,0.234391846
LIC,50_gb_month,5G_f,0.124212858,0.146834332,0.115607428
LIC,50_gb_month,4G_w,0.266087059,0.331992481,0.240156101
LIC,50_gb_month,5G_w,0.129977113,0.152604368,0.121371683
LMIC,50_gb_month,4G_f,0.83300281,0.981024297,0.770961424
LMIC,50_gb_month,5G_f,0.527561846,0.56027992,0.517383821
LMIC,50_gb_month,4G_w,0.837395381,0.985564298,0.77528317
LMIC,50_gb_month,5G_w,0.53198477,0.564819922,0.521741702
UMIC,50_gb_month,4G_f,2.084363642,2.161110527,2.047796949
UMIC,50_gb_month,5G_f,1.644845928,1.667321898,1.634737764
UMIC,50_gb_month,4G_w,2.08822286,2.165063696,2.051605578
UMIC,50_gb_month,5G_w,1.648696474,1.67124905,1.638559402
HIC,50_gb_month,4G_f,1.016843718,1.026058625,1.010465168
HIC,50_gb_month,5G_f,0.820046245,0.823345129,0.81792777
HIC,50_gb_month,4G_w,1.019669475,1.028904617,1.013290925
HIC,50_gb_month,5G_w,0.823000642,0.82634578,0.820861932
Whenever I try to use an aggregated dataframe to feed to geom_errorbar, as below, I end up with an error message ('object 'income' not found').
sample_short <- sample %>%
group_by(scenario_capacity, strategy_short) %>%
summarize(
low = sum(low),
baseline = sum(baseline),
high = sum(high),
)
ggplot(sample, aes(x=strategy_short, y=baseline, fill=income)) +
geom_bar(position="stack", stat="identity") +
geom_errorbar(data=sample_short, aes(y = baseline, ymin = low, ymax = high)) +
facet_grid(~scenario_capacity)
You need to include income in your summary stats, like so:
(df being your dataframe: avoid naming objects with function names like sample):
df_errorbar <-
df |>
group_by(scenario_capacity, strategy_short) |>
summarize(
income = first(income),
low = sum(low),
baseline = sum(baseline),
high = sum(high)
)
df |>
ggplot(aes(x=strategy_short, y=baseline, fill=income)) +
geom_bar(position="stack", stat="identity") +
geom_errorbar(data = df_errorbar, aes(y = baseline, ymin = low, ymax = high)) +
facet_grid(~scenario_capacity)
take care about appropriate grouping when desiring an overall "error"

Additional x axis on ggplot

I'm aware there are similar posts but I could not get those answers to work in my case.
e.g. Here and here.
Example:
diamonds %>%
ggplot(aes(scale(price) %>% as.vector)) +
geom_density() +
xlim(-3, 3) +
facet_wrap(vars(cut))
Returns a plot:
Since I used scale, those numbers are the zscores or standard deviations away from the mean of each break.
I would like to add as a row underneath the equivalent non scaled raw number that corresponds to each.
Tried:
diamonds %>%
ggplot(aes(scale(price) %>% as.vector)) +
geom_density() +
xlim(-3, 3) +
facet_wrap(vars(cut)) +
geom_text(aes(label = price))
Gives:
Error: geom_text requires the following missing aesthetics: y
My primary question is how can I add the raw values underneath -3:3 of each break? I don't want to change those breaks, I still want 6 breaks between -3:3.
Secondary question, how can I get -3 and 3 to actually show up in the chart? They have been trimmed.
[edit]
I've been trying to make it work with geom_text but keep hitting errors:
diamonds %>%
ggplot(aes(x = scale(price) %>% as.vector)) +
geom_density() +
xlim(-3, 3) +
facet_wrap(vars(cut)) +
geom_text(label = price)
Error in layer(data = data, mapping = mapping, stat = stat, geom = GeomText, :
object 'price' not found
I then tried changing my call to geom_text()
geom_text(data = diamonds, aes(price), label = price)
This results in the same error message.
You can make a custom labeling function for your axis. This takes each label on the axis and performs a custom transform for you. In your case you could paste the z score, a line break, and the z-score times the standard deviation plus the mean. Because of the distribution of prices in the diamonds data set, this means that z scores below about -1 represent negative prices. This may not be a problem in your own data. For clarity I have drawn in a vertical line representing $0
labeller <- function(x) {
paste0(x,"\n", scales::dollar(sd(diamonds$price) * x + mean(diamonds$price)))
}
diamonds %>%
ggplot(aes(scale(price) %>% as.vector)) +
geom_density() +
geom_vline(aes(xintercept = -0.98580251364833), linetype = 2) +
facet_wrap(vars(cut)) +
scale_x_continuous(label = labeller, limits = c(-3, 3)) +
xlab("price")
We can use the sec_axis functionality in scale_x_continuous. To use this functionality we need to manually scale your data. This will add a secondary axis at the top of the plot, not underneath. So it's not quite exactly what you're looking for.
library(tidyverse)
# manually scale the data
mean_price <- mean(diamonds$price)
sd_price <- sd(diamonds$price)
diamonds$price_scaled <- (diamonds$price - mean_price) / sd_price
# make the plot
ggplot(diamonds, aes(price_scaled))+
geom_density()+
facet_wrap(~cut)+
scale_x_continuous(sec.axis = sec_axis(~ mean_price + (sd_price * .)),
limits = c(-3, 4), breaks = -3:3)
You could cheat a bit by passing some dummy data to geom_text:
geom_text(data = tibble(label = round(((-3:3) * sd_price) + mean_price),
y = -0.25,
x = -3:3),
aes(x, y, label = label))

ggplot2 and jitter/dodge points by a group

I have 'elevation' as my y-axis and I want it as a discrete variable (in other words I want the space between each elevation to be equal and not relative to the numerical differences). My x-axis is 'time' (julian date).
mydata2<- data.frame(
"Elevation" = c(rep(c(1200),10),rep(c(1325.5),10),rep(c(1350.75),10), rep(c(1550.66),10)),
"Sex" = c(rep(c("F","M"),20)),
"Type" = c(rep(c("emerge","emerge","endhet","endhet","immerge","immerge","melt","melt", "storpor","storpor"),4)),
"mean" = c(rep(c(104,100,102,80,185,210,84,84,188,208,104,87,101,82, 183,188,83,83,190,189),2))
"se"=c(rep(c(.1,.01,.2,.02,.03),4)))
mydata2$Sex<-factor(mydata2$Sex))
mydata2$Type<-factor(mydata2$Type))
mydata2$Elevation<-factor(mydata2$Elevation))
at<-ggplot(mydata2, aes(y = mean, x = Elevation,color=Type, group=Sex)) +
geom_pointrange(aes(ymin = mean-se, ymax = mean+se),
position=position_jitter(width=0.2,height=.1),
linetype='solid') +
facet_grid(Sex~season,scales = "free")+
coord_flip()
at
Ideally, I would like each 'type' to be separated vertically. When I jitter or dodge only those that are close separate and not evenly. Is there a way to force each 'type' to be slightly shifted so they are all on their own line? I tried to force it by giving each type a slightly different 'elevation' but then I end up with a messy y-axis (I can't figure out a way to keep the point but not display all the tick marks with a discrete scale).
Thank you for your help.
If you want to use a numerical value as a discrete value, you should use as.factor. In your example, try to use x = as.factor(Elevation).
Additionally, I will suggest to use position = position_dodge() to get points from different conditions corresponding to the same elevation to be plot side-by-side
ggplot(mydata2, aes(y = mean, x = as.factor(Elevation),color=Type, group=Sex)) +
geom_pointrange(aes(ymin = mean-se, ymax = mean+se),
position=position_dodge(),
linetype='solid') +
facet_grid(Sex~season,scales = "free")+
coord_flip()
EDIT with example data provided by the OP
Using your dataset, I was not able to get range being plot with your point. So, I create two variable Lower and Upper using dplyr package.
Then, I did not pass your commdnas facotr(...) you provided in your question but instead, I used as.factor(Elevation) and position_dodge(0.9) for the plotting to get the following plot:
library(tidyverse)
mydata2 %>% mutate(Lower = mean-se*100, Upper = mean+se*100) %>%
ggplot(., aes( x = as.factor(Elevation), y = mean, color = Type))+
geom_pointrange(aes(ymin = Lower, ymax = Upper), linetype = "solid", position = position_dodge(0.9))+
facet_grid(Sex~., scales = "free")+
coord_flip()
Does it look what you are looking for ?
Data
Your dataset provided contains few errors (too much parenthesis), so I correct here.
mydata2<- data.frame(
"Elevation" = c(rep(c(1200),10),rep(c(1325.5),10),rep(c(1350.75),10), rep(c(1550.66),10)),
"Sex" = rep(c("F","M"),20),
"Type" = rep(c("emerge","emerge","endhet","endhet","immerge","immerge","melt","melt", "storpor","storpor"),4),
"mean" = rep(c(104,100,102,80,185,210,84,84,188,208,104,87,101,82, 183,188,83,83,190,189),2),
"se"=rep(c(.1,.1,.2,.05,.03),4))

Plotting a bar chart with years grouped together

I am using the fivethirtyeight bechdel dataset, located here https://github.com/rudeboybert/fivethirtyeight, and am attempting to recreate the first plot shown in the article here https://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/. I am having trouble getting the years to group together similarly to how they did in the article.
This is the current code I have:
ggplot(data = bechdel, aes(year)) +
geom_histogram(aes(fill = clean_test), binwidth = 5, position = "fill") +
scale_fill_manual(breaks = c("ok", "dubious", "men", "notalk", "nowomen"),
values=c("red", "salmon", "lightpink", "dodgerblue",
"blue")) +
theme_fivethirtyeight()
I see where you were going with using the histogram geom but this really looks more like a categorical bar chart. Once you take that approach it's easier, after a bit of ugly code to get the correct labels on the year columns.
The bars are stacked in the wrong order on this one, and there needs to be some formatting applied to look like the 538 chart, but I'll leave that for you.
library(fivethirtyeight)
library(tidyverse)
library(ggthemes)
library(scales)
# Create date range column
bechdel_summary <- bechdel %>%
mutate(date.range = ((year %/% 10)* 10) + ((year %% 10) %/% 5 * 5)) %>%
mutate(date.range = paste0(date.range," - '",substr(date.range + 5,3,5)))
ggplot(data = bechdel_summary, aes(x = date.range, fill = clean_test)) +
geom_bar(position = "fill", width = 0.95) +
scale_y_continuous(labels = percent) +
theme_fivethirtyeight()
ggplot

Stacked Bar Plot for Temperature vs Home Runs

I am trying to make some changes to my plot, but am having difficulty doing so.
(1) I would like warm, avg, and cold to be filled in as the colors red, yellow, and blue, respectively.
(2) I am trying to make the y-axis read "Count" and have it be horizontally written.
(3) In the legend, I would like the title to be Temperatures, rather than variable
Any help making these changes would be much appreciated along with other suggestions to make the plot look nicer.
df <- read.table(textConnection(
'Statistic Warm Avg Cold
Homers(Away) 1.151 1.028 .841
Homers(Home) 1.202 1.058 .949'), header = TRUE)
library(ggplot2)
library(reshape2)
df <- melt(df, id = 'Statistic')
ggplot(
data = df,
aes(
y = value,
x = Statistic,
group = variable,
shape = variable,
fill = variable
)
) +
geom_bar(stat = "identity")
You are on the right lines by trying to reshape the data into long format. My preference is to use gather from the tidyr package for that. You can also create the variable names Temperatures and Count in the gather step.
The next step is to turn the 3 classes of temperature into a factor, ordered from cold, through average, to warm.
Now you can plot. You want position = "dodge" to get the bars side by side, since it makes no sense to stack the values in a single bar. Fill colours you specify using scale_fill_manual.
You rotate the y-axis title by manipulating axis.title.y.
So putting all of that together (plus a black/white theme):
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
gather(Temperatures, Count, -Statistic) %>%
mutate(Temperatures = factor(Temperatures, c("Cold", "Avg", "Warm"))) %>%
ggplot(aes(Statistic, Count)) +
geom_col(aes(fill = Temperatures), position = "dodge") +
scale_fill_manual(values = c("blue", "yellow", "red")) +
theme_bw() +
theme(axis.title.y = element_text(angle = 0, vjust = 0.5))
Result:
I'd question whether Count is a sensible variable name in this case.
You are almost there. To map specific colors to specific factor levels you can use scale_fill_manual and create your own scale:
scale_fill_manual(values=c("Warm"="red", "Avg"="yellow", "Cold"="blue")) +
Changing the y axis legend is also easy in ggplot:
ylab("Count") +
And to change the legend title you can use:
labs(fill='TEMPERATURE') +
Giving us:
ggplot(df, aes(y = value, x = Statistic, group= variable, fill = variable)) +
geom_bar(stat = "identity") +
scale_fill_manual(values=c("Warm"="red", "Avg"="yellow", "Cold"="blue")) +
labs(fill='TEMPERATURE') +
ylab("Count") +
xlab("") +
theme_bw() +
theme(axis.title.y = element_text(angle = 0, vjust = 0.5))

Resources