Side by side barplot in R with ggplot

I used the following code to get a bar plot of the relative frequencies in my dataset.
ggplot(Votantes_Col,aes(x=Voto)) +
geom_bar(aes(y = (..count..)/sum(..count..))) +
scale_y_continuous(labels=scales::percent) +
ylab("Frecuencias relativas")
Up to here everything is OK. But now I need to add another bar plot like this one to the same chart.
The final chart should compare the frequencies of YES/NO in both countries. The problem is that the two datasets have different lengths, so I can't bind them into one data frame.
Thanks for your help.

It sounds like you want data from both countries on a single plot.
You are only plotting a single variable, so it is easy to join the data from the two countries together. Let us recreate your plots with some sample data:
Plot 1:
library(ggplot2)
Votantes_Col <-
data.frame(Voto = rep(c("No", "Si"), c(22, 79)))
ggplot(Votantes_Col, aes(x = Voto)) +
geom_bar(aes(y = (..count..)/sum(..count..))) +
scale_y_continuous(labels=scales::percent) +
ylab("Frecuencias relativas")
Plot 2:
Otros_Votantes_Col <-
data.frame(Voto = factor(rep(c("Si", "No", "NS/NR"), c(65, 33, 1)),
levels = c("Si", "No", "NS/NR")))
ggplot(Otros_Votantes_Col, aes(x = Voto)) +
geom_bar(aes(y = (..count..)/sum(..count..))) +
scale_y_continuous(labels=scales::percent) +
ylab("Frecuencias relativas")
Now all we need to do is take the Voto column from each data frame and concatenate them (converting them to character first, since one of them is a factor). We put this column in a new data frame with a second column labelling which country each row came from:
df <- data.frame(Voto = c(as.character(Votantes_Col$Voto),
as.character(Otros_Votantes_Col$Voto)),
Pais = rep(c("Pais_1", "Pais_2"),
c(nrow(Votantes_Col), nrow(Otros_Votantes_Col))))
Now we simply use this to create our new plot. We will use the column Pais as the fill colour for our bars:
Plot 3:
ggplot(df, aes(x = Voto, fill = Pais)) +
geom_bar(aes(y = (..count..)/sum(..count..)), position = position_dodge()) +
scale_y_continuous(labels=scales::percent) +
ylab("Frecuencias relativas")
Created on 2020-09-08 by the reprex package (v0.3.0)
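One caveat about Plot 3: as far as I can tell, sum(..count..) is taken over every bar in the layer, so the percentages there are relative to the combined total of both countries rather than to each country on its own. If the goal is relative frequencies within each country, one option is to compute them before plotting. The following is a minimal sketch under that assumption; the helper name freq_rel and the column name Freq are made up for illustration, and the sample data is the same as above.

```r
library(ggplot2)

# Same sample data as in the answer above
Votantes_Col <- data.frame(Voto = rep(c("No", "Si"), c(22, 79)))
Otros_Votantes_Col <- data.frame(Voto = rep(c("Si", "No", "NS/NR"), c(65, 33, 1)))

# Hypothetical helper: relative frequencies of Voto within one country
freq_rel <- function(d, pais) {
  tab <- prop.table(table(as.character(d$Voto)))
  data.frame(Voto = names(tab), Freq = as.numeric(tab), Pais = pais)
}

# One row per (country, answer); proportions sum to 1 within each country
df_rel <- rbind(freq_rel(Votantes_Col, "Pais_1"),
                freq_rel(Otros_Votantes_Col, "Pais_2"))

# geom_col() plots the precomputed values; preserve = "single" keeps the
# lone NS/NR bar the same width as the others
p <- ggplot(df_rel, aes(x = Voto, y = Freq, fill = Pais)) +
  geom_col(position = position_dodge(preserve = "single")) +
  scale_y_continuous(labels = scales::percent) +
  ylab("Frecuencias relativas")
p
```

With this version the bars of each country add up to 100% separately, which makes the two countries directly comparable even though the datasets have different lengths.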

Related

stack bars by an ordering variable which is numeric ggplot

I am trying to create a swimlane plot of different subjects' doses over time. When I run my code, the bars are stacked by dose amount. The problem is that a subject's doses can vary, for example 5, 10, 5, and in my plot the two 5s are stacked together, whereas I want them shown in the order they happened. My dataset has the amount of time each patient was on a dose, ordered by when they received it. I want the bars stacked by an ordering variable called "p", which is numeric (1, 2, 3, 4, 5, 6, etc.) and records the visit at which the subject had that dose.
ggplot(dataset,aes(x=diff+1, y=subject)) +
geom_bar(stat="identity", aes(fill=as.factor(EXDOSE))) +
scale_fill_manual(values = dosecol, name="Actual Dose in mg")
I want the bars stacked by my variable "p", not by fill.
I tried forcats, but that did not work. I am unsure how to go about this; the data in the dataset is arranged by p for each subject.
Example data:
dataset <- data.frame(subject = c("1002", "1002", "1002", "1002", "1034","1034","1034","1034"),
exdose = c(5,10,20,5,5,10,20,20),
p= c(1,2,3,4,1,2,3,4),
diff = c(3,3,9,7,3,3,4,5)
)
ggplot(dataset,aes(x=diff+1, y=subject)) +
geom_bar(stat="identity", aes(fill=as.factor(exdose)),position ="stack") +
scale_fill_manual(values = dosecol, name="Actual Dose in mg")
If you want to order your stacked bar chart by p you have to tell ggplot2 to do so by mapping p on the group aesthetic. Otherwise ggplot2 will make a guess which by default is based on the categorical variables mapped on any aesthetic, i.e. in your case the fill aes:
Note: I dropped the scale_fill_manual as you did not provide the vector of colors. But that's not important for the issue.
library(ggplot2)
ggplot(dataset, aes(x = diff + 1, y = subject, group = p)) +
geom_col(aes(fill = as.factor(exdose)))
EDIT And to get the right order we have to reverse the order of the stack which could be achieved using position_stack(reverse = TRUE):
Note: To check that we have the right order I added a geom_text showing the p value.
ggplot(dataset, aes(x = diff + 1, y = subject, group = p)) +
geom_col(aes(fill = as.factor(exdose)), position = position_stack(reverse = TRUE)) +
geom_text(aes(label = p), position = position_stack(reverse = TRUE))
A second option would be to convert p to a factor with its levels set in reverse order:
ggplot(dataset, aes(x = diff + 1, y = subject, group = factor(p, rev(sort(unique(p)))))) +
geom_col(aes(fill = as.factor(exdose))) +
geom_text(aes(label = p), position = "stack")
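For completeness, here is a self-contained sketch that puts the example data, the group mapping, and a manual fill scale together. The question did not include the dosecol vector, so the colours below are placeholders chosen purely for illustration.

```r
library(ggplot2)

# Example data from the question
dataset <- data.frame(
  subject = c("1002", "1002", "1002", "1002", "1034", "1034", "1034", "1034"),
  exdose  = c(5, 10, 20, 5, 5, 10, 20, 20),
  p       = c(1, 2, 3, 4, 1, 2, 3, 4),
  diff    = c(3, 3, 9, 7, 3, 3, 4, 5)
)

# Hypothetical colours: the original dosecol vector was not provided.
# Names must match the levels of factor(exdose).
dosecol <- c("5" = "#fdd49e", "10" = "#fc8d59", "20" = "#b30000")

# group = p controls the stacking order; fill only controls the colour
p_plot <- ggplot(dataset, aes(x = diff + 1, y = subject, group = p)) +
  geom_col(aes(fill = factor(exdose)),
           position = position_stack(reverse = TRUE)) +
  scale_fill_manual(values = dosecol, name = "Actual Dose in mg")
p_plot
```

Because stacking follows the group aesthetic, two visits with the same dose stay as separate segments instead of being placed next to each other.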

How do you create a plot from two different data frames (or how do you combine data frames with identical column names)

I have two dataframes and I want to plot a comparison between them. The plot and dataframes look like so
df2019 <- data.frame(Institute = c("A","B","C"),Women = c(65,50,70),Men = c(35,50,30))
df2016 <- data.frame(Institute = c("A","B","C"),Women = c(70,45,50),Men = c(30,55,50))
df2019_melted <- melt(df2019)
ggplot(data = df2019_melted, aes(x = Institute, y = value, fill = variable))+
geom_bar(stat = "identity", position = "dodge")+
labs(fill = "Gender")+
xlab("Institute")+
ylab("Percent")+
scale_fill_discrete(labels = c("Women","Men"))+
ggtitle("Overall Gender Composition 2019")
But I want the plot to also show 2016 in faded bars, grouped the same way as 2019, so there are 4 bars for each Institute.
Since the column names are the same for all of my data frames, I can't use rbind() or similar, because the combined result doesn't distinguish which rows came from which data frame.
Add a column for year to your data frames and then combine and melt. ggplot prefers everything to be in one data.frame
all_melted <- reshape2::melt(
rbind(cbind(df2019, year=2019), cbind(df2016, year=2016)),
id=c("year", "Institute"))
Then you can plot with something like this, mapping year to alpha to make "faded" bars
ggplot(all_melted, aes(x = Institute, y = value, fill = variable, alpha=factor(year)))+
geom_col(position = "dodge")+
labs(fill = "Gender")+
xlab("Institute")+
ylab("Percent")+
scale_alpha_discrete(range=c(.4, 1), name="Year") +
ggtitle("Overall Gender Composition")
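The same "label, combine, reshape" idea can also be written with dplyr and tidyr instead of reshape2. This is only an alternative sketch, not the approach used in the answer above: bind_rows() with .id adds the year label while stacking, and pivot_longer() does the job of melt(). The names all_long, variable, and value are chosen here for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df2019 <- data.frame(Institute = c("A", "B", "C"), Women = c(65, 50, 70), Men = c(35, 50, 30))
df2016 <- data.frame(Institute = c("A", "B", "C"), Women = c(70, 45, 50), Men = c(30, 55, 50))

# Stack the two data frames; the list names become the "year" column
all_long <- bind_rows(list("2019" = df2019, "2016" = df2016), .id = "year") %>%
  pivot_longer(cols = c(Women, Men), names_to = "variable", values_to = "value")

# year is a character column here, so it can be mapped to alpha directly
p <- ggplot(all_long, aes(x = Institute, y = value, fill = variable, alpha = year)) +
  geom_col(position = "dodge") +
  scale_alpha_manual(values = c("2016" = 0.4, "2019" = 1), name = "Year") +
  labs(fill = "Gender", y = "Percent")
p
```

Using scale_alpha_manual() with named values makes it explicit which year is drawn faded.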

bar chart within 2 group variables using ggplot

The idea is to build a chart that is a mix of the two charts here:
For the "Branco00" grouping variable, the data should be shown as in the first chart. For the "Year" grouping variable, the data should be shown as in the second chart, i.e., something like a "fill" by "Branco00" within "Year".
I tried:
g <- ggplot(tabDummy19, aes(X, Y, group = Year, fill = Branco00))
g + geom_col()
This gets close to what I am looking for, but it does not separate the years into different bars; they end up on the same bars, as shown here:
It seems you intend to unstack the bars and then organize them by Year. Consider geom_bar(...) with a dodge position instead of plain geom_col(), and put the grouping variable, Year, in facet_wrap() or facet_grid().
To demonstrate, the data below uses the populations of Brazil's states (which your data seems to include), pulled from Wikipedia's List of Brazilian states by population for 2000, 2010, and 2014. I add a Branco00 variable whose two categories get identical population values for each state.
Data
txt = 'UF,Year,Population
SãoPaulo,2014,44035304
MinasGerais,2014,20734097
RiodeJaneiro,2014,16461173
Bahia,2014,15126371
RioGrandedoSul,2014,11207274
Paraná,2014,11081692
Pernambuco,2014,9277727
Ceará,2014,8842791
Pará,2014,8073924
Maranhão,2014,6850884
SantaCatarina,2014,6727148
Goiás,2014,6523222
Paraíba,2014,3943885
EspíritoSanto,2014,3885049
Amazonas,2014,3873743
RioGrandedoNorte,2014,3408510
Alagoas,2014,3321730
MatoGrosso,2014,3224357
Piauí,2014,3194718
DistritoFederal,2014,2852372
MatoGrossodoSul,2014,2619657
Sergipe,2014,2219574
Rondônia,2014,1748531
Tocantins,2014,1496880
Acre,2014,790101
Amapá,2014,750912
Roraima,2014,496936
SãoPaulo,2010,41262199
MinasGerais,2010,19597330
RiodeJaneiro,2010,15989929
Bahia,2010,14016906
RioGrandedoSul,2010,10693929
Paraná,2010,10444526
Pernambuco,2010,8796448
Ceará,2010,8452381
Pará,2010,7581051
Maranhão,2010,6574789
SantaCatarina,2010,6248436
Goiás,2010,6003788
Paraíba,2010,3766528
EspíritoSanto,2010,3512672
Amazonas,2010,3483985
RioGrandedoNorte,2010,3168027
Alagoas,2010,3120494
MatoGrosso,2010,3035122
Piauí,2010,3118360
DistritoFederal,2010,2570160
MatoGrossodoSul,2010,2449024
Sergipe,2010,2068017
Rondônia,2010,1562409
Tocantins,2010,1383445
Acre,2010,733559
Amapá,2010,669526
Roraima,2010,450479
SãoPaulo,2000,37032403
MinasGerais,2000,17891494
RiodeJaneiro,2000,14391282
Bahia,2000,13070250
RioGrandedoSul,2000,10187798
Paraná,2000,9569458
Pernambuco,2000,7918344
Ceará,2000,7430661
Pará,2000,6192307
Maranhão,2000,5651475
SantaCatarina,2000,5356360
Goiás,2000,5003228
Paraíba,2000,3443825
EspíritoSanto,2000,3097232
Amazonas,2000,2812557
RioGrandedoNorte,2000,2776782
Alagoas,2000,2822621
MatoGrosso,2000,2504353
Piauí,2000,2843278
DistritoFederal,2000,2051146
MatoGrossodoSul,2000,2078001
Sergipe,2000,1784475
Rondônia,2000,1379787
Tocantins,2000,1157098
Acre,2000,557526
Amapá,2000,477032
Roraima,2000,324397'
# STACK EQUAL-LENGTH BRANCO AND NEGRO DFS
brazil_pop_df <- rbind(transform(read.csv(text=txt, header=TRUE), Branco00 = "Branco"),
transform(read.csv(text=txt, header=TRUE), Branco00 = "Negro"))
Original Output (reproducing OP's similar structure)
library(ggplot2)
ggplot(brazil_pop_df, aes(UF, Population, group = Year, fill = Branco00)) +
geom_col()
Adjusted Output (using the scales package to format the Y axis, and rotating the X labels)
library(ggplot2)
library(scales)
ggplot(brazil_pop_df, aes(UF, Population, fill = Branco00)) +
geom_bar(stat="identity", position="dodge") +
scale_y_continuous(labels = comma, expand = c(0, 0), limits = c(0, 50000000)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
facet_grid(. ~ Year)
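A smaller sketch of the same idea, using facet_wrap() and only three states and two years taken from the data above. As in the answer, the Branco00 split is artificial: both categories simply receive the same population values. Note that geom_col(position = "dodge") is equivalent to geom_bar(stat = "identity", position = "dodge").

```r
library(ggplot2)

# Subset of the population figures listed above
base_df <- data.frame(
  UF         = rep(c("Acre", "Roraima", "Tocantins"), times = 2),
  Year       = rep(c(2000, 2010), each = 3),
  Population = c(557526, 324397, 1157098, 733559, 450479, 1383445)
)

# Artificial Branco00 categories with identical values (illustration only)
toy <- rbind(transform(base_df, Branco00 = "Branco"),
             transform(base_df, Branco00 = "Negro"))

# Dodge the Branco00 bars side by side; one panel per Year
p <- ggplot(toy, aes(UF, Population, fill = Branco00)) +
  geom_col(position = "dodge") +
  facet_wrap(~ Year)
p
```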

Plotting a bar chart with years grouped together

I am using the fivethirtyeight bechdel dataset, located here https://github.com/rudeboybert/fivethirtyeight, and am attempting to recreate the first plot shown in the article here https://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/. I am having trouble getting the years to group together similarly to how they did in the article.
This is the current code I have:
ggplot(data = bechdel, aes(year)) +
geom_histogram(aes(fill = clean_test), binwidth = 5, position = "fill") +
scale_fill_manual(breaks = c("ok", "dubious", "men", "notalk", "nowomen"),
values=c("red", "salmon", "lightpink", "dodgerblue",
"blue")) +
theme_fivethirtyeight()
I see where you were going with the histogram geom, but this really looks more like a categorical bar chart. Once you take that approach it gets easier, after a bit of ugly code to get the correct labels on the year columns.
The bars are stacked in the wrong order in this one, and some formatting is still needed to make it look like the 538 chart, but I'll leave that to you.
library(fivethirtyeight)
library(tidyverse)
library(ggthemes)
library(scales)
# Create date range column
bechdel_summary <- bechdel %>%
mutate(date.range = ((year %/% 10)* 10) + ((year %% 10) %/% 5 * 5)) %>%
mutate(date.range = paste0(date.range," - '",substr(date.range + 5,3,5)))
ggplot(data = bechdel_summary, aes(x = date.range, fill = clean_test)) +
geom_bar(position = "fill", width = 0.95) +
scale_y_continuous(labels = percent) +
theme_fivethirtyeight()
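To see what the date.range arithmetic does, here is a small base R sketch on a handful of made-up years. It also shows that the expression is equivalent to the shorter year %/% 5 * 5, since both round a year down to the nearest multiple of 5.

```r
# A few illustrative years (not taken from the bechdel data)
year <- c(1970, 1974, 1975, 1989, 1999, 2013)

# Expression from the answer: decade start plus 0 or 5
bin_start <- ((year %/% 10) * 10) + ((year %% 10) %/% 5 * 5)

# Shorter equivalent: round down to a multiple of 5
bin_simple <- year %/% 5 * 5

# Label built the same way as in the answer
label <- paste0(bin_start, " - '", substr(bin_start + 5, 3, 5))

data.frame(year, bin_start, bin_simple, label)
```

Note that the label's second number is the start of the next bin, so the bin labelled 1970 - '75 actually covers 1970 to 1974. Using bin_start + 4 inside substr() would give labels with inclusive end years instead.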

dodge columns in ggplot2

I am trying to create a figure that summarises my data. The data is about the prevalence of drug use, obtained from different practices from different countries. Each practice has contributed a different amount of data, and I want to show all of this in my figure.
Here is a subset of the data to work on:
gr<-data.frame(matrix(0,36))
gr$drug<-c("a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b")
gr$practice<-c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r")
gr$country<-c("c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3","c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3")
gr$prevalence<-c(9.14,5.53,16.74,1.93,8.51,14.96,18.90,11.18,15.00,20.10,24.56,22.29,19.41,20.25,25.01,25.87,29.33,20.76,18.94,24.60,26.51,13.37,23.84,21.82,23.69,20.56,30.53,16.66,28.71,23.83,21.16,24.66,26.42,27.38,32.46,25.34)
gr$prop<-c(0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406,0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406)
gr$low.CI<-c(8.27,4.80,12.35,1.83,7.22,14.53,18.25,10.56,14.28,18.76,24.25,21.72,18.62,19.83,24.36,25.22,28.80,20.20,17.73,23.15,21.06,13.12,21.79,21.32,22.99,19.76,29.60,15.41,28.39,23.25,20.34,24.20,25.76,26.72,31.92,24.73)
gr$high.CI<-c(10.10,6.37,22.31,2.04,10.00,15.40,19.56,11.83,15.74,21.52,24.87,22.86,20.23,20.68,25.67,26.53,29.86,21.34,20.21,26.10,32.79,13.63,26.02,22.33,24.41,21.39,31.48,17.98,29.04,24.43,22.01,25.12,27.09,28.05,33.01,25.95)
The code I wrote is this
p<-ggplot(data=gr, aes(x=factor(drug), y=as.numeric(gr$prevalence), ymax=max(high.CI),position="dodge",fill=practice,width=prop))
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
p + theme_bw()+
geom_bar(stat="identity",position = position_dodge(0.9)) +
labs(x="Drug",y="Prevalence") +
geom_errorbar(ymax=gr$high.CI,ymin=gr$low.CI,position=position_dodge(0.9),width=0.25,size=0.25,colour="black",aes(x=factor(drug), y=as.numeric(gr$prevalence), fill=practice)) +
ggtitle("Drug usage by country and practice") +
scale_fill_manual(values = colour)+ guides(fill=F)
In the figure I obtain, the bars are all on top of each other, whereas I want them dodged.
I also obtain the following warning:
ymax not defined: adjusting position using y instead
Warning message:
position_dodge requires non-overlapping x intervals
Ideally, the bars would sit next to one another, each with its error bar centred on it, all organised by country.
Also should I be concerned about the warning (which I clearly do not fully understand)?
I hope this makes sense. I hope I am close enough, but I don't seem to be going anywhere, some help would be greatly appreciated.
Thank you
ggplot's geom_bar() accepts the width parameter, but doesn't line them up neatly against one another in dodged position by default. The following workaround references the solution here:
library(dplyr)
# calculate x-axis position for bars of varying width
gr <- gr %>%
group_by(drug) %>%
arrange(practice) %>%
mutate(pos = 0.5 * (cumsum(prop) + cumsum(c(0, prop[-length(prop)])))) %>%
ungroup()
x.labels <- gr$practice[gr$drug == "a"]
x.pos <- gr$pos[gr$drug == "a"]
ggplot(gr,
aes(x = pos, y = prevalence,
fill = country, width = prop,
ymin = low.CI, ymax = high.CI)) +
geom_col(col = "black") +
geom_errorbar(size = 0.25, colour = "black") +
facet_wrap(~drug) +
scale_fill_manual(values = c("c1" = "gray79",
"c2" = "gray60",
"c3" = "gray39"),
guide = "none") +
scale_x_continuous(name = "Drug",
labels = x.labels,
breaks = x.pos) +
labs(title = "Drug usage by country and practice", y = "Prevalence") +
theme_classic()
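The pos calculation is the part that places bars of different widths edge to edge. A small worked example in base R, with made-up widths, shows what it computes: each bar's centre is the midpoint between its left and right edges, where the edges come from cumulative sums of the widths.

```r
# Hypothetical widths for three bars
prop <- c(0.2, 0.5, 0.3)

right <- cumsum(prop)                       # right edge of each bar
left  <- cumsum(c(0, prop[-length(prop)]))  # left edge = previous right edge
pos   <- 0.5 * (left + right)               # centre of each bar

data.frame(prop, left, right, pos)
```

Each bar starts exactly where the previous one ends, so drawing a bar of width prop centred at pos leaves no gaps and no overlaps.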
There is a lot of information you are trying to convey here - to contrast drug A and drug B across countries using the barplots and accounting for proportions, you might use the facet_grid function. Try this:
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
gr$drug <- paste("Drug", gr$drug)
p<-ggplot(data=gr, aes(x=factor(practice), y=as.numeric(prevalence),
ymax=high.CI,ymin = low.CI,
position="dodge",fill=practice, width=prop))
p + theme_bw()+ facet_grid(drug~country, scales="free") +
geom_bar(stat="identity") +
labs(x="Practice",y="Prevalence") +
geom_errorbar(position=position_dodge(0.9), width=0.25,size=0.25,colour="black") +
ggtitle("Drug usage by country and practice") +
scale_fill_manual(values = colour)+ guides(fill = "none")
The bars are very narrow for country c1 and, as you indicated, one clinic is quite influential.
Also, you can specify your aesthetics once in ggplot(aes(...)) without resetting them in each layer, and there is no need to include the data frame's name inside aes() in the ggplot call.
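As a minimal sketch of that last point, the toy example below maps ymin and ymax once in the top-level aes() using bare column names, and reuses a single position_dodge() object for both the bars and the error bars so they line up. The data values are invented for illustration, and the bars here all have the same width, which is the case position_dodge() handles without warnings.

```r
library(ggplot2)

# Toy data (hypothetical values, not the data from the question)
toy <- data.frame(
  drug       = rep(c("a", "b"), each = 3),
  practice   = rep(c("p1", "p2", "p3"), times = 2),
  prevalence = c(9, 16, 25, 19, 27, 32),
  low.CI     = c(8, 12, 24, 18, 21, 31),
  high.CI    = c(10, 22, 26, 20, 33, 33)
)

# One dodge specification shared by both layers
dodge <- position_dodge(width = 0.9)

# All aesthetics are set once and inherited by both layers
p <- ggplot(toy, aes(x = drug, y = prevalence, fill = practice,
                     ymin = low.CI, ymax = high.CI)) +
  geom_col(position = dodge) +
  geom_errorbar(position = dodge, width = 0.25) +
  labs(x = "Drug", y = "Prevalence")
p
```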
