bar chart within 2 group variables using ggplot - r
The idea is to build a chart that combines the two charts shown here:
For the "Branco00" grouping variable, the data should be shown as in the first chart. For the "Year" grouping variable, the data should be shown as in the second chart, i.e., as a "fill" by "Branco00" within each "Year".
I tried:
g <- ggplot(tabDummy19, aes(X, Y, group = Year, fill = Branco00))
g + geom_col()
This gets close to what I am looking for, but it does not separate the years into different bars; they are drawn on the same bars, as shown here:
It seems you intend to unstack the numeric bars and then organize them by Year. Consider geom_bar(...) with a dodge position instead of geom_col(), and then pass the grouping variable, Year, to a facet function such as facet_wrap().
To demonstrate, the data below uses the populations of Brazil's states (which your data seems to include), pulled from Wikipedia's List of Brazilian states by population for 2000, 2010, and 2014. I include a Branco00 variable whose two levels carry the same population values for each state.
Data
txt = 'UF,Year,Population
SãoPaulo,2014,44035304
MinasGerais,2014,20734097
RiodeJaneiro,2014,16461173
Bahia,2014,15126371
RioGrandedoSul,2014,11207274
Paraná,2014,11081692
Pernambuco,2014,9277727
Ceará,2014,8842791
Pará,2014,8073924
Maranhão,2014,6850884
SantaCatarina,2014,6727148
Goiás,2014,6523222
Paraíba,2014,3943885
EspíritoSanto,2014,3885049
Amazonas,2014,3873743
RioGrandedoNorte,2014,3408510
Alagoas,2014,3321730
MatoGrosso,2014,3224357
Piauí,2014,3194718
DistritoFederal,2014,2852372
MatoGrossodoSul,2014,2619657
Sergipe,2014,2219574
Rondônia,2014,1748531
Tocantins,2014,1496880
Acre,2014,790101
Amapá,2014,750912
Roraima,2014,496936
SãoPaulo,2010,41262199
MinasGerais,2010,19597330
RiodeJaneiro,2010,15989929
Bahia,2010,14016906
RioGrandedoSul,2010,10693929
Paraná,2010,10444526
Pernambuco,2010,8796448
Ceará,2010,8452381
Pará,2010,7581051
Maranhão,2010,6574789
SantaCatarina,2010,6248436
Goiás,2010,6003788
Paraíba,2010,3766528
EspíritoSanto,2010,3512672
Amazonas,2010,3483985
RioGrandedoNorte,2010,3168027
Alagoas,2010,3120494
MatoGrosso,2010,3035122
Piauí,2010,3118360
DistritoFederal,2010,2570160
MatoGrossodoSul,2010,2449024
Sergipe,2010,2068017
Rondônia,2010,1562409
Tocantins,2010,1383445
Acre,2010,733559
Amapá,2010,669526
Roraima,2010,450479
SãoPaulo,2000,37032403
MinasGerais,2000,17891494
RiodeJaneiro,2000,14391282
Bahia,2000,13070250
RioGrandedoSul,2000,10187798
Paraná,2000,9569458
Pernambuco,2000,7918344
Ceará,2000,7430661
Pará,2000,6192307
Maranhão,2000,5651475
SantaCatarina,2000,5356360
Goiás,2000,5003228
Paraíba,2000,3443825
EspíritoSanto,2000,3097232
Amazonas,2000,2812557
RioGrandedoNorte,2000,2776782
Alagoas,2000,2822621
MatoGrosso,2000,2504353
Piauí,2000,2843278
DistritoFederal,2000,2051146
MatoGrossodoSul,2000,2078001
Sergipe,2000,1784475
Rondônia,2000,1379787
Tocantins,2000,1157098
Acre,2000,557526
Amapá,2000,477032
Roraima,2000,324397'
# STACK EQUAL-LENGTH BRANCO AND NEGRO DFS
brazil_pop_df <- rbind(transform(read.csv(text=txt, header=TRUE), Branco00 = "Branco"),
transform(read.csv(text=txt, header=TRUE), Branco00 = "Negro"))
Original Output (reproducing OP's similar structure)
library(ggplot2)
ggplot(brazil_pop_df, aes(UF, Population, group = Year, fill = Branco00)) +
geom_col()
Adjusted Output (using the scales package to format the Y axis, and rotating the X labels)
library(ggplot2)
library(scales)
ggplot(brazil_pop_df, aes(UF, Population, fill = Branco00)) +
geom_bar(stat="identity", position="dodge") +
scale_y_continuous(labels = comma, expand = c(0, 0), limits = c(0, 50000000)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
facet_grid(. ~ Year)
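If the goal is instead one bar per year for each state, with each bar stacked by Branco00, one possible arrangement is to put Year on the x axis and facet by state. The sketch below is only an illustration under that assumption: it uses three states copied from the data above, and the names small_txt, small_df and p are made up for the example.

```r
library(ggplot2)

# Three states copied from the data above (subset chosen only for brevity)
small_txt <- 'UF,Year,Population
Bahia,2014,15126371
Tocantins,2014,1496880
Acre,2014,790101
Bahia,2010,14016906
Tocantins,2010,1383445
Acre,2010,733559
Bahia,2000,13070250
Tocantins,2000,1157098
Acre,2000,557526'

# Same stacking trick as above: duplicate the rows for the two Branco00 levels
small_df <- rbind(
  transform(read.csv(text = small_txt, header = TRUE), Branco00 = "Branco"),
  transform(read.csv(text = small_txt, header = TRUE), Branco00 = "Negro")
)

# Year as a discrete x axis; geom_col() stacks by the fill variable by default,
# so each state panel gets one bar per year, split by Branco00
p <- ggplot(small_df, aes(factor(Year), Population, fill = Branco00)) +
  geom_col() +
  facet_wrap(~ UF, scales = "free_y") +
  labs(x = "Year")
print(p)
```

With all 27 states this produces many small panels, so the dodged and faceted version above may still read better for the full data.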
Related
Side by side barplot in R with ggplot
I used the following code to get a barplot representing the relative frequencies of my dataset.
ggplot(Votantes_Col, aes(x = Voto)) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(labels = scales::percent) +
  ylab("Frecuencias relativas")
Up to here everything is OK. But now I need to add another barplot like this one to the same chart. The final chart should compare the frequencies of YES/NO in both countries. The problem is that the two datasets have different lengths, so I can't bind them into one data frame. Thanks for your help.
It sounds like you want data from both countries on a single plot. You are only plotting a single variable, so it is easy to join the data from the two countries together. Let us recreate your plots with some sample data:
Plot 1:
library(ggplot2)
Votantes_Col <- data.frame(Voto = rep(c("No", "Si"), c(22, 79)))
ggplot(Votantes_Col, aes(x = Voto)) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(labels = scales::percent) +
  ylab("Frecuencias relativas")
Plot 2:
Otros_Votantes_Col <- data.frame(Voto = factor(rep(c("Si", "No", "NS/NR"), c(65, 33, 1)),
                                               levels = c("Si", "No", "NS/NR")))
ggplot(Otros_Votantes_Col, aes(x = Voto)) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(labels = scales::percent) +
  ylab("Frecuencias relativas")
Now all we need to do is take the Voto column from each data frame and concatenate them (ensuring they are first converted to characters rather than factors). We put this column in a new data frame with a second column labelling which country each row came from:
df <- data.frame(Voto = c(as.character(Votantes_Col$Voto),
                          as.character(Otros_Votantes_Col$Voto)),
                 Pais = rep(c("Pais_1", "Pais_2"),
                            c(nrow(Votantes_Col), nrow(Otros_Votantes_Col))))
Now we simply use this to create our new plot. We will use the column Pais as the fill colour for our bars:
Plot 3:
ggplot(df, aes(x = Voto, fill = Pais)) +
  geom_bar(aes(y = (..count..)/sum(..count..)), position = position_dodge()) +
  scale_y_continuous(labels = scales::percent) +
  ylab("Frecuencias relativas")
Created on 2020-09-08 by the reprex package (v0.3.0)
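One thing to watch with the combined plot: the denominator sum(..count..) appears to be taken over all rows of the combined data frame, so the bars show shares of both countries together rather than shares within each country. If per-country relative frequencies are wanted, one option is to compute them before plotting. The following is a minimal sketch using the same sample data; the name freq and the use of prop.table() are my own choices, not part of the original answer.

```r
library(ggplot2)

# Same sample data as in the answer above
Votantes_Col <- data.frame(Voto = rep(c("No", "Si"), c(22, 79)),
                           stringsAsFactors = FALSE)
Otros_Votantes_Col <- data.frame(Voto = rep(c("Si", "No", "NS/NR"), c(65, 33, 1)),
                                 stringsAsFactors = FALSE)

df <- data.frame(
  Voto = c(Votantes_Col$Voto, Otros_Votantes_Col$Voto),
  Pais = rep(c("Pais_1", "Pais_2"),
             c(nrow(Votantes_Col), nrow(Otros_Votantes_Col))),
  stringsAsFactors = FALSE
)

# Cross-tabulate, then divide each row (country) by its own total,
# so the shares within every country sum to 1
freq <- as.data.frame(prop.table(table(Pais = df$Pais, Voto = df$Voto), margin = 1))

# freq has columns Pais, Voto and Freq; plot the precomputed shares directly
p <- ggplot(freq, aes(x = Voto, y = Freq, fill = Pais)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent) +
  ylab("Frecuencias relativas")
print(p)
```

Because the table includes every country and answer combination, categories that one country lacks (here "NS/NR" for Pais_1) appear as zero-height bars, which keeps the dodged bars the same width.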
How do you create a plot from two different data frames (or how do you combine data frames with identical column names)
I have two dataframes and I want to plot a comparison between them. The plot and dataframes look like so:
df2019 <- data.frame(Institute = c("A","B","C"), Women = c(65,50,70), Men = c(35,50,30))
df2016 <- data.frame(Institute = c("A","B","C"), Women = c(70,45,50), Men = c(30,55,50))
df2019_melted <- melt(df2019)
ggplot(data = df2019_melted, aes(x = Institute, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(fill = "Gender") +
  xlab("Institute") +
  ylab("Percent") +
  scale_fill_discrete(labels = c("Women","Men")) +
  ggtitle("Overall Gender Composition 2019")
But I want the plot to show 2016 in faded bars, grouped the same way as 2019, so 4 bars for each Institute. Since the column names are the same for all of my dataframes, I can't use rbind() or similar, since it doesn't differentiate which dataframe each row came from once they are combined.
Add a column for year to your data frames and then combine and melt. ggplot prefers everything to be in one data.frame:
all_melted <- reshape2::melt(
  rbind(cbind(df2019, year = 2019),
        cbind(df2016, year = 2016)),
  id = c("year", "Institute"))
Then you can plot with something like this, mapping year to alpha to make "faded" bars:
ggplot(all_melted, aes(x = Institute, y = value, fill = variable, alpha = factor(year))) +
  geom_col(position = "dodge") +
  labs(fill = "Gender") +
  xlab("Institute") +
  ylab("Percent") +
  scale_alpha_discrete(range = c(.4, 1), name = "Year") +
  ggtitle("Overall Gender Composition")
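The same "label each data frame, then stack and reshape" step can also be sketched with dplyr and tidyr instead of reshape2. This is only an alternative under the assumption that those two packages are available; the name all_long is made up for the example.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df2019 <- data.frame(Institute = c("A","B","C"), Women = c(65,50,70), Men = c(35,50,30))
df2016 <- data.frame(Institute = c("A","B","C"), Women = c(70,45,50), Men = c(30,55,50))

# bind_rows() stacks the frames; .id stores each argument's name in a "year" column,
# which is what tells the otherwise identical columns apart
all_long <- bind_rows(`2019` = df2019, `2016` = df2016, .id = "year") %>%
  pivot_longer(cols = c(Women, Men), names_to = "variable", values_to = "value")

# year is a character column here, so it can be mapped to alpha directly
p <- ggplot(all_long, aes(x = Institute, y = value, fill = variable, alpha = year)) +
  geom_col(position = "dodge") +
  scale_alpha_discrete(range = c(0.4, 1), name = "Year") +
  labs(fill = "Gender", y = "Percent")
print(p)
```

The result has one row per Institute, year and gender (12 rows for this data), which is the long layout ggplot2 works with most easily.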
Time series data using ggplot: how use different color for each time point and also connect with lines data belonging to each subject?
I have data from several cells which I tested in several conditions: a few times before and also a few times after treatment. In ggplot, I use color to indicate different times of testing. Additionally, I would like to connect with lines all data points which belong to the same cell. Is that possible?
Here is my example data (https://www.dropbox.com/s/eqvgm4yu6epijgm/df.csv?dl=0) and simplified code for the plot:
df$condition = as.factor(df$condition)
df$cell = as.factor(df$cell)
df$condition <- factor(df$condition, levels = c("before1", "before2", "after1", "after2", "after3"))
windows(width=8,height=5)
ggplot(df, aes(x=condition, y=test_variable, color=condition)) +
  labs(title="", x = "Condition", y = "test_variable", color="Condition") +
  geom_point(aes(color=condition), size=2, shape=17, position = position_jitter(w = 0.1, h = 0))
I think your code is heading in the wrong direction: you should instead group the points by the column Cell (while still coloring them by condition). Then, if I'm right, you want to see how the variable evolves for each cell before and after a treatment, so you can order the x variable using scale_x_discrete. Altogether, you can do something like this:
library(ggplot2)
ggplot(df, aes(x = condition, y = variable, group = Cell)) +
  geom_point(aes(color = condition)) +
  geom_line(aes(color = condition)) +
  scale_x_discrete(limits = c("before1","before2","after1","after2","after3"))
Does it look like what you are expecting?
Data
df = data.frame(Cell = c(rep("13a",5), rep("1b",5)),
                condition = rep(c("before1","before2","after1","after2","after3"), 2),
                variable = c(58,55,36,29,53,57,53,54,52,52))
Highlight positions without data in facet_wrap ggplot
When facetting barplots in ggplot, the x-axis includes all factor levels. However, not all levels may be present in each group. In addition, zero values may be present, so from the barplot alone it is not possible to distinguish between x-axis values with no data and those with zero y-values. Consider the following example:
library(tidyverse)
set.seed(43)
site <- c("A","B","C","D","E") %>% sample(20, replace=T) %>% sort()
year <- c("2010","2011","2012","2013","2014","2010","2011","2012","2013","2014","2010","2012","2013","2014","2010","2011","2012","2014","2012","2014")
isZero = rbinom(n = 20, size = 1, prob = 0.40)
value <- ifelse(isZero==1, 0, rnorm(20,10,3)) %>% round(0)
df <- data.frame(site,year,value)
ggplot(df, aes(x=year, y=value)) +
  geom_bar(stat="identity") +
  facet_wrap(~site)
This is fish census data, where not all sites were fished in all years, and sometimes no fish were caught. Hence the need to differentiate between the two situations. For example, there was no catch at site C in 2010 and it was not fished in 2011, and the reader cannot tell the difference. I would like to add something like "no data" to the plot for 2011. Maybe it is possible to fill the rows where data is missing, generate another column with the desired text to be added, and then include this via geom_text?
So here is an example of your proposed method:
# Tabulate sites vs year, take zero entries
tab <- table(df$site, df$year)
idx <- which(tab == 0, arr.ind = T)
# Build new data.frame
missing <- data.frame(site = rownames(tab)[idx[, "row"]],
                      year = colnames(tab)[idx[, "col"]],
                      value = 1,
                      label = "N.D.") # For 'no data'
ggplot(df, aes(year, value)) +
  geom_col() +
  geom_text(data = missing, aes(label = label)) +
  facet_wrap(~site)
Alternatively, you could also let the facets omit unused x-axis values:
ggplot(df, aes(x=year, y=value)) +
  geom_bar(stat="identity") +
  facet_wrap(~site, scales = "free_x")
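The "fill the rows where data is missing" idea from the question can also be sketched with expand.grid() and merge() rather than a contingency table. The tiny data frame below is made up for illustration (it is not the fish census data), and the names all_combos, merged and missing are my own.

```r
library(ggplot2)

# Hypothetical toy data: site B has no row for 2011, site A has a true zero in 2010
df <- data.frame(site  = c("A", "A", "B"),
                 year  = c("2010", "2011", "2010"),
                 value = c(0, 5, 3),
                 stringsAsFactors = FALSE)

# Every site/year pair that could exist
all_combos <- expand.grid(site = unique(df$site),
                          year = unique(df$year),
                          stringsAsFactors = FALSE)

# Left join: pairs absent from df get NA in value
merged <- merge(all_combos, df, by = c("site", "year"), all.x = TRUE)

# Keep only the absent pairs and give them a label and a y position for the text
missing <- merged[is.na(merged$value), c("site", "year")]
missing$value <- 1
missing$label <- "N.D."

p <- ggplot(df, aes(year, value)) +
  geom_col() +
  geom_text(data = missing, aes(label = label)) +
  facet_wrap(~ site)
print(p)
```

This relies on value never being NA in the original data; if genuine NA measurements can occur, the absent pairs would need to be identified some other way, for example with the table-based approach above.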
Plot a stacked barplot - amended
I have 4 dataframes, which all have a column called Results showing Wins, Draws, Losses. I would like to create a layered histogram like the picture below. Any idea if this is achievable in R? This is what I was playing with:
ggplot(results, aes(x = Country, y = ??)) +
  geom_bar(aes(fill = Performance), stat = "identity")
The problem with this is that I don't know what to set the y axis to. These are supposed to be counts. Another option I tried, which is almost what I want, is this:
counts <- table(results$Performance, results$Country)
barplot(counts, main="Game Count per Football Team",
        xlab="Football Teams", ylab = "Game Count",
        col=c("darkblue","red", "Yellow"),
        legend = rownames(counts))
However, the y axis stops at 800 even though I have at most 908 observations in one of the countries.
Well, I can give you some code that will show you how you could do this. You basically would just want four different geom_bar statements. To demonstrate, I'll create two different dataframes from the mpg dataset that comes with the ggplot2 package, because you didn't provide any data.
library(tidyverse)
# I'm making two different data frames from the
# 'mpg' dataset, which comes with the ggplot2 package
mpg$year = as.character(mpg$year)
df1 = filter(mpg, year == "1999")
df2 = filter(mpg, year == "2008")
plot = ggplot() +
  geom_bar(data = df1, aes(x = year, y = hwy, fill = manufacturer), stat = "identity") +
  geom_bar(data = df2, aes(x = year, y = hwy, fill = manufacturer), stat = "identity")
print(plot)
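Since the bars in the question are meant to be counts, another option is to stack the data frames with rbind() and let geom_bar() count the rows itself, which avoids choosing a y variable at all. The sketch below uses made-up rows; the column names Country and Performance follow the question's code, but the teams and results are hypothetical.

```r
library(ggplot2)

# Hypothetical stand-ins for the per-country data frames in the question
england <- data.frame(Country = "England", Performance = c("Win", "Win", "Draw", "Loss"))
spain   <- data.frame(Country = "Spain",   Performance = c("Win", "Loss", "Loss"))

# One long data frame, one row per game
results <- rbind(england, spain)

# With no y aesthetic, geom_bar() uses stat = "count" and stacks by fill
p <- ggplot(results, aes(x = Country, fill = Performance)) +
  geom_bar() +
  ylab("Game Count")
print(p)

# The same counts as a table; for the base-graphics version, passing ylim
# lets the axis extend past the tallest stacked bar
counts <- table(results$Performance, results$Country)
barplot(counts, ylim = c(0, max(colSums(counts)) * 1.1),
        legend = rownames(counts))
```

If the four real data frames do not already carry a Country column, it would need to be added to each one before the rbind() step.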