multiple line and facet_grid in Bar plot - r

I have a data frame with 53 states and a sex variable. For example, the data frame below has 26 states.
library(dplyr)
library(ggplot2)
set.seed(25)
test <- data.frame(
state = sample(letters[1:26], 10000, replace = TRUE),
sex = sample(c("M","F"), 10000, replace = TRUE)
)
Now I want to see which states have more female members, so I created a bar plot with one panel per state, each panel having two bars (M, F).
test.pct = test %>% group_by(state, sex) %>%
summarise(count=n()) %>%
mutate(pct=count/sum(count))
ggplot(test.pct, aes(x=sex, y=pct, fill=sex)) +
geom_bar(stat="identity") +
facet_grid(. ~ state)
The problem is that all 26 panels appear in a single row, which makes them hard to read. I want to lay the plot out over multiple rows, e.g. 3x9 instead of 1x26.
Also, the states should be ordered by female percentage.
Thanks for your help.

Problem #1: Use facet_wrap. Problem #2: Reorder the state levels beforehand.
It could look like this:
ggplot(transform(test.pct, state=factor(state,
levels=with(subset(test.pct, sex=="F"),
state[order(pct)]))),
aes(x=sex, y=pct, fill=sex)) +
geom_bar(stat="identity") +
facet_wrap(~ state, nrow = 3)
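A minimal sketch of the layout part, assuming a recent dplyr (for count()) and ggplot2 (geom_col() is geom_bar(stat = "identity") under another name). It rebuilds the question's data and reads back the panel grid that ggplot2 computed, which shows that nrow = 3 yields the requested 3x9 arrangement for 26 states:

```r
library(dplyr)
library(ggplot2)

set.seed(25)
test <- data.frame(
  state = sample(letters[1:26], 10000, replace = TRUE),
  sex = sample(c("M", "F"), 10000, replace = TRUE)
)

# Share of each sex within each state
test.pct <- test %>%
  count(state, sex) %>%
  group_by(state) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

# nrow = 3 fixes three rows; ggplot2 then derives ncol = ceiling(26 / 3) = 9
p <- ggplot(test.pct, aes(x = sex, y = pct, fill = sex)) +
  geom_col() +
  facet_wrap(~ state, nrow = 3)

# Row/column position that ggplot2 assigned to each panel
panel_layout <- ggplot_build(p)$layout$layout
```

With all 53 states the same call would give ceiling(53 / 3) = 18 columns, so a larger nrow may read better there.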

The first part is straightforward: just use facet_wrap instead of facet_grid. The ordering is a bit trickier; you have to reorder the levels of the factor. Just to make it a bit clearer, I've split the operation up into a few steps. First, extract only female percentages, then find the order of those percentages, and finally use that order to rearrange the order of the levels of state. That's a long-winded way of doing it, but I hope it makes the principle clear.
wom.pct <- test.pct %>% filter(sex == 'F')
ix <- order(wom.pct$pct)
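# take the level order from the state names themselves instead of assuming
# that the rows are exactly letters[1:26] in alphabetical order
test.pct$state <- factor(test.pct$state, levels = wom.pct$state[ix])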
ggplot(test.pct, aes(x=sex, y=pct, fill=sex)) +
geom_bar(stat="identity") +
facet_wrap( ~ state)
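An alternative sketch (not the answer's method) keeps the ordering inside one pipeline: it stores each state's female share on both of that state's rows and lets base R's stats::reorder() sort the levels by it. The helper column name female_pct is illustrative:

```r
library(dplyr)
library(ggplot2)

set.seed(25)
test <- data.frame(
  state = sample(letters[1:26], 10000, replace = TRUE),
  sex = sample(c("M", "F"), 10000, replace = TRUE)
)

test.pct <- test %>%
  count(state, sex) %>%
  group_by(state) %>%
  mutate(pct = n / sum(n),
         # female share of the state, repeated on both of its rows
         female_pct = pct[sex == "F"]) %>%
  ungroup() %>%
  # reorder() sorts the levels by mean(female_pct) per state, ascending
  mutate(state = reorder(factor(state), female_pct))

p <- ggplot(test.pct, aes(x = sex, y = pct, fill = sex)) +
  geom_col() +
  facet_wrap(~ state, nrow = 3)
```

Because reorder() averages the scores within each state, repeating the same value on both rows leaves the female share unchanged as the sort key.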

Related

How can I change the colors of my bars according to a condition in a ggplot in R?

df <-data.frame(BA_5287)
question <- df$Type.of.Question
submission <- df$Students.submitted
score <- df$Score.Correctly
cond <- ifelse(abs(score)>94,'darkgreen',
ifelse(abs(score)<0.94 & abs(score) >=0.7,'yellow','red'))
graph <- ggplot(data=df,
aes(x=Question,y=Score))+geom_bar(stat = "identity",
color='blue',
fill=cond)
graph + coord_flip()
This is my code. The colors of the bars change but not according to the condition. Can somebody help me please?
Thank you!
The primary problem is that you need another aes() in the geom_bar line and must map fill inside it. Then, as @dc37 mentioned above, you just need to add scale_fill_identity().
Another thing to note is that you do not need to define the variables outside of your data frame, as you do in your question. You can simply refer to them by their column names.
Here's an example with some made-up data:
library(dplyr)
library(ggplot2)
df <- data.frame(question = LETTERS[1:15],
score = rnorm(15, 90,5))
Rather than a nested ifelse statement, use case_when, which is much more readable:
df <- df %>%
mutate(cond = case_when(
score > 94 ~ 'darkgreen',
score < 70 ~ 'red', # assumes scores are on a 0-100 scale, like the made-up data
TRUE ~ 'yellow' #anything that does not meet the criteria above
))
Then you can use fill within the aes() call in geom_bar and add scale_fill_identity():
ggplot(data = df, aes(x = question, y =score)) +
geom_bar(stat = "identity", color = 'blue', aes(fill = cond)) +
scale_fill_identity() +
coord_flip()
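If you would rather have a legend with meaningful labels than raw colour names, a variant (assuming the same 0-100 score scale) maps a category label to fill and supplies the colours through scale_fill_manual() with a named vector. The band names high/medium/low are illustrative:

```r
library(dplyr)
library(ggplot2)

set.seed(42)  # assumption: seed added so the made-up data are reproducible
df <- data.frame(question = LETTERS[1:15],
                 score = rnorm(15, 90, 5))

# Map scores to labelled bands (assumes a 0-100 score scale)
df <- df %>%
  mutate(band = case_when(
    score > 94 ~ "high",
    score < 70 ~ "low",
    TRUE ~ "medium"
  ))

# Named vector: band label -> bar colour
band_colours <- c(high = "darkgreen", medium = "yellow", low = "red")

p <- ggplot(df, aes(x = question, y = score, fill = band)) +
  geom_col(colour = "blue") +
  scale_fill_manual(values = band_colours) +
  coord_flip()
```

Unlike scale_fill_identity(), this keeps a legend by default, and the colours stay tied to the labels even if a band is absent from the data.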

how to plot summarized data in ggplot

I have a dataset that I want to summarize by calculating the ratio of two columns. However, I also need to calculate this ratio by different ‘cuts’ of my dataset, i.e., the ratio over the overall data, the ratio by year, the ratio by type, etc.
I will also need to put each ratio calculation in a bar chart.
What I want to know is whether I can plot all these bar charts without having to create a separate summary grouping dataset first.
For example, right now I first use group_by/summarise on my data to calculate the ratio, and then I send the result to ggplot.
Chart1 <- data %>% group_by(cut1) %>% summarise(ratio=sum(column1)/sum(column2))
ggplot(Chart1, aes(x=cut1, y=ratio)) + geom_bar(stat='identity', fill = "tomato2")
For charts 2 and 3, I do the same thing again:
Chart2 <- data %>% group_by(cut2) %>% summarise(ratio=sum(column1)/sum(column2))
ggplot(Chart2, aes(x=cut2, y=ratio)) + geom_bar(stat='identity', fill = "tomato2")
Chart3 <- data %>% group_by(cut3) %>% summarise(ratio=sum(column1)/sum(column2))
ggplot(Chart3, aes(x=cut3, y=ratio)) + geom_bar(stat='identity', fill = "tomato2")
Is there another way to do this? Initially, I thought there would be a way to create the ratio once and then reuse it many times (similar to a calculated field in Excel pivot tables). Is there something better than the above method?
Also, if summarizing each ratio separately is the best way, how do I make a faceted chart? For example, I may want to facet the ratio by cut 1 and cut 2.
Edit: more info, with an example using made-up data:
c1 <- c('a','a','a', 'b','b', 'b', 'c','c','c')
c2 <- c('aa','aa','aa', 'bb','bb', 'bb', 'cc','cc','cc')
v1 <-c(1,2,3,4,5,6,7,8,9)
v2<-c(9,8,7,6,5,4,3,2,1)
mydata <-data.frame(c1,c2,v1,v2)
Chart1 <- mydata %>% group_by(c1) %>% summarise(ratio=sum(v1)/sum(v2))
ggplot(Chart1, aes(x=c1, y=ratio)) + geom_bar(stat='identity', fill = "tomato2") + theme(axis.text.x=element_text(angle=90))
The outcome I want is to understand how best to summarize data before plotting it. Do I need to summarize each calculation by each grouping separately, or is there an easier way?
For the example above, if I wanted to calculate the ratio grouped by c1, then create another ratio chart grouped by c2, then another by c3, do I need to do three different aggregations?
Does this accomplish what you want?
library(tidyverse)
c1 <- c('a','a','a', 'b','b', 'b', 'c','c','c')
c2 <- c('aa','aa','aa', 'bb','bb', 'bb', 'cc','cc','cc')
v1 <-c(1,2,3,4,5,6,7,8,9)
v2<-c(9,8,7,6,5,4,3,2,1)
mydata <-data.frame(c1,c2,v1,v2)
Chart1 <- mydata %>%
gather(key = 'cuts', value = 'categories', -(v1:v2)) %>%
group_by(cuts, categories) %>%
summarise(ratio=sum(v1)/sum(v2))
# This lets you facet them onto the same chart,
# but that doesn't really make sense,
# since the cuts will have different x axes
ggplot(Chart1, aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
facet_grid(cuts~.) +
theme(axis.text.x=element_text(angle=90))
# This lets you make each plot separately
Chart1 %>%
filter(cuts == 'c1') %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
# Use a for loop to save all of the plots to files
for(i in 1:(length(mydata)-2)){
p <-
Chart1 %>%
filter(cuts == names(mydata)[[i]]) %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
ggsave(paste0("myPlot",i,".png"), plot = p)
}
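On the "calculated field" idea from the question, one option, sketched here under the assumption of dplyr >= 1.0 and ggplot2 >= 3.2 (for the {{ }} embracing operator), is to wrap the ratio in a function that takes the grouping column as an argument. The helper names ratio_by() and ratio_plot() are hypothetical:

```r
library(dplyr)
library(ggplot2)

mydata <- data.frame(c1 = c('a','a','a','b','b','b','c','c','c'),
                     c2 = c('aa','aa','aa','bb','bb','bb','cc','cc','cc'),
                     v1 = 1:9,
                     v2 = 9:1)

# Hypothetical helper: the "calculated field", pooled sum(v1) / sum(v2),
# computed for every level of whichever grouping column is passed as `cut`
ratio_by <- function(data, cut) {
  data %>%
    group_by({{ cut }}) %>%
    summarise(ratio = sum(v1) / sum(v2), .groups = "drop")
}

# Hypothetical helper: one bar chart of that ratio for a given grouping column
ratio_plot <- function(data, cut) {
  ggplot(ratio_by(data, {{ cut }}), aes(x = {{ cut }}, y = ratio)) +
    geom_col(fill = "tomato2") +
    theme(axis.text.x = element_text(angle = 90))
}

chart1 <- ratio_by(mydata, c1)
p1 <- ratio_plot(mydata, c1)
p2 <- ratio_plot(mydata, c2)
```

Each new cut is then a one-line call such as ratio_plot(mydata, c2) rather than a copy of the group_by/summarise code. The aggregation still runs once per cut; the function only removes the repetition.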
The only thing I wasn't sure about is how to facet the different cuts if they don't have the same values on the x-axis. If you just want to stack them on top of each other, you could use the gridExtra package:
library(gridExtra)
plot1 <- Chart1 %>%
filter(cuts == 'c1') %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
plot2 <- Chart1 %>%
filter(cuts == 'c2') %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
grid.arrange(plot1, plot2, ncol=1, nrow = 2)
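For faceting cuts whose categories differ, facet_wrap() (and facet_grid()) accept scales = "free_x", which gives every panel its own x axis, so each cut shows only its own categories. A sketch on the example data, using tidyr::pivot_longer() in place of the superseded gather():

```r
library(dplyr)
library(tidyr)
library(ggplot2)

mydata <- data.frame(c1 = c('a','a','a','b','b','b','c','c','c'),
                     c2 = c('aa','aa','aa','bb','bb','bb','cc','cc','cc'),
                     v1 = 1:9,
                     v2 = 9:1)

# Long format: one row per (cut, category) with the pooled ratio
ratios <- mydata %>%
  pivot_longer(c(c1, c2), names_to = "cuts", values_to = "categories") %>%
  group_by(cuts, categories) %>%
  summarise(ratio = sum(v1) / sum(v2), .groups = "drop")

# scales = "free_x" gives every panel its own x axis
p <- ggplot(ratios, aes(x = categories, y = ratio)) +
  geom_col(fill = "tomato2") +
  facet_wrap(~ cuts, scales = "free_x")
```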

Rank Stacked Bar Chart by Sum of Subset of Fill Variable

Sample data:
set.seed(145)
df <- data.frame(Age=sample(c(1:10),20,replace=TRUE),
Rank=sample(c("Extremely","Very","Slightly","Not At All"),
20,replace=TRUE),
Percent=(runif(10,0,.01)))
df.plot <- ggplot(df,aes(x=Age,y=Percent,fill=Rank))+
geom_bar(stat="identity")+
coord_flip()
df.plot
Within the ggplot call, how can I reorder x=Age by the sum of Percent for the Ranks "Extremely" and "Very" only?
I tried the following, without success:
df.plot <- ggplot(df,aes(x=reorder(Age,Rank=="Extremely",sum),y=Percent,fill=Rank))+
geom_bar(stat="identity")+
coord_flip()
df.plot
A couple of notes:
The way you are simulating your data does not rule out the possibility that, for some ages, not all categories are represented (which is fine), but also that, for some ages, some categories are duplicated. I am assuming that this is not true for your real data, so I have left it as is. Note also that your simulation logic does not produce percentages that add up, although the category names indicate that they should.
The way I would do this is to create the ordering of age based on your desired logic, and then pass that order to the factor() call. This decouples the ordering logic from the plot and allows arbitrary ordering logic.
Here is what I think you are looking for:
library(ggplot2)
library(dplyr)
library(scales)
set.seed(145)
# simulate the data
df_foo = data.frame(Age=sample(c(1:10),20,replace=TRUE),
Rank=sample(c("Extremely","Very","Slightly","Not At All"),
20,replace=TRUE),
Percent=(runif(10,0,.01)))
# get the ordering that you are interested in
age_order = df_foo %>%
filter(Rank %in% c("Extremely", "Very")) %>%
group_by(Age) %>%
summarize(SumRank = sum(Percent)) %>%
arrange(desc(SumRank)) %>%
pull(Age)
# in some cases ages do not appear in the order because the
# ordering logic does not span all categories
age_order = c(age_order, setdiff(unique(df_foo$Age), age_order))
# make age a factor sorted by the ordering above
ggplot(df_foo, aes(x = factor(Age, levels = age_order), y = Percent, fill = Rank))+
geom_bar(stat = "identity") +
coord_flip() +
theme_bw() +
scale_y_continuous(labels = percent)
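A more compact variant (a sketch, not the answer's code) computes the sort key on every row and hands it to stats::reorder(), so ages with no "Extremely" or "Very" rows automatically get a key of 0 and need no separate setdiff() step. The column names top_sum and Age_f are illustrative:

```r
library(dplyr)
library(ggplot2)

set.seed(145)
df_foo <- data.frame(
  Age = sample(1:10, 20, replace = TRUE),
  Rank = sample(c("Extremely", "Very", "Slightly", "Not At All"), 20, replace = TRUE),
  Percent = runif(10, 0, .01)
)

df_foo <- df_foo %>%
  group_by(Age) %>%
  # per-age sum of Percent over the two top ranks; sum() of nothing is 0
  mutate(top_sum = sum(Percent[Rank %in% c("Extremely", "Very")])) %>%
  ungroup() %>%
  # levels sorted by -top_sum, i.e. largest top_sum first
  mutate(Age_f = reorder(factor(Age), -top_sum))

p <- ggplot(df_foo, aes(x = Age_f, y = Percent, fill = Rank)) +
  geom_col() +
  coord_flip()
```

The minus sign reproduces the descending order of the answer above. With coord_flip() the first level is drawn at the bottom, so drop the minus sign if you want the largest sum at the top.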

Plotting a bar graph in R

Here is a snapshot of the data:
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales, restaurant_change_labor, restaurant_change_POS)
I want two bars for each column, showing the change, with one graph for each column.
I tried:
ggplot(aes(x = rest_change$restaurant_change_sales), data = rest_change) + geom_bar()
This does not give the result I want. Please help!
So ... something like:
library(ggplot2)
library(dplyr)
library(tidyr)
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales,
restaurant_change_labor,
restaurant_change_POS)
cbind(rest_change,
change = c("Before", "After")) %>%
gather(key,value,-change) %>%
ggplot(aes(x = change,
y = value)) +
geom_bar(stat="identity") +
facet_grid(~key)
Edit:
To be extra fancy, e.g. to make the x-axis labels go from "Before" to "After" (by default they are sorted alphabetically, so "After" comes first), add scale_x_discrete(limits = c("Before", "After")) to the end of the ggplot call.
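An alternative to setting limits is to make change a factor whose levels are already in the desired order. The sketch below also uses tidyr::pivot_longer() (the successor of gather()) and facet_wrap(scales = "free_y"); the free y axis is a design choice not taken from the answers, since sales are roughly twenty times larger than POS and the POS bars would otherwise look tiny. As in the answer above, it assumes row 1 is "Before" and row 2 is "After":

```r
library(dplyr)
library(tidyr)
library(ggplot2)

rest_change <- data.frame(
  restaurant_change_sales = c(3330.443, 3122.534),
  restaurant_change_labor = c(696.592, 624.841),
  restaurant_change_POS   = c(155.48, 139.27)
)

long <- rest_change %>%
  # assumption: row 1 is "Before", row 2 is "After"; levels fix the axis order
  mutate(change = factor(c("Before", "After"), levels = c("Before", "After"))) %>%
  pivot_longer(-change, names_to = "key", values_to = "value")

p <- ggplot(long, aes(x = change, y = value)) +
  geom_col() +
  facet_wrap(~ key, scales = "free_y")
```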
Your data are not formatted properly to work well with ggplot2, or really any of the plotting packages in R. So we'll fix your data up first, and then use ggplot2 to plot it.
library(tidyr)
library(dplyr)
library(ggplot2)
# We need to differentiate between the values in the rows for them to make sense.
rest_change$category <- c('first val', 'second val')
# Now we use tidyr to reshape the data to the format that ggplot2 expects.
rc2 <- rest_change %>% gather(variable, value, -category)
rc2
# Now we can plot it.
# The category that we added goes along the x-axis, the values go along the y-axis.
# We want a bar chart and the value column contains absolute values, so no summation
# necessary, hence we use 'identity'.
# facet_grid() gives one mini-plot within the image for each of the three variables.
ggplot(rc2, aes(x=category, y=value)) +
geom_bar(stat='identity') +
facet_grid(~variable)
You have to melt your data:
library(reshape2) # or library(data.table)
rest_change$rowN <- 1:nrow(rest_change)
rest_change <- melt(rest_change, id.vars = "rowN")
ggplot(rest_change,aes(x = rowN, y = value)) + geom_bar(stat = "identity") + facet_wrap(~ variable)

ggplot2: Stack barcharts with group means

I have tried several things to make ggplot plot bar charts of means derived from factors in a data frame, but I wasn't successful.
Consider:
df <- as.data.frame(matrix(rnorm(60*2, mean=3,sd=1), 60, 2))
df$factor <- c(rep(factor(1:3), each=20))
I want to achieve a stacked, relative bar chart like this:
This chart was created by manually calculating the group means in a separate data frame, melting it, and using geom_bar(stat="identity", position = "fill") and scale_y_continuous(labels = percent_format()). I haven't found a way to use stat_summary with stacked bar charts.
In a second step, I would like to have error bars attached to the breaks of each column. I have six treatments and three species, so error bars should be OK.
For anything this complicated, I think it's loads easier to pre-calculate the numbers, then plot them. This is easily done with dplyr/tidyr (even the error bars):
library(dplyr)
library(tidyr)
library(ggplot2)
gather(df, 'cat', 'value', 1:2) %>%
group_by(factor, cat) %>%
summarise(mean=mean(value), se=sd(value)/sqrt(n())) %>%
group_by(cat) %>%
mutate(perc=mean/sum(mean), ymin=cumsum(perc) -se/sum(mean), ymax=cumsum(perc) + se/sum(mean)) %>%
ggplot(aes(x=cat, y=perc, fill=factor(factor))) +
geom_bar(stat='identity') +
geom_errorbar(aes(ymax=ymax, ymin=ymin))
Of course, this looks a bit strange because there are error bars around 100% in the stacked bars. I think you'd be much better off plotting the actual data points, plus means and error bars, and using faceting:
gather(df, 'cat', 'value', 1:2) %>%
group_by(cat, factor) %>%
summarise(mean=mean(value), se=sd(value)/sqrt(n())) %>%
ggplot(aes(x=cat, y=mean, colour=factor(factor))) +
geom_point(aes(y=value), position=position_jitter(width=.3, height=0), data=gather(df, 'cat', 'value', 1:2) ) +
geom_point(shape=5, size = 3) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.1) +
facet_grid(factor ~ .)
This way anyone can examine the data and see for themselves that they are normally distributed.
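If you do want the stacked version: since ggplot2 2.2.0, position_stack() places the first factor level on top, so the cumulative sums that locate each segment's upper edge have to run from the last level upward. Accumulating in level order appears to put the error bars on the wrong breaks in current versions. A sketch under that assumption (pivot_longer() in place of gather(), a seed added for reproducibility, and the standard error scaled by the column total as in the first snippet above):

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)  # assumption: seed added so the simulated data are reproducible
df <- as.data.frame(matrix(rnorm(60 * 2, mean = 3, sd = 1), 60, 2))
df$factor <- factor(rep(1:3, each = 20))

summ <- df %>%
  pivot_longer(c(V1, V2), names_to = "cat", values_to = "value") %>%
  group_by(cat, factor) %>%
  summarise(avg = mean(value), se = sd(value) / sqrt(n()),
            .groups = "drop_last") %>%   # still grouped by cat
  mutate(perc = avg / sum(avg)) %>%
  # the first level sits on top, so accumulate from the last (bottom) level
  arrange(cat, desc(factor)) %>%
  mutate(top = cumsum(perc),
         ymin = top - se / sum(avg),
         ymax = top + se / sum(avg)) %>%
  ungroup()

p <- ggplot(summ, aes(x = cat, y = perc, fill = factor)) +
  geom_col() +
  geom_errorbar(aes(ymin = ymin, ymax = ymax), width = 0.2)
```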
