I have a dataset that I want to summarize by calculating the ratio of two columns. However, I also need to calculate this ratio for different ‘cuts’ of my data set, i.e., the ratio for the overall data, the ratio by year, the ratio by type, etc.
I will also need to put each ratio calculation in a bar chart.
What I want to know is whether I can plot all these bar charts without having to create a separate summary grouping dataset first.
For example, right now I first use group_by/summarise on my data to calculate the ratio, and then I send the result to ggplot.
Chart1 <- data %>% group_by(cut1) %>% summarise(ratio=sum(column1)/sum(column2))
ggplot(Chart1, aes(x=cut1, y=ratio)) + geom_bar(stat='identity', fill = "tomato2")
For chart 2 and chart 3, I do the same thing again:
Chart2 <- data %>% group_by(cut2) %>% summarise(ratio=sum(column1)/sum(column2))
ggplot(Chart2, aes(x=cut2, y=ratio)) + geom_bar(stat='identity', fill = "tomato2")
Chart3 <- data %>% group_by(cut3) %>% summarise(ratio=sum(column1)/sum(column2))
ggplot(Chart3, aes(x=cut3, y=ratio)) + geom_bar(stat='identity', fill = "tomato2")
Is there another way to do this? Initially, I was thinking there would be a way to create the ratio once and then reuse it many times (similar to a calculated field in Excel pivot tables). Is there something better than the above method?
Also, if summarizing each ratio separately is the best way, how do I make a faceted chart? For example, I may want to facet the ratio by cut 1 and cut 2.
edit: more info with example using created data:
c1 <- c('a','a','a', 'b','b', 'b', 'c','c','c')
c2 <- c('aa','aa','aa', 'bb','bb', 'bb', 'cc','cc','cc')
v1 <-c(1,2,3,4,5,6,7,8,9)
v2<-c(9,8,7,6,5,4,3,2,1)
mydata <-data.frame(c1,c2,v1,v2)
Chart1 <- mydata %>% group_by(c1) %>% summarise(ratio=sum(v1)/sum(v2))
ggplot(Chart1, aes(x=c1, y=ratio)) + geom_bar(stat='identity', fill = "tomato2") + theme(axis.text.x=element_text(angle=90))
The outcome I want is to understand how best to summarize data before plotting it. Do I need to summarize each calculation by each grouping separately, or is there an easier way?
For the example above, if I wanted to calculate the ratio grouped by c1, then create another ratio chart grouped by c2, and then another by c3, do I need to do 3 different aggregations?
Does this accomplish what you want?
library(tidyverse)
c1 <- c('a','a','a', 'b','b', 'b', 'c','c','c')
c2 <- c('aa','aa','aa', 'bb','bb', 'bb', 'cc','cc','cc')
v1 <-c(1,2,3,4,5,6,7,8,9)
v2<-c(9,8,7,6,5,4,3,2,1)
mydata <-data.frame(c1,c2,v1,v2)
Chart1 <- mydata %>%
  gather(key = 'cuts', value = 'categories', -(v1:v2)) %>%
  group_by(cuts, categories) %>%
  summarise(ratio=sum(v1)/sum(v2))
# This lets you facet them onto the same chart,
# but that doesn't really make sense,
# since the cuts will have different x axes
ggplot(Chart1, aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
facet_grid(cuts~.) +
theme(axis.text.x=element_text(angle=90))
# This lets you make each plot separately
Chart1 %>%
filter(cuts == 'c1') %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
# Use a for loop to save all of the plots to files
for(i in 1:(length(mydata)-2)){
  p <-
    Chart1 %>%
    filter(cuts == names(mydata)[[i]]) %>%
    ggplot(aes(x=categories, y=ratio)) +
    geom_bar(stat='identity', fill = "tomato2") +
    theme(axis.text.x=element_text(angle=90))
  ggsave(paste0("myPlot",i,".png"), plot = p)
}
The only thing I wasn't sure about is how to facet the different cuts if they don't have the same values on the x-axis. If you just want to stack them on top of each other, you could use the gridExtra package:
library(gridExtra)
plot1 <- Chart1 %>%
filter(cuts == 'c1') %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
plot2 <- Chart1 %>%
filter(cuts == 'c2') %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
grid.arrange(plot1, plot2, ncol=1, nrow = 2)
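If the goal is one figure with a panel per cut, another option is to let each panel keep its own x-axis with facet_wrap(scales = "free_x"). Below is a minimal sketch on the example data. It assumes a tidyr version that provides pivot_longer (1.0.0 or later) in place of gather and a dplyr version that accepts .groups in summarise; the names ratios and p are just placeholders.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Example data from the question
mydata <- data.frame(
  c1 = c('a','a','a','b','b','b','c','c','c'),
  c2 = c('aa','aa','aa','bb','bb','bb','cc','cc','cc'),
  v1 = c(1,2,3,4,5,6,7,8,9),
  v2 = c(9,8,7,6,5,4,3,2,1)
)

# One long table: a row per (cut, category) pair with its ratio
ratios <- mydata %>%
  pivot_longer(c(c1, c2), names_to = "cuts", values_to = "categories") %>%
  group_by(cuts, categories) %>%
  summarise(ratio = sum(v1) / sum(v2), .groups = "drop")

# One panel per cut; free_x drops categories that do not belong to a cut
p <- ggplot(ratios, aes(x = categories, y = ratio)) +
  geom_col(fill = "tomato2") +
  facet_wrap(~ cuts, scales = "free_x")
```

Printing p draws both cuts side by side, each showing only its own categories, so the ratio is still computed in a single aggregation step.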
I'm hoping to recreate the gridExtra output below with ggplot's facet_grid, but I'm unsure of what variable ggplot identifies with the layers in the plot. In this example, there are two geoms...
require(tidyverse)
a <- ggplot(mpg)
b <- geom_point(aes(displ, cyl, color = drv))
c <- geom_smooth(aes(displ, cyl, color = drv))
d <- a + b + c
# output below
gridExtra::grid.arrange(
a + b,
a + c,
ncol = 2
)
# Equivalent with gg's facet_grid
# needs a categorical var to iter over...
d$layers
#d + facet_grid(. ~ d$layers??)
The gridExtra output that I'm hoping to recreate is:
A hacky way of doing this is to make as many copies of the existing data frame as you need panels, tagging each copy with a value that is used for faceting and filtering later on. Combine the copies into one data frame with rbind. Then set up the ggplot call, give each geom only the rows carrying its tag, and facet on the tag column.
This can be seen below:
df1 <- data.frame(
graph = "point_plot",
mpg
)
df2 <- data.frame(
graph = "spline_plot",
mpg
)
df <- rbind(df1, df2)
ggplot(df, mapping = aes(x = displ, y = hwy, color = class)) +
geom_point(data = filter(df, graph == "point_plot")) +
geom_smooth(data = filter(df, graph == "spline_plot"), se=FALSE) +
facet_grid(. ~ graph)
If you really want to show different plots on different facets, one hacky way would be to make separate copies of the data and subset those...
mpg2 <- mpg %>% mutate(facet = 1) %>%
bind_rows(mpg %>% mutate(facet = 2))
ggplot(mpg2, aes(displ, cyl, color = drv)) +
geom_point(data = subset(mpg2, facet == 1)) +
geom_smooth(data = subset(mpg2, facet == 2)) +
facet_wrap(~facet)
I would like to rearrange the facet_wrap plots in a better way.
library(ggplot2)
set.seed(123)
freq <- sample(1:10, 20, replace = TRUE)
labels <- sample(LETTERS, 20)
value <- paste("i",1:13,sep='')
lab <- rep(unlist(lapply(1:length(freq), function(x) rep(labels[x],freq[x]))),2)
ival <- rep(unlist(lapply(1:length(freq), function(x) value[1:freq[x]])),2)
df <- data.frame(lab, ival, type=c(rep('Type1',119),rep('Type2',119)),val=runif(238,0,1))
ggplot(df, aes(x=ival, y=val, col = type, group = type)) +
geom_line() +
geom_point(aes(x=ival, y=val)) +
facet_wrap( ~lab, ncol=3) +
theme(axis.text.x=element_text(angle=45, vjust=0.3)) +
scale_x_discrete(limits=paste('i',1:13,sep=''))
It results in the below plot:
Is there any way to rearrange the plots based on their frequency? Some of the lab frequencies (the number of points per type) are very low (1-3). I would like facet_wrap to arrange the plots by their frequencies instead of by label order. One advantage is that this reduces the plotting area and gives better intuition from the plots.
Can it be done by computing the frequency values on the fly and passing them to facet_wrap? Or should it be done separately, using dplyr to divide the data into low/medium/high-frequency sets of plots?
Here is one idea. We can use dplyr to calculate the number of each group in lab and use fct_reorder from forcats to reorder the factor level.
library(dplyr)
library(forcats)
df2 <- df %>%
group_by(lab) %>%
mutate(N = n()) %>%
ungroup() %>%
mutate(lab = fct_reorder(lab, N))
ggplot(df2, aes(x=ival, y=val, col = type, group = type)) +
geom_line() +
geom_point(aes(x=ival, y=val)) +
facet_wrap( ~lab, ncol=3) +
theme(axis.text.x=element_text(angle=45, vjust=0.3)) +
scale_x_discrete(limits=paste('i',1:13,sep=''))
Set .desc = TRUE when using fct_reorder if you want to reverse the factor levels.
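As a side note, when the ordering criterion is simply the number of rows per level, forcats also has fct_infreq, which skips the helper column. A small sketch on a toy vector (x is made up for illustration, not taken from the question):

```r
library(forcats)

# Toy factor: "c" appears three times, "b" twice, "a" once
x <- factor(c("b", "b", "a", "c", "c", "c"))

# Most frequent level first
most_first <- fct_infreq(x)
levels(most_first)

# Least frequent level first
least_first <- fct_rev(fct_infreq(x))
levels(least_first)
```

On the question's data this would presumably become mutate(lab = fct_infreq(lab)) before the ggplot call.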
I have a data frame with 53 states and a sex variable. For example, the data frame below has 26 states.
set.seed(25)
test <- data.frame(
state = sample(letters[1:26], 10000, replace = TRUE),
sex = sample(c("M","F"), 10000, replace = TRUE)
)
Now I want to see which states have more female members, so I created a grid of bar plots with one panel per state, each containing two bars (M, F).
test.pct = test %>% group_by(state, sex) %>%
summarise(count=n()) %>%
mutate(pct=count/sum(count))
ggplot(test.pct, aes(x=sex, y=pct, fill=sex)) +
geom_bar(stat="identity") +
facet_grid(. ~ state)
The problem is that all 26 panels appear in a single row, which makes them hard to read. I want to lay the plot out in multiple rows, e.g. 3x9 instead of 1x26.
Also, the states should be ordered by female percentage.
Thanks for your help.
Problem #1: Use facet_wrap. Problem #2: Reorder the state levels beforehand.
It could look like this:
ggplot(transform(test.pct, state=factor(state,
levels=with(subset(test.pct, sex=="F"),
state[order(pct)]))),
aes(x=sex, y=pct, fill=sex)) +
geom_bar(stat="identity") +
facet_wrap(~ state, nrow = 3)
The first part is straightforward: just use facet_wrap instead of facet_grid. The ordering is a bit trickier; you have to reorder the levels of the factor. Just to make it a bit clearer, I've split the operation up into a few steps. First, extract only female percentages, then find the order of those percentages, and finally use that order to rearrange the order of the levels of state. That's a long-winded way of doing it, but I hope it makes the principle clear.
wom.pct <- test.pct %>% filter(sex == 'F')
ix <- order(wom.pct$pct)
test.pct$state <- factor(test.pct$state, levels = as.character(wom.pct$state[ix]))
ggplot(test.pct, aes(x=sex, y=pct, fill=sex)) +
geom_bar(stat="identity") +
facet_wrap( ~ state)
Here is a snapshot of data:
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales, restaurant_change_labor, restaurant_change_POS)
I want two bars for each of the columns indicating the change. One graph for each of the columns.
I tried:
ggplot(aes(x = rest_change$restaurant_change_sales), data = rest_change) + geom_bar()
This is not giving the result the way I want. Please help!!
So ... something like:
library(ggplot2)
library(dplyr)
library(tidyr)
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales,
restaurant_change_labor,
restaurant_change_POS)
cbind(rest_change,
change = c("Before", "After")) %>%
gather(key,value,-change) %>%
ggplot(aes(x = change,
y = value)) +
geom_bar(stat="identity") +
facet_grid(~key)
Which will produce:
Edit:
To be extra fancy, e.g. to make the x-axis labels run from "Before" to "After", you can add scale_x_discrete(limits = c("Before", "After")) to the end of the ggplot call.
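A variant of the same idea, sketched under a few assumptions: the "Before"/"After" labels are the ones assumed in this answer (the question does not name the rows), pivot_longer from tidyr 1.0.0 or later stands in for gather, and long and p are placeholder names. Storing change as a factor fixes the axis order in the data itself, and scales = "free_y" gives each panel its own y-range, which may help because sales are roughly twenty times larger than POS.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

rest_change <- data.frame(
  restaurant_change_sales = c(3330.443, 3122.534),
  restaurant_change_labor = c(696.592, 624.841),
  restaurant_change_POS   = c(155.48, 139.27)
)

# Label the two rows (assumed labels) and reshape to long format
long <- rest_change %>%
  mutate(change = factor(c("Before", "After"),
                         levels = c("Before", "After"))) %>%
  pivot_longer(-change, names_to = "key", values_to = "value")

# One panel per original column, each with its own y-axis
p <- ggplot(long, aes(x = change, y = value)) +
  geom_col() +
  facet_wrap(~ key, scales = "free_y")
```

With free y-scales the bar heights are no longer comparable across panels, so this is only a good trade when the within-column change is what matters.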
Your data are not formatted properly to work well with ggplot2, or really any of the plotting packages in R. So we'll fix your data up first, and then use ggplot2 to plot it.
library(tidyr)
library(dplyr)
library(ggplot2)
# We need to differentiate between the values in the rows for them to make sense.
rest_change$category <- c('first val', 'second val')
# Now we use tidyr to reshape the data to the format that ggplot2 expects.
rc2 <- rest_change %>% gather(variable, value, -category)
rc2
# Now we can plot it.
# The category that we added goes along the x-axis, the values go along the y-axis.
# We want a bar chart and the value column contains absolute values, so no summation
# necessary, hence we use 'identity'.
# facet_grid() gives three miniplots within the image for each of the variables.
ggplot(rc2, aes(x=category, y=value)) +
geom_bar(stat='identity') +
facet_grid(~variable)
You have to melt your data:
library(reshape2) # or library(data.table)
rest_change$rowN <- 1:nrow(rest_change)
rest_change <- melt(rest_change, id.var = "rowN")
ggplot(rest_change,aes(x = rowN, y = value)) + geom_bar(stat = "identity") + facet_wrap(~ variable)
I'm trying to produce this plot here:
But the order on the right side gets mixed up.
In each horizontal line there are two stacked barplots, one in the positive and one in the negative direction. Each one has its own data frame: df1 for the left side, df2 for the right side. The middle category is split in half, with one half on the left side and the other half on the right.
I tried to reorder the factor df2$level, which holds the order for the right-side barplot, but it didn't change a thing (of course I took the order = as.numeric(level) out of the ggplot2 call).
df2$level <- factor(df2$level, levels=rev(levels(df2$level)))
df2$level
Here is the example-data:
library("plyr")
library("dplyr")
library("stringr")
library("ggplot2")
# example data
Variable<-c("1","1","1","1","1","2","2","2","2","2","3","3","3","3","3","4","4","4","4","4")
level<-c(5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1)
perc_w<-c(3.70,11.80,10.10,25.80,38.60,2.00,16.90,13.25,28.80,25.80,1.80,6.50,9.35,33.60,39.40,3.50,12.40,14.10,34.80,21.10)
df<-data.frame(Variable,level,perc_w)
df$perc_w<-as.numeric(df$perc_w)
df$level<-as.factor(df$level)
# item text
items<-c("~ It´s not known, if climate change is real",
"~ In my opinion, the risks of climate change are exaggerated by activists",
"~ Climate change is not as dangerous as it is claimed",
"~ I´m convinced that we can handle climate change")
df$Variable<-as.character(df$Variable)
df$Variable[df$Variable==1]<-items[1]
df$Variable[df$Variable==2]<-items[2]
df$Variable[df$Variable==3]<-items[3]
df$Variable[df$Variable==4]<-items[4]
df$Variable<-as.ordered(df$Variable)
# calculate halves of the neutral category
df.split <-df %>% filter(level==3) %>% mutate(perc_w=as.numeric(perc_w/2))
# replace old neutral-category
df<-df %>% filter(!level==3)
df<-full_join(df,df.split) %>% arrange(level) %>% arrange(desc(Variable))
#split dataframe
df1<-df %>% filter(level == 3 | level== 2 | level==1)
df2<-df %>% filter(level == 5 | level== 4 | level==3) %>% mutate(perc_w = perc_w *-1)
# automatic line break
df1$Variable <-str_wrap(df1$Variable, width = 41)
df2$Variable <-str_wrap(df2$Variable, width = 41)
# reorder factor "Variable"
df1$Variable <- factor(df1$Variable, levels=rev(unique(df1$Variable)))
df2$Variable <- factor(df2$Variable, levels=rev(unique(df2$Variable)))
#Plot
p<-ggplot() +
geom_bar(data=df1, aes(x = Variable, y=perc_w, fill = level, order = -as.numeric(level)),position="stack", stat="identity") +
geom_bar(data=df2, aes(x = Variable, y=perc_w, fill = level, order = as.numeric(level)),position="stack", stat="identity") +
geom_hline(yintercept = 0, color =c("black"))+
theme_bw() +
coord_flip() +
guides(fill=guide_legend(title="",reverse=TRUE)) +
scale_fill_brewer(palette="Blues", name="",labels=c("--","-","0","+","++")) +
labs(title=expression(atop(bold("Attitudes towards climate change"),
atop(italic("Some roughly translated items"),""))),
y="percentages",x="") +
theme(legend.position="top",
axis.ticks = element_blank(),
plot.title = element_text(size=25),
axis.title.y=element_text(size=16),
axis.text.y=element_text(size=13),
axis.title.x=element_text(size=16),
axis.text.x=element_text(size=13),
legend.title=element_text(size=14),
legend.text=element_text(size=12)
)
p
Shamelessly taking a hint from this SO question:
ggplot will plot the stacked bars in the order it encounters them when using stat = "identity".
So, adding
df1 <- df1 %>% group_by(Variable) %>% arrange(desc(level))
df2 <- df2 %>% group_by(Variable) %>% arrange(level)
just before your plot code should give you the desired results.
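A caveat for newer ggplot2 versions: as far as I know, since 2.2.0 the stacking order is taken from the grouping (here the levels of the fill factor) rather than from row order, so arrange() alone may have no effect there. In that case, reordering the factor levels or using position_stack(reverse = TRUE) should do it. A minimal sketch on made-up toy data (toy, p_default and p_reversed are illustrative names, not from the question):

```r
library(ggplot2)

# Toy data (made up for illustration): one bar with two stacked segments
toy <- data.frame(
  item  = "A",
  level = factor(c("1", "2"), levels = c("1", "2")),
  perc  = c(10, 20)
)

# Default stacking order
p_default <- ggplot(toy, aes(x = item, y = perc, fill = level)) +
  geom_col(position = position_stack())

# Same data, stack order flipped
p_reversed <- ggplot(toy, aes(x = item, y = perc, fill = level)) +
  geom_col(position = position_stack(reverse = TRUE))
```

For the plot in the question, that would presumably mean passing position = position_stack(reverse = TRUE) to only one of the two geom_bar layers, whichever side comes out in the wrong order.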