I am trying to obtain a bar plot where each bar is made of a pile of squares, so that it is easier to count the observations in each bar. Here is a minimal example:
library(ggplot2)
library(dplyr) # provides the %>% pipe
d = data.frame(p = rbinom(20,10,.3))
d %>% ggplot(aes(x=p))+geom_bar(fill="white",color="black",position="stack",alpha=.5)+
theme_void()
Which gives something like:
Basically, I'd like to have horizontal separating lines at every unit on the bars.
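One approach is to number the observations within each value of p and map that number to fill, so that geom_bar draws one outlined segment per observation:
library(ggplot2)
library(dplyr)
#Data
d = data.frame(p = rbinom(20,10,.3))
#Plot
d %>%
  group_by(p) %>%
  mutate(col=row_number()) %>%
  ggplot(aes(x=p,fill=factor(col)))+
  geom_bar(position="stack",alpha=.5,color='black')+
  theme_void()+
  #One white per fill level; nrow(d) is always enough (a fixed 5 fails for taller bars)
  scale_fill_manual(values=rep('white',nrow(d)))+
  theme(legend.position = 'none')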
Output: [image not preserved: each bar drawn as a stack of outlined unit squares]
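A variant of the same idea, as a minimal sketch: map the within-bar row number to the group aesthetic instead of fill, so no manual fill scale is needed. The column name idx, the object names and the seed are illustrative assumptions, not part of the answer above.

```r
library(ggplot2)
library(dplyr)

set.seed(42)  # assumed seed, only for reproducibility
d <- data.frame(p = rbinom(20, 10, .3))

# Number the observations within each value of p (idx is a hypothetical name)
d_idx <- d %>%
  group_by(p) %>%
  mutate(idx = row_number()) %>%
  ungroup()

# Within each bar every idx value occurs once, so each group contributes
# a count of 1 and geom_bar stacks one outlined unit segment per observation
p_squares <- ggplot(d_idx, aes(x = p, group = idx)) +
  geom_bar(fill = "white", color = "black", position = "stack") +
  theme_void()

p_squares
```

Because fill is set as a constant rather than mapped, no legend is produced and nothing has to be hidden.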
Say I use the following to produce the bar graph below. How would I only visualize bars where the count is above, say, 20? I'm able to do this kind of filtering with the which function on variables I create or that exist in my data, but I'm not sure how to access/filter the auto-counts generated by ggplot. Thanks.
g <- ggplot(mpg, aes(class))
g + geom_bar()
Aggregating and filtering your data before plotting and using stat = "identity" is probably the easiest solution, e.g.:
library(tidyverse)
mpg %>%
group_by(class) %>%
count %>%
filter(n > 20) %>%
ggplot(aes(x = class, y = n)) +
geom_bar(stat = "identity")
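If you would rather keep the original rows and let geom_bar() do the counting, a possible sketch is to attach the per-class count with dplyr's add_count() and filter on it. The object names mpg_big and p_big are illustrative.

```r
library(dplyr)
library(ggplot2)

# add_count() appends the per-class row count as column n without collapsing rows
mpg_big <- mpg %>%
  add_count(class) %>%
  filter(n > 20)

# geom_bar() does its usual counting on the rows that remain
p_big <- ggplot(mpg_big, aes(class)) +
  geom_bar()

p_big
```

Classes with 20 or fewer rows disappear from the x-axis entirely, because their rows are no longer in the data.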
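Alternatively, you can filter on the counts that ggplot computes itself. Inside aes() they are available as ..count.. (odd-looking, I know; see "Special variables in ggplot (..count.., ..density.., etc.)"), and recent versions of ggplot2 also accept after_stat(count). Wrapping the count in ifelse() turns every count of 20 or less into NA, so those bars are dropped from the plot. Not the prettiest code, but it works: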
g <- ggplot(mpg, aes(class))
g + geom_bar(aes(y = ifelse(..count.. > 20, ..count.., NA)))
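For reference, here is a sketch of the same trick written with after_stat(), which newer ggplot2 releases recommend over the ..count.. notation. The object name p_stat is illustrative.

```r
library(ggplot2)

# count is the variable computed by stat_count; counts of 20 or less become NA
p_stat <- ggplot(mpg, aes(class)) +
  geom_bar(aes(y = after_stat(ifelse(count > 20, count, NA))))

p_stat
```

Unlike filtering the data beforehand, this keeps the small classes on the x-axis with no bar drawn, and ggplot2 warns about the removed rows when the plot is drawn.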
Is there a way to plot percentages using plot_ly? For example, the code below plots the count of each cut in the diamonds dataset:
plot_ly(diamonds, x = ~cut)
I would like to plot the percentage for each cut instead. For example, I need the count of "Good" as a percentage of the total count. Is there a way to get it?
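It could be done like this.
First, compute the percentage for each cut category, dividing by nrow(diamonds) (53940) rather than hard-coding the total:
library(dplyr)
library(plotly) # also attaches ggplot2, which provides diamonds
diamonds %>% group_by(cut) %>% summarize(perc = n()/nrow(diamonds)*100)
Second, pipe the resulting data set to plot_ly():
diamonds %>% group_by(cut) %>% summarize(perc = n()/nrow(diamonds)*100) %>% plot_ly(x = ~cut, y = ~perc, type = "bar")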
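Another option that may be worth trying, sketched here on the assumption that a plotly histogram trace with its histnorm attribute suits your case, is to let plotly normalise the counts itself. The object name p_pct is illustrative.

```r
library(plotly)
library(ggplot2)  # only for the diamonds data set

# histnorm = "percent" asks the histogram trace to show each bar
# as a percentage of all observations instead of a raw count
p_pct <- plot_ly(diamonds, x = ~cut, type = "histogram", histnorm = "percent")

p_pct
```

This avoids the separate summarising step, at the cost of not having the percentages available as a data frame.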
You can use data.table and ggplot2:
library(data.table)
library(ggplot2)
dt <- data.table(diamonds)
Calculate the number of records by each cut, and then calculate the prop.table of those counts:
result <- dt[, .N, by = cut][, .(cut, N, percentCut = prop.table(N))]
Now you can plot it with ggplot2 and use the scales package to get a percent-formatted y-axis:
p <- ggplot(result, aes(x = cut, y = percentCut))+
geom_col()+
scale_y_continuous(labels = scales::percent)
Now you can pass p to plotly, if you want:
plotly::ggplotly(p)
I have a dataset that I want to summarize by calculating the ratio of 2 columns. However, I also need to calculate this ratio by different ‘cuts’ of my data set, i.e., the ratio for the overall data, the ratio by year, the ratio by type, etc.
I will also need to put each ratio calculation in a bar chart.
What I want to know is whether I can plot all these bar charts without having to create a separate summary grouping dataset first.
For example, right now, before I send it to ggplot, I use group_by/summarize to my data first to calculate the ratio. Then I send it to ggplot.
Chart1 <- data %>% group_by(cut1) %>% summarise(ratio=sum(column1)/sum(column2))
ggplot(Chart1, aes(x=cut1, y=ratio)) + geom_bar(stat='identity', fill = "tomato2")
For chart 2 and chart 3, I do the same thing again:
Chart2 <- data %>% group_by(cut2) %>% summarise(ratio=sum(column1)/sum(column2))
ggplot(Chart2, aes(x=cut2, y=ratio)) + geom_bar(stat='identity', fill = "tomato2")
Chart3 <- data %>% group_by(cut3) %>% summarise(ratio=sum(column1)/sum(column2))
ggplot(Chart3, aes(x=cut3, y=ratio)) + geom_bar(stat='identity', fill = "tomato2")
Is there another way to do this? Initially, I was thinking there would be a way that I can just create the ratio once and then I can use it over many times (similar to a calculated field in excel pivot tables). Is there something better than the above method?
Also, if summarizing each ratio separately is the best way, how do I do a facet chart? For example, I may want to do a facet of ratio to cut 1 and cut 2
edit: more info with example using created data:
c1 <- c('a','a','a', 'b','b', 'b', 'c','c','c')
c2 <- c('aa','aa','aa', 'bb','bb', 'bb', 'cc','cc','cc')
v1 <-c(1,2,3,4,5,6,7,8,9)
v2<-c(9,8,7,6,5,4,3,2,1)
mydata <-data.frame(c1,c2,v1,v2)
Chart1 <- mydata %>% group_by(c1) %>% summarise(ratio=sum(v1)/sum(v2))
ggplot(Chart1, aes(x=c1, y=ratio)) + geom_bar(stat='identity', fill = "tomato2") + theme(axis.text.x=element_text(angle=90))
The outcome I want is to understand how best to summarize data before plotting it. Do I need to summarize each calculation by each grouping separately, or is there an easier way?
For the example above, if I wanted to calculate the ratio grouped by c1, then create another ratio chart grouped by c2, and then another grouped by c3, would I need to do 3 different aggregations?
Does this accomplish what you want?
library(tidyverse)
c1 <- c('a','a','a', 'b','b', 'b', 'c','c','c')
c2 <- c('aa','aa','aa', 'bb','bb', 'bb', 'cc','cc','cc')
v1 <-c(1,2,3,4,5,6,7,8,9)
v2<-c(9,8,7,6,5,4,3,2,1)
mydata <-data.frame(c1,c2,v1,v2)
Chart1 <- mydata %>%
gather(key = 'cuts', value = 'categories', -(v1:v2)) %>%
group_by(cuts, categories) %>%
summarise(ratio=sum(v1)/sum(v2))
# This lets you facet them onto the same chart,
# but that doesn't really make sense,
# since the cuts will have different x axes
ggplot(Chart1, aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
facet_grid(cuts~.) +
theme(axis.text.x=element_text(angle=90))
# This lets you make each plot separately
Chart1 %>%
filter(cuts == 'c1') %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
# Use a for loop to save all of the plots to files
# (the last two columns of mydata are v1 and v2, so they are skipped)
for(i in seq_len(length(mydata)-2)){
  p <-
    Chart1 %>%
    filter(cuts == names(mydata)[[i]]) %>%
    ggplot(aes(x=categories, y=ratio)) +
    geom_bar(stat='identity', fill = "tomato2") +
    theme(axis.text.x=element_text(angle=90))
  ggsave(paste0("myPlot",i,".png"), plot = p)
}
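On the "calculated field" idea from the question: one way to define the ratio only once is to wrap the group_by/summarise step in a small function. This is a sketch, not the only way to do it; ratio_by is a hypothetical helper name, and it assumes the value columns are called v1 and v2 as in the example data.

```r
library(dplyr)

mydata <- data.frame(
  c1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  c2 = c("aa", "aa", "aa", "bb", "bb", "bb", "cc", "cc", "cc"),
  v1 = 1:9,
  v2 = 9:1
)

# ratio_by() is a hypothetical helper: the ratio is defined once and
# the grouping column is passed in as an argument
ratio_by <- function(data, cut) {
  data %>%
    group_by({{ cut }}) %>%
    summarise(ratio = sum(v1) / sum(v2), .groups = "drop")
}

by_c1 <- ratio_by(mydata, c1)
by_c2 <- ratio_by(mydata, c2)
```

Each result can then be passed to ggplot as before, for example ggplot(by_c1, aes(x = c1, y = ratio)).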
The only thing I wasn't sure about is how to facet the different cuts when they don't share the same values on the x-axis. If you just want to stack the plots on top of each other, you could use the gridExtra package:
library(gridExtra)
plot1 <- Chart1 %>%
filter(cuts == 'c1') %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
plot2 <- Chart1 %>%
filter(cuts == 'c2') %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
grid.arrange(plot1, plot2, ncol=1, nrow = 2)
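For the faceting question, a possible sketch is facet_wrap() with scales = "free_x", which gives each panel its own x-axis so that only the categories belonging to that cut are shown. It uses pivot_longer(), the newer tidyr counterpart of gather(); the object names ratios and p_facets are illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

mydata <- data.frame(
  c1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  c2 = c("aa", "aa", "aa", "bb", "bb", "bb", "cc", "cc", "cc"),
  v1 = 1:9,
  v2 = 9:1
)

# Long format: one row per (original row, cut), then one ratio per cut/category
ratios <- mydata %>%
  pivot_longer(cols = c(c1, c2), names_to = "cuts", values_to = "categories") %>%
  group_by(cuts, categories) %>%
  summarise(ratio = sum(v1) / sum(v2), .groups = "drop")

# free_x lets every panel keep only its own categories on the x-axis
p_facets <- ggplot(ratios, aes(x = categories, y = ratio)) +
  geom_col(fill = "tomato2") +
  facet_wrap(~ cuts, scales = "free_x")

p_facets
```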
Hi, I'm having issues with a stacked bar chart.
The goal is a bar chart that shows the sum of products sold, stacked on top of each other. I have that working, but the products are not grouped together: instead of one big block per product, each bar is split into many small pieces. I need some way to aggregate the count so that it is summed per product, and then I can put the blocks in some sort of order.
library(ggplot2)
library(plyr) #Is this automatically loaded with ggplot2?
library(dplyr)
salesMixData <- read.csv("SalesMix.csv", stringsAsFactors = FALSE, header = TRUE)
productMix <- salesMixData[,c(1,6,7)]
ggplot(productMix, aes(x=JoinMonthYear, y=Count,fill=Prod)) +
geom_bar(stat='identity') +
theme(axis.text.x = element_text(angle=60, hjust = 1),legend.position="bottom")
The output looks like the following:
You probably want to summarise the data first, calculating an aggregate sum for each combination of JoinMonthYear and Prod.
Here's an example with a dummy data set:
library(ggplot2)
library(dplyr)
d <- data.frame(x=sample(20, 1000, replace=TRUE),
                count=rpois(1000, 10),
                grp=sample(LETTERS[1:10], 1000, replace=TRUE))
This is equivalent to what you're seeing:
ggplot(d, aes(x=x, y=count, fill=grp)) +
geom_bar(stat='identity')
Grouping the observations (in your case by JoinMonthYear and Prod), and then summarising to the groups' sums, should get you what you're after:
d %>%
group_by(x, grp) %>%
summarise(sum_count=sum(count, na.rm=TRUE)) %>%
ggplot(aes(x=x, y=sum_count, fill=grp)) +
geom_bar(stat='identity')
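The same aggregation can be written more compactly with dplyr's count() and its wt argument, and the "some sort of order" part of the question can be handled by reordering the fill factor. This is a sketch on dummy data; the seed, the object names agg and p_sorted, and the choice to order groups by their overall total are assumptions.

```r
library(ggplot2)
library(dplyr)

set.seed(1)  # assumed seed, only for reproducibility
d <- data.frame(x = sample(20, 1000, replace = TRUE),
                count = rpois(1000, 10),
                grp = sample(LETTERS[1:10], 1000, replace = TRUE))

agg <- d %>%
  # count() with wt sums the count column within each x/grp combination
  count(x, grp, wt = count, name = "sum_count") %>%
  # order the groups by their overall total, smallest first
  mutate(grp = reorder(grp, sum_count, FUN = sum))

p_sorted <- ggplot(agg, aes(x = x, y = sum_count, fill = grp)) +
  geom_col()

p_sorted
```

Since the stacking order follows the factor levels of grp, each bar now has its blocks in the same total-based order.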
I want to plot two stacked histograms that share a common x-axis. I want the second histogram to be plotted as the inverse(pointing downward) of the first. I found this post that shows how to plot the stacked histograms (How to plot multiple stacked histograms together in R?). For the sake of simplicity, let's say I just want to plot that same histogram, on the same x-axis but facing in the negative y-axis direction.
You could count up the cases and then multiply the count by -1 for one category. Here is an example with data.table and ggplot2:
library(data.table)
library(ggplot2)
# fake data
set.seed(123)
dat <- data.table(value = factor(sample(1:5, 200, replace=TRUE)),
                  category = sample(c('a', 'b'), 200, replace=TRUE))
# count by val/category; cat b as negative
plot_dat <-
dat[, .(N = .N * ifelse(category=='a', 1, -1)),
by=.(value, category)]
# plot
ggplot(plot_dat, aes(x=value, y=N, fill=category)) +
geom_bar(stat='identity', position='identity') +
theme_classic()
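One cosmetic follow-up, offered as a sketch: since the downward bars are only negative as a drawing device, the y-axis labels can be shown as absolute values by passing abs as the labels function. The two-step sign flip and the name p_mirror are illustrative choices.

```r
library(data.table)
library(ggplot2)

# fake data, as above
set.seed(123)
dat <- data.table(value = factor(sample(1:5, 200, replace = TRUE)),
                  category = sample(c("a", "b"), 200, replace = TRUE))

# count by value/category, then flip the sign for category b
plot_dat <- dat[, .(N = .N), by = .(value, category)]
plot_dat[category == "b", N := -N]

p_mirror <- ggplot(plot_dat, aes(x = value, y = N, fill = category)) +
  geom_col(position = "identity") +
  scale_y_continuous(labels = abs) +  # show counts as positive on both sides
  theme_classic()

p_mirror
```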
You can try something like this:
library(ggplot2)
ggplot() +
  stat_bin(data = diamonds,aes(x = depth)) +
  # negating the computed count flips the second histogram below the axis
  stat_bin(data = diamonds,aes(x = depth,y = -..count..))
Responding to the additional comment:
library(dplyr)
library(tidyr)
d1 <- diamonds %>%
select(depth,table) %>%
gather(key = grp,value = val,depth,table)
ggplot() +
stat_bin(data = d1,aes(x = val,fill = grp)) +
stat_bin(data = diamonds,aes(x = price,y = -..count..))
Visually, that's a bad example because the scales of the variables are all off, but that's the general idea.
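To make the picture easier to read, a possible sketch is to pick two variables on a comparable scale and give both histograms the same binwidth so the bins line up. The choice of depth and table, the binwidth of 1, the colours and the name p_hist are my assumptions; after_stat(-count) is the newer spelling of -..count.. in ggplot2.

```r
library(ggplot2)

p_hist <- ggplot(diamonds) +
  # upward histogram of depth
  geom_histogram(aes(x = depth), binwidth = 1, fill = "steelblue") +
  # downward histogram of table: the computed count is negated
  geom_histogram(aes(x = table, y = after_stat(-count)),
                 binwidth = 1, fill = "tomato2") +
  labs(x = "value", y = "count")

p_hist
```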