How to overlap bars with geom_bar in ggplot2?

I would like to create a bar chart with ggplot in R.
The sample data is as follows:
Name <- c('Sample1', 'Sample2', 'Sample3')
Total <- c(86020045,30974095,1520609)
Part <- c(41348957, 2956650, 595121)
DT <- data.frame(Name,Total,Part)
DT
library(ggplot2)
ggplot(DT, aes(Name, Total, fill=Name)) +
geom_bar(position="stack",stat="identity")
What I would like to show is a stacked bar chart that shows each Name's Total count, shows the Part count within the bar, and labels the Part's percentage in the middle of the bar.
Is there any way possible to do this? I've been searching on here but haven't been able to find a solution.

Oh... it seems someone already posted an answer while I was writing mine. I'll post it anyway since it's slightly different.
library(ggplot2)
library(reshape2)
# Part0 is the rest of each bar (Total minus Part)
DT <- transform(DT, Part0 = Total - Part)
# long format: one row per Name and piece (Part / Part0)
DT2 <- melt(DT, id.vars = c("Name", "Total"))
# each piece as a percentage of its bar's Total
DT2 <- transform(DT2, perc = value/Total * 100)
ggplot(DT2, aes(Name, perc, fill=variable)) +
geom_bar(position="stack",stat="identity") +
geom_text(data = subset(DT2, variable == "Part"), aes(y = (perc),
label = paste0("Total = ", Total, "\n",
"Part = ", value, "\n",
round(perc, 1), "%\n")))
If you use value instead of perc, the bar heights reflect the actual counts, but since the total for Sample3 is much smaller than for Sample1, its bar would be difficult to read. So I decided to use percentages instead of the actual values.
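The percentage labels are simply Part / Total * 100 for each bar. Below is a minimal base R sketch of that arithmetic on the question's data. The `label_y` and `label` columns are hypothetical additions, not part of either post: `label_y` is the middle of the Part segment on a 0-100 scale, assuming Part is stacked at the bottom of the bar (the stacking order depends on the factor levels of the fill variable).

```r
# Question's sample data
Name  <- c("Sample1", "Sample2", "Sample3")
Total <- c(86020045, 30974095, 1520609)
Part  <- c(41348957, 2956650, 595121)
DT <- data.frame(Name, Total, Part, stringsAsFactors = FALSE)

# Share of each bar taken by Part, in percent
DT$perc <- DT$Part / DT$Total * 100

# Hypothetical label position: middle of the Part segment,
# assuming Part is stacked at the bottom of each bar.
# (If Part is stacked on top instead, use 100 - DT$perc / 2.)
DT$label_y <- DT$perc / 2

# Text to print inside the bar, rounded to one decimal
DT$label <- paste0(round(DT$perc, 1), "%")
print(DT[, c("Name", "perc", "label_y", "label")])
```

The `label_y` column could then be passed to `geom_text(aes(y = label_y, label = label))`. Alternatively, ggplot2's `position_stack(vjust = 0.5)` in `geom_text` centres labels within stacked segments without precomputing positions, provided the text layer uses the same data and grouping as the bars.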

Related

Horizontal Group Bar Chart - How to scale to 100% and how to specify the order of the layers

So I have the following code, which produces a horizontal grouped bar chart.
The issue here is twofold:
The group bar chart automatically places the highest value on the top (i.e. for avenue 4 CTP is on top), whereas I would always want FTP to be shown first then CTP to be shown after (so always blue bar then red bar)
I need all of the values to scale to 100 or 100% for their respective group (so for CTP avenue 4 would have a huge bar graph but the other avenues should be extremely tiny)
I am new to R/Stack Overflow, so sorry if anything is wrong or you need more information, but any help is greatly appreciated.
library(ggplot2)
library(tidyverse)
library(magrittr)
# function to specify decimals
specify_decimal <- function(x, k) trimws(format(round(x, k), nsmall=k))
# sample data
avenues <- c("Avenue1", "Avenue2", "Avenue3", "Avenue4")
flytip_amount <- c(1000, 2000, 1500, 250)
collection_amount <- c(5, 15, 10, 2000)
# create data frame from the sample data
df <- data.frame(avenues, flytip_amount, collection_amount)
# got it working - now to test
df3 <- df
SumFA <- sum(df3$flytip_amount)
df3$FTP <- (df3$flytip_amount/SumFA)*100
df3$FTP <- specify_decimal(df3$FTP, 1)
SumCA <- sum(df3$collection_amount)
df3$CTP <- (df3$collection_amount/SumCA)*100
df3$CTP <- specify_decimal(df3$CTP, 1)
# Now we have percentages remove whole values
df2 <- df3[,c(1,4,5)]
df2 <- df2 %>% pivot_longer(-avenues)
FTGraphPos <- df2$name
ggplot(df2, aes(x = avenues, fill = as.factor(name), y = value)) +
geom_col(position = "dodge", width = 0.75) + coord_flip() +
labs(title = "Flytipping & Collection %", x = "ward_name", y = "Percentageperward") +
geom_text(aes(x= avenues, label = value), vjust = -0.1, position = "identity", size = 5)
I have tried the above and looked at lots of tutorials, but nothing does exactly what I need: keeping the bars within each group in the same order regardless of their values, and scaling each measure to 100%.
As Camille notes, to handle ordering of the categories in a plot, you need to set them as factors, and then use functions from the forcats package to handle the order. Here I am using fct_relevel() (note that it will automatically convert character variables to factors).
Your numeric values are in fact set to character, so they need to be set to numeric for the chart to make sense.
To cover point #2, I'm using group_by() to calculate percentages within each name.
I have also fixed the labels so that they are dodged along with the bars. Also note that you don't need to load ggplot2 or magrittr separately if you load tidyverse: ggplot2 is attached with it, and the %>% pipe is available through dplyr.
df_plot <- df2 |>
mutate(name = fct_relevel(name, "CTP"),
value = as.numeric(value)) |>
group_by(name) |>
mutate(perc = value / sum(value)) |>
ungroup()
ggplot(df_plot, aes(x = value, y = avenues, fill = name)) +
geom_col(position = "dodge", width = 0.75) +
geom_text(aes(label = value), position = position_dodge(width = 0.75), size = 5) +
labs(title = "Flytipping & Collection %", x = "Percentageperward", y = "ward_name") +
guides(fill = guide_legend(reverse = TRUE))
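The same two steps (fixing the bar order and scaling within each measure) can also be sketched in base R, without forcats or dplyr. This is a minimal sketch, assuming the long data frame is built by hand from the question's raw amounts as a stand-in for `df2`; the names `long`, `amount`, `perc` and `label` are illustrative only.

```r
# Question's sample data in long form (one row per avenue and measure)
long <- data.frame(
  avenues = rep(c("Avenue1", "Avenue2", "Avenue3", "Avenue4"), times = 2),
  name    = rep(c("FTP", "CTP"), each = 4),
  amount  = c(1000, 2000, 1500, 250,   # flytip_amount
              5, 15, 10, 2000),        # collection_amount
  stringsAsFactors = FALSE
)

# 1) The factor levels, not the values, decide the order of the dodged bars
long$name <- factor(long$name, levels = c("CTP", "FTP"))

# 2) Scale within each measure so every measure sums to 100 (kept numeric)
long$perc <- ave(long$amount, long$name, FUN = function(v) v / sum(v) * 100)

# Round only for the text labels, so perc itself stays numeric
long$label <- sprintf("%.1f", long$perc)
print(long)
```

Keeping `perc` numeric and formatting only the label avoids the character-valued axis the question ran into; `long` could be plotted with the answer's call using `x = perc` and `label = label`.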

How to make funnel chart with bars in R ggplot2?

I want to make a funnel chart in R with ggplot2 as following:
https://chartio.com/assets/c15a30/tutorials/charts/funnel-charts/c7cd4465bc714689646515692b6dbe7c74ae7550a265cd2d6a530f1f34d68ae1/funnel-chart-example.png
My code looks like this, but I don't know how to draw the light blue fills between the bars (maybe with polygon?).
library(ggplot2)
library(reshape2) # for melt()
library(dplyr)
# get data
dat <- read.table(text=
"steps numbers rate
clicks 332835 100.000000
signup 157697 47.379933
cart 29866 8.973215
buys 17012 5.111241",
header = TRUE)
barWidth <- 0.9
# add spacing, melt, sort
total <- subset(dat, rate==100)$numbers
dat$padding <- (total - dat$numbers) / 2
molten <- melt(dat[, -3], id.vars='steps')
molten <- molten[order(molten$variable, decreasing = T), ]
molten$steps <- factor(molten$steps, levels = rev(dat$steps))
ggplot(molten, aes(x=steps)) +
geom_bar(aes(y = value, fill = variable),
stat='identity', position='stack') +
geom_text(data=dat,
aes(y=total/2, label= paste(round(rate), '%')),
color='white') +
scale_fill_manual(values = c('grey40', NA) ) +
coord_flip() +
theme(legend.position = 'none') +
labs(x='steps', y='volume')
I needed the same but hadn't found one, so I created a function to do so. It might need some improvements, but it is working well. The example below shows only numbers, but you can also add text.
x <- c(86307,
34494,
28127,
17796,
12488,
11233
)
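# source() on the gist's web page would read HTML, not R code;
# devtools::source_gist() downloads the gist's R file and runs it
devtools::source_gist("fd14b58becab4924befef5be239c6011")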
gg_funnel(x, color = viridisLite::plasma(6))
This should be just a comment, since you explicitly asked for a ggplot solution, which this is not - I posted it as an answer purely for reasons of code formatting.
You could consider plotly, which has a funnel type. Something like
library(plotly)
dat %>% mutate(steps=factor(steps, unique(steps)),
rate=sprintf("%.2f%%", rate)) %>%
plot_ly(
type = "funnel",
y = ~steps,
text= ~rate,
x = ~numbers)
could get you started; I do not really grasp the padding you have in your data, so this might not be exactly what you want.
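Following the question's own "maybe with polygon" idea, the light fills are trapezoids joining the bottom edge of one bar to the top edge of the next. A minimal base R sketch of the vertex calculation follows, assuming the question's layout: levels `rev(dat$steps)` (so "clicks" sits at position 4 on the discrete axis), `barWidth <- 0.9`, and each bar centred on `total / 2`. `make_connectors` is a hypothetical helper, not part of any package.

```r
# Funnel data from the question
dat <- data.frame(
  steps   = c("clicks", "signup", "cart", "buys"),
  numbers = c(332835, 157697, 29866, 17012),
  stringsAsFactors = FALSE
)
barWidth <- 0.9
total <- max(dat$numbers)  # the first step's volume (rate == 100 in the question)

# Position on the discrete axis: with levels = rev(steps), clicks = 4, buys = 1
dat$pos   <- rev(seq_len(nrow(dat)))
dat$lower <- (total - dat$numbers) / 2  # same as the question's padding
dat$upper <- dat$lower + dat$numbers

# Hypothetical helper: one 4-vertex trapezoid per gap between consecutive steps
make_connectors <- function(d, width) {
  half <- width / 2
  out <- lapply(seq_len(nrow(d) - 1), function(i) {
    a <- d[i, ]      # upper step (higher position)
    b <- d[i + 1, ]  # next step down the funnel
    data.frame(
      id = i,
      x  = c(a$pos - half, a$pos - half, b$pos + half, b$pos + half),
      y  = c(a$lower,      a$upper,      b$upper,      b$lower)
    )
  })
  do.call(rbind, out)
}

connectors <- make_connectors(dat, barWidth)
print(connectors)
```

The trapezoids could then be added to the question's plot with something like `geom_polygon(data = connectors, aes(x = x, y = y, group = id), fill = "lightblue", inherit.aes = FALSE)`; ggplot2 generally accepts numeric positions on a discrete axis, but this has not been checked against every version.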

Add labels for selected observations in ggplot2 histogram at the same height as the bins

I'd like to add an "id" annotation to certain observations in a histogram.
So far, I'm able to add the annotation with no problem, but I'd like the 'y' position of my annotations to be the count of the bin + 1 (for aesthetic reasons).
This is what I have so far:
library(tidyverse)
library(ggrepel)
selected_obs <- c("S10", "S100", "S245", "S900")
set.seed(0)
values <- rnorm(1000)
plot_df <- tibble(id = paste0("S", 1:1000),
values = values) %>%
mutate(obs_labels = ifelse(id %in% selected_obs, id, NA))
ggplot(plot_df, aes(values)) +
geom_histogram(binwidth = 0.3, color = "white") +
geom_label_repel(aes(label = obs_labels, y = 100))
I've seen multiple answers dealing with annotating the count for each bin using geom_text(stat = "count", aes(y = ..count.., label = ..count..)).
Based on that, I've tried these two workarounds, with no success:
geom_label_repel(stat = "count", aes(label = obs_labels, y = ..count..)) yields:
"Error: geom_label_repel requires the following missing aesthetics: label"
geom_label_repel(aes(label = obs_labels, y = ..count..)) yields "Error: Aesthetics must be valid computed stats. Problematic aesthetic(s): y = ..count...
Did you map your stat in the wrong layer?".
Anybody that can shed some light here?
That may be a mildly misleading visualisation, because you are labelling a unique ID, but by positioning the label at the count height you suggest that this ID was counted that often. Anyway.
The most straightforward option is to manually calculate the bin to which your ID belongs, count the observations in that bin, and then use this data to set the x and y for your labels.
Unfortunately, I have to use R online and cannot create a nice reprex, so I am including a screenshot instead. The code should still be reproducible, as it runs online.
library(tidyverse)
library(ggrepel)
selected_obs <- c("S10", "S100", "S245", "S900")
set.seed(0)
values <- rnorm(1000)
plot_df <- tibble(id = paste0("S", 1:1000),
values = values) %>%
mutate(obs_labels = ifelse(id %in% selected_obs, id, NA),
bins = as.factor( as.numeric( cut(values, 30)))) # cutting into 30 bins
label_df<- plot_df %>% filter(id %in% selected_obs) %>% left_join(plot_df, by = 'bins') %>%
group_by(values = values.x, obs_labels = obs_labels.x) %>% count
ggplot(plot_df, aes(values)) +
geom_histogram(color = "white") + # removed your binwidth argument so that it defaults to 30 bins
geom_label(data = label_df, aes(label = obs_labels, y = n))
The label positions are not quite perfect - this is because I chose to cut into 30 equal bins and the binning may be slightly different between cut and histogram. This may need some tweaking, depending on the size of your bins, and if you include upper/lower margins.
P.S. Credit to cut into equal bins goes to this answer by user pedrosaurio
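One way to avoid the mismatch between cut() and the histogram's own binning is to compute one set of breaks and use it for both. A minimal base R sketch, assuming bins of width 0.3 on a grid anchored at multiples of 0.3 and right-closed bins (ggplot2's default `closed = "right"`); `brks`, `bin` and `counts` are illustrative names.

```r
set.seed(0)
values <- rnorm(1000)
ids <- paste0("S", 1:1000)
selected_obs <- c("S10", "S100", "S245", "S900")

binwidth <- 0.3
# Assumed grid of breaks covering all values; the same vector would be
# passed to geom_histogram(breaks = brks) so bars and labels agree
brks <- seq(floor(min(values) / binwidth), ceiling(max(values) / binwidth)) * binwidth

# Bin index of every value, using right-closed bins (a, b]
bin <- findInterval(values, brks, left.open = TRUE, rightmost.closed = TRUE)
counts <- tabulate(bin, nbins = length(brks) - 1)

# One label row per selected id, placed one unit above its bin's count
selected <- ids %in% selected_obs
label_df <- data.frame(
  obs_labels = ids[selected],
  values     = values[selected],
  n          = counts[bin[selected]] + 1,
  stringsAsFactors = FALSE
)
print(label_df)
```

With the question's `plot_df`, the plot would then be along the lines of `ggplot(plot_df, aes(values)) + geom_histogram(breaks = brks, color = "white") + geom_label_repel(data = label_df, aes(y = n, label = obs_labels))`, so the labels sit on the same bins ggplot2 draws.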

Highlight positions without data in facet_wrap ggplot

When facetting barplots in ggplot the x-axis includes all factor levels. However, not all levels may be present in each group. In addition, zero values may be present, so from the barplot alone it is not possible to distinguish between x-axis values with no data and those with zero y-values. Consider the following example:
library(tidyverse)
set.seed(43)
site <- c("A","B","C","D","E") %>% sample(20, replace=T) %>% sort()
year <- c("2010","2011","2012","2013","2014","2010","2011","2012","2013","2014","2010","2012","2013","2014","2010","2011","2012","2014","2012","2014")
isZero = rbinom(n = 20, size = 1, prob = 0.40)
value <- ifelse(isZero==1, 0, rnorm(20,10,3)) %>% round(0)
df <- data.frame(site,year,value)
ggplot(df, aes(x=year, y=value)) +
geom_bar(stat="identity") +
facet_wrap(~site)
This is fish census data, where not all sites were fished in all years, and sometimes no fish were caught. Hence the need to distinguish between the two situations. For example, there was no catch at site C in 2010 and it was not fished in 2011, but the reader cannot tell the difference. I would like to add something like "no data" to the plot for 2011. Maybe it is possible to fill in the rows where data is missing, generate another column with the text to be added, and then include this via geom_text?
So here is an example of your proposed method:
# Tabulate sites vs year, take zero entries
tab <- table(df$site, df$year)
idx <- which(tab == 0, arr.ind = T)
# Build new data.frame
missing <- data.frame(site = rownames(tab)[idx[, "row"]],
year = colnames(tab)[idx[, "col"]],
value = 1,
label = "N.D.") # For 'no data'
ggplot(df, aes(year, value)) +
geom_col() +
geom_text(data = missing, aes(label = label)) +
facet_wrap(~site)
Alternatively, you could also let the facets omit unused x-axis values:
ggplot(df, aes(x=year, y=value)) +
geom_bar(stat="identity") +
facet_wrap(~site, scales = "free_x")
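The question's idea of filling in the missing rows and adding a label column can also be done with a join instead of table(). A minimal base R sketch using expand.grid() and merge(all.x = TRUE); the small census below is hypothetical data, chosen so that both situations (fished with zero catch, and not fished) occur.

```r
# Hypothetical census: site B was only fished in 2010,
# site C was not fished in 2011 and caught nothing in 2010
df <- data.frame(
  site  = c("A", "A", "A", "B", "C", "C"),
  year  = c("2010", "2011", "2012", "2010", "2010", "2012"),
  value = c(5, 0, 4, 7, 0, 3),
  stringsAsFactors = FALSE
)

# Every site/year combination that can appear on the facets
grid <- expand.grid(site = sort(unique(df$site)), year = sort(unique(df$year)),
                    stringsAsFactors = FALSE)

# Left join: combinations that were never fished get value = NA
full <- merge(grid, df, all.x = TRUE)

# Label only the truly missing rows; real zero catches keep value 0
full$label <- ifelse(is.na(full$value), "N.D.", NA_character_)
print(full)
```

The labels could then be drawn as in the answer above, for example with `geom_text(data = subset(full, is.na(value)), aes(y = 1, label = label))`.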

Creating Relative Frequency Table and Graph in R

Below is the dataset.
https://docs.google.com/spreadsheet/ccc?key=0AjmK45BP3s1ydEUxRWhTQW5RczVDZjhyell5dUV4YlE#gid=0
Code:
counts = table(finaldata$satjob, finaldata$degree)
barplot(counts, xlab="Highest Degree after finishing 9-12th Grade",col = c("Dark Blue","Blueviolet","deepPink4","goldenrod"), legend =(rownames(counts)))
The below barplot is the result of the above code.
https://docs.google.com/file/d/0BzmK45BP3s1yVkx5OFlGQk5WVE0/edit
Now I want to create the plot for the relative frequency table of "counts".
To create a relative frequency table, I need to divide each cell by its column total to get the relative frequency for that cell, and likewise for all the other cells. How do I go about doing it?
I have tried counts/sum(counts), but this divides every cell by the grand total, which is not what I want. counts[1:4]/sum(counts[1:4]) gives me the relative frequency of the first column.
Help me obtain the same for other columns as well in the same table.
I'm a big fan of plyr & ggplot2, so you may have to download a few packages for the below to work.
install.packages('ggplot2') # only have to run once
install.packages('plyr') # only have to run once
install.packages('scales') # only have to run once
library(plyr)
library(ggplot2)
library(scales)
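# dat <- YOUR DATA (one row per respondent, with columns degree and satjob)
# count the rows for each degree/satjob combination (adds a freq column)
dat_count <- count(dat, c("degree", "satjob"))
# divide each count by its degree total to get the relative frequency
dat_rel_freq <- ddply(dat_count, .(degree), transform, rel_freq = freq/sum(freq))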
ggplot(dat_rel_freq, aes(x = degree, y = rel_freq, fill = satjob)) +
geom_bar(stat = 'identity') +
scale_y_continuous(labels = percent) +
labs(title = 'Highest Degree After finishing 9-12th Grade\n',
x = '',
y = '',
fill = 'Job Satisfaction')
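If you prefer to stay with the question's base R table, the column-wise division it asks for is exactly what prop.table() does with `margin = 2`. A minimal sketch; the small satjob/degree vectors below are hypothetical stand-ins for `finaldata`.

```r
# Hypothetical data standing in for finaldata$satjob and finaldata$degree
satjob <- c("satisfied", "satisfied", "dissatisfied", "satisfied",
            "dissatisfied", "dissatisfied", "satisfied", "satisfied")
degree <- c("HS", "HS", "HS", "BA", "BA", "MA", "MA", "MA")
counts <- table(satjob, degree)

# margin = 2 divides every cell by its column total, so each column sums to 1
rel_freq <- prop.table(counts, margin = 2)
print(round(rel_freq, 2))
```

The result can be drawn like the original chart, e.g. `barplot(rel_freq, legend = rownames(rel_freq))`, giving stacked bars that all reach 1.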
