R ggplot2: place value of column on top of stacked bars

There might be a duplicate, but I do not find an answer that applies to my particular case...
I just have a very simple data frame like the one below, with counts in two columns (Number_NonHit_Cells, Number_Hit_Cells) that I want to show in stacked bars, having the value of another column (Freq) placed on top of the stacked bars.
The MWE below is the best I have been able to get so far, but I only need the value of Freq once, and at the very top of the bars combined...
It would even be better if Freq could be calculated inside the ggplot2 call.
This is my MWE:
clono_df_long <- data.frame(Clonotype=LETTERS[1:5], Number_Hit_Cells=c(234,56,568,34,46),
Number_NonHit_Cells=c(c(52,12,234,21,31)))
clono_df_long$Clonotype_Size <- clono_df_long$Number_Hit_Cells+clono_df_long$Number_NonHit_Cells
clono_df_long$Freq <- round(clono_df_long$Number_Hit_Cells/clono_df_long$Clonotype_Size,4)*100
clono_df_long <- as.data.frame(tidyr::pivot_longer(clono_df_long,
-c(Clonotype,Clonotype_Size,Freq),
names_to = "Cells", values_to = "Value"))
clono_df_long$Clonotype <- factor(clono_df_long$Clonotype, levels=unique(clono_df_long$Clonotype))
clono_df_long$Cells <- factor(clono_df_long$Cells, levels=c('Number_NonHit_Cells','Number_Hit_Cells'))
P <- ggplot2::ggplot(clono_df_long, ggplot2::aes(x=Clonotype, y=Value, fill=Cells)) +
ggplot2::geom_bar(stat="identity") +
ggplot2::scale_fill_manual(values=c('gray70', 'gray40')) +
ggplot2::geom_text(ggplot2::aes(label=paste0(Freq,'%')), vjust=-1) +
ggplot2::theme_light()
grDevices::pdf(file='test.pdf', height=6, width=6)
print(P)
grDevices::dev.off()
Which produces this:

You may try
clono_df_long$Freq <- ifelse(clono_df_long$Cells == "Number_NonHit_Cells", clono_df_long$Freq, NA)
ggplot2::ggplot(clono_df_long, ggplot2::aes(x=Clonotype, y=Value, fill=Cells)) +
ggplot2::geom_bar(stat="identity") +
ggplot2::scale_fill_manual(values=c('gray70', 'gray40')) +
#ggplot2::geom_text(ggplot2::aes(label=paste0(Freq,'%')), vjust=-1) +
ggplot2::theme_light() +
ggplot2::geom_text(ggplot2::aes(label = scales::percent(Freq/100)), position = "stack", na.rm = TRUE)
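For the follow-up wish in the question (not carrying Freq around in the long data), one option is to give geom_text its own data frame with one row per bar. This is only a sketch: the data frames `totals` and `long` and the columns `Total` and `Label` are made up for illustration, and the long table is built in base R just to keep the example free of extra dependencies.

```r
library(ggplot2)

# Same wide data as in the question.
clono_df <- data.frame(
  Clonotype = LETTERS[1:5],
  Number_Hit_Cells = c(234, 56, 568, 34, 46),
  Number_NonHit_Cells = c(52, 12, 234, 21, 31)
)

# One row per bar: total bar height and the percentage of hit cells as a label.
totals <- data.frame(
  Clonotype = clono_df$Clonotype,
  Total = clono_df$Number_Hit_Cells + clono_df$Number_NonHit_Cells
)
totals$Label <- paste0(round(100 * clono_df$Number_Hit_Cells / totals$Total, 2), "%")

# Long format for the stacked bars.
long <- data.frame(
  Clonotype = rep(clono_df$Clonotype, times = 2),
  Cells = factor(
    rep(c("Number_NonHit_Cells", "Number_Hit_Cells"), each = nrow(clono_df)),
    levels = c("Number_NonHit_Cells", "Number_Hit_Cells")
  ),
  Value = c(clono_df$Number_NonHit_Cells, clono_df$Number_Hit_Cells)
)

# fill is mapped only in the bar layer, so the text layer needs no Cells column.
p <- ggplot(long, aes(x = Clonotype, y = Value)) +
  geom_col(aes(fill = Cells)) +
  scale_fill_manual(values = c("gray70", "gray40")) +
  geom_text(data = totals, aes(y = Total, label = Label), vjust = -0.5) +
  theme_light()
```

Because the label layer has exactly one row per Clonotype, each percentage is drawn once, just above the combined bar.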

Related

Individually change x labels using expressions in ggplot2 boxplot with facet_grid in R

I want to individually change the x labels of my ggplot2 boxplot when using a facet_grid. I made the following simple example:
library(ggplot2)
data1 <- InsectSprays
data1$group <- "group 1"
data2 <- InsectSprays
data2$group <- "group 2"
plotData <- rbind(data1, data2)
ggplot(plotData, aes(x=spray, y=count, fill=spray))+
guides(fill="none") +
facet_grid(. ~ group) +
geom_boxplot()
I want to change the labels on the x axis (A, B, C,...), but individually for the two groups. One way changing the labels would be, using:
scale_x_discrete(labels=c("label 1", "label 2", ...))
but this would change the labels in both groups to the same values. At the end I also need to be able to use expressions for the labels. Is there any way to achieve what I want?
EDIT:
There is a very simple way to solve my problem (thanks @Axeman). By using:
scale_x_discrete(labels=c('A' = expression(beta)))
I can change the labels. In my example this would change both groups, but in my case I can rename the labels to unique values per group beforehand and then use this trick to apply expressions to them.
plotData$x <- interaction(plotData$spray, plotData$group)
plotData$x <- factor(plotData$x, labels = paste('labels', 1:12))
ggplot(plotData, aes(x=x, y=count, fill=spray))+
geom_boxplot(show.legend = FALSE) +
facet_grid(. ~ group, scales = 'free')
I would have expected the following to work, but it doesn't!
ggplot(plotData, aes(x=interaction(spray, group), y=count, fill=spray))+
geom_boxplot(show.legend = FALSE) +
facet_grid(. ~ group, scales = 'free') +
scale_x_discrete(labels = paste('labels', 1:12))
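The two ideas above (a unique x value per spray and group, plus named expression labels) can be combined. The following is a sketch under assumptions: the vector `facet_labels` and the Greek letters in it are invented for illustration, and levels that are not named in it should simply keep their default text.

```r
library(ggplot2)

# Same data as in the question: InsectSprays duplicated into two groups.
plotData <- rbind(
  transform(InsectSprays, group = "group 1"),
  transform(InsectSprays, group = "group 2")
)

# One level per spray/group pair, e.g. "A.group 1" and "A.group 2".
plotData$x <- interaction(plotData$spray, plotData$group)

# Hypothetical labels: names must match the levels of plotData$x.
facet_labels <- c(
  "A.group 1" = expression(alpha),
  "A.group 2" = expression(beta)
)

p <- ggplot(plotData, aes(x = x, y = count, fill = spray)) +
  geom_boxplot(show.legend = FALSE) +
  facet_grid(. ~ group, scales = "free_x") +
  scale_x_discrete(labels = facet_labels)
```

Naming the labels ties each expression to a specific level, so the result does not depend on the order in which the levels happen to appear in each facet.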

Align multiple ggplot graphs with and without legends [duplicate]

I'm trying to use ggplot to draw a graph comparing the absolute values of two variables, and also show the ratio between them. Since the ratio is unitless and the values are not, I can't show them on the same y-axis, so I'd like to stack vertically as two separate graphs with aligned x-axes.
Here's what I've got so far:
library(ggplot2)
library(dplyr)
library(gridExtra)
# Prepare some sample data.
results <- data.frame(index=(1:20))
results$control <- 50 * results$index
results$value <- results$index * 50 + 2.5*results$index^2 - results$index^3 / 8
results$ratio <- results$value / results$control
# Plot absolute values
plot_values <- ggplot(results, aes(x=index)) +
geom_point(aes(y=value, color="value")) +
geom_point(aes(y=control, color="control"))
# Plot ratios between values
plot_ratios <- ggplot(results, aes(x=index, y=ratio)) +
geom_point()
# Arrange the two plots above each other
grid.arrange(plot_values, plot_ratios, ncol=1, nrow=2)
The big problem is that the legend on the right of the first plot makes it a different size. A minor problem is that I'd rather not show the x-axis name and tick marks on the top plot, to avoid clutter and make it clear that they share the same axis.
I've looked at this question and its answers:
Align plot areas in ggplot
Unfortunately, neither answer there works well for me. Faceting doesn't seem a good fit, since I want to have completely different y scales for my two graphs. Manipulating the dimensions returned by ggplot_gtable seems more promising, but I don't know how to get around the fact that the two graphs have a different number of cells. Naively copying that code doesn't seem to change the resulting graph dimensions for my case.
Here's another similar question:
The perils of aligning plots in ggplot
The question itself seems to suggest a good option, but rbind.gtable complains if the tables have different numbers of columns, which is the case here due to the legend. Perhaps there's a way to slot in an extra empty column in the second table? Or a way to suppress the legend in the first graph and then re-add it to the combined graph?
Here's a solution that doesn't require explicit use of grid graphics. It uses facets, and hides the legend entry for "ratio" (using a technique from https://stackoverflow.com/a/21802022).
library(reshape2)
results_long <- melt(results, id.vars="index")
results_long$facet <- ifelse(results_long$variable=="ratio", "ratio", "values")
results_long$facet <- factor(results_long$facet, levels=c("values", "ratio"))
ggplot(results_long, aes(x=index, y=value, colour=variable)) +
geom_point() +
facet_grid(facet ~ ., scales="free_y") +
scale_colour_manual(breaks=c("control","value"),
values=c("#1B9E77", "#D95F02", "#7570B3")) +
theme(legend.justification=c(0,1), legend.position=c(0,1)) +
guides(colour=guide_legend(title=NULL)) +
theme(axis.title.y = element_blank())
Try this:
library(ggplot2)
library(grid)      # unit(), unit.pmax()
library(gtable)
library(gridExtra)
AlignPlots <- function(...) {
LegendWidth <- function(x) x$grobs[[8]]$grobs[[1]]$widths[[4]]
plots.grobs <- lapply(list(...), ggplotGrob)
max.widths <- do.call(unit.pmax, lapply(plots.grobs, "[[", "widths"))
plots.grobs.eq.widths <- lapply(plots.grobs, function(x) {
x$widths <- max.widths
x
})
legends.widths <- lapply(plots.grobs, LegendWidth)
max.legends.width <- do.call(max, legends.widths)
plots.grobs.eq.widths.aligned <- lapply(plots.grobs.eq.widths, function(x) {
if (is.gtable(x$grobs[[8]])) {
x$grobs[[8]] <- gtable_add_cols(x$grobs[[8]],
unit(abs(diff(c(LegendWidth(x),
max.legends.width))),
"mm"))
}
x
})
plots.grobs.eq.widths.aligned
}
df <- data.frame(x = c(1:5, 1:5),
y = c(1:5, seq.int(5,1)),
type = factor(c(rep_len("t1", 5), rep_len("t2", 5))))
p1.1 <- ggplot(diamonds, aes(clarity, fill = cut)) + geom_bar()
p1.2 <- ggplot(df, aes(x = x, y = y, colour = type)) + geom_line()
plots1 <- AlignPlots(p1.1, p1.2)
do.call(grid.arrange, plots1)
p2.1 <- ggplot(diamonds, aes(clarity, fill = cut)) + geom_bar()
p2.2 <- ggplot(df, aes(x = x, y = y)) + geom_line()
plots2 <- AlignPlots(p2.1, p2.2)
do.call(grid.arrange, plots2)
Produces this:
// Based on multiple baptiste's answers
Encouraged by baptiste's comment, here's what I did in the end:
library(ggplot2)
library(dplyr)
library(grid)      # unit(), grid.draw()
library(gtable)    # gtable_add_cols(), gtable_add_rows()
library(gridExtra)
# Prepare some sample data.
results <- data.frame(index=(1:20))
results$control <- 50 * results$index
results$value <- results$index * 50 + 2.5*results$index^2 - results$index^3 / 8
results$ratio <- results$value / results$control
# Plot ratios between values
plot_ratios <- ggplot(results, aes(x=index, y=ratio)) +
geom_point()
# Plot absolute values
remove_x_axis =
theme(
axis.ticks.x = element_blank(),
axis.text.x = element_blank(),
axis.title.x = element_blank())
plot_values <- ggplot(results, aes(x=index)) +
geom_point(aes(y=value, color="value")) +
geom_point(aes(y=control, color="control")) +
remove_x_axis
# Arrange the two plots above each other
grob_ratios <- ggplotGrob(plot_ratios)
grob_values <- ggplotGrob(plot_values)
legend_column <- 5
legend_width <- grob_values$widths[legend_column]
grob_ratios <- gtable_add_cols(grob_ratios, legend_width, legend_column-1)
grob_combined <- gtable:::rbind_gtable(grob_values, grob_ratios, "first")
grob_combined <- gtable_add_rows(
grob_combined,unit(-1.2,"cm"), pos=nrow(grob_values))
grid.draw(grob_combined)
(I later realised I didn't even need to extract the legend width, since the size="first" argument to rbind tells it just to have that one override the other.)
It feels a bit messy, but it is exactly the layout I was hoping for.
An alternative & quite easy solution is as follows:
# loading needed packages
library(ggplot2)
library(dplyr)
library(tidyr)
# Prepare some sample data
results <- data.frame(index=(1:20))
results$control <- 50 * results$index
results$value <- results$index * 50 + 2.5*results$index^2 - results$index^3 / 8
results$ratio <- results$value / results$control
# reshape into long format
long <- results %>%
gather(variable, value, -index) %>%
mutate(facet = ifelse(variable=="ratio", "ratio", "values"))
long$facet <- factor(long$facet, levels=c("values", "ratio"))
# create the plot & remove facet labels with theme() elements
ggplot(long, aes(x=index, y=value, colour=variable)) +
geom_point() +
facet_grid(facet ~ ., scales="free_y") +
scale_colour_manual(breaks=c("control","value"), values=c("green", "red", "blue")) +
theme(axis.title.y=element_blank(), strip.text=element_blank(), strip.background=element_blank())
which gives:

bar z-index for ggplot2 geom_bar

I have created the following graphic with ggplot2, using this code:
p <- ggplot(df, aes(x=100*prop_select, y=(1500-s_meanResponse), fill=product)) +
geom_bar(stat='identity', width=2, alpha=.6, color='#333333') +
coord_cartesian(xlim = c(0, 5)) +
coord_flip()
print(p)
I am attempting to intentionally overlap the bars. I would like to know how to change the z-index (depth) of each of the bars. I attempted to do this by simply reordering the levels of the factor column that determines my bars
# Order by mean response to make sure that bars are layered correctly (does not work!)
# Switching this minus makes no difference to the bar z-index but does reverse legend entries
df <- df[sort.list(-df$s_meanResponse),]
df$product <- factor(df$product, levels=df$product)
Anybody know if this is possible with ggplot2?
EDIT:
dataframe is structured similar to below
df <- data.frame( product=c('a','b','c'), s_meanResponse=c(1120,1421,1320), prop_select=c(.3,.2,.5))
I'm using the following df with actual overlapping:
df <- data.frame(product=c('a','b','c'),
s_meanResponse=c(1120,1421,1320),
prop_select=c(.311,.32,.329))
It seems that the plotting order stays the same regardless of the order of the factor levels; the bars are simply drawn from lowest to highest y-value. To get a custom order, we have to add the layers explicitly, one by one, in the desired order:
geom_bars_with_order <- function(vals)
{
l <- list()
for (i in vals)
{
l <- c(l, geom_bar(data = df[df$product == i, ],
stat='identity', width=2, alpha=.6, color='#333333'))
}
l
}
# default order
ggplot(NULL, aes(x=100*prop_select, y=(1500-s_meanResponse), fill=product)) +
geom_bars_with_order(c("a", "b", "c")) +
coord_cartesian(xlim = c(0, 5)) +
coord_flip()
# custom order, "a" on top
ggplot(NULL, aes(x=100*prop_select, y=(1500-s_meanResponse), fill=product)) +
geom_bars_with_order(c("b", "c", "a")) +
coord_cartesian(xlim = c(0, 5)) +
coord_flip()
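A variant of the same idea that does not rely on a global df is sketched below. The helper name `layers_in_order` is made up, and it assumes the data has a `product` column as in the example above. ggplot2 accepts a list of layers on the right-hand side of `+`, and layers are drawn in the order they are added, so the last product in the vector ends up on top.

```r
library(ggplot2)

df <- data.frame(product = c("a", "b", "c"),
                 s_meanResponse = c(1120, 1421, 1320),
                 prop_select = c(.311, .32, .329))

# Hypothetical helper: one geom_col() layer per product, in the requested order.
layers_in_order <- function(data, order) {
  lapply(order, function(p) {
    geom_col(data = data[data$product == p, ],
             width = 2, alpha = 0.6, colour = "#333333")
  })
}

# "a" is added last, so it is drawn on top of "b" and "c".
layers <- layers_in_order(df, c("b", "c", "a"))

p <- ggplot(mapping = aes(x = 100 * prop_select,
                          y = 1500 - s_meanResponse,
                          fill = product)) +
  layers +
  coord_flip()
```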

ggplot Donut chart

Hi I really have googled this a lot without any joy. Would be happy to get a reference to a website if it exists. I'm struggling to understand the Hadley documentation on polar coordinates and I know that pie/donut charts are considered inherently evil.
That said, what I'm trying to do is:
1. Create a donut/ring chart (so a pie with an empty middle) like the tikz ring chart shown here
2. Add a second layer circle on top (with alpha=0.5 or so) that shows a second (comparable) variable.
Why? I'm looking to show financial information. The first ring is costs (broken down) and the second is total income. The idea is then to add + facet=period for each review period to show the trend in both revenues and expenses and the growth in both.
Any thoughts would be most appreciated
Note: if an MWE is needed, suppose (completely arbitrarily) that this was tried with
donut_data=iris[,2:4]
revenue_data=iris[,1]
facet=iris$Species
That would be similar to what I'm trying to do.. Thanks
I don't have a full answer to your question, but I can offer some code that may help get you started making ring plots using ggplot2.
library(ggplot2)
# Create test data.
dat = data.frame(count=c(10, 60, 30), category=c("A", "B", "C"))
# Add additional columns, needed for drawing with geom_rect.
dat$fraction = dat$count / sum(dat$count)
dat = dat[order(dat$fraction), ]
dat$ymax = cumsum(dat$fraction)
dat$ymin = c(0, head(dat$ymax, n=-1))
p1 = ggplot(dat, aes(fill=category, ymax=ymax, ymin=ymin, xmax=4, xmin=3)) +
geom_rect() +
coord_polar(theta="y") +
xlim(c(0, 4)) +
labs(title="Basic ring plot")
p2 = ggplot(dat, aes(fill=category, ymax=ymax, ymin=ymin, xmax=4, xmin=3)) +
geom_rect(colour="grey30") +
coord_polar(theta="y") +
xlim(c(0, 4)) +
theme_bw() +
theme(panel.grid=element_blank()) +
theme(axis.text=element_blank()) +
theme(axis.ticks=element_blank()) +
labs(title="Customized ring plot")
library(gridExtra)
png("ring_plots_1.png", height=4, width=8, units="in", res=120)
grid.arrange(p1, p2, nrow=1)
dev.off()
Thoughts:
You may get more useful answers if you post some well-structured sample data. You have mentioned using some columns from the iris dataset (a good start), but I am unable to see how to use that data to make a ring plot. For example, the ring plot you have linked to shows proportions of several categories, but neither iris[, 2:4] nor iris[, 1] are categorical.
You want to "Add a second layer circle on top": Do you mean to superimpose the second ring directly on top of the first? Or do you want the second ring to be inside or outside of the first? You could add a second internal ring with something like geom_rect(data=dat2, xmax=3, xmin=2, aes(ymax=ymax, ymin=ymin))
If your data.frame has a column named period, you can use facet_wrap(~ period) for facetting.
To use ggplot2 most easily, you will want your data in 'long-form'; melt() from the reshape2 package may be useful for converting the data.
Make some barplots for comparison, even if you decide not to use them. For example, try:
ggplot(dat, aes(x=category, y=count, fill=category)) +
geom_bar(stat="identity")
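To make the suggestion about a second internal ring concrete, here is a rough sketch. The helper `add_ring_limits` and the counts in `inner` are invented for illustration; only the outer ring reuses the test data from above.

```r
library(ggplot2)

# Hypothetical helper: add the fraction/ymin/ymax columns that geom_rect needs.
add_ring_limits <- function(d) {
  d$fraction <- d$count / sum(d$count)
  d$ymax <- cumsum(d$fraction)
  d$ymin <- c(0, head(d$ymax, n = -1))
  d
}

# Outer ring: the test data from above. Inner ring: made-up counts.
outer <- add_ring_limits(data.frame(count = c(10, 60, 30),
                                    category = c("A", "B", "C")))
inner <- add_ring_limits(data.frame(count = c(40, 40, 20),
                                    category = c("A", "B", "C")))

# Each ring is its own geom_rect layer occupying a different x range.
p <- ggplot(mapping = aes(fill = category, ymin = ymin, ymax = ymax)) +
  geom_rect(data = outer, aes(xmin = 3, xmax = 4), colour = "grey30") +
  geom_rect(data = inner, aes(xmin = 2, xmax = 3), colour = "grey30", alpha = 0.5) +
  coord_polar(theta = "y") +
  xlim(c(0, 4)) +
  theme_void()
```

Keeping the lower x limit at 0 while the rectangles start at x = 2 is what leaves the hole in the middle.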
Just trying to solve question 2 with the same approach from bdemarest's answer. Also using his code as a scaffold. I added some tests to make it more complete but feel free to remove them.
library(broom)
library(tidyverse)
# Create test data.
dat = data.frame(count=c(10,60,20,50),
ring=c("A", "A","B","B"),
category=c("C","D","C","D"))
# compute pvalue
cs.pvalue <- dat %>% spread(value = count,key=category) %>%
ungroup() %>% select(-ring) %>%
chisq.test() %>% tidy()
cs.pvalue <- dat %>% spread(value = count,key=category) %>%
select(-ring) %>%
fisher.test() %>% tidy() %>% full_join(cs.pvalue)
# compute fractions
#dat = dat[order(dat$count), ]
dat <- dat %>% group_by(ring) %>% mutate(fraction = count / sum(count),
ymax = cumsum(fraction),
ymin = c(0, head(ymax, n = -1)))
# Add x limits
baseNum <- 4
#numCat <- length(unique(dat$ring))
dat$xmax <- as.numeric(factor(dat$ring)) + baseNum
dat$xmin = dat$xmax -1
# plot
p2 = ggplot(dat, aes(fill=category,
alpha = ring,
ymax=ymax,
ymin=ymin,
xmax=xmax,
xmin=xmin)) +
geom_rect(colour="grey30") +
coord_polar(theta="y") +
geom_text(inherit.aes = F,
x=c(-1,1),
y=0,
data = cs.pvalue,aes(label = paste(method,
"\n",
format(p.value,
scientific = T,
digits = 2))))+
xlim(c(0, 6)) +
theme_bw() +
theme(panel.grid=element_blank()) +
theme(axis.text=element_blank()) +
theme(axis.ticks=element_blank(),
panel.border = element_blank()) +
labs(title="Customized ring plot") +
scale_fill_brewer(palette = "Set1") +
scale_alpha_discrete(range = c(0.5,0.9))
p2
And the result:

How can a line be overlaid on a bar plot using ggplot2?

I'm looking for a way to plot a bar chart containing two different series, hide the bars for one of the series and instead have a line (smooth if possible) go through the top of where the bars for the hidden series would have been (similar to how one might overlay a frequency polygon on a histogram). I've tried the example below but appear to be running into two problems.
First, I need to summarize (total) the data by group, and second, I'd like to convert one of the series (df2) to a line.
df <- data.frame(grp=c("A","A","B","B","C","C"),val=c(1,1,2,2,3,3))
df2 <- data.frame(grp=c("A","A","B","B","C","C"),val=c(1,4,3,5,1,2))
ggplot(df, aes(x=grp, y=val)) +
geom_bar(stat="identity", alpha=0.75) +
geom_bar(data=df2, aes(x=grp, y=val), stat="identity", position="dodge")
You can get group totals in many ways. One of them is
with(df, tapply(val, grp, sum))
For simplicity, you can combine bar and line data into a single dataset.
df_all <- data.frame(grp = factor(sort(unique(df$grp))))
df_all$bar_heights <- with(df, tapply(val, grp, sum))
df_all$line_y <- with(df2, tapply(val, grp, sum))
Bar charts use a categorical x-axis. To overlay a line you will need to convert the axis to be numeric.
ggplot(df_all) +
geom_bar(aes(x = grp, weight = bar_heights)) +
geom_line(aes(x = as.numeric(grp), y = line_y))
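An alternative sketch that keeps the x-axis categorical: summarise each series with aggregate() and set group = 1 in the line layer so that the points of different categories are connected. The names `bar_totals` and `line_totals` are made up for this example.

```r
library(ggplot2)

df  <- data.frame(grp = c("A", "A", "B", "B", "C", "C"), val = c(1, 1, 2, 2, 3, 3))
df2 <- data.frame(grp = c("A", "A", "B", "B", "C", "C"), val = c(1, 4, 3, 5, 1, 2))

# Group totals for each series.
bar_totals  <- aggregate(val ~ grp, data = df,  FUN = sum)
line_totals <- aggregate(val ~ grp, data = df2, FUN = sum)

# group = 1 puts all line points in one group so they are joined across categories.
p <- ggplot(bar_totals, aes(x = grp, y = val)) +
  geom_col(alpha = 0.75) +
  geom_line(data = line_totals, aes(group = 1), colour = "red")
```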
Perhaps your sample data aren't representative of the real data you are working with, but as posted there are no lines to be drawn for df2. There is only one value for each x and y value. Here's a modified version of your data with enough data points to construct lines:
df <- data.frame(grp=c("A","A","B","B","C","C"),val=c(1,2,3,1,2,3))
df2 <- data.frame(grp=c("A","A","B","B","C","C"),val=c(1,4,3,5,0,2))
p <- ggplot(df, aes(x=grp, y=val))
p <- p + geom_bar(stat="identity", alpha=0.75)
p + geom_line(data=df2, aes(x=grp, y=val), colour="blue")
Alternatively, if your example data above is correct, you can plot this information as a point with geom_point(data = df2, aes(x = grp, y = val), colour = "red", size = 6). You can obviously change the color and size to your liking.
EDIT: In response to comment
I'm not entirely sure what a frequency polygon over a histogram is supposed to look like here. Are the x-values supposed to be connected to one another? Secondly, you keep referring to lines, but your code uses geom_bar(), which I assume isn't what you want. If you want lines, use geom_line(). If the two assumptions above are correct, then here's an approach to do that:
# First, let's summarise df2 by group (ddply() comes from the plyr package)
library(plyr)
df3 <- ddply(df2, .(grp), summarise, total = sum(val))
df3
#   grp total
# 1   A     5
# 2   B     8
# 3   C     3
#Second, let's plot df3 as a line while treating the grp variable as numeric
p <- ggplot(df, aes(x=grp, y=val))
p <- p + geom_bar(alpha=0.75, stat = "identity")
p + geom_line(data=df3, aes(x=as.numeric(factor(grp)), y=total), colour = "red")
