Automatic n plotting with ggplot and stat_summary


This is a question related to this one. I'm dealing with a boxplot of two groups and used the function n_fun proposed in that question with a small modification (I used y=10 to locate the "n = " because I find it disturbing above the median).
Here's the function:
n_fun <- function(x){
return(data.frame(y = 10, label = paste0("n = ",length(x))))
}
ggplot(mtcars, aes(x=factor(cyl), mpg, fill=factor(am))) +
geom_boxplot() + stat_summary(fun.data = n_fun, geom = "text")
The problem is that ggplot recognizes that there are two different "n = " labels to plot, but they get plotted on top of each other at a single 'y'. I tried passing a vector as the y position in n_fun, and it is accepted, but I still get two overplotted "n = " labels. I'm looking for something like "position = dodge" for stat_summary, or another way to tell ggplot to place those labels the same way it places the dodged boxplots.

Well, as the help page ?position_dodge states: "Dodging things with different widths can be tricky. You may need to explicitly specify the width for dodging." In your case:
ggplot(mtcars, aes(x=factor(cyl), mpg, fill=factor(am))) +
geom_boxplot() +
stat_summary(fun.data = n_fun, geom = "text",
position = position_dodge(.9))

Related

How do I add label for each of my bar plot? [duplicate]

I'd like to have some labels stacked on top of a geom_bar graph. Here's an example:
df <- data.frame(x=factor(c(TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE)))
ggplot(df) + geom_bar(aes(x,fill=x)) + opts(axis.text.x=theme_blank(),axis.ticks=theme_blank(),axis.title.x=theme_blank(),legend.title=theme_blank(),axis.title.y=theme_blank())
Now
table(df$x)
FALSE TRUE
3 5
I'd like to have the 3 and 5 on top of the two bars. Even better if I could have the percent values as well, e.g. 3 (37.5%) and 5 (62.5%).
Is this possible? If so, how?

To plot text on a ggplot you use geom_text. But I find it helpful to summarise the data first using ddply from the plyr package:
library(plyr)
dfl <- ddply(df, .(x), summarize, y=length(x))
str(dfl)
Since the data is pre-summarized, you need to remember to add the stat="identity" parameter to geom_bar:
ggplot(dfl, aes(x, y=y, fill=x)) + geom_bar(stat="identity") +
geom_text(aes(label=y), vjust=0) +
opts(axis.text.x=theme_blank(),
axis.ticks=theme_blank(),
axis.title.x=theme_blank(),
legend.title=theme_blank(),
axis.title.y=theme_blank()
)

As with many tasks in ggplot, the general strategy is to put what you'd like to add to the plot into a data frame in a way such that the variables match up with the variables and aesthetics in your plot. So for example, you'd create a new data frame like this:
dfTab <- as.data.frame(table(df))
colnames(dfTab)[1] <- "x"
dfTab$lab <- as.character(100 * dfTab$Freq / sum(dfTab$Freq))
So that the x variable matches the corresponding variable in df, and so on. Then you simply include it using geom_text:
ggplot(df) + geom_bar(aes(x,fill=x)) +
geom_text(data=dfTab,aes(x=x,y=Freq,label=lab),vjust=0) +
opts(axis.text.x=theme_blank(),axis.ticks=theme_blank(),
axis.title.x=theme_blank(),legend.title=theme_blank(),
axis.title.y=theme_blank())
This example will plot just the percentages, but you can paste together the counts as well via something like this:
dfTab$lab <- paste(dfTab$Freq,paste("(",dfTab$lab,"%)",sep=""),sep=" ")
Note that in the current version of ggplot2, opts is deprecated, so we would use theme and element_blank now.
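For reference, here is a minimal sketch of how the same labelled bar chart might look with theme() and element_blank() in place of opts() and theme_blank(). It assumes a recent ggplot2; building the label with sprintf and the vjust value are my own choices, not part of the answer above.

```r
library(ggplot2)

df <- data.frame(x = factor(c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)))

# Summarise counts into a data frame whose column name matches the plot's x variable
dfTab <- as.data.frame(table(x = df$x))

# Build "count (percent%)" labels in a single step
dfTab$lab <- sprintf("%d (%.1f%%)", dfTab$Freq, 100 * dfTab$Freq / sum(dfTab$Freq))

p <- ggplot(df) +
  geom_bar(aes(x, fill = x)) +
  # vjust = -0.3 lifts the text slightly above each bar (an arbitrary choice)
  geom_text(data = dfTab, aes(x = x, y = Freq, label = lab), vjust = -0.3) +
  theme(axis.text.x = element_blank(),
        axis.ticks = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        legend.title = element_blank())
p
```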

Another solution is to use stat_count() when dealing with discrete variables (and stat_bin() with continuous ones).
ggplot(data = df, aes(x = x)) +
geom_bar(stat = "count") +
stat_count(geom = "text", colour = "white", size = 3.5,
aes(label = ..count..),position=position_stack(vjust=0.5))

So, this is our initial plot:
library(ggplot2)
df <- data.frame(x=factor(c(TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE)))
p <- ggplot(df, aes(x = x, fill = x)) +
geom_bar()
p
As suggested by yuan-ning, we can use stat_count().
geom_bar() uses stat_count() by default. As mentioned in the ggplot2 reference, stat_count() returns two values: count for number of points in bin and prop for groupwise proportion. Since our groups match the x values, both props are 1 and aren’t useful. But we can use count (referred to as “..count..”) that actually denotes bar heights, in our geom_text(). Note that we must include “stat = 'count'” into our geom_text() call as well.
Since we want both counts and percentages in our labels, we’ll need some calculations and string pasting in our “label” aesthetic instead of just “..count..”. I prefer to add a line of code to create a wrapper percent formatting function from the “scales” package (ships along with “ggplot2”).
pct_format = scales::percent_format(accuracy = .1)
p <- p + geom_text(
aes(
label = sprintf(
'%d (%s)',
..count..,
pct_format(..count.. / sum(..count..))
)
),
stat = 'count',
nudge_y = .2,
colour = 'royalblue',
size = 5
)
p
Of course, you can further edit the labels with colour, size, nudges, adjustments etc.
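In newer ggplot2 releases (3.3.0 and later, as far as I know) the ..count.. notation has been superseded by after_stat(). A sketch of the same label written that way is below; the vjust value is an arbitrary choice of mine, and the rest mirrors the code above.

```r
library(ggplot2)

df <- data.frame(x = factor(c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)))

# Percent formatter from the scales package, one decimal place
pct_format <- scales::percent_format(accuracy = 0.1)

p <- ggplot(df, aes(x = x, fill = x)) +
  geom_bar() +
  geom_text(
    # after_stat() delays evaluation until stat_count() has computed `count`
    aes(label = after_stat(sprintf("%d (%s)", count, pct_format(count / sum(count))))),
    stat = "count",
    vjust = -0.5  # place the label just above the bar
  )
p
```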

Creating a horizontal bar plots in the reverse direction

I'm trying to do a pyramid-"like" plot in R and I think I am close. I know there are functions such as plotrix's pyramid.plot but what I want to do isn't a real pyramid plot. In a pyramid plot, there are text labels down the middle that line up with the bars on the left and on the right. Instead, what I'd like to do is have two columns of text with bars coming away from them.
I'm using ggplot (but I guess I don't have to) and the multiplot function. A minimal example would be something like this:
mtcars$`car name` <- rownames(mtcars)
obj_a <- ggplot (mtcars, aes (x=`car name`, y=mpg))
obj_a <- obj_a + geom_bar (position = position_dodge(), stat="identity")
obj_a <- obj_a + coord_flip ()
obj_a <- obj_a + xlab ("")
USArrests$`states` <- rownames(USArrests)
obj_b <- ggplot (USArrests, aes (x=`states`, y=UrbanPop))
obj_b <- obj_b + geom_bar (position = position_dodge(), stat="identity")
obj_b <- obj_b + coord_flip ()
obj_b <- obj_b + xlab ("")
multiplot (obj_a, obj_b, cols=2)
I guess what I'd like is just to flip the left half so that each row has (from left-to-right): left bar, car model, state name, right bar. (The graph I'm making will have the same number of rows in both halves so it won't look so cramped.) However, the point is, there are two columns of text, not one.
Of course, since both halves are independent of each other, my real problem is I don't know how to make the left half. (A bar plot with bars going in the opposite direction.) But I thought I'd also explain what I'm trying to do...
Thank you in advance!

You can set the mpg values in obj_a negative, & position the car names axis on the opposite side:
ggplot (mtcars, aes (x=`car name`, y=-mpg)) + # y takes on negative values
geom_bar (position = position_dodge(), stat = "identity") +
coord_flip () +
scale_x_discrete(name = "", position = "top") + # x axis (before coord_flip) on opposite side
scale_y_continuous(name = "mpg",
breaks = seq(0, -30, by = -10), # y axis values (before coord_flip)
labels = seq(0, 30, by = 10)) # show non-negative values

It seems that pyramid.plot already does what you need. Using their example:
library(plotrix)
xy.pop<-c(3.2,3.5,3.6,3.6,3.5,3.5,3.9,3.7,3.9,3.5,3.2,2.8,2.2,1.8,
1.5,1.3,0.7,0.4)
xx.pop<-c(3.2,3.4,3.5,3.5,3.5,3.7,4,3.8,3.9,3.6,3.2,2.5,2,1.7,1.5,
1.3,1,0.8)
agelabels<-c("0-4","5-9","10-14","15-19","20-24","25-29","30-34",
"35-39","40-44","45-49","50-54","55-59","60-64","65-69","70-74",
"75-79","80-84","85+")
mcol<-color.gradient(c(0,0,0.5,1),c(0,0,0.5,1),c(1,1,0.5,1),18)
fcol<-color.gradient(c(1,1,0.5,1),c(0.5,0.5,0.5,1),c(0.5,0.5,0.5,1),18)
par(mar=pyramid.plot(xy.pop,xx.pop,labels=agelabels,
main="Australian population pyramid 2002",lxcol=mcol,rxcol=fcol,
gap=0.5,show.values=TRUE))
# three column matrices
avtemp<-c(seq(11,2,by=-1),rep(2:6,each=2),seq(11,2,by=-1))
malecook<-matrix(avtemp+sample(-2:2,30,TRUE),ncol=3)
femalecook<-matrix(avtemp+sample(-2:2,30,TRUE),ncol=3)
# *** Make agegrps a two column data frame with the labels ***
# group by age
agegrps<-data.frame(c("0","11","21","31","41","51",
"61-70","71-80","81-90","91+"),
c("10","20","30","40","50","60",
"70","80","90","91"))
oldmar<-pyramid.plot(malecook,femalecook,labels=agegrps,
unit="Bowls per month",lxcol=c("#ff0000","#eeee88","#0000ff"),
rxcol=c("#ff0000","#eeee88","#0000ff"),laxlab=c(0,10,20,30),
raxlab=c(0,10,20,30),top.labels=c("Males","Age","Females"),gap=4,
do.first="plot_bg(\"#eedd55\")")
# put a box around it
box()
# give it a title
mtext("Porridge temperature by age and sex of bear",3,2,cex=1.5)
# stick in a legend
legend(par("usr")[1],11,c("Too hot","Just right","Too cold"),
fill=c("#ff0000","#eeee88","#0000ff"))
# don't forget to restore the margins and background
par(mar=oldmar,bg="transparent")

Instead of negating the variable, you could just add + scale_y_reverse(); the y axis becomes the horizontal axis after the flip.
This way you don't have to set the axis labels manually.
You will still need to change the position of the x axis labels, as suggested in the answer by user Z.Lin.
E.g.
library(ggplot2)
mtcars$`car name` <- rownames(mtcars)
ggplot (mtcars, aes (x=`car name`, y=mpg)) +
geom_bar (position = position_dodge(), stat="identity") +
scale_y_reverse () +
scale_x_discrete(name = "", position = "top") +
coord_flip ()
Created on 2020-04-09 by the reprex package (v0.3.0)
EDIT: An even simpler solution is not to use coord_flip() at all, but rather specify the desired mapping directly right away. I would now recommend this approach, having encountered issues with how coord_flip() behaves under certain more complex scenarios.
I give the code for this below; the plot should still look the same.
library(ggplot2)
mtcars$`car name` <- rownames(mtcars)
ggplot (mtcars, aes(x = mpg, y = `car name`)) +
geom_bar(position = position_dodge(), stat = "identity") +
scale_x_reverse() +
scale_y_discrete(name = "", position = "right")

R-Programming - ggplot2 - boxplot issues (varwidth & position_dodge / stat_summary & position_dodge)

I am currently using ggplot2 to display some feature distributions with boxplots.
I can produce some simple boxplots, changing color, form, etc. but I cannot achieve the ones that combine several options.
1°)
My purpose is to display side-by-side boxplots for men and for women, which can be done with position = position_dodge(width=0.9).
I want the width of each boxplot to be proportional to the size of the sample, which can be done with varwidth=TRUE.
First problem: when I put the two options together, it does not work and I get the following message:
position_dodge requires non-overlapping x intervals
Boxplot when using varwidth=TRUE and position_dodge together:
I have tried to change the size of the plot, but it did not help. If I skip varwidth=TRUE, then the boxplots are correctly dodged.
Is there a way around this, or is this a limitation of ggplot2?
2°)
Besides, I want to display the size of each sample building the boxplots.
I can get the calculation with stat_summary(fun.data = give.n, ...), but unfortunately I have not found a way to keep the numbers from overlapping each other when the boxplots are at similar positions.
I tried to use hjust & vjust to change the numbers' positions, but they seem to share the same origin, so that does not help.
Overlapping numbers produced by stat_summary when boxplots are dodged:
As there are no labels, I could not use geom_text, or at least I could not find a way to pass the stat to geom_text.
So the second problem is: how can I nicely display each number on its own boxplot?
Here is my code:
library(ggplot2)
# function returning the label position (the sample median) and the sample size
give.n <- function(x){
return(c(y = median(x), label = length(x)))
}
plot_boxes <- function(mydf, mycolumn1, mycolumn2) {
mylegendx <- deparse(substitute(mycolumn1))
mylegendy <- deparse(substitute(mycolumn2))
g2 <- ggplot(mydf, aes(x=as.factor(mycolumn1), y=mycolumn2, color=Gender,
fill=Gender)) +
geom_boxplot( data=mydf, aes(x=as.factor(mycolumn1), y=mycolumn2,
color=Gender), position=position_dodge(width=0.9), alpha=0.3) +
stat_summary(fun.data = give.n, geom = "text", size = 3, vjust=1) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_discrete(name = mylegendx ) +
labs(title=paste("Boxplot ", substring(mylegendy, 11), " by ",
substring(mylegendx, 11)) , x = mylegendx, y = mylegendy)
print(g2)
}
#setwd("~/data")
filename <- "df_stackoverflow.csv"
df_client <- read.csv(file=filename, header=TRUE, sep=";", dec=".")
plot_boxes(df_client, df_client$Client.Class, df_client$nbyears_client)
And the data looks like this (small sample from the dataset - 20,000 lines):
Client.Id;Client.Status;Client.Class;Gender;nbyears_client
3;Active;Middle Class;Male;1.38
4;Active;Middle Class;Male;0.9
5;Active;Retiree;Female;0.21
6;Active;Middle Class;Male;0.9
7;Active;Middle Class;Male;3.55
8;Active;Subprime;Male;1.16
9;Active;Middle Class;Male;1.21
10;Active;Part-time;Male;3.38
17;Active;Middle Class;Male;1.83
19;Active;Subprime;Female;5.81
20;Active;Farming;Male;8.99
21;Active;Subprime;Female;6.49
22;Active;Middle Class;Male;1.54
23;Active;Middle Class;Female;2.74
24;Active;Subprime;Male;0.46
25;Active;Executive;Female;0.49
26;Active;Middle Class;Female;3.55
27;Active;Middle Class;Male;3.83
29;Active;Subprime;Female;2.66
30;Active;Middle Class;Male;2.72
31;Active;Middle Class;Female;4.88
32;Active;Subprime;Male;1.46
34;Active;Middle Class;Female;7.16
41;Active;Middle Class;Male;0.65
44;Active;Middle Class;Male;2
45;Active;Subprime;Male;1.13
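
For the second problem, one possible approach (a sketch only, not a tested answer for the full 20,000-line dataset) is to give stat_summary the same position_dodge object as the boxplots, so that each count is shifted together with its box. The example below uses a handful of rows copied from the sample above; the names give_n and dodge, and the vjust value, are my own. It does not address the varwidth issue from the first problem.

```r
library(ggplot2)

# A few rows taken from the sample data above (reduced to the columns used here)
clients <- read.csv(sep = ";", text = "Client.Class;Gender;nbyears_client
Middle Class;Male;1.38
Middle Class;Male;0.9
Middle Class;Female;2.74
Middle Class;Female;3.55
Subprime;Male;1.16
Subprime;Male;0.46
Subprime;Female;5.81
Subprime;Female;6.49
Subprime;Female;2.66")

# Label position (the median) and label text (the sample size) for each group
give_n <- function(x) {
  data.frame(y = median(x), label = paste0("n = ", length(x)))
}

# Share one dodge specification so boxes and labels are shifted identically
dodge <- position_dodge(width = 0.9)

p <- ggplot(clients, aes(x = Client.Class, y = nbyears_client,
                         colour = Gender, fill = Gender)) +
  geom_boxplot(position = dodge, alpha = 0.3) +
  stat_summary(fun.data = give_n, geom = "text", position = dodge,
               size = 3, vjust = -0.5, show.legend = FALSE)
p
```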

changing ggplot legend unit scale

This question is motivated by a previous post illustrating various ways to change how axis scales are plotted in a ggplot figure, from the default exponential notation to the full integer value (when one's axis values are very large). While I am able to convert the axis scales from exponential notation to full values, I am unclear how one would achieve the same goal for the values appearing in the legend.
While I understand that one can manually change the length of the legend scale with "scale_color..." or "scale_fill..." followed by the "limits" argument, this does not appear to be a solution to getting my legend values to show "6000000000" rather than "6e+09" (or "0" rather than "0e+00" for that matter).
The following example should suffice. My hope is someone can point out how to implement the 'scales' package to apply for legend scales rather than axes scales.
Thanks very much.
library(ggplot2)
library(scales)
Data <- data.frame(
pi = c(2,71,828,1828,45904,523536,2874713,52662497,757247093,6999595749),
e = c(3,14,159,2653,58979,311599,7963468,54418516,1590576171, 99),
face = 1:10)
p <- ggplot(data = Data, aes(x=face, y=e, colour = pi))
myplot <- p + geom_point() +
scale_y_continuous(labels = comma) +
scale_color_gradientn(colours = rainbow(2), limits=c(0,7000000000))
myplot

Use the comma formatter from the scales package in scale_color_gradientn by setting labels = comma, e.g.:
p <- ggplot(data = Data, aes(x=face, y=e, colour = pi))
myplot <- p + geom_point() +
scale_y_continuous(labels = comma) +
scale_color_gradientn(colours = rainbow(2), limits=c(0,7000000000), labels = comma)
myplot
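
Note that comma inserts thousands separators, so the legend would show something like "6,000,000,000". If the plain "6000000000" from the question is wanted instead, a custom labelling function could be passed in the same way. The sketch below assumes that base R's format() is acceptable for this; plain_number is a hypothetical helper name.

```r
library(ggplot2)

# Hypothetical helper: fixed notation, no separators, no padding
plain_number <- function(x) format(x, scientific = FALSE, trim = TRUE)

Data <- data.frame(
  pi = c(2, 71, 828, 1828, 45904, 523536, 2874713, 52662497, 757247093, 6999595749),
  e = c(3, 14, 159, 2653, 58979, 311599, 7963468, 54418516, 1590576171, 99),
  face = 1:10)

myplot <- ggplot(Data, aes(x = face, y = e, colour = pi)) +
  geom_point() +
  scale_y_continuous(labels = plain_number) +
  # the same function is applied to the legend breaks via `labels`
  scale_colour_gradientn(colours = rainbow(2), limits = c(0, 7000000000),
                         labels = plain_number)
myplot
```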

facet_wrap Title wrapping & Decimal places on free_y axis (ggplot2)

I have a set of code that produces multiple plots using facet_wrap:
ggplot(summ,aes(x=depth,y=expr,colour=bank,group=bank)) +
geom_errorbar(aes(ymin=expr-se,ymax=expr+se),lwd=0.4,width=0.3,position=pd) +
geom_line(aes(group=bank,linetype=bank),position=pd) +
geom_point(aes(group=bank,pch=bank),position=pd,size=2.5) +
scale_colour_manual(values=c("coral","cyan3", "blue")) +
facet_wrap(~gene,scales="free_y") +
theme_bw()
With the reference datasets, this code produces figures like this:
I am trying to accomplish two goals here:
1. Keep the auto scaling of the y axis, but make sure only 1 decimal place is displayed across all the plots. I have tried creating a new column of the rounded expr values, but it causes the error bars to not line up properly.
2. I would like to wrap the titles. I have tried changing the font size as in Change plot title sizes in a facet_wrap multiplot, but some of the gene names are too long and will end up being too small to read if I cram them on a single line. Is there a way to wrap the text, using code within the facet_wrap statement?

This probably cannot serve as a definitive answer, but here are some pointers regarding your questions:
Formatting the y-axis scale labels.
First, let's try the direct solution using format function. Here we format all y-axis scale labels to have 1 decimal value, after rounding it with round.
formatter <- function(...){
function(x) format(round(x, 1), ...)
}
mtcars2 <- mtcars
sp <- ggplot(mtcars2, aes(x = mpg, y = qsec)) + geom_point() + facet_wrap(~cyl, scales = "free_y")
sp <- sp + scale_y_continuous(labels = formatter(nsmall = 1))
The issue is, sometimes this approach is not practical. Take the leftmost plot from your figure, for example. Using the same formatting, all y-axis scale labels would be rounded up to -0.3, which is not preferable.
The other solution is to modify the breaks for each plot into a set of rounded values. But again, taking the leftmost plot of your figure as an example, it'll end up with just one label point, -0.3
Yet another solution is to format the labels in scientific notation. For simplicity, you can modify the formatter function as follows:
formatter <- function(...){
function(x) format(x, ..., scientific = TRUE, digits = 2)
}
Now you can have a uniform format for all of plots' y-axis. My suggestion, though, is to set the label with 2 decimal places after rounding.
Wrap facet titles
This can be done using labeller argument in facet_wrap.
# Modify cyl into factors
mtcars2$cyl <- c("Four Cylinder", "Six Cylinder", "Eight Cylinder")[match(mtcars2$cyl, c(4,6,8))]
# Redraw the graph
sp <- ggplot(mtcars2, aes(x = mpg, y = qsec)) + geom_point() +
facet_wrap(~cyl, scales = "free_y", labeller = labeller(cyl = label_wrap_gen(width = 10)))
sp <- sp + scale_y_continuous(labels = formatter(nsmall = 2))
It must be noted that the wrap function detects space to separate labels into lines. So, in your case, you might need to modify your variables.

This only solves the first part of the question. You can create a function to format your axis labels and pass it to scale_y_continuous.
df <- data.frame(x=rnorm(11), y1=seq(2, 3, 0.1) + 10, y2=rnorm(11))
library(ggplot2)
library(reshape2)
df <- melt(df, 'x')
# Before
ggplot(df, aes(x=x, y=value)) + geom_point() +
facet_wrap(~ variable, scales="free")
# label function
f <- function(x){
format(round(x, 1), nsmall=1)
}
# After
ggplot(df, aes(x=x, y=value)) + geom_point() +
facet_wrap(~ variable, scales="free") +
scale_y_continuous(labels=f)

scale_*_continuous(..., labels = function(x) sprintf("%0.0f", x)) worked in my case.
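
Note that "%0.0f" prints no decimals at all. For the one decimal place asked about in the question, the format string would presumably be "%.1f". A minimal sketch on mtcars follows (one_decimal is a hypothetical helper name, and the dataset is only a stand-in for the asker's data):

```r
library(ggplot2)

# Hypothetical helper: formats every axis break with exactly one decimal place
one_decimal <- function(x) sprintf("%.1f", x)

p <- ggplot(mtcars, aes(x = mpg, y = qsec)) +
  geom_point() +
  facet_wrap(~cyl, scales = "free_y") +
  scale_y_continuous(labels = one_decimal)
p
```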
