ggplot: relative frequencies of two groups - r

I want a plot like this except that each facet sums to 100%. Right now group M is 0.05+0.25=0.30 instead of 0.20+0.80=1.00.
df <- rbind(
data.frame(gender=c(rep('M',5)), outcome=c(rep('1',4),'0')),
data.frame(gender=c(rep('F',10)), outcome=c(rep('1',7),rep('0',3)))
)
df
ggplot(df, aes(outcome)) +
geom_bar(aes(y = (..count..)/sum(..count..))) +
facet_wrap(~gender, nrow=2, ncol=1)
(Using y = ..density.. gives worse results.)

Here's another way:
ggplot(df, aes(outcome)) +
geom_bar(aes(y = ..count.. / sapply(PANEL, FUN=function(x) sum(count[PANEL == x])))) +
facet_wrap(~gender, nrow=2, ncol=1)
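On ggplot2 3.3.0 or later, where after_stat() supersedes the ..count.. notation, the same per-panel normalization could probably be written as below. This is only a sketch: using ave() in place of the sapply() call is my own substitution, and it assumes the computed stat data exposes a PANEL column alongside count.

```r
library(ggplot2)

# Same data as in the question: 5 M rows (4 with outcome 1), 10 F rows (7 with outcome 1).
df <- rbind(
  data.frame(gender = rep("M", 5), outcome = c(rep("1", 4), "0")),
  data.frame(gender = rep("F", 10), outcome = c(rep("1", 7), rep("0", 3)))
)

# ave() returns, for every bar, the total count of the panel that bar belongs to,
# so each bar height becomes a proportion within its own facet.
p <- ggplot(df, aes(outcome)) +
  geom_bar(aes(y = after_stat(count / ave(count, PANEL, FUN = sum)))) +
  facet_wrap(~gender, nrow = 2, ncol = 1) +
  ylab("proportion within gender")
p
```

You can check the computed bar heights with ggplot_build(p)$data[[1]].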

I usually do this by simply precalculating the values outside of ggplot2 and using stat = "identity":
library(plyr)      # ddply
library(reshape2)  # melt
df1 <- melt(ddply(df,.(gender),function(x){prop.table(table(x$outcome))}),id.vars = 1)
ggplot(df1, aes(x = variable,y = value)) +
facet_wrap(~gender, nrow=2, ncol=1) +
geom_bar(stat = "identity")
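If you would rather not depend on plyr and reshape2, the same precalculation can be sketched with base R only. The names props and prop are just illustrative, and geom_col() is used here as shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Same data as in the question.
df <- rbind(
  data.frame(gender = rep("M", 5), outcome = c(rep("1", 4), "0")),
  data.frame(gender = rep("F", 10), outcome = c(rep("1", 7), rep("0", 3)))
)

# margin = 1 normalizes each row of the table, i.e. each gender sums to 1.
props <- prop.table(table(gender = df$gender, outcome = df$outcome), margin = 1)

# Long format: one row per gender/outcome pair with its proportion.
df1 <- as.data.frame(props, responseName = "prop")

ggplot(df1, aes(x = outcome, y = prop)) +
  geom_col() +
  facet_wrap(~gender, nrow = 2, ncol = 1)
```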

Related

Make geom_histogram display x-axis labels as integers instead of numerics

I have a data.frame that has counts for several groups:
set.seed(1)
df <- data.frame(group = sample(c("a","b"),200,replace = T),
n = round(runif(200,1,2)))
df$n <- as.integer(df$n)
And I'm trying to display a histogram of df$n, faceted by group, using ggplot2's geom_histogram:
library(ggplot2)
ggplot(data = df, aes(x = n)) + geom_histogram() + facet_grid(~group) + theme_minimal()
Any idea how to get ggplot2 to label the x-axis ticks with the integers the histogram is summarizing rather than the numeric values it is currently showing?
You could tweak this with the binwidth argument of geom_histogram:
library(ggplot2)
ggplot(data = df, aes(x = n)) +
geom_histogram(binwidth = 0.5) +
facet_grid(~group) +
theme_minimal()
Another example:
set.seed(1)
df <- data.frame(group = sample(c("a","b"),200,replace = T),
n = round(runif(200,1,5)))
library(ggplot2)
ggplot(data = df, aes(x = n)) +
geom_histogram(binwidth = 0.5) +
facet_grid(~group) +
theme_minimal()
You can manually specify the breaks with scale_x_continuous(breaks = seq(1, 2)). Alternatively, you can set the breaks and labels separately as well.
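A small sketch of the manual-breaks approach, assuming you want one tick per integer between the smallest and largest value of n. The helper name int_breaks and the choice of binwidth = 1 are mine, not part of the answers above.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(group = sample(c("a", "b"), 200, replace = TRUE),
                 n = as.integer(round(runif(200, 1, 5))))

# One break per integer in the observed range of n.
int_breaks <- seq(min(df$n), max(df$n), by = 1)

ggplot(df, aes(x = n)) +
  geom_histogram(binwidth = 1) +
  scale_x_continuous(breaks = int_breaks) +
  facet_grid(~group) +
  theme_minimal()
```

Deriving the breaks from the data keeps the ticks correct if the range of n changes, whereas a hard-coded seq(1, 2) only fits the first example.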

Is it possible to create a ggMarginal plot without disaggregating the data?

I have a data frame with some points and their frequency of occurrence, and I want to plot the points (balls) with their size representing their frequency. I also want to use ggMarginal to create the marginal plots. The code below creates the marginals without taking the frequencies into account.
library(ggplot2)
df <- data.frame("x" = 1:5, "y" = c(5,8,8,12,10), "f" = c(4,5,8,8,5))
p <- ggplot(df, aes(x=x, y=y, size=f)) + geom_point() + theme_bw()
ggExtra::ggMarginal(p, data=df, type = "histogram")
I don't want to create another data frame with disaggregated data, although that would give the right marginals, as shown below:
# disaggregated data
df2 <- df[ rep(1:nrow(df), df$f), c("x", "y") ]
p <- ggplot(df2, aes(x=x, y=y)) + geom_point() + theme_bw()
ggExtra::ggMarginal(p, data=df2, type = "histogram")
But even if I try to use both data frames, the resulting marginals are still wrong.
p <- ggplot(df, aes(x=x, y=y, size=f)) + geom_point() + theme_bw()
ggExtra::ggMarginal(p, data=df2, type = "histogram")
1. Is it possible to create the marginals without disaggregating the data? How?
2. If 1. is not possible, how can I do it anyway, since none of the examples above produced the desired plot?
It can be done with the cowplot package.
library(tidyverse)
library(cowplot)
df <- data.frame("x" = 1:5,
"y" = c(5,8,8,12,10),
"f" = c(4,5,8,8,5))
df2 <- df[rep(1:nrow(df), df$f), c("x", "y") ]
p <-
ggplot(df, aes(x=x, y=y, size=f)) +
geom_count() +
theme_bw()
xhist <-
axis_canvas(p, axis = "x") +
geom_histogram(data = df2, aes(x = x), color = 'lightgray')
yhist <-
axis_canvas(p, axis = "y", coord_flip = TRUE) +
geom_histogram(data = df2, aes(x = y), color = 'lightgray') +
coord_flip()
p %>%
insert_xaxis_grob(xhist, grid::unit(1, "in"), position = "top") %>%
insert_yaxis_grob(yhist, grid::unit(1, "in"), position = "right") %>%
ggdraw()
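The answer above still builds the disaggregated df2. A possible way to skip that step, sketched here as my own variation rather than something from the answer, is to pass the frequency column to the histogram through the weight aesthetic, which the binning stat uses to weight each row's contribution to the counts. The binwidth = 1 setting is an assumption chosen to suit these integer-like values.

```r
library(ggplot2)
library(cowplot)

df <- data.frame(x = 1:5, y = c(5, 8, 8, 12, 10), f = c(4, 5, 8, 8, 5))

p <- ggplot(df, aes(x = x, y = y, size = f)) +
  geom_point() +
  theme_bw()

# weight = f makes each row count f times, so no expanded data frame is needed.
xhist <- axis_canvas(p, axis = "x") +
  geom_histogram(data = df, aes(x = x, weight = f),
                 binwidth = 1, color = "lightgray")

yhist <- axis_canvas(p, axis = "y", coord_flip = TRUE) +
  geom_histogram(data = df, aes(x = y, weight = f),
                 binwidth = 1, color = "lightgray") +
  coord_flip()

# Attach the marginal histograms to the top and right of the main plot.
p_top <- insert_xaxis_grob(p, xhist, grid::unit(1, "in"), position = "top")
p_both <- insert_yaxis_grob(p_top, yhist, grid::unit(1, "in"), position = "right")
ggdraw(p_both)
```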

full text label on Boxplot, with added mean point

I am trying to get text labels similar to those in https://stats.stackexchange.com/questions/8206/labeling-boxplots-in-r, but I can't get it to work. An MWE similar to what I have is this:
library(ggplot2)
library(reshape2)  # melt
data <- data.frame(replicate(5,sample(0:100,100,rep=TRUE)))
meanFunction <- function(x){
return(data.frame(y=round(mean(x),2),label=round(mean(x,na.rm=T),2)))}
ggplot(melt(data), aes(x=variable, y=value)) +
geom_boxplot(aes(fill=variable), width = 0.7) +
stat_summary(fun.y = mean, geom="point",colour="darkred", size=4) +
stat_summary(fun.data = meanFunction, geom="text", size = 4, vjust=1.3)
That produces something like "A" in the attached image, and I am trying to get something like "B" for each of the boxes. Thanks.
Here is my attempt. First, I reshaped your data and produced your boxplot, changing the size and colour of the text for the mean. Then I looked at the data ggplot used to draw the boxes, which you can access with ggplot_build(objectname)$data[[1]]; it contains the numbers you need. I selected the necessary columns and reshaped them into df. Using df, you can annotate the plot with the numbers you want.
library(dplyr)
library(tidyr)
library(ggplot2)
set.seed(123)
mydf <- data.frame(replicate(5,sample(0:100,100,rep=TRUE)))
mydf <- gather(mydf, variable, value)
meanFunction <- function(x){
return(data.frame(y=round(mean(x),2),label=round(mean(x,na.rm=T),2)))}
g <- ggplot(data = mydf, aes(x = variable, y = value, fill = variable)) +
geom_boxplot(width = 0.5) +
stat_summary(fun.y = mean, geom = "point",colour = "darkred", size=4) +
stat_summary(fun.data = meanFunction, geom ="text", color = "white", size = 3, vjust = 1.3)
df <- ggplot_build(g)$data[[1]] %>%
select(ymin:ymax, x) %>%
gather(type, value, - x) %>%
arrange(x)
g + annotate("text", x = df$x + 0.4, y = df$value, label = df$value, size = 3)
First, I would take your data and calculate all the boxplot statistics yourself. Here's one way to do that:
dd <- data.frame(replicate(5,sample(0:100,100,rep=TRUE)))
tt <- data.frame(t(sapply(dd, function(x) c(boxplot.stats(x)$stats, mean(x)))))
names(tt) <- c("ymin","lower","middle","upper","ymax", "mean")
tt$var <- factor(rownames(tt))
I'm sure there are prettier ways to do that with dplyr, but the point is that you'll need to calculate those values yourself so you know where to draw the labels. Then you can do
ggplot(tt) +
geom_boxplot(aes(x=var, ymin=ymin, lower=lower, middle=middle, upper=upper, ymax=ymax), stat="identity", width=.5) +
geom_text(aes(x=as.numeric(var)+.3, y=middle, label=formatC(middle,1, format="f")), hjust=0) +
geom_text(aes(x=as.numeric(var)+.3, y= lower, label=formatC(lower,1, format="f")), hjust=0) +
geom_text(aes(x=as.numeric(var)+.3, y= upper, label=formatC(upper,1, format="f")), hjust=0) +
geom_text(aes(x=as.numeric(var)+.3, y= ymax, label=formatC(ymax,1, format="f")), hjust=0) +
geom_text(aes(x=as.numeric(var)+.3, y= ymin, label=formatC(ymin,1, format="f")), hjust=0) +
geom_point(aes(x=var, y=mean)) +
geom_text(aes(x=as.numeric(var), y= mean, label=formatC(mean,1, format="f")), hjust=.5, vjust=1.5)
to draw each of the labels.

How to find Percent Frequency gg plot

There have been a few questions on here asking how to plot percent frequency. I have tried implementing the suggestions but am still having trouble.
I have the following vector:
var <- c(2,2,1,0,1,1,1,1,1,3,2,3,3,5,1,4,4,0,3,4,1,0,3,3,0,0,
1,3,2,6,2,2,2,1,0,2,3,2,0,0,0,0,3,2,2,4,3,2,2,0,4,1,0,1,3,1,4,3,1,2,
6,7,6,1,2,2,4,5,3,0,6,5,2,0,7,1,7,3,1,4,1,1,2,1,1,2,1,1,4,2,0,3,3,2,2,2,5,3,2,5,2,5)
I plotted a histogram using the following code:
library(data.table)
library(ggplot2)
df <- data.table(x = var)
df <- df[, .N, by=x]
df$x <- factor(df$x, levels=c(0:25))
p <- ggplot(df, aes(x=x, y= N)) +
geom_bar(
stat="identity", width=1.0,
colour = "darkgreen",
fill = 'paleturquoise4'
)
p <- p + scale_x_discrete(drop = FALSE)  # keep unused factor levels on the axis
p = p + coord_cartesian(ylim=c(0, 50)) +
scale_y_continuous(breaks=seq(0, 50, 2))
print(p)
I tried using the following but it does not work.
p <- ggplot(df, aes(x=x, y= N)) +
geom_bar(
aes(y = (..count..)/sum(..count..)),
stat="identity", width=1.0,
colour = "darkgreen",
fill = 'paleturquoise4'
)
One option is to do the calculation before you draw the graphic. But if I follow your approach, you would want something like this:
ggplot(df, aes(x=x)) +
geom_bar(aes(y = N/sum(N)), stat="identity", width=1.0,
colour = "dark green", fill = 'paleturquoise4') +
ylab("y")
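For the first option, doing the calculation before plotting, a minimal base R sketch might look like this. To keep it short it uses only the first ten values of var from the question; the names counts and pct are illustrative.

```r
library(ggplot2)

# First ten values of the vector from the question, just to keep the example short.
var <- c(2, 2, 1, 0, 1, 1, 1, 1, 1, 3)

# Count each distinct value, then convert counts to relative frequencies.
counts <- as.data.frame(table(x = var), responseName = "N")
counts$pct <- counts$N / sum(counts$N)

ggplot(counts, aes(x = x, y = pct)) +
  geom_col(width = 1, colour = "darkgreen", fill = "paleturquoise4") +
  scale_y_continuous(labels = scales::percent) +
  ylab("percent frequency")
```

Because pct is an ordinary column, you can inspect or reuse it (for labels, for example) before it ever reaches ggplot.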

plot only a select few facets in facet_grid

I am looking for a way to plot with facet_grid in ggplot2 that displays only a few selected facets. Say I have the following plot:
I have been looking for a quick way to, for instance, plot just facets 1 and 3.
#data
y<-1:12
x<-c(1,2,3,1,2,3,1,2,3,1,2,3)
z<-c("a","a","a","b","b","b","a","a","a","b","b","b")
df <- data.frame(x, y, z)  # cbind() would coerce every column, including y, to character
#plot
a <- ggplot(df, aes(x = z, y = y,
fill = z))
b <- a + geom_bar(stat = "identity", position = "dodge")
c <- b + facet_grid(. ~ x, scale = "free_y")
c
Obviously I figured out how to just chop up my data first, but surely this must be possible within ggplot2 itself. Even just a nudge would be most welcome.
Use subset in your ggplot call.
plot_1 = ggplot(subset(df, x %in% c(1, 2)), aes(x=z, y=y, fill=z)) +
geom_bar(stat = "identity", position = "dodge") +
facet_grid(. ~ x, scale = "free_y")
Would this be okay?
a <- ggplot(subset(df, x != 2), aes(x = z, y = y, fill = z))
b <- a + geom_bar(stat = "identity", position = "dodge")
c <- b + facet_grid(. ~ x, scale = "free_y")
c
