Shading part of a ggplot2 histogram

So I have this data:
dataset = rbinom(1000, 16, 0.5)
mean = mean(dataset)
sd = sd(dataset)
data_subset = subset(dataset, dataset >= (mean - 2*sd) & dataset <= (mean + 2*sd))
dataset = data.frame(X=dataset)
data_subset = data.frame(X=data_subset)
And here's how I'm drawing my histogram for dataset:
library(ggplot2)
ggplot(dataset, aes(x = X)) +
geom_histogram(aes(y = after_stat(density)), binwidth = 1, colour = "black", fill = "white") +
theme_bw()
How can I shade the data_subset portion of the histogram, i.e. the bins between mean - 2*sd and mean + 2*sd?

My solution is very similar to joran's; I think both are worth looking at for the slight differences. This version plots counts rather than densities and hard-codes the subset bounds:
ggplot(dataset,aes(x=X)) +
geom_histogram(binwidth=1,fill="white",color="black") +
geom_histogram(data=subset(dataset,X>6&X<10),binwidth=1,
colour="black", fill="grey")+theme_bw()

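Just add another geom_histogram layer using that data subset. One catch: density is computed separately for each layer, so after_stat(density) in the second layer would be normalized to the subset alone, and the shaded bars would stand taller than the full histogram. Dividing the subset's counts by nrow(dataset) keeps both layers on the same density scale. With binwidth = 1 and default binning, the bins of both layers are centered on whole numbers, so they line up:
ggplot(dataset, aes(x = X)) +
geom_histogram(aes(y = after_stat(density)), binwidth = 1, colour = "black", fill = "white") +
# with binwidth = 1, density = count / n, where n is the size of the full dataset
geom_histogram(data = data_subset, aes(y = after_stat(count / nrow(dataset))), binwidth = 1, colour = "black", fill = "grey") +
theme_bw()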
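A single-layer variant is also possible. This is a sketch, assuming ggplot2 3.3 or later for after_stat(): fill is mapped to a condition evaluated on the binned data, so shaded and unshaded bins come from the same density computation. The cutoff names lo and hi are introduced here for illustration, and the seed only makes the example reproducible.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed only for reproducibility
dataset <- data.frame(X = rbinom(1000, 16, 0.5))
# Hypothetical names for the mean +/- 2 sd cutoffs
lo <- mean(dataset$X) - 2 * sd(dataset$X)
hi <- mean(dataset$X) + 2 * sd(dataset$X)

p <- ggplot(dataset, aes(x = X)) +
  geom_histogram(
    aes(y = after_stat(density),
        # after_stat(x) is the bin center, so each bin is flagged
        # by whether its center lies within mean +/- 2 sd
        fill = after_stat(x >= lo & x <= hi)),
    binwidth = 1, colour = "black"
  ) +
  scale_fill_manual(values = c(`FALSE` = "white", `TRUE` = "grey"),
                    guide = "none") +
  theme_bw()
print(p)
```

Because there is only one layer, the densities always sum to one over the whole histogram, whichever bins end up shaded.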

Related

Can anyone explain why creating a histogram with two conditions shows an incorrect distribution in R?

I want to create a histogram with data from two different conditions (A and B in the example below). I want to plot both distributions in the same plot using geom_histogram in R.
However, it seems that for condition A, the distribution of the whole data set is shown (instead of only A).
In the example below, three cases are shown:
1) Plotting A and B
2) Plotting only A
3) Plotting only B
You will see that the distribution of A is not the same when you compare 1) and 2).
Can anyone explain why this occurs and how to fix this problem?
library(ggplot2)
set.seed(5)
# Create test data frame
test <- data.frame(
condition=factor(rep(c("A", "B"), each=200)),
value =c(rnorm(200, mean=12, sd=2.5), rnorm(200, mean=13, sd=2.1))
)
# Create separate data sets
test_a <- test[test$condition == "A",]
test_b <- test[test$condition == "B",]
# 1) Plot A and B
ggplot(test, aes(x=value, fill=condition)) +
geom_histogram(binwidth = 0.25, alpha=.5) +
ggtitle("Test A and AB")
# 2) Plot only A
ggplot(test_a, aes(x=value, fill=condition)) +
geom_histogram(binwidth = 0.25, alpha=.5) +
ggtitle("Test A")
# 3) Plot only B
ggplot(test_b, aes(x=value, fill=condition)) +
geom_histogram(binwidth = 0.25, alpha=.5) +
ggtitle("Test B")
An alternative for visualization, not to supplant MichaelDewar's answer:
# 1) Plot A and B
ggab <- ggplot(test, aes(x=value, fill=condition)) +
geom_histogram(binwidth = 0.25, alpha=.5, position = "identity") +
ggtitle("Test A and AB") +
xlim(5, 20) +
ylim(0, 13)
# 2) Plot only A
gga <- ggplot(test_a, aes(x=value, fill=condition)) +
geom_histogram(binwidth = 0.25, alpha=.5) +
ggtitle("Test A") +
xlim(5, 20) +
ylim(0, 13)
# 3) Plot only B
ggb <- ggplot(test_b, aes(x=value, fill=condition)) +
geom_histogram(binwidth = 0.25, alpha=.5) +
ggtitle("Test B") +
xlim(5, 20) +
ylim(0, 13)
library(patchwork) # solely for a quick side-by-side-by-side presentation
gga + ggab + ggb + plot_annotation(title = 'position = "identity"')
The key in this visualization is adding position = "identity" to the first histogram (the others contain a single condition, so there is nothing to stack).
Alternatively, one could use position = "dodge" (this is best viewed at full size in the console; it is hard to read in a small snapshot).
For perspective, position = "stack", the default, shows "A" with a visibly altered histogram.
The plots are stacked in the A+B plot, so the A bars start at the top of the B bars. Also, the axis scales differ between the plots. It's also possible that the bins have different endpoints.
So, yes, the A+B plot is showing the total distribution. The fill helps you see the contribution from each of the A and B.
If you want to overlay the two plots, use:
ggplot(mapping = aes(x=value, fill=condition)) +
geom_histogram(data = test_a, binwidth = 0.25, alpha=.5) +
geom_histogram(data = test_b, binwidth = 0.25, alpha=.5) +
ggtitle("Test A and AB")
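To check the stacking explanation numerically, here is a small sketch using layer_data(), which returns a layer's computed data. It compares the default stacked histogram with position = "identity": in the stacked version the top of each column equals the combined A + B count for that bin, which is why condition A looked different in the combined plot.

```r
library(ggplot2)

set.seed(5)
test <- data.frame(
  condition = factor(rep(c("A", "B"), each = 200)),
  value = c(rnorm(200, mean = 12, sd = 2.5), rnorm(200, mean = 13, sd = 2.1))
)

base <- ggplot(test, aes(x = value, fill = condition))
# layer_data() returns the layer's data after the stat and position steps
stacked  <- layer_data(base + geom_histogram(binwidth = 0.25))
overlaid <- layer_data(base + geom_histogram(binwidth = 0.25, position = "identity"))

# Default position = "stack": the top of each column is the A + B total
col_top   <- tapply(stacked$ymax, stacked$x, max)
col_total <- tapply(stacked$count, stacked$x, sum)
head(cbind(col_top, col_total))

# position = "identity": every bar starts at 0 and ends at its own count
head(overlaid[, c("group", "x", "ymin", "ymax", "count")])
```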

Change the scale of x axis in ggplot

I have a ggplot bar chart and don't know how to change the scale of the x axis. (The screenshot of the current plot is not included.) I'd like to reorder the axis scale so that the 21% bar is higher than the 7% bar. How can I get the % onto the axis? Thanks in advance!
df= data.frame("number" = c(7,21), "name" = c("x","y"))
df
ggplot(df, aes(x=name, y=number)) +
geom_bar(stat="identity", fill = "blue") + xlab("Title") + ylab("Title") +
ggtitle("Title")
Use the prop.table function on the y variable inside aes():
ggplot(df, aes(x=name, y=100*prop.table(number))) +
geom_bar(stat="identity", fill = "blue") +
xlab("Sample") + ylab("Package quantity (absolute)") +
ggtitle("Total quantity")
If you want the % character on the y axis, add scale_y_continuous(labels = percent) from the scales package:
library(ggplot2)
library(scales)
ggplot(df, aes(x=name, y=prop.table(number))) +
geom_bar(stat="identity", fill = "blue") +
xlab("Sample") + ylab("Package quantity (absolute)") +
ggtitle("Total quantity") +
scale_y_continuous(labels=percent)
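Note that prop.table() rescales the values to shares of their total, so these bars show 25% and 75% rather than 7% and 21%. If the numbers are already percentages, dividing by 100 keeps them as they are. A quick check of the arithmetic:

```r
number <- c(7, 21)
# prop.table() divides each value by the total: 7/28 and 21/28
shares <- prop.table(number)
# Values that are already percentages only need rescaling by 100
pct <- number / 100
shares
pct
```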
The only way I am able to duplicate the original plot is, as @sconfluentus noted, for the 7% and 21% to be character strings (the axis is then discrete and sorts "21%" before "7%", so the 7% bar ends up higher). As an aside, the data frame column names need not be quoted.
df= data.frame(number = c('7%','21%'), name = c("x","y"))
df
ggplot(df, aes(x=name, y=number)) +
geom_bar(stat="identity", fill = "blue") + xlab("Title") + ylab("Title") +
ggtitle("Title")
Changing the numbers to c(0.07, 0.21) and adding, as @Mohanasundaram noted, scale_y_continuous(labels = scales::percent) corrects the situation.
To be pedantic, also using breaks = c(0.07, 0.21) creates a nearly exact duplicate. See also here.
Hope this is helpful.
library(ggplot2)
library(scales)
df= data.frame(number = c(0.07,0.21), name = c("KG","MS"))
df
ggplot(df, aes(x=name, y=number)) +
geom_bar(stat="identity", fill = "blue") + xlab("Title") + ylab("Title") +
ggtitle("Title") + scale_y_continuous(labels = scales::percent, breaks = c(.07, .21))
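A slightly more compact sketch of the same fix, assuming scales 1.0 or later for label_percent(): geom_col() is shorthand for geom_bar(stat = "identity"), and accuracy = 1 rounds the axis labels to whole percentages.

```r
library(ggplot2)
library(scales)

df <- data.frame(number = c(0.07, 0.21), name = c("x", "y"))

p <- ggplot(df, aes(x = name, y = number)) +
  geom_col(fill = "blue") +  # same as geom_bar(stat = "identity")
  scale_y_continuous(labels = label_percent(accuracy = 1),
                     breaks = c(0.07, 0.21)) +
  labs(x = "Title", y = "Title", title = "Title")
print(p)
```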

Is there a possibility to combine position_stack and nudge_x in a stacked bar chart in ggplot2?

I want to add labels to a stacked bar chart.
The goal is simple: I need to show market shares and changes versus the previous year in the same graph. In theory, I would just add nudge_x = 0.5 to the second geom_text call, but then I get the error "Specify either position or nudge_x/nudge_y". Is there a workaround, maybe in another package? Thanks a lot in advance!
Code:
DashboardCategoryText <- c("Total Market","Small Bites","Bars","Total Market","Small Bites","Bars","Total Market","Small Bites","Bars")
Manufacturer <- c("Ferrero","Ferrero","Ferrero","Rest","Rest","Rest","Kraft","Kraft","Kraft")
MAT <- c(-1,5,-7,6,8,10,-10,5,8)
Measure_MATCurrent <- c(500,700,200,1000,600,80,30,60,100)
data <- data.frame(DashboardCategoryText,Manufacturer,MAT,Measure_MATCurrent)
library(dplyr)
groupedresult <- group_by(data,DashboardCategoryText)
groupedresult <- summarize(groupedresult,SUM=sum(Measure_MATCurrent))
groupedresult <- as.data.frame(groupedresult)
data <- merge(data,groupedresult,by="DashboardCategoryText")
data$percent <- data$Measure_MATCurrent/data$SUM
library(ggplot2)
library(scales)  # for percent_format()
ggplot(data, aes(x=reorder(DashboardCategoryText, SUM), y=percent, fill=Manufacturer)) +
geom_bar(stat = "identity", width = .7, colour="black", lwd=0.1) +
geom_text(aes(label=ifelse(percent >= 0.005, paste0(sprintf("%.0f", percent*100),"%"),"")),
position=position_stack(vjust=0.5), colour="white") +
geom_text(aes(label=MAT,y=percent),
nudge_x=0.5,
position=position_stack(vjust=0.8),
colour="black") +
coord_flip() +
scale_y_continuous(labels = percent_format()) +
labs(y="", x="")
I have a somewhat 'hacky' solution: build the plot, shift the x positions in the computed data of the second geom_text layer (the third layer overall), and then draw it.
p <- ggplot(data, aes(x=reorder(DashboardCategoryText, SUM), y=percent, fill=Manufacturer)) +
geom_bar(stat = "identity", width = .7, colour="black", lwd=0.1) +
geom_text(aes(label=ifelse(percent >= 0.005, paste0(sprintf("%.0f", percent*100),"%"),"")),
position=position_stack(vjust=0.5), colour="white") +
geom_text(aes(label=MAT,y=percent),
position=position_stack(vjust=.5),
colour="black") +
coord_flip() +
scale_y_continuous(labels = percent_format()) +
labs(y="", x="")
q <- ggplot_build(p) # get the ggplot data
q$data[[3]]$x <- q$data[[3]]$x + 0.5 # change it to adjust the x position of geom_text
plot(ggplot_gtable(q)) # plot everything
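A workaround that avoids editing the built plot, offered as a sketch rather than a tested answer for every ggplot2 version: give the change labels their own numeric x position, as.numeric(category) + 0.5, in place of nudge_x. A discrete axis accepts numeric positions, and position_stack() still stacks these labels at the same heights as the bar segments; with coord_flip() the offset moves them along the category axis, as nudge_x = 0.5 was meant to. The category column and the base R ave() call (instead of dplyr) are choices made for this example.

```r
library(ggplot2)
library(scales)

data <- data.frame(
  DashboardCategoryText = rep(c("Total Market", "Small Bites", "Bars"), 3),
  Manufacturer = rep(c("Ferrero", "Rest", "Kraft"), each = 3),
  MAT = c(-1, 5, -7, 6, 8, 10, -10, 5, 8),
  Measure_MATCurrent = c(500, 700, 200, 1000, 600, 80, 30, 60, 100)
)
# Category totals and within-category shares (base R instead of dplyr)
data$SUM <- ave(data$Measure_MATCurrent, data$DashboardCategoryText, FUN = sum)
data$percent <- data$Measure_MATCurrent / data$SUM
# Hypothetical helper column: categories ordered by their total
data$category <- reorder(data$DashboardCategoryText, data$SUM)

p <- ggplot(data, aes(x = category, y = percent, fill = Manufacturer)) +
  geom_col(width = 0.7, colour = "black") +
  geom_text(aes(label = paste0(round(100 * percent), "%")),
            position = position_stack(vjust = 0.5), colour = "white") +
  # Numeric x shifted by 0.5 replaces nudge_x; the labels are still stacked
  geom_text(aes(x = as.numeric(category) + 0.5, label = MAT),
            position = position_stack(vjust = 0.5), colour = "black") +
  coord_flip() +
  scale_y_continuous(labels = percent_format()) +
  labs(x = NULL, y = NULL)
print(p)
```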

Deleting an entire row of facets of unused factor level combination

I want to remove the 2nd row of facets from my plot below because there is no data for that factor combination.
library(ggplot2)
library(grid)
set.seed(5000)
# generate first df
df1 = data.frame(x=rep(rep(seq(2,8,2),4),6),
y=rep(rep(seq(2,8,2),each=4),6),
v1=c(rep("x1",32),rep("x2",64)),
v2=c(rep("y1",64),rep("y2",32)),
v3=rep(rep(c("t1","t2"),each=16),3),
v4=rbinom(96,1,0.5))
# generate second df
df2 = data.frame(x=runif(20)*10, y=runif(20)*10,
v1=sample(c("x1","x2"),20,TRUE))
# plot
ggplot() +
geom_point(data=df1, aes(x=x, y=y, colour = factor(v4)), shape=15, size=5) +
scale_colour_manual(values = c(NA,"black")) + facet_grid(v1+v2~v3, drop = TRUE) +
geom_point(data=df2, aes(x=x,y=y), shape=23 , colour="black", fill="white", size=4) +
coord_equal(ratio=1) + xlim(0, 10) + ylim(0, 10)
I tried to use the idea from this post:
g=ggplotGrob(y)  # y: the plot above, saved to a variable first (y <- ggplot() + ...)
pos=which(g$layout$t==5 | g$layout$t==6)
g$layout=g$layout[-c(pos),]
g$grobs=g$grobs[-c(pos)]
grid.newpage()
grid.draw(g)
...but this left white space where the removed row used to be.
How do I eliminate the white space? Also, is there a straightforward solution that doesn't require manipulating the grobs?
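Just modify the data. df2 has no v2 column, so facet_grid() repeats its points in every v2 row for each v1, which creates the x1/y2 panels even though df1 has no data there (drop = TRUE does not prevent this). Give df2 a v2 column and remove the x1/y2 rows: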
df2 <- rbind(cbind(df2, v2 = "y1"),
cbind(df2, v2 = "y2"))
df2 <- df2[!(df2$v1 == "x1" & df2$v2 == "y2"),]
# plot
ggplot() +
geom_point(data=df1, aes(x=x, y=y, colour = factor(v4)), shape=15, size=5) +
scale_colour_manual(values = c(NA,"black")) + facet_grid(v1+v2~v3, drop = TRUE) +
geom_point(data=df2, aes(x=x,y=y), shape=23 , colour="black", fill="white", size=4) +
coord_equal(ratio=1) + xlim(0, 10) + ylim(0, 10)
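A more general version of the same idea, as a sketch (combos and df2_full are names introduced here): take the v1/v2 combinations that actually occur in df1 and merge df2 onto them, so each point is copied into exactly those panels without hard-coding the empty combination.

```r
library(ggplot2)

set.seed(5000)
df1 <- data.frame(x = rep(rep(seq(2, 8, 2), 4), 6),
                  y = rep(rep(seq(2, 8, 2), each = 4), 6),
                  v1 = c(rep("x1", 32), rep("x2", 64)),
                  v2 = c(rep("y1", 64), rep("y2", 32)),
                  v3 = rep(rep(c("t1", "t2"), each = 16), 3),
                  v4 = rbinom(96, 1, 0.5))
df2 <- data.frame(x = runif(20) * 10, y = runif(20) * 10,
                  v1 = sample(c("x1", "x2"), 20, TRUE))

# v1/v2 combinations that actually occur in df1
combos <- unique(df1[c("v1", "v2")])
# Copy each df2 point into every v2 panel that exists for its v1
df2_full <- merge(df2, combos, by = "v1")

p <- ggplot() +
  geom_point(data = df1, aes(x = x, y = y, colour = factor(v4)), shape = 15, size = 5) +
  scale_colour_manual(values = c(NA, "black")) +
  geom_point(data = df2_full, aes(x = x, y = y), shape = 23,
             colour = "black", fill = "white", size = 4) +
  facet_grid(v1 + v2 ~ v3) +
  coord_equal(ratio = 1) + xlim(0, 10) + ylim(0, 10)
print(p)
```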

How to plot percent frequency with ggplot

There have been a few questions on here asking how to plot percent frequency. I have tried implementing the suggestions but am still having trouble.
I have the following vector:
var <- c(2,2,1,0,1,1,1,1,1,3,2,3,3,5,1,4,4,0,3,4,1,0,3,3,0,0,
1,3,2,6,2,2,2,1,0,2,3,2,0,0,0,0,3,2,2,4,3,2,2,0,4,1,0,1,3,1,4,3,1,2,
6,7,6,1,2,2,4,5,3,0,6,5,2,0,7,1,7,3,1,4,1,1,2,1,1,2,1,1,4,2,0,3,3,2,2,2,5,3,2,5,2,5)
I plotted a histogram using the following code:
library(data.table)
library(ggplot2)
df <- data.table(x = var)
df <- df[, .N, by=x]
df$x <- factor(df$x, levels=c(0:25))
p <- ggplot(df, aes(x=x, y= N)) +
geom_bar(
stat="identity", width=1.0,
colour = "darkgreen",
fill = 'paleturquoise4'
)
p <- p + scale_x_discrete(drop = FALSE)
p = p + coord_cartesian(ylim=c(0, 50)) +
scale_y_continuous(breaks=seq(0, 50, 2))
print(p)
I tried using the following but it does not work.
p <- ggplot(df, aes(x=x, y= N)) +
geom_bar(
aes(y = (..count..)/sum(..count..)),
stat="identity", width=1.0,
colour = "darkgreen",
fill = 'paleturquoise4'
)
One option is to do the calculation before drawing the plot. But if you want to keep your approach: with stat = "identity" there is no computed ..count.. variable, so compute the proportion from N directly:
ggplot(df, aes(x=x)) +
geom_bar(aes(y = N/sum(N)), stat="identity", width=1.0,
colour = "darkgreen", fill = 'paleturquoise4') +
ylab("Proportion")
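Doing the calculation before plotting could look like this sketch in base R (instead of data.table). Keeping levels 0:25 in the factor keeps empty values in the table, and label_percent() from the scales package formats the axis; share is a column name chosen for this example.

```r
library(ggplot2)
library(scales)

var <- c(2,2,1,0,1,1,1,1,1,3,2,3,3,5,1,4,4,0,3,4,1,0,3,3,0,0,
         1,3,2,6,2,2,2,1,0,2,3,2,0,0,0,0,3,2,2,4,3,2,2,0,4,1,0,1,3,1,4,3,1,2,
         6,7,6,1,2,2,4,5,3,0,6,5,2,0,7,1,7,3,1,4,1,1,2,1,1,2,1,1,4,2,0,3,3,2,2,2,5,3,2,5,2,5)

# Relative frequency of each value 0..25, computed before plotting
tab <- as.data.frame(prop.table(table(x = factor(var, levels = 0:25))),
                     responseName = "share")

p <- ggplot(tab, aes(x = x, y = share)) +
  geom_col(width = 1, colour = "darkgreen", fill = "paleturquoise4") +
  scale_y_continuous(labels = label_percent(accuracy = 1)) +
  ylab("Percent")
print(p)
```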
