Consider the following two plots:
library(ggplot2)
set.seed(666)
bigx <- data.frame(x=sample(1:12,50,replace=TRUE))
ggplot(bigx, aes(x=x)) +
geom_histogram(fill = "red", colour = "black", stat="bin", binwidth=2) +
ylab("Frequency") +
xlab("things") +
ylim(c(0,30))
hist(bigx$x)
Why do I get the overhang above 12 with ggplot? When I play with right = TRUE, this just shifts the overhang to below zero. I want the simple, neatly bounded result from hist(), but using ggplot2.
How can I do this?
If your goal is to reproduce the output of hist(...) using ggplot, this will work:
ggplot(bigx, aes(x=x)) +
geom_histogram(fill = "red", colour = "black",stat="bin",
binwidth=2, right=TRUE) +
scale_x_continuous(limits=c(0,12),breaks=seq(0,12,2))
Or, more generally, this:
brks <- hist(bigx$x, plot=FALSE)$breaks
ggplot(bigx, aes(x=x)) +
geom_histogram(fill = "red", colour = "black",stat="bin",
breaks=brks, right=TRUE) +
scale_x_continuous(limits=range(brks),breaks=brks)
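Evidently, the two functions close their intervals on different sides by default: hist(...) uses right-closed intervals (right = TRUE), whereas the ggplot2 version used here defaulted to left-closed intervals (right = FALSE). Newer ggplot2 versions replace right with closed = "right"/"left", which defaults to "right". Also, ggplot uses a different algorithm for calculating the x-axis breaks and limits.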
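The closure rule only matters for values that sit exactly on a break, and with integer data many do. A base-R sketch using the question's data (the exact sample depends on R's sampler version): right-closed bins of width 2 hold every value from 1 to 12, while left-closed bins leave the 12s outside, which is what forces an extra bar above 12. The optional ggplot2 part assumes a version where stat_bin() accepts closed = "right" instead of right = TRUE.

```r
set.seed(666)
x <- sample(1:12, 50, replace = TRUE)
brks <- seq(0, 12, by = 2)

# Right-closed bins (0,2], (2,4], ..., (10,12]: the hist() default
right_closed <- table(cut(x, brks, right = TRUE))

# Left-closed bins [0,2), ..., [10,12): every 12 becomes NA here and would
# need an extra bin [12,14), i.e. the overhang above 12
left_closed <- table(cut(x, brks, right = FALSE))

hist_counts <- hist(x, breaks = brks, plot = FALSE)$counts

# Optional: the same bins in newer ggplot2 via closed = "right"
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(x = x), aes(x = x)) +
    geom_histogram(breaks = brks, closed = "right",
                   fill = "red", colour = "black") +
    scale_x_continuous(breaks = brks)
  # Bar heights as computed by ggplot2
  gg_counts <- ggplot_build(p)$data[[1]]$count
}
```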
I am trying to improve the clarity and appearance of a histogram of discrete values which I need to represent with a log scale.
Please consider the following MWE
library(ggplot2)
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) + geom_histogram()
which produces a histogram (image not reproduced here), and then
ggplot(data, aes(x=dist)) + geom_histogram() + scale_x_log10(breaks=c(1,2,3,4,5,10,100))
which is probably even worse,
since it now gives the impression that something is missing between "1" and "2", and it is not clear which bar has the value "1" (the bar is to the right of the tick) and which has the value "2" (the bar is to the left of the tick).
I understand that technically ggplot gives the "right" visual answer for a log scale. Yet as an observer, I have trouble reading it.
Is it possible to improve this?
EDIT:
This is what happens when I apply Jaap's solution to my real data (plot not reproduced here).
Where do the dips between x=0 and x=1 and between x=1 and x=2 come from? My values are discrete, so why does the plot also have values at x=1.5 and x=2.5?
The first thing that comes to mind is playing with the binwidth. But that doesn't give a great solution either:
ggplot(data, aes(x=dist)) +
geom_histogram(binwidth=10) +
scale_x_continuous(expand=c(0,0)) +
scale_y_continuous(expand=c(0.015,0)) +
theme_bw()
gives a result that is not reproduced here.
In this case it is probably better to use a density plot. However, when you use scale_x_log10 you will get a warning message (Removed 524 rows containing non-finite values (stat_density)), because log10(0) is -Inf and every zero in the data is dropped. This can be resolved by using a log-plus-one transformation, log(1 + x), which maps 0 to 0.
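A small base-R check of why the zeros disappear, using hypothetical values. As far as I know, scales::log1p_trans() (what trans = "log1p" refers to) is built from log1p() and expm1(), which is the pair used here:

```r
# Hypothetical discrete values, including zeros as in the question's data
x <- c(0, 0, 1, 2, 3, 10, 100)

# log10(0) is -Inf: these positions are non-finite and get dropped
on_log10 <- log10(x)

# log1p(x) = log(1 + x): zero maps to 0, so every value keeps a finite position
on_log1p <- log1p(x)

# expm1() undoes log1p(), so axis labels can stay in the original units
back <- expm1(on_log1p)
```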
The following code:
library(ggplot2)
library(scales)
ggplot(data, aes(x=dist)) +
stat_density(aes(y=..count..), color="black", fill="blue", alpha=0.3) +
scale_x_continuous(breaks=c(0,1,2,3,4,5,10,30,100,300,1000), trans="log1p", expand=c(0,0)) +
scale_y_continuous(breaks=c(0,125,250,375,500,625,750), expand=c(0,0)) +
theme_bw()
will give this result (plot not reproduced here).
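The dips asked about in the question's edit come from the density approach itself: stat_density() fits a smooth curve over a continuous x, so it also has values at 1.5 or 2.5, and between two observed integers the curve falls. A base-R sketch with hypothetical data that only takes the values 0, 1 and 2; the bandwidth is an arbitrary choice (not ggplot2's default) that makes the dips obvious:

```r
# Hypothetical discrete data: only the values 0, 1 and 2 occur
x <- rep(c(0, 1, 2), times = c(50, 30, 20))

# Kernel density estimate on the log1p scale, as with trans = "log1p"
d <- density(log1p(x), bw = 0.05)

# Read the estimate off at integer and half-integer positions
dens_at <- function(v) approx(d$x, d$y, xout = log1p(v))$y
heights <- dens_at(c(0, 0.5, 1, 1.5, 2))
```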
I am wondering what happens if the y-axis is scaled instead of the x-axis. This produces a few warnings wherever counts are 0, but it may serve your purpose.
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) + geom_histogram() + scale_y_log10()
Also, you may want to display the frequencies as data labels, since readers might overlook the y-scale, and it takes some time to realize that it is logarithmic.
ggplot(data, aes(x=dist)) + geom_histogram(fill = 'skyblue', color = 'grey30') + scale_y_log10() +
stat_bin(geom="text", size=3.5, aes(label=..count.., y=0.8*(..count..)))
A solution could be to convert your data to a factor:
library(ggplot2)
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
ggplot(data, aes(x=factor(dist))) +
geom_bar() +  # one bar per observed value
theme(axis.text.x = element_text(angle = 90, hjust = 1))
This results in one bar per observed value (plot not reproduced here).
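Converting to a factor makes the x axis categorical: only values that actually occur get a bar, all bars are evenly spaced, and the numeric distance between, say, 5 and 100 is no longer visible. A quick base-R look at what the factor holds, using hypothetical values:

```r
# Hypothetical integer values with gaps (nothing between 2 and 5, or 5 and 100)
dist <- c(0L, 0L, 1L, 2L, 2L, 5L, 100L)

# factor() keeps only observed values, ordered numerically
f <- factor(dist)
lv <- levels(f)

# The counts geom_bar() would draw, one evenly spaced bar per level
counts <- as.vector(table(f))
```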
I had the same issue and, inspired by @Jaap's answer, I fiddled with the histogram binwidth while keeping the x-axis on a log scale.
If you use binwidth = 0.201, the bars sit next to each other as expected. However, because the binwidth is measured in log10 units, you can have at most about five bars per decade (e.g. between 1 and 10).
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) +
geom_histogram(binwidth = 0.201, color = 'red') +
scale_x_log10()
Result: plot not reproduced here.
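Why 0.201 works: on scale_x_log10() the binwidth is in log10 units, so 0.201 gives about 1/0.201, i.e. roughly 5 bins per decade, and each of those bins between 1 and 10 still contains at least one integer. With narrower bins, some fall between two consecutive integers and stay empty, which shows up as gaps. The helper below is hypothetical and assumes the first edge sits at log10(1) = 0 with left-closed bins; ggplot2's own bin placement may differ.

```r
# Hypothetical helper: how many of the integers 1..upper fall into each
# equal-width bin on a log10 axis (first edge at 0, bins [a, b))
integers_per_bin <- function(binwidth, upper = 10) {
  edges <- seq(0, log10(upper) + binwidth, by = binwidth)
  bin <- findInterval(log10(seq_len(upper)), edges)
  tabulate(bin, nbins = length(edges) - 1)
}

wide <- integers_per_bin(0.201)  # every bin holds at least one integer
narrow <- integers_per_bin(0.15) # the bin from about 1.41 to 1.99 is empty
```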
How can I draw one series on top of another in ggplot2? I want to draw the grey time series on top of the red one using ggplot2 in R (currently the red one is drawn above the grey one, and I want it the other way around). Here is my code (I generate some data to show the problem; the real dataset is much more complex):
# install.packages("ggplot2")  # only needed once
library(ggplot2)
time <- rep(1:100,2)
timeseries <- c(rep(0.5,100),rep(c(0,1),50))
upper <- c(rep(0.7,100),rep(0,100))
lower <- c(rep(0.3,100),rep(0,100))
legend <- c(rep("red should be under",100),rep("grey should be above",100))
dataset <- data.frame(timeseries,upper,lower,time,legend)
ggplot(dataset, aes(x=time, y=timeseries)) +
geom_line(aes(colour=legend, size=legend)) +
geom_ribbon(aes(ymax=upper, ymin=lower, fill=legend), alpha = 0.2) +
scale_colour_manual(limits=c("grey should be above","red should be under"),values = c("grey50","red")) +
scale_fill_manual(values = c(NA, "red")) +
scale_size_manual(values=c(0.5, 1.5)) +
theme(legend.position="top", legend.direction="horizontal",legend.title = element_blank())
Convert the variable you are grouping on into a factor and set the order of its levels explicitly: ggplot draws the groups in this order, so the first level ends up underneath. For readability, it also helps to place each scale_*_manual() call next to the geom it applies to.
legend <- factor(legend, levels = c("red should be under","grey should be above"))
dataset <- data.frame(timeseries,upper,lower,time,legend)
ggplot(dataset, aes(x=time, y=timeseries)) +
geom_ribbon(aes(ymax=upper, ymin=lower, fill=legend), alpha = 0.2) +
scale_fill_manual(values = c("red", NA)) +
geom_line(aes(colour=legend, size=legend)) +
scale_colour_manual(values = c("red","grey50")) +
scale_size_manual(values=c(1.5,0.5)) +
theme(legend.position="top", legend.direction="horizontal",legend.title = element_blank())
Note that the values in each scale_*_manual() are now listed in level order: red first, then grey.
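What the factor changes can be seen without plotting. By default the levels are sorted alphabetically, so "grey should be above" becomes group 1 and is drawn first (underneath); the explicit levels make red group 1 instead. This assumes, as the answer describes, that ggplot2 draws groups in level order:

```r
legend_chr <- rep(c("red should be under", "grey should be above"), each = 2)

# Default: alphabetical levels, grey is group 1 (drawn first, underneath)
default_levels <- levels(factor(legend_chr))

# Explicit levels: red is group 1, grey is group 2 and is drawn over it
legend_fct <- factor(legend_chr,
                     levels = c("red should be under", "grey should be above"))
explicit_levels <- levels(legend_fct)
group_ids <- as.integer(legend_fct)
```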
I want to create the following histogram with a density curve using ggplot2. With base R graphics it is really easy:
set.seed(46)
vector <- rnorm(500)
breaks <- quantile(vector,seq(0,1,by=0.1))
labels = 1:(length(breaks)-1)
den = density(vector)
hist(vector,
breaks=breaks,
col=rainbow(length(breaks)),
probability=TRUE)
lines(den)
With ggplot I have reached this so far:
library(ggplot2)
seg <- cut(vector,breaks,
labels=labels,
include.lowest = TRUE, right = TRUE)
df = data.frame(vector=vector,seg=seg)
ggplot(df) +
geom_histogram(breaks=breaks,
aes(x=vector,
y=..density..,
fill=seg)) +
geom_density(aes(x=vector,
y=..density..))
But the y scale has the wrong magnitude. I noticed that the next version gets the y scale right:
ggplot(df) +
geom_histogram(breaks=breaks,
aes(x=vector,
y=..density..)) +
geom_density(aes(x=vector,
y=..density..))
I just do not understand it. y=..density.. is there, and that should be the height. So why on earth does my scale change when I add the fill?
I do need the colours. I just want a histogram whose breaks are set directly and whose blocks are coloured with the default ggplot fill colours.
I added colours to your decile bars manually. See if this works for you.
library(ggplot2)
ggplot(df, aes(x=vector)) +
geom_histogram(breaks=breaks,aes(y=..density..),colour="black",fill=c("red","orange","yellow","lightgreen","green","darkgreen","blue","darkblue","purple","pink")) +
geom_density(aes(y=..density..)) +
scale_x_continuous(breaks=c(-3,-2,-1,0,1,2,3)) +
ylab("Density") + xlab("df$vector") + ggtitle("Histogram of df$vector") +
theme_bw() + theme(plot.title=element_text(size=20),
axis.title.y=element_text(size = 16, vjust=+0.2),
axis.title.x=element_text(size = 16, vjust=-0.2),
axis.text.y=element_text(size = 14),
axis.text.x=element_text(size = 14),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
fill=seg results in grouping. You are actually getting a different histogram for each value of seg, and each of those histograms is scaled so that its own area is 1. If you don't need the colours, you could use this:
ggplot(df) +
geom_histogram(breaks=breaks,aes(x=vector,y=..density..), position="identity") +
geom_density(aes(x=vector,y=..density..))
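The size of the distortion can be checked in base R with the question's data: with fill = seg, each decile group's density is computed from that group's own count (50) instead of the whole sample (500), so every bar comes out 10 times too tall. This sketch reproduces the arithmetic rather than calling ggplot2:

```r
set.seed(46)
vector <- rnorm(500)
breaks <- quantile(vector, seq(0, 1, by = 0.1))
seg <- cut(vector, breaks, include.lowest = TRUE, right = TRUE)

width <- unname(diff(breaks))
counts <- as.vector(table(seg))   # 50 observations in every decile bin

# One histogram for the whole sample: count / (n * width)
dens_all <- counts / (length(vector) * width)

# fill = seg: each group is its own histogram, so n is the group size and
# the group's only non-empty bin has height 1 / width
dens_grouped <- counts / (counts * width)

ratio <- dens_grouped / dens_all  # 500 / 50 = 10 for every bar
```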
If you need the colours, it might be easiest to calculate the density values outside of ggplot2.
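A sketch of that idea, assuming it is acceptable to draw the bars with geom_rect() instead of geom_histogram(): hist(plot = FALSE) does the binning, and its density column is already scaled over the whole sample, so each bar can get its own default ggplot fill colour without changing the heights. The ggplot2 part only runs if the package is installed.

```r
set.seed(46)
vector <- rnorm(500)
breaks <- quantile(vector, seq(0, 1, by = 0.1))

# Let hist() do the binning; plot = FALSE only returns the numbers
h <- hist(vector, breaks = breaks, plot = FALSE)
bars <- data.frame(
  xmin = head(h$breaks, -1),
  xmax = tail(h$breaks, -1),
  density = h$density,
  seg = factor(seq_along(h$density))
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Coloured bars from the precomputed densities, plus the density curve
  p <- ggplot() +
    geom_rect(data = bars,
              aes(xmin = xmin, xmax = xmax, ymin = 0, ymax = density, fill = seg),
              colour = "black") +
    geom_density(data = data.frame(vector = vector), aes(x = vector))
}
```

Calling print(p) draws the plot.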
Alternatively, with ggpubr:
library(ggpubr)
gghistogram(df, x = "vector", add = "mean", rug = TRUE, fill = "seg",
palette = c("#00AFBB", "#E7B800", "#E5A800", "#00BFAB", "#01ADFA",
"#00FABA", "#00BEAF", "#01AEBF", "#00EABA", "#00EABB"), add_density = TRUE)
The confusion about the y-axis may come from density being plotted rather than count. The y-values are densities, count / (n * bin width), so it is the total area of the bars (height times width), not the sum of their heights, that equals 1.
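A quick base-R check of that statement, using the question's seed and equal-width bins of 0.5 (an arbitrary choice):

```r
set.seed(46)
v <- rnorm(500)

# Equal-width bins of 0.5 that span the data
brks <- seq(floor(min(v)), ceiling(max(v)), by = 0.5)
h <- hist(v, breaks = brks, plot = FALSE)

height_sum <- sum(h$density)                 # 1 / 0.5 = 2, not 1
area_sum <- sum(h$density * diff(h$breaks))  # total bar area: 1
```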
I need to make several histograms of the same vector of values, each with a density estimate. The following plot is fine:
values = rnorm(100)
plot = ggplot(data.frame(val=values), aes(x=val)) + geom_histogram(aes(y = ..density..)) + geom_density()
However, I need to print several plots (not one plot with different panels) with different break points, say:
breaks = list(c(-1,0,1),c(-2,-1.5,0,1.5,2),c(-0.5,0,0.5))
How can I set different histogram breaks for plot?
Using your own code, pass the breaks to geom_histogram() itself (breaks in scale_y_continuous() only change the y-axis tick marks, not the bins):
ggplot(data.frame(val=values), aes(x=val)) +
geom_histogram(aes(y = ..density..), breaks=c(-2,-1.5,0,1.5,2)) +
geom_density()
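To get one separate plot per set of breaks, one option is to build the plots in a loop. A sketch, assuming ggplot2 3.3.0 or later (for after_stat(), the newer spelling of ..density..) and a hypothetical seed. It drops values outside each set of breaks first, so it does not depend on how a given ggplot2 version treats out-of-range values:

```r
set.seed(1)  # hypothetical seed; the question does not set one
values <- rnorm(100)
breaks_list <- list(c(-1, 0, 1), c(-2, -1.5, 0, 1.5, 2), c(-0.5, 0, 0.5))

# For each set of breaks, keep only the values its bins can hold
in_range <- lapply(breaks_list,
                   function(b) values[values >= min(b) & values <= max(b)])

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # One separate ggplot object per set of breaks
  plots <- Map(function(b, v) {
    ggplot(data.frame(val = v), aes(x = val)) +
      geom_histogram(aes(y = after_stat(density)), breaks = b) +
      geom_density()
  }, breaks_list, in_range)
  for (p in plots) print(p)
}
```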
I am trying to build a plot with ggplot2 where the x-axis has a label for each group of variables. Here is a minimal version of my code:
library(ggplot2)
library(reshape2)  # provides melt()
Bzero <-100*matrix(runif(100),ncol=10,nrow=10)
B <-99
LNtype <-c(1,1,1,1,2,2,2,3,3,3)
LNnames <-c('grp1','grp2','grp3')
tB <-t(Bzero)/(B+1)
dfB <-data.frame(tB)
dfB$grp <-LNtype
dfB$vid <-1:nrow(tB)
mB0 <- melt(dfB,id.vars=c('grp','vid'))
mB0 <- mB0[order(mB0$grp,mB0$vid),]
gg0 <- ggplot(mB0,aes(x=vid,y=variable))
gg0 <- gg0 + geom_tile(aes(fill = value),colour = "white")
gg0 <- gg0 + scale_fill_gradient(low = "green", high = "red",na.value='white',limits=c(0,1),name='p0i')
gg0 <- gg0 + xlab('Equation')+ylab('Covariate')
The resulting plot and the desired one, with a single label for each group of columns on the x-axis, are not reproduced here.
I have been tinkering with the scale, breaks, and labels to no avail, and a lot of googling did not turn up any plot with that kind of axis. Is there any way to get what I want?
You can replace numbers with groups using scale_x_continuous() and setting breaks at desired positions. With geom_segment() you can add those black lines to group data.
gg0 +
# annotate() adds fixed segments instead of mapping over all rows of mB0
annotate("segment", x = 0.5, xend = 10.5, y = 0.5, yend = 0.5) +
annotate("segment", x = c(0.5, 4.5, 7.5, 10.5), xend = c(0.5, 4.5, 7.5, 10.5),
         y = 0.5, yend = 1) +
scale_x_continuous("", breaks = c(2.5, 6, 9), labels = c("Group1", "Group2", "Group3"))
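Rather than hard-coding 2.5, 6, 9 and the divider positions, they can be derived from LNtype. A base-R sketch, assuming vid runs in the same order as LNtype (as in mB0):

```r
LNtype <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3)
LNnames <- c("grp1", "grp2", "grp3")

# Runs of equal group codes give each group's first and last column
runs <- rle(LNtype)
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

centres <- (starts + ends) / 2               # label positions
dividers <- c(starts[1] - 0.5, ends + 0.5)   # edges between tile columns
```

These vectors can then replace the literals, e.g. scale_x_continuous("", breaks = centres, labels = LNnames), with dividers supplying the x/xend positions of the vertical segments.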