I have a simple problem. How to plot histogram with ggplot2 with fixed binwidth and filled with rainbow colors (or any other palette)?
Lets say I have a data like that:
myData <- abs(rnorm(1000))
I want to plot histogram, using e.g. binwidth=.1. That however will cause different number of bins, depending on data:
ggplot() + geom_histogram(aes(x = myData), binwidth=.1)
If I knew number of bins (e.g. n=15) I'd use something like:
ggplot() + geom_histogram(aes(x = myData), binwidth=.1, fill=rainbow(n))
But with changing number of bins I'm kind of stuck on this simple problem.
If you really want the number of bins flexible, here is my little workaround:
library(ggplot2)
gg_b <- ggplot_build(
ggplot() + geom_histogram(aes(x = myData), binwidth=.1)
)
nu_bins <- dim(gg_b$data[[1]])[1]
ggplot() + geom_histogram(aes(x = myData), binwidth=.1, fill = rainbow(nu_bins))
In case the binwidth is fixed, here is an alternative solution which is using the internal function ggplot2:::bin_breaks_width() to get the number of bins before creating the graph. It's still a workaround but avoids to call geom_histogram() twice as in the other solution:
# create sample data
set.seed(1L)
myData <- abs(rnorm(1000))
binwidth <- 0.1
# create plot
library(ggplot2) # CRAN version 2.2.1 used
n_bins <- length(ggplot2:::bin_breaks_width(range(myData), width = binwidth)$breaks) - 1L
ggplot() + geom_histogram(aes(x = myData), binwidth = binwidth, fill = rainbow(n_bins))
As a third alternative, the aggregation can be done outside of ggplot2. Then, geom_col() cam be used instead of geom_histogram():
# start binning on multiple of binwidth
start_bin <- binwidth * floor(min(myData) / binwidth)
# compute breaks and bin the data
breaks <- seq(start_bin, max(myData) + binwidth, by = binwidth)
myData2 <- cut(sort(myData), breaks = breaks, by = binwidth)
ggplot() + geom_col(aes(x = head(breaks, -1L),
y = as.integer(table(myData2)),
fill = levels(myData2))) +
ylab("count") + xlab("myData")
Note that breaks is plotted on the x-axis instead of levels(myData2) to keep the x-axis continuous. Otherwise each factor label would be plotted which would clutter the x-axis. Also note that the built-in ggplot2 color palette is used instead of rainbow().
Related
I'd like to have some labels stacked on top of a geom_bar graph. Here's an example:
df <- data.frame(x=factor(c(TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE)))
ggplot(df) + geom_bar(aes(x,fill=x)) + opts(axis.text.x=theme_blank(),axis.ticks=theme_blank(),axis.title.x=theme_blank(),legend.title=theme_blank(),axis.title.y=theme_blank())
Now
table(df$x)
FALSE TRUE
3 5
I'd like to have the 3 and 5 on top of the two bars. Even better if I could have the percent values as well. E.g. 3 (37.5%) and 5 (62.5%). Like so:
(source: skitch.com)
Is this possible? If so, how?
To plot text on a ggplot you use the geom_text. But I find it helpful to summarise the data first using ddply
dfl <- ddply(df, .(x), summarize, y=length(x))
str(dfl)
Since the data is pre-summarized, you need to remember to change add the stat="identity" parameter to geom_bar:
ggplot(dfl, aes(x, y=y, fill=x)) + geom_bar(stat="identity") +
geom_text(aes(label=y), vjust=0) +
opts(axis.text.x=theme_blank(),
axis.ticks=theme_blank(),
axis.title.x=theme_blank(),
legend.title=theme_blank(),
axis.title.y=theme_blank()
)
As with many tasks in ggplot, the general strategy is to put what you'd like to add to the plot into a data frame in a way such that the variables match up with the variables and aesthetics in your plot. So for example, you'd create a new data frame like this:
dfTab <- as.data.frame(table(df))
colnames(dfTab)[1] <- "x"
dfTab$lab <- as.character(100 * dfTab$Freq / sum(dfTab$Freq))
So that the x variable matches the corresponding variable in df, and so on. Then you simply include it using geom_text:
ggplot(df) + geom_bar(aes(x,fill=x)) +
geom_text(data=dfTab,aes(x=x,y=Freq,label=lab),vjust=0) +
opts(axis.text.x=theme_blank(),axis.ticks=theme_blank(),
axis.title.x=theme_blank(),legend.title=theme_blank(),
axis.title.y=theme_blank())
This example will plot just the percentages, but you can paste together the counts as well via something like this:
dfTab$lab <- paste(dfTab$Freq,paste("(",dfTab$lab,"%)",sep=""),sep=" ")
Note that in the current version of ggplot2, opts is deprecated, so we would use theme and element_blank now.
Another solution is to use stat_count() when dealing with discrete variables (and stat_bin() with continuous ones).
ggplot(data = df, aes(x = x)) +
geom_bar(stat = "count") +
stat_count(geom = "text", colour = "white", size = 3.5,
aes(label = ..count..),position=position_stack(vjust=0.5))
So, this is our initial plot↓
library(ggplot2)
df <- data.frame(x=factor(c(TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE)))
p <- ggplot(df, aes(x = x, fill = x)) +
geom_bar()
p
As suggested by yuan-ning, we can use stat_count().
geom_bar() uses stat_count() by default. As mentioned in the ggplot2 reference, stat_count() returns two values: count for number of points in bin and prop for groupwise proportion. Since our groups match the x values, both props are 1 and aren’t useful. But we can use count (referred to as “..count..”) that actually denotes bar heights, in our geom_text(). Note that we must include “stat = 'count'” into our geom_text() call as well.
Since we want both counts and percentages in our labels, we’ll need some calculations and string pasting in our “label” aesthetic instead of just “..count..”. I prefer to add a line of code to create a wrapper percent formatting function from the “scales” package (ships along with “ggplot2”).
pct_format = scales::percent_format(accuracy = .1)
p <- p + geom_text(
aes(
label = sprintf(
'%d (%s)',
..count..,
pct_format(..count.. / sum(..count..))
)
),
stat = 'count',
nudge_y = .2,
colour = 'royalblue',
size = 5
)
p
Of course, you can further edit the labels with colour, size, nudges, adjustments etc.
I'd like to have some labels stacked on top of a geom_bar graph. Here's an example:
df <- data.frame(x=factor(c(TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE)))
ggplot(df) + geom_bar(aes(x,fill=x)) + opts(axis.text.x=theme_blank(),axis.ticks=theme_blank(),axis.title.x=theme_blank(),legend.title=theme_blank(),axis.title.y=theme_blank())
Now
table(df$x)
FALSE TRUE
3 5
I'd like to have the 3 and 5 on top of the two bars. Even better if I could have the percent values as well. E.g. 3 (37.5%) and 5 (62.5%). Like so:
(source: skitch.com)
Is this possible? If so, how?
To plot text on a ggplot you use the geom_text. But I find it helpful to summarise the data first using ddply
dfl <- ddply(df, .(x), summarize, y=length(x))
str(dfl)
Since the data is pre-summarized, you need to remember to change add the stat="identity" parameter to geom_bar:
ggplot(dfl, aes(x, y=y, fill=x)) + geom_bar(stat="identity") +
geom_text(aes(label=y), vjust=0) +
opts(axis.text.x=theme_blank(),
axis.ticks=theme_blank(),
axis.title.x=theme_blank(),
legend.title=theme_blank(),
axis.title.y=theme_blank()
)
As with many tasks in ggplot, the general strategy is to put what you'd like to add to the plot into a data frame in a way such that the variables match up with the variables and aesthetics in your plot. So for example, you'd create a new data frame like this:
dfTab <- as.data.frame(table(df))
colnames(dfTab)[1] <- "x"
dfTab$lab <- as.character(100 * dfTab$Freq / sum(dfTab$Freq))
So that the x variable matches the corresponding variable in df, and so on. Then you simply include it using geom_text:
ggplot(df) + geom_bar(aes(x,fill=x)) +
geom_text(data=dfTab,aes(x=x,y=Freq,label=lab),vjust=0) +
opts(axis.text.x=theme_blank(),axis.ticks=theme_blank(),
axis.title.x=theme_blank(),legend.title=theme_blank(),
axis.title.y=theme_blank())
This example will plot just the percentages, but you can paste together the counts as well via something like this:
dfTab$lab <- paste(dfTab$Freq,paste("(",dfTab$lab,"%)",sep=""),sep=" ")
Note that in the current version of ggplot2, opts is deprecated, so we would use theme and element_blank now.
Another solution is to use stat_count() when dealing with discrete variables (and stat_bin() with continuous ones).
ggplot(data = df, aes(x = x)) +
geom_bar(stat = "count") +
stat_count(geom = "text", colour = "white", size = 3.5,
aes(label = ..count..),position=position_stack(vjust=0.5))
So, this is our initial plot↓
library(ggplot2)
df <- data.frame(x=factor(c(TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE)))
p <- ggplot(df, aes(x = x, fill = x)) +
geom_bar()
p
As suggested by yuan-ning, we can use stat_count().
geom_bar() uses stat_count() by default. As mentioned in the ggplot2 reference, stat_count() returns two values: count for number of points in bin and prop for groupwise proportion. Since our groups match the x values, both props are 1 and aren’t useful. But we can use count (referred to as “..count..”) that actually denotes bar heights, in our geom_text(). Note that we must include “stat = 'count'” into our geom_text() call as well.
Since we want both counts and percentages in our labels, we’ll need some calculations and string pasting in our “label” aesthetic instead of just “..count..”. I prefer to add a line of code to create a wrapper percent formatting function from the “scales” package (ships along with “ggplot2”).
pct_format = scales::percent_format(accuracy = .1)
p <- p + geom_text(
aes(
label = sprintf(
'%d (%s)',
..count..,
pct_format(..count.. / sum(..count..))
)
),
stat = 'count',
nudge_y = .2,
colour = 'royalblue',
size = 5
)
p
Of course, you can further edit the labels with colour, size, nudges, adjustments etc.
I'm trying to produce a histogram with ggplot's geom_histogram which colors the bars according to a gradient, and log10's them.
Here's the code:
library(ggplot2)
set.seed(1)
df <- data.frame(id=paste("ID",1:1000,sep="."),val=rnorm(1000),stringsAsFactors=F)
bins <- 10
cols <- c("darkblue","darkred")
colGradient <- colorRampPalette(cols)
cut.cols <- colGradient(bins)
df$cut <- cut(df$val,bins)
df$cut <- factor(df$cut,level=unique(df$cut))
Then,
ggplot(data=df,aes_string(x="val",y="..count..+1",fill="cut"))+
geom_histogram(show.legend=FALSE)+
scale_color_manual(values=cut.cols,labels=levels(df$cut))+
scale_fill_manual(values=cut.cols,labels=levels(df$cut))+
scale_y_log10()
gives:
whereas dropping the fill from the aesthetics:
ggplot(data=df,aes_string(x="val",y="..count..+1"))+
geom_histogram(show.legend=FALSE)+
scale_color_manual(values=cut.cols,labels=levels(cuts))+
scale_fill_manual(values=cut.cols,labels=levels(cuts))+
scale_y_log10()
gives:
Any idea why do the histogram bars differ between the two plots and to make the first one similar to the second one?
The OP is trying to produce a histogram with ggplot's geom_histogram which colors the bars according to a gradient...
The OP has already done the binning (with 10 bins) but is then calling geom_histogram() which does a binning on its own using 30 bins by default (see ?geomhistogram).
When geom_bar() is used instead together with cutinstead of val
ggplot(data = df, aes_string(x = "cut", y = "..count..+1", fill = "cut")) +
geom_bar(show.legend = FALSE) +
scale_color_manual(values = cut.cols, labels = levels(df$cut)) +
scale_fill_manual(values = cut.cols, labels = levels(df$cut)) +
scale_y_log10()
the chart becomes:
Using geom_histogram() with filled bars is less straightforward as can be seen in this and this answer to the question How to fill histogram with color gradient?
I have a simplified dataframe
library(ggplot2)
df <- data.frame(wins=c(1,1,3,1,1,2,1,2,1,1,1,3))
ggplot(df,aes(x=wins))+geom_histogram(binwidth=0.5,fill="red")
I would like to get the final value in the sequence,3, shown with either a different fill or alpha. One way to identify its value is
tail(df,1)$wins
In addition, I would like to have the histogram bars shifted so that they are centered over the number. I tried unsuccesfully subtracting from the wins value
You can do this with a single geom_histogram() by using aes(fill = cond).
To choose different colours, use one of the scale_fill_*() functions, e.g. scale_fill_manual(values = c("red", "blue").
library(ggplot2)
df <- data.frame(wins=c(1,1,3,1,1,2,11,2,11,15,1,1,3))
df$cond <- df$wins == tail(df,1)$wins
ggplot(df, aes(x=wins, fill = cond)) +
geom_histogram() +
scale_x_continuous(breaks=df$wins+0.25, labels=df$wins) +
scale_fill_manual(values = c("red", "blue"))
1) To draw bins in different colors you can use geom_histogram() for subsets.
2) To center bars along numbers on the x axis you can invoke scale_x_continuous(breaks=..., labels=...)
So, this code
library(ggplot2)
df <- data.frame(wins=c(1,1,3,1,1,2,11,2,11,15,1,1,3))
cond <- df$wins == tail(df,1)$wins
ggplot(df, aes(x=wins)) +
geom_histogram(data=subset(df,cond==FALSE), binwidth=0.5, fill="red") +
geom_histogram(data=subset(df,cond==TRUE), binwidth=0.5, fill="blue") +
scale_x_continuous(breaks=df$wins+0.25, labels=df$wins)
produces the plot:
I'd like to have some labels stacked on top of a geom_bar graph. Here's an example:
df <- data.frame(x=factor(c(TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE)))
ggplot(df) + geom_bar(aes(x,fill=x)) + opts(axis.text.x=theme_blank(),axis.ticks=theme_blank(),axis.title.x=theme_blank(),legend.title=theme_blank(),axis.title.y=theme_blank())
Now
table(df$x)
FALSE TRUE
3 5
I'd like to have the 3 and 5 on top of the two bars. Even better if I could have the percent values as well. E.g. 3 (37.5%) and 5 (62.5%). Like so:
(source: skitch.com)
Is this possible? If so, how?
To plot text on a ggplot you use the geom_text. But I find it helpful to summarise the data first using ddply
dfl <- ddply(df, .(x), summarize, y=length(x))
str(dfl)
Since the data is pre-summarized, you need to remember to change add the stat="identity" parameter to geom_bar:
ggplot(dfl, aes(x, y=y, fill=x)) + geom_bar(stat="identity") +
geom_text(aes(label=y), vjust=0) +
opts(axis.text.x=theme_blank(),
axis.ticks=theme_blank(),
axis.title.x=theme_blank(),
legend.title=theme_blank(),
axis.title.y=theme_blank()
)
As with many tasks in ggplot, the general strategy is to put what you'd like to add to the plot into a data frame in a way such that the variables match up with the variables and aesthetics in your plot. So for example, you'd create a new data frame like this:
dfTab <- as.data.frame(table(df))
colnames(dfTab)[1] <- "x"
dfTab$lab <- as.character(100 * dfTab$Freq / sum(dfTab$Freq))
So that the x variable matches the corresponding variable in df, and so on. Then you simply include it using geom_text:
ggplot(df) + geom_bar(aes(x,fill=x)) +
geom_text(data=dfTab,aes(x=x,y=Freq,label=lab),vjust=0) +
opts(axis.text.x=theme_blank(),axis.ticks=theme_blank(),
axis.title.x=theme_blank(),legend.title=theme_blank(),
axis.title.y=theme_blank())
This example will plot just the percentages, but you can paste together the counts as well via something like this:
dfTab$lab <- paste(dfTab$Freq,paste("(",dfTab$lab,"%)",sep=""),sep=" ")
Note that in the current version of ggplot2, opts is deprecated, so we would use theme and element_blank now.
Another solution is to use stat_count() when dealing with discrete variables (and stat_bin() with continuous ones).
ggplot(data = df, aes(x = x)) +
geom_bar(stat = "count") +
stat_count(geom = "text", colour = "white", size = 3.5,
aes(label = ..count..),position=position_stack(vjust=0.5))
So, this is our initial plot↓
library(ggplot2)
df <- data.frame(x=factor(c(TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE)))
p <- ggplot(df, aes(x = x, fill = x)) +
geom_bar()
p
As suggested by yuan-ning, we can use stat_count().
geom_bar() uses stat_count() by default. As mentioned in the ggplot2 reference, stat_count() returns two values: count for number of points in bin and prop for groupwise proportion. Since our groups match the x values, both props are 1 and aren’t useful. But we can use count (referred to as “..count..”) that actually denotes bar heights, in our geom_text(). Note that we must include “stat = 'count'” into our geom_text() call as well.
Since we want both counts and percentages in our labels, we’ll need some calculations and string pasting in our “label” aesthetic instead of just “..count..”. I prefer to add a line of code to create a wrapper percent formatting function from the “scales” package (ships along with “ggplot2”).
pct_format = scales::percent_format(accuracy = .1)
p <- p + geom_text(
aes(
label = sprintf(
'%d (%s)',
..count..,
pct_format(..count.. / sum(..count..))
)
),
stat = 'count',
nudge_y = .2,
colour = 'royalblue',
size = 5
)
p
Of course, you can further edit the labels with colour, size, nudges, adjustments etc.