Fill histogram bins with a custom gradient - r

I want to create a histogram in R and ggplot2, in which the bins are filled based on their continuous x-value. Most tutorials only feature coloring by discrete values or density/count.
Following this example, I was able to color the bins with a rainbow scale:
df <- data.frame(x = runif(100))
ggplot(df) +
geom_histogram(aes(x), fill = rainbow(30))
Rainbow histogram
I want to use a color gradient where the bins go from blue (lowest) to yellow (highest). The scale_fill_gradient() function seems to achieve that, yet when I insert it in place of rainbow() for the fill argument I receive an error:
> ggplot(df) +
+ geom_histogram(aes(x), fill = scale_fill_gradient(low='blue', high='yellow'))
Error: Aesthetics must be either length 1 or the same as the data (30): fill
I tried several ways to supply the length of 30 for the scale, yet I get the same error every time. So my question is:
Is scale_color_gradient the right function for the fill argument, or do I have to use another one?
If it is the right function, how can I correctly supply the length?

If you want different colors for each bin, you need to specify fill = ..x.. in the aesthetics, which is a necessary quirk of geom_histogram. Using scale_fill_gradient with your preferred color gradient then yields the following output:
ggplot(df, aes(x, fill = ..x..)) +
geom_histogram() +
scale_fill_gradient(low='blue', high='yellow')
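For readers on a recent ggplot2 (3.3.0 or later, as far as I know), the same idea can be written with after_stat(x) instead of the older ..x.. notation. This is only a sketch of the answer above: the seed and bins = 30 are my own assumptions, chosen to match the 30 bins in the error message.

```r
library(ggplot2)

set.seed(1)                      # assumption: any seed works, fixed for reproducibility
df <- data.frame(x = runif(100))

# fill is mapped to the computed bin position, so the scale can turn it into a gradient
p <- ggplot(df, aes(x, fill = after_stat(x))) +
  geom_histogram(bins = 30) +
  scale_fill_gradient(low = "blue", high = "yellow")

print(p)
```

The key point is unchanged: the scale is added to the plot with +, while fill inside aes() says which computed value the scale should color.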

Related

Wrong density values in a histogram with `fill` option in `ggplot2`

I was creating histograms with ggplot2 in R whose bins are separated with colors and noticed one thing. When the bins of a histogram are separated by colors with fill option, the density value of the histogram turns funny.
Here is the data.
set.seed(42)
x <- rnorm(10000,0,1)
df <- data.frame(x=x, b=x>1)
This is a histogram without fill.
ggplot(df, aes(x = x)) +
geom_histogram(aes(y=..density..))
This is a histogram with fill.
ggplot(df, aes(x = x, fill=b)) +
geom_histogram(aes(y=..density..))
You can see the latter is pretty crazy. The left side of the bins is sticking out. The density values of the bins of each color are obviously wrong.
I thought over this issue for a while. The data can't be wrong for the first histogram was normal. It should be something in ggplot2 or geom_histogram function. I googled "geom_histogram density fill" and couldn't find much help.
I want the end product to look like:
Separated by colors as you see in the second histogram
Size and shape identical to the first histogram
The vertical axis being density
How would you deal with this issue?
I think what you may want is this:
ggplot(df, aes(x = x, fill=b)) +
geom_histogram()
That is, plot the counts rather than the density. As mentioned above, the density requires extra calculations.
One thing that is important (in my opinion) is that histograms are graphs of one variable. As soon as you start adding data from other variables you start to change them more into bar charts or something else like that.
You will want to work on setting the axis manually if you want it to range from 0 to .4.
The solution is to hand-compute density like this (instead of using the built-in ggplot2 version):
library(ggplot2)
# Generate test data
set.seed(42)
x <- rnorm(10000,0,1)
df <- data.frame(x=x, b=x>1)
ggplot(df, aes(x = x, fill=b)) +
geom_histogram(mapping = aes(y = ..count.. / (sum(..count..) * ..width..)))
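To see why the built-in density looks wrong with fill, it may help to reproduce the arithmetic in base R. My understanding is that ..density.. is normalized within each fill group, so each color integrates to 1 on its own, whereas the hand-computed version divides by the total count. The sketch below uses hist() rather than ggplot2, and the bin width of 0.25 is an arbitrary assumption.

```r
set.seed(42)
x <- rnorm(10000, 0, 1)
b <- x > 1

# assumption: fixed-width bins covering the whole sample
breaks <- seq(floor(min(x)), ceiling(max(x)), by = 0.25)
width  <- diff(breaks)[1]

# counts for the x > 1 group only
cnt_true <- hist(x[b], breaks = breaks, plot = FALSE)$counts

# normalized within the group: the area of this group alone is 1
dens_within  <- cnt_true / (sum(cnt_true) * width)

# normalized by the total sample size: the area is the group's share of the data
dens_overall <- cnt_true / (length(x) * width)

sum(dens_within * width)   # 1
sum(dens_overall * width)  # equals mean(b)
```

With the within-group version the small x > 1 group is blown up to unit area, which is what makes the stacked bars stick out.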
When you provide a column name for the fill parameter in ggplot, it groups the variables and plots each group with a unique color.
If you want a single color for the plot, just specify the color you want:
FIXED
ggplot(df, aes(x = x)) +
geom_histogram(aes(y=..density..),fill="Blue")

ggplot geom_histogram color by factor not working properly

When I try to color my stacked histogram according to a factor column, all the bars have a "green" roof. I want the bar top to be the same color as the bar itself. The figure below shows clearly what is wrong: all the bars have a "green" horizontal line at the top.
Here is a dummy data set :
BodyLength <- rnorm(100, mean = 50, sd = 3)
vector <- c("80","10","5","5")
colors <- c("black","blue","red","green")
color <- rep(colors,vector)
data <- data.frame(BodyLength,color)
And the program I used to generate the plot below :
plot <- ggplot(data = data, aes(x=data$BodyLength, color = factor(data$color), fill=I("transparent")))
plot <- plot + geom_histogram()
plot <- plot + scale_colour_manual(values = c("Black","blue","red","green"))
Also, since the data column itself contains color names, any way I don't have to specify them again in scale_color_manual? Can ggplot identify them from the data itself? But I would really like help with the first problem right now...Thanks.
Here is a quick way to get your colors to scale_colour_manual without writing out a vector:
data <- data.frame(BodyLength,color)
data$color<- factor(data$color)
and then later,
scale_colour_manual(values = levels(data$color))
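Another option, since the column already holds valid color names, might be scale_colour_identity(), which tells ggplot2 to use the values as they are. This is a sketch and not the original answer's method; the numeric repeat counts, the seed and bins = 30 are my assumptions.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed for reproducibility
data <- data.frame(
  BodyLength = rnorm(100, mean = 50, sd = 3),
  color      = rep(c("black", "blue", "red", "green"), c(80, 10, 5, 5))
)

# the identity scale uses the strings in 'color' directly as outline colors
p <- ggplot(data, aes(x = BodyLength, colour = color)) +
  geom_histogram(fill = "transparent", position = "identity", bins = 30) +
  scale_colour_identity()

print(p)
```

Note that identity scales draw no legend by default, which may or may not be what you want.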
Now, with respect to your first problem, I don't know exactly why your bars have green roofs. However, you may want to look at some different options for the position argument in geom_histogram, such as
plot + geom_histogram(position="identity")
or position="dodge". The identity option is closer to what you want, but since green is the last line drawn, it overwrites the previous colors.
I like density plots better for these problems myself.
ggplot(data=data, aes(x=BodyLength, color=color)) + geom_density()
ggplot(data=data, aes(x=BodyLength, fill=color)) + geom_density(alpha=.3)

Using a uniform color palette among different ggplot2 graphs with factor variable

I am using ggplot2 to create several plots about the same data. In particular I am interested in plotting observations according to a factor variable with 6 levels ("cluster").
But the plots produced by ggplot2 use different palettes every time!
For example, if I make a bar plot with this formula I get this result (this palette is what I expect to obtain):
qplot(cluster, data = data, fill = cluster) + ggtitle("Clusters")
And if I make a scatter plot and I try to color the observations according to their belonging to a cluster I get this result (notice that the color palette is different):
ggplot(data, aes(liens_ratio,RT_ratio)) +
geom_point(col=data$cluster, size=data$nombre_de_tweet/100+2) +
geom_smooth() +
ggtitle("Links - RTs")
Any idea on how to solve this issue?
I can't be certain this will work in your specific case without a reproducible example, but I'm reasonably confident that all you need to do is set your color inside an aes() call within the geom you want to color. That is,
ggplot(data, aes(x = liens_ratio, y = RT_ratio)) +
geom_point(aes(color = cluster, size = nombre_de_tweet/100+2)) +
geom_smooth() +
ggtitle("Links - RTs")
If all plots you make use the same data and this basic format, the color palette should be the same regardless of the geom used. Additional elements, such as the line from geom_smooth() will not be changed unless they are also explicitly colored.
The palette will just be the default one, of course; to change it look into scale_color_manual.
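If you want to pin the palette explicitly, one way is to define a single named vector and pass it to both the fill and the colour scale. A minimal sketch, with made-up toy data standing in for yours (toy and cluster_palette are hypothetical names, and the "Dark 3" palette is an arbitrary choice):

```r
library(ggplot2)

set.seed(1)
# assumption: toy stand-in for the real data, 6 clusters with 10 points each
toy <- data.frame(
  cluster     = factor(rep(1:6, each = 10)),
  liens_ratio = runif(60),
  RT_ratio    = runif(60)
)

# one named palette, keyed by factor level, reused in every plot
cluster_palette <- setNames(hcl.colors(6, "Dark 3"), levels(toy$cluster))

p_bar <- ggplot(toy, aes(cluster, fill = cluster)) +
  geom_bar() +
  scale_fill_manual(values = cluster_palette)

p_points <- ggplot(toy, aes(liens_ratio, RT_ratio, colour = cluster)) +
  geom_point() +
  scale_colour_manual(values = cluster_palette)
```

Because the vector is named by level, each cluster keeps its color even if a plot shows only a subset of the clusters.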

Highlight a particular plot among multiple plots

I've made a plot using a data frame and ggplot. Here's the plot for example
I'll be using this in a presentation. In one slide, I'm going to talk about epsilon=0.1, and in the next I'll be talking about epsilon=0.5. My question is: How do I make one particular plot thicker? i.e. I wish to create a plot where the orange graph corresponding to epsilon=0.1 is thick (and thus highlighted), so the audience knows that is the graph I'm referring to.
What I would do is add an additional column to the data, thickness, which you can assign to the size aesthetic of geom_line. You simply assign a higher value to the values in thickness where epsilon equals 0.1:
df$thickness = ifelse(df$epsilon == 0.1, 2, 1)
and use it in aes() of geom_line():
ggplot(df,aes(x,y,color=as.factor(epsilon))) +
geom_line(aes(size = thickness)) + scale_size_identity()
You can simply change the value in the call to ifelse to change which line gets highlighted. Note the use of scale_size_identity to prevent ggplot from scaling the values, so that the values in thickness are used as they are.
An example with the built-in dataset mtcars:
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
geom_line(aes(size = ifelse(cyl == 6, 2, 1))) +
scale_size_identity()

ggplot boxplot + fill + color brewer spectrum

I can't seem to be able to fill a boxplot by a continuous value using color brewer, and I know it must just be a simple swap of syntax somewhere, since I can get the outlines of the boxes to adjust based on continuous values. Here's the data I'm working with:
data <- data.frame(
value = sample(1:50),
animals = sample(c("cat","dog","zebra"), 50, replace = TRUE),
region = sample(c("forest","desert","tundra"), 50, replace = TRUE)
)
I want to make a paneled boxplot, ordered by median "value", with the depth of color fill for each box increasing with "value" (I know this is redundant, but bear with me for the sake of the example)
(Ordering the data):
orderindex <- order(as.numeric(by(data$value, data$animals, median)))
data$animals <- ordered(data$animals, levels=levels(data$animals)[orderindex])
If I create the boxplot with panels, I can adjust the color of the outlines:
library(ggplot2)
first <- qplot(animals, value, data = data, colour=animals)
second <- first + geom_boxplot() + facet_grid(~region)
third <- second + scale_colour_brewer()
print(third)
But I want to do what I did to the outlines, but instead with the fill of each box (so each box gets darker as "value" increases). I thought that it might be a matter of putting the "scale_colour_brewer()" argument within the aesthetic argument for geom_boxplot, ie
second <- first + geom_boxplot(aes(scale_colour_brewer())) + facet_grid(~region)
but that doesn't seem to do the trick. I know it's a matter of positioning for this "scale_colour_brewer" argument; I just don't know where it goes!
(there is a similar question here but it's not quite what I'm looking for, since the colors of the box don't increase along a spectrum/gradient with some continuous value; it looks like these values are basically factors: Add color to boxplot - "Continuous value supplied to discrete scale" error, and the example at the ggplot site with the cars package:
http://docs.ggplot2.org/0.9.3.1/geom_boxplot.html doesn't seem to work when I set "fill" to "value" ... I get the error:
Error in unit(tic_pos.c, "mm") : 'x' and 'units' must have length > 0)
)
If you need to set fill for the boxplots then instead of color=animals use fill=animals and the same way replace scale_color_brewer() with scale_fill_brewer().
qplot(animals, value, data = data, fill=animals)+
geom_boxplot() + facet_grid(~region) + scale_fill_brewer()
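If the fill really should follow a continuous quantity, one possible approach is to compute a per-box summary (here the median) and map fill to it with a continuous brewer scale such as scale_fill_distiller(). This is a sketch under my own assumptions: box_median is a hypothetical helper column, and the "Blues" palette and seed are arbitrary.

```r
library(ggplot2)

set.seed(1)
data <- data.frame(
  value   = sample(1:50),
  animals = sample(c("cat", "dog", "zebra"), 50, replace = TRUE),
  region  = sample(c("forest", "desert", "tundra"), 50, replace = TRUE)
)

# hypothetical helper column: median of 'value' within each animal/region box
data$box_median <- ave(as.numeric(data$value), data$animals, data$region,
                       FUN = median)

# fill is continuous, so the grouping has to be stated explicitly
p <- ggplot(data, aes(animals, value, fill = box_median,
                      group = interaction(animals, region))) +
  geom_boxplot() +
  facet_grid(~region) +
  scale_fill_distiller(palette = "Blues", direction = 1)

print(p)
```

direction = 1 makes boxes with a higher median darker; leave it out to get the default reversed order.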
