Histogram with fraction in qplot / ggplot

So far I have missed a histogram function with a fraction on the y-axis. Like this:
require(ggplot2)
data(diamonds)
idealD <- diamonds[diamonds[,"cut"]=="Ideal",]
fracHist <- function(x){
  h <- hist(x, plot = FALSE)        # bin once, without drawing
  frac <- h$counts / sum(h$counts)  # bar heights add up to one
  barplot(frac)
}
### call
fracHist(idealD$carat)
It ain't pretty, but it should basically explain what I want: bar heights should add up to one. Plus, the breaks should label the x-axis. I'd love to create the same with ggplot2 but can't figure out how to get around plotting the frequencies of frac instead of plotting frac itself.
All I get with `ggplot` is density...
m <- ggplot(idealD, aes(x=carat))
m + geom_histogram(aes(y = ..density..)) + geom_density()

The solution is to use stat_bin and map the aesthetic y=..count../sum(..count..)
library(ggplot2)
ggplot(idealD, aes(x=carat)) + stat_bin(aes(y=..count../sum(..count..)))
From a quick scan of ?hist I couldn't find how the values are binned in hist. This means the graphs won't be identical unless you fiddle with the binwidth argument of stat_bin.
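For what it's worth, `hist()` picks its breaks with Sturges' rule by default and then rounds them with `pretty()`, so one way to make the two plots agree is to hand those same breaks to ggplot2. The sketch below assumes ggplot2 3.3.0 or later, where `after_stat()` replaces the `..count..` notation; the names `h`, `p` and `bars` are only illustrative.

```r
library(ggplot2)

idealD <- diamonds[diamonds$cut == "Ideal", ]

# Reuse the breaks that base hist() would pick (Sturges' rule by default)
h <- hist(idealD$carat, plot = FALSE)

p <- ggplot(idealD, aes(x = carat)) +
  geom_histogram(
    aes(y = after_stat(count / sum(count))),  # fraction of observations per bin
    breaks = h$breaks,
    colour = "white"
  ) +
  ylab("fraction")

# Inspect the computed bar heights; they should add up to one
bars <- layer_data(p)
sum(bars$y)
```

Passing `breaks` instead of `binwidth` also puts the bar edges on the same round values that `hist()` uses.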

The trick works with geom_histogram as well.
require(ggplot2)
data(diamonds)
idealD <- diamonds[diamonds[,"cut"]=="Ideal",]
ggplot(idealD, aes(x=carat)) + geom_histogram(aes(y=..count../sum(..count..)))

Related

Y axis proportions in histogram with ggplot

I have prepared a dataset that I wish to display as a histogram.
I believe I get the X axis right, but can't seem to get totmis1 on the Y axis... Just an unclear histogram:
ggplot(data = brfss2013a, aes(x = totmis)) +
geom_histogram(binwidth = 3)

tl;dr use geom_bar(stat="identity") instead of geom_histogram()
I think the terminology you are looking for is a bar chart (technically, a histogram is the result of counting/binning a continuous distribution of data; it's not clear whether you've already computed these values by binning, or whether the data mean something else, but I don't think it matters).
dd <- data.frame(totmis=1:11,
totmis1=c(5786,5086,3187,2594,1591,1318,
847,754,512,511,383))
library(ggplot2)
ggplot(dd, aes(totmis,totmis1))+
geom_bar(stat="identity")
You need stat="identity" because geom_bar() tries to count occurrences by default ...
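Since the question title asks for proportions, here is a minimal sketch under the assumption that totmis1 holds pre-computed counts for each value of totmis. It uses `geom_col()`, which is ggplot2's shorthand for `geom_bar(stat = "identity")`; the `prop` column is a name introduced here for illustration.

```r
library(ggplot2)

dd <- data.frame(
  totmis  = 1:11,
  totmis1 = c(5786, 5086, 3187, 2594, 1591, 1318, 847, 754, 512, 511, 383)
)

# Convert the pre-computed counts to proportions of the total
dd$prop <- dd$totmis1 / sum(dd$totmis1)

# geom_col() draws the y values as given, without counting anything
p <- ggplot(dd, aes(x = totmis, y = prop)) +
  geom_col() +
  scale_x_continuous(breaks = dd$totmis) +
  ylab("proportion")
p
```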

Wrong density values in a histogram with `fill` option in `ggplot2`

I was creating histograms with ggplot2 in R whose bins are separated with colors and noticed one thing. When the bins of a histogram are separated by colors with fill option, the density value of the histogram turns funny.
Here is the data.
set.seed(42)
x <- rnorm(10000,0,1)
df <- data.frame(x=x, b=x>1)
This is a histogram without fill.
ggplot(df, aes(x = x)) +
geom_histogram(aes(y=..density..))
This is a histogram with fill.
ggplot(df, aes(x = x, fill=b)) +
geom_histogram(aes(y=..density..))
You can see the latter is pretty crazy. The left side of the bins is sticking out. The density values of the bins of each color are obviously wrong.
I thought this issue over for a while. The data can't be wrong, because the first histogram looked normal. It must be something in ggplot2 or the geom_histogram function. I googled "geom_histogram density fill" and couldn't find much help.
I want the end product to look like:
Separated by colors as you see in the second histogram
Size and shape identical to the first histogram
The vertical axis being density
How would you deal with this issue?

I think what you may want is this:
ggplot(df, aes(x = x, fill=b)) +
geom_histogram()
Rather than the density. As mentioned above, the density is asking for extra calculations.
One thing that is important (in my opinion) is that histograms are graphs of one variable. As soon as you start adding data from other variables, you start to change them into bar charts or something similar.
You will want to work on setting the axis manually if you want it to range from 0 to .4.

The solution is to hand-compute density like this (instead of using the built-in ggplot2 version):
library(ggplot2)
# Generate test data
set.seed(42)
x <- rnorm(10000,0,1)
df <- data.frame(x=x, b=x>1)
ggplot(df, aes(x = x, fill=b)) +
geom_histogram(mapping = aes(y = ..count.. / (sum(..count..) * ..width..)))
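The reason this works, as far as I can tell, is that the built-in density is normalised within each fill group, so each colour integrates to one on its own, while the hand-computed version divides by the total count over both groups. The sketch below repeats the plot with `after_stat()` (assuming ggplot2 3.3.0 or later) and checks the total area; `bars` and `total_area` are illustrative names, and `bins = 30` is just the default made explicit.

```r
library(ggplot2)

set.seed(42)
x <- rnorm(10000, 0, 1)
df <- data.frame(x = x, b = x > 1)

p <- ggplot(df, aes(x = x, fill = b)) +
  geom_histogram(
    # divide by the total count over both groups, not the per-group count
    aes(y = after_stat(count / (sum(count) * width))),
    bins = 30
  ) +
  ylab("density")

# Area of every stacked rectangle; the total should be 1 for a proper density
bars <- layer_data(p)
total_area <- sum((bars$ymax - bars$ymin) * (bars$xmax - bars$xmin))
total_area
```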

When you map a column to the fill aesthetic, ggplot2 groups the data by that variable and draws each group in its own colour.
If you want a single colour for the whole plot, set the colour directly instead of mapping a column:
Fixed:
ggplot(df, aes(x = x)) +
geom_histogram(aes(y=..density..),fill="Blue")

How can I define a color palette (normalize) for multiple hexbin plots in R

I want to find a way to set a certain range of a color palette that is used for a hexbin plot to normalize multiple plots in R.
So far I have tried:
library(hexbin)
library(gplots)
my.colors <- function (n)
{
(rich.colors(n))
}
plot(hexbin(lastthousand$V4, lastthousand$V5, xbnds = c(0, 35), ybnds = c(0, 35)),
     xlab = "Green Pucks", ylab = "Red Pucks",
     colramp = my.colors, colorcut = seq(0, 1, length = 25), lcex = 0.66)
Which results in the following plot:
[Figure: hexbin plot #1]
I understand that "colorcut" controls the resolution of the color palette. But I found no way to control the min/max values.
Lets say I have a second plot - 'hexbin plot #2' - with counts from 1(dark-blue) to 100(red). Is there a way to use only the colors 1(dark-blue)-24(light-blue) [based on only a part of the 1(dark-blue)-100(red) scale] for hexbin plot #1?
The final goal is to have several hexbin plots next to each other which follow the same colour scheme (min and max based on the one with the highest counts).
-this is my first question here :) and I'm new to R, please be gentle
//edit: For everyone with the same problem: My supervisor suggested to use facets in ggplot2. Will see how that works and return with another edit if it solves the issue.
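//edit2: facets did the trick:
library(gplots)
library(ggplot2)
# refer to the columns by name inside aes() so that faceting can split the data
p <- ggplot(data = lastthousand, aes(V4, V5)) + geom_hex()
# geom_hex() maps the count to fill, so the palette goes on the fill scale
p + facet_grid(. ~ Market) + xlab("green pucks") + ylab("red pucks") + scale_fill_gradientn(colours = rainbow(7))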

Maybe this can be useful: https://gist.github.com/wahalulu/1376861
and this for ranges:
https://stackoverflow.com/a/15505591/1600108
https://stackoverflow.com/a/14586941/1600108

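Facets do the trick:
library(gplots)
library(ggplot2)
# refer to the columns by name inside aes() so that faceting can split the data
p <- ggplot(data = lastthousand, aes(V4, V5)) + geom_hex()
# geom_hex() maps the count to fill, so the palette goes on the fill scale
p + facet_grid(. ~ Market) + xlab("green pucks") + ylab("red pucks") + scale_fill_gradientn(colours = rainbow(7))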
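If the plots have to stay separate rather than faceted, one possible approach is to give every plot the same limits on the fill scale. This is only a sketch: `d1` and `d2` are made-up stand-ins for the real data, `hex_plot()` is a hypothetical helper, and `geom_hex()` needs the hexbin package to be installed.

```r
library(ggplot2)  # geom_hex() additionally needs the hexbin package installed

set.seed(1)
# Made-up stand-ins for two data sets with different densities
d1 <- data.frame(V4 = rnorm(500, 15, 5), V5 = rnorm(500, 15, 5))
d2 <- data.frame(V4 = rnorm(5000, 15, 3), V5 = rnorm(5000, 15, 3))

# Hypothetical helper: one hexbin plot with a fixed upper limit on the fill scale
hex_plot <- function(d, max_count) {
  ggplot(d, aes(V4, V5)) +
    geom_hex(bins = 30) +
    # identical limits give every plot the same colour-to-count mapping
    scale_fill_gradientn(colours = rainbow(7), limits = c(0, max_count)) +
    xlab("green pucks") + ylab("red pucks")
}

# Largest cell count over both plots, read from the computed layer data
# (an upper limit of NA lets the scale follow the data)
max_count <- max(
  layer_data(hex_plot(d1, NA))$count,
  layer_data(hex_plot(d2, NA))$count
)

p1 <- hex_plot(d1, max_count)
p2 <- hex_plot(d2, max_count)
```

With shared limits, a given colour stands for the same count in `p1` and `p2`, so the sparser plot only uses the lower part of the palette.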

log-scaled density plot: ggplot2 and freqpoly, but with points instead of lines

What I really want to do is plot a histogram with the y-axis on a log scale. Obviously this is a problem with the ggplot2 geom_histogram, since the bottom of the bar is at zero, and the log of that gives you trouble.
My workaround is to use the freqpoly geom, and that more or less does the job. The following code works just fine:
ggplot(zcoorddist) +
geom_freqpoly(aes(x=zcoord,y=..density..),binwidth = 0.001) +
scale_y_continuous(trans = 'log10')
The issue is that at the edges of my data, I get a couple of garish vertical lines that really throw you off visually when combining a bunch of these freqpoly curves in one plot. What I'd like to be able to do is use points at every vertex of the freqpoly curve, and no lines connecting them. Is there a way to do this easily?

The easiest way to get the desired plot is to just recast your data. Then you can use geom_point. Since you don't provide an example, I used the standard example for geom_histogram to show this:
# load packages
require(ggplot2)
require(reshape)
# get data
data(movies, package = "ggplot2movies")  # movies moved out of ggplot2 in version 2.0.0
movies <- movies[, c("title", "rating")]
# here's the equivalent of your plot
ggplot(movies) + geom_freqpoly(aes(x=rating, y=..density..), binwidth=.001) +
scale_y_continuous(trans = 'log10')
# recast the data
df1 <- recast(movies, value~., measure.var="rating")
names(df1) <- c("rating", "number")
# alternative way to recast data
df2 <- as.data.frame(table(movies$rating))
names(df2) <- c("rating", "number")
df2$rating <- as.numeric(as.character(df2$rating))
# plot
p <- ggplot(df1, aes(x=rating)) + scale_y_continuous(trans="log10", name="density")
# with lines
p + geom_linerange(aes(ymax=number, ymin=.9))
# only points
p + geom_point(aes(y=number))
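A possibly shorter route, sketched here under the assumption of ggplot2 3.3.0 or later, is to keep the binning inside ggplot2 and only swap the geom: `geom_point(stat = "bin")` uses the same binning statistic as `geom_freqpoly()` but draws one point per bin. The data frame below is a made-up stand-in for the asker's zcoorddist.

```r
library(ggplot2)

set.seed(1)
# Made-up stand-in for the asker's zcoorddist data
zcoorddist <- data.frame(zcoord = rnorm(5000, mean = 0, sd = 0.01))

p <- ggplot(zcoorddist, aes(x = zcoord)) +
  # same binning as geom_freqpoly(), but one point per bin and no lines
  geom_point(aes(y = after_stat(density)), stat = "bin", binwidth = 0.001) +
  scale_y_continuous(trans = "log10")
p
```

Empty bins have a density of zero, which has no finite logarithm, so ggplot2 drops those points with a warning instead of drawing a line down to the axis.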

How to plot stacked point histograms?

What's the ggplot2 equivalent of "dotplot" histograms? With stacked points instead of bars? Similar to this solution in R:
Plot Histogram with Points Instead of Bars
Is it possible to do this in ggplot2? Ideally with the points shown as stacks and a faint line showing the smoothed line "fit" to these points (which would make a histogram shape.)

ggplot2 does dotplots with geom_dotplot(); see the manual.
Here is an example:
library(ggplot2)
set.seed(789); x <- data.frame(y = sample(1:20, 100, replace = TRUE))
ggplot(x, aes(y)) + geom_dotplot()
In order to make it behave like a simple dotplot, we should do this:
ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot')
You should get this:
To address the density issue, you'll have to add another term, ylim(), so that your plot call will have the form ggplot() + geom_dotplot() + ylim()
More specifically, you'll write ylim(0, A), where A will be the number of stacked dots necessary to count 1.00 density. In the example above, the best you can do is see that 7.5 dots reach the 0.50 density mark. From there, you can infer that 15 dots will reach 1.00.
So your new call looks like this:
ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot') + ylim(0, 15)
Which will give you this:
Usually, this kind of eyeball estimate will work for dotplots, but of course you can try other values to fine-tune your scale.
Notice how changing the ylim values doesn't affect how the data is displayed, it just changes the labels in the y-axis.

As @joran pointed out, we can use geom_dotplot
require(ggplot2)
ggplot(mtcars, aes(x = mpg)) + geom_dotplot()
Edit: (moved useful comments into the post):
The label "count" is misleading because this is actually a density estimate; maybe the default label should be changed to "density". The ggplot2 implementation of the dotplot follows the original one by Leland Wilkinson, so if you want to understand clearly how it works, take a look at his paper on dot plots.
As for an easy transformation to make the y axis actually show counts, i.e. "number of observations", the help page says:
When binning along the x axis and stacking along the y axis, the numbers on y axis are not meaningful, due to technical limitations of ggplot2. You can hide the y axis, as in one of the examples, or manually scale it to match the number of dots.
So you can use this code to hide y axis:
ggplot(mtcars, aes(x = mpg)) +
geom_dotplot(binwidth = 1.5) +
scale_y_continuous(name = "", breaks = NULL)

I introduce an exact approach using @Waldir Leoncio's latter method.
library(ggplot2); library(grid)
set.seed(789)
x <- data.frame(y = sample(1:20, 100, replace = TRUE))
g <- ggplot(x, aes(y)) + geom_dotplot(binwidth=0.8)
g # output to read parameter
### calculation of width and height of panel
grid.ls(view=TRUE, grob=FALSE)
real_width <- convertWidth(unit(1,'npc'), 'inch', TRUE)
real_height <- convertHeight(unit(1,'npc'), 'inch', TRUE)
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)
real_binwidth <- real_width / width_coordinate_range * 0.8 # 0.8 is the argument binwidth
num_balls <- real_height / 1.1 / real_binwidth # the number of stacked balls. 1.1 is expanding value.
# num_balls is the value of A
g + ylim(0, num_balls)

Apologies : I don't have enough reputation to 'comment'.
I like cuttlefish44's "exact approach", but to make it work (with ggplot2 [2.2.1]) I had to change the following line from :
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)
to
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$layout$panel_ranges[[1]]$x.range)
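The internal structure returned by `ggplot_build()` has kept changing between releases; as far as I know, in ggplot2 3.0.0 and later the panel ranges sit under `layout$panel_params`. The sketch below wraps the lookup in a hypothetical helper, `panel_x_range()`, that tries each location in turn. These are internals rather than a documented API, so treat it as an assumption that may need adjusting for your version.

```r
library(ggplot2)

set.seed(789)
x <- data.frame(y = sample(1:20, 100, replace = TRUE))
g <- ggplot(x, aes(y)) + geom_dotplot(binwidth = 0.8)

# Hypothetical helper: look for the panel's x range in the places where
# different ggplot2 releases have stored it
panel_x_range <- function(plot) {
  b <- ggplot_build(plot)
  rng <- b$layout$panel_params[[1]]$x.range                    # ggplot2 3.x
  if (is.null(rng)) rng <- b$layout$panel_ranges[[1]]$x.range  # ggplot2 2.2.x
  if (is.null(rng)) rng <- b$panel$ranges[[1]]$x.range         # older releases
  rng
}

width_coordinate_range <- diff(panel_x_range(g))
width_coordinate_range
```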
