how to add mean value in ggplot2 - r

I am new to R, I have a plot that shows the percprof stats in each state of all counties, but when I am trying to add a mean value to each plot, it is not working:
library(ggplot2)
data(midwest)
percprof_mean <- sd(midwest$percprof)
ggplot(midwest, aes(x=percprof, y=..density..))
+ geom_histogram(binwidth = 0.5, color = "white") + facet_grid(state ~.)
+ stat_summary(fun.y=mean,geom="line",lwd=2,aes(group=1))
did I use stat_summary function wrong here? I get error says:
Error: stat_summary requires the following missing aesthetics: y

plyr based solution.
library(ggplot2)
library(plyr)
data(midwest)
nn <- ddply(midwest, "state", transform,
state_mean = mean(percprof))
ggplot(nn) +
geom_histogram(aes(percprof, y=..density..),binwidth = 0.5, color = "white") +
geom_vline(aes(xintercept = state_mean),data=nn,linetype = 5) + facet_grid(state~.)

Related

How do I add data labels to a ggplot histogram with a log(x) axis?

I am wondering how to add data labels to a ggplot showing the true value of the data points when the x-axis is in log scale.
I have this data:
date <- c("4/3/2021", "4/7/2021","4/10/2021","4/12/2021","4/13/2021","4/13/2021")
amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)
address <- c("A","B","C","D","E","F")
df <- data.frame(date, amount, address)
And I plot it in ggplot2:
plot <- ggplot(df, aes(x = log(amount))) +
geom_histogram(binwidth = 1)
plot + theme_minimal() + geom_text(label = amount)
... but I get the error
"Error: geom_text requires the following missing aesthetics: y"
I have 2 questions as a result:
Why am I getting this error with geom_histogram? Shouldn't it assume to use count as the y value?
Will this successfully show the true values of the data points from the 'amount' column despite the plot's log scale x-axis?
Perhaps like this?
ggplot(df, aes(x = log(amount), y = ..count.., label = ..count..)) +
geom_histogram(binwidth = 1) +
stat_bin(geom = "text", binwidth = 1, vjust = -0.5) +
theme_minimal()
ggplot2 layers do not (at least in any situations I can think of) take the summary calculations of other layers, so I think the simplest thing would be to replicate the calculation using stat_bin(geom = "text"...
Or perhaps simpler, you could pre-calculate the numbers:
library(dplyr)
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 1) +
geom_text(vjust = -0.5)
EDIT -- to show buckets without the log transform we could use:
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 0.5) +
geom_text(vjust = -0.5) +
scale_x_continuous(labels = ~scales::comma(10^.),
minor_breaks = NULL)

ggplot: plotting more than one function on one plot

I have a task to plot histogram using my data (here) named NoPodsWeight, its density and normal distribution for this segment (min(NoPodsWeight) and max(NoPodsWeight)).
I am trying this:
myframe <- read.csv(filepath, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
myframe <- myframe[rowSums(is.na(myframe)) <= 0,]
nopodsweight <- myframe$NoPodsWeight
height <- myframe$Height
ggplot(myframe, aes(x = NoPodsWeight, y = ..density..)) +
geom_histogram(color="black", fill="white") +
geom_density(color = "blue") +
stat_function(fun = dnorm, args = list(mean = mean(myframe$NoPodsWeight), sd = sd(myframe$NoPodsWeight)))
Using this code I get an error:
Error: Aesthetics must be valid computed stats. Problematic aesthetic(s): y =
..density...
Did you map your stat in the wrong layer?
I don't understand how to plot two or more functions on one plot. For example I can solve my problem using standard plot (but without density):
hist(x = nopodsweight, freq = F, ylim = c(0, 0.45), breaks = 37)
n_norm<-seq(min(nopodsweight)-1, max(nopodsweight)+1, 0.0001)
lines(n_norm, dnorm(n_norm), col = "red")
Is there any function in ggplot to plot (normal) distribution (or maybe using another function) like in lines?
You need to take ..density.. out of the ggplot() layer and put it specifically in the geom_histogram layer. I didn't download and import your data, but here's an example on mtcars:
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(aes(y = ..density..)) +
geom_density(color = "blue") +
stat_function(fun = dnorm, args = list(mean = mean(mtcars$mpg), sd = sd(mtcars$mpg)))
The error message says "did you map your stat in the wrong layer?"; that's a hint. Moving aes(y=..density..) to apply specifically to geom_histogram() seems to make everything OK ...
ggplot(myframe, aes(x = NoPodsWeight)) +
geom_histogram(color="black", fill="white",
aes(y = ..density..)) +
## [... everything else ...]

Plot not showing up in R

How can I fix the following code
alpha <- 1
draws <- 15
dimen <- 10
require(MCMCpack)
x <- rdirichlet(draws, rep(alpha, dimen))
require(ggplot2)
dat <- data.frame(item=factor(rep(1:10,15)),
draw=factor(rep(1:15,each=10)),
value=as.vector(t(x)))
ggplot(dat,aes(x=item,y=value,ymin=0,ymax=value)) +
geom_point(colour=I("blue")) +
geom_linerange(colour=I("blue")) +
facet_wrap(~draw,ncol=5) +
scale_y_continuous(lim=c(0,1)) +
opts(panel.border=theme_rect())
to not to get this empty plot:
I assume you get the following error message:
'opts' is deprecated. Use 'theme' instead. (Deprecated; last used in version 0.9.1)
theme_rect is deprecated. Use 'element_rect' instead. (Deprecated; last used in version 0.9.1)
If so, this should be stated in your question.
Using the current version of ggplot2 (0.9.3.1) and theme() instead of opts(), this script:
ggplot(data = dat, aes(x = item, y = value, ymin = 0, ymax = value)) +
geom_point(colour = "blue") +
geom_linerange(colour = "blue") +
facet_wrap(~draw, ncol = 5) +
scale_y_continuous(lim = c(0, 1)) +
theme_bw() +
theme(panel.border = element_rect(colour = "black"))
...gives this plot:
Is this what you want?
You may also wish to check the scales argument in ?facet_wrap, and coord_cartesian as an alternative to set limits in scale_y_continuous

Plot mean and sd of dataset per x value using ggplot2

I have a dataset that looks a little like this:
a <- data.frame(x=rep(c(1,2,3,5,7,10,15,20), 5),
y=rnorm(40, sd=2) + rep(c(4,3.5,3,2.5,2,1.5,1,0.5), 5))
ggplot(a, aes(x=x,y=y)) + geom_point() +geom_smooth()
I want the same output as that plot, but instead of smooth curve, I just want to take line segments between the mean/sd values for each set of x values. The graph should look similar to the above graph, but jagged, instead of curved.
I tried this, but it fails, even though the x values aren't unique:
ggplot(a, aes(x=x,y=y)) + geom_point() +stat_smooth(aes(group=x, y=y, x=x))
geom_smooth: Only one unique x value each group.Maybe you want aes(group = 1)?
?stat_summary is what you should look at.
Here is an example
# functions to calculate the upper and lower CI bounds
uci <- function(y,.alpha){mean(y) + qnorm(abs(.alpha)/2) * sd(y)}
lci <- function(y,.alpha){mean(y) - qnorm(abs(.alpha)/2) * sd(y)}
ggplot(a, aes(x=x,y=y)) + stat_summary(fun.y = mean, geom = 'line', colour = 'blue') +
stat_summary(fun.y = mean, geom = 'ribbon',fun.ymax = uci, fun.ymin = lci, .alpha = 0.05, alpha = 0.5)
You can use one of the built-in summary functions mean_sdl. The code is shown below
ggplot(a, aes(x=x,y=y)) +
stat_summary(fun.y = 'mean', colour = 'blue', geom = 'line')
stat_summary(fun.data = 'mean_sdl', geom = 'ribbon', alpha = 0.2)
Using ggplot2 0.9.3.1, the following did the trick for me:
ggplot(a, aes(x=x,y=y)) + geom_point() +
stat_summary(fun.data = 'mean_sdl', mult = 1, geom = 'smooth')
The 'mean_sdl' is an implementation of the Hmisc package's function 'smean.sdl' and the mult-variable gives how many standard deviations (above and below the mean) are displayed.
For detailed info on the original function:
library('Hmisc')
?smean.sdl
You could try writing a summary function as suggested by Hadley Wickham on the website for ggplot2: http://had.co.nz/ggplot2/stat_summary.html. Applying his suggestion to your code:
p <- qplot(x, y, data=a)
stat_sum_df <- function(fun, geom="crossbar", ...) {
stat_summary(fun.data=fun, colour="blue", geom=geom, width=0.2, ...)
}
p + stat_sum_df("mean_cl_normal", geom = "smooth")
This results in this graphic:

Overlaying histograms with ggplot2 in R

I am new to R and am trying to plot 3 histograms onto the same graph.
Everything worked fine, but my problem is that you don't see where 2 histograms overlap - they look rather cut off.
When I make density plots, it looks perfect: each curve is surrounded by a black frame line, and colours look different where curves overlap.
Can someone tell me if something similar can be achieved with the histograms in the 1st picture? This is the code I'm using:
lowf0 <-read.csv (....)
mediumf0 <-read.csv (....)
highf0 <-read.csv(....)
lowf0$utt<-'low f0'
mediumf0$utt<-'medium f0'
highf0$utt<-'high f0'
histogram<-rbind(lowf0,mediumf0,highf0)
ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)
Using #joran's sample data,
ggplot(dat, aes(x=xx, fill=yy)) + geom_histogram(alpha=0.2, position="identity")
note that the default position of geom_histogram is "stack."
see "position adjustment" of this page:
geom_histogram documentation
Your current code:
ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)
is telling ggplot to construct one histogram using all the values in f0 and then color the bars of this single histogram according to the variable utt.
What you want instead is to create three separate histograms, with alpha blending so that they are visible through each other. So you probably want to use three separate calls to geom_histogram, where each one gets it's own data frame and fill:
ggplot(histogram, aes(f0)) +
geom_histogram(data = lowf0, fill = "red", alpha = 0.2) +
geom_histogram(data = mediumf0, fill = "blue", alpha = 0.2) +
geom_histogram(data = highf0, fill = "green", alpha = 0.2) +
Here's a concrete example with some output:
dat <- data.frame(xx = c(runif(100,20,50),runif(100,40,80),runif(100,0,30)),yy = rep(letters[1:3],each = 100))
ggplot(dat,aes(x=xx)) +
geom_histogram(data=subset(dat,yy == 'a'),fill = "red", alpha = 0.2) +
geom_histogram(data=subset(dat,yy == 'b'),fill = "blue", alpha = 0.2) +
geom_histogram(data=subset(dat,yy == 'c'),fill = "green", alpha = 0.2)
which produces something like this:
Edited to fix typos; you wanted fill, not colour.
While only a few lines are required to plot multiple/overlapping histograms in ggplot2, the results are't always satisfactory. There needs to be proper use of borders and coloring to ensure the eye can differentiate between histograms.
The following functions balance border colors, opacities, and superimposed density plots to enable the viewer to differentiate among distributions.
Single histogram:
plot_histogram <- function(df, feature) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)))) +
geom_histogram(aes(y = ..density..), alpha=0.7, fill="#33AADE", color="black") +
geom_density(alpha=0.3, fill="red") +
geom_vline(aes(xintercept=mean(eval(parse(text=feature)))), color="black", linetype="dashed", size=1) +
labs(x=feature, y = "Density")
print(plt)
}
Multiple histogram:
plot_multi_histogram <- function(df, feature, label_column) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)), fill=eval(parse(text=label_column)))) +
geom_histogram(alpha=0.7, position="identity", aes(y = ..density..), color="black") +
geom_density(alpha=0.7) +
geom_vline(aes(xintercept=mean(eval(parse(text=feature)))), color="black", linetype="dashed", size=1) +
labs(x=feature, y = "Density")
plt + guides(fill=guide_legend(title=label_column))
}
Usage:
Simply pass your data frame into the above functions along with desired arguments:
plot_histogram(iris, 'Sepal.Width')
plot_multi_histogram(iris, 'Sepal.Width', 'Species')
The extra parameter in plot_multi_histogram is the name of the column containing the category labels.
We can see this more dramatically by creating a dataframe with many different distribution means:
a <-data.frame(n=rnorm(1000, mean = 1), category=rep('A', 1000))
b <-data.frame(n=rnorm(1000, mean = 2), category=rep('B', 1000))
c <-data.frame(n=rnorm(1000, mean = 3), category=rep('C', 1000))
d <-data.frame(n=rnorm(1000, mean = 4), category=rep('D', 1000))
e <-data.frame(n=rnorm(1000, mean = 5), category=rep('E', 1000))
f <-data.frame(n=rnorm(1000, mean = 6), category=rep('F', 1000))
many_distros <- do.call('rbind', list(a,b,c,d,e,f))
Passing data frame in as before (and widening chart using options):
options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, 'n', 'category')
To add a separate vertical line for each distribution:
plot_multi_histogram <- function(df, feature, label_column, means) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)), fill=eval(parse(text=label_column)))) +
geom_histogram(alpha=0.7, position="identity", aes(y = ..density..), color="black") +
geom_density(alpha=0.7) +
geom_vline(xintercept=means, color="black", linetype="dashed", size=1)
labs(x=feature, y = "Density")
plt + guides(fill=guide_legend(title=label_column))
}
The only change over the previous plot_multi_histogram function is the addition of means to the parameters, and changing the geom_vline line to accept multiple values.
Usage:
options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, "n", 'category', c(1, 2, 3, 4, 5, 6))
Result:
Since I set the means explicitly in many_distros I can simply pass them in. Alternatively you can simply calculate these inside the function and use that way.

Resources