ggplot: plotting more than one function on one plot - r

I need to plot a histogram of my data (a column named NoPodsWeight), together with its density and the normal distribution over the range from min(NoPodsWeight) to max(NoPodsWeight).
I am trying this:
myframe <- read.csv(filepath, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
myframe <- myframe[rowSums(is.na(myframe)) <= 0,]
nopodsweight <- myframe$NoPodsWeight
height <- myframe$Height
ggplot(myframe, aes(x = NoPodsWeight, y = ..density..)) +
geom_histogram(color="black", fill="white") +
geom_density(color = "blue") +
stat_function(fun = dnorm, args = list(mean = mean(myframe$NoPodsWeight), sd = sd(myframe$NoPodsWeight)))
Using this code I get an error:
Error: Aesthetics must be valid computed stats. Problematic aesthetic(s): y = ..density...
Did you map your stat in the wrong layer?
I don't understand how to plot two or more functions on one plot. For example, I can solve my problem with base graphics (but without the density curve):
hist(x = nopodsweight, freq = F, ylim = c(0, 0.45), breaks = 37)
n_norm<-seq(min(nopodsweight)-1, max(nopodsweight)+1, 0.0001)
lines(n_norm, dnorm(n_norm), col = "red")
Is there a function in ggplot (or some other approach) that draws a (normal) distribution curve, the way lines() does?

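You need to take ..density.. out of the global aes() in ggplot() and put it in the geom_histogram() layer only. Mappings given in ggplot() are inherited by every layer, and stat_function() does not compute a density variable, which is what triggers the error. I didn't download and import your data, but here's an example using mtcars: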
library(ggplot2)
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(aes(y = ..density..)) +  # only the histogram is mapped to the density scale
geom_density(color = "blue") +
stat_function(fun = dnorm, args = list(mean = mean(mtcars$mpg), sd = sd(mtcars$mpg)))

The error message says "did you map your stat in the wrong layer?"; that's a hint. Moving aes(y = ..density..) into geom_histogram(), so that it applies only to that layer, seems to fix everything:
ggplot(myframe, aes(x = NoPodsWeight)) +
geom_histogram(color="black", fill="white",
aes(y = ..density..)) +
## [... everything else ...]
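For reference, here is a complete version of the mtcars example in newer ggplot2 syntax. It is a sketch assuming ggplot2 >= 3.3.0, where after_stat(density) replaces the older ..density.. notation; bins = 15 is an arbitrary choice. Only the histogram layer maps y, so stat_function() no longer inherits a y it cannot compute.

```r
library(ggplot2)

# Fitted normal parameters for the overlay curve
mu    <- mean(mtcars$mpg)
sigma <- sd(mtcars$mpg)

plt <- ggplot(mtcars, aes(x = mpg)) +
  # y is mapped only in this layer, so the other layers do not inherit it
  geom_histogram(aes(y = after_stat(density)), bins = 15,
                 color = "black", fill = "white") +
  # geom_density() puts its own computed density on y by default
  geom_density(color = "blue") +
  # normal curve with the sample mean and sd
  stat_function(fun = dnorm, args = list(mean = mu, sd = sigma), color = "red")

# display with print(plt), or just plt at the console
```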

Related

Group geom_vline for a conditional

I believe I'm going about this incorrectly.
I have a ggplot with several lines plotted on it. Each line belongs to a 'group' (i.e., predator lines include lines for bear frequency and lion_frequency; prey lines include lines for fish frequency and rabbit_frequency; etc.).
Here's a reproducible example using dummy data
p <- function(black_lines, green_lines){
ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() +
geom_vline(xintercept = 5) +
geom_vline(xintercept = 10) +
geom_vline(xintercept = 1:5,
colour = "green",
linetype = "longdash")
}
p()
Ideally, it would work like:
p <- function(black_lines, green_lines){
ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() +
if (black_lines){
geom_vline(xintercept = 5) +
geom_vline(xintercept = 10) +
}
if(green_lines){
geom_vline(xintercept = 1:5,
colour = "green",
linetype = "longdash")
}
}
p(T, T)
This doesn't work, of course, since R complains:
Error in ggplot_add():
! Cannot add ggproto objects together. Did you forget to add this object to a ggplot object?
Is this possible at all? I couldn't find any similar questions, so I feel like I'm going about it the wrong way.
For those who want more context: this is for a reactive Shiny app, and I want the user to be able to choose how the graph is generated (for example, with or without specific lines).
Thank you for your guidance in advance!
You could create your conditional layers with if and assign them to variables, which can then be added to your ggplot like any other layer. This works because an if without an else returns NULL when the condition is FALSE, and adding NULL to a ggplot leaves the plot unchanged.
Note: if you want to include multiple layers, put them in a list, e.g. list(geom_vline(...), geom_vline(...)).
library(ggplot2)
p <- function(black_lines, green_lines){
vline_black <- if (black_lines) geom_vline(xintercept = c(5, 10))
vline_green <- if (green_lines) geom_vline(xintercept = 1:5,
colour = "green",
linetype = "longdash")
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point() +
vline_black +
vline_green
}
p(T, T)
p(T, F)
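Following the note about lists, here is a sketch in which each optional group holds several layers. It relies on ggplot2 accepting both NULL and a list of layers on the right-hand side of +, with the list elements added one by one. The function name p and its arguments follow the question.

```r
library(ggplot2)

p <- function(black_lines = TRUE, green_lines = TRUE) {
  # Each group is a list of layers, or NULL when the condition is FALSE
  vline_black <- if (black_lines) list(
    geom_vline(xintercept = 5),
    geom_vline(xintercept = 10)
  )
  vline_green <- if (green_lines) list(
    geom_vline(xintercept = 1:5, colour = "green", linetype = "longdash")
  )
  ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    vline_black +   # adds two layers, or nothing
    vline_green     # adds one layer, or nothing
}

plt <- p(TRUE, FALSE)
```

In a Shiny app the two flags would typically come from inputs, e.g. renderPlot(p(input$black, input$green)); those input IDs are hypothetical.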

Stat_function on ggplot doesn't work

I have a problem with ggplot, specifically with stat_function().
I want to plot my distribution, then compute the theoretical chi-squared distribution and overlay it on the same plot.
Without ggplot, I did this with a simple lines() call:
res.lk <- apply (as.matrix(baf_matrix$res.fst), 2, FUN = function (x) baf_matrix$res.fst*4 / (mean(baf_matrix$res.fst)))
hist(res.lk, probability = T, breaks = 100,main = "LK distribution",
xlab ="LK")
x = c(0:as.integer(max(res.lk)))
lines(x,dchisq(x,df=4),lwd=3,col="orange")
Now to make it with ggplot I do this:
Y = as.data.frame(res.lk)
ggplot(data = Y , aes( x = Lk )) + geom_histogram( binwidth=.1, color="black", fill="white" ) + stat_function( fun = dchisq(c(0:as.integer(max(res.lk))),df=4), lwd = 3, col = "orange") + theme_bw()
And I get this warning:
Warning message:
Computation failed in stat_function():
'what' must be a function or character string
This is what my data looks like: [figure: the Lk distribution]
I've tried to fix it but couldn't figure out how. Can somebody help me, please? Thanks a lot in advance.
Note: it would really help if you included example data for us to work with.
The argument fun has to be a function, but you're passing a vector (the result of calling dchisq()). Also, because the distribution line depends on the data only through its maximum value, it'd be better to use annotate():
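library(ggplot2)
# assumes the column of Y is named Lk, as in the question's aes()
xseq <- seq(0, max(Y$Lk), length.out = 200)  # fine grid from 0, for a smooth curve
ggplot(data = Y, aes(x = Lk)) +
# density scale (like probability = T in hist()), so the bars match dchisq()
geom_histogram(aes(y = after_stat(density)), binwidth = .1, color="black", fill="white") +
annotate(
geom = "line",
x = xseq,
y = dchisq(xseq, df = 4),
linewidth = 3,  # ggplot2 >= 3.4.0; use size = 3 in older versions
color = "orange"
) +
theme_bw()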
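The question's own approach also works once fun is given a function rather than a vector: pass dchisq itself and supply df through args. This is a sketch assuming ggplot2 >= 3.4.0 (for linewidth); since no example data was posted, it uses simulated chi-squared values as a stand-in for the Lk column.

```r
library(ggplot2)

set.seed(42)
# Simulated stand-in for the asker's data (the real Lk values were not provided)
Y <- data.frame(Lk = rchisq(5000, df = 4))

plt <- ggplot(Y, aes(x = Lk)) +
  # density scale, like hist(..., probability = TRUE)
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1,
                 color = "black", fill = "white") +
  # `fun` must be a function; extra arguments for it go in `args`
  stat_function(fun = dchisq, args = list(df = 4),
                linewidth = 1.5, color = "orange") +
  theme_bw()
```

stat_function() evaluates dchisq on a grid of n = 101 points across the x range by default, so there is no need to build the x sequence by hand.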

How to pass literal text to a function in R

I'm new to R (and programming in general) and am having some trouble wrapping my head around functions. I am trying to write a function that plots a histogram of a given variable with a normal curve overlaid. Here is code that does this for a specific variable in data:
dev.new()
ggplot(data,aes(x = variable)) +
geom_histogram(aes(y=..density..),binwidth=.2)+
geom_density(na.rm=TRUE)+
stat_function(fun=dnorm, args=list(mean=mean(data$variable,na.rm=TRUE),
sd=sd(data$variable, na.rm=TRUE)), linetype=4, colour="red")
This code works just fine if I put in the specific data and variable, but if I try to pass the same things through a function, it no longer does. For instance:
plotnormal<-function(data,variable){
dev.new()
ggplot(data,aes(x = variable)) +
geom_histogram(aes(y=..density..),binwidth=.2)+
geom_density(na.rm=TRUE)+
stat_function(fun=dnorm, args=list(mean=mean(data$variable,na.rm=TRUE),
sd=sd(data$variable, na.rm=TRUE)), linetype=4, colour="red")}
plotnormal(data, variable)
What gives? Is there any way to just pass the exact text that I enter into the function?
When passing a string variable name into a function where you plan on using that name in a subset of a data frame or other structure, use
data[[variable]]
instead of
data$variable
For more, read help(Extract). Also, you'll want to use aes_string() instead of aes(), and call the function with the column name as a quoted string, e.g. plotnormal(data, "variable"). Your updated function would then be:
plotnormal <- function(data, variable) {
dev.new()
ggplot(data, aes_string(x = variable)) +
geom_histogram(aes(y = ..density..), binwidth = .2) +
geom_density(na.rm = TRUE) +
stat_function(fun = dnorm, args = list(mean = mean(data[[variable]], na.rm = TRUE),
sd = sd(data[[variable]], na.rm = TRUE)), linetype = 4, colour = "red")
}
Hitting the space bar every once in a while helps too.
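aes_string() is deprecated in current ggplot2. Here is a sketch of the same function in the newer style, assuming ggplot2 >= 3.3.0: the .data pronoun looks the column up by its string name inside aes(), and after_stat(density) replaces ..density... dev.new() is left out so the function simply returns the plot; mtcars and "mpg" in the usage line are only for illustration.

```r
library(ggplot2)

plotnormal <- function(data, variable) {
  x <- data[[variable]]  # the column, selected by its name as a string
  ggplot(data, aes(x = .data[[variable]])) +
    geom_histogram(aes(y = after_stat(density)), binwidth = 0.2, na.rm = TRUE) +
    geom_density(na.rm = TRUE) +
    stat_function(fun = dnorm,
                  args = list(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE)),
                  linetype = 4, colour = "red")
}

# Usage: the column name is passed as a quoted string
plt <- plotnormal(mtcars, "mpg")
```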

How to suppress warnings when plotting with ggplot

When passed missing values, ggplot kindly warns us that they are present. This is acceptable in an interactive session, but when writing reports you do not want the output cluttered with warnings, especially if there are many of them. The example below has one label missing, which produces a warning.
library(ggplot2)
library(reshape2)
mydf <- data.frame(
species = sample(c("A", "B"), 100, replace = TRUE),
lvl = factor(sample(1:3, 100, replace = TRUE))
)
labs <- melt(with(mydf, table(species, lvl)))
names(labs) <- c("species", "lvl", "value")
labs[3, "value"] <- NA
ggplot(mydf, aes(x = species)) +
stat_count() +  # discrete x: stat_bin() requires a continuous x since ggplot2 2.0.0
geom_text(data = labs, aes(x = species, y = value, label = value, vjust = -0.5)) +
facet_wrap(~ lvl)
If we wrap suppressWarnings() around the last expression, we get a summary of how many warnings there were. For the sake of argument, let's say that this isn't acceptable (though it is indeed very honest and correct). How can warnings be suppressed completely when printing a ggplot2 object?
You need to suppressWarnings() around the print() call, not the creation of the ggplot() object:
R> suppressWarnings(print(
+ ggplot(mydf, aes(x = species)) +
+ stat_count() +
+ geom_text(data = labs, aes(x = species, y = value,
+ label = value, vjust = -0.5)) +
+ facet_wrap(~ lvl)))
R>
It might be easier to assign the final plot to an object and then print() it:
plt <- ggplot(mydf, aes(x = species)) +
stat_count() +
geom_text(data = labs, aes(x = species, y = value,
label = value, vjust = -0.5)) +
facet_wrap(~ lvl)
R> suppressWarnings(print(plt))
R>
The reason for the behaviour is that the warnings are only generated when the plot is actually drawn, not when the object representing the plot is created. R will auto print during interactive usage, so whilst
R> suppressWarnings(plt)
Warning message:
Removed 1 rows containing missing values (geom_text).
doesn't work because, in effect, you are calling print(suppressWarnings(plt)), whereas
R> suppressWarnings(print(plt))
R>
does work because suppressWarnings() can capture the warnings arising from the print() call.
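The create-versus-draw distinction can be checked directly. A minimal sketch using a hypothetical three-row data frame with one missing y value, and a hypothetical helper collect_warnings() that records warnings via withCallingHandlers(). ggplotGrob() renders the plot to a gtable without displaying it, and that rendering step is where the missing-value warning is raised.

```r
library(ggplot2)

# Hypothetical toy data: the second label has no y value
df <- data.frame(x = 1:3, y = c(1, NA, 3), label = c("a", "b", "c"))

# Hypothetical helper: evaluate `expr`, muffle its warnings, return their messages
collect_warnings <- function(expr) {
  msgs <- character(0)
  withCallingHandlers(expr, warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  })
  msgs
}

# Building the object (the assignment also stores it in plt): nothing is drawn
at_creation <- collect_warnings(
  plt <- ggplot(df, aes(x, y, label = label)) + geom_text()
)

# Rendering: the missing value is detected and reported here
at_render <- collect_warnings(ggplotGrob(plt))
```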
A more targeted plot-by-plot approach would be to add na.rm=TRUE to your plot calls.
E.g.:
ggplot(mydf, aes(x = species)) +
stat_count() +
geom_text(data = labs, aes(x = species, y = value,
label = value, vjust = -0.5), na.rm=TRUE) +
facet_wrap(~ lvl)
In your question, you mention report writing, so it might be better to set the global warning level:
options(warn=-1)
the default is:
options(warn=0)
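Because options(warn = -1) affects the whole session, it is easy to forget to switch it back. A sketch of a scoped alternative, using a hypothetical helper with_warnings_off() (not part of base R or ggplot2) that lowers the warning level only while its argument is evaluated and restores the previous value with on.exit().

```r
# Hypothetical helper: evaluate `expr` with warnings switched off,
# then restore the caller's warn level.
with_warnings_off <- function(expr) {
  old <- options(warn = -1)          # options() returns the previous setting
  on.exit(options(old), add = TRUE)  # restored even if expr throws an error
  expr                               # the lazy argument is evaluated here
}

# For a plot object plt, usage would be: with_warnings_off(print(plt))
value <- with_warnings_off(as.integer("not a number"))  # NA; the warning is not shown
```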

Plot mean and sd of dataset per x value using ggplot2

I have a dataset that looks a little like this:
a <- data.frame(x=rep(c(1,2,3,5,7,10,15,20), 5),
y=rnorm(40, sd=2) + rep(c(4,3.5,3,2.5,2,1.5,1,0.5), 5))
ggplot(a, aes(x=x,y=y)) + geom_point() +geom_smooth()
I want the same output as that plot, but instead of a smooth curve I want straight line segments joining the mean (with its sd band) at each x value. The graph should look similar to the one above, but jagged instead of curved.
I tried this, but it fails, even though the x values aren't unique:
ggplot(a, aes(x=x,y=y)) + geom_point() +stat_smooth(aes(group=x, y=y, x=x))
geom_smooth: Only one unique x value each group.Maybe you want aes(group = 1)?
?stat_summary is what you should look at.
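Here is an example:
library(ggplot2)
# upper and lower bounds of mean +/- z * sd, with z = qnorm(1 - .alpha/2) (about 1.96 for .alpha = 0.05);
# qnorm(.alpha/2) on its own is negative, which would swap the two bounds
uci <- function(y, .alpha) {mean(y) + qnorm(1 - .alpha/2) * sd(y)}
lci <- function(y, .alpha) {mean(y) - qnorm(1 - .alpha/2) * sd(y)}
# fun, fun.min and fun.max need ggplot2 >= 3.3.0; fun.args passes .alpha to the summary functions
ggplot(a, aes(x=x,y=y)) + stat_summary(fun = mean, geom = 'line', colour = 'blue') +
stat_summary(fun = mean, fun.max = uci, fun.min = lci, fun.args = list(.alpha = 0.05),
geom = 'ribbon', alpha = 0.5)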
You can use one of the built-in summary functions, mean_sdl (it needs the Hmisc package installed). The code is shown below:
ggplot(a, aes(x=x,y=y)) +
stat_summary(fun.y = 'mean', colour = 'blue', geom = 'line') +
stat_summary(fun.data = 'mean_sdl', geom = 'ribbon', alpha = 0.2)  # mean +/- 2 sd by default
Using ggplot2 0.9.3.1, the following did the trick for me:
ggplot(a, aes(x=x,y=y)) + geom_point() +
stat_summary(fun.data = 'mean_sdl', mult = 1, geom = 'smooth')
mean_sdl is a wrapper around the Hmisc package's function smean.sdl, and the mult argument sets how many standard deviations above and below the mean are displayed.
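In later ggplot2 versions (2.0.0 onwards, as far as I know), extra arguments for the summary function go through fun.args instead of being passed directly, so mult = 1 becomes fun.args = list(mult = 1). A sketch under that assumption, regenerating the question's data with an arbitrary seed; mean_sdl still needs Hmisc installed when the plot is built.

```r
library(ggplot2)

set.seed(1)
# The question's example data
a <- data.frame(x = rep(c(1, 2, 3, 5, 7, 10, 15, 20), 5),
                y = rnorm(40, sd = 2) + rep(c(4, 3.5, 3, 2.5, 2, 1.5, 1, 0.5), 5))

plt <- ggplot(a, aes(x = x, y = y)) +
  geom_point() +
  # mean +/- 1 sd at each x value; mult is now passed through fun.args
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "smooth")
```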
For detailed info on the original function:
library('Hmisc')
?smean.sdl
You could try writing a summary function as suggested by Hadley Wickham on the website for ggplot2: http://had.co.nz/ggplot2/stat_summary.html. Applying his suggestion to your code:
p <- qplot(x, y, data=a)
stat_sum_df <- function(fun, geom="crossbar", ...) {
stat_summary(fun.data=fun, colour="blue", geom=geom, width=0.2, ...)
}
p + stat_sum_df("mean_cl_normal", geom = "smooth")
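If you would rather not depend on Hmisc (which both mean_sdl and mean_cl_normal use), the summary function can be written by hand: fun.data receives the y values for one x and returns a data frame with y, ymin and ymax columns. A sketch; the helper name mean_sd and the default of one standard deviation are illustrative choices, and the data are regenerated from the question with an arbitrary seed.

```r
library(ggplot2)

# Hypothetical helper: mean +/- `mult` standard deviations, in the
# y / ymin / ymax shape that stat_summary(fun.data = ...) expects
mean_sd <- function(y, mult = 1) {
  m <- mean(y)
  s <- sd(y)
  data.frame(y = m, ymin = m - mult * s, ymax = m + mult * s)
}

set.seed(1)
a <- data.frame(x = rep(c(1, 2, 3, 5, 7, 10, 15, 20), 5),
                y = rnorm(40, sd = 2) + rep(c(4, 3.5, 3, 2.5, 2, 1.5, 1, 0.5), 5))

plt <- ggplot(a, aes(x = x, y = y)) +
  geom_point() +
  stat_summary(fun.data = mean_sd, geom = "ribbon", alpha = 0.3) +      # sd band
  stat_summary(fun.data = mean_sd, geom = "line", colour = "blue")      # jagged mean line
```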
This results in the following graphic: [figure not included]
