Stat_function on ggplot doesn't work - r

I have a problem with ggplot, specifically with stat_function.
I want to plot my distribution, then compute the theoretical chi-square distribution and draw it on top of that plot.
Without ggplot, I did this with a simple lines() call:
res.lk <- apply (as.matrix(baf_matrix$res.fst), 2, FUN = function (x) baf_matrix$res.fst*4 / (mean(baf_matrix$res.fst)))
hist(res.lk, probability = T, breaks = 100,main = "LK distribution",
xlab ="LK")
x = c(0:as.integer(max(res.lk)))
lines(x,dchisq(x,df=4),lwd=3,col="orange")
Now to make it with ggplot I do this:
Y = as.data.frame(res.lk)
ggplot(data = Y , aes( x = Lk )) + geom_histogram( binwidth=.1, color="black", fill="white" ) + stat_function( fun = dchisq(c(0:as.integer(max(res.lk))),df=4), lwd = 3, col = "orange") + theme_bw()
And I get this warning:
Warning message:
Computation failed in stat_function():
'what' must be a function or character string
This is what my data looks like:
[Figure: the Lk distribution]
I've been trying to fix it but can't work out how. Can somebody help me, please? Thanks a lot in advance.

Note: it would really help if you included example data for us to work with.
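The argument fun has to be a function, but you're passing a vector: dchisq(...) is evaluated first, so stat_function only receives its numeric result. Also, because the distribution line only depends on a single value in the data, it'd be better to use annotate. Note that the histogram has to be on the density scale (like probability = T in hist) for the curve to be comparable with it:
xseq <- seq(0, max(Y$res.lk))
ggplot(data = Y, aes(x = Lk)) +
  # density scale, so the histogram is comparable to dchisq()
  geom_histogram(aes(y = ..density..), binwidth = .1, color = "black", fill = "white") +
  annotate(
    geom = "line",
    x = xseq,
    y = dchisq(xseq, df = 4),
    size = 3, # line thickness; use linewidth in ggplot2 >= 3.4.0
    color = "orange"
  ) +
  theme_bw()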
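If you would rather keep stat_function, pass the function itself and give its extra arguments through args. A minimal sketch, assuming simulated chi-square data in place of the real Lk values (the data frame Y and its column Lk below are stand-ins, not the asker's data):

```r
library(ggplot2)

set.seed(42)
# Assumption: simulated stand-in for the LK statistic, drawn from chi-square(4)
Y <- data.frame(Lk = rchisq(5000, df = 4))

p <- ggplot(Y, aes(x = Lk)) +
  # Density scale so the bars are comparable to a probability density
  # (after_stat() needs ggplot2 >= 3.3.0; older versions use ..density..)
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1,
                 color = "black", fill = "white") +
  # fun is the function object; its extra arguments go in args
  stat_function(fun = dchisq, args = list(df = 4), color = "orange") +
  theme_bw()
print(p)
```

stat_function evaluates dchisq over the x range of the plot itself, so there is no need to build a sequence of x values by hand.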

Related

Meansplot with Tukey HSD confidence intervals in R

I'm trying to make a means plot with confidence intervals, but I would like the intervals to be Tukey HSD intervals computed after an ANOVA.
I'll use the following example to explain; the data frame has a factor, poison, with levels {1, 2, 3}.
library(magrittr)
library(ggplot2)
library(ggpubr)
library(dplyr)
library("agricolae")
PATH <- "https://raw.githubusercontent.com/guru99-edu/R-Programming/master/poisons.csv"
df <- read.csv(PATH) %>%
select(-X) %>%
mutate(poison = factor(poison, ordered = TRUE))
glimpse(df)
ggplot(df, aes(x = poison, y = time, fill = poison)) +
geom_boxplot() +
geom_jitter(shape = 15,
color = "steelblue",
position = position_jitter(0.21)) +
theme_classic()
anova_one_way <- aov(time ~ poison, data = df)
summary(anova_one_way)
# Use TukeyHSD
tukeyHSD <- TukeyHSD(anova_one_way)
plot(tukeyHSD)
I would like the plot to be similar to the one from Statgraphics, where you can see the mean point and the length of the bars is the Tukey HSD interval, so that at a glance you can see which level is best and whether it is statistically significantly better.
I have seen some examples in more complex questions, but they are for boxplots and I don't understand them well enough to adapt the solutions here.
Tukey's results on boxplot in R [image: example1]
TukeyHSD results on boxplot after two-way anova [image: example2]
Edit:
The answer provided by Allan Cameron (@allan-cameron) is great; however, it doesn't work on my computer as written, probably because of package versions: the stat_summary argument names have changed a bit. I took his solution and made a couple of changes so that it works for me.
# From Allan's original answer
tukeyCI <- (tukeyHSD$poison[1, 1] - tukeyHSD$poison[1, 2]) / 2
# Changed fun.max and fun.min to fun.ymax and fun.ymin
# Changed fun to fun.y to make Allan's solution work for me.
ggplot(df, aes(x = poison, y = time)) +
stat_summary(fun.ymax = function(x) mean(x) + tukeyCI,
fun.ymin = function(x) mean(x) - tukeyCI,
geom = 'errorbar', size = 1, color = 'gray50',
width = 0.25) +
stat_summary(fun.y = mean, geom = 'point', size = 4, shape = 21,
fill = 'white') +
geom_point(position = position_jitter(width = 0.25), alpha = 0.4,
color = 'deepskyblue4') +
theme_minimal(base_size = 16)
The warnings from the original code were:
Warning: Ignoring unknown parameters: fun.max, fun.min
Warning: Ignoring unknown parameters: fun
No summary function supplied, defaulting to `mean_se()`
I'm currently using these versions:
version R version 3.5.2 (2018-12-20)
packageVersion("ggplot2") ‘3.1.0’
packageVersion("dplyr") ‘0.7.8’
The image from statgraphics shows error bars around the mean points, and if I understand you correctly then you want to be able to draw error bars around your mean points such that non-overlapping error bars mean there are significant differences between the variables. That being the case, we can extract the required confidence interval like this:
tukeyCI <- (tukeyHSD$poison[1,1] - tukeyHSD$poison[1,2])/2
And we can draw the result in ggplot like this:
ggplot(df, aes(x = poison, y = time)) +
stat_summary(fun.max = function(x) mean(x) + tukeyCI,
fun.min = function(x) mean(x) - tukeyCI,
geom = 'errorbar', size = 1, color = 'gray50',
width = 0.25) +
stat_summary(fun = mean, geom = 'point', size = 4, shape = 21,
fill = 'white') +
geom_point(position = position_jitter(width = 0.25), alpha = 0.4,
color = 'deepskyblue4') +
theme_minimal(base_size = 16)
Here we can see that there are significant differences between 1 and 3, and between 2 and 3, but that the difference between 1 and 2 is non-significant.
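The argument names of stat_summary changed in ggplot2 3.3.0 (fun.y, fun.ymin and fun.ymax became fun, fun.min and fun.max), which is why the code above is rejected by ggplot2 3.1.0. A hedged sketch of one way to pick the names by installed version; half_width is a hypothetical placeholder for the value taken from TukeyHSD(), and mtcars stands in for the poison data:

```r
library(ggplot2)

# Hypothetical half-width of the interval; in the thread it comes from TukeyHSD()
half_width <- 2

upper <- function(x) mean(x) + half_width
lower <- function(x) mean(x) - half_width

# ggplot2 >= 3.3.0 uses fun / fun.min / fun.max;
# earlier releases only know fun.y / fun.ymin / fun.ymax
if (packageVersion("ggplot2") >= "3.3.0") {
  summary_args <- list(fun = mean, fun.min = lower, fun.max = upper)
} else {
  summary_args <- list(fun.y = mean, fun.ymin = lower, fun.ymax = upper)
}

# Build the layer with whichever argument names apply
bars <- do.call(stat_summary, c(summary_args, list(geom = "pointrange")))
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + bars
print(p)
```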

ggplot: plotting more than one function on one plot

I have a task to plot a histogram of a column of my data named NoPodsWeight, together with its density and the normal distribution over the same range (from min(NoPodsWeight) to max(NoPodsWeight)).
I am trying this:
myframe <- read.csv(filepath, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
myframe <- myframe[rowSums(is.na(myframe)) <= 0,]
nopodsweight <- myframe$NoPodsWeight
height <- myframe$Height
ggplot(myframe, aes(x = NoPodsWeight, y = ..density..)) +
geom_histogram(color="black", fill="white") +
geom_density(color = "blue") +
stat_function(fun = dnorm, args = list(mean = mean(myframe$NoPodsWeight), sd = sd(myframe$NoPodsWeight)))
Using this code I get an error:
Error: Aesthetics must be valid computed stats. Problematic aesthetic(s): y = ..density...
Did you map your stat in the wrong layer?
I don't understand how to plot two or more functions on one plot. For example I can solve my problem using standard plot (but without density):
hist(x = nopodsweight, freq = F, ylim = c(0, 0.45), breaks = 37)
n_norm<-seq(min(nopodsweight)-1, max(nopodsweight)+1, 0.0001)
lines(n_norm, dnorm(n_norm), col = "red")
Is there a function in ggplot that draws a (normal) distribution curve, or any other function, the way lines() does?
You need to take ..density.. out of the ggplot() layer and put it specifically in the geom_histogram layer. I didn't download and import your data, but here's an example on mtcars:
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(aes(y = ..density..)) +
geom_density(color = "blue") +
stat_function(fun = dnorm, args = list(mean = mean(mtcars$mpg), sd = sd(mtcars$mpg)))
The error message says "did you map your stat in the wrong layer?"; that's a hint. Moving aes(y=..density..) to apply specifically to geom_histogram() seems to make everything OK ...
ggplot(myframe, aes(x = NoPodsWeight)) +
geom_histogram(color="black", fill="white",
aes(y = ..density..)) +
## [... everything else ...]
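To put more than one function on the same plot, add one stat_function layer per function. A small sketch on mtcars (the exponential curve is only an arbitrary second function chosen for illustration, not something from the question):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg)) +
  # Only the histogram layer maps y to the computed density
  # (after_stat() needs ggplot2 >= 3.3.0; older versions use ..density..)
  geom_histogram(aes(y = after_stat(density)), bins = 15,
                 color = "black", fill = "white") +
  # First function: normal curve with the sample mean and sd
  stat_function(fun = dnorm,
                args = list(mean = mean(mtcars$mpg), sd = sd(mtcars$mpg)),
                color = "red") +
  # Second function on the same axes, with its own arguments and style
  stat_function(fun = dexp, args = list(rate = 1 / mean(mtcars$mpg)),
                color = "darkgreen", linetype = "dashed")
print(p)
```

Each stat_function layer evaluates its own function over the x range of the plot, so the layers do not interfere with one another.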

Plotting standard error bars

I have a long-format dataset with 3 variables. I'm plotting two of the variables and faceting by the third, using ggplot2. I'd like to plot the standard error bars of the observations in each facet too, but I have no idea how. Does anyone know?
Here's a picture of what I've got. I'd like to have the standard error bars on each facet. Thanks!
Edit: here's some example data and the plot.
data <- data.frame(rep(c("1","2","3","4","5","6","7","8","9","10",
"11","12","13","14","15","16","17","18","19","20",
"21","22","23","24","25","26","27","28","29","30",
"31","32"), 2),
rep(c("a","b","c","d","e","f","g","h","i","j","k","l"), 32),
rnorm(n = 384))
colnames(data) <- c("estado","sector","VA")
ggplot(data, aes(x = estado, y = VA, col = sector)) +
facet_grid(.~sector) +
geom_point()
If all you want is the mean & standard error bar associated with each "estado"-"sector" combination, you can leave ggplot to do all the work, by replacing the geom_point() line with stat_summary():
ggplot(data,
aes(x = estado, y = VA, col = sector)) +
facet_grid(. ~ sector) +
stat_summary(fun.data = mean_se)
See ?mean_se from the ggplot2 package for more details on the function. The default parameter option gives you the mean as well as the range for 1 standard error above & below the mean.
If you want to show the original points, just add back the geom_point() line. (Though I think the plot would be rather cluttered for the reader, in that case...)
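To see what mean_se hands to stat_summary, it can be called directly on a vector. A quick check against the textbook formula, using a made-up vector x:

```r
library(ggplot2)

x <- c(1, 2, 3, 4)

# mean_se() returns a one-row data frame with columns y, ymin and ymax
res <- mean_se(x)

# The same quantities by hand: mean plus/minus one standard error
se <- sd(x) / sqrt(length(x))
manual <- data.frame(y = mean(x), ymin = mean(x) - se, ymax = mean(x) + se)

print(res)
print(manual)
```

The mult argument of mean_se widens the range, for example mean_se(x, mult = 2) for two standard errors either side of the mean.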
Maybe you could try something like below?
set.seed(1)
library(dplyr)
dat = data.frame(estado = factor(rep(1:32, 2)),
sector = rep(letters[1:12], 32),
VA = rnorm(384))
se = function(x) {
sd(x)/sqrt(length(x))
}
dat_sum = dat %>% group_by(estado, sector) %>%
summarise(mu = mean(VA), se = se(VA))
dat_plot = full_join(dat, dat_sum)
ggplot(dat_plot, aes(estado, y = VA, color = sector)) +
geom_jitter() +
geom_errorbar(aes(estado, y = mu, color = sector,
ymin = mu - se, ymax = mu + se)) +
facet_grid(.~sector)

Plot mean and sd of dataset per x value using ggplot2

I have a dataset that looks a little like this:
a <- data.frame(x=rep(c(1,2,3,5,7,10,15,20), 5),
y=rnorm(40, sd=2) + rep(c(4,3.5,3,2.5,2,1.5,1,0.5), 5))
ggplot(a, aes(x=x,y=y)) + geom_point() +geom_smooth()
I want the same output as that plot, but instead of smooth curve, I just want to take line segments between the mean/sd values for each set of x values. The graph should look similar to the above graph, but jagged, instead of curved.
I tried this, but it fails, even though the x values aren't unique:
ggplot(a, aes(x=x,y=y)) + geom_point() +stat_smooth(aes(group=x, y=y, x=x))
geom_smooth: Only one unique x value each group. Maybe you want aes(group = 1)?
?stat_summary is what you should look at.
Here is an example
# functions to calculate the upper and lower CI bounds
# qnorm(1 - .alpha/2) is positive, so uci lies above the mean and lci below it
uci <- function(y,.alpha){mean(y) + qnorm(1 - abs(.alpha)/2) * sd(y)}
lci <- function(y,.alpha){mean(y) - qnorm(1 - abs(.alpha)/2) * sd(y)}
ggplot(a, aes(x=x,y=y)) + stat_summary(fun.y = mean, geom = 'line', colour = 'blue') +
  # fun.args passes .alpha on to the summary functions
  stat_summary(fun.y = mean, geom = 'ribbon', fun.ymax = uci, fun.ymin = lci, fun.args = list(.alpha = 0.05), alpha = 0.5)
You can use one of the built-in summary functions mean_sdl. The code is shown below
ggplot(a, aes(x=x,y=y)) +
  stat_summary(fun.y = 'mean', colour = 'blue', geom = 'line') +
  stat_summary(fun.data = 'mean_sdl', geom = 'ribbon', alpha = 0.2)
Using ggplot2 0.9.3.1, the following did the trick for me:
ggplot(a, aes(x=x,y=y)) + geom_point() +
stat_summary(fun.data = 'mean_sdl', mult = 1, geom = 'smooth')
The 'mean_sdl' is an implementation of the Hmisc package's function 'smean.sdl' and the mult-variable gives how many standard deviations (above and below the mean) are displayed.
For detailed info on the original function:
library('Hmisc')
?smean.sdl
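If Hmisc is not installed, a small function of your own can be passed as fun.data instead, as long as it returns a data frame with columns y, ymin and ymax. A sketch, where mean_sd is a hypothetical helper (not part of ggplot2 or Hmisc) and a is regenerated with a fixed seed:

```r
library(ggplot2)

# Hypothetical helper: mean with a band of `mult` standard deviations
mean_sd <- function(y, mult = 1) {
  data.frame(y = mean(y),
             ymin = mean(y) - mult * sd(y),
             ymax = mean(y) + mult * sd(y))
}

set.seed(1)
a <- data.frame(x = rep(c(1, 2, 3, 5, 7, 10, 15, 20), 5),
                y = rnorm(40, sd = 2) + rep(c(4, 3.5, 3, 2.5, 2, 1.5, 1, 0.5), 5))

p <- ggplot(a, aes(x = x, y = y)) +
  geom_point() +
  # Band of one sd either side of the mean at each x
  stat_summary(fun.data = mean_sd, geom = "ribbon", alpha = 0.2) +
  # Jagged line through the per-x means
  stat_summary(fun.data = mean_sd, geom = "line", colour = "blue")
print(p)
```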
You could try writing a summary function as suggested by Hadley Wickham on the website for ggplot2: http://had.co.nz/ggplot2/stat_summary.html. Applying his suggestion to your code:
p <- qplot(x, y, data=a)
stat_sum_df <- function(fun, geom="crossbar", ...) {
stat_summary(fun.data=fun, colour="blue", geom=geom, width=0.2, ...)
}
p + stat_sum_df("mean_cl_normal", geom = "smooth")
This results in this graphic:

Is there a built-in way to do a logarithmic color scale in ggplot2?

Here's an example of a binned density plot:
library(ggplot2)
n <- 1e5
df <- data.frame(x = rexp(n), y = rexp(n))
p <- ggplot(df, aes(x = x, y = y)) + stat_binhex()
print(p)
It would be nice to adjust the color scale so that the breaks are log-spaced, but this attempt
# round_any() comes from the plyr package
my_breaks <- round_any(exp(seq(log(10), log(5000), length = 5)), 10)
p + scale_fill_hue(breaks = as.factor(my_breaks), labels = as.character(my_breaks))
results in an Error: Continuous variable () supplied to discrete scale_hue. It seems breaks is expecting a factor (maybe?) and is designed with categorical variables in mind?
There's a workaround that isn't built in, which I'll post as an answer, but I think I might just be lost in my use of scale_fill_hue, and I'd like to know if there's anything obvious I'm missing.
Yes! There is a trans argument to scale_fill_gradient, which I had missed before. With that we can get a solution with appropriate legend and color scale, and nice concise syntax. Using p from the question and my_breaks = c(2, 10, 50, 250, 1250, 6000):
p + scale_fill_gradient(name = "count", trans = "log",
breaks = my_breaks, labels = my_breaks)
My other answer is best used for more complicated functions of the data. Hadley's comment encouraged me to find this answer in the examples at the bottom of ?scale_gradient.
Another way, using a custom function in stat_summary_hex:
ggplot(cbind(df, z = 1), aes(x = x, y = y, z = z)) +
  # fun must be named: the first positional argument of stat_summary_hex() is mapping
  stat_summary_hex(fun = function(z) log(sum(z)))
This is now part of ggplot, but was originally inspired by the wonderful code by @kohske in this answer, which provided a custom stat_aggrhex. In versions of ggplot > 2.0, use the above code (or the other answer). With the custom stat_aggrhex, the plot was generated like this:
ggplot(cbind(df, z = 1), aes(x = x, y = y, z = z)) +
  stat_aggrhex(fun = function(z) log(sum(z))) +
  labs(fill = "Log counts")
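Log-spaced breaks can also be built in base R, without plyr, and combined with a log-transformed fill scale. A sketch under a few assumptions: geom_bin2d is used instead of stat_binhex so that the hexbin package is not needed, n is smaller than in the question, and recent ggplot2 releases may prefer the argument name transform over trans:

```r
library(ggplot2)

set.seed(1)
n <- 1e4
df <- data.frame(x = rexp(n), y = rexp(n))

# Log-spaced candidates between 10 and 5000, rounded to one significant digit
my_breaks <- signif(exp(seq(log(10), log(5000), length.out = 5)), 1)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_bin2d(bins = 40) +
  # Colour varies with log10(count); breaks outside the data range are not drawn
  scale_fill_gradient(name = "count", trans = "log10", breaks = my_breaks)
print(p)
```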
