Does ggplot use dynamic function for statistic functions? - r

So I'm running an optimization problem and am trying to add the function at each point in time to a plot. I'm able to plot the function, but I have the variables stored and it seems like R doesn't evaluate the function until it renders it. It's hard to explain, but I have a simple example that shows it.
data = data.frame(x = runif(20, -10, 10), y = runif(20, -10,10))
p <- ggplot(data, aes(x = x, y =y))
slope = 0.5
yoff = 1
p <- p + stat_function(fun = function(x) slope*x+yoff)
slope = 1
yoff = -1
p <- p + stat_function(fun = function(x) slope*x+yoff)
p
And what I want is two lines on the graph with the slope and y-intercept that I had when I added the function to the graph.

If you have a lot of them, make a list of functions:
make_fun <- function(slope,yoff) {slope; yoff; function(x) x*slope + yoff}
> l <- mapply(FUN = make_fun,slope = 1:2,yoff = 3:4)
> l[[1]](1)
[1] 4
> l[[2]](1)
[1] 6
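The bare `slope; yoff;` at the start of make_fun forces both arguments, so each returned function keeps the values it was created with. A minimal base-R sketch of what can go wrong without that step when functions are created in a for loop; make_lazy and make_forced are hypothetical helpers for illustration, not part of the answer above:

```r
# Hypothetical helpers for illustration only.
make_lazy   <- function(slope) function(x) slope * x
make_forced <- function(slope) { force(slope); function(x) slope * x }

lazy   <- list()
forced <- list()
for (s in 1:2) {
  lazy[[s]]   <- make_lazy(s)    # `slope` is still an unevaluated promise for `s`
  forced[[s]] <- make_forced(s)  # `slope` is evaluated now, while s has this value
}

lazy[[1]](1)    # 2: the promise is only evaluated here, when s is already 2
forced[[1]](1)  # 1: uses the value s had when the function was created
```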

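The body of a function is only evaluated when the function is called, which here happens at render time, after slope and yoff have been reassigned.
You can give the parameters distinct names so that each function refers to its own variables: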
p <- ggplot(data, aes(x = x, y =y))
slope1 = 0.5
yoff1 = 1
p <- p + stat_function(fun = function(x) slope1*x+yoff1)
slope2 = 1
yoff2 = -1
p <- p + stat_function(fun = function(x) slope2*x+yoff2)

Many parameters in ggplot aren't evaluated until the plot is actually rendered. Here we can make the slope and yoff values arguments to the functions and then pass in values via the args= parameter which does get evaluated earlier.
library(ggplot2)
data = data.frame(x = runif(20, -10, 10), y = runif(20, -10,10))
p <- ggplot(data, aes(x = x, y =y))
slope = 0.5
yoff = 1
p <- p + stat_function(fun = function(x, slope, yoff) slope*x+yoff, args=list(slope=slope, yoff=yoff))
slope = 1
yoff = -1
p <- p + stat_function(fun = function(x, slope, yoff) slope*x+yoff, args=list(slope=slope, yoff=yoff))
p
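The difference between the two approaches can be seen without ggplot at all. The following base-R sketch is only an analogy (how stat_function applies args internally is not shown here); line_global, line_args and stored are names made up for this illustration:

```r
slope <- 0.5
yoff  <- 1

# Looks up slope and yoff in the global environment each time it is called
line_global <- function(x) slope * x + yoff

# Takes the parameters as arguments; their current values are stored in a list
line_args <- function(x, slope, yoff) slope * x + yoff
stored    <- list(slope = slope, yoff = yoff)

slope <- 1
yoff  <- -1

line_global(2)                              # 1: sees the reassigned values
do.call(line_args, c(list(x = 2), stored))  # 2: uses the stored 0.5 and 1
```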

Related

Using ggplot to draw a density function for various values of parameters

I am trying to plot a density function for various values of two parameters as follows:
f_bdsn<-function(x){
2*(1+delta1*x^2)*dnorm(x)*pnorm(alpha1*x)/(1+delta1)
}
alpha1<<-0
delta1<<-0
group1=paste("alpha=",alpha1,", delta=",delta1)
p9 <- ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +
stat_function(fun = f_bdsn, aes(colour = group1))
alpha1<<-0
delta1<<-6
group2=paste("alpha=",alpha1,", delta=",delta1)
p9 <-p9 + stat_function(fun = f_bdsn,
aes(colour = group2))
p9
I am confused why it does not work! It only draws the function for the last values of the parameters.
I've had to make some changes to your original function. Basically, the alpha and delta values need to be parameterised and passed in when calling the function. Then, using a for loop, we can create as many groups as we want.
# Create Function which takes in an x value, a delta value and an alpha value
f_bdsn<-function(x, delta_input, alpha_input){
2*(1+delta_input*x^2)*dnorm(x)*pnorm(alpha_input*x)/(1+delta_input)
}
# Define the number of groups, alpha values and delta values
# Note the length of both alpha_values and delta_values are the same
n_groups <- 2
alpha_values <- c(0, 10)
delta_values <- c(6, 16)
# Create initial plot
plot <- ggplot(data.frame(x = c(-4, 4)), aes(x = x))
# Loop through each group
for (i in seq_len(n_groups)) {
# Define the group name
group_name <- paste("alpha=", alpha_values[i],", delta=", delta_values[i])
# Add the layer to the main plot variable. `!!` inserts the current value of
# group_name into aes(); without it the name is only looked up when the plot
# is drawn, and every layer would get the last group's label
plot <- plot +
stat_function(fun = f_bdsn, args = list(delta_input = delta_values[i],
alpha_input = alpha_values[i]),
aes(colour = !!group_name))
}
# Print Plot
plot
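A side benefit of passing the parameters explicitly is that the function can be checked on its own, independently of the plot. As a small sanity check (a sketch, assuming f_bdsn is meant to be a probability density for these parameter values), the area under each curve can be computed with integrate():

```r
f_bdsn <- function(x, delta_input, alpha_input) {
  2 * (1 + delta_input * x^2) * dnorm(x) * pnorm(alpha_input * x) / (1 + delta_input)
}

alpha_values <- c(0, 10)
delta_values <- c(6, 16)

# Numerically integrate each parameterised curve over the whole real line
areas <- mapply(function(a, d) {
  integrate(f_bdsn, lower = -Inf, upper = Inf,
            delta_input = d, alpha_input = a)$value
}, alpha_values, delta_values)

areas  # both values should be close to 1 if f_bdsn is a density
```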
The problem is that the parameter values of ggplot are, for the most part, lazily evaluated. The values aren't actually evaluated until the plot is drawn. Since your function uses global variables, those values aren't resolved until plot time, and at that point they will only have one value, not two different values. You can change this by creating a function generator. For example:
f_gen <- function(alpha1, delta1) {
force(c(alpha1, delta1))
function(x){
2*(1+delta1*x^2)*dnorm(x)*pnorm(alpha1*x)/(1+delta1)
}}
alpha1 <- 0
delta1 <- 0
group1 <- paste("alpha=",alpha1,", delta=",delta1)
p9 <- ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +
stat_function(fun = f_gen(alpha1,delta1), aes(colour = group1))
alpha1 <- 0
delta1 <- 6
group2 <- paste("alpha=",alpha1,", delta=",delta1)
p9 <- p9 +
stat_function(fun = f_gen(alpha1,delta1), aes(colour = group2))
p9
Here f_gen is a function that returns a function with the parameters you desire.
You might even simplify that to
f_gen <- function(alpha1, delta1) {
force(c(alpha1, delta1))
function(x){
2*(1+delta1*x^2)*dnorm(x)*pnorm(alpha1*x)/(1+delta1)
}}
gname <- function(alpha1, delta1) paste("alpha=",alpha1,", delta=",delta1)
ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +
stat_function(fun = f_gen(0,0), aes(colour = gname(0,0))) +
stat_function(fun = f_gen(0,6), aes(colour = gname(0,6))) +
labs(color="Params")
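With more than a couple of parameter pairs, the layers can be generated from a table instead of being written out by hand. The sketch below assumes a ggplot2 version in which aes() supports `!!` and a list of layers can be added with `+`; the params data frame and its values are made up for illustration:

```r
library(ggplot2)

f_gen <- function(alpha1, delta1) {
  force(alpha1); force(delta1)
  function(x) {
    2 * (1 + delta1 * x^2) * dnorm(x) * pnorm(alpha1 * x) / (1 + delta1)
  }
}

# Hypothetical parameter table; one row per curve
params <- data.frame(alpha = c(0, 0, 10), delta = c(0, 6, 16))

layers <- Map(function(a, d) {
  label <- paste0("alpha=", a, ", delta=", d)
  # `!!` puts the label's value into the mapping when the layer is created
  stat_function(fun = f_gen(a, d), aes(colour = !!label))
}, params$alpha, params$delta)

p <- ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +
  layers +
  labs(colour = "Params")
p
```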

How to deal with vertical asymptotes in ggplot2

Consider three simple mathematical functions :
f1 <- function(x) 1/x
f2 <- function(x) tan(x)
f3 <- function(x) 1 / sin(x)
Each of them has vertical asymptotes, i.e. f(x) tends to infinity as x approaches certain values. I plot these three functions with ggplot2::stat_function():
# x is between -5 to 5
ggplot(data.frame(x = c(-5, 5)), aes(x)) +
stat_function(fun = f1, n = 1000) +
coord_cartesian(ylim = c(-50, 50))
# x is between -2*pi to 2*pi
ggplot(data.frame(x = c(-2*pi, 2*pi)), aes(x)) +
stat_function(fun = f2, n = 1000) +
coord_cartesian(ylim = c(-50, 50))
# x is between -2*pi to 2*pi
ggplot(data.frame(x = c(-2*pi, 2*pi)), aes(x)) +
stat_function(fun = f3, n = 1000) +
coord_cartesian(ylim = c(-50, 50))
The asymptotes appear respectively at :
x1 <- 0
x2 <- c(-3/2*pi, -1/2*pi, 1/2*pi, 3/2*pi)
x3 <- c(-pi, 0, pi)
Actually, these lines do not exist, but ggplot makes them visible. I attempted to use geom_vline() to cover them, namely :
+ geom_vline(xintercept = x1, color = "white")
+ geom_vline(xintercept = x2, color = "white")
+ geom_vline(xintercept = x3, color = "white")
The outputs look rough, and faint black marks can still be seen. Is there a more robust method?
A solution related to @Mojoesque's comments that is not perfect, but is relatively simple and has two minor shortcomings: you need to know the asymptotes (x1, x2, x3), and you may need to reduce the range of y.
eps <- 0.01
f1 <- function(x) if(min(abs(x - x1)) < eps) NA else 1/x
f2 <- function(x) if(min(abs(x - x2)) < eps) NA else tan(x)
f3 <- function(x) if(min(abs(x - x3)) < eps) NA else 1 / sin(x)
ggplot(data.frame(x = c(-5, 5)), aes(x)) +
stat_function(fun = Vectorize(f1), n = 1000) +
coord_cartesian(ylim = c(-30, 30))
ggplot(data.frame(x = c(-2*pi, 2*pi)), aes(x)) +
stat_function(fun = Vectorize(f2), n = 1000) +
coord_cartesian(ylim = c(-30, 30))
ggplot(data.frame(x = c(-2*pi, 2*pi)), aes(x)) +
stat_function(fun = Vectorize(f3), n = 1000) +
coord_cartesian(ylim = c(-30, 30))
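The same masking idea can be written so that it works on whole vectors, which avoids the Vectorize() wrapper. This is a sketch under the same assumption that the asymptotes are known; mask_near is a hypothetical helper name:

```r
# Hypothetical helper: wraps f so that values within eps of a known
# asymptote are replaced by NA, which leaves a gap in the drawn line
mask_near <- function(f, asymptotes, eps = 0.01) {
  force(f); force(asymptotes); force(eps)
  function(x) {
    # TRUE for every x that lies closer than eps to any asymptote
    near <- rowSums(abs(outer(x, asymptotes, "-")) < eps) > 0
    y <- f(x)
    y[near] <- NA_real_
    y
  }
}

f2_masked <- mask_near(tan, c(-3/2, -1/2, 1/2, 3/2) * pi)
f2_masked(c(0, pi / 2, pi / 4))  # the middle value is masked as NA
```

The wrapped function can then be passed directly, for example as stat_function(fun = f2_masked, n = 1000).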
This solution is based on @Mojoesque's comment: it uses a piecewise approach, partitioning the x-axis into several subintervals and then adding one stat_function() per subinterval with purrr::reduce(). The limitation is that the asymptotes need to be given.
Take tan(x) for example :
f <- function(x) tan(x)
asymp <- c(-3/2*pi, -1/2*pi, 1/2*pi, 3/2*pi)
left <- -2 * pi # left border
right <- 2 * pi # right border
d <- 0.001
interval <- data.frame(x1 = c(left, asymp + d),
x2 = c(asymp - d, right))
interval # divide the entire x-axis into 5 sections
# x1 x2
# 1 -6.283185 -4.713389
# 2 -4.711389 -1.571796
# 3 -1.569796 1.569796
# 4 1.571796 4.711389
# 5 4.713389 6.283185
library(tidyverse)
pmap(interval, function(x1, x2) {
stat_function(fun = f, xlim = c(x1, x2), n = 1000)
}) %>% reduce(.f = `+`,
.init = ggplot(data.frame(x = c(left, right)), aes(x)) +
coord_cartesian(ylim = c(-50, 50)))

Adding orthogonal regression line in ggplot

I have plotted a scatter graph in R, comparing expected to observed values,using the following script:
library(ggplot2)
library(dplyr)
library(readr) # provides read_csv()
r<-read_csv("Uni/MSci/Project/DATA/new data sheets/comparisons/for comarison graphs/R Regression/GAcAs.csv")
x<-r[1]
y<-r[2]
ggplot()+geom_point(aes(x=x,y=y))+
scale_size_area() +
xlab("Expected") +
ylab("Observed") +
ggtitle("G - As x Ac")+ xlim(0, 40)+ylim(0, 40)
My plot is as follows:
I then want to add an orthogonal regression line (as there could be errors in both the expected and observed values). I have calculated the beta value using the following:
v <- prcomp(cbind(x,y))$rotation
beta <- v[2,1]/v[1,1]
Is there a way to add an orthogonal regression line to my plot?
Borrowed from this blog post & this answer. Basically, you will need the Deming() function from the MethComp package or prcomp() from the stats package, together with a custom function perp.segment.coord. Below is an example taken from the above-mentioned blog post.
library(ggplot2)
library(MethComp)
data(airquality)
airquality <- na.exclude(airquality)
# Orthogonal, total least squares or Deming regression
deming <- Deming(y=airquality$Wind, x=airquality$Temp)[1:2]
deming
#> Intercept Slope
#> 24.8083259 -0.1906826
# Check with prcomp {stats}
r <- prcomp( ~ airquality$Temp + airquality$Wind )
slope <- r$rotation[2,1] / r$rotation[1,1]
slope
#> [1] -0.1906826
intercept <- r$center[2] - slope*r$center[1]
intercept
#> airquality$Wind
#> 24.80833
# https://stackoverflow.com/a/30399576/786542
perp.segment.coord <- function(x0, y0, ortho){
# finds endpoint for a perpendicular segment from the point (x0,y0) to the line
# defined by ortho as y = a + b*x
a <- ortho[1] # intercept
b <- ortho[2] # slope
x1 <- (x0 + b*y0 - a*b)/(1 + b^2)
y1 <- a + b*x1
list(x0=x0, y0=y0, x1=x1, y1=y1)
}
perp.segment <- perp.segment.coord(airquality$Temp, airquality$Wind, deming)
perp.segment <- as.data.frame(perp.segment)
# plot
plot.y <- ggplot(data = airquality, aes(x = Temp, y = Wind)) +
geom_point() +
geom_abline(intercept = deming[1],
slope = deming[2]) +
geom_segment(data = perp.segment,
aes(x = x0, y = y0, xend = x1, yend = y1),
colour = "blue") +
theme_bw()
Created on 2018-03-19 by the reprex package (v0.2.0).
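The slope can also be cross-checked in base R alone, without MethComp. The sketch below compares the prcomp-based slope with the closed-form Deming slope for equal error variances (delta = 1), on the same airquality rows; the variable names are chosen for this illustration:

```r
aq <- na.exclude(airquality)  # same rows as in the example above
x <- aq$Temp
y <- aq$Wind

# Slope from the direction of the first principal component
rot <- prcomp(cbind(x, y))$rotation
slope_pca <- unname(rot[2, 1] / rot[1, 1])

# Closed-form Deming slope with equal error variances (delta = 1)
s_xx <- var(x)
s_yy <- var(y)
s_xy <- cov(x, y)
slope_tls <- (s_yy - s_xx + sqrt((s_yy - s_xx)^2 + 4 * s_xy^2)) / (2 * s_xy)
intercept_tls <- mean(y) - slope_tls * mean(x)

c(slope_pca = slope_pca, slope_tls = slope_tls, intercept = intercept_tls)
```

The two slopes should agree with each other and with the Deming() output shown above.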
The MethComp package no longer seems to be maintained (it was removed from CRAN).
Russel88/COEF lets you use stat_smooth()/geom_smooth() with method="tls" to add an orthogonal regression line.
Based on this and wikipedia:Deming_regression, I created the following functions, which allow noise ratios other than 1:
deming.fit <- function(x, y, noise_ratio = sd(y)/sd(x)) {
if(missing(noise_ratio) || is.null(noise_ratio)) noise_ratio <- eval(formals(sys.function(0))$noise_ratio) # this is just a complicated way to write `sd(y)/sd(x)`
delta <- noise_ratio^2
x_name <- deparse(substitute(x))
s_yy <- var(y)
s_xx <- var(x)
s_xy <- cov(x, y)
beta1 <- (s_yy - delta*s_xx + sqrt((s_yy - delta*s_xx)^2 + 4*delta*s_xy^2)) / (2*s_xy)
beta0 <- mean(y) - beta1 * mean(x)
res <- c(beta0 = beta0, beta1 = beta1)
names(res) <- c("(Intercept)", x_name)
class(res) <- "Deming"
res
}
deming <- function(formula, data, R = 100, noise_ratio = NULL, ...){
ret <- boot::boot(
data = model.frame(formula, data),
statistic = function(data, ind) {
data <- data[ind, ]
args <- rlang::parse_exprs(colnames(data))
names(args) <- c("y", "x")
rlang::eval_tidy(rlang::expr(deming.fit(!!!args, noise_ratio = noise_ratio)), data, env = rlang::current_env())
},
R=R
)
class(ret) <- c("Deming", class(ret))
ret
}
predictdf.Deming <- function(model, xseq, se, level) {
pred <- as.vector(tcrossprod(model$t0, cbind(1, xseq)))
if(se) {
preds <- tcrossprod(model$t, cbind(1, xseq))
data.frame(
x = xseq,
y = pred,
ymin = apply(preds, 2, function(x) quantile(x, probs = (1-level)/2)),
ymax = apply(preds, 2, function(x) quantile(x, probs = 1-((1-level)/2)))
)
} else {
return(data.frame(x = xseq, y = pred))
}
}
# unrelated helper function to create a nicer plot:
fix_plot_limits <- function(p) p + coord_cartesian(xlim=ggplot_build(p)$layout$panel_params[[1]]$x.range, ylim=ggplot_build(p)$layout$panel_params[[1]]$y.range)
Demonstration:
library(ggplot2)
#devtools::install_github("Russel88/COEF")
library(COEF)
fix_plot_limits(
ggplot(data.frame(x = (1:5) + rnorm(100), y = (1:5) + rnorm(100)*2), mapping = aes(x=x, y=y)) +
geom_point()
) +
geom_smooth(method=deming, aes(color="deming"), method.args = list(noise_ratio=2)) +
geom_smooth(method=lm, aes(color="lm")) +
geom_smooth(method = COEF::tls, aes(color="tls"))
Created on 2019-12-04 by the reprex package (v0.3.0)
I'm not sure I completely understand the question, but if you want line segments to show errors along both x and y axis, you can do this using geom_segment.
Something like this:
library(ggplot2)
df <- data.frame(x = rnorm(10), y = rnorm(10), w = rnorm(10, sd=.1))
ggplot(df, aes(x = x, y = y, xend = x, yend = y)) +
geom_point() +
geom_segment(aes(x = x - w, xend = x + w)) +
geom_segment(aes(y = y - w, yend = y + w))

saving ggplot in a list gives me the same graph

I am trying to plot 12 different plots on a 3 by 4 grid, but it only plots the last one 12 times. Can anyone help me? I am so fed up with it. Thanks
library(ggplot2)
library(gridExtra)
pmax=0.85
K_min = 0.0017
T = seq(100,1200,by=100) ## ISIs
lambda =1/T
p=list()
for(i in (1:length(lambda))){
p[[i]]<-ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
stat_function(fun = function (x) (lambda[i]*(1-(1-pmax))/K_min)*(1-x)^((lambda[i]/K_min)-1)*
(1-(1-pmax)*x)^-((lambda[i]/K_min)+1),colour = "dodgerblue3")+
scale_x_continuous(name = "Probability") +
scale_y_continuous(name = "Frequency") + theme_bw()
main <- grid.arrange(grobs=p,ncol=4)
}
This code produces the correct picture but I need to use ggplot since my other figures are in ggplot.
par( mfrow = c( 3, 4 ) )
for (i in (1:length(lambda))){
f <- function (x) ((lambda[i]*(1-(1-pmax))/K_min)*(1-x)^((lambda[i]/K_min)-1)*
(1-(1-pmax)*x)^-((lambda[i]/K_min)+1) )
curve(f,from=0, to=1, col = "violet",lwd=2,sub = paste0("ISI = ",round(1/lambda[i],3), ""),ylab="PDF",xlab="R")
}
Correct plot using curve:
ggplot objects created in a loop are not evaluated until they are rendered, so the function inside each one looks up lambda[i] with whatever value i has at that moment. Since all the ggplot objects in this case use data calculated with lambda[i], they all get the same result, based on the last value of i (12). Here are two possible workarounds:
Workaround 1. Convert each ggplot object into a grob within the loop, & save that to the list:
for(i in (1:length(lambda))){
# code for generating each plot is unchanged
g <- ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
stat_function(fun = function (x) (lambda[i]*(1-(1-pmax))/K_min)*(1-x)^((lambda[i]/K_min)-1)*
(1-(1-pmax)*x)^-((lambda[i]/K_min)+1),colour = "dodgerblue3")+
scale_x_continuous(name = "Probability") +
scale_y_continuous(name = "Frequency") + theme_bw()
p[[i]] <- ggplotGrob(g)
}
main <- grid.arrange(grobs=p, ncol=4)
Workaround 2. Put all the data in a data frame, & create a single ggplot with a facet for each ISI:
library(dplyr)
pmax = 0.85
K_min = 0.0017
ISI = seq(100, 1200, by = 100) # I changed this; using `T` as a name clashes with T from TRUE/FALSE
lambda = 1/ISI
df <- data.frame(
x = rep(seq(0, 1, length.out = 101), length(ISI)),
ISI = rep(ISI, each = 101),
l = rep(lambda, each = 101)
) %>%
mutate(y = (l * pmax / K_min) * (1-x) ^ ((l / K_min) - 1) *
(1 - (1 - pmax) * x)^-((l / K_min) + 1))
ggplot(df,
aes(x = x, y = y, group = 1)) +
geom_line(colour = "dodgerblue3") +
facet_wrap(~ISI, nrow = 3, scales = "free_y") +
labs(x = "Probability", y = "Frequency") +
theme_bw()
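Another option, sketched here under the assumption that the formula from the question is what should be plotted, is to build one function per lambda value up front so that no function depends on the loop index. make_pdf is a hypothetical helper name:

```r
pmax   <- 0.85
K_min  <- 0.0017
ISI    <- seq(100, 1200, by = 100)
lambda <- 1 / ISI

# Hypothetical helper: returns the curve for one fixed lambda value
make_pdf <- function(l, pmax, K_min) {
  force(l); force(pmax); force(K_min)
  function(x) {
    (l * pmax / K_min) * (1 - x)^((l / K_min) - 1) *
      (1 - (1 - pmax) * x)^-((l / K_min) + 1)
  }
}

pdfs <- lapply(lambda, make_pdf, pmax = pmax, K_min = K_min)

# Each function keeps its own lambda, whatever happens to loop variables later
pdfs[[1]](0)   # equals lambda[1] * pmax / K_min
pdfs[[12]](0)  # equals lambda[12] * pmax / K_min
```

Inside the original loop, passing stat_function(fun = pdfs[[i]], colour = "dodgerblue3") should then give each plot its own curve, because the function no longer refers to i.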

2 polynomial regressions in a ggplot() graph

This is my Dataset:
As you can see, there are two quantitative variables (X, Y) and 1 categorical variable (molar, with two levels: M1, M2).
I would like to represent in one single graph two polynomial regressions and their respective prediction intervals: one for the M1 level and one for the M2 level. Each polynomial regression has its own degree (M1 is a degree-4 polynomial regression, and M2 is degree 6).
I want to use the ggplot() function (from the ggplot2 package in R). I have already produced this figure, but with all data merged (I mean, with no distinction between levels). This is the code I used:
# Fit a linear model
m <- lm(Y ~ X+I(X^2)+I(X^3)+I(X^4), data = Dataset)
# cbind the predictions to Dataset
mpi <- cbind(Dataset, predict(m, interval = "prediction"))
ggplot(mpi, aes(x = X)) +
geom_ribbon(aes(ymin = lwr, ymax = upr),
fill = "blue", alpha = 0.2) +
geom_point(aes(y = Y)) +
geom_line(aes(y = fit), colour = "blue", size = 1)
With this result:
So, I would like to have two polynomial regressions of different degree (one for M1 and one for M2), taking into account their respective prediction intervals. What would the exact code be?
UPDATE - New code! I run this code with no success:
M1=subset(Dataset,Dataset$molar=="M1",select=X:Y)
M2=subset(Dataset,Dataset$molar=="M2",select=X:Y)
M1.R <- lm(Y ~ X +I(X^2)+I(X^3)+I(X^4),
data=subset(Dataset,Dataset$molar=="M1",select=X:Y))
M2.R <- lm(Y ~ X +I(X^2)+I(X^3)+I(X^4),
data=subset(Dataset,Dataset$molar=="M2",select=X:Y))
newdf <- data.frame(x = seq(0, 1, c(408,663)))
M1.P <- cbind(data=subset(Dataset,Dataset$molar=="M1",select=X:Y), predict(M1.R, interval = "prediction"))
M2.P <- cbind(data=subset(Dataset,Dataset$molar=="M2",select=X:Y), predict(M2.R, interval = "prediction"))
p = cbind(as.data.frame(rbind(M1.P, M2.P)), f = factor(rep(1:2, c(408,663)), x = rep(newdf$x, 2))
mdf = with(Dataset, data.frame(x = rep(x, 2), y = c(subset(Dataset,Dataset$molar=="M1",select=Y), subset(Dataset,Dataset$molar=="M2",select=Y),
f = factor(rep(1:2, c(408,663))))
ggplot(mdf, aes(x = x, y = y, colour = f)) + geom_point() +
geom_ribbon(data = p, aes(x = x, ymin = lwr, ymax = upr,
fill = f, y = NULL, colour = NULL),
alpha = 0.2) +
geom_line(data = p, aes(x = x, y = fit))
These are the messages I get now:
[98] WARNING: Warning in if (n < 0L) stop("wrong sign in 'by' argument") :
the condition has length > 1 and only the first element will be used
Warning in if (n > .Machine$integer.max) stop("'by' argument is much too small") :
the condition has length > 1 and only the first element will be used
Warning in 0L:n :
numerical expression has 2 elements: only the first used
Warning in if (by > 0) pmin(x, to) else pmax(x, to) :
the condition has length > 1 and only the first element will be used
[99] WARNING: Warning in predict.lm(M1.R, interval = "prediction") :
predictions on current data refer to _future_ responses
[100] WARNING: Warning in predict.lm(M2.R, interval = "prediction") :
predictions on current data refer to _future_ responses
[101] ERROR: <text>
I think I am closer but still can't see it. Help!
Here is one way. If you have more than two models/levels in the factor you should look into code that will work over the levels of the factor and fit the models that way.
Anyway, first some dummy data:
set.seed(100)
x <- runif(100)
y1 <- 2 + (0.3 * x) + (2.4 * x^2) + (-2.5 * x^3) + (3.4 * x^4) + rnorm(100)
y2 <- -1 + (0.3 * x) + (2.4 * x^2) + (-2.5 * x^3) + (3.4 * x^4) +
(-0.3 * x^5) + (2.4 * x^6) + rnorm(100)
df <- data.frame(x, y1, y2)
Fit our two models:
m1 <- lm(y1 ~ poly(x, 4), data = df)
m2 <- lm(y2 ~ poly(x, 6), data = df)
Now predict at some new locations x and stick the predictions together with x and f, a factor indexing the model, into a tidy format:
newdf <- data.frame(x = seq(0, 1, length = 100))
p1 <- predict(m1, newdata = newdf, interval = "prediction")
p2 <- predict(m2, newdata = newdf, interval = "prediction")
p <- cbind(as.data.frame(rbind(p1, p2)), f = factor(rep(1:2, each = 100)),
x = rep(newdf$x, 2))
Melt the original data into tidy form
mdf <- with(df, data.frame(x = rep(x, 2), y = c(y1, y2),
f = factor(rep(1:2, each = 100))))
Draw the plot, using colour to distinguish the models/data
ggplot(mdf, aes(x = x, y = y, colour = f)) +
geom_point() +
geom_ribbon(data = p, aes(x = x, ymin = lwr, ymax = upr,
fill = f, y = NULL, colour = NULL),
alpha = 0.2) +
geom_line(data = p, aes(x = x, y = fit))
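This gets us a plot with the points, the fitted curves and the prediction ribbons for both models, distinguished by colour.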
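For more than two levels, the fitting and prediction steps can be run over the levels of the factor, as suggested at the start of this answer. The following is a minimal base-R sketch on made-up data that reuses the column names from the question (X, Y, molar); the degrees vector and the simulated values are assumptions for illustration:

```r
set.seed(100)
n <- 100

# Made-up data with the column names used in the question
Dataset <- data.frame(
  X     = runif(2 * n),
  molar = factor(rep(c("M1", "M2"), each = n))
)
Dataset$Y <- ifelse(Dataset$molar == "M1", 2, -1) +
  2.4 * Dataset$X^2 - 2.5 * Dataset$X^3 + rnorm(2 * n)

degrees <- c(M1 = 4, M2 = 6)  # polynomial degree per level, as in the question

pred_list <- lapply(names(degrees), function(lev) {
  sub <- Dataset[Dataset$molar == lev, ]
  # Build e.g. Y ~ poly(X, 4), with the degree written into the formula
  fml <- reformulate(sprintf("poly(X, %d)", degrees[[lev]]), response = "Y")
  fit <- lm(fml, data = sub)
  new_x <- data.frame(X = seq(min(sub$X), max(sub$X), length.out = 50))
  pr <- predict(fit, newdata = new_x, interval = "prediction")
  data.frame(X = new_x$X, molar = lev, pr)  # adds columns fit, lwr, upr
})

preds <- do.call(rbind, pred_list)
head(preds)
```

The resulting preds data frame can be drawn in the same way as p above, with geom_ribbon(aes(ymin = lwr, ymax = upr, fill = molar)) and geom_line(aes(y = fit, colour = molar)) on top of the original points.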
