How to deal with vertical asymptotes in ggplot2 - r

Consider three simple mathematical functions :
f1 <- function(x) 1/x
f2 <- function(x) tan(x)
f3 <- function(x) 1 / sin(x)
There exist certain vertical asymptotes respectively, i.e. f(x) almost gets infinity when x approaches some values. I plot these three functions by ggplot2::stat_function() :
# x is between -5 to 5
ggplot(data.frame(x = c(-5, 5)), aes(x)) +
stat_function(fun = f1, n = 1000) +
coord_cartesian(ylim = c(-50, 50))
# x is between -2*pi to 2*pi
ggplot(data.frame(x = c(-2*pi, 2*pi)), aes(x)) +
stat_function(fun = f2, n = 1000) +
coord_cartesian(ylim = c(-50, 50))
# x is between -2*pi to 2*pi
ggplot(data.frame(x = c(-2*pi, 2*pi)), aes(x)) +
stat_function(fun = f3, n = 1000) +
coord_cartesian(ylim = c(-50, 50))
The asymptotes appear respectively at :
x1 <- 0
x2 <- c(-3/2*pi, -1/2*pi, 1/2*pi, 3/2*pi)
x3 <- c(-pi, 0, pi)
Actually, these lines do not exist, but ggplot makes them visible. I attempted to use geom_vline() to cover them, namely :
+ geom_vline(xintercept = x1, color = "white")
+ geom_vline(xintercept = x2, color = "white")
+ geom_vline(xintercept = x3, color = "white")
The outputs seem rough and indistinct black marks can be seen. Are there any methods which are much robuster ?

A solution related to #Mojoesque's comments that is not perfect, but also relatively simple and with two minor shortcomings: a need to know the asymptotes (x1, x2, x3) and possibly to reduce the range of y.
eps <- 0.01
f1 <- function(x) if(min(abs(x - x1)) < eps) NA else 1/x
f2 <- function(x) if(min(abs(x - x2)) < eps) NA else tan(x)
f3 <- function(x) if(min(abs(x - x3)) < eps) NA else 1 / sin(x)
ggplot(data.frame(x = c(-5, 5)), aes(x)) +
stat_function(fun = Vectorize(f1), n = 1000) +
coord_cartesian(ylim = c(-30, 30))
ggplot(data.frame(x = c(-2*pi, 2*pi)), aes(x)) +
stat_function(fun = Vectorize(f2), n = 1000) +
coord_cartesian(ylim = c(-30, 30))
ggplot(data.frame(x = c(-2*pi, 2*pi)), aes(x)) +
stat_function(fun = Vectorize(f3), n = 1000) +
coord_cartesian(ylim = c(-30, 30))

This solution is based on #Mojoesque's comment, which uses piecewise skill to partition x-axis into several subintervals, and then execute multiple stat_function() by purrr::reduce(). The restraint is that asymptotes need to be given.
Take tan(x) for example :
f <- function(x) tan(x)
asymp <- c(-3/2*pi, -1/2*pi, 1/2*pi, 3/2*pi)
left <- -2 * pi # left border
right <- 2 * pi # right border
d <- 0.001
interval <- data.frame(x1 = c(left, asymp + d),
x2 = c(asymp - d, right))
interval # divide the entire x-axis into 5 sections
# x1 x2
# 1 -6.283185 -4.713389
# 2 -4.711389 -1.571796
# 3 -1.569796 1.569796
# 4 1.571796 4.711389
# 5 4.713389 6.283185
library(tidyverse)
pmap(interval, function(x1, x2) {
stat_function(fun = f, xlim = c(x1, x2), n = 1000)
}) %>% reduce(.f = `+`,
.init = ggplot(data.frame(x = c(left, right)), aes(x)) +
coord_cartesian(ylim = c(-50, 50)))

Related

draw vertical lines in ggplot with faceting

I have line plots y vs x. y is sigmoid and varies from 0 to 1.
determine the value of x where y = 0.5 or very close by interpolation.
draw vertical line at x where y = 0.5
library(tidyverse)
# continuous variables
x <- seq(-5, 5, 0.1)
# compute y1
error_term <- runif(1, min = -2, max = 2)
y1 <- 1/(1 + exp(-x + error_term))
# compute y2
error_term <- runif(1, min = -2, max = 2)
y2 <- 1/(1 + exp(-x + error_term))
# merge y
y <- c(y1, y2)
x <- c(x, x)
# categorical variable
a <- c(rep(0, 101), rep(1, 101))
tbl <- tibble(x, a, y)
# TASK
# 1. determine values of x at which y = 0.5 for all categories and store them in variable x0
# 2. Use x0 to draw vertical lines in plots at x where y is 0.5
# ggplot
ggplot(data = tbl,
aes(x = x,
y = y)) +
geom_line() +
theme_bw() +
facet_grid(a ~ .)
This really isn't something built in to ggplot so you'll need to summarize the data yourself prior to plotting. You can write a helper function and then create the data you need for the lines
find_intersect <- function(x,y, target=0.5) {
optimize(function(z) (approxfun(x,y)(z)-target)^2, x)$minimum
}
line_data <- tbl %>%
group_by(a) %>%
summarize(xint=find_intersect(x,y))
Then plot with
ggplot(data = tbl,
aes(x = x,
y = y)) +
geom_line() +
theme_bw() +
geom_vline(aes(xintercept=xint), data=line_data) +
facet_grid(a ~ .)

Use coefficients to draw curve in ggplot

I have the coefficients for a curve and would like to draw the curve in ggplot2. My formula is just a polynomial:
y = a * x^2 + b * x + c
I have these coefficients:
a <- 0.000000308
b <- -0.0168
c <- 437
I don't know if these points will fall near the line, but say we are plotting this df:
df <- data.frame(group = c("a", "b", "c"),
x_variable = c(20000, 32000, 48000),
y_variable = c( 175, 200, 250))
Here's what I tried:
ggplot(df, aes(x = x_variable, y = y_variable)) +
geom_point() +
# this next line doesn't work, is it close?
# geom_smooth(method = 'lm', formula = y ~ 0.000000308 * x^2 + -0.0168 * x + 437)
One option is to use stat_function which applies a function along a grid of x values that fits the plotting area:
ggplot(df, aes(x = x_variable, y = y_variable)) +
geom_point() +
stat_function(fun = function(x){0.000000308 * x^2 + -0.0168 * x + 437})
Not sure if I'm totally misunderstanding your question, but I would just create some sample points:
library(data.table)
DT = data.table(x=1:100000)
DT[,y := a * x^2 + b * x + c]
ggplot(DT,aes(x=x,y=y))+geom_smooth()
Generate a function grid first and then plot the function along with your points:
library(ggplot2)
a <- 0.000000308
b <- -0.0168
c <- 437
grid <- data.frame( x = seq(15000, 60000, 1))
grid$y <- a*grid$x^2 + b*grid$x +c
points <- data.frame(x_variable = c(20000, 32000, 48000),
y_variable = c( 175, 200, 250))
ggplot() +
geom_line(data = grid, aes(x,y), color = "red") +
geom_point(data = points, aes(x_variable,y_variable))

saving ggplot in a list gives me the same graph

I am trying to plot 12 different plots on a 3 by 4 grid. But,it only plots the last one 12 times. Can any one help me? I am so fed up with it. Thanks
library(ggplot2)
library(gridExtra)
pmax=0.85
K_min = 0.0017
T = seq(100,1200,by=100) ## ISIs
lambda =1/T
p=list()
for(i in (1:length(lambda))){
p[[i]]<-ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
stat_function(fun = function (x) (lambda[i]*(1-(1-pmax))/K_min)*(1-x)^((lambda[i]/K_min)-1)*
(1-(1-pmax)*x)^-((lambda[i]/K_min)+1),colour = "dodgerblue3")+
scale_x_continuous(name = "Probability") +
scale_y_continuous(name = "Frequency") + theme_bw()
main <- grid.arrange(grobs=p,ncol=4)
}
This code produces the correct picture but I need to use ggplot since my other figures are in ggplot.
par( mfrow = c( 3, 4 ) )
for (i in (1:length(lambda))){
f <- function (x) ((lambda[i]*(1-(1-pmax))/K_min)*(1-x)^((lambda[i]/K_min)-1)*
(1-(1-pmax)*x)^-((lambda[i]/K_min)+1) )
curve(f,from=0, to=1, col = "violet",lwd=2,sub = paste0("ISI = ",round(1/lambda[i],3), ""),ylab="PDF",xlab="R")
}
Correct plot using curve:
ggplot objects created in a loop are evaluated at the end of the loop. Since all the ggplot objects in this case use data calculated with lambda[i], they get the same result based on the last i value (12). Here are two possible workarounds:
Workaround 1. Convert each ggplot object into a grob within the loop, & save that to the list:
for(i in (1:length(lambda))){
# code for generating each plot is unchanged
g <- ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
stat_function(fun = function (x) (lambda[i]*(1-(1-pmax))/K_min)*(1-x)^((lambda[i]/K_min)-1)*
(1-(1-pmax)*x)^-((lambda[i]/K_min)+1),colour = "dodgerblue3")+
scale_x_continuous(name = "Probability") +
scale_y_continuous(name = "Frequency") + theme_bw()
p[[i]] <- ggplotGrob(g)
}
main <- grid.arrange(grobs=p, ncol=4)
Workaround 2. Put all the data in a data frame, & create a single ggplot with a facet for each ISI:
library(dplyr)
pmax = 0.85
K_min = 0.0017
ISI = seq(100, 1200, by = 100) # I changed this; using `T` as a name clashes with T from TRUE/FALSE
lambda = 1/ISI
df <- data.frame(
x = rep(seq(0, 1, length.out = 101), length(ISI)),
ISI = rep(ISI, each = 101),
l = rep(lambda, each = 101)
) %>%
mutate(y = (l * pmax / K_min) * (1-x) ^ ((l / K_min) - 1) *
(1 - (1 - pmax) * x)^-((l / K_min) + 1))
ggplot(data,
aes(x = x, y = y, group = 1)) +
geom_line(colour = "dodgerblue3") +
facet_wrap(~ISI, nrow = 3, scales = "free_y") +
labs(x = "Probability", y = "Frequency") +
theme_bw()

Does ggplot use dynamic function for statistic functions?

So I'm running an optimization problem and am trying to add the function at each point in time to a plot. I'm able to plot the function but I have the variables stored and it seems like r doesn't evaluate the function until it renders it. It's hard to explain, but I have a simple example that shows it.
data = data.frame(x = runif(20, -10, 10), y = runif(20, -10,10))
p <- ggplot(data, aes(x = x, y =y))
slope = 0.5
yoff = 1
p <- p + stat_function(fun = function(x) slope*x+yoff)
slope = 1
yoff = -1
p <- p + stat_function(fun = function(x) slope*x+yoff)
p
And what I want is two lines on the graph with the slope and y-intercept that I had when I added the function to the graph.
If you have a lot of them, make a list of functions:
make_fun <- function(slope,yoff) {slope; yoff; function(x) x*slope + yoff}
> l <- mapply(FUN = make_fun,slope = 1:2,yoff = 3:4)
> l[[1]](1)
[1] 4
> l[[2]](1)
[1] 6
A function is evaluated when used, so there it is at render time.
You can rename your parameters to have different function:
p <- ggplot(data, aes(x = x, y =y))
slope1 = 0.5
yoff1 = 1
p <- p + stat_function(fun = function(x) slope1*x+yoff1)
slope2 = 1
yoff2 = -1
p <- p + stat_function(fun = function(x) slope2*x+yoff2)
Many parameters in ggplot aren't evaluated until the plot is actually rendered. Here we can make the slope and yoff values arguments to the functions and then pass in values via the args= parameter which does get evaluated earlier.
library(ggplot2)
data = data.frame(x = runif(20, -10, 10), y = runif(20, -10,10))
p <- ggplot(data, aes(x = x, y =y))
slope = 0.5
yoff = 1
p <- p + stat_function(fun = function(x, slope, yoff) slope*x+yoff, args=list(slope=slope, yoff=yoff))
slope = 1
yoff = -1
p <- p + stat_function(fun = function(x, slope, yoff) slope*x+yoff, args=list(slope=slope, yoff=yoff))
p

How to visualise the difference between probability distribution functions? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I try to visualise the difference between two histograms of distribution functions such as the difference in following two curves :
When the difference is big, you could just plot two curves on top of each other and fill the difference as denoted above, though when the difference becomes very small, this is cumbersome. Another way to plot this, is plotting the difference itself as follows :
However, this seems very hard to read for everyone seeing such a graph for the first time, so i was wondering: is there any other way you can visualise the difference between two distribution functions ?
I thought that maybe it might be an option to simply combine your two propositions, while scaling up the differences to make them visible.
What follows is an attempt to do this with ggplot2. Actually it was quite a bit more involved to do this than I initially thought, and I'm definitely not a hundred percent satisfied with the result; but maybe it helps nevertheless. Comments and improvements very welcome.
library(ggplot2)
library(dplyr)
## function that replicates default ggplot2 colors
## taken from [1]
gg_color_hue <- function(n) {
hues = seq(15, 375, length=n+1)
hcl(h=hues, l=65, c=100)[1:n]
}
## Set up sample data
set.seed(1)
n <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)
df <- bind_rows(data.frame(sample=1, x=x1), data.frame(sample=2, x=x2)) %>%
mutate(sample = as.factor(sample))
## Calculate density estimates
g1 <- ggplot(df, aes(x=x, group=sample, colour=sample)) +
geom_density(data = df) + xlim(0, 10)
gg1 <- ggplot_build(g1)
## Use these estimates (available at the same x coordinates!) for
## calculating the differences.
## Inspired by [2]
x <- gg1$data[[1]]$x[gg1$data[[1]]$group == 1]
y1 <- gg1$data[[1]]$y[gg1$data[[1]]$group == 1]
y2 <- gg1$data[[1]]$y[gg1$data[[1]]$group == 2]
df2 <- data.frame(x = x, ymin = pmin(y1, y2), ymax = pmax(y1, y2),
side=(y1<y2), ydiff = y2-y1)
g2 <- ggplot(df2) +
geom_ribbon(aes(x = x, ymin = ymin, ymax = ymax, fill = side, alpha = 0.5)) +
geom_line(aes(x = x, y = 5 * abs(ydiff), colour = side)) +
geom_area(aes(x = x, y = 5 * abs(ydiff), fill = side, alpha = 0.4))
g3 <- g2 +
geom_density(data = df, size = 1, aes(x = x, group = sample, colour = sample)) +
xlim(0, 10) +
guides(alpha = FALSE, colour = FALSE) +
ylab("Curves: density\n Shaded area: 5 * difference of densities") +
scale_fill_manual(name = "samples", labels = 1:2, values = gg_color_hue(2)) +
scale_colour_manual(limits = list(1, 2, FALSE, TRUE), values = rep(gg_color_hue(2), 2))
print(g3)
Sources: SO answer 1, SO answer 2
As suggested by #Gregor in the comments, here's a version that does two separate plots below eachother but sharing the same x axis scaling. At least the legends should obviously be tweaked.
library(ggplot2)
library(dplyr)
library(grid)
## function that replicates default ggplot2 colors
## taken from [1]
gg_color_hue <- function(n) {
hues = seq(15, 375, length=n+1)
hcl(h=hues, l=65, c=100)[1:n]
}
## Set up sample data
set.seed(1)
n <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)
df <- bind_rows(data.frame(sample=1, x=x1), data.frame(sample=2, x=x2)) %>%
mutate(sample = as.factor(sample))
## Calculate density estimates
g1 <- ggplot(df, aes(x=x, group=sample, colour=sample)) +
geom_density(data = df) + xlim(0, 10)
gg1 <- ggplot_build(g1)
## Use these estimates (available at the same x coordinates!) for
## calculating the differences.
## Inspired by [2]
x <- gg1$data[[1]]$x[gg1$data[[1]]$group == 1]
y1 <- gg1$data[[1]]$y[gg1$data[[1]]$group == 1]
y2 <- gg1$data[[1]]$y[gg1$data[[1]]$group == 2]
df2 <- data.frame(x = x, ymin = pmin(y1, y2), ymax = pmax(y1, y2),
side=(y1<y2), ydiff = y2-y1)
g2 <- ggplot(df2) +
geom_ribbon(aes(x = x, ymin = ymin, ymax = ymax, fill = side, alpha = 0.5)) +
geom_density(data = df, size = 1, aes(x = x, group = sample, colour = sample)) +
xlim(0, 10) +
guides(alpha = FALSE, fill = FALSE)
g3 <- ggplot(df2) +
geom_line(aes(x = x, y = abs(ydiff), colour = side)) +
geom_area(aes(x = x, y = abs(ydiff), fill = side, alpha = 0.4)) +
guides(alpha = FALSE, fill = FALSE)
## See [3]
grid.draw(rbind(ggplotGrob(g2), ggplotGrob(g3), size="last"))
... or with abs(ydiff) replaced by ydiff in the construction of the second plot:
Source: SO answer 3

Resources