deSolve: differential equations with two consecutive dynamics

I am simulating a ring tube with flowing water and a temperature gradient using deSolve::ode(). The ring is modelled as a vector where each element has a temperature value and position.
I am modelling the heat diffusion formula:
1) dT/dt = alpha * d2T/dx2
But I'm struggling with also moving the water along the ring. In theory, it's just a matter of substituting the temperature at element i in the tube vector with the temperature at the element s places earlier. Since s may not be an integer, it can be separated into an integer part n and a fractional part p, so that s = n + p. Consequently, the change in temperature due to the water moving becomes:
2) dT_i = T_(i-n) + p * (T_(i-(n+1)) - T_(i-n)) - T_i
The problem is that s equals the water velocity v multiplied by the time step dt, which the ODE solver evaluates at each iteration.
My idea is to treat the two phenomena as additive, that is, first computing (1), then (2), and finally adding them together. I'm worried, though, about the effect of time. The ODE solver with implicit methods chooses the time step automatically and scales the unitary change delta down linearly.
My question is whether just returning (1) + (2) in the derivative function is correct, or whether I should break the two processes apart and compute the derivatives separately. In the second case, what would be the suggested approach?
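For reference, (2) can be transcribed directly into a small R function (a sketch; roll is the circular-shift helper that also appears in the code below):
roll <- function(x, n) x[((1:length(x)) - (n + 1)) %% length(x) + 1]
# dT of equation (2): shift the ring by s = n + p cells, interpolating
# linearly between the two neighbouring cells for the fractional part p
flow_shift <- function(temp, s) {
  n <- floor(s)
  p <- s - n
  shifted  <- roll(temp, n)
  shifted1 <- roll(temp, n + 1)
  shifted + p * (shifted1 - shifted) - temp
}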
EDIT:
Following the suggestion by @tpetzoldt, I tried to implement the water flow using ReacTran::advection.1D(). My model has multiple sources of temperature variation: the spontaneous symmetric heat diffusion; the water flow; a heat source that is turned on when the temperature at a sensor (placed before the heat source) drops below a lower threshold and turned off when it rises above an upper threshold; and a constant heat dispersion determined by a cyclical external temperature.
Below the "Moving water" section there is still my previous version of the code, now replaced by ReacTran::advection.1D().
The plot_type argument allows visualizing either a time sequence of the temperature in the water tube ("pipe"), or the temperature sequence at the sensors (before and after the heater).
library(deSolve)
library(dplyr)
library(ggplot2)
library(tidyr)
library(ReacTran)
test <- function(simTime = 5000, vel = 1, L = 500, thresh = c(16, 25), heatT = 25,
                 heatDisp = .0025, baseTemp = 15, alpha = .025,
                 adv_method = 'up', plot_type = c('pipe', 'sensors')) {
  plot_type <- match.arg(plot_type)
  sensorP <- round(L/2)
  vec <- c(rep(baseTemp, L), 0)  # L temperatures plus the heater state
  eventfun <- function(t, y, pars) {
    heat <- y[L + 1] > 0
    if (y[sensorP] < thresh[1] & !heat) { # heater off and T below the lower threshold -> turn on
      y[L + 1] <- heatT
    }
    if (y[sensorP] > thresh[2] & heat) {  # heater on and T above the upper threshold -> turn off
      y[L + 1] <- 0
    }
    return(y)
  }
  rootfun <- function(t, y, pars) {
    heat <- y[L + 1] > 0
    trigger_root <- 1
    if (y[sensorP] < thresh[1] & !heat & t > 1) {
      trigger_root <- 0
    }
    if (y[sensorP] > thresh[2] & heat & t > 1) {
      trigger_root <- 0
    }
    return(trigger_root)
  }
  # circular shift of x by n positions
  roll <- function(x, n) {
    x[((1:length(x)) - (n + 1)) %% length(x) + 1]
  }
  fun <- function(t, y, pars) {
    v <- y[1:L]
    # Heat diffusion: dT/dt = alpha * d2T/dx2 (circular stencil)
    d2Td2X <- c(v[2:L], v[1]) + c(v[L], v[1:(L - 1)]) - 2 * v
    dT_diff <- pars * d2Td2X
    # Moving water; the previous hand-rolled version:
    # nS <- floor(vel)
    # pS <- vel - nS
    # v_shifted  <- roll(v, nS)
    # v_shifted1 <- roll(v, nS + 1)
    # dT_flow <- v_shifted + pS * (v_shifted1 - v_shifted) - v
    dT_flow <- advection.1D(v, v = vel, dx = 1, C.up = v[L], C.down = v[1],
                            adv.method = adv_method)$dC
    dT <- dT_flow + dT_diff
    # heating of the ring after the sensor
    dT[sensorP + 1] <- dT[sensorP + 1] + y[L + 1]
    # heat dispersion towards a cyclical external temperature
    dT <- dT - heatDisp * (v - baseTemp + 2.5 * sin(t/(60*24) * pi * 2))
    return(list(c(dT, 0)))
  }
  out <- ode.1D(y = vec, times = 1:simTime, func = fun, parms = alpha, nspec = 1,
                events = list(func = eventfun, root = TRUE),
                rootfunc = rootfun)
  if (plot_type == 'sensors') {
    ## Trend of the temperature at the sensor positions
    out %>%
      {.[, c(1, sensorP + 1, sensorP + 3, L + 2)]} %>%
      as.data.frame() %>%
      setNames(c('time', 'pre', 'post', 'heat')) %>%
      mutate(Amb = baseTemp + 2.5 * sin(time/(60*24) * pi * 2)) %>%
      pivot_longer(-time, values_to = "val", names_to = "trend") %>%
      ggplot(aes(time, val)) +
      geom_hline(yintercept = thresh) +
      geom_line(aes(color = trend)) +
      theme_minimal() +
      theme(panel.spacing = unit(0, "lines")) +
      labs(x = 'time', y = 'T°', color = 'sensor')
  } else {
    ## Trend of the temperature in the whole pipe
    out %>%
      as.data.frame() %>%
      pivot_longer(-time, values_to = "val", names_to = "x") %>%
      filter(time %in% round(seq.int(1, simTime, length.out = 40))) %>%
      ggplot(aes(as.numeric(x), val)) +
      geom_hline(yintercept = thresh) +
      geom_line(alpha = .5, show.legend = FALSE) +
      geom_point(aes(color = val)) +
      scale_color_gradient(low = "#56B1F7", high = "red") +
      facet_wrap(~ time) +
      theme_minimal() +
      theme(panel.spacing = unit(0, "lines")) +
      labs(x = 'x', y = 'T°', color = 'T°')
  }
}
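For reference, calls like the following produce the two plot types (run times grow quickly with vel, as noted below):
test(simTime = 5000, vel = 1, plot_type = 'sensors')
test(simTime = 5000, vel = 2, L = 500, plot_type = 'pipe')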
It's interesting that with a higher number of segments (L = 500) and a high speed (vel = 2) it's possible to observe a spiking sequence at the post-heating sensor. Also, the processing time increases drastically, but more as an effect of the increased velocity than of the increased pipe resolution.
My biggest doubt now is whether ReacTran::advection.1D() makes sense in my context, since I'm modelling water temperature, while this function seems more related to the concentration of a solute in flowing water.

The problem looks like a PDE example with a mobile and a fixed phase. A good introduction to the "method of lines" (MOL) approach with R/deSolve can be found in the paper about ReacTran by Soetaert and Meysman (2012), doi.org/10.1016/j.envsoft.2011.08.011.
An example PDE can be found on slide 55 of some workshop slides; more is in the teaching package RTM.
R/deSolve/ReacTran tries to make ODEs/PDEs easy, but pitfalls remain. If numerical dispersion or oscillations occur, this can be caused by a violation of the Courant-Friedrichs-Lewy (CFL) condition.
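As a rough check of that condition, the Courant number can be computed from the model settings; a sketch assuming dx = 1 as in the code above and taking the output step of 1 as a proxy for the solver step (the internal adaptive steps may be smaller):
# Courant number C = v * dt / dx; explicit advection schemes
# generally need C <= 1 for a stable, oscillation-free solution
courant <- function(v, dt = 1, dx = 1) v * dt / dx
courant(v = 1) # 1 -> borderline
courant(v = 2) # 2 -> may explain the spiking observed at vel = 2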


How to randomly divide interval into non overlapping, spaced bins of equal length

I have an interval, for example from 1 to 671. I would like to divide it into 5 random non-overlapping bins of length 50, spaced by a minimum of 51.
interval <- 1:671 # (example; it does not need to be 671)
Result (this is an example, as the bins should be random, but within the interval, of equal length, and spaced as defined):
bin1 <- 3:52
bin2 <- 103:152
bin3 <- 209:258
bin4 <- 425:474
bin5 <- 610:659
I would prefer the output to be a data.frame(bin, startOfbin, endOfbin), but other types like a list would also be fine.
I am currently writing a function in R that would use this sampling for a large number of intervals, and I cannot come up with a sensible solution. Thank you in advance.
If I understand your problem correctly, you want 5 parts of your interval, each of length 50, with a minimal distance of 51 between them.
So your randomness is in how much bigger each distance is than 51.
This means you first calculate how much space there really is to distribute:
intervalLength <- 671
nBins <- 5
binWidth <- 50
binMinDistance <- 51
spaceToDistribute <- intervalLength - (nBins * binWidth + (nBins - 1) * binMinDistance)
Then calculate a random splitting of this value:
distances <- diff(floor(c(0, sort(runif(nBins))) * spaceToDistribute))
and construct your desired data.frame
startOfBin <- cumsum(distances) + (0:(nBins-1)) * (binWidth + binMinDistance) + 1
result <- data.frame(bin = 1:nBins, startOfBin = startOfBin, endOfBin = startOfBin + binWidth - 1)
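A quick sanity check on the result (assumes the objects defined above):
result$endOfBin - result$startOfBin + 1 # bin widths: all equal to binWidth
diff(result$startOfBin) - binWidth      # gaps between bins: all >= binMinDistance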
I don't know if this has the desired kind of randomness:
interval <- 1:671
set.seed(42)
repeat { # rejection sampling
  int <- list(interval)
  s <- rep(NA_integer_, 5)
  for (i in 1:5) {
    # sample an interval from the list
    sel <- sample(length(int), 1)
    isel <- int[[sel]]
    # sample a start value
    s[[i]] <- sample(head(isel, -49), 1)
    # remove the sampled values from the interval
    sp <- split(isel, findInterval(isel, c(0, s[[i]], s[[i]] + 50, Inf)))
    if (s[[i]] > isel[1] && s[[i]] < length(isel) - 49)
      sp <- sp[-2]
    else if (s[[i]] == isel[1])
      sp <- sp[-1]
    else if (s[[i]] == length(isel) - 49)
      sp <- head(sp, -1)
    sp <- sp[lengths(sp) >= 50]
    int <- c(sp, int[-sel])
    # break out of the for loop
    # if not enough intervals of sufficient length are left
    if (length(int) < 1) break
  }
  if (!anyNA(s)) break
}
s
#[1] 321 74 245 170 441
library(ggplot2)
ggplot(data.frame(s = s, e = s + 49), aes(x = s, xend = e, y = 0, yend = 0)) +
  geom_segment(size = 3) +
  theme_minimal() +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        panel.grid.major.y = element_blank()) +
  xlab("") + ylab("")
Something like this could work:
set.seed(111)
n_bins <- 5
bl <- 50
spacing <- 51
start <- 1
end <- 671
end_int <- end - n_bins*bl - (n_bins-1)*spacing
first_bin_start <- sample(start:end_int, 1)
first_bin_end <- first_bin_start + bl
avail_spacing <- end - first_bin_end - (n_bins-1)*bl - (n_bins-1)*spacing
sp <- c()
for (i in 1:(n_bins-1)) {
  gap <- sample(1:avail_spacing, 1)
  sp <- c(sp, gap)
  avail_spacing <- avail_spacing - gap
}
bin_start <- c(first_bin_start, first_bin_start + cumsum(spacing + bl + sp))
bin_end <- bin_start + bl
df <- data.frame(bin = 1:n_bins,
                 bin_start = bin_start,
                 bin_end = bin_end)
df

How to create covariance matrix in R?

I'm trying to build a covariance matrix from scratch (i.e., what the cov() function does). My task is to not use any package. Hence I created these functions:
meanf <- function(x) {
  sum(x) / length(x)
}
sampleCov <- function(x, y) {
  stopifnot(identical(length(x), length(y)))
  sum((x - meanf(x)) * (y - meanf(y))) / (length(x) - 1)
}
> sampleCov(winequality_red$quality, winequality_red$alcohol)
[1] 0.409789
Unfortunately, I'm stuck here. All the loops I have tried miss the point. Of course it's possible to just copy the sampleCov function and write it out for every possible combination, but that's not my point.
If I understand you correctly, then I believe you want to recreate a covariance output like the one returned by the cov function.
OP's given function:
meanf <- function(x) {
  sum(x) / length(x)
}
sampleCov <- function(x, y) {
  stopifnot(identical(length(x), length(y)))
  sum((x - meanf(x)) * (y - meanf(y))) / (length(x) - 1)
}
You can try this way; I have taken the mtcars data here:
Covariance function:
vars <- names(mtcars)
egrid <- expand.grid(vars, vars)
egrid <- data.frame(sapply(egrid, as.character), stringsAsFactors = FALSE)
egrid <- egrid[order(egrid$Var1, egrid$Var2), ]
mat <- vector("list", nrow(egrid))
for (i in 1:nrow(egrid)) {
  mat[[i]] <- sampleCov(mtcars[, egrid[i, "Var1"]], mtcars[, egrid[i, "Var2"]])
}
finaldat <- cbind(egrid, cov = do.call('rbind', mat))
finaldat_list <- split(finaldat, finaldat$Var1)
mat_form <- do.call('cbind', finaldat_list)
cov_values <- mat_form[,grepl("\\.cov",names(mat_form))]
col_values <- mat_form[,paste0(egrid$Var1[1],".Var2")]
final_matrix_cov <- cbind(col_values, cov_values)
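For comparison, a more compact sketch that fills the full matrix directly by looping over column pairs with the same sampleCov (the result should match cov(mtcars)):
vars <- names(mtcars)
covmat <- matrix(NA_real_, length(vars), length(vars),
                 dimnames = list(vars, vars))
for (i in vars) {
  for (j in vars) {
    covmat[i, j] <- sampleCov(mtcars[[i]], mtcars[[j]])
  }
}
all.equal(covmat, cov(mtcars)) # TRUE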
Sample Output:
> final_matrix_cov
col_values am.cov carb.cov cyl.cov disp.cov
9 mpg 1.80393145 -5.36310484 -9.1723790 -633.09721
20 cyl -0.46572581 1.52016129 3.1895161 199.66028
31 disp -36.56401210 79.06875000 199.6602823 15360.79983
42 hp -8.32056452 83.03629032 101.9314516 6721.15867
You need the matrix multiplication %*%.
sampleCov <- function(x, y) {
  stopifnot(identical(length(x), length(y)))
  sum((x - mean(x)) %*% (y - mean(y))) / (length(x) - 1)
}
> sampleCov(rnorm(10000),rnorm(10000))
[1] 0.01808466
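Note that for two plain vectors x %*% y returns a 1 x 1 matrix holding the cross-product, so the surrounding sum() mainly drops it to a scalar; a quick equivalence check against base R:
set.seed(1)
x <- rnorm(10)
y <- rnorm(10)
sampleCov(x, y) # the %*% version above
cov(x, y)       # identical up to floating point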
This is probably a little more than you need, but it should answer your question, and I think it is a nice illustration of the practical application of covariances, correlations, etc.
# load the required packages (data.table, ggplot2, and scales for percent)
library(data.table)
library(ggplot2)
library(scales)
# load the data
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))
# calculate the necessary values:
# I) expected returns for the two assets
er_x <- mean(df$x)
er_y <- mean(df$y)
# II) risk (standard deviation) as a risk measure
sd_x <- sd(df$x)
sd_y <- sd(df$y)
# III) covariance
cov_xy <- cov(df$x, df$y)
# create 1000 portfolio weights (omegas)
x_weights <- seq(from = 0, to = 1, length.out = 1000)
# create a data.table that contains the weights for the two assets
two_assets <- data.table(wx = x_weights,
                         wy = 1 - x_weights)
# calculate the expected returns and standard deviations for the 1000 possible portfolios
two_assets[, ':=' (er_p = wx * er_x + wy * er_y,
                   sd_p = sqrt(wx^2 * sd_x^2 +
                               wy^2 * sd_y^2 +
                               2 * wx * (1 - wx) * cov_xy))]
two_assets
# lastly plot the values
ggplot() +
  geom_point(data = two_assets, aes(x = sd_p, y = er_p, color = wx)) +
  geom_point(data = data.table(sd = c(sd_x, sd_y), mean = c(er_x, er_y)),
             aes(x = sd, y = mean), color = "red", size = 3, shape = 18) +
  # Miscellaneous Formatting
  theme_bw() + ggtitle("Possible Portfolios with Two Risky Assets") +
  xlab("Volatility") + ylab("Expected Returns") +
  scale_y_continuous(label = percent, limits = c(0, max(two_assets$er_p) * 1.2)) +
  scale_x_continuous(label = percent, limits = c(0, max(two_assets$sd_p) * 1.2)) +
  scale_color_continuous(name = expression(omega[x]), labels = percent)
See the link below for all details.
https://datashenanigan.wordpress.com/2016/05/24/a-gentle-introduction-to-finance-using-r-efficient-frontier-and-capm-part-1/

Fast method for calculating frequency with Rcpp [duplicate]

I have ~5 very large vectors (~108 MM entries each), so any plot/operation I do with them in R takes quite a long time.
I am trying to visualize their distributions (histograms) and was wondering what the best way would be to superimpose their histogram distributions in R without taking too long. I am thinking of first fitting a distribution to each histogram, and then plotting all the distribution line fits together in one plot.
Do you have some suggestions on how to do that?
Let us say my vectors are:
x1, x2, x3, x4, x5.
I am trying to use this code: Overlaying histograms with ggplot2 in R
Example of the code I am using for 3 vectors (R fails to do the plot):
n = length(x1)
dat <- data.frame(xx = c(x1, x2, x3),yy = rep(letters[1:3],each = n))
ggplot(dat, aes(x = xx)) +
  geom_histogram(data = subset(dat, yy == 'a'), fill = "red", alpha = 0.2) +
  geom_histogram(data = subset(dat, yy == 'b'), fill = "blue", alpha = 0.2) +
  geom_histogram(data = subset(dat, yy == 'c'), fill = "green", alpha = 0.2)
but it takes forever to produce the plot, and eventually it kicks me out of R. Any ideas on how to use ggplot2 efficiently for large vectors? It seems to me that I had to create a data frame of 5 x 108 MM entries and then plot it, which is highly inefficient in my case.
Thanks!
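As a low-memory sketch of the "fit and overlay" idea from the question (assumes x1, ..., x5 exist; density() still scans each full vector once, but only 512 points per vector are kept for plotting):
library(ggplot2)
dens <- lapply(list(x1 = x1, x2 = x2, x3 = x3, x4 = x4, x5 = x5),
               density, n = 512)
dat <- do.call(rbind, lapply(names(dens), function(nm)
  data.frame(x = dens[[nm]]$x, y = dens[[nm]]$y, vec = nm)))
ggplot(dat, aes(x, y, colour = vec)) + geom_line()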
Here's a little snippet of Rcpp that bins data very efficiently - on my computer it takes about a second to bin 100,000,000 observations:
library(Rcpp)
cppFunction('
std::vector<int> bin3(NumericVector x, double width, double origin = 0) {
  int bin, nmissing = 0;
  std::vector<int> out;
  NumericVector::iterator x_it = x.begin(), x_end;
  for(; x_it != x.end(); ++x_it) {
    double val = *x_it;
    if (ISNAN(val)) {
      ++nmissing;
    } else {
      bin = (val - origin) / width;
      if (bin < 0) continue;
      // Make sure there\'s enough space
      if (bin >= out.size()) {
        out.resize(bin + 1);
      }
      ++out[bin];
    }
  }
  // Put missing values in the last position
  out.push_back(nmissing);
  return out;
}
')
x8 <- runif(1e8)
system.time(bin3(x8, 1/100))
# user system elapsed
# 1.373 0.000 1.373
That said, hist is pretty fast here too:
system.time(hist(x8, breaks = 100, plot = F))
# user system elapsed
# 7.281 1.362 8.669
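If you prefer to stay in base R, the counts from hist(..., plot = FALSE) can feed the same kind of frequency-polygon plot; a minimal sketch:
h <- hist(x8, breaks = seq(0, 1, length.out = 101), plot = FALSE)
plot(h$mids, h$counts, type = "l", xlab = "value", ylab = "count")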
It's straightforward to use bin3 to make a histogram or frequency polygon:
# First we create some sample data, and bin each column
library(reshape2)
library(ggplot2)
df <- as.data.frame(replicate(5, runif(1e6)))
bins <- vapply(df, bin3, 1/100, FUN.VALUE = integer(100 + 1))
# Next we match up the bins with the breaks
binsdf <- data.frame(
breaks = c(seq(0, 1, length = 100), NA),
bins)
# Then melt and plot
binsm <- subset(melt(binsdf, id = "breaks"), !is.na(breaks))
qplot(breaks, value, data = binsm, geom = "line", colour = variable)
FYI, the reason I had bin3 on hand is that I'm working on how to make this speed the default in ggplot2 :)

Plotting family of functions with qplot without duplicating data

Given a family of functions f(x; q) (x is the argument and q is a parameter), I'd like to visualize this function family for x taken from the interval [0,1], for 9 values of q (from 0.1 to 0.9). So far my solution is:
f = function(p,q=0.9) {1-(1-(p*q)^3)^1024}
x = seq(0.0,0.99,by=0.01)
q = seq(0.1,0.9,by=0.1)
qplot(rep(x,9), f(rep(x,9), rep(q,each=100)), colour = factor(rep(q,each=100)),
      geom = "line", size = I(0.9), xlab = "x", ylab = expression("y=f(x)"))
I get a quick and easy visual with qplot.
My concern is that this method is rather memory-hungry, as I need to duplicate x for each parameter and duplicate each parameter value for the whole x range. What would be an alternative way to produce the same graph without these duplications?
At some point ggplot will need to have the data available to plot it, and the way that package works prohibits simply doing what you want. I suppose you could set up a blank plot if you know the x and y axis limits, and then loop over the 9 values of q, generating the data for that q and adding a geom_line layer to the existing plot object. However, you'll have to produce the colours for each layer yourself.
If this is representative of the size of problem you have, I wouldn't worry too much about the memory footprint. We're only talking about two vectors of length 900:
> object.size(rnorm(900))
7240 bytes
and the 100 values over the range of x appear sufficient to give a smooth plot.
Here is a for loop to add layers to ggplot:
require("ggplot2")
## something to replicate ggplot's colour palette; surely there is something
## to do this already in ggplot now...
ggHueColours <- function(n, h = c(0, 360) + 15, l = 65, c = 100,
                         direction = 1, h.start = 0) {
  turn <- function(x, h.start, direction) {
    (x + h.start) %% 360 * direction
  }
  if ((diff(h) %% 360) < 1) {
    h[2] <- h[2] - 360 / n
  }
  hcl(h = turn(seq(h[1], h[2], length = n), h.start = h.start,
               direction = direction), c = c, l = l)
}
f = function(p,q=0.9) {1-(1-(p*q)^3)^1024}
x = seq(0.0,0.99,by=0.01)
q = seq(0.1,0.9,by=0.1)
cols <- ggHueColours(n = length(q))
for (i in seq_along(q)) {
  df <- data.frame(y = f(x, q[i]), x = x)
  if (i == 1) {
    plt <- ggplot(df, aes(x = x, y = y)) + geom_line(colour = cols[i])
  } else {
    plt <- plt + geom_line(data = df, colour = cols[i])
  }
}
plt
which gives:
I'll leave the rest to you - I'm not familiar enough with ggplot to draw a legend manually.
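As an aside, with a recent ggplot2 (>= 3.3, where geom_function exists) the curves can be drawn without pre-building any data frame; a sketch, not part of the original answer:
library(ggplot2)
f <- function(p, q = 0.9) 1 - (1 - (p * q)^3)^1024
q <- seq(0.1, 0.9, by = 0.1)
ggplot() +
  xlim(0, 0.99) +
  lapply(q, function(qi)
    geom_function(fun = f, args = list(q = qi), aes(colour = factor(qi)))) +
  labs(x = "x", y = "y = f(x)", colour = "q")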

ggplot2 - Shade area above line

I have some data that is constrained below a 1:1 line. I would like to demonstrate this on a plot by lightly shading the area ABOVE the line, to draw the viewer's attention to the area beneath it.
I'm using qplot to generate the graphs. Quickly, I have;
qplot(x,y)+geom_abline(slope=1)
but for the life of me I can't figure out how to easily shade the area above the line without plotting a separate object. Is there an easy fix for this?
EDIT
Ok, Joran, here is an example data set:
df=data.frame(x=runif(6,-2,2),y=runif(6,-2,2),
var1=rep(c("A","B"),3),var2=rep(c("C","D"),3))
df_poly=data.frame(x=c(-Inf, Inf, -Inf),y=c(-Inf, Inf, Inf))
and here is the code that I'm using to plot it (I took your advice and have been looking up ggplot()):
ggplot(df, aes(x, y, color = var1)) +
  facet_wrap(~ var2) +
  geom_abline(slope = 1, intercept = 0, lwd = 0.5) +
  geom_point(size = 3) +
  scale_color_manual(values = c("red", "blue")) +
  geom_polygon(data = df_poly, aes(x, y), fill = "blue", alpha = 0.2)
The error kicked back is: "object 'var1' not found" Something tells me that I'm implementing the argument incorrectly...
Building on @Andrie's answer, here is a more (but not completely) general solution that handles shading above or below a given line in most cases.
I did not use the method @Andrie referenced, since I ran into issues with ggplot's tendency to automatically extend the plot extents when you add points near the edges. Instead, this builds the polygon points manually, using Inf and -Inf as needed. A few notes:
The points have to be in the 'correct' order in the data frame, since ggplot plots the polygon in the order that the points appear. So it's not enough to get the vertices of the polygon; they must be ordered (either clockwise or counterclockwise) as well.
This solution assumes that the line you are plotting does not itself cause ggplot to extend the plot range. You'll see in my example that I pick a line to draw by randomly choosing two points in the data and drawing the line through them. If you try to draw a line too far away from the rest of your points, ggplot will automatically alter the plot ranges, and it becomes hard to predict what they will be.
First, here's the function that builds the polygon data frame:
buildPoly <- function(xr, yr, slope = 1, intercept = 0, above = TRUE) {
  # Assumes ggplot default of expand = c(0.05, 0)
  xrTru <- xr + 0.05 * diff(xr) * c(-1, 1)
  yrTru <- yr + 0.05 * diff(yr) * c(-1, 1)
  # Find where the line crosses the plot edges
  yCross <- (yrTru - intercept) / slope
  xCross <- (slope * xrTru) + intercept
  # Build polygon by cases
  if (above & (slope >= 0)) {
    rs <- data.frame(x = -Inf, y = Inf)
    if (xCross[1] < yrTru[1]) {
      rs <- rbind(rs, c(-Inf, -Inf), c(yCross[1], -Inf))
    } else {
      rs <- rbind(rs, c(-Inf, xCross[1]))
    }
    if (xCross[2] < yrTru[2]) {
      rs <- rbind(rs, c(Inf, xCross[2]), c(Inf, Inf))
    } else {
      rs <- rbind(rs, c(yCross[2], Inf))
    }
  }
  if (!above & (slope >= 0)) {
    rs <- data.frame(x = Inf, y = -Inf)
    if (xCross[1] > yrTru[1]) {
      rs <- rbind(rs, c(-Inf, -Inf), c(-Inf, xCross[1]))
    } else {
      rs <- rbind(rs, c(yCross[1], -Inf))
    }
    if (xCross[2] > yrTru[2]) {
      rs <- rbind(rs, c(yCross[2], Inf), c(Inf, Inf))
    } else {
      rs <- rbind(rs, c(Inf, xCross[2]))
    }
  }
  if (above & (slope < 0)) {
    rs <- data.frame(x = Inf, y = Inf)
    if (xCross[1] < yrTru[2]) {
      rs <- rbind(rs, c(-Inf, Inf), c(-Inf, xCross[1]))
    } else {
      rs <- rbind(rs, c(yCross[2], Inf))
    }
    if (xCross[2] < yrTru[1]) {
      rs <- rbind(rs, c(yCross[1], -Inf), c(Inf, -Inf))
    } else {
      rs <- rbind(rs, c(Inf, xCross[2]))
    }
  }
  if (!above & (slope < 0)) {
    rs <- data.frame(x = -Inf, y = -Inf)
    if (xCross[1] > yrTru[2]) {
      rs <- rbind(rs, c(-Inf, Inf), c(yCross[2], Inf))
    } else {
      rs <- rbind(rs, c(-Inf, xCross[1]))
    }
    if (xCross[2] > yrTru[1]) {
      rs <- rbind(rs, c(Inf, xCross[2]), c(Inf, -Inf))
    } else {
      rs <- rbind(rs, c(yCross[1], -Inf))
    }
  }
  return(rs)
}
It expects the x and y ranges of your data (as from range()), the slope and intercept of the line you are going to plot, and whether you want to shade above or below the line. Here's the code I used to generate the following four examples:
#Generate some data
dat <- data.frame(x=runif(10),y=runif(10))
#Select two of the points to define the line
pts <- dat[sample(1:nrow(dat),size=2,replace=FALSE),]
#Slope and intercept of line through those points
sl <- diff(pts$y) / diff(pts$x)
int <- pts$y[1] - (sl*pts$x[1])
#Build the polygon
datPoly <- buildPoly(range(dat$x), range(dat$y),
                     slope = sl, intercept = int, above = FALSE)
#Make the plot
p <- ggplot(dat, aes(x = x, y = y)) +
  geom_point() +
  geom_abline(slope = sl, intercept = int) +
  geom_polygon(data = datPoly, aes(x = x, y = y), alpha = 0.2, fill = "blue")
print(p)
And here are some examples of the results. If you find any bugs, of course, let me know so that I can update this answer...
EDIT
Updated to illustrate solution using OP's example data:
set.seed(1)
dat <- data.frame(x = runif(6,-2,2), y = runif(6,-2,2),
                  var1 = rep(c("A","B"),3), var2 = rep(c("C","D"),3))
#Create polygon data frame
df_poly <- buildPoly(range(dat$x), range(dat$y))
ggplot(data = dat, aes(x, y)) +
  facet_wrap(~ var2) +
  geom_abline(slope = 1, intercept = 0, lwd = 0.5) +
  geom_point(aes(colour = var1), size = 3) +
  scale_color_manual(values = c("red","blue")) +
  geom_polygon(data = df_poly, aes(x, y), fill = "blue", alpha = 0.2)
and this produces the following output:
As far as I know, there is no way other than creating a polygon with an alpha-blended fill. For example:
df <- data.frame(x = 1, y = 1)
df_poly <- data.frame(
  x = c(-Inf, Inf, -Inf),
  y = c(-Inf, Inf, Inf)
)
ggplot(df, aes(x, y)) +
  geom_blank() +
  geom_abline(slope = 1, intercept = 0) +
  geom_polygon(data = df_poly, aes(x, y), fill = "blue", alpha = 0.2)
One easy way to do this is to use geom_ribbon with the ymax value set to Inf, and the ymin value calculated by stat_function:
library(ggplot2)
myfun <- function(x) x
myfun2 <- function(x) x^2
ggplot() +
  geom_function(fun = myfun) +
  geom_ribbon(stat = 'function', fun = myfun,
              mapping = aes(ymin = after_stat(y), ymax = Inf),
              fill = 'lightblue', alpha = 0.5)
ggplot() +
  geom_function(fun = myfun2) +
  geom_ribbon(stat = 'function', fun = myfun2,
              mapping = aes(ymin = after_stat(y), ymax = Inf),
              fill = 'lightblue', alpha = 0.5)
Created on 2022-05-26 by the reprex package (v2.0.1)
Based on a minimally modified version of @joran's answer:
library(ggplot2)
library(tidyr)
library(dplyr)
buildPoly <- function(slope, intercept, above, xr, yr) {
  # By Joran Elias, @joran https://stackoverflow.com/a/6809174/1870254
  # Find where the line crosses the plot edges
  yCross <- (yr - intercept) / slope
  xCross <- (slope * xr) + intercept
  # Build polygon by cases
  if (above & (slope >= 0)) {
    rs <- data.frame(x = -Inf, y = Inf)
    if (xCross[1] < yr[1]) {
      rs <- rbind(rs, c(-Inf, -Inf), c(yCross[1], -Inf))
    } else {
      rs <- rbind(rs, c(-Inf, xCross[1]))
    }
    if (xCross[2] < yr[2]) {
      rs <- rbind(rs, c(Inf, xCross[2]), c(Inf, Inf))
    } else {
      rs <- rbind(rs, c(yCross[2], Inf))
    }
  }
  if (!above & (slope >= 0)) {
    rs <- data.frame(x = Inf, y = -Inf)
    if (xCross[1] > yr[1]) {
      rs <- rbind(rs, c(-Inf, -Inf), c(-Inf, xCross[1]))
    } else {
      rs <- rbind(rs, c(yCross[1], -Inf))
    }
    if (xCross[2] > yr[2]) {
      rs <- rbind(rs, c(yCross[2], Inf), c(Inf, Inf))
    } else {
      rs <- rbind(rs, c(Inf, xCross[2]))
    }
  }
  if (above & (slope < 0)) {
    rs <- data.frame(x = Inf, y = Inf)
    if (xCross[1] < yr[2]) {
      rs <- rbind(rs, c(-Inf, Inf), c(-Inf, xCross[1]))
    } else {
      rs <- rbind(rs, c(yCross[2], Inf))
    }
    if (xCross[2] < yr[1]) {
      rs <- rbind(rs, c(yCross[1], -Inf), c(Inf, -Inf))
    } else {
      rs <- rbind(rs, c(Inf, xCross[2]))
    }
  }
  if (!above & (slope < 0)) {
    rs <- data.frame(x = -Inf, y = -Inf)
    if (xCross[1] > yr[2]) {
      rs <- rbind(rs, c(-Inf, Inf), c(yCross[2], Inf))
    } else {
      rs <- rbind(rs, c(-Inf, xCross[1]))
    }
    if (xCross[2] > yr[1]) {
      rs <- rbind(rs, c(Inf, xCross[2]), c(Inf, -Inf))
    } else {
      rs <- rbind(rs, c(yCross[1], -Inf))
    }
  }
  return(rs)
}
You can also extend ggplot like this:
GeomSection <- ggproto("GeomSection", GeomPolygon,
  default_aes = list(fill = "blue", size = 0, alpha = 0.2, colour = NA, linetype = "dashed"),
  required_aes = c("slope", "intercept", "above"),
  draw_panel = function(data, panel_params, coord) {
    ranges <- coord$backtransform_range(panel_params)
    data$group <- seq_len(nrow(data))
    data <- data %>%
      group_by_all %>%
      do(buildPoly(.$slope, .$intercept, .$above, ranges$x, ranges$y)) %>%
      unnest
    GeomPolygon$draw_panel(data, panel_params, coord)
  }
)
geom_section <- function(mapping = NULL, data = NULL, ..., slope, intercept, above,
                         na.rm = FALSE, show.legend = NA) {
  if (missing(mapping) && missing(slope) && missing(intercept) && missing(above)) {
    slope <- 1
    intercept <- 0
    above <- TRUE
  }
  if (!missing(slope) || !missing(intercept) || !missing(above)) {
    if (missing(slope)) slope <- 1
    if (missing(intercept)) intercept <- 0
    if (missing(above)) above <- TRUE
    data <- data.frame(intercept = intercept, slope = slope, above = above)
    mapping <- aes(intercept = intercept, slope = slope, above = above)
    show.legend <- FALSE
  }
  layer(data = data, mapping = mapping, stat = StatIdentity,
        geom = GeomSection, position = PositionIdentity, show.legend = show.legend,
        inherit.aes = FALSE, params = list(na.rm = na.rm, ...))
}
To be able to use it as easily as a geom_abline:
set.seed(1)
dat <- data.frame(x=runif(6,-2,2),y=runif(6,-2,2),
var1=rep(c("A","B"),3),var2=rep(c("C","D"),3))
ggplot(data = dat, aes(x, y)) +
  facet_wrap(~ var2) +
  geom_abline(slope = 1, intercept = 0, lwd = 0.5) +
  geom_point(aes(colour = var1), size = 3) +
  scale_color_manual(values = c("red", "blue")) +
  geom_section(slope = 1, intercept = 0, above = TRUE)
This variant has the additional advantage that it also works with multiple slopes and non-default limit expansions.
ggplot(data = dat, aes(x, y)) +
  facet_wrap(~ var2) +
  geom_abline(slope = 1, intercept = 0, lwd = 0.5) +
  geom_point(aes(colour = var1), size = 3) +
  scale_color_manual(values = c("red", "blue")) +
  geom_section(data = data.frame(slope = c(-1, 1), above = c(FALSE, TRUE),
                                 selected = c("selected", "selected 2")),
               aes(slope = slope, above = above, intercept = 0, fill = selected),
               size = 1) +
  expand_limits(x = 3)
