Plotting inequalities in R

I have a number of values that come from my data (x1, x2, x3, and so on), and I'm trying to plot shaded regions that depend on functions of these values. These regions are defined by inequalities of the form:
f(x1, x2) <= A <= f(x3, x4)
f(x5, x6) <= A + B <= f(x7, x8)
and I would like to have A on the x-axis, B on the y-axis.
I've tried putting the values in a data frame and playing around with ggplot, but my R skills don't go much beyond simple line plots and bar charts. I'm a bit lost; in particular, I can't figure out the syntax for plotting inequalities as regions.
Thanks
Edit for example:
        x1    x2    x3    x4    x5    x6
Plot1   0.2   0.3   0.24  0.14  0.17  0.31
Plot2   0.14  0.35  0.30  0.11  0.21  0.39
with the hope of plotting two separate graphs (one for Plot1, one for Plot2) with the regions defined by:
max(x2 + x3, x4 + x5) <= A <= min(1 - x1, 1 - x6)
max(x3, x5) <= A + B <= min(x2 + x3, x5 + x6)

I'm not aware of any built-in R function to do this. You can, however, build a data.frame containing a grid and compute, for each grid point, whether the inequalities are satisfied. Then you can plot with geom_tile. A working example:
## Equation 1: f1(A,B) <= A < f2(A,B)
## Equation 2: f3(A,B) < A + B < f4(A,B)
library(ggplot2)
grid <- expand.grid(A = seq(-2, 2, length.out = 100), B = seq(-1, 1, length.out = 100))
f1 <- function(A, B) B
f2 <- function(A, B) 2*(A^2 + B^2)
f3 <- function(A, B) A*B
f4 <- function(A, B) 2*A+B^3
grid$inside_eq1 <- (f1(grid$A, grid$B) <= grid$A) & (grid$A < f2(grid$A, grid$B))
grid$inside_eq2 <- (f3(grid$A, grid$B) < grid$A + grid$B) & (grid$A + grid$B < f4(grid$A, grid$B))
grid$inside <- grid$inside_eq1 & grid$inside_eq2
ggplot(grid) +
  geom_tile(aes(x = A, y = B, color = inside, fill = inside))
Edit: You asked in the comment how you could draw a border around the region. A fully general solution is a bit complex, but in your case it's not so hard: the bounds in your inequalities do not depend on A and B, so they are constant and thus convex. Since the intersection of convex sets is again convex, the region where the inequalities hold is also convex. This leads to a solution that uses the convex hull of the region:
## if you want a border for the region (works only if all equations are convex!)
f1 <- function() 0.1
f2 <- function() 1.8
f3 <- function() -1.2
f4 <- function() 0.3
grid$inside_eq1 <- (f1() <= grid$A) & (grid$A < f2())
grid$inside_eq2 <- (f3() < grid$A + grid$B) & (grid$A + grid$B < f4())
grid$inside <- grid$inside_eq1 & grid$inside_eq2
hull <- chull(grid$A[grid$inside], grid$B[grid$inside])
ggplot(grid, aes(x = A, y = B)) +
  geom_tile(aes(color = inside, fill = inside)) +
  geom_path(data = grid[grid$inside, ][c(hull, hull[1]), ], size = 2)
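For the data in the question the bounds are just constants computed from x1, ..., x6, so the same approach applies directly. A minimal sketch for the Plot1 row (values copied from the question's table; the grid range is my choice):
## Plot1 row from the question
x1 <- 0.2; x2 <- 0.3; x3 <- 0.24; x4 <- 0.14; x5 <- 0.17; x6 <- 0.31
grid <- expand.grid(A = seq(0, 1, length.out = 200),
                    B = seq(-1, 1, length.out = 200))
lo_A  <- max(x2 + x3, x4 + x5)   # lower bound on A
hi_A  <- min(1 - x1, 1 - x6)     # upper bound on A
lo_AB <- max(x3, x5)             # lower bound on A + B
hi_AB <- min(x2 + x3, x5 + x6)   # upper bound on A + B
grid$inside <- (lo_A <= grid$A) & (grid$A <= hi_A) &
  (lo_AB <= grid$A + grid$B) & (grid$A + grid$B <= hi_AB)
ggplot(grid) +
  geom_tile(aes(x = A, y = B, fill = inside))
Repeating this with the Plot2 values gives the second graph.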

Related

R - How to select or separate the bottom half of cyclic, irregular data?

Below are data (50 points) obtained with cyclic voltammetry. I need to analyze just a portion of it: the lower portion, between the leftmost and the rightmost point (for example, the data from the green point to the blue point in the plot below).
x <- c(-0.4982, -0.3770, -0.2545, -0.1323, -0.0096, 0.1127, 0.2353, 0.3577, 0.4802,
0.6024, 0.7251, 0.8470, 0.9696, 0.9109, 0.7887, 0.6662, 0.5441, 0.4213, 0.2990,
0.1763, 0.0541, -0.0685, -0.1906, -0.3133, -0.4356, -0.4395, -0.3172, -0.1946,
-0.0723, 0.0501, 0.1724, 0.2950, 0.4175, 0.5400, 0.6623, 0.7848, 0.9070, 0.9735,
0.8510, 0.7290, 0.6063, 0.4840, 0.3614, 0.2391, 0.1165, -0.0090, -0.1316, -0.2539,
-0.3765)
y <- c(-3.24903226, -1.26193548, -0.51612903, -0.09741935, 0.21161290, 0.45870968,
0.69096774, 1.26387097, 4.03225806, 4.77806452, 4.55677419, 3.88129032,
3.36645161, 2.23677419, 1.37741935, 0.74516129, 0.22645161, -0.23161290,
-0.74129032, -1.66387097, -3.84709677, -6.11225806, -7.21741935, -6.37548387,
-4.11225806, -1.88516129, -0.78967742, -0.24709677, 0.08774194, 0.35096774,
0.57612903, 0.90193548, 2.16322581, 4.85935484, 5.02387097, 4.33870968,
3.66258065, 2.88645161, 1.77870968, 1.06000000, 0.48451613, 0.00193548,
-0.46193548, -1.10967742, -2.53161290, -5.13741935, -6.94903226, -7.14387097,
-5.36580645)
The image (not shown here) displays the complete dataset as a black line; the points are the x and y sample data. Of course, I can't simply take the y values below the leftmost x, that is y[y < y[which.min(x)]], because I'd lose almost half of the data I want.
Any idea? Thanks
Would it work to take the points where x is decreasing?
library(dplyr) # for lag() and lead()
x2 <- x[x < lag(x) | x > lead(x)]
y2 <- y[x < lag(x) | x > lead(x)]
plot(x2, y2)
df <- data.frame(x, y)
df2 <- data.frame(x = x[x < lag(x) | x > lead(x)],
                  y = y[x < lag(x) | x > lead(x)]) # edit: added lead() to get the right edge
library(ggplot2)
ggplot(df, aes(x, y)) +
  geom_point() +
  geom_path() +
  geom_point(data = df2, color = "red", size = 3)
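If you'd rather avoid dplyr, the same selection can be written in base R with diff(); a sketch equivalent to the lag()/lead() condition above (the two endpoints are treated as FALSE rather than NA):
# keep a point if x decreased into it or decreases out of it
dx <- diff(x)
keep <- c(FALSE, dx < 0) | c(dx < 0, FALSE)
df3 <- data.frame(x = x[keep], y = y[keep])
plot(df3$x, df3$y)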

How to perform nonlinear least squares with shared parameters in R?

I would like to perform nonlinear least squares regression in R where I simultaneously minimize the squared residuals of three models (see below). Now, the three models share some of the parameters, in my example, parameters b and d.
Is there a way of doing this with nls(), or with either of the packages minpack.lm or nlsr?
So, ideally, I would like to generate the objective function (the sum of least squares of all models together) and regress all parameters at once: a1, a2, a3, b, c1, c2, c3 and d.
(I am trying to avoid running three independent regressions and then perform some averaging on b and d.)
my_model <- function(x, a, b, c, d) {
a * b ^ (x - c) + d
}
# x values
x <- seq(0, 10, 0.2)
# Shared parameters
b <- 2
d <- 10
a1 <- 1
c1 <- 1
y1 <- my_model(x, a = a1, b = b, c = c1, d = d) + rnorm(length(x))
a2 <- 2
c2 <- 5
y2 <- my_model(x, a = a2, b = b, c = c2, d = d) + rnorm(length(x))
a3 <- -2
c3 <- 3
y3 <- my_model(x, a = a3, b = b, c = c3, d = d) + rnorm(length(x))
plot(y1 ~ x,
     xlim = range(x),
     ylim = d + c(-50, 50),
     type = 'b',
     col = 'red',
     ylab = 'y')
lines(y2 ~ x, type = 'b', col = 'green')
lines(y3 ~ x, type = 'b', col = 'blue')
Below we run nls (using a slightly modified model) and nlxb (from nlsr), although nlxb stops before convergence. Despite these problems, both nevertheless give results that visually fit the data well. The problems suggest an issue with the model itself, so in the Other section, guided by the nlxb output, we show how to fix the model, giving a submodel of the original that fits the data easily with both nls and nlxb and also gives a good fit. At the end, in the Note section, we provide the data in reproducible form.
nls
Assuming the setup shown reproducibly in the Note at the end, reformulate the problem for the nls plinear algorithm by defining a right-hand-side matrix whose columns multiply each of the linear parameters a1, a2, a3 and d, respectively. plinear does not require starting values for those, which simplifies the setup; it will report them as .lin1, .lin2, .lin3 and .lin4, respectively.
To get starting values we used a simpler model with no grouping and a grid search over b from 1 to 10 and c from 1 to 10, using nls2 from the package of the same name. We also found that nls still produced errors, but by using abs in the formula, as shown, it ran to completion.
These fitting difficulties suggest a fundamental problem with the model itself; in the Other section we discuss how to fix it up.
xx <- c(x, x, x)
yy <- c(y1, y2, y3)
# starting values using nls2
library(nls2)
fo0 <- yy ~ cbind(b ^ abs(xx - c), 1)
st0 <- data.frame(b = c(1, 10), c = c(1, 10))
fm0 <- nls2(fo0, start = st0, alg = "plinear-brute")
# run nls using starting values from above
g <- rep(1:3, each = length(x))
fo <- yy ~ cbind((g==1) * b ^ abs(xx - c[g]),
                 (g==2) * b ^ abs(xx - c[g]),
                 (g==3) * b ^ abs(xx - c[g]),
                 1)
st <- with(as.list(coef(fm0)), list(b = b, c = c(c, c, c)))
fm <- nls(fo, start = st, alg = "plinear")
plot(yy ~ xx, col = g)
for(i in unique(g)) lines(predict(fm) ~ xx, col = i, subset = g == i)
fm
giving:
Nonlinear regression model
model: yy ~ cbind((g == 1) * b^abs(xx - c[g]), (g == 2) * b^abs(xx - c[g]), (g == 3) * b^abs(xx - c[g]), 1)
data: parent.frame()
b c1 c2 c3 .lin1 .lin2 .lin3 .lin4
1.997 0.424 1.622 1.074 0.680 0.196 -0.532 9.922
residual sum-of-squares: 133
Number of iterations to convergence: 5
Achieved convergence tolerance: 5.47e-06
(continued after plot)
nlsr
With nlsr it would be done like this. No grid search for starting values was needed and adding abs was not needed either. The b and d values seem similar to the nls solution but the other coefficients differ. Visually both solutions seem to fit the data.
On the other hand, from the JSingval column we see that the Jacobian is rank deficient, which caused nlxb to stop without producing SE values, and the convergence is in doubt (although it may be sufficient, given that the plot, not shown, visually seems like a good fit). We discuss how to fix this in the Other section.
library(nlsr)
g1 <- g == 1; g2 <- g == 2; g3 <- g == 3
fo2 <- yy ~ g1 * (a1 * b ^ (xx - c1) + d) +
            g2 * (a2 * b ^ (xx - c2) + d) +
            g3 * (a3 * b ^ (xx - c3) + d)
st2 <- list(a1 = 1, a2 = 1, a3 = 1, b = 1, c1 = 1, c2 = 1, c3 = 1, d = 1)
fm2 <- nlxb(fo2, start = st2)
fm2
giving:
vn: [1] "yy" "g1" "a1" "b" "xx" "c1" "d" "g2" "a2" "c2" "g3" "a3" "c3"
no weights
nlsr object: x
residual sumsquares = 133.45 on 153 observations
after 16 Jacobian and 22 function evaluations
name coeff SE tstat pval gradient JSingval
a1 3.19575 NA NA NA 9.68e-10 4097
a2 0.64157 NA NA NA 8.914e-11 662.5
a3 -1.03096 NA NA NA -1.002e-09 234.9
b 1.99713 NA NA NA -2.28e-08 72.57
c1 2.66146 NA NA NA -2.14e-09 10.25
c2 3.33564 NA NA NA -3.955e-11 1.585e-13
c3 2.0297 NA NA NA -7.144e-10 1.292e-13
d 9.92363 NA NA NA -2.603e-12 3.271e-14
We can calculate SEs using nls2 as a second stage, but this still does not address the underlying problem that the singular values point to.
summary(nls2(fo2, start = coef(fm2), algorithm = "brute-force"))
giving:
Formula: yy ~ g1 * (a1 * b^(xx - c1) + d) + g2 * (a2 * b^(xx - c2) + d) +
g3 * (a3 * b^(xx - c3) + d)
Parameters:
Estimate Std. Error t value Pr(>|t|)
a1 3.20e+00 5.38e+05 0.0 1
a2 6.42e-01 3.55e+05 0.0 1
a3 -1.03e+00 3.16e+05 0.0 1
b 2.00e+00 2.49e-03 803.4 <2e-16 ***
c1 2.66e+00 9.42e-02 28.2 <2e-16 ***
c2 3.34e+00 2.43e+05 0.0 1
c3 2.03e+00 8.00e+05 0.0 1
d 9.92e+00 4.42e+05 0.0 1
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.959 on 145 degrees of freedom
Number of iterations to convergence: 8
Achieved convergence tolerance: NA
Other
When nls has trouble fitting a model, it often suggests that something is wrong with the model itself. Playing around with it a bit, guided by the JSingval column in the nlsr output above (which suggests that the c parameters or d might be the problem), we find that if we fix all c parameter values to 0, the model becomes easy to fit given sufficiently good starting values, and it still gives a low residual sum of squares.
library(nls2)
fo3 <- yy ~ cbind((g==1) * b ^ xx, (g==2) * b ^ xx, (g==3) * b ^ xx, 1)
st3 <- coef(fm0)["b"]
fm3 <- nls(fo3, start = st3, alg = "plinear")
giving:
Nonlinear regression model
model: yy ~ cbind((g == 1) * b^xx, (g == 2) * b^xx, (g == 3) * b^xx, 1)
data: parent.frame()
b .lin1 .lin2 .lin3 .lin4
1.9971 0.5071 0.0639 -0.2532 9.9236
residual sum-of-squares: 133
Number of iterations to convergence: 4
Achieved convergence tolerance: 1.67e-09
which the following anova indicates is comparable to fm from above despite having 3 fewer parameters:
anova(fm3, fm)
giving:
Analysis of Variance Table
Model 1: yy ~ cbind((g == 1) * b^xx, (g == 2) * b^xx, (g == 3) * b^xx, 1)
Model 2: yy ~ cbind((g == 1) * b^abs(xx - c[g]), (g == 2) * b^abs(xx - c[g]), (g == 3) * b^abs(xx - c[g]), 1)
Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
1 148 134
2 145 133 3 0.385 0.14 0.94
We can redo fm3 using nlxb like this:
fo4 <- yy ~ g1 * (a1 * b ^ xx + d) +
            g2 * (a2 * b ^ xx + d) +
            g3 * (a3 * b ^ xx + d)
st4 <- list(a1 = 1, a2 = 1, a3 = 1, b = 1, d = 1)
fm4 <- nlxb(fo4, start = st4)
fm4
giving:
nlsr object: x
residual sumsquares = 133.45 on 153 observations
after 24 Jacobian and 33 function evaluations
name coeff SE tstat pval gradient JSingval
a1 0.507053 0.005515 91.94 1.83e-132 8.274e-08 5880
a2 0.0638554 0.0008735 73.11 4.774e-118 1.26e-08 2053
a3 -0.253225 0.002737 -92.54 7.154e-133 -4.181e-08 2053
b 1.99713 0.002294 870.6 2.073e-276 -2.55e-07 147.5
d 9.92363 0.09256 107.2 3.367e-142 -1.219e-11 10.26
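As a quick visual check of fm4, we can overlay the fitted curves on the data; a minimal sketch reusing coef() on the nlsr fit, as was done for fm2 above:
cf <- coef(fm4)
plot(yy ~ xx, col = g)
for (i in 1:3) lines(x, cf[paste0("a", i)] * cf["b"]^x + cf["d"], col = i)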
Note
The assumed input below is the same as in the question, except that we additionally set the seed to make it reproducible.
set.seed(123)
my_model <- function(x, a, b, c, d) a * b ^ (x - c) + d
x <- seq(0, 10, 0.2)
b <- 2; d <- 10 # shared
a1 <- 1; c1 <- 1
y1 <- my_model(x, a = a1, b = b, c = c1, d = d) + rnorm(length(x))
a2 <- 2; c2 <- 5
y2 <- my_model(x, a = a2, b = b, c = c2, d = d) + rnorm(length(x))
a3 <- -2; c3 <- 3
y3 <- my_model(x, a = a3, b = b, c = c3, d = d) + rnorm(length(x))
I'm not sure this is really the best way, but you could minimize the sum of the squared residuals using optim().
# starting values
params <- c(a1=1, a2=1, a3=1, b=1, c1=1, c2=1, c3=1,d=1)
# minimize total sum of squares of residuals
fun <- function(p) {
  sum(
    (y1 - my_model(x, p["a1"], p["b"], p["c1"], p["d"]))^2 +
    (y2 - my_model(x, p["a2"], p["b"], p["c2"], p["d"]))^2 +
    (y3 - my_model(x, p["a3"], p["b"], p["c3"], p["d"]))^2
  )
}
out <- optim(params, fun, method="BFGS")
out$par
# a1 a2 a3 b c1 c2 c3
# 0.8807542 1.0241804 -2.8805848 1.9974615 0.7998103 4.0030597 3.5184600
# d
# 9.8764917
And we can overlay the fitted curves on the plot:
curve(my_model(x, out$par["a1"], out$par["b"], out$par["c1"], out$par["d"]), col = "red", add = TRUE)
curve(my_model(x, out$par["a2"], out$par["b"], out$par["c2"], out$par["d"]), col = "green", add = TRUE)
curve(my_model(x, out$par["a3"], out$par["b"], out$par["c3"], out$par["d"]), col = "blue", add = TRUE)
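The question also mentions minpack.lm; here is a minimal sketch with its nlsLM() (a Levenberg-Marquardt drop-in for nls), stacking the three series and sharing b and d across groups. The starting values below are an assumption chosen near the generating values; note that with b = 1 the term b^(x - c) is constant, so all-ones starts can be degenerate:
library(minpack.lm)
# stack the three series with a group indicator
dat <- data.frame(xx = rep(x, 3),
                  yy = c(y1, y2, y3),
                  g = rep(1:3, each = length(x)))
fm_lm <- nlsLM(yy ~ (g == 1) * my_model(xx, a1, b, c1, d) +
                    (g == 2) * my_model(xx, a2, b, c2, d) +
                    (g == 3) * my_model(xx, a3, b, c3, d),
               data = dat,
               start = list(a1 = 1, a2 = 2, a3 = -2, b = 2,
                            c1 = 1, c2 = 5, c3 = 3, d = 10))
coef(fm_lm)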

Mixed Integer Programming in R - Indicator functions

I hope this message finds you well.
I am trying to solve an optimization problem formulated as a Mixed Integer Program with the lpSolveAPI R-package. However, there are indicator functions in the objective function and in some constraints. To be more specific, consider the following optimization problem:
min{ 2.8 * x1 + 3.2 * x2 + 3.5 * x3 +
17.5 * delta(x1) + 2.3 * delta(x2) + 5.5 * delta(x3) }
subject to:
0.4 * x1 + 8.7 * x2 + 4.5 * x3 <=
387 - 3 * delta(x1) - 1 * delta(x2) - 3 * delta(x3)
x1 <= 93 * delta(x1)
x2 <= 94 * delta(x2),
x3 <= 100 * delta(x3), and
x1, x2, and x3 are non-negative integers.
In this problem, for all i in {1, 2, 3}, delta(xi) = 1 if xi > 0, whereas delta(xi) = 0 otherwise.
The R-code I have so far is:
# install.packages("lpSolveAPI") # run once if not installed
library(lpSolveAPI)
a <- c(3, 1, 3)
b <- c(0.4, 8.7, 4.5)
q <- 387
M <- c(93, 94, 100)
A <- c(17.5, 2.3, 5.5)
h <- c(2.8, 3.2, 3.5)
Fn <- function(u1, u2, u3, u4) {
  lprec <- make.lp(0, 3)
  lp.control(lprec, sense = "min") # sense must be passed by name
  set.objfn(lprec, u1)
  add.constraint(lprec, u2, "<=", u3)
  set.bounds(lprec, lower = rep(0, 3), upper = u4)
  set.type(lprec, columns = 1:3, type = "integer")
  solve(lprec)
  return(list(Soln = get.variables(lprec), MinObj = get.objective(lprec)))
}
TheTest <- Fn(u1 = h, u2 = b, u3 = q, u4 = M)
Please, I was wondering if someone could tell me how to put delta functions into this R-code to solve the aforementioned optimization problem.
Rodrigo.
A constraint like x1 <= 93 * delta(x1) looks very strange to me; on its own it is just x1 <= 93. For a MIP solver, replace the function delta(x) by a binary variable d and add the constraint d <= x <= M*d, where M is an upper bound on x. To be explicit, for your model we have:
min 2.8*x1 + 3.2*x2 + 3.5*x3 + 17.5*d1 + 2.3*d2 + 5.5*d3
0.4*x1 + 8.7*x2 + 4.5*x3 <= 387 - 3*d1 - d2 - 3*d3
d1 <= x1 <= 93*d1
d2 <= x2 <= 94*d2
d3 <= x3 <= 100*d3
x1 integer in [0,93]
x2 integer in [0,94]
x3 integer in [0,100]
d1,d2,d3 binary
This is now trivial to solve with any MIP solver. Note that a double inequality like d1 <= x1 <= 93*d1 can be written as two inequalities: d1<=x1 and x1<=93*d1.
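For completeness, here is a minimal sketch of this reformulation in lpSolveAPI, building on the data from the question (the column layout, x1..x3 in columns 1:3 and d1..d3 in columns 4:6, is my choice):
library(lpSolveAPI)
h <- c(2.8, 3.2, 3.5); A <- c(17.5, 2.3, 5.5) # costs on x and on delta
b <- c(0.4, 8.7, 4.5); a <- c(3, 1, 3); q <- 387
M <- c(93, 94, 100)
lprec <- make.lp(0, 6) # columns 1:3 = x, columns 4:6 = d
lp.control(lprec, sense = "min")
set.objfn(lprec, c(h, A))
add.constraint(lprec, c(b, a), "<=", q) # resource constraint incl. delta terms
for (i in 1:3) {
  xi <- replace(rep(0, 3), i, 1)
  di <- replace(rep(0, 3), i, 1)
  add.constraint(lprec, c(xi, -M[i] * di), "<=", 0) # x_i <= M_i * d_i
  add.constraint(lprec, c(-xi, di), "<=", 0)        # d_i <= x_i
}
set.type(lprec, columns = 1:3, type = "integer")
set.type(lprec, columns = 4:6, type = "binary")
solve(lprec)
get.variables(lprec) # x1..x3 followed by d1..d3
get.objective(lprec)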

plot gradient with three dimensions (coded as red, green & blue) in a ternary plot (ggtern)

I have the following data frame;
df <- expand.grid(A = seq(0, 1, 0.1), B = seq(0, 1, 0.1), C = seq(0, 1, 0.1))
df <- df[abs(rowSums(df[, 1:3]) - 1) < 1e-9, ] # tolerance: seq(0, 1, 0.1) is not exact
df <- data.frame(df,
                 value_A = 0.2 * df$A - 0.4 * df$B + 0 * df$C,
                 value_B = 0.8 * df$A + 0.1 * df$B + 0.5 * df$C,
                 value_C = 0 * df$A + 0.8 * df$B + 0 * df$C)
# rescale each row's three values to [0, 1]
df[, 4:6] <- (df[, 4:6] + abs(apply(df[, 4:6], 1, min))) /
             (apply(df[, 4:6], 1, max) + abs(apply(df[, 4:6], 1, min)))
df <- data.frame(df, color = rgb(red = df$value_A, green = df$value_B, blue = df$value_C))
In the data frame, each row gives me a proportion of A, B and C that sums to one.
Further, for each of A, B and C, I have a value.
From these three values I generate one RGB value that indicates the relative importance of A, B and C under different proportions of A, B and C.
Now I would like to plot these RGB values as a surface in a ternary plot.
I can plot one "dimension" with the following code using the ggtern package in R;
library(ggtern)
ggtern(df, aes(A, B, C, value = value_A)) +
  theme_showarrows() +
  stat_interpolate_tern(geom = "polygon",
                        formula = value ~ x + y,
                        n = 20, method = 'lm',
                        breaks = seq(0, 1, by = 0.001),
                        aes(fill = ..level..), expand = F) +
  scale_fill_gradient(low = "red", high = "green")
But instead of value_A, I actually want to use the RGB values directly.
However, I cannot figure out how, instead of value = value_A, to specify directly the colour value for each point from which the surface is calculated.
Is this possible with ggplot2 / ggtern?
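I'm not aware of a built-in way to interpolate an RGB surface in ggtern, but as a starting point the precomputed colours can at least be rendered directly with scale_color_identity(); a sketch that plots coloured points rather than a smooth surface:
library(ggtern)
ggtern(df, aes(A, B, C)) +
  theme_showarrows() +
  geom_point(aes(color = color), size = 4) + # 'color' column holds the RGB strings
  scale_color_identity()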

Scalar field visualisation in R

I have a table with 3 columns
x y f
-101.0 -101.0 0.0172654144157
...
x and y are coordinates; f is a value.
I want to make a 2D picture where x and y are the coordinates and f is mapped to colour. But I need the picture to be a continuous surface, not just a number of coloured points.
Can someone help me, please?
There are a couple of simple ways to do this if your data lie on a regular grid. Try:
library(ggplot2)
library(lattice)
# make some data
s <- 100
i <- 0.5
x0 <- 27
y0 <- 34
df <- expand.grid(x = seq(0, s, i), y = seq(0, s, i))
df <- transform(df, f = cos(10 * pi * sqrt((x - x0)^2 + (y - y0)^2)))
# try as points
ggplot(df, aes(x, y, color = f)) + geom_point()
# or as tiles
ggplot(df, aes(x, y, fill = f)) + geom_tile()
# or even easier with lattice
levelplot(f ~ x * y, df)
Output examples (plots omitted).
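If the data are not on a regular grid, one option (a sketch using the interp package; akima would work similarly) is to interpolate onto a regular grid first and then plot:
library(interp)
# pretend the data are irregular by sampling from the grid above
irr <- df[sample(nrow(df), 5000), ]
gr <- interp(irr$x, irr$y, irr$f, nx = 200, ny = 200)
image(gr, col = hcl.colors(100), xlab = "x", ylab = "y")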
