How can I plot this 'integral' code in R?

f1 <- function(x){integrate(f = function(t){
sqrt(t^3-1)
}, lower = 1, upper = x)}
The domain of x is 1 to 4. f1 always returns an object of class 'integrate' rather than a plain number, and I don't know how to plot this integral function in R.
Thanks to anyone who can help me.

You can compute the values of f1 over a grid of x values with sapply and then plot them:
f1 <- function(x) {
integrate( function(t) sqrt(t^3-1), lower = 1, upper = x)
}
u <- seq(1, 4, by = 0.1) # Defining a vector of values from 1 to 4 in steps of .1
f1u <- sapply(u, function(x) f1(x)$value) #computing the values of f1 over u
plot(u,f1u, type = "l", xlab = "x", ylab = "f1(x)") # your plot

You can vectorize the upper argument to integrate like so:
vintegrate <- Vectorize(integrate, "upper")
f1 <- function(x) {
unlist(vintegrate(function(t) sqrt(t^3-1), lower = 1, upper = x)[1,])
}
Then you can plot using the curve function in base R:
curve(f1(x), from = 1, to = 4)
Or using ggplot2:
library(ggplot2)
ggplot(data.frame(x = 0)) +
geom_function(fun = f1) +
xlim(1, 4)
Without vectorizing, the upper argument expects a vector of length 1, or else it will throw an error:
integrate(function(t) sqrt(t^3-1), lower = 1, upper = 1:4)
Error in integrate(function(t) sqrt(t^3 - 1), lower = 1, upper = 1:4) :
length(upper) == 1 is not TRUE
After vectorizing:
vintegrate(function(t) sqrt(t^3-1), lower = 1, upper = 1:4)
[,1] [,2] [,3] [,4]
value 0 1.515927 5.356315 11.84309
abs.error 0 0.0001312847 0.0003641383 0.0006563824
subdivisions 1 5 5 5
message "OK" "OK" "OK" "OK"
call Expression Expression Expression Expression
We then use [1, ] to select the value row and unlist to turn it into a plain numeric vector.
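For comparison, the same vectorization can be written with vapply instead of Vectorize. This is only a minimal sketch of an alternative, not a replacement for the approach above; the names integrand and f1_vec are made up for illustration.

```r
# Integrand from the question
integrand <- function(t) sqrt(t^3 - 1)

# Hypothetical helper: integrate from 1 to each element of x,
# keeping only the $value component of each result
f1_vec <- function(x) {
  vapply(
    x,
    function(u) integrate(integrand, lower = 1, upper = u)$value,
    numeric(1)
  )
}

vals <- f1_vec(1:4)
```

Because f1_vec accepts a vector, curve(f1_vec(x), from = 1, to = 4) should work in the same way as with the Vectorize version.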

Related

Solutions to a system of inequalities in R

Suppose I have the following system of inequalities:
-2x + y <= -3
1.25x + y <= 2.5
y >= -3
I want to find multiple tuples of (x, y) that satisfy the above inequalities.
library(Rglpk)
obj <- numeric(2)
mat <- matrix(c(-2, 1, 1.25, 1, 0, 1), nrow = 3)
dir <- c("<=", "<=", ">=")
rhs <- c(-3, 2.5, -3)
Rglpk_solve_LP(obj = obj, mat = mat, dir = dir, rhs = rhs)
Using the above code only seems to return one possible solution tuple, (1.5, 0). Is it possible to return other solution tuples?
Edit: Based on the comments, I would be interested to learn if there are any functions that could help me find the corner points.
To understand the possible answers to this question, we can solve the system of inequalities graphically.
There is a nice answer on Stack Overflow about plotting inequalities in R. Using that approach, we can plot the following graph:
library(ggplot2)
fun1 <- function(x) 2*x - 3 # this is the same as -2x + y <= -3
fun2 <- function(x) -1.25*x + 2.5 # 1.25x + y <= 2.5
fun3 <- function(x) -3 # y >= -3
x1 = seq(-1,5, by = 1/16)
mydf = data.frame(x1, y1=fun1(x1), y2=fun2(x1),y3= fun3(x1))
mydf <- transform(mydf, z = pmax(y3,pmin(y1,y2)))
ggplot(mydf, aes(x = x1)) +
geom_line(aes(y = y1), colour = 'blue') +
geom_line(aes(y = y2), colour = 'green') +
geom_line(aes(y = y3), colour = 'red') +
geom_ribbon(aes(ymin=y3,ymax = z), fill = 'gray60')
All the possible tuples (infinitely many) lie inside the gray triangle.
The vertices can be found using the following code.
obj <- numeric(2)
mat <- matrix(c(-2, 1.25, 1, 1), nrow = 2)
rhs <- matrix(c(-3, 2.5), nrow = 2)
aPoint <- solve(mat, rhs)
mat <- matrix(c(-2, 0, 1, 1), nrow = 2)
rhs <- matrix(c(-3, -3), nrow = 2)
bPoint <- solve(mat, rhs)
mat <- matrix(c(1.25, 0, 1, 1), nrow = 2)
rhs <- matrix(c(2.5, -3), nrow = 2)
cPoint <- solve(mat, rhs)
Note the order of the values passed to matrix(): R fills matrices column by column.
And you get the coordinates:
> aPoint
[,1]
[1,] 1.6923077
[2,] 0.3846154
> bPoint
[,1]
[1,] 0
[2,] -3
> cPoint
[,1]
[1,] 4.4
[2,] -3.0
All the code below uses base R only (no need for library(Rglpk)).
1. Corner Points
If you want to get all the corner points, here is one option
A <- matrix(c(-2, 1.25, 0, 1, 1, -1), nrow = 3)
b <- c(-3, 2.5, 3)
# we use `det` to check if the coefficient matrix is singular. If so, we return `NA`.
xh <-
combn(nrow(A), 2, function(k) {
if (det(A[k, ]) == 0) {
rep(NA, length(k))
} else {
solve(A[k, ], b[k])
}
})
# We keep only the points that satisfy all the constraints
corner_points <- t(xh[, colSums(A %*% xh <= b, na.rm = TRUE) == length(b)])
which gives
> corner_points
[,1] [,2]
[1,] 1.692308 0.3846154
[2,] 0.000000 -3.0000000
[3,] 4.400000 -3.0000000
2. Possible Tuples
If you want to have multiple tuples, e.g., n=10, we can use Monte Carlo simulation (based on the obtained corner_points in the previous step) to select the tuples under the constraints:
xrange <- range(corner_points[, 1])
yrange <- range(corner_points[, 2])
n <- 10
res <- list()
while (length(res) < n) {
px <- runif(1, xrange[1], xrange[2])
py <- runif(1, yrange[1], yrange[2])
if (all(A %*% c(px, py) <= b)) {
res[length(res) + 1] <- list(c(px, py))
}
}
and you will see n possible tuples in a list like below
> res
[[1]]
[1] 3.643167 -2.425809
[[2]]
[1] 2.039007 -2.174171
[[3]]
[1] 0.4990635 -2.3363637
[[4]]
[1] 0.6168402 -2.6736421
[[5]]
[1] 3.687389 -2.661733
[[6]]
[1] 3.852258 -2.704395
[[7]]
[1] 1.7571062 0.1067597
[[8]]
[1] 3.668024 -2.771307
[[9]]
[1] 2.108187 -1.365349
[[10]]
[1] 2.106528 -2.134310
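Since the feasible region here is a triangle, the accept/reject loop can also be replaced by sampling the triangle directly. The sketch below is an alternative technique, not the method used above: it draws random barycentric weights and takes the matching weighted average of the three corners. The helper name sample_triangle is hypothetical, and the corner values are the exact fractions behind the corner points printed earlier.

```r
# Corner points of the feasible triangle (22/13 = 1.692308, 5/13 = 0.3846154)
corners <- rbind(c(22 / 13, 5 / 13), c(0, -3), c(4.4, -3))

# Constraints in the form A %*% p <= b, as in the answer above
A <- matrix(c(-2, 1.25, 0, 1, 1, -1), nrow = 3)
b <- c(-3, 2.5, 3)

# Hypothetical helper: draw n points uniformly from a triangle
# using random barycentric weights
sample_triangle <- function(n, corners) {
  r1 <- sqrt(runif(n))
  r2 <- runif(n)
  w <- cbind(1 - r1, r1 * (1 - r2), r1 * r2)  # each row sums to 1
  w %*% corners                               # n x 2 matrix of (x, y)
}

set.seed(42)
pts <- sample_triangle(10, corners)

# Every sampled point should satisfy all three inequalities
# (small tolerance for floating-point error)
ok <- apply(pts, 1, function(p) all(A %*% p <= b + 1e-9))
```

This avoids discarded draws, but it only applies when the feasible region is a triangle; for a general polygon the rejection loop above is the simpler choice.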
First of all, the matrix representing the three equations needs a small correction, because R fills matrices column by column :
-2x + y <= -3
1.25x + y <= 2.5
y >= -3
mat <- matrix(c(-2, 1.25, 0, 1, 1, 1), nrow = 3)
# and not : mat <- matrix(c(-2, 1, 1.25, 1, 0, 1), nrow = 3)
To get different tuples, you could modify the objective function :
obj <- numeric(2) results in an objective function 0 * x + 0 * y which is always equal to 0 and can't be maximized : the first valid x,y will be selected.
Optimization on x is achieved by using obj <- c(1,0), resulting in maximization / minimization of 1 * x + 0 * y.
Optimization on y is achieved by using obj <- c(0,1).
#setting the bounds is necessary, otherwise optimization occurs only for x>=0 and y>=0
bounds <- list(lower = list(ind = c(1L, 2L), val = c(-Inf, -Inf)),
upper = list(ind = c(1L, 2L), val = c(Inf, Inf)))
# finding maximum x: obj = c(1,0), max = TRUE
Rglpk_solve_LP(obj = c(1, 0), mat = mat, dir = dir, rhs = rhs, bounds = bounds, max = TRUE)$solution
# [1] 4.4 -3.0
# finding minimum x: obj = c(1,0), max = FALSE
Rglpk_solve_LP(obj = c(1, 0), mat = mat, dir = dir, rhs = rhs, bounds = bounds, max = FALSE)$solution
#[1] 0 -3
# finding maximum y: obj = c(0,1), max = TRUE
Rglpk_solve_LP(obj = c(0, 1), mat = mat, dir = dir, rhs = rhs, bounds = bounds, max = TRUE)$solution
#[1] 1.6923077 0.3846154

How to simulate data and visualize in a single R function

I'm using replicate to simulate distributions in R and visualize how they change with different parameters (e.g., rbinom(100,1,0.5) vs. rbinom(100,1,0.01)).
I'd like to do all of this within a single function that 1. simulates replicates, 2. sets the plotting dimensions and parameters, and 3. loops through and draws density curves.
In separate pieces, this code works no problem:
n <- 100
d <- as.data.frame(
replicate(n,
expr = rbinom(n, 1, 0.5),
simplify = F)
)
colnames(d) <- 1:n
plot( NULL, xlim = c( min(d)-0.5, max(d)+0.5), ylim = c(0,2))
for(i in 1:n) lines( density( d[,i]) )
But inside a function, only a single density curve is returned:
plotcurves <- function(n, distr, ymax) {
d <- as.data.frame(
replicate(n,
expr = distr,
simplify = F)
)
colnames(d) <- 1:n
plot( NULL, xlim = c( min(d)-0.5, max(d)+0.5), ylim = c(0,ymax))
for(i in 1:n) lines( density( d[,i]) )
}
plotcurves(n = 100, distr = rbinom(100, 1, 0.5), ymax = 2)
The solution seems like it would be very simple but I cannot seem to find it.
What do I need to do to fix the code OR does a function like this already exist that I am unaware of?
The problem is that, inside your function, distr is evaluated only once: R evaluates the argument the first time it is used and then reuses that value on every replication. You can see this if you make a variation of the function that just returns the data frame d instead of plotting it:
show_d <- function(n, distr, ymax)
{
d <- as.data.frame(
replicate(n,
expr = distr,
simplify = F)
)
return(d)
}
show_d(n = 3, distr = rbinom(5, 1, 0.5), ymax = 2)
#> c.1L..0L..1L..1L..1L. c.1L..0L..1L..1L..1L..1 c.1L..0L..1L..1L..1L..2
#> 1 1 1 1
#> 2 0 0 0
#> 3 1 1 1
#> 4 1 1 1
#> 5 1 1 1
You'll notice the columns are all the same. Effectively, the call to rbinom was evaluated then passed to replicate, which is the same as calling replicate(3, c(1, 0, 1, 1, 1)). So you are plotting all the lines - it's just that the lines are all the same.
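The effect can be reproduced in isolation. The following is a minimal sketch; the wrapper name wrapped is made up for illustration, and runif(1) simply stands in for any random draw.

```r
# Hypothetical wrapper that forwards its argument to replicate()
wrapped <- function(expr) replicate(3, expr)

set.seed(1)
inside <- wrapped(runif(1))        # argument is evaluated once, then reused
outside <- replicate(3, runif(1))  # expression is re-evaluated each time

length(unique(inside))   # a single distinct value
length(unique(outside))  # three independent draws
```

Calling replicate directly re-runs the expression each time, whereas routing it through a function argument reuses the first result.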
What you need to do inside the function is capture distr as an unevaluated call and re-evaluate it on every replication, rather than letting it be evaluated once and reused as a vector. You can do this with match.call(), extracting the third element (which is the second argument, distr), and wrapping it in eval() inside replicate:
plotcurves <- function(n, distr, ymax) {
mc <- match.call()[[3]] # the unevaluated call, e.g. rbinom(100, 1, 0.5)
d <- as.data.frame(
replicate(n,
expr = eval(mc), # re-evaluate the call for each replicate
simplify = F)
)
colnames(d) <- 1:n
plot( NULL, xlim = c( min(d)-0.5, max(d)+0.5), ylim = c(0,ymax))
for(i in 1:n) lines( density( d[,i]) )
}
plotcurves(n = 100, distr = rbinom(100, 1, 0.5), ymax = 2)
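Another way to sidestep the evaluation issue is to pass a function that generates one sample instead of passing an expression. This is a sketch of that alternative design, not the approach used above; simulate_draws, plotcurves_fn and the rdist argument are hypothetical names.

```r
# Hypothetical variant: `rdist` is a zero-argument function that
# returns one fresh sample each time it is called
simulate_draws <- function(n, rdist) {
  replicate(n, rdist(), simplify = FALSE)
}

plotcurves_fn <- function(n, rdist, ymax) {
  draws <- simulate_draws(n, rdist)
  rng <- range(unlist(draws))
  plot(NULL, xlim = c(rng[1] - 0.5, rng[2] + 0.5), ylim = c(0, ymax),
       xlab = "value", ylab = "density")
  for (d in draws) lines(density(d))  # one density curve per replicate
  invisible(draws)
}

set.seed(1)
draws <- simulate_draws(5, function() rbinom(100, 1, 0.5))
```

A call such as plotcurves_fn(100, function() rbinom(100, 1, 0.5), ymax = 2) would then draw one curve per replicate, at the cost of a slightly more verbose call.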

Finding the x value of a curve given f(x) in R?

I was wondering why I can't find the other value of x whose f(x) equals f(.6)?
In other words, I'm wondering how to find the x value of the point indicated by the red X in the picture below?
Here is what I have tried without success:
source("https://raw.githubusercontent.com/rnorouzian/i/master/ii.r") # source the function
f <- function(x, n.pred = 5, N = 100, conf.level = .95){
ci <- R2.ci(R2 = x, n.pred = n.pred, N = N, conf.level = conf.level) # The objective function
ci$upper - ci$lower
}
curve(f, panel.f = abline(v = .6, h = f(.6), col = 2, lty = c(2, 1))) # curve the function
uniroot(function(x) f(.6) - f(x), c(0, 1))[[1]] # find the requested 'x' value
`Error: f() values at end points not of opposite sign`
abline(v=uniroot(function(x) f(.6) - f(x), c(0, 0.4))[[1]])
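The reason the narrower interval works is that uniroot needs the function values at the two end points to have opposite signs. The sketch below illustrates this with a stand-in curve, since the sourced R2.ci function is not reproduced here; f_demo is a hypothetical single-peaked function, not the f from the question.

```r
# Stand-in objective (NOT the R2.ci-based f from the question):
# a single-peaked curve on [0, 1], symmetric about 0.5
f_demo <- function(x) x * (1 - x)

target <- f_demo(0.6)
g <- function(x) f_demo(x) - target

# Over the full interval both end points give the same sign, so
# uniroot() stops with an error
full <- tryCatch(uniroot(g, c(0, 1)), error = function(e) e)

# Restricting the search to one side of the peak gives a sign change
other_x <- uniroot(g, c(0, 0.5), tol = 1e-9)$root
```

For this symmetric stand-in the other solution is 0.4; for the real f, the upper end of the interval just has to lie between the wanted root and 0.6, as c(0, 0.4) does above.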

Using R to optimize parameters of a function

I am acquainted with the optimization functions in R. I tried three different ways:
library(optimx)
library(optimr)
f = function(par){20*par[1] - 3*par[1]^2 + par[1] * par[2]}
# First try with optim()
result1 = optim(par = c(0,0), fn = f, upper = c(5, 2), lower = c(0, 0), method = "L-BFGS-B", control = list(fnscale = -1))
coef(result1) # Null
# Second try with optimr
result2 = optimr(par = c(0,0), fn = f, upper = c(5, 2), lower = c(0, 0), method = "L-BFGS-B", control = list(maximize = TRUE))
coef(result2) # Null
# Third try with optimx
result3 = optimx(par = c(0,0), fn = f, upper = c(5, 2), lower = c(0, 0), method = "L-BFGS-B", control = list(maximize = TRUE))
coef(result3)
p1 : 3.666667
p2 : 2
Why do optim() and optimr() fail where optimx() succeeds?
First, your f takes two arguments rather than a vector of length two (you need the latter). Second, it's not coef that gives the result; I suspect that you want result$par. Hence,
library(optimr)
f <- function(x) -(20 * x[1] - 3 * x[1]^2 + x[2] / 2)
result <- optimr(par = c(1, 1), fn = f, upper = c(5, 2), lower = c(0, 0),
method = "L-BFGS-B")
result$par
# [1] 3.333333 2.000000
where I added a minus sign to f to minimize the function, as optimr was somehow unable to maximize it.
This is a solution with the base optim function for your maximization. The problem you are having is that f should take one vector holding all the parameters you want to solve for (two here), not independent parameters. I tried using optimr, but for some reason it didn't maximize.
f = function(x){-1*(20*x[1] - 3*x[1]^2 + x[2]/2)}
result = optim(par = c(1,1), fn = f, upper = c(5, 2), lower = c(0, 0), method = "L-BFGS-B")
No need for the {optimr} package, base R does just fine:
optim(
c(1, 1),
function (x) -f(x[1], x[2]),
method = 'L-BFGS-B'
)
$par
[1] 3.325034e+00 1.682724e+13
$value
[1] -8.413622e+12
$counts
function gradient
40 40
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
Note that we need to adjust the parameter arity of the function (optim uses a single vector of parameters), and, since we want to maximise, we invert the sign of the objective function. Hence we pass function (x) -f(x[1], x[2]) as fn rather than simply f.
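Putting the points from these answers together for the objective as written in the question (a single vector par, bounded, maximized through fnscale = -1), a minimal sketch with base optim might look like this. The starting point c(1, 1) is borrowed from the answers above; everything else comes from the question.

```r
# Objective from the question, taking a single parameter vector
f <- function(par) 20 * par[1] - 3 * par[1]^2 + par[1] * par[2]

res <- optim(
  par = c(1, 1), fn = f,
  method = "L-BFGS-B",
  lower = c(0, 0), upper = c(5, 2),
  control = list(fnscale = -1)  # negative fnscale turns this into maximization
)

res$par    # estimated parameters (coef() does not apply to this list)
res$value  # objective value at those parameters
```

With the box constraints in place the search cannot run off to very large values of the second parameter, which is what happens in the unbounded call shown above.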

Automatically solve an equation of `pt` for `ncp`

I wonder if it is possible to efficiently change ncp in the below code such that x becomes .025 and .975 (within rounding error).
x <- pt(q = 5, df = 19, ncp = ?)
----------
Clarification
q = 5 and df = 19 (above) are just two hypothetical numbers, so q and df could be any other two numbers. What I expect is a function / routine, that takes q and df as input.
What is wrong with uniroot?
f <- function (ncp, alpha) pt(q = 5, df = 19, ncp = ncp) - alpha
par(mfrow = c(1,2))
curve(f(ncp, 0.025), from = 5, to = 10, xname = "ncp", main = "0.025")
abline(h = 0)
curve(f(ncp, 0.975), from = 0, to = 5, xname = "ncp", main = "0.975")
abline(h = 0)
So for 0.025 case, the root lies in (7, 8); for 0.975 case, the root lies in (2, 3).
uniroot(f, c(7, 8), alpha = 0.025)$root
#[1] 7.476482
uniroot(f, c(2, 3), alpha = 0.975)$root
#[1] 2.443316
---------
(After some discussion...)
OK, now I see your ultimate goal. You want to implement this equation solver as a function, with input q and df. So they are unknown, but fixed. They might come out of an experiment.
Ideally if there is an analytical solution, i.e., ncp can be written as a formula in terms of q, df and alpha, that would be so great. However, this is not possible for t-distribution.
A numerical solution is the way to go, but uniroot is not a great option for this purpose, as it relies on a "plot, view, guess, specify" workflow to find a bracketing interval. The answer by loki is also crude, though it is some improvement. It is a grid search with a fixed step size: start from a value near 0, say 0.001, keep increasing it, and check the approximation error. We stop when this error fails to decrease.
This motivates numerical optimization with Newton or quasi-Newton methods. In the 1D case, we can use the function optimize. It searches with a variable step size, so it converges faster than a fixed step-size search.
Let's define our function as:
ncp_solver <- function (alpha, q, df) {
## objective function: we minimize squared approximation error
obj_fun <- function (ncp, alpha = alpha, q = q, df = df) {
(pt(q = q, df = df, ncp = ncp) - alpha) ^ 2
}
## now we call `optimize`
oo <- optimize(obj_fun, interval = c(-37.62, 37.62), alpha = alpha, q = q, df = df)
## post processing
oo <- unlist(oo, use.names = FALSE) ## list to numerical vector
oo[2] <- sqrt(oo[2]) ## squared error to absolute error
## return
setNames(oo, c("ncp", "abs.error"))
}
Note, -37.62 / 37.62 is chosen as lower / upper bound for ncp, as it is the maximum supported by t-distribution in R (read ?dt).
For example, let's try this function. If, as in your question, you have q = 5 and df = 19:
ncp_solver(alpha = 0.025, q = 5, df = 19)
# ncp abs.error
#7.476472e+00 1.251142e-07
The result is a named vector, with ncp and absolute approximation error.
Similarly we can do:
ncp_solver(alpha = 0.975, q = 5, df = 19)
# ncp abs.error
#2.443347e+00 7.221928e-07
----------
Follow up
Is it possible that in the function ncp_solver(), alpha takes a c(.025, .975) together?
Why not wrap it up for "vectorization":
sapply(c(0.025, 0.975), ncp_solver, q = 5, df = 19)
# [,1] [,2]
#ncp 7.476472e+00 2.443347e+00
#abs.error 1.251142e-07 7.221928e-07
How come 0.025 gives the upper bound of the confidence interval, while 0.975 gives the lower bound? Shouldn't this relationship be reversed?
No surprise. By default pt computes lower tail probability. If you want the "right" relationship, set lower.tail = FALSE in pt:
ncp_solver <- function (alpha, q, df) {
## objective function: we minimize squared approximation error
obj_fun <- function (ncp, alpha = alpha, q = q, df = df) {
(pt(q = q, df = df, ncp = ncp, lower.tail = FALSE) - alpha) ^ 2
}
## now we call `optimize`
oo <- optimize(obj_fun, interval = c(-37.62, 37.62), alpha = alpha, q = q, df = df)
## post processing
oo <- unlist(oo, use.names = FALSE) ## list to numerical vector
oo[2] <- sqrt(oo[2]) ## squared error to absolute error
## return
setNames(oo, c("ncp", "abs.error"))
}
Now you see:
ncp_solver(0.025, 5, 19)[[1]] ## use "[[" not "[" to drop name
#[1] 2.443316
ncp_solver(0.975, 5, 19)[[1]]
#[1] 7.476492
--------
Bug report and fix
It was reported to me that the above ncp_solver is unstable. For example:
ncp_solver(alpha = 0.025, q = 0, df = 98)
# ncp abs.error
#-8.880922 0.025000
But on the other hand, if we double check with uniroot here:
f <- function (ncp, alpha) pt(q = 0, df = 98, ncp = ncp, lower.tail = FALSE) - alpha
curve(f(ncp, 0.025), from = -3, to = 0, xname = "ncp"); abline(h = 0)
uniroot(f, c(-2, -1.5), 0.025)$root
#[1] -1.959961
So there is clearly something wrong with ncp_solver.
Well, it turns out that we cannot use too wide a bound such as c(-37.62, 37.62). If we narrow it to c(-35, 35), it works.
Also, to avoid tolerance problems, we can change the objective function from squared error to absolute error:
ncp_solver <- function (alpha, q, df) {
## objective function: we minimize absolute approximation error
obj_fun <- function (ncp, alpha = alpha, q = q, df = df) {
abs(pt(q = q, df = df, ncp = ncp, lower.tail = FALSE) - alpha)
}
## now we call `optimize`
oo <- optimize(obj_fun, interval = c(-35, 35), alpha = alpha, q = q, df = df)
## post processing and return
oo <- unlist(oo, use.names = FALSE) ## list to numerical vector
setNames(oo, c("ncp", "abs.error"))
}
ncp_solver(alpha = 0.025, q = 0, df = 98)
# ncp abs.error
#-1.959980e+00 9.190327e-07
Damn, this is a pretty annoying bug. But relax now.
Report on getting warning messages from pt
I also received reports of annoying warning messages from pt:
ncp_solver(0.025, -5, 19)
# ncp abs.error
#-7.476488e+00 5.760562e-07
#Warning message:
#In pt(q = q, df = df, ncp = ncp, lower.tail = FALSE) :
# full precision may not have been achieved in 'pnt{final}'
I am not sure what is going on here, but so far I have not observed any misleading results. Therefore, I decided to suppress those warnings from pt using suppressWarnings:
ncp_solver <- function (alpha, q, df) {
## objective function: we minimize absolute approximation error
obj_fun <- function (ncp, alpha = alpha, q = q, df = df) {
abs(suppressWarnings(pt(q = q, df = df, ncp = ncp, lower.tail = FALSE)) - alpha)
}
## now we call `optimize`
oo <- optimize(obj_fun, interval = c(-35, 35), alpha = alpha, q = q, df = df)
## post processing and return
oo <- unlist(oo, use.names = FALSE) ## list to numerical vector
setNames(oo, c("ncp", "abs.error"))
}
ncp_solver(0.025, -5, 19)
# ncp abs.error
#-7.476488e+00 5.760562e-07
OK, quiet now.
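As a cross-check on the optimize-based solver, root finding can also be automated without plotting first, by letting uniroot extend its own bracket. This is only a sketch under assumptions: the helper name ncp_root and the starting bracket of q - 10 to q + 10 are my own choices, and it relies on the upper-tail probability increasing with ncp.

```r
# Hypothetical helper: solve pt(q, df, ncp, lower.tail = FALSE) = alpha for ncp
ncp_root <- function(alpha, q, df) {
  g <- function(ncp) {
    suppressWarnings(pt(q = q, df = df, ncp = ncp, lower.tail = FALSE)) - alpha
  }
  # Assumed starting bracket around q; since g increases with ncp,
  # extendInt = "upX" lets uniroot() widen it if there is no sign change
  uniroot(g, interval = c(q - 10, q + 10), extendInt = "upX",
          tol = 1e-9)$root
}

lo <- ncp_root(0.025, q = 5, df = 19)
hi <- ncp_root(0.975, q = 5, df = 19)
```

The two values should agree with the ncp_solver results above to within the solver tolerance.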
You could use two while loops like this:
i <- 0.001
lowerFound <- FALSE
while(!lowerFound){
x <- pt(q = 5, df = 19, ncp = i)
if (round(x, 3) == 0.025){
lowerFound <- TRUE
print(paste("Lower is", i))
lower <- i
} else {
i <- i + 0.0005
}
}
i <- 0.001
upperFound <- FALSE
while(!upperFound){
x <- pt(q = 5, df = 19, ncp = i)
if (round(x, 3) == 0.975){
upperFound <- TRUE
print(paste("Upper is ", i))
upper <- i
} else {
i <- i + 0.0005
}
}
c(Lower = lower, Upper = upper)
# Lower Upper
# 7.4655 2.4330
Of course, you can adapt the increment in i <- i + .... or change the check if (round(x,...) == ....) to fit this solution to your specific needs of accuracy.
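The same grid search can be written without explicit loops, which also avoids running forever if the rounded value never matches exactly. This is a sketch of a vectorized variant; the grid range of 0 to 10 and the step of 0.0005 are assumptions chosen to match the example above.

```r
# Candidate ncp values on a fixed grid (range and step are assumptions)
grid <- seq(0, 10, by = 0.0005)
p <- pt(q = 5, df = 19, ncp = grid)

# Pick the grid point whose probability is closest to each target
lower <- grid[which.min(abs(p - 0.025))]
upper <- grid[which.min(abs(p - 0.975))]
c(Lower = lower, Upper = upper)
```

Because it picks the closest grid point rather than the first one that rounds to the target, the result is accurate to about half the step size.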
I know this is an old question, but there is now a one-line solution to this problem using the conf.limits.nct() function in the MBESS package.
install.packages("MBESS")
library(MBESS)
result <- conf.limits.nct(t.value = 5, df = 19)
result
$Lower.Limit
[1] 2.443332
$Prob.Less.Lower
[1] 0.025
$Upper.Limit
[1] 7.476475
$Prob.Greater.Upper
[1] 0.025
$Lower.Limit is the result where pt = 0.975
$Upper.Limit is the result where pt = 0.025
pt(q=5,df=19,ncp=result$Lower.Limit)
[1] 0.975
pt(q=5,df=19,ncp=result$Upper.Limit)
[1] 0.025
