I have the following function:
g(x) = [integral from x to Inf of (1 - F(t)) dt] / (1 - F(x))
Let F(.) be the cumulative distribution function of the gamma distribution with shape = 1 and rate = 1. The denominator is the survival function S(x) = 1 - F(x), and g(x) is the mean residual life function.
I wrote the following code in R.
x = 5
denominator = 1 -pgamma(x, 1, 1)
numerator = function(t) (1 - pgamma(t, 1, 1))
intnum = integrate(numerator , x, Inf)
frac = intnum$value/denominator
frac
How can I find the maximum of g(x) over all x >= 0? Can I do this in R? Thank you very much for your help.
Before starting, I wrapped your code in a function:
surviveFunction <- function(x) {
  denominator = 1 - pgamma(x, 1, 1)
  numerator = function(t) (1 - pgamma(t, 1, 1))
  # sapply lets the function accept a vector x (integrate needs scalar limits)
  intnum = sapply(x, function(x) {integrate(numerator, x, Inf)$value})
  frac = intnum / denominator
  return(frac)
}
Next, pass the function to curve(), which evaluates it over a grid of x values and draws the plot:
df = curve(surviveFunction, from=0, to=45)
plot(df, type='l')
Then narrow xlim to zoom in on the maximum:
df = curve(surviveFunction, from=0, to=45,xlim = c(30,40))
plot(df, type='l')
Now we can guess that the global maximum lies near x = 35.
I suggest two options for locating it.
First, take the maximum of the values stored in df:
> max(df$y,na.rm = TRUE)
1.054248 #maximum value
> df$x[which(df$y==(max(df$y,na.rm = TRUE)))]
35.55 # value of x at the maximum
Second, use optimize():
> optimize(surviveFunction, interval=c(34, 36), maximum=TRUE)
$maximum
[1] 35.48536
$objective
[1] 1.085282
However, I don't think optimize() finds the global maximum.
With a wider interval, for example:
optimize(surviveFunction, interval=c(0, 36), maximum=TRUE)
$maximum
[1] 11.11381
$objective
[1] 0.9999887
I think the result above is a local maximum rather than the global one.
So I suggest using the first approach.
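One caveat on both results, offered as a hedged check rather than as part of the approach above: a gamma distribution with shape = 1 and rate = 1 is the exponential distribution with rate 1, so S(x) = exp(-x), the integral of S from x to Inf is again exp(-x), and g(x) = 1 for every x >= 0. Values such as 1.05 near x = 35 are therefore likely round-off from computing 1 - pgamma(...) when the survival probability is close to machine precision. The sketch below uses names (surv, mrl) and tolerances that are my own choices, not anything from the question.

```r
# Survival function of Gamma(shape = 1, rate = 1), taken from the upper tail
# directly (lower.tail = FALSE) to avoid cancellation in 1 - F(t).
surv <- function(t) pgamma(t, shape = 1, rate = 1, lower.tail = FALSE)

# Mean residual life g(x), vectorised over x.
mrl <- function(x) {
  sapply(x, function(xi) {
    # Tight relative tolerance; abs.tol = 0 so that very small integrals
    # are not accepted on absolute error alone.
    num <- integrate(surv, xi, Inf, rel.tol = 1e-8, abs.tol = 0)$value
    num / surv(xi)
  })
}

x <- c(0, 1, 5, 10, 20)
g <- mrl(x)
max(abs(g - 1))   # deviation from the analytic value g(x) = 1
```

If the deviation is negligible, the function is flat and any x >= 0 attains the maximum value 1. For other shape parameters g(x) is no longer constant, and the same two changes (upper tail, tighter tolerance) should make the curve and optimize() results more trustworthy.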
I am trying to use a for-loop over a range of positive and negative values and then plot the results. However, I'm having trouble getting R to plot the correct values, since the negative values seem to screw up the index.
More precisely, the code I am running is:
# Setup objects
R = (1:20)
rejection = rep(NA, 20)
t = seq(from = -10, to = 10, by = 1)
avg_rej_freq = rep(NA, 21)
# Test a hypothesis for each possible value of x and each replication
for (x in t) {
for (r in R) {
# Generate 1 observation from N(x,1)
y = rnorm(1, x, 1)
# Take the average of this observation
avg_y = mean(y)
# Test this observation using the test we found in part a
if (avg_y >= 1 + pnorm(.95))
{rejection[r] = 1}
if (y < 1 + pnorm(.95))
{rejection[r] = 0}
}
# Calculate the average rejection frequency across the 20 samples
avg_rej_freq[x] = mean(rejection)
}
# Plot the different values of x against the average rejection frequency
plot(t, avg_rej_freq)
The resulting graph should look something like this
# Define the rejection probability for n=1
rej_prob = function(x)(1-pnorm(1-x+qnorm(0.95)))
# Plot it
curve(rej_prob,from = -10, to = 10, xlab = expression(theta),
ylab = "Rejection probability")
...but there's clearly something wrong with my code that is shifting the positive values on the graph over to the left.
Any help on how to fix this would be much appreciated!
Yep, as you suspected, the negative indices are causing problems. In R a negative index means "everything except that position", so avg_rej_freq[-10] = value overwrites every element but the tenth, and index 0 is silently ignored. Instead, try using seq_along to produce a vector of all positive indices and looping over those:
# Setup objects
R = (1:20)
rejection = rep(NA, 20)
t = seq(from = -10, to = 10, by = 1)
avg_rej_freq = rep(NA, 21)
# Test a hypothesis for each possible value of x and each replication
for (x in seq_along(t)) {
for (r in R) {
# Generate 1 observation from N(x,1)
# Now we ask for the value of t at index x rather than t directly
y = rnorm(1, t[x], 1)
# Take the average of this observation
avg_y = mean(y)
# Test this observation using the test we found in part a
# (qnorm gives the 0.95 quantile used in rej_prob; pnorm(.95) would be a probability)
if (avg_y >= 1 + qnorm(.95))
{rejection[r] = 1}
if (y < 1 + qnorm(.95))
{rejection[r] = 0}
}
# Calculate the average rejection frequency across the 20 samples
avg_rej_freq[x] = mean(rejection)
}
# Plot the different values of x against the average rejection frequency
plot(t, avg_rej_freq)
which produces the following plot:
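To see the indexing behaviour behind the original bug in isolation, here is a minimal sketch. The vector names (v, t_vals, out) are made up for illustration.

```r
v <- rep(NA_real_, 5)

# Negative index: assign to every position EXCEPT the 2nd
v[-2] <- 1
print(v)

# Index 0 selects nothing, so this assignment is a silent no-op
v[0] <- 9
print(v)

# Safe pattern: loop over positions and look the value up by position
t_vals <- seq(-2, 2)
out <- numeric(length(t_vals))
for (i in seq_along(t_vals)) {
  out[i] <- t_vals[i]^2
}
print(out)
```

Looping over seq_along(t_vals) keeps the index and the value separate, so the sign of the value never affects where the result is stored.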
Not sure why you want to simulate the vectorized function pnorm() using for loops; still, here is your code with the mistakes corrected (see the comments):
# Test a hypothesis for each possible value of x and each replication
for (x in t) {
for (r in R) {
# Generate 1 observation from N(x,1)
y = rnorm(1, x, 1)
# no need to take average since you have a single observation
# Test this observation using the test we found in part a
# (qnorm gives the 0.95 quantile used in rej_prob below; pnorm(.95) would be a probability)
rejection[r] = ifelse(y >= 1 + qnorm(.95), 1, 0)
}
# Calculate the average rejection frequency across the 20 samples
# R vector indices start at 1, so shift x so that min(t) maps to index 1
avg_rej_freq[x-min(t)+1] = mean(rejection)
}
# Define the rejection probability for n=1
rej_prob = function(x)(1-pnorm(1-x+qnorm(0.95)))
# Plot it
curve(rej_prob,from = -10, to = 10, xlab = expression(theta),
ylab = "Rejection probability")
# plot your points
points(t, avg_rej_freq, pch=19, col='red')
Not sure why you need the for loops; what you are doing can be collapsed into one line. The rest of the code is taken from @Sandipan Dey:
R <- 20
t <- seq(from = -10, to = 10, by = 1)
# All the for-loops collapsed into this one line
# (rows index t, columns index replications; qnorm matches the critical value in rej_prob):
avg_rej_freq <- rowMeans(matrix(rnorm(R * length(t), t), length(t)) >= 1 + qnorm(.95))
rej_prob <- function(x) 1 - pnorm(1 - x + qnorm(0.95))
curve(rej_prob,from = -10, to = 10, xlab = expression(theta),
ylab = "Rejection probability")
# plot your points
points(t, avg_rej_freq, pch=19, col='red')
How would you find local maxima for the function below, with the two inputs having different intervals?
f <- function(x, y) {
y/50*(100*x)^0.9 + (50-y)/y*(80*(10-x))^0.8
}
# interval for y = (0, 50)
# interval for x = (0, 10)
I looked into using the optim function, but I couldn't figure out how to set "par," intervals, and other arguments for the two input variables.
optim expects a function of a single vector argument (plus optional parameters), so to use the function from the question with optim you have to either alter it slightly or wrap it in a helper function, like so:
# Function of two scalar inputs
f_xy <- function(x, y) {
y / 50 * (100 * x)^0.9 + (50 - y) / y * (80 * (10 - x))^0.8
}
# Wrapper or helper function with vector argument
f <- function(x) {
y <- x[2]
x <- x[1]
f_xy(x, y)
}
# Default optim with starting value c(x = 5, y = 20)
optim(c(5, 20), fn = f)
Note: By default optim performs minimization.
To maximise, set control = list(fnscale = -1). (Thanks to @alistaire for pointing that out in the comments.)
optim(c(5, 20), fn = f, control = list(fnscale = -1))
However, for the function given in the question the optim output shows that the algorithm has not converged ($convergence != 0), as @alistaire pointed out in the comments:
optim(c(5, 20), fn = f, control = list(fnscale = -1))
#> $par
#> [1] 1.591824e+00 3.861200e-34
#>
#> $value
#> [1] 2.368542e+37
#>
#> $counts
#> function gradient
#> 501 NA
#>
#> $convergence
#> [1] 1
#>
#> $message
#> NULL
Here a $convergence value of 1 means that the maximum number of iterations was reached. One might try raising the limit above the Nelder-Mead default of 500, e.g. by setting control = list(..., maxit = 5000). However, this will not solve the problem: the factor (50 - y) / y grows without bound as y approaches 0, so the function has no finite maximum on the open region and optim still fails to converge.
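As for setting intervals on the two inputs, one option is method = "L-BFGS-B", which accepts lower and upper bounds per parameter. The sketch below is hedged: the bounds are hypothetical values chosen strictly inside 0 < x < 10 and away from y = 0, because the objective blows up as y approaches 0. With such a box one would expect the optimum to sit on the lower y bound, so the answer depends on the bound you pick.

```r
# Function of two scalar inputs, as in the question
f_xy <- function(x, y) {
  y / 50 * (100 * x)^0.9 + (50 - y) / y * (80 * (10 - x))^0.8
}

# Wrapper taking a single vector argument, as optim requires
f <- function(p) f_xy(p[1], p[2])

# Hypothetical box constraints (assumptions, not from the question):
# kept slightly inside the domain so finite-difference gradient steps
# never evaluate the function at x < 0, x > 10 or y <= 0.
lower <- c(0.01, 1)
upper <- c(9.99, 50)

res <- optim(c(5, 20), fn = f, method = "L-BFGS-B",
             lower = lower, upper = upper,
             control = list(fnscale = -1))  # fnscale = -1 => maximise

res$par          # location of the best point found inside the box
res$value        # objective value there
res$convergence  # 0 indicates that optim reports convergence
```

Because L-BFGS-B is a local method, it is worth rerunning from a few different starting values and comparing res$value before trusting the result.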
I wanted to set up a small data frame in order to plot some values of the incomplete elliptic integral of the first kind for different values of the amplitude phi and the modulus k. The function to integrate is 1/sqrt(1 - (k*sin(x))^2) between 0 and phi. Here is the code I came up with:
v.phi <- seq(0, 2*pi, 1)
n.phi <- length(v.phi)
v.k <- seq(-1, +1, 0.5)
n.k <- length(v.k)
k <- rep(v.k, each = n.phi, times = 1)
phi <- rep(v.phi, each = 1, times = n.k)
df <- data.frame(k, phi)
func <- function(x, k) 1/sqrt(1 - (k*sin(x))^2)
df$area <- integrate(func,lower=0, upper=df$phi, k=df$k)
But this generates errors, and I am obviously making a mistake in constructing the new variable df$area. Could someone point me in the right direction?
You can use mapply:
df$area <- mapply(function(phi,k){
integrate(func, lower=0, upper=phi, k=k)$value
}, df$phi, df$k)
However, that still generates an error because some values of k are equal to 1 or -1. For k = +/- 1 the integrand becomes 1/|cos(x)|, which is not integrable across x = pi/2, so the integral diverges whenever phi >= pi/2. Restrict k to -1 < k < 1.
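A minimal sketch of the mapply approach on a grid that avoids the problematic moduli. The grid values and the name df_ok are assumptions made for illustration, not the asker's original grid.

```r
# Integrand of the incomplete elliptic integral of the first kind
func <- function(x, k) 1 / sqrt(1 - (k * sin(x))^2)

# Hypothetical grid with the modulus kept strictly inside (-1, 1)
df_ok <- expand.grid(phi = seq(1, 6, 1), k = c(-0.5, 0, 0.5))

# integrate() needs scalar limits, so call it once per (phi, k) pair
df_ok$area <- mapply(function(phi, k) {
  integrate(func, lower = 0, upper = phi, k = k)$value
}, df_ok$phi, df_ok$k)

head(df_ok)
```

For k = 0 the integrand is identically 1, so the area column should simply reproduce phi, which gives a quick sanity check on the result.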
Note that there's a better way to evaluate this integral: the incomplete elliptic function of the first kind is implemented in the gsl package:
> integrate(func, lower=0, upper=6, k=0.5)$value
[1] 6.458877
> gsl::ellint_F(6, 0.5)
[1] 6.458877
As I said, the integral has no finite value here for k = -1 or k = 1:
> gsl::ellint_F(6, 1)
[1] NaN
> gsl::ellint_F(6, -1)
[1] NaN
> integrate(func, lower=0, upper=6, k=1)
Error in integrate(func, lower = 0, upper = 6, k = 1) :
non-finite function value
I would like to find the root of log(x) = x^2 - 2 using uniroot in R
f <- function(x) (log(x)+2-x^2)
uniroot(f, lower=0, upper=100000000)$root
But this shows the error
Error in uniroot(f, lower = 0, upper = 1e+08) : f() values at end
points not of opposite sign
uniroot requires an interval where the function has opposite signs at the two endpoints (since it uses a variation of the bisection method). It isn't a bad idea to do a quick plot when you don't know where to look:
f <- function(x) (log(x)+2-x^2)
x <- seq(0.0,4,0.01)
y <- f(x)
plot(x,y,ylim = c(-1,1),type = "l")
abline(h=0)
This yields:
From this you can see that there are two roots, one between 0 and 1, and one between 1 and 2:
uniroot(f,interval = c(0,1))$root #returns 0.1379346
uniroot(f,interval = c(1,2))$root #returns 1.564445
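The endpoint condition can also be checked numerically before calling uniroot. This is a small sketch under my own assumptions: 1e-6 stands in for "just above 0", and the tighter tol is an optional choice.

```r
f <- function(x) log(x) + 2 - x^2

# Signs of f at candidate endpoints: a sign change between neighbours
# means uniroot can be applied to that sub-interval.
ends <- c(1e-6, 1, 2)
signs <- sign(f(ends))
print(signs)

# One root per bracketing interval
roots <- c(
  uniroot(f, interval = c(1e-6, 1), tol = 1e-10)$root,
  uniroot(f, interval = c(1, 2), tol = 1e-10)$root
)
print(roots)

# Residuals at the roots should be close to 0
print(f(roots))
```

The original call failed because f is negative at both 0 and 1e8, so that interval contains an even number of sign changes and uniroot has nothing to bracket.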
Given the following function:
f(x) = (1/2*pi)(1/(1+x^2/4))
How do I identify its distribution and write this distribution function in R?
So this is your function right now (hopefully you know how to write an R function; if not, check writing your own function):
f <- function (x) (pi / 2) * (1 / (1 + 0.25 * x ^ 2))
f is defined on (-Inf, Inf), so integrating over this range gives an improper integral. Fortunately, f decays to 0 at the rate x ^ (-2) as x goes to +/- Inf, so the integral is well defined and can be computed:
C <- integrate(f, -Inf, Inf)
# 9.869604 with absolute error < 1e-09
C <- C$value ## extract integral value
# [1] 9.869604
Then you want to normalize f, as we know that a probability density should integrate to 1:
f <- function (x) (pi / 2) * (1 / (1 + 0.25 * x ^ 2)) / C
You can draw its density by:
curve(f, from = -10, to = 10)
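On the "identify the distribution" part: the normalising constant above is pi^2 (about 9.869604), so the normalised density is 1 / (2 * pi * (1 + x^2 / 4)), which is the Cauchy density with location 0 and scale 2. The following sketch checks that numerically against R's built-in dcauchy; the grid x_grid is an arbitrary choice of mine.

```r
# Normalised density from the steps above, with C = pi^2 written out
f <- function(x) (pi / 2) * (1 / (1 + 0.25 * x^2)) / pi^2

# Compare with the built-in Cauchy density (location 0, scale 2)
x_grid <- seq(-10, 10, by = 0.5)
same <- all.equal(f(x_grid), dcauchy(x_grid, location = 0, scale = 2))
print(same)
```

If the comparison holds, the whole d/p/q/r family is available directly: dcauchy for the density, pcauchy for the distribution function, qcauchy for quantiles and rcauchy for random draws, all with scale = 2.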
Now that I have the probability density function, I was wondering how to create a random sample of, say, n = 1000 from this distribution?
Strictly speaking this is a separate question, but I'll answer it here rather than ask you to open a new thread. It is worth doing because it turns out to be subtle.
Compare
set.seed(0); range(simf(1000, 1e-2))
#[1] -56.37246 63.21080
set.seed(0); range(simf(1000, 1e-3))
#[1] -275.3465 595.3771
set.seed(0); range(simf(1000, 1e-4))
#[1] -450.0979 3758.2528
set.seed(0); range(simf(1000, 1e-5))
#[1] -480.5991 8017.3802
So I think e = 1e-2 is reasonable. We could draw samples, make a (scaled) histogram and overlay the density curve:
set.seed(0); x <- simf(1000)
hist(x, prob = TRUE, breaks = 50, ylim = c(0, 0.16))
curve(f, add = TRUE, col = 2, lwd = 2, n = 201)
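Assuming, as the normalised density suggests, that the target is a Cauchy distribution with location 0 and scale 2, an alternative that needs no truncation parameter is inverse-CDF sampling. This is a sketch of a different technique, not the simf used above, and the name sim_cauchy2 is hypothetical.

```r
# Inverse-CDF sampler: if U ~ Uniform(0, 1), then 2 * tan(pi * (U - 0.5))
# has density 1 / (2 * pi * (1 + x^2 / 4)), i.e. Cauchy(location 0, scale 2).
sim_cauchy2 <- function(n) {
  u <- runif(n)
  2 * tan(pi * (u - 0.5))
}

set.seed(0)
x <- sim_cauchy2(1000)

# The same uniforms pushed through R's built-in quantile function
set.seed(0)
x_ref <- qcauchy(runif(1000), location = 0, scale = 2)

print(all.equal(x, x_ref, tolerance = 1e-6))
range(x)
```

Since the Cauchy distribution has very heavy tails, a wide and seed-dependent range is expected here; rcauchy(1000, scale = 2) would be the most direct built-in equivalent.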