Generating data for a functional regression problem in R

I have a problem with generating data and have no idea how to solve it. All the information is provided in the photo: Problem.
I think that X_i(t) should be 200 x 100 in both cases if t runs from 0 to 1 (length = 100). Furthermore, the coefficient matrix should be 200 x 4 for the polynomial case and 200 x 5 for the Fourier case. But I have no idea how to start.
Here is some code. I have already defined my betas, but I am stuck on generating X_i(t).
t <- seq(0, 1, length = 100)
beta_1t <- rep(0, 100)
plot(t, beta_1t, type = "l")
beta_2t <- (t >= 0 & t < 0.342) * ((t - 0.5)^2 - 0.025) +
(t >= 0.342 & t <= 0.658) * 0 +
(t > 0.658 & t <= 1) * (-(t - 0.5)^2 + 0.025)
plot(t, beta_2t, type = "l")
beta_3t <- t^3 - 1.6 * t^2 + 0.76 * t + 1
plot(t, beta_3t, type = "l")
poly_c <- matrix(rnorm(n = 800, mean = 0, sd = 1), ncol = 4)
four_c <- matrix(rnorm(n = 1000, mean = 0, sd = 1), ncol = 5)
As I mentioned before, there should be (X_i(t), Y_i(t)) samples. Here i = 1, 2, ..., 200; t from [0, 1] (length = 100).
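Since the problem statement is only available as an image, the following is a minimal sketch under stated assumptions, not the required construction. It assumes a monomial basis 1, t, t^2, t^3 for the polynomial case, a Fourier basis made of a constant plus two sine/cosine pairs for the Fourier case, and a concurrent model Y_i(t) = beta(t) * X_i(t) + noise. The names tgrid, poly_basis, four_basis, X_poly, X_four, Y_poly and the noise level 0.1 are illustrative choices.

```r
set.seed(1)
n <- 200
tgrid <- seq(0, 1, length = 100)   # named tgrid so it does not mask base::t()

# Assumed basis functions evaluated on the grid (one column per basis function)
poly_basis <- cbind(1, tgrid, tgrid^2, tgrid^3)               # 100 x 4
four_basis <- cbind(1,
                    sin(2 * pi * tgrid), cos(2 * pi * tgrid),
                    sin(4 * pi * tgrid), cos(4 * pi * tgrid)) # 100 x 5

# Random coefficients, one row per curve (same shapes as in the question)
poly_c <- matrix(rnorm(n * 4), ncol = 4)   # 200 x 4
four_c <- matrix(rnorm(n * 5), ncol = 5)   # 200 x 5

# Each row of X is one curve X_i(t) evaluated on tgrid
X_poly <- poly_c %*% t(poly_basis)         # 200 x 100
X_four <- four_c %*% t(four_basis)         # 200 x 100

# Assumed concurrent model with beta_3(t) from the question and Gaussian noise
beta_3t <- tgrid^3 - 1.6 * tgrid^2 + 0.76 * tgrid + 1
eps <- matrix(rnorm(n * length(tgrid), sd = 0.1), nrow = n)
Y_poly <- sweep(X_poly, 2, beta_3t, `*`) + eps   # 200 x 100
```

If the intended model is different (for example a scalar response obtained by integrating beta(t) * X_i(t) over t), only the last step would change; the construction of X_i(t) from a coefficient matrix and a basis matrix stays the same.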

Related

How to define a function of `f_n` - chi-square and use `uniroot` to find a confidence interval?

I want to get a 95% confidence interval for the following question.
I have written the function f_n in my R code. I first draw a random sample of 100 from a normal distribution, then define the function h for lambda, and from that I get f_n. My question is how to define a function of f_n - chi-square and use `uniroot` to find the confidence interval.
# I first get 100 samples
set.seed(201111)
x=rlnorm(100,0,2)
Based on the answer by @RuiBarradas, I tried the following code.
set.seed(2011111)
# I define function h, and use uniroot function to find lambda
h <- function(lam, n)
{
sum((x - theta)/(1 + lam*(x - theta)))
}
# sample size
n <- 100
# the parameter of interest must be a value in [1, 12],
#true_theta<-1
#true_sd<- exp(2)
#x <- rnorm(n, mean = true_theta, sd = true_sd)
x=rlnorm(100,0,2)
xmax <- max(x)
xmin <- min(x)
theta_seq = seq(from = 1, to = 12, by = 0.01)
f_n <- rep(NA, length(theta_seq))
for (i in seq_along(theta_seq))
{
theta <- theta_seq[i]
lambdamin <- (1/n-1)/(xmax - theta)
lambdamax <- (1/n-1)/(xmin - theta)
lambda = uniroot(h, interval = c(lambdamin, lambdamax), n = n)$root
f_n[i] = -sum(log(1 + lambda*(x - theta)))
}
j <- which.max(f_n)
max_fn <- f_n[j]
mle_theta <- theta_seq[j]
plot(theta_seq, f_n, type = "l",
main = expression(Estimated ~ theta),
xlab = expression(Theta),
ylab = expression(f[n]))
points(mle_theta, f_n[j], pch = 19, col = "red")
segments(
x0 = c(mle_theta, xmin),
y0 = c(min(f_n)*2, max_fn),
x1 = c(mle_theta, mle_theta),
y1 = c(max_fn, max_fn),
col = "red",
lty = "dashed"
)
I got the following plot of f_n.
For 95% CI, I try
LR <- function(theta, lambda)
{
2*sum(log(1 + lambda*(x - theta))) - qchisq(0.95, df = 1)
}
lambdamin <- (1/n-1)/(xmax - mle_theta)
lambdamax <- (1/n-1)/(xmin - mle_theta)
lambda <- uniroot(h, interval = c(lambdamin, lambdamax), n = n)$root
uniroot(LR, c(xmin, mle_theta), lambda = lambda)$root
The result is 0.07198144. Then the logarithm is log(0.07198144)=-2.631347.
But the following code returns NA.
uniroot(LR, c(mle_theta, xmax), lambda = lambda)$root
So the 95% CI would be theta >= -2.631347.
But a 95% CI should be a closed interval...
Here is a solution.
First of all, the data generation code is wrong: the parameter theta lies in the interval [1, 12], but the data are generated with rnorm(., mean = 0, .). I changed this to true_theta = 5.
set.seed(2011111)
# I define function h, and use uniroot function to find lambda
h <- function(lam, n)
{
sum((x - theta)/(1 + lam*(x - theta)))
}
# sample size
n <- 100
# the parameter of interest must be a value in [1, 12],
true_theta <- 5
true_sd <- 2
x <- rnorm(n, mean = true_theta, sd = true_sd)
xmax <- max(x)
xmin <- min(x)
theta_seq <- seq(from = xmin + .Machine$double.eps^0.5,
to = xmax - .Machine$double.eps^0.5, by = 0.01)
f_n <- rep(NA, length(theta_seq))
for (i in seq_along(theta_seq))
{
theta <- theta_seq[i]
lambdamin <- (1/n-1)/(xmax - theta)
lambdamax <- (1/n-1)/(xmin - theta)
lambda = uniroot(h, interval = c(lambdamin, lambdamax), n = n)$root
f_n[i] = -sum(log(1 + lambda*(x - theta)))
}
j <- which.max(f_n)
max_fn <- f_n[j]
mle_theta <- theta_seq[j]
plot(theta_seq, f_n, type = "l",
main = expression(Estimated ~ theta),
xlab = expression(Theta),
ylab = expression(f[n]))
points(mle_theta, f_n[j], pch = 19, col = "red")
segments(
x0 = c(mle_theta, xmin),
y0 = c(min(f_n)*2, max_fn),
x1 = c(mle_theta, mle_theta),
y1 = c(max_fn, max_fn),
col = "red",
lty = "dashed"
)
LR <- function(theta, lambda)
{
2*sum(log(1 + lambda*(x - theta))) - qchisq(0.95, df = 1)
}
lambdamin <- (1/n-1)/(xmax - mle_theta)
lambdamax <- (1/n-1)/(xmin - mle_theta)
lambda <- uniroot(h, interval = c(lambdamin, lambdamax), n = n)$root
uniroot(LR, c(xmin, mle_theta), lambda = lambda)$root
#> [1] 4.774609
Created on 2022-03-25 by the reprex package (v2.0.1)
The one-sided CI95 is theta >= 4.774609.
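The LR function above keeps lambda fixed at the value found for mle_theta. A possible alternative, which is not the method of the answer above, is to re-solve for lambda at every candidate theta and then look for the two points where the resulting statistic crosses the chi-square quantile. The sketch below assumes the same simulated data as the answer; el_stat and its arguments are hypothetical names introduced here.

```r
set.seed(2011111)
n <- 100
x <- rnorm(n, mean = 5, sd = 2)

# Statistic 2 * sum(log(1 + lambda * (x - theta))), with lambda re-solved
# for each theta. Requires min(x) < theta < max(x).
el_stat <- function(theta, x) {
  n <- length(x)
  d <- x - theta
  # Bounds that keep every 1 + lambda * d at or above 1/n
  lo <- (1 / n - 1) / max(d)
  hi <- (1 / n - 1) / min(d)
  g <- function(lam) sum(d / (1 + lam * d))
  lam <- uniroot(g, interval = c(lo, hi), tol = 1e-10)$root
  2 * sum(log(1 + lam * d))
}

crit <- qchisq(0.95, df = 1)
theta_hat <- mean(x)   # the statistic is zero here, since lambda = 0 solves g

# One root on each side of theta_hat gives a closed interval
lower <- uniroot(function(th) el_stat(th, x) - crit,
                 interval = c(min(x) + 0.01, theta_hat), tol = 1e-8)$root
upper <- uniroot(function(th) el_stat(th, x) - crit,
                 interval = c(theta_hat, max(x) - 0.01), tol = 1e-8)$root
c(lower = lower, estimate = theta_hat, upper = upper)
```

Because lambda is recomputed inside el_stat, the statistic grows on both sides of the sample mean, so both calls to uniroot see a sign change on their interval.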

R nls(); Error in nlsModel: singular gradient matrix at initial parameter estimates

I receive an error from the nls function in R. I searched similar questions, but none of them solved this problem. For example, I tried nlsLM from the 'minpack.lm' library, and it also fails. So I have to ask for help here. The code follows:
tt = c(10, 30, 50, 90, 180, 360, 720, 1440, 2880, 4320, 8640, 12960)
x = c(
1.53901e-06,
1.22765e-06,
1.11200e-06,
9.25185e-07,
8.71809e-07,
8.80705e-07,
8.36225e-07,
7.82849e-07,
8.18433e-07,
6.04928e-07,
3.46944e-07,
4.44800e-07
)
y = c(
3.81639e-06,
5.00623e-06,
4.62815e-06,
5.10631e-06,
4.48359e-06,
3.30487e-06,
2.64879e-06,
2.13727e-06,
8.02865e-07,
1.91487e-06,
3.73855e-06,
2.32631e-06
)
nt = length(tt)
L0 = 0.005
y0 = 0.000267681
model = function(K, Kd, k1) {
eta = 5 / (4 * Kd + 40)
eta1 = 1 - eta
eta1_seq = eta1 ^ c(0:(nt - 1))
Lt = L0 * eta * cumsum(eta1_seq)
b = K * x - K * Lt + 1
L = (-b + sqrt(b ^ 2.0 + 4 * K * Lt)) / (2 * K)
cx = x * K * L / (K * L + 1)
qx = Kd * cx
q1 = y0 * (1 - k1 * sqrt(tt))
y = qx + q1
return(y)
}
fit <- nls(
y ~ model(K, Kd, k1),
start = list(K = 1e+15,
Kd = 10,
k1 = 1e-5),
lower = c(1e+13, 1, 1e-10),
upper = c(1e+20, 200, 1e-3),
algorithm = "port"
)
Thanks in advance for your help!
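One common cause of this error is a parameter that barely changes the fitted values at the start point. Whether that is the whole story here is not established in this thread; the sketch below is only one way to probe it. It builds a finite-difference Jacobian with relative steps at the start values and compares the column norms. The helper scaled_jac and the step size h are assumptions introduced for illustration.

```r
tt <- c(10, 30, 50, 90, 180, 360, 720, 1440, 2880, 4320, 8640, 12960)
x <- c(1.53901e-06, 1.22765e-06, 1.11200e-06, 9.25185e-07, 8.71809e-07,
       8.80705e-07, 8.36225e-07, 7.82849e-07, 8.18433e-07, 6.04928e-07,
       3.46944e-07, 4.44800e-07)
nt <- length(tt)
L0 <- 0.005
y0 <- 0.000267681

# Same model as in the question
model <- function(K, Kd, k1) {
  eta <- 5 / (4 * Kd + 40)
  eta1_seq <- (1 - eta)^(0:(nt - 1))
  Lt <- L0 * eta * cumsum(eta1_seq)
  b <- K * x - K * Lt + 1
  L <- (-b + sqrt(b^2 + 4 * K * Lt)) / (2 * K)
  cx <- x * K * L / (K * L + 1)
  Kd * cx + y0 * (1 - k1 * sqrt(tt))
}

start <- c(K = 1e15, Kd = 10, k1 = 1e-5)

# Forward differences with a relative step: column j approximates
# par_j * d(model)/d(par_j), which puts all parameters on a comparable scale
scaled_jac <- function(par, h = 1e-3) {
  f0 <- do.call(model, as.list(par))
  sapply(names(par), function(nm) {
    p <- par
    p[nm] <- p[nm] * (1 + h)
    (do.call(model, as.list(p)) - f0) / h
  })
}

J <- scaled_jac(start)
col_norm <- sqrt(colSums(J^2))
print(col_norm)    # a column near zero means that parameter has almost no effect
print(svd(J)$d)    # a tiny smallest singular value indicates near rank deficiency
```

If one column is negligible relative to the others, the model is close to unidentifiable in that parameter at the start point, and fixing that parameter or reparameterizing the model may be worth trying.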

Why is my Monte Carlo Integration wrong by a factor of 2?

I am trying to integrate the following function using Monte Carlo integration. The region I want to integrate over is given by x <- seq(0, 1, by = 0.01) and y <- seq(0, 1, by = 0.01).
my.f <- function(x, y){
result = x^2 + sin(x) + exp(cos(y))
return(result)
}
I calculated the integral using the cubature package.
library(cubature)
library(plotly)
# Rewriting the function, so it can be integrated
cub.function <- function(x){
result = x[1]^2 + sin(x[1]) + exp(cos(x[2]))
return(result)
}
cub.integral <- adaptIntegrate(f = cub.function, lowerLimit = c(0,0), upperLimit = c(1,1))
The result is 3.134606. But when I use my Monte Carlo Integration Code, see below, my result is about 1.396652. My code is wrong by more than a factor of 2!
What I did:
Since I need a volume to conduct a Monte Carlo Integration, I calculated the function values on the mentioned interval. This will give me an estimation of the maximum and minimum of the function.
# My data range
x <- seq(0, 1, by = 0.01)
y <- seq(0, 1, by = 0.01)
# The matrix, where I save the results
my.f.values <- matrix(0, nrow = length(x), ncol = length(y))
# Calculation of the function values
for(i in 1:length(x)){
for(j in 1:length(y)){
my.f.values[i,j] <- my.f(x = x[i], y = y[j])
}
}
# The maximum and minimum of the function values
max(my.f.values)
min(my.f.values)
# Plotting the surface, but this is not necessary
plot_ly(y = x, x = y, z = my.f.values) %>% add_surface()
So, the volume that we need is simply the maximum of the function values, since 1 * 1 * 4.559753 is simply 4.559753.
# Now, the Monte Carlo Integration
# I found the code online and modified it a bit.
monte = function(x){
  tests = rep(0,x)
  hits = 0
  for(i in 1:x){
    y = c(runif(2, min = 0, max = 1), # y[1] is x; y[2] is y
          runif(1, min = 0, max = max(my.f.values))) # y[3] is z
    if(y[3] < y[1]**2+sin(y[1])*exp(cos(y[2]))){
      hits = hits + 1
    }
    prop = hits / i
    est = prop * max(my.f.values)
    tests[i] = est
  }
  return(tests)
}
size = 10000
res = monte(size)
plot(res, type = "l")
lines(x = 1:size, y = rep(cub.integral$integral, size), col = "red")
So, the result is completely wrong. But if I change the function a bit, it suddenly works.
monte = function(x){
tests = rep(0,x)
hits = 0
for(i in 1:x){
x = runif(1)
y = runif(1)
z = runif(1, min = 0, max = max(my.f.values))
if(z < my.f(x = x, y = y)){
hits = hits + 1
}
prop = hits / i
est = prop * max(my.f.values)
tests[i] = est
}
return(tests)
}
size = 10000
res = monte(size)
plot(res, type = "l")
lines(x = 1:size, y = rep(cub.integral$integral, size), col = "red")
Can somebody explain why the result suddenly changes? To me, both functions seem to do the exact same thing.
In your (first) code for monte, this line is in error:
y[3] < y[1]**2+sin(y[1])*exp(cos(y[2]))
Given your definition of my.f, it should surely be
y[3] < y[1]**2 + sin(y[1]) + exp(cos(y[2]))
Or..., given that you shouldn't be repeating yourself unnecessarily:
y[3] < my.f(y[1], y[2])
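As a cross-check that does not depend on the hit-or-miss comparison at all, the integral can also be estimated with the sample-mean form of Monte Carlo integration: average my.f over uniform draws on the unit square. This is a different technique from the one in the question, shown here as a minimal sketch; the sample size and seed are arbitrary.

```r
set.seed(42)

my.f <- function(x, y) x^2 + sin(x) + exp(cos(y))

n <- 1e5
u <- runif(n)          # x coordinates, uniform on [0, 1]
v <- runif(n)          # y coordinates, uniform on [0, 1]
vals <- my.f(u, v)     # my.f is vectorized, so no loop is needed

# The area of the integration region is 1, so the integral estimate is the mean
est <- mean(vals)
se <- sd(vals) / sqrt(n)   # Monte Carlo standard error of the estimate
c(estimate = est, std_error = se)
```

The estimate can be compared with the cubature result of 3.134606 quoted in the question. This version also avoids having to bound the function by its maximum first.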

How to speed up the process of nonlinear optimization in R

Consider the following example of a nonlinear optimization problem. The procedure is too slow to use in simulation studies: in my case, a single replication takes 2.5 hours. How can I speed it up?
library(mvtnorm)
library(alabama)
n = 200
X <- matrix(0, nrow = n, ncol = 2)
X[,1:2] <- rmvnorm(n = n, mean = c(0,0), sigma = matrix(c(1,1,1,4),
ncol = 2))
x0 = matrix(c(X[1,1:2]), nrow = 1)
y0 = x0 - 0.5 * log(n) * (colMeans(X) - x0)
X = rbind(X, y0)
x01 = y0[1]
x02 = y0[2]
x1 = X[,1]
x2 = X[,2]
pInit = matrix(rep(0.1, n + 1), nrow = n + 1)
outopt = list(kkt2.check=FALSE, "trace" = FALSE)
f1 <- function(p) sum(sqrt(pmax(0, p)))/sqrt(n+1)
heq1 <- function(p) c(sum(x1 * p) - x01, sum(x2 * p) - x02, sum(p) - 1)
hin1 <- function(p) p - 1e-06
sol <- alabama::auglag(pInit, fn = function(p) -f1(p),
heq = heq1, hin = hin1,
control.outer = outopt)
-1 * sol$value
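One thing that may help, although no timing was measured here, is to give the optimizer analytic derivatives, so that it does not have to approximate a gradient over n + 1 parameters by finite differences. The sketch below writes down the gradient of the objective and the Jacobians of the constraints, then checks the gradient numerically. It uses placeholder data from rnorm instead of mvtnorm::rmvnorm so that it runs in base R. The names neg_f1, neg_f1_gr, heq1_jac and hin1_jac are introduced for illustration.

```r
set.seed(1)
n <- 200
# Placeholder data (assumption): independent normals instead of rmvnorm
X <- cbind(rnorm(n), rnorm(n, sd = 2))
x0 <- X[1, , drop = FALSE]
y0 <- x0 - 0.5 * log(n) * (colMeans(X) - x0)
X <- rbind(X, y0)
x1 <- X[, 1]; x2 <- X[, 2]
x01 <- y0[1]; x02 <- y0[2]

f1 <- function(p) sum(sqrt(pmax(0, p))) / sqrt(n + 1)
neg_f1 <- function(p) -f1(p)
heq1 <- function(p) c(sum(x1 * p) - x01, sum(x2 * p) - x02, sum(p) - 1)
hin1 <- function(p) p - 1e-06

# Analytic derivatives (the gradient is valid for p > 0, which hin1 enforces)
neg_f1_gr <- function(p) -0.5 / sqrt(p) / sqrt(n + 1)
heq1_jac <- function(p) rbind(x1, x2, rep(1, n + 1))   # 3 x (n + 1), constant
hin1_jac <- function(p) diag(n + 1)                    # identity

# Central finite-difference check of the gradient at an interior point
p_test <- rep(0.1, n + 1)
num_grad <- sapply(seq_along(p_test), function(j) {
  e <- replace(numeric(n + 1), j, 1e-6)
  (neg_f1(p_test + e) - neg_f1(p_test - e)) / 2e-6
})
max(abs(num_grad - neg_f1_gr(p_test)))   # should be close to zero

# The derivatives could then be passed through auglag's gr, heq.jac and
# hin.jac arguments (not run here):
# sol <- alabama::auglag(p_test, fn = neg_f1, gr = neg_f1_gr,
#                        heq = heq1, heq.jac = heq1_jac,
#                        hin = hin1, hin.jac = hin1_jac,
#                        control.outer = list(kkt2.check = FALSE, trace = FALSE))
```

Since the constraints are linear in p, their Jacobians are constant matrices, which makes them cheap to supply.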

Is there anything wrong with nlminb in R?

I am trying to solve a minimization problem in R with nlminb as part of a statistical problem. However, the solution returned by nlminb does not agree with the plot of the function I am trying to minimize. This is the R code of the objective function:
library(cubature)
Objective_Function <- function(p0){
F2 <- function(x){
u.s2 <- x[1]
u.c0 <- x[2]
u.k0 <- x[3]
s2 <- u.s2^(-1) - 1
c0 <- u.c0^(-1) - 1
k0 <- u.k0/p0
L <- 1/2 * c0 * s2 - 1/c0 * log(1 - k0 * p0)
A <- 1 - pnorm(L, mean = 1, sd = 1)
A <- A * dgamma(k0, shape = 1, rate = 1)
A <- A * dgamma(c0, shape = 1, rate = 1)
A <- A * dgamma(s2, shape = 1, rate = 1)
A * u.s2^(-2) * u.c0^(-2) * 1/p0
}
Pr <- cubature::adaptIntegrate(f = F2,
lowerLimit = rep(0, 3),
upperLimit = rep(1, 3))$integral
A <- 30 * Pr * (p0 - 0.1)
B <- 30 * Pr * (1 - Pr) * (p0 - 0.1)^2
0.4 * B + (1 - 0.4) * (-A)
}
Following the R-command
curve(Objective_Function, 0.1, 4)
one observes a critical point close to 2. However, when one executes
nlminb(start = runif(1, min = 0.1, max = 4),
objective = Objective_Function,
lower = 0.1, upper = 4)$par
the minimum of the function takes place at the point 0.6755844.
I was wondering if you could tell me where my mistake is, please.
Is there any reliable R-command to solve optimization problems?
If this is a very basic question, I apologize.
Thank you for your help.
The problem is not nlminb() but the fact that you have not provided a vectorized function in curve(). You can get the correct figure using the following code, from which you see that nlminb() indeed finds the minimum:
min_par <- nlminb(start = runif(1, min = 0.1, max = 4),
objective = Objective_Function,
lower = 0.1, upper = 4)$par
vec_Objective_Function <- function (x) sapply(x, Objective_Function)
curve(vec_Objective_Function, 0.1, 4)
abline(v = min_par, lty = 2, col = 2)
In addition, for univariate optimization you can also use function optimize(), i.e.,
optimize(Objective_Function, c(0.1, 4))
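The vectorization point can be illustrated on a smaller case. curve() evaluates its function on a whole vector of x values in one call, so an objective that wraps a scalar-only computation such as integrate() needs a vectorized wrapper before it is plotted. The toy objective f below is hypothetical and much simpler than Objective_Function; its integral equals 1/3 - p + p^2, which has its minimum at p = 0.5.

```r
# Toy objective: scalar p in, scalar out (integrate() returns a single value)
f <- function(p) integrate(function(x) (x - p)^2, lower = 0, upper = 1)$value

# Vectorize() calls f once per element, like the sapply() wrapper above
f_vec <- Vectorize(f)

grid <- seq(0, 1, by = 0.25)
vals <- f_vec(grid)                 # one value per grid point
exact <- 1 / 3 - grid + grid^2      # closed form, for comparison
cbind(grid, vals, exact)

# The optimizer itself only ever passes a scalar, so it can use f directly
opt <- optimize(f, interval = c(0, 1))
opt$minimum
```

With the wrapper in place, curve(f_vec, 0, 1) draws the correct parabola, and the optimizer result can be marked on it with abline(v = opt$minimum).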
