How do I perform Non Linear Least Squares in R with a pre determined lag structure - r

Suppose I want to estimate the parameters of the following model:
$y_t = beta0 (sum_{i=1}^p w(delta;i) x_{t-i})$.
Latex version of the equation: https://i.stack.imgur.com/POOlD.png
Where y_t and x_{t-i} are known data points, wdelta follows an exponential Almon lag structure with two parameters delta1 and delta2(see image). And beta0 is the common parameter.
Generating some data for x and y
y <- seq(1:10)
x <- rnorm(10,2,5)
The literature suggests estimating the model parameters using NLS and the Gaussian Newton Method. R does have a function gaussNewton however I am not sure how to use this. How do I approach the estimation of the parameters beta0,delta1 and delta2?
Wikipedia suggest: https://en.wikipedia.org/wiki/Non-linear_least_squares, however I feel like this is not appropriate in this case.
The nls function in R is unable to deal with predefined lag structures so this is not an option either. Maybe I could write out the function in the form of the sum of squared residuals and use the optim function? Another option could be to use the nlm function.
nonls <- function(delta1,delta2,i,p) {
z <- exp(delta1 * i + delta2 *i)
wdelta[i] <- exp(delta1 * i + delta2 *i)/sum(z[1:i])
ssr <- (y[i]- (beta0 * wdelta[i] * x[i:p]))^2
}
optim(ssr)
I look forward to your suggestions.

Related

Expected value command R and JAGS

Assuming this ís my Bayesian model, how can i calculate the expected value of my Weibull distribution? Is there a command for finding the expected value of the Weibull distribution in R and JAGS? Thanks
model{
#likelihood function
for (i in 1:n)
{
t[i] ~ dweib(v,lambda)#MTBF
}
#Prior for MTBF
v ~ dgamma(0.0001, 0.0001)
lambda ~ dgamma(0.0001, 0.0001)
}
#inits
list(v=1, lambda=1,mu=0,tau=1)
#Data
list(n=10, t=c(5.23333333,8.95,8.6,230.983333,1.55,85.1,193.033333,322.966667,306.716667,1077.8)
The mean, or expected value, of the Weibull distribution using the moment of methods with parameters v and lambda, is:
lambda * Gamma(1 + 1/v)
JAGS does not have the Gamma function, but we can use a work around with a
function that is does have: logfact. You can add this line to your code and track the derived parameter exp_weibull.
exp_weibull <- lambda * exp(logfact(1/v))
Gamma is just factorial(x - 1), so the mean simplifies a bit. I illustrate
below with some R functions how this derivation is the same.
lambda <- 5
v <- 2
mu_traditional <- lambda * gamma(1 + 1/v)
mu_logged <- lambda * exp(lfactorial(1/v))
identical(mu_traditional, mu_logged)
[1] TRUE
EDIT:
It seems like JAGS also has the log of the Gamma distribution as well: loggam. Thus, another solution would be
exp_weibull <- lambda * exp(loggam(1 + 1/v))
My understanding is that the parameterization of the Weibull distribution used by JAGS is different from that used by dweibull in R. I believe the JAGS version uses shape, v and rate lambda with an expected value of lambda^{-1/v}*gamma(1+1/v). Thus, I've implemented the expected value in JAGS as lambda^(-1/v)*exp(loggam(1+(1/v))). Interested if others disagree, admittedly I've had a tough time tracking which parameterization is used and how the expected value is formulated, especially give some of the interchangeability in symbols used for different parameters in different formulations!

Fitting experimental data points to different cumulative distributions using R

I am new to programming and using R software, so I would really appreciate your feedback to the current problem that I am trying to solve.
So, I have to fit a cumulative distribution with some function (two/three parameter function). This seems to be pretty straight-forward task, but I've been buzzing around this now for some time.
Let me show you what are my variables:
x=c(0.01,0.011482,0.013183,0.015136,0.017378,0.019953,0.022909,0.026303,0.0302,0.034674,0.039811,0.045709,0.052481,0.060256,0.069183,0.079433,0.091201,0.104713,0.120226,0.138038,0.158489,0.18197,0.20893,0.239883,0.275423,0.316228,0.363078,0.416869,0.47863,0.549541,0.630957,0.724436,0.831764,0.954993,1.096478,1.258925,1.44544,1.659587,1.905461,2.187762,2.511886,2.884031,3.311311,3.801894,4.365158,5.011872,5.754399,6.606934,7.585776,8.709636,10,11.481536,13.182567,15.135612,17.378008,19.952623,22.908677,26.30268,30.199517,34.673685,39.810717,45.708819,52.480746,60.255959,69.183097,79.432823,91.201084,104.712855,120.226443,138.038426,158.489319,181.970086,208.929613,239.883292,275.42287,316.227766,363.078055,416.869383,478.630092,549.540874,630.957344,724.43596,831.763771,954.992586,1096.478196)
y=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.00044816,0.00127554,0.00221488,0.00324858,0.00438312,0.00559138,0.00686054,0.00817179,0.00950625,0.01085188,0.0122145,0.01362578,0.01514366,0.01684314,0.01880564,0.02109756,0.0237676,0.02683182,0.03030649,0.0342276,0.03874555,0.04418374,0.05119304,0.06076553,0.07437854,0.09380666,0.12115065,0.15836926,0.20712933,0.26822017,0.34131335,0.42465413,0.51503564,0.60810697,0.69886817,0.78237651,0.85461023,0.91287236,0.95616228,0.98569093,0.99869001,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999)
This is the plot where I set up x-axis as log:
After some research, I have tried with Sigmoid function, as found on one of the posts (I can't add link since my reputation is not high enough). This is the code:
# sigmoid function definition
sigmoid = function(params, x) {
params[1] / (1 + exp(-params[2] * (x - params[3])))
}
# fitting code using nonlinear least square
fitmodel <- nls(y~a/(1 + exp(-b * (x-c))), start=list(a=1,b=.5,c=25))
# get the coefficients using the coef function
params=coef(fitmodel)
# asigning to y2 sigmoid function
y2 <- sigmoid(params,x)
# plotting y2 function
plot(y2,type="l")
# plotting data points
points(y)
This led me to some good fitting results (I don't know how to quantify this). But, when I look at the at the plot of Sigmuid fitting function I don't understand why is the S shape now happening in the range of x-values from 40 until 7 (looking at the S shape should be in x-values from 10 until 200).
Since I couldn't explain this behavior, I thought of trying Weibull equation for fitting, but so far I can't make the code running.
To sum up:
Do you have any idea why is the Sigmoid giving me that weird fitting?
Do you know any better two or three parameter equation for this fitting approach?
How could I determine the goodness of fit? Something like r^2?
# Data
df <- data.frame(x=c(0.01,0.011482,0.013183,0.015136,0.017378,0.019953,0.022909,0.026303,0.0302,0.034674,0.039811,0.045709,0.052481,0.060256,0.069183,0.079433,0.091201,0.104713,0.120226,0.138038,0.158489,0.18197,0.20893,0.239883,0.275423,0.316228,0.363078,0.416869,0.47863,0.549541,0.630957,0.724436,0.831764,0.954993,1.096478,1.258925,1.44544,1.659587,1.905461,2.187762,2.511886,2.884031,3.311311,3.801894,4.365158,5.011872,5.754399,6.606934,7.585776,8.709636,10,11.481536,13.182567,15.135612,17.378008,19.952623,22.908677,26.30268,30.199517,34.673685,39.810717,45.708819,52.480746,60.255959,69.183097,79.432823,91.201084,104.712855,120.226443,138.038426,158.489319,181.970086,208.929613,239.883292,275.42287,316.227766,363.078055,416.869383,478.630092,549.540874,630.957344,724.43596,831.763771,954.992586,1096.478196),
y=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.00044816,0.00127554,0.00221488,0.00324858,0.00438312,0.00559138,0.00686054,0.00817179,0.00950625,0.01085188,0.0122145,0.01362578,0.01514366,0.01684314,0.01880564,0.02109756,0.0237676,0.02683182,0.03030649,0.0342276,0.03874555,0.04418374,0.05119304,0.06076553,0.07437854,0.09380666,0.12115065,0.15836926,0.20712933,0.26822017,0.34131335,0.42465413,0.51503564,0.60810697,0.69886817,0.78237651,0.85461023,0.91287236,0.95616228,0.98569093,0.99869001,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999))
# sigmoid function definition
sigmoid = function(x, a, b, c) {
a * exp(-b * exp(-c * x))
}
# fitting code using nonlinear least square
fitmodel <- nls(y ~ sigmoid(x, a, b, c), start=list(a=1,b=.5,c=-2), data = df)
# plotting y2 function
plot(df$x, predict(fitmodel),type="l", log = "x")
# plotting data points
points(df)
The function I used is the Gompertz function and this blog post explains why R² shouldn't be used with nonlinear fits and offers an alternative.
After going through different functions and different data-sets I have found the best solution that gives the answers to all of my questions posted.
The code is as it follows for the data-set stated in question:
df <- data.frame(x=c(0.01,0.011482,0.013183,0.015136,0.017378,0.019953,0.022909,0.026303,0.0302,0.034674,0.039811,0.045709,0.052481,0.060256,0.069183,0.079433,0.091201,0.104713,0.120226,0.138038,0.158489,0.18197,0.20893,0.239883,0.275423,0.316228,0.363078,0.416869,0.47863,0.549541,0.630957,0.724436,0.831764,0.954993,1.096478,1.258925,1.44544,1.659587,1.905461,2.187762,2.511886,2.884031,3.311311,3.801894,4.365158,5.011872,5.754399,6.606934,7.585776,8.709636,10,11.481536,13.182567,15.135612,17.378008,19.952623,22.908677,26.30268,30.199517,34.673685,39.810717,45.708819,52.480746,60.255959,69.183097,79.432823,91.201084,104.712855,120.226443,138.038426,158.489319,181.970086,208.929613,239.883292,275.42287,316.227766,363.078055,416.869383,478.630092,549.540874,630.957344,724.43596,831.763771,954.992586,1096.478196),
y=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.00044816,0.00127554,0.00221488,0.00324858,0.00438312,0.00559138,0.00686054,0.00817179,0.00950625,0.01085188,0.0122145,0.01362578,0.01514366,0.01684314,0.01880564,0.02109756,0.0237676,0.02683182,0.03030649,0.0342276,0.03874555,0.04418374,0.05119304,0.06076553,0.07437854,0.09380666,0.12115065,0.15836926,0.20712933,0.26822017,0.34131335,0.42465413,0.51503564,0.60810697,0.69886817,0.78237651,0.85461023,0.91287236,0.95616228,0.98569093,0.99869001,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999))
library(drc)
fm <- drm(y ~ x, data = df, fct = G.3()) #The Gompertz model G.3()
plot(fm)
#Gompertz Coefficients and residual standard error
summary(fm)
The plot after fitting

R gmm package using exactly identified moment conditions

For exactly identified moments, GMM results should be the same regardless of initial starting values. This doesn't appear to be the case however.
library(gmm)
data(Finance)
x <- data.frame(rm=Finance[1:500,"rm"], rf=Finance[1:500,"rf"])
# want to solve for coefficients theta[1], theta[2] in exactly identified
# system
g <- function(theta, x)
{
m.1 <- x[,"rm"] - theta[1] - theta[2]*x[,"rf"]
m.z <- (x[,"rm"] - theta[1] - theta[2]*x[,"rf"])*x[,"rf"]
f <- cbind(m.1, m.z)
return(f)
}
# gmm coefficient result should be identical to ols regressing rm on rf
# since two moments are E[u]=0 and E[u*rf]=0
model.lm <- lm(rm ~ rf, data=x)
model.lm
# gmm is consistent with lm given correct starting values
summary(gmm(g, x, t0=model.lm$coefficients))
# problem is that using different starting values leads to different
# coefficients
summary(gmm(g, x, t0=rep(0,2)))
Is there something wrong with my setup?
The gmm package author Pierre Chausse was kind enough to respond to my inquiry.
For linear models, he suggests using the formula approach:
gmm(rm ~ rf, ~rf, data=x)
For non-linear models, he emphasizes that the starting values are indeed critical. In the case of exactly identified models, he suggests setting the fnscale to a small number to force the optim minimizer to converge closer to 0. Also, he thinks the BFGS algorithm works better with GMM.
summary(gmm(g, x, t0=rep(0,2), method = "BFGS", control=list(fnscale=1e-8)))
Both solutions work for this example. Thanks Pierre!

How to minimise two or more equations in R

I am trying to find an iterative way to solve these two M-estimater equations with two unknown parameters.
For each patient, we measure his blood pressure twice $Y_{i1}$ and $Y_{i2}$ and note his alcohol consumption $X_i$. We have given the following M-estimators and have proven these give unbiased results:
$$\sum\limits_{i=1}^n\sum\limits_{j=1}^2\big(Y_{ij}-\beta_0-\beta_1X_i\big)=0\quad\mbox{and }
\sum\limits_{i=1}^n\sum\limits_{j=1}^2\big(Y_{i}-\beta_0-\beta_1X_i\big)X_i=0$$
(Where using OLS or maximum likelihood we assume all the measurements are independent).
I know it is possible to solve these analytically, but in case these two equations would be very complex, how do I solve these numerically in R?
Is there something like nlm for multiple equations?
What I should have done is minimise the norm of the two equations, and nlm is able to minimise a function with two unknown variables. This R script works fine:
y <- data_long$y
x <- data_long$x
f <- function(beta){
temp_1 <- sum(y - beta[1] - beta[2] * x)
temp_2 <- sum(x*(y - beta[1] - beta[2] * x))
sqrt(temp_1^2 + temp_2^2)
}
m_estimator <- nlm(f, c(0,0))
With data_long my simulated data, and beta my two estimates.

Solve variable coefficients second order linear ODE?

For the variable coefficients second order linear ODE
$x''(t)+\beta_1(t)x'(t)+\beta_0 x(t)=0$
I have the numerical values (in terms of vectors) for $\beta_1(t)$ and $\beta_0(t)$, does anyone know some R package to do that? And some simple examples to illustrate would be great as well.
I googled to find 'bvpSolve' can solve constant coefficients value.
In order to use deSolve, you have to make your second-order ODE
x''(t) + \beta_1(t) x'(t) + \beta_0 x(t)=0
into a pair of coupled first-order ODEs:
x'(t) = y(t)
y'(t) = - \beta_1(t) y(t) - \beta_0 x(t)
Then it's straightforward:
gfun <- function(t,z,params) {
g <- with(as.list(c(z,params)),
{
beta0 <- sin(2*pi*t)
beta1 <- cos(2*pi*t)
c(x=y,
y= -beta1*y - beta0*x))
list(g,NULL)
}
library("deSolve")
run1 <- ode(c(x=1,y=1),times=0:40,func=gfun,parms=numeric(0))
I picked some initial conditions (x(0)=1, x'(0)=1) arbitrarily; you might also want to add parameters to the model (i.e. make parms something other than numeric(0))
PS if you're not happy doing the conversion to coupled first-order ODEs by hand, and want a package that will seamlessly handle second-order ODEs, then I don't know the answer ...

Resources