Reparametrize to remove constraints for optimization (R)

I am teaching myself how to run some Markov models in R, by working through the textbook "Hidden Markov Models for Time Series: An Introduction using R". I am a bit stuck on how to go about implementing something that is mentioned in the text.
So, I have the following function:
f <- function(samples,lambda,delta) -sum(log(outer(samples,lambda,dpois)%*%delta))
Which I can optimize with respect to, say, lambda using:
optim(par, fn=f, samples=x, delta=d)
where "par" is the initial guess for lambda, for some x and d.
Which works perfectly fine. However, in the part of the text corresponding to the example I am trying to reproduce, they note: "The parameters delta and lambda are constrained by sum(delta_i) = 1 for i = 1,...,m, delta_i > 0, and lambda_i > 0. It is therefore necessary to reparametrize if one wishes to use an unconstrained optimizer such as nlm. One possibility is to maximize the likelihood with respect to the 2m-1 unconstrained parameters."
The unconstrained parameters are given by
eta<-log(lambda)
tau<-log(delta/(1-sum(delta)))
(here delta would hold only delta_2,...,delta_m, so tau has m-1 elements; together with the m elements of eta this gives the 2m-1 working parameters)
I don't entirely understand how to go about implementing this. How would I write a function to optimize over this transformed parameter space?

When using optim() without parameter transformations, like so:
simpleFun <- function(x) (x - 3)^2
out = optim(par = 5, fn = simpleFun)
the parameter estimates are obtained via out$par, which is 3 in this case, as you might expect. Alternatively, you can wrap your function f in a transformation of the parameters, like so:
out = optim(par = 5,
            fn = function(x)
              # apply the transformation x -> x^3
              simpleFun(x^3))
and now, to get the correct set of optimal parameters back for your function, you need to apply the same transformation to the parameter estimates, as in:
(out$par)^3
#> 2.99741
(and yes, the parameter estimate is slightly different. For this contrived
example, you could set method="BFGS" for a slightly better estimate. Anyhow, this goes to show that the choice of transformation does matter in
some cases, but that's for another discussion...)
To complete the answer, it sounds like you want to use a wrapper like so:
# the function to be optimized
f <- function(samples, lambda, delta)
  -sum(log(outer(samples, lambda, dpois) %*% delta))
out <- optim(# par is now a 2m vector
             par = c(eta1 = 1,
                     eta2 = 1,
                     eta3 = 1,
                     tau1 = 1,
                     tau2 = 1,
                     tau3 = 1),
             # a wrapper that applies the constraints
             fn = function(x, samples) {
               # exp() guarantees that the values of lambda are > 0
               lambda <- exp(x[1:3])
               # delta is also > 0
               delta <- exp(x[4:6])
               # and now it sums to 1
               delta <- delta / sum(delta)
               f(samples, lambda, delta)
             },
             samples = samples)
The above guarantees that the parameters passed to f() satisfy the constraints, and as long as you apply the same transformation to out$par, optim() will estimate an optimal set of parameters for f().
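To recover the estimates on the original, constrained scale, apply that same transformation to out$par. A minimal sketch for the 3-state layout used above (the names lambda_hat and delta_hat are mine):
# back-transform the unconstrained estimates to the constrained scale
lambda_hat <- exp(out$par[1:3])                 # estimated Poisson means, all > 0
delta_hat  <- exp(out$par[4:6])
delta_hat  <- delta_hat / sum(delta_hat)        # estimated mixing weights, > 0 and summing to 1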

Related

Initial state starts at y(1), how to go backwards to find y(0)? [duplicate]

I would like to solve a differential equation in R (with deSolve?) for which I do not have the initial condition, but only the final condition of the state variable. How can this be done?
The typical call is ode(y, times, func, parms, ...), where y is the initial condition and func defines the differential equation.
Are your equations time reversible, that is, can you change your differential equations so they run backward in time? Most typically this will just mean reversing the sign of the gradient. For example, for a simple exponential growth model with rate r (gradient of x = r*x) then flipping the sign makes the gradient -r*x and generates exponential decay rather than exponential growth.
If so, all you have to do is use your final condition(s) as your initial condition(s), change the signs of the gradients, and you're done.
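For example, here is a minimal deSolve sketch of this sign-flipping idea, using the same exponential-growth numbers worked through below (r = 1, final value 3; the name g_rev is mine):
library(deSolve)
# original model: x'(t) = r*x, but only the final value x(1) = 3 is known
g_rev <- function(t, y, parms) list(-parms * y)        # gradient with its sign flipped
ode(y = 3, times = c(0, 1), func = g_rev, parms = 1)   # the row at time 1 gives the original x(0)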
As suggested by @LutzLehmann, there's an even easier answer: ode can handle negative time steps, so just enter your time vector as (t_end, 0). Here's an example, using x'(t) = r*x (i.e. exponential growth). If x(1) = 3, r = 1, and we want the value at t = 0, analytically we would say:
x(T) = x(0) * exp(r*T)
x(0) = x(T) * exp(-r*T)
= 3 * exp(-1*1)
= 1.103638
Now let's try it in R:
library(deSolve)
g <- function(t, y, parms) { list(parms*y) }
res <- ode(3, times = c(1, 0), func = g, parms = 1)
print(res)
## time 1
## 1 1 3.000000
## 2 0 1.103639
I initially misread your question as stating that you knew both the initial and final conditions. This type of problem is called a boundary value problem and requires a separate class of numerical algorithms from standard (more elementary) initial-value problems.
library(sos)
findFn("{boundary value problem}")
tells us that there are several R packages on CRAN (bvpSolve looks the most promising) for solving these kinds of problems.
Given a differential equation
y'(t) = F(t,y(t))
over the interval [t0,tf] where y(tf)=yf is given as initial condition, one can transform this into the standard form by considering
x(s) = y(tf - s)
==> x'(s) = - y'(tf-s) = - F( tf-s, y(tf-s) )
x'(s) = - F( tf-s, x(s) )
now with
x(0) = x0 = yf.
This should be easy to code using wrapper functions and in the end some list reversal to get from x to y.
Some ODE solvers also allow negative step sizes, so that one can simply give the times for the construction of y in the descending order tf to t0 without using some intermediary x.
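A rough sketch of what the wrapper-plus-reversal approach could look like with deSolve (the names backward_wrapper and F_orig are mine, reusing the y'(t) = r*y example from above):
library(deSolve)
# wrap the original gradient F(t, y, parms) so that it describes x(s) = y(tf - s)
backward_wrapper <- function(F_orig, tf) {
  function(s, x, parms) list(-F_orig(tf - s, x, parms)[[1]])  # x'(s) = -F(tf - s, x(s))
}
F_orig <- function(t, y, parms) list(parms * y)   # y'(t) = r*y with r = parms
res_x <- ode(y = 3, times = seq(0, 1, by = 0.25),
             func = backward_wrapper(F_orig, tf = 1), parms = 1)
res_y <- res_x[nrow(res_x):1, ]          # reverse the rows ...
res_y[, "time"] <- 1 - res_y[, "time"]   # ... and map s back to t = tf - s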

Why is the likelihood/AIC of my poisson regression infinite?

I am trying to evaluate the model fit of several regressions in R, and I have run into a problem I have had multiple times now: the log-likelihood of my Poisson regression is infinite.
I'm using a non-integer dependent variable (Note: I know what I'm doing in this regard), and I'm wondering if maybe that's the problem. However, I don't get an infinite log-likelihood when running the regression with glm.nb.
Code to reproduce the issue is below.
Edit: the problem appears to go away when I coerce the DV to integer. Any idea how to get log likelihood from Poissons with non-integer DVs?
# Input Data
so_data <- data.frame(dv = c(21.0552722691125, 24.3061351414885, 7.84658638053276,
25.0294679770848, 15.8064731063311, 10.8171744654056, 31.3008088413026,
2.26643928259238, 18.4261153345417, 5.62915828161753, 17.0691184593063,
1.11959635820499, 30.0154935602592, 23.0000809735738, 28.4389825676123,
27.7678405415711, 23.7108405071757, 23.5070651053276, 14.2534787168392,
15.2058525068363, 19.7449094187771, 2.52384709295823, 29.7081691356397,
32.4723790240354, 19.2147002673637, 61.7911384519901, 10.5687170234821,
23.9047421013736, 18.4889651451222, 13.0360878554798, 15.1752866581849,
11.5205948111817, 31.3539840929108, 31.7255952728076, 25.3034625215724,
5.00013988265465, 30.2037887018226, 1.86123112349445, 3.06932041603219,
22.6739418581257, 6.33738321053804, 24.2933951601142, 14.8634827414491,
31.8302947881089, 34.8361908525564, 1.29606416941288, 13.206844629927,
28.843579313401, 25.8024295609021, 14.4414831628722, 18.2109680632694,
14.7092063453463, 10.0738043919183, 28.4124482962025, 27.1004208775326,
1.31350378236957, 14.3009307888745, 1.32555197766214, 2.70896028922312,
3.88043749517381, 3.79492216916016, 19.4507965653633, 32.1689088941444,
2.61278585713499, 41.6955885902228, 2.13466761675063, 30.4207256294235,
24.8231524369244, 20.7605955978196, 17.2182798298094, 2.11563574288652,
12.290778250655, 0.957467139696772, 16.1775287334746))
# Run Model
p_mod <- glm(dv ~ 1, data = so_data, family = poisson(link = 'log'))
# Be Confused
logLik(p_mod)
Elaborating on @ekstroem's comment: the Poisson distribution is only supported on the non-negative integers (0, 1, ...). So, technically speaking, the probability of any non-integer value is zero -- although R does allow for a little bit of fuzz, to allow for round-off/floating-point representation issues:
> dpois(1,lambda=1)
[1] 0.3678794
> dpois(1.1,lambda=1)
[1] 0
Warning message:
In dpois(1.1, lambda = 1) : non-integer x = 1.100000
> dpois(1+1e-7,lambda=1) ## fuzz
[1] 0.3678794
It is theoretically possible to compute something like a Poisson log-likelihood for non-integer values:
my_dpois <- function(x, lambda, log = FALSE) {
  LL <- -lambda + x * log(lambda) - lfactorial(x)
  if (log) LL else exp(LL)
}
but I would be very careful - some quick tests with integrate suggest it integrates to 1 (after I fixed the bug in it), but I haven't checked more carefully that this is really a well-posed probability distribution. (On the other hand, some reasonable-seeming posts on CrossValidated suggest that it's not insane ...)
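For reference, the kind of quick check with integrate mentioned above might look like the following (my own sketch; the choice lambda = 10 is arbitrary and the numerical result is not shown here):
# does the continuous extension my_dpois (defined above) integrate to roughly 1?
integrate(my_dpois, lower = 0, upper = Inf, lambda = 10)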
You say "I know what I'm doing in this regard"; can you give some more of the context? Some alternative possibilities (although this is steering into CrossValidated territory) -- the best answer depends on where your data really come from (i.e., why you have "count-like" data that are non-integer but you think should be treated as Poisson).
a quasi-Poisson model (family=quasipoisson). (R will still not give you log-likelihood or AIC values in this case, because technically they don't exist -- you're supposed to do inference on the basis of the Wald statistics of the parameters; see e.g. here for more info.)
a Gamma model (probably with a log link)
if the data started out as count data that you've scaled by some measure of effort or exposure), use an appropriate offset model ...
a generalized least-squares model (nlme::gls) with an appropriate heteroscedasticity specification
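Here is that sketch, fitting the first two alternatives to the so_data from the question (the object names qp_mod and gamma_mod are mine):
# quasi-Poisson: same mean model, inference via quasi-likelihood standard errors; no logLik/AIC
qp_mod <- glm(dv ~ 1, data = so_data, family = quasipoisson(link = "log"))
summary(qp_mod)
# Gamma with a log link: a finite log-likelihood, so AIC is available for this model
gamma_mod <- glm(dv ~ 1, data = so_data, family = Gamma(link = "log"))
logLik(gamma_mod)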
The Poisson log-likelihood involves calculating log(factorial(x)) (https://www.statlect.com/fundamentals-of-statistics/Poisson-distribution-maximum-likelihood). For large x the factorial itself overflows floating-point arithmetic, so log(factorial(x)) can instead be computed with Stirling's approximation (or a log-gamma function). Sample code in Python:
import numpy as np

# define a likelihood function. https://www.statlect.com/fundamentals-of-statistics/Poisson-distribution-maximum-likelihood
def loglikelihood_f(lmba, x):
    # Use Stirling's formula to avoid computing the factorial directly:
    # log(factorial(n)) ~ n*ln(n) - n
    n = x.size
    logfactorial = x * np.log(x + 0.001) - x  # ~ np.log(factorial(x))
    logfactorial[logfactorial == -np.inf] = 0
    result = (
        - np.sum(logfactorial)
        - n * lmba
        + np.log(lmba) * np.sum(x)
    )
    return result

Optimization using package "nloptr"

I am trying to replicate results in R from Excel's "Solver" add-in. I don't know much about the inner workings of optimization (mathematically), hence my confusion at the results in most posts as well as the error messages I am receiving. I tried using the optimx package, but apparently that doesn't allow for much control over the constraints in the optimization, so now I'm trying out the nloptr package.
Basically, what I'm trying to do is replicate an optimum portfolio calculation (financial). Below is a sample of my code:
ret.cov <- cov(as.matrix(ret.p[,1:30]))
wts <- rep(1/portfolioSize, times = portfolioSize)
sharpe <- function(wts) {
  mean.p <- sum(colMeans(ret.p[,1:30]) * wts)
  var.p <- t(wts) %*% (ret.cov %*% wts)
  sd.p <- sqrt(var.p)
  SR <- (mean.p - Rf)/sd.p
  return(as.numeric(SR))
}
fun.eq <- function(wts) {
  sum(wts) == 1
}
optim.p <- nloptr(x0 = wts, eval_f = sharpe, lb = 0, ub = 1, eval_g_eq = fun.eq)
sharpe(as.numeric(optim.p$solution))
Calculates the covariance matrix of 30 stocks and their returns
Initializes the weights of those stocks to optimize (equally weighted to start)
Sets up a function to maximize which calculates the portfolio's Sharpe Ratio
Tries (???) to specify the equality function for nloptr that states that the sum of the wts vector must be equal to 1.
Tries to maximize the function (though I think it's minimizing by default, and I don't know how to change that to maximize instead).
Checks the resulting, maximized Sharpe Ratio
The Sharpe calculation function works fine, when I try it outside of the nloptr function. The issues are various, from needing to specify the proper algorithm to use, to the function not accepting the equality function I supplied.
So, the questions I have are:
How do you change the nloptr to maximize instead of minimize?
How would one write an equality function to specify that the sum of the input vector (weights) must be equal to 1?
What is the proper algorithm to specify using opts = list() here? Excel uses something called "GRG Nonlinear".
Thank you in advance!
Hope it's still relevant...
You don't supply data so I can't run it but I'll try to help.
1) In order to maximize, just minimize -sharpe instead.
2) eval_g_eq needs to be in the form h(x) = 0, meaning that in fun.eq you need to change sum(wts) == 1 to sum(wts) - 1.
3) There are a lot of decent options for the algorithm. I use NLOPT_LN_COBYLA.
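Putting those three points together, a minimal sketch might look like this (it assumes ret.p, ret.cov, Rf and wts are defined as in the question; the name neg_sharpe is mine, and some nloptr versions restrict which algorithms may be combined with eval_g_eq, in which case an augmented-Lagrangian or ISRES algorithm can be substituted):
library(nloptr)
# negate the Sharpe ratio so that nloptr's minimization maximizes it
neg_sharpe <- function(wts) {
  mean.p <- sum(colMeans(ret.p[, 1:30]) * wts)
  sd.p <- sqrt(as.numeric(t(wts) %*% ret.cov %*% wts))
  -as.numeric((mean.p - Rf) / sd.p)
}
# equality constraint written as h(x) = 0
fun.eq <- function(wts) sum(wts) - 1
optim.p <- nloptr(x0 = wts,
                  eval_f = neg_sharpe,
                  lb = rep(0, length(wts)),
                  ub = rep(1, length(wts)),
                  eval_g_eq = fun.eq,
                  opts = list(algorithm = "NLOPT_LN_COBYLA",
                              xtol_rel = 1e-8, maxeval = 10000))
optim.p$solution     # optimal weights
-optim.p$objective   # maximized Sharpe ratio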

Pass vector of lambdas to Poisson(), or guidance on idiomatic function composition

I'm trying to learn a little Julia by doing some bayesian analysis. In Peter Hoff's textbook, he describes a process of sampling from a posterior predictive distribution of a Poisson-Gamma model in which he:
Samples values from the gamma distribution
Samples values from the poisson distribution, passing a vector of lambdas
Here is what this looks like in R:
a <- 2
b <- 1
sy1 <- 217; n1 <- 111
theta1.mc <- rgamma(1000, a+sy1, b+n1)
y1.mc <- rpois(1000, theta1.mc)
In Julia, I see that distributions can't take a vector of parameters. So, I end up doing something like this:
using Distributions
a = 2
b = 1
sy1 = 217; n1 = 111
theta_mc = rand(Gamma(a+217, 1/(b+n1)), 5000)
y1_mc = map(x -> rand(Poisson(x)), theta_mc)
While I was initially put off at the distribution function not taking a vector and working Just Like R™, I like that I'm not needing to set my number of samples more than once. That said, I'm not sure I'm doing this idiomatically, either in terms of how people would work with the distributions package, or more generically how to compose functions.
Can anyone suggest a better, more idiomatic approach than my example code?
I would usually do something like the following, which uses list comprehensions:
a, b = 2, 1
sy1, n1 = 217, 111
theta_mc = rand(Gamma(a + sy1, 1 / (b + n1)), 1000)
y1_mc = [rand(Poisson(theta)) for theta in theta_mc]
One source of confusion may be that Poisson isn't really a function, it's a type constructor and it returns an object. So vectorization over theta doesn't really make sense, since that wouldn't construct one object, but many -- which would then require another step to call rand on each generated object.

Fitting an inverse function

I have a function which looks like:
g(x) = f(x) - a^b / f(x)^b
g(x) - known function, data vector provided.
f(x) - hidden process.
a,b - parameters of this function.
From the above we get the relation:
f(x) = inverse(g(x))
My goal is to optimize the parameters a and b such that f(x) would be as close as possible to a normal distribution. If we look at an f(x) Q-Q normal plot (attached), my purpose is to minimize the distance between f(x) and the straight line which represents the normal distribution, by optimizing the parameters a and b.
I wrote the below code:
g_fun <- function(x) {x - a^b/x^b}
inverse = function(f, lower = 0, upper = 2000) {
  function(y) uniroot(function(x) f(x) - y, lower = lower, upper = upper)[1]
}
f_func = inverse(function(x) g_fun(x))
# let's make up an example
# g(x) values are known
g <- c(-0.016339, 0.029646, -0.0255258, 0.003352, -0.053258, -0.018971, 0.005172,
0.067114, 0.026415, 0.051062)
# Calculate f(x) by using the inverse of g(x), when a=a0 and b=b0
for (i in 1:10) {
  f[i] <- f_fun(g[i])
}
I have two questions:
How to pass parameters a and b to the functions?
How to perform this optimization task, meaning find a and b such that f(x) would approximate a normal distribution?
Not sure how you were able to produce the Q-Q plot since your provided examples do not work. You are not specifying the values of a and b and you are defining f_func but calling f_fun. Anyway here is my answer to your questions:
How to pass parameters a and b to the functions? - Just pass them as
arguments to the functions.
How to perform this optimization task, meaning find a and b such that f(x) would approximate normal distribution? - The same way any optimization task is done. Define a cost function, then minimize it.
Here is the revised code: I have added a and b as parameters, removed the inverse function and incorporated it inside f_func, which can now take vector input so no need for a for loop.
g_fun <- function(x, a, b) {x - a^b/x^b}
f_func = function(y, a, b, lower = 0, upper = 2000) {
  sapply(y, function(z) uniroot(function(x) g_fun(x, a, b) - z, lower = lower, upper = upper)$root)
}
# g(x) values are known
g <- c(-0.016339, 0.029646, -0.0255258, 0.003352, -0.053258, -0.018971, 0.005172,
0.067114, 0.026415, 0.051062)
f <- f_func(g,1,1) # using a = 1 and b = 1
#[1] 0.9918427 1.0149329 0.9873386 1.0016774 0.9737270 0.9905320 1.0025893
#[8] 1.0341199 1.0132947 1.0258569
f_func(g,2,10)
[1] 1.876408 1.880554 1.875578 1.878138 1.873094 1.876170 1.878304 1.884049
[9] 1.880256 1.882544
Now for the optimization part, it depends on what you mean by f(x) would approximate a normal distribution. You can compare the mean squared error from the Q-Q line if you want. Also, since you say approximate, how close is good enough? You can go with shapiro.test and keep searching until you find a p-value above 0.05 (beware that there may not be a solution):
shapiro.test(f_func(g,1,2))$p
[1] 0.9484821
Since a large p-value means the sample looks normal, and optim minimizes, take the negative p-value as the cost and let the cost function receive both parameters as one vector:
cost <- function(par) -shapiro.test(f_func(g, par[1], par[2]))$p
Now that we have a cost function, how do we go about minimizing it? There are many different ways to do numerical optimization. Take a look at the optim function http://stat.ethz.ch/R-manual/R-patched/library/stats/html/optim.html.
optim(c(1,1), cost)
This final call may still fail for the made-up data above (for instance, uniroot can fail for some candidate values of a and b tried during the search), but without proper data and context this is as far as I can go. Hope this helps.
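One practical refinement, which is my own addition rather than part of the original answer: the optimizer may try values of a and b for which the root-finding inside f_func fails, so wrapping the cost in tryCatch and returning a penalty keeps the search going (the name safe_cost is mine):
safe_cost <- function(par) {
  tryCatch(-shapiro.test(f_func(g, par[1], par[2]))$p,
           # the penalty 1 is worse than any achievable value of -p, which lies in [-1, 0]
           error = function(e) 1)
}
optim(c(1, 1), safe_cost)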
