non-linear optimization in R using optim - r

I'm a newbie in R!
I would like to find the best gamma distribution parameters to fit my experimental counts data. The optim function's help file says the first argument of the function should be the parameters to be optimized. So I tried :
x = as.matrix(seq(1,20,0.1))
yexp = dgamma(x,2,1)*100 + rnorm(length(x),0,1)
f = function(p,x,yexp) {sum((p[1]*dgamma(x,p[2],scale=p[3]) - yexp)^2)}
mod = optim(c(50,2,1),f(p,x,yexp))
I get the error message :
Error in f(p, x, yexp) : object 'p' not found
Any hint where I'm wrong?
Supplementary question : Is there any other way to fit counts data with standard distribution (gamma, inverse gaussian, etc?)

optim expects its second argument to be a function. Also, the second and third arguments to f are fixed and need to be specified:
optim(c(50, 1, 2), f, x = x, yexp = yexp)
This would also work:
optim(c(50, 1, 2), function(p) f(p, x, yexp))
You could also use nls with default Nelder-Mead algorithm:
nls(yexp ~ a * dgamma(x, sh, scale=sc), start = list(a = 50, sh = 2, sc = 1))
or with plinear in which case no starting value is needed for the first parameter:
nls(c(yexp) ~ dgamma(x, sh, scale=sc), start = list(sh = 2, sc = 1), alg = "plinear")

Related

Can't use mppm on multitype point patterns

I'm trying to fit a MultiStraussHardcore interaction to one of the sample datatsets in spatstat (flu). I'm maintaining the same interaction and hardcore radius for all types and point patterns. I'm running the following block:
library(spatstat)
library("optimbase")
flusubset <- flu[1:4]
typelist <- lapply(lapply(flusubset$pattern, marks), levels)
stopifnot(length(unique(typelist))==1)
num_marks <- length(typelist[[1]])
iradii <- 50*ones(num_marks)
hradii <- 3*ones(num_marks)
Int <- anylist()
for (i in 1:dim(flusubset)[1]) {
Int[[i]] <- MultiStraussHard(iradii=iradii, hradii=hradii)
}
Int <- as.hyperframe(Int)
multmodel <- mppm(pattern ~ 1, data=flusubset, interaction=Int)
Each time I run mppm, I get the following error
Error in (function (d, tx, tu, par) : data and model do not have the same possible levels of marks
I've included the traceback, too.
12. stop("data and model do not have the same possible levels of marks")
11. (function (d, tx, tu, par) { r <- par$iradii h <- par$hradii ...
10. do.call(fun, usedargs)
9. do.call.matched(pairpot, list(d = matrix(, 0, 0), tx = marks(X)[integer(0)], tu = marks(P)[integer(0)], par = potpars))
8. evalPairPotential(X, U, EqualPairs, pairpot, potpars, Reach)
7. evaluate(X, P, E, interaction$pot, interaction$par, correction = correction, splitInf = splitInf, ..., Reach = Reach, precomputed = precomputed, savecomputed = savecomputed)
6. evalInterEngine(X = X, P = P, E = E, interaction = interaction, correction = correction, splitInf = splitInf, ..., precomputed = precomputed, savecomputed = savecomputed)
5. evalInteraction(X, P, E, interaction, correction, ..., splitInf = splitInf, precomputed = precomputed, savecomputed = savecomputed)
4. mpl.prepare(Q, X, P, trend, interaction, covariates, want.trend, want.inter, correction, rbord, "quadrature points", callstring, subsetexpr = subsetexpr, allcovar = allcovar, precomputed = precomputed, savecomputed = savecomputed, covfunargs = covfunargs, weightfactor = weightfactor, ...
3. mpl.engine(Q, trend = trend, interaction = interaction, ..., covariates = covariates, correction = correction, rbord = rbord, use.gam = use.gam, allcovar = allcovar, preponly = TRUE, forcefit = TRUE)
2. bt.frame(Yi, ~1, inter, ..., covariates = covariates, allcovar = TRUE, use.gam = use.gam, vnamebase = itags[j], vnameprefix = itags[j])
1. mppm(pattern ~ 1, data = flusubset, interaction = Int)
I've tried fitting a MultiStraussHardcore model with ppm for each individual point pattern, and I have no issues. I've confirmed that the possible levels of each point pattern are identical. I've also verified that the interaction and hardcore radii matrices have the correct dimensions (2x2 for both) and that my hyperframe containing the interact objects is the correct dimensions. Thanks!
Thank you for the well described problem and the reproducible example. It made things very easy for me.
Indeed you have found a minor bug (documentation inconsistency). Your code runs without errors if, in the for loop, you replace
MultiStraussHard(iradii=iradii, hradii=hradii)
by
typ <- levels(marks(flu$pattern[[1]]))
MultiStraussHard(iradii=iradii, hradii=hradii, types=typ)
The documentation for MultiStraussHard says that the argument types is optional, but that is only true for ppm calls at the moment. I will see if it is possible to extend the auto detection of types to mppm, so your original code would work in future versions of spatstat.
This has been fixed in the latest development version of spatstat available from the github repository

MXNET softmax output: label shape confusion

I have not got a clear idea about how labels for the softmax classifier should be shaped.
What I could understand from my experiments is that a scalar laber indicating the index of class probability output is one option, while another is a 2D label where the rows are class probabilities, or one-hot encoded variable, like c(1, 0, 0).
What puzzles me though is that:
I can use sclalar label values that go beyong indexing, like 4 in my
example below -- without warning or error. Why is that?
When my label is a negative scalar or an array with a negative value,
the model converges to uniform probablity distribution over classes.
For example, is this expected that actor_train.y = matrix(c(0, -1,v0), ncol = 1) results in equal probabilities in the softmax output?
I try to use softmax MXNET classifier to produce the policy gradient
reifnrocement learning, and my negative rewards lead to the issue
above: uniform probability. Is that expected?
require(mxnet)
actor_initializer <- mx.init.Xavier(rnd_type = "gaussian",
factor_type = "avg",
magnitude = 0.0001)
actor_nn_data <- mx.symbol.Variable('data') actor_nn_label <- mx.symbol.Variable('label')
device.cpu <- mx.cpu()
NN architecture
actor_fc3 <- mx.symbol.FullyConnected(
data = actor_nn_data
, num_hidden = 3 )
actor_output <- mx.symbol.SoftmaxOutput(
data = actor_fc3
, label = actor_nn_label
, name = 'actor' )
crossentfunc <- function(label, pred)
{
- sum(label * log(pred)) }
actor_loss <- mx.metric.custom(
feval = crossentfunc
, name = "log-loss"
)
initialize NN
actor_train.x <- matrix(rnorm(11), nrow = 1)
actor_train.y = 0 #1 #2 #3 #-3 # matrix(c(0, 0, -1), ncol = 1)
rm(actor_model)
actor_model <- mx.model.FeedForward.create(
symbol = actor_output,
X = actor_train.x,
y = actor_train.y,
ctx = device.cpu,
num.round = 100,
array.batch.size = 1,
optimizer = 'adam',
eval.metric = actor_loss,
clip_gradient = 1,
wd = 0.01,
initializer = actor_initializer,
array.layout = "rowmajor" )
predict(actor_model, actor_train.x, array.layout = "rowmajor")
It is quite strange to me, but I found a solution.
I changed optimizer from optimizer = 'adam' to optimizer = 'rmsprop', and the NN started to converge as expected in case of negative targets. I made simulations in R using a simple NN and optim function to get the same result.
Looks like adam or SGD may be buggy or whatever in case of multinomial classification... I also used to get stuck at the fact those optimizers did not converge to a perfect solution on just 1 example, while rmsprop does! Be aware!

How to write multiple random intercepts in lqmm?

I'm trying to define a linear mixed regression model using lqmm package with multiple random intercept terms. However, I do not find the good syntax to do it?
1. Is it possible to do it with the lqmm package?
2. If yes, do you know the good syntax to write it?
3. If no, do you know any other package (and associated syntax)?
Example of the syntax already used:
mod <- lqmm(fixed = Y ~ log10(X), random = ~ list(1,1), group = list(site.f,spp.f),
tau = 0.95, nK = 7, type = "normal", data = data_s)
It returns: Error in model.frame.default(groupFormula, dataMix) : type (list) incorrect pour la variable 'list(site.f, spp.f)
mod <- lqmm(fixed = Y ~ log10(X), random = ~ 1, group = site.f + spp.f,
tau = 0.95, nK = 7, type = "normal", data = data_s)
It returns:
Error in rep(weights, table(grp)) : invalid 'times' argument
Thanks a lot for your help
Vincent

Fitting a 3 parameter Weibull distribution

I have been doing some data analysis in R and I am trying to figure out how to fit my data to a 3 parameter Weibull distribution. I found how to do it with a 2 parameter Weibull but have come up short in finding how to do it with a 3 parameter.
Here is how I fit the data using the fitdistr function from the MASS package:
y <- fitdistr(x[[6]], 'weibull')
x[[6]] is a subset of my data and y is where I am storing the result of the fitting.
First, you might want to look at FAdist package. However, that is not so hard to go from rweibull3 to rweibull:
> rweibull3
function (n, shape, scale = 1, thres = 0)
thres + rweibull(n, shape, scale)
<environment: namespace:FAdist>
and similarly from dweibull3 to dweibull
> dweibull3
function (x, shape, scale = 1, thres = 0, log = FALSE)
dweibull(x - thres, shape, scale, log)
<environment: namespace:FAdist>
so we have this
> x <- rweibull3(200, shape = 3, scale = 1, thres = 100)
> fitdistr(x, function(x, shape, scale, thres)
dweibull(x-thres, shape, scale), list(shape = 0.1, scale = 1, thres = 0))
shape scale thres
2.42498383 0.85074556 100.12372297
( 0.26380861) ( 0.07235804) ( 0.06020083)
Edit: As mentioned in the comment, there appears various warnings when trying to fit the distribution in this way
Error in optim(x = c(60.7075705026659, 60.6300379017397, 60.7669410153573, :
non-finite finite-difference value [3]
There were 20 warnings (use warnings() to see them)
Error in optim(x = c(60.7075705026659, 60.6300379017397, 60.7669410153573, :
L-BFGS-B needs finite values of 'fn'
In dweibull(x, shape, scale, log) : NaNs produced
For me at first it was only NaNs produced, and that is not the first time when I see it so I thought that it isn't so meaningful since estimates were good. After some searching it seemed to be quite popular problem and I couldn't find neither cause nor solution. One alternative could be using stats4 package and mle() function, but it seemed to have some problems too. But I can offer you to use a modified version of code by danielmedic which I have checked a few times:
thres <- 60
x <- rweibull(200, 3, 1) + thres
EPS = sqrt(.Machine$double.eps) # "epsilon" for very small numbers
llik.weibull <- function(shape, scale, thres, x)
{
sum(dweibull(x - thres, shape, scale, log=T))
}
thetahat.weibull <- function(x)
{
if(any(x <= 0)) stop("x values must be positive")
toptim <- function(theta) -llik.weibull(theta[1], theta[2], theta[3], x)
mu = mean(log(x))
sigma2 = var(log(x))
shape.guess = 1.2 / sqrt(sigma2)
scale.guess = exp(mu + (0.572 / shape.guess))
thres.guess = 1
res = nlminb(c(shape.guess, scale.guess, thres.guess), toptim, lower=EPS)
c(shape=res$par[1], scale=res$par[2], thres=res$par[3])
}
thetahat.weibull(x)
shape scale thres
3.325556 1.021171 59.975470
An alternative: package "lmom". The estimative by L-moments technique
library(lmom)
thres <- 60
x <- rweibull(200, 3, 1) + thres
moments = samlmu(x, sort.data = TRUE)
log.moments <- samlmu( log(x), sort.data = TRUE )
weibull_3parml <- pelwei(moments)
weibull_3parml
zeta beta delta
59.993075 1.015128 3.246453
But I donĀ“t know how to do some Goodness-of-fit statistics in this package or in the solution above. Others packages you can do Goodness-of-fit statistics easily. Anyway, you can use alternatives like: ks.test or chisq.test

How to save estimated parameters from nigfit() in a variable

I want to automatically fit time series returns into a NIG distribution.
With nigfit() from the package fBasics I estimate the mu, alpha, beta and delta of the distribution.
> nigFit(histDailyReturns,doplot=FALSE,trace=FALSE)
Title:
Normal Inverse Gaussian Parameter Estimation
Call:
.nigFit.mle(x = x, alpha = alpha, beta = beta, delta = delta,
mu = mu, scale = scale, doplot = doplot, span = span, trace = trace,
title = title, description = description)
Model:
Normal Inverse Gaussian Distribution
Estimated Parameter(s):
alpha beta delta mu
48.379735861 -1.648483055 0.012361539 0.001125734
This works fine, which means that nigfit plots my parameters.
However I would like to use the estimated parameters and save them in variables. So I could use them later.
> variable = nigfit(histDailyReturns,doplot=FALSE,trace=FALSE)
This doesn't work out. 'variable' is an S4 object of class structure fDISTFIT. Calling the variable replots the output of nigfit above.
I tried the following notations, to get just one parameter:
> variable$alpha
> variable.alpha
> variable[1]
I couldn't find an answer in the documentation of nigfit.
Is it possible to save the estimated parameters in variables? How does it work?
access the output compenents using #. variable has different slots. Get their names using slotNames(). Using the example from the documentation:
set.seed(1953)
s <- rnig(n = 1000, alpha = 1.5, beta = 0.3, delta = 0.5, mu = -1.0)
a <- nigFit(s, alpha = 1, beta = 0, delta = 1, mu = mean(s), doplot = TRUE)
slotNames(a)
[1] "call" "model" "data" "fit" "title"
[6] "description"
# `fit` is a list with all the goodies. You're looking for the vector, `estimate`:
a#fit$estimate
alpha beta delta mu
1.6959724 0.3597794 0.5601027 -1.0446402
Examine the structure of the output object using str(variable):
> variable#fit$par[["alpha"]]
[1] 48.379735861
> variable#fit$par[["beta"]]
[1] -1.648483055
> variable#fit$par[["delta"]]
[1] 0.012361539
> variable#fit$par[["mu"]]
[1] 0.001125734

Resources