I have a data set with 1,000 of people some of which know each other and others do not. I am trying to predict using a hawkes point process which individuals (or nodes) will adopt a behaviour.
The issue is I am attempting to optimise the parameters.
I am assuming the edge list of who knows who is the alpha input into Hawkes, that lambda is a constant background value for all nodes and its the decay function beta I am trying to calculate.
I have this as an example, which runs but not certain this is the correct way to calculate this parameter?
library(hawkes)
#Multivariate Hawkes process - with 10 nodes
lambda0<-c(rep(0.2,10)) # set to constant to assume background intensity is equal for all nodes
alpha<-matrix(c(0.05,0.05,0,0.05,0,0.05,0.05,0,0,0.05, # matrix of who knows who 0.05 indicating a link 0 no link
0.05,0.05,0,0,0,0,0,0,0,0,
0,0,0.05,0,0,0,0,0,0,0,
0.05,0,0,0.05,0,0,0,0,0,0,
0,0,0,0,0.05,0,0,0,0,0,
0.05,0,0,0,0,0.05,0,0,0,0,
0.05,0,0,0,0,0,0.05,0,0,0,
0,0,0,0,0,0,0,0.05,0,0,
0,0,0,0,0,0,0,0,0.05,0,
0.05,0,0,0,0,0,0,0,0,0.05
),byrow=TRUE,nrow=10)
beta<-c(rep(0.7,10)) # set the initial values of beta to be able to generate some random history of events
history<-simulateHawkes(lambda0,alpha,beta,3600) # within 1 hour random generation of events for the 10 nodes
nloglik_bi_hawkes <- function(params, history){
beta <- c(params[1], params[2],params[3], params[4],params[5], params[6],params[7], params[8],params[9], params[10]) # in my real data I may have 1,000 of nodes so may need to optimise beta for more than 10.
return(likelihoodHawkes(lambda0, alpha, beta, history))
}
params_hawkes <- optim(c(rep(1,10)), nloglik_bi_hawkes, history = history) # to store the values of beta
A few points:
when using random numbers use set.seed to make the run reproducible
the beta <- line in the question is equivalent to beta <- params
the nloglik_bi_hawkes function is not really needed since you can pass likelihoodHawkes directly to optim. Fixed parameters to it can be passed to optim and it will forward them.
When I tried the code in the question it stopped after 500 function evaluations without converging. Use method = "BFGS" instead.
Thus we have
library(hawkes)
set.seed(123)
lambda0 <- rep(0.2,10)
alpha<-matrix(c(0.05,0.05,0,0.05,0,0.05,0.05,0,0,0.05,
0.05,0.05,0,0,0,0,0,0,0,0,
0,0,0.05,0,0,0,0,0,0,0,
0.05,0,0,0.05,0,0,0,0,0,0,
0,0,0,0,0.05,0,0,0,0,0,
0.05,0,0,0,0,0.05,0,0,0,0,
0.05,0,0,0,0,0,0.05,0,0,0,
0,0,0,0,0,0,0,0.05,0,0,
0,0,0,0,0,0,0,0,0.05,0,
0.05,0,0,0,0,0,0,0,0,0.05
),byrow=TRUE,nrow=10)
beta <- rep(0.7,10)
history <- simulateHawkes(lambda0, alpha, beta, 3600)
fm <- optim(rep(1, 10), likelihoodHawkes, method = "BFGS",
lambda0 = lambda0, alpha = alpha, history = history)
fm
giving:
$par
[1] 0.2656976 0.1371221 0.6215783 0.1425127 0.4481591 0.1409979 0.1428707
[8] 0.7836967 0.6930492 0.1490555
$value
[1] 15793.9
$counts
function gradient
105 26
$convergence
[1] 0
$message
NULL
Related
I'm implementing a Maximum-Likelihood estimation in R for a three parameter reverse Weibull model and have some troubles to get plausible results, which include:
Bad optimization results, unwanted optimx behaviour. Beside these I wonder, how I could make use of parscale in this model.
Here is my implementation attempt:
To generate data I use the probabilty integral transform:
#Generate N sigma*RWei(alph)-mu distributed points
gen.wei <- function(N, theta) {
alph <- theta[1]
mu <- theta[2]
sigma <- theta[3]
return(
mu - sigma * (- log (runif(N)))**(1/alph)
)
}
Now I define the Log-Likelihood and negative Log-Likelihood to use optimx optimization:
#LL----
ll.wei <- function(theta,x) {
N <- length(x)
alph <- theta[1]
mu <- theta[2]
sigma <- theta[3]
val <- sum(ifelse(
x <= mu,
log(alph/sigma) + (alph-1) * log( (mu-x)/sigma) - ( (mu-x)/sigma)**(alph-1),
-Inf
))
return(val)
}
#Negative LL----
nll.wei <- function(theta,x) {
return(-ll.wei(theta=theta, x=x))
}
Afterwards I define the analytical gradient of the negative LL. Remark: There are points at which the negative LL isn't differentiable (the upper end-point mu)
gradnll.wei <- function(theta,x) {
N <- length(x)
alph <- theta[1]
mu <- theta[2]
sigma <- theta[3]
argn <- (mu-x)/sigma
del.alph <- sum(ifelse(x <= mu,
1/alph + log(argn) - log(argn) * argn**(alph-1),
0
))
del.mu <- sum(ifelse(x <= mu,
(alph-1)/(mu-x) - (alph-1)/sigma * argn**(alph-2),
0))
del.sigma <- sum(ifelse(x <= mu,
((alph-1)*argn**(alph-1)-alph)/sigma,
0))
return (-c(del.alph, del.mu, del.sigma))
}
Finally I try to optimize using the optimx package and the methods Nelder-Mead (derivative free) and BFGS (my LL is kinda smooth, there's just one point, which is problematic).
#MLE for Weibull
mle.wei <- function(start,sample) {
optimx(
par=start,
fn = nll.wei,
gr = gradnll.wei,
method = c("BFGS"),
x = sample
)
}
theta.s <- c(4,1,1/2) #test for parameters
sample <- gen.wei(100, theta.s) #generate 100 data points distributed like theta.s
mle.wei(start=c(8,4, 2), sample) #MLE Estimation
To my surprise I get the following error:
Error in optimx.check(par, optcfg$ufn, optcfg$ugr, optcfg$uhess, lower, :
Cannot evaluate function at initial parameters
I checked manually: Both nll and gradnll are finite at the initial parameters...
If i switch to optim instead of optimx I get a result, but a pretty bad one:
$par
[1] 8.178674e-01 9.115766e-01 1.745724e-06
$value
[1] -1072.786
$counts
function gradient
574 100
$convergence
[1] 1
$message
NULL
So it doesn't converge. If I don't supply the gradient to BFGS, there isn't a result. If I use Nelder-Mead instead:
$par
[1] 1.026393e+00 9.649121e-01 9.865624e-18
$value
[1] -3745.039
$counts
function gradient
502 NA
$convergence
[1] 1
$message
NULL
So it is also very bad...
My questions are:
Should I instead of defining the ll outside of the support as -Inf give it a very high negative value like -1e20 to circumvent -Inf errors or does it not matter?
Like the first one but for the gradient: technically the ll isn't defined outside of the support but since the likelihood is 0 albeit constant outside of the support, is it smart to define the gradnll as 0 outside?
3.I checked the implementation of the MLE estimator fgev from the evd package and saw that they use the BFGS method but don't supply the gradient even though the gradient does exist. Therefore my question is, whether there are situations where it is contraproductive to supply the gradient since it isn't defined everywhere (like my and the evd case)?
I got an error of "argument x matches multiple formal arguments" type in optimx but not in optim, which surprised me. What am I doing wrong in supplying my functions and data to the optimx function?
Thank you very much in advance!
Re 3: That's kind of a bug in optimx, but one that's hard to avoid. It uses x as a variable name when calculating a numerical gradient; you also use it as an "additional parameter" to your functions. You can work around that by renaming your argument, e.g. call it xdata in your functions.
Re 1 & 2: There are several techniques to handle boundary problems in optimization. Setting to a big constant value tends not to work: if the optimizer goes out of bounds, it finds the objective function really flat. If the exact boundary is legal, then pushing the parameter to the boundary and adding a penalty sometimes works. If the exact boundary is illegal, you might be able to reflect: e.g. if mu > 0 is a requirement, sometimes replacing mu by abs(mu) in the objective function gets things to work. Sometimes the best solution is to get rid of the boundary by transforming the parameters.
Edited to add some more details:
For this problem, it looks to me as though transformations of the parameters might work. I think alpha and sigma must both be positive. Setting alpha <- exp(theta[1]) and sigma <- exp(theta[3]) will guarantee that. Limits on mu are harder, but I think mu > max(xdata) is needed, so mu <- max(xdata) + exp(theta[2]) should keep it in bounds. Of course, making these changes messes up your gradient formula and starting values.
As to resources: I'm afraid I don't know any. This advice is based on years of painful experience.
https://web.ncf.ca/nashjc/optimx202112/ has a version of the package that deals with at least some variable clashes in the dot args.
There are some separate cleanups to be done before this goes on CRAN, but
the package should be more or less robust at the moment.
JN
I am trying to plot the log-likelihood function of the Cauchy distribution for varying values of theta (location parameter). These are my observations:
obs<-c(1.77,-0.23,2.76,3.80,3.47,56.75,-1.34,4.24,3.29,3.71,-2.40,4.53,-0.07,-1.05,-13.87,-2.53,-1.74,0.27,43.21)
Here is my log-likelihood function:
ll_c<-function(theta,x_values){
n<-length(x_values)
logl<- -n*log(pi)-sum(log(1+(x_values-theta)^2))
return(logl)
}
and Ive tried making a plot by using this code:
x<-seq(from=-10,to=10,by=0.1);length(x)
theta_null<-NULL
for (i in x){
theta_log<-ll_c(i,counts)
theta_null<-c(theta_null,theta_log)
}
plot(theta_null)
The graph does not look right and for some reason the length of x and theta_null differs.
I am assuming that theta is your location parameter (the scale is set to 1 in my example). You should obtain the same result using a t-distribution with 1 df and shifting the observations by theta. I left some comments in the code as guidance.
obs = c(1.77,-0.23,2.76,3.80,3.47,56.75,-1.34,4.24,3.29,3.71,-2.40,4.53,-0.07,-1.05,-13.87,-2.53,-1.74,0.27,43.21)
ll_c=function(theta, obs)
{
# Compute log-lik for obs and a value of thet (location)
logl= sum(dcauchy(obs, location = theta, scale = 1, log = T))
return(logl)
}
# Loop for possible values of theta(obs given)
x = seq(from=-10,to=10,by=0.1)
ll = NULL
for (i in x)
{
ll = c(ll, ll_c(i, obs))
}
# Plot log-lik vs possible value of theta
plot(x, ll)
It is hard to say exactly what you are experiencing without more info. But I'll make an educated guess.
First of all, we can simplify this a lot by using the *t family of functions for the t distribution, as the cauchy distribution is just the t distribution with df = 1. So your calculations could've been done using
for(i in ncp)
theta_null <- c(theta_null, sum(dt(values, 1, i, log = TRUE)))
Note that multiplying by n doesn't actually matter for any practical purposes. We are usually interested in minimizing/maximizing the likelihood in which case all constants are irrelevant.
Now if we use this approach, we can quite quickly notice something by printing the values:
print(head(theta_null))
[1] -Inf -Inf -Inf -Inf -Inf -Inf
So I am assuming what you are experiencing is that many of your values are "almost" negative infinity, and maybe these are not stored correctly in your outcome vector. I can't see that this should be the case from your code, but this would be my initial guess.
I've been running estimations in R by fitting a curve to a price series. I want to evaluate the fitness of the curve by making very small changes to the key parameters m and omega at their optimum values. To do that I want to see how the sum of squared residuals changes at the optimum. I defined the function for residuals as below:
# Define function for sum of squared residuals, to evaluate the fitness of parameters m and omega
residuals <- function(m, omega, tc) {
lm.result <- LPPL(rTicker, m, omega, tc)
return(sum((FittedLPPL(rTicker, lm.result, m, omega, tc) - rTicker$Close) ** 2))
}
I can then yield an absolute value for the SSR at the optimum as follows:
#To return value of SSR
residvalue <- residuals(m, omega,tc)
What I want to do is repeat this code over a sequence of values for m (and then omega).
For instance if the optimum m = 0.5, I want to run this code to calculate the object 'residvalue' for a sequence of m values that lie between 0 < m < 1, interval size = 0.01 (ie run it 100 times for 100 different SSR values). I would then like to store these resulting SSR values in a vector (which I can then turn into a data frame of observations). This appears like a trivial task but I'm not sure how to go about doing it. Any help would be appreciated.
You could use sapply:
sapply(seq(0,1,0.01),function(m) residuals(m,omega,tc))
I am trying to optmize a function in R
The function is the Likelihood function of negative binominal when estimating only mu parameter. This should not be a problem since the function clearly has just one point of maximum. But, I am not being able to reach the desirable result.
The function to be optmized is:
EMV <- function(data, par) {
Mi <- par
Phi <- 2
N <- NROW(data)
Resultado <- log(Mi/(Mi + Phi))*sum(data) + N*Phi*log(Phi/(Mi + Phi))
return(Resultado)
}
Data is a vector of negative binomial variables with parameters 2 and 2
data <- rnegbin(10000, mu = 2, theta = 2)
When I plot the function having mu as variable with the following code:
x <- seq(0.1, 100, 0.02)
z <- EMV(data,0.1)
for (aux in x) {z <- rbind(z, EMV(data,aux))}
z <- z[2:NROW(z)]
plot(x,z)
I get the following curve:
And the maximum value of z is close to parameter value --> 2
x[which.max(z)]
But the optimization is not working with BFGS
Error in optim(par = theta, fn = EMV, data = data, method = "BFGS") :
non-finite finite-difference value [1]
And is not going to right value using SANN, for example:
$par
[1] 5.19767e-05
$value
[1] -211981.8
$counts
function gradient
10000 NA
$convergence
[1] 0
$message
NULL
The questions are:
What am I doing wrong?
Is there a way to tell optim that the param should be bigger than 0?
Is there a way to tell optim that I want to maximize the function? (I am afraid the optim is trying to minimize and is going to a very small value where function returns smallest values)
Minimization or Maximization?
Although ?optim says it can do maximization, but that is in a bracket, so minimization is default:
fn: A function to be minimized (or maximized) ...
Thus, if we want to maximize an objective function, we need to multiply an -1 to it, and then minimize it. This is quite a common situation. In statistics we often want to find maximum log likelihood, so to use optim(), we have no choice but to minimize the negative log likelihood.
Which method to use?
If we only do 1D minimization, we should use method "Brent". This method allows us to specify a lower bound and an upper bound of search region. Searching will start from one bound, and search toward the other, until it hit the minimum, or it reach the boundary. Such specification can help you to constrain your parameters. For example, you don't want mu to be smaller than 0, then just set lower = 0.
When we move to 2D or higher dimension, we should resort to "BFGS". In this case, if we want to constrain one of our parameters, say a, to be positive, we need to take log transform log_a = log(a), and reparameterize our objective function using log_a. Now, log_a is free of constraint. The same goes when we want constrain multiple parameters to be positive.
How to change your code?
EMV <- function(data, par) {
Mi <- par
Phi <- 2
N <- NROW(data)
Resultado <- log(Mi/(Mi + Phi))*sum(data) + N*Phi*log(Phi/(Mi + Phi))
return(-1 * Resultado)
}
optim(par = theta, fn = EMV, data = data, method = "Brent", lower = 0, upper = 1E5)
The help file for optim says: "By default optim performs minimization, but it will maximize if control$fnscale is negative." So if you either multiply your function output by -1 or change the control object input, you should get the right answer.
im now performing Location Model using non-parametric smoothing to estimate the paramneters.....one of the smoothed paramater is the lamdha that i have to optimize...
so in that case, i decide to use "nlminb function" to achieve it.....
however, my programing give me the same "$par" value even though it was iterate 150 time and make 200 evaluation (by default)..... which is it choose "the start value as $par" (that is 0.000001 ...... i think, there must be something wrong with my written program....
my programing look like:- (note: w is the parameter that i want to optimize and LOO is
stand for leave-one-out
BEGIN
Myfunc <- function(w, n1, n2, v1, v2, g)
{ ## open loop for main function
## DATA generation
# generate data from group 1 and 2
# for each group: discretise the continuous to binary
# newdata <- combine the groups 1 and 2
## MODEL construction
countError <- 0
n <- nrow(newdata)
for (k in 1:n)
{# open loop for leave-one-out
# construct model based on n-1 object using smoothing method
# classify omitted object
countError <- countError + countE
} # close loop for LOO process
Error <- countError / n # error rate counted from LOO procedure
return(Error) # The Average ERROR Rate from LOO procedure
} # close loop for Myfunc
library(stats)
nlminb(start=0.000001, Myfunc, lower=0.000001, upper=0.999999,
control=list(eval.max=100, iter.max=100))
END
could someone help me......
your concerns and guidances is highly appreciated and really100 needed......
Hashibah,
Statistic PhD Student
In your question, provide a nlminb with a univariate starting value. If you are doing univariate optimisation, it is probably worth looking at optimize. If your function is multivariate, then you need to call nlminb slightly differently.
You need define the objective function such that you provide the parameters to optimize over as a vector which is the first argument. Other inputs to the objective function should be provided as subsequent arguments.
For example (modified from the nlminb help page):
X <- rnbinom(100, mu = 10, size = 10)
hdev <- function(par, x) {
-sum(dnbinom(x, mu = par[1], size = par[2], log = TRUE))
}
nlminb(start = c(9, 12), hdev, x = X)