Integration and false convergence of optimization in R - r

I am trying to find MLEs of three positive parameters a, mu and theta, and then the value of a function, saying f1.
f1<-function(para)
{
a<-para[1]
mu<-para[2]
the<-para[3]
return(a*mu/the)
}
Step 1 Suppose we have the following (negative) log likelihood function.
where
x_ij and t_ij are known
loglik<-function(para, data)
{
n.quad<-64
a<-para[1]
mu<-para[2]
the<-para[3]
k<-length(table(data$group))
rule<-glaguerre.quadrature.rules(n.quad, alpha = 0)[[n.quad]]
int.ing.gl<-function(y, x, t)
{
(y^(a-mu-1)/(y+t)^(x+a))*exp(-the/y)
}
int.f<-function(x, t) glaguerre.quadrature(int.ing.gl, lower = 0, upper =
Inf, x=x, t=t, rule = rule, weighted = F)
v.int.f<-Vectorize(int.f)
int<-v.int.f(data$count, data$time)
loglik.value<-lgamma(a+data$count)-lgamma(a)+mu*log(the)-lgamma(mu)+log(int)
log.sum<-sum(loglik.value)
return(-log.sum)
}
Step 2 Let's fix true values and generate data.
### Set ###
library(tolerance)
library(lbfgs3)
a<-2
mu<-0.01
theta<-480
k<-10
f1(c(a, mu, theta))
[1] 5e-04
##### Data Generation #####
set.seed(k+100+floor(a*100)+floor(theta*1000)+floor(mu*1024))
n<-sample(50:150, k) # sample size for each group
X<-rep(0,sum(n))
# Initiate time vector
t<-rep(0, sum(n))
# Initiate the data set
group<-sample(rep(1:k,n)) # Randomly assign the group index
data.pre<-data.frame(X,t,group)
colnames(data.pre)<-c('count','time','group')
data<-data.pre[order(data.pre$group),] # Arrange by group index
# Generate time variable
mut<-runif(k, 50, 350)
for (i in 1:k)
{
data$time[which(data$group==i)]<-ceiling(r2exp(n[i], rate = mut[i], shift = 1))
}
### Generate count variable: Poisson
## First, Generate beta for each group: beta_i
beta<-rgamma(k, shape = mu, rate = theta)
# Generate lambda for each observation
lambda<-0
for (i in 1:k)
{
l<-rgamma(n[i], shape = a, rate = 1/beta[i])
lambda<-c(lambda,l)
}
lambda<-lambda[-1]
data<-data.frame(data,lambda)
data$count<-rpois(length(data$time), data$lambda*data$time) # Generate count variable
Step 3 optimization
head(data)
count time group
0 400 1
0 39 1
0 407 1
0 291 1
0 210 1
0 241 1
start.value<-c(2, 0.01, 100)
fit<-nlminb(start = start.value, loglik, data=data,
lower = c(0, 0, 0), control = list(trace = T))
fit
$par
[1] 1.674672e-02 1.745698e+02 3.848568e+03
$objective
[1] 359.5767
$convergence
[1] 1
$iterations
[1] 40
$evaluations
function gradient
79 128
$message
[1] "false convergence (8)"
One of the possible reasons leading to false convergence is the integral in the step 1. In the loglik function, I used glaguerre.quadrature. However, it failed to give correct result because the integral is converging slowly.
I gave an example to look for some suggestion in the following question
Use the Gauss-Laguerre quadrature to approximate an integral in R
Here, I just provide a complete example. Is there any method I can use to handle this integral?

Related

Difference drawing random numbers from distributions R [migrated]

This question was migrated from Stack Overflow because it can be answered on Cross Validated.
Migrated 24 days ago.
I am comparing these two forms of drawing random numbers from a beta and a Gaussian distribution. What are their differences? Why are they different?
The first way (_1) simulates from a Uniform(0,1) and then applies the inverse CDF of the Beta (Normal) distribution on those uniform draws to get draws from the Beta (Normal) distribution.
While the second way (_2) uses the default function to generate random numbers from the distribution.
Beta Distribution
set.seed(1)
beta_1 <- qbeta(runif(1000,0,1), 2, 5)
set.seed(1)
beta_2 <- rbeta(1000, 2,5)
> summary(beta_1); summary(beta_2)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.009481 0.164551 0.257283 0.286655 0.387597 0.895144
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.006497 0.158083 0.261649 0.284843 0.396099 0.841760
Here every number is different.
Normal distribution
set.seed(1)
norm_1 <- qnorm(runif(1000, 0,1), 0, 0.1)
set.seed(1)
norm_2 <- rnorm(1000, 0, 0.1)
> summary(norm_1); summary(norm_2)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.3008048 -0.0649125 -0.0041975 0.0009382 0.0664868 0.3810274
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.300805 -0.069737 -0.003532 -0.001165 0.068843 0.381028
Here the numbers are almost the same except in the mean and median
Shouldn't all be equal? Because I am generating random numbers from distributions with the same parameters
I think your question boils down to the assumption about the random number generator. If rnorm used the same RNG as runif under the hood, then your expectation would hold. It does not use the same RNG. The normal distribution RNG and uniform RNG are separate. See ?RNGkind. Without that exact match, you are left with the statistical tests below:
Is the mean of norm_1 different from the mean of norm_2?
t.test(x = norm_1, y = norm_2)
p-value > 0.05 indicates there is insufficient evidence to reject the null hypothesis that the means are equal at the 0.05 type I error level
Are the distributions different?
ks.test(x = norm_1, y = norm_2)
p-value > 0.05 indicates there is insufficient evidence to reject the null hypothesis that the distributions are equal at the 0.05 type I error level
I tried to sample a Bernoulli parameter at home using two different ways.
I flip a coin and assign 1 to heads and 0 to tails
I roll a six sided dice and assign the result 1 to a roll of the 3 highest numbers and the result 0 to a roll of the 3 lowest numbers.
I did this only twenty times instead of thousand times but the principle is the same. I got the following results:
result 0
result 1
Method 1
11
9
Method 2
8
12
Q: Why did I not get the same result for both methods?
A: Well, it is of course because they are samples and are supposed to be variable everytime.
If I would be able to reset some random seed to remove the variability, then this still doesn't matter because they are different methods.
Why is there no use of inverse transform sampling?
The normal distribution actually does use inverse transform sampling. The following command returns the same value of 0.3735462
set.seed(1)
rnorm(1,1,1)
set.seed(1)
qnorm(runif(1),1,1)
Also the rbeta uses inverse transform sampling and the following returns the same 0.7344913 and 0.2655087, which are only different by the relationship Y = 1-X (so internally there is some inversion)
alpha = 1
beta = 1
set.seed(1)
rbeta(1,alpha,beta)
set.seed(1)
qbeta(runif(1),alpha,beta)
The beta function becomes different when when $\alpha$ and $\beta$ are not both equal to one. This is because the inverse sampling is not very efficient and the the rbeta function will do some algorithm that creates the sample in a different way. Below is a code with the algorithm for the case that $min(\alpha,\beta) \leq 1$.
See for more about the algorithm: Hung, Ying-Chao, Narayanaswamy Balakrishnan, and Yi-Te Lin. "Evaluation of beta generation algorithms." Communications in Statistics-Simulation and Computation 38.4 (2009): 750-770.
You can see a few points that are calculated differently. The algorithm has a few steps where it starts redrawing random numbers, and it does this because redrawing numbers is easier than computing the inverse transform for a difficult case.
alpha = 0.9
beta = 0.9
#### Cheng's BC algorithm
### used if min(alpha,beta)<=1
### initialize
set.seed(1)
p = min(alpha,beta)
q = max(alpha,beta)
a = p+q
b = p^-1
delta = 1+q-p
k1 = delta*(0.0138889+0.0416667*p)/(q*b-0.777778)
k2 = 0.25 + (0.5+0.25/delta)*p
sample = function() {
### Perform steps of algorithm in a loop
step = 1
while(step<6) {
if (step == 1) {
U1 = runif(1)
U2 = runif(1)
if (U1 < 0.5)
{step = 2}
else
{step = 3}
}
if (step == 2) {
Y = U1*U2
Z = U1*Y
if (0.25*U2 + Z-Y >= k1) {
step = 1
} else {
step = 5
}
}
if (step == 3) {
Z = U1^2*U2
if (Z > 0.25) {
step = 4
} else {
V = b*log(U1/(1-U1))
W = q*exp(V)
step = 6
}
}
if (step == 4) {
if (Z < k2) {
step = 5
} else {
step = 1
}
}
if (step == 5) {
V = b*log(U1/(1-U1))
W = q*exp(V)
if (a*(log(a/(p+W))+V) - 1.3862944 < log(Z)) {
step = 1
} else {
step = 6
}
}
}
if (q == alpha) {
X = W/(p+W)
} else {
X = p/(p+W)
}
return(X)
}
sample()
n = 20
beta_orig = sapply(1:n,function(x) {
set.seed(x)
rbeta(1,alpha,beta)
})
beta_quantile = sapply(1:n,function(x) {
set.seed(x)
qbeta(runif(1),alpha,beta)
})
beta_BC = sapply(1:n,function(x) {
set.seed(x)
sample()
})
plot(beta_orig,beta_BC, pch = 1, xlim = c(0,1), ylim = c(0,1))
points(beta_orig,beta_quantile, col = 2, pch = 3)
legend(0.3,1, c("rbeta compared to inverse transform sampling", "rbeta compared to manual"), pch=c(3,1), col = c(2,1), cex = 0.85)
Some weird effect
In the code above I was resetting the random seed for each computation. The inverse transform is only the same for the first number. When you compute multiple numbers then only the first number is the same.
The following code
set.seed(1)
rnorm(6,1,1)
set.seed(1)
qnorm(runif(6),1,1)
set.seed(2)
rnorm(6,1,1)
set.seed(2)
qnorm(runif(6),1,1)
returns
[1] 0.3735462 1.1836433 0.1643714 2.5952808 1.3295078 0.1795316
[1] 0.3735462 0.6737666 1.1836433 2.3297993 0.1643714 2.2724293
[1] 0.1030855 1.1848492 2.5878453 -0.1303757 0.9197482 1.1324203
[1] 0.10308546 1.53124079 1.18484918 0.03810797 2.58784531 2.58463150
What you see here is that rnorm function skips a number. The reason is because it samples two random numbers to create more precision.
See these lines in the source ode of the norm_rand() function that R uses https://svn.r-project.org/R/trunk/src/nmath/snorm.c
define BIG 134217728 /* 2^27 */
/* unif_rand() alone is not of high enough precision */
u1 = unif_rand();
u1 = (int)(BIG*u1) + unif_rand();
return qnorm5(u1/BIG, 0.0, 1.0, 1, 0);

Minimising area under the ROC curve to optimise the parameters of a polynomial predictor with optim

My predictor (x) has U-shaped distribution in relation to the binary outcome (y), with positive outcomes at both low and high values of x, leading to a biconcave roc curve with a poor area under the curve (auc).
To maximise its ability to discriminate the outcome, I am trying to optimise the parameters of a second grade polynomial of x, by using optim and 1 - auc as the cost function to minimise.
x = c(13,7,7,7,1,100,3,4,4,2,2,7,14,8,3,14,5,12,8,
13,9,4,9,4,8,3,13,9,4,4,5,9,10,10,7,6,12,7,2,
6,6,4,3,2,3,10,5,2,5,8,3,5,4,2,7,5,7,6,79,9)
y = c(0,0,1,0,0,1,0,0,1,1,0,1,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0)
theta = c(0, 0, 0)
min_auc <- function(theta, x, y) {
(1 - roc(y, (theta[1] + theta[2]*x + theta[3]*x^2))$auc)
}
optim(theta, min_auc, x = x, y = y)
The results are as follow:
$par
[1] 0.0 0.1 0.0
$value
[1] 0.4380054
$counts
function gradient
8 NA
$convergence
[1] 0
$message
NULL
However, from a manual definition of the parameters, I know that min_auc can be further minimised.
theta = c(0, -40, 1)
(1 - roc(y, (theta[1] + theta[2]*x + theta[3]*(x^2)))$auc)
[1] 0.2762803
Could anyone explain to me what I am doing wrong, please? Is it possibly due to a non-convex cost function?
One possibility is it's a collinearity problem. Scaling the inputs helps:
min_auc <- function(theta, x, y) {
(1 - roc(y, (theta[1] + theta[2]*scale(x) + theta[3]*scale(x)^2))$auc)
}
optim(theta, min_auc, x = x, y = y)
# $par
# [1] -0.02469136 -0.03117284 0.11049383
#
# $value
# [1] 0.2762803
#
# $counts
# function gradient
# 30 NA
#
# $convergence
# [1] 0
#
# $message
# NULL
#
Another potential problem is that the surface over which you're optimizing has some flat spots. Let's say, for example, that we fix the intercept in this equation to -2. This is about what you get if you do qlogis(mean(y)). Then, you're only optimizing over 2 parameters so the surface is easier to see. Here's what it looks like with the two remaining theta terms on the two horizontal axes and the 1-AUC value on the y-axis.
min_auc <- function(theta, x, y) {
(1 - roc(y, (-2 + theta[1]*scale(x) + theta[2]*scale(x)^2))$auc)
}
s <- seq(-.25, .25, length=50)
o <- outer(s, s, Vectorize(function(z,w)min_auc(c(z,w),x,y)))
library(plotly)
plot_ly(x = ~s, y = ~s, z = ~o) %>% add_surface()
As you may have noticed above, there is no unique solution to the problem. There are lots of solutions that seem to get to the minimum value.

Why am I getting NAs in this calculation in R?

While working on an Rcpp program, I used the sample() function, which gave me the following error: "NAs not allowed in probability." I traced this issue to the fact that the probability vector I used had NA values in it. I have no idea how. Below is some R code that captures the errors:
n.0=20
n.1=20
n.reps=1
beta0.vals=rep(seq(-.3,.1,,n.0),n.reps)
beta1.vals=rep(seq(-7,0,,n.1),n.reps)
beta.grd=as.matrix(expand.grid(beta0.vals,beta1.vals))
n.rnd=200
beta.rnd.grd=cbind(runif(n.rnd,min(beta0.vals),max(beta0.vals)),runif(n.rnd,min(beta1.vals),max(beta1.vals)))
beta.grd=rbind(beta.grd,beta.rnd.grd)
N = 22670
count = 0
for(i in 1:dim(beta.grd)[1]){ # iterate through 600 possible beta values in beta grid
beta.ind = 0 # indicator for current pair of beta values
for(j in 1:N){ # iterate through all possible Nsums
logit = beta.grd[i,1]/N*(j - .1*N)^2 + beta.grd[i,2];
phi01 = exp(logit)/(1 + exp(logit))
if(is.na(phi01)){
count = count + 1
}
}
}
cat("Total number of invalid probabilities: ", count)
Here, $\beta_0 \in (-0.3, 0.1), \beta_1 \in (-7, 0), N = 22670, N_\text{sum} \in (1, N)$. Note that $N$ and $N_\text{sum}$ are integers, whereas the beta values may not be.
Since mathematically, $\phi_{01} \in (0,1)$, I'm assuming that NAs are arising because R is not liking extremely small values. I am receiving an overwhelming amount of NA values, too. More so than numbers. Why would I be getting NAs in this code?
Include print(logit) next to count = count + 1 and you will find lots of logit > 1000 values. exp(1000) == Inf so you divide Inf by Inf which will get you a NaN and NaN is NA:
> exp(500)
[1] 1.403592e+217
> Inf/Inf
[1] NaN
> is.na(NaN)
[1] TRUE
So your problems are not too small but to large numbers coming first out of the evaluation of exp(x) with x larger then roughly 700:
> exp(709)
[1] 8.218407e+307
> exp(710)
[1] Inf
Bernhard's answer correctly identifies the problem:
If logit is large, exp(logit) = Inf.
Here is a solution:
for(i in 1:dim(beta.grd)[1]){ # iterate through 600 possible beta values in beta grid
beta.ind = 0 # indicator for current pair of beta values
for(j in 1:N){ # iterate through all possible Nsums
logit = beta.grd[i,1]/N*(j - .1*N)^2 + beta.grd[i,2];
## This one isn't great because exp(logit) can be very large
# phi01 = exp(logit)/(1 + exp(logit))
## So, we say instead
## phi01 = 1 / ( 1 + exp(-logit) )
phi01 = plogis(logit)
if(is.na(phi01)){
count = count + 1
}
}
}
cat("Total number of invalid probabilities: ", count)
# Total number of invalid probabilities: 0
We can use the more stable 1 / (1 + exp(-logit)
(to convince yourself of this, multiply your expression with exp(-logit) / exp(-logit)),
and luckily either way, R has a builtin function plogis() that can calculate these probabilities quickly and accurately.
You can see from the help file (?plogis) that this function evaluates the expression I gave, but you can also double check to assure yourself
x = rnorm(1000)
y = 1 / (1 + exp(-x))
z = plogis(x)
all.equal(y, z)
[1] TRUE

Expected return and covariance from return time series

I’m trying to simulate the Matlab ewstats function here defined:
https://it.mathworks.com/help/finance/ewstats.html
The results given by Matlab are the following ones:
> ExpReturn = 1×2
0.1995 0.1002
> ExpCovariance = 2×2
0.0032 -0.0017
-0.0017 0.0010
I’m trying to replicate the example with the RiskPortfolios R package:
https://cran.r-project.org/web/packages/RiskPortfolios/RiskPortfolios.pdf
The R code I’m using is this one:
library(RiskPortfolios)
rets <- as.matrix(cbind(c(0.24, 0.15, 0.27, 0.14), c(0.08, 0.13, 0.06, 0.13)))
w <- 0.98
rets
w
meanEstimation(rets, control = list(type = 'ewma', lambda = w))
covEstimation(rets, control = list(type = 'ewma', lambda = w))
The mean estimation is the same of the one in the example, but the covariance matrix is different:
> rets
[,1] [,2]
[1,] 0.24 0.08
[2,] 0.15 0.13
[3,] 0.27 0.06
[4,] 0.14 0.13
> w
[1] 0.98
>
> meanEstimation(rets, control = list(type = 'ewma', lambda = w))
[1] 0.1995434 0.1002031
>
> covEstimation(rets, control = list(type = 'ewma', lambda = w))
[,1] [,2]
[1,] 0.007045044 -0.003857217
[2,] -0.003857217 0.002123827
Am I missing something?
Thanks
They give the same answer if type = "lw" is used:
round(covEstimation(rets, control = list(type = 'lw')), 4)
## 0.0032 -0.0017
## -0.0017 0.0010
They are using different algorithms. From the RiskPortfolio manual:
ewma ... See RiskMetrics (1996)
From the Matlab hlp page:
There is no relationship between ewstats function and the RiskMetrics® approach for determining the expected return and covariance from a return time series.
Unfortunately Matlab does not tell us which algorithm is used.
For those who eventually need an equivalent ewstats function in R, here the code I wrote:
ewstats <- function(RetSeries, DecayFactor=NULL, WindowLength=NULL){
#EWSTATS Expected return and covariance from return time series.
# Optional exponential weighting emphasizes more recent data.
#
# [ExpReturn, ExpCovariance, NumEffObs] = ewstats(RetSeries, ...
# DecayFactor, WindowLength)
#
# Inputs:
# RetSeries : NUMOBS by NASSETS matrix of equally spaced incremental
# return observations. The first row is the oldest observation, and the
# last row is the most recent.
#
# DecayFactor : Controls how much less each observation is weighted than its
# successor. The k'th observation back in time has weight DecayFactor^k.
# DecayFactor must lie in the range: 0 < DecayFactor <= 1.
# The default is DecayFactor = 1, which is the equally weighted linear
# moving average Model (BIS).
#
# WindowLength: The number of recent observations used in
# the computation. The default is all NUMOBS observations.
#
# Outputs:
# ExpReturn : 1 by NASSETS estimated expected returns.
#
# ExpCovariance : NASSETS by NASSETS estimated covariance matrix.
#
# NumEffObs: The number of effective observations is given by the formula:
# NumEffObs = (1-DecayFactor^WindowLength)/(1-DecayFactor). Smaller
# DecayFactors or WindowLengths emphasize recent data more strongly, but
# use less of the available data set.
#
# The standard deviations of the asset return processes are given by:
# STDVec = sqrt(diag(ECov)). The correlation matrix is :
# CorrMat = VarMat./( STDVec*STDVec' )
#
# See also MEAN, COV, COV2CORR.
NumObs <- dim(RetSeries)[1]
NumSeries <- dim(RetSeries)[2]
# size the series and the window
if (is.null(WindowLength)) {
WindowLength <- NumObs
}
if (is.null(DecayFactor)) {
DecayFactor = 1
}
if (DecayFactor <= 0 | DecayFactor > 1) {
stop('Must have 0< decay factor <= 1.')
}
if (WindowLength > NumObs){
stop(sprintf('Window Length #d must be <= number of observations #d',
WindowLength, NumObs))
}
# ------------------------------------------------------------------------
# size the data to the window
RetSeries <- RetSeries[NumObs-WindowLength+1:NumObs, ]
# Calculate decay coefficients
DecayPowers <- seq(WindowLength-1, 0, by = -1)
VarWts <- sqrt(DecayFactor)^DecayPowers
RetWts <- (DecayFactor)^DecayPowers
NEff = sum(RetWts) # number of equivalent values in computation
# Compute the exponentially weighted mean return
WtSeries <- matrix(rep(RetWts, times = NumSeries),
nrow = length(RetWts), ncol = NumSeries) * RetSeries
ERet <- colSums(WtSeries)/NEff;
# Subtract the weighted mean from the original Series
CenteredSeries <- RetSeries - matrix(rep(ERet, each = WindowLength),
nrow = WindowLength, ncol = length(ERet))
# Compute the weighted variance
WtSeries <- matrix(rep(VarWts, times = NumSeries),
nrow = length(VarWts), ncol = NumSeries) * CenteredSeries
ECov <- t(WtSeries) %*% WtSeries / NEff
list(ExpReturn = ERet, ExpCovariance = ECov, NumEffObs = NEff)
}

Find the parameters provided to an unknown vectorized function

I am trying to create an scoring function (called evalFunc). To get a score I am trying to calculate the R-squared value of a generated model. How do I know how the values are being passed to 'evalFunc' from within the rbga.bin function?
library(genalg)
library(ggplot2)
set.seed(1)
df_factored<-data.frame(colA=runif(30),colB=runif(30),colC=runif(30),colD=runif(30))
dataset <- colnames(df_factored)[2:length(df_factored)]
chromosome = sample(c(0,1),length(dataset),replace=T)
#dataset[chromosome == 1]
evalFunc <- function(x) {
#My end goal is to have the values passed into evalFunc be evaluated as the IV's in a linear model
res<-summary(lm(as.numeric(colA)~x,
data=df_factored))$r.squared
return(res)
}
iter = 10
GAmodel <- rbga.bin(size = 2, popSize = 200, iters = iter, mutationChance = 0.01, elitism = T, evalFunc = evalFunc)
cat(summary(GAmodel))
You can view the source by typing rbga.bin, but better than that you can run debug(rbga.bin), then the next time you call that function, it allows you to step through the function. In this case the first time you get to your function is in this line (approximately line 82 of the function):
evalVals[object] = evalFunc(population[object,
])
At this point, population is 200x2 matrix consisting of 0s and 1s:
head(population)
# [1,] 0 1
# [2,] 0 1
# [3,] 0 1
# [4,] 1 0
# [5,] 0 1
# [6,] 1 0
And object is the number 1, so population[object,] is the vector c(0,1).
When you've finished with debug you can undebug(rbga.bin) and it won't go into debug mode every time you call rbga.bin.

Resources