What is wrong with my implementation of AdaBoost? - r

I tried to implement the AdaBoost algorithm of Freund and Schapire as close to the original as possible (see p. 2 here: http://rob.schapire.net/papers/explaining-adaboost.pdf):
library(rpart)
library(OneR)
maxdepth <- 1
T <- 100 # number of rounds
# Given: (x_1, y_1),...,(x_m, y_m) where x_i element of X, y_i element of {-1, +1}
myocarde <- read.table("http://freakonometrics.free.fr/myocarde.csv", head = TRUE, sep = ";")
#myocarde <- read.table("data/myocarde.csv", header = TRUE, sep = ";")
y <- (myocarde[ , "PRONO"] == "SURVIE") * 2 - 1
x <- myocarde[ , 1:7]
m <- nrow(x)
data <- data.frame(x, y)
# Initialize: D_1(i) = 1/m for i = 1,...,m
D <- rep(1/m, m)
H <- replicate(T, list())
a <- vector(mode = "numeric", T)
set.seed(123)
# For t = 1,...,T
for(t in 1:T) {
# Train weak learner using distribution D_t
# Get weak hypothesis h_t: X -> {-1, +1}
data_D_t <- data[sample(m, 10*m, replace = TRUE, prob = D), ]
H[[t]] <- rpart(y ~., data = data_D_t, maxdepth = maxdepth, method = "class")
# Aim: select h_t with low weighted error: e_t = Pr_i~D_t[h_t(x_i) != y_i]
h <- predict(H[[t]], x, type = "class")
e <- sum(h != y) / m
# Choose a_t = 0.5 * log((1-e) / e)
a[t] <- 0.5 * log((1-e) / e)
# Update for i = 1,...,m: D_t+1(i) = (D_t(i) * exp(-a_t * y_i * h_t(x_i))) / Z_t
# where Z_t is a normalization factor (chosen so that Dt+1 will be a distribution)
D <- D * exp(-a[t] * y * as.numeric(h))
D <- D / sum(D)
}
# Output the final hypothesis: H(x) = sign(sum of a_t * h_t(x) for t=1 to T)
newdata <- x
H_x <- sapply(H, function(x) as.numeric(as.character(predict(x, newdata = newdata, type = "class"))))
H_x <- t(a * t(H_x))
pred <- sign(rowSums(H_x))
#H
#a
eval_model(pred, y)
##
## Confusion matrix (absolute):
## Actual
## Prediction -1 1 Sum
## -1 0 1 1
## 1 29 41 70
## Sum 29 42 71
##
## Confusion matrix (relative):
## Actual
## Prediction -1 1 Sum
## -1 0.00 0.01 0.01
## 1 0.41 0.58 0.99
## Sum 0.41 0.59 1.00
##
## Accuracy:
## 0.5775 (41/71)
##
## Error rate:
## 0.4225 (30/71)
##
## Error rate reduction (vs. base rate):
## -0.0345 (p-value = 0.6436)
As can be seen, the accuracy of the model is horrible compared to other AdaBoost implementations, e.g.:
library(JOUSBoost)
## JOUSBoost 2.1.0
boost <- adaboost(as.matrix(x), y, tree_depth = maxdepth, n_rounds = T)
pred <- predict(boost, x)
eval_model(pred, y)
##
## Confusion matrix (absolute):
## Actual
## Prediction -1 1 Sum
## -1 29 0 29
## 1 0 42 42
## Sum 29 42 71
##
## Confusion matrix (relative):
## Actual
## Prediction -1 1 Sum
## -1 0.41 0.00 0.41
## 1 0.00 0.59 0.59
## Sum 0.41 0.59 1.00
##
## Accuracy:
## 1 (71/71)
##
## Error rate:
## 0 (0/71)
##
## Error rate reduction (vs. base rate):
## 1 (p-value < 2.2e-16)
My question
Could you please give me a hint what went wrong in my implementation? Thank you
Edit
The final and corrected code can be found in my blog post: Understanding AdaBoost – or how to turn Weakness into Strength

There are quite a few contributing factors as to why your implementation is not working.
You were not using rpart correctly. The AdaBoost algorithm does not call for upsampling with the weights, and rpart itself can accept weights. My example below shows how rpart should be used for this purpose.
The calculation of the weighted error was wrong. You were calculating the error proportion (number of incorrectly predicted samples divided by the number of samples). AdaBoost uses the sum of the weights of the incorrectly predicted samples (sum(D[y != yhat])).
The final predictions seemed to be incorrect too; I just ended up doing a simple loop.
Next time I recommend diving into the source code of the other implementations you are comparing against.
https://github.com/cran/JOUSBoost/blob/master/R/adaboost.R uses almost identical code to my example below, and probably would have helped guide you originally.
Additionally, using T as a variable could potentially interfere with the logical TRUE and its shorthand T, so I'd avoid it.
### packages ###
library(rpart)
library(OneR)
### parameters ###
maxdepth <- 1
rounds <- 100
set.seed(123)
### data ###
myocarde <- read.table("http://freakonometrics.free.fr/myocarde.csv", head = TRUE, sep = ";")
y <- (myocarde[ , "PRONO"] == "SURVIE") * 2 - 1
x <- myocarde[ , 1:7]
m <- nrow(x)
dataset <- data.frame(x, y)
### initialisation ###
D <- rep(1/m, m)
H <- list()
a <- vector(mode = "numeric", length = rounds)
for (i in seq.int(rounds)) {
# train weak learner
H[[i]] = rpart(y ~ ., data = dataset, weights = D, maxdepth = maxdepth, method = "class")
# predictions
yhat <- predict(H[[i]], x, type = "class")
yhat <- as.numeric(as.character(yhat))
# weighted error
e <- sum(D[yhat != y])
# alpha coefficient
a[i] <- 0.5 * log((1 - e) / e)
# updating weights (D)
D <- D * exp(-a[i] * y * yhat)
D <- D / sum(D)
}
# predict with each weak learner on dataset
y_hat_final <- vector(mode = "numeric", length = m)
for (i in seq(rounds)) {
pred = predict(H[[i]], dataset, type = "class")
pred = as.numeric(as.character(pred))
y_hat_final = y_hat_final + (a[i] * pred)
}
pred <- sign(y_hat_final)
eval_model(pred, y)
> eval_model(pred, y)
Confusion matrix (absolute):
Actual
Prediction -1 1 Sum
-1 29 0 29
1 0 42 42
Sum 29 42 71
Confusion matrix (relative):
Actual
Prediction -1 1 Sum
-1 0.41 0.00 0.41
1 0.00 0.59 0.59
Sum 0.41 0.59 1.00
Accuracy:
1 (71/71)
Error rate:
0 (0/71)
Error rate reduction (vs. base rate):
1 (p-value < 2.2e-16)
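To make the weighted-error point concrete, here is a minimal sketch on made-up values (the vectors D, y and yhat are illustrative only, not taken from the myocarde data). It contrasts the unweighted error proportion with the weighted error and performs one weight update using the same formulas as in the loop above:

```r
# Toy values, made up for illustration only
D    <- c(0.1, 0.1, 0.4, 0.4)   # current weights, sum to 1
y    <- c(1, -1, 1, -1)         # true labels
yhat <- c(1,  1, 1, -1)         # weak learner's predictions (second one is wrong)

e_prop     <- sum(yhat != y) / length(y)  # unweighted proportion, as in the question
e_weighted <- sum(D[yhat != y])           # weighted error, as AdaBoost defines it

# alpha coefficient and weight update
alpha <- 0.5 * log((1 - e_weighted) / e_weighted)
D_new <- D * exp(-alpha * y * yhat)
D_new <- D_new / sum(D_new)

c(e_prop = e_prop, e_weighted = e_weighted)
sum(D_new[yhat != y])  # weight now sitting on the misclassified point
```

When alpha is computed from the weighted error, the update places exactly half of the total weight on the misclassified points, which is what pushes the next weak learner towards them. Plugging in the unweighted proportion instead generally breaks that balance.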

Related

Counter loops for summary data Confidence Intervals in R

I am trying to use a for-loop as a repeat counter to add summary data to a test sample. I have tried to use a data.frame, a matrix, and a vector to push my data out of the for loop and populate a table. The best I have managed is filling one complete column in a vector, and filling all columns but only one row in a data frame.
#try empty vector to populate
large.sample.df <- vector(mode = "double", length = 1000)
#try matrix to populate
large.matrix <- matrix(nrow = 1000, ncol = 3)
matrix.names <- c("mean", "lwr", "upr")
colnames(large.matrix) <- matrix.names
#Try dataframe to populate
large.df <- data.frame(mean="", lwr="", upr="")
#set total length
n <- length(large.sample.df)
#use functions to calculate confidence interval
lwr.ci <- function(a) (mean(a) - 1.96 * (sd(a)/sqrt(length(a))))
upp.ci <- function(a) (mean(a) + 1.96 * (sd(a)/sqrt(length(a))))
#Start new seed count
set.seed(1234)
#begin for loop for mean, lwr, upr CI
for (i in 1:n) {
large.sample <- rgamma(n = 1000, shape = 4, rate = 2)
large.df$mean[i] <- mean(large.sample)
large.df$lwr[i] <- lwr.ci(large.sample)
large.df$upr[i] <- upp.ci(large.sample)
}
Here are two ways to get what you want. First we should distinguish between the sample size and the number of samples:
set.seed(1234)
n <- 1000
samples <- 10 # Keep this small for testing and then increase it
s <- 4
r <- 2
First your loop approach:
results <- data.frame(mean=NA, lwr=NA, upr=NA) # Not "" which makes the variables character strings
set.seed(1234)
for (i in 1:samples) {
x <- rgamma(n, shape = s, rate = r)
mn <- mean(x)
sder <- sd(x)/sqrt(n)
lwr <- mn - 1.96 * sder
upr <- mn + 1.96 * sder
results[i, ] <- c(mn, lwr, upr)
}
results
# mean lwr upr
# 1 2.015193688 1.952431714 2.077955663
# 2 2.024218250 1.962404608 2.086031891
# 3 2.008401293 1.948363928 2.068438658
# 4 1.993061142 1.932020588 2.054101696
# 5 1.975824831 1.912961486 2.038688176
# 6 1.983761126 1.923583927 2.043938325
# 7 1.983166350 1.924890819 2.041441880
# 8 1.975453269 1.915336118 2.035570420
# 9 1.976118333 1.915025748 2.037210918
# 10 2.044088839 1.983435628 2.104742050
Now using replicate:
confint <- function(n, s, r) {
x <- rgamma(n, shape = s, rate = r)
mn <- mean(x)
sder <- sd(x)/sqrt(n)
lwr <- mn - 1.96 * sder
upr <- mn + 1.96 * sder
return(c(mean=mn, lwr=lwr, upr=upr))
}
confint(n, s, r) # Test the function
# mean lwr upr
# 1.974328366 1.914003710 2.034653023
set.seed(1234)
results <- replicate(samples, confint(n, s, r))
results <- t(results)
results
# mean lwr upr
# [1,] 2.015193688 1.952431714 2.077955663
# [2,] 2.024218250 1.962404608 2.086031891
# [3,] 2.008401293 1.948363928 2.068438658
# [4,] 1.993061142 1.932020588 2.054101696
# [5,] 1.975824831 1.912961486 2.038688176
# [6,] 1.983761126 1.923583927 2.043938325
# [7,] 1.983166350 1.924890819 2.041441880
# [8,] 1.975453269 1.915336118 2.035570420
# [9,] 1.976118333 1.915025748 2.037210918
# [10,] 2.044088839 1.983435628 2.104742050
Both approaches agree.
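Once the intervals are collected, a natural follow-up is to check how often they contain the true mean, which is shape/rate = 2 for this gamma distribution. The sketch below is one way to do that; ci_once is a hypothetical helper name, and the number of samples is raised to 1000 purely for illustration:

```r
set.seed(1234)
n <- 1000
samples <- 1000
s <- 4
r <- 2
true_mean <- s / r   # mean of a Gamma(shape = s, rate = r) distribution

# ci_once is a hypothetical helper; same arithmetic as the loop above
ci_once <- function(n, s, r) {
  x <- rgamma(n, shape = s, rate = r)
  half <- 1.96 * sd(x) / sqrt(n)
  c(mean = mean(x), lwr = mean(x) - half, upr = mean(x) + half)
}

# vapply returns a 3 x samples matrix; transpose to one row per sample
results <- t(vapply(seq_len(samples),
                    function(i) ci_once(n, s, r),
                    c(mean = 0, lwr = 0, upr = 0)))

# Share of intervals that contain the true mean
coverage <- mean(results[, "lwr"] <= true_mean & true_mean <= results[, "upr"])
coverage
```

The coverage should land near the nominal 95%, with some simulation noise.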

Inner-loop behavior of a parallelized structure using R

I want to understand how my parallelization is working when there is a for-loop structure inside of the structure that I am parallelizing.
I have a routine called reg_simulation(), which generates 100 estimations (nrep = 100) of a linear regression, each using a different seed (seed <- seed + i).
Additionally, I wrapped the reg_simulation() routine inside par_wrapper() to run it using different possible configurations of the data generating process, in particular changing the number of observations (obs) and the error term variance (sigma). Finally, I parallelized this structure using pblapply.
Using the described setup, I am using a grid of obs = c(250, 500, 750, 1000, 2500) and
sigma = c(0.1, 0.2, 0.5, 0.8, 1), meaning 5 values for each variable, leading to 25 combinations of the two variables. However, I am running each of these 25 combinations 100 times.
Finally, here is my question:
My code is...
(a) Running in parallel 25 combinations but serially the 100 repetition inside of them.
(b) Running in parallel all the 2500 models.
If the answer is (a), please let me know how you arrived at that conclusion, because I haven't been able to work it out yet, and it probably implies that I should change my code structure.
Some additional comments: (1) The seed declaration on each iteration is important because it allows me to recover each possible combination of the data (e.g., iteration 78 (seed = 78), with sigma=0.1 and obs=1000) (2) I am using pblapply because I want to track my code simulations' progress.
Here are the aforementioned routines:
reg_simulation()
reg_simulation<- function(obs = 1000,
sigma = 0.5,
nrep = 10 ,
seed = 0){
#pre-allocate the results list
res <- vector("list", nrep)
# Forloop
for ( i in 1:nrep) {
#Changing seed each iteration
seed <- seed + i
#set seed
set.seed(seed)
#DGP
x1 <- rnorm(obs, 0 , sigma)
x2 <- rnorm(obs, 0 , sigma)
y <- 1 + 0.5* x1 + 1.5 * x2 + rnorm(obs, 0 , 1)
#Estimate OLS
ols <- lm(y ~ x1 + x2)
returnlist <- list(intercept = ols$coefficients[1],
beta1 = ols$coefficients[2],
beta2 = ols$coefficients[3],
seed = seed)
#save each iteration
res[[i]] <- returnlist
}
return(res)
}
par_wrapper()
### parallel wrapper
par_wrapper <- function(obs = c(250,500,750,1000,2500),
sigma = c(0.1, 0.2, 0.5, 0.8 , 1 ) ,
nrep = 10,
nClusters = 4)
{
require(parallel)
require(pbapply)
#grid of searching space
prs <- expand.grid(obs = obs,
sigma = sigma)
nprs <- nrow(prs)
rownames(prs) <- c(1:NROW(prs))
#Print number of combinations
print(prs)
#### ---- PARALLEL INIT ---- ####
## Parallel options
cl <- makeCluster(nClusters)
## Attaching necessary functions for internal computations
parallel::clusterExport(cl= cl,
list("reg_simulation"))
# pblapply
par_simres <- pblapply(cl = cl,
X = 1:nprs,
FUN = function(i){
reg_simulation(
sigma = prs$sigma[i],
obs = prs$obs[i],
nrep = nrep,
seed = 0)})
##exit cluster mode
stopCluster(cl)
return(par_simres)
}
Using the par_wrapper() function over a grid.
#using generated structure.
res_list <- par_wrapper(
obs = c(250,500,750,1000, 2500 ),
sigma = c(0.1, 0.2, 0.5, 0.8 , 1 ) ,
nrep = 100,
nClusters = 4)
Console output.
# obs sigma
# 1 250 0.1
# 2 500 0.1
# 3 750 0.1
# 4 1000 0.1
# 5 2500 0.1
# 6 250 0.2
# 7 500 0.2
# 8 750 0.2
# 9 1000 0.2
# 10 2500 0.2
# 11 250 0.5
# 12 500 0.5
# 13 750 0.5
# 14 1000 0.5
# 15 2500 0.5
# 16 250 0.8
# 17 500 0.8
# 18 750 0.8
# 19 1000 0.8
# 20 2500 0.8
# 21 250 1.0
# 22 500 1.0
# 23 750 1.0
# 24 1000 1.0
# 25 2500 1.0
# |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=01s
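As written, pblapply receives X = 1:nprs, so the unit of work handed to a worker is one (obs, sigma) combination, and the for-loop inside reg_simulation() then runs its nrep repetitions one after another inside that worker. That corresponds to case (a). Note also that seed <- seed + i accumulates across iterations (1, 3, 6, ...), so iteration 78 does not end up with seed 78. The sketch below shows one possible restructuring in which every (rep, obs, sigma) triple becomes its own task; one_fit and tasks are hypothetical names, and the call is run serially on a few tasks so that it works without a cluster:

```r
# One row per (rep, obs, sigma): 100 * 5 * 5 = 2500 independent tasks
tasks <- expand.grid(rep   = 1:100,
                     obs   = c(250, 500, 750, 1000, 2500),
                     sigma = c(0.1, 0.2, 0.5, 0.8, 1))

# one_fit is a hypothetical name; it runs a single repetition of the DGP above
one_fit <- function(k, tasks) {
  set.seed(tasks$rep[k])                 # seed equals the repetition index
  n <- tasks$obs[k]
  s <- tasks$sigma[k]
  x1 <- rnorm(n, 0, s)
  x2 <- rnorm(n, 0, s)
  y  <- 1 + 0.5 * x1 + 1.5 * x2 + rnorm(n, 0, 1)
  b  <- unname(coef(lm(y ~ x1 + x2)))
  c(intercept = b[1], beta1 = b[2], beta2 = b[3],
    seed = tasks$rep[k],
    pid = Sys.getpid())                  # shows which process did the work
}

# Serial run on the first few tasks. With a cluster cl, the analogous call would be
#   pbapply::pblapply(seq_len(nrow(tasks)), one_fit, tasks = tasks, cl = cl)
res <- do.call(rbind, lapply(1:10, one_fit, tasks = tasks))
res
```

Whether the finer split is actually faster depends on per-task overhead: for fits this small, shipping 2500 tiny tasks to the workers may cost more in communication than it saves. Comparing the pid column across rows is a simple way to see which worker handled which task.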

Expected return and covariance from return time series

I’m trying to replicate the Matlab ewstats function defined here:
https://it.mathworks.com/help/finance/ewstats.html
The results given by Matlab are the following ones:
> ExpReturn = 1×2
0.1995 0.1002
> ExpCovariance = 2×2
0.0032 -0.0017
-0.0017 0.0010
I’m trying to replicate the example with the RiskPortfolios R package:
https://cran.r-project.org/web/packages/RiskPortfolios/RiskPortfolios.pdf
The R code I’m using is this one:
library(RiskPortfolios)
rets <- as.matrix(cbind(c(0.24, 0.15, 0.27, 0.14), c(0.08, 0.13, 0.06, 0.13)))
w <- 0.98
rets
w
meanEstimation(rets, control = list(type = 'ewma', lambda = w))
covEstimation(rets, control = list(type = 'ewma', lambda = w))
The mean estimate is the same as the one in the example, but the covariance matrix is different:
> rets
[,1] [,2]
[1,] 0.24 0.08
[2,] 0.15 0.13
[3,] 0.27 0.06
[4,] 0.14 0.13
> w
[1] 0.98
>
> meanEstimation(rets, control = list(type = 'ewma', lambda = w))
[1] 0.1995434 0.1002031
>
> covEstimation(rets, control = list(type = 'ewma', lambda = w))
[,1] [,2]
[1,] 0.007045044 -0.003857217
[2,] -0.003857217 0.002123827
Am I missing something?
Thanks
They give the same answer if type = "lw" is used:
round(covEstimation(rets, control = list(type = 'lw')), 4)
## 0.0032 -0.0017
## -0.0017 0.0010
They are using different algorithms. From the RiskPortfolios manual:
ewma ... See RiskMetrics (1996)
From the Matlab help page:
There is no relationship between ewstats function and the RiskMetrics® approach for determining the expected return and covariance from a return time series.
Unfortunately Matlab does not tell us which algorithm is used.
For those who eventually need an equivalent ewstats function in R, here is the code I wrote:
ewstats <- function(RetSeries, DecayFactor=NULL, WindowLength=NULL){
#EWSTATS Expected return and covariance from return time series.
# Optional exponential weighting emphasizes more recent data.
#
# [ExpReturn, ExpCovariance, NumEffObs] = ewstats(RetSeries, ...
# DecayFactor, WindowLength)
#
# Inputs:
# RetSeries : NUMOBS by NASSETS matrix of equally spaced incremental
# return observations. The first row is the oldest observation, and the
# last row is the most recent.
#
# DecayFactor : Controls how much less each observation is weighted than its
# successor. The k'th observation back in time has weight DecayFactor^k.
# DecayFactor must lie in the range: 0 < DecayFactor <= 1.
# The default is DecayFactor = 1, which is the equally weighted linear
# moving average Model (BIS).
#
# WindowLength: The number of recent observations used in
# the computation. The default is all NUMOBS observations.
#
# Outputs:
# ExpReturn : 1 by NASSETS estimated expected returns.
#
# ExpCovariance : NASSETS by NASSETS estimated covariance matrix.
#
# NumEffObs: The number of effective observations is given by the formula:
# NumEffObs = (1-DecayFactor^WindowLength)/(1-DecayFactor). Smaller
# DecayFactors or WindowLengths emphasize recent data more strongly, but
# use less of the available data set.
#
# The standard deviations of the asset return processes are given by:
# STDVec = sqrt(diag(ECov)). The correlation matrix is :
# CorrMat = VarMat./( STDVec*STDVec' )
#
# See also MEAN, COV, COV2CORR.
NumObs <- dim(RetSeries)[1]
NumSeries <- dim(RetSeries)[2]
# size the series and the window
if (is.null(WindowLength)) {
WindowLength <- NumObs
}
if (is.null(DecayFactor)) {
DecayFactor = 1
}
if (DecayFactor <= 0 | DecayFactor > 1) {
stop('Must have 0< decay factor <= 1.')
}
if (WindowLength > NumObs){
stop(sprintf('Window Length %d must be <= number of observations %d',
WindowLength, NumObs))
}
# ------------------------------------------------------------------------
# size the data to the window
# (parentheses are needed: ':' binds tighter than '+' and '-')
RetSeries <- RetSeries[(NumObs-WindowLength+1):NumObs, , drop = FALSE]
# Calculate decay coefficients
DecayPowers <- seq(WindowLength-1, 0, by = -1)
VarWts <- sqrt(DecayFactor)^DecayPowers
RetWts <- (DecayFactor)^DecayPowers
NEff = sum(RetWts) # number of equivalent values in computation
# Compute the exponentially weighted mean return
WtSeries <- matrix(rep(RetWts, times = NumSeries),
nrow = length(RetWts), ncol = NumSeries) * RetSeries
ERet <- colSums(WtSeries)/NEff;
# Subtract the weighted mean from the original Series
CenteredSeries <- RetSeries - matrix(rep(ERet, each = WindowLength),
nrow = WindowLength, ncol = length(ERet))
# Compute the weighted variance
WtSeries <- matrix(rep(VarWts, times = NumSeries),
nrow = length(VarWts), ncol = NumSeries) * CenteredSeries
ECov <- t(WtSeries) %*% WtSeries / NEff
list(ExpReturn = ERet, ExpCovariance = ECov, NumEffObs = NEff)
}
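As a cross-check, the same exponentially weighted mean and covariance can be obtained with base R's stats::cov.wt, assuming the weighting described in the function's header comment (weight DecayFactor^k for the observation k steps back, normalized by the sum of the weights). This is only a sketch that reproduces the Matlab figures quoted in the question for this example; it says nothing about how ewstats is implemented internally:

```r
rets <- cbind(c(0.24, 0.15, 0.27, 0.14), c(0.08, 0.13, 0.06, 0.13))
lambda <- 0.98

# Weight lambda^k for the observation k steps back in time (last row is most recent)
wts <- lambda^((nrow(rets) - 1):0)

# method = "ML" normalizes by the sum of weights, with no small-sample correction
fit <- cov.wt(rets, wt = wts / sum(wts), method = "ML")
round(fit$center, 4)  # 0.1995 0.1002
round(fit$cov, 4)     # 0.0032 -0.0017 / -0.0017 0.0010
```

sum(wts) here plays the role of NumEffObs in the function above.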

Make LOOCV function work faster in R

I have written this function to perform LOOCV for classification of samples in a dataset. The problem is that it takes too long. I need to use it for bootstrapping with 150 iterations. How can I make it faster, please?
DoLOOCVa2 <- function(X,Class,S,MTHD,LV) {
#
# Leave-one-out cross validation (LOO-CV) with either the pls.lda function (MTHD = 1)
# or SVMs (MTHD = 2)
# X is the input matrix
# Class is the class vector
# S is the scaling to be performed:
# 0 = None; 1 = MC; 2 = AS; 3 = RS1; 4 = RS2; 5 = Norm
# MTHD is the classification function
# LV is the maximum number of latent variables (= 0 if SVM is used)
# Differs from DoLOOCV2 in that only the minimum LV is outputted corresponding to the maximum %Overall
#
#
source("pretreat.r") # Source function in case not sourced from calling script
C5 <- 0
C6 <- 0
cat(paste("\nSetting number of LVs to ", LV, "...\n"))
# Set number of iterations to LV
for (i in 1:length(LV)) {
cat(paste("\nLOO-CV for LV",i,"\n"))
# Set number of iterations to the number of rows of X
for (j in 1:dim(X)[1]) {
X1 <- X[-j,] # Remove the row j from X so that X1 is a "training set"
C1 <- Class[-j] # Remove element j from Class
X2 <- t(as.matrix(X[j,])) # Make X2 = to row j of X so that X2 is a "test set"
C2 <- Class[j] # Make C2 = to element j of Class
Xn <- pretreat(X1,X2,S) # Apply data pre-treatment
X1a <- Xn$trDATAscaled # Extract the scaled "training set"
X2a <- Xn$tstDATAscaled # Extract the scaled "test set"
if (MTHD == 1) {
# PLS-LDA
C3 <- pls.lda(X1a,C1,X2a,LV[i]) # Perform the classification
C4 <- as.numeric(C3$predclass) # Extract the predicted class
} else if (MTHD == 2) {
# SVM (LV = 1)
MODEL <- svm(X1a,as.factor(C1),scale=FALSE,type="C-classification",kernel="radial")
C4 <- as.numeric(predict(MODEL,X2a))
} else if (MTHD == 3) {
# Random Forests
C3 <- randomForest(x=X1a,y=as.factor(C1),xtest=X2a,ntree=LV[i]) # Perform the classification (the LV value here indicates the number of trees)
C4 <- as.numeric(C3$test$predicted) # Extract the predicted class
}
C5[j] <- C4 - C2 # Calculate the difference between the predicted class (C4) and the actual class (C2)
}
C6[i] <- 100*(length(which(C5==0)))/length(C5) # Work out the overall success rate of classification for LV i
}
cat("\nThe %success classified was:\n")
show(C6)
if (MTHD == 1 | MTHD == 3) {
if (MTHD == 1) {
cat("\nThe suggested optimum LV(s) are: \n")
OptLVs <- which(C6==max(C6)) # Attain the maximum success rate
show(OptLVs)
return(list(OptLVs=OptLVs[1],Overall=C6))
} else {
cat("\nThe suggested optimum TREE(s) are: \n")
OLV <- which(C6==max(C6)) # Attain the maximum success rate
OptLVs <- LV[OLV]
show(OptLVs)
return(list(OptLVs=OptLVs[1],Overall=C6))
}
} else {
OptLVs <- 0
return(C6)
}
} # End function
I managed to do the profiling; below is the result:
total.time total.pct self.time self.pct
"DoLOOCVa2" 17.90 100.00 0.04 0.22
"pls.lda" 9.62 53.74 0.02 0.11
"pretreat" 8.20 45.81 0.02 0.11
"source" 6.02 33.63 1.24 6.93
"lda" 3.98 22.23 0.02 0.11
"pretreat1" 3.96 22.12 0.04 0.22
"lda.formula" 3.96 22.12 0.00 0.00
"FUN" 3.48 19.44 1.62 9.05

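The profile above attributes roughly a third of the total time to source, which suggests (this is an assumption, since pretreat.r is not shown) that one or more files are re-sourced on every call rather than once. A minimal sketch of a guard that sources a file only when the function it defines is missing; source_once and pretreat_demo are hypothetical names, and a throwaway temporary file stands in for pretreat.r:

```r
# Hypothetical helper: source a file only if the function it defines is missing
source_once <- function(fun_name, file, envir = globalenv()) {
  if (!exists(fun_name, envir = envir, mode = "function", inherits = FALSE)) {
    source(file, local = envir)
    return(TRUE)   # the file was sourced
  }
  FALSE            # already defined, nothing to do
}

# Demonstration with a temporary file standing in for "pretreat.r"
tmp <- tempfile(fileext = ".r")
writeLines("pretreat_demo <- function(x) scale(x)", tmp)
env <- new.env()
first  <- source_once("pretreat_demo", tmp, envir = env)  # sources the file
second <- source_once("pretreat_demo", tmp, envir = env)  # skips it
c(first = first, second = second)
```

The simpler alternative is to move every source() call out of the functions and into the calling script, so that the files are read once per session.
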
Nonlinear Least Squares in R - Levenberg Marquardt to Fit Heligman Pollard Model Parameters

I am attempting to reproduce the solutions of a paper by Kostakis. In this paper an abridged mortality table is expanded to a complete life table using the Heligman-Pollard model. The model has 8 parameters which have to be fitted. The author used a modified Gauss-Newton algorithm; this algorithm (E04FDF) is part of the NAG library of computer programs. Should Levenberg-Marquardt not yield the same set of parameters? What is wrong with my code or my application of the LM algorithm?
library(minpack.lm)
## Heligman-Pollard is used to expand an abridged table.
## nonlinear least squares algorithm is used to fit the parameters on nqx observed over 5 year intervals (5qx)
AGE <- c(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70)
MORTALITY <- c(0.010384069, 0.001469140, 0.001309318, 0.003814265, 0.005378395, 0.005985625, 0.006741766, 0.009325056, 0.014149626, 0.021601755, 0.034271934, 0.053836246, 0.085287751, 0.136549522, 0.215953304)
## The start parameters for the Heligman-Pollard formula (converged set a=0.0005893,b=0.0043836,c=0.0828424,d=0.000706,e=9.927863,f=22.197312,g=0.00004948,h=1.10003)
## I perturbed one parameter ("a") in order to have start values. The converged set is listed above.
parStart <- list(a=0.0008893,b=0.0043836,c=0.0828424,d=0.000706,e=9.927863,f=22.197312,g=0.00004948,h=1.10003)
## The Heligman-Pollard Formula (HP8) = qx/px = ...8 parameter equation
HP8 <-function(parS,x)
ifelse(x==0, parS$a^((x+parS$b)^parS$c) + parS$g*parS$h^x,
parS$a^((x+parS$b)^parS$c) + parS$d*exp(-parS$e*(log(x/parS$f))^2) +
parS$g*parS$h^x)
## Define qx = HP8/(1+HP8)
qxPred <- function(parS,x) HP8(parS,x)/(1+HP8(parS,x))
## Calculate nqx predicted by HP8 model (nqxPred(parStart,x))
nqxPred <- function(parS,x)
(1 -(1-qxPred(parS,x)) * (1-qxPred(parS,x+1)) *
(1-qxPred(parS,x+2)) * (1-qxPred(parS,x+3)) *
(1-qxPred(parS,x+4)))
##Define Residual Function, the relative squared distance is minimized
ResidFun <- function(parS, Observed,x) (nqxPred(parS,x)/Observed-1)^2
## Applying the nls.lm algo.
nls.out <- nls.lm(par=parStart, fn = ResidFun, Observed = MORTALITY, x = AGE,
control = nls.lm.control(nprint=1,
ftol = .Machine$double.eps,
ptol = .Machine$double.eps,
maxfev=10000, maxiter = 500))
summary(nls.out)
## The author used a modified Gauss-Newton algorithm; this algorithm (E04FDF) is part of the NAG library of computer programs
## Should Levenberg-Marquardt not yield the same set of parameters?
The bottom line here is that @Roland is absolutely right: this is a very ill-posed problem, and you shouldn't necessarily expect to get reliable answers. Below I've
cleaned up the code in a few small ways (this is just aesthetic)
changed the ResidFun to return residuals, not squared residuals. (The former is correct, but this doesn't make very much difference.)
explored results from several different optimizers. It actually looks like the answer you're getting is better than the "converged parameters" you list above, which I'm assuming are the parameters from the original study (can you please provide a reference?).
Load package:
library(minpack.lm)
Data, as a data frame:
d <- data.frame(
AGE = seq(0,70,by=5),
MORTALITY=c(0.010384069, 0.001469140, 0.001309318, 0.003814265,
0.005378395, 0.005985625, 0.006741766, 0.009325056,
0.014149626, 0.021601755, 0.034271934, 0.053836246,
0.085287751, 0.136549522, 0.215953304))
First view of the data:
library(ggplot2)
(g1 <- ggplot(d,aes(AGE,MORTALITY))+geom_point())
g1+geom_smooth() ## with loess fit
Parameter choices:
Presumably these are the parameters from the original paper ...
parConv <- c(a=0.0005893,b=0.0043836,c=0.0828424,
d=0.000706,e=9.927863,f=22.197312,g=0.00004948,h=1.10003)
Perturbed parameters:
parStart <- parConv
parStart["a"] <- parStart["a"]+3e-4
The formulae:
HP8 <-function(parS,x)
with(as.list(parS),
ifelse(x==0, a^((x+b)^c) + g*h^x,
a^((x+b)^c) + d*exp(-e*(log(x/f))^2) + g*h^x))
## Define qx = HP8/(1+HP8)
qxPred <- function(parS,x) {
h <- HP8(parS,x)
h/(1+h)
}
## Calculate nqx predicted by HP8 model (nqxPred(parStart,x))
nqxPred <- function(parS,x)
(1 -(1-qxPred(parS,x)) * (1-qxPred(parS,x+1)) *
(1-qxPred(parS,x+2)) * (1-qxPred(parS,x+3)) *
(1-qxPred(parS,x+4)))
##Define Residual Function, the relative squared distance is minimized
ResidFun <- function(parS, Observed,x) (nqxPred(parS,x)/Observed-1)
n.b. this is changed slightly from the OP's version; nls.lm wants residuals, not squared residuals.
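The reason this matters is that nls.lm minimizes the sum of squares of whatever vector fn returns, so returning squared residuals turns the objective into a sum of fourth powers. A minimal base-R sketch with made-up data follows; it uses optimize rather than nls.lm, so it only illustrates the change of objective, not the Levenberg-Marquardt algorithm itself:

```r
y <- c(1, 2, 3, 10)               # made-up data with one outlying point
resid_fun <- function(mu) y - mu  # residuals of a constant-only model

# Objective when the function returns residuals: sum of squares
fit_r <- optimize(function(mu) sum(resid_fun(mu)^2),
                  interval = c(0, 10), tol = 1e-8)

# Objective when it returns squared residuals: these get squared again
fit_r2 <- optimize(function(mu) sum((resid_fun(mu)^2)^2),
                   interval = c(0, 10), tol = 1e-8)

c(residuals = fit_r$minimum, squared_residuals = fit_r2$minimum)
```

The first fit recovers the ordinary mean (4), while the second is pulled towards the outlying point, so the two objectives do not share a minimizer in general.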
A sum-of-squares function for use with other optimizers:
ssqfun <- function(parS, Observed, x) {
sum(ResidFun(parS, Observed, x)^2)
}
Applying nls.lm. (Not sure why ftol and ptol were lowered
from sqrt(.Machine$double.eps) to .Machine$double.eps -- the
former is generally a practical limit to precision ...)
nls.out <- nls.lm(par=parStart, fn = ResidFun,
Observed = d$MORTALITY, x = d$AGE,
control = nls.lm.control(nprint=0,
ftol = .Machine$double.eps,
ptol = .Machine$double.eps,
maxfev=10000, maxiter = 1000))
parNLS <- coef(nls.out)
pred0 <- nqxPred(as.list(parConv),d$AGE)
pred1 <- nqxPred(as.list(parNLS),d$AGE)
dPred <- with(d,rbind(data.frame(AGE,MORTALITY=pred0,w="conv"),
data.frame(AGE,MORTALITY=pred1,w="nls")))
g1 + geom_line(data=dPred,aes(colour=w))
The lines are indistinguishable, but the parameters have some big
differences:
round(cbind(parNLS,parConv),5)
## parNLS parConv
## a 1.00000 0.00059
## b 50.46708 0.00438
## c 3.56799 0.08284
## d 0.00072 0.00071
## e 6.05200 9.92786
## f 21.82347 22.19731
## g 0.00005 0.00005
## h 1.10026 1.10003
d,f,g,h are close, but a,b,c are orders of magnitude different and e is 50% different.
Looking at the original equations, what's happening here is that a^((x+b)^c) is getting set to a constant, because a is approaching 1: once a is approximately 1, b and c are essentially irrelevant.
Let's check the correlation (we need a generalized inverse because
the matrix is so strongly correlated):
obj <- nls.out
vcov <- with(obj,deviance/(length(fvec) - length(par)) *
MASS::ginv(hessian))
cmat <- round(cov2cor(vcov),1)
dimnames(cmat) <- list(letters[1:8],letters[1:8])
## a b c d e f g h
## a 1.0 0.0 0.0 0.0 0.0 0.0 -0.1 0.0
## b 0.0 1.0 -1.0 1.0 -1.0 -1.0 -0.4 -1.0
## c 0.0 -1.0 1.0 -1.0 1.0 1.0 0.4 1.0
## d 0.0 1.0 -1.0 1.0 -1.0 -1.0 -0.4 -1.0
## e 0.0 -1.0 1.0 -1.0 1.0 1.0 0.4 1.0
## f 0.0 -1.0 1.0 -1.0 1.0 1.0 0.4 1.0
## g -0.1 -0.4 0.4 -0.4 0.4 0.4 1.0 0.4
## h 0.0 -1.0 1.0 -1.0 1.0 1.0 0.4 1.0
This is not actually so useful -- it really just confirms that lots
of the variables are strongly correlated ...
library(optimx)
mvec <- c('Nelder-Mead','BFGS','CG','L-BFGS-B',
'nlm','nlminb','spg','ucminf')
opt1 <- optimx(par=parStart, fn = ssqfun,
Observed = d$MORTALITY, x = d$AGE,
itnmax=5000,
method=mvec,control=list(kkt=TRUE))
## control=list(all.methods=TRUE,kkt=TRUE)) ## Boom!
## fvalues method fns grs itns conv KKT1 KKT2 xtimes
## 2 8.988466e+307 BFGS NA NULL NULL 9999 NA NA 0
## 3 8.988466e+307 CG NA NULL NULL 9999 NA NA 0
## 4 8.988466e+307 L-BFGS-B NA NULL NULL 9999 NA NA 0
## 5 8.988466e+307 nlm NA NA NA 9999 NA NA 0
## 7 0.3400858 spg 1 NA 1 3 NA NA 0.064
## 8 0.3400858 ucminf 1 1 NULL 0 NA NA 0.032
## 1 0.06099295 Nelder-Mead 501 NA NULL 1 NA NA 0.252
## 6 0.009275733 nlminb 200 1204 145 1 NA NA 0.708
This warns about bad scaling, and also finds a variety of different
answers: only ucminf claims to have converged, but nlminb gets a
better answer -- and the itnmax parameter seems to be ignored ...
opt2 <- nlminb(start=parStart, objective = ssqfun,
Observed = d$MORTALITY, x = d$AGE,
control= list(eval.max=5000,iter.max=5000))
parNLM <- opt2$par
Finishes, but with a false convergence warning ...
round(cbind(parNLS,parConv,parNLM),5)
## parNLS parConv parNLM
## a 1.00000 0.00059 1.00000
## b 50.46708 0.00438 55.37270
## c 3.56799 0.08284 3.89162
## d 0.00072 0.00071 0.00072
## e 6.05200 9.92786 6.04416
## f 21.82347 22.19731 21.82292
## g 0.00005 0.00005 0.00005
## h 1.10026 1.10003 1.10026
sapply(list(parNLS,parConv,parNLM),
ssqfun,Observed=d$MORTALITY,x=d$AGE)
## [1] 0.006346250 0.049972367 0.006315034
It looks like nlminb and minpack.lm are getting similar answers, and are actually doing better than the originally stated parameters (by quite a bit):
pred2 <- nqxPred(as.list(parNLM),d$AGE)
dPred <- with(d,rbind(dPred,
data.frame(AGE,MORTALITY=pred2,w="nlminb")))
g1 + geom_line(data=dPred,aes(colour=w))
ggsave("cmpplot.png")
ggplot(data=dPred,aes(x=AGE,y=MORTALITY-d$MORTALITY,colour=w))+
geom_line()+geom_point(aes(shape=w),alpha=0.3)
ggsave("residplot.png")
Other things one could try would be:
appropriate scaling -- although a quick test of this doesn't seem to help that much
provide analytical gradients
use AD Model Builder
use the slice function from bbmle to explore whether the old and new parameters seem to represent distinct minima, or whether the old parameters are just a false convergence ...
get the KKT (Karush-Kuhn-Tucker) criterion calculators from optimx or related packages working for similar checks
PS: the largest deviations (by far) are for the oldest age classes, which probably also have small samples. From a statistical point of view it would probably be worth doing a fit that weighted by the precision of the individual points ...
@BenBolker, I fitted the parameters with the entire dataset (the underlying qx values) and am still not able to reproduce the parameters:
library(minpack.lm)
library(ggplot2)
library(optimx)
getwd()
d <- data.frame(AGE = seq(0,74), MORTALITY=c(869,58,40,37,36,35,32,28,29,23,24,22,24,28,
33,52,57,77,93,103,103,109,105,114,108,112,119,
125,117,127,125,134,134,131,152,179,173,182,199,
203,232,245,296,315,335,356,405,438,445,535,594,
623,693,749,816,915,994,1128,1172,1294,1473,
1544,1721,1967,2129,2331,2559,2901,3203,3470,
3782,4348,4714,5245,5646))
d$MORTALITY <- d$MORTALITY/100000
ggplot(d,aes(AGE,MORTALITY))+geom_point()
##Not allowed to post Images
g1 <- ggplot(d,aes(AGE,MORTALITY))+geom_point()
g1+geom_smooth()## with loess fit
Reported Parameters:
parConv <- c(a=0.0005893,b=0.0043836,c=0.0828424,d=0.000706,e=9.927863,f=22.197312,
g=0.00004948,h=1.10003)
parStart <- parConv
parStart["a"] <- parStart["a"]+3e-4
## Define qx = HP8/(1+HP8)
HP8 <-function(parS,x)
with(as.list(parS),
ifelse(x==0, a^((x+b)^c) + g*h^x, a^((x+b)^c) + d*exp(-e*(log(x/f))^2) + g*h^x))
qxPred <- function(parS,x) {
h <- HP8(parS,x)
h/(1+h)
}
##Define Residual Function, the relative squared distance is minimized,
ResidFun <- function(parS, Observed,x) (qxPred(parS,x)/Observed-1)
ssqfun <- function(parS, Observed, x) {
sum(ResidFun(parS, Observed, x)^2)
}
nls.out <- nls.lm(par=parStart, fn = ResidFun, Observed = d$MORTALITY, x = d$AGE,
control = nls.lm.control(nprint=1, ftol = sqrt(.Machine$double.eps),
ptol = sqrt(.Machine$double.eps), maxfev=1000, maxiter=1000))
parNLS <- coef(nls.out)
pred0 <- qxPred(as.list(parConv),d$AGE)
pred1 <- qxPred(as.list(parNLS),d$AGE)
#Binds Row wise the dataframes from pred0 and pred1
dPred <- with(d,rbind(data.frame(AGE,MORTALITY=pred0,w="conv"),
data.frame(AGE,MORTALITY=pred1,w="nls")))
g1 + geom_line(data=dPred,aes(colour=w))
round(cbind(parNLS,parConv),7)
mvec <- c('Nelder-Mead','BFGS','CG','L-BFGS-B','nlm','nlminb','spg','ucminf')
opt1 <- optimx(par=parStart, fn = ssqfun,
Observed = d$MORTALITY, x = d$AGE,
itnmax=5000,
method=mvec, control=list(all.methods=TRUE,kkt=TRUE))
## control=list(all.methods=TRUE,kkt=TRUE)) ## Boom
get.result(opt1, attribute= c("fvalues","method", "grs", "itns",
"conv", "KKT1", "KKT2", "xtimes"))
## method fvalues grs itns conv KKT1 KKT2 xtimes
##5 nlm 8.988466e+307 NA NA 9999 NA NA 0
##4 L-BFGS-B 8.988466e+307 NULL NULL 9999 NA NA 0
##2 CG 8.988466e+307 NULL NULL 9999 NA NA 0.02
##1 BFGS 8.988466e+307 NULL NULL 9999 NA NA 0
##3 Nelder-Mead 0.5673864 NA NULL 0 NA NA 0.42
##6 nlminb 0.4127198 546 62 0 NA NA 0.17
opt2 <- nlminb(start=parStart, objective = ssqfun,
Observed = d$MORTALITY, x = d$AGE,
control= list(eval.max=5000,iter.max=5000))
parNLM <- opt2$par
Check on parameters:
round(cbind(parNLS,parConv,parNLM),5)
## parNLS parConv parNLM
##a 0.00058 0.00059 0.00058
##b 0.00369 0.00438 0.00369
##c 0.08065 0.08284 0.08065
##d 0.00070 0.00071 0.00070
##e 9.30948 9.92786 9.30970
##f 22.30769 22.19731 22.30769
##g 0.00005 0.00005 0.00005
##h 1.10084 1.10003 1.10084
SSE Review:
sapply(list(parNLS,parConv,parNLM),
ssqfun,Observed=d$MORTALITY,x=d$AGE)
##[1] 0.4127198 0.4169513 0.4127198
Not able to upload graphs, but the code is here. It still appears that the parameters found in the article are not the best fit when the complete mortality data (not abridged or subset) is used.
##pred2 <- qxPred(as.list(parNLM),d$AGE)
##dPred <- with(d,rbind(dPred,
##data.frame(AGE,MORTALITY=pred2,w="nlminb")))
##g1 + geom_line(data=dPred,aes(colour=w))
ggplot(data=dPred,aes(x=AGE,y=MORTALITY-d$MORTALITY,colour=w)) +
geom_line()+geom_point(aes(shape=w),alpha=0.3)

Resources