Error in nonlinear least squares in R - Logistic and Gompertz curves

I'm working on a model for the variable y, in which I intend to use time as the explanatory variable. I've chosen a Gompertz and a logistic curve as candidates, but when I try to estimate the coefficients (using both nls and nls2), I end up getting different errors (singularity, or step factor reduced below 'minFactor'). I would really appreciate any help. Here are my code and a dput version of the info object.
I chose the initial values according to the criteria in http://www.metla.fi/silvafennica/full/sf33/sf334327.pdf
library(nls2)
> dput(info)
structure(list(y = c(0.308, 0.279, 0.156, 0.214, 0.224, 0.222,
0.19, 0.139, 0.111, 0.17, 0.155, 0.198, 0.811, 0.688, 0.543,
0.536, 0.587, 0.765, 0.667, 0.811, 0.587, 0.617, 0.586, 0.633,
2.231, 2.202, 1.396, 1.442, 1.704, 2.59, 2.304, 3.026, 2.7, 3.275,
3.349, 3.936, 9.212, 8.773, 6.431, 6.983, 7.169, 9.756, 10.951,
13.938, 14.378, 18.406, 24.079, 28.462, 51.461, 46.555, 39.116,
43.982, 41.722), t = 1:53), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -53L))
summary(gomp_nls <- nls2(y ~ alpha*exp(-beta*exp(-gamma*t)),
data = info,
start = list(alpha = 40, beta = 4.9, gamma = 0.02),
algorithm = "default")
)
summary(logist_nls <- nls2(y ~ alpha/(1+beta*exp(-gamma*t)),
data = info,
start = list(alpha = 40, beta = 128, gamma = 0.02),
algorithm = "default"))
I'd appreciate any help

The "default" algorithm for nls2 is to use nls. You want to specify "brute-force" or one of the other algorithms for finding an initial value. The starting value should be a data frame of two rows such that it will fill in the hypercube so defined with potential starting values.
It will then evaluate the residual sum of squares at each of those starting values and return the starting values at which the formula gives the least sum of squares.
If you find that the result returned by nls2 is at the boundary of the region you defined then enlarge the region and try again. (You might not need this step if the starting value returned are good enough anyways.)
Finally run nls with the starting values you found.
library(nls2)
## 1: Gompertz curve
fo1 <- y ~ alpha*exp(-beta*exp(-gamma*t))
st1 <- data.frame(alpha = c(10, 100), beta = c(1, 100), gamma = c(0.01, 0.20))
fm1.0 <- nls2(fo1, data = info, start = st1, algorithm = "brute-force")
fm1 <- nls(fo1, data = info, start = coef(fm1.0))
## 2: logistic curve
fo2 <- y ~ alpha/(1+beta*exp(-gamma*t))
st2 <- data.frame(alpha = c(10, 1000), beta = c(1, 10000), gamma = c(0.01, 0.20))
fm2.0 <- nls2(fo2, data = info, start = st2, algorithm = "brute-force")
fm2 <- nls(fo2, data = info, start = coef(fm2.0))
# plot the data and both fitted curves
plot(y ~ t, info)
lines(fitted(fm1) ~ t, info, col = "blue")
lines(fitted(fm2) ~ t, info, col = "red")
Note
For the data shown, these two 2-parameter exponential models also fit reasonably well, so if you are only interested in the range where y rises exponentially they could be alternatives to consider. (The first one below is preferable because its coefficients are of more similar magnitude; the second may have scaling problems.)
fm3 <- nls(y ~ a * exp(b/t), info, start = c(a = 1, b = 1))
fm4 <- nls(y ~ a * t^b, info, start = c(a = .001, b = 6))
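As a quick check (not part of the original answer), deviance() returns the residual sum of squares of an nls fit, so all four models can be compared on RSS directly:
# residual sum of squares for each candidate model (lower is better)
sapply(list(gompertz = fm1, logistic = fm2, exp = fm3, power = fm4), deviance)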


Check for conditional heteroskedasticity

Data:
dput(head(mydata))
structure(list(DATE = structure(c(-315619200, -312940800, -310435200,
-307756800, -305164800, -302486400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), RF = c(0.33, 0.29, 0.35, 0.19, 0.27, 0.24), RMRF = c(-6.99,
0.99, -1.46, -1.7, 3.08, 2.09), SMB = c(2.13, 0.71, -0.65, 0.32,
1.42, -0.24), UMD = c(-3.28, 3.59, 1.85, 2.6, 4.77, 1.03), HML = c(2.65,
-2.15, -2.69, -2.22, -3.83, -0.3), JANDUM = c(1, 0, 0, 0, 0,
0), R4 = c(-4.57, 1.5, -2.83, -1.98, 3.54, 2.15)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
My data contains: R4 is the percentage return of a portfolio, RF is the risk-free rate, RMRF is the excess return of the market portfolio, SMB, UMD, and HML are three factors, and JANDUM is a January dummy variable.
The data are monthly from 1/1960 to 12/2003 (528 observations in total).
I wrote the following code to regress the portfolio excess return (R4-RF) on a constant and all the other variables (RMRF, SMB, UMD, HML, and JANDUM):
mydata$PER <- mydata$R4 - mydata$RF
mydata$JANDUM <- as.factor(mydata$JANDUM)
# Fit regression model
model <- lm(PER ~ DATE + RMRF + SMB + UMD + HML + JANDUM, data = mydata)
summary(model)
Then I want to check for the presence of autocorrelation and conditional heteroskedasticity.
My try:
# Tests to establish the presence or absence of heteroskedasticity
# Breusch-Pagan test
lmtest::bptest(model)
# Non-constant variance score test
car::ncvTest(model)
PER_BCMod <- caret::BoxCoxTrans(mydata$PER)
print(PER_BCMod)
# The model for creating the Box-Cox transformed variable is ready. Let's now apply it to mydata$PER and append the result to the data frame.
mydata <- cbind(mydata, PER_New = predict(PER_MCMod, mydata$PER))
head(mydata)
# The transformed data for our new regression model are ready. Let's build the model and check for heteroskedasticity.
model_bc <- lm(PER_New ~ DATE + RMRF + SMB + UMD + HML + JANDUM, data=mydata)
bptest(model_bc)
plot(model_bc)
Errors:
> print(PER_BCMod)
Box-Cox Transformation
528 data points used to estimate Lambda
Input data summary:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-30.0700 -2.7800 1.0500 0.6758 4.4325 23.5800
Lambda could not be estimated; no transformation is applied
> # The model for creating the Box-Cox transformed variable is ready. Let's now apply it to mydata$PER and append the result to the data frame.
> mydata <- cbind(mydata, PER_New = predict(PER_MCMod, mydata$PER))
Error in predict(PER_MCMod, mydata$PER) : object 'PER_MCMod' not found
I can't understand what the fault is here; your suggestions would be welcome.
Is this process wrong? Should I follow a different approach?
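For the checks themselves, a minimal sketch of the autocorrelation and conditional-heteroskedasticity tests on the model object above could look like the following (the 12-lag choice is only an assumption, picked because the data are monthly):
library(lmtest)
# autocorrelation: Durbin-Watson and Breusch-Godfrey tests
dwtest(model)
bgtest(model, order = 12)
# conditional heteroskedasticity: Ljung-Box test on squared residuals (ARCH-type check)
Box.test(resid(model)^2, lag = 12, type = "Ljung-Box")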

The method of all possible regressions in R

I want to determine which variables are best suited for the model. To do this, I use the method of all possible regressions, i.e., I build models with all possible combinations of predictors. For example:
library(fpp2)
# uschange is dataset from "fpp2" package
train <- ts(uschange[1:180, ], start = 1970, frequency = 4)  # uschange is quarterly
fit1 <- tslm(Consumption ~ Income, data = train)
fit2 <- tslm(Consumption ~ Production, data = train)
fit3 <- tslm(Consumption ~ Savings, data = train)
fit4 <- tslm(Consumption ~ Unemployment, data = train)
fit5 <- tslm(Consumption ~ Income + Production, data = train)
fit6 <- tslm(Consumption ~ Income + Savings, data = train)
# and so on...
After that, I need to evaluate the models in two ways:
test <- data.frame(
Income = uschange[181:187, 2],
Production = uschange[181:187, 3],
Savings = uschange[181:187, 4],
Unemployment = uschange[181:187, 5]
)
y <- uschange[181:187, 1]
CV(fit1)
accuracy(forecast(fit1, test), y)
CV(fit2)
accuracy(forecast(fit2, test), y)
CV(fit3)
accuracy(forecast(fit3, test), y)
# and so on...
As a result, I want to get a model with the smallest value of AICc from CV() and with the smallest error value (for example MAE from accuracy()).
How can I do this automatically?
EDIT:
> dput(head(uschange, 20))
structure(c(0.615986218, 0.46037569, 0.876791423, -0.274245141,
1.897370758, 0.911992909, 0.794538845, 1.648587467, 1.313722178,
1.891474954, 1.530714, 2.318294715, 1.81073916, -0.041739961,
0.354235565, -0.291632155, -0.877027936, 0.351135548, 0.409597702,
-1.475808634, 0.972261043, 1.169084717, 1.55327055, -0.255272381,
1.987153628, 1.447334175, 0.531811929, 1.160125137, 0.457011505,
1.016624409, 1.904101264, 3.890258661, 0.708252663, 0.79430954,
0.433818275, 1.093809792, -1.661684821, -0.938353209, 0.094487794,
-0.122595985, -2.452700312, -0.551525087, -0.358707862, -2.185454855,
1.90973412, 0.901535843, 0.308019416, 2.291304415, 4.149573867,
1.89062398, 1.273352897, 3.436892066, 2.799076357, 0.817688618,
0.868996932, 1.472961869, -0.882483578, 0.074279194, -0.41314971,
-4.064118932, 4.810311502, 7.287992337, 7.289013063, 0.985229644,
3.657770614, 6.051341804, -0.445832214, -1.53087186, -4.35859438,
-5.054525795, 5.809959038, 16.04471706, -5.348868495, 8.426034362,
2.758795652, 11.14642986, -2.533514487, -6.592644641, 0.51717884,
11.3433954, 0.9, 0.5, 0.5, 0.7, -0.1, -0.1, 0.1, 0, -0.2, -0.1,
-0.2, -0.3, -0.3, 0, -0.1, 0.1, 0.2, 0.3, 0.5, 1.3), .Dim = c(20L,
5L), .Dimnames = list(NULL, c("Consumption", "Income", "Production",
"Savings", "Unemployment")), .Tsp = c(1970, 1974.75, 4), class = c("mts",
"ts", "matrix"))
Try this:
# get all names of predictors
cols <- colnames(uschange)[-1]
# create a formula string for every non-empty subset of predictors
out <- unlist(lapply(seq_along(cols), function(n)
  combn(cols, n, FUN = function(row)
    paste0("Consumption ~ ", paste0(row, collapse = "+")))))
# fit a model for each formula
mods <- lapply(out, function(frml) tslm(as.formula(frml), data = train))
# define helper function: return both CV() statistics and test-set accuracy
cv_this <- function(x) {
  list(cv = CV(x), acc = accuracy(forecast(x, test), y))
}
# run helper function over all models to get evaluations out:
lapply(mods, cv_this)
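To pick the winners automatically, one way (a sketch building on the helper above) is to collect the evaluations and take the minima of AICc and test-set MAE:
evals <- lapply(mods, cv_this)
# AICc from CV() and MAE on the test set from accuracy()
aicc <- sapply(evals, function(e) e$cv["AICc"])
mae <- sapply(evals, function(e) e$acc["Test set", "MAE"])
out[which.min(aicc)]  # formula with the smallest AICc
out[which.min(mae)]   # formula with the smallest test-set MAE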

CES Production Function Estimation using micEconCES

I'm currently trying to do some estimations using the micEconCES package in R by Henningsen/Henningsen (2011). My issue is that I am not very familiar with R and I'm trying to use my own dataset to get the estimations with the package.
The authors of the paper created this data set for the estimations:
R> set.seed(123)
R> cesData <- data.frame(x1 = rchisq(200, 10), x2 = rchisq(200, 10), x3 = rchisq(200, 10), x4 = rchisq(200, 10))
R> cesData$y2 <- cesCalc(xNames = c("x1", "x2"), data = cesData, coef = c(gamma = 1, delta = 0.6, rho = 0.5, nu = 1.1))
R> cesData$y2 <- cesData$y2 + 2.5 * rnorm(200)
R> cesData$y3 <- cesCalc(xNames = c("x1", "x2", "x3"), data = cesData, coef = c(gamma = 1, delta_1 = 0.7, delta = 0.6, rho_1 = 0.3, rho = 0.5, nu = 1.1), nested = TRUE)
R> cesData$y3 <- cesData$y3 + 1.5 * rnorm(200)
R> cesData$y4 <- cesCalc(xNames = c("x1", "x2", "x3", "x4"), data = cesData, coef = c(gamma = 1, delta_1 = 0.7, delta_2 = 0.6, delta = 0.5, rho_1 = 0.3, rho_2 = 0.4, rho = 0.5, nu = 1.1), nested = TRUE)
R> cesData$y4 <- cesData$y4 + 1.5 * rnorm(200)
The first line sets the "seed" for the random number generator so that these examples can be replicated with exactly the same data set. The second line creates a data set with four input variables (called x1, x2, x3, and x4) that each have 200 observations and are generated from random χ² distributions with 10 degrees of freedom. The third, fifth, and seventh commands use the function cesCalc, which is included in the micEconCES package, to calculate the deterministic output variables for the CES functions with two, three, and four inputs (called y2, y3, and y4, respectively) given a CES production function.
Now in my paper I'm trying to estimate the CES function for the U.S. at the aggregate level for the two-input case with capital and labor. So I gathered data from the World Bank database from 1990-2015, where I used Gross Fixed Capital Formation for capital and total Labor Force for labor.
The authors did, for example, a nonlinear estimation in the following way:
R> cesNls <- nls(y2 ~ gamma * (delta * x1^(-rho) + (1 - delta) * x2^(-rho))^(-phi / rho), data = cesData, start = c(gamma = 0.5, delta = 0.5, rho = 0.25, phi = 1))
R> print(cesNls)
Now I want to do the exact same thing for my own data set, which is called Data_Extract_From_World_Development_Indicators. So the first thing I did was:
ceslan <- cesCalc( xNames = c( "GrossFixedCapitalFormation", "LaborForce" ), data = Data_Extract_From_World_Development_Indicators, coef = c( gamma = 1, delta = 0.6, rho = 0.5, nu = 1.1 ) )
So I replicated
R> cesData$y2 <- cesCalc( xNames = c( "x1", "x2" ), data = cesData, coef = c( gamma = 1, delta = 0.6, rho = 0.5, nu = 1.1 ) )
All I did was change the name of the dataset and replace x1 and x2 with my two variables for capital and labor.
Afterwards I tried to do the nonlinear estimation:
cesulan <- nls(y2 ~ gamma * (delta * GrossFixedCapitalFormation^(-rho) + (1-delta)*LaborForce^(-rho))^(-phi / rho), data = Data_Extract_From_World_Development_Indicators, start = c(gamma = 0.5, delta = 0.5, rho = 0.25, phi = 1) )
Now this is where my problem is: I don't know which variable in my dataset is meant to be y2. I can see in the formula that y2 ~ gamma * ..., so y2 is regressed on the rest of the term, but I don't know what kind of value I need to plug in there. Does anyone have any advice?
Thanks in advance
In Henningsen & Henningsen (2011), the variable y2 is created with the function cesCalc and then perturbed with random noise in order to test the cesEst function introduced in the paper. This variable is supposed to be your production function's output (usually Gross Domestic Product, but not exclusively).
It must be a numeric series (a column of your data frame) of non-negative values, of the same length as your explanatory variables x1 and x2.
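A minimal sketch, assuming your World Bank extract has an output column (the column name GDP below is hypothetical; substitute whichever output series you gathered):
# GDP is a placeholder for your output variable (e.g., real GDP)
cesulan <- nls(GDP ~ gamma * (delta * GrossFixedCapitalFormation^(-rho) +
    (1 - delta) * LaborForce^(-rho))^(-phi / rho),
  data = Data_Extract_From_World_Development_Indicators,
  start = c(gamma = 0.5, delta = 0.5, rho = 0.25, phi = 1))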

Bounds for parameters in mle2() with optimizer=optimx

In mle2, I use "optimx" as the optimizer. I want to impose lower and upper bounds on the parameters. How can I do this?
For example:
library("bbmle"); library("optimx")
y <- c(0.654, 0.613, 0.315, 0.449, 0.297, 0.402, 0.379,
0.423, 0.379, 0.3235, 0.269, 0.740, 0.418, 0.412,
0.494, 0.416, 0.338, 0.392, 0.484, 0.265)
gamma4 <- function(shape, scale) {
-sum(dgamma(y, shape = shape, scale = scale, log = TRUE))
}
gm <- mean(y)
cv <- var(y)/mean(y)
m5 <- mle2(gamma4,start = list(shape = gm/cv, scale = cv),
optimizer="optimx")
m5
Or:
mle2(gengamma3,start = list(shape = ci,
scale = bet, k=alp),
optimizer="optimx")
Thanks
You can pass the lower bounds as a named argument, as in the example below:
## use bounded optimization
## the lower bounds are really > 0, but we use >=0 to stress-test
## profiling; note lower must be named
## (LL is a negative log-likelihood function, as in the bbmle examples)
(fit1 <- mle2(LL, method="L-BFGS-B", lower=c(ymax=0, xhalf=0)))
p1 <- profile(fit1)
Or in this one:
# try bounded optimization with nlminb and constrOptim
(fit1B <- mle2(LL, optimizer="nlminb", lower=c(lymax=1e-7, lhalf=1e-7)))
p1B <- profile(fit1B)
confint(p1B)
(fit1C <- mle2(LL, optimizer="constrOptim", ui = c(lymax=1,lhalf=1), ci=2,
method="Nelder-Mead"))
But for a full understanding, I advise you to look here.
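Applied to the gamma4 example from the question, a sketch with bounds (assuming optimx's bounded L-BFGS-B method; the bound values themselves are arbitrary) could look like:
# lower/upper must be named vectors matching the start list
m5b <- mle2(gamma4, start = list(shape = gm/cv, scale = cv),
    optimizer = "optimx", method = "L-BFGS-B",
    lower = c(shape = 1e-7, scale = 1e-7),
    upper = c(shape = 100, scale = 100))
m5b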

Anesrake algorithm doesn't work with zero as a weight

I tried using the anesrake package, but it won't accept a target weight of zero, giving this error message:
Error in while (range(weightvec)[2] > cap + 1e-04) { :
missing value where TRUE/FALSE needed
Sample code:
ipfdata<- read.csv("dummydata.csv", header = T)
ipfdata$caseid <- 1:length(ipfdata$age)
sex <- c(0.30, 0.70)
age <- c(0.2, 0.1, 0.05, 0.05, 0.05, 0.05, 0.3, 0.2)
ses <- c(0.20, 0.20, 0.0)
targets <- list(sex, age, ses)
names(targets) <- c("sex", "age", "ses")
outsave <- anesrake(targets, ipfdata, caseid = ipfdata$caseid, weightvec = NULL, cap = 10, verbose = TRUE, maxit = 50, choosemethod = "total", type = "nolim", pctlim = 0.0001, nlim=10, iterate = T, force1 = TRUE)
(sample code modified from this question: https://stackoverflow.com/questions/19458306/ipf-raking-using-anesrake-in-r-error)
The package was never updated to address this issue, despite my contacting the author. The only workaround is to remove, before raking, any rows in a category whose target weight is zero.
In the sample above, you would remove all rows in the third SES category and then change the SES target vector from c(0.20, 0.20, 0.0) to c(0.20, 0.20).
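A sketch of that workaround (assuming ses in dummydata.csv is coded 1, 2, 3, with the zero-target category third):
# drop rows in the zero-target SES category, then drop the zero from the targets
ipfdata <- ipfdata[ipfdata$ses != 3, ]
ses <- c(0.20, 0.20)
targets <- list(sex, age, ses)
names(targets) <- c("sex", "age", "ses")
outsave <- anesrake(targets, ipfdata, caseid = ipfdata$caseid, cap = 10,
    verbose = TRUE, maxit = 50, choosemethod = "total", type = "nolim",
    pctlim = 0.0001, nlim = 10, iterate = TRUE, force1 = TRUE)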
