Fitting parameters do not converge, but nlsLM() stops - r

In R, I'm trying to fit a series of data to a curve given by the theory, which is:
y(x) = (1 + fitbeta * x)^(-fitgamma)
In theory this should work and others have used this formula successfully.
However, I cannot seem to make my fit parameters converge, no matter the starting conditions or options I try.
Below is a reproducible example. Note that I am using the package minpack.lm, which is a nice improvement over nls(), which, by the way, throws "Missing value or an infinity produced when evaluating the model" at me when I try to use it with the same settings.
This may be a trivial question for most of you, but I do not know nls() at all and I'm just starting to use it. There must be something super-simple I'm missing!
Here's the code:
#Libraries:
library(ggplot2) #Plots
library(minpack.lm) #Non linear formula fits
#Reproducible example parameters:
startbeta=0.001
startgamma=-10
X <- c(1, seq(2,240,2)) #Vector of durations in hours
Y <- c(1, 0.999349067282531, 0.997149973857984, 0.993613390445064,
0.988771983639396, 0.982724692889081, 0.975628661286657, 0.96751072812657,
0.958414894569813, 0.948463753530251, 0.93767394420049, 0.926259971613655,
0.91433495083748, 0.901955098152661, 0.889290679032582, 0.876927340669535,
0.864697870521103, 0.852357436802833, 0.839855401239168, 0.827134255036668,
0.814227652658426, 0.801278249082419, 0.788355912487271, 0.775514097293561,
0.762891867628759, 0.750380786683852, 0.738018762182673, 0.725799137700828,
0.713720035497274, 0.701808749767634, 0.690046213599144, 0.678484705844808,
0.667111445204795, 0.655977696751697, 0.645116379924585, 0.634460211234775,
0.623985607991471, 0.613706080277076, 0.603604313599018, 0.593685433942668,
0.58395373490791, 0.574696581531438, 0.565639259757887, 0.556883924877305,
0.54829105550864, 0.539882579975057, 0.531669333311634, 0.523789998486779,
0.516140798533169, 0.508732414242052, 0.501549858546355, 0.494581375404643,
0.487806083201077, 0.481215549260729, 0.475344757534521, 0.469883620527239,
0.464505182123833, 0.459295389779093, 0.454254664927743, 0.449272635346615,
0.444353923395879, 0.439502685117945, 0.434723592424652, 0.430300205554656,
0.425950322720235, 0.421651255977861, 0.417403585324494, 0.413205553596921,
0.409056611802817, 0.404966487426596, 0.400979187396173, 0.39721419353495,
0.393559414540655, 0.389971147514211, 0.386435641037176, 0.382947750185137,
0.379497143530884, 0.376143019983175, 0.373016368099911, 0.369904788649644,
0.366813427145508, 0.363784767175811, 0.360999911892512, 0.358249758228913,
0.355539445964091, 0.352899943455576, 0.35037387237155, 0.347925865476795,
0.345529621963385, 0.343187675737988, 0.340930763575173, 0.338722557572396,
0.336557943062853, 0.334418098646777, 0.332805911075547, 0.33117666406428,
0.329516391536038, 0.327847961775104, 0.32615922691243, 0.324473380427564,
0.322807248963926, 0.321128906622371, 0.319431589984492, 0.317714245126025,
0.315983206323488, 0.314233066510948, 0.312462213805877, 0.310672914913813,
0.308902798280917, 0.307149178519641, 0.305387995487162, 0.303621881791372,
0.301859643176666, 0.300098168944162, 0.298340765140062, 0.296633192262476,
0.294981140721158, 0.293349312493173, 0.291718961012827, 0.290087159821697,
0.288466970001273)
#Plot function:
ggp <- function(b = fitbeta, g = fitgamma) {
  gg <- ggplot(data.frame(x = c(0, 240)), aes(x)) +
    stat_function(fun = function(x) (1 + b * x)^(-g), geom = "line") +
    geom_line(data = data.frame(x = X, y = Y), aes(y = y, x = x), colour = "red")
  print(gg)
}
#Fit:
nlc <- nls.control(maxiter = 1000)
fit <- nlsLM(y ~ I((1 + fitbeta * x)^(-fitgamma)), # Function from NERC (1975)
             data = data.frame(y = Y, x = X),
             start = list(fitbeta = startbeta, fitgamma = startgamma),
             trace = TRUE, control = nlc)
#Get the coefficients of the fit
fitbeta <- coef(fit)[1]
fitgamma <- coef(fit)[2]
#Plot:
ggp()
The result is not too shabby:
(in black the fitted curve, in red the input data)
However, if I look at the nlsLM() trace:
...
It. 64, RSS = 0.13003, Par. = -1.37045e-05 -444.272
It. 65, RSS = 0.130024, Par. = -1.33717e-05 -455.143
It. 66, RSS = 0.130014, Par. = -1.30753e-05 -465.508
It. 67, RSS = 0.130006, Par. = -1.29238e-05 -471.145
It. 68, RSS = 0.13, Par. = -1.26237e-05 -482.163
It. 69, RSS = 0.129991, Par. = -1.23554e-05 -492.681
It. 70, RSS = 0.129984, Par. = -1.22119e-05 -498.639
It. 71, RSS = 0.129979, Par. = -1.19291e-05 -510.269
It. 72, RSS = 0.12997, Par. = -1.16755e-05 -521.396
It. 73, RSS = 0.129964, Par. = -1.15447e-05 -527.488
It. 74, RSS = 0.129959, Par. = -1.12865e-05 -539.368
It. 75, RSS = 0.129951, Par. = -1.10541e-05 -550.747
It. 76, RSS = 0.129945, Par. = -1.09312e-05 -557.117
It. 77, RSS = 0.129941, Par. = -1.06886e-05 -569.567
It. 78, RSS = 0.129933, Par. = -1.04713e-05 -581.433
It. 79, RSS = 0.129928, Par. = -1.03588e-05 -587.931
It. 80, RSS = 0.129924, Par. = -1.01368e-05 -600.613
It. 81, RSS = 0.129917, Par. = -9.93656e-06 -612.764
It. 82, RSS = 0.129912, Par. = -9.82966e-06 -619.607
It. 83, RSS = 0.129908, Par. = -9.61868e-06 -632.993
It. 84, RSS = 0.129902, Par. = -9.42521e-06 -646.024
It. 85, RSS = 0.129896, Par. = -9.24949e-06 -658.345
It. 86, RSS = 0.129891, Par. = -9.14794e-06 -665.812
It. 87, RSS = 0.129888, Par. = -8.9485e-06 -680.423
It. 88, RSS = 0.129882, Par. = -8.7693e-06 -694.38
It. 89, RSS = 0.129876, Par. = -8.58442e-06 -709.315
It. 90, RSS = 0.129871, Par. = -8.41437e-06 -723.697
It. 91, RSS = 0.129866, Par. = -8.25997e-06 -737.276
It. 92, RSS = 0.129862, Par. = -8.17745e-06 -744.9
It. 93, RSS = 0.129859, Par. = -8.01476e-06 -759.809
It does not actually look like it is converging: the two parameters keep changing and in fact diverge. What is happening here, and why does the fitter stop if the parameters are not stable?

This is not a solution, but some food for thought:
In my experience, if you get serious errors from nls, switching to nlsLM is not a good fix: it will usually give you a result, but often not a good one. An alternative approach is to run a whole bunch of optimizers and check whether they can more or less agree on a solution.
library(optimx)
fun <- function(x, b, c) (1 + b * x)^(-log(c))
fun1 <- function(pars) sum((Y - fun(X, pars["b"], pars["c"]))^2)
fits <- optimx(c(b = 0.001, c = log(10)), fun1, control = list(all.methods = TRUE))
fits
# b c value fevals gevals niter convcode kkt1 kkt2 xtimes
#BFGS 0.009277751 2.745042 2.453159e-01 88 20 NA 0 FALSE TRUE 0.02
#CG 0.012242506 2.300605 3.184743e-01 171 19 NA 0 FALSE TRUE 0.00
#Nelder-Mead 0.004664550 5.396015 1.427017e-01 175 NA NA 0 FALSE FALSE 0.00
#L-BFGS-B NA NA 8.988466e+307 NA NA NA 9999 NA NA 0.00
#nlm 0.001000000 2.302585 1.797693e+308 NA NA 0 0 NA NA 0.00
#nlminb 0.001674994 58.216382 1.081060e-01 123 191 89 0 TRUE FALSE 0.00
#spg 0.019577915 2.286535 1.448918e+00 7 NA 4 3 FALSE FALSE 0.14
#ucminf 0.001000000 2.302585 2.153628e+01 1 1 NA 0 FALSE FALSE 0.00
#Rcgmin NA NA 8.988466e+307 NA NA NA 9999 NA NA 0.00
#Rvmmin NA NA 8.988466e+307 NA NA NA 9999 NA NA 0.00
#newuoa NA NA 8.988466e+307 NA NA NA 9999 NA NA 0.00
#bobyqa 0.024625964 2.801075 5.831466e+00 23 NA NA 0 NA NA 0.00
#nmkb NA NA 8.988466e+307 NA NA NA 9999 NA NA 0.00
#hjkb 0.001000000 2.302585 2.153628e+01 1 NA 0 9999 NA NA 0.00
plot(X, Y)
curve(fun(x, fits["BFGS", "b"], fits["BFGS", "c"]), add = TRUE)
curve(fun(x, fits["CG", "b"], fits["CG", "c"]), add = TRUE, col = "blue")
curve(fun(x, fits["Nelder-Mead", "b"], fits["Nelder-Mead", "c"]), add = TRUE, col = "red")
curve(fun(x, fits["nlminb", "b"], fits["nlminb", "c"]), add = TRUE, col = "green")
As you can see, most of the optimizers that converge give quite different parameter estimates, and none of these fits can be considered good. I suspect this is not merely a result of bad starting values; your data just doesn't conform to the model. In particular, it doesn't look like the model can represent the slightly sigmoidal shape of your data.
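One further observation (a sketch, not part of the original answer): in the trace above, the product of the two parameters stays nearly constant (about 6.1e-3 from iteration 64 through 93). As beta goes to 0 and gamma grows, (1 + beta*x)^(-gamma) approaches exp(-beta*gamma*x), so the data only pin down the product beta*gamma, and the optimizer drifts along a nearly flat ridge. That is why the RSS barely changes and the fitter eventually stops even though the individual parameters keep moving:

```r
# Sketch: for small beta, (1 + beta*x)^(-gamma) ~ exp(-beta*gamma*x),
# so parameter pairs with the same product give near-identical curves.
x  <- seq(0, 240, by = 2)
y1 <- (1 + 1e-5 * x)^(-500)  # beta*gamma = 5e-3
y2 <- (1 + 2e-5 * x)^(-250)  # same product, very different parameters
max(abs(y1 - y2))            # tiny: the curves are almost indistinguishable
```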

Related

order/number of variables in lm causing singularities?

I was trying to run a linear model using lm() in R with 12 explanatory variables and 33 observations, but the coefficients for the last three variables are not defined because of singularities. When I switch the order of the variables, the same thing happens again, even though those variables (TotalPrec_11, TotalPrec_12, TotalPrec_10) were significant before. The coefficients also differ between the two models.
ab <- lm(value ~ TotalPrec_12 + TotalPrec_11 + TotalPrec_10 + TotalPrec_9 + TotalPrec_8 + TotalPrec_7 + TotalPrec_6 + TotalPrec_5 + TotalPrec_4 + TotalPrec_3 + TotalPrec_2 + TotalPrec_1, data = aa)
summary(ab)
#Coefficients: (3 not defined because of singularities)
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 64.34 30.80 2.089 0.0480 *
#TotalPrec_12 19811.97 11080.14 1.788 0.0869 .
#TotalPrec_11 -16159.45 7099.89 -2.276 0.0325 *
#TotalPrec_10 -16500.62 18813.96 -0.877 0.3895
#TotalPrec_9 62662.08 51143.37 1.225 0.2329
#TotalPrec_8 665.39 36411.95 0.018 0.9856
#TotalPrec_7 -77203.59 51555.71 -1.497 0.1479
#TotalPrec_6 4830.11 19503.52 0.248 0.8066
#TotalPrec_5 6403.94 14902.77 0.430 0.6714
#TotalPrec_4 -735.73 5023.83 -0.146 0.8848
#TotalPrec_3 NA NA NA NA
#TotalPrec_2 NA NA NA NA
#TotalPrec_1 NA NA NA NA
The same data here with a different order of variables:
ab1 <- lm(value ~ TotalPrec_1 + TotalPrec_2 + TotalPrec_3 + TotalPrec_9 + TotalPrec_8 + TotalPrec_7 + TotalPrec_6 + TotalPrec_5 + TotalPrec_4 + TotalPrec_11 + TotalPrec_12 + TotalPrec_10, data = aa)
summary(ab1)
#Coefficients: (3 not defined because of singularities)
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 63.72 54.44 1.171 0.2538
#TotalPrec_1 19611.54 19366.33 1.013 0.3218
#TotalPrec_2 -14791.44 7847.87 -1.885 0.0722 .
#TotalPrec_3 6766.60 3144.68 2.152 0.0422 *
#TotalPrec_9 28677.62 53530.82 0.536 0.5973
#TotalPrec_8 -23207.34 65965.12 -0.352 0.7282
#TotalPrec_7 -26628.10 55839.25 -0.477 0.6380
#TotalPrec_6 -28694.23 13796.80 -2.080 0.0489 *
#TotalPrec_5 46982.35 17941.89 2.619 0.0154 *
#TotalPrec_4 -26393.70 17656.70 -1.495 0.1486
#TotalPrec_11 NA NA NA NA
#TotalPrec_12 NA NA NA NA
#TotalPrec_10 NA NA NA NA
Several posts online suggest that it might be a multicollinearity problem. I ran the cor() function to check for collinearity, and nothing came out perfectly correlated.
I used the same set of 12 variables with other response variables, and there was no problem with singularities. So I'm not sure what is happening here and what I need to do differently to figure this out.
Edit
Here is my data:
> dput(aa)
structure(list(value = c(93, 95, 88, 90, 90, 80, 100, 80, 96,
100, 100, 100, 80, 94, 88, 76, 90, 0, 93, 100, 88, 90, 95, 71,
92, 93, 92, 100, 85, 90, 100, 100, 100), TotalPrec_1 = c(0.00319885835051536,
0.00319885835051536, 0.00319885835051536, 0.00717973057180643,
0.00717973057180643, 0.00717973057180643, 0.00464357063174247,
0.00464357063174247, 0.00464357063174247, 0.00598198547959327,
0.00598198547959327, 0.00598198547959327, 0.00380058260634541,
0.00380058260634541, 0.00380058260634541, 0.00380058260634541,
0.00364887388423085, 0.00364887388423085, 0.00364887388423085,
0.00475014140829443, 0.00475014140829443, 0.00475014140829443,
0.00475014140829443, 0.00499139120802283, 0.00499139120802283,
0.00499139120802283, 0.00499139120802283, 0.00490436097607016,
0.00490436097607016, 0.00490436097607016, 0.00623255362734198,
0.00623255362734198, 0.00623255362734198), TotalPrec_2 = c(0.00387580785900354,
0.00387580785900354, 0.00387580785900354, 0.00625309534370899,
0.00625309534370899, 0.00625309534370899, 0.00298969540745019,
0.00298969540745019, 0.00298969540745019, 0.00558579061180353,
0.00558579061180353, 0.00558579061180353, 0.00370361795648932,
0.00370361795648932, 0.00370361795648932, 0.00370361795648932,
0.00335893919691443, 0.00335893919691443, 0.00335893919691443,
0.00621500937268137, 0.00621500937268137, 0.00621500937268137,
0.00621500937268137, 0.00234323320910334, 0.00234323320910334,
0.00234323320910334, 0.00234323320910334, 0.00644989637658, 0.00644989637658,
0.00644989637658, 0.00476496992632746, 0.00476496992632746, 0.00476496992632746
), TotalPrec_3 = c(0.00418250449001789, 0.00418250449001789,
0.00418250449001789, 0.00702223135158419, 0.00702223135158419,
0.00702223135158419, 0.00427648611366748, 0.00427648611366748,
0.00427648611366748, 0.00562589056789875, 0.00562589056789875,
0.00562589056789875, 0.0037367227487266, 0.0037367227487266,
0.0037367227487266, 0.0037367227487266, 0.00477339653298258,
0.00477339653298258, 0.00477339653298258, 0.0124167986214161,
0.0124167986214161, 0.0124167986214161, 0.0124167986214161, 0.010678518563509,
0.010678518563509, 0.010678518563509, 0.010678518563509, 0.0139585845172405,
0.0139585845172405, 0.0139585845172405, 0.00741709442809224,
0.00741709442809224, 0.00741709442809224), TotalPrec_4 = c(0.00659881485626101,
0.00659881485626101, 0.00659881485626101, 0.00347008113749325,
0.00347008113749325, 0.00347008113749325, 0.00720167113468051,
0.00720167113468051, 0.00720167113468051, 0.00704727275297045,
0.00704727275297045, 0.00704727275297045, 0.00856677815318107,
0.00856677815318107, 0.00856677815318107, 0.00856677815318107,
0.00867980346083641, 0.00867980346083641, 0.00867980346083641,
0.00614343490451574, 0.00614343490451574, 0.00614343490451574,
0.00614343490451574, 0.00704662408679723, 0.00704662408679723,
0.00704662408679723, 0.00704662408679723, 0.00495137926191091,
0.00495137926191091, 0.00495137926191091, 0.00796654727309942,
0.00796654727309942, 0.00796654727309942), TotalPrec_5 = c(0.00515584181994199,
0.00515584181994199, 0.00515584181994199, 0.000977653078734875,
0.000977653078734875, 0.000977653078734875, 0.00485571753233671,
0.00485571753233671, 0.00485571753233671, 0.00477610062807798,
0.00477610062807798, 0.00477610062807798, 0.00664602871984243,
0.00664602871984243, 0.00664602871984243, 0.00664602871984243,
0.00533714797347784, 0.00533714797347784, 0.00533714797347784,
0.00265633105300366, 0.00265633105300366, 0.00265633105300366,
0.00265633105300366, 0.00200922577641904, 0.00200922577641904,
0.00200922577641904, 0.00200922577641904, 0.00172789173666387,
0.00172789173666387, 0.00172789173666387, 0.00347296684049069,
0.00347296684049069, 0.00347296684049069), TotalPrec_6 = c(0.00170362275093793,
0.00170362275093793, 0.00170362275093793, 0.000670029199682176,
0.000670029199682176, 0.000670029199682176, 0.0018315939232707,
0.0018315939232707, 0.0018315939232707, 0.00138648133724927,
0.00138648133724927, 0.00138648133724927, 0.00329410820268094,
0.00329410820268094, 0.00329410820268094, 0.00329410820268094,
0.00210500298999249, 0.00210500298999249, 0.00210500298999249,
0.000628655252512544, 0.000628655252512544, 0.000628655252512544,
0.000628655252512544, 0.000631613133009523, 0.000631613133009523,
0.000631613133009523, 0.000631613133009523, 0.000616533157881349,
0.000616533157881349, 0.000616533157881349, 0.000599739549215883,
0.000599739549215883, 0.000599739549215883), TotalPrec_7 = c(0.00124496815260499,
0.00124496815260499, 0.00124496815260499, 0.000289129035081714,
0.000289129035081714, 0.000289129035081714, 0.00089572963770479,
0.00089572963770479, 0.00089572963770479, 0.00187503395136445,
0.00187503395136445, 0.00187503395136445, 0.00070394336944446,
0.00070394336944446, 0.00070394336944446, 0.00070394336944446,
0.000733022985514253, 0.000733022985514253, 0.000733022985514253,
4.50894685855019e-06, 4.50894685855019e-06, 4.50894685855019e-06,
4.50894685855019e-06, 3.02730550174601e-05, 3.02730550174601e-05,
3.02730550174601e-05, 3.02730550174601e-05, 3.71173496205301e-06,
3.71173496205301e-06, 3.71173496205301e-06, 4.58224167232402e-05,
4.58224167232402e-05, 4.58224167232402e-05), TotalPrec_8 = c(0.000394100265111774,
0.000394100265111774, 0.000394100265111774, 0.000930351321585476,
0.000930351321585476, 0.000930351321585476, 0.000679628865327686,
0.000679628865327686, 0.000679628865327686, 0.000997507828287781,
0.000997507828287781, 0.000997507828287781, 1.77486290340312e-05,
1.77486290340312e-05, 1.77486290340312e-05, 1.77486290340312e-05,
1.63553704624064e-05, 1.63553704624064e-05, 1.63553704624064e-05,
4.31556363764685e-05, 4.31556363764685e-05, 4.31556363764685e-05,
4.31556363764685e-05, 8.14739760244265e-05, 8.14739760244265e-05,
8.14739760244265e-05, 8.14739760244265e-05, 4.07490988436621e-05,
4.07490988436621e-05, 4.07490988436621e-05, 0.000140139847644605,
0.000140139847644605, 0.000140139847644605), TotalPrec_9 = c(0.000616681878454983,
0.000616681878454983, 0.000616681878454983, 0.000742240983527154,
0.000742240983527154, 0.000742240983527154, 0.000230846126214601,
0.000230846126214601, 0.000230846126214601, 0.00132466584909707,
0.00132466584909707, 0.00132466584909707, 0.000114383190521039,
0.000114383190521039, 0.000114383190521039, 0.000114383190521039,
6.07241054240149e-05, 6.07241054240149e-05, 6.07241054240149e-05,
2.74324702331796e-05, 2.74324702331796e-05, 2.74324702331796e-05,
2.74324702331796e-05, 6.96572624292457e-06, 6.96572624292457e-06,
6.96572624292457e-06, 6.96572624292457e-06, 3.32364725181833e-05,
3.32364725181833e-05, 3.32364725181833e-05, 0.000108777909190394,
0.000108777909190394, 0.000108777909190394), TotalPrec_10 = c(0.00040393992094323,
0.00040393992094323, 0.00040393992094323, 0.00166831514798104,
0.00166831514798104, 0.00166831514798104, 0.000324568885844201,
0.000324568885844201, 0.000324568885844201, 0.000868275004904717,
0.000868275004904717, 0.000868275004904717, 1.25834640130051e-05,
1.25834640130051e-05, 1.25834640130051e-05, 1.25834640130051e-05,
7.2861012085923e-06, 7.2861012085923e-06, 7.2861012085923e-06,
0.000946254527661949, 0.000946254527661949, 0.000946254527661949,
0.000946254527661949, 0.000476793473353609, 0.000476793473353609,
0.000476793473353609, 0.000476793473353609, 0.00102826312649995,
0.00102826312649995, 0.00102826312649995, 0.00266417209059, 0.00266417209059,
0.00266417209059), TotalPrec_11 = c(0.00124716362915933, 0.00124716362915933,
0.00124716362915933, 0.00470362277701497, 0.00470362277701497,
0.00470362277701497, 0.0017967780586332, 0.0017967780586332,
0.0017967780586332, 0.000694554066285491, 0.000694554066285491,
0.000694554066285491, 0.000485763972392306, 0.000485763972392306,
0.000485763972392306, 0.000485763972392306, 0.00074231723556295,
0.00074231723556295, 0.00074231723556295, 0.000763822405133396,
0.000763822405133396, 0.000763822405133396, 0.000763822405133396,
0.00114128366112709, 0.00114128366112709, 0.00114128366112709,
0.00114128366112709, 0.000856105296406895, 0.000856105296406895,
0.000856105296406895, 0.00255026295781135, 0.00255026295781135,
0.00255026295781135), TotalPrec_12 = c(0.00380058260634541, 0.00380058260634541,
0.00380058260634541, 0.00475014140829443, 0.00475014140829443,
0.00475014140829443, 0.00412079365924, 0.00412079365924, 0.00412079365924,
0.00455283792689442, 0.00455283792689442, 0.00455283792689442,
0.00117174908518791, 0.00117174908518791, 0.00117174908518791,
0.00117174908518791, 0.00119069591164588, 0.00119069591164588,
0.00119069591164588, 0.00201585865579545, 0.00201585865579545,
0.00201585865579545, 0.00201585865579545, 0.00202310062013566,
0.00202310062013566, 0.00202310062013566, 0.00202310062013566,
0.00231692171655595, 0.00231692171655595, 0.00231692171655595,
0.00495567917823791, 0.00495567917823791, 0.00495567917823791
)), row.names = c(NA, -33L), class = c("tbl_df", "tbl", "data.frame"
))
When you have multiple predictors, singularity doesn’t necessarily mean that two variables are perfectly correlated. It means that at least one of your variables can be perfectly predicted by some combination of the other variables, even if none of those variables is a perfect predictor on its own. When you have many predictors relative to few observations, as you do, the odds of this happening increase. So you will probably need to simplify your model.
You are trying to estimate a linear model y = X %*% beta + epsilon given X and y. The model matrix X has 33 rows and 13 columns, one for the intercept and one for each numeric variable:
X <- model.matrix(ab)
dim(X)
## [1] 33 13
colnames(X)
## [1] "(Intercept)" "TotalPrec_12" "TotalPrec_11" "TotalPrec_10" "TotalPrec_9"
## [6] "TotalPrec_8" "TotalPrec_7" "TotalPrec_6" "TotalPrec_5" "TotalPrec_4"
## [11] "TotalPrec_3" "TotalPrec_2" "TotalPrec_1"
But X has rank 10, not 13:
qr(X)$rank
## [1] 10
So there is no unique least squares solution beta. lm copes by fitting a reduced model to the first set of 10 linearly independent columns of X, as indicated in your summary output. (Whether it copes or throws an error depends on its argument singular.ok. The default value is TRUE.)
I find it curious that changing the response makes the problem go away, given that the rank of X does not depend on y. Perhaps you changed more than just the response without realizing?
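The mechanism is easy to reproduce (a hypothetical illustration, not your data): add a column that is an exact linear combination of two others, and lm() drops it exactly as it dropped three of yours.

```r
# Sketch: an exactly collinear column makes the model matrix rank-deficient.
set.seed(1)
d <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
d$x3 <- d$x1 + 2 * d$x2   # perfectly predicted by x1 and x2 together,
d$y  <- rnorm(10)         # though cor(x3, x1) and cor(x3, x2) are not 1
fit <- lm(y ~ x1 + x2 + x3, data = d)
coef(fit)["x3"]                # NA: dropped because of the singularity
qr(model.matrix(fit))$rank     # 3, although the matrix has 4 columns
```

On your own fit, alias(ab) reports which of the dropped columns are linear combinations of which retained ones.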

Run a package function multiple times with different parameter values

If I have a data frame of historic option data:
StrikePrice UnderlyingPrice Days2Exp RfRate DividendRate Volatility
47 45 4 0.02 0.5 0.2
50 55 20 0.03 0.1 0.35
And I am using the function 'AmerPutLSM' from the package 'LSMonteCarlo':
price = AmerPutLSM(Spot = 45, sigma = 0.2, n=500, m=100, Strike = 47, r= 0.02, dr = 0.5, mT = 4)
summary(price)
Is there any way I can run this function without manually changing the values for each row in my data frame? (I'm dealing with a lot of rows in reality.) An example that is wrong but gets across what I want to do:
price = AmerPutLSM(Spot = dataframe$StrikePrice[1:2], sigma = dataframe$Volatility[1:2] etc, etc...)
Thanks
You can use any of the apply family of functions here:
result <- apply(df, 1, function(x) AmerPutLSM(Spot = x['UnderlyingPrice'],
sigma = x['Volatility'], n=500, m=100, Strike = x['StrikePrice'],
r = x['RfRate'], dr = x['DividendRate'], mT = x['Days2Exp']))
result
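One caveat (an addition, not from the answer above): apply() first coerces the data frame to a matrix, which can misbehave if the columns ever have mixed types. Map() passes the columns directly, one element per row. A sketch using price_fun, a made-up stand-in for AmerPutLSM (the real function needs the LSMonteCarlo package):

```r
# Sketch: Map() calls the function once per row, matching arguments by name.
df <- data.frame(StrikePrice = c(47, 50), UnderlyingPrice = c(45, 55),
                 Days2Exp = c(4, 20), RfRate = c(0.02, 0.03),
                 DividendRate = c(0.5, 0.1), Volatility = c(0.2, 0.35))
# price_fun is a hypothetical stand-in for LSMonteCarlo::AmerPutLSM
price_fun <- function(Spot, sigma, Strike, r, dr, mT) {
  Spot - Strike  # placeholder, not a real option pricer
}
result <- Map(price_fun,
              Spot   = df$UnderlyingPrice, sigma = df$Volatility,
              Strike = df$StrikePrice,     r     = df$RfRate,
              dr     = df$DividendRate,    mT    = df$Days2Exp)
unlist(result)  # one price per row
```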

convert log returns to actual price of a time series forecast using R

I have this simulated data, and I fitted an ARMA-GARCH model using the rugarch package. My code so far is as follows:
ar.sim <- arima.sim(model = list(ar = c(.9, -.2), ma = c(-.7, .1)), n = 100)
logr <- diff(log(na.omit(ar.sim)))
require(rugarch)
gar <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(2, 1)),
                  mean.model = list(armaOrder = c(2, 1)),
                  distribution.model = "norm")
fitg <- ugarchfit(spec = gar, data = ar.sim, solver = "hybrid")
ugarchforecast(fitg, n.ahead = 10)
In this model I used log returns, so my forecast is also based on log returns. But I need the actual price. I googled for an R function that converts log returns back to actual prices, but I couldn't find any.
Is there a function in R to recover the actual price from these log returns, or do I need to do that manually?
The price should be [initial price] * exp(cumulative log returns). For example:
df <- data.frame(price = c(90, 108, 81, 105, 180))
df$log = log(df$price)
df$logr = c(NA, diff(df$log))
df$logr_na0 = ifelse(is.na(df$logr), 0, df$logr)
df$cuml_log= cumsum(df$logr_na0)
df$reconstructed_price_norm = exp(df$cuml_log)
initial_price <- 90
df$reconstructed_price = initial_price * df$reconstructed_price_norm
Output:
> df
price log logr logr_na0 cuml_log reconstructed_price_norm reconstructed_price
1 90 4.499810 NA 0.0000000 0.0000000 1.000000 90
2 108 4.682131 0.1823216 0.1823216 0.1823216 1.200000 108
3 81 4.394449 -0.2876821 -0.2876821 -0.1053605 0.900000 81
4 105 4.653960 0.2595112 0.2595112 0.1541507 1.166667 105
5 180 5.192957 0.5389965 0.5389965 0.6931472 2.000000 180
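Applied to the forecasting question, the same idea in one line (a sketch with made-up numbers; in practice the last observed price comes from your series and the forecast log returns from ugarchforecast()):

```r
# Sketch: turn forecast log returns back into price levels.
last_price <- 105                 # hypothetical last observed price
fc_logr <- c(0.01, -0.005, 0.02)  # hypothetical forecast log returns
fc_price <- last_price * exp(cumsum(fc_logr))
fc_price                          # one price level per forecast step
```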

get target vector from simplex optimisation in R

I started using R to solve an LP problem using the simplex function from library("boot"). When I try the following code, I only get the target vector printed, but not as a vector object.
library("boot")
# This example is taken from Exercise 7.5 of Gill, Murray and Wright (1991).
enj <- c(200, 6000, 3000, -200)
fat <- c(800, 6000, 1000, 400)
vitx <- c(50, 3, 150, 100)
vity <- c(10, 10, 75, 100)
vitz <- c(150, 35, 75, 5)
j<-simplex(a = enj, A1 = fat, b1 = 13800, A2 = rbind(vitx, vity, vitz),
b2 = c(600, 300, 550), maxi = TRUE)
xx<-(j["soln"])
print(xx)
From this I get the following output:
$soln
x1 x2 x3 x4
0.0 0.0 13.8 0.0
However, this is not a vector object, so I can't get the first element by using xx[1]. Can you please help me get the result as a vector?
Your variable xx is a list, so it's quite easy to return a specific value within it.
For example, if you want x1, you can do:
xx[["soln"]]["x1"] or xx[[1]][1]
But if you want to transform xx into a vector, you can do:
xx2 <- unlist(xx)
and then you have a vector:
> xx2
soln.x1 soln.x2 soln.x3 soln.x4
0.0 0.0 13.8 0.0
> is.vector(xx2)
[1] TRUE
If you don't want to keep the names of the elements in the vector, set the argument use.names to FALSE:
> xx2 <- unlist(xx, use.names = FALSE)
> xx2
[1] 0.0 0.0 13.8 0.0
> is.vector(xx2)
[1] TRUE

Passing Fixed and Variable parameters to Optimx

This is a syntax question and probably has a simple solution but I can't find it covered anywhere on SO for the optimx package.
Minimal working example & Question
I have a function like:
ToOptimise = function(a,b,d,e){
(a-1)^2 + (b-2)^2 + (d-3)^2 +(e-4)^2
}
I can optimise select parameters in mle2 quite easily:
library(bbmle)
Calib2 = mle2(ToOptimise,
start = list(a = 1, d = 10),
fixed = list(b = 2, e = 2))
This is not really the right tool though as it is not a maximum likelihood problem.
I want to find a way to pass fixed and optimisable parameters to optimx just as easily.
I plan to do several calibrations, each holding different parameters fixed, so I do not want to redo the function with some parameters hardcoded.
My attempt that doesn't work
library(optimx)
ToOptimiseVector = function(Theta){
a = Theta[1]
b = Theta[2]
d = Theta[3]
e = Theta[4]
(a-1)^2 + (b-2)^2 + (d-3)^2 +(e-4)^2
}
Calib1 = optimx( par = c(1,2,1,2),
fn = ToOptimiseVector,
Theta[2] = 2, Theta[4] = 2)
Another related Question on Stack Overflow
Another optimx syntax question that does not look at fixed and optimisable parameters is:
R- Optimx for exponential function with 2 parameters - cannot evaluate function at initial parameter values
Added an opt parameter to the fn. Just pass this opt vector into optimx(). Any non-NA values will become fixed.
ToOptimiseVector <- function(Theta, opt = rep(NA, 4)) {
  # Fix the parameters for which opt is not NA
  i <- !is.na(opt)
  Theta[i] <- opt[i]
  a <- Theta[1]
  b <- Theta[2]
  d <- Theta[3]
  e <- Theta[4]
  return((a - 1)^2 + (b - 2)^2 + (d - 3)^2 + (e - 4)^2)
}
Seems to work.
Calib1 = optimx( par = c(1,2,1,2), fn = ToOptimiseVector, opt = c(NA, 2, NA, 2))
Calib2 = optimx( par = c(1,2,1,2), fn = ToOptimiseVector)
> Calib1
p1 p2 p3 p4 value fevals gevals niter convcode kkt1 kkt2 xtimes
Nelder-Mead 0.9998974 5.517528 3.00022 10.83214 4 103 NA NA 0 TRUE FALSE 0.02
BFGS 1.0000000 4.000000 3.00000 8.00000 4 6 3 NA 0 TRUE FALSE 0.00
> Calib2
p1 p2 p3 p4 value fevals gevals niter convcode kkt1 kkt2 xtimes
Nelder-Mead 1.000936 1.999793 3.0006 4.000256 1.344336e-06 227 NA NA 0 FALSE TRUE 0.01
BFGS 1.000000 2.000000 3.0000 4.000000 3.566556e-23 16 3 NA 0 TRUE TRUE 0.00
The trick is that the starting params and the arguments to the function to be optimized have to be aligned. Please see if the following helps you at all.
library(optimx)
ToOptimiseVector <- function(Theta){
a <- Theta[1]
b <- Theta[2]
d <- Theta[3]
e <- Theta[4]
(a-1)^2 + (b-2)^2 + (d-3)^2 +(e-4)^2
}
start <- c(1,0,1,1)
start <- c(1,0,0,0)
start <- c(1,2,1,2)
Calib1 <- optimx( par=start,
fn = ToOptimiseVector)
This gives warning messages, but you do get a result.
> Calib1
p1 p2 p3 p4 value fevals gevals niter convcode kkt1 kkt2
Nelder-Mead 1 2 3 4 0 4 4 NA 0 TRUE TRUE
BFGS 1 2 3 4 0 4 4 NA 0 TRUE TRUE
xtimes
Nelder-Mead 0
BFGS 0
Hope you can start with this and proceed.
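One more pattern worth knowing (a sketch, not from either answer above): capture the fixed values in a closure so the optimizer only ever sees the free parameters. Shown here with base optim(); optimx() accepts the same objective function.

```r
# Sketch: fix parameters via a closure; only the free ones are optimized.
make_obj <- function(fixed) {
  # 'fixed' is a named list of parameters held constant
  function(free) {
    p <- c(as.list(free), fixed)
    with(p, (a - 1)^2 + (b - 2)^2 + (d - 3)^2 + (e - 4)^2)
  }
}
obj <- make_obj(list(b = 2, e = 2))
fit <- optim(c(a = 1, d = 10), obj)  # base optim; optimx takes the same fn
fit$par                              # a close to 1, d close to 3
```

Because the fixed dimensions never reach the optimizer, it cannot wander in them, which is what happens to p2 and p4 in the Nelder-Mead row of Calib1 above.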
