Controlling Step Size in 'optim' R function

I am using R to optimize a function with the 'optim' function. The true values of the variables I am optimizing over are spaced at least 10^-5 or so apart. But, as I understand it, the default step size (i.e., how much optim adds to each control variable to see how that changes the objective function) is on the order of 10^-8.
Is there an easy way to tell the 'optim' function to increase the step size to 10^-5 or perhaps higher?
For reference my code is here:
Optimal <- optim(par = starting, fn = expectedSeats,
                 propensities = propsShocked, n = NumberofDistricts,
                 shockType = "normal", shockSD = 0.1,
                 method = "L-BFGS-B",
                 lower = rep(0, NumberofDistricts), upper = rep(1, NumberofDistricts),
                 control = list(factr = 1e12))
I have looked around and can't seem to figure this out. Thanks!

As I understand your question, I believe you can specify the step value within the ndeps argument of the control= option. According to the documentation, the default is 1e-3 (not 1e-8).
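For example, building on the call from the question, the following sketch asks for finite-difference steps of 1e-5 (ndeps takes one step size per element of par):
f <- function(x) NULL  # placeholder; use your own objective as above
Optimal <- optim(par = starting, fn = expectedSeats,
                 propensities = propsShocked, n = NumberofDistricts,
                 shockType = "normal", shockSD = 0.1,
                 method = "L-BFGS-B",
                 lower = rep(0, NumberofDistricts), upper = rep(1, NumberofDistricts),
                 # ndeps: one finite-difference step size per parameter
                 control = list(factr = 1e12, ndeps = rep(1e-5, length(starting))))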

Related

zfit straight-line fitting for a 2-dim dataset

I would like to fit a 2-dim plot with a straight line (a*x + b) using zfit, as in the following figure.
That is very easy with the probfit package, but probfit has been deprecated by scikit-hep. https://nbviewer.jupyter.org/github/scikit-hep/probfit/blob/master/tutorial/tutorial.ipynb
How can I fit such 2-dim data with an arbitrary function?
I've checked the zfit examples, but they seem to assume some distribution (histogram), so zfit expects a dataset like a 1-d array, and I couldn't work out how to pass 2-d data to zfit.
There is currently no direct way in zfit to do this out of the box (in one line), since a corresponding loss simply hasn't been added yet.
However, SimpleLoss (zfit.loss.SimpleLoss) allows you to construct any loss you can think of (have a look at the example in the docstring as well). In your case, it would look something like this:
import zfit
import tensorflow as tf

x = your_data
y = your_targets  # y-values

obs = zfit.Space('x', (lower, upper))
param1 = zfit.Parameter(...)
param2 = zfit.Parameter(...)
...
model = Func(...)  # a function (not a PDF) is the way to go here
data = zfit.Data.from_numpy(array=x, obs=obs)

def mse():
    prediction = model.func(data)
    value = tf.reduce_mean((prediction - y) ** 2)  # or whatever loss you want to have
    return value

loss = zfit.loss.SimpleLoss(mse, [param1, param2])
# etc.
On another note, it would be a good idea to add such a loss to zfit. If you're interested in contributing, I recommend getting in contact with the authors; they will gladly help you and guide you through it.
UPDATE
The loss function itself presumably consists of three or four things: x, y, a model, and maybe an uncertainty on y. A chi2 loss looks like this:
def chi2():
    y_pred = model.func(x)
    return tf.reduce_sum(((y_pred - y) / y_error) ** 2)  # y_error: uncertainty on y

loss = zfit.loss.SimpleLoss(chi2, model.get_params())
That's all, four lines of code. x is a zfit.Data object and model is, in this case, a Func. Does that work?

Error in optim: L-BFGS-B needs finite value of fn

I am trying to run the impute_errors() function of the imputeTestBench package on a series of values, using six user-defined methods to select the best imputation method. Below is my code:
# (the opening of the enclosing call is truncated in the original post)
correctedSalesHistoryMatrix[, 1:2],
matrix(unlist(apply(X = as.matrix(correctedSalesHistoryMatrix[, -c(1, 2)]),
                    MARGIN = 1,
                    FUN = impute_errors,
                    smps = "mcar",
                    methods = c("imputationMethod1",
                                "imputationMethod2",
                                "imputationMethod3",
                                "imputationMethod4",
                                "imputationMethod5",
                                "imputationMethod6"),
                    methodPath = "C:\\Documents\\Imputations.R",
                    errorParameter = "mape",
                    missPercentFrom = 10,
                    missPercentTo = 10)),
       nrow = nrow(correctedSalesHistoryMatrix), byrow = T)
)
When I use a small dataset, the function executes successfully. When I use a large dataset, I get the following error:
Error in optim(init[mask], getLike, method = "L-BFGS-B", lower = rep(0, :
L-BFGS-B needs finite values of 'fn'
Called from: optim(init[mask], getLike, method = "L-BFGS-B", lower = rep(0,
np + 1L), upper = rep(Inf, np + 1L), control = optim.control)
I don't think this is an easy fix.
The error is probably not caused by imputeTestBench itself, but rather by one of your user-defined imputation methods.
Run impute_errors as before, but with only na_mean as the method instead of your user-defined methods (impute_errors(..., methods = 'na_mean')), to see whether this suggestion is true.
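A quick way to run that check (a sketch based on the call in the question; the single-row input is illustrative):
library(imputeTestBench)
# same data, but only the built-in na_mean method: if this runs without
# error, the problem lies in one of the user-defined methods
test <- impute_errors(dataIn = as.numeric(correctedSalesHistoryMatrix[1, -c(1, 2)]),
                      smps = "mcar",
                      methods = "na_mean",
                      errorParameter = "mape",
                      missPercentFrom = 10,
                      missPercentTo = 10)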
The error itself occurs quite often and has to do with stats::optim receiving inputs it can't deal with. Quite likely you are not calling stats::optim directly in your user-defined imputation methods (so you can't easily fix the input). More likely, a package you are using does some calculations and then calls stats::optim. Or, even worse, a package you are using relies on another package that uses stats::optim.
The answers to this question explain the underlying problem. Overall, it seems to occur especially for large datasets, when the fn input parameter to stats::optim becomes Inf.
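The failure mode itself is easy to reproduce in isolation (an illustrative toy example, not your code; the objective is Inf at the starting value):
# objective that is non-finite on part of the domain
f <- function(x) if (x > 1) Inf else sum(x^2)
optim(par = 2, fn = f, method = "L-BFGS-B", lower = 0, upper = 5)
# Error in optim(...) : L-BFGS-B needs finite values of 'fn'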
Here are some examples of the problem also occurring for different R packages and functions (which all use stats::optim somewhere internally): 1, 2, 3
There is not too much you can do overall, short of digging extremely deep into the underlying packages.
If you are using the imputeTS package for one of your user-supplied imputation methods, a workaround is proposed in this GitHub issue, which might help if the error occurs within the na_kalman or na_seadec method.

liquidSVM parameter selection clarification in R

This is my first question and I'm only a basic "programmer", so I'm sorry if I do not make myself clear enough.
I'm currently using liquidSVM 1.2.1 on R 3.5.0 and, despite its great potential, I do not understand some technicalities, as the help is not explanatory enough for me and I cannot find anything on the internet.
More specifically, I'd like to understand further how the parameter selection works.
The final liquidSVM model does in fact contain info on gammas and lambdas, but I cannot tell whether these parameters are all used in different cells or whether a single final pair has been chosen for the final model.
This leads to two sub-questions:
If all the values are used, how can I disable grid_choice and select only one value for each parameter?
If the algorithm selects a final pair of values, how can I find out which one it is?
This is the setting I've been using so far:
model = liquidSVM::svm(formula, TRAIN, threads = 3, predict.prob = T, random_seed = 123,
                       folds = 5, scale = F, d = 1, partition_choice = 5, grid_choice = -1)
I tried different things, for example:
setting gamma = 0.01 and lambda = 0.1;
setting max_gamma = 0.01 and min_gamma = 0.01;
setting grid_choice = NULL or grid_choice = list(gamma = 0.01, lambda = 0.01);
but it still does a grid selection on its own.
If only I could understand how to disable this grid search and provide my chosen parameters, I'd code a grid search myself (and thus know what the code is doing).
Thank you in advance.
The question is somewhat older now; however, in case someone is still looking for a solution:
You can define the grid to be searched for the best matching values using the arguments gammas and lambdas. In this case, you set each of them to a single value.
For example:
model <- svm(x1 ~ ., train, display = 1, folds = 5, mc_type = "OvA_ls",
             gammas = 0.01,
             lambdas = 0.1)
would set the gamma to only 0.01 and the lambda to 0.1.
However, this is no longer a grid search, and you should expect two hands full of warning messages. If you provide a vector of gammas and a vector of lambdas, it will search that grid rather than the default one, as shown below. Hence the arguments can be handy if you want to compare liquidSVM with other packages, for example.
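For instance, a small custom grid would look like this (a sketch; the grid values are purely illustrative):
model <- svm(x1 ~ ., train, display = 1, folds = 5, mc_type = "OvA_ls",
             # 4 x 4 custom grid searched instead of liquidSVM's default grid
             gammas = c(0.001, 0.01, 0.1, 1),
             lambdas = c(0.001, 0.01, 0.1, 1))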
Best of luck

r Nomad categorical optimisation (snomadr)

I am trying to use the Nomad technique for blackbox optimisation from the crs package (C implementation), which is called via the snomadr function. The method works when trying straight numerical optimisation, but errors when categorical features are included. However the help for categorical optimisation is not very well documented, so I am struggling to see where I am going wrong. Reproducible code below:
library(crs)
library(randomForest)
Illustrating this on randomForest & the iris dataset.
Creating the randomForest model (leaving the last row out as starting points for the optimizer)
rfIris <- randomForest(x=iris[-150,-c(1)], y=unlist(iris[-150,1]))
The objective function (functions we want to optimize)
objFn <- function(x0, model){
  preds <- predict(object = model, newdata = x0)
  as.numeric(preds)
}
Test to see if the objective function works (should return ~6.37)
objOut <- objFn(x0=unlist(iris[150,-c(1)]),model = rfIris)
Creating initial conditions, options list, and upper/lower bounds for Nomad
x0 <- iris[150,-c(1)]
x0 <- unlist(x0)
options <- list("MAX_BB_EVAL" = 10000,
                "MIN_MESH_SIZE" = 0.001,
                "INITIAL_MESH_SIZE" = 1,
                "MIN_POLL_SIZE" = 0.001,
                "NEIGHBORS_EXE" = c(1, 2, 3),
                "EXTENDED_POLL_ENABLED" = 'yes',
                "EXTENDED_POLL_TRIGGER" = 'r0.01',
                "VNS_SEARCH" = '1')
up <- c(10,10,10,10)
low <- c(0,0,0,0)
Calling the optimizer
opt <- snomadr(eval.f = objFn, n = 4, bbin = c(0, 0, 0, 2), bbout = 0, x0 = x0,
               model = rfIris, opts = options, ub = up, lb = low)
and I get an error about the NEIGHBORS_EXE parameter in the options list. It seems as if I need to supply NEIGHBORS_EXE with a file corresponding to a set of 'extended poll' coordinates, but it is not clear what exactly these are.
The method works by setting "EXTENDED_POLL_ENABLED" = 'no' in the options list, as it then ignores the categorical variables and defaults to numerical optimisation, but this is not what I want.
I also managed to pull up some additional information for NEIGHBORS_EXE using
snomadr(information=list("help"="-h NEIGHBORS_EXE"))
and again, I do not understand what the 'neighbours.exe' is meant to be.
Any help would be much appreciated!
This is the response from Zhenghua, who coded the R interface:
The issue is that he did not configure the parameter "NEIGHBORS_EXE" properly. He needs to prepare an executable file that defines the neighbors, put that executable file in the folder where R is called, and then set the parameter "NEIGHBORS_EXE" to the executable file's name.
You can contact us at nomad@gerad.ca if you wish to continue the discussion.
Regarding the neighbours_exe parameter, you can refer to section 7.1 of the NOMAD user guide:
https://www.gerad.ca/nomad/Downloads/user_guide.pdf
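To make the shape of such a file concrete, here is a hypothetical neighbors script (a sketch only, assuming the convention described in section 7.1 of the user guide, namely that the executable receives a file with one point's coordinates and prints each neighboring point on its own line; the file name neighbors.R and the 1/2/3 coding of the categorical variable are illustrative assumptions, not part of the original answer):
#!/usr/bin/env Rscript
# neighbors.R (hypothetical): read one point, print its categorical neighbors
args <- commandArgs(trailingOnly = TRUE)
x <- scan(args[1], quiet = TRUE)   # coordinates of the current point
k <- length(x)                     # assume the last coordinate is the category (1, 2 or 3)
for (s in setdiff(1:3, x[k])) {
  cat(c(x[-k], s), "\n")           # same numeric part, different category
}
You would then make the script executable (chmod +x neighbors.R) and set "NEIGHBORS_EXE" = "neighbors.R" in the options list.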

Estimate parameters of Frechet distribution using mmedist or fitdist (with mme) error

I'm relatively new to R and I would appreciate it if you could take a look at the following code. I'm trying to estimate the shape parameter of the Frechet distribution (or inverse Weibull) using mmedist (I also tried fitdist, which calls mmedist), but I get the following error:
Error in mmedist(data, distname, start = start, fix.arg = fix.arg, ...) :
the empirical moment function must be defined.
The code that I use is below:
require(actuar)
library(fitdistrplus)
library(MASS)
# values
n = 100
scale = 1
shape = 3
# simulate a sample
data_fre = rinvweibull(n, shape, scale)
memp = minvweibull(c(1, 2), shape = 3, rate = 1, scale = 1)
# estimating the parameters
para_lm = mmedist(data_fre, "invweibull", start = c(shape = 3, scale = 1),
                  order = c(1, 2), memp = "memp")
Please note that I have tried changing the code many times to see whether my mistake was in the syntax, but I always get the same error.
I'm aware of the example in the documentation. I've tried that as well, but with no luck. Please note that for the method to work, the order of the moment must be smaller than the shape parameter (i.e., shape).
The example is the following:
require(actuar)
# simulate a sample
x4 <- rpareto(1000, 6, 2)
# empirical raw moment
memp <- function(x, order)
  ifelse(order == 1, mean(x), sum(x^order)/length(x))
# fit
mmedist(x4, "pareto", order = c(1, 2), memp = "memp",
        start = c(shape = 10, scale = 10), lower = 1, upper = Inf)
Thank you in advance for any help.
You will need to make non-trivial changes to the source of mmedist -- I recommend that you copy out the code, and make your own function foo_mmedist.
The first change you need to make is on line 94 of mmedist:
if (!exists("memp", mode = "function"))
That line checks whether "memp" is a function that exists, as opposed to whether the argument that you have actually passed exists as a function. Change it to:
if (!exists(as.character(expression(memp)), mode = "function"))
The second change, as I have already noted, relates to the fact that the optim routine actually calls funobj, which calls DIFF2, which in turn (see line 112) calls the user-supplied memp function, minvweibull in your case, with two arguments: obs (which resolves to the data) and order. Since minvweibull does not take data as its first argument, this fails.
This is expected, as the help page tells you:
memp    A function implementing empirical moments, raw or centered, but has to be consistent with the distr argument. This function must have two arguments: as a first one the numeric vector of the data, and as a second the order of the moment returned by the function.
How can you fix this? Pass the function moment from the moments package. Here is complete code (assuming that you have made the change above, and created a new function called foo_mmedist):
# values
n = 100
scale = 1
shape = 3
# simulate a sample
data_fre = rinvweibull(n, shape, scale)
# estimating the parameters
para_lm = foo_mmedist(data_fre, "invweibull",
                      start = c(shape = 5, scale = 2), order = c(1, 2), memp = moment)
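As a quick sanity check that moment indeed has the two-argument signature the help page requires (data vector first, moment order second):
library(moments)
# raw (non-central) moment by default: sum(x^order) / length(x)
moment(c(1, 2, 3), order = 2)  # (1 + 4 + 9) / 3 = 4.666667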
You can check that optimization has occurred as expected:
> para_lm$estimate
shape scale
2.490816 1.004128
Note, however, that this actually reduces to a crude way of doing overdetermined method of moments, and I am not sure that it is theoretically appropriate.
