How to estimate gamma and cost parameters for SVM quickly - r

I want to train SVMs in R and I know there are functions such as e1071::tune.svm() that can be used to find the optimal parameters for the SVM. However, it seems there are some formulas out there (e.g. used in this report) that can give you a reasonable estimate of these parameters.
Since a grid search for the parameters can take quite a lot of time on larger datasets, and one usually has to provide a range of possible values anyway, I wondered whether there is a package that implements such formulas to get a quick estimate of the gamma and cost parameters for the SVM?
So far, I've found out that caret::train() might use such an approach to estimate sigma (which should be the reciprocal of 2*gamma^2), but I haven't tried it yet, since other calculations are still running (and will be, probably for the next few days). Is there also an implementation to estimate cost, or at least one that gives a range of reasonable values?
I have found a similar question that asks for alternatives to grid-search in general. However, I would be interested in an R implementation of such alternatives and also, I hope things have developed further since the more general question was posted years ago.
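For reference, here is a minimal sketch of the kind of data-driven estimate I believe caret relies on internally (via kernlab::sigest(); this is my assumption from reading caret's svmRadial model code, and it only covers the kernel width, not the cost parameter):

library(kernlab)

# sigest() returns the 10%, 50% and 90% quantiles of a data-driven estimate
# of the inverse kernel width for the Gaussian RBF kernel
x <- as.matrix(iris[, 1:4])
sig <- sigest(x, frac = 0.5)
sig

# kernlab's rbfdot kernel is exp(-sigma * ||u - v||^2), so the median value
# can be used directly in the role of e1071's gamma
gamma_est <- unname(sig[2])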

Related

Results different after latest JAGS update?

I am running Bayesian hierarchical modeling in R using R2jags. When I open code I used a month ago and run it on a dataset I used a month ago (verified by "date modified" in Windows Explorer), I get different results than I got a month ago. The only difference I can think of is that I got a new work computer in the last month and we installed JAGS 4.3.0. I was previously using 4.2.0.
Is it remotely possible to get different results just from updating my version of JAGS? I'm not posting code or results here because I don't need help troubleshooting it - everything is exactly the same.
Edit:
Convergence seems fine - Geweke diagnostics, autocorrelation plots, and trace plots look good. That hasn't changed.
I have a seed set both via set.seed() and jags.seed=. Is that enough? I've never had a problem replicating these types of results before.
As far as how different the results are, they are large enough to cause a meaningful difference in the inference. I am assessing relationships between 30 chemical exposures and a health outcome among 336 humans. Here are two examples. Chemical B troubles me the most because of the credible interval shift. Chemical A is another example.
I also doubled the number of iterations from 50k to 100k which resulted in very minor/inconsequential differences.
Edit 2:
I posted at source forge asking about the different default RNGs for versions: https://sourceforge.net/p/mcmc-jags/discussion/610037/thread/52bfef7d17/
There are at least 3 possible reasons for you seeing a difference between results from these models:
1. One or both of your attempts to fit this model did not converge, and/or your effective sample size is so small that random sampling error is having a large impact on your inference. If you have already checked to ensure convergence and sufficient effective sample size (for both models) then you can rule this out.
2. You are seeing small differences in the posteriors due to the random sampling inherent to MCMC in otherwise converged results. If these differences are big enough to cause a meaningful difference in inference then your effective sample size is not high enough - so just run the models for longer and the difference should reduce. You can also set the random seed in JAGS using initial values for .RNG.seed and .RNG.name so that successive model runs are numerically identical (a minimal sketch of this follows the list). If you run the models for longer and this difference does not reduce (or if it is a large difference to begin with) then you can rule this out.
3. Your model contains a node for which the default sampling scheme changed between JAGS 4.2.0 and 4.3.0 - there were some changes to sampling schemes (and the order of precedence for assigning samplers to nodes) that could conceivably have changed your results (from memory I think this affected GLM particularly, but I can't remember exactly). However, although this may affect the probability of convergence, it should not substantially affect the posterior if the model does converge. It may be contributing to a numerical difference as explained for point (2) though.
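To illustrate point (2), here is a minimal sketch of fixing the RNG through initial values in plain rjags (the model file name, data object, and monitored variable are placeholders for your own):

library(rjags)

# One list of initial values per chain; .RNG.name and .RNG.seed fix the
# random number stream so repeated runs are numerically identical
inits <- list(
  list(.RNG.name = "base::Mersenne-Twister", .RNG.seed = 1),
  list(.RNG.name = "base::Mersenne-Twister", .RNG.seed = 2)
)

model <- jags.model("model.txt", data = data_list,
                    inits = inits, n.chains = 2)
update(model, n.iter = 5000)                      # burn-in
samples <- coda.samples(model, variable.names = "beta", n.iter = 50000)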
I'd recommend first ensuring convergence of both models, and then (assuming they did both converge) looking at exactly how much of a difference you are seeing. If it looks like both models converged AND the difference is more than just random sampling variation, then please reply here and/or update your question (as that shouldn't happen ... i.e. we may need to look into the possibility of a bug in JAGS).
Thanks,
Matt
--------- Edit following additional information added to the question --------
Based on what you have written, it does seem that the difference in inference exceeds what might be expected due to random variation, so there may be some kind of underlying issue here. In order to diagnose this further we would need a minimal reproducible example (https://stackoverflow.com/help/minimal-reproducible-example). This means that you would need to provide not only the model (or preferably a simplified model that still exhibits the problem) but also some data to which we can fit the model. If your data are too sensitive to share then this could be a fictitious dataset for which you also see a difference between JAGS 4.2.0 and JAGS 4.3.0.
The official help forum for JAGS is at https://sourceforge.net/p/mcmc-jags/discussion/610037/ - so you can certainly post there, although we would still need a minimal reproducible example to be able to do anything. If you do so, then please update both posts with a link to the other so that anyone reading either post knows about the cross-posting. You should also note that R2jags is not officially supported on the JAGS forums, so please provide the minimal reproducible example using plain rjags code (or runjags if you prefer) rather than using the R2jags wrapper.
To answer your question in the comments: in order to obtain information on the samplers used, you can use rjags::list.samplers(), e.g.:
library(rjags)
# LINE is just a small example model built into rjags:
data(LINE)
LINE$recompile()
list.samplers(LINE)
# $`bugs::ConjugateGamma`
# [1] "tau"
# $`bugs::ConjugateNormal`
# [1] "alpha"
# $`bugs::ConjugateNormal`
# [1] "beta"

mlr3 optimized average of ensemble

I am trying to optimize the averaged prediction of two logistic regressions in a classification task using a super learner.
My measure of interest is classif.auc.
The mlr3 help file tells me (?mlr_learners_avg):
Predictions are averaged using weights (in order of appearance in the
data) which are optimized using nonlinear optimization from the
package "nloptr" for a measure provided in measure (defaults to
classif.acc for LearnerClassifAvg and regr.mse for LearnerRegrAvg).
Learned weights can be obtained from $model. Using non-linear
optimization is implemented in the SuperLearner R package. For a more
detailed analysis the reader is referred to LeDell (2015).
I have two questions regarding this information:
When I look at the source code I think LearnerClassifAvg$new() defaults to "classif.ce", is that true?
I think I could set it to classif.auc with param_set$values <- list(measure="classif.auc",optimizer="nloptr",log_level="warn")
The help file refers to the SuperLearner package and LeDell (2015). If I understand it correctly, the "AUC-Maximizing Ensembles through Metalearning" solution proposed in that paper is, however, not implemented in mlr3? Or am I missing something? Could this solution be applied in mlr3? In the mlr3 book I found a paragraph about calling an external optimization function - would that be possible for SuperLearner?
As far as I understand it, LeDell (2015) proposes and evaluates a general strategy that optimizes AUC as a black-box function by learning optimal weights. They do not really propose a single best strategy or any concrete defaults, so I looked into the defaults of the SuperLearner package's AUC optimization strategy.
Assuming I understood the paper correctly:
The LearnerClassifAvg basically implements what is proposed in LeDell (2015), namely: it optimizes the weights for any metric using non-linear optimization, with LeDell (2015) focusing on the special case of optimizing AUC. As you rightly pointed out, by setting the measure to "classif.auc" you get a meta-learner that optimizes AUC. The default optimization routine differs between mlr3pipelines and the SuperLearner package: we use NLOPT_LN_COBYLA, while SuperLearner ... "uses the Nelder-Mead method via the optim function to minimize rank loss" (from the documentation).
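As a minimal sketch (assuming LearnerClassifAvg is exported by mlr3pipelines and accepts the parameter names quoted in the question; pass msr("classif.auc") instead of the string if a Measure object is required):

library(mlr3)
library(mlr3pipelines)

# Weighted-average meta-learner; switch its optimization target to AUC
avg <- LearnerClassifAvg$new()
avg$param_set$values <- list(measure   = "classif.auc",
                             optimizer = "nloptr",
                             log_level = "warn")
avg$predict_type <- "prob"   # AUC needs probability predictions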
So in order to get exactly the same behaviour, you would need to implement a Nelder-Mead bbotk::Optimizer (similar to here) that simply wraps stats::optim with method Nelder-Mead, and carefully compare settings and stopping criteria. I am fairly confident that NLOPT_LN_COBYLA delivers somewhat comparable results; LeDell (2015) has a comparison of the different optimizers for further reference.
Thanks for spotting the error in the documentation. I agree that the description is a little unclear, and I will try to improve it!

Time complexity of nlm-package in R?

I'm estimating a non-linear system (via seemingly unrelated regressions - SUR), using the systemfit package (nlsystemfit() function) with 4 equations, 32 parameters to estimate (!) and 412 observations. But my code is taking forever (my laptop isn't a super-powerful one, though). So far, the process has been running for 13 hours. I'm not an expert in computational matters, but someone explained to me some time ago the concept of time complexity of algorithms (or big-O); according to that concept, the time to compute a certain algorithm can depend on a specific functional relation to the number of observations and/or coefficients.
Hence, I'm thinking of just stopping my process, simplifying the model (temporarily), and running something simpler, only to check whether the estimated parameters make sense so far, and then running the whole model.
But all this only makes sense if I can change key elements in my model that would reduce the processing time significantly. That's why I was searching on Google for the time complexity of the nlm package (the nlsystemfit() function relies on nlm), but without success. So, this is my question: does anybody know where I can find that info, or at least give me advice on how to test non-linear systems before running the whole model?
Since you didn't provide any substantial information regarding your model, or any code for it, it's hard to suggest specific improvements for your situation.
From what you said:
Hence, I'm thinking of just stopping my process, simplifying the model (temporarily), and running something simpler, only to check whether the estimated parameters make sense so far, and then running the whole model.
It seems what you need is benchmarking, i.e. measuring the time your code takes to execute (although benchmarking can also cover memory usage or other performance metrics).
There are quite a few ways to benchmark code in R, which include the use of Sys.time() or system.time() just before and right after your algorithm/function executes, or libraries such as rbenchmark (which is a simple wrapper around the system.time function), tictoc, bench, and microbenchmark.
Among these, the last two are the preferable options: the bench package provides system_time(), a higher-precision alternative to system.time(), and microbenchmark is known to be a reliable way to accurately measure and compare the execution time of R expressions/algorithms.
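As a minimal sketch (using microbenchmark and a toy objective, since your actual model isn't available):

library(microbenchmark)

# Toy objective: sum of squared deviations from a constant
f <- function(theta) sum((theta - 3)^2)

# Compare the time of a single nlm() call against optim() on the same problem
microbenchmark(
  nlm   = nlm(f, p = c(0, 0)),
  optim = optim(par = c(0, 0), fn = f),
  times = 50
)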

R Supervised Latent Dirichlet Allocation Package

I'm using this LDA package for R. Specifically, I am trying to do supervised latent Dirichlet allocation (sLDA). In the linked package, there's an slda.em function. However, what confuses me is that it asks for alpha, eta and variance parameters. As far as I understand, these parameters are unknowns in the model. So my question is, did the author of the package mean that these are initial guesses for the parameters? If so, there doesn't seem to be a way of accessing them from the result of running slda.em.
Aside from coding the extra EM steps in the algorithm, is there a suggested way to guess reasonable values for these parameters?
Since you are trying to build a supervised model, the typical approach would be to use cross-validation to determine the model parameters. So you hold out some of the data as your test set, train a model on the remaining data, and evaluate the model performance, repeating k times. You then continue to repeat with different model parameters to determine which ones result in the best model performance.
In the specific case of slda, I would run demo(slda) to see the author's implementation of it. When you run the demo, you'll see that he sets alpha = 1.0, eta = 0.1, and variance = 0.25. I'd suggest using these as your starting point, and then using cross-validation to determine better parameters if you need to improve model performance.
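For concreteness, here is roughly how demo(slda) sets those values on the package's built-in poliblog data (the argument names below reflect my reading of the demo and should be double-checked against ?slda.em):

library(lda)

data(poliblog.documents)
data(poliblog.vocab)
data(poliblog.ratings)

num.topics <- 10
# Initial regression coefficients, one per topic
params <- sample(c(-1, 1), num.topics, replace = TRUE)

result <- slda.em(documents = poliblog.documents,
                  K = num.topics,
                  vocab = poliblog.vocab,
                  num.e.iterations = 10,
                  num.m.iterations = 4,
                  alpha = 1.0, eta = 0.1,
                  annotations = poliblog.ratings / 100,
                  params = params,
                  variance = 0.25,
                  lambda = 1.0,
                  logistic = FALSE,
                  method = "sLDA")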

How do you use the rugarch package to include the stable distribution

In R, I would like to use the rugarch and stabledist/fBasics packages together to fit a univariate time series object modeled as an ARMA(1,1)-GARCH(1,1) process, with the innovation term/conditional distribution modeled as a stable distribution. Is there a way to do this, given that the fBasics package provides a dstable() function, which I'm guessing would be used to optimise a maximum-likelihood function?
And as a follow-up, how would one go about simulating several thousand iterations of x-days-forward returns, assuming they follow the same process? (I'm guessing this would use rstable() with the parameters estimated above.)
Any other packages that you think would do the job better would gladly be looked at as well.
Yes, you can use dstable and rstable, but they come from package stabledist...
If you want to estimate the stable parameters, you can use
fBasics::stableFit(data, type="mle")
to get the MLE estimates, but this usually takes a few minutes to compute.
Faster, but a little less precise, is the quantile method (the default for stableFit, i.e. don't specify the type).
Then, once you have the fit, you extract the resulting estimates from result@fit$estimate and can use them in rstable to draw random variates.
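Putting those pieces together, a minimal sketch (the entries of the estimate vector are assumed to follow the usual alpha/beta/gamma/delta order; check str(fit) on your system):

library(fBasics)     # stableFit()
library(stabledist)  # dstable(), rstable()

x <- rnorm(500)      # placeholder for your return series

fit <- stableFit(x, type = "mle")   # omit type for the faster quantile method
est <- fit@fit$estimate             # alpha, beta, gamma, delta

# Draw simulated returns from the fitted stable distribution
sims <- rstable(10000, alpha = est[1], beta = est[2],
                gamma = est[3], delta = est[4])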
