Why can optim() be used recursively in R?

I am hoping someone can clear something up for me please. I am using optim() to optimise a set of parameters to fit a model to a time series. The code works fine and optimises fine.
However, I recently read on the optim help page that you can use optim() recursively. So I implemented this and it improves the fit over multiple recursions.
My question is this: If optim is already optimising over the parameter space and spitting out an optimal answer for that starting condition, why is the fit improved by using it recursively, repeatedly? This seems unintuitive.
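One possible reason, offered as a sketch rather than a confirmed explanation for this particular model: a local optimiser run can stop because it hit an iteration cap or a tolerance, not because it reached the true optimum, so a second run started from the first result simply keeps going. The Python snippet below uses a plain gradient-descent routine with a hard iteration cap as a stand-in for optim(); the objective, step size, and function names are all made up for illustration.

```python
def f(p):
    # Ill-conditioned quadratic with its minimum at (0, 0).
    x, y = p
    return x * x + 100.0 * y * y

def grad(p):
    x, y = p
    return (2.0 * x, 200.0 * y)

def capped_descent(start, max_iter=50, step=0.009):
    # Stand-in for one optimiser call: it stops after max_iter steps
    # whether or not it has converged.
    p = tuple(start)
    for _ in range(max_iter):
        g = grad(p)
        p = (p[0] - step * g[0], p[1] - step * g[1])
    return p

def restart(start, n_restarts=4):
    # Feed each result back in as the next starting point.
    p = tuple(start)
    history = [f(p)]
    for _ in range(n_restarts):
        p = capped_descent(p)
        history.append(f(p))
    return p, history

best, history = restart((3.0, 1.0))
print(history)  # objective value before the first run and after each restart
```

If the objective keeps dropping from one restart to the next, the earlier runs had stopped early. In R, the convergence code returned by optim() is worth checking for the same reason.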

How to find several solutions of nonlinear equation using R e.g. nleqslv?

As far as I understand R's nonlinear equation solver nleqslv(x, fn) finds only one solution of the nonlinear equation.
However, as Bhas commented, the searchZeros function from the same package can find multiple solutions, depending on the starting points.
Question: is there a function in R that can help choose the set of initial points for searchZeros, so that I can find all the solutions?
I am interested in the case of function with several variables.
I understand that which solution is found depends heavily on the initial approximation. So the brute-force way is to check some reasonable grid of initial approximations. However, is there a more intelligent way to get all the solutions?
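The brute-force grid idea mentioned above can be sketched as follows. This is a minimal Python illustration, not nleqslv itself: it runs a hand-written Newton iteration from every point of a grid and keeps the distinct roots it reaches. The two-variable system, the grid bounds, and all function names are assumptions chosen for the example.

```python
import math

def system(x, y):
    # Example system (made up): x^2 + y^2 = 4 and x*y = 1.
    return (x * x + y * y - 4.0, x * y - 1.0)

def jacobian(x, y):
    return ((2.0 * x, 2.0 * y), (y, x))

def newton(x, y, max_iter=50, tol=1e-12):
    # Plain Newton iteration; returns None if it does not converge.
    for _ in range(max_iter):
        f1, f2 = system(x, y)
        if math.hypot(f1, f2) < tol:
            return (x, y)
        (a, b), (c, d) = jacobian(x, y)
        det = a * d - b * c
        if abs(det) < 1e-14:
            return None  # singular Jacobian, give up on this start
        # Solve J * delta = -F with the explicit 2x2 inverse.
        x += (-f1 * d + f2 * b) / det
        y += (f1 * c - f2 * a) / det
    return None

def search_grid(lo=-3, hi=3):
    roots = []
    for gx in range(lo, hi + 1):
        for gy in range(lo, hi + 1):
            r = newton(float(gx), float(gy))
            # Keep a root only if it is not already in the list.
            if r is not None and all(math.dist(r, q) > 1e-6 for q in roots):
                roots.append(r)
    return roots

roots = search_grid()
for r in roots:
    print(r)
```

A grid gives no guarantee that every solution is found; it only reports the roots that some starting point happened to reach, so the grid range and spacing still have to come from knowledge of the problem.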

Optimizing over 3 dimensional piece-wise constant function

I'm working on a simulation project with a 3-dimensional piece-wise constant function, and I'm trying to find the inputs that maximize the output. Using optim() in R with the Nelder-Mead or SANN algorithms seems best (they don't require the function to be differentiable), but I'm finding that optim() ends up returning my starting value exactly. This starting value was obtained using a grid search, so it's likely reasonably good, but I'd be surprised if it was the exact peak.
I suspect that optim() is not testing points far enough out from the initial guess, leading to a situation where all tested points give the same output.
Is this a reasonable concern?
How can I tweak the breadth of values that optim() is testing as it searches?
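The concern can be reproduced in a small Python sketch. It is not optim() or Nelder-Mead; it is a simple compass (pattern) search, used here only because its initial step is explicit. The plateau function, the starting point, and the step sizes are all invented for illustration.

```python
import math

def plateau(p):
    # Piecewise-constant: the value only changes when a coordinate
    # crosses an integer. The maximum (0) is on [5,6) x [-2,-1) x [1,2).
    x, y, z = p
    return -((math.floor(x) - 5) ** 2
             + (math.floor(y) + 2) ** 2
             + (math.floor(z) - 1) ** 2)

def compass_maximize(f, start, step, min_step=1e-3):
    best = list(start)
    best_val = f(best)
    while step >= min_step:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                val = f(trial)
                if val > best_val:  # strict improvement only
                    best, best_val = trial, val
                    improved = True
        if not improved:
            step /= 2.0  # shrink the probe distance and try again
    return best, best_val

start = [2.5, 0.5, 3.5]
stuck, stuck_val = compass_maximize(plateau, start, step=0.01)
moved, moved_val = compass_maximize(plateau, start, step=1.0)
print(stuck, stuck_val)
print(moved, moved_val)
```

With the small step every probe lands on the same plateau as the start, so the search sees no improvement and returns the starting value. With a step comparable to the plateau width it can move between plateaus.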

Does R support switching between optimizers like Stata does?

I need to implement the model shown here:
The model estimation step on p.315 notes that:
"We maximize the likelihood by iterating the Marquart and
Berndt–Hall–Hall–Hausman algorithms, using numerical derivatives, optimal
stepsize, and a convergence criterion of 10^-6 for the change in the norm of the
parameter vector from one iteration to the next."
Now I know that Stata supports switching between optimizers,
see the bottom of p. 2.
Is there an R package or Matlab function/s that can do the same thing?
Specifically I need to be able to switch between BHHH and Levenberg-Marquardt.
Kind Regards
For R, check out the CRAN Task View on Optimization. Searching that page, it looks like BHHH and Marquardt are available in separate packages (maxLik and minpack.lm, respectively). You could write your own code to handle switching between them.
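As a rough idea of what hand-written switching could look like, here is a toy Python sketch. It does not use either R package. It fits the rate of an exponential distribution to a made-up sample, alternating blocks of a BHHH-style step (outer product of the scores in place of the Hessian) and a Marquardt-style step (Hessian plus a damping term, with the damping held fixed as a simplification), with step halving and the 10^-6 parameter-change criterion from the quote. All names and data are assumptions.

```python
import math

DATA = [0.5, 1.0, 1.5, 2.0, 3.0]  # made-up sample

def loglik(lam):
    return sum(math.log(lam) - lam * x for x in DATA)

def scores(lam):
    # Per-observation derivatives of the log-likelihood.
    return [1.0 / lam - x for x in DATA]

def bhhh_direction(lam):
    s = scores(lam)
    return sum(s) / sum(v * v for v in s)

def marquardt_direction(lam, damping=1.0):
    g = sum(scores(lam))
    neg_hessian = len(DATA) / (lam * lam)
    return g / (neg_hessian + damping)

def line_search(lam, direction):
    # Halve the step until the log-likelihood improves.
    t = 1.0
    base = loglik(lam)
    while t > 1e-12:
        cand = lam + t * direction
        if cand > 0 and loglik(cand) > base:
            return cand
        t /= 2.0
    return lam

def fit(start, tol=1e-6, max_iter=200, block=5):
    lam = start
    for it in range(max_iter):
        # Switch algorithm every `block` iterations.
        use_bhhh = (it // block) % 2 == 0
        if use_bhhh:
            direction = bhhh_direction(lam)
        else:
            direction = marquardt_direction(lam)
        new = line_search(lam, direction)
        if abs(new - lam) < tol:
            return new
        lam = new
    return lam

estimate = fit(1.0)
print(estimate, 1.0 / (sum(DATA) / len(DATA)))  # estimate vs closed-form MLE
```

For a real model the two direction functions would be replaced by calls into whichever packages provide the algorithms, passing each block's result in as the next block's starting values.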

Is there any Python equivalent of R's biglm?

I have used biglm in R and found it very useful. Now I need the same type of functionality in python. Any ideas? I have seen that patsy/statsmodels has an incremental mode, but have not been able to find any samples to copy/adapt. Any pointers would be much appreciated.
From a related answer by Nathaniel Smith on the statsmodels mailing list:
My incremental LS code might be useful here, it's basically the same
The new X'X is the sum of the old X'Xs, then you have to re-do the
scaling and inversion to get the new vcov matrix for the estimates.
Should be doable so long as you know how many data points are in each
and the various sums-of-squares. (The code I linked has some extra
complexity because of handling a particular sort of heteroskedasticity
via FGLS, but it can pretty much be ignored.)
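The accumulation described in the quote can be sketched in plain Python. This is not the code the quote links to and not biglm or statsmodels; it is a minimal illustration, with hypothetical class and function names, that keeps running sums of X'X, X'y, y'y and the row count, and solves the normal equations on demand. It omits the vcov step and the FGLS handling.

```python
def solve(a, b):
    # Gauss-Jordan elimination with partial pivoting for a small system.
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

class IncrementalOLS:
    def __init__(self, k):
        self.k = k
        self.n = 0
        self.xtx = [[0.0] * k for _ in range(k)]
        self.xty = [0.0] * k
        self.yty = 0.0

    def update(self, rows, ys):
        # Add one chunk: the new X'X is the old X'X plus this chunk's X'X.
        for x, y in zip(rows, ys):
            for i in range(self.k):
                self.xty[i] += x[i] * y
                for j in range(self.k):
                    self.xtx[i][j] += x[i] * x[j]
            self.yty += y * y
            self.n += 1

    def coefficients(self):
        return solve(self.xtx, self.xty)

    def residual_variance(self):
        beta = self.coefficients()
        rss = self.yty - sum(b * v for b, v in zip(beta, self.xty))
        return rss / (self.n - self.k)

# Two chunks of made-up data lying exactly on y = 1 + 2*x.
model = IncrementalOLS(k=2)
model.update([[1.0, 0.0], [1.0, 1.0]], [1.0, 3.0])
model.update([[1.0, 2.0], [1.0, 3.0]], [5.0, 7.0])
beta = model.coefficients()
print(beta)
```

Only the k-by-k sums are held in memory, so the chunks themselves can be discarded after each update. Solving the normal equations directly is the simplest choice but can be numerically fragile for badly conditioned designs.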
statsmodels doesn't have anything in this area yet.
There is an incremental OLS function in statsmodels, however that was written as a helper function for cusum tests (in memory) and hasn't been used or checked for any other purpose:

R script - nls function

Can anyone give me a good explanation for what the parameter "algorithm" does in the nls function in R?
Also, how does the formula work? I know it uses a tilde, but I can't really find a down-to-earth explanation of it.
Also, how important are the start values? Do I need to try multiple start values, or can I still have a guarantee that nls will find the correct parameters regardless of the start values I use?
In brief:
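nls() is going to vary parameters to try to minimize the squared error between your model and your data. There are several good methods it can use to find the minimum, and the "algorithm" argument picks which one: the default is Gauss-Newton, with "plinear" (Golub-Pereyra, for partially linear models) and "port" (the nl2sol routine from the Port library) as alternatives. Reading the details about "method" in ?optim will provide some good general info and references.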
In general, for nonlinear models, your results can be sensitive to initial guess. You should try several different guesses to make sure that the outputs are close. If your results are very sensitive to your guess, you can try re-parameterizing, using a different algorithm, or rethinking your model.
As for the formula, I'd echo the previous answer. Work through the examples in the bottom of ?nls and then try to ask a more specific question.
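The advice about trying several starting guesses can be sketched in Python. This is not nls(); it is a hand-written Gauss-Newton iteration for a one-parameter model y = exp(b * x), run from a few starts on synthetic data. The model, data, and names are assumptions for the example.

```python
import math

XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [math.exp(0.5 * x) for x in XS]  # synthetic data, true b = 0.5

def gauss_newton(b, max_iter=50, tol=1e-10):
    # Returns the fitted b, or None if the iteration does not converge.
    try:
        for _ in range(max_iter):
            preds = [math.exp(b * x) for x in XS]
            resid = [y - p for y, p in zip(YS, preds)]
            jac = [x * p for x, p in zip(XS, preds)]  # d(pred)/db
            denom = sum(j * j for j in jac)
            if denom == 0.0:
                return None
            step = sum(r * j for r, j in zip(resid, jac)) / denom
            b += step
            if abs(step) < tol:
                return b
    except OverflowError:
        return None
    return None

fits = {start: gauss_newton(start) for start in (0.3, 0.5, 0.8)}
for start, est in fits.items():
    print(start, est)
```

If the estimates agree across starts, that is some reassurance. Starts far from the data-generating value may need more iterations or fail outright, which is the sensitivity described above.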
