R script - nls function

Can anyone give me a good explanation for what the parameter "algorithm" does in the nls function in R?
Also, how does the formula work? I know it uses a tilde, but I can't really find a down-to-earth explanation of it.
Also, how important are the start values? Do I need to try multiple start values, or is nls guaranteed to find the correct parameters regardless of the start values I use?

In brief:
nls() varies the parameters to minimize the squared error between your model and your data. There are several good methods it can use to find the minimum. Reading the details about "method" in ?optim will provide some good information and references.
In general, for nonlinear models, your results can be sensitive to the initial guess. You should try several different guesses to make sure that the outputs are close. If your results are very sensitive to your guess, you can try re-parameterizing, using a different algorithm, or rethinking your model.
As for the formula, I'd echo the previous answer. Work through the examples at the bottom of ?nls and then try to ask a more specific question.
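To make the formula, start values, and "algorithm" argument concrete, here is a minimal sketch (the data and parameter names are invented for illustration, not taken from the question). In the formula, the tilde separates the response from the model expression, and any name on the right-hand side that is not a column of the data is treated as a parameter to estimate; "algorithm" selects the optimizer ("default" Gauss-Newton, "plinear", or "port").

    # simulated data for an exponential decay model (made-up example)
    set.seed(1)
    x <- seq(0, 10, length.out = 50)
    y <- 5 * exp(-0.4 * x) + rnorm(50, sd = 0.2)
    dat <- data.frame(x, y)

    # "y ~ a * exp(-b * x)" reads: model y as a * exp(-b * x);
    # a and b are not columns of dat, so nls treats them as parameters
    fit1 <- nls(y ~ a * exp(-b * x), data = dat,
                start = list(a = 1, b = 1))

    # same model, different starting guess and the "port" algorithm;
    # comparing the two fits is a quick check for sensitivity to start values
    fit2 <- nls(y ~ a * exp(-b * x), data = dat,
                start = list(a = 10, b = 0.1), algorithm = "port")

    coef(fit1)
    coef(fit2)

If the two sets of coefficients agree closely, the fit is not overly sensitive to the starting guess for this model.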

Related

R: Evaluate Gradient Boosting Machines (GBM) for Regression

What are the best metrics to evaluate the fit of a GBM algorithm in R (metrics, graphs, ratios)? And how do I interpret them?
I think maybe you are overthinking this one! Take a step back and think about what matters... the error. You have forecasted values and you have observed values. The difference tells you most of what you need to know when comparing across models. Basic measures like MSE, MPE, etc. should do fine. If you are looking to refine within a given model, I would recommend taking a look at the gbm documentation. For example, you can pass your gbm model object to summary() to get the relative influence of each of your variables. Additionally, you can find a lot of information in the documentation, so if you haven't taken a look, I would recommend doing so! I have posted the link at the bottom.
-Carmine
gbm_documentation
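For illustration, a short sketch of the two checks mentioned above (overall prediction error and per-variable relative influence); the simulated data, the 70/30 split, and the tuning settings are arbitrary choices, not recommendations:

    library(gbm)

    # made-up regression data
    set.seed(42)
    n  <- 500
    df <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
    df$y <- 3 * df$x1 + sin(6 * df$x2) + rnorm(n, sd = 0.3)

    train <- df[1:350, ]
    test  <- df[351:500, ]

    fit <- gbm(y ~ ., data = train, distribution = "gaussian",
               n.trees = 1000, interaction.depth = 2, shrinkage = 0.01)

    # relative influence of each predictor
    summary(fit)

    # out-of-sample error (MSE) on the held-out rows
    pred <- predict(fit, newdata = test, n.trees = 1000)
    mean((test$y - pred)^2)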

R numerical method similar to Vpasolve in Matlab

I am trying to solve a numerical equation in R, but I want a method that performs similarly to vpasolve in MATLAB. I have a nonlinear equation (involving a lot of log functions) which, when solved in R with uniroot, gives me a completely different answer compared to what vpasolve gives in MATLAB.
First, a word of caution: it's often much more productive to learn that there's a better way to do something than the way you are used to doing it.
edit
I went back to MATLAB and realized that the "vpa" collection is using extended precision. Is that absolutely necessary for your purposes? If not, then my suggestions below may suffice.
If you do require extended precision, then perhaps the Rmpfr::unirootR function will suffice. I would like to point out that, since all these solvers are generating an approximate solution (as opposed to an analytic one), the use of extended-precision operations seems a bit pointless.
Next, you need to determine whether MATLAB::vpasolve or uniroot is getting you the correct answer. Or maybe you simply are converging to a root that's not the one you want, in which case you need to read up on setting limits on the starting conditions or the search region.
Finally, in addition to uniroot, I recommend you learn to use the R packages BBsolve, nleqslv, rootSolve, and ktsolve (disclaimer: I am the owner and maintainer of ktsolve). These packages are pretty flexible and may lead you to better solutions to your original problem.
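As a small sketch of the difference between the two styles of solver (the equation here is invented, not the asker's): uniroot needs an interval over which the function changes sign, while nleqslv starts from a single guess.

    # an invented log-heavy equation: log(x) + log(x + 1) = 1
    f <- function(x) log(x) + log(x + 1) - 1

    # uniroot: bracket the root with an interval where f changes sign
    uniroot(f, interval = c(0.1, 5), tol = 1e-10)$root

    # nleqslv: start from a single initial guess
    library(nleqslv)
    nleqslv(1, f)$x

If the two approaches disagree on a real problem, that usually points to multiple roots or a poorly chosen bracket/starting point rather than a precision issue.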

How to find several solutions of nonlinear equation using R e.g. nleqslv?

As far as I understand, R's nonlinear equation solver nleqslv(x, fn) finds only one solution of the nonlinear equation.
However (as Bhas commented), the searchZeros function (from the same package) can find multiple solutions, depending on the starting points.
Question: is there some function in R which can help in choosing the set of initial points for searchZeros, and which will help me to find all the solutions?
I am interested in the case of function with several variables.
I understand that the solution found pretty much depends on the initial approximation. So the brute-force way is to check some reasonable grid of initial approximations. However, might there be some more intelligent way to get all the solutions?
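As a sketch of the brute-force grid idea using nleqslv::searchZeros (the two-equation system below is invented and has more than one root): each row of the matrix of starting points is one initial guess, and searchZeros reports the distinct solutions it converged to.

    library(nleqslv)

    # invented system: a circle of radius 2 intersected with a parabola
    fn <- function(p) {
      x <- p[1]; y <- p[2]
      c(x^2 + y^2 - 4,
        x - y^2)
    }

    # grid of starting points; each row is one initial guess
    starts <- as.matrix(expand.grid(x = seq(-3, 3, by = 1),
                                    y = seq(-3, 3, by = 1)))

    sols <- searchZeros(starts, fn)
    sols$x   # the distinct solutions found from this grid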

How does R calculate the p-values in logistic regression

What type of p-values does R calculate in a binomial logistic regression, and where is this documented?
When I read the documentation for ?glm(), I find no reference to the calculation of the p-values.
The p-values are calculated by the function summary.glm. See ?summary.glm for a (very brief) bit about how those are calculated.
For more information, look at the source code by typing
summary.glm
at the R command prompt. There you will find the lines of code where an object pvalue is created. Follow the code back to see how the components of the p-value calculation are (conditionally) calculated.
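For illustration, a small simulated example showing where those values surface (the data are invented): the "Pr(>|z|)" column of the coefficient table returned by summary.glm holds the p-values, computed from a Wald z statistic (estimate divided by its standard error) for the binomial family.

    # made-up binary outcome data
    set.seed(7)
    x <- rnorm(100)
    y <- rbinom(100, 1, plogis(0.5 + 1.2 * x))

    fit <- glm(y ~ x, family = binomial)

    # the coefficient matrix; "Pr(>|z|)" is the p-value column
    summary(fit)$coefficients

    # the (unexported) function that actually prints them:
    # stats:::print.summary.glm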
The authors of R wrote the help system with several principles in mind: compactness (don't write more than is needed, it's not a textbook), accuracy, and a curious and well-educated audience. It really was written for other statisticians. The "curious" part of that opening sentence was included to raise the question why you did not also follow the various links in the ?glm page: to summary.glm, where you would have found one answer to your ambiguous question, or to anova.glm, where you would have found another possible answer. The help-authors do expect that you will follow those links, read the whole page, and execute the examples. You will notice that even after you get to summary.glm there is no mention of "binary logistic regression", since they pretty much assume that you are well-grounded in statistics and have a copy of McCullagh and Nelder handy, or if not, that you will go read the references.
The other principle: sometimes it is the code itself (given the open-source nature of R) that performs the documentation. Technically glm doesn't print anything and print.glm doesn't print p-values. It would be print.summary.glm or print.anova.glm that would be doing any printing. Part of learning R is learning that the results printed to the console will have gone through an eval-print loop and that output can be tailored with object-class-specific functions.
These assumptions are just part of what many people see as a "steep learning curve for R" (although I would have called it a shallow curve if plotted with time/effort on the x-axis).

How can I do blind fitting on a list of x, y value pairs if I don't know the form of f(x) = y?

If I have a function f(x) = y that I don't know the form of, and if I have a long list of x and y value pairs (potentially thousands of them), is there a program/package/library that will generate potential forms of f(x)?
Obviously there's a lot of ambiguity to the possible forms of any f(x), so something that produces many non-trivial unique answers (in reduced terms) would be ideal, but something that could produce at least one answer would also be good.
If x and y are derived from observational data (i.e. experimental results), are there programs that can create approximate forms of f(x)? On the other hand, if you know beforehand that there is a completely deterministic relationship between x and y (as in the input and output of a pseudo-random number generator), are there programs that can create exact forms of f(x)?
Soooo, I found the answer to my own question. Cornell has released a piece of software for doing exactly this kind of blind fitting called Eureqa. It has to be one of the most polished pieces of software that I've ever seen come out of an academic lab. It's seriously pretty nifty. Check it out:
It's even got turnkey integration with Amazon's ec2 clusters, so you can offload some of the heavy computational lifting from your local computer onto the cloud at the push of a button for a very reasonable fee.
I think that I'm going to have to learn more about GUI programming so that I can steal its interface.
(This is more of a numerical methods question.) If there is some kind of observable pattern (you can kinda see the function), then yes, there are several ways you can approximate the original function, but they'll be just that, approximations.
What you want to do is called interpolation. Two very simple (and not very good) methods are Newton's method and Lagrange's method of interpolation. They both work on the same principle but they are implemented differently (Lagrange's is iterative, Newton's is recursive, for one).
If there's not much going on between any two of your data points (ie, the actual function doesn't have any "bumps" whose "peaks" are not represented by one of your data points), then the spline method of interpolation is one of the best choices you can make. It's a bit harder to implement, but it produces nice results.
Edit: Sometimes, depending on your specific problem, these methods above might be overkill. Sometimes, you'll find that linear interpolation (where you just connect points with straight lines) is a perfectly good solution to your problem.
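For what it's worth, both linear and spline interpolation are available in base R; a tiny sketch with made-up sample points:

    # invented sample points
    x <- c(0, 1, 2, 3, 4, 5)
    y <- c(0.0, 0.8, 0.9, 0.1, -0.8, -1.0)

    lin <- approxfun(x, y)    # piecewise-linear interpolant
    spl <- splinefun(x, y)    # cubic spline interpolant

    xx <- seq(0, 5, by = 0.1)
    plot(x, y, pch = 19)
    lines(xx, lin(xx), lty = 2)   # straight lines between points
    lines(xx, spl(xx), col = 2)   # smooth curve through the same points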
It depends.
If you're using data acquired from the real world, then statistical regression techniques can provide you with some tools to evaluate the best fit; if you have several hypotheses for the form of the function, you can use statistical regression to discover the "best" fit, though you may need to be careful about over-fitting a curve -- sometimes the best fit (highest correlation) for a specific dataset completely fails to work for future observations.
If, on the other hand, the data were generated synthetically (say, you know they were generated by a polynomial), then you can use polynomial curve-fitting methods that will give you the exact answer you need.
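As a hedged illustration of the synthetic case (the quadratic below is made up): if you know the generating form, ordinary least squares recovers the coefficients exactly when there is no noise.

    x <- 1:10
    y <- 2 + 3 * x - 0.5 * x^2           # exact quadratic, no noise

    fit <- lm(y ~ x + I(x^2))
    coef(fit)                            # recovers 2, 3, -0.5 (up to rounding)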
Yes, there are such things.
If you plot the values and see that there's some functional relationship that makes sense, you can use least squares fitting to calculate the parameter values that minimize the error.
If you don't know what the function should look like, you can use simple spline or interpolation schemes.
You can also use software to guess what the function should be. Maybe something like Maxima can help.
Wolfram Alpha can help you guess:
http://blog.wolframalpha.com/2011/05/17/plotting-functions-and-graphs-in-wolframalpha/
Polynomial interpolation is the way to go if you have a totally random set.
http://en.wikipedia.org/wiki/Polynomial_interpolation
If your set is nearly linear, then regression will give you a good approximation.
Creating an exact form from the x's and y's is mostly impossible.
Notice that what you are trying to achieve is at the heart of many machine learning algorithms, and therefore you might find what you are looking for in some specialized libraries.
A list of x/y values N items long can always be generated by a polynomial of degree N-1 (assuming no x values are the same). See this article for more details:
http://en.wikipedia.org/wiki/Polynomial_interpolation
Some lists may also match other function types, such as exponential, sinusoidal, and many others. It is impossible to find the 'simplest' matching function, but the best you can do is go through a list of common ones (exponential, sinusoidal, etc.) and, if none of them match, fall back to polynomial interpolation.
I'm not aware of any software that can do this for you, though.
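To see the degree-(N-1) point in practice, a small R sketch with invented points (distinct x values): fitting a polynomial of degree one less than the number of points reproduces every point exactly.

    x <- c(1, 2, 4, 5, 7)
    y <- c(3, -1, 2, 0, 4)
    n <- length(x)

    # degree n-1 polynomial through n points
    fit <- lm(y ~ poly(x, degree = n - 1, raw = TRUE))
    max(abs(residuals(fit)))   # essentially zero: an exact interpolation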
