I have a multidimensional array of data (x1, x2, x3, ..., y). There is no information about the data's correlation, nature, or boundaries. I performed some analyses to find a linear dependence using regression, but nothing was found.
I would now like to try to find a non-linear dependence. I haven't found any information on how to perform such an analysis when I only have a sample of the data. Which methods and/or algorithms can I use to find dependencies in the data?
The general topic you are looking for has various names. Search for "nonlinear regression" and "data mining" and "machine learning". I second the recommendation for Hastie, Tibshirani & Friedman, "The Elements of Statistical Learning". Brian Ripley also has a good book on the topic; I don't remember the title. There are probably many more good books.
If you can give more details about the problem, maybe someone has more specific advice. Probably it's better to take it to the StackExchange statistics forum rather than StackOverflow.
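As a minimal, self-contained illustration of the idea (synthetic data, base R only; loess() here just stands in for whatever nonlinear method you end up choosing):

```r
set.seed(1)
x <- runif(200, -2, 2)
y <- sin(2 * x) + rnorm(200, sd = 0.2)  # hidden nonlinear signal

lin <- lm(y ~ x)       # the linear fit that "finds nothing"
smo <- loess(y ~ x)    # a flexible nonparametric smoother

# Compare residual sums of squares: the smoother captures the
# structure that the straight line misses.
c(linear = sum(resid(lin)^2), loess = sum(resid(smo)^2))
```

If the loess residual error is dramatically smaller than the linear one, that is evidence of a nonlinear dependence worth modelling explicitly.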
My master thesis is in health forecasting and I'm using R (fable, fabletools, fasster) to implement the methods.
For the theoretical part of the thesis, I need to know the heuristics and the theoretical basis of each function I use.
I have been using Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos, and I have already read the R documentation on these functions, but I still have some doubts.
I need information such as which theoretical method they follow (ARIMA, moving averages, ANN, etc.), the mathematical expressions they use, and how the best fit is chosen (for the automatic methods):
I use the following methods and have gathered some information about each one.
I'm new to this field and I need some help.
Is this correct? Can anyone add anything else about any of the functions?
ARIMA() - MSARIMA model (meaning an ARIMA model that is sensitive to seasonality and can take into account several external regressors);
SNAIVE()- Linear regression with seasonality;
NNETAR() - ANN model;
fasster()
ETS()
Thank you in advance!
The book you cite contains information on how SNAIVE, NNETAR, ETS, and ARIMA forecasts are calculated. It explains that for model classes such as ETS and ARIMA, the AICc is used to select a particular model. It gives equations for all these methods. Please read it.
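To make the AICc idea concrete, here is a sketch in base R on the built-in lh series (this illustrates the comparison only; fable's ARIMA() searches a much larger candidate set internally):

```r
# Illustrative only: fit two candidate ARIMA models and pick the one
# with the lower AICc, mimicking what automatic selection does.
fit1 <- arima(lh, order = c(1, 0, 0))   # AR(1)
fit2 <- arima(lh, order = c(2, 0, 0))   # AR(2)

# AICc = AIC + 2k(k+1)/(n - k - 1), where k counts estimated parameters
aicc <- function(fit) {
  k <- length(fit$coef) + 1   # +1 for the innovation variance
  n <- fit$nobs
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

c(AR1 = aicc(fit1), AR2 = aicc(fit2))   # choose the smaller value
```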
fasster() is a new method that is not fully documented yet. The readme file (https://github.com/tidyverts/fasster) provides some information, and there is a talk by the author (https://www.youtube.com/watch?v=6YlboftSalY) explaining the state space modelling framework behind it.
I have a question about the augment function from Silge and Robinson's Text Mining with R: A Tidy Approach textbook. Having run an LDA on a corpus, I am applying augment to assign topics to each word.
I get the results, but I am not sure what takes place "under the hood" of augment, i.e. how the topic for each word is determined using the Bayesian framework. Is it just based on the conditional probability formula, estimated after the LDA is fit using p(topic|word) = p(word|topic) * p(topic) / p(word)?
I would appreciate it if someone could provide statistical details on how augment does this. Could you also please provide references to papers where this is documented?
The tidytext package is open source and on GitHub so you can dig into the code for augment() for yourself. I'd suggest looking at
augment() for LDA from the topicmodels package
augment() for the structural topic model from the stm package
To learn more about these approaches, there is an excellent paper/vignette on the structural topic model, and I like the Wikipedia article for LDA.
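To make the conditional-probability idea from the question concrete, here is a toy base-R sketch. The matrices beta and gamma below are made up for illustration, not taken from a fitted model, and this is my reading of the approach rather than the literal tidytext code path:

```r
# Toy posterior topic assignment for one word in one document.
# beta[k, w] = p(word w | topic k), estimated by the LDA
# gamma[k]   = p(topic k | document), the document's topic proportions
beta  <- matrix(c(0.7, 0.1,    # topic 1 over words w1, w2
                  0.2, 0.8),   # topic 2 over words w1, w2
                nrow = 2, byrow = TRUE)
gamma <- c(0.4, 0.6)           # this document leans towards topic 2

# Posterior for word w2: p(topic k | word, doc) is proportional to
# gamma[k] * beta[k, w]; p(word) is just the normalising constant.
post <- gamma * beta[, 2]
post <- post / sum(post)

which.max(post)                # → 2, the topic assigned to w2 here
```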
Does anyone know which R package has an implementation of the Generalized Reduced Gradient (GRG2) algorithm? Thanks.
Since @BenBolker has done the initial footwork in finding what sort of functionality you were hoping to replicate, I'm posting a follow-up that might be useful. A recent exchange on R-help ended with a quote that was nominated for the R fortunes package, although it is not clear to me whether it was accepted:
"The idea that the Excel solver "has a good reputation for being fast
and accurate" does not withstand an examination of the Excel solver's
ability to solve the StRD nls test problems. ...
Excel solver does have the virtue that it will always produce an
answer, albeit one with zero accurate digits."
"I am unaware of R being applied to the StRD, but I did apply S+ to the
StRD and, with analytic derivatives, it performed flawlessly."
From: Bruce McCullough <bdmccullough#drexel.edu>
Date: February 20, 2013 7:58:24 AM PST
Here is a link to the self-cited work by McCullough documenting the failures of the Excel Solver (which we now know is powered by some version of the GRG2 algorithm):
www.pages.drexel.edu/~bdm25/chap8.pdf. The links to the NIST website for the test problems are here: http://www.itl.nist.gov/div898/strd/nls/nls_info.shtml and http://www.itl.nist.gov/div898/strd/nls/nls_main.shtml
The negative comment (brought to my attention by a downvote) from @jwg prompted me to redo the search suggested by Bolker. Still no hits for findFn("GRG2"). I can report several hits for "GRG", none of them apparently to a solver, and I was amused that one of them has the catchy expansion "General Random Guessing model". That seemed particularly amusing given that the thrust of my arguably non-answer was that choosing to use Excel's solver leaves one genuinely uncertain about the accuracy of the solution. I am unrepentant about posting an "answer" that does not deliver exactly what was requested, but instead warns users who might not be religiously committed to the Microsoft way in this statistical/mathematical arena. The lack of any effort on the part of the distributed R developers to provide a drop-in replacement for the Excel solver is something to ponder seriously.
Some relevant insights come from this post to R-help by a reputable statistical scientist:
The code in Excel is actually called GRG2 (the 2 does matter). Unlike
any of the methods for optim(), it can handle nonlinear inequality
constraints and does not need a feasible initial solution.
There's a blurb about it in the NEOS optimisation guide:
http://www-fp.mcs.anl.gov/otc/Guide/SoftwareGuide/Blurbs/grg2.html
Judging from this blurb, it will be similar to L-BFGS-B for problems
with no constraints or box constraints.
-thomas
Thomas Lumley Assoc. Professor, Biostatistics tlumley at
u.washington.edu University of Washington, Seattle
So under some conditions it may be suitable to use optim() like this in place of the Excel solver (here pars is a vector of starting values and OptPars the objective function to minimise):
optim(par = pars,
      fn = OptPars,
      ...,
      method = "L-BFGS-B")
Note that the NEOS optimisation guide is now here: http://neos-guide.org/content/optimization-guide and GRG2 is mentioned on this page: http://neos-guide.org/content/reduced-gradient-methods It lists BFGS, CONOPT and several others as related algorithms, describing them as 'projected augmented Lagrangian algorithms'. According to the Optimization CRAN Task View, these algorithms can be found in nloptr, alabama and Rsolnp.
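As a sketch of how one of those packages handles a nonlinear inequality constraint (something optim() cannot do directly), assuming Rsolnp is installed; the objective and constraint here are made up for illustration:

```r
library(Rsolnp)

# Minimise (x - 1)^2 + (y - 2)^2 subject to x^2 + y^2 <= 1
obj  <- function(p) (p[1] - 1)^2 + (p[2] - 2)^2
ineq <- function(p) p[1]^2 + p[2]^2        # nonlinear constraint function

res <- solnp(pars = c(0, 0), fun = obj,
             ineqfun = ineq, ineqLB = 0, ineqUB = 1)
res$pars   # lands on the unit circle in the direction of (1, 2)
```

The exact solution is (1, 2)/sqrt(5), i.e. roughly (0.447, 0.894), which solnp should recover to several digits.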
I've had good matches (to six sig figs) between the Excel solver and R using the optimx package, but YMMV.
Does there exist a simple, cheatsheet-like document which compiles the best practices for mathematical computing in R? Does anyone have a short list of their best-practices? E.g., it would include items like:
For large numerical vectors x, instead of computing x^2, one should compute x*x. This speeds up calculations.
To solve a system $Ax = b$, never compute $A^{-1}$ and left-multiply $b$. Cheaper and more accurate algorithms exist (e.g., Gaussian elimination, which is what solve(A, b) uses).
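Both tips are one-liners in base R (exact timings vary by machine; the point is the pattern, not the numbers):

```r
# Tip 1: multiply instead of exponentiating
x  <- runif(1e6)
y1 <- x^2
y2 <- x * x                      # same result, typically cheaper
stopifnot(all.equal(y1, y2))

# Tip 2: solve the system directly; never form the inverse
A <- matrix(c(4, 1, 1, 3), 2, 2)
b <- c(1, 2)
x_good <- solve(A, b)            # factorise A, then triangular solves
x_bad  <- solve(A) %*% b         # explicit inverse: slower, less accurate
stopifnot(all.equal(x_good, drop(x_bad)))
```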
I did find a nice numerical analysis cheatsheet here. But I'm looking for something quicker, dirtier, and more specific to R.
@Dirk Eddelbuettel has posted a bunch of stuff on "high performance computing with R". He's also a regular, so he will probably come along and grab some well-deserved reputation points. While you are waiting, you can read some of his stuff here:
http://dirk.eddelbuettel.com/papers/ismNov2009introHPCwithR.pdf
There is an archive of the r-devel mailing list where discussions about numerical analysis issues relating to R performance occur. I will often put its URL in the Google advanced search page domain slot when I want to see what might have been said in the past: https://stat.ethz.ch/pipermail/r-devel/
I'm using R language and the manuals on the R site are really informative. However, I'd like to see some more examples and implementations with R which can help me develop my knowledge faster. Any suggestions?
Just to add some more
Programming in R
INTRODUCTION TO STATISTICAL MODELLING IN R
Linear algebra in R
The R Inferno
R by example
The R Clinic
Survey analysis in R
R & Bioconductor Manual
Rtips
Resources to help you learn and use R
General R Links
I'll mention a few that I think are excellent resources but that I haven't seen mentioned on SO. They are all free and freely available on the Web (links supplied).
Data Analysis Examples
A collection of individual examples from the UCLA Statistics Dept. which you can browse by major category (e.g., "Count Models", "Multivariate Analysis", "Power Analysis") then download examples with complete R code under any of these rubrics (e.g., under "Count Models" are "Poisson Regression", "Negative Binomial Regression", and so on).
Verzani: simpleR: Using R for Introductory Statistics A little over 100 pages, and just outstanding. It's easy to follow but very dense. It is a few years old; still, I've only found one deprecated function in the text. This is a resource for a brand-new R user; it also happens to be an excellent statistics refresher. The text contains 20+ examples (with R code and explanation) directed at fundamental statistics (e.g., hypothesis testing, linear regression, simple simulation, and descriptive statistics).
Statistics with R (Vincent Zoonekynd) You can read it online or print it as a PDF. Printed, it's well over 1000 pages. The author obviously got a lot of the information by reading the source code for the various functions he discusses; much of the information here I haven't found in any other source. This resource contains large sections on Graphics, Basic Statistics, Regression, and Time Series, all with small examples (R code + explanation). The final three sections contain the most exemplary code: very thorough application sections on Finance (which seems to be the author's professional field), Genetics, and Image Analysis.
All the packages on CRAN are open source, so you can download all the source code from there. I recommend starting there by looking at the packages you use regularly to see how they're implemented.
Beyond that, Rosetta Code has many R examples. And you may want to follow R-Bloggers.
Book-like tutorials
Book-like tutorials are usually distributed as PDFs. Many of them are available on the R-project homepage here:
http://cran.r-project.org/other-docs.html#english
(This link includes many of the texts others have mentioned)
Article-like tutorials
These usually appear on blogs. The biggest list of R bloggers I know of is here:
http://www.r-bloggers.com/
And many of these bloggers' posts (many of which are tutorials) are listed here:
http://www.r-bloggers.com/archive/
(although inside each blog there are usually more tutorials).