I'm a newbie in R and I want to implement the random forest algorithm using the caret package.
Is there a useful step-by-step tutorial?
Most packages contain a manual, and many also include vignettes.
A quick look at the CRAN page for caret http://cran.r-project.org/web/packages/caret/index.html shows that this package is particularly well documented.
It contains four vignettes:
caret Manual – Data and Functions
caret Manual – Variable Selection
caret Manual – Model Building
caret Manual – Variable Importance
Start there.
A few more resources on the caret package have appeared since the question was originally asked. I found two tutorials by Max Kuhn, the maintainer of caret, particularly useful:
YouTube caret webinar
useR! 2013 tutorial
Two other excellent starting points are:
Max Kuhn and Kjell Johnson - Applied Predictive Modeling (2013) - http://appliedpredictivemodeling.com/
caret webpage - http://topepo.github.io/caret/index.html
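The resources above cover the full workflow in depth. As a quick orientation, here is a minimal sketch of a caret random forest fit. It assumes the randomForest package is installed (caret's `method = "rf"` relies on it) and uses the built-in `iris` data purely as an example. The split proportion, fold count, and `mtry` grid are arbitrary illustrative choices.

```r
library(caret)  # method = "rf" additionally needs the randomForest package

set.seed(42)
# Stratified 80/20 split on the outcome
in_train <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_df <- iris[in_train, ]
test_df  <- iris[-in_train, ]

# 5-fold cross-validation for choosing mtry
ctrl <- trainControl(method = "cv", number = 5)

rf_fit <- train(Species ~ ., data = train_df,
                method = "rf",
                trControl = ctrl,
                tuneGrid = data.frame(mtry = 1:4))  # iris has 4 predictors

print(rf_fit)  # resampling results for each mtry value

# Evaluate the selected model on the held-out data
pred <- predict(rf_fit, newdata = test_df)
confusionMatrix(pred, test_df$Species)
```

The same `train()` call works for most other models by changing `method`, which is the main reason to go through caret rather than calling randomForest directly.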
I'm trying to build a random forest using model-based regression trees from the partykit package. I have built a model-based tree using the mob() function with a user-defined fit() function that returns an object at each terminal node.
partykit provides cforest(), but it only uses ctree()-type trees. Is it possible to modify cforest(), or to write a new function, that builds random forests from model-based trees returning objects at the terminal nodes? I want to use those terminal-node objects for predictions. Any help is much appreciated. Thank you in advance.
Edit: The tree I have built is similar to the one here: https://stackoverflow.com/a/37059827/14168775
How do I build a random forest using a tree similar to the one in the above answer?
At the moment, there is no canned solution for general model-based forests using mob(), although most of the building blocks are available. However, we are currently reimplementing the backend of mob() so that we can more easily leverage the infrastructure underlying cforest(). Also, mob() is quite a bit slower than ctree(), which is somewhat inconvenient when learning forests.
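To illustrate what "building blocks" can mean here, the following is a minimal sketch of plain bagging (bootstrap aggregation) of model-based trees, not a full random forest: it does not sample split variables per node, and it is not the partykit authors' method. It uses partykit's `lmtree()` as a stand-in for a `mob()` call with a custom fit function; `bagged_lmtree()` and `predict_bagged()` are hypothetical helper names, and the simulated data are made up for the example.

```r
library(partykit)

# Hypothetical helper: fit one model-based tree per bootstrap sample.
# lmtree() stands in for mob() with a user-defined fit function.
bagged_lmtree <- function(formula, data, ntree = 25, ...) {
  lapply(seq_len(ntree), function(i) {
    idx <- sample.int(nrow(data), replace = TRUE)
    lmtree(formula, data = data[idx, , drop = FALSE], ...)
  })
}

# Hypothetical helper: average the terminal-node model predictions
predict_bagged <- function(trees, newdata) {
  preds <- lapply(trees, predict, newdata = newdata, type = "response")
  Reduce(`+`, preds) / length(trees)
}

# Simulated data: the slope of y on x differs between the two levels of z
set.seed(1)
n <- 300
d <- data.frame(x = runif(n),
                z = factor(sample(c("a", "b"), n, replace = TRUE)))
d$y <- ifelse(d$z == "a", 1 + 2 * d$x, 3 - d$x) + rnorm(n, sd = 0.2)

# y ~ x is the node model, z is the partitioning variable
trees <- bagged_lmtree(y ~ x | z, data = d, ntree = 10)
p <- predict_bagged(trees, d[1:5, ])
p
```

To reuse your own terminal-node objects, you would swap `lmtree()` for your `mob(..., fit = ...)` call and replace the averaging step with whatever aggregation makes sense for those objects.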
The best alternative, currently, is to use cforest() with a custom ytrafo. These can also accommodate model-based transformations, very much like the scores in mob(). In fact, in many situations ctree() and mob() yield very similar results when provided with the same score function as the transformation.
A worked example is available in this conference presentation:
Heidi Seibold, Achim Zeileis, Torsten Hothorn (2017).
"Individual Treatment Effect Prediction Using Model-Based Random Forests."
Presented at Workshop "Psychoco 2017 - International Workshop on Psychometric Computing",
WU Wirtschaftsuniversität Wien, Austria.
URL https://eeecon.uibk.ac.at/~zeileis/papers/Psychoco-2017.pdf
The special case of model-based random forests for individual treatment effect prediction has also been implemented in the dedicated package model4you, which uses the approach from the presentation above and is available from CRAN. See also:
Heidi Seibold, Achim Zeileis, Torsten Hothorn (2019).
"model4you: An R Package for Personalised Treatment Effect Estimation."
Journal of Open Research Software, 7(17), 1-6.
doi:10.5334/jors.219
Is there an R package I could use for Bayesian parameter estimation as an alternative to JAGS? I found an old question regarding JAGS/BUGS alternatives in R, but its most recent answer is already 9 years old, so maybe there are new, flexible Gibbs sampling packages available in R. I want to estimate parameters for novel hierarchical hidden Markov models with random effects, covariates, etc. I highly value the flexibility of JAGS and think it is simply great; however, I want to write R functions that facilitate model specification, and I am looking for a package I can use for the parameter estimation.
There are some alternatives:
Stan, via the rstan R package. Stan is well optimized, but it cannot sample discrete parameters directly, so models with latent discrete variables (such as binomial/Poisson mixture models, or the hidden states of an HMM) have to be rewritten with those variables marginalized out.
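nimble, which uses a BUGS-like model language, compiles models to C++, and lets you configure or write your own samplers from R.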
If you want highly optimized, C++-based sampling, you may want to look at the Rcpp-based solutions from Dirk Eddelbuettel.
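Since nimble is the option closest to JAGS in how models are written, here is a minimal sketch of fitting a toy model with it. It assumes nimble is installed along with a working C++ compiler (nimble compiles the model), and the normal model and simulated data are illustrative only, far simpler than a hierarchical HMM.

```r
library(nimble)

# BUGS-like model: normal likelihood with unknown mean and standard deviation
code <- nimbleCode({
  mu ~ dnorm(0, sd = 10)
  sigma ~ dunif(0, 10)
  for (i in 1:N) {
    y[i] ~ dnorm(mu, sd = sigma)
  }
})

set.seed(1)
y <- rnorm(50, mean = 2, sd = 1)  # simulated data

# Build, compile, and run MCMC in one call; with one chain this
# returns a matrix of posterior samples (one column per monitored node)
samples <- nimbleMCMC(code,
                      constants = list(N = length(y)),
                      data = list(y = y),
                      inits = list(mu = 0, sigma = 1),
                      niter = 3000, nburnin = 1000,
                      setSeed = 1)

colMeans(samples)  # posterior means of mu and sigma
```

Because the model code is an ordinary R object, it can be generated by your own R functions, which fits the goal of wrapping model specification.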
I am using the MXNet library in RStudio to train a neural network model.
When training the model using caret, I can tune (among others) the "momentum" parameter. Is this related to the Stochastic Gradient Descent optimizer?
I know that SGD is the default optimizer when training with "mx.model.FeedForward.create", but what happens when I use caret::train()?
Momentum is related to SGD and controls how readily your algorithm changes its direction of descent: each update keeps a fraction of the previous update, which smooths out noisy gradients. There are several formulations; read more about it here: https://towardsdatascience.com/stochastic-gradient-descent-with-momentum-a84097641a5d
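A minimal sketch of the classical ("heavy-ball") momentum update, in plain R and independent of MXNet. For simplicity it uses the exact gradient of the toy function f(w) = (w - 3)^2 rather than a stochastic mini-batch gradient; `momentum_descent()` is a hypothetical helper, and the learning rate and momentum values are arbitrary.

```r
# Gradient descent with classical momentum:
#   v <- mu * v - lr * grad(w);  w <- w + v
momentum_descent <- function(grad, w0, lr = 0.1, mu = 0.9, iters = 500) {
  w <- w0
  v <- 0
  for (t in seq_len(iters)) {
    v <- mu * v - lr * grad(w)  # decayed previous step plus new gradient step
    w <- w + v
  }
  w
}

grad_f <- function(w) 2 * (w - 3)  # gradient of (w - 3)^2, minimum at w = 3

w_momentum <- momentum_descent(grad_f, w0 = 0, mu = 0.9)
w_plain    <- momentum_descent(grad_f, w0 = 0, mu = 0)  # mu = 0 is plain gradient descent
c(w_momentum, w_plain)
```

Setting `mu = 0` recovers ordinary gradient descent; larger values let past steps carry more weight, which is exactly what the momentum tuning parameter controls.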
The caret package is meant to be general-purpose, so it works with MXNet. caret::train() accepts a method argument whose value must be one of the models in caret's model repository, which at the moment supports MXNet. See https://github.com/topepo/caret/issues/887 for an example with the Adam optimizer, or https://github.com/topepo/caret/blob/master/RegressionTests/Code/mxnet.R for regular SGD.
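To check which tuning parameters a caret model exposes, you can query caret's model repository with `modelLookup()`; this only reads caret's model metadata and does not require MXNet to be installed. A minimal sketch, assuming a caret version whose repository includes the `"mxnet"` and `"mxnetAdam"` models:

```r
library(caret)

# Tuning parameters for caret's SGD-based MXNet model; "momentum" is one of them
mx_params <- modelLookup("mxnet")
mx_params[, c("parameter", "label")]

# The Adam-based variant discussed in the GitHub issue above
modelLookup("mxnetAdam")[, c("parameter", "label")]
```

Seeing `momentum` listed for `"mxnet"` is consistent with caret passing it on to MXNet's SGD optimizer rather than to some separate mechanism.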
I have a question about the augment() function from Silge and Robinson's "Text Mining with R: A Tidy Approach" textbook. Having run an LDA on a corpus, I am applying augment() to assign a topic to each word.
I get results, but I am not sure what happens "under the hood" in augment(), i.e. how the topic for each word is determined within the Bayesian framework. Is it just based on the conditional probability formula, estimated after the LDA is fit using p(topic|word) = p(word|topic) * p(topic) / p(word)?
I would appreciate it if someone could provide statistical details on how augment() does this, along with references to papers where this is documented.
The tidytext package is open source and on GitHub, so you can dig into the code for augment() yourself. I'd suggest looking at:
augment() for LDA from the topicmodels package
augment() for the structural topic model from the stm package
To learn more about these approaches, there is an excellent paper/vignette on the structural topic model, and I like the Wikipedia article for LDA.
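To make the Bayes-rule calculation from the question concrete, here is a minimal sketch in base R. Note that in LDA the topic prior for a word occurrence is document-specific, so the posterior is p(topic k | word w, doc d) proportional to p(w | k) * p(k | d). This is an illustration of that formula only, not necessarily what augment() computes; check the tidytext source for that. The `beta` and `gamma` matrices below are made-up values (with a fitted model, the corresponding quantities are available via `tidy(x, matrix = "beta")` and `tidy(x, matrix = "gamma")`), and `topic_posterior()` is a hypothetical helper.

```r
# Hypothetical fitted values: 2 topics, 3 terms, 2 documents
beta <- matrix(c(0.5, 0.3, 0.2,    # p(term | topic1)
                 0.1, 0.2, 0.7),   # p(term | topic2)
               nrow = 2, byrow = TRUE,
               dimnames = list(c("topic1", "topic2"),
                               c("apple", "bank", "river")))
gamma <- matrix(c(0.8, 0.2,        # p(topic | doc1)
                  0.3, 0.7),       # p(topic | doc2)
                nrow = 2, byrow = TRUE,
                dimnames = list(c("doc1", "doc2"),
                                c("topic1", "topic2")))

# Posterior over topics for one word occurrence, by Bayes' rule
topic_posterior <- function(beta, gamma, doc, term) {
  unnorm <- beta[, term] * gamma[doc, ]  # p(term | topic) * p(topic | doc)
  unnorm / sum(unnorm)                   # divide by p(term | doc)
}

p1 <- topic_posterior(beta, gamma, "doc1", "river")  # topic1 ~0.53, topic2 ~0.47
p2 <- topic_posterior(beta, gamma, "doc2", "river")  # topic1 ~0.11, topic2 ~0.89
```

The same word, "river", is most probably assigned to different topics in the two documents because the document-level topic proportions differ.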
Is there an R package for a nested logit or probit model?
I've checked the bayesm and MNP packages, and they don't appear to have the capacity.
You could try the mlogit or VGAM packages.
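As a starting point with mlogit, here is a minimal sketch of a nested logit fit using the `Fishing` data shipped with the package. The grouping of fishing modes into "shore" and "offshore" nests is an arbitrary illustrative choice, and the sketch assumes a version of mlogit in which `mlogit.data()` is still available (newer releases recommend `dfidx()` for the reshaping step).

```r
library(mlogit)

data("Fishing", package = "mlogit")

# Reshape from wide (one row per angler) to long (one row per angler x mode);
# columns 2:9 are price.* and catch.* for each of the four modes
Fish <- mlogit.data(Fishing, varying = 2:9, shape = "wide", choice = "mode")

# Nested logit: the nests argument groups alternatives;
# un.nest.el = TRUE imposes a single common log-sum parameter
nl <- mlogit(mode ~ price + catch, data = Fish,
             nests = list(shore    = c("beach", "pier"),
                          offshore = c("boat", "charter")),
             un.nest.el = TRUE)

summary(nl)
```

Dropping `un.nest.el = TRUE` estimates a separate log-sum parameter for each nest.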
Web searches work well via r-seek.
I don't see that I have the capacity to edit Dirk's postings (not surprising given our relative merits in R programming), but the link is actually to rseek.org, not r-seek.org.
Other search facilities include Baron's site and, of course, Google with "r-project" in the search string.