Citing caret R package in APA style - r

I used the caret package to do a neural network analysis and need to cite the package in APA style. But the output of `citation("caret")` doesn't look like typical APA style. Can anyone convert it to APA 6th edition style? Thanks.
To cite package ‘caret’ in publications use:
Max Kuhn. Contributions from Jed Wing, Steve Weston, Andre Williams,
Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, Brenton
Kenkel, the R Core Team, Michael Benesty, Reynald Lescarbeau, Andrew
Ziem, Luca Scrucca, Yuan Tang and Can Candan. (2016). caret:
Classification and Regression Training. R package version 6.0-71.
https://CRAN.R-project.org/package=caret

Kuhn, M. (2008). Caret package. Journal of Statistical Software, 28(5)

Here is the citation format in APA:
Kuhn, M. (2008). Building Predictive Models in R Using the caret Package. Journal of Statistical Software, 28(5), 1-26. doi:10.18637/jss.v028.i05
The citation in BibTeX (for LaTeX):
@article{JSSv028i05,
author = {Max Kuhn},
title = {Building Predictive Models in R Using the caret Package},
journal = {Journal of Statistical Software, Articles},
volume = {28},
number = {5},
year = {2008},
keywords = {},
abstract = {The caret package, short for classification and regression training, contains numerous tools for developing predictive models using the rich set of models available in R. The package focuses on simplifying model training and tuning across a wide variety of modeling techniques. It also includes methods for pre-processing training data, calculating variable importance, and model visualizations. An example from computational chemistry is used to illustrate the functionality on a real data set and to benchmark the benefits of parallel processing with several types of models.},
issn = {1548-7660},
pages = {1--26},
doi = {10.18637/jss.v028.i05},
url = {https://www.jstatsoft.org/v028/i05}
}
Please refer to the following website for other formats:
https://www.jstatsoft.org/rt/captureCite/v028i05/0/ApaCitationPlugin
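If you want to generate the BibTeX entry directly from R, toBibtex() from the utils package works on the object returned by citation(), and you can paste the result into your .bib file:
## print the caret citation as a BibTeX entry
toBibtex(citation("caret"))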

Related

Is it possible to build a random forest with model-based trees, i.e., `mob()` in the partykit package

I'm trying to build a random forest using model-based regression trees from the partykit package. I have built a model-based tree using the mob() function with a user-defined fit() function that returns an object at the terminal node.
In partykit there is cforest(), which uses only ctree()-type trees. I want to know whether it is possible to modify cforest(), or to write a new function, that builds random forests from model-based trees and returns objects at the terminal nodes. I want to use the objects in the terminal nodes for predictions. Any help is much appreciated. Thank you in advance.
Edit: The tree I have built is similar to the one here -> https://stackoverflow.com/a/37059827/14168775
How do I build a random forest using a tree similar to the one in the above answer?
At the moment, there is no canned solution for general model-based forests using mob(), although most of the building blocks are available. However, we are currently reimplementing the backend of mob() so that we can leverage the infrastructure underlying cforest() more easily. Also, mob() is quite a bit slower than ctree(), which is somewhat inconvenient when learning forests.
The best alternative, currently, is to use cforest() with a custom ytrafo. These can also accommodate model-based transformations, very much like the scores in mob(). In fact, in many situations ctree() and mob() yield very similar results when provided with the same score function as the transformation.
A worked example is available in this conference presentation:
Heidi Seibold, Achim Zeileis, Torsten Hothorn (2017).
"Individual Treatment Effect Prediction Using Model-Based Random Forests."
Presented at Workshop "Psychoco 2017 - International Workshop on Psychometric Computing",
WU Wirtschaftsuniversität Wien, Austria.
URL https://eeecon.uibk.ac.at/~zeileis/papers/Psychoco-2017.pdf
The special case of model-based random forests for individual treatment effect prediction was also implemented in a dedicated package model4you that uses the approach from the presentation above and is available from CRAN. See also:
Heidi Seibold, Achim Zeileis, Torsten Hothorn (2019).
"model4you: An R Package for Personalised Treatment Effect Estimation."
Journal of Open Research Software, 7(17), 1-6.
doi:10.5334/jors.219
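A minimal sketch of that workflow, assuming the pmforest()/pmodel() interface described in the paper (mydata, y, and trt below are hypothetical placeholders for your own data frame, outcome, and treatment variable):
## sketch: personalised (model-based) forest with model4you
library(model4you)

## base model capturing the overall treatment effect (placeholder data/variables)
bmod <- lm(y ~ trt, data = mydata)

## forest that partitions the base model with respect to the remaining covariates
frst <- pmforest(bmod, data = mydata)

## per-observation coefficients, i.e. personalised treatment effects
pm <- pmodel(frst)
head(pm)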

Which decision tree algorithm is used in randomForest in R?

I would like to know which implementation of random forest the randomForest package in R uses to grow decision trees. Is it CART, ID3, C4.5, or something else?
According to `?randomForest`, the description states:
randomForest implements Breiman's random forest algorithm (based on Breiman and Cutler's original Fortran code) for classification and regression. It can also be used in unsupervised mode for assessing proximities among data points.
The reference given is Breiman, L. (2001). "Random Forests". Machine Learning, 45(1), 5–32. doi:10.1023/A:1010933404324.
According to Wikipedia (https://en.wikipedia.org/wiki/Random_forest):
The introduction of random forests proper was first made in a paper by Leo Breiman. This paper describes a method of building a forest of uncorrelated trees using a CART-like procedure. The reference is, again, Breiman, L. (2001). "Random Forests". Machine Learning, 45(1), 5–32. doi:10.1023/A:1010933404324.
Therefore I would say it is CART.
In R, the randomForest package uses CART-style trees. There is also another package, ranger, which implements the same kind of random forest with a faster runtime.
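For example, both of the following fit Breiman-style forests of CART-like trees on the built-in iris data; ranger is simply a faster re-implementation:
library(randomForest)
library(ranger)

data(iris)
set.seed(1)

rf_classic <- randomForest(Species ~ ., data = iris, ntree = 500)
rf_fast <- ranger(Species ~ ., data = iris, num.trees = 500)

rf_classic
rf_fast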

defining classes using random forest models in R

I am pretty new to machine learning, and I've stumbled upon an issue and can't seem to find a solution no matter how hard I google.
I have performed a multiclass classification procedure using a randomForest algorithm and found a model that offers adequate prediction of my test sample. I then used varImpPlot() to determine which predictors are most important to determining the class assignments.
My problem: I would like to know why those predictors are most important. Specifically, I would like to be able to report that cases that fall into Class X hold Characteristics A (e.g., are male), B (e.g., are older), and C (e.g., have high IQ), while cases that fall into Class Y hold Characteristics D (female), E (younger), and F (low IQ), and so on for the rest of my classes.
I know that standard binary logistic regression allows you to say that cases with high values on Characteristic A are more likely to fall into class X, for example. So, I was hoping for something conceptually similar, but from a random forest classification model on multiple classes.
Is this a thing that can be done using random forest models? If yes, is there a function in randomForest or in caret (or even elsewhere) that can help me get past the varImpPlot() and varImp() table?
Thanks!
There is a package named ExplainPrediction that offers explanations for random forest models. Here's the top of its DESCRIPTION file; the URL it lists has a link to an extensive citation list:
Package: ExplainPrediction
Title: Explanation of Predictions for Classification and Regression Models
Version: 1.3.0
Date: 2017-12-27
Author: Marko Robnik-Sikonja
Maintainer: Marko Robnik-Sikonja <marko.robnik@fri.uni-lj.si>
Description: Generates explanations for classification and regression models and visualizes them.
Explanations are generated for individual predictions as well as for models as a whole. Two explanation methods
are included, EXPLAIN and IME. The EXPLAIN method is fast but might miss explanations expressed redundantly
in the model. The IME method is slower as it samples from all feature subsets.
For the EXPLAIN method see Robnik-Sikonja and Kononenko (2008) <doi:10.1109/TKDE.2007.190734>,
and the IME method is described in Strumbelj and Kononenko (2010, JMLR, vol. 11:1-18).
All models in package 'CORElearn' are natively supported, for other prediction models a wrapper function is provided
and illustrated for models from packages 'randomForest', 'nnet', and 'e1071'.
License: GPL-3
URL: http://lkm.fri.uni-lj.si/rmarko/software/
Imports: CORElearn (>= 1.52.0), semiArtificial (>= 2.2.5)
Suggests: nnet,e1071,randomForest
Also:
Package: DALEX
Title: Descriptive mAchine Learning EXplanations
Version: 0.1.1
Authors@R: person("Przemyslaw", "Biecek", email = "przemyslaw.biecek@gmail.com", role = c("aut", "cre"))
Description: Machine Learning (ML) models are widely used and have various applications in classification
or regression. Models created with boosting, bagging, stacking or similar techniques are often
used due to their high performance, but such black-box models usually lack of interpretability.
'DALEX' package contains various explainers that help to understand the link between input variables and model output.
The single_variable() explainer extracts conditional response of a model as a function of a single selected variable.
It is a wrapper over packages 'pdp' and 'ALEPlot'.
The single_prediction() explainer attributes parts of the model prediction to particular variables used in the model.
It is a wrapper over the 'breakDown' package.
The variable_dropout() explainer assesses variable importance based on consecutive permutations.
All these explainers can be plotted with generic plot() function and compared across different models.
Depends: R (>= 3.0)
License: GPL
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1.9000
Imports: pdp, ggplot2, ALEPlot, breakDown
Suggests: gbm, randomForest, xgboost
URL: https://pbiecek.github.io/DALEX/
BugReports: https://github.com/pbiecek/DALEX/issues
NeedsCompilation: no
Packaged: 2018-02-28 01:44:36 UTC; pbiecek
Author: Przemyslaw Biecek [aut, cre]
Maintainer: Przemyslaw Biecek <przemyslaw.biecek@gmail.com>
Repository: CRAN
Date/Publication: 2018-02-28 16:36:14 UTC
Built: R 3.4.3; ; 2018-04-03 03:04:04 UTC; unix
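The packages above provide general, model-agnostic explanations. As a quick sketch of the underlying idea (class-wise partial dependence, which single_variable() also builds on), and not taken from the DESCRIPTIONs above, randomForest's own partialPlot() accepts a which.class argument; for example, on iris:
## partial dependence of the model on Petal.Width for the "setosa" class
library(randomForest)
data(iris)
set.seed(1)
rf <- randomForest(Species ~ ., data = iris)
partialPlot(rf, iris, Petal.Width, which.class = "setosa")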

Copy the required data in text file using R

Question: the input data is a text file. Copy only the 'Statistics' paragraphs into another text file, ignoring the 'Packages' paragraphs.
The expected output should contain only the statistics text, as shown below.
Input:
Statistics - R is statistical software which is used for data analysis. It includes a huge number of statistical procedures such as
t-test, chi-square tests, standard linear models, instrumental
variables estimation, local polynomial regressions, etc. It also
provides high-level graphics capabilities.
R provides a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests,
time-series analysis, classification, clustering, and others.
R is easily extensible through functions and extensions, and the R community is noted for its active contributions in terms of
packages.
Packages - The capabilities of R are extended through user-created
packages, which allow specialized statistical techniques, graphical
devices (ggplot2), import/export capabilities, reporting tools (knitr,
Sweave), etc.
These packages are developed primarily in R, and sometimes in Java, C
and Fortran. A core set of packages is included with the installation
of R, with more than 5,800 additional packages and 120,000 functions
Statistics - R is an object oriented programming language.
S-PLUS is a commercial version of the same S programming language that R is a free version
SAS is proprietary software that can be used with very large datasets such as census data.
Packages - Other R package resources include Crantastic, a community
site for rating and reviewing all CRAN packages, and R-Forge.
Version 0.16 – This is the last alpha version developed primarily by
Ihaka and Gentleman. Much of the basic functionality from the "White
Book" (see S history) was implemented. The mailing lists commenced on
April 1, 1997.
Output:
Statistics - R is statistical software which is used for data analysis. It includes a huge number of statistical procedures such as
t-test, chi-square tests, standard linear models, instrumental
variables estimation, local polynomial regressions, etc. It also
provides high-level graphics capabilities.
R provides a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests,
time-series analysis, classification, clustering, and others.
R is easily extensible through functions and extensions, and the R community is noted for its active contributions in terms of
packages.
Statistics - R is an object oriented programming language.
S-PLUS is a commercial version of the same S programming language that R is a free version
SAS is proprietary software that can be used with very large datasets such as census data.
R Code:
setwd("xxx")
text <- readLines("data.txt")
q3<-data.frame(text)
df<- q3[!(is.na(q3$text) | q3$text==""), ]
q4<-data.frame(df)
a<-Search(q4, "Statistics")
View(a)
Only the lines containing the word 'Statistics' are captured, but not the rest of each paragraph.
I need help building the R code.
You can use str_extract_all() from the stringr package:
library(stringr)
## read the file as a single string so matches can span lines
text <- paste(readLines("data.txt"), collapse = "\n")
left.border <- "Statistics"
right.border <- "Packages"
pattern <- paste0(left.border, "(.*?)", right.border)
str_extract_all(text, regex(pattern, dotall = TRUE))
[[1]]
[1] "Statistics - R is statistical software which is used for data analysis. It includes a huge number of statistical procedures such as t-test, chi-square tests, standard linear models, instrumental variables estimation, local polynomial regressions, etc. It also provides high-level graphics capabilities.\n\nR provides a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and others.\n\nR is easily extensible through functions and extensions, and the R community is noted for its active contributions in terms of packages.\n\nPackages"
[2] "Statistics - R is an object oriented programming language.\n\nS-PLUS is a commercial version of the same S programming language that R is a free version\n\nSAS is proprietary software that can be used with very large datasets such as census data.\n\nPackages"
Then you can replace right.border with an empty string to remove the trailing "Packages", as sketched below.
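For example, reusing text and pattern from above (assuming stringr >= 1.3.0 for str_remove()):
## drop the trailing "Packages" delimiter from each extracted chunk
matches <- str_extract_all(text, regex(pattern, dotall = TRUE))[[1]]
str_remove(matches, "Packages$")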

R programming, Random forest through caret

I'm a newbie in R and I want to implement the random forest algorithm using the caret package.
Is there any useful tutorial, step by step?
Most packages contain a manual, and many also include vignettes.
A quick look at the CRAN page for caret (http://cran.r-project.org/web/packages/caret/index.html) shows that this package is particularly well documented.
It contains 4 vignettes:
caret Manual – Data and Functions
caret Manual – Variable Selection
caret Manual – Model Building
caret Manual – Variable Importance
Start there.
A few more resources about the caret package have appeared since the question was originally asked. I found two tutorials by Max Kuhn, the maintainer of caret, particularly useful:
YouTube caret webinar and useR! 2013 tutorial
Another two excellent starting points are:
Max Kuhn and Kjell Johnson - Applied Predictive Modeling (2013) - http://appliedpredictivemodeling.com/
caret webpage - http://topepo.github.io/caret/index.html
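To complement those resources, here is a minimal sketch of fitting a random forest through caret's train() on the built-in iris data (adjust the resampling scheme and tuning grid to your own problem):
library(caret)

data(iris)
set.seed(1)

## 5-fold cross-validated random forest; method = "rf" uses the randomForest package
fit <- train(Species ~ ., data = iris,
             method = "rf",
             trControl = trainControl(method = "cv", number = 5))

fit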
