I'm trying to learn about MARS/earth models for classification and am using "classif.earth" in the mlr package in R. My issue is that the mlr documentation says that "classif.earth" performs flexible discriminant analysis using the earth algorithm.
However, when I look at the code:
(https://github.com/mlr-org/mlr/blob/master/R/RLearner_classif_earth.R)
I don't see a call to fda() in the mda package; rather, it directs earth to fit a GLM with a default logit link.
So tell me if I'm wrong, but it seems to me that "classif.earth" is not doing flexible discriminant analysis but rather fitting a logistic regression on the earth model.
The implementation uses MARS to perform the FDA, where the MARS model determines the different groups. You can find more information in this paper; I quote from the abstract:
Linear discriminant analysis is equivalent to multiresponse linear regression [...] to represent the groups.
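To see the two approaches side by side, here is a sketch on the built-in iris data (assuming the earth and mda packages are installed; the earth package is designed to plug into mda::fda as a regression method):

```r
library(earth)  # MARS
library(mda)    # flexible discriminant analysis

# What classif.earth effectively does: earth fits the MARS basis, then a
# binomial GLM (logit link) is fit on top of it
fit_glm <- earth(Species ~ ., data = iris, glm = list(family = binomial))

# Classical FDA in the Hastie/Tibshirani/Buja sense, using MARS (earth)
# as the multiresponse regression method
fit_fda <- fda(Species ~ ., data = iris, method = earth)
```

Comparing `predict()` output from the two fits on the same data should make the distinction concrete: the first is per-class logistic regressions on MARS basis functions, the second is FDA's optimal-scoring construction.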
Related
I am using the regsubsets method for linear regression and came across the step() method for selecting variables in logistic regression. I am not sure whether regsubsets or step() can be used for Poisson regression. It would be helpful if there is a method to find the best subsets for Poisson regression in R.
From here, it looks like the options are:
step() (base R: works on glm objects -> includes Poisson regression models)
bestglm package
glmulti package
Possibly others.
Be prepared for the GLM (Poisson etc.) case to be much slower than the analogous problem for Gaussian responses (OLS/lm).
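As a minimal sketch of the first option with simulated count data, step() works directly on a Poisson glm fit (base R only):

```r
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rpois(n, lambda = exp(0.5 + 0.8 * d$x1))  # only x1 truly matters

# Full Poisson model, then AIC-based backward elimination
full <- glm(y ~ x1 + x2 + x3, data = d, family = poisson)
step(full, direction = "backward")
```

Note that step() does greedy stepwise search, not exhaustive best-subsets; for true all-subsets search on a GLM you would need something like bestglm or glmulti.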
I was checking tidymodels for multivariate regression and saw this example here:
https://www.tidymodels.org/learn/models/pls/
This covers multivariate outcomes for the partial least squares (PLS) model.
Is there a page that states what models currently support multivariate regression?
I believe the current models that support multivariate (more than one outcome) regression are:
single-layer neural network: mlp()
multivariate adaptive regression splines: mars()
good old linear regression: linear_reg()
This list was made by looking for which models use the maybe_multivariate() internal helper, but we should document this better somehow.
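For the listed models, I believe multiple numeric outcomes can go on the left-hand side of the formula via cbind(); a sketch with mars() on mtcars (assumes the parsnip and earth packages are installed):

```r
library(parsnip)

# Hypothetical two-outcome fit: parsnip routes the cbind() left-hand side
# through its multivariate handling down to the earth engine
spec <- set_engine(mars(mode = "regression"), "earth")
fit  <- fit(spec, cbind(mpg, disp) ~ cyl + hp + wt, data = mtcars)

# Predictions come back with one column per outcome
predict(fit, new_data = mtcars[1:3, ])
```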
I had a statistics class where a logistic regression was set up on some ecotoxicological data (a dose-response curve), and the nlstools package was used for this purpose. So far, when I have come across logistic regression I have only used glm, and I am not sure what the difference between these two approaches is. When applying both approaches to the same data, there was a clear difference between the fitted curves.
I have already tried to look for answers, but what I've read so far is that the nlstools package is used for nonlinear regressions. From what I've read, logistic regression is not really a nonlinear regression but a generalized linear model. Why can you use the nlstools package then? And I am a little bit confused about how I should classify logistic regression. Thanks for your help!
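The two approaches fit genuinely different models, which is why the curves differ: glm() with family = binomial fits logistic regression for a binary/proportion response by maximum likelihood, while nls() (which nlstools builds on) fits a parametric logistic curve by nonlinear least squares. A sketch with hypothetical dose-response data (base R only):

```r
# Hypothetical data: proportion of 50 organisms responding at each dose
d <- data.frame(dose = c(0.1, 0.5, 1, 2, 5, 10),
                p    = c(0.02, 0.10, 0.30, 0.60, 0.84, 0.96),
                n    = 50)

# Logistic regression: a GLM with binomial family and logit link,
# linear in log(dose)
fit_glm <- glm(p ~ log(dose), weights = n, data = d, family = binomial)

# Logistic *curve*: a nonlinear model fit by least squares, using the
# self-starting logistic model SSlogis from base R
fit_nls <- nls(p ~ SSlogis(dose, Asym, xmid, scal), data = d)
```

The glm is "linear" on the logit scale (hence a generalized *linear* model), whereas the nls fit estimates the curve's asymptote, midpoint, and scale directly as nonlinear parameters, so the two fitted curves need not coincide.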
I have a collection of documents, that might have latent topics associated with them. It is likely that each document might relate to one or more topics. I have a master file of all possible "topics"/categories and descriptions to these topics. I am seeking to create a model that predicts the topics for each document.
I could potentially use Supervised text classification using RTextTools, but that would only help me categorize documents to belong to one category or another. I am seeking to find a solution that would not only help me determine the topic proportions to the document, but also give the term-topic/category distributions.
sLDA seems like a good fit, but it appears to predict only continuous outcomes rather than categorical ones.
LDA is a classification method, predicting classes. Other options include multinomial logistic regression. LDA can be harder to train than multinomial logistic regression, given the possibly small improvement in fit it provides.
Update: LDA is a classification method where, unlike logistic regression, which directly models Pr(Y = k | X = x) through the logit link, LDA uses Bayes' theorem for prediction. It is generally more popular than logistic regression (and its multi-class extension, multinomial logistic regression) for multi-class problems.
LDA assumes that the observations within each class are drawn from a Gaussian distribution with a common covariance matrix, and so can provide some improvement over logistic regression when this assumption approximately holds. In contrast, logistic regression can outperform LDA when these Gaussian assumptions do not hold. To sum up, while both are appropriate for developing linear classification models, linear discriminant analysis makes more assumptions about the underlying data than logistic regression does, which makes logistic regression the more flexible and robust method when those assumptions fail. So what I meant was: it is important to understand your data well and see which method might fit it better. There are good sources you can read on the comparison of classification methods:
http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf
I suggest An Introduction to Statistical Learning, the chapter on classification. Hope this helps.
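A minimal comparison of the two methods on the built-in iris data (restricted to two classes so plain logistic regression applies; assumes the MASS package is available):

```r
library(MASS)

# Two-class subset so binary logistic regression is applicable
d <- droplevels(subset(iris, Species != "setosa"))

# Linear discriminant analysis vs logistic regression on the same features
fit_lda <- lda(Species ~ Sepal.Length + Sepal.Width, data = d)
fit_glm <- glm(Species ~ Sepal.Length + Sepal.Width, data = d,
               family = binomial)

# In-sample class predictions from each, and how often they agree
pred_lda <- predict(fit_lda)$class
pred_glm <- ifelse(fitted(fit_glm) > 0.5,
                   levels(d$Species)[2], levels(d$Species)[1])
mean(pred_lda == pred_glm)
</code>
```

On well-separated Gaussian-ish data the two will agree on nearly every point; differences emerge when the common-covariance Gaussian assumption is violated.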
I can see how cv.glm works with a glm object, but what about fitted survival models?
I have a bunch of models (Weibull, Gompertz, lognormal, etc). I want to assess the prediction error using cross validation. Which package/function can do this in R?
SuperLearner can do V-fold cross-validation for a large library of underlying machine-learning algorithms, though I am not sure it includes survival models. Alternatively, take a look at the cvTools package, which is designed to help cross-validate any prediction algorithm you give it.
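If no package fits, a hand-rolled V-fold loop is not much work. A sketch with survreg() and the lung data from the survival package; I use held-out concordance (C-index) as the score, since "prediction error" for censored outcomes is itself a modeling choice:

```r
library(survival)

set.seed(42)
V <- 5
folds <- sample(rep(1:V, length.out = nrow(lung)))

# Fit a Weibull AFT model on each training split and score the
# held-out fold by concordance
cv_c <- sapply(1:V, function(v) {
  fit <- survreg(Surv(time, status) ~ age + sex,
                 data = lung[folds != v, ], dist = "weibull")
  concordance(fit, newdata = lung[folds == v, ])$concordance
})
mean(cv_c)
```

Swapping dist = "weibull" for "lognormal", etc. (or swapping in a Cox model) lets you compare the candidate models on the same folds.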