Fit a Logit/Probit regression model in R using Maximum Likelihood [closed] - r

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
A simple question here:
I'd like to know if there is any function in R that can fit a logit/probit regression model using maximum likelihood method?
Currently, I'm using OLS method given by function glm (I hope it does use OLS method)... I read somewhere that probit/logit model with OLS method may have incidental parameter problem. So I'd like to try MLE method.
Thank you for your help in advance!

@Maju116's comment is correct. glm() doesn't use ordinary least squares; it uses iteratively reweighted least squares (IRLS). As the linked Wikipedia article says:
IRLS is used to find the maximum likelihood estimates of a generalized linear model
The default link for the binomial family is logit, so either glm(...,family=binomial) or glm(...,family=binomial(link="logit")) will fit logistic (logit) regression. glm(...,family=binomial(link="probit")) will fit probit regression.
If you are currently using glm(...) without an explicit family argument, then you are assuming Gaussian errors, which does mean that you'll get the same answers as ordinary least squares (lm()) (which are the maximum likelihood estimates for a data set with Gaussian (normally) distributed errors). For clarity and efficiency, it's generally best to use lm() rather than glm() with the default family when you want to do OLS.
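A minimal sketch of the calls described above, on simulated data (the data frame d, the variable names, and the true coefficients are made up for illustration, not taken from the question):

```r
set.seed(1)                                   # simulated data, purely illustrative
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
d <- data.frame(x = x, y = y)

# Both fits are maximum likelihood estimates, computed by IRLS inside glm()
fit_logit  <- glm(y ~ x, family = binomial(link = "logit"),  data = d)
fit_probit <- glm(y ~ x, family = binomial(link = "probit"), data = d)

# glm() without a family argument assumes gaussian errors,
# so it should reproduce the lm() (OLS) coefficients
fit_glm_default <- glm(y ~ x, data = d)
fit_lm          <- lm(y ~ x, data = d)

coef(fit_logit)
coef(fit_probit)
all.equal(coef(fit_glm_default), coef(fit_lm))
```

The logit and probit coefficients are on different scales, so compare fitted probabilities (predict(..., type = "response")) rather than raw coefficients.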

Related

How can I use R to get confidence intervals in Azure ML? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 2 years ago.
I came across this question which asks if Azure ML can calculate confidence - or probabilities - for row data prediction. However, given that the answer to that question is No, and suggests to use R, I am trying to figure out how to use R to do exactly this for a regression model.
Does anyone have any suggestions for references on where to look for this?
My scenario is that I have used Azure ML to build a boosted decision tree regression model, which outputs a Scored Label column. But I don't know regression analysis well enough to write R code to use the outputted model to get confidence intervals.
I am looking for any references that can help me understand how to do this in R (in conjunction with Azure ML).
There isn't a straightforward way to compute the confidence interval from the results of the Boosted Decision Tree model in Azure ML.
Here are some alternate suggestions:
Rebuild the model using the gbm package (library(gbm)) http://artax.karlin.mff.cuni.cz/r-help/library/gbm/html/gbm.html or the glm() function from the stats package https://stat.ethz.ch/R-manual/R-devel/library/stats/html/glm.html
Then build the confidence interval using confint function: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/confint.html
For a linear model, the confidence interval computation is simpler: http://www.r-tutor.com/elementary-statistics/simple-linear-regression/confidence-interval-linear-regression
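For the linear-model case, a short sketch using R's built-in cars data set (chosen only for illustration; it has nothing to do with the Azure ML model in the question). It shows confint() for the coefficients and predict() for intervals around individual predictions:

```r
# Simple linear regression on the built-in 'cars' data (illustrative only)
fit <- lm(dist ~ speed, data = cars)

# 95% confidence intervals for the intercept and slope
coef_ci <- confint(fit, level = 0.95)

# Intervals for new rows: "confidence" is for the mean response,
# "prediction" is for a single new observation (always wider)
new_speeds <- data.frame(speed = c(10, 15, 20))
mean_ci <- predict(fit, newdata = new_speeds, interval = "confidence", level = 0.95)
pred_pi <- predict(fit, newdata = new_speeds, interval = "prediction", level = 0.95)

coef_ci
mean_ci
pred_pi
```

For a per-row interval on a scored value, the prediction interval is usually the quantity people actually want.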

R Linear model allow quadratic and first order but not higher [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
My current linear model is: fit<-lm(ES~Area+Anear+Dist+DistSC+Elevation)
I have been asked to further this by:
Fit a linear model for ES using the five explanatory variables and
include up to quadratic terms and first order interactions (i.e. allow
Area^2 and Area*Elevation, but don't allow Area^3 or
Area*Elevation*Dist).
From my research I can do +I(Area^2) and +(Area*Elevation) but this would make a huge list.
Assuming I am understanding the question correctly I would be adding 5 squared terms and 10 * terms giving 20 total. Or do I not need all of these?
Is that really the most efficient way of going about it?
EDIT:
Note that I am planning on carrying out a stepwise regression for the null model and the full model after. I am seemingly having trouble with this when using poly.
Look at ?formula to further your education:
fit<-lm( ES~ (Area+Anear+Dist+DistSC+Elevation)^2 )
Those will not be squared terms but rather part of what you were asked to provide: all the 2-way interactions (and main effects). Formula "mathematics" is different from the regular use of powers. To add the squared terms in a manner that allows proper statistical interpretation, use poly:
fit<-lm( ES~ (Area+Anear+Dist+DistSC+Elevation)^2 +
poly(Area,2) +poly(Anear,2)+ poly(Dist,2)+ poly(DistSC,2)+ poly(Elevation,2) )
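As an alternative sketch (not the answerer's method), the squared terms can be written with I() instead of poly(). poly(Area, 2) already contains a linear column, so combining it with the Area main effect from the ^2 expansion duplicates the linear term and lm() will report NA for the redundant coefficient. The data below are simulated and the names dat, fit_full and fit_null are made up for illustration:

```r
set.seed(42)                                  # simulated stand-in for the real data
n <- 200
dat <- data.frame(Area = runif(n), Anear = runif(n), Dist = runif(n),
                  DistSC = runif(n), Elevation = runif(n))
dat$ES <- 1 + 2 * dat$Area - dat$Area^2 +
  dat$Area * dat$Elevation + rnorm(n, sd = 0.1)

# (a + b + ...)^2 expands to main effects plus all two-way interactions;
# I(x^2) protects the arithmetic power so it is a genuine squared term
fit_full <- lm(ES ~ (Area + Anear + Dist + DistSC + Elevation)^2 +
                 I(Area^2) + I(Anear^2) + I(Dist^2) + I(DistSC^2) + I(Elevation^2),
               data = dat)

# intercept + 5 main effects + 5 squares + 10 interactions
n_coef <- length(coef(fit_full))

# Stepwise search between the null model and the full model
fit_null <- lm(ES ~ 1, data = dat)
fit_step <- step(fit_null,
                 scope = list(lower = ~ 1, upper = formula(fit_full)),
                 direction = "both", trace = 0)
formula(fit_step)
```

So the count in the question is right: 5 main effects, 5 squared terms and 10 interactions, but the formula stays short.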

Which model is suitable for predicting percentages? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
I came across this problem to predict loss on a loan-default, based on various input attributes. You not only have to predict loss/no-loss but also predict what percentage of loan will be lost (0-100%). I am wondering how does one go about modeling such a scenario:
Should loss/no-loss(0%) be modeled as a categorical classification (using SVM etc), since no-loss is quite common?
If you use Linear Regression, how do you keep the loss within the bounds of 0-100?
Thanks in advance!
Generalised linear model with a logit link and binomial family.
Here is a link to get you started.
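A rough sketch of that suggestion on simulated data. The answer names the binomial family; the code uses quasibinomial, which gives the same coefficient estimates but does not warn about non-integer responses. That swap, the predictor income, and all the numbers are assumptions made for illustration:

```r
set.seed(7)                                   # simulated loans, purely illustrative
n <- 300
income <- rnorm(n)                            # hypothetical standardized predictor
true_mean <- plogis(-1 + 0.8 * income)
# Fraction of the loan lost, strictly between 0 and 1
loss_frac <- rbeta(n, shape1 = true_mean * 5, shape2 = (1 - true_mean) * 5)
loans <- data.frame(income, loss_frac)

# Logit link keeps the fitted mean inside (0, 1)
fit_frac <- glm(loss_frac ~ income,
                family = quasibinomial(link = "logit"), data = loans)

# Predictions on the response scale, even for extreme inputs
pred <- predict(fit_frac, newdata = data.frame(income = c(-3, 0, 3)),
                type = "response")
100 * pred                                    # as percentages
```

If many loans have exactly zero loss, a two-stage approach (classify loss/no-loss, then model the fraction for the losses) is another option worth comparing.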

Poisson regression with both response and explanatory variables as counting [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I’ve got the following variables:
Response: number of quota units leased (in and out) by fishers.
Explanatory: number of quota units owned by fishers.
I fitted a GLM (Poisson), but I’m not totally sure if it’s right, considering that the explanatory variable is a count as well. I’ve found examples of Poisson regression with categorical and continuous explanatory variables only, but not with count variables.
So:
Am I right using Poisson with my data? If not so, what alternative do I have?
The residual variances of my model are not homogeneous. Does Poisson regression already allow for this, or should I pay attention to this issue and address it (using weights, for example)?
Any help would be much appreciated.
The problem seems like it could be well modeled with Poisson regression. The residual variance should NOT be "homogeneous": the Poisson model assumes that the variance equals the mean, so it grows with the fitted value. You have options if that assumption is violated. The quasi-Poisson and the negative binomial models can also be used; they relax the assumption by estimating a dispersion parameter.
If the number of quota units owned by fishers sets an upper bound on the number used then I would not think that should be used as an explanatory variable, but might better be entered as offset=log(quota_units). It will change the interpretation of the estimates, such that they are estimates of the log(usage_rate).
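A sketch of the offset idea on simulated data. The covariate vessel_size is hypothetical (the question does not mention one), and the data frame and coefficient values are made up:

```r
set.seed(3)                                   # simulated fishers, purely illustrative
n <- 200
quota_units <- rpois(n, lambda = 20) + 1      # owned units, kept > 0 so log() is finite
vessel_size <- runif(n, 5, 30)                # hypothetical covariate
leased <- rpois(n, lambda = quota_units * exp(-1.5 + 0.03 * vessel_size))
fish <- data.frame(leased, quota_units, vessel_size)

# offset(log(quota_units)) fixes its coefficient at 1, so the model
# describes the leasing rate per owned unit on the log scale
fit_pois <- glm(leased ~ vessel_size + offset(log(quota_units)),
                family = poisson, data = fish)

# Rough overdispersion check: values well above 1 suggest the
# Poisson variance assumption is too tight
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

# Quasi-Poisson: same coefficients, standard errors scaled by the dispersion
fit_quasi <- glm(leased ~ vessel_size + offset(log(quota_units)),
                 family = quasipoisson, data = fish)
dispersion
summary(fit_quasi)$dispersion
```

For a negative binomial fit, MASS::glm.nb() accepts the same formula, including the offset term.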

How to calculate Total least squares in R? (Orthogonal regression) [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
I didn't find a function to calculate the orthogonal regression (TLS - Total Least Squares).
Is there a package with this kind of function?
Update: I mean calculate the distance of each point symmetrically and not asymmetrically as lm() does.
You might want to consider the Deming() function in package MethComp [function info]. The package also contains a detailed derivation of the theory behind Deming regression.
The following searches of the R Archives also provide plenty of options:
Total Least Squares
Deming regression
Your multiple questions on CrossValidated, here and R-Help imply that you need to do a bit more work to describe exactly what you want to do, as the terms "Total least squares" and "orthogonal regression" carry some degree of ambiguity about the actual technique wanted.
Two answers:
gx.rma in the rgr package appears to do this.
Brian Ripley has given a succinct answer on this thread. Basically, you're looking for PCA, and he suggests princomp. I do, too.
I got the following solution from this url:
https://www.inkling.com/read/r-cookbook-paul-teetor-1st/chapter-13/recipe-13-5
r <- prcomp( ~ x + y )
slope <- r$rotation[2,1] / r$rotation[1,1]
intercept <- r$center[2] - slope*r$center[1]
Basically, you perform a PCA, which fits a line through x and y that minimizes the orthogonal residuals. Then you can retrieve the intercept and slope from the first component.
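To see the difference from lm(), here is the same recipe run on simulated data where both x and y carry measurement error (the data and the names slope_tls, slope_ols are made up for illustration). It assumes x and y are on comparable scales with similar error variances, which is what plain orthogonal regression needs:

```r
set.seed(11)                                  # simulated data, purely illustrative
n <- 200
t_true <- rnorm(n)                            # unobserved "true" x values
x <- t_true + rnorm(n, sd = 0.3)              # x observed with error
y <- 2 + 1.5 * t_true + rnorm(n, sd = 0.3)    # y observed with error

# First principal component = direction minimizing orthogonal distances
r <- prcomp(~ x + y)
slope_tls     <- r$rotation[2, 1] / r$rotation[1, 1]
intercept_tls <- r$center[2] - slope_tls * r$center[1]

# OLS for comparison: minimizes vertical distances only
slope_ols <- coef(lm(y ~ x))[["x"]]

c(tls = slope_tls, ols = slope_ols)
```

The OLS slope is pulled toward zero by the error in x, so the orthogonal slope comes out steeper in magnitude.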
For anyone coming across this question again, there is by now a dedicated package, 'onls', for this purpose. It is used much like nls() (which implements ordinary nonlinear least squares).
