Calculate probabilities from Probit Model - R Command?

I'm working with a probit model and would like to calculate the predicted probability from my model for each observation in my data frame (DF). I know I can calculate this by hand from the formula, but I am wondering if there is a quick way to output the probabilities and append them to my DF.
I am running the following model:
attach(non.part.2)
y <- cbind(E)
x1 <- cbind(tech.ems, med.com, tech.nonemerg)
probit <- glm(y ~ x1, family = binomial(link = "probit"))
summary(probit)
I am running several models, so it would be nice to have R output the probabilities and let me name them in my DF (non.part.2) - something like p_x1 - so that I can run summary stats on the various models later.
Any help is much appreciated!

The following should work.
non.part.2$p_x1 <- predict(probit, newdata = non.part.2, type = "response")
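If you are fitting several specifications, a small loop can append one named probability column per model. A minimal sketch, where the formulas and the list names (p_x1, p_x2) are placeholders for your own models:
# hypothetical list of model formulas; the names become column names
formulas <- list(
  p_x1 = E ~ tech.ems + med.com + tech.nonemerg,
  p_x2 = E ~ tech.ems + med.com
)
for (nm in names(formulas)) {
  fit <- glm(formulas[[nm]], data = non.part.2,
             family = binomial(link = "probit"))
  # append the fitted probabilities under the list name
  non.part.2[[nm]] <- predict(fit, newdata = non.part.2, type = "response")
}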

Related

Generalized Linear Model (GLM) in R

I have a response variable (A) which I transformed (logA) and a predictor (B) from data (X), which are both continuous. How do I check the linearity between the two variables using a Generalized Additive Model (GAM) in R? I use the following code:
model <- gamlss(logA ~ pb(B) , data = X, trace = F)
but I am not sure about it. Can I add "family = Poisson" to the code when logA is a continuous variable? Any thoughts on this?
Thanks in advance
If your dependent variable is a count variable, you can use family = PO() without log-transforming it first. With family = PO() a log link is already applied to the mean. See the help page for gamlss.family and also section 2.1 of the vignette on count regression.
So it will go like:
library(gamlss)
fit <- gamlss(gear ~ pb(mpg), data = mtcars, family = PO())
You can see that the predictions are log transformed and you need to take the exponential:
with(mtcars, plot(mpg, gear))
points(mtcars$mpg, exp(predict(fit, what = "mu")), col = "blue", pch = 20)
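Alternatively, predict.gamlss also accepts type = "response", which applies the inverse link for you (the red points here are illustrative; they should coincide with the blue ones above):
points(mtcars$mpg, predict(fit, what = "mu", type = "response"),
       col = "red", pch = 20)
# fitted(fit) returns the same fitted mu values on the response scale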

plm() versus lm() with multiple fixed effects

I am attempting to run a model with county, year, and state:year fixed effects. The lm() approach looks like this:
lm <- lm(data = mydata, formula = y ~ x + county + year + state:year)
where county, year, and state:year are all factors.
Because I have a large number of counties, running the model is very slow using lm(). More frustrating given the number of models I need to produce, lm() produces a much larger object than plm(). This plm() command yields the same coefficients and levels of significance for my main variables.
plm <- plm(data = mydata, formula = y ~ x + year + state:year, index = "county", model = "within")
However, these produce substantially different R-squared, Adj. R-squared, etc. I thought I could solve the R-squared problem by calculating the R-squared for plm by hand:
SST <- sum((mydata$y - mean(mydata$y))^2)  # total sum of squares
fit <- mydata$y - plm$residuals            # fitted values
SSR <- sum((fit - mean(mydata$y))^2)       # explained sum of squares
R2 <- SSR / SST
I tested the R-squared code with lm and got the same result reported by summary(lm). However, when I calculated R-squared for plm I got a different R-squared (and it was greater than 1).
At this point I checked the coefficients for my fixed effects in plm, and they were different from the coefficients in lm.
Can someone please 1) help me understand why I'm getting these differing results and 2) suggest the most efficient way to construct the models I need and obtain correct R-squareds? Thanks!
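For reference, a minimal simulated sketch of the equivalence I am relying on (all names illustrative; state:year omitted for brevity): the slope on x should match across the two fits even though the reported R-squareds differ, because plm's within model measures fit on the demeaned data.
library(plm)

set.seed(1)
toy <- data.frame(
  county = factor(rep(1:50, each = 10)),
  year   = factor(rep(1:10, times = 50)),
  x      = rnorm(500)
)
toy$y <- 2 * toy$x + as.numeric(toy$county) / 10 + rnorm(500)

fit_lm  <- lm(y ~ x + county + year, data = toy)
fit_plm <- plm(y ~ x + year, data = toy,
               index = c("county", "year"), model = "within")

coef(fit_lm)["x"]   # LSDV estimate of the slope on x
coef(fit_plm)["x"]  # within estimate: matches up to numerical error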

Can I do a multivariate regression with the segmented package in R?

I have FINALLY figured out how to use the segmented package for a univariate analysis, giving results comparable to what I was expecting. Ultimately, though, I have to do a GLM piecewise regression in a multivariate analysis. The model has some variables that need to be segmented and some that do not, as well as categorical variables. Is this possible with the segmented package?
If so, how?
Do I have to keep interactively developing models, adding one variable to the segmented fit at a time?
piecewise <- glm(y ~ x, family = quasipoisson(link = "log"), data = data)
piecewise_seg <- segmented(piecewise, seg.Z = ~x1, psi = 3)
piecewise_seg2 <- segmented(piecewise_seg, seg.Z = ~x2, psi = 400)
Or can I do this in one go? If so, how can I set the different psi parameters for each different variable?
Wait, I think I found it towards the end of the package documentation.
2 segmented variables: starting values requested via a named list
o1 <- update(o, seg.Z = ~x + z, psi = list(x = c(30, 60), z = .3))
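Adapted to the model above, that would be something like the following (breakpoint guesses carried over from the stepwise attempt; x1 and x2 are the question's variable names):
# one call with both segmented variables; psi is a named list giving
# starting breakpoint(s) for each variable in seg.Z
piecewise_seg <- segmented(piecewise,
                           seg.Z = ~x1 + x2,
                           psi = list(x1 = 3, x2 = 400))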

R Prediction on a Linear Regression Model

I'm sure this is something that can be done, just not sure how!
I have a dataset of around 500 rows (CSV) showing footballers' match stats (e.g. passes, shots on target, etc.). I have some of their salaries (around 10) and I'm trying to predict the rest using a linear regression equation.
In the model below, if y is salary, is there a way in R to essentially auto-populate what the rest of the salaries might be, based on the ten salaries I do have?
lm(y ~ x1 + x2 + x3)
Any help would be much appreciated.
This is what the predict function does.
Note that you don't need to call predict.lm explicitly. Because the result of a call to lm is an object with class "lm", R "knows" to use predict.lm when you call predict on it.
Eg:
lm1 <- lm(y ~ x1 + x2 + x3)
y.fitted <- predict(lm1)
You should also be able to test the predictive accuracy of your model using cross-validation with the function cv.lm in the DAAG package. This function splits your data into folds, fitting the model on training data and testing it on the held-out data.
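To fill in the missing salaries concretely, fit on the rows where the salary is known and predict on the rest. A minimal sketch, assuming a data frame players with a salary column that is NA where unknown (both names are illustrative):
known <- !is.na(players$salary)

# fit only on the ~10 rows that have a salary
fit <- lm(salary ~ x1 + x2 + x3, data = players[known, ])

# predict the remaining rows and write the values back
players$salary_pred <- NA
players$salary_pred[!known] <- predict(fit, newdata = players[!known, ])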

Logistic regression with robust clustered standard errors in R

A newbie question: does anyone know how to run a logistic regression with clustered standard errors in R? In Stata it's just logit Y X1 X2 X3, vce(cluster Z), but unfortunately I haven't figured out how to do the same analysis in R. Thanks in advance!
You might want to look at the rms (Regression Modeling Strategies) package. lrm fits a logistic regression model, and if fit is the name of your fitted object, you'd have something like this:
fit <- lrm(disease ~ age + study + rcs(bmi, 3), x = TRUE, y = TRUE, data = dataf)
fit
robcov(fit, cluster = dataf$id)
bootcov(fit, cluster = dataf$id)
You have to specify x = TRUE, y = TRUE in the model call so the design matrix and response are stored with the fit; rcs(bmi, 3) requests a restricted cubic spline for bmi with 3 knots.
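Note that robcov returns a new fit object with the covariance matrix replaced, so downstream output uses the clustered estimates. A small usage sketch with the same hypothetical fit:
fit.rob <- robcov(fit, cluster = dataf$id)  # cluster-robust covariance
fit.rob        # printed coefficients now show clustered standard errors
anova(fit.rob) # Wald tests also use the clustered covariance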
Another alternative would be to use the sandwich and lmtest packages as follows. Suppose that z is a column with the cluster indicators in your dataset dat. Then
# load libraries
library("sandwich")
library("lmtest")
# fit the logistic regression
fit = glm(y ~ x, data = dat, family = binomial)
# get results with clustered standard errors (of type HC0)
coeftest(fit, vcov. = vcovCL(fit, cluster = dat$z, type = "HC0"))
will do the job.
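If you also want confidence intervals under the same clustered covariance, lmtest provides coefci, which takes the same vcov. argument (same fit and data as above):
# 95% confidence intervals using the clustered covariance matrix
coefci(fit, vcov. = vcovCL(fit, cluster = dat$z, type = "HC0"))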
I have been banging my head against this problem for the past two days; I magically found what appears to be a new package which seems destined for great things. For example, I am also running some cluster-robust Tobit models in my analysis, and this package has that functionality built in as well. Not to mention the syntax is much cleaner than in all the other solutions I've seen (we're talking near-Stata levels of clean).
So for your toy example, I'd run:
library(Zelig)
logit <- zelig(Y ~ X1 + X2 + X3, data = data, model = "logit", robust = TRUE, cluster = "Z")
Et voilà!
There is a command glm.cluster in the R package miceadds which seems to give the same results for logistic regression as Stata does with the option vce(cluster). See the miceadds documentation for details.
In one of the examples on this page, the commands
mod2 <- miceadds::glm.cluster(data = dat, formula = highmath ~ hisei + female,
                              cluster = "idschool", family = "binomial")
summary(mod2)
give the same robust standard errors as the Stata command
logit highmath hisei female, vce(cluster idschool)
e.g. a standard error of 0.004038 for the variable hisei.
