Sorry to ask such a basic question, but I am stuck interpreting the glm output.
I fitted a binomial model and I want to use the output (the intercept and each of the estimated coefficients) to "manually" calculate the score predicted by the model.
In the case of a linear regression this would be something like
y = a + b1*x1 + ... + bn*xn
or
score = intercept + coef.1*variable.1 + ... + coef.n*variable.n
but I understand that logistic regression is different, and I could not find out how it works.
Could someone help me with this?
thanks in advance!!!
I found out that the score can be calculated manually by applying the formula as detailed in the output and then applying inv.logit() (from the boot package) to the resulting value.
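For anyone reading later, here is a minimal sketch of that manual calculation (the names y, x1, x2 and the data frame df are placeholders, not from the original model):

# Fit a binomial GLM with a logit link
model <- glm(y ~ x1 + x2, data = df, family = binomial(link = "logit"))

# Linear predictor: intercept + b1*x1 + b2*x2 (this is on the log-odds scale)
b <- coef(model)
eta <- b[1] + b[2]*df$x1 + b[3]*df$x2

# The inverse logit turns log-odds into a predicted probability
p_manual <- 1/(1 + exp(-eta))   # same as boot::inv.logit(eta) or plogis(eta)

# Check against R's own prediction
all.equal(unname(p_manual), unname(predict(model, type = "response")))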
I have a regression like this: y_i = β0 + β1*x_i1 + β2*x_i2 + u_i.
I want to perform a two-sided t-test of β1 = 3 at the 5% significance level.
So according to examples I've found, I think I should do something like this:
t.test(y~x1, data=data, alternative=c("two.sided"), mu=3, conf.level=0.95, var.eq=F, paired=F)
But I'm wondering, how does this take into account that the regression I'm running includes both x1 and x2? Surely the coefficient will be different when both are included, so the first line of code must be wrong. But adding the second predictor like this: t.test(y~x1+x2, data=data, alternative=c("two.sided"), mu=3, conf.level=0.95, var.eq=F, paired=F) doesn't work either.
Thanks
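One way to test H0: β1 = 3 is to skip t.test entirely and build the t statistic from the fitted regression, which automatically accounts for x2. A sketch, assuming the model is fit with lm (car::linearHypothesis(fit, "x1 = 3") would be an alternative):

# Fit the full model with both predictors
fit <- lm(y ~ x1 + x2, data = data)

# t statistic for H0: beta1 = 3, using the estimate and its standard error
est <- coef(summary(fit))["x1", "Estimate"]
se  <- coef(summary(fit))["x1", "Std. Error"]
tstat <- (est - 3) / se

# Two-sided p-value with the model's residual degrees of freedom
2 * pt(abs(tstat), df = df.residual(fit), lower.tail = FALSE)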
I am following a course on R. At the moment, we are working with logistic regression. The basic form we are taught is this one:
model <- glm(
formula = y ~ x1 + x2,
data = df,
family = quasibinomial(link = "logit"),
weights = weight
)
This makes perfect sense to me. However, we are then advised to use the following to get coefficients and heteroscedasticity-robust inference:
model_rob <- lmtest::coeftest(model, sandwich::vcovHC(model))
This confuses me a bit. Reading about vcovHC, the documentation says it produces a "heteroskedasticity-consistent estimation" of the covariance matrix. Why would you do this for logistic regression? I thought it did not assume homoscedasticity. Also, I am not sure what coeftest does.
Thank you!
You're right: homoscedasticity (residuals having the same variance at each level of the predictor) is not an assumption of logistic regression. However, the binary response in logistic regression is inherently heteroscedastic (its variance depends on its mean), which is what a "heteroscedasticity-consistent" estimator is meant to accommodate. As @MrFlick already pointed out, if you would like more information on that topic, Cross Validated is likely the place to ask. coeftest produces Wald test statistics for the estimated coefficients; these tests tell you whether a predictor (independent variable) appears to be associated with the dependent variable according to your data.
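To make the two pieces concrete, a minimal sketch using the model object from the question:

library(lmtest)
library(sandwich)

# Wald tests of the coefficients with the usual model-based standard errors
coeftest(model)

# Same point estimates, but sandwich (heteroscedasticity-consistent) standard
# errors, so the test statistics and p-values can differ
coeftest(model, vcov. = vcovHC(model))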
I am trying to use R to rerun someone else's project, so we need to use some macros in R.
Here comes a very basic question:
m1.nlme = lme(log.bp.dia ~ M25.9to9.ma5iqr + temp.c.9to9.ma4iqr + o3.ma5iqr +
                sea_spring + sea_summer + sea_fall + BMI + male + age_ini,
              data = barbara.1.clean, random = ~ 1 | study_id)
Since the model uses an AR(1) [first-order autoregressive] covariance structure in SAS for the within-person variance, I am not sure how to do this in R.
Also, where can I see the index of the different covariance structures available, like unstructured?
Thanks
I don't know what you mean by "index" for different models, but to specify an AR(1) covariance structure for the residuals, you can add corr=corAR1() to your lme call.
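For example (a sketch based on the model in the question, with the AR(1) correlation ordered within study_id):

library(nlme)

# Same model, now with AR(1) serial correlation for the within-person residuals
m1.ar1 <- lme(log.bp.dia ~ M25.9to9.ma5iqr + temp.c.9to9.ma4iqr + o3.ma5iqr +
                sea_spring + sea_summer + sea_fall + BMI + male + age_ini,
              data = barbara.1.clean,
              random = ~ 1 | study_id,
              correlation = corAR1(form = ~ 1 | study_id))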
The correlation at lag $1$ is, say, $r$, where $-1 < r < 1$ for a stationary $AR(1)$ model. The correlation at lag $k \geq 1$ is $r^k$. This gives you the autocovariance matrix by just multiplying by the variance of $X_t$.
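That structure is easy to see directly in R (r and the variance here are made-up values for illustration):

# Autocovariance matrix of a stationary AR(1): variance * r^|i - j|
r <- 0.6; s2 <- 2; n <- 5
s2 * r^abs(outer(1:n, 1:n, "-"))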
I would first like to say that I understand calculating an R^2 value for a non-linear regression isn't exactly correct or a valid thing to do.
However, I'm in the middle of transitioning most of our work from SigmaPlot to R, and for our non-linear (concentration-response) models, colleagues are used to seeing an R^2 value associated with the model as an estimate of goodness-of-fit.
SigmaPlot calculates the R^2 as 1 - (residual SS/total SS), but in R I can't seem to extract the total SS (the residual SS is reported in summary).
Any help in getting this to work would be greatly appreciated as I try to move us toward a better estimator of goodness-of-fit.
Cheers.
Instead of extracting the total SS, I've just calculated it:

test.mdl <- nls(ctrl.adj ~ a/(1 + (conc.calc/x0)^b),
                data = dataSet,
                start = list(a = 100, b = 10, x0 = 40), trace = TRUE)

# R^2 = 1 - residual SS / total SS
1 - (deviance(test.mdl)/sum((dataSet$ctrl.adj - mean(dataSet$ctrl.adj))^2))
I get the same R^2 as when using SigmaPlot, so all should be good.
The total variation in y is (n-1)*var(y), and the variation not explained by your model is sum(residuals(fit)^2), so do something like:

1 - (sum(residuals(fit)^2)/((n-1)*var(y)))
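The two answers agree, since deviance() on an nls fit returns the residual sum of squares and (n-1)*var(y) equals sum((y - mean(y))^2):

# Same R^2 via the (n-1)*var(y) route (using the objects from the answer above)
n <- nrow(dataSet)
1 - deviance(test.mdl) / ((n - 1) * var(dataSet$ctrl.adj))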
I need to fit Y_ij ~ NegBin(m_ij, α), i.e. a negative binomial distribution, to count data. However, the data I have observed are censored: I know the value of y_ij, but the true count could be larger than that value. Writing down the log-likelihood for this problem gives:
ll = \sum_{i=1}^{n} w_i \left[ c_i \log P(Y_{ij} = y_{ij} \mid X_{ij}) + (1 - c_i) \log\left(1 - \sum_{k=1}^{32} P(Y_{ij} = k \mid X_{ij})\right) \right]
where X_ij is the design matrix (with the covariates of interest), w_i is the weight for each observation, c_i indicates whether observation i is observed exactly, y_ij is the response variable, and P(Y_ij = y_ij | X_ij) is the negative binomial probability with mean m_ij = exp(X_ij β) and overdispersion parameter α.
Does someone know if there is built-in code in R (a package or function) that could be used to fit this?
Check this paper out: Regression Models for Count Data in R
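If no package fits exactly, the log-likelihood in the question is also straightforward to maximize directly. A minimal sketch with optim(), using the survivor function P(Y >= y) for the censored term and made-up column names (y = count, exact = the c_i indicator, w = weight, covariates x1 and x2 in a data frame dat):

# Weighted censored negative binomial: negative log-likelihood
negll <- function(par, y, exact, w, X) {
  beta  <- par[-length(par)]
  alpha <- exp(par[length(par)])           # keeps the overdispersion positive
  mu    <- exp(drop(X %*% beta))           # m_ij = exp(X_ij beta)
  size  <- 1/alpha
  ll_exact <- dnbinom(y, mu = mu, size = size, log = TRUE)      # log P(Y = y)
  ll_cens  <- pnbinom(y - 1, mu = mu, size = size,
                      lower.tail = FALSE, log.p = TRUE)         # log P(Y >= y)
  -sum(w * ifelse(exact == 1, ll_exact, ll_cens))
}

X <- model.matrix(~ x1 + x2, data = dat)
fit <- optim(c(rep(0, ncol(X)), 0), negll, y = dat$y, exact = dat$exact,
             w = dat$w, X = X, method = "BFGS", hessian = TRUE)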