When performing generalized linear regression in R, glm has a weights argument that lets me attach a weight to each observation. What does it actually do? I found a similar question on this site, but it doesn't answer my problem, so I'm stating my problem with an example.
Example data are as follows:
x1=rnorm(100)
x2=rnorm(100)
y=rbinom(100,1,0.5)
Data=data.frame(y,x1,x2)
w=rexp(100)
model= glm(y~x1+x2, data=Data, family=binomial,weights=w)
Do these weights resample y, so that the model then regresses the resampled y on its corresponding covariates?
In other words, can I restate the above procedure as:
newsample=sample(1:length(y),length(y),replace=TRUE,prob=w)
newdata=Data[newsample,]
and then regress y on its corresponding x1 and x2 as follows?
model= glm(y~x1+x2, data=newdata, family=binomial)
Thank you.
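For what it's worth, here is a small check you can run yourself (the variable names fit_w and fit_r are my own, not from the question): with *integer* case weights, glm() gives exactly the same coefficients as physically replicating each row, which is generally not what a random resample would give.

```r
# Hypothetical check: integer case weights vs. row replication.
set.seed(1)
x <- rnorm(50)
y <- rbinom(50, 1, plogis(x))
w <- sample(1:3, 50, replace = TRUE)   # integer weights for this comparison
d <- data.frame(y, x)

# Fit with the weights argument.
fit_w <- glm(y ~ x, data = d, family = binomial, weights = w)

# Fit on a dataset where row i is repeated w[i] times.
fit_r <- glm(y ~ x, data = d[rep(seq_len(50), w), ], family = binomial)

all.equal(coef(fit_w), coef(fit_r))  # TRUE: weights scale the log-likelihood
```

This suggests the weights enter the likelihood directly rather than driving a resampling step, which is why a single random resample like the one in the question would not reproduce the weighted fit.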
Related
I have a continuous response variable (A), which I transformed (logA), and a continuous predictor (B), both from data X. How do I check the linearity between the two variables using a Generalized Additive Model (GAM) in R? I use the following code:
model <- gamlss(logA ~ pb(B) , data = X, trace = F)
but I am not sure about it. Can I add family=Poisson to the code when logA is continuous? Any thoughts on this?
Thanks in advance.
If your dependent variable is a count variable, you can use family=PO() without the log transformation: with family=PO(), a log link is already applied to transform the variable. See the help page for gamlss.family and also section 2.1 of the vignette on count regression.
So it goes like this:
library(gamlss)
fit = gamlss(gear ~ pb(mpg), data = mtcars, family = PO())
You can see that the predictions are on the log (link) scale, so you need to take the exponential:
with(mtcars, plot(mpg, gear))
points(mtcars$mpg, exp(predict(fit, what = "mu")), col = "blue", pch = 20)
I am following a course on R. At the moment, we are working with logistic regression. The basic form we are taught is this one:
model <- glm(
formula = y ~ x1 + x2,
data = df,
family = quasibinomial(link = "logit"),
weights = weight
)
This makes perfect sense to me. However, we are then recommended to use the following to get coefficients and heteroscedasticity-robust inference:
model_rob <- lmtest::coeftest(model, sandwich::vcovHC(model))
This confuses me a bit. Reading about vcovHC, it states that it produces a "heteroskedasticity-consistent" estimate. Why would you do this when doing logistic regression? I thought it did not assume homoscedasticity. Also, I am not sure what coeftest does?
Thank you!
You're right: homoscedasticity (residuals having the same variance at each level of the predictor) is not an assumption of logistic regression. However, the binary (0/1) response in logistic regression is inherently heteroscedastic, which is why the covariance estimator should be consistent under heteroscedasticity; I take that to be what "heteroscedasticity-consistent" means here. As #MrFlick already pointed out, if you would like more information on that topic, Cross Validated is likely the place to go. coeftest produces Wald test statistics for the estimated coefficients. These tests tell you whether a predictor (independent variable) appears to be associated with the dependent variable according to your data.
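As a concrete sketch (the model and data below are stand-ins of my own choosing, not from the course), the two calls combine like this:

```r
# Hedged sketch: robust Wald tests for a quasibinomial glm.
library(sandwich)
library(lmtest)

# Stand-in model: transmission type as a function of fuel economy.
fit <- glm(am ~ mpg, data = mtcars, family = quasibinomial())

# vcovHC() supplies the heteroscedasticity-consistent covariance matrix;
# coeftest() recomputes the Wald z statistics and p values with it.
ct <- coeftest(fit, vcov. = vcovHC(fit, type = "HC0"))
ct  # one row per coefficient: estimate, robust SE, z value, p value
```

Compared with summary(fit), only the standard errors (and hence the test statistics and p values) change; the coefficient estimates themselves are untouched.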
I've run a simple model using orm (i.e. reg <- orm(formula = y ~ x)) and I'm having trouble understanding how to get predicted values for Y. I've never worked with models that use multiple intercepts. I want to know for each and every value of Y in my dataset what the predicted value from the model would be. I tried predict(reg, type="mean") and this produced values that are close to the predicted values from an OLS regression, but I'm not sure if this is what I want. I really just want something analogous to OLS where you can obtain the E(Y) given a set of predictors. If possible, please provide code I can run to do this with a very short explanation.
I'm sure this is something that can be done, just not sure how!
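On the assumption that orm here is rms::orm (the rms package must be installed), one route I believe works is Mean(), which builds a function converting the linear predictor into E(Y), i.e. the OLS-style prediction asked about; the data below are simulated for illustration only:

```r
# Hedged sketch: E(Y) from an ordinal regression model fit with rms::orm.
library(rms)
set.seed(2)
d <- data.frame(x = rnorm(100))
d$y <- round(d$x + rnorm(100), 1)   # continuous-ish response with many levels

reg <- orm(y ~ x, data = d)
M   <- Mean(reg)                    # function mapping linear predictor -> E(Y)
ey  <- M(predict(reg, type = "lp")) # predicted E(Y) for every row of d
```

If that is what predict(reg, type="mean") is computing internally, the two should agree, which would explain why your values were close to the OLS predictions.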
I have a dataset of around 500 rows (CSV) showing footballers' match stats (e.g. passes, shots on target, etc.). I have some of their salaries (around 10) and I'm trying to predict the remaining salaries using a linear regression equation.
In the below, if y is the salary, is there a way in R to essentially autopopulate the rest of the salaries based on the ten salaries I do have?
lm(y ~ x1 + x2 + x3)
Any help would be much appreciated.
This is what the predict function does.
Note that you don't need to call predict.lm explicitly. Because the result of a call to lm is an object with class "lm", R "knows" to use predict.lm when you call predict on it.
Eg:
lm1 <- lm(y ~ x1 + x2 +x3)
y.fitted <- predict(lm1)
You should also be able to test the predictive accuracy of your model using cross-validation with the function cv.lm in the DAAG package. This function repeatedly fits the model on training data and evaluates it on held-out test data.
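Applied to the salary question, the pattern looks like this (a sketch with made-up column names and simulated data; lm() drops the NA rows itself, and predict() with newdata fills them in):

```r
# Hedged sketch: fit on rows with known salaries, predict the unknown ones.
set.seed(3)
df <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))
df$salary <- 10 + 2 * df$x1 - df$x2 + rnorm(20)
df$salary[11:20] <- NA                       # pretend these are unknown

fit <- lm(salary ~ x1 + x2 + x3, data = df)  # NA rows are dropped automatically

# "Autopopulate" the missing salaries from the fitted model.
df$salary[is.na(df$salary)] <- predict(fit, newdata = df[is.na(df$salary), ])
```

One caveat worth flagging: with only ten observed salaries and three predictors, the fitted coefficients will be very noisy, so treat the filled-in values as rough estimates.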
In R, given a multinomial logit regression, I need to obtain the conditional probabilities of the response classes given some values of the predictors.
For example, using the function multinom from the package nnet, imagine to have computed fit <- multinom(response ~ predictor). From fit, how can I obtain the probability weights of the different response classes, given a certain value of the predictor?
I thought of using something like predict(fit,newdata,type=???), but I have no idea about how to continue.
I found a possible solution: predict(fit, newdata = predictor, "probs"). This way I was able to find the probability weights for all the values of the predictor: each row corresponds to one value.
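To make the self-answer concrete, here is a worked sketch on the built-in iris data (Species and Sepal.Length stand in for the question's response and predictor):

```r
# Hedged sketch: class probabilities from nnet::multinom for new predictor values.
library(nnet)

fit <- multinom(Species ~ Sepal.Length, data = iris, trace = FALSE)

# Two hypothetical predictor values to get probabilities for.
newdata <- data.frame(Sepal.Length = c(5.0, 6.5))
probs <- predict(fit, newdata = newdata, type = "probs")
probs  # one row per new observation, one column per class; each row sums to 1
```

Passing type = "probs" by name (rather than positionally, as in the answer above) makes the intent clearer and guards against argument-order mistakes.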