How to correct for heteroscedasticity in R

I am running a regression analysis of the housing market, where I test how the independent variables are affecting the dependent variable (Price).
I am running the following code for the OLS:
reg.model1 <- log(Price2) ~ Detached.house + Semi.detached.house + Attached.houses - Apartment +
Stock.apartment + Housing.cooperative - Sole.owner + Age +
BRA + Bedrooms + Balcony + Lotsize + Sentrum + Alna + Vestre.Aker + Nordstrand + Marka +
Ullern + Østensjø + Søndre.Nordstrand + Stovner + Nordre.Aker + Bjerke +
Grorud + Gamle.Oslo + St..Hanshaugen + Grünerløkka + Sagene - Frogner
reg1 <- lm(formula = reg.model1, data = Data)
The next step was to test for Heteroscedasticity by running Breusch-Pagan Test:
bptest(reg1)
I got the following results in the console:
As far as I understand, since the p-value is smaller than 0.05, the null hypothesis of homoscedasticity is rejected, which means heteroscedasticity is present.
What I struggle with is what to do next to correct for heteroscedasticity. I have read that there are several ways to correct for it, but everything I have tried has failed, most likely because I have done something wrong. If someone could guide me on how to correct this, I would really appreciate it!
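One common way to handle this is to keep the OLS coefficients and report heteroscedasticity-robust (sandwich) standard errors instead of the default ones. The sketch below is only an illustration on simulated data: the data frame `toy`, its columns, and the choice of the `HC3` estimator are assumptions, not part of the question. It relies on `bptest()` and `coeftest()` from the lmtest package and `vcovHC()` from the sandwich package.

```r
library(lmtest)    # bptest(), coeftest()
library(sandwich)  # vcovHC()

# Simulated (hypothetical) data whose error spread grows with x
set.seed(1)
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)
toy <- data.frame(x = x, y = y)

# Ordinary least squares fit
fit <- lm(y ~ x, data = toy)

# Breusch-Pagan test: a small p-value points to heteroscedasticity
bp <- bptest(fit)
print(bp)

# Same coefficients, but standard errors from a
# heteroscedasticity-consistent covariance matrix (HC3 is an assumed choice)
robust <- coeftest(fit, vcov. = vcovHC(fit, type = "HC3"))
print(robust)
```

Applied to the model in the question, the equivalent call would be `coeftest(reg1, vcov. = vcovHC(reg1, type = "HC3"))`. The point estimates stay the same; only the standard errors, t-values, and p-values change.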

Related

Problems with Fixed effects panel data

I am trying to run a regression with panel data from the Michigan Consumers Survey. This is my first time using panel data in R, so I am not very familiar with the "plm" package that is needed. I am setting up my panel data for fixed effects on individuals (CASEID) and time (YYYY):
Michigan_panel <- pdata.frame(Michigan_survey, index = c("CASEID", "YYYY"))
Then I am using the following regression:
mod_1 <- plm(data = Michigan_panel, ICS ~ ICE + PX1Q2 + RATEX + ZLB + INCOME + AGE + EDUC + MARRY + SEX + AGE_sq, model = "within")
However R is showing me the following error:
> mod_1 <- plm(data = Michigan_panel, ICS ~ ICE + PX1Q2 + RATEX + ZLB + INCOME + AGE + EDUC + MARRY + SEX + AGE_sq, model = "within")
Error in plm.fit(data, model, effect, random.method, random.models, random.dfcor, :
empty model
Does anyone know what I am doing wrong?
Could you give a link to this specific survey? I found various datasets with this name.
I suspect (only suspect) that your data is not panel data; please check the CASEID variable.
Changing the order of formula and data in plm won't solve your problem.
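The check suggested in the comment above can be sketched in base R. The data frame `toy` below is a hypothetical stand-in for `Michigan_survey`, constructed so that each respondent appears exactly once. If that is also true of the real data, the within transformation subtracts each value from its own individual mean and leaves nothing to estimate, which is one plausible route to an empty model.

```r
# Hypothetical toy data standing in for Michigan_survey:
# every CASEID appears in exactly one year (a repeated cross-section).
toy <- data.frame(
  CASEID = 1:6,
  YYYY   = rep(c(2019, 2020), each = 3),
  ICS    = c(90, 85, 100, 70, 95, 80)
)

# Number of rows observed for each respondent
obs_per_id <- table(toy$CASEID)

# Distribution of panel lengths: how many respondents have 1, 2, ... rows
panel_lengths <- table(obs_per_id)
print(panel_lengths)

# TRUE means no respondent is observed more than once,
# so there is no within-individual variation to exploit
single_obs_only <- all(obs_per_id == 1)
print(single_obs_only)
```

Running the same two `table()` calls on `Michigan_survey$CASEID` would show whether any respondent is actually followed over time.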
I think the error comes from how the model is written. Your code is this:
mod_1 <- plm(data = Michigan_panel, ICS ~ ICE + PX1Q2 + RATEX + ZLB + INCOME + AGE + EDUC + MARRY + SEX + AGE_sq, model = "within")
In my view, you have to specify the indexes in the call and follow the argument order of the plm package. I would write your call as follows:
mod_1 <- plm(ICS ~ ICE + PX1Q2 + RATEX + ZLB + INCOME + AGE + EDUC + MARRY + SEX + AGE_sq,
data = Michigan_panel,
index= c("CASEID", "YYYY"),
model = "within")
1. Different Approach
As far as I know, this model can also be coded in a more elegant form.
library(plm)
Michigan_panel <- pdata.frame(Michigan_survey, index = c("CASEID", "YYYY"))
attach(Michigan_panel)
y <- cbind(ICS)
X <- cbind(ICE,PX1Q2,RATEX,ZLB,INCOME,AGE,EDUC,MARRY,SEX,AGE_sq)
model1 <- plm(y~X+factor(CASEID)+factor(YYYY), data=Michigan_panel, model="within")
summary(model1)
detach()
Adding factor(CASEID) and factor(YYYY) will add dummy variables for individuals and years to your model.
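As an alternative to adding the dummies by hand, plm can apply individual and time fixed effects itself through its `effect` argument. The sketch below uses the `Grunfeld` data set that ships with plm rather than the survey data, so the variable names (`inv`, `value`, `capital`, `firm`, `year`) belong to that example and are not from the question.

```r
library(plm)

# Example panel shipped with plm: 10 firms observed over 20 years
data("Grunfeld", package = "plm")

# Two-way fixed effects: the within transformation removes both
# firm-specific and year-specific intercepts, no factor() dummies needed
fit_tw <- plm(inv ~ value + capital,
              data   = Grunfeld,
              index  = c("firm", "year"),
              model  = "within",
              effect = "twoways")

summary(fit_tw)
```

With the survey data, the analogous call would pass `index = c("CASEID", "YYYY")` and `effect = "twoways"`, assuming each CASEID is observed in more than one year.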

Create Well-Being Latent Variable in Lavaan from Depression/Anxiety Questionnaire

I'm building a structural equation model that incorporates 4 latent variables: physical lifestyle, social lifestyle, trauma score, and the DV (well-being).
We have a 7-question survey of well-being alone, but I think it would be more sound (less measurement error) to combine three surveys (well-being, depression, and anxiety) into a latent dependent variable. I received a warning that the covariance matrix was not positive definite when using only the scaled scores from the surveys, so I decided to use the individual survey questions instead. However, when I do this and look at the modification indices, the output suggests that the residuals are not currently correlated. I thought correlated residuals were the default for any latent variable, so I am wondering whether I am specifying the well-being latent variable correctly (that is, whether it is just a matter of adding all the questions that will ultimately make up this latent variable).
Below is the entire model. The latent variable "well-being" currently only has questions from the PHQ-9 (depression survey) and the General Anxiety Survey, but I will also be adding the well-being survey. I've added the output for the modification indices below that.
I've included some data here: https://drive.google.com/file/d/1AX50DFNik30Qsyiyp6XnPMETNfVXK83r/view?usp=sharing
Thanks much!
fit.latent_wb <- '
#factor loadings; measurement model portion
pl =~ exercisescore + mindfulnessscore + promistscore
sl =~ family_support + friendshipcount + friendshipnet +
sense_of_community + sesscore + ethnicity
trauma =~ neglectscore + abusescore + exposure + family_support + age
wb =~ phq9_1 + phq9_2 + phq9_3 + phq9_4 + phq9_5 + phq9_6 +
phq9_7 + phq9_8 + phq9_9 + gad7_1 + gad7_2 + gad7_3 + gad7_4 +
gad7_5+ gad7_6+ gad7_7
#regressions: structural model
wb ~ age + gender + ethnicity + sesscore + resiliencescore +
pl + emotionalsupportscore + trauma
resiliencescore ~ age + sesscore + emotionalsupportscore + sl
emotionalsupportscore ~ sl + gender
friendshipnet~~age
exercisescore~~sense_of_community
'
fit.latent_wb <- sem(fit.latent_wb, data = total, meanstructure = TRUE, std.lv = TRUE)
summary(fit.latent_wb, fit.measures = TRUE,standardized = TRUE, rsquare = TRUE, estimates = FALSE)
Output for Mod Indices:
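For context on reading that kind of output: by default lavaan fixes the residual covariances between indicators to zero, so modification indices for `~~` terms flag residual pairs that might be freed. The sketch below illustrates this on the `HolzingerSwineford1939` data bundled with lavaan, not on the survey data. The model and the choice to free `x7 ~~ x8` are purely illustrative assumptions.

```r
library(lavaan)

# Classic three-factor example model shipped with lavaan's documentation
hs_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit0 <- cfa(hs_model, data = HolzingerSwineford1939)

# Modification indices restricted to (residual) covariances, largest first;
# each row is a parameter currently fixed to zero
mi <- modindices(fit0, op = "~~", sort. = TRUE)
print(head(mi, 5))

# Free one residual covariance explicitly with the ~~ operator
hs_model_rc <- paste0(hs_model, "\n  x7 ~~ x8\n")
fit1 <- cfa(hs_model_rc, data = HolzingerSwineford1939)

# Compare the chi-square of the two nested models
print(fitMeasures(fit0, "chisq"))
print(fitMeasures(fit1, "chisq"))
```

Whether freeing a residual covariance is justified is a substantive question (for example, two items sharing wording); a large modification index alone does not settle it.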

Significant variables for Logistic regression in R

I am still new to R and still struggling. I am trying to do a logistic regression using categorical and continuous variables, and I am supposed to select the right variables for my model. There are 27 variables and 8,000 observations.
I have gone through a couple of articles online, including ones on stepwise regression by AIC, but they only confuse me more. I was also told to select my variables from the correlation matrix, but when I compute it I cannot find the correlations, especially for the categorical variables. I also tried fitting the full model, and I get some variables with a p-value less than 0.5. This is the code:
d4 <- d3[,c('SW','MOI','YOI','DOI_CMC','RMOB','RYOB','RDOB_CMC',
'RCA','Region','TPR','DPR','NV','HEL','Has_Radio','Has_TV',
'Religion','WI','MOFB','YOB','DOB_CMC','DOFB_CMC','AOR','MTFBI',
'DSOUOM_CMC','RW','RH','RBMI')]
cor(d4)
d5 <- cor(d4)
round(cor(d4),2)
When I select the significant variables and try to apply logistic regression, all the p-values are between 0.9 and 1. See code:
d3 <- lm(TPR ~ SW + MOI + RMOB + RYOB + RCA + Region + TPR + DPR +
NV + HEL + Has_Radio + Has_TV + Religion + WI + MOFB +
YOB + DOB_CMC + DOFB_CMC + AOR + MTFBI + DSOUOM_CMC +
RW + RH + RBMI,
data = d3, family = "binomial")
summary(d3)
I need help with this please!!
Here is the sample of d3
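One thing worth checking in the code above is the fitting function: `lm()` fits ordinary least squares and does not use a `family` argument, whereas logistic regression is normally fitted with `glm(..., family = binomial)`. The outcome should also appear only on the left-hand side of the formula. The sketch below shows the pattern on simulated data; the data frame `toy` and its columns `outcome`, `wealth`, and `region` are hypothetical and do not come from `d3`.

```r
# Simulated (hypothetical) data: a binary outcome, one continuous
# predictor and one categorical predictor stored as a factor
set.seed(42)
n <- 1000
toy <- data.frame(
  wealth = rnorm(n),
  region = factor(sample(c("north", "south", "east"), n, replace = TRUE))
)
toy$outcome <- rbinom(n, 1, plogis(-0.5 + 1.2 * toy$wealth))

# Logistic regression: glm() with a binomial family;
# the outcome is on the left-hand side only
fit <- glm(outcome ~ wealth + region, data = toy, family = binomial)

# Coefficients are on the log-odds scale; factors are expanded to dummies
summary(fit)
```

Storing the fitted model under a new name (here `fit`) also avoids overwriting the data frame, which the original code does by assigning the model to `d3`.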

Linear regression model up to nth power of number

I know that when using the lm() or glm() function to fit a regression model in R, it's possible to write interactions up to the n-th degree like this:
fit <- glm(formula=outVar ~ (inVar1 + inVar2 + inVar3)^n,
data=d)
But is it possible to do a similar thing with the powers of variables, so that I don't have to specify I(inVar1^2), I(inVar1^3) by hand, and to exclude interactions between different powers of the same variable?
EDIT
I'd like to do something like this:
formula=outVar ~ (poly(inVar1 + inVar2 + inVar3, 2))^2
So I'd get the formula
outVar ~ inVar1 + inVar2 + inVar3 + I(inVar1^2) + I(inVar2^2) + I(inVar3^2) + inVar1:inVar2 + inVar1:inVar3 + inVar2:inVar3 + I(inVar1^2):I(inVar2^2) + I(inVar1^2):I(inVar3^2) + I(inVar2^2):I(inVar3^2) + inVar1:I(inVar2^2) + inVar1:I(inVar3^2)...
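One possible way to get this structure is to wrap each variable in its own `poly()` term and then apply the `^2` interaction operator to the sum. Each `poly()` call counts as a single term in the formula, so R crosses different variables but never crosses powers of the same variable with each other. The sketch below uses simulated data with the variable names from the question; `raw = TRUE` is an assumed choice that gives plain powers, matching `I(x^2)`, instead of orthogonal polynomials.

```r
# Simulated (hypothetical) data using the names from the question
set.seed(1)
n <- 200
d <- data.frame(inVar1 = rnorm(n), inVar2 = rnorm(n), inVar3 = rnorm(n))
d$outVar <- 1 + d$inVar1 + 0.5 * d$inVar2^2 + d$inVar1 * d$inVar3 + rnorm(n)

# Degree-2 polynomial in each variable, plus all pairwise interactions
# between the polynomial terms of different variables
f <- outVar ~ (poly(inVar1, 2, raw = TRUE) +
               poly(inVar2, 2, raw = TRUE) +
               poly(inVar3, 2, raw = TRUE))^2

# Three main-effect terms and three pairwise interaction terms
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# Gaussian glm; each interaction term expands to 4 columns
# (x1:x2, x1^2:x2, x1:x2^2, x1^2:x2^2)
fit <- glm(f, data = d)
print(length(coef(fit)))
```

Changing the `2` inside `poly()` raises the highest power per variable, while the exponent after the parentheses controls the order of the interactions.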

fitting model in svyglm

I am fitting a model with svyglm and need the AIC to compare different models.
My problem is that if I use my model:
Modelo12=svyglm(formula = Asiste ~ sexo + E27 + JovenActivo + hijos +
jefe + LN_YSVL_sin_joven_prom + aniosed + climaeducativo +
icv2 + TV + Computadora + Telefono + Cable + Calefon +
DVD + Microhondas + Aire + Auto_o_moto + Secadora + Lavavajillas +
Refrigerador + Actividad_del_Jefe + Hacinamiento,
family = quasibinomial(link = "probit"),
data = Personas.con.muestra, design=diseno_personas_14_17)
When I call summary(Modelo12), the AIC does not appear.
On the other hand, I can't use stepwise; this is the error:
stepwise(Modelo12)
Direction: backward/forward forward/backward backward forward
Criterion: BIC
Error in extractAIC.svyglm(fit, scale, k = k, ...) :
svyglm not fitted by maximum likelihood
THANKS!!!
Natalia
AIC is a likelihood-based measure. svyglm does not use maximum likelihood fitting, as both the error message and ?svyglm tell you. The help page also advises using regTermTest, which provides Wald tests and the Rao-Scott adjusted likelihood ratio test.
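A minimal sketch of the regTermTest approach, using the `api` example data that ships with the survey package rather than the data from the question. The design, the model, and the tested terms are illustrative assumptions. The default logit link is used here; a probit link as in the question could be passed via `quasibinomial(link = "probit")`.

```r
library(survey)

# Example data shipped with the survey package
data(api)

# Two-stage cluster sample design (from the package's own examples)
dclus2 <- svydesign(id = ~dnum + snum, weights = ~pw, data = apiclus2)

# Design-based binary regression
fit <- svyglm(I(sch.wide == "Yes") ~ ell + meals + mobility,
              design = dclus2, family = quasibinomial())

# Wald test: can ell and meals be dropped jointly?
wald <- regTermTest(fit, ~ ell + meals)
print(wald)

# Rao-Scott adjusted likelihood ratio test for the same terms
lrt <- regTermTest(fit, ~ ell + meals, method = "LRT")
print(lrt)
```

Comparing nested models this way, one group of terms at a time, takes the place of the AIC-based stepwise search that fails on the svyglm fit.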
