I know that when using the lm() or glm() functions to fit a regression model in R, it's possible to write interactions up to the n-th degree like this:
fit <- glm(formula = outVar ~ (inVar1 + inVar2 + inVar3)^n,
           data = d)
But is it possible to do a similar thing with powers of the variables, so that I don't have to specify I(inVar1^2), I(inVar1^3), etc., and so that interactions between different powers of the same variable are excluded?
EDIT
I'd like to do something like this:
formula=outVar ~ (poly(inVar1 + inVar2 + inVar3, 2))^2
So I'd get the formula
outVar ~ inVar1 + inVar2 + inVar3 + I(inVar1^2) + I(inVar2^2) + I(inVar3^2) + inVar1:inVar2 + inVar1:inVar3 + inVar2:inVar3 + I(inVar1^2):I(inVar2^2) + I(inVar1^2):I(inVar3^2) + I(inVar2^2):I(inVar3^2) + inVar1:I(inVar2^2) + inVar1:I(inVar3^2)...
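One formula that might produce this expansion (a sketch, assuming degree 2 as in the example; not verified against real data) wraps each variable in its own raw polynomial and crosses the results, since ^2 forms products only between different terms and therefore never interacts powers of the same variable:

# Sketch: one raw polynomial per variable, crossed pairwise; ^2 never
# crosses a term with itself, so same-variable powers are not interacted
fit <- glm(outVar ~ (poly(inVar1, 2, raw = TRUE) +
                       poly(inVar2, 2, raw = TRUE) +
                       poly(inVar3, 2, raw = TRUE))^2,
           data = d)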
Related
I am running a regression analysis of the housing market, where I test how the independent variables affect the dependent variable (Price).
I am running the following code for the OLS:
reg.model1 <- log(Price2) ~ Detached.house + Semi.detached.house + Attached.houses - Apartment +
  Stock.apartment + Housing.cooperative - Sole.owner + Age +
  BRA + Bedrooms + Balcony + Lotsize + Sentrum + Alna + Vestre.Aker + Nordstrand + Marka +
  Ullern + Østensjø + Søndre.Nordstrand + Stovner + Nordre.Aker + Bjerke +
  Grorud + Gamle.Oslo + St..Hanshaugen + Grünerløkka + Sagene - Frogner
reg1 <- lm(formula = reg.model1, data = Data)
The next step was to test for heteroscedasticity by running the Breusch-Pagan test:
bptest(reg1)
I got the following results in the console:
As far as I understand, since the p-value is smaller than 0.05, this means heteroscedasticity is present.
So what I struggle with is what to do next to correct for heteroscedasticity. I have read that there are several ways to do this; however, everything I have tried has failed, most likely because I have done something wrong. If someone could guide me on how to correct this, I would really appreciate it!
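For reference, one commonly suggested remedy keeps the OLS coefficient estimates but replaces the usual standard errors with heteroscedasticity-consistent ones. A minimal sketch, assuming the lmtest and sandwich packages are available:

# HC1 (White) robust standard errors for the existing fit
library(lmtest)
library(sandwich)
coeftest(reg1, vcov = vcovHC(reg1, type = "HC1"))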
I am using the following model in R:
glmer(y ~ x + z + (1|id), weights = specification, family = binomial, data = data)
where:
y ~ Binomial(specification, p)
Logit(y) = intercept + a*x + b*z
a and b are the coefficients of the x and z variables, and the coefficient a itself depends on another variable I:
a = a0 + a1*I
Because the coefficient of one of the variables (here x) depends on another variable (here I), I need a hierarchical model to capture the heterogeneity in the mean effect of x.
I would appreciate it if anyone could help me with this problem. Sorry if the question does not look professional! This is one of my first attempts at asking one.
I'm not perfectly sure I understand the question, but: if Logit(y) = intercept + a*x + b*z and a = a0 + a1*I, then substituting gives
Logit(y) = intercept + (a0 + a1*I)*x + b*z = intercept + a0*x + a1*(x*I) + b*z
This looks like a straightforward interaction model:
glmer(y ~ 1 + x + x:I + z + (1|id), ...)
To make it more explicit, this could also be written as
glmer(y ~ 1 + x + I(x*I) + z + (1|id), ...)
(although the use of I as a predictor variable and in the I() function is a little bit confusing at first glance ...)
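For illustration, here is a self-contained sketch with simulated data (all names and values are hypothetical, not from the original question):

library(lme4)

set.seed(1)
n  <- 500
id <- factor(sample(1:50, n, replace = TRUE))
u  <- rnorm(50, sd = 0.7)                     # random intercept per id
x  <- rnorm(n); z <- rnorm(n); I <- rnorm(n)  # I is the moderator variable
eta <- -0.5 + 1.0 * x + 0.5 * x * I + 0.8 * z + u[id]
d   <- data.frame(y = rbinom(n, 1, plogis(eta)), x, z, I, id)

m <- glmer(y ~ 1 + x + x:I + z + (1 | id), family = binomial, data = d)
fixef(m)  # the x coefficient estimates a0; the x:I coefficient estimates a1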
I would appreciate any insights into staggered DiD (difference-in-differences) models.
I wanted to ask whether I am using the correct function to set up a DiD model (data structure provided below):
did=time*treated
didreg = lm(y ~ time + treated + did + x + factor(year) + factor(firm), data = sample)
The data looks like:
I'm not familiar with difference-in-differences modelling, but from skimming the Wiki it seems that what you want is a simple interaction. To fit that, you don't even need to calculate a new variable (did); you can specify it directly in the model. There are a couple of ways to write that with R's formula syntax:
# Simple main effects model, no interactions
main_mod <- lm(y ~ time + treated + x + factor(year) + factor(firm), data = sample)
# Model with the interaction effect explicitly specified
did_mod1 <- lm(y ~ time + treated + time:treated + x + factor(year) + factor(firm), data = sample)
# Model with shortened syntax for specifying interactions
did_mod2 <- lm(y ~ time * treated + x + factor(year) + factor(firm), data = sample)
did_mod1 and did_mod2 are identical; did_mod2 is just a more compact way of writing the same model. The * indicates that you want both the main effects and the interaction of the variables to its left and right. It's recommended to always include main effects when you fit interactions, so the second way of writing the model saves time and space.
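As a quick sanity check that the two specifications agree (a sketch, assuming both models above have been fit):

# Both parameterizations yield identical coefficients
all.equal(coef(did_mod1), coef(did_mod2))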
Can anyone help me implement a ROC curve for a Bayesian logistic regression? I've been trying DPpackage, but is it me or does it just not work?
The two models I want to compare using a ROC curve are shown below:
bayes_mod = MCMClogit(Default ~ ACTIVITY + CIF + MAN + STA + PIA + COL +
                        CurrLiq + DebtCov + GDPgr,
                      data = mydata, burnin = 500000, mcmc = 10000, tune = 0.6,
                      b0 = coef(mylogit.reduced), B0 = information2,
                      subset = c(-1772, -2064, -655))
bayes_mod1 = MCMClogit(Default ~ ACTIVITY + CIF + MAN + STA + PIA + COL +
                         CurrLiq + DebtCov + GDPgr,
                       data = mydata, burnin = 500000, mcmc = 10000, tune = 0.6,
                       subset = c(-1772, -2064, -655))
where Default ~ ACTIVITY + CIF + MAN + STA + PIA + COL + CurrLiq + DebtCov + GDPgr is the model formula; mydata is the data set; mylogit.reduced is a (non-Bayesian) logistic regression estimated beforehand; B0 is the prior precision matrix; and subset = c(-1772, -2064, -655) removes the listed observations.
I don't know this package, but it probably provides a predict() function (actually it does; I just can't tell whether it works for MCMClogit models, as I can't find the documentation for that function). You can then pass the predictions to a ROC function such as pROC::roc():
library(pROC)
predictions <- predict(bayes_mod, newdata = mytestdata)  # predict from the fitted model
roc(mytestdata$Default, predictions)
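If predict() turns out not to support MCMClogit fits, a fallback sketch is to average predicted probabilities over the posterior draws by hand (mytestdata is a hypothetical held-out data set, and the design-matrix columns are assumed to line up with the coefficient draws):

library(pROC)

draws <- as.matrix(bayes_mod)   # MCMClogit() returns posterior coefficient draws
X <- model.matrix(Default ~ ACTIVITY + CIF + MAN + STA + PIA + COL +
                    CurrLiq + DebtCov + GDPgr, data = mytestdata)

p_draws <- plogis(X %*% t(draws))  # n_obs x n_draws predicted probabilities
p_mean  <- rowMeans(p_draws)       # posterior mean probability per observation

roc(mytestdata$Default, p_mean)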
I am fitting a simple regression in R on gas usage per capita. The regression formula looks like:
gas_b <- lm(log(gasq_pop) ~ log(gasp) + log(pcincome) + log(pn) +
              log(pd) + log(ps) + log(years),
            data = gas)
summary(gas_b)
I want to impose the linear constraint that the coefficients of log(pn), log(pd), and log(ps) sum to one. Is there a simple way of implementing this (possibly within the lm function) in R without having to use the constrOptim() function?
Modify your regression as follows. The constraint implies the coefficient of log(ps) equals 1 minus the coefficients of log(pn) and log(pd); substituting this into the model and moving log(ps) to the left-hand side gives an unconstrained regression in differences:
gas_b <- lm(log(gasq_pop) - log(ps) ~ log(gasp) + log(pcincome) +
I(log(pn)-log(ps)) + I(log(pd)-log(ps)) + log(years), data=gas)
summary(gas_b)
If b=coef(gas_b), then the relevant coefficients are
log(pn): b[4]
log(pd): b[5]
log(ps): 1 - b[4] - b[5]
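A small usage sketch recovering the implied coefficient and checking the constraint:

b <- coef(gas_b)
b_ps <- 1 - b[4] - b[5]      # implied coefficient on log(ps)
unname(b[4] + b[5] + b_ps)   # equals 1 by construction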