Compare beta coefficients of the same regression - r

Is there a way to compare the (standardized) beta coefficients of one sample and regression without fitting two models and conducting an ANOVA? Is there a simpler method, e.g. a single function?
For example, if I have this model and want to compare the beta coefficients of SE_gesamt and CE_gesamt (only these two variables):
library(lm.beta)

# Full model
fit1 <- lm(Umint_gesamt ~ Alter + Geschlecht_Dummy + SE_gesamt + CE_gesamt +
             EmoP_gesamt + Emp_gesamt + IN_gesamt + DN_gesamt + SozID_gesamt,
           data = dataset)
summary(fit1)
lm.beta(fit1)  # standardized (beta) coefficients
All the best,
Karen
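One possible shortcut (a sketch, not a definitive answer): fit the model on standardized data and test equality of the two coefficients with a single Wald test via car::linearHypothesis(). The scale() step and the car package are my additions, not part of the original code, and this assumes all variables in dataset are numeric:
library(car)  # for linearHypothesis()

# Standardize the variables so the coefficients are betas; if Geschlecht_Dummy
# should stay 0/1, leave that column out of scale()
dataset_std <- as.data.frame(scale(dataset))

fit1_std <- lm(Umint_gesamt ~ Alter + Geschlecht_Dummy + SE_gesamt + CE_gesamt +
                 EmoP_gesamt + Emp_gesamt + IN_gesamt + DN_gesamt + SozID_gesamt,
               data = dataset_std)

# One function call: Wald test of H0: beta(SE_gesamt) = beta(CE_gesamt)
linearHypothesis(fit1_std, "SE_gesamt = CE_gesamt")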

Related

Hedonic price method multivariate regression in R | Interpretation of linear-linear, log-linear and log-log model

I am aware that there are similar questions on this site; however, none of them seem to answer my question sufficiently.
I am performing a multivariate regression in order to predict real estate prices using the hedonic price method.
EXCERPT OF DATA USED
The dependent variable is AV_TOTAL, which is the price of the apartment unit.
Distances from the closest park/highway are expressed in meters.
U_NUM_PARKS / U_FPLACE (presence of parking spaces and a fireplace) are taken into account as dummy variables.
1) Linear-Linear Model --> Results Model 1
lm(AV_TOTAL ~ LIVINGA_AREAM2 + NUM_FLOORS +
U_BASE_FLO + U_BDRMS + factor(U_NUM_PARK) + DIST_PARKS +
DIST_HIGHdiff + DIST_BIGDIG, data = data)
Residuals Model 1
2) Log-linear Model --> Results Model 2
lm(log(AV_TOTAL) ~ LIVINGA_AREAM2 + NUM_FLOORS +
U_BASE_FLO + U_BDRMS + factor(U_NUM_PARK) + DIST_PARKS + DIST_HIGHdiff + DIST_BIGDIG, data = data)
Residuals Model 2
3) Log-Log Model --> Results Model 3
lm(formula = log(AV_TOTAL) ~ log(LIVINGA_AREAM2) + NUM_FLOORS +
U_BASE_FLO + log(U_BDRMS) + factor(U_NUM_PARK) + log(DIST_PARKS) +
log(DIST_HIGHdiff) + log(DIST_BIGDIG), data = data)
Residuals Model 3
All of the models have a reasonably good R^2, while the residual plots show a more normal distribution for Models 2 and 3.
I can't figure out what the difference is between Models 2 and 3, especially when interpreting the variable DIST_PARKS (distance from parks), nor which of the two is the more appropriate model.
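For what it's worth, the usual textbook interpretation of the two functional forms can be illustrated with a small sketch (the coefficient values below are made up for illustration, not taken from this data):
# Model 2 (log-linear): log(AV_TOTAL) = ... + beta * DIST_PARKS
# A one-meter increase in DIST_PARKS changes the price by (exp(beta) - 1) * 100
# percent, which is approximately 100 * beta percent for small beta.
beta_loglin <- -0.0004                 # hypothetical coefficient
(exp(beta_loglin) - 1) * 100           # percent change in price per extra meter

# Model 3 (log-log): log(AV_TOTAL) = ... + beta * log(DIST_PARKS)
# Here beta is an elasticity: a 1% increase in DIST_PARKS changes the price
# by approximately beta percent.
beta_loglog <- -0.05                   # hypothetical elasticity
beta_loglog                            # percent change in price per 1% change in distance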

R: Stepwise regression with multiple orthogonal polynomials

I am currently looking for an "optimal" fit for some data. I would like to use AIC-based stepwise regression to find the "best" polynomial regression for my outcome (y) with three predictors (a, b, c) and a maximum degree of 3. I also have interactions.
If I use:
library(MASS)  # for stepAIC()
lm_poly <- lm(y ~ a + I(a^2) + I(a^3) + b + I(b^2) + I(b^3) + c + a:b, data = my_data)
stepAIC(lm_poly, direction = "both")
I will get collinearity due to the use of the I(i^j) terms. This shows up in the beta regression coefficients of the final fit: some terms are larger than 1 in absolute value.
Is there a possibility to do stepwise regression with orthogonal terms?
Using poly() would be nice, but I just don't understand how to do stepwise regression with poly().
lm_poly2 <- lm(y ~ poly(a, 3) + poly(b, 3) + c + a:b, data = my_data)
stepAIC(lm_poly2, direction = "both")
This will not include steps with a and a^2 (and b respectively), and thus will not find the results I am looking for.
(I know that I might still have collinearity due to the interaction a:b.)
I hope someone can understand my point.
Thank you in advance.
Jans
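One possible workaround (a sketch, not a definitive answer): build the orthogonal polynomial columns once, store them as separate variables, and let stepAIC() add or drop each degree individually. Note that a1:b1 below stands in for a:b, which is an assumption on my part, since the first orthogonal column is only a centered and rescaled version of the raw variable.
library(MASS)  # for stepAIC()

pa <- poly(my_data$a, 3)  # orthogonal polynomial basis for a
pb <- poly(my_data$b, 3)  # orthogonal polynomial basis for b

my_data$a1 <- pa[, 1]; my_data$a2 <- pa[, 2]; my_data$a3 <- pa[, 3]
my_data$b1 <- pb[, 1]; my_data$b2 <- pb[, 2]; my_data$b3 <- pb[, 3]

lm_poly3 <- lm(y ~ a1 + a2 + a3 + b1 + b2 + b3 + c + a1:b1, data = my_data)
stepAIC(lm_poly3, direction = "both")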

why all of the coefficients estimated by lasso are zero?

I'm new to R and want to apply the lasso to my data for feature selection, based on the coefficients estimated by the algorithm. My database is big and there are 40 predictors (continuous and categorical). When I apply lasso regression using the glmnet package, all of the coefficients estimated for the predictors are zero except the intercept. Why does this happen? Is the model overfitted? How can I fix it? The code I used for this section is:
# Transforming categorical variables into a design matrix
xfactors <- model.matrix(Bill_TotalCharge ~ addNA(P_AgeGroup) +
                           addNA(ADT_ConditionOnDischarge) + addNA(Provider_Profession) +
                           addNA(ADT_HospitalName) + addNA(ADT_Province) + addNA(ADT_City) +
                           addNA(DiagnosisValueGroup) + addNA(DiagnosisGroupLevel1) +
                           addNA(DiagnosisGroupLevel2) + addNA(Bill_Insurer) +
                           addNA(Bill_InsurerType1) + addNA(Bill_InsurerType2) +
                           addNA(Bill_InsurerBox) + addNA(ADT_AdmissionType) +
                           addNA(Bill_RecordType) + addNA(P_MaritalStatus) +
                           addNA(Gender) + addNA(MonthNumberOfYear) + addNA(CalenderYear),
                         na.action = na.exclude)[, -1]
# Creating a matrix combining the continuous and categorical variables
x <- as.matrix(data.frame(Bill_TotalBasicInsurance, Bill_TotalPatient,
                          Bill_TotalCost1, Bill_TotalCost2, Bill_TotalCost3,
                          Bill_TotalCost4, Bill_TotalCost5, Bill_TotalCost6,
                          Bill_TotalCost7, Bill_TotalCost8, Bill_TotalCost9,
                          Bill_TotalCost10, Bill_TotalCost11, Bill_TotalCost12,
                          P_Age, xfactors))
# Running the lasso
library(glmnet)
glmmod <- glmnet(x, y = Bill_TotalCharge, family = "gaussian", alpha = 1)
Then I want to use the cv.glmnet function to determine the minimum lambda by cross-validation, and surprisingly it returns a six-digit number as the minimum lambda (I thought lambda, like alpha, should be between zero and one). What is the problem and how can I fix it? The code I used for this is:
cv.glmmod <- cv.glmnet(x, y=Bill_TotalCharge, alpha=1)
best.lambda <- cv.glmmod$lambda.min
I appreciate any help greatly in advance.
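Two things may be worth checking (a sketch, assuming x and Bill_TotalCharge are built as above): the lambda path in glmnet is on the scale of the response, so a lambda.min far above one is not by itself a problem, and the coefficients are all zero by construction at the largest lambda of the path, so it matters to look at the coefficients at the CV-selected lambda instead:
library(glmnet)

cv.glmmod <- cv.glmnet(x, y = Bill_TotalCharge, alpha = 1)
plot(cv.glmmod)                    # cross-validation error over the lambda path
best.lambda <- cv.glmmod$lambda.min
coef(cv.glmmod, s = "lambda.min")  # coefficients at the CV-selected lambda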

specifying multiple random effects in R lmer (translating from HLM model)

I'm attempting to "translate" a model run in HLM7 software to R lmer syntax.
This is from the now-ubiquitous "Math achievement" dataset. The outcome is math achievement score, and in the dataset there are various student-level predictors (such as minority status, SES, and whether or not the student is female) and various school level predictors (such as Catholic vs. Public).
The only predictors in the model I want to fit are student-level predictors, which have all been group-mean centered to deal with dummy variables (aside: contrast codes are better). The students are nested in schools, so we should (I think) have random effects specified for all of the components of the model.
Here is the HLM model:
Level-1 Model
(note: all predictors at level one are group mean centered)
MATHACH_ij = β_0j + β_1j*(MINORITY_ij) + β_2j*(FEMALE_ij) + β_3j*(SES_ij) + r_ij
Level-2 Models
β_0j = γ_00 + u_0j
β_1j = γ_10 + u_1j
β_2j = γ_20 + u_2j
β_3j = γ_30 + u_3j
Mixed Model
MATHACH_ij = γ_00 + γ_10*MINORITY_ij + γ_20*FEMALE_ij + γ_30*SES_ij + u_0j + u_1j*MINORITY_ij + u_2j*FEMALE_ij + u_3j*SES_ij + r_ij
Translating it to lmer syntax, I try:
(note: _gmc means the variable has been group mean centered, the grouping factor is "school_id")
library(lme4)  # for lmer()
model1 <- lmer(mathach ~ minority_gmc + female_gmc + ses_gmc +
                 (minority_gmc | school_id) + (female_gmc | school_id) +
                 (ses_gmc | school_id),
               data = data, REML = FALSE)
When I run this model I get results that don't mesh with the HLM results. Am I specifying the random effects incorrectly?
Thanks!
When you specify your random-effects structure, you can include all of the random effects in a single set of parentheses. While this may not resolve every discrepancy in your results, I believe the appropriate random-effects syntax for your model is this:
lmer(mathach ~ minority_gmc + female_gmc + ses_gmc +
       (1 + minority_gmc + female_gmc + ses_gmc | school_id),
     data = data, REML = FALSE)
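To see what the two specifications actually estimate, it can help to compare their random-effects covariance structures (a sketch using the same variable names; fit1 is the original specification, fit2 the one above). Three separate (x | school_id) terms estimate independent covariance blocks, whereas the single combined term estimates one 4x4 covariance matrix in which the intercept and all three slopes may correlate, matching the usual HLM formulation:
library(lme4)

fit1 <- lmer(mathach ~ minority_gmc + female_gmc + ses_gmc +
               (minority_gmc | school_id) + (female_gmc | school_id) +
               (ses_gmc | school_id),
             data = data, REML = FALSE)
fit2 <- lmer(mathach ~ minority_gmc + female_gmc + ses_gmc +
               (1 + minority_gmc + female_gmc + ses_gmc | school_id),
             data = data, REML = FALSE)

VarCorr(fit1)  # three separate blocks, each with its own intercept variance
VarCorr(fit2)  # one 4x4 covariance matrix for intercept and all three slopes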

Constrained least squares

I am fitting a simple regression in R on gas usage per capita. The regression formula looks like:
gas_b <- lm(log(gasq_pop) ~ log(gasp) + log(pcincome) + log(pn) +
log(pd) + log(ps) + log(years),
data=gas)
summary(gas_b)
I want to include a linear constraint that the beta coefficients of log(pn), log(pd) and log(ps) sum to one. Is there a simple way of implementing this (possibly within the lm function) in R without having to use the constrOptim() function?
Modify your regression as follows:
gas_b <- lm(log(gasq_pop) - log(ps) ~ log(gasp) + log(pcincome) +
I(log(pn)-log(ps)) + I(log(pd)-log(ps)) + log(years), data=gas)
summary(gas_b)
If b=coef(gas_b), then the relevant coefficients are
log(pn): b[4]
log(pd): b[5]
log(ps): 1 - b[4] - b[5]
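The reason this works (spelling out the step the answer leaves implicit): writing the constrained coefficient on log(ps) as one minus the other two and substituting it into the original equation moves log(ps) to the left-hand side and turns the other two price terms into the I(log(.) - log(ps)) differences above. A quick sketch of recovering the three constrained coefficients from the fitted object (variable names assumed as above):
b <- coef(gas_b)
beta_pn <- b[4]                    # coefficient on I(log(pn) - log(ps))
beta_pd <- b[5]                    # coefficient on I(log(pd) - log(ps))
beta_ps <- 1 - beta_pn - beta_pd   # recovered from the sum-to-one constraint
c(pn = unname(beta_pn), pd = unname(beta_pd), ps = unname(beta_ps))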
