Constrained least squares - r

I am fitting a simple regression in R on gas usage per capita. The regression formula looks like this:
gas_b <- lm(log(gasq_pop) ~ log(gasp) + log(pcincome) + log(pn) +
log(pd) + log(ps) + log(years),
data=gas)
summary(gas_b)
I want to impose a linear constraint that the beta coefficients of log(pn), log(pd), and log(ps) sum to one. Is there a simple way of implementing this in R (possibly within the lm function) without having to use the constrOptim() function?

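Substitute the constraint directly into the model. Writing the log(ps) coefficient as 1 minus the coefficients of log(pn) and log(pd), and moving the resulting log(ps) term to the left-hand side, gives the following regression: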
gas_b <- lm(log(gasq_pop) - log(ps) ~ log(gasp) + log(pcincome) +
I(log(pn)-log(ps)) + I(log(pd)-log(ps)) + log(years), data=gas)
summary(gas_b)
If b=coef(gas_b), then the relevant coefficients are
log(pn): b[4]
log(pd): b[5]
log(ps): 1 - b[4] - b[5]
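
As a quick sanity check, the reparameterization can be tried on simulated data. This is only a sketch: the data frame below is a made-up stand-in for gas, and the "true" coefficients (0.5, 0.3, 0.2 for the three price terms) are assumptions chosen so that they sum to one.

```r
# Simulated stand-in for the gas data (all values are hypothetical).
set.seed(42)
n <- 200
gas <- data.frame(
  gasp     = runif(n, 1, 3),
  pcincome = runif(n, 5, 15),
  pn       = runif(n, 1, 2),
  pd       = runif(n, 1, 2),
  ps       = runif(n, 1, 2),
  years    = 1:n
)

# Assumed true model: the price coefficients 0.5 + 0.3 + 0.2 sum to one.
gas$gasq_pop <- exp(
  1 - 0.4 * log(gas$gasp) + 0.8 * log(gas$pcincome) +
    0.5 * log(gas$pn) + 0.3 * log(gas$pd) + 0.2 * log(gas$ps) +
    0.1 * log(gas$years) + rnorm(n, sd = 0.05)
)

# Constrained fit via the reparameterized formula.
gas_c <- lm(log(gasq_pop) - log(ps) ~ log(gasp) + log(pcincome) +
              I(log(pn) - log(ps)) + I(log(pd) - log(ps)) + log(years),
            data = gas)
b <- coef(gas_c)

# Recover the three price coefficients; the third follows from the constraint.
price_coefs <- c(pn = unname(b[4]),
                 pd = unname(b[5]),
                 ps = unname(1 - b[4] - b[5]))
print(price_coefs)
print(sum(price_coefs))  # the sum is 1 by construction
```

Note that the standard error of the log(ps) coefficient is not printed by summary(); it would have to be derived from the covariance matrix of b[4] and b[5], for example via vcov().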

Related

Hedonic price method multivariate regression in R | Interpretation of linear-linear, log-linear and log-log model

I am aware that there are similar questions on this site, however, none of them seem to answer my question sufficiently.
I am performing a multivariate regression to predict real estate data using the hedonic price method.
EXCERPT OF DATA USED
The dependent variable is AV_TOTAL, which is the price of each apartment unit.
Distances to the nearest park/highway are expressed in meters.
U_NUM_PARKS/U_FPLACE (presence of parking and a fireplace) are included as dummy variables.
1) Linear-Linear Model --> Results Model 1
lm(AV_TOTAL ~ LIVINGA_AREAM2 + NUM_FLOORS +
U_BASE_FLO + U_BDRMS + factor(U_NUM_PARK) + DIST_PARKS +
DIST_HIGHdiff + DIST_BIGDIG, data = data)
Residuals Model 1
2) Log-linear Model --> Results Model 2
lm(log(AV_TOTAL) ~ LIVINGA_AREAM2 + NUM_FLOORS +
U_BASE_FLO + U_BDRMS + factor(U_NUM_PARK) + DIST_PARKS + DIST_HIGHdiff + DIST_BIGDIG, data = data)
Residuals Model 2
3) Log-Log Model --> Results Model 3
lm(formula = log(AV_TOTAL) ~ log(LIVINGA_AREAM2) + NUM_FLOORS +
U_BASE_FLO + log(U_BDRMS) + factor(U_NUM_PARK) + log(DIST_PARKS) +
log(DIST_HIGHdiff) + log(DIST_BIGDIG), data = data)
Residuals Model 3
All three models have a fairly good R^2, while the residual plots look closer to a normal distribution for Models 2 and 3.
I can't figure out what the difference is between Models 2 and 3, especially when interpreting the variable DIST_PARKS (distance from parks), or which model is more appropriate.

Compare beta coefficients of the same regression

Is there a way to compare (standardized) beta coefficients from one sample and one regression without fitting two models and conducting an ANOVA? Is there a simpler method, e.g. a single function?
For example, suppose I have this model and want to compare the beta coefficients of SE_gesamt and CE_gesamt (only these two variables):
library(lm.beta)
fit1 <- lm(Umint_gesamt ~ Alter + Geschlecht_Dummy + SE_gesamt + CE_gesamt + EmoP_gesamt + Emp_gesamt + IN_gesamt + DN_gesamt + SozID_gesamt, data=dataset)
summary(fit1)
lm.beta(fit1)
All the best,
Karen

Why are all of the coefficients estimated by lasso zero?

I'm new to R and want to apply lasso to my data for feature selection, based on the coefficients the algorithm estimates. My dataset is large and has 40 predictors (continuous and categorical). When I run lasso regression using the glmnet package, every estimated coefficient is zero except the intercept. Why does this happen? Is the model overfitted? How can I fix it? The code I used for this part is:
#Transforming categorical variables:
xfactors <- model.matrix(Bill_TotalCharge ~addNA(P_AgeGroup) +
addNA(ADT_ConditionOnDischarge) + addNA(Provider_Profession) +
addNA(ADT_HospitalName) + addNA(ADT_Province) + addNA(ADT_City) +
addNA(DiagnosisValueGroup) + addNA(DiagnosisGroupLevel1) +
addNA(DiagnosisGroupLevel2) + addNA(Bill_Insurer) + addNA(Bill_InsurerType1)
+ addNA(Bill_InsurerType2) + addNA(Bill_InsurerBox) +
addNA(ADT_AdmissionType) + addNA(Bill_RecordType) + addNA(P_MaritalStatus) +
addNA(Gender) + addNA(MonthNumberOfYear) + addNA(CalenderYear) ,
na.action=na.exclude)[,-1]
#Creating a matrix combining the continuous and categorical variables
x <- as.matrix(data.frame(Bill_TotalBasicInsurance, Bill_TotalPatient
,Bill_TotalCost1,Bill_TotalCost2, Bill_TotalCost3 , Bill_TotalCost4 ,
Bill_TotalCost5 , Bill_TotalCost6 , Bill_TotalCost7 , Bill_TotalCost8
,Bill_TotalCost9 ,Bill_TotalCost10 ,Bill_TotalCost11 ,Bill_TotalCost12 ,
P_Age, xfactors))
#Running lasso
glmmod <- glmnet(x, y=Bill_TotalCharge, family="gaussian",alpha=1)
Then I want to use the cv.glmnet function to determine the minimum lambda by cross-validation, and unbelievably it returns a six-digit number as the minimum lambda (as I understand it, lambda, and likewise alpha, should be between zero and one). What is the problem and how can I fix it? The code I used for this is:
cv.glmmod <- cv.glmnet(x, y=Bill_TotalCharge, alpha=1)
best.lambda <- cv.glmmod$lambda.min
I would greatly appreciate any help.

Linear regression model up to nth power of number

I know that when I use the lm() or glm() function to fit a regression model in R, it's possible to write interactions up to the n-th degree like this:
fit <- glm(formula=outVar ~ (inVar1 + inVar2 + inVar3)^n,
data=d)
But is it possible to do something similar with powers of the variables, so that I don't have to specify I(inVar1^2), I(inVar1^3) by hand, and to exclude interactions between different powers of the same variable?
EDIT
I'd like to do something like this:
formula=outVar ~ (poly(inVar1 + inVar2 + inVar3, 2))^2
So I'd get the formula
outVar ~ inVar1 + inVar2 + inVar3 + I(inVar1^2) + I(inVar2^2) + I(inVar3^2) + inVar1:inVar2 + inVar1:inVar3 + inVar2:inVar3 + I(inVar1^2):I(inVar2^2) + I(inVar1^2):I(inVar3^2) + I(inVar2^2):I(inVar3^2) + inVar1:I(inVar2^2) + inVar1:I(inVar3^2)...

ROC curve for Bayesian logistic regression

Can anyone help me implement a ROC curve for a Bayesian logistic regression? I have been trying DPpackage, but is it just me, or does it simply not work?
The two models I want to compare using ROC curves are shown below:
bayes_mod=MCMClogit(Default ~ ACTIVITY + CIF + MAN + STA + PIA + COL + CurrLiq + DebtCov + GDPgr, data=mydata, burnin=500000,mcmc=10000, tune=0.6,b0=coef(mylogit.reduced),B0=information2, subset=c(-1772,-2064,-655))
bayes_mod1=MCMClogit(Default ~ ACTIVITY + CIF + MAN + STA + PIA + COL + CurrLiq + DebtCov + GDPgr, data=mydata, burnin=500000,mcmc=10000,tune=0.6,subset=c(-1772,-2064,-655))
where Default ~ ACTIVITY + CIF + MAN + STA + PIA + COL + CurrLiq + DebtCov + GDPgr is my model formula, mydata is the dataset, mylogit.reduced is a logistic regression estimated before the Bayesian one, B0 is the covariance matrix, and subset=c(...) lists the excluded observations.
I don't know this package, but it probably provides a predict function (it does in general; I just can't confirm that it works for MCMClogit models, as I can't find the documentation for this function). If it does, you can pass the predictions to a ROC function such as the one in pROC:
library(pROC)
# Pass the fitted model (not the data frame); this assumes a predict method exists for it
predictions <- predict(bayes_mod, newdata=mytestdata)
roc(mytestdata$Default, predictions)
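
If no predict method turns out to be available, the predicted probabilities can be computed by hand from the posterior draws. As far as I can tell, MCMCpack's MCMClogit() returns the draws as a coda mcmc object with one column per coefficient, so the sketch below relies on that assumption. The data, the variable names x1 and x2, and the train/test split are all made up for illustration.

```r
library(MCMCpack)  # MCMClogit()
library(pROC)      # roc(), auc()

# Simulated stand-in data (hypothetical predictors x1, x2).
set.seed(1)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$Default <- rbinom(n, 1, plogis(-0.5 + d$x1 - 0.7 * d$x2))
train <- d[1:350, ]
test  <- d[351:n, ]

# Posterior draws: one row per MCMC iteration, one column per coefficient.
post <- MCMClogit(Default ~ x1 + x2, data = train,
                  burnin = 1000, mcmc = 5000, seed = 123)

# Design matrix for the test set, with columns in the same order as the draws.
X_test <- model.matrix(~ x1 + x2, data = test)

# Probability for every draw and test row, then average over the draws.
p_draws <- plogis(as.matrix(post) %*% t(X_test))
p_hat   <- colMeans(p_draws)

# ROC curve and AUC from the posterior-mean probabilities.
roc_obj <- roc(test$Default, p_hat)
print(auc(roc_obj))
```

Averaging the probabilities over the draws, rather than plugging in the posterior-mean coefficients, keeps the posterior uncertainty in the predictions. Running the same steps for bayes_mod and bayes_mod1 would give two ROC curves to compare.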
