KFAS: Negative variances and huge Std Errors - r

I am trying to fit a Marketing Mix Model and run into the following problem:
Warning message:
In KFS(model) :
Possible error in diffuse filtering: Negative variances in Pinf, check the model or try changing the tolerance parameter tol or P1/P1inf of the model.
Below is a reproducible sample code with more detail.
The goal is to develop a tool to optimize marketing mix
Below, x1(t), x2(t), x3(t) are investments into 3 marketing channels at time t
y(t) is the sales at time t
We want to use Kalman Filter approach:
y(t) = alpha + lambda * y(t-1) + beta1 * x1(t) + beta2 * x2(t) + beta3 * x3(t) +
beta12 * x1(t) * x2(t) + beta13 * x1(t) * x3(t) + beta23 * x2(t) * x3(t) + N(0, sigma)
In order to:
i. deduce sales attributed to each channel x1(t), x2(t), x3(t)
ii. their synergies x1 * x2, x1 * x3, x2 * x3
iii. carry over sales yCO(t) = y(t-1)
We use KFAS package
https://cran.r-project.org/web/packages/KFAS/index.html
Below is a simple reproducible example where we:
1. Simulate x1(t), x2(t), x3(t)
2. Set input parameter values used for sales simulation
b1, b2, b12, ..., b23, lambda, and sigma
3. Use simulated y(t), x1(t), x2(t), x3(t) to fit the model
4. Compare estimated coefficient with input values b1, b2,...
library(KFAS)
library(dplyr)
sigma<-50
set.seed(1)
x1<-1000 + rnorm(n = 100,mean = 0,sd = 100) + rnorm(100, 0, sigma)
x2<-rep(0, 100)
x2[sort(which(1:100%%6==0))]<-500
x3<-300+100*sin(1:100%%12/12*pi) + rnorm(100, 0, sigma)
#Operationalize with SQRT
x1<-sqrt(x1)
x2<-sqrt(x2)
x3<-sqrt(x3)
#Set input parameters for simulation
lambda0<-0.5
b1 <- 3
b2 <- 4
b3 <- 5
b12 <- 0.3
b13 <- 0.2
b23 <- 0.1
y_s <-
b1 * x1 +
b2 * x2 +
b3 * x3 +
b12 * x1 * x2 +
b13 * x1 * x3 +
b23 * x2 * x3 +
rnorm(100, sd = sigma)
# function to account for carry over term
# y(t) = lambda * y(t-1) + y_s(t) , where
# y_s(t) = b1 * x1 + b2 * x2 + b3 * x3 + Synergy terms
getCarryOver <- function(t, y_s, lambda) {
  if (t == 1) return(y_s[1])
  else lambda * getCarryOver(t - 1, y_s, lambda) + y_s[t]
}
# Add Carry Over term
y<-vector('numeric',100)
for (i in 1:100){
y[i]<-getCarryOver(i,y_s,lambda0)
}
yCO=dplyr::lag(y)
yCO[1]=0
if (!identical(y[-1],y_s[-1]+lambda0*yCO[-1]))
stop('identical(y,y_s+lambda0*yCO)')
model <- SSModel(y ~ SSMregression(~ x1 +
x2 +
x3 +
x1*x2 +
x1*x3 +
x2*x3 +
yCO
, Q = diag(NA,1)), H = NA)
fit <- fitSSM(model, inits = c(0,0,0,0,0,0,0,0), method = "BFGS")
model <- fit$model
model$Q
model$H
out <- KFS(model)
print(out)

This doesn't fully answer your question but this is too long for a comment...
You are not creating the same model that you formulated before the code. By defining Q=NA you are actually stating that the first coefficient, x1, should be time-varying with unknown variance. And then in the fitSSM call you are giving too many initial values, so you don't notice the error (only Q and H need numerical estimation by fitSSM; the coefficients for x1 etc. are estimated directly by the Kalman filter). I admit there are probably a few checks missing here which would warn the user accordingly. The SSMregression function is only needed if you have time-varying regression coefficients or complex multivariate models; here you can just write SSModel(y~ x1*x2 + x1*x3 + x2*x3 + yCO, H=NA) (the main effects are automatically included, as in lm).
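To see which terms the shorter formula expands to, a small base R check can help. This is only an illustration of how R formulas treat `*`; no data is needed because `terms()` works on the formula alone.

```r
# Expand the formula and list the regression terms it implies.
f <- y ~ x1 * x2 + x1 * x3 + x2 * x3 + yCO
term_labels <- attr(terms(f), "term.labels")

# Main effects come first, then the two-way interactions.
print(term_labels)
```

The order of the labels is also the order in which the coefficients are reported by lm and by KFS.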
I would also check the carryover term calculations, just to be sure that you actually generate your data correctly.
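One way to do that check, as a sketch in base R: compute the carry-over recursion y(t) = lambda * y(t-1) + y_s(t) with an explicit loop and compare it with stats::filter in recursive mode. The y_s series here is made-up noise, used only for illustration.

```r
set.seed(1)
lambda0 <- 0.5
y_s <- rnorm(100, mean = 500, sd = 50)  # stand-in for the simulated base sales

# Explicit recursion: y[1] = y_s[1], y[t] = lambda0 * y[t-1] + y_s[t]
y_loop <- numeric(100)
y_loop[1] <- y_s[1]
for (t in 2:100) y_loop[t] <- lambda0 * y_loop[t - 1] + y_s[t]

# Same recursion via a recursive (autoregressive) filter
y_filter <- as.numeric(stats::filter(y_s, filter = lambda0, method = "recursive"))

# Lagged series used as the carry-over regressor, with 0 for the first period
yCO <- c(0, head(y_loop, -1))

print(all.equal(y_loop, y_filter))
print(all.equal(y_loop, y_s + lambda0 * yCO))
```

Using all.equal rather than identical avoids false alarms from floating-point rounding.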
If you still get errors, it could be that you have such high multicollinearity that you need to modify the prior for the first time step, i.e. set model$P1inf[] <- 0 (which removes the diffuse initialization) and set diag(model$P1) to something moderate like 100 (the prior variance of the coefficients).
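A minimal sketch of that suggestion on toy data. The data frame d and its coefficients are made up for illustration; the value 100 is just the "moderate" prior variance mentioned above, not a recommendation. The block is wrapped in requireNamespace so that it is skipped when KFAS is not installed.

```r
if (requireNamespace("KFAS", quietly = TRUE)) {
  library(KFAS)

  # Toy regression data (hypothetical, for illustration only)
  set.seed(1)
  n <- 100
  d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
  d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(n)

  # Time-invariant regression with unknown observation variance H
  model <- SSModel(y ~ x1 + x2, data = d, H = NA)

  # Replace the diffuse prior with a proper one
  model$P1inf[] <- 0      # no diffuse states
  diag(model$P1) <- 100   # prior variance of each coefficient

  # Only H is estimated numerically, so a single initial value is enough
  fit <- fitSSM(model, inits = 0, method = "BFGS")
  out <- KFS(fit$model)
  print(out)
}
```

With a proper prior the estimates are shrunk slightly towards zero, so they will no longer match lm exactly.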
Actually, if you are using the formulation you suggest (all x's and yCO as simple explanatory variables), then you should get results identical to lm. And running your code with lm I get the same, apparently wrong, results:
> model <- SSModel(y~ x1*x2 + x1*x3 + x2*x3 + yCO, H=NA)
>
> fit <- fitSSM(model, inits = 0, method = "BFGS")
> out <- KFS(fit$model)
Warning message:
In KFS(fit$model) :
Possible error in diffuse filtering: Negative variances in Pinf, check the model or try changing the tolerance parameter tol or P1/P1inf of the model.
> out
Smoothed values of states and standard errors at time n = 100:
Estimate Std. Error
(Intercept) -1.171e+03 1.300e+03
x1 3.782e+01 4.102e+01
x2 -4.395e+00 1.235e+01
x3 7.287e+01 6.844e+01
yCO 5.244e-01 3.396e-02
x1:x2 5.215e-01 3.979e-01
x1:x3 -1.853e+00 2.167e+00
x2:x3 1.671e-01 3.471e-01
> summary(lm(y~ x1*x2 + x1*x3 + x2*x3 + yCO))
Call:
lm(formula = y ~ x1 * x2 + x1 * x3 + x2 * x3 + yCO)
Residuals:
Min 1Q Median 3Q Max
-137.297 -29.870 -2.214 35.178 87.578
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.171e+03 1.300e+03 -0.901 0.370
x1 3.782e+01 4.102e+01 0.922 0.359
x2 -4.395e+00 1.235e+01 -0.356 0.723
x3 7.287e+01 6.844e+01 1.065 0.290
yCO 5.244e-01 3.396e-02 15.445 <2e-16 ***
x1:x2 5.215e-01 3.979e-01 1.311 0.193
x1:x3 -1.853e+00 2.167e+00 -0.855 0.395
x2:x3 1.671e-01 3.471e-01 0.481 0.631
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 50.25 on 92 degrees of freedom
Multiple R-squared: 0.88, Adjusted R-squared: 0.8709
F-statistic: 96.43 on 7 and 92 DF, p-value: < 2.2e-16
So I think you have some issues with generating your example data, or in the model formulation.
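One way to probe the multicollinearity mentioned above is to look at the design matrix itself. The sketch below rebuilds the regressors from the question (same seed and transformations) and reports two rough diagnostics: the coefficient of variation of x1 and the condition number of the model matrix. The choice of these diagnostics is mine, not something KFAS or lm prescribes.

```r
set.seed(1)
sigma <- 50
x1 <- sqrt(1000 + rnorm(100, 0, 100) + rnorm(100, 0, sigma))
x2 <- sqrt(ifelse(1:100 %% 6 == 0, 500, 0))
x3 <- sqrt(300 + 100 * sin(1:100 %% 12 / 12 * pi) + rnorm(100, 0, sigma))

# After the square root, x1 varies very little around its mean,
# so it is hard to separate from the intercept.
cv_x1 <- sd(x1) / mean(x1)

# Condition number of the design matrix (ratio of largest to smallest
# singular value); large values indicate near-collinear columns.
X <- model.matrix(~ x1 * x2 + x1 * x3 + x2 * x3)
cond_number <- kappa(X, exact = TRUE)

print(cv_x1)
print(cond_number)
```

A large condition number is consistent with standard errors that are as big as the estimates themselves, as in the output above.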

Related

Diff-in-diff using Instrumental Variables: How to implement with ivreg? (interaction between endogenous and exogenous variables)

I'd like to estimate the effect of a treatment on two separate groups, so something of the form
Y = beta_0 + beta_1*M + beta_2*T + beta_3*M*T + error
T being the treatment and M the dummy separating the two groups.
The problem is that the treatment is correlated to other variables that affect Y. Luckily, there exists a variable Z that serves as an instrument for T. What I've been able to implement in R was to "manually" run 2SLS, following the stages
T = alpha_0 + alpha_1*Z + e
and
Y = beta_0 + beta_1*M + beta_2*T_hat + beta_3*M*T_hat + error
To provide a reproducible example, first a simulation
n <- 100
set.seed(271)
Z <- runif(n)
e <- rnorm(n, sd = 0.5)
M <- as.integer(runif(n)) # dummy
u <- rnorm(n)
# Treat = 1 + 2*Z + e
alpha_0 <- 1
alpha_1 <- 2
Treat <- alpha_0 + alpha_1*Z + e
# Y = 3 + M + 2*Treat + 3*Treat * M + e + u (omitted vars that determine Treat affect Y)
beta_0 <- 3
beta_1 <- 1
beta_2 <- 2
beta_3 <- 3
Y <- beta_0 + beta_1*M + beta_2*Treat + beta_3 * M*Treat + e + u
The first stage regression
fs <- lm(Treat ~ Z)
stargazer::stargazer(fs, type = "text")
===============================================
Dependent variable:
---------------------------
Treat
-----------------------------------------------
Z 2.383***
(0.168)
Constant 0.835***
(0.096)
-----------------------------------------------
Observations 100
R2 0.671
Adjusted R2 0.668
Residual Std. Error 0.445 (df = 98)
F Statistic 200.053*** (df = 1; 98)
===============================================
And second stage
Treat_hat <- fitted(fs)
ss <- lm(Y ~ M + Treat_hat + M:Treat_hat)
stargazer::stargazer(ss, type = "text")
===============================================
Dependent variable:
---------------------------
Y
-----------------------------------------------
M 1.230
(1.717)
Treat_hat 2.243***
(0.570)
M:Treat_hat 2.636***
(0.808)
Constant 2.711**
(1.213)
-----------------------------------------------
Observations 100
R2 0.727
Adjusted R2 0.718
Residual Std. Error 2.539 (df = 96)
F Statistic 85.112*** (df = 3; 96)
===============================================
The problem now is that those Standard Errors aren't adjusted for the first stage, which looks like quite some work to do manually. As I'd do for any other IV regression, I'd prefer to just use AER::ivreg.
But I can't seem to get the same regression going there. Here are many possible iterations, that never quite do the same thing
AER::ivreg(Y ~ M + Treat + M:Treat | Z)
AER::ivreg(Y ~ M + Treat + M:Treat | M + Z)
Warning message:
In ivreg.fit(X, Y, Z, weights, offset, ...) :
more regressors than instruments
These make sense, I guess
AER::ivreg(Y ~ M + Treat + M:Treat | M + Z + M:Z)
Call:
AER::ivreg(formula = Y ~ M + Treat + M:Treat | M + Z + M:Z)
Coefficients:
(Intercept) M Treat M:Treat
2.641 1.450 2.229 2.687
Surprisingly close, but not quite.
I couldn't find a way to tell ivreg that Treat and M:Treat aren't really two separate endogenous variables, but really just the same endogenous variable moved around and interacted with an exogenous one.
In conclusion,
i) Is there some way to mess with ivreg and make this work?
ii) Is there some other function for 2SLS that can just manually accept 1st and 2nd stage formulas without this sort of restriction, and that adjusts standard errors?
iii) What's the simplest way to get the correct SEs if there are no other alternatives? I didn't come across any direct R code, just a bunch of matrix multiplication formulas (although I didn't dig too deep for this one).
Thank you
Essentially, if Z is a valid instrument for Treat, M:Z should be a valid instrument for M:Treat, so to me this makes sense:
AER::ivreg(Y ~ M + Treat + M:Treat | M + Z + M:Z)
I actually managed to back out the correct param values for a modified simulation:
n <- 100
set.seed(271)
Z <- runif(n)
e <- rnorm(n, sd = 0.5)
M <- round(runif(n)) # note: I changed from as.integer() to round() in order to get some 1's in the regression
u <- rnorm(n)
# Treat = 1 + 2*Z + e
alpha_0 <- 1
alpha_1 <- 2
Treat <- alpha_0 + alpha_1*Z + e
beta_0 <- 3
beta_1 <- 1
beta_2 <- 2
beta_3 <- 3
Y <- beta_0 + beta_1*M + beta_2*Treat + beta_3 * M*Treat
Now:
my_ivreg <- AER::ivreg(Y ~ M + Treat + M:Treat | M + Z + M:Z)
>summary(my_ivreg)
Call:
AER::ivreg(formula = Y ~ M + Treat + M:Treat | M + Z + M:Z)
Residuals:
Min 1Q Median 3Q Max
-1.332e-14 -7.105e-15 -3.553e-15 -8.882e-16 3.553e-15
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.000e+00 2.728e-15 1.100e+15 <2e-16 ***
M 1.000e+00 3.810e-15 2.625e+14 <2e-16 ***
Treat 2.000e+00 1.255e-15 1.593e+15 <2e-16 ***
M:Treat 3.000e+00 1.792e-15 1.674e+15 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.633e-15 on 96 degrees of freedom
Multiple R-Squared: 1, Adjusted R-squared: 1
Wald test: 1.794e+31 on 3 and 96 DF, p-value: < 2.2e-16
Which is what we were looking for...
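On the question of standard errors (point iii), here is a sketch of 2SLS done by hand with matrices in base R. It keeps the e + u noise from the original simulation and uses round() for M as in the modified one. The instrument set (1, M, Z, M:Z) is the one from the ivreg call above; dividing the residual sum of squares by n - k is a common convention and an assumption here. Note that every regressor is projected on all instruments, which is why the hand-rolled two-step regression in the question (Treat projected on Z only) gives slightly different numbers.

```r
set.seed(271)
n <- 100
Z <- runif(n)
e <- rnorm(n, sd = 0.5)
M <- round(runif(n))          # 0/1 dummy with both groups present
u <- rnorm(n)
Treat <- 1 + 2 * Z + e
Y <- 3 + 1 * M + 2 * Treat + 3 * M * Treat + e + u

# Regressors (Treat and M:Treat are endogenous) and instruments
X <- cbind("(Intercept)" = 1, M = M, Treat = Treat, "M:Treat" = M * Treat)
W <- cbind(1, M, Z, M * Z)

# First stage: project every column of X on the full instrument set
X_hat <- W %*% solve(crossprod(W), crossprod(W, X))

# Second stage: regress Y on the projected regressors
beta <- solve(crossprod(X_hat), crossprod(X_hat, Y))

# Residuals must use the ORIGINAL regressors, not the fitted ones;
# this is the adjustment the naive second-stage lm() misses.
res <- Y - X %*% beta
sigma2 <- sum(res^2) / (n - ncol(X))
se <- sqrt(diag(sigma2 * solve(crossprod(X_hat))))

print(cbind(Estimate = drop(beta), `Std. Error` = se))
```

The point estimates can be compared with coef() of the ivreg fit, and the standard errors with its summary().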

Goodness-of-fit for fixed effect logit model using 'bife' package

I am using the 'bife' package to run the fixed effect logit model in R. However, I cannot compute any goodness-of-fit to measure the model's overall fit given the result I have below. I would appreciate if I can know how to measure the goodness-of-fit given this limited information. I prefer chi-square test but still cannot find a way to implement this either.
---------------------------------------------------------------
Fixed effects logit model
with analytical bias-correction
Estimated model:
Y ~ X1 +X2 + X3 + X4 + X5 | Z
Log-Likelihood= -9153.165
n= 20383, number of events= 5104
Demeaning converged after 6 iteration(s)
Offset converged after 3 iteration(s)
Corrected structural parameter(s):
Estimate Std. error t-value Pr(> t)
X1 -8.67E-02 2.80E-03 -31.001 < 2e-16 ***
X2 1.79E+00 8.49E-02 21.084 < 2e-16 ***
X3 -1.14E-01 1.91E-02 -5.982 2.24E-09 ***
X4 -2.41E-04 2.37E-05 -10.171 < 2e-16 ***
X5 1.24E-01 3.33E-03 37.37 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
AIC= 18730.33 , BIC= 20409.89
Average individual fixed effects= 1.6716
---------------------------------------------------------------
Let the DGP be
n <- 1000
x <- rnorm(n)
id <- rep(1:2, each = n / 2)
y <- 1 * (rnorm(n) > 0)
so that we will be under the null hypothesis. As it says in ?bife, when there is no bias-correction, everything is the same as with glm, except for the speed. So let's start with glm.
modGLM <- glm(y ~ 1 + x + factor(id), family = binomial())
modGLM0 <- glm(y ~ 1, family = binomial())
One way to perform the LR test is with
library(lmtest)
lrtest(modGLM0, modGLM)
# Likelihood ratio test
#
# Model 1: y ~ 1
# Model 2: y ~ 1 + x + factor(id)
# #Df LogLik Df Chisq Pr(>Chisq)
# 1 1 -692.70
# 2 3 -692.29 2 0.8063 0.6682
But we may also do it manually,
1 - pchisq(c((-2 * logLik(modGLM0)) - (-2 * logLik(modGLM))),
modGLM0$df.residual - modGLM$df.residual)
# [1] 0.6682207
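The same likelihood ratio test is also available through anova() in base R, which avoids the extra package. A small sketch under the same DGP (the seed is my addition, so the numbers will differ from the output above):

```r
set.seed(42)
n <- 1000
x <- rnorm(n)
id <- rep(1:2, each = n / 2)
y <- 1 * (rnorm(n) > 0)

modGLM0 <- glm(y ~ 1, family = binomial())
modGLM  <- glm(y ~ 1 + x + factor(id), family = binomial())

# Analysis of deviance; for binary data the deviance difference
# equals the likelihood ratio statistic.
lr <- anova(modGLM0, modGLM, test = "Chisq")
p_anova <- lr[["Pr(>Chi)"]][2]

# Manual version for comparison
p_manual <- pchisq(modGLM0$deviance - modGLM$deviance,
                   df = modGLM0$df.residual - modGLM$df.residual,
                   lower.tail = FALSE)

print(lr)
print(c(anova = p_anova, manual = p_manual))
```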
Now let's proceed with bife.
library(bife)
modBife <- bife(y ~ x | id)
modBife0 <- bife(y ~ 1 | id)
Here modBife is the full specification and modBife0 is only with fixed effects. For convenience, let
logLik.bife <- function(object, ...) object$logl_info$loglik
for loglikelihood extraction. Then we may compare modBife0 with modBife as in
1 - pchisq((-2 * logLik(modBife0)) - (-2 * logLik(modBife)), length(modBife$par$beta))
# [1] 1
while modGLM0 and modBife can be compared by running
1 - pchisq(c((-2 * logLik(modGLM0)) - (-2 * logLik(modBife))),
length(modBife$par$beta) + length(unique(id)) - 1)
# [1] 0.6682207
which gives the same result as before, even though with bife we, by default, have bias correction.
Lastly, as a bonus, we may simulate data and see if the test works as it's supposed to. 1000 iterations below show that both tests (since the two tests are the same) indeed reject as often as they are supposed to under the null.
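A minimal sketch of such a check, using only glm so that it runs without bife. The seed, the smaller sample size (n = 200, chosen for speed) and the 5% level are assumptions of this sketch.

```r
set.seed(123)
n_sim <- 1000
n <- 200
id <- rep(1:2, each = n / 2)

# Under the null, y is independent of x and of the group id.
p_values <- replicate(n_sim, {
  x <- rnorm(n)
  y <- 1 * (rnorm(n) > 0)
  m0 <- glm(y ~ 1, family = binomial())
  m1 <- glm(y ~ x + factor(id), family = binomial())
  # LR statistic with 2 degrees of freedom (x and one id dummy)
  pchisq(m0$deviance - m1$deviance, df = 2, lower.tail = FALSE)
})

# Share of simulations in which the test rejects at the 5% level
rejection_rate <- mean(p_values < 0.05)
print(rejection_rate)
```

If the test is well calibrated, the rejection rate should be close to the nominal 5%.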

R: lm() result differs when using `weights` argument and when using manually reweighted data

In order to correct heteroskedasticity in error terms, I am running the following weighted least squares regression in R :
#Call:
#lm(formula = a ~ q + q2 + b + c, data = mydata, weights = weighting)
#Weighted Residuals:
# Min 1Q Median 3Q Max
#-1.83779 -0.33226 0.02011 0.25135 1.48516
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) -3.939440 0.609991 -6.458 1.62e-09 ***
#q 0.175019 0.070101 2.497 0.013696 *
#q2 0.048790 0.005613 8.693 8.49e-15 ***
#b 0.473891 0.134918 3.512 0.000598 ***
#c 0.119551 0.125430 0.953 0.342167
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Residual standard error: 0.5096 on 140 degrees of freedom
#Multiple R-squared: 0.9639, Adjusted R-squared: 0.9628
#F-statistic: 933.6 on 4 and 140 DF, p-value: < 2.2e-16
Where "weighting" is a variable (function of the variable q) used for weighting the observations. q2 is simply q^2.
Now, to double-check my results, I manually weight my variables by creating new weighted variables :
mydata$a.wls <- mydata$a * mydata$weighting
mydata$q.wls <- mydata$q * mydata$weighting
mydata$q2.wls <- mydata$q2 * mydata$weighting
mydata$b.wls <- mydata$b * mydata$weighting
mydata$c.wls <- mydata$c * mydata$weighting
And run the following regression, without the weights option, and without a constant - since the constant is weighted, the column of 1 in the original predictor matrix should now equal the variable weighting:
Call:
lm(formula = a.wls ~ 0 + weighting + q.wls + q2.wls + b.wls + c.wls,
data = mydata)
#Residuals:
# Min 1Q Median 3Q Max
#-2.38404 -0.55784 0.01922 0.49838 2.62911
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#weighting -4.125559 0.579093 -7.124 5.05e-11 ***
#q.wls 0.217722 0.081851 2.660 0.008726 **
#q2.wls 0.045664 0.006229 7.330 1.67e-11 ***
#b.wls 0.466207 0.121429 3.839 0.000186 ***
#c.wls 0.133522 0.112641 1.185 0.237876
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Residual standard error: 0.915 on 140 degrees of freedom
#Multiple R-squared: 0.9823, Adjusted R-squared: 0.9817
#F-statistic: 1556 on 5 and 140 DF, p-value: < 2.2e-16
As you can see, the results are similar but not identical. Am I doing something wrong while manually weighting the variables, or does the option "weights" do something more than simply multiplying the variables by the weighting vector?
Provided you do the manual weighting correctly, you won't see any discrepancy.
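The reason a square root is involved can be written out directly. In the short derivation below, w_i are the weights and x_i the rows of the model matrix, intercept column included (the notation is mine):

```latex
\min_{\beta} \sum_{i} w_i \,\bigl(y_i - x_i^{\top}\beta\bigr)^2
  \;=\; \min_{\beta} \sum_{i} \bigl(\sqrt{w_i}\,y_i - \sqrt{w_i}\,x_i^{\top}\beta\bigr)^2
  \;=\; \min_{\beta} \sum_{i} \bigl(\tilde{y}_i - \tilde{x}_i^{\top}\beta\bigr)^2,
\qquad \tilde{y}_i = \sqrt{w_i}\,y_i,\quad \tilde{x}_i = \sqrt{w_i}\,x_i .
```

Ordinary least squares on data scaled by sqrt(w_i) therefore reproduces the weighted fit, whereas scaling by w_i itself corresponds to using weights w_i^2.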
So the correct way to go is:
X <- model.matrix(~ q + q2 + b + c, mydata) ## non-weighted model matrix (with intercept)
w <- mydata$weighting ## weights
rw <- sqrt(w) ## root weights
y <- mydata$a ## non-weighted response
X_tilde <- rw * X ## weighted model matrix (with intercept)
y_tilde <- rw * y ## weighted response
## remember to drop intercept when using formula
fit_by_wls <- lm(y ~ X - 1, weights = w)
fit_by_ols <- lm(y_tilde ~ X_tilde - 1)
Although it is generally recommended to use lm.fit and lm.wfit when passing in matrices directly:
matfit_by_wls <- lm.wfit(X, y, w)
matfit_by_ols <- lm.fit(X_tilde, y_tilde)
But when using these internal subroutines lm.fit and lm.wfit, it is required that all inputs be complete cases without NA, otherwise the underlying C routine stats:::C_Cdqrls will complain.
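A sketch of how the complete-case requirement can be handled, using the built-in airquality data (which does contain NA values) and made-up random weights. The rows are filtered with complete.cases before calling lm.wfit, and the result is compared with lm, which drops incomplete rows on its own.

```r
aq <- airquality
set.seed(0)
w <- runif(nrow(aq), 1, 2)   # illustrative weights, one per row

# Keep only rows where response and predictors are all observed
ok <- complete.cases(aq[, c("Ozone", "Solar.R", "Wind")])

X <- model.matrix(~ Solar.R + Wind, aq[ok, ])
y <- aq$Ozone[ok]
matfit <- lm.wfit(X, y, w[ok])

# Formula interface: lm() removes incomplete rows (and their weights) itself
fit <- lm(Ozone ~ Solar.R + Wind, data = aq, weights = w)

print(matfit$coefficients)
print(coef(fit))
```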
If you still want to use the formula interface rather than matrix, you can do the following:
## weight by square root of weights, not weights
mydata$root.weighting <- sqrt(mydata$weighting)
mydata$a.wls <- mydata$a * mydata$root.weighting
mydata$q.wls <- mydata$q * mydata$root.weighting
mydata$q2.wls <- mydata$q2 * mydata$root.weighting
mydata$b.wls <- mydata$b * mydata$root.weighting
mydata$c.wls <- mydata$c * mydata$root.weighting
fit_by_wls <- lm(formula = a ~ q + q2 + b + c, data = mydata, weights = weighting)
fit_by_ols <- lm(formula = a.wls ~ 0 + root.weighting + q.wls + q2.wls + b.wls + c.wls,
data = mydata)
Reproducible Example
Let's use R's built-in data set trees. Use head(trees) to inspect this dataset. There is no NA in this dataset. We aim to fit a model:
Height ~ Girth + Volume
with some random weights between 1 and 2:
set.seed(0); w <- runif(nrow(trees), 1, 2)
We fit this model via weighted regression, either by passing weights to lm, or manually transforming data and calling lm with no weights:
X <- model.matrix(~ Girth + Volume, trees) ## non-weighted model matrix (with intercept)
rw <- sqrt(w) ## root weights
y <- trees$Height ## non-weighted response
X_tilde <- rw * X ## weighted model matrix (with intercept)
y_tilde <- rw * y ## weighted response
fit_by_wls <- lm(y ~ X - 1, weights = w)
#Call:
#lm(formula = y ~ X - 1, weights = w)
#Coefficients:
#X(Intercept) XGirth XVolume
# 83.2127 -1.8639 0.5843
fit_by_ols <- lm(y_tilde ~ X_tilde - 1)
#Call:
#lm(formula = y_tilde ~ X_tilde - 1)
#Coefficients:
#X_tilde(Intercept) X_tildeGirth X_tildeVolume
# 83.2127 -1.8639 0.5843
So indeed, we see identical results.
Alternatively, we can use lm.fit and lm.wfit:
matfit_by_wls <- lm.wfit(X, y, w)
matfit_by_ols <- lm.fit(X_tilde, y_tilde)
We can check coefficients by:
matfit_by_wls$coefficients
#(Intercept) Girth Volume
# 83.2127455 -1.8639351 0.5843191
matfit_by_ols$coefficients
#(Intercept) Girth Volume
# 83.2127455 -1.8639351 0.5843191
Again, results are the same.
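For completeness, here is a sketch of what the manual weighting in the question actually fitted. Multiplying every variable by w (rather than sqrt(w)) is equivalent to a weighted regression with weights w^2, which explains the "similar but not identical" results. The column names with the .w suffix are made up for this illustration.

```r
set.seed(0)
d <- trees
d$w <- runif(nrow(d), 1, 2)

# Manual weighting as in the question: multiply by w, not sqrt(w)
d$Height.w <- d$Height * d$w
d$Girth.w  <- d$Girth  * d$w
d$Volume.w <- d$Volume * d$w
fit_manual <- lm(Height.w ~ 0 + w + Girth.w + Volume.w, data = d)

# Weighted fits with weights w^2 and with weights w
fit_w2 <- lm(Height ~ Girth + Volume, data = d, weights = w^2)
fit_w  <- lm(Height ~ Girth + Volume, data = d, weights = w)

print(rbind(manual = unname(coef(fit_manual)),
            w2     = unname(coef(fit_w2)),
            w      = unname(coef(fit_w))))
```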

Force certain parameters to have positive coefficients in lm()

I would like to know how to constrain certain parameters in lm() to have positive coefficients. There are a few packages or functions (e.g. display) that can make all coefficients, and the intercept, positive.
For instance, in this example, I would like to force only x1 and x2 to have positive coefficients.
x1=c(NA,rnorm(99)*10)
x2=c(NA,NA,rnorm(98)*10)
x3=rnorm(100)*10
y=sin(x1)+cos(x2)-x3+rnorm(100)
lm(y~x1+x2+x3)
Call:
lm(formula = y ~ x1 + x2 + x3)
Coefficients:
(Intercept) x1 x2 x3
-0.06278 0.02261 -0.02233 -0.99626
I have tried function nnnpls() in package nnls, it can control the coefficient sign easily. Unfortunately I can't use it due to issues with NAs in the data as this function doesn't allow NA.
I saw function glmc() can be used to apply constraints but I couldn't get it working.
Could someone let me know what should I do?
You could use the package colf for this. It currently offers two least squares non linear optimizers, namely nls or nlxb:
library(colf)
colf_nlxb(y ~ x1 + x2 + x3, data = DF, lower = c(-Inf, 0, 0, -Inf))
#nlmrt class object: x
#residual sumsquares = 169.53 on 98 observations
# after 3 Jacobian and 3 function evaluations
# name coeff SEs tstat pval gradient JSingval
#1 param_X.Intercept. -0.0066952 NA NA NA 3.8118 103.3941
#2 param_x1 0.0000000 NA NA NA 103.7644 88.7017
#3 param_x2 0.0000000 NA NA NA 0.0000 9.8032
#4 param_x3 -0.9487088 NA NA NA 330.7776 0.0000
colf_nls(y ~ x1 + x2 + x3, data = DF, lower = c(-Inf, 0, 0, -Inf))
#Nonlinear regression model
# model: y ~ param_X.Intercept. * X.Intercept. + param_x1 * x1 + param_x2 *
# x2 + param_x3 * x3
# data: model_ingredients$model_data
#param_X.Intercept. param_x1 param_x2 param_x3
# -0.0392 0.0000 0.0000 -0.9801
# residual sum-of-squares: 159
#
#Algorithm "port", convergence message: both X-convergence and relative convergence (5)
You can set the lower and/or upper bounds to specify the limits as you like for each one of the coefficients.
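If installing an extra package is not an option, a similar bounded fit can be sketched in base R with nls and its "port" algorithm, which accepts lower and upper bounds. The parameter names b0 to b3 and the starting values are my own choices; rows with NA are dropped up front with na.omit.

```r
set.seed(1)
x1 <- c(NA, rnorm(99) * 10)
x2 <- c(NA, NA, rnorm(98) * 10)
x3 <- rnorm(100) * 10
y <- sin(x1) + cos(x2) - x3 + rnorm(100)
DF <- na.omit(data.frame(x1, x2, x3, y))

# Linear model written out explicitly so that each coefficient
# can be given its own bound: b1 >= 0 and b2 >= 0, others free.
fit_bounded <- nls(y ~ b0 + b1 * x1 + b2 * x2 + b3 * x3,
                   data = DF,
                   start = list(b0 = 0, b1 = 1, b2 = 1, b3 = 0),
                   algorithm = "port",
                   lower = c(-Inf, 0, 0, -Inf))

print(coef(fit_bounded))
```

The bounds are matched to the parameters by position, in the order given in start.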
You can use package penalized:
set.seed(1)
x1=c(NA,rnorm(99)*10)
x2=c(NA,NA,rnorm(98)*10)
x3=rnorm(100)*10
y=sin(x1)+cos(x2)-x3+rnorm(100)
DF <- data.frame(x1,x2,x3,y)
lm(y~x1+x2+x3, data=DF)
#Call:
#lm(formula = y ~ x1 + x2 + x3, data = DF)
#
#Coefficients:
#(Intercept) x1 x2 x3
# -0.02438 -0.01735 -0.02030 -0.98203
This gives the same:
library(penalized)
mod1 <- penalized(y, ~ x1 + x2 + x3, ~1,
lambda1=0, lambda2=0, positive = FALSE, data=na.omit(DF))
coef(mod1)
#(Intercept) x1 x2 x3
#-0.02438357 -0.01734856 -0.02030120 -0.98202831
If you constrain the coefficients of x1 and x2 to be positive, they become zero (as expected):
mod2 <- penalized(y, ~ x1 + x2 + x3, ~1,
lambda1=0, lambda2=0, positive = c(T, T, F), data=na.omit(DF))
coef(mod2)
#(Intercept) x3
#-0.03922266 -0.98011223
With the ConsReg package (https://cran.r-project.org/web/packages/ConsReg/index.html) you can deal with this kind of problem.
You can set bounds (lower and upper) and also restrictions between coefficients, like beta1 > beta2, which in some cases can be very useful.

How can I specify a relationship between parameter estimates in lm?

Using lm, I would like to fit the model:
y = b0 + b1*x1 + b2*x2 + b1*b2*x1*x2
My question is:
How can I specify that the coefficient of the interaction should equal the product of the coefficients of the main effects?
I've seen that to set a coefficient to a specific value you can use offset() and I(), but I don't know how to specify a relationship between coefficients.
Here is a simple simulated dataset:
n <- 50 # Sample size
x1 <- rnorm(n, 1:n, 0.5) # Independent variable 1
x2 <- rnorm(n, 1:n, 0.5) # Independent variable 2
b0 <- 1
b1 <- 0.5
b2 <- 0.2
y <- b0 + b1*x1 + b2*x2 + b1*b2*x1*x2 + rnorm(n,0,0.1)
To fit Model 1: y = b0 + b1*x1 + b2*x2 + b3*x1*x2, I would use:
summary(lm(y~ x1 + x2 + x1:x2))
But how do I fit Model 2: y = b0 + b1*x1 + b2*x2 + b1*b2*x1*x2?
One of the main differences between the two models is the number of parameters to estimate. In Model 1, we estimate 4 parameters: b0 (intercept), b1 (slope of var. 1), b2 (slope of var. 2), and b3 (slope for the interaction between vars. 1 & 2). In Model 2, we estimate 3 parameters: b0 (intercept), b1 (slope of var. 1 & part of slope of the interaction between vars. 1 & 2), and b2 (slope of var. 2 & part of slope of the interaction between vars. 1 & 2)
The reason why I want to do this is that when investigating whether there is a significant interaction between x1 & x2, model 2, y = b0 + b1*x1 + b2*x2 + b1*b2*x1*x2, can be a better null model than y = b0 + b1*x1 + b2*x2.
Many thanks!
Marie
Because of the constraint that you impose on the coefficients, the model you specify is not a linear model and so lm can not be used to fit it. You would need to use a non-linear regression, such as nls.
> summary(nls(y ~ b0 + b1*x1 + b2*x2 + b1*b2*x1*x2, start=list(b0=0, b1=1, b2=1)))
Formula: y ~ b0 + b1 * x1 + b2 * x2 + b1 * b2 * x1 * x2
Parameters:
Estimate Std. Error t value Pr(>|t|)
b0 0.987203 0.049713 19.86 <2e-16 ***
b1 0.494438 0.007803 63.37 <2e-16 ***
b2 0.202396 0.003359 60.25 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1121 on 47 degrees of freedom
Number of iterations to convergence: 5
Achieved convergence tolerance: 2.545e-06
You can really see that the model is non-linear when you re-write it as
> summary(nls(y ~ b0+(1+b1*x1)*(1+b2*x2)-1, start=list(b0=0, b1=1, b2=1)))
Formula: y ~ b0 + (1 + b1 * x1) * (1 + b2 * x2) - 1
Parameters:
Estimate Std. Error t value Pr(>|t|)
b0 0.987203 0.049713 19.86 <2e-16 ***
b1 0.494438 0.007803 63.37 <2e-16 ***
b2 0.202396 0.003359 60.25 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1121 on 47 degrees of freedom
Number of iterations to convergence: 5
Achieved convergence tolerance: 2.25e-06
Brian provides a way to fit the constrained model you specify, but if you're interested in whether the unconstrained model fits better than your constrained model, you can use the delta method to test that hypothesis.
# Let's make some fake data where the constrained model is true
n <- 100
b0 <- 2
b1 <- .2
b2 <- -1.3
b3 <- b1 * b2
sigma <- 1
x1 <- rnorm(n)
# make x1 and x2 correlated for giggles
x2 <- x1 + rnorm(n)
# Generate data according to the model
y <- b0 + b1*x1 + b2*x2 + b3*x1*x2 + rnorm(n, 0, sigma)
# Fit full model y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + error
o <- lm(y ~ x1 + x2 + x1:x2)
# If we want to do a hypothesis test of Ho: b3 = b1*b2
# this is the same as Ho: b3 - b1*b2 = 0
library(msm)
# Get estimate of the difference specified in the null
est <- unname(coef(o)["x1:x2"] - coef(o)["x1"] * coef(o)["x2"])
# Use the delta method to get a standard error for
# this difference
standerr <- deltamethod(~ x4 - x3*x2, coef(o), vcov(o))
# Calculate a test statistic. We're relying on asymptotic
# arguments here so hopefully we have a decent sample size
z <- est/standerr
# Calculate p-value
pval <- 2 * pnorm(-abs(z))
pval
I explain what the delta method is used for and more on how to use it in R in this blog post.
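For readers who prefer not to depend on msm, the same standard error can be sketched by hand. For g(b) = b3 - b1*b2 the gradient with respect to (b0, b1, b2, b3) is (0, -b2, -b1, 1), and the delta-method variance is grad' V grad. The seed and the variable names grad and V are my additions.

```r
set.seed(10)
n <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n)
# Data generated so that the constraint b3 = b1 * b2 holds
y <- 2 + 0.2 * x1 - 1.3 * x2 + (0.2 * -1.3) * x1 * x2 + rnorm(n)

o <- lm(y ~ x1 + x2 + x1:x2)
b <- coef(o)   # order: (Intercept), x1, x2, x1:x2
V <- vcov(o)

# Estimate of b3 - b1*b2 and its gradient evaluated at the estimates
est  <- unname(b["x1:x2"] - b["x1"] * b["x2"])
grad <- c(0, -b[["x2"]], -b[["x1"]], 1)

# Delta-method standard error, z statistic and two-sided p-value
standerr <- sqrt(drop(t(grad) %*% V %*% grad))
z <- est / standerr
pval <- 2 * pnorm(-abs(z))

print(c(estimate = est, se = standerr, z = z, p = pval))
```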
Expanding on Brian's answer you could alternatively do this by comparing the full model to the constrained model - however you have to use nls to fit the full model to be able to easily compare the models.
o2 <- nls(y ~ b0 + b1*x1 + b2*x2 + b1*b2*x1*x2, start=list(b0=0, b1=1, b2=1))
o3 <- nls(y ~ b0 + b1*x1 + b2*x2 + b3*x1*x2, start = list(b0 = 0, b1 = 1, b2 = 1, b3 = 1))
anova(o2, o3)
There's no way to do what you're asking for in lm and there's no reason for it to be able to do it. You run lm to get estimates of your coefficients. If you don't want to estimate the coefficient then don't include the predictor in the model. You can use coef to extract the coefficients you want and multiply them out afterwards.
Note that leaving the interaction out is a different model and will produce a different b1 and b2. You could alternatively leave I(x1 * x2) in and not use the coefficient.
As for why you want to do this, there's no good a priori justification that your constrained model actually fits better than the simple additive model. Having more free parameters necessarily means a model fits better, but you haven't added that; you've added a constraint that, in the real world, could make it fit worse. In that case, would you consider it a better "baseline" for comparison to the model including the interaction?
