Panel data regression: Robust standard errors in R

My problem is this: I get NA where I should get values when computing robust standard errors.
I am trying to run a fixed effects panel regression with cluster-robust standard errors. For this, I follow Arai (2011), who on p. 3 follows Stock/Watson (2006) (later published in Econometrica, for those who have access). I would like to correct the degrees of freedom by the factor (M/(M-1))*((N-1)/(N-K)) to guard against downward bias, since my number of clusters is finite and my data are unbalanced.
Similar problems have been posted before [1, 2] on StackOverflow and related problems [3] on CrossValidated.
Arai (and the answer in the first link) uses the following functions (I provide my data below, with some further comments):
gcenter <- function(df1, group) {
  variables <- paste(rep("C", ncol(df1)), colnames(df1), sep = ".")
  copydf <- df1
  for (i in 1:ncol(df1)) {
    copydf[, i] <- df1[, i] - ave(df1[, i], group, FUN = mean)
  }
  colnames(copydf) <- variables
  return(cbind(df1, copydf))
}
# 1-way adjustment for clusters
clx <- function(fm, dfcw, cluster) {
  # R code (www.r-project.org) for computing clustered standard errors.
  # Mahmood Arai, Jan 26, 2008.
  # The arguments of the function are:
  # the fitted model (fm), the dfcw correction factor, and the cluster variable
  # You need the `sandwich' and `lmtest' packages installed
  # reweighting the var-cov matrix for the within model
  library(sandwich); library(lmtest)
  M <- length(unique(cluster))
  N <- length(cluster)
  K <- fm$rank
  dfc <- (M/(M-1)) * ((N-1)/(N-K))
  uj <- apply(estfun(fm), 2, function(x) tapply(x, cluster, sum))
  vcovCL <- dfc * sandwich(fm, meat = crossprod(uj)/N) * dfcw
  coeftest(fm, vcovCL)
}
where gcenter computes deviations from the group means (the within/fixed effects transformation). I then run the regression with DS_CODE as my cluster variable (I have named my data 'data'):
centerdata <- gcenter(data, data$DS_CODE)
datalm <- lm(C.L1.retE1M ~ C.MCAP_SEC + C.Impact_change + C.Mom + C.BM + C.PD + C.CashGen + C.NITA + C.PE + C.PEdummy + factor(DS_CODE), data=centerdata)
M <- length(unique(data$DS_CODE))
dfcw <- datalm$df / (datalm$df - (M-1))
and want to calculate
clx(datalm, dfcw, data$DS_CODE)
However, when I compute uj (see the clx function above) for the variance, I get values only at the beginning for my regressors and then lots of zeros. If this uj is used for the variance, only NAs result.
My data
Since my data may have a special structure and I can't figure out the problem, I post the entire data set as a link (hosted on Hotmail). The reason is that with other data (taken from Arai (2011)) my problem does not occur. Sorry in advance for the mess, but I'd be very grateful if you could have a look at it nevertheless.
The file is a 5 MB .txt file containing only the data.

After some time playing around, it works for me and gives me:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.5099e-16 5.2381e-16 0.8610 0.389254
C.MCAP_SEC -5.9769e-07 1.2677e-07 -4.7149 2.425e-06 ***
C.Impact_change -5.3908e-04 7.5601e-05 -7.1306 1.014e-12 ***
C.Mom 3.7560e-04 3.3378e-03 0.1125 0.910406
C.BM -1.6438e-04 1.7368e-05 -9.4645 < 2.2e-16 ***
C.PD 6.2153e-02 3.8766e-02 1.6033 0.108885
C.CashGen -2.7876e-04 1.4031e-02 -0.0199 0.984149
C.NITA -8.1792e-02 3.2153e-02 -2.5438 0.010969 *
C.PE -6.6170e-06 4.0138e-06 -1.6485 0.099248 .
C.PEdummy 1.3143e-02 4.8864e-03 2.6897 0.007154 **
factor(DS_CODE)130324 -5.2497e-16 5.2683e-16 -0.9965 0.319028
factor(DS_CODE)130409 -4.0276e-16 5.2384e-16 -0.7689 0.441986
factor(DS_CODE)130775 -4.4113e-16 5.2424e-16 -0.8415 0.400089
...
This leaves us with the question of why it doesn't work for you. I suspect it has something to do with the format of your data. Is everything numeric? I converted the column classes, and it looks like this for me:
str(dat)
'data.frame': 48251 obs. of 12 variables:
$ DS_CODE : chr "902172" "902172" "902172" "902172" ...
$ DNEW : num 2e+05 2e+05 2e+05 2e+05 2e+05 ...
$ MCAP_SEC : num 78122 71421 81907 80010 82462 ...
$ NITA : num 0.135 0.135 0.135 0.135 0.135 ...
$ CashGen : num 0.198 0.198 0.198 0.198 0.198 ...
$ BM : num 0.1074 0.1108 0.097 0.0968 0.0899 ...
$ PE : num 57 55.3 63.1 63.2 68 ...
$ PEdummy : num 0 0 0 0 0 0 0 0 0 0 ...
$ L1.retE1M : num -0.72492 0.13177 0.00122 0.07214 -0.07332 ...
$ Mom : num 0 0 0 0 0 ...
$ PD : num 5.41e-54 1.51e-66 3.16e-80 2.87e-79 4.39e-89 ...
$ Impact_change: num 0 -10.59 -10.43 0.7 -6.97 ...
What does str(data) return for you?
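If some of your columns come in as character or factor, a conversion along these lines should reproduce the structure shown above (a sketch; "yourfile.txt" and the column split are assumptions about your data):
dat <- read.table("yourfile.txt", header = TRUE, stringsAsFactors = FALSE)
dat$DS_CODE <- as.character(dat$DS_CODE)            # keep the cluster id as character
num.cols <- setdiff(names(dat), "DS_CODE")
dat[num.cols] <- lapply(dat[num.cols], as.numeric)  # everything else numeric
str(dat)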

The plm package can estimate clustered SEs for panel regressions. The original data is no longer available, so here's an example using dummy data.
require(foreign)
require(plm)
require(lmtest)
test <- read.dta("http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.dta")
fpm <- plm(y ~ x, test, model='pooling', index=c('firmid', 'year'))
##Arellano clustered by *group* SEs
> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="group", type="HC0"))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.029680 0.066939 0.4434 0.6575
x 1.034833 0.050540 20.4755 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
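The question is about a fixed effects (within) regression rather than pooled OLS; the same clustered vcovHC call also works on a within model. A minimal sketch on the same test data:
fpm.fe <- plm(y ~ x, test, model = "within", index = c("firmid", "year"))
coeftest(fpm.fe, vcov = function(x) vcovHC(x, cluster = "group", type = "HC0"))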
If you're using lm models (instead of plm), then the multiwayvcov package may help.
library("lmtest")
library("multiwayvcov")
data(petersen)
m1 <- lm(y ~ x, data = petersen)
> coeftest(m1, vcov=function(x) cluster.vcov(x, petersen[ , c("firmid")],
df_correction=FALSE))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.029680 0.066939 0.4434 0.6575
x 1.034833 0.050540 20.4755 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
For more details see:
Fama-MacBeth and Cluster-Robust (by Firm and Time) Standard Errors in R.
See also:
Double clustered standard errors for panel data
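For reference, cluster.vcov() also handles the double clustering (by firm and time) discussed in those links when you pass both cluster columns; a sketch reusing m1 and petersen from above:
coeftest(m1, vcov = function(x) cluster.vcov(x, petersen[, c("firmid", "year")]))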

Related

F-score and standardized Beta for heteroscedasticity-corrected covariance matrix (hccm) in R

I have multiple regression models which failed Breusch-Pagan tests, and so I've recalculated the variance using a heteroscedasticity-corrected covariance matrix, like this: coeftest(lm.model,vcov=hccm(lm.model)). coeftest() is from the lmtest package, while hccm() is from the car package.
I'd like to provide F-scores and standardized betas, but am not sure how to do this, because the output looks like this...
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.000261 0.038824 0.01 0.995
age 0.004410 0.041614 0.11 0.916
exercise -0.044727 0.023621 -1.89 0.059 .
tR -0.038375 0.037531 -1.02 0.307
allele1_num 0.013671 0.038017 0.36 0.719
tR:allele1_num -0.010077 0.038926 -0.26 0.796
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Any advice on how to report these so they are as consistent as possible with the standard summary() and Anova() output from R and car, and the function std_beta() from the sjmisc package?
In case anyone else has this question, here was my solution. It is not particularly elegant, but it works.
I simply used the std_beta() function as a template and swapped in the standard errors derived from the heteroscedasticity-corrected covariance matrix.
# This is adapted from the std_beta function in the sjmisc package.
# =====================================
library(car)     # hccm()
library(lmtest)  # coeftest()
library(sjmisc)  # to_value()
b <- coef(lm.model)                                        # unstandardized estimates
b <- b[-1]                                                 # drop the intercept
fit.data <- as.data.frame(stats::model.matrix(lm.model))  # model matrix used in the fit
fit.data <- fit.data[, -1]                                 # drop the intercept column
fit.data <- as.data.frame(sapply(fit.data, function(x) if (is.factor(x))
  to_value(x, keep.labels = F)
  else x))
sx <- sapply(fit.data, sd, na.rm = T)                          # SDs of the predictors
sy <- sapply(as.data.frame(lm.model$model)[1], sd, na.rm = T)  # SD of the response
beta <- b * sx/sy                                          # standardized betas
se <- coeftest(lm.model, vcov = hccm(lm.model))[, 2]       # ** use the HCCM covariance for the SEs **
se <- se[-1]
beta.se <- se * sx/sy                                      # standardized SEs
data.frame(beta = beta,
           ci.low = beta - beta.se * 1.96,
           ci.hi  = beta + beta.se * 1.96)
For the F-scores, I just squared the t-values.
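A sketch of that step, assuming lm.model and the hccm-based coeftest() call from above: with one numerator degree of freedom, each coefficient's F statistic is just its squared t statistic.
f.scores <- coeftest(lm.model, vcov = hccm(lm.model))[, "t value"]^2
f.scores <- f.scores[-1]  # drop the intercept, consistent with the betas above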
I hope this saves someone some time.

Error comparing linear mixed effects models

I want to see whether the fixed effect Group2 in my model is significant. The model is:
Response ~ Group1 + Group2 + Gender + Age + BMI + (1 | Subject)
To check the significance I create a null model not containing the effect Group2:
Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
and the full model containing the effect Group2:
Resp.model = lmer(Response~Group1+Group2+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
Then I use anova() to compare the two, but I get an error:
anova(Resp.null, Resp.model)
## Error in anova.merMod(Resp.null, Resp.model) :
## models were not all fitted to the same size of dataset
I think that the problem is that Group1 contains NaN, but I thought that linear mixed models were robust to missing data.
How can I solve this problem and compare the two models?
Do I have to delete the rows corresponding to NaN and fit Resp.null without these rows?
The data can be downloaded here.
Please note that you should replace "<undefined>" with NA like this:
mydata = read.csv("mydata.csv")
mydata[mydata == "<undefined>"] <- NA
To avoid the "models were not all fitted to the same size of dataset" error in anova, you must fit both models on exactly the same subset of data.
There are two simple ways to do this. The reproducible example below uses lm and update, but the same approach should work for lmer objects:
# 1st approach
# define a convenience wrapper
update_nested <- function(object, formula., ..., evaluate = TRUE) {
  update(object = object, formula. = formula., data = object$model,
         ..., evaluate = evaluate)
}
# prepare data with NAs
data(mtcars)
for(i in 1:ncol(mtcars)) mtcars[i,i] <- NA
xa <- lm(mpg~cyl+disp, mtcars)
xb <- update_nested(xa, .~.-cyl)
anova(xa, xb)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 2nd approach
xc <- update(xa, .~.-cyl, data=na.omit(mtcars[ , all.vars(formula(xa))]))
anova(xa, xc)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
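For lmer specifically, the second approach amounts to fitting both models on the complete cases of the model variables; a sketch using the variable names from the question's formulas:
library(lme4)
vars <- c("Response", "Group1", "Group2", "Gender", "Age", "BMI", "Subject")
cc <- na.omit(mydata[, vars])   # drop rows with missing values in any model variable
Resp.model <- lmer(Response ~ Group1 + Group2 + Gender + Age + BMI + (1|Subject),
                   data = cc, REML = FALSE)
Resp.null  <- lmer(Response ~ Group1 + Gender + Age + BMI + (1|Subject),
                   data = cc, REML = FALSE)
anova(Resp.null, Resp.model)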
If, however, you're only interested in testing a single variable (e.g. Group2), then perhaps Anova() or linearHypothesis() from car would work as well for this use case.
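For example (a sketch, assuming the full model Resp.model from above; Anova() on a merMod object returns Wald chi-square tests for each fixed effect, including Group2):
library(car)
Anova(Resp.model)  # Type II Wald chi-square tests; Group2 is tested jointly
# linearHypothesis() works too, but needs the exact coefficient name(s),
# e.g. linearHypothesis(Resp.model, "Group2B = 0"), where "Group2B" is a
# hypothetical name lmer might give a level of Group2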
See also:
How to update `lm` or `glm` model on same subset of data?
R error which says "Models were not all fitted to the same size of dataset"
Fit Resp.model first, then use Resp.model@frame as the data argument.
Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
data=Resp.model@frame, REML=FALSE)

How to obtain lsmeans() pairwise contrasts with custom vcov?

I would like to get pairwise comparisons of adjusted means using lsmeans(), while supplying a robust coefficient-covariance matrix (e.g. vcovHC). Usually functions on regression models provide a vcov argument, but I can't seem to find any such argument in the lsmeans package.
Consider this dummy example, using the Moore data from the car package:
require(car)
require(lmtest)
require(sandwich)
require(lsmeans)
mod.moore.2 <- lm(conformity ~ fcategory + partner.status, data=Moore)
coeftest(mod.moore.2)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.197778 1.372669 7.4292 4.111e-09 ***
## fcategorymedium -1.176000 1.902026 -0.6183 0.539805
## fcategoryhigh -0.080889 1.809187 -0.0447 0.964555
## partner.statushigh 4.606667 1.556460 2.9597 0.005098 **
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
coeftest(mod.moore.2, vcov.=vcovHAC)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.197778 0.980425 10.4014 4.565e-13 ***
## fcategorymedium -1.176000 1.574682 -0.7468 0.459435
## fcategoryhigh -0.080889 2.146102 -0.0377 0.970117
## partner.statushigh 4.606667 1.437955 3.2036 0.002626 **
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
lsmeans(mod.moore.2, list(pairwise ~ fcategory), adjust="none")[[2]]
## contrast estimate SE df t.ratio p.value
## low - medium 1.17600000 1.902026 41 0.618 0.5398
## low - high 0.08088889 1.809187 41 0.045 0.9646
## medium - high -1.09511111 1.844549 41 -0.594 0.5560
##
## Results are averaged over the levels of: partner.status
As you can see, lsmeans() estimates p-values using the default variance-covariance matrix.
How can I obtain pairwise contrasts using the vcovHAC variance estimate?
It turns out that there is a wonderful and seamless interface between the lsmeans and multcomp packages (see ?lsm): lsmeans provides support for glht().
require(multcomp)
x <- glht(mod.moore.2, lsm(pairwise ~ fcategory), vcov=vcovHAC)
## Note: df set to 41
summary(x, test=adjusted("none"))
##
## Simultaneous Tests for General Linear Hypotheses
##
## Fit: lm(formula = conformity ~ fcategory + partner.status, data = Moore)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## low - medium == 0 1.17600 1.57468 0.747 0.459
## low - high == 0 0.08089 2.14610 0.038 0.970
## medium - high == 0 -1.09511 1.86197 -0.588 0.560
## (Adjusted p values reported -- none method)
This is at least one way to achieve this. I'm still hoping someone knows of an approach using lsmeans only...
Another way to approach this is to hack into the lsmeans object and manually replace the variance-covariance matrix before summarizing it.
mod.lsm <- lsmeans(mod.moore.2, ~ fcategory)
mod.lsm@V <- vcovHAC(mod.moore.2)  ## replace the default vcov with the custom vcov
pairs(mod.lsm, adjust = "none")
## contrast estimate SE df t.ratio p.value
## low - medium 1.17600000 1.574682 41 0.747 0.4594
## low - high 0.08088889 2.146102 41 0.038 0.9701
## medium - high -1.09511111 1.861969 41 -0.588 0.5597
##
## Results are averaged over the levels of: partner.status
I'm not sure whether this was possible with the 'lsmeans' package, but it is with the updated emmeans package.
Moore <- within(carData::Moore, {
partner.status <- factor(partner.status, c("low", "high"))
fcategory <- factor(fcategory, c("low", "medium", "high"))
})
mod.moore.2 <- lm(conformity ~ fcategory + partner.status, data=Moore)
lmtest::coeftest(mod.moore.2, vcov.= sandwich::vcovHAC)
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 10.197778 0.980425 10.4014 4.565e-13 ***
#> fcategorymedium -1.176000 1.574682 -0.7468 0.459435
#> fcategoryhigh -0.080889 2.146102 -0.0377 0.970117
#> partner.statushigh 4.606667 1.437955 3.2036 0.002626 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
emmeans::emmeans(
mod.moore.2, trt.vs.ctrl ~ fcategory,
vcov = sandwich::vcovHAC(mod.moore.2),
adjust = "none")$contrasts
#> contrast estimate SE df t.ratio p.value
#> medium - low -1.1760 1.57 41 -0.747 0.4594
#> high - low -0.0809 2.15 41 -0.038 0.9701
#>
#> Results are averaged over the levels of: partner.status
Created on 2021-07-08 by the reprex package (v0.3.0)
Note, you can't just write the following
emmeans::emmeans(
mod.moore.2, trt.vs.ctrl ~ fcategory,
vcov = sandwich::vcovHAC,
adjust = "none")$contrasts
due to a conflict with the sandwich::vcovHAC function, which also has an adjust argument. (I had incorrectly thought this was a bug.)
OR
use update to inject a custom vcov matrix into your emmeans/emmGrid object.
Example:
# create an emmeans object from your fitted model
emmob <- emmeans(thismod, ~ predictor)
# generate a robust vcov matrix using a function
# from the sandwich or clubSandwich package
vcovR <- vcovHC(thismod, type="HC3")
# turn the resulting object into a (square) matrix
vcovRm <- matrix(vcovR, ncol=ncol(vcovR))
# update the V slot of the emmeans/emmGrid object
emmob <- update(emmob, V=vcovRm)
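The updated object then feeds the usual contrast machinery, so the robust covariance carries through to the pairwise tests; continuing the sketch above:
pairs(emmob, adjust = "none")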

R Linear Regression Data in Single Column

I have the following data as an example:
InputName InputValue Output
===================================
Oxide 35 0.4
Oxide 35.2 0.42
Oxide 34.6 0.38
Oxide 35.9 0.46
CD 0.5 0.42
CD 0.48 0.4
CD 0.56 0.429
I want to do a linear regression of InputValue vs. Output, treating the different InputName values as separate predictors.
If I want to use lm(Output ~ Oxide + CD) in R, it assumes a separate column for each predictor. In the example above that would mean making a separate column each for Oxide and CD. I can do that using the cast function from the reshape package, which might introduce NAs in the data.
However, is there a way to tell the lm function directly that the input predictors are grouped according to the column InputName, and that the values are given in the column InputValue?
It seems to me you are describing a form of dummy variable coding. This is not necessary in R at all, since any factor column in your data will automatically be dummy coded for you.
Recreate your data:
dat <- read.table(text="
InputName InputValue Output
Oxide 35 0.4
Oxide 35.2 0.42
Oxide 34.6 0.38
Oxide 35.9 0.46
CD 0.5 0.42
CD 0.48 0.4
CD 0.56 0.429
", header=TRUE)
Now build the model you described, but drop the intercept to make things a little bit more explicit:
fit <- lm(Output ~ InputValue + InputName - 1, dat)
summary(fit)
Call:
lm(formula = Output ~ InputValue + InputName - 1, data = dat)
Residuals:
1 2 3 4 5 6 7
-0.003885 0.003412 0.001519 -0.001046 0.004513 -0.014216 0.009703
Coefficients:
Estimate Std. Error t value Pr(>|t|)
InputValue 0.063512 0.009864 6.439 0.00299 **
InputNameCD 0.383731 0.007385 51.962 8.21e-07 ***
InputNameOxide -1.819018 0.346998 -5.242 0.00633 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.009311 on 4 degrees of freedom
Multiple R-squared: 0.9997, Adjusted R-squared: 0.9995
F-statistic: 4662 on 3 and 4 DF, p-value: 1.533e-07
Notice how all of your factor levels for InputName appear in the output, giving you a separate estimate of the effect of each level.
Concisely, the information you need is in these two lines:
InputNameCD 0.383731 0.007385 51.962 8.21e-07 ***
InputNameOxide -1.819018 0.346998 -5.242 0.00633 **
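If you'd rather extract those rows programmatically than read them off the printout, a quick way (using the fit object from above) is:
coef(summary(fit))[c("InputNameCD", "InputNameOxide"), ]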
Here are two ways of doing this: split the data and run the regressions separately, or use interaction terms to specify that the different levels of InputName should have separate slopes:
Split
lapply(split(dat,dat$InputName),lm,formula=Output~InputValue)
$CD
Call:
FUN(formula = ..1, data = X[[1L]])
Coefficients:
(Intercept) InputValue
0.2554 0.3135
$Oxide
Call:
FUN(formula = ..1, data = X[[2L]])
Coefficients:
(Intercept) InputValue
-1.78468 0.06254
Interaction
lm(Output~InputName + InputName:InputValue - 1,dat)
Call:
lm(formula = Output ~ InputName + InputName:InputValue - 1, data = dat)
Coefficients:
InputNameCD InputNameOxide InputNameCD:InputValue InputNameOxide:InputValue
0.25542 -1.78468 0.31346 0.06254
For comparison purposes I've also removed the intercept. Note that the estimated coefficients are the same in each case.

R lm interaction terms with categorical and squared continuous variables

I am trying to get an lm fit for my data. The problem is that I want to fit a linear model (first-order polynomial) when the factor is "true" and a second-order polynomial when the factor is "false". How can I do that using only one lm call?
a=c(1,2,3,4,5,6,7,8,9,10)
b=factor(c("true","false","true","false","true","false","true","false","true","false"))
c=c(10,8,20,15,30,21,40,25,50,31)
DumbData<-data.frame(cbind(a,c))
DumbData<-cbind(DumbData,b=b)
I have tried
Lm2<-lm(c~a + b + b*I(a^2), data=DumbData)
summary(Lm2)
that results in:
summary(Lm2)
Call:
lm(formula = c ~ a + b + b * I(a^2), data = DumbData)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.74483 1.12047 -0.665 0.535640
a 4.44433 0.39619 11.218 9.83e-05 ***
btrue 6.78670 0.78299 8.668 0.000338 ***
I(a^2) -0.13457 0.03324 -4.049 0.009840 **
btrue:I(a^2) 0.18719 0.01620 11.558 8.51e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7537 on 5 degrees of freedom
Multiple R-squared: 0.9982, Adjusted R-squared: 0.9967
F-statistic: 688 on 4 and 5 DF, p-value: 4.896e-07
Here I get I(a^2) for both levels, whereas I want one first-order and one second-order polynomial.
If one tries:
Lm2<-lm(c~a + b + I(b*I(a^2)), data=DumbData)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
In addition: Warning message:
In Ops.factor(b, I(a^2)) : * not meaningful for factors
How can I get the proper interaction terms here???
Thanks Andrie, but there are still some things I am missing here. In this example the variable b is a logical one; if it is a factor with two levels it does not work, so I guess I have to convert the factor variable into a logical one. The other thing I am missing is the negation in the condition: with I(b*a^2), i.e. without the !, I get:
Call: lm(formula = c ~ a + I(b * a^2), data = dat)
Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.2692 1.8425 3.945 0.005565 **
a 2.3222 0.3258 7.128 0.000189 ***
I(b * a^2) 0.3005 0.0355 8.465 6.34e-05 ***
I cannot relate the formulas with and without the ! condition, which is a bit strange to me.
Try something along the following lines:
dat <- data.frame(
a=c(1,2,3,4,5,6,7,8,9,10),
b=c(TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE),
c=c(10,8,20,15,30,21,40,25,50,31)
)
fit <- lm(c ~ a + I(!b * a^2), dat)
summary(fit)
This results in:
Call:
lm(formula = c ~ a + I(!b * a^2), data = dat)
Residuals:
Min 1Q Median 3Q Max
-4.60 -2.65 0.50 2.65 4.40
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.5000 2.6950 3.896 0.005928 **
a 3.9000 0.4209 9.266 3.53e-05 ***
I(!b * a^2)TRUE -13.9000 2.4178 -5.749 0.000699 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.764 on 7 degrees of freedom
Multiple R-squared: 0.9367, Adjusted R-squared: 0.9186
F-statistic: 51.75 on 2 and 7 DF, p-value: 6.398e-05
Note:
I made use of the logical values TRUE and FALSE.
These will coerce to 1 and 0, respectively.
I used the negation !b inside the formula.
Ummm ...
Lm2<-lm(c~a + b + b*I(a^2), data=DumbData)
You say that "The problem I am having is that I want to fit a linear model(1st order polynomial) when the factor is "true" and a second order polynomial when the factor is "false". How can I get that done using only one lm. "
From that I infer that you don't want b to be directly in the model? In addition, a^2 should be included only if b is false.
So that would be...
lm(c~ a + I((!b) * a^2))
If b is true (that is, !b equals FALSE) then a^2 is multiplied by zero (FALSE) and omitted from the equation.
The only problem is that you have defined b as a factor instead of a logical. That can be cured.
# b = factor(c("true","false","true","false","true","false","true","false","true","false"))
# you could use TRUE and FALSE instead of "true" and "false"
# alternatively, after defining b as above, do
# b <- b == "true"   # that converts b to a logical (i.e. boolean TRUE and FALSE values)
OK, to be exact, you supplied "character" values, but factor() converted them, so b is a factor in the data frame ("DumbData").
Another minor point about the way you defined the data frame.
a=c(1,2,3,4,5,6,7,8,9,10)
b=factor(c("true","false","true","false","true","false","true","false","true","false"))
c=c(10,8,20,15,30,21,40,25,50,31)
DumbData<-data.frame(cbind(a,c))
DumbData<-cbind(DumbData,b=b)
Here, cbind is unnecessary. You could have it all on one line:
Dumbdata<- data.frame(a,b,c)
# shorter and cleaner!!
In addition, to convert b to logical use:
Dumbdata<- data.frame(a,b=b=="true",c)
Note: you need to write b = b == "true". It seems redundant, but the LHS (b) gives the name of the variable in the data frame, whereas the RHS (b == "true") is an expression that evaluates to a logical (boolean) value.
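Putting those pieces together, a minimal sketch on the toy data from the question (a, b, and c as defined there):
Dumbdata <- data.frame(a, b = b == "true", c)      # b stored as logical
fit <- lm(c ~ a + I((!b) * a^2), data = Dumbdata)  # a^2 enters only when b is FALSE
summary(fit)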
