How to get coefficient for intercept in felm - r

I run the following code in R, but I do not get a coefficient for the intercept. How can I obtain it?
#load the lfe package, which provides felm()
library(lfe)
#create covariates
x <- rnorm(4000)
x2 <- rnorm(length(x))
#create individual and firm
id <- factor(sample(500,length(x),replace=TRUE))
firm <- factor(sample(300,length(x),replace=TRUE))
#effects
id.eff <- rlnorm(nlevels(id))
firm.eff <- rexp(nlevels(firm))
#left hand side
y <- 50000 + x + 0.25*x2 + id.eff[id] + firm.eff[firm] + rnorm(length(x))
#estimate and print result
est <- felm(y ~ x+x2 | id + firm)
summary(est)
which gives me
Call:
felm(formula = y ~ x + x2 | id + firm)
Residuals:
    Min      1Q  Median      3Q     Max
-3.3129 -0.6147 -0.0009  0.6131  3.2878
Coefficients:
   Estimate Std. Error t value Pr(>|t|)
x   1.00276    0.01834   54.66   <2e-16 ***
x2  0.26190    0.01802   14.54   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.02 on 3199 degrees of freedom
Multiple R-squared(full model): 0.8778   Adjusted R-squared: 0.8472
Multiple R-squared(proj model): 0.4988   Adjusted R-squared: 0.3735
F-statistic(full model): 28.72 on 800 and 3199 DF, p-value: < 2.2e-16
F-statistic(proj model): 1592 on 2 and 3199 DF, p-value: < 2.2e-16
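No intercept is reported because felm absorbs (projects out) the id and firm fixed effects, and a global intercept is not separately identified from those effects: the overall level, including the 50000 here, is folded into them. A minimal sketch of how to inspect the absorbed effects, assuming lfe's getfe() function:
#recover the estimated fixed effects; the overall level is spread
#across the id and firm effects rather than a separate intercept
fe <- getfe(est)
head(fe)
If you need an explicit intercept, one option is an ordinary lm(y ~ x + x2 + id + firm) fit, which reports an intercept against the reference levels of id and firm but is much slower when the factors have many levels.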

Related

Linear regression on dynamic groups in R

I have a data.table data_dt on which I want to run linear regression, where the user can choose the number of columns in groups G1 and G2 via the variable n_col. The following code works perfectly, but it is slow because of the extra time spent creating matrices. To improve performance, is there a way to remove Steps 1, 2, and 3 altogether by tweaking the formula in the lm call and still get the same results?
library(timeSeries)
library(data.table)
data_dt = as.data.table(LPP2005REC[, -1])
n_col = 3 # Choose a number from 1 to 3
######### Step 1 ######### Create the dependent variable (response)
xx <- as.matrix(data_dt[, "SPI"])
######### Step 2 ######### Create Group 1 of independent variables
G1 <- as.matrix(data_dt[, .SD, .SDcols = c(1:n_col + 2)])
######### Step 3 ######### Create Group 2 of independent variables
G2 <- as.matrix(data_dt[, .SD, .SDcols = c(1:n_col + 2 + n_col)])
lm(xx ~ G1 + G2)
Results:
summary(lm(xx ~ G1 + G2))
Call:
lm(formula = xx ~ G1 + G2)
Residuals:
Min 1Q Median 3Q Max
-3.763e-07 -4.130e-09 3.000e-09 9.840e-09 4.401e-07
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.931e-09 3.038e-09 -1.623e+00 0.1054
G1LMI -5.000e-01 4.083e-06 -1.225e+05 <2e-16 ***
G1MPI -2.000e+00 4.014e-06 -4.982e+05 <2e-16 ***
G1ALT -1.500e+00 5.556e-06 -2.700e+05 <2e-16 ***
G2LPP25 3.071e-04 1.407e-04 2.184e+00 0.0296 *
G2LPP40 -5.001e+00 2.360e-04 -2.119e+04 <2e-16 ***
G2LPP60 1.000e+01 8.704e-05 1.149e+05 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.762e-08 on 370 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.104e+12 on 6 and 370 DF, p-value: < 2.2e-16
This may be easier if you just create the formula with reformulate:
out <- lm(reformulate(names(data_dt)[c(1:n_col + 2, 1:n_col + 2 + n_col)],
response = 'SPI'), data = data_dt)
Checking:
> summary(out)
Call:
lm(formula = reformulate(names(data_dt)[c(1:n_col + 2, 1:n_col +
2 + n_col)], response = "SPI"), data = data_dt)
Residuals:
Min 1Q Median 3Q Max
-3.763e-07 -4.130e-09 3.000e-09 9.840e-09 4.401e-07
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.931e-09 3.038e-09 -1.623e+00 0.1054
LMI -5.000e-01 4.083e-06 -1.225e+05 <2e-16 ***
MPI -2.000e+00 4.014e-06 -4.982e+05 <2e-16 ***
ALT -1.500e+00 5.556e-06 -2.700e+05 <2e-16 ***
LPP25 3.071e-04 1.407e-04 2.184e+00 0.0296 *
LPP40 -5.001e+00 2.360e-04 -2.119e+04 <2e-16 ***
LPP60 1.000e+01 8.704e-05 1.149e+05 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.762e-08 on 370 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.104e+12 on 6 and 370 DF, p-value: < 2.2e-16
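For clarity, this is the formula object reformulate() builds for n_col = 3, assuming the column order implied by the output above:
reformulate(names(data_dt)[c(1:3 + 2, 1:3 + 2 + 3)], response = "SPI")
# SPI ~ LMI + MPI + ALT + LPP25 + LPP40 + LPP60
Changing n_col then changes which columns enter each group without building any matrices.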

How would I fit this linear trend function in R?

I want to fit this linear trend function to my data:
Y_t = a + b*t + X_t
This is based on time series data.
I believe writing lm(y ~ time) will fit the equivalent of Y_t = a + X_t, but I am confused about how to include the b*t trend term in this linear trend function in R.
You can simply include it as an explanatory variable:
library(data.table)
d <- data.table(id = 1)
d <- d[, .(year=1:200), by=id]
d[, x1 := runif(200)]
# add an error term
d[, e := rnorm(200, 23, 7)]
# add the dependent variable
d[, y := 3.5*x1 + 0.5*year + e ]
m <- lm(y ~ x1 + year, d)
summary(m)
Call:
lm(formula = y ~ x1 + year, data = d)
Residuals:
Min 1Q Median 3Q Max
-19.2008 -4.4356 0.3986 5.2283 16.6819
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.064776 1.519766 13.203 <2e-16 ***
x1 3.114048 1.914318 1.627 0.105
year 0.523195 0.009187 56.947 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.469 on 197 degrees of freedom
Multiple R-squared: 0.943, Adjusted R-squared: 0.9424
F-statistic: 1628 on 2 and 197 DF, p-value: < 2.2e-16
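If the series is stored as a ts object rather than a data.table, the trend variable can be built with seq_along(). A minimal sketch, using a hypothetical series y_ts:
# hypothetical series with a built-in trend
y_ts <- ts(cumsum(rnorm(120)) + 0.5 * (1:120))
trend <- seq_along(y_ts)          # the t in b*t
trend_fit <- lm(y_ts ~ trend)     # fits Y_t = a + b*t + residual X_t
coef(trend_fit)                   # a = intercept, b = trend coefficient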

Test if intercepts in ancova model are significantly different in R

I ran a model explaining the weight of some plant as a function of time, trying to incorporate the treatment effect.
mod <- lm(weight ~ time + treatment, data = df)
The model summary is:
Call:
lm(formula = weight ~ time + treatment, data = df)
Residuals:
Min 1Q Median 3Q Max
-21.952 -7.674 0.770 6.851 21.514
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -37.5790 3.2897 -11.423 < 2e-16 ***
time 4.7478 0.2541 18.688 < 2e-16 ***
treatmentB 8.2000 2.4545 3.341 0.00113 **
treatmentC 5.4633 2.4545 2.226 0.02797 *
treatmentD 20.3533 2.4545 8.292 2.36e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 9.506 on 115 degrees of freedom
Multiple R-squared: 0.7862, Adjusted R-squared: 0.7788
F-statistic: 105.7 on 4 and 115 DF, p-value: < 2.2e-16
ANOVA table
Analysis of Variance Table
Response: weight
Df Sum Sq Mean Sq F value Pr(>F)
time 1 31558.1 31558.1 349.227 < 2.2e-16 ***
treatment 3 6661.9 2220.6 24.574 2.328e-12 ***
Residuals 115 10392.0 90.4
I want to test the H0 that intercept1 = intercept2 = intercept3 = intercept4. Is this done by simply interpreting the t-value and p-value for the intercept? I guess not, since that row is the baseline (treatment A in this case). I'm a bit puzzled, as most sources I looked up pay little attention to differences in intercepts.
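Since the treatments here shift only the intercept (the slope on time is common to all groups), H0: all intercepts are equal is exactly the hypothesis that the three treatment coefficients are jointly zero, and that is tested with a nested-model F-test rather than the single intercept row. A minimal sketch, assuming the df used in the call above:
# reduced model: one common intercept for all treatments
mod0 <- lm(weight ~ time, data = df)
# full model: a separate intercept shift per treatment, common slope
mod1 <- lm(weight ~ time + treatment, data = df)
# F-test of H0: all treatment intercept shifts are zero
anova(mod0, mod1)
The resulting F statistic matches the treatment row of the ANOVA table above (F = 24.574), which already tests this hypothesis.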

Linear model with 4 predictors

I am trying to fit a linear model with 4 predictors. The problem is that my code doesn't estimate one of the parameters: whichever variable I put last in the lm formula is not estimated. My code is:
AllData <- read.csv("AllBandReflectance.csv",header = T)
Swir2ref <- AllData$band7
x1 <- AllData$X1
x2 <- AllData$X2
y1 <- AllData$Y1
y2 <- AllData$Y2
linear.model <- lm(Swir2ref ~ x1 + y1 + x2 + y2, data = AllData)
summary(linear.model)
Call:
lm(formula = Swir2ref ~ x1 + y1 + x2 + y2, data = AllData)
Residuals:
Min 1Q Median 3Q Max
-0.027277 -0.008793 -0.000689 0.010085 0.035097
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.595593 0.002006 296.964 <2e-16 ***
x1 0.002175 0.003462 0.628 0.532
y1 0.001498 0.003638 0.412 0.682
x2 0.022671 0.018786 1.207 0.232
y2 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.01437 on 67 degrees of freedom
Multiple R-squared: 0.02876, Adjusted R-squared: -0.01473
F-statistic: 0.6613 on 3 and 67 DF, p-value: 0.5787
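The note (1 not defined because of singularities) means y2 is an exact linear combination of the other predictors, so lm drops whichever collinear term comes last in the formula. A minimal sketch of how to inspect the dependency, assuming the model fitted above:
# shows how the dropped term depends on the remaining predictors
alias(linear.model)
# pairwise correlations among the predictors can also reveal it
cor(cbind(x1, x2, y1, y2))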

Polynomial model to data in R [duplicate]

This question already has answers here:
Fitting polynomial model to data in R
(5 answers)
Closed 5 years ago.
Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995,2000,2005,2010,2015)
Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550,632,720,814,920,1044,1186)
How can I find the population for the years 1925, 1963, 1978, 1988, and 1998 using polynomial linear regression?
Here is a starting point for solving your problem.
Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965,
1970,1975,1980,1985,1990,1995,2000,2005,2010,2015)
Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550,
632,720,814,920,1044,1186)
df <- data.frame(Year, Africa)
# Polynomial linear regression of order 5
model1 <- lm(Africa ~ poly(Year,5), data=df)
summary(model1)
###########
Call:
lm(formula = Africa ~ poly(Year, 5), data = df)
Residuals:
Min 1Q Median 3Q Max
-59.639 -27.119 -12.397 9.149 97.398
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 411.32 10.12 40.643 < 2e-16 ***
poly(Year, 5)1 881.26 47.47 18.565 3.01e-12 ***
poly(Year, 5)2 768.50 47.47 16.190 2.42e-11 ***
poly(Year, 5)3 709.43 47.47 14.945 8.07e-11 ***
poly(Year, 5)4 628.45 47.47 13.239 4.89e-10 ***
poly(Year, 5)5 359.04 47.47 7.564 1.14e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 47.47 on 16 degrees of freedom
Multiple R-squared: 0.9852, Adjusted R-squared: 0.9805
F-statistic: 212.5 on 5 and 16 DF, p-value: 4.859e-14
#############
pred <- predict(model1)
plot(Year, Africa, type="o", xlab="Year", ylab="Africa")
lines(Year, pred, lwd=2, col="red")
The model estimated above fits poorly for years before 1900. It is therefore preferable to estimate a model using only the data after 1900.
# Polynomial linear regression of order 2
df2 <- subset(df,Year>1900)
model2 <- lm(Africa ~ poly(Year,2), data=df2)
summary(model2)
###########
Call:
lm(formula = Africa ~ poly(Year, 2), data = df2)
Residuals:
Min 1Q Median 3Q Max
-9.267 -2.489 -0.011 3.334 12.482
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 586.857 1.677 349.93 < 2e-16 ***
poly(Year, 2)1 1086.646 6.275 173.17 < 2e-16 ***
poly(Year, 2)2 245.687 6.275 39.15 3.65e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.275 on 11 degrees of freedom
Multiple R-squared: 0.9997, Adjusted R-squared: 0.9996
F-statistic: 1.576e+04 on 2 and 11 DF, p-value: < 2.2e-16
###########
df2$pred <- predict(model2)
plot(df2$Year, df2$Africa, type="o", xlab="Year", ylab="Africa")
lines(df2$Year, df2$pred, lwd=2, col="red")
The fit of this second model is clearly better.
Finally, we get the model predictions for the years 1925, 1963, 1978, 1988, and 1998.
df3 <- data.frame(Year=c(1925, 1963, 1978, 1988, 1998))
df3$pred <- predict(model2, newdata=df3)
df3
Year pred
1 1925 286.4863
2 1963 301.1507
3 1978 451.7210
4 1988 597.6301
5 1998 779.9623
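As a side note, poly() uses orthogonal polynomials by default, which is numerically stable but makes individual coefficients hard to read as powers of Year. A minimal sketch if you want coefficients on the raw powers instead:
# raw = TRUE reports coefficients for Year and Year^2 directly;
# the fitted values are identical to the orthogonal fit
model2_raw <- lm(Africa ~ poly(Year, 2, raw = TRUE), data = df2)
all.equal(fitted(model2), fitted(model2_raw))  # TRUE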
