Once I've done an STL or classical decomposition on time series data, how do I extract the models for each component?
For example, how do I get the slope and intercept for the trend, the period for the seasonal data, and so on?
I can provide sample data if needed, but this is a generic question.
As a partial answer to your question, the trend can be extracted quite easily if it is linear. Here's an example:
library(forecast)
plot(decompose(AirPassengers))
In the case of a linear trend, we can use the tslm() function to extract the intercept and the slope:
tslm(AirPassengers ~ trend)
Call:
lm(formula = formula, data = "AirPassengers", na.action = na.exclude)
Coefficients:
(Intercept) trend
87.653 2.657
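To work with these numbers programmatically, store the fit and pull the coefficients out with coef(). A minimal sketch (fit.trend is just an illustrative name):
fit.trend <- tslm(AirPassengers ~ trend)
coef(fit.trend)             # named vector: (Intercept) and trend
coef(fit.trend)["trend"]    # the slope, i.e. the average increase per period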
To obtain a fit including the seasons, this could be extended like this:
fit <- tslm(AirPassengers ~ trend + season)
> summary(fit)
Call:
lm(formula = formula, data = "AirPassengers", na.action = na.exclude)
Residuals:
Min 1Q Median 3Q Max
-42.121 -18.564 -3.268 15.189 95.085
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 63.50794 8.38856 7.571 5.88e-12 ***
trend 2.66033 0.05297 50.225 < 2e-16 ***
season2 -9.41033 10.74941 -0.875 0.382944
season3 23.09601 10.74980 2.149 0.033513 *
season4 17.35235 10.75046 1.614 0.108911
season5 19.44202 10.75137 1.808 0.072849 .
season6 56.61502 10.75254 5.265 5.58e-07 ***
season7 93.62136 10.75398 8.706 1.17e-14 ***
season8 90.71103 10.75567 8.434 5.32e-14 ***
season9 39.38403 10.75763 3.661 0.000363 ***
season10 0.89037 10.75985 0.083 0.934177
season11 -35.51996 10.76232 -3.300 0.001244 **
season12 -9.18029 10.76506 -0.853 0.395335
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 26.33 on 131 degrees of freedom
Multiple R-squared: 0.9559, Adjusted R-squared: 0.9518
F-statistic: 236.5 on 12 and 131 DF, p-value: < 2.2e-16
If I interpret this result correctly, there is an average monthly increase of 2.66 passengers, and there are on average 9.4 fewer passengers in the second month than in the first month, etc.
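Accordingly, the individual effects can be pulled out of the fit by name; a small sketch using the fit object from above:
coef(fit)["trend"]                            # average monthly increase
coef(fit)[grep("^season", names(coef(fit)))]  # seasonal offsets relative to month 1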
Related
I want to extract a selected output from the lm function. This is the code I have:
fastfood <- openintro::fastfood
L1 <- lm(calories ~ sat_fat + fiber + sugar, data = fastfood)
summary(L1)
This is the output
Call:
lm(formula = calories ~ sat_fat + fiber + sugar, data = fastfood)
Residuals:
Min 1Q Median 3Q Max
-680.18 -88.97 -24.65 57.46 1501.07
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 113.334 15.760 7.191 2.36e-12 ***
sat_fat 30.839 1.180 26.132 < 2e-16 ***
fiber 24.396 2.444 9.983 < 2e-16 ***
sugar 8.890 1.120 7.938 1.37e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 160.8 on 499 degrees of freedom
(12 observations deleted due to missingness)
Multiple R-squared: 0.6726, Adjusted R-squared: 0.6707
F-statistic: 341.7 on 3 and 499 DF, p-value: < 2.2e-16
I need to extract only the following from the above output. How do I get to this?
sat_fat 30.839
Most commonly, coef() is used to return the coefficients, e.g.
coef(L1)
coef(L1)['sat_fat']
You may also want to look at tidy() in the broom package, which returns a nice summary as a data frame, with the coefficients in the estimate column.
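For illustration, a minimal sketch using the L1 model from the question (assuming broom is installed):
library(broom)
tidy(L1)                                       # one row per term, estimates in the 'estimate' column
subset(tidy(L1), term == "sat_fat")$estimate   # just the sat_fat estimate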
I am using lm() on a large data set in R. Using summary() one can get a lot of details about the linear regression between these two parameters.
The part I am confused about is: which value in the Coefficients: section of the summary is the correct one to use as the correlation coefficient?
Sample Data
c1 <- c(1:10)
c2 <- c(10:19)
output <- summary(lm(c1 ~ c2))
Summary
Call:
lm(formula = c1 ~ c2)
Residuals:
Min 1Q Median 3Q Max
-2.280e-15 -8.925e-16 -2.144e-16 4.221e-16 4.051e-15
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.000e+00 2.902e-15 -3.101e+15 <2e-16 ***
c2 1.000e+00 1.963e-16 5.093e+15 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.783e-15 on 8 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 2.594e+31 on 1 and 8 DF, p-value: < 2.2e-16
Is this the correlation coefficient I should use?
output$coefficients[2,1]
1
Please suggest, thanks.
The full variance-covariance matrix of the coefficient estimates is:
fm <- lm(c1 ~ c2)
vcov(fm)
and, in particular, sqrt(diag(vcov(fm))) equals the standard-error column coef(summary(fm))[, 2].
The corresponding correlation matrix is:
cov2cor(vcov(fm))
The correlation between the coefficient estimates is:
cov2cor(vcov(fm))[1, 2]
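Putting this together with the sample data from the question, a minimal runnable sketch:
c1 <- c(1:10)
c2 <- c(10:19)
fm <- lm(c1 ~ c2)
sqrt(diag(vcov(fm)))      # standard errors, identical to coef(summary(fm))[, 2]
cov2cor(vcov(fm))[1, 2]   # correlation between the intercept and slope estimates
cor(c1, c2)               # for comparison: the correlation between the variables themselves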
I have data from an experiment with two conditions (dichotomous IV: 'condition'). I also want to make use of another IV, which is metric ('hh'). My DV is metric as well ('attention.hh'). I've already run a multiple regression model with an interaction of my IVs. For that, I centered the metric IV:
hh.cen <- as.numeric(scale(data$hh, scale = FALSE))
with these variables I ran the following analysis:
model.hh <- lm(attention.hh ~ hh.cen * condition, data = data)
summary(model.hh)
The results are as follows:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.04309 3.83335 0.011 0.991
hh.cen 4.97842 7.80610 0.638 0.525
condition 4.70662 5.63801 0.835 0.406
hh.cen:condition -13.83022 11.06636 -1.250 0.215
However, the theory behind my analysis tells me that I should expect a quadratic relation between my metric IV (hh) and the DV (but only in one condition).
Looking at the plot (not shown here), one could at least suspect this relation.
Of course I want to test this statistically, but I'm now struggling with how to set up the linear regression model.
I have two solutions that I think should be good, but they lead to different outcomes, and unfortunately I don't know which one is right. I know that by including interactions (and 3-way interactions) in the model, I also have to include all simple/main effects as well.
Solution 1: including all terms on their own.
Therefore I first center the DV and compute the squared IV (hh.sqr, which is needed in the model below):
attention.hh.cen <- scale(data$attention.hh, scale = FALSE)
hh.sqr <- hh.cen^2
Now I can fit the linear model:
sqr.model.1 <- lm(attention.hh.cen ~ condition + hh.cen + hh.sqr + (condition:hh.cen) + (condition:hh.sqr), data = data)
summary(sqr.model.1)
This leads to the following outcome:
Call:
lm(formula = attention.hh.cen ~ condition + hh.cen + hh.sqr +
(condition:hh.cen) + (condition:hh.sqr), data = data)
Residuals:
Min 1Q Median 3Q Max
-53.798 -14.527 2.912 13.111 49.119
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.3475 3.5312 -0.382 0.7037
condition -9.2184 5.6590 -1.629 0.1069
hh.cen 4.0816 6.0200 0.678 0.4996
hh.sqr 5.0555 8.1614 0.619 0.5372
condition:hh.cen -0.3563 8.6864 -0.041 0.9674
condition:hh.sqr 33.5489 13.6448 2.459 0.0159 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 20.77 on 87 degrees of freedom
Multiple R-squared: 0.1335, Adjusted R-squared: 0.08365
F-statistic: 2.68 on 5 and 87 DF, p-value: 0.02664
Solution 2: letting R include all main effects of an interaction by using *.
sqr.model.2 <- lm(attention.hh.cen ~ condition * I(hh.cen^2), data = data)
summary(sqr.model.2)
IMHO, this should also be fine; however, the output is not the same as the one produced by the code above.
Call:
lm(formula = attention.hh.cen ~ condition * I(hh.cen^2), data = data)
Residuals:
Min 1Q Median 3Q Max
-52.297 -13.353 2.508 12.504 49.740
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.300 3.507 -0.371 0.7117
condition -8.672 5.532 -1.567 0.1206
I(hh.cen^2) 4.490 8.064 0.557 0.5791
condition:I(hh.cen^2) 32.315 13.190 2.450 0.0162 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 20.64 on 89 degrees of freedom
Multiple R-squared: 0.1254, Adjusted R-squared: 0.09587
F-statistic: 4.252 on 3 and 89 DF, p-value: 0.007431
I'd rather go with solution 1, but I'm not sure about that.
Maybe someone has a better solution or can help me out?
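For reference, note that in R's formula language condition * I(hh.cen^2) expands to condition + I(hh.cen^2) + condition:I(hh.cen^2), so solution 2 omits the linear hh.cen term; that is why the two outputs differ. A sketch of a compact formula that expands to the same five terms as solution 1 (assuming hh.sqr <- hh.cen^2 as above):
# expands to condition + hh.cen + I(hh.cen^2) + condition:hh.cen + condition:I(hh.cen^2)
sqr.model.1b <- lm(attention.hh.cen ~ condition * (hh.cen + I(hh.cen^2)), data = data)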
After running a regression, how can I select the variable name and corresponding parameter estimate?
For example, after running the following regression, I obtain:
set.seed(1)
n=1000
x=rnorm(n,0,1)
y=.6*x+rnorm(n,0,sqrt(1-.6)^2)
(reg1=summary(lm(y~x)))
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-1.2994 -0.2688 -0.0055 0.3022 1.4577
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.006475 0.013162 -0.492 0.623
x 0.602573 0.012723 47.359 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4162 on 998 degrees of freedom
Multiple R-squared: 0.6921, Adjusted R-squared: 0.6918
F-statistic: 2243 on 1 and 998 DF, p-value: < 2.2e-16
I would like to be able to select the coefficient by the variable names (e.g., (Intercept) -0.006475)
I have tried the following but nothing works...
attr(reg1$coefficients,"terms")
names(reg1$coefficients)
Note: reg1$coefficients[1,1] works, but I want to be able to call the value by name rather than by row/column.
The package broom tidies a lot of regression models very nicely.
require(broom)
set.seed(1)
n=1000
x=rnorm(n,0,1)
y=.6*x+rnorm(n,0,sqrt(1-.6)^2)
model = lm(y~x)
tt <- tidy(model, conf.int=TRUE)
subset(tt,term=="x")
## term estimate std.error statistic p.value conf.low conf.high
## 2 x 0.602573 0.01272349 47.35908 1.687125e-257 0.5776051 0.6275409
with(tt,tt[term=="(Intercept)","estimate"])
## [1] -0.006474794
Your code saves the summary rather than the fitted model itself, so I changed it a bit:
set.seed(1)
n=1000
x=rnorm(n,0,1)
y=.6*x+rnorm(n,0,sqrt(1-.6)^2)
model = lm(y~x)
Now, I can call coef(model)["x"] or coef(model)["(Intercept)"] and get the values.
> coef(model)["x"]
x
0.602573
> coef(model)["(Intercept)"]
(Intercept)
-0.006474794
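If you would rather keep the saved summary object (reg1 above), its coefficient matrix can also be indexed by name; a small sketch:
reg1$coefficients["x", "Estimate"]             # slope
reg1$coefficients["(Intercept)", "Estimate"]   # intercept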
Suppose I want to fit a linear regression model with a degree-two (orthogonal) polynomial and then predict the response. Here is the code for the first model (m1):
x=1:100
y=-2+3*x-5*x^2+rnorm(100)
m1=lm(y~poly(x,2))
prd.1=predict(m1,newdata=data.frame(x=105:110))
Now let's try the same model, but instead of using poly(x,2) directly, I will use its columns:
m2=lm(y~poly(x,2)[,1]+poly(x,2)[,2])
prd.2=predict(m2,newdata=data.frame(x=105:110))
Let's look at the summaries of m1 and m2.
> summary(m1)
Call:
lm(formula = y ~ poly(x, 2))
Residuals:
Min 1Q Median 3Q Max
-2.50347 -0.48752 -0.07085 0.53624 2.96516
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.677e+04 9.912e-02 -169168 <2e-16 ***
poly(x, 2)1 -1.449e+05 9.912e-01 -146195 <2e-16 ***
poly(x, 2)2 -3.726e+04 9.912e-01 -37588 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9912 on 97 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.139e+10 on 2 and 97 DF, p-value: < 2.2e-16
> summary(m2)
Call:
lm(formula = y ~ poly(x, 2)[, 1] + poly(x, 2)[, 2])
Residuals:
Min 1Q Median 3Q Max
-2.50347 -0.48752 -0.07085 0.53624 2.96516
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.677e+04 9.912e-02 -169168 <2e-16 ***
poly(x, 2)[, 1] -1.449e+05 9.912e-01 -146195 <2e-16 ***
poly(x, 2)[, 2] -3.726e+04 9.912e-01 -37588 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9912 on 97 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.139e+10 on 2 and 97 DF, p-value: < 2.2e-16
So m1 and m2 are basically the same. Now let's look at the predictions prd.1 and prd.2
> prd.1
1 2 3 4 5 6
-54811.60 -55863.58 -56925.56 -57997.54 -59079.52 -60171.50
> prd.2
1 2 3 4 5 6
49505.92 39256.72 16812.28 -17827.42 -64662.35 -123692.53
Q1: Why is prd.2 so different from prd.1?
Q2: How can I obtain prd.1 using the model m2?
m1 is the right way to do this. m2 is entering a whole world of pain...
To do predictions from m2, the model needs to know it was fitted to an orthogonal set of basis functions, so that it uses the same basis functions for the extrapolated new data values. Compare poly(1:10,2)[,2] with poly(1:12,2)[,2]: the first ten values are not the same. If you fit the model explicitly with poly(x,2), then predict understands all that and does the right thing.
What you have to do is make sure your prediction locations are transformed using the same set of basis functions as were used to create the model in the first place. You can use predict.poly for this (note I call my explanatory variables x1 and x2 so that it's easy to match the names up):
px = poly(x,2)
x1 = px[,1]
x2 = px[,2]
m3 = lm(y~x1+x2)
newx = 90:110
pnew = predict(px,newx) # px is the previous poly object, so this calls predict.poly
prd.3 = predict(m3, newdata=data.frame(x1=pnew[,1],x2=pnew[,2]))
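As a quick sanity check (a sketch; in newx = 90:110, positions 16:21 correspond to x = 105:110), prd.3 should reproduce prd.1 up to names:
all.equal(unname(prd.3[16:21]), unname(prd.1))   # expected: TRUE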