Intercept and slope functions in R - r

is there an intercept and slope function in R like there is in excel? I know you can use the function "lm" to run a linear regression but for my purposes it would be much more practical to get the output simply as a number just like using intercept and slope in excel does.

Once you've created your model, you can extract the intercept and slope values from the coefficients matrix within the model. This can be extracted either using the coefficients()/coef() function (these are aliases of the same function), or by extracting the coefficients directly using $coefficient. It's better to use the coefficients() function as this can also be used on models other than lm, and so it is a good habit.
x <- rnorm(100)
y <- 0.5*x + rnorm(100)
mod <- lm(y ~ x)
cf <- coef(mod)
cf will now contain a vector with the (Intercept) and x (a.k.a, the slope). You can then extract these using either numbers:
Intercept <- cf[1]
Slope <- cf[2]
or by their names:
Intercept <- cf["(Intercept)"]
Slope <- cf["x"]
If you're doing multivariable, then it would be advised to use the names, as the order of the output may be unexpected (and again, this is a good habit to get into)

Assuming that the question is asking for intercept and slope functions for the linear model with one independent variable and intercept:
1) mean/cov/var If the idea of the question is to not use lm then try these functions:
slope <- function(x, y) cov(x, y) / var(x)
intercept <- function(x, y) mean(y) - slope(x, y) * mean(x)
To test this use the built-in CO2 data:
coef(lm(uptake ~ conc, CO2))
## (Intercept) conc
## 19.50028981 0.01773059
with(CO2, intercept(conc, uptake))
## [1] 19.50029
with(CO2, slope(conc, uptake))
## [1] 0.01773059
2) lm If it is ok to use lm then:
intercept <- function(x, y) coef(lm(y ~ x))[[1]]
slope <- function(x, y) coef(lm(y ~ x))[[2]]
3) lm.fit Another possibility is to use lm.fit like this:
intercept <- function(x, y) coef(lm.fit(cbind(1, x), y))[[1]]
slope <- function(x, y) coef(lm.fit(cbind(1, x), y))[[2]]

Related

Replace lm coefficients and calculate results of lm new in R

I am able to change the coefficients of my linear model. Then i want to compare the results of my "new" model with the new coefficients, but R is not calculating the results with the new coefficients.
As you can see in my following example the summary of my models fit and fit1 are excactly the same, though results like multiple R-squared should or fitted values should change.
set.seed(2157010) #forgot set.
x1 <- 1998:2011
x2 <- x1 + rnorm(length(x1))
y <- 3*x2 + rnorm(length(x1)) #you had x, not x1 or x2
fit <- lm( y ~ x1 + x2)
# view original coefficients
coef(fit)
# generate second function for comparing results
fit1 <- fit
# replace coefficients with new values, use whole name which is coefficients:
fit1$coefficients[2:3] <- c(5, 1)
# view new coefficents
coef(fit1)
# Comparing
summary(fit)
summary(fit1)
Thanks in advance
It might be easier to compute the multiple R^2 yourself with the substituted parameters.
mult_r2 <- function(beta, y, X) {
tot_ss <- var(y) * (length(y) - 1)
rss <- sum((y - X %*% beta)^2)
1 - rss/tot_ss
}
(or, more compactly, following the comments, you could compute p <- X %*% beta; (cor(y,beta))^2)
mult_r2(coef(fit), y = model.response(model.frame(fit)), X = model.matrix(fit))
## 0.9931179, matches summary()
Now with new coefficients:
new_coef <- coef(fit)
new_coef[2:3] <- c(5,1)
mult_r2(new_coef, y = model.response(model.frame(fit)), X = model.matrix(fit))
## [1] -343917
That last result seems pretty wild, but the substituted coefficients are very different from the true least-squares coeffs, and negative R^2 is possible when the model is bad enough ...

How R calculates the Regression coefficients using lm() function

I wanted to replicate R's calculation on estimation of regression equation on below data:
set.seed(1)
Vec = rnorm(1000, 100, 3)
DF = data.frame(X1 = Vec[-1], X2 = Vec[-length(Vec)])
Below R reports estimates of coefficients
coef(lm(X1~X2, DF)) ### slope = -0.03871511
Then I manually estimate the regression estimate for slope
(sum(DF[,1]*DF[,2])*nrow(DF) - sum(DF[,1])*sum(DF[,2])) / (nrow(DF) * sum(DF[,1]^2) - (sum(DF[,1])^2)) ### -0.03871178
They are close but still are nor matching exactly.
Can you please help me to understand what am I missing here?
Any pointer will be very helpful.
The problem is that X1 and X2 are switched in lm relative to the long formula.
Background
The formula for slope in lm(y ~ x) is the following where x and y each have length n and x is short for x[i] and y is short for y[i] and the summations are over i = 1, 2, ..., n.
Source of the problem
Thus the long formula in the question, also shown in (1) below, corresponds to lm(X2 ~ X1, DF) and not to lm(X1 ~ X2, DF). Either change the formula in the lm model as in (1) below or else change the long formula in the answer by replacing each occurrence of DF[, 1] in the denominator with DF[, 2] as in (2) below.
# (1)
coef(lm(X2 ~ X1, DF))[[2]]
## [1] -0.03871178
(sum(DF[,1]*DF[,2])*nrow(DF) - sum(DF[,1])*sum(DF[,2])) /
(nrow(DF) * sum(DF[,1]^2) - (sum(DF[,1])^2)) # as in question
## [1] -0.03871178
# (2)
coef(lm(X1 ~ X2, DF))[[2]] # as in question
## [1] -0.03871511
(sum(DF[,1]*DF[,2])*nrow(DF) - sum(DF[,1])*sum(DF[,2])) /
(nrow(DF) * sum(DF[,2]^2) - (sum(DF[,2])^2))
## [1] -0.03871511
This is not a StackOverflow question per se, but rather a stats question for the sister site.
The narrow answer is that you can look into the R sources; it generally farms off to LAPACK and BLAS but a key part of the regression calculation is specialised in order to deal (in a statistically, rather than numerical way) with low-rank cases.
Anyway, here, I believe you are 'merely' not adjusting for degrees of freedom correctly which 'almost but not quite' washes out when you use 1000 observations. A simpler case follows, along with a 'simpler' way to calculate the coefficient 'by hand' which also has the advantage of matching:
> set.seed(1)
> Vec <- rnorm(5,100,3)
> DF <- data.frame(X1=Vec[-1], X2=Vec[-length(Vec)])
> coef(lm(X1 ~ X2, DF))[2]
X2
-0.322898
> cov(DF$X1, DF$X2) / var(DF$X2)
[1] -0.322898
>
coef(lm(X1~X2, DF))
# (Intercept) X2
# 103.83714016 -0.03871511
You can apply the formula of coefficients in OLS matrix form as below.
X = cbind(1,DF[,2])
solve(t(X) %*% (X)) %*% t(X)%*% as.matrix(DF[,1])
giving,
# [,1]
#[1,] 103.83714016
#[2,] -0.03871511
which is same with lm() output.
Data:
set.seed(1)
Vec = rnorm(1000, 100, 3)
DF = data.frame(X1 = Vec[-1], X2 = Vec[-length(Vec)])

Is it possible to cache `lm()` matrices to fit new data?

I wrote an algorithm which fits a linear model with lm() and then "updates" the response variable iteratively. The problem is: In a high-dimension scenario, fitting linear models creates a bottleneck.
On the other hand, most of the work required is a matrix inversion that only depends on the covariate matrix X, i.e., the coefficients are given by: solve(t(X) %*% X) %*% X %*% y.
Reading lm() code, I understand that R uses QR decomposition.
Is it possible to recover the internal matrix operation used and fit a new model with new y values faster?
Here's a minimal example:
set.seed(1)
X <- matrix(runif(400*150000), nrow = 150000)
y1 <- runif(150000)
y2 <- runif(150000)
mod1 <- lm(y1 ~ X)
mod2 <- lm(y2 ~ X)
Theoretically, mod2 "repeats" costful matrix operations identical to the ones made in the first lm() call.
I want to keep using lm() for its efficient implementation and ability to handle incomplete rank matrices automatically.
# Data
set.seed(1)
n = 5
X <- matrix(runif(5*n), nrow = n)
y1 <- runif(n)
y2 <- runif(n)
# lm models
mod1 <- lm(y1 ~ X)
mod2 <- lm(y2 ~ X)
# Obtain QR decomposition of X
q = qr(X)
# Reuse 'q' to obtain fitted values repeatedly
mod1_fv = qr.fitted(q, y1)
mod2_fv = qr.fitted(q, y2)
# Compare fitted values from reusing 'q' to fitted values in 'lm' models
Vectorize(all.equal)(unname(mod1$fitted.values), mod1_fv)
#> [1] TRUE TRUE TRUE TRUE TRUE
Vectorize(all.equal)(unname(mod2$fitted.values), mod2_fv)
#> [1] TRUE TRUE TRUE TRUE TRUE
Created on 2019-09-06 by the reprex package (v0.3.0)
Have you tried just fitting a multivariate model? I haven't checked the code, but on my system it's almost half as fast as fitting separately, so I wouldn't be surprised if it's doing what you suggest behind the scenes. That is,
mods <- lm(cbind(y1, y2) ~ X)

Calculating coefficients of bivariate linear regression

Question to be answered
Does anyone know how to solve the attached problem in two lines of code? I believe an as.matrix would work to create a matrix, X, and then use X %*% X, t(X), and solve(X) to get the answer. However, it does not seem to be working. Any answers will help, thanks.
I would recommend using read.csv instead of read.table
It would be useful for you to go over the difference of the two functions in this thread: read.csv vs. read.table
df <- read.csv("http://pengstats.macssa.com/download/rcc/lmdata.csv")
model1 <- lm(y ~ x1 + x2, data = df)
coefficients(model1) # get the coefficients of your regression model1
summary(model1) # get the summary of model1
Based on the answer of #kon_u, here is an example to do it by hands:
df <- read.csv("http://pengstats.macssa.com/download/rcc/lmdata.csv")
model1 <- lm(y ~ x1 + x2, data = df)
coefficients(model1) # get the coefficients of your regression model1
summary(model1) # get the summary of model1
### Based on the formula
X <- cbind(1, df$x1, df$x2) # the column of 1 is to consider the intercept
Y <- df$y
bhat <- solve(t(X) %*% X) %*% t(X) %*% Y # coefficients
bhat # Note that we got the same coefficients with the lm function

r loess: coefficients of global "parametric" terms

Is there a way how I can extract coefficients of globally fitted terms in local regression modeling?
Maybe I do misunderstand the role of globally fitted terms in the function loess, but what I would like to have is the following:
# baseline:
x <- sin(seq(0.2,0.6,length.out=100)*pi)
# noise:
x_noise <- rnorm(length(x),0,0.1)
# known structure:
x_1 <- sin(seq(5,20,length.out=100))
# signal:
y <- x + x_1*0.25 + x_noise
# fit loess model:
x_seq <- seq_along(x)
mod <- loess(y ~ x_seq + x_1,parametric="x_1")
The fit is done perfectly, however, how can I extract the estimated value of the globally fitted term x_1 (i.e. some value near 0.25 for the example above)?
Finally, I found a solution to my problem using the function gam from the package gam:
require(gam)
mod2 <- gam(y ~ lo(x_seq,span=0.75,degree=2) + x_1)
However, the fits from the two models are not exactly the same (which might be due to different control settings?)...

Resources