Linear Regression with a known fixed intercept in R

I want to calculate a linear regression using the lm() function in R. Additionally, I want to get the slope of a regression where I explicitly supply the intercept to lm().
I found an example on the internet and tried to read the R help for ?lm (which, unfortunately, I am not able to understand), but I did not succeed. Can anyone tell me where my mistake is?
lin <- data.frame(x = c(0:6), y = c(0.3, 0.1, 0.9, 3.1, 5, 4.9, 6.2))
plot (lin$x, lin$y)
regImp = lm(formula = lin$x ~ lin$y)
abline(regImp, col="blue")
# Does not work:
# Use 1 as intercept
explicitIntercept = rep(1, length(lin$x))
regExp = lm(formula = lin$x ~ lin$y + explicitIntercept)
abline(regExp, col="green")
Thanks for your help.

You could subtract the explicit intercept from the regressand and then fit the intercept-free model:
> intercept <- 1.0
> fit <- lm(I(x - intercept) ~ 0 + y, lin)
> summary(fit)
The 0 + suppresses the fitting of the intercept by lm.
Edit: To plot the fit, use
> abline(intercept, coef(fit))
P.S. The variables in your model look the wrong way round: it's usually y ~ x, not x ~ y (i.e. the regressand should go on the left and the regressor(s) on the right).
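Putting the fix and the P.S. together, here is a minimal sketch with y as the regressand (a hedged illustration using the data and intercept value from the question, not part of the original answer):
lin <- data.frame(x = 0:6, y = c(0.3, 0.1, 0.9, 3.1, 5, 4.9, 6.2))
intercept <- 1.0
fit <- lm(I(y - intercept) ~ 0 + x, data = lin)  # subtract the fixed intercept, suppress lm's own
plot(lin$x, lin$y)
abline(intercept, coef(fit), col = "green")      # add the fixed intercept back when plotting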

I see that you have accepted a solution using I(). I had thought that an offset() based solution would have been more obvious, but tastes vary and after working through the offset solution I can appreciate the economy of the I() solution:
with(lin, plot(y, x))
lm_shift_up <- lm(x ~ y + 0 + offset(rep(1, nrow(lin))), data = lin)
abline(1, coef(lm_shift_up))

I have used both offset and I(). I also find offset easier to work with (like BondedDust), since you can set your intercept directly. Assuming the intercept is 10:
plot(lin$x, lin$y)
fit <- lm(lin$y ~ 0 + lin$x, offset = rep(10, length(lin$x)))
abline(10, coef(fit), col = "blue")  # draw the fitted slope together with the assumed intercept of 10

Related

`gam` package: extra shift spotted when sketching data on `plot.gam`

I am trying to fit a GAM using the gam package (I know mgcv is more flexible, but I need to use gam here). My problem is that the model looks good, but compared with the original data it appears to be offset along the y-axis by a constant value, and I cannot figure out where this comes from.
This code reproduces the problem:
library(gam)
data(gam.data)
x <- gam.data$x
y <- gam.data$y
fit <- gam(y ~ s(x,6))
fit$coefficients
#(Intercept) s(x, 6)
# 1.921819 -2.318771
plot(fit, ylim = range(y))
points(x, y)
points(x, y -1.921819, col=2)
legend("topright", pch=1, col=1:2, legend=c("Original", "Minus intercept"))
Chambers, J. M. and Hastie, T. J. (1993) Statistical Models in S (Chapman & Hall) shows that there should not be an offset, and this is also intuitively correct (the smooth should describe the data).
I noticed something comparable in mgcv, which can be solved by passing the model's intercept as the shift parameter (because the smooth is apparently centred). I thought the same might be true here, so I subtracted the intercept from the original data points. However, the plot above shows that this idea is wrong. I don't know where the extra shift comes from, and I hope someone here can help.
(R version 3.3.1; gam version 1.12)
I think I should first explain the various pieces of output in the fitted GAM model:
library(gam)
data(gam.data)
x <- gam.data$x
y <- gam.data$y
fit <- gam(y ~ s(x, 6), model = FALSE)
## coefficients for parametric part
## this includes intercept and null space of spline
beta <- coef(fit)
## null space of spline smooth (a linear term, just `x`)
nullspace <- fit$smooth.frame[,1]
nullspace - x ## all 0
## smooth space that are penalized
## note, the backfitting procedure guarantees that this is centred
pensmooth <- fit$smooth[,1]
sum(pensmooth) ## centred
# [1] 5.89806e-17
## estimated smooth function (null space + penalized space)
smooth <- nullspace * beta[2] + pensmooth
## centred smooth function (this is what `plot.gam` is going to plot)
c0 <- mean(smooth)
censmooth <- smooth - c0
## additive predictors (this is just fitted values in Gaussian case)
addpred <- beta[1] + smooth
You can first verify that addpred is what fit$additive.predictors gives; since we are fitting an additive model with Gaussian response, this is also the same as fit$fitted.values.
What plot.gam does is plot censmooth:
plot.gam(fit, col = 4, ylim = c(-1.5,1.5))
points(x, censmooth, col = "gray")
Remember, there is
addpred = beta[1] + censmooth + c0
so if you want to shift the original data y to match this plot, you need to subtract not only the intercept (beta[1]) but also c0 from y:
points(x, y - beta[1] - c0)
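A quick sanity check you can run after the code above: the centred smooth plus the intercept and c0 should reproduce the fitted values.
## should be TRUE (up to floating-point error)
all.equal(unname(beta[1] + censmooth + c0), unname(fit$fitted.values))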

`rms::ols()`: how to fit a model without intercept

I'd like to use the ols() (ordinary least squares) function from the rms package to do a multiple linear regression, but I do not want it to calculate an intercept. Using lm(), the syntax would be:
model <- lm(formula = z ~ 0 + x + y, data = myData)
where the 0 stops it from calculating an intercept, and only two coefficients are returned, one for x and the other for y. How do I do this when using ols()?
Trying
model <- ols(formula = z ~ 0 + x + y, data = myData)
did not work; it still returns an intercept plus a coefficient each for x and y.
Here is a link to a csv file
It has five columns; for this example, only the first three are needed:
model <- ols(formula = CorrEn ~ intEn_anti_ncp + intEn_par_ncp, data = ccd)
Thanks!
rms::ols uses rms:::Design instead of model.frame.default. Design is called with the default of intercept = 1, so there is no (obvious) way to specify that there is no intercept. I assume there is a good reason for this, but you can try changing ols using trace.
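For example, one (admittedly hacky) way to explore this is to open ols in an editor via trace(); this is only a sketch of the workflow, not a tested patch:
library(rms)
trace(ols, edit = TRUE)   # opens the body of ols for hand-editing; look at how the design matrix / intercept is built
## ... make your experimental change in the editor, save, and refit ...
untrace(ols)              # restores the original definition when you are done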

R: How to get rid of .lin in plinear nls

Explanation
I am trying to fit an exponential curve to data of the form theta = x0 * exp(-kappa*l).
I first fit linear = lm(I(-log(temp.theta/x0)) ~ l + 0), from which I get the coefficient (k = coef(linear)), and then nls(temp.theta ~ I(x0 * exp(-k*l)) + 0, algorithm = "plinear", start = list(k = k)), because I am not sure whether the errors have the right nature with lm().
That decision came from reading a few Q&A's on stats.stackexchange about models where they discussed additive vs. multiplicative noise (=> error estimates?), which I haven't quite understood, as I have only a really basic knowledge of statistics. And since lm() and nls() give me different error estimates, I intuitively think the latter could be more accurate.
The problem is that nls(..., algorithm = "plinear") produces the coefficient I want, but also the .lin term, which I understand multiplies the whole right-hand side of the equation and hence messes up my model, as it only makes sense with the intercept fixed at x0.
Questions
Is there a way to set .lin = 1 or somehow turn it off?
Or alternatively: Is the lm() model sufficient for getting me reasonable error estimation?
Reproducible example
(sorry for not including one right away; I thought it was better to ask in an abstract form):
l = c(0.001, 0.002, 0.003, 0.004, 0.005)
temp.theta = c(84.405, 70.265, 58.689, 49.428, 41.188)
x0 = 100
temp.lm = lm(I(-log(temp.theta/x0)) ~ l + 0)
k = coef(temp.lm)
temp.nls = nls(temp.theta ~ I(x0 * exp(-k*l)) + 0, algorithm = "plinear", start = list(k = k))
kappa = coef(temp.nls)
kappa
Regarding the nls model, it seems that the desired model has no linear components, since x0 is fixed, so there is no reason to use plinear in the first place:
temp2.nls <- nls(temp.theta ~ x0 * exp(-k*l), start=list(k=k))
Regarding whether lm or nls is better, have a look at the residuals. In the residual plots, the residual of the first point seems to stick out, suggesting the point may not follow either model; however, with only 5 points we can't really say too much.
plot(resid(temp.lm), pch = 20, cex = 2, main = "lm Residuals")
plot(resid(temp2.nls), pch = 20, cex = 2, main = "nls Residuals")
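If you also want to compare the error estimates that motivated the question, a small follow-up (run after the code above) is to print both coefficient tables side by side:
summary(temp.lm)$coefficients    # estimate and standard error of k from the log-linear fit
summary(temp2.nls)$coefficients  # estimate and standard error of k from the nonlinear fit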

change null hypothesis in lmtest in R

I have a linear model generated using lm. I use the coeftest function in the lmtest package to test a hypothesis with my desired vcov from the sandwich package. The default null hypothesis is beta = 0. What if I want to test beta = 1, for example? I know I can simply take the estimated coefficient, subtract 1, and divide by the provided standard error to get the t-stat for my hypothesis. However, there must be functionality for this already in R. What is the right way to do this?
MWE:
require(lmtest)
require(sandwich)
set.seed(123)
x = 1:10
y = x + rnorm(10)
mdl = lm(y ~ x)
z = coeftest(mdl, df=Inf, vcov=NeweyWest)
b = z[2,1]
se = z[2,2]
mytstat = (b-1)/se
print(mytstat)
The formally correct way to do this:
require(multcomp)
zed = glht(model=mdl, linfct=matrix(c(0,1), nrow=1, ncol=2), rhs=1, alternative="two.sided", vcov.=NeweyWest)
summary(zed)
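Here linfct = matrix(c(0, 1), nrow = 1) picks out the slope coefficient (the leading 0 zeroes out the intercept) and rhs = 1 is the hypothesised value, so summary(zed) reports the test of beta = 1 under the Newey-West covariance.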
Use an offset of -1*x
mdl <- lm(y ~ x)
mdl2 <- lm(y ~ x - offset(x))
> mdl
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept)            x
     0.5255       0.9180
> mdl2
Call:
lm(formula = y ~ x - offset(x))
Coefficients:
(Intercept)            x
    0.52547     -0.08197
You can look at summary(mdl2) to see the p-value (the coefficient reported for x is now the original slope minus 1, so its t-test is a test of slope = 1).
As far as I know, there is no default function to test the model coefficients against an arbitrary value (1 in your case). There is the offset trick presented in the other answer, but it's not that straightforward (and always be careful with such model modifications). So, your expression (b-1)/se is actually a good way to do it.
I have two notes on your code:
You can use summary(mdl) to get the t-test for 0.
You are using lmtest with a covariance structure (which will change the t-test values), but your original lm model doesn't have one. Perhaps this could be a problem? Perhaps you should use glm and specify the correlation structure from the start.
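If you do stick with the hand-computed statistic, a two-sided p-value follows from the standard-normal reference implied by df = Inf in your coeftest() call, for example:
pval <- 2 * pnorm(-abs(mytstat))  # two-sided p-value for H0: beta = 1
print(pval)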

MCMClogit confusion

Could anybody explain to me why
simulatedCase <- rbinom(100,1,0.5)
simDf <- data.frame(CASE = simulatedCase)
posterior_m0 <<- MCMClogit(CASE ~ 1, data = simDf, b0 = 0, B0 = 1)
always results in an MCMC acceptance ratio of 0? Any explanation would be greatly appreciated!
I think your problem is the model formula, since logistic regression models have no error term. Thus your model CASE ~ 1 should be replaced by something like CASE ~ x (a predictor variable x is mandatory). Here is your example, modified:
CASE <- rbinom(100,1,0.5)
x <- 1:100
posterior_m0 <- MCMClogit (CASE ~ x, b0 = 0, B0 = 1)
classic_m0 <- glm (CASE ~ x, family=binomial(link="logit"), na.action=na.pass)
So I think your problem is not related to the MCMCpack library (disclaimer: I have never used this package).
For anyone stumbling into this same problem:
It seems that the MCMClogit function cannot handle anything but B0=0 if your model only has an intercept.
If you add a covariate, then you can specify a precision just fine.
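A minimal sketch of both observations (the improper flat prior B0 = 0 for the intercept-only model, and a proper prior precision once a covariate is present), assuming MCMCpack is installed:
library(MCMCpack)
set.seed(1)
CASE <- rbinom(100, 1, 0.5)
x <- rnorm(100)
m_flat  <- MCMClogit(CASE ~ 1, b0 = 0, B0 = 0)  # intercept-only: the flat prior runs
m_prior <- MCMClogit(CASE ~ x, b0 = 0, B0 = 1)  # with a covariate, a proper precision is fine
summary(m_flat)
summary(m_prior)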
I would consider other packages (such as arm or rjags) if you really want to sample from this model. For a list of options available for Bayesian regression, see http://cran.r-project.org/web/views/Bayesian.html
