force given coefficients in lm() - r

I am currently trying to fit a polynomial model to measurement data using lm().
fit_poly4 <- lm(y ~ poly(x, degree = 4, raw = T), weights = w)
with x as independent, y as dependent variable and w = 1/variance of the measurements.
I want to try a polynomial with given coefficients instead of the ones determined by R. Specifically I want my polynomial to be
y = -3.3583*x^4 + 43*x^3 - 191.14*x^2 + 328.2*x - 137.7
I tried to enter it as
fit_poly4 <- lm(y ~ 328.2*x-191.14*I(x^2)+43*I(x^3)-3.3583*I(x^4)-137.3,
weights = w)
but this just returns an error:
Error in terms.formula(formula, data = data) : invalid model formula in ExtractVars
Is there a way to determine the coefficients in lm() and how would one do this?

I'm not sure why you want to do this, but you can use an offset term:
set.seed(101)
dd <- data.frame(x=rnorm(1000),y=rnorm(1000), w = rlnorm(1000))
fit_poly4 <- lm(y ~ -1 + offset(328.2*x - 191.14*I(x^2) + 43*I(x^3) - 3.3583*I(x^4) - 137.3),
                data = dd, weights = w)
The -1 suppresses the usual intercept term.
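Note that nothing is estimated here: the fitted values are simply the offset polynomial evaluated at x. A minimal check, using the simulated dd above (and the 137.3 constant from the code), might look like:
fixed <- with(dd, 328.2*x - 191.14*x^2 + 43*x^3 - 3.3583*x^4 - 137.3)
all.equal(unname(fitted(fit_poly4)), fixed)   # should be TRUE
sum(dd$w * residuals(fit_poly4)^2)            # weighted RSS of the fixed polynomial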

Related

how do I write this exponential model as a function in R?

For y = B0 + B1*x, I can write it as lm(y ~ x). However, I am not sure how to write y = B0*exp(B1*x) as a model formula in R.
I have tried lm(log(y) ~ x), lm(y ~ exp(x)), lm(y ~ log(x)), and lm(log(y) ~ log(x)), but I am not sure which is correct. I get different results for each model.
The two ways that you can do this that are actually faithful to the original statistical model (Gaussian errors with constant variance) are:
glm(y ~ x, family = gaussian(link = "log"), data = ...)
(but you'll have to exponentiate the intercept parameter) or
nls(y ~ b0*exp(b1*x), start = ..., data = ...)
(but you'll have to provide starting values, e.g. in the form list(b0 = 1, b1 = 1), for some sensible values).
y = b0*exp(b1*x) implies log(y) = log(b0) + b1*x, but transforming the response variable in this way will change the statistical model ... so lm(log(y) ~ x, data = ...) will give you similar but not identical answers to the preceding two recipes.
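As a concrete (illustrative) comparison of the three approaches on simulated data with true b0 = 2 and b1 = 0.8:
set.seed(1)
x <- runif(50, 0, 2)
y <- 2 * exp(0.8 * x) + rnorm(50, sd = 0.2)    # additive, constant-variance noise
fit_glm <- glm(y ~ x, family = gaussian(link = "log"))
exp(coef(fit_glm)[1])                          # back-transformed intercept, ~ b0
fit_nls <- nls(y ~ b0 * exp(b1 * x), start = list(b0 = 1, b1 = 1))
coef(fit_nls)                                  # ~ c(b0 = 2, b1 = 0.8)
fit_lm <- lm(log(y) ~ x)                       # changes the error model
exp(coef(fit_lm)[1])                           # similar, but not identical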

Syntax for three-piece segmented regression using NLS in R when concave

My goal is to fit a three-piece (i.e., two break-point) regression model to make predictions using propagate's predictNLS function, making sure to define knots as parameters, but my model formula seems off.
I've used the segmented package to estimate the breakpoint locations (used as starting values in NLS), but would like to keep my models in the NLS format, specifically nlsLM {minpack.lm}, because I am fitting other types of curves to my data using NLS, want to allow NLS to optimize the knot values, sometimes use variable weights, and need to be able to easily calculate the Monte Carlo confidence intervals from propagate. Though I'm very close to having the right syntax for the formula, I'm not getting the expected/required behaviour near the breakpoint(s). The segments SHOULD meet directly at the breakpoints (without any jumps), but at least on this data I'm getting a weird local minimum at the breakpoint (see plots below).
Below is an example of my data and general process. I believe my issue to be in the NLS formula.
library(minpack.lm)
library(segmented)
y <- c(-3.99448113, -3.82447011, -3.65447803, -3.48447030, -3.31447855, -3.14448753, -2.97447972, -2.80448401, -2.63448380, -2.46448069, -2.29448796, -2.12448912, -1.95448783, -1.78448797, -1.61448563, -1.44448719, -1.27448469, -1.10448651, -0.93448525, -0.76448637, -0.59448626, -0.42448586, -0.25448588, -0.08448548, 0.08551417, 0.25551393, 0.42551411, 0.59551395, 0.76551389, 0.93551398)
x <- c(61586.1711, 60330.5550, 54219.9925, 50927.5381, 48402.8700, 45661.9175, 37375.6023, 33249.1248, 30808.6131, 28378.6508, 22533.3782, 13901.0882, 11716.5669, 11004.7305, 10340.3429, 9587.7994, 8736.3200, 8372.1482, 8074.3709, 7788.1847, 7499.6721, 7204.3168, 6870.8192, 6413.0828, 5523.8097, 3961.6114, 3460.0913, 2907.8614, 2016.1158, 452.8841)
df<- data.frame(x,y)
#Use Segmented to get estimates for parameters with 2 breakpoints
my.seg2 <- segmented(lm(y ~ x, data = df), seg.Z = ~ x, npsi = 2)
#extract knot, intercept, and coefficient values to use as NLS start points
my.knot1 <- my.seg2$psi[1,2]
my.knot2 <- my.seg2$psi[2,2]
my.m_2 <- slope(my.seg2)$x[1,1]
my.b1 <- my.seg2$coefficients[[1]]
my.b2 <- my.seg2$coefficients[[2]]
my.b3 <- my.seg2$coefficients[[3]]
#Fit a NLS model to ~replicate segmented model. Presumably my model formula is where the problem lies
my.model <- nlsLM(y ~ m*x + b + (b2*(ifelse(x >= knot1 & x <= knot2, 1, 0)*(x - knot1)) +
                    (b3*ifelse(x > knot2, 1, 0)*(x - knot2 - knot1))),
                  data = df,
                  start = c(m = my.m_2, b = my.b1, b2 = my.b2, b3 = my.b3,
                            knot1 = my.knot1, knot2 = my.knot2))
How it should look
plot(my.seg2)
How it does look
plot(x, y)
lines(x=x, y=predict(my.model), col='black', lty = 1, lwd = 1)
I was pretty sure I had it "right", but when the 95% confidence intervals are plotted with the line and prediction resolution (e.g., the density of x points) is increased, things seem dramatically incorrect.
Thank you all for your help.
Define g to be a grouping vector, having the same length as x, which takes on the values 1, 2, 3 for the three sections of the x axis, and create an nls model from these. The resulting plot looks OK.
my.knots <- c(my.knot1, my.knot2)
g <- cut(x, c(-Inf, my.knots, Inf), label = FALSE)
fm <- nls(y ~ a[g] + b[g] * x, df, start = list(a = c(1, 1, 1), b = c(1, 1, 1)))
plot(y ~ x, df)
lines(fitted(fm) ~ x, df, col = "red")
Constraints
Although the above looks ok and may be sufficient it does not guarantee that the segments intersect at the knots. To do that we must impose the constraints that both sides are equal at the knots:
a[2] + b[2] * my.knots[1] = a[1] + b[1] * my.knots[1]
a[3] + b[3] * my.knots[2] = a[2] + b[2] * my.knots[2]
so
a[2] = a[1] + (b[1] - b[2]) * my.knots[1]
a[3] = a[2] + (b[2] - b[3]) * my.knots[2]
= a[1] + (b[1] - b[2]) * my.knots[1] + (b[2] - b[3]) * my.knots[2]
giving:
# returns a vector of the three a values
avals <- function(a1, b) unname(cumsum(c(a1, -diff(b) * my.knots)))
fm2 <- nls(y ~ avals(a1, b)[g] + b[g] * x, df, start = list(a1 = 1, b = c(1, 1, 1)))
To get the three a values we can use:
co <- coef(fm2)
avals(co[1], co[-1])
To get the residual sum of squares:
deviance(fm2)
## [1] 0.193077
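As a quick sanity check (illustrative, reusing co from above), the two sides of each constraint should now agree at the knots:
a <- avals(co[1], co[-1]); b <- co[-1]
(a[1] + b[1] * my.knots[1]) - (a[2] + b[2] * my.knots[1])   # ~ 0
(a[2] + b[2] * my.knots[2]) - (a[3] + b[3] * my.knots[2])   # ~ 0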
Polynomial
Although it involves a large number of parameters, a polynomial fit could be used in place of the segmented linear regression. A 12th degree polynomial involves 13 parameters but has a lower residual sum of squares than the segmented linear regression. A lower degree could be used, with a corresponding increase in the residual sum of squares. A 7th degree polynomial involves 8 parameters and does not look too bad visually, although it has a higher residual sum of squares.
fm12 <- nls(y ~ cbind(1, poly(x, 12)) %*% b, df, start = list(b = rep(1, 13)))
deviance(fm12)
## [1] 0.1899218
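The 7th degree fit mentioned above would be specified the same way (sketch):
fm7 <- nls(y ~ cbind(1, poly(x, 7)) %*% b, df, start = list(b = rep(1, 8)))
deviance(fm7)   # higher than deviance(fm12), but only 8 parameters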
The weird behaviour near the breakpoint may, in part, reflect a limitation in segmented: segmented returns a single change point value without quantifying the associated uncertainty. Redoing the analysis using mcp, which returns Bayesian posteriors, we see that the second change point is bimodally distributed:
library(mcp)
model = list(
  y ~ 1 + x,   # Intercept + slope in first segment
  ~ 0 + x,     # Only slope changes in the next segments
  ~ 0 + x
)
# Fit it with a large number of samples and plot the change point posteriors
fit = mcp(model, data = data.frame(x, y), iter = 50000, adapt = 10000)
plot_pars(fit, regex_pars = "^cp*", type = "dens_overlay")
FYI, mcp can plot credible intervals as well (the red dashed lines):
plot(fit, q_fit = TRUE)

heteroscedasticity: weights in lm function in R

I am confused. I have the following model: lm(GAV ~ EMPLOYED). This model has heteroscedasticity, and I believe the error standard deviation of this model can be approximated by a variable called SDL.
I have fitted the corresponding weighted model, obtained by dividing each term by the variable SDL, in two ways:
lm(I(GAV/SDL) ~ I(1/SDL) + I(EMPLOYED/SDL)-1)
And
lm(GAV ~EMPLOYED,weights = 1/SDL)
I thought they would yield the same results. However, I get different parameter estimates...
Can anyone show me the error I am making?
Thanks in advance!
Fede
help("lm") clearly explains:
weighted least squares is used with weights weights (that is,
minimizing sum(w*e^2));
So dividing each term by SDL corresponds to weights = 1/SDL^2, not 1/SDL. Compare:
x <- 1:10
set.seed(42)
w <- sample(10)
y <- 1 + 2 * x + rnorm(10, sd = sqrt(w))
lm(y ~ x, weights = 1/w)
#Call:
# lm(formula = y ~ x, weights = 1/w)
#
#Coefficients:
#(Intercept) x
# 3.715 1.643
lm(I(y/w^0.5) ~ I(1/w^0.5) + I(x/w^0.5) - 1)
#Call:
# lm(formula = I(y/w^0.5) ~ I(1/w^0.5) + I(x/w^0.5) - 1)
#
#Coefficients:
#I(1/w^0.5) I(x/w^0.5)
# 3.715 1.643
Btw., you might be interested in library(nlme); help("gls"). It offers more sophisticated possibilities for modelling heteroscedasticity.
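For instance, a minimal gls sketch on the toy data above, taking the error variance to be proportional to w (so it matches weights = 1/w in lm):
library(nlme)
fit_gls <- gls(y ~ x, data = data.frame(x, y, w), weights = varFixed(~ w))
coef(fit_gls)   # same point estimates as lm(y ~ x, weights = 1/w)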

`rms::ols()`: how to fit a model without intercept

I'd like to use the ols() (ordinary least squares) function from the rms package to do a multivariate linear regression, but I would not like it to calculate the intercept. Using lm() the syntax would be like:
model <- lm(formula = z ~ 0 + x + y, data = myData)
where the 0 stops it from calculating an intercept, and only two coefficients are returned, one for x and the other for y. How do I do this when using ols()?
Trying
model <- ols(formula = z ~ 0 + x + y, data = myData)
did not work, it still returns an intercept and a coefficient each for x and y.
Here is a link to a csv file
It has five columns; for this example, only the first three columns are used:
model <- ols(formula = CorrEn ~ intEn_anti_ncp + intEn_par_ncp, data = ccd)
Thanks!
rms::ols uses rms:::Design instead of model.frame.default. Design is called with the default of intercept = 1, so there is no (obvious) way to specify that there is no intercept. I assume there is a good reason for this, but you can try changing ols using trace.
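In practice the trace() route might look like this (hedged sketch; trace(edit = TRUE) opens the function body for interactive editing, and untrace() restores the original):
library(rms)
trace("ols", edit = TRUE)   # edit how Design()/the intercept is handled, then save
# ... fit your model with the modified ols() ...
untrace("ols")              # revert to the packaged version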

Use of offset in lm regression - R

I have this code
dens <- read.table('DensPiu.csv', header = FALSE)
fl <- read.table('FluxPiu.csv', header = FALSE)
mydata <- data.frame(c(dens),c(fl))
dat = subset(mydata, dens>=3.15)
colnames(dat) <- c("x", "y")
attach(dat)
and I would like to do a least-square regression on the data contained in dat, the function has the form
y ~ a + b*x
and I want the regression line to pass through a specific point P(x0,y0) (which is not the origin).
I'm trying to do it like this
x0 <- 3.15
y0 <-283.56
regression <- lm(y ~ I(x-x0)-1, offset=y0)
(I think that data = dat is not necessary in this case) but I get this error:
Error in model.frame.default(formula = y ~ I(x - x0) - 1, : variable
lengths differ (found for '(offset)').
I don't know why. I guess that I haven't defined correctly the offset value but I couldn't find any example online.
Could someone explain to me how offset works, please?
Your offset term has to be a variable, like x and y, not a numeric constant. So you need to create a column in your dataset with the appropriate values.
dat$o <- 283.56
lm(y ~ I(x - x0) - 1, data=dat, offset=o)
In fact, the real issue here is that offset must be a vector whose length matches the number of rows of your data (or the length of the response, if your data are plain vectors). The following code works as expected:
regression <- lm(y ~ I(x-x0)-1, offset = rep(y0, length(y)))
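If you also want the coefficients in the original y ~ a + b*x parameterisation, they can be recovered from this fit (sketch, assuming the regression object above): the fitted line is y = y0 + b*(x - x0), so it passes through P(x0, y0) by construction.
b <- coef(regression)[[1]]   # slope
a <- y0 - b * x0             # intercept of the equivalent y ~ a + b*x line
c(a = a, b = b)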
Here is a good explanation for those who are interested:
http://rfunction.com/archives/223
