Fit an exponential decay model in R

I am very new to R and I appreciate the help.
I have some data that looks like this.
Y is negatively correlated with X, in a nonlinear way. It seems to be approximated by a formula of the form y = 1 + a^x where a < 1.
If I wanted to fit that data in R to find a, what function would I use? nls?

Next time please provide test data. We have done it for you this time. Then we use nls as shown.
set.seed(123)
# generate test data
n <- 35
x <- 1:n
a <- 0.5
y <- 1 + a^x + rnorm(n, 0, .01)
fm <- nls(y ~ 1+a^x, start = list(a = mean((y-1)^(1/x), na.rm = TRUE)))
fm
giving:
Nonlinear regression model
model: y ~ 1 + a^x
data: parent.frame()
a
0.5025
residual sum-of-squares: 0.003031
Number of iterations to convergence: 5
Achieved convergence tolerance: 1.346e-06
Plot
plot(y ~ x)
lines(fitted(fm) ~ x, col = "red")
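As a small follow-up (a sketch of my own, not part of the original answer), coef() extracts the estimated decay rate and predict() evaluates the fitted curve on a finer grid:
coef(fm)["a"]                          # estimated decay rate a
xx <- seq(1, n, length.out = 200)      # finer grid of x values (assumed here)
lines(predict(fm, newdata = data.frame(x = xx)) ~ xx, col = "blue", lty = 2)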

Related

Replace lm coefficients and calculate results of the new lm in R

I am able to change the coefficients of my linear model. Then I want to compare the results of my "new" model with the new coefficients, but R is not calculating the results with the new coefficients.
As you can see in my following example, the summaries of my models fit and fit1 are exactly the same, though results like multiple R-squared or the fitted values should change.
set.seed(2157010) # you forgot to set the seed
x1 <- 1998:2011
x2 <- x1 + rnorm(length(x1))
y <- 3*x2 + rnorm(length(x1)) #you had x, not x1 or x2
fit <- lm( y ~ x1 + x2)
# view original coefficients
coef(fit)
# generate second function for comparing results
fit1 <- fit
# replace coefficients with new values, use whole name which is coefficients:
fit1$coefficients[2:3] <- c(5, 1)
# view new coefficients
coef(fit1)
# Comparing
summary(fit)
summary(fit1)
Thanks in advance
It might be easier to compute the multiple R^2 yourself with the substituted parameters.
mult_r2 <- function(beta, y, X) {
  tot_ss <- var(y) * (length(y) - 1)  # total sum of squares
  rss <- sum((y - X %*% beta)^2)      # residual sum of squares
  1 - rss/tot_ss
}
(Or, more compactly, following the comments, you could compute p <- X %*% beta; (cor(y, p))^2.)
mult_r2(coef(fit), y = model.response(model.frame(fit)), X = model.matrix(fit))
## 0.9931179, matches summary()
Now with new coefficients:
new_coef <- coef(fit)
new_coef[2:3] <- c(5,1)
mult_r2(new_coef, y = model.response(model.frame(fit)), X = model.matrix(fit))
## [1] -343917
That last result seems pretty wild, but the substituted coefficients are very different from the true least-squares coefficients, and negative R^2 is possible when the model is bad enough: it simply means the residual sum of squares exceeds the total sum of squares, i.e. the fixed coefficients predict worse than the mean of y ...

Simulate data from regression model with exact parameters in R

How can I simulate data so that the coefficients recovered by lm are determined to be particular pre-determined values and have normally distributed residuals? For example, could I generate data so that lm(y ~ 1 + x) will yield (Intercept) = 1.500 and x = 4.000? I would like the solution to be versatile enough to work for multiple regression with continuous x (e.g., lm(y ~ 1 + x1 + x2)) but there are bonus points if it works for interactions as well (lm(y ~ 1 + x1 + x2 + x1*x2)). Also, it should work for small N (e.g., N < 200).
I know how to simulate random data which is generated by these parameters (see e.g. here), but that randomness carries over to variation in the estimated coefficients, e.g., Intercept = 1.488 and x = 4.067.
Related: It is possible to generate data that yields pre-determined correlation coefficients (see here and here). So I'm asking if this can be done for multiple regression?
One approach is to use perfectly symmetric noise. The noise cancels itself out, so the estimated parameters are exactly the input parameters, yet the residuals appear normally distributed.
x <- 1:100
y <- cbind(1,x) %*% c(1.5, 4)
eps <- rnorm(100)
x <- c(x, x)
y <- c(y + eps, y - eps)
fit <- lm(y ~ x)
coef(fit)
# (Intercept)           x
#         1.5         4.0
plot(fit)
Residuals are normally distributed...
... but exhibit an unnaturally perfect symmetry!
EDIT by OP: I wrote up general-purpose code exploiting the symmetrical-residuals trick. It scales well to more complex models. This example also shows that it works for categorical predictors and interaction effects.
library(dplyr)
# Data and residuals
df = tibble(
  # Predictors
  x1 = 1:100,                    # Continuous
  x2 = rep(c(0, 1), each = 50),  # Dummy-coded categorical
  # Generate y from model, including interaction term
  y_model = 1.5 + 4 * x1 - 2.1 * x2 + 8.76543 * x1 * x2,
  noise = rnorm(100)             # Residuals
)
# Do the symmetrical-residuals trick
# This is copy-and-paste ready, no matter model complexity.
df = bind_rows(
  df %>% mutate(y = y_model + noise),
  df %>% mutate(y = y_model - noise)  # Mirrored
)
# Check that it works
fit <- lm(y ~ x1 + x2 + x1*x2, df)
coef(fit)
# (Intercept)          x1          x2       x1:x2
#     1.50000     4.00000    -2.10000     8.76543
You could do rejection sampling:
set.seed(42)
tol <- 1e-8
x <- 1:100
continue <- TRUE
while (continue) {
  y <- cbind(1, x) %*% c(1.5, 4) + rnorm(length(x))
  if (sum((coef(lm(y ~ x)) - c(1.5, 4))^2) < tol) continue <- FALSE
}
coef(lm(y ~ x))
#(Intercept) x
# 1.500013 4.000023
Obviously, this is a brute-force approach and the smaller the tolerance and the more complex the model, the longer this will take. A more efficient approach should be possible by providing residuals as input and then employing some matrix algebra to calculate y values. But that's more of a maths question ...
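For what it's worth, here is a minimal sketch of that matrix-algebra idea (my own addition, using the same target coefficients as above, not part of the original answers): project random noise off the column space of the design matrix, so the residuals are exactly orthogonal to the predictors and lm() recovers the input coefficients exactly. The projected noise is still normal, though no longer independent across observations.
set.seed(1)
x <- 1:100
X <- cbind(1, x)                      # design matrix
beta <- c(1.5, 4)                     # target coefficients
eps <- rnorm(nrow(X))                 # raw noise
H <- X %*% solve(crossprod(X), t(X))  # hat matrix
e <- eps - H %*% eps                  # residuals orthogonal to columns of X
y <- drop(X %*% beta + e)             # X'e = 0, so OLS returns beta exactly
coef(lm(y ~ x))
# (Intercept) and x equal 1.5 and 4.0 up to floating-point error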

force given coefficients in lm()

I am currently trying to fit a polynomial model to measurement data using lm().
fit_poly4 <- lm(y ~ poly(x, degree = 4, raw = T), weights = w)
with x as independent, y as dependent variable and w = 1/variance of the measurements.
I want to try a polynomial with given coefficients instead of the ones determined by R. Specifically I want my polynomial to be
y = -3.3583*x^4 + 43*x^3 - 191.14*x^2 + 328.2*x - 137.7
I tried to enter it as
fit_poly4 <- lm(y ~ 328.2*x-191.14*I(x^2)+43*I(x^3)-3.3583*I(x^4)-137.3,
weights = w)
but this just returns an error:
Error in terms.formula(formula, data = data) : invalid model formula in ExtractVars
Is there a way to determine the coefficients in lm() and how would one do this?
I'm not sure why you want to do this, but you can use an offset term:
set.seed(101)
dd <- data.frame(x=rnorm(1000),y=rnorm(1000), w = rlnorm(1000))
fit_poly4 <- lm(y ~ -1 + offset(328.2*x - 191.14*I(x^2) + 43*I(x^3) - 3.3583*I(x^4) - 137.3),
                data = dd,
                weights = w)
The -1 suppresses the usual intercept term.
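A hedged variation (my own sketch, not from the original answer): if only some coefficients need to be fixed, put just those terms inside offset() and let lm() estimate the rest, e.g. fixing the quartic coefficient while estimating the lower-order terms.
fit_mixed <- lm(y ~ x + I(x^2) + I(x^3) + offset(-3.3583 * x^4),
                data = dd, weights = w)
coef(fit_mixed)  # intercept and the x, x^2, x^3 terms are still estimated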

heteroscedasticity: weights in lm function in R

I am confused. I have the following model: lm(GAV ~ EMPLOYED). This model has heteroscedasticity, and I believe the error standard deviation of this model can be approximated by a variable called SDL.
I have fitted the corresponding weighted model, obtained by dividing each term by the variable SDL, in two ways:
lm(I(GAV/SDL) ~ I(1/SDL) + I(EMPLOYED/SDL)-1)
And
lm(GAV ~EMPLOYED,weights = 1/SDL)
I thought they would yield the same results. However, I get different parameter estimates...
Can anyone show me the error I am making?
Thanks in advance!
Fede
help("lm") clearly explains:
weighted least squares is used with weights weights (that is,
minimizing sum(w*e^2));
So the weights multiply the squared residuals: weights = 1/SDL corresponds to dividing each term by sqrt(SDL), while dividing each term by SDL would require weights = 1/SDL^2. The following example shows that weights = 1/w matches dividing by w^0.5:
x <- 1:10
set.seed(42)
w <- sample(10)
y <- 1 + 2 * x + rnorm(10, sd = sqrt(w))
lm(y ~ x, weights = 1/w)
#Call:
# lm(formula = y ~ x, weights = 1/w)
#
#Coefficients:
#(Intercept) x
# 3.715 1.643
lm(I(y/w^0.5) ~ I(1/w^0.5) + I(x/w^0.5) - 1)
#Call:
# lm(formula = I(y/w^0.5) ~ I(1/w^0.5) + I(x/w^0.5) - 1)
#
#Coefficients:
#I(1/w^0.5) I(x/w^0.5)
# 3.715 1.643
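Conversely (a small sketch of my own, reusing the same toy data), to reproduce the division by SDL itself you would need the squared reciprocal as the weight:
# dividing each term by w corresponds to weights = 1/w^2
lm(y ~ x, weights = 1/w^2)
lm(I(y/w) ~ I(1/w) + I(x/w) - 1)
# both calls give the same coefficient estimates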
Btw., you might be interested in library(nlme); help("gls"). It offers more sophisticated possibilities for modelling heteroscedasticity.
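For example, a minimal gls() sketch (my own addition, again reusing x, y and w from above): the varFixed() variance function declares the error variance proportional to w, which corresponds to weights = 1/w in lm().
library(nlme)
dat <- data.frame(x = x, y = y, w = w)
fit_gls <- gls(y ~ x, data = dat, weights = varFixed(~ w))
coef(fit_gls)  # should match the weighted lm fit above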

Equal and opposite slopes in segmented package

Hi, I am trying to use the segmented package in R to fit a piecewise linear regression model and estimate the break point in my data. I have used the following code to get this graph.
library(segmented)
set.seed(5)
x <- c(1:10, 13:22)
y <- numeric(20)
## Create first segment
y[1:10] <- 20:11 + rnorm(10, 0, 1.5)
## Create second segment
y[11:20] <- seq(11, 15, len=10) + rnorm(10, 0, 1.5)
## fitting a linear model
lin.mod <- lm(y~x)
segmented.mod <- segmented(lin.mod, seg.Z = ~x, psi=15)
summary(segmented.mod)
plot(x,y, pch=".",cex=4,xlab="x",ylab="y")
plot(segmented.mod, add=T, lwd = 3,col = "red")
My theoretical calculations suggest that the slopes of the two lines about the breakpoint should be equal in magnitude but opposite in sign. I am a beginner with lm and glms. I was hoping there is a way to estimate breakpoints with the slopes constrained by the relation slope1 = -slope2.
This is not supported in the segmented package.
nls2 with "plinear-brute" algorithm could be used. In the output .lin1 and .lin2 are the constant term and the slope respectively. This tries each value in the range of x as a possible bp fitting a linear regression to each.
library(nls2)
st <- data.frame(bp = seq(min(x), max(x)))
nls2(y ~ cbind(1, abs(x - bp)), start = st, alg = "plinear-brute")
giving:
Nonlinear regression model
model: y ~ cbind(1, abs(x - bp))
data: parent.frame()
bp .lin1 .lin2
14.000000 9.500457 0.709624
residual sum-of-squares: 45.84213
Number of iterations to convergence: 22
Achieved convergence tolerance: NA
Here is another example which may clarify this, since it generates the data from the same model as the one being fit:
library(nls2)
set.seed(123)
n <- 100
bp <- 25
x <- 1:n
y <- rnorm(n, 10 + 2 * abs(x - bp))
st <- data.frame(bp = seq(min(x), max(x)))
fm <- nls2(y ~ cbind(1, abs(x - bp)), start = st, alg = "plinear-brute")
giving:
> fm
Nonlinear regression model
model: y ~ cbind(1, abs(x - bp))
data: parent.frame()
bp .lin1 .lin2
25.000 9.935 2.005
residual sum-of-squares: 81.29
Number of iterations to convergence: 100
Achieved convergence tolerance: NA
Note: In the above we assumed that bp is an integer in the range of x, but that restriction can be relaxed by using the nls2 result as the starting value for an nls optimization, i.e. nls(y ~ cbind(1, abs(x - bp)), start = coef(fm)[1], alg = "plinear").
