R nls: fitting a curve to data

I'm having trouble finding the right curve to fit to my data. If someone more knowledgeable than me has an idea/solution for a better fitting curve I would be really grateful.
Data: The aim is to predict x from y
dat <- data.frame(x = c(15, 25, 50, 100, 150, 200, 300, 400, 500, 700, 850, 1000, 1500),
                  y = c(43, 45.16, 47.41, 53.74, 59.66, 65.19, 76.4, 86.12, 92.97,
                        103.15, 106.34, 108.21, 113))
This is how far I've come:
model <- nls(x ~ a * exp((log(2) / b) * y),
             data = dat, start = list(a = 1, b = 15), trace = TRUE)
Which is not a great fit:
dat$pred <- predict(model, list(y = dat$y))
plot( dat$y, dat$x, type = 'o', lty = 2)
points( dat$y, dat$pred, type = 'o', col = 'red')
Thanks, F

Predicting x from y, a 5th degree polynomial is not so parsimonious but does seem to fit:
fm <- lm(x ~ poly(y, 5), dat)
plot(x ~ y, dat)
lines(fitted(fm) ~ y, dat)
(continued after plot)
You could also consider the UCRS.5b model of the drc package:
library(drc)
fm <- drm(x ~ y, data = dat, fct = UCRS.5b())
plot(fm)
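Since the stated aim is to predict x from y, the fitted drm object can also be queried at new y values; here is a minimal sketch assuming fm from the drm() call above (the new y values are just illustrative):
# Predict x at a few new y values from the UCRS.5b fit; drc supplies a predict() method for drm objects
newy <- data.frame(y = c(50, 75, 100))
predict(fm, newdata = newy)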
Note: Originally, I assumed you wanted to predict y from x and had written the answer below.
A cubic looks pretty good:
plot(y ~ x, dat)
fm <- lm(y ~ poly(x, 3), dat)
lines(fitted(fm) ~ x, dat)
(continued after plot)
A 4 parameter logistic also looks good:
library(drc)
fm <- drm(y ~ x, data = dat, fct = LL.4())
plot(fm)

Related

How to fit Gaussian distribution with one-sided data?

x <- c(-3,-2.5,-2,-1.5,-1,-0.5)
y <- c(2,2.5,2.6,2.9,3.2,3.3)
The challenge is that all of the data comes from the left slope; how can a two-sided Gaussian distribution be generated from it?
The question leaves some information unspecified, so it can be tackled in several ways. Note that the data are insufficient; for instance, trying to fit this with nls does not work.
Here is one way to tackle it:
# Least-squares objective: scaled normal density with mean par[1], sd par[2] and scale par[3]
f <- function(par, x, y) sum((y - par[3] * dnorm(x, par[1], par[2]))^2)
a <- optim(c(0, 1, 1), f, x = x, y = y)$par
plot(x, y, xlim = c(-3, 3.5), ylim = c(2, 3.5))
curve(dnorm(x, a[1], a[2]) * a[3], add = TRUE, col = 2)
There is no way to fit a Gaussian distribution with these densities. If correct y-values had been provided this would be one way of solving the problem:
# Define function to be optimized
f <- function(pars, x, y){
  mu <- pars[1]
  sigma <- pars[2]
  y_hat <- dnorm(x, mu, sigma)
  se <- (y - y_hat)^2
  sum(se)
}
# Define the data
x <- c(-3,-2.5,-2,-1.5,-1,-0.5)
y <- c(2,2.5,2.6,2.9,3.2,3.3)
# Find the best parameters
opt <- optim(c(-.5, .1), f, method = 'SANN', x = x, y = y)
plot(
  seq(-5, 5, length.out = 200),
  dnorm(seq(-5, 5, length.out = 200), opt$par[1], opt$par[2]),
  type = 'l', col = 'red'
)
points(x, y)
Use nls to get a least squares fit of y to .lin.a * dnorm(x, b, c) where .lin.a, b and c are parameters to be estimated.
fm <- nls(y ~ cbind(a = dnorm(x, b, c)),
          start = list(b = mean(x), c = sd(x)), algorithm = "plinear")
fm
giving:
Nonlinear regression model
model: y ~ cbind(a = dnorm(x, b, c))
data: parent.frame()
b c .lin.a
0.2629 3.2513 27.7287
residual sum-of-squares: 0.02822
Number of iterations to convergence: 7
Achieved convergence tolerance: 2.582e-07
The dnorm model (black curve) seems to fit the points although even a straight line (blue line) involving only two parameters (intercept and slope) instead of 3 isn't bad.
plot(y ~ x)
lines(fitted(fm) ~ x)
fm.lin <- lm(y ~ x)
abline(fm.lin, col = "blue")

How to perform a nonlinear regression of a complex function that has a summation using R?

I have the following function, where the sum runs over n = 1, 2, 3, ...:
y = A * (1 - (6/pi^2) * sum_n (1/n^2) * exp(-B * n^2 * pi^2 * x / R^2))
In this function the parameter R is a constant with a value of 22.5. I want to estimate parameters A and B using nonlinear regression (the nls() function). I made a few attempts, but all were unsuccessful. I'm not very familiar with this type of operation in R, so I would like your help.
Additionally, if possible, I would also like to plot this function using ggplot2.
# Initial data
x <- c(0, 60, 90, 120, 180, 240)
y <- c(0, 0.967676, 1.290101, 1.327099, 1.272404, 1.354246)
R <- 22.5
df <- data.frame(x, y)
f <- function(x) (1/(n^2))*exp((-B*(n^2)*(pi^2)*x)/(R^2))
# First try
nls(formula = y ~ A*(1-(6/(pi^2))*sum(f, seq(1, Inf, 1))),
data = df,
start = list(A = 1,
B = 0.7))
Error in seq.default(1, Inf, 1) : 'to' must be a finite number
# Second try
nls(formula = y ~ A*(1-(6/(pi^2))*integrate(f, 1, Inf)),
data = df,
start = list(A = 1,
B = 0.7))
Error in f(x, ...) : object 'n' not found
You can use a finite sum approximation. Using 25 terms:
f <- function(x, B, n = 1:25) sum((1/(n^2))*exp((-B*(n^2)*(pi^2)*x)/(R^2)))
fm <- nls(formula = y ~ cbind(A = (1 - (6/pi^2)) * Vectorize(f)(x, B)),
          data = df,
          start = list(B = 0.7),
          alg = "plinear")
fm
giving:
Nonlinear regression model
model: y ~ cbind(A = (1 - (6/pi^2)) * Vectorize(f)(x, B))
data: df
B .lin.A
-0.00169 1.39214
residual sum-of-squares: 1.054
Number of iterations to convergence: 12
Achieved convergence tolerance: 9.314e-06
The model does not seem to fit the data very well (solid line in graph below); however, a logistic model seems to work well (dashed line).
fm2 <- nls(y ~ SSlogis(x, Asym, xmid, scal), df)
plot(y ~ x, df)
lines(fitted(fm) ~ x, df)
lines(fitted(fm2) ~ x, df, lty = 2)
legend("bottomright", c("fm", "fm2"), lty = 1:2)

R - Predicted variables not included in linear regression graph

Here's the relevant code snippet. How do I get the predicted variables to display in the plot?
df <- data.frame(X = 2010:2022, Y = c(11539282, 11543332, 11546969, 11567845, 11593741, 11606027, 11622554, 11658609, rep(NA, 5)))
model.1 <- lm(formula = Y ~ X, data = df)
predict(object = model.1, newdata = df)
plot(X, Y, ylim=c(11500000,11750000))
lines(sort(X), fitted(model.1)[order(X)])
Make these changes:
when creating the model use na.action = na.exclude
use the formula methods for plot and lines
use fitted(model.2) as the predicted values
no sorting is needed as X is already sorted
giving this code:
model.2 <- lm(Y ~ X, df, na.action = na.exclude)
plot(Y ~ X, df)
lines(fitted(model.2) ~ X, df)
or use abline in which case this shorter code can be used:
model.3 <- lm(Y ~ X, df)
plot(Y ~ X, df)
abline(model.3)
In either case we get this output:
Added
Based on clarification in the comments we could do this (or if you want an even wider range try ylim = extendrange(pred, f = .10) to extend the range by 10%, say, on either side).
pred <- predict(model.3, df)
plot(Y ~ X, df, ylim = range(pred))
lines(pred ~ X, df)
giving:
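If you prefer the wider range mentioned above, the same idea works with extendrange(); a minimal sketch assuming model.3 and df from the code above:
pred <- predict(model.3, df)
plot(Y ~ X, df, ylim = extendrange(pred, f = 0.10))  # 10% margin on either side
lines(pred ~ X, df)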

Plot the observed and fitted values from a linear regression using xyplot() from the lattice package

I can create simple graphs. I would like to have the observed and predicted values (from a linear regression) on the same graph. I am plotting, say, Yvariable vs Xvariable. There is only 1 predictor and only 1 response. How could I also add the linear regression curve to the same graph?
So to conclude need help with:
plotting actuals and predicted both
plotting regression line
Here is one option for showing the observed and predicted values as points in a single plot. It is easier to get the regression line on the observed points, which I illustrate second.
First some dummy data
set.seed(1)
x <- runif(50)
y <- 2.5 + (3 * x) + rnorm(50, mean = 2.5, sd = 2)
dat <- data.frame(x = x, y = y)
Fit our model
mod <- lm(y ~ x, data = dat)
Combine the model output and observed x into a single object for plotting
res <- stack(data.frame(Observed = dat$y, Predicted = fitted(mod)))
res <- cbind(res, x = rep(dat$x, 2))
head(res)
Load lattice and plot
require("lattice")
xyplot(values ~ x, data = res, group = ind, auto.key = TRUE)
The resulting plot should look similar to this
If you just want the regression line on the observed data, and the regression model is a simple straight-line model like the one shown here, you can circumvent most of this and plot directly using
xyplot(y ~ x, data = dat, type = c("p","r"), col.line = "red")
(i.e. you don't even need to fit the model or make new data for plotting)
The resulting plot should look like this
An alternative to the first example, which works with anything that gives coefficients for the regression line, is to write your own panel function; it is not as scary as it seems:
xyplot(y ~ x, data = dat, col.line = "red",
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.abline(coef = coef(mod), ...) ## using mod from earlier
       }
)
That gives a plot from Figure 2 above, but by hand.
Assuming you've done this with caret then
library(caret)
mod <- train(y ~ x, data = dat, method = "lm",
             trControl = trainControl(method = "cv"))
xyplot(y ~ x, data = dat, col.line = "red",
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.abline(coef = coef(mod$finalModel), ...) ## using mod from caret
       }
)
Will produce a plot the same as Figure 2 above.
Another option is to use panel.lmlineq from latticeExtra.
library(latticeExtra)
set.seed(0)
xsim <- rnorm(50, mean = 3)
ysim <- (0 + 2 * xsim) * (1 + rnorm(50, sd = 0.3))
## basic use as a panel function
xyplot(ysim ~ xsim, panel = function(x, y, ...) {
  panel.xyplot(x, y, ...)
  panel.lmlineq(x, y, adj = c(1, 0), lty = 1, col.text = 'red',
                col.line = "blue", digits = 1, r.squared = TRUE)
})

Linear regression in R (normal and logarithmic data)

I want to carry out a linear regression in R for data in a normal and in a double logarithmic plot.
For normal data the dataset might be the following:
lin <- data.frame(x = c(0:6), y = c(0.3, 0.1, 0.9, 3.1, 5, 4.9, 6.2))
plot (lin$x, lin$y)
There I want to calculate and draw the linear regression line using only the data points 2, 3 and 4.
For double logarithmic data the dataset might be the following:
data <- data.frame(
  x = 1:15,
  y = c(1.000, 0.742, 0.623, 0.550, 0.500, 0.462, 0.433,
        0.051, 0.043, 0.037, 0.032, 0.028, 0.025, 0.022, 0.020)
)
plot (data$x, data$y, log="xy")
Here I want to draw regression lines for the data points 1:7 and for 8:15.
How can I calculate the slope and the y-intercept as well as parameters for the fit (R^2, p-value)?
How is it done for normal and for logarithmic data?
Thanks for your help,
Sven
In R, linear least squares models are fitted via the lm() function. Using the formula interface we can use the subset argument to select the data points used to fit the actual model, for example:
lin <- data.frame(x = c(0:6), y = c(0.3, 0.1, 0.9, 3.1, 5, 4.9, 6.2))
linm <- lm(y ~ x, data = lin, subset = 2:4)
giving:
R> linm
Call:
lm(formula = y ~ x, data = lin, subset = 2:4)
Coefficients:
(Intercept) x
-1.633 1.500
R> fitted(linm)
2 3 4
-0.1333333 1.3666667 2.8666667
As for the double log, you have two choices I guess; i) estimate two separate models as we did above, or ii) estimate via ANCOVA. The log transformation is done in the formula using log().
Via two separate models:
dat <- data  # the question's data frame, under the name used below
logm1 <- lm(log(y) ~ log(x), data = dat, subset = 1:7)
logm2 <- lm(log(y) ~ log(x), data = dat, subset = 8:15)
Or via ANCOVA, where we need an indicator variable
dat <- transform(dat, ind = factor(1:15 <= 7))
logm3 <- lm(log(y) ~ log(x) * ind, data = dat)
You might ask if these two approaches are equivalent? Well they are and we can show this via the model coefficients.
R> coef(logm1)
(Intercept) log(x)
-0.0001487042 -0.4305802355
R> coef(logm2)
(Intercept) log(x)
0.1428293 -1.4966954
So the two slopes are -0.4306 and -1.4967 for the separate models. The coefficients for the ANCOVA model are:
R> coef(logm3)
(Intercept) log(x) indTRUE log(x):indTRUE
0.1428293 -1.4966954 -0.1429780 1.0661152
How do we reconcile the two? Well, the way I set up ind, logm3 is parametrised so that it directly gives the values estimated in logm2; the intercepts of logm2 and logm3 are the same, as are the coefficients for log(x). To get the values equivalent to the coefficients of logm1, we need a little manipulation of coefs <- coef(logm3), first for the intercept:
R> coefs[1] + coefs[3]
(Intercept)
-0.0001487042
where the coefficient for indTRUE is the difference in the mean of group 1 over the mean of group 2. And for the slope:
R> coefs[2] + coefs[4]
log(x)
-0.4305802
which is the same as we got for logm1 and is based on the slope for group 2 (coefs[2]) modified by the difference in slope for group 1 (coefs[4]).
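The question also asked for R^2 and the p-values of the fit; those come from summary() on any of the lm fits. A minimal sketch using logm1 from above:
s <- summary(logm1)
s$r.squared            # R^2 of the fit
coef(s)[, "Pr(>|t|)"]  # p-values for the intercept and slope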
As for plotting, an easy way is via abline() for simple models. E.g. for the normal data example:
plot(y ~ x, data = lin)
abline(linm)
For the log data we might need to be a bit more creative, and the general solution here is to predict over the range of data and plot the predictions:
pdat <- with(dat, data.frame(x = seq(from = head(x, 1), to = tail(x, 1),
                                     by = 0.1)))
pdat <- transform(pdat, yhat = c(predict(logm1, pdat[1:70, , drop = FALSE]),
                                 predict(logm2, pdat[71:141, , drop = FALSE])))
which we can plot on the original scale by exponentiating yhat:
plot(y ~ x, data = dat)
lines(exp(yhat) ~ x, data = pdat, subset = 1:70, col = "red")
lines(exp(yhat) ~ x, data = pdat, subset = 71:141, col = "blue")
or on the log scale:
plot(log(y) ~ log(x), data = dat)
lines(yhat ~ log(x), data = pdat, subset = 1:70, col = "red")
lines(yhat ~ log(x), data = pdat, subset = 71:141, col = "blue")
For example...
This general solution works well for the more complex ANCOVA model too. Here I create a new pdat as before and add in an indicator
pdat <- with(dat, data.frame(x = seq(from = head(x, 1), to = tail(x, 1),
                                     by = 0.1)[1:140],
                             ind = factor(rep(c(TRUE, FALSE), each = 70))))
pdat <- transform(pdat, yhat = predict(logm3, pdat))
Notice how we get all the predictions we want from the single call to predict() because of the use of ANCOVA to fit logm3. We can now plot as before:
plot(y ~ x, data = dat)
lines(exp(yhat) ~ x, data = pdat, subset = 1:70, col = "red")
lines(exp(yhat) ~ x, data = pdat, subset = 71:140, col = "blue")
#Split the data into two groups
data1 <- data[1:7, ]
data2 <- data[8:15, ]
#Perform the regression
model1 <- lm(log(y) ~ log(x), data1)
model2 <- lm(log(y) ~ log(x), data2)
summary(model1)
summary(model2)
#Plot it
with(data, plot(x, y, log="xy"))
lines(1:7, exp(predict(model1, data.frame(x = 1:7))))
lines(8:15, exp(predict(model2, data.frame(x = 8:15))))
In general, splitting the data into different groups and running different models on different subsets is unusual, and probably bad form. You may want to consider adding a grouping variable
data$group <- factor(rep(letters[1:2], times = 7:8))
and running some sort of model on the whole dataset, e.g.,
model_all <- lm(log(y) ~ log(x) * group, data)
summary(model_all)
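To see what the combined model implies on the original scale, its predictions can be overlaid on the log-log plot, much as was done for the two separate fits; a minimal sketch assuming data and model_all from above:
with(data, plot(x, y, log = "xy"))
lines(1:7,  exp(predict(model_all, data.frame(x = 1:7,  group = "a"))), col = "red")
lines(8:15, exp(predict(model_all, data.frame(x = 8:15, group = "b"))), col = "blue")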
