Piecewise linear regression in R (segmented.lm)

I would appreciate any help getting segmented.lm (or any other function) to find the obvious breakpoints in this example:
data = list(x=c(50,60,70,80,90) , y= c(703.786,705.857,708.153,711.056,709.257))
plot(data, type='b')
require(segmented)
model.lm = segmented(lm(y~x,data = data),seg.Z = ~x, psi = NA)
It returns with the following error:
Error in solve.default(crossprod(x1), crossprod(x1, y1)) :
system is computationally singular: reciprocal condition number = 1.51417e-20
If I change K:
model.lm = segmented(lm(y~x,data = data),seg.Z = ~x, psi = NA, control = seg.control(K=1))
I get another error:
Error in segmented.lm(lm(y ~ x, data = data), seg.Z = ~x, psi = NA, control = seg.control(K = 1)) :
only 1 datum in an interval: breakpoint(s) at the boundary or too close each other

A nice objective method to determine the break point is described in Crawley (2007: 427).
First, define a vector breaks for a range of potential break points:
breaks <- data$x[data$x >= 70 & data$x <= 90]
Then run a for loop over piecewise regressions for all potential break points and extract the residual standard error of each model from the summary output:
mse <- numeric(length(breaks))
for(i in 1:length(breaks)){
piecewise <- lm(data$y ~ data$x*(data$x < breaks[i]) + data$x*(data$x >= breaks[i]))
mse[i] <- summary(piecewise)[6] # element 6 of summary.lm is sigma, the residual standard error
}
mse <- as.numeric(mse)
Finally, identify the break point with the least mse:
breaks[which(mse==min(mse))]
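As a quick visual check (my sketch, not part of Crawley's recipe), one can refit the piecewise model at the selected break point and overlay the fitted values on the data:
best <- breaks[which.min(mse)]
piecewise.best <- lm(y ~ x*(x < best) + x*(x >= best), data = data)
plot(data$x, data$y, pch = 16)
lines(data$x, fitted(piecewise.best), col = "red")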
Hope this helps.

Related

Point constraint in R piecewise regression

I am trying to fit a piecewise regression to this dataset. I know the relation between the dependent and independent variable is not linear, but my real-world application requires me to model the data as a segmented linear regression.
Here is my code, with a description of the steps.
bond_data <- data.frame(
yield_change = c(-1.2,-0.9,-1.8,-1.4,-1.8,-2.1,-2.3,-2.1,-2.5,-2.2,-2.4,-2.5,-2.4,-2.4,
-3.0,-2.6,-5.1,-4.8,-4.9,-5.0,-5.0,-6.2,-6.1,-6.3,-5.0,-5.0),
maturity =c(10.2795,10.8603,11.7753,12.3562,12.5205,13.3589,13.8630,14.2822,14.3589,15.3589,
15.8630,16.778,17.3616,17.8658,18.3616,21.8685,22.5288,23.8685,24.3644,25.3671,
26.8712,27.8712,28.8712,29.8740,44.3781,49.3836))
The bond_data data frame contains the two vectors stated above.
# Defining the lm model and the segmented model
model <- lm(yield_change ~ maturity, data = bond_data)
segmented.model <- segmented(model,seg.Z=~maturity,psi = list(maturity = c(15,20,30)),fixed.psi = c(15,20,30),control = seg.control(it.max = 0, n.boot = 50))
xp <- c(min(bond_data$maturity), segmented.model$psi[,"Est."], max(bond_data$maturity))
new_data <- data.frame(xp)
colnames(new_data) <- "maturity"
o <- segmented.model
new_data$dummy1 <- pmax(new_data$maturity - o$psi[1,2], 0)
new_data$dummy2 <- pmax(new_data$maturity - o$psi[2,2], 0)
new_data$dummy3 <- pmax(new_data$maturity - o$psi[3,2], 0)
new_data$dummy4 <- I(new_data$maturity > o$psi[1,2]) * coef(o)[3]
new_data$dummy5 <- I(new_data$maturity > o$psi[2,2]) * coef(o)[4]
new_data$dummy6 <- I(new_data$maturity > o$psi[3,2]) * coef(o)[5]
names(new_data)[-1] <- names(model.frame(o))[-c(1,2)]
yp <- predict(segmented.model,new_data)
plot(bond_data$maturity,bond_data$yield_change, pch=16, col="blue",ylim = c(-8,0))
lines(xp,yp)
I get the following image:
[Plot of actual values as blue points with the prediction line]
I am trying to get the first segment to start at the point (maturity = 10, yield_change = 0).
One thing to note is that all my breakpoints have fixed x positions and no estimates are made, so when I run segmented.model$psi my initial values are the same as my estimates (15, 20 and 30) and all my st.err values are zero.
How would I go about making my prediction line start at the point (maturity = 10, yield_change = 0)? I appreciate any help!
I have tried doing the following:
model <- lm(I(yield_change-0)~I(maturity-10), data = bond_data)
segmented.model <- segmented(model,seg.Z=~maturity,psi = list(maturity = c(15,20,30)),fixed.psi = c(15,20,30), control = seg.control(it.max = 0, n.boot = 50))
Running the previous line gives the error "object maturity not recognised". By running:
segmented.model <- segmented(model,seg.Z=~I(maturity-10),psi = list(I(maturity-10) = c(15,20,30)),fixed.psi = c(15,20,30), control = seg.control(it.max = 0, n.boot = 50))
I get this error:
Error: unexpected '=' in "segmented.model <- segmented(model,seg.Z=~I(maturity-10),psi = list(I(maturity-10) ="
I do not think I am using the correct method to solve my problem...
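One possible direction (a hedged sketch, not from the thread, untested on this data): seg.Z in segmented needs a plain column name, which is why the I(maturity - 10) variants fail. Storing the shifted variable as its own column and suppressing the intercept forces the line through (maturity = 10, yield_change = 0); the breakpoints must then be given on the shifted scale (15, 20, 30 become 5, 10, 20), and this assumes segmented accepts a no-intercept fit.
bond_data$maturity0 <- bond_data$maturity - 10 # shifted so maturity = 10 maps to 0
model0 <- lm(yield_change ~ 0 + maturity0, data = bond_data) # no intercept: line passes through (10, 0)
segmented.model0 <- segmented(model0, seg.Z = ~maturity0,
  psi = list(maturity0 = c(5, 10, 20)), fixed.psi = c(5, 10, 20),
  control = seg.control(it.max = 0, n.boot = 50))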

Errors in nls() - singular gradient or NaNs produced

I am trying to fit my photosynthesis data with nls using a nonrectangular hyperbola function. So far I have had issues getting the right starting values for nls, and therefore I am getting a lot of errors such as 'singular gradient', 'NaNs produced', or 'step factor 0.000488281 reduced below 'minFactor' of 0.000976562'. Would you please give some suggestions for finding the best starting values? Thanks in advance!
The code and data are below:
#Dataframe
PPFD <- c(0,0,0,50,50,50,100,100,100,200,200,200,400,400,400,700,700,700,1000,1000,1000,1500,1500,1500)
Cultivar <- c(-0.7,-0.8,-0.6,0.6,0.5,0.8,2.0,2.0,2.3,3.6,3.7,3.7,5.7,5.5,5.8,9.7,9.6,10.0,14.7,14.4,14.9,20.4,20.6,20.9)
NLRC <-data.frame(PPFD,Cultivar)
#nls regression
reg_nrh <- nls(Cultivar ~ (1/(2*Theta))*(AQY*PPFD+Am-sqrt((AQY*PPFD+Am)^2-4*AQY*Theta*Am*PPFD))-Rd, data = NLRC, start=list(Am = max(NLRC$Cultivar)-min(NLRC$Cultivar), AQY = 0.05, Rd=-min(NLRC$Cultivar), Theta = 1))
#estimated parameters for plotting
Amnrh <- coef(reg_nrh)[1]
AQYnrh <- coef(reg_nrh)[2]
Rdnrh <- coef(reg_nrh)[3]
Theta <- coef(reg_nrh)[4]
#plot
plot(NLRC$PPFD, NLRC$Cultivar, main = "Cultivar", xlab = "", ylab = "", ylim = c(-2, 40), cex.lab = 1.2, cex.axis = 1.5, cex = 2)
mtext(expression("PPFD ("*mu*"mol photons "*m^-2*s^-1*")"), side = 1, line = 3.3, cex = 1.5)
mtext(expression(P[net]*" ("*mu*"mol "*CO[2]*" "*m^-2*s^-1*")"), side = 2, line = 2.5, cex = 1.5)
#simulated value
ppfd = seq(from = 0, to = 1500)
pnnrh <- (1/(2*Theta))*(AQYnrh*ppfd+Amnrh-sqrt((AQYnrh*ppfd+Amnrh)^2-4*AQYnrh*Theta*Amnrh*ppfd))- Rdnrh
lines(ppfd, pnnrh, col="Green")
If we

- take the maximum of 0 and the expression within the sqrt, to avoid taking square roots of negative numbers,
- fix Theta at 0.8, and
- use lm to get starting values for AQY and Am,

then it converges:
Theta <- 0.8
fm <- lm(Cultivar ~ PPFD, NLRC)
st <- list(AQY = coef(fm)[[2]], Rd = -min(NLRC$Cultivar), Am = coef(fm)[[1]])
fo <- Cultivar ~
(1/(2*Theta))*(AQY*PPFD+Am-sqrt(pmax(0, (AQY*PPFD+Am)^2-4*AQY*Theta*Am*PPFD)))-Rd
reg <- nls(fo, data = NLRC, start = st)
deviance(reg) # residual sum of squares
## [1] 5.607943
plot(Cultivar ~ PPFD, NLRC)
lines(fitted(reg) ~ PPFD, NLRC, col = "red")
Note that the first model below has only two parameters yet has lower residual sum of squares (lower is better).
reg2 <- nls(Cultivar ~ a * PPFD^b, NLRC, start = list(a = 1, b = 1))
deviance(reg2)
## [1] 5.098796
These have higher residual sum of squares but do have the advantage that they are very simple.
deviance(fm) # fm defined above
## [1] 6.938648
fm0 <- lm(Cultivar ~ PPFD + 0, NLRC) # same as fm except no intercept
deviance(fm0)
## [1] 7.381632

plotting interaction effects for LASSO models in R

I fitted a lasso logistic model with interaction terms. Then I wanted to visualize those interactions using an interaction plot.
I tried to find an R function that plots interactions for glmnet models and couldn't find any.
Is there any R package that will plot interactions for LASSO?
Since I couldn't find any, I tried to do it manually by plotting the predicted values, but I am getting some errors.
My code is as follows:
require(ISLR)
require(glmnet)
y <- Smarket$Direction
x <- model.matrix(Direction ~ Lag1 + Lag4* Volume, Smarket)[, -1]
lasso.mod <- cv.glmnet(x, y, alpha=1,family="binomial",nfolds = 5, type.measure="class",
lambda = seq(0.001,0.1,by = 0.001))
lasso.mod$lambda.min
pred = expand.grid(Lag1 = median(Smarket$Lag1),
Lag4 = c(-0.64,0.0385,0.596750),
Volume = seq(min(Smarket$Volume), max(Smarket$Volume), length=100))
lasso.mod1 <- glmnet(x, y, alpha=1,family="binomial",
lambda = lasso.mod$lambda.min)
pred$Direction = predict(lasso.mod1, newx=pred,
type="response", s= lasso.mod$lambda.min)
I am getting this error:
Error in cbind2(1, newx) %*% nbeta :
not-yet-implemented method for <data.frame> %*% <dgCMatrix>
Can anyone suggest anything to fix this issue?
Thank you
predict.glmnet requires newx to be a matrix. You also need to supply the interaction values yourself.
library(dplyr)
pred = expand.grid(Lag1 = median(Smarket$Lag1),
Lag4 = c(-0.64,0.0385,0.596750),
Volume = seq(min(Smarket$Volume), max(Smarket$Volume), length=100)) %>%
mutate(`Lag4:Volume` = Lag4 * Volume) # preparing interaction values
pred$Direction = predict(lasso.mod1, newx = as.matrix(pred), # convert to matrix
type = "link", s= lasso.mod$lambda.min)
[EDITED]
Oh, I overlooked a more general, better way.
pred = expand.grid(Lag1 = median(Smarket$Lag1),
Lag4 = c(-0.64,0.0385,0.596750),
Volume = seq(min(Smarket$Volume), max(Smarket$Volume), length=100))
pred$Direction = predict(lasso.mod1,
newx = model.matrix( ~ Lag1 + Lag4* Volume, pred)[, -1],
type="response", s= lasso.mod$lambda.min)

Exponential decay fit in R

I would like to fit an exponential decay function in R to the following data:
data <- structure(list(x = 0:38, y = c(0.991744340878828, 0.512512332368168,
0.41102449265681, 0.356621905557202, 0.320851602373477, 0.29499198506227,
0.275037747162642, 0.25938850981822, 0.245263623938863, 0.233655093612007,
0.224041426946405, 0.214152907133301, 0.207475138903635, 0.203270738895484,
0.194942528735632, 0.188107106969046, 0.180926819430008, 0.177028560207711,
0.172595416846822, 0.166729221891201, 0.163502461048814, 0.159286528409165,
0.156110097827889, 0.152655498715612, 0.148684858095915, 0.14733605355542,
0.144691873223729, 0.143118852619617, 0.139542186417186, 0.137730138713745,
0.134353615271572, 0.132197800438632, 0.128369567159113, 0.124971834736476,
0.120027536018095, 0.117678812415655, 0.115720611113327, 0.112491329844252,
0.109219168085624)), class = "data.frame", row.names = c(NA,
-39L), .Names = c("x", "y"))
I've tried fitting with nls but the generated curve is not close to the actual data.
It would be very helpful if anyone could explain how to work with such nonlinear data and find a function of best fit.
Try y ~ .lin / (b + x^c). Note that when using "plinear" one omits the .lin linear parameter when specifying the formula to nls and also omits a starting value for it.
Also note that the .lin and b parameters are approximately 1 at the optimum so we could also try the one parameter model y ~ 1 / (1 + x^c). This is the form of a one-parameter log-logistic survival curve. The AIC for this one parameter model is worse than for the 3 parameter model (compare AIC(fm1) and AIC(fm3)) but the one parameter model might still be preferable due to its parsimony and the fact that the fit is visually indistinguishable from the 3 parameter model.
opar <- par(mfcol = 2:1, mar = c(3, 3, 3, 1), family = "mono")
# data = data.frame with x & y col names; fm = model fit; main = string shown above plot
Plot <- function(data, fm, main) {
plot(y ~ x, data, pch = 20)
lines(fitted(fm) ~ x, data, col = "red")
legend("topright", bty = "n", cex = 0.7, legend = capture.output(fm))
title(main = paste(main, "- AIC:", round(AIC(fm), 2)))
}
# 3 parameter model
fo3 <- y ~ 1/(b + x^c) # omit .lin parameter; plinear will add it automatically
fm3 <- nls(fo3, data = data, start = list(b = 1, c = 1), alg = "plinear")
Plot(data, fm3, "3 parameters")
# one parameter model
fo1 <- y ~ 1 / (1 + x^c)
fm1 <- nls(fo1, data, start = list(c = 1))
Plot(data, fm1, "1 parameter")
par(opar)
AIC
Adding the solutions from the other answers, we can compare the AIC values. Each solution is labelled by the number of parameters it uses (the degrees of freedom are one greater than that). The log-log solution has been reworked to use nls instead of lm, with y as the left-hand side, because AIC values cannot be compared across models that have different left-hand sides or use different optimization routines, since the log-likelihood constants involved could differ.
fo2 <- y ~ exp(a + b * log(x+1))
fm2 <- nls(fo2, data, start = list(a = 1, b = 1))
fo4 <- y ~ SSbiexp(x, A1, lrc1, A2, lrc2)
fm4 <- nls(fo4, data)
aic <- AIC(fm1, fm2, fm3, fm4)
aic[order(aic$AIC), ]
giving from best AIC (i.e. fm3) to worst AIC (i.e. fm2):
    df     AIC
fm3  4 -329.35
fm1  2 -307.69
fm4  5 -215.96
fm2  3 -167.33
A biexponential model would fit much better, though still not perfect. This would indicate that you might have two simultaneous decay processes.
fit <- nls(y ~ SSbiexp(x, A1, lrc1, A2, lrc2), data = data)
#A1*exp(-exp(lrc1)*x)+A2*exp(-exp(lrc2)*x)
plot(y ~x, data = data)
curve(predict(fit, newdata = data.frame(x)), add = TRUE)
If the measurement error depends on magnitude, you could consider using it for weighting.
However, you should consider carefully what kind of model you'd expect from your domain knowledge. Just selecting a non-linear model empirically is usually not a good idea. A non-parametric fit might be a better option.
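For example, if the measurement error's standard deviation were roughly proportional to y, one could pass weights = 1/y^2 to nls (a sketch under that assumption, reusing the biexponential model from above):
fit_w <- nls(y ~ SSbiexp(x, A1, lrc1, A2, lrc2), data = data, weights = 1/y^2)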
# using the same data as defined in the question
# Shift x by 1 because log(0) is undefined
data$x = data$x + 1
fit = lm(log(y) ~ log(x), data = data)
plot(data$x, data$y)
# back-transform the fit: log(y) = a + b*log(x) implies y = exp(a) * x^b
lines(data$x, exp(fit$coefficients[1]) * data$x ^ fit$coefficients[2], col = "red")
This did a lot better than the nls formula, and when plotted the fit seems to do fairly well.
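For a rough comparison with the deviance() values quoted above, the residual sum of squares of this fit on the original y scale can be computed as follows (my addition):
sum((data$y - exp(fit$coefficients[1]) * data$x^fit$coefficients[2])^2)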

Fitting ODE in R, with the use of the FME package

I'm trying to fit an ODE model to some data and solve for the values of the parameters in the model.
I know there is a package called FME in R which is designed to solve this kind of problem. However, when I tried to write the code following the package manual, the program failed with the following traceback:
Error in lsoda(y, times, func, parms, ...) : illegal input detected before taking any integration steps - see written message
The code is the following:
x <- c(0.1257,0.2586,0.5091,0.7826,1.311,1.8636,2.7898,3.8773)
y <- c(11.3573,13.0092,15.1907,17.6093,19.7197,22.4207,24.3998,26.2158)
time <- 0:7
# Initial Values of the Parameters
parms <- c(r = 4, b11 = 1, b12 = 0.2, a111 = 0.5, a112 = 0.1, a122 = 0.1)
# Definition of the Derivative Functions
# Parameters in pars; Initial Values in y
derivs <- function(time, y, pars){
with(as.list(c(pars, y)),{
dx <- r + b11*x + b12*y - a111*x^2 - a122*y^2 - a112*x*y
dy <- r + b12*x + b11*y - a122*x^2 - a111*y^2 - a112*x*y
list(c(dx,dy))
})
}
initial <- c(x = x[1], y = y[1])
data <- data.frame(time = time, x = x, y = y)
# Cost Computation, the Object Function to be Minimized
model_cost <- function(pars){
out <- ode(y = initial, times = time, func = derivs, parms = pars)
cost <- modCost(model = out, obs = data, x = "time")
return(cost)
}
# Model Fitting
model_fit <- modFit(f = model_cost, p = parms, lower = c(-Inf,rep(0,5)))
Does anyone know how to use the FME package and fix this problem?
Your code's syntax is right, and it works until the last line.
You can check your code with
model_cost(parms)
This works fine, and you can see with
model_cost(parms)$model
that your "initial guess" is far away from the observed data (compare the "obs" and "mod" columns). Perhaps this is the failure: the fitting procedure cannot get from there to the observed data.
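To see this visually, one could plot the trajectory from the initial guess against the data (a sketch of my own, relying on deSolve's ode, which FME loads):
out0 <- ode(y = initial, times = time, func = derivs, parms = parms)
matplot(out0[, "time"], out0[, c("x", "y")], type = "l", lty = 1, xlab = "time", ylab = "state")
points(data$time, data$x, pch = 1) # observed x
points(data$time, data$y, pch = 2) # observed y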
So much for now. I also tried different optimizers via the method argument of modFit, but it still does not work.
Best wishes,
Johannes
Edit: if you use
model_fit <- modFit(f = model_cost, p = parms)
without any lower bounds, then you do get a result (although with warnings), but a112 is then negative, which is what you wanted to rule out.
