R: Force regression coefficients to add up to 1

I'm trying to run a simple OLS regression with the restriction that the coefficients of two variables add up to 1.
I want:
Y = α + β1 * x1 + β2 * x2 + β3 * x3,
where β1 + β2 = 1
I have found how to impose a relation between coefficients such as:
β1 = 2* β2
But I haven't found how to impose a restriction such as:
β1 = 1 - β2
How would I do it in this simple example?
data <- data.frame(
A = c(1,2,3,4),
B = c(3,2,2,3),
C = c(3,3,2,3),
D = c(5,3,3,4)
)
lm(formula = 'D ~ A + B + C', data = data)

β1 + β2 = 1
To enforce β1 + β2 = 1, fit the model
fit <- lm(Y ~ offset(x1) + I(x2 - x1) + x3, data = df)
That is,
Y = α + x1 + β2 * (x2 - x1) + β3 * x3,
obtained by substituting β1 = 1 - β2: the new regressor is x2 - x1, and x1 enters with a fixed coefficient of 1 through offset(x1).
β1 + β2 + β3 = 1
fit <- lm(Y ~ offset(x1) + I(x2 - x1) + I(x3 - x1), data = df)
Y = α + x1 + β2 * (x2 - x1) + β3 * (x3 - x1)
obtained by substituting β1 = 1 - β2 - β3.
β1 + β2 + β3 + ... = 1
The pattern generalizes: subtract one of the constrained variables, x1, from each of the other constrained variables (x2, x3, ...) and fix the coefficient of x1 at 1 with offset(x1).
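Written out in general form, the substitution looks like this. The symbols z_m and γ_m for the unconstrained regressors (such as x3 in the first example) are notation introduced here, not from the original answer.

```latex
Y = \alpha + \sum_{j=1}^{k} \beta_j x_j + \sum_{m} \gamma_m z_m,
\qquad \sum_{j=1}^{k} \beta_j = 1
\;\Longrightarrow\;
\beta_1 = 1 - \sum_{j=2}^{k} \beta_j,
\qquad
Y = \alpha + x_1 + \sum_{j=2}^{k} \beta_j \,(x_j - x_1) + \sum_{m} \gamma_m z_m .
```

The last expression is what lm() fits, with offset(x1) supplying the fixed term; β1 is then recovered as 1 minus the sum of the fitted β_j.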
Example β1 + β2 = 1
# Data
df <- iris[, 1:4]
colnames(df) <- c("Y", paste0("x", 1:3))
# β1 + β2 = 1
fit <- lm(Y ~ offset(x1) + I(x2 - x1) + x3, data = df)
coef_2 <- coef(fit)
beta_1 <- 1 - coef_2[2]
beta_2 <- coef_2[2]
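To apply the pattern to any number of constrained variables, here is a small helper sketch. `sum_to_one_fit` is a hypothetical name, not a function from any package. It builds the offset formula with reformulate(), fits it with lm(), and recovers the implied coefficients, including β1 as 1 minus the others.

```r
# Hypothetical helper: coefficients of `constrained` are forced to sum to 1,
# `free` regressors are left unconstrained (offset() substitution from above)
sum_to_one_fit <- function(data, response, constrained, free = character(0)) {
  base <- constrained[1]
  others <- constrained[-1]
  diff_terms <- sprintf("I(%s - %s)", others, base)
  fml <- reformulate(c(sprintf("offset(%s)", base), diff_terms, free),
                     response = response)
  fit <- lm(fml, data = data)
  # Coefficients of the differenced terms are the betas of `others`
  beta_others <- coef(fit)[diff_terms]
  betas <- c(1 - sum(beta_others), beta_others)
  names(betas) <- constrained
  list(fit = fit, betas = betas)
}

df <- iris[, 1:4]
colnames(df) <- c("Y", paste0("x", 1:3))

res2 <- sum_to_one_fit(df, "Y", c("x1", "x2"), free = "x3")  # beta1 + beta2 = 1
res3 <- sum_to_one_fit(df, "Y", c("x1", "x2", "x3"))         # beta1 + beta2 + beta3 = 1
res2$betas
res3$betas
```

With constrained = c("x1", "x2") and free = "x3", this fits the same model as the example above, so res2$betas[2] should match coef_2[2].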

1) CVXR We can compute the coefficients using CVXR directly by specifying the objective and constraint. We assume that D is the response, the coefficients of A and B must sum to 1, b[1] is the intercept and b[2], b[3] and b[4] are the coefficients of A, B and C respectively.
library(CVXR)
b <- Variable(4)
X <- cbind(1, as.matrix(data[-4]))
obj <- Minimize(sum((data$D - X %*% b)^2))
constraints <- list(b[2] + b[3] == 1)
problem <- Problem(obj, constraints)
soln <- solve(problem)
bval <- soln$getValue(b)
## [,1]
## [1,] 1.6428605
## [2,] -0.3571428
## [3,] 1.3571428
## [4,] -0.1428588
The optimal objective value (soln$value), i.e. the residual sum of squares, is:
## [1] 0.07142857
2) pracma We can also use the pracma package to compute the coefficients. We specify the X matrix, the response vector, the constraint matrix (here the vector given as the third argument is treated as a one-row matrix) and the right-hand side of the constraint.
library(pracma)
lsqlincon(X, data$D, Aeq = c(0, 1, 1, 0), beq = 1) # X is from above
## [1] 1.6428571 -0.3571429 1.3571429 -0.1428571
3) limSolve This package can also solve for the coefficients of regression problems with constraints. The arguments are the same as in (2).
library(limSolve)
lsei(X, data$D, c(0, 1, 1, 0), 1)
## $X
## [1]  1.6428571 -0.3571429  1.3571429 -0.1428571
## $residualNorm
## [1] 0
## $solutionNorm
## [1] 0.07142857
## $type
## [1] "lsei"
4) nls This can be formulated as a problem for nls with the B coefficient equal to one minus the A coefficient.
nls(D ~ b0 + b1 * A + (1-b1) * B + b2 * C, data,
start = list(b0 = 1, b1 = 1, b2 = 1))
## D ~ b0 + b1 * A + (1 - b1) * B + b2 * C
## data: data
## b0 b1 b2
## 1.6429 -0.3571 -0.1429
## residual sum-of-squares: 0.07143
## Number of iterations to convergence: 1
## Achieved convergence tolerance: 2.803e-08
We can double-check this with the lm approach from the other answer:
lm(D ~ I(A-B) + C + offset(B), data)
## Call:
## lm(formula = D ~ I(A - B) + C + offset(B), data = data)
## Coefficients:
## (Intercept)     I(A - B)            C
##      1.6429      -0.3571      -0.1429
The I(A-B) coefficient equals the coefficient of A in the original formulation, and one minus it is the coefficient of B. All approaches lead to the same coefficients.
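As a dependency-free cross-check, the same constrained least-squares problem has a closed-form solution via a Lagrange multiplier. This is the standard textbook formula, not one of the package approaches above; the sketch assumes X has full column rank so that X'X is invertible, which holds for this data.

```r
# Equality-constrained least squares, minimize ||y - X b||^2 subject to R b = r:
#   b_con = b_ols - (X'X)^{-1} R' [R (X'X)^{-1} R']^{-1} (R b_ols - r)
data <- data.frame(
  A = c(1, 2, 3, 4),
  B = c(3, 2, 2, 3),
  C = c(3, 3, 2, 3),
  D = c(5, 3, 3, 4)
)
X <- cbind(1, as.matrix(data[-4]))    # columns: intercept, A, B, C
y <- data$D
R <- matrix(c(0, 1, 1, 0), nrow = 1)  # selects beta_A + beta_B
r <- 1                                # right-hand side of the constraint

XtX_inv <- solve(crossprod(X))
b_ols <- XtX_inv %*% crossprod(X, y)
adj <- XtX_inv %*% t(R) %*% solve(R %*% XtX_inv %*% t(R), R %*% b_ols - r)
b_con <- drop(b_ols - adj)

b_con                     # intercept, A, B, C
sum((y - X %*% b_con)^2)  # residual sum of squares
```

It should reproduce the coefficients above (1.6428571, -0.3571429, 1.3571429, -0.1428571) and the residual sum of squares 0.07142857, which is exactly 1/14.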


Why do MASS::lm.ridge coefficients differ from those calculated manually?

When I perform ridge regression manually, using its definition
solve(t(X) %*% X + lbd*I) %*% t(X) %*% y
I get different results from those calculated by MASS::lm.ridge. Why? For ordinary linear regression, the manual method (computing the pseudoinverse) works fine.
Here is my Minimal, Reproducible Example:
library(tibble)
library(dplyr)
ridgeRegression = function(X, y, lbd) {
Rinv = solve(t(X) %*% X + lbd*diag(ncol(X)))
t(Rinv %*% t(X) %*% y)
}
# generate some data:
tb1 = tibble(
x0 = 1,
x1 = seq(-1, 1, by=.01),
x2 = x1 + rnorm(length(x1), 0, .1),
y = x1 + x2 + rnorm(length(x1), 0, .5)
)
X = as.matrix(tb1 %>% select(x0, x1, x2))
# sanity check: force ordinary linear regression
# and compare it with the built-in linear regression:
ridgeRegression(X, tb1$y, 0) - coef(summary(lm(y ~ x1 + x2, data=tb1)))[, 1]
# looks the same: -2.94903e-17 1.487699e-14 -2.176037e-14
# compare manual ridge regression to MASS ridge regression:
ridgeRegression(X, tb1$y, 10) - coef(MASS::lm.ridge(y ~ x0 + x1 + x2 - 1, data=tb1, lambda = 10))
# noticeably different: -0.0001407148 0.003689412 -0.08905392
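MASS::lm.ridge rescales the data before fitting, and this accounts for the difference in the coefficients. Each column of X is divided by its root mean square (when the formula has an intercept, X and y are also centered first), the ridge problem is solved on the rescaled data, and the coefficients are transformed back to the original scale. In the original units, each squared coefficient in the penalty is therefore weighted by its column's squared scale.
You can confirm this by reading the function source: type MASS::lm.ridge at the R console.
Here is the body of lm.ridge with the scaling lines commented out (and lambda = 10 hard-coded); without the scaling it computes the same quantity as the manual formula: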
X = as.matrix(tb1 %>% select(x0, x1, x2))
n <- nrow(X); p <- ncol(X)
#Xscale <- drop(rep(1/n, n) %*% X^2)^0.5
#X <- X/rep(Xscale, rep(n, p))
Xs <- svd(X)
rhs <- t(Xs$u) %*% tb1$y
d <- Xs$d
lscoef <- Xs$v %*% (rhs/d)
lsfit <- X %*% lscoef
resid <- tb1$y - lsfit
s2 <- sum(resid^2)/(n - p)
HKB <- (p-2)*s2/sum(lscoef^2)
LW <- (p-2)*s2*n/sum(lsfit^2)
k <- 1
dx <- length(d)
div <- d^2 + rep(10, rep(dx,k)) # lambda = 10
a <- drop(d*rhs)/div
dim(a) <- c(dx, k)
coef <- Xs$v %*% a
# x0 x1 x2
#[1,] 0.01384984 0.8667353 0.9452382
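Conversely, you can keep the scaling and reproduce lm.ridge with a closed-form formula. Dividing column j by its root mean square s_j, fitting, and then dividing the coefficients by s_j is algebraically the same as solve(t(X) %*% X + lbd * diag(s^2)) %*% t(X) %*% y in the original units. The sketch assumes the no-intercept formula from the question (so lm.ridge does no centering), uses a base data.frame instead of tibble, and picks an arbitrary seed.

```r
library(MASS)

set.seed(1)  # arbitrary seed; the question's example sets none
x1 <- seq(-1, 1, by = 0.01)
d <- data.frame(x0 = 1, x1 = x1, x2 = x1 + rnorm(length(x1), 0, 0.1))
d$y <- d$x1 + d$x2 + rnorm(nrow(d), 0, 0.5)
X <- as.matrix(d[, c("x0", "x1", "x2")])
lambda <- 10

# Root mean square of each column, as in lm.ridge's Xscale
s <- sqrt(colMeans(X^2))

# Ridge in the original units with the penalty weighted by s^2,
# equivalent to fitting on X / s and dividing the coefficients by s
manual <- drop(solve(crossprod(X) + lambda * diag(s^2), crossprod(X, d$y)))

# No intercept in the formula, so lm.ridge only rescales (no centering)
mass <- coef(lm.ridge(y ~ x0 + x1 + x2 - 1, data = d, lambda = lambda))

rbind(manual = manual, MASS = mass)
```

The two rows should agree to numerical precision, which supports the scaling explanation.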

GAM in R: variance explained by each variable

I am trying to calculate, in R, the variance explained by each variable of a generalized additive model (GAM).
I followed the explanation given by Simon Wood here:
But I would like to do it with three variables.
I tried this:
library(mgcv)
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
x3 <- runif(n, 0, 1)
f1 <- function(x) exp(2 * x) - 3.75887
f2 <- function(x) 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
f3 <- function(x) 0.008*x^2 - 1.8*x + 874
f <- f1(x1) + f2(x2) + f3(x3)
e <- rnorm(n, 0, 2)
y <- f + e
b <- gam(y ~ s(x1, k = 3)+s(x2, k = 3)+ s(x3, k = 3))
b3 <- gam(y ~ s(x1) + s(x2), sp = c(b$sp[1], b$sp[2]))
b2 <- gam(y ~ s(x1) + s(x3), sp = c(b$sp[1], b$sp[3]))
b1 <- gam(y ~ s(x2) + s(x3), sp = c(b$sp[2], b$sp[3]))
b0 <- gam(y~1)
But I don't understand the results. For example, the model with only x1 and x2 has a smaller deviance than the model with all three explanatory variables.
Is the method I used to extract the variance explained by each of the three variables correct?
Does it mean that there is a confounding effect in the global model? Or is there another explanation?
Thanks a lot.
You did something wrong here:
b <- gam(y ~ s(x1, k = 3) + s(x2, k = 3) + s(x3, k = 3))
b3 <- gam(y ~ s(x1) + s(x2), sp = c(b$sp[1], b$sp[2]))
b2 <- gam(y ~ s(x1) + s(x3), sp = c(b$sp[1], b$sp[3]))
b1 <- gam(y ~ s(x2) + s(x3), sp = c(b$sp[2], b$sp[3]))
Why did you set k = 3 in the first line but not for the rest? Without an explicit k, s() uses the default k = 10, so b1, b2 and b3 are not nested in b.
In Simon Wood's original example, k is left unspecified, so k = 10 is used for every s(). You can vary k, but you must guarantee that the same covariate always gets the same k (to ensure nesting). For example, you can do:
b <- gam(y ~ s(x1, k = 4) + s(x2, k = 6) + s(x3, k = 3))
b3 <- gam(y ~ s(x1, k = 4) + s(x2, k = 6), sp = c(b$sp[1], b$sp[2])) ## dropping s(x3) from b
b2 <- gam(y ~ s(x1, k = 4) + s(x3, k = 3), sp = c(b$sp[1], b$sp[3])) ## dropping s(x2) from b
b1 <- gam(y ~ s(x2, k = 6) + s(x3, k = 3), sp = c(b$sp[2], b$sp[3])) ## dropping s(x1) from b
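Then compute, for each term, the increase in deviance from dropping it, as a proportion of the null deviance:
(deviance(b1) - deviance(b)) / deviance(b0) ## s(x1)
# [1] 0.2073421
(deviance(b2) - deviance(b)) / deviance(b0) ## s(x2)
# [1] 0.4323154
(deviance(b3) - deviance(b)) / deviance(b0) ## s(x3)
# [1] 0.02094997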
The positive values imply that dropping any model term inflates the deviance, which is sensible since the true model has all three terms.
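For a fully reproducible run of the corrected procedure, here is a sketch with simulated data. It assumes n = 400 and set.seed(0), neither of which is given in the question, and measures each term's contribution as the increase in deviance from dropping it, divided by the null deviance (the measure from Wood's example).

```r
library(mgcv)

set.seed(0)
n <- 400  # assumption: the question does not state n
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
x3 <- runif(n, 0, 1)
f1 <- function(x) exp(2 * x) - 3.75887
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10
f3 <- function(x) 0.008 * x^2 - 1.8 * x + 874
y <- f1(x1) + f2(x2) + f3(x3) + rnorm(n, 0, 2)

# Full model; each reduced model keeps the same k and smoothing parameters
b  <- gam(y ~ s(x1, k = 4) + s(x2, k = 6) + s(x3, k = 3))
b3 <- gam(y ~ s(x1, k = 4) + s(x2, k = 6), sp = b$sp[c(1, 2)])  # drop s(x3)
b2 <- gam(y ~ s(x1, k = 4) + s(x3, k = 3), sp = b$sp[c(1, 3)])  # drop s(x2)
b1 <- gam(y ~ s(x2, k = 6) + s(x3, k = 3), sp = b$sp[c(2, 3)])  # drop s(x1)
b0 <- gam(y ~ 1)

# Proportion of null deviance attributable to each term
props <- c(
  x1 = (deviance(b1) - deviance(b)) / deviance(b0),
  x2 = (deviance(b2) - deviance(b)) / deviance(b0),
  x3 = (deviance(b3) - deviance(b)) / deviance(b0)
)
props
```

Fixing sp in the reduced models keeps the comparison about the terms themselves rather than about re-estimated smoothness.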

Fitting 1 - exp(x) giving higher weight to the first values

I want to fit a 1 - exp(x) function to a data set, giving higher weight to the first values. However, the following code does not do that:
x <-sqrt((0.05)^2+(0.05)^2)*seq(from = 1, to = 20, by = 1)
y <- c(11,20,27,32,35,36,36.5,25,16,9,4,1,7.87e-16,2.07e-15,-9.36e-16,1.61e-15,-3.81e-16,3.92e-16,7.65e-16,-8.26e-16)
temp <- data.frame(cbind(x,y))
we <- 1/(log1p(seq_along(x)))
# fit non-linear model
mod <- nls(y ~ (1 - exp(a + b * x)), data = temp, start = list(a = 0, b = 0), weights = we)
# plot the data and add the fitted curve
plot(temp$x, temp$y)
lines(temp$x, predict(mod, list(x = temp$x)))
Your specification of weights is correct. The bad fit you obtained is due to your faulty model assumption. You assumed:
y ~ 1 - exp(a + b * x)
Note that exp() gives strictly positive values, so the fitted values 1 - exp(a + b * x) are always less than 1. However, the y values in your data go up to 36.5.
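One way to confirm that the weights argument behaves as intended: for a model that is linear in its parameters, weighted nls() and weighted lm() minimize the same weighted sum of squares, sum(we * (y - fitted)^2), so their coefficients should agree. The straight-line model here is only a check of the weighting, not a suggested model for this data.

```r
x <- sqrt((0.05)^2 + (0.05)^2) * seq(from = 1, to = 20, by = 1)
y <- c(11, 20, 27, 32, 35, 36, 36.5, 25, 16, 9, 4, 1, 7.87e-16, 2.07e-15,
       -9.36e-16, 1.61e-15, -3.81e-16, 3.92e-16, 7.65e-16, -8.26e-16)
temp <- data.frame(x, y)
we <- 1 / log1p(seq_along(x))  # same weights as in the question

# Linear-in-parameters model: both fits minimize sum(we * residual^2)
fit_nls <- nls(y ~ a + b * x, data = temp, start = list(a = 0, b = 0), weights = we)
fit_lm  <- lm(y ~ x, data = temp, weights = we)

rbind(nls = coef(fit_nls), lm = coef(fit_lm))
```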
My idea is not perfect, but it might give you a better starting point. Consider:
y ~ a * x * exp(b * x * x + c * x)
Using your data:
x <- c(0, sqrt((0.05)^2+(0.05)^2)*seq(from = 1, to = 20, by = 1))
y <- c(0, 11,20,27,32,35,36,36.5,25,16,9,4,1,7.87e-16,2.07e-15,-9.36e-16,1.61e-15,-3.81e-16,3.92e-16,7.65e-16,-8.26e-16)
fit <- nls(y ~ a * x * exp(b * x * x + c * x), start = list(a = 30, b= -1, c = -1))
plot(x, y)
lines(x, predict(fit, list(x = x)))

optim in R failing due to NAs

I have been trying to estimate a rather messy nonlinear regression model in R for quite some time now. After countless failed attempts using the nls function, I am now trying my luck with optim, which I have used many times in the past. For this example, I'll use the following data:
x1 <- runif(1000,0,7)
x2 <- runif(1000,0,7)
x3 <- runif(1000,0,7)
y <- log(.5 + .5*x1 + .7*x2 + .4*x3 + .05*x1^2 + .1*x2^2 + .15*x3^2 - .05*x1*x2 - .1*x1*x3 - .07*x2*x3 + .02*x1*x2*x2) + rnorm(1000)
I would like to estimate the parameters in the polynomial expression inside the log() function above, and so I have defined the following function to replicate a nonlinear least squares regression:
g <- function(coefs){
fitted <- coefs[1] + coefs[2]*x1 + coefs[3]*x2 + coefs[4]*x3 + coefs[5]*x1^2 + coefs[6]*x2^2 + coefs[7]*x3^2 + coefs[8]*x1*x2 + coefs[9]*x1*x3 + coefs[10]*x2*x3 + coefs[11]*x1*x2*x3
error <- y - log(fitted)
return(sum(error^2))
}
In order to avoid negative starting values inside the log() expression, I first estimate the linear model below:
lm.1 <- lm(I(exp(y)) ~ x1 + x2 + x3 + I(x1^2) + I(x2^2) + I(x3^2) + I(x1*x2) + I(x1*x3) + I(x2*x3) + I(x1*x2*x3))
intercept.start <- ifelse((min(fitted(lm.1)-lm.1$coefficients[1])) <= 0, -(min(fitted(lm.1)-lm.1$coefficients[1])) + .5, .5)
coefs.start <- c(intercept.start,lm.1$coefficients[-1])
Defining intercept.start above guarantees that the expression inside of log() will be strictly positive at the outset. However, when I run the optim command
nl.model <- optim(coefs.start, g, method="L-BFGS-B")
I get the following error message
Error in optim(coefs.start, g, method = "L-BFGS-B") :
L-BFGS-B needs finite values of 'fn'
In addition: Warning message:
In log(fitted) : NaNs produced
Does anyone know how I can force the optim routine to simply disregard parameter estimates that would produce negative values inside of the log() expression? Thanks in advance.
Here's a slightly different approach.
Aside from the typo mentioned in the comment, if the problem is that the argument to log(...) becomes zero or negative for certain parameter estimates, you can change the function definition to prevent that.
# just some setup - we'll need this later
err <- rnorm(1000, sd=0.1) # note smaller error sd
x1 <- runif(1000,0,7)
x2 <- runif(1000,0,7)
x3 <- runif(1000,0,7)
par <- c(0.5, 0.5, 0.7, 0.4, 0.05, 0.1, 0.15, -0.05, -0.1, -0.07, 0.02)
m <- cbind(1, x1, x2, x3, x1^2, x2^2, x3^2, x1*x2, x1*x3, x2*x3, x1*x2*x3)
y <- as.numeric(log(m %*% par)) + err
# note slight change in the model function definition
g <- function(coefs){
fitted <- coefs[1] + coefs[2]*x1 + coefs[3]*x2 + coefs[4]*x3 + coefs[5]*x1^2 + coefs[6]*x2^2 + coefs[7]*x3^2 + coefs[8]*x1*x2 + coefs[9]*x1*x3 + coefs[10]*x2*x3 + coefs[11]*x1*x2*x3
fitted <- ifelse(fitted<=0, 1, fitted) # ensures fitted > 0
error <- y - log(fitted)
sum(error^2) # objective: residual sum of squares
}
lm.1 <- lm(I(exp(y)) ~ x1 + x2 + x3 + I(x1^2) + I(x2^2) + I(x3^2) + I(x1*x2) + I(x1*x3) + I(x2*x3) + I(x1*x2*x3))
nl.model <- optim(coef(lm.1), g, method="L-BFGS-B", control=list(maxit=1000))
# (Intercept) x1 x2 x3 I(x1^2) I(x2^2) I(x3^2) I(x1 * x2) I(x1 * x3) I(x2 * x3) I(x1 * x2 * x3)
# 0.40453182 0.50136222 0.71696293 0.45335893 0.05461253 0.10210854 0.14913914 -0.06169715 -0.11195476 -0.08497180 0.02531717
with(nl.model, cat(convergence, message))
Note that these estimates are pretty close to the actual values. That's because in the setup I used a smaller error term (sd = 0.1 instead of 1). In your example, the error is large compared to the response (y), so you're basically fitting random error.
If you fit the model using the actual parameter values as starting estimates, you get nearly identical results, no closer to the "true" values.
nl.model <- optim(par, g, method="L-BFGS-B", control=list(maxit=1000))
# [1] 0.40222956 0.50159930 0.71734810 0.45459606 0.05465654 0.10206887 0.14899640 -0.06177640 -0.11209065 -0.08497423 0.02533085
with(nl.model, cat(convergence, message))
Try this with the original error (sd = 1) and see what happens.
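If you prefer a shorter objective, the same clamped least-squares criterion can be written with a design matrix like m in the setup above, which avoids typing out all eleven terms. This is a sketch: the seed is arbitrary, `g_vec` is an illustrative name, and the starting values come from regressing exp(y) on m, as in the lm.1 step.

```r
set.seed(42)  # arbitrary seed
n <- 1000
x1 <- runif(n, 0, 7)
x2 <- runif(n, 0, 7)
x3 <- runif(n, 0, 7)
m <- cbind(1, x1, x2, x3, x1^2, x2^2, x3^2, x1*x2, x1*x3, x2*x3, x1*x2*x3)
par_true <- c(0.5, 0.5, 0.7, 0.4, 0.05, 0.1, 0.15, -0.05, -0.1, -0.07, 0.02)
y <- as.numeric(log(m %*% par_true)) + rnorm(n, sd = 0.1)

# Same objective as g() above, vectorized through the design matrix
g_vec <- function(coefs) {
  fitted <- as.numeric(m %*% coefs)
  fitted[fitted <= 0] <- 1  # same clamp: log(1) = 0 keeps the value finite
  sum((y - log(fitted))^2)
}

# Starting values from a linear fit of exp(y) on the same design matrix
start <- unname(coef(lm(exp(y) ~ m - 1)))
fit <- optim(start, g_vec, method = "L-BFGS-B", control = list(maxit = 1000))
round(fit$par, 3)
fit$convergence
```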
Here's a log of my efforts to investigate. I put a maximum on the fitted values and got convergence. I then asked myself whether increasing that maximum would change the estimated parameters, and found that there was no change... AND there was no difference from the starting values, so I think you messed up in building the function. Perhaps you can investigate further:
> gp <- function(coefs){
+ fitted <- coefs[1] + coefs[2]*x1 + coefs[3]*x2 + coefs[4]*x3 + coefs[5]*x1^2 + coefs[6]*x2^2 + coefs[7]*x3^2 + coefs[8]*x1*x2 + coefs[9]*x1*x3 + coefs[10]*x2*x3 + coefs[11]*x1*x2*x3 }
> describe( gp( coefs.start) ) #describe is from pkg:Hmisc
n missing unique Info Mean .05 .10 .25 .50 .75
1000 0 1000 1 13.99 2.953 4.692 8.417 12.475 18.478
.90 .95
25.476 28.183
lowest : 0.5000 0.5228 0.5684 0.9235 1.1487
highest: 41.0125 42.6003 43.1457 43.5950 47.2234
> g <- function(coefs){
+ fitted <- max( coefs[1] + coefs[2]*x1 + coefs[3]*x2 + coefs[4]*x3 + coefs[5]*x1^2 + coefs[6]*x2^2 + coefs[7]*x3^2 + coefs[8]*x1*x2 + coefs[9]*x1*x3 + coefs[10]*x2*x3 + coefs[11]*x1*x2*x3 , 1000)
+ error <- y - log(fitted)
+ return(sum(error^2))
+ }
> nl.model <- optim(coefs.start, g, method="L-BFGS-B")
> nl.model
$par
                     x1          x2          x3     I(x1^2)
 0.77811231 -0.94586233 -1.33540959  1.65454871  0.31537594
    I(x2^2)     I(x3^2)  I(x1 * x2)  I(x1 * x3)  I(x2 * x3)
 0.45717138  0.11051418  0.59197115 -0.25800792  0.04931727
I(x1 * x2 * x3)

$value
[1] 24178.62

$counts
function gradient
       1        1

$convergence
[1] 0
> g <- function(coefs){
+ fitted <- max( coefs[1] + coefs[2]*x1 + coefs[3]*x2 + coefs[4]*x3 + coefs[5]*x1^2 + coefs[6]*x2^2 + coefs[7]*x3^2 + coefs[8]*x1*x2 + coefs[9]*x1*x3 + coefs[10]*x2*x3 + coefs[11]*x1*x2*x3 , 100000)
+ error <- y - log(fitted)
+ return(sum(error^2))
+ }
> nl.model <- optim(coefs.start, g, method="L-BFGS-B")
> nl.model
$par
                     x1          x2          x3     I(x1^2)
 0.77811231 -0.94586233 -1.33540959  1.65454871  0.31537594
    I(x2^2)     I(x3^2)  I(x1 * x2)  I(x1 * x3)  I(x2 * x3)
 0.45717138  0.11051418  0.59197115 -0.25800792  0.04931727
I(x1 * x2 * x3)

$value
[1] 89493.99

$counts
function gradient
       1        1

$convergence
[1] 0
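One detail in the log above deserves a second look: max() returns a single number (the largest of all its arguments), not a capped vector. Since describe() shows the fitted values topping out near 47, max(..., 1000) is simply 1000 for coefficients near the start, so the objective is flat there. That alone would appear to explain why the parameters never move from their starting values (counts of 1 and 1). An elementwise cap would use pmin(), and an elementwise floor pmax():

```r
v <- c(0.5, 14, 47.2)  # a few values in the range describe() reports
max(v, 1000)           # one number: 1000
pmin(v, 1000)          # elementwise cap: 0.5 14.0 47.2
pmax(v, 1)             # elementwise floor: 1.0 14.0 47.2
```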

R neuralnet does not converge within stepmax for time series

I'm writing a neural network in R, using the neuralnet package, to predict elements of the time series x + sin(x^2). This is how the training data is generated, using windows of 5 consecutive elements: the first 4 are the inputs and the last one is the value to be predicted:
nntr0 <- ((1:25) + sin((1:25)^2))
nntr1 <- ((2:26) + sin((2:26)^2))
nntr2 <- ((3:27) + sin((3:27)^2))
nntr3 <- ((4:28) + sin((4:28)^2))
nntr4 <- ((5:29) + sin((5:29)^2))
Then, I turn these into a data.frame:
nntr <- data.frame(nntr0, nntr1, nntr2, nntr3, nntr4)
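As an aside, base R's embed() can build the same lagged data frame without writing out each shifted vector. embed() puts the most recent value in the first column, so the columns are reversed here to match nntr0 ... nntr4; the name nntr_embed is only for illustration.

```r
y <- (1:29) + sin((1:29)^2)  # the series up to the last target value
# Each row of embed(y, 5) is y[t], y[t-1], ..., y[t-4]; reverse to oldest-first
emb <- embed(y, 5)[, 5:1]
nntr_embed <- setNames(as.data.frame(emb), paste0("nntr", 0:4))
head(nntr_embed)
```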
Then, I proceed to train the NN:
library(neuralnet)
net.sinp <- neuralnet(nntr4 ~ nntr0 + nntr1 + nntr2 + nntr3, data=nntr, hidden=10, threshold=0.04, act.fct="tanh", linear.output=TRUE, stepmax=100000)
Which, after a while, gives me the message
Warning message:
algorithm did not converge in 1 of 1 repetition(s) within the stepmax
Call: neuralnet(formula = nntr4 ~ nntr0 + nntr1 + nntr2 + nntr3, data = nntr, hidden = 10, threshold = 0.04, stepmax = 100000, act.fct = "tanh", linear.output = TRUE)
Can anyone help me figure out why it is not converging? Many thanks
With tanh as an activation function (it is bounded),
it is very difficult to reproduce the linear trend in your signal.
You can use linear activation functions instead,
or try to detrend the signal.
library(neuralnet)
# Data
dx <- 1
n <- 25
x <- seq(0,by=dx,length=n+4)
y <- x + sin(x^2)
y0 <- y[1:n]
y1 <- y[1 + 1:n]
y2 <- y[2 + 1:n]
y3 <- y[3 + 1:n]
y4 <- y[4 + 1:n]
d <- data.frame(y0, y1, y2, y3, y4)
# Default activation: logistic hidden units with a linear output (linear.output = TRUE)
r <- neuralnet(y4 ~ y0 + y1 + y2 + y3, data=d, hidden=10)
plot(y4, compute(r, d[,-5])$net.result)
# No trend
d2 <- data.frame(
y0 = y0 - x[1:n],
y1 = y1 - x[1 + 1:n],
y2 = y2 - x[2 + 1:n],
y3 = y3 - x[3 + 1:n],
y4 = y4 - x[4 + 1:n]
)
r <- neuralnet(y4 ~ y0 + y1 + y2 + y3, data=d2, hidden=10, act.fct="tanh" )
plot(d2$y4, compute(r, d2[,-5])$net.result)
The warning message
algorithm did not converge in 1 of 1 repetition(s) within the stepmax
means that the algorithm reached the maximum number of steps before it converged. If you type ?neuralnet and look at the definition of stepmax, it says:
the maximum steps for the training of the neural network. Reaching this maximum leads to a stop of the neural network's training process.
For your problem, I recommend increasing stepmax to 1e7 and seeing what happens.
The code would be:
net.sinp <- neuralnet(nntr4 ~ nntr0 + nntr1 + nntr2 + nntr3, data=nntr, hidden=10, threshold=0.04, act.fct="tanh", linear.output=TRUE, stepmax=1e7)
