Covariance matrix of the estimated parametric coefficients and estimated smoothing parameters in a GAM (package: mgcv)?

Consider the code below, which fits a generalized additive model with two terms: x1, which is linear, and x2, which is nonlinear:
library(mgcv)
set.seed(2) ## simulate some data...
dat <- gamSim(1,n=400,dist="normal",scale=2)
## method="REML" is an argument of gam(), not of gamSim()
b <- gam(y~x1+s(x2, k=5),data=dat, method="REML")
The model b estimates 3 parameters: an intercept, one parametric coefficient for x1, and one smoothing parameter for x2. How can I extract the estimated covariance matrix of these 3 parameters? I have used vcov(b) which gives the following results:
             (Intercept)           x0      s(x1).1      s(x1).2      s(x1).3      s(x1).4
(Intercept)  0.104672470 -0.155791753  0.002356237  0.001136459  0.001611635  0.001522158
x0          -0.155791753  0.322528093 -0.004878003 -0.002352757 -0.003336490 -0.003151250
s(x1).1      0.002356237 -0.004878003  0.178914602  0.047701707  0.078393786  0.165195739
s(x1).2      0.001136459 -0.002352757  0.047701707  0.479869768  0.606310668  0.010704075
s(x1).3      0.001611635 -0.003336490  0.078393786  0.606310668  0.933905535  0.025816649
s(x1).4      0.001522158 -0.003151250  0.165195739  0.010704075  0.025816649  0.184471259
It seems vcov(b) gives the covariance related to each knot of the smooth term s(x1), as the results contain s(x1).1, s(x1).2, s(x1).3, s(x1).4 (That's what I guess). I need the covariance between the estimated smoothing parameter and other parametric coefficients, which should be just one for (Intercept) and just one for x0. Is it available at all?
Edit: I set the method of estimation to REML in the code. I agree that I might have used incorrect phrases to explain my idea, as Gavin Simpson said, and I understand everything he said. Yet the idea of calculating the covariance between the parametric coefficients (the intercept and the coefficient of x1) and the smoothing parameter comes from the method of estimation. If we set it to ML or REML, then I guess such a covariance could exist. In this case, the estimated covariance matrix for the log smoothing parameter estimates is provided by sp.vcov. So I think a similar value could exist for the parametric coefficients and the smoothing parameter.

Your statement
The model b estimates 3 parameters: an intercept, one parametric coefficient for x1, and one smoothing parameter for x2.
is incorrect.
The model estimates many more coefficients than these three. Note also that it is confusing to speak of a smoothing parameter for x2, as the model does also estimate one of those, but I doubt this is what you mean by that phrase. The smoothing parameter estimated for x2 is the value that controls the wiggliness of the fitted spline. It is estimated alongside the other coefficients you see, although it isn't typically considered part of the main set of estimated model parameters, because what you see in the VCOV are the variances and covariances of the model coefficients conditional upon this value of the smoothing parameter.
The GAM fitted here is one in which the effect of x2 is represented by a spline basis expansion of x2. For the basis used and the identifiability constraints applied to the basis, this means that the true effect of x2, f(x2), is estimated via k-1 = 4 basis functions. The estimate is \hat{f}(x2) = \sum_{i=1}^{k-1} \beta_i b_i(x2): a sum of the basis functions b_i evaluated at the observed values of x2, each weighted by its model coefficient \beta_i.
Hence once the basis is chosen and once we have a smoothness parameter (my version, the one controlling the wiggliness), this model is simply a GLM with x1 and the 4 basis functions evaluated at x2. Hence it is parametric and there isn't a single element in the VCOV that relates to the smooth f(x2) - the model just doesn't work that way.
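A minimal sketch of the distinction, assuming the model from the question with method = "REML" passed to gam(). It shows that vcov() covers the six model coefficients, while sp.vcov() (mentioned in the question's edit) covers the log smoothing parameter and, here, the log scale parameter. The unconditional = TRUE call is mgcv's option for correcting the coefficient covariance for smoothing parameter uncertainty. It is not a cross-covariance between coefficients and the smoothing parameter.

```r
library(mgcv)

set.seed(2)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)
b <- gam(y ~ x1 + s(x2, k = 5), data = dat, method = "REML")

coef(b)     # intercept, x1, and k - 1 = 4 basis-function coefficients
b$sp        # the single smoothing parameter for s(x2)

# Covariance of the 6 coefficients, conditional on the estimated
# smoothing parameter
Vb <- vcov(b)

# Covariance of the log smoothing parameter estimate (last row/column:
# log scale parameter); available for ML/REML fits
V_sp <- sp.vcov(b)

# Coefficient covariance corrected for smoothing parameter uncertainty
Vc <- vcov(b, unconditional = TRUE)
```

Comparing diag(Vb) with diag(Vc) gives a rough sense of how much the smoothing parameter uncertainty matters for each coefficient.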


Is there a way to force the coefficient of the independent variable to be a positive coefficient in the linear regression model used in R?

In lm(y ~ x1 + x2 + x3 + ... + xn), not all of the estimated coefficients have the sign we expect.
For example, we know that x1 to x5 must have positive coefficients and x6 to x10 must have negative coefficients.
However, when lm(y ~ x1 + x2 + x3 + ... + x10) is fitted in R, some of x1 to x5 get negative coefficients and some of x6 to x10 get positive coefficients.
I want to control this within a linear regression method. Is there a good way to do so?
The sign of a coefficient may change depending upon its predictor's correlation with the other predictors. As @TarJae noted, this seems like an example of (or counterpart to?) Simpson's Paradox, which describes cases where the sign of a correlation might reverse depending on whether we condition on another variable.
Here's a concrete example in which I've made two independent variables, x1 and x2, which are both highly correlated to y, but when they are combined the coefficient for x2 reverses sign:
# specially chosen seed; most seeds' result isn't as dramatic
df1 <- data.frame(y = 1:10,
x1 = rnorm(10, 1:10),
x2 = rnorm(10, 1:10))
lm(y ~ ., df1)
lm(formula = y ~ ., data = df1)
(Intercept) x1 x2
-0.2634 1.3990 -0.4792
This result is not incorrect, but arises here (I think) because the prediction errors from x1 happen to be correlated with the prediction errors from x2, such that a better prediction is created by subtracting some of x2.
EDIT, additional analysis:
The more independent series you have, the more likely you are to see this phenomenon arise. For my example with just two series, only 2.4% of the integer seeds from 1 to 1000 produce this phenomenon, where one of the series produces a negative regression coefficient. This increases to 16% with three series, 64% of the time with five series, and 99.9% of the time with 10 series.
Possibilities include using:
nls with algorithm = "port" in which case upper and lower bounds can be specified.
nnnpls in the nnls package which supports upper and lower 0 bounds or use nnls in the same package if all coefficients should be non-negative.
bvls (bounded value least squares) in the bvls package and specify the bounds.
there is an example of performing non-negative least squares in the vignette of the CVXR package.
reformulate it as a quadratic programming problem (see Wikipedia for the formulation) and use quadprog package.
nnls in the limSolve package. Negate the columns that should have negative coefficients to convert it to a non-negative least squares problem.
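As a minimal sketch of the first option in the list above (nls with algorithm = "port"), the example below uses simulated data and hypothetical coefficient names b0, b1, b2. It assumes we want b1 to be non-negative and b2 to be non-positive. The data are made up purely for illustration.

```r
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# Simulated response: the true x2 effect is slightly positive, so the
# upper bound of 0 on b2 is expected to matter here
d$y <- 1 + 2 * d$x1 + 0.3 * d$x2 + rnorm(n)

# Unconstrained least squares for comparison
coef(lm(y ~ x1 + x2, data = d))

# Bounded least squares: b1 >= 0 and b2 <= 0, intercept unbounded.
# lower/upper follow the order of the parameters in `start`.
fit <- nls(y ~ b0 + b1 * x1 + b2 * x2, data = d,
           start = list(b0 = 0, b1 = 1, b2 = -1),
           algorithm = "port",
           lower = c(-Inf, 0, -Inf),
           upper = c( Inf, Inf, 0))
coef(fit)
```

Standard errors reported for a coefficient that sits on its bound should be treated with caution.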
These packages mostly do not have a formula interface but instead require that a model matrix and dependent variable be passed as separate arguments. If df is a data frame containing the data and if the first column is the dependent variable then the model matrix can be calculated using:
A <- model.matrix(~., df[-1])
and the dependent variable is
df[[1]]
Another approach is to add a penalty to the least squares objective function, i.e. the objective function becomes the sum of the squares of the residuals plus one or more additional terms that are functions of the coefficients and tuning parameters. Although doing this does not impose any hard constraints to guarantee the desired signs it may result in the correct signs anyways. This is particularly useful if the problem is ill conditioned or if there are more predictors than observations.
linearRidge in the ridge package will minimize the sum of the square of the residuals plus a penalty equal to lambda times the sum of the squares of the coefficients. lambda is a scalar tuning parameter which the software can automatically determine. It reduces to least squares when lambda is 0. The software has a formula method which along with the automatic tuning makes it particularly easy to use.
glmnet adds penalty terms containing two tuning parameters. It includes least squares and ridge regression as special cases. It also supports bounds on the coefficients. There are facilities to automatically set the two tuning parameters, but it does not have a formula method and the procedure is not as straightforward as in the ridge package. Read the vignettes that come with it for more information.
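A minimal sketch of the coefficient bounds in glmnet, assuming simulated data with four predictors where the first two should be non-negative and the last two non-positive. The lower.limits and upper.limits arguments are glmnet's own. The data, alpha = 0 (ridge) and the single small lambda are arbitrary choices for illustration, not a tuned model.

```r
library(glmnet)

set.seed(1)
n <- 200
x <- matrix(rnorm(n * 4), n, 4)
y <- drop(x %*% c(1, 0.5, -1, -0.5) + rnorm(n))

# One entry per predictor: lower limits must be <= 0, upper limits >= 0
fit <- glmnet(x, y, alpha = 0, lambda = 0.01,
              lower.limits = c(0, 0, -Inf, -Inf),
              upper.limits = c(Inf, Inf, 0, 0))

b <- as.numeric(coef(fit))[-1]   # drop the intercept
b
```

In practice lambda would usually be chosen with cv.glmnet(), which accepts the same limit arguments.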
1- one way is to define an optimization program and minimize the mean square error by constraints and limits. (nlminb, optim, etc.)
2- Another way is to use the lavaan package, which lets you place constraints on coefficients in the model syntax.

How do I specify the dispersion parameter when computing the confidence interval for a GLM?

I have a model of exponential decay in the form Y = exp{a + bX + cW}. In R, I represent this as a generalized linear model (GLM) using a gamma random component with log link function.
fitted <- glm(Y ~ X + W, family=Gamma(link='log'))
I know from this post that for the standard errors to really represent an exponential rather than gamma random component, I need to specify the dispersion parameter as being 1 when I call summary.
summary(fitted, dispersion=1)
summary(fitted) # not the same!
Now, I want to find the 95% confidence intervals for my estimates of a, b, c. However, there seems to be no way to specify the dispersion parameter for the confint, even though I know it should affect the confidence interval (because it affects the standard error).
confint(fitted)
confint(fitted, dispersion=1) # same as the last confint :(
So, in order to get the confidence intervals corresponding to an exponential rather than gamma random component, how do I specify the dispersion parameter when computing the confidence interval for a GLM?
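One possible workaround, sketched under stated assumptions: build Wald intervals by hand from the standard errors that summary(fitted, dispersion=1) reports. The data below are simulated, and the intervals are normal-approximation (Wald) intervals, so they will not match the profile-likelihood intervals that confint computes for a glm.

```r
set.seed(42)
n <- 200
X <- runif(n)
W <- runif(n)
# Simulated exponential response with log-linear mean (illustrative values)
Y <- rexp(n, rate = 1 / exp(0.5 - 1 * X + 0.3 * W))

fitted <- glm(Y ~ X + W, family = Gamma(link = "log"))

# Coefficient table with standard errors computed at dispersion = 1
est <- summary(fitted, dispersion = 1)$coefficients

# 95% Wald intervals: estimate +/- z * SE
z  <- qnorm(0.975)
ci <- cbind(lower = est[, "Estimate"] - z * est[, "Std. Error"],
            upper = est[, "Estimate"] + z * est[, "Std. Error"])
ci
```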

Using lambda.min to extract coefficients from a model trained with glmnet

I am using glmnet to train the logistic regression model and then try to obtain the coefficients with the specific lambda. I used the simple example here:
fit = glmnet(x, y, family = "binomial")
coef(fit, s = c(0.05,0.01))
I have checked the values of fit$lambda; however, I could not find the specific values 0.05 or 0.01 in it. So how can coef return coefficients for a lambda that is not in the fit$lambda vector?
This is explained in the help for coef.glmnet, specifically the exact argument:
This argument is relevant only when predictions are made at values of s (lambda) different from those used in the fitting of the original model. If exact=FALSE (default), then the predict function uses linear interpolation to make predictions for values of s (lambda) that do not coincide with those used in the fitting algorithm. While this is often a good approximation, it can sometimes be a bit coarse. With exact=TRUE, these different values of s are merged (and sorted) with object$lambda, and the model is refit before predictions are made.
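A minimal sketch of the two behaviours described in the help text, using simulated data (the x and y below are made up). With exact = TRUE, current versions of glmnet need the original x and y passed again so that the model can be refit.

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- rbinom(100, 1, plogis(x[, 1] - x[, 2]))

fit <- glmnet(x, y, family = "binomial")

s <- 0.05
s %in% fit$lambda   # the requested value is not on the fitted lambda path

# Default (exact = FALSE): coefficients are linearly interpolated between
# the two neighbouring lambda values on the path
b_interp <- coef(fit, s = s)

# exact = TRUE: the model is refit with s merged into the lambda sequence
b_exact <- coef(fit, s = s, exact = TRUE, x = x, y = y)

cbind(interpolated = as.numeric(b_interp), exact = as.numeric(b_exact))
```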

How does lmer (from the R package lme4) compute log likelihood?

I'm trying to understand the function lmer. I've found plenty of information about how to use the command, but not much about what it's actually doing (save for some cryptic comments here). I'm playing with the following simple example:
I understand that lmer is fitting a model of the form Y_{ij} = beta + B_i + epsilon_{ij}, where epsilon_{ij} and B_i are independent normals with variances sigma^2 and tau^2 respectively. If theta = tau/sigma is fixed, I computed the estimate for beta with the correct mean and minimum variance to be
c = sum_{i,j} alpha_i y_{ij}
alpha_i = lambda/(1 + theta^2 n_i)
lambda = 1/[\sum_i n_i/(1+theta^2 n_i)]
n_i = number of observations from group i
I also computed the following unbiased estimate for sigma^2:
s^2 = \sum_{i,j} alpha_i (y_{ij} - c)^2 / (1 + theta^2 - lambda)
These estimates seem to agree with what lmer produces. However, I can't figure out how log likelihood is defined in this context. I calculated the probability density to be
pd(Y_{ij}=y_{ij}) = \prod_{i,j}[f_sigma(y_{ij}-ybar_i)]
* prod_i[f_{sqrt(sigma^2/n_i+tau^2)}(ybar_i-beta) sigma sqrt(2 pi/n_i)]
ybar_i = \sum_j y_{ij}/n_i (the mean of observations in group i)
f_sigma(x) = 1/(sqrt{2 pi} sigma) exp(-x^2/(2 sigma^2)) (normal density with sd sigma)
But log of the above is not what lmer produces. How is log likelihood computed in this case (and for bonus marks, why)?
Edit: Changed notation for consistency, struck out incorrect formula for standard deviation estimate.
The links in the comments contained the answer. Below I've put what the formulae simplify to in this simple example, since the results are somewhat intuitive.
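lmer fits a model of the form $Y_{ij} = \beta + B_i + \epsilon_{ij}$, where $\epsilon_{ij}$ and $B_i$ are independent normals with variances $\sigma^2$ and $\tau^2$ respectively. Writing $f_s$ for the normal density with mean 0 and standard deviation $s$, the joint probability distribution of $Y_{ij}$ and $B_i$ is therefore
$$\prod_{i,j} f_\sigma(y_{ij} - \beta - b_i) \prod_i f_\tau(b_i).$$
The likelihood is obtained by integrating this with respect to $b_i$ (which isn't observed) to give
$$L = \prod_{i,j} f_\sigma(y_{ij} - \bar{y}_i) \prod_i \left[ f_{\sqrt{\sigma^2/n_i + \tau^2}}(\bar{y}_i - \beta) \, \sigma \sqrt{2\pi/n_i} \right],$$
where $n_i$ is the number of observations from group $i$, and $\bar{y}_i$ is the mean of observations from group $i$. This is somewhat intuitive since the first term captures spread within each group, which should have variance $\sigma^2$, and the second captures the spread between groups. Note that $\sigma^2/n_i + \tau^2$ is the variance of $\bar{y}_i$.
However, by default (REML=T) lmer maximises not the likelihood but the "REML criterion", obtained by additionally integrating this with respect to $\beta$ to give
$$R = \prod_{i,j} f_\sigma(y_{ij} - \bar{y}_i) \prod_i \left[ f_{\sqrt{\sigma^2/n_i + \tau^2}}(\bar{y}_i - \hat\beta) \, \sigma \sqrt{2\pi/n_i} \right] \sqrt{\frac{2\pi}{\sum_i 1/(\sigma^2/n_i + \tau^2)}},$$
where $\hat\beta$ is given below.
Maximising likelihood (REML=F)
If $\theta = \tau/\sigma$ is fixed, we can explicitly find the $\beta$ and $\sigma$ which maximise likelihood. They turn out to be
$$\hat\beta = \frac{\sum_i \bar{y}_i/(1/n_i + \theta^2)}{\sum_i 1/(1/n_i + \theta^2)}, \qquad \hat\sigma^2 = \frac{1}{N} \left[ \sum_{i,j} (y_{ij} - \bar{y}_i)^2 + \sum_i \frac{(\bar{y}_i - \hat\beta)^2}{1/n_i + \theta^2} \right],$$
where $N = \sum_i n_i$ is the total number of observations.
Note $\hat\sigma^2$ has two terms for variation within and between groups, and $\hat\beta$ is somewhere between the mean of $y_{ij}$ and the mean of $\bar{y}_i$ depending on the value of $\theta$.
Substituting these into likelihood, we can express the log likelihood in terms of $\theta$ only:
$$-2 \log L = N \log(2\pi\hat\sigma^2) + N + \sum_i \log(1 + n_i \theta^2).$$
lmer iterates to find the value of $\theta$ which minimises this. In the output, $-2 \log L$ and $\log L$ are shown in the fields "deviance" and "logLik" (if REML=F) respectively.
Maximising restricted likelihood (REML=T)
Since the REML criterion doesn't depend on $\beta$, we use the same estimate for $\beta$ as above. We estimate $\sigma$ to maximise the REML criterion:
$$\hat\sigma_R^2 = \frac{1}{N-1} \left[ \sum_{i,j} (y_{ij} - \bar{y}_i)^2 + \sum_i \frac{(\bar{y}_i - \hat\beta)^2}{1/n_i + \theta^2} \right].$$
The restricted log likelihood is given by
$$-2 \log R = (N-1) \log(2\pi\hat\sigma_R^2) + (N-1) + \sum_i \log(1 + n_i \theta^2) + \log\left( \sum_i \frac{1}{1/n_i + \theta^2} \right).$$
In the output of lmer, $-2 \log R$ and $\log R$ are shown in the fields "REMLdev" and "logLik" (if REML=T) respectively.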
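As a rough check of the maximum-likelihood case, the sketch below evaluates the profiled deviance for a random-intercept model with theta = tau/sigma fixed, using the closed-form estimates of beta and sigma^2. It compares the result with -2 log of the multivariate normal density computed directly from the covariance matrix. The data are simulated, the function names profile_ml and direct_dev are hypothetical, and base R is used instead of lme4. Under these assumptions, the minimised value should correspond to what lmer(y ~ 1 + (1 | g), REML = FALSE) reports as the deviance, though that comparison is not run here.

```r
set.seed(1)
n_i <- c(3, 4, 5, 6, 7)                      # group sizes
g   <- rep(seq_along(n_i), times = n_i)      # group labels 1..5
y   <- 2 + rnorm(length(n_i))[g] + rnorm(length(g), sd = 0.5)

# Profiled -2 log-likelihood as a function of theta = tau / sigma
profile_ml <- function(theta, y, g) {
  n_i  <- tabulate(g)
  ybar <- as.numeric(tapply(y, g, mean))
  N    <- length(y)
  w    <- 1 / (1 / n_i + theta^2)            # weights on the group means
  beta <- sum(w * ybar) / sum(w)
  S    <- sum((y - ybar[g])^2) + sum(w * (ybar - beta)^2)
  sigma2 <- S / N
  dev  <- N * log(2 * pi * sigma2) + N + sum(log(1 + n_i * theta^2))
  list(beta = beta, sigma2 = sigma2, deviance = dev)
}

# Direct -2 log density of y ~ N(beta, sigma2 * (I + theta^2 Z Z'))
direct_dev <- function(theta, beta, sigma2, y, g) {
  Z <- outer(g, sort(unique(g)), "==") * 1   # group indicator matrix
  V <- sigma2 * (diag(length(y)) + theta^2 * tcrossprod(Z))
  r <- y - beta
  as.numeric(determinant(V)$modulus) + length(y) * log(2 * pi) +
    sum(r * solve(V, r))
}

p <- profile_ml(0.7, y, g)
c(profiled = p$deviance,
  direct   = direct_dev(0.7, p$beta, p$sigma2, y, g))

# Minimise the profiled deviance over theta
opt <- optimize(function(t) profile_ml(t, y, g)$deviance, interval = c(0, 20))
fit <- profile_ml(opt$minimum, y, g)
```

The REML version differs only in dividing by N - 1 instead of N and in the extra log term, so profile_ml could be adapted in the same way.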

Determining the values of beta in a logistic regression equation

I read about logistic regression on Wikipedia. It describes an equation in which z, the output, depends on the values of beta1, beta2, beta3, and so on, which are called regression coefficients, along with beta0, which is the intercept.
How do you determine the values of these coefficients and the intercept, given a sample set where we know the output probability and the values of the input factors?
Use R. Here's a video:
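A minimal sketch of what this looks like in R, assuming simulated data with two predictors and 0/1 outcomes. glm with family = binomial estimates beta0, beta1 and beta2 by maximum likelihood. The "true" values below are made up so that the estimates can be compared with them.

```r
set.seed(123)
n  <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)

# z = beta0 + beta1 * x1 + beta2 * x2, with p = 1 / (1 + exp(-z))
beta_true <- c(-0.5, 1.2, -0.8)
p <- plogis(beta_true[1] + beta_true[2] * x1 + beta_true[3] * x2)
y <- rbinom(n, size = 1, prob = p)

# Fit by maximum likelihood (iteratively reweighted least squares)
fit <- glm(y ~ x1 + x2, family = binomial)

coef(fit)            # estimated intercept and regression coefficients
head(fitted(fit))    # estimated probabilities for the first few rows
```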
