Explicit formula for Savitzky-Golay matrix

Does anyone have an explicit formula for the entries of the Savitzky-Golay matrix, i.e., a direct expression that depends only on the radii of the tensor in its various dimensions, the polynomial regression orders, and the derivative orders?
There are examples for finite sets of data points, but I can't seem to find an explicit formula.
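Not the closed form asked for, but in the 1-D case the matrix can be built directly from the polynomial regression it encodes: the convolution coefficients are rows of the pseudoinverse $(A^T A)^{-1} A^T$ of the Vandermonde design matrix $A$. A minimal R sketch (the function name sg_coeffs is mine), assuming a symmetric window of half-width m, polynomial order p, and derivative order d evaluated at the window centre:
sg_coeffs <- function(m, p, d = 0) {
  stopifnot(p <= 2 * m, d <= p)
  x <- -m:m                       # symmetric abscissae
  A <- outer(x, 0:p, `^`)         # Vandermonde design matrix
  H <- solve(t(A) %*% A, t(A))    # (A'A)^{-1} A': maps data to polynomial coefficients
  factorial(d) * H[d + 1, ]       # d-th derivative estimate at the window centre
}
sg_coeffs(2, 2)   # classic 5-point quadratic smoother: (-3, 12, 17, 12, -3)/35
Closed forms in terms of Gram polynomials do exist in the literature, and for a tensor-product polynomial basis the multidimensional coefficients are outer products of the 1-D coefficient sets.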

Related

Is there an R function to compare two variance-covariance matrices via fit indicators?

I obtained two variance-covariance matrices from two different samples. Both contain data on the same variables. I would like to estimate their similarity according to fit indices, i.e., I am interested in whether the pattern of covariances between the variables is similar or different in the two samples. I am familiar with fit indices from structural equation modeling (e.g., chi-square, GFI, CFI, RMSEA, SRMR), which compare an empirical variance-covariance matrix with a model-implied variance-covariance matrix. Is there a way to obtain these fit indicators for the comparison of two empirical variance-covariance matrices?
I tried compareCov, which only gives a visual comparison.
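For what it's worth, some of these indices are simple enough to compute by hand. A hedged sketch of an SRMR-style index, treating one empirical matrix as "observed" and the other as "implied" (the function srmr2 is my own stand-in, not a packaged fit statistic):
srmr2 <- function(S_obs, S_imp) {
  d <- sqrt(diag(S_obs))                      # observed standard deviations
  resid <- (S_obs - S_imp) / tcrossprod(d)    # standardized residual matrix
  lt <- lower.tri(resid, diag = TRUE)         # unique elements only
  sqrt(mean(resid[lt]^2))                     # root mean squared standardized residual
}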

How to use the 'weights' in the nls (non-linear least squares) function in R?

My question is on how to correctly interpret (and use) the 'weights' input variable in the nls function of R for non-linear weighted least squares regression.
The solution for the unknown parameters in weighted least squares theory is
$\hat{\beta} = (X^T P X)^{-1} X^T P Y$
where $P$ is the square weight matrix of size $(N \times N)$ and $N$ is the number of data observations.
However, the nls documentation in R says that the 'weights' input is a vector.
This has me puzzled since, based on my understanding, the weights should form a square matrix. Any insight from those with a better understanding is appreciated.
A weight in regression is a measure of how important an observation is to your model, for various reasons (e.g., the reliability of the measurement, or the inverse of a variance estimate). Some observations may therefore be more important, and weigh more heavily, than others.
A weight vector $(w_1, \dots, w_n)$ converts in matrix notation to a diagonal matrix $P = \operatorname{diag}(w_1, \dots, w_n)$; both represent the same thing, namely the weight of the $i$-th observation. For nls in R you need to supply the weights in vector form.
It should also be noted that weighted least squares is a special variant of generalized least squares in which we use weights to counter heteroskedasticity. If the residuals are correlated across observations, a more general model might be suitable.
PS: Cross Validated would be the right place to get a more detailed answer. Also, it is more memory-efficient to store a vector than a full matrix as the number of observations grows.
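A minimal sketch with made-up data, showing the vector form in action; nls minimizes sum(w_i * r_i^2), i.e., exactly the diagonal-P objective:
set.seed(1)
x <- 1:50
y <- 2 * exp(0.05 * x) + rnorm(50, sd = 0.1 * x)   # heteroskedastic noise
w <- 1 / x^2                                       # inverse-variance weights
fit <- nls(y ~ a * exp(b * x), start = list(a = 1, b = 0.05), weights = w)
coef(fit)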

Simple variogram in R, understanding gstat::variogram() and object gstat

I have a data.frame in R whose variables represent locations and whose observations are measurements of a certain variable at those locations. I want to measure the decay of dependence between certain locations as a function of distance, so the variogram is particularly useful for my studies.
I am trying to use the gstat library, but I am a bit confused about certain parameters. As far as I understand, the (empirical) variogram should only need as basic data:
The locations of the variables
Observations for these variables
And then other parameters like maximum distance, directions, ...
Now, the gstat::variogram() function requires as first input an object of class gstat. Checking the documentation of the function gstat(), I see that it outputs an object of this class, but this function requires a formula argument, which is described as:
formula that defines the dependent variable as a linear model of independent variables; suppose the dependent variable has name z, for ordinary and simple kriging use the formula z~1; for simple kriging also define beta (see below); for universal kriging, suppose z is linearly dependent on x and y, use the formula z~x+y
Could someone explain to me what this formula is for?
Try
methods(variogram)
and you'll see that gstat has several methods for variogram, one of which requires a gstat object as its first argument.
Given a data.frame, the easiest is to use the formula method:
variogram(z~1, ~x+y, data)
which specifies that, in data, z is the observed variable of interest; ~1 specifies a constant mean model, and ~x+y specifies that the coordinates are found in columns x and y of data.
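A complete example on the meuse data set that ships with sp (assuming the sp and gstat packages are installed):
library(sp)
library(gstat)
data(meuse)                                    # data.frame with x, y coordinate columns
v <- variogram(log(zinc) ~ 1, ~ x + y, meuse)  # constant mean, coordinates in x and y
plot(v)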

How do you do constrained non-linear least squares in R

I am fitting a non-linear least squares model in R. I wish to minimize $\|Y - f(Xb)\|^2$, where $f$ is a nonlinear monotone differentiable function, $X$ is a set of features, and $b$ is the parameter vector. Is there a way of doing this with constraints on $b$? I want to constrain $b$ to be greater than 0, and I want L1-style shrinkage of some of the elements to 0. Is there a way of doing this in R? nls() doesn't allow for constraints.
You can convert the $\|\boldsymbol{b}\|_1$ term in your objective into a simple sum by putting a nonnegativity constraint on each element of $\boldsymbol{b}$ (then $\|\boldsymbol{b}\|_1 = \sum_i b_i$), and then use quadprog to solve the problem.
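A sketch of that idea using optim's box constraints rather than quadprog, with a made-up logistic f and simulated data: once b >= 0 is enforced, the L1 penalty is just lambda * sum(b) and the objective stays smooth.
set.seed(1)
X <- matrix(runif(200), 50, 4)
b_true <- c(1.5, 0, 0.8, 0)
Y <- plogis(X %*% b_true) + rnorm(50, sd = 0.02)   # f = logistic, as an example

obj <- function(b, lambda = 0.1)
  sum((Y - plogis(X %*% b))^2) + lambda * sum(b)   # ||b||_1 == sum(b) when b >= 0

fit <- optim(rep(0.5, 4), obj, method = "L-BFGS-B", lower = 0)
round(fit$par, 3)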

Cross validation of PCA+lm

I'm a chemist, and about a year ago I decided to learn more about chemometrics.
I'm working with this problem that I don't know how to solve:
I performed an experimental design (Doehlert type with 3 factors) recording several analyte concentrations as Y.
Then I performed a PCA on Y and I used scores on the first PC (87% of total variance) as new y for a linear regression model with my experimental coded settings as X.
Now I need to perform a leave-one-out cross-validation: remove each object before performing the PCA on the new "training set", then create the regression model on the scores as I did before, predict the score value for the observation in the "test set", and calculate the prediction error by comparing the predicted score with the score obtained by projecting the test-set object into the space of the previous PCA. This is repeated n times (with n the number of points in my experimental design).
I'd like to know how can I do it with R.
Do the calculations e.g. by prcomp and then lm. For that you need to apply the PCA model returned by prcomp to new data. This needs two (or three) steps:
Center the new data with the same center that was calculated by prcomp
Scale the new data with the same scaling vector that was calculated by prcomp
Apply the rotation calculated by prcomp
The first two steps are done by scale, using the $center and $scale elements of the prcomp object. You then matrix-multiply your data by $rotation[, components.to.use].
You can easily check your reconstruction of the PCA scores calculation by calculating scores for the data you input to prcomp and comparing the results with the $x element of the PCA model returned by prcomp.
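A small self-contained sketch of those steps (the data here are simulated stand-ins):
set.seed(1)
Y <- matrix(rnorm(10 * 6), 10, 6)        # stand-in for the analyte matrix
train <- Y[-1, ]; test <- Y[1, , drop = FALSE]
pca <- prcomp(train, center = TRUE, scale. = TRUE)
# steps 1-3: centre, scale, rotate the held-out row
test_score <- scale(test, center = pca$center, scale = pca$scale) %*%
  pca$rotation[, 1, drop = FALSE]
# sanity check: reconstructing the training scores matches pca$x
max(abs(scale(train, pca$center, pca$scale) %*% pca$rotation - pca$x))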
Edit in the light of the comment:
If the purpose of the CV is to calculate some kind of error, then you can choose between calculating the error of the predicted scores y (which is how I understand you) and calculating the error of Y: PCA also lets you go backwards and predict the original variates from the scores. This is easy because the loadings ($rotation) are orthogonal, so the inverse is just the transpose.
Thus, the prediction in the original Y space is scores %*% t(pca$rotation), which is computed faster by tcrossprod(scores, pca$rotation).
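Putting it together, a hedged sketch of the leave-one-out loop the question describes, with simulated stand-ins for the design matrix and the analyte data:
set.seed(1)
n <- 12
X <- matrix(runif(n * 3), n, 3)          # stand-in for the coded Doehlert settings
Y <- matrix(rnorm(n * 5), n, 5)          # stand-in for the analyte concentrations

err <- numeric(n)
for (i in seq_len(n)) {
  pca  <- prcomp(Y[-i, ], center = TRUE, scale. = TRUE)
  fit  <- lm(pca$x[, 1] ~ X[-i, ])                 # regress PC1 scores on the design
  pred <- sum(coef(fit) * c(1, X[i, ]))            # predicted PC1 score for object i
  proj <- scale(Y[i, , drop = FALSE], pca$center, pca$scale) %*% pca$rotation[, 1]
  err[i] <- proj - pred                            # projected minus predicted score
}
sqrt(mean(err^2))                                  # RMSEP on the score scale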
There is also the R library pls (Partial Least Squares), which has tools for PCR (Principal Component Regression).
