Hi, I'm looking for some clarification here.
Context: I want to draw a line in a scatterplot that doesn't appear parametric, so I am using geom_smooth() in ggplot. It automatically reports: geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method. I gather that gam stands for generalized additive model and that a cubic spline is used here.
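For concreteness, a minimal sketch of the call on the built-in mtcars data rather than my real data (method = "gam" needs the mgcv package installed):

library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  # the smoother that 'auto' picks for large data, made explicit:
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"), se = FALSE) +
  # the alternative local-regression smoother:
  geom_smooth(method = "loess", se = FALSE, colour = "red")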
Are the following perceptions correct?
-Loess estimates the response at specific values.
-Splines are approximations built by joining piecewise polynomial functions fitted to the data (these make up the generalized additive model), and cubic splines are the specific type of spline used here.
Lastly, when should splines be used, and when should loess be used?
I have used the 'rms' package to use restricted cubic splines in my cox regression model.
Example of my univariate code:
S_HTN <- Surv(data_HTN$time + data_HTN$age, data_HTN$event_HTN)
htn_dd <- datadist(data_HTN)
options(datadist = 'htn_dd')
HTN_spline <- cph(S_HTN ~ rcs(centiles, 3), data = data_HTN)
I have plotted these fine via ggplot, but what I want to know is whether I can see where the knots are and then use these in other analyses?
You can access the formula of your fitted model along with the knot locations by using the function Function(), i.e. Function(HTN_spline).
However, you can also adjust knot locations manually, with x, y and z being your three desired knots:
HTN_spline <- cph(S_HTN ~ rms::rcs(centiles, c(x, y, z)), data=data_HTN)
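A minimal sketch of pulling the knots out programmatically (assuming the HTN_spline fit from above; I believe rms stores them in the fit's Design$parms slot, but that is worth verifying on your version):

library(rms)
Function(HTN_spline)   # prints the model as an R function; the knot
                       # values appear as constants in the formula
knots <- HTN_spline$Design$parms$centiles   # knot locations for 'centiles'
knots
# they can then be passed explicitly to rcs() in another model
HTN_spline2 <- cph(S_HTN ~ rcs(centiles, knots), data = data_HTN)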
I am just struggling to find an answer to a statistical/R question concerning the use of splines.
I have been building a linear model such as the one below:
lm(imaging ~ bloodtest + age + sex + timetoblood + timetoimage, data = df)
but have found from the residuals and examination of the plot that the fit of the model is not great, and that the relationship is curvilinear.
I want to examine the use of restricted cubic splines in the regression model, but am wondering how to go about this. All the worked examples I can find are of univariate models, and I am wondering how to include splines in a model with multiple predictors?
I believe that a restricted cubic spline (linear at the endpoints) is the same as a natural spline, implemented as ns() in the splines package (a "recommended" package, so it comes with R).
You can replace any or all of the continuous predictors in your model with natural spline terms with an appropriate number of degrees of freedom (this is a decision you have to make: see e.g. Harrell's Regression Modeling Strategies for guidance).
library(splines)
lm(imaging ~ bloodtest + ns(age, 7) + sex + ns(timetoblood, 7) + ns(timetoimage, 7),
   data = df)
If you want smooth interaction terms you might need to move to the mgcv package, which offers tensor product smooths (and uses penalized regression splines).
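A hedged sketch of what that could look like with mgcv (variable names taken from the question; the smooth and tensor-product terms are illustrative choices, not a recommendation):

library(mgcv)
# s() fits penalized smooths of single predictors; te() fits a smooth
# tensor-product interaction between the two time variables
fit <- gam(imaging ~ bloodtest + s(age) + sex + te(timetoblood, timetoimage),
           data = df)
summary(fit)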
I am developing a Cox regression model in R.
The model I am currently using is as follows:
fh <- cph(S ~ rcs(MPV, 4) + rcs(age, 3) + BMI + smoking + hyperten + gender +
            rcs(FVCPP, 3) + TLcoPP,
          x = TRUE, y = TRUE, surv = TRUE, time.inc = 2*52)
If I then want to look at this with
print(fh, latex = TRUE)
I get 3 coefficients/SEs/Wald statistics etc. for MPV (MPV, MPV' and MPV'') and 2 for age (age, age').
Could someone please explain what these outputs are? I believe they are to do with the restricted cubic splines I have added.
When you write rcs(MPV, 4), you define the number of knots to use in the spline; in this case 4. Similarly, rcs(age, 3) defines a spline with 3 knots. A restricted cubic spline with $k$ knots is parameterized by $k-1$ regression terms: the linear term plus $k-2$ nonlinear basis functions (you can think of the remaining degree of freedom as an intercept for each spline). So rcs(age, 3) is a linear combination of 2 basis functions and an intercept, while rcs(MPV, 4) is a linear combination of 3 basis functions and an intercept, i.e.,
$$f(\text{age}) = \alpha_{\text{age}} + \beta_1\,\text{age} + \beta_2\,b_1(\text{age})$$
and
$$f(\text{MPV}) = \alpha_{\text{MPV}} + \gamma_1\,\text{MPV} + \gamma_2\,c_1(\text{MPV}) + \gamma_3\,c_2(\text{MPV}),$$
where $b_1$, $c_1$ and $c_2$ are the nonlinear restricted cubic basis functions.
In the notation above, what you get out of the print statement are the regression coefficients $\beta_1, \beta_2$ and $\gamma_1, \gamma_2, \gamma_3$ (labelled age, age' and MPV, MPV', MPV''), with corresponding standard errors, p-values etc. The intercepts $\alpha_{\text{age}}$ and $\alpha_{\text{MPV}}$ are typically set to zero, but they are important, because without them the model fitting routine would have no idea where on the y-axis to constrain the splines.
As a final note, you might actually be more interested in the output of summary(fh).
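A minimal sketch of those follow-up calls (assuming the fh fit above; mydata stands in for the data frame it was fitted on, since rms needs a datadist for effect summaries):

library(rms)
dd <- datadist(mydata)   # mydata is a stand-in for your data frame
options(datadist = "dd")
summary(fh)  # effect estimates, e.g. inter-quartile-range hazard ratios
anova(fh)    # Wald tests per predictor, including tests of nonlinearity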
I want to do a ridge regression in R by using glmnet or lm.ridge.
I need to do this regression with log(Y)
cost ~ size + weight ⇒ log(cost) ~ size + weight
However, I found that glmnet and lm.ridge have no link argument the way glm does.
Any ideas for this issue?
Use the alpha input parameter (set to 0) of the glmnet function. As the documentation says:
alpha=1 is the lasso penalty, and alpha=0 the ridge penalty.
Try something like the following:
glmnet(x=cbind(size, weight), y=log(cost), alpha=0, family='gaussian')
or maybe with Poisson regression:
glmnet(x=cbind(size, weight), y=cost, alpha=0, family='poisson')
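In practice you would pick the penalty strength by cross-validation; a short sketch with cv.glmnet (size, weight and cost assumed from the question, as vectors):

library(glmnet)
x <- cbind(size, weight)                    # glmnet wants a matrix of predictors
cvfit <- cv.glmnet(x, log(cost), alpha = 0) # alpha = 0 selects the ridge penalty
coef(cvfit, s = "lambda.min")               # coefficients at the CV-chosen lambda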
If your input data is not too large, you can also compute the ridge regression weights directly from the training data with the closed-form solution $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$, in R solve(t(X) %*% X + lambda * I) %*% (t(X) %*% y), where X is your matrix of input variables, y is the response variable and I is the identity matrix. You can choose the best value of the lambda parameter using cross-validation on held-out data.
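A self-contained sketch of that formula on simulated data (my own illustration; no intercept term, matching the formula above, and in practice you would standardize the columns of X first):

set.seed(1)
n <- 100
X <- cbind(size = rnorm(n), weight = rnorm(n))
y <- 2 * X[, "size"] - X[, "weight"] + rnorm(n)

ridge_coefs <- function(X, y, lambda) {
  # (X'X + lambda*I)^{-1} X'y
  solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)
}
ridge_coefs(X, y, lambda = 0.5)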
Is there a way to obtain the equations and coefficients of both the smoothing and the regression (line) models when using R's car package?
scatterplot(prestige ~ income, data=Prestige)
scatterplot(prestige ~ income, data=Prestige, smoother=gamLine)
(the gamLine smoother requires the mgcv package)
Thanks
In the first case, you can see the info by
summary(lm(prestige ~ income, data = Prestige))
In the 2nd case, something similar will likely apply but I don't know that package or smoothing method. It's surely not as simple as a linear fit. Take a look at ?mgcv-FAQ.
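A hedged sketch of getting at the coefficients directly (assuming gamLine fits the default mgcv smooth, i.e. prestige ~ s(income); worth verifying against car's documentation):

library(car)   # loads the Prestige data (via carData)
library(mgcv)
fit <- gam(prestige ~ s(income), data = Prestige)
summary(fit)   # effective degrees of freedom and test for the smooth term
coef(fit)      # spline basis coefficients; unlike a linear fit, a GAM
               # smooth has no short closed-form equation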