I have used the 'rms' package to fit restricted cubic splines in my Cox regression model.
Here is an example of my univariate code:
S_HTN <- Surv(data_HTN$time + data_HTN$age, data_HTN$event_HTN)
htn_dd <- datadist(data_HTN)
options(datadist = 'htn_dd')
HTN_spline <- cph(S_HTN ~ rcs(centiles, 3), data = data_HTN)
I have plotted these via ggplot fine, but what I want to know is whether I can see where the knots are, and then use these in other analyses?
You can access the formula of your fitted model along with the knot locations by using the function Function(), i.e. Function(HTN_spline).
However, you can also adjust knot locations manually, with x, y and z being your three desired knots:
HTN_spline <- cph(S_HTN ~ rms::rcs(centiles, c(x, y, z)), data=data_HTN)
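To get the knot locations themselves as numbers you can reuse elsewhere, one option (a sketch, assuming the default quantile-based knot placement, and using the `data_HTN`/`centiles` names from the question) is to ask `rcspline.eval()` from Hmisc (loaded with rms) for them, or to pull them out of the fitted object:

```r
library(rms)   # also loads Hmisc, which provides rcspline.eval()

# the knots that rcs(centiles, 3) would place at the default quantiles
knots <- rcspline.eval(data_HTN$centiles, nk = 3, knots.only = TRUE)

# the same values are stored in the fitted model object
HTN_spline$Design$parms$centiles

# reuse the knots explicitly in another model
HTN_spline2 <- cph(S_HTN ~ rcs(centiles, knots), data = data_HTN)
```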
> # simulate some data...
> library(mgcv)      # provides gamSim() and gam()
> library(splines)   # provides ns(), used below
> dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
> # fit model & plot
> b0 <- gam(y ~ s(x1), data = dat)
> plot(b0)
Following the code above, I get a plot of the fitted smooth of x1.
Now, I want to get a similar plot using the ns() function in GAM:
> b1 <- gam(y ~ ns(x1), data=dat)
> plot(b1)
But when I run this code in R, it reports "No term to plot", so how can I produce a similar plot? Thanks!
Because ns() is not a (penalised) spline indicated by s(), te(), t2() or ti(), it is not a member of class "mgcv.smooth". When you plot the fitted GAM object, the code looks to see if there are any mgcv smooths to plot. All other terms in the model are parametric terms, including your natural spline. If you do summary(b1) you'll see the ns() term in the Parametric effects section of the output.
Basically, gam() is just looking at your model as if it were a bunch of linear parametric terms. It doesn't know that those terms in the model matrix map to basis functions and hence to a natural spline.
Visualisation is not easy; plot(b1, all.terms = TRUE) will plot the linear effects of each basis function, so at least you see something, but typically this is not what you want. You will have to predict from the model over the range of the covariate x1 and then plot the predicted values against the grid of x1 values.
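That predict-and-plot step might look like the following (a sketch; the `df = 4` choice and the simulated data are only for illustration):

```r
library(mgcv)
library(splines)
set.seed(2)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
b1 <- gam(y ~ ns(x1, df = 4), data = dat)

# predict over a fine grid of x1 and plot the fitted curve
newd <- data.frame(x1 = seq(min(dat$x1), max(dat$x1), length.out = 200))
fit <- predict(b1, newdata = newd, se.fit = TRUE)
plot(y ~ x1, data = dat, col = "grey")
lines(newd$x1, fit$fit, lwd = 2)
lines(newd$x1, fit$fit + 2 * fit$se.fit, lty = 2)
lines(newd$x1, fit$fit - 2 * fit$se.fit, lty = 2)
```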
This raises the question: what were you expecting gam() to do with the ns() basis?
I have data which should follow a power law distribution.
x = distance
y = %
I want to create a model and add the fitted line to my plot.
My aim is to recreate something like this:
As the author uses R-squared, I assume they applied a linear model, since R^2 is not suitable for non-linear models: http://blog.minitab.com/blog/adventures-in-statistics-2/why-is-there-no-r-squared-for-nonlinear-regression
However, I can't work out how to "curve" my line to the points, i.e. how to get the formula y ~ a*x^(-b) into my model.
Instead of a curved line, I get back a straight line, as from simple linear regression.
My questions are:
Do I correctly assume that the model y ~ a*x^(-b) used by the author is linear?
What type of model should I use to recreate my example: lm, glm, nls, etc.?
I generated dummy data, applying the power law formula from the plot above:
set.seed(42)
scatt <- runif(10)
x <- seq(1, 1000, 100)
b <- 1.8411
a <- 133093
y <- a * x^(-b) + scatt # add some variability to my dependent variable
plot(y ~ x)
and tried to create a glm model.
# formula for non-linear model
m <- m.glm <- glm(y ~ x^2)
# add predicted line to plot
lines(x, predict(m), col = "red", lty = 2, lwd = 3)
This is my first time modelling, so I am really confused and don't know where to start... thank you for any suggestions or directions, I really appreciate it!
I personally think this question is a dupe of this one: `nls` fails to estimate parameters of my model, but I would be cold-blooded to close it (as the OP put a bounty on it). Anyway, a bounty question cannot be closed.
So the best I can think of is to post a community wiki answer (I don't want this bounty).
As you want to fit a model of the form y ~ a*x^(-b), it often helps to take the log transform of both sides and fit the linear model log(y) ~ log(x).
fit <- lm(log(y) ~ log(x))
Since you already know how to use curve to draw a regression curve and are happy with it, I will now show how to make the plot.
Some people call this log-log regression. Here are some other links I have on this kind of regression:
How to predict a new value using simple linear regression log(y)=b0+b1*log(x)
How to plot confidence bands for my weighted log-log linear regression?
m <- lm(log(y) ~ log(x))
a <- exp(coef(m)[1]) # the intercept estimates log(a)
b <- -coef(m)[2]     # the slope estimates -b
plot(y ~ x, type = "p", lty = 3)
lines(x, exp(predict(m)), col = "blue", lty = 2, lwd = 3)
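If you would rather fit y ~ a*x^(-b) directly on the original scale, nls can do that, seeded with the log-log estimates (a sketch using the dummy data from the question; the starting values may need tweaking for other data):

```r
set.seed(42)
scatt <- runif(10)
x <- seq(1, 1000, 100)
y <- 133093 * x^(-1.8411) + scatt

# use the log-log fit to get starting values, then refine with nls
m0 <- lm(log(y) ~ log(x))
m1 <- nls(y ~ a * x^(-b),
          start = list(a = exp(coef(m0)[[1]]), b = -coef(m0)[[2]]))
plot(y ~ x)
lines(x, predict(m1), col = "red", lty = 2, lwd = 3)
```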
I am using random-forest for a regression problem to predict the label values of Test-Y for a given set of Test-X (new values of features). The model has been trained over a given Train-X (features) and Train-Y (labels). "randomForest" of R serves me very well in predicting the numerical values of Test-Y. But this is not all I want.
Instead of only a number, I want random forest to produce a probability density function. I searched for a solution for several days, and here is what I found so far:
"randomForest" doesn't produce probabilities for regression, only for classification (via "predict" with type = "prob").
Using "quantregForest" provides a nice way to make and visualize prediction intervals. But still not the probability density function!
Any other thoughts on this?
Please see the predict.all parameter of the predict.randomForest function.
library("ggplot2")      # for the mpg data
library("randomForest")
data(mpg)
mpg$trans <- factor(mpg$trans) # randomForest needs factors, not character columns
rf <- randomForest(cty ~ displ + cyl + trans, data = mpg)
# Predict the first car in the dataset, keeping each tree's individual prediction
pred <- predict(rf, newdata = mpg[1, ], predict.all = TRUE)
hist(pred$individual)
The histogram of 500 "elementary" predictions looks like this:
You can also use quantregForest with a very fine grid of quantiles, convert them into a cumulative distribution function (CDF) with the R function ecdf, and turn that CDF into a density estimate with a kernel density estimator.
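A minimal sketch of the density route, re-fitting the same model as above and smoothing the per-tree predictions with R's built-in kernel density estimator:

```r
library(ggplot2)        # for the mpg data
library(randomForest)
data(mpg)
mpg$trans <- factor(mpg$trans) # randomForest needs factors, not characters
rf <- randomForest(cty ~ displ + cyl + trans, data = mpg)
pred <- predict(rf, newdata = mpg[1, ], predict.all = TRUE)

# kernel density estimate over the 500 per-tree predictions
dens <- density(as.numeric(pred$individual))
plot(dens, main = "Predictive density for the first car")
```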
Hi, I'm looking for some clarification here.
Context: I want to draw a line in a scatterplot that doesn't appear parametric, so I am using geom_smooth() in ggplot. It automatically reports geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method. I gather that gam stands for generalized additive model, and that a cubic spline is used.
Are the following perceptions correct?
-Loess estimates the response at specific values.
-Splines are approximations that connect different piecewise functions that fit the data (which make up the generalized additive model), and cubic splines are the specific type of spline used here.
Lastly, when should splines be used, and when should loess be used?
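For what it's worth, the two smoothers can be overlaid on the same data to see how they compare (a sketch with simulated data; the colours are arbitrary):

```r
library(ggplot2)
library(mgcv)  # geom_smooth(method = "gam") uses mgcv's gam()
set.seed(1)
d <- data.frame(x = runif(1500))
d$y <- sin(2 * pi * d$x) + rnorm(1500, sd = 0.3)

# blue: penalised cubic regression spline GAM (ggplot's default for large n)
# red: loess, a local regression smoother
ggplot(d, aes(x, y)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"), colour = "blue") +
  geom_smooth(method = "loess", formula = y ~ x, colour = "red")
```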
I'm trying to use the autoKrige() function in the automap package for a simple application of universal kriging. I have an irregularly spaced grid of measurements, and I want to interpolate between them on a fine spatial scale. Example code:
library('automap')
# create an irregularly spaced grid
y <- x <- c(-5, -4, -2, -1, -0.5, 0, 0.5, 1, 2, 4, 5)
grid <- expand.grid(x, y)
names(grid) <- c('x', 'y')
# create some measurements, greatest in the centre, with some noise
vals <- apply(grid, 1, function(x) 12 / (0.1 + sqrt(x[1]^2 + x[2]^2)) + rnorm(1, 2, 1.5))
# get data into sp format
s <- SpatialPointsDataFrame(grid, data.frame(vals))
# make some prediction locations and get them into sp format
pred <- expand.grid(seq(-5, 5, by = 0.5), seq(-5, 5, by = 0.5))
pred <- cbind(pred[, 1], pred[, 2]) # this seems to be needed, not sure why
pred <- SpatialPoints(pred)
# try universal kriging
surf <- autoKrige(vals ~ x + y, s, new_data = pred)
This results in the error:
Error in gstat.formula.predict(d$formula, newdata, na.action = na.action, :
NROW(locs) != NROW(X): this should not occur
I have tried making new_data have the same number of rows as the original data, and have even tried making the co-ordinates in new_data exactly the same as the original data, but I still get this error. I'm new to geostatistics techniques so apologies if I'm making a basic mistake. Can anyone advise where I'm going wrong? Thanks.
The problem is that you have the syntax of the autoKrige function wrong. The formula input to autoKrige specifies the linear model you want to use, e.g.:
log(zinc) ~ dist
from the meuse dataset. In this case, you model log(zinc) versus dist using a linear model, and the residuals to this model are interpolated using the variogram. Essentially, universal kriging is linear regression with spatially correlated residuals.
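As a concrete sketch with the meuse data that ship with sp (here the prediction grid supplies the dist covariate at the new locations, which is exactly what universal kriging needs):

```r
library(automap)
library(sp)
data(meuse)
coordinates(meuse) <- ~ x + y
data(meuse.grid)
coordinates(meuse.grid) <- ~ x + y

# universal kriging: linear model log(zinc) ~ dist plus kriged residuals
kr <- autoKrige(log(zinc) ~ dist, meuse, new_data = meuse.grid)
plot(kr)
```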
In your case, you specify:
vals ~ x + y
so autoKrige (gstat actually) will try to first model the linear model of vals versus x and y (multivariate regression), and interpolate the residuals using the variogram model. However, the x and y variables are not present in the SpatialPointsDataFrame.
What I think you want to do is to only interpolate spatially using the variogram model. In that case, the linear model is very simple, actually just fitting a mean value:
vals ~ 1
where the mean of vals is determined and the residuals are interpolated using the variogram model. This is actually known as Ordinary Kriging. Your call to autoKrige would be something like:
surf <- autoKrige(vals ~ 1, s, new_data = pred)
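If you really did mean to use the coordinates as trend covariates (universal kriging with a spatial trend), a sketch would be to make x and y available as attributes in both the data and the prediction object first:

```r
# make the coordinates available as covariates in both objects
s$x <- coordinates(s)[, 1]
s$y <- coordinates(s)[, 2]
pred_df <- SpatialPointsDataFrame(pred,
             data.frame(x = coordinates(pred)[, 1],
                        y = coordinates(pred)[, 2]))
surf_uk <- autoKrige(vals ~ x + y, s, new_data = pred_df)
```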