Extracting baseline hazards from stpm2-object - r

I need to extract the baseline hazards from a generalized survival model (GSM) that I've constructed using the rstpm2 package (a conversion of the stpm2 module in Stata).
Using the data in the rstpm2 package, let's take this as an example:
library(rstpm2)
gsm <- stpm2(Surv(rectime,censrec==1)~hormon, data=brcancer, df=3)
sum.gsm <- summary(gsm)
So I've noticed that the summary has an element named bhazard:
sum.gsm@args$bhazard
However it seems to be filled with zeroes and holds one value per patient. As far as I understand the baseline hazard should consist of one hazard for every time-point in the data.
Does anyone have any experience that could be of assistance?

You can use the plot and lines methods to plot survival and a number of other predictions. For example:
plot(gsm, newdata=data.frame(hormon=0))
Note that you need to specify the newdata argument. For more general plots, you can get predictions on a time grid with full covariates with standard errors using:
out <- predict(gsm,newdata=data.frame(hormon=0:1), grid=TRUE,
full=TRUE, se.fit=TRUE)
Then you could use ggplot2 or lattice to plot the intervals. For example,
library(ggplot2)
ggplot(out, aes(x=rectime, y=Estimate, ymin=lower, ymax=upper,
fill=factor(hormon))) +
geom_ribbon(alpha=0.6) + geom_line()
Edit: to predict survival at specific times, you can include the times in the newdata argument:
predict(gsm, newdata=data.frame(hormon=0,rectime=1:365))
(Note that survival is defined in terms of log(time), hence I have excluded rectime=0.)
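For the hazard itself rather than survival, here is a minimal sketch. It assumes the installed rstpm2 version accepts type="hazard" in predict(), as its documentation lists. With hormon=0 (the reference level) the result is the model's baseline hazard at the requested times. The names base_times and haz are illustrative only.

```r
library(survival)
library(rstpm2)

# Same model as in the question
gsm <- stpm2(Surv(rectime, censrec == 1) ~ hormon, data = brcancer, df = 3)

# Times at which to evaluate the hazard (illustrative choice; rectime = 0 is excluded)
base_times <- data.frame(hormon = 0, rectime = 1:365)

# Hazard at the reference covariate level, i.e. the baseline hazard
# (assumes type = "hazard" is supported by predict() for stpm2 objects)
haz <- as.numeric(predict(gsm, newdata = base_times, type = "hazard"))

plot(base_times$rectime, haz, type = "l",
     xlab = "rectime", ylab = "Baseline hazard")
```

Unlike the bhazard element in the question, this gives one value per requested time point rather than one per patient.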

Related

Extract values used to make plot for parametric component of GAM in R

I have performed a GAM that includes both continuous smooth terms and a categorical variable. I have plotted the model (mod) using plot(mod,residuals=T,all.terms=T,pages=1). This produces plots of the two smooth parameters as well as the parametric parameter. I want to extract the values used to make these plots so I can redo them and make them look nicer. If I save the plot in an object, this gives me everything I need for the smooth terms, but doesn't contain any information about the parametric component: plot.mod=plot(mod,residuals=T,all.terms=T,select=0). But I can't see where the numbers are coming from for the default plotting of the parametric component. Is there a way to extract these as well?
Here is a reproducible example of what I have done so far
library(mgcv)
# create some data
data=data.frame(response=c(10,12,8,9,3,4,5,5,4,5,4,5,4,1),pred1=c(9,8,8,9,6,7,6,4,3,4,2,3,3,1),pred2=as.factor(c("A","C","B","B","A","A","C","B","C","A","C","B","A","B")),pred3=c(1,6,3,4,8,6,4,5,7,10,11,3,12,1))
# run the GAM
mod <- gam(response ~ s(pred1,k=8) + pred2 + s(pred3,k=5), data=data, family=gaussian(), method="REML")
# the default plot
plot(mod,residuals=T,all.terms=T,pages=1)
# save values in an object. But this only saves the smooth terms.
plot.mod=plot(mod,residuals=T,all.terms=T,select=0)
# How can I extract the values used to plot the parametric term?
The plot I'm trying to extract the data to make:
From the plot.gam documentation, termplot is used for the parametric terms, so
plot.para <- termplot(mod, se = TRUE, plot = FALSE)
saves that plot to a list.
The format is different than the others, but the data is there.
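To show what that list looks like, here is a small sketch of termplot(..., plot = FALSE). It uses a plain lm on part of the question's toy data, so it runs with base R only. This is an assumption-laden stand-in: the answer applies the same call to the gam object, and the names dat, fit, tp and p2 are illustrative.

```r
# Toy data reused from the question (smooth terms left out for this sketch)
dat <- data.frame(
  response = c(10, 12, 8, 9, 3, 4, 5, 5, 4, 5, 4, 5, 4, 1),
  pred1    = c(9, 8, 8, 9, 6, 7, 6, 4, 3, 4, 2, 3, 3, 1),
  pred2    = factor(c("A", "C", "B", "B", "A", "A", "C", "B", "C", "A", "C", "B", "A", "B"))
)

# Plain linear model, used only to keep the sketch free of extra packages
fit <- lm(response ~ pred1 + pred2, data = dat)

# plot = FALSE returns the plotting data instead of drawing anything
tp <- termplot(fit, se = TRUE, plot = FALSE)

names(tp)   # one element per model term
tp$pred2    # data frame with columns x, y (partial effect) and se

# Redraw the factor term by hand: estimate +/- 2 standard errors
p2  <- tp$pred2
idx <- seq_len(nrow(p2))
plot(idx, p2$y, xaxt = "n", pch = 19,
     ylim = range(p2$y - 2 * p2$se, p2$y + 2 * p2$se),
     xlab = "pred2", ylab = "Partial effect")
axis(1, at = idx, labels = as.character(p2$x))
segments(idx, p2$y - 2 * p2$se, idx, p2$y + 2 * p2$se)
```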

Predicted Survival Curves using Corrected Group Prognosis Method

How can I plot predicted survival curves of a continuous covariate (let's say the 20th and 80th percentile of the value) using the corrected group prognosis method as implemented in R by Therneau?
For example,
library(survival)
library(survminer)
fit <- coxph( Surv(stop, event) ~ size + strata(rx), data = bladder )
ggadjustedcurves(fit, data=bladder, method = "conditional", strata=rx)
Now, this is useful because I am given two survival curves that are stratified by rx (either 0 or 1) and the conditional method is being acted upon the bladder data set. However, let's say I would like to use the marginal method but not stratify and instead plot my continuous covariate at 20th and 80th value but also re-balance the subpopulation. Would like any step in the right direction.
To re-state, I have a Cox model with continuous predictors. I would like to build a Cox model that does not stratify on rx but includes it as a covariate. Then, I want to pass the created Cox object into the ggadjustedcurves() function, which uses "subpopulation re-balancing" when given a reference data set. And then, instead of showing two survival curves stratified on a categorical variable, I want to plot two representative survival curves at the 20th and 80th percentile.
EDIT
My first attempt
fit2 <- coxph( Surv(stop, event) ~ size + rx, data = bladder ) #remove strata
fit2
# CGP
pred<- data.frame("rx" = 1, "size" = 3.2)
ggadjustedcurves(fit2, data = pred , method = "conditional", reference = bladder)
Is this what I think it is? Conditional re-balancing has been applied to the reference data set and then the predicted curves are generated for an individual with rx=1 and size of 3.2.
It is difficult to understand what you are truly looking for, but I think I have a rough idea. I think you want to plot the survival curve that would have been observed if every person in your sample had received a specific value for the continuous covariate. If there is no confounding, you can simply use a Cox model that includes only the continuous covariate and use the predict() function for a range of points in time and plot the results. If you need to adjust for confounding, you can include the confounders in the Cox model and use g-computation to obtain the desired probabilities. I describe this in a recent preprint: https://arxiv.org/pdf/2208.04644.pdf
This can be done in R using the contsurvplot package (also developed by me). First, install the package using:
devtools::install_github("RobinDenz1/contsurvplot")
Afterwards, fit your Cox model, but use x=TRUE in the coxph call:
library(survival)
library(contsurvplot)
library(riskRegression)
library(ggplot2)
fit2 <- coxph(Surv(stop, event) ~ size + rx, data=bladder, x=TRUE)
You can now call the plot_surv_lines function to obtain the causal survival curves for specific values of size, given the model. Using the horizon argument you can tell the function for which values you want to plot the survival curves. I choose the 20% and 80% quantile of size as you described:
plot_surv_lines(time="stop",
status="event",
variable="size",
data=bladder,
model=fit2,
horizon=quantile(bladder$size, probs=c(0.2, 0.8)))
The package contains a lot more plotting routines to visualize the causal effect of a continuous variable on a time-to-event outcome that might be more suitable for what you actually want.
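For readers who want to see the g-computation idea without the extra package, here is a rough sketch using only the survival package. It is not the contsurvplot implementation. It sets size to a fixed value for every row, predicts each row's survival curve from the Cox model, and averages the curves. The names size_vals, adj_surv and times are illustrative.

```r
library(survival)

# Cox model with size and rx as covariates (no stratification)
fit2 <- coxph(Surv(stop, event) ~ size + rx, data = bladder)

# Values of the continuous covariate to compare
size_vals <- quantile(bladder$size, probs = c(0.2, 0.8))

# For each value: set size to it for everyone, predict every row's
# survival curve, then average the curves over the sample
adj_surv <- sapply(size_vals, function(s) {
  nd <- bladder
  nd$size <- s
  sf <- survfit(fit2, newdata = nd)
  rowMeans(sf$surv)
})

# Time grid shared by all predicted curves
times <- survfit(fit2, newdata = bladder[1, ])$time

matplot(times, adj_surv, type = "s", lty = 1, col = c("blue", "red"),
        xlab = "stop", ylab = "Adjusted survival probability")
legend("topright", legend = paste("size =", size_vals),
       col = c("blue", "red"), lty = 1)
```

Because the averaging runs over the observed distribution of rx, the two curves differ only in the value assigned to size.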

Interpolation and Curve fitting with R

I am a chemical engineer and very new to R. I am attempting to build a tool in R (and eventually a shiny app) for analysis of phase boundaries. Using a simulation I get output that shows two curves which can be well represented by a 4th order polynomial. The data is as follows:
https://i.stack.imgur.com/8Oa0C.jpg
The procedure I have to follow uses the difference between the two curves to produce a second. In order to compare the curves, the data has to increase as a function of pressure in set increments, for example of 0.2. As can be seen, the data from the simulation is not incremental and there is no way to compare the curves based on the output.
To resolve this, in excel I carried out the following steps on each curve:
I plotted the data with pressure on the x axis and temperature on the y axis
Found the line of best fit using a 4th order polynomial
Used the equation of the curve to calculate the temperature at set increments of pressure
From this, I was able to compare the curves mathematically and produce the required output.
Does anyone have any suggestions how to carry this out in R, or if there is a more statistical or simplified approach that I have missed (extracting Bézier curve points etc.)?
As a bit of further detail, I have taken the data and merged it using tidyr so that the graphs (4 in total) are displayed in just three columns, the graph title, temperature and pressure. I did this after following a course on ggplot2 on Datacamp, but not sure if this format is suitable when carrying out regression etc? The head of my dataset can be seen here:
https://i.stack.imgur.com/WeaPz.jpg
I am very new to R, so apologies if this is a stupid question and I am using the wrong terms.
Though I agree with @Jaap's comment, polynomial regression is very easy in R. I'll give you the first line:
x <- c(0.26,3.33,5.25,6.54,7.38,8.1,8.73,9.3,9.81,10.28,10.69,11.08,11.43,11.75,12.05,12.33)
y <- c(16.33,24.6,31.98,38.38,43.3,48.18,53.08,57.99,62.92,67.86,72.81,77.77,82.75,87.75,92.77,97.81)
fit <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4))
Now your polynomial coefficients are in coef(fit); you can extract them and easily plot the fitted line, e.g.:
coefs <- coef(fit)
plot(x, y)
lines(x, coefs[1] + coefs[2] * x + coefs[3] * x^2 + coefs[4] * x^3 + coefs[5] * x^4)
The fitted values are also simply given by fitted(fit). Build the same polynomial for the second curve and compare the coefficients, not just the "lines".
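The Excel step of calculating values at set increments maps onto predict() with a newdata grid. Here is a minimal sketch using the x and y values above. The grid limits (0.4 to 12.2) and the names grid and y_hat are illustrative choices, not part of the original answer.

```r
x <- c(0.26, 3.33, 5.25, 6.54, 7.38, 8.1, 8.73, 9.3, 9.81, 10.28,
       10.69, 11.08, 11.43, 11.75, 12.05, 12.33)
y <- c(16.33, 24.6, 31.98, 38.38, 43.3, 48.18, 53.08, 57.99, 62.92, 67.86,
       72.81, 77.77, 82.75, 87.75, 92.77, 97.81)

# Fourth-order polynomial; raw = TRUE keeps ordinary power-basis coefficients
fit <- lm(y ~ poly(x, 4, raw = TRUE))

# Evenly spaced x values inside the observed range (step of 0.2 as in the question)
grid <- data.frame(x = seq(0.4, 12.2, by = 0.2))

# Evaluate the fitted polynomial on the grid
grid$y_hat <- predict(fit, newdata = grid)

head(grid)
```

Fitting the second curve the same way and predicting on the same grid gives two columns that can be subtracted row by row.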

add a logarithmic regression line to a scatterplot (comparison with Excel)

In Excel, it's pretty easy to fit a logarithmic trend line to a given data set. Just click add trend line and then select "Logarithmic." Switching to R for more power, I am a bit lost as to which function one should use to generate this.
To generate the graph, I used ggplot2 with the following code.
ggplot(data, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3)+
stat_smooth(method='loess')
But the code does local polynomial regression fitting which is based on averaging out numerous small linear regressions. My question is whether there is a log trend line in R similar to the one used in Excel.
An alternative I am looking for is to get a log equation of the form y = (c*ln(x))+b; is there a coef() function to get 'c' and 'b'?
Let my data be:
c(0.599885189,0.588404133,0.577784156,0.567164179,0.556257176,
0.545350172,0.535112897,0.52449292,0.51540375,0.507271336,0.499904325,
0.498851894,0.498851894,0.497321087,0.4964600,0.495885955,0.494068121,
0.492154612,0.490145427,0.486892461,0.482395714,0.477229238,0.471010333)
The above data are y-points while the x-points are simply integers from 1:length(y) in increment of 1. In Excel: I can simply plot this and add a logarithmic trend line and the result would look:
With black being the log. In R, how would one do this with the above dataset?
I prefer to use base graphics instead of ggplot2:
#some data with a linear model
x <- 1:20
set.seed(1)
y <- 3*log(x)+5+rnorm(20)
#plot data
plot(y~x)
#fit log model
fit <- lm(y~log(x))
#look at result and statistics
summary(fit)
#extract coefficients only
coef(fit)
#plot fit with confidence band
matlines(x=seq(from=1,to=20,length.out=1000),
y=predict(fit,newdata=list(x=seq(from=1,to=20,length.out=1000)),
interval="confidence"))
#some data with a non-linear model
set.seed(1)
y <- log(0.1*x)+rnorm(20,sd=0.1)
#plot data
plot(y~x)
#fit log model
fit <- nls(y~log(a*x),start=list(a=0.2))
#look at result and statistics
summary(fit)
#plot fit
lines(seq(from=1,to=20,length.out=1000),
predict(fit,newdata=list(x=seq(from=1,to=20,length.out=1000))))
You can easily specify alternative smoothing methods (such as lm(), linear least-squares fitting) and an alternative formula:
library(ggplot2)
g0 <- ggplot(dat, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3)
g0 + stat_smooth(method="lm",formula=y~log(x),fill="red")
The confidence bands are automatically included: I changed the color to make them visible since they're very narrow. You can use se=FALSE in stat_smooth to turn them off.
The other answer shows you how to get the coefficients:
coef(lm(success~log(horizon),data=dat))
I can imagine you might next want to add the equation to the graph: see Adding Regression Line Equation and R2 on graph
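Here is a base-R sketch of building such an equation label from the fitted coefficients, using the y-values given in the question. The label format and the names cf and eq are illustrative.

```r
# y-values from the question; x is simply 1, 2, ..., length(y)
y <- c(0.599885189, 0.588404133, 0.577784156, 0.567164179, 0.556257176,
       0.545350172, 0.535112897, 0.52449292, 0.51540375, 0.507271336, 0.499904325,
       0.498851894, 0.498851894, 0.497321087, 0.4964600, 0.495885955, 0.494068121,
       0.492154612, 0.490145427, 0.486892461, 0.482395714, 0.477229238, 0.471010333)
x <- seq_along(y)

# Logarithmic fit y = c * ln(x) + b
fit <- lm(y ~ log(x))
cf  <- coef(fit)

# Text label in the form asked for in the question
eq <- sprintf("y = %.4f * ln(x) + %.4f", cf[["log(x)"]], cf[["(Intercept)"]])

# Plot the data, overlay the fitted curve, and show the equation as the title
plot(x, y, main = eq)
curve(cf[["(Intercept)"]] + cf[["log(x)"]] * log(x),
      from = 1, to = length(y), add = TRUE)
```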
I'm pretty sure a simple +scale_y_log10() would get you what you wanted. GGPlot stats are calculated after transformations, so the loess() would then be calculated on the log transformed data.
I've just written a blog post here that describes how to match Excel's logarithmic curve fitting exactly. The nub of the approach centers around the lm() function:
# Set x and data.to.fit to the independent and dependent variables
data.to.fit <- c(0.5998,0.5884,0.5777,0.5671,0.5562,0.5453,0.5351,0.524,0.515,0.5072,0.4999,0.4988,0.4988,0.4973,0.49,0.4958,0.4940,0.4921,0.4901,0.4868,0.4823,0.4772,0.4710)
x <- c(seq(1, length(data.to.fit)))
data.set <- data.frame(x, data.to.fit)
# Perform a logarithmic fit to the data set
log.fit <- lm(data.to.fit~log(x), data=data.set)
# Print out the intercept, log(x) parameters, R-squared values, etc.
summary(log.fit)
# Plot the original data set
plot(data.set)
# Add the log.fit line with confidence intervals
matlines(predict(log.fit, data.frame(x=x), interval="confidence"))
Hope that helps.

how to plot estimates through model in R

I'm trying to use R to do some modelling. I've started with the BodyWeight dataset, since I've seen some examples online, just to understand and get used to the commands.
I've come to my final model, with estimates and I was wondering how to plot these estimates, but I haven't seen anything online..
Is there a way to plot the values of the estimates with a line, and dots for the values of each observation?
Where can I find information about how to do this? Do I have to extract the values myself, or is it possible to, say, plot the estimates of this model directly?
I'm only starting with R. Any help is welcome.
Thank you
There is no function that just plots the output of a model, since there are usually many different possible ways of plotting the output.
Take a look at the predict function for whatever model type you are using (for example, linear regressions using lm have a predict.lm function).
Then choose a plotting system (you will likely want different panels for different levels of diet, so use either ggplot2 or lattice). Then see if you can describe more clearly in words how you want the plot to look. Then update your question if you get stuck.
Now we've identified which dataset you are using, here's a possible plot:
#Load the packages used below
library(nlme)
library(ggplot2)
#Run your model
model <- lme(weight ~ Time + Diet, BodyWeight, ~ 1 | Rat)
summary(model)
#Predict the values
#predict.lme is a pain because you have to specify which rat
#you are interested in, but we don't want that
#manually predicting things instead
times <- seq.int(0, 65, 0.1)
mcf <- model$coefficients$fixed
predicted <-
mcf["(Intercept)"] +
rep.int(mcf["Time"] * times, nlevels(BodyWeight$Diet)) +
rep(c(0, mcf["Diet2"], mcf["Diet3"]), each = length(times))
prediction_data <- data.frame(
weight = predicted,
Time = rep.int(times, nlevels(BodyWeight$Diet)),
Diet = rep(levels(BodyWeight$Diet), each = length(times))
)
#Draw the plot (using ggplot2)
(p <- ggplot(BodyWeight, aes(Time, weight, colour = Diet)) +
geom_point() +
geom_line(data = prediction_data)
)
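As an alternative to assembling the fixed-effect predictions by hand, predict() for lme objects takes a level argument. With level = 0 it returns population-level predictions, so no Rat has to be supplied in newdata. A minimal sketch, with newdat as an illustrative name:

```r
library(nlme)

# Same model as above, with the arguments named explicitly
model <- lme(weight ~ Time + Diet, data = BodyWeight, random = ~ 1 | Rat)

# Grid of times for every diet level
newdat <- expand.grid(Time = seq(0, 65, by = 0.1),
                      Diet = levels(BodyWeight$Diet))

# level = 0 uses the fixed effects only (population-level prediction)
newdat$weight <- as.numeric(predict(model, newdata = newdat, level = 0))

head(newdat)
```

The resulting newdat has the same columns as prediction_data, so it can be passed to geom_line(data = ...) in the same way.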

Resources