How to use the termplot function with fixed predictor variable values? - r

Let's assume I want to draw a plot like this using R, i.e. hazard ratio on the y axis and some predictor variable on the x axis, based on a Cox model with a spline term. The only exception is that I want to set my own x axis points; termplot seems to pick all the unique values from the data set, but I want to use some sort of grid. This is because I am doing multiple imputation, which induces different unique values in every imputation round. Otherwise I can do the combined inference quite easily, but it would be much easier to make predictions for the same predictor values in every imputation round.
So, I need to find a way to use the termplot function so that I can fix the predictor values, or to find a workaround. I tried the predict function, but its newdata argument requires values for all the other (adjusting) variables too, which inflates the standard errors. This is a problem because I am also plotting confidence intervals. I think I could do this manually without any functions, except that spline terms are out of my reach in this sense.
Here is an illustrative example.
library(survival)
data(diabetic)
diabetic <- diabetic[diabetic$eye == "right", ]
# Model with a spline term and two adjusting variables
mod <- coxph(Surv(time, status) ~ pspline(age, df = 3) + risk + laser,
             data = diabetic)
summary(mod)
# Let's pretend this is the grid
# These are in the data, which makes comparison easier in this example
ages <- 20:25
# Right SEs, but what if I want to use different age values?
termplot(mod, term = 1, se = TRUE, plot = FALSE)$age[20:25, ]
# This does something else
termplot(mod, data = data.frame(age = ages), term = 1, se = TRUE, plot = FALSE)$age
# This produces an error
predict(mod, newdata = data.frame(age = ages), se.fit = TRUE)
# This inflates the variance
# (it may actually work for models without categorical variables: what to do with them?)
# The actual predictions differ, but all that matters is the difference between ages,
# and those are in line
predict(mod, newdata = data.frame(age = ages, risk = mean(diabetic$risk),
                                  laser = "xenon"), se.fit = TRUE)
Please let me know if I didn't explain my problem sufficiently. I tried to keep it as simple as possible.

In the end, this is how I worked it out. First I made the predictions and SEs with the termplot function, and then I used linear interpolation to get approximately correct predictions and SEs on my custom grid.
ptemp <- termplot(mod, term = 1, se = TRUE, plot = FALSE)
ptemp <- data.frame(ptemp[1])  # age values with the corresponding estimates and SEs
x <- ptemp[, 1]; y <- ptemp[, 2]; se <- ptemp[, 3]
f <- approxfun(x, y)                     # linear interpolation function
x2 <- seq(from = 10, to = 50, by = 0.5)  # you can make a finer grid if you want
y2 <- f(x2)                              # the interpolation itself
f_se <- approxfun(x, se)                 # same for the SEs
se2 <- f_se(x2)
dat <- data.frame(x2, y2, se2)           # the wanted result
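For completeness, a minimal sketch of turning this interpolated grid into the hazard ratio plot described at the top. The reference age of 30 is an arbitrary choice here, and the bands are the usual pointwise termplot-style 95% bands (they ignore the covariance with the reference point):
ref <- dat$y2[dat$x2 == 30]  # assumed reference age; pick whatever suits your data
plot(dat$x2, exp(dat$y2 - ref), type = "l", log = "y",
     xlab = "Age", ylab = "Hazard ratio")
lines(dat$x2, exp(dat$y2 - 1.96 * dat$se2 - ref), lty = 2)  # lower 95% band
lines(dat$x2, exp(dat$y2 + 1.96 * dat$se2 - ref), lty = 2)  # upper 95% band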

Related

Fit multiple linear regression without an intercept with the function lm() in R

Can you please help with this question in R? I need to get more than one predictor:
Fit a multiple linear regression without an intercept with the function lm() to the training data,
using the variable (y.train) as the goal variable and the variables (X.mat.train) as
predictors. Look at the vector of estimated coefficients of the model and compare it with
the vector of 'true' values beta.vec graphically
(Tip: build a plot of the differences of the absolute values of the estimated and true values).
I have already tried it with the code I will post at the end, but it gives me only one predictor, and in this example I need to get more than one.
I think the wrong part is the first line, but I couldn't find a way to fix it.
I can't put the data set here because it's large, but I have a variable that stores 190 observations from a vector (y.train) and another that stores 190 observations from a matrix (X.mat.train). This should give more than one predictor, but for me it gives only one.
simple.fit <- lm(y.train ~ 0 + X.mat.train)  # goal variable, no intercept, predictors
summary(simple.fit)  # showing the linear regression output
plot(simple.fit)
abline(simple.fit)
n <- summary(simple.fit)$coefficients
estimated_coeff <- n[, 1]
estimated_coeff
plot(estimated_coeff)
# Coefficients: X.mat.train 0.5018
v <- sum(beta.vec)
# 0.5369
plot(beta.vec)
plot(beta.vec, simple.fit)
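A likely cause of this symptom is that X.mat.train is not actually a matrix (for example, it collapsed to a single column somewhere), because lm() with a matrix on the right-hand side returns one coefficient per column. A minimal sketch with made-up stand-in data (the dimensions and 'true' coefficients below are assumptions, not from the question):
set.seed(1)
X.mat.train <- matrix(rnorm(190 * 3), ncol = 3)  # assumed: 190 obs, 3 predictors
beta.vec <- c(0.5, -1, 2)                        # assumed 'true' coefficients
y.train <- as.vector(X.mat.train %*% beta.vec + rnorm(190))
simple.fit <- lm(y.train ~ 0 + X.mat.train)  # matrix RHS: one coefficient per column
coef(simple.fit)                             # three estimates, not one
# The comparison plot the exercise asks for
plot(abs(coef(simple.fit)) - abs(beta.vec),
     xlab = "coefficient index", ylab = "abs(estimated) - abs(true)")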

Address unequal variance between groups before applying contrasts for a linear model? (r)

My Goal: I have an ordinal factor variable (5 levels) to which I would like to apply contrasts to test for a linear trend. However, the factor groups have heterogeneity of variance.
What I've done: Upon recommendation, I used lmRob() from the robust package to create a robust linear model, then applied the contrasts.
# Assign the codes for a linear contrast of 5 groups, save as an object
contrast5 <- contr.poly(5)
# Set the contrast property of sf1 to contain the weights
contrasts(SCI$sf1) <- contrast5
# Fit and save a robust model (exhaustive instead of subsampling)
robmod.sf1 <- lmRob(ICECAP_A ~ sf1, data = SCI, nrep = Exhaustive)
summary.lmRob(robmod.sf1)
My problem: I have since been reading that robust regression is more suited to addressing outliers, not heterogeneity of variance (bottom of https://stats.idre.ucla.edu/r/dae/robust-regression/ ). This UCLA page (among others) suggests the sandwich package to get heteroskedasticity-consistent (HC) standard errors (such as in https://thestatsgeek.com/2014/02/14/the-robust-sandwich-variance-estimator-for-linear-regression-using-r/ ).
But these examples use a series of functions/calls to generate output that gives you the HC standard errors, which can then be used to calculate confidence intervals, t-values, p-values, etc.
My thinking is that if I use vcovHC() I could get the HC standard errors, but they would not have been 'applied' to, or made a property of, the model, so I couldn't pass the model (with the HC errors) through a function to apply the contrasts that I ultimately want. I hope I am not conflating two separate concepts, but surely if a function addresses/down-weights outliers, that should at least somewhat address unequal variances as well?
Can anyone confirm whether my reasoning is sound (and I should thus stay with lmRob()), or suggest how I could just correct my standard errors and still apply the contrasts?
vcovHC is the right function to deal with heteroscedasticity; HC stands for heteroscedasticity-consistent estimator. It will not downweight outliers in the estimates of the model effects, but it will calculate the CIs and p-values differently to accommodate the impact of such outlying observations. lmRob, by contrast, does downweight outlying values but does not handle heteroscedasticity.
See more here:
https://stats.stackexchange.com/questions/50778/sandwich-estimator-intuition/50788#50788
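A minimal sketch of that route, combining contr.poly() contrasts with HC standard errors via sandwich::vcovHC() and lmtest::coeftest(); the data below are made up to mimic the question's setup (5-level factor, unequal group variances):
library(sandwich)  # vcovHC(): heteroskedasticity-consistent covariance matrices
library(lmtest)    # coeftest(): coefficient tests with a user-supplied vcov
set.seed(1)
SCI <- data.frame(sf1 = factor(rep(1:5, each = 20)))  # stand-in for the real data
SCI$ICECAP_A <- 0.1 * as.numeric(SCI$sf1) +
  rnorm(100, sd = as.numeric(SCI$sf1) / 5)            # variance grows with level
contrasts(SCI$sf1) <- contr.poly(5)  # polynomial contrasts, incl. the linear trend
mod <- lm(ICECAP_A ~ sf1, data = SCI)
# HC3-robust tests of each contrast; the .L row is the linear trend of interest
coeftest(mod, vcov. = vcovHC(mod, type = "HC3"))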

Plotting backtransformed data with LS means plot

I have used the package lsmeans in R to get the average estimate over all observations for my treatment factor (across the levels of a block factor in the experimental design, which was included as a systematic (fixed) effect because it only had 3 levels). I have used a sqrt transformation for my response variable.
Thus I have used the following commands in R.
First fitting the model (lsmeans needs a fitted model object rather than a bare formula)
model <- lm(sqrt(response) ~ treatment + block, data = dat)  # dat: your data frame (assumed)
Then applying lsmeans
model_lsmeans <- lsmeans(model, ~ treatment)
Then plotting this
plot(model_lsmeans, ylab = "treatment", xlab = "response (with 95% CI)")
This gives a very nice graph with estimates and 95% confidence intervals for the different treatments.
The problem is just that this graph is for the transformed response.
How do I get the same plot with the back-transformed response (so the squared response)?
I have tried to create a new data frame and extract the lsmean, lower.CL, and upper.CL:
a <- summary(model_lsmeans)
New_dataframe <- as.data.frame(a[c("treatment", "lsmean", "lower.CL", "upper.CL")])
And then square these:
New_dataframe$lsmean <- New_dataframe$lsmean^2
New_dataframe$lower.CL <- New_dataframe$lower.CL^2
New_dataframe$upper.CL <- New_dataframe$upper.CL^2
New_dataframe
This gives me the squared estimates and CI boundaries that I need.
The problem is that I cannot make the same graph for these estimates and CIs as the one lsmeans drew above.
How can I do this? I ask because I want graphs that are all of a similar style for my article. Since I very much like this lsmeans plot, and it is very convenient to use for the non-transformed response variables, I would like to have all my graphs in this style.
Thank you very much for your help! Hope everything is clear!
Kind regards
Ditlev
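Two possible routes. Recent versions of lsmeans (and its successor emmeans) detect the sqrt() in the model formula, so summary(model_lsmeans, type = "response") may back-transform for you directly. Alternatively, the squared values already in New_dataframe can be drawn in the same dot-and-interval style with base graphics; a minimal sketch, assuming New_dataframe as built above:
with(New_dataframe, {
  n <- length(lsmean)
  plot(lsmean, seq_len(n), xlim = range(lower.CL, upper.CL), yaxt = "n", pch = 16,
       xlab = "response (with 95% CI)", ylab = "treatment")
  axis(2, at = seq_len(n), labels = treatment)          # treatment labels on the y axis
  segments(lower.CL, seq_len(n), upper.CL, seq_len(n))  # CI bars
})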

Prediction at a new value using lowess function in R

I am using the lowess function to fit a regression between two variables x and y. Now I want to know the fitted value at a new value of x, for example x = 2.5 in the following example. I know loess can do that, but I want to reproduce someone's plot, and he used lowess.
set.seed(1)
x <- 1:10
y <- x + rnorm(x)
fit <- lowess(x, y)
plot(x, y)
lines(fit)
Local regression (lowess) is a non-parametric statistical method; it's not like linear regression, where you can use the model directly to estimate new values.
You'll need to take the values from the function (that's why it only returns a list to you) and choose your own interpolation scheme, then use that scheme to predict your new points.
A common technique is spline interpolation (but there are others):
https://www.r-bloggers.com/interpolation-and-smoothing-functions-in-base-r/
EDIT: I'm pretty sure the predict function does the interpolation for you (for loess fits). I also can't find any information about what exactly predict uses, so I've tried to trace the source code.
https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/src/library/stats/R/loess.R
else { ## interpolate
## need to eliminate points outside original range - not in pred_
I'm sure the R code calls the underlying C implementation, but it's not well documented, so I don't know what algorithm it uses.
My suggestion: either trust the predict function or roll your own interpolation algorithm.
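For the example above, a minimal sketch of the interpolation route using base R's approx() (linear interpolation between the fitted points; spline() could be swapped in for spline interpolation):
fit <- lowess(x, y)
approx(fit$x, fit$y, xout = 2.5)$y  # fitted value at x = 2.5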

Simulating from a vector of discrete data

I have a vector of discrete data and I want to simulate from the empirical distribution associated with this data. I was simulating with the function rlogspline after doing fit <- logspline(vector_of_data), where vector_of_data is data that is supposed to come from a continuous distribution; that's why I used logspline. But with this vector I am certain that its values are of a discrete nature, so I can't use logspline to fit it.
Basically what I want to do is fit the observed data and then use that fit to simulate those values. Do you think this can be done in R?
Thank you very much for your help.
I think sample(x,...,replace=TRUE) (sampling with replacement) should simulate from the empirical distribution ...
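A minimal sketch of that suggestion, with made-up discrete values:
obs <- c(2, 2, 3, 5, 5, 5, 8)              # made-up discrete observations
sim <- sample(obs, 10000, replace = TRUE)  # draws from the empirical distribution
table(sim) / length(sim)                   # relative frequencies approximate the data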
I am not totally clear on exactly what you are trying to do, but could you use something like quantile and runif? For example:
obs <- c(125,110,115,100,150) # original observations
sim <- quantile(obs, runif(10000)) # simulations
hist(sim, freq=FALSE)
