Backtransform LMM output in R

When performing linear mixed models, I have had to square-root (log) transform my data to achieve normally distributed residuals. Having fitted the LMMs, I now want to plot the results on a graph, but on the original scale, i.e. not square-root (log) transformed.
Apparently I can plot my raw (untransformed) data, and to create the predicted regression line I can use the coefficients from my LMM output to get back-transformed predicted y-values for each of my x-values. This is where I'm stuck: I have no idea how to do this. Can anyone help?
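To make that concrete, here is a minimal sketch of the back-transformation step, assuming a square-root transform and an lme4 model; the data and variable names (mydata, y, x, group) are placeholders, not from the question:
library(lme4)

# Fit the LMM on the transformed scale (all names here are placeholders)
fit <- lmer(sqrt(y) ~ x + (1 | group), data = mydata)

# Predict on the transformed scale over a grid of x values,
# using the fixed effects only (re.form = NA)
newdat <- data.frame(x = seq(min(mydata$x), max(mydata$x), length.out = 100))
pred_sqrt <- predict(fit, newdata = newdat, re.form = NA)

# Back-transform: the inverse of sqrt() is squaring
# (for a log transform it would be exp() instead)
newdat$pred <- pred_sqrt^2

# Plot the raw data with the back-transformed regression line
plot(y ~ x, data = mydata)
lines(newdat$x, newdat$pred, lwd = 2)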

Related

How to use the termplot function with fixed predictor values?

Let's assume I want to draw a plot similar to the one here using R, i.e. hazard ratio on the y-axis and some predictor variable on the x-axis, based on a Cox model with a spline term. The only exception is that I want to set my own x-axis points; termplot seems to pick all the unique values from the data set, but I want to use some sort of grid. This is because I am doing multiple imputation, which induces different unique values in every round. Otherwise I can do combined inference quite easily, but it would be a lot easier to make predictions for the same predictor values in every imputation round.
So, I need to find a way to use the termplot function so that I can fix the predictor values, or to find a workaround. I tried to use the predict function, but its newdata argument requires values for all the other (adjusting) variables too, which inflates the standard errors. This is a problem because I am also plotting confidence intervals. I think I could do this manually without any functions, except that spline terms are out of my reach in this sense.
Here is an illustrative example.
library(survival)
data(diabetic)
diabetic <- diabetic[diabetic$eye == "right", ]

# Model with a spline term and two adjusting variables
mod <- coxph(Surv(time, status) ~ pspline(age, df = 3) + risk + laser, data = diabetic)
summary(mod)

# Let's pretend this is the grid
# These values are in the data, but it's easier for comparison in this example
ages <- 20:25

# Right SEs, but what if I want to use different age values?
termplot(mod, term = 1, se = TRUE, plot = FALSE)$age[20:25, ]

# This does something else
termplot(mod, data = data.frame(age = ages), term = 1, se = TRUE, plot = FALSE)$age

# This produces an error
predict(mod, newdata = data.frame(age = ages), se.fit = TRUE)

# This inflates the variance
# It may actually work for models without categorical variables: what to do with them?
# The actual predictions are different, but all that matters is the difference
# between ages, and those are in line
predict(mod, newdata = data.frame(age = ages, risk = mean(diabetic$risk), laser = "xenon"),
        se.fit = TRUE)
Please let me know if I didn't explain my problem sufficiently. I tried to keep it as simple as possible.
In the end, this is how I worked it out. First, I made the predictions and SEs using the termplot function, and then I used linear interpolation to get approximately right predictions and SEs for my custom grid.
ptemp <- termplot(mod, term = 1, se = TRUE, plot = FALSE)
ptemp <- data.frame(ptemp[1])  # Pick up age and the corresponding estimates and SEs
x <- ptemp[, 1]; y <- ptemp[, 2]; se <- ptemp[, 3]

f <- approxfun(x, y)                      # Linear interpolation function
x2 <- seq(from = 10, to = 50, by = 0.5)   # You can make a finer grid if you want
y2 <- f(x2)                               # Interpolation itself

f_se <- approxfun(x, se)                  # Same for the SEs
se2 <- f_se(x2)

dat <- data.frame(x2, y2, se2)            # The wanted result
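To close the loop on the plot described in the question, a quick sketch (not part of the answer above): termplot returns centred log-hazard contributions for a coxph model, so exponentiating the interpolated values gives hazard ratios relative to the centring point, with approximate pointwise 95% bands from the interpolated SEs:
# Hazard ratio curve with approximate 95% CI bands
plot(dat$x2, exp(dat$y2), type = "l", xlab = "age", ylab = "Hazard ratio")
lines(dat$x2, exp(dat$y2 - 1.96 * dat$se2), lty = 2)
lines(dat$x2, exp(dat$y2 + 1.96 * dat$se2), lty = 2)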

How to rectify heteroscedasticity for multiple linear regression model

I'm fitting a multiple linear regression model with 6 predictors (3 continuous and 3 categorical). The residuals vs. fitted plot shows that there is heteroscedasticity, which is also confirmed by bptest().
[screenshot: summary of sales_lm]
[screenshot: residuals vs. fitted plot]
I also calculated the RMSE for my train and test data, as shown below:
sqrt(mean((sales_train_lm_pred - sales_train$SALES)^2))
[1] 3533.665
sqrt(mean((sales_test_lm_pred - sales_test$SALES)^2))
[1] 3556.036
I tried fitting a glm() model, but it still didn't rectify the heteroscedasticity:
glm.test3 <- glm(SALES ~ ., weights = 1 / sales_fitted$.resid^2,
                 family = gaussian(link = "identity"), data = sales_train)
The residuals vs. fitted plot for glm.test3 looks weird:
[screenshot: glm.test3 residuals vs. fitted plot]
Could you please help me with what I should do next?
Thanks in advance!
That you observe heteroscedasticity in your data means that the error variance is not constant. You can try the following (a sketch of options 1 and 3 follows this list):
1) Apply the one-parameter Box-Cox transformation (of which the log transform is a special case) with a suitable lambda to one or more variables in the data set. The optimal lambda can be determined by looking at its log-likelihood function; take a look at MASS::boxcox.
2) Play with your feature set (decrease it, increase it, add new variables).
3) Use the weighted linear regression method.
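Here is a minimal sketch of options 1 and 3, assuming the asker's sales_train data frame with response SALES; everything else is illustrative, not a definitive recipe:
library(MASS)

# Option 1: Box-Cox -- profile the log-likelihood over lambda for the fitted lm
# (requires a strictly positive response)
fit <- lm(SALES ~ ., data = sales_train)
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.1))  # plots the profile log-likelihood
lambda <- bc$x[which.max(bc$y)]                   # lambda at the maximum

# Refit with the transformed response (lambda = 0 corresponds to log)
sales_train$SALES_bc <- if (abs(lambda) < 1e-8) log(sales_train$SALES) else
  (sales_train$SALES^lambda - 1) / lambda
fit_bc <- lm(SALES_bc ~ . - SALES, data = sales_train)

# Option 3: weighted least squares, with weights from a first-pass variance model
# (assumes the fitted values of aux stay positive)
aux <- lm(abs(residuals(fit)) ~ fitted(fit))  # model |residuals| on fitted values
w <- 1 / fitted(aux)^2                        # inverse of the estimated variance
fit_wls <- lm(SALES ~ ., data = sales_train, weights = w)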

How to fit an exponential regression to positively skewed data?

[data file link] I am trying to explore exponential regression for predicting #calls_to_be_made (DV) from a given #sales_potential (IDV) for customers. The data is positively skewed. Is it better to try exponential regression in this case? If so, how should I proceed? If not, why not? I need some more ideas.
I have performed linear regression to predict #calls_to_be_made (DV) from #sales_potential (IDV). The data is positively skewed, so I eliminated the zero values and log10-transformed both variables to get a near-normal distribution. This gave an R-squared value of 55%.
What I expect as an answer is how to perform an exponential regression, and also some clarity on which model (technique) is preferable for my particular problem.
# Import the data (read.xlsx is from the xlsx package)
library(xlsx)
file1 <- read.xlsx("wx2.xlsx", sheetName = "Sheet1", header = TRUE)
str(file1)
# file1$Potential.18.19 <- round(file1$Potential.18.19, digits = 0)
data1 <- file1

# log10-transform both variables
data1$Sales_to_be_made <- log10(data1$Sales_to_be_made)
data1$calls_to_be_made <- log10(data1$calls_to_be_made)
dput(data1)
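For the exponential regression itself, here is a minimal sketch of two common approaches, assuming file1 holds the untransformed columns calls_to_be_made and Sales_to_be_made (the start values below are guesses, not tuned to the real data):
# Note: fit these on the untransformed data, not the log10 columns above

# Log-linear version: an exponential model fit on the log scale
fit_log <- lm(log(calls_to_be_made) ~ Sales_to_be_made, data = file1)

# Nonlinear least squares version: calls = a * exp(b * sales)
fit_nls <- nls(calls_to_be_made ~ a * exp(b * Sales_to_be_made),
               data = file1, start = list(a = 1, b = 0.01))
summary(fit_nls)

# Predictions on the original (untransformed) scale
grid <- data.frame(Sales_to_be_made = seq(min(file1$Sales_to_be_made),
                                          max(file1$Sales_to_be_made),
                                          length.out = 100))
grid$pred <- predict(fit_nls, newdata = grid)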

Plot residuals vs predicted response in R

Is plotting residuals vs. predicted response equivalent to plotting residuals vs. fitted values?
If so, would these be plotted by plot(lm) and plot(predict(lm)), where lm is the linear model?
Am I correct?
Maybe a little off-topic, but as an addition: the package ggfortify might come in handy. It's super easy to use, like this:
library(ggfortify)
autoplot(mod3)  # mod3 is your fitted lm object
This yields an output with the most important diagnostics you need to judge whether your model violates the lm assumptions or not. [example autoplot output]
Yes, the fitted values are the predicted responses on the training data, i.e. the data used to fit the model, so plotting residuals vs. predicted response is equivalent to plotting residuals vs. fitted.
As for your second question, the plot would be obtained by plot(lm), but before that you have to run par(mfrow = c(2, 2)). This is because plot(lm) outputs 4 plots, one of which is the one you want, i.e. the residuals vs. fitted plot. The command above divides the output screen into four facets, so each plot is shown in one. The plot you are looking for will appear in the top left.
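A minimal sketch of that, using a hypothetical model fitted to the built-in mtcars data (not from the question):
fit <- lm(mpg ~ wt + hp, data = mtcars)  # any fitted lm will do
par(mfrow = c(2, 2))   # split the device into a 2x2 grid
plot(fit)              # the residuals vs. fitted plot is the top-left panel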

Plotting backtransformed data with LS means plot

I have used the package lsmeans in R to get the average estimate over all observations for my treatment factor (across the levels of a block factor in the experimental design, which has been included as a fixed (systematic) effect because it only had 3 levels). I have used a sqrt transformation for my response variable.
Thus I have used the following commands in R.
First, defining the model (lsmeans needs a fitted model object, so the formula is wrapped in lm(); mydata stands in for the actual data frame):
model <- lm(sqrt(response) ~ treatment + block, data = mydata)
Then applying lsmeans:
model_lsmeans <- lsmeans(model, ~ treatment)
Then plotting this:
plot(model_lsmeans, ylab = "treatment", xlab = "response (with 95% CI)")
This gives a very nice graph with estimates and 95% confidence intervals for the different treatments.
The problem is just that this graph is for the transformed response.
How do I get the same plot with the back-transformed response (so the squared response)?
I have tried to create a new data frame and extract the lsmean, lower.CL, and upper.CL:
a <- summary(model_lsmeans)
New_dataframe <- as.data.frame(a[c("treatment", "lsmean", "lower.CL", "upper.CL")])
And then square them:
New_dataframe$lsmean <- New_dataframe$lsmean^2
New_dataframe$lower.CL <- New_dataframe$lower.CL^2
New_dataframe$upper.CL <- New_dataframe$upper.CL^2
New_dataframe
This gives me the squared estimates and CI boundaries that I need.
The problem is that I cannot make the same graph for these estimates and CIs as the one I made with lsmeans above.
How can I do this? The reason I ask is that I want all the graphs in my article to have a similar style. Since I very much like the lsmeans plot, and it is very convenient to use for the non-transformed response variables, I would like to have all my graphs in this style.
Thank you very much for your help! Hope everything is clear!
Kind regards
Ditlev
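Two sketches of possible ways forward. First, lsmeans detects the sqrt() in the model formula, so, if your lsmeans version supports it, asking the plot for the response scale may do the back-transformation for you (worth checking against the manual squaring above):
plot(model_lsmeans, type = "response",
     ylab = "treatment", xlab = "response (with 95% CI)")
Failing that, the same style of plot can be drawn by hand from New_dataframe with base graphics:
n <- nrow(New_dataframe)
plot(New_dataframe$lsmean, seq_len(n), pch = 16, yaxt = "n",
     xlim = range(New_dataframe$lower.CL, New_dataframe$upper.CL),
     ylab = "treatment", xlab = "response (with 95% CI)")
segments(New_dataframe$lower.CL, seq_len(n),
         New_dataframe$upper.CL, seq_len(n))   # horizontal 95% CI bars
axis(2, at = seq_len(n), labels = as.character(New_dataframe$treatment), las = 1)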
