Plotting both a GLM and LM of the same data in R

I would like to plot both a linear model (LM) and a non-linear (GLM) model of the same data.
The range between 16% and 84% should line up between the LM and the GLM (citation: section 3.5).
I have included a fairly complete chunk of the code because I am not sure at which point I should cut the linear model, or at which point I have messed up (I think in the linear model).
The code below results in the image shown in the original post.
My objective (taken from the previous citation link) is shown in a second image there.
Here is my data:
mydata3 <- structure(list(
dose = c(0, 0, 0, 3, 3, 3, 7.5, 7.5, 7.5, 10, 10, 10, 25, 25, 25, 50, 50, 50),
total = c(25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L),
affected = c(1, 0, 1.2, 2.8, 4.8, 9, 2.8, 12.8, 8.6, 4.8, 4.4, 10.2, 6, 20, 14, 12.8, 23.4, 21.6),
probability = c(0.04, 0, 0.048, 0.112, 0.192, 0.36, 0.112, 0.512, 0.344, 0.192, 0.176, 0.408, 0.24, 0.8, 0.56, 0.512, 0.936, 0.864)),
.Names = c("dose", "total", "affected", "probability"),
row.names = c(NA, -18L),
class = "data.frame")
My script:
#load libraries
library(ggplot2)
library(drc) # glm model
library(plyr) # rename function
library(scales) #log plot scale
#Creating linear model
mod_linear <- lm(probability ~ (dose), weights = total, data = mydata3)
#Creating data.frame: note values 3 and 120 refer to 16% and 84% response in sigmoidal plot
line_df <- expand.grid(dose = exp(seq(log(3), log(120), length = 200)))
#Extracting values from linear model
p_line_df <- as.data.frame(cbind(dose = line_df,
                                 predict(mod_linear, newdata = data.frame(dose = line_df),
                                         interval = "confidence", level = 0.95)))
#Renaming linear df columns
p_line_df <- rename(p_line_df, c("fit" = "probability"))
p_line_df <- rename(p_line_df, c("lwr" = "Lower"))
p_line_df <- rename(p_line_df, c("upr" = "Upper"))
p_line_df$model <- "Linear"
#Create sigmoidal dose-response curve using drc package
mod3 <- drm(probability ~ dose, weights = total, data = mydata3, type = "binomial",
            fct = LL.2(names = c("Slope:b", "ED50:e")))
#data frame for ggplot2
base_DF_3 <- expand.grid(dose = exp(seq(log(1.0000001), log(10000), length = 200)))
#extract data from model
p_df3 <- as.data.frame(cbind(dose = base_DF_3,
                             predict(mod3, newdata = data.frame(dose = base_DF_3),
                                     interval = "confidence", level = 0.95)))
#renaming columns
p_df3 <- rename(p_df3, c("Prediction" = "probability"))
p_df3$model <- "Sigmoidal"
#combining both data frames
p_df_all <- rbind(p_df3, p_line_df)
#plotting
ggplot(p_df_all, aes(x=dose,y=probability, group=model))+
geom_line(aes(x=dose,y=probability,group=model,linetype=model),show.legend = TRUE)+
scale_x_log10(breaks = c(0.000001, 10^(0:10)),labels = c(0, math_format()(0:10)))

Looking at the reference you provided, what the authors describe is the use of a linear model to approximate the central portion of a (sigmoidal) logistic function. The linear model that achieves this is a straight line that passes through the inflection point of the logistic curve, and has the same slope as the logistic function at that inflection point. We can use some basic algebra and calculus to solve this problem.
From ?LL.2, we see that the form of the logistic function being fitted by drm is
f(x) = 1 / {1 + exp(b(log(x) - log(e)))}
We can get the values of the coefficients in this equation with
b = mod3$coefficients[1]
e = mod3$coefficients[2]
Now, by differentiation, the slope of the logistic function with respect to log(x) (the log-dose coordinates in which we will construct the line) is given by
dy/dlog(x) = -(b * exp((log(x)-log(e))*b)) / (1+exp((log(x)-log(e))*b))^2
At the inflection point, the dose (x) is equal to the coefficient e, thus the slope at the inflection point simplifies (greatly) to
sl50 = -b/4
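As a quick numerical sanity check (a sketch of my own, not part of the original answer; it assumes mod3, b, and e from above), a central finite difference in log-dose should reproduce the analytic slope:

h <- 1e-4
x_hi <- exp(log(e) + h)   # small step up in log-dose
x_lo <- exp(log(e) - h)   # small step down in log-dose
p_hi <- predict(mod3, newdata = data.frame(dose = x_hi))
p_lo <- predict(mod3, newdata = data.frame(dose = x_lo))
(p_hi - p_lo) / (2 * h)   # should be close to -b/4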
Since we also know that the inflection point occurs at the point where probability = 0.5 and dose = e, we can construct the straight line (in log-transformed coordinates) like this:
linear_probability = sl50 * (log(p_df3$dose) - log(e)) + 0.5
Now, to plot the logistic and linear functions together:
p_df3_lin = p_df3
p_df3_lin$model = 'linear'
p_df3_lin$probability = linear_probability
p_df_all <- rbind(p_df3, p_df3_lin)
ggplot(p_df_all, aes(x=dose,y=probability, group=model))+
geom_line(aes(x=dose,y=probability,group=model,linetype=model),show.legend = TRUE)+
scale_x_log10(breaks = c(0.000001, 10^(0:10)),labels = c(0, math_format()(0:10))) +
scale_y_continuous(limits = c(0,1))

Related

Is there a way to obtain residual plots for all interaction terms?

I am working on an exercise asking me "Plot the residuals against Y_hat, each predictor variable, and each two-factor interaction term on separate graphs." Here is a snippet of the data set I am using:
> dput(head(Commercial_Properties, 10))
structure(list(Rental_Rates = c(13.5, 12, 10.5, 15, 14, 10.5,
14, 16.5, 17.5, 16.5), Age = c(1, 14, 16, 4, 11, 15, 2, 1, 1,
8), Op_Expense_Tax = c(5.02, 8.19, 3, 10.7, 8.97, 9.45, 8, 6.62,
6.2, 11.78), Vacancy_Rate = c(0.14, 0.27, 0, 0.05, 0.07, 0.24,
0.19, 0.6, 0, 0.03), Total_Sq_Ft = c(123000, 104079, 39998, 57112,
60000, 101385, 31300, 248172, 215000, 251015), residuals = c(`1` = -1.03567244005944,
`2` = -1.51380641405037, `3` = -0.591053402133659, `4` = -0.133568082335235,
`5` = 0.313283765150399, `6` = -3.18718522392237, `7` = -0.538356748944345,
`8` = 0.236302385996349, `9` = 1.98922037248654, `10` = 0.105829602747806
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
From here I created the proper linear model that includes two factor interaction terms:
commercial_properties_lm_two_degree_interaction <-
lm(data=Commercial_Properties,
formula=Rental_Rates ~ (Age + Op_Expense_Tax + Vacancy_Rate + Total_Sq_Ft)^2)
Next, what I was hoping to accomplish was to plot the residuals not just for the linear terms but also for the interaction terms. I attempted to do this using the residualPlots() function in the car package:
library(car)
residualPlots(model=commercial_properties_lm_two_degree_interaction,
terms=~ (Age + Op_Expense_Tax + Vacancy_Rate + Total_Sq_Ft)^2)
When applied in this way, the output only produced residual plots against the linear terms; it didn't plot any interactions. So I then attempted to specify the terms manually, but I got an error:
residualPlots(model=commercial_properties_lm_two_degree_interaction,
terms=~ Age + Op_Expense_Tax + Vacancy_Rate + Total_Sq_Ft +
Age:Op_Expense_Tax + Age:Vacancy_Rate)
Error in termsToMf(model, terms) : argument 'terms' not interpretable.
Now, if I were to do things completely manually, I was able to get an interaction plot, for example:
with(data=Commercial_Properties, plot(x=Op_Expense_Tax * Vacancy_Rate, y=residuals))
plotted successfully. My issue is that, while I can do this completely manually for a reasonably small number of variables, it will get extremely tedious as the number of variables grows.
So my question is: is there a way to use an existing R function to make residual plots of the interaction terms, or am I left doing it completely manually, most likely by writing some sort of loop?
Note: I'm not asking about partial residuals; I haven't gotten to that point in the text I'm using. Just plain interaction terms plotted against residuals.
You could use an eval(parse()) approach based on the 'term.labels' attribute.
gsub(':', '*', a[grep(':', a)]) pulls out the interaction terms and replaces : with * so they can be evaluated.
#all term labels; entries containing ':' are the interaction terms
a <- attr(terms(commercial_properties_lm_two_degree_interaction), 'term.labels')
op <- par(mfrow=c(2, 3))
with(Commercial_Properties,
     lapply(gsub(':', '*', a[grep(':', a)]), function(x)
       plot(eval(parse(text=x)), residuals, xlab=x)))
par(op)
Edit
This is how we would do it with a for loop in R:
as <- gsub(':', '*', a[grep(':', a)])
op <- par(mfrow=c(2, 3))
for (x in as) {
  with(Commercial_Properties,
       plot(eval(parse(text=x)), residuals, xlab=x)
  )
}
par(op)
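A hedged alternative (my own sketch, not from the original answer): the model matrix already contains the computed interaction columns, so the same plots can be made without eval(parse()):

#pull the interaction columns straight out of the model matrix
mm <- model.matrix(commercial_properties_lm_two_degree_interaction)
int_cols <- grep(':', colnames(mm), value = TRUE)
op <- par(mfrow = c(2, 3))
for (term in int_cols) {
  plot(mm[, term], residuals(commercial_properties_lm_two_degree_interaction),
       xlab = term, ylab = "residuals")
}
par(op)

This avoids parsing text and plots exactly the values the model used.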

R confusionMatrix error data and reference factors with same levels

I'm trying to understand how to make a confusion matrix after I use the glm function for a logistic regression. Here is my code so far. I am using the caret package and the confusionMatrix function.
dput(head(wine_quality))
structure(list(fixed.acidity = c(7, 6.3, 8.1, 7.2, 7.2, 8.1),
volatile.acidity = c(0.27, 0.3, 0.28, 0.23, 0.23, 0.28),
citric.acid = c(0.36, 0.34, 0.4, 0.32, 0.32, 0.4), residual.sugar = c(20.7,
1.6, 6.9, 8.5, 8.5, 6.9), chlorides = c(0.045, 0.049, 0.05,
0.058, 0.058, 0.05), free.sulfur.dioxide = c(45, 14, 30,
47, 47, 30), total.sulfur.dioxide = c(170, 132, 97, 186,
186, 97), density = c(1.001, 0.994, 0.9951, 0.9956, 0.9956,
0.9951), pH = c(3, 3.3, 3.26, 3.19, 3.19, 3.26), sulphates = c(0.45,
0.49, 0.44, 0.4, 0.4, 0.44), alcohol = c(8.8, 9.5, 10.1,
9.9, 9.9, 10.1), quality = structure(c(4L, 4L, 4L, 4L, 4L,
4L), .Label = c("3", "4", "5", "6", "7", "8", "9", "white"
), class = "factor"), type = structure(c(3L, 3L, 3L, 3L,
3L, 3L), .Label = c("", "red", "white"), class = "factor"),
numeric_type = c(0, 0, 0, 0, 0, 0)), row.names = c(NA, 6L
), class = "data.frame")
library(tibble)
library(broom)
library(ggplot2)
library(caret)
any(is.na(wine_quality)) # this evaluates to FALSE
wine_model <- glm(type ~ fixed.acidity + volatile.acidity + citric.acid + residual.sugar + chlorides + free.sulfur.dioxide + total.sulfur.dioxide + density + pH + sulphates + alcohol, wine_quality, family = "binomial")
# split data into test and train
smp_size <- floor(0.75 * nrow(wine_quality))
set.seed(123)
train_ind <- sample(seq_len(nrow(wine_quality)), size = smp_size)
train <- wine_quality[train_ind, ]
test <- wine_quality[-train_ind, ]
# make prediction on train data
pred <- predict(wine_model)
train$fixed.acidity <- as.numeric(train$fixed.acidity)
round(train$fixed.acidity)
train$fixed.acidity <- as.factor(train$fixed.acidity)
pred <- as.numeric(pred)
round(pred)
pred <- as.factor(pred)
confusionMatrix(pred, wine_quality$fixed.acidity)
After this final line of code, I get this error:
Error: `data` and `reference` should be factors with the same levels.
This error doesn't make sense to me. I've tested that the lengths of pred and fixed.acidity are the same (6497), and that both are of factor data type.
length(pred)
length(wine_quality$fixed.acidity)
class(pred)
class(train$fixed.acidity)
Is there any obvious reason why this confusion matrix is not working? I'm trying to find a hit ratio for the model. I would appreciate dummy explanations; I really don't know what I'm doing here.
The error from confusionMatrix() tells us that the two variables passed to the function need to be factors with the same levels. We can see why we received the error when we run str() on both variables.
> str(pred)
Factor w/ 5318 levels "-23.6495182533792",..: 310 339 419 1105 310 353 1062 942 594 1272 ...
> str(wine_quality$fixed.acidity)
num [1:6497] 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
pred is a factor, while wine_quality$fixed.acidity is a numeric vector. The confusionMatrix() function is used to compare predicted and actual values of a dependent variable. It is not intended to cross-tabulate a predicted variable and an independent variable.
The code in the question uses fixed.acidity in the confusion matrix when it should compare predicted values of type against actual values of type from the testing data.
Also, the code in the question creates the model prior to splitting the data into test and training data. The correct procedure is to split the data before building a model on the training data, make predictions with the testing (hold back) data, and compare actuals to predictions in the testing data.
Finally, the result of the predict() function as coded in the original post is the linear predictor values from the GLM model (equivalent to wine_model$linear.predictors in the output model object). These values must be transformed to the response scale before use in confusionMatrix().
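Here is a minimal sketch of that transformation (my own illustration, not from the original answer; it assumes type is first cleaned to a two-level factor with no empty entries, so that glm() models P(type == "white"), the second level):

# clean the response to two levels, then refit a small model
wine_quality$type <- factor(as.character(wine_quality$type),
                            levels = c("red", "white"))
wine_model <- glm(type ~ fixed.acidity + volatile.acidity + citric.acid,
                  data = wine_quality, family = "binomial")  # shortened formula for illustration
eta  <- predict(wine_model)  # linear predictors (default type = "link")
prob <- plogis(eta)          # inverse logit; same as predict(..., type = "response")
pred <- factor(ifelse(prob > 0.5, "white", "red"), levels = c("red", "white"))
confusionMatrix(data = pred, reference = wine_quality$type)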
In practice, it's easier to use caret::train() with the GLM method and binomial family, where predict() will generate results that are usable in confusionMatrix(). We'll illustrate this with the UCI wine quality data.
First, we download the data from the UCI Machine Learning Repository to make the example reproducible.
dir.create("./data", showWarnings = FALSE)  # make sure the target folder exists
download.file("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv",
              "./data/wine_quality_red.csv")
download.file("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv",
              "./data/wine_quality_white.csv")
Second, we load the data, assign type as either red or white depending on the data file, and bind the data into a single data frame.
red <- read.csv("./data/wine_quality_red.csv",header = TRUE,sep=";")
white <- read.csv("./data/wine_quality_white.csv",header = TRUE,sep=";")
red$type <- "red"
white$type <- "white"
wine_quality <- rbind(red,white)
wine_quality$type <- factor(wine_quality$type)
Next, we split the data into test and training sets based on the values of type, so each data frame gets a proportional number of red and white wines, and then train a model with the default caret::train() settings and the GLM method.
library(caret)
set.seed(123)
inTrain <- createDataPartition(wine_quality$type, p = 3/4)[[1]]
training <- wine_quality[ inTrain,]
testing <- wine_quality[-inTrain,]
aModel <- train(type ~ ., data = training, method = "glm", family = "binomial")
Finally, we use the model to make predictions on the hold back data frame, and run a confusion matrix.
testLM <- predict(aModel,testing)
confusionMatrix(data=testLM,reference=testing$type)
...and the output:
> confusionMatrix(data=testLM, reference=testing$type)
Confusion Matrix and Statistics

          Reference
Prediction  red white
     red    393     3
     white    6  1221

               Accuracy : 0.9945
                 95% CI : (0.9895, 0.9975)
    No Information Rate : 0.7542
    P-Value [Acc > NIR] : <2e-16

                  Kappa : 0.985

 Mcnemar's Test P-Value : 0.505

            Sensitivity : 0.9850
            Specificity : 0.9975
         Pos Pred Value : 0.9924
         Neg Pred Value : 0.9951
             Prevalence : 0.2458
         Detection Rate : 0.2421
   Detection Prevalence : 0.2440
      Balanced Accuracy : 0.9913

       'Positive' Class : red

Calculate Gaussian curve fitting on a list

I have list data like the example below. I want to perform nonlinear-regression Gaussian curve fitting between mids and counts for each element of my list, and report the mean and standard deviation:
mylist<- structure(list(A = structure(list(breaks = c(-10, -9,
-8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4), counts = c(1L,
0L, 1L, 5L, 9L, 38L, 56L, 105L, 529L, 2858L, 17L, 2L, 0L, 2L),
density = c(0.000276014352746343, 0, 0.000276014352746343,
0.00138007176373171, 0.00248412917471709, 0.010488545404361,
0.0154568037537952, 0.028981507038366, 0.146011592602815,
0.788849020149048, 0.00469224399668783, 0.000552028705492686,
0, 0.000552028705492686), mids = c(-9.5, -8.5, -7.5, -6.5,
-5.5, -4.5, -3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5),
xname = "x", equidist = TRUE), .Names = c("breaks", "counts",
"density", "mids", "xname", "equidist"), class = "histogram"),
B = structure(list(breaks = c(-7, -6, -5,
-4, -3, -2, -1, 0), counts = c(2L, 0L, 6L, 2L, 2L, 1L, 3L
), density = c(0.125, 0, 0.375, 0.125, 0.125, 0.0625, 0.1875
), mids = c(-6.5, -5.5, -4.5, -3.5, -2.5, -1.5, -0.5), xname = "x",
equidist = TRUE), .Names = c("breaks", "counts", "density",
"mids", "xname", "equidist"), class = "histogram"), C = structure(list(
breaks = c(-7, -6, -5, -4, -3, -2, -1, 0, 1), counts = c(2L,
2L, 4L, 5L, 14L, 22L, 110L, 3L), density = c(0.0123456790123457,
0.0123456790123457, 0.0246913580246914, 0.0308641975308642,
0.0864197530864197, 0.135802469135802, 0.679012345679012,
0.0185185185185185), mids = c(-6.5, -5.5, -4.5, -3.5,
-2.5, -1.5, -0.5, 0.5), xname = "x", equidist = TRUE), .Names = c("breaks",
"counts", "density", "mids", "xname", "equidist"), class = "histogram")), .Names = c("A",
"B", "C"))
I have read this question: Fitting a density curve to a histogram in R. But that is about fitting a curve to a histogram; what I want are the best-fit values:
"Best-fit values"
" Mean"
" SD"
If I use PRISM to do it, I should get the following results. For A:
Mids Counts
-9.5 1
-8.5 0
-7.5 1
-6.5 5
-5.5 9
-4.5 38
-3.5 56
-2.5 105
-1.5 529
-0.5 2858
0.5 17
1.5 2
2.5 0
3.5 2
Performing nonlinear-regression Gaussian curve fitting, I get:
"Best-fit values"
" Amplitude" 3537
" Mean" -0.751
" SD" 0.3842
For the second set, B:
Mids Counts
-6.5 2
-5.5 0
-4.5 6
-3.5 2
-2.5 2
-1.5 1
-0.5 3
"Best-fit values"
" Amplitude" 7.672
" Mean" -4.2
" SD" 0.4275
And for the third one:
Mids Counts
-6.5 2
-5.5 2
-4.5 4
-3.5 5
-2.5 14
-1.5 22
-0.5 110
0.5 3
I get this:
"Best-fit values"
" Amplitude" 120.7
" Mean" -0.6893
" SD" 0.4397
To convert the histogram back to an estimate of the mean and standard deviation, first expand the bin counts into repeated bin values (each mid point repeated count times). This gives an approximation of the original data.
Based on your example above:
#extract the mid points and create list of simulated data
simdata<-lapply(mylist, function(x){rep(x$mids, x$counts)})
#if the original data were integers then this may give a better estimate
#simdata<-lapply(mylist, function(x){rep(x$breaks[-1], x$counts)})
#find the mean and sd of simulated data
means<-lapply(simdata, mean)
sds<-lapply(simdata, sd)
#or use sapply in the above 2 lines depending on future process needs
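Equivalently (a small sketch of my own, not part of the original answer), the same estimates can be computed directly from the bin counts as weighted moments, without expanding into simulated data:

#weighted-moment version of the same estimate
wstats <- lapply(mylist, function(h) {
  n <- sum(h$counts)
  m <- sum(h$mids * h$counts) / n
  s <- sqrt(sum(h$counts * (h$mids - m)^2) / (n - 1))
  c(mean = m, sd = s)
})

This gives exactly the same values as mean() and sd() on the expanded data, and scales better for histograms with very large counts.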
If your data were integers, then using the breaks as the bins will provide a better estimate. Depending on the arguments used for the histogram (i.e., right=TRUE/FALSE), the results may be shifted by one bin.
Edit
I thought this was going to be an easy one. I reviewed the video; the sample data shown was:
mids<-seq(-7, 7)
counts<-c(7, 1, 2, 2, 2, 5, 217, 70, 18, 0, 2, 1, 2, 0, 1)
simdata<-rep(mids, counts)
The video results were mean = -0.7359 and sd= 0.4571. The solution which I found provided the closest results was using the "fitdistrplus" package:
fitdist(simdata, "norm", "mge")
Using the "maximizing goodness-of-fit estimation" resulted in mean = -0.7597280 and sd= 0.8320465.
At this point, the method above provides a close estimate but does not exactly match. I do not know what technique was used to calculate the fit in the video.
Edit #2
The above solutions involved recreating the original data and fitting it using either mean()/sd() or the fitdistrplus package. This time, the attempt is to perform a least-squares fit using the Gaussian distribution.
simdata<-lapply(mylist, function(x){rep(x$mids, x$counts)})
means<-sapply(simdata, mean)
sds<-sapply(simdata, sd)
#Data from video
#mids<-seq(-7, 7)
#counts<-c(7, 1, 2, 2, 2, 5, 217, 70, 18, 0, 2, 1, 2, 0, 1)
#make list of the bins and distribution in each bin
mids<-lapply(mylist, function(x){x$mids})
dis<-lapply(mylist, function(x) {x$counts/sum(x$counts)})
#function to perform the least-squares fit
nnorm <- function(values, mids, dis) {
  means <- values[1]
  sds <- values[2]
  #print(paste(means, sds))
  #calculate the Gaussian density for each bin
  modeld <- dnorm(mids, means, sds)
  #sum of the squared differences
  diff <- sum((modeld - dis)^2)
  diff
}
#use optim() with the mean and sd as initial guesses,
#finding the minimum with the mean and SD as fit parameters
lapply(1:3, function(i) {optim(c(means[[i]], sds[[i]]), nnorm, mids=mids[[i]], dis=dis[[i]])})
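To collect the fitted parameters into one table (a small usage sketch of my own, saving the lapply() result first):

fits <- lapply(1:3, function(i) {
  optim(c(means[[i]], sds[[i]]), nnorm, mids = mids[[i]], dis = dis[[i]])
})
out <- t(sapply(fits, function(f) setNames(f$par, c("mean", "sd"))))
rownames(out) <- names(mylist)
out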
This solution provides a closer answer to the PRISM results, but still not the same. Here is a comparison of all four solutions (the comparison table is an image in the original answer).
From the table, the least-squares fit (the one just above) provides the closest approximation. Maybe tweaking the mid points in the dnorm() call might help. Case B's data is the farthest from being normally distributed, yet the PRISM software still generates a small standard deviation, while the other methods agree with one another. It is possible the PRISM software performs some type of data filtering to remove outliers before the fit.
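One more hedged possibility (my own addition, not verified against PRISM): since PRISM reports an Amplitude alongside the Mean and SD, it presumably fits an unnormalized Gaussian to the raw counts, which nls() can do directly. Starting values matter, and nls() may fail to converge on irregular histograms such as B:

#amplitude-scaled Gaussian fitted to the raw counts
fit_gauss <- function(h) {
  d <- data.frame(x = h$mids, y = h$counts)
  nls(y ~ A * exp(-(x - mu)^2 / (2 * sigma^2)), data = d,
      start = list(A = max(d$y), mu = d$x[which.max(d$y)], sigma = 1))
}
coef(fit_gauss(mylist$A))  # compare against PRISM's Amplitude/Mean/SD for A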

How to scale y-axis to intuitively detect small differences in data

I have a data set from a literature survey, where we looked at the effects of pH on certain parameters (Metrics) in a group of animals. Because the experiments are done on different time scales, I divided the response ratio by time.
This leads to very small differences around 1 (less than 1 indicates a negative effect, greater than 1 a positive effect), which are still interesting and important (because the real values are divided by time). The problem is that some of the values are either very low or very high, and the differences close to 1 are not visible.
Since the values are close to 1, log transformation of the y-axis scale does not help. How can I transform the y-axis scale in ggplot2 so that differences close to 1 are visible and yet intuitive, so that the reader can detect differences without thinking too much? I could standardize the values to the minimum value, multiply by 10000, and take a log10 scale, but this would not lead to understandable differences.
df <- structure(list(Study = c(1, 1, 2, 2, 3), pH_control = c(8.06,
8.06, 8.01, 8.01, 7.99), pH_treatment = c(7.86, 7.75, 7.8, 7.8,
7.45), time = c(120, 120, 60, 150, 140), Metrics = structure(c(3L,
1L, 2L, 3L, 1L), .Label = c("Growth", "Metabolism", "Survival"
), class = "factor"), RR_per_time_unit = c(0.9998, 1.001, 1.002,
0.98, 0.9), CI.max = c(1, 1.003, 1.00003, 0.9999, 0.92), CI.min = c(0.9996,
0.9999, 1.004, 0.9789, 0.89), pH_diff = c(0.2, 0.31, 0.21, 0.21,
0.54)), .Names = c("Study", "pH_control", "pH_treatment", "time",
"Metrics", "RR_per_time_unit", "CI.max", "CI.min", "pH_diff"), row.names = c(NA,
-5L), class = "data.frame")
df$pH_diff <- df$pH_control - df$pH_treatment
library(ggplot2)
ggplot(df, aes(y = RR_per_time_unit, x = pH_diff, ymin = CI.min, ymax = CI.max)) +
geom_pointrange(aes(color = Metrics)) + geom_hline(aes(yintercept = 1)) + coord_trans(y = "log10")

Plot Regression Surface

I am reading a book by Cohen, Cohen, Aiken and West (2003), "Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences", and have come across a 3D plot of a regression surface showing interaction and no interaction (p. 259). The graphs look like they may have been created in R. I like the graphs as a teaching tool and would like to reproduce them. The plots look something like the image in the original post.
The only addition to the Cohen et al. plots was lines across the planes at the mean, +1 SD, and -1 SD of x2. This would be an excellent addition if possible (generally most things are possible with R).
I have provided a sample data set below with a DV, 2 predictors, and centered predictors. How would I use R to generate the regression surface (plane) plot showing interaction and an additive model for both the centered and uncentered data? (I assume the technique will be the same, but I want to make sure.)
Total of 4 plots:
1. uncentered no interaction
2. uncentered interaction
3. centered no interaction
4. centered interaction
DF<-structure(list(y = c(-1.22, -1.73, -2.64, -2.44, -1.11, 2.24,
3.42, 0.67, 0.59, -0.61, -10.77, 0.93, -8.6, -6.99, -0.12, -2.29,
-5.16, -3.35, -3.35, -2.51, 2.21, -1.18, -5.21, -7.74, -1.34),
x1 = c(39.5, 41, 34, 30.5, 31.5, 30, 41.5, 24, 43, 39, 25.5,
38.5, 33.5, 30, 41, 31, 25, 37, 37.5, 24.5, 38, 37, 41, 37,
36), x2 = c(61L, 53L, 53L, 44L, 49L, 44L, 57L, 47L, 54L,
48L, 46L, 59L, 46L, 61L, 55L, 57L, 59L, 59L, 55L, 50L, 62L,
55L, 55L, 52L, 55L), centered.x1 = c(5.49702380952381, 6.99702380952381,
-0.0029761904761898, -3.50297619047619, -2.50297619047619,
-4.00297619047619, 7.49702380952381, -10.0029761904762, 8.99702380952381,
4.99702380952381, -8.50297619047619, 4.49702380952381, -0.50297619047619,
-4.00297619047619, 6.99702380952381, -3.00297619047619, -9.00297619047619,
2.99702380952381, 3.49702380952381, -9.50297619047619, 3.99702380952381,
2.99702380952381, 6.99702380952381, 2.99702380952381, 1.99702380952381
), centered.x2 = c(9.80357142857143, 1.80357142857143, 1.80357142857143,
-7.19642857142857, -2.19642857142857, -7.19642857142857,
5.80357142857143, -4.19642857142857, 2.80357142857143, -3.19642857142857,
-5.19642857142857, 7.80357142857143, -5.19642857142857, 9.80357142857143,
3.80357142857143, 5.80357142857143, 7.80357142857143, 7.80357142857143,
3.80357142857143, -1.19642857142857, 10.8035714285714, 3.80357142857143,
3.80357142857143, 0.803571428571431, 3.80357142857143)), .Names = c("y",
"x1", "x2", "centered.x1", "centered.x2"), row.names = c(NA,
25L), class = "data.frame")
Thank you in advance.
EDIT: The following code plots the plane, but it will not work when you have an interaction (which is really what I'm interested in). Additionally, I don't know how to plot the high (+1 SD), low (-1 SD), and mean lines for x2 either.
library(scatterplot3d)
x11(10, 5)
s3d <- scatterplot3d(DF[, c(2, 3, 1)], type = "n", highlight.3d = TRUE,
                     angle = 70, scale.y = 1, pch = 16, main = "scatterplot3d")
# Now adding a regression plane to the "scatterplot3d"
my.lm <- with(DF, lm(y ~ x1 + x2))
s3d$plane3d(my.lm, lty.box = "solid")
An attempt to plot an interaction plane (seen here):
s3d <- scatterplot3d(DF[, c(2, 3, 1)], type = "n", highlight.3d = TRUE,
                     angle = 70, scale.y = 1, pch = 16, main = "scatterplot3d")
my.lm <- with(DF, lm(y ~ x1 + x2 + x1:x2))
s3d$plane3d(my.lm, lty.box = "solid")
Yielded the following error:
Error in segments(x, z1, x + y.max * yx.f, z2 + yz.f * y.max, lty = ltya, :
cannot mix zero-length and non-zero-length coordinates
Here's how I would do it (adding a bit of color) with packages 'rms' and 'lattice':
require(rms) # also need to have Hmisc installed
require(lattice)
ddI <- datadist(DF)
options(datadist="ddI")
lininterp <- ols(y ~ x1*x2, data=DF)
bplot(Predict(lininterp, x1=25:40, x2=45:60),
lfun=wireframe, # bplot passes extra arguments to wireframe
screen = list(z = -10, x = -50), drape=TRUE)
And the non-interaction model (note that lin.no.int is not defined by the code above, so we fit it first):
lin.no.int <- ols(y ~ x1 + x2, data=DF)
bplot(Predict(lin.no.int, x1=25:40, x2=45:60), lfun=wireframe, col=2:8, drape=TRUE,
      screen = list(z = -10, x = -50),
      main="Estimated regression surface with no interaction")
