A question following this post. I have the following data:
x1, disease symptom
y1, another disease symptom
I fitted the x1/y1 data with a Deming regression with vr (or sdr) option set to 1. In other words, the regression is a Total Least Squares regression, i.e. orthogonal regression. See previous post for the graph.
x1=c(24.0,23.9,23.6,21.6,21.0,20.8,22.4,22.6,
21.6,21.2,19.0,19.4,21.1,21.5,21.5,20.1,20.1,
20.1,17.2,18.6,21.5,18.2,23.2,20.4,19.2,22.4,
18.8,17.9,19.1,17.9,19.6,18.1,17.6,17.4,17.5,
17.5,25.2,24.4,25.6,24.3,24.6,24.3,29.4,29.4,
29.1,28.5,27.2,27.9,31.5,31.5,31.5,27.8,31.2,
27.4,28.8,27.9,27.6,26.9,28.0,28.0,33.0,32.0,
34.2,34.0,32.6,30.8)
y1=c(100.0,95.5,93.5,100.0,98.5,99.5,34.8,
45.8,47.5,17.4,42.6,63.0,6.9,12.1,30.5,
10.5,14.3,41.1, 2.2,20.0,9.8,3.5,0.5,3.5,5.7,
3.1,19.2,6.4, 1.2, 4.5, 5.7, 3.1,19.2, 6.4,
1.2,4.5,81.5,70.5,91.5,75.0,59.5,73.3,66.5,
47.0,60.5,47.5,33.0,62.5,87.0,86.0,77.0,
86.0,83.0,78.5,83.0,83.5,73.0,69.5,82.5,78.5,
84.0,93.5,83.5,96.5,96.0,97.5)
x11()
plot(x1,y1,xlim=c(0,35),ylim=c(0,100))
library(MethComp)
dem_reg <- Deming(x1, y1)
abline(dem_reg[1:2], col = "green")
I would like to know how much x1 helps to predict y1:
normally, I’d go for a R-squared, but it does not seem to be relevant; although another mathematician told me he thinks a R-squared may be appropriate. And this page suggests to calculate a Pearson product-moment correlation coefficient, which is R I believe?
partially related, there is possibly a tolerance interval. I could calculated it with R ({tolerance} package or code shown in the post), but it is not exactly what I am searching for.
Does someone know how to calculate a goodness of fit for Deming regression, using R? I looked at MetchComp pdf but could not find it (perhaps missed it though).
EDIT: following Gaurav's answers about confidence interval: R code
Firstly: confidence intervals for parameters
library(mcr)
MCR_reg=mcreg(x1,y1,method.reg="Deming",error.ratio=1,method.ci="analytical")
getCoefficients(MCR_reg)
Secondly: confidence intervals for predicted values
# plot of data
x11()
plot(x1,y1,xlim=c(0,35),ylim=c(0,100))
# Deming regression using functions from {mcr}
library(mcr) MCR_reg=mcreg(x1,y1,method.reg="Deming",error.ratio=1,method.ci="analytical")
MCR_intercept=getCoefficients(MCR_reg)[1,1]
MCR_slope=getCoefficients(MCR_reg)[2,1]
# CI for predicted values
x_to_predict=seq(0,35)
predicted_values=MCResultAnalytical.calcResponse(MCR_reg,x_to_predict,alpha=0.05)
CI_low=predicted_values[,4]
CI_up=predicted_values[,5]
# plot regression line and CI for predicted values
abline(MCR_intercept,MCR_slope, col="red")
lines(x_to_predict,CI_low,col="royalblue",lty="dashed")
lines(x_to_predict,CI_up,col="royalblue",lty="dashed")
# comments
text(7.5,60, "Deming regression", col="red")
text(7.5,40, "Confidence Interval for", col="royalblue")
text(7.5,35, "Predicted values - 95%", col="royalblue")
EDIT 2
Topic moved to Cross Validated:
https://stats.stackexchange.com/questions/167907/deming-orthogonal-regression-measuring-goodness-of-fit
There are many proposed methods to calculate goodness of fit and tolerance intervals for Deming Regression but none of them widely accepted. The conventional methods we use for OLS regression may not make sense. This is an area of active research. I don't think there many R-packages which will help you compute that since not many mathematicians agree on any particular method. Most methods for calculating intervals are based on Resampling techniques.
However you can check out the 'mcr' package for intervals...
https://cran.r-project.org/web/packages/mcr/
I met some difficulty in calculating the Bayes decision boundary of Figure 2.5. In the package ElemStatLearn, it already calcualted the probability at each point and used contour to draw the boundary. Can any one tell me how to calculate the probability? Thank you very much.
In traditional Bayes decision problem, the mixture distribution are usually normal distribution, but in this example, it uses two steps to generate the samples, so I have some difficulty in calculating the distribution.
Thank you very much.
Section 2.3.3 of ESL (accessible online) states how the data were generated. Each class is a mixture of 10 Gaussian distributions of equal covariance and each of the 10 means are drawn from another bivariate Gaussian, as specified in the text. To calculate the exact decision boundary of the simulation in Figure 2.5, you would need to know the particular 20 means (10 for each class) that were generated to produce the data but those values are not provided in the text.
However, you can generate a new pair of mixture models and calculate the probability for each of the two classes (BLUE & ORANGE) that you generate. Since each of the 10 distributions in a class are equally likely, the class-conditional probability p(x|BLUE) is just the average of the probabilities for each of the 10 distributions in the BLUE model.