R: Lattice Q-Q Plot with regression Line

I can create a lattice qq-plot with:
qqnorm(surfchem.cast$Con)
but I have not learned how to add a panel.abline or prepanel.qqmathline().
I've looked in the lattice graphics book and searched the web without finding the correct syntax. A pointer to how to add this line representing the linear relationship between theoretical and data quantiles will be greatly appreciated. I also do not find a question here where the answer is for a qq plot rather than an xyplot.

The convention with Q-Q plots is to plot the line that goes through the first and third quartiles of the sample and the test distribution, not the line of best fit.
set.seed(1)
Z <- rnorm(100)
qqnorm(Z)
qqline(Z,probs=c(0.25,0.75))
The reason for this is that, if your sample is not normally distributed, the deviations tend to be at the extremes.
set.seed(1)
Z <- runif(100) # NOTE: uniform distribution...
qqnorm(Z)
qqline(Z, probs=c(0.25,0.75))
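For reference, the quartile line can be reproduced by hand, which makes it clearer what qqline() is doing. This is a minimal sketch assuming the default probs and the default quantile type; the variable names (probs, slope, intercept) are only illustrative.

```r
set.seed(1)
Z <- rnorm(100)

probs <- c(0.25, 0.75)
y <- quantile(Z, probs, names = FALSE)  # sample quartiles
x <- qnorm(probs)                       # theoretical (normal) quartiles

# Line through the two (theoretical, sample) quartile points
slope <- diff(y) / diff(x)
intercept <- y[1] - slope * x[1]

qqnorm(Z)
abline(intercept, slope, col = "red")   # should coincide with qqline(Z)
```

Because only the two quartile points determine the line, values in the tails do not pull it around the way they would with a least-squares fit.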
If you want the line connecting the corners, as in your comment, use different probabilities. The reason you need to use (0.01,0.99) rather than (0,1) is that the latter will produce infinities.
set.seed(1)
Z <- runif(100) # NOTE: uniform distribution...
qqnorm(Z)
qqline(Z, probs=c(0.01,0.99))
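Since the question asks about lattice specifically, here is a sketch of the same idea with qqmath(), using panel.qqmathline() and prepanel.qqmathline() from the lattice package. Z is simulated data standing in for surfchem.cast$Con, which is not available here.

```r
library(lattice)

set.seed(1)
Z <- rnorm(100)  # stand-in for surfchem.cast$Con

p <- qqmath(~ Z,
            prepanel = prepanel.qqmathline,  # axis limits account for the line
            panel = function(x, ...) {
              panel.qqmathline(x, ...)       # line through the quartiles
              panel.qqmath(x, ...)           # the Q-Q points
            })
print(p)
```

Like qqline(), panel.qqmathline() draws the line through the first and third quartiles by default.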

Related

Line style in q-q plot in R

Why I did get lines instead of standard bubbles in my q-q plot?
My code:
data <- read.csv("C:\\Users\\anton\\SanFrancisco.csv")
x <- data$ï..San.Francisco
head(x)
library("fitdistrplus")
fitnor <- fitdist(x, "norm")
fitlogis <- fitdist(x, "logis")
qqcomp(list(fitnor, fitlogis), legendtext=c("Normal", "Logistic"))
From the documentation for qqcomp - get to it by ?qqcomp.
qqcomp provides a plot of the quantiles of each theoretical
distribution (x-axis) against the empirical quantiles of the data
(y-axis), by default defining probability points as (1:n - 0.5)/n for
theoretical quantile calculation (data are assumed continuous). For
large dataset (n > 1e4), lines are drawn instead of points and
customized with the fitpch parameter.
This is a design feature. Your data must have more than 10000 values. If that is the case, the bubbles on the q-q plot would be difficult to distinguish individually. Additionally, they are large enough that the bubbles for one model would cover those for the other.
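If points are still wanted for a large sample, one option is to draw the Q-Q plot by hand in base R with a very small plotting symbol. The sketch below is not what qqcomp does internally; it only reuses the probability points (1:n - 0.5)/n quoted above, uses simulated data in place of the San Francisco file, and estimates the normal parameters directly rather than through fitdist().

```r
set.seed(1)
x <- rnorm(20000, mean = 50, sd = 10)  # simulated stand-in for the real data
n <- length(x)

p <- (1:n - 0.5) / n                   # probability points from the qqcomp docs
mu <- mean(x)
sigma <- sqrt(mean((x - mu)^2))        # maximum-likelihood estimate of sd

theo <- qnorm(p, mean = mu, sd = sigma)  # theoretical quantiles (x-axis)

plot(theo, sort(x), pch = ".",
     xlab = "Theoretical quantiles", ylab = "Empirical quantiles")
abline(0, 1, col = "red")              # reference line y = x
```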

Different lowess curves in plot and qplot in R

I am comparing two graphs with a non-parametric lo(w)ess curve superimposed in each case. The problem is that the curves look very different, despite the fact that their arguments, such as span, are identical.
y<-rnorm(100)
x<-rgamma(100,2,2)
qplot(x,y)+stat_smooth(span=2/3,se=F)+theme_bw()
plot(x,y)
lines(lowess(y~x))
There seems to be a lot more curvature in the graph generated by qplot(). As you know, detecting curvature is very important in the diagnostics of regression analysis, and I fear that if I am to use ggplot2, I would reach erroneous conclusions.
Could you please tell me how I could produce the same curve in ggplot2?
Thank you
Or, you can use loess(..., degree=1). This produces a very similar, but not quite identical result to lowess(...)
set.seed(1) # for reproducibility
y<-rnorm(100)
x<-rgamma(100,2,2)
plot(x,y)
points(x,loess(y~x,data.frame(x,y),degree=1)$fitted,pch=20,col="red")
lines(lowess(y~x))
With ggplot
qplot(x,y)+stat_smooth(method="loess",se=FALSE,method.args=list(degree=1))+
theme_bw()+
geom_point(data=as.data.frame(lowess(y~x)),aes(x,y),col="red")
Here is a new stat function for use with ggplot2 that uses lowess(): https://github.com/harrelfe/Hmisc/blob/master/R/stat-plsmo.r. You need to load the proto package for this to work. I like using lowess because it is fast for any sample size and allows outlier detection to be turned off for binary Y. But it doesn't provide confidence bands.
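Another route that avoids any extra stat is to compute the lowess() fit first and hand the result to geom_line(). This is a minimal sketch with the same simulated data; the name low is only illustrative.

```r
library(ggplot2)

set.seed(1)
y <- rnorm(100)
x <- rgamma(100, 2, 2)

# lowess() returns a list with components x (sorted) and y (smoothed values)
low <- as.data.frame(lowess(x, y))

g <- ggplot(data.frame(x, y), aes(x, y)) +
  geom_point() +
  geom_line(data = low, colour = "red") +  # exactly the curve lines(lowess(x, y)) draws
  theme_bw()
print(g)
```

The red curve is the same fit that lines(lowess(x, y)) adds to a base plot, so both graphics systems show identical smooths.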

Linear regression different using R plot() and qplot()

If I create a scatterplot using plot() with lm(x~y) on my data I get intercept at 500 and when I observe the qplot on the same data with stat_smooth(method=lm), the intercept is at roughly 1000 on y axis. Although the slope looks visually similar to that on the simple plot(). I hope this makes sense. I cannot understand why the difference. Full functions are given below. Any help will be greatly appreciated.
plot():
plot (my[[12]],my[[8]])
abline(lm(my[[12]]~my[[8]]),col="red")
qplot():
myGG<-qplot(x=my[[12]],y=my[[8]]) # pretty scatterplot
myGG<-myGG + stat_smooth(fullrange=TRUE,method="lm")
It seems to me that the variables in the regressions do not correspond. In lm the variable my[[12]] is dependent, in the qplot variant it is the independent one. Using lm(my[[8]]~my[[12]]) should make it equivalent.
It is a common mistake to mix up the variables when using plot and lm. Note that to get the axis right, the order of the variables changes in lm compared to plot.
x <- rnorm(100)
y <- rnorm(100)
plot(x,y)
abline(lm(y ~x))
To make it less confusing you might use the formula interface in plot as well.
plot(y ~ x)
abline(lm(y ~x))
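To see how much the order matters, the two regressions can be compared on simulated data. This is only a sketch; the true intercept of 500 and slope of 2 are made-up values chosen for illustration.

```r
set.seed(1)
x <- rnorm(100)
y <- 500 + 2 * x + rnorm(100)

fit_yx <- lm(y ~ x)  # y on the vertical axis, matches plot(x, y)
fit_xy <- lm(x ~ y)  # roles swapped: a different line altogether

print(coef(fit_yx))
print(coef(fit_xy))

plot(x, y)
abline(fit_yx, col = "red")  # correct line for this plot
```

Passing fit_xy to abline() on the same plot would draw a line with a completely different intercept, which is the kind of discrepancy described in the question.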

How to scale/transform graphics::plot() axes with any transformation, not just logarithmic (for Weibull plots)?

I am building an R package to display Weibull plots (using graphics::plot) in R. The plot has a log-transformed x-axis and a Weibull-transformed y-axis (for lack of a better description). The two-parameter Weibull distribution can thus be represented as a straight line on this plot.
The logarithmic transformation of the x-axis is as simple as adding the log="x" parameter to plot() or curve(). How can I supply the y-axis transformation in an elegant way, so that all graphics-related plotting will work on my axis-transformed plot? To demonstrate what I need, run the following example code:
## initialisation ##
beta <- 2;eta <- 1000
ticks <- c(seq(0.01,0.09,0.01),(1:9)/10,seq(0.91,0.99,0.01))
F0inv <- function (p) log(qweibull(p, 1, 1))
# this is the transformation function
F0 <- function (q) 1 - exp(-exp(q))
# this is the inverse of the transformation function
weibull <- function(x)pweibull(x,beta,eta)
# the curve of this function represents the weibull distribution
# as a straight line on weibull paper
weibull2 <- function(x)F0inv(weibull(x))
First an example of a Weibull distribution with beta=2 and eta=1000 on a regular, untransformed plot:
## untransformed axes ##
curve(weibull ,xlim=c(100,1e4),ylim=c(0.01,0.99))
abline(h=ticks,col="lightgray")
This plot is useless for Weibull analysis. Here is my currently implemented solution that transforms the data with function F0inv() and modifies the y-axis of the plot. Notice that I have to use F0inv() on all y-axis related data.
## transformed axis with F0inv() ##
curve(weibull2,xlim=c(100,1e4),ylim=F0inv(c(0.01,0.99)),log="x",axes=F)
axis(1);axis(2,at=F0inv(ticks),labels=ticks)
abline(h=F0inv(ticks),col="lightgray")
This works, but this is not very user-friendly: when the user wants to add annotations, one must always use F0inv():
text(300,F0inv(0.4),"at 40%")
I found that you can achieve a solution to my problem using ggplot2 and scales, but I don't want to change to a graphics package unless absolutely necessary since a lot of other code needs to be rewritten.
## with ggplot2 and scales ##
library(ggplot2)
library(scales)
weibull_trans <- function()trans_new("weibull", F0inv, F0)
qplot(c(100,1e4),xlim=c(100,1e4),ylim=c(0.01,0.99),
stat="function",geom="line",fun=weibull) +
coord_trans(x="log10",y = "weibull")
I think that if I could dynamically replace the code for applying the logarithmic transformation with my own, my problem would be solved.
I tried to find more information by Googling "R axis transformation", "R user coordinates", "R axis scaling" without useful results. Almost everything I have found dealt with logarithmic scales.
I tried to look into plot() at how the log="x" parameter works, but the relevant code for plot.window is written in C – not my strongest point at all.
This doesn't appear to be possible in base graphics, but you can wrap the transformed plot in a function so that it can be called more simply:
F0inv <- function (p) log(qweibull(p, 1, 1))
## this is the transformation function
F0 <- function (q) 1 - exp(-exp(q))
weibullplot <- function(eta, beta,
ticks=c(seq(0.01,0.09,0.01),(1:9)/10,seq(0.91,0.99,0.01)),
...) {
## the curve of this function represents the weibull distribution
## as a straight line on weibull paper
weibull2 <- function(x)
F0inv(pweibull(x, beta, eta))
curve(weibull2, xlim=c(100, 1e4), ylim=F0inv(c(0.01, 0.99)), log="x", axes=FALSE)
axis(1);
axis(2, at=F0inv(ticks), labels=ticks)
abline(h=F0inv(ticks),col="lightgray")
}
weibullplot(eta=1000, beta=2)
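The annotation problem from the question can be eased in the same spirit with thin wrappers that accept probabilities and apply F0inv() internally. The names wb_text, wb_points and wb_abline are hypothetical, not part of any package, and the tick set is shortened for the example. F0 is written here as 1 - exp(-exp(q)), which is the algebraic inverse of F0inv.

```r
F0inv <- function(p) log(qweibull(p, 1, 1))  # probability -> Weibull-paper y
F0    <- function(q) 1 - exp(-exp(q))        # Weibull-paper y -> probability

# Hypothetical wrappers: callers pass probabilities, never F0inv() directly
wb_text   <- function(x, p, ...) text(x, F0inv(p), ...)
wb_points <- function(x, p, ...) points(x, F0inv(p), ...)
wb_abline <- function(p, ...) abline(h = F0inv(p), ...)

beta <- 2; eta <- 1000
ticks <- c(0.01, 0.1, 0.5, 0.9, 0.99)

curve(F0inv(pweibull(x, beta, eta)), xlim = c(100, 1e4),
      ylim = F0inv(c(0.01, 0.99)), log = "x", axes = FALSE,
      xlab = "x", ylab = "probability")
axis(1); axis(2, at = F0inv(ticks), labels = ticks)

wb_abline(ticks, col = "lightgray")
wb_text(300, 0.4, "at 40%")  # no explicit F0inv() needed
```

This does not change how base graphics handles coordinates; it only hides the transformation behind a small, consistent set of helpers.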

qqline() equivalent for a normal probability plot of edf

I made a plot of an empirical distribution function (EDF) using plot.ecdf(x, ...).
To visualize normality, I'm looking for an R equivalent of qqline to draw a simple diagonal line in my plot.
The normplot() function in MATLAB is doing the same thing (See the red line in the plot on this link: http://www.mathworks.de/de/help/stats/normplot.html). Thanks.
As mentioned in the comments, just call qqline():
x <- ecdf(rnorm(10))
plot.ecdf(x)
qqline(x)
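Note that a plain ECDF plot has data values on the x-axis and cumulative probabilities on the y-axis, so a normal reference is a curve there rather than a straight line. As an alternative sketch (not the approach suggested in the comments), the normal CDF with the sample mean and standard deviation can be overlaid directly; obs, m and s are illustrative names.

```r
set.seed(1)
obs <- rnorm(50)
Fn <- ecdf(obs)

m <- mean(obs)
s <- sd(obs)

plot(Fn, main = "EDF with fitted normal CDF")
# Reference curve: normal CDF using the sample mean and sd
curve(pnorm(x, mean = m, sd = s), add = TRUE, col = "red")
```

If the sample is close to normal, the step function should track the red curve closely over its whole range.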
