I am comparing two graphs with a non-parametric lo(w)ess curve superimposed in each case. The problem is that the curves look very different, despite the fact that their arguments, such as span, are identical.
library(ggplot2)
y<-rnorm(100)
x<-rgamma(100,2,2)
qplot(x,y)+stat_smooth(span=2/3,se=FALSE)+theme_bw()
plot(x,y)
lines(lowess(y~x))
There seems to be a lot more curvature in the graph generated by qplot(). As you know, detecting curvature is very important in regression diagnostics, and I fear that if I use ggplot2, I will reach erroneous conclusions.
Could you please tell me how I could produce the same curve in ggplot2?
Thank you
Or, you can use loess(..., degree=1). This produces a very similar, but not quite identical, result to lowess(...):
set.seed(1) # for reproducibility
y<-rnorm(100)
x<-rgamma(100,2,2)
plot(x,y)
points(x,loess(y~x,data.frame(x,y),degree=1)$fitted,pch=20,col="red")
lines(lowess(y~x))
With ggplot:
qplot(x,y)+stat_smooth(method="loess",se=FALSE,method.args=list(degree=1))+
theme_bw()+
geom_point(data=as.data.frame(lowess(y~x)),aes(x,y),col="red")
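The remaining gap between the two curves plausibly comes from the defaults: loess() fits local quadratics (degree = 2) with span = 0.75 and no robustness iterations, while lowess() fits local lines with f = 2/3 and 3 robustness iterations. Here is a minimal sketch, using base R only, that measures how far each loess() fit is from the lowess() curve. The choice of family = "symmetric" to mimic the robustness step is an assumption on my part, and the two algorithms still differ in details, so do not expect an exact match.

```r
set.seed(1)
y <- rnorm(100)
x <- rgamma(100, 2, 2)

# lowess(): local linear fit, f = 2/3, 3 robustness iterations by default
lw <- lowess(x, y, f = 2/3)

# loess() with its defaults: degree = 2, span = 0.75, family = "gaussian"
fit_default <- loess(y ~ x)

# Settings chosen to resemble lowess(): local linear, same span, robust fit
fit_close <- loess(y ~ x, degree = 1, span = 2/3, family = "symmetric")

# Evaluate both loess fits at the sorted x values returned by lowess()
grid <- data.frame(x = lw$x)
diff_default <- max(abs(predict(fit_default, grid) - lw$y))
diff_close <- max(abs(predict(fit_close, grid) - lw$y))

# Largest absolute gap between each loess fit and the lowess curve
print(c(default = diff_default, close = diff_close))
```

The same settings can be passed to ggplot2 with stat_smooth(method = "loess", span = 2/3, se = FALSE, method.args = list(degree = 1, family = "symmetric")).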
Here is a new stat function for use with ggplot2 that uses lowess(): https://github.com/harrelfe/Hmisc/blob/master/R/stat-plsmo.r. You need to load the proto package for this to work. I like using lowess because it is fast for any sample size and allows outlier detection to be turned off for binary Y. But it doesn't provide confidence bands.
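If you would rather not depend on a custom stat, one possible workaround is to run lowess() yourself and hand the result to geom_line(). The sketch below does this per group so that it also works with facets. The data frame d and the grouping column g are made up for illustration.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(
  x = rgamma(200, 2, 2),
  y = rnorm(200),
  g = rep(c("a", "b"), each = 100)  # hypothetical grouping variable
)

# Run lowess() within each group and stack the results into one data frame
lw <- do.call(rbind, lapply(split(d, d$g), function(s) {
  fit <- lowess(s$x, s$y, f = 2/3)
  data.frame(x = fit$x, y = fit$y, g = s$g[1])
}))

p <- ggplot(d, aes(x, y)) +
  geom_point() +
  geom_line(data = lw, colour = "red") +  # lowess curve, one per facet
  facet_wrap(~g) +
  theme_bw()
print(p)
```

As with lowess() itself, this gives no confidence bands.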
Related
A Q-Q plot is a useful graphical device for checking, for example, the normality of residuals. It is constructed by putting theoretical quantiles on the x-axis and observed quantiles on the y-axis. In ggplot, this can be easily done using geom_qq and stat_qq. I would like to produce a wormplot, which is like a Q-Q plot, but whose y-axis shows the difference between theoretical and observed quantiles (see the figure).
Is there a way to do this in ggplot? For example, is there a simple way to change the y-axis of the geom_qq to show the difference between theoretical and observed quantiles? I know it should be possible to calculate observed quantiles manually, but this would not work well if I would like to create plots of multiple groups or using facets, since then I would also need to calculate the observed quantiles manually for each group separately.
The blog post mentioned in the comments contains a guide to coding your own stat functions, so you can create such plots yourself.
Otherwise, the qqplotr package (https://aloy.github.io/qqplotr/index.html) has a detrend=TRUE option, which produces wormplots with accompanying confidence bands.
If you want lines rather than a band, just set fill=NA, color='black', size=0.5.
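A minimal sketch of the qqplotr route, assuming the package is installed and that its stat_qq_band(), stat_qq_line() and stat_qq_point() layers accept detrend = TRUE as described in its documentation. The data frame d, its residual column res and the grouping column g are invented for the example.

```r
library(ggplot2)
library(qqplotr)

set.seed(1)
d <- data.frame(
  res = rnorm(200),                 # hypothetical residuals
  g = rep(c("a", "b"), each = 100)  # hypothetical grouping variable
)

# detrend = TRUE plots deviations from the reference line instead of raw quantiles
p <- ggplot(d, aes(sample = res)) +
  stat_qq_band(detrend = TRUE) +
  stat_qq_line(detrend = TRUE) +
  stat_qq_point(detrend = TRUE) +
  facet_wrap(~g) +
  labs(x = "Theoretical quantiles", y = "Deviation") +
  theme_bw()
print(p)
```

Because the quantiles are computed inside the stat, faceting or grouping needs no manual per-group calculation.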
I've computed and plotted Gaussian kernel density estimates using the KernSmooth package as follows:
x <- MyData$MyNumericVector
h <- dpik(x)
est <- bkde(x, bandwidth=h)
plot(est, type='l')
This is the method described in KernSmooth's documentation. Note that dpik() finds the optimal bandwidth and bkde() uses this bandwidth to fit the kernel density estimate. It's important that I use this method instead of the basic density() function.
How do I layer these plots on top of one another?
I cannot use the basic density() function that geom_density() from ggplot2 relies upon, as bandwidths and kernel density estimates are best optimized using the KernSmooth package (see Deng & Wickham, 2011 here: http://vita.had.co.nz/papers/density-estimation.pdf). Since Wickham wrote ggplot2 and the above review of kernel density estimation packages, it would make sense that there's a way to use ggplot2 to layer densities that aren't reliant on the basic density() function, but I'm not sure.
Can I use ggplot2 for this even if I don't wish to use the basic density() function? What about lattice?
You could do it with geom_line:
library(ggplot2); library(KernSmooth)
est <- as.data.frame(bkde(movies$votes)) # movies now lives in the ggplot2movies package
m <- ggplot(est, aes(x, y)) + geom_line()
print(m)
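To layer several densities, one approach is to compute each estimate with dpik() and bkde(), stack the results into a single data frame, and let ggplot2 colour the lines by group. This is only a sketch: the samples list and its two groups are made up for illustration.

```r
library(KernSmooth)
library(ggplot2)

set.seed(1)
# Hypothetical data: one numeric vector per group
samples <- list(a = rnorm(200), b = rnorm(200, mean = 1, sd = 1.5))

# For each group: plug-in bandwidth, then binned kernel density estimate
est <- do.call(rbind, lapply(names(samples), function(nm) {
  x <- samples[[nm]]
  h <- dpik(x)
  fit <- bkde(x, bandwidth = h)
  data.frame(x = fit$x, y = fit$y, group = nm)
}))

# One line per group, drawn on the same axes
p <- ggplot(est, aes(x, y, colour = group)) +
  geom_line() +
  theme_bw()
print(p)
```

Each group gets its own bandwidth here. If you want a common bandwidth instead, compute h once and reuse it.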
If you were doing it with lattice::densityplot, you could probably add some of the values to the darg list:
darg
list of arguments to be passed to the density function. Typically, this should be a list with zero or more of the following components : bw, adjust, kernel, window, width, give.Rkern, n, from, to, cut, na.rm (see density for details)
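As a rough sketch of that idea, the bandwidth chosen by dpik() can be handed to density() through darg. Note that this still uses density() rather than bkde(), so it only borrows the bandwidth. Treating the dpik() value as a drop-in for bw with a Gaussian kernel is an assumption here.

```r
library(lattice)
library(KernSmooth)

set.seed(1)
x <- rnorm(200)  # hypothetical data
h <- dpik(x)     # plug-in bandwidth from KernSmooth

# darg is forwarded to density(); bw and kernel are regular density() arguments
p <- densityplot(~ x,
  darg = list(bw = h, kernel = "gaussian"),
  plot.points = FALSE)
print(p)
```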
I can create a lattice qq-plot with:
qqnorm(surfchem.cast$Con)
but I have not learned how to add a panel.abline or prepanel.qqmathline().
I've looked in the lattice graphics book and searched the web without finding the correct syntax. A pointer to how to add this line representing the linear relationship between theoretical and data quantiles will be greatly appreciated. I also do not find a question here where the answer is for a qq plot rather than an xyplot.
The convention with Q-Q plots is to plot the line that goes through the first and third quartiles of the sample and the test distribution, not the line of best fit.
set.seed(1)
Z <- rnorm(100)
qqnorm(Z)
qqline(Z,probs=c(0.25,0.75))
The reason for this is that, if your sample is not normally distributed, the deviations tend to be at the extremes.
set.seed(1)
Z <- runif(100) # NOTE: uniform distribution...
qqnorm(Z)
qqline(Z, probs=c(0.25,0.75))
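For reference, the line can be reproduced by hand, which makes it clear that it is not a regression fit. This is a minimal sketch that assumes the default quantile type and a standard normal reference distribution.

```r
set.seed(1)
Z <- runif(100)
probs <- c(0.25, 0.75)

# Sample and theoretical (standard normal) quantiles at the two probabilities
sample_q <- quantile(Z, probs, names = FALSE)
theory_q <- qnorm(probs)

# Line through the two (theoretical, sample) points
slope <- diff(sample_q) / diff(theory_q)
intercept <- sample_q[1] - slope * theory_q[1]

qqnorm(Z)
abline(intercept, slope, col = "red")  # should coincide with qqline(Z)
```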
If you want the line connecting the corners, as in your comment, use different probabilities. The reason you need to use (0.01,0.99) rather than (0,1) is that the latter will produce infinities.
set.seed(1)
Z <- runif(100) # NOTE: uniform distribution...
qqnorm(Z)
qqline(Z, probs=c(0.01,0.99))
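Since the question asked about lattice, here is a sketch of the equivalent using qqmath() with prepanel.qqmathline and panel.qqmathline, following the pattern in the lattice documentation. The vector Z stands in for your own data.

```r
library(lattice)

set.seed(1)
Z <- rnorm(100)  # stand-in for the real data

p <- qqmath(~ Z,
  prepanel = prepanel.qqmathline,  # makes the axis limits account for the line
  panel = function(x, ...) {
    panel.qqmathline(x, ...)  # reference line through the quartiles
    panel.qqmath(x, ...)      # the points
  })
print(p)
```

With a data frame you would write the formula as ~ Con and pass data = surfchem.cast.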
I made a plot of an empirical distribution function (EDF) using plot.ecdf(x, ...).
In order to visualize normality, I'm looking for a qqline equivalent in R to draw a simple diagonal line in my plot.
The normplot() function in MATLAB is doing the same thing (See the red line in the plot on this link: http://www.mathworks.de/de/help/stats/normplot.html). Thanks.
As mentioned in the comments, just call qqline():
x <- ecdf(rnorm(10))
plot.ecdf(x)
qqline(x)
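Another option, if what you want is a normal reference on the ECDF scale, is to overlay the CDF of a normal distribution with the sample mean and standard deviation. This is an alternative sketch, not what normplot() does: it draws a curve on the probability scale rather than a straight line. The name obs is made up for the example.

```r
set.seed(1)
obs <- rnorm(50)  # hypothetical sample
Fn <- ecdf(obs)

plot(Fn, main = "ECDF with fitted normal CDF")
# curve() evaluates the expression over a grid of x values
curve(pnorm(x, mean = mean(obs), sd = sd(obs)), add = TRUE, col = "red")

# Largest gap between the ECDF and the fitted normal CDF at the data points
max_gap <- max(abs(Fn(obs) - pnorm(obs, mean = mean(obs), sd = sd(obs))))
print(max_gap)
```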
I have created a scatter plot with pandas, but I don't know how to add the regression line, that is, the least-squares fit to the points.
Looking for examples at http://matplotlib.org, I haven't found any similar graph.
Thanks a lot in advance!
Pandas has an ordinary least squares (ols) function. There is a very detailed example in the 0.10.1 docs of how to plot the result; here's a snippet:
model = ols(y=rets['AAPL'], x=rets.ix[:, ['GOOG']], window=250)
# just plot the coefficient for GOOG
model.beta['GOOG'].plot()
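Note: this example is no longer in the docs (since 0.10.1). The ols function was later removed from pandas in favor of statsmodels, so this snippet only runs on old pandas versions.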