Why did I get lines instead of the standard bubbles in my Q-Q plot?
My code:
data <- read.csv("C:\\Users\\anton\\SanFrancisco.csv")
x <- data$ï..San.Francisco
head(x)
library("fitdistrplus")
fitnor <- fitdist(x, "norm")
fitlogis <- fitdist(x, "logis")
qqcomp(list(fitnor, fitlogis), legendtext=c("Normal", "Logistic"))
From the documentation for qqcomp (see ?qqcomp):
qqcomp provides a plot of the quantiles of each theoretical
distribution (x-axis) against the empirical quantiles of the data
(y-axis), by default defining probability points as (1:n - 0.5)/n for
theoretical quantile calculation (data are assumed continuous). For
large dataset (n > 1e4), lines are drawn instead of points and
customized with the fitpch parameter.
This is a design feature. Your data must have more than 10,000 values. With that many observations, the individual points on the Q-Q plot would be difficult to distinguish, and they are large enough that the points for one model would cover those for the other.
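For illustration, here is a minimal sketch with simulated data (not your CSV) showing the threshold in action:
library(fitdistrplus)
set.seed(1)
small <- rnorm(5000)    # n <= 1e4: qqcomp draws points
large <- rnorm(20000)   # n  > 1e4: qqcomp switches to lines
qqcomp(list(fitdist(small, "norm"), fitdist(small, "logis")),
       legendtext = c("Normal", "Logistic"))
qqcomp(list(fitdist(large, "norm"), fitdist(large, "logis")),
       legendtext = c("Normal", "Logistic"))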
Related
I would like to plot a segment of an ROC curve over a specific range of x values, instead of plotting the entire curve. I don't want to change the range of the x axis itself. I just want to plot only part of the ROC curve, within a range of x values that I specify.
library(pROC)
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$wfns)
plot(rocobj)
That code plots the whole ROC curve. Let's say I just wanted to plot the curve from x=1 to x=.5. How could I do that? Thank you.
The default plot function for roc objects plots the rocobj$sensitivities as a function of rocobj$specificities.
So
plot(rocobj$specificities, rocobj$sensitivities, type = "l", xlim = c(1.5, -0.5))
abline(1,-1)
achieves the same as
plot(rocobj)
And
plot(rocobj$specificities[2:6], rocobj$sensitivities[2:6], type = "l", xlim = c(1.5, -0.5), ylim = c(0, 1))
abline(1,-1)
This gets close to what I think you are after (it plots from a specificity of 0.514 to 1.0). I don't know enough about the package to say whether the sensitivities can be calculated over a specific range, or at a specific resolution, of specificities.
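If you would rather select the segment by specificity value than by index, one hedged sketch (assuming, as is the case for roc objects, that the specificities and sensitivities vectors are aligned) is:
keep <- rocobj$specificities >= 0.5   # segment from specificity 1 down to 0.5
plot(rocobj$specificities[keep], rocobj$sensitivities[keep],
     type = "l", xlim = c(1.5, -0.5), ylim = c(0, 1),
     xlab = "Specificity", ylab = "Sensitivity")
abline(1, -1)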
The plot function of pROC uses the usual R semantics for plotting, so you can use the xlim argument as you would for any other plot:
plot(rocobj, xlim = c(1, .5))
I can create a lattice qq-plot with:
qqnorm(surfchem.cast$Con)
but I have not learned how to add a panel.abline or prepanel.qqmathline().
I've looked in the lattice graphics book and searched the web without finding the correct syntax. A pointer to how to add this line, representing the linear relationship between theoretical and data quantiles, would be greatly appreciated. I also could not find a question here whose answer deals with a qq plot rather than an xyplot.
The convention with Q-Q plots is to plot the line that goes through the first and third quartiles of the sample and the test distribution, not the line of best fit.
set.seed(1)
Z <- rnorm(100)
qqnorm(Z)
qqline(Z,probs=c(0.25,0.75))
The reason for this is that, if your sample is not normally distributed, the deviations tend to be at the extremes.
set.seed(1)
Z <- runif(100) # NOTE: uniform distribution...
qqnorm(Z)
qqline(Z, probs=c(0.25,0.75))
If you want the line connecting the corners, as in your comment, use different probabilities. The reason you need to use (0.01,0.99) rather than (0,1) is that the latter will produce infinities.
set.seed(1)
Z <- runif(100) # NOTE: uniform distribution...
qqnorm(Z)
qqline(Z, probs=c(0.01,0.99))
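Since the question mentions lattice, here is a hedged sketch of the equivalent qqmath() call, following the pattern from ?qqmath and assuming the data is in surfchem.cast$Con as in the question:
library(lattice)
qqmath(~ Con, data = surfchem.cast,
       prepanel = prepanel.qqmathline,
       panel = function(x, ...) {
         panel.qqmathline(x, probs = c(0.25, 0.75), ...)  # line through the quartiles
         panel.qqmath(x, ...)
       })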
So I have a barplot in which the y-axis is the log of the frequencies. From just eyeing it, it appears that the bars decrease exponentially, but I would like to know this for sure. What I want to do is also plot an exponential on this same graph. Then, if my bars fall below the exponential, I would know that they decrease exponentially or faster, and if the bars lie above the exponential, I would know that they don't decrease exponentially. How do I plot an exponential on a bar graph?
Here is my graph if that helps:
If you're trying to fit the density of an exponential distribution, you should probably plot a density histogram (not a frequency histogram). See this question on how to plot distributions in R.
This is how I would do it.
x.gen <- rexp(1000, rate = 3)
hist(x.gen, prob = TRUE)
library(MASS)
x.est <- fitdistr(x.gen, "exponential")$estimate
curve(dexp(x, rate = x.est), add = TRUE, col = "red", lwd = 2)
One way of visually inspecting whether two distributions are the same is with a quantile-quantile plot, or Q-Q plot for short. Typically this is done when checking whether a distribution follows a standard normal.
The basic idea is to plot your data against some theoretical quantiles; if the data match that distribution, you will see a straight line. For example:
x <- qnorm(seq(0,1,l=1002)) # Theoretical normal quantiles
x <- x[-c(1, length(x))] # Drop ends because they are -Inf and Inf
y <- rnorm(1000) # Actual data. 1000 points drawn from a normal distribution
l.1 <- lm(sort(y)~sort(x))
qqplot(x, y, xlab="Theoretical Quantiles", ylab="Actual Quantiles")
abline(coef(l.1)[1], coef(l.1)[2])
Under perfect conditions you should see a straight line when plotting the theoretical quantiles against your data. So you can do the same, plotting your data against the quantiles of the exponential distribution you think it follows.
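For example, a hedged sketch against exponential quantiles (here freqs is just a stand-in for your observed values):
set.seed(1)
freqs <- rexp(500, rate = 2)                  # stand-in for your data
qqplot(qexp(ppoints(length(freqs))), freqs,
       xlab = "Theoretical exponential quantiles",
       ylab = "Observed quantiles")
qqline(freqs, distribution = qexp, probs = c(0.25, 0.75))  # reference line through the quartiles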
I am building an R package to display Weibull plots (using graphics::plot) in R. The plot has a log-transformed x-axis and a Weibull-transformed y-axis (for lack of a better description). The two-parameter Weibull distribution can thus be represented as a straight line on this plot.
The logarithmic transformation of the x-axis is as simple as adding the log="x" parameter to plot() or curve(). How can I supply the y-axis transformation in an elegant way, so that all graphics-related plotting will work on my axis-transformed plot? To demonstrate what I need, run the following example code:
## initialisation ##
beta <- 2;eta <- 1000
ticks <- c(seq(0.01,0.09,0.01),(1:9)/10,seq(0.91,0.99,0.01))
F0inv <- function (p) log(qweibull(p, 1, 1))
# this is the transformation function
F0 <- function (q) 1 - exp(-exp(q))
# this is the inverse of the transformation function
weibull <- function(x)pweibull(x,beta,eta)
# the curve of this function represents the weibull distribution
# as a straight line on weibull paper
weibull2 <- function(x)F0inv(weibull(x))
First an example of a Weibull distribution with beta=2 and eta=1000 on a regular, untransformed plot:
## untransformed axes ##
curve(weibull ,xlim=c(100,1e4),ylim=c(0.01,0.99))
abline(h=ticks,col="lightgray")
This plot is useless for Weibull analysis. Here is my currently implemented solution that transforms the data with function F0inv() and modifies the y-axis of the plot. Notice that I have to use F0inv() on all y-axis related data.
## transformed axis with F0inv() ##
curve(weibull2, xlim = c(100, 1e4), ylim = F0inv(c(0.01, 0.99)), log = "x", axes = FALSE)
axis(1); axis(2, at = F0inv(ticks), labels = ticks)
abline(h = F0inv(ticks), col = "lightgray")
This works, but it is not very user-friendly: whenever the user wants to add annotations, they must wrap the y-values in F0inv():
text(300,F0inv(0.4),"at 40%")
I found that a solution to my problem can be achieved using ggplot2 and scales, but I don't want to switch to a different graphics package unless absolutely necessary, since a lot of other code would need to be rewritten.
## with ggplot2 and scales ##
library(ggplot2)
library(scales)
weibull_trans <- function() trans_new("weibull", F0inv, F0)
qplot(c(100, 1e4), xlim = c(100, 1e4), ylim = c(0.01, 0.99),
      stat = "function", geom = "line", fun = weibull) +
  coord_trans(x = "log10", y = "weibull")
I think that if I could dynamically replace the code for applying the logarithmic transformation with my own, my problem would be solved.
I tried to find more information by googling "R axis transformation", "R user coordinates", and "R axis scaling", without useful results. Almost everything I found dealt with logarithmic scales.
I tried to look into plot() at how the log="x" parameter works, but the relevant code for plot.window is written in C – not my strongest point at all.
While it doesn't appear to be possible to add a custom y-axis transformation in base graphics directly, you can wrap the steps in a function so that the plot can be produced with a single call:
F0inv <- function (p) log(qweibull(p, 1, 1))
## this is the transformation function
F0 <- function (q) 1 - exp(-exp(q))
## the inverse of the transformation function (not used by weibullplot below)
weibullplot <- function(eta, beta,
                        ticks = c(seq(0.01, 0.09, 0.01), (1:9)/10, seq(0.91, 0.99, 0.01)),
                        ...) {
  ## the curve of this function represents the Weibull distribution
  ## as a straight line on Weibull paper
  weibull2 <- function(x) F0inv(pweibull(x, beta, eta))
  curve(weibull2, xlim = c(100, 1e4), ylim = F0inv(c(0.01, 0.99)), log = "x", axes = FALSE)
  axis(1)
  axis(2, at = F0inv(ticks), labels = ticks)
  abline(h = F0inv(ticks), col = "lightgray")
}
weibullplot(eta=1000, beta=2)
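To address the annotation issue from the question, a small helper (the name wtext is made up here) lets you annotate on the probability scale without calling F0inv() by hand:
wtext <- function(x, p, labels, ...) text(x, F0inv(p), labels, ...)
weibullplot(eta = 1000, beta = 2)
wtext(300, 0.4, "at 40%")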
I have two density curves plotted using this:
Network <- Mydf$Networks
quartiles <- quantile(Mydf$Avg.Position, probs=c(25,50,75)/100)
density <- ggplot(Mydf, aes(x = Avg.Position, fill = Network))
d <- density + geom_density(alpha = 0.2) + xlim(1, 11) + ggtitle("September 2010") + geom_vline(xintercept = quartiles, colour = "red")
print(d)
I'd like to compute the area under each curve for a given Avg.Position range. Sort of like pnorm for the normal curve. Any ideas?
Calculate the density separately and plot that to start with. Then you can use basic arithmetic to get the estimate. The integral is approximated by adding together the areas of a set of small rectangles: the width is the difference between two consecutive x-values, and the height is the mean of the y-values at the beginning and end of each interval (i.e. the trapezoidal rule). I use the rollmean function from the zoo package, but this can be done with the base package too.
require(zoo)
X <- rnorm(100)
# calculate the density and check the plot
Y <- density(X) # see ?density for parameters
plot(Y$x,Y$y, type="l") #can use ggplot for this too
# set an Avg.position value
Avg.pos <- 1
# construct lengths and heights
xt <- diff(Y$x[Y$x<Avg.pos])
yt <- rollmean(Y$y[Y$x<Avg.pos],2)
# This gives you the area
sum(xt*yt)
This gives you a good approximation to about 3 digits after the decimal point. If you know the density function, take a look at ?integrate.
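For example, if the density were known to be standard normal, integrate() gives the area directly (a small illustration, not tied to your data):
integrate(dnorm, lower = -Inf, upper = 1)   # P(X <= 1) for a standard normal, ~0.841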
Three possibilities:
The logspline package provides a different method of estimating density curves, but it does include pnorm style functions for the result.
You could also approximate the area by feeding the x and y values returned by the density function to approxfun and passing the result to integrate (see the sketch after this list). Unless you are interested in precise estimates of small tail areas (or very small intervals), this will probably give a reasonable approximation.
Density estimates are just sums of kernels centered at the data points; one common kernel is the normal density. You could average the areas from pnorm (or another kernel's CDF), with the sd set to the bandwidth and the mean set to each data point, as in the sketch below.
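A hedged sketch of the second and third options, using simulated data in place of Mydf$Avg.Position:
set.seed(1)
X <- rnorm(100)
d <- density(X)          # default Gaussian kernel
# Option 2: turn the density estimate into a function and integrate it
f <- approxfun(d$x, d$y, yleft = 0, yright = 0)
integrate(f, lower = -1, upper = 1)          # area between -1 and 1
# Option 3: average the areas of the Gaussian kernels directly
mean(pnorm(1, mean = X, sd = d$bw) - pnorm(-1, mean = X, sd = d$bw))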