superimposing two probability plots with probplot - r

I can create a lognormal probability plot using the probplot() function from the e1071 package. A problem arises when I try to add a second set of lognormal data to the first plot. Although I call par(new=T), the x-axes of the two plots differ and do not align.
Is there another way to go about this?
I tried using the points() function. However, it appears I need the x and y coordinates to plot it and I don't know how to extract the x, y coordinates from the probplot() function.
# Program to plot random logn failure times with probability plot
library(e1071)
logn_prob_plot <- function() {
set.seed(1)
x<-rlnorm(10,1,1)
par(bty="l")
par(col.lab="white")
p<-probplot(x,qdist=qlnorm)
par(col.lab="black")
mtext(text="failure time", col="black",side=1,line=3,outer=F)
mtext(text="lognormal probability", col="black",side=2,line=3,outer=F)
set.seed(2)
y=rlnorm(10,2,3)
par(new=T)
par(col.lab="white")
probplot(y,qdist=qlnorm,xlab="fail time",ylab="lognormal probability")
par(col.lab="black")
mtext(text="failure time", col="black",side=1,line=3,outer=F)
mtext(text="lognormal probability", col="black",side=2,line=3,outer=F)
}
logn_prob_plot()
My expected result is two groups of data on the same probability plot with the same x and y axes. Instead, I get two different x-axes that are not aligned.

First, let's simulate the variables:
set.seed(1)
x<-rlnorm(10,1,1)
set.seed(2)
y=rlnorm(10,2,3)
The first probplot is:
p<-probplot(x,qdist=qlnorm, meanlog = 1, sdlog = 1)
which produces the output:
The second probplot is:
q <- probplot(y,qdist=qlnorm,meanlog = 2, sdlog = 3)
which produces the output:
Your best shot at merging them is to use the scale of the smaller one and discard some points:
p<-probplot(x,qdist=qlnorm, meanlog = 1, sdlog = 1)
points(sort(x), p[[1]](ppoints(length(x))), col = "red", pch = 19)
lines(q, col = "blue")
points(sort(y), q[[1]](ppoints(length(y))), col = "blue", pch = 19)
which gives:
The red line and points are from the distribution with meanlog = 1, sdlog = 1, and the blue ones are from the one with meanlog = 2, sdlog = 3.
I further have to warn you that from reading the code of the probplot() function:
xl <- quantile(x, c(0.25, 0.75))
yl <- qdist(c(0.25, 0.75), ...)
slope <- diff(yl)/diff(xl)
the slope of the line is determined only by the positions of the first and third quartiles, and not by what happens elsewhere.
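To address the original request for x, y coordinates directly, the plotting positions can also be computed by hand in base R. The following is a minimal sketch, not the e1071 implementation: pp_coords is a hypothetical helper, and it assumes both samples are compared against the same reference quantile function (here qlnorm with its default parameters, as in the question), so that they share one probability axis.

```r
# Simulate the two samples from the question
set.seed(1)
x <- rlnorm(10, 1, 1)
set.seed(2)
y <- rlnorm(10, 2, 3)

# Hypothetical helper: sorted data on x, reference quantiles at ppoints() on y
pp_coords <- function(v, qdist = qlnorm, ...) {
  v <- sort(v)
  data.frame(x = v, y = qdist(ppoints(length(v)), ...))
}
cx <- pp_coords(x)
cy <- pp_coords(y)

# Probability levels used to label the y-axis in percent
probs <- c(0.01, 0.05, seq(0.1, 0.9, by = 0.1), 0.95, 0.99)

# One plot whose x-limits cover both samples
plot(cx$x, cx$y, col = "red", pch = 19, yaxt = "n",
     xlim = range(cx$x, cy$x), ylim = range(qlnorm(probs)),
     xlab = "failure time", ylab = "lognormal probability in %")
axis(2, at = qlnorm(probs), labels = 100 * probs)
points(cy$x, cy$y, col = "blue", pch = 19)
```

Because xlim is computed from both samples, no points are discarded. If one sample spans a much wider range than the other, adding log = "x" to the plot() call may make the smaller group easier to read.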

Related

How to plot a function with plot() in R with a step size of 0.5?

I'm new to R. I'm given a function f(x)=x^3-3x+7. I need to plot this function in red and plot its derivative, evaluated with a step size of 0.5, in blue (dfx vs. x) on the same graph. I have plotted f(x), but I can't plot dfx with the adjusted step size.
My code:
f1 <- function(x) {x^3-3*x+7}
exp = D(expression(x^3-3*x+7),'x')
f2 <- function(x) {D(exp,'x')}
curve(f1,from=-2,to=2,col='red')
curve(exp,col='blue',add=TRUE,type='p')
I need the points to be plotted at intervals of 0.5, with a line drawn to connect them.
I'm not 100% sure what you need. If by changing the step size you mean that both curves should be entirely visible in the same plot, that can be done by increasing the limits on the y-axis:
curve(f1, from = -2, to = 2, col='red', add = F, ylim = c(0,10))
curve(exp, col = 'blue', add = T, type = 'p')
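If the goal is instead to show the derivative at x values spaced 0.5 apart and joined by a line, one possible sketch follows. It assumes the derivative should be evaluated on the grid -2, -1.5, ..., 2; the names dexpr, xs, and dfx are introduced here for illustration. Note that in the snippets above, curve(exp, ...) appears to resolve to R's built-in exp() function rather than to the derivative expression stored in the variable exp, which is why a different name is used here.

```r
f1 <- function(x) x^3 - 3 * x + 7

# Symbolic derivative of f1 with respect to x (an unevaluated expression)
dexpr <- D(expression(x^3 - 3 * x + 7), "x")

# Wrap the expression in a function so it can be evaluated at any x
f2 <- function(x) eval(dexpr, list(x = x))

# Grid with a step size of 0.5
xs <- seq(-2, 2, by = 0.5)
dfx <- f2(xs)

# f(x) in red; y-limits chosen so both curves fit
curve(f1, from = -2, to = 2, col = "red",
      ylim = range(f1(xs), dfx), ylab = "y")

# Derivative in blue: points at each grid value, joined by line segments
lines(xs, dfx, type = "b", col = "blue")
```

Using type = "b" draws both the points and the connecting segments in a single call.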

How to plot theoretical Pareto distribution in R?

I need to plot theoretical Pareto distribution in R.
I want this as a line - not points and not polylines.
My distribution function is 1−(1/x)^2.
I plotted empirical distribution of my sample and also theoretical distribution at one graph:
ecdf(b2)
plot(ecdf(b2))
lines(x, (1-(1/x)^2), col = "red", lwd = 2, xlab = "", ylab = "")
But I got:
You can see that the red line is not smooth; it looks like a polyline. Is it possible to get a continuous red line?
Do you have any advice?
Use curve() instead.
library(EnvStats)
set.seed(8675309)
# You did not supply the contents of b2 so I generated some
b2 <- rpareto(100, 1, 2)
plot(ecdf(b2))
ppareto <- function(x) 1 - (1/x)^2
curve(ppareto, col = "red", add = TRUE)
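For reference, a jagged line from lines() usually means that the x values passed to it are sparse or unsorted, since lines() simply joins the points in the order given. Here is a sketch without EnvStats, assuming the sample follows F(x) = 1 - (1/x)^2 for x >= 1. The data are simulated by inverse-transform sampling because b2 was not supplied, and grid is an illustrative name.

```r
set.seed(1)

# Inverse-transform sampling: if U is uniform on (0, 1), then 1/sqrt(U)
# has distribution function F(x) = 1 - (1/x)^2 for x >= 1
b2 <- 1 / sqrt(runif(100))

# Dense, sorted grid over the range of the data
grid <- seq(1, max(b2), length.out = 500)
theo <- 1 - (1 / grid)^2

plot(ecdf(b2))
lines(grid, theo, col = "red", lwd = 2)
```

With 500 closely spaced grid points the segments are short enough that the red curve looks smooth.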

What kind of breaks do I use when I want to make a map with data that has a lognormal distribution?

I have a number of positions with assigned values, where a lot of the numbers are near 0 and very few are high, similar to a lognormal distribution. I want to plot them in a heatmap, but I'm not sure how to set the breaks. I'm making the map in mapplots, using RColorBrewer. Here is an example of how I thought it could be done, but is this sound?
library(RColorBrewer)
library(mapplots)
set.seed(10)
d <- data.frame(x=rlnorm(1000,meanlog = -0.75, sdlog = 1.5), Lon=runif(1000), Lat=runif(1000))
byx = 0.05
byy = 0.05
xlim<-c(0,1)
ylim<-c(0,1)
intervals <- (c(0,1,2,5,10,20,50,100,200,500,1000,2000,5000, 10000, 20000)/100)
grd <- make.grid(d$Lon,d$Lat,d$x,byx, byy, xlim, ylim)
maxvalue <- max(unlist(grd), na.rm = T)
breaks <- unlist(lapply(maxvalue,function(x) c(intervals[intervals< x], x)))
col <- colorRampPalette(brewer.pal(9, "YlOrRd"))(length(breaks)-1)
basemap(xlim=xlim, ylim=ylim, main = "", bg="grey")
draw.grid(grd, breaks=breaks,col=col)
legend.grid("topright", xpd=F, breaks=breaks, type=2, col=col, bg="white")
You get a map like this:
The distribution of points in the map is like this:
hist(unlist(grd), 50)

Why doesn't my GAM fit seem to have a correct intercept? [R]

My GAM curves are being shifted downwards. Is there something wrong with the intercept? I'm using the same code as Introduction to statistical learning... Any help's appreciated..
Here's the code. I simulated some data (a straight line with noise), and fit GAM multiple times using bootstrap.
(It took me a while to figure out how to plot multiple GAM fits in one graph. Thanks to this post Sam's answer, and this post)
library(gam)
N = 1e2
set.seed(123)
dat = data.frame(x = 1:N,
y = seq(0, 5, length = N) + rnorm(N, mean = 0, sd = 2))
plot(dat$x, dat$y, xlim = c(1,100), ylim = c(-5,10))
gamFit = vector('list', 5)
for (ii in 1:5){
ind = sample(1:N, N, replace = T) #bootstrap
gamFit[[ii]] = gam(y ~ s(x, 10), data = dat, subset = ind)
par(new=T)
plot(gamFit[[ii]], col = 'blue',
xlim = c(1,100), ylim = c(-5,10),
axes = F, xlab='', ylab='')
}
The issue is with plot.gam. If you take a look at the help page (?plot.gam), there is a parameter called scale, which states:
a lower limit for the number of units covered by the limits on the ‘y’ for each plot. The default is scale=0, in which case each plot uses the range of the functions being plotted to create their ylim. By setting scale to be the maximum value of diff(ylim) for all the plots, then all subsequent plots will produced in the same vertical units. This is essential for comparing the importance of fitted terms in additive models.
This is an issue, since you are not using the range of the function being plotted (i.e. the range of y is not -5 to 10). So what you need to do is change
plot(gamFit[[ii]], col = 'blue',
xlim = c(1,100), ylim = c(-5,10),
axes = F, xlab='', ylab='')
to
plot(gamFit[[ii]], col = 'blue',
scale = 15,
axes = F, xlab='', ylab='')
And you get:
Or you can just remove the xlim and ylim parameters from both calls to plot, and the automatic setting of plot to use the full range of the data will make everything work.
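An alternative that avoids par(new=T) altogether is to draw fitted values on the data scale with predict() and lines(). As far as I understand, plot() on a gam fit draws the centered smooth term rather than the fitted values, which would account for the downward shift. The sketch below is a stand-in under stated assumptions: it uses lm() with a natural spline from the splines package, ns(x, df = 10), instead of gam(y ~ s(x, 10)), so that it runs with a default R installation. The same predict()/lines() pattern should carry over to a gam fit.

```r
library(splines)

N <- 100
set.seed(123)
dat <- data.frame(x = 1:N,
                  y = seq(0, 5, length.out = N) + rnorm(N, mean = 0, sd = 2))

# Scatter plot drawn once; every fit is added to these same axes
plot(dat$x, dat$y, xlim = c(1, 100), ylim = c(-5, 10))

newx <- data.frame(x = 1:N)   # grid on which each fit is evaluated
fits <- vector("list", 5)

for (ii in 1:5) {
  ind <- sample(N, N, replace = TRUE)              # bootstrap resample
  fit <- lm(y ~ ns(x, df = 10), data = dat[ind, ]) # stand-in for the gam fit
  fits[[ii]] <- predict(fit, newdata = newx)       # fitted values, intercept included
  lines(newx$x, fits[[ii]], col = "blue")
}
```

Because predict() returns values on the scale of y, the curves pass through the cloud of points without any manual shifting.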

How to plot a normal distribution by labeling specific parts of the x-axis?

I am using the following code to create a standard normal distribution in R:
x <- seq(-4, 4, length=200)
y <- dnorm(x, mean=0, sd=1)
plot(x, y, type="l", lwd=2)
I need the x-axis to be labeled at the mean and at points three standard deviations above and below the mean. How can I add these labels?
The easiest (but not general) way is to restrict the limits of the x axis. The +/- 1:3 sigma will be labeled as such, and the mean will be labeled as 0 - indicating 0 deviations from the mean.
plot(x,y, type = "l", lwd = 2, xlim = c(-3.5,3.5))
Another option is to use more specific labels:
plot(x,y, type = "l", lwd = 2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
Using the code in this answer, you could skip creating x and just use curve() on the dnorm function:
curve(dnorm, -3.5, 3.5, lwd=2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
But this doesn't use the given code anymore.
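The same axis() idea extends to a normal distribution that is not standard. This is a minimal sketch in which mu = 10 and sigma = 2 are hypothetical values chosen only for illustration; the tick positions are placed at the mean and at one, two, and three standard deviations on either side.

```r
mu <- 10      # hypothetical mean
sigma <- 2    # hypothetical standard deviation

x <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 200)
y <- dnorm(x, mean = mu, sd = sigma)

# Tick positions: mean and +/- 1, 2, 3 standard deviations
ticks <- mu + (-3:3) * sigma

# Suppress the default x-axis, then draw a custom one at the chosen positions
plot(x, y, type = "l", lwd = 2, xaxt = "n", xlab = "", ylab = "density")
axis(1, at = ticks, labels = ticks)
```

Replacing labels = ticks with a character vector such as c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s") gives symbolic labels instead of numeric ones.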
If you like doing things the hard way, without R's built-in density function, or you want to do this outside R, you can use the following formula.
x<-seq(-4,4,length=200)
s = 1
mu = 0
y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2))
plot(x,y, type="l", lwd=2, col = "blue", xlim = c(-3.5,3.5))
An extremely inefficient and unusual, but beautiful, solution based on the ideas of Monte Carlo simulation is this:
simulate many draws (or samples) from a given distribution (say the normal) using rnorm. The rnorm function takes arguments (A, B, C) and returns a vector of A samples from a normal distribution centered at B, with standard deviation C.
plot the density of these draws using density().
Thus, to take a sample of size 50,000 from a standard normal (i.e., a normal with mean 0 and standard deviation 1) and plot its density, we do the following:
x = rnorm(50000,0,1)
plot(density(x))
As the number of draws goes to infinity this will converge in distribution to the normal. To illustrate this, see the image below which shows from left to right and top to bottom 5000,50000,500000, and 5 million samples.
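One way to check the convergence claim numerically is to overlay the exact density on the estimate and measure the largest gap. This is only a sketch: the seed and the names draws, d, and max_gap are arbitrary choices made for the example.

```r
set.seed(42)
draws <- rnorm(50000, 0, 1)

# Kernel density estimate of the simulated draws
d <- density(draws)
plot(d)

# Exact standard normal density for comparison
curve(dnorm, add = TRUE, col = "red", lty = 2)

# Largest absolute difference between estimate and exact density
max_gap <- max(abs(d$y - dnorm(d$x)))
max_gap
```

Rerunning with a larger sample size should generally shrink max_gap, which is the behaviour described above.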
In the general case, for example Normal(2, 1):
f <- function(x) dnorm(x, 2, 1)
plot(f, -1, 5)
This is very general: f can be defined freely, with any given parameters, for example:
f <- function(x) dbeta(x, 0.1, 0.1)
plot(f, 0, 1)
I particularly like lattice for this purpose. It makes it easy to add graphical information such as specific areas under a curve, which is what you usually need when dealing with probability problems such as finding P(a < X < b).
Please have a look:
library(lattice)
e4a <- seq(-4, 4, length = 10000) # Data to set up our normal
e4b <- dnorm(e4a, 0, 1)
xyplot(e4b ~ e4a, # Lattice xyplot
type = "l",
main = "Plot 2",
panel = function(x,y, ...){
panel.xyplot(x,y, ...)
panel.abline( v = c(0, 1, 1.5), lty = 2) #set z and lines
xx <- c(1, x[x>=1 & x<=1.5], 1.5) #Color area
yy <- c(0, y[x>=1 & x<=1.5], 0)
panel.polygon(xx,yy, ..., col='red')
})
In this example I make the area between z = 1 and z = 1.5 stand out. You can easily change these parameters to suit your problem.
Axis labels are automatic.
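The shaded region corresponds to the probability P(1 < Z < 1.5) for a standard normal variable, which can be computed with pnorm(). A short sketch, with p_shaded and p_check as illustrative names:

```r
# Probability that a standard normal variable falls between z = 1 and z = 1.5
p_shaded <- pnorm(1.5) - pnorm(1)
p_shaded

# Cross-check by numerically integrating the density over the same interval
p_check <- integrate(dnorm, lower = 1, upper = 1.5)$value
p_check
```

Both values come out at about 0.092, which is the area coloured red in the plot.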
This is how to write it in functions:
normalCriticalTest <- function(mu, s) {
x <- seq(-4, 4, length=200) # x extends from -4 to 4
# y follows the formula of the normal density
y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2))
plot(x,y, type="l", lwd=2, xlim = c(-3.5,3.5)) # draw the curve
# mark the critical values; for mu = 0, s = 1 each tail beyond a line holds 2.5% of the area
abline(v = c(-1.96, 1.96), col="red")
}
normalCriticalTest(0, 1) # draw a standard normal density with vertical lines at +/-1.96
Final result:
