How to plot theoretical Pareto distribution in R?

I need to plot a theoretical Pareto distribution in R.
I want it drawn as a smooth line, not as points or a polyline.
My distribution function is 1−(1/x)^2.
I plotted the empirical distribution of my sample and the theoretical distribution on one graph:
ecdf(b2)
plot(ecdf(b2))
lines(x, (1-(1/x)^2), col = "red", lwd = 2, xlab = "", ylab = "")
But I got:
You can see that the red line is not continuous; it looks like a polyline. Is it possible to get a continuous red line?
Do you have any advice?

Use curve() instead.
library(EnvStats)
set.seed(8675309)
# You did not supply the contents of b2 so I generated some
b2 <- rpareto(100, 1, 2)
plot(ecdf(b2))
ppareto <- function(x) 1 - (1/x)^2
curve(ppareto, col = "red", add = TRUE)
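For readers without EnvStats, here is a minimal base-R sketch of the same idea. The sample is drawn by inverse-transform sampling, and the helper name pareto_cdf is made up for this illustration (it does not come from any package). The sketch also shows that lines() can give an equally smooth result when it is fed a fine, sorted grid; the x used in the question is not shown, so whether a coarse or unsorted x caused the polyline is an assumption.

```r
# Hypothetical helper: CDF 1 - (1/x)^2 for x >= 1, and 0 below the support
pareto_cdf <- function(x) ifelse(x < 1, 0, 1 - (1 / x)^2)

set.seed(42)
# Inverse-transform sampling: if U ~ Uniform(0, 1),
# then (1 - U)^(-1/2) has CDF 1 - (1/x)^2 on x >= 1
b2 <- (1 - runif(100))^(-1 / 2)

plot(ecdf(b2))

# Option 1: curve() evaluates the function on its own evenly spaced grid
curve(pareto_cdf, from = 1, to = max(b2), n = 501,
      col = "red", lwd = 2, add = TRUE)

# Option 2: lines() on a fine, sorted grid gives the same smooth shape
grid <- seq(1, max(b2), length.out = 501)
lines(grid, pareto_cdf(grid), col = "blue", lty = 2)
```

Raising n in curve() (or length.out in seq()) makes the line smoother where the CDF bends sharply near x = 1.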

Related

superimposing two probability plots with probplot

I can create a lognormal probability plot using the probplot() function from the e1071 package. A problem arises when I try to add another set of lognormal data to the first plot. Although I use the command par(new=T), the x-axes of the two plots differ and don't align.
Is there another way to go about this?
I tried using the points() function. However, it appears I need the x and y coordinates to plot it and I don't know how to extract the x, y coordinates from the probplot() function.
# Program to plot random logn failure times with probability plot
library(e1071)
logn_prob_plot <- function() {
  set.seed(1)
  x <- rlnorm(10, 1, 1)
  par(bty = "l")
  par(col.lab = "white")
  p <- probplot(x, qdist = qlnorm)
  par(col.lab = "black")
  mtext(text = "failure time", col = "black", side = 1, line = 3, outer = F)
  mtext(text = "lognormal probability", col = "black", side = 2, line = 3, outer = F)
  set.seed(2)
  y <- rlnorm(10, 2, 3)
  par(new = T)
  par(col.lab = "white")
  probplot(y, qdist = qlnorm, xlab = "fail time", ylab = "lognormal probability")
  par(col.lab = "black")
  mtext(text = "failure time", col = "black", side = 1, line = 3, outer = F)
  mtext(text = "lognormal probability", col = "black", side = 2, line = 3, outer = F)
}
logn_prob_plot()
My expected result is two groups of data on the same probability plot with the same x and y axes. Instead, I get two different x-axes that are not aligned.
First, let's simulate the variables:
set.seed(1)
x<-rlnorm(10,1,1)
set.seed(2)
y=rlnorm(10,2,3)
The first probplot is:
p<-probplot(x,qdist=qlnorm, meanlog = 1, sdlog = 1)
which produces the output:
The second probplot is:
q <- probplot(y,qdist=qlnorm,meanlog = 2, sdlog = 3)
which produces the output:
Your best shot at merging them is to use the scale of the smaller one and discard some points:
p<-probplot(x,qdist=qlnorm, meanlog = 1, sdlog = 1)
points(sort(x), p[[1]](ppoints(length(x))), col = "red", pch = 19)
lines(q, col = "blue")
points(sort(y), q[[1]](ppoints(length(y))), col = "blue", pch = 19)
which gives:
The red line and points are from the distribution with meanlog = 1, sdlog = 1 and the
blue ones are from the one with meanlog = 2, sdlog = 3.
I further have to warn you that from reading the code of the probplot() function:
xl <- quantile(x, c(0.25, 0.75))
yl <- qdist(c(0.25, 0.75), ...)
slope <- diff(yl)/diff(xl)
the slope of the line is determined only by the positions of the first and third quartiles, not by what happens elsewhere.
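A small base-R sketch can make that warning concrete. It repeats the three quoted lines with qlnorm in place of qdist, then inflates the largest observation and recomputes the slope. The variable names x_outlier and slope_outlier are introduced here for illustration only, and the code mirrors just the excerpt above, not the whole probplot() function.

```r
set.seed(1)
x <- rlnorm(10, 1, 1)

# The quartile-based slope from the excerpt, with qdist = qlnorm
xl <- quantile(x, c(0.25, 0.75))
yl <- qlnorm(c(0.25, 0.75), meanlog = 1, sdlog = 1)
slope <- diff(yl) / diff(xl)

# Make the largest observation 100 times bigger; the quartiles do not move
x_outlier <- x
x_outlier[which.max(x_outlier)] <- max(x) * 100
xl_outlier <- quantile(x_outlier, c(0.25, 0.75))
slope_outlier <- diff(yl) / diff(xl_outlier)

# Both slopes agree, although the upper tail changed drastically
c(original = unname(slope), with_outlier = unname(slope_outlier))
```

With ten observations, the default quantile() interpolates between the 3rd and 4th and between the 7th and 8th order statistics, so the extreme value never enters the calculation.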

How to overlay density histogram with gamma distribution fit in R?

I am new to R and would like to add a fit to a gamma distribution to my histogram. I would like the gamma distribution fit to overlay my histogram.
I am able to calculate the gamma distribution with the dgamma function and also with the fitdist function. However, I am not able to overlay this gamma distribution as a fit onto my histogram.
This is the code I tried:
hist(mydata, breaks = 30, freq = FALSE, col = "grey")
lines(dgamma(mydata, shape = 1))
The code I tried does not overlay the gamma distribution fit onto my histogram. I only get the histogram without the fit.
See if the following example helps. It overlays a fitted line (black) and a PDF graph (red, dotted) on a histogram.
First, create a dataset.
set.seed(1234) # Make the example reproducible
mydata <- rgamma(100, shape = 1, rate = 1)
Now fit a gamma distribution to the data.
param <- MASS::fitdistr(mydata, "gamma")
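This vector gives a grid of x values spanning the data. Note that curve() below evaluates its expression on its own grid, so the vector is optional.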
x <- seq(min(mydata), max(mydata), length.out = 100)
And plot them all.
hist(mydata, breaks = 30, freq = FALSE, col = "grey", ylim = c(0, 1))
curve(dgamma(x, shape = param$estimate[1], rate = param$estimate[2]), add = TRUE)
lines(sort(mydata), dgamma(sort(mydata), shape = 1),
col = "red", lty = "dotted")
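As for why the attempt in the question shows no curve: lines() given a single vector plots it against its index 1, 2, ..., n, so the densities are drawn at x positions that mostly fall outside the histogram's range, and in data order rather than sorted. The sketch below passes explicit, sorted x values instead. To stay in base R it swaps in method-of-moments estimates for the maximum-likelihood fit from MASS::fitdistr(); the names shape_hat, rate_hat and grid are assumptions made for this illustration.

```r
set.seed(1234)
mydata <- rgamma(100, shape = 1, rate = 1)

# Method-of-moments estimates (a stand-in for the ML fit, not the same estimator):
# mean = shape / rate and variance = shape / rate^2
shape_hat <- mean(mydata)^2 / var(mydata)
rate_hat  <- mean(mydata) / var(mydata)

# Explicit, sorted x values over the range of the data
grid <- seq(min(mydata), max(mydata), length.out = 200)

hist(mydata, breaks = 30, freq = FALSE, col = "grey")
lines(grid, dgamma(grid, shape = shape_hat, rate = rate_hat), lwd = 2)
```

The two estimators usually give similar but not identical parameters, so the curve may differ slightly from the fitdistr() version.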

Add confidence band to R IRF Plot

I use the following example code to plot an impulse response function:
# Load data and apply VAR
library("vars")
data(Canada)
data <- Canada
data <- data.frame(data[,1:2])
names(data)
var <- VAR(data, p=2, type = "both")
# Apply IRF
irf <- irf(var, impulse = "e", response = "prod", boot = T, cumulative = FALSE, n.ahead = 20)
str(irf)
plot(irf)
# Response
irf$irf
# Lower & Upper
irf$Lower
irf$Upper
#Create DataFrame and Plot
irf_df <- data.frame(irf$irf,irf$Lower,irf$Upper)
irf_df$T<-seq.int(nrow(irf_df)) #T
irf_df
plot(data.frame(irf_df$T, irf_df[1]), type="l", main="Impulse Response")
abline(h=0, col="blue", lty=2)
It looks like it works so far, though I sense that the code could be improved.
Would it be possible to add a confidence band for the Lower and Upper bounds of the confidence interval?
If you want to plot the Lower and Upper bands, you can use the lines() function, setting the y-limits of the plot if desired.
plot(irf_df$T, irf_df$prod, type="l", main="Impulse Response",
ylim = c(min(irf_df$prod.1), max(irf_df$prod.2)) * 1.1)
abline(h=0, col="blue", lty=2)
lines(irf_df$T, irf_df$prod.1, lty = 2)
lines(irf_df$T, irf_df$prod.2, lty = 2)
For a fancier plot with the confidence band filled in, use polygon. Here, we set up an empty plot, then plot the polygon, and finally overlay the line. Also note here that there's no need to set up a new data.frame: we can simply use values from the irf() output:
plot(irf$irf$e, type = "n", main = "Impulse Response",
ylim = c(min(irf$Lower$e), max(irf$Upper$e)))
polygon(x = c(seq_along(irf$irf$e), rev(seq_along(irf$irf$e))),
y = c(irf$Lower$e, rev(irf$Upper$e)),
lty = 0, col = "#fff7ec")
lines(irf$irf$e)
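The polygon approach can be wrapped in a small reusable function. The sketch below is independent of the vars package: plot_band is a hypothetical helper name, and the response and bounds are made-up numbers rather than output from irf(). The key step is walking along the lower bound left to right and back along the upper bound right to left, so the polygon closes around the band.

```r
# Hypothetical helper: draw an estimate with a shaded band around it
plot_band <- function(est, lower, upper, ...) {
  stopifnot(length(est) == length(lower), length(est) == length(upper))
  h <- seq_along(est)
  # Empty plot whose y-limits cover the whole band
  plot(h, est, type = "n", ylim = range(lower, upper), ...)
  # Polygon outline: lower bound forwards, upper bound backwards
  px <- c(h, rev(h))
  py <- c(lower, rev(upper))
  polygon(px, py, border = NA, col = "grey85")
  lines(h, est, lwd = 2)
  abline(h = 0, col = "blue", lty = 2)
  invisible(cbind(x = px, y = py))
}

# Made-up response and bounds, for illustration only
est  <- 0.8^(0:20)
band <- plot_band(est, est - 0.2, est + 0.2,
                  main = "Impulse Response", xlab = "Horizon", ylab = "Response")
```

With real output, the three arguments would be the response and the Lower and Upper series used in the answer above.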

Create sample vector data in R with a skewed distribution with limited range

I want to create a sample vector of data in R in which I can control the range of values selected. I think I want to use sample() to limit the range of values generated, rather than an rnorm-type command that generates values based on the type of distribution, variance, SD, etc.
So I'm looking to do a sample with a specified range (e.g. 1-5) for a skewed distribution something like this:
x=rexp(100,1/10)
Here's what I have, but it does not produce a skewed distribution:
y=sample(1:5,234, replace=T)
How can I have my cake (limited range) and eat it too (skewed distribution), so to speak.
Thanks
set.seed(3)
hist(sample(1:10, size = 100, replace = TRUE, prob = 10:1))
The beta distribution takes values from 0 to 1. If you want your values to range from 0 to 5, for instance, you can multiply them by 5. The beta distribution also lets you control the skewness.
For example, you can get these three types of skewness:
Using R and the beta distribution, you can get similar distributions as follows. Note that the green vertical line marks the mean and the red one marks the median:
x= rbeta(10000,5,2)
hist(x, main="Negative or Left Skewness", freq=FALSE)
lines(density(x), col='red', lwd=3)
abline(v = c(mean(x),median(x)), col=c("green", "red"), lty=c(2,2), lwd=c(3, 3))
x= rbeta(10000,2,5)
hist(x, main="Positive or Right Skewness", freq=FALSE)
lines(density(x), col='red', lwd=3)
abline(v = c(mean(x),median(x)), col=c("green", "red"), lty=c(2,2), lwd=c(3, 3))
x= rbeta(10000,5,5)
hist(x, main="Symmetrical", freq=FALSE)
lines(density(x), col='red', lwd=3)
abline(v = c(mean(x),median(x)), col=c("green", "red"), lty=c(2,2), lwd=c(3, 3))
To better see what the sample function is doing with integers, use the barplot function, not the histogram function:
set.seed(3)
barplot(table(sample(1:10, size = 100, replace = TRUE, prob = 10:1)))
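Both ideas above can be adapted to the 1-5 range and sample size of 234 from the question. The following is a rough sketch: the Beta(2, 5) shape parameters and the 5:1 weights are arbitrary choices made here to get right skew, not values given in the question.

```r
set.seed(3)
n <- 234

# Continuous values on [1, 5]: rescale a right-skewed Beta(2, 5) draw
x_cont <- 1 + 4 * rbeta(n, 2, 5)

# Integer values in 1:5: weight small values more heavily
# (sample() rescales the weights to probabilities internally)
x_disc <- sample(1:5, n, replace = TRUE, prob = 5:1)

hist(x_cont, main = "Scaled beta on [1, 5]", freq = FALSE)
barplot(table(x_disc), main = "Weighted sample from 1:5")
```

Swapping the shape parameters to rbeta(n, 5, 2), or the weights to 1:5, would give left skew instead.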

How to plot a normal distribution by labeling specific parts of the x-axis?

I am using the following code to create a standard normal distribution in R:
x <- seq(-4, 4, length=200)
y <- dnorm(x, mean=0, sd=1)
plot(x, y, type="l", lwd=2)
I need the x-axis to be labeled at the mean and at points three standard deviations above and below the mean. How can I add these labels?
The easiest (but not general) way is to restrict the limits of the x axis. The +/- 1:3 sigma will be labeled as such, and the mean will be labeled as 0 - indicating 0 deviations from the mean.
plot(x,y, type = "l", lwd = 2, xlim = c(-3.5,3.5))
Another option is to use more specific labels:
plot(x,y, type = "l", lwd = 2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
Using the code in this answer, you could skip creating x and just use curve() on the dnorm function:
curve(dnorm, -3.5, 3.5, lwd=2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
But this doesn't use the given code anymore.
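For a normal distribution that is not standard, the tick positions can be computed from the mean and standard deviation. This is a minimal sketch; mu = 100 and sigma = 15 are arbitrary example values, and the name ticks is introduced here.

```r
mu    <- 100  # example mean (assumption)
sigma <- 15   # example standard deviation (assumption)

x <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 200)
y <- dnorm(x, mean = mu, sd = sigma)

# Tick positions at the mean and at 1, 2 and 3 standard deviations either side
ticks <- mu + (-3:3) * sigma

plot(x, y, type = "l", lwd = 2, xaxt = "n", xlab = "", ylab = "Density")
axis(1, at = ticks, labels = ticks)
```

Passing a character vector as labels (for example "mean" for the middle tick) works the same way as in the answer above.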
If you prefer the hard way, without R's built-in density function, or you want to do this outside R, you can use the following formula.
x<-seq(-4,4,length=200)
s = 1
mu = 0
y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2))
plot(x,y, type="l", lwd=2, col = "blue", xlim = c(-3.5,3.5))
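A quick way to gain confidence in the hand-written formula is to compare it with dnorm() on the same grid. The sketch below does only that; the names y_manual and y_builtin are introduced here for the comparison.

```r
x  <- seq(-4, 4, length.out = 200)
s  <- 1
mu <- 0

# Density computed from the formula and from the built-in function
y_manual  <- (1 / (s * sqrt(2 * pi))) * exp(-((x - mu)^2) / (2 * s^2))
y_builtin <- dnorm(x, mean = mu, sd = s)

# TRUE when the two agree up to floating-point tolerance
isTRUE(all.equal(y_manual, y_builtin))

# Largest absolute difference between the two
max(abs(y_manual - y_builtin))
```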
An extremely inefficient and unusual, but beautiful, solution based on the ideas of Monte Carlo simulation is this:
1. Simulate many draws (or samples) from a given distribution (say the normal) using rnorm. The rnorm function takes arguments (A, B, C) and returns a vector of A samples from a normal distribution centered at B, with standard deviation C.
2. Plot the estimated density of these draws.
Thus to take a sample of size 50,000 from a standard normal (i.e, a normal with mean 0 and standard deviation 1), and plot its density, we do the following:
x = rnorm(50000,0,1)
plot(density(x))
As the number of draws goes to infinity this will converge in distribution to the normal. To illustrate this, see the image below which shows from left to right and top to bottom 5000,50000,500000, and 5 million samples.
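The convergence can also be checked numerically rather than by eye. The sketch below compares the kernel density estimate with dnorm() on [-3, 3] for two sample sizes; the sizes, the interval, and the name max_err are choices made for this illustration. The error depends on the random draws and on the bandwidth density() selects, so no particular values are claimed here.

```r
set.seed(1)
sizes <- c(5000, 50000)

# Largest gap between the estimated and the true density on [-3, 3]
max_err <- sapply(sizes, function(n) {
  d <- density(rnorm(n, 0, 1), from = -3, to = 3)
  max(abs(d$y - dnorm(d$x)))
})

names(max_err) <- sizes
max_err
```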
In the general case, for example Normal(2, 1):
f <- function(x) dnorm(x, 2, 1)
plot(f, -1, 5)
This is very general: f can be defined freely, with any given parameters, for example:
f <- function(x) dbeta(x, 0.1, 0.1)
plot(f, 0, 1)
I particularly like lattice for this. It makes it easy to add graphical information such as specific areas under a curve, which you usually need when dealing with probability problems such as finding P(a < X < b).
Please have a look:
library(lattice)
e4a <- seq(-4, 4, length = 10000) # Data to set up our normal
e4b <- dnorm(e4a, 0, 1)
xyplot(e4b ~ e4a, # Lattice xyplot
       type = "l",
       main = "Plot 2",
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.abline(v = c(0, 1, 1.5), lty = 2) # set z and lines
         xx <- c(1, x[x >= 1 & x <= 1.5], 1.5) # Color area
         yy <- c(0, y[x >= 1 & x <= 1.5], 0)
         panel.polygon(xx, yy, ..., col = "red")
       })
In this example I make the area between z = 1 and z = 1.5 stand out. You can easily change these parameters to suit your problem.
Axis labels are automatic.
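The shaded area corresponds to a probability that can be computed directly. The sketch below obtains P(1 < Z < 1.5) for a standard normal in two ways, from the CDF and by numerical integration of the density; the names p_cdf and p_int are introduced here.

```r
# Difference of the CDF at the two limits
p_cdf <- pnorm(1.5) - pnorm(1)

# Numerical integration of the density over the same interval
p_int <- integrate(dnorm, lower = 1, upper = 1.5)$value

c(cdf = p_cdf, integral = p_int)
```

Both values are about 0.092, so the red region covers roughly 9% of the total area under the curve.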
This is how to write it as a function:
normalCriticalTest <- function(mu, s) {
  # x spans four standard deviations either side of the mean
  x <- seq(mu - 4 * s, mu + 4 * s, length = 200)
  # y follows the formula of the normal density
  y <- (1 / (s * sqrt(2 * pi))) * exp(-((x - mu)^2) / (2 * s^2))
  plot(x, y, type = "l", lwd = 2, xlim = mu + c(-3.5, 3.5) * s)
  # vertical lines that leave 2.5% of the area in each tail
  abline(v = mu + c(-1.96, 1.96) * s, col = "red")
}
normalCriticalTest(0, 1) # draw a standard normal distribution with vertical lines
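The value 1.96 is the rounded 97.5% quantile of the standard normal, so it can be derived with qnorm() rather than typed in. The sketch below does that for a general significance level and rescales the cutoffs to a non-standard normal; alpha, mu = 10 and s = 2 are example values chosen here, not part of the answer above.

```r
alpha <- 0.05                                # two-sided significance level (example)
z_crit <- qnorm(c(alpha / 2, 1 - alpha / 2)) # about -1.96 and 1.96

mu <- 10  # example mean (assumption)
s  <- 2   # example standard deviation (assumption)

# Cutoffs on the scale of Normal(mu, s)
cutoffs <- mu + z_crit * s

# Area outside the cutoffs; this should equal alpha
tail_area <- pnorm(cutoffs[1], mu, s) + (1 - pnorm(cutoffs[2], mu, s))
tail_area
```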