I'm trying to use the inverse cumulative distribution method to plot a histogram from the standard Cauchy distribution, and I'm getting a strange plot that doesn't look like the textbook standard Cauchy. I think I have my inverse function correct (x = tan(pi*(u - 1/2))), so I would appreciate some help. Here is the R code that I have used:
n <- 10000
u <- runif(n)
c.samp <- sapply(u, function(u) tan(pi*(u - 1/2)))
hist(c.samp, breaks = 90, col = "blue",
main = "Hist of Cauchy")
The resulting plot just doesn't look correct:
Any help is appreciated, thank you.
The histogram and sampling technique are correct.
Compare the results with the following (which uses the R Cauchy sampling function).
c.samp2 <- rcauchy(n)
hist(c.samp2, breaks = 90, col = "blue",
main = "Hist of Cauchy 2")
The output here also looks incorrect, but it is not.
First, note that the x-axis range is chosen by default from the extreme values you happen to draw. As you probably know, the Cauchy distribution is extremely fat-tailed, so very large but rare values are expected. With 10000 samples from the Cauchy distribution, those few extreme observations stretch the axis and squeeze the bulk of the data into a handful of bars, while the extremes themselves are invisible because each of their bins holds only a few observations.
The default way hist chooses the bins is also poorly suited to distributions like the Cauchy. Try e.g.
hist(c.samp2, breaks = "FD", col = "blue",
main = "Hist of Cauchy 2",
xlim = c(-500, 500))
I suggest reading the help("hist") page carefully and playing around with the parameters to get a good and useful histogram.
By tweaking the x-axis range, using a density scale on the y-axis, and adding the theoretical density and a "rug", you get something more useful.
hist(c.samp, breaks = "FD", col = "blue",
main = "Hist of Cauchy distribution",
xlim = c(-50, 50),
freq = FALSE)
curve(dcauchy, add = TRUE, col = "red")
rug(c.samp)
Note that using c.samp or c.samp2 now hardly changes the plot.
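One way to check the sampler without relying on how the histogram happens to look is to compare sample quantiles with the theoretical ones from qcauchy. The following is only a minimal sketch: the seed, the quantile levels, and the |x| < 25 window are arbitrary choices for illustration, not part of the original question or answer.

```r
# Sketch: compare quantiles of the inverse-CDF sample with the theoretical ones.
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 10000
u <- runif(n)
c.samp <- tan(pi * (u - 1/2))    # tan() is vectorised, so sapply() is not needed

probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)   # arbitrary quantile levels
cmp <- rbind(sample = quantile(c.samp, probs, names = FALSE),
             theory = qcauchy(probs))
colnames(cmp) <- probs
print(round(cmp, 2))

# Keep only a central window so the rare extreme values do not drive the binning
central <- c.samp[abs(c.samp) < 25]
hist(central, breaks = 100, freq = FALSE, col = "blue",
     main = "Inverse-CDF Cauchy sample, |x| < 25")
# Rescale the Cauchy density by the probability mass inside the window
curve(dcauchy(x) / (pcauchy(25) - pcauchy(-25)), add = TRUE, col = "red")
```

If the two rows of the printed table agree closely, the sampler is behaving as expected even when the default histogram looks odd.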
Related
What is the distribution of the sum of two independent random variables, each uniformly distributed on the range 0 to 1? Run an appropriate simulation, present the result in a graph, and comment on it.
my code:
x<-runif(10,0,1)
y<-runif(10,0,1)
z<-x+y
plot(density(x),col='blue')
par(new = TRUE)
plot(density(y),ann=FALSE, axes = FALSE,col='green')
par(new = TRUE)
plot(density(z),ann=FALSE, axes = FALSE,col='red')
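A possible way to approach the simulation is sketched below. It assumes a much larger sample than the 10 draws above (the value of n and the seed are arbitrary) and overlays the triangular density on [0, 2], which is the standard result for the sum of two independent U(0, 1) variables. The helper dtri is a name introduced here for illustration only.

```r
set.seed(42)                  # arbitrary seed, for reproducibility only
n <- 1e5                      # 10 draws are far too few to see the shape
x <- runif(n)
y <- runif(n)
z <- x + y

# Triangular density of the sum on [0, 2] (hypothetical helper name)
dtri <- function(t) ifelse(t < 0 | t > 2, 0, ifelse(t <= 1, t, 2 - t))

# One histogram on the density scale, with the theoretical curve on the same axes
hist(z, breaks = 50, freq = FALSE, col = "grey",
     main = "Sum of two U(0, 1) variables", xlab = "z")
curve(dtri, from = 0, to = 2, add = TRUE, col = "red", lwd = 2)
```

Drawing everything on one set of axes avoids the par(new = TRUE) overlay, where each density gets its own hidden scale and the curves cannot be compared directly.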
Occasionally hist(..., nclass=nclass.scott) produces a histogram where the maximum bar extends over the top of the y axis. You may try this example a few times:
x <- sample(1000000, 500, replace=TRUE)
h <- hist(x,nclass=nclass.scott)
text(x=h$mids, y=h$counts, labels=h$counts, pos=3, col="red")
Example:
Occasionally the red number over the highest bar cannot be presented as it seems to be clipped by the plot region. I could add ylim=..., but it's quite tricky to get the maximum height of the bar.
Even when the maximum height is known, ylim=c(0, max) has the problem that max may seem to be ignored: for example, when the maximum is 527, the highest displayed y-axis label is 500, even if ylim=c(0, 527) is specified. When using 600 instead, it works, but then the y-axis is a bit too long...
If that is not a bug of R (3.3.3), what is an elegant (minimalistic) solution?
I think you need to set par(xpd= T) in your graph to avoid the trimming.
?par
xpd
A logical value or NA. If FALSE, all plotting is clipped to the
plot region, if TRUE, all plotting is clipped to the figure region,
and if NA, all plotting is clipped to the device region. See also
clip.
You can do better by combining the usr option with xpd. The bars appear to extend beyond the chart, but it is not the bars that leave the plot: the axis is only drawn as far as its last tick label. To fix the labels, we can use usr to place the ticks ourselves. If you also want to adjust the margins, use mar.
library(RColorBrewer)
par(mfrow=c(1,1),xpd=T,yaxs="i")
x <- sample(1000000, 500, replace=TRUE)
h <- hist(x,nclass=nclass.scott,axes=FALSE,col=brewer.pal(10,"Set3"))
# usr <- par("usr")
at <- c(0, 10,30, par("usr")[4])
axis(2,at=at,labels = round(at))
text(x=h$mids, y=h$counts, labels=h$counts, pos=3, col="red")
usr
A vector of the form c(x1, x2, y1, y2) giving the extremes of the
user coordinates of the plotting region. When a logarithmic scale is
in use (i.e., par("xlog") is true, see below), then the x-limits will
be 10 ^ par("usr")[1:2]. Similarly for the y-axis.
You may want to run it several times. I have run it many times, and the bars no longer appear to extend outside the chart.
Output:
What you describe is not a bug. You are using functionality to draw a histogram and then you want to add text to it. The function has not been designed for that, hence you need to reserve some additional white space for the text.
I suggest you run the function once, to get the "base values" of the graph. Then run the function again with adjusted scale (extra space for the text). In order to achieve this, you could use the following code
set.seed(9876) ### for reproducibility
x <- sample(1000000, 500, replace = TRUE)
h <- hist(x, nclass = nclass.scott, plot = FALSE)
### use the info from the previous call to adjust the y-scale with a constant
hist(x, nclass = nclass.scott, ylim = c(0, max(h$counts) + 10))
text(x = h$mids, y = h$counts, labels = h$counts, pos = 3, col = "red")
### ... or add a proportion (a little bit more robust)
hist(x, nclass = nclass.scott, ylim = c(0, max(h$counts) * 1.075))
text(x = h$mids, y = h$counts, labels = h$counts, pos = 3, col = "red")
Please let me know whether this is what you want.
I want to plot a histogram of real data and compare it with a theoretical normal distribution in one plot, but the two plots end up with different scales.
# ystar is the real data; you can generate some random data in its place.
x<-seq(-4,4,length=200)
y<-dnorm(x,mean=0, sd=1)
plot(x,y, type = "l", lwd = 2, xlim = c(-3.5,3.5),ylim=c(0,0.7))
par(new = TRUE)
hist(ystar,xlim = c(-10,10),freq = FALSE,ylim=c(0,0.7),breaks = 50)
Desired output
Assuming that ystar is a vector, you should change this:
y<-dnorm(x,mean=0, sd=1)
To:
y<-dnorm(x,mean=mean(ystar), sd=sd(ystar))
This will produce a density function that approximately matches the histogram.
You should then be able to use the same x-limits for both the histogram and the theoretical distribution, which will eliminate the strange overlapping axis labels you have in your current version.
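As a rough sketch of that advice, the histogram can be drawn once on the density scale and the normal curve added to the same axes with curve(..., add = TRUE), so there is only one set of axis limits to manage. The ystar vector below is simulated stand-in data (an assumption, since the real data are not shown), and the seed and parameters are arbitrary.

```r
set.seed(123)                               # arbitrary seed
ystar <- rnorm(500, mean = 2, sd = 1.5)     # stand-in for the real data

# freq = FALSE puts the bars on the density scale, matching dnorm()
h <- hist(ystar, breaks = 50, freq = FALSE,
          main = "Histogram with fitted normal density", xlab = "ystar")

# add = TRUE reuses the histogram's axes, so no par(new = TRUE) is needed
curve(dnorm(x, mean = mean(ystar), sd = sd(ystar)),
      add = TRUE, lwd = 2, col = "red")
```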
I've looked through similar problems and I am unable to resolve mine based on what has been answered with them. I have a histogram of Body Mass Index Data. I have found the mean and sd of the data, and I am trying to overlay a normal pdf with the same mean and sd of the data to compare it with the histogram.
Here is what I have so far.
I used hist(BMI) to graph the histogram. I found the mean(BMI) to be 26.65 and the sd(BMI) to be 3.47.
I am trying to use the curve function to plot a normal curve with those same parameters over the histogram.
curve(dnorm(x, mean = 26.65, sd = 3.47), add = T, col = "red")
As you can see, the red curve is barely visible at the very bottom of the histogram. Why is this error occurring?
Thank you.
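The normal curve is a probability density function, so it integrates to 1, while hist(BMI) plots counts by default. That puts the two on very different vertical scales. Note that a density value is not the probability that a single individual has this BMI. To bring the curve onto the count scale, multiply it by the number of observations times the bin width (the 100 below is only a placeholder for that factor). Alternatively, draw the histogram with freq = FALSE.
Something along the lines of:
curve(100*dnorm(x, mean = 26.65, sd = 3.47), add = TRUE, col = "red")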
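The scaling factor can be computed from the histogram itself rather than guessed. The sketch below assumes simulated data in place of the real BMI vector (300 draws with the reported mean and sd, which is an assumption for illustration) and relies on hist() using equal-width bins by default.

```r
set.seed(2024)                                # arbitrary seed
BMI <- rnorm(300, mean = 26.65, sd = 3.47)    # simulated stand-in for the BMI data

h <- hist(BMI, main = "BMI with scaled normal curve")
binwidth <- diff(h$breaks)[1]                 # default bins have equal width
scale_factor <- length(BMI) * binwidth        # converts density to expected counts

curve(scale_factor * dnorm(x, mean = mean(BMI), sd = sd(BMI)),
      add = TRUE, col = "red", lwd = 2)
```

With this factor, the area under the red curve equals the total area of the bars, so the two are directly comparable on the count scale.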
I am very much an R novice so I am guessing this question is rather stupid/simple...
I have two vectors that represent two samples.
I would like to plot each of them (different colors) against the uniform CDF (something like a Q-Q plot).
To be precise, I would like something very similar to plot #7 here (could not find what was used to draw that plot...). Figure 7 is displayed below:
only with multiple samples and some flexibility with changing the axis labels, colors and such.
Could you please point at a good direction?
For example:
set.seed(10)
N <- 1000
B = rt(N,df=10)
C = rchisq(N,df=10)
op <- layout(matrix(c(1,2),ncol=2,nrow=1))
qqnorm(B,col='green',ylab='student')
qqline(B, col = 2)
qqnorm(C,col='blue',ylab=expression( chi^2 ))
qqline(C, col = 2)
The basics for drawing a QQ-plot are:
qqnorm(variable, main = "QQ Plot", xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
qqline(variable, col = "red")
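Since the question asks for a comparison against the uniform distribution rather than the normal, a similar plot can be built by hand. This is only a sketch under stated assumptions: s1 and s2 are hypothetical samples invented for illustration, and the theoretical quantiles come from qunif(ppoints(n)).

```r
set.seed(10)
n <- 1000
s1 <- runif(n)              # hypothetical sample 1
s2 <- rbeta(n, 2, 2)        # hypothetical sample 2, deliberately non-uniform

theo <- qunif(ppoints(n))   # theoretical U(0, 1) quantiles

# Sorted sample values against theoretical quantiles, one colour per sample
plot(theo, sort(s1), col = "green", pch = 20, cex = 0.5,
     xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Theoretical U(0, 1) quantiles", ylab = "Sample quantiles",
     main = "Uniform Q-Q plot")
points(theo, sort(s2), col = "blue", pch = 20, cex = 0.5)
abline(0, 1, col = "red")   # reference line: perfect agreement with U(0, 1)
legend("topleft", legend = c("sample 1", "sample 2"),
       col = c("green", "blue"), pch = 20)
```

Axis labels, colours, and the legend are ordinary plot arguments here, so they can be changed freely. Samples of different sizes would each need their own ppoints() call.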