For an assignment I was asked this:
For the values of
(shape=5,rate=1),(shape=50,rate=10),(shape=.5,rate=.1), plot the
histogram of a random sample of size 10000. Use a density rather than
a frequency histogram so that you can add in a line for the population
density (hint: you will use both rgamma and dgamma to make this plot).
Add an abline for the population and sample mean. Also, add a subtitle
that reports the population variance as well as the sample variance.
My current code looks like this:
library(ggplot2)
set.seed(1234)
x = seq(1, 1000)
s = 5
r = 1
plot(x, dgamma(x, shape = s, rate = r), rgamma(x, shape = s, rate = r), sub =
paste0("Shape = ", s, "Rate = ", r), type = "l", ylab = "Density", xlab = "", main =
"Gamma Distribution of N = 1000")
After running it I get this error:
Error in plot.window(...) : invalid 'xlim' value
What am I doing incorrectly?
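plot() does not take y1 and y2 arguments. See ?plot. Your third unnamed argument (the rgamma draws) is matched positionally to xlim in plot.default, which is why the error complains about an invalid 'xlim' value. You need to do a plot (or histogram) of one y variable (e.g., from rgamma), then add the second y variable (e.g., from dgamma) using something like lines().
Here's one way to get what you want:
# specify parameters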
s = 5
r = 1
# plot histogram of random draws
set.seed(1234)
N = 1000
hist(rgamma(N, shape=s, rate=r), breaks=100, freq=FALSE)
# add true density curve
x = seq(from=0, to=20, by=0.1)
lines(x=x, y=dgamma(x, shape=s, rate=r))
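For the assignment itself, the same pattern extends to the mean lines and the variance subtitle. This is only a minimal sketch: it assumes the standard gamma results that the population mean is shape/rate and the population variance is shape/rate^2, and the helper name plot_gamma_sample is made up for illustration.

```r
# Hypothetical helper (not from the original answer): density histogram of a
# gamma sample with the population density, mean lines and a variance subtitle.
plot_gamma_sample <- function(s, r, N = 10000) {
  draws <- rgamma(N, shape = s, rate = r)
  pop_mean <- s / r      # population mean of a gamma(shape, rate)
  pop_var  <- s / r^2    # population variance

  hist(draws, breaks = 100, freq = FALSE,
       main = paste0("Gamma sample, N = ", N),
       xlab = paste0("shape = ", s, ", rate = ", r),
       sub = paste0("Population variance = ", signif(pop_var, 4),
                    "; sample variance = ", signif(var(draws), 4)))

  # population density evaluated over the observed range
  x <- seq(from = min(draws), to = max(draws), length.out = 500)
  lines(x = x, y = dgamma(x, shape = s, rate = r))

  # vertical lines: population mean (solid red), sample mean (dashed blue)
  abline(v = pop_mean, col = "red")
  abline(v = mean(draws), col = "blue", lty = 2)

  invisible(list(pop_mean = pop_mean, pop_var = pop_var,
                 sample_mean = mean(draws), sample_var = var(draws)))
}

set.seed(1234)
res <- plot_gamma_sample(s = 5, r = 1)
```

Calling the helper again with (50, 10) and (.5, .1) would cover the other two parameter pairs from the assignment.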
Related
I have following task:
Assume the population of interest can be modeled by a Bernoulli distribution with
p = 0.5.
For each sample size n simulate r = 5,000 draws (by using a for loop over (i in
1:r)) from that Bernoulli distribution with p = 0.5 and calculate the standardized
sample mean for each draw.
The last histogram looks good with a curve, but the 1st and 2nd are wrong. Maybe someone can help me with this. Thanks in advance for your time!
I have done following:
set.seed(2005)
x1 <- rbinom(5000,3,0.5)
par(mfrow=c(2,2))
hist(x=x1,
main=expression(paste(" Random Variables with",size,"=1 and",prob,"=0.5")),
sub="Standardized value of sample average",
xlab="n=3", ylab="Probability", probability = TRUE)
curve(dnorm(x, mean = mean(x), sd=sd(x)), add = TRUE, col="blue")
Essentially what happened in the first two panels is that for a small n the histogram breaks were calculated in an ungraceful manner. You can fix that by letting the breaks depend on the data range. Here, I chose the breaks depending on whether the range of the data was smaller than 10. If this is TRUE, manually calculate breaks, otherwise use the default "Sturges" algorithm for breaks.
par(mfrow=c(2,2))
N <- c(2, 5, 25, 100)
for (i in seq_along(N)) {
  set.seed(2015 + i)
  n <- N[i]
  xx <- rbinom(10000, n, 0.78)
  # narrow data range: integer-spaced breaks; otherwise the default algorithm
  if (diff(range(xx)) < 10) {
    breaks <- seq(floor(min(xx)), ceiling(max(xx)))
  } else {
    breaks <- "Sturges"
  }
  hist(
    x = xx, breaks = breaks,
    main=expression(paste("Bernoulli Random Variables with",size,"=1 and",prob,"=0.78")),
    sub = "Standardized value of sample average",
    xlab = paste0("n=",n), ylab = "Probability", probability = TRUE
  )
  curve(dnorm(x, mean = mean(xx), sd=sd(xx)), add = TRUE, col="blue")
}
Created on 2021-01-07 by the reprex package (v0.3.0)
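The task also asks for standardized sample means computed in a for loop, which the snippets above do not do explicitly. A minimal sketch of one reading of that step, assuming the usual standardization z = (mean - p) / sqrt(p(1 - p)/n) and an illustrative sample size n = 25 (the question does not list its sample sizes):

```r
# Assumed setup from the task: Bernoulli(p = 0.5), r = 5000 replications.
p <- 0.5
r <- 5000
n <- 25   # illustrative sample size, not taken from the question

set.seed(2005)
z <- numeric(r)
for (i in 1:r) {
  draw <- rbinom(n, size = 1, prob = p)               # n Bernoulli draws
  z[i] <- (mean(draw) - p) / sqrt(p * (1 - p) / n)    # standardized sample mean
}

hist(z, probability = TRUE, main = paste0("n = ", n),
     xlab = "Standardized sample mean")
curve(dnorm(x), add = TRUE, col = "blue")   # standard normal for reference
```

Because the values are standardized, the reference curve is the standard normal rather than a normal fitted to the sample.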
I'm trying to fit Variance-Gamma distribution to empirical data of 1-minute logarithmic returns. In order to visualize the results I plotted together 2 histograms: empirical and theoretical.
(a is the vector of empirical data)
SP_hist <- hist(a,
col = "lightblue",
freq = FALSE,
breaks = seq(min(a), max(a), length.out = 141),
border = "white",
main = "",
xlab = "Value",
xlim = c(-0.001, 0.001))
hist(VG_sim_rescaled,
freq = FALSE,
breaks = seq(min(VG_sim_rescaled), max(VG_sim_rescaled), length.out = 141),
xlab = "Value",
main = "",
col = "orange",
add = TRUE)
(empirical histogram-blue, theoretical histogram-orange)
However, after having plotted 2 histograms together, I started wondering about 2 things:
In both histograms I set freq = FALSE, so I expected the y-axis to lie in the range (0, 1). In the actual picture, values on the y-axis exceed 3,000. How can that happen, and how can I fix it?
I need to change the bucketing size (the width of the buckets) and the density per unit length of the x-axis. How is it possible to do these tasks?
Thank you for your help.
freq=FALSE means that the area of the entire histogram is normalized to one, so the y-axis shows a density, not a probability, and it is not bounded by 1. As your x-axis has a very small range (about 10^(-4)), the y-values must be quite large to achieve an area (= x times y) of one.
The only way to fix the number of bins exactly is by providing a vector of break points to the parameter breaks. This parameter also accepts a single number, but hist treats it only as a suggestion and chooses "pretty" break points near it. Thus try the following:
bins <- 6 # number of cells
breaks <- seq(min(x),max(x),(max(x)-min(x))/bins)
hist(x, freq=FALSE, breaks=breaks)
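To see the area argument in numbers, here is a small sketch on made-up data with a very narrow spread (the data and the seed are assumptions for illustration only). The object returned by hist stores the bar heights in density and the break points in breaks, so the total area can be checked directly:

```r
set.seed(1)
x <- rnorm(5000, mean = 0, sd = 1e-4)   # made-up data with a tiny x range

bins <- 6                               # number of cells
breaks <- seq(min(x), max(x), length.out = bins + 1)
h <- hist(x, freq = FALSE, breaks = breaks)

tallest <- max(h$density)                     # bar height, far above 1
total_area <- sum(h$density * diff(h$breaks)) # widths times heights
```

Here total_area equals 1 up to floating-point error even though the tallest bar is in the thousands.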
I need to overlay a normal distribution curve based on a dataset on a histogram of the same dataset.
I get the histogram and the normal curve right individually. But the curve just stays a flat line when combined with the histogram using the add = TRUE argument in the curve function.
I tried adjusting xlim and ylim to see if that works, but I am not getting the intended results. I am confused about how to set the x and y limits to suit both the histogram and the curve.
Any suggestions? My dataset is a set of daily walking distances for 100 individuals, ranging from min = 0.4km to max = 10km.
bd.m <- read_excel('walking.xlsx')
hist(bd.m, ylim = c(0,10))
curve(dnorm(x, mean = mean(bd.m), sd = sd(bd.m)), add = TRUE, col = 'red')
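You need to set freq = FALSE in the call to hist. By default hist plots counts, whereas dnorm returns density values, which are tiny on a count axis, so the curve looks flat. For example:
dt <- rnorm(1000, 2)
hist(dt, freq = FALSE)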
curve(dnorm(x, mean = mean(dt), sd = sd(dt)), add = TRUE, col = 'red')
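If a frequency histogram is preferred, the alternative is to rescale the curve rather than the bars. A sketch under the assumption that the bins are equally wide (true for the default breaks): multiply the density by the number of observations times the bin width so it is on the count scale.

```r
set.seed(42)
dt <- rnorm(1000, mean = 2)      # toy data, as in the example above
h <- hist(dt)                    # frequency histogram (counts on the y-axis)
bin_width <- diff(h$breaks)[1]   # default breaks are equally spaced

# expected count per bin is roughly density * N * bin width
curve(dnorm(x, mean = mean(dt), sd = sd(dt)) * length(dt) * bin_width,
      add = TRUE, col = "red")
```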
I am trying to plot the density of the gamma distribution.
x<-seq(0,10000,length.out = 1000)
plot(density(rgamma(1000,shape = 7,scale = 120)))
plot(dgamma(x,shape=7,scale=120),col="red")
But, I don't understand why both plots are totally different.
Since you didn't supply x in the final call, the x coordinates defaulted to the indices 1, 2, 3, ..., 1000 of the vector dgamma(x,shape=7,scale=120) rather than the intended values of x, which run from 0 to 10000.
If you do:
x<-seq(0,10000,length.out = 1000)
plot(density(rgamma(1000,shape = 7,scale = 120)))
points(x,dgamma(x,shape=7,scale=120),type = "l", col="red")
Then both curves share the same x-axis, and the red theoretical density overlays the kernel density estimate.
I have a function in R which creates a standard normal plot, and then uses a for loop that calls density plots for the t distribution for various degrees of freedom. The plot looks like:
Note that the density for degrees of freedom = 2 extends outside of the y axis limits. I am wondering if there is a way to edit the for loop so that the axis limits are adjusted according to the range of the density lines that are drawn.
The for loop code that I am using is as follows:
library(RColorBrewer)  # provides brewer.pal()
N <- 1000
n <- c(25,50,100,200)
df <- c(1:4, seq(5,25,by=5))
histPlot <- function(data) {
  # standard normal reference curve
  x <- seq(-4, 4, length=100)
  y <- dnorm(x, mean=0, sd=1)
  plot(x, y, type="l",
       main=paste("Distribution of size", nrow(data)/9000, sep=" "),
       xlab="standard deviation")
  colors <- brewer.pal(n = 9, name = "Spectral")
  i <- 1
  # one density line per degrees-of-freedom value
  for (d in df) {
    lines(density(data[data$df==d, "t"]), col=colors[i])
    legend("topright", pch=c(21,21), col=c(colors, "black"), legend=c(df, "normal"), bty="o", cex=.8)
    i <- i+1
  }
}
The lines() calls inside the for loop add to the existing plot; they do not change its axis limits, which are fixed by the initial plot() call.
This means you have to set the ylim parameter in the plot function call. This will make the plot taller, so the lines will be fully visible when added.
Try like this:
plot(x, y, type="l",
main=paste("Distribution of size", nrow(data)/9000, sep=" "),
xlab="standard deviation",
ylim = c(0, 1)) # This line will make the plot higher, i.e. the y axis range will be from 0 to 1
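Instead of hard-coding the upper limit, the densities can be estimated before plotting and the limit taken from the tallest one. This is a sketch on made-up data in the same long layout as the question (columns df and t); the toy data, the three df values and the colours are assumptions for illustration.

```r
set.seed(1)
dfs <- c(2, 5, 25)   # illustrative degrees of freedom
# toy data: 1000 t-distributed draws per df value
data <- data.frame(df = rep(dfs, each = 1000),
                   t  = unlist(lapply(dfs, function(d) rt(1000, df = d))))

x <- seq(-4, 4, length.out = 100)
y <- dnorm(x)

# estimate every density first, then find the tallest peak overall
dens  <- lapply(dfs, function(d) density(data[data$df == d, "t"]))
peaks <- sapply(dens, function(d) max(d$y))
y_max <- max(y, peaks)

plot(x, y, type = "l", ylim = c(0, y_max), xlab = "standard deviation")
cols <- c("red", "darkgreen", "blue")
for (i in seq_along(dens)) lines(dens[[i]], col = cols[i])
legend("topright", lty = 1, col = c(cols, "black"),
       legend = c(dfs, "normal"), cex = 0.8)
```

Computing the densities once also avoids calling density() a second time inside the drawing loop.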