R prob = TRUE in hist doesn't give out a density distribution - r

earthquake <- function(lambda=1, n_sim=10, n=100){
  meanls <- c()
  for (i in 1:n){
    # mean of n_sim exponential draws with mean lambda, rounded to 2 decimals
    meanls <- c(meanls, round(mean(rexp(n_sim, 1/lambda)), 2))
  }
  meanls
}
xbar <- earthquake(2.4, 1000, 40)
hist(xbar, prob=TRUE, col="moccasin", las=TRUE)
I have the code above. It should return a density histogram since I set prob to TRUE, but I just get a frequency diagram. Is there anything else I should do with the data?

If you set a random seed you can replicate your results; otherwise you will need to adjust xlim= according to your data. You do not say why you are using sd=2.4/sqrt(40) as the standard deviation instead of sd(xbar), which is what I have used here. Using 2.4/sqrt(40) produces a very broad, flat curve that does not match the data at all. If you wanted the standard error curve, that would be sd(xbar)/sqrt(40).
xbar <- earthquake(2.4, 1000, 40)
range(xbar)
# [1] 2.19 2.59
hist(xbar, prob=TRUE, xlim=c(2.1, 2.7), col="moccasin", las= TRUE)
x <- seq(2.1, 2.7, length.out=100)
curve(dnorm(x, mean=mean(xbar), sd=sd(xbar)), col="blue", add=TRUE, lwd=2)
lines(density(xbar), col="red", lwd = 2)
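On the original question of whether prob=TRUE had any effect: with prob=TRUE the bar heights are densities, so they can exceed 1 when the data span a narrow range, and it is the bar areas that sum to 1. A minimal sketch of that check, using simulated exponential means as an assumed stand-in for xbar (the name xbar_demo and the seed are illustrative only):

```r
set.seed(1)  # arbitrary seed, only for reproducibility
# 40 means of 1000 exponential draws each (assumed stand-in for xbar)
xbar_demo <- replicate(40, mean(rexp(1000, rate = 1 / 2.4)))

h <- hist(xbar_demo, plot = FALSE)           # compute the bins without drawing
bar_area <- sum(h$density * diff(h$breaks))  # total area of the density bars
bar_area
max(h$density) > 1                           # heights are densities, not probabilities
```

If the y axis shows values well above 1 for data like these, the histogram is already on the density scale.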


Variables in R how, central limit

I have the following task:
Assume the population of interest can be modeled by a Bernoulli distribution with p = 0.5.
For each sample size n, simulate r = 5,000 draws (by using a for loop over (i in 1:r)) from that Bernoulli distribution with p = 0.5 and calculate the standardized sample mean for each draw.
The last histogram looks good with a curve, but the 1st and 2nd are wrong. Maybe someone can help me with this. Thanks in advance for your time!
I have done the following:
x1 <- rbinom(5000,3,0.5)
hist(x1,
     main=expression(paste(" Random Variables with",size,"=1 and",prob,"=0.5")),
     sub="Standardized value of sample average",
     xlab="n=3", ylab="Probability", probability = TRUE)
curve(dnorm(x, mean = mean(x), sd=sd(x)), add = TRUE, col="blue")
Essentially what happened in the first two panels is that for a small n the histogram breaks were calculated in an ungraceful manner. You can fix that by letting the breaks depend on the data range. Here, I chose the breaks depending on whether the range of the data was smaller than 10. If this is TRUE, manually calculate breaks, otherwise use the default "Sturges" algorithm for breaks.
N <- c(2, 5, 25, 100)
for (i in seq_along(N)) {
  set.seed(2015 + i)
  n <- N[i]
  xx <- rbinom(10000, n, 0.78)
  if (diff(range(xx)) < 10) {
    # small range: one break per integer value
    breaks <- seq(floor(min(xx)), ceiling(max(xx)))
  } else {
    breaks <- "Sturges"
  }
  hist(
    x = xx, breaks = breaks,
    main = expression(paste("Bernoulli Random Variables with",size,"=1 and",prob,"=0.78")),
    sub = "Standardized value of sample average",
    xlab = paste0("n=",n), ylab = "Probability", probability = TRUE
  )
  curve(dnorm(x, mean = mean(xx), sd = sd(xx)), add = TRUE, col = "blue")
}
Created on 2021-01-07 by the reprex package (v0.3.0)
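For the task as stated in the question (standardized sample means of Bernoulli draws), a minimal sketch could look like the following. The sample size n = 30 and the seed are assumptions chosen for illustration; the standardization uses the known Bernoulli mean p and variance p(1 - p).

```r
set.seed(42)          # arbitrary seed
p <- 0.5
r <- 5000             # number of simulated samples, as in the task
n <- 30               # one example sample size (an assumption)

z <- numeric(r)
for (i in 1:r) {
  draws <- rbinom(n, size = 1, prob = p)             # n Bernoulli(p) draws
  z[i] <- (mean(draws) - p) / sqrt(p * (1 - p) / n)  # standardized sample mean
}

hist(z, probability = TRUE, main = paste0("n = ", n), xlab = "z")
curve(dnorm(x), add = TRUE, col = "blue")            # standard normal reference
```

Because z is standardized, the reference curve is the standard normal for every n, which makes panels for different sample sizes directly comparable.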

Overlay a Normal curve to Histogram

I repeat rnorm 50 times with n=100, mean=100 and sd=25. Then I plot the histogram of all the sample means, but now I need to overlay a normal curve over the histogram.
x <- replicate(50, rnorm(100, 100, 25), simplify = FALSE)
sapply(x, mean)
sapply(x, sd)
hist(sapply(x, mean))
Do you know how to overlay a normal curve over the histogram of the means?
When we plot the density rather than the frequency histogram by setting freq=FALSE, we may overlay a curve of a normal distribution with the mean of the means. For the xlim of the curve we use the range of the means.
mean.of.means <- mean(sapply(x, mean))
r <- range(sapply(x, mean))
v <- hist(sapply(x, mean), freq=FALSE, xlim=r, ylim=c(0, .5))
curve(dnorm(x, mean=mean.of.means, sd=1), r[1], r[2], add=TRUE, col="red")
It is also possible to draw a sufficiently large sample from a normal distribution and overlay its estimated density on the histogram with lines().
lines(density(rnorm(1e6, mean.of.means, 1)))
Note that I have used 500 mean values in my answer, since the comparison with a normal distribution may become meaningless with too few values. However, you can play with the breaks= option in the histogram function.
x <- replicate(500, rnorm(100, 100, 25), simplify = FALSE)
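The curve above uses a fixed sd = 1. As an alternative sketch (not the method of the answer above), the spread of the curve can be taken from theory, where a mean of 100 draws with sd 25 has standard deviation 25/sqrt(100), or estimated from the simulated means themselves. The names means and se_theory are illustrative.

```r
set.seed(123)  # arbitrary seed
means <- replicate(500, mean(rnorm(100, mean = 100, sd = 25)))
se_theory <- 25 / sqrt(100)   # standard deviation of a mean of 100 draws

hist(means, freq = FALSE)
# theoretical sampling distribution of the mean
curve(dnorm(x, mean = 100, sd = se_theory), add = TRUE, col = "red")
# normal curve fitted to the simulated means
curve(dnorm(x, mean = mean(means), sd = sd(means)), add = TRUE, col = "blue", lty = 2)
```

With 500 means the two curves should lie close to each other.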

Create sample vector data in R with a skewed distribution with limited range

I want to create a sample vector of data in R in which I can control the range of values selected. I think I want to use sample to limit the range of values generated, rather than an rnorm-type command that generates a range of values based on the type of distribution, variance, SD, etc.
So I'm looking to do a sample with a specified range (e.g. 1-5) for a skewed distribution something like this:
Here's what I have but does not provide a skewed distribution:
y=sample(1:5,234, replace=T)
How can I have my cake (limited range) and eat it too (skewed distribution), so to speak?
hist(sample(1:10, size = 100, replace = TRUE, prob = 10:1))
The beta distribution takes values from 0 to 1. If you want your values to be from 0 to 5 for instance, then you can multiply them by 5. Finally, you can get a "skewness" with the beta distribution.
For example, for the skewness you can get these three types:
Using R and the beta distribution you can get similar distributions as follows. Notice that the green vertical line marks the mean and the red one the median:
x= rbeta(10000,5,2)
hist(x, main="Negative or Left Skewness", freq=FALSE)
lines(density(x), col='red', lwd=3)
abline(v = c(mean(x),median(x)), col=c("green", "red"), lty=c(2,2), lwd=c(3, 3))
x= rbeta(10000,2,5)
hist(x, main="Positive or Right Skewness", freq=FALSE)
lines(density(x), col='red', lwd=3)
abline(v = c(mean(x),median(x)), col=c("green", "red"), lty=c(2,2), lwd=c(3, 3))
x= rbeta(10000,5,5)
hist(x, main="Symmetrical", freq=FALSE)
lines(density(x), col='red', lwd=3)
abline(v = c(mean(x),median(x)), col=c("green", "red"), lty=c(2,2), lwd=c(3, 3))
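To map beta draws onto the range asked for in the question, the values can be shifted and scaled. This is a minimal sketch assuming a target range of 1 to 5 and the right-skewed shape parameters (2, 5) from above; lo, hi and y are illustrative names.

```r
set.seed(7)                             # arbitrary seed
lo <- 1; hi <- 5                        # target range from the question
y <- lo + (hi - lo) * rbeta(234, 2, 5)  # right-skewed values inside [1, 5]
range(y)
hist(y, freq = FALSE, main = "Right-skewed, limited to [1, 5]")
```

If integer values are needed, round(y) keeps them within 1 to 5 but changes the shape slightly at the ends.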
To better see what the sample function is doing with integers, use the barplot function, not the histogram function:
barplot(table(sample(1:10, size = 100, replace = TRUE, prob = 10:1)))
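The prob argument of sample() takes weights that need not sum to 1, so prob = 10:1 makes the value 1 ten times as likely as the value 10. A short sketch comparing the implied probabilities with observed proportions (the large sample size is only there to make the comparison stable):

```r
set.seed(99)                     # arbitrary seed
w <- 10:1                        # weights, as in the example above
p_expected <- w / sum(w)         # weights rescaled to probabilities
draws <- sample(1:10, size = 1e5, replace = TRUE, prob = w)
p_observed <- as.numeric(table(factor(draws, levels = 1:10))) / length(draws)
round(cbind(value = 1:10, expected = p_expected, observed = p_observed), 3)
```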

Getting values from kernel density estimation in R

I am trying to get density estimates for the log of stock prices in R. I know I can plot it using plot(density(x)). However, I actually want values for the function.
I'm trying to implement the kernel density estimation formula. Here's what I have so far:
a <- read.csv("boi_new.csv", header=FALSE)
S = a[,3] # takes column of increments in stock prices
dS=S[!is.na(S)] # omits first empty field
N = length(dS) # Sample size
rseed = 0 # Random seed
x = rep(c(1:5),N/5) # Inputted data
set.seed(rseed) # Sets random seed for reproducibility
QL <- function(dS){
  h = density(dS)$bandwidth
  r = log(dS^2)
  f = 0*x
  for(i in 1:N){
    f[i] = 1/(N*h) * sum(dnorm((x-r[i])/h))
  }
  f  # return the vector of estimates
}
Any help would be much appreciated. Been at this for days!
You can pull the values directly from the density function:
x = rnorm(100)
d = density(x, from=-5, to = 5, n = 1000)
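The object returned by density() stores the evaluation grid in d$x and the estimated values in d$y. A small sketch of reading them out, and of turning them into a function with approxfun() for evaluation at arbitrary points (f_hat is an illustrative name):

```r
set.seed(0)                      # arbitrary seed
x <- rnorm(100)
d <- density(x, from = -5, to = 5, n = 1000)

head(cbind(x = d$x, y = d$y))    # grid points and estimated density values
f_hat <- approxfun(d$x, d$y)     # linear interpolation between grid points
f_hat(c(-1, 0, 1))               # density estimate at chosen points
```

Outside the range from to to, f_hat returns NA by default.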
Alternatively, if you really want to write your own kernel density function, here's some code to get you started:
Set the points z and x range:
z = c(-2, -1, 2)
x = seq(-5, 5, 0.01)
Now we'll add the points to a graph
plot(0, 0, xlim=c(-5, 5), ylim=c(-0.02, 0.8),
pch=NA, ylab="", xlab="z")
for(i in 1:length(z)) {
  points(z[i], 0, pch="X", col=2)
}
Put normal densities around each point:
## Now we combine the kernels,
x_total = numeric(length(x))
for(i in 1:length(x_total)) {
  for(j in 1:length(z)) {
    x_total[i] = x_total[i] +
      dnorm(x[i], z[j], sd=1)
  }
}
and add the curves to the plot:
lines(x, x_total, col=4, lty=2)
Finally, calculate the complete estimate:
## Just as a histogram is the sum of the boxes,
## the kernel density estimate is just the sum of the bumps.
## All that's left to do, is ensure that the estimate has the
## correct area, i.e. in this case we divide by n = 3:
plot(x, x_total/3,
xlim=c(-5, 5), ylim=c(-0.02, 0.8),
ylab="", xlab="z", type="l")
This corresponds to
density(z, adjust=1, bw=1)
The plots above give:
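The double loop can also be written in vectorised form and checked numerically against density(). This is a sketch under the same settings as above (three points, normal kernel, bandwidth 1); bumps, kde_manual and max_diff are illustrative names.

```r
z <- c(-2, -1, 2)                 # data points, as above
x <- seq(-5, 5, 0.01)             # evaluation grid, as above
h <- 1                            # bandwidth (sd of each normal bump)

# one column of kernel values per data point, then average across points
bumps <- sapply(z, function(zj) dnorm(x, mean = zj, sd = h))
kde_manual <- rowMeans(bumps)

# built-in estimate on the same grid for comparison
d <- density(z, bw = h, from = -5, to = 5, n = length(x))
max_diff <- max(abs(kde_manual - d$y))
max_diff   # small; density() works from a binned approximation
```

The two estimates are not expected to agree to machine precision, because density() bins the data and interpolates rather than summing kernels exactly.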

Plot weighted frequency matrix

This question is related to two different questions I have asked previously:
1) Reproduce frequency matrix plot
2) Add 95% confidence limits to cumulative plot
I wish to reproduce this plot in R:
I have got this far, using the code beneath the graphic:
#Set the number of bets and number of trials and % lines
numbet <- 36
numtri <- 1000
#Fill a matrix where the rows are the cumulative bets and the columns are the trials
xcum <- matrix(NA, nrow=numbet, ncol=numtri)
for (i in 1:numtri) {
  x <- sample(c(0,1), numbet, prob=c(5/6,1/6), replace = TRUE)
  xcum[,i] <- cumsum(x)/(1:numbet)
}
#Plot the trials as transparent lines so you can see the build up
matplot(xcum, type="l", xlab="Number of Trials", ylab="Relative Frequency", main="", col=rgb(0.01, 0.01, 0.01, 0.02), las=1)
My question is: How can I reproduce the top plot in one pass, without plotting multiple samples?
You can produce this plot by using this code:
boring <- function(x, occ) occ/x
boring_seq <- function(occ, length.out){
  x <- seq(occ, length.out=length.out)
  data.frame(x = x, y = boring(x, occ))
}
numbet <- 31
odds <- 6
plot(1, 0, type="n",
     xlim=c(1, numbet + odds), ylim=c(0, 1),
     main="Frequency matrix",
     xlab="Successive occasions",
     ylab="Relative frequency"
)
axis(2, at=c(0, 0.5, 1))
for(i in 1:odds){
  xy <- boring_seq(i, numbet+1)
  lines(xy$x, xy$y, type="o", cex=0.5)
}
for(i in 1:numbet){
  xy <- boring_seq(i, odds+1)
  lines(xy$x, 1-xy$y, type="o", cex=0.5)
}
You can also use Koshke's method, limiting the combinations of values to those with s < 6. At Andrie's request I added the condition on the difference of ps$n and ps$s to get a "pointed" configuration.
library(plyr)  # provides ldply()
ps <- ldply(0:35, function(i) data.frame(s=0:i, n=i))
plot.new()     # a plot must be open before plot.window() is called
plot.window(c(0,36), c(0,1))
apply(ps[ps$s<6 & ps$n - ps$s < 30, ], 1, function(x){
  s <- x[1]; n <- x[2]
  lines(c(n, n+1, n, n+1), c(s/n, s/(n+1), s/n, (s+1)/(n+1)), type="o")})
lines(6:36, 6/(6:36), type="o")
# need to fill in the unconnected points on the upper frontier
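If plyr is not available, the grid of (s, n) pairs can be built in base R. This is a sketch of an equivalent construction, not part of the answer above; keep is an illustrative name for the filtered rows.

```r
# all (s, n) pairs with 0 <= s <= n for n = 0..35, without plyr
ps <- do.call(rbind, lapply(0:35, function(i) data.frame(s = 0:i, n = i)))
keep <- ps[ps$s < 6 & ps$n - ps$s < 30, ]   # same filter as above
nrow(ps)
head(keep)
```

The filtered rows in keep can then be passed to the same apply() call that draws the line segments.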
A weighted frequency matrix is also called a position weight matrix (PWM) in bioinformatics.
It can be represented in the form of a sequence logo.
This is at least how I plot a weighted frequency matrix.
data(motifPWM); attributes(motifPWM) # Loads a sample position weight matrix (PWM) containing 8 positions.
plot(motifPWM) # Plots the PWM as sequence logo.
