I want to calculate the integral of the normal distribution at exactly some point. I know that this is equivalent to evaluating the cumulative distribution at that point and at a point slightly after it: you then subtract the two values to get an approximate answer.
I tried doing this in R:
a = pnorm(1.96, mean = 0, sd = 1, log = FALSE)
b = pnorm(1.961, mean = 0, sd = 1, log = FALSE)
final_answer = b - a
#5.83837e-05
Is it possible to do this in one step instead of manually subtracting "a" and "b"?
Thank you!
We need to be clear about what you are asking here. If you are looking for the integral of the normal density up to a specific point, then you can use pnorm, which is the anti-derivative of dnorm.
We can see this by reversing the process and looking at the derivative of pnorm to ensure it matches dnorm:
# Numerical approximation to derivative of pnorm:
delta <- 10^-6
(pnorm(0.75 + delta) - pnorm(0.75)) / delta
#> [1] 0.3011373
Note that this is a very close approximation of dnorm:
dnorm(0.75)
#> [1] 0.3011374
So the anti-derivative of a normal distribution density at point x is given by:
pnorm(x)
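As a cross-check of the anti-derivative claim, the following sketch integrates dnorm numerically from -Inf up to a point and compares the result with pnorm. The point x0 = 0.75 is simply the value used above, and the variable names are chosen here for illustration. integrate() is base R's general-purpose quadrature routine, so the two numbers should agree only up to its numerical tolerance.

```r
x0 <- 0.75

# Area under the standard normal density from -Inf to x0, by quadrature
numeric_area <- integrate(dnorm, lower = -Inf, upper = x0)$value

# The same area from the cumulative distribution function
closed_form <- pnorm(x0)

# The difference should be tiny
abs(numeric_area - closed_form)
```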
You can try this:
> diff(pnorm(c(1.96, 1.961), mean = 0, sd = 1, log = FALSE))
[1] 5.83837e-05
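Two other one-step options may be worth a look; this is a sketch rather than a recommendation. integrate() performs numerical quadrature of the density over the small interval, and dnorm(x) * width is a first-order approximation that is reasonable only because the interval is narrow. The names lo and hi are chosen here for illustration. Note that the probability at exactly one point of a continuous distribution is zero, so some interval width is always implied.

```r
lo <- 1.96
hi <- 1.961

# Reference value: difference of the cumulative distribution at the endpoints
exact <- diff(pnorm(c(lo, hi)))

# Numerical integration of the density over [lo, hi]
by_quad <- integrate(dnorm, lower = lo, upper = hi)$value

# First-order approximation: density at the left endpoint times the width
by_dens <- dnorm(lo) * (hi - lo)

c(exact = exact, quadrature = by_quad, density_times_width = by_dens)
```

The density-times-width shortcut drifts away from the exact value as the interval gets wider, so the pnorm difference remains the safer choice.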
I'm working on a task where we look at 12 independent and identically distributed random variables, each of which has a standard normal distribution.
From that I understand we have a mean of 0 and sd of 1.
We then have an interval of (-1.644, 1.644)
To find the probability of a single random variable landing in this interval I write:
(pnorm(1.644, mean = 0, sd = 1, lower.tail=TRUE) - pnorm(-1.644, mean = 0, sd = 1, lower.tail=TRUE))
Which returns the Probability of 0.8998238
I'm able to find the probability of at least one of the 12 random variables landing outside of the interval (-1.644, 1.644) with the following:
PROB_1 = 1-(0.8998238^12)
#PROB_1 = 0.7182333
However, how would I find the probability of exactly 2 random variables landing outside of the interval? I've attempted the following:
((12*11)/2)*((1-0.7182333)^2)*(0.7182333^10)
I'm sure I'm missing something here, and there is a much easier way to solve this.
Any help is much appreciated.
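You need the binomial distribution, with the probability that a single variable lands outside the interval (1-prob, not PROB_1) as the success probability:
prob=pnorm(1.644, mean = 0, sd = 1, lower.tail=TRUE)-pnorm(-1.644, mean = 0, sd = 1, lower.tail=TRUE)
dbinom(2, 12, 1-prob)
prob^10 * (1-prob)^2 * choose(12, 2)
# both give 0.2304877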
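To connect this with the "at least one" result from the question, the sketch below treats the number of variables outside the interval as a Binomial(12, p_out) count and checks the "exactly two" value by simulation. The names p_in, p_out and outside, the seed, and the 20000 replications are choices made here for illustration.

```r
p_in  <- pnorm(1.644) - pnorm(-1.644)  # one variable inside (-1.644, 1.644)
p_out <- 1 - p_in                      # one variable outside
n     <- 12

# Exactly two of the twelve outside
exactly_two <- dbinom(2, size = n, prob = p_out)

# At least one outside: complement of "none outside"
at_least_one <- 1 - dbinom(0, size = n, prob = p_out)

# At least two outside, for comparison
at_least_two <- pbinom(1, size = n, prob = p_out, lower.tail = FALSE)

# Monte Carlo check: draw 12 standard normals many times and count outsiders
set.seed(1)
outside <- replicate(20000, sum(abs(rnorm(n)) > 1.644))
simulated_two <- mean(outside == 2)

c(exactly_two = exactly_two, simulated = simulated_two,
  at_least_one = at_least_one, at_least_two = at_least_two)
```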
I have to make a normal distribution from a set of pre-established data, henceforth xvec. I know I need to use dnorm(xvec,meanvec,sdvec), but what values should I use for mean and sd? Can I always use meanvec = mean(xvec) and sdvec = sd(xvec)? Is that a reasonable approach? Or is it preferable to keep the default values of mean=0 and sd=1?
I'm asking because in the examples I looked at, the values for mean and sd were always chosen beforehand. For example, this one, from https://www.tutorialspoint.com/r/r_normal_distribution.htm:
Create a sequence of numbers between -10 and 10 incrementing by 0.1.
x <- seq(-10, 10, by = .1)
Choose the mean as 2.5 and standard deviation as 0.5.
y <- dnorm(x, mean = 2.5, sd = 0.5)
Why did he put mean=2.5 and sd=0.5, given the following?
> mean(x)
[1] 5.105265e-16
> sd(x)
[1] 5.816786
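One way to read the tutorial example: x there is a grid of points at which the density is evaluated, not a data set, so mean(x) and sd(x) have no bearing on the parameters. When xvec holds actual observations, plugging in mean(xvec) and sd(xvec) is the usual way to fit a normal curve to it. The sketch below illustrates the distinction; xvec is simulated here as a stand-in for real data, and the names m_hat, s_hat and grid are chosen for illustration.

```r
set.seed(42)
xvec <- rnorm(500, mean = 2.5, sd = 0.5)  # stand-in for real data (assumption)

# Parameters estimated from the data
m_hat <- mean(xvec)
s_hat <- sd(xvec)

# Grid of evaluation points; its own mean and sd are irrelevant
grid <- seq(min(xvec), max(xvec), length.out = 200)
fitted_density <- dnorm(grid, mean = m_hat, sd = s_hat)

# To overlay the fitted curve on the data:
# hist(xvec, freq = FALSE); lines(grid, fitted_density)
c(m_hat = m_hat, s_hat = s_hat)
```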
How do I calculate medians in R and create a histogram for a normal distribution with mu=16 and sigma=4?
I think you probably want to draw a smaller sample from a population of 1000 observations. For that, you'll need the sample() function:
set.seed(12)
s1 <- sample(x = 1:1000, size = 10)
s2 <- sample(x = 1:1000, size = 40)
median(s1)
median(s2)
hist(s1)
hist(s2)
The second option is to go with rnorm(), a function that generates a random sample from a normal distribution based on the specified parameters.
set.seed(12)
s1 = rnorm(1000, mean = 0, sd = 1)
s2 = rnorm(1000, mean = 35, sd = 0.1)
median(s1)
median(s2)
hist(s1)
hist(s2)
P.S. I set the seed to have reproducible results. You may skip that line.
Note that for the second option we assume a normal (Gaussian) distribution.
Learn more about probability distributions here:
http://blog.cloudera.com/blog/2015/12/common-probability-distributions-the-data-scientists-crib-sheet/
n and u are numbers, not vectors. There must be some more information given, such as which distribution you're sampling from, and the population mean and standard deviation. For example, if you want to generate a sample of 1000 from a normal distribution with a mean 0 and sd 1, you would use
x = rnorm(1000, 0, 1)
From there you can plot the histogram and calculate the median:
median(x)
hist(x)
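Neither answer uses the mu=16 and sigma=4 from the question, so here is a minimal sketch with those values. The number of samples m and the sample size n are assumptions, since the question does not state them. Each sample is drawn with rnorm() and reduced to its median with replicate().

```r
set.seed(12)
m <- 200   # number of samples (assumption)
n <- 50    # observations per sample (assumption)

# One median per simulated sample from Normal(mu = 16, sigma = 4)
medians <- replicate(m, median(rnorm(n, mean = 16, sd = 4)))

summary(medians)

# Histogram of the medians; drop plot = FALSE to draw it on screen
h <- hist(medians, plot = FALSE)
h$counts
```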
I want to generate a random distribution of, say, 10,000 numbers with predefined min, max, mean, and sd values. I followed the question "setting upper and lower limits in rnorm" to get a random distribution with fixed min and max values. However, in doing so, the mean changes.
For example,
#Function to generate values between a lower limit and an upper limit.
mysamp <- function(n, m, s, lwr, upr, nnorm) {
  set.seed(1)
  samp <- rnorm(nnorm, m, s)
  samp <- samp[samp >= lwr & samp <= upr]
  if (length(samp) >= n) {
    return(sample(samp, n))
  }
  stop(simpleError("Not enough values to sample from. Try increasing nnorm."))
}
Account_Value <- mysamp(n=10000, m=1250000, s=4500000, lwr=50000, upr=5000000, nnorm=1000000)
summary(Account_Value)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 50060 1231000 2334000 2410000 3582000 5000000
#Note - though min and max values are good, mean value is very skewed for an obvious reason.
# sd(Account_Value) # 1397349
I am not sure whether we can generate a random normal distribution that meets all conditions. If there is any other sort of random distribution that can meet all conditions, please do share too.
Look forward to your inputs.
-Thank you.
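The shift in the mean can be reproduced without sampling. The sketch below applies the standard formula for the mean of a doubly truncated normal to the parameters used in the question. The names alpha, beta, z and trunc_mean are chosen here for illustration.

```r
m   <- 1250000
s   <- 4500000
lwr <- 50000
upr <- 5000000

# Truncation limits on the standard normal scale
alpha <- (lwr - m) / s
beta  <- (upr - m) / s

# Share of the parent normal that survives the truncation
z <- pnorm(beta) - pnorm(alpha)

# Mean of the normal distribution truncated to [lwr, upr]
trunc_mean <- m + s * (dnorm(alpha) - dnorm(beta)) / z

c(kept_share = z, trunc_mean = trunc_mean)
```

The result is roughly 2.4 million, in line with the summary() output above: because sd is so large relative to the interval, the kept part of the density is nearly flat and the mean drifts toward the middle of the interval.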
You could use a generalized form of the beta distribution, known as the Pearson type I distribution. The standard beta distribution is defined on the interval (0,1), but you can take a linear transformation of a standard beta distributed variable to obtain values between any (min, max). The answer to this question on CrossValidated explains how to parameterize a beta distribution with its mean and variance, with certain constraints.
While it's possible to formulate both a truncated normal and a generalized beta distribution with the desired min, max, mean and sd, the shapes of the two distributions will be very different. This is because the truncated normal distribution has a positive probability density at the endpoints of its support interval, while the generalized beta density falls to zero at the endpoints whenever both shape parameters exceed 1. Which shape is preferable will depend on your intended application.
Here's an implementation in R for generating generalized beta distributed observations with a mean, variance, min and max parameterization.
rgbeta <- function(n, mean, var, min = 0, max = 1)
{
  dmin <- mean - min
  dmax <- max - mean
  if (dmin <= 0 || dmax <= 0)
  {
    stop(paste("mean must be between min =", min, "and max =", max))
  }
  if (var >= dmin * dmax)
  {
    stop(paste("var must be less than (mean - min) * (max - mean) =", dmin * dmax))
  }
  # mean and variance of the standard beta distributed variable
  mx <- (mean - min) / (max - min)
  vx <- var / (max - min)^2
  # find the corresponding alpha-beta parameterization
  a <- ((1 - mx) / vx - 1 / mx) * mx^2
  b <- a * (1 / mx - 1)
  # generate standard beta observations and transform
  x <- rbeta(n, a, b)
  y <- (max - min) * x + min
  return(y)
}
set.seed(1)
n <- 10000
y <- rgbeta(n, mean = 1, var = 4, min = -4, max = 5)
sapply(list(mean, sd, min, max), function(f) f(y))
# [1] 0.9921269 2.0154131 -3.8653859 4.9838290
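The variance check inside rgbeta reflects a general bound, known as the Bhatia-Davis inequality: no distribution on [min, max] with a given mean can have variance above (mean - min) * (max - mean). As a sketch, the code below applies that bound to the values from the question, on the assumption that m=1250000 and s=4500000 were the mean and sd actually wanted. The variable names are chosen here for illustration.

```r
min_v  <- 50000
max_v  <- 5000000
mean_v <- 1250000
sd_v   <- 4500000

# Largest standard deviation any distribution on [min_v, max_v]
# with this mean can have
max_sd <- sqrt((mean_v - min_v) * (max_v - mean_v))
max_sd

# Is the requested sd attainable at all?
feasible <- sd_v < max_sd
feasible
```

Under that assumption the requested sd exceeds the bound, so no bounded distribution, beta or truncated normal, could meet all four targets at once; either the sd or the limits would have to give.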
Discussion:
Hi. This is a very interesting problem. It takes quite an effort to solve properly, and a solution cannot always be found.
First, when you truncate a distribution (set a min and max for it), the standard deviation is limited: it has a maximum that depends on the min and max values. If you ask for a value that is too big, you cannot get it.
Second, the mean is restricted. Obviously a mean below the minimum or above the maximum will not work, but even a mean that is merely too close to the limits may not be achievable.
Third, the combination of these parameters is restricted. I'm not sure exactly how this works, but I am pretty sure that not all combinations can be satisfied.
But there are some combinations that may work and may be found.
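The first restriction can be illustrated numerically. As the sd of the parent normal grows, the truncated distribution flattens toward a uniform on [a, b], whose standard deviation is (b - a) / sqrt(12). The sketch below uses a hypothetical helper, rtn, that samples a truncated normal by the inverse-CDF method in base R; the limits 1 and 6, the parent mean 3.5 and the parent sd values are arbitrary choices for illustration.

```r
set.seed(1)
a <- 1
b <- 6

# Hypothetical helper: truncated normal draws via the inverse CDF
rtn <- function(n, mean, sd, a, b) {
  u <- runif(n, pnorm(a, mean, sd), pnorm(b, mean, sd))
  qnorm(u, mean, sd)
}

# Sample sd of the truncated draws for increasingly wide parent normals
parent_sd <- c(0.5, 1, 5, 1000)
truncated_sd <- sapply(parent_sd, function(s) sd(rtn(1e5, 3.5, s, a, b)))

# Standard deviation of the uniform distribution on [a, b]
uniform_sd <- (b - a) / sqrt(12)

rbind(parent_sd, truncated_sd)
uniform_sd
```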
Solution:
The problem is: which parameters, mean and sd, of the untruncated normal distribution with truncation limits a and b produce a truncated distribution whose mean equals desired_mean and whose standard deviation equals desired_sd?
It is important that the parameters mean and sd apply before truncation. That is why the resulting mean and standard deviation differ from them.
Below is code that solves the problem using optim(). It may not be the best solution for this problem, but it generally works:
library(truncnorm)
# n, a, b, desired_mean and desired_sd are read from the global environment
eval_function <- function(mean_sd){
  mean <- mean_sd[1]
  sd <- mean_sd[2]
  sample <- rtruncnorm(n = n, a = a, b = b, mean = mean, sd = sd)
  mean_diff <- abs((desired_mean - mean(sample))/desired_mean)
  sd_diff <- abs((desired_sd - sd(sample))/desired_sd)
  mean_diff + sd_diff
}
n = 1000
a <- 1
b <- 6
desired_mean <- 3
desired_sd <- 1
set.seed(1)
o <- optim(c(desired_mean, desired_sd), eval_function)
new_n <- 10000
your_sample <- rtruncnorm(n = new_n, a = a, b = b, mean = o$par[1], sd = o$par[2])
mean(your_sample)
sd(your_sample)
min(your_sample)
max(your_sample)
eval_function(c(o$par[1], o$par[2]))
I am very interested if there are other solutions to that problem, so please post them if you find other answers.
EDIT:
@Mikko Marttila: Thanks to your comment and the Wikipedia link, I implemented the formulas for the mean and sd of a truncated distribution. The solution is now far more elegant, and it should find the parameters for the desired mean and sd quite accurately if they exist. It is also much faster.
I implemented eval_function2, which should be passed to optim() instead of the previous function:
eval_function2 <- function(mean_sd){
  mean <- mean_sd[1]
  sd <- mean_sd[2]
  # standardized truncation limits
  alpha <- (a - mean)/sd
  betta <- (b - mean)/sd
  # probability mass of the parent normal inside [a, b]
  z <- pnorm(betta, 0, 1) - pnorm(alpha, 0, 1)
  trunc_mean <- mean + sd * (dnorm(alpha, 0, 1) - dnorm(betta, 0, 1)) / z
  # the last term is squared in the truncated normal variance formula
  trunc_var <- (sd ^ 2) *
    (1 +
       (alpha * dnorm(alpha, 0, 1) - betta * dnorm(betta, 0, 1)) / z -
       ((dnorm(alpha, 0, 1) - dnorm(betta, 0, 1)) / z) ^ 2)
  trunc_sd <- trunc_var ^ 0.5
  mean_diff <- abs((desired_mean - trunc_mean)/desired_mean)
  sd_diff <- abs((desired_sd - trunc_sd)/desired_sd)
  # return the combined relative error for optim() to minimize
  mean_diff + sd_diff
}
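For reference, here is a self-contained sketch of the same idea that does not rely on global variables or on the truncnorm package. It is not the answer's exact code: trunc_moments and solve_parent are hypothetical names, and the loss uses squared rather than absolute relative errors, a swap made here because a smooth objective tends to suit optim()'s default Nelder-Mead search better. The targets (mean 3, sd 1 on [1, 6]) are the ones used above.

```r
# Closed-form mean and sd of a Normal(mean, sd) truncated to [a, b]
trunc_moments <- function(mean, sd, a, b) {
  alpha <- (a - mean) / sd
  beta  <- (b - mean) / sd
  z     <- pnorm(beta) - pnorm(alpha)
  ratio <- (dnorm(alpha) - dnorm(beta)) / z
  tmean <- mean + sd * ratio
  tvar  <- sd^2 * (1 + (alpha * dnorm(alpha) - beta * dnorm(beta)) / z - ratio^2)
  c(mean = tmean, sd = sqrt(tvar))
}

# Search for parent parameters whose truncated moments hit the targets
solve_parent <- function(desired_mean, desired_sd, a, b) {
  loss <- function(par) {
    if (par[2] <= 0) return(1e10)  # sd must be positive
    mom <- suppressWarnings(trunc_moments(par[1], par[2], a, b))
    val <- ((mom[["mean"]] - desired_mean) / desired_mean)^2 +
      ((mom[["sd"]] - desired_sd) / desired_sd)^2
    if (!is.finite(val)) return(1e10)  # guard against degenerate regions
    val
  }
  optim(c(desired_mean, desired_sd), loss)
}

fit <- solve_parent(desired_mean = 3, desired_sd = 1, a = 1, b = 6)

# Parent parameters found, and the truncated moments they produce
fit$par
achieved <- trunc_moments(fit$par[1], fit$par[2], a = 1, b = 6)
achieved
```

The values in fit$par can then be passed as mean and sd to a truncated normal sampler such as rtruncnorm(). If the targets are infeasible, optim() will still return something, so it is worth comparing achieved against the targets before using the result.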