According to ?runif, this function will not generate either of the min or max bounds. How can I do something like runif, but including min and max?
This is just for pure theory. I was wondering: what if I actually needed to randomly generate some values from a uniform distribution, including the lower bound?
You can write your own uniform distribution function that includes the endpoints using the sample function:
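myrunif <- function(n, min = 0, max = 1) {
  # Draw integers from 1..integer.max (with replacement, so draws are independent)
  # and rescale them linearly onto [min, max]
  k <- sample(.Machine$integer.max, n, replace = TRUE)
  min + (k - 1) / (.Machine$integer.max - 1) * (max - min)
}
With this function, each endpoint has a small probability, 1/.Machine$integer.max, of being returned.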
However, it's worthwhile remembering that mathematically the probability of drawing either a or b (or any particular value) from a U(a, b) random variable is 0, so the current behavior of runif makes a lot of sense.
In pure theory the probability of any single value being generated from a continuous distribution will be 0, so the probability of min or max is 0.
From a practical standpoint, if you really want to generate a uniform (which will round to a finite set of values, each therefore having probability greater than 0 of being seen) with the possibility of seeing the desired min and max values, then just generate a uniform between min-epsilon and max+epsilon. Now min and max are in the range and have a chance of being chosen just like the other values. You just need to choose a value of epsilon such that values between min-epsilon and min will round to min, and similarly for the max.
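As a rough sketch of the epsilon idea above (runif_closed and its digits argument are hypothetical names, not part of base R), one could round to a fixed number of decimal places and take epsilon to be half a rounding step, so that min and max each get about the same probability as any interior grid point:

```r
# Hypothetical helper: uniform draws on a grid of `digits` decimal places,
# with both endpoints reachable
runif_closed <- function(n, min = 0, max = 1, digits = 3) {
  eps <- 0.5 * 10^(-digits)                     # half a rounding step
  x <- round(runif(n, min - eps, max + eps), digits)
  pmin(pmax(x, min), max)                       # guard against floating-point overshoot
}

set.seed(1)
x <- runif_closed(1e5, min = 0, max = 1, digits = 2)
range(x)
```

With only two decimal places there are 101 possible values, so both 0 and 1 show up many times in 100,000 draws. This assumes min and max themselves lie on the rounding grid.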
I'm a newbie in statistics and I'm studying R.
I decided to do this exercise to practice some analysis with an original dataset.
This is the issue: I want to create a dataset of, let's say, 100 subjects, and for each one of them I have a test score.
This test score has a range that goes from 0 to 70 and the mean score is 48 (and it's improbable that someone scores 0).
Firstly I tried to create the set with x <- round(runif(100, min=0, max=70)) , but then I found out, using plot(x), that the values were not normally distributed.
So I searched for another R command and found this, but I couldn't decide the min/max:
ex1 <- round(rnorm(100, mean=48 , sd=5))
I really can't understand what I have to do!
I would like to write a function that gives me a set of data normally distributed, in a range of 0-70, with a mean of 48 and a not so big standard deviation, in order to do some t-tests later...
Any help?
Thanks a lot in advance guys
The normal distribution, by definition, does not have a min or max. If you go more than a few standard deviations from the mean, the probability density is very small, but not 0. You can truncate a normal distribution, chopping off the tails. Here, I use pmin and pmax to set any values below 0 to 0, and any values above 70 to 70:
ex1 <- round(rnorm(100, mean=48 , sd=5))
ex1 <- pmin(ex1, 70)
ex1 <- pmax(ex1, 0)
You can calculate the probability of an individual observation being below or above a certain point using pnorm. For your mean of 48 and SD of 5, the probability an individual observation is less than 0 is very small:
pnorm(0, mean = 48, sd = 5)
# [1] 3.997221e-22
This probability is so small that the truncation step is unnecessary in most applications. But if you started experimenting with bigger standard deviations, or mean values closer to the bounds, it could become necessary.
This method of truncation is simple, but it is a bit of a hack. If you truncated a distribution to be within 1 SD of the mean using this method, you would end up with spikes at the upper and lower bound that are even higher than the density at the mean! But it should work well enough for less extreme applications. A more robust method might be to draw more samples than you need, and keep the first n samples that fall within your bounds. If you really care to do things right, there are packages that implement truncated normal distributions.
(Because the normal distribution is symmetric, and 70 is closer to your mean than 0, the probability of an observation above 70 is larger, though still small: pnorm(70, mean = 48, sd = 5, lower.tail = FALSE) is roughly 5e-6.)
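The "draw more than you need and keep what falls within the bounds" idea could look roughly like this. It is only a sketch: rnorm_trunc is a made-up name, and it is not how any particular truncated-normal package works.

```r
# Hypothetical helper: rejection sampling from a normal restricted to [lower, upper]
rnorm_trunc <- function(n, mean = 0, sd = 1, lower = -Inf, upper = Inf) {
  out <- numeric(0)
  while (length(out) < n) {
    draw <- rnorm(n, mean = mean, sd = sd)
    out <- c(out, draw[draw >= lower & draw <= upper])  # keep in-range draws only
  }
  out[seq_len(n)]                                       # first n accepted values
}

set.seed(1)
scores <- round(rnorm_trunc(100, mean = 48, sd = 5, lower = 0, upper = 70))
range(scores)
```

Unlike the pmin/pmax approach, this produces no spikes at the bounds, but it gets slow if the bounds leave only a small share of the distribution (for example, bounds far out in one tail).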
I would like to generate random numbers with both a specified mean & sd, AND a specified min and/or max (i.e., give me 100 random numbers between 0 and 50 with a mean=20 and sd=10; OR give me 100 random numbers with min=10, mean=25, sd=15).
I can specify a mean & sd in rnorm (and a min & max in runif).
I can specify a range in sample (though I don't think I can specify ONLY a min or a max)
I need something where I can specify both, in R.
Thanks!
I can generate numbers with uniform distribution by using the code below:
runif(1,min=10,max=20)
How can I sample randomly generated numbers that fall more frequently closer to the minimum and maximum boundaries? (Aka an "upside down bell curve")
Well, a bell curve is usually Gaussian, meaning it doesn't have a min and max. You could try the Beta distribution and map it to the desired interval, along these lines:
min <- 1
max <- 20
q <- min + (max-min)*rbeta(10000, 0.5, 0.5)
As @Gregor-reinstateMonica noted, the Beta distribution is bounded on both ends, [0...1], so it can easily be mapped onto any bounded interval just by scaling and shifting. It has two parameters, and it is symmetric if those parameters are equal. Parameters above 1 make it a kind of bell-shaped distribution, while parameters below 1 turn it into an inverse bell, which is what you're looking for. You can play with them, putting different values in place of 0.5, and see how it behaves. Parameters equal to 1 make it uniform.
Sampling from a beta distribution is a good idea. Another way is to sample a number of uniform numbers and then take the minimum or maximum of them.
According to the theory of order statistics, the cumulative distribution function for the maximum is F(x)^n where F is the cdf from which the sample is taken and n is the number of samples, and the cdf for the minimum is 1 - (1 - F(x))^n. For a uniform distribution, the cdf is a straight line from 0 to 1, i.e., F(x) = x, and therefore the cdf of the maximum is x^n and the cdf of the minimum is 1 - (1 - x)^n. As n increases, these become more and more curved, with most of the mass close to the ends.
A web search for "order statistics" will turn up some resources.
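A minimal sketch of the order-statistics approach follows. The function name r_ends, the choice k = 3, and the 50/50 coin flip between minimum and maximum (used here so that both ends get extra mass) are assumptions for illustration, not something the theory dictates.

```r
# Hypothetical helper: draws that pile up near both ends of [lo, hi]
r_ends <- function(n, lo = 0, hi = 1, k = 3) {
  u  <- matrix(runif(n * k), nrow = n)   # k uniforms per output value
  mx <- apply(u, 1, max)                 # cdf x^k: mass near 1
  mn <- apply(u, 1, min)                 # cdf 1 - (1 - x)^k: mass near 0
  use_max <- runif(n) < 0.5              # assumed 50/50 mix of the two
  lo + (hi - lo) * ifelse(use_max, mx, mn)
}

set.seed(1)
x <- r_ends(10000, lo = 10, hi = 20, k = 3)
hist(x, breaks = 40)
```

Larger k pushes more of the mass towards the boundaries; k = 1 gives back the plain uniform.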
If you don't care about decimal places, a hacky way would be to generate a large sample of normally distributed data points using rnorm(), then count the number of times each rounded value appears (n), and then subtract n from the maximum value of n (max(n)) to get inverse counts.
You can then use the inverse count to make a new vector (that you can sample from), i.e.:
library(tidyverse)
x <- rnorm(100000, 100, 15)
x_tib <- round(x) %>%
  tibble(x = .) %>%
  count(x) %>%
  mutate(new_n = max(n) - n)
new_x <- rep(x_tib$x, x_tib$new_n)
qplot(new_x, binwidth = 1)
An "upside-down bell curve" compared to the normal distribution can be sampled using the following algorithm. I write it in pseudocode because I'm not familiar with R. Notice that this sampler samples in a truncated interval (here, the interval [x0, x1]) because it's not possible for an upside-down bell curve extended to infinity to integrate to 1 (which is one of the requirements for a probability density).
In the pseudocode, RNDU01() is a uniform(0, 1) random number.
x0pdf = 1-exp(-(x0*x0))
x1pdf = 1-exp(-(x1*x1))
ymax = max(x0pdf, x1pdf)
while true
    # Choose a random x-coordinate
    x=RNDU01()*(x1-x0)+x0
    # Choose a random y-coordinate
    y=RNDU01()*ymax
    # Return x if y falls within PDF
    if y < 1-exp(-(x*x)): return x
end
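For R users, the pseudocode above might translate to something like the following sketch. The name r_inv_bell and the default interval [-2, 2] are assumptions, and the loop is vectorised (it proposes n candidates per pass) rather than returning one value at a time.

```r
# Hypothetical helper: rejection sampler for the density proportional to 1 - exp(-x^2) on [x0, x1]
r_inv_bell <- function(n, x0 = -2, x1 = 2) {
  shape <- function(x) 1 - exp(-x^2)           # unnormalised PDF from the pseudocode
  ymax <- max(shape(x0), shape(x1))            # the shape peaks at an endpoint
  out <- numeric(0)
  while (length(out) < n) {
    x <- runif(n, x0, x1)                      # random x-coordinates
    y <- runif(n, 0, ymax)                     # random y-coordinates
    out <- c(out, x[y < shape(x)])             # keep x where y falls under the PDF
  }
  out[seq_len(n)]
}

set.seed(1)
z <- r_inv_bell(10000)
hist(z, breaks = 40)
```

The interval must contain points other than 0 (where the shape is 0), otherwise nothing is ever accepted.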
I want to generate some data which correspond to a quantile function. But the data need a min and a max value.
set.seed(30)
a1<-950 ; a2<-0; a3<-2.48; a4<-1.92
invcdf<-function (x)(a1+a2*a3*((-log(x))^(1/a4)))/(a3*((-log(x))^(1/a4))+1)
t<-invcdf(runif(2000,min=80,max=800))
When I use min and max in the runif function NaN's are produced.
How can I improve this code to avoid NaN's? I can't change the parameters.
Since you don't explain what exactly you are trying to do (which distribution are you trying to sample?), all I can do is interpret this as an attempt to generate a random variable according to some distribution using its inverse CDF. Because I don't know which distribution it is, I can't comment on whether your implementation of it is correct.
However, when you use this method, you should know that the CDF takes values between 0 and 1, as it is a cumulative probability, starting at 0 and approaching 1 in the limit.
The inverse of that function then only makes sense if you feed it values between 0 and 1, and that is where a possible error lies. runif(2000,min=80,max=800) generates random values between 80 and 800, way outside the (0,1) interval.
If you instead do this:
t <- invcdf(runif(2000))
We then get results (which happen to lie mostly between 80 and 800).
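To make the cause of the NaN's concrete, and assuming the intent of min=80, max=800 was to bound the generated values (the question does not say so explicitly), here is a sketch. The cdf function below is not from the question; it is obtained by solving y = invcdf(u) for u, and only holds for a2 < y < a1.

```r
a1 <- 950; a2 <- 0; a3 <- 2.48; a4 <- 1.92
invcdf <- function(u) (a1 + a2 * a3 * (-log(u))^(1 / a4)) / (a3 * (-log(u))^(1 / a4) + 1)

# For u > 1, -log(u) is negative, and a negative base raised to a
# fractional power is NaN in R
nan_demo <- (-log(80))^(1 / a4)

# Forward CDF derived algebraically from invcdf (assumption: a2 < y < a1)
cdf <- function(y) exp(-((a1 - y) / (a3 * (y - a2)))^a4)

# Restrict the uniform draws to [cdf(80), cdf(800)], which lies inside (0, 1),
# so the transformed values fall between 80 and 800
set.seed(30)
u <- runif(2000, min = cdf(80), max = cdf(800))
bounded <- invcdf(u)
range(bounded)
```

This keeps the parameters unchanged and samples from the same distribution, conditioned on the value lying in [80, 800].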
In an assignment, it asks to "Draw two separate samples of 100 independent standard normals" using R. I assume that I use the rnorm function, which returns a value from a normal distribution. Given that the standard normal is the same every time, if I just use rnorm(100, mean = 0, std = 1) will that meet the requirements?
Thanks!
Should be ...sd = 1... . And "mean = 0, sd = 1" are the defaults for rnorm, so this is equivalent to rnorm(100). As for your class requirements, it's probably not for us to say what will satisfy them.
Running the code
rnorm(100, mean = 0, sd = 1)
twice will generate two separate samples of 100 independent standard normal variables (note that the argument is sd, not std). Because each call draws fresh random numbers that follow the standard normal distribution, the two samples will be independent.
Also, you can simply use
rnorm(100)
If mean or sd is not specified, rnorm() assumes the default values of 0 and 1, respectively.
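Put together, drawing the two samples might look like this (the variable names and the seed are arbitrary choices for illustration):

```r
set.seed(42)            # optional: only makes the draws reproducible
sample1 <- rnorm(100)   # first sample of 100 standard normals
sample2 <- rnorm(100)   # second, separate sample
c(mean(sample1), mean(sample2))
```

Each call continues from the current state of the random number generator, so the second call does not repeat the first.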