Generate uniform random variable when lower boundary is close to zero - r

When I run in R
runif(100, max = 0.1, min = 1e-10)
I get 100 uniformly distributed random values between 0.1 and about 0.0001. So there are no values between 0.0001 and the minimum (min = 1e-10). How can I generate uniform random variables over the whole interval, from min to max?

Maybe you aren't generating enough values to make it likely that you'd see one:
> range(runif(100,max=0.1,min=exp(-10)))
[1] 0.00199544 0.09938462
> range(runif(1000,max=0.1,min=exp(-10)))
[1] 0.0002407759 0.0999674631
> range(runif(10000,max=0.1,min=exp(-10)))
[1] 5.428209e-05 9.998912e-02
How often do they occur?
> sum(runif(10000,max=0.1,min=exp(-10)) < .0001)
[1] 5
5 in that sample of 10000. So the chance of getting one in a sample of 100 is... (Actually, you can work this out exactly from that number and the properties of a uniform distribution.)
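Working that out exactly (note the code above used min = exp(-10), which is about 4.5e-5, not the question's 1e-10):

```r
# Probability that a single draw from U(exp(-10), 0.1) falls below 1e-4
p <- (1e-4 - exp(-10)) / (0.1 - exp(-10))
10000 * p    # about 5.5 expected below 1e-4 in 10000 draws
# Probability that none of 100 draws falls below 1e-4
(1 - p)^100  # about 0.95
```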

(Edited to replace exp(-10) with 1e-10)
Given your max of 0.1 and min of 1e-10, the probability that any given value is less than 1e-4 is given by
(1e-4 - 1e-10) / (0.1 - 1e-10) = 9.99999e-04
The probability that 100 random values from this distribution are all greater than 1e-4 is
(1 - 9.99999e-04) ^ 100 = 0.90479
About 90.5%. So you shouldn't be at all surprised that a draw of 100 numbers from this distribution contains none less than 1e-4; theoretically, that is the expected outcome about 90.5% of the time. We can even verify this in simulation:
set.seed(47) # for replicability
# 100,000 times, draw 100 numbers from your uniform distribution
d = replicate(n = 1e5, runif(100, max = 0.1, min = 1e-10))
# what proportion of the 100k draws have no values less than 1e-4?
mean(colSums(d < 1e-4) == 0)
# [1] 0.90557
# 90.56% - very close to our calculated 90.48%
For more precision, we can repeat with even more replications
# same thing, 1 million replications
d2 = replicate(n = 1e6, runif(100, max = 0.1, min = 1e-10))
mean(colSums(d2 < 1e-4) == 0)
# [1] 0.90481
So, with 1 million replications, runif() almost exactly meets expectations; it is off by only 0.90481 - 0.90479 = 0.00002. I would say there is absolutely no evidence that runif is broken.
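The same check can be phrased as a binomial probability: the number of values below 1e-4 in 100 draws is Binomial(100, p), so the chance of seeing none is dbinom(0, 100, p):

```r
p <- (1e-4 - 1e-10) / (0.1 - 1e-10)
dbinom(0, 100, p)  # 0.90479..., identical to (1 - p)^100
```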
We can even plot the histograms for some of the replications. Here are the first 20:
par(mfrow = c(4, 5), mar = rep(0.4, 4))
for (i in 1:20) {
  hist(d[, i], main = "", xlab = "", axes = FALSE,
       col = "gray70", border = "gray40")
}
The histograms are showing 10 bars each, so each bar is about .01 wide (since the total range is about 0.1). The range you are interested in is about 0.0001 wide. To see that in a histogram, we would need to plot 1,000 bars per plot, 100 times as many bars. Using 1,000 bins doesn't make a lot of sense when there are only 100 values. Of course almost all the bins will be empty, and the lowest one, in particular, will be empty about 90% of the time as we calculated above.
To get more very low random values, you can (a) draw more numbers from the uniform, or (b) switch to a distribution with more weight near 0: an exponential distribution, for example, or a scaled beta distribution if you also want a hard upper bound. Alternatively, if you don't actually need random values, evenly spaced values via seq may be what you're looking for.
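As a rough sketch of those alternatives (the parameters here are only illustrative, not a recommendation):

```r
set.seed(1)
# (b) more weight near zero: a Beta(0.5, 5) scaled onto (0, 0.1)
x_beta <- 0.1 * rbeta(100, 0.5, 5)
# about 8% of such draws fall below 1e-4, vs ~0.1% for the uniform
pbeta(1e-3, 0.5, 5)

# or an exponential, rejecting the (extremely rare) draws above 0.1
x_exp <- rexp(1000, rate = 200)
x_exp <- x_exp[x_exp < 0.1][1:100]

# evenly spaced values instead of random ones
x_seq <- seq(1e-10, 0.1, length.out = 100)
```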

Related

Monte-Carlo analysis on raster: Increasingly count the number of times an equation generated a condition that was correct, for each pixel

I am trying to write a script in R to perform a Monte Carlo analysis based on the law of large numbers.
A short introduction on the problem:
In slope stability the factor of safety (FS) - which is calculated from equations - indicates whether or not a slope is stable. When the FS is less than 1, the slope is unstable and may fail.
We use different equations that take field parameters to estimate the FS, but these parameters are uncertain, since it will never be possible to sample every point on the terrain. Therefore, I want to sample values from probability density functions (PDFs) to build my input rasters, calculate the FS, and count how many times FS < 1 for each pixel across the iterations (loops). Dividing that per-pixel count by the number of iterations (law of large numbers) yields a probabilistic raster based on the FS.
For example: if a pixel after say, 10000 iterations, scored FS < 1 in 250 iterations, then the probability of failure for this pixel is 0.025 (2.5%). If FS < 1 in 4875 iterations, the probability is 0.4875 (48.75%) and so on. At the end of the process I want a continuous raster ranging from minimum 0 and maximum 1.
My code so far:
# Libraries: raster for the raster objects, EnvStats for the truncated distributions below
library(raster)
library(EnvStats)
# Import rasters - slope raster and derivations used in the equation
slp <- raster("C:/Users/Usuario/Desktop/Dados_SHALSTABPROB/TIN_TopoRaster_1.1/slp.tif")
cos2slp <- (cos((slp*pi)/180)*cos((slp*pi)/180))
sinslp <- sin((slp*pi)/180)
coslp <- cos((slp*pi)/180)
# GeotechMask is a raster containing pixels with a value of 1 or 4. Depending on the pixel, the sampling takes place in different PDFs. Below are the distributions
GeotechMask <- raster("C:/Users/Usuario/Desktop/Dados_SHALSTABPROB/TIN_TopoRaster_1.1/BST/Geotecnica_BST.tif")
# Random distributions
n = 10000
Cca <- rlnormTrunc(n = n, meanlog = 6822.766571, sdlog = 2900.196608, min = 3152.737752, max = 9550.432277) # These are soil cohesion values when GeotechMask = 1.
CPVa <- rlnormTrunc(n = n, meanlog = 10462.53602, sdlog = 2250.075859, min = 7521.613833, max = 12772.33429) # These are soil cohesion values when GeotechMask = 4.
PHIca <- rlnormTrunc(n = n, meanlog = 17.25, sdlog = 12.38, min = 7.05, max = 34.33) # These are soil angle of internal friction values when GeotechMask = 1.
PHIPVa <- rlnormTrunc(n = n, meanlog = 23.31, sdlog = 5.34, min = 16.05, max = 30.18) # These are soil angle of internal friction values when GeotechMask = 4.
Pesp <- rnormTrunc(n = n, mean = 1800, sd = 212.13, min = 1700, max = 2000) # These are soil specific weight values under any condition.
NormH <- rnormTrunc(n=n, mean = 6.057426, sd = 1.358259, min = 0.5, max = 10) # Max depth of failure under any condition.
NormU <- rnormTrunc(n=n, mean = 2.5, sd = 3.53, min = 0, max = 5) # Max water table height under any condition.
# Monte Carlo
M = 10 # Iterations
g <- 9.8 # Gravity
pw <- 1000 # Water specific weight
for (i in 1:M){
  # CODE... This is where I'm stuck. Some considerations below.
}
The FS equation I'm using is: FS <- ( ( c + cos2slp * (PEsp * g * (H-U) ) + (PEsp * g - pw * g) * U ) * tan((phi*pi)/180) ) / (H * PEsp * g * sinslp * coslp) - already formatted to be used in the code.
Where: all variables in the equation are rasters with the same length, dimension and projection; c will be a raster sampled from the Cca or CPVa distributions; PEsp will be a raster sampled from the Pesp distribution; H will be a raster sampled from the NormH distribution; U will be a raster sampled from the NormU distribution and Phi will be a raster sampled from the PHIca or PHIPVa distributions. If it would be faster to only sample say, 10 random values from the distribution to account for all pixels in some of the rules, that's not a problem.
It is worth noting that if there is a more efficient way to tackle the problem, one that does not work directly on top of the rasters, it can be implemented as long as it respects the geographical coordinates of the rasters.
I know that there are ways that are more efficient than others for this type of problem, I just don't know how to implement them. The intention is to iterate over the equation thousands of times.
The rasters have 3,518,663 pixels; my machine has an Intel Core i7 processor, 8 GB RAM, a GeForce MX110 2 GB dedicated video card, and an SSD.
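Since the question is mainly about how to structure the loop, here is a minimal sketch of one possible shape, using plain matrices in place of the rasters so it runs standalone. The values, distributions, and their parameters are simplified placeholders, not the real field data:

```r
set.seed(1)
# Toy stand-ins for the rasters (real code would take these from raster objects)
nr <- 50; nc <- 50
slp  <- matrix(runif(nr * nc, 10, 45), nr, nc)          # slope in degrees
mask <- matrix(sample(c(1, 4), nr * nc, TRUE), nr, nc)  # GeotechMask
sinslp  <- sin(slp * pi / 180)
coslp   <- cos(slp * pi / 180)
cos2slp <- coslp^2
g <- 9.8; pw <- 1000

M <- 100                   # iterations
fail <- matrix(0, nr, nc)  # per-pixel count of FS < 1
for (i in 1:M) {
  # one random value per parameter per iteration, chosen by mask
  # (placeholder distributions; the real ones are the truncated PDFs above)
  c_val <- ifelse(mask == 1, runif(1, 3150, 9550), runif(1, 7520, 12770))
  phi   <- ifelse(mask == 1, runif(1, 7, 34),      runif(1, 16, 30))
  PEsp  <- runif(1, 1700, 2000)
  H     <- runif(1, 0.5, 10)
  U     <- runif(1, 0, min(5, H))  # keep water table below failure depth
  FS <- ((c_val + cos2slp * (PEsp * g * (H - U)) + (PEsp * g - pw * g) * U) *
           tan(phi * pi / 180)) / (H * PEsp * g * sinslp * coslp)
  fail <- fail + (FS < 1)
}
prob_fail <- fail / M  # per-pixel probability of failure, in [0, 1]
```

Swapping the toy matrices for values() of the real rasters, and the placeholder draws for the rlnormTrunc/rnormTrunc samples, keeps the same structure; writing prob_fail back into a raster with values(r) <- preserves the geographic metadata.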

R How to sample from an interrupted upside down bell curve

I've asked a related question before which successfully received an answer. Now I want to sample values from an upside-down bell curve, but exclude a range of values that falls in the middle of it, as shown in the picture below:
I have this code currently working:
min <- 1
max <- 20
q <- min + (max-min)*rbeta(10000, 0.5, 0.5)
How may I adapt it to achieve the desired output?
Say you want a sample of 10,000 from your distribution but don't want any numbers between 5 and 15 in your sample. Why not just do:
q <- min + (max-min)*rbeta(50000, 0.5, 0.5);
q <- q[!(q > 5 & q < 15)][1:10000]
Which gives you this:
hist(q)
But still has the correct size:
length(q)
#> [1] 10000
An "upside-down bell curve" relative to the normal distribution, with a certain interval excluded, can be sampled using the following rejection algorithm. I write it in pseudocode because I'm not familiar with R. I adapted it from another answer I just posted.
Notice that this sampler works on a truncated interval (here [x0, x1], with [x2, x3] excluded), because an upside-down bell curve extended to infinity cannot integrate to 1 (one of the requirements for a probability density).
In the pseudocode, RNDU01() is a uniform(0, 1) random number.
x0pdf = 1-exp(-(x0*x0))
x1pdf = 1-exp(-(x1*x1))
ymax = max(x0pdf, x1pdf)
while true
# Choose a random x-coordinate
x=RNDU01()*(x1-x0)+x0
# Choose a random y-coordinate
y=RNDU01()*ymax
# Return x if y falls within PDF
if (x<x2 or x>x3) and y < 1-exp(-(x*x)): return x
end
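A direct R translation of that pseudocode, vectorized over candidates (the bounds x0, x1, x2, x3 here are only an illustration):

```r
# Rejection sampler for a density proportional to 1 - exp(-x^2)
# on [x0, x1], excluding (x2, x3)
rupsidedown <- function(n, x0 = -3, x1 = 3, x2 = -1, x3 = 1) {
  ymax <- max(1 - exp(-x0^2), 1 - exp(-x1^2))
  out <- numeric(0)
  while (length(out) < n) {
    x <- runif(n, x0, x1)   # random x-coordinates
    y <- runif(n, 0, ymax)  # random y-coordinates
    # keep x if it lies outside the excluded band and y falls under the curve
    keep <- (x < x2 | x > x3) & y < 1 - exp(-x^2)
    out <- c(out, x[keep])
  }
  out[1:n]
}
samp <- rupsidedown(10000)
```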

How to standardize a column of data in R and get a bell curve histogram to find the percentage that falls within a range?

I have a data set, and one of its columns contains random numbers ranging from 300 to 400. I'm trying to find what proportion of this column is between 320 and 350 using R. To my understanding, I need to standardize this data and create a bell curve first. I have the mean and standard deviation, but when I do (X - mean)/SD and plot a histogram of this column, it's still not a bell curve.
This is the code I tried.
myData$C1 <- (myData$C1 - C1_mean) / C1_SD
If you are simply counting the number of observations in that range, there's no need to do any standardization and you may directly use
mean(myData$C1 >= 320 & myData$C1 <= 350)
As for the standardization, it definitely doesn't create any "bell curves": it only shifts the distribution (centering) and rescales the data (dividing by the standard deviation). Other than that, the shape itself of the density function remains the same.
For instance,
x <- c(rnorm(100, mean = 300, sd = 20), rnorm(100, mean = 400, sd = 20))
mean(x >= 320 & x <= 350)
# [1] 0.065
hist(x)
hist((x - mean(x)) / sd(x))
I suspect that what you are looking for is an estimate of the true, unobserved proportion. The standardization procedure then would be applicable if you had to use tabulated values of the standard normal distribution function. However, in R we may do that without anything like that. In particular,
pnorm(350, mean = mean(x), sd = sd(x)) - pnorm(320, mean = mean(x), sd = sd(x))
# [1] 0.2091931
That's the probability P(320 <= X <= 350), where X is normally distributed with mean mean(x) and standard deviation sd(x). The figure is quite different from that above since we misspecified the underlying distribution by assuming it to be normal; it actually is a mixture of two normal distributions.

How to generate normally distributed random numbers in specific interval?

I want to generate 100 normally distributed random numbers in the interval [-50,50]. However, the code below does not restrict the generated numbers to [-50,50].
n <- rnorm(100, -50,50)
plot(n)
Your question is strangely asked, because it seems you don't fully understand the rnorm function.
rnorm(100, -50,50)
generates a sample of 100 points from a normal distribution centered on -50, with a standard deviation of 50. So you need to specify what you mean by:
"100 normally distributed random numbers in interval [-50,50]". A normal distribution has no upper or lower limit: the probability of drawing any value is never 0, just very low several standard deviations away from the mean. So:
Either you want a normal distribution centered on 0 with standard deviation 50, and the answer is rnorm(100, 0, 50), but you will have values above 50 and below -50.
Or you actually want a normal distribution with no value outside the [-50,50] range; in this case you still need to give a standard deviation, and you will need to drop the values drawn outside the range. You could do something like:
sd <- 50
n <- data.frame(draw = rnorm(1000, 0,sd))
final <- sample(n$draw[!with(n, draw > 50 | draw < -50)],100)
Here is an example of what it does for 2 different sd:
sd <- 10
n1 <- data.frame(draw = rnorm(1000, 0, sd))
final1 <- sample(n1$draw[!with(n1, draw > 50 | draw < -50)], 100)
sd <- 50
n2 <- data.frame(draw = rnorm(1000, 0, sd))
final2 <- sample(n2$draw[!with(n2, draw > 50 | draw < -50)], 100)
par(mfrow = c(1,2))
hist(final1,main = "sd = 10")
hist(final2,main = "sd = 50")
Or you just want to sample values in this range with a flat distribution. In this case, use runif(100, -50, 50) for continuous values, or sample(-50:50, 100, replace = TRUE) if integers are enough.
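If you'd rather not discard draws at all, the truncation can also be done exactly with the inverse-CDF trick (a sketch, using sd = 50 as in the example above):

```r
set.seed(1)
s <- 50
# map uniform draws onto the slice of the normal CDF between -50 and 50
u <- runif(100, pnorm(-50, 0, s), pnorm(50, 0, s))
x <- qnorm(u, 0, s)
range(x)  # always inside [-50, 50], and no draws are wasted
```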
You have to make a sacrifice: either your random variable is not exactly normally distributed because the tails are cut off, or you compromise on the boundaries. You can define your random variable to lie "practically" within a range, meaning you accept that a very small percentage lies outside. Maybe 1% would be an acceptable choice for your purpose.
my_range <- setNames(c(-50, 50), c("lower", "upper"))
prob <- 0.01 # probability to lie outside of my_range
# you have to define this, 1 % in this case
my <- mean(my_range)
z_value <- qnorm(prob/2)
sigma <- (my - my_range["lower"]) / (-1 * z_value)
# proof
N <- 100000 # large number
sim_vec <- rnorm(N, my, sigma)
chk <- 1 - length(sim_vec[sim_vec >= my_range["lower"] &
sim_vec <= my_range["upper"]]) / length(sim_vec)
cat("simulated proportion outside range:", chk, "\n")

Generating random numbers in a specific interval

I want to generate some Weibull random numbers in a given interval. For example 20 random numbers from the Weibull distribution with shape 2 and scale 30 in the interval (0, 10).
The rweibull function in R produces random numbers from a Weibull distribution with given shape and scale values. Can someone please suggest a method? Thank you in advance.
Use the distr package. It makes this kind of thing very easy.
require(distr)
# we create the truncated distribution
d <- Truncate(Weibull(shape = 2, scale = 30), lower = 0, upper = 10)
# The d object has four slots d, r, p, q that correspond to the [drpq] prefixes of standard R distributions
# This extracts 10 random numbers
d@r(10)
# get a histogram
hist(d@r(10000))
Using base R, you can generate random numbers, keep those that fall into the target interval, and generate more if you end up with fewer than you need.
rweibull_interval <- function(n, shape, scale = 1, min = 0, max = 10) {
  weib_rnd <- rweibull(10 * n, shape, scale)
  weib_rnd <- weib_rnd[weib_rnd > min & weib_rnd < max]
  if (length(weib_rnd) < n) {
    return(c(weib_rnd, rweibull_interval(n - length(weib_rnd), shape, scale, min, max)))
  } else {
    return(weib_rnd[1:n])
  }
}
set.seed(1)
rweibull_interval(20, 2, 30, 0, 10)
[1] 9.308806 9.820195 7.156999 2.704469 7.795618 9.057581 6.013369 2.570710 8.430086 4.658973
[11] 2.715765 8.164236 3.676312 9.987181 9.969484 9.578524 7.220014 8.241863 5.951382 6.934886
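Base R can also do this without any rejection, by inverting the Weibull CDF over the truncated range (same shape = 2, scale = 30, interval (0, 10) as above):

```r
set.seed(1)
# uniform draws restricted to the CDF mass between 0 and 10
u <- runif(20, pweibull(0, 2, 30), pweibull(10, 2, 30))
x <- qweibull(u, 2, 30)
all(x > 0 & x < 10)  # every draw lands in (0, 10)
```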
