Family Wise Error Rate controlled by method of maximum statistics - r

I am flipping each of the 50 coins in a bag 100 times, and then I want to use the method of maximum statistics to determine the Family-Wise Error Rate (FWER). However, I keep getting an FWER of 1, which feels wrong.
library(purrr)   # for map_df
library(broom)   # for tidy
coins <- rbinom(50, 100, 0.5)
So I start by defining a new function where we input how many times we do randomizations, the coins themselves, and how many times we flip them.
simulate_max <- function(n_number_of_randomizations, input_coins, N_number_of_tosses, alpha = 0.05) {
maxList <- NULL
Then we do a for loop for every time we have specified.
for (iteration in 1:n_number_of_randomizations){
Now we shuffle the list of coins
CoinIteration <- sample(input_coins)
Now we apply the binary test to every coin in the bag
testresults <- map_df(CoinIteration, function(x) tidy(binom.test(x,N_number_of_tosses,p=alpha)) )
Now we want to add the maximum result from every test to the max list.
thisRandMax <- max(testresults$statistic)
maxList <- c(maxList, thisRandMax)
}
Finally, we iterate through every member of the maximum list and subtract the expected number of heads (i.e. 50, for a 50% chance times 100 tosses).
for (iterator2 in 1:length(maxList)){
maxList[iterator2]<-maxList[iterator2]-(0.5*N_number_of_tosses)
}
Return the output from the function
return(data.frame(maxList))
}
Now we apply this simulation for each of the requested iterations.
repsmax = map_df(1:Nreps, ~simulate_max(Nrandomizations,coins,Ntosses))
Now we calculate the FWER by dividing the number of maxima that exceed the expected value by the total number of cells.
fwer = sum(repsmax>0) / (Nreps*Nrandomizations)

There are some issues that I think would be good to clarify.
A FWER of ~1 seems about right to me given the parameters of your experiment. FWER relates to Type I error: for a single test at alpha = 0.05, FWER = 1 - P(no Type I error) = 1 - 0.95 = 0.05. For two independent tests at alpha = 0.05, FWER = 1 - 0.95^2 = 0.0975. You have 50 coins (50 tests), so your FWER at alpha = 0.05 is 1 - 0.95^50 = 0.923. If your code treats the 100 tosses as 100 tests, your FWER will be 1 - 0.95^100 = 0.994 (~1).
You can control for Type I error (account for multiple testing) by using e.g. the Bonferroni correction (alpha / n). If you change your alpha to "0.05 / 50" = 0.001, you will control your FWER (reduce it) to 0.05 (1 - 0.999^50 = ~0.049). I suspect this is the answer you are looking for: if alpha = 0.001 then FWER = 0.05 and you have an acceptable chance of incorrectly rejecting the null hypothesis.
I don't know what the "maximum estimate of the effect size" is, or how to calculate it, but given that the two distributions are approximately identical, the effect size will be ~ 0. It then makes sense that controlling FWER to 0.05 (by adjusting alpha to 0.001) is the 'answer' to the question and if you can get your code to reflect that logic, I think you'll have your solution.
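For a quick sense of these numbers, here is a minimal sketch (assuming 50 independent tests):
n_tests <- 50
alpha <- 0.05
1 - (1 - alpha)^n_tests             # ~0.92: FWER with no correction
1 - (1 - alpha / n_tests)^n_tests   # ~0.049: FWER with a Bonferroni-corrected alpha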

Related

Conditional probability in r

The question:
A screening test for a disease that affects 0.05% of the male population is able to identify the disease in 90% of the cases where an individual actually has the disease. The test, however, generates 1% false positives (gives a positive reading when the individual does not have the disease). Find the probability that a man has the disease given that he has tested positive. Then, find the probability that a man has the disease given that he has a negative test.
My wrong attempt:
I first started by letting:
• T be the event that a man has a positive test
• Tc be the event that a man has a negative test
• D be the event that a man actually has the disease
• Dc be the event that a man does not have the disease
Therefore we need to find P(D|T) and P(D|Tc)
Then I wrote this code:
set.seed(110)
sims = 1000
D = rep(0, sims)
Dc = rep(0, sims)
T = rep(0, sims)
Tc = rep(0, sims)
# run the loop
for(i in 1:sims){
  # flip to see if we have the disease
  flip = runif(1)
  # if we got the disease, mark it
  if(flip <= .0005){
    D[i] = 1
  }
  # if we have the disease, we need to flip for T and Tc
  if(D[i] == 1){
    # flip for T
    flip1 = runif(1)
    # see if we got T
    if(flip1 < 1/9){
      T[i] = 1
    }
    # flip for Tc
    flip2 = runif(1)
    # see if we got Tc
    if(flip2 < 1/10){
      Tc[i] = 1
    }
  }
}
# P(D|T)
mean(D[T == 1])
# P(D|Tc)
mean(D[Tc == 1])
I'm really struggling so any help would be appreciated!
Perhaps the best way to think through a conditional probability question like this is with a concrete example.
Say we tested one million individuals in the population. Then 500 individuals (0.05% of one million) would be expected to have the disease, of whom 450 would be expected to test positive and 50 to test negative (since the false negative rate is 10%).
Conversely, 999,500 would be expected to not have the disease (one million minus the 500 who do have the disease), but since 1% of them would test positive, then we would expect 9,995 people (1% of 999,500) with false positive results.
So, given a positive test result taken at random, it either belongs to one of the 450 people with the disease who tested positive, or one of the 9,995 people without the disease who tested positive - we don't know which.
This is the situation in the first question, since we have a positive test result but don't know whether it is a true positive or a false positive. The probability of our subject having the disease given their positive test is the probability that they are one of the 450 true positives out of the 10,445 people with positive results (9995 false positives + 450 true positives). This boils down to the simple calculation 450/10,445 or 0.043, which is 4.3%.
Similarly, a negative test taken at random either belongs to one of the 989,505 (999,500 - 9,995) people without the disease who tested negative, or one of the 50 people with the disease who tested negative, so the probability of having the disease given a negative test is 50/989,555 (the 50 false negatives out of all 989,555 negative results), or 0.005%.
I think this question is demonstrating the importance of knowing that disease prevalence needs to be taken into account when interpreting test results, and very little to do with programming, or R. It requires only a calculator (at most).
If you really wanted to run a simulation in R, you could do:
set.seed(1) # This makes the sample reproducible
sample_size <- 1000000 # This can be changed to get a larger or smaller sample
# Create a large sample of 1 million "people", using a 1 to denote disease and
# a 0 to denote no disease, with probabilities of 0.0005 (which is 0.05%) and
# 0.9995 (which is 99.95%) respectively.
disease <- sample(x = c(0, 1),
                  size = sample_size,
                  replace = TRUE,
                  prob = c(0.9995, 0.0005))
# Create an empty vector to hold the test results for each person
test <- numeric(sample_size)
# Simulate the test results of people with the disease, using a 1 to denote
# a positive test and 0 to denote a negative test. This uses a probability of
# 0.9 (which is 90%) of having a positive test and 0.1 (which is 10%) of having
# a negative test. We draw as many samples as we have people with the disease
# and put them into the "test" vector at the locations corresponding to the
# people with the disease.
test[disease == 1] <- sample(x = c(0, 1),
                             size = sum(disease),
                             replace = TRUE,
                             prob = c(0.1, 0.9))
# Now we do the same for people without the disease, simulating their test
# results, with a 1% probability of a positive test.
test[disease == 0] <- sample(x = c(0, 1),
                             size = sample_size - sum(disease),
                             replace = TRUE,
                             prob = c(0.99, 0.01))
Now that we have run our simulation, we can count the true positives, false positives, true negatives and false negatives by creating a contingency table:
contingency_table <- table(disease, test)
contingency_table
#> test
#> disease 0 1
#> 0 989566 9976
#> 1 38 420
and get the approximate probability of having the disease given a positive test like this:
contingency_table[2, 2] / sum(contingency_table[,2])
#> [1] 0.04040015
and the probability of having the disease given a negative test like this:
contingency_table[2, 1] / sum(contingency_table[,1])
#> [1] 3.83992e-05
You'll notice that the probability estimates from sampling are not that accurate because of how small some of the sampling probabilities are. You could simulate a larger sample, but it might take a while for your computer to run it.
Created on 2021-08-19 by the reprex package (v2.0.0)
To expand on Allan's answer, but relating it back to Bayes' Theorem, if you prefer:
From the question, you know (converting percentages to probabilities):
P(D) = 0.0005, P(T | D) = 0.9, P(T | Dc) = 0.01
Plugging in:
P(D | T) = P(T | D) P(D) / (P(T | D) P(D) + P(T | Dc) P(Dc))
         = (0.9 * 0.0005) / (0.9 * 0.0005 + 0.01 * 0.9995)
         ≈ 0.043
P(D | Tc) = P(Tc | D) P(D) / (P(Tc | D) P(D) + P(Tc | Dc) P(Dc))
          = (0.1 * 0.0005) / (0.1 * 0.0005 + 0.99 * 0.9995)
          ≈ 0.00005
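If you want to check those figures numerically, here is a minimal sketch of the same calculation in R (no simulation needed):
p_D <- 0.0005     # prevalence, P(D)
p_T_D <- 0.9      # sensitivity, P(T | D)
p_T_Dc <- 0.01    # false positive rate, P(T | Dc)
p_T <- p_T_D * p_D + p_T_Dc * (1 - p_D)   # total probability of a positive test
p_T_D * p_D / p_T                         # P(D | T)  ~ 0.043
(1 - p_T_D) * p_D / (1 - p_T)             # P(D | Tc) ~ 0.00005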

R and power for one-tailed t-test

I'll be running a one-tailed t-test to determine if one mean is significantly lower than another. The problem is that, when I use R's pwr package to determine what power I can expect with n=30, I get an extremely low power even for large effects. So, for example:
> pwr.t.test(d=0.8,sig.level=.05,n=30,alternative="less")
Two-sample t test power calculation
n = 30
d = 0.8
sig.level = 0.05
power = 1.251823e-06
alternative = less
NOTE: n is number in *each* group
What's even stranger is that, when I increase n, the power goes down. So, for example, upping n to 300 gives me this:
> pwr.t.test(d=0.8,sig.level=.05,n=300,alternative="less")
Two-sample t test power calculation
n = 300
d = 0.8
sig.level = 0.05
power = 0
alternative = less
NOTE: n is number in *each* group
What am I missing?
I guess it's because your d and alternative = 'less' point in opposite 'directions'.
Try this, and you will see what I mean.
pwr.t.test(d= - 0.8,sig.level=.05,n=300,alternative="less")
Two-sample t test power calculation
n = 300
d = -0.8
sig.level = 0.05
power = 1
alternative = less
NOTE: n is number in *each* group
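In other words, if you expect the first mean to be lower, either make d negative with alternative = "less", or keep d positive and use alternative = "greater"; both describe the same situation and should report the same power. A quick check (assuming the pwr package is loaded):
library(pwr)
# Both calls describe "group 1 is 0.8 SDs below group 2" and should report the same power
pwr.t.test(d = -0.8, sig.level = 0.05, n = 30, alternative = "less")
pwr.t.test(d = 0.8, sig.level = 0.05, n = 30, alternative = "greater")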

h (effect size) parameter in pwr package in R

I am calculating the sample size for a proportion test. I would like to have significance level = 0.05, power = 0.90, and an effect size greater than 5%.
That is, I would like a statistically significant result if the difference in proportions is more than 5%.
But when I use pwr.2p.test function from pwr package to calculate sample size
pwr.2p.test(sig.level = 0.05, power =0.9, h=0.2, alternative="greater")
I have to specify the effect size as Cohen's h. But its range is said to be (-3, 3), and the interpretation of this is:
The meaning of effect size varies by context, but the standard interpretation offered by Cohen (1988) is: cited from here
.8 = large (8/10 of a standard deviation unit)
.5 = moderate (1/2 of a standard deviation)
.2 = small (1/5 of a standard deviation)
My question is, how do I formulate "I'd like to detect more than a 5% difference in proportions between 2 groups" as a Cohen's h statistic?
Thanks for any help!
I used the function ES.h of the package pwr. This function calculates the effect size between two proportions. For p1 = 100% and p2 = 95%, we have:
h = ES.h(1, 0.95) = 0.4510268
I understand that this effect size expresses the distance between the hypotheses that needs to be detected.
I'm not very confident in my interpretation, but I used this value to determine the sample size.
pwr.p.test(h=h, sig.level = 0.05, power = 0.8)
Determining the sample size to detect up to 5 points difference in the proportions:
n = 38.58352
To detect a difference of 10 points, the required sample size decreases because less precision is needed (a larger difference is easier to detect). So, with h = ES.h(1, 0.90) = 0.6435011, we have: n = 18.95432.
That is my interpretation. What do you think? Am I right?
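One thing worth noting is that Cohen's h for a fixed 5-point gap depends on where the baseline proportion sits, so "a difference of more than 5%" does not map to a single h. A small sketch (assuming the pwr package, with a 50% baseline chosen purely for illustration):
library(pwr)
ES.h(0.55, 0.50)   # ~0.10 near the middle of the range
ES.h(1.00, 0.95)   # ~0.45 near the boundary
# Sample size per group to detect 0.50 vs 0.55 (one-sided) at power 0.9
pwr.2p.test(h = ES.h(0.55, 0.50), sig.level = 0.05, power = 0.9, alternative = "greater")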

On average, how many times will this incorrect loop iterate?

In some cases, a loop needs to run for a random number of iterations that ranges from min to max, inclusive. One working solution is to do something like this:
int numIterations = randomInteger(min, max);
for (int i = 0; i < numIterations; i++) {
/* ... fun and exciting things! ... */
}
A common mistake that many beginning programmers make is to do this:
for (int i = 0; i < randomInteger(min, max); i++) {
/* ... fun and exciting things! ... */
}
This recomputes the loop upper bound on each iteration.
I suspect that this does not give a uniform distribution of the number of times the loop will iterate that ranges from min to max, but I'm not sure exactly what distribution you do get when you do something like this. Does anyone know what the distribution of the number of loop iterations will be?
As a specific example: suppose that min = 0 and max = 2. Then there are the following possibilities:
When i = 0, the random value is 0. The loop runs 0 times.
When i = 0, the random value is nonzero. Then:
When i = 1, the random value is 0 or 1. Then the loop runs 1 time.
When i = 1, the random value is 2. Then the loop runs 2 times.
The probability of this first event is 1/3. The second event has probability 2/3, and within it, the first subcase has probability 2/3 and the second subcase has probability 1/3. Therefore, the average number of iterations is
0 × 1/3 + 1 × 2/3 × 2/3 + 2 × 2/3 × 1/3
= 0 + 4/9 + 4/9
= 8/9
Note that if the distribution were indeed uniform, we'd expect to get 1 loop iteration, but now we only get 8/9 on average. My question is whether it's possible to generalize this result to get a more exact value on the number of iterations.
Thanks!
Final edit (maybe!). I'm 95% sure that this isn't one of the standard distributions that are appropriate. I've put what the distribution is at the bottom of this post, as I think the code that gives the probabilities is more readable! A plot for the mean number of iterations against max is given below.
Interestingly, the mean number of iterations grows much more slowly than max as you increase max. It would be interesting if someone else could confirm this with their code.
If I were to start modelling this, I would start with the geometric distribution, and try to modify that. Essentially we're looking at a discrete, bounded distribution. So we have zero or more "failures" (not meeting the stopping condition), followed by one "success". The catch here, compared to the geometric or Poisson, is that the probability of success changes (also, like the Poisson, the geometric distribution is unbounded, but I think structurally the geometric is a good base). Assuming min=0, the basic mathematical form for P(X=k), 0 <= k <= max, where k is the number of iterations the loop runs, is, like the geometric distribution, the product of k failure terms and 1 success term, corresponding to k "false"s on the loop condition and 1 "true". (Note that this holds even to calculate the last probability, as the chance of stopping is then 1, which obviously makes no difference to a product).
Following on from this, an attempt to implement this in code, in R, looks like this:
fx = function(k, maximum)
{
  n = maximum + 1;
  failure = factorial(n - 1) / factorial(n - 1 - k) / n^k;
  success = (k + 1) / n;
  failure * success
}
This assumes min=0, but generalizing to arbitrary mins isn't difficult (see my comment on the OP). To explain the code: first, as shown by the OP, the probabilities all have (max+1) as a denominator, so we calculate the denominator, n. Next, we calculate the product of the failure terms. Here factorial(n-1)/factorial(n-1-k) means, for example, for max=2, n=3 and k=2: 2*1. And it generalises to give you (n-1)(n-2)... for the total probability of failure. The probability of success increases as you get further into the loop, until finally, when k=maximum, it is 1.
Plotting this analytic formula gives the same results as the OP, and the same shape as the simulation plotted by John Kugelman.
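For anyone who wants to check this numerically, a quick simulation of the buggy loop (a sketch in R, assuming min = 0 and max = 2) agrees with the 8/9 figure from the question:
# Simulate the buggy loop: the upper bound is redrawn on every check
simulate_loop <- function(min, max) {
  i <- 0
  while (i < sample(min:max, 1)) i <- i + 1
  i
}
mean(replicate(1e5, simulate_loop(0, 2)))   # ~0.889, i.e. 8/9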
Incidentally the R code to do this is as follows
plot_probability_mass_function = function(maximum)
{
  x = 0:maximum;
  barplot(fx(x, max(x)), names.arg = x, main = paste("max", maximum), ylab = "P(X=x)");
}
par(mfrow=c(3,1))
plot_probability_mass_function(2)
plot_probability_mass_function(10)
plot_probability_mass_function(100)
Mathematically, the distribution is, if I've got my maths right, given by:
P(X = x) = [ (n-1)! / ((n-1-x)! * n^x) ] * (x+1)/n,  where n = max + 1 and 0 <= x <= max
which simplifies to
P(X = x) = max! * (x+1) / ( (max-x)! * (max+1)^(x+1) )
The latter is given by the R function (named f here so it can be reused below)
f = function(x, m) { factorial(m) * (x + 1) / (factorial(m - x) * (m + 1)^(x + 1)) }
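As a quick sanity check (a sketch using the f just defined), for max = 2 this reproduces the probabilities and the 8/9 mean from the question:
f(0:2, 2)              # 0.333 0.444 0.222, i.e. 1/3, 4/9, 2/9
sum(0:2 * f(0:2, 2))   # 0.889, i.e. 8/9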
Plotting the mean number of iterations is done like this in R
meanf = function(maximum)
{
  x = 0:maximum
  probs = f(x, maximum)
  x %*% probs
}
par(mfrow=c(2,1))
max_range = 1:10
plot(sapply(max_range, meanf) ~ max_range, ylab="Mean number of iterations", xlab="max")
max_range = 1:100
plot(sapply(max_range, meanf) ~ max_range, ylab="Mean number of iterations", xlab="max")
Here are some concrete results I plotted with matplotlib. The X axis is the value i reached. The Y axis is the number of times that value was reached.
The distribution is clearly not uniform. I don't know what distribution it is offhand; my statistics knowledge is quite rusty.
1. min = 10, max = 20, iterations = 100,000
2. min = 100, max = 200, iterations = 100,000
I believe that it would still, given a sufficient amount of executions, conform to the distribution of the randomInteger function.
But this is probably a question better suited to the Mathematics Stack Exchange.
I don’t know the math behind it, but I know how to compute it! In Haskell:
import Numeric.Probability.Distribution

iterations min max = iteration 0
  where
    iteration i = do
      x <- uniform [min..max]
      if i < x
        then iteration (i + 1)
        else return i
Now expected (iterations 0 2) gives you the expected value of ~0.89. Maybe someone with the requisite math knowledge can explain what I’m actually doing here. Because you start at 0, the loop will always run at least min times.

Generate a Random Number within a Range

I have done this before, but now I'm struggling with it again, and I think I am not understanding the math underlying the issue.
I want to get a random number within a small range on either side of 1. Examples would be .98, 1.02, .94, 1.1, etc. All of the examples I find describe getting a random number between 0 and 100, but how can I use that to get within the range I want?
The programming language doesn't really matter here, though I am using Pure Data. Could someone please explain the math involved?
Uniform
If you want a (pseudo-)uniform distribution (every value equally likely) between 0.9 and 1.1, then the following will work:
range = 0.2
return 1-range/2+rand(100)*range/100
Adjust the range accordingly.
Pseudo-normal
If you wanted a normal distribution (bell curve) you would need special code, which would be language/library specific. You can get a close approximation with this code:
sd = 0.1
mean = 1
count = 10
sum = 0
for(int i=0; i<count; i++) {
    sum = sum + (rand(100) - 50)
}
normal = sum / count
normal = normal*sd + mean
Generally speaking, to get a random number within a range, you don't get a number between 0 and 100, you get a number between 0 and 1. This is inconsequential, however, as you could simply get the 0-1 number by dividing your # by 100 - so I won't belabor the point.
When thinking about the pseudocode of this, you need to think of the number between 0 and 1 which you obtain as a percentage. In other words, if I have an arbitrary range between a and b, what percentage of the way between the two endpoints is the point I have randomly selected. (Thus a random result of 0.52 means 52% of the distance between a and b)
With this in mind, consider the problem this way:
Set the start and end-points of your range.
var min = 0.9;
var max = 1.1;
Get a random number between 0 and 1
var random = Math.random();
Take the difference between your start and end range points (b - a)
var range = max - min;
Multiply your random number by the difference
var adjustment = range * random;
Add back in your minimum value.
var result = min + adjustment;
And, so you can understand the values of each step in sequence:
var min = 0.9;
var max = 1.1;
var random = Math.random(); // random == 0.52796 (for example)
var range = max - min; // range == 0.2
var adjustment = range * random; // adjustment == 0.105592
var result = min + adjustment; // result == 1.005592
Note that the result is guaranteed to be within your range. The minimum random value is 0, and the maximum random value is 1. In these two cases, the following occur:
var min = 0.9;
var max = 1.1;
var random = Math.random(); // random == 0.0 (minimum)
var range = max - min; // range == 0.2
var adjustment = range * random; // adjustment == 0.0
var result = min + adjustment; // result == 0.9 (the range minimum)
var min = 0.9;
var max = 1.1;
var random = Math.random(); // random == 1.0 (maximum)
var range = max - min; // range == 0.2
var adjustment = range * random; // adjustment == 0.2
var result = min + adjustment; // result == 1.1 (the range maximum)
return 0.9 + rand(100) / 500.0
or am I missing something?
If rand() returns you a random number between 0 and 100, all you need to do is:
(rand() / 100) * 2
to get a random number between 0 and 2.
If on the other hand you want the range from 0.9 to 1.1, use the following:
0.9 + ((rand() / 100) * 0.2)
You can construct any distribution you like from a uniform in the range [0,1) by a change of variable. In particular, if you want a random number from some distribution with cumulative distribution function F, you just substitute a uniform random number from [0,1) into the inverse of the desired CDF.
One special (and maybe the most popular) case is the normal distribution N(0,1). Here you can use the Box-Muller transform. Scaling it with the stdev and adding the mean, you get a normal distribution with the desired parameters.
You can sum uniform randoms and get some approximation of a normal distribution; this case is considered by Nick Fortescue above.
If your source randoms are integers you should first construct a random number in the real domain with some known distribution. For example, you can construct a uniform distribution on [0,1) this way: get the first integer in the range 0 to 99, multiply it by 0.01, get the second integer, multiply it by 0.0001 and add it to the first, and so on. This way you get a number 0.XXYYZZ... Double precision is about 16 decimal digits, so you need 8 integer randoms to construct a double-precision uniform one.
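A minimal sketch of that digit-stitching idea in R (purely illustrative; runif would normally do this for you):
digits <- sample(0:99, 8, replace = TRUE)   # eight integer draws in 0..99
u <- sum(digits * 0.01^(1:8))               # a uniform double in [0, 1)
u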
Box-Müller to the rescue.
var z2_cached;
function normal_random(mean, variance) {
  if ( z2_cached ) {
    var z2 = z2_cached;
    z2_cached = 0;
    return z2 * Math.sqrt(variance) + mean;
  }
  var x1 = Math.random();
  var x2 = Math.random();
  var z1 = Math.sqrt(-2 * Math.log(x1)) * Math.cos(2 * Math.PI * x2);
  var z2 = Math.sqrt(-2 * Math.log(x1)) * Math.sin(2 * Math.PI * x2);
  z2_cached = z2;
  return z1 * Math.sqrt(variance) + mean;
}
Use with values of mean 1 and variance e.g. 0.01
for ( var i=0; i < 20; i++ ) console.log( normal_random(1, 0.01) );
0.937240893365304
1.072511121460833
0.9950053748909895
1.0034139439164074
1.2319710866884104
0.9834737343090275
1.0363970887198277
0.8706648577217094
1.0882382154101415
1.0425139197341595
0.9438723605883214
0.935894021237943
1.0846400276817076
1.0428213927823682
1.020602499547105
0.9547701472093025
1.2598174560413493
1.0086997644531541
0.8711594789918106
0.9669499056660755
Function gives approx. normal distribution around mean with given variance.
low + (random() / 100) * range
So for example:
0.90 + (random() / 100) * 0.2
How near? You could use a Gaussian (a.k.a. Normal) distribution with a mean of 1 and a small standard deviation.
A Gaussian is suitable if you want numbers close to 1 to be more frequent than numbers a bit further away from 1.
Some languages (such as Java) will have support for Gaussians in the standard library.
Divide by 50. (I assume you are looking for a range from 0 to 2?)
You want a range from -1 to 1 as output from your rand() expression.
( rand(2) - 1 )
Then scale that -1 to 1 range as needed. Say, for a .1 variation on either side:
(( rand(2) - 1 ) / 10 )
Then just add one.
(( rand(2) - 1 ) / 10 ) + 1
Rand() already gives you a random number between 0 and 100, so there are at most 101 distinct values you can get. Assuming that you want up to three decimal places, 0.950-1.050 is the range you would be looking at.
The distribution can then be achieved by
0.95 + (rand() / 1000)
Are you looking for a random number in the range 1 to 2, like 1.1, 1.5, 1.632, etc.? If yes, then here is a simple Python example:
import random
print((random.random() % 2) + 1)
var randomNumber = Math.random();
while (randomNumber < 0.9 && randomNumber > 0.1) {
  randomNumber = Math.random();
}
if (randomNumber >= 0.9) {
  alert(randomNumber);
}
else if (randomNumber <= 0.1) {
  alert(1 + randomNumber);
}
For numbers from 0.9 to 1.1
seed = 1
range = 0.1
if your random is from 0..100:
f_rand = random / 100
the generated number:
gen_number = (seed + f_rand*range*2) - range
You will get
1.04; 1.08; 1.01; 0.96; ...
with seed 3, range 2 => 1.95; 4.08; 2.70; 3.06; ...
I didn't understand this (sorry):
I am trying to set a random number on either side of 1: .98, 1.02, .94, 1.1, etc.
So, I'll provide a general solution for the problem instead.
Converting a random number generator
If you have a random number generator in a given range [0, 1)* with uniform distribution, you can convert it to any distribution using the following method:
1 - Describe the distribution as a function defined in the output range and with total area of 1. So this function is f(x) = the probability of getting the value x.
2 - Integrate** the function.
3 - Equate it to the "randomic"***.
4 - Solve the equation for x. This gives you the value of x as a function of the randomic.
*: Generalization for any input distribution is below.
**: The constant term of the integrated function is 0 (that is, you just discard it).
***: That is a variable that represents the result of generating a random number with uniform distribution in the range [0, 1). [I'm not sure if that's the correct name in English]
Example:
Let's say you want a value with the distribution f(x)=x^2 from 0 to 100. Well that function is not normalized because the total area below the function in the range is 1000000/3 not 1. So you normalize it scaling the curve in the vertical axis (keeping the relative proportions), that is dividing by the total area: f(x)=3*x^2 / 1000000 from 0 to 100.
Now, we have a function with a total area of 1. The next step is to integrate it (you may already have done that to get the area) and equate it to the randomic.
The integrated function is: F(x)=x^3/1000000+c. And equate it to the randomic: r=x^3/1000000 (remember that we discard the constant term).
Now, we need to solve the equation for x, the resulting expression: x=100*r^(1/3). Now you can use this formula to generate numbers with the desired distribution.
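As a quick check of that last formula, here is a small sketch in R (hist is only used to eyeball the x^2 shape):
r <- runif(100000)      # uniform random numbers in [0, 1)
x <- 100 * r^(1/3)      # transformed to f(x) = 3*x^2/1000000 on [0, 100]
hist(x, breaks = 50)    # the histogram should rise roughly like x^2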
Generalization
If you have a random number generator with a custom distribution and want another different arbitrary distribution, you first need the source distribution function and then use it to express the target arbirary random number generator. To get the distribution function do the steps up to 3. For the target do all the steps, and then replace the randomic with the expression you got from the source distribution.
This is better understood with an example...
Example:
You have a random number generator with uniform distribution in the range [0, 100) and you want, for simplicity, the same target distribution f(x)=3*x^2 / 1000000 from 0 to 100 [since for that one we already did all the steps, giving us x=100*r^(1/3)].
Since the source distribution is uniform the function is constant: f(z)=1. But we need to normalize for the range, leaving us with: f(z)=1/100.
Now, we integrate it: F(z)=z/100. And equate it to the randomic: r=z/100, but this time we don't solve it for x, instead we use it to replace r in the target:
x=100*r^(1/3) where r = z/100
=>
x=100*(z/100)^(1/3)
=>
x=(10000*z)^(1/3)
And now you can use x=(10000*z)^(1/3) to calculate random numbers with the distribution f(x)=3*x^2 / 1000000 from 0 to 100, starting with a random number z from the distribution f(z)=1/100 from 0 to 100 [uniform].
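And a corresponding sketch in R, starting from a uniform draw on [0, 100):
z <- runif(100000, 0, 100)   # source distribution: uniform on [0, 100)
x <- (10000 * z)^(1/3)       # target distribution: f(x) = 3*x^2/1000000 on [0, 100]
hist(x, breaks = 50)         # again rises roughly like x^2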
Note: If you have a normal distribution, use the bell function instead. The same method works for any other distribution. Take care of possible asymptotes some distributions may create; you may need to try different ways to solve the equations.
On discrete distributions
Sometimes you need to express a discrete distribution; for example, you want to get 0 with a 95% chance and 1 with a 5% chance. So how do you do that?
Well, you divide it into rectangular distributions in such a way that the ranges join to [0, 1) and use the randomic to evaluate:
f(r) = { 0   if r is in [0, 0.95)
       { 1   if r is in [0.95, 1)
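In R, for example, that simple version is just (a minimal sketch):
x <- if (runif(1) < 0.95) 0 else 1   # 0 with 95% chance, 1 with 5% chance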
Or you can take the complex path, which is to write a distribution function like this (making each option exactly a range of length 1):
f(x) = { 0.95   if x is in [0, 1)
       { 0.05   if x is in [1, 2)
Since each range has a length of 1 and the assigned values sum up to 1 we know that the total area is 1. Now the next step would be to integrate it:
F(x) = { 0.95*x                               if x is in [0, 1)
       { (0.05*(x-1)) + 0.95 = 0.05*x + 0.9   if x is in [1, 2)
Equate it to the randomic:
r = { 0.95*x         if x is in [0, 1)
    { 0.05*x + 0.9   if x is in [1, 2)
And solve the equation...
Ok, to solve that kind of equation, start by calculating the output ranges by applying the function:
[0, 1) becomes [0, 0.95)
[1, 2) becomes [0.95, {(0.05*(x-1))+0.95 where x = 2} = 1)
Now, those are the ranges for the solution:
x = { ?   if r is in [0, 0.95)
    { ?   if r is in [0.95, 1)
Now, solve the inner functions:
x = { r/0.95                       if r is in [0, 0.95)
    { (r - 0.9)/0.05 = 20*r - 18   if r is in [0.95, 1)
But, since the output is discrete, we end up with the same result after doing integer part:
x = { 0   if r is in [0, 0.95)
    { 1   if r is in [0.95, 1)
Note: using random to mean pseudo random.
Edit: Found it on Wikipedia (it's inverse transform sampling; I knew I didn't invent it).
