I'm trying to use a power function to change the distribution of a series of values between 0 and 1 such that the mean is 0.5.
i.e., for each of the values in the series:
new_value = old_value ^ x
Where x is some number.
Is there a simple way to calculate the value of x?
You could run an optimizer from Python's scipy.
Here is an example:
import numpy as np
from scipy import optimize
values = np.random.uniform(0, 1, 5)
sol = optimize.root_scalar(lambda pwr: np.mean(values ** pwr) - 0.5,
                           bracket=[np.log(0.5) / np.log(values.max()), np.log(0.5) / np.log(values.min())])
print('given values:', values)
print('given mean:', values.mean())
print('power:', sol.root)
print('transformed values:', values ** sol.root)
print('mean of transformed values:', (values ** sol.root).mean())
Example output:
given values: [0.82082056 0.01531309 0.56587417 0.53283897 0.73051697]
given mean: 0.5330727532243068
power: 1.1562709936704882
transformed values: [0.79588022 0.00796968 0.5176988 0.48291519 0.69553611]
mean of transformed values: 0.5
A much simplified algorithm would be:
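choose two limits: a = log(0.5)/log(max(values)) and b = log(0.5)/log(min(values))
calculating with a as power gives a mean lower than (or equal to) 0.5, because max(values)^a = 0.5, so every transformed value is at most 0.5
calculating with b as power gives a mean higher than (or equal to) 0.5, because min(values)^b = 0.5, so every transformed value is at least 0.5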
choose a value m somewhere in the middle and calculate the mean with m as power; if that mean is lower than 0.5, m should replace a, otherwise m should replace b
repeat the previous step until either the mean is close enough to 0.5, or a and b get too close to each other
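The bisection steps above can be sketched in plain Python as follows. This is a minimal illustration, not a replacement for the SciPy call: it assumes every value lies strictly between 0 and 1 (a 0 or a 1 would make one of the logarithms undefined or zero), and the function name find_power and its tolerance parameters are hypothetical. The sample values are the ones from the example output above.

```python
import math


def find_power(values, target=0.5, tol=1e-12, max_iter=200):
    """Bisection for p such that mean(v ** p) == target, assuming 0 < v < 1."""
    def mean_pow(p):
        return sum(v ** p for v in values) / len(values)

    # max(values) ** a == target, so every value ** a <= target: mean_pow(a) <= target
    a = math.log(target) / math.log(max(values))
    # min(values) ** b == target, so every value ** b >= target: mean_pow(b) >= target
    b = math.log(target) / math.log(min(values))
    for _ in range(max_iter):
        m = (a + b) / 2
        mean_m = mean_pow(m)
        if abs(mean_m - target) < tol or abs(a - b) < tol:
            return m
        if mean_m < target:
            a = m  # mean too low: m plays the role of a
        else:
            b = m  # mean too high (or equal): m plays the role of b
    return (a + b) / 2


values = [0.82082056, 0.01531309, 0.56587417, 0.53283897, 0.73051697]
p = find_power(values)
print(p, sum(v ** p for v in values) / len(values))
```

Bisection only needs the mean to be monotone in the power, which holds here because raising a number in (0, 1) to a larger power always makes it smaller.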
Using Julia 0.5. Given:
Supertech = [-.2 .1 .3 .5];
Slowpoke = [.05 .2 -.12 .09];
How in the world can I get a covariance? In Excel I just say
=covariance.p(Supertech,Slowpoke)
and it gives me the correct answer of -0.004875
For the life of me I can't figure out how to get this to work using StatsBase.cov()
I've tried putting this into a matrix like:
X = [Supertech; Slowpoke]'
which gives me a nice:
4×2 Array{Float64,2}:
-0.2 0.05
0.1 0.2
0.3 -0.12
0.5 0.09
but I can't get this simple thing to work. I keep coming up with dimension mismatches when I try to use the WeightedVector type.
The syntax [-.2 .1 .3 .5] doesn't create a vector; it creates a one-row matrix. The cov function is actually defined in base Julia, but it requires vectors. So you simply need to use the syntax with commas to create vectors in the first place ([-.2, .1, .3, .5]), or use the vec function to reshape the matrix into a one-dimensional vector. cov also computes the "corrected" (sample) covariance by default, dividing by n - 1, whereas Excel's COVARIANCE.P computes the "uncorrected" (population) covariance, dividing by n. You can pass false as the third argument to specify that you don't want this correction.
julia> cov(vec(Supertech), vec(Slowpoke))
-0.0065
julia> cov(vec(Supertech), vec(Slowpoke), false)
-0.004875
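To make the corrected/uncorrected distinction concrete, here is a small Python sketch that computes both versions directly from the definition. The function name covariance and its corrected flag are hypothetical, chosen only to mirror the Julia argument; the only difference between the two results is dividing the sum of products of deviations by n - 1 or by n.

```python
def covariance(x, y, corrected=True):
    """Covariance of two equal-length sequences.

    corrected=True divides by n - 1 (sample covariance, Julia's default);
    corrected=False divides by n (population covariance, like Excel's COVARIANCE.P).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return s / (n - 1) if corrected else s / n


supertech = [-0.2, 0.1, 0.3, 0.5]
slowpoke = [0.05, 0.2, -0.12, 0.09]
print(covariance(supertech, slowpoke))                   # approximately -0.0065
print(covariance(supertech, slowpoke, corrected=False))  # approximately -0.004875
```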
I have a 500x500 adjacency matrix of 1 and 0, and I need to calculate pagerank for each page. I have a code here, where R is the matrix and T=0.15 is a constant:
n = ncol(R)
B = matrix(1/n, n, n) # the teleportation matrix
A = 0.85 * R + 0.15 * B
ranks = eigen(A)$vectors[1] # my PageRanks
print(ranks)
[1] -0.5317519+0i
I don't have much experience with R, but I assume that the given output is a general pagerank, and I need a pagerank for each page.
Is there a way to construct a table of pageranks with relation to the matrix? I didn't find anything related to my particular case on the web.
A few points:
(1) You need to convert the binary adjacency matrix (R in your case) to a column-stochastic transition matrix to start with (representing the probabilities of transitions between the pages).
(2) A needs to remain column-stochastic as well; only then will the dominant eigenvector, corresponding to the eigenvalue 1, be the PageRank vector.
(3) To find the first eigenvector of the matrix A, you need to use eigen(A)$vectors[,1] (the first column); eigen(A)$vectors[1] only returns the first element of the matrix.
Example with a small 5x5 adjacency matrix R:
set.seed(12345)
R = matrix(sample(0:1, 25, replace=TRUE), nrow=5) # random binary adjacency matrix
R = t(t(R) / rowSums(t(R))) # convert the adjacency matrix R to a column-stochastic transition matrix
n = ncol(R)
B = matrix(1/n, n, n) # the teleportation matrix
A = 0.85 * R + 0.15 * B
A <- t(t(A) / rowSums(t(A))) # make A column-stochastic
ranks = eigen(A)$vectors[,1] # my PageRanks
print(ranks)
# [1] 0.05564937 0.05564937 0.95364105 0.14304616 0.25280990
print(ranks / sum(ranks)) # normalized ranks
[1] 0.03809524 0.03809524 0.65282295 0.09792344 0.17306313
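If you only need the PageRank vector, power iteration is a common alternative to a full eigendecomposition. This is a swapped-in technique, not what the R code above does. The Python sketch below assumes adj[i][j] == 1 means page j links to page i, matching the column-stochastic orientation above. As a further assumption, it treats pages with no out-links as linking to every page, which avoids the division by zero an all-zero column would cause in the R normalization. The function name pagerank and its parameters are hypothetical.

```python
def pagerank(adj, d=0.85, tol=1e-12, max_iter=1000):
    """PageRank by power iteration.

    adj[i][j] == 1 means page j links to page i, so column j lists the
    out-links of page j (the same orientation as the column-stochastic matrix).
    """
    n = len(adj)
    out_degree = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    ranks = [1.0 / n] * n
    for _ in range(max_iter):
        new_ranks = []
        for i in range(n):
            s = 0.0
            for j in range(n):
                if out_degree[j]:
                    s += adj[i][j] / out_degree[j] * ranks[j]
                else:
                    # Dangling page (no out-links): spread its rank evenly (assumption)
                    s += ranks[j] / n
            # Damped link-following plus uniform teleportation, as in A = 0.85*R + 0.15*B
            new_ranks.append(d * s + (1 - d) / n)
        if sum(abs(a - b) for a, b in zip(new_ranks, ranks)) < tol:
            return new_ranks
        ranks = new_ranks
    return ranks


adj = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 0, 0],
]
ranks = pagerank(adj)
print(ranks)
```

Because every column of the implied transition matrix sums to 1, the ranks keep summing to 1 on each iteration. The result therefore corresponds to the normalized vector ranks / sum(ranks) in the R example.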
I want to quickly generate discrete random numbers where I have a known CDF. Essentially, the algorithm is:
Construct the CDF vector cdf (an increasing vector of values between 0 and 1, ending at 1)
Generate a uniform(0, 1) random number u
If u < cdf[1] choose 1
else if u < cdf[2] choose 2
else if u < cdf[3] choose 3
...
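The algorithm above is an inverse-CDF lookup, and since cdf is sorted, the chain of comparisons can be replaced by a binary search. Here is a minimal Python sketch, assuming cdf is an increasing list whose last entry is 1.0; the helper name sample_from_cdf is hypothetical. bisect.bisect_right counts how many CDF entries are <= u, which is exactly the 0-based index of the first entry greater than u.

```python
import bisect
import random


def sample_from_cdf(cdf, n, rng=random):
    """Return n draws from 1..len(cdf): draw k when cdf[k-2] <= u < cdf[k-1] (1-based)."""
    # bisect_right(cdf, u) == number of entries <= u, so adding 1 gives the 1-based choice
    return [bisect.bisect_right(cdf, rng.random()) + 1 for _ in range(n)]


cdf = [0.2, 0.5, 1.0]
draws = sample_from_cdf(cdf, 100_000, random.Random(42))
print(draws[:10])
```

Each draw costs O(log len(cdf)) time and no N-by-len(cdf) comparison matrix is built, so memory stays proportional to N.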
Example
First generate a CDF:
cdf = cumsum(runif(10000, 0, 0.1))
cdf = cdf/max(cdf)
Next generate N uniform random numbers:
N = 1000
u = runif(N)
Now sample the value:
##With some experimenting this seemed to be very quick
##However, with N = 100000 we run out of memory
##N = 10^6 would be a reasonable maximum to cope with
colSums(sapply(u, ">", cdf))
If you know the probability mass function (which you do, if you know the cumulative distribution function), you can use R's built-in sample function, where you can define the probabilities of discrete events with argument prob.
cdf = cumsum(runif(10000, 0, 0.1))
cdf = cdf/max(cdf)
system.time(sample(size=1e6,x=1:10000,prob=c(cdf[1],diff(cdf)),replace=TRUE))
user system elapsed
0.01 0.00 0.02
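Python's standard library offers the same shortcut as R's sample with prob. random.choices accepts cumulative weights directly through its cum_weights argument, so the CDF can be passed as-is without taking differences. This sketch rebuilds a CDF the same way as the R code; the seed is arbitrary and only there for repeatability.

```python
import itertools
import random

rng = random.Random(1)

# Same construction as the R example: cumulative sum of uniform(0, 0.1) steps, scaled to end at 1
steps = [rng.uniform(0, 0.1) for _ in range(10000)]
cdf = list(itertools.accumulate(steps))
total = cdf[-1]
cdf = [c / total for c in cdf]

# One million draws from 1..10000, weighted by the CDF
draws = rng.choices(range(1, len(cdf) + 1), cum_weights=cdf, k=10**6)
print(draws[:10])
```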
How about using cut:
N <- 1e6
u <- runif(N)
system.time(as.numeric(cut(u,cdf)))
user system elapsed
1.03 0.03 1.07
head(table(as.numeric(cut(u,cdf))))
1 2 3 4 5 6
51 95 165 172 148 75
If you have a finite number of possible values, then you can use findInterval or cut, or better yet sample, as mentioned by @Hemmo.
However, if you want to generate data from a distribution that theoretically extends to infinity (like the geometric, negative binomial, Poisson, etc.), then here is an algorithm that will work (it also works with a finite number of values if wanted):
Start with your vector of uniform values and loop through the distribution values, subtracting them from the vector of uniforms; the random value is the iteration at which the value goes negative. This is easier to see with an example. This generates values from a Poisson with mean 5 (replace the dpois call with your calculated values) and compares it to using the inverse CDF (which is more efficient in this case, where it exists).
i <- 0
tmp <- tmp2 <- runif(10000)
randvals <- rep(0, length(tmp) )
while( any(tmp > 0) ) {
tmp <- tmp - dpois(i, 5)
randvals <- randvals + (tmp > 0)
i <- i + 1
}
randvals2 <- qpois( tmp2, 5 )
all.equal(randvals, randvals2)
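Here is the same subtraction loop translated into Python, as a sketch under a few assumptions. The Poisson probabilities are generated recursively (P(k) = P(k-1) * lam / k) rather than with a library pmf, to avoid overflow for large k. A guard also stops the loop if the probabilities underflow to zero. The helper names poisson_pmfs and sample_by_subtraction are hypothetical.

```python
import math
import random


def poisson_pmfs(lam):
    """Yield P(X = 0), P(X = 1), ... for a Poisson(lam), computed recursively."""
    p = math.exp(-lam)
    k = 0
    while True:
        yield p
        k += 1
        p *= lam / k


def sample_by_subtraction(uniforms, pmfs):
    remaining = list(uniforms)
    values = [0] * len(remaining)
    for p in pmfs:
        if not any(r > 0 for r in remaining):
            break  # every uniform has gone non-positive: all values are fixed
        if p == 0.0:
            break  # pmf underflowed; guard against an endless loop
        for j in range(len(remaining)):
            remaining[j] -= p
            if remaining[j] > 0:
                values[j] += 1  # still positive, so the value is at least one larger
    return values


rng = random.Random(7)
uniforms = [rng.random() for _ in range(10000)]
values = sample_by_subtraction(uniforms, poisson_pmfs(5))
print(sum(values) / len(values))  # should be close to the Poisson mean, 5
```

Each uniform u ends up as the smallest k whose cumulative probability reaches u, which is why the R code's result matches qpois.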
I have done this before, but now I'm struggling with it again, and I think I am not understanding the math underlying the issue.
I want to generate a random number within a small range on either side of 1. Examples would be .98, 1.02, .94, 1.1, etc. All of the examples I find describe getting a random number between 0 and 100, but how can I use that to get within the range I want?
The programming language doesn't really matter here, though I am using Pure Data. Could someone please explain the math involved?
Uniform
If you want a (pseudo-)uniform distribution (evenly spaced) between 0.9 and 1.1, then the following will work, assuming rand(100) returns a number between 0 and 100:
range = 0.2
return 1-range/2+rand(100)*range/100
Adjust the range accordingly.
Pseudo-normal
If you wanted a normal distribution (bell curve) you would need special code, which would be language/library specific. You can get a close approximation with this code:
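sd = 0.1
mean = 1
count = 10
sum = 0
for(int i=0; i<count; i++) {
    sum = sum + (rand(100) - 50)  // each term is roughly uniform on [-50, 50]
}
// a uniform on [-50, 50] has standard deviation 100/sqrt(12),
// so the sum of count such terms has standard deviation 100*sqrt(count/12)
normal = sum / (100 * sqrt(count / 12.0))
normal = normal*sd + mean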
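For comparison, here is a Python sketch of the same sum-of-uniforms idea. It is an approximation based on the central limit theorem, not an exact normal generator. It assumes a uniform source on [0, 1) and uses 12 terms, so the standardizing divisor sqrt(12/12) is exactly 1. The name approx_normal is hypothetical. The tails are cut off: with 12 terms the result never lies more than 6 standard deviations from the mean.

```python
import math
import random


def approx_normal(mean, sd, count=12, rng=random):
    """Approximate N(mean, sd^2) by summing `count` uniforms (central limit theorem)."""
    total = sum(rng.random() for _ in range(count))
    # Each uniform on [0, 1) has mean 1/2 and variance 1/12
    z = (total - count / 2) / math.sqrt(count / 12)
    return mean + sd * z


rng = random.Random(3)
samples = [approx_normal(1.0, 0.1, rng=rng) for _ in range(100_000)]
print(min(samples), max(samples))
```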
Generally speaking, to get a random number within a range, you don't get a number between 0 and 100; you get a number between 0 and 1. This is inconsequential, however, as you could simply get the 0-1 number by dividing your number by 100, so I won't belabor the point.
When thinking about the pseudocode for this, think of the number between 0 and 1 that you obtain as a percentage. In other words, if I have an arbitrary range between a and b, what percentage of the way between the two endpoints is the point I have randomly selected? (Thus a random result of 0.52 means 52% of the distance from a to b.)
With this in mind, consider the problem this way:
Set the start and end-points of your range.
var min = 0.9;
var max = 1.1;
Get a random number between 0 and 1
var random = Math.random();
Take the difference between your start and end range points (b - a)
var range = max - min;
Multiply your random number by the difference
var adjustment = range * random;
Add back in your minimum value.
var result = min + adjustment;
And, so you can understand the values of each step in sequence:
var min = 0.9;
var max = 1.1;
var random = Math.random(); // random == 0.52796 (for example)
var range = max - min; // range == 0.2
var adjustment = range * random; // adjustment == 0.105592
var result = min + adjustment; // result == 1.005592
Note that the result is guaranteed to be within your range. Math.random() returns values from 0 (inclusive) up to, but not including, 1. At the two extremes, the following occur:
var min = 0.9;
var max = 1.1;
var random = Math.random(); // random == 0.0 (minimum)
var range = max - min; // range == 0.2
var adjustment = range * random; // adjustment == 0.0
var result = min + adjustment; // result == 0.9 (the range minimum)
var min = 0.9;
var max = 1.1;
var random = Math.random(); // random just below 1.0 (1.0 itself is never returned)
var range = max - min; // range == 0.2
var adjustment = range * random; // adjustment just below 0.2
var result = min + adjustment; // result just below 1.1 (the range maximum)
return 0.9 + rand(100) / 500.0
or am I missing something?
If rand() returns you a random number between 0 and 100, all you need to do is:
(rand() / 100) * 2
to get a random number between 0 and 2.
If on the other hand you want the range from 0.9 to 1.1, use the following:
0.9 + ((rand() / 100) * 0.2)
You can construct any distribution you like from a uniform one in the range [0,1) by a change of variable. In particular, if you want random numbers with some distribution whose cumulative distribution function is F, you just feed a uniform random number from [0,1) into the inverse function F^-1 of the desired CDF.
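As an illustration of the inverse-CDF route, the exponential distribution is a case where F has a closed-form inverse: F(x) = 1 - exp(-rate*x), so F^-1(u) = -ln(1 - u)/rate. Here is a minimal Python sketch; the function names are hypothetical and the seed is arbitrary.

```python
import math
import random


def inverse_cdf(u, rate):
    """Inverse of F(x) = 1 - exp(-rate * x), the exponential CDF."""
    return -math.log(1.0 - u) / rate  # 1 - u lies in (0, 1], so the log is finite


def exponential(rate, rng=random):
    # Feed a uniform [0, 1) draw through the inverse CDF
    return inverse_cdf(rng.random(), rate)


rng = random.Random(11)
samples = [exponential(2.0, rng) for _ in range(100_000)]
print(sum(samples) / len(samples))  # should be close to 1 / rate = 0.5
```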
One special (and maybe the most popular) case is the normal distribution N(0,1). Here you can use the Box-Muller transform. Scaling the result by the standard deviation and adding the mean gives you a normal distribution with the desired parameters.
You can also sum uniform random numbers and get an approximation of the normal distribution; this case is covered by Nick Fortescue above.
If your source random numbers are integers, you should first construct a random number in the real domain with some known distribution. For example, you can construct a uniform distribution in [0,1) this way: get a first integer in the range 0 to 99 and multiply it by 0.01, get a second integer, multiply it by 0.0001 and add it to the first, and so on. This way you get a number 0.XXYYZZ... Double precision is about 16 decimal digits, so you need 8 integer random numbers to construct one uniform double.
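Here is a sketch of the digit-concatenation idea in Python, assuming a source rand_int() that returns an integer from 0 to 99. The default uses random.randrange(100) purely as a stand-in for that source. Instead of adding scaled floating-point terms, it accumulates the digits as an exact integer and divides once at the end, so rounding happens only once. The function name uniform_from_digits is hypothetical.

```python
import random


def uniform_from_digits(rand_int=None, draws=8):
    """Combine `draws` integers in 0..99 into a float 0.XXYYZZ... in [0, 1)."""
    if rand_int is None:
        rand_int = lambda: random.randrange(100)  # stand-in for your integer source
    n = 0
    for _ in range(draws):
        n = n * 100 + rand_int()  # append two more decimal digits
    return n / 100 ** draws       # exact int / int division, rounded once


digits = iter([12, 34, 56, 78, 90, 12, 34, 56])
x = uniform_from_digits(lambda: next(digits))
print(x)
```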
Box-Muller to the rescue.
var z2_cached;
function normal_random(mean, variance) {
if ( z2_cached ) {
var z2 = z2_cached;
z2_cached = 0;
return z2 * Math.sqrt(variance) + mean;
}
var x1 = 1 - Math.random(); // in (0, 1], so Math.log(x1) is always finite
var x2 = Math.random();
var z1 = Math.sqrt(-2 * Math.log(x1) ) * Math.cos( 2*Math.PI * x2);
var z2 = Math.sqrt(-2 * Math.log(x1) ) * Math.sin( 2*Math.PI * x2);
z2_cached = z2;
return z1 * Math.sqrt(variance) + mean;
}
Use it with, for example, mean 1 and variance 0.01 (standard deviation 0.1):
for ( var i=0; i < 20; i++ ) console.log( normal_random(1, 0.01) );
0.937240893365304
1.072511121460833
0.9950053748909895
1.0034139439164074
1.2319710866884104
0.9834737343090275
1.0363970887198277
0.8706648577217094
1.0882382154101415
1.0425139197341595
0.9438723605883214
0.935894021237943
1.0846400276817076
1.0428213927823682
1.020602499547105
0.9547701472093025
1.2598174560413493
1.0086997644531541
0.8711594789918106
0.9669499056660755
The function returns normally distributed values around the given mean with the given variance.
low + (random() / 100) * range
So for example:
0.90 + (random() / 100) * 0.2
How near? You could use a Gaussian (a.k.a. Normal) distribution with a mean of 1 and a small standard deviation.
A Gaussian is suitable if you want numbers close to 1 to be more frequent than numbers a bit further away from 1.
Some languages (such as Java) will have support for Gaussians in the standard library.
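Python is another language with Gaussian support in the standard library: random.gauss(mu, sigma) takes a mean and a standard deviation. Here is a sketch with mean 1 and an assumed standard deviation of 0.03; pick whatever spread suits you. The seed is only there so the run is repeatable.

```python
import random

rng = random.Random(2024)  # seeded only for repeatability
samples = [rng.gauss(1.0, 0.03) for _ in range(10)]
print(samples)
```

Unlike a uniform range, a Gaussian has no hard limits, so clamp the result yourself if values outside some bound must never occur.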
Divide by 100 and add 1. (I assume you are looking for a range from 0 to 2?)
You want a range from -1 to 1 as output from your rand() expression.
( rand(2) - 1 )
Then scale that -1 to 1 range as needed. Say, for a .1 variation on either side:
(( rand(2) - 1 ) / 10 )
Then just add one.
(( rand(2) - 1 ) / 10 ) + 1
rand() already gives you a random number between 0 and 100, so it can produce only about 100 different values. Assuming you want up to three decimal places, 0.950 to 1.050 is the range you would be looking at.
The distribution can then be achieved by
0.95 + (rand() / 1000)
Are you looking for a random number in the range 1 to 2, like 1.1, 1.5, 1.632, etc.? If so, here is a simple Python snippet:
import random
print(random.random() + 1)
var randomNumber = Math.random();
while(randomNumber<0.9 && randomNumber>0.1){
randomNumber = Math.random();
}
if(randomNumber>=0.9){
alert(randomNumber);
}
else if(randomNumber<=0.1){
alert(1+randomNumber);
}
For numbers from 0.9 to 1.1:
seed = 1
range = 0.1
if your random is from 0..100:
f_rand = random/100
the generated number:
gen_number = (seed+f_rand*range*2)-range
You will get
1.04; 1.08; 1.01; 0.96; ...
with seed 3, range 2 => 1.95; 4.08; 2.70; 3.06; ...
I didn't understand this (sorry):
I am trying to set a random number on either side of 1: .98, 1.02, .94, 1.1, etc.
So, I'll provide a general solution for the problem instead.
Converting a random number generator
If you have a random number generator in a given range [0, 1)* with uniform distribution, you can convert it to any distribution using the following method:
1 - Describe the distribution as a function defined over the output range with a total area of 1. So this function is f(x) = the probability (density) of getting the value x.
2 - Integrate** the function.
3 - Equate it to the "randomic"***.
4 - Solve the equation for x. This gives you x as a function of the randomic.
*: Generalization for any input distribution is below.
**: The constant term of the integrated function is 0 (that is, you just discard it).
***: That is a variable that represents the result of generating a random number with uniform distribution in the range [0, 1). [I'm not sure if that's the correct name in English]
Example:
Let's say you want a value with the distribution f(x)=x^2 from 0 to 100. Well that function is not normalized because the total area below the function in the range is 1000000/3 not 1. So you normalize it scaling the curve in the vertical axis (keeping the relative proportions), that is dividing by the total area: f(x)=3*x^2 / 1000000 from 0 to 100.
Now we have a function with a total area of 1. The next step is to integrate it (you may already have done that to get the area) and equate it to the randomic.
The integrated function is: F(x)=x^3/1000000+c. And equate it to the randomic: r=x^3/1000000 (remember that we discard the constant term).
Now, we need to solve the equation for x, the resulting expression: x=100*r^(1/3). Now you can use this formula to generate numbers with the desired distribution.
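The result of this example can be checked numerically. Here is a Python sketch, assuming a uniform source on [0, 1); the function name sample_cubic is hypothetical. The exact mean of this distribution is the integral of x * 3x^2/1000000 over [0, 100], which is 75, so the sample mean should land close to that.

```python
import random


def sample_cubic(rng=random):
    r = rng.random()           # the "randomic": uniform on [0, 1)
    return 100 * r ** (1 / 3)  # x = 100 * r^(1/3), from r = x^3 / 1000000


rng = random.Random(5)
xs = [sample_cubic(rng) for _ in range(200_000)]
print(sum(xs) / len(xs))  # should be close to the exact mean, 75
```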
Generalization
If you have a random number generator with a custom distribution and want another, different arbitrary distribution, you first need the source distribution function and then use it to express the target arbitrary random number generator. To get the source distribution function, do the steps up to 3. For the target, do all the steps, and then replace the randomic with the expression you got from the source distribution.
This is better understood with an example...
Example:
You have a random number generator with uniform distribution in the range [0, 100) and you want the same distribution f(x)=3*x^2 / 1000000 from 0 to 100, for simplicity [since for that one we already did all the steps, giving us x=100*r^(1/3)].
Since the source distribution is uniform, the function is constant: f(z)=1. But we need to normalize it for the range, leaving us with f(z)=1/100.
Now we integrate it: F(z)=z/100. And equate it to the randomic: r=z/100, but this time we don't solve it for x; instead, we use it to replace r in the target:
x=100*r^(1/3) where r = z/100
=>
x=100*(z/100)^(1/3)
=>
x=100^(2/3)*z^(1/3) (roughly 21.544*z^(1/3))
And now you can use x=100^(2/3)*z^(1/3) to calculate random numbers with the distribution f(x)=3*x^2 / 1000000 from 0 to 100, starting with a random number in the distribution f(z)=1/100 from 0 to 100 [uniform].
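Here is a quick numerical check of the substitution, as a Python sketch; the function name from_uniform_0_100 is hypothetical. Plugging r = z/100 into x = 100*r^(1/3) is the same as multiplying z^(1/3) by 100^(2/3), and z = 100 maps to x = 100 as it should.

```python
import math
import random


def from_uniform_0_100(z):
    """Map z (uniform on [0, 100)) to x with density 3*x^2/1000000 on [0, 100]."""
    return 100 ** (2 / 3) * z ** (1 / 3)


# Substituting r = z/100 into x = 100*r^(1/3) gives the same numbers
for z in (0.0, 1.0, 12.5, 50.0, 99.9):
    print(z, from_uniform_0_100(z), 100 * (z / 100) ** (1 / 3))

xs = [from_uniform_0_100(random.uniform(0, 100)) for _ in range(5)]
print(xs)
```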
Note: If you have a normal distribution, use the bell function instead. The same method works for any other distribution. Watch out for asymptotes that some distributions may create; you may need to try different ways to solve the equations (the normal CDF, for example, has no closed-form inverse).
On discrete distributions
Sometimes you need to express a discrete distribution; for example, you want to get 0 with 95% chance and 1 with 5% chance. So how do you do that?
Well, you divide it into rectangular distributions such that the ranges together make up [0, 1), and use the randomic to evaluate:
$$
f(r) = \begin{cases} 0 & \text{if } r \in [0, 0.95) \\ 1 & \text{if } r \in [0.95, 1) \end{cases}
$$
Or you can take the complex path, which is to write a distribution function like this (making each option exactly a range of length 1):
$$
f(x) = \begin{cases} 0.95 & \text{if } x \in [0, 1) \\ 0.05 & \text{if } x \in [1, 2) \end{cases}
$$
Since each range has a length of 1 and the assigned values sum up to 1, we know that the total area is 1. Now the next step would be to integrate it:
$$
F(x) = \begin{cases} 0.95x & \text{if } x \in [0, 1) \\ 0.05(x-1) + 0.95 = 0.05x + 0.9 & \text{if } x \in [1, 2) \end{cases}
$$
Equate it to the randomic:
$$
r = \begin{cases} 0.95x & \text{if } x \in [0, 1) \\ 0.05x + 0.9 & \text{if } x \in [1, 2) \end{cases}
$$
And solve the equation...
OK, to solve that kind of equation, start by calculating the output ranges by applying the function:
[0, 1) becomes [0, 0.95)
[1, 2) becomes [0.95, 1), since 0.05*2 + 0.9 = 1
Now, those are the ranges for the solution:
$$
x = \begin{cases} ? & \text{if } r \in [0, 0.95) \\ ? & \text{if } r \in [0.95, 1) \end{cases}
$$
Now, solve the inner functions:
$$
x = \begin{cases} r/0.95 & \text{if } r \in [0, 0.95) \\ (r - 0.9)/0.05 = 20r - 18 & \text{if } r \in [0.95, 1) \end{cases}
$$
But since the output is discrete, taking the integer part gives the same result as the simple approach:
$$
x = \begin{cases} 0 & \text{if } r \in [0, 0.95) \\ 1 & \text{if } r \in [0.95, 1) \end{cases}
$$
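Here is a Python sketch of the complex path, to confirm that flooring the inverse CDF gives the same draws as the direct threshold on r. It assumes the probabilities 0.95 and 0.05 from the question. For those, the inverse of the piecewise-linear F is r/0.95 on [0, 0.95) and (r - 0.9)/0.05 on [0.95, 1). The function names are hypothetical.

```python
import math
import random


def inverse_cdf(r):
    """Piecewise inverse of F for P(X = 0) = 0.95, P(X = 1) = 0.05."""
    if r < 0.95:
        return r / 0.95      # solves r = 0.95*x on [0, 1)
    return 20 * r - 18       # solves r = 0.05*x + 0.9 on [1, 2)


def sample_discrete(rng=random):
    # Integer part of the continuous solution gives the discrete outcome
    return math.floor(inverse_cdf(rng.random()))


rng = random.Random(9)
draws = [sample_discrete(rng) for _ in range(100_000)]
print(draws.count(1) / len(draws))  # should be close to 0.05
```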
Note: using random to mean pseudo random.
Edit: Found it on Wikipedia (I knew I didn't invent it).