Probability theory for random string generator


I would like to understand probability theory and how it works.
Imagine we have two variables (x, y).
If x contains the letters a-z and y contains the digits 0-9,
how many strings can we get from merging them?
Another example:
function generate($l = 10) {
    $str = "";
    for ($x = 0; $x < $l; $x++) {
        $str .= substr(str_shuffle("0123456789abcdefghijklmnopqrstuvwxyz"), 0, 1);
    }
    return $str;
}
Again, from merging them, how many strings can we get?
Thanks!

You are trying to find the number of distinct strings of length 1 to 10 over the character set a-z, 0-9. That means there are 36 possible characters to choose from at each position.
The total is the sum of the counts for each string length.
For a string of length 1, there are 36 possibilities (a, b, ..., z, 0, 1, ..., 9).
For a string of length 2, there are 36^2 possibilities (aa, ab, ..., az, a0, ..., a9, ba, ..., 99).
For a string of length 3, there are 36^3 possibilities, and so on.
For every length there are 36^(length of the string) possibilities, so the total is the sum 36^1 + 36^2 + ... + 36^10. If the length is fixed at 10, as with the default argument of generate(), the count is simply 36^10.
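The sum above is easy to check numerically. The following is a minimal Python sketch (the names ALPHABET_SIZE, MAX_LENGTH and count_strings_up_to are illustrative, not from the question); it computes both the total over lengths 1 to 10 and the count for a fixed length of 10.

```python
ALPHABET_SIZE = 36  # 26 letters a-z plus 10 digits 0-9
MAX_LENGTH = 10


def count_strings_up_to(max_length, alphabet_size=ALPHABET_SIZE):
    """Number of distinct strings of length 1..max_length over the alphabet."""
    return sum(alphabet_size ** length for length in range(1, max_length + 1))


# Total over all lengths 1..10, and the count for exactly 10 characters.
total = count_strings_up_to(MAX_LENGTH)
fixed = ALPHABET_SIZE ** MAX_LENGTH
print(total, fixed)
```

The total is a geometric series, so it also equals (36^11 - 36) / 35.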

Related

Encode numbers with letters with fixed length?

I have two numbers: the first is in the range 100000 - 999999 (fixed length of 6 characters, [0-9]) and the second is in the range 1000000 - 9999999 (fixed length of 7 characters, [0-9]). How can I encode/decode these numbers (they need to remain separate after decoding) using only uppercase letters [A-Z] and digits [0-9], with a fixed length of 8 characters in total?
Example:
input -> num_1: 242404, num_2 : 1002000
encode -> AX3B O3XZ
decode -> 2424041002000
Is there any algorithm for this type of problem?

This is just a simple mapping from one set of values to another set of values. The procedure is always the same:
List all possible input and output values.
Find the index of the input.
Return the value of the output list at that index.
Note that it's often not necessary to make an actual list (i.e. loading all values into some data structure). You can typically compute the value for any index on-demand. This case is no different.
Imagine a list of all possible input pairs:
0 100'000, 1'000'000
1 100'000, 1'000'001
2 100'000, 1'000'002
...
K 100'000, 9'999'999
K+1 100'001, 1'000'000
K+2 100'001, 1'000'001
...
N-1 999'999, 9'999'998
N 999'999, 9'999'999
For any given pair (a, b), you can compute its index i in this list like so:
// Make a and b zero-based
a -= 100'000
b -= 1'000'000
// b has 9'000'000 possible values (0 to 8'999'999), so that is the multiplier
i = a*9'000'000 + b
Convert i to base 36 (A-Z and 0-9 gives you 36 symbols), pad on the left with zeros as necessary [1], and insert a space after the fourth digit.
encoded = addSpace(zeroPad(base36(i)))
To get back to the input pair:
Convert the base 36 string to base 10 (this is the index into the list, remember), then derive a and b from the index.
i = base10(removeSpace(encoded))
a = i/9'000'000 + 100'000 // integer division (i.e. ignore remainder)
b = i%9'000'000 + 1'000'000
Here is an implementation in Go: https://play.golang.org/p/KQu9Hcoz5UH
[1] If you don't like the idea of zero padding you can also offset i at this point. Note, however, that the list has 900'000 * 9'000'000 = 8.1 * 10^12 entries, while eight base 36 digits give only 36^8 ≈ 2.8 * 10^12 values. The full input range therefore needs nine digits (36^9 ≈ 1.0 * 10^14); eight digits suffice only if the inputs are restricted to a smaller range.
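As a rough illustration of the index mapping, here is a minimal Python sketch. It is not the linked Go code. It assumes a multiplier of 9'000'000 (the number of possible values of b) so that every pair gets a unique index, and it pads to 9 characters because the 900'000 * 9'000'000 possible pairs exceed 36^8. The names encode, decode, WIDTH and the constants are illustrative.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
A_MIN, A_COUNT = 100_000, 900_000        # a in 100000..999999
B_MIN, B_COUNT = 1_000_000, 9_000_000    # b in 1000000..9999999
WIDTH = 9  # assumption: 36**8 < A_COUNT * B_COUNT <= 36**9


def encode(a, b):
    """Map the pair (a, b) to a fixed-width base 36 string."""
    if not (0 <= a - A_MIN < A_COUNT and 0 <= b - B_MIN < B_COUNT):
        raise ValueError("input out of range")
    i = (a - A_MIN) * B_COUNT + (b - B_MIN)  # index in the list of all pairs
    out = ""
    for _ in range(WIDTH):                   # fixed width gives zero padding
        i, r = divmod(i, 36)
        out = DIGITS[r] + out
    return out


def decode(s):
    """Recover (a, b) from a string produced by encode()."""
    i = int(s, 36)                           # base 36 back to an integer
    a, b = divmod(i, B_COUNT)
    return a + A_MIN, b + B_MIN


print(encode(242404, 1002000), decode(encode(242404, 1002000)))
```

If the inputs are known to come from a smaller range, adjusting the counts so that their product stays at or below 36^8 would allow WIDTH = 8.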

Statistical probability of N contiguous true-bits in a sequence of bits?

Let's assume I have an N-bit stream of generated bits. (In my case, 64 kilobits.)
What is the probability of finding a sequence of X "all true" bits within a stream of N bits, where X = (2 to 16), N = (16 to 1000000), and X < N?
For example:
If N=16 and X=5, what's the likelihood of finding 11111 within a 16-bit number?
Like this pseudo-code:
int N = 1<<16; // (64KB)
int X = 5;
int Count = 0;
for (int i = 0; i < N; i++) {
int ThisCount = ContiguousBitsDiscovered(i, X);
Count += ThisCount;
}
return Count;
That is, if we ran an integer in a loop from 0 to 64K-1... how many times would 11111 appear within those numbers.
Extra rule: 1111110000000000 doesn't count, because it has 6 true values in a row, not 5. So:
1111110000000000 = 0x // because it's 6 contiguous true bits, not 5.
1111100000000000 = 1x
0111110000000000 = 1x
0011111000000000 = 1x
1111101111100000 = 2x
I'm trying to do some work involving physically-based random-number generation, and detecting "how random" the numbers are. That's what this is for.
...
This would be easy to solve if N were less than 32 or so, I could just "run a loop" from 0 to 4GB, then count how many contiguous bits were detected once the loop was completed. Then I could store the number and use it later.
Considering that X ranges from 2 to 16, I'd literally only need to store 15 numbers, each less than 32 bits! (if N=32)!
BUT in my case N = 65,536. So I'd need to run a loop, for 2^65,536 iterations. Basically impossible :)
No way to "experimentally calculate the values for a given X, if N = 65,536". So I need maths, basically.

Fix X and N, obviously with X < N. There are 2^N possible bit strings of length N, and N-X+1 possible positions for a run of X consecutive 1s (written 1*X below; in this part I'm only looking for 1s that sit together). Consider for example N = 5 and X = 2: 01011 is a valid bit string, and with the last two characters fixed (the last two 1s) you have 2^2 possible combinations for that 1*X sequence. Then you have two cases:
Border case: the 1*X run is at the border, so you have (2^(N-X-1))*2 possible combinations.
Inner case: you have (2^(N-X-2))*(N-X-1) possible combinations.
So the probability is (border + inner)/2^N.
Examples:
1) N = 3, X = 2: the probability is 2/2^3.
2) N = 4, X = 2: the probability is 5/16.

A bit brute force, but I'd do something like this to avoid getting mired in statistics theory:
Multiply the probabilities (1 bit = 0.5, 2 bits = 0.5*0.5, etc) while looping
Keep track of each X and when you have the product of X bits, flip it and continue
Start with small example (N = 5, X=1 - 5) to make sure you get edge cases right, compare to brute force approach.
This can probably be expressed as something like Sum(Sum(0.5^x, x = 1..16), n = 1..65536), but edge cases need to be taken into account (i.e. 7 bits doesn't fit, discard probability), which gives me a bit of a headache. :-)

@Andrex's answer is plain wrong, as it counts some combinations several times.
For example, consider the case N=3, X=1. The combination 101 occurs with probability 1/2^3, but the border calculation counts it twice: once as the sequence starting with 10 and once as the sequence ending with 01.
His calculation gives a probability of (1+4)/8, whereas there are only 4 unique sequences that contain a run of exactly one 1 (as opposed to cases such as 011, where the 1s form a longer run):
001
010
100
101
and so the probability is 4/8.
To count the number of unique sequences you need to account for sequences in which the run can appear multiple times. This happens whenever X is smaller than N/2. I'm not sure how to count them, though.
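One possible way to count the unique sequences without double counting is a small dynamic program over the current run length. The Python sketch below is an illustration, not taken from any of the answers: it counts bit strings of length N that contain at least one run of exactly X ones (the probability of "at least one" occurrence, not the number of occurrences from the question's pseudo-code). A brute-force enumeration is included for cross-checking small cases. All function names are illustrative.

```python
from itertools import product


def has_exact_run(bits, x):
    """True if bits contains a maximal run of exactly x ones."""
    run = 0
    for b in bits:
        if b:
            run += 1
        else:
            if run == x:
                return True
            run = 0
    return run == x  # a run may also end at the last bit


def count_brute_force(n, x):
    """Enumerate all 2**n bit strings; only feasible for small n."""
    return sum(has_exact_run(bits, x) for bits in product((0, 1), repeat=n))


def count_dp(n, x):
    """Same count via dynamic programming over (current run, found flag).

    The run length is capped at x + 1, which stands for "longer than x".
    """
    states = {(0, False): 1}
    for _ in range(n):
        nxt = {}
        for (run, found), ways in states.items():
            # Append a 0: the current run ends; it counts if its length is x.
            key0 = (0, found or run == x)
            nxt[key0] = nxt.get(key0, 0) + ways
            # Append a 1: the run grows (capped at x + 1).
            key1 = (min(run + 1, x + 1), found)
            nxt[key1] = nxt.get(key1, 0) + ways
        states = nxt
    return sum(w for (run, found), w in states.items() if found or run == x)


print(count_brute_force(3, 1), count_dp(3, 1))
```

Dividing count_dp(N, X) by 2**N gives the probability; for large N, fractions.Fraction keeps the result exact. The loop runs N times over at most 2*(X+2) states, so N = 65,536 is within reach.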

How to get the index of a string of certain length that is made from a list of character in R

Let's say I have a list of character: ['A','C','G','U'] and I want to make strings of a certain length, let's say 5.
From this, I can represent each string of this length as its index in dictionary order. For example, AAAAA is 1, AAAAC is 2, ..., AAACA is 5, etc...
My question is, given an arbitrary string of this length, let's say GUGAC, how do I get its index using R? (In this case, for GUGAC, it should be 738)

What you have here is a base 4 numbering system. The method is to convert the letters into the corresponding base 4 digits, multiply each by the matching power of 4, and take the sum of the values.
string <- "GUGAC"
# Convert the string to a vector of letters
strletters <- unlist(strsplit(string, ""))
# Convert from letters to base 4 digits (A = 0, C = 1, G = 2, U = 3)
facts <- factor(strletters, levels = c("A", "C", "G", "U"))
nums <- as.integer(facts) - 1
# Create the vector of multipliers (powers of 4, highest first)
multipliers <- 4^((length(nums) - 1):0)
# Sum of multipliers * nums, plus 1 because the index starts at 1, not 0
sum(multipliers * nums) + 1
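For cross-checking the arithmetic outside R, the same base 4 idea can be sketched in Python. The functions index_of and string_at are illustrative names; string_at is the inverse mapping, which the question did not ask for but which makes the round trip easy to verify.

```python
ALPHABET = "ACGU"  # dictionary order: A = 0, C = 1, G = 2, U = 3


def index_of(s, alphabet=ALPHABET):
    """1-based dictionary-order index of s among strings of the same length."""
    value = 0
    for ch in s:
        value = value * len(alphabet) + alphabet.index(ch)
    return value + 1


def string_at(index, length, alphabet=ALPHABET):
    """Inverse of index_of: the string of the given length at a 1-based index."""
    value = index - 1
    chars = []
    for _ in range(length):
        value, digit = divmod(value, len(alphabet))
        chars.append(alphabet[digit])
    return "".join(reversed(chars))


print(index_of("GUGAC"), string_at(738, 5))
```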

Stirling numbers of the second kind with multisets

I was looking at Stirling numbers of the second kind (http://mathworld.wolfram.com/StirlingNumberoftheSecondKind.html), which count the number of ways to split a set of n elements into k non-empty subsets, where order does not matter, and was wondering how to write a non-naive algorithm to compute
S(n, k, {occurrences of each element})
where
S(6, 3, {1, 2, 3})
would give the total number of ways a multiset of 6 elements, in which 3 are the same element, a different 2 are another element, and 1 is unique, could be split into 3 non-empty sets, ignoring permutations.
There is a recursive formula for regular Stirling numbers of the second kind S(n, k), but there is unlikely to be a comparable formula for multisets.
So what's an algorithm that could calculate this number?
Relevant question on Math.SE here, without a real method to calculate this number.
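For reference, the recurrence for the regular numbers, S(n, k) = k*S(n-1, k) + S(n-1, k-1), can be sketched in Python as below. The second function is only a naive brute-force counter for the multiset variant: it enumerates set partitions and removes duplicates. It is not the non-naive algorithm asked for, but it could serve to check candidate formulas on small inputs. All names are illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Regular Stirling number of the second kind via the standard recurrence."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def set_partitions(items, k):
    """Yield every partition of the list items into exactly k non-empty blocks."""
    def rec(idx, blocks):
        if idx == len(items):
            if len(blocks) == k:
                yield [list(b) for b in blocks]
            return
        x = items[idx]
        for b in blocks:              # put x into an existing block
            b.append(x)
            yield from rec(idx + 1, blocks)
            b.pop()
        if len(blocks) < k:           # or open a new block with x
            blocks.append([x])
            yield from rec(idx + 1, blocks)
            blocks.pop()
    yield from rec(0, [])


def multiset_partitions_brute(counts, k):
    """Naive count for the multiset case; counts[i] = occurrences of element i."""
    items = [t for t, c in enumerate(counts) for _ in range(c)]
    seen = set()
    for part in set_partitions(items, k):
        # Canonical form: sorted blocks, so equal elements are interchangeable.
        seen.add(tuple(sorted(tuple(sorted(b)) for b in part)))
    return len(seen)


print(stirling2(6, 3), multiset_partitions_brute([1, 2, 3], 3))
```

When every element occurs once, the brute-force count agrees with stirling2(n, k), which gives a basic sanity check.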

How many unique strings is possible with set amount of characters and length?

If I have two characters (a, b) and a length of three (aaa, aab ...), how do I count how many unique strings I can make of that (and what is the math method called)?
Is this correct?
val = 1, amountCharacters = 2, length = 3;
for (i = 1; i <= length; ++i) { val = amountCharacters*val; uniqueStrings = val }
This example returns 8 which is correct. If I try with something higher, like amountCharacters = 10 it returns 1000. Is it still correct?

If you have n different characters and the length is k, there are exactly n^k possible strings you can form. Each character, independently of the rest, can be one of n different options, and there are k total choices to make. Your code is correct.
For 2 possible characters and 10 letters, there are exactly 1024 possible strings.
Hope this helps!
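A quick way to convince yourself of the n^k rule is to enumerate the strings for small inputs and compare against the formula. This is a minimal Python sketch with illustrative function names:

```python
from itertools import product


def count_by_enumeration(chars, length):
    """Count strings by actually generating every one (small inputs only)."""
    return sum(1 for _ in product(chars, repeat=length))


def count_by_formula(n_chars, length):
    """n^k: each of the `length` positions independently has n_chars options."""
    return n_chars ** length


# Two characters, length three: aaa, aab, aba, abb, baa, bab, bba, bbb.
print(count_by_enumeration("ab", 3), count_by_formula(2, 3))
```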

The same rules as for number bases apply.
So the short answer is amountCharacters ^ length.
The longer, more natural answer:
With one letter there are X possible strings.
With two letters there are X*X possible strings,
and so on.
X equals the number of possible values, i.e. the amount of characters in your question.

If I understand your question correctly, if you have N characters and want to construct a string of length L, the number of combinations is just N^L (i.e. N to the power of L).
There are various other results you can get if there are different limitations on what the string can contain, e.g. combinations or permutations.
