Probability of 3-character string appearing in a randomly generated password - math

If you have a randomly generated password, consisting of only alphanumeric characters, of length 12, and the comparison is case insensitive (i.e. 'A' == 'a'), what is the probability that one specific string of length 3 (e.g. 'ABC') will appear in that password?
I know the number of total possible combinations is (26+10)^12, but beyond that, I'm a little lost. An explanation of the math would also be most helpful.

The string "abc" can appear in the first position, making the string look like this:
abcXXXXXXXXX
...where the X's can be any letter or number. There are (26 + 10)^9 such strings.
It can appear in the second position, making the string look like:
XabcXXXXXXXX
And there are (26 + 10)^9 such strings also.
Since "abc" can appear anywhere from the first through the 10th position, there are 10*36^9 such strings.
But this overcounts, because it counts (for instance) strings like this twice:
abcXXXabcXXX
So we need to count all of the strings like this and subtract them off of our total.
Since there are 6 X's in this pattern, there are 36^6 strings that match this pattern.
I get 7+6+5+4+3+2+1 = 28 patterns like this. (If the first "abc" is at the beginning, the second can be in any of 7 places. If the first "abc" is in the second place, the second can be in any of 6 places. And so on.)
So subtract off 28*36^6.
...but that subtracts off too much. A string like this one was counted three times in the first step and then subtracted three times (once for each pair of occurrences), so it is now counted zero times instead of once:
abcXabcXabcX
So we have to add back in the strings like this, once. I get 4+3+2+1 + 3+2+1 + 2+1 + 1 = 20 of these patterns, meaning we have to add back in 20*(36^3).
But this string is now counted twice (4 times in the first step, minus 6 for its pairs, plus 4 for its triples):
abcabcabcabc
...so we have to subtract off 1.
Final answer:
10*36^9 - 28*36^6 + 20*(36^3) - 1
Divide that by 36^12 to get your probability.
See also the Inclusion-Exclusion Principle. And let me know if I made an error in my counting.
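To double-check the counting, here is a small Python sketch (an illustration, not part of the original answer). It applies the inclusion-exclusion sum with alternating signs, using C(n-2k, k) for the number of ways to place k non-overlapping copies of the pattern, and compares it against brute-force enumeration on a tiny alphabet. It assumes, as in the question, that all 36 case-insensitive alphanumeric characters are equally likely, and that the pattern cannot overlap itself (true for "abc"); the function names are my own.

```python
from itertools import product
from math import comb


def count_containing(n: int, q: int) -> int:
    """Inclusion-exclusion count of length-n strings over a q-symbol alphabet
    that contain a fixed 3-character pattern which cannot overlap itself.

    comb(n - 2*k, k) is the number of ways to place k non-overlapping copies
    of a length-3 block in a length-n string; the remaining n - 3*k characters
    are free, giving q ** (n - 3*k) strings per placement.
    """
    return sum((-1) ** (k + 1) * comb(n - 2 * k, k) * q ** (n - 3 * k)
               for k in range(1, n // 3 + 1))


def brute_force(n: int, alphabet: str, pattern: str) -> int:
    """Enumerate every string; only feasible for tiny n and alphabets."""
    return sum(pattern in "".join(s) for s in product(alphabet, repeat=n))


# Sanity check on a case small enough to enumerate: length 7 over {a, b, c}.
assert count_containing(7, 3) == brute_force(7, "abc", "abc")

count = count_containing(12, 36)
print(count == 10 * 36**9 - 28 * 36**6 + 20 * 36**3 - 1)  # True
print(count / 36**12)  # probability for a 12-character password
```

For n = 12 the four terms have coefficients 10, 28, 20 and 1, matching the pattern counts in the answer.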

If A is not equal to C, the probability P(n) of ABC occurring in a string of length n (assuming every alphanumeric symbol is equally likely) is
P(n)=P(n-1)+P(3)[1-P(n-3)]
where
P(0)=P(1)=P(2)=0 and P(3)=1/(36)^3
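A short Python sketch of this recurrence using exact fractions (assuming 36 equally likely symbols by default; the function name is my own). It relies on the same condition as the answer: because A differs from C, two occurrences of ABC cannot overlap.

```python
from fractions import Fraction


def prob_contains(n: int, alphabet_size: int = 36) -> Fraction:
    """Exact P(n) from the recurrence P(n) = P(n-1) + P(3) * (1 - P(n-3)),
    with P(0) = P(1) = P(2) = 0 and P(3) = 1 / alphabet_size**3.
    Valid only when the pattern's first and last characters differ."""
    p3 = Fraction(1, alphabet_size ** 3)
    p = [Fraction(0)] * max(n + 1, 4)
    p[3] = p3
    for m in range(4, n + 1):
        # Either ABC already occurs in the first m-1 characters, or it occurs
        # for the first time in the last three, which happens exactly when the
        # last three characters are ABC and the first m-3 do not contain it.
        p[m] = p[m - 1] + p3 * (1 - p[m - 3])
    return p[n]


print(float(prob_contains(12)))
```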

To expand on Paul R's answer. Probability (for equally likely outcomes) is the number of possible outcomes of your event divided by the total number of possible outcomes.
There are 10 possible places where a string of length 3 can be found in a string of length 12. And there are 9 more spots that can be filled with any alphanumeric character, which leads to 36^9 possibilities. So the number of possible outcomes of your event is 10 * 36^9.
Divide that by your total number of outcomes, 36^12, and your answer is 10 * 36^-3 ≈ 0.000214.
EDIT: This is not completely correct. In this solution, strings that contain the pattern more than once are counted more than once. However, they only contribute about 28/36^6 ≈ 1.3 * 10^-8 to the probability, so this answer is still correct to about 7 decimal places. If you want the exact answer, see Nemo's answer.
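To see how small the double-counting error is, this Python sketch compares 10/36^3 with the exact inclusion-exclusion value (10*36^9 - 28*36^6 + 20*36^3 - 1)/36^12, assuming 36 equally likely symbols:

```python
from fractions import Fraction

approx = Fraction(10, 36**3)  # counts strings with several occurrences more than once
exact = Fraction(10 * 36**9 - 28 * 36**6 + 20 * 36**3 - 1, 36**12)

print(f"approx     = {float(approx):.10f}")
print(f"exact      = {float(exact):.10f}")
print(f"difference = {float(approx - exact):.2e}")
```

The difference is dominated by the 28/36^6 term, so the two values differ only from the eighth decimal place on.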

Related

How can I identify inconsistencies and outliers in a dataset in R

I have a big dataset with a lot of columns, most of them non-numeric. I need to find inconsistencies in the data as well as outliers, and finding the inconsistencies would be easy if the dataset weren't so big (7032 rows to be exact).
An inconsistency would be something like: an ID that is supposed to be 4 letters and 4 numbers but is something else (like 3 numbers and 2 letters); another example would be a number that should be 0 or 1 but is -1 or 2.
Is there any function I can use to find the inconsistencies in each column?
For the columns that don't have numeric values, I thought of using a regex to validate each row of a given column, but I didn't find any information on how to do that.
For the outliers, I made a boxplot to see if I could spot any, like this:
boxplot(dataset$column)
But the plot didn't show any outliers. Should I be OK with the results I get from the plot, or should I try something else to check whether there really are outliers in the data?
For the specific examples you've given:
an ID must be four numbers and four letters:
!grepl("^[0-9]{4}-[[:alpha:]]{4}$", ID)
will be TRUE for inconsistent values (^ and $ mean beginning- and end-of-string respectively; {4} means "previous pattern repeats exactly four times"; [0-9] means "any symbol between 0 and 9" (i.e. any numeral); [[:alpha:]] means "any alphabetic character"; the - in the middle matches a literal hyphen, so this pattern expects IDs like 1234-ABCD). If you only want uppercase letters you could use [A-Z] instead (assuming you are not working in some weird locale like Estonian).
If you need a numeric value to be 0 or 1, then !num_val %in% c(0,1) will work (this works for any set of allowed values, including a specific set of allowed character values).
If you need a numeric value to be between a and b then !(a < num_val & num_val < b) ...

Number of possible combinations by changing 3 places in a 9 characters long string with a specific N(6) of characters that can be used for a change

Let's say we have a string a0d2bj5ew.
We want to find all possible combinations by changing 3 places in that string.
So it can be a0[d]2bj[5]e[w] or a0[d][2][b]j5ew and many more.
The character set that we want to use for the change is also known; in this example it is 012abc. So the N of that set of characters is 6.
Looking at all combinations that change 3 places also covers changing just one or two places, so it can also be a0[d]2bj[5]ew or just a0[d]2bj5ew.
So the question is: what is the formula for the total number N of all combinations under the criteria above?
a0d2bj5ew string has 9 characters.
We change 3 all possible places.
And we use 012abc as our set of characters that those places will be changed with, so the N is 6.
We can choose 3 (K) positions from 9 (M) places in C(9,3) ways (C(M,K) or nCr, the number of combinations).
Then we have to put 6 (N) possible characters onto these 3 (K) positions. There are 6^3 (N^K) ways if repeats are allowed (so we can get a2a), or A(6,3) ways if no repeats are possible (the number of arrangements, i.e. selections where order matters, so a2b is distinct from 2ab).
Result:
C(M,K) * N^K = M! / (K!(M-K)!) * N^K
or
C(M,K) * A(N,K) = M!/(K!(M-K)!) * N!/(N-K)!
These formulas don't account for cases where a replacement character is the same as the original one, so some resulting strings are counted more than once.
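A quick Python check of both formulas for this example, plus a brute-force count of the distinct strings that actually result when repeats are allowed (the helper name distinct_results is hypothetical). The brute-force number comes out smaller than C(M,K) * N^K because a replacement character can equal the original one, so different choices of positions can produce the same string:

```python
from itertools import combinations, product
from math import comb, perm

original = "a0d2bj5ew"
charset = "012abc"
M, K, N = len(original), 3, len(charset)

with_repeats = comb(M, K) * N**K            # C(9,3) * 6^3
without_repeats = comb(M, K) * perm(N, K)   # C(9,3) * A(6,3)


def distinct_results(s: str, k: int, chars: str) -> int:
    """Brute force: count the distinct strings produced by choosing k positions
    and overwriting each one with a character from `chars` (repeats allowed)."""
    results = set()
    for positions in combinations(range(len(s)), k):
        for replacement in product(chars, repeat=k):
            t = list(s)
            for pos, ch in zip(positions, replacement):
                t[pos] = ch
            results.add("".join(t))
    return len(results)


print(with_repeats, without_repeats)  # 18144 10080
print(distinct_results(original, K, charset))  # smaller than with_repeats
```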

How would I convert numbers to their decimal representation?

I am not sure how to approach it but could someone help me convert the following numbers to their decimal representation:
and
The general method goes something like this:
Work from right to left: count the positions (starting with zero) and sum up the terms according to the following formula:
Say you're working in base x. Then, if you're at the ith position, and that digit is d, then that position will contribute a term of d times x^i to the final sum.
As a concrete example, take your first number - here, x=7 (the base). Starting from the right, the first digit is d=6 at the i=0 position. So we start with 6*(7^0) = 6(1) = 6.
Moving to the left, i=1 and d=5. So we get 5(7^1) = 5(7) = 35 for this term.
Then, moving to the last digit, i=2 and d=4. So we get 4*(7^2)=4(49)=196 for the last term.
Now, you can just add all of these up to get 6 + 35 + 196 = 237 as your final number (in base 10, that is).
The exact same algorithm works for any base, so you should be able to apply it to the binary number in the exact same way.
(Just let x=2 and work right to left, noting that i ranges from 0 to 7 here.)
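The same right-to-left procedure as a small Python function (an illustrative sketch; it only handles the digits 0-9, so bases up to 10). The first call reproduces the worked base-7 example; for the binary number, pass base=2:

```python
def to_decimal(digits: str, base: int) -> int:
    """Sum d * base**i, where i counts positions from the right starting at 0."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)  # assumes every character is a digit 0-9
        if d >= base:
            raise ValueError(f"digit {d} is not valid in base {base}")
        total += d * base**i
    return total


print(to_decimal("456", 7))  # 237
print(int("456", 7))         # Python's built-in conversion agrees: 237
```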

Number of combinations

Given the following letters in a license plate, how many combinations of them can you create?
AAAA1234
Please note that this is not a homework question (I am too old for college :)
I am only trying to understand permutations and combinations. I always get lost when I see questions like this. Do I use n!, nPr or nCr?
Any book on this subject in addition to the logic used to arrive at the answer will also be greatly appreciated.
I have faith in exactly one method to remember such formulas: Rethink through the reasoning to justify it as needed. Then, each time you need the formula, remembering it becomes a mental exercise that makes it easier to remember it the next time. It also allows you to know the math on your own authority, instead of someone else's authority.
If the letters are all different, then there are n choices for the first letter, n-1 choices for the second letter, and so on. That makes n!. However, in your problem the letters are not all different. One trick is to tag them to make them different so that you are overcounting, then divide by the amount that you are overcounting. If a of the symbols are A, then you can tag them in a! ways. They are then all different, so that the answer to the modified question is n!. So the answer to the original question is n!/a! (This is assuming that the symbols other than the A are fixed, distinct numbers.)
Another argument is to count the positions for the numbers. There are n positions for the 1, n-1 positions for the 2, etc., so you get n(n-1)...(n-r+1) = n!/a!, where r = n-a.
In fact the answer is the same as the permutation formula nPr. And your arrangements are much the same as partial permutations, which is what the formula is for. But you'll learn it better if you reason through it before looking at the formula.
As for books, I might suggest Brualdi, Introductory Combinatorics.
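Both counting arguments in this answer can be checked numerically in Python (a minimal sketch for n = 8 symbols, a = 4 of them A):

```python
from math import factorial, perm

n, a = 8, 4  # AAAA1234: 8 symbols, 4 of which are identical A's
r = n - a    # the 4 distinct digits

tagged_then_divided = factorial(n) // factorial(a)  # n!/a!
positions_for_digits = perm(n, r)                   # n(n-1)...(n-r+1) = nPr

print(tagged_then_divided, positions_for_digits)  # 1680 1680
```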
One strategy that you can use (there will be many) is to get all the permutations possible, then divide out the repeats.
Permutations of 8 elements = 8!
But for each unique arrangement of these, there are a bunch more with the same positions of the A's. So, how many ways can you arrange four A's in one particular set of positions?
Permutations of 4 A's = 4!
So the total unique arrangements should be 8! / 4!
If I'm totally wrong, someone just say so and I'll delete this answer...
If you mean 4 letters A-Z and 4 digits 0...9 in that order, then you have
26 letters x
26 letters x
26 letters x
26 letters x
10 digits x
10 digits x
10 digits x
10 digits
= 26^4 * 10^4
= 4569760000
If no leading "0" is allowed, the first digit has only 9 choices, so you get 26^4 * 9 * 10^3, i.e. 10% fewer.
Edit1: Miscounted the "A"
Edit2: I reread the question. Originally I thought it was just four letters at the beginning followed by 4 numbers. If it's just a permutation thing, then the answer is different: 8! permutations in total, but the 4! ways of permuting the A's among themselves all give the same arrangement, so 8! / 4! = 1680.
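The multiplication principle from this answer, as a Python sketch (the no-leading-zero variant assumes only the first of the four digits is restricted):

```python
letters, digits = 26, 10

all_plates = letters**4 * digits**4                      # 26^4 * 10^4
no_leading_zero = letters**4 * (digits - 1) * digits**3  # first digit 1-9

print(all_plates)       # 4569760000
print(no_leading_zero)  # 4112784000
```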
Answer is 8!/4!
Let's try to explain with a simpler question: how many arrangements of 112 are there?
There are 112, 121 and 211. If all digits were unique, we could find the answer with 3!. But there is a repeated digit, so we divide out its repeats: 3!/2! = 3.
Another example is 1122. Here two digits repeat, so we divide twice: 4!/(2!*2!) = 6.
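The same idea generalizes to several repeated symbols: divide n! by k! for each symbol that appears k times. A small Python sketch that compares this formula with brute-force enumeration (the function name is my own):

```python
from collections import Counter
from itertools import permutations
from math import factorial


def distinct_arrangements(s: str) -> int:
    """n! divided by k! for every symbol that appears k times."""
    result = factorial(len(s))
    for count in Counter(s).values():
        result //= factorial(count)
    return result


for word in ("112", "1122", "AAAA1234"):
    brute = len(set(permutations(word)))  # enumerate, then drop duplicates
    print(word, distinct_arrangements(word), brute)
```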
I think this is a good explanation of permutations and combinations:
Easy Permutations and Combinations Better explained.
It goes step by step until you discover how to make the calculations.
No need for permutations, because all letters can be repeated, and so can the numbers.
Since the given example is [AAAA1234], we have 4 letters and 4 digits.
For each letter we have 26 {A-Z} possible choices.
That's why for 4 letters we have 26^4.
For each digit we have 10 {0-9} possible choices, except that the last digit has only 9 possible choices {case 1} if it is not allowed to be 0; otherwise it has 10 {case 2}.
That's why for 4 digits we have 9*10^3 {case 1} or 10^4 {case 2}.
The total number of combinations is 9*(26^4)*(10^3) {case 1} or (26^4)*(10^4) {case 2}.
But if your question is about permutations of the set {A,A,A,A,1,2,3,4}, then consider the equivalent set of distinct symbols {1,2,3,4,5,6,7,8}, and avoid counting repeated sequences by dividing by the permutations of {5,6,7,8}; the answer is 8!/4! = 5*6*7*8 = 1680. Here {5,6,7,8} represent {A,A,A,A}. See #Tesserex & #erkangur
How many distinct sets of positions can the A's occupy? Given this value, multiply by the number of distinct arrangements of 1234 and you have your answer. You'll need to choose the positions for the A's, and then a factorial will help with the arrangements of 1234.
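Carrying this hint through in Python (a sketch): there are C(8,4) ways to place the A's and 4! ways to order 1234 in the remaining slots.

```python
from math import comb, factorial

positions_for_As = comb(8, 4)         # choose which 4 of the 8 slots hold an A
arrangements_of_1234 = factorial(4)   # order the digits 1, 2, 3, 4 in the other slots

print(positions_for_As, arrangements_of_1234, positions_for_As * arrangements_of_1234)
# prints: 70 24 1680
```

This agrees with 8!/4! = 1680.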
Consider a simpler example. Let's say you had asked the question:
How many arrangements are there of the symbols: ABCD1234?
Now, since every symbol is distinct, there are 8! ways to arrange them.
Now let's build up to your problem. If we change the letter B to an A, we have AACD1234.
This pairs up the arrangements: any arrangement in which we could previously have switched the A and the B now looks identical to its swapped twin. Therefore, we now have 8!/2 arrangements.
Replacing the C with another A is different: now all 3! = 6 orderings of the three A's look the same, so the count becomes 8!/3!, a further division by 3 rather than by 2. Replacing the D with a fourth A likewise divides by 4, giving 8!/4!.
So, if only one symbol is duplicated, the generalized formula is (number of symbols total)!/(number of copies of that symbol)!
In your case, the number of possible arrangements is 8!/4! = 1680.
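A brute-force check of this step-by-step construction in Python (a sketch; it enumerates all 8! orderings at each step, which is cheap at this size). It prints how many distinct arrangements remain after each letter is turned into an A, and by what factor the count shrinks at each step:

```python
from itertools import permutations

steps = ["ABCD1234", "AACD1234", "AAAD1234", "AAAA1234"]
previous = None
for s in steps:
    distinct = len(set(permutations(s)))  # brute-force count of distinct arrangements
    ratio = "" if previous is None else f"(previous / this = {previous // distinct})"
    print(s, distinct, ratio)
    previous = distinct
```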

Maths Question: number of different permutations

This is more of a maths question than programming but I figure a lot of people here are pretty good at maths! :)
My question is: Given a 9 x 9 grid (81 cells) that must contain the numbers 1 to 9 each exactly 9 times, how many different grids can be produced. The order of the numbers doesn't matter, for example the first row could contain nine 1's etc. This is related to Sudoku and we know the number of valid Sudoku grids is 6.67×10^21, so since my problem isn't constrained like Sudoku by having to have each of the 9 numbers in each row, column and box then the answer should be greater than 6.67×10^21.
My first thought was that the answer is 81! however on further reflection this assumes that the 81 numbers possible for each cell are different, distinct number. They are not, there are 81 possible numbers for each cell but only 9 possible different numbers.
My next thought was then that each of the cells in the first row can be any number between 1 and 9. If by chance the first row happened to be all the same number, say all 1s, then each cell in the second row could only have 8 possibilities, 2-9. If this continued down until the last row, then the number of different permutations could be calculated by 9^2 * 8^2 * 7^2 ..... * 1^2. However, this doesn't work if each row doesn't contain 9 of the same number.
It's been quite a while since I studied this stuff and I can't think of a way to work it out, I'd appreciate any help anyone can offer.
Imagine taking 81 blank slips of paper and writing a number from 1 to 9 on each slip (nine of each number). Shuffle the deck, and start placing the slips on the 9x9 grid.
You'd be able to create 81! different patterns if you considered each slip to be unique.
But instead you want to consider all the 1's to be equivalent.
For any particular configuration, how many times will that configuration be repeated
due to the 1's all being equivalent? The answer is 9!, the number of ways you can permute the nine slips with 1 written on them.
So that cuts the total number of permutations down to 81!/9!. (You divide by the number of indistinguishable permutations. Instead of 9! indistinguishable permutations, imagine there were just 2 indistinguishable permutations. You would divide the count by 2, right? So the rule is, you divide by the number of indistinguishable permutations.)
Ah, but you also want the 2's to be equivalent, and the 3's, and so forth.
By the same reasoning, that cuts down the number of permutations to
81!/(9!)^9
By Stirling's approximation, that is roughly 5.8 * 10^70.
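A Python sketch comparing the exact value of 81!/(9!)^9 with the plain Stirling estimate sqrt(2*pi*n)*(n/e)^n, computed in log10 to avoid overflow (function names are my own). The exact count is about 5.31 * 10^70; the uncorrected Stirling estimate lands a little high because its roughly 1% underestimate of 9! is compounded nine times in the denominator.

```python
import math


def multinomial_grid_count(symbols: int = 9, copies: int = 9) -> int:
    """Exact number of grids: (symbols * copies)! / (copies!) ** symbols."""
    return math.factorial(symbols * copies) // math.factorial(copies) ** symbols


def stirling_log10_factorial(n: int) -> float:
    """log10 of Stirling's approximation sqrt(2*pi*n) * (n/e)**n."""
    return 0.5 * math.log10(2 * math.pi * n) + n * math.log10(n / math.e)


exact = multinomial_grid_count()
approx_log10 = stirling_log10_factorial(81) - 9 * stirling_log10_factorial(9)

print(f"exact    = {exact:.4e}")
print(f"Stirling = {10 ** approx_log10:.4e}")
```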
First, let's start with 81 numbers, 1 through 81. The number of permutations for that is 81P81, or 81!. Simple enough.
However, we have nine 1s, which can be arranged in 9! indistinguishable permutations. Same with 2, 3, etc.
So what we have is the total number of board permutations divided by all the indistinguishable permutations of all numbers, or 81! / (9! ** 9).
>>> import operator
>>> from functools import reduce
>>> reduce(operator.mul, range(1, 82)) // (reduce(operator.mul, range(1, 10)) ** 9)
53130688706387569792052442448845648519471103327391407016237760000000000
