Using base X, how high can I count using Y characters? - math

I know that the total number of permutations for a given base is the factorial... so the total number of permutations of "abc" is 3! or 3x2x1 or 6.
Obviously I'm not sure of the terminology to properly phrase my question, but I would like to find the highest numbered permutation before the "length" of it's representation increases to X characters.
For example, Using a Base 62 'alphabet', I can represent integers up to 238327 before the representation uses 4 characters instead of 3. I'd like to know the math behind finding this out, given arbitrary values for Base and Length of representation.
Essentially, "using Base-X, how high can I count using Y characters?".

Assuming your numbers are positive and start at 0 then you can count from 0 to X^Y - 1.
As per your example above, 62^3 - 1 = 238328 - 1 = 238327.

Related

How can I identify inconsistencies and outliers in a dataset in R

I have a big dataset with alot of columns, being most of them not numeric values. I need to find inconsistencies in the data as well as outliers and the part of obtaining inconsistencies would be easy if the dataset wasn't so big (7032 rows to be exact).
An inconsistency would be something like: ID supposed to be 4 letters and 4 numbers and I obtain something else (like 3 numbers and 2 letters); or other example would be a number that should be a 0 or 1 and I obtain a -1 or a 2 .
Is there any function that I can use to obtain the inconsitencies in each column?
For the specific columns that doesn't have numeric values, I thought of doing a regex and validate if each row for a certain column is valid but I didn't found info that could give me that.
For the part of outliers I did a boxplot to see if I could obtain any outlier, like this:
boxplot(dataset$column)
But the graphic didn't gave me any outliers. Should I be ok with the results that I obtain in the graphic or should I try something else to see if there is really any outlier in the data?
For the specific examples you've given:
an ID must be be four numbers and 4 letters:
!grepl("^[0-9]{4}-[[:alpha:]]{4}$", ID)
will be TRUE for inconsistent values (^ and $ mean beginning- and end-of-string respectively; {4} means "previous pattern repeats exactly four times"; [0-9] means "any symbol between 0 and 9 (i.e. any numeral); [[:alpha:]] means "any alphabetic character"). If you only want uppercase letters you could use [A-Z] instead (assuming you are not working in some weird locale like Estonian).
If you need a numeric value to be 0 or 1, then !num_val %in% c(0,1) will work (this will work for any set of allowed values; you can use it for a specific set of allowed character values as well)
If you need a numeric value to be between a and b then !(a < num_val & num_val < b) ...

How to determine that remaining of dividing number X by number Y is zero using regular expression

i want to know is it possible to validate that deviding two number has remaining zero in result or not?
for example dividing number 4 on number two has zero in remaining.
4/2=0 (this is true)
but 4/3=1 (this is not true)
is there any expression for validation such case?
Better Question :
Is There any validation expression to validate this sentence ?
Remainder is zero
thank you
You can use a Modulo operator. The modulo operation finds the remainder of division of one number by another
y mod x
5 mod 2 =1 (2x2=4, 5-4=1)
9 mod 3 = 0 (3*3=9)
You can think of it, how many times does x fit in y and then take the remainder.
In computing the modulo operator is integrated in most programming languages, along with division, substraction etc. Check modulo and then your language on google (probably its mod).
This is called the modulo function. It essentially gives the remainder of a division of two integer number. So you can test for the modulo funtion returning zero. For example, in Python you would write
if a % b == 0:
# a can be divided by b with zero remainder

if two n bit numbers are multiplied then the result will be maximum how bit long

I was wondering if there is any formula or way to find out much maximum bits will be required if two n bits binary number are multiplied.
I searched a lot for this but was unable to find its answer anywhere.
Number of digits in base B required for representing a number N is floor(log_B(N)+1). Logarithm has this nice property that log(X*Y)=log(X)+log(Y), which hints that the number of digits for X*Y is roughly the sum of the number of digits representing X and Y.
It can be simply concluded using examples:
11*11(since 11 is the maximum 2 bit number)=1001(4 bit)
111*111=110001(6 bit)
1111*1111=11100001(8 bit)
11111*11111=1111000001(10 bit)
and so from above we can see your answer is 2*n
The easiest way to think about this is to consider the maximum of the product, which is attained when we use the maximum of the two multiplicands.
If value x is an n-bit number, it is at most 2^n - 1. Think about this, that 2^n requires a one followed by n zeroes.
Thus the largest possible product of two n-bit numbers will be:
(2^n - 1)^2 = 2^(2n) - 2^(n+1) + 1
Now n=1 is something of a special case, since 1*1 = 1 is again a one-bit number. But in general we see that the maximum product is a 2n-bit number, whenever n > 1. E.g. if n=3, the maximum multiplicand is x=7 and the square 49 is a six-bit number.
It's worth noting that the base of the positional system doesn't matter. Whatever formula you come up with for the decimal multiplication will work for the binary multiplication.
Let's apply a bit of deduction and multiply two numbers that have relatively prime numbers of digits: 2 and 3 digits, respectively.
Smallest possible numbers:
10 * 100 = 1000 has 4 digits
Largest possible numbers:
99 * 999 = 98901 has 5 digits
So, for a multiplication of n-digit by m-digit number, we deduce that the upper and lower limits are n+m and n+m-1 digits, respectively. Let's make sure it holds for binary as well:
10 * 100 = 1000 has 4 digits
11 * 111 = 10101 has 5 digits
So, it does hold for binary, and we can expect it to hold for any base.
x has n binary digits means that 2^(n-1) <= x < 2^n, also assume that y has m binary digits. That means:
2^(m+n-2)<=x*y<2^(m+n)
So x*y can have m+n-1 or m+n digits. It easy to construct examples where both cases are possible:
m+n-1: 2*2 has 3 binary digits (m = n = 2)
m+n: 7*7=49=(binary)110001 has 6 binary digits and m = n = 3

how do I generate 2 random prime numbers that when multiplied, yield a number with X bits? (X given as argument))

I lack the math skills to make this function.
basically, i want to return 2 random prime numbers that when multiplied, yield a number of bits X given as argument.
for example:
if I say my X is 3 then a possible solution would be:
p = 2 and q = 3 becouse 2 * 3 = 6 (110 has 3 bits).
A problem with this statement is that it starts by asking for two "random" prime numbers. Without any explicit statement of the distribution of the required random primes, we are already stuck. (This is the beginning of a classic paradox, where we are asked to generate a "random" integer.)
But suppose that we change the statement to finding any two arbitrary primes, that yield the desired product with a given number of bits x. The answer is trivial.
The set of numbers that have exactly x bits in their binary representation is the half open set of integers [2^(x-1),2^x-1].
Choose an arbitrary prime number that is less than or equal to (2^x-1)/2. Call it p1.
Next, choose a second prime number that lies in the interval (2^(x-1)/p1,(2^x-1)/p1). Call it p2.
It must be true that p1*p2 will be in the desired interval.
For example, given x = 10, so the product must lie in the interval [512,1023], the set of integers with exactly 10 bits. (Note, there are apparently 147 such numbers in that interval, with exactly two prime factors.)
Step 1:
Choose p1 as any prime no larger than 1023/2 = 511.5. I'll pick p1 = 137. Then the second prime factor must be a prime that lies in the interval
[512 1023]/137
ans =
3.7372 7.4672
thus either 5 or 7.
dec2bin(137*[5 7])
ans =
1010101101
1110111111
If you know the number of bits, you can generate a number 2^(x-2) < x < 2^(x-1). Then take the square root and find the closest primes on either side of it. Multiplying them together will, in most cases, get you a number in the correct range. If it's too high, you can take the two primes directly on the lower side of it.
pseudocode:
x = bits
primelist[] = makeprimelist()
rand = randnum between 2^(x-2) and 2^(x-1)
n = findposition(primelist, rand)
do
result = primelist[n]*primelist[n+1]
n--
while result > 2^(x-1)
Note that numbers generated this way will allways have '1' as the highest significant bit, so would be possible to generate a number of x-1 bits and just tack the 1 onto the end.

Probability of 3-character string appearing in a randomly generated password

If you have a randomly generated password, consisting of only alphanumeric characters, of length 12, and the comparison is case insensitive (i.e. 'A' == 'a'), what is the probability that one specific string of length 3 (e.g. 'ABC') will appear in that password?
I know the number of total possible combinations is (26+10)^12, but beyond that, I'm a little lost. An explanation of the math would also be most helpful.
The string "abc" can appear in the first position, making the string look like this:
abcXXXXXXXXX
...where the X's can be any letter or number. There are (26 + 10)^9 such strings.
It can appear in the second position, making the string look like:
XabcXXXXXXXX
And there are (26 + 10)^9 such strings also.
Since "abc" can appear at anywhere from the first through 10th positions, there are 10*36^9 such strings.
But this overcounts, because it counts (for instance) strings like this twice:
abcXXXabcXXX
So we need to count all of the strings like this and subtract them off of our total.
Since there are 6 X's in this pattern, there are 36^6 strings that match this pattern.
I get 7+6+5+4+3+2+1 = 28 patterns like this. (If the first "abc" is at the beginning, the second can be in any of 7 places. If the first "abc" is in the second place, the second can be in any of 6 places. And so on.)
So subtract off 28*36^6.
...but that subtracts off too much, because it subtracted off strings like this three times instead of just once:
abcXabcXabcX
So we have to add back in the strings like this, twice. I get 4+3+2+1 + 3+2+1 + 2+1 + 1 = 20 of these patterns, meaning we have to add back in 2*20*(36^3).
But that math counted this string four times:
abcabcabcabc
...so we have to subtract off 3.
Final answer:
10*36^9 - 28*36^6 + 2*20*(36^3) - 3
Divide that by 36^12 to get your probability.
See also the Inclusion-Exclusion Principle. And let me know if I made an error in my counting.
If A is not equal to C, the probability P(n) of ABC occuring in a string of length n (assuming every alphanumeric symbol is equally likely) is
P(n)=P(n-1)+P(3)[1-P(n-3)]
where
P(0)=P(1)=P(2)=0 and P(3)=1/(36)^3
To expand on Paul R's answer. Probability (for equally likely outcomes) is the number of possible outcomes of your event divided by the total number of possible outcomes.
There are 10 possible places where a string of length 3 can be found in a string of length 12. And there are 9 more spots that can be filled with any other alphanumeric characters, which leads to 36^9 possibilities. So the number of possible outcomes of your event is 10 * 36^9.
Divide that by your total number of outcomes 36^12. And your answer is 10 * 36^-3 = 0.000214
EDIT: This is not completely correct. In this solution, some cases are double counted. However they only form a very small contribution to the probability so this answer is still correct up to 11 decimal places. If you want the full answer, see Nemo's answer.

Resources