Jaro Similarity - similarity

To find the Jaro similarity, I identified the matching characters as follows:
matching characters in string 1 : AABABCAAAC
matching characters in string 2 : ABAACBAAAC
What is the value of t (0.5 * transpositions)?
(source: Wikipedia)

Transpositions in this context are the matching characters that do not sit at the same position in the two matched sequences.
Using the formula from Wikipedia:
m = 10
t = 4/2 = 2
|S1| = 10
|S2| = 10
d = 1/3 * (10/10 + 10/10 + (10-2)/10) = 0.933
These transpositions are [A/B, B/A, B/C, C/B], so t is calculated as |[A/B, B/A, B/C, C/B]| / 2.
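To double-check these numbers, here is a minimal Python sketch of the Jaro computation. It assumes the usual matching window of floor(max(|S1|, |S2|) / 2) - 1 from the Wikipedia description; the function name `jaro` is just a name chosen for this example.

```python
def jaro(s1, s2):
    """Jaro similarity, assuming the standard matching window."""
    if not s1 and not s2:
        return 1.0
    # Characters only match if they are at most `window` positions apart.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used = [False] * len(s2)      # which characters of s2 are already matched
    matched1 = []                 # matched characters of s1, in s1 order
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not used[j] and s2[j] == ch:
                used[j] = True
                matched1.append(ch)
                break
    m = len(matched1)
    if m == 0:
        return 0.0
    # Matched characters of s2, in s2 order.
    matched2 = [c for c, u in zip(s2, used) if u]
    # t is half the number of positions where the matched sequences differ.
    t = sum(a != b for a, b in zip(matched1, matched2)) / 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3


print(round(jaro("AABABCAAAC", "ABAACBAAAC"), 3))  # 0.933
```

For the two strings in the question this finds m = 10 and four mismatched positions, so t = 2, in line with the hand calculation.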

Related

how many trailing zeros after factorial?

I am trying to do this programming task:
Write a program that will calculate the number of trailing zeros in a
factorial of a given number.
N! = 1 * 2 * 3 * ... * N
Be careful 1000! has 2568 digits.
For more info, see: http://mathworld.wolfram.com/Factorial.html
Examples:
zeros(6) = 1 ->
6! = 1 * 2 * 3 * 4 * 5 * 6 = 720 --> 1 trailing zero
zeros(12) = 2 ->
12! = 479001600 --> 2 trailing zeros
I'm confused as one of the sample tests I have is showing this: expect_equal(zeros(30), 7)
I could be misunderstanding the task, but where do the trailing 7 zeros come from when the input is 30?
with scientific notation turned on I get this:
2.6525286e+32
and with it turned off I get this:
265252859812191032282026086406022
What you are experiencing is a result of this: Why are these numbers not equal?
But in this case, calculating the factorial itself just to find the number of trailing zeros is not efficient.
We can count the factors of 5 in N! instead (there will always be enough factors of 2 to pair with them and create factors of 10). This function gives you the trailing zeros of N! by counting how many multiples of 5, 25, 125, ... there are up to N.
tailingzeros_factorial <- function(N){
  mcount = 0L
  mdiv = 5L
  N = as.integer(N)
  while (as.integer((N/mdiv)) > 0L) {
    mcount = mcount + as.integer(N/mdiv)
    mdiv = as.integer(mdiv * 5L)
  }
  return(mcount)
}
tailingzeros_factorial(6)
#> 1
tailingzeros_factorial(25)
#> 6
tailingzeros_factorial(30)
#> 7
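For comparison, the same counting idea can be sketched in Python, where integers do not overflow, so the result can be cross-checked against the real factorial for small inputs. The function name `trailing_zeros` is just a name chosen for this example.

```python
import math


def trailing_zeros(n):
    """Count trailing zeros of n! by counting factors of 5 in 1..n."""
    count = 0
    power = 5
    while power <= n:
        count += n // power   # multiples of 5, then 25, then 125, ...
        power *= 5
    return count


# Cross-check against the exact factorial (only feasible for small n).
digits = str(math.factorial(30))
print(len(digits) - len(digits.rstrip("0")))  # 7
print(trailing_zeros(30))                     # 7
```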

recursive function to convert string to integer ML

I need to write my own recursive function in ML that somehow uses ord to convert a string of numbers to integer type. I can use helper functions, but apparently I should be able to do this without using one (according to my professor).
I can assume that the input is valid, and is a positive integer (in string type of course).
So, the call str2int ("1234") should output 1234: int
I assume I will need to use explode and implode at some point since ord operates on characters, and my input is a string. Any direction would be greatly appreciated.
Given that you asked, I guess I can ruin all the fun for you. This will solve your problem, but ironically, it won't help you.
Well, the ordinal number for the character #"0" is 48. This means that if you subtract 48 from the ordinal of any digit character, you get its decimal value. For instance
ord(#"9") - 48
Yields 9.
So, a function that takes a given character representing a number from 0-9 and turns it into the corresponding decimal is:
fun charToInt(c) = ord(c) - 48
Suppose you had a string of digits like "2014". Then you can first explode the string into a list of characters and then map every character to its corresponding decimal value.
For instance
val num = "2014"
val digits = map charToInt (explode num)
The explode function is a built-in function that takes a string and turns it into a list of characters.
Now digits is a list of integers representing the decimal digits [2,0,1,4].
Then, all you need is to apply powers of 10 to obtain the final integer.
2 * 10 ^ 3 = 2000
0 * 10 ^ 2 = 0
1 * 10 ^ 1 = 10
4 * 10 ^ 0 = 4
The result would be 2000 + 0 + 10 + 4 = 2014
You could define a helper function charsToInt that processes the digits in the string from left to right.
At each step it converts the leftmost digit c into a number and does addition with the 10x-multiple of n (which is the intermediary sum of all previously parsed digits) ...
fun charsToInt ([], n) = n
| charsToInt (c :: cs, n) = charsToInt (cs, 10*n + ord c - 48)
val n = charsToInt (explode "1024", 0)
Gives you: val n = 1024 : int
As you can see, the trick is to pass the intermediate result down to the next step at each recursive call. This is a very common technique when dealing with this kind of problem.
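The same accumulator idea can be sketched in Python, in case it helps to see it outside ML. This is only an illustration: `str_to_int` and its `acc` parameter are names made up for this example, and the input is assumed to contain nothing but the digits 0-9.

```python
def str_to_int(digits, acc=0):
    """Recursively convert a string of decimal digits to an int.

    `acc` carries the value of all digits parsed so far.
    """
    if not digits:
        return acc
    # Shift the running total one decimal place and add the next digit.
    return str_to_int(digits[1:], 10 * acc + ord(digits[0]) - 48)


print(str_to_int("1234"))  # 1234
```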
Here's what I came up with:
fun pow10 n =
  if n = 0 then 1 else 10*pow10(n-1);
fun str2help (L,n) =
  if null L then 0
  else (ord(hd L)-48) * pow10(n) + str2help(tl L, n-1);
fun str2int (string) =
  str2help(explode string, size string -1);
str2int ("1234");
This gives me the correct result, though is clearly not the easiest way to get there.

How to find d, given p, q, and e in RSA?

I know I need to use the extended euclidean algorithm, but I'm not sure exactly what calculations I need to do. I have huge numbers. Thanks
Well, d is chosen such that d * e == 1 modulo (p-1)(q-1), so you could use the Euclidean algorithm for that (finding the modular multiplicative inverse).
If you are not interested in understanding the algorithm, you can just call BigInteger#modInverse directly.
d = e.modInverse(p_1.multiply(q_1))
Given that, p=11, q=7, e =17, n=77, φ (n) = 60 and d=?
First substitute values from the formula:-
ed mod φ (n) =1
17 d mod 60 = 1
The next step: – take the totient of n, which is 60 to your left hand side and [e] to your right hand side.
60 = 17
3rd step: – ask how many times 17 goes to 60. That is 3.5….. Ignore the remainder and take 3.
60 = 3(17)
Step 4: – now you need to balance this equation 60 = 3(17) such that left hand side equals to right hand side. How?
60 = 3(17) + 9 <== if you multiply 3 by 17 you get 51 then plus 9, that is 60. Which means both sides are now equal.
Step 5: – Now take 17 to your left hand side and 9 to your right hand side.
17 = 9
Step 6:- ask how many times 9 goes to 17. That is 1.8…….
17 = 1(9)
Step 7:- now you need to balance the equation 17 = 1(9), just as in step 4.
17 = 1(9) + 8 <== if you multiply 1 by 9 you get 9 then plus 8, that is 17. Which means both sides are now equal.
Step 8:- again take 9 to your left hand side and 8 to your right hand side.
9 = 1(8)
9 = 1(8) + 1 <== once you reached +1 to balance your equation, you may stop and start doing back substitution.
Step A:-Last equation in step 8 which is 9 = 1(8) + 1 can be written as follows:
1 = 9 – 1(8)
Step B:-We know what is (8) by simple saying 8 = 17 – 1(9) from step 7. Now we can re-write step A as:-
1=9 -1(17 – 1(9)) <== here since 9=1(9) we can re-write as:-
1=1(9)-1(17) +1(9) <== group similar terms. In this case you add 1(9) with 1(9) – that is 2(9).
1=2(9)-1(17)
Step C: – We know what is (9) by simple saying 9 = 60 – 3(17) from step 4. Now we can re-write step B as:-
1=2(60-3(17)) -1(17)
1=2(60)-6(17) -1(17) <== group similar terms. In this case you add 6(17) with 1(17) – that is 7(17).
1=2(60)-7(17) <== at this stage we can stop, nothing more to substitute. Take the coefficient of 17, which is -7. Because it is negative, add the totient to it:
60-7=d
Therefore the value of d = 53.
I just want to augment Sidudozo's answer and clarify some important points.
First of all, what should we pass to the Extended Euclidean Algorithm to compute d?
Remember that ed mod φ(n) = 1 and gcd(e, φ(n)) = 1.
The Extended Euclidean Algorithm is based on the identity gcd(a,b) = as + bt, hence gcd(e, φ(n)) = es + φ(n)t = 1. Reducing this modulo φ(n) gives es mod φ(n) = 1, so d = s mod φ(n). When s is negative, that means d = s + φ(n), which satisfies the ed mod φ(n) = 1 condition.
So, given e=17 and φ(n)=60 (borrowed from Sidudozo's answer), we substitute the corresponding values in the formula mentioned above:
gcd(e, φ(n)) = es + φ(n)t = 1 ⇔ 17s + 60t = 1.
At the end of Sidudozo's answer we obtain s = -7. Thus d = s + φ(n) ⇔ d = -7 + 60 ⇒ d = 53.
Let's verify the results. The condition was ed mod φ(n) = 1.
Look 17 * 53 mod 60 = 1. Correct!
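To see s and t come out of the algorithm directly, here is a minimal Python sketch of the extended Euclidean algorithm. The function name `extended_gcd` is just a name chosen for this example; it returns the gcd together with the two coefficients.

```python
def extended_gcd(a, b):
    """Return (g, s, t) such that g = gcd(a, b) and a*s + b*t == g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


e, phi = 17, 60
g, s, t = extended_gcd(e, phi)
print(g, s, t)   # 1 -7 2, i.e. 17*(-7) + 60*2 = 1
print(s % phi)   # 53, Python's % already maps -7 into the range 0..59
```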
The accepted answer by Thilo uses Euler's totient function instead of Carmichael's totient function to find d. That d still works, because λ(n) divides φ(n), but while the original method of RSA key generation uses Euler's function, d is typically derived using Carmichael's function instead, for reasons I won't get into. The math needed to find the private exponent d given p, q and e, without any fancy notation, would be as follows:
d = e^-1 mod (((p-1)/GCD(p-1,q-1))(q-1))
Why is this? Because d is defined by the relationship
de = 1 mod λ(n)
where λ(n) is Carmichael's function, which is
λ(n) = lcm(p-1,q-1)
which can be expanded to
λ(n) = ((p-1)/GCD(p-1,q-1))(q-1)
So inserting this into the original expression that defines d we get
de = 1 mod (((p-1)/GCD(p-1,q-1))(q-1))
And just rearrange that to the final formula
d = e^-1 mod (((p-1)/GCD(p-1,q-1))(q-1))
More related information can be found here.
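As a small illustration of the Carmichael variant, here is a Python sketch using the toy values p=11, q=7, e=17 from the other answers. It assumes Python 3.8 or newer, where the built-in pow accepts an exponent of -1 to compute a modular inverse; `private_exponent` is just a name chosen for this example.

```python
from math import gcd


def private_exponent(p, q, e):
    """Return d with e*d = 1 mod lcm(p-1, q-1)."""
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    return pow(e, -1, lam)                         # modular inverse (3.8+)


p, q, e = 11, 7, 17
n = p * q
d = private_exponent(p, q, e)
print(d)  # 23

# Both 23 (from lambda(n) = 30) and 53 (from phi(n) = 60) decrypt correctly.
message = 5
cipher = pow(message, e, n)
print(pow(cipher, d, n), pow(cipher, 53, n))  # 5 5
```

Note that 53 mod 30 = 23, so the two exponents agree modulo λ(n).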
Here's the code for it, in python:
def inverse(a, n):
    t, newt = 0, 1
    r, newr = n, a
    while newr:
        quotient = r // newr  # floor division
        t, newt = newt, t - quotient * newt
        r, newr = newr, r - quotient * newr
    if r > 1:
        return None  # there's no solution
    if t < 0:
        t = t + n
    return t
inverse(17, 60) # returns 53
adapted from pseudocode found in wiki: https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm#Pseudocode
Simply use this formula:
d = (1 + K*phi)/e (very useful when e and phi are small numbers)
Let's say e = 3 and phi = 40.
We try K = 0, 1, 2, ... until d is a whole number.
Assume K = 0, then
d = (1 + 0(40))/3 = 1/3, which is not a whole number. (If it is a decimal, increase K; don't bother finding the exact value of the decimal.)
Assume K = 1, then
d = (1 + 1(40))/3 = 41/3, again not a whole number.
Assume K = 2, then
d = (1 + 2(40))/3 = 81/3 = 27
d = 27.
Guessing K becomes much easier with practice.
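The trial-and-error search for K is easy to automate for small numbers. The following Python sketch is only an illustration of that idea (the name `find_d_by_trial` is made up here); for the huge numbers in real RSA keys the extended Euclidean algorithm is the practical choice.

```python
from math import gcd


def find_d_by_trial(e, phi):
    """Try K = 0, 1, 2, ... until (1 + K*phi) is divisible by e."""
    if gcd(e, phi) != 1:
        raise ValueError("e and phi must be coprime, otherwise no d exists")
    k = 0
    while (1 + k * phi) % e != 0:
        k += 1
    return (1 + k * phi) // e


print(find_d_by_trial(3, 40))   # 27
print(find_d_by_trial(17, 60))  # 53
```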
Take the values p=7, q=11 and e=17.
Then n=p*q=77 and f(n)=(p-1)(q-1)=60.
Therefore, our public key pair is (e,n)=(17,77).
Now, for calculating the value of d we have the constraint
e*d == 1 mod (f(n)), [here "==" represents the **congruent symbol**].
17*d == 1 mod 60
(17*53)*d == 53 mod 60, [multiply both sides by 53; 17*53=901, and 901 mod 60 = 1]
1*d == 53 mod 60
Hence d=53.
Therefore our private key pair will be (d,n)=(53,77).
Hope this helps. Thank you!

How to find the range for a given number, interval and start value?

Given the values below:
start value = 1
End Value = 20
Interval = 5
I have been given the number 6. I have to find the range of numbers in which 6 falls; here the answer is 6-10.
If the given number is greater than the end value, then return the same number.
Is there any formula so that I can generate the range for the number?
UPDATE
I tried the solution below, but it does not work if the range interval is changed:
$end_value = $start_value + $range_interval;
// we blindly return the last term if value is greater than max value
if ($input_num > $end_value) {
return '>' . $end_value;
}
// we also find if its a first value
if ($input_num <= $end_value && $value >= $start_value) {
return $start_value . '-' . $end_value;
}
// logic to find the range for a given integer
$dived_value = $input_num/$end_value;
// round the value to get the exact match
$rounded_value = ceil($dived_value);
$upper_bound_range = $rounded_value*$end_value;
$lower_bound_range = $upper_bound_range - $end_value;
return $lower_bound_range . '-'. $upper_bound_range;
In (c-style) pseudocode:
// Integer division assumed
rangeNumber = (yourNumber - startValue) / rangeLength;
lower_bound_range = startValue + rangeNumber*rangeLength;
upper_bound_range = lower_bound_range + rangeLength-1;
For your input:
rangeNumber = (6-1)/5 = 1
lower_bound_range = 1 + 5*1 = 6
upper_bound_range = 10
and so range is [6, 10]
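The pseudocode above can be sketched as runnable Python. The function name `find_range` and its parameter names are made up for this example. Returning the number unchanged when it is above the end value follows the question; clamping the last range to the end value and rejecting numbers below the start are my own assumptions.

```python
def find_range(number, start, end, interval):
    """Return (lower, upper) of the range containing `number`.

    Ranges are start..start+interval-1, then the next `interval` integers,
    and so on. A number above `end` is returned unchanged.
    """
    if number > end:
        return number
    if number < start:
        raise ValueError("number is below the start value")  # assumption
    index = (number - start) // interval        # integer division
    lower = start + index * interval
    upper = min(lower + interval - 1, end)      # assumption: clamp to end
    return (lower, upper)


print(find_range(6, 1, 20, 5))   # (6, 10)
print(find_range(6, 1, 20, 4))   # (5, 8)
print(find_range(25, 1, 20, 5))  # 25
```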
The answer depends on whether you talk about integers or floats. Since all your example numbers are integers, I assume you talk about those. I further assume that all your intervals contain the same number of integers, in your example 5, namely 1...5, 6...10, 11...15, and 16...20. Note that 0 is not contained in the 1st interval (otherwise the 1st interval would have 6 numbers).
In this case the answer is easy.
Let:
s be the start value, i.e. the integer just before the 1st interval (it is not contained in it),
i the interval size, i.e. the number of integers that it contains,
p the provided number to which an interval should be assigned,
b the 1st integer in this interval, and
e the last integer in this interval.
Then:
b = s + (p-s-1)\i * i + 1 (here, "\" means integer division, i.e. without remainder)
e = b + i - 1
In your example:
s = 0, i = 5, p = 6, thus
b = 0 + (6-0-1)\5 * 5 + 1 = 6
e = 6 + 5 - 1 = 10

Number of subsets of {1,2,3,...,N} containing at least 3 consecutive elements

Suppose we have a set like {1,2,3} then there is only one way to choose 3 consecutive numbers... it's the set {1,2,3}...
For a set of {1,2,3,4} we have 3 ways: 123 234 1234
(technically these are unordered sets of numbers, but writing them consecutively helps)
f(5) ; {1,2,3,4,5} -> 8 ways: 123 1234 1235 12345 234 2345 345 1345
f(6) ; {1,2,3,4,5,6} -> 20 ways: ...
f(7) ; {1,2,3,4,5,6,7} -> 47 ways: ...
So for a given N, I can get the answer by brute force, enumerating all subsets that contain 3 or more consecutive numbers.
Here I am trying to find a pattern, a technique to get the number of all such subsets for a given N.
The problem can be further generalized to finding subsets with m consecutive numbers within a set of size N.
There is a bijection between this problem and "the number of N-digit binary numbers with at least three consecutive 1s in a row somewhere" (the bijection being that a digit is 0 if the element is excluded from the subset, and 1 if it is included).
This is a known problem, and that should be enough information to google for a result. If you search for number of n-digit binary strings with m consecutive 1s, the second hit is Finding all n digit binary numbers with r adjacent digits as 1
Alternatively you can just look it up as http://oeis.org/search?q=0%2C0%2C1%2C3%2C8%2C20%2C47 (based on the brute-forcing you did for the first few terms) - resulting in an explicit formula of 2^n - tribonacci(n+3), see here for an explicit formula for tribonacci numbers. It also gives a recurrence relation. The analogy given is "probability (out of 2^n) of getting at least 1 run of 3 heads within n flips of a fair coin"
I can only assume that the answer to the general problem is 2^n - Fm(n+m), where Fm denotes the m-step Fibonacci numbers (edit: that does seem to be the case)
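A brute-force check against a recurrence makes this easy to test for small n. The Python sketch below avoids committing to any particular indexing of the tribonacci numbers: it counts the binary strings of length n with no run of m ones directly (the count for length k is the sum of the previous m counts, starting from 2^k for k < m) and subtracts that from 2^n. Both function names are made up for this example.

```python
def count_brute(n, m=3):
    """Count subsets of {1..n} containing at least m consecutive elements."""
    count = 0
    for mask in range(1 << n):        # every subset as a bitmask
        run = 0
        for i in range(n):
            if (mask >> i) & 1:
                run += 1
                if run >= m:
                    count += 1
                    break
            else:
                run = 0
    return count


def count_by_recurrence(n, m=3):
    """2^n minus the number of n-bit strings with no run of m ones."""
    avoid = [2 ** k for k in range(m)]   # lengths 0..m-1: nothing to avoid
    for _ in range(m, n + 1):
        avoid.append(sum(avoid[-m:]))    # m-step Fibonacci style recurrence
    return 2 ** n - avoid[n]


print([count_by_recurrence(n) for n in range(3, 8)])  # [1, 3, 8, 20, 47]
```

The printed values match the f(3) to f(7) counts found by hand in the question.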
This sounds like homework to me, so I'll just get you started. One approach is to think of the Lowest and Highest members of the run, L and H. If the set size is N and your minimum run length is M, then for each possible position P of L, you can work out how many positions of H there are....
With a bit of python code, we can investigate this:
y = set()
def cons(li, num):
    if len(li) < num:
        return
    if len(li) == num:
        y.add(tuple(li))
    else:
        y.add(tuple(li))
        cons(li[1:], num)
        cons(li[:-1], num)
This solution will be quite slow (it's exponential in complexity, actually), but try it out for a few small list sizes and I think you should be able to pick up the pattern.
Not sure if you mean consecutive or not. If not, then for {1, 2, 3, 4} there are 4 possibilities: {1, 2, 3} {2, 3, 4} {1, 3, 4} {1, 2, 3, 4}
I think you can calculate the solution with N!/3! where N! = N*(N-1)(N-2)...*1.
Quick answer:
Sequences(n) = (n-1)*(n-2) / 2
Long answer:
You can do this by induction. First, I'm going to re-state the problem, because your problem statement isn't clear enough.
Rule 1: For all sets of consecutive numbers 1..n where n is 2 or more
Rule 2: Count the subsets S(n) of consecutive numbers m..m+q where q is 2 or more
S(n=3)
By inspection we find only one - 123
S(n=4)
By inspection we find 3: 123, 234 and 1234
Note that S(4) contains S(3), plus two new ones... both include the new digit 4... hmm.
S(n=5)
By inspection we find ... S(n=4) as well as 345 2345 and 12345. That's 3+3=6 total.
I think there's a pattern forming here. Let's define a new function T.
Rule 3: S(n) = S(n-1) + T(n) ... for some T.
We know that S(n) contains the digit n, and should have spotted by now that S(n) also contains (as a subcomponent) all sequences of length 3 to n that include the digit n. We know they cannot be in S(n-1) so they must be in T(n).
Rule 4: T(n) contains all sequence ending in n that are of length 3 to n.
How many sequences are in S(n)?
Let's look back at S(3) S(4) and S(5), and incorporate T(n):
S(3) = S(3)
S(4) = S(3) + T(4)
S(5) = S(4) + T(5) = S(3) + T(4) + T(5)
let's generalise:
S(n) = S(3) + the sum of T(f) for all f from 4 to n.
So how many are in a given T?
Look back at rule 4 - how many sequences does it describe?
For T(4) it describes all sequences 3 and longer ending in 4. (that's 234 1234 = 2)
For T(5) it describes all sequences 3 and longer ending in 5. (that's 345 2345 12345 = 3)
T   count   Examples
4   2       1234 234
5   3       12345 2345 345
6   4       123456 23456 3456 456
Looks awfully like T(n) is simply n-2.
So
S(6) = T(6) + T(5) + T(4) + S(3)
10 = 4 + 3 + 2 + 1
And
S(7) = 15 = 5 + 4 + 3 + 2 + 1
S(8) = 21 = 6 + 5 + 4 + 3 + 2 + 1
Turning this into a formula
What's 2 * S(8)?
42 = 6 + 5 + 4 + 3 + 2 + 1 + 1 + 2 + 3 + 4 + 5 + 6
Add each pair of biggest and smallest numbers:
42 = 7 + 7 + 7 + 7 + 7 + 7
42 = 7 * 6
But that's 2 * S(8), so
S(8) = 42/2 = 21 = 7 * 6 / 2
This generalizes:
S(n) = (n-1)*(n-2) / 2
Let's check this works:
S(3) = 2*1/2 = 1
S(4) = 3*2/2 = 3
S(5) = 4*3/2 = 6
S(6) = 5*4/2 = 10
I'm satisfied.
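A quick Python sketch can confirm the closed form for a few more values. Note that it counts what this answer counts, namely unbroken runs m..m+q of length 3 or more inside 1..n, not arbitrary subsets that merely contain such a run. The name `count_runs` is made up for this example.

```python
def count_runs(n, min_len=3):
    """Count contiguous runs lo..hi inside 1..n with at least min_len items."""
    return sum(
        1
        for lo in range(1, n + 1)
        for hi in range(lo, n + 1)
        if hi - lo + 1 >= min_len
    )


for n in range(3, 9):
    print(n, count_runs(n), (n - 1) * (n - 2) // 2)
```

Each printed row shows the same value twice, for example 21 and 21 for n = 8.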

Resources