How to determine the number of possible combinations of letters that contain a degenerate substring

I've been racking my brain for a couple of days to work out a series or closed-form equation for the following problem:
Given all strings of length N drawn from an alphabet of L letters (starting with 'A', for example {A, B}, {A, B, C}, ...), how many of those strings contain a substring that matches the pattern: 'A', then two or more not-'A', then 'A'? The regular expression for that pattern is A[^A][^A]+A.
The total number of strings is simple enough: L^N. For small values of N and L, it's also practical to simply generate all possible strings and use a regular expression to find the ones that match the pattern; in R:
all.combinations <- function(N, L) {
  apply(
    expand.grid(rep(list(LETTERS[1:L]), N)),
    1,
    paste,
    collapse = ''
  )
}
matching.pattern <- function(N, L, pattern = 'A[^A][^A]+A') {
  sum(grepl(pattern, all.combinations(N, L)))
}
all.combinations(4, 2)
matching.pattern(4, 2)
I had come up with the following, which works for N < 7:
M <- function(N, L) {
  sum(
    sapply(
      2:(N-2),
      function(g) {
        (N - g - 1) * (L - 1) ** g * L ** (N - g - 2)
      }
    )
  )
}
Unfortunately, that only works while N < 7 because it simply adds up the strings that contain A..A, A...A, A....A, etc., and some strings contain multiple matching substrings (e.g., A..A..A, A..A...A), so they are counted more than once.
Any suggestions? I am open to procedural solutions too, so long as they don't blow up with the number of combinations (like my code above would). I'd like to be able to compute for values of N from 15 to 25 and L from 2 to 10.
For what it is worth, here's the number of combinations, and matching combinations for some values of N and L that are tractable to determine by generating all combinations and doing a regular expression match:
N L combinations matching
-- - ------------ --------
4 2 16 1
5 2 32 5
6 2 64 17
7 2 128 48
8 2 256 122
9 2 512 290
10 2 1024 659
4 3 81 4
5 3 243 32
6 3 729 172
7 3 2187 760
8 3 6561 2996
9 3 19683 10960
10 3 59049 38076
4 4 256 9
5 4 1024 99
6 4 4096 729
7 4 16384 4410
8 4 65536 23778
9 4 262144 118854
10 4 1048576 563499

A dynamic programming approach works here.
For fixed L, let X(n) be the number of strings of length n that contain the pattern, and let A(n) be the number of strings of length n that contain the pattern and start with A.
First derive a recurrence for A(n) by grouping the strings counted by A(n) according to their second and third letters. The number of such strings with:
second letter A is A(n-1),
second letter non-A and third letter A is A(n-2) for each of the L-1 choices of the second letter,
second and third letters non-A is L^(n-3) - (L-1)^(n-3) for each of the (L-1)^2 choices of those two letters, because the remaining n-3 letters must contain at least one A.
Putting this together:
A(n) = A(n-1) + (L-1) * (A(n-2) + (L-1) * (L^(n-3) - (L-1)^(n-3)))
A string of length n+1 starts either with A or with a non-A letter:
X(n+1) = A(n+1) + (L-1) * X(n)
X(i) = A(i) = 0, for i <= 3
Python implementation:
def combs(l, n):
    x = [0] * (n + 1)  # First element is not used, easier indexing
    a = [0] * (n + 1)
    for i in range(4, n+1):
        a[i] = a[i-1] + (l-1) * (a[i-2] + (l-1) * (l**(i-3) - (l-1)**(i-3)))
        x[i] = a[i] + (l-1) * x[i-1]
    return x[4:]
print(combs(2, 10))
print(combs(3, 10))
print(combs(4, 10))

This can be described as a state machine. (For simplicity, x is any letter other than A.)
S0 := 'A' S1 | 'x' S0 // ""
S1 := 'A' S1 | 'x' S2 // A
S2 := 'A' S1 | 'x' S3 // Ax
S3 := 'A' S4 | 'x' S3 // Axx+
S4 := 'A' S4 | 'x' S4 | $ // AxxA
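Let Sk(n) be the number of strings of length n that lead from state Sk to the accepting state S4. Counting the matching strings of length n then gives: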
S0(n) = S1(n-1) + (L-1)*S0(n-1); S0(0) = 0
S1(n) = S1(n-1) + (L-1)*S2(n-1); S1(0) = 0
S2(n) = S1(n-1) + (L-1)*S3(n-1); S2(0) = 0
S3(n) = S4(n-1) + (L-1)*S3(n-1); S3(0) = 0
S4(n) = S4(n-1) + (L-1)*S4(n-1); S4(0) = 1
Trying to reduce S0(n) to just n and L gives a really long expression, so it would be easiest to calculate the recurrence functions as-is.
For really large n, this could be expressed as a matrix expression, and be efficiently calculated.
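S0(n) = [1 0 0 0 0] × M^n × [0 0 0 0 1]^T
where M is the transition matrix (row i, column j holds the number of letters that move state Si to state Sj):
[ L-1  1   0    0   0 ]
[  0   1  L-1   0   0 ]
[  0   1   0   L-1  0 ]
[  0   0   0   L-1  1 ]
[  0   0   0    0   L ]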
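The matrix form can be sketched in Python as follows. This is a minimal illustration rather than the answer's own code: the helper names mat_mul, mat_pow and count_matching are made up here, and the count is read from the entry of the n-th matrix power that leads from S0 to the accepting state S4. Python integers are arbitrary precision, so the results stay exact for N up to 25 and L up to 10.

```python
def mat_mul(a, b):
    """Multiply two square matrices of Python ints (exact arithmetic)."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(m, n):
    """Raise square matrix m to the n-th power by repeated squaring."""
    size = len(m)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    while n > 0:
        if n & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        n >>= 1
    return result


def count_matching(n, L):
    """Count strings of length n over L letters that contain A[^A][^A]+A."""
    # Row i, column j: number of letters that move state S_i to state S_j.
    transitions = [
        [L - 1, 1, 0,     0,     0],
        [0,     1, L - 1, 0,     0],
        [0,     1, 0,     L - 1, 0],
        [0,     0, 0,     L - 1, 1],
        [0,     0, 0,     0,     L],
    ]
    # Entry (0, 4) counts the length-n paths from S0 to the accepting state S4.
    return mat_pow(transitions, n)[0][4]


print([count_matching(n, 2) for n in range(4, 11)])
```

Repeated squaring needs only about log2(n) matrix multiplications, which matters mainly for very large n; for n up to 25 the plain loop is just as practical.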
In JavaScript:
function f(n, L) {
    var S0 = 0, S1 = 0, S2 = 0, S3 = 0, S4 = 1;
    var S1_tmp;
    while (n-- > 0) {
        S0 = S1 + (L - 1) * S0;
        S1_tmp = S1 + (L - 1) * S2;
        S2 = S1 + (L - 1) * S3;
        S3 = S4 + (L - 1) * S3;
        S4 = S4 + (L - 1) * S4;
        S1 = S1_tmp;
    }
    return S0;
}
var $tbody = $('#resulttable > tbody');
for (var L = 2; L <= 4; L++) {
    for (var n = 4; n <= 10; n++) {
        $('<tr>').append([
            $('<td>').text(n),
            $('<td>').text(L),
            $('<td>').text(f(n,L))
        ]).appendTo($tbody);
    }
}
#resulttable td {
text-align: right;
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<table id="resulttable">
<thead>
<tr>
<th>N</th>
<th>L</th>
<th>matches</th>
</tr>
</thead>
<tbody>
</tbody>
</table>

Related

Expressing Natural Number by sum of Triangular numbers

Triangular numbers count the objects that can be arranged in the shape of a triangle.
For example, 1, 3, 6, 10, 15, ... are triangular numbers.
o
o o
o o o
o o o o
is the shape of the n = 4 triangular number (10).
Given a natural number N, I have to print every way of expressing N
as a sum of triangular numbers.
if N = 4
output should be
1 1 1 1
1 3
3 1
else if N = 6
output should be
1 1 1 1 1 1
1 1 1 3
1 1 3 1
1 3 1 1
3 1 1 1
3 3
6
I have searched for a few hours and couldn't find an answer.
Please help.
(I am not sure whether this helps, but I found that
if T(k) is the number of such sums for N = k, then
T(k) = T(k-1) + T(k-3) + T(k-6) + ... + T(k-p), where p runs over the triangular numbers
with (k-p) >= 0, and T(0) = 1.)
Here's Code for k=-1(Read comments below)
#include <iostream>
#include <vector>
using namespace std;
long TriangleNumber(int index);
void PrintTriangles(int index);
vector<long> triangleNumList(450); //(450 power raised by 2 is about 200,000)
vector<long> storage(100001);
int main() {
    int n, p;
    for (int i = 0; i < 450; i++) {
        triangleNumList[i] = i * (i + 1) / 2;
    }
    cin >> n >> p;
    cout << TriangleNumber(n);
    if (p == 1) {
        //PrintTriangles();
    }
    return 0;
}
long TriangleNumber(int index) {
    int iter = 1, out = 0;
    if (index == 1 || index == 0) {
        return 1;
    }
    else {
        if (storage[index] != 0) {
            return storage[index];
        }
        else {
            while (triangleNumList[iter] <= index) {
                storage[index] = ( storage[index] + TriangleNumber(index - triangleNumList[iter]) ) % 1000000;
                iter++;
            }
        }
    }
    return storage[index];
}
void PrintTriangles(int index) {
    // What Algorithm?
}

Here is some recursive Python 3.6 code that prints the sums of triangular numbers that total the inputted target. I prioritized simplicity of code in this version. You may want to add error-checking on the input value, counting the sums, storing the lists rather than just printing them, and wrapping the entire routine into a function. Setting up the list of triangular numbers could also be done in fewer lines of code.
Your code saved time but worsened memory usage by "memoizing" the triangular numbers (storing and reusing them rather than always calculating them when needed). You could do the same to the sum lists, if you like. It is also possible to make this more in the dynamic programming style: find the sum lists for n=1 then for n=2 etc. I'll leave all that to you.
""" Given a positive integer n, print all the ways n can be expressed as
the sum of triangular numbers.
"""
def print_sums_of_triangular_numbers(prefix, target):
    """Print sums totalling to target, each after printing the prefix."""
    if target == 0:
        print(*prefix)
        return
    for tri in triangle_num_list:
        if tri > target:
            return
        print_sums_of_triangular_numbers(prefix + [tri], target - tri)
n = int(input('Value of n ? '))
# Set up list of triangular numbers not greater than n
triangle_num_list = []
index = 1
tri_sum = 1
while tri_sum <= n:
    triangle_num_list.append(tri_sum)
    index += 1
    tri_sum += index
# Print the sums totalling to n
print_sums_of_triangular_numbers([], n)
Here are the printouts of two runs of this code:
Value of n ? 4
1 1 1 1
1 3
3 1
Value of n ? 6
1 1 1 1 1 1
1 1 1 3
1 1 3 1
1 3 1 1
3 1 1 1
3 3
6
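The dynamic programming variant mentioned above, which counts the sums instead of printing them, might look like the sketch below. The function name count_triangular_sums is made up for this illustration, and it treats the empty sum as the single way to reach 0.

```python
def count_triangular_sums(n):
    """Count the ordered sums of triangular numbers that total n."""
    # Triangular numbers not greater than n: 1, 3, 6, 10, ...
    triangles = []
    index, tri = 1, 1
    while tri <= n:
        triangles.append(tri)
        index += 1
        tri += index

    # ways[t] = number of ordered sums totalling t; the empty sum gives ways[0] = 1.
    ways = [0] * (n + 1)
    ways[0] = 1
    for total in range(1, n + 1):
        # The first term of a sum can be any triangular number that still fits.
        ways[total] = sum(ways[total - t] for t in triangles if t <= total)
    return ways[n]


print(count_triangular_sums(4))  # 3, matching the three lines printed for n = 4
print(count_triangular_sums(6))  # 7, matching the seven lines printed for n = 6
```

Building the table from the bottom up avoids deep recursion, so it also works for targets far larger than the printing version can handle.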

Non Decreasing Number Combinations (Interval)

So my problem is the following:
Given a maximum length X and an interval from A (first number) to B (last number), I have to find how many different non-decreasing sequences (each element greater than or equal to the previous one) of length 1 to X I can build from the numbers in that interval.
Example:
Input: "2 9 11"
X = 2 | A = 9 | B = 11
Output: 9
Possible Combinations ->
[9],[9,9],[9,10],[9,11],[10,10],[10,11],[11,11],[10],[11].
Now, if it were the same input but with a different X, like X = 4, this would change a lot...
[9],[9,9],[9,9,9],[9,9,9,9],[9,9,9,10],[9,9,9,11],[9,9,10,10]...

Your problem can be reformulated with just two parameters,
X and N = B - A + 1, which gives you sequences starting at 0 instead of A.
If you wanted exactly X numbers in each sequence, it would be a simple combination with repetition, and the formula for that is
x_of_n = (N + X - 1)! / ((N - 1)! * X!)
so for your first example it would be
X = 2
N = 11 - 9 + 1 = 3
x_of_n = 4! / (2! * 2!) = (4*3*2) / (2*2) = 6
To this you need to add the same count for X = 1, which is x_of_n = 3, so you get the required total of 9.
I am not aware of a simple equation for the required output, but when you expand all the terms into one sum, there is a nice recurrence in which you compute the next term (N,X) from (N,X-1) and then sum all the terms:
S[0] = N
S[1] = S[0] * (N + 1) / 2
S[2] = S[1] * (N + 2) / 3
...
S[X-1] = S[X-2] * (N + X - 1) / X
so for the second example you give we have
X = 4, N = 3
S[0] = 3
S[1] = 3 * 4 / 2 = 6
S[2] = 6 * 5 / 3 = 10
S[3] = 10 * 6 / 4 = 15
output = sum(S) = 3 + 6 + 10 + 15 = 34
so you can try the code here:
function count(x, a, b) {
    var i,
        n = b - a + 1,
        s = 1,
        total = 0;
    for (i = 0; i < x; i += 1) {
        s *= (n + i) / (i + 1); // beware rounding!
        total += s;
    }
    return total;
}
console.log(count(2, 9, 11)); // 9
console.log(count(4, 9, 11)); // 34
Update: If you use a language with integer types (JS has only double),
use s = s * (n + i) / (i + 1) instead of the *= operator, so that the multiplication happens before the division and no intermediate fraction gets rounded or truncated.
Update 2: For a more functional version, you can use a recursive definition
function count(x, n) {
    return n < 1 || x < 1 ? 0 : 1 + count(x, n - 1) + count(x - 1, n);
}
where n = b - a + 1
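As a side note to the sum above, the terms appear to collapse into a single binomial coefficient: by the hockey-stick identity, the sum of C(N + k - 1, k) for k = 1..X equals C(N + X, X) - 1. The Python sketch below compares both forms on the two examples; the function names are made up for this illustration, and math.comb requires Python 3.8 or later.

```python
from math import comb  # available in Python 3.8+


def count_by_sum(x, n):
    """Add up the combinations with repetition for every length k = 1..x."""
    return sum(comb(n + k - 1, k) for k in range(1, x + 1))


def count_closed_form(x, n):
    """Same total via the hockey-stick identity: C(n + x, x) - 1."""
    return comb(n + x, x) - 1


n = 11 - 9 + 1  # interval 9..11 from the question
for x in (2, 4):
    print(x, count_by_sum(x, n), count_closed_form(x, n))
```

Both columns print 9 for X = 2 and 34 for X = 4, in line with the worked examples.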

QBasic - How to find this value?

If we have M as follows:
M = 1+2+3+5+6+7+9+10+11+13+...+n
What would be the QBasic program to find M?
I have done the following so far, but it does not return the expected value:
INPUT "ENTER A VALUE FOR N"
SUM = 0
FOR I = 1 TO N
IF I MOD 4 = 0
SUM = SUM + I
NECT I
How should I go about this?
Thanks.

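Your comparison is inverted: the series skips the multiples of 4, so you need <> instead of =. The INPUT statement also needs a variable to store the value in, and a single-line IF needs THEN. Try this:
INPUT "ENTER A VALUE FOR N"; N
SUM = 0
FOR I = 1 TO N
    IF I MOD 4 <> 0 THEN SUM = SUM + I
NEXT I
PRINT SUM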

No need to write a program, or at least no need to use loops.
Sum of the first n natural numbers:
sum_1 = n * (n + 1) / 2
Sum of the multiples of 4 that are <= n:
sum_2 = 4 * (n / 4) * (n / 4 + 1) / 2 = 2 * (n / 4) * (n / 4 + 1)
The result is sum_1 - sum_2:
sum = sum_1 - sum_2 = n * (n + 1) / 2 - 2 * (n / 4) * (n / 4 + 1)
NB: n / 4 means integer division here (the \ operator in QBasic)
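A quick way to check the closed form is to compare it against a plain loop. The Python sketch below does that; the function names are made up for this illustration, and // plays the role of the integer division noted above.

```python
def sum_skipping_multiples_of_4(n):
    """Closed form: sum of 1..n minus the sum of the multiples of 4 up to n."""
    q = n // 4  # how many multiples of 4 are <= n
    return n * (n + 1) // 2 - 2 * q * (q + 1)


def brute_force(n):
    """Reference loop that adds every i in 1..n not divisible by 4."""
    return sum(i for i in range(1, n + 1) if i % 4 != 0)


for n in (3, 4, 13, 100):
    print(n, sum_skipping_multiples_of_4(n), brute_force(n))
```

For n = 13 both give 67, which is 1+2+3+5+6+7+9+10+11+13.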

This snip calculates the sum of integers to n skipping values divisible by 4.
PRINT "Enter upper value";
INPUT n
' calculate sum of all values
FOR l = 1 TO n
    x = x + l
NEXT
' remove values divisible by 4
FOR l = 0 TO n STEP 4
    x = x - l
NEXT
PRINT "Solution is:"; x

Calculating Total Number of Times of Loops

I'm trying to calculate the total number of times the innermost statement is executed.
count = 0;
for i = 1 to n
    for j = 1 to n - i
        count = count + 1
I figured that the innermost statement executes at most O(n * (n - i)) = O(n^2) times. I wanted to prove this with a double summation, but I'm having trouble setting up the equation because the inner loop starts at j = 1 and its upper bound depends on i.
Can someone help explain this to me?
Thanks

For each i, the inner loop executes n - i times (n is constant). Therefore (since i ranges from 1 to n), to determine the total number of times the innermost statement is executed, we must evaluate the sum
(n - 1) + (n - 2) + (n - 3) + ... + (n - n)
By rearranging the terms (grouping all the ns that appear first), we can see that this is equal to
n*n - (1 + 2 + 3 + ... + n) = n*n - n(n+1)/2 = n*(n-1)/2 = n*n/2 - n/2
Here's a simple implementation in Python to verify this:
def f(n):
    count = 0
    for i in range(1, n + 1):
        for _ in range(1, n - i + 1):
            count = count + 1
    return count
for n in range(1, 11):
    # // is integer division, so the third column stays an integer in Python 3
    print(n, '\t', f(n), '\t', n*n//2 - n//2)
Output:
1 0 0
2 1 1
3 3 3
4 6 6
5 10 10
6 15 15
7 21 21
8 28 28
9 36 36
10 45 45
The first column is n, the second is the number of times that inner statement is executed, and the third is n*n/2 - n/2.

how to compute the original vector from a distance matrix?

I have a small question about vector and matrix.
Suppose I have a vector V = {v1, v2, ..., vn}. I generate an n-by-n distance matrix M defined as:
M_ij = | v_i - v_j | such that i,j belong to [1, n].
That is, each element M_ij of the square matrix is the absolute distance between two elements of V.
For example, for the vector V = {1, 3, 3, 5}, the distance matrix will be
M=[
0 2 2 4;
2 0 0 2;
2 0 0 2;
4 2 2 0; ]
It seems pretty simple. Now to the question: given such a matrix M, how do I obtain the initial V?
Thank you.
Based on some answers to this question, it seems that the answer is not unique. So now suppose that the initial vector has been normalized to 0 mean and 1 variance. The question is: given such a symmetric distance matrix M, how do I determine the initial normalized vector?

You can't. To give you an idea of why, consider these two cases:
V1 = {1,2,3}
M1 = [ 0 1 2 ; 1 0 1 ; 2 1 0 ]
V2 = {3,4,5}
M2 = [ 0 1 2 ; 1 0 1 ; 2 1 0 ]
As you can see, a single M could be the result of more than one V. Therefore, you can't map backwards.

There is no way to determine the answer uniquely, since the distance matrix is invariant to adding a constant to all elements and to multiplying all the values by -1. Assuming that element 1 is equal to 0, and that the first nonzero element is positive, however, you can find an answer. Here is the pseudocode:
# Assume v[1] is 0
v[1] = 0
# e is the value of the first non-zero vector element
e = 0
# ei is the index of the first non-zero vector element
ei = 0
for i = 2...n:
    # if all vector elements have been 0 so far
    if e == 0:
        # get the current distance from element 1 and its index
        # this new element may still be 0
        e = d[1,i]
        ei = i
        v[i] = e
    elseif d[ei,i] == d[1,i] + v[ei]: # v[i] <= v[1]
        # v[i] is to the left of v[1] (v[ei] > v[1], so v[ei] is even farther away)
        v[i] = -d[1,i]
    else:
        # some other case; v[i] is to the right of v[1]
        v[i] = d[1,i]
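A runnable Python sketch of this idea is shown below. It is an illustration under stated assumptions rather than a definitive implementation: the name recover_vector is made up, indices are 0-based, and the distances are assumed to be exact (for example integers), since the sign test relies on an exact equality. An element is placed on the negative side when its distance to the first non-zero element equals its distance to element 0 plus that element's value.

```python
def recover_vector(d):
    """Return one vector whose pairwise absolute differences reproduce d.

    The first element is pinned to 0 and the first non-zero element is
    taken as positive; the original vector may differ by a shift and a flip.
    """
    n = len(d)
    v = [0] * n
    ref = None  # index of the first element with non-zero distance to v[0]
    for i in range(1, n):
        if ref is None:
            # Everything so far coincides with v[0]; this one may as well.
            v[i] = d[0][i]
            if d[0][i] != 0:
                ref = i
        elif d[ref][i] == d[0][i] + v[ref]:
            # Farther from the reference than from v[0]: negative side.
            v[i] = -d[0][i]
        else:
            v[i] = d[0][i]
    return v


M = [
    [0, 2, 2, 4],
    [2, 0, 0, 2],
    [2, 0, 0, 2],
    [4, 2, 2, 0],
]
print(recover_vector(M))  # [0, 2, 2, 4], i.e. {1, 3, 3, 5} shifted by -1
```

With floating-point distances the equality test would need a tolerance instead of ==.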

I don't think it is possible to find the original vector, but you can find a translation of the vector by taking the first row of the matrix.
If you let M_ij = | v_i - v_j | and you translate every v_k, for k in [1,n], by the same amount (here 1), you will get
M_ij = | (v_i + 1) - (v_j + 1) |
= | v_i - v_j |
Hence, just take the first row as the vector and find one initial point to translate the vector to.
Correction:
Let v_1 = 0, let l_k = | v_k | for k in [2,n], and let p_k be the sign of v_k
Let p_1 = p_2 = 1
for(int i = 2; i < n; i++)
    if( | l_i - l_(i+1) | != M_i(i+1) )
        p_(i+1) = - p_i
    else
        p_(i+1) = p_i
Doing this for all v_k, for k in [2,n], in order gives the sign of each v_k relative to the others.
Then you can find a translation of the original vector with the same or opposite direction
Update (For Normalized vector):
Let d = Sqrt(v_1^2 + v_2^2 + ... + v_n^2)
Vector = {0, v_1 / d, v_2 / d, ... , v_n / d}
or
{0, -v_1 / d, -v_2 / d, ... , -v_n / d}
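The normalized case from the question can also be sketched directly. This is a hedged illustration, not the formula given above: it assumes that "0 mean and 1 variance" refers to the population variance, and the helper name normalize is made up here. Because any shift cancels when the mean is subtracted, every translated copy of the vector normalizes to the same result, so only the overall sign remains undetermined.

```python
from statistics import fmean, pstdev


def normalize(v):
    """Shift v to zero mean and scale it to unit (population) variance."""
    mean = fmean(v)
    std = pstdev(v)
    if std == 0:
        raise ValueError("all elements are equal; the variance cannot be scaled to 1")
    return [(x - mean) / std for x in v]


recovered = [0, 2, 2, 4]  # a translated copy of {1, 3, 3, 5}
original = [1, 3, 3, 5]
print(normalize(recovered))
print(normalize(original))
```

Both calls print the same list, and negating every entry gives the other vector that fits the same distance matrix.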
