Alternative for recursive aggregate queries not supported in sqlite3 - recursion

I would like to perform a SQL computation of a system evolving in time as
v <- v + a (*) v
where v is a vector of N components (N >> 10), a is an N-by-N matrix, fairly sparse, (*) denotes matrix multiplication, and the evolution is recursively computed as a sequence of timesteps, with each step using the previous value of v. a changes with time as an external factor, but it is sufficient for this question to assume a is constant.
I could do this recursion loop in an imperative language, but the underlying data was kind of messy and SQL was brilliant for normalising. It would be kind of neat to just finish the job in one language.
I found that matrix multiplication is fine. Recursion is fine too, as of sqlite 3.8. But matrix multiplication inside a recursion loop does not appear to be possible. Here is my progress so far (also at http://sqlfiddle.com/#!5/ed521/1 ):
-- Example vector v
DROP TABLE IF EXISTS coords;
CREATE TABLE coords( row INTEGER PRIMARY KEY, val FLOAT );
INSERT INTO coords
VALUES
(1, 0.0 ),
(2, 1.0 );
-- Example matrix a
DROP TABLE IF EXISTS matrix;
CREATE TABLE matrix( row INTEGER, col INTEGER, val FLOAT, PRIMARY KEY( row, col ) );
INSERT INTO matrix
VALUES
( 1, 1, 0.0 ),
( 1, 2, 0.03 ),
( 2, 1, -0.03 ),
( 2, 2, 0.0 );
-- The timestep equation can also be expressed: v <- ( I + a ) (*) v, where the
-- identity matrix I is first added to a.
UPDATE matrix
SET val = val + 1.0
WHERE row == col;
-- Matrix multiply to evaluate the first step.
SELECT a.row AS row, SUM( a.val*v.val ) AS val
FROM coords v
JOIN matrix a
ON a.col == v.row
GROUP BY a.row;
Here is where the problem arises. I can't see how to do a matrix multiply without a
GROUP BY (aggregation) operation, but Sqlite3 specifically does not permit aggregation inside of a recursion loop:
-- Recursive matrix multiply to evaluate a sequence of steps.
WITH RECURSIVE trajectory( row, val ) AS
(
SELECT row, val
FROM coords
UNION ALL
SELECT a.row AS row, SUM( a.val*v.val ) AS val
FROM trajectory v -- recursive sequence of steps
--FROM coords v -- non-recursive first step only
JOIN matrix a
ON a.col == v.row
GROUP BY a.row
LIMIT 50
)
SELECT *
FROM trajectory;
Returns
Error: recursive aggregate queries not supported
No doubt the designers had some clear reason for excluding it! I am surprised that JOINs are allowed but GROUP BYs are not. I am not sure what my alternatives are, though.
I've found a few other recursion examples, but they all seem to have carefully selected problems for which aggregation or self-joins inside the loop are not required. In the docs ( https://www.sqlite.org/lang_with.html ), an example query walks a tree recursively and performs an avg() on the output. This is subtly different: the aggregation happens outside the loop, and tree-walking uses JOINs but no aggregation inside the recursion loop. That query works only because the recursion does not depend on the aggregation, whereas in this problem it does.
Another example, the Fibonacci generator, is an N = 2 linear dynamical system, but with N = 2 the implementations can just hard-code the two values and the matrix multiply directly into the query, so no aggregating SUM() is needed. More generally, with N >> 10, it is not feasible to go down this path.
Any help would be much appreciated. Thanks!
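One possible workaround, sketched here only as an assumption about what would be acceptable, is to keep the matrix multiply in SQL (the GROUP BY query above) and drive just the timestep loop from a thin host-language wrapper. The helper names build_db and evolve are hypothetical, the matrix is stored with the identity already added, and Python's standard sqlite3 module is used:

```python
import sqlite3

def build_db():
    """Create the example tables in memory; matrix already holds I + a."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE coords(row INTEGER PRIMARY KEY, val FLOAT);
        INSERT INTO coords VALUES (1, 0.0), (2, 1.0);
        CREATE TABLE matrix(row INTEGER, col INTEGER, val FLOAT,
                            PRIMARY KEY(row, col));
        INSERT INTO matrix VALUES
            (1, 1, 1.0), (1, 2, 0.03), (2, 1, -0.03), (2, 2, 1.0);
    """)
    return conn

# One timestep: the same aggregate query as in the question.
STEP_SQL = """
    SELECT a.row, SUM(a.val * v.val)
    FROM coords v
    JOIN matrix a ON a.col = v.row
    GROUP BY a.row
"""

def evolve(conn, steps):
    """Apply v <- (I + a) v `steps` times; return v after every step."""
    trajectory = []
    for _ in range(steps):
        # Materialize the whole new vector before overwriting the old one.
        new_v = conn.execute(STEP_SQL).fetchall()
        conn.executemany(
            "UPDATE coords SET val = ? WHERE row = ?",
            [(val, row) for row, val in new_v],
        )
        trajectory.append(
            conn.execute("SELECT row, val FROM coords ORDER BY row").fetchall()
        )
    return trajectory
```

The aggregation stays in one non-recursive statement per step, so the restriction on recursive aggregate queries never applies; the cost is that the loop itself lives outside SQL.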

Related

What is the most efficient way to find all pairs of numbers from a list of integers which add up to a separate given integer?

I had an interview yesterday and was asked to give a method to find all of the pairs of numbers from a list which add up to an integer which is given separate to the list. The list can be infinitely long, but for example:
numbers = [11,1,5,27,7,18,2,4,8]
sum = 9
pairs = [(1,8),(5,4),(7,2)]
I got as far as sorting the list and eliminating all numbers greater than the sum number and then doing two nested for loops to take each index and iterate through the other numbers to check whether they sum up to the given number, but was told that there was a more efficient way of doing it...
I've been trying to figure it out but have nothing, other than doing the nested iteration backwards but that only seems marginally more efficient.
Any ideas?
This can be done in O(n) expected time and O(n) auxiliary space, because testing for membership of a set takes O(1) time on average. Since the output also takes up to O(n) space, the auxiliary space should not be a significant issue.
def pairs_sum(numbers, k):
    numbers_set = set(numbers)
    return [(x, y) for x in numbers if (y := k - x) in numbers_set and x < y]
Example:
>>> pairs_sum([11, 1, 5, 27, 7, 18, 2, 4, 8], 9)
[(1, 8), (2, 7), (4, 5)]
This is something of a classic, and I'm not sure Stack Overflow is the right place to ask this kind of question.
Sort the list in ascending order.
Use two iterators: i1 starts at the end of the list and moves down, i2 starts at the beginning of the list and moves up.
Loop:
    while i1 > i2
        if (list[i1] + list[i2] == target)
            store (list[i1], list[i2]) in the result pairs
            i1--
            i2++
        else if (list[i1] + list[i2] > target)
            i1--
        else if (list[i1] + list[i2] < target)
            i2++
The scan itself is O(n), with n the length of the list. The sort is not included in that figure: with a quicksort it takes O(n log n) on average, so the whole procedure is O(n log n) unless the input is already sorted.
Note: this algorithm does not handle the case where the same number appears several times in the input list.
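The two-iterator scan above could be written in Python roughly as follows. This is a minimal sketch: the function name pairs_sum_sorted is made up for illustration, and, like the pseudocode, it assumes the input has no repeated values.

```python
def pairs_sum_sorted(numbers, target):
    """Two-pointer scan over a sorted copy of the input."""
    values = sorted(numbers)          # O(n log n)
    lo, hi = 0, len(values) - 1       # i2 and i1 in the pseudocode
    pairs = []
    while lo < hi:                    # O(n) scan
        total = values[lo] + values[hi]
        if total == target:
            pairs.append((values[lo], values[hi]))
            lo += 1
            hi -= 1
        elif total > target:
            hi -= 1                   # sum too large: drop the largest value
        else:
            lo += 1                   # sum too small: drop the smallest value
    return pairs
```

For the list in the question and a target of 9 this yields [(1, 8), (2, 7), (4, 5)], the same pairs as the set-based answer.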

Renumber table rows with a recursive statement

To understand the behaviour of recursion (in SQLite), I tried the following statements to re-number the rows of a table with a recursive statement:
Let's create a sample table,
CREATE TABLE tb
(x TEXT(1) PRIMARY KEY);
INSERT INTO tb
VALUES ('a'), ('b'), ('c');
and re-number the rows starting from, say, 2 via
SELECT tb.x as x, tb.rowid + 1 as idx from tb;
/* yields expected:
a|2
b|3
c|4
*/
Attempting to do the same with a recursive WITH (neglecting ROWID), results in divergence — here, I have added LIMIT 6 to prevent the divergence:
WITH RECURSIVE
newtb AS (
SELECT tb.x, 2 AS idx FROM tb
UNION ALL
SELECT tb.x, newtb.idx + 1
FROM tb, newtb
LIMIT 6 -- only to prevent divergence!
)
SELECT * FROM newtb;
/* yields indefinitely:
a|2
b|2
c|2
a|3
b|3
c|3
...
*/
Why does the recursion not stop when it reaches the end of table tb? Could this be prevented?
Note that the problem can be reformulated as how to produce the result of the following procedural pseudo-code in SQLite (without too much ado):
tb := {'a', 'b', 'c'};
num := {1, 2, 3};
result := {}; # initialize an empty table
for i in {1, ..., length(tb)} # assume index starts from 1
append tuple(num[i], tb[i]) to result;
end for
# result will be {(1, 'a'), (2, 'b'), (3, 'c')}
This is equivalent to the zip operation in a language like Python.
According to a hint by @CPerkins, one can achieve this goal very elegantly via window functions (for SQLite >= 3.25), e.g.,
SELECT (row_number() OVER (ORDER BY x)) + 2 AS newId, x FROM tb;
Why does the recursion not stop when it reaches the end of table tb?
Because that's the way it is designed to be, and it is extremely useful. It is little different from most languages that have some form of recursion, and it is often an efficient and effective way to resolve some programming issues, such as traversing a directory tree.
Most computer programming languages support recursion by allowing a
function to call itself from within its own code. Some functional
programming languages do not define any looping constructs but rely
solely on recursion to repeatedly call code. Computability theory
proves that these recursive-only languages are Turing complete; they
are as computationally powerful as Turing complete imperative
languages, meaning they can solve the same kinds of problems as
imperative languages even without iterative control structures such as
while and for.
Recursion (computer science)
If you used LIMIT (SELECT count() FROM tb) instead of LIMIT 6 then the recursion would stop based upon the number of rows in the table.
However, if you are looking to renumber (by adding 1 to the rowid) then you would be looking at something more like :-
WITH RECURSIVE
cte(idx,newidx) AS (
SELECT (SELECT max(rowid) FROM tb),(SELECT max(rowid) FROM tb) +1
UNION ALL
SELECT
idx-1, newidx-1 FROM cte
WHERE idx > 0
)
SELECT (SELECT x FROM tb WHERE tb.rowid = cte.idx) AS x, newidx, idx AS original FROM cte WHERE x IS NOT NULL;
This would (assuming that tb had rows with a, b and c .... X, Y and Z and that rows d-w had been deleted) result in :-
SQLite's reasoning is :-
Recursive common table expressions provide the ability to do
hierarchical or recursive queries of trees and graphs, a capability
that is not otherwise available in the SQL language.
SQL As Understood By SQLite - WITH clause
Could this be prevented?
Yes. You can avoid recursion altogether, as there may be alternatives. But, as with recursion in other languages, if you do use it then you need some means of detecting when the recursion should finish. A WHERE or LIMIT clause provides this.
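As an illustration of a WHERE clause ending the recursion, here is a hedged sketch run through Python's standard sqlite3 module. It is not the query from the answer above: it steps through tb in order of x and stops when no larger x exists. The function name renumber is hypothetical, and the sketch assumes tb is non-empty with distinct x values.

```python
import sqlite3

RENUMBER_SQL = """
WITH RECURSIVE newtb(x, idx) AS (
    -- Anchor: the smallest x gets the starting index 2.
    SELECT (SELECT x FROM tb ORDER BY x LIMIT 1), 2
    UNION ALL
    -- Step: move to the next larger x and add 1 to the index.
    SELECT (SELECT tb.x FROM tb WHERE tb.x > newtb.x ORDER BY tb.x LIMIT 1),
           newtb.idx + 1
    FROM newtb
    -- Termination: stop once no larger x remains.
    WHERE EXISTS (SELECT 1 FROM tb WHERE tb.x > newtb.x)
)
SELECT x, idx FROM newtb ORDER BY idx;
"""

def renumber(values):
    """Build tb from `values` in memory and return the (x, idx) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tb (x TEXT(1) PRIMARY KEY)")
    conn.executemany("INSERT INTO tb VALUES (?)", [(v,) for v in values])
    return conn.execute(RENUMBER_SQL).fetchall()
```

Each recursive step produces exactly one row, and the WHERE clause stops producing rows when the end of tb is reached, so no LIMIT is needed.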

Generate Unique Combinations of Integers

I am looking for help with pseudocode (unless you are a user of Game Maker 8.0 by Mark Overmars and know the GML equivalent of what I need) for how to generate a list / array of unique combinations from a set of X integers, where the size of the set is variable. It can be 1-5 or 1-1000.
For example:
IntegerList{1,2,3,4}
1,2
1,3
1,4
2,3
2,4
3,4
I feel like the math behind this is simple; I just can't seem to wrap my head around it after checking multiple sources on how to do it in languages such as C++ and Java. Thanks everyone.
As there are not many details in the question, I assume:
Your input is a natural number n and the resulting array contains all natural numbers from 1 to n.
The expected output given by the combinations above resembles a symmetric relation, i.e. in your case [1, 2] is considered the same as [2, 1].
Combinations [x, x] are excluded.
There are only combinations with 2 elements.
There is no List<> datatype or dynamic array, so the array length has to be known before creating the array.
The number of elements in your result is therefore the binomial coefficient m = n over 2 = n! / (2! * (n - 2)!) (which is 4! / (2! * (4 - 2)!) = 24 / 4 = 6 in your example) with ! being the factorial.
First, initializing the array with the first n natural numbers should be quite easy using the array element index. However, the index is a property of the array elements, so you don't need to initialize them in the first place.
You need 2 nested loops processing the array. The outer loop ranges i from 1 to n - 1, and the inner loop ranges j from i + 1 to n, so that each pair is produced once and [x, x] never occurs. If your indexes start from 0 instead of 1, you have to take this into consideration for the loop limits. Now, you only need to fill your target array with the combinations [i, j]. To find the correct index in your target array, you should use a third counter variable, initialized with the first index and incremented at the end of each iteration of the inner loop.
I agree, the math behind is not that hard and I think this explanation should suffice to develop the corresponding code yourself.
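For reference, the two nested loops and the third counter described above might look like this in Python. It is a sketch under the same assumptions (values 1 to n, pairs only, fixed-size result array); the name unique_pairs is made up for illustration.

```python
def unique_pairs(n):
    """All 2-element combinations of 1..n, stored in a preallocated array."""
    m = n * (n - 1) // 2            # binomial coefficient "n over 2"
    result = [None] * m             # length known in advance, no dynamic list
    k = 0                           # third counter: next free slot in result
    for i in range(1, n):           # outer loop: i = 1 .. n - 1
        for j in range(i + 1, n + 1):   # inner loop: j = i + 1 .. n
            result[k] = (i, j)
            k += 1
    return result
```

With n = 4 this returns the six pairs listed in the question, in the same order.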

Julia : BLAS.gemm!() parameters

I want to use the BLAS package. To do so, the meaning of the two first parameters of the gemm() function is not evident for me.
What do the parameters 'N' and 'T' represent?
BLAS.gemm!('N', 'T', lr, alpha, A, B, beta, C)
What is the difference between BLAS.gemm and BLAS.gemm! ?
According to the documentation
gemm!(tA, tB, alpha, A, B, beta, C)
Update C as alpha * A * B + beta*C or the other three variants according to tA (transpose A) and tB. Returns the updated C.
Note: here, alpha and beta must be float type scalars. A, B and C are all matrices. It's up to you to make sure the matrix dimensions match.
Thus, the tA and tB parameters refer to whether you want to apply the transpose operation to A or to B before multiplying. Note that this will cost you some computation time and allocations - the transpose isn't free. (thus, if you were going to apply the multiplication many times, each time with the same transpose specification, you'd be better off storing your matrix as the transposed version from the beginning). Select N for no transpose, T for transpose. You must select one or the other.
The difference between gemm!() and gemm() is that for gemm!() you already need to have allocated the matrix C. The ! is a "modify in place" signal. Consider the following illustration of their different uses:
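using LinearAlgebra               # provides the BLAS module on Julia 1.x
A = rand(5,5)
B = rand(5,5)
C = Array{Float64}(undef, 5, 5)   # preallocate the output (Julia 1.x syntax)
BLAS.gemm!('N', 'T', 1.0, A, B, 0.0, C)   # writes the result into C
D = BLAS.gemm('N', 'T', 1.0, A, B)        # allocates and returns a new matrix
julia> C == D
true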
Each of these, in essence, performs the calculation C = A * B'. (Technically, gemm!() performs C = (0.0)*C + (1.0)*A * B'.)
Thus, the syntax for the modify-in-place gemm!() is a bit unusual in some respects (unless you've already worked with a language like C, in which case it seems very intuitive). You don't have the explicit = sign that you frequently see when calling functions and assigning their results in a high-level language like Julia.
As the illustration above shows, the outcome of gemm!() and gemm() in this case is identical, even though the syntax and procedure to achieve that outcome is a bit different. Practically speaking, however, performance differences between the two can be significant, depending on your use case. In particular, if you are going to be performing that multiplication operation many times, replacing/updating the value of C each time, then gemm!() can be a decent bit quicker because you don't need to keep re-allocating new memory each time, which does have time costs, both in the initial memory allocation and then in the garbage collection later on.
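To make the roles of tA, tB, alpha and beta concrete, here is a small pure-Python model of the update C = alpha * op(A) * op(B) + beta * C on nested lists. It is only an illustrative sketch, not the BLAS implementation: the name gemm_model is hypothetical, and only 'N' and 'T' are handled. Passing C updates it in place, as gemm!() does; omitting it allocates a new result, as gemm() does.

```python
def gemm_model(tA, tB, alpha, A, B, beta=0.0, C=None):
    """Toy model of gemm semantics on lists of lists."""
    def op(M, flag):
        # 'T' means use the transpose of M, 'N' means use M as is.
        return [list(col) for col in zip(*M)] if flag == 'T' else M

    X, Y = op(A, tA), op(B, tB)
    rows, inner, cols = len(X), len(Y), len(Y[0])
    if C is None:                       # gemm-like: allocate the output
        C = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = sum(X[i][k] * Y[k][j] for k in range(inner))
            # With beta == 0 the old contents of C are ignored entirely.
            old = beta * C[i][j] if beta != 0 else 0.0
            C[i][j] = alpha * acc + old
    return C                            # gemm!-like: same object, updated
```

For example, gemm_model('N', 'T', 1.0, A, B) corresponds to A * B', while passing an existing C together with a non-zero beta accumulates into it.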

Can recursion be dynamic programming?

I was asked to use dynamic programming to solve a problem. I have mixed notes on what constitutes dynamic programming. I believe it requires a "bottom-up" approach, where smallest problems are solved first.
One thing I have contradicting information on, is whether something can be dynamic programming if the same subproblems are solved more than once, as is often the case in recursion.
For instance, for Fibonacci, I can have a recursive algorithm:
RecursiveFibonacci(n)
    if (n = 1 or n = 2)
        return 1
    else
        return RecursiveFibonacci(n-1) + RecursiveFibonacci(n-2)
In this situation, the same subproblems may be solved over and over again. Does this mean it is not dynamic programming? That is, if I wanted dynamic programming, would I have to avoid re-solving subproblems, for example by using an array of length n and storing the solution to each subproblem (the first entries of the array being 1, 1, 2, 3, 5, 8, 13, 21)?
Fibonacci(n)
    F[1] = 1
    F[2] = 1
    for i = 3 to n
        F[i] = F[i-1] + F[i-2]
    return F[n]
Dynamic programs can usually be succinctly described with recursive formulas.
But if you implement them with simple recursive computer programs, these are often inefficient for exactly the reason you raise: the same computation is repeated. Fibonacci is an example of repeated computation, though it is not a dynamic program.
There are two approaches to avoiding the repetition.
Memoization. The idea here is to cache the answer computed for each set of arguments to the recursive function and return the cached value when it exists.
Bottom-up table. Here you "unwind" the recursion so that results at levels less than i are combined to the result at level i. This is usually depicted as filling in a table, where the levels are rows.
One of these methods is implied for any DP algorithm. If computations are repeated, the algorithm isn't a DP. So the answer to your question is "yes."
So an example... Let's try the problem of making change of c cents given you have coins with values v_1, v_2, ... v_n, using a minimum number of coins.
Let N(c) be the minimum number of coins needed to make c cents. Then one recursive formulation is
N(c) = 1 + min_{i = 1..n} N(c - v_i)
The base cases are N(0)=0 and N(k)=inf for k<0.
To memoize this requires just a hash table mapping c to N(c).
In this case the "table" has only one dimension, which is easy to fill in. Say we have coins with values 1, 3, 5, then the N table starts with
N(0) = 0, the initial condition.
N(1) = 1 + min(N(1-1), N(1-3), N(1-5)) = 1 + min(0, inf, inf) = 1
N(2) = 1 + min(N(2-1), N(2-3), N(2-5)) = 1 + min(1, inf, inf) = 2
N(3) = 1 + min(N(3-1), N(3-3), N(3-5)) = 1 + min(2, 0, inf) = 1
You get the idea. You can always compute N(c) from N(d), d < c in this manner.
In this case, you need only remember the last 5 values because that's the biggest coin value. Most DPs are similar. Only a few rows of the table are needed to get the next one.
The table is k-dimensional for k independent variables in the recursive expression.
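Both approaches to the coin-change recurrence N(c) = 1 + min_i N(c - v_i) can be sketched in a few lines of Python. The function names min_coins_memo and min_coins_table are made up for illustration, and math.inf plays the role of "inf" for amounts that cannot be made.

```python
import math
from functools import lru_cache

def min_coins_memo(c, coins):
    """Top-down: the recursive formula plus a cache keyed by the amount."""
    @lru_cache(maxsize=None)
    def N(k):
        if k == 0:
            return 0                    # base case N(0) = 0
        if k < 0:
            return math.inf             # base case N(k) = inf for k < 0
        return 1 + min(N(k - v) for v in coins)
    return N(c)

def min_coins_table(c, coins):
    """Bottom-up: fill a one-dimensional table from N(0) up to N(c)."""
    table = [0] + [math.inf] * c
    for k in range(1, c + 1):
        # Skipping coins larger than k is the same as treating N(<0) as inf.
        table[k] = 1 + min((table[k - v] for v in coins if v <= k),
                           default=math.inf)
    return table[c]
```

With coins 1, 3, 5 both functions reproduce the table above: N(1) = 1, N(2) = 2, N(3) = 1. The bottom-up version keeps the whole table for simplicity, although only the last max(coins) entries are ever read.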
We think of a dynamic programming approach to a problem if it has
overlapping subproblems
optimal substructure
Put simply, dynamic programming has two faces: the top-down and the bottom-up approach.
In your case, if you are talking about the recursion, it is the top-down approach.
In the top-down approach, we write a recursive (brute-force) solution and memoize its results, so that a stored result is reused whenever the same subproblem comes up again. In other words, it is brute force plus memoization. The brute-force part can be expressed with a simple recurrence relation.
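Applied to the Fibonacci recursion from the question, "brute force plus memoization" could look like the following sketch. It uses functools.lru_cache from the Python standard library as the memo table; the indexing (fib(1) = fib(2) = 1) follows the question.

```python
from functools import lru_cache

@lru_cache(maxsize=None)        # memo table: one cached entry per distinct n
def fib(n):
    """Top-down Fibonacci: same recursion as RecursiveFibonacci, but cached."""
    if n in (1, 2):
        return 1
    return fib(n - 1) + fib(n - 2)
```

Each value of n is computed once and then served from the cache, so the number of real computations grows linearly with n rather than exponentially.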
