Is there a way to stop a sqlite3 total() early when over a certain value? - sqlite

I have this sample table:
NAME SIZE
sam 100
skr 200
sss 50
thu 150
I want to do this query:
select total(size) > 300 from sample;
but my table is very big, so I want it to stop computing total(size) early if it's already greater than 300, instead of going through the entire table. (There are no negative sizes in my table.) Is there any way to do this in SQLite?

I've found a way to allow it to stop early, by using a window function, but unfortunately it makes it slower for a different reason. I hope someone else has a way to do this faster. To truly make it fast, you might need to create a custom aggregate function.
Window function method
A normal aggregate function like total() will always add all of the rows it's given, but you can use an aggregate window function instead to add only some of the rows:
select name, size,
total(size) over (rows between unbounded preceding
and current row)
from sample
will give you
sam|100|100.0
skr|200|300.0
sss|50|350.0
thu|150|500.0
in which the third column is a cumulative sum. You can see in this result that you'd like to stop this query once you see the 350. You can do this by putting the above query into a subquery and using the EXISTS operator:
select exists(
select 1
from (select total(size) over (rows between unbounded preceding
and current row)
as total_size
from sample)
where total_size > 300)
This will filter the query to only the rows > 300, and then stop and return true (1) as soon as it finds one of them. If it never finds a row with that sum, it returns false (0).
However: this version can take longer than simply
select total(size) > 300 from sample
because it re-calculates the sum for each row, instead of just adding the next row's size to the running total.
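Another option, not taken from the answer above, is to do the summing on the client side. The following is a minimal sketch assuming Python's built-in sqlite3 module and a hypothetical helper named total_exceeds. It relies on the cursor fetching rows one at a time, so the scan can be abandoned once the running total passes the threshold:

```python
import sqlite3


def total_exceeds(conn, threshold):
    """Return True as soon as the running sum of sample.size exceeds threshold.

    Hypothetical helper: assumes a table sample(name, size) with no negative
    sizes, so the running sum can only grow.
    """
    running = 0.0
    cur = conn.execute("SELECT size FROM sample WHERE size IS NOT NULL")
    try:
        for (size,) in cur:
            running += size
            if running > threshold:
                return True  # stop reading; remaining rows are never fetched
        return False
    finally:
        cur.close()


# Rebuild the sample table from the question in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (name TEXT, size INTEGER)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [("sam", 100), ("skr", 200), ("sss", 50), ("thu", 150)],
)

print(total_exceeds(conn, 300))  # True: 100 + 200 + 50 = 350 > 300
print(total_exceeds(conn, 500))  # False: the full total is exactly 500
```

Whether this beats a plain total() depends on how early the threshold is reached, since each row now costs a round trip through Python. It is worth measuring on the real table.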

Related

Count with limit and offset in sqlite

I am trying to write a function in Python that uses SQLite. I managed to get it to work, but there is a behavior in SQLite that I don't understand when using COUNT. When I run the following, SQLite counts as expected, i.e. it returns an int.
SELECT COUNT (*) FROM Material WHERE level IN (?) LIMIT 10
However, when I add an offset to the end, as shown below, SQLite returns an empty list, in other words nothing.
SELECT COUNT (*) FROM Material WHERE level IN (?) LIMIT 10 OFFSET 82
While omitting the offset is an easy fix, I don't understand why SQLite returns nothing. Is this the expected behavior for the command I gave?
thanks for reading
When you execute that COUNT(*), it returns only a single row.
The LIMIT clause limits the number of rows returned. Setting the limit to 10 has no effect here, because the query returns only a single row.
OFFSET skips the specified number of rows in the result. With OFFSET 82 it skips the only row there is.
In simple terms, your query means: count the rows, then return up to 10 result rows starting from the 83rd position. Since the result has only a single row, the query always comes back empty.
Read about LIMIT and OFFSET
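The behavior described above can be reproduced with a small sketch. It assumes Python's built-in sqlite3 module and a made-up Material table with 100 rows at level 1. The last query shows one possible way to count only the rows inside a LIMIT/OFFSET window, by applying the window in a subquery first:

```python
import sqlite3

# Hypothetical test data: 100 rows, all with level = 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Material (id INTEGER PRIMARY KEY, level INTEGER)")
conn.executemany("INSERT INTO Material (level) VALUES (?)", [(1,)] * 100)

# COUNT(*) yields one row, so LIMIT 10 changes nothing.
plain = conn.execute(
    "SELECT COUNT(*) FROM Material WHERE level IN (?) LIMIT 10", (1,)
).fetchall()

# OFFSET 82 skips past that single row, leaving an empty result.
skipped = conn.execute(
    "SELECT COUNT(*) FROM Material WHERE level IN (?) LIMIT 10 OFFSET 82", (1,)
).fetchall()

# Apply LIMIT/OFFSET to the matching rows first, then count what is left.
windowed = conn.execute(
    "SELECT COUNT(*) FROM ("
    "  SELECT 1 FROM Material WHERE level IN (?) LIMIT 10 OFFSET 82"
    ")",
    (1,),
).fetchall()

print(plain)     # [(100,)]
print(skipped)   # []
print(windowed)  # [(10,)]
```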

How to create sequence with maximum row value of a column in HSQL DB?

I need to create a sequence that starts from the maximum value of a column in HSQLDB. Is there any procedure we can write?
CREATE SEQUENCE seq START WITH 1 INCREMENT BY 1 => Works fine
instead if I give like this
CREATE SEQUENCE seq START WITH SELECT MAX(ID) FROM TEST INCREMENT BY 1
shows error:unexpected token: SELECT / Error Code: -5581 / State: 42581
I'm not sure that I understand your question properly, but in H2 you can use
CREATE SEQUENCE SEQ START WITH 9223372036854775807 INCREMENT BY -1
It will return values 9223372036854775807, 9223372036854775806, 9223372036854775805, …
In HSQLDB you can use the very similar
CREATE SEQUENCE SEQ START WITH 2147483647 INCREMENT BY -1
For columns that can't hold such large values the initial value should be decreased.
If you have a table with existing values and want to use the largest of them as the start value of the sequence, in H2 you can use a subquery directly:
CREATE SEQUENCE SEQ START WITH (SELECT MAX(columnName) FROM tableName)
You may want to add 1 to result of the query.
I don't know how to do that in HSQLDB only with SQL.
Maybe you mean something else, you definitely should edit your question and provide more details.

How to select random rows in MariaDB

SELECT col1 FROM tbl ORDER BY RAND() LIMIT 10;
This can work fine for small tables. For a big table, however, it has a serious performance problem: to generate the list of random rows, MySQL needs to assign a random number to each row and then sort them.
Even if you want only 10 random rows from a set of 100k rows, MySQL needs to sort all 100k rows and then extract only 10 of them.
My solution to this problem is to use RAND in the WHERE clause rather than in the ORDER BY clause. First, calculate the fraction of the table that your desired result set represents (desired row count divided by total row count). Second, use this fraction in the WHERE clause and keep only the rows whose RAND value is less than or equal to it.
SELECT col1 FROM tbl WHERE RAND()<=0.0005;
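The same filtering idea can be tried out locally. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL/MariaDB, a hypothetical helper named sample_rows, and SQLite's random() scaled into [0, 1) because SQLite has no RAND():

```python
import sqlite3


def sample_rows(conn, fraction, limit):
    """Return up to `limit` values of col1, keeping each row with
    probability of roughly `fraction`.

    Hypothetical helper. SQLite's random() returns a signed 64-bit integer,
    so its low 31 bits are scaled into [0, 1) to imitate RAND().
    """
    sql = (
        "SELECT col1 FROM tbl "
        "WHERE (random() & 2147483647) / 2147483648.0 <= ? "
        "LIMIT ?"
    )
    return [row[0] for row in conn.execute(sql, (fraction, limit))]


# Made-up table with 10,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col1 INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?)", [(i,) for i in range(10000)])

rows = sample_rows(conn, 0.01, 10)
print(len(rows), rows)
```

One caveat of this design: when the limit is reached before the scan finishes, rows near the start of the table are more likely to be picked than rows near the end, so the fraction should only be slightly larger than strictly needed.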
In order to get exactly 100 rows in the result set, we can increase the fraction a bit and limit the query:
For example:

sqlite: query to add (subtract) cells from adjacent rows and put result in new column

I am examining a .sqlite file in Firefox's SQLite Manager and need to see if any data was not collected. An example is worth a thousand words:
ReadDate ReadValue
1361900350183.00 137
1361899753183.00 139
1361900053183.00 138
There are no primary keys and the table is NOT sorted by ReadDate or time. [Changing the input table is not an option!]
What I'd like to do is produce with simple SQL a table that looks like this:
ReadDate ReadValue TimeOffset
1361899753183.00 139
1361900053183.00 138 300000 // this is ReadDate(1) - ReadDate(0)
1361900350183.00 137 297000 // this is ReadDate(2) - ReadDate(1)
This would allow me to inspect the data and see if any data values were not captured (TimeOffset would be much greater than 300000). I could also write an additional query to get a COUNT of all TimeOffsets beyond a threshold.
I'm having trouble getting going on what I imagine is a simple exercise. I know how to do joins and sorts (order by), but here I need to compare one row to another. Do I need a cursor? And how to get the extra column? I have a gut feeling that if I just knew the vocabulary a little better, I'd be able to come up with the search terms and find the answer quickly.
Many thanks,
Dave
First, add an (empty) column to your table:
ALTER TABLE MyTable ADD COLUMN TimeOffset NUMERIC;
Then, the TimeOffset for each record is the difference between the ReadDate of this record and the ReadDate of the record with the next smaller value, i.e., the record with the largest ReadDate that is still smaller than this one's:
UPDATE MyTable
SET TimeOffset = ReadDate - (SELECT MAX(ReadDate)
FROM MyTable AS t2
WHERE t2.ReadDate < MyTable.ReadDate);
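If the table must not be altered, the same correlated subquery can be used in a plain SELECT instead. This is a sketch under assumptions, not part of the answer above: it uses Python's built-in sqlite3 module, an in-memory copy of the three sample rows, and the aliases t1 and t2 chosen for illustration:

```python
import sqlite3

# In-memory copy of the sample data, deliberately inserted out of order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ReadDate REAL, ReadValue INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1361900350183.0, 137), (1361899753183.0, 139), (1361900053183.0, 138)],
)

# For each row, subtract the largest ReadDate that is still smaller than
# the row's own ReadDate. The earliest row has no predecessor, so it gets NULL.
query = """
SELECT t1.ReadDate,
       t1.ReadValue,
       t1.ReadDate - (SELECT MAX(t2.ReadDate)
                      FROM MyTable AS t2
                      WHERE t2.ReadDate < t1.ReadDate) AS TimeOffset
FROM MyTable AS t1
ORDER BY t1.ReadDate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1361899753183.0, 139, None)
# (1361900053183.0, 138, 300000.0)
# (1361900350183.0, 137, 297000.0)
```

On a large table, the subquery runs once per row, so an index on ReadDate would likely help. Since the table here cannot be changed, that may not be available.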

Sqlite group_concat ordering

In Sqlite I can use group_concat to do:
1...A
1...B
1...C
2...A
2...B
2...C
1...C,B,A
2...C,B,A
but the order of the concatenation is random - according to docs.
I need to sort the output of group_concat to be
1...A,B,C
2...A,B,C
How can I do this?
Can you not use a subselect with the order by clause in, and then group concat the values?
Something like
SELECT ID, GROUP_CONCAT(Val)
FROM (
SELECT ID, Val
FROM YourTable
ORDER BY ID, Val
)
GROUP BY ID;
To be more precise, according to the docs:
The order of the concatenated elements is arbitrary.
It does not really mean random; it just means that the developers reserve the right to use whatever ordering they wish, even different ones for different queries or in different SQLite versions.
With the current version, this ordering might be the one implied by Adrian Stander's answer, as his code does seem to work. So you might just guard yourself with some unit tests and call it a day. But without examining the source code of SQLite really closely you can never be 100% sure this will always work.
If you are willing to build SQLite from source, you can also try to write your own user-defined aggregate function, but there is an easier way.
Fortunately, since version 3.25.0, you have window functions, providing a guaranteed-to-work, although somewhat ugly solution to your problem.
As you can see in the documentation, window functions have their own ORDER BY clauses:
In the example above, the window frame consists of all rows between the previous row ("1 PRECEDING") and the following row ("1 FOLLOWING"), inclusive, where rows are sorted according to the ORDER BY clause in the window-defn (in this case "ORDER BY a").
Note that this alone would not necessarily mean that all aggregate functions respect the ordering inside a window frame, but if you take a look at the unit tests, you can see this is actually the case:
do_execsql_test 4.10.1 {
SELECT a,
count() OVER (ORDER BY a DESC),
group_concat(a, '.') OVER (ORDER BY a DESC)
FROM t2 ORDER BY a DESC
} {
6 1 6
5 2 6.5
4 3 6.5.4
3 4 6.5.4.3
2 5 6.5.4.3.2
1 6 6.5.4.3.2.1
0 7 6.5.4.3.2.1.0
}
So, to sum it up, you can write
SELECT ID, GROUP_CONCAT(Val) OVER (PARTITION BY ID ORDER BY Val) FROM YourTable;
resulting in:
1|A
1|A,B
1|A,B,C
2|A
2|A,B
2|A,B,C
Unfortunately, this also contains every prefix of your desired aggregations. Instead, you want to specify the window frame so that it always contains the full partition, then discard the redundant values, like this:
SELECT DISTINCT ID, GROUP_CONCAT(Val)
OVER (PARTITION BY ID ORDER BY Val ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM YourTable;
or like this:
SELECT * FROM (
SELECT ID, GROUP_CONCAT(Val)
OVER (PARTITION BY ID ORDER BY Val ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM YourTable
)
GROUP BY ID;
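The DISTINCT variant can be checked quickly from Python. This is a minimal sketch assuming the built-in sqlite3 module is linked against SQLite 3.25.0 or newer, with sample rows inserted in reverse order on purpose. The column alias vals and the outer ORDER BY ID are additions made here so the output order is predictable:

```python
import sqlite3

# Sample data from the question, inserted as C, B, A for each ID.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, Val TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(1, "C"), (1, "B"), (1, "A"), (2, "C"), (2, "B"), (2, "A")],
)

# The frame spans the whole partition, so every row of an ID carries the
# same fully concatenated string; DISTINCT collapses them to one row per ID.
query = """
SELECT DISTINCT ID,
       GROUP_CONCAT(Val) OVER (
           PARTITION BY ID ORDER BY Val
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS vals
FROM YourTable
ORDER BY ID
"""
rows = conn.execute(query).fetchall()
print(sqlite3.sqlite_version)  # version of the linked SQLite library
print(rows)                    # [(1, 'A,B,C'), (2, 'A,B,C')]
```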
Stumbling upon the underlying sorting-problem I tried this:
(... on 10.4.18-MariaDB)
select GROUP_CONCAT(ex.ID) as ID_list
FROM (
SELECT usr.ID
FROM (
SELECT u1.ID as ID
FROM table_users u1
) usr
GROUP BY ID
) ex
... and found the serialized ID_list ordered!
But I don't have an explanation for this now "correct" (?) result.
