Efficient paging in SQLite with millions of records - sqlite

I need to show the SQLite results in a list view. Of course, I need to page the results.
The first option is to use the LIMIT clause. For example:
SELECT * FROM Table LIMIT 100, 5000
It returns records 5001 to 5100. The problem is that internally SQLite "reads" the first 5000 records and it is not too efficient.
What is the best approach for paging when there are a lot of records?

Please note that you always have to use an ORDER BY clause; otherwise, the order is arbitrary.
To do efficient paging, save the first/last displayed values of the ordered field(s), and continue just after them when displaying the next page:
SELECT *
FROM MyTable
WHERE SomeColumn > LastValue
ORDER BY SomeColumn
LIMIT 100;
(This is explained with more detail on the SQLite wiki.)
When you have multiple sort columns (and SQLite 3.15 or later), you can use a row value comparison for this:
SELECT *
FROM MyTable
WHERE (SomeColumn, OtherColumn) > (LastSome, LastOther)
ORDER BY SomeColumn, OtherColumn
LIMIT 100;

Related

how to select first 200 rows in oracle without full table scan

I need to search first 200 rows in my database with out full table can. If I scan full table it takes too much time because my table contain 160 million record. I am using oracle 11g.
Do you really need to avoid a FTS in this case as I expect
SELECT * FROM table WHERE ROWNUM <= 200;
runs pretty fast and starts returning results immediately despite a FTS even with a table containing millions of rows.

Sqlite Query Optimization (using Limit and Offset)

Following is the query that I use for getting a fixed number of records from a database with millions of records:-
select * from myTable LIMIT 100 OFFSET 0
What I observed is, if the offset is very high like say 90000, then it takes more time for the query to execute. Following is the time difference between 2 queries with different offsets:
select * from myTable LIMIT 100 OFFSET 0 //Execution Time is less than 1sec
select * from myTable LIMIT 100 OFFSET 95000 //Execution Time is almost 15secs
Can anyone suggest me how to optimize this query? I mean, the Query Execution Time should be same and fast for any number of records I wish to retrieve from any OFFSET.
Newly Added:-
The actual scenario is that I have got a database having > than 1 million records. But since it's an embedded device, I just can't do "select * from myTable" and then fetch all the records from the query. My device crashes. Instead what I do is I keep fetching records batch by batch (batch size = 100 or 1000 records) as per the query mentioned above. But as i mentioned, it becomes slow as the offset increases. So, my ultimate aim is that I want to read all the records from the database. But since I can't fetch all the records in a single execution, I need some other efficient way to achieve this.
As JvdBerg said, indexes are not used in LIMIT/OFFSET.
Simply adding 'ORDER BY indexed_field' will not help too.
To speed up pagination you should avoid LIMIT/OFFSET and use WHERE clause instead. For example, if your primary key field is named 'id' and has no gaps, than your code above can be rewritten like this:
SELECT * FROM myTable WHERE id>=0 AND id<100 //very fast!
SELECT * FROM myTable WHERE id>=95000 AND id<95100 //as fast as previous line!
By doing a query with a offset of 95000, all previous 95000 records are processed. You should make some index on the table, and use that for selecting records.
As #user318750 said, if you know you have a contiguous index, you can simply use
select * from Table where index >= %start and index < %(start+size)
However, those cases are rare. If you don't want to rely on that assumption, use a sub-query, for example using rowid, which is always indexed,
select * from Table where rowid in (
select rowid from Table limit %size offset %start)
This speeds things up especially if you have "fat" rows (e.g. that contain blobs).
If maintaining the record order is important (it usually isn't), you need to order the indices first:
select * from Table where rowid in (
select rowid from Table order by rowid limit %size offset %start)
select * from data where rowid = (select rowid from data limit 1 offset 999999);
With SQLite, you don't need to get all rows returned at once in a big fat array, you can get called back for every row. This way, you can process the results as they come in, which should address both your crashing and performance issues.
I guess you're not using C as you would already be using a callback, but this technique should be available in any other language.
Javascript example (from : https://www.npmjs.com/package/sqlite3 )
db.each("SELECT rowid AS id, info FROM lorem", function(err, row) {
console.log(row.id + ": " + row.info);
});

Efficient way to store chronological rows?

I've got a table like so:
CREATE TABLE IF NOT EXISTS grades(_id, timestamp, extra);
I want to create an index on "timestamp", so I'm doing:
CREATE INDEX idx_timestamp ON grades(timestamp);
I want to select 20 records at a time based off the timestamp then:
SELECT * FROM grades WHERE timestamp > 123 ORDER BY timestamp ASC LIMIT 20;
So, is there a more efficient way I can define the column "timestamp"? I'm just guessing that specifying it as an indexed column is all we can do, and specifying "ASC" for sort order is a no-op - or can I tell sqlite to store records presorted by timestamp in the first place?
I'm basically trying to implement a paging system, selecting a chronologically ordered page of 20 items at a time.
Thanks
This is fine. It will use the index to order, so it will increase the speed. although, depending what you are doing you will may want to cache some records.
SELECT * FROM grades WHERE timestamp > 123 ORDER BY timestamp ASC LIMIT 200;
We can now get 200 at a time, but on we now will let javascript or the server handle the paging. Basically you would keep track of where the paging is and then only when needed hit the database again. Also, by using the more limited WHERE clause for the timestamp, it's actually quite fast and efficient. Better than using the LIMIT N,M.
If you do the second option, you will be able to cache that query as well, depending how often it gets hit. So, if multiple people keeping querying that some thing, the database will cache it and it will come back really fast since it's already there.

Does a multi-column index work for single column selects too?

I've got (for example) an index:
CREATE INDEX someIndex ON orders (customer, date);
Does this index only accelerate queries where customer and date are used or does it accelerate queries for a single-column like this too?
SELECT * FROM orders WHERE customer > 33;
I'm using SQLite.
If the answer is yes, why is it possible to create more than one index per table?
Yet another question: How much faster is a combined index compared with two separat indexes when you use both columns in a query?
marc_s has the correct answer to your first question. The first key in a multi key index can work just like a single key index but any subsequent keys will not.
As for how much faster the composite index is depends on your data and how you structure your index and query, but it is usually significant. The indexes essentially allow Sqlite to do a binary search on the fields.
Using the example you gave if you ran the query:
SELECT * from orders where customer > 33 && date > 99
Sqlite would first get all results using a binary search on the entire table where customer > 33. Then it would do a binary search on only those results looking for date > 99.
If you did the same query with two separate indexes on customer and date, Sqlite would have to binary search the whole table twice, first for the customer and again for the date.
So how much of a speed increase you will see depends on how you structure your index with regard to your query. Ideally, the first field in your index and your query should be the one that eliminates the most possible matches as that will give the greatest speed increase by greatly reducing the amount of work the second search has to do.
For more information see this:
http://www.sqlite.org/optoverview.html
I'm pretty sure this will work, yes - it does in MS SQL Server anyway.
However, this index doesn't help you if you need to select on just the date, e.g. a date range. In that case, you might need to create a second index on just the date to make those queries more efficient.
Marc
I commonly use combined indexes to sort through data I wish to paginate or request "streamily".
Assuming a customer can make more than one order.. and customers 0 through 11 exist and there are several orders per customer all inserted in random order. I want to sort a query based on customer number followed by the date. You should sort the id field as well last to split sets where a customer has several identical dates (even if that may never happen).
sqlite> CREATE INDEX customer_asc_date_asc_index_asc ON orders
(customer ASC, date ASC, id ASC);
Get page 1 of a sorted query (limited to 10 items):
sqlite> SELECT id, customer, date FROM orders
ORDER BY customer ASC, date ASC, id ASC LIMIT 10;
2653|1|1303828585
2520|1|1303828713
2583|1|1303829785
1828|1|1303830446
1756|1|1303830540
1761|1|1303831506
2442|1|1303831705
2523|1|1303833761
2160|1|1303835195
2645|1|1303837524
Get the next page:
sqlite> SELECT id, customer, date FROM orders WHERE
(customer = 1 AND date = 1303837524 and id > 2645) OR
(customer = 1 AND date > 1303837524) OR
(customer > 1)
ORDER BY customer ASC, date ASC, id ASC LIMIT 10;
2515|1|1303837914
2370|1|1303839573
1898|1|1303840317
1546|1|1303842312
1889|1|1303843243
2439|1|1303843699
2167|1|1303849376
1544|1|1303850494
2247|1|1303850869
2108|1|1303853285
And so on...
Having the indexes in place reduces server side index scanning when you would otherwise use a query OFFSET coupled with a LIMIT. The query time gets longer and the drives seek harder the higher the offset goes. Using this method eliminates that.
Using this method is advised if you plan on joining data later but only need a limited set of data per request. Join against a SUBSELECT as described above to reduce memory overhead for large tables.

How to get the number of rows of the selected result from sqlite3?

I want to get the number of selected rows as well as the selected data. At the present I have to use two sql statements:
one is
select * from XXX where XXX;
the other is
select count(*) from XXX where XXX;
Can it be realised with a single sql string?
I've checked the source code of sqlite3, and I found the function of sqlite3_changes(). But the function is only useful when the database is changed (after insert, delete or update).
Can anyone help me with this problem? Thank you very much!
SQL can't mix single-row (counting) and multi-row results (selecting data from your tables). This is a common problem with returning huge amounts of data. Here are some tips how to handle this:
Read the first N rows and tell the user "more than N rows available". Not very precise but often good enough. If you keep the cursor open, you can fetch more data when the user hits the bottom of the view (Google Reader does this)
Instead of selecting the data directly, first copy it into a temporary table. The INSERT statement will return the number of rows copied. Later, you can use the data in the temporary table to display the data. You can add a "row number" to this temporary table to make paging more simple.
Fetch the data in a background thread. This allows the user to use your application while the data grid or table fills with more data.
try this way
select (select count() from XXX) as count, *
from XXX;
select (select COUNT(0)
from xxx t1
where t1.b <= t2.b
) as 'Row Number', b from xxx t2 ORDER BY b;
just try this.
You could combine them into a single statement:
select count(*), * from XXX where XXX
or
select count(*) as MYCOUNT, * from XXX where XXX
To get the number of unique titles, you need to pass the DISTINCT clause to the COUNT function as the following statement:
SELECT
COUNT(DISTINCT column_name)
FROM
'table_name';
Source: http://www.sqlitetutorial.net/sqlite-count-function/
For those who are still looking for another method, the more elegant one I found to get the total of row was to use a CTE.
this ensure that the count is only calculated once :
WITH cnt(total) as (SELECT COUNT(*) from xxx) select * from xxx,cnt
the only drawback is if a WHERE clause is needed, it should be applied in both main query and CTE query.
In the first comment, Alttag said that there is no issue to run 2 queries. I don't agree with that unless both are part of a unique transaction. If not, the source table can be altered between the 2 queries by any INSERT or DELETE from another thread/process. In such case, the count value might be wrong.
Once you already have the select * from XXX results, you can just find the array length in your program right?
If you use sqlite3_get_table instead of prepare/step/finalize you will get all the results at once in an array ("result table"), including the numbers and names of columns, and the number of rows. Then you should free the result with sqlite3_free_table
int rows_count = 0;
while (sqlite3_step(stmt) == SQLITE_ROW)
{
rows_count++;
}
// The rows_count is available for use
sqlite3_reset(stmt); // reset the stmt for use it again
while (sqlite3_step(stmt) == SQLITE_ROW)
{
// your code in the query result
}

Resources