Sqlite Query Optimization (using Limit and Offset) - sqlite

Following is the query that I use for getting a fixed number of records from a database with millions of records:-
select * from myTable LIMIT 100 OFFSET 0
What I observed is, if the offset is very high like say 90000, then it takes more time for the query to execute. Following is the time difference between 2 queries with different offsets:
select * from myTable LIMIT 100 OFFSET 0 //Execution Time is less than 1sec
select * from myTable LIMIT 100 OFFSET 95000 //Execution Time is almost 15secs
Can anyone suggest me how to optimize this query? I mean, the Query Execution Time should be same and fast for any number of records I wish to retrieve from any OFFSET.
Newly Added:-
The actual scenario is that I have got a database having > than 1 million records. But since it's an embedded device, I just can't do "select * from myTable" and then fetch all the records from the query. My device crashes. Instead what I do is I keep fetching records batch by batch (batch size = 100 or 1000 records) as per the query mentioned above. But as i mentioned, it becomes slow as the offset increases. So, my ultimate aim is that I want to read all the records from the database. But since I can't fetch all the records in a single execution, I need some other efficient way to achieve this.

As JvdBerg said, indexes are not used in LIMIT/OFFSET.
Simply adding 'ORDER BY indexed_field' will not help too.
To speed up pagination you should avoid LIMIT/OFFSET and use WHERE clause instead. For example, if your primary key field is named 'id' and has no gaps, than your code above can be rewritten like this:
SELECT * FROM myTable WHERE id>=0 AND id<100 //very fast!
SELECT * FROM myTable WHERE id>=95000 AND id<95100 //as fast as previous line!

By doing a query with a offset of 95000, all previous 95000 records are processed. You should make some index on the table, and use that for selecting records.

As #user318750 said, if you know you have a contiguous index, you can simply use
select * from Table where index >= %start and index < %(start+size)
However, those cases are rare. If you don't want to rely on that assumption, use a sub-query, for example using rowid, which is always indexed,
select * from Table where rowid in (
select rowid from Table limit %size offset %start)
This speeds things up especially if you have "fat" rows (e.g. that contain blobs).
If maintaining the record order is important (it usually isn't), you need to order the indices first:
select * from Table where rowid in (
select rowid from Table order by rowid limit %size offset %start)

select * from data where rowid = (select rowid from data limit 1 offset 999999);

With SQLite, you don't need to get all rows returned at once in a big fat array, you can get called back for every row. This way, you can process the results as they come in, which should address both your crashing and performance issues.
I guess you're not using C as you would already be using a callback, but this technique should be available in any other language.
Javascript example (from : https://www.npmjs.com/package/sqlite3 )
db.each("SELECT rowid AS id, info FROM lorem", function(err, row) {
console.log(row.id + ": " + row.info);
});

Related

Calculating the percentage of dates (SQL Server)

I'm trying to add an auto-calculated field in SQL Server 2012 Express, that stores the % of project completion, by calculating the date difference by using:
ALTER TABLE dbo.projects
ADD PercentageCompleted AS (select COUNT(*) FROM projects WHERE project_finish > project_start) * 100 / COUNT(*)
But I am getting this error:
Msg 1046, Level 15, State 1, Line 2
Subqueries are not allowed in this context. Only scalar expressions are allowed.
What am I doing wrong?
Even if it would be possible (it isn't), it is anyway not something you would want to have as a caculated column:
it will be the same value in each row
the entire table would need to be updated after every insert/update
You should consider doing this in a stored procedure or a user defined function instead.Or even better in the business logic of your application,
I don't think you can do that. You could write a trigger to figure it out or do it as part of an update statement.
Are you storing "percentageCompleted" as a duplicated column value in the same table as your project data?
If this is the case, I would not recommend this, because it would duplicate the data.
If you don't care about duplicate data, try something separating the steps out like this:
ALTER TABLE dbo.projects
ADD PercentageCompleted decimal(2,2) --You could also store it as a varchar or char
declare #percentageVariable decimal(2,2)
select #percentageVariable = (select count(*) from projects where Project_finish > project_start) / (select count(*) from projects) -- need to get ratio by completed/total
update projects
set PercentageCompleted = #percentageVariable
this will give you a decimal value in that table, then you can format it on select if you desire to % + PercentageCompleted * 100

Efficient paging in SQLite with millions of records

I need to show the SQLite results in a list view. Of course, I need to page the results.
The first option is to use the LIMIT clause. For example:
SELECT * FROM Table LIMIT 100, 5000
It returns records 5001 to 5100. The problem is that internally SQLite "reads" the first 5000 records and it is not too efficient.
What is the best approach for paging when there are a lot of records?
Please note that you always have to use an ORDER BY clause; otherwise, the order is arbitrary.
To do efficient paging, save the first/last displayed values of the ordered field(s), and continue just after them when displaying the next page:
SELECT *
FROM MyTable
WHERE SomeColumn > LastValue
ORDER BY SomeColumn
LIMIT 100;
(This is explained with more detail on the SQLite wiki.)
When you have multiple sort columns (and SQLite 3.15 or later), you can use a row value comparison for this:
SELECT *
FROM MyTable
WHERE (SomeColumn, OtherColumn) > (LastSome, LastOther)
ORDER BY SomeColumn, OtherColumn
LIMIT 100;

Does a multi-column index work for single column selects too?

I've got (for example) an index:
CREATE INDEX someIndex ON orders (customer, date);
Does this index only accelerate queries where customer and date are used or does it accelerate queries for a single-column like this too?
SELECT * FROM orders WHERE customer > 33;
I'm using SQLite.
If the answer is yes, why is it possible to create more than one index per table?
Yet another question: How much faster is a combined index compared with two separat indexes when you use both columns in a query?
marc_s has the correct answer to your first question. The first key in a multi key index can work just like a single key index but any subsequent keys will not.
As for how much faster the composite index is depends on your data and how you structure your index and query, but it is usually significant. The indexes essentially allow Sqlite to do a binary search on the fields.
Using the example you gave if you ran the query:
SELECT * from orders where customer > 33 && date > 99
Sqlite would first get all results using a binary search on the entire table where customer > 33. Then it would do a binary search on only those results looking for date > 99.
If you did the same query with two separate indexes on customer and date, Sqlite would have to binary search the whole table twice, first for the customer and again for the date.
So how much of a speed increase you will see depends on how you structure your index with regard to your query. Ideally, the first field in your index and your query should be the one that eliminates the most possible matches as that will give the greatest speed increase by greatly reducing the amount of work the second search has to do.
For more information see this:
http://www.sqlite.org/optoverview.html
I'm pretty sure this will work, yes - it does in MS SQL Server anyway.
However, this index doesn't help you if you need to select on just the date, e.g. a date range. In that case, you might need to create a second index on just the date to make those queries more efficient.
Marc
I commonly use combined indexes to sort through data I wish to paginate or request "streamily".
Assuming a customer can make more than one order.. and customers 0 through 11 exist and there are several orders per customer all inserted in random order. I want to sort a query based on customer number followed by the date. You should sort the id field as well last to split sets where a customer has several identical dates (even if that may never happen).
sqlite> CREATE INDEX customer_asc_date_asc_index_asc ON orders
(customer ASC, date ASC, id ASC);
Get page 1 of a sorted query (limited to 10 items):
sqlite> SELECT id, customer, date FROM orders
ORDER BY customer ASC, date ASC, id ASC LIMIT 10;
2653|1|1303828585
2520|1|1303828713
2583|1|1303829785
1828|1|1303830446
1756|1|1303830540
1761|1|1303831506
2442|1|1303831705
2523|1|1303833761
2160|1|1303835195
2645|1|1303837524
Get the next page:
sqlite> SELECT id, customer, date FROM orders WHERE
(customer = 1 AND date = 1303837524 and id > 2645) OR
(customer = 1 AND date > 1303837524) OR
(customer > 1)
ORDER BY customer ASC, date ASC, id ASC LIMIT 10;
2515|1|1303837914
2370|1|1303839573
1898|1|1303840317
1546|1|1303842312
1889|1|1303843243
2439|1|1303843699
2167|1|1303849376
1544|1|1303850494
2247|1|1303850869
2108|1|1303853285
And so on...
Having the indexes in place reduces server side index scanning when you would otherwise use a query OFFSET coupled with a LIMIT. The query time gets longer and the drives seek harder the higher the offset goes. Using this method eliminates that.
Using this method is advised if you plan on joining data later but only need a limited set of data per request. Join against a SUBSELECT as described above to reduce memory overhead for large tables.

SQLite - getting number of rows in a database

I want to get a number of rows in my table using max(id). When it returns NULL - if there are no rows in the table - I want to return 0. And when there are rows I want to return max(id) + 1.
My rows are being numbered from 0 and autoincreased.
Here is my statement:
SELECT CASE WHEN MAX(id) != NULL THEN (MAX(id) + 1) ELSE 0 END FROM words
But it is always returning me 0. What have I done wrong?
You can query the actual number of rows withSELECT Count(*) FROM tblName
see https://www.w3schools.com/sql/sql_count_avg_sum.asp
If you want to use the MAX(id) instead of the count, after reading the comments from Pax then the following SQL will give you what you want
SELECT COALESCE(MAX(id)+1, 0) FROM words
In SQL, NULL = NULL is false, you usually have to use IS NULL:
SELECT CASE WHEN MAX(id) IS NULL THEN 0 ELSE (MAX(id) + 1) END FROM words
But, if you want the number of rows, you should just use count(id) since your solution will give 10 if your rows are (0,1,3,5,9) where it should give 5.
If you can guarantee you will always ids from 0 to N, max(id)+1 may be faster depending on the index implementation (it may be faster to traverse the right side of a balanced tree rather than traversing the whole tree, counting.
But that's very implementation-specific and I would advise against relying on it, not least because it locks your performance to a specific DBMS.
Not sure if I understand your question, but max(id) won't give you the number of lines at all. For example if you have only one line with id = 13 (let's say you deleted the previous lines), you'll have max(id) = 13 but the number of rows is 1. The correct (and fastest) solution is to use count(). BTW if you wonder why there's a star, it's because you can count lines based on a criteria.
I got same problem if i understand your question correctly, I want to know the last inserted id after every insert performance in SQLite operation. i tried the following statement:
select * from table_name order by id desc limit 1
The id is the first column and primary key of the table_name, the mentioned statement show me the record with the largest id.
But the premise is u never deleted any row so the numbers of id equal to the numbers of rows.
Extension of VolkerK's answer, to make code a little more readable, you can use AS to reference the count, example below:
SELECT COUNT(*) AS c from profile
This makes for much easier reading in some frameworks, for example, i'm using Exponent's (React Native) Sqlite integration, and without the AS statement, the code is pretty ugly.

How to get the number of rows of the selected result from sqlite3?

I want to get the number of selected rows as well as the selected data. At the present I have to use two sql statements:
one is
select * from XXX where XXX;
the other is
select count(*) from XXX where XXX;
Can it be realised with a single sql string?
I've checked the source code of sqlite3, and I found the function of sqlite3_changes(). But the function is only useful when the database is changed (after insert, delete or update).
Can anyone help me with this problem? Thank you very much!
SQL can't mix single-row (counting) and multi-row results (selecting data from your tables). This is a common problem with returning huge amounts of data. Here are some tips how to handle this:
Read the first N rows and tell the user "more than N rows available". Not very precise but often good enough. If you keep the cursor open, you can fetch more data when the user hits the bottom of the view (Google Reader does this)
Instead of selecting the data directly, first copy it into a temporary table. The INSERT statement will return the number of rows copied. Later, you can use the data in the temporary table to display the data. You can add a "row number" to this temporary table to make paging more simple.
Fetch the data in a background thread. This allows the user to use your application while the data grid or table fills with more data.
try this way
select (select count() from XXX) as count, *
from XXX;
select (select COUNT(0)
from xxx t1
where t1.b <= t2.b
) as 'Row Number', b from xxx t2 ORDER BY b;
just try this.
You could combine them into a single statement:
select count(*), * from XXX where XXX
or
select count(*) as MYCOUNT, * from XXX where XXX
To get the number of unique titles, you need to pass the DISTINCT clause to the COUNT function as the following statement:
SELECT
COUNT(DISTINCT column_name)
FROM
'table_name';
Source: http://www.sqlitetutorial.net/sqlite-count-function/
For those who are still looking for another method, the more elegant one I found to get the total of row was to use a CTE.
this ensure that the count is only calculated once :
WITH cnt(total) as (SELECT COUNT(*) from xxx) select * from xxx,cnt
the only drawback is if a WHERE clause is needed, it should be applied in both main query and CTE query.
In the first comment, Alttag said that there is no issue to run 2 queries. I don't agree with that unless both are part of a unique transaction. If not, the source table can be altered between the 2 queries by any INSERT or DELETE from another thread/process. In such case, the count value might be wrong.
Once you already have the select * from XXX results, you can just find the array length in your program right?
If you use sqlite3_get_table instead of prepare/step/finalize you will get all the results at once in an array ("result table"), including the numbers and names of columns, and the number of rows. Then you should free the result with sqlite3_free_table
int rows_count = 0;
while (sqlite3_step(stmt) == SQLITE_ROW)
{
rows_count++;
}
// The rows_count is available for use
sqlite3_reset(stmt); // reset the stmt for use it again
while (sqlite3_step(stmt) == SQLITE_ROW)
{
// your code in the query result
}

Resources