Comparison operators behave differently after indexing - sqlite

With this schema:
CREATE TABLE temperatures(
sometext TEXT,
lowtemp INT,
hightemp INT,
moretext TEXT);
When I run this search:
select * from temperatures where lowtemp < 20 and hightemp > 20;
I get the correct result which is always one record (due to the specifics of the data).
Now, when I index the table:
CREATE INDEX ltemps ON temperatures(lowtemp);
CREATE INDEX htemps ON temperatures(hightemp);
The exact same query above stops providing expected results -- now I get many records, including ones where the lowtemp and hightemp obviously don't meet the comparison test.
I'm running this on the same sqlite3 database, same table. The only difference is adding the above 2 index statements after table creation.
Can someone explain how indexing influences this behavior?

Related

Best index to cover a mix of exact match and less/greater than query in sqlite

I have a table which I need to filter on (using sqlite), it has 3 fields in the query:
WHERE x <= 'something' AND y = 'something' AND z = 'SOMETHING ELSE'
ORDER BY x DESC
I was wondering what's the best index to cover this query.
I have tried a few, for example:
CREATE INDEX idx_x_y_z ON user_messages(
x, y, z
);
CREATE INDEX idx_y_z ON user_messages(
y, z
);
but the best I can get is:
SEARCH TABLE table USING INDEX idx_y_z
USE TEMP B-TREE FOR ORDER BY
Is that optimal, or can I avoid the USE TEMP B-TREE FOR ORDER BY?
From reading https://explainextended.com/2009/04/01/choosing-index/ that seems to be the case, but since my query is slightly different (we order by a field we filter on), I was wondering whether it might not be exactly the same.
Also, I am struggling to find good resources on this. A lot of the material addresses the most common scenarios, while more in-depth resources are harder to find. Do you have any suggestions?
Thanks!
UPDATE:
Turns out there was another issue: I had oversimplified the schema in the original question.
One of the fields had a type of BOOLEAN, and I was matching it by using the IS FALSE operator, which would return the right number of rows, while = 0 would not for some reason.
When querying with = it would not use a TEMP B-TREE, while it would when using IS FALSE.
To address this issue I just created an index excluding the BOOLEAN field, and the TEMP B-TREE was no longer used, only SEARCH TABLE.
From the query planner documentation:
Then the index might be used if the initial columns of the index (columns a, b, and so forth) appear in WHERE clause terms. The initial columns of the index must be used with the = or IN or IS operators. The right-most column that is used can employ inequalities.
So since your WHERE has two exact comparisons and one less than or equal, that one should come last in an index for best effect:
CREATE INDEX idx_y_z_x ON user_messages(y, z, x);
Using that index with a query with your WHERE terms:
sqlite> EXPLAIN QUERY PLAN SELECT * FROM user_messages
...> WHERE x <= 'something' AND y = 'something' AND z = 'something'
...> ORDER BY x DESC;
QUERY PLAN
`--SEARCH TABLE user_messages USING INDEX idx_y_z_x (y=? AND z=? AND x<?)
As you can see, it fully uses the index, with no temporary table needed for sorting the results.
More, essential, reading about how sqlite uses indexes can be found here.
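To check the column-order rule yourself, here is a minimal sketch using Python's built-in sqlite3 module. The table definition is an assumption (the real user_messages schema is not shown in the question, and the body column is made up); only the index and the query come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified schema: the question never shows the real table.
conn.execute("CREATE TABLE user_messages(x TEXT, y TEXT, z TEXT, body TEXT)")

# Equality columns (y, z) first, the inequality column (x) last.
conn.execute("CREATE INDEX idx_y_z_x ON user_messages(y, z, x)")

query = """
    SELECT * FROM user_messages
    WHERE x <= ? AND y = ? AND z = ?
    ORDER BY x DESC
"""

plan_rows = conn.execute("EXPLAIN QUERY PLAN " + query, ("m", "a", "b")).fetchall()

# The last column of every plan row is the human-readable detail text.
plan = "\n".join(row[-1] for row in plan_rows)
print(plan)
```

The exact wording of the plan differs between SQLite versions (older ones print SEARCH TABLE, newer ones just SEARCH), so it is safer to look for the index name and for the absence of a TEMP B-TREE step than to compare the whole string.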

Why is SQLite query on two indexed columns so slow?

I have a table with around 65 million rows that I'm trying to run a simple query on. The table and indexes look like this:
CREATE TABLE E(
x INTEGER,
t INTEGER,
e TEXT,
A,B,C,D,E,F,G,H,I,
PRIMARY KEY(x,t,e,I)
);
CREATE INDEX ET ON E(t);
CREATE INDEX EE ON E(e);
The query I'm running looks like this:
SELECT MAX(t), B, C FROM E WHERE e='G' AND t <= 9878901234;
I need to run this query for thousands of different values of t and was expecting each query to run in a fraction of a second. However, the above query is taking nearly 10 seconds to run!
I tried running the query plan but only get this:
0|0|0|SEARCH TABLE E USING INDEX EE (e=?)
So this should be using the index. With a binary search I would expect at worst only 26 tests, which would be pretty quick.
Why is my query so slow?
Each table in a query can use one index. Since your WHERE clause looks at multiple columns, you can use a multi-column index. For these, all but the last column used from the index have to be tested for equality; the last one used can be tested with greater than/less than.
So:
CREATE INDEX e_idx_e_t ON E(e, t);
should give you a boost.
For further reading about how Sqlite uses indexes, the Query Planner documentation is a good introduction.
You're also mixing an aggregate function (max(t)) and columns (B and C) that aren't part of a group. In Sqlite's case, this means that it will pick values for B and C from the row with the maximum t value; other databases usually throw an error.
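Both points can be tried out on a small scale. The sketch below uses Python's sqlite3 module with a reduced, hypothetical version of the table (only the columns the query touches, and four made-up rows), so it says nothing about timing on 65 million rows; it only shows which index the planner picks and where B and C come from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced, hypothetical version of table E: just the columns the query needs.
conn.execute(
    "CREATE TABLE E(x INTEGER, t INTEGER, e TEXT, B, C, PRIMARY KEY(x, t, e))"
)
conn.execute("CREATE INDEX ET ON E(t)")
conn.execute("CREATE INDEX EE ON E(e)")
# The suggested composite index: equality column first, range column last.
conn.execute("CREATE INDEX e_idx_e_t ON E(e, t)")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO E VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, "G", "b1", "c1"),
        (2, 200, "G", "b2", "c2"),
        (3, 300, "G", "b3", "c3"),
        (4, 250, "H", "b4", "c4"),
    ],
)

query = "SELECT MAX(t), B, C FROM E WHERE e = 'G' AND t <= ?"

# The last column of every plan row is the human-readable detail text.
plan = "\n".join(
    row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (250,))
)
print(plan)

# B and C are bare columns: SQLite takes them from the row holding MAX(t).
result = conn.execute(query, (250,)).fetchone()
print(result)
```

With the cutoff 250, the largest matching t among the 'G' rows is 200, so B and C are taken from that same row.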

How to Query for Recent Rows in SQLITE3

I'm using SQLite3 and trying to query for recent rows. So I'm having SQLite3 insert a unix timestamp into each row with strftime('%s','now'). My Table looks like this:
CREATE TABLE test(id INTEGER PRIMARY KEY, time);
INSERT INTO test (time) VALUES (strftime('%s','now')); --Repeated
SELECT * FROM test;
1|1516816522
2|1516816634
3|1516816646 --etc lots of rows
Now I want to query for only recent entries, for example, I'm trying to get all rows with a time within the last hour. I'm trying the following SQL query:
SELECT * FROM test WHERE time > strftime('%s','now')-60*60;
However, that always returns all rows regardless of the value in the time column. I really don't know what's going on.
Also, if I put WHERE time > strftime('%s','now') it'll return nothing (which is expected) but if I put WHERE time > strftime('%s','now')-1 then it'll return everything. I don't know why.
Here's one more example:
sqlite> SELECT *, strftime('%s','now')-1 AS window FROM test WHERE time > window;
1|1516816522|1516817482
2|1516816634|1516817482
3|1516816646|1516817482
It seems that SQLite3 thinks the values in the middle column are greater than the values in the right column!?
This isn't at all what I expect. Can someone please tell me what's going on? Thanks!
The purpose of strftime() is to format values, so it returns a string.
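When you do computations with its return value, the database must convert it into a number, so strftime('%s','now')-60*60 is a number, while the values stored in your untyped time column are still strings. Numbers and strings are not converted for such a comparison; in SQLite, a string always compares as greater than a number, which is why every row matches.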
You must ensure that both values in a comparison have the same data type.
The best way to do this is to store numbers in the table:
INSERT INTO test (time)
VALUES (CAST(strftime('%s','now') AS MAKE_THIS_A_NUMBER_PLEASE));
(Or just declare the column type as something with numeric affinity.)
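The effect is easy to reproduce. The following sketch uses Python's sqlite3 module with the table from the question; the second table, test2, is a hypothetical variant with a declared INTEGER column, added only to illustrate the remark about numeric affinity. The threshold is deliberately one hour in the future, so a correct comparison should match nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table as in the question: the time column has no declared type.
conn.execute("CREATE TABLE test(id INTEGER PRIMARY KEY, time)")
conn.execute("INSERT INTO test (time) VALUES (strftime('%s','now'))")

# strftime() returns text, and an untyped column stores it unchanged.
stored_type = conn.execute("SELECT typeof(time) FROM test").fetchone()[0]

# Text on the left, number on the right: the text value always wins.
buggy_matches = conn.execute(
    "SELECT count(*) FROM test WHERE time > strftime('%s','now') + 3600"
).fetchone()[0]

# Casting the stored value makes it a number-to-number comparison.
fixed_matches = conn.execute(
    "SELECT count(*) FROM test "
    "WHERE CAST(time AS INTEGER) > strftime('%s','now') + 3600"
).fetchone()[0]

# Hypothetical variant: a declared INTEGER column converts the text on insert.
conn.execute("CREATE TABLE test2(id INTEGER PRIMARY KEY, time INTEGER)")
conn.execute("INSERT INTO test2 (time) VALUES (strftime('%s','now'))")
typed_type = conn.execute("SELECT typeof(time) FROM test2").fetchone()[0]

print(stored_type, buggy_matches, fixed_matches, typed_type)
```

Casting in the WHERE clause works for rows that are already stored as text, but storing real numbers in the first place keeps every later query simple.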

sqlite: query to add (subtract) cells from adjacent rows and put result in new column

I am examining a .sqlite file in Firefox's SQLite Manager and need to see if any data was not collected. An example is worth a thousand words:
ReadDate ReadValue
1361900350183.00 137
1361899753183.00 139
1361900053183.00 138
There are no primary keys and the table is NOT sorted by ReadDate or time. [Changing the input table is not an option!]
What I'd like to do is produce with simple SQL a table that looks like this:
ReadDate ReadValue TimeOffset
1361899753183.00 139
1361900053183.00 138 300000 // this is ReadDate(1) - ReadDate(0)
1361900350183.00 137 297000 // this is ReadDate(2) - ReadDate(1)
This would allow me to inspect the data and see if any data values were not captured (TimeOffset would be much greater than 300000). I could also write an additional query to get a COUNT of all TimeOffsets beyond a threshold.
I'm having trouble getting going on what I imagine is a simple exercise. I know how to do joins and sorts (order by), but here I need to compare one row to another. Do I need a cursor? And how to get the extra column? I have a gut feeling that if I just knew the vocabulary a little better, I'd be able to come up with the search terms and find the answer quickly.
Many thanks,
Dave
First, add an (empty) column to your table:
ALTER TABLE MyTable ADD COLUMN TimeOffset NUMERIC;
Then, the TimeOffset for each record is the difference between the ReadDate column of this record and of the record with the next smaller ReadDate, i.e., the record with the largest ReadDate that is still smaller than this one's:
UPDATE MyTable
SET TimeOffset = ReadDate - (SELECT MAX(ReadDate)
FROM MyTable AS t2
WHERE t2.ReadDate < MyTable.ReadDate);
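Since the question says the input table must not be changed, the same correlated subquery can also be used in a plain SELECT, which leaves the table untouched. This is a variation on the UPDATE above, not the answer's own statement. The sketch below runs it through Python's sqlite3 module on the three sample rows from the question; the alias t1 is an assumption introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable(ReadDate, ReadValue)")

# The three sample rows from the question, in their original (unsorted) order.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [
        (1361900350183.00, 137),
        (1361899753183.00, 139),
        (1361900053183.00, 138),
    ],
)

# Same subquery as in the UPDATE, but computed on the fly for each row.
rows = conn.execute(
    """
    SELECT ReadDate,
           ReadValue,
           ReadDate - (SELECT MAX(ReadDate)
                       FROM MyTable AS t2
                       WHERE t2.ReadDate < t1.ReadDate) AS TimeOffset
    FROM MyTable AS t1
    ORDER BY ReadDate
    """
).fetchall()

offsets = [row[2] for row in rows]
for row in rows:
    print(row)
```

The earliest row has no predecessor, so its TimeOffset is NULL (None in Python). Wrapping the query in an outer SELECT with a WHERE on TimeOffset would give the count of gaps beyond a threshold.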

SQLite - Update with random unique value

I am trying to populate every row in a column with a random value ranging from 0 to the row count.
So far I have this
UPDATE table
SET column = ABS (RANDOM() % (SELECT COUNT(id) FROM table))
This does the job but produces duplicate values, which turned out to be bad. I added a Unique constraint but that just causes it to crash.
Is there a way to update a column with random unique values from certain range?
Thanks!
If you want to later read the records in a random order, you can just do the ordering at that time:
SELECT * FROM MyTable ORDER BY random()
(This will not work if you need the same order in multiple queries.)
Otherwise, you can use a temporary table to store the random mapping between the rowids of your table and the numbers 1..N.
(Those numbers are automatically generated by the rowids of the temporary table.)
CREATE TEMP TABLE MyOrder AS
SELECT rowid AS original_rowid
FROM MyTable
ORDER BY random();
UPDATE MyTable
SET MyColumn = (SELECT rowid
FROM MyOrder
WHERE original_rowid = MyTable.rowid) - 1;
DROP TABLE MyOrder;
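Here is a small end-to-end run of those statements through Python's sqlite3 module. The ten-row MyTable and its id column are made up for the example; the three SQL statements are the ones shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical ten-row table; MyColumn starts out as NULL everywhere.
conn.execute("CREATE TABLE MyTable(id INTEGER PRIMARY KEY, MyColumn INTEGER)")
conn.executemany("INSERT INTO MyTable(id) VALUES (?)", [(i,) for i in range(1, 11)])

conn.executescript(
    """
    -- Rows land in MyOrder in random order; its rowids 1..N record that order.
    CREATE TEMP TABLE MyOrder AS
        SELECT rowid AS original_rowid
        FROM MyTable
        ORDER BY random();

    -- Each row looks up its position in the shuffled list (minus 1: 0..N-1).
    UPDATE MyTable
    SET MyColumn = (SELECT rowid
                    FROM MyOrder
                    WHERE original_rowid = MyTable.rowid) - 1;

    DROP TABLE MyOrder;
    """
)

values = [row[0] for row in conn.execute("SELECT MyColumn FROM MyTable ORDER BY id")]
print(values)
```

Every run prints a different ordering, but the values are always exactly 0 to N-1 with no duplicates, because each one is a distinct rowid of the temporary table.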
What you seem to be seeking is not simply a set of random numbers, but rather a random permutation of the numbers 1..N. This is harder to do. If you look in Knuth (The Art of Computer Programming), or in Bentley (Programming Pearls or More Programming Pearls), one suggested way is to create an array with the values 1..N, and then for each position, swap the current value with a randomly selected other value from the array. (I'd need to dig out the books to check whether it is any arbitrary position in the array, or only with a value following it in the array.) In your context, then you apply this permutation to the rows in the table under some ordering, so row 1 under the ordering gets the value in the array at position 1 (using 1-based indexing), etc.
In the 1st Edition of Programming Pearls, Column 11 Searching, Bentley says:
Knuth's Algorithm P in Section 3.4.2 shuffles the array X[1..N].
for I := 1 to N do
Swap(X[I], X[RandInt(I,N)])
where the RandInt(n,m) function returns a random integer in the range [n..m] (inclusive). That's nothing if not succinct.
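For reference, the two-line loop translates almost directly into Python. This is only a sketch of the quoted pseudocode, using 0-based indexing instead of 1-based; the function name algorithm_p and the optional rng parameter are made up for the example.

```python
import random


def algorithm_p(items, rng=random):
    """Shuffle items in place as in the quoted loop and return the list."""
    n = len(items)
    for i in range(n):
        # RandInt(I, N): pick from the current position up to the end, inclusive.
        j = rng.randint(i, n - 1)
        items[i], items[j] = items[j], items[i]
    return items


# A seeded generator makes the run repeatable; the seed value is arbitrary.
shuffled = algorithm_p(list(range(1, 11)), random.Random(42))
print(shuffled)
```

Note that the swap partner is drawn only from the current position onwards, not from the whole array. In everyday Python code, random.shuffle does the same job.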
The alternative is to have your code thrashing around when there is one value left to update, waiting until the random number generator picks the one value that hasn't been used yet. As a hit and miss process, that can take a while, especially if the number of rows in total is large.
Actually translating that into SQLite is a separate exercise. How big is your table? Is there a convenient unique key on it (other than the one you're randomizing)?
Given that you have a primary key, you can easily generate an array of structures such that each primary key is allocated a number in the range 1..N. You then use Algorithm P to permute the numbers. Then you can update the table from the primary keys with the appropriate randomized number. You might be able to do it all with a second (temporary) table in SQL, especially if SQLite supports UPDATE statements with a join between two tables. But it is probably nearly as simple to use the array to drive singleton updates. You'd probably not want a unique constraint on the random number column while this update is in progress.
