sqlite table:
CREATE TABLE IF NOT EXISTS INFO
(
uri TEXT PRIMARY KEY,
cap INTEGER,
/* some other columns */
uid TEXT
);
The INFO table has 5K+ rows and runs on a low-power device (comparable to a three-year-old mobile phone).
I have this task: insert a new URI with some values into the INFO table. However, if the URI is already present, I need to update the uid text field by appending extra text to it, but only if that extra text isn't already found within the existing uid string; all other fields should remain unchanged.
As an example: INFO already has uri="http://example.com" with this uid string: "|123||abc||xxx|".
I need to add uri="http://example.com" with uid="|abc|". Since "|abc|" is a substring of the existing field for that uri, nothing should be updated. In any case, the remaining fields shouldn't be touched.
To get it done I have these options:
build a single SQL query (if something like that is even possible with sqlite in one statement), or
do everything manually in two steps: a) retrieve the row for the uri and do all the processing in code, then b) update the existing row or insert a new one as needed.
Considering this is a constrained device, which way is preferable? And what if I drop the extra requirement of the substring match and always append to the existing uid field?
"If it is possible with sqlite in one sql statement":
Yes, it is possible. The "UPSERT" statement has been nicely discussed in this question.
Applied to your extended case, you could do it like this in one statement:
insert or replace into info (uri, cap, uid)
values ( 'http://example.com',
coalesce((select cap from info where uri = 'http://example.com'),'new cap'),
(select case
when (select uid
from info
where uri = 'http://example.com') is null
then '|abc|'
when instr( (select uid
from info
where uri = 'http://example.com'),'|abc|') = 0
then (select uid
from info
where uri = 'http://example.com') || '|abc|'
else (select uid
from info
where uri = 'http://example.com')
end )
);
Checking the EXPLAIN QUERY PLAN gives us
selectid order from detail
---------- ---------- ---------- -------------------------
0 0 0 EXECUTE SCALAR SUBQUERY 0
0 0 0 SEARCH TABLE info USING INDEX sqlite_autoindex_INFO_1 (uri=?)
0 0 0 EXECUTE SCALAR SUBQUERY 1
1 0 0 EXECUTE SCALAR SUBQUERY 2
2 0 0 SEARCH TABLE info USING INDEX sqlite_autoindex_INFO_1 (uri=?)
1 0 0 EXECUTE SCALAR SUBQUERY 3
3 0 0 SEARCH TABLE info USING INDEX sqlite_autoindex_INFO_1 (uri=?)
1 0 0 EXECUTE SCALAR SUBQUERY 4
4 0 0 SEARCH TABLE info USING INDEX sqlite_autoindex_INFO_1 (uri=?)
1 0 0 EXECUTE SCALAR SUBQUERY 5
5 0 0 SEARCH TABLE info USING INDEX sqlite_autoindex_INFO_1 (uri=?)
As far as I know, sqlite will not cache the results of scalar sub-queries (I could not find any evidence of caching when looking at the VM code produced by EXPLAIN for the above statement). Hence, since sqlite is an in-process db, doing things with two separate statements will most likely perform better than this.
You might want to benchmark the runtimes for this - results will of course depend on your host language and the interface you use (C, JDBC, ODBC, ...).
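For concreteness, the two-statement alternative might be sketched like this in Python's sqlite3 (the helper name add_uri is just an illustration, not something from the question):

```python
import sqlite3

def add_uri(conn, uri, cap, extra):
    """Insert uri, or append `extra` to its uid if not already a substring."""
    # Step 1: read the current uid for this uri, if any.
    row = conn.execute("SELECT uid FROM info WHERE uri = ?", (uri,)).fetchone()
    if row is None:
        # Step 2a: no existing row -- plain insert.
        conn.execute("INSERT INTO info (uri, cap, uid) VALUES (?, ?, ?)",
                     (uri, cap, extra))
    elif extra not in row[0]:
        # Step 2b: append only when the fragment is not already present;
        # cap and the other columns are left untouched.
        conn.execute("UPDATE info SET uid = ? WHERE uri = ?",
                     (row[0] + extra, uri))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE info (uri TEXT PRIMARY KEY, cap INTEGER, uid TEXT)")
add_uri(conn, "http://example.com", 7, "|123||abc||xxx|")
add_uri(conn, "http://example.com", 99, "|abc|")   # already a substring: no-op
add_uri(conn, "http://example.com", 99, "|zzz|")   # not present: appended
print(conn.execute("SELECT cap, uid FROM info").fetchone())
# (7, '|123||abc||xxx||zzz|')
```

If several writers may touch the same uri, wrap the SELECT and the write in one transaction so the read-modify-write is not racy.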
EDIT
A little performance benchmark using the JDBC driver and sqlite 3.7.2, running 100,000 modifications on a base data set of 5,000 rows in table info (50% updates, 50% inserts), confirms the above conclusion:
Using three prepared statements (first a select, then followed by an update or insert, depending on the selected data): 702ms
Using the above combined statement: 1802ms
The runtimes are quite stable across several runs.
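As a footnote for readers on newer versions: SQLite 3.24 (2018) introduced a native UPSERT clause that expresses the same logic in one statement with a single index lookup. A sketch using Python's sqlite3, assuming the linked library is 3.24 or later:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE info (uri TEXT PRIMARY KEY, cap INTEGER, uid TEXT)")

# Native UPSERT (SQLite 3.24+): one index lookup, no repeated scalar subqueries.
UPSERT = """
    INSERT INTO info (uri, cap, uid) VALUES (?, ?, ?)
    ON CONFLICT(uri) DO UPDATE SET
        uid = CASE WHEN instr(info.uid, excluded.uid) > 0
                   THEN info.uid                     -- fragment already present
                   ELSE info.uid || excluded.uid     -- append the fragment
              END
    -- cap is deliberately absent from SET, so it stays unchanged on conflict
"""
conn.execute(UPSERT, ("http://example.com", 7, "|123||abc||xxx|"))  # insert
conn.execute(UPSERT, ("http://example.com", 99, "|abc|"))  # substring: no change
conn.execute(UPSERT, ("http://example.com", 99, "|zzz|"))  # appended
row = conn.execute("SELECT cap, uid FROM info WHERE uri = ?",
                   ("http://example.com",)).fetchone()
print(row)  # (7, '|123||abc||xxx||zzz|')
```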
Related
I just started working with PL/SQL. The database is for a game I want to integrate into my discord bot. Both the DB and the bot are running on Oracle Cloud.
The DB has one table, players, consisting of a Discord user id, a level initialized to 1, exp initialized to 0, and mana initialized to 100. The first thing I wanted to implement was a TRIGGER that activates when exp is updated on the players table, checks whether exp has reached the level-up threshold, and if so resets exp to 0 and increases the level by 1.
Right now, I have this:
CREATE OR REPLACE trigger level_up
BEFORE UPDATE
ON PLAYERS
FOR EACH ROW
WHEN (:new.EXP >= PLAYERS.LEVEL * 100)
begin
:new.EXP := 0;
:new.LEVEL := :old.LEVEL + 1;
end;
When I try to run this, I get the following error:
"Error: ORA-00904: : invalid identifier"
Nothing is highlighted and in SQL Developer, when I right-click it and then click "Go to source", it doesn't highlight anything and just throws the cursor to the beginning of the worksheet.
I have already tried a couple different things like
BEFORE UPDATE OF EXP ON PLAYERS
with the rest more or less the same and even tried working with AFTER UPDATE:
CREATE OR REPLACE trigger level_up
AFTER UPDATE
ON PLAYERS
FOR EACH ROW
begin
UPDATE players
SET players.exp = 0,
players.level = players.level + 1
WHERE players.exp < players.level * 100
end;
This gave me multiple errors though:
Error(6,9): PL/SQL: SQL Statement ignored
Error(10,5): PL/SQL: ORA-00933: SQL command not properly ended
Error(10,8): PLS-00103: Encountered the symbol "end-of-file" when expecting one of the following: ( begin case declare end exception exit for goto if loop mod null pragma raise return select update while with << continue close current delete fetch lock insert open rollback savepoint set sql execute commit forall merge pipe purge json_exists json_value json_query json_object json_array
At this point I am fully prepared to just abandon the Oracle DB and switch to MongoDB or something; it's just bugging me that I can't figure out what I am doing wrong.
Thank you for your time!
If the first trigger's logic is what you want, then just fix what's wrong - why abandon the whole idea of using Oracle? It's not Oracle's fault that you don't know how to use it yet.
Here's an example.
Table contains just two columns, necessary for this situation.
A column can't be named just level - that's the name of a pseudocolumn in hierarchical queries and a reserved word for that purpose, so I renamed it to c_level. (Well, you can name it level, but it would then have to be enclosed in double quotes and referenced that way - with double quotes and exactly the same letter case - every time you access it. In Oracle, that's generally a bad idea.)
SQL> create table players
2 (exp number,
3 c_level number
4 );
Table created.
Trigger: you should remove the colon when referencing the new pseudorecord in the WHEN clause (while in the trigger body you do have to use the colon). Also, you don't reference c_level through the table name (players.c_level) but through the pseudorecord as well.
SQL> create or replace trigger level_up
2 before update on players
3 for each row
4 when (new.exp >= new.c_level * 100)
5 begin
6 :new.exp := 0;
7 :new.c_level := :old.c_level + 1;
8 end;
9 /
Trigger created.
Let's try it. Initial row:
SQL> insert into players (exp, c_level) values (50, 1);
1 row created.
SQL> select * from players;
EXP C_LEVEL
---------- ----------
50 1
Let's update exp so that its value forces trigger to fire:
SQL> update players set exp = 101;
1 row updated.
New table contents:
SQL> select * from players;
EXP C_LEVEL
---------- ----------
0 2
SQL>
If that was your intention, then it kind of works.
When a sequence is used for the first time in an INSERT statement, it starts from 2, although with no START WITH defined it should start from 1.
CREATE TABLE ORA(ID NUMBER);
SELECT * FROM ORA;
CREATE SEQUENCE SEQ_ORA;
INSERT INTO ORA VALUES(SEQ_ORA.NEXTVAL);
SELECT * FROM ORA;
INSERT INTO ORA VALUES(SEQ_ORA.CURRVAL);
SELECT * FROM ORA;
--HERE IN USER IT START FROM 2 AND FROM DBA START FROM 1
--DROP TABLE ORA;
--DROP SEQUENCE SEQ_ORA;
This was the sort-of expected behaviour in Oracle 11.2.0.1 with deferred segment creation.
If you have access to Oracle support you can look at document 1050193.1, which shows the same thing happening.
You can change your table creation statement to:
CREATE TABLE ORA(ID NUMBER) SEGMENT CREATION IMMEDIATE
You can also change the default behaviour via the deferred_segment_creation system parameter.
In a comment in your example code you said:
HERE IN USER IT START FROM 2 AND FROM DBA START FROM 1
I assume that 'DBA' means you tested this in the SYS schema (which generally isn't a good idea). Deferred segment creation doesn't apply there; from the documentation:
Restrictions on Deferred Segment Creation
This clause is subject to the following restrictions:
You cannot defer segment creation for the following types of tables: index-organized tables, clustered tables, global temporary tables, session-specific temporary tables, internal tables, typed tables, AQ tables, external tables, and tables owned by SYS, SYSTEM, PUBLIC, OUTLN, or XDB.
When I make an index in sqlite, it optimizes queries for that key. But sometimes I want to find records that exclude certain values of that key, and my index doesn't seem to be used to optimize those, at all. I can't figure out how to write an index to optimize a query that matches NOT on a key.
This is an example of what I'm talking about:
CREATE TABLE place (
id INTEGER PRIMARY KEY,
thing INTEGER NOT NULL,
thekey INTEGER NOT NULL,
unique(thing,thekey));
CREATE INDEX bythekeys ON place(thekey);
EXPLAIN QUERY PLAN SELECT thing FROM place WHERE NOT thekey = ?;
-- => 0 0 0 SCAN TABLE place
EXPLAIN QUERY PLAN SELECT thing FROM place WHERE thekey = ?;
-- => 0 0 0 SEARCH TABLE place USING INDEX bythekeys (thekey=?)
CREATE INDEX bythenotkeys ON ........?
The "bythekeys" index optimizes queries that look up records by that key, unless the query uses the logical negation of that lookup. Whether I use NOT or !=, it doesn't seem to make a difference: it always just scans the whole table without using any index. Do I make, like... a partial index or something? How do I optimize NOT queries?
The database assumes that a column contains many different values.
So a != filter is estimated not to reduce the number of results enough to make the additional index lookups worthwhile.
If you actually have very few values, rewrite the query so that it uses equality (=), or a range (<, >).
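For example, if thekey only takes a handful of known values, a "thekey != 3" filter can be rewritten as two range conditions, each of which the bythekeys index can serve. A sketch with made-up sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE place (
        id INTEGER PRIMARY KEY,
        thing INTEGER NOT NULL,
        thekey INTEGER NOT NULL,
        UNIQUE(thing, thekey));
    CREATE INDEX bythekeys ON place(thekey);
""")
# 20 rows with thekey cycling through 0..4.
conn.executemany("INSERT INTO place (thing, thekey) VALUES (?, ?)",
                 [(i, i % 5) for i in range(20)])

# "thekey != 3" rewritten as two ranges, each an index-friendly lookup.
rows = conn.execute("""
    SELECT thing FROM place WHERE thekey < 3
    UNION ALL
    SELECT thing FROM place WHERE thekey > 3
""").fetchall()
print(len(rows))  # 16 -- of 20 rows, the 4 with thekey = 3 are excluded
```

Whether the planner actually picks the index still depends on its selectivity estimates, but unlike the != form, the range form is at least eligible.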
Background
I'm implementing full-text search over a body of email messages stored in SQLite, making use of its fantastic built-in FTS4 engine. I'm getting some rather poor query performance, although not exactly where I would expect. Let's take a look.
Representative schema
I'll give some simplified examples of the code in question, with links to the full code where applicable.
We've got a MessageTable that stores the data about an email message (full version spread out over several files here, here, and here):
CREATE TABLE MessageTable (
id INTEGER PRIMARY KEY,
internaldate_time_t INTEGER
);
CREATE INDEX MessageTableInternalDateTimeTIndex
ON MessageTable(internaldate_time_t);
The searchable text is added to an FTS4 table named MessageSearchTable (full version here):
CREATE VIRTUAL TABLE MessageSearchTable USING fts4(
id INTEGER PRIMARY KEY,
body
);
The id in the search table acts as a foreign key to the message table.
I'll leave it as an exercise for the reader to insert data into these tables (I certainly can't give out my private email). I have just under 26k records in each table.
Problem query
When we retrieve search results, we need them to be ordered descending by internaldate_time_t so we can pluck out only the most recent few results. Here's an example search query (full version here):
SELECT id
FROM MessageSearchTable
JOIN MessageTable USING (id)
WHERE MessageSearchTable MATCH 'a'
ORDER BY internaldate_time_t DESC
LIMIT 10 OFFSET 0
On my machine, with my email, that runs in about 150 milliseconds, as measured via:
time sqlite3 test.db <<<"..." > /dev/null
150 milliseconds is no beast of a query, but for a simple FTS lookup and indexed order, it's sluggish. If I omit the ORDER BY, it completes in 10 milliseconds, for example. Also keep in mind that the actual query has one more sub-select, so there's a little more work going on in general: the full version of the query runs in about 600 milliseconds, which is into beast territory, and omitting the ORDER BY in that case shaves 500 milliseconds off the time.
If I turn on stats inside sqlite3 and run the query, I notice the line:
Sort Operations: 1
If my interpretation of the docs about those stats is correct, it looks like the query is completely skipping using the MessageTableInternalDateTimeTIndex. The full version of the query also has the line:
Fullscan Steps: 25824
Sounds like it's walking the table somewhere, but let's ignore that for now.
What I've discovered
So let's work on optimizing that a little bit. I can rearrange the query into a sub-select and force SQLite to use our index with the INDEXED BY extension:
SELECT id
FROM MessageTable
INDEXED BY MessageTableInternalDateTimeTIndex
WHERE id IN (
SELECT id
FROM MessageSearchTable
WHERE MessageSearchTable MATCH 'a'
)
ORDER BY internaldate_time_t DESC
LIMIT 10 OFFSET 0
Lo and behold, the running time has dropped to around 100 milliseconds (300 milliseconds in the full version of the query - a 50% reduction in running time), and there are no sort operations reported. Note that merely reorganizing the query like this, without forcing the index via INDEXED BY, still leaves a sort operation (though it shaves off a few milliseconds, oddly enough), so it appears that SQLite is indeed ignoring our index unless we force it.
I've also tried some other things to see if they'd make a difference, but they didn't:
Explicitly making the index DESC as described here, with and without INDEXED BY
Explicitly adding the id column in the index, with and without internaldate_time_t ordered DESC, with and without INDEXED BY
Probably several other things I can't remember at this moment
Questions
100 milliseconds here still seems awfully slow for what seems like it should be a simple FTS lookup and indexed order.
What's going on here? Why is it ignoring the obvious index unless you force its hand?
Am I hitting some limitation with combining data from virtual and regular tables?
Why is it still so relatively slow, and is there anything else I can do to get FTS matches ordered by a field in another table?
Thanks!
An index is useful for looking up a table row based on the value of the indexed column.
Once a table row is found, indexes are no longer useful, because it is not efficient to look up a table row in an index by any other criterion.
An implication of this is that it is not possible to use more than one index for each table accessed in a query.
Also see the documentation: Query Planning, Query Optimizer.
Your first query has the following EXPLAIN QUERY PLAN output:
0 0 0 SCAN TABLE MessageSearchTable VIRTUAL TABLE INDEX 4: (~0 rows)
0 1 1 SEARCH TABLE MessageTable USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0 0 0 USE TEMP B-TREE FOR ORDER BY
What happens is that
the FTS index is used to find all matching MessageSearchTable rows;
for each row found in 1., the MessageTable primary key index is used to find the matching row;
all rows found in 2. are sorted with a temporary table;
the first 10 rows are returned.
Your second query has the following EXPLAIN QUERY PLAN output:
0 0 0 SCAN TABLE MessageTable USING COVERING INDEX MessageTableInternalDateTimeTIndex (~100000 rows)
0 0 0 EXECUTE LIST SUBQUERY 1
1 0 0 SCAN TABLE MessageSearchTable VIRTUAL TABLE INDEX 4: (~0 rows)
What happens is that
the FTS index is used to find all matching MessageSearchTable rows;
SQLite goes through all entries in the MessageTableInternalDateTimeTIndex in the index order, and returns a row when the id value is one of the values found in step 1.
SQLite stops after the tenth such row.
In this query, it is possible to use the index for (implied) sorting, but only because no other index is used for looking up rows in this table.
Using an index in this way implies that SQLite has to go through all entries, instead of looking up only the few rows that match some other condition.
When you omit the INDEXED BY clause from your second query, you get the following EXPLAIN QUERY PLAN output:
0 0 0 SEARCH TABLE MessageTable USING INTEGER PRIMARY KEY (rowid=?) (~25 rows)
0 0 0 EXECUTE LIST SUBQUERY 1
1 0 0 SCAN TABLE MessageSearchTable VIRTUAL TABLE INDEX 4: (~0 rows)
0 0 0 USE TEMP B-TREE FOR ORDER BY
which is essentially the same as your first query, except that joins and subqueries are handled slightly differently.
With your table structure, it is not really possible to get faster.
You are doing three operations:
looking up rows in MessageSearchTable;
looking up corresponding rows in MessageTable;
sorting rows by a MessageTable value.
As far as indexes are concerned, steps 2 and 3 conflict with each other.
The database has to choose whether to use an index for step 2 (in which case sorting must be done explicitly) or for step 3 (in which case it has to go through all MessageTable entries).
You could try to return fewer records from the FTS search by making the message time a part of the FTS table and searching only for the last few days (widening or dropping the time restriction if you don't get enough results).
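That last suggestion can be sketched as follows: store the timestamp inside the FTS table and constrain the MATCH to a recent window. Two caveats baked into the sketch: FTS columns store text, so the timestamp must be CAST back to an integer before comparing or sorting, and this assumes an SQLite build with FTS4 enabled (the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE VIRTUAL TABLE MessageSearchTable USING fts4(
    body, internaldate_time_t)""")
conn.executemany(
    "INSERT INTO MessageSearchTable (body, internaldate_time_t) VALUES (?, ?)",
    [("hello world", 100), ("goodbye world", 200), ("hello again", 300)])

# Search only messages newer than a cutoff; CAST the stored text timestamp
# back to an integer for the comparison and the ordering.
cutoff = 150
rows = conn.execute("""
    SELECT rowid FROM MessageSearchTable
    WHERE MessageSearchTable MATCH 'hello'
      AND CAST(internaldate_time_t AS INTEGER) > ?
    ORDER BY CAST(internaldate_time_t AS INTEGER) DESC
    LIMIT 10""", (cutoff,)).fetchall()
print(rows)  # [(3,)] -- only the 'hello' match newer than the cutoff
```

If the window returns fewer than the LIMIT, rerun with a lower cutoff (or none) before giving up.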
I have a table for saving the ranking of my app with the following fields: [id,username,score] and I want to clean the table keeping only the top 100 entries.
How can I do this delete? I've tried DELETE FROM ranking ORDER BY score DESC LIMIT 100,999999999 but it returns an error:
Error: near "ORDER": syntax error
Other alternative I've considered is:
DELETE FROM ranking WHERE id NOT IN (SELECT id FROM ranking ORDER BY score
DESC LIMIT 100)
but I don't know if it is efficient enough.
I suppose you're looking for this:
DELETE FROM ranking WHERE id NOT IN (
SELECT id FROM ranking ORDER BY score DESC LIMIT 100);
Here's SQL Fiddle illustrating the concept.
It's quite efficient (in fact, it's quite typical), as the nested query is executed only once. Performance actually depends more on whether score is covered by an index or not:
(without index):
EXPLAIN QUERY PLAN DELETE FROM ranking WHERE id NOT IN (
SELECT id FROM ranking AS ranking_subquery ORDER BY score DESC LIMIT 2);
--
selectid order from detail
0 0 0 SCAN TABLE ranking (~500000 rows)
0 0 0 EXECUTE LIST SUBQUERY 0
0 0 0 SCAN TABLE ranking AS ranking_subquery (~1000000 rows)
0 0 0 USE TEMP B-TREE FOR ORDER BY
(after CREATE INDEX ts ON ranking(score);)
selectid order from detail
0 0 0 SCAN TABLE ranking (~500000 rows)
0 0 0 EXECUTE LIST SUBQUERY 0
0 0 0 SCAN TABLE ranking AS ranking_subquery USING INDEX ts (~1000000 rows)
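The pattern is easy to sanity-check from Python's sqlite3 (sample data invented for the demonstration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ranking (id INTEGER PRIMARY KEY, username TEXT, score INTEGER)")
conn.executemany("INSERT INTO ranking (username, score) VALUES (?, ?)",
                 [("user%d" % i, i) for i in range(500)])

# Keep only the 100 highest scores; the subquery runs once.
conn.execute("""
    DELETE FROM ranking WHERE id NOT IN (
        SELECT id FROM ranking ORDER BY score DESC LIMIT 100)""")
remaining = conn.execute("SELECT COUNT(*), MIN(score) FROM ranking").fetchone()
print(remaining)  # (100, 400) -- only scores 400..499 survive
```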
All rows have the built-in field rowid. Try this:
DELETE FROM [tbl_name] WHERE rowid NOT IN
(SELECT rowid FROM [tbl_name] ORDER BY score DESC LIMIT 100)
You can read more about that here
Try this:
DELETE FROM ranking WHERE id NOT IN
(SELECT id FROM ranking ORDER BY score DESC LIMIT 100)
This assumes your id column doesn't have duplicates.
you can try this
delete from ranking where id not in
(select id from ranking order by score desc limit 100)
The answers here are good approaches to deal with your situation. I wanted to add an answer to explain the source of your syntax error.
The ORDER BY and LIMIT clauses on DELETE are a compile-time option for sqlite. I just spent several hours learning this the hard way :D.
From https://www.sqlite.org/compile.html#enable_update_delete_limit:
SQLITE_ENABLE_UPDATE_DELETE_LIMIT
This option enables an optional ORDER BY and LIMIT clause on UPDATE
and DELETE statements.
If this option is defined, then it must also be defined when using the
Lemon parser generator tool to generate a parse.c file. Because of
this, this option may only be used when the library is built from
source, not from the amalgamation or from the collection of
pre-packaged C files provided for non-Unix like platforms on the
website.
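You can check whether a given build was compiled with the option by inspecting PRAGMA compile_options, for example from Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
opts = [row[0] for row in conn.execute("PRAGMA compile_options")]
# The entry is present only if the library was built with
# SQLITE_ENABLE_UPDATE_DELETE_LIMIT (the SQLITE_ prefix is stripped).
print("ENABLE_UPDATE_DELETE_LIMIT" in opts)
```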
Certain words such as SELECT, DELETE, or BIGINT [or ORDER] are reserved and require special treatment for use as identifiers such as table and column names.
Traditional MySQL quotes:
DELETE FROM ranking ORDER BY `score` DESC;
Proper (ANSI) SQL quotes (some databases support [order] as well):
DELETE FROM ranking ORDER BY "score" DESC;
Although I would consider renaming the column to avoid such confusing issues in the future.