Get last_insert_rowid() from bulk insert in R

I have an SQLite database where I need to insert spatial information along with metadata into an R*tree and an accompanying regular table. Each entry needs to be uniquely defined for the lifetime of the database. Therefore the regular table has an INTEGER PRIMARY KEY AUTOINCREMENT column, and my plan was to start with the insert into this table, extract the last inserted rowids, and use these for the insert into the R*tree. Alas, this doesn't seem possible:
>testCon <- dbConnect(RSQLite::SQLite(), ":memory:")
>dbGetQuery(testCon, 'CREATE TABLE testTable (x INTEGER PRIMARY KEY, y INTEGER)')
>dbGetQuery(testCon, 'INSERT INTO testTable (y) VALUES ($y)', bind.data=data.frame(y=1:5))
>dbGetQuery(testCon, 'SELECT last_insert_rowid() FROM testTable')
last_insert_rowid()
1 5
2 5
3 5
4 5
5 5
Only the last inserted rowid seems to be kept (probably for performance reasons). As the number of records to be inserted is hundreds of thousands, it is not feasible to do the insert line by line.
So the question is: Is there any way to make the last_insert_rowid() bend to my will? And if not, what is the best failsafe alternative? Some possibilities:
Record highest rowid before insert and 'SELECT rowid FROM testTable WHERE rowid > prevRowid'
Get the number of rows to insert, fetch the last_insert_rowid() and use seq(to=lastRowid, length.out=nInserts)
While the two suggestions above should at least intuitively work, I don't feel confident enough in SQLite to know whether they are failsafe.

The algorithm for generating autoincrementing IDs is documented.
For an INTEGER PRIMARY KEY column, you can simply get the current maximum value:
SELECT IFNULL(MAX(x), 0) FROM testTable
and then use the next values.
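A minimal sketch of that approach, reusing the testTable schema from the question (the values are illustrative); doing the read and the inserts inside one transaction keeps a concurrent writer from claiming the same ids:
BEGIN;
SELECT IFNULL(MAX(x), 0) FROM testTable;  -- suppose this returns 5
-- hand out the next values explicitly instead of relying on last_insert_rowid()
INSERT INTO testTable (x, y) VALUES (6, 10), (7, 20), (8, 30);
-- the same x values (6, 7, 8) are now known ids for the R*tree insert
COMMIT;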

Why does this SQLite query not use an index for the correlated subquery?

Consider an SQLite database for things with parts, containing the following tables:
CREATE TABLE thing (id integer PRIMARY KEY, name text, total_cost real);
CREATE TABLE part (id integer PRIMARY KEY, cost real);
CREATE TABLE thing_part (thing_id REFERENCES thing(id), part_id REFERENCES part(id));
I have an index to find the parts of a thing
CREATE INDEX thing_part_idx ON thing_part (thing_id);
To illustrate the problem, I'm using the following queries to fill the tables with random data
INSERT INTO thing(name)
WITH RECURSIVE
  cte(x) AS (
    SELECT 1
    UNION ALL
    SELECT 1 FROM cte LIMIT 10000
  )
SELECT hex(randomblob(4)) FROM cte;
INSERT INTO part(cost)
WITH RECURSIVE
  cte(x) AS (
    SELECT 1
    UNION ALL
    SELECT 1 FROM cte LIMIT 10000
  )
SELECT abs(random()) % 100 FROM cte;
INSERT INTO thing_part (thing_id, part_id)
SELECT thing.id, abs(random()) % 10000 FROM thing, (SELECT 1 UNION ALL SELECT 1), (SELECT 1 UNION ALL SELECT 1);
So each thing is associated with a small number of parts (4 in this example).
At this point, I have not yet set the total cost of the things. I thought I could use the following query
UPDATE thing SET total_cost = (
    SELECT sum(part.cost)
    FROM thing_part, part
    WHERE thing_part.thing_id = thing.id
      AND thing_part.part_id = part.id);
but it is extremely slow (I did not have the patience to wait for it to complete).
EXPLAIN QUERY PLAN shows that both thing and thing_part are being scanned over, only the lookup in part is done using the rowid:
SCAN TABLE thing
EXECUTE CORRELATED SCALAR SUBQUERY 0
SCAN TABLE thing_part
SEARCH TABLE part USING INTEGER PRIMARY KEY (rowid=?)
If I look at the query plan for the inner query with a fixed thing_id, i.e.
SELECT sum(part.cost)
FROM thing_part, part
WHERE thing_part.thing_id = 1000
AND thing_part.part_id = part.id;
it does use the thing_part_idx:
SEARCH TABLE thing_part USING INDEX thing_part_idx (thing_id=?)
SEARCH TABLE part USING INTEGER PRIMARY KEY (rowid=?)
I would expect the first query to be equivalent to iterating over all rows of thing and executing the inner query each time, but obviously that's not the case. Why? Should I use a different index or rewrite my query or maybe do the iteration in the client to generate multiple queries instead?
In case it matters, I'm using SQLite version 3.22.0
SQLite might use dynamic typing, but column types still matter: they determine affinity, and an index can be used only when the database can prove that index lookups behave the same as comparisons with the actual table values, which often requires the affinities to be compatible.
So when you tell the database that the thing_part values are integers:
CREATE TABLE thing_part (
    thing_id integer REFERENCES thing(id),
    part_id integer REFERENCES part(id)
);
then the index on that will have the correct affinity, and will be used:
QUERY PLAN
|--SCAN TABLE thing
`--CORRELATED SCALAR SUBQUERY
|--SEARCH TABLE thing_part USING INDEX thing_part_idx (thing_id=?)
`--SEARCH TABLE part USING INTEGER PRIMARY KEY (rowid=?)
I would rewrite your query as:
-- calculating sum for each thing_id at once
WITH cte AS (
    SELECT thing_part.thing_id, sum(part.cost) AS s
    FROM thing_part
    JOIN part
      ON thing_part.part_id = part.id
    GROUP BY thing_part.thing_id
)
UPDATE thing
SET total_cost = (SELECT s FROM cte WHERE thing.id = cte.thing_id);
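The rewrite can be checked the same way the question did, by prefixing the statement with EXPLAIN QUERY PLAN (no output shown here, since the plan details vary with the SQLite version):
EXPLAIN QUERY PLAN
WITH cte AS (
    SELECT thing_part.thing_id, sum(part.cost) AS s
    FROM thing_part
    JOIN part ON thing_part.part_id = part.id
    GROUP BY thing_part.thing_id
)
UPDATE thing
SET total_cost = (SELECT s FROM cte WHERE thing.id = cte.thing_id);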

Can Sqlite's AUTOINCREMENT/Primary Key be started at a number other than 1?

I would like to start different tables off at different values for their primary keys during testing to verify I don't have any bugs in my code. Is this possible in Sqlite?
As documented, the last value of an AUTOINCREMENT column is stored in the internal sqlite_sequence table, where it can be changed.
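For instance, a minimal sketch (note that a table's sqlite_sequence row only exists after the first insert into that table):
CREATE TABLE test (test_id INTEGER PRIMARY KEY AUTOINCREMENT, s CHAR(1));
INSERT INTO test (s) VALUES ('a');                        -- creates the sqlite_sequence row; test_id = 1
UPDATE sqlite_sequence SET seq = 999 WHERE name = 'test';
INSERT INTO test (s) VALUES ('b');                        -- gets test_id = 1000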
Yes, you can do that.
The simplest way is probably just to insert a row and specify the number. If I wanted to start with 1000, I might do something like this:
sqlite> create table test (test_id integer primary key, s char(1));
sqlite> insert into test values (999, 'a');
sqlite> insert into test (s) values ('b');
sqlite> select * from test;
999|a
1000|b
After you've inserted the "first" row (test_id is 1000), you can delete the "seed" row (test_id is 999).
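Continuing that transcript (assuming no other inserts happened in between), deleting the seed row does not disturb later ids:
sqlite> delete from test where test_id = 999;
sqlite> insert into test (s) values ('c');
sqlite> select * from test;
1000|b
1001|c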

SQLite: Ordering my select results

I have a table with unique usernames and a bunch of string data I am keeping track of. Each user will have 1000 rows and when I select them I want to return them in the order they were added. Is the following code a necessary and correct way of doing this:
CREATE TABLE foo (
    username TEXT PRIMARY KEY,
    col1 TEXT,
    col2 TEXT,
    ...
    order_id INTEGER NOT NULL
);
CREATE INDEX foo_order_index ON foo(order_id);
SELECT * FROM foo where username = 'bar' ORDER BY order_id;
Add a DateAdded field, default it to the date/time the row was added, and sort on that (a sketch follows below).
If you absolutely must use the order_id (which I don't suggest), then at least make it an identity column. The reason I advise against this is that you are relying on side effects to do your sorting, and it will make your code harder to read.
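A minimal sketch of the DateAdded approach (the names are illustrative; note that CURRENT_TIMESTAMP has one-second resolution, so rows inserted in the same second would still need a tiebreaker):
CREATE TABLE foo (
    username  TEXT,
    col1      TEXT,
    DateAdded TEXT DEFAULT CURRENT_TIMESTAMP
);
SELECT * FROM foo WHERE username = 'bar' ORDER BY DateAdded;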
If each user will have 1000 rows, then username should not be the primary key. One option is to use the integer rowid that every ordinary table has (which optimizes I/O reads, since rows are typically stored in that order).
Read under "RowIds and the Integer Primary Key" at http://www.sqlite.org/lang_createtable.html
The data for each table in SQLite is stored as a B-Tree structure containing an entry for each table row, using the rowid value as the key. This means that retrieving or sorting records by rowid is fast.
Because it's stored in that order in the B-tree structure, it should be fast to order by the int primary key. Make sure it's an alias for rowid though - more in that article.
Also, if you're going to be doing queries where username = 'bob', you should consider an index on the username column, especially if there are going to be many users, which makes the index effective because of high selectivity. In contrast, an index on a column with values like 1 and 0 only has low selectivity and is very ineffective. So, if you have 3 users :) it's not worth it.
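For example (the index name is illustrative):
CREATE INDEX foo_username_idx ON foo(username);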
You can remove the order_id column & index entirely (unless you need them for something other than this sorting).
SQLite tables always have an integer key under the hood: in this case, your username column has silently been made a unique key, so the table's only integer primary key is the implicit one, called rowid. For your sorting purpose, you'll want to explicitly make it AUTOINCREMENT so that every row always has a higher rowid than older rows.
You probably want to read http://www.sqlite.org/autoinc.html
CREATE TABLE foo (
    rowid INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT UNIQUE,
    ...
);
Then your select becomes
select * from foo order by rowid;
One advantage of this approach is that you're re-using the index SQLite will already be placing on your table. A date or order_id column is going to mean an extra index, which is just overhead here.

Speed up SQL select in SQLite

I'm making a large database that, for the sake of this question, let's say, contains 3 tables:
A. Table "Employees" with fields:
id = INTEGER PRIMARY KEY AUTOINCREMENT
Others don't matter
B. Table "Job_Sites" with fields:
id = INTEGER PRIMARY KEY AUTOINCREMENT
Others don't matter
C. Table "Workdays" with fields:
id = INTEGER PRIMARY KEY AUTOINCREMENT
emp_id = a foreign key to Employees(id)
job_id = a foreign key to Job_Sites(id)
datew = INTEGER that stands for the actual workday, represented as a Unix date in seconds since midnight of Jan 1, 1970
The most common operation in this database is to display workdays for a specific employee. I perform the following select statement:
SELECT * FROM Workdays WHERE emp_id='Actual Employee ID' AND job_id='Actual Job Site ID' AND datew>=D1 AND datew<D2
I need to point out that D1 and D2 are calculated as the beginning of the month being searched and the beginning of the next month, respectively.
I actually have two questions:
Should I set any fields as indexes besides primary indexes? (Sorry, I seem to misunderstand the whole indexing concept)
Is there any way to re-write the Select statement to maybe speed it up. For instance, most of the checks in it would be to see that the actual employee ID and job site ID match. Maybe there's a way to split it up?
PS. Forgot to say, I use SQLite in a Windows C++ application.
If you use the above query often, then you may get better performance by creating a multicolumn index containing the columns in the query:
CREATE INDEX WorkdaysLookupIndex ON Workdays (emp_id, job_id, datew);
Sometimes you just have to create the index and try your queries to see what is faster.
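One quick check, with hypothetical ids and Unix timestamps; if the index is being used, the plan's SEARCH line should name WorkdaysLookupIndex:
EXPLAIN QUERY PLAN
SELECT * FROM Workdays
WHERE emp_id = 42 AND job_id = 7
  AND datew >= 1650000000 AND datew < 1652600000;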

Sqlite insert into with unique names, getting id

I have a list of strings to insert into a db. They MUST be unique. When I insert I would like their ID (to use as a foreign key in another table), so I use last_insert_rowid. I get 2 problems:
If I use replace, their ID (INTEGER PRIMARY KEY) updates, which breaks my db (entries point to nonexistent IDs).
If I use ignore, rowid is not updated, so I do not get the correct ID.
How do I get their IDs? If I don't need to, I wouldn't want to use a select statement to check and insert the string if it doesn't exist. How should I do this?
When a UNIQUE constraint violation occurs, the REPLACE algorithm deletes the pre-existing rows that are causing the violation before inserting or updating the current row, and the command continues executing normally. This causes the rowid to change and creates the following problem:
Y:> sqlite3 test
SQLite version 3.7.4
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table b (c1 integer primary key, c2 text UNIQUE);
sqlite> insert or replace into b values (null,'test-1');
sqlite> select last_insert_rowid();
1
sqlite> insert or replace into b values (null,'test-2');
sqlite> select last_insert_rowid();
2
sqlite> insert or replace into b values (null,'test-1');
sqlite> select last_insert_rowid();
3
sqlite> select * from b;
2|test-2
3|test-1
The workaround is to change the definition of the c2 column as follows:
create table b (c1 integer primary key, c2 text UNIQUE ON CONFLICT IGNORE);
and to remove the "or replace" clause from your inserts;
then, to test after your insert, you will need to execute the following SQL: select last_insert_rowid(), changes();
sqlite> create table b (c1 integer primary key, c2 text UNIQUE ON CONFLICT IGNORE);
sqlite> insert into b values (null,'test-1');
sqlite> select last_insert_rowid(), changes();
1|1
sqlite> insert into b values (null,'test-2');
sqlite> select last_insert_rowid(), changes();
2|1
sqlite> insert into b values (null,'test-1');
sqlite> select last_insert_rowid(), changes();
2|0
2|0
The return value of changes after the 3rd insert will be a notification to your application that you will need to lookup the rowid of "test-1", since it was already on file. Of course if this is a multi-user system, you will need to wrap all this in a transaction as well.
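Put together, a sketch of that lookup path (column names as in the table above):
BEGIN;
INSERT INTO b VALUES (null, 'test-1');
SELECT last_insert_rowid(), changes();
-- if changes() reported 0, the string was already on file; fetch its id instead:
SELECT c1 FROM b WHERE c2 = 'test-1';
COMMIT;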
Currently I use the following:
insert into tbl(c_name) select 'val' where not exists(select id from tbl where c_name ='val');
select id from tbl where c_name ='val';
By "they MUST be unique", do they mean you are sure that they are, or that you want an error as a result if they aren't? If you just make the string itself a key in its table, then I don't understand how either 1 or 2 could be a problem -- you'll get an error as desired in case of unwanted duplication, otherwise the correct ID. Maybe you can clarify your question with a small example of SQL code you're using, the table in question, what behavior you are observing, and what behavior you'd want instead...?
Edited: thanks for the edit but it's still unclear to me what SQL is giving you what problems! If your table comes from, e.g.:
CREATE TABLE Foo(
    theid INTEGER PRIMARY KEY AUTOINCREMENT,
    aword TEXT UNIQUE ON CONFLICT ABORT
)
then any attempt to INSERT a duplicated word will fail (the ON CONFLICT ABORT clause is optional, as ABORT is the default for UNIQUE) -- isn't that what you want, given that you say the words "MUST be unique", i.e., it's an error if they aren't?
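With that table, a duplicated INSERT fails loudly (the exact error text varies with the SQLite version; this is from a recent sqlite3 shell):
sqlite> INSERT INTO Foo(aword) VALUES ('dup');
sqlite> INSERT INTO Foo(aword) VALUES ('dup');
Error: UNIQUE constraint failed: Foo.aword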
The correct answer to your question is: This cannot be done in sqlite. You have to make an additional select query. Quoting the docs for last_insert_rowid:
An INSERT that fails due to a constraint violation is not a successful INSERT and does not change the value returned by this routine. Thus INSERT OR FAIL, INSERT OR IGNORE, INSERT OR ROLLBACK, and INSERT OR ABORT make no changes to the return value of this routine when their insertion fails
Having the same problem in 2022, but since SQLite3 version 3.35.0 (2021-03-12), we have RETURNING.
Combined with UPSERT, it is now possible to achieve this
sqlite> create table test (id INTEGER PRIMARY KEY, text TEXT UNIQUE);
sqlite> insert into test(text) values('a') on conflict do update set id = id returning id;
1
sqlite> insert into test(text) values('a') on conflict do update set id = id returning id;
1
sqlite> insert into test(text) values('b') on conflict do update set id = id returning id;
2
sqlite> insert into test(text) values('b') on conflict do update set id = id returning id;
2
sqlite> select * from test;
1|a
2|b
sqlite> insert into test(text) values('b') on conflict do nothing returning id;
sqlite>
Sadly, this is still a workaround rather than an elegant solution...
On conflict, the insert becomes an update. This means that your update triggers will fire, so you may want to stay away from this!
When the insert is converted into an update, it needs to do something (cf link). However, we don't want to do anything, so we do a no-op by updating id with itself.
Then, returning id gives us what we want.
Notes:
Our no-op actually does an update, so it costs time, and the trigger on update will fire. But without triggers, it has no effect on the data
Using on conflict do nothing returning id does not fail, but does not return the id.
If usable (again, check your triggers), and if all your tables use the primary key id, then this technique does not need any specialization: just copy/paste on conflict do update set id = id returning id;
