I'm inserting data into a database where each "batch" must have a new unique id for the batch itself. I could add a batch table and use its AUTOINCREMENT id, but I don't really need it for anything else, so it seems excessive.
I'm currently doing a SELECT MAX(batchid) + 1 FROM items and then using the result for the inserts. This is of course prone to race conditions (two new batches started simultaneously can get conflicting ids).
Using an IMMEDIATE transaction is impractical. Is it possible to force an upgrade of a DEFERRED transaction to EXCLUSIVE before doing the select?
Some ideas:
Can I do some cheap no-op update?
Is there some explicit instruction to go exclusive now?
INSERT INTO items (batchid, value) VALUES ((SELECT MAX(batchid)+1 FROM items), 'monkey'), ((SELECT MAX(batchid)+1 FROM items), 'banana'), the idea being that the SELECT is now explicitly part of the insert?
It would indeed be possible to put the batch ID lookup into a subquery, but that would be a lot of duplication.
The easiest way is to do something that writes to the database, such as PRAGMA user_version = x.
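For example, a sketch of that approach, reusing the items table from the question (the exact user_version value is irrelevant here; the PRAGMA is only there to force a write):
BEGIN;                               -- DEFERRED transaction
PRAGMA user_version = 1;             -- dummy write: the transaction now holds the write lock
SELECT MAX(batchid) + 1 FROM items;  -- no other writer can change this result before COMMIT
INSERT INTO items (batchid, value) VALUES (?, 'monkey');  -- bind the id selected above
COMMIT;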
Related
Given a table:
CREATE TABLE Foo(
Id INTEGER PRIMARY KEY AUTOINCREMENT,
Name TEXT
);
How can I return the ids of the multiple rows inserted at the same time using:
INSERT INTO Foo (Name) VALUES
('A'),
('B'),
('C');
I am aware of last_insert_rowid() but I have not found any examples of using it for multiple rows.
What I am trying to achieve can be seen in this SQL Server example:
DECLARE @InsertedRows AS TABLE (Id BIGINT);
INSERT INTO [Foo] (Name) OUTPUT Inserted.Id INTO @InsertedRows VALUES
('A'),
('B'),
('C');
SELECT Id FROM #InsertedRows;
Any help is very much appreciated.
This is not possible. If you want to get three values, you have to execute three INSERT statements.
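For example, a sketch of the three-statement approach against the Foo table above, reading back each generated id with last_insert_rowid():
INSERT INTO Foo (Name) VALUES ('A');
SELECT last_insert_rowid();  -- id generated for 'A'
INSERT INTO Foo (Name) VALUES ('B');
SELECT last_insert_rowid();  -- id generated for 'B'
INSERT INTO Foo (Name) VALUES ('C');
SELECT last_insert_rowid();  -- id generated for 'C'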
Given SQLite3 locking:
An EXCLUSIVE lock is needed in order to write to the database file. Only one EXCLUSIVE lock is allowed on the file and no other locks of any kind are allowed to coexist with an EXCLUSIVE lock. In order to maximize concurrency, SQLite works to minimize the amount of time that EXCLUSIVE locks are held.
And how Last Insert Rowid works:
...returns the rowid of the most recent successful INSERT into a rowid table or virtual table on database connection D.
It should be safe to assume that while a writer executes its batch INSERT into a ROWID table there can be no other writer that could make the generated primary keys non-consecutive. Thus the inserted primary keys are [lastrowid - rowcount + 1, lastrowid]. Or, in the Python sqlite3 API:
cursor.execute(...) # multi-VALUE INSERT
assert cursor.rowcount == len(values)
lastrowids = range(cursor.lastrowid - cursor.rowcount + 1, cursor.lastrowid + 1)
In normal circumstances, when you don't mix explicitly provided keys with keys expected to be generated, or, as the AUTOINCREMENT documentation states:
The normal ROWID selection algorithm described above will generate monotonically increasing unique ROWIDs as long as you never use the maximum ROWID value and you never delete the entry in the table with the largest ROWID.
The above should work as expected.
This Python script can be used to test the correctness of the above in a multi-threaded and multi-process setup.
Other databases
For instance, MySQL InnoDB (at least in the default innodb_autoinc_lock_mode = 1, "consecutive", lock mode) works in a similar way (though obviously under much more concurrent conditions) and guarantees that the inserted PKs can be inferred from lastrowid:
"Simple inserts" (for which the number of rows to be inserted is known in advance) avoid table-level AUTO-INC locks by obtaining the required number of auto-increment values under the control of a mutex (a light-weight lock) that is only held for the duration of the allocation process, not until the statement completes
I refactored a table that stored both metadata and data into two tables, one for metadata and one for data. This allows metadata to be queried efficiently.
I also created an updatable view with the original table's columns, using sqlite's insert, update and delete triggers. This allows calling code that needs both data and metadata to remain unchanged.
The insert and update triggers write each incoming row as two rows - one in the metadata table and one in the data table, like this:
-- View
CREATE VIEW IF NOT EXISTS Item as select n.Id, n.Title, n.Author, c.Content
FROM ItemMetadata n, ItemData c where n.id = c.Id
-- Trigger
CREATE TRIGGER IF NOT EXISTS item_update
INSTEAD OF UPDATE OF id, Title, Author, Content ON Item
BEGIN
UPDATE ItemMetadata
SET Title=NEW.Title, Author=NEW.Author
WHERE Id=old.Id;
UPDATE ItemData SET Content=NEW.Content
WHERE Id=old.Id;
END;
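The corresponding insert trigger is not shown above; a sketch of what it might look like, assuming the same tables and columns (not the original code):
-- Sketch only: assumed shape of the INSTEAD OF INSERT trigger described above
CREATE TRIGGER IF NOT EXISTS item_insert
INSTEAD OF INSERT ON Item
BEGIN
INSERT INTO ItemMetadata (Id, Title, Author) VALUES (NEW.Id, NEW.Title, NEW.Author);
INSERT INTO ItemData (Id, Content) VALUES (NEW.Id, NEW.Content);
END;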
Questions:
Are the updates to the ItemMetadata and ItemData tables atomic? Is there a chance that a reader can see the result of the first update before the second update has completed?
Originally I had the WHERE clauses be WHERE rowid=old.rowid but that seemed to cause random problems so I changed them to WHERE Id=old.Id. The original version was based on tutorial code I found. But after thinking about it I wonder how sqlite even comes up with an old rowid - after all, this is a view across multiple tables. What rowid does sqlite pass to an update trigger, and is the WHERE clause the way I first coded it problematic?
The documentation says:
No changes can be made to the database except within a transaction. Any command that changes the database (basically, any SQL command other than SELECT) will automatically start a transaction if one is not already in effect.
Commands in a trigger are considered part of the command that triggered the trigger.
So all commands in a trigger are part of a transaction, and atomic.
Views do not have a (usable) rowid.
I have a database with an "ID" column. Whenever there is a new entry for the database, I fetch the last ID from the database, increment the value, and then use it in the Insert statement.
EDIT: I need the ID to use in multiple INSERT statements. I will fetch this ID from the primary table and use it to insert values into the related tables.
NextID = Select Max(ID) + 1 From Table
INSERT INTO Table1(ID, Col1, Col2...) Values(NextId, Value1, Value2...)
INSERT INTO Table2 (ID,col1,col2....) Values (NextID, Value1, Value2...)
I don't know if this is a good way, because I know there will be concurrency issues.
When my application tries to read the NextID, there is a chance that another instance of the application is also trying to read the same value and thus concurrency issues may arise.
Is there a proper way to deal with this situation? I know there are ways to set the database isolation level; which would be a proper isolation level for this situation?
Also, if anybody could suggest an alternate way to manually maintain and increment the ID in the database, I'm open to that.
If this information is not enough, please let me know what you require.
I am working with ASP.Net with VB and MS Sql Server 2008. I do not want to use the built-in "Identity" of SQL Server.
The only way to get the next ID is to actually insert the row, and use identity. Everything else will fail. So you must start by inserting into the parent table:
begin transaction;
insert into Table (col1, col2, col3) values (value1, value2, value3);
declare @Id int;
set @Id = scope_identity();
insert into Table1 (ID, col1, col2) values (@Id, ...);
insert into Table2 (ID, col1, col2) values (@Id, ...);
commit;
This is atomic and concurrency safe.
I do not want to use the built-in "Identity" of SQL Server.
tl;dr: What you 'want' matters little unless you can give a clear justification why. You can do it correctly, or you can spend time till oblivion reinventing the wheel.
Essentially you have a batch of three SQL statements - one select and two inserts. The database engine can execute another statement from a different session anywhere between them, thus breaking your data consistency - some other session can get the same MAX() value that you've got and use it for other insert statements. The only way to prevent the DB engine from doing that is to use transactions. Wrap your batch with BEGIN TRANSACTION ... COMMIT and you are done.
Your way of doing this is fine; what you need is transaction handling:
BEGIN TRANSACTION
BEGIN TRY
DECLARE @NextID INT;
SELECT @NextID = MAX(ID) + 1 FROM Table;
INSERT INTO Table1 (ID, Col1, Col2...) VALUES (@NextID, Value1, Value2...);
INSERT INTO Table2 (ID, Col1, Col2...) VALUES (@NextID, Value1, Value2...);
COMMIT TRANSACTION
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION
-- exception logging goes here
END CATCH
We have a table (say T1) that is referenced by about 16 other tables with foreign keys in our SQL Server database. The data is accessed through an ASP.NET application with LINQToSQL. When the user tried to delete a record from T1, the statement would time out. So we decided to first delete the records from the tables that reference T1 and only then delete the record in T1. The problem is that deletion from T1 does not work as fast as expected.
My question is: is it normal for deletion from a table referenced by many other tables to be this time-consuming, even if the record itself does not have any 'child' records?
EDIT: Apparently the cause for the timeout was not the delete itself but another query that retrieved data from the same DataContext. Thank you for your suggestions, I have marked as answer the suggestion to add indexes for all foreign keys because it improved our script's execution plan.
I suspect that you may need to look into the indexing on your child tables.
It sounds as if your FKs are set to cascade deletes, so I suspect that some of your child tables do not have an index with the key to the parent as its first column.
In that case your delete will be doing full scans of the child tables - even if you've already deleted the child records, it will still check, since the cascade is still in place.
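For example (table and column names are hypothetical), an index that leads with the foreign key column lets the cascade check seek instead of scan:
-- Hypothetical child table referencing T1: index the FK column first
CREATE INDEX IX_ChildTable_T1Id ON ChildTable (T1Id);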
When you define a relationship in the DB, you can set the Delete rule to Cascade in SQL Server. That way, when you delete a record from the parent table, the matching records are automatically deleted from the child tables.
If it is taking a long time, you may have set other constraints that slow down the deletion.
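As a sketch (object names are hypothetical), such a rule can also be declared in T-SQL:
-- Hypothetical FK with a cascading delete rule: removing a T1 row removes its children
ALTER TABLE ChildTable
ADD CONSTRAINT FK_ChildTable_T1
FOREIGN KEY (T1Id) REFERENCES T1 (Id)
ON DELETE CASCADE;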
LINQ does not do bulk deletes if you have it operate directly on the record set -- instead, it is probably deleting one record at a time.
To improve performance, use a stored procedure instead for any bulk insert, update or delete operations.
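A sketch of such a procedure (object names are hypothetical; the real child tables come from the schema discussed in the question):
-- Hypothetical stored procedure: set-based deletes of the children, then the parent row
CREATE PROCEDURE dbo.DeleteT1Record
    @T1Id INT
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;
        DELETE FROM ChildTable WHERE T1Id = @T1Id;  -- repeat for the other referencing tables
        DELETE FROM T1 WHERE Id = @T1Id;
    COMMIT;
END;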
I have an SQLite3 database with a table whose primary key consists of two integers, and I'm trying to insert lots of data into it (i.e. around 1 GB or so).
The issue I'm having is that creating the primary key also implicitly creates an index, which in my case bogs inserts down to a crawl after a few commits (and that would be because the database file is on NFS... sigh).
So, I'd like to somehow temporarily disable that index. My best plan so far involved dropping the primary key's automatic index; however, it seems that SQLite doesn't like that and throws an error if I attempt to do it.
My second-best plan would involve the application making transparent copies of the database on the network drive, making modifications and then merging them back. Note that, as opposed to most SQLite/NFS questions, I don't need access concurrency.
What would be a correct way to do something like that?
UPDATE:
I forgot to specify the flags I'm already using:
PRAGMA synchronous = OFF
PRAGMA journal_mode = OFF
PRAGMA locking_mode = EXCLUSIVE
PRAGMA temp_store = MEMORY
UPDATE 2:
I'm in fact inserting items in batches; however, each batch is slower to commit than the previous one (I'm assuming this has to do with the size of the index). I tried batches of between 10k and 50k tuples, each tuple being two integers and a float.
You can't remove the implicit index, since it's the only way rows are addressed.
Merge your two integer keys into a single long key = (key1 << 32) + key2, and make this the INTEGER PRIMARY KEY in your schema (in that case you will have only one index).
Set the page size for the new DB to at least 4096.
Remove ANY additional index except the primary one.
Insert data in SORTED order so that the primary key is always growing.
Reuse prepared statements; don't create them from a string each time.
Set the page cache size to as much memory as you have left (remember that the cache size is in number of pages, not number of bytes).
Commit every 50000 items.
If you have additional indexes, create them only AFTER ALL the data is in the table.
If you're able to merge the keys (I think you're using 32-bit keys, while sqlite uses 64-bit rowids, so it's possible) and insert the data in sorted order, I bet you will fill your first GB with the same performance as the second, and both will be fast enough. A sketch of these suggestions follows below.
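A sketch of those suggestions combined (schema, names and numbers are hypothetical):
-- Hypothetical schema illustrating the advice above
PRAGMA page_size = 4096;      -- must be set before the database file is first written
PRAGMA cache_size = -200000;  -- negative value = size in KiB, so roughly 200 MB of page cache
CREATE TABLE IF NOT EXISTS samples (
    id    INTEGER PRIMARY KEY,  -- the merged key (key1 << 32) + key2; the only index
    value REAL
);
-- Insert rows pre-sorted by the merged key so the primary key only ever grows,
-- reusing one prepared statement and committing every ~50000 rows:
INSERT INTO samples (id, value) VALUES ((1 << 32) + 7, 3.14);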
Are you doing the INSERT of each new row as an individual transaction?
If you use BEGIN TRANSACTION and INSERT rows in batches, then I think the index will only get rebuilt at the end of each transaction.
See faster-bulk-inserts-in-sqlite3.
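A minimal sketch of that (the table name is hypothetical, matching the earlier sketch):
BEGIN TRANSACTION;
INSERT INTO samples (id, value) VALUES (1, 0.5);
INSERT INTO samples (id, value) VALUES (2, 1.5);
-- ... many more rows in the same batch ...
COMMIT;  -- the commit (journal/sync) cost is paid once per batch, not once per row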