Obtaining a Read/Write Lock for SQLite Parallel Updates

I write a lot of parallel scripts in Python for research purposes, and I was wondering if it is possible to manually obtain a read/write lock in SQLite for a specific set of commands. Here is an oversimplified example of why I need it:
Simple Example:
Suppose I have a table (A) and I want to count the number of rows in it, storing the result in another table (B), in parallel. To do so, I run 10 instances of a program, each of which counts the rows in a certain range of A and adds its count to the value in B.
The problem is that each instance needs to read the value in B, add its own count to it, and save it, all while making sure none of the other instances are doing the same thing at once. Normally only a write lock is needed; in this case I need a read lock as well...
I was hoping I could do something like this:
acquire read/write lock
grab current value in B, add the instance count to it
save new value
release lock
Is there a way to do this?
Thanks.

You can increment an integer value using a single UPDATE statement:
sqlite> CREATE TABLE B(Id INTEGER, Value INTEGER);
sqlite> INSERT INTO B VALUES(0, 15);
sqlite> UPDATE B SET Value=Value + 23 WHERE Id = 0;
sqlite> SELECT * FROM B;
0|38
sqlite>
Using a single UPDATE statement makes this operation atomic, making any extra locking unnecessary.
If you need more complex processing, you can use SQL transactions to ensure that any complex database operations are performed atomically.
In general, you should avoid any locking external to SQLite or messing with the SQLite locking subsystem - doing so is a very good recipe for deadlocks...
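If you do need the read-modify-write pattern from the question, wrap it in a transaction instead of a manual lock. Below is a minimal sketch with Python's sqlite3 module, assuming the B table from the example above; BEGIN IMMEDIATE acquires the write lock up front, so no other instance can interleave:
import sqlite3

def add_instance_count(db_path, instance_count):
    # isolation_level=None means autocommit, so we manage the transaction
    # ourselves; the timeout makes concurrent writers wait for the lock
    # instead of failing immediately with "database is locked".
    conn = sqlite3.connect(db_path, isolation_level=None, timeout=30)
    try:
        conn.execute("BEGIN IMMEDIATE")  # take the write lock now
        (value,) = conn.execute("SELECT Value FROM B WHERE Id = 0").fetchone()
        conn.execute("UPDATE B SET Value = ? WHERE Id = 0", (value + instance_count,))
        conn.execute("COMMIT")
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()
That said, for a plain counter the single-UPDATE form above remains the simpler and safer choice.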
EDIT:
To append to a string, you can use the || concatenation operator:
sqlite> CREATE TABLE C(Id INTEGER, Value TEXT);
sqlite> INSERT INTO C VALUES(0, 'X');
sqlite> UPDATE C SET Value=Value || 'Y' WHERE Id = 0;
sqlite> SELECT * FROM C;
0|XY
sqlite>

Related

Why does SQLite store strings as text in BLOB columns, rather than bytes?

This behavior has me scratching my head: apparently, when you store a string in a BLOB column, querying it back doesn't behave like bytes? And, weirder still, when you attempt to take a BLOB substring, you have to request a length of 2 to get a single byte?
sqlite> create table wtf (a BLOB);
sqlite> insert into wtf (a) values (NULL);
sqlite> insert into wtf (a) values ('a');
sqlite> insert into wtf (a) values (X'61');
sqlite> select * from wtf;
a
a
sqlite> select a = X'61' from wtf;
0
1
sqlite> select HEX(a) from wtf;
61
61
sqlite> select substr(a, 0, 1) from wtf;
sqlite> select substr(a, 0, 2) from wtf;
a
a
Why does SQLite store strings as TEXT in BLOB columns, rather than bytes?
(I'll disregard your imprecise language: consider that everything stored in a computer is "bytes")
SQLite does not enforce column types.
shock! and horror! (...yes)
From the docs (emphasis mine):
In SQLite, the datatype of a value is associated with the value itself, not with its column [...] Flexible typing is a feature of SQLite, not a bug.
Read more here: https://www.sqlite.org/flextypegood.html
When you INSERT INTO table ( thisIsANumericColumn ) VALUES ( 'zzz' );, the SQLite engine is perfectly happy to store the TEXT string as-is. A later SELECT thisIsANumericColumn FROM table then leaves your SQLite library (or the application code consuming SQLite's API) to perform implicit type conversions where required, and those can break at runtime (you'd get away with this in NodeJS or PHP, but not in .NET, due to how ADO.NET works).
There are at least 3 possible alternative solutions (a sketch of two of them follows this list):
1. Add STRICT to your CREATE TABLE DDL, i.e. CREATE TABLE tbl ( a BLOB NOT NULL ) STRICT;. This instructs SQLite to respect column types, just like a traditional RDBMS. You must be running SQLite 3.37 (dated 2021-11-27) or later to use STRICT tables.
2. Simply don't insert incorrectly-typed values in the first place.
3. Use explicit CHECK constraints to enforce data-type restrictions and other data integrity checks, like value ranges, string length, etc.
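A quick sketch of options 1 and 3 with Python's sqlite3 module (the table names are made up; the STRICT part assumes the linked SQLite library is 3.37 or later):
import sqlite3

conn = sqlite3.connect(":memory:")
print(sqlite3.sqlite_version)  # STRICT needs 3.37+ at runtime

# Option 1: a STRICT table rejects values that don't match the column type.
conn.execute("CREATE TABLE t_strict (a BLOB NOT NULL) STRICT")
conn.execute("INSERT INTO t_strict VALUES (X'61')")    # a real blob: accepted
try:
    conn.execute("INSERT INTO t_strict VALUES ('a')")  # TEXT into BLOB: rejected
except sqlite3.Error as e:
    print(e)  # e.g. cannot store TEXT value in BLOB column t_strict.a

# Option 3: a CHECK constraint on typeof() works on any SQLite version.
conn.execute("CREATE TABLE t_check (a BLOB CHECK (typeof(a) = 'blob'))")
try:
    conn.execute("INSERT INTO t_check VALUES ('a')")   # TEXT fails the CHECK
except sqlite3.IntegrityError as e:
    print(e)  # CHECK constraint failed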

Is it possible to run a Teradata query in Excel that uses Volatile tables?

My Teradata query creates a volatile table that is used to join to existing views. When linking the query to Excel, the following error pops up: "Teradata: [Teradata Database] [3932] Only an ET or null statement is legal after a DDL Statement". Is there a workaround for this for someone who does not have write permissions in Teradata to create a real view or table? I want to avoid linking to Teradata in SQL and running an open query to pull in the data needed.
This is for Excel 2016 64-bit and Teradata version 15.10.1.12.
Normally this error will occur if you are using ANSI mode or have issued a BT (Begin Transaction) in BTET mode.
Here are a few workarounds to try:
1. Issue an ET; statement (commit) after the CREATE VOLATILE TABLE statement. If you are using ANSI mode, use COMMIT; instead of ET;. If you are unsure, try each one in turn; only one will be valid, but both do the same thing. Make sure your volatile table includes ON COMMIT PRESERVE ROWS.
2. Try using BTET mode (a.k.a. Teradata mode) when establishing the session. I do not remember exactly where, but there will be a setting for this in the ODBC configuration.
3. Try using a Global Temporary table. These work similarly to volatile tables, except that you define them once and the definition sticks around. That is, you can create it in, say, BTEQ or SQL Assistant, etc. The definition is common to all users and sessions (i.e. your Excel session), but the content is transient and unique to each session (like a volatile table). If you do not have permission to create Global Temporary tables, ask your DBA.
4. Move the SELECT part of your INSERT into the volatile table into the query that selects the data from the volatile table. See the simple example below.
Here is a simple example to illustrate point 4.
Current:
create volatile table tmp (id Integer)
ON COMMIT PRESERVE ROWS;
insert into tmp
select customer_number
from customer
where X = Y and yr = 2019
;
select a,b,c
from another_tbl A
join tmp T on A.id = T.id
;
Becomes:
select a,b,c
from another_tbl A
join (
    select customer_number
    from customer
    where X = Y and yr = 2019
) AS T on A.id = T.id
;
Or better yet, just join your tables directly.
Note: the first sequence (create table, insert into, select) is a three-statement series. It will return 3 "result sets": the first two are row counts, and the last is the actual data. Most programs (including, I think, Excel) cannot process multiple-result-set responses. This is one of the reasons it is difficult to use Teradata macros with client tools like Excel.
The latter solution (a single select) avoids this potential problem.

How can I return inserted ids for multiple rows in SQLite?

Given a table:
CREATE TABLE Foo(
Id INTEGER PRIMARY KEY AUTOINCREMENT,
Name TEXT
);
How can I return the ids of the multiple rows inserted at the same time using:
INSERT INTO Foo (Name) VALUES
('A'),
('B'),
('C');
I am aware of last_insert_rowid() but I have not found any examples of using it for multiple rows.
What I am trying to achieve can be seen in this SQL Server example:
DECLARE @InsertedRows AS TABLE (Id BIGINT);
INSERT INTO [Foo] (Name) OUTPUT Inserted.Id INTO @InsertedRows VALUES
('A'),
('B'),
('C');
SELECT Id FROM @InsertedRows;
Any help is very much appreciated.
This is not possible with a single multi-row INSERT; if you want the three generated ids, you have to execute three INSERT statements and read last_insert_rowid() after each one. (As of SQLite 3.35.0 there is also a RETURNING clause, which can return the ids of all inserted rows directly.)
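For example, a minimal sketch of the three-INSERT approach with Python's sqlite3 module (in-memory database just for illustration):
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Foo (Id INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT)")

cursor = conn.cursor()
ids = []
for name in ("A", "B", "C"):
    cursor.execute("INSERT INTO Foo (Name) VALUES (?)", (name,))
    ids.append(cursor.lastrowid)  # one generated id per single-row INSERT
print(ids)  # [1, 2, 3]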
Given SQLite3 locking:
An EXCLUSIVE lock is needed in order to write to the database file. Only one EXCLUSIVE lock is allowed on the file and no other locks of any kind are allowed to coexist with an EXCLUSIVE lock. In order to maximize concurrency, SQLite works to minimize the amount of time that EXCLUSIVE locks are held.
And how Last Insert Rowid works:
...returns the rowid of the most recent successful INSERT into a rowid table or virtual table on database connection D.
It should be safe to assume that, while a writer executes its batch INSERT into a ROWID table, no other writer can interleave and make the generated primary keys non-consecutive. Thus the inserted primary keys are [lastrowid - rowcount + 1, lastrowid]. Or, in the Python sqlite3 API:
cursor.execute(...) # multi-VALUE INSERT
assert cursor.rowcount == len(values)
lastrowids = range(cursor.lastrowid - cursor.rowcount + 1, cursor.lastrowid + 1)
Under normal circumstances, when you don't mix explicitly provided keys with keys that are expected to be generated, or, as the AUTOINCREMENT documentation states:
The normal ROWID selection algorithm described above will generate monotonically increasing unique ROWIDs as long as you never use the maximum ROWID value and you never delete the entry in the table with the largest ROWID.
The above should work as expected.
A standalone Python script can be used to test the correctness of the above for multi-threaded and multi-process setups.
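A self-contained sketch of the batch INSERT and the id inference, using the Foo table from the question (in-memory database for brevity):
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Foo (Id INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT)")

names = ["A", "B", "C"]
cursor = conn.cursor()
cursor.execute("INSERT INTO Foo (Name) VALUES (?), (?), (?)", names)
assert cursor.rowcount == len(names)

# The generated keys are consecutive and end at lastrowid.
ids = range(cursor.lastrowid - cursor.rowcount + 1, cursor.lastrowid + 1)
print(list(ids))  # [1, 2, 3]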
Other databases
For instance, MySQL InnoDB (at least in the default innodb_autoinc_lock_mode = 1, "consecutive" lock mode) works in a similar way (though obviously under much more concurrent conditions) and guarantees that inserted PKs can be inferred from lastrowid:
"Simple inserts" (for which the number of rows to be inserted is known in advance) avoid table-level AUTO-INC locks by obtaining the required number of auto-increment values under the control of a mutex (a light-weight lock) that is only held for the duration of the allocation process, not until the statement completes

SQLite data retrieve with select taking too long

I have created a table with SQLite for my Corona/Lua app. It's a hash table with roughly 700,000 values. The table has two columns: the hashcode (a string) and the value (another string). During the program I need to fetch data several times by providing the hashcode.
I'm using something like this code to get the data:
for p in db:nrows([[SELECT * FROM test WHERE id=']].."hashcode"..[[';]]) do
print(p)
-- p = returned value --
end
This statement, though, is taking insanely long to execute.
Thanks.
Edit:
Success!
The mistake was with the primary key. I set the hashcode as the primary key, as shown below, and the retrieval time went back to normal:
CREATE TABLE IF NOT EXISTS test (id STRING PRIMARY KEY , array);
I also prepared the statements in advance as you said:
stmt = db:prepare("SELECT * FROM test WHERE id = ?;")
[...]
stmt:bind(1,s)
for p in stmt:nrows() do
The only problem is that the db file size, which was around 18 MB, went up to 29.5 MB.
You should create the table with id as a unique primary key; this will automatically make an index.
create table if not exists test
(
id text primary key,
val text
);
You should not construct statements using string concatenation; this is a security issue (SQL injection), so avoid getting into this habit. Also, you should prepare statements in advance, at program initialization, and run the prepared statements.
Something like this... initially:
hashcode_query_stmt = db:prepare("SELECT * FROM test WHERE id = ?;")
then for each use:
hashcode_query_stmt:bind_values(hashcode)
for p in hashcode_query_stmt:urows() do ... end
Ensure that there is an index on the id/hashcode column. Without one, such queries will be slow, slow, slow. This index should probably be unique.
If only selecting the value/hashcode (SELECT value FROM ..), it may be beneficial to have a covering index over (id, value) as that can avoid additional seeking to the row data (see SQLite Query Planning). Try it with and without such a covering index.
Also, it may be worthwhile to employ caching if the same hashcodes are queried multiple times.
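For illustration, a minimal sketch of the covering-index idea with Python's sqlite3 module (the same SQL applies from Lua; the index name is made up):
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id TEXT PRIMARY KEY, val TEXT)")
# The covering index holds both columns, so a query touching only
# (id, val) can be answered from the index without visiting the row data.
conn.execute("CREATE INDEX test_id_val ON test (id, val)")
conn.execute("INSERT INTO test VALUES ('abc123', 'hello')")

# EXPLAIN QUERY PLAN shows whether the covering index is used; expect
# something like: SEARCH test USING COVERING INDEX test_id_val (id=?)
for row in conn.execute("EXPLAIN QUERY PLAN SELECT val FROM test WHERE id = ?", ("abc123",)):
    print(row)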
As already stated, make sure you have an index on id.
If you can't change the table schema now, you can add an index ad hoc:
CREATE INDEX test_id ON test (id);
About hashes: if you are computing hashes in your software to speed up searches, don't!
SQLite will treat your supplied hashes like any regular string/blob. Moreover, RDBMSs are optimized for efficient searching, which can be greatly improved with indexes.
Unless you're hashing to save space, you are wasting processor time computing hashes in your application.

Calculating the percentage of dates (SQL Server)

I'm trying to add an auto-calculated field in SQL Server 2012 Express that stores the % of project completion, calculated from the date columns, using:
ALTER TABLE dbo.projects
ADD PercentageCompleted AS (select COUNT(*) FROM projects WHERE project_finish > project_start) * 100 / COUNT(*)
But I am getting this error:
Msg 1046, Level 15, State 1, Line 2
Subqueries are not allowed in this context. Only scalar expressions are allowed.
What am I doing wrong?
Even if it were possible (it isn't), it is not something you would want as a calculated column anyway:
it will be the same value in each row
the entire table would need to be updated after every insert/update
You should consider doing this in a stored procedure or a user-defined function instead. Or, even better, in the business logic of your application.
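For the application-logic route, a hedged sketch in Python (pyodbc and the DSN are assumptions; the column names come from the question) that computes the value on demand instead of storing it:
import pyodbc

conn = pyodbc.connect("DSN=projects_db")  # hypothetical DSN

# NULLIF avoids a divide-by-zero on an empty table (the result is NULL instead).
row = conn.execute(
    "SELECT 100.0 * SUM(CASE WHEN project_finish > project_start THEN 1 ELSE 0 END)"
    " / NULLIF(COUNT(*), 0) FROM dbo.projects"
).fetchone()

print(f"{row[0]:.1f}% of projects completed" if row[0] is not None else "no projects")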
I don't think you can do that. You could write a trigger to figure it out or do it as part of an update statement.
Are you storing "percentageCompleted" as a duplicated column value in the same table as your project data?
If this is the case, I would not recommend this, because it would duplicate the data.
If you don't care about duplicate data, try something like this, separating the steps out:
ALTER TABLE dbo.projects
ADD PercentageCompleted decimal(5,4) -- a ratio needs room for 1.0000; decimal(2,2) tops out at 0.99. You could also store it as a varchar or char.
declare @percentageVariable decimal(5,4)
select @percentageVariable = (select count(*) * 1.0 from projects where project_finish > project_start) / (select count(*) from projects) -- the * 1.0 forces decimal division; count(*) / count(*) alone is integer division
update projects
set PercentageCompleted = @percentageVariable
This will give you a decimal ratio in that table; you can then format it as a percentage on select, e.g. PercentageCompleted * 100 with a '%' appended.
