Explain plan for update operation involving join index - teradata

Below is the explain plan I can't wrap my head around.
At a high level it seems to be updating the Lineitem table through the OrderLine join index (I'm not at all sure about this), and it is executing this in parallel:
Explanation
1) First, we execute the following steps in parallel.
   1) We do a single-AMP UPDATE from join index table AMT4.OrderLine by way of the
      primary index "AMT4.OrderLine.l_orderkey = 10" with a residual
      condition of ("AMT4.OrderLine.l_orderkey = 10").
   2) We do a single-AMP UPDATE from AMT4.Lineitem by way of the primary
      index "AMT4.Lineitem.l_orderkey = 10" with no residual conditions.
Appreciate someone helping me understand this.

This is an UPDATE on AMT4.Lineitem with a WHERE-condition on the table's Primary Index.
And there's a join index (AMT4.OrderLine) on that table, which is maintained automatically during the update. The JI has the same PI as the base table, so it is also updated via its primary index; both updates are single-AMP steps, and the optimizer runs them in parallel.
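To make the "same PI, single-AMP" point concrete, here is a toy Python model, not Teradata code: rows are placed on AMPs by hashing the primary-index value, so an UPDATE with an equality condition on the PI only has to visit one AMP, and a join index distributed on the same column is maintained on that same AMP. The MD5-based hash, the 8-AMP system size and the join-index row contents are illustrative assumptions; Teradata's real row-hash function and hash map are different.

```python
import hashlib

N_AMPS = 8  # hypothetical system size


def amp_for(pi_value) -> int:
    """Toy stand-in for Teradata's row hash + hash map (NOT the real algorithm)."""
    digest = hashlib.md5(repr(pi_value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % N_AMPS


def place(rows, pi_col):
    """Store every row on the AMP that owns the hash of its primary-index value."""
    amps = [[] for _ in range(N_AMPS)]
    for row in rows:
        amps[amp_for(row[pi_col])].append(row)
    return amps


def update_by_pi(amps, pi_col, key, changes):
    """Single-AMP update: hash the key, visit only that AMP, update matching rows."""
    target = amp_for(key)
    touched = 0
    for row in amps[target]:
        if row[pi_col] == key:  # other keys may hash to the same AMP
            row.update(changes)
            touched += 1
    return target, touched


# Base table and (hypothetical) join-index rows, both distributed on l_orderkey.
lineitem = place([{"l_orderkey": k, "l_linenumber": n, "l_quantity": 1}
                  for k in range(1, 51) for n in (1, 2, 3)], "l_orderkey")
orderline = place([{"l_orderkey": k, "l_linenumber": n, "l_quantity": 1}
                   for k in range(1, 51) for n in (1, 2, 3)], "l_orderkey")

# UPDATE ... WHERE l_orderkey = 10: both steps hit the same single AMP.
print(update_by_pi(lineitem, "l_orderkey", 10, {"l_quantity": 5}))
print(update_by_pi(orderline, "l_orderkey", 10, {"l_quantity": 5}))
```

Because both structures hash on the same column, the two printed AMP numbers are identical, which is why the explain can run the two single-AMP steps side by side.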

Related

How to introduce indexing to sqlite query in android?

In my Android application, I use Cursor c = db.rawQuery(query, null); to query data from a local SQLite database, and one of the query strings looks like the following:
SELECT t1.* FROM table t1
WHERE NOT EXISTS (
SELECT 1 FROM table t2
WHERE t2.start_time = t1.start_time AND t2.stop_time > t1.stop_time
)
However, the query gets very slow when the database gets large. I've been trying to introduce an index to speed up the query, but so far without much success, so some help here would be great; it's also hard to find examples of this for Android applications.
You can create a composite index for the columns start_time and stop_time:
CREATE INDEX idx_name ON table_name(start_time, stop_time);
You can read in The SQLite Query Optimizer Overview:
The ON and USING clauses of an inner join are converted into
additional terms of the WHERE clause prior to WHERE clause analysis
...
and:
If an index is created using a statement like this:
CREATE INDEX idx_ex1 ON ex1(a,b,c,d,e,...,y,z);
Then the index might be used if the initial columns of the index
(columns a, b, and so forth) appear in WHERE clause terms. The initial
columns of the index must be used with the = or IN or IS operators.
The right-most column that is used can employ inequalities.
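A quick way to check that SQLite actually uses the composite index is EXPLAIN QUERY PLAN. The sketch below uses Python's built-in sqlite3 module rather than Android, a hypothetical table name `events` (since `table` is a reserved word) and made-up rows; an ORDER BY is added only to make the output deterministic. The exact wording of the plan varies between SQLite versions, but it should name the index for the correlated NOT EXISTS lookup (equality on start_time, inequality on the right-most column stop_time).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, start_time INTEGER, stop_time INTEGER)")
conn.executemany("INSERT INTO events (start_time, stop_time) VALUES (?, ?)",
                 [(10, 20), (10, 25), (15, 30)])
# Equality column first, inequality column last, as the optimizer overview requires.
conn.execute("CREATE INDEX idx_events_start_stop ON events(start_time, stop_time)")

query = """
SELECT t1.* FROM events t1
WHERE NOT EXISTS (
    SELECT 1 FROM events t2
    WHERE t2.start_time = t1.start_time AND t2.stop_time > t1.stop_time
)
ORDER BY t1.id
"""
rows = conn.execute(query).fetchall()
plan = [r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + query)]  # 4th column = detail

print(rows)  # [(2, 10, 25), (3, 15, 30)]
for line in plan:
    print(line)
```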
You may have to uninstall the app from the device so that the database is deleted and recreated on the next run, or increase the version number of the database so that you can create the index in the onUpgrade() method.
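The version-number route can be sketched outside Android too. Android's SQLiteOpenHelper keeps its version number in SQLite's `PRAGMA user_version` field; the Python sketch below reads that field directly and mimics the onCreate()/onUpgrade() split. The function names, `DB_VERSION = 2` and the "version 1 had no index" history are hypothetical.

```python
import sqlite3

DB_VERSION = 2  # hypothetical: version 1 shipped without the index


def on_create(conn):
    """Fresh database: create the schema including the index (like onCreate())."""
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, start_time INTEGER, stop_time INTEGER)")
    conn.execute("CREATE INDEX idx_events_start_stop ON events(start_time, stop_time)")


def on_upgrade(conn, old_version):
    """Existing database: apply only the missing steps (like onUpgrade())."""
    if old_version < 2:
        conn.execute("CREATE INDEX IF NOT EXISTS idx_events_start_stop ON events(start_time, stop_time)")


def open_and_migrate(conn):
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    if version == 0:
        on_create(conn)
    elif version < DB_VERSION:
        on_upgrade(conn, version)
    # PRAGMA statements cannot take bound parameters; DB_VERSION is a trusted int.
    conn.execute(f"PRAGMA user_version = {DB_VERSION}")
    conn.commit()
    return conn


# Simulate an installed app whose database is still at version 1 (no index yet).
old = sqlite3.connect(":memory:")
old.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, start_time INTEGER, stop_time INTEGER)")
old.execute("PRAGMA user_version = 1")
open_and_migrate(old)
print(old.execute("SELECT name FROM sqlite_master WHERE type = 'index'").fetchall())
# [('idx_events_start_stop',)]
```

Existing users get the index on their next launch without losing data, which is the advantage over uninstalling the app.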

Efficient insertion of row and foreign table row if it does not exist

Similar to this question and this solution for PostgreSQL (in particular "INSERT missing FK rows at the same time"):
Suppose I am making an address book with a "Groups" table and a "Contact" table. When I create a new Contact, I may want to place them into a Group at the same time. So I could do:
INSERT INTO Contact VALUES (
'Bob',
(SELECT group_id FROM Groups WHERE name = 'Friends')
)
But what if the "Friends" Group doesn't exist yet? Can we insert this new Group efficiently?
The obvious thing is to do a SELECT to test if the Group exists already; if not do an INSERT. Then do an INSERT into Contacts with the sub-SELECT above.
Or I can constrain Group.name to be UNIQUE, do an INSERT OR IGNORE, then INSERT into Contacts with the sub-SELECT.
I can also keep my own cache of which Groups exist, but that seems like I'm duplicating functionality of the database in the first place.
My guess is that there is no way to do this in one query, since INSERT does not return anything and cannot be used in a subquery. Is that intuition correct? What is the best practice here?
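The UNIQUE + INSERT OR IGNORE approach from the question can at least be wrapped in one transaction so the two statements succeed or fail together. A minimal sketch with Python's sqlite3 module; the table and column names follow the question, while `add_contact` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Groups (group_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE Contact (name TEXT, group_id INTEGER REFERENCES Groups(group_id));
""")


def add_contact(conn, contact_name, group_name):
    """Insert a contact, creating its group first if it does not exist yet."""
    with conn:  # commits both statements together, or rolls both back
        # Does nothing if a group with this name already exists (UNIQUE constraint).
        conn.execute("INSERT OR IGNORE INTO Groups (name) VALUES (?)", (group_name,))
        conn.execute(
            "INSERT INTO Contact (name, group_id) "
            "VALUES (?, (SELECT group_id FROM Groups WHERE name = ?))",
            (contact_name, group_name),
        )


add_contact(conn, "Bob", "Friends")
add_contact(conn, "Alice", "Friends")  # reuses the existing group
print(conn.execute(
    "SELECT c.name, g.name FROM Contact c JOIN Groups g USING (group_id) ORDER BY c.name"
).fetchall())  # [('Alice', 'Friends'), ('Bob', 'Friends')]
```

This is still two statements per contact; the point is only that they are atomic and need no separate existence check.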
With a trigger and a small modification of the tables, you could do it with a single query.
For example, consider the following:
Purely for convenience of producing the demo:-
DROP TRIGGER IF EXISTS add_group_if_not_exists;
DROP TABLE IF EXISTS contact;
DROP TABLE IF EXISTS groups;
One-time setup SQL :-
CREATE TABLE IF NOT EXISTS groups (id INTEGER PRIMARY KEY, group_name TEXT UNIQUE);
INSERT INTO groups VALUES(-1,'NOTASSIGNED');
CREATE TABLE IF NOT EXISTS contact (id INTEGER PRIMARY KEY, contact TEXT, group_to_use TEXT, group_reference INTEGER DEFAULT -1 REFERENCES groups(id));
CREATE TRIGGER IF NOT EXISTS add_group_if_not_exists
AFTER INSERT ON contact
WHEN new.group_to_use IS NOT NULL
BEGIN
INSERT OR IGNORE INTO groups (group_name) VALUES(new.group_to_use);
UPDATE contact SET group_reference = (SELECT id FROM groups WHERE group_name = new.group_to_use), group_to_use = NULL WHERE id = new.id;
END;
SQL that would be used on an ongoing basis :-
INSERT INTO contact (contact,group_to_use) VALUES
('Fred','Friends'),
('Mary','Family'),
('Ivan','Enemies'),
('Sue','Work colleagues'),
('Arthur','Fellow Rulers'),
('Amy','Work colleagues'),
('Henry','Fellow Rulers'),
('Canute','Fellow Ruler')
;
The number of values and the actual values would vary.
SQL Just for demonstration of the result
SELECT * FROM groups;
SELECT contact,group_name FROM contact JOIN groups ON group_reference = groups.id;
Results
This results in :-
1) The groups (note that the group "NOTASSIGNED" is intrinsic to the working of the above and is hence added initially) :-
Note that you have to be careful about mistakes such as Fellow Ruler instead of Fellow Rulers, which silently creates a separate group.
-1 is used because it would never be generated automatically as an id.
2) The contacts with the respective group :-
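The one-time setup and the single INSERT can be run end to end with Python's sqlite3 module to check the result. This sketch copies the answer's SQL with two deliberate changes that are assumptions, not part of the original: group_reference is declared INTEGER to match groups.id, and the trigger has a `WHEN new.group_to_use IS NOT NULL` guard so that a contact inserted without a group (the hypothetical 'Zoe' row) keeps the -1 default instead of creating a NULL-named group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE IF NOT EXISTS groups (id INTEGER PRIMARY KEY, group_name TEXT UNIQUE);
INSERT INTO groups VALUES(-1,'NOTASSIGNED');
CREATE TABLE IF NOT EXISTS contact (id INTEGER PRIMARY KEY, contact TEXT, group_to_use TEXT,
    group_reference INTEGER DEFAULT -1 REFERENCES groups(id));
CREATE TRIGGER IF NOT EXISTS add_group_if_not_exists
AFTER INSERT ON contact
WHEN new.group_to_use IS NOT NULL
BEGIN
    INSERT OR IGNORE INTO groups (group_name) VALUES(new.group_to_use);
    UPDATE contact
       SET group_reference = (SELECT id FROM groups WHERE group_name = new.group_to_use),
           group_to_use = NULL
     WHERE id = new.id;
END;
""")

# The ongoing INSERT from the answer, plus one contact with no group.
conn.execute("""
INSERT INTO contact (contact, group_to_use) VALUES
('Fred','Friends'), ('Mary','Family'), ('Ivan','Enemies'),
('Sue','Work colleagues'), ('Arthur','Fellow Rulers'), ('Amy','Work colleagues'),
('Henry','Fellow Rulers'), ('Canute','Fellow Ruler'), ('Zoe', NULL)
""")
conn.commit()

for row in conn.execute(
        "SELECT contact, group_name FROM contact JOIN groups ON group_reference = groups.id "
        "ORDER BY contact.id"):
    print(row)
```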
Efficient insertion
That could likely be debated from here to eternity, so I'll leave it for the fence sitters/destroyers to decide :). However, some considerations:-
It works and appears to do what is wanted.
It's a little wasteful due to the additional column.
It tries to minimise the waste by setting the group_to_use column to NULL once the group has been resolved.
There will obviously be an overhead, BUT in comparison to the alternatives it is probably negligible (perhaps important if you were extracting every Facebook user), and if it's driven by user input it's likely irrelevant.
What is the best practice here?
Fences again. :)
Note: hopefully obvious, but the DROP statements are purely for convenience; all the other SQL up to the INSERT is run once to set up the tables and trigger in preparation for the single INSERT that adds a group if necessary.

How can I understand the sqlite query plan?

I executed a query on SQLite and the plan part is
0|1|5|SCAN TABLE edges AS e1 (~250000 rows)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 2
2|0|0|SEARCH TABLE dihedral USING AUTOMATIC COVERING INDEX (TYPE=? AND EDGE=?) (~7 rows)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 3
3|0|0|SEARCH TABLE bounds USING AUTOMATIC COVERING INDEX (FACE=? AND EDGE=?) (~7 rows)
where the WHERE clause of the query contains
exists (select dihedral.edge from dihedral where dihedral.type=2 and dihedral.edge=e1.edge) and
exists (select bounds.edge from bounds where bounds.face=f1.face and bounds.edge=e1.edge) and
I understand this is not a highly efficient query; I just want to improve its performance.
This is my guess:
There is no subquery flattening, right?
The two EXISTS subqueries introduce correlated subqueries, and they are actually executed as indexed nested loops, right?
Reading the query: because the tables dihedral and bounds are independent of each other and both are correlated with the outer edges table, the computational complexity would be O(n^2) without an index. However, since there are covering indexes, the performance should be much better, right? I found on Wikipedia that an index lookup is O(log(N)), so the overall performance should be O(n*log(N)); is this right?
Could anyone help me understand what is happening? Thanks.
SQLite does support subquery flattening, but this is not possible for an EXISTS subquery like the ones here.
The AUTOMATIC shows that the database creates a temporary index just for this query.
This is a strong indication that you should create these indexes permanently:
CREATE INDEX dihedral_type_edge ON dihedral(type, edge);
CREATE INDEX bounds_face_edge ON bounds(face, edge);
The outer query goes through all edge rows, and for each row, searches in the indexes.
This would result in O(edge * (log(dihedral) + log(bounds))).
The temporary index creation requires sorting these tables, so the entire runtime ends up being O(dihedral*log(dihedral) + bounds*log(bounds) + edge*(log(dihedral)+log(bounds))).
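You can confirm that the permanent indexes replace the automatic ones by comparing EXPLAIN QUERY PLAN before and after creating them. A Python sqlite3 sketch on a hypothetical, simplified schema: the real query also references another table (f1), so here bounds is correlated on e1.face instead. Whether the "before" plan shows AUTOMATIC indexes depends on the SQLite version and its row estimates; the "after" plan should name the two new indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schema; the real query also joins f1.
conn.executescript("""
CREATE TABLE edges (edge INTEGER, face INTEGER);
CREATE TABLE dihedral (type INTEGER, edge INTEGER);
CREATE TABLE bounds (face INTEGER, edge INTEGER);
""")

QUERY = """
SELECT e1.edge FROM edges AS e1
WHERE EXISTS (SELECT dihedral.edge FROM dihedral
              WHERE dihedral.type = 2 AND dihedral.edge = e1.edge)
  AND EXISTS (SELECT bounds.edge FROM bounds
              WHERE bounds.face = e1.face AND bounds.edge = e1.edge)
"""


def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for sql."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


print("before:", plan(QUERY))  # may mention AUTOMATIC ... INDEX, version-dependent
conn.executescript("""
CREATE INDEX dihedral_type_edge ON dihedral(type, edge);
CREATE INDEX bounds_face_edge ON bounds(face, edge);
""")
print("after:", plan(QUERY))
```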

SQLite data retrieve with select taking too long

I have created a table with SQLite for my Corona/Lua app. It's a hashtable with ~700,000 values. The table has two columns: the hashcode (a string) and the value (another string). While the program runs, I need to get data several times by providing the hashcode.
I'm using something like this code to get the data:
for p in db:nrows([[SELECT * FROM test WHERE id=']]..hashcode..[[';]]) do
print(p)
-- p = returned value --
end
This statement, though, takes far too long to run.
thanks,
Edit:
Success!
The mistake was the missing primary key. I set the hashcode as the primary key as below, and the retrieval time went back to normal:
CREATE TABLE IF NOT EXISTS test (id STRING PRIMARY KEY , array);
I also prepared the statements in advance as you said:
stmt = db:prepare("SELECT * FROM test WHERE id = ?;")
[...]
stmt:bind(1,s)
for p in stmt:nrows() do
  -- use p here
end
The only problem was that the db file size, which was around 18 MB, went up to 29.5 MB.
You should create the table with id as a unique primary key; this will automatically make an index.
create table if not exists test
(
id text primary key,
val text
);
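The declared type matters here. SQLite derives a column's affinity from the declared type name, and STRING (as in the question's `id STRING PRIMARY KEY`) matches none of the TEXT rules, so the column gets NUMERIC affinity and hash codes that look like numbers are stored as numbers rather than text. A small Python sqlite3 check with hypothetical table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_string (id STRING PRIMARY KEY, val TEXT)")  # NUMERIC affinity
conn.execute("CREATE TABLE t_text (id TEXT PRIMARY KEY, val TEXT)")      # TEXT affinity

for table in ("t_string", "t_text"):
    # Table names are fixed literals above, so formatting them in is safe here.
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", ("12345", "some value"))
    stored_type = conn.execute(f"SELECT typeof(id) FROM {table}").fetchone()[0]
    print(table, stored_type)  # t_string integer, then t_text text
```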
You should not construct statements using string concatenation; this is a security issue so avoid getting in this habit. Also, you should prepare statements in advance, at program initialization, and run the prepared statements.
Something like this... initially:
hashcode_query_stmt = db:prepare("SELECT * FROM test WHERE id = ?;")
then for each use:
hashcode_query_stmt:bind_values(hashcode)
for p in hashcode_query_stmt:urows() do ... end
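The same pattern in Python's sqlite3 module, as a rough analogue of the Lua code: a TEXT primary key, for which SQLite builds an automatic unique index (named sqlite_autoindex_test_1 here), and a `?` placeholder instead of string concatenation. Python's sqlite3 keeps a per-connection cache of prepared statements (the `cached_statements` argument of `sqlite3.connect`), so reusing the same SQL string avoids re-preparing it. That automatic index is a separate B-tree holding a copy of every id, which is the likely reason the database file grew after the primary key was added. The sample data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS test (id TEXT PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?)",
                 [(f"h{i:06d}", f"value {i}") for i in range(10_000)])  # made-up data
conn.commit()

LOOKUP_SQL = "SELECT val FROM test WHERE id = ?"  # bound parameter, no concatenation


def lookup(hashcode):
    """Return the value stored for hashcode, or None if it is absent."""
    row = conn.execute(LOOKUP_SQL, (hashcode,)).fetchone()
    return row[0] if row else None


print(lookup("h000042"))  # value 42
print(conn.execute("EXPLAIN QUERY PLAN " + LOOKUP_SQL, ("h000042",)).fetchone()[3])
```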
Ensure that there is an index on the id/hashcode column. Without one, such queries will be slow, slow, slow. This index should probably be unique.
If only selecting the value/hashcode (SELECT value FROM ..), it may be beneficial to have a covering index over (id, value) as that can avoid additional seeking to the row data (see SQLite Query Planning). Try it with and without such a covering index.
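Whether a covering index is actually used shows up in the query plan. A Python sqlite3 sketch on the question's two-column table without a primary key, with a hypothetical index name `test_id_val`; when SQLite can answer the query from the index alone, the plan reports a COVERING INDEX search and never touches the table rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id TEXT, val TEXT)")  # no primary key, like the original table
conn.execute("CREATE INDEX test_id_val ON test (id, val)")  # hypothetical index name

detail = conn.execute("EXPLAIN QUERY PLAN SELECT val FROM test WHERE id = ?",
                      ("some-hash",)).fetchone()[3]
print(detail)  # a SEARCH ... USING COVERING INDEX test_id_val (id=?) line
```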
Also, it may be worthwhile to employ caching if the same hashcodes are queried multiple times.
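A minimal caching sketch in Python, assuming the table does not change while the program runs: `functools.lru_cache` memoizes lookups per hashcode. The cache size is an arbitrary choice, and the cache must be cleared with `cached_lookup.cache_clear()` if rows are updated, otherwise stale values are returned.

```python
import functools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id TEXT PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO test VALUES ('abc', 'first value')")


@functools.lru_cache(maxsize=4096)  # arbitrary size
def cached_lookup(hashcode):
    """Query the database only the first time a given hashcode is requested."""
    row = conn.execute("SELECT val FROM test WHERE id = ?", (hashcode,)).fetchone()
    return row[0] if row else None


for _ in range(3):
    cached_lookup("abc")
print(cached_lookup.cache_info())  # CacheInfo(hits=2, misses=1, maxsize=4096, currsize=1)
```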
As already stated, make sure you have an index on ID.
If you can't change the table schema now, you can add an index ad hoc:
CREATE INDEX test_id ON test (id);
About hashes: if you are computing hashes in your software to speed up searches, don't!
SQLite will use your supplied hashes as any regular string/blob. Also, RDBMS are optimized for efficient searching, which may be greatly improved with indexes.
Unless you're hashing to save space, you are wasting processor time computing hashes in your application.

Unable to delete oldest table partition

I'm using the 11g interval partitioning feature in one of my tables. I set it up to create 1 day partitions on a timestamp field and created a job to delete data 3 months old. When I try to delete the oldest partition I get the following error:
ORA-14758: Last partition in the range section cannot be dropped
I would have thought that "Last" refers to the newest partition and not the oldest. How should I interpret this error? Is there something wrong with my partitions or should I in fact keep the oldest partition there at all time?
Yes, the error message is somewhat misleading, but it refers to the last STATICALLY created partition (in your original table DDL, before Oracle started creating partitions automatically). I think the only way to avoid this is to create an artificial "MINVAL" partition that you're sure will never be used and then drop the real partitions above it.
[Edit after exchange of comments]
I assume this test case reproduces your problem:
CREATE TABLE test
( t_time DATE
)
PARTITION BY RANGE (t_time)
INTERVAL(NUMTODSINTERVAL(1, 'DAY'))
( PARTITION p0 VALUES LESS THAN (TO_DATE('09-1-2009', 'MM-DD-YYYY')),
PARTITION p1 VALUES LESS THAN (TO_DATE('09-2-2009', 'MM-DD-YYYY')),
PARTITION p2 VALUES LESS THAN (TO_DATE('09-3-2009', 'MM-DD-YYYY')),
PARTITION p3 VALUES LESS THAN (TO_DATE('09-4-2009', 'MM-DD-YYYY'))
);
insert into test values(TO_DATE('08-29-2009', 'MM-DD-YYYY'));
insert into test values(TO_DATE('09-1-2009', 'MM-DD-YYYY'));
insert into test values(TO_DATE('09-3-2009', 'MM-DD-YYYY'));
insert into test values(TO_DATE('09-10-2009', 'MM-DD-YYYY'));
When I do this I can drop partitions p0,p1, and p2 but get your error when attempting to drop p3 even though there is a system-generated partition beyond this.
The only workaround I could find was to temporarily redefine the table partitioning by:
alter table test set interval ();
and then drop partition p3. Then you can redefine the partitioning as per the original specification by:
alter table test set INTERVAL(NUMTODSINTERVAL(1, 'DAY'));
All correct in dpbradley's answer. But it can be done more safely if you're dropping the oldest partition(s):
In fact, it is enough just to reset the interval like this:
alter table test set interval ();
alter table test set INTERVAL(NUMTODSINTERVAL(1, 'DAY'));
and then drop the oldest partition.
Otherwise there is a risk: if the drop partition fails, the table will be left with no interval, so you need to catch all exceptions and handle this.
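The reset-then-drop order can be scripted so that the interval is always restored before the DROP is attempted, which removes the risk of ending up without an interval. A sketch in Python assuming any DB-API 2.0 cursor (for example from the python-oracledb driver); `drop_oldest_partition` is a hypothetical helper and the default interval matches the test table. Oracle DDL commits implicitly, so there is nothing to roll back, and identifiers cannot be bound as parameters, so they are checked against a simple pattern before being formatted in.

```python
import re

# Simple unquoted Oracle identifier (assumption: no quoted/mixed-case names).
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def drop_oldest_partition(cursor, table, partition,
                          interval="NUMTODSINTERVAL(1, 'DAY')"):
    """Reset interval partitioning, restore it, then drop `partition`.

    `interval` is formatted into DDL verbatim, so it must come from trusted code.
    """
    for name in (table, partition):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    try:
        # Per the answers above: resetting the interval moves the range/interval
        # transition point, so the old partition is no longer the "last" one.
        cursor.execute(f"ALTER TABLE {table} SET INTERVAL ()")
    finally:
        # Re-enable interval partitioning BEFORE the drop, so a failed drop
        # can never leave the table without an interval.
        cursor.execute(f"ALTER TABLE {table} SET INTERVAL ({interval})")
    cursor.execute(f"ALTER TABLE {table} DROP PARTITION {partition}")
```

Usage would be along the lines of `drop_oldest_partition(conn.cursor(), "test", "p3")` from the scheduled cleanup job.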
