Cascade deletion is not allowed at the database level, so it has to be implemented in the application layer. I'm trying to implement it using jOOQ. At present my thoughts are as follows:
Given: a parent record that extends UpdatableRecord.
1. Get the list of foreign keys referencing this parent's primary key, using parentRecord.getTable().getPrimaryKey().getReferences().
2. Delete the child records where child.parentId = parentId.
3. Implement a recursive function to handle multiple levels of parent-child relationships.
Am I on the right track? Is this functionality already present in jOOQ? Thanks for any hint.
Am I on the right track?
Well, for starters, I really would insist on using database functionality for this. It's quite likely that the database will handle it better, and definitely faster, than if you roll it manually from the client.
If that's not an option, an alternative might be to write a stored procedure that implements the cascading deletion, in order to prevent the many server round trips that would otherwise be incurred.
If that's not an option either, then yes, your approach is logically correct, but make sure you're not going to produce an N+1 problem. The most efficient solution is to recursively go to the leaf child tables first, delete all the relevant rows there in a single bulk delete (semi-joining the entire path up to the original table's deleted rows), and then recurse back up the tree. For example, given this hierarchy:
    A
   / \
  /   \
 B     C
      / \
     /   \
    D     E
If you want to emulate a statement like (hypothetical syntax):
DELETE CASCADE FROM a WHERE a_id IN (1, 2, 3)
Then, you should run:
DELETE FROM d WHERE c_id IN (
  SELECT c_id FROM c WHERE a_id IN (1, 2, 3)
);
DELETE FROM e WHERE c_id IN (
  SELECT c_id FROM c WHERE a_id IN (1, 2, 3)
);
DELETE FROM b WHERE a_id IN (1, 2, 3);
DELETE FROM c WHERE a_id IN (1, 2, 3);
DELETE FROM a WHERE a_id IN (1, 2, 3);
jOOQ will definitely help you generate these dynamically.
Is this functionality already present in jOOQ?
No, but it would be a nice addition: https://github.com/jOOQ/jOOQ/issues/7367
Snowflake offers a UNIQUE constraint but doesn't actually enforce it. I have an example below showing that with a test table.
What is the point? What value does the constraint add?
What workarounds do people use to avoid duplicates? I could run a query before every insert, but that seems like unnecessary overhead.
CREATE OR REPLACE TABLE dbo.Test
(
"A" INT NOT NULL UNIQUE,
"B" STRING NOT NULL
);
INSERT INTO dbo.Test
VALUES (0, 'ABC');
INSERT INTO dbo.Test
VALUES (0, 'DEF');
SELECT *
FROM dbo.Test;
A   B
0   ABC
0   DEF
For one, Snowflake is not alone in this world. Data gets imported and exported, and while Snowflake does not enforce the constraints, some other systems might, and this way they won't get lost while travelling through Snowflake.
For another, the constraints are also informational for data analysis tools, as already mentioned in the link Kirby provided.
Please also remember that the check and the insert are separate statements, so running a check before every insert will still let duplicates through at high concurrency. To avoid duplicates fully you need to either run MERGE statements (which is admittedly going to be slower; see the sketch below) or manually delete the "excess" rows after they have been loaded.
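For illustration, a minimal MERGE sketch against the dbo.Test table from the question, treating "A" as the would-be unique key (the source row here is just hard-coded):
MERGE INTO dbo.Test t
USING (SELECT 0 AS a, 'DEF' AS b) s
    ON t."A" = s.a
-- only insert when no row with this key exists yet
WHEN NOT MATCHED THEN
    INSERT ("A", "B") VALUES (s.a, s.b);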
Similar to this question and this solution for PostgreSQL (in particular "INSERT missing FK rows at the same time"):
Suppose I am making an address book with a "Groups" table and a "Contact" table. When I create a new Contact, I may want to place them into a Group at the same time. So I could do:
INSERT INTO Contact VALUES (
    'Bob',
    (SELECT group_id FROM Groups WHERE name = 'Friends')
);
But what if the "Friends" Group doesn't exist yet? Can we insert this new Group efficiently?
The obvious thing is to do a SELECT to test whether the Group exists already and, if not, do an INSERT; then do the INSERT into Contact with the sub-SELECT above.
Or I can constrain Groups.name to be UNIQUE, do an INSERT OR IGNORE, then INSERT into Contact with the sub-SELECT (sketched below).
I could also keep my own cache of which Groups exist, but that seems like duplicating functionality the database provides in the first place.
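For reference, the INSERT OR IGNORE variant would be something like this (two statements; the table and column names are the ones assumed above):
INSERT OR IGNORE INTO Groups (name) VALUES ('Friends');
INSERT INTO Contact VALUES (
    'Bob',
    (SELECT group_id FROM Groups WHERE name = 'Friends')
);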
My guess is that there is no way to do this in one query, since INSERT does not return anything and cannot be used in a subquery. Is that intuition correct? What is the best practice here?
My guess is that there is no way to do this in one query, since INSERT does not return anything and cannot be used in a subquery. Is that intuition correct?
You could use a trigger and a little modification of the tables, and then you could do it with a single query.
For example, consider the following.
Purely for convenience of producing the demo:-
DROP TRIGGER IF EXISTS add_group_if_not_exists;
DROP TABLE IF EXISTS contact;
DROP TABLE IF EXISTS groups;
One-time setup SQL :-
CREATE TABLE IF NOT EXISTS groups (id INTEGER PRIMARY KEY, group_name TEXT UNIQUE);
INSERT INTO groups VALUES(-1,'NOTASSIGNED');
CREATE TABLE IF NOT EXISTS contact (
    id INTEGER PRIMARY KEY,
    contact TEXT,
    group_to_use TEXT,
    group_reference INTEGER DEFAULT -1 REFERENCES groups(id)
);
CREATE TRIGGER IF NOT EXISTS add_group_if_not_exists
AFTER INSERT ON contact
BEGIN
    INSERT OR IGNORE INTO groups (group_name) VALUES(new.group_to_use);
    UPDATE contact
       SET group_reference = (SELECT id FROM groups WHERE group_name = new.group_to_use),
           group_to_use = NULL
     WHERE id = new.id;
END;
SQL that would be used on an ongoing basis :-
INSERT INTO contact (contact,group_to_use) VALUES
('Fred','Friends'),
('Mary','Family'),
('Ivan','Enemies'),
('Sue','Work colleagues'),
('Arthur','Fellow Rulers'),
('Amy','Work colleagues'),
('Henry','Fellow Rulers'),
('Canute','Fellow Ruler')
;
The number of values and the actual values would vary.
SQL Just for demonstration of the result
SELECT * FROM groups;
SELECT contact,group_name FROM contact JOIN groups ON group_reference = groups.id;
Results
This results in :-
1) The groups (noting that the group "NOTASSIGNED" is intrinsic to the working of the above and hence added initially) :-
Note that you have to be careful regarding mistakes like 'Fellow Ruler' instead of 'Fellow Rulers'.
-1 is used for the initial group because it would not be a normal, automatically generated value.
2) The contacts with the respective group :-
Efficient insertion
That could likely be debated from here to eternity, so I leave it for the fence sitters/destroyers to decide :). However, some considerations :-
It works and appears to do what is wanted.
It's a little wasteful due to the additional, otherwise-unused column.
It tries to minimise that waste by clearing the column once the group reference has been resolved (it is set to NULL, which may be even more efficient than an empty string, though for some it can be confusing).
There will obviously be an overhead, BUT in comparison to the alternatives it is probably negligible (perhaps important if you were loading every Facebook user, but if it's driven by user input, likely irrelevant).
What is the best practice here?
Fences again. :)
Note Hopefully obvious, but the DROP statements are purely for convenience, and all the other SQL up until the INSERT is run once to set up the tables and the trigger in preparation for the single INSERT that adds a group if necessary.
I am working on a CR where I need to create a PL/SQL package, and I am a bit confused about the approach.
Background: there is a view named 'D' which is at the end of a chain of interdependent views. We can put it as:
A – Fact table (Populated using Informatica, source MS-Dynamics)
B – View 1 based on fact table
C – View 2 based on View1
D – View 3 based on view2
Each view has multiple joins with other tables in addition to its base view.
Requirement: the client wants to remove all these views and create a PL/SQL package which can insert data directly from MS-Dynamics into View 3, i.e. 'D'.
Before I come up with something complex, I would like to know: is there any standard approach to address such requirements?
Any advice/suggestions are appreciated.
It should be obvious that you still need a fact table to keep some data.
You could get rid of B and C by making D more complex (the WITH clause might help to keep it readable).
Inserting data into D is (most likely) not possible per se, but you can create an INSTEAD OF INSERT trigger to handle that, i.e. insert into the fact table A instead.
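A minimal sketch of such a trigger (the column names id and amount are assumptions for illustration, not taken from your actual schema):
CREATE OR REPLACE TRIGGER d_instead_of_insert
INSTEAD OF INSERT ON d
FOR EACH ROW
BEGIN
    -- redirect the insert against the view to the underlying fact table
    INSERT INTO a (id, amount) VALUES (:new.id, :new.amount);
END;
/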
Example for using the WITH clause:
Instead of
create view b as select * from dual;
create view c as select * from b;
create view d as select * from c;
you could write
create view d as
with b as (select * from dual),
c as (select * from b)
select * from c;
As you can see, the existing view definitions go 1:1 into the WITH clause, so it's not too difficult to create a single view that combines all the views.
If you are on Oracle 12c you might look at DBMS_UTILITY.EXPAND_SQL_TEXT, though you'll probably want to clean up the output a bit for readability.
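For example, a minimal sketch of calling it (assuming the view D exists and server output is enabled):
DECLARE
    l_sql CLOB;
BEGIN
    -- expands all referenced views down to their base tables
    DBMS_UTILITY.EXPAND_SQL_TEXT(
        input_sql_text  => 'SELECT * FROM d',
        output_sql_text => l_sql
    );
    DBMS_OUTPUT.PUT_LINE(l_sql);
END;
/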
A few things first
1) A view is a predefined SQL query, so it is not possible to insert records directly into it. Even a materialized view, which is a persistent table structure, only gets populated with the results of a query, so as things stand this is not possible. What is possible is to create a new table to hold the data which is currently aggregated in view D.
2) It is very possible to aggregate data at multiple levels in Informatica using a combination of inline sorter and aggregator transformations, which will generate the data at the level you're looking for.
3) Should you do it? Data-warehousing best practice would say no: keep the data as granular as possible, per the original table A, so that it can be rolled up in many ways (refer to the Kimball Group site and read up on star schemas for such matters). Do you have much sway in the choice, though?
4) The current process (while often used) is not that much better in terms of star schema design.
I have a process where at some point I have to add a new column of type INTEGER to a table, and then populate this new column with an UPDATE. I do this in C.
I can boil my code down to:
CREATE TABLE t (a INTEGER, b INTEGER);
-- populate t
CREATE INDEX t_ndx ON t (a);
ALTER TABLE t ADD COLUMN c INTEGER;
The C pseudo-code to update column 'c' looks like this:
sqlite3_stmt *u;
sqlite3_prepare_v2(db, "update t set c=? where a=?", -1, &u, 0);
for (i = 0; i < n; i++)
{   c = c_a[i];
    a = a_a[i];
    sqlite3_bind_int64(u, 1, c);  /* first placeholder: the new c value */
    sqlite3_bind_int64(u, 2, a);  /* second placeholder: the a key */
    sqlite3_step(u);
    sqlite3_reset(u);             /* reset so the statement can be re-run */
}
The order of the a values in this UPDATE is the same as the order in which t was populated.
I'd like to know whether the SQLite engine detects this 'sequential' access and speeds up the "where a=?" lookup (i.e., does it keep some kind of cache of the previous cursor position?).
I'd also like to know whether there are 'hidden' features like array binding (at least when dealing with INTEGERs) to avoid constructing such a loop, all those bind calls, and the bytecode for each individual update; something along the lines of:
sqlite3_stmt *u;
sqlite3_prepare_v2(db, "update t set c=? where a=?", -1, &u, 0);
/* hypothetical array-binding API */
sqlite3_bind_int64_array(u, 1, c_a, n);
sqlite3_bind_int64_array(u, 2, a_a, n);
sqlite3_step_array(u, n);
Thanx in advance
Cheers
Phi
Your code already is pretty much optimal. Searching for the row by a needs a single index lookup and a single table row lookup; both will be fast because the needed pages are likely to be already cached.
You could speed up the lookup on a by making this column the INTEGER PRIMARY KEY, but this makes sense only if a actually is the primary key.
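A sketch of what that would look like (assuming a can indeed serve as the primary key):
-- a becomes an alias for the rowid, so "where a=?" is a direct rowid
-- lookup and no separate index on a is needed
CREATE TABLE t (a INTEGER PRIMARY KEY, b INTEGER, c INTEGER);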
In theory, it would be possible to update multiple rows at once:
UPDATE t
SET c = CASE a
WHEN :a1 THEN :c1
WHEN :a2 THEN :c2
WHEN :a3 THEN :c3
END
WHERE a IN (:a1, :a2, :a3);
But with many a values, this is likely to be implemented as a scan over the table, so it would make sense only if you could fit all the values into a single query, which is not possible for a large table.
I have recently stumbled upon a problem with selecting relationship details from one table and inserting them into another table. I hope someone can help.
I have a table structured as follows:
ID (PK)   Name        ParentID
1         Myname      0
2         nametwo     1
3         namethree   2
This is the table I need to select from to get all the relationship data. As there could be an unlimited number of sub-links, is there a function I can create to handle the looping?
Then, once I have all the data, I need to insert it into another table, and the IDs will have to change, as the IDs must go in order (e.g. I cannot have ID 2 be a sub-item of ID 3). I am hoping I can use the same function used for selecting to do the inserting.
If you are using SQL Server 2005 or above, you may use recursive queries to get your information. Here is an example:
With tree (id, Name, ParentID, [level])
As (
    Select id, Name, ParentID, 1
    From [myTable]
    Where ParentID = 0
    Union All
    Select child.id
         , child.Name
         , child.ParentID
         , parent.[level] + 1 As [level]
    From [myTable] As [child]
    Inner Join [tree] As [parent]
        On [child].ParentID = [parent].id
)
Select * From [tree];
This query will return the row requested by the first portion (Where ParentID = 0) and all sub-rows recursively. Does this help you?
I'm not sure I understand what you want to have happen with your insert. Can you provide more information in terms of the expected result when you are done?
Good luck!
For the retrieval part, you can take a look at Common Table Expressions. This feature provides recursive operation using SQL.
For the insertion part, you can use the CTE above to regenerate the IDs, and insert accordingly; a sketch follows.
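A minimal sketch of that idea, assuming the target table is called new_table (a hypothetical name): rows are renumbered with ROW_NUMBER() in level order, so every parent gets a lower new ID than its children, and the parent references are remapped via a self-join.
WITH tree (id, Name, ParentID, [level]) AS (
    SELECT id, Name, ParentID, 1
    FROM [myTable]
    WHERE ParentID = 0
    UNION ALL
    SELECT c.id, c.Name, c.ParentID, p.[level] + 1
    FROM [myTable] AS c
    INNER JOIN tree AS p ON c.ParentID = p.id
),
numbered AS (
    SELECT id, Name, ParentID,
           ROW_NUMBER() OVER (ORDER BY [level], id) AS new_id
    FROM tree
)
INSERT INTO new_table (ID, Name, ParentID)
SELECT n.new_id, n.Name, COALESCE(p.new_id, 0)  -- roots keep ParentID 0
FROM numbered AS n
LEFT JOIN numbered AS p ON n.ParentID = p.id;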
I hope this URL helps: Self-Joins in SQL
This is the problem of finding the transitive closure of a graph in SQL. SQL does not support this directly, which leaves you with three common strategies:
- use a vendor-specific SQL extension
- store the Materialized Path from the root to the given node in each row
- store Nested Sets, that is, the interval covered by the subtree rooted at a given node when the nodes are labelled depth-first
The first option is straightforward and, if you don't need database portability, probably the best. The second and third options have the advantage of being plain SQL, but require maintaining some denormalized state. Updating a table that uses Materialized Paths is simple, but for fast queries your database must support indexes for prefix queries on string values (see the sketch below). Nested Sets avoid the need for any string-indexing features, but can require updating a lot of rows as you insert or remove nodes.
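For illustration, a minimal Materialized Path sketch (the table and column names are hypothetical):
-- each row stores its full ancestor path, e.g. '1/2/5/'
CREATE TABLE node (
    id   INT PRIMARY KEY,
    name VARCHAR(100),
    path VARCHAR(255)
);
-- all descendants of node 2 under root 1: a prefix query,
-- which an ordinary index on path can serve
SELECT * FROM node WHERE path LIKE '1/2/%';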
If you're fine with always using MSSQL, I'd use the vendor-specific option Adrian mentioned.