Efficient insertion of row and foreign table row if it does not exist - sqlite

Similar to this question and this solution for PostgreSQL (in particular "INSERT missing FK rows at the same time"):
Suppose I am making an address book with a "Groups" table and a "Contact" table. When I create a new Contact, I may want to place them into a Group at the same time. So I could do:
INSERT INTO Contact VALUES (
    "Bob",
    (SELECT group_id FROM Groups WHERE name = "Friends")
)
But what if the "Friends" Group doesn't exist yet? Can we insert this new Group efficiently?
The obvious thing is to do a SELECT to test if the Group exists already; if not do an INSERT. Then do an INSERT into Contacts with the sub-SELECT above.
Or I can constrain Group.name to be UNIQUE, do an INSERT OR IGNORE, then INSERT into Contacts with the sub-SELECT.
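For reference, that two-statement approach would look something like this (a sketch only, assuming Groups.name has a UNIQUE constraint and group_id is an auto-assigned INTEGER PRIMARY KEY):
INSERT OR IGNORE INTO Groups (name) VALUES ('Friends');   -- no-op if the group already exists
INSERT INTO Contact VALUES (
    'Bob',
    (SELECT group_id FROM Groups WHERE name = 'Friends')
);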
I can also keep my own cache of which Groups exist, but that seems like I'm duplicating functionality of the database in the first place.
My guess is that there is no way to do this in one query, since INSERT does not return anything and cannot be used in a subquery. Is that intuition correct? What is the best practice here?

"My guess is that there is no way to do this in one query, since INSERT does not return anything and cannot be used in a subquery. Is that intuition correct?"
You could use a trigger, plus a little modification of the tables, and then do it with a single query.
For example, consider the following.
Purely for convenience of producing the demo:-
DROP TRIGGER IF EXISTS add_group_if_not_exists;
DROP TABLE IF EXISTS contact;
DROP TABLE IF EXISTS groups;
One-time setup SQL :-
CREATE TABLE IF NOT EXISTS groups (id INTEGER PRIMARY KEY, group_name TEXT UNIQUE);
INSERT INTO groups VALUES(-1,'NOTASSIGNED');
CREATE TABLE IF NOT EXISTS contact (id INTEGER PRIMARY KEY, contact TEXT, group_to_use TEXT, group_reference INTEGER DEFAULT -1 REFERENCES groups(id));
CREATE TRIGGER IF NOT EXISTS add_group_if_not_exists
    AFTER INSERT ON contact
BEGIN
    -- Add the group if it doesn't already exist (group_name is UNIQUE)
    INSERT OR IGNORE INTO groups (group_name) VALUES(new.group_to_use);
    -- Point the new contact at the group's id and clear the temporary column
    UPDATE contact
       SET group_reference = (SELECT id FROM groups WHERE group_name = new.group_to_use),
           group_to_use = NULL
     WHERE id = new.id;
END;
SQL that would be used on an ongoing basis :-
INSERT INTO contact (contact,group_to_use) VALUES
('Fred','Friends'),
('Mary','Family'),
('Ivan','Enemies'),
('Sue','Work colleagues'),
('Arthur','Fellow Rulers'),
('Amy','Work colleagues'),
('Henry','Fellow Rulers'),
('Canute','Fellow Ruler')
;
The number of values and the actual values would vary.
SQL Just for demonstration of the result
SELECT * FROM groups;
SELECT contact,group_name FROM contact JOIN groups ON group_reference = groups.id;
Results
This results in:-
1) The groups (noting that the group "NOTASSIGNED" is intrinsic to the working of the above and hence added initially). You have to be careful regarding mistakes like 'Fellow Ruler' instead of 'Fellow Rulers', which quietly creates an extra group. The id -1 is used for "NOTASSIGNED" because it would not be a value that is normally generated automatically.
2) The contacts, each joined to their respective group.
Efficient insertion
That could likely be debated from here to eternity so I leave it for the fence sitters/destroyers to decide :). However, some considerations:-
It works and appears to do what is wanted.
It's a little wasteful due to the additional column.
It tries to minimise that waste by clearing the column (setting it to NULL) once the trigger has used it.
There will obviously be an overhead, but in comparison to the alternatives it is probably negligible; it might matter if you were extracting every Facebook user, but if it's driven by user input it is likely irrelevant.
What is the best practice here?
Fences again. :)
Note: Hopefully obvious, but the DROP statements are purely for convenience, and all the other SQL up until the multi-row INSERT is run once to set up the tables and trigger in preparation for the single INSERT that adds a group if necessary.

Related

SQLite: treat non-existent column as NULL

I have a query like this (simplified and anonymised):
SELECT
Department.id,
Department.name,
Department.manager_id,
Employee.name AS manager_name
FROM
Department
LEFT OUTER JOIN Employee
ON Department.manager_id = Employee.id;
The field Department.manager_id may be NULL. If it is non-NULL then it is guaranteed to be a valid id for precisely one row in the Employee table, so the OUTER JOIN is there just for the rows in the Department table where it is NULL.
Here is the problem: old instances of the database do not have this Department.manager_id column at all. In those cases, I would like the query to act as if the field did exist but was always NULL, so e.g. the manager_name field is returned as NULL. If the query only used the Department table then I could just use SELECT * and check for the column in my application, but the JOIN seems to make this impossible. I would prefer not to modify the database, partly so that I can load the database in read only mode. Can this be done just by clever adjustment of the query?
For completeness, here is an answer that does not require munging both possible schemas into one query (but still doesn't need you to actually do the schema migration):
Check for the schema version, and use that to determine which SELECT query to issue (i.e. with or without the manager_id column and JOIN) as a separate step. Here are a few possibilities to determine the schema version:
The ideal situation is that you already keep track of the schema by assigning version numbers to the schema and recording them in the database. Commonly this is done with either:
The user_version pragma.
A table called "Schema" or similar with one row containing the schema version number.
You can directly determine whether the column is present in the table. Two possibilities:
Use the table_info pragma to determine the list of columns in the table.
Use a simple SELECT * FROM Table LIMIT 1 and look at what columns are returned (this is probably better as it is independent of the database engine).
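For illustration (not part of the original answers), those checks might look like this in SQLite:
PRAGMA user_version;               -- the schema version number recorded when migrating
PRAGMA table_info(Department);     -- one row per column; check whether manager_id is listed
SELECT * FROM Department LIMIT 1;  -- engine-independent: inspect the returned column names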
As for adjusting the query itself, this seems to work:
SELECT
Dept.id,
Dept.name,
Dept.manager_id,
Employee.name AS manager_name
FROM
(SELECT *, NULL AS manager_id FROM Department) AS Dept
LEFT OUTER JOIN Employee
ON Dept.manager_id = Employee.id;
If the manager_id column is present in Department then it is used for the join, whereas if it is not then Dept.manager_id and Employee.name are both NULL.
If I swap the column order in the subquery:
(SELECT NULL AS manager_id, * FROM Department) AS Dept
then the Dept.manager_id and Employee.name are both NULL even if the Department.manager_id column exists, so it seems that Dept.manager_id refers to the first column in the Dept subquery that has that name. It would be good to find a reference in the SQLite documentation saying that this behaviour is guaranteed (or explicitly saying that it is not), but I can't find anything (e.g. in the SELECT or expression pages).
I haven't tried this with other database systems so I don't know if it will work with anything other than SQLite.

Limiting the number of rows a table can contain based on the value of a column - SQLite

Since SQLite doesn't support TRUE and FALSE, I have a boolean column that stores 0 and 1. For the boolean column in question, I want a check on the number of 1's the column contains, limiting that total for the table.
For example, the table can have columns: name, isAdult. If there are more than 5 adults in the table, the system would not allow a user to add a 6th entry with isAdult = 1. There is no restriction on how many rows the table can contain, since there is no limit on the amount of entries where isAdult = 0.
You can use a trigger to prevent inserting the sixth entry:
CREATE TRIGGER five_adults
BEFORE INSERT ON MyTable
WHEN NEW.isAdult
     AND (SELECT COUNT(*)
          FROM MyTable
          WHERE isAdult
         ) >= 5
BEGIN
    SELECT RAISE(FAIL, "only five adults allowed");
END;
(You might need a similar trigger for UPDATEs.)
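A sketch of what such an UPDATE trigger might look like (same table and column names assumed; not from the original answer):
CREATE TRIGGER five_adults_update
BEFORE UPDATE OF isAdult ON MyTable
WHEN NEW.isAdult
     AND NOT OLD.isAdult
     AND (SELECT COUNT(*)
          FROM MyTable
          WHERE isAdult
         ) >= 5
BEGIN
    SELECT RAISE(FAIL, "only five adults allowed");
END;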
The SQL-99 standard would solve this with an ASSERTION, a type of constraint that can validate data changes with respect to an arbitrary SELECT statement. Unfortunately, I don't know of any SQL database currently on the market that implements ASSERTION constraints. It's an optional feature of the SQL standard, and SQL implementors are not required to provide it.
A workaround is to create a foreign key constraint so isAdult can be an integer value referencing a lookup table that contains only values 1 through 5. Then also put a UNIQUE constraint on isAdult. Use NULL for "false" when the row is for a user who is not an adult (NULL is ignored by UNIQUE).
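A sketch of that lookup-table workaround (the AdultSlots name and columns are made up for illustration; remember PRAGMA foreign_keys = ON so SQLite enforces the constraint):
CREATE TABLE AdultSlots (slot INTEGER PRIMARY KEY CHECK (slot BETWEEN 1 AND 5));
INSERT INTO AdultSlots (slot) VALUES (1), (2), (3), (4), (5);

CREATE TABLE MyTable (
    name    TEXT,
    isAdult INTEGER UNIQUE REFERENCES AdultSlots(slot)  -- NULL means "not an adult"
);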
Another workaround is to do this in application code: SELECT from the database before changing it, to make sure your change won't break your app's business rules. Normally in a multi-user RDBMS this is unreliable due to race conditions, but since you're using SQLite you might be the sole user.

Alternative to using subquery inside CHECK constraint?

I am trying to build a simple hotel room check-in database as a learning exercise.
CREATE TABLE HotelReservations
(
roomNum INTEGER NOT NULL,
arrival DATE NOT NULL,
departure DATE NOT NULL,
guestName CHAR(30) NOT NULL,
CONSTRAINT timeTraveler CHECK (arrival < departure), /* stops time travelers */
/* CONSTRAINT multipleReservations CHECK (my question is about this) */
PRIMARY KEY (roomNum, arrival)
);
I am having trouble specifying a constraint that doesn't allow inserting a new reservation for a room that has not yet been vacated. For example (below), guest 'B' checks into room 123 before 'A' checks out.
INSERT INTO HotelReservations(roomNum, arrival, departure, guestName)
VALUES
(123, date("2017-02-02"), date("2017-02-06"), 'A'),
(123, date("2017-02-04"), date("2017-02-08"), 'B');
This shouldn't be allowed but I am unsure how to write this constraint. My first attempt was to write a subquery in check, but I had trouble figuring out the proper subquery because I don't know how to access the 'roomNum' value of a new insert to perform the subquery with. I then also figured out that most SQL systems don't even allow subquerying inside of check.
So how am I supposed to write this constraint? I read some about triggers which seem like it might solve this problem, but is that really the only way to do it? Or am I just dense and missing an obvious way to write the constraint?
The documentation indeed says:
The expression of a CHECK constraint may not contain a subquery.
While it would be possible to create a user-defined function that goes back to the database and queries the table, the only reasonable way to implement this constraint is with a trigger.
There is a special mechanism to access the new row inside the trigger:
Both the WHEN clause and the trigger actions may access elements of the row being inserted, deleted or updated using references of the form "NEW.column-name" and "OLD.column-name", where column-name is the name of a column from the table that the trigger is associated with.
CREATE TRIGGER multiple_reservations_check
BEFORE INSERT ON HotelReservations
BEGIN
    SELECT RAISE(FAIL, "reservations overlap")
    FROM HotelReservations
    WHERE roomNum = NEW.roomNum        -- same room
      AND departure > NEW.arrival      -- existing stay ends after the new one starts
      AND arrival < NEW.departure;     -- and begins before the new one ends
END;
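For illustration (not part of the original answer), re-running the overlapping insert from the question against this trigger should now abort with the "reservations overlap" error when it reaches guest 'B':
INSERT INTO HotelReservations (roomNum, arrival, departure, guestName)
VALUES
(123, date('2017-02-02'), date('2017-02-06'), 'A'),   -- accepted
(123, date('2017-02-04'), date('2017-02-08'), 'B');   -- overlaps room 123, raises FAIL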

Cascading List of Values with many to many relationship

I am developing an application which tracks class attendance of students in a school, in Apex.
I want to create a page with three level cascading select lists, so the teacher can first select the Semester, then the Subject and then the specific Class of that Subject, so the application returns the Students who are enrolled in that Class.
My problem is that these three tables have a many-to-many relationship between them, so I use extra tables with their keys.
Every Semester has many Subjects and a Subject can be taught in many Semesters.
Every Subject has many classes in every Semester.
The students must enroll in a subject every semester and then the teacher can assign them to a class.
The tables look something like this:
create table semester(
id number not null,
name varchar2(20) not null,
primary key(id)
);
create table subject(
id number not null,
subject_name varchar2(50) not null,
primary key(id)
);
create table student(
id number not null,
name varchar2(20),
primary key(id)
);
create table semester_subject(
id number not null,
semester_id number not null,
subject_id number not null,
primary key(id),
foreign key(semester_id) references semester(id),
foreign key(subject_id) references subject(id),
constraint sem_sub_uq unique(semester_id, subject_id)
);
create table class(
id number not null,
name number not null,
semester_subject_id number not null,
primary key(id),
foreign key(semester_subject_id) references semester_subject(id)
);
create table class_enrollment(
id number not null,
student_id number not null,
semester_subject_id number not null,
class_id number,
primary key(id),
foreign key(student_id) references student(id),
foreign key(semester_subject_id) references semester_subject(id),
foreign key(class_id) references class(id)
);
The list of value for the Semester select list looks like this:
select name, id
from semester
order by 1;
The subject select list should include the names of all the Subjects available in the semester selected above, but I can't figure out the query, or even whether it's possible. What I have right now:
select s.name, s.id
from subject s, semester_subject ss
where ss.semester_id = :PX_SEMESTER -- value from above select list
and ss.subject_id = s.id;
But you can't have two tables in a LoV and the query is probably wrong anyway...
I didn't even begin to think about what the query for the class would look like.
I appreciate any help or if you can point me in the right direction so I can figure it out myself.
Developing an Apex Input Form Using Item-Parametrized Lists of Values (LOVs)
Your initial schema design looks good. One recommendation: once you've developed and tested your solution on a smaller scale, attach to the ID (primary key) columns a trigger that auto-populates their values from a sequence. You could also skip the trigger and just reference the sequence in your SQL INSERT DML commands. It just makes things simpler. Creating tables in the APEX environment with the built-in wizards offers the opportunity to make an "auto-incrementing" key column.
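A sketch of that sequence-plus-trigger pattern for one of the tables (the object names here are just for illustration, not from the original answer):
create sequence semester_seq start with 1 increment by 1;

create or replace trigger semester_bi
before insert on semester
for each row
when (new.id is null)
begin
    -- auto-populate the primary key from the sequence
    select semester_seq.nextval into :new.id from dual;
end;
/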
There is also an additional column added to the SEMESTER table called SORT_ID (used in the LOV query below). This helps when you are storing string values whose logical sort order isn't exactly alphanumeric.
Setting Up The Test Data Values
Test data was generated to demonstrate the cascading list of values design used in this example.
Making Dynamic List of Value Queries
The next step is to make the first three inter-dependent List of Values definitions. As you have discovered, you can reference page parameters in your LOVs which may come from a variety of sources. In this case, the choice selection from our LOVs will be assigned to Apex Page Items.
I also thought only one table could be referenced in a single LOV query. This is incorrect. The page documentation suggests that it is the SQL query syntax that is the limiting factor. The following LOV queries reference more than one table, and they work:
-- SEMESTER LOV Query
-- name: CHOOSE_SEMESTER
select a.name d, a.id r
from semester a
where a.id in (
    select b.semester_id
    from semester_subject b
    where b.subject_id = nvl(:P5_SUBJECT, b.subject_id))
order by a.sort_id

-- SUBJECT LOV Query
-- name: CHOOSE_SUBJECT
select a.subject_name d, a.id r
from subject a
where a.id in (
    select b.subject_id
    from semester_subject b
    where b.semester_id = nvl(:P5_SEMESTER, b.semester_id))
order by 1

-- CLASS LOV Query
-- name: CHOOSE_CLASS
select a.name d, a.id r
from class a, semester_subject b
where a.semester_subject_id = b.id
  and b.subject_id = :P5_SUBJECT
  and b.semester_id = :P5_SEMESTER
order by 1
Some design notes to consider:
Don't mind the P5_ITEM notation. The page in my sample app happened to be on "page 5" and so the convention goes.
I chose to assign a name for each LOV query as a hint. Don't just embed the query in an item. Add some breathing room for yourself as a developer by making the LOV a portable object that can be referenced elsewhere if needed.
MAKE a named LOV for each query through the SHARED OBJECTS menu option of your application designer.
The extra operator involving the NVL command, as in nvl(:P5_SUBJECT, b.subject_id) for the CHOOSE_SEMESTER LOV, is mirrored in the CHOOSE_SUBJECT query as well. If the default values of P5_SUBJECT and P5_SEMESTER are null when entering the page, how does that assist with the handling of the cascading relationships?
The table SEMESTER_SUBJECT represents a key relationship. Why is a LOV for this table not needed?
APEX Application Form Design Using Cascading LOVs
Setting up a page for testing the schema design and LOV queries requires the creation of three page items: P5_SEMESTER, P5_SUBJECT and P5_CLASS.
Each page item should be defined as a SELECT LIST; leave all the defaults initially until you understand how the basic design works. Each select list item should be associated with its corresponding LOV (CHOOSE_SEMESTER, CHOOSE_SUBJECT, CHOOSE_CLASS).
The key design twist is the Select List made for the CHOOSE_CLASS LOV, which represents a cascading dependency on more than one data source.
We will use the "Cascading Parent" option so that this item will wait until both CHOOSE_SEMESTER and CHOOSE_SUBJECT are selected. It will also refresh if either of the two are changed.
YES! The cascading parent item can consist of multiple page items/elements. They just have to be declared in a comma separated list.
From the online help info, this is a general introduction to how cascading LOVs can be used in APEX designs:
From Oracle Apex Help Docs: A cascading LOV means that the current item's list of values should be refreshed if the value of another item on this page gets changed.
Specify a comma separated list of page items to be used to trigger the refresh. You can then use those page items in the where clause of your "List of Values" SQL statement.
Demonstration of APEX Application Items with Cascading LOVs
These examples are based on the sample data given at the beginning of this solution. The path of the chosen example case is:
SEMESTER: SPRING 2014 + SUBJECT: PHYS ED + Verify Valid Course Options:
Fitness for Life
General Flexibility
Presidential Fitness Challenge
Running for Fun
Volleyball Basics
The choice from above will be assigned to page item P5_CLASS.
Screenshots (omitted here) show the selection choices for P5_SEMESTER, P5_SUBJECT and P5_CLASS.
Closing Remarks and Discussion
Some closing thoughts that occurred to me while working with this design project:
About the Primary Keys: The notion of a generic, ID named column for a primary key was a good design choice. While APEX can handle composite business keys, it gets clumsy and difficult to work around.
One thing that made the schema design challenging to work with was that the notion of "id" transformed in the other tables that referenced it (for example, the ID column in the SEMESTER table became SEMESTER_ID in the SEMESTER_SUBJECT table). Just keep an eye on these name changes in larger queries; at times I actually lost track of exactly which ID I was working with.
A Word for Sanity: In the likely event you decide to assign ID values through a database sequence object, the default is usually to begin at one. If several tables in your schema share the column name ID, and associating tables such as CLASS_ENROLLMENT connect one primary key ID with three additional foreign key IDs, it may get difficult to discern where the data values are coming from.
Consider offsetting your sequences, or arbitrarily choosing different increments and starting values. If you're mainly pushing IDs around in your queries and two different ID sets are separated by two or three orders of magnitude, it will be easy to tell whether you've pulled the right data values.
Are There MORE Cascading Relationships? If a "parent" item relationship indicates a dependency that makes a page item LOV wait or change depending on the value of another, could there be another cascading relationship to define? In the case of CHOOSE_SEMESTER and CHOOSE_SUBJECT is it possible? Is it necessary?
I was able to figure out how to make these two items hold an optional cascading dependency, but it required setting up another outside page item reference. (If it isn't optional, you get stuck in a closed loop as soon as one of the two values changes.) Fancy, but not really necessary to solve the problem at hand.
What's Left to Do? I left out some additional tasks for you to continue with, such as managing the DML into the CLASS_ENROLLMENT table after selecting a valid STUDENT.
Overall, you've got a workable schema design. There is a way to represent the data relationships through an APEX application design pattern. Happy coding, it looks like a challenging project!

Hierarchical Database Select / Insert Statement (SQL Server)

I have recently stumbled upon a problem with selecting relationship details from one table and inserting them into another table; I hope someone can help.
I have a table structure as follows:
ID (PK)   Name        ParentID
1         Myname      0
2         nametwo     1
3         namethree   2
This is the table I need to select from to get all the relationship data, as there could be an unlimited number of sub-links (is there a function I can create to handle the looping?).
Then, once I have all the data, I need to insert it into another table, and the IDs will have to change because they must go in order (e.g. I cannot have id 2 be a sub of 3). I am hoping I can use the same function used for selecting to do the inserting.
If you are using SQL Server 2005 or above, you may use recursive queries to get your information. Here is an example:
With tree (id, Name, ParentID, [level])
As (
    Select id, Name, ParentID, 1
    From [myTable]
    Where ParentID = 0
    Union All
    Select child.id
          ,child.Name
          ,child.ParentID
          ,parent.[level] + 1 As [level]
    From [myTable] As [child]
    Inner Join [tree] As [parent]
        On [child].ParentID = [parent].id)
Select * From [tree];
This query will return the row requested by the first portion (Where ParentID = 0) and all sub-rows recursively. Does this help you?
I'm not sure I understand what you want to have happen with your insert. Can you provide more information in terms of the expected result when you are done?
Good luck!
For the retrieval part, you can take a look at Common Table Expressions (CTEs). This feature can provide recursive operations in SQL.
For the insertion part, you can use the CTE above to regenerate the ID, and insert accordingly.
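A sketch of how that might look, reusing the recursive CTE from the earlier answer and ROW_NUMBER to regenerate IDs in parent-before-child order (the target table name [myNewTable] is assumed for illustration):
With tree (id, Name, ParentID, [level])
As (
    Select id, Name, ParentID, 1
    From [myTable]
    Where ParentID = 0
    Union All
    Select child.id, child.Name, child.ParentID, parent.[level] + 1
    From [myTable] As [child]
    Inner Join [tree] As [parent] On [child].ParentID = [parent].id)
Insert Into [myNewTable] (id, Name, ParentID)
Select Row_Number() Over (Order By [level], id) As new_id,  -- new ids increase parent-first
       Name,
       ParentID   -- note: ParentID still refers to the old ids and would need remapping
From tree;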
I hope this URL helps: Self-Joins in SQL
This is the problem of finding the transitive closure of a graph in sql. SQL does not support this directly, which leaves you with three common strategies:
use a vendor specific SQL extension
store the Materialized Path from the root to the given node in each row
store the Nested Sets, that is the interval covered by the subtree rooted at a given node when nodes are labeled depth first
The first option is straightforward, and if you don't need database portability it is probably the best. The second and third options have the advantage of being plain SQL, but require maintaining some de-normalized state. Updating a table that uses materialized paths is simple, but for fast queries your database must support indexes for prefix queries on string values. Nested sets avoid needing any string indexing features, but can require updating a lot of rows as you insert or remove nodes.
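For illustration, a minimal materialized-path layout on the existing table might look like this (a sketch, not from the original answers):
-- Add a Path column holding the ids from the root down to the row itself,
-- e.g. '/1/2/3/' for node 3 under node 2 under node 1.
Alter Table [myTable] Add Path varchar(900);
-- All descendants of node 1: a prefix query that can use an index on Path.
Select * From [myTable] Where Path Like '/1/%';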
If you're fine with always using MSSQL, I'd use the vendor specific option Adrian mentioned.
