Simple database structure to link subjects and exam boards - sqlite

I'm new to SQL and want to use SQLite to create a database to store data about which topics are needed by which exam boards.
So far I've created two tables - one called board and one called topic. I'm not sure how I should represent the relationship between boards and topics. I have read a bit about Normal Form, and I'm pretty sure that I shouldn't be putting multiple entries into one field, and also that having fields like topic1, topic2 etc. is not the way to go.
My SQL so far is below. Could someone please help me with the next step: how to make this database actually work for my requirements while not breaking every rule in the book?
For example, I'd like to be able to quickly find out which boards require knowledge of set theory or reciprocal functions etc.
Thanks in advance.
BEGIN TRANSACTION;
CREATE TABLE IF NOT EXISTS "topic" (
"id" INTEGER PRIMARY KEY AUTOINCREMENT,
"topic_name" TEXT NOT NULL,
"level" INTEGER
);
CREATE TABLE IF NOT EXISTS "board" (
"id" INTEGER PRIMARY KEY AUTOINCREMENT,
"board_name" TEXT NOT NULL UNIQUE,
"link_to_syllabus" TEXT
);
INSERT INTO "topic" ("id","topic_name","level") VALUES (1,'Pythagoras'' Theorem','F');
INSERT INTO "topic" ("id","topic_name","level") VALUES (2,'Circle theorems','H');
INSERT INTO "topic" ("id","topic_name","level") VALUES (3,'',NULL);
INSERT INTO "board" ("id","board_name","link_to_syllabus") VALUES (0,'Edexcel','https://qualifications.pearson.com/en/qualifications/edexcel-gcses/mathematics-2015.html');
INSERT INTO "board" ("id","board_name","link_to_syllabus") VALUES (1,'OCR','https://www.ocr.org.uk/qualifications/gcse/mathematics-j560-from-2015/');
COMMIT;

If I understood correctly, a board can have one or more topics, and a topic can belong to one or more boards. If so, you are looking for a many-to-many relationship, which is modelled with an association table:
CREATE TABLE board_topic (
board_id INTEGER NOT NULL
REFERENCES board (id),
topic_id INTEGER NOT NULL
REFERENCES topic (id),
CONSTRAINT pk PRIMARY KEY (
board_id,
topic_id
)
);
As for the query you asked for, once you have inserted some data into the association table shown above, it looks like this:
SELECT board.id, board_name FROM board
JOIN board_topic ON board.id = board_topic.board_id
JOIN topic ON topic.id = board_topic.topic_id
WHERE topic_name = 'Circle theorems';
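To try the association table end to end, here is a minimal sketch using Python's built-in sqlite3 module. The rows in board_topic are made up for illustration (they are not real syllabus data), the level column is declared TEXT here because the sample values are 'F' and 'H', and the helper name boards_for_topic is my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE topic (id INTEGER PRIMARY KEY, topic_name TEXT NOT NULL, level TEXT);
CREATE TABLE board (id INTEGER PRIMARY KEY, board_name TEXT NOT NULL UNIQUE);
CREATE TABLE board_topic (
    board_id INTEGER NOT NULL REFERENCES board (id),
    topic_id INTEGER NOT NULL REFERENCES topic (id),
    PRIMARY KEY (board_id, topic_id)
);
INSERT INTO topic VALUES (1, 'Pythagoras'' Theorem', 'F'), (2, 'Circle theorems', 'H');
INSERT INTO board VALUES (0, 'Edexcel'), (1, 'OCR');
-- Hypothetical links: both boards need topic 1, only OCR needs topic 2.
INSERT INTO board_topic VALUES (0, 1), (1, 1), (1, 2);
""")


def boards_for_topic(name):
    """Return the names of all boards linked to the given topic."""
    rows = conn.execute(
        """SELECT board.board_name FROM board
           JOIN board_topic ON board.id = board_topic.board_id
           JOIN topic ON topic.id = board_topic.topic_id
           WHERE topic.topic_name = ?
           ORDER BY board.board_name""",
        (name,),
    )
    return [r[0] for r in rows]


print(boards_for_topic("Circle theorems"))
print(boards_for_topic("Pythagoras' Theorem"))
```

Passing the topic name as a parameter (the ? placeholder) avoids quoting problems with names such as Pythagoras' Theorem.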

Related

What if a table doesn't have a primary key

I have made a simple relational design consisting of three tables:
Table for storing personal data (Table_Person)
Table for storing address data (Table_Address)
Table to store the relationship between Table_Person and Table_Address (Table_PersonAddress).
What I want to ask is can I delete the primary key in Table_PersonAddress so that Table_PersonAddress doesn't have a primary key and all that's left is the personID and addressID?
Below is an example of the database relation that I made:
[image: diagram of the three related tables]
Assuming you don't have any foreign key constraints setup on the junction table (that is, the third table which just stores relationships between people and their addresses), you could delete a person from the first table, while leaving behind the relationships in the third table. However, just because you could do this, does not mean you would want to. Most of the time, if you remove a person from the first table, you would also want to remove all of his relationships from the third table. One way to do this in SQLite is by adding cascading delete constraints to the third table, when you create it:
CREATE TABLE Table_PersonAddress (
...
CONSTRAINT fk_person
FOREIGN KEY (personID)
REFERENCES Table_Person (ID)
ON DELETE CASCADE
)
You probably would also want to add a similar constraint for the address field in the third table, since removing an address also invalidates all relationships involving that address.
Note that SQLite does not allow a cascading delete constraint to be added to a table that already exists. You will have to recreate your tables in order to add these constraints.
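As a rough illustration of the cascading delete, the sketch below builds the three tables in an in-memory database with Python's sqlite3 module. The name and street columns and the sample rows are assumptions for the demo. One thing to keep in mind: SQLite ignores foreign key constraints (including ON DELETE CASCADE) unless PRAGMA foreign_keys = ON has been run on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Without this pragma the cascade below would silently do nothing.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Table_Person (ID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Table_Address (ID INTEGER PRIMARY KEY, street TEXT);
CREATE TABLE Table_PersonAddress (
    personID INTEGER NOT NULL,
    addressID INTEGER NOT NULL,
    PRIMARY KEY (personID, addressID),
    CONSTRAINT fk_person FOREIGN KEY (personID)
        REFERENCES Table_Person (ID) ON DELETE CASCADE,
    CONSTRAINT fk_address FOREIGN KEY (addressID)
        REFERENCES Table_Address (ID) ON DELETE CASCADE
);
INSERT INTO Table_Person VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Table_Address VALUES (10, '1 High St'), (20, '2 Low Rd');
INSERT INTO Table_PersonAddress VALUES (1, 10), (1, 20), (2, 10);
""")

# Deleting person 1 also removes both of that person's rows in the junction table.
conn.execute("DELETE FROM Table_Person WHERE ID = 1")
conn.commit()

remaining = conn.execute(
    "SELECT personID, addressID FROM Table_PersonAddress ORDER BY personID, addressID"
).fetchall()
print(remaining)
```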
You can delete it, but my advice is to set a composite PRIMARY KEY for the 2 columns personID and addressID so each row is guaranteed to be UNIQUE.
PRIMARY KEY (personID, addressID)
Also remember that in SQLite you always have the rowid column, which you can use as an id of the row if needed.
So create the table with this statement:
DROP TABLE IF EXISTS PersonAddress;
CREATE TABLE PersonAddress (
personID INTEGER,
addressID INTEGER,
PRIMARY KEY(personID, addressID),
FOREIGN KEY (personID) REFERENCES Person (personID) ON DELETE CASCADE,
FOREIGN KEY (addressID) REFERENCES Address (addressID) ON DELETE CASCADE
);
One more thing: why did you define personID and addressID as TEXT?
Admittedly, SQLite is not at all strict about data type definitions, but since the columns they reference are INTEGER, they should be INTEGER too.
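To see what the composite primary key buys you, here is a small sketch (Python's sqlite3 module, made-up ids) that tries to store the same person/address pair twice. The foreign keys are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE PersonAddress (
    personID INTEGER,
    addressID INTEGER,
    PRIMARY KEY (personID, addressID)
)""")
conn.execute("INSERT INTO PersonAddress VALUES (1, 10)")
conn.execute("INSERT INTO PersonAddress VALUES (1, 20)")  # same person, other address: allowed

try:
    conn.execute("INSERT INTO PersonAddress VALUES (1, 10)")  # exact duplicate pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The implicit rowid is still available as a per-row id.
rows = conn.execute(
    "SELECT rowid, personID, addressID FROM PersonAddress ORDER BY rowid"
).fetchall()
print(duplicate_rejected, rows)
```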

Efficient insertion of row and foreign table row if it does not exist

Similar to this question and this solution for PostgreSQL (in particular "INSERT missing FK rows at the same time"):
Suppose I am making an address book with a "Groups" table and a "Contact" table. When I create a new Contact, I may want to place them into a Group at the same time. So I could do:
INSERT INTO Contact VALUES (
'Bob',
(SELECT group_id FROM Groups WHERE name = 'Friends')
);
But what if the "Friends" Group doesn't exist yet? Can we insert this new Group efficiently?
The obvious thing is to do a SELECT to test if the Group exists already; if not, do an INSERT. Then do an INSERT into Contact with the sub-SELECT above.
Or I can constrain Groups.name to be UNIQUE, do an INSERT OR IGNORE, then INSERT into Contact with the sub-SELECT.
I can also keep my own cache of which Groups exist, but that seems like I'm duplicating functionality of the database in the first place.
My guess is that there is no way to do this in one query, since INSERT does not return anything and cannot be used in a subquery. Is that intuition correct? What is the best practice here?
"My guess is that there is no way to do this in one query, since INSERT does not return anything and cannot be used in a subquery. Is that intuition correct?"
You could use a Trigger and a little modification of the tables and then you could do it with a single query.
For example, consider the following.
Purely for convenience of producing the demo :-
DROP TRIGGER IF EXISTS add_group_if_not_exists;
DROP TABLE IF EXISTS contact;
DROP TABLE IF EXISTS groups;
One-time setup SQL :-
CREATE TABLE IF NOT EXISTS groups (id INTEGER PRIMARY KEY, group_name TEXT UNIQUE);
INSERT INTO groups VALUES(-1,'NOTASSIGNED');
CREATE TABLE IF NOT EXISTS contact (id INTEGER PRIMARY KEY, contact TEXT, group_to_use TEXT, group_reference INTEGER DEFAULT -1 REFERENCES groups(id));
CREATE TRIGGER IF NOT EXISTS add_group_if_not_exists
AFTER INSERT ON contact
BEGIN
INSERT OR IGNORE INTO groups (group_name) VALUES(new.group_to_use);
UPDATE contact SET group_reference = (SELECT id FROM groups WHERE group_name = new.group_to_use), group_to_use = NULL WHERE id = new.id;
END;
SQL that would be used on an ongoing basis :-
INSERT INTO contact (contact,group_to_use) VALUES
('Fred','Friends'),
('Mary','Family'),
('Ivan','Enemies'),
('Sue','Work colleagues'),
('Arthur','Fellow Rulers'),
('Amy','Work colleagues'),
('Henry','Fellow Rulers'),
('Canute','Fellow Ruler')
;
The number of values and the actual values would vary.
SQL Just for demonstration of the result
SELECT * FROM groups;
SELECT contact,group_name FROM contact JOIN groups ON group_reference = groups.id;
Results
This results in :-
1) The groups (noting that the group "NOTASSIGNED", is intrinsic to the working of the above and hence added initially) :-
You have to be careful about mistakes like 'Fellow Ruler' instead of 'Fellow Rulers', because a misspelt name creates a separate group.
-1 is used for NOTASSIGNED because it is not a value that would normally be generated automatically.
2) The contacts with the respective group :-
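For anyone who wants to reproduce the results, here is a cut-down version of the demo driven from Python's sqlite3 module. It is only a sketch: it uses three of the sample contacts and declares group_reference as INTEGER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE groups (id INTEGER PRIMARY KEY, group_name TEXT UNIQUE);
INSERT INTO groups VALUES (-1, 'NOTASSIGNED');
CREATE TABLE contact (
    id INTEGER PRIMARY KEY,
    contact TEXT,
    group_to_use TEXT,
    group_reference INTEGER DEFAULT -1 REFERENCES groups (id)
);
CREATE TRIGGER add_group_if_not_exists AFTER INSERT ON contact
BEGIN
    -- Create the group only if its name is not already present.
    INSERT OR IGNORE INTO groups (group_name) VALUES (new.group_to_use);
    -- Point the new contact at the group and clear the helper column.
    UPDATE contact
       SET group_reference = (SELECT id FROM groups WHERE group_name = new.group_to_use),
           group_to_use = NULL
     WHERE id = new.id;
END;
INSERT INTO contact (contact, group_to_use) VALUES
    ('Fred', 'Friends'), ('Sue', 'Work colleagues'), ('Amy', 'Work colleagues');
""")

group_names = [r[0] for r in conn.execute("SELECT group_name FROM groups ORDER BY group_name")]
pairs = conn.execute(
    """SELECT contact, group_name FROM contact
       JOIN groups ON group_reference = groups.id
       ORDER BY contact"""
).fetchall()
print(group_names)
print(pairs)
```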
Efficient insertion
That could likely be debated from here to eternity so I leave it for the fence sitters/destroyers to decide :). However, some considerations:-
It works and appears to do what is wanted.
It's a little wasteful due to the additional wasted column.
It tries to minimise the waste by resetting the column to NULL once the group has been looked up (an empty string would also work and may be less confusing for some).
There will obviously be an overhead BUT in comparison to the alternatives probably negligible (perhaps important if you were extracting every Facebook user) but if it's user input driven likely irrelevant.
What is the best practice here?
Fences again. :)
Note: hopefully obvious, but the DROP statements are purely for convenience, and all the other SQL up to the INSERT is run once to set up the tables and the trigger in preparation for the single INSERT that adds a group if necessary.
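For comparison, the two-statement route mentioned in the question (UNIQUE name plus INSERT OR IGNORE, then the INSERT with the sub-SELECT) could look roughly like the sketch below. The column layout and the add_contact helper are assumptions for the demo; both statements run inside one transaction, so a contact is never stored without its group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Groups (group_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE Contact (name TEXT NOT NULL, group_id INTEGER REFERENCES Groups (group_id));
""")


def add_contact(contact_name, group_name):
    """Insert the group if it is missing, then the contact that points at it."""
    # "with conn" commits on success and rolls back if either statement fails.
    with conn:
        conn.execute("INSERT OR IGNORE INTO Groups (name) VALUES (?)", (group_name,))
        conn.execute(
            "INSERT INTO Contact VALUES (?, (SELECT group_id FROM Groups WHERE name = ?))",
            (contact_name, group_name),
        )


add_contact("Bob", "Friends")
add_contact("Alice", "Friends")  # the group already exists, so only the contact is added

group_count = conn.execute("SELECT COUNT(*) FROM Groups").fetchone()[0]
contact_groups = conn.execute(
    "SELECT Contact.name, Groups.name FROM Contact JOIN Groups USING (group_id) ORDER BY Contact.name"
).fetchall()
print(group_count, contact_groups)
```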

View Creation in SQLite

Having viewed several materials both here on Stack Overflow and in other external sources, I am attempting to write a piece of code to generate a view that performs a calculation using data from other tables. The aim of the code is to:
Acquire username from the UserDetails table
Acquire weight in pounds from the UserDetails table
Acquire distance_in_miles from the CardiovascularRecords table
Use the weight and distance in the calculation shown below, which is output as the caloriesBurned column.
My attempt can be seen below:
CREATE VIEW CardioCaloriesBurned (username, weight, distance, caloriesBurned) AS
SELECT UserDetails.username, UserDetails.weight_in_pounds,
CardiovascularRecords.distance_in_miles ,
((0.75 X weight_in_pounds) X distance_in_miles)
FROM UserDetails, CardiovascularRecords
If anyone could help in correcting this issue, it would be greatly appreciated.
Edit: I am getting a syntax error in SQLite Manager relating to a "(", however I'm not seeing any issues myself.
Edit: Code for the Cardiovascular Table and UserDetails table below:
CREATE TABLE "CardiovascularRecords" ("record_ID" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL UNIQUE CHECK (record_ID>0) , "distance_in_miles" REAL NOT NULL CHECK (distance_in_miles>0) , "username" TEXT NOT NULL , "date" DATETIME DEFAULT CURRENT_DATE, "notes" TEXT(50), "start_time" TEXT, "end_time" TEXT, FOREIGN KEY(distance_in_miles) REFERENCES DistanceinMiles(distance_in_miles) ON DELETE CASCADE ON UPDATE CASCADE,FOREIGN KEY(username) REFERENCES UserDetails(username) ON DELETE CASCADE ON UPDATE CASCADE)
CREATE TABLE "UserDetails" ("username" TEXT PRIMARY KEY NOT NULL UNIQUE CHECK (length(username)>0), "password" TEXT NOT NULL CHECK (length(password)>3), "email_address" TEXT NOT NULL UNIQUE CHECK (length(email_address)>3) , "weight_in_pounds" REAL NOT NULL CHECK(weight_in_pounds>0) CHECK (length(weight_in_pounds)>0), "height_in_inches" REAL NOT NULL CHECK(height_in_inches>0) CHECK (length(height_in_inches)>0), "age" INTEGER CHECK(age>0), WITHOUT ROWID)
Thanks
JHB92
Your immediate error message is probably caused by the fact that you're using the letter X to indicate multiplication. It should be * (Shift+8 on US keyboards).
After you've solved that problem, you may find that you have many, many more results in the output than you expect. That's because you're not telling the database which record in CardiovascularRecords to "hook up" for the purpose of doing the calculation. Without a join condition, FROM UserDetails, CardiovascularRecords pairs every user with every record, which is probably not the result you want.
If you're not getting what you want, you must tell us which column or columns in UserDetails "point to" the record in CardiovascularRecords that applies to any given calculation.
CREATE VIEW CardioCaloriesBurned AS
SELECT UserDetails.username AS username,
UserDetails.weight_in_pounds AS weight,
CardiovascularRecords.distance_in_miles AS distance,
((0.75 * weight_in_pounds) * distance_in_miles) AS caloriesBurned
FROM UserDetails INNER JOIN CardiovascularRecords
ON UserDetails.username = CardiovascularRecords.username
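To make the difference between the comma-separated FROM and the INNER JOIN concrete, here is a small sketch using Python's sqlite3 module. The tables are cut down to the columns the view needs and the sample users and distances are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserDetails (username TEXT PRIMARY KEY, weight_in_pounds REAL NOT NULL);
CREATE TABLE CardiovascularRecords (
    record_ID INTEGER PRIMARY KEY,
    username TEXT NOT NULL REFERENCES UserDetails (username),
    distance_in_miles REAL NOT NULL
);
INSERT INTO UserDetails VALUES ('ann', 150), ('bob', 200);
INSERT INTO CardiovascularRecords VALUES (1, 'ann', 2), (2, 'bob', 3), (3, 'bob', 1);

CREATE VIEW CardioCaloriesBurned AS
SELECT u.username AS username,
       u.weight_in_pounds AS weight,
       c.distance_in_miles AS distance,
       0.75 * u.weight_in_pounds * c.distance_in_miles AS caloriesBurned
FROM UserDetails AS u
INNER JOIN CardiovascularRecords AS c ON u.username = c.username;
""")

# No join condition: every user is paired with every record (2 x 3 rows).
cross_rows = conn.execute(
    "SELECT COUNT(*) FROM UserDetails, CardiovascularRecords"
).fetchone()[0]

# With the join condition: one row per cardio record, matched to its own user.
view_rows = conn.execute(
    "SELECT username, caloriesBurned FROM CardioCaloriesBurned ORDER BY username, distance"
).fetchall()
print(cross_rows, view_rows)
```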

Sqlite, is Primary Key important if I don't need auto-increment?

I only use an INTEGER primary key ID for its "auto-increment function".
What if I don't need "auto-increment"? Do I still need a primary key if I don't care about the uniqueness of records?
Example: let's compare this table:
create table if not exists `table1`
(
name text primary key,
tel text,
address text
);
with this:
create table if not exists `table2`
(
name text,
tel text,
address text
);
table1 has a primary key and table2 doesn't. Will anything bad happen with table2?
I don't need the record to be unique.
SQLite is a relational database system. So it's all about relations. You build relations between tables on keys.
You can have tables without a primary key; it is not necessary for a table to have a primary key. But you will almost always want a primary key to show what makes a record unique in that table and to build relations.
In your example, what would it mean to have two identical records? They would mean the same person, no? Then how would you count how many persons named Anna are in the database? If you count five, how many of them are unique, how many are mere duplicates? Such queries can be done properly, but get overly complicated because of the lacking primary key. And how would you build relations, say the cars a person drives? You would have a car table and then how to link it to the persons table in your example?
There are cases when you want a table without a primary key. These are usually log tables and the like. They are rare. Whenever you are creating a table without a primary key, ask yourself why this is the case. Maybe you are about to build something messy ;-)
You get auto-incrementing primary keys only when a column is declared as INTEGER PRIMARY KEY; other data types result in plain primary keys.
You are not required to declare a PRIMARY KEY.
But even if you do not do this, there will be some column(s) used to identify and look up records.
The PRIMARY KEY declaration helps to document this, enforces uniqueness, and optimizes lookups through the implicit index.
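A quick way to see both effects (uniqueness and the implicit index) is the sketch below, which uses Python's sqlite3 module on the two tables from the question. The sample row is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (name TEXT PRIMARY KEY, tel TEXT, address TEXT);
CREATE TABLE table2 (name TEXT, tel TEXT, address TEXT);
""")

row = ("Anna", "555-0100", "1 High St")
for table in ("table1", "table2"):
    conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", row)

# Insert the same row a second time and record what each table does with it.
outcome = {}
for table in ("table1", "table2"):
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", row)
        outcome[table] = "duplicate stored"
    except sqlite3.IntegrityError:
        outcome[table] = "duplicate rejected"

# PRAGMA index_list returns one row per index on the table.
index_counts = {
    table: len(conn.execute(f"PRAGMA index_list({table})").fetchall())
    for table in ("table1", "table2")
}
print(outcome)
print(index_counts)
```

Lookups by name on table2 have to scan the whole table unless you add an index yourself.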

Is it possible to (emulate?) AUTOINCREMENT on a compound-PK in Sqlite?

According to the SQLite docs, the only way to get an auto-increment column is on the primary key.
I need a compound primary key, but I also need auto-incrementing. Is there a way to achieve both of these in SQLite?
Relevant portion of my table as I would write it in PostgreSQL:
CREATE TABLE tstage (
id SERIAL NOT NULL,
node INT REFERENCES nodes(id) NOT NULL,
PRIMARY KEY (id,node),
-- ... other columns
);
The reason for this requirement is that all nodes eventually dump their data to a single centralized node where, with a single-column PK, there would be collisions.
The documentation is correct.
However, it is possible to reimplement the autoincrement logic in a trigger:
CREATE TABLE tstage (
id INT, -- allow NULL to be handled by the trigger
node INT REFERENCES nodes(id) NOT NULL,
PRIMARY KEY (id, node)
);
CREATE TABLE tstage_sequence (
seq INTEGER NOT NULL
);
INSERT INTO tstage_sequence VALUES(0);
CREATE TRIGGER tstage_id_autoinc
AFTER INSERT ON tstage
FOR EACH ROW
WHEN NEW.id IS NULL
BEGIN
UPDATE tstage_sequence
SET seq = seq + 1;
UPDATE tstage
SET id = (SELECT seq
FROM tstage_sequence)
WHERE rowid = NEW.rowid;
END;
(Or use a common my_sequence table with the table name if there are multiple tables.)
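As a quick check of the trigger approach, the sketch below loads the same schema through Python's sqlite3 module and inserts three rows with a NULL id. The nodes table layout and the sample node numbers are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY);
INSERT INTO nodes VALUES (1), (2);
CREATE TABLE tstage (
    id INT,  -- NULL on insert; the trigger fills it in
    node INT REFERENCES nodes (id) NOT NULL,
    PRIMARY KEY (id, node)
);
CREATE TABLE tstage_sequence (seq INTEGER NOT NULL);
INSERT INTO tstage_sequence VALUES (0);
CREATE TRIGGER tstage_id_autoinc
AFTER INSERT ON tstage
FOR EACH ROW
WHEN NEW.id IS NULL
BEGIN
    UPDATE tstage_sequence SET seq = seq + 1;
    UPDATE tstage
       SET id = (SELECT seq FROM tstage_sequence)
     WHERE rowid = NEW.rowid;
END;
INSERT INTO tstage (id, node) VALUES (NULL, 1), (NULL, 1), (NULL, 2);
""")

rows = conn.execute("SELECT id, node FROM tstage ORDER BY id").fetchall()
print(rows)
```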
A trigger works, but is complex. More simply, you could avoid serial ids. One approach is to use a GUID. Unfortunately I couldn't find a way to have SQLite generate the GUID for you by default, so you'd have to generate it in your application. There also isn't a GUID type, but you could store it as a string or a binary blob.
Or, perhaps there is something in your other columns that would serve as a suitable key. If you know that inserts won't happen more frequently than the resolution of your timestamp format of choice (SQLite offers several, see section 1.2), then maybe (node, timestamp_column) is a good primary key.
Or, you could use SQLite's AUTOINCREMENT, but set the starting number on each node via the sqlite_sequence table such that the generated serials won't collide. Since rowid in SQLite is a signed 64-bit number, you could do this by generating a unique 32-bit number for each node (IP addresses are a convenient, probably unique 32-bit number) and shifting it left 32 bits, or equivalently, multiplying it by 4294967296. Thus, the 64-bit rowid becomes effectively two concatenated 32-bit numbers, NODE_ID, RECORD_ID, guaranteed not to collide unless one node generates over four billion records. Because rowid is signed, the node number has to stay below 2^31 for the shifted value to remain a valid positive rowid.
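A minimal sketch of the sqlite_sequence idea, using Python's sqlite3 module. NODE_ID and the payload column are hypothetical; the point is only that seeding sqlite_sequence makes AUTOINCREMENT continue from the node's own range.

```python
import sqlite3

NODE_ID = 3  # hypothetical node number: unique per node and below 2**31

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tstage (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")

# sqlite_sequence records the largest id handed out so far for each
# AUTOINCREMENT table; seeding it moves the starting point for this node.
conn.execute("DELETE FROM sqlite_sequence WHERE name = 'tstage'")
conn.execute(
    "INSERT INTO sqlite_sequence (name, seq) VALUES ('tstage', ?)",
    (NODE_ID << 32,),
)

conn.execute("INSERT INTO tstage (payload) VALUES ('first'), ('second')")
conn.commit()

ids = [r[0] for r in conn.execute("SELECT id FROM tstage ORDER BY id")]
# Split each id back into (node number, per-node record number).
decoded = [(i >> 32, i & 0xFFFFFFFF) for i in ids]
print(ids, decoded)
```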
How about...
ASSUMPTIONS
Only need uniqueness in PK, not sequential-ness
Source table has a PK
Create the central table with one extra column, the node number...
CREATE TABLE tstage (
node INTEGER NOT NULL,
id INTEGER NOT NULL, -- or whatever the source table PK is
-- ... other columns
PRIMARY KEY (node, id)
);
When you roll up the data into the centralized node, insert the number of the source node into 'node' and set 'id' to the source table's PRIMARY KEY column value...
INSERT INTO tstage (node, id, ...) VALUES (nodenumber, sourcetable_id, ...);
There's no need to maintain another autoincrementing column on the central table because nodenumber+sourcetable_id will always be unique.
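The roll-up could be sketched like this with Python's sqlite3 module. The payload column and the rows for the two nodes are invented; note that both nodes reuse the same local ids without colliding in the central table.

```python
import sqlite3

central = sqlite3.connect(":memory:")
central.execute("""
CREATE TABLE tstage (
    node INTEGER NOT NULL,
    id INTEGER NOT NULL,
    payload TEXT,
    PRIMARY KEY (node, id)
)""")

# Hypothetical exports from two nodes; each node numbers its rows from 1.
node_rows = {
    1: [(1, "a"), (2, "b")],
    2: [(1, "c"), (2, "d")],
}

for node, rows in node_rows.items():
    # Tag every row with the number of the node it came from.
    central.executemany(
        "INSERT INTO tstage (node, id, payload) VALUES (?, ?, ?)",
        [(node, local_id, payload) for local_id, payload in rows],
    )
central.commit()

merged = central.execute(
    "SELECT node, id, payload FROM tstage ORDER BY node, id"
).fetchall()
print(merged)
```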

Resources