Defining unique constraint creating a USI - teradata

I have a requirement to enforce uniqueness on a column. However, when I define a unique constraint, Teradata automatically defines a USI on that column. I don't want this to happen. How can I enforce uniqueness without creating a USI?
create table cfdw2_samods_stg.STORE_DIM_968
(
store_id integer ,
store_name varchar(100) not null,
store varchar(100) not null,
CONSTRAINT STORE_DIM_968_pk unique ( store_name ),
CONSTRAINT STORE_DIM_968_dpk unique ( store )
) unique primary index(store_id)

Teradata automatically enforces the UNIQUE constraint applied to a column as a USI. As a bonus, a USI provides a 2-AMP operation when accessing the table via the USI.
EDIT
Maintaining the USI subtable adds overhead, mainly additional I/O. The CPU overhead isn't terrible. Hard RI in Teradata is another option, but then you have the overhead of maintaining the RI reference.
You can use DBQL to measure the actual cost of maintaining the USI subtable in terms of CPU and I/O by loading the table with and without the UNIQUE constraint. I have seen tables with tens of billions of rows enforce uniqueness via a USI without significant overhead on the ETL.
Lastly, you can enforce the uniqueness of the column in your ETL code and make sure you don't insert duplicate values.
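A minimal sketch of what such an ETL-side check could look like, in Python. The function name, the row layout and the sample values are all made up for illustration; a real load would fetch the existing keys from the target table and would probably do this set-based in SQL rather than row by row.

```python
def split_new_and_duplicate(rows, existing_keys, key):
    """Split a batch into rows that are safe to load and rows that would break uniqueness."""
    seen = set(existing_keys)
    fresh, rejected = [], []
    for row in rows:
        value = row[key]
        if value in seen:
            # Already in the target table, or repeated earlier in this batch.
            rejected.append(row)
        else:
            seen.add(value)
            fresh.append(row)
    return fresh, rejected


# Hypothetical data: store_name values already present in the target table.
existing = {"Downtown"}
batch = [
    {"store_id": 2, "store_name": "Airport"},
    {"store_id": 3, "store_name": "Downtown"},  # clashes with an existing row
    {"store_id": 4, "store_name": "Airport"},   # clashes within the batch
]
fresh, rejected = split_new_and_duplicate(batch, existing, "store_name")
```

Only the rows in fresh would be inserted; the rejected rows can be logged or routed to an error table.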

Related

Limiting the number of rows a table can contain based on the value of a column - SQLite

Since SQLite doesn't support TRUE and FALSE, I have a boolean column that stores 0 and 1. For the boolean column in question, I want a check on the number of 1's the column contains, limiting the total for the table.
For example, the table can have columns: name, isAdult. If there are already 5 adults in the table, the system should not allow a user to add a 6th entry with isAdult = 1. There is no restriction on how many rows the table can contain, since there is no limit on the number of entries where isAdult = 0.
You can use a trigger to prevent inserting the sixth entry:
CREATE TRIGGER five_adults
BEFORE INSERT ON MyTable
WHEN NEW.isAdult
AND (SELECT COUNT(*)
FROM MyTable
WHERE isAdult
) >= 5
BEGIN
SELECT RAISE(FAIL, 'only five adults allowed');
END;
(You might need a similar trigger for UPDATEs.)
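Here is a quick way to try the trigger from Python's sqlite3 module, including one possible shape for the UPDATE trigger. The UPDATE trigger and the sample rows are my own sketch, not something from the question, and it assumes isAdult only ever holds 0 or 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (name TEXT, isAdult INTEGER NOT NULL DEFAULT 0);

CREATE TRIGGER five_adults
BEFORE INSERT ON MyTable
WHEN NEW.isAdult
 AND (SELECT COUNT(*) FROM MyTable WHERE isAdult) >= 5
BEGIN
    SELECT RAISE(FAIL, 'only five adults allowed');
END;

-- Assumed companion trigger: block turning a non-adult row into a sixth adult.
CREATE TRIGGER five_adults_update
BEFORE UPDATE OF isAdult ON MyTable
WHEN NEW.isAdult AND NOT OLD.isAdult
 AND (SELECT COUNT(*) FROM MyTable WHERE isAdult) >= 5
BEGIN
    SELECT RAISE(FAIL, 'only five adults allowed');
END;
""")

for i in range(5):
    conn.execute("INSERT INTO MyTable (name, isAdult) VALUES (?, 1)", (f"adult{i}",))
conn.execute("INSERT INTO MyTable (name, isAdult) VALUES ('child', 0)")


def try_sql(sql):
    """Return True if the statement succeeds, False if SQLite rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False


sixth_insert_ok = try_sql("INSERT INTO MyTable (name, isAdult) VALUES ('adult5', 1)")
promote_ok = try_sql("UPDATE MyTable SET isAdult = 1 WHERE name = 'child'")
adult_count = conn.execute("SELECT COUNT(*) FROM MyTable WHERE isAdult").fetchone()[0]
```

Both the sixth insert and the promotion of the child row are rejected, so the table stays at five adults.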
The SQL-99 standard would solve this with an ASSERTION, a type of constraint that can validate data changes with respect to an arbitrary SELECT statement. Unfortunately, I don't know of any SQL database currently on the market that implements ASSERTION constraints. It's an optional feature of the SQL standard, and SQL implementors are not required to provide it.
A workaround is to create a foreign key constraint so isAdult can be an integer value referencing a lookup table that contains only values 1 through 5. Then also put a UNIQUE constraint on isAdult. Use NULL for "false" when the row is for a user who is not an adult (NULL is ignored by UNIQUE).
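A rough sketch of that lookup-table idea, run through Python's sqlite3 module. The AdultSlot table name and the sample rows are invented for the example. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are not enforced by default
conn.executescript("""
-- Hypothetical lookup table holding the five allowed "adult slots".
CREATE TABLE AdultSlot (slot INTEGER PRIMARY KEY);
INSERT INTO AdultSlot (slot) VALUES (1), (2), (3), (4), (5);

CREATE TABLE MyTable (
    name    TEXT,
    isAdult INTEGER UNIQUE REFERENCES AdultSlot(slot)  -- NULL means "not an adult"
);
""")


def try_insert(name, slot):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO MyTable (name, isAdult) VALUES (?, ?)", (name, slot))
        return True
    except sqlite3.IntegrityError:
        return False


adults_ok = [try_insert(f"adult{s}", s) for s in range(1, 6)]    # slots 1..5
children_ok = [try_insert(f"child{i}", None) for i in range(3)]  # NULLs are unlimited
reused_slot_ok = try_insert("extra", 3)  # rejected by UNIQUE
sixth_slot_ok = try_insert("extra", 6)   # rejected by the foreign key
```

The price is that the application has to pick a free slot number for each adult instead of just writing 1.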
Another workaround is to do this in application code. SELECT from the database before changing it, to make sure your change won't break your app's business rules. Normally in a multi-user RDBMS this is unreliable due to race conditions, but since you're using SQLite you might be the sole user.
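Something like the following could do the check-then-insert from Python. The add_person helper and MAX_ADULTS are hypothetical names. Wrapping the check and the insert in BEGIN IMMEDIATE is my addition: it takes SQLite's write lock before the SELECT, so another connection cannot slip in a row between the check and the insert.

```python
import sqlite3

MAX_ADULTS = 5  # business rule from the question


def add_person(conn, name, is_adult):
    """Insert a row unless it would exceed MAX_ADULTS; return True if inserted."""
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock before checking
    try:
        if is_adult:
            (count,) = conn.execute(
                "SELECT COUNT(*) FROM MyTable WHERE isAdult"
            ).fetchone()
            if count >= MAX_ADULTS:
                conn.execute("ROLLBACK")
                return False
        conn.execute(
            "INSERT INTO MyTable (name, isAdult) VALUES (?, ?)",
            (name, int(is_adult)),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts the connection in autocommit mode,
# so the explicit BEGIN/COMMIT above are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE MyTable (name TEXT, isAdult INTEGER NOT NULL)")
results = [add_person(conn, f"person{i}", True) for i in range(6)]
child_ok = add_person(conn, "child", False)
```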

In what cases should the AUTOINCREMENT be used instead of the default ROW ID?

In SQLite, there are two ways to have the database engine generate monotonically increasing primary key values: the default ROWID mechanism and the AUTOINCREMENT mechanism.
sqlite> -- Through the default ROWID mechanism
sqlite> CREATE TABLE foo(id INTEGER NOT NULL PRIMARY KEY, foo);
sqlite> INSERT INTO foo (foo) VALUES ('foo');
sqlite> INSERT INTO foo (foo) VALUES ('bar');
sqlite> SELECT * FROM foo;
1|foo
2|bar
sqlite>
sqlite> -- Through the AUTOINCREMENT mechanism
sqlite> CREATE TABLE bar(id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, bar);
sqlite> INSERT INTO bar (bar) VALUES ('foo');
sqlite> INSERT INTO bar (bar) VALUES ('bar');
sqlite> SELECT * FROM bar;
1|foo
2|bar
sqlite> -- Only the AUTOINCREMENT mechanism uses the sqlite_sequence table
sqlite> SELECT * FROM sqlite_sequence WHERE name in ('foo', 'bar');
bar|2
The documentation seems to suggest that using AUTOINCREMENT is bad:
The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and disk I/O overhead and should be avoided if not strictly needed. It is usually not needed.
If the AUTOINCREMENT keyword appears after INTEGER PRIMARY KEY, that changes the automatic ROWID assignment algorithm to prevent the reuse of ROWIDs over the lifetime of the database. In other words, the purpose of AUTOINCREMENT is to prevent the reuse of ROWIDs from previously deleted rows.
[With AUTOINCREMENT,] SQLite keeps track of the largest ROWID that a table has ever held using an internal table named "sqlite_sequence". The sqlite_sequence table is created and initialized automatically whenever a normal table that contains an AUTOINCREMENT column is created. The content of the sqlite_sequence table can be modified using ordinary UPDATE, INSERT, and DELETE statements. But making modifications to this table will likely perturb the AUTOINCREMENT key generation algorithm. Make sure you know what you are doing before you undertake such changes.
In what cases is it appropriate to use the AUTOINCREMENT keyword?
The documentation says that
the purpose of AUTOINCREMENT is to prevent the reuse of ROWIDs from previously deleted rows.
So it is appropriate to use the AUTOINCREMENT keyword when you need to prevent the reuse of ROWIDs from previously deleted rows. This might be needed if you have external references to those deleted rows, and must not confuse them with new rows.
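To see the difference in practice, here is a small Python sqlite3 sketch that deletes the row with the largest id and then inserts again. The table names follow the question; the val column and the helper function are just for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (id INTEGER PRIMARY KEY, val TEXT);                -- default ROWID
CREATE TABLE bar (id INTEGER PRIMARY KEY AUTOINCREMENT, val TEXT);  -- AUTOINCREMENT
""")


def insert_delete_insert(table):
    """Insert two rows, delete the newest, insert again; return (deleted id, new id)."""
    conn.execute(f"INSERT INTO {table} (val) VALUES ('a')")
    deleted = conn.execute(f"INSERT INTO {table} (val) VALUES ('b')").lastrowid
    conn.execute(f"DELETE FROM {table} WHERE id = ?", (deleted,))
    new = conn.execute(f"INSERT INTO {table} (val) VALUES ('c')").lastrowid
    return deleted, new


foo_ids = insert_delete_insert("foo")  # the deleted id is handed out again
bar_ids = insert_delete_insert("bar")  # the deleted id is never reused
```

With the default mechanism the new row takes over id 2; with AUTOINCREMENT it gets id 3, so an external reference to the deleted row 2 cannot be mistaken for the new row.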
"the purpose of AUTOINCREMENT is to prevent the reuse of ROWIDs from
previously deleted rows."
Out of context, this makes it sound like that's the only purpose. I would think one of the most obvious decision points for AUTOINCREMENT is whether or not you need the guarantee of monotonically increasing integers. In turn, surely that need most commonly arises because you need to know what order insertions happened in. One might prefer integers over timestamps for this because of: insufficient clock resolution, fear of wonky clocks, difficulty synchronizing if more than one clock source is involved, etc.
Of course, one can always construct off-by-one situations where a transaction finishes first but gets a higher number than a competing transaction. However, these seem to pretty much correspond to cases where two humans might disagree on which transaction "happened" first, so the guarantee of monotonicity is about as good as it gets, IMO. That's a guarantee SQLite offers if you use AUTOINCREMENT.
"do you have a use case for when it is a good idea to keep an external
reference to deleted rows?"
Not so much a matter of whether it's a good idea: if humans see/use IDs in any form, it can be confusing if a number they thought was unique gets reassigned. Here's an example seen in the wild. This is a subset of the range of needs for monotonically increasing IDs, but more common than you might think. Humans are good at subconsciously absorbing the fact that numbers "are supposed to" always get bigger. For example, in ye olde slashdot days, people could glance at your user ID number to gauge how long you had been a member. On slashdot, I guess a quick hover over the user name accomplishes the same thing and saves having to go look at their profile.

How can I return inserted ids for multiple rows in SQLite?

Given a table:
CREATE TABLE Foo(
Id INTEGER PRIMARY KEY AUTOINCREMENT,
Name TEXT
);
How can I return the ids of the multiple rows inserted at the same time using:
INSERT INTO Foo (Name) VALUES
('A'),
('B'),
('C');
I am aware of last_insert_rowid() but I have not found any examples of using it for multiple rows.
What I am trying to achieve can be seen in this SQL Server example:
DECLARE @InsertedRows AS TABLE (Id BIGINT);
INSERT INTO [Foo] (Name) OUTPUT Inserted.Id INTO @InsertedRows VALUES
('A'),
('B'),
('C');
SELECT Id FROM @InsertedRows;
Any help is very much appreciated.
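This is not possible with last_insert_rowid(), which only reports the most recent row. If you want to get three values from it, you have to execute three INSERT statements. (SQLite 3.35.0 and later also accept a RETURNING clause on INSERT, which hands back the generated ids directly.)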
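For what it's worth, the one-INSERT-per-row approach might look like this from Python's sqlite3 module, using the Foo table from the question. Running the loop inside a single transaction keeps it reasonably cheap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Foo (Id INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT)")

names = ["A", "B", "C"]
inserted_ids = []
with conn:  # one transaction around all three INSERTs
    for name in names:
        cur = conn.execute("INSERT INTO Foo (Name) VALUES (?)", (name,))
        inserted_ids.append(cur.lastrowid)  # id generated for this row
```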
Given SQLite3 locking:
An EXCLUSIVE lock is needed in order to write to the database file. Only one EXCLUSIVE lock is allowed on the file and no other locks of any kind are allowed to coexist with an EXCLUSIVE lock. In order to maximize concurrency, SQLite works to minimize the amount of time that EXCLUSIVE locks are held.
And how Last Insert Rowid works:
...returns the rowid of the most recent successful INSERT into a rowid table or virtual table on database connection D.
It should be safe to assume that while a writer executes its batch INSERT into a rowid table, no other writer can make the generated primary keys non-consecutive. Thus the inserted primary keys are [lastrowid - rowcount + 1, lastrowid]. Or, in the Python sqlite3 API:
cursor.execute(...) # multi-VALUE INSERT
assert cursor.rowcount == len(values)
lastrowids = range(cursor.lastrowid - cursor.rowcount + 1, cursor.lastrowid + 1)
In normal circumstances, that is, when you don't mix explicitly provided keys with keys you expect to be generated, or, as the AUTOINCREMENT documentation states:
The normal ROWID selection algorithm described above will generate monotonically increasing unique ROWIDs as long as you never use the maximum ROWID value and you never delete the entry in the table with the largest ROWID.
The above should work as expected.
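A self-contained version of the idea, assuming a single writer and the Foo table from the question. The "earlier row" is only there to show that the range does not have to start at 1; the final SELECT is just a sanity check and would not be needed in real code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Foo (Id INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT)")
conn.execute("INSERT INTO Foo (Name) VALUES ('earlier row')")

values = ["A", "B", "C"]
placeholders = ", ".join("(?)" for _ in values)  # "(?), (?), (?)"
cur = conn.execute(f"INSERT INTO Foo (Name) VALUES {placeholders}", values)

# Infer the generated ids from the last rowid and the number of inserted rows.
assert cur.rowcount == len(values)
inferred_ids = list(range(cur.lastrowid - cur.rowcount + 1, cur.lastrowid + 1))

# Sanity check against what is actually stored.
actual_ids = [
    row[0]
    for row in conn.execute(
        "SELECT Id FROM Foo WHERE Name IN ('A', 'B', 'C') ORDER BY Id"
    )
]
```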
This Python script can be used to test correctness of the above for multi-threaded and multi-process setup.
Other databases
For instance, MySQL InnoDB (at least in the default innodb_autoinc_lock_mode = 1 "consecutive" lock mode) works in a similar way (though obviously under much more concurrent conditions) and guarantees that inserted PKs can be inferred from lastrowid:
"Simple inserts" (for which the number of rows to be inserted is known in advance) avoid table-level AUTO-INC locks by obtaining the required number of auto-increment values under the control of a mutex (a light-weight lock) that is only held for the duration of the allocation process, not until the statement completes

Is it safe to use SYS_GUID() as unique ID in an Oracle table?

I have the following table DDL. Can I safely use ROWKEY to identify a row uniquely? I don't want to use the sequence plus on-insert trigger approach.
CREATE TABLE T_SEGMENT
(
SEGMENT_NAME VARCHAR2(15),
ROWKEY VARCHAR2(50) DEFAULT sys_guid()
)
If you are asking "is sys_guid() guaranteed to return a unique value", yes. Well, almost yes: part of the GUID is the result of a random number generator, so it's theoretically possible that you'd get a duplicate; it's just incredibly unlikely.
Of course, if you are using a GUID to uniquely identify a row, it would make sense to define the rowkey as the primary key and it would make sense for that key to be the first column in the table.

Is it possible to (emulate?) AUTOINCREMENT on a compound-PK in Sqlite?

According to the SQLite docs, the only way to get an auto-increment column is on the primary key.
I need a compound primary key, but I also need auto-incrementing. Is there a way to achieve both of these in SQLite?
Relevant portion of my table as I would write it in PostgreSQL:
CREATE TABLE tstage (
id SERIAL NOT NULL,
node INT REFERENCES nodes(id) NOT NULL,
PRIMARY KEY (id,node),
-- ... other columns
);
The reason for this requirement is that all nodes eventually dump their data to a single centralized node where, with a single-column PK, there would be collisions.
The documentation is correct.
However, it is possible to reimplement the autoincrement logic in a trigger:
CREATE TABLE tstage (
id INT, -- allow NULL to be handled by the trigger
node INT REFERENCES nodes(id) NOT NULL,
PRIMARY KEY (id, node)
);
CREATE TABLE tstage_sequence (
seq INTEGER NOT NULL
);
INSERT INTO tstage_sequence VALUES(0);
CREATE TRIGGER tstage_id_autoinc
AFTER INSERT ON tstage
FOR EACH ROW
WHEN NEW.id IS NULL
BEGIN
UPDATE tstage_sequence
SET seq = seq + 1;
UPDATE tstage
SET id = (SELECT seq
FROM tstage_sequence)
WHERE rowid = NEW.rowid;
END;
(Or use a common my_sequence table with the table name if there are multiple tables.)
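To check that the trigger behaves as described, it can be run through Python's sqlite3 module. The nodes table contents and the three test inserts are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY);
INSERT INTO nodes (id) VALUES (1);  -- hypothetical node

CREATE TABLE tstage (
    id   INT,                        -- NULL on insert, filled in by the trigger
    node INT NOT NULL REFERENCES nodes(id),
    PRIMARY KEY (id, node)
);
CREATE TABLE tstage_sequence (seq INTEGER NOT NULL);
INSERT INTO tstage_sequence VALUES (0);

CREATE TRIGGER tstage_id_autoinc
AFTER INSERT ON tstage
FOR EACH ROW
WHEN NEW.id IS NULL
BEGIN
    UPDATE tstage_sequence SET seq = seq + 1;
    UPDATE tstage
       SET id = (SELECT seq FROM tstage_sequence)
     WHERE rowid = NEW.rowid;
END;
""")

# Insert three rows without an id and let the trigger number them.
for _ in range(3):
    conn.execute("INSERT INTO tstage (node) VALUES (1)")

rows = conn.execute("SELECT id, node FROM tstage ORDER BY id").fetchall()
seq = conn.execute("SELECT seq FROM tstage_sequence").fetchone()[0]
```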
A trigger works, but is complex. More simply, you could avoid serial ids. One approach is to use a GUID. Unfortunately, I couldn't find a way to have SQLite generate the GUID for you by default, so you'd have to generate it in your application. There also isn't a GUID type, but you could store it as a string or a binary blob.
Or, perhaps there is something in your other columns that would serve as a suitable key. If you know that inserts won't happen more frequently than the resolution of your timestamp format of choice (SQLite offers several, see section 1.2), then maybe (node, timestamp_column) is a good primary key.
Or, you could use SQLite's AUTOINCREMENT, but set the starting number on each node via the sqlite_sequence table such that the generated serials won't collide. Since rowid in SQLite is a 64-bit signed number, you could do this by generating a unique 32-bit number for each node (IP addresses are a convenient, probably unique 32-bit number) and shifting it left 32 bits, or equivalently, multiplying it by 4294967296. Thus, the 64-bit rowid becomes effectively two concatenated 32-bit numbers, NODE_ID, RECORD_ID, guaranteed to not collide unless one node generates over four billion records. Because the rowid is signed, the node number has to stay below 2^31 for the shifted value to fit.
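A sketch of the seeding step with Python's sqlite3 module. NODE_ID, the payload column and the cut-down tstage table are placeholders; it assumes the sqlite_sequence row for the table does not exist yet (it is normally only written on the first insert) and that the node number fits in 31 bits, since rowids are signed.

```python
import sqlite3

NODE_ID = 7  # hypothetical node number; must be below 2**31

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tstage (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
)

# Seed the AUTOINCREMENT counter so generated ids start just above NODE_ID << 32.
conn.execute(
    "INSERT INTO sqlite_sequence (name, seq) VALUES ('tstage', ?)",
    (NODE_ID << 32,),
)

first_id = conn.execute("INSERT INTO tstage (payload) VALUES ('first')").lastrowid
second_id = conn.execute("INSERT INTO tstage (payload) VALUES ('second')").lastrowid

# Split an id back into its two 32-bit halves.
node_part = first_id >> 32
record_part = first_id & 0xFFFFFFFF
```

Each node does the same with its own NODE_ID, and the ids can then be copied to the central node unchanged.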
How about...
ASSUMPTIONS
Only need uniqueness in PK, not sequential-ness
Source table has a PK
Create the central table with one extra column, the node number...
CREATE TABLE tstage (
node INTEGER NOT NULL,
id INTEGER NOT NULL, -- or whatever the source table PK is
PRIMARY KEY (node, id)
:
);
When you roll up the data into the centralized node, insert the number of the source node into 'node' and set 'id' to the source table's PRIMARY KEY column value...
INSERT INTO tstage (node, id, ...) VALUES (nodenumber, sourcetable_id, ...);
There's no need to maintain another autoincrementing column on the central table because the pair (node, id) will always be unique.
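Roughly, the roll-up could look like this in Python with sqlite3. The node_data dictionary stands in for whatever each node actually exports, and payload is a placeholder for the other columns.

```python
import sqlite3

central = sqlite3.connect(":memory:")
central.execute("""
CREATE TABLE tstage (
    node    INTEGER NOT NULL,
    id      INTEGER NOT NULL,  -- the source table's own primary key
    payload TEXT,              -- placeholder for the remaining columns
    PRIMARY KEY (node, id)
)
""")

# Hypothetical exports: both nodes reuse the same local ids 1 and 2.
node_data = {
    1: [(1, "a"), (2, "b")],
    2: [(1, "c"), (2, "d")],
}

with central:  # single transaction for the whole roll-up
    for node, rows in node_data.items():
        central.executemany(
            "INSERT INTO tstage (node, id, payload) VALUES (?, ?, ?)",
            [(node, row_id, payload) for row_id, payload in rows],
        )

total = central.execute("SELECT COUNT(*) FROM tstage").fetchone()[0]
keys = central.execute("SELECT node, id FROM tstage ORDER BY node, id").fetchall()
```

The identical local ids do not collide because the node number is part of the key.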
