Is MLOAD executed in a single transaction? - teradata

I have an MLOAD job that inserts data from an Oracle database into a Teradata database. One of the things it does is drop the destination table and recreate it. Our production website populates a dropdown list based on what's in the destination table.
If the MLOAD script is not run as a single transaction, then it's possible that the dropdown list could fail to populate properly if the binding occurs during the MLOAD job. If it is transactional, however, the process would be seamless because the changes would not show until the transaction is committed.
I checked the dbc.DBQLogTbl and dbc.DBQLQryLogsql views after running the MLOAD job and it appears there are several transactions occurring within the job, so it would seem that the entire job is not done in a single transaction. However, I wanted to verify that this is indeed the case before I make assumptions.

A transaction in Teradata cannot include multiple DDL statements; each DDL must be committed separately.
An MLoad is treated logically as a single transaction; even though you see multiple transactions in DBQL, those are the steps that prepare and clean up the load.
When your application tries to select from the target table everything will be ok (unless it's doing a dirty read using LOCKING ROW FOR ACCESS).
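For reference, a dirty read of that sort usually comes from an access-locking request or view along these lines (database and table names here are placeholders):
LOCKING ROW FOR ACCESS
SELECT * FROM mydb.dropdown_source;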
Btw, the application might instead hit a "table doesn't exist" error when it tries to select. Why do you drop/recreate the table instead of doing a simple DELETE?
Another solution would be to load a copy of the table and use view switching:
mload tab2;
replace view v as select * from tab2;
delete from tab1;
The next load will do:
mload tab1;
replace view v as select * from tab1;
delete from tab2;
And so on. Of course your load job needs to implement the switching logic.
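One hypothetical way for the job to decide which copy to load next is to inspect the current view definition in the dictionary; database and object names below are placeholders:
-- dbc.TablesV.RequestText holds the DDL text of the view
SELECT CASE WHEN RequestText LIKE '%tab1%' THEN 'tab2' ELSE 'tab1' END AS next_target
FROM dbc.TablesV
WHERE DatabaseName = 'mydb'
  AND TableName = 'v';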

Related

Is there a way to efficiently test whether a newly inserted/updated row matches a SQLite query?

I have a SQLite table in my application that periodically has INSERT/UPDATE statements executed against it. I would like to display a view in my application that reflects some query that is run against that table, and keep it continually updated as the table contents change. Since the table could be large, I would like to avoid having to re-run the query each time the table is updated so that I can update the view.
One idea I had was to use SQLite's Data Change Notification Callbacks to be notified whenever an INSERT/UPDATE occurs against the table in question. In my callback, I have the rowid of the newly-updated row, and I would like to see whether it matches the query. Assuming I have the query available as a prepared sqlite3_stmt, what would be the most efficient way to test whether the row would be matched by the query?
Aside: I know that I can't do anything in the callback itself that would affect the state of the database connection, and that's fine. I can defer the actual work of checking the query until later to ensure safety; I'm just trying to determine what the best mechanism for checking the query against the new row contents would be.
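For illustration, one candidate approach is to prepare a second statement that restricts the original query's predicate to the single rowid reported by the callback; the table and columns below are made up:
-- Hypothetical original query: SELECT * FROM items WHERE status = 'active' AND score > 10;
-- Per-row membership test: bind the rowid from the change notification to ?1
SELECT EXISTS (
    SELECT 1
    FROM items
    WHERE rowid = ?1
      AND status = 'active'
      AND score > 10
);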

Sqlite: Are updates to two tables within an insert trigger atomic?

I refactored a table that stored both metadata and data into two tables, one for metadata and one for data. This allows metadata to be queried efficiently.
I also created an updatable view with the original table's columns, using sqlite's insert, update and delete triggers. This allows calling code that needs both data and metadata to remain unchanged.
The insert and update triggers write each incoming row as two rows - one in the metadata table and one in the data table, like this:
-- View
CREATE VIEW IF NOT EXISTS Item AS
SELECT n.Id, n.Title, n.Author, c.Content
FROM ItemMetadata n, ItemData c
WHERE n.Id = c.Id;
-- Update trigger
CREATE TRIGGER IF NOT EXISTS item_update
INSTEAD OF UPDATE OF Id, Title, Author, Content ON Item
BEGIN
    UPDATE ItemMetadata
    SET Title = NEW.Title, Author = NEW.Author
    WHERE Id = OLD.Id;
    UPDATE ItemData
    SET Content = NEW.Content
    WHERE Id = OLD.Id;
END;
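For context, a hypothetical INSTEAD OF INSERT trigger following the same pattern might look like this (it assumes the caller supplies Id):
-- Sketch of the companion insert trigger
CREATE TRIGGER IF NOT EXISTS item_insert
INSTEAD OF INSERT ON Item
BEGIN
    INSERT INTO ItemMetadata (Id, Title, Author)
    VALUES (NEW.Id, NEW.Title, NEW.Author);
    INSERT INTO ItemData (Id, Content)
    VALUES (NEW.Id, NEW.Content);
END;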
Questions:
Are the updates to the ItemMetadata and ItemData tables atomic? Is there a chance that a reader can see the result of the first update before the second update has completed?
Originally I had the WHERE clauses be WHERE rowid=old.rowid but that seemed to cause random problems so I changed them to WHERE Id=old.Id. The original version was based on tutorial code I found. But after thinking about it I wonder how sqlite even comes up with an old rowid - after all, this is a view across multiple tables. What rowid does sqlite pass to an update trigger, and is the WHERE clause the way I first coded it problematic?
The documentation says:
No changes can be made to the database except within a transaction. Any command that changes the database (basically, any SQL command other than SELECT) will automatically start a transaction if one is not already in effect.
Commands in a trigger are considered part of the command that triggered the trigger.
So all commands in a trigger are part of a transaction, and atomic.
Views do not have a (usable) rowid.

R - how to react to database inserts/updates/deletes?

I'm reading in data from an SQLite database table into a data.frame with R's DBI. Often (as often as every 5 secs), new records get added into the database table externally, or existing ones updated/deleted, at which point I need to propagate these changes to my data.frame.
So the question is how can I hook onto and respond to these database events in R? I don't want to have to keep querying the database every 5 secs just to make sure nothing has changed. Is there some callback mechanism at my disposal?
If you have access to the C code that is writing your SQL data, then you can implement a callback:
http://www.sqlite.org/c3ref/update_hook.html
and then in your callback function you could update the timestamp of a file if the table being modified is one your R code cares about. Then your R code checks the timestamp of that file, and only if it has changed does it need to query the SQLite database.
I don't know whether you could add a callback to the SQLite connection held by R and expect it to fire when another SQLite connection/process changes the database table. I doubt it; I suspect the callbacks are only triggered for updates made through the connection they are registered with, because otherwise all sorts of asynchronous things would have to happen and there's no event handler for them.
Another idea is to use triggers to maintain a table of modification times. Define triggers on all the tables you care about so that they update a row in a "last modified" table. Then use the database file's modification time to detect that something changed at all, and your R code only has to query the "last modified" table to see which specific table has changed since the last check.
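A minimal sketch of that trigger idea, assuming a watched table named mytable and an epoch-seconds timestamp (all names are placeholders); matching AFTER UPDATE and AFTER DELETE triggers would be defined the same way:
CREATE TABLE IF NOT EXISTS last_modified (
    table_name  TEXT PRIMARY KEY,
    modified_at INTEGER
);
CREATE TRIGGER IF NOT EXISTS mytable_touch_insert
AFTER INSERT ON mytable
BEGIN
    -- Record the time of the latest change to mytable
    INSERT OR REPLACE INTO last_modified (table_name, modified_at)
    VALUES ('mytable', strftime('%s', 'now'));
END;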

sqlite: dropping a table in a transaction?

I have a simple sqlite3 database file that contains exactly one table. There are no keys, foreign or domestic. There are no triggers. I have the following workflow:
If the database file exists, open it.
Start exclusive transaction
Select all rows from the table in order.
Operate on each row.
Delete each operated-on row.
When done, count the number of remaining rows in the table; if it is 0, DROP the table and then unlink the database file.
Commit or Rollback the transaction
The drop-table always fails with the message that the table is locked. I've seen a couple of other posts suggesting that there could be open statement handles or other cruft lying around. Since I am using sqlite3_exec() for all of this, I do not have any open statement handles, only the DB handle itself.
Is drop table not allowed in transactions?
When dropping a table, you get the "table is locked" message when there is still some active cursor on the table, i.e., when you did not finalize a statement (or did not close a query object in whatever language you're using).

Teradata: Is there a way to generate DDL from a view or select statement?

I am using a global application user account to access database A. This user account does not have permissions to modify database A's schema (ie, create tables, modify tables, etc). This user also has access to database B, but only views. I need to run SQL to feed data from a view in database B into a table in database A.
In a perfect world, I would be able to use this SQL:
create table database_a.mytable as (select * from database_b.myview) with no data
However, the user can't create tables in database A. If I could get the DDL of the select statement then I could log in under my personal account (which doesn't have any access to database B) and run the DDL in database A to create the table.
The only other option is to manually write the SQL, but I don't want to do that, especially since this view I am wanting to copy has many columns of varying data types and sizes.
Edit: I may be getting closer. I just experimented with this:
show (select * from database_b.myview)
However, it generated the DDL of every single table that is used in the view itself, as well as the definition for the view. This doesn't really help me since I just want the schema of the select statement itself. In other words, I need what would be generated if I were to use the create table as statement mentioned above.
Edit for Rob: Perhaps "DDL" was the wrong term to use. Using show view db.myview just shows the definition of the view, not the schema it represents. In my above example of create table as, I show how you can create a table that mimics the schema of a result set returned in a select. It generates a DDL on the back end for creating a table and then executes that DDL to actually create the table. You can then say show table db.newtable and see the new table's DDL. I want to get that DDL directly from a select statement so that I can copy it, log out of the app account, into my personal account, and then execute the DDL to create the table.
This is only to save me the headache of having to type out the DDL manually by hand to save time and reduce typing errors, especially since the source view has so many columns. That said, I think hitting up the DBA or writing some snazzy stored procedure to do dynamic stuff would be a bit over the top for my needs. I think there has to be a way to get the DDL for creating a table schema directly from a select statement.
Generate DDL Statements for objects:
SHOW TABLE {DatabaseB}.{Table1};
SHOW VIEW {DatabaseB}.{View1};
Breakdown of columns in a view:
HELP VIEW {DatabaseB}.{View1};
However, without the ability to create the object in the target database DatabaseA, you don't have much leverage. Obviously, if the object already existed, INSERT INTO ... SELECT ... FROM DatabaseB.Table1 or MERGE INTO would be options that you have already explored.
Alternative Solution
Would it be possible to have a stored procedure created that dynamically created the table based on the view name that is provided? The global application account would simply need privilege to execute the procedure. Generally the user creating the stored procedure would need the permissions to perform the actions contained within the stored procedure. (You have some additional flexibility with this in Teradata 13.10.)
There are some caveats with this approach. You are attempting to materialize views that could reference anywhere from hundreds to billions of records. These aren't simple 1:1 views that are put on top of the target tables. Trying to determine the required space in the target database to materialize the view will be difficult. Performance can and will vary depending on the complexity of the view and the data volumes. This will not be a fast-path or data block optimized operation.
As a DBA, I would be concerned with this approach being taken on by a global application account without fully understanding the intent. I trust you have an open line of communication with the DBA(s) involved for supporting this system. I'm sure there are reasons for your madness that can't be disclosed here.
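As a rough sketch only (object names are placeholders, and it assumes dynamic SQL is issued via CALL DBC.SysExecSQL from within the procedure), such a procedure might look something like this:
REPLACE PROCEDURE DatabaseA.CopyViewShell
    (IN src_view VARCHAR(256), IN tgt_table VARCHAR(256))
BEGIN
    -- Build and run a CREATE TABLE ... AS ... WITH NO DATA statement dynamically
    CALL DBC.SysExecSQL('CREATE TABLE ' || :tgt_table ||
        ' AS (SELECT * FROM ' || :src_view || ') WITH NO DATA');
END;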
Possible Solution - VOLATILE TABLE
Unless the implicit privilege for CREATE TABLE has been revoked from the global application account this solution should work.
Volatile tables do not require perm space. Their table definitions persist for the duration of the session, and any data inserted into them relies on the spool space of the user who instantiated them.
CREATE VOLATILE TABLE {Global Application UserID}.{TableA_Copy} AS
(
SELECT *
FROM {DatabaseB}.{TableA}
)
WITH NO DATA
NO PRIMARY INDEX
ON COMMIT PRESERVE ROWS;
SHOW TABLE {Global Application UserID}.{TableA_Copy};
I opted to use a Teradata 13.10 feature called NO PRIMARY INDEX. By default, CREATE TABLE AS will take the first column of the SELECT statement and make it the PRIMARY INDEX of the table. This could lead to skewing and perm space issues in your testing depending on the data demographics. You can specify an explicit PRIMARY INDEX on your own as you understand the underlying data. (See the DDL manuals for details on the syntax if you're uncertain.)
The use of ON COMMIT PRESERVE ROWS for the intent of this example is probably extraneous. But if you actually put any data into that table for testing, this clause would be beneficial in Teradata mode, as the data would otherwise be lost immediately after the CREATE TABLE or any other data manipulation performed against the volatile table.
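As mentioned above, an explicit PRIMARY INDEX can be specified instead; a hypothetical variant (the index column name is an assumption) would be:
CREATE VOLATILE TABLE {Global Application UserID}.{TableA_Copy} AS
(
SELECT *
FROM {DatabaseB}.{TableA}
)
WITH NO DATA
PRIMARY INDEX (SomeKeyColumn)
ON COMMIT PRESERVE ROWS;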
