I'm using Pentaho Kettle 8.0 and I've created a transformation to migrate data between PostgreSQL databases. The transformation reads orders (parent) and their items (child) and inserts or updates them in the target database. But I'm having problems with orders that have no items, or whose items the transformation fails to insert. What I need is that every order has at least one item.
I've designed the transformation to look up the order data, insert/update the target table, and then look up the items. If there is an error during these steps, how can I roll back/delete the parents?
The target tables are like this:
Orders - Order_ID, Value, Qty, Customer_ID
OrderItems - Item_ID, Value, Qty, Order_ID
I suggest you do it in two steps. First, do exactly what you do now: insert parents and children without worrying about insertion errors. Once that is finished, another transformation cleans up any parent without a child.
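The cleanup pass boils down to one DELETE with a NOT EXISTS subquery. Here is a minimal sketch of that statement, run against an in-memory SQLite database as a stand-in for the PostgreSQL target (the sample rows are made up for the demo; in Kettle the statement would presumably go into an SQL script step of the second transformation):

```python
import sqlite3

# In-memory stand-in for the target database (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (Order_ID INTEGER PRIMARY KEY, Value REAL, Qty INTEGER, Customer_ID INTEGER);
    CREATE TABLE OrderItems (Item_ID INTEGER PRIMARY KEY, Value REAL, Qty INTEGER,
                             Order_ID INTEGER REFERENCES Orders(Order_ID));
    -- Hypothetical sample data: order 3 ended up without any item.
    INSERT INTO Orders VALUES (1, 30.0, 2, 10), (2, 15.0, 1, 11), (3, 99.0, 4, 12);
    INSERT INTO OrderItems VALUES (100, 10.0, 1, 1), (101, 20.0, 1, 1), (102, 15.0, 1, 2);
""")

# Cleanup pass: delete every order that has no item pointing at it.
deleted = conn.execute("""
    DELETE FROM Orders
    WHERE NOT EXISTS (SELECT 1 FROM OrderItems i WHERE i.Order_ID = Orders.Order_ID)
""").rowcount
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT Order_ID FROM Orders ORDER BY Order_ID")]
print(deleted, remaining)
```

The same DELETE should work unchanged on PostgreSQL, since it only uses standard SQL.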
If you need to do it in one step (for example, if the system is in production), I would produce the orders flow and the items flow separately. Then, for each order, look up whether there is at least one item, and filter out the orders without items before writing to the database.
You may also count the number of items per order before filtering out the orders without any items.
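To make the count-then-filter logic concrete, here it is outside of Kettle as a small Python sketch. The two lists stand in for the orders and items flows, and all values are hypothetical:

```python
from collections import Counter

# Hypothetical rows standing in for the two flows read from the source database.
orders = [
    {"Order_ID": 1, "Customer_ID": 10},
    {"Order_ID": 2, "Customer_ID": 11},
    {"Order_ID": 3, "Customer_ID": 12},  # no items: must not be written
]
items = [
    {"Item_ID": 100, "Order_ID": 1},
    {"Item_ID": 101, "Order_ID": 1},
    {"Item_ID": 102, "Order_ID": 2},
]

# Count the items per order (a missing key counts as 0).
item_counts = Counter(item["Order_ID"] for item in items)

# Keep only the orders that have at least one item.
orders_to_write = [o for o in orders if item_counts[o["Order_ID"]] >= 1]

# Only items whose parent order survived the filter are written.
valid_ids = {o["Order_ID"] for o in orders_to_write}
items_to_write = [i for i in items if i["Order_ID"] in valid_ids]

print([o["Order_ID"] for o in orders_to_write])
```

In the transformation itself the same three stages would be a grouping step, a lookup against the grouped counts, and a filter placed before the output steps.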
I am confused by the API documentation of CreateTable from DynamoDB. I need to create multiple tables with a secondary index. From the API: https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/dynamodb/DynamoDbClient.html#createTable-software.amazon.awssdk.services.dynamodb.model.CreateTableRequest-
If you want to create multiple tables with secondary indexes on them, you must create the tables sequentially. Only one table with secondary indexes can be in the CREATING state at any given time.
and
Up to 500 simultaneous table operations are allowed per account. These operations include CreateTable, UpdateTable, DeleteTable, UpdateTimeToLive, RestoreTableFromBackup, and RestoreTableToPointInTime.
The only exception is when you are creating a table with one or more secondary indexes. You can have up to 250 such requests running at a time;
So can I create only one table with a secondary index at a time, or 250 at the same time?
If I create multiple tables sequentially without waiting for the ACTIVE state, does that already count as concurrent creation?
Must I wait for the ACTIVE state of every table if I create multiple tables with secondary indexes?
An individual account can only be running one "Create Index" action at a time, no matter how many tables you have.
To understand this, it may help to understand what an index is. An index is a complete copy of the table, but with a different partition key and sort key. So if your original table has a PK of userId and an SK of sort_key, you could create an index whose partition key is sort_key and whose sort key is userId, creating an inverted index. This is a common practice in Dynamo. Remember that queries in Dynamo must know the PK: if you have a userId, you can access all data of that user. If you wanted all users who have a particular tag, you could store an SK item on users such as TAG#ThisTag, then query the inverted index with pk = TAG#ThisTag and get back a list of userIds.
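If the inverted index idea is hard to picture, here is a toy model in plain Python. It is not the DynamoDB API, just dictionaries keyed by partition key, and the items are made up:

```python
from collections import defaultdict

# Hypothetical items using generic pk/sk attribute names.
items = [
    {"pk": "USER#1", "sk": "TAG#ThisTag"},
    {"pk": "USER#1", "sk": "PROFILE"},
    {"pk": "USER#2", "sk": "TAG#ThisTag"},
    {"pk": "USER#3", "sk": "TAG#OtherTag"},
]

def build_index(rows, partition_attr, sort_attr):
    """Group rows by the partition attribute and sort each group by the sort attribute."""
    index = defaultdict(list)
    for row in rows:
        index[row[partition_attr]].append(row)
    for partition in index.values():
        partition.sort(key=lambda r: r[sort_attr])
    return index

# The base table is partitioned by pk; the inverted index swaps the two keys.
base_table = build_index(items, "pk", "sk")
inverted_index = build_index(items, "sk", "pk")

# "All data of user 1" is a query on the base table.
user1_sort_keys = [row["sk"] for row in base_table["USER#1"]]

# "All users with ThisTag" is a query on the inverted index.
users_with_tag = [row["pk"] for row in inverted_index["TAG#ThisTag"]]
print(users_with_tag)
```

Both structures hold the same rows, only grouped differently, which is why building the index means copying the data.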
While the CreateIndex is being run on a given table, no other actions can be run on it - it won't accept changes to the data/configuration that would cause a fault/mismatch in the copying process. This is one of the reasons a given account is limited to only one create index operation at a time.
As a slight aside, if I may - if you have a single account with multiple Dynamo tables all for the same product, you may want to rethink your database strategy. A single Dynamo table can serve many different kinds of storage if you set up your PK and SK as generic fields (i.e. pk and sk as the attribute names). No document inside your table has to have the same attributes as any other. And when accessing data, each partition key is exactly what its name says: a partition of data, which is all that is accessed when a query is made against that PK. So if you have 100 items with a PK of USER#1 and 100 items with a PK of USER#2 and you query against USER#1, you only access those 100 items. The rest are ignored by the query and never touched, which in effect lets you have multiple "tables" in a single DynamoDB table by giving them different partition key prefixes.
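A rough sketch of that single-table layout, again as a plain Python model rather than real DynamoDB calls. The put_item and query helpers, the entity prefixes and the attribute values are all hypothetical:

```python
from collections import defaultdict

# One "table": partition key -> list of items in that partition.
table = defaultdict(list)

def put_item(item):
    """Store an item under its partition key; items need not share attributes."""
    table[item["pk"]].append(item)

def query(pk, sk_prefix=""):
    """Read a single partition, optionally narrowed by a sort-key prefix."""
    return [i for i in table.get(pk, []) if i["sk"].startswith(sk_prefix)]

# Different entity types live side by side, separated only by key prefixes.
put_item({"pk": "USER#1", "sk": "PROFILE", "name": "Ada"})
put_item({"pk": "USER#1", "sk": "ORDER#2024-001", "total": 30})
put_item({"pk": "USER#2", "sk": "PROFILE", "name": "Grace"})
put_item({"pk": "PRODUCT#42", "sk": "DETAILS", "price": 15})

user1_items = query("USER#1")             # only the USER#1 partition is read
user1_orders = query("USER#1", "ORDER#")  # narrowed by sort-key prefix
print(len(user1_items), user1_orders)
```

A query for USER#1 never looks at the USER#2 or PRODUCT#42 partitions, which is the property the paragraph above relies on.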
I refactored a table that stored both metadata and data into two tables, one for metadata and one for data. This allows metadata to be queried efficiently.
I also created an updatable view with the original table's columns, using sqlite's insert, update and delete triggers. This allows calling code that needs both data and metadata to remain unchanged.
The insert and update triggers write each incoming row as two rows - one in the metadata table and one in the data table, like this:
-- View
CREATE VIEW IF NOT EXISTS Item AS
  SELECT n.Id, n.Title, n.Author, c.Content
  FROM ItemMetadata n, ItemData c
  WHERE n.Id = c.Id;
-- Trigger
CREATE TRIGGER IF NOT EXISTS item_update
INSTEAD OF UPDATE OF Id, Title, Author, Content ON Item
BEGIN
  UPDATE ItemMetadata
  SET Title = NEW.Title, Author = NEW.Author
  WHERE Id = OLD.Id;
  UPDATE ItemData
  SET Content = NEW.Content
  WHERE Id = OLD.Id;
END;
Questions:
Are the updates to the ItemMetadata and ItemData tables atomic? Is there a chance that a reader can see the result of the first update before the second update has completed?
Originally I had the WHERE clauses be WHERE rowid=old.rowid but that seemed to cause random problems so I changed them to WHERE Id=old.Id. The original version was based on tutorial code I found. But after thinking about it I wonder how sqlite even comes up with an old rowid - after all, this is a view across multiple tables. What rowid does sqlite pass to an update trigger, and is the WHERE clause the way I first coded it problematic?
The documentation says:
No changes can be made to the database except within a transaction. Any command that changes the database (basically, any SQL command other than SELECT) will automatically start a transaction if one is not already in effect.
Commands in a trigger are considered part of the command that triggered the trigger.
So all commands in a trigger are part of a transaction, and atomic.
Views do not have a (usable) rowid.
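You can check the atomicity yourself with a small experiment. The sketch below rebuilds the view and trigger from the question in an in-memory database and adds one thing that is not in the original schema: a CHECK constraint on ItemData.Content, used only to force the second UPDATE inside the trigger to fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ItemMetadata (Id INTEGER PRIMARY KEY, Title TEXT, Author TEXT);
    -- The CHECK constraint is an addition for this demo, used to force a failure.
    CREATE TABLE ItemData (Id INTEGER PRIMARY KEY, Content TEXT CHECK (length(Content) > 0));
    CREATE VIEW Item AS
        SELECT n.Id, n.Title, n.Author, c.Content
        FROM ItemMetadata n JOIN ItemData c ON n.Id = c.Id;
    CREATE TRIGGER item_update INSTEAD OF UPDATE ON Item
    BEGIN
        UPDATE ItemMetadata SET Title = NEW.Title, Author = NEW.Author WHERE Id = OLD.Id;
        UPDATE ItemData SET Content = NEW.Content WHERE Id = OLD.Id;
    END;
    INSERT INTO ItemMetadata VALUES (1, 'Old title', 'Someone');
    INSERT INTO ItemData VALUES (1, 'Old content');
""")

# The first UPDATE in the trigger succeeds, the second violates the CHECK constraint.
try:
    conn.execute("UPDATE Item SET Title = 'New title', Content = '' WHERE Id = 1")
    failed = False
except sqlite3.IntegrityError:
    failed = True

# The metadata change made by the first UPDATE was backed out with the statement.
title_after = conn.execute("SELECT Title FROM ItemMetadata WHERE Id = 1").fetchone()[0]
print(failed, title_after)
```

The title keeps its old value, so the two updates in the trigger succeed or fail together as part of the one statement that fired the trigger.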
I am developing an application which tracks class attendance of students in a school, in Apex.
I want to create a page with three level cascading select lists, so the teacher can first select the Semester, then the Subject and then the specific Class of that Subject, so the application returns the Students who are enrolled in that Class.
My problem is that these three tables have a many-to-many relationship between them, so I use extra tables with their keys.
Every Semester has many Subjects and a Subject can be taught in many Semesters.
Every Subject has many classes in every Semester.
The students must enroll in a subject every semester and then the teacher can assign them to a class.
The tables look something like this:
create table semester(
id number not null,
name varchar2(20) not null,
primary key(id)
);
create table subject(
id number not null,
subject_name varchar2(50) not null,
primary key(id)
);
create table student(
id number not null,
name varchar2(20),
primary key(id)
);
create table semester_subject(
id number not null,
semester_id number not null,
subject_id number not null,
primary key(id),
foreign key(semester_id) references semester(id),
foreign key(subject_id) references subject(id),
constraint sem_sub_uq unique(semester_id, subject_id)
);
create table class(
id number not null,
name number not null,
semester_subject_id number not null,
primary key(id),
foreign key(semester_subject_id) references semester_subject(id)
);
create table class_enrollment(
id number not null,
student_id number not null,
semester_subject_id number not null,
class_id number,
primary key(id),
foreign key(student_id) references student(id),
foreign key(semester_subject_id) references semester_subject(id),
foreign key(class_id) references class(id)
);
The list of values for the Semester select list looks like this:
select name, id
from semester
order by 1;
Then the Subject select list should include the names of all the Subjects available in the Semester selected above, but I can't figure out the query, or even whether it's possible. What I have right now:
select s.subject_name, s.id
from subject s, semester_subject ss
where ss.semester_id = :PX_SEMESTER -- value from above select list
and ss.subject_id = s.id;
But you can't have two tables in a LoV and the query is probably wrong anyway...
I didn't even begin to think about what the query for the class would look like.
I appreciate any help or if you can point me in the right direction so I can figure it out myself.
Developing an Apex Input Form Using Item-Parametrized Lists of Values (LOVs)
Your initial schema design looks good. One recommendation: once you've developed and tested your solution on a smaller scale, add a trigger to each ID (primary key) column that auto-populates its value from a sequence. You could also skip the trigger and just reference the sequence in your SQL INSERT statements. It just makes things simpler. Creating tables with the built-in wizards of the APEX environment offers the option to make an "auto-incrementing" key column.
There is also an additional column added to the SEMESTER table called SORT_KEY. This helps when you are storing string typed values which have logical sorting sequences that aren't exactly alphanumeric in nature.
Setting Up The Test Data Values
Here is the test data I generated to demonstrate the cascading list of values design that will work with the example.
Making Dynamic List of Value Queries
The next step is to make the first three inter-dependent List of Values definitions. As you have discovered, you can reference page parameters in your LOVs which may come from a variety of sources. In this case, the choice selection from our LOVs will be assigned to Apex Page Items.
I also thought only one table could be referenced in a single LOV query. This is incorrect. The page documentation suggests that it is the SQL query syntax that is the limiting factor. The following LOV queries reference more than one table, and they work:
-- SEMESTER LOV Query
-- name: CHOOSE_SEMESTER
select a.name d, a.id r
from semester a
where a.id in (
select b.semester_id
from semester_subject b
where b.subject_id = nvl(:P5_SUBJECT, b.subject_id))
order by a.sort_key
-- SUBJECT LOV Query
-- name: CHOOSE_SUBJECT
select a.subject_name d, a.id r
from subject a
where a.id in (
select b.subject_id
from semester_subject b
where b.semester_id = nvl(:P5_SEMESTER, b.semester_id))
order by 1
-- CLASS LOV Query
-- name: CHOOSE_CLASS
select a.name d, a.id r
from class a, semester_subject b
where a.semester_subject_id = b.id
and b.subject_id = :P5_SUBJECT
and b.semester_id = :P5_SEMESTER
order by 1
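If you want to try the queries outside of APEX first, here is a rough sketch that runs the SUBJECT and CLASS queries against an in-memory SQLite database. Some assumptions: SQLite has no NVL, so coalesce stands in for it; the bind variables are passed as named parameters; class.name is stored as text; and apart from SPRING 2014, PHYS ED and two class names from my demonstration case, the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE semester (id INTEGER PRIMARY KEY, name TEXT NOT NULL, sort_key INTEGER);
    CREATE TABLE subject (id INTEGER PRIMARY KEY, subject_name TEXT NOT NULL);
    CREATE TABLE semester_subject (
        id INTEGER PRIMARY KEY,
        semester_id INTEGER NOT NULL REFERENCES semester(id),
        subject_id INTEGER NOT NULL REFERENCES subject(id),
        UNIQUE (semester_id, subject_id));
    CREATE TABLE class (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL,
        semester_subject_id INTEGER NOT NULL REFERENCES semester_subject(id));
    -- Hypothetical sample rows.
    INSERT INTO semester VALUES (1, 'FALL 2013', 10), (2, 'SPRING 2014', 20);
    INSERT INTO subject VALUES (1, 'PHYS ED'), (2, 'MATH');
    INSERT INTO semester_subject VALUES (1, 1, 2), (2, 2, 1), (3, 2, 2);
    INSERT INTO class VALUES (1, 'Algebra', 1), (2, 'Volleyball Basics', 2),
                             (3, 'Running for Fun', 2), (4, 'Geometry', 3);
""")

# CHOOSE_SUBJECT: with a null semester the coalesce makes the filter match every row.
SUBJECT_LOV = """
    select a.subject_name d, a.id r
    from subject a
    where a.id in (
        select b.subject_id from semester_subject b
        where b.semester_id = coalesce(:P5_SEMESTER, b.semester_id))
    order by 1
"""
# CHOOSE_CLASS: needs both parent values before it can return anything.
CLASS_LOV = """
    select a.name d, a.id r
    from class a, semester_subject b
    where a.semester_subject_id = b.id
      and b.subject_id = :P5_SUBJECT
      and b.semester_id = :P5_SEMESTER
    order by 1
"""

all_subjects = conn.execute(SUBJECT_LOV, {"P5_SEMESTER": None}).fetchall()
fall_subjects = conn.execute(SUBJECT_LOV, {"P5_SEMESTER": 1}).fetchall()
spring_pe_classes = conn.execute(CLASS_LOV, {"P5_SUBJECT": 1, "P5_SEMESTER": 2}).fetchall()
print(all_subjects, fall_subjects, spring_pe_classes)
```

Passing a null semester returns every subject, which is how the page behaves before anything has been selected.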
Some design notes to consider:
Don't mind the P5_ITEM notation. The page in my sample app happened to be on "page 5" and so the convention goes.
I chose to assign a name for each LOV query as a hint. Don't just embed the query in an item. Add some breathing room for yourself as a developer by making the LOV a portable object that can be referenced elsewhere if needed.
MAKE a named LOV for each query through the SHARED OBJECTS menu option of your application designer.
The extra operator involving the NVL command, as in nvl(:P5_SUBJECT, b.subject_id) for the CHOOSE_SEMESTER LOV is an expression mirrored on the CHOOSE_SUBJECT query as well. If the default value of P5_SUBJECT and P5_SEMESTER are null when entering the page, how does that assist with the handling of the cascading relationships?
The table SEMESTER_SUBJECT represents a key relationship. Why is a LOV for this table not needed?
APEX Application Form Design Using Cascading LOVs
Setting up a page for testing the schema design and LOV queries requires the creation of three page items: P5_SEMESTER, P5_SUBJECT and P5_CLASS.
Each page item should be defined as a SELECT LIST. Leave all the defaults initially until you understand how the basic design works. Each select list item should be associated with its corresponding LOV: P5_SEMESTER with CHOOSE_SEMESTER, P5_SUBJECT with CHOOSE_SUBJECT and P5_CLASS with CHOOSE_CLASS.
The key design twist is the Select List made for the CHOOSE_CLASS LOV, which represents a cascading dependency on more than one data source.
We will use the "Cascading Parent" option so that this item will wait until both CHOOSE_SEMESTER and CHOOSE_SUBJECT are selected. It will also refresh if either of the two are changed.
YES! The cascading parent item can consist of multiple page items/elements. They just have to be declared in a comma separated list.
From the online help info, this is a general introduction to how cascading LOVs can be used in APEX designs:
From Oracle Apex Help Docs: A cascading LOV means that the current item's list of values should be refreshed if the value of another item on this page gets changed.
Specify a comma separated list of page items to be used to trigger the refresh. You can then use those page items in the where clause of your "List of Values" SQL statement.
Demonstration of APEX Application Items with Cascading LOVs
These examples are based on the sample data given at the beginning of this solution. The path of the chosen example case is:
SEMESTER: SPRING 2014 + SUBJECT: PHYS ED + Verify Valid Course Options:
Fitness for Life
General Flexibility
Presidential Fitness Challenge
Running for Fun
Volleyball Basics
The choice from above will be assigned to page item P5_CLASS.
Selection Choices for P5_SEMESTER:
Selection Choices for P5_SUBJECT:
Selection Choices for P5_CLASS:
Closing Remarks and Discussion
Some closing thoughts that occurred to me while working with this design project:
About the Primary Keys: The notion of a generic, ID named column for a primary key was a good design choice. While APEX can handle composite business keys, it gets clumsy and difficult to work around.
One thing that made the schema design challenging to work with was that the name of the "id" column changed in the tables that referenced it. For example, the ID column in the SEMESTER table became SEMESTER_ID in the SEMESTER_SUBJECT table. Just keep an eye on these name changes in larger queries. At times I actually lost track of exactly which ID I was working with.
A Word for Sanity: In the likely event you decide to assign ID values through a database sequence object, the default is usually to begin at one. If you have several different tables in your schema with the same column name: ID and some associating tables such as CLASS_ENROLLMENT which connects the values of one primary key ID and three additional foreign key ID's, it may get difficult to discern where the data values are coming from.
Consider offsetting your sequences or arbitrarily choosing different increments and starting values. If you're mainly pushing ID's around in your queries, if two different ID sets are separated by two or three orders of magnitude, it will be easy to know if you've pulled the right data values.
Are There MORE Cascading Relationships? If a "parent" item relationship indicates a dependency that makes a page item LOV wait or change depending on the value of another, could there be another cascading relationship to define? In the case of CHOOSE_SEMESTER and CHOOSE_SUBJECT is it possible? Is it necessary?
I was able to figure out how to make these two items hold an optional cascading dependency, but it required setting up another outside page item reference. (If it isn't optional, you get stuck in a closed loop as soon as one of the two values changes.) Fancy, but not really necessary to solve the problem at hand.
What's Left to Do? I left out some additional tasks for you to continue with, such as managing the DML into the ENROLLMENT table after selecting a valid STUDENT.
Overall, you've got a workable schema design. There is a way to represent the data relationships through an APEX application design pattern. Happy coding, it looks like a challenging project!
I need to assign one of multiple parent types to a single child item. The problem I encounter is that in an Access 2010 web database I cannot create a Union query to bring all the potential parents (from multiple tables) into a single drop down / listbox.
I'm a bit green to all this and could be going about it completely wrong. I'm very open to suggestions. Here is my example:
Contracts are the parent of Subcontracts.
Both Contracts and Subcontracts have a Statement of Work (SoW).
Contracts and Subcontracts can both be direct parents of a SoW.
Each SoW will have only one parent
SoWs are split into paragraphs (not overly consequential)
With a union query I would build the database this way:
Contracts table
Subcontracts table
Union table for contracts and subcontracts
Lookup to union table from SoW table in order to select either a contract or a subcontract as parent from a single data source.
The problem here is that I cannot create a union query in a web database.
My only other thought is to construct the database in this fashion:
Contracts table
Subcontracts table
Contracts SoW table
Subcontracts SoW table
This design (using two tables) might work more effectively for data entry as there could be issues with subforms when attempting to use a union table. I'm not sure as I haven't yet tried. With this method, the Access report should be able to bind the subcontract to the parent contract and display all data in a detail section. However, this design still means that I will use two separate tables to house identical data.
I would put the two contract tables together into one table that would look something like this:
CREATE TABLE ContractTable(
ContractID INTEGER NOT NULL PRIMARY KEY, -- Possibly an autonumber
[various contract columns],
ParentContract INTEGER
);
Note, I know this is not Access friendly syntax. I usually use bigger DBs, but you should be able to get the idea.
Then your query to find parent contracts is SELECT ... FROM ContractTable WHERE ParentContract IS NULL.
To find sub contracts SELECT ... FROM ContractTable WHERE ParentContract IS NOT NULL.
My concern with this approach is that if you need to search through chains of contracts (i.e. A is the parent of B, B the parent of C, C the parent of D, and you need to go from A to D), you could run into recursive SQL, which I don't think Access can handle. You'd have to do it in VBA code.
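To show what I mean, here is a rough sketch of the single-table design and of walking a chain with a loop in application code instead of recursive SQL. It uses Python with an in-memory SQLite database as a stand-in for Access and VBA; the Title column and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ContractTable (
        ContractID INTEGER NOT NULL PRIMARY KEY,
        Title TEXT,  -- stand-in for the various contract columns
        ParentContract INTEGER REFERENCES ContractTable(ContractID));
    -- Hypothetical chain A -> B -> C -> D, plus an unrelated contract E.
    INSERT INTO ContractTable VALUES
        (1, 'A', NULL), (2, 'B', 1), (3, 'C', 2), (4, 'D', 3), (5, 'E', NULL);
""")

# Parent contracts have no ParentContract, subcontracts have one.
parents = [r[0] for r in conn.execute(
    "SELECT Title FROM ContractTable WHERE ParentContract IS NULL ORDER BY ContractID")]
subcontracts = [r[0] for r in conn.execute(
    "SELECT Title FROM ContractTable WHERE ParentContract IS NOT NULL ORDER BY ContractID")]

def descendants(root_id):
    """Walk down the chain one level at a time, one plain query per level."""
    found, frontier = [], [root_id]
    while frontier:
        marks = ",".join("?" * len(frontier))
        rows = conn.execute(
            "SELECT ContractID FROM ContractTable "
            f"WHERE ParentContract IN ({marks}) ORDER BY ContractID",
            frontier).fetchall()
        frontier = [r[0] for r in rows]
        found.extend(frontier)
    return found

chain_from_a = descendants(1)
print(parents, subcontracts, chain_from_a)
```

Each pass of the loop is an ordinary non-recursive SELECT, so the same idea should carry over to a VBA loop over a recordset.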
I have recently stumbled upon a problem with selecting relationship details from one table and inserting them into another table. I hope someone can help.
I have a table structure as follows, e.g.:
ID (PK)  Name       ParentID
1        Myname     0
2        nametwo    1
3        namethree  2
This is the table I need to select from to get all the relationship data. There could be an unlimited number of sub-links (is there a function I can create to do the loop?).
Then, once I have all the data, I need to insert it into another table, and the IDs will have to change because they must go in order (e.g. I cannot have ID 2 be a child of ID 3). I am hoping I can use the same function for both the selecting and the inserting.
If you are using SQL Server 2005 or above, you may use recursive queries to get your information. Here is an example:
With tree (id, Name, ParentID, [level])
As (
Select id, Name, ParentID, 1
From [myTable]
Where ParentID = 0
Union All
Select child.id
,child.Name
,child.ParentID
,parent.[level] + 1 As [level]
From [myTable] As [child]
Inner Join [tree] As [parent]
On [child].ParentID = [parent].id)
Select * From [tree];
This query will return the row requested by the first portion (Where ParentID = 0) and all sub-rows recursively. Does this help you?
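If you want to try the recursion without a SQL Server instance at hand, the same query shape runs on SQLite, which spells it WITH RECURSIVE. A small sketch using the three rows from your question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (id INTEGER PRIMARY KEY, Name TEXT, ParentID INTEGER);
    INSERT INTO myTable VALUES (1, 'Myname', 0), (2, 'nametwo', 1), (3, 'namethree', 2);
""")

rows = conn.execute("""
    WITH RECURSIVE tree (id, Name, ParentID, level) AS (
        -- Anchor part: the root rows.
        SELECT id, Name, ParentID, 1 FROM myTable WHERE ParentID = 0
        UNION ALL
        -- Recursive part: children of rows already in the tree.
        SELECT child.id, child.Name, child.ParentID, parent.level + 1
        FROM myTable AS child
        JOIN tree AS parent ON child.ParentID = parent.id
    )
    SELECT id, Name, ParentID, level FROM tree ORDER BY level, id
""").fetchall()

for row in rows:
    print(row)
```

The level column tells you how deep each row sits below the root, which is handy if the rows have to be processed parents first.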
I'm not sure I understand what you want to have happen with your insert. Can you provide more information in terms of the expected result when you are done?
Good luck!
For the retrieval part, you can take a look at Common Table Expression. This feature can provide recursive operation using SQL.
For the insertion part, you can use the CTE above to regenerate the ID, and insert accordingly.
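One possible way to regenerate the IDs, sketched in Python rather than T-SQL: walk the tree level by level, hand out new IDs in visiting order, and translate each ParentID through an old-to-new map. The scrambled source IDs and the renumber helper are made up for illustration.

```python
from collections import deque

# Hypothetical source rows (old_id, name, old_parent_id), deliberately out of order.
source_rows = [
    (7, "namethree", 9),
    (4, "Myname", 0),
    (9, "nametwo", 4),
]

def renumber(rows, root_parent=0):
    """Assign new IDs breadth first so every parent gets a smaller ID than its children."""
    children = {}
    for old_id, name, parent_id in rows:
        children.setdefault(parent_id, []).append((old_id, name))

    new_rows = []
    id_map = {root_parent: root_parent}  # the root marker (0) stays as it is
    queue = deque([root_parent])
    next_id = 1
    while queue:
        parent_old = queue.popleft()
        for old_id, name in sorted(children.get(parent_old, [])):
            id_map[old_id] = next_id
            new_rows.append((next_id, name, id_map[parent_old]))
            queue.append(old_id)
            next_id += 1
    return new_rows

renumbered = renumber(source_rows)
print(renumbered)
```

Because a row only gets its new ID after its parent has one, the new rows can be inserted in list order without a child ever referring to an ID that does not exist yet.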
I hope this URL helps: Self-Joins in SQL
This is the problem of finding the transitive closure of a graph in SQL. Standard SQL does not support this directly, which leaves you with three common strategies:
use a vendor specific SQL extension
store the Materialized Path from the root to the given node in each row
store the Nested Sets, that is the interval covered by the subtree rooted at a given node when nodes are labeled depth first
The first option is straightforward, and if you don't need database portability is probably the best. The second and third options have the advantage of being plain SQL, but require maintaining some de-normalized state. Updating a table that uses materialized paths is simple, but for fast queries your database must support indexes for prefix queries on string values. Nested sets avoid needing any string indexing features, but can require updating a lot of rows as you insert or remove nodes.
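To make the second and third options more concrete, here is a small Python sketch that computes both labelings for a tiny tree. The first three rows are from the question, the fourth is added so the tree has a branch; the "/1/2/" path format is just one possible convention.

```python
# (id, name, parent_id); parent_id 0 marks the root. The last row is hypothetical.
rows = [(1, "Myname", 0), (2, "nametwo", 1), (3, "namethree", 2), (4, "namefour", 1)]

children = {}
for node_id, _name, parent_id in rows:
    children.setdefault(parent_id, []).append(node_id)

paths = {}      # node id -> materialized path such as "/1/2/"
intervals = {}  # node id -> (left, right) nested-set labels

def label(node_id, parent_path, counter):
    """Depth-first walk that fills both labelings; returns the next free counter."""
    paths[node_id] = f"{parent_path}{node_id}/"
    left = counter
    counter += 1
    for child_id in children.get(node_id, []):
        counter = label(child_id, paths[node_id], counter)
    intervals[node_id] = (left, counter)
    return counter + 1

counter = 1
for root_id in children.get(0, []):
    counter = label(root_id, "/", counter)

# Subtree of node 2 via a prefix match on the path (LIKE '/1/2/%' in SQL).
subtree_by_path = sorted(n for n, p in paths.items() if p.startswith(paths[2]))

# Subtree of node 2 via interval containment (left BETWEEN ... in SQL).
left2, right2 = intervals[2]
subtree_by_interval = sorted(
    n for n, (l, r) in intervals.items() if left2 <= l and r <= right2)
print(paths, intervals)
```

The trailing slash in each path keeps a prefix match on "/1/2/" from also matching a node such as 20. Note how inserting a node under node 2 would shift the interval of every node labeled after it, which is the update cost mentioned above.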
If you're fine with always using MSSQL, I'd use the vendor specific option Adrian mentioned.