In an aviarc workflow, how do I delete rows from a dataset after they have been removed from the database? - aviarc

Here's the workflow I'm using.
<atomic-commit>
<dataset name="foo"/>
</atomic-commit>
<dataset-iterator dataset="foo">
<create-row dataset="hist-foo"/>
<mark-row-created dataset="hist-foo"/>
</dataset-iterator>
So basically, after dataset foo is updated, I want to record the remaining foo entries in another history table. But when I delete rows from the foo table, the rows still remain in the dataset and therefore get added to hist-foo.
I've tried to add a post-workflow to the foo databroker's delete action like this:
<workflow>
<delete-row dataset="{$context.commit-dataset-name}"/>
</workflow>
However I get an error when the delete action is called.
Also, after the first atomic commit, the foo dataset doesn't keep deleted row actions, so I can't identify which rows were deleted from the dataset.

The simplest solution for this situation would be to sift the marked-deleted rows into a separate dataset. Unfortunately this is a little long when using only built-in commands.
<dataset name="deleted-foo" databroker="..."/>
<dataset-iterator dataset="foo">
<if test="row-marked-deleted" value1="foo">
<then>
<create-row dataset="deleted-foo"/>
<copy-row from-dataset="foo" to-dataset="deleted-foo"/>
<mark-row-deleted dataset="deleted-foo"/>
</then>
</if>
</dataset-iterator>
<!-- Keeping in mind that you can't delete rows from a dataset
which is being iterated over. -->
<dataset-iterator dataset="deleted-foo">
<dataset-reset dataset="foo" no-current-row="y"/>
<!-- Assuming rows have a field 'id' which uniquely IDs them -->
<set-current-row-by-field dataset="foo" field="id" value="{$deleted-foo.id}"/>
<if test="dataset-has-current-row" value1="foo">
<then>
<delete-row dataset="foo"/>
</then>
</if>
</dataset-iterator>
<atomic-commit>
<dataset name="deleted-foo"/>
<dataset name="foo"/>
</atomic-commit>
<dataset-iterator dataset="foo">
<create-row dataset="hist-foo"/>
<mark-row-created dataset="hist-foo"/>
</dataset-iterator>
An alternate solution would be to do the history recording at the same time as the inserts/updates were run, for example by running multiple statements within the operations or by having insert/update triggers set up if those are available.
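For instance, if the underlying database supports triggers, something like the following sketch (Oracle-style syntax; the id, name and recorded_at columns are just placeholders for whatever foo actually holds) would copy each inserted or updated row into the history table as it is written:
create or replace trigger foo_history_trg
after insert or update on foo
for each row
begin
  -- record the new/updated row in the history table as it is written
  insert into hist_foo (id, name, recorded_at)
  values (:new.id, :new.name, systimestamp);
end;
/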

I think that in the answer from Tristan you don't necessarily need to commit the "deleted-foo" dataset, as you don't mark its rows with any commit flag.
A bit further - I would personally move those operations into the pre- and post-commit workflows of the databroker. You'd capture all rows marked as deleted in the pre-commit workflow, and then delete rows from the foo dataset and populate the history dataset in the post-commit workflow.

Related

How to Handle BQ GA Export Changes?

I'm trying to reprocess ga_sessions_yyyymmdd data, but am finding that ga_sessions never used to have a field called [channelGrouping], although it does in more recent data.
So my jobs work fine for the latest version of ga_sessions, but when I try to reprocess earlier ga_sessions data the job fails as it's missing the [channelGrouping] field.
Obviously this is usually what you want, but in this case it's not. I want to make sure I'm sticking to the latest ga_sessions schema, and would like the job to just set missing columns to null where they did not exist.
Is there any way around this?
Perhaps I need to make an empty table called ga_sessions_template_latest and union it onto whatever ga_sessions_ daily table I'm handling - maybe this will 'upgrade' the old ga_sessions to the new structure.
Attached is a screenshot of exactly what I mean (my union idea will actually be horrible due to nested fields in ga_sessions).
I don't have such a script yet. But since the tables are under your project you are able to update them. You can write a script and update the schema on all tables with missing columns from the most recent schema set.
I envision a script that gets the most recent table schema.
It then goes back one by one through past tables, does a compare, identifies the missing columns, defines them as nullable (not required), reads the schema, applies the additional columns, and runs the update on the table. Data won't be modified; you will just have additional columns with null values.
You can also try some of this from the Web UI.
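If patching every old table isn't practical, the "pad missing columns with NULL" idea from the question can be sketched in standard SQL for a flat column like channelGrouping (the project/dataset/table names below are placeholders, and the nested GA fields would need the same treatment field by field):
SELECT
  fullVisitorId,
  visitStartTime,
  CAST(NULL AS STRING) AS channelGrouping
FROM `my-project.my_dataset.ga_sessions_20160101`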

Determine flyway variables from earlier SQL step

I'd like to use flyway for a DB update in a situation where a DB already exists with production data in it. The problem I'm looking at now (and I have not found a nice solution yet) is the following:
There is an existing DB table with numeric IDs, e.g.
create table objects ( obj_id number, ...)
There is a sequence "obj_seq" to allocate new obj_ids
During my DB migration I need to introduce a few new objects, hence I need new object IDs. However, I do not know at development time what ID numbers these will be.
There is a DB trigger which later references these IDs. To improve performance I'd like to avoid determining the actual IDs every time the trigger runs, and rather put the IDs directly into the trigger.
Example (very simplified) of what I have in mind:
insert into objects (obj_id, ...) values (obj_seq.nextval, ...)
select obj_seq.currval from dual
-> store this in variable "newID"
create trigger on some_other_table
when new.id = newID
...
Now, is it possible to dynamically determine/use such variables? I have seen the flyway placeholders but my understanding is that I cannot set them dynamically as in the example above.
I could use a Java-based migration script and do whatever string magic I like - so, that would be a way of doing it, but maybe there is a more elegant way using SQL?
Many thx!!
tge
If the table you are updating contains only reference data, get rid of the sequence and assign the IDs manually.
If it contains a mix of reference and user data, you need to select the id based on values in other columns.
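A rough sketch of both suggestions in plain SQL/PL-SQL (Oracle syntax; obj_code, MY_NEW_OBJECT and the literal id are placeholders):
-- Option 1: reference data only - drop the sequence for these rows and
-- assign the id manually, so the trigger can use the same literal value.
insert into objects (obj_id, obj_code) values (1001, 'MY_NEW_OBJECT');
-- Option 2: mixed data - keep the sequence, but have the trigger look the
-- id up by a stable natural key instead of a hard-coded number.
create or replace trigger some_other_table_trg
before insert on some_other_table
for each row
declare
  v_obj_id objects.obj_id%type;
begin
  select obj_id into v_obj_id from objects where obj_code = 'MY_NEW_OBJECT';
  if :new.id = v_obj_id then
    null;  -- placeholder for the real trigger logic
  end if;
end;
/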

BizTalk - how to map these two nodes to a repeating node?

I have an incoming schema that looks like this:
<Root>
<ClaimDates005H>
<Begin>20120301</Begin>
<End>20120302</End>
</ClaimDates005H>
</Root>
(there's more to it, this is just the area I'm concerned with)
I want to map it to a schema with a repeating section, so it winds up like this:
<Root>
<DTM_StatementFromorToDate>
<DTM01_DateTimeQualifier>Begin</DTM01_DateTimeQualifier>
<DTM02_ClaimDate>20120301</DTM02_ClaimDate>
</DTM_StatementFromorToDate>
<DTM_StatementFromorToDate>
<DTM01_DateTimeQualifier>End</DTM01_DateTimeQualifier>
<DTM02_ClaimDate>20120302</DTM02_ClaimDate>
</DTM_StatementFromorToDate>
</Root>
(That's part of an X12 835, BTW...)
Of course in the destination schema there's only a single occurrence of DTM_StatementFromorToDate, which can repeat... I get that I can run both Begin and End into a Looping functoid to create two instances of DTM_StatementFromorToDate, one with Begin and one with End, but then how do I correctly populate DTM01_DateTimeQualifier?
Figured it out: the Table Looping functoid took care of it.

Building Accessories Schema and Bulk Insert

I developed an automation application for a car service. I have just started the accessories module, but I can't figure out how I should build the data model schema.
I've got the accessories data in a text file, line by line (not a CSV or similar, so I split the lines by substring). Every month the factory sends the data file to the service. It includes the prices, the names, the codes, etc. Every month the prices are updated. I thought bulk insert was a good choice to load the data into SQL (and I did that), but it doesn't solve my problem: I don't want duplicate data just to get the new prices. I thought about inserting only the prices into another table and building a relation between the Accessories and AccesoriesPrices tables, but sometimes new accessories are added to the list, so I'd have to check every line against the Accessories table. On the other hand, I also have to keep the quantity of the accessories, the invoices, etc.
By the way, they send 70,000 lines every month. So, can anyone help me? :)
Thanks.
70,000 lines is not a large file. You'll have to parse this file yourself and issue ordinary insert and update statements based upon the data contained therein. There's no need for using bulk operations for data of this size.
The most common approach to something like this would be to write a simple SQL statement that accepts all of the parameters, then does something like this:
if(exists(select * from YourTable where <exists condition>))
update YourTable set <new values> where <exists condition>
else
insert into YourTable (<columns>) values(<values>)
(Alternatively, you could try rewriting this statement to use the MERGE T-SQL statement; a rough sketch follows below.)
Where...
<exists condition> represents whatever you would need to check to see if the item already exists
<new values> is the set of Column = value statements for the columns you want to update
<columns> is the set of columns to insert data into for new items
<values> is the set of values that corresponds to the previous list of columns
You would then loop over each line in your file, parsing the data into parameter values, then running the above SQL statement using those parameters.
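As a rough illustration of the MERGE alternative mentioned above (the Accessories table, its Code, Name and Price columns, and the @-parameters are made-up names for the example):
merge Accessories as target
using (select @Code as Code, @Name as Name, @Price as Price) as source
    on target.Code = source.Code
when matched then
    update set target.Name = source.Name, target.Price = source.Price
when not matched then
    insert (Code, Name, Price)
    values (source.Code, source.Name, source.Price);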

How to restrict PL/SQL code from executing twice with the same values to the input parameters?

I want to prevent my PL/SQL code from being executed repeatedly with the same inputs. That is, I have written PL/SQL code with three input parameters, viz. Month, Year and a Flag. I have executed the procedure with the following values for the parameters:
Month: March
Year : 2011
Flag: Y
Now, if I try to execute the procedure with the same values for the parameters as above, I want to write some code in the PL/SQL to prevent the unwanted second execution. Can anyone help? I hope the question is not ambiguous.
You can use the function result cache (http://www.oracle-developer.net/display.php?id=504), so Oracle can do this for you.
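Roughly, that looks like the sketch below (Oracle 11g+; the function name and body are placeholders). Note the result cache applies to functions that return a value, not to procedures that perform DML, so it only fits if the work can be expressed that way:
create or replace function monthly_report (p_month varchar2, p_year number, p_flag varchar2)
  return number
  result_cache
is
  v_result number;
begin
  -- expensive computation goes here; repeated calls with the same
  -- parameter values are served from the cache instead of re-running
  v_result := 0;
  return v_result;
end;
/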
I would create another table that would store the 3 parameters of each request. When your procedure is called it would first check the "parameter request" table to see if the calling parameters have been used before. If found, then exit the procedure. If not found, then save the parameters and execute the rest of the procedure.
You're going to need to keep "state" about the last call somewhere. I would recommend creating a table with a datetime column.
When your procedure is called, update this table. Then the next time your procedure is called, check this table to see when your procedure was last called and proceed accordingly.
Why not set up a table to track what arguments you've already executed it with?
In your procedure, first check that table to see if similar parameters have already been processed. If so, exit (with or without an error).
If not, insert them and do the processing necessary.
Depending on how tight the requirements are, you'll need to get an exclusive lock on that table to prevent concurrent execution.
A nice plus would be an extra column with "in progress"/"done"/"error" status so that you can check if things are going on properly. (Maybe a timestamp too if that's important/interesting.)
This setup allows you to easily clear some of the executions (by deleting some rows) if you find things need to be re-done for whatever reason.
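A minimal sketch of this tracking-table approach (table, column and procedure names are assumptions):
create table proc_run_log (
  run_month varchar2(20),
  run_year  number,
  run_flag  varchar2(1),
  status    varchar2(20),
  run_time  timestamp default systimestamp,
  constraint proc_run_log_uk unique (run_month, run_year, run_flag)
);
create or replace procedure my_proc (p_month varchar2, p_year number, p_flag varchar2) is
begin
  -- the unique constraint makes a repeated call with the same parameters fail fast
  insert into proc_run_log (run_month, run_year, run_flag, status)
  values (p_month, p_year, p_flag, 'IN PROGRESS');
  -- the real processing goes here
  update proc_run_log
     set status = 'DONE'
   where run_month = p_month and run_year = p_year and run_flag = p_flag;
exception
  when dup_val_on_index then
    raise_application_error(-20001, 'Already executed with these parameter values');
end;
/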
Make an insert at the beginning of the procedure, and do a SELECT FOR UPDATE to lock the table so no one else can process any data; if everything goes OK with the procedure, commit and release the table 😀
