Duplicated workflow with correlation - workflow-foundation-4

I have a workflow with correlation. When I call some method twice with the same parameters, I get the following error:
The execution of an InstancePersistenceCommand was interrupted by a key collision. The instance key with value 'bcd874f3-1d47-d9f0-de51-4487d1e4e12e' could not be associated to the instance because it is already associated to a different instance.
Is there any way to delete the previous workflow and start a new one?

You can add a WorkflowControlEndpoint to the WorkflowServiceHost and use the WorkflowControlClient to terminate the existing workflow before starting a new one with the same correlation key.
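A rough sketch of what that can look like when you self-host, assuming you already know (or can look up in the instance store) the Guid of the stuck instance; the addresses and variable names here are illustrative, not part of any existing configuration:

// Host side: expose a control endpoint alongside the application endpoints.
var host = new WorkflowServiceHost(workflowService, new Uri("http://localhost:8080/OrderService"));
host.AddServiceEndpoint(new WorkflowControlEndpoint(
    new NetNamedPipeBinding(),
    new EndpointAddress("net.pipe://localhost/OrderService/control")));
host.Open();

// Caller side: terminate the old instance before starting a new one with the same key.
var control = new WorkflowControlClient(
    new NetNamedPipeBinding(),
    new EndpointAddress("net.pipe://localhost/OrderService/control"));
control.Terminate(existingInstanceId);   // Guid of the previous instance, resolved from your own bookkeeping

Note that the control operations take the workflow instance id, not the correlation key, so you need to record the instance id when the first workflow starts (or query the instance store for it). Cancel is a gentler alternative to Terminate if the workflow has cancellation handlers that should run.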

Related

MDriven ECO_ID duplicates

We appear to have a problem with MDriven generating the same ECO_ID for multiple objects. For the most part it seems to happen in conjunction with unexpected process shutdowns and/or server shutdowns, but it does also happen during normal activity.
Our system consists of one ASP.NET application and one WinForms application. The ASP.NET app is set up in IIS to use a single worker process. We have a mixture of WebForms and MVC, including ApiControllers. We're using a rather old version of the ECO packages: 7.0.0.10021. We're on VS 2017, target framework is 4.7.1.
We have it configured to use 64-bit integers for object ids. Database is Firebird. SQL configuration is set to use ReadCommitted transaction isolation.
As far as I can tell we have configured EcoSpaceStrategyHandler with EcoSpaceStrategyHandler.SessionStateMode.Never, which should mean that EcoSpaces are not reused at all, right? (Why would I even use EcoSpaceStrategyHandler in this case, instead of just creating EcoSpace normally with the new keyword?)
We have created MasterController : Controller and MasterApiController : ApiController classes that we use for all our controllers. These have an EcoSpace property that simply does this:
if (ecoSpace == null)
{
    if (ecoSpaceStrategyHandler == null)
        ecoSpaceStrategyHandler = new EcoSpaceStrategyHandler(
            EcoSpaceStrategyHandler.SessionStateMode.Never,
            typeof(DiamondsEcoSpace),
            null,
            false
        );
    ecoSpace = (DiamondsEcoSpace)ecoSpaceStrategyHandler.GetEcoSpace();
}
return ecoSpace;
I.e. if no strategy handler has been created, create one specifying no pooling and no session state persisting of eco spaces. Then, if no ecospace has been fetched, fetch one from the strategy handler. Return the ecospace. Is this an acceptable approach? Why would it be better than simply doing this:
if (ecoSpace == null)
    ecoSpace = new DiamondsEcoSpace();
return ecoSpace;
In aspx we have a master page that has an EcoSpaceManager. It has been configured to use a pool but SessionStateMode is Never. It has EnableViewState set to true. Is this acceptable? Does it mean that EcoSpaces will be pooled but inactivated between round trips?
It is possible that we receive multiple incoming API calls in tight succession, so that one API call hasn't been completed before the next one comes in. I assume that this means that multiple instances of MasterApiController can execute simultaneously but in separate threads. There may of course also be MasterController instances executing MVC requests and also the WinForms app may be running some batch job or other.
But as far as I understand, id reservation is made at the beginning of any UpdateDatabase call, in this way:
update "ECO_ID" set "BOLD_ID" = "BOLD_ID" + :N;
select "BOLD_ID" from "ECO_ID";
If the returned value is K, this will reserve N new ids ranging from K - N to K - 1. Using ReadCommitted transactions everywhere should ensure that the update locks the id data row, forcing any concurrent save operations to wait, then fetches the update result without interference from other transactions, then commits. At that point any other pending save operation can proceed with its own id reservation. I fail to see how this could result in the same ID being used for multiple objects.
I should note that it does seem like it sometimes produces id duplicates within one single UpdateDatabase, i.e. when saving a set of new related objects, some of them end up with the same id. I haven't really confirmed this though.
Any ideas what might be going on here? What should I look for?
The issue is most likely that you use ReadCommitted isolation.
This allows two systems to simultaneously start a transaction, read the current value, increase it by their batch size, and then save one after the other.
You must use Serializable isolation for key generation, i.e. only read values that are not currently part of a write operation.
MDriven uses two settings for the isolation level: UpdateIsolationLevel and FetchIsolationLevel.
Set your UpdateIsolationLevel to Serializable.
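To make the failure mode concrete, here is a hedged illustration of the interleaving described above, with made-up numbers (assume both clients want to reserve 10 ids and the counter starts at 100):

-- Two transactions, both running under ReadCommitted:
-- T1: reads BOLD_ID                      -> 100
-- T2: reads BOLD_ID                      -> 100
-- T1: writes BOLD_ID = 110 and commits   -- T1 hands out ids 100..109
-- T2: writes BOLD_ID = 110 and commits   -- T2 also hands out ids 100..109 -> duplicates
-- Under Serializable isolation, T2 would conflict with T1's concurrent change and
-- would have to wait or retry, so each reservation sees the other's result.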

Airflow - How to pass the output of one operator as input to another task

I have a list of http endpoints, each performing a task on its own. We are trying to write an application that orchestrates them by invoking these endpoints in a certain order. In this solution we also have to process the output of one http endpoint and generate the input for the next http endpoint. Also, the same workflow can get invoked simultaneously depending on the trigger.
What I have done until now,
1. Have defined a new operator deriving from the HttpOperator and introduced capabilities to write the output of the http endpoint to a file.
2. Have written a python operator which can transfer the output depending on the necessary logic.
Since I can have multiple instances of the same workflow in execution, I could not hardcode the output file names. Is there a way to make the http operator I wrote write to unique file names, and make the same file name available to the next task so that it can read and process the output?
Airflow does have a feature for operator cross-communication called XCom.
XComs can be “pushed” (sent) or “pulled” (received). When a task pushes an XCom, it makes it generally available to other tasks. Tasks can push XComs at any time by calling the xcom_push() method.
Tasks call xcom_pull() to retrieve XComs, optionally applying filters based on criteria like key, source task_ids, and source dag_id.
To push to XCom, use:
ti.xcom_push(key=<variable name>, value=<variable value>)
To pull an XCom value, use:
myxcom_val = ti.xcom_pull(key=<variable name>, task_ids='<task to pull from>')
With the BashOperator, you just set xcom_push=True and the last line written to stdout is set as the XCom value.
You can view the XCom value while your task is running by simply opening the task instance from the Airflow UI and clicking on the XCom tab.
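Applied to the file-name problem in the question, a minimal sketch (classic Airflow 1.x style; the DAG name, paths, and helper logic are illustrative, and it assumes all tasks of a run can reach the same filesystem) could look like this: the first task derives a run-specific file name from run_id and pushes it, the second task pulls it.

from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def call_endpoint_a(**kwargs):
    # run_id is unique per DAG run, so concurrent runs get distinct files.
    out_path = "/tmp/endpoint_a_{}.json".format(kwargs["run_id"].replace(":", "_"))
    # ... call the first HTTP endpoint and write its response to out_path ...
    kwargs["ti"].xcom_push(key="endpoint_a_output", value=out_path)

def transform_for_b(**kwargs):
    # Pull the file name pushed by the upstream task and build the next request from it.
    in_path = kwargs["ti"].xcom_pull(key="endpoint_a_output", task_ids="call_endpoint_a")
    # ... read in_path, transform it, and call the next HTTP endpoint ...

with DAG("endpoint_pipeline", start_date=datetime(2017, 1, 1),
         schedule_interval=None, catchup=False) as dag:
    call_a = PythonOperator(task_id="call_endpoint_a",
                            python_callable=call_endpoint_a,
                            provide_context=True)
    transform_b = PythonOperator(task_id="transform_for_b",
                                 python_callable=transform_for_b,
                                 provide_context=True)
    call_a >> transform_b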

Lambda as a leaky state machine?

I have a micro-service which is involved in an OAuth 1 interaction. I'm finding myself in a situation where two runs of the Lambda function with precisely the same starting state have very different outcomes (where state is considered the "event" passed in, environment variables, and "stageParameters" from the API Gateway).
Here's a Cloudwatch log that shows two back-to-back runs:
You can see that while the starting state is identical, the execution path changes pretty quickly. In the second case (the failure case), you see the log entry "Auth state changed: null" ... which is very odd indeed because this is logged before even the first line of code of the "handler" is executed. Here's the beginning of the function's handler:
export const handler = (event, context, cb) => {
  console.log('EVENT:\n', JSON.stringify(event, null, 2));
So where is this premature logging entry coming from? Well, one must assume that it somehow is left over from prior executions. Let me demonstrate ... it is in fact an event listener that was set up in the prior execution. This function interacts with a Firebase DB, and the first time it connects it sets the following up:
auth.signInWithEmailAndPassword(username, password)
  .then((result) => {
    auth.onAuthStateChanged(this.watchAuthState);
where the watchAuthState function is simply:
watchAuthState(user) {
  console.log(`Auth state changed:\n`, JSON.stringify(user, null, 2));
}
This seems to mean that when the function runs a second time it is already "initialized" with the Firebase DB, but apparently the authentication has been invalidated. My number one aim is to just get back to a predictive state model and have it execute precisely the same each time.
If there are sneaky ways to reuse cached state between Lambda executions in resource-useful ways, then I guess that too would be interesting, but only if we can do that while achieving the predictive state machine.
Regarding the logs order, look at the ID that comes after each timestamp at the beginning of each line. I believe this is the invocation ID. In the two lines you have highlighted in orange, they are from different invocations of the function. The EVENT log is the first line to get logged from the invocation with ID ending in 754ee. The Auth state changed: null line is a log entry coming from the earlier invocation of the function with invocation ID ending in c40d5.
It looks like you are setting auth state to null at the end of an invocation, but the Firebase connection is global, so the second function invocation thinks the Firebase connection is already initialized, but then it throws errors because the authentication was nulled out.
"My number one aim is to just get back to a predictive state model and have it execute precisely the same each time."
Then you need to be aware of Lambda container reuse, and not use any global variables.
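A hedged sketch of what that can look like for the code above: keep all per-invocation setup inside the handler and tear down any listeners before returning, so a reused container has nothing left over (getFirebaseAuth, username and password are stand-ins for however the original code obtains them):

// Nothing stateful at module scope: each invocation builds what it needs.
export const handler = async (event) => {
  console.log('EVENT:\n', JSON.stringify(event, null, 2));

  const auth = getFirebaseAuth();   // hypothetical helper returning the Firebase auth object
  await auth.signInWithEmailAndPassword(username, password);

  // onAuthStateChanged returns an unsubscribe function; call it before returning
  // so the listener cannot fire during a later invocation in a reused container.
  const unsubscribe = auth.onAuthStateChanged((user) => {
    console.log('Auth state changed:\n', JSON.stringify(user, null, 2));
  });
  try {
    // ... the rest of the OAuth 1 interaction ...
  } finally {
    unsubscribe();
  }
};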

How to uniquely associate a nested call with the original call?

I have a function that is using Unity interception to log the method execution time. The problem comes when I want to log deeper info like database call time, backend time, etc.
So my method (let's say M1) calls some other method (M2), which in turn calls some other method, and so on, until finally a dbMethod is called that hits the database. I am able to log time for all functions individually, but for final aggregation in my log server it would be helpful if I could tell, for a given M1 call, how much time the db method took.
So is there some property, like a threadId, that remains the same during the nested calls, so that I can use it in the final aggregation (for joining M1 and dbMethod log data)? I would like that unique value to be different in different invocations of M1.
For anyone out there facing the same problem, I solved it using the CallContext class. Just add a request id in the outermost call and it will propagate to all the inner nested calls.
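A minimal sketch of that idea, using .NET's logical call context (the class and member names below are illustrative, not from the original code):

using System;
using System.Runtime.Remoting.Messaging;

// Stamp the outermost call (M1) with a correlation id; any code running in the same
// logical call context (including nested calls and async continuations) can read it back.
public static class CorrelationContext
{
    private const string Key = "RequestCorrelationId";

    public static string StartNewScope()
    {
        var id = Guid.NewGuid().ToString("N");
        CallContext.LogicalSetData(Key, id);
        return id;
    }

    public static string Current => CallContext.LogicalGetData(Key) as string;
}

// In the interception behavior around M1:  CorrelationContext.StartNewScope();
// In the logging around the db method:     log the elapsed time together with CorrelationContext.Current,
// then join the M1 and dbMethod entries on that id in the log server.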

pass a directory name from one coordinator to another in oozie

I have a coordinator-A running whose workflow generates output to a directory
/var/test/output/20161213-randomnumber/
Now I need to pass the dir name "20161213-randomnumber" to another coordinator-B, which needs to start as soon as the workflow of coordinator-A is completed.
I am not able to find any pointers on how to pass the directory name, or on how coordinator-B can be triggered with the directory generated by coordinator-A.
However, I have seen numerous examples of triggering coordinators for a specific date or for daily, weekly, or monthly datasets. In my case the dataset is not time-dependent; it can arrive arbitrarily.
In your case, you can add one more action to your coordinator-A workflow that puts an empty trigger file (trig.txt) after the data generation action (the one that writes /var/test/output/20161213-randomnumber/). Then in your coordinator-B, add a data dependency pointing to the trigger file; if it is there, coordinator-B will start. Once B has started, you can clear the trigger file for the next run.
You can use this data dependency to solve the problem; you cannot pass a parameter from one coordinator to another coordinator.
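A hedged sketch of the coordinator-B side of that, using a dataset with a done-flag; the names, frequency, and paths are illustrative, and it assumes coordinator-A drops trig.txt directly under /var/test/output when it is done:

<!-- dataset that only becomes available once the trigger file exists -->
<datasets>
  <dataset name="a_output" frequency="${coord:days(1)}" initial-instance="${start}" timezone="UTC">
    <uri-template>hdfs://namenode/var/test/output</uri-template>
    <done-flag>trig.txt</done-flag>
  </dataset>
</datasets>
<input-events>
  <data-in name="a_output_ready" dataset="a_output">
    <instance>${coord:current(0)}</instance>
  </data-in>
</input-events>

Since a parameter cannot be passed between coordinators, coordinator-B's workflow still has to discover the actual 20161213-randomnumber directory itself, for example by listing the newest subdirectory of /var/test/output in its first action.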
