The job called "Contract States job" fails every time it runs with the following message:
"Async Handler for Update Contract States Operation encountered some
errors, see log for more detail.Detail: "
I've even tried setting trace to verbose, but I can't find the log or any errors explaining why this job is failing every time. This job is crucial to the contract entity, as contracts need to renew (via plugin) promptly so as to generate contract-detail lines in time for other processes to follow.
Could someone please point me to this log, or tell me how I can trap the error that is causing this job to fail?
The trace displayed errors on some contracts that were unable to update state because the TotalPrice was not equal to the sum of their contract lines. Further diffing showed that another "unsupported script" would change contract states, thus causing this operation to fail on those contracts.
The fix was to look at the "unsupported script" instead.
In Apache Airflow (2.x), each Operator Instance has a state as defined here (airflow source repo).
I have two use cases that don't seem to clearly fall into the pre-defined states:
Warn, but don't fail - This seems like it should be a very standard use case and I am surprised to not see it in the out-of-the-box airflow source code. Basically, I'd like to color-code a node with something eye-catching - say orange - corresponding to a non-fatal warning, but continue execution as normal otherwise. Obviously you can print warnings to the log, but finding them takes more work than just looking at the colorful circles on the DAGs page.
"Sensor N/A" or "Data not ready" - This would be a status that gets assigned when a sensor notices that data in the source system is not yet ready, and that downstream operators can be skipped until the next execution of the DAG, but that nothing in the data pipeline is broken. Basically an expected end-of-branch.
Is there a good way of achieving either of these use cases with the out-of-the-box Airflow node states? If not, is there a way to define custom operator states? Since I am running Airflow on a managed service (MWAA), I don't think changing the source code of our deployment is an option.
Thanks,
The task states are tightly integrated with Airflow. There's no way to configure which logging levels lead to which state. I'd say the easiest way is to grep the log files for "WARNING", or to set up a log aggregation service, e.g. Elasticsearch, to make the log files searchable.
For #2, sensors have no knowledge of why they timed out. After timeout or execution_timeout is reached, they simply raise an exception. You can deal with exceptions using trigger_rules, but these still don't take the reason for the exception into account.
If you want more control over this, I would implement your own sensor which takes an argument, e.g. data_not_ready_timeout (smaller than timeout and execution_timeout). In the poke() method, check whether data_not_ready_timeout has been reached, and raise an AirflowSkipException if so. This will skip downstream tasks. Once timeout or execution_timeout is reached, the task is failed. Look at BaseSensorOperator.execute() for some inspiration on getting the initial starting date of a sensor.
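A minimal sketch of such a sensor (assuming Airflow 2.x and the default mode="poke", so the sensor instance persists between pokes; data_not_ready_timeout and _check_data_ready() are illustrative names, not Airflow API):

import time

from airflow.exceptions import AirflowSkipException
from airflow.sensors.base import BaseSensorOperator


class DataReadySensor(BaseSensorOperator):
    def __init__(self, *, data_not_ready_timeout: float = 600.0, **kwargs):
        super().__init__(**kwargs)
        # Seconds to wait for data before skipping; keep it smaller than
        # timeout / execution_timeout so the skip fires first.
        self.data_not_ready_timeout = data_not_ready_timeout
        self._first_poke_at = None

    def poke(self, context):
        if self._first_poke_at is None:
            self._first_poke_at = time.monotonic()  # remember when poking started

        if self._check_data_ready():
            return True  # data is there, downstream tasks run normally

        if time.monotonic() - self._first_poke_at > self.data_not_ready_timeout:
            # Expected end-of-branch: mark this task as skipped instead of failing later.
            raise AirflowSkipException("Source data not ready; skipping downstream tasks.")

        return False  # keep poking

    def _check_data_ready(self) -> bool:
        # Placeholder: query the source system here.
        return False

With the default trigger rule, downstream tasks are then skipped rather than failed, which matches the "expected end-of-branch" behavior you describe.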
Consider the following basic structure of a Parallel Convoy pattern in BizTalk 2016: a Parallel Actions shape with two active Receive shapes, combined with a single correlation set that is initialized by both active receives.
My issue arose when I wanted separate exception handling, one for the left receive and one for the right receive. So I've put a scope with a timeout around the left receive (Scope_1), and wrapped that scope in another scope (Scope_3) to catch the timeout exception.
Now for some reason this isn't allowed and I get back "fatal error X1001: unknown system exception" at build time.
However, if I wrap Scope_3 around both active receives, it builds successfully:
What's the significant difference here for BizTalk to not allow separate timeout exception handling in this scenario?
By the way:
It doesn't matter what type of exception I'm trying to catch, or whether my scopes are Long Running transactions or not; the error occurs either way.
If I make a separate correlation set for each receive, the error does not occur, but of course that's not what I want, because then it wouldn't be a parallel convoy.
Setting scopes to synchronized does not affect the behavior.
The significant difference is that the orchestration will start up when it receives the first message, which may not be the one in scope_1. In that scenario the timer would not be started. And if it was the scope_1 message, well, it won't time out because you have received it, but it won't be timing out for scope_2.
Having the timer around both does set the timeout in both scenarios.
What you could do is have the timeout scope as per your second example and set a flag to indicate which one was received, and use that in your exception block.
The other option is a first Receive shape that initializes the correlation set, followed by a second receive that follows the correlation, with the timeout on that second receive.
First, I am able to replicate your issue.
Although Visual Studio reports this as an unknown system exception, to me it looks like unreachable code detected, based on the receive shape inside the scope (scope_3) that is trying to initialize your correlation. So there's a possibility that you won't be able to initialize the correlation the same way your left scope (scope_2) does if your main scope (scope_1) runs into an exception.
The only way I can think of is to use different correlation sets; you can set your send port to follow these correlation sets.
Without using correlation sets, this does not give an error at build time. To me this looks like an MS bug: VS should be able to point out the unreachable code detected, not a fatal error:
I see a bunch of 'permanent' failures when I run the following command:
.show ingestion failures | where FailureKind == "Permanent"
For all the entries that are returned the error code is UpdatePolicy_UnknownError.
The Details column for all the entries shows something like this:
Failed to invoke update policy. Target Table = 'mytable', Query = '<some query here>': The remote server returned an error: (409) Conflict.: : :
What does this error mean? How do I find out the root cause behind these failures? The information I find through this command is not sufficient. I also copied the OperationId of a sample entry and looked it up against the operations info:
.show operations | where OperationId == '<sample operation id>'
But all I found in the Status was the message "Failed performing non-transactional update policy". I know it failed, but can we find out the underlying reason?
"(409) Conflict" error usually comes from writing to the Azure storage.
In general, this error should be treated as a transient one.
If it happens while writing the main part of the ingestion, it is retried (****).
In your case, it happens while writing the data of the non-transactional update policy. This write is not retried: the data enters the main table, but not the dependent table.
In the case of a transactional update policy, the whole ingestion fails and is then retried.
(****) For a short period there was a bug in handling such an error for the main ingestion data: it was treated as permanent. The bug should be fixed now.
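As a side note, if you want to double-check how the update policy on the target table is configured (in particular whether IsTransactional is set), you can run the same kind of control command as the .show commands above. Here is a small sketch using the azure-kusto-data Python SDK; the cluster URI, database, and table names are placeholders, and the auth method is just one option:

from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Placeholder cluster URI; Azure CLI authentication is one of several options.
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    "https://mycluster.westeurope.kusto.windows.net"
)
client = KustoClient(kcsb)

# Control commands (like .show ingestion failures above) go through execute_mgmt.
response = client.execute_mgmt("mydatabase", ".show table mytable policy update")
for row in response.primary_results[0]:
    # The Policy column is JSON; look for "IsTransactional": true/false per query.
    print(row["Policy"])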
I have a DAG defined that contains a number of tasks, the last of which is only run if any of the previous tasks fail. This task simply posts to a Slack channel that the DAG run experienced errors.
What I would really like is if the message sent to the Slack channel contained the actual error that is logged in the task logs, to provide immediate context to the error and perhaps save Ops from having to dig through the logs.
Is this at all possible?
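For reference, a rough sketch of the setup described above (assuming a recent Airflow 2.x, a final PythonOperator with trigger_rule=ONE_FAILED, and a hypothetical post_to_slack() helper); it only surfaces the failed task IDs and their log URLs, since the full traceback lives in the task logs:

import pendulum
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.state import State
from airflow.utils.trigger_rule import TriggerRule


def post_to_slack(text: str) -> None:
    # Placeholder: replace with a real Slack webhook call or operator.
    print(text)


def notify_slack(**context):
    # Collect the failed task instances of the current DAG run and report them.
    dag_run = context["dag_run"]
    failed = dag_run.get_task_instances(state=[State.FAILED])
    lines = [f"{ti.task_id} failed, logs: {ti.log_url}" for ti in failed]
    post_to_slack("DAG run had errors:\n" + "\n".join(lines))


with DAG(
    dag_id="example_failure_notification",  # placeholder DAG
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
):
    work = PythonOperator(task_id="some_work", python_callable=lambda: None)
    notify = PythonOperator(
        task_id="notify_slack_on_failure",
        python_callable=notify_slack,
        trigger_rule=TriggerRule.ONE_FAILED,  # run only when an upstream task failed
    )
    work >> notify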
I'll try to provide as much information as possible:
No error message.
The instance stays in the "ready service instances" list.
The receive location has the same parameters (except URI, the three polling queries, user account/pw and receive pipeline) as another receive location that points to another database/table which works.
The pipeline is waiting for the correct schema.
The port surface and receive location are both waiting for the correct schema.
In my test example, there are only 10 lines being returned.
The message, which contains those 10 lines, validates against the schema.
I tried leaving the instance alone - 30+ minutes - to no avail; there was no change in its condition.
I had also tried suspending and then resuming it, which places the instance in the "dehydrated orchestrations" list. Again, with no error message.
I'm able to get the message by looking at the body of the message that's in the "ready to run" service instance. (This is the message that validates against the schema I use in Visual Studio.)
How might something like this arise?
Stupid question, but I have to ask... Is the corresponding host instance running?