I was wondering if there is any way you can be aware inside an orchestration that it has just been resumed? I'm logging processing steps and errors from the orchestration, and it would be nice if in the error log you would see something like "Step 2 failed", and then as the next entry "Orchestration resumed at step 2".
Is there maybe some global property that is set after a resume?
So...first some advice...
"I'm logging processing steps and errors from the orchestration"
Don't waste your time with this. Totally serious. I've seen people put hours into this, and the result is never useful and almost always causes more problems than it will ever help you solve. Most importantly, BizTalk Tracking already does this.
Now, in practical terms, you will already know if the Orchestration was suspended, so you're trying to log something that you already know. And even if you do manage to log this, again, it won't help.
Basically, you will be much better off learning how to use the built-in tools, such as Tracking and the Event Log, instead of spending time on something that will, trust me, never help you. Meaning, it's a net negative.
For an orchestration to be resumed, it first would have to become suspended.
As far as I know, there are 3 ways to suspend an orchestration:
A suspend shape was encountered
An exception occurred
The orchestration is manually suspended using the BizTalk Administration Console.
The first 2 cases are easy to handle programmatically: set a variable either before the suspend shape or in your exception handling, and check it once the orchestration continues.
For the last case, which doesn't occur often, I'm not sure if that is possible.
I am using an Axon Tracking Event Processor. Sometimes events take longer than 10 seconds to process.
This seems to cause the message to be processed again and this appears in the log "Releasing claim of token X/0 failed. It was owned by another node."
If I up the number of segments it does not log this BUT the event is still processed twice so I think this might be misleading. (I think I was mistaken about this)
I have tried adjusting the fetchDelay, cleanupDelay and tokenClaimInterval. None of which has fixed this. Is there a property or something that I am missing?
Edit
The scenario taking longer than 10 seconds is making an HTTP request to an external service.
I'm using Axon 4.1.2 with the default configuration provided by Spring auto-configuration. I cannot see the "Releasing claim on token and preparing for retry in [timeout]s" message in the log.
I was having this issue with a single segment and 2 instances of the application. I realised I hadn't increased the number of segments like I thought I had.
After further investigation I have discovered that adding an additional segment seems to have stopped this. Even if I have, for example, 2 segments and 6 applications it still doesn't reappear. However, I'm not sure how this is different from my original scenario of 1 segment and 2 applications?
I didn't realise it would be possible for multiple threads to grab the same tracking token and process the same event. It sounds like the best action would be to put an idempotency check before the HTTP call?
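For what it's worth, an idempotency check of the kind mentioned above could look roughly like the sketch below. It is not Axon API: `IdempotentHandler`, `handleOnce` and the event identifiers are hypothetical names, and the in-memory set is an assumption that only holds within one JVM. With several application instances the "already processed" record would have to live in a shared store.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical guard that runs a side effect at most once per event identifier.
 * The in-memory set is for illustration only; across several application
 * instances it would need to be a shared store (for example a database table
 * with a unique constraint on the event identifier).
 */
final class IdempotentHandler {
    // Identifiers of events whose side effect has already been started.
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    /**
     * Runs the action only if this eventId has not been seen before.
     * Returns true when the action ran, false when the event was a duplicate.
     */
    boolean handleOnce(String eventId, Runnable action) {
        // add() is atomic: exactly one caller per identifier gets 'true'.
        if (!processed.add(eventId)) {
            return false;
        }
        try {
            action.run();
            return true;
        } catch (RuntimeException e) {
            // Forget the identifier so a later retry can attempt the call again.
            processed.remove(eventId);
            throw e;
        }
    }

    public static void main(String[] args) {
        IdempotentHandler handler = new IdempotentHandler();
        // The Runnable stands in for the slow HTTP request to the external service.
        Runnable httpCall = () -> System.out.println("calling external service");
        System.out.println(handler.handleOnce("evt-1", httpCall));
        System.out.println(handler.handleOnce("evt-1", httpCall));
    }
}
```

The identifier is recorded before the call so that a second thread arriving while the request is still in flight is rejected. The trade-off is that a crash mid-call leaves the event marked as handled in a persistent variant of this.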
The "Releasing claim of token [event-processor-name]/[segment-id] failed. It was owned by another node." message can only occur in three scenarios:
You are performing a merge operation of two segments which fails because the given thread doesn't own both segments.
The main event processing loop of the TrackingEventProcessor is stopped, but releasing the token claim fails because the token is already claimed by another thread.
The main event processing loop has caught an Exception, making it retry with an exponential back-off, and it tries to release the claim (which might fail with the given message).
I am guessing it's not options 1 and 2, so that would leave us with option 3. This should also mean you are seeing other WARN level messages, like:
Releasing claim on token and preparing for retry in [timeout]s
Would you be able to share whether that's the case? That way we can pinpoint a little better what the exact problem is you are encountering.
By the way, very likely you have several processes (event handling threads of the TrackingEventProcessor) stealing the TrackingToken from one another. As they're stealing an un-updated token, both (or more) will handle the same event. Hence why you see the event handler being invoked twice.
Obviously undesirable behavior and something we should resolve for you. I would like to ask you to provide answers to my comments under the question, as right now I have too little to go on. Let us figure this out #Dan!
Update
Thanks for updating your question #dan, that's very helpful.
From what you've shared, I am fairly confident that both instances are stealing the token from one another. This does depend though on whether both are using the same database for the token_entry table (although I am assuming they are).
If they are using the same table, then they should "nicely" share their work, unless one of them takes too long. If it takes too long, the token will be claimed by another process. This other process in this case is the thread of the TEP of your other application instance. The "claim timeout" defaults to 10 seconds, which also corresponds with the long-running event handling process.
This claimTimeout is adjustable though, by invoking the Builder of the JpaTokenStore/JdbcTokenStore (depending on which you are using / auto-wiring) and calling the JpaTokenStore.Builder#claimTimeout(TemporalAmount) method. And I think this would be required on your end, given the fact you have a long-running operation.
There are of course different ways of tackling this. Like making sure the TEP is only run on a single instance (not really fault tolerant though), or offloading this long-running operation to a scheduled task which is triggered by the event.
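To illustrate why a handler that runs longer than the claim timeout loses its token, here is a deliberately simplified model of an expiring claim. It is not Axon's implementation: the `TokenClaim` class, its fields and the node names are hypothetical. The only assumption is that a claim counts as expired once it has not been extended within the timeout.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Simplified model of a token claim that expires after a timeout.
 * This is NOT Axon's code; names and structure are hypothetical.
 */
final class TokenClaim {
    private final Duration claimTimeout;
    private String owner;        // node currently holding the claim, or null
    private Instant lastUpdate;  // last time the owner took or extended the claim

    TokenClaim(Duration claimTimeout) {
        this.claimTimeout = claimTimeout;
    }

    /**
     * Tries to claim the token for the given node at time 'now'.
     * Succeeds if the token is unclaimed, already owned by this node,
     * or the previous claim was not extended within claimTimeout.
     */
    synchronized boolean tryClaim(String node, Instant now) {
        boolean free = owner == null;
        boolean mine = node.equals(owner);
        boolean expired = !free && lastUpdate.plus(claimTimeout).isBefore(now);
        if (free || mine || expired) {
            owner = node;
            lastUpdate = now;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2020-01-01T00:00:00Z");
        TokenClaim claim = new TokenClaim(Duration.ofSeconds(10));
        System.out.println(claim.tryClaim("instance-A", start));
        // instance-A is busy in a 12 second handler and never extends its claim.
        System.out.println(claim.tryClaim("instance-B", start.plusSeconds(5)));
        System.out.println(claim.tryClaim("instance-B", start.plusSeconds(12)));
    }
}
```

In this model, raising the timeout above the longest expected handler duration keeps the second instance from taking over while the first one is still working, which is the effect the claimTimeout setting is meant to have.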
But, I think we've found the issue at least, so I'd suggest to tweak the claimTimeout and see if the problem persists.
Let us know if this resolves the problem on your end #dan!
My ChangeFeedProcessor's IChangeFeedObserver.CloseAsync callback was invoked with ChangeFeedObserverCloseReason as "ObserverError". So far, I have seen this error only once and I am not sure how to repro it. What causes this error? Is there a way to diagnose this more? Is there any recommended action that one should take after receiving this error?
From your question, I understand you are using Change Feed Processor.
That Close reason is provided when the code in your ProcessChangesAsync implementation throws an unhandled exception.
Basically, if that happens, it means your code had an error processing the changes so:
The Observer is closed, releasing the Lease
The lease becomes available to be picked by any Host instance
The lease gets picked up by a Host, an Observer started, the same batch of changes gets sent to be processed
If the nature of your error was transient, then this time it will work (hopefully). If it's not transient, then you will again face an ObserverError.
As a rule of thumb, always try to handle your exceptions where possible. If you don't, the failure will be treated as a temporary scenario and the batch will eventually be retried as I described.
Also, next time please give more context: describe which libraries and versions you are using and provide some related code. It will help a lot in understanding and diagnosing the problem.
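As a language-neutral illustration of that rule of thumb (the Change Feed Processor itself is a .NET library, so this is only the pattern, not its API), the sketch below handles failures per item inside the batch callback so that nothing escapes unhandled. `BatchProcessor`, `processBatch` and the dead-letter list are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical batch callback that handles failures per item, so that one bad
 * item does not surface as an unhandled exception for the whole batch.
 */
final class BatchProcessor<T> {
    private final Consumer<T> handler;
    private final List<T> deadLetters = new ArrayList<>();

    BatchProcessor(Consumer<T> handler) {
        this.handler = handler;
    }

    /** Processes every item; items whose handler throws are set aside. */
    void processBatch(List<T> batch) {
        for (T item : batch) {
            try {
                handler.accept(item);
            } catch (RuntimeException e) {
                // Handled here, so the caller never sees an unhandled exception.
                deadLetters.add(item);
            }
        }
    }

    /** Items that failed and were set aside for later inspection. */
    List<T> deadLetters() {
        return deadLetters;
    }

    public static void main(String[] args) {
        List<String> done = new ArrayList<>();
        BatchProcessor<String> processor = new BatchProcessor<>(item -> {
            if (item.isEmpty()) {
                throw new IllegalArgumentException("empty change");
            }
            done.add(item);
        });
        processor.processBatch(Arrays.asList("a", "", "b"));
        System.out.println("processed=" + done + " deadLetters=" + processor.deadLetters());
    }
}
```

Catching everything like this also swallows transient errors that a retry would have fixed. Whether to rethrow those and let the batch be redelivered is a design choice that depends on the workload.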
What I'm trying to do is set up a decoupled/flexible framework/strategy for all applications I develop in the future, that includes as much 're-use' as possible. Preferably what I'd love to have in the end is a single orchestration that I can 'plug-in' to any other orchestration which will take a message and send to a send adapter and return the response to the calling orchestration (having converted the received response to XML dynamically based on the constructed message to the adapter). This would require being able to set the receive pipeline on the message in the orchestration.
Am I on the right track here? I can't find much on what the best practice is in regards to artifact re-use in BizTalk.
This comes up from time to time and I can tell you, it just never works out. You will spend a lot of time building essentially a framework, only to never actually use it beyond a handful of situations.
Meaning, no one tries this anymore because it was never actually useful. You might want to look at the ESB Toolkit, but even that almost always makes things more complicated than needed.
If you describe some of your scenarios, we can give the best advice.
I am trying to maximize the benefits from an experience.
Also, I usually use the Enterprise Library Logging block. I log errors and a portion of the statistical information to the database, because it is a centralized place to track errors. If database logging fails, it normally falls back to the Event Log.
Tracing messages should go into a file.
Which option do you think we should go with?
1- Only Some tracing messages can be left in code if there is a complex algorithm or unstable module.
OR
2- We should not keep any tracing messages in code, clean it up as soon as bug is resolved.
For database.
I think that errors raised from SPs and functions should be logged into another table in the database, and that is exactly what is done by the AdventureWorksLT2008 database.
Is it a bad idea to log database events directly to the Enterprise Library log table without raising these errors to the next tier? I think it is more flexible, because I can put more custom information in the message. Of course, some errors will not be handled and will reach the next tier.
Any ideas or comments, something else you do, or something you want to clarify?
Thanks
Are you talking about catching errors and logging directly in T-SQL and not then doing RAISERROR to get it to the caller?
I think that's a viable strategy for certain kinds of issues - for instance, if an SP wants to find a problem and correct it silently and simply issue a warning.
But the kind of issues it would apply to might not be terribly frequent.
The kind of things I would think about are things like unusual cases where unexpected UPDATEs are done instead of INSERTs? Or where data already exists so is not generated. Or in a deployment or build script which skips an existing table, etc.
What if your database has performance issues and SP/functions start timing out - logging the error to the database may not work?
I have a Flex 3 app which I want to instrument to report errors generated by the app to a server via a simple HTTPService call.
My idea is to wrap all the methods in try ... catch blocks which then pass the Error object to the reportError() function (which then fires off the HTTP request and pops up a dialog) but is there a better way?
I have implemented a system such as the one you suggest, wrapping all of my methods in try/catch and sending the stack trace to a service that emails me the errors. I created a basic format for the error that logs which method the error occurred in. I noticed that sometimes I end up getting null from the stack traces, so I wanted to log that information for these situations.
It GREATLY improved my application. I tracked down a (large) handful of errors and released a much cleaner build to my users. Now I don't ever get the emails.
The better way IMO is something like this.
I've no idea how good this particular project is (aside from this spooky GPL license), but I don't see why logging in ActionScript should be any different from J2EE, C++, or say Python. Yes, it has some sandbox security issues, but I think if this is solved, you could log to some centralized log server.
Unfortunately, there really isn't -- errors don't bubble up in such a way as to be trappable at a global level, so the only real way you have to catch errors is to try and catch them all manually. (The community's been pretty vocal in asking for a global exception-handling feature for a while, but it's not there yet.)