Is there any way to make the branches of a Parallel Actions shape in a BizTalk orchestration execute asynchronously? More specifically, if one of the parallel branches fails while executing, it should not affect the execution of the other branches.
As you found out, the branches of a Parallel Actions shape in a BizTalk orchestration aren't truly independent of each other. The orchestration will execute the steps in the first branch and only start on one of the other branches when it is waiting on a response in the current branch. Any failure or termination can lead to unexpected results, as the documentation notes:
How to Configure the Parallel Actions Shape
Caution: If you place a Terminate shape inside a Parallel Actions shape, and the branch with the Terminate on it is run, the instance completes immediately, regardless of whether other branches have finished running. Depending on your design, results might be unpredictable in this case.
Your options are:
Have a scope inside each branch of the Parallel Actions shape, so that if there is a failure it is caught but doesn't stop the execution of the other branches. Note: if the steps inside the parallel branches are of short duration, it would probably pay to remove the Parallel Actions shape altogether and just have the scopes follow each other.
Have Start Orchestration shapes that start other orchestrations to do the processing. A failure in the orchestration you started won't stop the execution of the orchestration you started it from.
I have built an orchestration with a loop to retrieve paged data from a REST web service. From the page size and offset I am able to call the service for the "next page" of data. Then I debatch it, map it to an internal format, and process it further. When one page is processed, I request the next page from the REST web service.
As it turns out, the memory of the host running the orchestration and send ports grows constantly while all the data is being processed, and the host eventually enters throttling mode.
Why is the memory not released when I am done with one page of the loop? Is it the "consumed" messages stored in the orchestration that build up the memory? Is it possible to clear the orchestration of these "consumed" messages to release the memory used?
(No message tracking is active on the orchestration or the send ports.)
Apparently, there is no way to prevent a BizTalk orchestration from building up a list of messages, including used/processed/consumed messages. Putting things in a Scope does not prevent this behaviour.
Hence, for long-running orchestrations a lot of messages can build up. This is especially true for singleton orchestrations, where the general solution proposed for this problem is to make sure the orchestration shuts down once in a while (e.g., when idle).
My solution was to split the orchestration in two and have the initial orchestration start the second one with a Start Orchestration shape; the second orchestration then starts itself recursively for each following page, and so on, until the last page is received and the final orchestration instance ends.
Yes, what you need to do is to have scopes, and to have the messages initialised inside the scope rather than at the top level of the orchestration. That means they will also be disposed of at the end of the scope. Note: that means those messages can't be used outside of the scope.
However, if you are just re-using the same messages in the loop, then I wouldn't expect it to increase memory usage, so there is possibly something else going on. I suspect that you must be adding each page to a message, and that is what is growing.
I am trying to understand the basic functionality of Serial Queue and Concurrent Queue in GCD.
Can we perform synchronous operations on a concurrent queue? As far as I know, synchronous means executing tasks one after another, but how is that possible with a concurrent queue, which executes tasks in parallel? It seems contradictory to me.
Similarly, how can we perform asynchronous operations on a serial queue? A serial queue performs tasks one after another, so how can they be executed concurrently?
If anyone can explain with the help of an image, that would make it very clear.
You asked:
Can we perform synchronous operations on a concurrent queue? As far as I know, synchronous means executing tasks one after another, but how is that possible with a concurrent queue, which executes tasks in parallel?
OK, let’s consider terminology before answering your question:
What is a “synchronous operation”? It is one that will block its respective thread during that operation. But a concurrent queue can use multiple threads to perform these individual synchronous operations on that same queue at the same time, each running on its own thread.
Let us use a practical example: Consider a synchronous operation that might be an algorithm to process an image (e.g. resize it or convert a color image to black-and-white). When you perform this operation, it will generally tie up the respective thread until the operation is done.
So, given that example, yes, you certainly can (and we often do) perform multiple synchronous operations in parallel. Using our prior example, you might have 4 images that you want to process concurrently. So you might instantiate a concurrent queue and add these four operations to that queue, and they will be processed in parallel, each on its own “worker thread”.
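To illustrate, here is a minimal sketch using the plain C libdispatch API; process_image and the queue label are hypothetical stand-ins for the image-processing example:

```c
#include <dispatch/dispatch.h>
#include <stdio.h>

/* Hypothetical synchronous operation: it ties up its worker thread
   until the work is done. */
static void process_image(void *context, size_t index)
{
    (void)context;
    printf("processing image %zu\n", index);
    /* ... resize the image or convert it to black-and-white here ... */
}

int main(void)
{
    /* A concurrent queue: GCD may run submitted items on several
       worker threads at the same time. */
    dispatch_queue_t queue =
        dispatch_queue_create("com.example.images", DISPATCH_QUEUE_CONCURRENT);

    /* Submit four iterations; on a concurrent queue they run in parallel,
       and dispatch_apply_f waits until all of them have finished. */
    dispatch_apply_f(4, queue, NULL, process_image);

    dispatch_release(queue);
    return 0;
}
```

Each process_image call blocks its own worker thread for its duration (it is synchronous), yet the four calls run side by side; there is no contradiction.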
You then ask:
Similarly, how can we perform asynchronous operations on a serial queue? A serial queue performs tasks one after another, so how can they be executed concurrently?
This depends a little upon what you mean by “operation”. Are you talking about a Swift Operation (or Objective-C NSOperation) on an “operation queue”? Or are you using the term “operation” a little more generally as it applies to GCD and dispatch queues?
The reason I ask, is that in the world of GCD (aka “dispatch queues”), you simply do not “perform an asynchronous operation on a serial queue”. You start asynchronous tasks from a serial queue, but the definition of “asynchronous” means that the current thread does not wait for the task to finish (which generally means that, often behind the scenes, another queue/thread is doing the work).
A good example of that would be when you start a series of network requests from a serial queue. NSURLSession/URLSession has its own queues/threads hidden inside it that manage these multiple network requests concurrently. If you do not want these requests to run concurrently, some sleight of hand is required to take an API which is designed for concurrent operation and have it behave sequentially, one after the other.
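To illustrate that sleight of hand, here is a minimal sketch using the plain C libdispatch API, where a semaphore makes each item on a serial queue wait for the asynchronous work it started; fake_network_work is a hypothetical stand-in for a real asynchronous API:

```c
#include <dispatch/dispatch.h>
#include <stdio.h>

typedef struct {
    int id;
    dispatch_semaphore_t done;
} request_t;

/* Hypothetical asynchronous API: the work happens on some internal queue
   and "calls back" (here, by signalling the semaphore) when finished. */
static void fake_network_work(void *context)
{
    request_t *req = context;
    printf("request %d finished\n", req->id);
    dispatch_semaphore_signal(req->done);
}

/* Runs on the serial queue: start the async work, then block this queue
   item until the completion fires, so requests run one after the other. */
static void run_one_request(void *context)
{
    request_t *req = context;
    dispatch_async_f(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
                     req, fake_network_work);
    dispatch_semaphore_wait(req->done, DISPATCH_TIME_FOREVER);
}

static void drain_marker(void *context) { (void)context; }

int main(void)
{
    dispatch_queue_t serial =
        dispatch_queue_create("com.example.serial", DISPATCH_QUEUE_SERIAL);
    request_t reqs[3];

    for (int i = 0; i < 3; i++) {
        reqs[i].id = i;
        reqs[i].done = dispatch_semaphore_create(0);
        dispatch_async_f(serial, &reqs[i], run_one_request);
    }

    /* Wait until the serial queue has drained before exiting. */
    dispatch_sync_f(serial, NULL, drain_marker);

    for (int i = 0; i < 3; i++)
        dispatch_release(reqs[i].done);
    dispatch_release(serial);
    return 0;
}
```

The cost of this pattern is that a worker thread sits blocked while each request is in flight, which is one reason the operation-queue approach below (or async-await) is often preferred.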
This is where operation queues come into play, as they do have the concept of custom Operation/NSOperation subclasses, in which you can define an operation to wrap an asynchronous task, such that the operation does not “complete” until the asynchronous task is done. It uses KVO to notify the queue when the operation is executing, is finished, etc. In that scenario, you can define a serial operation queue (i.e., one with a maxConcurrentOperationCount of 1), add a series of your own asynchronous operation subclass instances to that queue, and it can run them sequentially, one after the other.

But using operation queues with asynchronous operations can be a little complicated. If that’s really what you are trying to do, we can point you to some examples. But, in the interest of full disclosure, this operation queue pattern is used less frequently nowadays, and you will often see other patterns such as Combine, or the new async-await API, to achieve similar results.
So, we can’t answer this latter question without a little more detail of what precisely you mean by “asynchronous operation on serial queue”. Give us a practical example of what you mean (and what API you are using).
I am currently diagnosing some recurring issues within a BizTalk environment, specifically zombie messages. I am aware of the conditions that create these errors, and while diagnosing the orchestration with the Orchestration Debugger, I see that when a message has hit a Terminate shape, it is followed by an initialisation.
The general structure of the orchestration is as follows:
The first scope is a long-running transaction, and within the loop after that scope there is a Listen shape that waits for a message for 10 seconds. If a message comes in time, it enters another long-running transaction. It's like a singleton, in a way. Both scopes share the same logical receive port and are correlated; the only odd part is how the first scope is repeated within the loop that's inside the Listen shape. (The orchestration is part of a behemoth of an application that wasn't written by me.)
Would this initialisation after a termination (what actually causes this to happen?) cause zombies, and if so, is the structure of the orchestration and the transactions a cause of this? Or am I looking in the wrong place?
Let me know if there's any extra information that can help!
The Orchestration Debugger shows when something starts and when it ends with slightly different icons, so what you are seeing is the end of the orchestration.
No, that will not cause zombies. Zombies occur when an orchestration has passed its last logical receive shape (and is tearing down the instance subscription) and another message that matches that subscription arrives before the orchestration has fully ended.
Can queued kernels continue to execute while an OpenCL clEnqueueReadBuffer operation is occurring?
In other words, is clEnqueueReadBuffer a blocking operation on the device?
From a host API point of view, clEnqueueReadBuffer can be blocking or not, depending on whether you set the blocking_read parameter to CL_TRUE or CL_FALSE.
If you set it to not block, then the read just gets queued, and you should use an event (or a subsequent blocking call) to determine when it has finished (i.e., before you access the memory you are reading into).
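For example, a minimal sketch of the non-blocking variant, assuming the queue, buffer, and destination pointer are already set up (error checking omitted):

```c
#include <CL/cl.h>

/* Enqueue a non-blocking read, keep working, then wait on the read's
   event before touching the destination memory. */
static void read_without_blocking(cl_command_queue queue, cl_mem buffer,
                                  size_t nbytes, void *host_buf)
{
    cl_event read_done;

    clEnqueueReadBuffer(queue, buffer, CL_FALSE /* blocking_read */,
                        0, nbytes, host_buf,
                        0, NULL, &read_done);

    /* ... enqueue more kernels or do unrelated host-side work here ... */

    clWaitForEvents(1, &read_done);  /* host_buf is only valid after this */
    clReleaseEvent(read_done);
}
```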
If you set it to block, the call won't return until the read is done, and the memory being read into will be valid. Also (and answering your actual question), any operations you queued prior to the clEnqueueReadBuffer will all have to finish before the read starts (see the exception note below).
All clEnqueue* API calls are asynchronous, but some have "blocking" parameters you can set. Using one is equivalent to using the non-blocking version and then calling clFinish: the command queue is flushed to the device, and your host thread won't continue until the work has finished. Of course, it is hard to keep the GPU always busy this way, since it is then left with no queued work, but if you queue up new work fast enough you can still keep it reasonably busy.
This all assumes a single, in-order command queue. If your command queue is out-of-order and your device supports out-of-order queues then enqueued items can execute in any order that doesn't violate the event_wait_list parameters you provided. Likewise, you can have multiple command queues, which can again be executed in any order that doesn't violate the event_wait_list parameters you provided. Typically, they are used to overlap memory transfers and compute, and to keep multiple compute units busy. Out-of-order command queues and multiple command queues are both advanced OpenCL concepts and shouldn't be attempted until you fully understand and have experience with in-order command queues.
Clarification added later after DarkZeros pointed out the "on the device" part of the OP's question: My answer was from the host thread API point of view. On the device, with an in-order command queue all downstream commands are blocked by the current command. With an out-of-order queue they are only blocked by the event_wait_list. However, out-of-order command queues are not well supported in today's drivers. With multiple command queues, in theory commands are only blocked by prior commands (if in-order) and the event_wait_list. In reality, there are sometimes special vendor rules that prevent the free flowing of potentially non-blocked commands that you might like. This is often because the multiple OpenCL command queues get transferred to device-side memory and compute queues, and get executed in-order there. So depending on the order that you add commands to your multiple command queues, they might get interleaved in such a way that they block in sub-optimal ways. The best solution I'm aware of is to either be careful about the order you enqueue (based on knowledge of this implementation detail), or use one queue for memory and one for compute, which matches the device-side queueing.
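As a rough sketch of that last suggestion (one queue for memory, one for compute), assuming the context, device, buffer, and kernel (with its arguments already set) exist, and using the pre-OpenCL-2.0 clCreateCommandQueue for brevity:

```c
#include <CL/cl.h>

/* One queue for transfers, one for compute; an event makes the kernel
   wait for the upload without serialising the two queues entirely. */
static void overlap_transfer_and_compute(cl_context ctx, cl_device_id dev,
                                         cl_kernel kernel, cl_mem buf,
                                         const void *host_src, size_t nbytes,
                                         size_t global_size)
{
    cl_int err;
    cl_command_queue xfer_q = clCreateCommandQueue(ctx, dev, 0, &err);
    cl_command_queue comp_q = clCreateCommandQueue(ctx, dev, 0, &err);

    cl_event upload_done;
    clEnqueueWriteBuffer(xfer_q, buf, CL_FALSE, 0, nbytes, host_src,
                         0, NULL, &upload_done);

    /* The kernel only waits on the upload event, so other work queued on
       either queue is free to overlap with the transfer. */
    clEnqueueNDRangeKernel(comp_q, kernel, 1, NULL, &global_size, NULL,
                           1, &upload_done, NULL);

    clFinish(comp_q);
    clReleaseEvent(upload_done);
    clReleaseCommandQueue(xfer_q);
    clReleaseCommandQueue(comp_q);
}
```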
If overlap of memory and compute is your goal, both AMD and NVIDIA provide examples of how to overlap memory and compute operations and, for GPUs that support multiple compute operations, how to do that too. The NVIDIA examples are hard to get hold of, but they are out there (from the CUDA 4 days).
I'm trying to get my head around the concept of a cooperative multitasking system and exactly how it works in a single-threaded application.
My understanding is that this is a "form of multitasking in which multiple tasks execute by voluntarily ceding control to other tasks at programmer-defined points within each task."
So if you have a list of tasks and one task is executing, how do you determine when to pass execution to another task? And when you give execution back to a previous task, how do you resume from where you left off?
I find this a bit confusing because I don't understand how this can be achieved without a multithreaded application.
Any advice would be very helpful :)
Thanks
In your specific scenario, where a single process (or thread of execution) uses cooperative multitasking, you can use something like Windows' fibers or the POSIX setcontext family of functions. I will use the term fiber here.
Basically when one fiber is finished executing a chunk of work and wants to voluntarily allow other fibers to run (hence the "cooperative" term), it either manually switches to the other fiber's context or more typically it performs some kind of yield() or scheduler() call that jumps into the scheduler's context, then the scheduler finds a new fiber to run and switches to that fiber's context.
What do we mean by context here? Basically the stack and registers. There is nothing magic about the stack, it's just a block of memory the stack pointer happens to point to. There is also nothing magic about the program counter, it just points to the next instruction to execute. Switching contexts simply saves the current registers somewhere, changes the stack pointer to a different chunk of memory, updates the program counter to a different stream of instructions, copies that context's saved registers into the CPU, then does a jump. Bam, you're now executing different instructions with a different stack.

Often the context switch code is written in assembly that is invoked in a way that doesn't modify the current stack or it backs out the changes, in either case it leaves no traces on the stack or in registers so when code resumes execution it has no idea anything happened. (Again, the theme: we assume that method calls fiddle with registers, push arguments to the stack, move the stack pointer, etc but that is just the C calling convention. Nothing requires you to maintain a stack at all or to have any particular method call leave any traces of itself on the stack).
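To make that concrete, here is a minimal sketch using the POSIX setcontext family mentioned above (deprecated in POSIX.1-2008 but still available on Linux), in which main and a single fiber repeatedly swap contexts:

```c
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, fiber_ctx;
static char fiber_stack[64 * 1024];   /* the fiber's own, separate stack */

static void fiber_func(void)
{
    for (int i = 0; i < 3; i++) {
        printf("fiber: step %d\n", i);
        /* Voluntarily yield: save our registers/stack pointer, resume main's. */
        swapcontext(&fiber_ctx, &main_ctx);
    }
}

int main(void)
{
    getcontext(&fiber_ctx);
    fiber_ctx.uc_stack.ss_sp = fiber_stack;
    fiber_ctx.uc_stack.ss_size = sizeof fiber_stack;
    fiber_ctx.uc_link = &main_ctx;            /* return here if the fiber ends */
    makecontext(&fiber_ctx, fiber_func, 0);

    for (int i = 0; i < 3; i++) {
        printf("main: resuming fiber\n");
        swapcontext(&main_ctx, &fiber_ctx);   /* run the fiber until it yields */
    }
    return 0;
}
```

Each swapcontext call is exactly the register/stack-pointer save-and-restore described above; the fiber resumes in the middle of its loop with no idea anything happened.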
Since each stack is separate, you don't have some continuous chain of seemingly random method calls eventually overflowing the stack (which might be the result if you naively tried to implement this scheme using standard C methods that continuously called each other). You could implement this manually with a state machine where each fiber kept a state machine of where it was in its work, periodically returning to the calling dispatcher's method, but why bother when actual fiber/co-routine support is widely available?
Also remember that cooperative multitasking is orthogonal to processes, protected memory, address spaces, etc. Witness Mac OS 9 or Windows 3.x. They supported the idea of separate processes. But when you yielded, the context was changed to the OS context, allowing the OS scheduler to run, which then potentially selected another process to switch to. In theory you could have a full protected virtual memory OS that still used cooperative multitasking. In those systems, if an errant process never yielded, the OS scheduler never ran, so all other processes in the system were frozen. **
The next natural question is what makes something pre-emptive... The answer is that the OS schedules an interrupt timer with the CPU to stop the currently executing task and switch back to the OS scheduler's context regardless of whether the current task cares to release the CPU or not, thus "pre-empting" it.
If the OS uses CPU privilege levels, the (kernel configured) timer is not cancelable by lower level (user mode) code, though in theory if the OS didn't use such protections an errant task could mask off or cancel the interrupt timer and hijack the CPU. There are some other scenarios like IO calls where the scheduler can be invoked outside the timer, and the scheduler may decide no other process has higher priority and return control to the same process without a switch... And in reality most OSes don't do a real context switch here because that's expensive, the scheduler code runs inside the context of whatever process was executing, so it has to be very careful not to step on the stack, to save register states, etc.
** You might ask why not just fire a timer if yield isn't called within a certain period of time. The answer lies in multi-threaded synchronization. In a cooperative system, you don't have to bother taking locks, worry about re-entrance, etc because you only yield when things are in a known good state. If this mythical timer fires, you have now potentially corrupted the state of the program that was interrupted. If programs have to be written to handle this, congrats... You now have a half-assed pre-emptive multitasking system. Might as well just do it right! And if you are changing things anyway, may as well add threads, protected memory, etc. That's pretty much the history of the major OSes right there.
The basic idea behind cooperative multitasking is trust - that each subtask will relinquish control, of its own accord, in a timely fashion, to avoid starving other tasks of processor time. This is why tasks in a cooperative multitasking system need to be tested extremely thoroughly, and in some cases certified for use.
I don't claim to be an expert, but I imagine cooperative tasks could be implemented as state machines, where passing control to the task would cause it to run for the absolute minimal amount of time it needs to make any kind of progress. For example, a file reader might read the next few bytes of a file, a parser might parse the next line of a document, or a sensor controller might take a single reading, before returning control back to a cooperative scheduler, which would check for task completion.
Each task would have to keep its internal state on the heap (at object level), rather than on the stack frame (at function level) like a conventional blocking function or thread.
And unlike conventional multitasking, which relies on a hardware timer to trigger a context switch, cooperative multitasking relies on the code to be written in such a way that each step of each long-running task is guaranteed to finish in an acceptably small amount of time.
The tasks will execute an explicit wait, pause, or yield operation, which makes the call to the dispatcher. There may be different operations for waiting on IO to complete or for explicitly yielding in a heavy computation. An application task's main loop could have a *wait_for_event* call instead of busy polling. This would suspend the task until it has input to process.
There may also be a time-out mechanism for catching runaway tasks, but it is not the primary means of switching (or else it wouldn't be cooperative).
One way to think of cooperative multitasking is to split a task into steps (or states). Each task keeps track of the next step it needs to execute. When it's the task's turn, it executes only that one step and returns. That way, in the main loop of your program you are simply calling each task in order, and because each task only takes a small amount of time to complete a single step, we end up with a system that allows all of the tasks to share CPU time (i.e., cooperate).
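Putting the last two answers together, here is a minimal sketch in C of that state-machine approach; the task names and step counts are made up. Each task's "program counter" lives in a struct rather than on the stack, and the main loop gives each task one step per pass:

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    const char *name;
    int step;          /* the task's "program counter", kept off the stack */
    int total_steps;
} task_t;

/* Run one small step and report whether there is more work to do. */
static bool task_step(task_t *t)
{
    if (t->step >= t->total_steps)
        return false;
    printf("%s: step %d of %d\n", t->name, t->step + 1, t->total_steps);
    t->step++;                      /* remember where to resume next time */
    return t->step < t->total_steps;
}

int main(void)
{
    task_t tasks[] = {
        { "file-reader", 0, 3 },
        { "parser",      0, 2 },
        { "sensor",      0, 4 },
    };
    bool any_running = true;

    /* Cooperative scheduler: give each task one step per pass. */
    while (any_running) {
        any_running = false;
        for (size_t i = 0; i < sizeof tasks / sizeof tasks[0]; i++)
            any_running |= task_step(&tasks[i]);
    }
    return 0;
}
```

Because each task returns after a single short step, no task can starve the others, which is the cooperation the answers above describe.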