Erlang supervisor with one critical child

Erlang supervisor with one critical child - functional-programming

We are in the process of re-organizing our applications supervision tree to make it more robustly handle failures and re-starts. However, we have a scenario where we have one parent supervisor that starts four child supervisors. The problem we have is that the first child supervisor starts several children gen_servers that must be started and initialized prior to the second child supervisor starting or it will fail.
So, I need a startup like the following:
test_app.erl -> super_supervisor -> [config_supervisor, auth_supervisor, rest_supervisor]
The trick I'm having trouble with is that config_supervisor must complete all initialization prior to auth_supervisor or rest_supervisor being started. With the rest_for_one startup strategy I get, essentially, this behavior but only by allowing auth_supervisor to fail because the needed config is not there. I would prefer to just request that config_supervisor is completed with it's initialization (which includes starting several gen_servers) prior to moving-on to auth_supervisor.
This seems like a common scenario that would have been conquered previously but, I am having a hard time "googling" a solution. Does anybody have advice or sample code that might exist to handle this scenario?

Supervisors do a synchronous start of their children, starting each one in turn before starting the next in the order they occur in the childspeclist. So your super_supervisor should start its children in the right order, first config_supervisor, then auth_supervisor and finally rest_supervisor by having them in that order. A supervisor must (successfully) start all its children before it is considered to be started. So if config_supervisor has all the necessary processes which must be started during the initialization as its children then super_supervisor will not start the other supervisors until the config_supervisor is done.
In this case you would not need rest_for_one to ensure starting in the right order if the children are in the right order in the childspeclist.
For a worker process, gen_server/gen_fsm/gen_event, they are considered started when their init callback returns.
Have I understood your description and question correctly?

You may try to move config_supervisor into its own application and set the application as a requirement for the main one, in this case the config application will be started first and then the main supervisor with auth_supervisor, etc will start their initialisation.

Did you look at the rest_for_one restart strategy? It seems that it should be covenient in this case, the middle supervisor starts the gen_servers in a defined order and last the leaf supervisor who in turn start the critical process.

Related

Airflow tasks not running in correct order?

cleanup_pbp is downstream of all 4 of load_pbp_30629, load_pbp_30630, load_to_bq_30629, load_to_bq_30630. cleanup_pbp started at 2021-12-05T08:54:48.
however, load_pbp_30630, one of the 4 upstream tasks, did not end until 2021-12-05T09:02:23.
How is cleanup_pbp running before load_pbp_30630 ends? I've never seen this before. I know our task dependencies have a bit of criss-cross going on, but that shouldn't explain why the tasks run out of order?

We have exactly the same problem and after checking, finally we found out that the problem is caused by the loop function in Dag script.
Actually we use a for loop to create tasks as well as their relationships. As in each iteration it will create the upstream task and the downstream task (like the cleanup_pbp in your case) and always give the same id to downstream one and define its relations (ex. Load_pbp_xxx >> cleanup_pbp), in graph or tree view it considers this downstream id has serval dependences but when Dag runs it takes only the relation defined in the last iteration. If you check Task Instance you will see only one task in its upstream_list.
The solution is to move the definition of this final task out of the loop but keep the definition of dependencies inside the loop. This well resolves our problem of running order, the final task won’t start until all dependences are finished. (with tigger rule all_success of course)
I’m not sure if you are in the same situation as us but hope this will give you some idea.

Can a parent process determine if its child has received a SIGINT?

Suppose, I launch a parent process, which launches a subprocess, but then the parent receives a SIGINT. I want the parent to exit, but I don't want the child process to linger and/or become a zombie. I need to make sure it dies.
If I can determine that the child also received a SIGINT, then it is probably cleaning up on its own. In that case, I'd prefer to briefly wait while it finishes and exits on its own. But if it did not receive a SIGINT, then I will send it a SIGTERM (or SIGKILL) immediately and let the parent proceed with its own cleanup.
How can I figure out if the child recevied the SIGINT? (Leaving aside the fact that it might not even respond to SIGINT...) Do I just have to guess, based on whether or not the parent is running in the foreground process group? What if the SIGINT was sent programmatically, not via Ctrl+C?

How can I figure out if the child received the SIGINT?
Perhaps you can't. And what should matter to you is if the child handled SIGINT (it could have ignored it). See my answer to your other question.
However, in many cases, the signal sent by Ctrl C was sent to a process group. Then you might have got that signal too.
In pathological cases, your entire system experiments thrashing and the child process had not even being scheduled yet to process the signal.
I want the parent to exit, but I don't want the child process to linger and/or become a zombie. I need to make sure it dies.
Maybe you want to use somewhere daemon(3) ?
BTW, I don't understand your question, because I have to guess its (ungiven) motivations. Are you caring about job control or implementing a shell? In what concrete cases do you really care that the child got the SIGINT and what does that mean to you?

Parallel shapes in BizTalk Server

I am using parallel shapes in the BizTalk orchestration. There are four parallel branches in the shape and in each branch I am using a scope shape (Transaction Type = None) with subsequent catch block and the execution logic is placed in the scope shape.
This parallel Shape is also contained in a scope (Transaction Type = None ) in the orchestration with corresponding catch block.
Now what is the supposed behaviour if the execution in one of the branch fails? As per me if execution of one branch fails, the execution of other branch should have been taken place.
But in my orchestration if one branch execution fails the other branch execution is not started even. It seems like that other branch starts executes after the previous branch code is executed successfully.
Please tell me what can be the possible source of this behaviour?

According to MSDN, the Parallel shape will have all its branches run independently See MSDN: http://msdn.microsoft.com/en-us/library/ee253584(v=bts.10).aspx
However, this is from a business process perspective, not from a technical one. If one of your branches fails, it is perfectly possible that other branches will not be executed. As far as I know, you don't have any control over the order of execution (not sure about that one though).
See this small blog post for more information: http://blogs.msdn.com/b/pkelcey/archive/2006/08/22/705171.aspx
An aggregator pattern might be a good idea here, depending on your specific situation. It would give you full control over the situation.

Basically, if one of the branches fail, then all of the branches fail. The key point to remember is:
All branches come together at the end of the Parallel Actions shape, and processing does not continue until all have completed.
So, if one of then branches fails, then they will never converge. If an exception is thrown on one branch, the catch block will catch it and all the other branches will cease to process any incoming messages. It's my understanding that Parallel branches are mostly used in message correlation for situations where you need to wait for more than one message to arrive in order to be able to proceed. Order of branch execution is determined by order of the messages received that each branch is expecting.

Servlet getParameter

I have a servlet program for counting numbers, I want to control it through an html interface.
by pressing the start button the program must start running and by pressing pause button the servlet program must be paused and by clicking on the restart button it must restart again. by the way i used thread. My problem is that each time I should click one button and send its value to the servlet, and when I'm getting the buttons values inside the servlet a NullPointerException is occur... any help ??

I wouldn't use a Thread for that purpose and in general is not usually a good idea to create threads in servlets.
Say we count one number per millisecond meaning: it will give me the time between one click and another in milliseconds.
One work around could be:
Click on start = save the start time in the session.
click on stop = to get the count we do currentTime-StartTime (saved in session)
Now if you really must use Threads be sure then to create it using another class.
One suggestion might be create a ThreadManager class and store it in the session (use a listener for this) and then start it in that session object.
Even better store the ThreadManager inside the servletContext and have a way to create your thread per session.
To create Threads favor the Executor classes instead of the Thread classes.
Also make sure you stop your threads since having threads created by us inside a web container may prevent it from stoping entirely.
If you provide some code I can help you further.
Good luck, have fun.

Starting mutliple orchestrations from parent orchestration and passing messages to them

I have a situation where a main orchestration is responsible for processing a convoy of messages. These messages belong to a set of customers, the orchestration will read the messages as they come in, and for each new customer id it finds, it will spin up a new orchestration that is responsible for processing the messages of a particular customer. I have to preserve the order of messages as they come in, so the newly created orchestrations should process the message it has and wait for additional messages from the main orchestration.
Tried different ways to tackle this, but was not able to successfuly implement it.
I would like to hear your opinions on how this could be done.
Thanks.

It sounds like what you want is a set of nested convoys. While it might be possible to get that working, it's going to... well, hurt. In particular, my first worry would be maintenance: any changes to the process would be a pain in the neck to make, and, much worse, deployment would really, really suck.
Personally, I would really try to find an alternative way to implement this and avoid the convoys if possible, but that would depend a lot on your specific scenario.
A few questions, if you don't mind:
What are your ordering requirements? For example, do you only need ordered processing for each customer on a single incoming batch, or across batches? If the latter, could you make do without the master orchestration and just force a single convoy'd instance per customer? Still not great, but would likely simplify things a lot.
What are you failure requirements with respect to ordering? Should it completely stop processing? Save message and keep going? What about retries?
Is ordering based purely on the arrival time of the message? Is there anything in the message that you could use to force ordering internally instead of relying purely on the arrival time?
What does the processing of the individual messages do? Is the ordering requirement only to ensure that certain preconditions are met when a specific message is processed (for example, messages represent some tree structure that requires parents are processed before children).

I don't think you need a master orchestration to start up the sub-orchestrations. I am assumin you are not talking about the master orchestration implmenting a convoy pattern. So, if that's the case, here's what I might do.
There is a brief example here on how to implment a singleton orchestration. This example shows you how to setup an orchestration that will only ever exist once. All the messages going to it will be lined up in order of receipt and processed one at a time. Your example differs in that you want to have this done by customer ID. This is pretty simple. Promote the customer ID in the inbound message and add it to the correlation type. Now, there will only ever be one instance of the orchestration per customer.
The problem with singletons is this. You have to kill them at some point or they will live forever as dehydrated orchestrations. So, you need to have them end. You can do this if there is a way for the last message for a given customer to signal the orchestration that it's time to die through an attribute or such. If this is not possible, then you need to set a timer. If no messags are received in x seconds, terminate the orch. This is all easy to do, but it can introduce Zombies. Zombies occur when that orchestration is in the process of being shut down when another message for that customer comes in. this can usually be solved by tweeking the time to wait. Regardless, it will cause the occasional Zombie.
A note fromt he field. We've done this and it's really not a great long term solution. We were receiving customer info updates and we had to ensure ordered processing. We did this singleton approach and it's been problematic from the Zombie issue and the exeption issue. If the Singleton orchestration throws an exception, it will block the processing for a all future messages for that customer. So - handle every single possible exception. The real solution would have been to have the far end system check the time stamps from the update messages and discard ones that were older than the last update. We wanted to go this way, but the receiving system didn't want to do this extra work.