How many supervisors do I need for an Erlang app?

I want to create a webapp with the microservices approach in mind.
It will consist of a set of independent services built as Erlang applications that I will be able to start and stop separately. In front of them there will be an nginx server working as a reverse proxy.
Should I put all these services under one god supervisor or should I keep them separate having a supervisor for each?
Based on what facts should I make this decision?

By making each service an Erlang application, you already have the answer: each application has a separate supervision tree.

In my mind the answer depends on the ability (or necessity) to release your applications independently at runtime.
If the services need to stay consistent because their interfaces are likely to change, it will be easier to have one supervision tree (and application), or a limited number of them. Starting and stopping services then becomes an interface of your application(s).
If your services are independent, or if they have very stable interfaces (based on a standard, for example), or if you don't care about releasing at runtime, then one application per service seems to be the preferred solution.
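To make the application-per-service option concrete, here is a minimal sketch of what each service could look like: an application callback module that starts its own top-level supervisor. The module names (myservice_app, myservice_sup, myservice_worker) are placeholders, not anything from the question.

    %% myservice_app.erl -- application callback; each service owns its own tree
    -module(myservice_app).
    -behaviour(application).
    -export([start/2, stop/1]).

    start(_StartType, _StartArgs) ->
        myservice_sup:start_link().

    stop(_State) ->
        ok.

    %% myservice_sup.erl -- top-level supervisor for this one service
    -module(myservice_sup).
    -behaviour(supervisor).
    -export([start_link/0, init/1]).

    start_link() ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, []).

    init([]) ->
        SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
        Children = [#{id => myservice_worker,
                      start => {myservice_worker, start_link, []}}],
        {ok, {SupFlags, Children}}.

Each such application can then be started and stopped on its own with application:ensure_all_started(myservice) and application:stop(myservice), which is what makes independent releases possible.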

Supervisors exist with a well-defined goal in mind: to give you control over the way your state is managed in the presence of errors.
Suppose you have two parts of an application, A and B, and each stores some state: SA and SB. You can decide how to construct your supervision tree based on one question: does this part of the state make sense without the other part? Can my app continue to work correctly if A is restarted and SA is cleared, while B is not restarted and SB is not cleared?
When looking at supervisors, people often think about restarts and restart rates. Arguably, that is the less important part of a supervisor. What you should really care about is what is started, and in which order; and what should be restarted, and in which order.
A supervisor gives you three restart strategies:
one_for_one - assumes that the states of the managed children are independent. The default, and the least useful of them all.
one_for_all - if one of the children dies, the state of the other children has to be cleared as well, so all of them are restarted.
rest_for_one - if a child dies, the children started before it (to its left) are not restarted; the child itself and the children started after it (to its right) are restarted.
Another concept that I really like is the "error kernel". In the case of rest_for_one, the children on the left are your "error kernel".
One more important thing to know is that a supervisor starts all of its children in order, left to right, depth-first, and stops them in the reverse order.
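As a rough illustration of the ordering and rest_for_one points (the module and child names are hypothetical, not taken from the question):

    %% Hypothetical supervisor: children start left to right, stop in reverse.
    %% With rest_for_one, db_conn below acts as the error kernel.
    -module(myapp_sup).
    -behaviour(supervisor).
    -export([start_link/0, init/1]).

    start_link() ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, []).

    init([]) ->
        SupFlags = #{strategy => rest_for_one,
                     intensity => 10,
                     period => 60},
        Children =
            [#{id => db_conn,       %% leftmost: keeps its state when the others crash
               start => {db_conn, start_link, []}},
             #{id => cache,         %% if this dies, cache and api_listener restart,
               start => {cache, start_link, []}},   %% but db_conn is left alone
             #{id => api_listener,  %% rightmost: its crash restarts only itself
               start => {api_listener, start_link, []}}],
        {ok, {SupFlags, Children}}.

If cache cannot produce a meaningful state without db_conn, this layout encodes that dependency; if the two states were truly independent, one_for_one would do.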

Related

How to block flows from other nodes on Corda network?

My dev node is part of the Corda test network, and when I open the logs I see something like "node ... sent you a flow which you don't have installed; you can kill it with kill flow". So I have two questions:
How do I reject these calls? I know the purpose of being part of the Corda network is to have the ability for CorDapps of different orgs to transact, and I don't want to go with the segregated network model (because it's more expensive for prod and pre-prod Corda nets).
Can a node on the network perform a DoS (Denial of Service) attack by sending me flows that I don't have installed and eventually bringing my node down?
I'm not sure if I'm right about this, but as far as I know the Corda Network is designed on a need-to-know basis. I had the same doubt when I first started with Corda, but I found out that one can simply block a node from sending you any undesired flows, which could otherwise cost you unnecessary CPU time. The explanation for this is given in this link.
Apart from this, I have gone through a Medium post which explained how a ResponderFlow can validate the information passed through flows. One of the points mentioned there was to verify the identity of the flow initiator (so as to decide whether we want that flow), which can't be done within a contract, so it needs to be done inside a flow.
Also, one can't keep flooding a node with a flow, because a flow has a timeout, maxRestartCount and backOffBase, which help determine how the flow is propagated across the network.
I hope this helps you to construct a solution to your doubt.

Load balance background tasks in Azure Web App

I am developing an ASP.NET application that will be hosted as an Azure web app. Part of the app will continuously record multiple web-based cameras by retrieving a snapshot every N seconds. I would like to design the app so that the processes that record the cameras can be run on multiple instances. I would like it to load balance between all instances, but not duplicate effort for any one camera.
For example, if I have 100 cameras, and am running on 2 instances, I want each instance to get 50 cameras to process. If I have 5 instances, each instance should get 20 cameras to process. As I add cameras or scale instances up/down I would like for the system to load balance the work evenly.
If it's feasible, I would rather not spin up dedicated VMs just for processing cameras, due to increased cost.
I'm somewhat familiar with Akka.NET, Hangfire, and WebJobs, but am unclear if these will help in this scenario. I have used Hangfire and WebJobs to do background processing, but not with this sort of load-balancing requirement. Will these or some other framework or tool help me load balance these background tasks evenly across Azure Web App Instances? How should I go about setting up these or another framework to do this?
I honestly don't think you want to try to "balance" the servers. I think you just want to make sure the work is well distributed. If I were you, I would use a queue system like SQS to queue up all of the cameras that need a snapshot and let each instance worker dequeue one at a time and process it.
A good approach could be to have a master server responsible for queueing up the snapshots, and then have all of your worker servers simply work out of this shared queue. Even if one server happens to process more than the others, that is fine, since they were all working out of the same queue. It just means that that server was able to process its jobs more quickly than the others.
To be honest, there are a lot of ways to approach this. You could do something as simple as keeping a shared list of your cameras, with a timestamp for the last snapshot, and work off of that. Each server would request a camera, look at the list to find one that was stale, then update the timestamp and take the snapshot for that camera. The downside of something like this is that you are going to struggle with non-atomic operations and the possibility of multiple workers making the request at the same time and both working on the same camera. These are the kinds of things a queue system helps you with, because as soon as one of those queue items is in flight, it is no longer available. And because each server is responsible for removing its items once it has finished with them, if a server were to crash mid-snapshot, that work would simply go back into the queue.
No matter which solution you choose, it is going to boil down to having a central system/list for serving up stale cameras.
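The question is about ASP.NET on Azure, but the "central system serving up stale cameras" idea is language-agnostic. Purely for illustration, here is a rough Erlang sketch (module and names invented) of a single coordinator process that finds the most-stale camera and stamps it in the same step, so two workers can never claim the same camera:

    %% Hypothetical coordinator: one process owns the camera list, so
    %% "find a stale camera and claim it" is a single atomic step.
    -module(camera_scheduler).
    -behaviour(gen_server).
    -export([start_link/1, claim_stale/1]).
    -export([init/1, handle_call/3, handle_cast/2]).

    start_link(CameraIds) ->
        gen_server:start_link({local, ?MODULE}, ?MODULE, CameraIds, []).

    %% A worker asks for one camera whose last snapshot is older than MaxAgeMs.
    claim_stale(MaxAgeMs) ->
        gen_server:call(?MODULE, {claim_stale, MaxAgeMs}).

    init(CameraIds) ->
        %% state: [{CameraId, LastSnapshotMs}], 0 = never snapshotted
        {ok, [{Id, 0} || Id <- CameraIds]}.

    handle_call({claim_stale, MaxAgeMs}, _From, Cameras) ->
        Now = erlang:system_time(millisecond),
        IsFresh = fun({_Id, Last}) -> Now - Last < MaxAgeMs end,
        case lists:splitwith(IsFresh, Cameras) of
            {_Fresh, []} ->
                {reply, none, Cameras};
            {Fresh, [{Id, _Last} | Rest]} ->
                %% claim it: stamp the camera before replying, so no other
                %% worker can pick it up in the meantime
                {reply, {ok, Id}, Fresh ++ Rest ++ [{Id, Now}]}
        end;
    handle_call(_Other, _From, Cameras) ->
        {reply, ignored, Cameras}.

    handle_cast(_Msg, Cameras) ->
        {noreply, Cameras}.

A hosted queue (as suggested above) gives you the same claim-and-hide semantics without having to run a coordinator of your own.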
The Azure WebJobs SDK uses the storage account you set up to balance the work between the various instances that are running your jobs. You can gain finer control by using a queue to divide up the work that needs doing, and then scaling your App Service plan based on the queue length.

Service Bus (Maybe?) for web farm

Part of this question is I'm not even sure what exactly I'll need to ask, so I'll start with the situation and work out from there.
One project I'm working on involves use of COMET via the aspComet library. The use case of the program is somewhat of a collaborative slideshow. One person runs the bulk of it, with one or more participants able to perform certain actions. Low latency between when an action is performed and when it shows up on the other screens is important.
Previously, it was just running on one server. Now, we're wanting to scale it out a bit, more for reliability than performance reasons. So, we have some boxes out in Rackspace's cloud and all that fun stuff.
I knew from the start of this that I was going to need to make some changes to the way the COMET stuff works since different people in the same "show" might be on different servers, and I have no way of knowing what "show" they belong to until after they have already arrived on the site.
I initially tackled this using the WCF Mesh provider, which wasn't well documented to start with. Now I'm running into issues where messages dispatched to it sometimes get lost or delayed (I'm not 100% sure what is going on there), which screws up the long poll for COMET and breaks things in rather strange ways (clicking a button may trigger an event, or it may hang for 10 seconds, the long-poll duration, and not actually do anything).
More research leads me to believe one of the .Net service bus providers may do what I need. However, I can't find examples that would cover what I need:
No single point of failure (outside of a database)
No hardcoding of peers.
Near-realtime (no polling, event based would be best)
My ideal solution would be that when a server comes up, it lets other servers know of its existence (even if it's just a row in a table somewhere), and they can start sending broadcast messages among each other, with each server being both a publisher and a subscriber. This is what I somewhat had in the WCF Mesh provider, but I'm not overly confident in that code.
Can anyone point me in the right direction with this? Even proper terms to look for in the docs for service bus providers would be good at this point. Or are service buses not what I want? At this point I would settle for setting up a Jabber server on each web server and use that, if it could fit within my constraints.
I can't speak a ton to NServiceBus, but I expect the answers will be similar.
Single point of failure: MSMQ can use multicasting, which means each endpoint will broadcast its existence and no DB table is needed. RabbitMQ uses an exchange-to-queue binding process, which means that as long as the Rabbit instance or cluster is up, the messages still exist. RabbitMQ can be clustered; MSMQ cannot be. *Note: you might have issues with multicasting on Rackspace; I have no idea how they handle it. If so, you'll have to fall back on the runtime services for MSMQ (not RabbitMQ), which would create a single point of failure, because everyone has a single point to coordinate control messages through.
Hardcoding of peers: discussed a bit above; MSMQ's multicast handles it. With Rabbit it can also be done: just bind queues to the exchange you want to listen to. MassTransit takes care of this for you.
Near-realtime: both use messaging, which is near real time. There's no polling in your message consumer code.
I think a service bus is a reasonable solution for what you're trying to do. Some more details would likely be needed, but the general messaging approach is correct. There are other, more lightweight messaging libraries if you decide you just want something on top of RabbitMQ and are happy to configure Rabbit to handle most of the work.
To get started with MassTransit, we have documentation up: http://readthedocs.org/projects/masstransit/ and mailing list http://groups.google.com/group/masstransit-discuss. Join the mailing list if you have future questions and someone will try and help you out.
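The question is .NET-centric, but the exchange-to-queue binding idea mentioned above maps directly onto RabbitMQ fanout exchanges. Purely for illustration, here is a rough sketch using the RabbitMQ Erlang client (amqp_client); the exchange name, host and payload are placeholders. Each web server would run both the publish and the subscribe side, so every server broadcasts to and listens on the same exchange.

    %% Hypothetical pub/sub sketch with the RabbitMQ Erlang client.
    -module(show_bus).
    -include_lib("amqp_client/include/amqp_client.hrl").
    -export([connect/1, subscribe/1, broadcast/2]).

    connect(Host) ->
        {ok, Conn} = amqp_connection:start(#amqp_params_network{host = Host}),
        {ok, Ch} = amqp_connection:open_channel(Conn),
        %% fanout exchange: every bound queue gets every message
        amqp_channel:call(Ch, #'exchange.declare'{exchange = <<"show_events">>,
                                                  type = <<"fanout">>}),
        {ok, Ch}.

    %% Bind a private queue to the exchange and deliver messages to the caller.
    subscribe(Ch) ->
        #'queue.declare_ok'{queue = Q} =
            amqp_channel:call(Ch, #'queue.declare'{exclusive = true}),
        amqp_channel:call(Ch, #'queue.bind'{queue = Q, exchange = <<"show_events">>}),
        amqp_channel:subscribe(Ch, #'basic.consume'{queue = Q, no_ack = true}, self()),
        ok.

    %% Publish an event; every subscribed server (including this one) receives it.
    broadcast(Ch, Payload) ->
        amqp_channel:cast(Ch,
                          #'basic.publish'{exchange = <<"show_events">>},
                          #amqp_msg{payload = Payload}).

Messages then arrive in the subscribing process's mailbox as {#'basic.deliver'{}, #amqp_msg{payload = Payload}} tuples.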

Peer-to-peer replication of a sqlite database

I am looking for a way to replicate a small and simple relational database (like SQLite) across peers. This should work in an environment with unstable network connections, hence the need for each peer to have a full copy of the database. This should allow a peer to continue working off-line in the event of network failure.
To keep things simple, replication should only have to support the replication of addition of data, i.e. only INSERTs, not DELETEs or UPDATEs.
Does anyone know of a good - and ideally cross-platform - technology or method of creating such a system? I am currently looking at JXTA and JXSE, but I am put off by their complexity and the apparent lack of life in the community after the takeover of Sun by Oracle.
Thanks!
Frans
rqlite uses the Raft consensus algorithm, so it should be fairly resilient to unstable network connections.
Also, it seems to be possible to configure rqlite to accept reads even in the case of a network failure.
A similar project, dqlite, exists as a library available in various languages, but it seems less explicit about what happens in the event of a network failure.
You may want to explore JGroups for the communication layer if you don't like JXTA. For the replication, I think you will have to implement your own code.
I am working on something similar (though the code is far from ready). I'll describe a little about my intended approach, but whether that is suitable for you depends on some key design points you'd need to consider. I am not aware of any ready-built projects that will do this, unfortunately.
In particular we'd need to know what language you wish to use, or which languages you'd rather avoid.
Also, consider how you intend to do peer discovery - can you set up trust between node pairs manually, or do you want them to auto-discover?
Presumably all peers may insert data?
If you are able to use PHP, and are happy manually peering node pairs, then my approach may be of interest. Set up an ORM such as Doctrine, Propel or NotORM, and get each node to regularly sync with an internet time source. For each new row in a db, grab the data (either in an array or ORM object), serialise it, and push it out to all nodes that you have a trust relationship with. Where a push fails, keep a note of this and retry at periodic intervals (potentially giving up after a remote node fails to answer a large number of retries).
Pushes can either be kicked off by your application that creates the row, or can be called by whatever scheduler is available on each machine. A push message can be XML, or for simplicity can be just a POST message containing the new row and whatever metadata (e.g. timestamp of save, so as to resolve INSERT order from several nodes).
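The approach above assumes PHP, but purely for illustration, here is a rough sketch of the push step in Erlang using the standard library's httpc client; the /replicate URL, peer list and row encoding are invented for the example.

    %% Hypothetical push step: send a newly inserted row to every trusted peer,
    %% remembering failures so a periodic job can retry them later.
    -module(row_push).
    -export([push_row/2]).

    %% Peers is a list of base URLs, e.g. ["http://peer1:8080"].
    %% Row is any term describing the inserted row, e.g. a proplist
    %% [{id, 42}, {body, <<"hello">>}, {saved_at, 1700000000}].
    push_row(Peers, Row) ->
        {ok, _} = application:ensure_all_started(inets),
        Body = term_to_binary(Row),      %% any serialisation (JSON, XML, ...) will do
        [push_one(Peer, Body) || Peer <- Peers].

    push_one(Peer, Body) ->
        Url = Peer ++ "/replicate",
        case httpc:request(post, {Url, [], "application/octet-stream", Body},
                           [{timeout, 5000}], []) of
            {ok, {{_, 200, _}, _Headers, _RespBody}} ->
                {Peer, ok};
            Other ->
                %% keep a note of the failure; retry at periodic intervals
                {Peer, {retry_later, Other}}
        end.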
If your nodes do not have static IP addresses, they could be registered with a dynamic DNS addressing service so as to allow each node to stay in touch with peers even if their IP changes. You might also consider adding a message signing system, to ensure that messages between nodes are genuine.

Hot plugging an additional node in OpenMPI cluster

Is it possible to hot plug an additional node (host) into a working OpenMPI app? We're talking about production environment where we cannot afford even a 5 second downtime.
There are two scenarios I'm interested in:
We would just like to enhance the computing power by adding one more broadcast listener.
A node died, the master node handles it well and reassigns the task to somebody else. The system administrator comes in, restarts the dead node and plugs it back into the cluster.
Which platform independent MPI implementation would be best for the scenario above? OpenMPI is not a must here.
MPI-2 -- any implementation -- does allow dynamic processes, and in fact adding processes is currently much more feasible than removing them. You can use MPI_COMM_SPAWN to launch a new process with a given executable, and it returns an intercommunicator that can be used to communicate between the old (original) processes and the newly spawned ones.
The trick here is that nothing will automatically detect the new node. You'll have to have some process keeping an eye out for new nodes and spawning something on them. If the new nodes will just be listeners to the master node, that's probably the best case, as only the master node really needs to know about them. Ensuring the spawn happens on the new node and not somewhere else is done through the info argument to MPI_COMM_SPAWN, and may be implementation-dependent.
