I have set up Akka.net actors running inside an ASP.net application to handle some asynchronous & lightweight procedures. I'm wondering how Akka scales when I scale out the website on Azure. Let's say that in code I have a single actor to process messages of type FooBar. When I have two instances of the website, is there still a single actor or are there now two actors?
By default, whenever you call the ActorOf method, you create a new actor instance. If you call it in two separate actor systems, you'll end up with two separate actors that have the same relative paths inside their respective systems, but different global addresses.
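A minimal sketch of what that means, using the JVM Akka Java API (which Akka.NET mirrors closely; FooBarActor and the names "web" and "foobar" are made up for illustration):

    import akka.actor.ActorRef;
    import akka.actor.ActorSystem;
    import akka.actor.Props;

    public class Demo {
        public static void main(String[] args) {
            // One actor system per website instance:
            ActorSystem system = ActorSystem.create("web");
            // Each ActorOf/actorOf call creates a brand-new actor instance:
            ActorRef fooBar = system.actorOf(Props.create(FooBarActor.class), "foobar");
            // On both website instances this actor has the same relative path,
            // /user/foobar, but each copy lives in its own system, so their
            // global addresses differ.
        }
    }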
There are several ways to share information about actors between actor systems:
When using Akka.Remote, you can message actors living in another actor system, given their addresses or IActorRefs (see the sketch after this list). Requirements:
You must know the path to a given actor.
You must know the actual address (URL or IP) of the actor system on which that actor lives.
Both actor systems must be able to communicate with each other over TCP (i.e. the relevant ports must be open in the firewall).
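A sketch of what that looks like, again via the JVM Java API (the system name, host, port and actor path are all made up; Akka.NET uses the same akka.tcp://system@host:port/path address scheme):

    import akka.actor.ActorRef;
    import akka.actor.ActorSelection;
    import akka.actor.ActorSystem;

    public class RemoteDemo {
        public static void main(String[] args) {
            ActorSystem system = ActorSystem.create("web");
            // Full path: protocol://system-name@host:port/relative-path
            ActorSelection remoteFooBar = system.actorSelection(
                    "akka.tcp://web@10.0.0.2:2552/user/foobar");
            // FooBar is the made-up message type from the question:
            remoteFooBar.tell(new FooBar(), ActorRef.noSender());
        }
    }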
When using Akka.Cluster, actor systems (also known as nodes) can form a cluster. They will exchange information about their location in the network, track incoming nodes, and eventually detect dead or unreachable ones. On top of that, you can use higher-level components, e.g. cluster routers. Requirements (see the config sketch after this list):
Every node must be able to open a TCP channel to every other node (so again, firewalls etc.).
A new incoming node must know at least one node that is already part of the cluster. This is easily achievable with the pattern known as Lighthouse, or via plugins and third-party services like Consul.
All nodes must use the same actor system name.
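A minimal configuration sketch, shown here with the JVM Akka classic config keys loaded from Java (host, port and the seed-node address are made up; the Akka.NET equivalents live under slightly different keys, e.g. akka.remote.dot-netty.tcp):

    import akka.actor.ActorSystem;
    import com.typesafe.config.Config;
    import com.typesafe.config.ConfigFactory;

    public class ClusterDemo {
        public static void main(String[] args) {
            Config config = ConfigFactory.parseString(
                "akka {\n" +
                "  actor.provider = cluster\n" +
                "  remote.netty.tcp {\n" +
                "    hostname = \"10.0.0.2\"\n" +
                "    port = 2552\n" +
                "  }\n" +
                "  # Address of at least one existing cluster member:\n" +
                "  cluster.seed-nodes = [\"akka.tcp://web@10.0.0.1:2552\"]\n" +
                "}\n");
            // The system name ("web") must be identical on every node:
            ActorSystem system = ActorSystem.create("web", config);
        }
    }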
Finally, when using the cluster configuration you can make use of Akka.Cluster.Sharding - it's essentially a higher-level abstraction over an actor's location inside the cluster. When using it, you don't need to explicitly say where to find or when to create an actor. Instead, all you need is a unique actor identifier. When you send a message to such an actor, it will be created ad hoc somewhere in the cluster if it didn't exist before, and rebalanced to spread the workload equally across the cluster. This plugin also handles all the logic associated with routing messages to that actor.
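A sketch of starting a shard region and messaging an entity by id, again via the JVM classic sharding API (the type name "foobar", the FooBar message and its id field are made up):

    import akka.actor.ActorRef;
    import akka.actor.ActorSystem;
    import akka.actor.Props;
    import akka.cluster.sharding.ClusterSharding;
    import akka.cluster.sharding.ClusterShardingSettings;
    import akka.cluster.sharding.ShardRegion;

    public class ShardingDemo {
        public static void main(String[] args) {
            ActorSystem system = ActorSystem.create("web");

            // Tells sharding how to derive entity and shard ids from a message:
            ShardRegion.MessageExtractor extractor = new ShardRegion.MessageExtractor() {
                @Override public String entityId(Object message) {
                    return ((FooBar) message).id;
                }
                @Override public Object entityMessage(Object message) {
                    return message;
                }
                @Override public String shardId(Object message) {
                    // e.g. spread entities over 10 shards:
                    return String.valueOf(Math.floorMod(((FooBar) message).id.hashCode(), 10));
                }
            };

            ActorRef region = ClusterSharding.get(system).start(
                    "foobar", Props.create(FooBarActor.class),
                    ClusterShardingSettings.create(system), extractor);

            // The entity is created somewhere in the cluster on first use:
            region.tell(new FooBar("some-id"), ActorRef.noSender());
        }
    }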
I have data that needs to be downloaded to a local server every 24 hours. For high availability we provisioned 2 servers to avoid failures and losing data.
My question is: what is the best approach to using the 2 servers?
My ideas are:
- Download on one server, and only if the download fails for any reason does it continue on the other server.
- Download occurs on both servers at the same time every day.
Any advice?
In terms of your high-level approach, break it down into manageable chunks, i.e. reliable data acquisition and highly available data dissemination. I would tackle the second part first, because that's the state you want to get to.
Highly available data dissemination
Working backwards (i.e. this is the second part of your problem), when offering highly-available data to consumers you have two options:
Active-Active
Active-Passive
Active-Active means you have at least two nodes servicing requests for the data, with some kind of Load Balancer (LB) in front, which allocates the requests. Depending on the technology you are using there may be existing components/solutions in that tech stack, or reference models describing potential solutions.
Active-Passive means you have one node taking all the traffic, and when that becomes unavailable requests are directed at the stand-by / passive node.
The passive node can be "hot", ready to go, or "cold" - meaning it's not fully operational but is relatively fast and easy to stand up and start taking traffic.
In both cases, and if you have only 2 nodes, you ideally want both nodes to be capable of handling the entire load. That's obvious for Active-Passive, but it also applies to Active-Active, so that if one goes down the other will successfully handle all requests.
In both cases you need some kind of network component that routes the traffic. Ideally it will be able to operate autonomously (it will have to if you want active-active load sharing), but you could have a manual / alert-based process for switching from active to passive. Much of this will depend on your non-functional requirements.
Reliable data acquisition
Having figured out how you will disseminate the data, you know where you need to get it to.
E.g. if active-active, you need to get it to both at the same time (I don't know what tolerances you can have), since you want them to serve the same consistent data. One option to get around that issue is this:
Have the LB route all traffic to node A.
Node B performs the download.
The LB is informed that Node B successfully got the new data and is ready to serve it. LB then switches the traffic flow to just Node B.
Node A gets the updated data (perhaps from Node B, so the data is guaranteed to be the same).
The LB is informed that Node A successfully got the new data and is ready to serve it. LB then allows the traffic flow to Nodes A & B.
This pattern would also work for active-passive:
Node A is the active node, B is the passive node.
Node B downloads the new data, and is ready to serve it.
Node A gets updated with the new data (probably from node B), to ensure consistency.
Node A serves the new data.
You get the data on the passive node first so that if node A went down, node B would already have the new data. Admittedly the time-window for that to happen should be quite small.
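To make the choreography concrete, here is a rough sketch of the active-active sequence above; the LoadBalancer interface and the download/replicate helpers are hypothetical stand-ins for whatever your stack actually provides:

    // All names below are hypothetical, not a real LB API:
    interface LoadBalancer {
        void routeAllTrafficTo(String node);
        void routeTrafficTo(String... nodes);
    }

    class DataRefresh {
        void refresh(LoadBalancer lb) {
            lb.routeAllTrafficTo("nodeA");       // 1. A serves all traffic
            download("nodeB");                    // 2. B fetches the new data
            lb.routeAllTrafficTo("nodeB");       // 3. LB switches traffic to B
            replicate("nodeB", "nodeA");         // 4. A copies B's data, guaranteeing consistency
            lb.routeTrafficTo("nodeA", "nodeB"); // 5. both nodes serve again
        }

        void download(String node) { /* fetch the daily data onto the node */ }
        void replicate(String from, String to) { /* copy the data between nodes */ }
    }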
I have a Java EE application that resides on multiple servers over multiple sites.
Each instance of the application produces logs locally.
The Java EE application also communicates with IBM Mainframe CICS applications via SOAP/HTTP.
These CICS applications execute in multiple CICS regions over multiple mainframe LPARs over multiple sites.
Like the Java EE application, the CICS applications produce logs locally.
Attempting to troubleshoot issues is extremely time consuming. It entails support staff having to manually log onto UNIX servers and/or mainframe LPARs, tracking down all the logs related to a particular issue.
One solution we are looking at is to create a single point that collects all distributed logs from both UNIX and Mainframe.
Another area we are looking at is whether or not it's possible to drive client traffic to designated Java EE servers and IBM Mainframe LPARs, right down to a particular application server node and a single IBM CICS region.
We would only want to do this for "synthetic" client calls, e.g. calls generated by our support staff, not "real" customer traffic.
Is this possible?
So for example, say we had 10 UNIX servers distributed over two geographical sites as follows:
Geo One: UNIX_1, UNIX_3, UNIX_5, UNIX_7, UNIX_9
Geo Two: UNIX_2, UNIX_4, UNIX_6, UNIX_8, UNIX_0
Four IBM Mainframe LPARs over two geographical sites as follows:
Geo One: lpar_a, lpar_c
Geo Two: lpar_b, lpar_d
Each LPAR has 8 CICS regions:
cicsa_1, cicsa_2... cicsa_8
cicsb_1, cicsb_2... cicsb_8
cicsc_1, cicsc_2... cicsc_8
cicsd_1, cicsd_2... cicsd_8
We would want to target a single route for our synthetic traffic, e.g.
unix_5 > lpar_b > cicsb_6
This way we would know where to look for the log output on all platforms.
UPDATE - 0001
By "synthetic traffic" I mean that our support staff would make client calls to our back end API's instead of "Real" front end users.
If our support staff could specify the exact route these synthetic calls traversed, they would know exactly which log files to search at each step.
These log files are very large, tens of MB each, and there are many of them.
For example, one of our applications runs on 64 physical UNIX servers, split across 2 geographical locations. Each UNIX server hosts multiple application server nodes, each node produces multiple log files, and each of these log files is 10MB+. The log files roll over, so log output can be lost very quickly.
"One solution we are looking at is to create a single point that collects all distributed logs from both UNIX and Mainframe."
I believe collecting all logs into a single point is the way to go. When the log files roll over, perhaps you could SFTP them to your single point as part of that rolling over process. Or use NFS mounts to copy them.
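As one possible shape for that shipping step, here's a small Java sketch that watches a log directory and copies each rolled-over file to a central mount (all paths and the file-name pattern are made up; the copy could equally be an SFTP transfer with a library such as JSch):

    import java.nio.file.*;

    public class LogShipper {
        public static void main(String[] args) throws Exception {
            Path logDir = Paths.get("/var/log/myapp");
            Path collector = Paths.get("/mnt/central-logs/myhost");
            try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
                logDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
                while (true) {
                    WatchKey key = watcher.take(); // blocks until something happens
                    for (WatchEvent<?> event : key.pollEvents()) {
                        Path created = logDir.resolve((Path) event.context());
                        // Rolled-over files typically get a numbered suffix:
                        if (created.getFileName().toString().matches("app\\.log\\.\\d+")) {
                            Files.copy(created, collector.resolve(created.getFileName()),
                                    StandardCopyOption.REPLACE_EXISTING);
                        }
                    }
                    key.reset();
                }
            }
        }
    }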
I think you can make your synthetic traffic solution work, but I'm not sure what it accomplishes.
You could have your Java applications send to a synthetic URL, which is mapped by DNS to a single CICS region containing a synthetic WEBSERVICE definition, synthetic PIPELINE definition, and a synthetic URIMAP definition which in turn maps to a synthetic transaction which is defined to run locally. The local part of the definition should keep it from being routed to another CICS region in the CICSPlex.
In order to get the synthetic URIMAP you would have to run your WSDL through the IBM tooling (DFHWS2LS or DFHLS2WS) with a URI control card indicating your synthetic URL. You would also use the TRANSACTION control card to point to your synthetic transaction defined to run locally.
I think this is seriously twisting the CICS definitions such that it barely resembles your non-synthetic environment - and that's provided it would work at all, I am not a CICS Systems Programmer and yours might read this and conclude my sanity is in question. Your auditors, on the other hand, may simply ask for my head on a platter.
All of the extra definitions are needed (IMHO) to defeat the function of the CICSPlex, which is to load balance incoming requests, sending them to the CICS region that is best able to service them. You need some requests to go to a specific region, short-circuiting all the load balancing being done for you.
I want to create a proxy server which routes incoming packets from REQ type sockets to one of the REP sockets on one of the computers in a cluster. I have been reading the guide, and I think the proper structure is a combination of ROUTER and DEALER on the proxy server, where the ROUTER passes messages to the DEALER to be distributed. However, I cannot figure out how to create this connection scheme. Is this the correct architecture? If so, how do I bind a DEALER to multiple addresses? The flow I envision is like this: REQ->ROUTER|DEALER->[REP, REP, ...], where only one REP socket would handle a single request.
NB: forget about packets -- think in terms of "Behaviour", that's the key
ZeroMQ is rather an abstract layer for certain communication-behavioral patterns, so while terms like socket do sound similar to what one has read/used previously, the ZeroMQ world is by far different from many points of view.
This very formalism allows ZeroMQ Formal-Communication-Patterns to grow in scale, to get assembled in higher-order patterns (for load-balancing, for fault-tolerance, for performance-scaling). Mastering this style of thinking, you forget about packets, thread-sync issues and I/O polling, and focus on your higher-abstraction-based design -- on Behaviour -- rather than on underlying details. This makes your design both free from re-inventing the wheel and very powerful, as you re-use highly professional tools right for your problem-domain tasks.
DEALER->[REP,REP,...] Segment
That said, your DEALER node (in fact a ZMQ-socket-access-node, having The Behaviour called a "DEALER" to resemble its queue/buffering style, its round-robin dispatcher, its send-out & expect-answer-in model) may .bind() to multiple local address:port-s, and these "service points" may also operate over different TransportClass-es -- one working over tcp://, another over inproc://, if that makes sense for your Design Architecture -- ZeroMQ empowers you to use this, transparently abstracted from all the "awful & dangerous" lower-level nitty-gritties.
ZeroMQ also allows you to reverse .connect() / .bind()
In principle, where helpful, one may reverse the .bind() and .connect(), from the DEALER to a known target address of the respective REP entity.
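A minimal sketch of both points with JeroMQ, the Java binding (all endpoints are made up):

    import org.zeromq.SocketType;
    import org.zeromq.ZContext;
    import org.zeromq.ZMQ;

    public class DealerDemo {
        public static void main(String[] args) {
            try (ZContext ctx = new ZContext()) {
                ZMQ.Socket dealer = ctx.createSocket(SocketType.DEALER);
                // One socket may bind several endpoints, even over
                // different transport classes:
                dealer.bind("tcp://*:5560");
                dealer.bind("inproc://workers");
                // ...or, reversing roles, the same socket could instead
                // .connect() out to each known REP endpoint:
                // dealer.connect("tcp://192.168.0.10:5561");
            }
        }
    }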
You leave a couple details out that are important to determining the correct architecture.
When you say "from REQ type sockets to one of the REP sockets on one of the computers in a cluster", how do you determine which computer gets the message? Is it addressed to a specific computer? Does a computer announce its availability before it can receive a message? Does each message just get passed to the next one in line in a round-robin fashion? (if it's not the last one, you probably don't want a DEALER socket)
When you say "how do I bind a dealer to multiple addresses", it's not clear what you mean by "addresses"... Do you mean to say that the proxy has a unique IP address that it uses to communicate with each computer in the cluster? Or are you just wondering how to manage the connection to multiple different peers with the same socket? The former is a special case, the latter is simple.
I'm going to work with the following assumptions:
You want a worker computer from the cluster to announce its availability for work before it receives any work, and any computer in the cluster can handle any job. A faster worker, or a worker working on a smaller job, will not have to wait behind some slow worker to finish their job and get a new job first.
The proxy/broker uses a single IP interface to communicate with all workers.
If those are true, then what you want will be closer to this:
REQ->ROUTER|ROUTER->[REQ, REQ, ...]
A worker will create a request to the backend router socket to announce its availability, and await a reply with work. Once it is finished, it will create a new request with the finished work, which again announces its availability. The other half of the pattern you've already worked out.
This is the Simple Pirate Pattern from the ZMQ guide. It's a good place to start, but it's not very robust. This is in the Reliable Request-Reply Patterns section of the guide, and I suggest you read or reread that section carefully as it will guide you well. In particular, they keep refining this pattern into more and more reliable implementations and wind up with the Majordomo pattern, which is very robust and fault tolerant. You should see if you need all the features that provides or if you can scale it back a little. Either way, you should learn and understand what these patterns are doing and why before you make the choice to do something different.
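For a concrete starting point, here is a minimal sketch of such a broker in Java with JeroMQ (ports are made up, and it omits the heartbeating and retries that the more robust refinements in that chapter add):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import org.zeromq.SocketType;
    import org.zeromq.ZContext;
    import org.zeromq.ZMQ;
    import org.zeromq.ZMsg;

    public class LruBroker {
        public static void main(String[] args) {
            try (ZContext ctx = new ZContext()) {
                ZMQ.Socket frontend = ctx.createSocket(SocketType.ROUTER); // clients (REQ) connect here
                ZMQ.Socket backend = ctx.createSocket(SocketType.ROUTER);  // workers (REQ) connect here
                frontend.bind("tcp://*:5555");
                backend.bind("tcp://*:5556");

                // Identities of workers that have announced they are free:
                Deque<byte[]> idleWorkers = new ArrayDeque<>();

                ZMQ.Poller poller = ctx.createPoller(2);
                poller.register(backend, ZMQ.Poller.POLLIN);
                poller.register(frontend, ZMQ.Poller.POLLIN);

                while (!Thread.currentThread().isInterrupted()) {
                    poller.poll();
                    if (poller.pollin(0)) {
                        // Either [worker, "", "READY"] or [worker, "", client, "", reply]:
                        ZMsg msg = ZMsg.recvMsg(backend);
                        idleWorkers.addLast(msg.pop().getData()); // worker is free again
                        msg.pop();                                // drop the empty delimiter
                        if (msg.size() > 1) {
                            msg.send(frontend); // a real reply: route it back to the client
                        } else {
                            msg.destroy();      // just a READY announcement
                        }
                    }
                    // (A fuller version would only poll the frontend while a worker is idle.)
                    if (poller.pollin(1) && !idleWorkers.isEmpty()) {
                        // [client, "", request] -- prepend a free worker's identity:
                        ZMsg msg = ZMsg.recvMsg(frontend);
                        msg.push("");                      // delimiter expected by the worker's REQ
                        msg.push(idleWorkers.pollFirst());
                        msg.send(backend);
                    }
                }
            }
        }
    }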
I have 2 instances in Windows Azure. I am finding that only one instance is in use. How do I tell/configure Windows Azure to use both instances, in a round-robin way or any other way?
If it's two instances of the same Role, and you're accessing that role through its cloudapp.net DNS name (or some custom DNS name mapped to cloudapp.net), then the load balancer takes care of that for you; there's no ability to configure it. You'd see traffic across both instances.
If it's two instances of the same role, but you're accessing them through internal endpoints, it's up to you to choose which instance to talk to.
If it's two instances spread across two different roles, load-balancing doesn't come into play, as you only have one instance of each role.
If they are instances in the same role, load balancing is automatic.
If you have 2 different roles, with different instances underneath them, then you need to scrap that configuration, upload only 1 role and change the configuration to 2 instances.
Also how are you determining that there is only 1 instance being used?
I have a network of computers connected in form of a graph.
I want to ping from one computer (A) to another computer (B). A and B are connected to each other through many different paths, but I want to ping via particular edges only. I have the information about the edges to be followed during pinging available at both A and B.
How should I do this?
You could source-route the ping, but the return would choose its own path.
Furthermore, source-routed packets are often filtered due to security concerns. (Not always, they are useful and sometimes even required at edge routers.)
If the machines are under your local administrative control, then you could ensure that source-routed packets are permitted. As long as you are able to start a daemon on machine B, you could also easily enough design your own ping protocol that generates source-routed echo returns.
Well, this is actually handled by the routing protocols configured on the devices in between the computers (routers, I expect). I don't think there is a way to say "use that specific route". The routers run different protocols (OSPF, EIGRP, RIPv2) and they do the load balancing. The only way to be sure of one specific route is to use static routing, but that isn't dynamic, and it still isn't your computer deciding the route.
This is normal: if you were able to choose a route, it would be quite easy to mount a DoS attack that kills one specific route.