For a system that implements Raft: if the leader node goes down and a log write request arrives between the time the leader fails and the time a new leader is elected, does the write succeed, or is the system unavailable during this period?
The system is unavailable until a new leader is able to commit an entry from its current term. If a request reaches an old leader, the old leader may attempt to replicate it but will ultimately fail due to a higher term, in which case that leader should reject the client's request and step down. If a request reaches a node that is not the leader, it can either reject the request and force the client to retry, or enqueue the request to be forwarded to the next leader.
Clients should handle these cases by retrying requests for some reasonable amount of time when no leader is available. So, to the user of the client, a leader change may just appear as high latency.
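For example, the retry loop on the client side might look like this (a minimal sketch; `RaftClient`, `NoLeaderException`, and the timing values are hypothetical stand-ins for whatever your client library actually exposes):

```java
import java.time.Duration;
import java.time.Instant;

public final class RetryingWriter {
    // Hypothetical client interface; a real Raft client library
    // would expose something equivalent.
    interface RaftClient {
        void appendEntry(byte[] entry) throws NoLeaderException;
    }

    static final class NoLeaderException extends Exception {}

    // Retry the write until a new leader accepts it or the time budget runs out.
    static void writeWithRetry(RaftClient client, byte[] entry) throws Exception {
        Instant deadline = Instant.now().plus(Duration.ofSeconds(10));
        long backoffMillis = 50;
        while (true) {
            try {
                client.appendEntry(entry);
                return; // accepted by the (possibly new) leader
            } catch (NoLeaderException e) {
                if (Instant.now().isAfter(deadline)) {
                    throw e; // no leader elected within our budget
                }
                Thread.sleep(backoffMillis);
                backoffMillis = Math.min(backoffMillis * 2, 1000); // back off, capped at 1s
            }
        }
    }
}
```

From the caller's point of view, the write just takes longer during the election window, which is exactly the "high latency" effect described above.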
I have one dynamic send port (request/response) in my orchestration.
The request is sent to an external system and the response is received in the orchestration. There is a chance the external system has a monthly maintenance window of 2 days. To handle that scenario:
If I set the retry interval to 2 days, will it impact performance? Is it a good idea?
I wouldn't think it is a good idea, as even a transitory error of another type would then mean that the message would be delayed by two days.
As maintenance is usually scheduled, either stop the send port (but don't unenlist) or stop the receive port that picks up the messages to send (preferable, especially if it is high volume), and start them again after the maintenance period.
The other option would be to build that logic into the orchestration, so that if it catches an exception it increases the retry interval on each retry, as sketched below. However, as above, if it is high volume you might be better off switching off the receive location, as otherwise you will have a high number of running instances.
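The increasing-interval idea looks roughly like this (shown as plain Java purely for illustration, since the real logic would live in an orchestration loop shape with a Delay; `sendToExternalSystem` and the interval values are placeholders):

```java
public class GrowingRetry {
    // On each transmit failure, wait and then double the delay (up to a
    // cap) before trying again.
    public static void send(Runnable sendToExternalSystem)
            throws InterruptedException {
        long delayMinutes = 5;            // initial retry interval
        final long maxDelayMinutes = 240; // cap so the interval can't grow unbounded
        while (true) {
            try {
                sendToExternalSystem.run();
                return; // delivered successfully
            } catch (RuntimeException transmitFailure) {
                Thread.sleep(delayMinutes * 60_000L);
                delayMinutes = Math.min(delayMinutes * 2, maxDelayMinutes);
            }
        }
    }
}
```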
Set a service window on the send port if you know when the receiving system will be down. If the schedule is unknown, I would rather set:
retry count = 290
retry interval = 10 minutes
so that the messages keep being retried for just over two days (2 days = 2880 minutes, and 2880 / 10 = 288 retries, so 290 leaves a small margin).
We have a solution that takes a message and sends it to a web API.
Every day, an automatic procedure run by another department passes thousands of records into the MessageBox, which seems to cause errors related to the API solicit-response port (strangely, these errors don't allude to a timeout, but they are only triggered when such a large quantity of data is sent downstream).
I've contacted the service supplier to determine the capacity of their API calls, so I'll be able to tailor our flow once I have a better idea.
I've been reading up on Rate Based Throttling this morning, and have a few questions I can't find an answer to:
If throttling is enabled, does it only process the minimum number of samples/messages? If so, what happens to the remaining messages? I read somewhere that they're queued in memory, but only up to a maximum of 100, so where do all the others go?
If I have 2350 messages flood through in the space of 2 seconds, and I want to control the flow, would changing my Sampling Window duration down to 1 second and setting Throttling override to initiate throttling make a difference?
If you are talking about the Host Throttling settings, the remaining messages will stay in the MessageBox database and will show as being in a Dehydrated state.
You would have to test the throttling settings under load. If you get it wrong it can be very bad: I've come across one server where the settings were configured incorrectly and it was constantly throttling.
I have a question about the situation which arises during a flash sale on e-commerce websites. Assume there are only 5 items in stock and 10000 requests hit the server at the same instant. How does the server handle the requests, and how does it manage to order them?
Given the CPU speeds of current computers, as it says here:
1 million requests per second would come out as 1 request per 1000 CPU cycles.
Although requests come from all over the world, they are received through a single channel, which means that two requests arrive one after another even if they originated at exactly the same time. Considering the routing conditions of the two requests, the times of receipt will certainly not be the same: it is effectively impossible for them to hit the server at the exact same time, because routing wouldn't allow it, in order to prevent collisions.
Therefore the order in which the requests are handled is the order in which they are received at the network interface. After the request packets pass up through to the application layer, each client will have a thread dedicated to it. But access to shared state, like the stock of 5 items you mentioned, will be synchronized, so only the first 5 threads to acquire the lock on that shared variable will win.
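A minimal sketch of that last point in Java, assuming one thread per request and a shared stock counter (the class and method names are made up for illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical flash-sale inventory: 5 items, 10000 concurrent buyers.
public class FlashSaleInventory {
    private final AtomicInteger stock = new AtomicInteger(5);

    // Returns true only for the first 5 threads that successfully
    // decrement the counter; everyone else is told the sale is over.
    public boolean tryPurchase() {
        while (true) {
            int current = stock.get();
            if (current == 0) {
                return false; // sold out
            }
            // compareAndSet succeeds only if no other thread changed the
            // value in the meantime, so the decrement is atomic.
            if (stock.compareAndSet(current, current - 1)) {
                return true;
            }
        }
    }
}
```

Whether you use a lock (`synchronized`) or an atomic compare-and-set as above, the effect is the same: all 10000 requests are forced through one serialization point, and exactly 5 of them succeed.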
Let's imagine that we have a bank and an ATM. They communicate over a network, which can fail. Is it possible to create a scheme where communication between them is 100% reliable? In this case that means:
the client withdrew a physical amount of money
<=>
the account balance was updated accordingly
Let's check a couple of scenarios:
The ATM sends a request, the bank sends a confirmation. The confirmation gets lost: the bank updated the account, but the client never got the money.
(If the bank waits for confirmation from the ATM before updating the balance.) The ATM sends a request, the bank sends a confirmation, the ATM sends an ack for its reception. The ack gets lost: the ATM dispensed the money, but the bank never updated the account.
So I could never come up with a solution where a failing network could not cause money to be lost on one side or the other.
Please advise.
Actually, if I am not misunderstanding your question, you're probably talking about the Long Wait algorithm.
In your first step, I'd suggest you wait until the confirmation is received (acknowledged) by the ATM, or vice versa. This is the only viable solution in this case: you set up a minimum fixed time bound after which, if the acknowledgement isn't received, you request it again from the bank at a regular interval of n time units (the minimum time unit at which the ATM checks for an acknowledgement from the bank). If it repeatedly fails, this means there is something wrong with the code or the concept.
Also, do utilise the concept of a redo log buffer, as this is the best option for storing and updating the bank balances. Don't keep only one copy, but two or three copies of the account information; make changes in a temporary copy, and only update the final account info in the redo log once the acknowledgement from the ATM has been received by the bank, or vice versa. Mind receiving the acknowledgement before updating values in the redo log!
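To make those repeated requests safe, each one needs to be deduplicated on the bank side. Here is a minimal sketch, assuming each withdrawal carries a unique transaction ID (all class and method names here are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Bank side: because the ATM may resend the same request when an ack is
// lost, each withdrawal carries a unique ID and is applied at most once.
public class BankLedger {
    private final Map<String, Boolean> processed = new HashMap<>();
    private long balanceCents = 100_000;

    // Returns whether the withdrawal was applied; a retried request with
    // the same txnId gets the same answer as the original, with no second debit.
    public synchronized boolean withdraw(String txnId, long amountCents) {
        Boolean previous = processed.get(txnId);
        if (previous != null) {
            return previous; // duplicate retry: replay the earlier outcome
        }
        boolean ok = balanceCents >= amountCents;
        if (ok) {
            balanceCents -= amountCents;
        }
        processed.put(txnId, ok);
        return ok;
    }
}
```

The ATM resends `withdraw(txnId, amount)` until it gets a reply, and only dispenses cash after a positive one. Note that this still doesn't make the physical dispensing of cash atomic with the network, which is why real ATMs also keep a local journal that is reconciled against the bank's log afterwards.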
We have a shell script setup on one Unix box (A) that remotely calls a web service deployed on another box (B). On A we just have the scripts, configurations and the Jar file needed for the classpath.
After the batch job is kicked off, control is passed from A to B for the transactions to happen on B. Usually the processing finishes on B in less than an hour, but in some cases (when we receive larger data sets for processing) the process continues for more than an hour. In those cases the firewall tears down the connection between the 2 hosts after an hour of inactivity. Thus, control is never returned from B to A and we are not notified that the batch job has ended.
To tackle this, our network team has suggested implementing keep-alives at the application level.
My question is: where and how should I implement those? Will that be in the web service code, some parameters passed from the shell script, or something else? I tried to Google around but could not find much.
You basically send an application-level message and wait for a response to it. That is, your applications must support sending, receiving, and replying to those heartbeat messages. See the FIX Heartbeat message for example:
The Heartbeat monitors the status of the communication link and identifies when the last of a string of messages was not received.
When either end of a FIX connection has not sent any data for [HeartBtInt] seconds, it will transmit a Heartbeat message. When either end of the connection has not received any data for (HeartBtInt + "some reasonable transmission time") seconds, it will transmit a Test Request message. If there is still no Heartbeat message received after (HeartBtInt + "some reasonable transmission time") seconds then the connection should be considered lost and corrective action be initiated....
Additionally, the message you send should include a local timestamp and the reply to this message should contain that same timestamp. This allows you to measure the application-to-application round-trip time.
Also, some NATs close your TCP connection after N minutes of inactivity (e.g. after 30 minutes). Sending heartbeat messages allows you to keep a connection up for as long as required.
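As a concrete sketch, an application-level heartbeat over an existing TCP connection could look like the following (minimal Java with a made-up one-line wire format; in your case the equivalent logic would go into the web service client and server code, since only the applications, not the shell script, can generate traffic on the connection):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

// Periodically send "HEARTBEAT <sendTimeMillis>" and expect the peer to
// echo the timestamp back. This both resets the firewall's idle timer and
// lets us measure the application-to-application round-trip time.
public class Heartbeat {
    public static void run(Socket socket, long intervalMillis) throws Exception {
        PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream()));
        while (!socket.isClosed()) {
            long sent = System.currentTimeMillis();
            out.println("HEARTBEAT " + sent);
            String reply = in.readLine(); // peer echoes: "HEARTBEAT-ACK <sent>"
            if (reply == null) {
                throw new Exception("connection lost");
            }
            long echoed = Long.parseLong(reply.split(" ")[1]);
            long rtt = System.currentTimeMillis() - echoed;
            System.out.println("round-trip time: " + rtt + " ms");
            Thread.sleep(intervalMillis);
        }
    }
}
```

The answering side simply echoes the timestamp back; that round trip is what gives you the RTT measurement mentioned above.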