What message persistence guarantee NATS streaming provides in cluster and FT modes? - nats.io

I'm looking for a streaming server with the message persistence guarantee, i.e. where the messages published by producers are guaranteed to be durably stored before the server acknowledges publishing to the producer.
My use case requires that we reduce the possibility of losing any produced messages. Producers are able to replay messages if required but they need to be sure that the ACKed message is durably persisted and will be delivered by the streaming server to the consumers.
NATS Streaming server seems to do something along those lines, but the docs for clustering and fault tolerance don't make it very clear what persistence guarantee is provided in each case. The doc on producer integration confirms that the server will actively ACK the published messages, either synchronously or via callback, but it does not make it clear if the ACK means that the message was durably stored at this point or not yet.
The doc on store configuration, specifically the SQL options, briefly mentions that the ACK from the server means a durable storage guarantee, but it's still not clear how exactly that applies in cases of Clustering and Fault Tolerance and different persistence backends (files or SQL).

NATS Streaming will have persisted the message before sending the publisher ACK back. The store implementations (filestore/SQL) may use some caching, but regardless, the writes are sync'ed (unless disabled) before the ACK is sent back.
However, in cluster mode, the filestore sync'ing is disabled because we rely on the fact that the data is replicated to each node of the cluster, so you would need multiple failures at once to lose the message. (Note that the file store implementation has an option to perform auto-sync at a regular interval: see auto_sync.)


HTTP Server-Push: Service to Service, without Browser

I am developing a cloud-based back-end HTTP service that will be exposed for integration with some on-prem systems. Client systems are custom-made by external vendors, they are back-end systems with their own databases. These systems are deployed in companies of our clients, we don't have access to them and don't control them. We are providing vendors our API specifications and they implement client code.
The data format which my service exchanges with clients is based on XML and follows a certain standard. Vendors implement their client systems in different programming languages and new vendors will appear over time. I want as many of clients to be able to work with my service as possible.
Most of my service API is REST-like: it receives HTTP requests, processes them, and sends back HTTP responses.
Additionally, my service accumulates some data state changes and needs to regularly push this data to client systems. Because of the below limitations, this use-case does not seem to fit the traditional client-server HTTP request-response model.
Due to the nature of the business, the client systems cannot afford to have their own HTTP API endpoints open and so my service can't establish an outbound HTTP connection to them for delivering data state notifications. I.e. use of WebHooks is not an option.
At the same time my service stakeholders need recorded acknowledgment that data state notifications were accepted by the client system, therefore fire-and-forget systems like Amazon SNS don't seem to apply.
I was considering a few approaches to this problem, but I'm not sure if I'm missing some simple options or some technologies that already address the problem. Hence this question.
The question text updated: options moved to my own answer.
Related questions and resources
REST API with active push notifications from server to client
Is ReST over websockets possible?
Can we use Web-Sockets for Communication between Microservices?
What is difference between grpc and websocket? Which one is more suitable for bidirectional streaming connection?
I eventually found answers to my question myself and with some help from my team. For people like me who come here with a question "how do I arrange notifications delivery from my service to its clients" here's an overview of available options.
With WebHooks, the client opens an endpoint itself. The service calls the client's endpoint whenever it has a notification to deliver. This way the client also acts as a service, so the client and the service swap roles during notification delivery.
With WebHooks the client must be able to open the endpoint with a well-known address. This is complicated if the client's software is working behind NAT or firewall or if the client is Browser or a mobile application.
The service needs to be prepared that client's WebHook endpoints may not always be online and may not always be healthy.
Another issue is flow control: special measures should be taken in the service not to overwhelm the client with high volume of connections, requests and/or data.
With polling, the client is still the client and the service is still the service, unlike WebHooks. The service offers an endpoint where the client can continuously request new notifications. The advantage of this option is that it does not change the connection direction or the request-response direction, so it works well with HTTP-based services.
The caveat is that polling API should have some rich semantics to be reasonably reliable if loss of notifications is not acceptable. Good examples could be Google Pub/Sub pull and Amazon SQS.
Here are a few considerations:
Receiving and deleting a notification should be separate operations. Otherwise, if the service deletes the notification just before giving it to the client and the client fails to process it, the notification will be lost forever. When deletion is separate from receiving, the client is forced to delete explicitly, which normally happens after successful processing.
If the client has received a notification and has not yet deleted it, it might be undesirable to let the same notification be processed by some other actor (perhaps a concurrent process of the same client). Therefore the notification must be hidden from receiving after it was first received.
If the client fails to delete the notification in reasonable time because of an error, network loss or process crash, the service has to make the notification visible for receiving again. This retry mechanism allows the notification to be ultimately processed.
If the service has no notifications to deliver, it should block the client's call for some time rather than deliver an empty response immediately. Otherwise, if the client polls in a loop and the response comes immediately, each loop iteration will be short and clients will make excessive requests to the service, increasing network load, parsing load and request counts. A nice-to-have feature is for the service to unblock and respond to the client as soon as a notification appears for delivery. This is sometimes called "long polling".
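The first three considerations above (separate receive and delete, hiding a received notification, and making it visible again after a timeout) can be sketched roughly as follows. This is a minimal in-memory illustration, not how Google Pub/Sub or Amazon SQS are implemented; the class name NotificationQueue, its methods and the visibility_timeout parameter are all hypothetical, and long polling is left out for brevity.

```python
import itertools
import time


class NotificationQueue:
    """In-memory sketch of a poll-based queue with a visibility timeout."""

    def __init__(self, visibility_timeout=30.0, clock=time.monotonic):
        self._visibility_timeout = visibility_timeout
        self._clock = clock  # injectable so the behaviour can be tested
        self._ids = itertools.count(1)
        # notification id -> [payload, time at which it becomes visible again]
        self._items = {}

    def publish(self, payload):
        notification_id = next(self._ids)
        # A new notification is visible immediately.
        self._items[notification_id] = [payload, float("-inf")]
        return notification_id

    def receive(self):
        """Return (id, payload) for the oldest visible notification, or None."""
        now = self._clock()
        for notification_id, item in self._items.items():
            if item[1] <= now:
                # Hide it from other receivers until the timeout expires.
                item[1] = now + self._visibility_timeout
                return notification_id, item[0]
        return None

    def delete(self, notification_id):
        """Called by the client only after it has processed the notification."""
        return self._items.pop(notification_id, None) is not None
```

A client would call receive(), process the payload, and only then call delete(). If it crashes in between, the notification reappears once the timeout has elapsed.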
HTTP Server-sent Events
With HTTP Server-sent Events the client opens HTTP connection and sends a request to the service, then the service can send multiple events (notifications) instead of a single response. The connection is long-living and the service can send events as soon as they are ready.
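For illustration, here is roughly what one event looks like on the wire in the text/event-stream format. The helper name format_sse_event is hypothetical; a real service would write the returned string to the open HTTP response and flush it.

```python
def format_sse_event(data, event_id=None, event_type=None):
    """Serialise one event in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event_type is not None:
        lines.append(f"event: {event_type}")
    # Each line of the payload gets its own "data:" field.
    for line in data.splitlines() or [""]:
        lines.append(f"data: {line}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"
```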
The downside is that the communication is one-way: the client has no way to inform the service whether it successfully processed the event. Because this feedback is absent, it may be difficult for the service to control the rate of events to prevent overwhelming the client.
WebSockets were created to enable arbitrary two-way communication, so this is a viable option for the service to send notifications to the client. The client can also send processing confirmation back to the service.
WebSockets have been around for a while and should be supported by many frameworks and languages. WebSocket connection begins as HTTP 1.1 connection and so WebSockets over HTTPS should be supported by many load balancers and reverse proxies.
WebSockets are often used with browsers and mobile clients and more rarely in service-to-service communication.
gRPC is similar to WebSockets in that it enables arbitrary two-way communication. The advantage of gRPC is that it is centered around protocol and message format definition files. These files are used for code generation, which is essential for client and service developers.
gRPC is used for service-to-service communication plus it is supported for Browser clients with grpc-web.
gRPC is supported on multiple popular programming languages and platforms, yet the support is narrower than for HTTP.
gRPC works on top of HTTP/2 which might cause difficulties with reverse proxies and load balancers around things like TLS termination.
Message queue (PubSub)
Finally, the service and the client can use a message queue as a delivery mechanism for notifications. The service puts notifications on the queue and the client receives them from the queue. A queue can be provided by one of many systems like RabbitMQ, Kafka, Celery, Google PubSub, Amazon SQS, etc. There's a wide choice of queuing systems with different properties and choosing one is a challenge on its own. The queue can also be emulated by using database for example.
It has to be decided between the service and the client who owns the queue, i.e. who pays for it. Either way, the queuing system and the queue should be available whenever the service needs to push notifications to it otherwise notifications will be lost (unless the service buffers them internally, with another queue).
Queues are typically used for service-to-service communication but some technologies also allow Browsers as clients.
It is worth noting that an "implicit" internal queue might be used on the service side in other options listed above. One reason is to prevent loss of notifications when there's no client available to receive them. There are many other good reasons like letting clients handle notifications at their pace, allowing to maximize processing throughput, allowing to handle spiky traffic with fixed capacity.
In this option the queue is used "explicitly" as delivery mechanism, i.e. the service does not put any other mechanism (HTTP, gRPC or WebSocket endpoint) in front of the queue and lets the client receive notifications from the queue directly.
Message passing is popular in organizing microservice communications.
Common considerations
In all options it has to be decided whether the loss of notifications is tolerable for the service, the client and the business. Some simpler technical choices are possible if it is ok to lose notifications due to processing errors, unavailability, etc.
It is valuable to monitor client processing errors from the service side. This way service owners know which clients are broken without having to ask them.
If the queue is used (implicitly or explicitly) it is valuable to monitor the length of the queue and the age of the oldest notifications. It lets service owners judge how stale data may be in the client.
If delivery is organized so that a notification gets deleted only after successful processing by the client, the same notification could get stuck in an infinite receive loop when the client fails to process it. Such a notification is sometimes called a "poison message". Poison messages should be removed by the service or the queuing system to prevent clients from being stuck in an infinite loop. A common practice is to move poison messages to a special place, sometimes called a "dead letter queue", for later human intervention.
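One common way to detect a poison message is to count how many times it has been handed out and park it once a limit is reached. The sketch below assumes that approach; RetryingQueue, max_receives and dead_letters are hypothetical names, and real queuing systems expose this as configuration rather than code.

```python
from collections import deque


class RetryingQueue:
    """Sketch: retry failed notifications, then move them to a dead letter list."""

    def __init__(self, max_receives=3):
        self.max_receives = max_receives
        self.pending = deque()   # entries are [payload, receive_count]
        self.dead_letters = []   # poison messages parked for human inspection

    def publish(self, payload):
        self.pending.append([payload, 0])

    def process_next(self, handler):
        """Hand one notification to handler; return True if it succeeded."""
        if not self.pending:
            return False
        entry = self.pending.popleft()
        entry[1] += 1
        try:
            handler(entry[0])
            return True
        except Exception:
            if entry[1] >= self.max_receives:
                # Give up so that the client is not stuck on this message.
                self.dead_letters.append(entry[0])
            else:
                # Put it back at the end of the queue for a later retry.
                self.pending.append(entry)
            return False
```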
One alternative to WebSockets for the problem of server→client notifications with acks from the client seems to be gRPC.
It supports bidirectional communication between server and client in bidirectional streaming mode.
It works on top of HTTP/2. In our case functioning over HTTP ports is essential.
There are client and server generators for multiple popular languages and platforms. A nice thing is that I can share protocol definition file with vendors and can be sure my service and their clients will talk the same language.
Not as many languages and platforms are supported compared to HTTP. Alternative C from the question will be more accessible if based on HTTP 1.1. WebSockets have also been around longer and I would expect broader adoption than gRPC.
Not all gRPC implementations seem to currently support XML format for data according to FAQ. In order to transport XML my service and its clients will have to transfer XML message as byte arrays inside of gRPC protobuf message.
With gRPC, TLS termination cannot be done on general-purpose HTTP 1.1 load balancer. An application-layer HTTP/2-aware reverse proxy (load balancer) such as Traefik is required.
There are approaches like this and this to allow HTTP 1.1 compatible protocols but they have their own restrictions like limited amount of available clients or necessary client customizations.

When does AkkaHttp backpressure kick in?

.. when the http response entity is not consumed, or the client tcp buffer becomes full, or when the rate of client taking from its tcp buffer is lower than the rate of server pushing data to it?
I am looking for a way to achieve the following:
Let's assume that there is a backpressure-able source of data on the server, such as an Apache Kafka topic.
If I consume this source from a remote location it may be possible that the rate at which that remote location can consume is lower - this is solved if Kafka client or consumer is used.
However let's assume that the client is a browser and that exposing direct Kafka protocol / connectivity is not a possibility.
Further, let's assume that there is a possibility of getting all the value even if jumping over some messages.
For instance in case of compacted topics, getting only the latest values for each key is enough for a client, no need to go through intermediate values.
This would be equivalent to Flowable.onBackpressureLatest() or AkkaStreams.aggregateOnBackpressure or onBackpressureAggregate.
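To make the intended behaviour concrete, here is a minimal sketch of a "latest value per key" buffer sitting between a fast producer and a slow consumer. It is plain Python, not the RxJava or Akka Streams API; LatestPerKeyBuffer, offer and drain are hypothetical names used only for illustration.

```python
class LatestPerKeyBuffer:
    """Keeps only the most recent value per key until the consumer is ready."""

    def __init__(self):
        self._latest = {}

    def offer(self, key, value):
        # Drop any older value for this key; re-inserting moves the key to
        # the end, so drain() yields keys in order of their latest update.
        self._latest.pop(key, None)
        self._latest[key] = value

    def drain(self):
        """Called when the slow consumer signals demand; returns what it missed."""
        items = list(self._latest.items())
        self._latest.clear()
        return items
```

While the consumer is busy, the producer keeps calling offer(). When the consumer asks for more, drain() returns one entry per key, skipping the intermediate values.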
Would it be a way to expose the topic over HTTP REST (e.g. Server Side Events / chunked transfer-encoding) or over web-sockets, that would achieve this effect of skipping over intermediate values for each key?
Please advise, thanks
Akka HTTP supports back pressure based on the TCP protocol very well, and you can read about using it in combination with streaming here.
Kafka consumption and exposure via http with back pressure can be easily achieved in combination of akka-http, akka-stream and alpakka-kafka.
Kafka consumers need to do polling and alpakka covers back pressure with reduction of polling requests.
I don't see the necessity of skipping over the messages when back pressure is fully supported. Kafka will keep track of the offset consumed by a consumer group (the one you pick for your service or http connection) and this will guarantee eventual consumption of all messages. Of course, if you produce messages way faster in a topic, the consumer will never catch up. Let me know if this is your case.
As a final note, you may check out Confluent REST Proxy API, which allows you to read Kafka messages in a restful manner.

What is the difference between DEALER and ROUTER socket archetype in ZeroMQ?

What is the difference between the ROUTER and the DEALER socket archetypes in zmq?
And which should I use, if I have a server, which is receiving messages and a client, which is sending messages? The server will never send a message to a client.
EDIT: I forgot to say that there can be several instances of the client.
For details on ROUTER/DEALER Formal Communication Pattern, do not hesitate to consult the API documentation. There are many features important for ROUTER/DEALER ( XREQ/XREP ) that have nothing beneficial for your indicated use-case.
Many just send, just one just listens?
Given N-clients purely .send() messages to 1-server, which exclusively .recv() messages, but never sends any message back,
the design may benefit from a PUB/SUB Formal Communication Pattern.
In case some other preferences outweigh the trivial approach, one may set up a more complex "wiring", using another one-way type of infrastructure, based on PUSH/PULL, and use a reverse setup PUB/SUB, where each new client, the PUB side, .connect()-s to the SUB-side, given a server-side .bind() access-point is on a known, static IP address and the client self-advertises on this signalling channel that it is alive ( keep-alive with IP-address:port# ), where the server-side ought to initiate a new PUSHtoPULL.connect() setup onto the client-advertised, .bind()-ready PULL-side access point.
Complex? Rather a limitless tool, only our imagination is our limit.
After some time, one realises all the powers of multi-functional SIG/MSG-infrastructure, so do not hesitate to experiment and re-use the elementary archetypes in more complex, mutually-cooperating distributed systems computing.

Did server successfully receive request

I am working on a C# mobile application that requires major interaction with a PHP web server. However, the application also needs to support an "offline mode" as connection will be over a cellular network. This network may drop requests at random times. The problem that I have experienced with previous "Offline Mode" applications is that when a request results in a Timeout, the server may or may not have already processed that request. In cases where sending the request more than once would create a duplicate, this is a problem. I was walking through this and came up with the following idea.
Mobile sets a header value such as UniqueRequestID: 1 to be sent with the request.
Upon receiving the request, the PHP server adds the UniqueRequestID to the current user session $_SESSION['RequestID'][] = $headers['UniqueRequestID'];
Server implements a GetRequestByID that returns true if the id exists for the current session or false if not. Alternatively, this could return the cached result of the request.
This seems to be a somewhat reliable way of seeing if a request successfully contacted the server. In mobile, upon re-connecting to the server, we check if the request was received. If so, skip that pending offline message and go to the next one.
Have I reinvented the wheel here? Is this method prone to failure (or am I going down a rabbit hole)? Is there a better way / alternative?
-I was pitching this to other developers here and we thought that this seemed very simple implying that this "system" would likely already exist somewhere.
-Apologies if my Google skills are failing me today.
As you correctly stated, this problem is not new. There have been multiple attempts to solve it at different levels.
Transport level
The HTTP transport protocol itself does not provide any mechanisms for reliable data transfer. One of the reasons is that HTTP is stateless and doesn't care much about previous requests and responses. There have been attempts by IBM to make a reliable transport protocol called HTTPR, which was based on HTTP, but it never got popular. You can read more about it here.
Messaging level
Most Web Services out there still use HTTP as a transport protocol and the SOAP messaging protocol on top of it. SOAP over HTTP is not sufficient when an application-level messaging protocol must also guarantee some level of reliability and security. This is why the WS-Reliability and WS-ReliableMessaging protocols were introduced. Those protocols allow SOAP messages to be reliably delivered between distributed applications in the presence of software component, system, or network failures. At the same time they provide additional security. You can read more about those protocols here and here.
Your solution
I guess there is nothing wrong with your approach if you need a simple way to ensure that a message has not already been processed. I would recommend using a database instead of the session to store the processing result for each request. If you use $_SESSION['RequestID'][] you will run into trouble if the session is lost (the user is offline for a certain time, the server is restarted or has crashed, etc.). Also, if you use a database instead of the session, you can scale up more easily later on just by adding an extra web server.
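As a rough illustration of the database-backed variant, the sketch below stores the result of each request under its unique ID and replays it when the same ID arrives again. It uses Python with SQLite purely to stay self-contained (the question's server is PHP); RequestLog, handle and the processed_requests table are hypothetical names, and a real server would also scope IDs per user and handle two concurrent requests carrying the same ID.

```python
import sqlite3


class RequestLog:
    """Sketch: remember processed request IDs and their results in a database."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS processed_requests ("
            "request_id TEXT PRIMARY KEY, result TEXT NOT NULL)"
        )

    def handle(self, request_id, process):
        """Run process() at most once per request_id; otherwise replay the result."""
        row = self._db.execute(
            "SELECT result FROM processed_requests WHERE request_id = ?",
            (request_id,),
        ).fetchone()
        if row is not None:
            # Duplicate (e.g. a retry after a timeout): do not process again.
            return row[0]
        result = process()
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO processed_requests (request_id, result) VALUES (?, ?)",
                (request_id, result),
            )
        return result
```

When the mobile client retries a request after a timeout with the same ID, the server returns the stored result instead of creating a duplicate.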

what happens when tcp/udp server is publishing faster than client is consuming?

I am trying to get a handle on what happens when a server publishes (over tcp, udp, etc.) faster than a client can consume the data.
Within a program I understand that if a queue sits between the producer and the consumer, it will start to get larger. If there is no queue, then the producer simply won't be able to produce anything new, until the consumer can consume (I know there may be many more variations).
I am not clear on what happens when data leaves the server (which may be a different process, machine or data center) and is sent to the client. If the client simply can't respond to the incoming data fast enough, assuming the server and the consumer are very loosely coupled, what happens to the in-flight data?
Where can I read to get details on this topic? Do I just have to read the low level details of TCP/UDP?
With TCP there's a TCP Window which is used for flow control. TCP only allows a certain amount of data to remain unacknowledged at a time. If a server is producing data faster than a client is consuming data then the amount of data that is unacknowledged will increase until the TCP window is 'full'. At this point the sending TCP stack will wait and will not send any more data until the client acknowledges some of the data that is pending.
With UDP there's no such flow control system; it's unreliable after all. The UDP stacks on both client and server are allowed to drop datagrams if they feel like it, as are all routers between them. If you send more datagrams than the link can deliver to the client or if the link delivers more datagrams than your client code can receive then some of them will get thrown away. The server and client code will likely never know unless you have built some form of reliable protocol over basic UDP. Though actually you may find that datagrams are NOT thrown away by the network stack and that the NIC drivers simply chew up all available non-paged pool and eventually crash the system (see this blog posting for more details).
Back with TCP, how your server code deals with the TCP Window becoming full depends on whether you are using blocking I/O, non-blocking I/O or async I/O.
If you are using blocking I/O then your send calls will block and your server will slow down; effectively your server is now in lock step with your client. It can't send more data until the client has received the pending data.
If the server is using non blocking I/O then you'll likely get an error return that tells you that the call would have blocked; you can do other things but your server will need to resend the data at a later date...
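The non-blocking case can be observed with a small experiment. The sketch below uses a local socket pair as a stand-in for a real client/server connection (on Unix-like systems socket.socketpair() gives a Unix-domain pair rather than TCP, but the "would block" behaviour on a full buffer is the same idea); fill_send_buffer and demo are hypothetical helper names.

```python
import socket


def fill_send_buffer(sock, chunk=b"x" * 65536):
    """Send on a non-blocking socket until the kernel refuses more data."""
    sent = 0
    while True:
        try:
            sent += sock.send(chunk)
        except BlockingIOError:
            # The buffers are full because the peer is not reading.
            return sent


def demo():
    server, client = socket.socketpair()
    try:
        server.setblocking(False)
        # The "client" reads nothing, so the server eventually cannot send.
        queued = fill_send_buffer(server)
        # The client finally consumes everything that was queued.
        received = 0
        while received < queued:
            received += len(client.recv(65536))
        # Buffer space is free again, so the server can resume sending.
        resumed = server.send(b"more")
        return queued, received, resumed
    finally:
        server.close()
        client.close()
```

Nothing is lost here: the data sent before the error sits in kernel buffers until the peer reads it, and the sender simply has to hold on to whatever it could not hand over yet.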
If you're using async I/O then things may be more complex. With async I/O using I/O Completion Ports on Windows, for example, you won't notice anything different at all. Your overlapped sends will still be accepted just fine but you might notice that they are taking longer to complete. The overlapped sends are being queued on your server machine and are using memory for your overlapped buffers and probably using up 'non-paged pool' as well. If you keep issuing overlapped sends then you run the risk of exhausting non-paged pool memory or using a potentially unbounded amount of memory as I/O buffers. Therefore with async I/O and servers that COULD generate data faster than their clients can consume it you should write your own flow control code that you drive using the completions from your writes. I have written about this problem on my blog here and here and my server framework provides code which deals with it automatically for you.
As far as the data 'in flight' is concerned the TCP stacks in both peers will ensure that the data arrives as expected (i.e. in order and with nothing missing), they'll do this by resending data as and when required.
TCP has a feature called flow control.
As part of the TCP protocol, the client tells the server how much more data can be sent without filling up the buffer. If the buffer fills up, the client tells the server that it can't send more data yet. Once the buffer is emptied out a bit, the client tells the server it can start sending data again. (This also applies to when the client is sending data to the server).
UDP on the other hand is completely different. UDP itself does not do anything like this and will start dropping data if it is coming in faster than the process can handle. It would be up to the application to add logic to the application protocol if it can't lose data (i.e. if it requires a 'reliable' data stream).
If you really want to understand TCP, you pretty much need to read an implementation in conjunction with the RFC; real TCP implementations are not exactly as specified. For example, Linux has a 'memory pressure' concept which protects against running out of the kernel's (rather small) pool of DMA memory, and also prevents one socket running any others out of buffer space.
The server can't be faster than the client for a long time. After it has been faster than the client for a while, the system where it is hosted will block it when it writes on the socket (writes can block on a full buffer just as reads can block on an empty buffer).
With TCP, this cannot happen.
In case of UDP, packets will be lost.
The TCP Wikipedia article shows the TCP header format, which is where the window size and acknowledgment sequence number are kept. The rest of the fields and the description there should give a good overview of how transmission throttling works. RFC 793 specifies the basic operations; pages 41 and 42 detail the flow control.
