Steps to reproduce
The logic of the application assumes that there are a number of data sources on the server, which are handled by groups.
If a client wants to subscribe to a specific data source, it calls:
myhub.Subscribe(dataSourceId);
On the server side, we just add the client to the specific group:
await Groups.Add(Context.ConnectionId, dataSourceId.ToString());
Then all the messages are sent with a huge cursor payload. And the most important part: its size grows with every subscription.
Am I doing something wrong?
Update
Similar: SignalR and large number of groups
Unfortunately this is how cursors work. The cursor contains references to all the topics the connection is subscribed to, and each group is a separate topic. Besides the cursor getting bigger, there is one more limitation to using many groups: the more groups the client is a member of, the bigger the groups token. The groups token is sent back to the server when the client reconnects, and if it gets too big it may exceed the URL size limit, causing reconnect failures.
Related
I have developed an application against Kafka version 0.9.0.1 that cannot afford to lose any messages.
I have a constraint that the messages must be consumed in the correct sequence.
To ensure I do not lose any messages I have implemented retries within my application code and configured my Producer with acks=all.
To enforce exception handling and to Fail Fast I immediately get() on the returned Future from Producer.send(), e.g.
final Future<RecordMetadata> futureRecordMetadata = KAFKA_PRODUCER.send(producerRecord);
futureRecordMetadata.get(); // block until the broker acknowledges (or definitively fails) this record
This approach works fine for guaranteeing the delivery of all messages, however the performance is completely unacceptable.
For example, it takes 34 minutes to send 152,125 messages with acks=all.
When I comment out the futureRecordMetadata.get(), I can send 1,089,125 messages in 7 minutes.
When I change acks=all to acks=1 I can send 815,038 messages in 30 minutes. Why is there such a big difference between acks=all and acks=1?
However by not blocking on the get() I have no way of knowing if the message arrived safely.
I know I can pass a Callback into the send and have Kafka retry for me, however this approach has a drawback that messages may be consumed out of sequence.
I thought the request.required.acks config could save the day for me, however when I set any value for it I receive this warning:
130 [NamedConnector-Monitor] WARN org.apache.kafka.clients.producer.ProducerConfig - The configuration request.required.acks = -1 was supplied but isn't a known config.
Is it possible to asynchronously send Kafka messages, with a guarantee they will ALWAYS arrive safely and in the correct sequence?
UPDATE 001
Is there any way I can consume messages in Kafka message KEY order directly from the TOPIC? Or would I have to consume messages in offset order and then sort them programmatically into Kafka message key order?
If you require a total order, send performance will be poor (in practice, a total-order scenario is very rare).
If per-partition order is acceptable, you can use a multi-threaded producer: one producer/thread for each partition, as in the sketch below.
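A minimal sketch of that idea, assuming the 0.9+ Java producer, a topic named orders, and String values (the topic name and schema are assumptions). acks=all plus max.in.flight.requests.per.connection=1 prevents retries from reordering records within a partition, and the async callback surfaces delivery failures without blocking on get():

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// One producer per partition; run one instance of this per thread.
public class PartitionSender implements Runnable {
    private final int partition;              // the partition this thread owns
    private final Iterable<String> messages;  // message source (assumption)

    PartitionSender(int partition, Iterable<String> messages) {
        this.partition = partition;
        this.messages = messages;
    }

    @Override
    public void run() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");         // assumption
        props.put("acks", "all");                                 // wait for all in-sync replicas
        props.put("retries", "2147483647");                       // retry instead of dropping
        props.put("max.in.flight.requests.per.connection", "1");  // retries cannot reorder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String message : messages) {
                // Async send: the callback fires once the broker acks the record
                // (or gives up after all retries), so nothing blocks on get().
                producer.send(new ProducerRecord<>("orders", partition, null, message),
                        (metadata, exception) -> {
                            if (exception != null) {
                                // Failed after all retries: escalate, don't silently lose it.
                                System.err.println("send failed: " + exception);
                            }
                        });
            }
        }
    }
}

Capping one in-flight request per connection trades throughput for ordering; running one such sender per partition, each on its own thread, claws that throughput back.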
I am building a system that processes orders. Each order follows a workflow, so an order can be, e.g., booked, accepted, payment approved, cancelled, and so on.
Every time the status of an order changes I will post this change to SNS. To know whether an order's status has changed I will need to make a request to an external API and compare the result to the last known status.
The question is: What is the best place to store the last known order status?
1. An SQS queue. Every time I read a message from the queue, I check the status using the external API, delete the message, and insert another one with the new status.
2. Use a database (like DynamoDB) to control the order status.
You should not use the word "store" to describe what a queue does with stateful facts. Stateful, factual information should be stored -- persisted -- to a database.
The queue messages should be treated as "hints" on what work needs to be done -- a request to consider the reasonableness of a proposed action, and if reasonable, perform the action.
What I mean by this, is that when a queue consumer sees a message to create an order, it should check the database and create the order if not already present. Update an order? Check the database to see whether the order is in a correct status for the update to occur. (Canceling an order that has already shipped would be an example of a mismatched state).
Queues, by design, can't be as precise and atomic in their operation as a database can be. The Two Generals Problem is one of several scenarios that become an issue when dealing with queues (and indeed when designing a queue system) -- messages can be lost or delivered more than once.
What happens in a "queue is authoritative" scenario when a message is delivered (received from the queue) more than once? What happens if a message is lost? There's nothing wrong with using a queue, but I respectfully suggest that in this scenario the queue should not be treated as authoritative.
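To make that concrete, here is a minimal sketch of the "check the database, then act" step in Java, assuming a DynamoDB table named Orders keyed by orderId with a status attribute (all names are assumptions). A conditional write applies the transition only if the stored status still permits it, so duplicate or stale queue messages become harmless no-ops:

import java.util.HashMap;
import java.util.Map;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException;
import com.amazonaws.services.dynamodbv2.model.UpdateItemRequest;

public class OrderStatusUpdater {
    private final AmazonDynamoDB dynamo = AmazonDynamoDBClientBuilder.defaultClient();

    /** Apply a status transition only if the stored status still permits it. */
    public boolean tryTransition(String orderId, String fromStatus, String toStatus) {
        Map<String, AttributeValue> key = new HashMap<>();
        key.put("orderId", new AttributeValue(orderId));

        Map<String, AttributeValue> values = new HashMap<>();
        values.put(":new", new AttributeValue(toStatus));
        values.put(":required", new AttributeValue(fromStatus));

        UpdateItemRequest request = new UpdateItemRequest()
            .withTableName("Orders")
            .withKey(key)
            .withUpdateExpression("SET #s = :new")
            .withConditionExpression("#s = :required")        // reject stale/duplicate hints
            .addExpressionAttributeNamesEntry("#s", "status")  // 'status' is a reserved word
            .withExpressionAttributeValues(values);

        try {
            dynamo.updateItem(request);
            return true;   // transition applied
        } catch (ConditionalCheckFailedException e) {
            return false;  // mismatched state: treat the message as a no-op
        }
    }
}

The consumer deletes the queue message in either case; the database, not the queue, decides whether the proposed action was reasonable.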
I would go with the database option instead of SQS:
1) Option SQS:
You will have one application which will change the status
Add the status value into SQS
Another application will then read the message, send the notification, and delete the message
2) Option DynamoDB:
Insert your updated status into DynamoDB
Configure a Lambda function to trigger on updates to that field
The Lambda function will send the notification
The database option looks cleaner. Additionally, you don't have to worry about maintaining a queue, and you can only read one message from the queue at a time unless you implement parallel readers. With a database, you can update multiple rows, each update will trigger the Lambda, and you don't have to worry about it.
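For option 2, a minimal sketch of the Lambda in Java, assuming DynamoDB Streams is enabled with the new image, the aws-lambda-java-events model classes, and a pre-created SNS topic (the ARN and attribute names are assumptions):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
import com.amazonaws.services.sns.AmazonSNS;
import com.amazonaws.services.sns.AmazonSNSClientBuilder;

public class StatusChangeNotifier implements RequestHandler<DynamodbEvent, Void> {
    private static final String TOPIC_ARN = "arn:aws:sns:REGION:ACCOUNT:order-status"; // assumption
    private final AmazonSNS sns = AmazonSNSClientBuilder.defaultClient();

    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        event.getRecords().forEach(record -> {
            // Only updates to existing items represent a status change.
            if ("MODIFY".equals(record.getEventName())) {
                String orderId = record.getDynamodb().getNewImage().get("orderId").getS();
                String status = record.getDynamodb().getNewImage().get("status").getS();
                sns.publish(TOPIC_ARN, "Order " + orderId + " is now " + status);
            }
        });
        return null;
    }
}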
Hope that helps
I'm in the process of building a consumer service for a Kafka topic. Each message contains a url to which my service will make an http request. Each message / url is completely independent from other messages / urls.
The problem I'm worried about is how to handle long-running requests. It's possible for some http requests to take 50+ minutes before a response is returned. During that time, I do not want to hold up any other messages.
What is the best way to parallelize this operation?
I know that Kafka's approach to parallelism is to create partitions. However, from what I've read, it seems that you need to define the number of partitions up front, when I really want an infinite or dynamic number of partitions (ideally each message gets its own partition created on the fly).
As an example, let's say I create 1,000 partitions. If 1,001+ messages are produced to my topic, the first 1,000 requests will be made but every message after that will be queued up until the previous request in that partition finishes.
I've thought about making the http requests asynchronous but then I seem to run into a problem when determining what offset to commit.
For instance, on a single partition I can have a consumer read the first message and make an async request. It provides a callback function which commits that offset to Kafka. While that request is waiting, my consumer reads the next message and makes another async request. If that request finishes before the first it will commit that offset. Now, what happens if the first request fails for some reason or my consumer process dies? If I've already committed a higher offset, it sounds like this means my first message will never get reprocessed, which is not what I want.
I'm clearly missing something when it comes to long-running, asynchronous message processing using Kafka. Has anyone experienced a similar issue or have thoughts on how to best solve this? Thanks in advance for taking the time to read this.
You should look at Apache Storm for the processing portion of your consumer and leave the message storage and retrieval to Kafka. What you've described is a very common use case in Big Data (although the 50+ minute thing is a bit extreme). In short, you'll have a small number of partitions for your topic and let Storm stream processing scale the number of components ("bolts" in Storm-speak) that actually make the http requests. A single spout (the kind of Storm component that reads data from an external source) can read the messages from the Kafka topic and stream them to the processing bolts.
I've posted an open source example of how to write a Storm/Kafka application on github.
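As a rough sketch of that topology's shape (not the linked example), assuming Storm 1.x with the storm-kafka module, a topic named urls, and a hypothetical HttpRequestBolt:

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

public class UrlFetchTopology {

    // Hypothetical bolt: each executor handles one HTTP request at a time.
    public static class HttpRequestBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String url = tuple.getString(0);
            // ... perform the (possibly 50+ minute) HTTP request here ...
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) { }
    }

    public static void main(String[] args) throws Exception {
        // ZooKeeper host, ZK root, and consumer id are all assumptions.
        SpoutConfig spoutConfig = new SpoutConfig(new ZkHosts("localhost:2181"), "urls", "/kafka", "url-fetch");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("urls", new KafkaSpout(spoutConfig), 1);
        // 200 bolt executors: a slow request ties up one executor, not the pipeline.
        builder.setBolt("fetch", new HttpRequestBolt(), 200).shuffleGrouping("urls");

        Config conf = new Config();
        conf.setNumWorkers(4);
        conf.setMessageTimeoutSecs(4 * 3600); // default 30s would fail 50+ minute requests
        StormSubmitter.submitTopology("url-fetch", conf, builder.createTopology());
    }
}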
Some follow-on thoughts to this answer:
1) While I think Storm is the correct platform approach to take, there's no reason you couldn't roll your own by writing a Runnable that performs the http call, and then write some more code to make a single Kafka consumer read messages and process them with multi-threaded instances of your Runnable. The management code required is a bit interesting, but probably easier to write than what it takes to learn Storm from scratch. So you'd scale by adding more instances of the Runnable on more threads.
2) Whether you use Storm or your own multi-threaded solution, you'll still have the problem of how to manage the offset in Kafka. The short answer is that you'll have to do your own complex offset management. Not only will you have to persist the offset of the last message you read from Kafka, but you'll have to persist and manage the list of in-flight messages currently being processed. That way, if your app goes down, you know which messages were being processed, and you can retrieve and re-process them when you start back up. The base Kafka offset persistence doesn't support this more complex need, but it's only there as a convenience for the simpler use cases anyway. You can persist your offset info anywhere you like (ZooKeeper, the file system, or any database).
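Here is a minimal sketch of that bookkeeping, assuming a recent kafka-clients library (2.x or newer), a single assigned partition, and a 50-thread pool (all assumptions). Rather than persisting the in-flight list separately, this variant only ever commits up to the oldest offset still being processed, so a crash replays from the oldest unfinished message onward (at-least-once):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class LongTaskConsumer {
    private static final ConcurrentSkipListSet<Long> inFlight = new ConcurrentSkipListSet<>();

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption
        props.put("group.id", "url-fetchers");
        props.put("enable.auto.commit", "false");          // we commit manually
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        ExecutorService pool = Executors.newFixedThreadPool(50);
        TopicPartition tp = new TopicPartition("urls", 0);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    inFlight.add(record.offset());
                    pool.submit(() -> {
                        try {
                            makeHttpRequest(record.value()); // may take 50+ minutes
                        } finally {
                            inFlight.remove(record.offset());
                        }
                    });
                }
                // Commit only below the oldest in-flight offset, so a crash
                // replays everything from the oldest unfinished message on.
                long safe = inFlight.isEmpty() ? consumer.position(tp) : inFlight.first();
                consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(safe)));
            }
        }
    }

    private static void makeHttpRequest(String url) { /* hypothetical long-running call */ }
}

A production version would also pause() the partition when the pool is saturated; that backpressure is omitted here for brevity.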
I am just getting started with Firebase and had a question regarding the Firebase Event Guarantees listed at the following URL:
Event Guarantees.
One of the guarantees states that writes from a single client will always be written to the server and broadcast out to other users in-order.
Does this guarantee also imply that clients will receive events broadcast by a single client in the order that they were broadcast, or is it possible to receive events out of the order they were broadcast?
For example, if one client adds a node, then adds a child to that node, am I guaranteed that other clients will see those events in the same order?
The only guarantee is that the values will be eventually consistent. Thinking this through, it's the only reasonable answer. Any operation over the internet could be delayed indefinitely by any moving part in the process, thus producing out-of-order events received by the client, regardless of the order they reach the server.
Thus, you are guaranteed that all the clients will see both of the added child nodes eventually, and that they will be consistent across all the clients (eventually).
If you want to guarantee the order of events, you are effectively using a message queue -- which is one adaptation of how you can use Firebase, but not the only one. This is easily achieved using the push() method, which creates chronologically ordered, unique ids.
You can also throw in a timestamp and utilize the orderByChild method to sort records.
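A minimal sketch of both techniques using the Firebase Admin Java SDK, assuming the default FirebaseApp is already initialized, and assuming a messages path with a createdAt field (path and field names are assumptions):

import java.util.HashMap;
import java.util.Map;
import com.google.firebase.database.ChildEventListener;
import com.google.firebase.database.DataSnapshot;
import com.google.firebase.database.DatabaseError;
import com.google.firebase.database.DatabaseReference;
import com.google.firebase.database.FirebaseDatabase;
import com.google.firebase.database.ServerValue;

public class OrderedEvents {
    public static void main(String[] args) {
        DatabaseReference messages = FirebaseDatabase.getInstance().getReference("messages");

        // push() generates a chronologically ordered, unique key, so children
        // sort in creation order by default.
        Map<String, Object> value = new HashMap<>();
        value.put("text", "hello");
        value.put("createdAt", ServerValue.TIMESTAMP); // server-side clock, not the client's
        messages.push().setValueAsync(value);

        // Alternatively, sort explicitly by the timestamp field.
        messages.orderByChild("createdAt").addChildEventListener(new ChildEventListener() {
            @Override public void onChildAdded(DataSnapshot snap, String prev) {
                System.out.println(snap.getKey() + " -> " + snap.getValue());
            }
            @Override public void onChildChanged(DataSnapshot snap, String prev) { }
            @Override public void onChildRemoved(DataSnapshot snap) { }
            @Override public void onChildMoved(DataSnapshot snap, String prev) { }
            @Override public void onCancelled(DatabaseError error) { }
        });
    }
}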
Good day guys! I'm currently working on a system using JMS queues that sends messages over SMPP (using the Logica SMPP library).
My problem is that I need to attach an internal ID (which we manage within our system) to the message sequence number, so that when I receive a response in async mode, the proper action can be taken for that particular message.
The first option I tried to implement was the use of optional parameters, as defined in SMPP 3.4, but I do not receive the optional parameters in the response (I've read that whether the response carries the optional parameters depends on the provider).
A second approach was to keep a mapping in memory for those messages until their responses are received (it saturates the memory, so it is a no-go).
Can anyone else think on a viable solution for correlating an internal system ID of a message to its sequence number within an asynchronous SMPP environment?
Thank you for your time.
You need to keep a map of seq_nr to internal message ID, and delete entries from this map as soon as you get an async response back from the SMSC.
It should not saturate the memory, as it only holds in-flight messages, but you do need to periodically iterate over the map and delete orphaned entries (sometimes you will not get a response back from the SMSC).
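A minimal sketch of such a map, assuming int sequence numbers, String internal IDs, and a 60-second response timeout (all assumptions):

import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PendingMessages {
    private static final long TIMEOUT_MS = 60_000; // assumption: give up after 60s

    private static final class Entry {
        final String internalId;
        final long sentAt;
        Entry(String internalId, long sentAt) { this.internalId = internalId; this.sentAt = sentAt; }
    }

    private final ConcurrentHashMap<Integer, Entry> pending = new ConcurrentHashMap<>();
    private final ScheduledExecutorService sweeper = Executors.newSingleThreadScheduledExecutor();

    public PendingMessages() {
        // Periodically drop entries whose response never arrived.
        sweeper.scheduleAtFixedRate(this::expireOrphans, TIMEOUT_MS, TIMEOUT_MS, TimeUnit.MILLISECONDS);
    }

    /** Call just after submit_sm: remember which internal message owns this sequence number. */
    public void track(int seqNr, String internalId) {
        pending.put(seqNr, new Entry(internalId, System.currentTimeMillis()));
    }

    /** Call from the async response handler: returns the internal ID, or null if expired. */
    public String complete(int seqNr) {
        Entry e = pending.remove(seqNr);
        return e == null ? null : e.internalId;
    }

    private void expireOrphans() {
        long cutoff = System.currentTimeMillis() - TIMEOUT_MS;
        for (Iterator<Map.Entry<Integer, Entry>> it = pending.entrySet().iterator(); it.hasNext(); ) {
            if (it.next().getValue().sentAt < cutoff) {
                it.remove(); // orphaned: no response from the SMSC within the timeout
            }
        }
    }
}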