Using RabbitMQ,
is there a way I can ensure all messages of Type A are consumed before messages of Type B?
E.g., if I have a pool of Product and Order messages,
I want to make sure all Product messages are consumed before Order messages.
An Order belongs to a specific Product, so an Order cannot exist without a Product.
You could do it with two queues and two listeners, but that would be tricky; you would have to hold up the Order listener whenever a Product is missing and wait for it to arrive.
You could do it with a single queue and a single consumer (concurrency of one), as long as the producer always sends the Product before the Order.
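A minimal sketch of the single-queue approach with the RabbitMQ Java client (the queue name and message bodies are made up):

import com.rabbitmq.client.*;
import java.nio.charset.StandardCharsets;

public class SingleQueueOrdering {
    private static final String QUEUE = "product-order-queue"; // hypothetical name

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection connection = factory.newConnection(); // left open for the consumer
        Channel channel = connection.createChannel();
        channel.queueDeclare(QUEUE, true, false, false, null);

        // Producer side: publish the Product strictly before its Order,
        // on one channel, so broker order matches send order.
        channel.basicPublish("", QUEUE, null, "Product:42".getBytes(StandardCharsets.UTF_8));
        channel.basicPublish("", QUEUE, null, "Order:42".getBytes(StandardCharsets.UTF_8));

        // Consumer side: a single consumer with prefetch 1 handles
        // one message at a time, in queue order.
        channel.basicQos(1);
        DeliverCallback onDeliver = (consumerTag, delivery) -> {
            String body = new String(delivery.getBody(), StandardCharsets.UTF_8);
            System.out.println("Processing " + body);
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };
        channel.basicConsume(QUEUE, false, onDeliver, consumerTag -> { });
    }
}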
Related
I am currently exploring Kafka as a beginner, for a simple problem.
There will be one producer pushing messages to one topic, but there will be n consumers (Spark applications) processing the data from Kafka and inserting it into a database (each consumer inserts into a different table).
Is there a possibility that the consumers will go out of sync (for example, part of the consumers goes down for quite some time), so that one or more consumers will not process a message and insert it into their table?
Assume the code is always correct, so no exception will arise when processing the data. It is important that every message is processed only once.
My question is: does Kafka handle this part for us, or do we have to write some other code to make sure this does not happen?
You can group consumers (see the group.id config); grouped consumers split the topic's partitions among themselves. Once a consumer drops out, other consumers from the group will take over the partitions read by the dropped one.
However, there may be some problems: when a consumer reads from a partition it commits the offset back to Kafka, and if a consumer drops after it has processed the received data but before it commits the offset, other consumers will resume from the last committed offset and may process some records again. Fortunately, you can manage the strategy for how offsets are committed (see the consumer settings enable.auto.commit, auto.offset.reset, etc.).
The Kafka and Spark Streaming guide provides some explanation and possible strategies for managing offsets.
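As an illustration, a minimal consumer sketch that turns off auto-commit and commits only after the database insert has succeeded, giving at-least-once processing (the topic name, group id, and insert call are assumptions; poll(Duration) assumes a recent client):

import org.apache.kafka.clients.consumer.*;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "spark-loader");      // hypothetical group
        props.put("enable.auto.commit", "false");   // commit manually
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    insertIntoTable(record.value()); // your database insert
                }
                // Commit only after processing, so a crash causes a re-read
                // (at-least-once) rather than silently skipping records.
                consumer.commitSync();
            }
        }
    }

    private static void insertIntoTable(String value) { /* ... */ }
}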
By design Kafka decouples the producer and the consumer. Consumers will read as fast as they can, and producers can produce as fast as they can.
Consumers can be organized into "consumer groups"; you can set it up so that multiple consumers read from a single group, or so that an individual consumer reads from its own group.
If you have 1 consumer per group, then (depending on your acknowledgement strategy) you should be able to ensure each message is read only once (per consumer).
Otherwise, if you have multiple consumers reading from a single group, the same thing applies, but each message is read once by one of the n consumers.
I have developed an application against Kafka version 0.9.0.1 that cannot afford to lose any messages.
I have a constraint that the messages must be consumed in the correct sequence.
To ensure I do not lose any messages I have implemented retries within my application code and configured my producer with acks=all.
To enforce exception handling and to fail fast, I immediately call get() on the Future returned from Producer.send(), e.g.

final Future<RecordMetadata> futureRecordMetadata = KAFKA_PRODUCER.send(producerRecord);
futureRecordMetadata.get(); // block until the broker acknowledges this record
This approach works fine for guaranteeing the delivery of all messages; however, the performance is completely unacceptable.
For example, it takes 34 minutes to send 152,125 messages with acks=all.
When I comment out the futureRecordMetadata.get(), I can send 1,089,125 messages in 7 minutes.
When I change acks=all to acks=1, I can send 815,038 messages in 30 minutes. Why is there such a big difference between acks=all and acks=1?
However, by not blocking on get() I have no way of knowing whether the message arrived safely.
I know I can pass a Callback into send() and have Kafka retry for me; however, this approach has a drawback: messages may be consumed out of sequence.
I thought the request.required.acks config could save the day for me; however, when I set any value for it I receive this warning:
130 [NamedConnector-Monitor] WARN org.apache.kafka.clients.producer.ProducerConfig - The configuration request.required.acks = -1 was supplied but isn't a known config.
Is it possible to asynchronously send Kafka messages, with a guarantee they will ALWAYS arrive safely and in the correct sequence?
UPDATE 001
Is there any way I can consume messages in Kafka message KEY order directly from the TOPIC?
Or would I have to consume messages in offset order and then sort them programmatically
into Kafka message key order?
If you expect a total order, send performance will be bad (in practice, a total-order scenario is very rare).
If per-partition order is acceptable, you can use multiple producer threads, one producer/thread for each partition, as sketched below.
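One possible middle ground, sketched here under the assumption that per-partition order is enough: keep acks=all and client retries, set max.in.flight.requests.per.connection=1 so a retry cannot overtake a later record, and check delivery through callbacks plus a single flush() per batch instead of a get() per record (the topic name and loop are made up; note that enable.idempotence does not exist in 0.9.0.1):

import org.apache.kafka.clients.producer.*;
import java.util.Properties;

public class OrderedAsyncProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "all");                                // wait for all in-sync replicas
        props.put("retries", "5");                               // let the client retry
        props.put("max.in.flight.requests.per.connection", "1"); // retries cannot reorder
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100_000; i++) {
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("events", "key-" + i, "value-" + i);
                // Asynchronous send: failures surface in the callback
                // instead of blocking every record with get().
                producer.send(record, (metadata, exception) -> {
                    if (exception != null) {
                        exception.printStackTrace(); // escalate: alert, halt, dead-letter, ...
                    }
                });
            }
            producer.flush(); // block once, at the end, for all pending sends
        }
    }
}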
I'm working on a system where clients enter data into a program, and the save action posts a message to ActiveMQ for more time-intensive processing.
We are running into rare occasions where a record is updated by a client twice in a row, and a consumer on that ActiveMQ queue processes the two records at the same time. I'm looking for a way to ensure that messages containing records with the same identity are processed in order and only one at a time. To be clear: if records with IDs 1, 1, and 2 (in that order) are sent to ActiveMQ, the first 1 would process, then 2 (if the first 1 was still in process), and finally the second 1.
Another requirement (due to volume) is that the consumer be multi-threaded, so there may be 16 threads accessing that queue. This would have to be taken into consideration.
So if you have multiple threads reading that queue and you want the solution to stay close to ActiveMQ, you have to think about how you scale with respect to ordering concerns.
If you have multiple consumers, they may operate at different speeds and you can never be sure which consumer goes before the other. The only way is to have a single consumer (you can still achieve high availability by using exclusive consumers).
You can, however, segment the load in other ways. How depends a lot on your application. If you can create, say, 16 "worker" queues (or whatever your max consumer count would be) and distribute the load across these queues while guaranteeing that requests from a single user always go to the same worker queue, message order will be preserved per user.
If you have no good way to divide users into groups, simply take the userID mod MAX_CONSUMER_THREADS as a simple solution.
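A rough sketch of that routing idea with plain JMS and the ActiveMQ client (the broker URL, queue naming, and record ID are assumptions):

import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class WorkerQueueRouter {
    private static final int WORKER_COUNT = 16; // match your max consumer threads

    public static void main(String[] args) throws JMSException {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

            long recordId = 42L; // hypothetical record identity from the save action
            // The same record ID always maps to the same worker queue, so
            // updates to one record are consumed one at a time, in order.
            Queue queue = session.createQueue("worker." + (recordId % WORKER_COUNT));

            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage("update for record " + recordId));
        } finally {
            connection.close();
        }
    }
}

ActiveMQ's message groups (the JMSXGroupID message property) provide similar per-key ordering natively, and may be worth evaluating before building your own routing.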
There may be better ways to deal with this problem in the consumer logic itself, like keeping track of the sequence number and postponing updates that arrive out of order (a scheduled delay can be used for that, as in the sketch below).
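For the postponement idea, ActiveMQ's scheduler (the broker must be started with schedulerSupport enabled) can delay redelivery; a hedged sketch, with the sequence bookkeeping left abstract:

import javax.jms.*;
import org.apache.activemq.ScheduledMessage;

public class PostponeOutOfOrder {
    // Called from the consumer when a message's sequence number
    // is ahead of what we are ready to process.
    static void requeueLater(Session session, Queue queue, String body)
            throws JMSException {
        MessageProducer producer = session.createProducer(queue);
        TextMessage retry = session.createTextMessage(body);
        // Ask the broker to redeliver in 5 seconds.
        retry.setLongProperty(ScheduledMessage.AMQ_SCHEDULED_DELAY, 5_000L);
        producer.send(retry);
        producer.close();
    }
}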
I am building a system that processes orders. Each order follows a workflow, so an order can be, e.g., booked, accepted, payment approved, cancelled, and so on.
Every time the status of an order changes I will post the change to SNS. To know whether an order's status has changed I need to make a request to an external API and compare the result with the last known status.
The question is: what is the best place to store the last known order status?
1. An SQS queue. Every time I read a message from the queue, I check the status using the external API, delete the message, and insert another one with the new status.
2. A database (like DynamoDB) to keep track of the order status.
You should not use the word "store" to describe something happening with stateful facts and a queue. Stateful, factual information should be stored -- persisted -- to a database.
The queue messages should be treated as "hints" about what work needs to be done -- a request to consider the reasonableness of a proposed action and, if reasonable, perform the action.
What I mean by this is that when a queue consumer sees a message to create an order, it should check the database and create the order only if it is not already present. Update an order? Check the database to see whether the order is in a correct status for the update to occur. (Cancelling an order that has already shipped would be an example of a mismatched state.)
Queues, by design, can't be as precise and atomic in their operation as a database can. The Two Generals Problem is one of several scenarios that become an issue in dealing with queues (and indeed in designing a queue system) -- messages can be lost or delivered more than once.
What happens in a "queue is authoritative" scenario when a message is delivered (received from the queue) more than once? What happens if a message is lost? There's nothing wrong with using a queue, but I respectfully suggest that in this scenario the queue should not be treated as authoritative.
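One way to make that "check the database first" step safe even under duplicate delivery is a conditional write; a sketch using the AWS SDK for Java v2, where the table and attribute names are made up:

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;
import java.util.Map;

public class OrderStatusUpdater {
    private final DynamoDbClient dynamo = DynamoDbClient.create();

    // Move an order to newStatus only if it is currently in expectedStatus.
    // A duplicate or stale message fails the condition instead of
    // clobbering state.
    public boolean transition(String orderId, String expectedStatus, String newStatus) {
        try {
            dynamo.updateItem(UpdateItemRequest.builder()
                    .tableName("orders") // hypothetical table
                    .key(Map.of("orderId", AttributeValue.builder().s(orderId).build()))
                    .updateExpression("SET #s = :new")
                    .conditionExpression("#s = :expected")
                    .expressionAttributeNames(Map.of("#s", "status"))
                    .expressionAttributeValues(Map.of(
                            ":new", AttributeValue.builder().s(newStatus).build(),
                            ":expected", AttributeValue.builder().s(expectedStatus).build()))
                    .build());
            return true;
        } catch (ConditionalCheckFailedException e) {
            return false; // mismatched state: ignore or route for review
        }
    }
}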
I would go with the database option instead of SQS:
1) Option SQS:
You will have one application which changes the status
Add the status value to SQS
Another application will check your messages, send the notification, and delete the message
2) Option DynamoDB:
Insert your updated status into DynamoDB
Configure a Lambda function to trigger on updates of that field
The Lambda function will send the notification
The database option looks cleaner. Additionally, you don't have to worry about maintaining a queue, and you can only read one message from the queue at a time unless you implement parallel readers. With a database you can update multiple rows, the updates will trigger the Lambda, and you don't have to worry about it.
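For the DynamoDB option, the stream-triggered Lambda might look roughly like this sketch (the topic ARN, table layout, and attribute name are assumptions):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
import com.amazonaws.services.sns.AmazonSNS;
import com.amazonaws.services.sns.AmazonSNSClientBuilder;

public class StatusChangeNotifier implements RequestHandler<DynamodbEvent, Void> {
    private static final String TOPIC_ARN =
            "arn:aws:sns:us-east-1:123456789012:order-status"; // hypothetical

    private final AmazonSNS sns = AmazonSNSClientBuilder.defaultClient();

    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        event.getRecords().forEach(record -> {
            if ("MODIFY".equals(record.getEventName())) {
                String newStatus = record.getDynamodb()
                        .getNewImage().get("status").getS();
                sns.publish(TOPIC_ARN, "Order status changed to " + newStatus);
            }
        });
        return null;
    }
}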
Hope that helps
I am just getting started with Firebase and had a question regarding the Firebase Event Guarantees listed at the following URL:
Event Guarantees.
One of the guarantees states that writes from a single client will always be written to the server and broadcast out to other users in-order.
Does this guarantee also imply that clients will receive events broadcast by a single client in the order that they were broadcast, or is it possible to receive events out of the order they were broadcast?
For example, if one client adds a node, then adds a child to that node, am I guaranteed that other clients will see those events in the same order?
The only guarantee is that the values will be eventually consistent. Thinking this through, it's the only reasonable answer. Any operation over the internet could be delayed indefinitely by any moving part in the process, thus producing out-of-order events received by the client, regardless of the order they reach the server.
Thus, you are guaranteed that all the clients will see both of the added child nodes eventually, and that they will be consistent across all the clients (eventually).
If you want to guarantee the order of events, what you are describing is a message queue -- which is one adaptation of how you can use Firebase, but not the only one. This is easily achieved using the push() method, which creates chronologically ordered, unique IDs.
You can also throw in a timestamp and use the orderByChild method to sort records.
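For instance, with the Firebase Admin SDK for Java (assuming FirebaseApp initialization is done elsewhere and the path name is made up), pushing with a server timestamp and reading back in timestamp order looks roughly like this:

import com.google.firebase.database.*;
import java.util.Map;

public class OrderedEvents {
    public static void main(String[] args) {
        // Assumes FirebaseApp.initializeApp(...) has already run.
        DatabaseReference events = FirebaseDatabase.getInstance()
                .getReference("events"); // hypothetical path

        // push() generates chronologically ordered, unique child keys.
        events.push().setValueAsync(Map.of(
                "payload", "node added",
                "timestamp", ServerValue.TIMESTAMP)); // server-side clock

        // Read back sorted by the stored timestamp.
        events.orderByChild("timestamp")
                .addListenerForSingleValueEvent(new ValueEventListener() {
                    @Override
                    public void onDataChange(DataSnapshot snapshot) {
                        snapshot.getChildren()
                                .forEach(child -> System.out.println(child.getValue()));
                    }

                    @Override
                    public void onCancelled(DatabaseError error) {
                        System.err.println(error.getMessage());
                    }
                });
    }
}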