Writing more events to a channel leads to a channel full exception - flume-ng

I am using the Flume JMS source to dequeue messages from ActiveMQ and convert each message into a List<Event> using a custom converter.
Channel configuration
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1000000
agent.channels.c1.transactionCapacity = 1500
When the size of the List<Event> is less than or equal to 1500 (the channel transaction capacity), Flume writes the events to the channel, but if the number of events is greater than 1500 then I get the exception below.
Error log
21 Apr 2015 12:19:28,245 WARN [PollableSourceRunner-JMSSource-s1] (org.apache.flume.source.jms.JMSSource.doProcess:263) - Error appending event to channel. Channel might be full. Consider increasing the channel capacity or make sure the sinks perform faster.
org.apache.flume.ChannelException: Unable to put batch on required channel: org.apache.flume.channel.MemoryChannel{name: c1}
at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:200)
at org.apache.flume.source.jms.JMSSource.doProcess(JMSSource.java:257)
at org.apache.flume.source.AbstractPollableSource.process(AbstractPollableSource.java:54)
at org.apache.flume.source.PollableSourceRunner$PollingRunner.run(PollableSourceRunner.java:139)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.flume.ChannelException: Put queue for MemoryTransaction of capacity 1500 full, consider committing more frequently, increasing capacity or increasing thread count
at org.apache.flume.channel.MemoryChannel$MemoryTransaction.doPut(MemoryChannel.java:84)
at org.apache.flume.channel.BasicTransactionSemantics.put(BasicTransactionSemantics.java:93)
at org.apache.flume.channel.BasicChannelSemantics.put(BasicChannelSemantics.java:80)
at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:189)
... 4 more
How can I solve this problem?
Note: the number of events varies dynamically based on the ActiveMQ message.
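For reference, the failure is in the memory channel's transaction, not its overall capacity: a single put of a batch larger than transactionCapacity can never succeed, however empty the channel is (the stack trace says as much: "Put queue for MemoryTransaction of capacity 1500 full"). Below is a sketch of the configuration direction, reusing the agent/source/channel names from the question (agent, s1, c1); the numbers are placeholders to size against the largest batch the converter can produce, not recommended values.
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1000000
# must be at least as large as the biggest List<Event> a single put (or take) will contain
agent.channels.c1.transactionCapacity = 10000

# optionally cap how many JMS messages the source consumes per transaction;
# this only helps if one message cannot by itself expand past the transaction capacity
agent.sources.s1.batchSize = 10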

Related

Spring kafka - multiple consumers receiving same message

I am using Spring Kafka to consume messages from Kafka. The consumer listener is as below.
@KafkaListener(topics = "topicName",
        groupId = "groupId",
        containerFactory = "kafkaListenerFactory")
public void onMessage(ConsumerRecord record) {
    logger.info("Received Message from kafka topic " + record.topic() + " with record key " + record.key() + " partition " + record.partition() + " offset " + record.offset());
}
Single instance of application with ConcurrentKafkaListenerContainerFactory concurrency=6.
The topic has 6 partitions.
Time: 5/27/22 6:28:52.864 PM
message: Received Message from kafka topic payment-topic with record key ti9:a1956769-28d2-4329-a0ff-9003003a3cde partition 4 offset 325
thread: org.springframework.kafka.KafkaListenerEndpointContainer#0-4-C-1
threadId: 69
Time: 5/27/22 6:28:52.864 PM
message: Received Message from kafka topic payment-topic with record key ti9:a1956769-28d2-4329-a0ff-9003003a3cde partition 4 offset 325
thread: org.springframework.kafka.KafkaListenerEndpointContainer#0-3-C-1
threadId: 66
From the above logs, it is clear that 2 consumers received the same message from the same partition and offset, at exactly the same time.
Each thread continues processing the message. In the end, one of the consumers fails with the error below:
Time: 5/27/22 6:28:52.887 PM
message: [Consumer clientId=consumer-payment-consumer-5, groupId=payment-consumer] Offset commit failed on partition payment-topic-4 at offset 326: The coordinator is not aware of this member.
thread: org.springframework.kafka.KafkaListenerEndpointContainer#0-4-C-1
threadId: 69
Time: 5/27/22 6:28:53.902 PM
message: Error handler threw an exception
thread: org.springframework.kafka.KafkaListenerEndpointContainer#0-4-C-1
threadId: 69
threadPriority: 5
thrown: org.springframework.kafka.KafkaException: Seek to current after exception; nested exception is org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
I understand the above error can occur under load or when processing a message takes a long time. In this case processing takes less than a second and there are fewer than 10 messages in the Kafka topic.
Please advise on why multiple consumers are receiving the same message.
Also, the error log says "Offset commit failed on partition payment-topic-4 at offset 326" for the message at offset 325.
Library versions
Spring boot - 2.5.7
org.springframework.kafka.spring-kafka - 2.7.8
org.apache.kafka.kafka-clients - 2.8.1
The processing time for the records returned by a poll has to be less than max.poll.interval.ms; otherwise a rebalance happens, and it is likely that the offset of the record currently being processed has not been committed, so the newly assigned consumer fetches again from the previously committed offset for that partition.
https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html#consumerconfigs_max.poll.interval.ms

Spring kafka MessageListener and max.poll.records

I am using Spring Kafka 2.7.8 to consume messages from Kafka.
The consumer listener is as below:
@KafkaListener(topics = "topicName",
        groupId = "groupId",
        containerFactory = "kafkaListenerFactory")
public void onMessage(ConsumerRecord record) {
}
The above onMessage method receives a single message at a time.
Does this mean max.poll.records is set to 1 by the Spring library, or does it poll 500 records at a time (the default value) and the method receives them one by one?
The reason for this question is that we often see the errors below together in prod.
We received all 4 errors below for multiple consumers in under a minute.
I am trying to understand whether it is due to an intermittent Kafka broker connectivity issue or due to load. Please advise.
Seek to current after exception; nested exception is org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
seek to current after exception; nested exception is org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before successfully committing offsets {topic-9=OffsetAndMetadata{offset=2729058, leaderEpoch=null, metadata=''}}
Consumer clientId=consumer-groupName-5, groupId=consumer] Offset commit failed on partition topic-33 at offset 2729191: The coordinator is not aware of this member.
Seek to current after exception; nested exception is org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records
max.poll.records is not changed by Spring; it will take the default (or whatever you set it to). The records are handed to the listener one at a time before the next poll.
This means that your listener must be able to process max.poll.records within max.poll.interval.ms.
You need to reduce max.poll.records and/or increase max.poll.interval.ms so that you can process the records in that time, with a generous margin, to avoid these rebalances.
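As a concrete illustration of that tuning, here is a sketch of a consumer factory that lowers max.poll.records and raises max.poll.interval.ms. The bean names, bootstrap address, and the specific values are assumptions to adapt, not a prescription.
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

@Configuration
public class KafkaConsumerConfig {

    @Bean
    public ConsumerFactory<String, String> consumerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "groupId");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // fewer records per poll means less work to finish before the next poll() ...
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);
        // ... and a larger interval gives the listener more headroom before a rebalance is triggered
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600_000);
        return new DefaultKafkaConsumerFactory<>(props);
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        factory.setConcurrency(6); // one consumer thread per partition, as in the question
        return factory;
    }
}
Whatever values are chosen, the invariant from the answer is what matters: max.poll.records multiplied by the worst-case per-record processing time must stay comfortably under max.poll.interval.ms.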

SCS kafka consumer attempts to acquire info from a partition that is no longer assigned to it

spring-cloud-stream-binder-kafka 3.0.9-RELEASE
spring-boot 2.2.13.RELEASE
Hi, we have a project using Spring Cloud Stream with Kafka, and we are having a problem reconnecting the consumers when the broker nodes are down for a period of time.
The problem is that the consumer is not able to reconnect and acquire the partitions, because it is trying to check the offset position of a partition that is no longer assigned to it. How can this happen?
The logs are shown below:
2021-06-09T09:39:25.358Z [mecstkac-45-6gvd4] [WARN] [KafkaConsumerDestination{consumerDestinationName='topicName1', partitions=0, dlqName='null'}.container-0-C-1] [messageKey=] [Consumer clientId=clientid-0, groupId=groupid-v1] Connection to node 2147483644 (hostnode/10.71.34.4:9092) could not be established. Broker may not be available.
2021-06-09T09:42:30.217Z [mecstkac-45-6gvd4] [ERROR] [KafkaConsumerDestination{consumerDestinationName='topicName1', partitions=0, dlqName='null'}.container-0-C-1] [messageKey=] [Consumer clientId=clientid-0, groupId=groupid-v1] User provided listener org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer$ListenerConsumerRebalanceListener failed on invocation of onPartitionsAssigned for partitions [topicName1-1]
org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before the position for partition topicName1-1 could be determined
2021-06-09T09:42:30.217Z [mecstkac-45-6gvd4] [ERROR] [KafkaConsumerDestination{consumerDestinationName='topicName1', partitions=0, dlqName='null'}.container-0-C-1] [messageKey=] Error while processing: null
org.apache.kafka.common.KafkaException: User rebalance callback throws an error
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:403)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before the position for partition topicName1-1 could be determined
2021-06-09T09:43:03.924Z [mecstkac-45-6gvd4] [ERROR] [KafkaConsumerDestination{consumerDestinationName='topicName1', partitions=0, dlqName='null'}.container-0-C-1] [messageKey=] Error while processing: null
java.lang.IllegalStateException: You can only check the position for partitions assigned to this consumer.
at org.apache.kafka.clients.consumer.KafkaConsumer.position(KafkaConsumer.java:1717)
Is it possible that the kafka binder stores information about the previously assigned partition and tries to connect to it even though a rebalance has already been performed and it is now assigned to another consumer?
NOTE: The configuration of the consumer Assignor is the default (RangeAssignor).

How to set maxWebsocketFrameSize

I am getting this error:
2020-01-20 21:15:29,599 WARN [io.net.cha.DefaultChannelPipeline] (vert.x-eventloop-thread-0) An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.: io.netty.handler.codec.http.websocketx.CorruptedWebSocketFrameException: Max frame length of 65536 has been exceeded.
at io.netty.handler.codec.http.websocketx.WebSocket08FrameDecoder.protocolViolation(WebSocket08FrameDecoder.java:426)
at io.netty.handler.codec.http.websocketx.WebSocket08FrameDecoder.decode(WebSocket08FrameDecoder.java:286)
It is very hard to tell what is implementing the websocket server and how to modify its configuration.
Vert.x core has HttpServerOptions with maxWebsocketFrameSize, which might be the right thing to increase, or perhaps maxWebsocketMessageSize, which, when increased, might also increase the frame size?
I cannot find any way to clear up this exception when the client sends a big message to the server over the websocket channel (it is a text message).
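For what it's worth, if the websocket server really is a plain Vert.x core HttpServer, both options mentioned above can be raised when the server is created. This is only a sketch under that assumption, using the Vert.x 3.x method names; in Vert.x 4 the setters are spelled setMaxWebSocketFrameSize / setMaxWebSocketMessageSize and the handler is webSocketHandler, and the sizes and port below are arbitrary placeholders.
import io.vertx.core.Vertx;
import io.vertx.core.http.HttpServerOptions;

public class BigFrameWebSocketServer {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();

        HttpServerOptions options = new HttpServerOptions()
                // default frame size is 65536 bytes, matching "Max frame length of 65536" in the error
                .setMaxWebsocketFrameSize(1024 * 1024)
                // the aggregated message limit should be at least as large as the frame size
                .setMaxWebsocketMessageSize(4 * 1024 * 1024);

        vertx.createHttpServer(options)
                .websocketHandler(ws -> ws.textMessageHandler(msg ->
                        System.out.println("received " + msg.length() + " chars")))
                .listen(8080);
    }
}
If the websocket endpoint is provided by a framework layered on top of Vert.x rather than by your own HttpServer, the equivalent limit usually has to be set through that framework's configuration instead.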

DataFlow not acking PubSub messages

Simple gcloud dataflow pipeline:
PubsubIO.readStrings().fromSubscription -> Window -> ParDo -> DatastoreIO.v1().write()
When load is applied to the pubsub topic, the messages are read but not acked:
Jul 25, 2017 4:20:38 PM org.apache.beam.sdk.io.gcp.pubsub.PubsubUnboundedSource$PubsubReader stats
INFO: Pubsub projects/my-project/subscriptions/my-subscription has 1000 received messages, 950 current unread messages, 843346 current unread bytes, 970 current in-flight msgs, 28367ms oldest in-flight, 1 current in-flight checkpoints, 2 max in-flight checkpoints, 770B/s recent read, 1000 recent received, 0 recent extended, 0 recent late extended, 50 recent ACKed, 990 recent NACKed, 0 recent expired, 898ms recent message timestamp skew, 9224873061464212ms recent watermark skew, 0 recent late messages, 2017-07-25T23:16:49.437Z last reported watermark
What pipeline step should ack the messages?
The Stackdriver dashboard shows that there are some acks, but the number of unacked messages stays stable.
There are no error messages in the trace indicating that message processing failed.
Entries show up in Datastore.
Dataflow will only acknowledge PubSub messages after they are durably committed somewhere else. In a pipeline that consists of PubSub -> ParDo -> 1 or more sinks, this may be delayed by any of the sinks having problems (even if they are being retried, that will slow things down). This is part of ensuring that results seem to be processed effectively-once. See a previous question about when Dataflow acknowledges a message for more details.
One (easy) option to change this behavior is to add a GroupByKey (using a randomly generated key) after the PubSub source and before the sinks. This will cause the messages to be acknowledged earlier, but may perform worse, since PubSub is generally better at holding the unprocessed inputs than the GroupByKey.
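A sketch of that suggestion, assuming Beam's Java SDK and string payloads; the subscription name, window length, and key range are placeholders, and the existing ParDo and Datastore write from the question are left as a trailing comment rather than reinvented here.
import java.util.concurrent.ThreadLocalRandom;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.joda.time.Duration;

public class EarlyAckPipeline {
    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);

        p.apply("ReadPubsub", PubsubIO.readStrings()
                .fromSubscription("projects/my-project/subscriptions/my-subscription"))
         .apply("Window", Window.into(FixedWindows.of(Duration.standardSeconds(30))))
         // random keys spread elements across workers; the GroupByKey forces them to be
         // durably checkpointed, after which the PubSub messages can be acknowledged
         .apply("AddRandomKey", WithKeys.of(
                 (SerializableFunction<String, Integer>) s -> ThreadLocalRandom.current().nextInt(100))
                 .withKeyType(TypeDescriptors.integers()))
         .apply("Checkpoint", GroupByKey.create())
         .apply("DropKeys", Values.create())
         .apply("Ungroup", Flatten.iterables());
         // ...the existing ParDo and DatastoreIO.v1().write() from the question continue from here

        p.run();
    }
}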
