The invoke method of a sink doesn't seem to offer any way to do async I/O, e.g. by returning a Future?
For example, the Redis connector uses the Jedis library to execute Redis commands synchronously:
https://github.com/apache/bahir-flink/blob/master/flink-connector-redis/src/main/java/org/apache/flink/streaming/connectors/redis/RedisSink.java
So it blocks Flink's task thread while waiting for the network response from the Redis server on every command?! Can other operators run in the same thread as the sink? If so, wouldn't it block them too?
I know Flink has an async I/O API, but it doesn't seem to be usable from a sink implementation:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/asyncio.html
As @Dexter mentioned, you can use RichAsyncFunction. Here is some sample code (it may need further updates to make it work ;)
AsyncDataStream.orderedWait(ds, new RichAsyncFunction<Tuple2<String, MyEvent>, String>() {
    private transient RedisClient client;
    private transient RedisAsyncCommands<String, String> commands;
    private transient ExecutorService executor;

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        client = RedisClient.create("redis://localhost");
        commands = client.connect().async();
        // thread pool for any blocking work; not used in this snippet
        executor = Executors.newFixedThreadPool(10);
    }

    @Override
    public void close() throws Exception {
        // shut down the connection and thread pool
        client.shutdown();
        executor.shutdown();
        super.close();
    }

    @Override
    public void asyncInvoke(Tuple2<String, MyEvent> input, final AsyncCollector<String> collector) throws Exception {
        // e.g. get something from Redis asynchronously
        final RedisFuture<String> future = commands.get("key");
        future.thenAccept(new Consumer<String>() {
            @Override
            public void accept(String value) {
                // use the value handed to the callback instead of blocking on future.get()
                collector.collect(Collections.singletonList(value));
            }
        });
    }
}, 1000, TimeUnit.MILLISECONDS);
I am using spring-kafka 2.2.8 and trying to understand if there is an option to deploy a Kafka consumer in paused mode until I signal it to start consuming messages. Please suggest.
I see in the post below that we can pause and start the consumer, but I need the consumer to be in paused mode when it's deployed:
how to pause and resume @KafkaListener using spring-kafka
@KafkaListener(id = "foo", ..., autoStartup = "false")
Then start it using the KafkaListenerEndpointRegistry when you are ready
registry.getListenerContainer("foo").start();
There is not much point in starting it in paused mode, but you can do that...
@SpringBootApplication
public class So62329274Application {

    public static void main(String[] args) {
        SpringApplication.run(So62329274Application.class, args);
    }

    @KafkaListener(id = "so62329274", topics = "so62329274", autoStartup = "false")
    public void listen(String in) {
        System.out.println(in);
    }

    @Bean
    public NewTopic topic() {
        return TopicBuilder.name("so62329274").partitions(1).replicas(1).build();
    }

    @Bean
    public ApplicationRunner runner(KafkaListenerEndpointRegistry registry, KafkaTemplate<String, String> template) {
        return args -> {
            template.send("so62329274", "foo");
            registry.getListenerContainer("so62329274").pause();
            registry.getListenerContainer("so62329274").start();
            System.in.read();
            registry.getListenerContainer("so62329274").resume();
        };
    }

}
You will see a log message like this when the partitions are assigned:
Paused consumer resumed by Kafka due to rebalance; consumer paused again, so the initial poll() will never return any records
We are using spring-kafka 1.2.2.RELEASE.
What we want
1. As soon as a message is consumed and processed successfully, the offset is committed in spring-kafka. I am using manual commit/acknowledgement for this, and it is working fine.
2. In case of any exception, we want spring-kafka to resend the same message. We throw a RuntimeException on any system error, which is logged by spring-kafka and never committed.
This is fine, since we don't want it committed, but that message stays in spring-kafka and never comes back unless we restart the service. On restart the message comes back and executes once again, then stays in spring-kafka again.
What we tried
1. I have tried both ErrorHandler and RetryingMessageListenerAdapter, but in both cases we have to code in the service how to process the message again.
This is my consumer:
public class MyConsumer {

    @KafkaListener
    public void receive(...) {
        // application logic to return success/failure
        if (success) {
            acknowledgement.acknowledge();
        } else {
            throw new RuntimeException();
        }
    }

}
Also I have the following configuration for the container factory:
factory.getContainerProperties().setErrorHandler(new ErrorHandler() {

    @Override
    public void handle(...) {
        throw new RuntimeException("");
    }

});
While executing the flow, control first comes into the receive method and then into the handle method. After that the service waits for a new message. However, I was expecting that, since we threw an exception and the message was not committed, the same message would land in the receive method again.
Is there any way we can tell Spring Kafka "do not commit this message and send it again asap"?
1.2.x is no longer supported; 1.x users are recommended to upgrade to at least 1.3.x (currently 1.3.8) because of its much simpler threading model, thanks to KIP-62.
The current version is 2.2.2.
2.0.1 introduced the SeekToCurrentErrorHandler which re-seeks the failed record so that it is redelivered.
With earlier versions, you had to stop and restart the container to redeliver a failed message, or add retry to the listener adapter.
I suggest you upgrade to the newest possible release.
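If you do upgrade, here is a minimal sketch of wiring in the SeekToCurrentErrorHandler, reusing the factory.getContainerProperties() style from your own configuration (this assumes spring-kafka 2.0.1 or later is on the classpath):

// a failed record is re-sought on error, so the listener receives it again on the next poll
factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());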
Unfortunately, the version available for us to use is 1.3.7.RELEASE.
I have tried implementing the ConsumerSeekAware interface. Below is how I am doing it, and I can see the message being delivered repeatedly.
Consumer
public class MyConsumer implements ConsumerSeekAware {

    private ConsumerSeekCallback consumerSeekCallback;

    @KafkaListener
    public void receive(...) {
        if (condition) {
            acknowledgement.acknowledge();
        } else {
            // kafka_offset is a Long; the seek offset parameter is a long,
            // so cast to long (the original (int) cast would fail at runtime)
            consumerSeekCallback.seek((String) headers.get("kafka_receivedTopic"),
                    (int) headers.get("kafka_receivedPartitionId"),
                    (long) headers.get("kafka_offset"));
        }
    }

    @Override
    public void registerSeekCallback(ConsumerSeekCallback consumerSeekCallback) {
        this.consumerSeekCallback = consumerSeekCallback;
    }

    @Override
    public void onIdleContainer(Map<TopicPartition, Long> arg0, ConsumerSeekCallback arg1) {
        LOGGER.debug("onIdleContainer called");
    }

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> arg0, ConsumerSeekCallback arg1) {
        LOGGER.debug("onPartitionsAssigned called");
    }

}
Config
public class MyConsumerConfig {

    @Bean
    public Map<String, Object> consumerConfigs() {
        Map<String, Object> props = new HashMap<>();
        // Set server, deserializer, group id
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        return props;
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, MyModel> kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, MyModel> factory = new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(new DefaultKafkaConsumerFactory<>(consumerConfigs()));
        factory.getContainerProperties().setAckMode(AckMode.MANUAL);
        return factory;
    }

    @Bean
    public MyConsumer receiver() {
        return new MyConsumer();
    }

}
Here is my setup:
ConsumerSeekAware implementation:
public class ReplayJobKafkaConsumer implements ConsumerSeekAware, AcknowledgingMessageListener<String, String> {

    private static final ThreadLocal<ConsumerSeekCallback> seekCallBack = new ThreadLocal<>();
    private static ConsumerSeekCallback consumerSeekCallback;

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> map, ConsumerSeekCallback consumerSeekCallback) {
    }

    @Override
    public void onIdleContainer(Map<TopicPartition, Long> map, ConsumerSeekCallback consumerSeekCallback) {
    }

    @Override
    public void registerSeekCallback(ConsumerSeekCallback callback) {
        seekCallBack.set(callback);
        consumerSeekCallback = callback;
    }

    public void onMessage(final ConsumerRecord<String, String> data, final Acknowledgment acknowledgment) {
    }

    public static ThreadLocal<ConsumerSeekCallback> getSeekCallback() {
        return seekCallBack;
    }

    public static ConsumerSeekCallback getAnotherSeekCallback() {
        return consumerSeekCallback;
    }

}
My Spring Boot application approximates to:
@SpringBootApplication
public class ReplayJobApplication {
    ...
    public void run(final String... args) {
        context = SpringApplication.run(ReplayJobApplication.class, args);
        ReplayJobKafkaConsumer.getAnotherSeekCallback().seek("top", 0, 23);
    }
    ...
}
The above setup works. Now I can run this application using
java -jar -Dstart.offset=0....
But it only works if the seek callback variable is not a ThreadLocal. I need it to be accessible from the Spring Boot application, as that is how I intend to run this consumer. TEMP-TOPIC's other consumers can still be processing, but I intend to run this consumer on an as-needed basis with a start and end offset. While the command-line parameters can be read in the consumer, the concerns I have are:
the callback variable is static (I cannot possibly create an instance of ReplayJobKafkaConsumer)
it is a plain variable and not a ThreadLocal
Though the lifetime of this container is only going to be from start to end, I wonder if this setup is flawed, and I need some confirmation that this implementation is OK.
You appear to have a fundamental misunderstanding of what's going on.
The ThreadLocal is needed because the Kafka consumer object is not thread-safe. If you store the callback in a ThreadLocal, you can perform arbitrary seek operations at runtime, either from the onMessage method or by listening for a ListenerContainerIdleEvent when there are no messages.
You can't perform arbitrary seeks such as ReplayJobKafkaConsumer.getAnotherSeekCallback().seek("top", 0, 23); from another thread.
You can't perform arbitrary seeks before partitions have been assigned.
So, as I have been telling you in other answers/comments, you must do the seek when the partition(s) are assigned.
@Override
public void onPartitionsAssigned(Map<TopicPartition, Long> map, ConsumerSeekCallback consumerSeekCallback) {
    // Do the seeks here using the `consumerSeekCallback` parameter.
}
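For example, here is a sketch of seeking every assigned partition to a starting offset, assuming the offset is supplied via the start.offset system property from your command line (the property name is just the one from your own example):

@Override
public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
    // assumption: -Dstart.offset=... is how the desired starting offset is passed in
    long startOffset = Long.parseLong(System.getProperty("start.offset", "0"));
    for (TopicPartition tp : assignments.keySet()) {
        callback.seek(tp.topic(), tp.partition(), startOffset);
    }
}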
With modern versions of spring-kafka, you don't need to use ConsumerSeekAware unless you want to perform arbitrary seeks at runtime (after the initial seek). You can use a ConsumerAwareRebalanceListener instead.
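A sketch of that alternative, assuming a container factory bean named factory; the listener is handed the consumer itself, so it can seek directly when partitions are assigned:

factory.getContainerProperties().setConsumerRebalanceListener(new ConsumerAwareRebalanceListener() {
    @Override
    public void onPartitionsAssigned(Consumer<?, ?> consumer, Collection<TopicPartition> partitions) {
        // seek each newly assigned partition; 23 is just the offset from the example above
        partitions.forEach(tp -> consumer.seek(tp, 23L));
    }
});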
My Kafka listener should process messages in sequential order; the onMessage method should process messages synchronously, and I don't want my listener to process multiple messages at the same time. The onMessage method first stops the
org.springframework.kafka.listener.MessageListenerContainer
then delegates the payload to a synchronized method; after processing completes, it starts the listener again. Other options, of course, are to use a blocking queue, an executor service, etc. I need advice on a better strategy to achieve this. Does the Kafka consumer have any built-in feature for processing messages in series?
Here is my code.
I changed the implementation to this:
public static class KafkaReadMsgTask implements Runnable {

    @Override
    public void run() {
        KakfaMsgConumerImpl kakfaMsgConumerImpl = null;
        try {
            kakfaMsgConumerImpl = SpContext.getBean(KakfaMsgConumerImpl.class);
            kakfaMsgConumerImpl.pollFormDef();
            kakfaMsgConumerImpl.pollFormData();
        } catch (Exception e) {
            logger.error(" kafka listener errors " + e);
            kakfaMsgConumerImpl.pauseTask();
        }
    }
}

@Component
public static class KakfaMsgConumerImpl {

    @Autowired
    ObjectMapper mapper;

    @Autowired
    FormSink formSink;

    @Autowired
    Environment env;

    @Resource(name = "formDefConsumer")
    Consumer formDefConsumer;

    @Resource(name = "formDataConsumer")
    Consumer formDataConsumer;

    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    public void startPolling() throws Exception {
        executor.scheduleAtFixedRate(new KafkaReadMsgTask(), 10, 3, TimeUnit.SECONDS);
    }

    public void pauseTask() {
        try {
            Thread.sleep(120000L);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public void pollFormDef() throws Exception {
        ConsumerRecords<Long, String> records = formDefConsumer.poll(0);
        if (!records.isEmpty()) {
            int recordsCount = records.count();
            if (logger.isDebugEnabled()) {
                logger.debug(" form-def consumer poll records size " + recordsCount);
            }
            if (records.count() > 1) {
                logger.warn(" form-def consumer poll returned records more than 1 , expected 1 , received " + recordsCount);
            }
            ConsumerRecord<Long, String> record = records.iterator().next();
            processFormDef(record.key(), record.value());
        }
    }

    void pollFormData() throws Exception {
        ConsumerRecords<Long, String> records = formDataConsumer.poll(0);
        if (!records.isEmpty()) {
            int recordsCount = records.count();
            if (logger.isDebugEnabled()) {
                logger.debug(" form-data consumer poll records size " + recordsCount);
            }
            if (records.count() > 1) {
                logger.warn(" form-data consumer poll returned records more than 1 , expected 1 , received " + recordsCount);
            }
            ConsumerRecord<Long, String> record = records.iterator().next();
            processFormData(record.key(), record.value());
        }
    }

    void processFormDef(Long key, String msg) throws Exception {
        if (logger.isDebugEnabled()) {
            logger.debug(" key " + key + " payload : " + msg);
        }
        FormDefinition formDefinition = mapper.readValue(msg, FormDefinition.class);
        formSink.createFromDef(formDefinition);
        logger.debug(" processed message, key: " + key + " msg : " + msg);
        Thread.sleep(60000L);
    }

    void processFormData(Long key, String msg) throws Exception {
        if (logger.isDebugEnabled()) {
            logger.debug(" key " + key + " payload : " + msg);
        }
        FormData formData = mapper.readValue(msg, FormData.class);
        formSink.persists(formData);
        logger.debug(" processed message, key: " + key + " msg : " + msg);
        Thread.sleep(60000L);
    }
}
Using a message-driven listener container is not the right technology for this application; it looks like you want to consume messages alternately from two different topics.
Furthermore, stopping the container on the consumer thread won't take effect anyway until the thread exits the method, at which time the consumer will be closed.
I would suggest you use the consumer factory to create two consumers; subscribe to the topics, set max.poll.records on each to 1, and call the poll() method on each alternately, as sketched below.
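A minimal sketch of what that could look like, assuming a DefaultKafkaConsumerFactory<Long, String> named consumerFactory whose configuration already sets max.poll.records to 1, and the form-def/form-data topic names implied by your logging:

// create two consumers from the factory and subscribe each to its own topic
Consumer<Long, String> formDefConsumer = consumerFactory.createConsumer();
Consumer<Long, String> formDataConsumer = consumerFactory.createConsumer();
formDefConsumer.subscribe(Collections.singletonList("form-def"));   // assumed topic name
formDataConsumer.subscribe(Collections.singletonList("form-data")); // assumed topic name

// poll each consumer alternately; max.poll.records=1 yields at most one record per poll
while (running) {
    for (ConsumerRecord<Long, String> record : formDefConsumer.poll(1000)) {
        processFormDef(record.key(), record.value());
    }
    for (ConsumerRecord<Long, String> record : formDataConsumer.poll(1000)) {
        processFormData(record.key(), record.value());
    }
}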
I managed to set up a Hystrix command to be called from an Undertow HTTP handler:
public void handleRequest(HttpServerExchange exchange) throws Exception {
    if (exchange.isInIoThread()) {
        exchange.dispatch(this);
        return;
    }
    RpcClient rpcClient = new RpcClient(/* ... */);
    try {
        byte[] response = new RpcCommand(rpcClient).execute();
        // send the response
    } catch (Exception e) {
        // send an error
    }
}
This works nicely. But now I would like to use the observable feature of Hystrix, calling observe instead of execute to make the code non-blocking.
public void handleRequest(HttpServerExchange exchange) throws Exception {
    RpcClient rpcClient = new RpcClient(/* ... */);
    new RpcCommand(rpcClient).observe().subscribe(new Observer<byte[]>() {

        @Override
        public void onCompleted() {
        }

        @Override
        public void onError(Throwable throwable) {
            exchange.setStatusCode(StatusCodes.INTERNAL_SERVER_ERROR);
            exchange.endExchange();
        }

        @Override
        public void onNext(byte[] body) {
            exchange.getResponseHeaders().add(Headers.CONTENT_TYPE, "text/plain");
            exchange.getResponseSender().send(ByteBuffer.wrap(body));
        }
    });
}
As expected (from reading the docs), the handler returns immediately and, as a consequence, the exchange is ended; when the onNext callback executes, it fails with an exception:
Caused by: java.lang.IllegalStateException: UT000127: Response has already been sent
at io.undertow.io.AsyncSenderImpl.send(AsyncSenderImpl.java:122)
at io.undertow.io.AsyncSenderImpl.send(AsyncSenderImpl.java:272)
at com.xxx.poc.undertow.DiyServerBootstrap$1$1.onNext(DiyServerBootstrap.java:141)
at com.xxx.poc.undertow.DiyServerBootstrap$1$1.onNext(DiyServerBootstrap.java:115)
at rx.internal.util.ObserverSubscriber.onNext(ObserverSubscriber.java:34)
Is there a way to tell Undertow that the handler is doing I/O asynchronously? I expect to use a lot of non-blocking code to access databases and other services.
Thanks in advance!
You should dispatch() a Runnable so that the exchange does not end when the handleRequest method returns. Since the creation of the client and the subscription are pretty simple tasks, you can do them on the same thread with SameThreadExecutor.INSTANCE, like this:
public void handleRequest(HttpServerExchange exchange) throws Exception {
    exchange.dispatch(SameThreadExecutor.INSTANCE, () -> {
        RpcClient rpcClient = new RpcClient(/* ... */);
        new RpcCommand(rpcClient).observe().subscribe(new Observer<byte[]>() {
            //...
        });
    });
}
(If you do not pass an executor to dispatch(), it will dispatch it to the XNIO worker thread pool. If you wish to do the client creation and subscription on your own executor, then you should pass that instead.)
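For completeness, here is a sketch of the worker-pool variant described in that note; the handler body is the same as above, only the dispatch call changes:

public void handleRequest(HttpServerExchange exchange) throws Exception {
    // no executor argument: the Runnable runs on an XNIO worker thread,
    // so blocking work would also be acceptable here
    exchange.dispatch(() -> {
        RpcClient rpcClient = new RpcClient(/* ... */);
        new RpcCommand(rpcClient).observe().subscribe(/* same observer as above */);
    });
}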