Status of accessing the current offset of a consumer? - spring-kafka

I see that there was some discussion on this subject before[*], but I cannot see any way how to access this. Is it possible yet?
Reasoning: we have for example ContainerStoppingErrorHandler. It would be vital, if container was automatically stopped, to know, where it stopped. I guess ContainerStoppingErrorHandler could be improved to log partition/offset, as it has access to Consumer (I hope I'm not mistaken). But if I'm stopping/pausing it manually via MessageListenerContainer, I cannot see a way how to get info about last processed offset. Is there a way?
[*] https://github.com/spring-projects/spring-kafka/issues/431

Related

What exactly is timekey_wait parameter in Fluentd?

I'm new in EFK. I have a problem with showing logs in Kibana. I already resolved, but I'm not sure of my approach.
Problem: Kibana shows log after 10 minutes when Elasticsearch is restarted.
I studied in https://docs.fluentd.org/configuration/buffer-section
And I found out that if I configure timekey_wait parameter in buffer section is 0s, Kibana shows logs without delay.
The problem is resolved, but I have still concerned about timekey_wait parameter.
Are there others impacted by the change?
Why is timekey_wait needed? Please give me an example of the necessity of it.
Thank you for your time!
1 & 2 ) According to the documentation by using timekey_wait parameter fluentd waits the specified amount of time, before writing chunks. By this way delayed log lines that needs to be in the same log chunk won't be missed.
If your timekey is 60m and timekey_wait is 10m, now the chunks will be written after 70m not 60m.
If you don't have delayed log lines to be come it becomes less important.In one of my implementations, I use flush_interval parameter. That way timekey is not needed. (buffer chunks will be flushed after this time)

Seek to an offset via an external trigger

Currently I use the AcknoledgingMessageListener to implement a Kafka consumer using spring-Kafka. This implementation helps me listen on a specific topic and process messages with a manual ack.
I now need to build the following capability:
Let us assume that for an some environmental exception or some entry of bad data via this topic, I need to replay data on a topic from and to a specific offset. This would be a manual trigger (mostly via the execution of a Java class).
It would be ideal if I can retrieve the messages between those offsets and feed it is a replay topic so that a new consumer can process those messages thus keeping the offsets intact on the original topic.
CosumerSeekAware interface - if this is the answer how can I trigger this externally? Via let say a mvn -Dexec. I am not sure if this is even possible
Also let say that I have an crash time stamp with me, is it possible to introspect the topic to find the offset corresponding to the crash so that I can replay from that offset?
Can I find offsets corresponding to some specific data so that I can replay those specific offsets?
All of these requirements are towards building a resilience layer around our Kafka capabilities. I need all of these to be managed by a separate executable class that can be triggered manually providing the relevant data (like time stamps etc). This class should determine offsets and then seek to that offset, retrieve the messages corresponding to those offsets and post them to a separate topic. Can someone please point me in the right direction? I’m afraid I’m going around in circles.
so that a new consumer can process those messages thus keeping the offsets intact on the original topic.
Just create a new listener container with a different group id (new consumer) and use a ConsumerAwareRebalanceListener (or ConsumerSeekAware) to perform the seeks when the partitions are assigned.
Here is a sample CARL that seeks all assigned topics based on a timestamp.
You will need some mechanism to know when the new consumer should stop consuming (at which time you can stop() the new container). Maybe set max.poll.records=1 on the new consumer so he doesn't prefetch past the failure point.
I am not sure what you mean by #3.

Vulkan: trouble understanding cycling of framebuffers

In Vulkan,
A semaphore(A) and a fence(X) can be passed to vkAcquireNextImageKHR. That semaphore(A) is subsequently passed to vkQueueSubmit, to wait until the image is released by the Presentation Engine (PE). A fence(Y) can also be passed to vkQueueSubmit. Client code can check when the submission has completed by checking fence(Y).
When fence(Y) signals, this means the PE can display the image.
My question:
How do I know when the PE has finished using the image after a call to vkQueuePresentKHR? To me, it doesn't seem that it would be by checking fence(X), because that is for client code to know when the image can be written to by vkQueueSubmit, isn't it? After the image is sent to vkQueueSubmit, it seems the usefulness of fence(X) is done. Or, can the same fence(X) be used to query the image availability after the call to vkQueuePresentKHR?
I don't know when the image is available again after a call to vkQueuePresentKHR, without having to call vkAcquireNextImageKHR.
The reason this is causing trouble for me is that in an asynchronous, 60fps, triple buffered app (throwaway learning code), things get out of wack like this:
Send an initial framebuffer to the PE. This framebuffer is now unavailable for 16 milliseconds.
Within the 16ms, acquire a second image/framebuffer, submit commands, but don't present.
Do the same as #2, for a third image. We submit it before 16ms.
16ms have gone by, so we vkQueuePresentKHR the second image.
Now, if I call vkAcquireNextImageKHR, the whole thing can fail if image #1 is not yet done being used, because I have acquired three images at this point.
How to know if image #1 is available again without calling vkAcquireNextImageKHR?
How do I know when the PE has finished using the image after a call to vkQueuePresentKHR?
You usually do not need to know.
Either you need to acquire a new VkImage, or you don't. Whether PE has finished or not does not even enter that decision.
Only reason wanting to know is if you want to measure presentation times. There's a special extension for that: VK_GOOGLE_display_timing.
After the image is sent to vkQueueSubmit, it seems the usefulness of fence(X) is done.
Well, you can reuse the fence. But the Implementation has stopped using it as soon as it was signaled and won't be changing its state anymore to anything, if that's what you are asking (and so you are free to vkDestroy it or do other things with it).
I don't know when the image is available again after a call to vkQueuePresentKHR, without having to call vkAcquireNextImageKHR.
Hopefully I cover it below, but I am not precisely sure what the problem here is. I don't know how to eat a soup without a spoon neither. Simply use a spoon— I mean vkAcquireNextImageKHR.
Now, if I call vkAcquireNextImageKHR, the whole thing can fail if image #1 >is not yet done being used, because I have acquired 3 images at this point.
How to know if image #1 is available again without calling >vkAcquireNextImageKHR?
How is it any different than image #1 and #2?
Yes, you may have already acquired all the images the swapchain has to offer, or the PE is "not ready" to give away an image even if it has two.
In the first case the spec advises against calling vkAcquireNextImageKHR with timeout of UINT64_MAX. It is a simple matter of counting the successful vkAcquireNextImageKHR calls vs the vkQueuePresentKHRs. One way may be to simply do one vkAcquireNextImageKHR and then do one vkQueuePresentKHR.
In the second case you can simply call vkAcquireNextImageKHR and you will eventually get the image.
In order to use a swapchain image, You need to acquire it. After that the actual availability of the image for rendering purposes is signaled by the semaphore (A) or the fence (X). You can either use the semaphore (X) during the submission as a wait semaphore or wait on the CPU for the fence (X) and submit after that. For performance reasons the semaphore is a preferred way.
Now when You present an image, You give it back to the Presentation Engine. From now on You cannot use that image for whatever purposes. There is no way to check when that image is available again for You so You can render into it again. You cannot do that. If You want to render into a swapchain image again, You need to acquire another image. And during this operation You once again provide a semaphore or a fence (probably different than those provided when You previously acquired a swapchain image). There is no other way to check when an image is again available than through calling the vkAcquireNextImageKHR() function.
And when You want to implement triple-buffering, You should just select appropriate presentation mode (mailbox mode is the closest match). You shouldn't wait for a specific time before You present an image. You just should present it when You are done rendering into it. Your synchronization should be entirely based on acquire, present commands and semaphores or fences provided during these operations and during submission. Appropriate present mode should do the rest. Detailed explanation of different present modes is available in Intel's tutorial.

sensu: "previous check command execution in progress"

My client-side sensu metric is reporting a WARN and the data is not getting to my OpenTSDB.
It seems to be stuck, but I don't understand what the message is telling me. Can someone translate?
The command is a ruby script.
In /var/log/sensu/sensu-client.log :
{"timestamp":"2014-09-11T16:06:51.928219-0400",
"level":"warn",
"message":"previous check command execution in progress",
"check":{"handler":"metric_store","type":"metric",
"standalone":true,"command":"...",
"output_type":"json","auto_tag_host":"yes",
"interval":60,"description":"description here",
"subscribers"["system"],
"name":"foo_metric","issued":1410466011,"executed":1410465882
}
}
My questions:
what does this message mean?
what causes this?
Does it really mean we are waiting for the same check to run? if so, how do we clear it?
This error means that sensu is (or thinks it is, actually executing this check currently
https://github.com/sensu/sensu/blob/4c36d2684f2e89a9ce811ca53de10cc2eb98f82b/lib/sensu/client.rb#L115
This can be caused by stacking checks, that take longer than their interval to run. (60 seconds in this case)
You can try to set the "timeout" option in the check definition:
https://github.com/sensu/sensu/blob/4c36d2684f2e89a9ce811ca53de10cc2eb98f82b/lib/sensu/client.rb#L101
To try to make sensu time out after a while on that check. You could also add internal logic to your check to make it not hang.
In my case, I had accidentally configured two sensu-client instances to have the same name. I think that caused one of them to always think its checks were already running when in reality they were not. Giving them unique names solved the problem for me.

Starting mutliple orchestrations from parent orchestration and passing messages to them

I have a situation where a main orchestration is responsible for processing a convoy of messages. These messages belong to a set of customers, the orchestration will read the messages as they come in, and for each new customer id it finds, it will spin up a new orchestration that is responsible for processing the messages of a particular customer. I have to preserve the order of messages as they come in, so the newly created orchestrations should process the message it has and wait for additional messages from the main orchestration.
Tried different ways to tackle this, but was not able to successfuly implement it.
I would like to hear your opinions on how this could be done.
Thanks.
It sounds like what you want is a set of nested convoys. While it might be possible to get that working, it's going to... well, hurt. In particular, my first worry would be maintenance: any changes to the process would be a pain in the neck to make, and, much worse, deployment would really, really suck.
Personally, I would really try to find an alternative way to implement this and avoid the convoys if possible, but that would depend a lot on your specific scenario.
A few questions, if you don't mind:
What are your ordering requirements? For example, do you only need ordered processing for each customer on a single incoming batch, or across batches? If the latter, could you make do without the master orchestration and just force a single convoy'd instance per customer? Still not great, but would likely simplify things a lot.
What are you failure requirements with respect to ordering? Should it completely stop processing? Save message and keep going? What about retries?
Is ordering based purely on the arrival time of the message? Is there anything in the message that you could use to force ordering internally instead of relying purely on the arrival time?
What does the processing of the individual messages do? Is the ordering requirement only to ensure that certain preconditions are met when a specific message is processed (for example, messages represent some tree structure that requires parents are processed before children).
I don't think you need a master orchestration to start up the sub-orchestrations. I am assumin you are not talking about the master orchestration implmenting a convoy pattern. So, if that's the case, here's what I might do.
There is a brief example here on how to implment a singleton orchestration. This example shows you how to setup an orchestration that will only ever exist once. All the messages going to it will be lined up in order of receipt and processed one at a time. Your example differs in that you want to have this done by customer ID. This is pretty simple. Promote the customer ID in the inbound message and add it to the correlation type. Now, there will only ever be one instance of the orchestration per customer.
The problem with singletons is this. You have to kill them at some point or they will live forever as dehydrated orchestrations. So, you need to have them end. You can do this if there is a way for the last message for a given customer to signal the orchestration that it's time to die through an attribute or such. If this is not possible, then you need to set a timer. If no messags are received in x seconds, terminate the orch. This is all easy to do, but it can introduce Zombies. Zombies occur when that orchestration is in the process of being shut down when another message for that customer comes in. this can usually be solved by tweeking the time to wait. Regardless, it will cause the occasional Zombie.
A note fromt he field. We've done this and it's really not a great long term solution. We were receiving customer info updates and we had to ensure ordered processing. We did this singleton approach and it's been problematic from the Zombie issue and the exeption issue. If the Singleton orchestration throws an exception, it will block the processing for a all future messages for that customer. So - handle every single possible exception. The real solution would have been to have the far end system check the time stamps from the update messages and discard ones that were older than the last update. We wanted to go this way, but the receiving system didn't want to do this extra work.

Resources