I have scheduled a flow using nextScheduledActivity(). Once the flow has been initiated, I want to stop the scheduled flow. How can we achieve that?
There are at least 3 ways in which it can be stopped:
Consume the incoming state. Once the state is consumed, scheduling stops.
Specify a time in the past in nextScheduledActivity() based on an exit condition - this disables any further scheduling.
Return null from nextScheduledActivity() based on an exit condition.
I'm using ReplyingKafkaTemplate to make a synchronous call with a reply. From what I have found so far, every time I use the template I call start(), and after receiving the response I call stop(). However, I came across a problem with the message commit: the offset of my consumer was not increasing. I assumed that was because the consumer did not have time to commit, since the auto-commit interval (property "auto.commit.interval.ms") is set to 5 seconds in the ConsumerConfig class and I was stopping the container immediately after receiving a message. So I changed this interval to 1 ms, to commit immediately after receiving a message. This way it works, but I would like to understand it better.
My question is: how should the start() and stop() methods be used properly? Is there a purpose in starting the template before every call and stopping it after? And what is the right way to make sure that the commit was made?
Btw. I would be honored if Gary answered the question
You should not start and stop the container each time; just leave the reply container running all the time.
In any case, you should leave enable.auto.commit alone - while its default is true, Spring will set it to false unless you explicitly set it to true.
The container will commit the offset in a more deterministic manner than the built-in auto commit mechanism.
I have this kafka consumer:
new ReactiveKafkaConsumerTemplate<>(createReceivingOptions())
it happily processes messages, I set
max-poll-records=1
so that things don't happen too fast for me. I can verify via a logging breakpoint in the poll method, on
final Map<TopicPartition, List<ConsumerRecord<K, V>>> records = pollForFetches(timer);
how many records poll returned, and yes, it's one. Then I asked it to pause all assigned partitions. In the log I can see that it worked!
o.a.k.clients.consumer.KafkaConsumer : [Consumer clientId=Testing, groupId=Testing] Pausing partitions [TestTopic-0]
and from that point on I can see, that poll gets only 0 records and also this log:
Skipping fetching records for assigned partition TestTopic-0 because it is paused
OK great, it works! But wait, why is my whole topic getting processed then?
Then I found out, that at certain point there is also this log:
[Consumer clientId=Testing, groupId=Testing] Resuming partitions [TestTopic-0]
What? Who is calling that? And then I also found out that there are multiple requests for pausing all over the place, not just the one I actually invoked.
Is pausing somehow used by the reactive client, so that it cannot be used manually? Or does someone have an explanation for why …clients.consumer.KafkaConsumer pauses and resumes the topic on its own all the time, so that a manual pause gets undone?
After reviewing the ConsumerEventLoop code: the reactive client uses pause/resume internally to handle back pressure. When the downstream can't receive any more data, it pauses all assigned partitions, and it unconditionally resumes them when the back pressure is relieved.
It seems to me that it needs to keep track of whether the pause was done because of back-pressure and only resume in that case.
It looks like it used to do that before this commit.
Perhaps you could use back pressure instead to force the pause?
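To make the bookkeeping idea concrete (resume only what back pressure paused), here is a minimal sketch in Python. It is not the reactor-kafka implementation; PauseTracker and all of its method names are made up for illustration, and partitions are represented as plain strings.

```python
class PauseTracker:
    """Hypothetical bookkeeping: remember *why* each partition is paused."""

    def __init__(self):
        self.user_paused = set()          # paused explicitly by application code
        self.backpressure_paused = set()  # paused by the event loop itself

    def pause_by_user(self, partitions):
        self.user_paused |= set(partitions)

    def resume_by_user(self, partitions):
        self.user_paused -= set(partitions)

    def pause_for_backpressure(self, assigned):
        # Record only the partitions the user has not already paused.
        self.backpressure_paused = set(assigned) - self.user_paused
        return self.backpressure_paused

    def resume_after_backpressure(self):
        # Resume what back pressure paused, never what the user paused.
        to_resume = self.backpressure_paused - self.user_paused
        self.backpressure_paused = set()
        return to_resume

    def paused(self):
        return self.user_paused | self.backpressure_paused


tracker = PauseTracker()
tracker.pause_by_user({"TestTopic-0"})
tracker.pause_for_backpressure({"TestTopic-0", "TestTopic-1"})
print(sorted(tracker.resume_after_backpressure()))  # ['TestTopic-1']
print(sorted(tracker.paused()))                     # ['TestTopic-0']
```

With this kind of tracking, the manual pause on TestTopic-0 survives the back-pressure cycle instead of being resumed unconditionally.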
I have a bit of confusion about the way BaseSensorOperator's parameters work: timeout & poke_interval.
Consider this usage of the sensor :
BaseSensorOperator(
    soft_fail=True,
    poke_interval=4*60*60,  # Poke every 4 hours
    timeout=12*60*60,  # Timeout after 12 hours
)
The documentation mentions that the timeout sets the task to 'fail' once it runs out. But I'm using soft_fail=True, and I don't think it retains the same behavior, because I've found the task failed instead of being skipped after I used both soft_fail and timeout.
So what does happen here?
The sensor pokes every 4 hours, and at every poke, will wait for the duration of the timeout (12 hours)?
Or does it poke every 4 hours, for a total of 3 pokes, then times out?
Also, what happens with these parameters if I use the mode="reschedule"?
Here's the documentation of the BaseSensorOperator
class BaseSensorOperator(BaseOperator, SkipMixin):
    """
    Sensor operators are derived from this class and inherit these attributes.
    Sensor operators keep executing at a time interval and succeed when
    a criteria is met and fail if and when they time out.
    :param soft_fail: Set to true to mark the task as SKIPPED on failure
    :type soft_fail: bool
    :param poke_interval: Time in seconds that the job should wait in
        between each tries
    :type poke_interval: int
    :param timeout: Time, in seconds before the task times out and fails.
    :type timeout: int
    :param mode: How the sensor operates.
        Options are: ``{ poke | reschedule }``, default is ``poke``.
        When set to ``poke`` the sensor is taking up a worker slot for its
        whole execution time and sleeps between pokes. Use this mode if the
        expected runtime of the sensor is short or if a short poke interval
        is required.
        When set to ``reschedule`` the sensor task frees the worker slot when
        the criteria is not yet met and it's rescheduled at a later time. Use
        this mode if the expected time until the criteria is met is. The poke
        interval should be more than one minute to prevent too much load on
        the scheduler.
    :type mode: str
    """
Defining the terms
poke_interval: the duration between successive 'pokes' (evaluations of the condition that is being 'sensed')
timeout: Poking indefinitely is inadmissible (for example, if your buggy code pokes for the day to become 29 whenever the month is 2, it will keep poking for up to 4 years). So we define a maximum period beyond which we stop poking and terminate (the sensor is marked either FAILED or SKIPPED)
soft_fail: Normally (when soft_fail=False), the sensor is marked as FAILED after the timeout. When soft_fail=True, the sensor will instead be marked as SKIPPED after the timeout
mode: This one is slightly more complex
Any task (including a sensor), when it runs, takes up a slot in some pool (either the default pool or an explicitly specified pool); essentially, it takes up some resources.
For sensors, this is
wasteful: a slot is consumed even when we are just waiting (doing no actual work)
dangerous: if your workflow has too many sensors that start sensing around the same time, they can freeze a lot of resources for quite a while. In fact, having too many ExternalTaskSensors is notorious for putting entire workflows (DAGs) into deadlocks
To overcome this problem, Airflow v1.10.2 introduced modes in sensors
mode='poke' (default) means the existing behaviour that we discussed above
mode='reschedule' means that after a poke attempt, rather than going to sleep, the sensor ends the current attempt by raising AirflowRescheduleException, and its status changes from RUNNING to UP_FOR_RESCHEDULE. That way, it releases its slot, allowing other tasks to progress while it waits for another poke attempt
Citing the relevant snippet from code here
if self.reschedule:
    reschedule_date = timezone.utcnow() + timedelta(
        seconds=self._get_next_poke_interval(started_at, try_number))
    raise AirflowRescheduleException(reschedule_date)
else:
    sleep(self._get_next_poke_interval(started_at, try_number))
    try_number += 1
For more info read Sensors Params section
And now answering your questions directly
Q1
The sensor pokes every 4 hours, and at every poke, will wait for the duration of the timeout (12 hours)?
Or does it poke every 4 hours, for a total of 3 pokes, then times out?
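Point 2 is correct: the sensor pokes every 4 hours and times out once 12 hours have elapsed in total. The timeout is an overall deadline, not a per-poke wait.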
Q2
Also, what happens with these parameters if I use the mode="reschedule"?
As explained earlier, each of those params is independent, and setting mode='reschedule' doesn't alter their behaviour in any way
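A rough way to check how poke_interval, timeout, soft_fail and mode interact is to replay the sensor loop on a simulated clock. The sketch below is not Airflow code: simulate_sensor, met_at and poke_cost are made-up names, the deadline comparison is simplified (so the exact number of pokes at the boundary may differ by one from a real run), and slot time is only a crude count of seconds.

```python
def simulate_sensor(poke_interval, timeout, soft_fail=False,
                    mode="poke", met_at=None, poke_cost=1):
    """Simplified model of one sensor run on a simulated clock (seconds).

    met_at: simulated time at which the sensed condition becomes true
            (None means it never does).
    poke_cost: hypothetical seconds a single poke occupies a worker slot.
    Returns (final_state, poke_times, slot_seconds_held).
    """
    now = 0
    poke_times = []
    while True:
        poke_times.append(now)
        if met_at is not None and now >= met_at:
            state = "SUCCESS"
            break
        if now >= timeout:  # one overall deadline, not one per poke
            state = "SKIPPED" if soft_fail else "FAILED"
            break
        now += poke_interval  # sleep (poke) or wait to be rescheduled
    if mode == "reschedule":
        slot_held = poke_cost * len(poke_times)  # slot freed between pokes
    else:
        slot_held = now + poke_cost  # slot kept for the whole run
    return state, poke_times, slot_held


HOUR = 60 * 60
state, pokes, held = simulate_sensor(4 * HOUR, 12 * HOUR, soft_fail=True)
print(state, [t // HOUR for t in pokes])  # SKIPPED [0, 4, 8, 12]
```

Switching to mode="reschedule" in this model leaves the poke times and the final state unchanged; only the slot time drops, from roughly the whole 12 hours to a few seconds.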
BaseSensorOperator(
    soft_fail=True,
    poke_interval=4*60*60,  # Poke every 4 hours
    timeout=12*60*60,  # Timeout of 12 hours
    mode="reschedule"
)
Let's say the criteria is not met at the first poke. Then the sensor will run again after an interval of 4 hours, but the worker slot will be freed during the wait since we're using mode="reschedule".
That is what I understood.
I was confused by the vertx instance concept. When I first read the docs, I thought "instance" meant the number of event-loop threads.
As I dug into the source code (vertx 2.1.2), I found that a verticle instance means a task in the event-loop thread group. The event-loop thread always waits on the selector and runs tasks.
Then the first Question comes:
Is it necessary to have verticle instances in vertx? The verticle is run only once, by one event loop. To be more precise, the event-loop thread runs the Verticle start method and then throws it away; it works like an entry point and that is all.
My second Question is:
How to collect the results of multiple events?
Scenario
send multiple queries on the event bus with the same handler instance
the handler waits for every callback and modifies flags
if the flags reach the threshold, do some jobs
problem
When multiple events call back, there is a chance that multiple event-loop threads will execute the handler, so there is a race condition in which the jobs are run multiple times. How can I avoid it?
Any solutions will be appreciated.
Is it necessary to have verticle instances in the vertx?
No. This would not be required. You do not have to create any instances of the Verticle class.
How to collect the results of multiple events?
problem
When multiple events call back, there is a chance that multiple event-loop threads will execute the handler, so there is a race condition in which the jobs are run multiple times. How can I avoid it?
Each query which is sent over the event bus will have a corresponding Handler object. They will not share the same Handler instance. For every response to a query, its corresponding Handler object's handle() method is called. Hence, there would be no place for a race condition over a specific Handler object.
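For collecting several replies and firing a job once, a counting collector is one common pattern. The sketch below is an analogy in Python asyncio, not Vert.x code: it assumes all callbacks are delivered on a single event-loop thread, and ReplyCollector, query and the 10 ms delays are made up for illustration.

```python
import asyncio


class ReplyCollector:
    """Collects replies and runs a job exactly once when enough have arrived."""

    def __init__(self, threshold, job):
        self.threshold = threshold
        self.job = job
        self.replies = []
        self.done = False  # guards against running the job twice

    def handle(self, reply):
        if self.done:
            return  # late replies are ignored once the job has run
        self.replies.append(reply)
        if len(self.replies) >= self.threshold:
            self.done = True
            self.job(list(self.replies))


async def query(n, collector):
    await asyncio.sleep(0.01 * n)  # stand-in for an event bus round trip
    collector.handle(f"reply-{n}")


async def main():
    runs = []  # each entry is one execution of the job
    collector = ReplyCollector(threshold=3, job=runs.append)
    await asyncio.gather(*(query(n, collector) for n in range(5)))
    return runs


runs = asyncio.run(main())
print(len(runs))  # 1
```

Because every call to handle() runs to completion on the same thread before the next one starts, the check and the update of done cannot interleave, so no lock is needed in this model.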
When a workflow has a receive activity that occurs after another receive activity, and the second receive activity is called first, the workflow holds the caller by blocking for 1 minute before timing out.
I want the workflow to return immediately when there are no matching workflow instances.
I do not want to change the timeout on the client as some calls may take a while.
This is a known issue in WF4, at least I am not aware of it being fixed yet.