Axon @EndSaga SagaEventHandler is not triggered at all, and @StartSaga SagaEventHandler is retried multiple times

I am trying to create a saga and start it by triggering an event. However, after the event is triggered, I just get an endless loop of "Releasing claim on token" messages: the processor retries executing this handler over and over, every few seconds.
@StartSaga
@SagaEventHandler(associationProperty = "eventId")
fun on(event: CreateTargetReferenceEvent) {
    println(event.eventId)
}
My issue is that I try to trigger the @EndSaga event, but it never happens. I am sure the eventId is the same in the @StartSaga and @EndSaga events, and both events are published correctly, since their corresponding event handlers are triggered elsewhere.
I'm not sure what I have missed to make @EndSaga trigger. Please help.
This is the @Saga component:
@Component
@Saga
internal class TestSaga {
    var testString: String = ""

    @Autowired
    private lateinit var commandGateway: CommandGateway

    @StartSaga
    @SagaEventHandler(associationProperty = "eventId")
    fun on(event: CreateTargetReferenceEvent) {
        println(event.eventId)
    }

    @EndSaga
    @SagaEventHandler(associationProperty = "eventId")
    fun on(event: UpdateTargetReferenceEvent) {
        println(event.eventId)
    }
}
And here is the output:
2022-11-01 21:49:10.529 WARN 11916 --- [agaProcessor]-0] o.a.e.TrackingEventProcessor : Releasing claim on token and preparing for retry in 4s
Hibernate: update token_entry set owner=null where owner=? and processor_name=? and segment=?
2022-11-01 21:49:10.530 INFO 11916 --- [agaProcessor]-0] o.a.e.TrackingEventProcessor : Released claim
Hibernate: update token_entry set timestamp=? where processor_name=? and segment=? and owner=?
Hibernate: update token_entry set timestamp=? where processor_name=? and segment=? and owner=?
Hibernate: update token_entry set timestamp=? where processor_name=? and segment=? and owner=?
Hibernate: select tokenentry0_.processor_name as processo1_7_0_, tokenentry0_.segment as segment2_7_0_, tokenentry0_.owner as owner3_7_0_, tokenentry0_.timestamp as timestam4_7_0_, tokenentry0_.token as token5_7_0_, tokenentry0_.token_type as token_ty6_7_0_ from token_entry tokenentry0_ where tokenentry0_.processor_name=? and tokenentry0_.segment=? for update
Hibernate: update token_entry set owner=?, timestamp=?, token=?, token_type=? where processor_name=? and segment=?
2022-11-01 21:49:14.536 INFO 11916 --- [agaProcessor]-0] o.a.e.TrackingEventProcessor : Fetched token: null for segment: Segment[0/0]
Hibernate: update token_entry set token=?, token_type=?, timestamp=? where owner=? and processor_name=? and segment=?
Hibernate: select associatio0_.saga_id as col_0_0_ from association_value_entry associatio0_ where associatio0_.association_key=? and associatio0_.association_value=? and associatio0_.saga_type=?
baccd32c-1547-4621-a04c-3a5cb285a9af
2022-11-01 21:49:14.551 WARN 11916 --- [agaProcessor]-0] o.a.e.TrackingEventProcessor : Releasing claim on token and preparing for retry in 8s
Hibernate: update token_entry set owner=null where owner=? and processor_name=? and segment=?
2022-11-01 21:49:14.553 INFO 11916 --- [agaProcessor]-0] o.a.e.TrackingEventProcessor : Released claim

As Vaelyr said, don't use @Component. A saga is not a regular Spring component; it has a different lifecycle. Typically, a saga orchestrates over different aggregates, so the UpdateTargetReferenceEvent would be triggered by a command the saga sends.

Yes, adding @Transient above commandGateway in the saga makes it work like magic. Thanks!
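Putting the two fixes together, a minimal sketch of the corrected saga: @Component removed, and the injected gateway marked @Transient so it is not serialized with the saga state (the command in the comment is a hypothetical name, for illustration only):

@Saga
internal class TestSaga {
    var testString: String = ""

    @Autowired
    @Transient // exclude the injected gateway from saga serialization
    private lateinit var commandGateway: CommandGateway

    @StartSaga
    @SagaEventHandler(associationProperty = "eventId")
    fun on(event: CreateTargetReferenceEvent) {
        println(event.eventId)
        // Typically the saga would now send a command that eventually
        // produces UpdateTargetReferenceEvent, e.g. (hypothetical name):
        // commandGateway.send(UpdateTargetReferenceCommand(event.eventId))
    }

    @EndSaga
    @SagaEventHandler(associationProperty = "eventId")
    fun on(event: UpdateTargetReferenceEvent) {
        println(event.eventId)
    }
}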

Related

(airflow) emr steps operator -> emr steps sensor; sensor failed -> trigger before operator

I want to handle failovers. When the sensor fails, only the sensor itself is retried, but I want to re-trigger the operator that runs before it. This is the flow I want:
a -> a_sensor (failed) -> a (retry) -> a_sensor -> (done)
Can I do this?
I recommend waiting for the EMR job in the operator itself. Even though this keeps the task running and occupying a worker slot, it doesn't consume many resources, and you can easily manage the timeout, cleanup and retry strategy:
import time

class EmrOperator(BaseOperator):
    ...
    def execute(self, context):
        self.run_job()
        self.wait_job()

    def wait_job(self):
        # poll until the EMR job reaches a terminal state
        while not self.is_finished():
            time.sleep(10)

    def on_kill(self):
        self.cleanup()
And you can use the official EmrAddStepsOperator, which already supports this.
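For illustration, a minimal sketch of that approach; job_flow_id and steps are placeholders, and wait_for_completion assumes a reasonably recent apache-airflow-providers-amazon version:

from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator

add_steps = EmrAddStepsOperator(
    task_id="add_steps",
    job_flow_id="j-XXXXXXXXXXXXX",  # placeholder EMR cluster id
    steps=[...],  # your EMR step definitions go here
    wait_for_completion=True,  # keep the task running until the steps finish
)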
And if you want to implement what you mentioned in the question: Airflow doesn't support retrying a group of tasks yet, but you can achieve it using callbacks:
def emr_a_callback(context):
    # on_failure_callback receives the task context dict
    ti = context["ti"]
    dag_run = context["dag_run"]
    max_retries = 3
    retry_num = ti.xcom_pull(task_ids=ti.task_id, key="retry_num") or 0
    if retry_num > max_retries:
        return  # give up: do nothing
    ti.xcom_push(key="retry_num", value=retry_num + 1)  # count this retry
    task_a = dag_run.get_task_instance("<task a id>")
    task_a.set_state(None)  # reset task a so the scheduler runs it again
    ti.set_state(None)  # reset the sensor itself

a = EmrOperator(..., retries=0)
a_sensor = EmrSensor(..., retries=0, on_failure_callback=emr_a_callback)

airflow reschedule error: dependency 'Task Instance State' PASSED: False

I have a customized sensor that looks like the code below. The idea is that one DAG can have tasks that start at different times, each taking advantage of Airflow's built-in reschedule mechanism.
class MySensor(BaseSensorOperator):
    def __init__(self, *, start_time, tz, **kwargs):
        super().__init__(**kwargs)
        self._start_time = start_time
        self._tz = tz

    @provide_session
    def execute(self, context, session: Session = None):
        # the target start is the configured start_time on the schedule date
        dt_start = datetime.combine(context['next_execution_date'].date(), self._start_time)
        dt_start = dt_start.replace(tzinfo=self._tz)
        if datetime.now().timestamp() < dt_start.timestamp():
            dt_reschedule = datetime.utcnow().replace(tzinfo=UTC)
            dt_reschedule += timedelta(seconds=dt_start.timestamp() - datetime.now().timestamp())
            raise AirflowRescheduleException(dt_reschedule)
        return super().execute(context)
In the DAG, I have something like the below. However, I noticed that when the mode is 'poke' (the default), the sensor does not work properly.
with DAG(schedule_interval='0 10 * * 1-5', ...) as dag:
    task1 = MySensor(task_id='task1', start_time=time(14, 0), mode='poke')
    task2 = MySensor(task_id='task2', start_time=time(16, 0), mode='reschedule')
    ...
From the log, I can see the following:
{taskinstance.py:1141} INFO - Rescheduling task, mark task as UP_FOR_RESCHEDULE
[5s later]
{local_task_job.py:102} INFO - Task exited with return code 0
[14s later]
{taskinstance.py:687} DEBUG - <TaskInstance: mydag.mytask execution_date [failed]> dependency 'Task Instance State' PASSED: False, Task is in the 'failed' state which is not a valid state for execution. The task must be cleared in order to be run.
{taskinstance.py:664} INFO - Dependencies not met for <TaskInstance ... [failed]> ...
Why is rescheduling not working with mode='poke'? And when does the scheduler flip the state of the task instance from "up_for_reschedule" to "failed"? Is there a better way to start each task/sensor at a different time? The sensor is an improved version of FileSensor and checks a bunch of files or file patterns. My current workaround is to force every task to mode='reschedule'.
Airflow version 1.10.12

Index state never changes to ENABLED on Titan with Amazon DynamoDB backend

I'm trying to use a composite index on DynamoDB, and the index never switches from the INSTALLED to the REGISTERED state.
Here is the code I used to create it:
graph.tx().rollback(); // never create new indexes while a transaction is active
TitanManagement mgmt = graph.openManagement();
PropertyKey propertyKey = getOrCreateIfNotExist(mgmt, "propertyKeyName");
String indexName = makePropertyKeyIndexName(propertyKey);
if (mgmt.getGraphIndex(indexName) == null) {
    mgmt.buildIndex(indexName, Vertex.class).addKey(propertyKey).buildCompositeIndex();
    mgmt.commit();
    graph.tx().commit();
    ManagementSystem.awaitGraphIndexStatus(graph, indexName).status(SchemaStatus.REGISTERED).call();
} else {
    mgmt.rollback();
}
A sample of the log is:
...
...
612775 [main] INFO com.thinkaurelius.titan.graphdb.database.management.GraphIndexStatusWatcher - Some key(s) on index myIndex do not currently have status REGISTERED: type=INSTALLED
613275 [main] INFO com.thinkaurelius.titan.graphdb.database.management.GraphIndexStatusWatcher - Some key(s) on index typeIndex do not currently have status REGISTERED: type=INSTALLED
613275 [main] INFO com.thinkaurelius.titan.graphdb.database.management.GraphIndexStatusWatcher - Timed out (PT1M) while waiting for index typeIndex to converge on status REGISTERED
Waiting for a longer time does the trick. Example:
ManagementSystem.awaitGraphIndexStatus(graph, propertyKeyIndexName)
.status(SchemaStatus.ENABLED)
.timeout(10, ChronoUnit.MINUTES) // set timeout to 10 min
.call();

SQLITE_ERROR: Connection is closed when connecting from Spark via JDBC to SQLite database

I am using Apache Spark 1.5.1 and trying to connect to a local SQLite database named clinton.db. Creating a DataFrame from a table of the database works fine, but when I perform some operations on the created object, I get the error below, which says "SQL error or missing database (Connection is closed)". The funny thing is that I get the result of the operation nevertheless. Any idea what I can do to solve the problem, i.e., avoid the error?
Start command for spark-shell:
../spark/bin/spark-shell --master local[8] --jars ../libraries/sqlite-jdbc-3.8.11.1.jar --classpath ../libraries/sqlite-jdbc-3.8.11.1.jar
Reading from the database:
val emails = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:sqlite:../data/clinton.sqlite", "dbtable" -> "Emails")).load()
Simple count (fails):
emails.count
Error:
15/09/30 09:06:39 WARN JDBCRDD: Exception closing statement
java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (Connection is closed)
at org.sqlite.core.DB.newSQLException(DB.java:890)
at org.sqlite.core.CoreStatement.internalClose(CoreStatement.java:109)
at org.sqlite.jdbc3.JDBC3Statement.close(JDBC3Statement.java:35)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$anon$$close(JDBCRDD.scala:454)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358)
at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
at org.apache.spark.scheduler.Task.run(Task.scala:90)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
res1: Long = 7945
I got the same error today, and the important line is just before the exception:
15/11/30 12:13:02 INFO jdbc.JDBCRDD: closed connection
15/11/30 12:13:02 WARN jdbc.JDBCRDD: Exception closing statement
java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (Connection is closed)
at org.sqlite.core.DB.newSQLException(DB.java:890)
at org.sqlite.core.CoreStatement.internalClose(CoreStatement.java:109)
at org.sqlite.jdbc3.JDBC3Statement.close(JDBC3Statement.java:35)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$anon$$close(JDBCRDD.scala:454)
So Spark succeeds in closing the JDBC connection, and then fails to close the JDBC statement.
Looking at the source, close() is called twice:
Line 358 (org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD, Spark 1.5.1)
context.addTaskCompletionListener{ context => close() }
Line 469
override def hasNext: Boolean = {
  if (!finished) {
    if (!gotNext) {
      nextValue = getNext()
      if (finished) {
        close()
      }
      gotNext = true
    }
  }
  !finished
}
If you look at the close() method (line 443)
def close() {
  if (closed) return
you can see that it checks the variable closed, but that value is never set to true.
As far as I can see, this bug is still present in master. I have filed a bug report.
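A minimal sketch of the fix, assuming nothing else in close() needs to change: record that the cleanup already ran, so the second invocation becomes a no-op:

def close() {
  if (closed) return
  // ... close the statement, the result set and the connection as before ...
  closed = true // the missing assignment: mark the RDD iterator as closed
}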
Source: JDBCRDD.scala (line numbers differ slightly)

Missing user profile, but it is present in confbridge.conf (Asterisk 11.5.1, app_confbridge)

Dialplan:
'5006005' => 1. answer()
2. Set(myconference=4000)
3. Set(TMP_CONF_COUNT=${CONFBRIDGE_INFO(parties,myconference)})
4. verbose(3,"sabse 4000 has users:${TMP_CONF_COUNT}) [pbx_config]
5. Set(TMP_CONF_LOCKED=${CONFBRIDGE_INFO(locked,myconference)}) [pbx_config]
6. verbose(3,"sasbse 4000 has users:${TMP_CONF_COUNT} and lock or unlock:${TMP_CONF_LOCKED}") [pbx_config]
[Press5MuteAll] 7. ConfBridge(4000,,6016adminuser,) [pbx_config]
8. Set(TMP_CONF_COUNT=${CONFBRIDGE_INFO(parties,myconference)}) [pbx_config]
9. Set(TMP_CONF_LOCKED=${CONFBRIDGE_INFO(locked,myconference)}) [pbx_config]
10. verbose(3,"sasbse 4000 has users:${TMP_CONF_COUNT} and lock or unlock:${TMP_CONF_LOCKED}") [pbx_config]
*CLI> confbridge show profile user 6016adminuser
--------------------------------------------
Name: 6016adminuser
Admin: true
Marked User: false
Start Muted: false
MOH When Empty: enabled
MOH Class: default
Announcement:
Quiet: disabled
Wait Marked: disabled
END Marked: disabled
Drop_silence: enabled
Silence Threshold: 160ms
Talking Threshold: 0ms
Denoise: enabled
Jitterbuffer: disabled
Talk Detect Events: disabled
DTMF Pass Through: enabled
PIN: 4321
Announce User Count: enabled
Announce join/leave: enabled
Announce User Count all: enabled
===
static int confbridge_exec(struct ast_channel *chan, const char *data)

    if (conf_find_user_profile(chan, u_profile_name, &conference_bridge_user.u_profile) == NULL) {
        ast_verb(3, "\n NULL USER\n");
    }

    if (!conf_find_user_profile(chan, u_profile_name, &conference_bridge_user.u_profile)) {
        ast_verb(3, "Conference user profile %s does not exist\n", u_profile_name);
        ast_log(LOG_WARNING, "Conference user profile %s does not exist\n", u_profile_name);
        res = -1;
        goto confbridge_cleanup;
    }

confbridge_cleanup:
    ast_bridge_features_cleanup(&conference_bridge_user.features);
    conf_bridge_profile_destroy(&conference_bridge_user.b_profile);
===========
So why is the user profile reported as missing when it is present in confbridge.conf?
-- Executing [5006005@ConfBridgeIncomingCalls:1] Answer("SIP/6017-00000002", "") in new stack
> 0xb766d7d0 -- Probation passed - setting RTP source address to 172.18.100.49:8000
-- Executing [5006005@ConfBridgeIncomingCalls:2] Set("SIP/6017-00000002", "myconference=4000") in new stack
-- Executing [5006005@ConfBridgeIncomingCalls:3] Set("SIP/6017-00000002", "TMP_CONF_COUNT=0") in new stack
-- Executing [5006005@ConfBridgeIncomingCalls:4] Verbose("SIP/6017-00000002", "3,"sabse 4000 has users:0") in new stack
-- "sabse 4000 has users:0
-- Executing [5006005@ConfBridgeIncomingCalls:5] Set("SIP/6017-00000002", "TMP_CONF_LOCKED=0") in new stack
-- Executing [5006005@ConfBridgeIncomingCalls:6] Verbose("SIP/6017-00000002", "3,"sasbse 4000 has users:0 and lock or unlock:0"") in new stack
-- "sasbse 4000 has users:0 and lock or unlock:0"
-- Executing [5006005@ConfBridgeIncomingCalls:7] ConfBridge("SIP/6017-00000002", "4000,,6016adminuser,") in new stack
--
info === 4000,6016adminuser,, --
NULL USER
-- Conference user profile 6016adminuser, does not exist
[2013-12-27 15:48:45] WARNING[20071][C-00000002]: app_confbridge.c:3722 confbridge_exec: Conference user profile 6016adminuser, does not exist
[2013-12-27 15:48:45] WARNING[20071][C-00000002]: app_confbridge.c:3722 confbridge_exec: Conference user profile 6016adminuser, does no
I can see in the source code:
ast_log(LOG_WARNING, "Conference user profile %s does not exist\n", u_profile_name);
As you can see, the format string has no comma after the profile name, so the trailing comma in the logged name ("6016adminuser,") comes from your dialplan.
So the solution is:
change your dialplan to remove the trailing comma after the user profile.
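Based on the dialplan shown above, priority 7 would become:
ConfBridge(4000,,6016adminuser)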
