How to properly configure clustered Quartz with MariaDB

We have a clustered Quartz configuration with 8 geographically distributed nodes, all backed by a single MariaDB instance. We have repeatedly observed the following error:
2018-04-26 00:14:01,186 [censored_QuartzSchedulerThread] ERROR org.quartz.core.ErrorLogger - An error occurred while firing triggers '[Trigger 'DEFAULT.a563a68e-b30d-4dab-b24f-540c5fa0cef8': triggerClass: 'org.quartz.impl.triggers.CronTriggerImpl calendar: 'null' misfireInstruction: 0 nextFireTime: Thu Apr 26 00:11:00 EDT 2018]'
org.quartz.impl.jdbcjobstore.LockException: Failure obtaining db row lock: (conn:83) Lock wait timeout exceeded; try restarting transaction
Query is: SELECT * FROM QRTZ_LOCKS WHERE SCHED_NAME = 'censored' AND LOCK_NAME = ? FOR UPDATE, parameters ['TRIGGER_ACCESS']
at org.quartz.impl.jdbcjobstore.StdRowLockSemaphore.executeSQL(StdRowLockSemaphore.java:157)
at org.quartz.impl.jdbcjobstore.DBSemaphore.obtainLock(DBSemaphore.java:113)
at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3792)
at org.quartz.impl.jdbcjobstore.JobStoreSupport.triggersFired(JobStoreSupport.java:2912)
at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:336)
Caused by: java.sql.SQLException: (conn:83) Lock wait timeout exceeded; try restarting transaction
Query is: SELECT * FROM QRTZ_LOCKS WHERE SCHED_NAME = 'censored' AND LOCK_NAME = ? FOR UPDATE, parameters ['TRIGGER_ACCESS']
at org.mariadb.jdbc.internal.util.ExceptionMapper.get(ExceptionMapper.java:150)
at org.mariadb.jdbc.internal.util.ExceptionMapper.getException(ExceptionMapper.java:101)
at org.mariadb.jdbc.internal.util.ExceptionMapper.throwAndLogException(ExceptionMapper.java:77)
at org.mariadb.jdbc.MariaDbStatement.executeQueryEpilog(MariaDbStatement.java:226)
at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeInternal(MariaDbServerPreparedStatement.java:413)
at org.mariadb.jdbc.MariaDbServerPreparedStatement.execute(MariaDbServerPreparedStatement.java:362)
at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeQuery(MariaDbServerPreparedStatement.java:343)
at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
at org.quartz.impl.jdbcjobstore.StdRowLockSemaphore.executeSQL(StdRowLockSemaphore.java:96)
... 4 common frames omitted
Other nodes also have similar errors:
2018-04-26 00:01:05,106 [censored_QuartzSchedulerThread] ERROR org.quartz.core.ErrorLogger - An error occurred while scanning for the next triggers to fire.
org.quartz.JobPersistenceException: Couldn't acquire next trigger: (conn:75) Lock wait timeout exceeded; try restarting transaction
Query is: UPDATE QRTZ_TRIGGERS SET TRIGGER_STATE = ? WHERE SCHED_NAME = 'censored' AND TRIGGER_NAME = ? AND TRIGGER_GROUP = ? AND TRIGGER_STATE = ?, parameters ['ACQUIRED','12d649da-56fb-47bf-874e-96dca451291f','DEFAULT','WAITING']
at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTrigger(JobStoreSupport.java:2860)
at org.quartz.impl.jdbcjobstore.JobStoreSupport$40.execute(JobStoreSupport.java:2759)
at org.quartz.impl.jdbcjobstore.JobStoreSupport$40.execute(JobStoreSupport.java:2757)
at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3799)
at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTriggers(JobStoreSupport.java:2756)
at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:272)
Caused by: java.sql.SQLException: (conn:75) Lock wait timeout exceeded; try restarting transaction
And
2018-05-06 06:27:10,438 [QuartzScheduler_censored-schedulerName_ClusterManager] ERROR o.q.impl.jdbcjobstore.JobStoreTX - ClusterManager: Error managing cluster: Failure updating scheduler state when checking-in: (conn:52) Deadlock found when trying to get lock; try restarting transaction
Query is: INSERT INTO QRTZ_SCHEDULER_STATE (SCHED_NAME, INSTANCE_NAME, LAST_CHECKIN_TIME, CHECKIN_INTERVAL) VALUES('censored', ?, ?, ?), parameters ['schedulerName',1525602430435,15000]
org.quartz.JobPersistenceException: Failure updating scheduler state when checking-in: (conn:52) Deadlock found when trying to get lock; try restarting transaction
Query is: INSERT INTO QRTZ_SCHEDULER_STATE (SCHED_NAME, INSTANCE_NAME, LAST_CHECKIN_TIME, CHECKIN_INTERVAL) VALUES('censored', ?, ?, ?), parameters ['schedulerName',1525602430435,15000]
at org.quartz.impl.jdbcjobstore.JobStoreSupport.clusterCheckIn(JobStoreSupport.java:3418)
at org.quartz.impl.jdbcjobstore.JobStoreSupport.doCheckin(JobStoreSupport.java:3265)
at org.quartz.impl.jdbcjobstore.JobStoreSupport$ClusterManager.manage(JobStoreSupport.java:3870)
at org.quartz.impl.jdbcjobstore.JobStoreSupport$ClusterManager.run(JobStoreSupport.java:3907)
Caused by: java.sql.SQLTransactionRollbackException: (conn:52) Deadlock found when trying to get lock; try restarting transaction
Query is: INSERT INTO QRTZ_SCHEDULER_STATE (SCHED_NAME, INSTANCE_NAME, LAST_CHECKIN_TIME, CHECKIN_INTERVAL) VALUES('censored', ?, ?, ?), parameters ['schedulerName',1525602430435,15000]
at org.mariadb.jdbc.internal.util.ExceptionMapper.get(ExceptionMapper.java:141)
at org.mariadb.jdbc.internal.util.ExceptionMapper.getException(ExceptionMapper.java:101)
at org.mariadb.jdbc.internal.util.ExceptionMapper.throwAndLogException(ExceptionMapper.java:77)
at org.mariadb.jdbc.MariaDbStatement.executeQueryEpilog(MariaDbStatement.java:226)
at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeInternal(MariaDbServerPreparedStatement.java:413)
at org.mariadb.jdbc.MariaDbServerPreparedStatement.execute(MariaDbServerPreparedStatement.java:362)
at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeUpdate(MariaDbServerPreparedStatement.java:351)
at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
at org.quartz.impl.jdbcjobstore.StdJDBCDelegate.insertSchedulerState(StdJDBCDelegate.java:2948)
at org.quartz.impl.jdbcjobstore.JobStoreSupport.clusterCheckIn(JobStoreSupport.java:3413)
... 3 common frames omitted
Caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Deadlock found when trying to get lock; try restarting transaction
Query is: INSERT INTO QRTZ_SCHEDULER_STATE (SCHED_NAME, INSTANCE_NAME, LAST_CHECKIN_TIME, CHECKIN_INTERVAL) VALUES('censored', ?, ?, ?), parameters ['schedulerName',1525602430435,15000]
at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readErrorPacket(AbstractQueryProtocol.java:1144)
at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1076)
at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:1031)
at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executePreparedQuery(AbstractQueryProtocol.java:617)
at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeInternal(MariaDbServerPreparedStatement.java:401)
... 10 common frames omitted
Our Quartz configuration is:
org.quartz.scheduler.instanceName=censored
org.quartz.scheduler.instanceId=AUTO
org.quartz.threadPool.class=org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount=10
org.quartz.jobListener.NAME.class=our.JobListenerImplementation
org.quartz.jobListener.NAME.jobListenerName=JobListener
org.quartz.triggerListener.NAME.class=our.TriggerListenerImplementation
org.quartz.triggerListener.NAME.triggerName=SchedulerTriggerListener
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.tablePrefix=QRTZ_
org.quartz.jobStore.dataSource=quartzDataSource
org.quartz.jobStore.isClustered=true
org.quartz.jobStore.clusterCheckinInterval=15000
org.quartz.jobStore.maxMisfiresToHandleAtATime=20
org.quartz.jobStore.acquireTriggersWithinLock=true
org.quartz.dataSource.quartzDataSource.connectionProvider.class=our.ConnectionProviderImplementation
org.quartz.dataSource.quartzDataSource.driver=org.mariadb.jdbc.Driver
org.quartz.dataSource.quartzDataSource.url=
org.quartz.dataSource.quartzDataSource.user=
org.quartz.dataSource.quartzDataSource.password=
org.quartz.dataSource.quartzDataSource.initialPoolSize=3
org.quartz.dataSource.quartzDataSource.maxActive=50
org.quartz.dataSource.quartzDataSource.maxIdle=20
org.quartz.dataSource.quartzDataSource.minIdle=2
org.quartz.dataSource.quartzDataSource.maxWait=180000
org.quartz.dataSource.quartzDataSource.setMaxOpenPreparedStatements=10
org.quartz.dataSource.quartzDataSource.removeAbandoned=true
org.quartz.dataSource.quartzDataSource.removeAbandonedTimeout=300
org.quartz.dataSource.quartzDataSource.logAbandoned=true
org.quartz.dataSource.quartzDataSource.testWhileIdle=true
org.quartz.dataSource.quartzDataSource.testOnBorrow=true
org.quartz.dataSource.quartzDataSource.testOnReturn=true
org.quartz.dataSource.quartzDataSource.validationQuery=select 1 from dual
org.quartz.dataSource.quartzDataSource.validationQueryTimeout=1
As a result of these lock-acquisition failures, the call to create a new schedule often (but not always) does not complete, and triggers do not fire as expected. All of this happens under light load, with no more than a couple of requests to create a schedule per minute.
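For reference, here is how we inspect the lock waits on the MariaDB side while a node is stuck (a diagnostic sketch using the standard InnoDB information_schema views, not a fix; innodb_lock_wait_timeout defaults to 50 seconds):
-- Which timeout keeps firing?
SHOW GLOBAL VARIABLES LIKE 'innodb_lock_wait_timeout';
-- Who is blocking whom right now?
SELECT r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query,
       b.trx_started
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id;
SHOW ENGINE INNODB STATUS also prints the latest detected deadlock, which is useful for the check-in deadlock above.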
We are also seeing the following Warning, which may or may not be related:
2018-05-06 06:27:10,434 [QuartzScheduler_censored-schedulerName_ClusterManager] WARN o.q.impl.jdbcjobstore.JobStoreTX - This scheduler instance (schedulerName) is still active but was recovered by another instance in the cluster. This may cause inconsistent behavior.
Anyone experienced this problem while running clustered Quartz with MariaDB? Suggestions would be greatly appreciated.

Related

Emergency unlocking of resources in database

We are having a problem on a live production system.
One of the nodes stopped working properly (because of problems with the network file system on which it is hosted), and that happened while the channel staging process was in progress.
Because of that really bad timing, the staging process remained unfinished and all locked resources stayed locked, which prevented editing products or catalogs on the live system.
The first solution we tried was restarting the servers node by node; that didn't help.
Second, we tried executing the SQL statements mentioned in this support article:
https://support.intershop.com/kb/index.php/Display/2350K6
The exact statements we executed are below; the first one deletes from the RESOURCELOCK table:
DELETE FROM RESOURCELOCK rl WHERE rl.LOCKID IN (
    SELECT resourcelock.lockid
    FROM isresource,
         domaininformation resourcedomain,
         process,
         basiccredentials,
         domaininformation userdomain,
         resourcelock,
         isresource_av
    WHERE (isresource.domainid = resourcedomain.domainid)
      AND (isresource.resourceownerid = process.uuid)
      AND (resourcelock.lockid = isresource.uuid)
      AND (process.userid = basiccredentials.basicprofileid(+))
      AND (basiccredentials.domainid = userdomain.domainid(+))
      AND (isresource_av.ownerid(+) = isresource.uuid)
      AND (isresource.resourceownerid IS NOT NULL)
      AND (isresource_av.name(+) = 'locknestinglevel')
      AND (process.name = 'StagingProcess')
);
And another for the ISRESOURCE table:
UPDATE isresource
SET
    resourceownerid = null,
    lockexpirationdate = null,
    lockcreationdate = null,
    lockingthreadid = null
WHERE
    RESOURCEOWNERID = 'QigK85q6scAAAAF9Pf9fHEwf'; -- UUID of StagingProcess
This helped somewhat, as it allowed single products to be staged, but two problems remain:
Products can't be manually locked for editing on the live system. When the lock icon is clicked, the page refreshes but the product still appears unlocked; however, a record is created in the ISRESOURCE table for each product that is clicked, although these records are incomplete (there is no RESOURCEOWNERID, lock creation date, or lock expiration date).
Processes are also being created for product locking, but they all either fail or run without an end date.
Now for the second problem:
Channel staging cannot be started; it fails with the message:
ERROR - Could not lock resources for process 'StagingProcess': Error finding resource lock with lockid: .0kK_SlyFFUAAAFlhGJujvESnull
That resource is the MARKETING_Promotion resource.
Both problems started occurring after running the SQL above, and they seem related; any advice on how to resolve this situation would be helpful.
The first SQL that I posted shouldn't have been run:
DELETE FROM RESOURCELOCK rl WHERE rl.LOCKID IN....
The fix was to restore the deleted resource locks and only set the lock fields in the ISRESOURCE table to null with the second statement:
UPDATE isresource
SET
    resourceownerid = null,
    lockexpirationdate = null,
    lockcreationdate = null,
    lockingthreadid = null
WHERE
    RESOURCEOWNERID = 'QigK85q6scAAAAF9Pf9fHEwf'; -- UUID of StagingProcess
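To verify the cleanup afterwards (a quick sanity check against the same tables as above), this should return zero rows once the stuck StagingProcess locks are gone:
SELECT uuid, resourceownerid, lockcreationdate, lockexpirationdate
FROM isresource
WHERE resourceownerid = 'QigK85q6scAAAAF9Pf9fHEwf'; -- UUID of StagingProcess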

How can my Flask app check whether a SQLite3 transaction is in progress?

I am trying to build some smart error messages using the @app.errorhandler(500) feature. For example, my route includes an INSERT command to the database:
if request.method == "POST":
    userID = int(request.form.get("userID"))
    topicID = int(request.form.get("topicID"))
    db.execute("BEGIN TRANSACTION")
    db.execute("INSERT INTO UserToTopic (userID,topicID) VALUES (?,?)", userID, topicID)
    db.execute("COMMIT")
If that transaction violates a constraint, such as UNIQUE or FOREIGN KEY, I want to catch the error and display a user-friendly message. To do this, I'm using the Flask @app.errorhandler as follows:
@app.errorhandler(500)
def internal_error(error):
    db.execute("ROLLBACK")
    return render_template('500.html'), 500
The "ROLLBACK" command works fine if I'm in the middle of a database transaction. But sometimes the 500 error is not related to the db, and in those cases the ROLLBACK statement itself causes an error, because you can't rollback a transaction that never started. So I'm looking for a method that returns a Boolean value that would be true if a db transaction is under way, and false if not, so I can use it to make the ROLLBACK conditional. The only one I can find in the SQLite3 documentation is for a C interface, and I can't get it to work with my Python code. Any suggestions?
I know that if I'm careful enough with my forms and routes, I can prevent 99% of potential violations of db rules. But I would still like a smart error catcher to protect me for the other 1%.
I don't know how transactions work in sqlite, but you can achieve what you are trying to do with try/except statements.
Use try/except within the error handler:
try:
    db.execute("ROLLBACK")
except:
    pass
return render_template('500.html'), 500
Use try/except when inserting data.
from flask import abort

try:
    userID = int(request.form.get("userID"))
    [...]
except:
    db.rollback()
    abort(500)
I am not familiar with sqlite's specific errors, but if you know which error occurs, catch that specific error instead of using a bare except.
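For concreteness, the kind of constraint that triggers this error path might look like the following (a hypothetical schema; only the UserToTopic name and its columns come from the question). In Python's sqlite3 module, a violation surfaces as sqlite3.IntegrityError, which is the specific exception to catch:
CREATE TABLE UserToTopic (
    userID  INTEGER NOT NULL,
    topicID INTEGER NOT NULL,
    UNIQUE (userID, topicID) -- a second identical INSERT violates this and aborts the transaction
);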

Why does Hangfire wait for 15s every few seconds when polling SQL Server for jobs?

I’ve inherited a system that uses Hangfire with sql server job storage. Usually when a job is scheduled to be run immediately we notice it takes a few seconds before it’s triggered.
Looking at SQL Profiler when running in my dev environment, the SQL run against Hangfire db looks like this -
exec sp_executesql N'delete top (1) JQ
output DELETED.Id, DELETED.JobId, DELETED.Queue
from [HangFire].JobQueue JQ with (readpast, updlock, rowlock, forceseek)
where Queue in (@queues1) and (FetchedAt is null or FetchedAt < DATEADD(second, @timeout, GETUTCDATE()))',N'@queues1 nvarchar(4000),@timeout float',@queues1=N'MYQUEUENAME_master',@timeout=-1800
-- Exactly the same SQL as above is executed about 6 times/second for about 3-4 seconds,
-- then nothing for about 2 seconds, then:
exec sp_getapplock @Resource=N'HangFire:recurring-jobs:lock',@DbPrincipal=N'public',@LockMode=N'Exclusive',@LockOwner=N'Session',@LockTimeout=5000
exec sp_getapplock @Resource=N'HangFire:locks:schedulepoller',@DbPrincipal=N'public',@LockMode=N'Exclusive',@LockOwner=N'Session',@LockTimeout=5000
exec sp_executesql N'select top (@count) Value from [HangFire].[Set] with (readcommittedlock, forceseek) where [Key] = @key and Score between @from and @to order by Score',N'@count int,@key nvarchar(4000),@from float,@to float',@count=1000,@key=N'recurring-jobs',@from=0,@to=1596053348
exec sp_executesql N'select top (@count) Value from [HangFire].[Set] with (readcommittedlock, forceseek) where [Key] = @key and Score between @from and @to order by Score',N'@count int,@key nvarchar(4000),@from float,@to float',@count=1000,@key=N'schedule',@from=0,@to=1596053348
exec sp_releaseapplock @Resource=N'HangFire:recurring-jobs:lock',@LockOwner=N'Session'
exec sp_releaseapplock @Resource=N'HangFire:locks:schedulepoller',@LockOwner=N'Session'
-- Then nothing is executed for about 8-10 seconds, then:
exec sp_executesql N'update [HangFire].Server set LastHeartbeat = @now where Id = @id',N'@now datetime,@id nvarchar(4000)',@now='2020-07-29 20:09:19.097',@id=N'ps12345:19764:fe362d1a-5ee4-4d97-b70d-134fdfab2b87'
-- Then about 500ms-2s later I get
exec sp_executesql N'delete top (1) JQ ... -- i.e. same as first query
The update LastHeartbeat query is only there every second time (from just a brief inspection, maybe that’s not exactly right).
It looks like there’s at least 3 threads running the DELETE query against JQ, since I can see several RPC:Starting before the RPC:Completed, suggesting they’re being executed in parallel instead of sequentially.
I don’t know if that’s normal but seems weird as I thought we had just one ‘consumer’ of the jobs.
I only have one Queue in my dev environment, although in live we’d have 20-50 I’d guess.
Any suggestions on where I should look for the configuration that’s causing:
a) the 8-10s pause between checking for jobs
b) the number of threads that are checking for jobs - it seems like I have too many
After writing this I realised we were using an old version so I upgraded from 1.5.x to 1.7.12, upgraded the database, and changed the startup config to this:
app.UseHangfireDashboard();
GlobalConfiguration.Configuration
    .UseSqlServerStorage(connstring, new SqlServerStorageOptions
    {
        CommandBatchMaxTimeout = TimeSpan.FromMinutes(5),
        QueuePollInterval = TimeSpan.Zero,
        SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
        UseRecommendedIsolationLevel = true,
        PrepareSchemaIfNecessary = true, // Default value: true
        EnableHeavyMigrations = true     // Default value: false
    })
    .UseAutofacActivator(_container);
JobActivator.Current = new AutofacJobActivator(_container);
but if anything the problem is now worse. Or the same but faster: 20 calls to delete top (1) JQ... happen within about 1s now, then the other queries, then a 15s wait, then it starts all over again.
To be clear, the main problem is that if any jobs are added during that 15s delay then it'll take the remainder of that 15s before my job is executed. A second problem I think is it's hitting SQL Server more than needed: 20 times in a second is a bit much, for my needs at least.
(Cross-posted to hangfire forums)
If you don't set QueuePollInterval, then Hangfire with SQL Server storage defaults to polling every 15s. So the first thing to do if you have this problem is to set QueuePollInterval to something smaller, e.g. 1s.
But in my case, even when I set that, it wasn't having any effect. The reason was that app.UseHangfireServer() was being called before GlobalConfiguration.Configuration.UseSqlServerStorage() with the SqlServerStorageOptions.
When you call app.UseHangfireServer() it uses the current value of JobStorage.Current. My code had set that:
var storage = new SqlServerStorage(connstring);
JobStorage.Current = storage;
then later called
app.UseHangfireServer()
then later called
GlobalConfiguration.Configuration
    .UseSqlServerStorage(connstring, new SqlServerStorageOptions
    {
        CommandBatchMaxTimeout = TimeSpan.FromMinutes(5),
        QueuePollInterval = TimeSpan.Zero,
        SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
        UseRecommendedIsolationLevel = true,
        PrepareSchemaIfNecessary = true,
        EnableHeavyMigrations = true
    })
Reordering the calls so that UseSqlServerStorage() with the SqlServerStorageOptions runs before app.UseHangfireServer() means the options take effect.
I would suggest checking the Hangfire BackgroundJobServerOptions to see what polling interval you have set up there. This will define the time before the hangfire server will check to see if there are any jobs in queue to execute.
From the documentation
Hangfire Docs
Hangfire Server periodically checks the schedule to enqueue scheduled jobs to their queues, allowing workers to execute them. By default, the check interval is equal to 15 seconds, but you can change it by setting the SchedulePollingInterval property on the options you pass to the BackgroundJobServer constructor:
var options = new BackgroundJobServerOptions
{
    SchedulePollingInterval = TimeSpan.FromMinutes(1)
};
var server = new BackgroundJobServer(options);

How can I log SQL execution results in Airflow?

I use Airflow Python operators to execute SQL queries against a Redshift/Postgres database. In order to debug, I'd like the DAG to return the results of the SQL execution, similar to what you would see when executing locally in a console.
I'm using psycopg2 to create a connection/cursor and execute the SQL. Having this logged would be extremely helpful to confirm the parsed parameterized SQL, and to confirm that data was actually inserted (I have painfully experienced issues where differences in environments caused unexpected behavior).
I do not have deep knowledge of Airflow or the low-level workings of the Python DBAPI, but the psycopg2 documentation does seem to refer to some methods and connection configurations that may allow this.
I find it very perplexing that this is difficult to do, as I'd imagine it would be a primary use case of running ETLs on this platform. I've heard suggestions to simply create additional tasks that query the table before and after, but this seems clunky and ineffective.
Could anyone please explain how this may be possible, and if not, explain why? Alternate methods of achieving similar results welcome. Thanks!
So far I have tried the connection.status_message() method, but it only seems to return the first line of the SQL and not the results. I have also attempted to create a logging cursor, which produces the SQL but not the console results:
import logging
import sys

import psycopg2 as pg
from psycopg2.extras import LoggingConnection

conn = pg.connect(
    connection_factory=LoggingConnection,
    ...
)
conn.autocommit = True

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
logger.addHandler(logging.StreamHandler(sys.stdout))
conn.initialize(logger)

cur = conn.cursor()
sql = """
INSERT INTO mytable (
    SELECT *
    FROM other_table
);
"""
cur.execute(sql)
I'd like the logger to return something like:
sql> INSERT INTO mytable (
SELECT ...
[2019-07-25 23:00:54] 912 rows affected in 4 s 442 ms
Let's assume you are writing an operator that uses the Postgres hook to do something in SQL.
Anything printed inside an operator is logged.
So, if you want to log the statement, just print the statement in your operator.
print(sql)
If you want to log the result, fetch the result and print the result.
E.g.
result = cur.fetchall()
for row in result:
    print(row)
Alternatively you can use self.log.info in place of print, where self refers to the operator instance.
OK, so after some trial and error I've found a method that works for my setup and objective. To recap, my goal is to run ETLs via Python scripts, orchestrated in Airflow. Referring to the documentation for statusmessage:
Read-only attribute containing the message returned by the last command:
The key is to manage logging in context with the transactions executed on the server. To do this, I had to explicitly set con.autocommit = False and wrap SQL blocks with BEGIN TRANSACTION; and END TRANSACTION;. If you read cur.statusmessage directly after a statement that deletes or inserts, you will get a response such as 'INSERT 0 92380'.
This still isn't as verbose as I would prefer, but it is much better than nothing, and it is very useful for troubleshooting ETL issues within Airflow logs.
Side notes:
- When autocommit is set to False, you must explicitly commit transactions.
- It may not be necessary to state transaction begin/end in your SQL. It may depend on your DB version.
import logging
import psycopg2 as psy

con = psy.connect(...)
con.autocommit = False
cur = con.cursor()
try:
    cur.execute(some_sql)  # some_sql is your statement
    logging.info(f"Cursor statusmessage: {cur.statusmessage}")
    con.commit()  # autocommit is off, so commit explicitly (see side note above)
except:
    con.rollback()
finally:
    con.close()
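For completeness, the transaction wrapping described above means the SQL handed to cur.execute looks something like this (reusing the mytable/other_table example from earlier):
BEGIN TRANSACTION;
INSERT INTO mytable (
    SELECT *
    FROM other_table
);
END TRANSACTION;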
There is some buried functionality within psycopg2 that I'm sure can be utilized, but the documentation is pretty thin and there are no clear examples. If anyone has suggestions on how to utilize things such as log objects, or how to use the backend PID to somehow retrieve additional information, I'd welcome them.

TokuDB setting for the time before a statement times out

In a MariaDB table with the TokuDB engine, I am encountering the error below, either on a delete statement while there is a background insert load, or vice versa.
Lock wait timeout exceeded; try restarting transaction
Does TokuDB use a setting that can be updated to determine how long it waits before it times out a statement?
I couldn't find the answer in the TokuDB documents. The MariaDB variable is still at its default value ('lock_wait_timeout' = 31536000), but my timeout is coming back in quite a bit less than a year. The timeouts occur during a load test, and I haven't spotted a time value in the error, but it feels like a few seconds, minutes at most, before the timeout is thrown.
Thanks,
Brent
TokuDB has its own timeout variable, tokudb_lock_timeout, it is measured in milliseconds and has the default value 4000 (4 seconds), which fits your observations. It can be modified both on the session and global levels, and can also be configured in the .cnf file.
Remember that when you set a global value for a variable which has both scopes, it only affects future sessions (connections), but not the existing ones.
-- for the current session
SET SESSION tokudb_lock_timeout = 60000;
-- for future sessions
SET GLOBAL tokudb_lock_timeout = 60000;
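To confirm which value is actually in effect (standard MariaDB syntax; remember the global value only applies to new connections):
-- value future sessions will pick up
SHOW GLOBAL VARIABLES LIKE 'tokudb_lock_timeout';
-- value for the current session
SHOW SESSION VARIABLES LIKE 'tokudb_lock_timeout';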
