In Airflow I know you can automatically send Slack notifications with on_success_callback and on_failure_callback, which in my case have worked properly.
In my use case, I have an ETL that raises AirflowSkipException if the current day's data is empty, and that also works properly. But this still sends a success notification to my Slack.
I was wondering if there is something like on_skip_callback, or another way to send a notification to Slack that my DAG was skipped for the current day.
Any help would be appreciated. Thanks
Edit: Added code references for my ETL. Datapoints are acquired from a database and can change from day to day; sometimes there is no data to be processed and the datapoints will be empty, and vice versa.
from airflow.exceptions import AirflowSkipException

def ETL_function():
    # Retrieve data code
    ....
    # Validation to check if ETL data is empty
    if not datapoints:
        print("OUTPUT LOG : ETL Data not found/empty")
        print("OUTPUT LOG : ETL skipped due to empty data, Skipping ETL ...... ")
        raise AirflowSkipException
        # return False
    else:
        print("OUTPUT LOG : ETL Data found")
        print("OUTPUT LOG : ETL continued due data available , Running ETL ...... ")
        # return True
        # ETL Process code
        ....
ETL_function_Task = PythonOperator(
    task_id='ETL_function',
    provide_context=True,
    python_callable=ETL_function,
    on_success_callback=task_success_slack_alert,
    dag=dag,
)
Hi @Anindhito Irmandharu,
You can use the ShortCircuitOperator, which is derived from the PythonOperator, for this purpose.
def ETL_function():
    ...
    # Validation to check if ETL data is empty
    if not datapoints:
        print("OUTPUT LOG : ETL Data not found/empty")
        print("OUTPUT LOG : ETL skipped due to empty data, Skipping ETL ...... ")
        return False
    else:
        print("OUTPUT LOG : ETL Data found")
        print("OUTPUT LOG : ETL continued due data available , Running ETL ...... ")
        return True

ETL_function_Task = ShortCircuitOperator(
    task_id="ETL_function",
    python_callable=ETL_function,
    provide_context=True,
    dag=dag,
)
ETL_function_Task >> downstream_Tasks
NOTE: Your downstream tasks will get skipped, but this task 'ETL_function_Task' will go to the success state. I'm not exactly sure why you need to send a Slack notification for tasks executing successfully.
Though you can easily adapt
on_success_callback=task_success_slack_alert
to your requirement: write a new task_skipped_slack_alert in the Slack hook module you are using, for example along the lines of the sketch below.
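This is only a rough sketch of what such a callback could look like; it assumes a Slack incoming-webhook URL stored in an Airflow Variable named slack_webhook_url, and both that variable name and the message format are illustrative rather than part of the original setup:

import requests
from airflow.models import Variable

def task_skipped_slack_alert(context):
    # Assumption: the webhook URL lives in an Airflow Variable; adjust to your own setup.
    webhook_url = Variable.get("slack_webhook_url")
    message = (
        ":warning: ETL skipped (no data for today).\n"
        "DAG: {}\n"
        "Task: {}\n"
        "Execution date: {}".format(
            context["dag"].dag_id,
            context["task_instance"].task_id,
            context["execution_date"],
        )
    )
    # Post the message to the Slack incoming webhook.
    requests.post(webhook_url, json={"text": message})

You could call task_skipped_slack_alert(context) directly inside ETL_function just before returning False (using the kwargs you get via provide_context), or wire it into whatever Slack helper module you already have.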
I run a Python script through Airflow.
The script gets a source file from S3. I want the script to mark the task as 'skipped' when there is no file in the bucket.
from airflow.exceptions import AirflowSkipException

if len(file_list) == 0:  # This means there's no file in the bucket. I omitted some code.
    print('The target file does not exist. Skipping the task...')
    raise AirflowSkipException
However, Airflow still marks this task as 'failed' when the file doesn't exist.
Am I missing anything? Should I add something to the DAG too?
I think you should raise AirflowSkipException() with the parentheses at the end; otherwise you are not raising an instance of the AirflowSkipException class but the class itself, which I guess is what causes the task to be set to failed.
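To make that concrete, here is a minimal sketch of a callable that skips the task when an S3 prefix is empty. It assumes boto3 is available, and the bucket and prefix names are purely illustrative placeholders rather than values from the question:

import boto3
from airflow.exceptions import AirflowSkipException

def check_source_file(**context):
    # Hypothetical bucket and prefix, used only for illustration.
    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket="my-bucket", Prefix="incoming/")
    file_list = response.get("Contents", [])

    if len(file_list) == 0:
        print('The target file does not exist. Skipping the task...')
        # Raise an instance (note the parentheses) so the task ends up in the skipped state.
        raise AirflowSkipException('No source file found in the bucket.')

    # ... continue with the normal processing ...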
I am having a tough time figuring out how to find the failed tasks for the same DAG run when it runs twice on the same day (same execution date).
Consider an example where a DAG with dag_id=1 failed on its first run (for any reason, let's say a connection timeout) and a task failed. The TaskInstance table will contain the entry of the failed task when we query it. GREAT!!
But if I re-run the same DAG (note that dag_id is still 1), then in the last task (it has the ALL_DONE trigger rule, so it is executed regardless of whether the upstream tasks failed or succeeded) I want to count the tasks that failed in the current dag_run, ignoring previous dag_runs. I came across the dag_run id, which could be useful if we could relate it to TaskInstance, but I could not. Any suggestions/help is appreciated.
In Airflow 1.10.x the same result can be achieved with much simpler code that avoids touching the ORM directly.
from airflow.utils.state import State

def your_python_operator_callable(**context):
    tis_dagrun = context['ti'].get_dagrun().get_task_instances()
    failed_count = sum(1 for ti in tis_dagrun if ti.state == State.FAILED)
    print(f"There are {failed_count} failed tasks in this execution")
The one unfortunate problem is that context['ti'].get_dagrun() does not return a DagRun instance when you test a single task from the CLI. As a result, manually testing that single task will fail, but the standard run will work as expected.
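If that caveat matters to you, one possible workaround (just a sketch of the idea, not part of the original answer) is to guard against get_dagrun() returning None:

from airflow.utils.state import State

def your_python_operator_callable(**context):
    dagrun = context['ti'].get_dagrun()
    if dagrun is None:
        # Happens when testing a single task from the CLI: there is no real DagRun to inspect.
        print("No DagRun available (probably a CLI test run); skipping the failed-task count.")
        return
    tis_dagrun = dagrun.get_task_instances()
    failed_count = sum(1 for ti in tis_dagrun if ti.state == State.FAILED)
    print(f"There are {failed_count} failed tasks in this execution")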
You could create a PythonOperator task which queries the Airflow database to find the information you're looking for. This has the added benefit that the task context already carries the values you need for the query:
from contextlib import closing

from airflow import models, settings
from airflow.utils.state import State

def your_python_operator_callable(**context):
    with closing(settings.Session()) as session:
        print("There are {} failed tasks in this execution".format(
            session.query(
                models.TaskInstance
            ).filter(
                models.TaskInstance.dag_id == context["dag"].dag_id,
                models.TaskInstance.execution_date == context["execution_date"],
                models.TaskInstance.state == State.FAILED,
            ).count()
        ))
Then add the task to your DAG with a PythonOperator (a rough sketch is below).
(I have not tested the above, but hopefully it will send you down the right path.)
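As an illustration of that last step (the task name and the ALL_DONE trigger rule are assumptions drawn from the question, not part of the original answer):

from airflow.operators.python_operator import PythonOperator
from airflow.utils.trigger_rule import TriggerRule

count_failed_tasks = PythonOperator(
    task_id='count_failed_tasks',
    python_callable=your_python_operator_callable,
    provide_context=True,  # needed on Airflow 1.10.x so **context is passed in
    trigger_rule=TriggerRule.ALL_DONE,  # run even if upstream tasks failed
    dag=dag,
)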
I have made a very simple DAG that looks like this:
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

cleanup_command = "/home/ubuntu/airflow/dags/scripts/log_cleanup/log_cleanup.sh "

dag = DAG(
    'log_cleanup',
    description='DAG for deleting old logs',
    schedule_interval='10 13 * * *',
    start_date=datetime(2018, 3, 30),
    catchup=False,
)

t1 = BashOperator(task_id='cleanup_task', bash_command=cleanup_command, dag=dag)
The task finishes successfully, but despite this the DAG remains in "running" status. Any idea what could cause this? The screenshot below shows the issue, with the DAG remaining in the running state. The earlier runs are only finished because I manually marked their status as success. [Edit: I had originally written: "The earlier runs are only finished because I manually set status to running."]
Are you sure your scheduler is running? You can start it with $ airflow scheduler, and check the scheduler CLI command docs. You shouldn't have to manually set tasks to running.
Your code here seems fine. One thing you might try is restarting your scheduler.
In the Airflow metadata database, the DAG run end state is disconnected from the task run end state. I've seen this happen before, but it usually resolves itself on the scheduler's next loop, once it realizes that all of the tasks in the DAG run have reached a final state (success, failed, or skipped).
Are you running the LocalExecutor, the SequentialExecutor, or something else here?
I scheduled a data extraction with an XQuery query on MarkLogic 8.0.6 using the "scheduled tasks" feature.
My XQuery query (this query works if I copy/paste it into the ML web console, and I get a file on AWS S3):
xdmp:save("s3://XX.csv",let $nl := "
"
return
document {
for $book in collection("books")/books
return (root($book)/bookId||","||
$optin/updatedDate||$nl
)
})
My scheduled task:
Task enabled: yes
Task path: /home/bob/extraction.xqy
Task root: /
Task type: hourly
Task period: 1
Task start time: 8 minutes past the hour
Task database: mydatabase
Task modules: file system
Task user: admin
Task host: XX
Task priority: higher
Unfortunately, my script does not seem to be executed: no file is generated on AWS S3 (the storage used) and I do not have any logs.
Any idea how to:
1/ debug a job in the scheduled tasks?
2/ see the job running at the expected time?
Thanks,
Romain.
First, I would take a look at the ErrorLog.txt, because it will probably show you where to look for the problem.
xdmp:filesystem-file(concat(xdmp:data-directory(),"/","Logs","/","ErrorLog.txt"))
Where is the script located logically: has it been uploaded to the content database, the modules database, or the ./MarkLogic/Modules directory?
If this is a cluster, have you specified which host it's running on? If so, and you are using the filesystem for modules, then ensure the script exists in the ./MarkLogic/Modules directory of the specified host. If not, and you are using the filesystem for modules, then ensure the script exists in the ./MarkLogic/Modules directory of all the hosts in the cluster.
As for seeing the job running, you can check http://servername:8002/dashboard/ and look at the Query Execution tab to see the running processes, or you can get a snapshot of the process by looking at the Status page of the task server (Configure > Groups > [group name] > Task Server > Status, then click the "show more" button).
While using py.test, I have some tests that run fine with SQLite but hang silently when I switch to PostgreSQL. How would I go about debugging something like that? Is there a "verbose" mode I can run my tests in, or a way to set a breakpoint? More generally, what is the standard plan of attack when pytest stalls silently? I've tried using pytest-timeout and ran the tests with $ py.test --timeout=300, but they still hang with no activity on the screen whatsoever.
I ran into the same SQLite/Postgres problem with Flask and SQLAlchemy, similar to Gordon Fierce. However, my solution was different. Postgres is strict about table locks and connections, so explicitly closing the session connection on teardown solved the problem for me.
My working code:
@pytest.yield_fixture(scope='function')
def db(app):
    # app is an instance of a Flask app, _db a SQLAlchemy DB
    _db.app = app
    with app.app_context():
        _db.create_all()

        yield _db

        # Explicitly close DB connection
        _db.session.close()
        _db.drop_all()
Reference: SQLAlchemy
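As a small usage sketch (the User model and the assertion are purely illustrative and not from the original answer), a test simply requests the db fixture so that create_all, the explicit session close, and drop_all run around it:

from myapp.models import User  # hypothetical Flask-SQLAlchemy model, for illustration only

def test_user_roundtrip(db):
    # The app context opened by the fixture is still active here.
    db.session.add(User(name="alice"))
    db.session.commit()

    assert db.session.query(User).count() == 1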
To answer the question "How would I go about debugging something like that?":
Run with py.test -m trace --trace to get trace of python calls.
One option (useful for any stuck Unix binary) is to attach to the process using strace -p <PID> and see which system call it might be stuck on, or which loop of system calls it is repeating, e.g. stuck calling gettimeofday.
For more verbose py.test output, install pytest-sugar (pip install pytest-sugar) and run the tests with py.test --verbose . . .
https://pypi.python.org/pypi/pytest-sugar
I had a similar problem with pytest and Postgresql while testing a Flask app that used SQLAlchemy. It seems pytest has a hard time running a teardown using its request.addfinalizer method with Postgresql.
Previously I had:
@pytest.fixture
def db(app, request):
    def teardown():
        _db.drop_all()

    _db.app = app
    _db.create_all()
    request.addfinalizer(teardown)
    return _db
(_db is an instance of SQLAlchemy that I import from extensions.py)
But if I drop the database every time the database fixture is called:
@pytest.fixture
def db(app, request):
    _db.app = app
    _db.drop_all()
    _db.create_all()
    return _db
Then pytest won't hang after your first test.
Not knowing what is breaking in the code, the best way is to isolate the failing test and set a breakpoint in it to have a look. Note: I use pudb instead of pdb, because it's really the best way to debug Python if you are not using an IDE.
For example, you can add the following to your test file:
import pudb
...

def test_create_product(session):
    pudb.set_trace()
    # Create the Product instance
    # Create a Price instance
    # Add the Product instance to the session.
    ...
Then run it with
py.test -s --capture=no test_my_stuff.py
Now you'll be able to see exactly where the script locks up, and examine the stack and the database at this particular moment of execution. Otherwise it's like looking for a needle in a haystack.
I ran into this problem as well and spent quite some time on it (though I wasn't using SQLite). The test suite ran fine locally but failed in CircleCI (Docker).
My problem was ultimately that:
An object's underlying implementation used threading
The object's __del__ normally would end the threads
My test suite wasn't calling __del__ as it should have
I figured I'd add how I tracked this down. Other answers suggest these approaches:
pytest-timeout didn't help; the test hung after completion.
Invoked via pytest --timeout 5
Versions: pytest==6.2.2, pytest-timeout==1.4.2
Running pytest -m trace --trace or pytest --verbose yielded no useful information either.
What worked was commenting literally everything out, including all conftest.py code and test code, then slowly uncommenting/re-commenting regions until I had identified the root cause.
Ultimate solution: use a factory fixture to add a finalizer that calls __del__ (a rough sketch of the idea follows).
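The answer itself does not include code, so this is only a minimal sketch of such a factory fixture under stated assumptions: ThreadedWorker is a hypothetical stand-in for the object whose background threads needed to be shut down.

import pytest

from myapp.worker import ThreadedWorker  # hypothetical: an object that starts background threads

@pytest.fixture
def make_worker(request):
    """Factory fixture that registers a finalizer to call __del__ on every created object."""

    def _make(*args, **kwargs):
        worker = ThreadedWorker(*args, **kwargs)
        # Make sure the threads are torn down even if the test never drops its reference,
        # so pytest can exit instead of hanging.
        request.addfinalizer(worker.__del__)
        return worker

    return _make

A test then requests make_worker and calls it instead of constructing the object directly, so teardown is guaranteed to run.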
In my case the Flask application did not check if __name__ == '__main__': so it executed app.start() when that was not my intention.
You can read many more details here.
In my case, the diff was very slow when an assert failed while comparing 4 MB of data: pytest's assertion introspection tries to build a detailed explanation of the failing comparison, which is expensive for large values.
with open(path, 'rb') as f:
    assert f.read() == data
Fixed by:
with open(path, 'rb') as f:
    eq = f.read() == data
    assert eq