I am trying to trigger an Airflow DAG from Bamboo and keep the trigger running in the foreground, so that Bamboo knows when the execution is complete.
Can someone suggest how I can make Bamboo wait for the Airflow trigger to finish executing on a remote server?
Is it possible to capture whether the Airflow DAG execution was successful, so that Bamboo can mark the build as failed or successful?
After triggering the Airflow DAG, capture the execution_date and keep checking the DAG state at a fixed interval (60 seconds in the script below), returning once the status is 'success' or 'failed'.
Here is the script I wrote:
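try_log.py
import re
import os
import sys
import time

# log.txt holds the output of `airflow trigger_dag`
with open('log.txt', 'r') as f:
    outs = f.readlines()

# The trigger output contains a line like
# "Created <DagRun dag_id @ execution_date: manual__...>";
# capture the execution date between "@" and "manual".
regex = r"Created.*?@(.*?)\s(.*?)manual"
matches = re.findall(regex, str(outs), re.MULTILINE)
dag_exec_date = matches[0][1].strip()[0:-1]  # drop the trailing colon

status = 'running'
while status not in ['failed', 'success']:
    # 'dag_id' is a placeholder for the actual DAG id
    stream = os.popen("airflow dag_state dag_id '{}'".format(dag_exec_date))
    output = stream.read()
    lines = output.split('\n')
    # the state is the last non-empty line of the command output
    status = lines[-2].strip()
    if status not in ['failed', 'success']:
        time.sleep(60)
    else:
        print(status)
        sys.exit()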
The following is added to the Bamboo SSH task to trigger the DAG and wait for its final status:
airflow trigger_dag ${bamboo.dag_id} > /path/log.txt
outputs=$(python try_log.py)
if [ "$outputs" = "failed" ]; then
    echo "status failed. Exiting and failing build"
    exit 125
fi
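The polling loop above never gives up if the DAG run stays in 'running', and the shell task has to compare output strings. A minimal sketch of a bounded poll that reports the result through the exit code is shown below. The names `wait_for_dag_state`, `exit_code_for`, and `get_state`, and the timeout values, are hypothetical; `get_state` stands for whatever returns the current state string (for example a wrapper around the `airflow dag_state` call in try_log.py).

```python
import time


def wait_for_dag_state(get_state, poll_seconds=60, timeout_seconds=3600,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns 'success' or 'failed'.

    Returns the terminal state, or 'timeout' if timeout_seconds elapse first.
    get_state is a hypothetical callable supplied by the caller.
    """
    deadline = clock() + timeout_seconds
    while True:
        state = get_state()
        if state in ('success', 'failed'):
            return state
        if clock() >= deadline:
            return 'timeout'
        sleep(poll_seconds)


def exit_code_for(state):
    """Map the final state to a process exit code (0 only on success)."""
    return 0 if state == 'success' else 1
```

If the script ends with `sys.exit(exit_code_for(state))`, the Bamboo task can simply let a non-zero exit status fail the build instead of inspecting the printed text.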
I'm using Airflow 2.2.1.
I have a DAG that sends an alert message to Slack when it fails.
DAG:
from alert.alert_slack import alert_slack
...
default_args = {
    'owner': 'name',
    'start_date': dt.datetime(2022, 11, 18),
    'retries': 2,
    'retry_delay': dt.timedelta(seconds=10),
    'on_failure_callback': alert_slack('myslackid')
}
alert_slack:
from airflow.providers.slack.operators.slack import SlackAPIPostOperator

def alert_slack(channel: str):
    def failure(context):
        last_task = context.get('task_instance')
        task_name = last_task.task_id
        dag_name = last_task.dag_id
        # assumption: take the owner from the DAG (it is not defined elsewhere in this snippet)
        owner = context['dag'].owner
        log_link = f"<{last_task.log_url}|{task_name}>"
        error_message = context.get('exception') or context.get('reason')
        execution_date = context.get('execution_date')
        title = ':red_circle: DAG Failed.'
        msg_parts = {
            '*Dag*': dag_name,
            '*Owner*': owner,
            '*Task*': task_name,
            '*Log*': log_link,
            '*Error*': error_message,
            '*Execution date*': execution_date
        }
        msg = "\n".join([title,
            *[f"{key}: {value}" for key, value in msg_parts.items()]
        ]).strip()
        # post the message by running the operator directly inside the callback
        SlackAPIPostOperator(
            task_id="alert",
            slack_conn_id="slack_alert",
            text=msg,
            channel=channel,
        ).execute(context=None)
    return failure
I need to add the URL of the failed DAG on the Airflow site (like https://myairflow.dev/code?dag_id=my_test_dag) to the alert message.
I tried adding the following code:
dagcode = context.get('dag_code')
dag_url = dagcode.source_code
but dagcode is None.
I think this is completely wrong, and I don't know where to look for the URL.
Can anyone please help me find where the DAG's URL is and how to pass it to the alert message?
The log link that you are already including in the alert takes you to the Airflow UI, to the log page for the failed task in that DAG run.
If you want a specific page in the UI for that DAG, say the 'Grid' view of a particular DAG as in the attached picture, you can simply build the URL yourself and include it in your Slack message.
For example:
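last_task = context.get('task_instance')
dag_id = last_task.dag_id
# airflow_server_id is the address of your
# Airflow webserver (e.g. myairflow.dev).
airflow_server_id = 'myairflow.dev'
# Use an f-string so that both names are actually substituted.
# Check the URL layout against your Airflow version: the question's own
# example uses the query-string form /code?dag_id=<dag_id>.
base_url = f'https://{airflow_server_id}/dags/{dag_id}/grid'
msg_parts = {
    '*Dag*': dag_id,
    '*Link to Dag grid page*': base_url
}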
.
.
.
Or, say you want the Code UI page for this DAG:
last_task = context.get('task_instance')
dag_id = last_task.dag_id
# airflow_server_id is the address of your
# Airflow webserver (e.g. myairflow.dev).
airflow_server_id = 'myairflow.dev'
# Use an f-string so that both names are actually substituted.
base_url = f'https://{airflow_server_id}/dags/{dag_id}/code'
msg_parts = {
    '*Dag*': dag_id,
    '*Link to Dag code page*': base_url
}
.
.
.
I will say, it is unusual to want a link to a DAG page rather than to the logs of the failed DAG run. The person responding to a Slack alert usually wants more information about the specific DAG run, not about the DAG itself. If it were me, the log URL of the failed run that you are already including would be enough.
[![grid_dag_ui][1]][1]
[1]: https://i.stack.imgur.com/1IMII.png
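To avoid hardcoding the link in several places, the URL construction can be pulled into a small helper. The sketch below is only an illustration: `dag_page_url` and its parameters are hypothetical names, and the two layouts are the ones that appear in this thread (the path form `/dags/<dag_id>/<view>` and the query-string form `/code?dag_id=<dag_id>`). Which one your webserver serves depends on its version, so verify it in the browser first. The base URL could also be read from the `[webserver] base_url` setting in the Airflow configuration rather than typed in.

```python
from urllib.parse import quote, urlencode


def dag_page_url(base_url, dag_id, view="grid", query_style=False):
    """Build a link to a DAG page in the Airflow UI (hypothetical helper).

    query_style=False -> <base>/dags/<dag_id>/<view>
    query_style=True  -> <base>/<view>?dag_id=<dag_id>
    """
    base = base_url.rstrip("/")
    if query_style:
        # e.g. https://myairflow.dev/code?dag_id=my_test_dag
        return f"{base}/{view}?{urlencode({'dag_id': dag_id})}"
    # percent-encode the dag_id so unusual characters cannot break the path
    return f"{base}/dags/{quote(dag_id, safe='')}/{view}"
```

Inside the failure callback this would be called as `dag_page_url('https://myairflow.dev', last_task.dag_id, view='code', query_style=True)` and the result added to `msg_parts`.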
I am using Airflow v2.2.5.
I want to send an email notification when a DAG times out.
So far I am able to send an email for task-level failures.
Please help.
The code you posted should already satisfy your request.
When the dagrun_timeout is reached the DAG is marked as failed, hence the on_failure_callback is called.
In the callback you can access the context['reason'] field to check if the failure is due to the timeout or another reason:
dag_timed_out = context['reason'] == 'timed_out'
Here is a full example:
from time import sleep
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def printx(v):
    print(v)
    with open("/tmp/SO_74153563.log", "a") as f:
        f.write(v + "\n")

def dag_callback(ctx):
    printx("DAG Failure.\nReason: " + ctx['reason'])
    timed_out = ctx['reason'] == 'timed_out'
    printx("Timed out: " + str(timed_out))

def long_running_job():
    printx("Sleeping...")
    sleep(40)
    printx("Sleeped")

with DAG(
    "SO_74153563",
    start_date=datetime.now() - timedelta(days = 2),
    schedule_interval=None,
    dagrun_timeout=timedelta(seconds = 15),
    on_failure_callback=dag_callback
) as dag:
    task_1 = PythonOperator(
        task_id="task_1",
        python_callable=long_running_job
    )
The task sleeps for 40 seconds while the DAG has a timeout of 15 seconds, so it will fail. The output will be:
DAG Failure.
Reason: timed_out
Timed out: True
The only difference from your callback is that now it is defined directly on the DAG.
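To turn the example above into an email notification, the callback only needs to build a message from `context['reason']` and hand it to a sender. The sketch below keeps the sender injectable so that the logic can be exercised without a mail server. `build_failure_email`, `make_dag_callback`, and the subject format are hypothetical; in a real DAG the `send` argument could be Airflow's `airflow.utils.email.send_email`, which takes the recipients, a subject, and HTML content, assuming SMTP is configured.

```python
def build_failure_email(context, dag_id):
    """Return (subject, html_body) for a DAG-level failure callback context."""
    reason = context.get("reason", "unknown")
    timed_out = reason == "timed_out"
    subject = f"[Airflow] DAG {dag_id} " + ("timed out" if timed_out else "failed")
    body = f"DAG: {dag_id}<br>Reason: {reason}"
    return subject, body


def make_dag_callback(dag_id, recipients, send):
    """Create an on_failure_callback that emails recipients via send()."""
    def callback(context):
        subject, body = build_failure_email(context, dag_id)
        # send is called as send(to, subject, html_content)
        send(recipients, subject, body)
    return callback
```

The DAG would then be declared with `on_failure_callback=make_dag_callback("SO_74153563", ["ops@example.com"], send_email)`, where the address is a placeholder.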
I want to get the status of a task from an external DAG. I have the same tasks running in 2 different DAGs based on some conditions. So, I want to check the status of this task in DAG2 from DAG1. If the task status is 'running' in DAG2, then I will skip this task in DAG1.
I tried using:
dag_runs = DagRun.find(dag_id=dag_id, execution_date=exec_dt)
for dag_run in dag_runs:
    dag_run.state
I couldn't figure out if we can get task status using DagRun.
If I use a sensor such as ExternalTaskSensor, the DAG will have to wait until the task reaches one of the allowed_states.
Is there a way to get the current status of a task in another DAG?
I used the code below to get the status of a task from another DAG:
from airflow.api.common.experimental.get_task_instance import get_task_instance
from airflow.operators.python import BranchPythonOperator  # airflow.operators.python_operator on Airflow 1.10

def get_dag_state(execution_date, **kwargs):
    # 'dag_id' and 'task_id' are placeholders for the other DAG and its task
    ti = get_task_instance('dag_id', 'task_id', execution_date)
    task_status = ti.current_state()
    return task_status

dag_status = BranchPythonOperator(
    task_id='dag_status',
    python_callable=get_dag_state,
    dag=dag
)
More details can be found here
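One caveat with the snippet above: BranchPythonOperator follows the task_id (or list of task_ids) that its callable returns, so returning the raw state only works if downstream tasks happen to be named after states. A minimal sketch of mapping the state to a branch is shown below. `choose_branch` and the task ids `run_task` and `skip_task` are hypothetical names chosen for illustration.

```python
def choose_branch(other_task_state, run_task_id="run_task", skip_task_id="skip_task"):
    """Pick the downstream task_id based on the other DAG's task state.

    Skips the work only while the same task is running in the other DAG;
    any other state (including None, when nothing has started) runs it.
    """
    if other_task_state == "running":
        return skip_task_id
    return run_task_id
```

In `get_dag_state`, the last line would then become `return choose_branch(ti.current_state())`.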
I'm testing out a DAG that I used to have running on Google Composer without error, on a local install of Airflow. The DAG spins up a Google Dataproc cluster, runs a Spark job (JAR file located on a GS bucket), then spins down the cluster.
The DataProcSparkOperator task fails immediately each time with the following error:
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://dataproc.googleapis.com/v1beta2/projects//regions/global/jobs:submit?alt=json returned "Invalid resource field value in the request.">
It looks as though the URI is incorrect/incomplete, but I am not sure what is causing it. Below is the meat of my DAG. All the other tasks execute without error, and the only difference is the DAG is no longer running on Composer:
default_dag_args = {
    'start_date': yesterday,
    'email': models.Variable.get('email'),
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 0,
    'retry_delay': dt.timedelta(seconds=30),
    'project_id': models.Variable.get('gcp_project'),
    'cluster_name': 'susi-bsm-cluster-{{ ds_nodash }}'
}

def slack():
    '''Posts to Slack if the Spark job fails'''
    text = ':x: The DAG *{}* broke and I am not smart enough to fix it. Check the StackDriver and DataProc logs.'.format(DAG_NAME)
    s.post_slack(SLACK_URI, text)

with DAG(DAG_NAME, schedule_interval='@once',
         default_args=default_dag_args) as dag:
    # pylint: disable=no-value-for-parameter
    delete_existing_parquet = bo.BashOperator(
        task_id = 'delete_existing_parquet',
        bash_command = 'gsutil rm -r {}/susi/bsm/bsm.parquet'.format(GCS_BUCKET)
    )
    create_dataproc_cluster = dpo.DataprocClusterCreateOperator(
        task_id = 'create_dataproc_cluster',
        num_workers = num_workers_override or models.Variable.get('default_dataproc_workers'),
        zone = models.Variable.get('gce_zone'),
        init_actions_uris = ['gs://cjones-composer-test/susi/susi-bsm-dataproc-init.sh'],
        trigger_rule = trigger_rule.TriggerRule.ALL_DONE
    )
    run_spark_job = dpo.DataProcSparkOperator(
        task_id = 'run_spark_job',
        main_class = MAIN_CLASS,
        dataproc_spark_jars = [MAIN_JAR],
        arguments=['{}/susi.conf'.format(CONF_DEST), DATE_CONST]
    )
    notify_on_fail = po.PythonOperator(
        task_id = 'output_to_slack',
        python_callable = slack,
        trigger_rule = trigger_rule.TriggerRule.ONE_FAILED
    )
    delete_dataproc_cluster = dpo.DataprocClusterDeleteOperator(
        task_id = 'delete_dataproc_cluster',
        trigger_rule = trigger_rule.TriggerRule.ALL_DONE
    )
    delete_existing_parquet >> create_dataproc_cluster >> run_spark_job >> delete_dataproc_cluster >> notify_on_fail
Any assistance with this would be much appreciated!
Unlike the DataprocClusterCreateOperator, the DataProcSparkOperator does not take project_id as a parameter. It reads the project from the Airflow connection instead (if you do not specify the gcp_conn_id parameter, it defaults to google_cloud_default), so you have to configure that connection on your local install. The empty project segment in the request URL (projects//regions) is the symptom of the missing value.
The reason you don't see this when running the DAG in Composer is that Composer configures the google_cloud_default connection for you.
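To see why a missing project produces exactly the URL in the error message, here is a small illustration. It is not the operator's actual code: `job_submit_url` and `require_project_id` are hypothetical functions that only mimic the URL shape from the traceback and show the kind of early check that would make the problem obvious.

```python
def job_submit_url(project_id, region="global"):
    """Mimic the shape of the request URL seen in the error message."""
    return (
        "https://dataproc.googleapis.com/v1beta2/"
        f"projects/{project_id}/regions/{region}/jobs:submit?alt=json"
    )


def require_project_id(project_id):
    """Fail early with a readable message when no project is configured."""
    if not project_id:
        raise ValueError(
            "No GCP project configured; set it on the google_cloud_default connection"
        )
    return project_id
```

With an empty project the first function yields `.../projects//regions/global/jobs:submit?alt=json`, which matches the failing request, while a guard like the second one would surface the misconfiguration before any API call is made.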
Is it possible to set up Nagios alerts for Airflow DAGs?
If a DAG fails, I need to alert the respective groups.
You can add an "on_failure_callback" to any task which will call an arbitrary failure handling function. In that function you can then send an error call to Nagios.
For example:
import logging
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

dag = DAG(dag_id="failure_handling",
          start_date=datetime(2018, 8, 1),  # tasks need a start_date
          schedule_interval='@daily')

def handle_failure(context):
    # first get useful fields to send to nagios/elsewhere
    dag_id = context['dag'].dag_id
    ds = context['ds']
    task_id = context['ti'].task_id
    # instead of printing these out - you can send these to somewhere else
    logging.info("dag_id={}, ds={}, task_id={}".format(dag_id, ds, task_id))

def task_that_fails(**kwargs):
    raise Exception("failing test")

task_to_fail = PythonOperator(
    task_id='task_to_fail',  # matches the task_id used in the test command
    python_callable=task_that_fails,
    provide_context=True,
    on_failure_callback=handle_failure,
    dag=dag)
If you run a test on this:
airflow test failure_handling task_to_fail 2018-08-10
You get the following in your log output:
INFO - dag_id=failure_handling, ds=2018-08-10, task_id=task_to_fail
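For the Nagios side, one common approach is to submit a passive service check result, which is a single line in Nagios' external command format: `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output`. The sketch below only formats that line from the fields gathered in `handle_failure`. The function names, the host name `airflow-host`, and the `airflow_<dag_id>` service naming are hypothetical, and it assumes Nagios has a matching host and service that accept passive checks. How the line is delivered (the external command file, NSCA, or something else) depends on your setup.

```python
import time

CRITICAL = 2  # Nagios plugin return code for a critical state


def nagios_passive_result(host, service, return_code, output, timestamp=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    ts = int(time.time() if timestamp is None else timestamp)
    # the command is a single ';'-separated line, so neutralise both characters
    clean = output.replace("\n", " ").replace(";", ",")
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{clean}"


def failure_result(dag_id, task_id, ds, host="airflow-host", timestamp=None):
    """Build the CRITICAL result line for a failed task (hypothetical naming)."""
    output = f"dag_id={dag_id}, ds={ds}, task_id={task_id} failed"
    return nagios_passive_result(host, f"airflow_{dag_id}", CRITICAL, output, timestamp)
```

In `handle_failure`, the `logging.info` call could be followed by writing `failure_result(dag_id, task_id, ds)` to wherever your Nagios instance reads passive results.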