In Airflow, I'm trying to make a function that is dedicated to generating DAGs in one file:
dynamic_dags.py:
def generate_dag(name):
    with DAG(
        dag_id=f'dag_{name}',
        default_args=args,
        start_date=days_ago(2),
        schedule_interval='5 5 * * *',
        tags=['Test'],
        catchup=False
    ) as dag:
        dummy_task = DummyOperator(
            task_id="dynamic_dummy_task",
            dag=dag
        )
    return dag
Then, in a separate file, I'm trying to import the generator and create the DAGs from it:
load_dags.py:
from dynamic_dags import generate_dag
globals()["Dynamic_DAG_A"] = generate_dag('A')
However, the DAGs do not show up in the web UI.
But if I do everything in a single file, as in the code below, it works:
def generate_dag(name):
    with DAG(
        dag_id=f'dag_{name}',
        default_args=args,
        start_date=days_ago(2),
        schedule_interval='5 5 * * *',
        tags=['Test'],
        catchup=False
    ) as dag:
        dummy_task = DummyOperator(
            task_id="dynamic_dummy_task",
            dag=dag
        )
    return dag

globals()["Dynamic_DAG_A"] = generate_dag('A')
I'm wondering why doing it in two separate files doesn't work.
I think if you are using Airflow 1.10, then the DAG files should contain the strings "DAG" and "airflow":
https://airflow.apache.org/docs/apache-airflow/1.10.15/concepts.html?highlight=airflowignore#dags
When searching for DAGs, Airflow only considers python files that contain the strings “airflow” and “DAG” by default. To consider all python files instead, disable the DAG_DISCOVERY_SAFE_MODE configuration flag.
In Airflow 2 it's been changed (slightly - dag is case-insensitive):
https://airflow.apache.org/docs/apache-airflow/2.2.2/concepts/dags.html
When searching for DAGs inside the DAG_FOLDER, Airflow only considers Python files that contain the strings airflow and dag (case-insensitively) as an optimization.
To consider all Python files instead, disable the DAG_DISCOVERY_SAFE_MODE configuration flag.
I think you are simply missing 'airflow' in your load_dags.py. You can add it anywhere, including in a comment.
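For illustration, a load_dags.py that passes the safe-mode check could look like the sketch below; the word "airflow" in the comment is enough for DAG discovery to scan the file:
# This file builds airflow DAGs dynamically; mentioning "airflow" here satisfies DAG_DISCOVERY_SAFE_MODE.
from dynamic_dags import generate_dag

# Expose the generated DAG at module level so the scheduler can find it.
globals()["Dynamic_DAG_A"] = generate_dag('A')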
Dear Apache Airflow experts,
I am currently trying to make the parallel execution of Apache Airflow 2.3.x DAGs configurable via the DAG run config.
When executing the code below, the DAG creates two tasks; for the sake of my question it does not matter what the other DAG does.
Because max_active_tis_per_dag is set to 1, the two tasks will be run one after another.
What I want to achieve: I want to provide the result of get_num_max_parallel_runs (which checks the DAG run config and falls back to 1 as a default if no value is present) to max_active_tis_per_dag.
I would appreciate any input on this!
Thank you in advance!
from airflow import DAG
from airflow.decorators import task
from airflow.operators.trigger_dagrun import TriggerDagRunOperator
from datetime import datetime

with DAG(
    'aaa_test_controller',
    schedule_interval=None,
    start_date=datetime(2021, 1, 1),
    catchup=False
) as dag:

    @task
    def get_num_max_parallel_runs(dag_run=None):
        return dag_run.conf.get("num_max_parallel_runs", 1)

    trigger_dag = TriggerDagRunOperator.partial(
        task_id="trigger_dependent_dag",
        trigger_dag_id="aaa_some_other_dag",
        wait_for_completion=True,
        max_active_tis_per_dag=1,
        poke_interval=5
    ).expand(conf=['{"some_key": "some_value_1"}', '{"some_key": "some_value_2"}'])
I'm facing some issues trying to set up a basic DAG file in Airflow (I also have two other files).
I'm using the LocalExecutor on Ubuntu and saved my files at "C:\Users\tdamasce\Documents\workspace", with the dag and log files inside it.
My script is:
# step 1 - libraries
from email.policy import default

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.dates import days_ago
from datetime import datetime, timedelta

# step 2
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(2),
    'retries': 0
}

# step 3
dag = DAG(
    dag_id='DAG-1',
    default_args=default_args,
    catchup=False,
    schedule_interval=timedelta(minutes=5)
)

# step 4
start = DummyOperator(
    task_id='start',
    dag=dag
)

end = DummyOperator(
    task_id='end',
    dag=dag
)
My DAG stays like that:
Please let me know if any additional info is needed.
As per your updated question, I can see that you placed the DAGs under the directory
"C:\Users\tdamasce\Documents\workspace" with the dag and log files
inside it.
You need to add your DAGs to the dags_folder (specified in airflow.cfg; by default it is the $AIRFLOW_HOME/dags subfolder). Check your AIRFLOW_HOME variable and you should find a dags folder there.
You can also run airflow dags list (airflow list_dags on older versions) to list all the DAGs.
If you still cannot see them in the UI, restart the servers.
I am using Airflow v2.0 on Windows 10 WSL (Ubuntu 20.04).
The warning message is :
/home/jainri/.local/lib/python3.8/site-packages/airflow/models/dag.py:1342: PendingDeprecationWarning: The requested task could not be added to the DAG because a task with task_id create_tag_template_field_result is already in the DAG. Starting in Airflow 2.0, trying to overwrite a task will raise an exception.
warnings.warn(
Done.
Because of this warning, the DAGs shown in the web UI also include the example DAGs bundled with Apache Airflow. I have set up **AIRFLOW_HOME** and Airflow picks up DAGs from there, but the list of example DAGs is still displayed as well. I have posted an image of the web UI too.
This is the DAG that I am trying to run:
import datetime
import logging

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

#
# TODO: Define a function for the python operator to call
#
def greet():
    logging.info("Hello Rishabh!!")

dag = DAG(
    'lesson1.demo1',
    start_date=datetime.datetime.now()
)

#
# TODO: Define the task below using PythonOperator
#
greet_task = PythonOperator(
    task_id='greet_task',
    python_callable=greet,
    dag=dag
)
Also, the main issue is that the list of DAGs shown in the web UI includes the example DAGs. That produces a huge list alongside my own DAGs, which makes it cumbersome to find mine.
I found the issue: the warning you are seeing is caused by airflow/example_dags/example_complex.py (one of the example DAGs) that ships with Airflow.
Disable loading of the example DAGs by setting AIRFLOW__CORE__LOAD_EXAMPLES=False as an environment variable, or set [core] load_examples = False in airflow.cfg (docs).
I have created a python_scripts/ folder under my dags/ folder.
I have 2 different dags running the same python_operator - calling to 2 different python scripts located in the python_scripts/ folder.
They both write output files BUT:
one of them creates the file under the dags/ folder, and one of them creates it in the plugins/ folder.
How does Airflow determine the working path?
How can I get Airflow to write all outputs to the same folder?
One thing you could try, which I use in my DAGs, is to set your working path by adding os.chdir('some/path') in your DAG.
This only works if you do not put it into an operator, as those are run in subprocesses and therefore do not change the working path of the parent process.
The other solution I could think of would be using absolute paths when specifying your output (see the short sketch after the example below).
For the approach with os.chdir, try the following and you should see both files get created in the folder defined with path='/home/chr/test/':
from datetime import datetime
import os
import logging

from airflow import DAG
from airflow.exceptions import AirflowException
from airflow.operators.python_operator import PythonOperator

log = logging.getLogger(__name__)

default_args = {
    'owner': 'admin',
    'depends_on_past': False,
    'retries': 0
}

dag = DAG('test_dag',
          description='Test DAG',
          catchup=False,
          schedule_interval='0 0 * * *',
          default_args=default_args,
          start_date=datetime(2018, 8, 8))

# Change the working directory of the parsing process, creating the folder if needed.
path = '/home/chr/test'
if os.path.isdir(path):
    os.chdir(path)
else:
    os.mkdir(path)
    os.chdir(path)

def write_some_file():
    try:
        # Absolute path: always written to /home/chr/test
        with open("/home/chr/test/absolute_testfile.txt", "wt") as fout:
            fout.write('test1\n')
        # Relative path: resolved against the working directory set above
        with open("relative_testfile.txt", "wt") as fout:
            fout.write('test2\n')
    except Exception as e:
        log.error(e)
        raise AirflowException(e)

write_file_task = PythonOperator(
    task_id='write_some_file',
    python_callable=write_some_file,
    dag=dag
)
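For the absolute-path alternative mentioned above, a minimal sketch could look like the following; the output directory /tmp/airflow_output and the helper write_output are assumptions for illustration only:
import os

# Assumed output location for illustration; adjust to your environment.
OUTPUT_DIR = '/tmp/airflow_output'

def write_output(filename, content):
    # Building an absolute path makes the result independent of the task's working directory.
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    with open(os.path.join(OUTPUT_DIR, filename), "wt") as fout:
        fout.write(content)
Calling such a helper from the python_callable writes all outputs to the same folder, no matter which working directory the task process happens to have.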
Also, please try to provide code next time you ask a question, as it is almost impossible to find out what the problem is, just by reading your question.
I'm using the relatively new Airflow project. I have a bunch of DAGs written and running. Now I want to integrate a bug reporting service, so that if any code in any of the DAGs raises an exception, the information will be sent to a certain API. I can put the API call in the on_failure_callback of each DAG, but I need to execute an initializing line like bug_reporter.init(bug_reporter_token) that just needs to run once.
Is there a place in Airflow for initializing code? Right now I'm initializing the bug tracker at the beginning of every DAG definition file. This seems to be redundant, but I can't find a place to write a file that runs before the DAGs are defined. I've tried reading about plugins, but it doesn't seem to be there.
In your DAG definition file, instead of a plain DAG, use your own subclass:
from airflow import DAG
from airflow.utils.decorators import apply_defaults
import bug_reporter

class DAGWithBugReporter(DAG):
    @apply_defaults
    def __init__(
            self,
            bug_reporter_token,
            *args, **kwargs):
        super(DAGWithBugReporter, self).__init__(*args, **kwargs)
        bug_reporter.init(bug_reporter_token)
Then in your dag definition:
dag = DAGWithBugReporter(
    dag_id='my_dag',
    schedule_interval=None,
    start_date=datetime(2017, 2, 26),
    bug_reporter_token=my_token_from_somewhere
)

t1 = PythonOperator(
    task_id='t1',
    provide_context=True,
    python_callable=my_callable,
    xcom_push=True,
    dag=dag)
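Since the question also mentions on_failure_callback, the initialized reporter could additionally be wired into a default callback. A rough sketch, assuming a hypothetical bug_reporter.report() function (the real API of your bug-reporting service will differ):
from datetime import datetime
import bug_reporter

def report_failure(context):
    # Hypothetical reporting call; Airflow passes task_instance and exception in the callback context.
    ti = context['task_instance']
    bug_reporter.report(
        dag_id=ti.dag_id,
        task_id=ti.task_id,
        exception=context.get('exception')
    )

dag = DAGWithBugReporter(
    dag_id='my_dag',
    schedule_interval=None,
    start_date=datetime(2017, 2, 26),
    bug_reporter_token=my_token_from_somewhere,
    default_args={'on_failure_callback': report_failure}
)
Every task in the DAG then inherits the failure callback through default_args.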