Apache airflow doens't display the DAG - airflow

I'm facing some issues trying to set up a basic DAG file inside the Airflow (but also I have other two files).
I'm using the LocalExecutor through the Ubuntu and saved my files at "C:\Users\tdamasce\Documents\workspace" with the dag and log file inside it.
My script is
# step 1 - libraries
from email.policy import default
from airflow import DAG
from datetime import datetime, timedelta
from airflow.operators.dummy_operator import DummyOperator
# step 2
default_args = {
'ownwer': 'airflow',
'depends_on_past': False,
'start_date': days_ago(2),
'retries':0
}
# step 3
dag = DAG(
dag_id='DAG-1',
default_args=default_args,
catchup=False,
schedule_interval=timedelta(minutes=5)
)
# step 4
start = DummyOperator(
task_id='start',
dag=dag
)
end = DummyOperator(
task_id='end',
dag=dag
)
My DAG stays like that:
Please, let me know if any add info is needed

As per your updated Question , I can see that you place the DAgs under a directory
"C:\Users\tdamasce\Documents\workspace" with the dag and log file
inside it.
you need to add dags to dags_folder (specified in airflow.cfg. By default it's $AIRFLOW_HOME/dags subfolder). See if your AIRFLOW_HOME variable and you should found a dag folder there.
you can also check airflow list_dags - this will list out all the dags,
Still you are not able to get that in the UI , then restart the servers.

Related

Dags triggered by DagTriggenRunOperator are stuck in queue

example_trigger_controller_dag.py
import pendulum
from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator
with DAG(
dag_id="example_trigger_controller_dag",
start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
catchup=False,
schedule="#once",
tags=["example"],
) as dag:
trigger = TriggerDagRunOperator(
task_id="test_trigger_dagrun",
trigger_dag_id="example_trigger_target_dag", # Ensure this equals the dag_id of the DAG to trigger
conf={"message": "Hello World"},
)
example_trigger_target_dag.py
import pendulum
from airflow import DAG
from airflow.decorators import task
from airflow.operators.bash import BashOperator
#task(task_id="run_this")
def run_this_func(dag_run=None):
"""
Print the payload "message" passed to the DagRun conf attribute.
:param dag_run: The DagRun object
"""
print(f"Remotely received value of {dag_run.conf.get('message')} for key=message")
with DAG(
dag_id="example_trigger_target_dag",
start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
catchup=False,
schedule=None,
tags=["example"],
) as dag:
run_this = run_this_func()
bash_task = BashOperator(
task_id="bash_task",
bash_command='echo "Here is the message: $message"',
env={"message": '{{ dag_run.conf.get("message") }}'},
)
the task in controller dag successfully ended but the task in target dag stuck in queue. Any ideas about how to solve this problem?
I ran your DAGs (with both of them unpaused) and they work fine in a completely new environment (Airflow 2.5.0, Astro CLI Runtime 7.1.0). So the issue is most likely not with your DAG code.
Tasks stuck in queue is often an issue with the scheduler, mostly with older Airflow versions. I suggest you:
make sure both DAGs are unpaused when the first DAG runs.
make sure all start_dates are in the past (though in this case usually the tasks don't even get queued)
restart your scheduler/Airflow environment
try running the DAGs while no other DAGs are running to check if the issue could be that the parallelism limit is reached. (if you are using K8s executor you should also check worker_pods_creation_batch_size and with the Celery Executor worker_concurrency and stalled_task_timeout)
take a look at your scheduler logs (at $AIRFLOW_HOME/logs/scheduler)
upgrade Airflow if you are running an older version.

How can I use a xcom value to configure max_active_tis_per_dag for the TriggerDagRunOperator in Airflow 2.3.x?

Dear Apache Airflow experts,
I am currently trying to make the parallel execution of Apache Airflow 2.3.x DAGs configurable via the DAG run config.
When executing below code the DAG creates two tasks - for the sake of my question it does not matter what the other DAG does.
Because max_active_tis_per_dag is set to 1, the two tasks will be run one after another.
What I want to achieve: I want to provide the result of get_num_max_parallel_runs (which checks the DAG config, if no value is present it falls back to 1 as default) to max_active_tis_per_dag.
I would appreciate any input on this!
Thank you in advance!
from airflow import DAG
from airflow.decorators import task
from airflow.operators.trigger_dagrun import TriggerDagRunOperator
from datetime import datetime
with DAG(
'aaa_test_controller',
schedule_interval=None,
start_date=datetime(2021, 1, 1),
catchup=False
) as dag:
#task
def get_num_max_parallel_runs(dag_run=None):
return dag_run.conf.get("num_max_parallel_runs", 1)
trigger_dag = TriggerDagRunOperator.partial(
task_id="trigger_dependent_dag",
trigger_dag_id="aaa_some_other_dag",
wait_for_completion=True,
max_active_tis_per_dag=1,
poke_interval=5
).expand(conf=['{"some_key": "some_value_1"}', '{"some_key": "some_value_2"}'])

read cli input without calling python operator

we want to read cli input pass to dag from UI during Dagtrigger in Dag.
i tried below code but its not working. here i am passing input as {""kpi":"ID123"}
and i want to print this ip value in my function get_data_from_bq
from airflow import DAG
from airflow.utils.dates import days_ago
from airflow.operators.python_operator import PythonOperator
from airflow import models
from airflow.models import Variable
from google.cloud import bigquery
from airflow.configuration import conf
LOCATION = Variable.get("HDM_PROJECT_LOCATION")
PROJECT_ID = Variable.get("HDM_PROJECT_ID")
client = bigquery.Client()
kpi='{{ kpi}}'
# default arguments
default_dag_args = {
'start_date':days_ago(0),
'retries': 0,
'project_id': PROJECT_ID
}
# Setting airflow environment varriable,getting hdm_batch_details data and updating it
def get_data_from_bq(**kwargs):
print("op is:")
print(kpi)
#Dag Defination
with models.DAG(
'00_test_sql1',
schedule_interval=None,
default_args=default_dag_args) as dag:
v_run_sql_01 = PythonOperator(
task_id='Run_SQL',
provide_context=True,
python_callable=get_data_from_bq,
location=LOCATION,
use_legacy_sql=False)
v_run_sql_01
Note: I don't want to use any operator to read data passed from cli
Note: I don't want to use any operator to read data passed from cli
This is impossible. Dag Run is only created when there are tasks to run.
You should understand that :
DAG + its top level code - builds DAG structure consisting of Tasks
DAG Run -> is single instance of DAG run which contains Task Instances to be executed. Dag Run simply consists of task instancess that belong to the DAG run with the given "dag run".
The configuration that you pass is "dag_run.conf" not "dag.conf" - which meanss that it is only specified for the DagRun, which is valid only for all Task Instances that belong to it.
Only Task Instances have access to dag_run.conf

Airflow: Create DAG from a separate file

In airflow, I'm trying to make a function that is dedicated to generate DAGs in a file:
dynamic_dags.py:
def generate_dag(name):
with DAG(
dag_id=f'dag_{name}',
default_args=args,
start_date=days_ago(2),
schedule_interval='5 5 * * *',
tags=['Test'],
catchup=False
) as dag:
dummy_task=DummyOperator(
task_id="dynamic_dummy_task",
dag=dag
)
return dag
Then in another file I'm trying to import the dags from a separate file:
load_dags.py:
from dynamic_dag import generate_dag
globals()["Dynamic_DAG_A"] = generate_dag('A')
However, the dags are not shown up on web UI.
But if I do them in a single file as below code, it will work:
def generate_dag(name):
with DAG(
dag_id=f'dag_{name}',
default_args=args,
start_date=days_ago(2),
schedule_interval='5 5 * * *',
tags=['Test'],
catchup=False
) as dag:
dummy_task=DummyOperator(
task_id="dynamic_dummy_task",
dag=dag
)
return dag
globals()["Dynamic_DAG_A"] = generate_dag('A')
I'm wondering why doing it in two separate files doesn't work.
I think if you are using Airflow 1.10, then the DAG files should contain DAG and airlfow:
https://airflow.apache.org/docs/apache-airflow/1.10.15/concepts.html?highlight=airflowignore#dags
When searching for DAGs, Airflow only considers python files that contain the strings “airflow” and “DAG” by default. To consider all python files instead, disable the DAG_DISCOVERY_SAFE_MODE configuration flag.
In Airflow 2 it's been changed (slightly - dag is case-insensitive):
https://airflow.apache.org/docs/apache-airflow/2.2.2/concepts/dags.html
When searching for DAGs inside the DAG_FOLDER, Airflow only considers Python files that contain the strings airflow and dag (case-insensitively) as an optimization.
To consider all Python files instead, disable the DAG_DISCOVERY_SAFE_MODE configuration flag.
I think you simply miss 'airflow' in your load_dags.py. You can add it wherever - including comments.

airflow "python operator" writes files to different locations

I have created a python_scripts/ folder under my dags/ folder.
I have 2 different dags running the same python_operator - calling to 2 different python scripts located in the python_scripts/ folder.
They both write output files BUT:
one of them creates the file under the dags/ folder, and one of them creates it in the plugins/ folder.
How does Airflow determine the working path?
How can I get Airflow to write all outputs to the same folder?
One thing you could try, that I use in my dags, would be to set you working path by adding os.chdir('some/path') in your DAG.
This only works if you do not put it into an operator, as those are run in subprocesses and therefore do not change the working path of the parent process.
The other solution I could think of would be using absolute paths when specifying your output.
For the approach with os.chdir try the following and you should see both files get created in the folder defined with path='/home/chr/test/':
from datetime import datetime
import os
import logging
from airflow import DAG
from airflow.exceptions import AirflowException
from airflow.operators.python_operator import PythonOperator
log = logging.getLogger(__name__)
default_args = {
'owner': 'admin',
'depends_on_past': False,
'retries': 0
}
dag = DAG('test_dag',
description='Test DAG',
catchup=False,
schedule_interval='0 0 * * *',
default_args=default_args,
start_date=datetime(2018, 8, 8))
path = '/home/chr/test'
if os.path.isdir(path):
os.chdir(path)
else:
os.mkdir(path)
os.chdir(path)
def write_some_file():
try:
with open("/home/chr/test/absolute_testfile.txt", "wt") as fout:
fout.write('test1\n')
with open("relative_testfile.txt", "wt") as fout:
fout.write('test2\n')
except Exception as e:
log.error(e)
raise AirflowException(e)
write_file_task = PythonOperator(
task_id='write_some_file',
python_callable=write_some_file,
dag=dag
)
Also, please try to provide code next time you ask a question, as it is almost impossible to find out what the problem is, just by reading your question.

Resources