When I checked the documentation, it was not mentioned anywhere that start_date is a required parameter when creating a DAG. I get the following error when I don't pass it:
Broken DAG: [/opt/airflow/dags/dag_api_to_minio.py] Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 1039, in dag
dag.add_task(self)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/dag.py", line 2328, in add_task
raise AirflowException("DAG is missing the start_date parameter")
airflow.exceptions.AirflowException: DAG is missing the start_date parameter
Here is my dag definition using the TaskFlow API:
@dag(
    start_date=None,
    schedule_interval=None,
    max_active_runs=1,
    catchup=False,
    default_args={
        "retries": 1,
        "retry_delay": timedelta(minutes=3)
    },
)
Versions: airflow:2.5.1, python:3.7
Can someone please explain what is going on here?
Expected output: The code should run without any errors as start_date is not a required parameter
Actual output: airflow exception: missing start_date parameter
it was not mentioned anywhere that start_date is a required parameter when creating a DAG
Because it's not. You may define the start_date on the tasks.
When Airflow parses the DAG, it tries to register the tasks with their associated DAG object. Part of that process is verifying that a start_date was provided. A task's start_date overrides the DAG's start_date, but you must define it on at least one of the two objects. You can see this in the source code.
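For example, here is a minimal sketch based on the decorator from the question (the function name, task, and dates are placeholders), with start_date set on the @dag itself; the commented-out default_args entry shows the alternative of defining it on the tasks instead:
from datetime import datetime, timedelta
from airflow.decorators import dag, task

@dag(
    start_date=datetime(2023, 1, 1),  # defined on the DAG...
    schedule_interval=None,
    max_active_runs=1,
    catchup=False,
    default_args={
        "retries": 1,
        "retry_delay": timedelta(minutes=3),
        # ...or propagated to every task instead:
        # "start_date": datetime(2023, 1, 1),
    },
)
def api_to_minio():  # placeholder DAG function
    @task
    def extract():
        return "placeholder"  # placeholder task body
    extract()

api_to_minio()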
I'm learning Airflow and just want to get up and running with the Quickstart: https://airflow.apache.org/docs/apache-airflow/stable/start.html
I'm not sure if this is a virtual environment issue or something I'm missing with Airflow that should be obvious. This may be a duplicate of this question from 2017, Running Airflow task from the command line does not work, but there were no answers there.
My OS is Pop!_OS (Debian-based).
I have created a new virtual environment and installed airflow by running the script provided:
# Airflow needs a home. `~/airflow` is the default, but you can put it
# somewhere else if you prefer (optional)
export AIRFLOW_HOME=~/airflow
# Install Airflow using the constraints file
AIRFLOW_VERSION=2.5.0
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
# For example: 3.7
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# For example: https://raw.githubusercontent.com/apache/airflow/constraints-2.5.0/constraints-3.7.txt
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
# The Standalone command will initialise the database, make a user,
# and start all components for you.
airflow standalone
# Visit localhost:8080 in the browser and use the admin account details
# shown on the terminal to login.
# Enable the example_bash_operator dag in the home page
I was expecting this to be plug-and-play and I haven't even reached the tutorials yet.
airflow standalone works and I can run the DAG from the web UI. However, if I run
# run your first task instance
airflow tasks run example_bash_operator runme_0 2015-01-01
from the CLI, I get
airflow.exceptions.DagRunNotFound: DagRun for example_bash_operator with run_id or execution_date of '2015-01-01' not found
full error:
(airflow) jasonstewartnz@pop-os:~$ airflow tasks run example_bash_operator runme_0 2015-01-01
[2022-12-22 11:11:18,776] {dagbag.py:538} INFO - Filling up the DagBag from /home/jasonstewartnz/airflow/dags
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): create_entry_group>, delete_entry_group already registered for DAG: example_complex
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): delete_entry_group>, create_entry_group already registered for DAG: example_complex
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): create_entry_gcs>, delete_entry already registered for DAG: example_complex
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): delete_entry>, create_entry_gcs already registered for DAG: example_complex
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): create_tag>, delete_tag already registered for DAG: example_complex
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): delete_tag>, create_tag already registered for DAG: example_complex
[2022-12-22 11:11:18,832] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
[2022-12-22 11:11:18,832] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/models/dag.py:3492 RemovedInAirflow3Warning: Param `schedule_interval` is deprecated and will be removed in a future release. Please use `schedule` instead.
[2022-12-22 11:11:18,914] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): prepare_email>, send_email already registered for DAG: example_dag_decorator
[2022-12-22 11:11:18,914] {taskmixin.py:205} WARNING - Dependency <Task(EmailOperator): send_email>, prepare_email already registered for DAG: example_dag_decorator
Traceback (most recent call last):
File "/home/jasonstewartnz/.venv/airflow/bin/airflow", line 8, in <module>
sys.exit(main())
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/__main__.py", line 39, in main
args.func(args)
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/cli/cli_parser.py", line 52, in command
return func(*args, **kwargs)
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/utils/cli.py", line 108, in wrapper
return f(*args, **kwargs)
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/cli/commands/task_command.py", line 384, in task_run
ti, _ = _get_ti(task, args.map_index, exec_date_or_run_id=args.execution_date_or_run_id, pool=args.pool)
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/utils/session.py", line 75, in wrapper
return func(*args, session=session, **kwargs)
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/cli/commands/task_command.py", line 159, in _get_ti
dag_run, dr_created = _get_dag_run(
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/cli/commands/task_command.py", line 115, in _get_dag_run
raise DagRunNotFound(
airflow.exceptions.DagRunNotFound: DagRun for example_bash_operator with run_id or execution_date of '2015-01-01' not found
The web UI tells me my config is /home/jasonstewartnz/airflow/airflow.cfg
The dags_folder in this config, /home/jasonstewartnz/airflow/dags, is empty.
When I go to
http://localhost:8080/dags/example_bash_operator/details
I see that the fileloc attribute for the dag is:
fileloc /home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/example_dags/example_bash_operator.py
Even if I copy this file to the DAGs directory, change the DAGs directory in the config to the above, or add it to the path, the CLI still seems unable to find the DAG.
Did you enable the DAG 'example_bash_operator' in the UI, as the instructions specify? I am referring to this step in the guide's instructions:
# Visit localhost:8080 in the browser and use the admin account details
# shown on the terminal to login.
# Enable the example_bash_operator dag in the home page
This should be done before you attempt to execute the airflow tasks run command.
I'm running Apache Airflow 2.x locally, using the Docker Compose file that is provided in the documentation. In the .\dags directory on my local filesystem (which is mounted into the Airflow containers), I create a new Python script file, and implement the DAG using the TaskFlow API.
The changes to my DAG are sometimes invalid. For example, maybe I have an ImportError due to an invalid module name, or a syntax error. When Airflow attempts to import the DAG, I cannot find any log messages from the web server, scheduler, or worker that would indicate a problem, or what the specific problem is.
Instead, I have to read through my code line-by-line, and look for a problem. This problem is compounded by the fact that my local Python environment on Windows 10, and the Python environment for Airflow, are different versions and have different Python packages installed. Hence, I cannot reliably use my local development environment to detect package import failures, because the packages I expect to be installed in the Airflow environment are different than the ones I have locally. Additionally, the version of Python I'm using to write code locally, and the Python version being used by Airflow, are not matched up.
Thus, I need some kind of error logging to indicate that a DAG import failed.
Question: When a DAG fails to update / import, where are the logs to indicate if an import failure occurred, and what the exact error message was?
Currently, the DAG parsing logs are under $AIRFLOW_HOME/logs/scheduler/EXECUTION_DATE/DAG_FILE.py.log
Example:
Let's say my DAG file is example-dag.py with the following contents; notice the typo in the datetime import:
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import dattime  # <-- This line has the typo

dag = DAG(
    dag_id='example_Dag',
    schedule_interval=None,
    start_date=datetime(2019, 2, 6),
)

t1 = BashOperator(
    task_id='print_date1',
    bash_command='sleep $[ ( $RANDOM % 30 ) + 1 ]s',
    dag=dag)
Now check the logs under $AIRFLOW_HOME/logs/scheduler/2021-04-07/example-dag.py.log, where $AIRFLOW_HOME/logs is the value of $AIRFLOW__LOGGING__BASE_LOG_FOLDER or [logging] base_log_folder in airflow.cfg (https://airflow.apache.org/docs/apache-airflow/2.0.1/configurations-ref.html#base-log-folder).
That file should have logs as below:
[2021-04-07 21:39:02,222] {scheduler_job.py:182} INFO - Started process (PID=686) to work on /files/dags/example-dag.py
[2021-04-07 21:39:02,230] {scheduler_job.py:633} INFO - Processing file /files/dags/example-dag.py for tasks to queue
[2021-04-07 21:39:02,233] {logging_mixin.py:104} INFO - [2021-04-07 21:39:02,233] {dagbag.py:451} INFO - Filling up the DagBag from /files/dags/example-dag.py
[2021-04-07 21:39:02,368] {logging_mixin.py:104} INFO - [2021-04-07 21:39:02,357] {dagbag.py:308} ERROR - Failed to import: /files/dags/example-dag.py
Traceback (most recent call last):
File "/opt/airflow/airflow/models/dagbag.py", line 305, in _load_modules_from_file
loader.exec_module(new_module)
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/files/dags/example-dag.py", line 3, in <module>
from datetime import dattime
ImportError: cannot import name 'dattime'
[2021-04-07 21:39:02,380] {scheduler_job.py:645} WARNING - No viable dags retrieved from /files/dags/example-dag.py
[2021-04-07 21:39:02,407] {scheduler_job.py:190} INFO - Processing /files/dags/example-dag.py took 0.189 seconds
You will also see the error surfaced in the webserver UI, as a "Broken DAG" import error message at the top of the DAGs page.
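If you prefer to check for import errors programmatically (for example, in CI or from a shell inside the Airflow container), here is a minimal sketch using Airflow's DagBag; the dag_folder path below is an assumption, so point it at your own dags directory:
from airflow.models import DagBag

# Parse the DAG files and collect any import errors without digging through the scheduler logs.
dag_bag = DagBag(dag_folder="/files/dags", include_examples=False)  # path is an assumption
for file_path, error in dag_bag.import_errors.items():
    print(f"Failed to import {file_path}:\n{error}\n")
Recent 2.x releases also ship an airflow dags list-import-errors CLI command that prints the same information.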
Having an issue doing a backfill. When I run this in the command-line
airflow backfill my_dag -s 2021-01-01 -e 2021-01-12
the DAG is triggered and begins running. The first task (a simple Python script with no external dependencies) completes, but in the second task I receive this error about the fernet key. The DAG is just calling a MsSqlOperator.
cryptography.fernet.InvalidToken
If I trigger the dag manually in the UI, all the steps succeed.
If I trigger one execution in the dag in the CLI, the dag succeeds.
airflow dags trigger -e '2021-01-19T04:00:00' my_dag
The fernet key is in the config file and we've already run resetdb and re-created the connections. The same issue persists: the backfill command doesn't work, but other methods do.
I also tried using the --local flag (not sure what it does), but that doesn't work either.
Any ideas how to troubleshoot?
Running Airflow 1.10.14 on-prem with LocalExecutor. Edit: the issue exists with 1.10.15 too.
Backfill Doc for Airflow 1.10.14
Logs:
Note: it says something about a missing variable, but that's misleading since it works when I manually run it from the UI. The key line is:
ERROR - Can't decrypt _val for key=xyz_users_overlap_import, FERNET_KEY configuration missing
Full log:
INFO - Job 4303: Subtask download_user_files
[2021-05-17 23:41:29,327] {{logging_mixin.py:120}} INFO - Running <TaskInstance: xyz_users_overlap_import.download_user_files 2021-01-22T15:10:00+00:00 [running]> on host da2m.mycorp.corp
[2021-05-17 23:41:29,360] {{variable.py:58}} ERROR - Can't decrypt _val for key=xyz_users_overlap_import, FERNET_KEY configuration missing
[2021-05-17 23:41:29,361] {{taskinstance.py:1150}} ERROR - 'Variable xyz_users_overlap_import does not exist'
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 965, in _run_raw_task
self.render_templates(context=context)
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1424, in render_templates
self.task.render_template_fields(context)
File "/usr/local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 719, in render_template_fields
self._do_render_template_fields(self, self.template_fields, context, jinja_env, set())
File "/usr/local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 726, in _do_render_template_fields
rendered_content = self.render_template(content, context, jinja_env, seen_oids)
File "/usr/local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 755, in render_template
return jinja_env.from_string(content).render(**context)
File "/usr/local/lib/python3.8/site-packages/jinja2/environment.py", line 1090, in render
self.environment.handle_exception()
File "/usr/local/lib/python3.8/site-packages/jinja2/environment.py", line 832, in handle_exception
reraise(*rewrite_traceback_stack(source=source))
File "/usr/local/lib/python3.8/site-packages/jinja2/_compat.py", line 28, in reraise
raise value.with_traceback(tb)
File "<template>", line 1, in top-level template code
File "/usr/local/lib/python3.8/site-packages/jinja2/environment.py", line 471, in getattr
return getattr(obj, attribute)
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1335, in __getattr__
self.var = Variable.get(item, deserialize_json=True)
File "/usr/local/lib/python3.8/site-packages/airflow/models/variable.py", line 117, in get
raise KeyError('Variable {} does not exist'.format(key))
KeyError: 'Variable xyz_users_overlap_import does not exist'
Have you tried generating a new FERNET_KEY and setting it as an environment variable (this way you will override the value from the airflow.cfg file)?
export AIRFLOW__CORE__FERNET_KEY=your_fernet_key
The code to generate a new one is:
from cryptography.fernet import Fernet
fernet_key = Fernet.generate_key()
print(fernet_key.decode())
Do not forget to install the cryptography package first (https://pypi.org/project/cryptography/): pip install cryptography.
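After exporting the key, a quick sanity check, sketched below, can confirm that Airflow picks it up and that the variable from the failing task decrypts (the variable name is taken from the error log above):
from airflow.configuration import conf
from airflow.models import Variable

# Confirm a fernet key is configured (AIRFLOW__CORE__FERNET_KEY overrides airflow.cfg).
print("fernet_key configured:", bool(conf.get("core", "fernet_key")))
# Note: values encrypted with the old/missing key cannot be decrypted by a brand-new key,
# so re-create the affected Variables/Connections if this still raises.
print(Variable.get("xyz_users_overlap_import", deserialize_json=True))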
I am trying to set up and understand a custom cluster policy. I'm not sure what I am doing wrong; however, following this does not work.
Airflow Version: 1.10.10
Expected result: it should throw exception if I try to run DAG with default_owner
Actual Result: no such exception
/root/airflow/config/airflow_local_settings.py
class PolicyError(Exception):
    pass

def cluster_policy(task):
    print("task_instance_mutation_hook")
    raise PolicyError

def task_instance_mutation_hook(ti):
    print("task_instance_mutation_hook")
    raise PolicyError
The /root/airflow/config/airflow_local_settings.pyc file is being created, so I know this file is being processed by Airflow.
If there is any compilation error in this file, all my DAGs fail; however, that does not happen with the file above.
Not sure what I am doing wrong.
This feature is only available from version 1.10.12 onwards.
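Once you are on a version that supports these hooks, here is a minimal sketch of an airflow_local_settings.py that rejects tasks still using the default owner. The owner check and messages are illustrative assumptions, not taken from the question; note that in the 1.10.x series the task policy hook must be named policy (Airflow 2.x renamed it to task_policy):
# airflow_local_settings.py -- a minimal sketch, assuming Airflow >= 1.10.12
class PolicyError(Exception):
    pass

def policy(task):
    # Task policy hook: called once per task when the DAG is parsed.
    if task.owner == "airflow":  # "airflow" is the default [operators] default_owner
        raise PolicyError(f"Task {task.task_id} still uses the default owner")

def task_instance_mutation_hook(ti):
    # Available from 1.10.12: called for each task instance right before it is queued.
    print(f"task_instance_mutation_hook called for {ti.task_id}")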
PendingDeprecationWarning: The requested task could not be added to the DAG because a task with task_id create_tag_template_field_result is already in the DAG. Starting in Airflow 2.0, trying to overwrite a task will raise an exception.
First, this is just a warning for now, but from Airflow 2.0 onwards it will raise an exception, which can break your pipeline if not handled (assuming you upgrade the airflow module).
The warning suggests that you are adding a task twice, or using the same task_id (create_tag_template_field_result) for two different tasks.
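For illustration, here is a minimal sketch of the pattern behind the warning and its fix; the DAG id, dates, commands, and the Airflow 2.x import path are placeholders and assumptions, not taken from the question:
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="duplicate_task_id_demo",  # placeholder DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
) as dag:
    first = BashOperator(
        task_id="create_tag_template_field_result",
        bash_command="echo first",
    )
    # Re-using task_id="create_tag_template_field_result" for this second task would
    # overwrite `first` and emit the warning (an exception from Airflow 2.0 onwards).
    # Give each task its own unique id instead:
    second = BashOperator(
        task_id="create_tag_template_field_result_2",
        bash_command="echo second",
    )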