Error while importing DAG file in Airflow 2.5.0

When I start my Airflow scheduler and webserver, my bigdata.py file is not getting imported; below is the error I am getting.
Broken DAG: [/home/adminn/airflow/dags/bigdata.py] Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'airflow.providers.apache'
This is the DAG I have written; am I missing something?
Here I am trying to pull a MySQL table into HDFS using Sqoop, and to schedule this operation with Airflow.
from airflow.models import DAG
from airflow.contrib.operators.sqoop_operator import SqoopOperator
from airflow.utils.dates import days_ago
Dag_Sqoop_Import = DAG(dag_id="SqoopImport",
                       schedule_interval="* * * * *",
                       start_date=days_ago(2))

sqoop_mysql_import = SqoopOperator(conn_id="sqoop_local",
                                   table="shipmethod",
                                   cmd_type="import",
                                   target_dir="/airflow_sqoopImport",
                                   num_mappers=1,
                                   task_id="SQOOP_Import",
                                   dag=Dag_Sqoop_Import)

sqoop_mysql_import
Replies appreciated, thanks.

From Airflow 2.x the providers are no longer included by default; you have to install them separately, and the import paths have changed.
In your case you have to:
install the Sqoop provider by running: pip install 'apache-airflow-providers-apache-sqoop'
change the import statement for the operator to: from airflow.providers.apache.sqoop.operators.sqoop import SqoopOperator (a corrected version of the DAG is sketched below the links)
Here is the full list of the providers available:
https://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html
And this is the provider you are looking for:
https://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html#apache-airflow-providers-apache-sqoop
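Putting the two changes together, a corrected version of the DAG from the question might look like the sketch below; everything except the import line is taken from the question, and the sqoop_local connection is assumed to already exist in your Airflow connections.
from airflow.models import DAG
from airflow.providers.apache.sqoop.operators.sqoop import SqoopOperator
from airflow.utils.dates import days_ago

# Same DAG as in the question; only the SqoopOperator import path has changed.
Dag_Sqoop_Import = DAG(dag_id="SqoopImport",
                       schedule_interval="* * * * *",
                       start_date=days_ago(2))

sqoop_mysql_import = SqoopOperator(conn_id="sqoop_local",
                                   table="shipmethod",
                                   cmd_type="import",
                                   target_dir="/airflow_sqoopImport",
                                   num_mappers=1,
                                   task_id="SQOOP_Import",
                                   dag=Dag_Sqoop_Import)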

Related

MS SQL Hook and Operators not importing into Airflow

I'm trying to import the MSSQL hook and operator into my DAG, but I keep getting this error from Airflow.
I'm currently importing with the newest syntax:
from airflow.providers.microsoft.mssql.hooks.mssql import MsSqlHook
from airflow.providers.microsoft.mssql.operators.mssql import MsSqlOperator
and I'm getting this import error:
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/opt/airflow/dags/hevo_dag.py", line 5, in <module>
from airflow.providers.microsoft.mssql.hooks.mssql import MsSqlHook
ModuleNotFoundError: No module named 'airflow.providers.microsoft.mssql'
To import the operator you need the following:
For Airflow>=2.0:
you need to use the MSSQL provider package:
pip install apache-airflow-providers-microsoft-mssql
For Airflow<2.0:
you need to use the MSSQL backport provider package:
pip install apache-airflow-backport-providers-microsoft-mssql
In your code the import path is the same regardless of the package. Airflow backported the provider to ease migration from Airflow 1 to Airflow 2, so upon upgrading you will not need to change the import paths.
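Once the right package is installed, a minimal usage sketch looks like the following; the DAG id, the mssql_default connection id, and the query are illustrative, not taken from the question.
from airflow import DAG
from airflow.providers.microsoft.mssql.operators.mssql import MsSqlOperator
from airflow.utils.dates import days_ago

with DAG(dag_id="mssql_example",
         schedule_interval=None,
         start_date=days_ago(1)) as dag:
    # Runs a simple query against the MSSQL connection configured in Airflow.
    check_connection = MsSqlOperator(
        task_id="check_connection",
        mssql_conn_id="mssql_default",  # illustrative connection id
        sql="SELECT 1;",
    )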

Import error DAG airflow cannot import name 'serialization'

I have written a DAG file which contains an SSH hook implementation.
I get an import error in the DAG file:
File "/usr/local/lib/python3.6/dist-packages/paramiko/transport.py", line 91, in
from paramiko.dsskey import DSSKey
File "/usr/local/lib/python3.6/dist-packages/paramiko/dsskey.py", line 25, in
from cryptography.hazmat.primitives import hashes, serialization
ImportError: cannot import name 'serialization'
How do I resolve this?
Thanks in advance,
Sundar

Log messages for DAG import errors in Airflow 2.x

I'm running Apache Airflow 2.x locally, using the Docker Compose file that is provided in the documentation. In the .\dags directory on my local filesystem (which is mounted into the Airflow containers), I create a new Python script file, and implement the DAG using the TaskFlow API.
The changes to my DAG are sometimes invalid. For example, maybe I have an ImportError due to an invalid module name, or a syntax error. When Airflow attempts to import the DAG, I cannot find any log messages from the web server, scheduler, or worker that would indicate a problem, or what the specific problem is.
Instead, I have to read through my code line-by-line, and look for a problem. This problem is compounded by the fact that my local Python environment on Windows 10, and the Python environment for Airflow, are different versions and have different Python packages installed. Hence, I cannot reliably use my local development environment to detect package import failures, because the packages I expect to be installed in the Airflow environment are different than the ones I have locally. Additionally, the version of Python I'm using to write code locally, and the Python version being used by Airflow, are not matched up.
Thus, I need some kind of error logging to indicate that a DAG import has failed.
Question: When a DAG fails to update / import, where are the logs to indicate if an import failure occurred, and what the exact error message was?
Currently, the DAG parsing logs would be under $AIRFLOW_HOME/logs/scheduler/EXECUTION_DATE/DAG_FILE.py.log
Example:
Let's say my DAG file is example-dag.py with the following contents; as you can notice, there is a typo in the datetime import:
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import dattime # <-- This Line has typo
dag = DAG(
    dag_id='example_Dag',
    schedule_interval=None,
    start_date=datetime(2019, 2, 6),
)

t1 = BashOperator(
    task_id='print_date1',
    bash_command='sleep $[ ( $RANDOM % 30 ) + 1 ]s',
    dag=dag)
Now, check the logs under $AIRFLOW_HOME/logs/scheduler/2021-04-07/example-dag.py.log, where $AIRFLOW_HOME/logs is what I have set in $AIRFLOW__LOGGING__BASE_LOG_FOLDER or [logging] base_log_folder in airflow.cfg (https://airflow.apache.org/docs/apache-airflow/2.0.1/configurations-ref.html#base-log-folder).
That file should have logs as below:
[2021-04-07 21:39:02,222] {scheduler_job.py:182} INFO - Started process (PID=686) to work on /files/dags/example-dag.py
[2021-04-07 21:39:02,230] {scheduler_job.py:633} INFO - Processing file /files/dags/example-dag.py for tasks to queue
[2021-04-07 21:39:02,233] {logging_mixin.py:104} INFO - [2021-04-07 21:39:02,233] {dagbag.py:451} INFO - Filling up the DagBag from /files/dags/example-dag.py
[2021-04-07 21:39:02,368] {logging_mixin.py:104} INFO - [2021-04-07 21:39:02,357] {dagbag.py:308} ERROR - Failed to import: /files/dags/example-dag.py
Traceback (most recent call last):
File "/opt/airflow/airflow/models/dagbag.py", line 305, in _load_modules_from_file
loader.exec_module(new_module)
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/files/dags/example-dag.py", line 3, in <module>
from datetime import dattime
ImportError: cannot import name 'dattime'
[2021-04-07 21:39:02,380] {scheduler_job.py:645} WARNING - No viable dags retrieved from /files/dags/example-dag.py
[2021-04-07 21:39:02,407] {scheduler_job.py:190} INFO - Processing /files/dags/example-dag.py took 0.189 seconds
and you will see the error reported in the webserver as well.
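If you prefer not to dig through the scheduler log files, a quick way to surface the same traceback is to load the DAG folder into a DagBag yourself, for example from a shell inside the scheduler container; the /files/dags path below matches the example above, so adjust it to your own mounted dags directory.
from airflow.models import DagBag

# Parse the DAG folder the same way the scheduler does and print any import errors.
dag_bag = DagBag(dag_folder="/files/dags", include_examples=False)
for filepath, stacktrace in dag_bag.import_errors.items():
    print(filepath)
    print(stacktrace)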

airflow initdb: cannot import name 'Pendulum' from 'pendulum'

I installed Airflow within one of my Anaconda envs, named engdados. When I execute the command airflow initdb I'm getting the following error: airflow initdb: cannot import name 'Pendulum' from 'pendulum'. The full traceback is shown below:
(engdados) guilherme@Athena-LNX:~$ airflow initdb
Traceback (most recent call last):
File "/home/guilherme/anaconda3/envs/engdados/bin/airflow", line 25, in <module>
from airflow.configuration import conf
File "/home/guilherme/anaconda3/envs/engdados/lib/python3.8/site-packages/airflow/__init__.py", line 47, in <module>
settings.initialize()
File "/home/guilherme/anaconda3/envs/engdados/lib/python3.8/site-packages/airflow/settings.py", line 403, in initialize
configure_adapters()
File "/home/guilherme/anaconda3/envs/engdados/lib/python3.8/site-packages/airflow/settings.py", line 319, in configure_adapters
from pendulum import Pendulum
ImportError: cannot import name 'Pendulum' from 'pendulum' (/home/guilherme/anaconda3/envs/engdados/lib/python3.8/site-packages/pendulum/__init__.py)
(engdados) guilherme@Athena-LNX:~$ service start mysql$
start: unrecognized service
(engdados) guilherme@Athena-LNX:~$ service mysql start$
Usage: /etc/init.d/mysql start|stop|restart|reload|force-reload|status|bootstrap
(engdados) guilherme@Athena-LNX:~$ airflow initdb
Traceback (most recent call last):
File "/home/guilherme/anaconda3/envs/engdados/bin/airflow", line 25, in <module>
from airflow.configuration import conf
File "/home/guilherme/anaconda3/envs/engdados/lib/python3.8/site-packages/airflow/__init__.py", line 47, in <module>
settings.initialize()
File "/home/guilherme/anaconda3/envs/engdados/lib/python3.8/site-packages/airflow/settings.py", line 403, in initialize
configure_adapters()
File "/home/guilherme/anaconda3/envs/engdados/lib/python3.8/site-packages/airflow/settings.py", line 319, in configure_adapters
from pendulum import Pendulum
ImportError: cannot import name 'Pendulum' from 'pendulum' (/home/guilherme/anaconda3/envs/engdados/lib/python3.8/site-packages/pendulum/__init__.py)
The problem is: pendulum is installed! When I execute the conda list command I can see pendulum there as follows:
Name Version Build Channel
pendulum 2.1.2 pypi_0 pypi
What I've checked so far:
Is the engdados environment activated? Yes
Is pendulum installed in the Anaconda environment? Yes
The version of pendulum that Anaconda shows (1.4.4) is different from the one shown by conda list. Why?
I have no idea what is going on. Thanks in advance.
In pendulum version 2, the class pendulum.Pendulum is replaced with pendulum.DateTime.
Your version of airflow is expecting pendulum 1.x but your environment has 2.x.
You may be able to fix this by making a new env and installing airflow 2.0 (which uses pendulum 2.x). If you must use airflow < 2.0, you will need to pin pendulum to < 2.0 (e.g. using pip constraints).
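A minimal sketch of the pinning approach via pip constraints (the constraints.txt file name is arbitrary):
echo "pendulum<2.0" > constraints.txt
pip install "apache-airflow<2.0" --constraint constraints.txt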
Also, if you are using Pendulum in your own code, for example in custom operators, you can add:
try:
    from pendulum import DateTime as Pendulum
except ImportError:
    from pendulum import Pendulum

Apache Airflow initdb command fails due to syntax error

I have created virtualenv for python3 using:
virtualenv -p $(which python3) ENV
Then activate it:
source /Users/myusername/ENV/bin/activate
Install apache-airflow:
pip install apache-airflow
then which airflow yields /Users/myusername/ENV/bin/airflow
But when I try to initdb using:
airflow initdb
I get the error below:
{db.py:350} INFO - Creating tables
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
WARNI [airflow.utils.log.logging_mixin.LoggingMixin] cryptography not found - values will not be stored encrypted.
ERROR [airflow.models.DagBag] Failed to import: /Library/Python/2.7/site-packages/airflow/example_dags/example_http_operator.py
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/airflow/models/__init__.py", line 413, in process_file
m = imp.load_source(mod_name, filepath)
File "/Library/Python/2.7/site-packages/airflow/example_dags/example_http_operator.py", line 27, in <module>
from airflow.operators.http_operator import SimpleHttpOperator
File "/Library/Python/2.7/site-packages/airflow/operators/http_operator.py", line 21, in <module>
from airflow.hooks.http_hook import HttpHook
File "/Library/Python/2.7/site-packages/airflow/hooks/http_hook.py", line 23, in <module>
import tenacity
File "/Library/Python/2.7/site-packages/tenacity/__init__.py", line 375, in <module>
from tenacity.tornadoweb import TornadoRetrying
File "/Library/Python/2.7/site-packages/tenacity/tornadoweb.py", line 24, in <module>
from tornado import gen
File "/Library/Python/2.7/site-packages/tornado-6.0.3-py2.7-macosx-10.14-intel.egg/tornado/gen.py", line 126
def _value_from_stopiteration(e: Union[StopIteration, "Return"]) -> Any:
^
SyntaxError: invalid syntax
Done.
(ENV) ---------------------------------------------------------
It seems the example scripts are being loaded with Python 2.7, which cannot recognize the function annotation syntax.
Does the apache-airflow package need to be fixed in the next release, or can I do something to fix this?
I tried fixing this:
Use python2.7 instead of python3
I then installed Airflow on the default Python 2.7 that comes with macOS, but this throws other errors, such as the package "six" not being compatible.
You need to turn off loading of the example DAGs in the config file to solve this problem.
Anyway, it seems weird that Airflow uses Python 2.7 when you said it is installed into a Python 3 virtual environment.
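For reference, turning the examples off means setting load_examples to False, either in airflow.cfg (located under your AIRFLOW_HOME, ~/airflow/airflow.cfg by default) or via the corresponding environment variable:
# In airflow.cfg:
[core]
load_examples = False

# Or as an environment variable:
export AIRFLOW__CORE__LOAD_EXAMPLES=False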
