Airflow DAG cannot import module in root folder

I have the following folder-structure
airflow/
|_dag/
As far as I understand, Airflow uses the "airflow" folder as the root, i.e. I assume that everything placed in "airflow" can be imported.
Say I have different projects with tasks placed in the following structure
airflow/
|_ dag/
|  |_ mydag.py
|
|_ myprojects/
   |_ projectone/
   |  |_ tasks/
   |     |_ mytask.py
   |_ projecttwo/
      |_ tasks/
         |_ mytask.py
then I would assume that in mydag.py I should be able to import mytask from a given project, like
# mydag.py
from myprojects.projectone import tasks
but I get a DAG import error: ModuleNotFoundError: No module named 'myprojects'.
Is this doable, or should I (somehow) change Airflow's PYTHONPATH (and if so, where is that done)?
Note, I have created __init__.py files in the folders.

One option is to extend sys.path just before importing from myprojects:
# mydag.py
import os
import sys

# The path must be inserted before `myprojects` is imported; the DAG file's own
# location is more reliable than "..", since the scheduler's working directory
# is not necessarily the dags folder.
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from myprojects.projectone import tasks
The second option is to move myproject under the dag folder:
airflow
+-- dag
    +-- myproject
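With that layout, the dag folder itself is on sys.path (Airflow adds the dags folder by default), so the DAG can import the project directly; a minimal sketch, keeping the answer's simplified myproject name:
# mydag.py
from myproject import tasks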
The third option is to move some of the logic into the plugins/ folder.
https://airflow.apache.org/docs/apache-airflow/stable/modules_management.html
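Since Airflow also adds AIRFLOW_HOME/plugins to sys.path, a task module dropped there can be imported by its top-level name. A minimal sketch, where the file name projectone_tasks.py is hypothetical:
# mydag.py -- assumes the task code was moved to
# AIRFLOW_HOME/plugins/projectone_tasks.py (hypothetical name)
from projectone_tasks import mytask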

Related

How to set up the right "plugins" folder in Airflow?

I'm using Airflow, and I want to import some plugins for my DAGs.
The problem comes into play when I try to use them, like so:
from dags_folder.plugins import fetchingData
I get this error:
Module name "dags_folder" not found!
(here is my layout)
I've set this up in airflow.cfg, but it doesn't work!
Note: I'm using a conda env.
You need to add the folder dags_folder to the Python path:
export PYTHONPATH=${PYTHONPATH}:/path/to/dags_folder
Then you can import fetchingData from plugins:
from plugins import fetchingData
(If you want to import it from dags_folder.plugins, you need to add the parent folder of dags_folder to the Python path instead.)
If you only need this plugin in one module, you can add the dags_folder to the Python path programmatically:
import sys
sys.path.append("/path/to/dags_folder")
from plugins import fetchingData

Airflow does not recognize local directory: ModuleNotFoundError: No module named

My project structure looks like:
my_project
|-- dags
|-- config
However, on the Airflow dashboard I see a Broken DAG error pointing to this line: from config.setup_configs import somemethod
which yields this error:
Broken DAG: [/usr/local/airflow/dags/airflow_foo.py] No module named 'config'
even though the directory exists.
According to the documentation, Airflow adds, by default, three directories to the path:
AIRFLOW_HOME/dags
AIRFLOW_HOME/config
AIRFLOW_HOME/plugins
Any other path has to be added to the system path, as described in Airflow module management.
For the sake of simplicity, I added my module mymodule.py to AIRFLOW_HOME/plugins, and I can import it successfully:
from mymodule import my_method
So, in your case, if you rename config to plugins and update the import statement in the DAG to
from setup_configs import somemethod
it should work.
You need to move the config directory into the dags folder and create an empty __init__.py file within the config folder. Then, it should work.
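In other words, a layout like this (a sketch based on the names in the question):
dags/
|-- airflow_foo.py
|-- config/
    |-- __init__.py
    |-- setup_configs.py
After that, from config.setup_configs import somemethod resolves, because the dags folder itself is on sys.path and config is now a package inside it.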

Python 3.x: import a function, config dictionary, ... etc. from a file in a different directory

The folder tree of my project is:
project
|-- src
    |-- dir_a
    |   |-- file_a.py
    |-- dir_b
        |-- file_b.py
I want to import a function, a config dictionary, etc. into file_a.py (the current file) from file_b.py.
I found many answers talking about packages and modules, but I don't know anything about them because I'm writing simple Python files. Moreover, I want to send this project to someone to use on their computer (running some .py files from the command line) without manually editing the system path or other heavy-handed solutions.
I would create a new package that has both dir_a and dir_b in it. Then you need an empty file __init__.py in every directory in the package.
src/
|_ pkg/
   |_ __init__.py
   |_ dir_a/
   |  |_ __init__.py
   |  |_ file_a.py
   |_ dir_b/
      |_ __init__.py
      |_ file_b.py
Then, provided there are no circular dependencies, in file_a.py you can put any of these:
import pkg.dir_b.file_b
# or
from pkg.dir_b.file_b import ...
# or
from pkg.dir_b import file_b
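For the run-from-the-command-line requirement, a minimal sketch (CONFIG is a hypothetical name exported by file_b.py): keep the import package-absolute and launch the file as a module from inside src, so that src lands on sys.path automatically and nobody has to edit paths by hand:
# src/pkg/dir_a/file_a.py (sketch)
from pkg.dir_b.file_b import CONFIG  # CONFIG is a hypothetical name

if __name__ == "__main__":
    # Run as:  cd src && python -m pkg.dir_a.file_a
    print(CONFIG)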

Writing and importing custom plugins in Airflow

This is actually two questions combined into one.
My AIRFLOW_HOME is structured like
airflow
+-- dags
+-- plugins
    +-- __init__.py
    +-- hooks
    |   +-- __init__.py
    |   +-- my_hook.py
    |   +-- another_hook.py
    +-- operators
    |   +-- __init__.py
    |   +-- my_operator.py
    |   +-- another_operator.py
    +-- sensors
    +-- utils
I've been following astronomer.io's examples here https://github.com/airflow-plugins. My custom operators use my custom hooks, and all the imports are relative to the top level folder plugins.
# my_operator.py
from plugins.hooks.my_hook import MyHook
However, when I tried moving my entire repository into the plugins folder, I got an import error after running airflow list_dags, saying that plugins cannot be found.
I read up a little about it and apparently Airflow loads the plugins into its core module so they can be imported like
# my_operator.py
from airflow.hooks.my_hook import MyHook
So I changed all the imports to read directly from airflow.plugin_type instead. I got another import error though, this time saying that my_hook cannot be found. I restarted my workers, scheduler and webserver each time, but that doesn't seem to be the issue. I've looked at the solutions proposed in similar questions, and they don't work either.
The official documentation (https://airflow.apache.org/plugins.html) also shows this way of extending the AirflowPlugin class, but I'm not sure where this "interface" should reside. I would also prefer a drag-and-drop option.
Finally, it clearly doesn't make sense for my code repo to be the plugins folder itself, but if I separate them testing becomes inconvenient. Do I have to modify my Airflow configurations to point to my repo every time I run unit tests on my hooks/ops? What are the best practices for testing custom plugins?
I figured this out by doing some trial and error. This is the final structure of my AIRFLOW_HOME folder
airflow
+-- dags
+-- plugins
    +-- __init__.py
    +-- plugin_name.py
    +-- hooks
    |   +-- __init__.py
    |   +-- my_hook.py
    |   +-- another_hook.py
    +-- operators
    |   +-- __init__.py
    |   +-- my_operator.py
    |   +-- another_operator.py
    +-- sensors
    +-- utils
In plugin_name.py, I extend the AirflowPlugin class
# plugin_name.py
from airflow.plugins_manager import AirflowPlugin
from hooks.my_hook import *
from operators.my_operator import *
from utils.my_utils import *
# etc

class PluginName(AirflowPlugin):
    name = 'plugin_name'
    hooks = [MyHook]
    operators = [MyOperator]
    macros = [my_util_func]
In my custom operators which use my custom hooks, I import them like
# my_operator.py
from hooks.my_hook import MyHook
Then in my DAG files, I can do
# sample_dag.py
from airflow.operators.plugin_name import MyOperator
It is necessary to restart the webserver and scheduler; that took me a while to figure out.
This also facilitates testing, since the imports within the custom classes are relative to the submodules within the plugins folder. I wonder if I can omit the __init__.py file inside plugins, but since everything is working I didn't try that.
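Building on that, a minimal testing sketch (the tests/ folder, conftest.py, and the test below are hypothetical, assuming pytest): put the plugins folder on sys.path in a conftest.py, so the same sub-module imports work outside Airflow without touching its configuration:
# tests/conftest.py (hypothetical)
import os
import sys

# Make plugins/ importable the same way Airflow's plugin loader sees it.
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "plugins"))

# tests/test_my_hook.py (hypothetical)
from hooks.my_hook import MyHook

def test_my_hook_is_importable():
    assert MyHook is not None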

pyinstaller can stop working correctly

Recently, I added a sub-directory and a few script files, with an empty __init__.py in the new sub-directory. There is no __init__.py in foo_main_dir.
foo_main_dir
|---- main.py
|---- foo_sub_dir
      |---- foo.py
Sometimes pyinstaller stopped working completely after I made some changes in foo_sub_dir, with an import error at run time of the executable:
ImportError: No module named foo
due to this import in main.py:
from foo_sub_dir.foo import FOO
No .pyc files were generated in either directory.
A workaround is to reinstall pyinstaller, but it is quite annoying.
I'd appreciate it if you know a permanent solution.
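One thing worth trying (my suggestion, not something from this thread): clear PyInstaller's cache and pin the extra import path explicitly at build time, rather than relying on whatever state a reinstall happens to reset:
pyinstaller --clean --paths foo_main_dir main.py
--clean removes PyInstaller's cache and temporary files before building, and --paths extends the module search path used during analysis (like PYTHONPATH).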
