Import a file from a DAG subfolder in Airflow

I am writing a Python script that performs some computation, which is why the script is kept separate from the DAG file. In that script, I have to read a file, but it fails with FileNotFoundError.
This is my directory:
dags/
- my_dag.py
- sub_folder/
  - __init__.py
  - my_functions.py
  - meta/
    - file.csv
my_functions.py contains the computation code my DAG needs. It has to read file.csv, located in the meta folder.
In my_functions.py, I wrote:
file_df = pd.read_csv('meta/file.csv')
But the file cannot be found.

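Use the AIRFLOW_HOME environment variable and provide the full path. A relative path such as 'meta/file.csv' is resolved against the working directory of the Airflow process, not against the folder that contains my_functions.py, which is why the file is not found.
import os
import pandas as pd
# AIRFLOW_HOME must be set in the environment of the scheduler/workers; os.getenv returns None otherwise.
AIRFLOW_HOME = os.getenv('AIRFLOW_HOME')
file_df = pd.read_csv(os.path.join(AIRFLOW_HOME, 'dags', 'sub_folder', 'meta', 'file.csv'))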
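If you would rather not depend on AIRFLOW_HOME being set, a common alternative (not part of the answer above) is to resolve the path relative to the module file itself via __file__. This is a minimal sketch, assuming meta/ sits inside sub_folder/ next to my_functions.py, as the answer's path implies; meta_path is a hypothetical helper name.

```python
import os

# Directory that contains this module (e.g. dags/sub_folder/). Unlike a relative
# path such as 'meta/file.csv', this does not depend on the working directory
# of the Airflow process that imports the module.
MODULE_DIR = os.path.dirname(os.path.abspath(__file__))


def meta_path(filename, base_dir=MODULE_DIR):
    """Return the absolute path of `filename` inside the meta/ folder (hypothetical helper)."""
    return os.path.join(base_dir, "meta", filename)


# In my_functions.py this would be used as:
# file_df = pd.read_csv(meta_path("file.csv"))
```

Because the path is anchored to the module's own location, it keeps working if the scheduler or a worker is started from a different directory.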

Related

PyInstaller module issues

I have a Python file that I am trying to turn into an executable using PyInstaller on my Mac. This file imports several other Python files. When I run the generated Unix executable, I get this error:
File "main/__init__.py", line 4, in <module>
ModuleNotFoundError: No module named 'game'
Line 4 reads:
from game.scripts.gui import creator
The command I used to create the executable:
pyinstaller __init__.py --onefile --clean --windowed
The directory:
__init__.py
game/
  scripts/
    gui/
      creator.py
Any ideas on how I could fix this? Thanks
The subdirectories are not included when the executable is built, so creator.py is not found inside your executable. To avoid that, you have to include the extra files/folders explicitly. This can be done in a *.spec file.
When you call pyinstaller with your *.py file, it creates a default *.spec file, which you can edit and reuse the next time you build your executable. Every option you used when calling
pyinstaller __init__.py --onefile --clean --windowed
is recorded there, so calling
pyinstaller *.spec
the next time gives the same result.
Edit this part of your spec file to fit your needs, copying single files or even whole folders with their contents into the executable:
a = Analysis(['your.py'],
             pathex=['.'],
             binaries=[],
             datas=[('some.dll', '.'),
                    ('configurationfile.ini', '.'),
                    ('data.xlsx', '.'),
                    ('../../anotherfile.pdf', '.')
                    ],
             ....some lines cut ....
a.datas += Tree('./thisfoldershouldbecopied', prefix='foldernameinexe')
More information is in the PyInstaller documentation on spec files and on including data files:
https://pyinstaller.readthedocs.io/en/stable/spec-files.html
and, for example, in this post:
Pyinstaller adding data files

Airflow does not recognize local directory: ModuleNotFoundError: No module named

My project structure looks like this:
my_project/
- dags/
- config/
However, on the Airflow dashboard I see a Broken DAG error pointing to this line:
from config.setup_configs import somemethod
which yields this error:
Broken DAG: [/usr/local/airflow/dags/airflow_foo.py] No module named 'config'
although the directory exists.
According to the documentation, Airflow adds three directories to sys.path by default:
AIRFLOW_HOME/dags
AIRFLOW_HOME/config
AIRFLOW_HOME/plugins
Any other path has to be added to the system path, as described in the Airflow documentation on module management.
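A path can be added either through the PYTHONPATH environment variable of the scheduler and workers, or at the top of the DAG file itself. This is a minimal sketch of the second option, assuming the DAG file sits directly in my_project/dags/ and that my_project/ is the directory that should be importable from; PROJECT_ROOT is an illustrative name.

```python
import os
import sys

# Assumed layout: my_project/dags/airflow_foo.py and my_project/config/setup_configs.py.
# The project root is the parent of the directory that contains this DAG file.
PROJECT_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

# Append the root only once, so 'config' can be found as a top-level package.
if PROJECT_ROOT not in sys.path:
    sys.path.append(PROJECT_ROOT)

# With the root on sys.path, the import from the question can be resolved:
# from config.setup_configs import somemethod
```

Setting PYTHONPATH keeps the DAG file free of path manipulation, while the in-file version travels with the DAG; which one fits better depends on how much control you have over the Airflow environment.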
For simplicity, I added my module mymodule.py to AIRFLOW_HOME/plugins, and I can import it successfully:
from mymodule import my_method
So, in your case, if you rename your config directory to plugins and update the import statement in the DAG to
from setup_configs import somemethod
it should work.
You need to move the config directory into the dags folder and create an empty __init__.py file within the config folder. Then, it should work.

What is the export command doing in the Apache Airflow setup?

I am following this tutorial:
https://towardsdatascience.com/getting-started-with-apache-airflow-df1aa77d7b1b
I run the export command as below:
export AIRFLOW_HOME='pwd' airflow_home
What is this export command doing? Does it create an environment variable AIRFLOW_HOME with the value pwd?
Is that the purpose?
When I run the next command, airflow initdb, it creates a folder called pwd inside my newly created project directory and puts the files in there.
Am I missing something here?
I am using a MacBook, Python 3.7 and Airflow 1.10.9.
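You need backticks (`) instead of single quotes ('). In *nix shells, `pwd` inside backticks is command substitution: it is replaced by the output of the pwd command, i.e. the current directory. Inside single quotes, 'pwd' is just the literal string pwd. That's why a folder called pwd is created instead of the current directory being used as the Airflow home.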
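The difference between the two quotings can be seen directly. This is a small Python sketch that runs echo through a POSIX shell (it assumes an `sh` executable is on the PATH, as on macOS and Linux); shell_echo is a hypothetical helper name.

```python
import os
import subprocess


def shell_echo(expr):
    """Run `echo <expr>` through sh and return what the shell printed."""
    result = subprocess.run(["sh", "-c", "echo " + expr],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Single quotes: no substitution, the shell prints the literal word pwd.
literal = shell_echo("'pwd'")
# Backticks: command substitution replaces `pwd` with the current directory.
substituted = shell_echo("`pwd`")
# $(pwd) is the equivalent POSIX command-substitution syntax.
modern = shell_echo("$(pwd)")
```

So in the export command, use the backtick form (or $(pwd)) rather than 'pwd' so that AIRFLOW_HOME receives an absolute directory instead of the relative name pwd.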

Which path will be added to sys.path when I run python3 src/b.py?

Which paths are added to sys.path when the command is run, and what factors affect this?
My Python version is 3.6.4, and I also tried it on version 3.7.
The directory structure is:
.
├── __init__.py
└── src
    ├── a.py
    └── b.py
The code is:
# a.py
class A: pass

# b.py
import sys
print(sys.path)
from src.a import A
a = A()
print(a)
I ran python3 src/b.py on two machines with the same Python version. On one of them it ran without error; on the other, an error occurred.
In the successful run, sys.path contains two relevant directories: the current directory and the src directory.
The correct output is:
['/home/work/test/testimport/src', '/home/work/test/testimport',...]
<src.a.A object at 0x7f8b71535ac8>
The wrong result is:
['/home/work/test/testimport/src', ...]
Traceback (most recent call last):
File "src/b.py", line 3, in <module>
from src.a import A
ModuleNotFoundError: No module named 'src'
sys.path contains only the src directory.
Which path will be added to sys.path when I run python3 src/b.py?
src is indeed not a regular package (it does not contain __init__.py), but that is not the real issue. When you run python3 src/b.py, Python puts the directory containing the script (src) at the front of sys.path, so b.py always "sees" the directory it is in. Therefore
from a import A
works no matter where you execute b.py from (e.g. python3 /path/to/src/b.py). Note that even if you did create
`src/__init__.py`
from src.a import A would still fail unless the parent directory of src is on your path (for example via PYTHONPATH, which is the recommended way to add Python modules to your path). On the machine where it works, /home/work/test/testimport is on sys.path, which is why src.a can be found there.
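To see which factor matters, this Python sketch recreates the question's layout in a temporary directory and runs python src/b.py twice: once with a clean PYTHONPATH and once with the project root on PYTHONPATH. It assumes Python 3.3+ (so src/ works as a namespace package without __init__.py); run_b is a hypothetical helper name.

```python
import os
import subprocess
import sys
import tempfile


def run_b(with_pythonpath):
    """Recreate src/a.py and src/b.py in a temp dir and run `python src/b.py` there."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "src")
        os.makedirs(src)
        with open(os.path.join(src, "a.py"), "w") as f:
            f.write("class A: pass\n")
        with open(os.path.join(src, "b.py"), "w") as f:
            # Print sys.path[0] (the script's directory), then import via the package name.
            f.write("import sys\nprint(sys.path[0])\nfrom src.a import A\nprint(A())\n")
        env = dict(os.environ)
        # Drop variables that change the module search path so the demo is reproducible.
        for var in ("PYTHONPATH", "PYTHONSAFEPATH"):
            env.pop(var, None)
        if with_pythonpath:
            # Put the project root (the parent of src/) on the module search path.
            env["PYTHONPATH"] = root
        result = subprocess.run([sys.executable, os.path.join("src", "b.py")],
                                cwd=root, env=env, capture_output=True, text=True)
        return result.returncode, result.stdout, result.stderr
```

Without PYTHONPATH, only the script's own directory (src) is added, so `from src.a import A` fails with No module named 'src'; with the project root on PYTHONPATH, the same command succeeds.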

PyInstaller for ARM 32-bit compilation

I am using Ubuntu as the host system. I compiled the bootloader for 32-bit manually and, with that bootloader, created a bundled Python executable, which I copied to my ARM target board.
The problem I am facing is that I cannot execute the binary file on my ARM board.
I am also unable to bundle my .csv files with the executable using --add-data. When the executable runs, it looks for my CSV files in the current folder and fails with a file-not-found error.
How can I add multiple files (CSV and INI) to my executable?
How can I fix this issue?
Regards
Rajalakshmi
To add data files, first pass them to PyInstaller with the --add-data flag. Then, because your data is extracted into a temporary directory at runtime, your app has to look for the files there. In the example below, every CSV file is opened through a resource_path function, which returns the full path for each relative path: inside the bundle it is joined to sys._MEIPASS, otherwise to the current directory.
I assume that you put all your files in a data directory next to your app.
app.py:
import os
import sys


def resource_path(relative_path):
    # In a PyInstaller bundle, data files are unpacked under sys._MEIPASS.
    if hasattr(sys, '_MEIPASS'):
        return os.path.join(sys._MEIPASS, relative_path)
    # When running as a normal script, resolve relative to the current directory.
    return os.path.join(os.path.abspath("."), relative_path)


if __name__ == "__main__":
    csv_files = ["data/a.csv", "data/b.csv", "data/c.csv"]
    print("Reading CSV files from data directory")
    for csv_file in csv_files:
        with open(resource_path(csv_file), "r") as f:
            # Print the name of the current file, not the whole list.
            print(csv_file, ":", f.read())
    print("Done!")
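You can then generate your executable with:
pyinstaller app.py -F --add-data "./data/*;data/"
The ; separator in --add-data is the Windows form. On Linux (such as the Ubuntu host here), separate source and destination with a colon instead: --add-data "./data/*:data/". The option can be repeated to bundle further files, such as your INI files.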
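The same resource_path idea covers the INI files the question asks about. This is a minimal sketch using the standard configparser module, assuming a hypothetical data/settings.ini bundled alongside the CSV files; load_settings is an illustrative name, not part of the answer above.

```python
import configparser
import os
import sys


def resource_path(relative_path):
    # Same behaviour as in app.py: use PyInstaller's extraction dir when bundled,
    # otherwise the current working directory.
    base = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base, relative_path)


def load_settings(relative_path="data/settings.ini"):
    """Read a bundled INI file; 'data/settings.ini' is a hypothetical file name."""
    parser = configparser.ConfigParser()
    path = resource_path(relative_path)
    # ConfigParser.read() silently skips missing files, so check the result explicitly.
    if not parser.read(path):
        raise FileNotFoundError(path)
    return parser
```

Checking the return value of read() matters here: without it, a missing INI file would produce an empty configuration instead of a clear error, which makes bundling mistakes harder to spot.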
