Airflow on AWS EKS, importing a custom package, ModuleNotFoundError - airflow

I've installed Airflow on AWS EKS, and things are working great.
Now I'm trying to add Slack alerts to my DAGs. My dag directory is like this:

dags
|----sample_slack.py
|----utils
|    |----__init__.py
|    |----alert.py

So I tried to use alert.py by adding this import to my DAG:
from utils.alert import SlackAlert
and the web UI shows an error:
Broken DAG: [/opt/airflow/dags/repo/dags/sample_slack.py] Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/opt/airflow/dags/repo/dags/sample_slack.py", line 8, in <module>
from utils.alert import SlackAlert
ModuleNotFoundError: No module named 'utils'
How can I make my DAGs able to import packages from the utils folder?
Update:
I deployed Airflow on Docker Desktop Kubernetes locally, and it works there.
Second update:
I'm using the gitSyncSidecar with persistence enabled. In the scheduler pod, I checked the dags path. Now I see that there's an auto-generated directory (maybe created by the git-sync sidecar?).
$ kubectl exec --stdin --tty airflow-scheduler-fc9c56d9c-ltql7 -- /bin/bash
Defaulted container "scheduler" out of: scheduler, git-sync, scheduler-log-groomer, wait-for-airflow-migrations (init), git-sync-init (init)
airflow@airflow-scheduler-fc9c56d9c-ltql7:/opt/airflow$ cd dags/
.git/ c5d7d684141f605142885d429e10ec3d81ca745b/ repo/
airflow@airflow-scheduler-fc9c56d9c-ltql7:/opt/airflow$ cd dags/c5d7d684141f605142885d429e10ec3d81ca745b/dags/utils/
airflow@airflow-scheduler-fc9c56d9c-ltql7:/opt/airflow/dags/c5d7d684141f605142885d429e10ec3d81ca745b/dags/utils$ ls
__init__.py alert.py
So in this environment, if I want to do what I'm trying to do, do I have to deploy Airflow, check the auto-generated directory's name, and use it in my DAG like this?
from c5d7d684141f605142885d429e10ec3d81ca745b.dags.utils.alert import SlackAlert

I solved my question with the help of Oluwafemi Sule's question.
I went into the pod and checked airflow.cfg: the default dags directory was set to '/opt/airflow/dags/repo', and my git repo starts under there. So I changed 'import utils' to 'import dags.utils' and now it finds the module correctly.
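For anyone hitting the same thing, the mechanics are simple: Airflow puts the configured dags folder (core.dags_folder) on sys.path, so imports in a DAG resolve relative to that root. A minimal stand-alone sketch (plain Python, no Airflow; file and class names mirror the question) of how the root on sys.path determines the import path:

```python
# Sketch: the package path you import from depends on which root
# directory is on sys.path (Airflow adds the dags folder for you).
import os
import sys
import tempfile

root = tempfile.mkdtemp()  # stands in for /opt/airflow/dags/repo
os.makedirs(os.path.join(root, "dags", "utils"))
open(os.path.join(root, "dags", "__init__.py"), "w").close()
open(os.path.join(root, "dags", "utils", "__init__.py"), "w").close()
with open(os.path.join(root, "dags", "utils", "alert.py"), "w") as f:
    f.write("class SlackAlert:\n    pass\n")

# With <root> on sys.path, the module must be addressed as dags.utils.alert;
# 'from utils.alert import ...' would only work with <root>/dags on sys.path.
sys.path.insert(0, root)
from dags.utils.alert import SlackAlert

print(SlackAlert.__name__)  # SlackAlert
```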

Related

No such file or directory - Airflow

I have my airflow project with the structure as below
airflow
|
|----dags
|    |----dag.py
|
|----dbt-redshift
     |----models
          |----model.sql
I have included the dbt-redshift directory in the volumes section as
volumes:
  - ./dbt-redshift:/opt/airflow/dbt-redshift
And I'm trying to run dbt inside the DAG using a BashOperator:
dbt_task = BashOperator(task_id='dbt', bash_command="cd ~/dbt-redshift && dbt run", dag=dag)
But when I execute the DAG I get the error:
cd: /home/***/dbt-redshift no such file or directory
I'm not sure I understand how these directories are located inside the airflow project.
You are mounting the volume inside the container at /opt/airflow/dbt-redshift, but the BashOperator references ~/dbt-redshift, with ~ resolving to /home/airflow.
(Assuming you are using the apache/airflow image)
Either change the command used by the BashOperator to reference /opt/airflow/dbt-redshift or change the volume to mount to the home directory.
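The underlying gotcha is just tilde expansion: the shell resolves ~ against $HOME of the user running the task, not against the directory where the volume is mounted. A stand-alone sketch using Python's os.path.expanduser (the /home/airflow value is an assumption based on the apache/airflow image):

```python
# Sketch: ~ expands to $HOME, which is why ~/dbt-redshift is not the
# same path as the volume mount at /opt/airflow/dbt-redshift.
import os

os.environ["HOME"] = "/home/airflow"  # assumed HOME inside the container
expanded = os.path.expanduser("~/dbt-redshift")
print(expanded)                                 # /home/airflow/dbt-redshift
print(expanded == "/opt/airflow/dbt-redshift")  # False
```

So pointing the command at the mount, e.g. bash_command="cd /opt/airflow/dbt-redshift && dbt run", sidesteps the expansion entirely.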

Google Dataflow: Import custom Python module

I'm trying to run an Apache Beam pipeline (Python) on Google Cloud Dataflow, triggered by a DAG in Google Cloud Composer.
The structure of my dags folder in the respective GCS bucket is as follows:
/dags/
    dataflow.py          <- DAG
    dataflow/
        pipeline.py      <- pipeline
        setup.py
        my_modules/
            __init__.py
            commons.py   <- the module I want to import in the pipeline
The setup.py is very basic, following the Apache Beam docs and answers on SO:
import setuptools
setuptools.setup(packages=setuptools.find_packages())
In the DAG file (dataflow.py) I set the setup_file option and pass it to Dataflow:
default_dag_args = {
    ...,
    'dataflow_default_options': {
        ...,
        'runner': 'DataflowRunner',
        'setup_file': os.path.join(configuration.get('core', 'dags_folder'), 'dataflow', 'setup.py')
    }
}
Within the pipeline file (pipeline.py) I try to use
from my_modules import commons
but this fails. The log in Google Cloud Composer (Apache Airflow) says:
gcp_dataflow_hook.py:132} WARNING - b' File "/home/airflow/gcs/dags/dataflow/dataflow.py", line 11\n from my_modules import commons\n ^\nSyntaxError: invalid syntax'
The basic idea behind the setup.py file is documented here
Also, there are similar questions on SO which helped me:
Google Dataflow - Failed to import custom python modules
Dataflow/apache beam: manage custom module dependencies
I'm actually wondering why my pipeline fails with a SyntaxError and not a module-not-found kind of error...
I tried to reproduce your issue and then solve it, so I created the same folder structure you already have:
/dags/
    dataflow.py
    dataflow/
        pipeline.py  <- pipeline
        setup.py
        my_modules/
            __init__.py
            commons.py
Therefore, to make it work, the change I made was to copy these folders to a place where the instance running the code can find them, for example the /tmp/ folder of the instance.
So, my DAG would be something like this:
1 - First of all, I declare my arguments:
default_args = {
    'start_date': datetime(xxxx, x, x),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'dataflow_default_options': {
        'project': '<project>',
        'region': '<region>',
        'stagingLocation': 'gs://<bucket>/stage',
        'tempLocation': 'gs://<bucket>/temp',
        'setup_file': <setup.py>,
        'runner': 'DataflowRunner'
    }
}
2 - After this, I created the DAG. Before running the Dataflow task, I copy the whole folder created above into the /tmp/ folder of the instance (task t1), and after this, I run the pipeline from the /tmp/ directory (task t2):
with DAG(
    'composer_df',
    default_args=default_args,
    description='dataflow dag',
    schedule_interval="xxxx") as dag:

    def copy_dependencies():
        process = subprocess.Popen(['gsutil', 'cp', '-r', 'gs://<bucket>/dags/*',
                                    '/tmp/'])
        process.communicate()

    t1 = python_operator.PythonOperator(
        task_id='copy_dependencies',
        python_callable=copy_dependencies,
        provide_context=False
    )

    t2 = DataFlowPythonOperator(task_id="composer_dataflow",
                                py_file='/tmp/dataflow/pipeline.py',
                                job_name='job_composer')

    t1 >> t2
That's how I created the DAG file dataflow.py; then, in pipeline.py, the package import would be:
from my_modules import commons
It should work fine, since the folder structure is now visible to the VM.
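One detail worth double-checking in the setup.py above: find_packages() only picks up directories that contain an __init__.py, and its result must be passed to setup() via the packages= keyword. For intuition, here is a simplified stdlib-only stand-in for what find_packages() does (not the real setuptools implementation):

```python
# Simplified sketch of setuptools.find_packages(): walk a root and
# collect directories that contain an __init__.py.
import os
import tempfile

def find_packages_like(where):
    packages = []
    for dirpath, dirnames, filenames in os.walk(where):
        if "__init__.py" in filenames:
            rel = os.path.relpath(dirpath, where)
            packages.append(rel.replace(os.sep, "."))
    return sorted(packages)

# Mimic the layout from the question: <root>/my_modules/__init__.py
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "my_modules"))
open(os.path.join(root, "my_modules", "__init__.py"), "w").close()

print(find_packages_like(root))  # ['my_modules']
```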

Python3.6 error: ModuleNotFoundError: No module named 'src'

I know similar questions have been asked before... But I had a quick doubt...
I have been following this link: https://www.python-course.eu/python3_packages.php
my code structure:
my-project
-- __init__.py
-- src
   -- __init__.py
   -- file1.py
-- test
   -- __init__.py
   -- test_file1.py
test_file1.py:
import unittest
from src.file1 import *

class TestWriteDataBRToOS(unittest.TestCase):
    def test_getData(self):
        sampleData = classInFile1()
        sampleData.getData()
        self.assertNotEqual(sampleData.usrname, "")

if __name__ == '__main__':
    unittest.main()
Here I get the error:
ModuleNotFoundError: No module named 'src'
If I change it to:
import sys
sys.path.insert(0, '../src')
import unittest
from file1 import *
then it works!
Can someone help me understand why it doesn't work the way it is described in the link pasted above, or suggest an alternative to writing the sys.path.insert(0, '../src') statement.
Thanks!
Edit:
after executing python -m unittest test/test_file1/TestWriteDataBRToOS from the my-project dir, I am getting the error as updated below.
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/lib/python2.7/unittest/__main__.py", line 12, in <module>
main(module=None)
File "/usr/lib/python2.7/unittest/main.py", line 94, in __init__
self.parseArgs(argv)
File "/usr/lib/python2.7/unittest/main.py", line 149, in parseArgs
self.createTests()
File "/usr/lib/python2.7/unittest/main.py", line 158, in createTests
self.module)
File "/usr/lib/python2.7/unittest/loader.py", line 130, in loadTestsFromNames
suites = [self.loadTestsFromName(name, module) for name in names]
File "/usr/lib/python2.7/unittest/loader.py", line 91, in loadTestsFromName
module = __import__('.'.join(parts_copy))
ImportError: Import by filename is not supported.
You have to run the test from the my-project folder, rather than from the test folder.
python -m unittest test.test_file1.TestWriteDataBRToOS
Alternatively, you can do:
export PYTHONPATH="${PYTHONPATH}:/path/to/your/project/"
or on Windows:
set PYTHONPATH=%PYTHONPATH%;C:\path\to\your\project
This is because it is not able to locate the module named 'src', probably because the path to the 'src' folder isn't specified correctly. If you directly write src.file1, then the 'src' folder should be present in the same directory as the current python file (test_file1.py). If it isn't in the same directory, then you have to specify the entire directory. The reason sys.path.insert(0, '../src') worked is that .. moves you up one directory, and that's where your src folder is. If your working directory for test_file1.py is /usr/bin/python, then the expected location of your src folder would be /usr/bin/src, and since that isn't the same as your current working directory, the python file is not able to locate the 'src' module.
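You can reproduce the whole situation stand-alone: build the same layout in a temp directory and run unittest from the project root with a dotted name (the file contents here are hypothetical, mirroring the question):

```python
# Sketch: 'python -m unittest' puts the current directory on sys.path,
# so dotted names like test.test_file1 resolve when run from the root.
import os
import subprocess
import sys
import tempfile

root = tempfile.mkdtemp()
for pkg in ("src", "test"):
    os.makedirs(os.path.join(root, pkg))
    open(os.path.join(root, pkg, "__init__.py"), "w").close()
with open(os.path.join(root, "src", "file1.py"), "w") as f:
    f.write("def get_data():\n    return 'data'\n")
with open(os.path.join(root, "test", "test_file1.py"), "w") as f:
    f.write(
        "import unittest\n"
        "from src.file1 import get_data\n"
        "class TestFile1(unittest.TestCase):\n"
        "    def test_get_data(self):\n"
        "        self.assertEqual(get_data(), 'data')\n"
    )

# Run from the project root; 'from src.file1 import ...' now resolves.
result = subprocess.run(
    [sys.executable, "-m", "unittest", "test.test_file1"],
    cwd=root, capture_output=True, text=True,
)
print(result.returncode)  # 0 -> import and test both succeeded
```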
My folder structure was similar, with the module under src/utils.
While I was trying to import a function present in common.py into training.py, I got an error saying the src module could not be found.
The code was: from src.utils.common import read_config
Then I tried this: from utils.common import read_config
It worked 😊
If you also run two commands as follows:
set -a
source .env
it will work. Just make sure that you are running these commands in the root directory.

appengine python remote_api module object has no attribute GoogleCredentials

AttributeError: 'module' object has no attribute 'GoogleCredentials'
I have an appengine app which is running on localhost.
I have some tests which I run, and I want to use the remote_api to check the db values.
When I try to access the remote_api by visiting:
'http://127.0.0.1:8080/_ah/remote_api'
I get a:
"This request did not contain a necessary header"
but it's working in the browser.
When I now try to call the remote_api from my tests by calling
remote_api_stub.ConfigureRemoteApiForOAuth('localhost:35887','/_ah/remote_api')
i get the error:
Error
Traceback (most recent call last):
File "/home/dan/src/gtup/test/test_users.py", line 38, in test_crud
remote_api_stub.ConfigureRemoteApiForOAuth('localhost:35887','/_ah/remote_api')
File "/home/dan/Programs/google-cloud-sdk/platform/google_appengine/google/appengine/ext/remote_api/remote_api_stub.py", line 747, in ConfigureRemoteApiForOAuth
credentials = client.GoogleCredentials.get_application_default()
AttributeError: 'module' object has no attribute 'GoogleCredentials'
I did try to reinstall the whole Google Cloud SDK, but this didn't work.
When I open the client.py
google-cloud-sdk/platform/google_appengine/lib/google-api-python-client/oauth2client/client.py
which is used by remote_api_stub.py, I can see that there is no GoogleCredentials class inside of it.
The GoogleCredentials class exists, but inside of other client.py files which lie at:
google-cloud-sdk/platform/google_appengine/lib/oauth2client/oauth2client/client.py
google-cloud-sdk/platform/gsutil/third_party/oauth2client/oauth2client/client.py
google-cloud-sdk/platform/bq/third_party/oauth2client/client.py
google-cloud-sdk/lib/third_party/oauth2client/client.py
my app.yaml looks like this:
application: myapp
version: 1
runtime: python27
api_version: 1
threadsafe: true
libraries:
- name: webapp2
  version: latest
builtins:
- remote_api: on
handlers:
- url: /.*
  script: main.app
Is this just a wrong import/bug inside of appengine, or am I doing something wrong in using the remote_api inside my unit tests?
I solved this problem by replacing the folder:
../google-cloud-sdk/platform/google_appengine/lib/google-api-python-client/oauth2client
with:
../google-cloud-sdk/platform/google_appengine/lib/oauth2client/oauth2client
The one which gets included in the google-api-python-client folder now has the needed GoogleCredentials class in its client file.
Then I had a second problem with the connection, and now I have to call:
remote_api_stub.ConfigureRemoteApiForOAuth('localhost:51805','/_ah/remote_api', False)
Note: the port changes every time the server gets restarted.
Answering instead of commenting as I cannot post a comment with my reputation -
Similar things have happened to me when running these types of scripts on Mac. Sometimes your PATH variable gets confused as to which files to actually check for functions, especially when you have gcloud installed alongside the App Engine launcher. If on Mac, I would suggest editing your ~/.bash_profile file to fix this (or possibly ~/.bashrc if on Linux). For example, on my Mac I have the following lines to fix my PATH variable:
export PATH="/usr/local/bin:$PATH"
export PYTHONPATH="/usr/local/google_appengine:$PYTHONPATH"
These basically make sure the python / command line will look in /usr/local/bin (or /usr/local/google_appengine in the case of the PYTHONPATH line) BEFORE anything else in the PATH (or PYTHONPATH).
The PATH variable is where the command line checks for python files when you type them into the prompt. The PYTHONPATH is where your python files find the modules to load at runtime.
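The effect of PYTHONPATH is easy to verify from Python itself: its entries are inserted into sys.path right after the script's own directory, ahead of the standard locations. A small sketch (the google_appengine path is just the example from above):

```python
# Sketch: PYTHONPATH entries land near the front of sys.path in a
# child interpreter, so they shadow later entries.
import os
import subprocess
import sys

env = dict(os.environ, PYTHONPATH="/usr/local/google_appengine")
first_entry = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.path[1])"],
    env=env, capture_output=True, text=True,
).stdout.strip()
print(first_entry)  # /usr/local/google_appengine
```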

django filer - ImportError: cannot import name mixins

I have a Django 1.4 installation and I have django-cms running.
I'm trying to install filer, but when I run syncdb or runserver I keep getting this error.
from filer.models import mixins
ImportError: cannot import name mixins
In my settings.py I have:
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    # Uncomment the next line to enable the admin:
    'django.contrib.admin',
    'cms',
    'mptt',
    'menus',
    'south',
    'sekizai',
    'ftlom',
    #'cms.plugins.text',
    #'cms.plugins.picture',
    'cmsplugin_twitter',
    'easy_thumbnails',
    'filer',
    #'ordered_model',
    #'cmsplugin_filer_file',
    #'cmsplugin_filer_folder',
    'cmsplugin_filer_image',
    #'cmsplugin_gallery',
    'cms.plugins.video',
    #'gunicorn',
    # Uncomment the next line to enable admin documentation:
    # 'django.contrib.admindocs',
)
If I remove filer and all its plugins my site works fine. What could possibly cause the problem? Thanks.
Traceback:
Validating models...
Unhandled exception in thread started by <bound method Command.inner_run of <django.contrib.staticfiles.management.commands.runserver.Command object at 0x1050d4fd0>>
Traceback (most recent call last):
File "/Users/Alex/.virtualenvs/FTLOM/lib/python2.7/site-packages/django/core/management/commands/runserver.py", line 91, in inner_run
self.validate(display_num_errors=True)
File "/Users/Alex/.virtualenvs/FTLOM/lib/python2.7/site-packages/django/core/management/base.py", line 266, in validate
num_errors = get_validation_errors(s, app)
File "/Users/Alex/.virtualenvs/FTLOM/lib/python2.7/site-packages/django/core/management/validation.py", line 30, in get_validation_errors
for (app_name, error) in get_app_errors().items():
File "/Users/Alex/.virtualenvs/FTLOM/lib/python2.7/site-packages/django/db/models/loading.py", line 158, in get_app_errors
self._populate()
File "/Users/Alex/.virtualenvs/FTLOM/lib/python2.7/site-packages/django/db/models/loading.py", line 67, in _populate
self.load_app(app_name)
File "/Users/Alex/.virtualenvs/FTLOM/lib/python2.7/site-packages/django/db/models/loading.py", line 88, in load_app
models = import_module('.models', app_name)
File "/Users/Alex/.virtualenvs/FTLOM/lib/python2.7/site-packages/django/utils/importlib.py", line 35, in import_module
__import__(name)
File "/Users/Alex/.virtualenvs/FTLOM/lib/python2.7/site-packages/filer/models/__init__.py", line 2, in <module>
from filer.models.clipboardmodels import *
File "/Users/Alex/.virtualenvs/FTLOM/lib/python2.7/site-packages/filer/models/clipboardmodels.py", line 5, in <module>
from filer.models import filemodels
File "/Users/Alex/.virtualenvs/FTLOM/lib/python2.7/site-packages/filer/models/filemodels.py", line 8, in <module>
from filer.models import mixins
ImportError: cannot import name mixins
This is my pip freeze:
Django==1.4
PIL==1.1.7
South==0.8.1
cmsplugin-filer==0.9.5
cmsplugin-twitter==1.0.4
django-classy-tags==0.4
django-cms==2.4.1
django-filer==0.9.3
django-mixins==0.0.10
django-mptt==0.5.2
django-ordered-model==0.2.0
django-polymorphic==0.5
django-sekizai==0.7
easy-thumbnails==1.3
gunicorn==17.5
html5lib==1.0b1
six==1.3.0
wsgiref==0.1.2
You are running Django version 1.4.1 or lower, which you can check by any of these methods:
on the bash
pip freeze | grep Django | awk 'BEGIN { FS = "==" } ; { print $2 }'
on the bash, if django-admin has been properly added to the path
django-admin --version
in the Django shell
import django
django.VERSION #or
django.get_version()
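On modern Python (3.8+) there is also a stdlib way to read any installed distribution's version, importlib.metadata, which avoids parsing pip freeze (shown here with a generic helper; not specific to Django):

```python
# Sketch: query an installed distribution's version via stdlib
# importlib.metadata (Python 3.8+) instead of parsing 'pip freeze'.
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

print(installed_version("Django"))  # e.g. '1.4', or None if not installed
```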
Although not specifically asserted, it seems that django-filer 0.9.x is incompatible with Django versions below 1.4.1.
Fixing the mixins problem won't help:
First off, django-mixins is a different project, and its namespace is neither used nor conflicting.
django-filer contains a module mixins, which is more of a stub for icon loading at the moment, rather than anything of genuine importance.
You could change the foldermodels.py and filemodels.py:
from filer.models import mixins
=> to
import filer.models.mixins
Yet after resolving the mixins issue, you will still struggle with import difficulties around django.utils.six.
Part of the reason is that utils.six, a Python 2/3 compatibility layer, was only added in Django 1.4.2.
Solution
You can see from https://github.com/stefanfoulis/django-filer/blob/develop/HISTORY that installing version 0.8.7 could work. First remove the newer django-filer package though.
pip uninstall django-filer
easy_install django-filer==0.8.7
Indeed this version works well on Django version 1.4.1
Note: if you installed django-mixins by mistake, remove it as well. It is good practice not to keep unused and non-dependent Django modules around.
I ran into the same problem. The problem was hard to find but simple to solve.
After installing django-polymorphic, everything worked fine.
