How to use requirements.txt file in Airflow's PythonVirtualenvOperator? - airflow

In Airflow 2.2.3, PythonVirtualenvOperator was updated to allow templated requirements.txt files in the requirements parameter. However, I am unable to get that parameter to work. I am seeking guidance on how to properly use a requirements.txt file with Airflow's PythonVirtualenvOperator.
Here's my /dags format:
dags
├── .airflowignore
├── daily.py
└── modules
    └── monday
        ├── monday.py
        └── requirements.txt
In daily.py, I define my daily DAG. The task I'd like to reference requirements.txt for is defined like so:
@task.virtualenv(requirements='modules/monday/requirements.txt')
def sync_board_items():
    from modules.monday.monday import sync_board_items
    import logging
    logging.basicConfig(level=logging.INFO)
    sync_board_items(board_id=XXXX, table=XXXX)
This seems to fit the implementation described on GitHub, because requirements is a string, not a list, and it matches the *.txt template pattern. However, when the task runs, I quickly receive an error:
Executing cmd: /tmp/venvfn63dy3c/bin/pip install m o d u l e s / m o n d a y / r e q u i r e m e n t s . t x t
ERROR: Directory '/' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.
This seems to indicate that PythonVirtualenvOperator is treating my requirements param like a list instead of a string. In other words, I am doing something wrong and the PythonVirtualenvOperator is not properly handling my requirements.txt file. What am I missing or how can I leverage a requirements.txt file in Airflow's PythonVirtualenvOperator?

The feature is available only for Airflow>=2.3.0; it's not available in 2.2.3.
Currently 2.3.0 is under testing; 2.3.0b1 is available via pip.
If you want to use this feature in 2.2.3, you will need to create a custom operator by backporting the code in the PR.
Creating a MyPythonVirtualenvOperator with the code from the PR and its utils should work fine in 2.2.3.
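If backporting the PR feels like too much, another option (not part of the answer above, just a sketch of an alternative) is to stay on 2.2.3 and parse the file yourself at DAG-parse time, passing the resulting list of requirement strings, which the 2.2.3 operator already accepts. load_requirements and the absolute path below are illustrative names, not Airflow APIs:

from pathlib import Path
from airflow.decorators import task

def load_requirements(path):
    """Return the non-empty, non-comment lines of a requirements.txt file as a list."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines
            if line.strip() and not line.strip().startswith("#")]

@task.virtualenv(requirements=load_requirements("/opt/airflow/dags/modules/monday/requirements.txt"))
def sync_board_items():
    # same body as in the question
    ...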

Related

Usage of pyinstaller with opencl

What is the correct way of converting a Python script to a binary using pyinstaller and opencl? How do I set up the name.spec file? See my pyinstaller settings below:
Usage: pyinstaller --clean --noconfirm --log-level=DEBUG --onefile --paths=/home/testuser/projects/tool --paths=/usr/local/lib/python3.8/dist-packages/pyopencl-2018.2.2-py3.8-linux-x86_64.egg --hidden-import=pyopencl --name=toolexe tool.py
From my main script (myscript.py), I am calling "from tool import *", where tool is my package, and it calls "import pyopencl as cl" in several tool package scripts. When I run the compiled binary version of my scripts, I see "pyopencl/__init__.py" being called multiple times (see diagram below).
I am wondering if it's the star import of tool that's screwing up the binary version. Note: this code does work when run as a Python script.
#myscript -- "import tool as *"
└── tool
├── common.py ... #"import pyopencl as cl"
├── engine.py ... #"import pyopencl as cl"
├── traffic.py... #"import pyopencl as cl"
├── init.py
I found two issues with my conversion of Python scripts to a binary:
1. The import of tool/tool.py was wrong; you can't have an imported script with the same name as the directory (package) it's in.
2. On Linux, there is a caveat about multiprocessing start methods when converting to a binary (https://docs.python.org/3/library/multiprocessing.html):
Warning: The 'spawn' and 'forkserver' start methods cannot currently be used with "frozen" executables (i.e., binaries produced by packages like PyInstaller and cx_Freeze) on Unix. The 'fork' start method does work.
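For the second point, here is a minimal sketch of what the entry point could do, assuming the tool actually uses multiprocessing (the surrounding entry-point code is illustrative):

import multiprocessing as mp

if __name__ == "__main__":
    mp.freeze_support()          # no effect on Unix, required for frozen Windows builds
    mp.set_start_method("fork")  # 'spawn'/'forkserver' are unusable in frozen executables on Unix
    # ... rest of the tool's entry point ...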

Which path will be appended to sys.path when I run python3 src/b.py?

Which paths are added to sys.path when the command is run, and what are the factors that affect it?
My Python version is 3.6.4, and I also tried it on version 3.7.
Directory structure is
.
├── __init__.py
└── src
    ├── a.py
    └── b.py
code is
# a.py
class A: pass
# b.py
import sys
print(sys.path)
from src.a import A
a = A()
print(a)
I tried to run python3 src/b.py on two machines with the same Python version. One of them did not report an error, while on the other an error occurred.
In the correct run, there are two directories in sys.path: one is the current directory and the other is the src directory.
The correct output is:
['/home/work/test/testimport/src', '/home/work/test/testimport',...]
<src.a.A object at 0x7f8b71535ac8>
The wrong result is:
['/home/work/test/testimport/src', ...]
Traceback (most recent call last):
  File "src/b.py", line 3, in <module>
    from src.a import A
ModuleNotFoundError: No module named 'src'
sys.path contains only the src directory.
Which path will be appended to sys.path when I run python3 src/b.py?
src is indeed not a package (it does not contain __init__.py), so it does not matter whether it is on your path or not. In addition, b.py "sees" the directory it is in (src) anyway, so
from a import A
would work no matter where you execute b.py from (python3 /path/to/src/b.py should work). Note that even if you did create
`src/__init__.py`
your b.py would still fail if you did not add the directory containing src to your path (or to PYTHONPATH, which is the recommended way to add Python modules to your path).
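A minimal sketch of the two usual fixes, assuming the layout above and that src/__init__.py exists; the PYTHONPATH value and the sys.path manipulation are illustrative, not the only way to do it:

# Option 1: run with the project root on PYTHONPATH:
#     PYTHONPATH=/home/work/test/testimport python3 src/b.py
# Option 2: prepend the parent of src/ to sys.path at the top of b.py:
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from src.a import A  # resolvable now, because the directory containing src/ is on sys.path

a = A()
print(a)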

Cannot collect patch dependency on a local Artifactory Pypi repository

While testing out conan, I had to "pip install" it.
As I am working in a fully offline environment, my expectation was that I could simply:
1. Manually deploy all dependencies listed in https://github.com/conan-io/conan/blob/master/conans/requirements.txt to a local Pypi repository called myrepo-python
2. Install conan with
pip3 install --index http://myserver/artifactory/api/pypi/myrepo-python/simple conan
This works fine for some packages and then fails for the dependency on patch == 1.16
[...]
Collecting patch==1.16 (from conan)
Could not find a version that satisfies the requirement patch==1.16 (from conan) (from versions: )
No matching distribution found for patch==1.16 (from conan)
Looking into the Artifactory logs shows that even though I manually deployed patch-1.16.zip (from https://pypi.org/project/patch/1.16/#files) into the repository, it is not present in the index:
The .pypi/simple.html file doesn't contain an entry for 'patch' (checked from the Artifactory UI)
The server logs ($ARTIFACTORY_HOME/logs/artifactory.log) show the file being added to the indexing queue but don't contain a line saying that it got indexed
Does anyone know why patch-1.16.zip is not indexed by Artifactory?
This is on Artifactory 5.8.4.
For now, my only workaround is to gather all the dependencies into a local path and point pip3 at it
scp conan-1.4.4.tar.gz installserver:/tmp/pip_cache
[...]
scp patch-1.16.zip installserver:/tmp/pip_cache
[...]
scp pyparsing-2.2.0-py2.py3-none-any.whl installserver:/tmp/pip_cache
ssh installserver
installserver$ pip3 install --no-index --find-links="/tmp/pip_cache" conan
The reason you are unable to install the "patch" Pypi package via Artifactory is that it does not comply with the Python spec.
Based on Python spec (https://www.python.org/dev/peps/pep-0314/ and https://packaging.python.org/tutorials/packaging-projects/), the structure of a Python package should be, for example:
└── patch-1.16.zip
    └── patch-1.16
        ├── PKG-INFO
        ├── __main__.py
        ├── patch.py
        └── setup.py
However, the zip archive (which can be found at https://pypi.org/project/patch/1.16/#files) is structured like this:
└── patch-1.16.zip
    ├── PKG-INFO
    ├── __main__.py
    ├── patch.py
    └── setup.py
Artifactory searches for the metadata file (PKG-INFO in this case) in a certain pattern (inside any folder). Since PKG-INFO is in the root of the zip (and not inside a folder), it cannot be found; therefore, this package's metadata will not be calculated and it will not appear in the "simple" index file (see the error in artifactory.log). As a result, you are unable to install it with pip.
Workaround:
What you can do is manually change the structure to the correct one.
Create a folder named patch-1.16 and extract the zip into it. Then zip the whole folder, so you get the structure shown in the example above. Finally, deploy this zip to Artifactory.
This time the PKG-INFO file will be found, the metadata will be calculated, and pip will be able to install it.
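If you prefer to script the repacking instead of doing it by hand, here is a small sketch in Python (the file names are illustrative) that rewrites the archive so every member sits under a patch-1.16/ top-level folder:

import zipfile

with zipfile.ZipFile("patch-1.16.zip") as src, \
        zipfile.ZipFile("patch-1.16-fixed.zip", "w", zipfile.ZIP_DEFLATED) as dst:
    for info in src.infolist():
        # copy each member under a patch-1.16/ prefix so PKG-INFO ends up inside a folder
        dst.writestr("patch-1.16/" + info.filename, src.read(info.filename))

Then deploy patch-1.16-fixed.zip to Artifactory in place of the original archive.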

How to install flyway DB migration tool in CentOS?

I am trying to install Flyway on a CentOS machine.
I have downloaded the Flyway command-line tar file and extracted it.
I tried to execute some flyway commands but they don't work;
it says "-bash: flyway: command not found".
Did I miss anything?
Do I have to install it?
I didn't find any tutorials for installation.
No need to install it; it's simply a shell script with a JRE, the Flyway Java libraries, and associated resources.
It sounds like you need to add the location of the flyway shell script to your PATH variable if you want to run it without being in that directory or specifying the full path.
e.g.
If you have extracted flyway-commandline-4.1.2-linux-x64.tar.gz to /opt/flyway/flyway-4.1.2 which looks like:
flyway-4.1.2
├── conf
├── flyway # <---- The shell script
├── lib
└── ...
Somewhere in your setup you want that directory on your PATH:
export PATH=$PATH:/opt/flyway/flyway-4.1.2
Note that the command-line documentation mentions the first two steps as:
1. Download the tool and extract it.
2. cd into the extracted directory.

How to define directory structure following packages in project's Scala build definitions?

There are two full build definition files in an sbt project: Build.scala and Helpers.scala. They are located in the project folder.
I'd like to put the Helpers module into a separate sub-folder, project/utils. When I do import utils.Helpers in Build.scala it says:
not found: object utils
Is it possible to define directory structure that follows the packages in sbt full build definitions?
You should use project/src/main/scala/utils instead of project/utils.
Sbt builds are recursive, which means that the sbt build definition is itself built by sbt, applying the same rules as for a normal project.
Unlike Java, Scala has no strict relation between the package and the folder structure, meaning you can place your sources wherever you like and they don't have to match the package declaration. Scala will not complain.
Sbt knows where to search for folders by checking sourceDirectories setting key.
You can check it easily by executing show sourceDirectories. However, this will show the sourceDirectories for your actual project. How can you check it for the build? Quite easily: execute reload plugins, which will take you into your build. Execute show sourceDirectories there, and it should show that it looks for sources in project/src/main/scala, project/src/main/java, and one more directory for managed sources (which doesn't matter for our case). Now you can execute reload return to go back to your main project.
Given that, you should be able to create an object, let's say named Helpers, in project/src/main/scala/utils/Helpers.scala:
package utils

object Helpers {
  def printFancy(name: String) = println(s">>$name<<")
}
And use it in your Build.scala:
import sbt._
import Keys._
import utils.Helpers._

object MyBuild extends Build {
  val printProjectName = taskKey[Unit]("Prints fancy project name")

  lazy val root = project.in(file(".")).settings(
    printProjectName := printFancy(name.value)
  )
}
You can test it by executing printProjectName.
> printProjectName
>>root<<
[success] Total time: 1 s, completed May 29, 2014 1:24:16 AM
I've stated earlier that sbt is recursive. This means that, if you want, you can use the same technique to configure the sbt build as you use for configuring the build of your own project.
If you don't want to keep your files under project/src/main/scala, but just under project/utils, you can do so by creating a build.sbt in your project folder with the following content:
unmanagedSourceDirectories in Compile += baseDirectory.value / "utils"
Just as described in the documentation.
Now, even if you place your utils in project/utils, sbt should be able to find it.
