I am new to Airflow (v1.10).
I have two DAGs that each require the same dependency, but at different versions. How can I make sure the dependencies will not overwrite each other (for example, dag1.py needs helper.py from version v1 and dag2.py needs helper.py from version v2)?
In general I see two possible solutions for your problem:
Airflow has a PythonVirtualenvOperator, which runs a task inside a virtualenv that is created and destroyed automatically. You can pass a python_version and a list of requirements so the task builds its own environment; see the sketch after this list.
Set up a docker registry and use a DockerOperator rather than a PythonOperator. This would allow teams to set up their own Docker images with specific requirements. (Suggested by dlamblin)
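For the first option, here is a minimal sketch of what such a task could look like (Airflow 1.10.x import path; the package name helper, its do_work() call and the version pins are placeholders for your own dependency, not anything Airflow ships):

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonVirtualenvOperator


def run_with_helper_v1():
    # Imports must happen inside the callable, because it executes in the
    # throwaway virtualenv rather than in the scheduler's environment.
    import helper
    helper.do_work()


with DAG("dag1", start_date=datetime(2019, 1, 1), schedule_interval=None) as dag:
    task = PythonVirtualenvOperator(
        task_id="run_with_helper_v1",
        python_callable=run_with_helper_v1,
        requirements=["helper==1.0.0"],  # dag2.py would pin helper==2.0.0 instead
        python_version="3.6",
        system_site_packages=False,
    )

dag2.py would look the same but with its own requirements list, so the two versions never share an environment.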
There seems to be no proper documentation about upgrading Airflow. The "Upgrading Airflow to a newer version" page only talks about upgrading the database. So what is the proper way of upgrading Airflow?
Is it just upgrading the Python packages to the newest versions? Or should I use the same venv and install the newer Airflow version completely from scratch? Or is it something else altogether?
I'm guessing the database upgrade would be the final step, done after one of the steps above.
I was also struggling with upgrading Airflow between minor versions and didn't feel like I found a good answer in the docs. I think I have the right approach after looking back at how I installed Airflow in the first place.
If you followed the guide to run Airflow locally, change the value of AIRFLOW_VERSION in the installation commands to your desired version.
If you followed the guide to run Airflow on Docker, fetch the latest docker-compose.yaml; the command on that page always references the latest version. Then re-run docker compose up.
You can confirm you have the right version by running airflow version. I run Airflow via Docker, so the Docker steps work for me; I imagine the local steps should be about the same.
Adding to Vivian's answer -
I had installed Airflow from PyPI and was upgrading from 2.2.4 to 2.3.0.
To upgrade Airflow:
I installed the new version of Airflow in the same virtual environment as 2.2.4 (using this).
Upgraded the database using airflow db upgrade. More details here.
You might have to manually upgrade providers using pip install packagename -U.
After this, when I started Airflow, I got an error related to some missing conf. Airflow wanted the newest version of airflow.cfg, but I had the older version. To fix this,
Renamed my airflow.cfg to airflowbackup.cfg. This is done so that Airflow will create a new airflow.cfg on startup when it sees that there is no config file.
Compared airflowbackup.cfg with a default 2.2.4 config to find all the fields I had changed.
Manually made those same changes in the newly generated airflow.cfg.
I upgraded the Docker image to use Airflow 1.10.14. Airflow is deployed with Helm, and I have an init container that executes a script to initialize Airflow. The init script contains these commands:
...
airflow upgradedb
alembic upgrade heads
...
The upgrade failed, so I need to roll back to the previously deployed release, which contains Airflow 1.10.10, but it is now hitting an alembic error. Based on my searching, I tried deleting the row/record in the alembic_version table.
The error in the scheduler container is this:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.DuplicateColumn) column "operator" of relation "task_instance" already exists
All the other pods are running fine (webserver and workers).
Any resolution/workaround to this issue?
Unless you are OK with scrapping your entire metadata DB (connections, variables, task runs, etc.), I would opt to push on to 1.10.15 and see if the bug you encountered is resolved there. To the best of my understanding, it is not possible to downgrade the DB after the upgrade has been done.
I'm suggesting the upgrade to 1.10.15 on the assumption that you hit an issue similar to this user's here. The CLI fix can be found here. If your 1.10.14 upgrade failed with a different issue than the CLI one I noted, it might be worth investigating a path to resolving that instead.
Airflow has an upgradedb command that needs to be run when upgrading Airflow versions. I wonder if it's safe to run even when the version is the same.
The way it works is that in db.py Airflow uses the alembic command module to check the migration files checked in under the migrations directory (https://github.com/apache/incubator-airflow/tree/master/airflow/migrations/versions) and only applies changes if the recorded revision differs. Those files only get changed or added when the version changes, so the upgradedb step does nothing when you're on the same version/whl.
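For reference, this is a rough sketch of that check (not Airflow's actual code); the connection string and the migrations path below are assumptions you would swap for your own setup:

from alembic.config import Config
from alembic.runtime.migration import MigrationContext
from alembic.script import ScriptDirectory
from sqlalchemy import create_engine

# Assumed values: point these at your own metadata DB and at the
# airflow/migrations directory of your installed Airflow version.
engine = create_engine("postgresql://airflow:airflow@localhost/airflow")
config = Config()
config.set_main_option("script_location", "airflow/migrations")

# Head revision of the checked-in migration scripts.
script_dir = ScriptDirectory.from_config(config)
head_revision = script_dir.get_current_head()

# Revision currently recorded in the alembic_version table.
with engine.connect() as connection:
    context = MigrationContext.configure(connection)
    db_revision = context.get_current_revision()

if db_revision == head_revision:
    print("Database already at head; upgradedb would be a no-op")
else:
    print("Would migrate from %s to %s" % (db_revision, head_revision))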
Adding it as a default step since I've verified it's safe to do so.
I'm trying to create an SBT build that can publish a Docker container either to DockerHub or to our internal Docker repository. I'm using sbt-native-packager 1.0.3 to build the Docker image.
Here's an excerpt from my build.sbt:
dockerRepository in Docker := Some("thomaso"),
packageName in Docker := "externalname",
sbt docker:publish now successfully publishes to thomaso/externalname on DockerHub.
To add the option to publish to our internal Docker repo I added a configuration called dockerInternal:
val dockerInternal = config("dockerInternal") extend Docker
I then added these two settings to override the defaults:
dockerRepository in dockerInternal := Some("docker.nrk.no/project"),
packageName in dockerInternal := "internalname",
My expectation was that sbt dockerInternal:publish should publish a Docker image to docker.nrk.no/project/internalname. Instead, I get this error message:
delivering ivy file to /home/n06944/repos/nrk.recommendations/api/target/scala-2.10/ivy-0.1-SNAPSHOT.xml
java.lang.RuntimeException: Repository for publishing is not specified.
It seems to me that SBT tried to publish to Ivy, not to Docker: when I hardcode the internal repo values, publishing works fine and there is no mention of Ivy in the logs. The Docker configuration modifies the publish task, and I hoped that by letting dockerInternal extend Docker I would inherit the Docker-specific publish behavior. Is that an incorrect assumption? Am I missing some incantations, or is there another approach that would be better?
You forgot to import all the necessary tasks into your new config. sbt-native-packager recommends generating submodules for different packaging configurations.
If you want to fiddle around with configuration scopes (which gets pretty messy very fast), here is another SO answer I gave.
cheers,
Muki
In pre-buildout times one would install Zope 2 by downloading the tarball from http://old.zope.org/Products/Zope/ and doing the configure/make/install procedure.
Since Zope version 2.12, releases are made on PyPI. Would it still be possible to install newer Zope 2 versions the same way, manually, without using buildout?
Ultimately Plone is meant to be put on top of Zope 2, but to narrow down the question for now, an answer concerning only Zope 2 is very welcome.
I may be late to the party but:
As a starting point: there are the project's installation docs at https://zope.readthedocs.io/en/2.13/INSTALL-virtualenv.html, which worked fine (and without buildout) the last time I tried.
Since I use virtualenv and pip a lot, the above method becomes cumbersome fast (installing from a different path than PyPI and its local equivalent, accidentally upgrading the wrong packages when installing more packages), so I made an almost pure reference installation and then just did a pip freeze > zope_2.13_requirements.txt.
Now I can just create a new virtualenv, do a quick pip install -r zope_2.13_requirements.txt directly against PyPI, and have a fresh installation whenever I need one.
The main part of the question is probably that you want to use Zope 3 and not legacy Zope 2 (which e.g. Plone still depends on). Zope is not a single, coherent entity. Which components of the Zope stack do you want to use (zope.interface, zope.component, ZODB, the Medusa web server, the Zope management interface, others)? All are individual Python packages and can be used as is in any Python application with a normal Python package workflow.
Buildout is nothing but scripts, templates and a Python package installer with advanced dependency solving.
You can still install all Zope packages by hand, resulting in a lengthy requirements.txt. Zope 2 comes with command-line scripts for creating and maintaining databases, and you can call these scripts by hand; no need to go through buildout. You can also create configuration files by hand, e.g. by looking at the examples generated by buildout if you have some specific legacy project in mind.
For example, substanced, a CMS based on Pyramid and ZODB, does not rely on buildout. Pyramid internally uses zope.interface, zope.component and various other packages.
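As a tiny illustration of that point, zope.interface works as an ordinary pip-installable package in any Python program; the IGreeter interface and Greeter class below are made up for the example:

from zope.interface import Interface, implementer
from zope.interface.verify import verifyObject


class IGreeter(Interface):
    """Something that can produce a greeting."""

    def greet(name):
        """Return a greeting string for ``name``."""


@implementer(IGreeter)
class Greeter(object):
    def greet(self, name):
        return "Hello, %s!" % name


greeter = Greeter()
verifyObject(IGreeter, greeter)  # raises if Greeter does not provide IGreeter
print(greeter.greet("Zope"))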