What is the ideal environment to run Apache Airflow on? - airflow

I am currently running Airflow through Ubuntu WSL on my PC, which is working great. However I am setting up a pipeline which will need to be running constantly (24/7), so I am looking for ideas and recommendations on what to run Airflow on. I do not want to have my system on all the time obviously.
Surprisingly, I cannot find much information on this! It seems it is not discussed at length...

It depends on your workload.
If you have few tasks to run you can just create a VM on any cloud provider (GCP, AWS, Azure, etc.) and install Airflow on it that would run 24x7.
If your workload is high you can user K8s (GKE, EKS, etc.) and install Airflow on it.

Related

Airflow conn types missing even after installing the correct provider?

Learning apache airflow and I'm trying to create a new connection type the correct way, but it still doesn't show up. I am working within a virtual environment on WSL2 through VS Code, and my terminal looks like this (sandbox) nick#GameCube:~$
This is what my pip list within the venv shows
But these are my only conn types within the airflow UI (I am running airflow webserver within the venv too)
You can see that all the other installed conn types show up, but http doesn't for some reason. Why does this happen? I have tried restarting vscode, killing the webserver process and restarting it, etc but it still doesn't show!
The apache-airflow-providers-http==4.0.0 is not compatible with apache-airflow==2.1.0
To solve your issue you need to downgrade http provider to a version compatible with Airflow 2.1.0
Noting that Http provider is installed by default with Airflow if you need to bump its version make sure you set it to compatible version.
In larger context I suggest to install Airflow from constraints as suggested in the docs. This will prevent case of unstable environment as Airflow brings many dependencies and it guarantees a working environment of Airflow. You can upgrade providers to newer versions but you will need to take a closer look on each upgrade. You can read about it in this doc: Installing/upgrading/downgrading providers separately from Airflow core.

Airflow background tasks

I have started to work with Airflow recently and I have two questions. I am currently using it as background on Ubuntu as I have created the two following services:
sudo touch /etc/systemd/system/airflow-webserver.service
sudo touch /etc/systemd/system/airflow-scheduler.service
It work well but, my questions are:
Is there a way to use the GUI when not connected to the VM? (It is currently impossible for me)
The scheduler does not work as soon as I stop the VM local forwarding. Is there a way to have it enables 24/7, whatever if my computer is on or off?
Any kind of explanation would be appreciated.
Edit:
According to this post How do you keep your airflow scheduler running in AWS EC2 while exiting ssh?
running as a service seems to be enough to have the scheduler set even when not doing ssh. But in my case it is not working. Could it be because of the user name in the .service file? What should be the user name in the file?

Airflow task initiation issue

We are running more then 150 dags in our production airflow environment and we facing task initiation issues very often. We are running airflow 1.7.2 with local executer mode hosted in google compute engine, Cloudsql as our metadb.
How to fix this issue? I have upgraded airflow into 1.8.2 but no luck. For the temporary solution, we are changing our dag name and start date to fix this issue. but this is not a solution. what is the solution for this issue? and Why it is happening?
It appears that the airflow scheduler and webserver are working OK, but there is an issue with the airflow workers. If you review the airflow worker log, you may see an error there.

Openstack and devstack

Does devstack completely install openstack? I read somewhere that devStack is not and has never been intended to be a general OpenStack installer. So what does devstack actually install? Is there any other scripted method available to completely install openstack(grizzly release) or I need to follow the manual installation steps given on openstack website?
devstack does completely install from git openstack.
for lesser values of completely anyways. devstack is the version of openstack used in jenkins gate testing by developers committing code to the openstack project.
devstack as the name suggests is specifically for developing for openstack. as such it's existence is ephemeral. in short, after running stack.sh the resulting ( probably ) functioning openstack is setup... but upon reboot it will not come back up. there are no upstart or systemd or init.d scripts for restarting services. there is no high availability, no backups, no configuration management. And following the latest git releases in the development branch of openstack can be a great way to discover just how unstable openstack is before a feature freeze.
there are several vagrant recipes in the world for deploying openstack, and openstack-puppet is a puppet recipe for deploying openstack. chef also maintains an openstack recipe as well.
Grizzly is a bit old now. Havana is the current stable release.
https://github.com/stackforge/puppet-openstack
http://docs.opscode.com/openstack.html
http://cloudarchitectmusings.com/2013/12/01/deploy-openstack-havana-on-your-laptop-using-vagrant-and-chef/
and ubuntu even maintains a system called maas and juju for deploying openstack super quickly on their OS.
https://help.ubuntu.com/community/UbuntuCloudInfrastructure
http://www.youtube.com/watch?v=mspwQfoYQks
so lots of ways to install openstack.
however most folks pushing a production cloud use some form of configuration management system. that way they can deploy compute nodes automatically. and recover systems quickly.
also check out openstack on openstack.
https://wiki.openstack.org/wiki/TripleO
I think the code should be same, but at least the configuration is not same, for example, devstack will by default use nova network. In a manual installation, you can choose neutron. so:
if you are starting to learn openstack, devstack is a good starting point. with it, you can quickly have a development env.
if you are deploying openstack env, devstack is not a choice, and
instead you need install it following the installation guide.
If you would like another scripted option for deployment, you can try Packstack. This will work only on Fedora and RHEL.
https://wiki.openstack.org/wiki/Packstack
https://www.rdoproject.org/install/quickstart/
In this, you can choose which services you would like to install. For example you may choose to install Neutron for networking purposes, instead of using nova.
Also, it lets you deploy multiple instances of compute nodes by just providing it's IP !!
Yes. Devstack is a tool which help you build all in one for Openstack environment in quickly (Just take a coffee cup and wait until complete). Normally they were using for developer to develop new features and/ or test code quickest. For operator, we need to setup by manual step by step for each services.
To build via devstack repo then you need pull newest source-code from http://git.openstack.org/openstack-dev/devstack. then create new local.conf in devstack folder. And run ./stack.sh.
For example local.conf: https://github.com/pshchelo/stackdev/blob/master/conf/local.conf.sample
Yes, Devstack install all the components of Openstack. But when you use basic configuration then it will install core components of openstack which are the base of openstack cloud platform to run some basic things.
And in Advance configuration of openstack you should configure your local.conf file for what type of services and components you want to install or use in your cloud.
https://github.com/openstack/tacker/blob/master/devstack/local.conf.example

Is it possible to run OpenStack on a laptop/desktop?

I have some questions:
Is it possible to install openstack on a Notebook with a 4GB DD3 Ram? Because the website says it needs atleast 8GB of RAM.
They say it requirs a double-QuadCore , I assue that means Octacore. Can we install that on a Quadcore?
They say that there is no possibility to install it on a NAS . Did you find any where if there is a possibility to do?. I dint find any even after asking our friend(google).
All in all, is it at-all possible to install on it a notebook/Desktop?
That advice is for production environments,
so 1)If you just want to play around your notebook will do fine. I had a succesful test-run on a 1.2 Ghz 1GB Netbook. It became incredibly slow when it launched it's first instance...
With a Double Quadcore they actually mean two seperate Quad-cores, as in two quad-core xeon processors on a single motherboard
So 2) yes you can install it on a quad-core.
3) a NAS device running openstack an openstack storage service seems to be unlikely indeed. You will most likely need more computing power.However If your NAS supports NFS or SSH or sth you can probably mount this drive and use it for storage.
4) You can perfectly build a all-in-one openstack test setup on your notebook. Performance will be low, but acceptable for testing.
It depends on what you mean by "install OpenStack". OpenStack itself is an extremely modular framework consisting on many services (Compute, Networking, Image service, Block Storage, Object Storage, Orchestration, Telemetry, ...). On top of that, a typical production deployment of OpenStack also requires several components, like load balancers, caching systems, firewalls, web servers and others. It is definitely possible to install a minimal openstack system, even on an average laptop.
The simplest way to run OpenStack on a laptop/desktop is to use Devstack, a shell script that installs all services from source and run them (by default) on a single machine. It is customizable enough to provide very good testing ground; it's used by OpenStack developers as well as the OpenStack QA team to test latest developments against "real" systems.
To avoid messing up your system, it's generally recommended to install OpenStack in a VM. From devstack doc:
DevStack should run in any virtual machine running a supported Linux release. It will perform best with 2Gb or more of RAM.
As of the time of this writing (Jan 2015), supported distros are:
Ubuntu (latest LTS)
Fedora
CentOS
Regarding NAS: you can of course use it, but "outside" Openstack apis, by providing mount points to your vms. It's even mandatory if you want to support live migration.

Resources