Composer auto scaling? - airflow

Given that GCP Cloud Composer runs on GKE/GCE, does it scale automatically?
Right now I have 3 nodes in the cluster to support, say, 100 DAGs.
Later, if I have about 300 DAGs, will it scale up by itself (with Celery workers)?

We currently don't support auto-scaling but it's on our roadmap. You could however manually scale up/down the GKE cluster by updating the nodeCount value.
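As a rough sketch of the manual route, Composer 1 environments accept a node count when they are updated. The environment name, location and node count below are placeholders, not values from this thread, and the command is only printed so it can be reviewed before it is run.

```shell
# Placeholder values (assumptions, not from the original answer).
ENVIRONMENT_NAME="my-composer-env"
LOCATION="us-central1"
NODE_COUNT=5

# Build the command that resizes the environment's GKE cluster.
scale_cmd="gcloud composer environments update ${ENVIRONMENT_NAME} --location ${LOCATION} --node-count ${NODE_COUNT}"

# Print it for review; to execute it, run: eval "${scale_cmd}"
echo "${scale_cmd}"
```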

I just published an article showing how to enable autoscaling in the underlying Kubernetes cluster: https://link.medium.com/AMUlwUIkD0
Basically:
1. Enable autoscaling at the node level.
2. Apply a HorizontalPodAutoscaler to the airflow-worker deployment.
3. Increase some Airflow config parameters to remove the bottleneck.
This has been tested on composer-1.7.2 and composer-1.7.5, but it might be applicable to other versions as well.
Do check it out
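The first two steps could be sketched roughly as follows. This is not taken from the linked article: the cluster name, zone, namespace, node pool, replica bounds and CPU target are all placeholders, and the manifest assumes the cluster serves the autoscaling/v2 API (older clusters may only offer a beta version). The script only writes the manifest and prints the commands, so nothing is changed until you run them yourself.

```shell
# Placeholders (assumptions): replace with the values of your environment.
CLUSTER="my-composer-gke-cluster"
ZONE="us-central1-a"
NAMESPACE="my-composer-namespace"
workdir="$(mktemp -d)"
hpa_file="${workdir}/airflow-worker-hpa.yaml"

# Step 2: HorizontalPodAutoscaler targeting the airflow-worker deployment.
cat > "${hpa_file}" <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: airflow-worker
  namespace: ${NAMESPACE}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: airflow-worker
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
EOF

# Step 1: node-level autoscaling (printed only, run it manually after review).
echo "gcloud container clusters update ${CLUSTER} --zone ${ZONE} --node-pool default-pool --enable-autoscaling --min-nodes 3 --max-nodes 10"

# Apply the manifest written above (printed only).
echo "kubectl apply -f ${hpa_file}"
```

The third step is left out on purpose, because the answer does not name the Airflow parameters that need to be raised.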

No autoscaling at this point, but I expect it's in the roadmap.
Astronomer is working on an autoscaling Airflow service using Kubernetes. It will probably launch around the time Google Composer launches autoscaling, so good times are coming :)

Cloud Composer brings native support for Environment Scaling since major version 2:
gcloud beta composer environments update <ENVIRONMENT_NAME> \
--location <LOCATION> \
--min-workers <WORKERS_MIN> \
--max-workers <WORKERS_MAX>
See Scale Environments

Related

Airflow background tasks

I started working with Airflow recently and I have two questions. I currently run it in the background on Ubuntu, having created the following two services:
sudo touch /etc/systemd/system/airflow-webserver.service
sudo touch /etc/systemd/system/airflow-scheduler.service
It works well, but my questions are:
Is there a way to use the GUI when not connected to the VM? (It is currently impossible for me.)
The scheduler stops working as soon as I stop the local forwarding to the VM. Is there a way to keep it running 24/7, whether my computer is on or off?
Any kind of explanation would be appreciated.
Edit:
According to this post, "How do you keep your airflow scheduler running in AWS EC2 while exiting ssh?",
running as a service seems to be enough to keep the scheduler running even without an open SSH session. But in my case it is not working. Could it be because of the user name in the .service file? What should the user name in the file be?
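Note that touch only creates an empty file, so the unit still needs content before systemd can run anything. A minimal sketch of what a scheduler unit might contain is below. The user name airflow and the path /usr/local/bin/airflow are assumptions: User= would normally be the account that owns the Airflow installation and its configuration. The file is written to a temporary directory so it can be inspected first.

```shell
# Assumptions: the account that owns the Airflow install, and the airflow binary path.
AIRFLOW_USER="airflow"
AIRFLOW_BIN="/usr/local/bin/airflow"
unit_dir="$(mktemp -d)"
unit_file="${unit_dir}/airflow-scheduler.service"

# Write a minimal unit that restarts the scheduler if it exits.
cat > "${unit_file}" <<EOF
[Unit]
Description=Airflow scheduler
After=network.target

[Service]
User=${AIRFLOW_USER}
ExecStart=${AIRFLOW_BIN} scheduler
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOF

# Show the result for review.
cat "${unit_file}"

# To install it (requires root):
#   sudo cp "${unit_file}" /etc/systemd/system/
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now airflow-scheduler
```

Enabling the unit makes systemd start it at boot, independently of any SSH session.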

What is the ideal environment to run Apache Airflow on?

I am currently running Airflow through Ubuntu WSL on my PC, which is working great. However I am setting up a pipeline which will need to be running constantly (24/7), so I am looking for ideas and recommendations on what to run Airflow on. I do not want to have my system on all the time obviously.
Surprisingly, I cannot find much information on this! It seems it is not discussed at length...
It depends on your workload.
If you have only a few tasks to run, you can create a VM on any cloud provider (GCP, AWS, Azure, etc.) and install Airflow on it so that it runs 24x7.
If your workload is high, you can use K8s (GKE, EKS, etc.) and install Airflow on it.

Symfony 3 application running on OSX

My Symfony 3.4 application is running super slow on DEV environment - it is taking about 35 seconds when using Docker and 20 seconds when running with Symfony's server.
Profiler shows my controller takes too much time to compile.
What I noticed is that the Symfony Profiler does not show performance metrics when I run the application in Docker; it does when I run it with Symfony's own server.
Any idea where I can look? I have already tried lots of workarounds without success.
Thanks
You are probably running Symfony in prod mode on Docker. Make sure to run it in the dev environment so that the profiler is active. Depending on your setup, you need to either use web/app_dev.php instead of web/app.php or, if you have a public/index.php, set the environment variable APP_ENV=dev.
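A small sketch of how to tell which of the two cases applies, based only on the file names mentioned above. The function name symfony_dev_hint is made up for this illustration.

```shell
# Print which dev-mode switch applies to the Symfony project in directory $1.
symfony_dev_hint() {
  project_dir="$1"
  if [ -f "${project_dir}/public/index.php" ]; then
    # Single front controller: the environment comes from APP_ENV.
    echo "set APP_ENV=dev for the container"
  elif [ -f "${project_dir}/web/app_dev.php" ]; then
    # Separate front controllers: pick the dev one explicitly.
    echo "request web/app_dev.php instead of web/app.php"
  else
    echo "no known front controller found"
  fi
}

# Check the current directory.
symfony_dev_hint "."
```

With Docker, an environment variable can be passed to the container with the -e option of docker run, for example -e APP_ENV=dev.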

Using gitLab-CI for Qt-Projects

I want to use GitLab CI for a Qt project, but I can't figure out what I need to do. I understand that the whole pipeline process takes place on the CI server, but how do I set up the needed requirements, like the Qt environment?
Solution:
OK, now I got it! You just use the Runner for it; if you do not have a server, you can use a VM.
For GitLab.com
The runners are already set up (shared runners).
You need to use a Qt SDK Docker image or install it yourself:
Use image: <image-name> for .gitlab-ci.yml
Use apt or some other package manager (not recommended)
Once you have a Qt SDK environment set up inside .gitlab-ci.yml, make sure to add a command to build/compile/run/test the project.
For non-GitLab.com
The runners may or may not be set up, but you do not need to make any specific changes (other than using faster machines with more memory, etc. for building, if necessary).
You need to use a Qt SDK Docker image or install it yourself:
Use image: <image-name> for .gitlab-ci.yml
Use apt or some other package manager (not recommended)
Once you have a Qt SDK environment set up inside .gitlab-ci.yml, make sure to add a command to build/compile/run/test the project.
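A minimal sketch of such a .gitlab-ci.yml, generated from a shell script. The image name is hypothetical (any image that ships the Qt SDK you need would do), and the build assumes a qmake-based project. The file is written to a temporary directory for inspection.

```shell
# Hypothetical image name (assumption): replace with a real Qt SDK image.
QT_IMAGE="registry.example.com/qt-sdk:latest"
ci_dir="$(mktemp -d)"
ci_file="${ci_dir}/.gitlab-ci.yml"

# One job that builds the project inside the Qt SDK image.
cat > "${ci_file}" <<EOF
image: ${QT_IMAGE}

build:
  stage: build
  script:
    - qmake
    - make
EOF

# Show the generated file; copy it to the repository root to use it.
cat "${ci_file}"
```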
Other Helpful Comments
This is from Josh Peak's comment:
Ok that answers my question. I'm going to have to preconfigure a VM image and/or a Docker image with the QT SDK that the rest of my dev team can leverage. Thanks for the quick response.
This answer is from ManuelP.'s question:
OK, now I got it! You just use the Runner for it; if you do not have a server, you can use a VM.

(GCP/GCE) Update gcloud SDK to use VPN tools

I've been using GCE and gcloud for a few weeks now. A new set of VPN tools were released in alpha last Dec. 3rd (https://cloud.google.com/compute/docs/vpn), and I need to start testing with them.
The problem is that gcloud doesn't seem to recognise this new set of tools, and I get errors like:
$ gcloud compute target-vpn-gateways create --region us-central1 --network default vpn1
ERROR: (gcloud.compute) Invalid choice: 'target-vpn-gateways'.
$ gcloud compute vpn-tunnels describe
ERROR: (gcloud.compute) Invalid choice: 'vpn-tunnels'
target-vpn-gateways and vpn-tunnels are just not part of the command groups.
So, I thought of updating the core and compute components, but they're all up to date. This seems so new that there's no information at this time in the Google Cloud documentation about updating the SDK to be able to use these VPN tools.
Any ideas? I'm using OSX in case it matters.
Thanks a lot in advance!
EDIT:
As of March 2015, the documentation has been updated and it's now in beta stage:
https://cloud.google.com/compute/docs/vpn
So, to answer my own question according to this update, beta VPN functionality can be accessed by updating components this way:
$ gcloud components update beta
Sorry about the confusion here. Google Cloud Platform support is still in limited preview state. What this means is that the API calls only work for specially whitelisted projects and that the normal gcloud build doesn't yet include VPN support. (Because it would be potentially confusing if the gcloud command existed but would always fail due to the API not being enabled yet.)
As I understand it your best bet for getting your project whitelisted is to go through the GCP sales office:
https://cloud.google.com/contact/
I'm going to try to get the docs updated so that the situation is more clear.
It was announced at GCP Live that VPN will be available in Q1 2015; that means Beta and GA are likely to happen between January and March.
The updates I have been receiving suggest that the beta launch is likely around the end of February 2015.

Resources