Airflow background tasks - airflow

I started working with Airflow recently and I have two questions. I currently run it in the background on Ubuntu, using the following two services:
sudo touch /etc/systemd/system/airflow-webserver.service
sudo touch /etc/systemd/system/airflow-scheduler.service
It works well, but my questions are:
Is there a way to use the GUI when not connected to the VM? (It is currently impossible for me.)
The scheduler stops working as soon as I stop the local forwarding to the VM. Is there a way to keep it enabled 24/7, whether my computer is on or off?
Any kind of explanation would be appreciated.
Edit:
According to this post, "How do you keep your airflow scheduler running in AWS EC2 while exiting ssh?",
running as a service seems to be enough to keep the scheduler running even without an SSH session. But in my case it is not working. Could it be because of the user name in the .service file? What should the user name in the file be?
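For reference, a minimal sketch of what the scheduler's .service file might contain, rendered from Python so the user name is an explicit parameter. The user name, binary path and AIRFLOW_HOME used in the example call are hypothetical and not taken from the question; the unit layout is a common pattern, not necessarily what the asker has.

```python
from textwrap import dedent


def render_scheduler_unit(user: str, airflow_bin: str, airflow_home: str) -> str:
    """Return the text of a systemd unit that runs the Airflow scheduler."""
    return dedent(f"""\
        [Unit]
        Description=Airflow scheduler
        After=network.target

        [Service]
        User={user}
        Environment=AIRFLOW_HOME={airflow_home}
        ExecStart={airflow_bin} scheduler
        Restart=always
        RestartSec=5s

        [Install]
        WantedBy=multi-user.target
        """)


if __name__ == "__main__":
    # Hypothetical values: replace with the account that owns the Airflow
    # installation and the real paths on the VM.
    print(render_scheduler_unit("airflow", "/usr/local/bin/airflow", "/home/airflow/airflow"))
```

The User= value would normally be the account that owns the Airflow installation and its AIRFLOW_HOME. The unit only starts at boot once it has been enabled with systemctl enable.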

Related

Airflow SparkSubmitOperator with Yarn Cluster Mode not being able to track application status

I started reading about how to run Spark batch jobs using Airflow.
I have tried SparkSubmitOperator in local mode and it works fine. However, I need a recommendation on whether it can be used in cluster mode.
The only problem I see with cluster mode is that the application status cannot be tracked, as described in the link below:
https://albertusk95.github.io/posts/2019/12/airflow-tracks-spark-driver-status-cluster-deployment/
Please let me know if anyone has used this operator successfully in cluster mode, or if there are any issues with it.

What is the ideal environment to run Apache Airflow on?

I am currently running Airflow through Ubuntu WSL on my PC, which is working great. However I am setting up a pipeline which will need to be running constantly (24/7), so I am looking for ideas and recommendations on what to run Airflow on. I do not want to have my system on all the time obviously.
Surprisingly, I cannot find much information on this! It seems it is not discussed at length...
It depends on your workload.
If you have few tasks to run, you can just create a VM on any cloud provider (GCP, AWS, Azure, etc.) and install Airflow on it; that VM would run 24x7.
If your workload is high, you can use K8s (GKE, EKS, etc.) and install Airflow on it.

GCE: cannot login, The VM guest environment is outdated and only supports the deprecated 'sshKeys' metadata item

I cannot ssh into my Google Compute Engine (GCE) Wordpress instance anymore.
It was working one month ago when I last tried.
I use the Google built-in SSH client in a Chrome browser window.
Yesterday I tried and got the following message:
The VM guest environment is outdated and only supports the deprecated
'sshKeys' metadata item. Please follow the steps here to update.
The "Steps here" link navigates to https://cloud.google.com/compute/docs/images/configuring-imported-images#install_guest_environment which does not seem to help me much.
I am not aware of any changes that I may have made.
How can I fix this?
It looks like your instance's disk is full, and so the SSH keys can't be created in the temp directory. You can do the following:
Stop your instance and wait for it to shut down
Click on the disk your instance is using, and choose "edit" at the top
Enter a larger disk size, and save
Go back to your instance and start it up again
You should now be able to connect via SSH. While you're in there, check to see what filled up your hard disk so you can prevent this from happening again (maybe a rogue program is printing out too many logs, etc).
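Once you are back in, a quick way to confirm how full the disk is can be sketched with Python's standard library. The function names and the 95% threshold below are arbitrary choices for illustration, not GCE settings.

```python
import shutil


def disk_usage_fraction(path: str = "/") -> float:
    """Return the used fraction (0.0 to 1.0) of the filesystem containing path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total


def is_nearly_full(path: str = "/", threshold: float = 0.95) -> bool:
    """Return True when the filesystem is at or above the given threshold."""
    return disk_usage_fraction(path) >= threshold


if __name__ == "__main__":
    print(f"root filesystem used: {disk_usage_fraction('/'):.1%}")
    if is_nearly_full("/"):
        print("disk is nearly full; look for large logs or temp files")
```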
If you're seeing this on Debian 8 or 9, the most likely reason for this is that the google-compute-engine.* packages that allow SSH access to the instance have been removed by apt-get autoremove.
If you have an open SSH connection to the machine or can use a tool like gcloud, running sudo apt-get update && sudo apt-get install gce-compute-image-packages should fix this.
If you no longer have any SSH access, there is a procedure available on the GCP docs site that can be used to restore it.
I've created a bug report here for this.
Might be a bit late, but you can
1) Stop the VM
2) Edit and enable serial console
3) Use the serial connection to login and update the VM
Recently I hit a similar problem and later found that the permissions on my home directory were the cause: out of laziness, I had run chmod 777 ~.
After that, I could not ssh from my terminal or from the browser; I only got 'The VM guest environment is outdated and only supports the deprecated 'sshKeys' metadata item. Please follow the steps here to update'. It seems the home directory itself must be set to 755, in addition to 700 on .ssh and 600 on authorized_keys.
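The likely mechanism here is OpenSSH's StrictModes check, which by default refuses key logins when the home directory, .ssh, or authorized_keys is writable by group or others. A minimal sketch of the same check in Python follows; the helper name is made up for illustration, and it only inspects mode bits, not ownership.

```python
import os
import stat


def writable_by_group_or_others(path: str) -> bool:
    """Return True if the group-write or other-write bit is set on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


if __name__ == "__main__":
    home = os.path.expanduser("~")
    candidates = [
        home,
        os.path.join(home, ".ssh"),
        os.path.join(home, ".ssh", "authorized_keys"),
    ]
    for path in candidates:
        if os.path.exists(path) and writable_by_group_or_others(path):
            print(f"too permissive for sshd StrictModes: {path}")
```

A home directory at 777 is flagged by this check, while 755 or 700 passes.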
I hit a similar issue after creating a FreeBSD VM: gcloud ssh did not work, but luckily I could still reach the VM through the browser SSH window. I then manually added the google_compute_engine public key to .ssh/authorized_keys, and now gcloud ssh connects. I am not sure whether this is a good or secure approach, though.

BOSH doesn't recognize deleted VM

I'm working with BOSH on Openstack. I called bosh -n deploy to have BOSH update an existing deployment. The update required some slave machines to be brought down. As far as I can tell from the Openstack Horizon web GUI and from command-line calls to the Openstack tenancy I'm working on, the VMs that should have been brought down have been. However, BOSH seems to think all but one have been brought down.
Is there a way to go into my MicroBOSH VM to edit an entry somewhere that will fix this error?
I can't be positive that the error is completely due to BOSH because the Openstack cloud that I'm working on is going to be completely rebuilt soon and therefore there could be any number of things happening behind the scenes that I don't know about. As such I just want to be able to stop BOSH from complaining about a VM that it can't delete (because it's already gone).
Running bosh cck should take care of it.

openstack how to prevent losing vms

I am using "devstack" to play with OpenStack on my desktop.
I had configured several VMs in my instance. A couple of days ago a power failure caused my desktop to power down (I didn't have a UPS attached to it). As a result I lost all the VMs, since I hadn't unstacked.
One solution to prevent this next time is to use a UPS. Are there any other ways to back up the VMs so that, even after a power loss, they will run again if I just restart and run ./stack.sh?
Create a snapshot of the VM.
Instance snapshots are uploaded to Glance, which stores them in /var/lib/glance/images on the controller node.
Back up this folder.
When data loss occurs, restore this folder and launch a new instance by booting from image: select the snapshot and click Launch.
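The "back up this folder" step could be sketched as a small Python script that archives the Glance image directory into a compressed tarball. The function name and the destination path in the example call are hypothetical; the source path is the one mentioned above, and reading it may require elevated privileges.

```python
import tarfile
from pathlib import Path


def backup_directory(src_dir: str, archive_path: str) -> str:
    """Write a gzip-compressed tar archive of src_dir and return its path."""
    src = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store entries under the directory's own name, e.g. images/<file>.
        tar.add(str(src), arcname=src.name)
    return archive_path


if __name__ == "__main__":
    # Hypothetical destination; ideally a disk other than the one being backed up.
    backup_directory("/var/lib/glance/images", "/mnt/backup/glance-images.tar.gz")
```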
Devstack is a developer environment, it is not meant to recover from power losses.
You should consider using another all-in-one openstack installer which should support restarting the openstack services without losing state. For instance, you can use Redhat's packstack - https://openstack.redhat.com/Quickstart

Resources