Airflow does not pick up symlinked DAGs

I want to link in some DAGs from a directory outside of my dags_folder. However, when I create a symlink using
ln -s /absolute/path/to/dag.py dags/
the DAG does not show up when running airflow dags list.
Neither does it show up after hard linking it.
ln /absolute/path/to/dag.py dags/
If I copy it in though, it shows up almost straight away.
cp /absolute/path/to/dag.py dags/
Airflow version 2.0.0, provided by the Astro CLI.
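A quick way to narrow this down is to check whether the symlink target resolves where the scheduler actually reads DAGs. This is only a hedged sketch: the container placeholder and the /usr/local/airflow/dags mount path are assumptions based on a default Astro CLI project, not something stated above.
ls -l dags/                          # confirm the symlink exists and note its target
readlink -f dags/dag.py              # resolves to /absolute/path/to/dag.py on the host
# With the Astro CLI, dags/ is mounted into Docker containers, so a target outside
# the project directory may not exist inside the container at all:
docker exec -it <scheduler-container> ls -lL /usr/local/airflow/dags/   # -L follows links; a dangling one shows 'cannot access'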

Related

You have two airflow.cfg files

I have created a venv project and installed airflow within this venv. I have also exported AIRFLOW_HOME, pointing it to a directory (airflow_home) within this venv project. The first time, after I ran
$ airflow version
this created airflow.cfg and a logs directory under this 'airflow_home' folder. However, when I repeated the same thing the next day, I got a message that I have two airflow.cfg files:
one airflow.cfg under my venv project
another one under /home/username/airflow/airflow.cfg
Why is that? I haven't installed airflow anywhere outside this venv project.
Found the issue. If I don't set the environment variable AIRFLOW_HOME, airflow by default creates a new airflow.cfg under /home/username/airflow. To avoid this, AIRFLOW_HOME should be set before calling airflow each time a terminal starts, or the export should be added to the bash profile.
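For example, to make the setting persistent across terminals (the project path below is hypothetical; substitute your own):
echo 'export AIRFLOW_HOME=$HOME/my_venv_project/airflow_home' >> ~/.bashrc
source ~/.bashrc
airflow version    # should now use the airflow.cfg under airflow_home, not ~/airflow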

New airflow directory created when I run airflow webserver

I'm having a few problems with airflow. I installed airflow and set the airflow home directory to
my_home/Workspace/airflow_home
But when I start the webserver, a new airflow directory is created:
my_home/airflow
I thought maybe something in the airflow.cfg file needs to be changed but I'm not really sure. Has anyone had this problem before?
Try doing echo $AIRFLOW_HOME and see if it is the correct path you set.
You need to set AIRFLOW_HOME to the directory where you keep your airflow config file.
If the full path of the airflow.cfg file is /home/test/bigdata/airflow/airflow.cfg,
just run
export AIRFLOW_HOME=/home/test/bigdata/airflow
If AIRFLOW_HOME is not set, airflow will use ~/airflow as the default.
You could also write a shell script to start the airflow webserver;
it might contain the lines below:
source ~/.virtualenvs/airflow/bin/activate # if your airflow is installed with virtualenv, this is not necessary
export AIRFLOW_HOME=/home/test/bigdata/airflow # path should be changed according to your environment
airflow webserver -D # start airflow webserver as daemon
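To verify that the variable took effect, something like this works (hedged: the airflow info command exists in Airflow 2.x, and the grep assumes its output lists airflow_home among the paths):
echo $AIRFLOW_HOME                    # should print /home/test/bigdata/airflow
airflow info | grep -i airflow_home   # should point at the same directory, not ~/airflow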

Airflow user issues

We have installed airflow from a service account, say 'ABC', using sudo root in a virtual environment, but we are facing a few issues.
We call a python script using the BashOperator. The python script uses some environment variables from the unix account 'ABC'. While running from airflow, the environment variables are not picked up. In order to find the user airflow runs as, we created a dummy dag with the bashoperator command 'whoami', and it returns the ABC user. So airflow is using the same 'ABC' user. Then why are the environment variables not picked up?
We then tried sudo -u ABC python script. The environment variables are still not picked up, due to the sudo usage. We did a workaround without the environment variables and it ran well in the development environment without issues. But while moving to a different environment, we got the error below, and we don't have permission to edit the sudoers file; the admin policy doesn't allow it.
sudo: sorry, you must have a tty to run sudo
We then used the 'impersonation=ABC' option in the .cfg file and ran airflow without sudo. This time, the bash command fails on the environment variables, and it asks for all the packages used in the script to be present in the virtual environment.
My questions:
Airflow is installed through ABC after sudoing to root. Why is ABC not treated as the running user while executing the script?
Why are ABC's environment variables not picked up?
Why does even the impersonation option not pick up the environment variables?
Can airflow be installed without a virtual environment?
Which is the best approach to install airflow? Using a separate user and sudoing to root? We are using a dedicated user for running the python script. Experts, kindly clarify.
It's always a good idea to use a virtualenv for installing any python packages, so you should always prefer installing airflow in a virtualenv.
You can use systemd or supervisord and create programs for the airflow webserver and scheduler. Example configuration for supervisor:
[program:airflow-webserver]
command=sh /home/airflow/scripts/start-airflow-webserver.sh
directory=/home/airflow
autostart=true
autorestart=true
startretries=3
stderr_logfile=/home/airflow/supervisor/logs/airflow-webserver.err.log
stdout_logfile=/home/airflow/supervisor/logs/airflow-webserver.log
user=airflow
environment=AIRFLOW_HOME='/home/airflow/'
[program:airflow-scheduler]
command=sh /home/airflow/scripts/start-airflow-scheduler.sh
directory=/home/airflow
autostart=true
autorestart=true
startretries=3
stderr_logfile=/home/airflow/supervisor/logs/airflow-scheduler.err.log
stdout_logfile=/home/airflow/supervisor/logs/airflow-scheduler.log
user=airflow
environment=AIRFLOW_HOME='/home/airflow/'
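The start scripts referenced in the command lines are not shown here; this is a minimal sketch of what start-airflow-webserver.sh might look like, assuming airflow is installed in a virtualenv at /home/airflow/venv (a hypothetical path):
#!/bin/sh
# activate the virtualenv that has airflow installed (path is an assumption)
. /home/airflow/venv/bin/activate
export AIRFLOW_HOME=/home/airflow/
# exec so supervisor tracks the airflow process itself rather than a wrapper shell
exec airflow webserver
start-airflow-scheduler.sh would be the same, ending in exec airflow scheduler instead.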
We got the same issue:
sudo: sorry, you must have a tty to run sudo
The solution we got was:
su ABC -c 'python /path/to/script.py'
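On the environment variable question: values exported in ABC's shell profile are only loaded by login shells, which is one common reason neither the BashOperator nor plain sudo sees them. A hedged illustration (MY_VAR and the profile file are hypothetical examples):
sudo -u ABC env | grep MY_VAR      # nothing: sudo gives a stripped, non-login environment
sudo -i -u ABC env | grep MY_VAR   # -i runs ABC's login shell first, so exported profile variables appear
# Inside a BashOperator the same idea applies, e.g.
#   bash_command='source /home/ABC/.bash_profile && python /path/to/script.py'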

Update deployed meteor app while running with minimum downtime - best practice

I run my meteor app on EC2 like this: node main.js (in a tmux session)
Here are the steps I use to update my meteor app:
1) meteor bundle app.tgz
2) scp app.tgz EC2-server:/path
3) ssh EC2-server and attach to tmux
4) kill the current meteor-node process by C-c
5) extract app.tgz
6) run "node main.js" of the extracted app.tgz
Is this the standard practice?
I realize forever can be used too, but do I still have to kill the old node process and start a new one every time I update my app? Can the upgrade be more seamless, without killing the Node process?
You can't do this without killing the node process, but I haven't found that it really matters. What's actually more annoying is the browser refresh on the client, but there isn't much you can do about that.
First, let's assume the application is already running. We start our app via forever with a script like the one in my answer here. I'd show you my whole upgrade script but it contains all kinds of Edthena-specific stuff, so I'll outline the steps we take below:
Build a new bundle. We do this on the server itself, which avoids any missing fibers issues. The bundle file is written to /home/ubuntu/apps/edthena/edthena.tar.gz.
We cd into the /home/ubuntu/apps/edthena directory and rm -rf bundle. That will blow away the files used by the currently running process. Because the server is still running in memory, it will keep executing. However, this step is problematic if your app regularly does uncached disk operations, like reading from the private directory after startup. We don't, and all of the static assets are served by nginx, so I feel safe in doing this. Alternatively, you can move the old bundle directory to something like bundle.old and it should work.
tar xzf edthena.tar.gz
cd bundle/programs/server && npm install
forever restart /home/ubuntu/apps/edthena/bundle/main.js
There really isn't any downtime with this approach - it just restarts the app in the same way it would if the server threw an exception. Forever also keeps the environment from your original script, so you don't need to specify your environment variables again.
Finally, you can have a look at the log files in your ~/.forever directory. The exact path can be found via forever list.
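Spelled out as a script, the outline above looks roughly like this (a sketch only, reusing the /home/ubuntu/apps/edthena paths from the example; adjust them for your own app):
#!/bin/sh
set -e
APP_DIR=/home/ubuntu/apps/edthena
cd "$APP_DIR"
# the bundle was already built on the server and written to edthena.tar.gz, as described above
rm -rf bundle                                # old files go away; the running process keeps executing from memory
tar xzf edthena.tar.gz                       # unpack the freshly built bundle
(cd bundle/programs/server && npm install)   # rebuild server dependencies
forever restart "$APP_DIR/bundle/main.js"    # restart with the environment from the original forever start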
David's method is better than this one, because there's less downtime when using forever restart compared to forever stop; ...; forever start.
Here's the deploy script spelled out, using the latter technique. In ~/MyApp, I run this bash script:
echo "Meteor bundling..."
meteor bundle myapp.tgz
mkdir ~/myapp.prod 2> /dev/null
cd ~/myapp.prod
forever stop myapp.js
rm -rf bundle
echo "Unpacking bundle"
tar xzf ~/MyApp/myapp.tgz
mv bundle/main.js bundle/myapp.js
# `pwd` is there because ./myapp.log would actually create the log in ~/.forever/myapp.log
PORT=3030 ROOT_URL=http://myapp.example.com MONGO_URL=mongodb://localhost:27017/myapp forever -a -l `pwd`/myapp.log start myapp.js
You're asking about best practices.
I'd recommend mup and cluster.
They allow for horizontal scaling and a bunch of other nice features, while using simple commands and configuration.

Completely remove openstack from system after installation from devstack script

I am installing OpenStack on my local machine via this link, but I am having trouble completely removing the installed components from my local machine. I ran the following command:
$ sudo ./unstack.sh
tgtadm: can't send the request to the tgt daemon, Transport endpoint is not connected
tgtd seems to be in a bad state, restarting...
stop: Unknown instance:
tgt start/running, process 14629
tgt stop/waiting
Volume group "stack-volumes" not found
Skipping volume group stack-volumes
Files are still present in /opt/stack and /usr/local/bin/, but removing these files manually does not seem like a good option.
The unstack.sh script only stops the services without removing them. Devstack's folder contains a clean.sh script that removes openstack and its dependencies, so you can run something like this:
cd path/to/devstack
# There's no need to call unstack.sh explicitly
# clean.sh invokes that script itself.
./clean.sh
Follow these three steps:
./clean.sh
rm -rf /opt/stack
rm -rf /usr/local/bin (careful, this will remove everything installed to your local bin folder, which might include previously installed applications).
For more info on all the impacted files and directories, see this link.
unstack doesn't clean out /opt/stack, purge all dependency packages, or clean all eggs out of python.
I recommend running devstack in a VM. It's easy enough to simply remove the VM and rebuild from scratch.
Example shell script for creating a devstack VM for kvm:
#!/bin/sh
/usr/bin/vmbuilder kvm ubuntu -v --suite=oneiric --libvirt=qemu:///system --flavour=server --arch=amd64 \
  --cpus=2 --mem=4096 --swapsize=2048 --rootsize=30480 \
  --ip=192.168.122.236 --hostname=devstack --user=stack --name=stack --pass=stack \
  --addpkg=git --addpkg=screen --addpkg=vim --addpkg=strace --addpkg=lsof --addpkg=nmap \
  --addpkg=subversion --addpkg=acpid --addpkg=tcpdump --addpkg=python-pip --addpkg=wget \
  --addpkg=htop --addpkg=openssh-server \
  --mirror=http://us.archive.ubuntu.com/ubuntu --components='main,universe' \
  --dns=8.8.8.8 --dest=/virts/devstack
