We have installed Airflow from a service account, say 'ABC', using sudo to root, in a virtual environment, but we are facing a few issues.
We call a Python script using the BashOperator. The Python script uses some
environment variables from the Unix account 'ABC'. When running from
Airflow, the environment variables are not picked up. To find out which
user Airflow runs as, we created a dummy DAG with a BashOperator running the command
'whoami'; it returned the ABC user. So Airflow is using the same 'ABC'
user. Then why are the environment variables not picked up?
We then tried sudo -u ABC to run the Python script. The environment variables were still not picked up, due to the sudo usage. We did a workaround without the environment variables and it ran well in the development environment without issues. But while moving to a different environment, we got the error below, and we don't have permission to edit the sudoers file; admin policy didn't allow it.
sudo: sorry, you must have a tty to run sudo
Then we used the 'impersonation=ABC' option in the .cfg file and ran Airflow without sudo. This time the bash command fails on the environment variables, and it also cannot find any of the packages the script uses from the virtual environment.
My questions:
Airflow was installed as ABC after sudoing to root. Why was ABC's
environment not used while running the script?
Why are ABC's environment variables not picked up?
Why does even the impersonation option not pick up the environment
variables?
Can Airflow be installed without a virtual environment?
Which is the best approach to install Airflow? Using a separate user
and sudoing to root? We are using a dedicated user for running the Python
script. Experts, kindly clarify.
It's always a good idea to use a virtualenv for installing any Python packages, so you should prefer installing Airflow in a virtualenv.
You can use systemd or supervisord and create programs for the Airflow webserver and scheduler. Example configuration for supervisor:
[program:airflow-webserver]
command=sh /home/airflow/scripts/start-airflow-webserver.sh
directory=/home/airflow
autostart=true
autorestart=true
startretries=3
stderr_logfile=/home/airflow/supervisor/logs/airflow-webserver.err.log
stdout_logfile=/home/airflow/supervisor/logs/airflow-webserver.log
user=airflow
environment=AIRFLOW_HOME='/home/airflow/'
[program:airflow-scheduler]
command=sh /home/airflow/scripts/start-airflow-scheduler.sh
directory=/home/airflow
autostart=true
autorestart=true
startretries=3
stderr_logfile=/home/airflow/supervisor/logs/airflow-scheduler.err.log
stdout_logfile=/home/airflow/supervisor/logs/airflow-scheduler.log
user=airflow
environment=AIRFLOW_HOME='/home/airflow/'
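As an alternative to supervisor, a minimal systemd unit for the webserver might look like the sketch below. The paths, the user, and the virtualenv location are assumptions to adjust for your install; calling the airflow binary inside the virtualenv is what makes the env's packages visible.

```ini
# /etc/systemd/system/airflow-webserver.service (illustrative)
[Unit]
Description=Airflow webserver
After=network.target

[Service]
User=airflow
Environment=AIRFLOW_HOME=/home/airflow
# Use the airflow binary inside the virtualenv so its packages are found
ExecStart=/home/airflow/venv/bin/airflow webserver
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable it with systemctl enable --now airflow-webserver; a matching unit for the scheduler only changes Description and the ExecStart command.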
We got the same issue:
sudo: sorry, you must have a tty to run sudo
The solution we found was to use su instead of sudo:
su - ABC -c 'python /path/to/script.py'
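The underlying mechanism here: non-interactive shells do not read ~/.bash_profile, so variables exported there are invisible to the task, while su - (with the dash) starts a login shell that does read it. A small self-contained demonstration of that mechanism, using a throwaway file in place of a real profile (the variable name and path are made up):

```shell
# Simulate a profile file that exports a variable
cat > /tmp/demo_profile <<'EOF'
export MY_APP_HOME=/data/abc
EOF

# A plain non-interactive shell never reads the profile: the variable is empty
sh -c 'echo "without profile: [$MY_APP_HOME]"'

# Sourcing the profile first makes the variable visible to the command
sh -c '. /tmp/demo_profile && echo "with profile: [$MY_APP_HOME]"'
```

The same trick works inside a BashOperator command: source the account's profile before invoking the script.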
Due to the need to direct shiny-server logs to stdout so that "docker logs" (and monitoring utilities relying on it) can see them, I'm trying to do something like:
tail -f <logs_directory>/*
This works as needed when no new files are added to the directory; the problem is that shiny-server dynamically creates files in this directory, which we need to pick up automatically.
I found other users have solved this via the xtail package; the problem is I'm using CentOS and xtail is not available for CentOS.
The question is: is there any "clean" way of doing this via the standard tail command without needing xtail? Or maybe there exists an equivalent package to xtail for CentOS?
You will probably find it easier to use the docker run -v option to mount a host directory into the container and collect logs there. Then you can use any tool you want that collects log files out of a directory (logstash is popular but far from the only option) to collect those log files.
This also avoids the problem of having to both run your program and a log collector inside your container; you can just run the service as the main container process, and not have to do gymnastics with tail and supervisord and whatever else to try to keep everything running.
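For what it's worth, the reason tail -f <logs_directory>/* misses new files is that the shell expands the glob once, at startup, and tail never re-reads the directory. A quick demonstration (directory and file names are illustrative):

```shell
# Set up a directory with one log file
mkdir -p /tmp/demo_logs && rm -f /tmp/demo_logs/*
echo one > /tmp/demo_logs/a.log

# The shell expands the glob HERE, once; this fixed list is all tail would see
set -- /tmp/demo_logs/*
echo "tail would watch $# file(s): $*"

# A file created afterwards never joins that list
echo two > /tmp/demo_logs/b.log
echo "but the directory now holds $(ls /tmp/demo_logs | wc -l) files"
```

So any pure-tail solution needs an outer loop that re-lists the directory and restarts tail, which is exactly the gap xtail fills.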
I have tried to run a Spring Boot jar file using PuTTY, but the problem is that after closing the PuTTY session the service was stopped.
Then I tried starting the jar file with the following command, and it works fine:
nohup java -jar /web/server.jar &
You should avoid using nohup as it will just disassociate your terminal and the process. Instead, use the following command to run your process as a service.
sudo ln -s /path/to/your-spring-boot-app.jar /etc/init.d/your-spring-boot-app
This command creates a symbolic link to your JAR file, which you can then run as a service with the command sudo service your-spring-boot-app start. This will write the console log to /var/log/your-spring-boot-app.log.
Moreover, you can configure spring-boot/application.properties to write console logs to a location you specify using logging.path=path-to-your-log-directory or logging.file=path-to-your-log-file.txt. Also, it may be worth noting that logging.file takes priority over logging.path.
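For reference, the relevant application.properties lines might look like this (the directory and file names are placeholders, and these are the older Spring Boot property names the answer refers to):

```ini
# Choose one of the two; logging.file wins if both are set
logging.file=/var/log/your-spring-boot-app/app.log
# logging.path=/var/log/your-spring-boot-app
```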
Environment : Hortonworks Sandbox HDP 2.2.4
Issue: Unable to run the hadoop commands present in the shell scripts as the root user. The Oozie job is triggered as the root user, but when hadoop fs or any mapreduce command is executed, it runs as the yarn user. Since yarn doesn't have access to some of the file system, the shell script fails to execute. Let me know what changes I need to make to run the hadoop commands as the root user.
It is expected behaviour to get the yarn user in place whenever we invoke shell actions in Oozie; only the yarn user has the capability to run shell actions. One thing we can do is give yarn access permissions on the file system.
This is more a shell script question than an Oozie question. In theory, an Oozie job runs as the user who submits it. In a Kerberos environment, the user is whoever signed in with a keytab/password.
Once the job is running on the Hadoop cluster, to change the ownership of a command you can use "sudo" within your shell script. In your case, you may also want to make sure the user "yarn" is allowed to sudo to the commands you want to execute.
Add the below property to the workflow:
HADOOP_USER_NAME=${wf:user()}
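The same effect can be sketched inside the shell action's script itself. This assumes a non-kerberized cluster, where the Hadoop client honours the HADOOP_USER_NAME environment variable; the target user and path are illustrative:

```shell
#!/bin/sh
# Tell the Hadoop client to act as 'root' instead of the container's 'yarn' user
export HADOOP_USER_NAME=root
echo "hadoop commands will run as: ${HADOOP_USER_NAME}"
# hadoop fs -ls /user/root   # would now execute as 'root', not 'yarn'
```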
While bootstrapping on AWS EMR, I am getting the following. Any clues how to resolve it?
/mnt/var/lib/bootstrap-actions/1/STAR: /lib/libc.so.6: version 'GLIBC_2.14' not found (required by /mnt/var/lib/bootstrap-actions/1/STAR)
It's probably caused by not having a high enough version of libc6.
You can SSH into the EC2 instance the EMR job created by following this: Open an SSH Tunnel to the Master Node
Then update the packages. For example, if your instance uses Ubuntu, you should do sudo apt-get update. The command depends on which Linux distribution your EC2 instance is running. The default EMR job uses Debian, and Amazon Linux is based on Red Hat.
See if this would work.
If this is actually the problem, you can add this update package command (with ignoring Y/N prompt) at the start of your bootstrap script.
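To confirm whether the node's libc is actually too old, one way is the sketch below; the libc path varies by distribution, so treat it as an assumption to adjust:

```shell
# Portable check of the C library version
ldd --version | head -n 1

# On glibc systems you can also list the version symbols the library exports;
# this libc path is typical for 64-bit Debian/Ubuntu and may differ elsewhere
strings /lib/x86_64-linux-gnu/libc.so.6 2>/dev/null | grep '^GLIBC_2' | sort -V | tail -n 3
```

If GLIBC_2.14 does not appear in the symbol list, the binary cannot run on that image without a newer libc.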
I am installing OpenStack on my local machine via this link, but I am having trouble completely removing the installed components from my local machine. I ran the following command:
$ sudo ./unstack.sh
tgtadm: can't send the request to the tgt daemon, Transport endpoint is not connected
tgtd seems to be in a bad state, restarting...
stop: Unknown instance:
tgt start/running, process 14629
tgt stop/waiting
Volume group "stack-volumes" not found
Skipping volume group stack-volumes
And files are still present in /opt/stack and /usr/local/bin/. But manually removing these files would not be a good option.
The unstack.sh script only stops the services without removing them.
Devstack's folder contains a clean.sh script that removes openstack and dependencies so you can run something like this:
cd path/to/devstack
# There's no need to call unstack.sh explicitly
# clean.sh invokes that script itself.
./clean.sh
Follow these 3 steps:
./clean.sh
rm -rf /opt/stack
rm -rf /usr/local/bin (careful, this will remove everything installed to your local bin folder, which might include previously installed applications).
For more info on all the impacted files and directories, see this link.
unstack.sh doesn't clean out /opt/stack, purge all dependency packages, or clean all eggs out of Python.
I recommend running devstack in a VM. It's easy enough to simply remove the VM and rebuild from scratch.
Example shell script for creating a devstack VM for kvm:
#!/bin/sh
/usr/bin/vmbuilder kvm ubuntu -v --suite=oneiric --libvirt=qemu:///system \
  --flavour=server --arch=amd64 --cpus=2 --mem=4096 --swapsize=2048 \
  --rootsize=30480 --ip=192.168.122.236 --hostname=devstack --user=stack \
  --name=stack --pass=stack --addpkg=git --addpkg=screen --addpkg=vim \
  --addpkg=strace --addpkg=lsof --addpkg=nmap --addpkg=subversion \
  --addpkg=acpid --addpkg=tcpdump --addpkg=python-pip --addpkg=wget \
  --addpkg=htop --mirror=http://us.archive.ubuntu.com/ubuntu \
  --components='main,universe' --addpkg=openssh-server --dns=8.8.8.8 \
  --dest=/virts/devstack