Airflow: why does it still show DAGs after I log out?

I am new to Airflow. I have it up and running, and I can access the web GUI and see the example DAGs. However, after I click the logout icon in the top right corner and log out, the DAGs are still visible on the front end, instead of the web GUI becoming inaccessible as I would expect. Why is that?

That is because web authentication is turned off by default.
Enable web authentication by adding the following to your airflow.cfg file:
[webserver]
authenticate = True
auth_backend = airflow.contrib.auth.backends.password_auth
and make sure there is no other duplicate auth_backend value in the airflow.cfg file.
Then create a user for the Web UI:
# navigate to the airflow installation directory
$ cd ~/airflow
$ python
Python 2.7.9 (default, Feb 10 2015, 03:28:08)
Type "help", "copyright", "credits" or "license" for more information.
>>> import airflow
>>> from airflow import models, settings
>>> from airflow.contrib.auth.backends.password_auth import PasswordUser
>>> user = PasswordUser(models.User())
>>> user.username = 'new_user_name'
>>> user.email = 'new_user_email@example.com'
>>> user.password = 'set_the_password'
>>> session = settings.Session()
>>> session.add(user)
>>> session.commit()
>>> session.close()
>>> exit()
Now when you try to access the web UI, it will ask for a username and password, and once you log out you will no longer be able to see the DAGs.
To integrate it with LDAP: https://airflow.apache.org/security.html#ldap
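For reference, a minimal sketch of what the LDAP variant of this configuration might look like in airflow.cfg (the URI, bind credentials, and base DN below are placeholders; check the linked docs for the full option list):
[webserver]
authenticate = True
auth_backend = airflow.contrib.auth.backends.ldap_auth

[ldap]
uri = ldaps://ldap.example.com:636
user_filter = objectClass=*
user_name_attr = uid
bind_user = cn=Manager,dc=example,dc=com
bind_password = insecure
basedn = dc=example,dc=com
cacert = /etc/ca/ldap_ca.crt
search_scope = LEVEL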

Related

Using LocalStack (AWS), how can I listen to an SNS event in Chalice?

I have a LocalStack setup and I can successfully publish SNS messages from my Chalice application by pointing at the endpoint URL like so:
_sns_topic = session.resource('sns', endpoint_url='http://localhost:4566')
However, one of our endpoints needs to listen to this SNS event locally, and it is currently set up using @app.on_sns_message():
@app.on_sns_message(topic='topicNameHere')
def onSnsEvent(event) -> None:
    # do something
I checked the documentation but I can't seem to find how to point this handler at local/LocalStack events instead. Any tips or ideas?
I was able to get this running on my side by following the official Chalice docs and using LocalStack's recommended Chalice wrapper. Here are the steps:
Start LocalStack: localstack start -d
Create an SNS topic: awslocal sns create-topic --name my-demo-topic --region us-east-1 --output table | cat (I am using my-demo-topic).
Install chalice-local: pip3 install chalice-local.
Initialize a new app: chalice-local new-project chalice-demo-sns
Put the following code inside chalice-demo-sns (in app.py):
from chalice import Chalice

app = Chalice(app_name='chalice-sns-demo')
app.debug = True

@app.on_sns_message(topic='my-demo-topic')
def handle_sns_message(event):
    app.log.debug("Received message with subject: %s, message: %s",
                  event.subject, event.message)
Deploy it: chalice-local deploy
Use boto3 to publish messages on SNS:
$ python
>>> import boto3
>>> endpoint_url = "http://localhost.localstack.cloud:4566"
>>> sns = boto3.client('sns', endpoint_url=endpoint_url)
>>> topic_arn = [t['TopicArn'] for t in sns.list_topics()['Topics']
... if t['TopicArn'].endswith(':my-demo-topic')][0]
>>> sns.publish(Message='TestMessage1', Subject='TestSubject1',
... TopicArn=topic_arn)
{'MessageId': '12345', 'ResponseMetadata': {}}
>>> sns.publish(Message='TestMessage2', Subject='TestSubject2',
... TopicArn=topic_arn)
{'MessageId': '54321', 'ResponseMetadata': {}}
Check the logs: chalice-local logs -n handle_sns_message
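To confirm that chalice-local deploy actually subscribed the deployed Lambda function to the topic, you can also list the topic's subscriptions from the same Python session used above (a quick sanity check; the Endpoint of each subscription should be the Lambda's ARN):
>>> sns.list_subscriptions_by_topic(TopicArn=topic_arn)['Subscriptions']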

SFTP with Google Cloud Composer

I need to upload a file via SFTP to an external server through Cloud Composer. The code for the task is as follows:
from airflow import DAG
from airflow.operators.python_operator import PythonVirtualenvOperator
from airflow.operators.dummy_operator import DummyOperator
from datetime import datetime, timedelta

def make_sftp():
    import paramiko
    import pysftp
    import os
    from airflow.contrib.hooks.ssh_hook import SSHHook
    import subprocess

    ssh_hook = SSHHook(ssh_conn_id="conn_id")
    sftp_client = ssh_hook.get_conn().open_sftp()
    return 0

etl_dag = DAG("dag_test",
              start_date=datetime.now(tz=local_tz),
              schedule_interval=None,
              default_args={
                  "owner": "airflow",
                  "depends_on_past": False,
                  "email_on_failure": False,
                  "email_on_retry": False,
                  "retries": 5,
                  "retry_delay": timedelta(minutes=5)})

sftp = PythonVirtualenvOperator(task_id="sftp",
                                python_callable=make_sftp,
                                requirements=["sshtunnel", "paramiko"],
                                dag=etl_dag)

start_pipeline = DummyOperator(task_id="start_pipeline", dag=etl_dag)

start_pipeline >> sftp
In "conn_id" I have used the following options: {"no_host_key_check": "true"}, the DAG runs for a couple of seconds and the fail with the following message:
WARNING - Remote Identification Change is not verified. This wont protect against Man-In-The-Middle attacks
[2022-02-10 10:01:59,358] {ssh_hook.py:171} WARNING - No Host Key Verification. This wont protect against Man-In-The-Middle attacks
Traceback (most recent call last):
  File "/tmp/venvur4zvddz/script.py", line 23, in <module>
    res = make_sftp(*args, **kwargs)
  File "/tmp/venvur4zvddz/script.py", line 19, in make_sftp
    sftp_client = ssh_hook.get_conn().open_sftp()
  File "/usr/local/lib/airflow/airflow/contrib/hooks/ssh_hook.py", line 194, in get_conn
    client.connect(**connect_kwargs)
  File "/opt/python3.6/lib/python3.6/site-packages/paramiko/client.py", line 412, in connect
    server_key = t.get_remote_server_key()
  File "/opt/python3.6/lib/python3.6/site-packages/paramiko/transport.py", line 834, in get_remote_server_key
    raise SSHException("No existing session")
paramiko.ssh_exception.SSHException: No existing session
Do I have to set other options? Thank you!
Configuring the SSH connection with key pair authentication
To SSH into the host as a user with username "user_a", an SSH key pair should be generated for that user and the public key should be added to the host machine. The following are the steps to create an SSH connection to the "user_a" user, which has the required write permissions.
Run the following commands on the local machine to generate the required SSH key:
ssh-keygen -t rsa -f ~/.ssh/sftp-ssh-key -C user_a
“sftp-ssh-key” → Name of the pair of public and private keys (Public key: sftp-ssh-key.pub, Private key: sftp-ssh-key)
“user_a” → User in the VM that we are trying to connect to
chmod 400 ~/.ssh/sftp-ssh-key
Now, copy the contents of the public key sftp-ssh-key.pub into ~/.ssh/authorized_keys of your host system. Check for necessary permissions for authorized_keys and grant them accordingly using chmod.
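If you already have password access to the host, ssh-copy-id can handle copying the key and setting the permissions for you (a convenience sketch; replace the host address with your own):
ssh-copy-id -i ~/.ssh/sftp-ssh-key.pub user_a@<host-address>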
I tested the setup with a Compute Engine VM. In the Compute Engine console, edit the VM settings to add the contents of the generated SSH public key into the instance metadata. Detailed instructions can be found here. If you are connecting to a Compute Engine VM, make sure that the instance has the appropriate firewall rule to allow the SSH connection.
Upload the private key to the client machine. In this scenario, the client is the Airflow DAG so the key file should be accessible from the Composer/Airflow environment. To make the key file accessible, it has to be uploaded to the GCS bucket associated with the Composer environment. For example, if the private key is uploaded to the data folder in the bucket, the key file path would be /home/airflow/gcs/data/sftp-ssh-key.
Configuring the SSH connection with password authentication
If password authentication is not configured on the host machine, follow the below steps to enable password authentication.
Set the user password using the below command and enter the new password twice.
sudo passwd user_a
To enable SSH password authentication, you must SSH into the host machine as root to edit the sshd_config file.
/etc/ssh/sshd_config
Then, change the line PasswordAuthentication no to PasswordAuthentication yes. After making that change, restart the SSH service by running the following command as root.
sudo service ssh restart
Password authentication has been configured now.
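If you prefer to make the sshd_config change non-interactively, a one-line sed edit works as a sketch (assuming the file currently contains an explicit "PasswordAuthentication no" line):
sudo sed -i 's/^PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config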
Creating connections and uploading the DAG
1.1 Airflow connection with key authentication
Create a connection in Airflow with the below configuration or use the existing connection.
Extra field
The Extra JSON dictionary would look like this. Here, we have uploaded the private key file to the data folder in the Composer environment's GCS bucket.
{
  "key_file": "/home/airflow/gcs/data/sftp-ssh-key",
  "conn_timeout": "30",
  "look_for_keys": "false"
}
1.2 Airflow connection with password authentication
If the host machine is configured to allow password authentication, these are the changes to be made in the Airflow connection.
The Extra parameter can be empty.
The Password parameter is user_a's password on the host machine.
The task logs show that the password authentication was successful.
INFO - Authentication (password) successful!
Upload the DAG to the Composer environment and trigger it. I was facing a key validation issue with the latest version of the paramiko library (2.9.2). I tried downgrading paramiko, but the older versions do not seem to support OpenSSH keys. I found an alternative, paramiko-ng, in which the validation issue has been fixed, and changed the Python dependency from paramiko to paramiko-ng in the PythonVirtualenvOperator.
from airflow import DAG
from airflow.operators.python_operator import PythonVirtualenvOperator
from airflow.operators.dummy_operator import DummyOperator
from datetime import datetime, timedelta

def make_sftp():
    import paramiko
    from airflow.contrib.hooks.ssh_hook import SSHHook

    ssh_hook = SSHHook(ssh_conn_id="sftp_connection")
    sftp_client = ssh_hook.get_conn().open_sftp()
    print("=================SFTP Connection Successful=================")

    remote_host = "/home/sftp-folder/sample_sftp_file"       # file path in the host system
    local_host = "/home/airflow/gcs/data/sample_sftp_file"   # file path in the client system
    sftp_client.get(remote_host, local_host)                 # GET operation to copy the file from host to client
    sftp_client.close()
    return 0

etl_dag = DAG("sftp_dag",
              start_date=datetime.now(),
              schedule_interval=None,
              default_args={
                  "owner": "airflow",
                  "depends_on_past": False,
                  "email_on_failure": False,
                  "email_on_retry": False,
                  "retries": 5,
                  "retry_delay": timedelta(minutes=5)})

sftp = PythonVirtualenvOperator(task_id="sftp",
                                python_callable=make_sftp,
                                requirements=["sshtunnel", "paramiko-ng", "pysftp"],
                                dag=etl_dag)

start_pipeline = DummyOperator(task_id="start_pipeline", dag=etl_dag)

start_pipeline >> sftp
Results
The sample_sftp_file has been copied from the host system to the specified Composer bucket.
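Note that the question asks about uploading a file to the external server, while make_sftp() above does the reverse (a GET from the host into the Composer bucket). A minimal sketch of an upload variant, using the same connection and placeholder paths, would be:
def make_sftp_upload():
    # Hypothetical upload variant of make_sftp(): copy a file from the
    # Composer bucket (client) to the external SFTP server (host).
    from airflow.contrib.hooks.ssh_hook import SSHHook

    ssh_hook = SSHHook(ssh_conn_id="sftp_connection")
    sftp_client = ssh_hook.get_conn().open_sftp()

    local_path = "/home/airflow/gcs/data/sample_sftp_file"  # file in the client system
    remote_path = "/home/sftp-folder/sample_sftp_file"      # destination on the host system
    sftp_client.put(local_path, remote_path)                # PUT copies client -> host

    sftp_client.close()
    return 0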

Jupyter Password Not Hashed

When I try to set up the jupyter notebook password, I don't get a password hash when I open up the jupyter_notebook_config.json file.
This is the output of the json file:
{
  "NotebookApp": {
    "password": "argon2:$argon2id$v=19$m=10240,t=10,p=8$pcTg1mB/X5a3XujQqYq/wQ$/UBQBRlFdzmEmxs6c2IzmQ"
  }
}
I've tried running passwd() from Python as well, following the "Preparing a hashed password" instructions found online, but it produces the same result as above. No hash.
Can someone please let me know what I'm doing wrong?
I'm trying to set up a Jetson Nano in similar fashion to the Deep Learning Institute Nano build. With that build you can run Jupyter Lab remotely so the Nano can run headless. I'm trying to do the same thing with no luck.
Thanks!
What you are seeing is the hash; argon2 is the default algorithm:
https://github.com/jupyter/notebook/blob/v6.5.2/notebook/auth/security.py#L23
You can provide a different algorithm, such as sha1, if you like:
>>> from notebook.auth import passwd
>>> from notebook.auth.security import passwd_check
>>>
>>> password = 'myPass123'
>>>
>>> hashed_argon2 = passwd(password)
>>> hashed_sha1 = passwd(password, 'sha1')
>>>
>>> print(hashed_argon2)
argon2:$argon2id$v=19$m=10240,t=10,p=8$JRz5GPqjOYJu/cnfXc5MZw$LZ5u6kPKytIv/8B/PLyV/w
>>>
>>> print(hashed_sha1)
sha1:c29c6aeeecef:0b9517160ce938888eb4a6ec9ca44e3a31da9519
>>>
>>> passwd_check(hashed_argon2, password)
True
>>>
>>> passwd_check(hashed_sha1, password)
True
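If you want to set the hash explicitly rather than rely on jupyter_notebook_config.json, it can go into ~/.jupyter/jupyter_notebook_config.py (a sketch using the example sha1 hash generated above):
c = get_config()  # provided by Jupyter when it loads the config file
c.NotebookApp.password = 'sha1:c29c6aeeecef:0b9517160ce938888eb4a6ec9ca44e3a31da9519'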
Check whether you have a different Jupyter server running on your machine. It happened to me that I was trying a password over and over on port 8888 while my intended server was on port 8889.
Another time, Anaconda started a server on localhost:8888 while I was trying to reach a mapped port from a Docker container, also on 8888, and the only way to access it was actually on 0.0.0.0:8888.
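Listing the running notebook servers makes this easy to spot, since each entry shows the port and the directory it is serving:
$ jupyter notebook list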

Apache airflow REST API call fails with 403 forbidden when API authentication is enabled

Apache Airflow REST API fails with 403 forbidden for the call:
"/api/experimental/test"
Configuration in airflow.cfg
[webserver]
authenticate = True
auth_backend = airflow.contrib.auth.backends.password_auth
[api]
rbac = True
auth_backend = airflow.contrib.auth.backends.password_auth
After setting all this, the Docker image is built and run as a container.
I created the Airflow user as follows:
airflow create_user -r Admin -u admin -e admin@hpe.com -f Administrator -l 1 -p admin
Logging in to the Web UI with these credentials works fine, whereas logging in to the REST API does not.
HTTP Header for authentication:
Authorization BASIC YWRtaW46YWRtaW4=
Airflow version: 1.10.9
By creating the user in the following manner, we can access the Airflow experimental API using those credentials.
import airflow
from airflow import models, settings
from airflow.contrib.auth.backends.password_auth import PasswordUser
user = PasswordUser(models.User())
user.username = 'new_user_name'
user.email = 'new_user_email@example.com'
user.password = 'set_the_password'
session = settings.Session()
session.add(user)
session.commit()
session.close()
exit()
When the user is created with the "airflow create_user" command, we cannot access the Airflow experimental API.
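A quick way to verify the call from Python, assuming the webserver is reachable on localhost:8080 and using the credentials created above (both are placeholders for your own setup):
import requests

resp = requests.get(
    "http://localhost:8080/api/experimental/test",
    auth=("new_user_name", "set_the_password"),  # HTTP Basic auth
)
print(resp.status_code)  # expect 200 once authentication is accepted, 403 otherwise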

Airflow basic authentication hides Admin menu options

I enabled basic authentication in Airflow, but now I can't see most of the Admin menu items. For example, I can't create connections using the UI.
I'm using apache-airflow 1.10.2 and in the config I set:
authenticate = True
auth_backend = airflow.contrib.auth.backends.password_auth
When creating a new user:
You can follow the directions here. Then you should also make the user a "superuser".
# after user.password = 'set_the_password'
>>> user.superuser = True
...
If you already created the user you can change it this way:
$ python
>>> import airflow
>>> from airflow import models, settings
>>> from airflow.contrib.auth.backends.password_auth import PasswordUser
>>> session = settings.Session()
>>> user = session.query(models.User).filter(models.User.username == 'the_username_you_created').first()
>>> user.superuser = True
>>> session.add(user)
>>> session.commit()
>>> session.close()
>>> exit()
To access the admin functions the user must be in the Admin role. You can create an admin user with the Airflow CLI:
airflow create_user -r Admin -u myadmin -e myadmin@example.com -f My -l Admin -p secret_password
