Apache Airflow configuration page is empty and dags/plugins folders are missing

I have installed Apache Airflow on Ubuntu 18.04 using this guide: https://airflow.apache.org/docs/apache-airflow/stable/start/local.html
Now when I run Airflow with
airflow webserver --port 8080
the Admin/Configuration page is empty and shows this message:
"Your Airflow administrator chose not to expose the configuration,
most likely for security reasons."
What did I do wrong?
More information that may be helpful: I created a user [airflow] and did the whole installation with sudo, so my airflow info output is:
Paths info
airflow_home | /home/airflow/airflow
system_path | /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
python_path | /usr/local/bin:/usr/lib/python36.zip:/usr/lib/python3.6:/usr/lib/python3.6/lib-dynload:/usr/local/lib/python3.6/dist-packages:/usr/lib/python3/dist-packages:/home/airflow/airflow/dags:/home/airflow/airflow/config:/home/airflow/airflow/plugins
airflow_on_path | True
Config info
executor | LocalExecutor
task_logging_handler | airflow.utils.log.file_task_handler.FileTaskHandler
sql_alchemy_conn | postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
dags_folder | /home/airflow/airflow/dags
plugins_folder | /home/airflow/airflow/plugins
base_log_folder | /home/airflow/airflow/logs
However, these folders do not exist either: /home/airflow/airflow/dags and /home/airflow/airflow/plugins
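(Airflow does not create the dags and plugins folders on its own, so creating them by hand should be enough for that part; a minimal sketch, using the airflow_home shown above:)
mkdir -p /home/airflow/airflow/dags /home/airflow/airflow/plugins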

You will probably need to set expose_config = True in airflow.cfg and restart the web-server.
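That setting lives under the [webserver] section of airflow.cfg; the relevant lines would look roughly like this:
[webserver]
...
expose_config = True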

As @somuchtolearnandshare mentioned, it should be AIRFLOW__WEBSERVER__EXPOSE_CONFIG: "True"
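On a plain local (non-Docker) install, the same environment variable can simply be exported before starting the webserver, for example:
export AIRFLOW__WEBSERVER__EXPOSE_CONFIG=True
airflow webserver --port 8080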

If you are running Airflow inside Docker with a compose file, open docker-compose.yaml and add this line under the environment section:
AIRFLOW__WEBSERVER__EXPOSE_CONFIG: 'true'
This should fix the issue.
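For context, a minimal sketch of where that line goes, assuming the layout of the official docker-compose.yaml (with the x-airflow-common anchor):
x-airflow-common:
  &airflow-common
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    # ... other existing variables ...
    AIRFLOW__WEBSERVER__EXPOSE_CONFIG: 'true'
Then recreate the containers with docker-compose up -d so the new variable is picked up.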

I deploy Airflow with the Helm chart, but this should still help.
First, dump the chart's default values to a file:
helm show values apache-airflow/airflow > values.yaml
Find extraEnv in the values file, add the entry below, and save.
extraEnv: |
  - name: AIRFLOW__WEBSERVER__EXPOSE_CONFIG
    value: 'TRUE'
Now apply the change with the command below:
helm upgrade --install airflow apache-airflow/airflow -n airflow -f values.yaml --debug

I just wanted to add that for the "Azure Bitnami Airflow multi-tier" implementation, setting
expose_config = True
in airflow.cfg and restarting the webserver did the trick.

Related

Airflow2 gitSync DAG works for airflow namespace, but not alternate namespace

I'm running minikube to develop with Apache Airflow 2. I am trying to sync my DAGs from a private repo on GitLab, but have taken a few steps back just to get a basic example working. In the default "airflow" namespace it works, but with the exact same file in a non-default namespace, it doesn't.
I have a values.yaml file which has the following section:
dags:
  gitSync:
    enabled: true
    repo: "ssh://git@github.com/apache/airflow.git"
    branch: v2-1-stable
    rev: HEAD
    depth: 1
    maxFailures: 0
    subPath: "tests/dags"
    wait: 60
    containerName: git-sync
    uid: 65533
    extraVolumeMounts: []
    env: []
    resources: {}
If I run helm upgrade --install airflow apache-airflow/airflow -f values.yaml -n airflow, and then kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow, I get a whole list of DAGs as expected at http://localhost:8080.
But if I run helm upgrade --install airflow apache-airflow/airflow -f values.yaml -n mynamespace, and then kubectl port-forward svc/airflow-webserver 8080:8080 --namespace mynamespace, I get no DAGs listed at http://localhost:8080.
This post would be 10 times longer if I listed all the sites I hit trying to resolve this. What have I done wrong??
UPDATE: I created a new namespace, test01, in case there was some history being held over and causing the problem. I ran helm upgrade --install airflow apache-airflow/airflow -f values.yaml -n test01. Starting the webserver and inspecting it, I do not get a login screen; it goes straight to the usual web pages. It still does not show the DAGs list, but this time there is a notice at the top of the DAG page:
The scheduler does not appear to be running.
The DAGs list may not update, and new tasks will not be scheduled.
This is different behaviour yet again (although the same as with mynamespace insofar as showing no DAGs via gitSync), even though it seems to suggest a reason why DAGs aren't being retrieved in this case. I don't understand why a scheduler isn't running if everything was spun-up and initiated the same as before.
Curiously, helm show values apache-airflow/airflow --namespace test01 > values2.yaml gives the default dags.gitSync.enabled: false and dags.gitSync.repo: https://github.com/apache/airflow.git. I would have thought that should reflect what I upgraded/installed from values.yaml: enabled: true and the SSH repo fetch. I get no change in behaviour by editing values2.yaml to dags.gitSync.enabled: true and re-upgrading -- still the note about the scheduler not running, and no DAGs.
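One quick check worth doing is whether the scheduler pod actually started in the new namespace; for example (the deployment and container names below assume the Helm release is called airflow and follow the official chart's naming):
kubectl get pods -n test01                                      # is there a scheduler pod, and is it Running?
kubectl logs deploy/airflow-scheduler -c scheduler -n test01    # recent scheduler logs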

Airflow 2.0.2 - No user yet created

We're moving from Airflow 1.x to 2.0.2, and I'm noticing the below error in my terminal after I run docker-compose run --rm webserver initdb:
{{manager.py:727}} WARNING - No user yet created, use flask fab
command to do it.
but in my entrypoint.sh I have the below to create users:
echo "Creating airflow user: ${AIRFLOW_CREATE_USER_USER_NAME}..."
su -c "airflow users create -r ${AIRFLOW_CREATE_USER_ROLE} -u ${AIRFLOW_CREATE_USER_USER_NAME} -e ${AIRFLOW_CREATE_USER_USER_NAME}#vice.com \
-p ${AIRFLOW_CREATE_USER_PASSWORD} -f ${AIRFLOW_CREATE_USER_FIRST_NAME} -l \
${AIRFLOW_CREATE_USER_LAST_NAME}" airflow
echo "Created airflow user: ${AIRFLOW_CREATE_USER_USER_NAME} done!"
;;
Because of this error, whenever I try to run Airflow locally I still have to run the commands below to create a user manually every time I start up Airflow:
docker-compose run --rm webserver bash
airflow users create \
--username name \
--firstname fname \
--lastname lname \
--password pw \
--role Admin \
--email email@email.com
Looking at the Airflow Docker entrypoint script (entrypoint_prod.sh), it looks like Airflow will create an admin user for you when the container boots.
By default the admin user is 'admin' without a password.
If you want something different, set these variables: _AIRFLOW_WWW_USER_USERNAME and _AIRFLOW_WWW_USER_PASSWORD
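A minimal sketch of passing those through docker-compose (the airflow-init service and the *airflow-common-env anchor follow the official compose file layout; the username and password values are just placeholders):
services:
  airflow-init:
    environment:
      <<: *airflow-common-env
      _AIRFLOW_WWW_USER_USERNAME: admin       # placeholder
      _AIRFLOW_WWW_USER_PASSWORD: changeme    # placeholder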
(I'm on airflow 2.2.2)
Looks like they changed the admin creation command password from -p test to -p $DEFAULT_PASSWORD. I had to pass in this DEFAULT_PASSWORD env var to the docker-compose environment for the admin user to be created. It also looks like they now suggest using the .env.localrunner file for configuration.
Here is the commit where that change was made.
(I think you asked this question prior to that change being made, but maybe this will help someone in the future who had my same issue).

How do you access Airflow Web Interface?

Hi, I am taking a DataCamp class on how to use Airflow, and it shows how to create DAGs once you have access to an Airflow web interface.
Is there an easy way to create an account in the Airflow web interface? I am very lost on how to do this. Or is this just an enterprise tool where they provide you access once you pay?
You must do this in the terminal. Run these commands:
export AIRFLOW_HOME=~/airflow
AIRFLOW_VERSION=2.2.5
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
airflow standalone
The generated username and password are printed there in the terminal output.
Then open Chrome and go to:
localhost:8080
and log in with that username and password.
Airflow has a web interface by default, and the default user/password is airflow/airflow.
You can run it with:
airflow webserver --port 8080
then open the link: http://localhost:8080
If you want to create a new user, use this command:
airflow create_user [-h] [-r ROLE] [-u USERNAME] [-e EMAIL] [-f FIRSTNAME]
[-l LASTNAME] [-p PASSWORD] [--use_random_password]
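Note that airflow create_user is the Airflow 1.x syntax; on Airflow 2.x the subcommand moved under users, for example (the username, email, and password values here are just placeholders):
airflow users create -r Admin -u myuser -f First -l Last -e myuser@example.com -p mypassword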
learn more about Running Airflow locally
You should install it; it is a Python package, not a website to register on.
The easiest way to install Airflow is:
pip install apache-airflow
If you need extra packages with it:
pip install apache-airflow[postgres,gcp]
Finally, run the webserver and the scheduler in separate terminals:
airflow webserver # it is by default 8080
airflow scheduler

Change Airflow Services Logs Path

I am looking for resources to change the log paths for Airflow services such as the webserver and scheduler. I am running out of space every now and then, so I want to move the logs onto a bigger mount.
airflow-scheduler.log
airflow-webserver.log
airflow-scheduler.out
airflow-webserver.out
airflow-scheduler.err
airflow-webserver.err
I am starting the services using below given command:
airflow webserver -D
airflow scheduler -D
Thanks in advance!
From https://airflow.apache.org/howto/write-logs.html#writing-logs-locally
Users can specify a logs folder in airflow.cfg using the base_log_folder setting. By default, it is in the AIRFLOW_HOME directory.
You need to change the log-related parameters in airflow.cfg as below:
[core]
...
# The folder where airflow should store its log files
# This path must be absolute
base_log_folder = /YOUR_MOUNTED_PATH/logs
...
[webserver]
...
# Log files for the gunicorn webserver. '-' means log to stderr.
access_logfile = /YOUR_MOUNTED_PATH/webserver-access.log
error_logfile = /YOUR_MOUNTED_PATH/webserver-error.log
...
The log location can be specified in airflow.cfg as follows. By default, it is under AIRFLOW_HOME.
[core]
...
# The folder where airflow should store its log files
# This path must be absolute
base_log_folder = /airflow/logs
...
Please refer to this for additional information https://airflow.apache.org/howto/write-logs.html?highlight=logs
In both master (code) and the 1.10 branch (code), the locations of the following files are hardcoded unless you pass an argument to the cli:
airflow-webserver.err
airflow-webserver.out
airflow-webserver.log
airflow-scheduler.err
airflow-scheduler.out
airflow-scheduler.log
The rest of the log locations can be modified through one of the following variables (these can also be set as environment variables; see the sketch after these lists):
In the [core] section:
base_log_folder
log_filename_template
log_processor_filename_template
dag_processor_manager_log_location
And in the [webserver] section:
access_logfile
error_logfile
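A minimal sketch of setting these via Airflow's AIRFLOW__{SECTION}__{KEY} environment-variable convention instead of editing airflow.cfg; note that on Airflow 2.x base_log_folder lives under [logging] rather than [core]:
# Airflow 2.x ([logging] section); on 1.10 use AIRFLOW__CORE__BASE_LOG_FOLDER instead
export AIRFLOW__LOGGING__BASE_LOG_FOLDER=/YOUR_MOUNTED_PATH/logs
export AIRFLOW__WEBSERVER__ACCESS_LOGFILE=/YOUR_MOUNTED_PATH/webserver-access.log
export AIRFLOW__WEBSERVER__ERROR_LOGFILE=/YOUR_MOUNTED_PATH/webserver-error.log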
You can supply flags to the airflow webserver -D and airflow scheduler -D commands to put all of the generated webserver and scheduler log files where you want them. Here's an example:
airflow webserver -D \
--port 8080 \
-A $AIRFLOW_HOME/logs/webserver/airflow-webserver.out \
-E $AIRFLOW_HOME/logs/webserver/airflow-webserver.err \
-l $AIRFLOW_HOME/logs/webserver/airflow-webserver.log \
--pid $AIRFLOW_HOME/logs/webserver/airflow-webserver.pid \
--stderr $AIRFLOW_HOME/logs/webserver/airflow-webserver.stderr \
--stdout $AIRFLOW_HOME/logs/webserver/airflow-webserver.stdout
and
airflow scheduler -D \
-l $AIRFLOW_HOME/logs/scheduler/airflow-scheduler.log \
--pid $AIRFLOW_HOME/logs/scheduler/airflow-scheduler.pid \
--stderr $AIRFLOW_HOME/logs/scheduler/airflow-scheduler.stderr \
--stdout $AIRFLOW_HOME/logs/scheduler/airflow-scheduler.stdout
Note: if you use these, you'll need to create the logs/webserver and logs/scheduler subfolders first. This has only been tested on Airflow 2.1.2.
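For example, the subfolders can be created with:
mkdir -p $AIRFLOW_HOME/logs/webserver $AIRFLOW_HOME/logs/scheduler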

Airflow will keep showing example dags even after removing it from configuration

Airflow example DAGs remain in the UI even after I have set load_examples = False in the config file.
The system reports that the DAGs are not present in the DAG folder, but they remain in the UI because the scheduler has marked them as active in the metadata database.
I know one way to remove them would be to delete these rows directly in the database, but of course this is not ideal. How should I proceed to remove these DAGs from the UI?
There is currently no way of stopping a deleted DAG from being displayed on the UI except manually deleting the corresponding rows in the DB. The only other way is to restart the server after an initdb.
Airflow 1.10+:
Edit airflow.cfg and set load_examples = False
For each example dag run the command airflow delete_dag example_dag_to_delete
This avoids resetting the entire Airflow DB.
(Since Airflow 1.10 there is a command to delete a DAG from the database; see this answer.)
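A small sketch of the delete_dag step, assuming the Airflow 1.10 CLI and a couple of the stock example DAG names (on Airflow 2.x the equivalent command is airflow dags delete):
# remove each example DAG from the metadata DB; -y skips the confirmation prompt
for dag in example_bash_operator example_python_operator; do
  airflow delete_dag -y "$dag"
done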
This assumes you have installed Airflow through Anaconda; otherwise, look for Airflow in your Python site-packages folder and follow the steps below.
After you follow the instructions in https://stackoverflow.com/a/43414326/1823570:
Go to the $AIRFLOW_HOME/lib/python2.7/site-packages/airflow directory
Remove the directory named example_dags, or just rename it if you want to be able to revert
Restart your webserver:
cat $AIRFLOW_HOME/airflow-webserver.pid | xargs kill -9
airflow webserver -p [port-number]
Definitely airflow resetdb works here.
What I do is create multiple shell scripts for various purposes, like starting the webserver, starting the scheduler, refreshing DAGs, etc. I only need to run the relevant script to do what I want. Here is the list:
(venv) (base) [pchoix@hadoop02 airflow]$ cat refresh_airflow_dags.sh
#!/bin/bash
cd ~
source venv/bin/activate
airflow resetdb
(venv) (base) [pchoix@hadoop02 airflow]$ cat start_airflow_scheduler.sh
#!/bin/bash
cd /home/pchoix
source venv/bin/activate
cd airflow
nohup airflow scheduler >> "logs/schd/$(date +'%Y%m%d%I%M%p').log" &
(venv) (base) [pchoix@hadoop02 airflow]$ cat start_airflow_webserver.sh
#!/bin/bash
cd /home/pchoix
source venv/bin/activate
cd airflow
nohup airflow webserver >> "logs/web/$(date +'%Y%m%d%I%M%p').log" &
(venv) (base) [pchoix@hadoop02 airflow]$ cat start_airflow.sh
#!/bin/bash
cd /home/pchoix
source venv/bin/activate
cd airflow
nohup airflow webserver >> "logs/web/$(date +'%Y%m%d%I%M%p').log" &
nohup airflow scheduler >> "logs/schd/$(date +'%Y%m%d%I%M%p').log" &
Don't forget to chmod +x those scripts.
I hope this helps.
