Airflow Celery Flower looks at the old config entry for MySQL

I have been testing Airflow with Celery. I changed broker_url and celery_result_backend to point at MySQL.
Celery starts fine and connects to the database, but when I run airflow flower it seems to read a default config entry to connect to the database.
This is what I am seeing in the log:
[I 170420 13:51:38 mixins:231] Connected to sqla+mysql://airflow:airflow#localhost:3306/airflow
Do I have to modify the Flower config somewhere?

Comment out the sql_alchemy_conn config parameter in the airflow.cfg file.
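For reference, these are the configuration entries in play; a minimal airflow.cfg sketch, where the credentials, host, and driver prefixes are illustrative assumptions to adjust to your own setup:
# airflow.cfg (illustrative sketch)
[core]
# sql_alchemy_conn = mysql://airflow:airflow@localhost:3306/airflow   # the entry the answer suggests commenting out

[celery]
broker_url = sqla+mysql://airflow:airflow@localhost:3306/airflow
celery_result_backend = db+mysql://airflow:airflow@localhost:3306/airflow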

Related

Airflow up and running but no DAG-s in UI

I’m running Apache Airflow on my local Windows 11 machine. The Airflow processes are up and running and the Airflow UI is accessible at localhost:8080.
I also have a DAG file (tuto.py) in the dags folder, but I can’t see any DAGs in the Airflow UI. I have restarted the Airflow webserver, but without success. Any ideas what’s wrong?
The contents of the tuto.py DAG file:
In your docker-compose file, you mentioned that your dags are located in /opt/airflow/dags:
volumes:
  - ./dags:/opt/airflow/dags
  - ./logs:/opt/airflow/logs
  - ./plugins:/opt/airflow/plugins
Either put your dags in the mounted dags directory (the one that maps to /opt/airflow/dags) instead of docker-airflow-master/dags, or change the location of the dags mount.
If your docker-compose.yaml file is at my_project/docker-compose.yaml, your dags must be under my_project/dags/ (e.g. my_project/dags/dag1.py).
The dags directory and the docker-compose file must be at the same level.
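Alternatively, if you want to keep the DAGs where they are, a sketch of the adjusted bind mount, assuming docker-compose.yaml sits next to the docker-airflow-master directory:
volumes:
  - ./docker-airflow-master/dags:/opt/airflow/dags   # host path is an assumption based on the layout described
  - ./logs:/opt/airflow/logs
  - ./plugins:/opt/airflow/plugins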

Airflow Log file does not exist:

Airflow was working fine for several weeks, then suddenly started throwing errors a few days ago.
DAGs fail randomly with this error:
Log file does not exist: airflow_path/1.log
Fetching from: http://:8793/airflow_path/1.log
*** Failed to fetch log file from worker. The request to ':///' is missing either an 'http://
I had a similar issue, and I figured that in my case the worker node (I was using the Celery Executor) was exhausted and therefore unavailable to execute any DAGs. Can you check the CPU and memory utilization of the worker node (or its equivalent if you are not using the Celery executor)?
You can try increasing the CPU and memory for that node and retry.
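For example, a quick resource check on the worker host (plain Linux commands; adapt if the worker runs in a container):
top -b -n 1 | head -n 15   # snapshot of load and the busiest processes
free -m                    # available memory in MB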
Happened to me as well using LocalExecutor and an Airflow setup on Docker Compose. Eventually, I figured that the webserver would fail to fetch old logs whenever I recreated my Docker containers. Digging deeper, I realized that the webserver was failing to fetch the logs because it didn't have access to the filesystem of the scheduler (where the logs live).
The fix was to ensure that both the scheduler and the webserver services in docker-compose.yml share a volume with the logs, i.e.:
# docker-compose.yml
version: "3.9"
services:
  scheduler:
    image: ...
    volumes:
      - airflow_logs:/airflow/logs
    ...
  webserver:
    image: ...
    volumes:
      - airflow_logs:/airflow/logs
    ...
volumes:
  airflow_logs:

How to use the Airflow configuration file (airflow.cfg) when Airflow runs in a container?

I'm using Airflow running in a container as described here. It seems that the airflow.cfg configuration file on the host has no impact on Airflow. I tried the solution here but it didn't help.
The configuration fields I changed are:
default_timezone = system #(from utc)
load_examples = False #(from True)
base_url = http://localhost:8081 #(from 8080)
default_ui_timezone = system #(from UTC)
I didn't see any impact on Airflow even though I did docker-compose down and docker-compose up.
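One thing worth checking: in the containerised setup, settings are usually overridden through AIRFLOW__SECTION__KEY environment variables, which take precedence over airflow.cfg. A sketch of the equivalent overrides for the four values above, assuming the official docker-compose.yaml layout with its shared environment block (x-airflow-common):
# fragment of docker-compose.yaml, under the shared environment block
environment:
  AIRFLOW__CORE__DEFAULT_TIMEZONE: system
  AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
  AIRFLOW__WEBSERVER__BASE_URL: http://localhost:8081
  AIRFLOW__WEBSERVER__DEFAULT_UI_TIMEZONE: system
Note that changing base_url alone does not change which port is published on the host; the webserver's ports mapping in the compose file would also need adjusting.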

Airflow - Failed to fetch log file from worker. 404 Client Error: NOT FOUND for url

I am running Airflow v1.9 with the Celery Executor. I have 5 Airflow workers running on 5 different machines. The Airflow scheduler also runs on one of these machines. I have copied the same airflow.cfg file across these 5 machines.
I have daily workflows set up in different queues like DEV, QA etc. (each worker runs with an individual queue name), and they are running fine.
While scheduling a DAG on one of the workers (no other DAG had been set up for this worker/machine previously), I see the error in the first task, and as a result the downstream tasks fail:
*** Log file isn't local.
*** Fetching here: http://<worker hostname>:8793/log/PDI_Incr_20190407_v2/checkBCWatermarkDt/2019-04-07T17:00:00/1.log
*** Failed to fetch log file from worker. 404 Client Error: NOT FOUND for url: http://<worker hostname>:8793/log/PDI_Incr_20190407_v2/checkBCWatermarkDt/2019-04-07T17:00:00/1.log
I have configured MySQL for storing the DAG metadata. When I checked task_instance table, I see proper hostnames are populated against the task.
I also checked the log location and found that the log is getting created.
airflow.cfg snippet:
base_log_folder = /var/log/airflow
base_url = http://<webserver ip>:8082
worker_log_server_port = 8793
api_client = airflow.api.client.local_client
endpoint_url = http://localhost:8080
What am I missing here? What configurations do I need to check additionally for resolving this issue?
Looks like the worker's hostname is not being correctly resolved.
Add a file hostname_resolver.py:
import os
import socket

import requests


def resolve():
    """
    Resolves the Airflow external hostname for accessing logs on a worker.
    """
    if 'AWS_REGION' in os.environ:
        # On EC2, ask the instance metadata service for our private IPv4:
        return requests.get(
            'http://169.254.169.254/latest/meta-data/local-ipv4').text
    # Otherwise open a UDP socket towards a public DNS server to find out
    # which local IP would be used for outbound traffic:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.connect(('1.1.1.1', 53))
    external_ip = s.getsockname()[0]
    s.close()
    return external_ip
And export: AIRFLOW__CORE__HOSTNAME_CALLABLE=airflow.hostname_resolver:resolve
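The same override can live in airflow.cfg instead of the environment, since AIRFLOW__CORE__* variables map onto the [core] section; a sketch, assuming hostname_resolver.py is importable as airflow.hostname_resolver on the worker's PYTHONPATH:
[core]
hostname_callable = airflow.hostname_resolver:resolve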
The webserver on the master needs to reach the worker to fetch the log and display it on the front-end page. To do that it has to resolve the worker's hostname, which is evidently failing here. Therefore, add a hostname-to-IP mapping to /etc/hosts on the master.
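For example (hostnames and addresses are placeholders):
# /etc/hosts on the master
192.168.xxx.yyy  worker_hostname_0
192.168.xxx.zzz  worker_hostname_1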
If this happens as part of a Docker Compose Airflow setup, the hostname resolution needs to be passed to the container hosting the webserver, e.g. through extra_hosts:
# docker-compose.yml
version: "3.9"
services:
  webserver:
    extra_hosts:
      - "worker_hostname_0:192.168.xxx.yyy"
      - "worker_hostname_1:192.168.xxx.zzz"
    ...
  ...

Airflow: New DAG is not found by webserver

In Airflow, how should I handle the error "This DAG isn't available in the webserver DagBag object. It shows up in this list because the scheduler marked it as active in the metadata database"?
I've copied a new DAG to an Airflow server, and have tried:
unpausing it and refreshing it (basic operating procedure, given in this previous answer https://stackoverflow.com/a/42291683/160406)
restarting the webserver
restarting the scheduler
stopping the webserver and scheduler, resetting the database (airflow resetdb), then starting the webserver and scheduler again
running airflow backfill (suggested in "Airflow: This DAG isn't available in the webserver DagBag object")
running airflow trigger_dag
The scheduler log shows it being processed with no errors, and I can interact with it and view its state through the CLI, but it still does not appear in the web UI.
Edit: the webserver and scheduler are running on the same machine with the same airflow.cfg. They're not running in Docker.
They're run by Supervisor, which runs them both as the same user (airflow). The airflow user has read, write and execute permission on all of the dag files.
This helped me...
pkill -9 -f "airflow scheduler"
pkill -9 -f "airflow webserver"
pkill -9 -f "gunicorn"
then restart the airflow scheduler and webserver.
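A hedged sketch of the restart, assuming the processes are managed by Supervisor under these program names (adjust to how you actually launch them):
supervisorctl restart airflow-webserver airflow-scheduler
# or, if you start them by hand:
airflow scheduler -D
airflow webserver -D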
Just had this issue myself. After changing permissions, resetting the meta database, restarting the webserver, and even making some potential code changes to rectify the situation, the DAG still didn't show up.
However, I noticed that even though we were stopping the webserver, our gunicorn processes were still running. Killing those processes and then starting everything back up resulted in success.
I had the same problem on an Airflow installed from a Docker image.
What I did was:
1- delete all .pyc files
2- delete the DAG's rows from the metadata database using:
for t in ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]:
    sql = "delete from {} where dag_id='{}'".format(t, dag_input)
    hook.run(sql, True)
3- restart the webserver & scheduler
4- execute airflow upgradedb
It resolved the problem for me.
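For completeness, one way the snippet above could be wired up; the connection id and DAG id here are assumptions for illustration:
# illustrative setup for the delete snippet (Airflow 1.x import path)
from airflow.hooks.mysql_hook import MySqlHook

hook = MySqlHook(mysql_conn_id="airflow_db")  # connection pointing at the metadata database
dag_input = "my_dag_id"                       # the DAG whose rows should be purged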
If the airflow_home / dags_folder config parameters are the same for the scheduler, the web UI and the command line interface, the only causes of the error
This DAG isn't available in the webserver DagBag object
can be file permissions or an error in the Python script.
Please check (a sketch of both checks follows below):
Run the DAG as a normal Python script and check for errors
The user in airflow.cfg and the one creating the DAG should be the same, or the DAG file should have execute permission for the airflow user
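For example (paths and file names are placeholders):
# 1. parse the DAG file directly to surface syntax/import errors:
python ~/airflow/dags/tuto.py
# 2. check ownership/permissions and confirm Airflow can see the DAG:
ls -l ~/airflow/dags/
airflow list_dags    # Airflow 1.x; on 2.x the equivalent is `airflow dags list`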
With Airflow 1.9 I don't experience the problem with zombie gunicorn processes.
I do a simple restart, systemctl restart airflow-webserver, and it forces the webserver to refresh the DAG status.
