Airflow - Failed to fetch log file from worker. 404 Client Error: NOT FOUND for url - airflow

I am running Airflowv1.9 with Celery Executor. I have 5 Airflow workers running in 5 different machines. Airflow scheduler is also running in one of these machines. I have copied the same airflow.cfg file across these 5 machines.
I have daily workflows setup in different queues like DEV, QA etc. (each worker runs with an individual queue name) which are running fine.
While scheduling a DAG in one of the worker (no other DAG have been setup for this worker/machine previously), I am seeing the error in the 1st task and as a result downstream tasks are failing:
*** Log file isn't local.
*** Fetching here: http://<worker hostname>:8793/log/PDI_Incr_20190407_v2/checkBCWatermarkDt/2019-04-07T17:00:00/1.log
*** Failed to fetch log file from worker. 404 Client Error: NOT FOUND for url: http://<worker hostname>:8793/log/PDI_Incr_20190407_v2/checkBCWatermarkDt/2019-04-07T17:00:00/1.log
I have configured MySQL for storing the DAG metadata. When I checked task_instance table, I see proper hostnames are populated against the task.
I also checked the log location and found that the log is getting created.
airflow.cfg snippet:
base_log_folder = /var/log/airflow
base_url = http://<webserver ip>:8082
worker_log_server_port = 8793
api_client = airflow.api.client.local_client
endpoint_url = http://localhost:8080
What am I missing here? What configurations do I need to check additionally for resolving this issue?

Looks like the worker's hostname is not being correctly resolved.
Add a file hostname_resolver.py:
import os
import socket
import requests
def resolve():
"""
Resolves Airflow external hostname for accessing logs on a worker
"""
if 'AWS_REGION' in os.environ:
# Return EC2 instance hostname:
return requests.get(
'http://169.254.169.254/latest/meta-data/local-ipv4').text
# Use DNS request for finding out what's our external IP:
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(('1.1.1.1', 53))
external_ip = s.getsockname()[0]
s.close()
return external_ip
And export: AIRFLOW__CORE__HOSTNAME_CALLABLE=airflow.hostname_resolver:resolve

The web program of the master needs to go to the worker to fetch the log and display it on the front-end page. This process is to find the host name of the worker. Obviously, the host name cannot be found,Therefore, add the host name to IP mapping on the master's vim /etc/hosts

If this happens as part of a Docker Compose Airflow setup, the hostname resolution needs to be passed to the container hosting the webserver, e.g. through extra_hosts:
# docker-compose.yml
version: "3.9"
services:
webserver:
extra_hosts:
- "worker_hostname_0:192.168.xxx.yyy"
- "worker_hostname_1:192.168.xxx.zzz"
...
...
More details here.

Related

Apache Airflow 1.10.10, Remote Worker and S3 logs

I have recently upgraded airflow from 1.10.0 to 1.10.10. The current setup is web, worker, scheduler and flower are on same machine. When a DAG is run first step is it spins up new EMR for the DAG and along with it a worker node where only worker process runs. We are using celery executor. This worker node sends tasks to run on EMR cluster. Once the tasks are run next steps are terminating EMR and terminating this worker instance. Every task's log is present on this worker node. As long as the tasks are running or worker node is running, I can see the logs on web UI. But as soon as worker is terminated, I am unable to see the logs. The config is setup is to upload logs to s3. I see logs of startEMR and startWorker on S3 since these logs are main airflow instance(where all 4 processes are running)
Here is the config snippet of airflow.cfg
base_log_folder = /home/deploy/airflow/logs
remote_logging = True
remote_base_log_folder = s3://airflow-log-bucket/airflow/logs/
remote_log_conn_id = aws_default
encrypt_s3_logs = False
s3_log_folder = '/airflow/logs/'
executor = CeleryExecutor
Same config file is setup when worker instance is initialized for DAG and only worker process is started on that node.
Here is the log from a task when worker node is terminated.
*** Log file does not exist: /home/deploy/airflow/logs/XXXX/XXXXXX/2020-07-07T23:30:05+00:00/1.log
*** Fetching from: http://ip-10-164-62-253.ap-southeast-2.compute.internal:8799/log/XXXX/XXXXXX/2020-07-07T23:30:05+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='ip-10-164-62-253.ap-southeast-2.compute.internal', port=8799): Max retries exceeded with url: /log/xxxx/XXXXXX/2020-07-07T23:30:05+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6750ac52b0>: Failed to establish a new connection: [Errno 113] No route to host',))
So basically -
This was working in airflow 1.10.1 (I did not need to add remote_logging=True)
The logs are copied to S3 for EMR start and Worker Node start steps and are shown on web-UI.
Only tasks running on remote worker node are not copied to S3.
Can someone please let me know what am I missing in configuration as same config used to work on airflow1.10.0
I found the mistake I was doing. The S3 module that was getting installed on new worker node was being installed via pip and not pip3. Airflow server was having this installation from pip3.
Another config change I had to do was in webserver section of airflow.cfg file.
worker_class = sync
This was previously gevent.

Airflow (LocalExecutor) - Docker :: Job is failing with Log file does not exist

Airflow version: 1.10.9
Executor : LocalExecutor
Docket Setup
when job runs sometime we are getting following error. I have searched in web, many people faced this issue in celeryExecutor but we are using LocalExecutor(Docker setup). How can I resolve this problem?
*** Log file does not exist: /home/ubuntu/airflow/airflow/logs/es_update_relevance_score/es_update_relevance_score/2020-05-14T16:26:06.062416+00:00/1.log
*** Fetching from: http://:8793/log/es_update_relevance_score/es_update_relevance_score/2020-05-14T16:26:06.062416+00:00/1.log
*** Failed to fetch log file from worker. Invalid URL 'http://:8793/log/es_update_relevance_score/es_update_relevance_score/2020-05-14T16:26:06.062416+00:00/1.log': No host supplied
Here is one approach I've seen when running the scheduler and webserver in their own containers and using LocalExecutor:
Mount a host log directory as a volume into both the scheduler and webserver containers:
volumes:
- /location/on/host/airflow/logs:/opt/airflow/logs
Make sure the user within the airflow containers (usually airflow) has permissions to read and write that directory. If the permissions are wrong you will see an error like the one in your post.
This probably won't scale beyond LocalExecutor usage though.

Airflow: How to setup log directory?

I upload a dag file to the web page and when I click 'Graph View' -> ${my_dag} -> 'View Log', it shows:
*** Log file isn't local.
*** Fetching here: http://:8793/log/demo_dag/hello_task/2018-11-14T15:06:00
*** Failed to fetch log file from worker.
*** Reading remote logs...
*** Unsupported remote log location.
I have checked the airflow.cfg and find these config info:
worker_log_server_port = 8793
base_log_folder = /root/airflow/logs
My question is:
How to setup IP address for log service (Only port is setup)?
I have setup directory for log service, why does it still go to /log/.. ?
Any help is appreciated.
This can happen when the task status was manually changed (likely through the "Mark Success" option) and the task never receives a hostname value on the record.
The webserver is attempting to reach out to a server, with no name, to get logs for a task that never ran.
PS: Be careful running processes as the root user.
I've been getting this error, fix it by correcting the socket volume path:
WARNING - OSError while attempting to symlink the latest log directory
In windows the volume will go with a double bar like this:
volumes:
- //var/run/docker.sock:/var/run/docker.sock
Bind to docker socket on Windows
Setting up Airflow to run with Docker Swarm’s orchestration

Remote logs in Airflow

I have two machines. Machine1: airflow-webserver, airflow-scheduler. Machine2: airflow-worker on specific queue. I am using CeleryExecutor. Task on machine2 runs successfully (writing and deleting files on local drive), but in web UI on machine1 I didnt read log files.
*** Log file does not exist: /home/airflow/logs/delete_images_by_ttl/delete_images/2018-10-29T12:24:23.299741+00:00/1.log
*** Fetching from: http://localhost-int.localdomain:8793/log/delete_images_by_ttl/delete_images/2018-10-29T12:24:23.299741+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='localhost-int.localdomain', port=8793): Max retries exceeded with url: /log/delete_images_by_ttl/delete_images/2018-10-29T12:24:23.299741+00:00/1.log
To solve this problem edit your /etc/hosts. Add ip and dns-name for airflow webserver
HTTPConnectionPool means webserver is not able to communicate to the worker node.
Add worker node hostname on /etc/hosts file
Also verify below
base_log_folder = /home/airflow/logs/
sudo chmod -R 777 /home/airflow/logs/

Airflow live executor logs with DaskExecutor

I have an Airflow installation (on Kubernetes). My setup uses DaskExecutor. I also configured remote logging to S3. However when the task is running I cannot see the log, and I get this error instead:
*** Log file does not exist: /airflow/logs/dbt/run_dbt/2018-11-01T06:00:00+00:00/3.log
*** Fetching from: http://airflow-worker-74d75ccd98-6g9h5:8793/log/dbt/run_dbt/2018-11-01T06:00:00+00:00/3.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-74d75ccd98-6g9h5', port=8793): Max retries exceeded with url: /log/dbt/run_dbt/2018-11-01T06:00:00+00:00/3.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7d0668ae80>: Failed to establish a new connection: [Errno -2] Name or service not known',))
Once the task is done, the log is shown correctly.
I believe what Airflow is doing is:
for finished tasks read logs from s3
for running tasks, connect to executor's log server endpoint and show that.
Looks like Airflow is using celery.worker_log_server_port to connect to my dask executor to fetch logs from there.
How to configure DaskExecutor to expose log server endpoint?
my configuration:
core remote_logging True
core remote_base_log_folder s3://some-s3-path
core executor DaskExecutor
dask cluster_address 127.0.0.1:8786
celery worker_log_server_port 8793
what i verified:
- verified that the log file exists and is being written to on the executor while the task is running
- called netstat -tunlp on executor container, but did not find any extra port exposed, where logs could be served from.
UPDATE
have a look at serve_logs airflow cli command - I believe it does exactly the same.
We solved the problem by simply starting a python HTTP handler on a worker.
Dockerfile:
RUN mkdir -p $AIRFLOW_HOME/serve
RUN ln -s $AIRFLOW_HOME/logs $AIRFLOW_HOME/serve/log
worker.sh (run by Docker CMD):
#!/usr/bin/env bash
cd $AIRFLOW_HOME/serve
python3 -m http.server 8793 &
cd -
dask-worker $#

Resources