I have my Airflow Docker container running with Celery worker nodes, and the jobs are getting triggered correctly. However, the webserver UI cannot reach the worker for logs, showing:
*** Log file does not exist: /usr/local/airflow/airflow/logs/spark-submit/transform/2021-08-07T04:21:58.646836+00:00/1.log
*** Fetching from: http://localhost.localdomain:8793/log/spark-submit/transform/2021-08-07T04:21:58.646836+00:00/1.log
*** Failed to fetch log file from worker. [Errno 111] Connection refused
I'm trying to diagnose why this happened. Does it mean that the webserver is not able to resolve the worker's IP address correctly? How can I configure this worker IP mapping somewhere?
I've tried setting the hostname in airflow.cfg as
hostname_callable = airflow.utils.net.get_host_ip_address
but it doesn't help.
Appreciate any help! Thanks
Version:
airflow==2.1.2
celery[redis]==4.4.2
Related
I am trying to integrate an Apache Atlas instance I have running with Apache Airflow. Once I set up the connection in airflow.cfg I tried running a DAG from the Airflow scheduler. I get the following error in the log.
[2021-02-02 20:50:47,958] {connectionpool.py:752} WARNING - Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f464b856950>: Failed to establish a new connection: [Errno 111] Connection refused')': /api/atlas/v2/types/typedefs
[2021-02-02 20:50:47,960] {taskinstance.py:1150} ERROR - HTTPConnectionPool(host='localhost', port=21000): Max retries exceeded with url: /api/atlas/v2/types/typedefs (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f464b8650d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
My airflow.cfg is configured as the following:
[lineage]
backend = airflow.lineage.backend.atlas.AtlasBackend
[atlas]
username = <username>
password = <password>
host = localhost
port = 21000
I have tried changing the host to http://localhost as well. I am not sure where to investigate in Atlas to identify why the connection is being refused.
Connection refused means either the service is not listening on the configured port or the hostname is wrong.
Try replacing localhost with the FQDN.
A good way to configure it correctly is to open the Atlas UI and simply put the hostname from the URL into the config.
I was able to solve the problem by adding the --hostname flag when starting the Docker container for Atlas. I then used the hostname I provided as the host in airflow.cfg.
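For reference (not the exact commands used), that roughly looks like the following, where the image name and the hostname atlas.internal are placeholders, and it assumes the Airflow containers share a Docker network with Atlas so the name resolves:
docker run -d --hostname atlas.internal -p 21000:21000 <atlas-image>
and then in airflow.cfg:
[atlas]
host = atlas.internal
port = 21000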
Airflow version: 1.10.9
Executor: LocalExecutor
Docker setup
When a job runs, we sometimes get the following error. I have searched the web; many people faced this issue with the CeleryExecutor, but we are using the LocalExecutor (Docker setup). How can I resolve this problem?
*** Log file does not exist: /home/ubuntu/airflow/airflow/logs/es_update_relevance_score/es_update_relevance_score/2020-05-14T16:26:06.062416+00:00/1.log
*** Fetching from: http://:8793/log/es_update_relevance_score/es_update_relevance_score/2020-05-14T16:26:06.062416+00:00/1.log
*** Failed to fetch log file from worker. Invalid URL 'http://:8793/log/es_update_relevance_score/es_update_relevance_score/2020-05-14T16:26:06.062416+00:00/1.log': No host supplied
Here is one approach I've seen when running the scheduler and webserver in their own containers and using LocalExecutor:
Mount a host log directory as a volume into both the scheduler and webserver containers:
volumes:
- /location/on/host/airflow/logs:/opt/airflow/logs
Make sure the user within the airflow containers (usually airflow) has permissions to read and write that directory. If the permissions are wrong you will see an error like the one in your post.
This probably won't scale beyond LocalExecutor usage though.
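Putting that together, a minimal docker-compose sketch might look like this (the image tag, paths, and service names are illustrative, not a drop-in file):
# docker-compose.yml (sketch)
version: "3.9"
services:
  scheduler:
    image: apache/airflow:2.1.2
    command: scheduler
    volumes:
      - /location/on/host/airflow/logs:/opt/airflow/logs
  webserver:
    image: apache/airflow:2.1.2
    command: webserver
    volumes:
      - /location/on/host/airflow/logs:/opt/airflow/logs
Because both containers see the same files, the webserver reads the logs straight from disk instead of fetching them from a worker over port 8793. On the host, the mounted directory has to be writable by the UID the containers run as (50000 by default for the official apache/airflow image).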
I am running Airflow v1.9 with the Celery executor. I have 5 Airflow workers running on 5 different machines. The Airflow scheduler is also running on one of these machines. I have copied the same airflow.cfg file across these 5 machines.
I have daily workflows set up in different queues like DEV, QA, etc. (each worker runs with an individual queue name), and they are running fine.
While scheduling a DAG on one of the workers (no other DAG had been set up for this worker/machine previously), I am seeing the following error on the first task, and as a result the downstream tasks are failing:
*** Log file isn't local.
*** Fetching here: http://<worker hostname>:8793/log/PDI_Incr_20190407_v2/checkBCWatermarkDt/2019-04-07T17:00:00/1.log
*** Failed to fetch log file from worker. 404 Client Error: NOT FOUND for url: http://<worker hostname>:8793/log/PDI_Incr_20190407_v2/checkBCWatermarkDt/2019-04-07T17:00:00/1.log
I have configured MySQL for storing the DAG metadata. When I checked the task_instance table, I saw that proper hostnames are populated against the task.
I also checked the log location and found that the log is getting created.
airflow.cfg snippet:
base_log_folder = /var/log/airflow
base_url = http://<webserver ip>:8082
worker_log_server_port = 8793
api_client = airflow.api.client.local_client
endpoint_url = http://localhost:8080
What am I missing here? What additional configuration do I need to check to resolve this issue?
Looks like the worker's hostname is not being correctly resolved.
Add a file hostname_resolver.py:
import os
import socket
import requests
def resolve():
    """
    Resolves Airflow external hostname for accessing logs on a worker
    """
    if 'AWS_REGION' in os.environ:
        # On EC2, ask the instance metadata service for the private IP:
        return requests.get(
            'http://169.254.169.254/latest/meta-data/local-ipv4').text
    # Otherwise open a UDP socket towards a public DNS server to find our outbound IP:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.connect(('1.1.1.1', 53))
    external_ip = s.getsockname()[0]
    s.close()
    return external_ip
And export: AIRFLOW__CORE__HOSTNAME_CALLABLE=airflow.hostname_resolver:resolve
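Note that hostname_resolver.py has to be importable by both the worker and the webserver (here it is dropped inside the installed airflow package, hence the airflow.hostname_resolver prefix; any module on the PYTHONPATH works as well). On Airflow 2.x the callable is referenced with a dot instead of a colon, so the equivalent setting would presumably be:
AIRFLOW__CORE__HOSTNAME_CALLABLE=airflow.hostname_resolver.resolve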
The webserver on the master needs to reach the worker to fetch the log and display it on the front-end page, which means it has to resolve the worker's hostname. Evidently that hostname cannot be resolved, so add a hostname-to-IP mapping for the worker to /etc/hosts on the master.
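For example, an entry like the following on the master (the IP and hostname are placeholders for your actual worker):
192.168.1.10    worker-node-1
With that in place, the "Fetching from: http://<worker hostname>:8793/..." URL becomes reachable from the webserver.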
If this happens as part of a Docker Compose Airflow setup, the hostname resolution needs to be passed to the container hosting the webserver, e.g. through extra_hosts:
# docker-compose.yml
version: "3.9"
services:
  webserver:
    extra_hosts:
      - "worker_hostname_0:192.168.xxx.yyy"
      - "worker_hostname_1:192.168.xxx.zzz"
    ...
  ...
I uploaded a DAG file to the web page, and when I click 'Graph View' -> ${my_dag} -> 'View Log', it shows:
*** Log file isn't local.
*** Fetching here: http://:8793/log/demo_dag/hello_task/2018-11-14T15:06:00
*** Failed to fetch log file from worker.
*** Reading remote logs...
*** Unsupported remote log location.
I have checked airflow.cfg and found this config info:
worker_log_server_port = 8793
base_log_folder = /root/airflow/logs
My questions are:
How do I set up the IP address for the log service (only the port is set up)?
I have set up the directory for the log service, so why does it still go to /log/.. ?
Any help is appreciated.
This can happen when the task status was manually changed (likely through the "Mark Success" option) and the task never receives a hostname value on the record.
The webserver is attempting to reach out to a server, with no name, to get logs for a task that never ran.
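A quick way to confirm this is to look at the hostname column on the task_instance table in the metadata database; for the affected run it will be empty. A query along these lines (the dag_id is a placeholder) shows it:
SELECT task_id, hostname FROM task_instance WHERE dag_id = 'demo_dag';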
PS: Be careful running processes as the root user.
I've been getting this error and fixed it by correcting the socket volume path:
WARNING - OSError while attempting to symlink the latest log directory
On Windows the volume path needs a double slash, like this:
volumes:
- //var/run/docker.sock:/var/run/docker.sock
See also: "Bind to docker socket on Windows" and "Setting up Airflow to run with Docker Swarm's orchestration".
I have two machines. Machine 1: airflow-webserver, airflow-scheduler. Machine 2: airflow-worker on a specific queue. I am using the CeleryExecutor. The task on machine 2 runs successfully (writing and deleting files on the local drive), but in the web UI on machine 1 I can't read the log files.
*** Log file does not exist: /home/airflow/logs/delete_images_by_ttl/delete_images/2018-10-29T12:24:23.299741+00:00/1.log
*** Fetching from: http://localhost-int.localdomain:8793/log/delete_images_by_ttl/delete_images/2018-10-29T12:24:23.299741+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='localhost-int.localdomain', port=8793): Max retries exceeded with url: /log/delete_images_by_ttl/delete_images/2018-10-29T12:24:23.299741+00:00/1.log
To solve this problem, edit /etc/hosts on the machine running the Airflow webserver and add the IP and DNS name of the worker.
An HTTPConnectionPool error means the webserver is not able to communicate with the worker node.
Add the worker node's hostname to the /etc/hosts file.
Also verify the following:
base_log_folder = /home/airflow/logs/
sudo chmod -R 777 /home/airflow/logs/