I have my Airflow Docker container running with Celery worker nodes, and the jobs are triggered correctly. However, the webserver UI cannot reach the worker to fetch the log. It shows:
*** Log file does not exist: /usr/local/airflow/airflow/logs/spark-submit/transform/2021-08-07T04:21:58.646836+00:00/1.log
*** Fetching from: http://localhost.localdomain:8793/log/spark-submit/transform/2021-08-07T04:21:58.646836+00:00/1.log
*** Failed to fetch log file from worker. [Errno 111] Connection refused
I'm trying to diagnose why this happens. Does it mean the webserver cannot resolve the worker's IP address correctly? Where can I configure this worker IP mapping?
I've tried setting the hostname callable in airflow.cfg:
hostname_callable = airflow.utils.net.get_host_ip_address
but it doesn't help.
I'd appreciate any help. Thanks!
Versions:
airflow==2.1.2
celery[redis]==4.4.2
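To narrow this down, it helps to check which hostname the worker reports and whether the webserver can reach that name on the log port. Below is a minimal standard-library sketch, assuming Airflow's default hostname callable (which uses `socket.getfqdn()`) and the default `worker_log_server_port` of 8793. The helper names are hypothetical. Run `worker_hostname()` inside the worker container, then call `resolve()` and `can_connect()` with that name from inside the webserver container.

```python
import socket

# Hypothetical diagnostic helpers: they check what the webserver needs, not Airflow internals.


def worker_hostname():
    """Name the worker would record under the default hostname callable (socket.getfqdn)."""
    return socket.getfqdn()


def resolve(hostname):
    """Return the IPv4 address a hostname resolves to on this machine, or None."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def can_connect(host, port=8793, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (8793 = worker_log_server_port)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers "Connection refused", timeouts and resolution failures
        return False
```

If the worker reports `localhost.localdomain`, the webserver container would resolve that name to itself rather than to the worker. This would be consistent with the `[Errno 111] Connection refused` above.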
I'm working on a project that deploys a pentest lab with Terraform and Ansible. Everything works except for one last problem.
In my lab, I have an nginx server running on a Windows server. nginx with PHP works when I start them as Administrator with Ansible, but I need them to run under a non-admin local account.
For PHP, I've made a wrapper using this tool: https://github.com/antonioCoco/RunasCs
But it doesn't work with nginx because of a working-directory problem.
Here is the error:
PS C:\Users\Administrator> .\RunAsCs.exe nginx ***** C:\Web\nginx-1.19.6\nginx.exe
[*] Warning: GetUserProfileDirectory failed with error code: 2
[*] Warning: Unable to obtain environment for user 'nginx'.
[*] Warning: Environment of created process might be incorrect.
nginx: [alert] could not open error log file: CreateFile() "logs/error.log" failed (3: The system cannot find the path specified)
2021/03/06 10:18:33 [emerg] 5556#6124: CreateFile() "C:\Windows\system32/conf/nginx.conf" failed (3: The system cannot find the path specified)
That's expected because, as you can see, my wrapper starts in C:\Windows\System32.
I would like to know if there is a solution, either in nginx.conf or with Ansible, to start this exe as the "nginx" user.
This is working code for starting nginx as Administrator:
- name: Starting web server
  win_shell: .\nginx.exe
  args:
    chdir: C:\Web\nginx-1.19.6
  async: 180
  poll: 0
I know there is a psexec module in Ansible, but psexec only works with a local admin account, and the whole point is that nginx must not run as a local admin.
Thanks for the help!
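The error shows nginx resolving `conf/nginx.conf` and `logs/error.log` relative to its working directory (`C:\Windows\system32`). nginx's `-p` option sets the path prefix that those relative paths are resolved against. One option, then, is to pass the install directory explicitly instead of relying on the working directory. The sketch below only builds the command line. It assumes RunasCs takes user, password and command in the order used in the question above, and that it forwards the quoted command unchanged. The password is a placeholder.

```python
import ntpath
import subprocess


def nginx_command(install_dir):
    """nginx.exe command line with an explicit -p prefix, so the conf and logs
    directories are resolved against install_dir instead of the working directory."""
    exe = ntpath.join(install_dir, "nginx.exe")
    # list2cmdline applies the Windows (MS C runtime) argument quoting rules.
    return subprocess.list2cmdline([exe, "-p", install_dir])


def runascs_command(user, password, install_dir, runas_exe="RunasCs.exe"):
    """RunasCs invocation in the argument order used in the question: user, password, command."""
    return subprocess.list2cmdline([runas_exe, user, password, nginx_command(install_dir)])


print(runascs_command("nginx", "PASSWORD_PLACEHOLDER", r"C:\Web\nginx-1.19.6"))
# prints: RunasCs.exe nginx PASSWORD_PLACEHOLDER "C:\Web\nginx-1.19.6\nginx.exe -p C:\Web\nginx-1.19.6"
```

The prefix is written without a trailing backslash. A backslash directly before the closing quote would change how Windows splits the command line.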
Composer is failing a task because it cannot read a log file. It complains about incorrect encoding.
Here's the log that appears in the UI:
*** Unable to read remote log from gs://bucket/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** 'ascii' codec can't decode byte 0xc2 in position 6986: ordinal not in range(128)
*** Log file does not exist: /home/airflow/gcs/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** Fetching from: http://airflow-worker-68dc66c9db-x945n:8793/log/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-68dc66c9db-x945n', port=8793): Max retries exceeded with url: /log/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1c9ff19d10>: Failed to establish a new connection: [Errno -2] Name or service not known',))
When I try to view the file in the Google Cloud Console, it also throws an error:
Failed to load
Tracking Number: 8075820889980640204
But I am able to download the file via gsutil.
When I view the file, some text seems to overlap other text.
I can't show the entire file, but it looks like this:
--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------
#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:23,313] {models.py:1569} INFO - Executing <Task(BigQueryOperator): merge_campaign_exceptions> on 2019-08-03T10:00:00+00:00#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:23,314] {base_task_runner.py:124} INFO - Running: ['bash', '-c', u'airflow run __campaign_exceptions_0_0_1 merge_campaign_exceptions 2019-08-03T10:00:00+00:00 --job_id 22767 --pool _bq_pool --raw -sd DAGS_FOLDER//-campaign-exceptions.py --cfg_path /tmp/tmpyBIVgT']#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:24,658] {base_task_runner.py:107} INFO - Job 22767: Subtask merge_campaign_exceptions [2019-08-04 10:01:24,658] {settings.py:176} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
The #-#{} pieces seem to be layered "on top of" the normal log lines.
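Byte 0xc2 is above 0x7F, so it can never be decoded as ASCII. It is also the lead byte of common two-byte UTF-8 sequences; for example, a no-break space is C2 A0. This suggests the log contains UTF-8 text that is being read as ASCII. Below is a minimal sketch for the copy downloaded with gsutil. It locates the non-ASCII bytes so they can be compared with position 6986 in the error. The sample bytes are made up for illustration.

```python
def non_ascii_offsets(data, limit=10):
    """Return (offset, byte) pairs for the first `limit` bytes outside ASCII (> 0x7F)."""
    hits = []
    for offset, byte in enumerate(data):  # iterating over bytes yields ints
        if byte > 0x7F:
            hits.append((offset, byte))
            if len(hits) >= limit:
                break
    return hits


def is_valid_utf8(data):
    """True if the whole byte string decodes as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


sample = b"INFO - ok\xc2\xa0done"
print(non_ascii_offsets(sample))  # [(9, 194), (10, 160)]
print(is_valid_utf8(sample))      # True
```

For the real file, read it in binary mode (for example `open("1.log", "rb").read()`, where the file name is an assumption) and pass the bytes to both functions.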
I faced the same problem. In my case, the problem was that I had removed the google_cloud_default connection that was used to retrieve the logs.
Check the configuration for the connection name:
[core]
remote_log_conn_id = google_cloud_default
Then check that the credentials used by that connection have the right permissions to access the GCS bucket.
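To confirm which connection the logs are read through, the relevant keys can be pulled out of airflow.cfg. Below is a minimal sketch using `configparser`. In Airflow 1.10 these keys sit under [core], as in the snippet above; Airflow 2 moved them to [logging], which is why there is a `section` parameter. Interpolation is disabled because airflow.cfg contains %-style log format strings.

```python
import configparser

REMOTE_LOG_KEYS = ("remote_logging", "remote_base_log_folder", "remote_log_conn_id")


def remote_log_settings(cfg_text, section="core"):
    """Return the remote-logging keys from airflow.cfg text (empty strings if missing)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section(section):
        return {key: "" for key in REMOTE_LOG_KEYS}
    values = parser[section]
    return {key: values.get(key, "") for key in REMOTE_LOG_KEYS}
```

Pass it the text of the airflow.cfg actually used by the webserver, then check the returned `remote_log_conn_id` against the connections that exist.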
I'm having a similar problem viewing logs in GCP Cloud Composer, although it doesn't appear to stop the failing DAG task from running. It looks like a permissions error between the GKE cluster and the Storage bucket where the log files are kept.
You can still view the logs by opening your cluster's storage bucket. Alongside the /dags folder, you should also see a logs/ folder.
Your Helm chart should set up a global environment variable:
- name: AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT
  value: "google-cloud-platform://"
Then deploy the Dockerfile with the root account only (not the airflow account). Additionally, set the Helm uid and gid as:
uid: 50000 #airflow user
gid: 50000 #airflow group
Then upgrade the Helm chart with the new config.
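Airflow also reads connections from environment variables named AIRFLOW_CONN_ followed by the upper-cased connection id. The AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT entry above relies on this. Below is a minimal sketch to run inside a pod after the upgrade. It checks that the variable is visible and carries the expected URI scheme. The helper names are hypothetical.

```python
import os
from urllib.parse import urlsplit


def conn_env_var(conn_id):
    """Environment variable name Airflow checks for a connection id."""
    return "AIRFLOW_CONN_" + conn_id.upper()


def conn_scheme_from_env(conn_id, environ=None):
    """URI scheme of the connection defined in the environment, or None if it is not set."""
    environ = os.environ if environ is None else environ
    uri = environ.get(conn_env_var(conn_id))
    if not uri:
        return None
    return urlsplit(uri).scheme


print(conn_env_var("google_cloud_default"))          # AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT
print(conn_scheme_from_env("google_cloud_default"))  # None if the variable is not set
```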
*** Unable to read remote log from gs://bucket
I found the solution:
1) Assign the required roles to the service account.
2) Add the service account key (JSON or text) to the connection configured in
remote_log_conn_id = google_cloud_default
3) Restart the Airflow scheduler and webserver.
4) Restart the DAGs in Airflow.
You can then find the logs in the configured GCS bucket.
I uploaded a DAG file through the web page, and when I click 'Graph View' -> ${my_dag} -> 'View Log', it shows:
*** Log file isn't local.
*** Fetching here: http://:8793/log/demo_dag/hello_task/2018-11-14T15:06:00
*** Failed to fetch log file from worker.
*** Reading remote logs...
*** Unsupported remote log location.
I checked airflow.cfg and found these settings:
worker_log_server_port = 8793
base_log_folder = /root/airflow/logs
My questions are:
1. How do I set up the IP address for the log service (only the port is set)?
2. I have set up a directory for the log service, so why does it still go to /log/.. ?
Any help is appreciated.
This can happen when the task status was changed manually (likely through the "Mark Success" option), so the task instance never gets a hostname value on its record.
The webserver then tries to reach a server with no name (hence the empty host in http://:8793/...) to get logs for a task that never ran.
PS: Be careful running processes as the root user.
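The empty host in http://:8793/... follows from how the fetch URL is put together. Below is a simplified sketch, not Airflow's actual code, of the URL shape shown in the question. The host is the hostname recorded on the task instance, and only the port comes from worker_log_server_port. So a task that never ran on a worker leaves nothing to fill in the host part.

```python
def worker_log_url(hostname, port, relative_path):
    """Build a log URL in the shape of the 'Fetching here' line: http://<host>:<port>/log/<path>."""
    return "http://{}:{}/log/{}".format(hostname or "", port, relative_path)


def has_worker_host(hostname):
    """A task marked success by hand never ran on a worker, so there is no host to ask."""
    return bool(hostname and hostname.strip())


url = worker_log_url(None, 8793, "demo_dag/hello_task/2018-11-14T15:06:00")
print(url)                    # http://:8793/log/demo_dag/hello_task/2018-11-14T15:06:00
print(has_worker_host(None))  # False
```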
I was getting this error and fixed it by correcting the socket volume path:
WARNING - OSError while attempting to symlink the latest log directory
On Windows, the volume path needs a double slash, like this:
volumes:
  - //var/run/docker.sock:/var/run/docker.sock
Bind to docker socket on Windows
Setting up Airflow to run with Docker Swarm’s orchestration
I have two machines. Machine1 runs airflow-webserver and airflow-scheduler; Machine2 runs airflow-worker on a specific queue. I am using CeleryExecutor. Tasks on Machine2 run successfully (writing and deleting files on the local drive), but the web UI on Machine1 cannot read the log files:
*** Log file does not exist: /home/airflow/logs/delete_images_by_ttl/delete_images/2018-10-29T12:24:23.299741+00:00/1.log
*** Fetching from: http://localhost-int.localdomain:8793/log/delete_images_by_ttl/delete_images/2018-10-29T12:24:23.299741+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='localhost-int.localdomain', port=8793): Max retries exceeded with url: /log/delete_images_by_ttl/delete_images/2018-10-29T12:24:23.299741+00:00/1.log
To solve this problem, edit your /etc/hosts and add the IP address and DNS name for the Airflow webserver.
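To check the mapping on the webserver machine, look up the host from the "Fetching from" URL in /etc/hosts. Below is a minimal standard-library sketch that parses hosts-file text. It ignores DNS and covers only the hosts-file case that this answer addresses. Pass it the contents of /etc/hosts as read on the webserver machine.

```python
from urllib.parse import urlsplit


def parse_hosts(text):
    """Map each name in /etc/hosts-style text to its IP address (first entry wins)."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), ip)
    return mapping


def lookup_log_host(url, hosts_text):
    """Return (hostname, ip) for the host in a 'Fetching from' URL; ip is None if unmapped."""
    hostname = urlsplit(url).hostname  # already lower-cased by urlsplit
    return hostname, parse_hosts(hosts_text).get(hostname)
```

An ip of None for `localhost-int.localdomain` means the hosts file has no entry for the name in the URL.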
The HTTPConnectionPool error means the webserver is not able to communicate with the worker node.
Add the worker node's hostname to the /etc/hosts file.
Also verify the following:
base_log_folder = /home/airflow/logs/
sudo chmod -R 777 /home/airflow/logs/
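The permission part of this check can also be scripted. Below is a minimal sketch using `os.access`, meant to be run as the same OS user as the Airflow worker and webserver, with the base_log_folder path from above. It only reports the current state and does not change any permissions.

```python
import os


def check_log_folder(path):
    """Report whether the log folder exists and is readable/writable by the current user."""
    return {
        "exists": os.path.isdir(path),
        # listing or creating entries also needs execute (search) permission on the directory
        "readable": os.access(path, os.R_OK | os.X_OK),
        "writable": os.access(path, os.W_OK | os.X_OK),
    }
```

For example, `check_log_folder("/home/airflow/logs/")` should report all three as True for the user that runs the worker.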