Airflow: Indefinitely running HTTP Task with no response

Please help me understand why this HTTP task runs for a long time with no progress.
I'm running the official HTTP example, but it looks like I'm missing something here.
https://github.com/apache/airflow/blob/providers-http/4.1.1/tests/system/providers/http/example_http.py
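For context, the http_sensor_check task in that example is an HttpSensor: it keeps "poking" the endpoint until its response_check callable returns True, or until the sensor's timeout (7 days by default) is reached. A minimal sketch of such a sensor, with illustrative values rather than the exact ones from the linked example:
from airflow.providers.http.sensors.http import HttpSensor

# Sketch only; endpoint, response_check and timings are assumptions,
# and this would live inside the example's `with DAG(...)` block.
http_sensor_check = HttpSensor(
    task_id="http_sensor_check",
    http_conn_id="http_default",  # resolved to https://jsonplaceholder.typicode.com/ in the logs below
    endpoint="",
    request_params={},
    # The sensor re-pokes every poke_interval seconds until this returns True
    response_check=lambda response: "expected text" in response.text,
    poke_interval=5,
    timeout=60,  # give up after 60 seconds instead of the 7-day default
)
If response_check never returns True, the sensor keeps poking until that timeout, which in the task log can look like the repeated heartbeat/refresh lines shown below.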
AIRFLOW_CTX_DAG_EMAIL=airflow@example.com
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=example_http_operator
AIRFLOW_CTX_TASK_ID=http_sensor_check
AIRFLOW_CTX_EXECUTION_DATE=2023-02-17T20:53:45.614721+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2023-02-17T20:53:45.614721+00:00
[2023-02-17, 20:53:48 UTC] {__init__.py:117} DEBUG - Preparing lineage inlets and outlets
[2023-02-17, 20:53:48 UTC] {__init__.py:155} DEBUG - inlets: [], outlets: []
[2023-02-17, 20:53:48 UTC] {http.py:122} INFO - Poking:
[2023-02-17, 20:53:48 UTC] {base.py:73} INFO - Using connection ID 'http_default' for task execution.
[2023-02-17, 20:53:48 UTC] {http.py:150} DEBUG - Sending 'GET' to url: https://jsonplaceholder.typicode.com/
[2023-02-17, 20:53:52 UTC] {taskinstance.py:769} DEBUG - Refreshing TaskInstance <TaskInstance: example_http_operator.http_sensor_check manual__2023-02-17T20:53:45.614721+00:00 [running]> from DB
[2023-02-17, 20:53:52 UTC] {base_job.py:240} DEBUG - [heartbeat]
[2023-02-17, 20:53:58 UTC] {taskinstance.py:769} DEBUG - Refreshing TaskInstance <TaskInstance: example_http_operator.http_sensor_check manual__2023-02-17T20:53:45.614721+00:00 [running]> from DB
[2023-02-17, 20:53:58 UTC] {base_job.py:240} DEBUG - [heartbeat]
[2023-02-17, 20:54:03 UTC] {taskinstance.py:769} DEBUG - Refreshing TaskInstance <TaskInstance: example_http_operator.http_sensor_check manual__2023-02-17T20:53:45.614721+00:00 [running]> from DB
[2023-02-17, 20:54:03 UTC] {base_job.py:240} DEBUG - [heartbeat]
[2023-02-17, 20:54:08 UTC] {taskinstance.py:769} DEBUG - Refreshing TaskInstance <TaskInstance: example_http_operator.http_sensor_check manual__2023-02-17T20:53:45.614721+00:00 [running]> from DB
[2023-02-17, 20:54:08 UTC] {base_job.py:240} DEBUG - [heartbeat]
[2023-02-17, 20:54:13 UTC] {taskinstance.py:769} DEBUG - Refreshing TaskInstance <TaskInstance: example_http_operator.http_sensor_check manual__2023-02-17T20:53:45.614721+00:00 [running]> from DB
Surprisingly, I'm able to test this code from the CLI without any issue, but I'm having trouble running it from the UI.
AIRFLOW_CTX_DAG_EMAIL=airflow@example.com
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=example_http_operator
AIRFLOW_CTX_TASK_ID=http_sensor_check
AIRFLOW_CTX_EXECUTION_DATE=2023-02-17T21:05:22.781965+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=__airflow_temporary_run_2023-02-17T21:05:22.781968+00:00__
[2023-02-17 16:05:23,328] {__init__.py:117} DEBUG - Preparing lineage inlets and outlets
[2023-02-17 16:05:23,328] {__init__.py:155} DEBUG - inlets: [], outlets: []
[2023-02-17 16:05:23,329] {http.py:122} INFO - Poking:
[2023-02-17 16:05:23,332] {base.py:73} INFO - Using connection ID 'http_default' for task execution.
[2023-02-17 16:05:23,332] {http.py:150} DEBUG - Sending 'GET' to url: https://jsonplaceholder.typicode.com/
[2023-02-17 16:05:23,335] {connectionpool.py:1003} DEBUG - Starting new HTTPS connection (1): jsonplaceholder.typicode.com:443
[2023-02-17 16:05:23,667] {connectionpool.py:456} DEBUG - https://jsonplaceholder.typicode.com:443 "GET / HTTP/1.1" 200 None
[2023-02-17 16:05:23,669] {base.py:228} INFO - Success criteria met. Exiting.
[2023-02-17 16:05:23,669] {__init__.py:75} DEBUG - Lineage called with inlets: [], outlets: []
[2023-02-17 16:05:23,670] {taskinstance.py:1329} DEBUG - Clearing next_method and next_kwargs.
[2023-02-17 16:05:23,670] {taskinstance.py:1318} INFO - Marking task as SUCCESS. dag_id=example_http_operator, task_id=http_sensor_check, execution_date=20230217T210522, start_date=, end_date=20230217T210523
[2023-02-17 16:05:23,670] {taskinstance.py:2241} DEBUG - Task Duration set to None
[2023-02-17 16:05:23,696] {cli_action_loggers.py:83} DEBUG - Calling callbacks: []
[2023-02-17 16:05:23,696] {settings.py:407} DEBUG - Disposing DB connection pool (PID 65429)

Related

Airflow BashOperator completes code with error code 0. However, Airflow marks the task as failed

I am working on Airflow. I have several Bash operators which call Python code. Normally it works fine. However, since yesterday I have been facing a situation I cannot understand. In the logs of the task, everything is OK, as seen below:
*** Reading local file: /opt/airflow/logs/dag_id=derin_emto_preprocess/run_id=manual__2022-10-01T13:54:50.246801+00:00/task_id=emto_preprocess-month0day0/attempt=1.log
[2022-10-01, 13:55:21 UTC] {taskinstance.py:1159} INFO - Dependencies all met for <TaskInstance: derin_emto_preprocess.emto_preprocess-month0day0 manual__2022-10-01T13:54:50.246801+00:00 [queued]>
[2022-10-01, 13:55:21 UTC] {taskinstance.py:1159} INFO - Dependencies all met for <TaskInstance: derin_emto_preprocess.emto_preprocess-month0day0 manual__2022-10-01T13:54:50.246801+00:00 [queued]>
[2022-10-01, 13:55:21 UTC] {taskinstance.py:1356} INFO -
--------------------------------------------------------------------------------
[2022-10-01, 13:55:21 UTC] {taskinstance.py:1357} INFO - Starting attempt 1 of 1
[2022-10-01, 13:55:21 UTC] {taskinstance.py:1358} INFO -
--------------------------------------------------------------------------------
[2022-10-01, 13:55:21 UTC] {taskinstance.py:1377} INFO - Executing <Task(BashOperator): emto_preprocess-month0day0> on 2022-10-01 13:54:50.246801+00:00
[2022-10-01, 13:55:21 UTC] {standard_task_runner.py:52} INFO - Started process 624 to run task
[2022-10-01, 13:55:21 UTC] {standard_task_runner.py:79} INFO - Running: ['***', 'tasks', 'run', 'derin_emto_preprocess', 'emto_preprocess-month0day0', 'manual__2022-10-01T13:54:50.246801+00:00', '--job-id', '8958', '--raw', '--subdir', 'DAGS_FOLDER/derin_emto_preprocess.py', '--cfg-path', '/tmp/tmpjn_8tmiv', '--error-file', '/tmp/tmp_jr_2w3j']
[2022-10-01, 13:55:21 UTC] {standard_task_runner.py:80} INFO - Job 8958: Subtask emto_preprocess-month0day0
[2022-10-01, 13:55:21 UTC] {task_command.py:369} INFO - Running <TaskInstance: derin_emto_preprocess.emto_preprocess-month0day0 manual__2022-10-01T13:54:50.246801+00:00 [running]> on host 5b44f8453a08
[2022-10-01, 13:55:21 UTC] {taskinstance.py:1571} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=***
AIRFLOW_CTX_DAG_ID=derin_emto_preprocess
AIRFLOW_CTX_TASK_ID=emto_preprocess-month0day0
AIRFLOW_CTX_EXECUTION_DATE=2022-10-01T13:54:50.246801+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2022-10-01T13:54:50.246801+00:00
[2022-10-01, 13:55:21 UTC] {subprocess.py:62} INFO - Tmp dir root location:
/tmp
[2022-10-01, 13:55:21 UTC] {subprocess.py:74} INFO - Running command: ['bash', '-c', 'python /opt/***/dags/scripts/derin/pipeline/pipeline.py --valid_from=20200101 --valid_until=20200102 --purpose=emto_preprocess --module=emto_preprocess --***=True']
[2022-10-01, 13:55:21 UTC] {subprocess.py:85} INFO - Output:
[2022-10-01, 13:55:24 UTC] {subprocess.py:92} INFO - 2022-10-01 13:55:22 : Hello, world!
[2022-10-01, 13:55:24 UTC] {subprocess.py:92} INFO - 2022-10-01 13:55:22 : [20200101, 20200102)
[2022-10-01, 13:55:24 UTC] {subprocess.py:92} INFO - 2022-10-01 13:55:22 : Running emto_preprocess purpose
[2022-10-01, 13:55:24 UTC] {subprocess.py:92} INFO - Current directory : /opt/***/dags
[2022-10-01, 13:55:24 UTC] {subprocess.py:92} INFO - 2022-10-01 13:55:22 : Airflow parameter passed: changing configuration..
[2022-10-01, 13:55:24 UTC] {subprocess.py:92} INFO - 2022-10-01 13:55:24 : Parallel threads: 15
[2022-10-01, 13:55:24 UTC] {subprocess.py:92} INFO - 2022-10-01 13:55:24 : External money transfer out: preprocess is starting..
[2022-10-01, 13:55:24 UTC] {subprocess.py:92} INFO -
Thread None for emto_preprocess: 0%| | 0/1 [00:00<?, ?it/s]
Thread None for emto_preprocess: 100%|██████████| 1/1 [00:00<00:00, 12633.45it/s]
[2022-10-01, 13:55:24 UTC] {subprocess.py:92} INFO - 2022-10-01 13:55:24 : DEBUG: Checking existing files
[2022-10-01, 13:55:24 UTC] {subprocess.py:92} INFO - 2022-10-01 13:55:24 : This module is already processed
[2022-10-01, 13:55:24 UTC] {subprocess.py:92} INFO - 2022-10-01 13:55:24 : Good bye!
[2022-10-01, 13:55:24 UTC] {subprocess.py:96} INFO - Command exited with return code 0
[2022-10-01, 13:55:24 UTC] {taskinstance.py:1400} INFO - Marking task as SUCCESS. dag_id=derin_emto_preprocess, task_id=emto_preprocess-month0day0, execution_date=20221001T135450, start_date=20221001T135521, end_date=20221001T135524
[2022-10-01, 13:55:24 UTC] {local_task_job.py:156} INFO - Task exited with return code 0
[2022-10-01, 13:55:25 UTC] {local_task_job.py:273} INFO - 0 downstream tasks scheduled from follow-on schedule check
However, Airflow marked this task as failed. How can I fix this?
I understood and solved this weird problem. When Airflow parses the DAGs, it reads all files in the /airflow/dags/ folder. I have a data storage folder under it, about 250 GB consisting of many feather files. I guess reading those files takes too much time and creates this kind of situation. The solution is to create an .airflowignore file and add the directories that do not store DAG files to it.
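A minimal sketch of such an .airflowignore, placed at the root of the DAGs folder (directory names here are just examples; patterns are regular expressions by default):
# .airflowignore -- paths the DAG processor should skip
data_storage/
feather_files/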

Airflow task randomly exited with return code 1 [Local Executor / PythonOperator]

To give some context, I am using Airflow 2.3.0 on Kubernetes with the Local Executor (which may sound weird, but it works for us for now) with one pod for the webserver and two for the scheduler.
I have a DAG consisting of a single task (PythonOperator) that makes many API calls (200K) using requests.
Every 15 calls, the data is loaded in a DataFrame and stored on AWS S3 (using boto3) to reduce the RAM usage.
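For reference, a minimal sketch of that batching pattern with hypothetical names (urls, bucket and key layout are assumptions, not taken from the post):
import boto3
import pandas as pd
import requests

s3 = boto3.client("s3")
batch, part = [], 0
for i, url in enumerate(urls, start=1):  # `urls` is a hypothetical iterable of API endpoints
    batch.append(requests.get(url, timeout=30).json())
    if i % 15 == 0:  # flush every 15 calls to keep RAM usage low
        body = pd.DataFrame(batch).to_csv(index=False).encode()
        s3.put_object(Bucket="my-bucket", Key=f"extract/part-{part}.csv", Body=body)
        batch, part = [], part + 1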
The problem is that I can't get to the end of this task, because it errors out randomly (after 1, 10 or 120 minutes).
I have made more than 50 attempts with no success, and the only logs on the task are:
[2022-09-01, 14:45:44 UTC] {taskinstance.py:1159} INFO - Dependencies all met for <TaskInstance: INGESTION-DAILY-dag.extract_task scheduled__2022-08-30T00:00:00+00:00 [queued]>
[2022-09-01, 14:45:44 UTC] {taskinstance.py:1159} INFO - Dependencies all met for <TaskInstance: INGESTION-DAILY-dag.extract_task scheduled__2022-08-30T00:00:00+00:00 [queued]>
[2022-09-01, 14:45:44 UTC] {taskinstance.py:1356} INFO -
--------------------------------------------------------------------------------
[2022-09-01, 14:45:44 UTC] {taskinstance.py:1357} INFO - Starting attempt 23 of 24
[2022-09-01, 14:45:44 UTC] {taskinstance.py:1358} INFO -
--------------------------------------------------------------------------------
[2022-09-01, 14:45:44 UTC] {taskinstance.py:1377} INFO - Executing <Task(_PythonDecoratedOperator): extract_task> on 2022-08-30 00:00:00+00:00
[2022-09-01, 14:45:44 UTC] {standard_task_runner.py:52} INFO - Started process 942 to run task
[2022-09-01, 14:45:44 UTC] {standard_task_runner.py:79} INFO - Running: ['airflow', 'tasks', 'run', 'INGESTION-DAILY-dag', 'extract_task', 'scheduled__2022-08-30T00:00:00+00:00', '--job-id', '4390', '--raw', '--subdir', 'DAGS_FOLDER/dags/ingestion/daily_dag/dag.py', '--cfg-path', '/tmp/tmpwxasaq93', '--error-file', '/tmp/tmpl7t_gd8e']
[2022-09-01, 14:45:44 UTC] {standard_task_runner.py:80} INFO - Job 4390: Subtask extract_task
[2022-09-01, 14:45:45 UTC] {task_command.py:369} INFO - Running <TaskInstance: INGESTION-DAILY-dag.extract_task scheduled__2022-08-30T00:00:00+00:00 [running]> on host 10.XX.XXX.XXX
[2022-09-01, 14:48:17 UTC] {local_task_job.py:156} INFO - Task exited with return code 1
[2022-09-01, 14:48:17 UTC] {taskinstance.py:1395} INFO - Marking task as UP_FOR_RETRY. dag_id=INGESTION-DAILY-dag, task_id=extract_task, execution_date=20220830T000000, start_date=20220901T144544, end_date=20220901T144817
[2022-09-01, 14:48:17 UTC] {local_task_job.py:273} INFO - 0 downstream tasks scheduled from follow-on schedule check
But when I go to the pod logs, I get the following message:
[2022-09-01 14:06:31,624] {local_executor.py:128} ERROR - Failed to execute task an integer is required (got type ChunkedEncodingError).
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/local_executor.py", line 124, in _execute_work_in_fork
args.func(args)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 51, in command
return func(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/cli.py", line 99, in wrapper
return f(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 377, in task_run
_run_task_by_selected_method(args, dag, ti)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 183, in _run_task_by_selected_method
_run_task_by_local_task_job(args, ti)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 241, in _run_task_by_local_task_job
run_job.run()
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/base_job.py", line 244, in run
self._execute()
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/local_task_job.py", line 105, in _execute
self.task_runner.start()
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/task/task_runner/standard_task_runner.py", line 41, in start
self.process = self._start_by_fork()
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/task/task_runner/standard_task_runner.py", line 125, in _start_by_fork
os._exit(return_code)
TypeError: an integer is required (got type ChunkedEncodingError)
What I find strange is that I never had this error on other DAGs (where tasks are smaller and faster). I checked during an attempt: CPU and RAM usage are stable and low.
I get the same error locally; I also tried upgrading to 2.3.4, but nothing works.
Do you have any idea how to fix this?
Thanks a lot!
Nicolas
As @EDG956 said, this is not an error from Airflow but from the code.
I solved it by using a context manager (which was not enough on its own) and recreating the session:
import requests

s = requests.Session()
while True:
    try:
        # With a non-streaming GET the whole body is read here, so a broken
        # chunked response raises ChunkedEncodingError inside this block.
        with s.get(base_url) as r:
            response = r
        break
    except requests.exceptions.ChunkedEncodingError:
        # Drop the broken connection pool and retry with a fresh session.
        s.close()
        s = requests.Session()
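Note that requests raises ChunkedEncodingError while the response body is being read, so retrying with a fresh Session is a reasonable workaround; in practice you probably also want to cap the number of retries so the loop cannot run forever if the endpoint keeps failing.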

Airflow 2.0.2 - Hourly DAG getting stuck seeing Refreshing TaskInstance repeatedly

I've noticed that some runs of an hourly DAG are being skipped. I checked the log for the DAG run just before the skipping started and saw that it had actually been running for 7 hours, which is why the other DAG runs didn't happen. This is very strange, since it usually takes only about 30 minutes to finish.
We're using Airflow version 2.0.2
This is what I saw in the logs:
2022-05-06 13:26:56,668] {taskinstance.py:595} DEBUG - Refreshing TaskInstance <TaskInstance: dfp_hourly.revequery 2022-05-05T13:00:00+00:00 [running]> from DB
[2022-05-06 13:26:56,806] {taskinstance.py:630} DEBUG - Refreshed TaskInstance <TaskInstance: dfp_hourly.revequery 2022-05-05T13:00:00+00:00 [running]>
[2022-05-06 13:27:01,860] {taskinstance.py:595} DEBUG - Refreshing TaskInstance <TaskInstance: dfp_hourly.revequery 2022-05-05T13:00:00+00:00 [running]> from DB
[2022-05-06 13:27:01,872] {taskinstance.py:630} DEBUG - Refreshed TaskInstance <TaskInstance: dfp_hourly.revequery 2022-05-05T13:00:00+00:00 [running]>
[2022-05-06 13:27:06,960] {taskinstance.py:595} DEBUG - Refreshing TaskInstance <TaskInstance: dfp_hourly.revequery 2022-05-05T13:00:00+00:00 [running]> from DB
[2022-05-06 13:27:07,019] {taskinstance.py:630} DEBUG - Refreshed TaskInstance <TaskInstance: dfp_hourly.revequery 2022-05-05T13:00:00+00:00 [running]>
[2022-05-06 13:27:12,224] {taskinstance.py:595} DEBUG - Refreshing TaskInstance <TaskInstance: dfp_hourly.revequery 2022-05-05T13:00:00+00:00 [running]> from DB
[2022-05-06 13:27:12,314] {taskinstance.py:630} DEBUG - Refreshed TaskInstance <TaskInstance: dfp_hourly.revequery 2022-05-05T13:00:00+00:00 [running]>
[2022-05-06 13:27:17,368] {taskinstance.py:595} DEBUG - Refreshing TaskInstance <TaskInstance: dfp_hourly.revequery 2022-05-05T13:00:00+00:00 [running]> from DB
[2022-05-06 13:27:17,377] {taskinstance.py:630} DEBUG - Refreshed TaskInstance
I think you are running too many tasks in parallel, which causes them to run for hours. This can be fixed by using Pools. Airflow pools can be used to limit the execution parallelism on arbitrary sets of tasks. The list of pools is managed in the UI (Menu -> Admin -> Pools) by giving each pool a name and assigning it a number of worker slots.
Tasks can then be associated with one of the existing pools by using the pool parameter when creating tasks:
aggregate_db_message_job = BashOperator(
    task_id="aggregate_db_message_job",
    execution_timeout=timedelta(hours=3),
    pool="ep_data_pipeline_db_msg_agg",
    bash_command=aggregate_db_message_job_cmd,
    dag=dag,
)
aggregate_db_message_job.set_upstream(wait_for_empty_queue)
Tasks will be scheduled as usual while the slots fill up. The number of slots occupied by a task can be configured with pool_slots. Once capacity is reached, runnable tasks get queued and their state is shown as such in the UI. As slots free up, queued tasks start running based on the Priority Weights of the task and its descendants.
Note that if tasks are not given a pool, they are assigned to the default pool, default_pool, which is initialized with 128 slots and can be modified through the UI or CLI (but cannot be removed).
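The pool_slots parameter mentioned above is set per task; a minimal sketch, assuming a hypothetical pool named "reporting_pool" created beforehand via the UI or `airflow pools set reporting_pool 4 "reporting queries"`:
from datetime import timedelta

from airflow.operators.bash import BashOperator

# Sketch only; task, pool and command names are illustrative, and this
# would live inside a `with DAG(...)` block or take a dag= argument.
heavy_query = BashOperator(
    task_id="heavy_query",
    pool="reporting_pool",
    pool_slots=2,  # occupies 2 of the pool's 4 slots while running
    execution_timeout=timedelta(hours=1),
    bash_command="run_heavy_query.sh",
)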

Airflow tasks showing too many DEBUG lines

We recently migrated from Airflow 1.10.11 to 2.2.3.
After the successful migration, we started seeing tasks emitting DEBUG logs like the ones below. For one of the tasks there are more than 8k DEBUG lines, which makes the page take forever to load and also makes it difficult to identify the logs coming from the DAG code itself.
[2022-03-22, 21:11:30 UTC] {store.py:44} DEBUG - Building store_backend.
[2022-03-22, 21:11:30 UTC] {hooks.py:421} DEBUG - Changing event name from creating-client-class.iot-data to creating-client-class.iot-data-plane
[2022-03-22, 21:11:30 UTC] {hooks.py:421} DEBUG - Changing event name from before-call.apigateway to before-call.api-gateway
[2022-03-22, 21:11:30 UTC] {hooks.py:421} DEBUG - Changing event name from request-created.machinelearning.Predict to request-created.machine-learning.Predict
[2022-03-22, 21:11:30 UTC] {hooks.py:421} DEBUG - Changing event name from before-parameter-build.autoscaling.CreateLaunchConfiguration to before-parameter-build.auto-scaling.CreateLaunchConfiguration
[2022-03-22, 21:11:30 UTC] {hooks.py:421} DEBUG - Changing event name from before-parameter-build.route53 to before-parameter-build.route-53
[2022-03-22, 21:11:30 UTC] {hooks.py:421} DEBUG - Changing event name from request-created.cloudsearchdomain.Search to request-created.cloudsearch-domain.Search
[2022-03-22, 21:11:30 UTC] {hooks.py:421} DEBUG - Changing event name from docs.*.autoscaling.CreateLaunchConfiguration.complete-section to docs.*.auto-scaling.CreateLaunchConfiguration.complete-section
[2022-03-22, 21:11:31 UTC] {hooks.py:421} DEBUG - Changing event name from before-parameter-build.logs.CreateExportTask to before-parameter-build.cloudwatch-logs.CreateExportTask
[2022-03-22, 21:11:31 UTC] {hooks.py:421} DEBUG - Changing event name from docs.*.logs.CreateExportTask.complete-section to docs.*.cloudwatch-logs.CreateExportTask.complete-section
[2022-03-22, 21:11:31 UTC] {hooks.py:421} DEBUG - Changing event name from before-parameter-build.cloudsearchdomain.Search to before-parameter-build.cloudsearch-domain.Search
[2022-03-22, 21:11:31 UTC] {hooks.py:421} DEBUG - Changing event name from docs.*.cloudsearchdomain.Search.complete-section to docs.*.cloudsearch-domain.Search.complete-section
[2022-03-22, 21:11:31 UTC] {credentials.py:1984} DEBUG - Looking for credentials via: env
[2022-03-22, 21:11:31 UTC] {credentials.py:1984} DEBUG - Looking for credentials via: assume-role
[2022-03-22, 21:11:31 UTC] {credentials.py:1984} DEBUG - Looking for credentials via: assume-role-with-web-identity
[2022-03-22, 21:11:31 UTC] {credentials.py:1984} DEBUG - Looking for credentials via: sso
[2022-03-22, 21:11:31 UTC] {credentials.py:1984} DEBUG - Looking for credentials via: shared-credentials-file
[2022-03-22, 21:11:31 UTC] {credentials.py:1984} DEBUG - Looking for credentials via: custom-process
[2022-03-22, 21:11:31 UTC] {credentials.py:1984} DEBUG - Looking for credentials via: config-file
[2022-03-22, 21:11:31 UTC] {credentials.py:1984} DEBUG - Looking for credentials via: ec2-credentials-file
[2022-03-22, 21:11:31 UTC] {credentials.py:1984} DEBUG - Looking for credentials via: boto-config
[2022-03-22, 21:11:31 UTC] {credentials.py:1984} DEBUG - Looking for credentials via: container-role
[2022-03-22, 21:11:31 UTC] {credentials.py:1984} DEBUG - Looking for credentials via: iam-role
[2022-03-22, 21:11:31 UTC] {loaders.py:175} DEBUG - Loading JSON file: /home/airflow/.local/lib/python3.7/site-packages/botocore/data/endpoints.json
[2022-03-22, 21:11:31 UTC] {loaders.py:175} DEBUG - Loading JSON file: /home/airflow/.local/lib/python3.7/site-packages/botocore/data/sdk-default-configuration.json
[2022-03-22, 21:11:31 UTC] {hooks.py:210} DEBUG - Event choose-service-name: calling handler <function handle_service_name_alias at 0x7fba40dc6440>
[2022-03-22, 21:11:31 UTC] {loaders.py:175} DEBUG - Loading JSON file: /home/airflow/.local/lib/python3.7/site-packages/botocore/data/s3/2006-03-01/service-2.json
[2022-03-22, 21:11:31 UTC] {hooks.py:210} DEBUG - Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7fba40eae0e0>
[2022-03-22, 21:11:31 UTC] {hooks.py:210} DEBUG - Event creating-client-class.s3: calling handler <function lazy_call.<locals>._handler at 0x7fba2a069cb0>
[2022-03-22, 21:11:31 UTC] {hooks.py:210} DEBUG - Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7fba40ea6e60>
[2022-03-22, 21:11:31 UTC] {endpoint.py:345} DEBUG - Setting s3 timeout as (60, 60)
[2022-03-22, 21:11:31 UTC] {loaders.py:175} DEBUG - Loading JSON file: /home/airflow/.local/lib/python3.7/site-packages/botocore/data/_retry.json
[2022-03-22, 21:11:31 UTC] {client.py:198} DEBUG - Registering retry handlers for service: s3
[2022-03-22, 21:11:31 UTC] {hooks.py:210} DEBUG - Event before-parameter-build.s3.GetObject: calling handler <function sse_md5 at 0x7fba40babb00>
[2022-03-22, 21:11:31 UTC] {hooks.py:210} DEBUG - Event before-parameter-build.s3.GetObject: calling handler <function validate_bucket_name at 0x7fba40baba70>
[2022-03-22, 21:11:31 UTC] {hooks.py:210} DEBUG - Event before-parameter-build.s3.GetObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7fba29500990>>
[2022-03-22, 21:11:31 UTC] {hooks.py:210} DEBUG - Event before-parameter-build.s3.GetObject: calling handler <bound method S3ArnParamHandler.handle_arn of <botocore.utils.S3ArnParamHandler object at 0x7fba29536190>>
[2022-03-22, 21:11:31 UTC] {hooks.py:210} DEBUG - Event before-parameter-build.s3.GetObject: calling handler <function generate_idempotent_uuid at 0x7fba40bab8c0>
[2022-03-22, 21:11:31 UTC] {hooks.py:210} DEBUG - Event before-call.s3.GetObject: calling handler <function add_expect_header at 0x7fba40babdd0>
[2022-03-22, 21:11:31 UTC] {hooks.py:210} DEBUG - Event before-call.s3.GetObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7fba29500990>>
[2022-03-22, 21:11:31 UTC] {hooks.py:210} DEBUG - Event before-call.s3.GetObject: calling handler <function add_recursion_detection_header at 0x7fba40ba65f0>
[2022-03-22, 21:11:31 UTC] {hooks.py:210} DEBUG - Event before-call.s3.GetObject: calling handler <function inject_api_version_header_if_needed at 0x7fba40bb3170>
As per some suggestions, we added the lines below to airflow.cfg, but when we ran airflow config list inside the webserver pod, it still shows DEBUG.
[logging]
base_log_folder = /opt/airflow/logs
remote_logging = False
logging_level = INFO
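For reference, airflow.cfg values can be overridden by environment variables of the form AIRFLOW__SECTION__KEY, which take precedence over the file, so in a containerized deployment such a variable may be what keeps the level at DEBUG. A sketch of the equivalent variable (not taken from the original post):
AIRFLOW__LOGGING__LOGGING_LEVEL=INFO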
Can someone help us get rid of the DEBUG logs for tasks, please?

Airflow Execution Timeout not working well

I've set the 'execution_timeout': timedelta(seconds=300) parameter on many tasks. When the execution timeout is set on a task downloading data from Google Analytics, it works properly: after ~300 seconds the task is set to failed. That task downloads some data from an API (Python), then does some transformations (Python) and loads the data into PostgreSQL.
Then I have a task which executes only one PostgreSQL function. Its execution sometimes takes more than 300 seconds, but I get this (the task is marked as finished successfully):
*** Reading local file: /home/airflow/airflow/logs/bulk_replication_p2p_realtime/t1/2020-07-20T00:05:00+00:00/1.log
[2020-07-20 05:05:35,040] {__init__.py:1139} INFO - Dependencies all met for <TaskInstance: bulk_replication_p2p_realtime.t1 2020-07-20T00:05:00+00:00 [queued]>
[2020-07-20 05:05:35,051] {__init__.py:1139} INFO - Dependencies all met for <TaskInstance: bulk_replication_p2p_realtime.t1 2020-07-20T00:05:00+00:00 [queued]>
[2020-07-20 05:05:35,051] {__init__.py:1353} INFO -
--------------------------------------------------------------------------------
[2020-07-20 05:05:35,051] {__init__.py:1354} INFO - Starting attempt 1 of 1
[2020-07-20 05:05:35,051] {__init__.py:1355} INFO -
--------------------------------------------------------------------------------
[2020-07-20 05:05:35,098] {__init__.py:1374} INFO - Executing <Task(PostgresOperator): t1> on 2020-07-20T00:05:00+00:00
[2020-07-20 05:05:35,099] {base_task_runner.py:119} INFO - Running: ['airflow', 'run', 'bulk_replication_p2p_realtime', 't1', '2020-07-20T00:05:00+00:00', '--job_id', '958216', '--raw', '-sd', 'DAGS_FOLDER/bulk_replication_p2p_realtime.py', '--cfg_path', '/tmp/tmph11tn6fe']
[2020-07-20 05:05:37,348] {base_task_runner.py:101} INFO - Job 958216: Subtask t1 [2020-07-20 05:05:37,347] {settings.py:182} INFO - settings.configure_orm(): Using pool settings. pool_size=10, pool_recycle=1800, pid=26244
[2020-07-20 05:05:39,503] {base_task_runner.py:101} INFO - Job 958216: Subtask t1 [2020-07-20 05:05:39,501] {__init__.py:51} INFO - Using executor LocalExecutor
[2020-07-20 05:05:39,857] {base_task_runner.py:101} INFO - Job 958216: Subtask t1 [2020-07-20 05:05:39,856] {__init__.py:305} INFO - Filling up the DagBag from /home/airflow/airflow/dags/bulk_replication_p2p_realtime.py
[2020-07-20 05:05:39,894] {base_task_runner.py:101} INFO - Job 958216: Subtask t1 [2020-07-20 05:05:39,894] {cli.py:517} INFO - Running <TaskInstance: bulk_replication_p2p_realtime.t1 2020-07-20T00:05:00+00:00 [running]> on host dwh2-airflow-dev
[2020-07-20 05:05:39,938] {postgres_operator.py:62} INFO - Executing: CALL dw_system.bulk_replicate(p_graph_name=>'replication_p2p_realtime',p_group_size=>4 , p_group=>1, p_dag_id=>'bulk_replication_p2p_realtime', p_task_id=>'t1')
[2020-07-20 05:05:39,960] {logging_mixin.py:95} INFO - [2020-07-20 05:05:39,953] {base_hook.py:83} INFO - Using connection to: id: postgres_warehouse. Host: XXX Port: 5432, Schema: XXXX Login: XXX Password: XXXXXXXX, extra: {}
[2020-07-20 05:05:39,973] {logging_mixin.py:95} INFO - [2020-07-20 05:05:39,972] {dbapi_hook.py:171} INFO - CALL dw_system.bulk_replicate(p_graph_name=>'replication_p2p_realtime',p_group_size=>4 , p_group=>1, p_dag_id=>'bulk_replication_p2p_realtime', p_task_id=>'t1')
[2020-07-20 05:23:21,450] {logging_mixin.py:95} INFO - [2020-07-20 05:23:21,449] {timeout.py:42} ERROR - Process timed out, PID: 26244
[2020-07-20 05:23:36,453] {logging_mixin.py:95} INFO - [2020-07-20 05:23:36,452] {jobs.py:2562} INFO - Task exited with return code 0
Does anyone know how to enforce the execution timeout for such long-running functions? It seems that the execution timeout is only evaluated once the PG function finishes.
Airflow uses the signal module from the standard library to implement the timeout. It hooks into these system signals and requests that the calling process be notified in N seconds; should the process still be inside the context (see the __enter__ and __exit__ methods on the timeout class), an AirflowTaskTimeout exception is raised.
Unfortunately for this situation, there are certain classes of system operations that cannot be interrupted. This is actually called out in the signal documentation:
A long-running calculation implemented purely in C (such as regular expression matching on a large body of text) may run uninterrupted for an arbitrary amount of time, regardless of any signals received. The Python signal handlers will be called when the calculation finishes.
To which we say "But I'm not doing a long-running calculation in C!" -- yeah, for Airflow this is almost always due to uninterruptible I/O operations.
The quoted sentence above explains nicely why the handler only fires after the task is allowed to (frustratingly!) finish, well beyond your requested timeout.
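To illustrate the mechanism, here is a minimal sketch of a signal-based timeout (simplified; Airflow's timeout context manager is more involved):
import signal
import time

class TaskTimeout(Exception):
    pass

def _handler(signum, frame):
    raise TaskTimeout("timed out")

signal.signal(signal.SIGALRM, _handler)  # install the handler for SIGALRM
signal.alarm(5)                          # ask the OS to deliver SIGALRM in 5 seconds
try:
    time.sleep(10)                       # interruptible: TaskTimeout is raised after ~5 s
except TaskTimeout:
    print("interrupted as expected")
finally:
    signal.alarm(0)                      # cancel any pending alarm
A call that blocks inside uninterruptible C code (for example, some database driver calls) behaves differently: the handler only runs, and the exception is only raised, once that call returns -- exactly the behaviour described above.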
