I installed Airflow 2.1.2 with Python 3.6.5 on a Linux server and triggered a DAG with a run_as user, but it does not work as expected.
I created a logs folder with chmod -R 777, but when the DAG starts, the system creates the log directory under logs/ with
"drwx------ 3 root root"
which causes a permission-denied error, since only root can access the folder.
Is there any way to fix this?
Here is the error I got:
[2021-07-16 07:00:14,334] {base_task_runner.py:135} INFO - Running: ['sudo', '-E', '-H', '-u', 'hdfsprod', 'airflow', 'tasks', 'run', 'hdfs_kinit', 'hdfs_kinit', '2021-07-16T07:00:04.092676+00:00', '--job-id', '19', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/hdfs_test.py', '--cfg-path', '/tmp/tmp8_t6s5o2', '--error-file', '/tmp/tmpmf3zruty']
[2021-07-16 07:00:15,841] {base_task_runner.py:119} INFO - Job 19: Subtask hdfs_kinit [2021-07-16 07:00:15,839] {dagbag.py:487} INFO - Filling up the DagBag from /opt/data/dags/hdfs_test.py
[2021-07-16 07:00:15,992] {base_task_runner.py:119} INFO - Job 19: Subtask hdfs_kinit Traceback (most recent call last):
[2021-07-16 07:00:15,993] {base_task_runner.py:119} INFO - Job 19: Subtask hdfs_kinit File "/service/python3/lib/python3.6/pathlib.py", line 1246, in mkdir
[2021-07-16 07:00:15,993] {base_task_runner.py:119} INFO - Job 19: Subtask hdfs_kinit self._accessor.mkdir(self, mode)
[2021-07-16 07:00:15,993] {base_task_runner.py:119} INFO - Job 19: Subtask hdfs_kinit File "/service/python3/lib/python3.6/pathlib.py", line 387, in wrapped
[2021-07-16 07:00:15,993] {base_task_runner.py:119} INFO - Job 19: Subtask hdfs_kinit return strfunc(str(pathobj), *args)
[2021-07-16 07:00:15,993] {base_task_runner.py:119} INFO - Job 19: Subtask hdfs_kinit PermissionError: [Errno 13] Permission denied: '/opt/data/logs/hdfs_kinit/2021-07-16T07:00:04.092676+00:00'
[2021-07-16 07:00:15,994] {base_task_runner.py:119} INFO - Job 19: Subtask hdfs_kinit
The easiest way to fix this is to make your run_as user a member of the group that owns the log directories (root, in your case) and set umask to 002 in the scripts that start Airflow, so newly created log directories are group-writable.
https://unix.stackexchange.com/questions/12842/make-all-new-files-in-a-directory-accessible-to-a-group
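A small sketch of what the umask change accomplishes (the directory names here are placeholders, not your actual paths): with umask 002, directories created by the task runner come out mode 775 instead of 700/755, so a group member such as the run_as user can reach the logs.

```python
import os
import stat
import tempfile

# Demonstration only: with umask 002, new directories are created
# group-writable (0o775), so a run_as user in the same group can
# write its task logs under them.
old_umask = os.umask(0o002)
logs_dir = os.path.join(tempfile.mkdtemp(), "logs")  # stand-in for your logs/ folder
os.mkdir(logs_dir)                                   # default mode 0o777, masked to 0o775
os.umask(old_umask)                                  # restore the previous umask

print(oct(stat.S_IMODE(os.stat(logs_dir).st_mode)))  # → 0o775
```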
To give some context: I am using Airflow 2.3.0 on Kubernetes with the LocalExecutor (which may sound odd, but it works for us for now), with one pod for the webserver and two for the scheduler.
I have a DAG consisting of a single task (PythonOperator) that makes many API calls (200K) using requests.
Every 15 calls, the data is loaded in a DataFrame and stored on AWS S3 (using boto3) to reduce the RAM usage.
The problem is that I can't get this task to finish: it fails at random (after 1, 10, or 120 minutes).
I have made more than 50 attempts without success, and the only logs on the task are:
[2022-09-01, 14:45:44 UTC] {taskinstance.py:1159} INFO - Dependencies all met for <TaskInstance: INGESTION-DAILY-dag.extract_task scheduled__2022-08-30T00:00:00+00:00 [queued]>
[2022-09-01, 14:45:44 UTC] {taskinstance.py:1159} INFO - Dependencies all met for <TaskInstance: INGESTION-DAILY-dag.extract_task scheduled__2022-08-30T00:00:00+00:00 [queued]>
[2022-09-01, 14:45:44 UTC] {taskinstance.py:1356} INFO -
--------------------------------------------------------------------------------
[2022-09-01, 14:45:44 UTC] {taskinstance.py:1357} INFO - Starting attempt 23 of 24
[2022-09-01, 14:45:44 UTC] {taskinstance.py:1358} INFO -
--------------------------------------------------------------------------------
[2022-09-01, 14:45:44 UTC] {taskinstance.py:1377} INFO - Executing <Task(_PythonDecoratedOperator): extract_task> on 2022-08-30 00:00:00+00:00
[2022-09-01, 14:45:44 UTC] {standard_task_runner.py:52} INFO - Started process 942 to run task
[2022-09-01, 14:45:44 UTC] {standard_task_runner.py:79} INFO - Running: ['airflow', 'tasks', 'run', 'INGESTION-DAILY-dag', 'extract_task', 'scheduled__2022-08-30T00:00:00+00:00', '--job-id', '4390', '--raw', '--subdir', 'DAGS_FOLDER/dags/ingestion/daily_dag/dag.py', '--cfg-path', '/tmp/tmpwxasaq93', '--error-file', '/tmp/tmpl7t_gd8e']
[2022-09-01, 14:45:44 UTC] {standard_task_runner.py:80} INFO - Job 4390: Subtask extract_task
[2022-09-01, 14:45:45 UTC] {task_command.py:369} INFO - Running <TaskInstance: INGESTION-DAILY-dag.extract_task scheduled__2022-08-30T00:00:00+00:00 [running]> on host 10.XX.XXX.XXX
[2022-09-01, 14:48:17 UTC] {local_task_job.py:156} INFO - Task exited with return code 1
[2022-09-01, 14:48:17 UTC] {taskinstance.py:1395} INFO - Marking task as UP_FOR_RETRY. dag_id=INGESTION-DAILY-dag, task_id=extract_task, execution_date=20220830T000000, start_date=20220901T144544, end_date=20220901T144817
[2022-09-01, 14:48:17 UTC] {local_task_job.py:273} INFO - 0 downstream tasks scheduled from follow-on schedule check
But when I go to the pod logs, I get the following message:
[2022-09-01 14:06:31,624] {local_executor.py:128} ERROR - Failed to execute task an integer is required (got type ChunkedEncodingError).
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/local_executor.py", line 124, in _execute_work_in_fork
args.func(args)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 51, in command
return func(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/cli.py", line 99, in wrapper
return f(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 377, in task_run
_run_task_by_selected_method(args, dag, ti)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 183, in _run_task_by_selected_method
_run_task_by_local_task_job(args, ti)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 241, in _run_task_by_local_task_job
run_job.run()
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/base_job.py", line 244, in run
self._execute()
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/local_task_job.py", line 105, in _execute
self.task_runner.start()
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/task/task_runner/standard_task_runner.py", line 41, in start
self.process = self._start_by_fork()
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/task/task_runner/standard_task_runner.py", line 125, in _start_by_fork
os._exit(return_code)
TypeError: an integer is required (got type ChunkedEncodingError)
What I find strange is that I never had this error on other DAGs (where the tasks are smaller and faster). I checked during an attempt: CPU and RAM usage are stable and low.
I get the same error locally, and I also tried upgrading to 2.3.4, but nothing works.
Do you have any idea how to fix this?
Thanks a lot!
Nicolas
As @EDG956 said, this is not an error from Airflow but from my code.
I solved it by using a context manager (which alone was not enough) and recreating the session when the error occurs:
import requests

s = requests.Session()
while True:
    try:
        with s.get(base_url) as r:
            response = r
        break
    except requests.exceptions.ChunkedEncodingError:
        # the connection died mid-transfer: drop the session and retry
        s.close()
        s = requests.Session()
Upon running:
airflow scheduler
I get the following error:
[2022-08-10 13:26:53,501] {scheduler_job.py:708} INFO - Starting the scheduler
[2022-08-10 13:26:53,502] {scheduler_job.py:713} INFO - Processing each file at most -1 times
[2022-08-10 13:26:53,509] {executor_loader.py:105} INFO - Loaded executor: SequentialExecutor
[2022-08-10 13:26:53 -0400] [1388] [INFO] Starting gunicorn 20.1.0
[2022-08-10 13:26:53,540] {manager.py:160} INFO - Launched DagFileProcessorManager with pid: 1389
[2022-08-10 13:26:53,545] {scheduler_job.py:1233} INFO - Resetting orphaned tasks for active dag runs
.
.
.
[2022-08-10 13:26:53 -0400] [1391] [INFO] Booting worker with pid: 1391
Process DagFileProcessor10-Process:
Traceback (most recent call last):
File "/home/dromo/anaconda3/envs/airflow_env_2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 998, in _commit_impl
self.engine.dialect.do_commit(self.connection)
File "/home/dromo/anaconda3/envs/airflow_env_2/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 672, in do_commit
dbapi_connection.commit()
sqlite3.OperationalError: disk I/O error
I get this 'disk I/O error' as well when I run the airflow webserver --port 8080 command:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
Access Logformat:
=================================================================
[2022-08-10 14:42:28 -0400] [2759] [INFO] Starting gunicorn 20.1.0
[2022-08-10 14:42:29 -0400] [2759] [INFO] Listening at: http://0.0.0.0:8080 (2759)
[2022-08-10 14:42:29 -0400] [2759] [INFO] Using worker: sync
.
.
.
[2022-08-10 14:42:55,149] {app.py:1455} ERROR - Exception on /static/appbuilder/datepicker/bootstrap-datepicker.css [GET]
Traceback (most recent call last):
File "/home/dromo/anaconda3/envs/airflow_env_2/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 998, in _commit_impl
self.engine.dialect.do_commit(self.connection)
File "/home/dromo/anaconda3/envs/airflow_env_2/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 672, in do_commit
dbapi_connection.commit()
sqlite3.OperationalError: disk I/O error
Any ideas as to what might be causing this and possible fixes?
It seems like Airflow can't find its metadata database on disk. Try initializing it:
airflow db init
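If initializing doesn't help, a quick way to check whether this is a filesystem problem rather than an Airflow one is to reproduce the failing commit with the sqlite3 module directly. The sketch below uses a temp directory so it is self-contained; in practice you would point it at the directory holding your airflow.db (by default $AIRFLOW_HOME, usually ~/airflow). If SQLite cannot commit there (read-only mount, networked filesystem, full disk), it raises the same sqlite3.OperationalError: disk I/O error.

```python
import os
import sqlite3
import tempfile

# Probe: perform the same kind of write that Airflow's commit does.
db_dir = tempfile.mkdtemp()  # replace with the directory that holds airflow.db
db_path = os.path.join(db_dir, "airflow.db")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE IF NOT EXISTS probe (id INTEGER)")
conn.commit()   # a "disk I/O error" here means the filesystem, not Airflow, is at fault
conn.close()
print("SQLite can write at", db_path)
```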
I am doing a BigQuery operation in a Composer DAG and getting the following error:
Event with job id 36cc0fe962103bf2bb7a9c Failed
[2021-08-19 15:13:07,807] {base_task_runner.py:113} INFO - Job 263145: Subtask sample [2021-08-19 15:13:07,806] {pod_launcher.py:125} INFO - b' from google.cloud.bigquery.routine import RoutineReference\n'
[2021-08-19 15:13:07,807] {base_task_runner.py:113} INFO - Job 263145: Subtask sample [2021-08-19 15:13:07,807] {pod_launcher.py:125} INFO - b' File "/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/routine/__init__.py", line 18, in <module>\n'
[2021-08-19 15:13:07,810] {base_task_runner.py:113} INFO - Job 263145: Subtask sample [2021-08-19 15:13:07,808] {pod_launcher.py:125} INFO - b' from google.cloud.bigquery.enums import DeterminismLevel\n'
[2021-08-19 15:13:07,810] {base_task_runner.py:113} INFO - Job 263145: Subtask sample [2021-08-19 15:13:07,808] {pod_launcher.py:125} INFO - b' File "/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/enums.py", line 21, in <module>\n'
[2021-08-19 15:13:07,811] {base_task_runner.py:113} INFO - Job 263145: Subtask sample [2021-08-19 15:13:07,808] {pod_launcher.py:125} INFO - b' from google.cloud.bigquery.query import ScalarQueryParameterType\n'
[2021-08-19 15:13:07,812] {base_task_runner.py:113} INFO - Job 263145: Subtask sample [2021-08-19 15:13:07,809] {pod_launcher.py:125} INFO - b' File "/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/query.py", line 23, in <module>\n'
[2021-08-19 15:13:07,812] {base_task_runner.py:113} INFO - Job 263145: Subtask sample [2021-08-19 15:13:07,809] {pod_launcher.py:125} INFO - b' from google.cloud.bigquery.table import _parse_schema_resource\n'
[2021-08-19 15:13:07,812] {base_task_runner.py:113} INFO - Job 263145: Subtask sample [2021-08-19 15:13:07,809] {pod_launcher.py:125} INFO - b' File "/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/table.py", line 23, in <module>\n'
[2021-08-19 15:13:07,813] {base_task_runner.py:113} INFO - Job 263145: Subtask sample [2021-08-19 15:13:07,809] {pod_launcher.py:125} INFO - b' import pytz\n'
[2021-08-19 15:13:07,813] {base_task_runner.py:113} INFO - Job 263145: Subtask sample [2021-08-19 15:13:07,809] {pod_launcher.py:125} INFO - b"ModuleNotFoundError: No module named 'pytz'\n"
I don't use pytz in my Airflow code or my BQ method. I have tried to add pytz to my Composer environment, but it didn't work. Please suggest a fix.
This error is due to the breakage reported in https://github.com/googleapis/python-bigquery/issues/885, and a fix for it was released in BigQuery client version 2.24.1.
You can update your bigquery dependency using the command gcloud composer environments update. See Installing a Python dependency from PyPI for more details.
For testing purposes, the environment has Composer version 1.16.14 and Airflow version 1.10.15. From the packages listed in Cloud Composer preinstalled packages, I copied only the Google dependencies and updated the three packages below, since they are dependencies of BigQuery per the BigQuery constraints file.
google-cloud-bigquery==2.24.1
google-api-core==1.29.0
grpcio==1.38.1
Full requirements.txt:
google-ads==4.0.0
google-api-core==1.29.0
google-api-python-client==1.12.8
google-apitools==0.5.31
google-auth==1.28.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.4.3
google-cloud-automl==2.2.0
google-cloud-bigquery==2.24.1
google-cloud-bigquery-datatransfer==3.1.0
google-cloud-bigquery-storage==2.1.0
google-cloud-bigtable==1.7.0
google-cloud-build==2.0.0
google-cloud-container==1.0.1
google-cloud-core==1.6.0
google-cloud-datacatalog==3.1.0
google-cloud-dataproc==2.3.0
google-cloud-datastore==1.15.3
google-cloud-dlp==1.0.0
google-cloud-kms==2.2.0
google-cloud-language==1.3.0
google-cloud-logging==2.2.0
google-cloud-memcache==0.3.0
google-cloud-monitoring==2.0.0
google-cloud-os-login==2.1.0
google-cloud-pubsub==2.3.0
google-cloud-pubsublite==0.3.0
google-cloud-redis==2.1.0
google-cloud-secret-manager==1.0.0
google-cloud-spanner==1.19.1
google-cloud-speech==1.3.2
google-cloud-storage==1.36.2
google-cloud-tasks==2.2.0
google-cloud-texttospeech==1.0.1
google-cloud-translate==1.7.0
google-cloud-videointelligence==1.16.1
google-cloud-vision==1.0.0
google-cloud-workflows==0.2.0
google-crc32c==1.1.2
google-pasta==0.2.0
google-resumable-media==1.2.0
googleapis-common-protos==1.53.0
grpc-google-iam-v1==0.12.3
grpcio==1.38.1
grpcio-gcp==0.2.2
Command:
gcloud composer environments update your-composer-environment-name --update-pypi-packages-from-file requirements.txt --location your-composer-location
I'm running Airflow on my computer (MacBook Air, 1.6 GHz Intel Core i5, 8 GB 2133 MHz LPDDR3). A DAG with several tasks failed with the error below. I checked several articles online, but with little to no help. There is nothing wrong with the task itself (double-checked).
Any help is much appreciated.
[2019-08-27 13:01:55,372] {sequential_executor.py:45} INFO - Executing command: ['airflow', 'run', 'Makefile_DAG', 'normalize_companies', '2019-08-27T15:38:20.914820+00:00', '--local', '--pool', 'default_pool', '-sd', '/home/airflow/dags/makefileDAG.py']
[2019-08-27 13:01:56,937] {settings.py:213} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=40647
[2019-08-27 13:01:57,285] {__init__.py:51} INFO - Using executor SequentialExecutor
[2019-08-27 13:01:59,423] {dagbag.py:90} INFO - Filling up the DagBag from /home/airflow/dags/makefileDAG.py
[2019-08-27 13:02:01,736] {cli.py:516} INFO - Running <TaskInstance: Makefile_DAG.normalize_companies 2019-08-27T15:38:20.914820+00:00 [queued]> on host ajays-macbook-air.local
Traceback (most recent call last):
File "/anaconda3/envs/airflow/bin/airflow", line 32, in <module>
args.func(args)
File "/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/utils/cli.py", line 74, in wrapper
return f(*args, **kwargs)
File "/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/bin/cli.py", line 522, in run
_run(args, dag, ti)
File "/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/bin/cli.py", line 435, in _run
run_job.run()
File "/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/jobs/base_job.py", line 213, in run
self._execute()
File "/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/jobs/local_task_job.py", line 111, in _execute
self.heartbeat()
File "/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/jobs/base_job.py", line 196, in heartbeat
self.heartbeat_callback(session=session)
File "/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/utils/db.py", line 70, in wrapper
return func(*args, **kwargs)
File "/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/jobs/local_task_job.py", line 159, in heartbeat_callback
raise AirflowException("Hostname of job runner does not match")
airflow.exceptions.AirflowException: Hostname of job runner does not match
[2019-08-27 13:05:05,904] {sequential_executor.py:52} ERROR - Failed to execute task Command '['airflow', 'run', 'Makefile_DAG', 'normalize_companies', '2019-08-27T15:38:20.914820+00:00', '--local', '--pool', 'default_pool', '-sd', '/home/airflow/dags/makefileDAG.py']' returned non-zero exit status 1..
[2019-08-27 13:05:05,905] {scheduler_job.py:1256} INFO - Executor reports execution of Makefile_DAG.normalize_companies execution_date=2019-08-27 15:38:20.914820+00:00 exited with status failed for try_number 2
Logs from the task:
[2019-08-27 13:02:13,616] {bash_operator.py:115} INFO - Running command: python /home/Makefile_Redo/normalize_companies.py
[2019-08-27 13:02:13,628] {bash_operator.py:124} INFO - Output:
[2019-08-27 13:05:02,849] {logging_mixin.py:95} INFO - [2019-08-27 13:05:02,848] {local_task_job.py:158} WARNING - The recorded hostname ajays-macbook-air.local does not match this instance's hostname AJAYs-MacBook-Air.local
[2019-08-27 13:05:02,860] {helpers.py:319} INFO - Sending Signals.SIGTERM to GPID 40649
[2019-08-27 13:05:02,861] {taskinstance.py:897} ERROR - Received SIGTERM. Terminating subprocesses.
[2019-08-27 13:05:02,862] {bash_operator.py:142} INFO - Sending SIGTERM signal to bash process group
[2019-08-27 13:05:03,539] {taskinstance.py:1047} ERROR - Task received SIGTERM signal
Traceback (most recent call last):
File "/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 922, in _run_raw_task
result = task_copy.execute(context=context)
File "/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/operators/bash_operator.py", line 126, in execute
for line in iter(sp.stdout.readline, b''):
File "/anaconda3/envs/airflow/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 899, in signal_handler
raise AirflowException("Task received SIGTERM signal")
airflow.exceptions.AirflowException: Task received SIGTERM signal
[2019-08-27 13:05:03,550] {taskinstance.py:1076} INFO - All retries failed; marking task as FAILED
A weird thing I noticed in the log above is:
The recorded hostname ajays-macbook-air.local does not match this instance's hostname AJAYs-MacBook-Air.local
How is this possible and any solution to fix this?
I had the same problem on my Mac. The solution that worked for me was updating airflow.cfg with hostname_callable = socket:gethostname. The original getfqdn returns different hostnames from time to time.
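You can see why this setting matters with a quick stdlib check (output will vary per machine): socket.getfqdn() may resolve through DNS and return a differently-cased or fully-qualified name, while socket.gethostname() is stable, so the hostname recorded at queue time can fail to match at heartbeat time.

```python
import socket

# Airflow records the hostname via hostname_callable when a task is queued
# and compares it again at heartbeat time. If getfqdn() and gethostname()
# disagree (DNS lookups can change case or qualification), the check fails.
print("getfqdn():     ", socket.getfqdn())
print("gethostname(): ", socket.gethostname())
```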
I have to update (re-type and save) my ssh connection password every time I restart airflow. Why is that?
I'm running airflow 1.10.3 in a docker container and I can see that all passwords are stored properly in the postgres database.
*** Reading local file: /root/airflow/logs/archive/check_source/2019-07-07T00:00:00+00:00/4.log
[2019-07-08 01:30:27,253] {__init__.py:1139} INFO - Dependencies all met for <TaskInstance: archive.check_source 2019-07-07T00:00:00+00:00 [queued]>
[2019-07-08 01:30:27,267] {__init__.py:1139} INFO - Dependencies all met for <TaskInstance: archive.check_source 2019-07-07T00:00:00+00:00 [queued]>
[2019-07-08 01:30:27,267] {__init__.py:1353} INFO -
--------------------------------------------------------------------------------
[2019-07-08 01:30:27,267] {__init__.py:1354} INFO - Starting attempt 4 of 4
[2019-07-08 01:30:27,268] {__init__.py:1355} INFO -
--------------------------------------------------------------------------------
[2019-07-08 01:30:27,295] {__init__.py:1374} INFO - Executing <Task(SSHOperator): check_source> on 2019-07-07T00:00:00+00:00
[2019-07-08 01:30:27,296] {base_task_runner.py:119} INFO - Running: [u'airflow', u'run', 'archive', 'check_source', '2019-07-07T00:00:00+00:00', u'--job_id', '1321', u'--raw', u'-sd', u'DAGS_FOLDER/archive.py', u'--cfg_path', '/tmp/tmpQwBRud']
[2019-07-08 01:30:28,392] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source [2019-07-08 01:30:28,392] {settings.py:182} INFO - settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800, pid=656
[2019-07-08 01:30:28,741] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source [2019-07-08 01:30:28,740] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-07-08 01:30:28,975] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source [2019-07-08 01:30:28,974] {__init__.py:305} INFO - Filling up the DagBag from /root/airflow/dags/archive.py
[2019-07-08 01:30:29,073] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source [2019-07-08 01:30:29,073] {cli.py:517} INFO - Running <TaskInstance: archive_to_glacier.check_source 2019-07-07T00:00:00+00:00 [running]> on host airflow-webserver-66d5747dc7-99mhr
[2019-07-08 01:30:29,158] {ssh_operator.py:80} INFO - ssh_hook is not provided or invalid. Trying ssh_conn_id to create SSHHook.
[2019-07-08 01:30:29,204] {__init__.py:1580} ERROR - SSH operator error:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/airflow/models/__init__.py", line 1441, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/lib/python2.7/site-packages/airflow/contrib/operators/ssh_operator.py", line 167, in execute
raise AirflowException("SSH operator error: {0}".format(str(e)))
AirflowException: SSH operator error:
[2019-07-08 01:30:29,206] {__init__.py:1609} INFO - All retries failed; marking task as FAILED
[2019-07-08 01:30:29,232] {logging_mixin.py:95} INFO - [2019-07-08 01:30:29,232] {configuration.py:287} WARNING - section/key [smtp/smtp_user] not found in config
[2019-07-08 01:30:29,314] {logging_mixin.py:95} INFO - [2019-07-08 01:30:29,313] {email.py:126} INFO - Sent an alert email to [u'bruno.pessanha#imc.com']
[2019-07-08 01:30:29,605] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source Traceback (most recent call last):
[2019-07-08 01:30:29,606] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source File "/usr/bin/airflow", line 32, in <module>
[2019-07-08 01:30:29,606] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source args.func(args)
[2019-07-08 01:30:29,606] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source File "/usr/lib/python2.7/site-packages/airflow/utils/cli.py", line 74, in wrapper
[2019-07-08 01:30:29,606] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source return f(*args, **kwargs)
[2019-07-08 01:30:29,606] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source File "/usr/lib/python2.7/site-packages/airflow/bin/cli.py", line 523, in run
[2019-07-08 01:30:29,606] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source _run(args, dag, ti)
[2019-07-08 01:30:29,606] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source File "/usr/lib/python2.7/site-packages/airflow/bin/cli.py", line 442, in _run
[2019-07-08 01:30:29,606] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source pool=args.pool,
[2019-07-08 01:30:29,606] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source File "/usr/lib/python2.7/site-packages/airflow/utils/db.py", line 73, in wrapper
[2019-07-08 01:30:29,608] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source return func(*args, **kwargs)
[2019-07-08 01:30:29,608] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source File "/usr/lib/python2.7/site-packages/airflow/models/__init__.py", line 1441, in _run_raw_task
[2019-07-08 01:30:29,608] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source result = task_copy.execute(context=context)
[2019-07-08 01:30:29,608] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source File "/usr/lib/python2.7/site-packages/airflow/contrib/operators/ssh_operator.py", line 167, in execute
[2019-07-08 01:30:29,608] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source raise AirflowException("SSH operator error: {0}".format(str(e)))
[2019-07-08 01:30:29,608] {base_task_runner.py:101} INFO - Job 1321: Subtask check_source airflow.exceptions.AirflowException: SSH operator error:
[2019-07-08 01:30:32,260] {logging_mixin.py:95} INFO - [2019-07-08 01:30:32,259] {jobs.py:2562} INFO - Task exited with return code 1
Try adding a new SSH connection (ssh_conn_id) under Admin -> Connections: https://airflow.apache.org/howto/connection/index.html
This is because of:
INFO - ssh_hook is not provided or invalid. Trying ssh_conn_id to create SSHHook.