I have created the DAG with the following configuration:
job_type='daily'
SOURCE_PATH='/home/ubuntu/daily_data'
with DAG(
    dag_id="transformer_daily_v1",
    is_paused_upon_creation=False,
    default_args=default_args,
    description="transformer to insert data",
    start_date=datetime(2022, 9, 20),
    schedule_interval='31 12 * * *',
    catchup=False
) as dag:
    task1 = PythonOperator(
        task_id="dag_task_1",
        python_callable=get_to_know_details(job_type, SOURCE_PATH),
    )

def get_to_know_details(job_type, SOURCE_PATH):
    print("************************", job_type, SOURCE_PATH)
Each time I start Airflow with the command
airflow standalone
the DAG function executes automatically, without being triggered, as seen in the logs:
standalone | Starting Airflow Standalone
standalone | Checking database is initialized
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
WARNI [airflow.models.crypto] empty cryptography key - values will not be stored encrypted.
************************ daily /home/ubuntu/daily_data
WARNI [unusual_prefix_8fc9338bb4cf0c5518fed57dffa1a11abec44c36_example_kubernetes_executor] The example_kubernetes_executor example DAG requires the kubernetes provider. Please install it with: pip install apache-airflow[cncf.kubernetes]
airflow version - 2.2.5
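For reference, a minimal sketch of the usual pattern, reusing the ids from the snippet above: python_callable receives the function object itself (no parentheses), and its arguments are supplied through op_args, so the function only runs when the task executes rather than every time the file is parsed. This is an illustration on my part, not code from the original post.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

job_type = 'daily'
SOURCE_PATH = '/home/ubuntu/daily_data'

def get_to_know_details(job_type, SOURCE_PATH):
    print("************************", job_type, SOURCE_PATH)

with DAG(
    dag_id="transformer_daily_v1",
    start_date=datetime(2022, 9, 20),
    schedule_interval='31 12 * * *',
    catchup=False,
) as dag:
    task1 = PythonOperator(
        task_id="dag_task_1",
        python_callable=get_to_know_details,  # pass the callable, do not call it here
        op_args=[job_type, SOURCE_PATH],      # arguments are applied at run time
    )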
Related
I've got this DAG; nevertheless, when I try to run it, it gets stuck in the Queued state. When I then try to run it manually, I get this error:
Error:
Only works with the Celery, CeleryKubernetes or Kubernetes executors
Code:
from airflow import DAG
from airflow.providers.postgres.hooks.postgres import PostgresHook
from airflow.operators.python import PythonOperator
from datetime import datetime

def helloWorld():
    print('Hello World')

def take_clients():
    hook = PostgresHook(postgres_conn_id="postgres_robert")
    df = hook.get_pandas_df(sql="SELECT * FROM clients;")
    print(df)
    # do what you need with the df....

with DAG(dag_id="test",
         start_date=datetime(2021, 1, 1),
         schedule_interval="@once",
         catchup=False) as dag:
    task1 = PythonOperator(
        task_id="hello_world",
        python_callable=helloWorld)
    task2 = PythonOperator(
        task_id="get_clients",
        python_callable=take_clients)
    task1 >> task2
I guess you are trying to use the Run button from the UI.
This button is enabled only for executors that support it.
In your Airflow setup you are using an executor that doesn't support this command.
In newer Airflow versions the button is simply disabled if you are using an executor that doesn't support it.
I assume that what you are after is to create a new run; in that case you should use the Trigger Run button. If you are looking to re-run a specific task, then use the Clear button.
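The CLI equivalents, assuming the Airflow 2.x command names and the dag_id "test" and task_id "hello_world" from the snippet above, would be:

airflow dags trigger test                  # create a new DAG run
airflow tasks clear test -t hello_world    # clear a specific task so it re-runs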
You are running it with the LocalExecutor; you have to change your executor to Celery, CeleryKubernetes, Kubernetes, or Dask.
If you are using docker-compose, add:
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
Otherwise, go to the Airflow Executor documentation.
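To double-check which executor the environment is actually configured with, a recent Airflow 2.x CLI can print the effective setting (assuming CLI access to the Airflow environment):

airflow config get-value core executor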
I'm testing the KubernetesPodOperator, running
Airflow 2.2.1 with the cncf.kubernetes provider 2.1.0 on my minikube.
I'm having issues trying to spawn a mock task:
init_environments = [
    k8s.V1EnvVar(name='AIRFLOW__KUBERNETES__POD_TEMPLATE_FILE', value='""'),
    k8s.V1EnvVar(name='KUBERNETES__POD_TEMPLATE_FILE', value='""'),
    k8s.V1EnvVar(name='POD_TEMPLATE_FILE', value='""')]

other_task = KubernetesPodOperator(
    dag=dag,
    task_id="ingestion_kube",
    env_vars=init_environments,
    cmds=["bash", "-cx"],
    arguments=["echo 10 \n\n\n\n\n\n\n"],
    name="base",
    image="meltano-flieber",
    image_pull_policy="IfNotExists",
    in_cluster=True,
    namespace="localkubeflow",
    is_delete_operator_pod=False,
    pod_template_file=None,
    get_logs=True
)
The task pod created by the KubernetesExecutor to run the Airflow task completes successfully, but there is no sign of the pod that the KubernetesPodOperator should have created.
Relevant logs of the Airflow task:
[2021-11-23 21:30:28,213] {dagbag.py:500} INFO - Filling up the DagBag from /opt/***/dags/meltano/meltano_ingest_pendo.py
Running <TaskInstance: meltano_tasks.ingestion_kube manual__2021-11-23T21:30:10.788963+00:00 [queued]> on host meltanotasksingestionkube.996b19ee10464c2f8683cdfc8ce7303
In Airflow the task looks as if it failed, but without any relevant logs. Does anybody have anything related to this, or any suggestions?
I have configured a DAG in such a way that if the current instance has failed, the next instance won't run. However, there is a problem.
Problem
Let's say a past instance of the task failed and the current instance is in the waiting state. Once I fix the issue, how do I run the current instance without marking the past run as successful? I want to keep the history of when the task (DAG) failed.
DAG
dag = DAG(
    dag_id='test_airflow',
    default_args=args,
    tags=['wealth', 'python', 'ml'],
    schedule_interval='5 13 * * *',
    max_active_runs=1,
)

run_this = BashOperator(
    task_id='run_after_loop',
    bash_command='lll',
    dag=dag,
    depends_on_past=True
)
I guess you could trigger a task execution via the CLI using airflow run.
There are two arguments that may help you:
-i, --ignore_dependencies - Ignore task-specific dependencies, e.g. upstream, depends_on_past, and retry delay dependencies
-I, --ignore_depends_on_past - Ignore depends_on_past dependencies (but respect upstream dependencies)
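For example, a hypothetical invocation for the DAG above (the execution date is made up for illustration; the flags match the Airflow 1.x-style CLI listed here, while Airflow 2.x uses airflow tasks run with equivalent flags):

airflow run -I test_airflow run_after_loop 2021-01-01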
I want to run a simple DAG "test_update_bq", but when I go to localhost I see this: DAG "test_update_bq" seems to be missing.
There are no errors when I run airflow initdb. Also, when I run airflow test test_update_bq update_table_sql 2015-06-01, it completes successfully and the table is updated in BigQuery. DAG:
from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'Anna',
    'depends_on_past': True,
    'start_date': datetime(2017, 6, 2),
    'email': ['airflow@airflow.com'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 5,
    'retry_delay': timedelta(minutes=5),
}

schedule_interval = "00 21 * * *"

# Define DAG: Set ID and assign default args and schedule interval
dag = DAG('test_update_bq', default_args=default_args, schedule_interval=schedule_interval, template_searchpath=['/home/ubuntu/airflow/dags/sql_bq'])

update_task = BigQueryOperator(
    dag=dag,
    allow_large_results=True,
    task_id='update_table_sql',
    sql='update_bq.sql',
    use_legacy_sql=False,
    bigquery_conn_id='test'
)

update_task
I would be grateful for any help.
/logs/scheduler
[2019-10-10 11:28:53,308] {logging_mixin.py:95} INFO - [2019-10-10 11:28:53,308] {dagbag.py:90} INFO - Filling up the DagBag from /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:53,333] {scheduler_job.py:1532} INFO - DAG(s) dict_keys(['test_update_bq']) retrieved from /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:53,383] {scheduler_job.py:152} INFO - Processing /home/ubuntu/airflow/dags/update_bq.py took 0.082 seconds
[2019-10-10 11:28:56,315] {logging_mixin.py:95} INFO - [2019-10-10 11:28:56,315] {settings.py:213} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=3600, pid=11761
[2019-10-10 11:28:56,318] {scheduler_job.py:146} INFO - Started process (PID=11761) to work on /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:56,324] {scheduler_job.py:1520} INFO - Processing file /home/ubuntu/airflow/dags/update_bq.py for tasks to queue
[2019-10-10 11:28:56,325] {logging_mixin.py:95} INFO - [2019-10-10 11:28:56,325] {dagbag.py:90} INFO - Filling up the DagBag from /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:56,350] {scheduler_job.py:1532} INFO - DAG(s) dict_keys(['test_update_bq']) retrieved from /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:56,399] {scheduler_job.py:152} INFO - Processing /home/ubuntu/airflow/dags/update_bq.py took 0.081 seconds
Restarting the Airflow webserver helped.
So I killed the gunicorn process on Ubuntu and then restarted the Airflow webserver.
This error is usually due to an exception happening when Airflow tries to parse a DAG. So the DAG gets registered in the metastore (and is therefore visible in the UI), but it wasn't parsed by Airflow. Take a look at the Airflow logs; you might see an exception causing this error.
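One way to surface that exception directly is to load the file into a DagBag and print its import errors; this is a debugging sketch of mine (using the DAG file path from the scheduler logs above), not something stated in the answer. Simply running the DAG file with python also prints most import-time errors.

from airflow.models import DagBag

# Parse only this DAG file and show any exception raised during import
bag = DagBag(dag_folder='/home/ubuntu/airflow/dags/update_bq.py', include_examples=False)
print(bag.import_errors)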
None of the responses helped me solve this issue.
However, after spending some time I found out how to see the exact problem.
In my case I ran Airflow (v2.4.0) using the Helm chart (v1.6.0) inside Kubernetes, which created multiple Docker containers. I got into the running container and executed two commands using Airflow's CLI, and this helped me a lot to debug and understand the problem:
airflow dags report
airflow dags reserialize
In my case the problem was that the database schema didn't match the Airflow version.
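If a schema mismatch like that is the cause, the usual remedy (my assumption here, not part of the original answer) is to bring the metadata database up to the running Airflow version:

airflow db upgrade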
I am new to airflow and I have written a simple SSHOperator to learn how it works.
default_args = {
    'start_date': datetime(2018, 6, 20)
}

dag = DAG(dag_id='ssh_test', schedule_interval='@hourly', default_args=default_args)

sshHook = SSHHook(ssh_conn_id='testing')

t1 = SSHOperator(
    task_id='task1',
    command='echo Hello World',
    ssh_hook=sshHook,
    dag=dag)
When I manually trigger it in the UI, the DAG shows a status of running, but the operator stays white, with no status.
I'm wondering why my task isn't queuing. Does anyone have any ideas? My airflow.cfg is the default, if that is useful information.
Even this isn't running:
dag = DAG(dag_id='test', start_date=datetime(2018, 6, 21), schedule_interval='0 0 * * *')
runMe = DummyOperator(task_id='testest', dag=dag)
Make sure you've started the Airflow Scheduler in addition to the Airflow Web Server:
airflow scheduler
check if airflow scheduler is running
check if airflow webserver is running
check if all DAGs are set to On in the web UI (a CLI alternative is shown after this list)
check if the DAGs have a start date which is in the past
check if the DAGs have a proper schedule (before the schedule date) which is shown in the web UI
check if the dag has the proper pool and queue.
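If a DAG turns out to be paused (Off), it can also be unpaused from the CLI; the command below assumes the Airflow 2.x CLI and the dag_id from the question, while older releases use airflow unpause instead:

airflow dags unpause ssh_test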