Airflow failed to get task instance

I have tried to run a simple task using the Airflow BashOperator, but my DAG never stops running: it stays green forever without succeeding or failing. When I check the logs I see something like this. Thanks in advance for your time and answers.
```
airflow-scheduler_1 | [SQL: INSERT INTO task_fail (task_id, dag_id, execution_date, start_date, end_date, duration) VALUES (%(task_id)s, %(dag_id)s, %(execution_date)s, %(start_date)s, %(end_date)s, %(duration)s) RETURNING task_fail.id]
airflow-scheduler_1 | [parameters: {'task_id': 'first_task', 'dag_id': 'LocalInjestionDag', 'execution_date': datetime.datetime(2023, 1, 20, 8, 0, tzinfo=Timezone('UTC')), 'start_date': datetime.datetime(2023, 1, 23, 3, 35, 27, 332954, tzinfo=Timezone('UTC')), 'end_date': datetime.datetime(2023, 1, 23, 3, 35, 27, 710572, tzinfo=Timezone('UTC')), 'duration': 0}]
postgres_1 | 2023-01-23 03:55:59.712 UTC [4336] ERROR: column "execution_date" of relation "task_fail" does not exist at character 41
```
I have tried execution_datetime, using xcom_push, creating functions with XCom, and switching to the PythonOperator, but everything falls back to the same error.
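One thing worth noting: the error says the `task_fail` table has no `execution_date` column, which usually means the scheduler and the metadata database are at different schema versions (newer schema revisions dropped that column). Assuming a Docker Compose setup as the log prefixes suggest, a plausible first step is to align the image versions and run a schema migration; the service name below is an assumption taken from the `airflow-scheduler_1` log prefix:

```shell
# Check whether the metadata DB schema matches the running Airflow version,
# then apply any pending migrations (Airflow 2.x commands):
docker-compose exec airflow-scheduler airflow db check-migrations
docker-compose exec airflow-scheduler airflow db upgrade
```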

Related

How to set start_date on airflow for dynamic dag

I have a DAG factory:

```python
from datetime import datetime, timedelta

from airflow import DAG


def create_dag(dag_id, schedule, default_args, cust_details, os):
    dag = DAG(dag_id,
              schedule_interval=schedule,
              description='pipeline',
              start_date=datetime(2022, 9, 28, 0, 0),
              catchup=False,
              max_active_runs=1,
              concurrency=10,
              default_args=default_args)
    with dag:
        # dag definition
        ...
    return dag


for cust_idx, cust_details in enumerate(cust_list):
    # code that gets customer os
    for idx, os in enumerate(customer_os):
        dag_id = f'pipeline-{cust_details[2]}-{os}'
        globals()[dag_id] = create_dag(dag_id, timedelta(days=15),
                                       default_args, cust_details, os)
```
I want each customer's job scheduled every 15 days, with one customer scheduled per day. I tried passing cust_idx to create_dag and setting start_date=datetime(2022, 9, 28, 0, 0) + timedelta(days=tenant_idx). However, all my dynamically created DAGs are scheduled at the same time. When I fetch the customers I get them in a fixed sequence, as with an ORDER BY clause.
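With a timedelta schedule, run times are anchored to start_date, so giving each customer a distinct start_date should stagger the runs across the 15-day cycle. A minimal sketch of that offset calculation (staggered_start is a hypothetical helper, not part of Airflow):

```python
from datetime import datetime, timedelta


def staggered_start(base, cust_idx, period_days=15):
    """Offset each customer's start_date by its index within the cycle,
    so no two customers (mod period_days) share a daily slot."""
    return base + timedelta(days=cust_idx % period_days)


base = datetime(2022, 9, 28)
starts = [staggered_start(base, i) for i in range(3)]
# customers 0, 1, 2 start on Sep 28, 29, 30 respectively
```

The modulo keeps the offset within one scheduling period, so customer 16 shares a day with customer 1 rather than drifting past the cycle.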

Is the Airflow BranchDateTimeOperator > or >= for the upper and lower bounds?

Suppose I execute this BranchDateTimeOperator on the following schedules (ds values). Which tasks will be executed and which will be skipped?
```python
branch_task = BranchDateTimeOperator(
    task_id='create_records_attributed',
    depends_on_past=True,
    use_task_execution_date=True,
    follow_task_ids_if_true=['task_a'],
    follow_task_ids_if_false=['task_b'],
    target_upper=datetime(2022, 5, 20),
    target_lower=None,
    dag=dag,
)
```
datetime(2022, 5, 19)
datetime(2022, 5, 20)
datetime(2022, 5, 21)
In your example, task_a will be followed if target_lower <= date <= target_upper; otherwise task_b will follow. The bounds are inclusive: the false branch is taken only when the date is strictly outside them, so datetime(2022, 5, 20) itself still follows task_a.
BranchDateTimeOperator makes more sense when used with use_task_execution_date=True. In that case it compares target_lower <= DAG logical_date <= target_upper, so it is also relevant when you backfill a task or clear historical runs, because the date being compared is the DAG's run date rather than the current date.
Alternatively, note that the target parameters also accept datetime.time, so for scheduled runs you might want to use datetime.time rather than datetime.datetime, for example to check whether the run time falls between 08:00 and 10:00.
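The decision logic above can be sketched in plain Python (choose_branch is an illustrative stand-in, not the real operator API; it assumes the inclusive-bounds behavior described above, where the false branch fires only when the date is strictly outside the range):

```python
from datetime import datetime


def choose_branch(logical_date, target_lower=None, target_upper=None):
    """Sketch of the BranchDateTimeOperator comparison: strictly outside
    the [target_lower, target_upper] range -> false branch, else true."""
    if target_lower is not None and logical_date < target_lower:
        return 'task_b'
    if target_upper is not None and logical_date > target_upper:
        return 'task_b'
    return 'task_a'


upper = datetime(2022, 5, 20)
print(choose_branch(datetime(2022, 5, 19), target_upper=upper))  # task_a
print(choose_branch(datetime(2022, 5, 20), target_upper=upper))  # task_a (bound is inclusive)
print(choose_branch(datetime(2022, 5, 21), target_upper=upper))  # task_b
```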

Apache Airflow - How to set execution_date using TriggerDagRunOperator in the target DAG to use the current execution_date

I want to set the execution_date in a triggered DAG. I'm using the TriggerDagRunOperator; this operator has an execution_date parameter, and I want to set it to the current execution_date.
```python
import pprint
from datetime import datetime, timedelta

from airflow import DAG
from airflow.models import Variable
from airflow.operators.dagrun_operator import TriggerDagRunOperator


def conditionally_trigger(context, dag_run_obj):
    """This function decides whether or not to trigger the remote DAG"""
    pp = pprint.PrettyPrinter(indent=4)
    c_p = Variable.get("VAR2") == Variable.get("VAR1") and Variable.get("VAR3") == "1"
    print("Controller DAG : conditionally_trigger = {}".format(c_p))
    if c_p:
        pp.pprint(dag_run_obj.payload)
        return dag_run_obj


default_args = {
    'owner': 'pepito',
    'depends_on_past': False,
    'retries': 2,
    'start_date': datetime(2018, 12, 1, 0, 0),
    'email': ['xxxx@yyyyy.net'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retry_delay': timedelta(minutes=1)
}

dag = DAG(
    'DAG_1',
    default_args=default_args,
    schedule_interval="0 12 * * 1",
    dagrun_timeout=timedelta(hours=22),
    max_active_runs=1,
    catchup=False
)

trigger_dag_2 = TriggerDagRunOperator(
    task_id='trigger_dag_2',
    trigger_dag_id="DAG_2",
    python_callable=conditionally_trigger,
    execution_date={{ execution_date }},
    dag=dag,
    pool='a_roz'
)
```
But I get the following error:

```
name 'execution_date' is not defined
```

If I set

```python
execution_date={{ 'execution_date' }},
```

or

```python
execution_date='{{ execution_date }}',
```

I get:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1659, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.6/site-packages/airflow/operators/dagrun_operator.py", line 78, in execute
    replace_microseconds=False)
  File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 98, in trigger_dag
    replace_microseconds=replace_microseconds,
  File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 45, in _trigger_dag
    assert timezone.is_localized(execution_date)
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/timezone.py", line 38, in is_localized
    return value.utcoffset() is not None
AttributeError: 'str' object has no attribute 'utcoffset'
```
Does anyone know how I can set the execution date for DAG_2 if I want it to be equal to DAG_1's? This question is different from "airflow TriggerDagRunOperator how to change the execution date", because that post doesn't explain how to send the execution_date through the TriggerDagRunOperator; it only says that the possibility exists. https://stackoverflow.com/a/49442868/10269204
The execution_date parameter was not templated previously, but it is templated now with this commit, so you can try your code with a newer version of Airflow. Additionally, for a hardcoded execution_date you need to set tzinfo:

```python
from datetime import datetime, timezone

execution_date=datetime(2019, 3, 27, tzinfo=timezone.utc)
# or:
execution_date=datetime.now().replace(tzinfo=timezone.utc)
```
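The traceback makes sense once you see what Airflow's is_localized assertion checks: it calls utcoffset(), which is None for a naive datetime and doesn't exist at all on a string. A small stdlib-only sketch of that distinction:

```python
from datetime import datetime, timezone

# A naive datetime has no UTC offset; this is exactly what the
# `timezone.is_localized(execution_date)` assertion in the traceback rejects.
naive = datetime(2019, 3, 27)
aware = datetime(2019, 3, 27, tzinfo=timezone.utc)

print(naive.utcoffset())  # None -> fails the is_localized check
print(aware.utcoffset())  # 0:00:00 -> passes
```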

Datetime and Pytz Timezone .weekday() issue

I'm running into an issue when trying to create a histogram of createdAt datetimes for orders. Even after creating timezone-aware datetimes, .weekday() returns the same day, even though it should be a different day.
The code I'm using to reproduce this is as follows:

```python
import datetime

import pytz

value = {
    'createdAt': '2017-04-24T00:48:03+00:00'
}

created_at = datetime.datetime.strptime(value['createdAt'], '%Y-%m-%dT%H:%M:%S+00:00')
timezone = pytz.timezone('America/Los_Angeles')

created_at_naive = created_at
created_at_aware = timezone.localize(created_at_naive)

print(created_at_naive)            # 2017-04-24 00:48:03
print(created_at_aware)            # 2017-04-24 00:48:03-07:00
print(created_at_naive.weekday())  # 0 (Monday)
print(created_at_aware.weekday())  # 0 (should be Sunday)
```
The problem is that you need to actually convert the datetime to the new timezone, not just attach one: localize() only tags the naive datetime with a zone, while astimezone() shifts the wall-clock time. (pytz.timezone is written out in full here because the snippet above rebinds the name timezone.)

```python
>>> pytz.timezone('UTC').localize(created_at)
datetime.datetime(2017, 4, 24, 0, 48, 3, tzinfo=<UTC>)
>>> pytz.timezone('UTC').localize(created_at).astimezone(pytz.timezone('America/Los_Angeles'))
datetime.datetime(2017, 4, 23, 17, 48, 3, tzinfo=<DstTzInfo 'America/Los_Angeles' PDT-1 day, 17:00:00 DST>)
```
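The same conversion can be shown with only the standard library, which makes the weekday change easy to verify. A fixed -07:00 offset (PDT on that date) is assumed here to stay stdlib-only; pytz or zoneinfo would pick the correct offset automatically:

```python
from datetime import datetime, timedelta, timezone

# Parse the UTC timestamp (%z accepts "+00:00" on Python 3.7+),
# then shift it into Pacific time with an assumed fixed -07:00 offset.
created_at = datetime.strptime('2017-04-24T00:48:03+00:00', '%Y-%m-%dT%H:%M:%S%z')
pacific = created_at.astimezone(timezone(timedelta(hours=-7)))

print(created_at.weekday())  # 0 (Monday in UTC)
print(pacific.weekday())     # 6 (Sunday in Pacific time)
```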

How to fix datetime error when extending HIT in mturk using boto3

I'm attempting to extend the expiration on a list of HITs per the API instructions:

```python
for hit_id in expired_hit_list:
    response = client.update_expiration_for_hit(
        HITId=hit_id,
        ExpireAt=datetime(2017, 4, 9, 19, 9, 41, tzinfo=tzlocal())
    )
```
I get this error:

```
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-59-e0764e20a54b> in <module>()
      2 response = client.update_expiration_for_hit(
      3     HITId=hit_id,
----> 4     ExpireAt=datetime(2017, 4, 9, 19, 9, 41, tzinfo=tzlocal())
      5 )

NameError: name 'datetime' is not defined
```
I also tried datetime.datetime and dateTime, and also just removing it:

```python
ExpireAt=(2017, 4, 9, 19, 9, 41, tzinfo=tzlocal())
```

Nothing works. Suggestions?
It turned out to be just an issue with my Python setup, nothing to do with boto3: the names were never imported.

```python
from datetime import datetime
from dateutil.tz import tzlocal
```

(With a bare `import datetime`, the call would instead have to be written `datetime.datetime(...)`.)
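A quick stdlib-only sketch of why the import style matters for the NameError above (tzlocal is swapped for the stdlib timezone.utc here so the snippet has no third-party dependency):

```python
import datetime as dt_module
from datetime import datetime, timezone

# With the bare module import, the class must be qualified:
a = dt_module.datetime(2017, 4, 9, 19, 9, 41, tzinfo=timezone.utc)
# With `from datetime import datetime`, the bare name works:
b = datetime(2017, 4, 9, 19, 9, 41, tzinfo=timezone.utc)

print(a == b)  # True: both construct the same aware datetime
```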
