If we look at an example DAG in Airflow's Graph View, we see:
What determines the positions of the tasks also_run_this and this_will_skip? Notice that these two tasks have no connecting lines (incoming edges) before them, which means they could be placed on the same layer (the first vertical set of tasks) as runme_0, runme_1 and runme_2 (going by my evidently incorrect assumptions about the DAG).
Is it their runtime that places them in the same layer as run_after_loop?
I am looking at the detailed task data for this DAG, and I don't see anything that distinguishes also_run_this and this_will_skip from runme_0 in terms of position:
Here is runme_0:
{
"class_ref": {
"class_name": "BashOperator",
"module_path": "airflow.operators.bash"
},
"depends_on_past": false,
"downstream_task_ids": ["run_after_loop"],
"end_date": null,
"execution_timeout": null,
"extra_links": [],
"owner": "airflow",
"pool": "default_pool",
"pool_slots": 1,
"priority_weight": 1,
"queue": "default",
"retries": 0,
"retry_delay": {
"__type": "TimeDelta",
"days": 0,
"microseconds": 0,
"seconds": 300
},
"retry_exponential_backoff": false,
"start_date": "2021-06-17T00:00:00+00:00",
"task_id": "runme_0",
"template_fields": ["bash_command", "env"],
"trigger_rule": "all_success",
"ui_color": "#f0ede4",
"ui_fgcolor": "#000",
"wait_for_downstream": false,
"weight_rule": "downstream"
}
And here is also_run_this:
{
"class_ref": {
"class_name": "BashOperator",
"module_path": "airflow.operators.bash"
},
"depends_on_past": false,
"downstream_task_ids": ["run_this_last"],
"end_date": null,
"execution_timeout": null,
"extra_links": [],
"owner": "airflow",
"pool": "default_pool",
"pool_slots": 1,
"priority_weight": 1,
"queue": "default",
"retries": 0,
"retry_delay": {
"__type": "TimeDelta",
"days": 0,
"microseconds": 0,
"seconds": 300
},
"retry_exponential_backoff": false,
"start_date": "2021-06-17T00:00:00+00:00",
"task_id": "also_run_this",
"template_fields": ["bash_command", "env"],
"trigger_rule": "all_success",
"ui_color": "#f0ede4",
"ui_fgcolor": "#000",
"wait_for_downstream": false,
"weight_rule": "downstream"
}
It would make sense if the layering were based on parallelism (all tasks in the same vertical layer run in parallel), but that would require some thresholding of the run times, and I don't see any such data available in the DAG or task information.
In fact, the Tree View appears to show runme_0, runme_1, runme_2, also_run_this and this_will_skip all running at the same time:
As per @bruno-uy's comment, it appears the Graph View has a UI "problem". Definitely not very intuitive.
Having checked that runme_0, runme_1, runme_2, also_run_this and this_will_skip all start at the same time, we can call it a UI "problem" that they are not shown in the same "layer". Airflow doesn't have a "layer" concept, so it doesn't guarantee that tasks starting at the same time are aligned vertically.
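For reference, here is a minimal sketch of a DAG with the same dependency shape (assuming the screenshots come from Airflow's bundled example_bash_operator; the commands are placeholders):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Sketch only: reproduces the dependency shape of the DAG in the screenshots.
with DAG("layout_sketch", start_date=datetime(2021, 6, 17), schedule_interval=None) as dag:
    run_after_loop = BashOperator(task_id="run_after_loop", bash_command="echo loop done")
    run_this_last = BashOperator(task_id="run_this_last", bash_command="echo all done")
    run_after_loop >> run_this_last

    for i in range(3):
        # runme_0..runme_2 fan in to run_after_loop
        BashOperator(task_id=f"runme_{i}", bash_command="echo run") >> run_after_loop

    # These two have no upstream tasks, yet point straight at run_this_last
    also_run_this = BashOperator(task_id="also_run_this", bash_command="echo also")
    this_will_skip = BashOperator(task_id="this_will_skip", bash_command="exit 99")
    also_run_this >> run_this_last
    this_will_skip >> run_this_last

also_run_this and this_will_skip have no upstream tasks, just like the runme_* tasks, yet the Graph View draws them in the second column: the layout is derived from the graph's structure by the rendering library, not from start times or runtimes.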
This could be a good improvement for Airflow, or another diagram type could be added, such as the Sankey you mentioned.
Related
I have successfully set up an SMTP server, and email alerts work fine when a job fails.
But I tried to set up an SLA miss alert as described in the link below:
https://blog.clairvoyantsoft.com/airflow-service-level-agreement-sla-2f3c91cd84cc
mid = BashOperator(
    task_id='mid',
    sla=timedelta(seconds=5),
    bash_command='sleep 10',
    retries=0,
    dag=dag,
)
No SLA miss event is being saved. I have also checked under
Browse -> SLA Misses
I have tried a number of things but have been unable to pin down the issue.
The DAG is defined as:
args = {
    'owner': 'airflow',
    'start_date': datetime(2020, 11, 18),
    'catchup': False,
    'retries': 0,
    'provide_context': True,
    'email': "XXXXXXXX@gmail.com",
    'start_date': airflow.utils.dates.days_ago(n=0, minute=1),
    'priority_weight': 1,
    'email_on_failure': True,
    'default_args': {
        'on_failure_callback': on_failure_callback,
    }
}
d = datetime(2020, 10, 30)
dag = DAG('MyApplication', start_date=d, on_failure_callback=on_failure_callback, schedule_interval='@daily', default_args=args)
The issue seems to be in the arguments, more specifically 'start_date': airflow.utils.dates.days_ago(n=0, minute=1): this means that start_date is re-evaluated every time the scheduler parses the DAG file. You should specify a "static" start date such as datetime(2020, 11, 18).
See also the Airflow FAQ:
We recommend against using dynamic values as start_date, especially datetime.now() as it can be quite confusing. The task is triggered once the period closes, and in theory an @hourly DAG would never get to an hour after now as now() moves along.
Also, nesting default_args inside args looks odd to me.
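As an illustration only (not the asker's exact code, and with on_failure_callback reduced to a stub), the definition might look like this with a static start_date:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # airflow.operators.bash in Airflow 2.x


def on_failure_callback(context):
    # Stub callback; the real one would send a notification.
    print('Task failed:', context['task_instance_key_str'])


default_args = {
    'owner': 'airflow',
    'email': 'XXXXXXXX@gmail.com',   # placeholder address
    'email_on_failure': True,
    'retries': 0,
    'on_failure_callback': on_failure_callback,
}

dag = DAG(
    'MyApplication',
    start_date=datetime(2020, 11, 18),   # static start date, not days_ago(...)
    schedule_interval='@daily',
    catchup=False,
    default_args=default_args,
)

mid = BashOperator(
    task_id='mid',
    sla=timedelta(seconds=5),   # deliberately shorter than the 10 s sleep, so a miss should be recorded
    bash_command='sleep 10',
    dag=dag,
)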
I have looked everywhere for a solution but I'm stuck.
My setup is the following:
An ESP32 uses a BLE GATT notification characteristic to push temperature data into ThingsBoard via the ThingsBoard gateway. Once the BLE connection is established, the first telemetry package is shown in the freshly created device's 'Latest telemetry' area. If I turn on gateway debugging I can see further notifications reaching ThingsBoard, like this:
{"LOGS":"2020-07-20 02:04:19,640 - DEBUG - [ble_connector.py] - ble_connector - 321 - Notification received from device {'device_config': {'name': 'Esp32 v2.2', 'MACAddress': '24:62:AB:F3:43:72', 'telemetry': [{'key': 'temperature', 'method': 'notify', 'characteristicUUID': '0972EF8C-7613-4075-AD52-756F33D4DA91', 'byteFrom': 0, 'byteTo': -1}], 'attributes': [{'key': 'name', 'characteristicUUID': '00002A00-0000-1000-8000-00805F9B34FB', 'method': 'read', 'byteFrom': 0, 'byteTo': -1}], 'attributeUpdates': [{'attributeOnThingsBoard': 'sharedName', 'characteristicUUID': '00002A00-0000-1000-8000-00805F9B34FB'}], 'serverSideRpc': [{'methodRPC': 'rpcMethod1', 'withResponse': True, 'characteristicUUID': '00002A00-0000-1000-8000-00805F9B34FB', 'methodProcessing': 'read'}]}, 'interest_uuid': {'00002A00-0000-1000-8000-00805F9B34FB': [{'section_config': {'key': 'name', 'characteristicUUID': '00002A00-0000-1000-8000-00805F9B34FB', 'method': 'read', 'byteFrom': 0, 'byteTo': -1}, 'type': 'attributes', 'converter': <thingsboard_gateway.connectors.ble.bytes_ble_uplink_converter.BytesBLEUplinkConverter object at 0xb4427eb0>}], '0972EF8C-7613-4075-AD52-756F33D4DA91': [{'section_config': {'key': 'temperature', 'method': 'notify', 'characteristicUUID': '0972EF8C-7613-4075-AD52-756F33D4DA91', 'byteFrom': 0, 'byteTo': -1}, 'type': 'telemetry', 'converter': <thingsboard_gateway.connectors.ble.bytes_ble_uplink_converter.BytesBLEUplinkConverter object at 0xb4427eb0>}]}, 'scanned_device': <bluepy.btle.ScanEntry object at 0xb443a290>, 'is_new_device': False, 'peripheral': <bluepy.btle.Peripheral object at 0xb58f0070>, 'services': {'00001801-0000-1000-8000-00805F9B34FB': {'00002A05-0000-1000-8000-00805F9B34FB': {'characteristic': <bluepy.btle.Characteristic object at 0xb443a210>, 'handle': 2}}, '00001800-0000-1000-8000-00805F9B34FB': {'00002A00-0000-1000-8000-00805F9B34FB': {'characteristic': <bluepy.btle.Characteristic object at 0xb443a270>, 'handle': 21}, '00002A01-0000-1000-8000-00805F9B34FB': {'characteristic': <bluepy.btle.Characteristic object at 0xb443a1d0>, 'handle': 23}, '00002AA6-0000-1000-8000-00805F9B34FB': {'characteristic': <bluepy.btle.Characteristic object at 0xb443a2b0>, 'handle': 25}}, 'AB0828B1-198E-4351-B779-901FA0E0371E': {'0972EF8C-7613-4075-AD52-756F33D4DA91': {'characteristic': <bluepy.btle.Characteristic object at 0xb443a6b0>, 'handle': 41}, '4AC8A682-9736-4E5D-932B-E9B31405049C': {'characteristic': <bluepy.btle.Characteristic object at 0xb443a5f0>, 'handle': 44}}}} handle: 42, data: b'25.00'"}
The data I would like to see updated is the string '25.00'.
I know I could update ThingsBoard directly, but it is the use of BLE that I'm interested in, because I like that the sensors are network agnostic.
My question is: why does the updated temperature not show up, even though it reaches ThingsBoard, and what can I change to make it happen?
Any kind of help is much appreciated. I've been wrestling with this the entire weekend.
Adding more clarification:
The ESP32 code that generates the BLE notifications is here: https://pastebin.com/NqMfxsK6
And this is the gateway's BLE connector configuration:
{
"name": "BLE Connector",
"rescanIntervalSeconds": 100,
"checkIntervalSeconds": 10,
"scanTimeSeconds": 5,
"passiveScanMode": true,
"devices": [
{
"name": "Temperature and humidity sensor",
"MACAddress": "24:62:AB:F3:43:72",
"telemetry": [
{
"key": "temperature",
"method": "notify",
"characteristicUUID": "0972EF8C-7613-4075-AD52-756F33D4DA91",
"byteFrom": 0,
"byteTo": -1
}
],
"attributes": [
{
"key": "name",
"characteristicUUID": "00002A00-0000-1000-8000-00805F9B34FB",
"method": "read",
"byteFrom": 0,
"byteTo": -1
}
],
"attributeUpdates": [
{
"attributeOnThingsBoard": "sharedName",
"characteristicUUID": "00002A00-0000-1000-8000-00805F9B34FB"
}
]
}
]
}
My goal is to list the shared Slack channels that users have joined, via the Slack Web API.
/usr/bin/curl -s -XPOST 'https://slack.com/api/conversations.list?token=MY_TOKEN&pretty=1' | jq -r '.channels[]|select(.is_shared = "true")'
But the output also includes non-shared channels (ones with "is_shared": false). I have no idea why I am getting these results; any help is appreciated.
I use the following command:
/usr/bin/curl -s -XPOST 'https://slack.com/api/conversations.list?token=MY_TOKEN&pretty=1'
and the result is:
{
"ok": true,
"channels": [
{
"id": "C2U56FH6Z",
"name": "hoge_general",
"is_channel": true,
"is_group": false,
"is_im": false,
"created": 1477470814,
"is_archived": false,
"is_general": false,
"unlinked": 0,
"name_normalized": "hoge_general",
"is_shared": false,
"parent_conversation": null,
"creator": "U2UABCDEF",
"is_ext_shared": false,
"is_org_shared": false,
"shared_team_ids": [
"T2U94ABCDE"
],
"pending_shared": [],
"pending_connected_team_ids": [],
"is_pending_ext_shared": false,
"is_member": true,
"is_private": false,
"is_mpim": false,
"topic": {
"value": "Editor \2",
"creator": "U2UABCDE",
"last_set": 1478675694
},
"purpose": {
"value": "AAA editor ",
"creator": "U2UABABCDF",
"last_set": 14774596815
},
"previous_names": [],
"num_members": 11
},
So I tried to get the channel names using that method.
Comparison of values in jq is done using the == operator; = is the assignment operator, which changes values (similar to many C-like programming languages). That is also why your original filter returned every channel: select(.is_shared = "true") assigns "true" to .is_shared, and the result of that assignment (the whole, now-modified object) is always truthy, so select lets everything through.
The is_shared property in your JSON document is of type boolean (true or false), but you are comparing it to a string ("true" or "false"). That will always result in false (true == "true" is false).
Instead, compare booleans with booleans: .channels[] | select(.is_shared == true)
And, as with most other programming languages, conditionals compare implicitly against true, so if you have == true somewhere, this is redundant 99% of the time and can be simplified: .channels[] | select(.is_shared)
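To make the distinction tangible, here is a toy analogy in Python rather than jq (the second channel is hypothetical):

channels = [
    {"name": "hoge_general", "is_shared": False},
    {"name": "partner_room", "is_shared": True},   # hypothetical shared channel
]

# Comparing a boolean to a string never matches, so this filter is always empty:
wrong = [c for c in channels if c["is_shared"] == "true"]   # []

# Compare booleans to booleans, or simply use the value itself:
right = [c for c in channels if c["is_shared"]]             # only the shared channel

print(wrong, right)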
I'm trying to use apache airflow.
I managed to install everything.
I added a new DAG into the dags folder, and when I run airflow list_dags it shows the example DAGs along with my new DAG.
However, when I go to the UI I can't see the DAG listed in the DAGs tab.
I already killed the webserver and restarted everything; it didn't work.
FYI, I'm running Apache Airflow on a VM with CentOS 7.
Thanks.
Zack in the comment section is right. If you change the owner in the DAG's arguments from the default 'airflow' to something else, e.g.
default_args = {
    'owner': 'whateveryournameis',  # <----
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}
dag = DAG('tutorial', default_args=default_args, schedule_interval=timedelta(days=1))
then, in order to have your new DAG shown in the UI's DAGs list, you should create a new user in Airflow.
Creating a user is simple: in the UI, go to Admin -> Users and create a new one.
I want all the tasks in a DAG run to finish before the first task of the next run gets executed.
I have max_active_runs = 1, but overlapping runs still happen.
default_args = {
    'depends_on_past': True,
    'wait_for_downstream': True,
    'max_active_runs': 1,
    'start_date': datetime(2018, 3, 4),
    'owner': 't.n',
    'email': ['t.n@example.com'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 3,
    'retry_delay': timedelta(minutes=4)
}
dag = DAG('example', default_args=default_args, schedule_interval = schedule_interval)
(All of my tasks are dependent on the previous task. Airflow version is 1.8.0)
Thank you
I changed it to pass max_active_runs as an argument to DAG() instead of putting it in default_args, and it worked.
Thanks SimonD for giving me the idea, even though your answer didn't point to it directly.
You've put the 'max_active_runs': 1 into the default_args parameter and not into the correct spot.
max_active_runs is a constructor argument for a DAG and should not be put into the default_args dictionary.
Here is an example DAG that shows where you need to move it to:
dag_args = {
    'owner': 'Owner',
    # 'max_active_runs': 1, # <--- Here is where you had it.
    'depends_on_past': False,
    'start_date': datetime(2018, 1, 1, 12, 0),
    'email_on_failure': False
}
sched = timedelta(hours=1)
dag = DAG(
    job_id,
    default_args=dag_args,
    schedule_interval=sched,
    max_active_runs=1  # <---- Here is where it is supposed to be
)
If the tasks that your DAG is running are actually sub-DAGs, then you may need to pass max_active_runs into the sub-DAGs too, but I'm not 100% sure about this; a sketch of what that could look like follows.
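A minimal sketch of that idea, assuming the classic SubDagOperator pattern from Airflow 1.x (names are placeholders; the import path is airflow.operators.subdag in Airflow 2.x):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.subdag_operator import SubDagOperator

dag_args = {
    'owner': 'Owner',
    'depends_on_past': False,
    'start_date': datetime(2018, 1, 1, 12, 0),
    'email_on_failure': False
}


def make_subdag(parent_dag_id, child_id, args):
    # The child DAG mirrors the parent's max_active_runs=1.
    return DAG(
        dag_id='{}.{}'.format(parent_dag_id, child_id),
        default_args=args,
        schedule_interval=timedelta(hours=1),
        max_active_runs=1,
    )


dag = DAG(
    'example_with_subdag',
    default_args=dag_args,
    schedule_interval=timedelta(hours=1),
    max_active_runs=1,
)

section = SubDagOperator(
    task_id='section_1',
    subdag=make_subdag(dag.dag_id, 'section_1', dag_args),
    dag=dag,
)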
You can use XComs to do it. First add two PythonOperators, 'start' and 'end', to the DAG. Set the flow as:
start ---> ALL TASKS ----> end
'end' will always push a variable last_success = context['execution_date'] to XCom (xcom_push). (This requires provide_context=True in the PythonOperators.)
And 'start' will always check XCom (xcom_pull) to see whether there is a last_success value equal to the previous DagRun's execution_date, or to the DAG's start_date (to let the process start).
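A rough sketch of that pattern, assuming Airflow 1.x-style imports and that prev_execution_date is available in the task context (names are placeholders):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.exceptions import AirflowException
from airflow.operators.python_operator import PythonOperator

default_args = {'owner': 'airflow', 'start_date': datetime(2018, 3, 4)}
dag = DAG('example_gated', default_args=default_args, schedule_interval=timedelta(days=1))


def push_last_success(**context):
    # 'end' records which run just finished successfully.
    context['ti'].xcom_push(key='last_success', value=str(context['execution_date']))


def check_previous_run(**context):
    # 'start' looks at what 'end' pushed in earlier runs; include_prior_dates is needed
    # because the value was pushed under a previous execution_date.
    last_success = context['ti'].xcom_pull(
        task_ids='end', key='last_success', include_prior_dates=True)
    prev_execution = context.get('prev_execution_date')  # None on the very first run
    if prev_execution is None:
        return  # first run: let the process start
    if last_success != str(prev_execution):
        raise AirflowException('Previous DagRun has not finished yet')


start = PythonOperator(task_id='start', python_callable=check_previous_run,
                       provide_context=True, dag=dag)
end = PythonOperator(task_id='end', python_callable=push_last_success,
                     provide_context=True, dag=dag)

start >> end  # in the real DAG: start >> ALL TASKS >> end

Raising the exception fails the run outright; a sensor-style check that waits and retries would be a gentler variant of the same idea.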
Followed this answer
Actually, you should set DAG_CONCURRENCY=1 as an environment variable. That worked for me.