There are three sensor tasks that share the same pool. The pool 'limit_sensor' has a limit of 1 slot, but the limit is not enforced: all three sensors run at the same time.
sensor_wait = SqlSensor(
    task_id='sensor_wait',
    dag=dag,
    conn_id='dest_data',
    sql="select count(*) from test",
    poke_interval=10,
    timeout=60,
    pool='limit_sensor',
    priority_weight=100
)
same_pool1 = SqlSensor(
    task_id='same_pool1',
    dag=dag,
    conn_id='dest_data',
    sql="select count(*) from test",
    poke_interval=10,
    timeout=60,
    pool='limit_sensor',
    priority_weight=10
)
same_pool2 = SqlSensor(
    task_id='same_pool2',
    dag=dag,
    conn_id='dest_data',
    sql="select count(*) from test",
    poke_interval=10,
    timeout=60,
    pool='limit_sensor',
    priority_weight=10
)
Here is the backfill log (Airflow 1.10.0):
[2018-10-12 11:20:35,036] {jobs.py:2198} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 0 | succeeded: 0 | running: 3 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 0
in the web ui Admin->Pools you can see:
Pool           Slots   Used Slots   Queued Slots
limit_sensor   1       3            0
What should I do to make the pool limit work? Thanks.
I have a scenario like below:
dag A:
task 1,
task 2
dag B:
task 3,
task 4
Now I want to trigger/run task 3 (dag B) only after the success of task 1 (dag A). Both dags are scheduled on the same day, but at different times.
For example: dag A runs on 14 July at 8 AM,
dag B runs on 14 July at 2 PM.
Is that doable? How?
Please help
Thanks
In DagB you should create a BranchPythonOperator that returns "task3" if the appropriate conditions are met.
In this code example, "task3" is returned only if DagA finished with state=success on the same day.
# Imports assume Airflow 2.x module paths
from datetime import datetime

from airflow.models import DagRun, TaskInstance
from airflow.operators.python import BranchPythonOperator, PythonOperator
from airflow.utils import timezone
from airflow.utils.trigger_rule import TriggerRule


def check_success_dag_a(**context):
    ti: TaskInstance = context['ti']
    dag_run: DagRun = context['dag_run']
    date: datetime = ti.execution_date
    # Midnight (start of day) of the execution date
    ts = timezone.make_aware(datetime(date.year, date.month, date.day, 0, 0, 0))
    # Look for a successful DagA run between midnight and the current execution date
    dag_a = dag_run.find(
        dag_id='DagA',
        state="success",
        execution_start_date=ts,
        execution_end_date=ti.execution_date)
    if dag_a:
        return "task3"


check_success = BranchPythonOperator(
    task_id="check_success_dag_a",
    python_callable=check_success_dag_a,
)


def run(**context):
    ti = context['ti']
    print(ti.task_id)


task3 = PythonOperator(
    task_id="task3",
    python_callable=run,
    trigger_rule=TriggerRule.ONE_SUCCESS
)
task4 = PythonOperator(
    task_id="task4",
    python_callable=run,
    trigger_rule=TriggerRule.ONE_SUCCESS
)

check_success >> [task3] >> task4
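The day-window computation in check_success_dag_a can be exercised in plain Python, independent of Airflow. A minimal sketch (day_start is a hypothetical helper name, and timezone handling is omitted here):

```python
from datetime import datetime

def day_start(dt: datetime) -> datetime:
    # Truncate a timestamp to midnight of the same day,
    # mirroring the ts value computed in check_success_dag_a
    return datetime(dt.year, dt.month, dt.day)

print(day_start(datetime(2021, 7, 14, 14, 30)))  # 2021-07-14 00:00:00
```

In the DAG itself this window spans from midnight up to the current execution date, so any successful DagA run earlier the same day is found.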
Can we set different schedule_intervals for different tasks in the same DAG?
i.e. I have one DAG with three tasks, A >> B >> C. I want the upstream tasks A & B to run weekly, but the downstream task C to run daily. Is that possible? If so, what should the schedule_interval be for the DAG and the tasks?
There are two options: you can use ShortCircuitOperator or BranchDayOfWeekOperator.
1. Using BranchDayOfWeekOperator for this use case. The operator branches based on a specific day of the week:
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.weekday import BranchDayOfWeekOperator

with DAG('my_dag',
         schedule_interval='@daily'
         ) as dag:
    task1 = DummyOperator(task_id='TASK1')
    task2 = DummyOperator(task_id='TASK2')
    task3 = DummyOperator(task_id='TASK3')
    end_task = DummyOperator(task_id='end_task')
    branch = BranchDayOfWeekOperator(
        task_id="make_choice",
        follow_task_ids_if_true="TASK3",
        follow_task_ids_if_false="end_task",
        week_day="Monday",
    )
    task1 >> task2 >> branch >> [task3, end_task]
In this example task3 will be executed only on Mondays, while task1 & task2 will run daily.
Note that this operator is available only for Airflow >= 2.1.0; however, you can copy the operator's source code and create a local version.
2. Using ShortCircuitOperator:
from datetime import date

from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import ShortCircuitOperator

def func():
    # Proceed only on Mondays (weekday() == 0)
    if date.today().weekday() == 0:
        return True
    return False

with DAG('my_dag',
         schedule_interval='@daily'
         ) as dag:
    task1 = DummyOperator(task_id='TASK1')
    task2 = DummyOperator(task_id='TASK2')
    task3 = DummyOperator(task_id='TASK3')
    verify = ShortCircuitOperator(task_id='check_day', python_callable=func)
    task1 >> task2 >> verify >> task3
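The Monday check inside func can be verified in plain Python without Airflow; a small sketch (is_monday is a hypothetical helper name):

```python
from datetime import date

def is_monday(d: date) -> bool:
    # date.weekday() returns 0 for Monday, 6 for Sunday
    return d.weekday() == 0

print(is_monday(date(2021, 7, 12)))  # True  (2021-07-12 was a Monday)
print(is_monday(date(2021, 7, 13)))  # False (a Tuesday)
```

ShortCircuitOperator skips all downstream tasks when the callable returns a falsy value, which is what gates task3 to one day a week.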
I have a DAG that needs to recompile customer lists for various brands. The script is called with two arguments: brand and list type.
I need the brands to run concurrently, but each list type to depend on the preceding list type, and I can't figure out how to do that in a loop. Can y'all help me out?
BrandsToRun = ['A', 'B', 'C']
ListTypes = ['1', '2', '3']

# Defining the DAG
################################################################################

with DAG(
        'MusterMaster',
        default_args = default_args,
        description = 'x',
        # schedule_interval = None
        schedule_interval = '30 4 * * *',
        catchup = False
        ) as MusterMaster:
    for Brand in BrandsToRun:
        for ListType in ListTypes:
            ListLoad = BashOperator(
                task_id='Load_'+str(Brand)+'_'+str(ListType),
                bash_command = """python3 '/usr/local/bin/MusterMaster.py' {0} {1}""".format(Brand[0], ListType[0]),
                pool='logs'
            )
            ListLoad
I want the tasks to have the dependency structure below, but I can't figure it out. Brands should run concurrently, but each ListType should depend on the preceding ListType.
Muster A 1 >> Muster A 2 >> Muster A 3
Muster B 1 >> Muster B 2 >> Muster B 3
Muster C 1 >> Muster C 2 >> Muster C 3
How can I best accomplish this?
You can do:
for Brand in BrandsToRun:
    list = []
    for ListType in ListTypes:
        list.append(BashOperator(
            task_id='Load_'+str(Brand)+'_'+str(ListType),
            bash_command = """python3 '/usr/local/bin/MusterMaster.py' {0} {1}""".format(Brand[0], ListType[0]),
            pool='logs'))
        if len(list) > 1:
            # chain the current list type to the previous one for this brand
            list[-2] >> list[-1]
Which will give you the desired structure: within each brand the list types run sequentially, while the brands run concurrently.
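The pairing logic behind list[-2] >> list[-1] can be sketched in plain Python to show exactly which dependencies get created (build_chain_pairs is a hypothetical name for illustration):

```python
# Build (upstream, downstream) pairs per brand, mimicking the
# "previous list type >> current list type" chaining in the answer above.
BrandsToRun = ['A', 'B', 'C']
ListTypes = ['1', '2', '3']

def build_chain_pairs(brands, list_types):
    pairs = []
    for brand in brands:
        ids = [f'Load_{brand}_{lt}' for lt in list_types]
        # consecutive list types for the same brand depend on each other;
        # no pair ever crosses brands, so brands stay independent
        pairs.extend(zip(ids, ids[1:]))
    return pairs

print(build_chain_pairs(BrandsToRun, ListTypes)[:2])
# [('Load_A_1', 'Load_A_2'), ('Load_A_2', 'Load_A_3')]
```

Each brand contributes len(ListTypes) - 1 edges and no cross-brand edges, which is the Muster A 1 >> Muster A 2 >> Muster A 3 shape requested.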
I want to use 'testthat' to add tests to my package.
This is my test file:
library(RBaseX)

test_that("Credentials are accepted", {
  skip_unless_socket_available()
  expect_error(BasexClient$new("localhost", 1984L, username = "admin", password = "denied"), "Access denied")
  Session <- BasexClient$new("localhost", 1984L, username = "admin", password = "admin")
  expect_equal(class(Session)[[1]], "BasexClient")
})
skip_unless_socket_available is defined in a separate helper.R file:
skip_unless_socket_available <- function() {
  tryCatch({
    Socket <- socketConnection(host = "localhost", 1984,
                               open = "w+b", server = FALSE, blocking = TRUE, encoding = "utf-8")
    close(Socket)
    TRUE
  }, error = function(e) {
    skip(paste0("basexserver not available:\n'", conditionMessage(e), "'"))
  })
}
When the tests are executed I get this output:
Loading RBaseX
Testing RBaseX
✓ | OK F W S | Context
✓ | 2 | test_RbaseX [0.2 s]
⠹ | 2 1 | test_RbaseX
══ Results ═══════════════════════════════
Duration: 0.3 s
OK: 2
Failed: 1
Warnings: 0
Skipped: 0
No matter what I do, I still get 1 failure. Both exceptions, however, are handled correctly.
How should I act upon this failure?
Ben
After inserting 3 contexts() in test_RBaseX.R, I now get this output:
Loading RBaseX
Testing RBaseX
✓ | OK F W S | Context
✓ | 2 | Access [0.1 s]
✓ | 2 | Create Session [0.1 s]
⠏ | 0 | Check setter/getter (and BasexClient$Execute())
✓ | 2 | Check setter/getter (and BasexClient$Execute())
⠏ | 0 | Intercept set/get is handled correctly  Database 'TestOpen' was not found.
Database 'TestOpen' was not found.
Database 'TestOpen' was not found.
✓ | 1 | Intercept set/get is handled correctly
⠙ | 1 1 | Intercept set/get is handled correctly
══ Results ════════════════════════════════════════════
Duration: 0.4 s
OK: 7
Failed: 1
Warnings: 0
Skipped: 0
All tests give the expected result but then 1 failure is added. I still haven't seen any indication why.
Does this help?
Ben
(By the way, thanks to these testing activities, I found and fixed several errors :-))
After moving the file 'testthat.R' from the '/tests/testthat' directory to the '/tests' directory, no failures were detected anymore.
Ben
I installed an InnoDB Cluster recently and am trying to create a table without any primary key or equivalent, to test the clustered-index concept where "InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values if the table has no PRIMARY KEY or suitable UNIQUE index".
I created the table as below:
create table repl_test (Name varchar(10));
Checked for the creation of GEN_CLUST_INDEX:
select * from mysql.innodb_index_stats where database_name='test' and table_name = 'repl_test';
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| test | repl_test | GEN_CLUST_INDEX | 2019-02-22 06:29:26 | n_diff_pfx01 | 0 | 1 | DB_ROW_ID |
| test | repl_test | GEN_CLUST_INDEX | 2019-02-22 06:29:26 | n_leaf_pages | 1 | NULL | Number of leaf pages in the index |
| test | repl_test | GEN_CLUST_INDEX | 2019-02-22 06:29:26 | size | 1 | NULL | Number of pages in the index |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
3 rows in set (0.00 sec)
But when I try to insert a row, I get the below error:
insert into repl_test values ('John');
ERROR 3098 (HY000): The table does not comply with the requirements by an external plugin.
2019-02-22T14:32:53.177700Z 594022 [ERROR] [MY-011542] [Repl] Plugin group_replication reported: 'Table repl_test does not have any PRIMARY KEY. This is not compatible with Group Replication.'
Below is my conf file:
[client]
port = 3306
socket = /tmp/mysql.sock
[mysqld_safe]
socket = /tmp/mysql.sock
[mysqld]
socket = /tmp/mysql.sock
port = 3306
basedir = /mysql/product/8.0/TEST
datadir = /mysql/data/TEST/innodb_data
log-error = /mysql/admin/TEST/innodb_logs/mysql.log
log_bin = /mysql/binlog/TEST/innodb_logs/mysql-bin
server-id=1
max_connections = 500
open_files_limit = 65535
expire_logs_days = 15
innodb_flush_log_at_trx_commit=1
sync_binlog=1
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
binlog_checksum = NONE
enforce_gtid_consistency = ON
gtid_mode = ON
relay-log=<<hostname>>-relay-bin
My MySQL version: mysql Ver 8.0.11 for linux-glibc2.12 on x86_64 (MySQL Community Server - GPL)
Auto-generation of the PK works fine for non-clustered setups. However, InnoDB Cluster (Group Replication) and Galera require a user-defined PRIMARY KEY (or a UNIQUE key that can be promoted to one).
If you like, file a documentation bug at bugs.mysql.com complaining that this restriction is not clear. Be sure to point to the page that needs fixing.
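One way to satisfy Group Replication, assuming you are free to alter the table, is to add an explicit auto-increment primary key (the column name id here is arbitrary); a sketch:

```sql
ALTER TABLE repl_test
    ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
```

After this, inserts should no longer be rejected by the group_replication plugin, since the table now has a real PRIMARY KEY rather than only the hidden GEN_CLUST_INDEX.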