Airflow: Concurrency with respect to dag path


Suppose I have a dag with deep concurrency-compatible paths:
B3 <-- B2 <-- B1 <-- B0
/
C
\
A3 <-- A2 <-- A1 <-- A0
Each path above can run concurrently. However, if one of the branches is failing (for example, if B0 and A0 are sensors and B0 evaluates to true while A0 is still waiting), the rest of the B branch should still execute.
However, although I am able to get task concurrency, the entire DAG is stuck at the B0 and A0 tasks, rather than advancing along the B path while A0 waits.
How do I configure Airflow to advance along each path, rather than getting blocked at a task if one branch is blocked?
Or is the only solution to create many mini-DAGs? It seems as if the executor favors parallelization across a single level of nodes over vertical execution, i.e., it is performing breadth-only calculations.

This was a bit tricky at first because of the naming convention:
// I am using the following convention: filename(variable name or description)
// conceptually,
C = airflow.cfg(dag_concurrency) * dag.py(dag concurrency for tasks)
C <= airflow.cfg(parallelism)
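A toy illustration of the slot-counting idea behind this answer, in plain Python. It does not reproduce Airflow's scheduler; it assumes, as in the question, that B0 and A0 are sensors that keep their slot while they wait, that every other task takes one tick, and that `slots` stands in for whichever of the concurrency limits above is smallest. The names `simulate`, `deps` and `sensors` are hypothetical.

```python
def simulate(deps, sensor_ready_at, slots, max_ticks=100):
    """Toy model, not Airflow: return {task: tick at which it finished}.

    deps maps each task to its upstream tasks. A task listed in
    sensor_ready_at keeps its slot until that tick (like a poking sensor);
    every other task finishes one tick after it starts.
    """
    done = {}      # task -> tick at which it finished
    running = {}   # task -> tick at which it will finish
    started = set()
    for tick in range(max_ticks):
        # Retire tasks that have finished by now, freeing their slots.
        for task, finish in list(running.items()):
            if finish <= tick:
                done[task] = finish
                del running[task]
        if len(done) == len(deps):
            break
        # Start eligible tasks (all upstream done) in name order while slots remain.
        for task in sorted(deps):
            if len(running) >= slots:
                break
            if task not in started and all(up in done for up in deps[task]):
                started.add(task)
                running[task] = max(tick + 1, sensor_ready_at.get(task, 0))
    return done


# Two independent branches; B0 and A0 are sensors.
deps = {
    "B0": [], "B1": ["B0"], "B2": ["B1"], "B3": ["B2"],
    "A0": [], "A1": ["A0"], "A2": ["A1"], "A3": ["A2"],
}
sensors = {"B0": 1, "A0": 10}  # B0's condition is true at tick 1, A0's at tick 10

two_slots = simulate(deps, sensors, slots=2)
one_slot = simulate(deps, sensors, slots=1)
```

With two slots, B3 finishes at tick 4 while A0 is still waiting; with one slot, the waiting sensor holds the only slot and the B branch cannot start until the A branch has finished, which resembles the symptom described in the question.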

Related

Auto scaling task in airflow

I want to use airflow for image processing.
I have 4 tasks: image preprocess (A), bounding box finder (B), classification (C), and image finalize (D).
The chart looks like this:
A -> B1 -> C \
-> B2 -> C - D
-> B3 -> C /
-> Bn -> C /
The output of the image preprocess task is a list of bounding box proposals; for each bounding box I run classification, and once all classification tasks end I run the image finalize task.
I want everything to run in parallel.
This will run on 10,000 images per day, so if I get a separate representation of the pipeline in the UI for each image, I won't be able to keep track of the pipeline...
Is this possible in Airflow?

Dynamically creating tasks like this is not something Airflow is best for. Take a look at the answer here to get some insight: Airflow dynamic tasks at runtime.
Airflow is better suited as a scheduling tool, so I propose you delegate the actual work and parallelization to another tool like Celery. You can still use Airflow to schedule this work, in a way that your B step is a simple operator which reads the output from A (via XCom or similar) and distributes actual work to some remote workers.
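A minimal sketch of that split, using Python's `concurrent.futures` thread pool as a local stand-in for Celery workers. `preprocess`, `classify`, `finalize` and `run_image` are hypothetical placeholders for tasks A, B/C and D; in a real setup the B operator would read A's output (e.g. via XCom) and submit the work to remote workers instead of a local pool.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess(image_id):
    """Hypothetical task A: return bounding-box proposals for one image."""
    return [(image_id, i) for i in range(image_id % 4 + 1)]


def classify(box):
    """Hypothetical per-box work (B + C), e.g. executed by a remote worker."""
    _, index = box
    return "object" if index % 2 == 0 else "background"


def finalize(labels):
    """Hypothetical task D: runs only once every classification has returned."""
    return {"boxes": len(labels), "objects": labels.count("object")}


def run_image(image_id, pool):
    boxes = preprocess(image_id)
    # Fan out one job per box, then block until all of them are done (fan in).
    labels = list(pool.map(classify, boxes))
    return finalize(labels)


with ThreadPoolExecutor(max_workers=8) as pool:
    result = run_image(7, pool)
```

The DAG itself stays fixed (A, one distributing B step, D), so the UI shows the same shape for every image no matter how many boxes it has.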
Can you know in advance the maximum possible number of B tasks? If that's manageable, you could get away with creating the max B tasks, and then skipping some of them as needed depending on the outcome of A.
The implementation might not be trivial, but you could get some hints from this discussion: Launch a subdag with variable parallel tasks in airflow.
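The "create the maximum number of B tasks and skip the rest" idea can be sketched as a pure planning function. `MAX_B` and `plan_b_tasks` are hypothetical; in Airflow, each pre-created B task would look up its slot and, if it has no proposal, mark itself skipped (for example by raising `AirflowSkipException`).

```python
MAX_B = 5  # hypothetical upper bound on bounding-box proposals per image


def plan_b_tasks(proposals, max_b=MAX_B):
    """Map each pre-created B task id to its proposal, or None if it should skip."""
    if len(proposals) > max_b:
        raise ValueError(
            f"{len(proposals)} proposals exceed the {max_b} pre-created B tasks"
        )
    return {f"B{i}": proposals[i] if i < len(proposals) else None for i in range(max_b)}


plan = plan_b_tasks(["box0", "box1"])
to_skip = sorted(task for task, proposal in plan.items() if proposal is None)
```

Raising an error when the bound is exceeded makes an underestimated `MAX_B` visible instead of silently dropping proposals.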

Constraints for graph models

I have four possible chains that can be formed with 6 different chain links:
G0 -> G1 -> G2
E0 -> E1 -> E2
G0 -> E1 -> G2
E0 -> G1 -> G2
Now I want to express these four chains using a graph model, which would look like the following picture:
If I use a graph query language to ask, e.g., "give me all paths having G0 as first vertex and E2 as last vertex", I get the path G0 -> E1 -> E2, which is not one of the four valid chains...
So my question is: is there a way to express such constraints so that I only receive "valid" paths?
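The problem can be reproduced in a few lines of Python: merging the four chains into one adjacency map and enumerating every path from G0 to E2 yields G0 -> E1 -> E2, which is not one of the four chains. A plain graph records only which link may follow which, not which chain a link belongs to. The helper `all_paths` is hypothetical and assumes the graph is acyclic, as it is here.

```python
from collections import defaultdict

chains = [
    ["G0", "G1", "G2"],
    ["E0", "E1", "E2"],
    ["G0", "E1", "G2"],
    ["E0", "G1", "G2"],
]

# Merge all chains into one directed graph: each consecutive pair becomes an edge.
graph = defaultdict(set)
for chain in chains:
    for a, b in zip(chain, chain[1:]):
        graph[a].add(b)


def all_paths(start, end, path=None):
    """Depth-first enumeration of every path from start to end (acyclic graphs only)."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    paths = []
    for nxt in sorted(graph[start]):
        paths.extend(all_paths(nxt, end, path))
    return paths


valid = {tuple(chain) for chain in chains}
found = all_paths("G0", "E2")
spurious = [p for p in found if tuple(p) not in valid]
```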

I don't understand why you say that the path G0 -> E1 -> E2 is not valid. By your definition, it should be the only valid path. This query should return the desired result:
g.V(G0). /* G0 as first vertex */
repeat(out()).
until(__.is(E2)). /* E2 as last vertex */
path() /* all paths */

In the spirit of "the simplest solution is usually the best", I would do this:
1) For each chain, create a node to represent that chain.
2) Create a relationship from that node to each node in the chain, and add an index property on the relationship. (You may use a first/end relationship for the first and last element, or add that as a property for easier Cypher queries.)
3) Run your Cypher queries against the "chain" nodes instead.
Any other way of doing this will either make the Cypher overly complex or risk corrupting your original data to the point where it is barely usable. This is the easiest and most flexible setup.
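The same model in plain Python, as a sketch of the idea rather than of Cypher: each chain gets its own record, and each (chain, element, index) triple plays the role of an indexed relationship. The chain names and the `chains_between` helper are hypothetical.

```python
chains = {
    "chain1": ["G0", "G1", "G2"],
    "chain2": ["E0", "E1", "E2"],
    "chain3": ["G0", "E1", "G2"],
    "chain4": ["E0", "G1", "G2"],
}

# One "relationship" per (chain node, element node) pair, carrying an index property.
rels = [(chain, elem, i) for chain, elems in chains.items() for i, elem in enumerate(elems)]


def chains_between(first, last):
    """Chains whose element at index 0 is `first` and whose last element is `last`."""
    length = {}
    for chain, _, idx in rels:
        length[chain] = max(length.get(chain, 0), idx + 1)
    starts = {c for c, e, i in rels if e == first and i == 0}
    ends = {c for c, e, i in rels if e == last and i == length[c] - 1}
    return sorted(starts & ends)
```

Because queries start from the chain records, a path that merely exists in the merged graph, such as G0 -> E1 -> E2, is never returned.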

target indices in OpenMDAO 1.x group connections?

By my understanding of src_indices in the documentation, self.connect('a', 'b', src_indices=[1]) is roughly equivalent to b=a[1]. Is there a convenient way to do "target indices" that would allow writing something like b[1]=a?

If a is an output of one component, and b is an input of some other component, then generally a connection can only be a->b. So in that context b[1] -> a would never work, because you can't use the input as the source side of a connection.
However, if you broaden the question a little bit, and assume there are two outputs a1 and a2, and you want to issue two connections as a1 -> b[0], a2 -> b[1], these would be "target indices." However, this isn't allowed in either OpenMDAO V1 or OpenMDAO V2. The reason is that any given input can be connected to one and only one output as its source. This restriction makes the underlying code much simpler.
In this kind of situation, you need to make a muxing component that will have two inputs and one vector output. Its solve_nonlinear in V1 or compute method in V2 will push the values into the array.
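A framework-free sketch of such a muxing component. Plain dicts stand in for OpenMDAO's input and output vectors, and the class is only loosely modeled on the V2 `compute(inputs, outputs)` method; with OpenMDAO itself you would subclass its component class and declare `a1`, `a2` and `b` as your version's API requires. The class name `Mux` is hypothetical.

```python
class Mux:
    """Sketch: gather n scalar inputs a1..an into one vector output b."""

    def __init__(self, n):
        self.n = n

    def compute(self, inputs, outputs):
        # Equivalent to b[i-1] = a_i for every i: the "target indices" from the question.
        outputs["b"] = [inputs[f"a{i + 1}"] for i in range(self.n)]


mux = Mux(2)
outputs = {}
mux.compute({"a1": 3.0, "a2": 4.0}, outputs)
```

In the model, a1 and a2 would each connect to the matching mux input and the mux output b to the downstream input, so every input still has exactly one source.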

How could I calculate the number of recursions that a recursive rule does?

I have a problem: I want to calculate how many recursions a recursive rule of my code performs.
My program examines whether an object is a component of computer hardware or not (through the component(X,Y) predicate), e.g. component(computer,motherboard) -> true.
It even examines the case where an object is not directly a component but a subcomponent of another component, e.g. subcomponent(computer,ram) -> true (as ram is a component of motherboard and motherboard is a component of computer).
Because my code is over 400 lines, I will show you just some facts of the form component(X,Y) and the rule subcomponent(X,Y).
So, some of the facts are below:
component(computer,case).
component(computer,power_supply).
component(computer,motherboard).
component(computer,storage_devices).
component(computer,expansion_cards).
component(case,buttons).
component(case,fans).
component(case,ribbon_cables).
component(case,cables).
component(motherboard,cpu).
component(motherboard,chipset).
component(motherboard,ram).
component(motherboard,rom).
component(motherboard,heat_sink).
component(cpu,chip_carrier).
component(cpu,signal_pins).
component(cpu,control_pins).
component(cpu,voltage_pins).
component(cpu,capacitors).
component(cpu,resistors).
and so on...
My rule is:
subcomponent(X,Z):- component(X,Z).
subcomponent(X,Z):- component(X,Y),subcomponent(Y,Z).
Well, in order to calculate the number of components between a given component X and a given component Y (that is, the number of recursions that the recursive rule subcomponent(X,Y) performs), I have made some attempts, all of which failed. I present them below:
i)
number_of_components(X,Y,N,T):- T is N+1, subcomponent(X,Y).
number_of_components(X,Y,N,T):- M is N+1, subcomponent(X,Z), number_of_components(Z,Y,M,T).
In this case I get this error: "ERROR: is/2: Arguments are not sufficiently instantiated".
ii)
number_of_components(X,Y):- bagof(Y,subcomponent(X,Y),L),
length(L,N),
write(N).
In this case I get either 1 or 11 as a result, followed by true, and that's all. No logic at all!
iii)
count_elems_acc([], Total, Total).
count_elems_acc([Head|Tail], Sum, Total) :-
Count is Sum + 1,
count_elems_acc(Tail, Count, Total).
number_of_components(X,Y):- bagof(Y,subcomponent(X,Y),L),
count_elems_acc(L,0,Total),
write(Total).
In this case I get numbers that are not right according to my knowledge base (or I am misinterpreting them, because this approach seems to have some logic).
So, what am I doing wrong and what should I write instead?
I am looking forward to reading your answers!

One thing you could do is iterative deepening with call_with_depth_limit/3: you call your predicate (in this case, subcomponent/2) under a depth limit and increase the limit until you get a result. When you do get a result, the limit is the deepest recursion level used. See the documentation of call_with_depth_limit/3 for details.
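The iterative-deepening idea as a Python sketch: `reachable_within` is a hypothetical depth-limited version of subcomponent/2, and `min_depth` raises the limit until it succeeds. It counts component/2 hops, which is not the same unit as the depth that call_with_depth_limit/3 reports, and it uses only a few of the facts from the question.

```python
COMPONENT = {
    "computer": ["case", "power_supply", "motherboard"],
    "motherboard": ["cpu", "ram"],
    "cpu": ["chip_carrier", "signal_pins"],
}


def reachable_within(x, y, limit):
    """True if y is a (sub)component of x using at most `limit` component hops."""
    if limit == 0:
        return False
    children = COMPONENT.get(x, [])
    return y in children or any(reachable_within(c, y, limit - 1) for c in children)


def min_depth(x, y, max_limit=50):
    """Smallest limit at which the depth-limited search succeeds, or None."""
    for limit in range(1, max_limit + 1):
        if reachable_within(x, y, limit):
            return limit
    return None
```

For example, `min_depth("computer", "chip_carrier")` succeeds first at limit 3 (computer -> motherboard -> cpu -> chip_carrier).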
However, there is something easier you can do. Your database can be represented as an unweighted, directed, acyclic graph. So, stick your whole database in a directed graph, as implemented in library(ugraphs), and find its transitive closure. In the transitive closure, the neighbours of a component are all its subcomponents. Done!
To make the graph:
findall(C-S, component(C, S), Es),
vertices_edges_to_ugraph([], Es, Graph)
To find the transitive closure:
transitive_closure(Graph, Closure)
And to find subcomponents:
neighbours(Component, Closure, Subcomponents)
The Subcomponents will be a list, and you can just get its length with length/2.
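The same computation in Python, as a rough analogue of the library(ugraphs) pipeline: build an adjacency map from the component/2 facts listed in the question and take its transitive closure. Here the closure is computed by a memoized depth-first search, which is valid because the graph is acyclic; this is not necessarily how transitive_closure/2 is implemented.

```python
FACTS = [
    ("computer", "case"), ("computer", "power_supply"), ("computer", "motherboard"),
    ("computer", "storage_devices"), ("computer", "expansion_cards"),
    ("case", "buttons"), ("case", "fans"), ("case", "ribbon_cables"), ("case", "cables"),
    ("motherboard", "cpu"), ("motherboard", "chipset"), ("motherboard", "ram"),
    ("motherboard", "rom"), ("motherboard", "heat_sink"),
    ("cpu", "chip_carrier"), ("cpu", "signal_pins"), ("cpu", "control_pins"),
    ("cpu", "voltage_pins"), ("cpu", "capacitors"), ("cpu", "resistors"),
]


def transitive_closure(edges):
    """Map every vertex to the set of vertices reachable from it (acyclic graphs only)."""
    graph = {}
    for parent, child in edges:
        graph.setdefault(parent, set()).add(child)
        graph.setdefault(child, set())
    closure = {}

    def reach(node):
        if node not in closure:
            result = set()
            for child in graph[node]:
                result.add(child)
                result |= reach(child)  # everything below a child is below node too
            closure[node] = result
        return closure[node]

    for node in graph:
        reach(node)
    return closure


closure = transitive_closure(FACTS)
counts = {c: len(closure[c]) for c in ("computer", "motherboard", "cpu")}
```

With just the facts shown, computer has 20 subcomponents and motherboard has 11.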
EDIT
Some random thoughts: in your case, your database seems to describe a graph that is by definition both directed and acyclic (the component-subcomponent relationship goes strictly one way, right?). This is what makes it unnecessary to define your own walk through the graph, as for example nicely demonstrated in this great question and answers. So, you don't need to define your own recursive subcomponent predicate, etc.
One great thing about representing the database as a term when working with it, instead of keeping it as a flat table, is that it becomes trivial to write predicates that manipulate it: you get Prolog's backtracking for free. And since the S-representation of a graph that library(ugraphs) uses is well suited to Prolog, you will most probably end up with a more efficient program, too.

The number of calls of a predicate can be a difficult concept. I would say: use the tools that your system makes available.
?- profile(number_of_components(computer,X)).
20
=====================================================================
Total time: 0.00 seconds
=====================================================================
Predicate Box Entries = Calls+Redos Time
=====================================================================
$new_findall_bag/0 1 = 1+0 0.0%
$add_findall_bag/1 20 = 20+0 0.0%
$free_variable_set/3 1 = 1+0 0.0%
...
so:count_elems_acc/3 1 = 1+0 0.0%
so:subcomponent/2 22 = 1+21 0.0%
so:component/2 74 = 42+32 0.0%
so:number_of_components/2 2 = 1+1 0.0%
On the other hand, what is of utmost importance is the relation among clause variables. This is the essence of Prolog. So, try to read - let's say, in plain English - your rules.
i) In number_of_components(X,Y,N,T), what relation do N and T have to X? I cannot say. So:
?- leash(-all),trace.
?- number_of_components(computer,Y,N,T).
Call: (7) so:number_of_components(computer, _G1931, _G1932, _G1933)
Call: (8) _G1933 is _G1932+1
ERROR: is/2: Arguments are not sufficiently instantiated
Exception: (8) _G1933 is _G1932+1 ?
ii) number_of_components(X,Y) would make much more sense if Y were the number of components of X. Then:
number_of_components(X,Y):- bagof(S,subcomponent(X,S),L), length(L,Y).
that yields
?- number_of_components(computer,X).
X = 20.
or better
?- aggregate(count, S^subcomponent(computer,S), N).
N = 20.
Note the use of S^: it marks S as 'existentially quantified' in the goal where it appears, that is, S is allowed to take different values while the goal is being proven.
iii) count_elems_acc/3 is, more or less, equivalent to length/2, so the printed outcome seems correct, but again, it's the relation between X and Y that your last clause fails to establish. Printing from clauses should be used only when the purpose is to perform side effects, for instance debugging.

hazards in a three-way superscalar pipeline

I am working my way through exercises relating to superscalar architecture. I need some help conceptualizing the answer to this question:
“If you ever get confused about what a register renamer has to do, go back to the assembly code you're executing, and ask yourself what has to happen for the right result to be obtained. For example, consider a three-way superscalar machine renaming these three instructions concurrently:
ADDI R1, R1, R1
ADDI R1, R1, R1
ADDI R1, R1, R1
If the value of R1 starts out as 5, what should its value be when this sequence has executed?”
I can look at that and see that the final value of R1 should be 40 (5 + 5 = 10, 10 + 10 = 20, 20 + 20 = 40). How would a three-way superscalar machine reach this answer, though? If I understand correctly, in this three-way superscalar pipeline these three instructions would be fetched in parallel. That means you would have a hazard right from the start, right? How should I conceptualize the answer to this problem?
EDIT 1: When decoding these instructions, the three-way superscalar machine would, by necessity, have to perform register renaming to get the following instruction sequence, correct?
ADDI R1, R2, R3
ADDI R4, R5, R6
ADDI R1, R2, R3

Simply put, you won't be able to execute these instructions together. However, the point of this example doesn't seem to be about hazards (namely, detecting that these instructions are interdependent and must be performed serially with sufficient stalls); it's about renaming. It serves to show that a single logical register (R1) will have multiple physical "versions" in flight simultaneously in the pipeline. The original one would hold the value 5 (let's call it "p1"), but you'll also need to allocate one for the result of the first ADD ("p2"), to be used as the source for the second, and again for the results of the second and third ADD instructions ("p3" and "p4").
Since this processor decodes and attempts to issue these 3 instructions simultaneously, you can see that you can't just have R1 as the source for all - that would prevent each of them from using the correct mid-calculation value, so you need to rename them. The important part is that p1..p4 as we've dubbed them, can be allocated simultaneously, and the dependencies would be known at the time of issue - long before each of them is populated with the result. This essentially decouples the front-end from the execution back-end, which is important for the performance flexibility in modern CPUs as you may have bottlenecks anywhere.
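A small Python model of that renaming, assuming `ADDI R1, R1, R1` means R1 = R1 + R1 (the reading that gives 40). `rename` walks the group in program order with a register alias table (RAT) and a free list; a real three-wide renamer does this for the whole group in one cycle by detecting the dependencies inside the group, which yields the same mapping. `execute` then lets each instruction run only once both of its physical sources hold values. All names are hypothetical.

```python
def rename(instructions, rat, free_list):
    """Rename (dst, src1, src2) triples in program order.

    Sources are looked up in the RAT *before* the destination is remapped,
    so each instruction reads the previous version of its source register.
    """
    renamed = []
    for dst, src1, src2 in instructions:
        p_src1, p_src2 = rat[src1], rat[src2]
        p_dst = free_list.pop(0)  # allocate a fresh physical register
        rat[dst] = p_dst
        renamed.append((p_dst, p_src1, p_src2))
    return renamed


def execute(renamed, regfile):
    """Dataflow execution: an instruction issues once both sources hold values."""
    pending = list(renamed)
    while pending:
        ready = [ins for ins in pending if ins[1] in regfile and ins[2] in regfile]
        if not ready:
            raise RuntimeError("no instruction can issue")
        for dst, a, b in ready:
            regfile[dst] = regfile[a] + regfile[b]
            pending.remove((dst, a, b))
    return regfile


rat = {"R1": "p1"}  # architectural R1 currently lives in physical register p1
renamed = rename([("R1", "R1", "R1")] * 3, rat, ["p2", "p3", "p4"])
regfile = execute(renamed, {"p1": 5})
final_r1 = regfile[rat["R1"]]
```

All four physical names are fixed at rename time, before any value exists; execution then fills p2, p3 and p4 one after another as their sources become ready.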
