I am trying to calculate a hash for each task in Airflow, using a combination of dag_id, task_id, and execution_date. I am doing the calculation in the __init__ of a custom operator so that I can use it to derive a unique retry_delay for each task (I don't want to use exponential backoff).
I find it difficult to use the {{ execution_date }} macro inside a call to a hash function or the int function; in those cases Airflow does not replace it with the specific date (it just keeps the string {{ execution_date }}), and I get the same hash for all execution dates:
self.task_hash = int(hashlib.sha1("{}#{}#{}".format(
    self.dag_id, self.task_id, '{{execution_date}}'
).encode('utf-8')).hexdigest(), 16)
I have put task_hash in template_fields, and I have also tried doing the calculation in a custom macro. That works for the hash part, but when I put it inside int(), it's the same issue.
Is there any workaround? Or perhaps I could retrieve the execution_date (in the __init__ of an operator) some other way, not from macros?
thanks
Try:
# note the doubled-up braces: str.format collapses "{{{{" to "{{",
# leaving a literal {{execution_date}} placeholder for Jinja to render later
self.task_hash = int(hashlib.sha1("{}#{}#{{{{execution_date}}}}".format(
    self.dag_id, self.task_id).encode('utf-8')).hexdigest(), 16)
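Note that template rendering only happens at runtime, so hashing the string in __init__ still hashes the placeholder itself. One workaround (a sketch of an alternative, not part of the answer above; the operator name and the 10-minute window are illustrative) is to defer the hashing to runtime, where the real execution_date is available in the task context:

import hashlib
from datetime import timedelta
from airflow.models import BaseOperator

class MyOperator(BaseOperator):
    def pre_execute(self, context):
        # execution_date is only known at runtime, so compute the hash here
        key = "{}#{}#{}".format(self.dag_id, self.task_id,
                                context['execution_date'])
        self.task_hash = int(hashlib.sha1(key.encode('utf-8')).hexdigest(), 16)
        # e.g. spread retries over a 10-minute window; whether a retry_delay
        # mutated at runtime is honored depends on the Airflow version
        self.retry_delay = timedelta(seconds=self.task_hash % 600)

    def execute(self, context):
        pass  # the real work would go here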
I have an upstream extract task that extracts files into two different S3 paths. This operator returns a tuple of the two separate S3 paths as an XCom. How do I pass the appropriate XCom value to the appropriate task?
extract_task >> load_task_0
extract_task >> load_task_1
Probably a little late to the party, but will answer anyways.
With the TaskFlow API in Airflow 2.0 you can do something like this using decorators:
from airflow.decorators import task

@task(multiple_outputs=True)
def extract_task():
    return {
        "path_0": "s3://path0",
        "path_1": "s3://path1",
    }
Then in your DAG:
@dag()
def my_dag():
    output = extract_task()
    load_task_0(output["path_0"])
    load_task_1(output["path_1"])
This works with a dictionary return value (which is what multiple_outputs=True unpacks into separate XComs); it probably won't work with a tuple, but you can try.
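Putting the pieces together, a complete sketch might look like this (the load task bodies, start date, and schedule are placeholders, not from the answer):

from datetime import datetime
from airflow.decorators import dag, task

@task(multiple_outputs=True)
def extract_task():
    return {"path_0": "s3://path0", "path_1": "s3://path1"}

@task
def load_task_0(path):
    print("loading from", path)  # placeholder body

@task
def load_task_1(path):
    print("loading from", path)  # placeholder body

@dag(start_date=datetime(2021, 1, 1), schedule_interval=None, catchup=False)
def my_dag():
    output = extract_task()
    load_task_0(output["path_0"])
    load_task_1(output["path_1"])

dag = my_dag()

Airflow infers the extract-to-load dependencies from the XCom references, so no explicit >> wiring is needed.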
How do I retrieve outputs from objects in an array as described in the background?
I have a function in R that returns multiple variables. For example, if my function is called function_ABC, then:
a <- function_ABC(input_var)
gives a such that a$var1, a$var2, and a$var3 exist.
I have multiple cases to run, so I have put them in an array:
input_var <- c(1, 2, ...15)
For storing the outputs, I declared var such that:
var <- c(v1, v2, v3, .... v15)
Then I run:
assign(var[i], function_ABC(input_var[i]))
However, after that I am unable to access these variables as v1[1]$var1. I can access them as v1$var1 or v3$var1, etc. But this means I need to write 15*3 commands to retrieve my output.
Is there an easier way to do this?
Push your whole input set into an array Arr[].
Open a multi-threaded executor E of a certain size N.
Using a for loop on the input array Arr[], submit your function calls as Callable jobs to the executor E. While submitting each job, hold the reference to the FutureTask in another array FTArr[].
When all the FutureTask jobs have executed, you can retrieve the output for each of them by running another for loop on FTArr[] (see the sketch below).
Note:
• Make sure to add a synchronized block in your func_ABC where you access shared resources, to avoid deadlocks.
• Please refer to the link below if you want to know more about the usage of a count-down latch. A count-down latch helps you find out when exactly all the child threads have finished execution.
https://www.geeksforgeeks.org/countdownlatch-in-java/
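As a minimal sketch of this pattern in Python (the answer describes it in Java terms; function_ABC and the 15 input cases come from the question, everything else is illustrative):

from concurrent.futures import ThreadPoolExecutor

def function_ABC(x):
    # stand-in for the real function, returning several named outputs
    return {"var1": x, "var2": x * 2, "var3": x * 3}

input_var = list(range(1, 16))  # the 15 cases, i.e. Arr[]

with ThreadPoolExecutor(max_workers=4) as executor:
    # submit each case and keep the future objects, i.e. FTArr[]
    futures = [executor.submit(function_ABC, x) for x in input_var]
    # collect results; result() blocks until that job has finished
    results = [f.result() for f in futures]

print(results[0]["var1"])  # outputs accessed by index and name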
In an ExplicitComponent class, within the setup method, is there a way to give an output its value at declaration based on the value of an input created just before in that same setup method?
For instance to do something like :
class Discipline_1(ExplicitComponent):
    def setup(self):
        self.add_input('Input_1', val=5.0)
        self.add_output('Output_1', val=0.07*inputs['Input_1'])
The idea is that, since the 'NonlinearBlockGS' solver in a cycle uses the 'val' information to initialize its fixed-point iteration, I would like to give an appropriate initial value to minimize the number of 'NonlinearBlockGS' iterations.
If you really wanted to initialize the output based on the input value, just define the input value as a local variable:
class Discipline_1(ExplicitComponent):
    def setup(self):
        inp_val = 5.0
        self.add_input('Input_1', val=inp_val)
        self.add_output('Output_1', val=0.07*inp_val)
That only works if you have a fixed value known at setup time. You could pass that value into the class constructor via metadata to make it a little more general. However, this may not be the most general approach.
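For instance, a sketch of that constructor-metadata route (in recent OpenMDAO releases the mechanism is called options; the default of 5.0 is taken from the question, the rest is illustrative):

from openmdao.api import ExplicitComponent

class Discipline_1(ExplicitComponent):
    def initialize(self):
        # declare a constructor argument with a default value
        self.options.declare('inp_val', default=5.0)

    def setup(self):
        inp_val = self.options['inp_val']
        self.add_input('Input_1', val=inp_val)
        self.add_output('Output_1', val=0.07 * inp_val)

# e.g. model.add_subsystem('d1', Discipline_1(inp_val=10.0))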
Your group has a certain run sequence, which you could set manually if you liked. The Gauss-Seidel solver will execute the loop in that sequence, so you only need to initialize the value that is the input to the first component in the loop. All the others will get their values passed to them by their upstream component source as the GS solver runs. This initial value can be set manually, after setup is called, using the problem interface:
prob['Input_1'] = 5.0
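For context, a minimal sketch of that sequence (the problem and model construction here are assumed, not given in the answer):

from openmdao.api import Problem, Group, NonlinearBlockGS

prob = Problem(model=Group())
prob.model.add_subsystem('d1', Discipline_1(), promotes=['*'])
# ... the other components of the cycle would be added here ...
prob.model.nonlinear_solver = NonlinearBlockGS()
prob.setup()
prob['Input_1'] = 5.0  # seed the first input in the loop before running
prob.run_model()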
As a side note, in OpenMDAO 2.0.2 we have a guess_nonlinear method defined on ImplicitComponents that you can use to initialize implicit output variables, but that is not applicable here.
While defining a function to be later used as a python_callable, why is 'ds' included as the first arg of the function?
For example:
def python_func(ds, **kwargs):
    pass
I looked into the Airflow documentation, but could not find any explanation.
This is related to the provide_context=True parameter. As per the Airflow documentation:
if set to true, Airflow will pass a set of keyword arguments that can be used in your function. This set of kwargs correspond exactly to what you can use in your jinja templates. For this to work, you need to define **kwargs in your function header.
ds is one of these keyword arguments and represents the execution date in the format "YYYY-MM-DD". For parameters that are marked as (templated) in the documentation, you can use the '{{ ds }}' default variable to pass the execution date. You can read more about default variables here:
https://pythonhosted.org/airflow/code.html?highlight=pythonoperator#default-variables (obsolete)
https://airflow.incubator.apache.org/concepts.html?highlight=python_callable
PythonOperator doesn't have templated parameters, so doing something like
python_callable=print_execution_date('{{ ds }}')
won't work. To print the execution date inside the callable function of your PythonOperator, you will have to do it as:
def print_execution_date(ds, **kwargs):
    print(ds)
or
def print_execution_date(**kwargs):
    print(kwargs.get('ds'))
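For context, a minimal sketch of wiring such a callable into a DAG (Airflow 1.x import paths to match the docs linked above; the DAG name, dates, and schedule are illustrative):

from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def print_execution_date(ds, **kwargs):
    print(ds)  # e.g. "2018-01-01"

dag = DAG('example_dag', start_date=datetime(2018, 1, 1),
          schedule_interval='@daily')

print_task = PythonOperator(
    task_id='print_execution_date',
    python_callable=print_execution_date,
    provide_context=True,  # pass ds and the other template variables as kwargs
    dag=dag,
)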
Hope this helps.
I am trying to evaluate a field in my report, but it fails every time:
= IIf(Fields!lectcrs_hrs.IsMissing,
      Round(Fields!lectcrs_fee.Value * "1.00", 2),
      Round(Fields!lectcrs_fee.Value * Fields!lectcrs_hrs.Value, 2))
In the case where Fields!lectcrs_hrs.IsMissing = true, my field comes out empty, and I find that the reason is that the second branch, Round(Fields!lectcrs_fee.Value * Fields!lectcrs_hrs.Value, 2), references the missing field Fields!lectcrs_hrs. Why does it evaluate the second branch when the first one already matches?
How can I fix this problem?
The behavior you are looking for is called "short-circuiting" and, unfortunately, the IIf function in Visual Basic does not offer it. The reason is that IIf() is a ternary function and, as such, all arguments passed into it are evaluated before the function call occurs. A ternary operator (If() in VB 9+), on the other hand, does support conditional evaluation. However, I do not believe that the If() operator can be used as part of an expression in SSRS.
Given the fact that you are trying to use a field which may or may not exist at run time, I suggest that you create a custom function in your report to handle the proper checking and return values. For reference, take a look at this blog post, which covers the same scenario.