Global variables in Airflow

I am trying to implement a basic ETL job using Airflow, but I am stuck at one point:
I have 3 functions, and I want to define a global variable for each of them, like:

def a():
    return a_result

def b():
    # uses a_result
    return b_result

def c():
    # uses a_result and b_result
    ...

And then use these functions as python_callable.
Defining global a_result as usual does not work. Any solutions?

As I wrote in my comment: when you return something from your python_callable, you can access the returned value if you pass the task context to the next operator. See https://airflow.apache.org/concepts.html?highlight=xcom
The following is semi-pseudocode that illustrates the idea:

# inside a PythonOperator called 'pushing_task'
def push_function():
    return value  # the return value is pushed to XCom automatically

# inside another PythonOperator where provide_context=True
def pull_function(**context):
    value = context['task_instance'].xcom_pull(task_ids='pushing_task')

pushing_task = PythonOperator(
    task_id='pushing_task',
    python_callable=push_function, ...)

pulling_task = PythonOperator(
    task_id='pulling_task',
    python_callable=pull_function,
    provide_context=True, ...)
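
To make this concrete, here is a minimal sketch of a complete DAG wired up this way (assuming the Airflow 1.x-style API used above; the DAG id, dates, and returned value are illustrative):

from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def push_function():
    return 'a_result'  # pushed to XCom as the task's return value

def pull_function(**context):
    a_result = context['task_instance'].xcom_pull(task_ids='pushing_task')
    print(a_result)

with DAG('xcom_example', start_date=datetime(2019, 1, 1),
         schedule_interval=None) as dag:
    pushing_task = PythonOperator(
        task_id='pushing_task',
        python_callable=push_function)
    pulling_task = PythonOperator(
        task_id='pulling_task',
        python_callable=pull_function,
        provide_context=True)
    pushing_task >> pulling_task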

Related

How do I get information about function calls from a Lua script?

I have a script written in Lua 5.1 that imports a third-party module and calls some functions from it. I would like to get a list of the function calls made into the module, with their arguments (when they are known before execution).
So I need to write another script that takes the source code of my first script, parses it, and extracts that information from the code.
Consider a minimal example.
I have the following module:
local mod = {}

function mod.foo(a, ...)
    print(a, ...)
end

return mod
And the following driver code:
local M = require "mod"
M.foo('a', 1)
M.foo('b')
What is the best way to retrieve the occurrences of "uses" of the M.foo function?
Ideally, I would like to get the name of the function being called together with the values of its arguments. For the example code above, it would be enough to get a mapping like this: {'foo': [('a', 1), ('b')]}.
I'm not sure whether Lua has reflection facilities to retrieve this information, so I'll probably need to use one of the existing parsers for Lua to get the complete AST and find the function calls I'm interested in.
Any other suggestions?
If you cannot modify the files, you can read them into strings, then parse the mod file to find all the functions it defines, and use that information to scan the target file for all uses of the mod library:
functions = {}
for func in modFile:gmatch("function mod%.(%w+)") do
    functions[func] = {}
end

for func, call in targetFile:gmatch("M%.(%w+)%(([^%)]+)%)") do
    local args = {}
    for arg in string.gmatch(call, "([^,]+)") do
        table.insert(args, arg)
    end
    table.insert(functions[func], args)
end
The resulting table can then be serialized:

['foo'] = {{"'a'", " 1"}, {"'b'"}}
Three possible gotchas:
M is not a very unique name and could very possibly match unintended calls into another library.
This example does not handle a function call made inside the argument list, e.g. myfunc(getStuff(), true).
The resulting table does not know the types of the args, so they are all saved as string representations.
If modifying the target file is an option, you can create a wrapper around your required module:
function log(mod)
    local calls = {}
    local wrapper = {
        __index = function(_, k)
            if mod[k] then
                return function(...)
                    calls[k] = calls[k] or {}
                    table.insert(calls[k], {...})
                    return mod[k](...)
                end
            end
        end,
    }
    return setmetatable({}, wrapper), calls
end
You then use this function like so:
local M, calls = log(require("mod"))
M.foo('a', 1)
M.foo('b')
If your module is not just functions, you will need to handle that in the wrapper; as written, it assumes every index is a function.
After all your calls, you can serialize the calls table to get the history of every call made. For the example code, the table looks like:
{
    ['foo'] = {{'a', 1}, {'b'}}
}

Python Dunder method for printing all variables contained by a function? For custom debugging decorator

Is there a way to print all of a function's variables? I want to build a custom debugging decorator, and can't seem to find what I'm looking for. I'm assuming there is some dunder method for this? So for a function:
import functools

def debugger(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(func.__funcVariables__)  # some dunder method that prints all variables contained in func
        return func
    return wrapper

@debugger
def my_func():
    x = 'foo'
    y = 'bar'
I would want 'foo' and 'bar' printed to the console from the decorator. How can I achieve this?
It sounds like you're looking for function.__code__.co_varnames, which is a tuple of the names of the function's arguments and local variables. This is documented with the rest of the code introspection tools in the documentation for the inspect module:
import functools

def debugger(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(func.__code__.co_varnames)  # prints the names, e.g. ('x', 'y')
        return func
    return wrapper

@debugger
def my_func():
    x = 'foo'
    y = 'bar'
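
Note that co_varnames gives you only the names. If you also want the values ('foo' and 'bar'), one possible approach (a sketch, not part of the original answer) is to capture the function's locals with a trace hook:

import functools
import sys

def debugger(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        captured = {}
        def tracer(frame, event, arg):
            # record the locals of func's own frame just before it returns
            if event == 'return' and frame.f_code is func.__code__:
                captured.update(frame.f_locals)
            return tracer
        sys.settrace(tracer)
        try:
            result = func(*args, **kwargs)
        finally:
            sys.settrace(None)
        print(captured)  # e.g. {'x': 'foo', 'y': 'bar'}
        return result
    return wrapper

@debugger
def my_func():
    x = 'foo'
    y = 'bar'

my_func()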

accumulator in pyspark with dict as global variable

Just for learning purposes, I tried to set a dictionary as a global variable using an accumulator. The add function works well, but when I run the code and update the dictionary inside the map function, it always comes back empty.
But similar code that sets a list as a global variable works.
class DictParam(AccumulatorParam):
    def zero(self, value=""):
        return dict()

    def addInPlace(self, acc1, acc2):
        acc1.update(acc2)

if __name__ == "__main__":
    sc, sqlContext = init_spark("generate_score_summary", 40)
    rdd = sc.textFile('input')
    # print(rdd.take(5))
    dict1 = sc.accumulator({}, DictParam())

    def file_read(line):
        global dict1
        ls = re.split(',', line)
        dict1 += {ls[0]: ls[1]}
        return line

    rdd = rdd.map(lambda x: file_read(x)).cache()
    print(dict1)
For anyone who arrives at this thread looking for a dict accumulator for PySpark: the accepted solution does not solve the posed problem. The issue is actually in the DictParam defined above: it does not update the original dictionary. This works:
class DictParam(AccumulatorParam):
    def zero(self, value=""):
        return dict()

    def addInPlace(self, value1, value2):
        value1.update(value2)
        return value1
The original code was missing the return value.
I believe that print(dict1) simply gets executed before rdd.map() does.
In Spark, there are 2 types of operations:

transformations, which describe the future computation,
and actions, which actually trigger the execution.
Accumulators are updated only when some action is executed:
Accumulators do not change the lazy evaluation model of Spark. If they
are being updated within an operation on an RDD, their value is only
updated once that RDD is computed as part of an action.
If you check out the end of this section of the docs, there is an example exactly like yours:
accum = sc.accumulator(0)

def g(x):
    accum.add(x)
    return f(x)

data.map(g)
# Here, accum is still 0 because no actions have caused the `map` to be computed.
So you would need to add some action, for instance:
rdd = rdd.map(lambda x: file_read(x)).cache() # transformation
foo = rdd.count() # action
print(dict1)
Please make sure to check on the details of various RDD functions and accumulator peculiarities because this might affect the correctness of your result. (For instance, rdd.take(n) will by default only scan one partition, not the entire dataset.)
For accumulator updates performed inside actions only, their value is
only updated once that RDD is computed as part of an action
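
Putting both fixes together (the corrected DictParam plus an action to force evaluation), here is a minimal end-to-end sketch; it assumes a local SparkContext, and the sample data is illustrative:

from pyspark import SparkContext
from pyspark.accumulators import AccumulatorParam

class DictParam(AccumulatorParam):
    def zero(self, value=None):
        return {}

    def addInPlace(self, acc1, acc2):
        acc1.update(acc2)
        return acc1  # the return that the original code was missing

sc = SparkContext("local", "dict_accumulator_demo")
dict1 = sc.accumulator({}, DictParam())

def file_read(line):
    global dict1
    k, v = line.split(',', 1)
    dict1 += {k: v}
    return line

rdd = sc.parallelize(["a,1", "b,2"]).map(file_read)  # transformation only
rdd.count()          # action: actually runs the map and updates the accumulator
print(dict1.value)   # {'a': '1', 'b': '2'}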

Standard name for a function that modifies a function to ignore an argument

I'm using Python because it's generally easy to read, but this is not a Python-specific question.
Take the following Python function strip_argument:
def strip_argument(func_with_no_args):
    return lambda unused: func_with_no_args()
In use, I can pass a no-argument function to strip_argument, and it will return a function that accepts one argument that is never used. For example:
# some API I want to use
def set_click_event_listener(listener):
    """Args:
        listener: function which will be passed the view that was clicked.
    """
    # ...implementation...

# my code
def my_click_listener():
    # I don't care about the view, so I don't want to make that an arg.
    print("some view was clicked")

set_click_event_listener(strip_argument(my_click_listener))
Is there a standard name for the function strip_argument? I'm interested in any languages that have a function like this in the standard library.
Most functional programming languages offer a const function, which returns its first argument and ignores its second. Partially applying it as const f therefore gives a function that ignores its own argument and evaluates to f, which is exactly the behavior you described.
In Haskell you can use it like this:

f x = x + 1
g = const f
g 2 3 == 4  -- 2 is ignored and 3 is incremented
I did a quick search for such a function in Python but did not find anything; the standard approach seems to be a lambda, as you did.
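
If you want a reusable helper in Python, one option (a sketch; ignore_args is a made-up name, not a standard-library function) is a small wrapper that discards whatever arguments it receives:

def ignore_args(func):
    """Return a function that discards all arguments and calls func()."""
    return lambda *args, **kwargs: func()

def my_click_listener():
    print("some view was clicked")

listener = ignore_args(my_click_listener)
listener("the view")  # prints "some view was clicked"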

Converting a bunch of nested recursive functions to iterative funcs

I have read that it is in principle possible to convert a recursive function to an iterative one. I have a bunch of functions calling each other. I built the structure of the code from my flowchart, and a recursive style seemed the obvious fit. It runs fine for small problem sizes but gives a segmentation fault at larger scale, so I am trying to switch to an iterative style, but the branching structure confuses me. Can someone give me a clue how to handle it? The code is something like this in Python:
def main_function(parameters):
    if condition0:
        ....
        if condition1:
            ....
            if condition2:
                ....
                return function1(parameters)
            else:
                ....
                return function2(parameters)
        else:
            return function1(parameters)
    else:
        return function2(parameters)
#############################################
def function1(parameters):
    if condition3:
        ...
        return function3(parameters)  # yet another function... so messed up? :-(((
    else:
        return main_function(parameters)
##############################################
def function2(parameters):
    if condition4:
        ...
        return main_function(parameters)
    else:
        return function1(parameters)
###############################################
def function3(parameters):
    if condition5:
        if condition6:
            ...
            return function3(parameters)
        else:
            ...
            return main_function(parameters)
    else:
        return RESULTS  # the only way out!
Any idea would be greatly appreciated, thank you very much in advance.
Since every return statement that you've shown is essentially return some_other_function(...), a state machine would be a natural way to model this: there would be a state corresponding to each function, and the return statements would become state transitions.
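
For illustration, here is a minimal runnable sketch of that idea (the states and conditions are made up, not the asker's actual logic): each function becomes a named state, each tail call becomes a transition, and the whole computation runs in one loop with no call-stack growth.

def run(n):
    state = 'main'
    while True:
        if state == 'main':
            state = 'done' if n <= 0 else 'decrement'
        elif state == 'decrement':
            n -= 1
            state = 'main'
        elif state == 'done':
            return n  # the only way out

print(run(10))  # 0, computed iteratively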
Since every recursive call happens in a return statement, you don't need to keep the old stack frame around: for example, when function1() ends with return function3(...), function1's frame can be discarded. This way you won't get RuntimeError: maximum recursion depth exceeded.
You can achieve this by returning the next function to call together with its parameters, instead of calling it recursively:
def main_function(parameters):
    if condition0:
        if condition1:
            if condition2:
                return function1, parameters  # return the function to call next, with its arguments
            else:
                ....
                return function2, parameters
        else:
            return function1, parameters
    else:
        return function2, parameters
You should change the other functions in a similar way. Now you can call main_function() as follows:
next_function, next_fun_param = main_function(parameters)
while hasattr(next_function, '__call__'):
    next_function, next_fun_param = next_function(next_fun_param)
# got the RESULT (the terminating branch must return something non-callable)
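
As a self-contained toy version of the same trampoline pattern (the functions and conditions here are made up for illustration):

def main_function(n):
    if n <= 0:
        return finish, n          # transition to the terminating function
    return decrement, n

def decrement(n):
    return main_function, n - 1   # "tail call" expressed as data

def finish(n):
    return ('RESULT', n)          # non-callable first element ends the loop

next_function, param = main_function(5)
while callable(next_function):
    next_function, param = next_function(param)
print(next_function, param)  # RESULT 0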
