Airflow 2 loosely coupling @task return values to receiving @task?

I'm trying to write two tasks that have no knowledge of the other. One task returns a dict (via XComArg) and I want to pass a single property of that object to the next task. If I pass the entire XComArg object, its value is populated as expected. But selecting a single property results in a None.
@dag(...)
def _dag():
    @task
    def A(**ctx):
        # ...
        return {'a': 42, 'b': 'B', 'c': 'C'}

    @task
    def B(a, _res, **ctx):
        print('A', a)       # >>> A None
        print('RES', _res)  # >>> RES {'a': 42, ...}

    res = A()
    B(res['a'], res)

dag = _dag()
Ideally, B doesn't know where the value for a comes from, nor how to get it. Yes, passing all of res and having B extract what it needs with _res['a'] works, but my goal is loose coupling.

See the example in https://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html
You need to specify multiple_outputs=True on task A.
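A minimal sketch of that fix against the question's code (only the decorator call changes):

@task(multiple_outputs=True)
def A(**ctx):
    return {'a': 42, 'b': 'B', 'c': 'C'}

# Each key of the returned dict is now pushed as its own XCom,
# so res['a'] resolves to 42 instead of None:
res = A()
B(res['a'], res)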


where do we use the methods __str__ and __repr__ in python3? [duplicate]

This question already has answers here: What is the difference between __str__ and __repr__? (28 answers). Closed 2 years ago.
I really don't understand where __str__ and __repr__ are used in Python. I mean, I get that __str__ returns the string representation of an object. But why would I need that? In what use-case scenario? Also, I read about the usage of __repr__. But what I don't understand is: where would I use them?
__repr__
Called by the repr() built-in function and by string conversions (reverse quotes) to compute the "official" string representation of an object. If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment).
__str__
Called by the str() built-in function and by the print statement to compute the "informal" string representation of an object.
Use __str__ if you have a class and you want an informative/informal output whenever you use the object as part of a string. E.g., you can define __str__ methods for Django models, which then get rendered in the Django administration interface. Instead of something like <Model object> you'll get the first and last name of a person, the name and date of an event, etc.
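For instance, a minimal sketch of that pattern in plain Python (no Django required; the class and fields are made up for illustration):

class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

    def __str__(self):
        # used wherever the object is rendered as text,
        # e.g. print(person) or "%s" % person
        return "%s %s" % (self.first, self.last)

print(Person("Ada", "Lovelace"))  # Ada Lovelace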
__repr__ and __str__ are similar, in fact sometimes equal (example from the BaseSet class in sets.py from the standard library):

def __repr__(self):
    """Return string representation of a set.

    This looks like 'Set([<list of elements>])'.
    """
    return self._repr()

# __str__ is the same as __repr__
__str__ = __repr__
The one place where you use them both a lot is in an interactive session. If you print an object then its __str__ method will get called, whereas if you just use an object by itself then its __repr__ is shown:
>>> from decimal import Decimal
>>> a = Decimal(1.25)
>>> print(a)
1.25 <---- this is from __str__
>>> a
Decimal('1.25') <---- this is from __repr__
The __str__ is intended to be as human-readable as possible, whereas the __repr__ should aim to be something that could be used to recreate the object, although it often won't be exactly how it was created, as in this case.
It's also not unusual for both __str__ and __repr__ to return the same value (certainly for built-in types).
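For instance, a quick check in the interpreter (ints agree, strings don't, since repr adds quotes):

>>> str(42), repr(42)
('42', '42')
>>> str('hi'), repr('hi')
('hi', "'hi'")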
Building on the previous answers and showing some more examples. If used properly, the difference between str and repr is clear. In short, repr should return a string that can be copy-pasted to rebuild the exact state of the object, whereas str is useful for logging and observing debugging results. Here are some examples showing the different outputs for some well-known libraries.
Datetime

import datetime

print(repr(datetime.datetime.now()))  # datetime.datetime(2017, 12, 12, 18, 49, 27, 134411)
print(str(datetime.datetime.now()))   # 2017-12-12 18:49:27.134452

The str is good to print into a log file, whereas the repr can be re-purposed if you want to run it directly or dump it as commands into a file:

x = datetime.datetime(2017, 12, 12, 18, 49, 27, 134411)
Numpy

import numpy as np

print(repr(np.array([1, 2, 3, 4, 5])))  # array([1, 2, 3, 4, 5])
print(str(np.array([1, 2, 3, 4, 5])))   # [1 2 3 4 5]

In NumPy, the repr is again directly consumable.
Custom Vector3 example

class Vector3(object):
    def __init__(self, args):
        self.x = args[0]
        self.y = args[1]
        self.z = args[2]

    def __str__(self):
        return "x: {0}, y: {1}, z: {2}".format(self.x, self.y, self.z)

    def __repr__(self):
        return "Vector3([{0}, {1}, {2}])".format(self.x, self.y, self.z)

In this example, repr again returns a string that can be directly consumed/executed, whereas str is more useful as debug output.

v = Vector3([1, 2, 3])
print(str(v))   # x: 1, y: 2, z: 3
print(repr(v))  # Vector3([1, 2, 3])
One thing to keep in mind: if __str__ isn't defined but __repr__ is, str() will automatically fall back to __repr__. So it's always good to at least define __repr__.
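A tiny sketch of that fallback:

class OnlyRepr:
    def __repr__(self):
        return "OnlyRepr()"

print(OnlyRepr())  # OnlyRepr()  (no __str__, so str() fell back to __repr__)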
Grasshopper, when in doubt go to the mountain and read the Ancient Texts. In them you will find that __repr__() should:
If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value.
Let's have a class without a __str__ method.
class Employee:
    def __init__(self, first, last, pay):
        self.first = first
        self.last = last
        self.pay = pay

emp1 = Employee('Ivan', 'Smith', 90000)
print(emp1)
When we print this instance of the class, emp1, this is what we get:
<__main__.Employee object at 0x7ff6fc0a0e48>
This is not very helpful, and it is certainly not what we want printed if we are using it for display (like in HTML).
So now, the same class, but with a __str__ method:

class Employee:
    def __init__(self, first, last, pay):
        self.first = first
        self.last = last
        self.pay = pay

    def __str__(self):
        # you can edit this and use any attributes of the class
        return f"The employee {self.first} {self.last} earns {self.pay}."

emp2 = Employee('John', 'Williams', 90000)
print(emp2)
Now instead of printing that there is an object, we get what we specified in the return of the __str__ method:

The employee John Williams earns 90000.
str gives an informal, readable format, whereas repr gives the official object representation.
class Complex:
    # Constructor
    def __init__(self, real, imag):
        self.real = real
        self.imag = imag

    # "official" string representation of an object
    def __repr__(self):
        return 'Complex(%s, %s)' % (self.real, self.imag)

    # "informal" (readable) string representation of an object
    def __str__(self):
        return '%s + i%s' % (self.real, self.imag)

t = Complex(10, 20)
print(t)        # the usual way we print an object; uses __str__
print(str(t))   # the str representation
print(repr(t))  # the repr representation

Answers:

10 + i20         # usual (print) representation, via __str__
10 + i20         # str representation
Complex(10, 20)  # repr representation
str and repr are both ways to represent an object; you can define them when you are writing a class.

class Fraction:
    def __init__(self, n, d):
        self.n = n
        self.d = d

    def __repr__(self):
        return "{}/{}".format(self.n, self.d)

For example, when I print an instance of it:

print(Fraction(1, 2))

results in

1/2
while

class Fraction:
    def __init__(self, n, d):
        self.n = n
        self.d = d

    def __str__(self):
        return "{}/{}".format(self.n, self.d)

print(Fraction(1, 2))

also results in

1/2
But what if you write both of them: which one does Python use?

class Fraction:
    def __init__(self, n, d):
        self.n = n
        self.d = d

    def __str__(self):
        return "str"

    def __repr__(self):
        return "repr"

print(Fraction(None, None))

This results in

str

So when both are defined, print actually uses the __str__ method, not the __repr__ method.
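For completeness, repr() and the bare interactive display still pick __repr__ (same Fraction class as above):

>>> f = Fraction(None, None)
>>> print(f)
str
>>> repr(f)
'repr'
>>> f
repr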
Suppose you have a class and wish to inspect an instance; you see that print doesn't give much useful information:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

p1 = Person("John", 36)
print(p1)  # <__main__.Person object at 0x7f9060250410>
Now see a class with a __str__: it shows the instance information, and with __repr__ you don't even need the print. Nice, no?
class Animal:
    def __init__(self, color, age, breed):
        self.color = color
        self.age = age
        self.breed = breed

    def __str__(self):
        return f"{self.color} {self.breed} of age {self.age}"

    def __repr__(self):
        return f"repr : {self.color} {self.breed} of age {self.age}"

a1 = Animal("Red", 36, "Dog")
a1         # repr : Red Dog of age 36
print(a1)  # Red Dog of age 36

Have a nested dictionary in Python, and would like to find an efficient way to append to it

I am looking to find the best Pythonic way to do this.
The nested dictionary looks something like this (main script):

my_dict = {'test': {
    'test_a': 'true',
    'test_b': 'true',
}}
I am importing a module that has functions that return numeric values.
I am looking for a way to append to the my_dict dictionary from the dictionary returned by the module.
I.e., functions from the module:

def testResults1():
    results = 3129282
    return results

def testResults2():
    results = 33920230
    return results

def combineResults():
    ...

I would like combineResults to combine the results and return a dictionary. The dictionary returned is:

# Looking for best way to do this.
test_results = {'testresults1': 3129282,
                'testresults2': 33920230}

I then want to append the test_results dictionary to my_dict.
Looking for the best way to do this as well.
Thank you in advance!
Are you looking for the dict.update() method?
>>> d = {'a': 1, 'b': 2}
>>> d2 = {'c': 3}
>>> d.update(d2)
>>> d
{'a': 1, 'b': 2, 'c': 3}
my_dict = {}

def testResults1():
    results = 3129282
    return results

def testResults2():
    results = 33920230
    return results

def combineResults():
    suite = [testResults1, testResults2]
    return dict((test.__name__, test()) for test in suite)

my_dict.update(combineResults())
print(my_dict)  # {'testResults1': 3129282, 'testResults2': 33920230}
import collections

my_dict = collections.defaultdict(dict)

def add_values(key, inner_dict):
    my_dict[key].update(inner_dict)

You can read about collections.defaultdict in the library docs.
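For example, a quick usage sketch of the add_values helper above (the key and inner dicts are made up):

add_values('test', {'test_c': 'true'})
add_values('test', {'test_d': 'true'})
print(my_dict['test'])  # {'test_c': 'true', 'test_d': 'true'}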

Pythonic way to iterate over a collections.Counter() instance in descending order?

In Python 2.7, I want to iterate over a collections.Counter instance in descending count order.
>>> import collections
>>> c = collections.Counter()
>>> c['a'] = 1
>>> c['b'] = 999
>>> c
Counter({'b': 999, 'a': 1})
>>> for x in c:
...     print x
...
a
b
In the example above, it appears that the elements are iterated in the order they were added to the Counter instance.
I'd like to iterate over the list from highest to lowest. I see that the string representation of Counter does this, just wondering if there's a recommended way to do it.
You can iterate over c.most_common() to get the items in the desired order. See also the documentation of Counter.most_common().
Example:
>>> c = collections.Counter(a=1, b=999)
>>> c.most_common()
[('b', 999), ('a', 1)]
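Iterating in descending count order is then just (shown with Python 3 print syntax; on 2.7, print elem, count works the same way):

for elem, count in c.most_common():
    print(elem, count)
# b 999
# a 1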
Here is an example of iterating over a Counter from the Python collections module:

>>> def counterIterator():
...     import collections
...     counter = collections.Counter()
...     counter.update(('u1', 'u1'))
...     counter.update(('u2', 'u2'))
...     counter.update(('u2', 'u1'))
...     for ele in counter:
...         print(ele, counter[ele])
...
>>> counterIterator()
u1 3
u2 3
Your problem of returning descending order was already solved, but in case someone else comes here from Google, here is how to work with the counts generically. Iterating as above yields the keys of the dictionary inside collections.Counter(); to get the values, you just pass the key back to the Counter:

for x in c:
    key = x
    value = c[key]
I had a more specific problem where I had word counts and wanted to filter out the low-frequency ones. The trick here is to iterate over a copy of the collections.Counter(), or you will get "RuntimeError: dictionary changed size during iteration" when you try to remove entries from the dictionary:

for word in words.copy():
    # remove low-frequency words
    if words[word] <= 3:
        del words[word]

Appending data to an AT Field using transmogrifier

I have a CSV file of data like this:
1, [a, b, c]
2, [a, b, d]
3, [a]
and some Plone objects which should be updated like this:
ID, LinesField
a, [1,2,3]
b, [1,2]
c, [1]
d, [2]
So, to clarify, the object with the id a is named on lines 1, 2 and 3 of the CSV, and thus the LinesField property of object a needs to have those line ids (the first number on the line) listed.
Ideally I'd like to use Transmogrifier to import this information (and avoid doing any manipulation in Excel beforehand). I can see, theoretically, two ways of doing this, but I can't work out how to do either in practice, so I'd be grateful for some pointers to examples. Either I need to transform the entire pipeline so that the items reflect the structure of my Plone objects and then use the ATSchemaUpdater blueprint, but I can't see any examples of how to add items to the pipeline (do I need to write my own blueprint?). Or, alternatively, I could loop through the items as they exist and append the value in the left column to the lists in the right. For that I need a way of appending values with ATSchemaUpdater rather than overwriting them; again, is there a blueprint for that anywhere?
Here's a few sample csv lines:
"Name","Themes"
"Bessie Brown","cah;cab;cac"
"Fred Blogs","cah;cac"
"Dinah Washington","cah;cab"
The Plone object will be a theme, and the lines field a list of names:

cah, ['Bessie Brown', 'Fred Blogs', etc.]
I'm not quite sure you want to read the CSV file using transmogrifier, but I think you can create a section to insert these values into the items in the pipeline, using a function like this:
def transpose(cvs):
    keys = []
    [keys.extend(v) for v in cvs.values()]
    keys = set(keys)
    d = {}
    for key in keys:
        values = [k for k, v in cvs.iteritems() if key in v]
        d[key] = values
    return d
In this context, cvs is {1: ['a', 'b', 'c'], 2: ['a', 'b', 'd'], 3: ['a']}; keys will contain all possible values set(['a', 'c', 'b', 'd']); and d will be what you want {'a': [1, 2, 3], 'c': [1], 'b': [1, 2], 'd': [2]}.
Probably there are better ways to do it, but I'm not a Python magician.
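As a quick sanity check of transpose (same data as in the explanation above):

cvs = {1: ['a', 'b', 'c'], 2: ['a', 'b', 'd'], 3: ['a']}
print(transpose(cvs))
# {'a': [1, 2, 3], 'b': [1, 2], 'c': [1], 'd': [2]}  (key order may vary)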
The insert section could look like this one:

from zope.interface import classProvides, implements
from collective.transmogrifier.interfaces import ISection, ISectionBlueprint

class Insert(object):
    """Insert new keys into items."""
    classProvides(ISectionBlueprint)
    implements(ISection)

    def __init__(self, transmogrifier, name, options, previous):
        self.previous = previous
        # cvs here would be the parsed CSV data fed to transpose() above
        self.new_keys = transpose(cvs)

    def __iter__(self):
        for item in self.previous:
            item.update(self.new_keys)
            yield item
After that you can use the SchemaUpdater section.
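For orientation, a pipeline wiring such a section in front of the schema updater could be configured roughly like this (a sketch: the csvsource section and the my.package.insert blueprint name are hypothetical; atschemaupdater is the blueprint provided by plone.app.transmogrifier):

[transmogrifier]
pipeline =
    csvsource
    insert
    schemaupdater

[insert]
blueprint = my.package.insert

[schemaupdater]
blueprint = plone.app.transmogrifier.atschemaupdater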

How can I use functools.partial on multiple methods on an object, and freeze parameters out of order?

I find functools.partial to be extremely useful, but I would like to be able to freeze arguments out of order (the argument you want to freeze is not always the first one), and I'd like to be able to apply it to several methods on a class at once, to make a proxy object that has the same methods as the underlying object except with some of its method parameters frozen (think of it as generalizing partial to apply to classes). And I'd prefer to do this without editing the original object, just like partial doesn't change its original function.
I've managed to scrape together a version of functools.partial called 'bind' that lets me specify parameters out of order by passing them as keyword arguments. That part works:
>>> def foo(x, y):
...     print x, y
...
>>> bar = bind(foo, y=3)
>>> bar(2)
2 3
But my proxy class does not work, and I'm not sure why:
>>> class Foo(object):
...     def bar(self, x, y):
...         print x, y
...
>>> a = Foo()
>>> b = PureProxy(a, bar=bind(Foo.bar, y=3))
>>> b.bar(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bar() takes exactly 3 arguments (2 given)
I'm probably doing this all sorts of wrong because I'm just going by what I've pieced together from random documentation, blogs, and running dir() on all the pieces. Suggestions both on how to make this work and better ways to implement it would be appreciated ;) One detail I'm unsure about is how this should all interact with descriptors. Code follows.
import inspect
from copy import copy
from types import MethodType

class PureProxy(object):
    def __init__(self, underlying, **substitutions):
        self.underlying = underlying
        for name in substitutions:
            subst_attr = substitutions[name]
            if hasattr(subst_attr, "underlying"):
                setattr(self, name, MethodType(subst_attr, self, PureProxy))

    def __getattribute__(self, name):
        return getattr(object.__getattribute__(self, "underlying"), name)

def bind(f, *args, **kwargs):
    """ Lets you freeze arguments of a function to certain values. Unlike
    functools.partial, you can freeze arguments by name, which has the bonus
    of letting you freeze them out of order. args will be treated just like
    partial, but kwargs will properly take into account if you are specifying
    a regular argument by name. """
    argspec = inspect.getargspec(f)
    argdict = copy(kwargs)
    if hasattr(f, "im_func"):
        f = f.im_func
    args_idx = 0
    for arg in argspec.args:
        if args_idx >= len(args):
            break
        argdict[arg] = args[args_idx]
        args_idx += 1
    num_plugged = args_idx

    def new_func(*inner_args, **inner_kwargs):
        args_idx = 0
        for arg in argspec.args[num_plugged:]:
            if arg in argdict:
                continue
            if args_idx >= len(inner_args):
                # We can't raise an error here because some remaining arguments
                # may have been passed in by keyword.
                break
            argdict[arg] = inner_args[args_idx]
            args_idx += 1
        f(**dict(argdict, **inner_kwargs))

    new_func.underlying = f
    return new_func
Update: In case anyone can benefit, here's the final implementation I went with:

import inspect
from copy import copy
from types import MethodType

class PureProxy(object):
    """ Intended usage:
    >>> class Foo(object):
    ...     def bar(self, x, y):
    ...         print x, y
    ...
    >>> a = Foo()
    >>> b = PureProxy(a, bar=FreezeArgs(y=3))
    >>> b.bar(1)
    1 3
    """
    def __init__(self, underlying, **substitutions):
        self.underlying = underlying
        for name in substitutions:
            subst_attr = substitutions[name]
            if isinstance(subst_attr, FreezeArgs):
                underlying_func = getattr(underlying, name)
                new_method_func = bind(underlying_func, *subst_attr.args, **subst_attr.kwargs)
                setattr(self, name, MethodType(new_method_func, self, PureProxy))

    def __getattr__(self, name):
        return getattr(self.underlying, name)

class FreezeArgs(object):
    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

def bind(f, *args, **kwargs):
    """ Lets you freeze arguments of a function to certain values. Unlike
    functools.partial, you can freeze arguments by name, which has the bonus
    of letting you freeze them out of order. args will be treated just like
    partial, but kwargs will properly take into account if you are specifying
    a regular argument by name. """
    argspec = inspect.getargspec(f)
    argdict = copy(kwargs)
    if hasattr(f, "im_func"):
        f = f.im_func
    args_idx = 0
    for arg in argspec.args:
        if args_idx >= len(args):
            break
        argdict[arg] = args[args_idx]
        args_idx += 1
    num_plugged = args_idx

    def new_func(*inner_args, **inner_kwargs):
        args_idx = 0
        for arg in argspec.args[num_plugged:]:
            if arg in argdict:
                continue
            if args_idx >= len(inner_args):
                # We can't raise an error here because some remaining arguments
                # may have been passed in by keyword.
                break
            argdict[arg] = inner_args[args_idx]
            args_idx += 1
        # return the underlying call's result
        return f(**dict(argdict, **inner_kwargs))

    return new_func
You're "binding too deep": change def __getattribute__(self, name): to def __getattr__(self, name): in class PureProxy. __getattribute__ intercepts every attribute access and so bypasses everything that you've set with setattr(self, name, ... making those setattr bereft of any effect, which obviously's not what you want; __getattr__ is called only for access to attributes not otherwise defined so those setattr calls become "operative" & useful.
In the body of that override, you can and should also change object.__getattribute__(self, "underlying") to self.underlying (since you're not overriding __getattribute__ any more). There are other changes I'd suggest (enumerate in lieu of the low-level logic you're using for counters, etc) but they wouldn't change the semantics.
With the change I suggest, your sample code works (you'll have to keep testing with more subtle cases of course). BTW, the way I debugged this was simply to stick in print statements in the appropriate places (a jurassic=era approach but still my favorite;-).
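Concretely, the suggested fix to the question's PureProxy is just this (a sketch; everything else stays the same):

def __getattr__(self, name):
    # __getattr__ (unlike __getattribute__) is consulted only when normal
    # attribute lookup fails, so the methods installed with setattr() in
    # __init__ now take precedence over the underlying object's attributes.
    return getattr(self.underlying, name)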
