I'm using pytest.mark to give my tests kwargs. However, if I use the same mark on both the class and a test within the class, the class's mark overrides the mark on the function when the same kwargs are used for both.
import pytest

animal = pytest.mark.animal


@animal(species='croc')  # Mark the class with a kwarg
class TestClass(object):

    @animal(species='hippo')  # Mark the function with new kwarg
    def test_function(self):
        pass


@pytest.fixture(autouse=True)  # Use a fixture to inspect my function
def animal_inspector(request):
    print(request.function.animal.kwargs)  # Show how the function object got marked
    # prints {'species': 'croc'} but the function was marked with 'hippo'
Where'd my hippo go and how can I get him back?
There are, unfortunately, various pytest bugs related to this; I'm guessing you're running into one of them. The ones I found involve subclassing, though, which you don't do here.
So I've been digging around in the pytest code and figured out why this is happening. The marks on the functions are applied to the function at import time, but the class and module level marks don't get applied at the function level until test collection. Function marks happen first and add their kwargs to the function. Then class marks overwrite any kwargs with the same keys, and module marks in turn overwrite any matching kwargs.
My solution was simply to create my own modified MarkDecorator that filters kwargs before they are added to the marks. Basically, whichever kwarg value gets set first (which always seems to be the one from the function decorator) stays the value on the mark. Ideally I think this functionality belongs in the MarkInfo class, but since my code wasn't creating instances of that, I went with what I was creating instances of: MarkDecorator. Note that I only changed two lines of the source code (the bits about keys_to_add).
from _pytest.mark import istestfunc, MarkInfo
import inspect


class TestMarker(object):  # Modified MarkDecorator class
    def __init__(self, name, args=None, kwargs=None):
        self.name = name
        self.args = args or ()
        self.kwargs = kwargs or {}

    @property
    def markname(self):
        return self.name  # for backward-compat (2.4.1 had this attr)

    def __repr__(self):
        d = self.__dict__.copy()
        name = d.pop('name')
        return "<MarkDecorator %r %r>" % (name, d)

    def __call__(self, *args, **kwargs):
        """ if passed a single callable argument: decorate it with mark info.
            otherwise add *args/**kwargs in-place to mark information. """
        if args and not kwargs:
            func = args[0]
            is_class = inspect.isclass(func)
            if len(args) == 1 and (istestfunc(func) or is_class):
                if is_class:
                    if hasattr(func, 'pytestmark'):
                        mark_list = func.pytestmark
                        if not isinstance(mark_list, list):
                            mark_list = [mark_list]
                        mark_list = mark_list + [self]
                        func.pytestmark = mark_list
                    else:
                        func.pytestmark = [self]
                else:
                    holder = getattr(func, self.name, None)
                    if holder is None:
                        holder = MarkInfo(
                            self.name, self.args, self.kwargs
                        )
                        setattr(func, self.name, holder)
                    else:
                        # Don't set kwargs that already exist on the mark
                        keys_to_add = {key: value for key, value in self.kwargs.items() if key not in holder.kwargs}
                        holder.add(self.args, keys_to_add)
                return func
        kw = self.kwargs.copy()
        kw.update(kwargs)
        args = self.args + args
        return self.__class__(self.name, args=args, kwargs=kw)
# Create my Mark instance. Note my modified mark class must be imported to be used
animal = TestMarker(name='animal')
# Apply it to class and function
@animal(species='croc')  # Mark the class with a kwarg
class TestClass(object):

    @animal(species='hippo')  # Mark the function with new kwarg
    def test_function(self):
        pass

# Now prints {'species': 'hippo'} Yay!
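To make the ordering concrete, here is a plain-dict sketch of the two merge policies. It only models the behavior described above (function mark applied first, then class, then module); apply_marks is a hypothetical helper, not pytest's API. The first_wins branch uses the same filter as keys_to_add.

```python
def apply_marks(marks, first_wins):
    """Merge mark kwargs in the order they are applied: function, class, module."""
    kwargs = {}
    for mark_kwargs in marks:
        if first_wins:
            # Like keys_to_add: only add keys that aren't already present.
            kwargs.update({k: v for k, v in mark_kwargs.items() if k not in kwargs})
        else:
            # Plain update: later (outer) marks overwrite earlier ones.
            kwargs.update(mark_kwargs)
    return kwargs


order = [
    {'species': 'hippo'},                     # function mark (applied first)
    {'species': 'croc', 'habitat': 'river'},  # class mark
    {'species': 'gator'},                     # module mark (applied last)
]
print(apply_marks(order, first_wins=False))  # {'species': 'gator', 'habitat': 'river'}
print(apply_marks(order, first_wins=True))   # {'species': 'hippo', 'habitat': 'river'}
```

Keys that only the class or module sets (habitat here) still come through under either policy; only conflicting keys are affected.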
Is there a way to print all of a function's variables? I want to build a custom debugging decorator, and can't seem to find what I'm looking for. I'm assuming there is some dunder method for this? So for a function:
import functools


def debugger(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(func.__funcVariables__)  # Some dunder method that prints all variables contained in func
        return func
    return wrapper


@debugger
def my_func():
    x = 'foo'
    y = 'bar'
I would want 'foo' and 'bar' printed to the console from the decorator. How can I achieve this?
It sounds like you're looking for function.__code__.co_varnames, which is a tuple of the names of the function's arguments and local variables. This is documented with the rest of the code introspection tools in the documentation for the inspect module.
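import functools


def debugger(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(func.__code__.co_varnames)  # names of arguments and locals, e.g. ('x', 'y')
        return func(*args, **kwargs)  # call the wrapped function and pass its result through
    return wrapper


@debugger
def my_func():
    x = 'foo'
    y = 'bar'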
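co_varnames only gives the names; the values don't exist until the function actually runs. If you also want the values ('foo' and 'bar'), one option is to install a temporary profiler with sys.setprofile and copy frame.f_locals when the wrapped function's frame returns. This is a different technique from the answer above and a sketch rather than a polished tool: it relies on CPython's profiling hook, and last_locals is a hypothetical attribute added only so the values can be inspected afterwards.

```python
import functools
import sys


def debugger(func):
    """Print func's local variable names and, after each call, their final values."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Names come from the code object and are known before the call.
        print(func.__code__.co_varnames)

        captured = {}

        def profiler(frame, event, arg):
            # A 'return' event fires as a frame exits; at that point
            # frame.f_locals still holds the final values of its locals.
            if event == 'return' and frame.f_code is func.__code__:
                captured.update(frame.f_locals)

        previous = sys.getprofile()
        sys.setprofile(profiler)
        try:
            result = func(*args, **kwargs)
        finally:
            sys.setprofile(previous)  # always restore whatever profiler was active
        print(captured)
        wrapper.last_locals = captured  # hypothetical attribute for later inspection
        return result
    return wrapper


@debugger
def my_func():
    x = 'foo'
    y = 'bar'


my_func()
# ('x', 'y')
# {'x': 'foo', 'y': 'bar'}
```

Profiling hooks slow every call while they are installed, so this is best kept to debugging sessions.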
During development of a new app based on Django, I noticed what looks like memory corruption. I have two functions that use this class:
class ConfigMap():
    data = list()

    def add(self, entry: CusDeployPhone):
        for row in self.data:
            if row.phone_var.varid == entry.phone_var.varid:
                return self
        self.data.append(entry)
        return self

    def get(self):
        return self.data
Function #1
def gen_config_model(request, deploy_phone):
    deploy_phone_general = CusDeployPhone.objects.filter(phone_model=7)
    config_list_model = ConfigMap()
    for entry in deploy_phone:
        config_list_model.add(entry)
    for entry in deploy_phone_general:
        config_list_model.add(entry)
Function #2
def gen_config_endpoint(request):
    config_list_endpoint = ConfigMap()
    for entry in deploy_model:
        config_list_endpoint.add(entry)
    for entry in deploy_phone_general:
        config_list_endpoint.add(entry)
Both functions return the data in the list. I noticed that after calling the endpoint view, I also see its data when loading the model view!
Somehow the class gets corrupted or merged with the existing one. Why?
The variable is local to the function.
I know there are issues with lists (references / pointers) but why in this case?
Issue solved:
Python Stack Corruption?
The children variable was declared as a class-level variable so it is
shared amongst all instances of your Nodes. You need to declare it an
instance variable by setting it in the initializer.
Move the declaration from the class body into the initializer:
def __init__(self):
    self.children = []
    ...
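A minimal sketch contrasting the two declarations for ConfigMap. make_entry is a hypothetical stand-in for a CusDeployPhone row (only phone_var.varid is used), so this runs without Django. Because self.data falls back to the class attribute when the instance has none of its own, every SharedConfigMap appends to the same list; in a long-running Django worker process that list typically survives from one request to the next, which matches the symptom described above.

```python
from types import SimpleNamespace


def make_entry(varid):
    # Hypothetical stand-in for a CusDeployPhone row.
    return SimpleNamespace(phone_var=SimpleNamespace(varid=varid))


class SharedConfigMap:
    data = list()  # class attribute: one list shared by every instance (the bug)

    def add(self, entry):
        self.data.append(entry)
        return self


class ConfigMap:
    def __init__(self):
        self.data = []  # instance attribute: a fresh list for each ConfigMap

    def add(self, entry):
        # Skip entries whose phone_var.varid is already present.
        for row in self.data:
            if row.phone_var.varid == entry.phone_var.varid:
                return self
        self.data.append(entry)
        return self

    def get(self):
        return self.data


# Bug: a brand-new instance already "sees" another instance's entry.
SharedConfigMap().add(make_entry(1))
print(len(SharedConfigMap().data))  # 1

# Fix: each instance starts empty; duplicate varids are skipped.
model_map = ConfigMap().add(make_entry(1)).add(make_entry(1))
endpoint_map = ConfigMap().add(make_entry(2))
print(len(model_map.get()), len(endpoint_map.get()))  # 1 1
```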
I have a custom DAG (meant to be subclassed); let's name it MyDAG. In the __enter__ method I want to add an operator or not, depending on the subclass. I'm not interested in using the BranchPythonOperator.
class MyDAG(DAG):
    def __enter__(self):
        super().__enter__()  # make this DAG the active context so the operators below attach to it
        start = DummyOperator(task_id='start')
        end = DummyOperator(task_id='end')
        op = self.get_additional_operator()
        if op:
            start >> op
        else:
            start >> end
        return self

    def get_additional_operator(self):
        # None if the subclass doesn't add any operator. A reference to another operator otherwise
        return None
If get_additional_operator returns a reference, I get this shape (two branches):
* start --> op
* end
Otherwise, if it returns None, I get this (one branch):
* start --> end
What I want is to not have end at all in a subclass inheriting from MyDAG when get_additional_operator doesn't return None, something like this:
* start --> op
Instead of the two branches I'm obtaining above.
Airflow appears to register every operator that is instantiated in the __enter__ method of a subclass of MyDAG, whether or not it is wired to another task. Given that, to leave an operator out it is enough to instantiate it only where it is needed. Code below:
class MyDAG(DAG):
    def __enter__(self):
        super().__enter__()  # make this DAG the active context so the operators below attach to it
        start = DummyOperator(task_id='start')
        op = self.get_additional_operator()
        if op:
            start >> op
        else:
            end = DummyOperator(task_id='end')
            start >> end
        return self

    def get_additional_operator(self):
        # None if the subclass doesn't add any operator. A reference to another operator otherwise
        return None
The end operator is now created inside the else branch, so it is only instantiated, and therefore only added to the DAG, when get_additional_operator returns None.
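The fix rests on operators being registered with the active DAG as soon as they are constructed. Here is a toy model of that behavior with no Airflow involved: ToyDAG, ToyOperator and build are hypothetical stand-ins, and the registration-on-construction rule is an assumption modeled on how context-managed Airflow DAGs behave, not Airflow's actual code.

```python
class ToyDAG:
    """Hypothetical stand-in for an Airflow DAG used as a context manager."""
    current = None  # the DAG whose `with` block is active, if any

    def __init__(self):
        self.tasks = []

    def __enter__(self):
        ToyDAG.current = self
        return self

    def __exit__(self, *exc_info):
        ToyDAG.current = None
        return False  # don't swallow exceptions


class ToyOperator:
    """Hypothetical stand-in for an operator: registers itself on creation."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []
        if ToyDAG.current is not None:
            # Registration happens here, whether or not the task is ever wired up.
            ToyDAG.current.tasks.append(self)

    def __rshift__(self, other):
        self.downstream.append(other)
        return other


def build(extra_id, create_end_up_front):
    with ToyDAG() as dag:
        start = ToyOperator('start')
        end = ToyOperator('end') if create_end_up_front else None
        if extra_id:
            start >> ToyOperator(extra_id)
        else:
            if end is None:
                end = ToyOperator('end')  # created only in this branch
            start >> end
    return [task.task_id for task in dag.tasks]


print(build('op', create_end_up_front=True))   # ['start', 'end', 'op'] (stray end)
print(build('op', create_end_up_front=False))  # ['start', 'op']
print(build(None, create_end_up_front=False))  # ['start', 'end']
```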
Just for learning purposes, I tried to use a dictionary as a global variable in an accumulator. The add function works well, but when I run the code and update the dictionary inside the map function, it always comes back empty.
But similar code for setting list as a global variable
import re
from pyspark.accumulators import AccumulatorParam


class DictParam(AccumulatorParam):
    def zero(self, value=""):
        return dict()

    def addInPlace(self, acc1, acc2):
        acc1.update(acc2)


if __name__ == "__main__":
    sc, sqlContext = init_spark("generate_score_summary", 40)  # init_spark is my own helper
    rdd = sc.textFile('input')
    # print(rdd.take(5))
    dict1 = sc.accumulator({}, DictParam())

    def file_read(line):
        global dict1
        ls = re.split(',', line)
        dict1 += {ls[0]: ls[1]}
        return line

    rdd = rdd.map(lambda x: file_read(x)).cache()
    print(dict1)
For anyone who arrives at this thread looking for a Dict accumulator for pyspark: the accepted solution does not solve the posed problem.
The issue is actually in the DictParam definition: addInPlace updates the dictionary but never returns it. This works:
class DictParam(AccumulatorParam):
    def zero(self, value=""):
        return dict()

    def addInPlace(self, value1, value2):
        value1.update(value2)
        return value1
The original code was missing the return value.
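Why does a missing return matter? A minimal pure-Python sketch, assuming (as PySpark's Accumulator.add does in its source) that the accumulator stores whatever addInPlace returns. ToyAccumulator and BrokenDictParam are hypothetical names used only for illustration; no Spark is needed to run it.

```python
class DictParam:
    """Same zero/addInPlace shape as the fixed PySpark AccumulatorParam above."""

    def zero(self, value):
        return {}

    def addInPlace(self, value1, value2):
        value1.update(value2)
        return value1


class BrokenDictParam(DictParam):
    def addInPlace(self, value1, value2):
        value1.update(value2)  # updates in place, but implicitly returns None


class ToyAccumulator:
    """Hypothetical stand-in: assigns the addInPlace result back to its value."""

    def __init__(self, value, param):
        self.value = value
        self.param = param

    def add(self, term):
        self.value = self.param.addInPlace(self.value, term)


good = ToyAccumulator({}, DictParam())
good.add({'a': '1'})
good.add({'b': '2'})
print(good.value)  # {'a': '1', 'b': '2'}

bad = ToyAccumulator({}, BrokenDictParam())
bad.add({'a': '1'})
print(bad.value)  # None
```

In the toy, a second bad.add would then fail with an AttributeError, because it would call update on None.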
I believe that print(dict1) simply gets executed before the rdd.map() does.
In Spark, there are two types of operations:
* transformations, which describe a future computation, and
* actions, which actually trigger the execution.
Accumulators are updated only when some action is executed:
Accumulators do not change the lazy evaluation model of Spark. If they
are being updated within an operation on an RDD, their value is only
updated once that RDD is computed as part of an action.
If you check out the end of this section of the docs, there is an example exactly like yours:
accum = sc.accumulator(0)
def g(x):
    accum.add(x)
    return f(x)
data.map(g)
# Here, accum is still 0 because no actions have caused the `map` to be computed.
So you would need to add some action, for instance:
rdd = rdd.map(lambda x: file_read(x)).cache() # transformation
foo = rdd.count() # action
print(dict1)
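Python's built-in map is lazy in a similar way, which makes a handy local analogy (only an analogy: Spark's laziness spans a whole cluster job, not a single iterator). The side effect in g stands in for an accumulator update, and consuming the iterator plays the role of an action.

```python
calls = []


def g(x):
    calls.append(x)  # side effect standing in for an accumulator update
    return x * 2


lazy = map(g, [1, 2, 3])  # like a transformation: nothing has run yet
before = list(calls)
print(before)  # []

result = list(lazy)  # like an action: forces g to run on every element
print(calls)   # [1, 2, 3]
print(result)  # [2, 4, 6]
```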
Please make sure to check the details of the various RDD functions and the peculiarities of accumulators, because they can affect the correctness of your result. (For instance, rdd.take(n) only scans as many partitions as it needs to collect n elements, often just the first one, rather than the entire dataset.)
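For accumulator updates performed inside actions only, Spark guarantees that each task's update to the accumulator will only be applied once, i.e. restarted tasks will not update the value. In transformations, users should be aware of that each task's update may be applied more than once if tasks or stages are re-executed.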
class Account
  def initialize(starting_balance = 0)
    @balance = starting_balance
  end

  def balance # instance getter method
    @balance # instance variable visible only to this object
  end

  def balance=(new_amount)
    @balance = new_amount
  end

  def deposit(amount)
    @balance += amount
  end

  @@bank_name = "MyBank.com" # class (static) variable

  # A class method
  def self.bank_name
    @@bank_name
  end
  # or: def Account.bank_name; @@bank_name; end
end
I want to understand the code snippets in bold. What do they do? What is the difference between a setter and the initialize method?
If I have an object test = Account.new, why does test(30) give an error? Isn't that supposed to call the setter method with the argument 30 and set the balance?
initialize is the method that is called on the newly created object when you do Account.new or Account.new(my_starting_balance). In the first case initialize would be called with the default value 0 for starting_balance and in the second with my_starting_balance.
The setter method balance= is called when you do my_account.balance = some_value, where my_account is an instance of the class Account. So with the following code, initialize will be called on line 1 (so starting_balance takes its default value 0) and balance= on line 2 (with 23 as its argument):
my_account = Account.new
my_account.balance = 23
Of course in this case I could just as well write the following and not use the setter method at all:
my_account = Account.new(23)
However, that doesn't always work, because sometimes you might want to change the value of balance after the object has already been created.
If I have an object test = Account.new, why does test(30) give an error?
Because test(30) means "call the method test with the argument 30" and there is no method called test in your code.
Regarding the second bolded part of your code: as the comments indicate, it sets a class variable named @@bank_name and defines a class method that returns that variable's value.