About tornado.gen.Task's usage for asynchronous requests

Below is my source:
class Get_Salt_Handler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    @tornado.gen.coroutine
    def get(self):
        #yield tornado.gen.Task(tornado.ioloop.IOLoop.instance().add_timeout, time.time() + 5)
        yield tornado.gen.Task(self.get_salt_from_db, 123)
        self.write("when i sleep 5s")

    def get_salt_from_db(self, params):
        print params
When I run it, the console reports:

TypeError: get_salt_from_db() got an unexpected keyword argument 'callback'

and I don't know why.

gen.Task is used to adapt a callback-based function to the coroutine style; it cannot be used to call synchronous functions. What you probably want is a ThreadPoolExecutor (standard in Python 3.2+, available with pip install futures on Python 2):
import concurrent.futures
from tornado import gen

# global thread pool, shared by all handlers
executor = concurrent.futures.ThreadPoolExecutor(NUM_THREADS)

@gen.coroutine
def get(self):
    # run the blocking DB call on a worker thread instead of the IOLoop
    salt = yield executor.submit(self.get_salt_from_db)
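
For reference, the original TypeError happens because gen.Task always passes a callback keyword argument to the function it wraps. If get_salt_from_db were genuinely asynchronous, it would have to accept and invoke that callback itself. Here is a minimal sketch of what that looks like, assuming Tornado 3.x-era APIs (gen.Task and IOLoop.instance() are deprecated in later releases):

import time
import tornado.gen
import tornado.ioloop
import tornado.web

class Get_Salt_Handler(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        # gen.Task supplies the callback; the yield resumes once it fires
        salt = yield tornado.gen.Task(self.get_salt_from_db, 123)
        self.write("got salt: %s" % salt)

    def get_salt_from_db(self, params, callback=None):
        # accept the callback kwarg and invoke it with the result;
        # defer it via the IOLoop instead of blocking for 5 seconds
        tornado.ioloop.IOLoop.instance().add_timeout(
            time.time() + 5, lambda: callback(params))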


How to write unit tests for @task-decorated Airflow tasks?

I am trying to write unit tests for some tasks built with the Airflow TaskFlow API. I tried multiple approaches, for example creating a dagrun or running the task function directly, but nothing is helping.
Here is a task where I download a file from S3; there is more going on, but I removed that for this example.
@task()
def updates_process(files):
    context = get_current_context()
    try:
        updates_file_path = utils.download_file_from_s3_bucket(files.get("updates_file"))
    except FileNotFoundError as e:
        log.error(e)
        return
    # Do something else
Now I am trying to write a test case that checks this except clause. The following is one example I started with:
class TestAccountLinkUpdatesProcess(TestCase):
    @mock.patch("dags.delta_load.updates.log")
    @mock.patch("dags.delta_load.updates.get_current_context")
    @mock.patch("dags.delta_load.updates.utils.download_file_from_s3_bucket")
    def test_file_not_found_error(self, download_file_from_s3_bucket, get_current_context, log):
        download_file_from_s3_bucket.side_effect = FileNotFoundError
        task = updates_process({"updates_file": "path/to/file.csv"})
        get_current_context.assert_called_once()
        log.error.assert_called_once()
I also tried creating a dagrun as shown in the example in the docs and fetching the task from the dagrun, but that didn't help either.
I was struggling to do this myself, but I found that decorated tasks have a .function attribute: https://github.dev/apache/airflow/blob/be7cb1e837b875f44fcf7903329755245dd02dc3/airflow/decorators/base.py#L522
You can then use .function to call the actual function. Using your example:
class TestAccountLinkUpdatesProcess(TestCase):
    @mock.patch("dags.delta_load.updates.log")
    @mock.patch("dags.delta_load.updates.get_current_context")
    @mock.patch("dags.delta_load.updates.utils.download_file_from_s3_bucket")
    def test_file_not_found_error(self, download_file_from_s3_bucket, get_current_context, log):
        download_file_from_s3_bucket.side_effect = FileNotFoundError
        task = dags.delta_load.updates.updates_process
        # Call the underlying function for testing
        task.function({"updates_file": "path/to/file.csv"})
        get_current_context.assert_called_once()
        log.error.assert_called_once()
This saves you from having to set up any of the DAG infrastructure and just runs the Python function as intended!
This is what I could figure out. Not sure if it is the right approach, but it works.
class TestAccountLinkUpdatesProcess(TestCase):
    TASK_ID = "updates_process"

    @classmethod
    def setUpClass(cls) -> None:
        cls.dag = dag_delta_load()

    @mock.patch("dags.delta_load.updates.log")
    @mock.patch("dags.delta_load.updates.get_current_context")
    @mock.patch("dags.delta_load.updates.utils.download_file_from_s3_bucket")
    def test_file_not_found_error(self, download_file_from_s3_bucket, get_current_context, log):
        download_file_from_s3_bucket.side_effect = FileNotFoundError
        task = self.dag.get_task(task_id=self.TASK_ID)
        task.op_args = [{"updates_file": "file.csv"}]
        task.execute(context={})
        log.error.assert_called_once()
UPDATE: Based on the answer from @AetherUnbound I did some investigation and found that we can use task.__wrapped__() to call the actual Python function.
class TestAccountLinkUpdatesProcess(TestCase):
    @mock.patch("dags.delta_load.updates.log")
    @mock.patch("dags.delta_load.updates.get_current_context")
    @mock.patch("dags.delta_load.updates.utils.download_file_from_s3_bucket")
    def test_file_not_found_error(self, download_file_from_s3_bucket, get_current_context, log):
        download_file_from_s3_bucket.side_effect = FileNotFoundError
        updates_process.__wrapped__({"updates_file": "file.csv"})
        log.error.assert_called_once()

Dagster solid parallel run test example

I tried to run solids in parallel, but it didn't work as I expected; the progress bar doesn't behave the way I thought it would.
I think both operations should execute at the same time, but find_highest_calorie_cereal runs first and find_highest_protein_cereal only starts after it finishes.
import csv
import time

import requests
from dagster import pipeline, solid

# start_complex_pipeline_marker_0
@solid
def download_cereals():
    response = requests.get("https://docs.dagster.io/assets/cereal.csv")
    lines = response.text.split("\n")
    return [row for row in csv.DictReader(lines)]

@solid
def find_highest_calorie_cereal(cereals):
    time.sleep(5)
    sorted_cereals = list(
        sorted(cereals, key=lambda cereal: cereal["calories"])
    )
    return sorted_cereals[-1]["name"]

@solid
def find_highest_protein_cereal(context, cereals):
    time.sleep(10)
    sorted_cereals = list(
        sorted(cereals, key=lambda cereal: cereal["protein"])
    )
    # for i in range(1, 11):
    #     context.log.info(str(i) + '~~~~~~~~')
    #     time.sleep(1)
    return sorted_cereals[-1]["name"]

@solid
def display_results(context, most_calories, most_protein):
    context.log.info(f"Most caloric cereal test: {most_calories}")
    context.log.info(f"Most protein-rich cereal: {most_protein}")

@pipeline
def complex_pipeline():
    cereals = download_cereals()
    display_results(
        most_protein=find_highest_protein_cereal(cereals),
        most_calories=find_highest_calorie_cereal(cereals),
    )
I am not sure, but I think you need to set up an executor that allows parallelism; you could use multiprocess_executor.
"Executors are responsible for executing steps within a pipeline run. Once a run has launched and the process for the run, or run worker, has been allocated and started, the executor assumes responsibility for execution."
Modes provide the possible set of executors one can use; set the executor_defs property on ModeDefinition.
MODE_DEV = ModeDefinition(name="dev", executor_defs=[multiprocess_executor])

@pipeline(mode_defs=[MODE_DEV], preset_defs=[Preset_test])
The execution config section of the run config determines the actual executor. In the YAML file or run_config, set:
execution:
  multiprocess:
    config:
      max_concurrent: 4
Retrieved from: https://docs.dagster.io/deployment/executors
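
Putting the pieces together, here is a minimal sketch of the question's pipeline running under the multiprocess executor. This assumes the legacy solid/pipeline/ModeDefinition API from the question; fs_io_manager is used because intermediate outputs must be persisted to cross process boundaries, and multiprocess execution needs a reconstructable pipeline target:

from dagster import (
    DagsterInstance,
    ModeDefinition,
    execute_pipeline,
    fs_io_manager,
    multiprocess_executor,
    pipeline,
    reconstructable,
)

# Mode that makes the multiprocess executor available; fs_io_manager
# persists intermediate outputs so child processes can read them.
MODE_DEV = ModeDefinition(
    name="dev",
    executor_defs=[multiprocess_executor],
    resource_defs={"io_manager": fs_io_manager},
)

@pipeline(mode_defs=[MODE_DEV])
def complex_pipeline():
    # solids as defined in the question
    cereals = download_cereals()
    display_results(
        most_protein=find_highest_protein_cereal(cereals),
        most_calories=find_highest_calorie_cereal(cereals),
    )

if __name__ == "__main__":
    execute_pipeline(
        reconstructable(complex_pipeline),  # multiprocess needs a reconstructable target
        run_config={"execution": {"multiprocess": {"config": {"max_concurrent": 4}}}},
        mode="dev",
        instance=DagsterInstance.local_temp(),
    )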

Mocked function is called but fails the assert_called test

I have a test like:
import unittest
from unittest import mock

import somemodule
import somemodule2

class SomeTestCase(unittest.TestCase):
    def setUp(self):
        super().setUp()
        self.ft_mock = mock.MagicMock(spec=somemodule.SomeClass,
                                      name='mock_it')

    def tearDown(self) -> None:
        self.ft_mock.reset_mock(return_value=True, side_effect=True)

    @mock.patch('somemodule2.someFunk2',
                name='mock_2',
                spec=True,
                side_effect='some val')
    def testSomething(self, mock_rr):
        m = self.ft_mock
        m.addMemberSomething()
        self.assertEqual(len(m.addMember.call_args_list), 1)
        m.addSomethingElse.return_value = 'RETURN IT'
        m.addSomethingElse()
        m.addSomethingElse.assert_called_once()
        res = somemodule.FooClass.foo()
        mock_rr.assert_called()
Here somemodule.FooClass.foo() internally calls someFunk2 from somemodule2, which has been mocked as mock_rr.
When I debug the test, someFunk2 is clearly called (it prints a line), but when mock_rr.assert_called() runs it throws:

AssertionError: Expected 'mock_2' to have been called.
I've tried several ways using patch and patch.object.
The issue was that somemodule also imports someFunk2 from somemodule2.
So the patch replaced the name in somemodule2, while the object that needed to be patched was somemodule.someFunk2: patch the name in the namespace where it is looked up, not where it is defined.
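
A minimal sketch of the fix, using the module and function names from the question (the from-import inside somemodule is assumed):

# somemodule.py is assumed to contain:
#     from somemodule2 import someFunk2
# so FooClass.foo() looks the name up in somemodule's namespace.

import unittest
from unittest import mock

import somemodule

class SomeTestCase(unittest.TestCase):
    # patch where the name is *used* (somemodule), not where it is defined
    @mock.patch('somemodule.someFunk2', name='mock_2')
    def testSomething(self, mock_rr):
        somemodule.FooClass.foo()
        mock_rr.assert_called()  # now passes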

How can I type a function argument as a native function?

I have a helper to use in the Python REPL that moves a function's variables into globals for easy debugging, but there is a mypy error:
class stepin(object):  # pylint: disable=R0903
    def __init__(self, func: Callable) -> None:
        self.func = func
        self.args = func.__code__.co_varnames
        if hasattr(func, "__defaults__") and func.__defaults__:
            self.defaults = dict(zip(reversed(self.args), reversed(func.__defaults__)))
        else:
            self.defaults = None

    def __call__(self, *args, **kwargs):
        result_dict = {x: None for x in self.args}
        if self.defaults:
            result_dict.update(self.defaults)
        result_dict.update(dict(zip(self.args, args)))
        result_dict.update(kwargs)
        for x in result_dict.keys():
            if result_dict[x] is None:
                raise ValueError('Missing args: ', self.func.__qualname__, x)
        globals().update(result_dict)
Now, the lines

if hasattr(func, "__defaults__") and func.__defaults__:
    self.defaults = dict(zip(reversed(self.args), reversed(func.__defaults__)))

raise a mypy error saying that func has no attribute __defaults__.
I understand that the BDFL has said he despises the hasattr check, so this is probably not going to be solved inside mypy; my question is, is there a way to change the __init__ typing signature to get rid of the error?
What have I tried: Callable doesn't work, understandably, since not all Callables have __defaults__.
But where is the type "function"? If I call type() on a function it says "function", but "function" is not a builtin and is not in typing. I have seen some people mention "FunctionType", but it is not in typing either.
The type of a function is types.FunctionType (from the standard library types module).
If you change the annotation for func from Callable to types.FunctionType, mypy no longer complains about __defaults__.
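
A minimal sketch of the change, keeping the rest of the question's code intact (only the import and the signature differ):

import types

class stepin(object):
    def __init__(self, func: types.FunctionType) -> None:
        self.func = func
        self.args = func.__code__.co_varnames
        # FunctionType declares __defaults__, so mypy accepts this access
        if func.__defaults__:
            self.defaults = dict(zip(reversed(self.args), reversed(func.__defaults__)))
        else:
            self.defaults = None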

Simplest tornado.gen example

I am trying to use Tornado's sync-style 'gen' tool to run a simple echo function, in a non-blocking style:
import tornado.web
import tornado.gen
import logging

def echo(message):
    return message

@tornado.gen.engine
def runme():
    response = yield tornado.gen.Task(echo, 'this is a message')
    logging.warn(response)

runme()
As far as I can tell this code isn't significantly different from the demo code in the docs, minus the unnecessary request-handler stuff; I'm not handling any HTTP requests, and AFAICT that's orthogonal to running something asynchronously. Yet this always fails with:
Traceback (most recent call last):
  File "./server.py", line 46, in <module>
    runme()
TypeError: wrapper() takes at least 1 argument (0 given)
Exactly where am I missing the argument? How can I make Tornado run this function asynchronously?
Task doesn't actually create a callback for the function being run and fire it when the function returns, as I originally thought.
I need to accept a callback in the task being run and invoke it myself, i.e.:
import tornado.web
import tornado.gen
import logging

def echo(message, callback=None):
    callback(message)

@tornado.gen.engine
def runme():
    response = yield tornado.gen.Task(echo, 'this is a message')
    logging.warn(response)

runme()
