I am running a development server locally
python manage.py runserver 8000
Then I run a script which uses the Consumer below:
from channels.generic.websocket import AsyncJsonWebsocketConsumer
class MyConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        import time
        time.sleep(99999999)
        await self.accept()
Everything runs fine and the consumer sleeps for a long time, as expected. However, I am not able to access http://127.0.0.1:8000/ from the browser.
The problem is bigger in real life, since the consumer needs to make an HTTP request to the same server and essentially ends up in a deadlock.
Is this the expected behaviour? How do I allow calls to my server while a slow consumer is running?
Since this is an async function, you should be using asyncio's sleep.
import asyncio
from channels.generic.websocket import AsyncJsonWebsocketConsumer
class MyConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        await asyncio.sleep(99999999)
        await self.accept()
If you use time.sleep, you will block the entire Python thread.
This also applies when you make your upstream HTTP request: you need to use an asyncio-compatible HTTP library, not a synchronous one. (Basically, you should be awaiting anything that is expected to take any time.)
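For example, here is a minimal sketch of what the upstream call could look like with aiohttp instead of a blocking client; the aiohttp dependency, the URL, and the send_json payload are my assumptions, not from the original code:

import aiohttp
from channels.generic.websocket import AsyncJsonWebsocketConsumer

class MyConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        await self.accept()
        # Hypothetical upstream call: aiohttp awaits the response instead of
        # blocking, so the event loop can keep serving other HTTP requests.
        async with aiohttp.ClientSession() as session:
            async with session.get('http://127.0.0.1:8000/') as resp:
                await self.send_json({'upstream_status': resp.status})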
Related
This question is really about the different coroutines in base_events.py and streams.py that deal with Network Connections, Network Servers, and their higher-level API equivalents under Streams. Since it's not really clear how to group these functions, I am going to use start_server() to explain what I don't understand about these coroutines and haven't found online (unless I missed something obvious).
When running the following code, I am able to create a server that is able to handle incoming messages from a client and I also periodically print out the number of tasks that the EventLoop is handling to see how the tasks work. What I'm surprised about is that after creating a server, the task is in the finished state not too long after the program starts. I expected that a task in the finished state was a completed task that no longer does anything other than pass back the results or exception.
However, of course this is not true: the EventLoop is still running and handling incoming messages from clients, and the application is still running. monitor(), however, shows that all tasks are completed and no new task is dispatched to handle a new incoming message.
So my question is this:
What is going on underneath asyncio that I am missing that explains the behavior I am seeing? For example, I would have expected a task (or tasks created for each message) that is handling incoming messages in the pending state.
Why is asyncio.Task.all_tasks() passing back finished tasks? I would have thought that once a task has completed it is garbage collected (as long as nothing else holds a reference to it).
I have seen similar behavior with the other asyncio functions like using create_connection() with a websocket from a site. I know at the end of these coroutines, their result is usually a tuple such as (reader, writer) or (transport, protocol) but I don't understand how it all ties together or what other documentation/code to read to give me more insight. Any help is appreciated.
import asyncio
from pprint import pprint

async def echo_message(reader, writer):
    data = await reader.read(1000)
    message = data.decode()
    addr = writer.get_extra_info('peername')
    print('Received %r from %r' % (message, addr))
    print('Send: %r' % message)
    writer.write(message.encode())
    await writer.drain()
    print('Close the client socket')
    writer.close()

async def monitor():
    while True:
        tasks = asyncio.Task.all_tasks()
        pprint(tasks)
        await asyncio.sleep(60)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.create_task(monitor())
    loop.create_task(asyncio.start_server(echo_message, 'localhost', 7777, loop=loop))
    loop.run_forever()
Outputs:
###
# Soon after starting the application, monitor prints out:
###
{<Task pending coro=<start_server() running ...>,
<Task pending coro=<monitor() running ...>,
<Task pending coro=<BaseEventLoop._create_server_getaddrinfo() running ...>}
###
# After things have initialized and the server has started, the next printout is:
###
{<Task finished coro=<start_server() done ...>,
<Task pending coro=<monitor() running ...>,
<Task finished coro=<BaseEventLoop._create_server_getaddrinfo() done ...>}
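For reference, here is a minimal sketch of the same echo server written in the streams-level pattern from the asyncio docs, awaiting start_server() directly and holding on to the asyncio.Server it returns (serve_forever() needs Python 3.7+; this is just the alternative form, not an answer to the question above):

import asyncio

async def echo_message(reader, writer):
    data = await reader.read(1000)
    writer.write(data)
    await writer.drain()
    writer.close()

async def main():
    # start_server() completes once the listening socket is set up; the
    # returned asyncio.Server keeps accepting connections, and
    # echo_message() is scheduled per incoming connection.
    server = await asyncio.start_server(echo_message, 'localhost', 7777)
    async with server:
        await server.serve_forever()

asyncio.run(main())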
I have a resource-intensive async method that I want to run as a background task. Example code for it looks like this:
class EventsTrigger:
    @staticmethod
    async def trigger_task(id: str, run_e2e: bool = False):
        try:
            add_status_for_task(id)
            result1, result2 = await task(id)
            update_status_for_task(id, result1, result2)
        except Exception:
            update_status_for_task(id, 'FAIL')

@router.post("/task")
async def trigger_task(background_tasks: BackgroundTasks):
    background_tasks.add_task(EventsTrigger.trigger_task)
    return {'msg': 'Task submitted!'}
When I trigger this endpoint, I expect an instant response: {'msg': 'Task submitted!'}. Instead, the API response is not returned until the task completes. I am following this documentation from FastAPI.
fastapi: v0.70.0
python: v3.8.10
I believe the issue is similar to what is described here.
Request help in making this a non-blocking call.
What I have learned from the GitHub issues:
You can't use async def for task functions (which will run in the background).
As the background process can't access the coroutine, your async/await will not work.
You can still try it without async/await. If that also doesn't work, then you should go for an alternative.
Alternative Background Solution
Celery is a production-ready task scheduler, so you can easily configure and run the background task using your_task_function.delay(*args, **kwargs).
Note that Celery also doesn't support async in background tasks, so whatever you write to run in the background needs to be sync code.
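For example, a rough sketch of what that could look like, reusing the helper names from the question; the broker URL, the module name tasks.py, and the task_sync() variant are assumptions for illustration:

# tasks.py -- hypothetical Celery module; the Redis broker URL is an assumption
from celery import Celery

celery_app = Celery('tasks', broker='redis://localhost:6379/0')

@celery_app.task
def run_trigger_task(id: str, run_e2e: bool = False):
    # Plain sync code only: Celery tasks here are ordinary functions, not coroutines.
    # add_status_for_task / update_status_for_task are the question's own helpers.
    add_status_for_task(id)
    result1, result2 = task_sync(id)  # a sync variant of the original await task(id)
    update_status_for_task(id, result1, result2)

The FastAPI endpoint would then enqueue it with run_trigger_task.delay(some_id) and return immediately.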
Good Luck :)
Unfortunately, you seem to have oversimplified your example, so it is a little hard to tell what is going wrong.
But the important question is: are add_status_for_task() or update_status_for_task() blocking? Because if they are (and it seems like that is the case), then obviously you're going to have issues. When you run code with async/await, all the code inside it needs to be async as well.
This would make your code look more like:
async def trigger_task(id: str, run_e2e: bool = False):
    try:
        await add_status_for_task(id)
        result1, result2 = await task(id)
        await update_status_for_task(id, result1, result2)
    except Exception:
        await update_status_for_task(id, 'FAIL')

@router.post("/task/{task_id}")
async def trigger_task(task_id: str, background_tasks: BackgroundTasks):
    background_tasks.add_task(EventsTrigger.trigger_task, task_id)
    return {'msg': 'Task submitted!'}
How are you running your app?
According to the uvicorn docs, it runs with 1 worker by default, which means only one worker process serves requests at a time.
Try configuring your uvicorn to run with more workers.
https://www.uvicorn.org/deployment/
$ uvicorn example:app --port 5000 --workers THE_AMOUNT_OF_WORKERS
or
uvicorn.run("example:app", host="127.0.0.1", port=5000, workers=THE_AMOUNT_OF_WORKERS)
What I want to do
I have an HTTP API service, written in Flask, which is a template used to build instances of different services. As such, this template needs to be generalizable to handle use cases that do and do not include Kafka consumption.
My goal is to have an optional Kafka consumer running in the background of the API template. I want any service that needs it to be able to read data from a Kafka topic asynchronously, while also independently responding to HTTP requests as it usually does. These two processes (Kafka consuming, HTTP request handling) aren't related, except that they'll be happening under the hood of the same service.
What I've written
Here's my setup:
# ./create_app.py
from flask import Flask
from flask_socketio import SocketIO

socketio = None

def create_app(kafka_consumer_too=False):
    """
    Return a Flask app object, with or without a Kafka-ready SocketIO object as well
    """
    app = Flask('my_service')
    app.register_blueprint(special_http_handling_blueprint)
    if kafka_consumer_too:
        global socketio
        socketio = SocketIO(app=app, message_queue='kafka://localhost:9092', channel='some_topic')
        from .blueprints import kafka_consumption_blueprint
        app.register_blueprint(kafka_consumption_blueprint)
        return app, socketio
    return app
My run.py is:
# ./run.py
from . import create_app

app, socketio = create_app(kafka_consumer_too=True)

if __name__ == "__main__":
    socketio.run(app, debug=True)
And here's the Kafka consumption blueprint I've written, which is where I think it should be handling the stream events:
# ./blueprints/kafka_consumption_blueprint.py
from flask import Blueprint
from ..create_app import socketio

kafka_consumption_blueprint = Blueprint('kafka_consumption', __name__)

@socketio.on('message')
def handle_message(message):
    print('received message: ' + message)
What it currently does
With the above, my HTTP requests are being handled fine when I curl localhost:5000. The problem is that, when I write to the some_topic Kafka topic (on port 9092), nothing is showing up. I have a CLI Kafka consumer running in another shell, and I can see that the messages I'm sending on that topic are showing up. So it's the Flask app that's not reacting: no messages are being consumed by handle_message().
What am I missing here? Thanks in advance.
I think you are interpreting the meaning of the message_queue argument incorrectly.
This argument is used when you have multiple server instances. These instances communicate with each other through the configured message queue. This queue is 100% internal; there is nothing that you, as a user of the library, can do with the message queue.
If you want to build some sort of pub/sub mechanism, then you have to implement the listener for that in your application, for example with something like the sketch below.
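A rough sketch of such a listener, using the kafka-python package and Flask-SocketIO's background task helper; the package choice, topic name, and event name are assumptions:

# kafka_listener.py -- a sketch, assuming the kafka-python package is installed
from kafka import KafkaConsumer

def consume_kafka(socketio):
    consumer = KafkaConsumer('some_topic', bootstrap_servers='localhost:9092')
    for record in consumer:
        message = record.value.decode('utf-8')
        print('received message: ' + message)
        # Forward to connected Socket.IO clients, if that is the end goal
        socketio.emit('message', message)

# In create_app(), after constructing the SocketIO object:
#   socketio.start_background_task(consume_kafka, socketio)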
Let's say I have 10 domains, and every domain needs a delay between requests (to avoid DoS situations and IP banning).
I was thinking about async Twisted code that calls a class, where a request made with the requests module gets delay(500), but then another request to the same domain makes it delay(250), and so on.
How do I achieve that static delay, and store something like a queue for every domain (class)?
It's a custom web scraper; Twisted is TCP, but this shouldn't make a difference. I don't want the code, just the knowledge.
While using asyncio for async:
import asyncio

async def nested(x):
    print(x)
    await asyncio.sleep(1)

async def main():
    # Schedule nested() to run soon concurrently
    # with "main()".
    for x in range(100):
        await asyncio.sleep(1)
        task = asyncio.create_task(nested(x))
        # "task" can now be used to cancel "nested()", or
        # can simply be awaited to wait until it is complete:
        await task

asyncio.run(main())
With await task in main(), it will print every 2s.
Without the await in nested(), it will print every 1s.
Without await task in main(), it will print every 0s, even though asyncio.sleep is declared.
It is quite hard to maintain if you are new to async.
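For the per-domain delay part of the question, one possible shape in asyncio (just a sketch; the delay value and the print placeholder stand in for the real request code) is one lock plus a last-request timestamp per domain, so requests to the same domain are spaced out while other domains proceed in parallel:

import asyncio
import time
from collections import defaultdict

DELAY = 0.5  # seconds to wait between two requests to the same domain
domain_locks = defaultdict(asyncio.Lock)   # one lock per domain
last_request = defaultdict(float)          # monotonic timestamp of the last request per domain

async def fetch(domain, path):
    async with domain_locks[domain]:
        # Requests to the same domain queue up on its lock; other domains are unaffected.
        wait = DELAY - (time.monotonic() - last_request[domain])
        if wait > 0:
            await asyncio.sleep(wait)
        last_request[domain] = time.monotonic()
        print('requesting', domain, path)  # placeholder for the real HTTP call

async def main():
    await asyncio.gather(
        fetch('a.com', '/1'), fetch('a.com', '/2'), fetch('b.com', '/1'),
    )

asyncio.run(main())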
Background:
We have a Python web application which uses SQLAlchemy as its ORM. We currently run this application with Gunicorn (sync worker). This application is only used to respond to LONG RUNNING REQUESTS (i.e. serving big files; please don't advise using X-Sendfile/X-Accel-Redirect, because the response is generated dynamically from the Python app).
With Gunicorn sync workers, when we run 8 workers only 8 requests are served simultaneously. Since all of these responses are IO bound, we want to switch to an asynchronous worker type to get better throughput.
We have switched the worker type from sync to eventlet in the Gunicorn configuration file. Now we can respond to all of the requests simultaneously, but another mysterious (mysterious to me) problem has occurred.
In the application we have a scoped session object in module level. Following code is from our orm.py file:
uri = 'mysql://%s:%s@%s/%s?charset=utf8&use_unicode=1' % (
    config.MYSQL_USER,
    config.MYSQL_PASSWD,
    config.MYSQL_HOST,
    config.MYSQL_DB,
)

engine = create_engine(uri, echo=False)

session = scoped_session(sessionmaker(
    autocommit=False,
    autoflush=False,
    bind=engine,
    query_cls=CustomQuery,
    expire_on_commit=False
))
Our application uses the session like this:
from putio.models import session
f = session.query(File).first()
f.name = 'asdf'
session.add(f)
session.commit()
While we were using the sync worker, the session was used by one request at a time. After we switched to the async eventlet worker, all requests in the same worker share the same session, which is not desired. When the session is committed in one request, or an exception happens, all other requests fail because the session is shared.
The SQLAlchemy documentation says that scoped_session is used to provide separate sessions in threaded environments. AFAIK, requests in async workers run in the same thread.
Question:
We want separate sessions for each request in the async worker. What is the correct way of using the session with async workers in SQLAlchemy?
Use scoped_session's scopefunc argument.
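For example, with the eventlet worker the session scope can be keyed to the current greenlet instead of the current thread. A minimal sketch, assuming greenlet.getcurrent as the scope function and the same URI as built in orm.py:

from greenlet import getcurrent
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

uri = 'mysql://user:password@localhost/db?charset=utf8'  # same URI as built in orm.py
engine = create_engine(uri, echo=False)

# Each greenlet (i.e. each request under the eventlet worker) now gets its own session.
session = scoped_session(
    sessionmaker(
        autocommit=False,
        autoflush=False,
        bind=engine,
        expire_on_commit=False,
    ),
    scopefunc=getcurrent,
)

# Call session.remove() at the end of each request so the greenlet's session is cleaned up.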