start_server() method in python asyncio module - python-3.6

This question is really about the various coroutines in base_events.py and streams.py that deal with network connections, network servers, and their higher-level API equivalents under Streams. Since it's not really clear how to group these functions, I am going to use start_server() to explain what I don't understand about these coroutines and haven't found online (unless I missed something obvious).
When running the following code, I am able to create a server that is able to handle incoming messages from a client and I also periodically print out the number of tasks that the EventLoop is handling to see how the tasks work. What I'm surprised about is that after creating a server, the task is in the finished state not too long after the program starts. I expected that a task in the finished state was a completed task that no longer does anything other than pass back the results or exception.
However, of course this is not true, the EventLoop is still running and handling incoming messages from clients and the application is still running. Monitor however shows that all tasks are completed and no new task is dispatched to handle a new incoming message.
So my questions are these:
What is going on underneath asyncio that I am missing that explains the behavior I am seeing? For example, I would have expected a task (or tasks created for each message) that is handling incoming messages to be in the pending state.
Why is asyncio.Task.all_tasks() passing back finished tasks? I would have thought that once a task has completed it is garbage collected (so long as there are no other references to it).
I have seen similar behavior with the other asyncio functions like using create_connection() with a websocket from a site. I know at the end of these coroutines, their result is usually a tuple such as (reader, writer) or (transport, protocol) but I don't understand how it all ties together or what other documentation/code to read to give me more insight. Any help is appreciated.
import asyncio
from pprint import pprint

async def echo_message(reader, writer):
    # start_server() calls this callback with (reader, writer) for each client
    data = await reader.read(1000)
    message = data.decode()
    addr = writer.get_extra_info('peername')
    print('Received %r from %r' % (message, addr))
    print('Send: %r' % message)
    writer.write(message.encode())
    await writer.drain()
    print('Close the client socket')
    writer.close()

async def monitor():
    # Periodically print every task the event loop knows about
    while True:
        tasks = asyncio.Task.all_tasks()
        pprint(tasks)
        await asyncio.sleep(60)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.create_task(monitor())
    # loop is a keyword-only argument of start_server()
    loop.create_task(asyncio.start_server(echo_message, 'localhost', 7777, loop=loop))
    loop.run_forever()
Outputs:
###
# Soon after starting the application, monitor prints out:
###
{<Task pending coro=<start_server() running ...>,
<Task pending coro=<monitor() running ...>,
<Task pending coro=<BaseEventLoop._create_server_getaddrinfo() running ...>}
###
# After things have initialized and the server has started, the next printout is:
###
{<Task finished coro=<start_server() done ...>,
<Task pending coro=<monitor() running ...>,
<Task finished coro=<BaseEventLoop._create_server_getaddrinfo() done ...>}
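
The behaviour shown in the output can be reproduced in a small, self-contained sketch. It assumes Python 3.7+ (for asyncio.run), binds to port 0 so the OS picks a free port, and uses the hypothetical helper names demo and run_demo; none of these come from the question. As far as I understand it, start_server() only sets up the listening socket and returns a Server object, so the task wrapping it finishes quickly. Later connections are dispatched by the event loop through callbacks, and each call to the client callback runs in its own short-lived task, which a monitor that prints once a minute will rarely catch.

```python
import asyncio


async def echo_message(reader, writer):
    # Runs once per client connection, in its own short-lived task.
    data = await reader.read(1000)
    writer.write(data)
    await writer.drain()
    writer.close()


async def demo():
    # Wrap start_server() in a task, as the question does with loop.create_task().
    # Port 0 lets the OS pick a free port (an assumption made for this sketch).
    start_task = asyncio.ensure_future(
        asyncio.start_server(echo_message, "127.0.0.1", 0)
    )
    server = await start_task  # the result is a Server object
    port = server.sockets[0].getsockname()[1]
    finished_before_client = start_task.done()

    # The server still answers although the start_server() task is finished.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello")
    await writer.drain()
    reply = await reader.read(1000)
    writer.close()

    server.close()
    return finished_before_client, reply


def run_demo():
    return asyncio.run(demo())


if __name__ == "__main__":
    print(run_demo())
```

The first returned value records that the start_server() task was already done before any client connected; the second is the echoed payload, served after that task had finished.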

Related

Running process not shown in active process Instances and difference between ASYNC & SYNC tasks

My workflow is quite simple: I have two scripts, the first is ASYNC and the second is SYNC. In each script I have a loop from 0 to Integer.MAX_VALUE as follows:
for (int i = 0; i < Integer.MAX_VALUE; i++)
    System.out.println("value is " + i);
When I run my process, it starts working and I can see in my log file that it is being filled. But when I want to stop it, I find nothing in my active process instances, nor in the completed or aborted ones. Even if I check my database, I have nothing related to this process in ProcessInstanceInfo or ProcessInstanceLog. So weird, isn't it? What could be the reason?
The goal of creating this workflow is to see the difference between ASYNC and SYNC tasks. As far as I know, when an ASYNC task starts running, the workflow doesn't have to wait until the task finishes, but what I see is that my ASYNC task is still running and the workflow didn't go on to the next task. So my second question is: can anyone explain the difference between ASYNC and SYNC with a good example to learn from? I would appreciate at least one answer to one of my two questions. Thanks.
What do you stop? Do you abort the process instance?
In the scripts you can populate the process variables with kcontext.setVariable("variable_name","variable_value"). This will be reflected in the DB if you have defined the process variable as persistent in the process model.
As for the tasks: the sync one returns flow control to the process when it has completed. In contrast, with the async one the process flow continues immediately after it sends the async task off to execute.
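
The difference described in the last paragraph can be illustrated with a generic sketch. It is written in Python with asyncio rather than in the workflow engine from the question, and the names work, sync_style, async_style and run_both are hypothetical. It only shows the ordering: in the synchronous style the next step waits for the task, in the asynchronous style the next step runs right after the task has been handed off.

```python
import asyncio


async def work(log, name):
    # Stand-in for a task that takes some time.
    await asyncio.sleep(0.01)
    log.append(name + " finished")


async def sync_style(log):
    # Flow control comes back only after the task has completed.
    await work(log, "task")
    log.append("next step")


async def async_style(log):
    # The task is handed off and the flow continues immediately.
    pending = asyncio.ensure_future(work(log, "task"))
    log.append("next step")
    await pending  # wait at the very end so the task is not abandoned


def run_both():
    sync_log, async_log = [], []
    asyncio.run(sync_style(sync_log))
    asyncio.run(async_style(async_log))
    return sync_log, async_log


if __name__ == "__main__":
    print(run_both())
```

In the first log the task finishes before the next step is recorded; in the second log the order is reversed.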

How to make async requests using HTTPoison?

Background
We have an app that deals with a considerable amount of requests per second. This app needs to notify an external service, by making a GET call via HTTPS to one of our servers.
Objective
The objective here is to use HTTPoison to make async GET requests. I don't really care about the response of the requests; all I care about is knowing whether they failed or not, so I can write any possible errors to a logger.
If it succeeds I don't want to do anything.
Research
I have checked the official documentation for HTTPoison and I see that they support async requests:
https://hexdocs.pm/httpoison/readme.html#usage
However, I have 2 issues with this approach:
They use flush to show the request was completed. I can't log in to the app and manually flush to see how the requests are going; that would be insane.
They don't show any notification mechanism for when we get the responses or errors.
So, I have a simple question:
How do I get asynchronously notified that my request failed or succeeded?
I assume that the default HTTPoison.get is synchronous, as shown in the documentation.
This could be achieved by spawning a new process per-request. Consider something like:
notify = fn response ->
  # Any handling logic: write to a DB? Send a message to another process?
  # Here, I'll just print the result
  IO.inspect(response)
end

spawn(fn ->
  resp = HTTPoison.get("http://google.com")
  notify.(resp)
end) # spawn will not block, so the next spawn is attempted straight away

spawn(fn ->
  resp = HTTPoison.get("http://yahoo.com")
  notify.(resp)
end) # This will be executed immediately after the previous `spawn`
Please take a look at the documentation of spawn/1 I've pointed out here.
Hope that helps!

flask celery get task real time task status [duplicate]

How does one check whether a task is running in celery (specifically, I'm using celery-django)?
I've read the documentation, and I've googled, but I can't see a call like:
my_example_task.state() == RUNNING
My use-case is that I have an external (java) service for transcoding. When I send a document to be transcoded, I want to check if the task that runs that service is running, and if not, to (re)start it.
I'm using the current stable versions - 2.4, I believe.
Return the task_id (which is given from .delay()) and ask the celery instance afterwards about the state:
x = method.delay(1,2)
print x.task_id
When asking, get a new AsyncResult using this task_id:
from celery.result import AsyncResult
res = AsyncResult("your-task-id")
res.ready()
Creating an AsyncResult object from the task id is the way recommended in the FAQ to obtain the task status when the only thing you have is the task id.
However, as of Celery 3.x, there are significant caveats that could bite people if they do not pay attention to them. It really depends on the specific use-case scenario.
By default, Celery does not record a "running" state.
In order for Celery to record that a task is running, you must set task_track_started to True. Here is a simple task that tests this:
@app.task(bind=True)
def test(self):
    print(self.AsyncResult(self.request.id).state)
When task_track_started is False, which is the default, the state shown is PENDING even though the task has started. If you set task_track_started to True, then the state will be STARTED.
The state PENDING means "I don't know."
An AsyncResult with the state PENDING does not mean anything more than that Celery does not know the status of the task. This could be because of any number of reasons.
For one thing, AsyncResult can be constructed with invalid task ids. Such "tasks" will be deemed pending by Celery:
>>> task.AsyncResult("invalid").status
'PENDING'
Ok, so nobody is going to feed obviously invalid ids to AsyncResult. Fair enough, but it also has the effect that AsyncResult will consider a task that has successfully run but that Celery has forgotten as being PENDING. Again, in some use-case scenarios this can be a problem. Part of the issue hinges on how Celery is configured to keep the results of tasks, because it depends on the availability of the "tombstones" in the results backend. ("Tombstones" is the term used in the Celery documentation for the data chunks that record how the task ended.) Using AsyncResult won't work at all if task_ignore_result is True. A more vexing problem is that Celery expires the tombstones by default. The result_expires setting is set to 24 hours by default. So if you launch a task, record the id in long-term storage, and more than 24 hours later create an AsyncResult with it, the status will be PENDING.
All "real tasks" start in the PENDING state. So getting PENDING on a task could mean that the task was requested but never progressed further than this (for whatever reason). Or it could mean the task ran but Celery forgot its state.
Ouch! AsyncResult won't work for me. What else can I do?
I prefer to keep track of goals rather than the tasks themselves. I do keep some task information, but it is really secondary to keeping track of the goals. The goals are stored in storage independent from Celery. When a request needs to perform a computation that depends on some goal having been achieved, it checks whether the goal has already been achieved. If yes, it uses this cached goal; otherwise it starts the task that will effect the goal and sends the client that made the HTTP request a response indicating that it should wait for a result.
The variable names and hyperlinks above are for Celery 4.x. In 3.x the corresponding variables and hyperlinks are: CELERY_TRACK_STARTED, CELERY_IGNORE_RESULT, CELERY_TASK_RESULT_EXPIRES.
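
The goal-tracking approach described above could look roughly like the sketch below. Everything in it is hypothetical and kept free of Celery imports: goal_store is a plain dict standing in for storage that is independent from Celery, and start_task is any callable that launches the work and returns a task id (in a real application it might wrap a call such as some_task.delay(goal_key) and return the id of the result).

```python
def get_or_start(goal_key, goal_store, start_task):
    """Return ("done", value) for a cached goal, else start work and return ("wait", task_id)."""
    if goal_key in goal_store:
        # The goal was already achieved: use the cached value, no task needed.
        return "done", goal_store[goal_key]
    # The goal is not recorded yet: launch the task and tell the caller to wait.
    task_id = start_task(goal_key)
    return "wait", task_id


def record_goal(goal_key, value, goal_store):
    # Meant to be called by the task itself once it has produced its result.
    goal_store[goal_key] = value


if __name__ == "__main__":
    store = {}
    print(get_or_start("report:42", store, lambda key: "task-1"))
    record_goal("report:42", "result", store)
    print(get_or_start("report:42", store, lambda key: "task-2"))
```

Because the answer comes from the goal store and not from the results backend, it does not matter whether Celery has expired or never recorded the task's state.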
Every Task object has a .request property, which contains its AsyncRequest object. Accordingly, the following line gives the state of a Task task:
task.AsyncResult(task.request.id).state
You can also create custom states and update their values during task execution.
This example is from docs:
@app.task(bind=True)
def upload_files(self, filenames):
    for i, file in enumerate(filenames):
        if not self.request.called_directly:
            self.update_state(state='PROGRESS',
                meta={'current': i, 'total': len(filenames)})
http://celery.readthedocs.org/en/latest/userguide/tasks.html#custom-states
Old question but I recently ran into this problem.
If you're trying to get the task_id you can do it like this:
import celery
from celery_app import add
from celery import uuid
task_id = uuid()
result = add.apply_async((2, 2), task_id=task_id)
Now you know exactly what the task_id is and can now use it to get the AsyncResult:
# grab the AsyncResult
result = celery.result.AsyncResult(task_id)
# print the task id
print result.task_id
09dad9cf-c9fa-4aee-933f-ff54dae39bdf
# print the AsyncResult's status
print result.status
SUCCESS
# print the result returned
print result.result
4
Just use this API from celery FAQ
result = app.AsyncResult(task_id)
This works fine.
Answer of 2020:
#### tasks.py
@celery.task()
def mytask(arg1):
    print(arg1)

#### blueprint.py
@bp.route("/args/arg1=<arg1>")
def sleeper(arg1):
    process = mytask.apply_async(args=(arg1,))  # or mytask.delay(arg1)
    state = process.state
    return f"Thanks for your patience, your job {process.task_id} \
        is being processed. Status {state}"
Try:
task.AsyncResult(task.request.id).state
This provides the Celery task status. If the Celery task is already in the FAILURE state, it throws an exception:
raised unexpected: KeyError('exc_type',)
I found helpful information in the
Celery Project Workers Guide inspecting-workers
For my case, I am checking to see if Celery is running.
inspect_workers = task.app.control.inspect()
if inspect_workers.registered() is None:
    state = 'FAILURE'
else:
    state = str(task.state)
You can play with inspect to get your needs.
First, in your Celery app:
vi my_celery_apps/app1.py
app = Celery(worker_name)
Next, in the task file, import app from your Celery app module:
vi tasks/task1.py
from my_celery_apps.app1 import app
app.AsyncResult(taskid)
try:
    if task.state.lower() != "success":
        return
except:
    """ do something """
res = method.delay()
print(f"id={res.id}, state={res.state}, status={res.status} ")
print(res.get())
For simple tasks, we can use http://flower.readthedocs.io/en/latest/screenshots.html and http://policystat.github.io/jobtastic/ to do the monitoring.
For complicated tasks, say a task that deals with a lot of other modules, we recommend manually recording the progress and messages on the specific task unit.
Apart from the programmatic approaches above, task status can easily be seen using Flower.
Real-time monitoring using Celery Events.
Flower is a web based tool for monitoring and administrating Celery clusters.
Task progress and history
Ability to show task details (arguments, start time, runtime, and more)
Graphs and statistics
Official Document:
Flower - Celery monitoring tool
Installation:
$ pip install flower
Usage:
http://localhost:5555
Update:
This has a versioning issue: flower (version=0.9.7) works only with celery (version=4.4.7). Moreover, when you install flower, it replaces your higher version of celery with 4.4.7, and this never works for registered tasks.

Autobahn+twisted reconnecting

I have a series of clients which need to be connected constantly to my server via ws protocol. For a number of different reasons, the connections occasionally drop. This is acceptable, but when it happens I'd like my clients to reconnect.
Currently my temporary workaround is to have a parent process launch the client and when it detects connection drop, terminate it (client never handles any critical data, there are no side effects to sigkill-ing it) and respawn a new client. While this does the job, I'd very much prefer to fix the actual problem.
This is roughly my client:
from autobahn.twisted.websocket import WebSocketClientProtocol, WebSocketClientFactory
from twisted.internet import reactor
from threading import Thread
from time import sleep

class Client:
    def __init__(self):
        self._kill = False
        self.factory = WebSocketClientFactory("ws://0.0.0.0")
        self.factory.openHandshakeTimeout = 60 # ensures handshake doesnt timeout due to technical limitations
        self.factory.protocol = self._protocol_factory()
        self._conn = reactor.connectTCP("0.0.0.0", 1234, self.factory)
        reactor.run()

    def _protocol_factory(self):
        class ClientProtocol(WebSocketClientProtocol):
            def onConnect(self, response):
                Thread(target=_self.mainloop, daemon=True).start()

            def onClose(self, was_clean, code, reason):
                _self.on_cleanup()

        _self = self
        return ClientProtocol

    def on_cleanup(self):
        self._kill = True
        sleep(30)
        # Wait for self.mainloop to finish.
        # It is guaranteed to exit within 30 seconds of setting _kill flag
        self._kill = False
        self._conn = reactor.connectTCP("0.0.0.0", 1234, self.factory)

    def mainloop(self):
        while not self._kill:
            sleep(1) # does some work
This code makes the client work correctly until the first connection drop, at which point it attempts to reconnect. No exceptions are raised during the process and it appears that everything went correctly on the client side: onConnect is called and a new mainloop starts, but the server never receives that second handshake. The client seems to think it is connected, though.
What am I doing wrong? Why could this be happening?
I'm not a twisted expert and can't really tell what you are doing wrong, but I'm currently working with Autobahn in a project and I solved the reconnection problems using the ReconnectingClientFactory. Maybe you want to check some examples of the use of ReconnectingClientFactory with Autobahn.

F# Http.AsyncRequestStream just 'hangs' on long queries

I am working with:
let callTheAPI = async {
    printfn "\t\t\tMAKING REQUEST at %s..." (System.DateTime.Now.ToString("yyyy-MM-ddTHH:mm:ss"))
    let! response = Http.AsyncRequestStream(url,query,headers,httpMethod,requestBody)
    printfn "\t\t\t\tREQUEST MADE."
    // Return the response so the caller can read its stream
    return response
}
And
let cts = new System.Threading.CancellationTokenSource()
let timeout = 1000*60*4 // 4 minutes (4 mins no grace)
cts.CancelAfter(timeout)
let response = Async.RunSynchronously(callTheAPI,timeout,cts.Token)
use respStrm = response.ResponseStream
respStrm.Flush()
writeLinesTo output (responseLines respStrm)
I use this to call a web API (REST), and the let! response = Http.AsyncRequestStream(url,query,headers,httpMethod,requestBody) line just hangs on certain queries, particularly ones that take a long time (>4 minutes). This is why I have made it Async and put a 4 minute timeout on it. (I collect the calls that time out and make them again with smaller time range parameters.)
I started with Http.RequestStream from FSharp.Data, but I couldn't add a timeout to this, so the script would just 'hang'.
I have looked at the API's IIS server and the application pool Worker Process active requests in IIS manager and I can see the requests come in and go again. They then 'vanish' and the F# script hangs. I can't find an error message anywhere on the script side or server side.
I included the Flush() and removed the timeout and it still hung. (Removing the Async in the process)
Additional:
Successful calls are made. Failed calls can be followed by successful calls. However, it seems to get to a point where all the calls time out, and they do so without even reaching the server any more. (Worker Process Active Requests doesn't show the query.)
Update:
I made the Fsx script output the queries and ran them through IRM with no issues (I have a timeout and it never locks up). I have a suspicion that there is an issue with FSharp.Data.Http.
Async.RunSynchronously blocks. Read the remarks section in the docs: RunSynchronously. Instead, use Async.AwaitTask.
