Suppose I have a few async functions, f1, f2 and f3. I want to execute these functions in sequential order. The easiest way to do this would be to await them:
async def foo():
    await f1()
    # Do something else
    await f2()
    # Do something else
    await f3()
    # Do something else
However, I don't care about the results of these async functions, and I would like to continue with the execution of the rest of the function after scheduling the async functions.
From the asyncio tasks documentation it seems that asyncio.ensure_future() can help me with this. I used the following code to test this out, and the synchronous parts of foo() ran as per my expectations. However, bar() never executes past asyncio.sleep().
import asyncio

async def bar(name):
    print(f'Sleep {name}')
    await asyncio.sleep(3)
    print(f'Wakeup {name}')

async def foo():
    print('Enter foo')
    for i in [1, 2, 3]:
        asyncio.ensure_future(bar(i))
        print(f'Scheduled bar({i}) for execution')
    print('Exit foo')

loop = asyncio.get_event_loop()
loop.run_until_complete(foo())
The output for the above code:
Enter foo
Scheduled bar(1) for execution
Scheduled bar(2) for execution
Scheduled bar(3) for execution
Exit foo
Sleep 1
Sleep 2
Sleep 3
So, what is the proper method to do what I'm looking for?
I have a few async functions, f1, f2 and f3. I want to execute these functions in a sequential order. [...] I would like to continue with the execution of the rest of the function after scheduling the async functions.
The straightforward way to do this is to use a helper function and let it run in the background:
async def foo():
    async def run_fs():
        await f1()
        await f2()
        await f3()
    loop = asyncio.get_event_loop()
    loop.create_task(run_fs())
    # proceed with stuff that foo needs to do
    ...
create_task submits a coroutine to the event loop. You can also use ensure_future for that, but create_task is preferred when spawning a coroutine.
The code in the question has two issues: first, the functions are not run sequentially, but in parallel. This is fixed as shown above, by running a single async function in the background that awaits the three in order. The second problem is that in asyncio run_until_complete(foo()) only waits for foo() to finish, not also for the tasks spawned by foo (though there are asyncio alternatives that address this). If you want run_until_complete(foo()) to wait for run_fs to finish, foo has to await it itself.
Fortunately, that is trivial to implement - just add another await at the end of foo(), awaiting the task created for run_fs earlier. If the task is already done by that point, the await will exit immediately, otherwise it will wait.
async def foo():
    async def run_fs():
        await f1()
        await f2()
        await f3()
    loop = asyncio.get_event_loop()
    f_task = loop.create_task(run_fs())
    # proceed with stuff that foo needs to do
    ...
    # finally, ensure that the fs have finished by the time we return
    await f_task
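For reference, here is a self-contained sketch of the same pattern on Python 3.7+, using asyncio.run and asyncio.create_task; f1/f2/f3 are dummy stand-ins for the question's functions:

```python
import asyncio

results = []  # records execution order

# Dummy stand-ins for the question's f1/f2/f3.
async def f1():
    await asyncio.sleep(0.01)
    results.append('f1')

async def f2():
    results.append('f2')

async def f3():
    results.append('f3')

async def foo():
    async def run_fs():
        await f1()
        await f2()
        await f3()
    # on 3.7+, asyncio.create_task replaces loop.create_task
    f_task = asyncio.create_task(run_fs())
    results.append('foo body')  # foo keeps going while the fs run
    await f_task                # ensure the fs finished before returning

asyncio.run(foo())
```

Since foo's body runs before run_fs gets its first chance to execute, 'foo body' lands in the list ahead of 'f1', 'f2' and 'f3'.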
I was recently informed that in
async {
    return! async { return "hi" } }
|> Async.RunSynchronously
|> printfn "%s"
the nested Async<'T> (async { return "hi" }) would not be sent to the thread pool for evaluation, whereas in
async {
    use ms = new MemoryStream [| 0x68uy; 0x69uy |]
    use sr = new StreamReader (ms)
    return! sr.ReadToEndAsync () |> Async.AwaitTask }
|> Async.RunSynchronously
|> printfn "%s"
the nested Async<'T> (sr.ReadToEndAsync () |> Async.AwaitTask) would be. What is it about an Async<'T> that decides whether it's sent to the thread pool when it's executed in an asynchronous operation like let! or return!? In particular, how would you define one which is sent to the thread pool? What code do you have to include in the async block, or in the lambda passed into Async.FromContinuations?
TL;DR: It's not quite like that. The async itself doesn't "send" anything to the thread pool. All it does is just run continuations until they stop. And if one of those continuations decides to continue on a new thread - well, that's when thread switching happens.
Let's set up a small example to illustrate what happens:
open System
open System.IO
open System.Threading

let log str = printfn $"{str}: thread = {Thread.CurrentThread.ManagedThreadId}"

let f = async {
    log "1"
    let! x = async { log "2"; return 42 }
    log "3"
    do! Async.Sleep(TimeSpan.FromSeconds(3.0))
    log "4"
}

log "starting"
f |> Async.StartImmediate
log "started"
Console.ReadLine() |> ignore
If you run this script, it will print starting, then 1, 2, 3, then started, then wait 3 seconds, and then print 4; all of them except 4 will have the same thread ID. You can see that everything until Async.Sleep is executed synchronously on the same thread, but after that the async execution stops and the main program execution continues, printing started and then blocking on ReadLine. By the time Async.Sleep wakes up and wants to continue execution, the original thread is already blocked on ReadLine, so the async computation gets to continue running on a new one.
What's going on here? How does this work?
First, the way the async computation is structured is in "continuation-passing style". It's a technique where every function doesn't return its result to the caller, but calls another function instead, passing the result as its parameter.
Let me illustrate with an example:
// "Normal" style:
let f x = x + 5
let g x = x * 2
printfn "%d" (f (g 3)) // prints 11
// Continuation-passing style:
let f x next = next (x + 5)
let g x next = next (x * 2)
g 3 (fun res1 -> f res1 (fun res2 -> printfn "%d" res2))
This is called "continuation-passing" because the next parameters are called "continuations" - i.e. they're functions that express how the program continues after calling f or g. And yes, this is exactly what Async.FromContinuations means.
On the surface this seems very silly and roundabout, but what it buys us is that each function gets to decide when, how, or even whether its continuation happens. For example, our f function from above could be doing something asynchronous instead of just plain returning the result:
let f x next = httpPost "http://calculator.com/add5" x next
Coding it in continuation-passing style would allow such function to not block the current thread while the request to calculator.com is in flight. What's wrong with blocking the thread, you ask? I'll refer you to the original answer that prompted your question in the first place.
Second, when you write those async { ... } blocks, the compiler gives you a little help. It takes what looks like a step-by-step imperative program and "unrolls" it into a series of continuation-passing calls. The "breaking" points for this unrolling are all the constructs that end with a bang - let!, do!, return!.
The above async block, for example, would look something like this (F#-ish pseudocode):
let return42 onDone =
    log "2"
    onDone 42

let f onDone =
    log "1"
    return42 (fun x ->
        log "3"
        Async.Sleep (3 seconds) (fun () ->
            log "4"
            onDone ()
        )
    )
Here, you can plainly see that the return42 function simply calls its continuation right away, thus making the whole thing from log "1" to log "3" completely synchronous, whereas the Async.Sleep function doesn't call its continuation right away, instead scheduling it to be run later (in 3 seconds) on the thread pool. That's where the thread switching happens.
And here, finally, lies the answer to your question: in order to have the async computation jump threads, your callback passed to Async.FromContinuations should do anything but call the success continuation immediately.
A few notes for further investigation
The onDone technique in the above example is technically called "monadic bind", and indeed in real F# programs it's represented by the async.Bind method. This answer might also be of help understanding the concept.
The above is a bit of an oversimplification. In reality the async execution is a bit more complicated than that. Internally it uses a technique called a "trampoline", which in plain terms is just a loop that runs a single thunk on every turn. Crucially, the running thunk can "ask" the loop to run another thunk, and the loop keeps doing so until a thunk finally doesn't ask for another one, at which point the whole thing stops.
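The trampoline idea itself is language-agnostic. As a rough illustration of the concept (sketched in Python, not F#'s actual implementation):

```python
# A minimal trampoline: instead of calling the next step directly
# (which would grow the call stack), each thunk returns the next thunk,
# and a plain loop keeps running them until a non-callable value appears.
def trampoline(thunk):
    while callable(thunk):
        thunk = thunk()  # run one step; it may hand back another step
    return thunk         # not callable, so it's the final result

# Countdown written in thunk-returning style: no recursion-depth limit,
# because each step returns before the next one runs.
def countdown(n):
    if n == 0:
        return 0
    return lambda: countdown(n - 1)

result = trampoline(lambda: countdown(100_000))
```

A directly recursive countdown of 100,000 would overflow Python's call stack; the trampolined version runs in constant stack space.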
I specifically used Async.StartImmediate to start the computation in my example, because Async.StartImmediate does just what it says on the tin: it starts running the computation immediately, right there. That's why everything ran on the same thread as the main program. There are many alternative starting functions in the Async module. For example, Async.Start will start the computation on the thread pool. The lines from log "1" to log "3" will still all happen synchronously, without thread switching between them, but on a different thread from log "starting" and log "started". In this case the thread switch happens before the async computation even starts, so it doesn't count.
I'm submitting coroutines to an event loop in a separate thread. This all works well when I wait on each future in sequence with future.result(). But I want to now wait on the first completed future in a list of futures. I'm trying to use asyncio.wait(...) for that, but I appear to be using it incorrectly.
Below is a simplified example. I'm getting the exception TypeError: An asyncio.Future, a coroutine or an awaitable is required at the line done, pending = future.result().
This works if I pass [c1, c2, c3] to asyncio.wait([c1, c2, c3], return_when=asyncio.FIRST_COMPLETED), but I am submitting tasks at random times, so I can only gather the set of futures, not the original tasks. And the documentation clearly states that you can use futures.
coroutine asyncio.wait(futures, *, loop=None, timeout=None, return_when=ALL_COMPLETED)
Wait for the Futures and coroutine objects given by the sequence futures to complete. Coroutines will be wrapped in Tasks. Returns two sets of Future: (done, pending).
import asyncio
import threading

async def generate():
    await asyncio.sleep(10)
    return 'Hello'

def run_loop(loop):
    asyncio.set_event_loop(loop)
    loop.run_forever()

event_loop = asyncio.get_event_loop()
threading.Thread(target=lambda: run_loop(event_loop)).start()

c1 = generate()  # submitted at a random time
c2 = generate()  # submitted at a random time
c3 = generate()  # submitted at a random time

f1 = asyncio.run_coroutine_threadsafe(c1, event_loop)
f2 = asyncio.run_coroutine_threadsafe(c2, event_loop)
f3 = asyncio.run_coroutine_threadsafe(c3, event_loop)

all_futures = [f1, f2, f3]

# I'm doing something wrong in these 3 lines
waitable = asyncio.wait(all_futures, return_when=asyncio.FIRST_COMPLETED)
future = asyncio.run_coroutine_threadsafe(waitable, event_loop)
done, pending = future.result()  # This raises the TypeError exception

for d in done:
    print(d.result())
asyncio.wait expects asyncio futures and works inside an event loop, but run_coroutine_threadsafe returns concurrent.futures futures. To wait on multiple concurrent.futures futures (and outside of an event loop), use concurrent.futures.wait instead:
done, pending = concurrent.futures.wait(
    all_futures, return_when=concurrent.futures.FIRST_COMPLETED)
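To make the fix concrete, here is a self-contained sketch (with a dummy generate coroutine and made-up delays, not the question's 10-second sleeps) of waiting on run_coroutine_threadsafe futures from the main thread:

```python
import asyncio
import concurrent.futures
import threading

# Dummy coroutine with made-up delays, standing in for generate().
async def generate(delay, value):
    await asyncio.sleep(delay)
    return value

# run the event loop in a background thread, as in the question
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

# run_coroutine_threadsafe returns concurrent.futures.Future objects...
futures = [
    asyncio.run_coroutine_threadsafe(generate(0.3, 'slow'), loop),
    asyncio.run_coroutine_threadsafe(generate(0.05, 'fast'), loop),
]

# ...so concurrent.futures.wait can block on them from the main thread
done, pending = concurrent.futures.wait(
    futures, return_when=concurrent.futures.FIRST_COMPLETED)
first = next(iter(done)).result()

concurrent.futures.wait(futures)      # drain the rest before stopping
loop.call_soon_threadsafe(loop.stop)
```

The short-delay coroutine completes first, so `first` holds its result while the slower one is still pending at the time wait returns.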
Note that your idea would have worked if you had access to the underlying asyncio futures. For example (untested):
async def submit(coro):
    # submit the coroutine and return the asyncio task (future)
    return asyncio.create_task(coro)

# ...generate() as before

# note that we use result() to get to the asyncio futures:
f1 = asyncio.run_coroutine_threadsafe(submit(c1), event_loop).result()
f2 = asyncio.run_coroutine_threadsafe(submit(c2), event_loop).result()
f3 = asyncio.run_coroutine_threadsafe(submit(c3), event_loop).result()

# these should be waitable by submitting wait() to the event loop
done, pending = asyncio.run_coroutine_threadsafe(
    asyncio.wait([f1, f2, f3], return_when=asyncio.FIRST_COMPLETED),
    event_loop).result()
This answer was instrumental in helping answer this:
Create generator that yields coroutine results as the coroutines finish
asyncio.wait(...) can't take the concurrent.futures futures returned by run_coroutine_threadsafe; it only accepts asyncio awaitables. The correct way to do this is with a callback: when a future finishes, it can add itself to a thread-safe queue, and you can pull from that queue. The example below fixes the problem in the question:
import asyncio
import threading
import queue
import random

async def generate(i):
    await asyncio.sleep(random.randint(5, 10))
    return 'Hello {}'.format(i)

def run_loop(loop):
    asyncio.set_event_loop(loop)
    loop.run_forever()

def done(fut):
    q.put(fut)

event_loop = asyncio.get_event_loop()
threading.Thread(target=lambda: run_loop(event_loop)).start()

q = queue.Queue()

c1 = generate(1)
c2 = generate(2)
c3 = generate(3)

f1 = asyncio.run_coroutine_threadsafe(c1, event_loop)
f2 = asyncio.run_coroutine_threadsafe(c2, event_loop)
f3 = asyncio.run_coroutine_threadsafe(c3, event_loop)

f1.add_done_callback(done)
f2.add_done_callback(done)
f3.add_done_callback(done)

print(q.get().result())
print(q.get().result())
print(q.get().result())
Running on Windows 10, Python 3.6.3, running inside PyCharm IDE, this code:
import asyncio
import json
import datetime
import time
from aiohttp import ClientSession

async def get_tags():
    url_tags = f"{BASE_URL}tags?access_token={token}"
    async with ClientSession() as session:
        async with session.get(url_tags) as response:
            return await response.read()

async def get_trips(vehicles):
    url_trips = f"{BASE_URL}fleet/trips?access_token={token}"
    for vehicle in vehicles:
        body_trips = {"groupId": groupid, "vehicleId": vehicle['id'], "startMs": int(start_ms), "endMs": int(end_ms)}
        async with ClientSession() as session:
            async with session.post(url_trips, json=body_trips) as response:
                yield response.read()

async def main():
    tags = await get_tags()
    tag_list = json.loads(tags.decode('utf8'))['tags']
    veh = tag_list[0]['vehicles'][0:5]
    return [await v async for v in get_trips(veh)]

t1 = time.time()
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
t2 = time.time()
print(t2 - t1)
seems to be running completely synchronously; time increases linearly as the size of the loop increases. Following examples from a book I read, "Using Asyncio in Python 3", the code should be asynchronous; am I missing something here? Similar code in C# completes about 2,000 requests in a few seconds, while this takes about 14 s to run 20 requests (6 s to run 10).
Edit:
re-wrote some code:
async def get_trips(vehicle):
    url_trips = f"{BASE_URL}fleet/trips?access_token={token}"
    #for vehicle in vehicles:
    body_trips = {"groupId": groupid, "vehicleId": vehicle['id'], "startMs": int(start_ms), "endMs": int(end_ms)}
    async with ClientSession() as session:
        async with session.post(url_trips, json=body_trips) as response:
            res = await response.read()
            return res
t1 = time.time()
loop = asyncio.new_event_loop()
x = loop.run_until_complete(get_tags())
tag_list = json.loads(x.decode('utf8'))['tags']
veh = tag_list[0]['vehicles'][0:10]
tasks = []
for v in veh:
    tasks.append(loop.create_task(get_trips(v)))
loop.run_until_complete(asyncio.wait(tasks))
t2 = time.time()
print(t2 - t1)
This is in fact running asynchronously, but now I can't use the return value from my get_trips function, and I don't see a clear way to use it. Nearly all the tutorials I've seen just print the result, which is basically useless. I'm a little confused about how async is supposed to work in Python, and why some things with the async keyword attached to them run synchronously while others don't.
The simple question is: how do I add the return result of a task to a list or dictionary? The more advanced question: could someone explain why my code in the first example runs synchronously while the code in the second part runs asynchronously?
Edit 2:
replacing:
loop.run_until_complete(asyncio.wait(tasks))
with:
x = loop.run_until_complete(asyncio.gather(*tasks))
Fixes the simple problem; now just curious why the async list comprehension doesn't run asynchronously
now just curious why the async list comprehension doesn't run asynchronously
Because your comprehension iterates over an async generator that yields one coroutine at a time, and you await each one immediately, thus killing the parallelism. That is roughly equivalent to this:
for vehicle in vehicles:
    trips = await get_trips(vehicle)
    # do something with trips
To make it parallel, you can use wait or gather as you've already discovered, but those are not mandatory. As soon as you create a task, it will run in parallel. For example, this should work as well:
# step 1: create the tasks and store (task, vehicle) pairs in a list
tasks = [(loop.create_task(get_trips(v)), v)
         for v in vehicles]

# step 2: await them one by one, while others are running:
for t, v in tasks:
    trips = await t
    # do something with trips for vehicle v
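To see that the tasks really do overlap, here is a small self-contained sketch with a hypothetical get_trips that just sleeps instead of making HTTP requests; the total time is roughly one sleep, not three:

```python
import asyncio
import time

# Hypothetical stand-in for get_trips: each "request" just sleeps.
async def get_trips(v):
    await asyncio.sleep(0.2)
    return f'trips for {v}'

async def main():
    vehicles = ['a', 'b', 'c']
    # step 1: create all tasks up front, so they start running concurrently
    tasks = [(asyncio.create_task(get_trips(v)), v) for v in vehicles]
    # step 2: await them one by one while the others keep running
    return [(v, await t) for t, v in tasks]

start = time.monotonic()
results = asyncio.run(main())
elapsed = time.monotonic() - start
# the three 0.2 s "requests" overlap: elapsed is roughly 0.2 s, not 0.6 s
```

If you move the create_task call inside the await loop instead, the sleeps run back to back and the elapsed time triples.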
I'm trying to understand async workflows in F# but I found one part that I really don't understand.
The following code works fine:
let asynWorkflow = async {
    let! result = Stream.TryOpenAsync(partition) |> Async.AwaitTask
    return result
}

let stream = Async.RunSynchronously asynWorkflow
             |> fun openResult -> if openResult.Found then openResult.Stream else Stream(partition)
I define an async workflow where TryOpenAsync returns a Task<StreamOpenResult>. I convert it to Async<StreamOpenResult> with Async.AwaitTask. (Side question: "Await"Task? It doesn't await it, it just converts it, doesn't it? I think it has nothing to do with Task.Wait or the await keyword.) I "await" it with let! and return it.
To start the workflow I use RunSynchronously which should start the workflow and return the result (bind it). On the result I check if the Stream is Found or not.
But now to my first question. Why do I have to wrap the TryOpenAsync call in another async computation and let! ("await") it?
E.g. the following code does not work:
let asynWorkflow = Stream.TryOpenAsync(partition) |> Async.AwaitTask

let stream = Async.RunSynchronously asynWorkflow
             |> fun openResult -> if openResult.Found then openResult.Stream else Stream(partition)
I thought the AwaitTask makes it an Async<T> and RunSynchronously should start it. Then use the result. What do I miss?
My second question is why isn't there any "Async.Let!" function available? Maybe because it would not work; or better, why doesn't it work with the following code?
let ``let!`` task = async {
    let! result = task |> Async.AwaitTask
    return result
}

let stream = Async.RunSynchronously ( ``let!`` (Stream.TryOpenAsync(partition)) )
             |> fun openResult -> if openResult.Found then openResult.Stream else Stream(partition)
I just pass the TryOpenAsync result in as a parameter, but it does not work. By "does not work" I mean the whole FSI will hang, so it has something to do with my async/"await" usage.
--- Update:
Result of working code in FSI:
>
Real: 00:00:00.051, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
val asynWorkflow : Async<StreamOpenResult>
val stream : Stream
Result of not working code in FSI:
>
And you cannot execute anything in the FSI anymore
--- Update 2
I'm using Streamstone. Here the C# example: https://github.com/yevhen/Streamstone/blob/master/Source/Example/Scenarios/S04_Write_to_stream.cs
and here the Stream.TryOpenAsync: https://github.com/yevhen/Streamstone/blob/master/Source/Streamstone/Stream.Api.cs#L192
I can't tell you why the second example doesn't work without knowing what Stream and partition are and how they work.
However, I want to take this opportunity to point out that the two examples are not strictly equivalent.
F# async is kind of like a "recipe" for what to do. When you write async { ... }, the resulting computation is just sitting there, not actually doing anything. It's more like declaring a function than like issuing a command. Only when you "start" it by calling something like Async.RunSynchronously or Async.Start does it actually run. A corollary is that you can start the same async workflow multiple times, and it's going to be a new workflow every time. Very similar to how IEnumerable works.
C# Task, on the other hand, is more like a "reference" to an async computation that is already running. The computation starts as soon as you call Stream.TryOpenAsync(partition), and it's impossible to obtain a Task instance before the task actually starts. You can await the resulting Task multiple times, but each await will not result in a fresh attempt to open a stream. Only the first await will actually wait for the task's completion, and every subsequent one will just return you the same remembered result.
In the async/reactive lingo, F# async is what you call "cold", while C# Task is referred to as "hot".
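The same hot/cold distinction can be seen in Python, whose coroutine objects are also "cold". A rough illustration (unrelated to the F# code above, just the concept):

```python
import asyncio

# Like F# async, a Python coroutine object is "cold": calling the
# async function only builds the computation; nothing runs until it
# is handed to an event loop.
async def open_stream():
    return 'opened'  # stand-in for some asynchronous operation

coro = open_stream()        # no code inside open_stream has run yet
result = asyncio.run(coro)  # now it runs to completion

# Unlike F# async, though, a coroutine object is single-use: awaiting
# it a second time raises RuntimeError, whereas an F# workflow can be
# restarted as many times as you like.
```

A C#-style "hot" Task corresponds more closely to an asyncio Task, which starts running as soon as it is scheduled and caches its result.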
The second code block looks like it should work to me. It does run if I provide dummy implementations for Stream and StreamOpenResult.
You should avoid using Async.RunSynchronously wherever possible because it defeats the purpose of async. Put all of this code within a larger async block and then you will have access to the StreamOpenResult:
async {
    let! openResult = Stream.TryOpenAsync(partition) |> Async.AwaitTask
    let stream = if openResult.Found then openResult.Stream else Stream(partition)
    return () // now do something with the stream
}
You may need to put a Async.Start or Async.RunSynchronously at the very outer edge of your program to actually run it, but it's better if you have the async (or convert it to a Task) and pass it to some other code (e.g. a web framework) that can call it in a non-blocking manner.
Not that I want to answer your question with another question, but: why are you doing code like this anyway? That might help to understand it. Why not just:
let asyncWorkflow = async {
    let! result = Stream.TryOpenAsync(partition) |> Async.AwaitTask
    if result.Found then return result.Stream else return Stream(partition) }
There's little point in creating an async workflow only to immediately call RunSynchronously on it - it's similar to calling .Result on a Task - it just blocks the current thread until the workflow returns.
Is a loop.close() needed prior to returning async values in the below code?
import asyncio

async def request_url(url):
    return url

def fetch_urls(x):
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(asyncio.gather(*[request_url(url) for url in x]))
That is, should fetch_urls be like this instead?:
def fetch_urls(x):
    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(asyncio.gather(*[request_url(url) for url in x]))
    loop.close()
    return results
If the loop.close() is needed, then how can fetch_urls be called again without raising the exception: RuntimeError: Event loop is closed?
A previous post states that it is good practice to close loops and start new ones; however, it does not specify how new loops can be opened.
You can also keep the event loop alive, and close it at the end of your program, using run_until_complete more than once:
import asyncio

async def request_url(url):
    return url

def fetch_urls(loop, urls):
    tasks = [request_url(url) for url in urls]
    return loop.run_until_complete(asyncio.gather(*tasks))

loop = asyncio.get_event_loop()
try:
    print(fetch_urls(loop, ['a1', 'a2', 'a3']))
    print(fetch_urls(loop, ['b1', 'b2', 'b3']))
    print(fetch_urls(loop, ['c1', 'c2', 'c3']))
finally:
    loop.close()
No, the async function (request_url in this case) should not be closing the event loop. loop.run_until_complete will stop the event loop as soon as the future it is driving completes.
fetch_urls should be the second version: it gets an event loop, runs it until there is nothing left to do, and then closes it with loop.close().
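As a side note, on Python 3.7+ a simpler option is asyncio.run, which creates a fresh loop for each call and closes it afterwards, so repeated calls never hit "RuntimeError: Event loop is closed". A sketch, with request_url as in the question:

```python
import asyncio

async def request_url(url):
    return url  # stand-in for a real request

# asyncio.run spins up a new event loop per call and tears it down
# when the coroutine finishes, so there is no loop left to leak or
# to accidentally reuse after closing.
def fetch_urls(urls):
    async def gather_all():
        return await asyncio.gather(*(request_url(u) for u in urls))
    return asyncio.run(gather_all())

print(fetch_urls(['a1', 'a2']))  # first call: its own loop
print(fetch_urls(['b1']))        # works again: another fresh loop
```

The gather call is wrapped in a coroutine because on newer Python versions asyncio.gather must be called while a loop is running.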