Distributing workload - asynchronous

In my application I have a number of objects which can perform some long lasting calculations, let's call them clients. Also I have a number of objects which contain descriptions of tasks to be calculated. Something like this:
open System.Threading
let clients = [1..4]
let tasks = [1..20]
let calculate c t =
    printf "Starting task %d with client %d\n" t c
    Thread.Sleep(3000)
    printf "Finished task %d with client %d\n" t c
With one client I can only start one task at a time.
I want to create a function/class which will assign the tasks to the clients and perform the calculations. I've done this in C# using a queue of clients, so as soon as a new task is assigned to a client, this client is removed from the queue and when the calculations are finished, the client is released and is placed in the queue again. Now I'm interested in implementing this in a functional way. I've tried to experiment with asynchronous workflows, but I cannot think of a proper way to implement this.
Here's some F#-like code that I was trying to make work, but couldn't:
let rec distribute clients tasks calculate tasks_to_wait =
    match clients, tasks with
    | _ , [] -> () // No tasks - we're done!
    | [], th::tt -> // No free clients, but there are still tasks to calculate.
        let released_client = ?wait_for_one? tasks_to_wait
        distribute [released_client] tasks calculate ?tasks_to_wait?
    | ch::ct, th::tt -> // There are free clients.
        let task_to_wait() =
            do calculate ch th
            ch
        distribute ct tt calculate (task_to_wait::tasks_to_wait)
How do I do this? Is there a functional design pattern to solve this task?

There are various ways to do this. It would be perfectly fine to use some concurrent collection (like a queue) from .NET 4.0 from F#, as this is often the easiest thing to do, if the collection already implements the functionality you need.
The problem requires concurrent access to some resource, so it cannot be solved in a purely functional way, but F# provides agents, which give you a nice alternative way to solve the problem.
The following snippet implements an agent that schedules the work items. It uses its own mailbox to hold the available clients (which gives you a nice way to wait for the next available client). After the agent is created, you can just send it all the initial clients. It keeps iterating over the tasks while clients are available. When no client is available, it waits asynchronously (without blocking any thread) until some earlier processing completes and a client is sent back to the agent's mailbox:
let workloadAgent = MailboxProcessor.Start(fun agent ->
    // The body of the agent runs as a recursive loop that iterates
    // over the tasks and waits for a client before it starts processing
    let rec loop tasks = async {
        match tasks with
        | [] ->
            // No more work to schedule (but calculations that were
            // started earlier may still be running in the background)
            ()
        | task::tasks ->
            // Wait for the next client to become available by receiving the
            // next message from the inbox - the messages are clients
            let! client = agent.Receive()
            // Spawn processing of the task using the client
            async { calculate client task
                    // When the processing completes, send the client
                    // back to the agent, so that it can be reused
                    agent.Post(client) }
            |> Async.Start
            // Continue processing the rest of the tasks
            return! loop tasks }
    // Start the agent with the initial list of tasks
    loop tasks )
// Add all clients to the agent, so that it can start
for client in clients do workloadAgent.Post(client)
If you're not familiar with F# agents, then the MSDN section Server-Side Programming has some useful information.
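For readers more at home in Python, the same idea can be sketched with asyncio. This is only an analogy to the agent above, not a translation of it: an asyncio.Queue stands in for the agent's mailbox, and the names distribute, run and log are made up for this sketch. The calculation is simulated with a short sleep.

```python
import asyncio


async def calculate(client, task, log):
    # Hypothetical stand-in for the long-running calculation.
    log.append(("start", task, client))
    await asyncio.sleep(0.01)
    log.append(("finish", task, client))


async def distribute(clients, tasks):
    free = asyncio.Queue()            # plays the role of the agent's mailbox
    for client in clients:
        free.put_nowait(client)
    log, running = [], []

    async def run(client, task):
        try:
            await calculate(client, task, log)
        finally:
            free.put_nowait(client)   # release the client so it can be reused

    for task in tasks:
        client = await free.get()     # suspends without blocking a thread
        running.append(asyncio.create_task(run(client, task)))
    await asyncio.gather(*running)    # wait for the work still in flight
    return log


log = asyncio.run(distribute([1, 2, 3, 4], list(range(1, 21))))
```

As in the agent version, a task is only started once a client has been taken from the queue, so a client never runs two tasks at the same time.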

Related

start_server() method in python asyncio module

This question is really about the various coroutines in base_events.py and streams.py that deal with network connections, network servers and their higher-level equivalents under Streams. Since it's not really clear how to group these functions, I am going to use start_server() to explain what I don't understand about these coroutines and haven't found online (unless I missed something obvious).
When running the following code, I am able to create a server that is able to handle incoming messages from a client and I also periodically print out the number of tasks that the EventLoop is handling to see how the tasks work. What I'm surprised about is that after creating a server, the task is in the finished state not too long after the program starts. I expected that a task in the finished state was a completed task that no longer does anything other than pass back the results or exception.
However, this is clearly not the whole picture: the EventLoop is still running and handling incoming messages from clients, and the application is still running. monitor(), however, shows that all tasks are completed and that no new task is dispatched to handle a new incoming message.
So my question is this:
What is going on underneath asyncio that I am missing that explains the behavior I am seeing? For example, I would have expected a task (or tasks created for each message) that is handling incoming messages in the pending state.
Why is asyncio.Task.all_tasks() passing back finished tasks? I would have thought that once a task has completed it is garbage collected (so long as there are no other references to it).
I have seen similar behavior with the other asyncio functions like using create_connection() with a websocket from a site. I know at the end of these coroutines, their result is usually a tuple such as (reader, writer) or (transport, protocol) but I don't understand how it all ties together or what other documentation/code to read to give me more insight. Any help is appreciated.
import asyncio
from pprint import pprint
# start_server() calls the callback with (reader, writer) only
async def echo_message(reader, writer):
    data = await reader.read(1000)
    message = data.decode()
    addr = writer.get_extra_info('peername')
    print('Received %r from %r' % (message, addr))
    print('Send: %r' % message)
    writer.write(message.encode())
    await writer.drain()
    print('Close the client socket')
    writer.close()
async def monitor():
    while True:
        tasks = asyncio.Task.all_tasks()
        pprint(tasks)
        await asyncio.sleep(60)
if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.create_task(monitor())
    # loop is a keyword-only argument of start_server()
    loop.create_task(asyncio.start_server(echo_message, 'localhost', 7777, loop=loop))
    loop.run_forever()
Outputs:
###
# Soon after starting the application, monitor prints out:
###
{<Task pending coro=<start_server() running ...>,
<Task pending coro=<monitor() running ...>,
<Task pending coro=<BaseEventLoop._create_server_getaddrinfo() running ...>}
###
# After, things initialized and the server has started and the next print out is:
###
{<Task finished coro=<start_server() done ...>,
<Task pending coro=<monitor() running ...>,
<Task finished coro=<BaseEventLoop._create_server_getaddrinfo() done ...>}
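The observation above can be reproduced in a small, self-contained sketch. start_server() is a coroutine that completes as soon as the listening socket is set up and returns a Server object, so the task wrapping it finishes early while the server keeps accepting connections through the event loop. The sketch assumes Python 3.7 or later (for asyncio.run and Server.is_serving), binds to port 0 so the OS picks a free port, and uses a line-based echo handler that is not the handler from the question.

```python
import asyncio


async def echo(reader, writer):
    # Echo one line back to the client, then close the connection.
    line = await reader.readline()
    writer.write(line)
    await writer.drain()
    writer.close()


async def main():
    # Port 0 asks the OS for any free port (an assumption of this sketch).
    setup = asyncio.create_task(asyncio.start_server(echo, "127.0.0.1", 0))
    server = await setup              # the task's result is the Server object
    port = server.sockets[0].getsockname()[1]
    state = {"setup_done": setup.done(), "serving": server.is_serving()}

    # The setup task is finished, yet a client can still connect, because
    # the listening socket stays registered with the event loop.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"ping\n")
    await writer.drain()
    state["reply"] = await reader.readline()
    writer.close()
    await writer.wait_closed()

    server.close()
    await server.wait_closed()
    return state


state = asyncio.run(main())
```

Here the setup task is done while the server is still serving, which matches the "finished" state printed by monitor() in the question.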

AzureFunctions: How to make a sync call to a QueueTrigger Function for an automated functional test

Let me introduce the scenario:
I need to test an AzureFunction with a queue trigger:
[FunctionName("AFunction")]
public async Task DispatchAction([QueueTrigger("queuename")] string message)
{
    await DoMyLogicAsync();
}
The test needs to be run by the "functional-test-container" in my docker-compose testing env, which is made up of:
a) functional-test-container: a .net core container running an nUnit test suite
b) azure-function-container: this container hosts the azure function
c) azurite-container: this container hosts the queue server
d) sql-server-container
e) wiremock-container
The test logic is the following:
1. Clear the sql database, the queue and wiremock status
2. Prepare the wiremock stubs
3. Somehow trigger the function
4. Wait for the function to end
5. Make assertions on what the function produced in sql server, in the queue and on what wiremock's stubs have been called
As far as I know I have 2 ways of triggering the function:
a) pushing a message in the queue
b) using azure function's admin API /admin/functions/afunction
The problem is that neither of them gives any hint as to when the function ends its execution.
Here is my question: is there a way to call the function in a "sync" way (so that I can know when the execution ends)?
I don't think this can be done. A queue-triggered function runs as an instance on the Azure side, and all you can do is trigger it. Unlike an HttpTrigger function, it doesn't return any data to the caller, so it can't be executed synchronously as part of your test process.
To work around this, you could add an operation at the end of your function that signals that its execution has completed, and have the test wait for that signal. Another option is to move the steps that follow the function into the function code itself.
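On the test side, waiting for such a completion signal usually comes down to polling with a timeout. The sketch below shows the polling loop only, in Python rather than the C# of the question; wait_until and marker_exists are hypothetical names, and the marker check is simulated instead of querying a real queue or database.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll predicate() until it is truthy or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return bool(predicate())          # one last check at the deadline


# Hypothetical stand-in for "the function wrote its completion marker":
# in this simulation the marker appears on the third poll.
polls = {"count": 0}


def marker_exists():
    polls["count"] += 1
    return polls["count"] >= 3


done = wait_until(marker_exists, timeout=2.0, interval=0.01)
timed_out = wait_until(lambda: False, timeout=0.05, interval=0.01)
```

In a real test the predicate would check whatever the function writes last, for example a row in the database or a message on a dedicated "done" queue, and the assertions would run only once it returns True.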

Why do I get a deadlock when using Tokio with a std::sync::Mutex?

I stumbled upon a deadlock condition when using Tokio:
use tokio::time::{delay_for, Duration};
use std::sync::Mutex;
#[tokio::main]
async fn main() {
    let mtx = Mutex::new(0);
    tokio::join!(work(&mtx), work(&mtx));
    println!("{}", *mtx.lock().unwrap());
}
async fn work(mtx: &Mutex<i32>) {
    println!("lock");
    {
        let mut v = mtx.lock().unwrap();
        println!("locked");
        // slow redis network request
        delay_for(Duration::from_millis(100)).await;
        *v += 1;
    }
    println!("unlock")
}
Produces the following output, then hangs forever.
lock
locked
lock
According to the Tokio docs, using std::sync::Mutex is ok:
Contrary to popular belief, it is ok and often preferred to use the ordinary Mutex from the standard library in asynchronous code.
However, replacing the Mutex with a tokio::sync::Mutex will not trigger the deadlock, and everything works "as intended", but only in the example case listed above. In a real world scenario, where the delay is caused by some Redis request, it will still fail.
I think it might be because I am actually not spawning threads at all, and therefore, even though executed "in parallel", I will lock on the same thread as await just yields execution.
What is the Rustacean way to achieve what I want without spawning a separate thread?
The reason why it is not OK to use a std::sync::Mutex here is that you hold it across the .await point. In this case:
Task 1 holds the Mutex, but got suspended on delay_for.
Task 2 gets scheduled and runs, but cannot obtain the Mutex since it's still owned by task 1. It will block synchronously on obtaining the Mutex.
Since task 2 is blocked, the runtime thread is fully blocked as well. It cannot go into its timer handling state (which happens when the runtime is idle and not handling user tasks), and therefore cannot resume task 1.
Therefore you now are observing a deadlock.
==> If you need to hold a Mutex across an .await point you have to use an async Mutex. Synchronous Mutexes are ok to use with async programs as the tokio documentation describes - but they may not be held across .await points.
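The same rule can be illustrated in Python's asyncio, used here purely as an analogy to the Tokio case: asyncio.Lock plays the role of the async Mutex and threading.Lock the role of std::sync::Mutex. The sketch avoids an actual hang by probing the thread lock with a non-blocking acquire; a blocking acquire at that point would stall the only thread that runs the loop.

```python
import asyncio
import threading


async def work(lock, counter):
    async with lock:                  # async lock: waiting yields to the loop
        await asyncio.sleep(0.01)     # stand-in for the slow network request
        counter["value"] += 1


async def main():
    counter = {"value": 0}
    await_safe_lock = asyncio.Lock()
    await asyncio.gather(work(await_safe_lock, counter),
                         work(await_safe_lock, counter))

    # A thread lock held across an await stays locked while its task is
    # suspended, so a second task on the same thread cannot take it.
    sync_lock = threading.Lock()

    async def holder():
        with sync_lock:
            await asyncio.sleep(0.01)

    task = asyncio.create_task(holder())
    await asyncio.sleep(0)            # let holder() take the lock and suspend
    could_acquire = sync_lock.acquire(blocking=False)
    await task
    return counter["value"], could_acquire


total, could_acquire = asyncio.run(main())
```

Both workers complete with the async lock, while the probe of the thread lock fails for as long as its holder is suspended.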

How to make async requests using HTTPoison?

Background
We have an app that deals with a considerable amount of requests per second. This app needs to notify an external service, by making a GET call via HTTPS to one of our servers.
Objective
The objective here is to use HTTPoison to make async GET requests. I don't really care about the response of the requests, all I care is to know if they failed or not, so I can write any possible errors into a logger.
If it succeeds I don't want to do anything.
Research
I have checked the official documentation for HTTPoison and I see that they support async requests:
https://hexdocs.pm/httpoison/readme.html#usage
However, I have 2 issues with this approach:
They use flush to show the request was completed. I can't log in to the app and manually flush to see how the requests are going; that would be insane.
They don't show any notifications mechanism for when we get the responses or errors.
So, I have a simple question:
How do I get asynchronously notified that my request failed or succeeded?
I assume that the default HTTPoison.get is synchronous, as shown in the documentation.
This could be achieved by spawning a new process per-request. Consider something like:
notify = fn response ->
  # Any handling logic - write to DB? Send a message to another process?
  # Here, I'll just print the result
  IO.inspect(response)
end
spawn(fn ->
  resp = HTTPoison.get("http://google.com")
  notify.(resp)
end) # spawn will not block, so it will attempt to execute the next spawn straight away
spawn(fn ->
  resp = HTTPoison.get("http://yahoo.com")
  notify.(resp)
end) # This will be executed immediately after the previous `spawn`
Please take a look at the documentation of spawn/1 I've pointed out here.
Hope that helps!
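The "start the request in the background and get notified only on failure" pattern from the answer can also be sketched in Python's asyncio, as an analogy to the Elixir processes rather than an HTTPoison example. fake_get is a hypothetical stand-in for a real HTTP call and fails on purpose for one URL; the URLs are placeholders.

```python
import asyncio
import functools
import logging

logger = logging.getLogger("notifier")


async def fake_get(url):
    # Hypothetical stand-in for an HTTP GET; fails for one URL on purpose.
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise ConnectionError(f"cannot reach {url}")
    return 200


def notify(url, failures, task):
    # Called when the request task finishes; only failures are recorded.
    error = task.exception()
    if error is not None:
        logger.error("request to %s failed: %s", url, error)
        failures.append(url)


async def main(urls):
    failures, tasks = [], []
    for url in urls:
        task = asyncio.create_task(fake_get(url))    # does not block the loop
        task.add_done_callback(functools.partial(notify, url, failures))
        tasks.append(task)
    # Only so the sketch can exit cleanly; a long-lived app would not wait.
    await asyncio.wait(tasks)
    return failures


failures = asyncio.run(main(["http://ok.example/ping",
                             "http://bad.example/ping"]))
```

Successful requests trigger nothing, and each failed one is logged once by the callback, which is the behaviour asked for in the question.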

F# Http.AsyncRequestStream just 'hangs' on long queries

I am working with:
let callTheAPI = async {
    printfn "\t\t\tMAKING REQUEST at %s..." (System.DateTime.Now.ToString("yyyy-MM-ddTHH:mm:ss"))
    let! response = Http.AsyncRequestStream(url,query,headers,httpMethod,requestBody)
    printfn "\t\t\t\tREQUEST MADE."
}
And
let cts = new System.Threading.CancellationTokenSource()
let timeout = 1000*60*4 // 4 minutes (4 mins no grace)
cts.CancelAfter(timeout)
Async.RunSynchronously(callTheAPI,timeout,cts.Token)
use respStrm = response.ResponseStream
respStrm.Flush()
writeLinesTo output (responseLines respStrm)
This calls a web API (REST), and the let! response = Http.AsyncRequestStream(url,query,headers,httpMethod,requestBody) line just hangs on certain queries, particularly ones that take a long time (>4 minutes). This is why I made it async and added a 4-minute timeout. (I collect the calls that time out and make them again with smaller time range parameters.)
I started with Http.RequestStream from FSharp.Data, but I couldn't add a timeout to it, so the script would just 'hang'.
I have looked at the API's IIS server and the application pool Worker Process active requests in IIS manager and I can see the requests come in and go again. They then 'vanish' and the F# script hangs. I can't find an error message anywhere on the script side or server side.
I included the Flush() and removed the timeout and it still hung. (Removing the Async in the process)
Additional:
Successful calls are made. Failed calls can be followed by successful calls. However, it seems to get to a point where all the calls time out, and they do so without even reaching the server any more. (Worker Process Active Requests doesn't show the query.)
Update:
I made the Fsx script output the queries and ran them through IRM with no issues (I have a timeout and it never locks up). I have a suspicion that there is an issue with FSharp.Data.Http.
Async.RunSynchronously blocks. Read the remarks section in the docs: RunSynchronously. Instead, use Async.AwaitTask.

Resources