Let's say I have 10 domains, but every domain needs a delay between requests (to avoid DoS situations and IP bans).
I was thinking about async Twisted calling a class: a request made with the requests module gets delay(500), then another request to the same domain makes it delay(250), and so on.
How do I achieve that static delay, and store something like a queue for every domain (class)?
It's a custom web scraper; Twisted is TCP, but that shouldn't make a difference. I don't want the code, just the knowledge.
While using asyncio for async:
import asyncio

async def nested(x):
    print(x)
    await asyncio.sleep(1)

async def main():
    # Schedule nested() to run soon concurrently with main().
    for x in range(100):
        await asyncio.sleep(1)
        task = asyncio.create_task(nested(x))
        # "task" can now be used to cancel nested(), or can simply be
        # awaited to wait until it is complete:
        await task

asyncio.run(main())
With await task in main(), it will print every 2 s.
Without the await asyncio.sleep(1) in nested(), it will print every 1 s.
Without await task in main(), it will print with no delay at all, even though asyncio.sleep is declared.
This is genuinely hard to maintain if you are new to async.
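For the per-domain delay from the original question, one possible shape (a minimal sketch, not the only design) is to keep one throttle object per domain, so requests to the same domain queue up behind a lock while different domains proceed concurrently. The DomainThrottle class, the fetch() helper, and the 0.5 s delay below are all illustrative names and values; the real HTTP call is left as a print:

import asyncio
import time

class DomainThrottle:
    """Serializes requests to one domain and enforces a minimum delay between them."""
    def __init__(self, delay):
        self.delay = delay
        self.lock = asyncio.Lock()  # one in-flight request per domain
        self.last = 0.0

    async def wait(self):
        async with self.lock:
            remaining = self.delay - (time.monotonic() - self.last)
            if remaining > 0:
                await asyncio.sleep(remaining)
            self.last = time.monotonic()

throttles = {}  # domain -> DomainThrottle

async def fetch(domain, path):
    # Each domain gets its own throttle; the 0.5 s delay is a made-up example value.
    throttle = throttles.setdefault(domain, DomainThrottle(delay=0.5))
    await throttle.wait()
    print(f"GET {domain}{path}")  # the real request would go here

async def main():
    await asyncio.gather(*(fetch("example.com", f"/page/{i}") for i in range(3)),
                         *(fetch("example.org", f"/page/{i}") for i in range(3)))

asyncio.run(main())

Requests to example.com are spaced 0.5 s apart, while example.org runs on its own independent schedule.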
This question is really about the various coroutines in base_events.py and streams.py that deal with Network Connections, Network Servers, and their higher-level API equivalents under Streams. Since it's not really clear how to group these functions, I am going to use start_server() to explain what I don't understand about these coroutines and haven't found online (unless I missed something obvious).
When running the following code, I am able to create a server that handles incoming messages from a client, and I also periodically print out the tasks that the EventLoop is handling to see how the tasks work. What I'm surprised about is that the server's task is in the finished state not long after the program starts. I expected a task in the finished state to be a completed task that no longer does anything other than pass back its result or exception.
However, of course this is not true: the EventLoop is still running and handling incoming messages from clients, and the application is still running. The monitor, however, shows that all tasks are completed and that no new task is dispatched to handle a new incoming message.
So my question is this:
1. What is going on underneath asyncio that I am missing, that explains the behavior I am seeing? For example, I would have expected a task (or a task created per message) handling incoming messages to be in the pending state.
2. Why is asyncio.Task.all_tasks() passing back finished tasks? I would have thought that once a task has completed it is garbage collected (as long as nothing else holds a reference to it).
I have seen similar behavior with other asyncio functions, like using create_connection() with a websocket from a site. I know that at the end of these coroutines the result is usually a tuple such as (reader, writer) or (transport, protocol), but I don't understand how it all ties together or what other documentation/code to read for more insight. Any help is appreciated.
import asyncio
from pprint import pprint

async def echo_message(reader, writer):
    data = await reader.read(1000)
    message = data.decode()
    addr = writer.get_extra_info('peername')
    print('Received %r from %r' % (message, addr))

    print('Send: %r' % message)
    writer.write(message.encode())
    await writer.drain()

    print('Close the client socket')
    writer.close()

async def monitor():
    # Periodically print every task the event loop knows about.
    while True:
        tasks = asyncio.Task.all_tasks()
        pprint(tasks)
        await asyncio.sleep(60)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.create_task(monitor())
    loop.create_task(asyncio.start_server(echo_message, 'localhost', 7777, loop=loop))
    loop.run_forever()
Outputs:
###
# Soon after starting the application, monitor prints out:
###
{<Task pending coro=<start_server() running ...>,
<Task pending coro=<monitor() running ...>,
<Task pending coro=<BaseEventLoop._create_server_getaddrinfo() running ...>}
###
# After, things initialized and the server has started and the next print out is:
###
{<Task finished coro=<start_server() done ...>,
<Task pending coro=<monitor() running ...>,
<Task finished coro=<BaseEventLoop._create_server_getaddrinfo() done ...>}
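What the output shows is that the start_server() coroutine only binds the listening socket and then returns a Server object; accepting connections and running echo_message() for each client is done by the loop and that Server, not by the start_server() task, which is why that task finishes while the application keeps serving. Finished tasks can still show up because the old Task.all_tasks() API was backed by weak references, so completed tasks appear until they are garbage collected. As a sketch of the same server with its lifetime made explicit (using the modern asyncio.run idiom, which is my rewording rather than the asker's code):

import asyncio

async def main():
    # start_server() completes as soon as the socket is listening; the
    # returned Server object is what keeps handling clients afterwards.
    # echo_message is the handler defined in the question above.
    server = await asyncio.start_server(echo_message, 'localhost', 7777)
    async with server:
        await server.serve_forever()  # hold the server open explicitly

asyncio.run(main())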
I have a resource-intensive async method that I want to run as a background task. Example code for it looks like this:
@staticmethod
async def trigger_task(id: str, run_e2e: bool = False):
    try:
        add_status_for_task(id)
        result1, result2 = await task(id)
        update_status_for_task(id, result1, result2)
    except Exception:
        update_status_for_task(id, 'FAIL')

@router.post("/task")
async def trigger_task(background_tasks: BackgroundTasks):
    background_tasks.add_task(EventsTrigger.trigger_task)
    return {'msg': 'Task submitted!'}
When I trigger this endpoint, I expect an instant response: {'msg': 'Task submitted!'}. But instead the API response is awaited until the task completes. I am following this documentation from FastAPI.
fastapi: v0.70.0
python: v3.8.10
I believe the issue is similar to what is described here.
I would appreciate help in making this a non-blocking call.
What I have learned from the GitHub issues:
You can't use async def for task functions (which will run in the background).
Since you can't access the coroutine from the background process, your async/await will not work.
You can still try it without async/await; if that also doesn't work, then you should go for an alternative.
Alternative Background Solution
Celery is a production-ready task queue, so you can easily configure it and run background tasks with your_task_function.delay(*args, **kwargs)
Note that Celery also doesn't support async in background tasks, so whatever you write to run in the background needs to be sync code.
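For illustration, a minimal Celery sketch (the broker URL, module layout, and the run_the_task() helper are my assumptions, not from the question; the status helpers are the ones from the original code, assumed importable):

from celery import Celery

# Broker URL is illustrative; any broker Celery supports will do.
app = Celery('tasks', broker='redis://localhost:6379/0')

@app.task
def trigger_task(id, run_e2e=False):
    # Celery task bodies are plain sync code - no async/await here.
    add_status_for_task(id)                       # helper from the question
    result1, result2 = run_the_task(id)           # hypothetical sync version of task()
    update_status_for_task(id, result1, result2)  # helper from the question

The endpoint then just enqueues the work with trigger_task.delay(task_id) and returns immediately.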
Good Luck :)
Unfortunately you seem to have oversimplified your example, so it is a little hard to tell what is going wrong.
But the important question is: are add_status_for_task() or update_status_for_task() blocking? Because if they are (and it seems like that is the case), then obviously you're going to have issues. When you run code with async/await, all the code inside it needs to be async as well.
This would make your code look more like:
async def trigger_task(id: str, run_e2e: bool = False):
    try:
        await add_status_for_task(id)
        result1, result2 = await task(id)
        await update_status_for_task(id, result1, result2)
    except Exception:
        await update_status_for_task(id, 'FAIL')

@router.post("/task/{task_id}")
async def trigger_task(task_id: str, background_tasks: BackgroundTasks):
    background_tasks.add_task(EventsTrigger.trigger_task, task_id)
    return {'msg': 'Task submitted!'}
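If add_status_for_task() and update_status_for_task() cannot be rewritten as coroutines, another option (my suggestion, not part of the answer above) is to push the blocking calls onto a thread so the event loop stays responsive:

import asyncio

async def trigger_task(id: str, run_e2e: bool = False):
    loop = asyncio.get_running_loop()
    try:
        # Run the blocking helpers in the default thread pool executor so
        # the event loop is free to serve other requests in the meantime.
        await loop.run_in_executor(None, add_status_for_task, id)
        result1, result2 = await task(id)
        await loop.run_in_executor(None, update_status_for_task, id, result1, result2)
    except Exception:
        await loop.run_in_executor(None, update_status_for_task, id, 'FAIL')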
How are you running your app?
According to the uvicorn docs, it runs with one worker by default, which means only a single process handles requests at a time.
Try configuring uvicorn to run with more workers:
https://www.uvicorn.org/deployment/
$ uvicorn example:app --port 5000 --workers THE_AMOUNT_OF_WORKERS
or
uvicorn.run("example:app", host="127.0.0.1", port=5000, workers=THE_AMOUNT_OF_WORKERS)
I am running a development server locally
python manage.py runserver 8000
Then I run a script which connects to the Consumer below:
from channels.generic.websocket import AsyncJsonWebsocketConsumer

class MyConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        import time
        time.sleep(99999999)
        await self.accept()
Everything runs fine, and the consumer sleeps for a long time as expected. However, I am not able to access http://127.0.0.1:8000/ from the browser.
The problem is bigger in real life, since the consumer needs to make an HTTP request to the same server, and it essentially ends up in a deadlock.
Is this the expected behaviour? How do I allow calls to my server while a slow consumer is running?
Since this is an async function, you should be using asyncio's sleep.
import asyncio
from channels.generic.websocket import AsyncJsonWebsocketConsumer

class MyConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        await asyncio.sleep(99999999)
        await self.accept()
If you use time.sleep, you will put the entire Python thread to sleep.
This also applies when you make your upstream HTTP request: you need to use an asyncio HTTP library, not a synchronous one. (Basically, you should be awaiting anything that is expected to take any time.)
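For example, here is a sketch of the upstream request using httpx, one asyncio-friendly HTTP client (aiohttp would work the same way); the URL and the payload sent back over the websocket are made up for illustration:

import httpx
from channels.generic.websocket import AsyncJsonWebsocketConsumer

class MyConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        await self.accept()
        # The awaits yield control to the event loop, so the rest of the
        # server stays responsive while this request is in flight.
        async with httpx.AsyncClient() as client:
            response = await client.get('http://127.0.0.1:8000/some-endpoint/')
        await self.send_json({'upstream_status': response.status_code})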
I am working with:
let callTheAPI = async {
    printfn "\t\t\tMAKING REQUEST at %s..." (System.DateTime.Now.ToString("yyyy-MM-ddTHH:mm:ss"))
    let! response = Http.AsyncRequestStream(url,query,headers,httpMethod,requestBody)
    printfn "\t\t\t\tREQUEST MADE."
}
And
let cts = new System.Threading.CancellationTokenSource()
let timeout = 1000*60*4 // 4 minutes (4 mins, no grace)
cts.CancelAfter(timeout)
Async.RunSynchronously(callTheAPI, timeout, cts.Token)

use respStrm = response.ResponseStream
respStrm.Flush()
writeLinesTo output (responseLines respStrm)
To call a web API (REST). The let! response = Http.AsyncRequestStream(url,query,headers,httpMethod,requestBody) line just hangs on certain queries, particularly ones that take a long time (>4 minutes). This is why I have made it Async and put a 4-minute timeout on it. (I collect the calls that time out and remake them with smaller time-range parameters.)
I started with Http.RequestStream from FSharp.Data first, but I couldn't add a timeout to it, so the script would just 'hang'.
I have looked at the API's IIS server and the application pool Worker Process active requests in IIS Manager, and I can see the requests come in and go again. They then 'vanish' and the F# script hangs. I can't find an error message anywhere, on the script side or the server side.
I included the Flush() and removed the timeout (removing the Async in the process), and it still hung.
Additional:
Successful calls are made, and failed calls can be followed by successful calls. However, it seems to reach a point where all the calls time out, and they do so without even reaching the server any more (Worker Process Active Requests doesn't show the query).
Update:
I made the .fsx script output the queries and ran them through IRM with no issues (I have a timeout there and it never locks up). I suspect there is an issue with FSharp.Data.Http.
Async.RunSynchronously blocks. Read the remarks section in the docs: RunSynchronously. Instead, use Async.AwaitTask.
In my application I have a number of objects which can perform long-lasting calculations; let's call them clients. I also have a number of objects which contain descriptions of tasks to be calculated. Something like this:
open System.Threading // needed for Thread.Sleep

let clients = [1..4]
let tasks = [1..20]

let calculate c t =
    printf "Starting task %d with client %d\n" t c
    Thread.Sleep(3000)
    printf "Finished task %d with client %d\n" t c
With one client I can only start one task at a time.
I want to create a function/class which will assign the tasks to the clients and perform the calculations. I've done this in C# using a queue of clients: as soon as a task is assigned to a client, that client is removed from the queue, and when the calculation finishes, the client is released and placed in the queue again. Now I'm interested in implementing this in a functional way. I've tried to experiment with asynchronous workflows, but I cannot think of a proper way to implement this.
Here's F#-like code that I was trying to make work, but couldn't:
let rec distribute clients tasks calculate tasks_to_wait =
    match clients, tasks with
    | _, [] -> () // No tasks - we're done!
    | [], th::tt -> // No free clients, but there are still tasks to calculate.
        let released_client = ?wait_for_one? tasks_to_wait
        distribute [released_client] tasks calculate ?tasks_to_wait?
    | ch::ct, th::tt -> // There are free clients.
        let task_to_wait() =
            do calculate ch th
            ch
        distribute ct tt calculate (task_to_wait::tasks_to_wait)
How do I do this? Is there a functional design pattern to solve this task?
There are various ways to do this. It would be perfectly fine to use some concurrent collection (like a queue) from .NET 4.0 in F#, as this is often the easiest thing to do if the collection already implements the functionality you need.
The problem requires concurrent access to some resource, so it cannot be solved in a purely functional way, but F# provides agents, which give you a nice alternative way to solve the problem.
The following snippet implements an agent that schedules the work items. It uses its own mailbox to keep the available clients (which gives you a nice way to wait for the next available client). After the agent is created, you can just send it all the initial clients. It will keep iterating over the tasks while clients are available. When no client is available, it will block (asynchronously, without blocking threads) until some previous processing completes and a client is sent back to the agent's mailbox:
let workloadAgent = MailboxProcessor.Start(fun agent ->
    // The body of the agent runs as a recursive loop that iterates
    // over the tasks and waits for a client before it starts processing
    let rec loop tasks = async {
        match tasks with
        | [] ->
            // No more work to schedule (but there may still be calculations,
            // started earlier, going on in the background)
            ()
        | task::tasks ->
            // Wait for the next client to become available by receiving the
            // next message from the inbox - the messages are clients
            let! client = agent.Receive()
            // Spawn processing of the task using the client
            async { calculate client task
                    // When the processing completes, send the client
                    // back to the agent, so that it can be reused
                    agent.Post(client) }
            |> Async.Start
            // Continue processing the rest of the tasks
            return! loop tasks }
    // Start the agent with the initial list of tasks
    loop tasks)

// Add all clients to the agent, so that it can start
for client in clients do workloadAgent.Post(client)
If you're not familiar with F# agents, then the MSDN section Server-Side Programming has some useful information.