Discord.py: Reddit API Request takes a long time - asynchronous

I am currently programming a Discord bot using discord.py, aiohttp and asyncpraw to work with Reddit API requests. My problem is that every request takes a long time to respond. Do you have any suggestions for how to improve the speed of my code or of the API request?
When the /gif command is used, this function gets called:
# Function for a GIF from r/gifs
async def _init_command_gif_response(interaction: Interaction):
    """A function to send a random gif using reddit api"""
    # Respond in the console that the command has been run
    print(f"> {interaction.guild} : {interaction.user} used the gif command.")
    # Tell Discord that Request takes some time
    await interaction.response.defer()
    try:
        submission = await _reddit_api_request(interaction, "gifs")
        await interaction.followup.send(submission.url)
    except Exception:
        # format_exc() returns the traceback as a string; print_exc() returns None
        print(f" > Exception occurred processing gif: {traceback.format_exc()}")
        return await interaction.followup.send("Exception occurred processing gif. Please contact <#164129430766092289> when this happened.")
This in turn calls the following function to make the Reddit API request:
# Reddit API Function
async def _reddit_api_request(interaction: Interaction, subreddit_string: str):
    try:
        #async with aiohttp.ClientSession(trust_env=True) as session:
        async with aiohttp.ClientSession() as session:
            reddit = asyncpraw.Reddit(
                client_id = config_data.get("reddit_client_id"),
                client_secret = config_data.get("reddit_client_secret"),
                redirect_uri = config_data.get("reddit_redirect_uri"),
                requestor_kwargs = {"session": session},
                user_agent = config_data.get("reddit_user_agent"),
                check_for_async=False)
            reddit.read_only = True
            # Check if Subreddit exists
            try:
                subreddit = [sub async for sub in reddit.subreddits.search_by_name(subreddit_string, exact=True)]
            except asyncprawcore.exceptions.NotFound:
                print(f" > Exception: Subreddit \"{subreddit_string}\" not found")
                await interaction.followup.send(f"Subreddit \"{subreddit_string}\" does not exist!")
                raise
            except asyncprawcore.exceptions.ServerError:
                print(f" > Exception: Reddit Server not reachable")
                await interaction.followup.send(f"Reddit Server not reachable!")
                raise
            # Respond with content from reddit
            return await subreddit[0].random()
    except Exception:
        raise
My goal is to speed up the Discord response. Every other command that does not use the Reddit API is snappy, so the delay must come from my _reddit_api_request function.
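Before changing anything, it may help to measure which awaited step actually takes the time. The function above creates a new aiohttp session and a new Reddit instance on every call and then makes two round trips (the subreddit search and the random submission), so each of those is a candidate. The following is only a minimal timing sketch using the standard library; fake_create_client and fake_fetch_random are hypothetical stand-ins for the real asyncpraw calls, and the sleep durations are invented.

```python
import asyncio
import time
from contextlib import contextmanager

timings = {}  # label -> elapsed seconds


@contextmanager
def timed(label):
    """Record how long the wrapped block took under the given label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


async def fake_create_client():
    # Hypothetical stand-in for building the session and Reddit instance.
    await asyncio.sleep(0.05)
    return object()


async def fake_fetch_random(client):
    # Hypothetical stand-in for the subreddit lookup and random() call.
    await asyncio.sleep(0.1)
    return "https://example.com/some.gif"


async def main():
    with timed("create client"):
        client = await fake_create_client()
    with timed("fetch submission"):
        url = await fake_fetch_random(client)
    return url


url = asyncio.run(main())
for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

If the client setup turns out to dominate, creating it once at startup and reusing it across commands is one thing to try; if the fetch dominates, the delay is more likely on the network or Reddit side.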
Full Source Code can be found on Github

Related

page.close() not working as expected in Playwright and asyncio

I have written a web scraper which needs to scrape a few hundred pages asynchronously in Playwright-Python after login.
I came across aiometer by Florimond Manca (https://github.com/florimondmanca/aiometer) to limit requests in the main async function, and this works well.
The problem I'm having at the moment is closing the pages after they've been scraped. The async function keeps increasing the number of loaded pages, as it should, but memory consumption grows significantly once a few hundred are loaded.
In the function I'm opening a browser context and passing that to each async scraping request per page, the rationale being that it decreases memory overhead and it conserves the state from my login function (implemented in my main script - not shown).
How can I close the pages after being scraped (in the scrape function)?
import asyncio
import functools
from playwright.async_api import async_playwright
from bs4 import BeautifulSoup
import pandas as pd
import aiometer

urls = [
    "https://scrapethissite.com/pages/ajax-javascript/#2015",
    "https://scrapethissite.com/pages/ajax-javascript/#2014",
    "https://scrapethissite.com/pages/ajax-javascript/#2013",
    "https://scrapethissite.com/pages/ajax-javascript/#2012",
    "https://scrapethissite.com/pages/ajax-javascript/#2011",
    "https://scrapethissite.com/pages/ajax-javascript/#2010"
]

async def scrape(context, url):
    page = await context.new_page()
    await page.goto(url)
    await page.wait_for_load_state(state="networkidle")
    await page.wait_for_timeout(1000)
    #Getting results off the page
    html = await page.content()
    soup = BeautifulSoup(html, "lxml")
    tables = soup.find_all('table')
    dfs = pd.read_html(str(tables))
    df=dfs[0]
    print("Dataframe in page "+url+ " scraped")
    page.close
    return df

async def main(urls):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()
        master_results = pd.DataFrame()
        async with aiometer.amap(
            functools.partial(scrape, context),
            urls,
            max_at_once=5, # Limit maximum number of concurrently running tasks.
            max_per_second=3, # Limit request rate to not overload the server.
        ) as results:
            async for data in results:
                print(data)
                master_results = pd.concat([master_results,data], ignore_index=True)
        print(master_results)

asyncio.run(main(urls))
I've tried adding the await keyword before page.close() or context.close(), but that throws an error: "TypeError: object method can't be used in 'await' expression".
After reading a few pages, including the Playwright issue tracker on GitHub (https://github.com/microsoft/playwright/issues/10476), I found the problem:
I forgot to add parentheses to my page.close call, so the method was never actually called. With the async API the call also has to be awaited:
await page.close()
So simple, and yet it took me hours to get to. Probably part of learning to code.
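To make the difference concrete, here is a small self-contained sketch. FakePage is a hypothetical stand-in for a Playwright page (it only tracks whether close was awaited), so the example runs without a browser. It also shows one possible way to guarantee the page is closed even if scraping raises, by putting the close in a finally block.

```python
import asyncio


class FakePage:
    """Hypothetical stand-in for a Playwright page; only tracks closed state."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


async def scrape_without_call(page):
    page.close  # only looks up the method; nothing is called or awaited
    return "scraped"


async def scrape_with_finally(page):
    try:
        return "scraped"
    finally:
        # Runs on success and on error, so the page never stays open.
        await page.close()


async def main():
    leaked, closed = FakePage(), FakePage()
    await scrape_without_call(leaked)
    await scrape_with_finally(closed)
    return leaked.closed, closed.closed


leaked_closed, properly_closed = asyncio.run(main())
print(leaked_closed, properly_closed)  # prints: False True
```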

Is it possible to integrate GCP pub/sub StreamingPullFutures with discordpy?

I'd like to use a pub/sub StreamingPullFuture subscription with discordpy to receive instructions for removing users and sending updates to different servers.
Ideally, I would start this function when starting the discordpy server:
@bot.event
async def on_ready():
    print(f'{bot.user} {bot.user.id}')
    await pub_sub_function()
I looked at discord.ext.tasks but I don't think this use case fits since I'd like to handle irregularly spaced events dynamically.
I wrote this pub_sub_function() (based on the pub/sub python client docs) but it doesn't seem to be listening to pub/sub or return anything:
def pub_sub_function():
    subscriber_client = pubsub_v1.SubscriberClient()
    # existing subscription
    subscription = subscriber_client.subscription_path(
        'my-project-id', 'my-subscription')

    def callback(message):
        print(f"pubsub_message: {message}")
        message.ack()
        return message

    future = subscriber_client.subscribe(subscription, callback)
    try:
        future.result()
    except KeyboardInterrupt:
        future.cancel()  # Trigger the shutdown.
        future.result()  # Block until the shutdown is complete.
Has anyone done something like this? Is there a standard approach for sending data/messages from external services to a discordpy server and listening asynchronously?
Update: I got rid of pub_sub_function() and changed the code to this:
subscriber_client = pubsub_v1.SubscriberClient()
# existing subscription
subscription = subscriber_client.subscription_path('my-project-id', 'my-subscription')

def callback(message):
    print(f"pubsub_message: {message}")
    message.ack()
    return message

@bot.event
async def on_ready():
    print(f'{bot.user} {bot.user.id}')
    await subscriber_client.subscribe(subscription, callback).result()
This works, sort of, but now the await subscriber_client.subscribe(subscription, callback).result() is blocking the discord bot, and returning this error:
WARNING discord.gateway Shard ID None heartbeat blocked for more than 10 seconds.
Loop thread traceback (most recent call last):
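The heartbeat warning appears because .result() is a blocking call that runs on the event loop thread, so nothing else (including the gateway heartbeat) can run until it returns. As a general illustration of that effect, and not of the pub/sub client itself, the sketch below moves a blocking call to a worker thread with asyncio.to_thread (Python 3.9+) while a heartbeat-like task keeps ticking. blocking_wait and heartbeat are hypothetical stand-ins.

```python
import asyncio
import time


def blocking_wait():
    # Hypothetical stand-in for a blocking call such as future.result().
    time.sleep(0.3)
    return "done"


async def heartbeat(ticks):
    # Hypothetical stand-in for the periodic work the event loop must keep doing.
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def main():
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    # The blocking call runs in a worker thread, so the loop stays responsive.
    result = await asyncio.to_thread(blocking_wait)
    hb.cancel()
    return result, len(ticks)


result, tick_count = asyncio.run(main())
print(result, tick_count)
```

Note that a streaming pull's result() normally does not return until the subscription is cancelled, so with this approach one worker thread would stay occupied for the lifetime of the bot.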
Ok, so this Github pr was very helpful.
In it, the user says that modifications are needed to make it work with asyncio because of Google's pseudo-future implementation:
Google implemented a custom, psuedo-future
need monkey patch for it to work with asyncio
But basically, to make the pub/sub future act like the concurrent.futures.Future, the discord.py implementation should be something like this:
async def pub_sub_function():
    subscriber_client = pubsub_v1.SubscriberClient()
    # existing subscription
    subscription = subscriber_client.subscription_path('my-project-id', 'my-subscription')

    def callback(message):
        print(f"pubsub_message: {message}")
        message.ack()
        return message

    future = subscriber_client.subscribe(subscription, callback)
    # Fix the google pseudo future to behave like a concurrent Future:
    future._asyncio_future_blocking = True
    future.__class__._asyncio_future_blocking = True
    real_pubsub_future = asyncio.wrap_future(future)
    return real_pubsub_future
and then you need to await the function like this:
@bot.event
async def on_ready():
    print(f'{bot.user} {bot.user.id}')
    await pub_sub_function()
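Since the subscriber invokes the callback from a background thread, anything the callback wants the bot to do has to be handed over to the event loop in a thread-safe way. One possible pattern, sketched here with the standard library only, is to push messages onto an asyncio.Queue via loop.call_soon_threadsafe and consume them in a coroutine. start_fake_subscriber is a hypothetical stand-in for subscriber_client.subscribe, and the message strings are invented.

```python
import asyncio
import threading


def start_fake_subscriber(callback, messages):
    """Hypothetical stand-in for subscribe(): calls `callback` from a thread."""
    def worker():
        for message in messages:
            callback(message)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


async def main():
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()

    def callback(message):
        # Runs in the subscriber thread: schedule the put on the loop thread.
        loop.call_soon_threadsafe(queue.put_nowait, message)

    start_fake_subscriber(callback, ["remove user 1", "update server 2"])

    received = []
    for _ in range(2):
        # In a bot, this is where a coroutine would act on each message.
        received.append(await queue.get())
    return received


received = asyncio.run(main())
print(received)  # prints: ['remove user 1', 'update server 2']
```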

Google login for FastAPI

I am using the code below for Google authentication. There are two endpoints (/login and /auth). The first time, I can sign in with my Google account, but when I want to switch accounts, it does not ask me for Google credentials; it automatically signs me in with my previous account. Is there a way to fix this?
Here is the sample code:
@app.route('/login')
async def login(request: Request):
    # absolute url for callback
    # we will define it below
    redirect_uri = request.url_for('auth')
    return await oauth.google.authorize_redirect(request, redirect_uri)

@app.route('/auth')
async def auth(request: Request):
    token = await oauth.google.authorize_access_token(request)
    # <=0.15
    # user = await oauth.google.parse_id_token(request, token)
    user = token['userinfo']
    return user
You can find the full code here:
https://blog.authlib.org/2020/fastapi-google-login
Clear your session first:
@app.get('/logout')
async def logout(request: Request):
    request.session.pop('user', None)
    return RedirectResponse(url='/')
Or clear your cookies.
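Clearing the application session does not necessarily sign the browser out of Google, so Google may still pick the previous account without asking. One further option that may help, and which is not part of the answer above, is the prompt parameter of Google's OAuth 2.0 authorization endpoint: the value select_account asks Google to show the account chooser. The sketch below only builds such a URL with the standard library to show where the parameter goes; the client ID and redirect URI are placeholders. With Authlib, passing prompt='select_account' as an extra keyword argument to authorize_redirect is expected to have the same effect, but treat that as an assumption to verify against your Authlib version.

```python
from urllib.parse import parse_qs, urlencode, urlparse

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_authorization_url(client_id, redirect_uri, scope="openid email profile"):
    """Build a Google authorization URL that forces the account chooser."""
    params = {
        "client_id": client_id,        # placeholder value in this sketch
        "redirect_uri": redirect_uri,  # placeholder value in this sketch
        "response_type": "code",
        "scope": scope,
        "prompt": "select_account",    # ask Google to show the account picker
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


url = build_authorization_url("my-client-id", "http://localhost:8000/auth")
query = parse_qs(urlparse(url).query)
print(query["prompt"])  # prints: ['select_account']
```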

Trace failed fastapi requests with opencensus

I'm using opencensus-python to track requests to my python fastapi application running in production, and exporting the information to Azure AppInsights using the opencensus exporters. I followed the Azure Monitor docs and was helped out by this issue post which puts all the necessary bits in a useful middleware class.
Only to realize later on that requests that caused the app to crash, i.e. unhandled 5xx type errors, would never be tracked, since the call to execute the logic for the request fails before any tracing happens. The Azure Monitor docs only talk about tracking exceptions through the logs, but this is separate from the tracing of requests, unless I'm missing something. I certainly wouldn't want to lose out on failed requests, these are super important to track! I'm accustomed to using the "Failures" tab in app insights to monitor any failing requests.
I figured the way to track these requests is to explicitly handle any internal exceptions using try/catch and export the trace, manually setting the result code to 500. But I found it really odd that there seems to be no documentation of this, on opencensus or Azure.
The problem I have now is: this middleware function is expected to pass back a "response" object, which fastapi then uses as a callable object down the line (not sure why) - but in the case where I caught an exception in the underlying processing (i.e. at await call_next(request)) I don't have any response to return. I tried returning None but this just causes further exceptions down the line (None is not callable).
Here is my version of the middleware class. It's very similar to the issue post I linked, but I wrap await call_next(request) in try/except rather than just letting it fail unhandled. Scroll down to the final five lines of code to see that.
import logging

from fastapi import Request
from opencensus.trace import (
    attributes_helper,
    execution_context,
    samplers,
)
from opencensus.ext.azure.trace_exporter import AzureExporter
from opencensus.trace import span as span_module
from opencensus.trace import tracer as tracer_module
from opencensus.trace import utils
from opencensus.trace.propagation import trace_context_http_header_format
from opencensus.ext.azure.log_exporter import AzureLogHandler
from starlette.types import ASGIApp

from src.settings import settings

HTTP_HOST = attributes_helper.COMMON_ATTRIBUTES["HTTP_HOST"]
HTTP_METHOD = attributes_helper.COMMON_ATTRIBUTES["HTTP_METHOD"]
HTTP_PATH = attributes_helper.COMMON_ATTRIBUTES["HTTP_PATH"]
HTTP_ROUTE = attributes_helper.COMMON_ATTRIBUTES["HTTP_ROUTE"]
HTTP_URL = attributes_helper.COMMON_ATTRIBUTES["HTTP_URL"]
HTTP_STATUS_CODE = attributes_helper.COMMON_ATTRIBUTES["HTTP_STATUS_CODE"]

module_logger = logging.getLogger(__name__)
module_logger.addHandler(AzureLogHandler(
    connection_string=settings.appinsights_connection_string
))
class AppInsightsMiddleware:
    """
    Middleware class to handle tracing of fastapi requests and exporting the data to AppInsights.
    Most of the code here is copied from a github issue: https://github.com/census-instrumentation/opencensus-python/issues/1020
    """

    def __init__(
        self,
        app: ASGIApp,
        excludelist_paths=None,
        excludelist_hostnames=None,
        sampler=None,
        exporter=None,
        propagator=None,
    ) -> None:
        self.app = app
        self.excludelist_paths = excludelist_paths
        self.excludelist_hostnames = excludelist_hostnames
        self.sampler = sampler or samplers.AlwaysOnSampler()
        self.propagator = (
            propagator or trace_context_http_header_format.TraceContextPropagator()
        )
        self.exporter = exporter or AzureExporter(
            connection_string=settings.appinsights_connection_string
        )

    async def __call__(self, request: Request, call_next):
        # Do not trace if the url is in the exclude list
        if utils.disable_tracing_url(str(request.url), self.excludelist_paths):
            return await call_next(request)

        try:
            span_context = self.propagator.from_headers(request.headers)
            tracer = tracer_module.Tracer(
                span_context=span_context,
                sampler=self.sampler,
                exporter=self.exporter,
                propagator=self.propagator,
            )
        except Exception:
            module_logger.error("Failed to trace request", exc_info=True)
            return await call_next(request)

        try:
            span = tracer.start_span()
            span.span_kind = span_module.SpanKind.SERVER
            span.name = "[{}]{}".format(request.method, request.url)
            tracer.add_attribute_to_current_span(HTTP_HOST, request.url.hostname)
            tracer.add_attribute_to_current_span(HTTP_METHOD, request.method)
            tracer.add_attribute_to_current_span(HTTP_PATH, request.url.path)
            tracer.add_attribute_to_current_span(HTTP_URL, str(request.url))
            execution_context.set_opencensus_attr(
                "excludelist_hostnames", self.excludelist_hostnames
            )
        except Exception:  # pragma: NO COVER
            module_logger.error("Failed to trace request", exc_info=True)

        try:
            response = await call_next(request)
            tracer.add_attribute_to_current_span(HTTP_STATUS_CODE, response.status_code)
            tracer.end_span()
            return response
        # Explicitly handle any internal exception here, and set status code to 500
        except Exception as exception:
            module_logger.exception(exception)
            tracer.add_attribute_to_current_span(HTTP_STATUS_CODE, 500)
            tracer.end_span()
            return None
I then register this middleware class in main.py like so:
app.middleware("http")(AppInsightsMiddleware(app, sampler=samplers.AlwaysOnSampler()))
Explicitly handle any exception that may occur in processing the API request. That allows you to finish tracing the request, setting the status code to 500. You can then re-throw the exception to ensure that the application raises the expected exception.
try:
    response = await call_next(request)
    tracer.add_attribute_to_current_span(HTTP_STATUS_CODE, response.status_code)
    tracer.end_span()
    return response
# Explicitly handle any internal exception here, and set status code to 500
except Exception as exception:
    module_logger.exception(exception)
    tracer.add_attribute_to_current_span(HTTP_STATUS_CODE, 500)
    tracer.end_span()
    raise exception
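The record-then-re-raise pattern can be tried in isolation. The sketch below is not the opencensus API: RecordingTracer is a hypothetical stand-in that only stores attributes, the "http.status_code" key is a plain string used for illustration, and the handler returns a bare status code instead of a response object. It shows that the span is tagged with 500 and ended while the original exception still reaches the caller, which is what lets the framework produce its normal error response.

```python
import asyncio


class RecordingTracer:
    """Hypothetical stand-in for a tracer; it only records what it is told."""

    def __init__(self):
        self.attributes = {}
        self.ended = False

    def add_attribute_to_current_span(self, key, value):
        self.attributes[key] = value

    def end_span(self):
        self.ended = True


async def traced_call(tracer, call_next, request):
    try:
        status = await call_next(request)
        tracer.add_attribute_to_current_span("http.status_code", status)
        return status
    except Exception:
        # Record the failure, then let the original exception propagate.
        tracer.add_attribute_to_current_span("http.status_code", 500)
        raise
    finally:
        # Runs on both paths, so the span is always ended exactly once.
        tracer.end_span()


async def failing_handler(request):
    raise RuntimeError("boom")


async def main():
    tracer = RecordingTracer()
    try:
        await traced_call(tracer, failing_handler, "GET /")
    except RuntimeError as exc:
        return tracer, str(exc)
    return tracer, None


tracer, error = asyncio.run(main())
print(tracer.attributes, tracer.ended, error)
```

Using a bare raise inside the except block re-raises the active exception without naming it, and moving end_span into finally avoids repeating it on both paths.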

How to catch specific redirect using playwright?

When Google Maps is reasonably confident about a place search, it redirects to the specific Google place URL; otherwise it returns a map search results page.
The Google Maps search URL for "manarama" is
https://www.google.com/maps/search/manarama/#23.7505522,90.3616303,15z/data=!4m2!2m1!6e6
which redirects to a Google Place URL
https://www.google.com/maps/place/Manarama,+29+Rd+No.+14A,+Dhaka+1209/#23.7505522,90.3616303,15z/data=!4m5!3m4!1s0x3755bf4dfc183459:0xb9127b8c3072c249!8m2!3d23.750523!4d90.3703851
When Google Maps is not confident about the specific place, the search results page URL looks like this:
https://www.google.com/maps/search/Mana/#24.211316,89.340686,8z/data=!3m1!4b1
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        page = await browser.new_page()
        await page.goto("https://www.google.com/maps/search/manarama/#23.7505522,90.3616303,15z/data=!4m2!2m1!6e6", wait_until="networkidle")
        print(page.url)
        await page.close()
        await browser.close()

asyncio.run(main())
Sometimes it returns the redirected URL, but most of the time it doesn't. How can I know for sure whether the URL was redirected to a place URL? The following Stack Overflow post is similar, but I couldn't make it work for my case:
How to catch the redirect with a webapp using playwright
You can use expect_navigation.
In the comments you asked which URL to match with this function. Almost all such Playwright functions accept regex patterns, so when in doubt, just use a regex. See the code below:
import asyncio
from playwright.async_api import async_playwright, TimeoutError
import re

pattern = re.compile(r"http.*://.+?/place.+")

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        page = await browser.new_page()
        try:
            async with page.expect_navigation(url=pattern, timeout=7000) as resp:
                await page.goto(
                    "https://www.google.com/maps/search/manarama/#23.7505522,90.3616303,15z/data=!4m2!2m1!6e6",
                    wait_until='networkidle')
        except TimeoutError:
            print('place not found')
        else:
            print('navigated to place')
        print(page.url)
        await page.close()
        await browser.close()

asyncio.run(main())
To check whether the page navigated or not, wrap the call in a try/except block and pass a suitable timeout argument (in ms) to expect_navigation. If a TimeoutError is raised, you know there was no URL change matching the pattern.
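The pattern itself can be checked offline against the URLs from the question, without launching a browser. This sketch uses re.search as an approximation of how the regex is applied to the navigated URL; is_place_url is a helper name introduced here for illustration.

```python
import re

# Same pattern as in the answer above.
pattern = re.compile(r"http.*://.+?/place.+")

search_url = (
    "https://www.google.com/maps/search/manarama/"
    "#23.7505522,90.3616303,15z/data=!4m2!2m1!6e6"
)
place_url = (
    "https://www.google.com/maps/place/Manarama,+29+Rd+No.+14A,+Dhaka+1209/"
    "#23.7505522,90.3616303,15z/data=!4m5!3m4!1s0x3755bf4dfc183459:"
    "0xb9127b8c3072c249!8m2!3d23.750523!4d90.3703851"
)


def is_place_url(url):
    """Return True if the URL looks like a Google Maps place URL."""
    return pattern.search(url) is not None


print(is_place_url(search_url))  # prints: False
print(is_place_url(place_url))   # prints: True
```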

Resources