I have a series of clients which need to be connected constantly to my server via ws protocol. For a number of different reasons, the connections occasionally drop. This is acceptable, but when it happens I'd like my clients to reconnect.
Currently my temporary workaround is to have a parent process launch the client and when it detects connection drop, terminate it (client never handles any critical data, there are no side effects to sigkill-ing it) and respawn a new client. While this does the job, I'd very much prefer to fix the actual problem.
This is roughly my client:
from autobahn.twisted.websocket import WebSocketClientProtocol, WebSocketClientFactory
from twisted.internet import reactor
from threading import Thread
from time import sleep

class Client:
    def __init__(self):
        self._kill = False
        self.factory = WebSocketClientFactory("ws://0.0.0.0")
        self.factory.openHandshakeTimeout = 60  # ensures handshake doesn't time out due to technical limitations
        self.factory.protocol = self._protocol_factory()
        self._conn = reactor.connectTCP("0.0.0.0", 1234, self.factory)
        reactor.run()

    def _protocol_factory(self):
        class ClientProtocol(WebSocketClientProtocol):
            def onConnect(self, response):
                Thread(target=_self.mainloop, daemon=True).start()

            def onClose(self, was_clean, code, reason):
                _self.on_cleanup()

        _self = self
        return ClientProtocol

    def on_cleanup(self):
        self._kill = True
        sleep(30)
        # Wait for self.mainloop to finish.
        # It is guaranteed to exit within 30 seconds of setting _kill flag
        self._kill = False
        self._conn = reactor.connectTCP("0.0.0.0", 1234, self.factory)

    def mainloop(self):
        while not self._kill:
            sleep(1)  # does some work
With this code the client works correctly until the first connection drop, at which point it attempts to reconnect. No exceptions are raised during the process and everything appears to go correctly on the client side: onConnect is called and a new mainloop starts. However, the server never receives that second handshake, even though the client seems to think it is connected.
What am I doing wrong? Why could this be happening?
I'm not a twisted expert and can't really tell what you are doing wrong, but I'm currently working with Autobahn in a project and I solved the reconnection problems using the ReconnectingClientFactory. Maybe you want to check some examples of the use of ReconnectingClientFactory with Autobahn.
I've an airflow dag that executes 10 tasks (exporting different data from the same source) in parallel, every 15min. I've also enabled 'email_on_failure' to get notified on failures.
Once every month or so, the tasks start failing for a couple of hours because the data source is unavailable, causing Airflow to generate hundreds of emails (10 emails every 15 min) until the raw data source is available again.
Is there a better way to avoid being spammed with emails once consecutive runs fail to succeed?
For example, is it possible to send an email on failure only for the first run that fails (i.e. when the previous run was successful)?
To customise the logic in callbacks, you can use on_failure_callback and define a Python function to call on failure. In this function you can access the task instance.
A property on this task instance is try_number - which you can check before sending an alert. An example could be:
from airflow.operators.bash import BashOperator

def task_fail_email_alert(context):
    # Called by Airflow with the task's context when the task fails.
    try_number = context["ti"].try_number
    if try_number == 1:
        pass  # send alert
    else:
        pass  # do nothing

some_operator = BashOperator(
    task_id="some_operator",
    bash_command="""
    echo "something"
    """,
    on_failure_callback=task_fail_email_alert,
    dag=dag,
)
You can then implement the code to send an email in this function, rather than using the built-in email_on_failure. The EmailOperator can be imported with from airflow.operators.email import EmailOperator.
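To alert only when the previous run did not fail, which is the case the question asks about, the callback could also look at the previous task instance. The sketch below assumes the task instance exposes get_previous_ti() and a state string, as TaskInstance does in Airflow 2.x; send_alert is a hypothetical placeholder for whatever actually sends the email, and fake_context exists only so the decision logic can be exercised without a scheduler.

```python
from types import SimpleNamespace


def is_first_failure(context):
    # True when there is no previous run, or the previous run did not fail.
    ti = context["ti"]
    previous_ti = ti.get_previous_ti()  # assumed available (Airflow 2.x TaskInstance)
    return previous_ti is None or previous_ti.state != "failed"


def task_fail_email_alert(context, send_alert=print):
    # send_alert is a hypothetical hook; Airflow itself only passes `context`.
    if is_first_failure(context):
        send_alert("Task %s failed" % context["ti"].task_id)
        return True
    return False


def fake_context(previous_state):
    # Test helper: builds a context-like dict from plain objects.
    previous = None if previous_state is None else SimpleNamespace(state=previous_state)
    ti = SimpleNamespace(task_id="export_a", get_previous_ti=lambda: previous)
    return {"ti": ti}


sent = []
task_fail_email_alert(fake_context("success"), sent.append)  # alert is sent
task_fail_email_alert(fake_context("failed"), sent.append)   # suppressed
```

With this check, a two-hour outage produces one alert per task at the start of the outage instead of one per task per run.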
Giving consideration that your tasks are running concurrently and one or multiple failures could occur, I would suggest to treat the dispatch of failure messages as one would a shared resource.
You need to implement a lock that is "dagrun-aware", that is, one that knows about the DagRun.
You can back this lock with an in-memory database like Redis, an object store like S3, a file on the local system, or a database. How you choose to implement this is up to you.
In your on_failure_callback implementation, you must acquire said Lock. If acquisition of said Lock is successful, carry on to dispatch the email. Otherwise, pass.
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

class OnlyOnceLock:
    def __init__(self, run_id):
        self.run_id = run_id

    def acquire(self):
        # Returns False if run_id already exists in a backing store.
        # S3 example
        hook = S3Hook()
        key = self.run_id
        bucket_name = 'coordinated-email-alerts'
        # check_for_key returns a bool, so no exception handling is needed.
        # Note that the check and the write below are not atomic.
        if hook.check_for_key(key, bucket_name):
            return False
        # This is the first time lock is acquired
        hook.load_string('fakie', key, bucket_name)
        return True

    def __enter__(self):
        return self.acquire()

    def __exit__(self, exc_type, exc_val, exc_tb):
        pass

def on_failure_callback(context):
    error = context['exception']
    task = context['task']
    run_id = context['run_id']
    ti = context['ti']
    with OnlyOnceLock(run_id) as lock:
        if lock:
            ti.email_alert(error, task)
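If a local file is used as the backing store instead, exclusive file creation gives an atomic acquire. This is a minimal sketch, assuming every task of the DagRun runs on a host that shares the lock directory (which is not the case for distributed executors); FileOnlyOnceLock and lock_dir are hypothetical names, and the run IDs are made-up examples.

```python
import os
import tempfile


class FileOnlyOnceLock:
    def __init__(self, run_id, lock_dir):
        # Replace characters that are awkward in file names (":" and "+" in run IDs).
        safe_name = "".join(c if c.isalnum() else "_" for c in run_id)
        self.path = os.path.join(lock_dir, safe_name + ".lock")

    def acquire(self):
        # O_CREAT | O_EXCL fails if the file already exists, so only the
        # first caller for a given run_id gets True.
        try:
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.close(fd)
        return True


lock_dir = tempfile.mkdtemp()
first = FileOnlyOnceLock("scheduled__2024-01-01T00:00:00", lock_dir).acquire()
second = FileOnlyOnceLock("scheduled__2024-01-01T00:00:00", lock_dir).acquire()
other_run = FileOnlyOnceLock("scheduled__2024-01-01T00:15:00", lock_dir).acquire()
print(first, second, other_run)
```

Because the lock is keyed on the run ID, this limits alerts to one per DagRun; stale lock files would still need to be cleaned up separately.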
This question is really about the various coroutines in base_events.py and streams.py that deal with network connections, network servers, and their higher-level equivalents under Streams. Since it's not clear how to group these functions, I am going to use start_server() to explain what I don't understand about these coroutines and haven't found online (unless I missed something obvious).
When running the following code, I am able to create a server that is able to handle incoming messages from a client and I also periodically print out the number of tasks that the EventLoop is handling to see how the tasks work. What I'm surprised about is that after creating a server, the task is in the finished state not too long after the program starts. I expected that a task in the finished state was a completed task that no longer does anything other than pass back the results or exception.
However, of course this is not true, the EventLoop is still running and handling incoming messages from clients and the application is still running. Monitor however shows that all tasks are completed and no new task is dispatched to handle a new incoming message.
So my question is this:
What is going on underneath asyncio that I am missing that explains the behavior I am seeing? For example, I would have expected a task (or tasks created for each message) that is handling incoming messages in the pending state.
Why is asyncio.Task.all_tasks() passing back finished tasks? I would have thought that once a task has completed, it is garbage collected (so long as there are no other references to it).
I have seen similar behavior with the other asyncio functions like using create_connection() with a websocket from a site. I know at the end of these coroutines, their result is usually a tuple such as (reader, writer) or (transport, protocol) but I don't understand how it all ties together or what other documentation/code to read to give me more insight. Any help is appreciated.
import asyncio
from pprint import pprint

async def echo_message(reader, writer):
    # start_server() calls this with (reader, writer) for each new connection.
    data = await reader.read(1000)
    message = data.decode()
    addr = writer.get_extra_info('peername')
    print('Received %r from %r' % (message, addr))
    print('Send: %r' % message)
    writer.write(message.encode())
    await writer.drain()
    print('Close the client socket')
    writer.close()

async def monitor():
    while True:
        tasks = asyncio.Task.all_tasks()
        pprint(tasks)
        await asyncio.sleep(60)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.create_task(monitor())
    # loop is a keyword-only argument of start_server()
    loop.create_task(asyncio.start_server(echo_message, 'localhost', 7777, loop=loop))
    loop.run_forever()
Outputs:
###
# Soon after starting the application, monitor prints out:
###
{<Task pending coro=<start_server() running ...>,
<Task pending coro=<monitor() running ...>,
<Task pending coro=<BaseEventLoop._create_server_getaddrinfo() running ...>}
###
# After things are initialized and the server has started, the next printout is:
###
{<Task finished coro=<start_server() done ...>,
<Task pending coro=<monitor() running ...>,
<Task finished coro=<BaseEventLoop._create_server_getaddrinfo() done ...>}
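One way to reproduce the observation in a small, self-contained form: start_server() only sets up the listening socket and returns a Server object, so a task wrapping it finishes as soon as the server is listening, while the event loop keeps invoking the callback for each new connection. The sketch below is written for current Python (asyncio.run, no loop argument) rather than the older API used in the question; the loopback address and port 0 (let the OS pick a free port) are assumptions made for the demo.

```python
import asyncio


async def echo(reader, writer):
    # Runs once per client connection, long after start_server() has returned.
    data = await reader.read(1000)
    writer.write(data)
    await writer.drain()
    writer.close()


async def demo():
    setup_task = asyncio.create_task(asyncio.start_server(echo, "127.0.0.1", 0))
    server = await setup_task  # the task's result is a Server object
    port = server.sockets[0].getsockname()[1]
    setup_finished = setup_task.done()  # setup is over before any client connects

    # The server still answers, even though its setup task is finished.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"ping")
    await writer.drain()
    reply = await reader.read(1000)
    writer.close()
    await writer.wait_closed()

    server.close()
    return setup_finished, reply


setup_finished, reply = asyncio.run(demo())
print(setup_finished, reply)
```

Handler work therefore only shows up as a pending task while a connection is actually being served. Note also that the current asyncio.all_tasks() function returns only tasks that have not finished yet, unlike the output shown in the question.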
I need to run a stored procedure on an Azure SQL Managed Instance every 10 seconds from a Python application. The specific call to cursor.execute() happens in a class that extends threading.Thread like so:
import logging
import threading

import pyodbc

class Parser(threading.Thread):
    def __init__(self, name, event, interface, config):
        threading.Thread.__init__(self)
        self.name = name
        self.stopped = event
        self.interface = interface
        self.config = config
        self.connection_string = config['connection_string']
        self.cnxn = pyodbc.connect(self.connection_string)

    def run(self):
        # Event.wait(10) returns False on timeout, so this runs every 10 seconds
        # until the event is set.
        while not self.stopped.wait(10):
            try:
                cursor = self.cnxn.cursor()
                cursor.execute("exec dbo.myStoredProcedure")
            except Exception as e:
                logging.error(e)
My current challenge is that the above thread does not recover gracefully from interruptions to network connectivity. My goal is to have the thread continue to run and re-attempt every 10 seconds until connectivity is restored, then recover gracefully.
Is the best practice here to delete and recreate the connection with every pass of the while loop?
Should I be using ConnectRetryCount or ConnectRetryInterval in my connection string?
While debugging I have found that even after connectivity is restored, pyodbc.connect() still fails with ODBC error 08S01 Communication link failure.
I have looked at the solution proposed in this post but don't see how to apply that solution to a continuous polling architecture.
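One possible arrangement for a polling loop, sketched here under stated assumptions rather than offered as the established best practice: keep the connection between passes, but discard it on any error and open a fresh one lazily on the next pass. ReconnectingPoller, poll_once and the connect callable are hypothetical names; in the real application connect would be something like lambda: pyodbc.connect(connection_string). The Fake* classes exist only to exercise the logic without a database.

```python
import logging
import threading


class ReconnectingPoller(threading.Thread):
    def __init__(self, connect, stopped, statement, interval=10.0):
        super().__init__(daemon=True)
        self.connect = connect      # callable returning a new DB-API connection
        self.stopped = stopped      # threading.Event used to stop the loop
        self.statement = statement
        self.interval = interval
        self.cnxn = None

    def poll_once(self):
        """Run the statement once; return True on success, False on failure."""
        try:
            if self.cnxn is None:
                self.cnxn = self.connect()  # (re)connect lazily
            self.cnxn.cursor().execute(self.statement)
            return True
        except Exception:
            logging.exception("poll failed; reconnecting on the next pass")
            self._drop_connection()
            return False

    def _drop_connection(self):
        # Throw the (possibly broken) connection away; ignore errors on close.
        if self.cnxn is not None:
            try:
                self.cnxn.close()
            except Exception:
                pass
            self.cnxn = None

    def run(self):
        while not self.stopped.wait(self.interval):
            self.poll_once()


class FakeCursor:
    def __init__(self, broken):
        self.broken = broken

    def execute(self, sql):
        if self.broken:
            raise RuntimeError("08S01 Communication link failure")


class FakeConnection:
    def __init__(self, broken):
        self.broken = broken

    def cursor(self):
        return FakeCursor(self.broken)

    def close(self):
        pass


connections = [FakeConnection(broken=True), FakeConnection(broken=False)]
poller = ReconnectingPoller(lambda: connections.pop(0), threading.Event(),
                            "exec dbo.myStoredProcedure")
results = [poller.poll_once(), poller.poll_once()]
```

Because connecting happens inside the try block, a failed connection attempt is logged and retried on the next pass instead of ending the thread.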
I am running a development server locally
python manage.py runserver 8000
Then I run a script which consumes the Consumer below
from channels.generic.websocket import AsyncJsonWebsocketConsumer

class MyConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        import time
        time.sleep(99999999)
        await self.accept()
Everything runs fine and the consumer sleeps for a long time as expected. However I am not able to access http://127.0.0.1:8000/ from the browser.
The problem is bigger in real life, since the consumer needs to make an HTTP request to the same server and essentially ends up in a deadlock.
Is this the expected behaviour? How do I allow calls to my server while a slow consumer is running?
Since this is an async function, you should be using asyncio's sleep.
import asyncio
from channels.generic.websocket import AsyncJsonWebsocketConsumer

class MyConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        await asyncio.sleep(99999999)
        await self.accept()
If you use time.sleep, you block the entire Python thread, and with it the event loop.
This also applies when you make your upstream HTTP request: you need to use an asyncio HTTP library, not a synchronous one. (Basically, you should be awaiting anything that is expected to take any time.)
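The difference can be observed without Channels at all. In this sketch a ticker coroutine stands in for the other work the server should be doing (such as answering HTTP requests), and it is counted while a slow coroutine runs on the same event loop. All names and the 0.2 second duration are illustrative assumptions.

```python
import asyncio
import time


async def ticker(counter):
    # Stands in for other work on the same event loop.
    while True:
        counter["ticks"] += 1
        await asyncio.sleep(0.01)


async def blocking_connect():
    time.sleep(0.2)  # blocks the thread; the loop cannot run anything else


async def cooperative_connect():
    await asyncio.sleep(0.2)  # suspends only this coroutine


async def ticks_during(slow_coroutine):
    counter = {"ticks": 0}
    task = asyncio.create_task(ticker(counter))
    await slow_coroutine()
    task.cancel()
    return counter["ticks"]


blocked_ticks = asyncio.run(ticks_during(blocking_connect))
cooperative_ticks = asyncio.run(ticks_during(cooperative_connect))
print(blocked_ticks, cooperative_ticks)
```

With time.sleep the ticker never gets a chance to run, so blocked_ticks is 0; with asyncio.sleep it runs repeatedly while the slow coroutine is suspended.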
Background
We have an app that deals with a considerable amount of requests per second. This app needs to notify an external service, by making a GET call via HTTPS to one of our servers.
Objective
The objective here is to use HTTPoison to make async GET requests. I don't really care about the response to each request; all I care about is whether it failed, so that I can write any errors to a logger.
If it succeeds I don't want to do anything.
Research
I have checked the official documentation for HTTPoison and I see that they support async requests:
https://hexdocs.pm/httpoison/readme.html#usage
However, I have 2 issues with this approach:
They use flush to show the request was completed. I can't log in to the app and manually flush to see how the requests are going; that would be insane.
They don't show any notifications mechanism for when we get the responses or errors.
So, I have a simple question:
How do I get asynchronously notified that my request failed or succeeded?
I assume that the default HTTPoison.get is synchronous, as shown in the documentation.
This could be achieved by spawning a new process per-request. Consider something like:
notify = fn response ->
  # Any handling logic - write to DB? Send a message to another process?
  # Here, I'll just print the result
  IO.inspect(response)
end

spawn(fn ->
  resp = HTTPoison.get("http://google.com")
  notify.(resp)
end) # spawn will not block, so it will attempt to execute the next spawn straight away

spawn(fn ->
  resp = HTTPoison.get("http://yahoo.com")
  notify.(resp)
end) # This will be executed immediately after the previous `spawn`
Please take a look at the documentation of spawn/1 I've pointed out here.
Hope that helps!