I want to use my Python code both for a Bokeh server and as a library, so I modularized it with an __name__ == '__main__' guard, but the standalone Bokeh server is not getting triggered.
def initialize_WatchDataFrame():
    print("Initialize Watchlist")

if __name__ == "__main__":
    initialize_WatchDataFrame()
    curdoc().add_periodic_callback(update_WatchDataFrame, 2000)
    curdoc().title = "WatchList"
So when I run the server with "bokeh serve Watchlist.py", I don't see the call to initialize_WatchDataFrame() being made.
If you want to be able to run python foo.py rather than bokeh serve foo.py, then you will have to embed the Bokeh server as a library. That requires setting up and starting a Tornado IOLoop yourself. Here is a complete example:
from bokeh.plotting import figure
from bokeh.server.server import Server
from tornado.ioloop import IOLoop

def modify_doc(doc):
    p = figure()
    p.line([1, 2, 3, 4, 5], [3, 4, 2, 7, 5], line_width=2)
    doc.add_root(p)

if __name__ == '__main__':
    server = Server({'/bkapp': modify_doc}, io_loop=IOLoop())
    server.start()
    server.io_loop.start()
Depending on what you are trying to accomplish, you may also need to embed this app using server_document, or run the IOLoop in a thread. Those use cases are demonstrated in the examples linked in the docs.
It probably also bears mentioning: the code that modifies the document only runs when a browser connection is made. (And: it is run every time a browser connection is made, to generate a new document just for that session.)
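For the threaded, embedded case, here is a minimal sketch that reuses the modify_doc function above; the host name, port, and allow_websocket_origin values are assumptions you would adjust for your own setup:

from threading import Thread

from bokeh.embed import server_document
from bokeh.server.server import Server
from tornado.ioloop import IOLoop

def bk_worker():
    # allow_websocket_origin must list the host:port that serves the embedding page
    server = Server({'/bkapp': modify_doc}, io_loop=IOLoop(),
                    allow_websocket_origin=["localhost:8000"])
    server.start()
    server.io_loop.start()

# run the Bokeh server in the background; your web framework (Flask, Django, ...)
# keeps the main thread alive and serves the page that embeds the app
Thread(target=bk_worker, daemon=True).start()

# place this script tag in the HTML of the embedding page
script = server_document("http://localhost:5006/bkapp")

The string returned by server_document is the script tag you inject into the page that should display the app.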
Related
Until version 1.11, Streamlit had the following way to access the current Server instance:
from streamlit.server.server import Server
Server.get_current()
Now, in version 1.12, it changed to:
from streamlit.web.server import Server
That's OK, but the get_current() method was removed from the Server class.
So, is there another way to get the server instance?
In case there is no other way (if there is, please tell me), the server instance can be found in the object list of the garbage collector:
import gc

from streamlit.web.server import Server

for obj in gc.get_objects():
    if type(obj) is Server:
        server = obj
        break
They removed the singleton in this PR.
Here is one internal way to access the object, by fetching it from the closure variables of a signal handler that streamlit registers in its run() method:
import signal
import typing as T

from streamlit.web.server import Server

def get_streamlit_server() -> T.Optional[Server]:
    """
    Get the active streamlit server object. Must be called within a running
    streamlit session.

    Easy access to this object was removed in streamlit 1.12:
    https://github.com/streamlit/streamlit/pull/4966
    """
    # In the run() method in `streamlit/web/bootstrap.py`, a signal handler is
    # registered with the server as a closure. Fetch that signal handler.
    streamlit_signal_handler = signal.getsignal(signal.SIGQUIT)

    # Iterate through the closure variables and return the server if found.
    for cell in streamlit_signal_handler.__closure__:
        if isinstance(cell.cell_contents, Server):
            return cell.cell_contents
    return None
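A minimal usage sketch, assuming the function above is defined in (or imported by) a script launched with streamlit run:

import streamlit as st

server = get_streamlit_server()
st.write(f"Active Server instance: {server}")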
My HTTP server runs using the aiohttp.web module:
import asyncio as aio
import aiohttp.web as web
server = web.Application()
server.add_routes([...])
web.run_app(server, port=8080)
The code inside web.run_app uses the main event loop; in simple apps it handles the KeyboardInterrupt exception and exits when Ctrl+C is pressed. However, I also need to terminate all of my threads, which aiohttp.web won't do, so the program doesn't exit.
How to override the default signal handler of aiohttp.web.Application?
Thanks to @user4815162342 in the comments, there are two solutions to the problem:
Solution 1: Add daemon=True:
my_thread = Thread(target=..., daemon=True)
Solution 2: Wrap web.run_app in a try/finally block:
try:
    web.run_app(server, ...)
finally:
    terminate_threads()
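A minimal sketch combining both ideas, with a hypothetical worker() function and a terminate_threads() helper that signals it to stop:

import aiohttp.web as web
from threading import Thread, Event

stop_event = Event()

def worker():
    # hypothetical background job; it checks stop_event so it can exit cleanly
    while not stop_event.is_set():
        stop_event.wait(1.0)

def terminate_threads():
    stop_event.set()
    my_thread.join()

server = web.Application()

# daemon=True alone would let the process exit on Ctrl+C,
# but the Event gives the thread a chance to clean up first
my_thread = Thread(target=worker, daemon=True)
my_thread.start()

try:
    web.run_app(server, port=8080)
finally:
    terminate_threads()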
I am running a development server locally
python manage.py runserver 8000
Then I run a script which connects to the Consumer below.
from channels.generic.websocket import AsyncJsonWebsocketConsumer

class MyConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        import time
        time.sleep(99999999)
        await self.accept()
Everything runs fine and the consumer sleeps for a long time as expected. However I am not able to access http://127.0.0.1:8000/ from the browser.
The problem is bigger in real life, since the consumer needs to make an HTTP request to the same server and essentially ends up in a deadlock.
Is this the expected behaviour? How do I allow calls to my server while a slow consumer is running?
Since this is an async function, you should be using asyncio's sleep.
import asyncio

from channels.generic.websocket import AsyncJsonWebsocketConsumer

class MyConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        await asyncio.sleep(99999999)
        await self.accept()
If you use time.sleep, you will block the entire Python thread.
This also applies when you make your upstream HTTP request: you need to use an asyncio HTTP library, not a synchronous one. (Basically, you should be awaiting anything that is expected to take any time.)
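A minimal sketch of such a non-blocking upstream request using aiohttp, assuming aiohttp is installed; the URL and the JSON payload are just placeholders:

import aiohttp
from channels.generic.websocket import AsyncJsonWebsocketConsumer

class MyConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        await self.accept()
        # awaited HTTP call: the event loop stays free to serve other requests
        async with aiohttp.ClientSession() as session:
            async with session.get("http://127.0.0.1:8000/some-endpoint/") as resp:
                body = await resp.text()
        await self.send_json({"upstream_length": len(body)})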
What I want to do
I have an HTTP API service, written in Flask, which is a template used to build instances of different services. As such, this template needs to be generalizable to handle use cases that do and do not include Kafka consumption.
My goal is to have an optional Kafka consumer running in the background of the API template. I want any service that needs it to be able to read data from a Kafka topic asynchronously, while also independently responding to HTTP requests as it usually does. These two processes (Kafka consuming, HTTP request handling) aren't related, except that they'll be happening under the hood of the same service.
What I've written
Here's my setup:
# ./create_app.py
from flask import Flask
from flask_socketio import SocketIO

socketio = None

def create_app(kafka_consumer_too=False):
    """
    Return a Flask app object, with or without a Kafka-ready SocketIO object as well
    """
    app = Flask('my_service')
    app.register_blueprint(special_http_handling_blueprint)
    if kafka_consumer_too:
        global socketio
        socketio = SocketIO(app=app, message_queue='kafka://localhost:9092', channel='some_topic')
        from .blueprints import kafka_consumption_blueprint
        app.register_blueprint(kafka_consumption_blueprint)
        return app, socketio
    return app
My run.py is:
# ./run.py
from .create_app import create_app

app, socketio = create_app(kafka_consumer_too=True)

if __name__ == "__main__":
    socketio.run(app, debug=True)
And here's the Kafka consumption blueprint I've written, which is where I think it should be handling the stream events:
# ./blueprints/kafka_consumption_blueprint.py
from flask import Blueprint

from ..create_app import socketio

kafka_consumption_blueprint = Blueprint('kafka_consumption', __name__)

@socketio.on('message')
def handle_message(message):
    print('received message: ' + message)
What it currently does
With the above, my HTTP requests are being handled fine when I curl localhost:5000. The problem is that, when I write to the some_topic Kafka topic (on port 9092), nothing is showing up. I have a CLI Kafka consumer running in another shell, and I can see that the messages I'm sending on that topic are showing up. So it's the Flask app that's not reacting: no messages are being consumed by handle_message().
What am I missing here? Thanks in advance.
I think you are interpreting the meaning of the message_queue argument incorrectly.
This argument is used when you have multiple server instances. These instances communicate with each other through the configured message queue. This queue is 100% internal; there is nothing that you, as a user of the library, can do with it.
If you want to build some sort of pub/sub mechanism, then you have to implement the listener for that in your application.
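A minimal sketch of such a listener, assuming the kafka-python package and the app/socketio objects from the question; the consumer runs as a Flask-SocketIO background task (started from run.py, for example) and relays each Kafka record to connected clients:

# hypothetical addition to ./run.py
from kafka import KafkaConsumer  # pip install kafka-python

def kafka_listener():
    consumer = KafkaConsumer('some_topic', bootstrap_servers='localhost:9092')
    for record in consumer:
        # forward every Kafka message to all connected Socket.IO clients
        socketio.emit('message', record.value.decode('utf-8'))

if __name__ == "__main__":
    socketio.start_background_task(kafka_listener)
    socketio.run(app, debug=True)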
So here's a stupid idea...
I created (many) DAGs in Airflow... and it works... However, I would like to package it up somehow so that I could run a single DAG Run without having Airflow installed; i.e. have it self-contained so I don't need all the web servers, databases, etc.
I mostly instantiate new DAG Runs with trigger_dag anyway, and I noticed that the overhead of running Airflow appears quite high (workers have high loads doing essentially nothing, it can sometimes take tens of seconds before dependent tasks are queued, etc.).
I'm not too bothered about all the logging, etc.
You can create a script which executes Airflow operators, although this loses all the metadata that Airflow provides. You still need to have Airflow installed as a Python package, but you don't need to run any webservers, etc. A simple example could look like this:
from dags.my_dag import operator1, operator2, operator3

def main():
    # execute pipeline
    # operator1 -> operator2 -> operator3
    operator1.execute(context={})
    operator2.execute(context={})
    operator3.execute(context={})

if __name__ == "__main__":
    main()
It sounds like your main concern is the waste of resources by the idling workers more so than the waste of Airflow itself.
I would suggest running Airflow with the LocalExecutor on a single box. This will give you the benefits of concurrent execution without the hassle of managing workers.
As for the database - there is no way to remove the database component without modifying airflow source itself. One alternative would be to leverage the SequentialExecutor with SQLite, but this removes the ability to run concurrent tasks and is not recommended for production.
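A minimal single-box sketch of that setup, assuming a local Postgres database (the same settings can live in airflow.cfg; the exact config section for the connection string varies slightly between Airflow versions):

# environment-variable equivalents of the airflow.cfg settings
export AIRFLOW__CORE__EXECUTOR=LocalExecutor
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@localhost/airflow

airflow scheduler &   # with LocalExecutor, tasks run as local subprocesses of the scheduler
airflow webserver     # optional; only needed if you want the UI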
First I'd say you need to tweak your Airflow setup.
But if that's not an option, then another way is to write your main logic in code outside the DAG (this is also best practice). For me this makes the code easier to test locally as well.
A shell script makes it pretty easy to tie a few processes together.
You won't get the benefit of operators or dependencies, but you probably can script your way around it. And if you can't, just use Airflow.
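A minimal sketch of that split, with hypothetical module and function names; the business logic lives in a plain module that both the DAG and a standalone script can import:

# my_pipeline/logic.py -- plain Python, no Airflow imports (hypothetical names)
def extract():
    return [1, 2, 3]

def transform(rows):
    return [r * 2 for r in rows]

def load(rows):
    print(f"loaded {len(rows)} rows")

# run_standalone.py -- runs the same pipeline without Airflow
if __name__ == "__main__":
    load(transform(extract()))

In the DAG file, each task then becomes a thin wrapper that calls one of these functions, so the same logic runs with or without Airflow.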
You can overload the imported Airflow modules if they fail to import. So for example, if you are using from airflow.decorators import dag, task, you can overload the @dag and @task decorators:
from datetime import datetime

try:
    from airflow.decorators import dag, task
except ImportError:
    # fall back to no-op decorators when Airflow is not installed
    mock_decorator = lambda f=None, **d: f if f else lambda x: x
    dag = mock_decorator
    task = mock_decorator

@dag(schedule=None, start_date=datetime(2022, 1, 1), catchup=False)
def mydag():
    @task
    def task_1():
        print("task 1")

    @task
    def task_2(input):
        print("task 2")

    task_2(task_1())

_dag = mydag()