I'm trying to learn gRPC and have implemented the same code as in the tutorial. I'm wondering how to add a gRPC health check to it.
I stumbled upon this, but I'm clueless about how to write a gRPC health check.
I found this after many hours of searching. To health-check a gRPC server, you have to add the health-checking service to your existing server, so the existing server will have multiple services running on it.
An example of how to add multiple services to the same server is explained here.
Sample server:
# pip install grpcio-health-checking
from concurrent import futures

import grpc
from grpc_health.v1 import health
from grpc_health.v1 import health_pb2
from grpc_health.v1 import health_pb2_grpc

# helloworld_pb2_grpc and _GreeterServicer come from your existing generated code and servicer
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))

# your normal service, that the server is supposed to run
helloworld_pb2_grpc.add_GreeterServicer_to_server(_GreeterServicer(), server)

# health check service - add this service to the same server
health_pb2_grpc.add_HealthServicer_to_server(health.HealthServicer(), server)

server.add_insecure_port('[::]:50051')
server.start()
server.wait_for_termination()
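To verify the health service is actually wired up, you can query it from a client. Here is a minimal sketch, assuming the server above is running on localhost:50051:
import grpc
from grpc_health.v1 import health_pb2
from grpc_health.v1 import health_pb2_grpc

channel = grpc.insecure_channel('localhost:50051')
health_stub = health_pb2_grpc.HealthStub(channel)

# an empty service name asks about the overall server health;
# pass a fully qualified service name to check a specific service
request = health_pb2.HealthCheckRequest(service='')
response = health_stub.Check(request)
print(health_pb2.HealthCheckResponse.ServingStatus.Name(response.status))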
The accompanying unit tests serve as a pretty good reference.
def start_server(self, non_blocking=False, thread_pool=None):
    self._thread_pool = thread_pool
    self._servicer = health.HealthServicer(
        experimental_non_blocking=non_blocking,
        experimental_thread_pool=thread_pool)
    self._servicer.set('', health_pb2.HealthCheckResponse.SERVING)
    self._servicer.set(_SERVING_SERVICE,
                       health_pb2.HealthCheckResponse.SERVING)
    self._servicer.set(_UNKNOWN_SERVICE,
                       health_pb2.HealthCheckResponse.UNKNOWN)
    self._servicer.set(_NOT_SERVING_SERVICE,
                       health_pb2.HealthCheckResponse.NOT_SERVING)
    self._server = test_common.test_server()
    port = self._server.add_insecure_port('[::]:0')
    health_pb2_grpc.add_HealthServicer_to_server(
        self._servicer, self._server)
    self._server.start()
Keep a reference to the health servicer in the main servicer class for your application server. Then call its set() method at the appropriate times, e.g. when startup has finished or when the server goes into a state in which it is unable to serve requests. Do note, however, that this makes use of some experimental features in order to ensure that the attached health servicer does not cause application-level requests to be starved.
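A rough sketch of that pattern, building on the sample server above (the Greeter names and the mark_unhealthy hook are illustrative, not part of any official API):
from grpc_health.v1 import health
from grpc_health.v1 import health_pb2
from grpc_health.v1 import health_pb2_grpc

health_servicer = health.HealthServicer()
health_pb2_grpc.add_HealthServicer_to_server(health_servicer, server)

class GreeterServicer(helloworld_pb2_grpc.GreeterServicer):
    """Application servicer that keeps a handle on the health servicer."""

    def __init__(self, health_servicer):
        self._health_servicer = health_servicer

    def mark_unhealthy(self):
        # call this when the service can no longer serve requests
        self._health_servicer.set(
            'helloworld.Greeter', health_pb2.HealthCheckResponse.NOT_SERVING)

# once startup has finished, report SERVING for the service and the server overall
health_servicer.set('helloworld.Greeter', health_pb2.HealthCheckResponse.SERVING)
health_servicer.set('', health_pb2.HealthCheckResponse.SERVING)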
I found this example really helpful:
https://github.com/grpc/grpc/tree/master/examples/python/xds
Specifically, server.py provides an implementation of health checking as well as reflection.
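For context, the core of what that server.py wires together, adapted to the sample server above (the Greeter names come from the helloworld example and are only illustrative), looks roughly like this:
# pip install grpcio-reflection
from grpc_health.v1 import health
from grpc_reflection.v1alpha import reflection

# with the health servicer registered as shown earlier, also expose server
# reflection so tools like grpcurl can discover the services
reflection.enable_server_reflection(
    (
        helloworld_pb2.DESCRIPTOR.services_by_name['Greeter'].full_name,
        health.SERVICE_NAME,
        reflection.SERVICE_NAME,
    ),
    server,
)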
What I want to do
I have an HTTP API service, written in Flask, which is a template used to build instances of different services. As such, this template needs to be generalizable to handle use cases that do and do not include Kafka consumption.
My goal is to have an optional Kafka consumer running in the background of the API template. I want any service that needs it to be able to read data from a Kafka topic asynchronously, while also independently responding to HTTP requests as it usually does. These two processes (Kafka consuming, HTTP request handling) aren't related, except that they'll be happening under the hood of the same service.
What I've written
Here's my setup:
# ./create_app.py
from flask import Flask
from flask_socketio import SocketIO

socketio = None

def create_app(kafka_consumer_too=False):
    """
    Return a Flask app object, with or without a Kafka-ready SocketIO object as well
    """
    app = Flask('my_service')
    app.register_blueprint(special_http_handling_blueprint)
    if kafka_consumer_too:
        global socketio
        socketio = SocketIO(app=app, message_queue='kafka://localhost:9092', channel='some_topic')
        from .blueprints import kafka_consumption_blueprint
        app.register_blueprint(kafka_consumption_blueprint)
        return app, socketio
    return app
My run.py is:
# ./run.py
from .create_app import create_app

app, socketio = create_app(kafka_consumer_too=True)

if __name__ == "__main__":
    socketio.run(app, debug=True)
And here's the Kafka consumption blueprint I've written, which is where I think it should be handling the stream events:
# ./blueprints/kafka_consumption_blueprint.py
from flask import Blueprint

from ..create_app import socketio

kafka_consumption_blueprint = Blueprint('kafka_consumption', __name__)

@socketio.on('message')
def handle_message(message):
    print('received message: ' + message)
What it currently does
With the above, my HTTP requests are being handled fine when I curl localhost:5000. The problem is that, when I write to the some_topic Kafka topic (on port 9092), nothing is showing up. I have a CLI Kafka consumer running in another shell, and I can see that the messages I'm sending on that topic are showing up. So it's the Flask app that's not reacting: no messages are being consumed by handle_message().
What am I missing here? Thanks in advance.
I think you are interpreting the meaning of the message_queue argument incorrectly.
This argument is used when you have multiple server instances. These instances communicate with each other through the configured message queue. This queue is 100% internal; there is nothing that you, as a user of the library, can do with the message queue.
If you want to build some sort of pub/sub mechanism, you have to implement the listener for that in your application yourself.
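For example (a sketch only, assuming the kafka-python package and the topic/broker from the question), you could consume Kafka in a background task and emit to clients yourself:
from kafka import KafkaConsumer

def kafka_listener():
    consumer = KafkaConsumer('some_topic',
                             bootstrap_servers='localhost:9092')
    for record in consumer:
        message = record.value.decode('utf-8')
        print('received message: ' + message)
        # forward to websocket clients if that is the end goal
        socketio.emit('message', message)

# in run.py, start it alongside the web server:
# socketio.start_background_task(kafka_listener)
# socketio.run(app, debug=True)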
I need to connect monitoring and tracing tools for our application. Our main code is on Express 4 running on Google Cloud Functions. All incoming requests come through a front nginx proxy server that handles the domain and pretty route names. Unfortunately, the trace agent traces these requests as they arrive from the nginx front proxy without any additional information, and this is not enough to collect useful information about the app. I found the Stackdriver custom API, which, as I understand it, might help collect appropriate data at runtime, but I don't understand how I can connect it to a Google Cloud Functions app. All other examples say that we must extend our startup script, but Google Cloud Functions is a fully automated environment; there is no such possibility here.
Found the solution. I had included require("@google-cloud/trace-agent") somewhere other than at the top of index.js. It should be required before all other modules. After moving it there, it started to work.
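For illustration, the top of index.js would then look something like this (a sketch; the start() options depend on your setup):
// must be the very first require, before express or anything else
require('@google-cloud/trace-agent').start();

const express = require('express');
const app = express();
// ... the rest of the function / routes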
Placing require("@google-cloud/trace-agent") as the very first import didn't work for me. I still kept getting:
ERROR:@google-cloud/trace-agent: express tracing might not work as /var/tmp/worker/node_modules/express/index.js was loaded before the trace agent was initialized.
However, I managed to work around it by manually patching express:
var traceApi = require('@google-cloud/trace-agent').get();
require("@google-cloud/trace-agent/src/plugins/plugin-express")[0].patch(
    require(Object.keys(require('module')._cache).find( _ => _.indexOf("express") !== -1)),
    traceApi
);
I have recently started using Spark and I want to run a Spark job from a Spring web application.
I have a situation where I am running a web application in a Tomcat server using Spring Boot. My web application receives a REST web service request, and based on that it needs to trigger a Spark calculation job in a YARN cluster. Since my job can take a long time to run and accesses data from HDFS, I want to run the Spark job in yarn-cluster mode, and I don't want to keep a Spark context alive in my web layer. Another reason for this is that my application is multi-tenant, so each tenant can run its own job; in yarn-cluster mode each tenant's job can start its own driver and run in its own Spark cluster. In the web app JVM, I assume I can't run multiple Spark contexts in one JVM.
I want to trigger Spark jobs in yarn-cluster mode from Java code in my web application. What is the best way to achieve this? I am exploring various options and would appreciate your guidance on which one is best.
1) I can use the spark-submit command line shell to submit my jobs. But to trigger it from my web application I would need to use either the Java ProcessBuilder API or some package built on top of ProcessBuilder. This has two issues. First, it doesn't sound like a clean way of doing it; I should have a programmatic way of triggering my Spark applications. Second, I will lose the capability of monitoring the submitted application and getting its status. The only crude way of doing that is reading the output stream of the spark-submit shell, which again doesn't sound like a good approach.
2) I tried using the YARN client to submit the job from the Spring application. The following is the code that I use to submit the Spark job using the YARN Client:
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;

// sparkArgs holds the arguments you would otherwise pass to spark-submit
Configuration config = new Configuration();
System.setProperty("SPARK_YARN_MODE", "true");
SparkConf conf = new SparkConf();
ClientArguments cArgs = new ClientArguments(sparkArgs, conf);
Client client = new Client(cArgs, config, conf);
client.run();
But when I run the above code, it tries to connect to localhost only. I get this error:
15/08/05 14:06:10 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/08/05 14:06:12 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
So I don't think it can connect to the remote machine.
Please suggest the best way of doing this with the latest version of Spark. Later I plan to deploy this entire application on Amazon EMR, so the approach should work there as well.
Thanks in advance.
Spark JobServer might help: https://github.com/spark-jobserver/spark-jobserver. This project receives RESTful web requests and starts a Spark job. Results are returned as a JSON response.
I also had similar issues trying to run a Spark app that connects to a YARN cluster - with no cluster config it was trying to connect to the local machine as the main node of the cluster, which obviously failed.
It worked for me when I placed core-site.xml and yarn-site.xml on the classpath (src/main/resources in a typical sbt or Maven project structure) - the application then correctly connected to the cluster.
When using spark-submit, the location of those files is typically specified by the HADOOP_CONF_DIR environment variable, but for a stand-alone application that had no effect.
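As a quick way to sanity-check that those files are actually being picked up (a sketch, assuming Hadoop's YARN client libraries are on the classpath), you can read the ResourceManager address from the configuration; if it still prints the 0.0.0.0:8032 default, the classpath is not set up correctly:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnConfigCheck {
    public static void main(String[] args) {
        // YarnConfiguration loads core-site.xml and yarn-site.xml from the
        // classpath (e.g. src/main/resources in a Maven/sbt project)
        Configuration config = new YarnConfiguration();
        System.out.println(config.get("yarn.resourcemanager.address"));
    }
}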
I have read the scrapy-redis example but still don't quite understand how to use it.
I have run the spider named dmoz and it works well. But when I start another spider named mycrawler_redis, it just gets nothing.
Besides, I'm quite confused about how the request queue is set. I didn't find any piece of code in the example project which illustrates the request queue setting.
And if the spiders on different machines want to share the same request queue, how can I get that done? It seems that I should first make the slave machine connect to the master machine's redis, but I'm not sure which part to put the relevant code in - in spider.py, or do I just type it in the command line?
I'm quite new to scrapy-redis and any help would be appreciated!
If the example spider is working and your custom one isn't, there must be something that you have done wrong. Update your question with the code, including all relevant parts, so we can see what went wrong.
Besides, I'm quite confused about how the request queue is set. I didn't find any piece of code in the example project which illustrates the request queue setting.
As far as your spider is concerned, this is done by appropriate project settings, for example if you want FIFO:
# Enables scheduling storing requests queue in redis.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
# Don't cleanup redis queues, allows to pause/resume crawls.
SCHEDULER_PERSIST = True
# Schedule requests using a queue (FIFO).
SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.SpiderQueue'
As far as the implementation goes, queuing is done via RedisSpider, which your spider must inherit from. You can find the code for enqueuing requests here: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/scheduler.py#L73
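For instance, a minimal redis-fed spider might look roughly like this (the class name and redis_key here are illustrative; the key follows the default %(name)s:start_urls convention):
from scrapy_redis.spiders import RedisSpider

class MySpider(RedisSpider):
    name = 'myspider_redis'
    # the spider waits for start URLs to be pushed onto this redis key, e.g.:
    #   redis-cli lpush myspider_redis:start_urls http://example.com
    redis_key = 'myspider_redis:start_urls'

    def parse(self, response):
        self.log('crawled %s' % response.url)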
As for the connection, you don't need to manually connect to the redis machine, you just specify the host and port information in the settings:
REDIS_HOST = 'localhost'
REDIS_PORT = 6379
And the connection is configured in connection.py: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/connection.py
The example of usage can be found in several places: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/pipelines.py#L17
I am on an OPDK installation of Apigee Edge. I have a zombie API proxy, meaning I can't delete the API proxy in the UI (and usually not via the MS API, either). I get the following error:
What is the best way to ensure Apigee Edge is cleared of this zombie API proxy so that I can redeploy this API proxy again?
To clean this up, you will need to execute some manual steps:
1) Check the deployed API proxies via the MS API ("curl http(s)://{mgmt-host}:{port}/v1/o/{orgname}/e/{envname}/apiproxies"). This will give you the actual response info that the UI is -trying- to parse.
2) Delete the proxy using the MS API ("curl -X DELETE http(s)://{mgmt-host}:{port}/v1/o/{orgname}/e/{envname}/apiproxies/{apiproxy_name}"). Re-check step 1 to see if it is cleaned up.
3) If it is clean, try your deployment again. If it succeeds, you are good.
4) If it does not, then:
5) Go to ZooKeeper (/opt/apigee//share/zookeeper) and run the CLI (./zkCli.sh).
6) Find /organizations/{orgname}/environments/{envname}/apiproxies/ and see if the {apiproxy_name} is there.
7) If so, execute "[{prompt-stuff}] rmr /organizations/{orgname}/environments/{envname}/apiproxies/{apiproxy_name}" in zk.
8) Repeat the checks above; the proxy should be all cleaned up.
Note: There are a few circumstances that may require some additional steps, such as actually incorrect server configurations or conflicting config data.
Hope that helps.