I am designing an architecture where all microservices are clustered.
For instance: 5 web servers, 1 clustered DB, 1 clustered queue system, and 8 clustered workers (sending email, SMS, etc.) that consume from the queue (tasks are pushed by the web servers).
I am wondering about the best practice for detecting that each cluster of microservices is healthy, and how to 'fail fast' the whole service in case one of the microservices becomes unavailable.
The whole service sits behind nginx acting as an HA proxy - should nginx be the component that monitors everything and triggers the failure? How can I check the health of all the microservices?
You could use an external monitoring service such as Pingometer.
This lets you set up simple health checks (HTTP, HTTPS, ping, etc.) at regular intervals and receive alerts if a node fails, is unavailable, or does not respond with the correct content.
In your account, you can set up a webhook that fires when a service goes down. You can use the webhook to trigger a failover, change DNS records, etc.
We set up something similar and it's working quite well.
You can also use something internally to monitor nginx itself (e.g. reaping dead workers and respawning them), but that doesn't tell you whether the service is reachable from the outside, the way an external monitoring service would.
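Whichever monitoring product you pick, the underlying check is just polling a health endpoint per cluster. A minimal sketch in Python, assuming each microservice exposes some HTTP health URL (the endpoint names are placeholders, not part of your setup):

```python
# Minimal health-check poller: a cluster is "up" if its health URL answers 2xx.
import urllib.error
import urllib.request


def check_health(url: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


def overall_status(endpoints: dict) -> dict:
    """Check every cluster; the whole system is healthy only if all of them are."""
    results = {name: check_health(url) for name, url in endpoints.items()}
    results["all_healthy"] = all(results.values())
    return results


# Example (placeholder URLs):
# overall_status({
#     "web":   "http://web.internal/health",
#     "queue": "http://queue.internal/health",
# })
```

A script like this can run from cron, and on `all_healthy == False` it could call the same webhook/failover logic described above.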
I have a non-dockerised gRPC client and a server application that I want to dockerise.
What I don't understand is that gRPC first creates a connection with a server, which involves a handshake. So if I deploy the dockerised server on ECS with multiple instances, how will the client switch from one instance to another (e.g., if one gRPC server falls over)?
I know the AWS load balancer now supports HTTP/2, but I can't find information on how to handle the fact that the server might change after the client has already opened a connection to another one.
What is involved?
You don't necessarily need an in-line load balancer for this. By using a Round Robin client-side load balancing policy along with a DNS record that points to multiple backend instances, you should be able to get some level of redundancy.
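As a sketch of what that looks like with the grpcio Python client (the DNS target name is a placeholder; the retry policy is an assumption added so a dying backend is skipped quickly, not something the answer above prescribes):

```python
# Sketch: enabling client-side round-robin load balancing in a gRPC client.
# The channel's service config selects the round_robin policy, and the DNS
# resolver expands "my-grpc.internal" into all backend addresses behind it.
import json

service_config = json.dumps({
    "loadBalancingConfig": [{"round_robin": {}}],
    "methodConfig": [{
        "name": [{}],  # empty name entry: apply to all services/methods
        "retryPolicy": {
            "maxAttempts": 3,
            "initialBackoff": "0.1s",
            "maxBackoff": "1s",
            "backoffMultiplier": 2,
            "retryableStatusCodes": ["UNAVAILABLE"],
        },
    }],
})

# With grpcio installed (shown commented, since it is a third-party package):
# import grpc
# channel = grpc.insecure_channel(
#     "dns:///my-grpc.internal:50051",   # placeholder multi-A-record name
#     options=[("grpc.service_config", service_config)],
# )
```

When a backend falls over, the subchannel to it goes down and round-robin keeps picking from the remaining healthy addresses; re-resolution of the DNS name picks up replacement instances.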
Recently, I was chatting with a much more experienced engineer. We had a service on the server that only initiated requests to a partner. I suggested that this service required us to configure a port, and he turned down my suggestion. He said something along the lines of: "Since we are not hosting a service that is accessed by anyone, but rather accessing a partner's service, we don't need to configure a port." It got me thinking: given that we have so many services on the same server, how does the server know that a given response is for a given service?
Broadly, the server is really acting as a client here, and the ports used for outgoing connections are assigned dynamically by the networking stack. Under normal conditions, the assigned (ephemeral) port is:
- numbered above 1023 (ports below 1024 are reserved for root/privileged processes)
- not currently in use
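You can see this happen with a plain socket: no port is configured anywhere, yet the kernel assigns one at connect time, and the (source IP, source port, destination IP, destination port) tuple is what routes each response back to the right process. A small self-contained demonstration:

```python
# Demonstration: the OS picks an ephemeral source port for an outgoing
# connection; the 4-tuple (src IP, src port, dst IP, dst port) identifies
# the connection, so replies find their way back to the right service.
import socket

# A throwaway local listener standing in for the "partner" service.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))      # port 0: let the OS choose a port
server.listen(1)
partner_addr = server.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(partner_addr)       # no explicit bind: kernel assigns the port
ephemeral_port = client.getsockname()[1]

client.close()
server.close()
```

Every outgoing connection from every service on the box gets its own ephemeral port this way, which is why none of them needs a configured one.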
I am testing the auto-scaling feature of OpenStack. In my test setup, Java servlet applications are deployed in Tomcat web servers behind an HAProxy load balancer. I aim to stress-test the application with JMeter, to see how it scales and what the response times are. However, I observe that HAProxy (or something else) terminates the connection as soon as the onComplete signal is sent by one of the member instances. Consequently, the subsequent responses from the remaining servers are reported as failures (timeouts). I have configured the HAProxy server to use a round-robin algorithm with sticky sessions. See attached JMeter results tree. I am not sure what step to take next. The web applications are asynchronous, so my expectation was that the client (HAProxy in this case) would wait until the last thread is submitted before sending the response.
Are there issues with my approach, or is there a flaw in my setup?
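If connections are being cut while async responses are still pending, a common first suspect is HAProxy's timeout settings. A hedged haproxy.cfg sketch (the values are illustrative, not taken from the setup above):

```
defaults
    mode http
    timeout connect 5s
    # How long HAProxy waits on the client/server side while a response is
    # pending; raise these if async servlets legitimately take longer.
    timeout client  60s
    timeout server  60s
    # For long-lived streamed or tunnelled responses:
    timeout tunnel  1h
```

Checking HAProxy's logs for the session termination flags (e.g. `sD`, `cD`) will tell you which side's timeout fired, which narrows down whether HAProxy or the backend closed the connection.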
We have developed a TeamViewer-like service where clients connect via SSL to our centralized servers. Other clients can connect to the server as well and we can setup a tunnel through our service to allow peer-to-peer connectivity without NAT or firewall issues.
This works fine with Azure Cloud Services, but we would like to move away from Azure Cloud Services. Service Fabric seems to be the way to go, because it supports ARM, allows more fine-grained services, and makes updating parts of the system much easier.
I know that microservices in Service Fabric can be stateful, but all examples use persistent data as state. In my situation the TCP connection is also part of the state. Is it possible to use TCP with service fabric?
The TCP endpoint should be kept alive on the same instance (for several days), so this makes the entire service fabric model much more difficult.
Sure, you can have users connect to your services over any protocol you want. Your service sounds very stateful to me in the same way that user session state is stateful - you want users to return to the same place where their data is. In your case, that "data" is a TCP connection. But there's no guarantee a TCP endpoint will be kept alive for days in any system - machines fail, software crashes, OSes get patched, etc. You need to be prepared for the connection to break so you can quickly re-establish it. Service Fabric stateful services are great for this. Failover of a stateful service to another machine is extremely fast (milliseconds). Of course, you can't actually replicate a live connection, but you sure can replicate all the metadata you need to re-establish a connection if it breaks.
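The "be prepared for the connection to break" part usually boils down to a reconnect loop with backoff on the client side. A generic sketch (connect_fn is a hypothetical callable that opens the connection; it is not a Service Fabric API):

```python
# Sketch: re-establish a broken connection with exponential backoff, so a
# stateful-service failover (typically milliseconds) is absorbed transparently.
import time


def connect_with_backoff(connect_fn, max_attempts=5, base_delay=0.1):
    """Call connect_fn until it succeeds; back off exponentially between tries."""
    for attempt in range(max_attempts):
        try:
            return connect_fn()
        except OSError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))
```

On reconnect, the client would present whatever session identifier the replicated metadata keyed on, so the service can resume the tunnel where it left off.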
Every major service in OpenStack has an API service as the endpoint for clients to access, e.g. openstack-nova-api, openstack-glance-api, etc. But each major service also has internal services like openstack-nova-scheduler and openstack-nova-conductor; these are usually deployed on nodes other than the one running the API service, to get some isolation.
My question is: how does openstack-nova-api know where the real services (openstack-nova-scheduler/openstack-nova-conductor) are running, and how do they communicate with each other? When openstack-nova-api gets a new request, how does it dispatch it to the services that can process it and send back the results?
Internal communication between OpenStack modules is done through the AMQP message queue, typically managed by RabbitMQ.
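The pattern is RPC over the broker: the API service publishes a request onto a topic queue, a worker (e.g. the scheduler) consumes it, and the result comes back on a reply queue, so the API never needs to know which node the worker runs on. A minimal in-process sketch of that pattern, with Python's stdlib `queue.Queue` standing in for RabbitMQ (service names and payloads are illustrative):

```python
# In-process sketch of RPC-over-a-queue: queue.Queue stands in for the AMQP
# broker, and the worker thread stands in for openstack-nova-scheduler.
import queue
import threading

request_q = queue.Queue()   # the broker topic the API publishes to


def scheduler_worker():
    """Consume one request, process it, and reply on the caller's reply queue."""
    msg = request_q.get()
    result = {"host": "compute-1", "request_id": msg["request_id"]}  # fake decision
    msg["reply_q"].put(result)


def api_rpc_call(request_id):
    """What nova-api effectively does: publish a request, block on the reply."""
    reply_q = queue.Queue()
    request_q.put({"request_id": request_id, "reply_q": reply_q})
    return reply_q.get(timeout=5)


threading.Thread(target=scheduler_worker, daemon=True).start()
response = api_rpc_call("req-42")
```

In real OpenStack this is oslo.messaging on top of RabbitMQ: workers subscribe to well-known topic queues (e.g. `scheduler`), so location is resolved by the broker rather than by the API service.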