Correct way to get a gRPC client to communicate with one of many ECS instances of the gRPC service?

I have a gRPC client (not dockerised) and a gRPC server application, which I want to dockerise.
What I don't understand is that gRPC first establishes a connection with a server, which involves a handshake. So, if I deploy the dockerised server on ECS with multiple instances, how will the client switch from one instance to another (e.g., if one gRPC server falls over)?
I know the AWS load balancer now works with HTTP/2, but I can't find information on how to handle the fact that the backend might change after the client has already opened a connection to one of them.
What is involved?

You don't necessarily need an in-line load balancer for this. By using a Round Robin client-side load balancing policy along with a DNS record that points to multiple backend instances, you should be able to get some level of redundancy.
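With the Python client, that combination looks roughly like this (the DNS name below is a placeholder for whatever name resolves to your ECS tasks, and the generated stub is illustrative):

import grpc

# "dns:///" makes the client resolve every A record for the name, and the
# "round_robin" policy spreads RPCs across the resolved addresses. If one
# backend goes away, new RPCs go to the remaining connected backends.
channel = grpc.insecure_channel(
    "dns:///my-grpc-service.internal:50051",  # placeholder service name
    options=[("grpc.lb_policy_name", "round_robin")],
)

# stub = my_service_pb2_grpc.MyServiceStub(channel)  # your generated stub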

Related

Are there existing solutions for serving multiple websocket clients from a single upstream connection using a proxy server?

I have a data vendor for real-time data that has a strict limit on the number of websocket connections I am allowed to make to their API. I have multiple microservices that need to consume this data, with some overlap in subscriptions. The clients do not need to communicate back anything beyond subscriptions.
I would like to design a system using a proxy-server that maintains a single websocket connection to the data vendor and then relays the appropriate messages to the clients via websocket. Optimally, the clients would be able to interact with the proxy server as if it were the data vendor's API.
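For reference, the rough shape of what I have in mind, using Python's websockets package (the vendor URL is a placeholder, and per-client subscription filtering is left out):

import asyncio
import websockets  # third-party "websockets" package

UPSTREAM_URL = "wss://vendor.example.com/stream"  # placeholder vendor endpoint
clients = set()

async def upstream_reader():
    # One connection to the vendor; every message is fanned out to the clients.
    # A real version would filter messages against each client's subscriptions.
    async with websockets.connect(UPSTREAM_URL) as upstream:
        async for message in upstream:
            for ws in set(clients):
                asyncio.create_task(ws.send(message))

async def handle_client(ws):
    clients.add(ws)
    try:
        async for _ in ws:  # clients only send subscription requests; ignored here
            pass
    finally:
        clients.discard(ws)

async def main():
    async with websockets.serve(handle_client, "localhost", 8765):
        await upstream_reader()

asyncio.run(main())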
I have looked at various (reverse?) proxy server solutions here, but have not found specific language about reducing the number of connections to the upstream data source. For example, I have looked into NGINX, but I can't tell whether the proxy will combine client connections into a single upstream connection. The other solution I have researched is putting all messages into a Kafka pub/sub via a connector and having each client subscribe there.
I am curious whether there are any existing, out-of-the-box solutions to this problem before I implement my own.

Can we scale gRPC with multiple worker processes on the same server?

Our application is expected to receive thousands of requests every second, and we are considering gRPC because one of our main services is in a different language.
My queries are:
Can we use something like supervisor to spawn multiple workers (one gRPC server per service) listening on the same port, or are gRPC servers limited to one per server/port?
How would I go about performance testing to determine the maximum number of requests per gRPC server?
Thanks in advance
While you can certainly use supervisord to spawn multiple gRPC server processes, port sharing would be a problem. However, this is a POSIX limitation, not a gRPC limitation: by default, multiple processes cannot listen on the same port. (To be clear, multiple processes can bind to the same port with SO_REUSEPORT, but this would not result in the behavior you presumably want.)
So you have two options in order to get traffic routed to the proper service on a single port. The first option is to run all of the gRPC services in the same process and attached to the same server.
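With the Python API, attaching several services to one server looks roughly like this (the generated add_*Servicer_to_server helpers and module names are placeholders for your own services):

from concurrent import futures
import grpc
# import orders_pb2_grpc, billing_pb2_grpc  # placeholders for your generated modules

server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))

# Attach every service to the same server instance:
# orders_pb2_grpc.add_OrdersServicer_to_server(OrdersServicer(), server)
# billing_pb2_grpc.add_BillingServicer_to_server(BillingServicer(), server)

server.add_insecure_port("[::]:50051")  # one port for all services
server.start()
server.wait_for_termination()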
If having only a single server process won't work for you, then you'll have to start looking at load balancing. You'd front all of your services with any HTTP/2-capable load balancer (e.g. Envoy, Nginx) and have it listen on your single desired port and route to each gRPC server process as appropriate.
This is a very broad question. The answer is "the way you'd benchmark any non-gRPC server." This site is a great resource for some principles behind benchmarking.
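As a rough starting point, you can drive the server from a plain thread pool and measure throughput yourself; the stub and RPC below are placeholders for your own generated code:

import time
from concurrent import futures
import grpc
# import echo_pb2, echo_pb2_grpc  # placeholders for your generated modules

def call_once(stub):
    # Replace with a real RPC on your own stub, e.g.:
    # stub.Echo(echo_pb2.EchoRequest(payload=b"x"))
    pass

def benchmark(stub, total_calls=10000, concurrency=50):
    start = time.perf_counter()
    with futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(lambda _: call_once(stub), range(total_calls)))
    elapsed = time.perf_counter() - start
    print(f"{total_calls / elapsed:.0f} requests/sec at concurrency {concurrency}")

channel = grpc.insecure_channel("localhost:50051")
# benchmark(echo_pb2_grpc.EchoStub(channel))

Increase the concurrency until latency or error rates start to degrade; that gives you a rough ceiling for a single server process.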

How do GCP load balancers manage websocket connections?

Clients connect to an API gateway server through a websocket connection. This server just orchestrates a swarm of cloud functions that handle all of the data requesting and transforming. The server is stateful: it holds essential session data, which defines, for example, which cloud functions a given user is allowed to request.
This server doesn't use the socket to broadcast data, so socket connections don't interact with each other and won't need to. All it needs to handle is single-client-to-server communication.
What will happen if I create a bunch of replicas and put a load balancer in front of all of them (like regular horizontal scaling)? If a user is connected to a certain server instance, will his connection stick there, or will the load balancer switch it between instances?
There is a parameter available on the load balancer that does what you are looking for: session affinity.
"Session affinity if set attempts to send all network request from the same client to the same virtual machine instance."
Note that even though it seems to be a load balancer setting, you actually set it when creating target pools and/or backend services. You should check whether this solution can be applied to your particular configuration.

HTTP connection pools to share among processes

Where I work our main web application is served with nginx+uwsgi+Django. A given production box has 80 uwsgi worker processes running on it. Our Django application makes moderately frequent requests to Amazon S3 but, if each of those 80 workers has to use its own HTTP connection for such requests, they're not frequent enough to take advantage of the (relatively short) HTTP Keep-Alive allowed for by Amazon's servers. So, we frequently have to pay a reconnection penalty after the connection is dropped on Amazon's side.
What I would like is if there were a proxy service running on the same box that could "concentrate" the S3 connections from those 80 processes down into a smaller pool of HTTP connections that would get enough use that they would be kept alive. The Django app would connect to the proxy, and the proxy would use its pool of kept-alive connections to forward the requests to S3. I see that it is possible to use nginx itself as a forward proxy, but it's not clear to me if or how this can take advantage of connection pooling the way I have in mind. An ideal solution would be good at auto-scaling so that a uwsgi worker would never have to wait on the proxy itself for a connection, but would pare back connections as load drops so as to keep the connections as "hot" as possible (perhaps keeping 1 or 2 spare to handle occasional upticks).
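For concreteness, the application side of what I have in mind would just point the S3 client at that local proxy, something like this with boto3 (the proxy address is hypothetical; the proxy itself is the piece I'm looking for):

import boto3
from botocore.config import Config

# Each uwsgi worker sends its S3 traffic through a hypothetical local
# "concentrator" proxy on 127.0.0.1:3128, which would maintain a small pool
# of kept-alive connections to Amazon.
s3 = boto3.client(
    "s3",
    config=Config(proxies={"https": "http://127.0.0.1:3128"}),
)

# s3.get_object(Bucket="my-bucket", Key="some/key")  # now goes via the proxy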
I've run across other forward proxies such as Squid but these products seem designed to fulfill the more traditional caching proxy role for use by e.g. ISPs that have many disparate remote clients.
Does anyone know of an existing solution for this type of problem? Many thanks!

Standard way of using a single port for multiple sockets?

Hey, I am writing an app in Twisted, and as it stands I have four servers bound to different ports, all communicating with the client via JSON. Is there any way to bind these four servers to the same port and have the interactions remain the same?
For instance, say the client subscribes to two different feeds, transmitted via a direct socket.
Right now I just do something like
server1.read_string()
server2.read_string()
and it will read the correct JSON string from the respective feeds. Is there any way to maintain this type of functionality but contact my server on the same port?
I do not want to throw all of the server functionality into one massive server and partition the data by header prefixes.
I don't want to do something like
s = server.read_string()
header = s.split(some_delimiter)[0]
if header == "SERVER1":
    # handle SERVER1's feed
It sounds like you have many clients interacting with your servers via HTTP. The standard solution is to throw a reverse proxy between the client and your servers - that proxy then forwards connections to the appropriate server depending on the URL. The reverse proxy can run on any one of your existing servers or on its own server to lighten the load.
If your data is cacheable, the reverse proxy can cache your results too.
There are many reverse proxies available, and you will want to choose one based on what sort of workload you have. Do you need it to be highly configurable? Is the data public or based on logins? How long does each connection last / how many connections do you want to hold open at once?
Squid, Varnish and HAProxy are good reverse proxies, and even Apache could do this for you.
I plan to use HAProxy for my project, Gridspy, as I have many ongoing connections with my clients and want to place an Orbited server in the same URL path as my Django server. See this tutorial for more information on how to forward many connections on port 80 from one server to many. That tutorial is focused on Comet, but your problem is even simpler than that.
If you are considering an ongoing TCP/IP connection from the browser back to your servers, seriously consider Orbited. See this tutorial about graphs via Orbited and MorbidQ. Orbited will also punch through firewalls and proxies better than most custom solutions, as it looks like normal HTTP traffic.
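Since you are already using Twisted, here is a minimal sketch of that URL-routing idea in Twisted itself, assuming the feeds can be served over HTTP (the ports and paths are hypothetical):

from twisted.internet import reactor
from twisted.web import proxy, resource, server

# One public port; requests are routed to the existing backend servers by URL path.
root = resource.Resource()
root.putChild(b"feed1", proxy.ReverseProxyResource("127.0.0.1", 8001, b"/"))
root.putChild(b"feed2", proxy.ReverseProxyResource("127.0.0.1", 8002, b"/"))

reactor.listenTCP(8080, server.Site(root))
reactor.run()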
In order to have multiple servers running on the same machine all bound to the same port, they need to be bound to different IP addresses. The only way to bind to the same port on the same IP is to enable the SO_REUSEADDR/SO_REUSEPORT socket options, but then multiple servers could receive each other's inbound data, really messing up your communications.
Otherwise, having a single server that uses headers to identify the particular feeds is best. Why do you not want to do that?
