I am new to nginx, and am wondering if it can help me solve a use case we've encountered.
I have n nodes, which are reading from a Kafka topic with the same group id, which means that each node holds disjoint data, partitioned by some key.
Nginx has no way of knowing a priori which node has the data corresponding to which keys, but we could build an API, or keep a Redis instance, that can tell us the node for a given key.
Is there a way nginx can incorporate third party information of this kind to route requests?
I'd also welcome any answers, even if it doesn't involve nginx.
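One way to incorporate an external lookup like this is to script the routing decision inside nginx. The sketch below assumes OpenResty (nginx with lua-nginx-module and lua-resty-redis) and a hypothetical Redis layout where each key maps to a backend address such as "10.0.0.12:8080"; the `node_for:` key scheme and the `key` query parameter are both made up for illustration.

```nginx
server {
    listen 80;

    location /lookup {
        set $target '';
        access_by_lua_block {
            local redis = require "resty.redis"
            local red = redis:new()
            red:set_timeout(100)  -- ms
            assert(red:connect("127.0.0.1", 6379))
            -- The key arrives as a query parameter, e.g. /lookup?key=abc
            local node = red:get("node_for:" .. ngx.var.arg_key)
            if not node or node == ngx.null then
                return ngx.exit(ngx.HTTP_NOT_FOUND)
            end
            ngx.var.target = node
        }
        # Proxy to whichever node the lookup returned.
        proxy_pass http://$target;
    }
}
```

The same shape works with an HTTP lookup service instead of Redis; the point is that the upstream address is computed per request rather than chosen from a static list.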
Nginx has no way of knowing a priori which node has data corresponding to which keys
Nginx doesn't need to know. You would do this in the Kafka Streams RPC layer with Interactive Queries. (Spring-Kafka has an InteractiveQueryService interface, by the way, that can be used from Spring Web.)
If you want to present users with a single address for the KStreams HTTP/RPC endpoints, then that would be a standard Nginx upstream definition for a reverse proxy, which would route to any of the backend servers. The servers in turn communicate among themselves to fetch the necessary key/value and return the response to the client.
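A minimal sketch of that upstream definition, with placeholder hostnames for the KStreams HTTP/RPC endpoints; any node can take the request and resolve the key internally via Interactive Queries:

```nginx
upstream kstreams_rpc {
    server app-node-1:7070;
    server app-node-2:7070;
    server app-node-3:7070;
}

server {
    listen 80;
    location / {
        # Any backend works; the app fetches the key from the owning
        # instance over the Streams RPC layer before responding.
        proxy_pass http://kstreams_rpc;
    }
}
```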
I have no idea how Kafka partitions
You could look at the source code and see it uses a murmur2 hash, which is available in Lua, and can be used in Nginx.
But again, this is a rabbit hole you should probably avoid.
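For reference, here is a rough Python port of the hashing used by Kafka's default partitioner (murmur2 with seed 0x9747b28c, masked positive, mod the partition count). This is an illustration, not the authoritative implementation; if you actually went down this road you would want to verify it against Kafka's source for your client version.

```python
def murmur2(data: bytes) -> int:
    """32-bit murmur2 as used by Kafka's default partitioner."""
    length = len(data)
    seed = 0x9747B28C
    m = 0x5BD1E995
    r = 24
    MASK = 0xFFFFFFFF

    h = (seed ^ length) & MASK
    # Mix the key four bytes (little-endian) at a time.
    for i in range(length // 4):
        k = int.from_bytes(data[i * 4:i * 4 + 4], "little")
        k = (k * m) & MASK
        k ^= k >> r
        k = (k * m) & MASK
        h = (h * m) & MASK
        h ^= k

    # Fold in the trailing 1-3 bytes, mirroring the Java switch fallthrough.
    tail = length & ~3
    rem = length % 4
    if rem >= 3:
        h ^= data[tail + 2] << 16
    if rem >= 2:
        h ^= data[tail + 1] << 8
    if rem >= 1:
        h ^= data[tail]
        h = (h * m) & MASK

    h ^= h >> 13
    h = (h * m) & MASK
    h ^= h >> 15
    return h


def partition_for(key: str, num_partitions: int) -> int:
    """Default partitioner: positive murmur2 of the key bytes, mod partitions."""
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions
```

Given the number of sharp edges (client versions, custom partitioners, repartitioning), letting the application layer answer "who owns this key" really is the safer route.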
Other option, use Kafka Connect to dump data to Redis (or whatever database you want). Then write a very similar HTTP API service, then (optionally) point Nginx at that.
Related
I was recently reading this article from the Figma engineering blog: https://www.figma.com/blog/rust-in-production-at-figma/ and was curious about their NGINX setup for multiplayer routing.
They have M servers, and each server runs W workers. Figma lets users collaborate on design documents in real time, and each document (i.e. the logic that handles the real-time multiplayer processing for that doc) always lives in one specific worker.
I’m wondering how they manage to always route users to the machine that has the worker for the document being worked on, and then to the specific process that actually has the doc.
They do this with NGINX, but my question is how?
I know that NGINX has round-robin and ip_hash methods to load balance, but that’s not granular enough to achieve what they do.
Related question:
Route traffic to multiple node servers based on a condition
You should be able to use a cookie to associate a user with the downstream node: https://docs.nginx.com/nginx/admin-guide/load-balancer/http-load-balancer/#enabling-session-persistence
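Roughly, with placeholder server names. Note that the `sticky cookie` directive described in the linked doc is an NGINX Plus feature; open-source nginx can approximate stickiness by hashing a request attribute instead.

```nginx
upstream multiplayer {
    server node1.example.com:8080;
    server node2.example.com:8080;

    # NGINX Plus: pin each client to the node that first served it.
    sticky cookie srv_id expires=1h path=/;
}

# Open-source alternative: hash on something that identifies the document,
# e.g. a cookie carrying the doc id (cookie name here is hypothetical):
#
# upstream multiplayer {
#     hash $cookie_doc_id consistent;
#     server node1.example.com:8080;
#     server node2.example.com:8080;
# }
```

Hashing on a document identifier (rather than the client) has the nice property that all collaborators on the same document land on the same server.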
I know the command swift-ring-builder /etc/swift/object.builder can list all the storage nodes in a swift cluster. Now I want to know if there is a similar command to get the proxy nodes in the cluster.
Every controller node itself acts as a proxy server first. The requests hit the proxy-server code on the controller node, which resolves the functions and methods to be called and acts on them.
The list of storage nodes MUST be accessible to all nodes in the cluster.
However, swift is agnostic about the list of proxies it has, so there is no command like that.
One suggestion, if you really need this information, would be to look at the storage nodes' logs and find the IPs making the requests. This way you can discover some or all of the proxies, but the method is imprecise.
Twilio and other HTTP-driven web services have the concept of a fallback URL, where the web service sends a GET or POST to a URL of your choice if the main URL times out or otherwise fails. In the case of Twilio, they will not retry the request if the fallback URL also fails. I'd like the fallback URL to be hosted on a separate machine so that the error doesn't get lost in the ether if the primary server is down or unreachable.
I'd like some way for the secondary to:
Store requests to the fallback URL
Replay the requests to a slightly different URL on the primary server
Retry #2 until success, then delete the request from the queue/database
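The store-replay-retry loop above is small enough to sketch. This is a minimal in-memory version with a pluggable `send` callable (which in practice would be an HTTP POST to the primary); real durability, which is the whole point of the secondary, would require persisting the queue to disk or a database.

```python
import time
from collections import deque
from typing import Callable


class ReplayQueue:
    """Store fallback requests and replay them to the primary until success."""

    def __init__(self, send: Callable[[dict], bool]):
        # `send` delivers one request to the primary, returning True on success.
        self.send = send
        self.pending: deque = deque()

    def store(self, request: dict) -> None:
        # Step 1: store the request that hit the fallback URL.
        self.pending.append(request)

    def replay(self, retry_delay: float = 0.0) -> None:
        # Steps 2-3: replay each request until it succeeds, then drop it.
        while self.pending:
            request = self.pending[0]
            if self.send(request):
                self.pending.popleft()
            else:
                time.sleep(retry_delay)
```

Existing job-queue systems (e.g. anything with at-least-once delivery and retries) cover the same ground with persistence already solved, which is probably the buzzword to search for: "durable job queue" or "store and forward".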
Is there some existing piece of software that can do this? I can build something myself if need be, I just figured this would be something someone would have already done. I'm not familiar enough with HTTP and the surrounding tools (proxies, reverse proxies, etc.) to know the right buzzword to search for.
There are a couple of possibilities.
One option is to use Common Address Redundancy Protocol or carp. Brief description from the man page follows.
"carp allows multiple hosts on the same local network to share a set of IP addresses. Its primary purpose is to ensure that these addresses are always available, but in some configurations carp can also provide load balancing functionality."
It should be possible to configure IP balancing such that when the primary (master) HTTP service fails, the secondary (backup) HTTP service becomes the master. carp is host-oriented rather than service-oriented, so when the HTTP service goes down, it should also take down the network interface for carp to do its thing. This means you would need more than one IP address in order to log into the machine and do maintenance. You would also need a script to do the follow-up actions once the original service comes back online.
The second option is to use nginx. This is probably better suited to what you are trying to do.
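With nginx, the usual shape is a `backup` server in the upstream, which only receives traffic when the primary is considered down. Hostnames below are placeholders.

```nginx
upstream twilio_callbacks {
    server primary.example.com:8080 max_fails=1 fail_timeout=10s;
    server secondary.example.com:8080 backup;
}

server {
    listen 80;
    location / {
        proxy_pass http://twilio_callbacks;
        # Retry the next server on connection errors/timeouts and 5xx.
        proxy_next_upstream error timeout http_502 http_503;
    }
}
```

Note this only solves failover; the secondary still needs the store-and-replay logic to get the missed requests back to the primary later.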
Many years ago I needed something similar to what you are trying to do, and I ended up hacking together something that did it. Essentially it was a switch: when 'A' fails, switch over to 'B'. The re-sync process was to take timestamped logs from 'B' and play them back to 'A' once 'A' was back online.
I am working on an application, which is proposed to be a set of webapps (each called an agent) running on Tomcat 7, configured on different nodes. I have been tasked with making these webapps (agents) discover each other automatically. The idea is that each webapp (say agent X), once up, will communicate a 'request pattern' to all the other webapps. The other webapps (say agents A, B, C) will in turn store this information (the 'request pattern') and use it to route any matching request to agent X via an HTTP call.
I am looking for an option where each webapp has a component listening on a particular port, and agent X, while registering itself, sends a multicast request to all the nodes on that port.
I think Apache Camel might be useful here, but I am not sure.
It would be great if somebody could comment on the technical viability of this approach, or suggest alternatives.
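Independent of the transport (multicast, JMS topic, etc.), the register-and-route scheme described above boils down to a pattern registry that each agent keeps. A minimal sketch, with the announcement mechanism and the actual HTTP forwarding left out:

```python
import re


class PatternRegistry:
    """Per-agent store of (request pattern -> owning agent) mappings.

    register() would be driven by whatever announcement mechanism is
    chosen (multicast, a message topic, ...); route() returns the agent
    that should receive the HTTP call for a given request path.
    """

    def __init__(self):
        self._routes = []  # list of (compiled pattern, agent URL)

    def register(self, pattern: str, agent_url: str) -> None:
        # Called when an agent (say X) announces its request pattern.
        self._routes.append((re.compile(pattern), agent_url))

    def route(self, request_path: str):
        # Return the agent that should handle this request, or None.
        for pattern, agent_url in self._routes:
            if pattern.fullmatch(request_path):
                return agent_url
        return None
```

Whether announcements arrive by multicast or a broker only changes who calls `register()`; the routing side stays the same.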
My first thought was that you could use Apache httpd and mod_proxy_balancer to balance all requests over the available nodes. You can define different balancers for each kind of agent. Requests will be sent to the balancer, and the balancer will route them to any available node.
This is more of a messaging than a routing problem. Add Camel if you need complex routing or adapting to legacy protocols.
This looks like a classic publish-and-subscribe use case. You can do it with any messaging technology. Look at JMS (ActiveMQ is what Camel uses) or AMQP (I've used RabbitMQ very successfully for this); both use the "topic" paradigm for this. A quick search found this as an example: http://jmsexample.zcage.com/index2.html. Or Jabber.
Julian
Info
CouchDB is a RESTful, HTTP-based NoSQL datastore. Responses are sent back in simple JSON and it is capable of utilizing ETags in the generated responses to help caching servers tell if data has changed or not.
Question
Is it possible to use nginx to front a collection of CouchDB servers where each Couch server is a shard of a larger collection (and not replicas of each other) and have it determine the target shard based on a particular aspect of the query string?
Example Queries:
http://db.mysite.com?id=1
http://db.mysite.com?id=2
Shard Logic:
shard = ${id} % 2; // even/odd
This isn't a straightforward "load balancing" question, because I need the same requests to always end up at the same servers, but I am curious whether this type of simple routing logic can be written into an nginx site configuration.
If it can be, what makes this solution so attractive is that you can then turn on nginx caching of the JSON responses from the Couch servers and have the entire setup nicely packaged up and deployed in a very capable and scalable manner.
You could cobble something together, or you could use BigCouch (https://github.com/cloudant/bigcouch).
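The "cobble something together" route can actually be done in stock nginx for this particular shard function: nginx has no arithmetic, but `id % 2` is just the parity of the last digit, which a `map` on `$arg_id` can test with a regex. Hostnames are placeholders.

```nginx
# Pick the upstream by the parity of the id query parameter's last digit.
map $arg_id $couch_shard {
    ~[02468]$  couch_even;
    ~[13579]$  couch_odd;
    default    couch_even;
}

upstream couch_even { server couch0.example.com:5984; }
upstream couch_odd  { server couch1.example.com:5984; }

server {
    listen 80;
    location / {
        # proxy_cache could be enabled here to cache the JSON responses.
        proxy_pass http://$couch_shard;
    }
}
```

Because the mapping is deterministic on the id, the same request always reaches the same shard, which is exactly the property needed before layering response caching on top. Anything more complex than modulo-by-a-power-of-ten, though, would need Lua/njs or an external lookup.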