Can nginx be used to front sharded, HTTP-based resources? - nginx

Info
CouchDB is a RESTful, HTTP-based NoSQL datastore. Responses are returned as simple JSON, and it can include ETags in generated responses to help caching servers tell whether data has changed.
Question
Is it possible to use nginx to front a collection of CouchDB servers where each Couch server is a shard of a larger collection (and not replicas of each other) and have it determine the target shard based on a particular aspect of the query string?
Example Queries:
http://db.mysite.com?id=1
http://db.mysite.com?id=2
Shard Logic:
shard = ${id} % 2; // even/odd
This isn't a straightforward "load balancing" question, because I would need the same requests to always end up at the same servers, but I am curious whether this type of simple routing logic can be written into an nginx site configuration.
If it can, what makes this solution so attractive is that you could then turn on nginx caching of the JSON responses from the Couch servers and have the entire setup nicely packaged up and deployed in a very capable and scalable manner.

You could cobble something together or you could use BigCouch (https://github.com/cloudant/bigcouch).
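For the "cobble something together" route, a minimal sketch in plain nginx config might look like the following. Since id % 2 only depends on the last decimal digit, a map on $arg_id avoids the need for Lua; the upstream names and ports here are assumptions, not anything CouchDB-specific.

# Hypothetical sketch: route ?id=N to one of two CouchDB shards by id % 2.
upstream couch_shard_even {
    server couch0.internal:5984;
}
upstream couch_shard_odd {
    server couch1.internal:5984;
}

# Even/odd split: id % 2 depends only on the last decimal digit of ?id=...
map $arg_id $couch_shard {
    "~[02468]$"  couch_shard_even;
    "~[13579]$"  couch_shard_odd;
    default      couch_shard_even;   # or handle missing/invalid ids explicitly
}

server {
    listen 80;
    server_name db.mysite.com;

    location / {
        # The variable resolves to one of the upstream groups above;
        # the original URI and query string are passed through unchanged.
        proxy_pass http://$couch_shard;
    }
}

Adding proxy_cache directives to that location block would then give you the cached-JSON setup described above, at the cost of maintaining the shard map by hand, which is essentially the problem BigCouch solves for you.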

Related

Custom routing via nginx - read from third party source

I am new to nginx and am wondering if it can help me solve a use case we've encountered.
I have n nodes, which are reading from a Kafka topic with the same group id, which means that each node has disjoint data, partitioned by some key.
Nginx has no way of knowing a priori which node has data corresponding to which keys. But we can build an API, or have a Redis instance, which can tell us the node given the key.
Is there a way nginx can incorporate third party information of this kind to route requests?
I'd also welcome any answers, even if they don't involve nginx.
Nginx has no way of knowing a priori which node has data corresponding to which keys
Nginx doesn't need to know. You would need to do this in the Kafka Streams RPC layer with Interactive Queries. (Spring-Kafka has an InteractiveQueryService interface, btw, that can be used from Spring Web.)
If you want to present users with a single address for the KStreams HTTP/RPC endpoints, then that would be a standard Nginx upstream definition for a reverse proxy, which would route to any of the backend servers; those in turn communicate among themselves to fetch the necessary key/value and return the response back to the client.
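As a rough sketch of that, assuming three KStreams instances exposing their HTTP/RPC endpoints on port 7070 (hostnames and port are placeholders), the Nginx side is just a standard upstream:

upstream kstreams_rpc {
    server node1.internal:7070;
    server node2.internal:7070;
    server node3.internal:7070;
}

server {
    listen 80;
    server_name api.example.com;

    location / {
        # Any instance can take the request; Interactive Queries forward
        # internally to whichever instance actually owns the key.
        proxy_pass http://kstreams_rpc;
    }
}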
I have no idea how Kafka partitions the data
You could look at the source code and see that it uses a murmur2 hash, which is available in Lua and can be used in Nginx.
But again, this is a rabbit hole you should probably avoid.
Another option: use Kafka Connect to dump the data into Redis (or whatever database you want). Then write a very similar HTTP API service and (optionally) point Nginx at that.

Can nginx reverse proxy isolate cache per endpoint?

I'm using nginx reverse proxy to cache content from two endpoints, one of which is very reliable; the other has frequent timeouts.
I've found that those timeouts can sometimes use up all available connections or cause other issues, degrading performance for the server as a whole and leading to increased latency for the reliable endpoint as well.
I've tweaked some settings (worker_rlimit_nofile, worker_connections), but what I'd really like to do is isolate the caching and connections for the two endpoints as much as possible: give each a share of the available cache, and a share of the available connections, and operate as if they're hitting two separate servers, to reduce the chances that issues with one endpoint affect the performance of the other.
If I were to create two location blocks, one for each endpoint, can I designate each block's share of the cache (e.g. number of files, or total size) and share of available connections?
Or is there a better way of achieving this goal of isolation to ensure reliable performance for the good endpoint, even if the bad endpoint is experiencing lots of timeouts?
Most of the proxy_cache_* directives can be set per location block, which will allow you to do just that.
https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache
It may also help others answer if an example config is provided that reflects what you're currently doing.
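As a hedged sketch of that kind of isolation (the zone names, paths, sizes, and limits below are all assumptions to adapt), each endpoint gets its own cache zone and its own upstream, with a connection cap and short timeouts on the unreliable one:

proxy_cache_path /var/cache/nginx/reliable keys_zone=reliable_cache:50m max_size=2g;
proxy_cache_path /var/cache/nginx/flaky    keys_zone=flaky_cache:50m    max_size=2g;

upstream reliable_backend {
    server reliable.internal:8080;
}
upstream flaky_backend {
    # Cap concurrent connections to the bad endpoint (open-source nginx >= 1.11.5).
    server flaky.internal:8080 max_conns=50;
}

server {
    listen 80;

    location /reliable/ {
        proxy_cache reliable_cache;
        proxy_pass  http://reliable_backend;
    }

    location /flaky/ {
        proxy_cache           flaky_cache;
        proxy_connect_timeout 2s;             # fail fast instead of tying up workers
        proxy_read_timeout    5s;
        proxy_cache_use_stale error timeout;  # serve a stale copy when the upstream misbehaves
        proxy_pass            http://flaky_backend;
    }
}

That won't make the flaky endpoint healthy, but it bounds how many connections it can consume and keeps its cache churn away from the reliable one.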

How to route a user to a specific machine using NGINX?

I was recently reading this article from the Figma engineering blog: https://www.figma.com/blog/rust-in-production-at-figma/ and was curious about their NGINX setup for multiplayer routing. Roughly, it looks like this: they have M servers, and each server runs W workers. Figma lets users collaborate on design documents in real time, and each document (i.e. the logic that takes care of the real-time multiplayer processing for that doc) always lives in one specific worker.
I’m wondering how they manage to always route users to the machine that has the worker for the document being worked on, and then to the specific process that actually has the doc.
They do this with NGINX, but my question is how?
I know that NGINX has round-robin and ip_hash methods to load balance, but that’s not granular enough to achieve what they do.
Related question:
Route traffic to multiple node servers based on a condition
You should be able to use a cookie to associate a user with the downstream node: https://docs.nginx.com/nginx/admin-guide/load-balancer/http-load-balancer/#enabling-session-persistence
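Note that the sticky cookie directive described on that page is an NGINX Plus feature. A sketch of it, plus an open-source alternative that hashes on a document id taken from the request (the parameter name, hostnames, and port are assumptions), could look like:

# NGINX Plus: pin a client to one backend via a cookie set by nginx.
upstream multiplayer {
    server worker1.internal:9000;
    server worker2.internal:9000;
    sticky cookie srv_id expires=1h path=/;
}

# Open-source nginx: consistent-hash on the document id, so everyone
# editing the same doc lands on the same backend ($arg_doc_id is assumed).
upstream multiplayer_oss {
    hash $arg_doc_id consistent;
    server worker1.internal:9000;
    server worker2.internal:9000;
}

Either way, nginx only picks the server; routing to the specific worker process within a machine would have to happen on the box itself, unless each worker is listed as its own upstream server entry.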

Handling multiple requests for a single resource

Recently I came across a question: if we have around 1000 users hitting a REST endpoint of a microservice, and that endpoint fetches the same data from some other slow process, how could we optimize the requests in this use case? Caching is the obvious answer, but how could it be optimized for a large number of concurrent requests?
Obviously, as you say, you have to go through the caching options. When you have a single application instance, the fastest route can be Ehcache, which is a fairly simple framework to implement.
But if you want to increase the availability of the service, you have to cluster it. For that you need a load balancer such as nginx, and a centralized cache, which can be a Redis db.
                         Redis (cache)
                              ^
                              |
Nginx (LB) -> cluster of your app -> other service
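If the hot path really is 1000 clients asking for the same response, one more nginx-level option (not mentioned above, and entirely optional alongside Redis) is to let the load balancer cache the response itself and collapse concurrent identical requests into a single upstream fetch with proxy_cache_lock. A rough sketch, with made-up names and a deliberately short TTL:

proxy_cache_path /var/cache/nginx/api keys_zone=api_cache:10m max_size=1g;

upstream app_cluster {
    server app1.internal:8080;
    server app2.internal:8080;
}

server {
    listen 80;

    location /api/ {
        proxy_cache           api_cache;
        proxy_cache_valid     200 10s;   # short TTL; tune to how fresh the data must be
        proxy_cache_lock      on;        # only one request per key goes upstream at a time
        proxy_cache_use_stale updating;  # the rest get the previous copy meanwhile
        proxy_pass            http://app_cluster;
    }
}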

Why use Mongrel2?

I'm confused about what purpose Mongrel2 serves or what it provides that nginx doesn't already do.
(Yes, I've read the manual, but I must be too much of a noob to understand how it's fundamentally different from nginx.)
My current web application stack is:
- nginx: webserver
- Lua: programming language
- FastCGI + LuaJIT: to connect nginx to Lua
- Postgres: database
If you could name only one thing, it would be that Mongrel2 is built around ZeroMQ, which means that scaling your web server has never been easier.
When a request comes in, Mongrel2 receives it (nothing unusual here, same as nginx or any other httpd). The next thing that happens is that Mongrel2 distributes the task of compiling a response to n (ZeroMQ-enabled) backends, waits for them to do the work, receives the results, assembles the response, and sends it off to the client.
Now, the magic is that n can be any number, that each of those n backends can be written in any language supported by ZeroMQ (20 or so), and that it all goes across the network, so each backend can be a dedicated box, possibly in another datacenter.
In other words: with nginx and the rest you have to handle scalability in your logic tier; Mongrel2 lets you start (from a request/response-cycle point of view) right where the request hits your infrastructure, at the httpd, rather than letting that complexity penetrate down into your logic tier, which blows complexity upwards by at least an order of magnitude, imo.
You should look at the strengths of each and decide to use either or both, depending on your use cases.
While it seems on the surface that nginx does everything Mongrel2 provides, you'll find there are major differences in focus between the two.
Nginx shines as a front-end webserver that can proxy requests to your backend webservers/appservers and also serve static content.
Mongrel2 is a slight change in the stack. As mentioned, its power comes from its use of ZeroMQ as the transport layer between it and the backend appservers. It can serve dynamic request URLs (app requests) and direct the compute portion of the task out to different backends using ZeroMQ.
Mongrel2 allows you to serve not just HTTP and WebSockets but other protocols too (if you're so inclined), all from the same server. The user would never know that portions of the app are being served from different backends.
If your requirements for the functionality of your webapp keep changing, or you want to add things like streaming or the ability to code the backend in different languages, then I would definitely look at Mongrel2. Or even have a hybrid where you use nginx/haproxy/varnish for static files and caching, and everything else is directed to Mongrel2.
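A hybrid front end of that kind might look roughly like this on the nginx side (the document root, cache zone, and Mongrel2 port are assumptions):

proxy_cache_path /var/cache/nginx/app keys_zone=app_cache:10m max_size=1g;

server {
    listen 80;

    # Static assets served straight from disk by nginx.
    location /static/ {
        root    /srv/www;
        expires 1h;
    }

    # Everything else goes to Mongrel2, with optional response caching.
    location / {
        proxy_cache app_cache;
        proxy_pass  http://127.0.0.1:6767;
    }
}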
