I have a data vendor for real-time data that has a strict limit on the number of websocket connections I am allowed to make to their API. I have multiple microservices that need to consume this data, with some overlap in subscriptions. The clients do not need to communicate back anything beyond subscriptions.
I would like to design a system using a proxy server that maintains a single websocket connection to the data vendor and then relays the appropriate messages to the clients via websocket. Ideally, the clients would be able to interact with the proxy server as if it were the data vendor's API.
I have looked at various (reverse?) proxy server solutions here, but have not found specific language about reducing the number of connections to the upstream data source. For example, I have looked into NGINX, but I can't tell whether the proxy will combine client connections into a single upstream connection. The other solution I have researched is simply putting all messages into Kafka Pub/Sub via a connector and having each client subscribe there.
I am curious if there are any existing, out of the box solutions to this problem before I implement my own solution.
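Whatever transport ends up carrying the data (a hand-written websocket proxy or Kafka), the core of such a proxy is a reference-counted subscription table. It subscribes upstream when the first client asks for a symbol, unsubscribes when the last one leaves, and fans each vendor message out to the interested clients. Below is a minimal, transport-agnostic sketch. It assumes that vendor messages are dicts carrying a `symbol` key and that a hypothetical `upstream_send` callback writes to the single vendor connection. The vendor's real message format will differ.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class Hub:
    """Multiplexes many client subscriptions onto one upstream connection.

    upstream_send is a hypothetical callback that writes a control message
    on the single vendor websocket; the message shape is an assumption.
    """

    def __init__(self, upstream_send: Callable[[dict], None]) -> None:
        self.upstream_send = upstream_send
        self.subscribers: Dict[str, Set[str]] = defaultdict(set)  # symbol -> client ids
        self.outboxes: Dict[str, List[dict]] = defaultdict(list)  # client id -> pending msgs

    def subscribe(self, client: str, symbol: str) -> None:
        first = not self.subscribers[symbol]
        self.subscribers[symbol].add(client)
        if first:  # only the first interested client triggers an upstream subscribe
            self.upstream_send({"action": "subscribe", "symbol": symbol})

    def unsubscribe(self, client: str, symbol: str) -> None:
        clients = self.subscribers.get(symbol)
        if not clients or client not in clients:
            return
        clients.discard(client)
        if not clients:  # last interested client left: drop the upstream subscription
            del self.subscribers[symbol]
            self.upstream_send({"action": "unsubscribe", "symbol": symbol})

    def on_upstream_message(self, msg: dict) -> None:
        # Fan one vendor message out to every client subscribed to its symbol.
        for client in self.subscribers.get(msg["symbol"], ()):
            self.outboxes[client].append(msg)
```

In a real proxy, each outbox would be drained by that client's own websocket writer, so a slow client cannot stall the shared upstream reader.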
I'm trying to solve an architecture design puzzle: designing an infrastructure that keeps data and servers as secure and hidden as possible. Here are the requirements:
I want to hide the internal design of my infrastructure (several data servers with public and private hosts)
I want to access each service using the same IP address, with each query forwarded to the right server based on something (cookie, URI, port, or whatever)
access to the data services must be protected with SSL/TLS encryption
After carefully studying these requirements, I was thinking about using a reverse proxy and granting access to all data services only through the reverse proxy server. Another advantage of a reverse proxy is that authentication and SSL/TLS encryption are enforced in one place, with no need to configure each endpoint separately.
My real issue is that I haven't found any reverse proxy that supports TCP queries, and likewise static load-balancing algorithms seem to be supported only for HTTP requests (HAProxy, for instance).
Any idea how to solve this issue?
Thanks to all
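For reference, HAProxy (`mode tcp`) and nginx (its `stream` module) can both proxy plain TCP, and several of HAProxy's balancing algorithms (for example `roundrobin`, `leastconn`, `source`) also work in TCP mode. To show how little port-based TCP routing needs, here is a minimal asyncio sketch. It assumes one public listening port per internal service. The ports and backend addresses are hypothetical, and TLS termination is only indicated by passing an `ssl.SSLContext` to `asyncio.start_server`.

```python
import asyncio
import ssl
from typing import Dict, List, Optional, Tuple

Backend = Tuple[str, int]


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes one way until EOF, then close the destination.

    Closing (rather than half-closing) is a simplification for this sketch.
    """
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


def make_handler(backend: Backend):
    async def handle(client_r: asyncio.StreamReader, client_w: asyncio.StreamWriter) -> None:
        backend_r, backend_w = await asyncio.open_connection(*backend)
        # Relay both directions concurrently; an error in either just ends the session.
        await asyncio.gather(pipe(client_r, backend_w), pipe(backend_r, client_w),
                             return_exceptions=True)
    return handle


async def start_proxy(routes: Dict[int, Backend], host: str = "0.0.0.0",
                      ssl_ctx: Optional[ssl.SSLContext] = None) -> List[asyncio.AbstractServer]:
    """Open one listener per public port; each forwards to its own backend."""
    servers = []
    for port, backend in routes.items():
        servers.append(await asyncio.start_server(make_handler(backend), host, port, ssl=ssl_ctx))
    return servers
```

Awaited inside a running event loop, something like `start_proxy({5432: ("10.0.0.11", 5432), 6379: ("10.0.0.12", 6379)}, ssl_ctx=ctx)` would expose two private services behind one public IP (addresses hypothetical). A production setup would rather use HAProxy or nginx as mentioned above.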
In a microservice architecture, a reverse proxy (such as nginx or Apache httpd) is often used, and the API gateway pattern is used to implement cross-cutting concerns. Sometimes the reverse proxy does the work of the API gateway.
It would be good to see the clear differences between these two approaches.
It looks like the potential benefit of using an API gateway is invoking multiple microservices and aggregating the results. All other responsibilities of an API gateway can be implemented using a reverse proxy, such as:
Authentication (it can be done using nginx Lua scripts);
Transport security (itself a reverse proxy task);
Load balancing
...
So, based on this, I have several questions:
Does it make sense to use an API gateway and a reverse proxy simultaneously (for example, request -> API gateway -> reverse proxy (nginx) -> concrete microservice)? In what cases?
What other differences are there, i.e. what can be implemented with an API gateway but not with a reverse proxy, and vice versa?
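The one capability the question singles out for an API gateway is calling several microservices for a single client request and merging the results. Here is a minimal asyncio sketch of that aggregation step. The two services are hypothetical stubs standing in for real HTTP calls.

```python
import asyncio
from typing import Any, Dict, List


# Hypothetical backend calls; in practice these would be HTTP requests
# to separate "users" and "orders" microservices.
async def fetch_user(user_id: int) -> Dict[str, Any]:
    await asyncio.sleep(0)  # stands in for network I/O
    return {"id": user_id, "name": "alice"}


async def fetch_orders(user_id: int) -> List[Dict[str, Any]]:
    await asyncio.sleep(0)  # stands in for network I/O
    return [{"order": 1, "user": user_id}]


async def user_summary(user_id: int) -> Dict[str, Any]:
    """Gateway endpoint: call both services concurrently and merge the answers."""
    user, orders = await asyncio.gather(fetch_user(user_id), fetch_orders(user_id))
    return {**user, "orders": orders}
```

A plain forwarding proxy maps one incoming request to one upstream request. The fan-out and merge shown here is the extra logic the question attributes to a gateway.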
It is easier to think about them if you realize they aren't mutually exclusive. Think of an API gateway as a specific type of reverse proxy implementation.
Regarding your questions, it is not uncommon to see both used in conjunction, where the API gateway is treated as an application tier that sits behind a reverse proxy for load balancing and health checking. An example would be something like a WAF sandwich architecture, in which your Web Application Firewall/API gateway is sandwiched between reverse proxy tiers: one for the WAF itself and the other for the individual microservices it talks to.
As for the differences, the two are very similar; it's mostly nomenclature. As you take a basic reverse proxy setup and start bolting on more pieces like authentication, rate limiting, dynamic config updates, and service discovery, people are more likely to call that an API gateway.
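Rate limiting is a typical example of the pieces that get bolted on. Here is a minimal per-client token bucket of the kind a gateway might apply before forwarding a request. The limits and the API-key lookup are illustrative assumptions, not any particular product's behavior.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets: Dict[str, TokenBucket] = {}


def check(api_key: str) -> bool:
    """Per-client limit of 5 req/s with a burst of 10 (illustrative numbers)."""
    if api_key not in buckets:
        buckets[api_key] = TokenBucket(rate=5, capacity=10)
    return buckets[api_key].allow()
```

The injectable `clock` makes the bucket easy to test. A gateway running several instances would need a shared store for the counters instead of this in-process dict.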
I believe, API Gateway is a reverse proxy that can be configured dynamically via API and potentially via UI, while traditional reverse proxy (like Nginx, HAProxy or Apache) is configured via config file and has to be restarted when configuration changes. Thus, an API gateway should be used when routing rules or other configuration change often. To your questions:
It makes sense as long as every component in this sequence serves its purpose.
The differences are not in the feature list but in the way configuration changes are applied.
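The distinction drawn here is about how configuration changes are applied. As a sketch of the "configured via API" side, here is a route table with longest-prefix matching that can be changed while the process keeps serving. In a gateway, `set_route` and `remove_route` would sit behind an admin endpoint. All names and upstream URLs are hypothetical.

```python
import threading
from typing import Dict, Optional


class RouteTable:
    """Longest-prefix routing whose rules can be changed at runtime."""

    def __init__(self) -> None:
        self._routes: Dict[str, str] = {}  # path prefix -> upstream URL
        self._lock = threading.Lock()      # admin updates may race with lookups

    def set_route(self, prefix: str, upstream: str) -> None:
        with self._lock:
            self._routes[prefix] = upstream

    def remove_route(self, prefix: str) -> None:
        with self._lock:
            self._routes.pop(prefix, None)

    def resolve(self, path: str) -> Optional[str]:
        """Return the upstream for the longest matching prefix, or None."""
        with self._lock:
            matches = [p for p in self._routes if path.startswith(p)]
            return self._routes[max(matches, key=len)] if matches else None
```

A file-configured proxy would hold the same table but rebuild it when its configuration file is reloaded, rather than accepting updates over an API.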
Additionally, an API gateway is often provided as SaaS, for example Apigee or Tyk.
Also, here's my tutorial on how to create a simple API Gateway with Node.js https://memz.co/api-gateway-microservices-docker-node-js/
Hope it helps.
API gateway acts as a reverse proxy to accept all application programming interface (API) calls, aggregate the various services required to fulfill them, and return the appropriate result.
An API gateway has a more robust set of features, especially around security and monitoring, than an API proxy. I would say the API gateway pattern, along with its Backend for Frontend (BFF) variant, is widely used in microservices development. Check out the article for the benefits and features of the API gateway pattern in the microservice world.
On the other hand, an API proxy is basically a lightweight API gateway. It includes some basic security and monitoring capabilities. So, if you already have an API and your needs are simple, an API proxy will work fine.
The image below gives a clear picture of the difference between an API gateway and a reverse proxy.
API gateways usually operate as an L7 (application-layer) construct.
API gateways provide additional functionality compared to a plain reverse proxy. If you consider some of the portals out there, they can provide:
full API Lifecycle Management including documentation
a portal which can be used as the source of truth for various client applications and where you can provide client governance, rate limiting etc.
routing to different versions of the API including canary/beta versions
detecting usage patterns, registering apps, retrieving client credentials, etc.
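Routing to a canary version, from the list above, usually means sending a fixed share of clients to the new release. Here is a sketch that hashes the client id so each client consistently sees the same version. The version labels and percentage are hypothetical.

```python
import hashlib


def pick_version(client_id: str, canary_percent: int) -> str:
    """Send a stable ~canary_percent% of clients to the canary release.

    Hashing the client id (rather than choosing randomly per request)
    keeps each client on the same version across requests.
    """
    digest = hashlib.sha256(client_id.encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100  # 0..99 (tiny modulo bias ignored)
    return "v2-canary" if bucket < canary_percent else "v1"
```

Raising `canary_percent` gradually moves more clients over. Clients already on the canary stay there, because their bucket does not change.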
However, with the advent of service meshes like Istio and Consul, a lot of the functionality of API gateways will be subsumed by meshes.
From HTTP: The Definitive Guide:
Strictly speaking, proxies connect two or more applications that speak the same protocol, while gateways hook up two or more parties that speak different protocols. A gateway acts as a "protocol converter," allowing a client to complete a transaction with a server, even when the client and server speak different protocols.
In practice, the difference between proxies and gateways is blurry. Because browsers and servers implement different versions of HTTP, proxies often do some amount of protocol conversion. And commercial proxy servers implement gateway functionality to support SSL security protocols, SOCKS firewalls, FTP access, and web-based applications.
Reverse proxies, such as Nginx and Apache, do not deal with observability, authentication, authorization, service orchestration, etc.; they only do load balancing and forward traffic upstream.
An API gateway sits closer to the user's business scenarios and helps users solve the security and observability issues of their various APIs and microservices.
This different positioning leads to different technical designs for reverse proxies and API gateways. API gateways such as Apache APISIX have nearly 100 plugins and support multiple programming languages for plugin development.
If you already have a good API gateway, there is no need to use a reverse proxy.
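To make "plugins" concrete without claiming to reproduce APISIX's actual plugin interface (which differs), here is a generic sketch of a gateway request pipeline. Each plugin either passes the request on or short-circuits it with a response. The header names, plugins and upstream are hypothetical.

```python
import uuid
from typing import Callable, Dict, List, Optional

Request = Dict[str, str]
Response = Dict[str, object]
# A plugin inspects (and may modify) the request, then returns None to continue
# or a response that short-circuits the rest of the chain.
Plugin = Callable[[Request], Optional[Response]]


def require_api_key(valid_keys: set) -> Plugin:
    def plugin(req: Request) -> Optional[Response]:
        if req.get("x-api-key") not in valid_keys:
            return {"status": 401, "body": "unauthorized"}
        return None
    return plugin


def add_request_id(req: Request) -> Optional[Response]:
    # Tag the request so logs from the gateway and the service can be correlated.
    req.setdefault("x-request-id", str(uuid.uuid4()))
    return None


def handle(req: Request, plugins: List[Plugin],
           upstream: Callable[[Request], Response]) -> Response:
    """Run plugins in order; forward to the upstream only if none short-circuits."""
    for plugin in plugins:
        resp = plugin(req)
        if resp is not None:
            return resp
    return upstream(req)
```

Authentication, rate limiting or observability then become entries in the plugin list rather than changes to the forwarding code.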
Regarding Andrey Chausenko's answer that:
I believe, API Gateway is a reverse proxy that can be configured dynamically via API and potentially via UI, while traditional reverse proxy (like Nginx, HAProxy or Apache) is configured via config file and has to be restarted when configuration changes.
I think this is no longer true, as modern reverse proxies like Envoy can be configured dynamically by a control plane via xDS.
My question is the same as this one but hopefully adds clarity to get an answer. After reading this fantastic article on the specifics of NAT traversal, along with a general summary of methods found here, I'm wondering whether this scenario has been accomplished or is even possible. I'm writing software that serves web pages on any specified port, and I'm wondering whether a web client on the WAN side can connect to this server when it is behind a NAT router. The reason I'm finding this difficult is:
I don't want to tell the user (who owns the web server) to configure their router for port forwarding (and in many cases the user may not have the privileges to do so).
UPnP, I believe, is often disabled by default, and enabling it is another configuration privilege not afforded to the user.
UDP hole punching looked promising until I realized the client is using a browser with HTTP, and thus can communicate only over TCP, which further restricts my options to browser scripts.
I have not found a successful implementation of TCP hole punching, given the difficulty of maintaining state information (currently I'm looking at chownat, but I'm wondering how to implement TCP over a UDP tunnel from a web browser, or whether that's even possible).
Using a proxy to forward all traffic doesn't scale well (though using an external server that is not behind a NAT would be perfectly fine for setting up the initial connection or the NAT traversal). By scaling, I mean many, many users each having their own web server, not one user's web server having high traffic (which is not a concern, given that the user's upload bandwidth is often severely limited).
Right now I'm starting to think there will have to be some client-side browser script to help implement this, so the task won't be handled entirely by the server. If anybody has any ideas or experience with having a user connect to a web server behind a NAT router, I'd greatly appreciate some direction! Thanks!
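To make the UDP hole-punching idea in the question concrete, here is a sketch of only its coordination step. An external rendezvous server (the "not behind a NAT" helper mentioned above) records each peer's observed address and tells each peer about the other. It uses plain UDP sockets, so it illustrates exactly what a browser speaking HTTP cannot do. It runs on localhost, where no NAT is involved. All function names are hypothetical.

```python
import socket
from typing import Tuple

Addr = Tuple[str, int]


def udp_socket() -> socket.socket:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("127.0.0.1", 0))  # OS picks a free port
    s.settimeout(2)
    return s


def rendezvous(server: socket.socket) -> None:
    """Wait for two peers, then tell each one the other's observed address.

    Behind real NATs the observed address is the NAT's public mapping,
    which is why a server outside the NAT is needed for this step.
    """
    _, a = server.recvfrom(64)
    _, b = server.recvfrom(64)
    server.sendto(f"{b[0]}:{b[1]}".encode(), a)
    server.sendto(f"{a[0]}:{a[1]}".encode(), b)


def learn_peer(sock: socket.socket) -> Addr:
    data, _ = sock.recvfrom(64)
    host, port = data.decode().rsplit(":", 1)
    return host, int(port)
```

After each peer learns the other's address, both send to each other at roughly the same time. Each NAT then sees outbound traffic first and admits the reply. TCP has no equally simple counterpart, which matches the difficulty described in the question.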
Hey, I am writing an app in Twisted, and as it stands I have 4 servers bound to different ports, all communicating with the client via JSON. Is there any way to bind these 4 servers to the same port and have the interactions remain the same?
For instance, say the client subscribes to two different feeds, each transmitted via a direct socket.
Right now I just do something like:
server1.read_string()
server2.read_string()
and it will read the correct JSON string from the respective feed. Is there any way to keep this type of functionality but contact my server on the same port?
I do not want to throw all of the server functionality into one massive server and partition the data by header prefixes.
I don't want to do something like:
s = server.read_string()
header = s.split(DELIMITER)[0]  # DELIMITER: some delimiter
if header == "SERVER1":
    pass  # Blahh
It sounds like you have many clients interacting with your servers via HTTP. The standard solution is to throw a reverse proxy between the client and your servers - that proxy then forwards connections to the appropriate server depending on the URL. The reverse proxy can run on any one of your existing servers or on its own server to lighten the load.
If your data is cacheable, the reverse proxy can cache your results too.
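Here is a sketch of the caching idea. Responses are kept per URL for a fixed time-to-live, so repeated requests within that window never reach the backend. It ignores `Cache-Control` and the other HTTP caching rules that Squid or Varnish would honor. The TTL and the `fetch` callback are assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache upstream responses by URL for `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl, self.clock = ttl, clock
        self._store: Dict[str, Tuple[float, bytes]] = {}  # url -> (fetched_at, body)

    def get_or_fetch(self, url: str, fetch: Callable[[str], bytes]) -> bytes:
        now = self.clock()
        hit = self._store.get(url)
        if hit and now - hit[0] < self.ttl:
            return hit[1]          # fresh: serve without contacting the backend
        body = fetch(url)          # miss or stale: go to the backend
        self._store[url] = (now, body)
        return body
```

This only pays off for data that is the same for every client. Per-login data would need the user as part of the cache key, or no caching at all.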
There are many reverse proxies available, and you will want to choose one based on what sort of workload you have. Do you need it to be highly configurable? Is the data public or based on logins? How long does each connection last, and how many connections do you want to hold open at once?
Squid, Varnish, and HAProxy are good reverse proxies, and even Apache could do this for you.
I plan to use HAProxy for Gridspy, my project, as I have many ongoing connections with my clients and want to place an Orbited server under the same URL path as my Django server. See this tutorial for more information on how to forward many connections on port 80 from one server to many. The tutorial is focused on Comet, but your problem is even simpler than that.
If you are considering an ongoing TCP/IP connection from the browser back to your servers, seriously consider Orbited. See this tutorial about graphs via Orbited and MorbidQ. Orbited will also punch through firewalls and proxies better than most custom solutions, as it looks like normal HTTP traffic.
In order to have multiple servers running on the same machine all bound to the same port, they need to be bound to different IP addresses. The only way to bind several sockets to the same port on the same IP is to set a reuse option on them (SO_REUSEPORT on Linux and BSD, SO_REUSEADDR on Windows), but then the operating system, not the client, decides which server receives each incoming connection, really messing up your communications.
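Here is a quick way to see the constraint described above. A second socket cannot bind the same IP and port as an existing listener unless a reuse option is set. The sketch uses only the standard library, and the helper name is hypothetical.

```python
import socket


def second_bind_fails(host: str = "127.0.0.1") -> bool:
    """Return True if a second socket cannot bind the same (host, port)."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind((host, 0))  # let the OS pick a free port
    first.listen()
    port = first.getsockname()[1]
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind((host, port))  # same IP, same port, no reuse option set
        return False
    except OSError:  # e.g. EADDRINUSE
        return True
    finally:
        second.close()
        first.close()
```

On Linux, setting `SO_REUSEPORT` on both sockets before `bind` makes the second bind succeed, and the kernel then spreads incoming connections between them. That is exactly why it does not help route a particular client to a particular server.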
Otherwise, having a single server that uses headers to identify the particular feeds is best. Why do you not want to do that?
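A middle ground between separate ports and per-message headers is also possible. All feeds share one port, but each connection names its feed once in a first line, after which that feed's own handler owns the connection. The per-feed code stays separate, and the client still reads each feed from its own socket. Here is a sketch using asyncio rather than Twisted so it needs only the standard library. The feed names and message shapes are hypothetical.

```python
import asyncio
import json
from typing import Awaitable, Callable, Dict

Handler = Callable[[asyncio.StreamReader, asyncio.StreamWriter], Awaitable[None]]


# Each feed keeps its own, separate handler (hypothetical feeds).
async def prices_feed(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    writer.write(json.dumps({"feed": "prices", "px": 101.5}).encode() + b"\n")
    await writer.drain()


async def news_feed(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    writer.write(json.dumps({"feed": "news", "headline": "hello"}).encode() + b"\n")
    await writer.drain()


FEEDS: Dict[str, Handler] = {"prices": prices_feed, "news": news_feed}


async def dispatch(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Read one line naming the feed, then hand the whole connection to that feed."""
    name = (await reader.readline()).decode().strip()
    handler = FEEDS.get(name)
    try:
        if handler is None:
            writer.write(b'{"error": "unknown feed"}\n')
            await writer.drain()
        else:
            await handler(reader, writer)
    finally:
        writer.close()
```

Served with `asyncio.start_server(dispatch, host, port)`, this keeps one public port. The header is parsed once per connection rather than once per message, so the feed code never sees it.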