How to forward TCP requests through a reverse proxy to multiple data services (MySQL, Hadoop, Aurora, MSSQL, etc.) - networking

I'm trying to solve an architecture design puzzle: designing infrastructure that keeps data and servers as secure and hidden as possible. Here are the requirements:
I want to hide the internal design of my infrastructure (several data servers with public and private hosts).
I want to access each service through the same IP address, with each query forwarded to the right server based on something (cookie, URI, port or whatever).
Access to the data services must be enforced with SSL/TLS encryption.
After studying these requirements carefully, I was thinking about using a reverse proxy and granting access to all data services only through the reverse proxy server. Another advantage of a reverse proxy is that authentication and SSL/TLS encryption are enforced in one place, with no need to configure each endpoint separately.
My real issue is that I didn't find any reverse proxy that supports TCP queries; likewise, the static load-balancing algorithms I found are supported only for HTTP requests (HAProxy, for instance).
Any idea how to solve this issue?
Thanks to all
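To make the port-based variant of the requirement concrete: HAProxy's "mode tcp" and nginx's "stream" module both forward plain TCP, and the idea behind them can be pictured as a small relay that listens on several ports of one address and passes each connection to a different backend. Below is a minimal sketch of that idea using Python's asyncio. The port numbers and backend addresses in ROUTES are hypothetical, the relay does not terminate TLS or authenticate anyone, and it is an illustration rather than a production proxy.

```python
import asyncio

# Hypothetical mapping: public listening port -> (backend host, backend port).
ROUTES = {
    13306: ("10.0.0.11", 3306),  # e.g. a MySQL server
    11433: ("10.0.0.12", 1433),  # e.g. an MSSQL server
}

async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # the peer went away; just stop relaying
    finally:
        writer.close()

def make_handler(host, port):
    """Build a connection handler that relays to one fixed backend."""
    async def handle(client_reader, client_writer):
        try:
            backend_reader, backend_writer = await asyncio.open_connection(host, port)
        except OSError:
            client_writer.close()  # backend unreachable
            return
        # Relay both directions at the same time.
        await asyncio.gather(
            pipe(client_reader, backend_writer),
            pipe(backend_reader, client_writer),
        )
    return handle

async def serve(routes, bind_host="0.0.0.0"):
    """Start one listener per route and run until cancelled."""
    servers = []
    for listen_port, (host, port) in routes.items():
        server = await asyncio.start_server(make_handler(host, port), bind_host, listen_port)
        servers.append(server)
    await asyncio.gather(*(s.serve_forever() for s in servers))
```

Calling asyncio.run(serve(ROUTES)) would start the listeners. Routing by port works for any TCP protocol because the relay never has to understand the bytes it forwards; routing by cookie or URI is only possible for protocols the proxy can parse, such as HTTP.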

Related

What will happen if an SSL-configured Nginx reverse proxy passes requests to a web server without SSL?

I use Nginx to manage a lot of my web services. They listen on different ports, but all are accessed through the Nginx reverse proxy under one domain. For example, to access a RESTful API server I can use http://my-domain/api/, and to access a video server I can use http://my-domain/video.
I have generated an SSL certificate for my-domain and added it to my Nginx conf, so my Nginx server is HTTPS now. But those original servers are still using HTTP.
What will happen when I visit https://my-domain/<path>? Is this as safe as configuring SSL on the original servers?
One of the goals of making sites be HTTPS is to prevent the transmitted data between two endpoints from being intercepted by outside parties to either be modified, as in a man-in-the-middle attack, or for the data to be stolen and used for bad purposes. On the public Internet, any data transmitted between two endpoints needs to be secured.
On private networks, this need isn't quite so great. Many services do run on just HTTP on private networks just fine. However, there are a couple points to take into consideration:
Make sure unused ports are blocked:
While you may have an NGINX reverse proxy listening on port 443, is port 80 blocked, or can the sites still be accessed via HTTP?
Are the other ports to the services blocked as well? Let's say your web server runs on port 8080, and the NGINX reverse proxy forwards certain traffic to localhost:8080, can the site still be accessed at http://example.com:8080 or https://example.com:8080? One way to prevent this is to use a firewall and block all incoming traffic on any ports you don't intend to accept traffic on. You can always unblock them later, if you add a service that requires that port be opened.
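A quick way to answer the questions above is to try connecting to each port from a machine outside the server. The following is a minimal sketch using Python's standard socket module; the function name open_ports and the example port list are my own, not part of any tool.

```python
import socket

def open_ports(host, ports, timeout=1.0):
    """Return the subset of ports on host that accept a TCP connection."""
    reachable = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            if sock.connect_ex((host, port)) == 0:
                reachable.append(port)
    return reachable
```

For example, open_ports("example.com", [80, 443, 8080]) run from outside the network should list only the ports you intend to expose. If 8080 shows up, the backend is reachable without going through the reverse proxy.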
Internal services are accessible by other services on the same server
The next consideration relates to other software that may be running on the server. While it's within a private ecosystem, any service running on the server can access localhost:8080. Since the traffic between the reverse proxy and the web server is not encrypted, that traffic can also be sniffed, even if the service at localhost:8080 requires authentication. All a rogue service would need to do is monitor the port and wait for a user to log in. Then that service can capture everything between the two endpoints.
One strategy to mitigate the dangers created by spyware is to either use virtualisation to separate a single server into logical servers, or use different hardware for things that are not related. This at least keeps things separate so that the people responsible for application A don't think that service X might be something the team running application B is using. Anything out of place will more likely stand out.
For instance, a company website and an internal wiki probably don't belong on the same server.
The simpler we can keep the setup and configuration on the server by limiting what that server's job is, the more easily we can keep tabs on what's happening on the server and prevent data leaks.
Use good security practices
Follow security best practices on the server. For instance, don't run as root: use a non-root user for administrative tasks, and don't run long-lived services as root.
For instance, NGINX is capable of running as the user www-data. With specific users for different services, we can create groups and assign the different users to them and then modify the file ownership and permissions, using chown and chmod, to ensure that those services only have access to what they need and nothing more. As an example, I've often wondered why NGINX needs read access to logs. It really should, in theory, only need write access to them. If this service were to somehow get compromised, the worst it could do is write a bunch of garbage to the logs, but an attacker might find their hands are tied when it comes to retrieving sensitive information from them.
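The chmod step can also be scripted. Here is a minimal sketch in Python; the function name restrict_file and the default mode 0o640 (owner read/write, group read, no access for others) are illustrative choices, and changing ownership with chown would additionally require sufficient privileges.

```python
import os
import stat

def restrict_file(path, mode=0o640):
    """Set path's permission bits to mode and return the resulting bits."""
    os.chmod(path, mode)
    # Read the mode back so the caller can confirm what was applied.
    return stat.S_IMODE(os.stat(path).st_mode)
```

The returned value makes it easy to check in a deployment script that a file really ended up with the permissions you intended.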
localhost SSL certs are generally for development only
While I don't recommend this for production, there are ways to make localhost use HTTPS. One is with a self-signed certificate. The other uses a tool called mkcert, which lets you be your own CA (certificate authority) for issuing SSL certificates. The latter is a great solution, since the browser and other services will implicitly trust the generated certificates, but the general consensus, even from the author of mkcert, is that this is only recommended for development purposes, not production. I've yet to find a good solution for localhost in production. I don't think it exists, and in my experience, I've never seen anyone worry about it.

Is it possible to establish mTLS using proxy server

I have a pair of public/private certificates, which are very important and can not be stored on my computer, but only in secure premises.
I need to use such certificates to send requests to multiple providers (in my case those are banks).
What I want to do is set up a proxy server which would hold such certificates.
Then I would be able to send requests through that proxy and it will establish mTLS connection with banks using those certificates.
So my questions are:
Is it possible?
Do I understand correctly that such an approach could be called a "reverse termination proxy"?
Ideally, I would like to do that using nginx. Is it possible?
I've found some info in the nginx documentation, but not sure whether it is what I am looking for
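Whatever proxy software ends up holding the certificates, the step it has to perform is to present the client certificate when it opens the TLS connection to the bank. As a rough illustration of that step only, here is a sketch using Python's standard ssl module. The function names and file paths are hypothetical, and this is not an nginx configuration.

```python
import socket
import ssl

def build_mtls_context(cert_file, key_file, ca_file=None):
    """Create a client-side TLS context that presents a client certificate."""
    # Verifies the server certificate (system CAs unless ca_file is given).
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # The certificate and private key stay on the machine running this code.
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx

def open_mtls_connection(host, port, ctx, timeout=10.0):
    """Open a TCP connection to host:port and wrap it in mutual TLS."""
    raw = socket.create_connection((host, port), timeout=timeout)
    return ctx.wrap_socket(raw, server_hostname=host)
```

In this picture the caller never sees the private key: it talks to the proxy, and only the proxy process reads the key file and completes the handshake with the provider.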

Building Proxy Site with Nginx and Rotating Proxy Service

I'm looking to build an application similar to https://www.proxysite.com/ but am not sure about the best architecture.
I'm looking to have a data flow like this.
User Web Browser -> myproxysite.com -> Nginx Proxy Server (somehow rotating IP for each client session) -> Targetsite.com
Then the user would need to maintain a full session on Targetsite.com as a logged in user.
In this example, targetsite.com is always the same site and is pre-determined. The challenge we are facing is that targetsite.com is blocking our users based on IP, many of whom are accessing it from the same office network.
So my questions are:
Does this seem correct?
Is there any way for me to configure nginx with a rotating proxy service like Luminati? Or do I need to add an API software layer to handle the actual IP changes?
Any guidance on this one would be greatly appreciated!
While I can't help you with your application, I do want to suggest an alternative. You mentioned an office so it sounds like the users who will use the proxy are workers.
Luminati (now BrightData) has a proxy manager which you can host on any server. The proxy manager allows you to create ports (ie port 24000) and configure it with whatever proxy you want (doesn't have to be BrightData's proxy). It has a ton of different parameters that you can include for each proxy (including IP rotation) and each port can be configured to have a unique setup.
Then you simply go to your user's PC, open the browser proxy settings, type in the IP address of the server that the proxy manager is running on and the specific port you configured, and voila. You have central control of the proxies and your user's browser is proxied.
A big benefit of this is the logs in the proxy manager show all activity on each port you setup, so you can monitor traffic and the success rates right there.
Proxy manager: https://prnt.sc/13uyjgj

API gateway vs. reverse proxy

A microservice architecture is often used alongside a reverse proxy (such as nginx or Apache httpd), and the API gateway pattern is used to implement cross-cutting concerns. Sometimes the reverse proxy does the work of the API gateway.
It would be good to see clear differences between these two approaches.
It looks like the potential benefit of using an API gateway is invoking multiple microservices and aggregating the results. All other responsibilities of an API gateway can be implemented using a reverse proxy, such as:
Authentication (it can be done using nginx Lua scripts);
Transport security, which is itself a reverse proxy task;
Load balancing
...
So based on this there are several questions:
Does it make sense to use an API gateway and a reverse proxy simultaneously (for example, request -> API gateway -> reverse proxy (nginx) -> concrete microservice)? In what cases?
What else can be implemented with an API gateway but not with a reverse proxy, and vice versa?
It is easier to think about them if you realize they aren't mutually exclusive. Think of an API gateway as a specific type of reverse proxy implementation.
In regards to your questions, it is not uncommon to see both used in conjunction where the API gateway is treated as an application tier that sits behind a reverse proxy for load balancing and health checking. An example would be something like a WAF sandwich architecture in that your Web Application Firewall/API Gateway is sandwiched by reverse proxy tiers, one for the WAF itself and the other for the individual microservices it talks to.
Regarding the differences, they are very similar. It's just nomenclature. As you take a basic reverse proxy setup and start bolting on more pieces like authentication, rate limiting, dynamic config updates, and service discovery, people are more likely to call that an API gateway.
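To illustrate the "bolting on" idea, here is a minimal sketch in Python that wraps a plain forwarding function with an API-key check and a token-bucket rate limiter. All names here (TokenBucket, make_gateway, the dictionary-shaped request and response) are hypothetical and only meant to show the layering, not how any particular gateway product works.

```python
import time

class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` per second."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.now = now
        self.last = now()

    def allow(self):
        current = self.now()
        # Refill according to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (current - self.last) * self.rate)
        self.last = current
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def make_gateway(forward, api_keys, bucket):
    """Wrap a plain forwarding function with authentication and rate limiting."""
    def handle(request):
        if request.get("api_key") not in api_keys:
            return {"status": 401}  # unauthenticated
        if not bucket.allow():
            return {"status": 429}  # too many requests
        return forward(request)     # the plain reverse-proxy step
    return handle
```

The forward function is the "basic reverse proxy" part; everything added around it is what tends to earn the name API gateway.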
I believe, API Gateway is a reverse proxy that can be configured dynamically via API and potentially via UI, while traditional reverse proxy (like Nginx, HAProxy or Apache) is configured via config file and has to be restarted when configuration changes. Thus, an API gateway should be used when routing rules or other configuration change often. To your questions:
It makes sense as long as every component in this sequence serves its purpose.
The differences are not in the feature list but in the way configuration changes are applied.
Additionally, an API gateway is often provided as SaaS, like Apigee or Tyk for example.
Also, here's my tutorial on how to create a simple API Gateway with Node.js https://memz.co/api-gateway-microservices-docker-node-js/
Hope it helps.
An API gateway acts as a reverse proxy to accept all application programming interface (API) calls, aggregate the various services required to fulfill them, and return the appropriate result.
An API gateway has a more robust set of features, especially around security and monitoring, than an API proxy. I would say the API gateway pattern, also known as Backend for Frontend (BFF), is widely used in microservices development. Check out the article for the benefits and features of the API gateway pattern in the microservice world.
On the other hand, an API proxy is basically a lightweight API gateway. It includes some basic security and monitoring capabilities. So, if you already have an API and your needs are simple, an API proxy will work fine.
The image below gives a clear picture of the difference between an API gateway and a reverse proxy.
API Gateways usually operate as a L7 construct.
API Gateways provide additional functionality compared to a plain reverse proxy. If you consider some of the portals out there, they can provide:
full API Lifecycle Management including documentation
a portal which can be used as the source of truth for various client applications and where you can provide client governance, rate limiting etc.
routing to different versions of the API including canary/beta versions
detecting usage patterns, register apps, retrieve client credentials etc.
However, with the advent of service meshes like Istio and Consul, a lot of the functionality of API Gateways will be subsumed by meshes.
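As an illustration of the canary routing item in the list above, here is a minimal sketch in Python of one common way to split traffic: hash a client identifier into a bucket from 0 to 99 and send a fixed percentage of buckets to the canary. The function name pick_version and the version labels "v1" and "v2" are hypothetical.

```python
import hashlib

def pick_version(client_id, canary_percent, stable="v1", canary="v2"):
    """Send a fixed share of clients to the canary, consistently per client."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    # Map the client to a stable bucket in the range 0..99.
    bucket = int.from_bytes(digest[:4], "big") % 100
    return canary if bucket < canary_percent else stable
```

Hashing instead of picking at random means the same client always lands on the same version, which keeps a session from flipping between the stable and canary releases.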
From HTTP: The Definitive Guide:
Strictly speaking, proxies connect two or more applications that speak
the same protocol, while gateways hook up two or more parties that
speak different protocols. A gateway acts as a "protocol converter,"
allowing a client to complete a transaction with a server, even when
the client and server speak different protocols.
In practice, the difference between proxies and gateways is blurry.
Because browsers and servers implement different versions of HTTP,
proxies often do some amount of protocol conversion. And commercial
proxy servers implement gateway functionality to support SSL security
protocols, SOCKS firewalls, FTP access, and web-based applications.
Reverse proxies, such as Nginx and Apache, do not deal with observability, authentication, authorization, service orchestration, etc.; they only do load balancing and forward traffic to upstreams.
API Gateway is close to the user's business scenario and helps users solve the security and observability issues of various APIs and microservices.
Different positioning leads to different technical aspects of reverse proxy and API gateway. API gateways, such as Apache APISIX, have nearly 100 plugins and support multiple programming languages for plugin development.
If you already have a good API gateway, there is no need to use a reverse proxy.
Regarding Andrey Chausenko's answer that
I believe, API Gateway is a reverse proxy that can be configured dynamically via API and potentially via UI, while traditional reverse proxy (like Nginx, HAProxy or Apache) is configured via config file and has to be restarted when configuration changes.
I think this is no longer true, as a modern reverse proxy like Envoy can be configured dynamically by a control plane via xDS.

Standard way of using a single port for multiple sockets?

Hey, I am writing an app in Twisted, and as it stands I have 4 servers bound to different ports, all communicating with the client via JSON. Is there any way to bind these 4 servers to the same port and have the interactions remain the same?
For instance, say the client subscribes to two different feeds, transmitted via a direct socket.
Right now I just do something like
server1.read_string()
server2.read_string()
and it will read the correct JSON string from the respective feeds. Is there any way to maintain this type of functionality but contact my server on the same port?
I do not want to throw all of the server functionality into one massive server and partition the data by header prefixes.
I don't want to do something like
s = server.read_string()
header = s.split(DELIMITER)[0]  # DELIMITER: some delimiter
if header == "SERVER1":
    pass  # Blahh
It sounds like you have many clients interacting with your servers via HTTP. The standard solution is to throw a reverse proxy between the client and your servers - that proxy then forwards connections to the appropriate server depending on the URL. The reverse proxy can run on any one of your existing servers or on its own server to lighten the load.
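The forwarding decision itself is simple. As a minimal sketch of what "depending on the URL" means, here is a longest-prefix lookup in Python; the table BACKENDS, its paths and addresses, and the function name pick_backend are all hypothetical, and a real reverse proxy does this from its own configuration.

```python
# Hypothetical routing table: URL path prefix -> backend address.
BACKENDS = {
    "/feed1/": ("127.0.0.1", 9001),
    "/feed2/": ("127.0.0.1", 9002),
}

def pick_backend(path, backends=BACKENDS, default=None):
    """Return the backend whose prefix is the longest match for path."""
    matches = [prefix for prefix in backends if path.startswith(prefix)]
    if not matches:
        return default
    return backends[max(matches, key=len)]
```

With a table like this, all four servers keep their own ports internally while clients only ever see the proxy's single port.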
If your data is cacheable, the reverse proxy can cache your results too.
There are many reverse proxies available and you will want to choose one based on what sort of workload you have. Do you need it to be highly configurable? Is the data public or based on logins? How long does each connection last, and how many connections do you want to hold open at once?
Squid, Varnish, HAProxy are good reverse proxies and even Apache could do this for you.
I plan to use HAProxy for Gridspy, my project, as I have many ongoing connections with my clients and want to place an Orbited server in the same URL path as my Django server. See this tutorial for more information on how to forward many connections on port 80 from one server to many. This tutorial is focused on Comet, but your problem is even simpler than that.
If you are considering an ongoing tcp/ip connection from the browser back to your servers, seriously consider Orbited. See this tutorial about graphs via orbited and morbidQ. Orbited will also punch through firewalls and proxies better than most custom solutions will, as it looks like normal HTTP traffic.
In order to have multiple servers running on the same machine all bound to the same port, they need to be bound to different IP addresses. The only way to bind to the same port on the same IP is to enable a socket option such as SO_REUSEADDR or SO_REUSEPORT, but then any of the servers could end up receiving traffic meant for another, really messing up your communications.
Otherwise, having a single server that uses headers to identify the particular feeds is best. Why do you not want to do that?
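Using a header does not have to mean one massive server with a chain of if statements. A minimal sketch in Python, assuming each message is a JSON object with a "feed" field naming its source and a "data" field carrying the payload (both field names are my assumption, as is the class name FeedDemux):

```python
import json

class FeedDemux:
    """Dispatch JSON messages arriving on one connection to per-feed handlers."""

    def __init__(self):
        self.handlers = {}

    def subscribe(self, feed, handler):
        """Register the callable that receives the data of one feed."""
        self.handlers[feed] = handler

    def dispatch(self, line):
        """Decode one message and pass its data to the matching handler."""
        message = json.loads(line)
        handler = self.handlers.get(message["feed"])
        if handler is None:
            return False  # nobody subscribed to this feed
        handler(message["data"])
        return True
```

Each feed keeps its own handler, so the code for server1 and server2 stays separate even though both share one port and one connection.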
