How to write "transparent" HTTP proxy?

How to write "transparent" HTTP proxy? - http

Inspired by this article http://www.catonmat.net/http-proxy-in-nodejs
Any idea, how to convert this proxy to acts as transparent proxy?
PS: I know how to set up my firewall, etc... Just trying to run this toy instead of transparent squid.

In general, the difference between transparent and explicit proxy is that the full URL is not sent in the HTTP command in transparent mode.
The proxy would use the Host header to determine the upstream server instead of extracting it from the URL: otherwise the processing is the same. Note that this works for HTTP only, and transparent HTTPS proxy is much harder.
I'm not familiar with node.js: my guess is that the Host header will be available in the request.headers field, and then it's a matter of fixing the proxy_request object to have the proper full upstream URL.

Related

How does proxy server know the target domain of the client?

I'm currently writing a proxy server in nodejs. To proceed, I need to know how to reliably determine the originally intended domain of the client. When a client is configured to use a proxy, is there a universal way that the client sends this information (e.g. one of the two examples below), or is it application specific (e.g. Chrome proxy settings may do it differently to IE proxy settings, which may be different to a configuration for a proxy for an entire Windows machine, etc.)?
An HTTP request to the proxy server could look something like this, which would suffice:
GET /something HTTP/1.1
Host: example.com
...
In this case, the proxy could get the hostname from the 'Host' header, get the path in the first line of the HTTP request, and then have sufficient information.
It could also look something like this, which would suffice:
GET http://example.com/something HTTP/1.1
...
with a FQDN in the URL, in which case the proxy could just retrieve the path of the HTTP request in the first line.
Any information regarding this would be greatly appreciated! Thanks in advance for the help!

What is the correct way to render absolute URLs behind a reverse proxy?

I have a web application running on a server (let's say on localhost:8000) behind a reverse proxy on that same server (on myserver.example:80). Because of the way the reverse proxy works, the application sees an incoming request targeted at localhost:8000 and the framework I'm using therefore tries to generate absolute URLs that look like localhost:8000/some/ressource instead of myserver.example/some/ressource.
What would be "the correct way" of generating an absolute URL (namely, determining what hostname to use) from behind a proxy server like that? The specific proxy server, framework and language don't matter, I mean this more in an HTTP sense.
From my initial research:
RFC7230 explicitly says that proxies MUST change the Host header when passing the request along to make it look like the request came from them, so it would look like using Host to determine what hostname to use for the URL, yet in most places where I have looked, the general advice seems to be to configure your reverse proxy to not change the Host header (counter to the spec) when passing the request along.
RFC7230 also says that "request URI reconstruction" should use the following fields in order to find what "authority component" to use, though that seems to also only apply from the point-of-view of the agent that emitted that request, such as the proxy:
Fixed URI authority component from the server or outbound gateway config
The authority component from the request's firsr line if it's a complete URI instead of a path
The Host header if it's present and not empty
The listening address or hostname, alongside with the incoming port number if it's not the default one for the protocol
HTTP 1.0 didn't have a Host header at all, and that header was added for routing purposes, not for URL authority resolution.
There are headers that are made specifically to let proxies to send the old value of Host after routing, such as Via, Forwarded and the unofficial X-Forwarded-Host, which some servers and frameworks will check, but not all, and it's unclear which one should even take priority given how there's 3 of them.
EDIT: I also don't know whether HTTPS would work differently in that regard, given that the headers are part of the encrypted payload and routing has to be performed another way because of this.

In general I find it’s best to set the real host and port explicitly in the application rather than try to guess these from the incoming request.
So for example Jira allows you to set the Base URL through which Jira will be accessed (which may be different to the one that it is actually run as). This means you can have Jira running on port 8080 and have Apache or Nginx in front of it (on the same or even a different server) on port 80 and 443.

HTTP connect with Transparent proxy

I am trying to write (and understand) a transparent proxy.
My setup would look like this
Client Browser ---> TProxy ----> Upstream Proxy ------> cloud
When the client browser makes a GET request, the idea is TProxy would then CONNECT to the Upstream proxy. The upstream proxy requires digest authentication. So, essentially the flow would look like
Client Browser ---> TProxy --------> Upstream Proxy ---------------> cloud
GET BBC.co.uk
CONNECT
407 PROXY AUTH REQUIRED
CONNECT
(with proxy-authorization)
200 OK
GET BBC.co.uk
I am confused what happens once CONNECT with authorization succeeds.
Am I suppose to modify the original GET request now to include a
Proxy-Authorization header?
or would the original GET request be then tunnelled in another http header something like
HTTP Header
Proxy Authorization
HTTP Header (GET BBC.CO.UK)
Data
or I can just pass the original GET request as is?
I am just starting with http and would appreciate any help.
Thanks

When you authenticate upstream from your transparent proxy, the Proxy-Authorization header applies only to the CONNECT.
The GET requests happen within the tunnel, so the upstream explicit proxy is not supposed to see them, and for sure does not expect any proxy authentication headers on them.
In short, you do not need to worry about the GET, but not because of the answer given above, but because there is a tunnel between the transparent proxy and the site, and the explicit proxy only sees and authenticates the CONNECT.

There is no such thing as nested headers in HTTP.
A proxy - whether transparent or not - always terminates the HTTP connection from the client, and initiates a new one to the server.
That means that the HTTP GET from the client goes to your TProxy. TProxy creates a new GET request to Upstream Proxy. Ideally, TProxy will simply pass on all the headers. That would make it (nearly) undetectable.
The same goes in reverse for the response headers.
In reality, proxy servers will, and in many cases have to, manipulate some headers. They will often add their own header (for instance, to alert the communication partners to the presence of a proxy), and they can also manipulate existing headers.
So, the short answer to your question: whatever header field your TProxy receives, pass it on unchanged unless you fully understand the implications.

How to configure squid to be a Transparent proxy?

I am working with Squid Proxy Server as I have also used cyberoam,Sonicwall and Clear OS.
I want to setup my own proxy like above products ie authentication in transparent proxy.
Actually I setup transparent proxy but at that time my HTTPS site is not working.Then I configure one iptables rule that redirect all http & https traffic to 3128(squid port) only. but here I can access all my https websites but I cant block them.
My requirement is when I am going to access any website at first time it will ask me to authentication and then and only i can access internet. In log reports also I can show its Username and one more thing it will also possible in thinclient(terminal service).
Anybody help me short-out this problem ?

Proxy authentication doesn't work in transparent proxies setups. The browser should have the proxy configured to catch the authentication request from a proxy and to request the credentials to an user.
Another thing is that you can create a transparent proxy for HTTPS. Why? Because when the browser connects, it's connected to the proxy, not the real server. The browser will try to negotiate SSL which is a thing that Squid won't support. There are tricks to do this, but you'll break the SSL security, browser will complain, etc. There are one tool that I used to get this working: u2nl, but it's a hack that tunnels HTTPS trought the proxy.
The best option, is to use a non-transparent proxy. If you want to avoid browsers configuration, have a look at WPAD

As said before, you can't really block HTTPS sites with Squid, and you can't really use authentication with the proxy running at his transparent mode.
As far as I could use and cofigure, you can use an external acl to force a kind of login, but the login requests will not be treated by the proxy, but you can work it with some PERL.
And about the HTTPS thing, you could work it with some hacks, but it is a very sensible question, because the server performance with be punished with this kind of use and you could be pointed as a fraudulent service, which isn't cool... Believe me.

How can proxy server inform browser to bypass that proxy and make direct connection?

I develop proxy server. It have an internal database of some hosts, with that browser should make direct connection, not through my proxy. Is there any way to inform browser that it should bypass proxy?
For example, I`ve found 305 Use Proxy http header. Is it possible to make what I need with using that header?

You should instruct browser using WPAD or PAC.
http://en.wikipedia.org/wiki/Web_Proxy_Autodiscovery_Protocol
http://en.wikipedia.org/wiki/Proxy_auto-config
It is browser responsibility to decide whenever use proxy or not.
Even large enterprises uses this technology, because it is transparent.