HTTP connect with Transparent proxy - http

I am trying to write (and understand) a transparent proxy.
My setup would look like this
Client Browser ---> TProxy ----> Upstream Proxy ------> cloud
When the client browser makes a GET request, the idea is TProxy would then CONNECT to the Upstream proxy. The upstream proxy requires digest authentication. So, essentially the flow would look like
Client Browser ---> TProxy --------> Upstream Proxy ---------------> cloud
GET BBC.co.uk
CONNECT
407 PROXY AUTH REQUIRED
CONNECT
(with proxy-authorization)
200 OK
GET BBC.co.uk
I am confused what happens once CONNECT with authorization succeeds.
Am I suppose to modify the original GET request now to include a
Proxy-Authorization header?
or would the original GET request be then tunnelled in another http header something like
HTTP Header
Proxy Authorization
HTTP Header (GET BBC.CO.UK)
Data
or I can just pass the original GET request as is?
I am just starting with http and would appreciate any help.
Thanks

When you authenticate upstream from your transparent proxy, the Proxy-Authorization header applies only to the CONNECT.
The GET requests happen within the tunnel, so the upstream explicit proxy is not supposed to see them, and for sure does not expect any proxy authentication headers on them.
In short, you do not need to worry about the GET, but not because of the answer given above, but because there is a tunnel between the transparent proxy and the site, and the explicit proxy only sees and authenticates the CONNECT.

There is no such thing as nested headers in HTTP.
A proxy - whether transparent or not - always terminates the HTTP connection from the client, and initiates a new one to the server.
That means that the HTTP GET from the client goes to your TProxy. TProxy creates a new GET request to Upstream Proxy. Ideally, TProxy will simply pass on all the headers. That would make it (nearly) undetectable.
The same goes in reverse for the response headers.
In reality, proxy servers will, and in many cases have to, manipulate some headers. They will often add their own header (for instance, to alert the communication partners to the presence of a proxy), and they can also manipulate existing headers.
So, the short answer to your question: whatever header field your TProxy receives, pass it on unchanged unless you fully understand the implications.

Related

How does proxy server know the target domain of the client?

I'm currently writing a proxy server in nodejs. To proceed, I need to know how to reliably determine the originally intended domain of the client. When a client is configured to use a proxy, is there a universal way that the client sends this information (e.g. one of the two examples below), or is it application specific (e.g. Chrome proxy settings may do it differently to IE proxy settings, which may be different to a configuration for a proxy for an entire Windows machine, etc.)?
An HTTP request to the proxy server could look something like this, which would suffice:
GET /something HTTP/1.1
Host: example.com
...
In this case, the proxy could get the hostname from the 'Host' header, get the path in the first line of the HTTP request, and then have sufficient information.
It could also look something like this, which would suffice:
GET http://example.com/something HTTP/1.1
...
with a FQDN in the URL, in which case the proxy could just retrieve the path of the HTTP request in the first line.
Any information regarding this would be greatly appreciated! Thanks in advance for the help!

What is the correct way to render absolute URLs behind a reverse proxy?

I have a web application running on a server (let's say on localhost:8000) behind a reverse proxy on that same server (on myserver.example:80). Because of the way the reverse proxy works, the application sees an incoming request targeted at localhost:8000 and the framework I'm using therefore tries to generate absolute URLs that look like localhost:8000/some/ressource instead of myserver.example/some/ressource.
What would be "the correct way" of generating an absolute URL (namely, determining what hostname to use) from behind a proxy server like that? The specific proxy server, framework and language don't matter, I mean this more in an HTTP sense.
From my initial research:
RFC7230 explicitly says that proxies MUST change the Host header when passing the request along to make it look like the request came from them, so it would look like using Host to determine what hostname to use for the URL, yet in most places where I have looked, the general advice seems to be to configure your reverse proxy to not change the Host header (counter to the spec) when passing the request along.
RFC7230 also says that "request URI reconstruction" should use the following fields in order to find what "authority component" to use, though that seems to also only apply from the point-of-view of the agent that emitted that request, such as the proxy:
Fixed URI authority component from the server or outbound gateway config
The authority component from the request's firsr line if it's a complete URI instead of a path
The Host header if it's present and not empty
The listening address or hostname, alongside with the incoming port number if it's not the default one for the protocol
HTTP 1.0 didn't have a Host header at all, and that header was added for routing purposes, not for URL authority resolution.
There are headers that are made specifically to let proxies to send the old value of Host after routing, such as Via, Forwarded and the unofficial X-Forwarded-Host, which some servers and frameworks will check, but not all, and it's unclear which one should even take priority given how there's 3 of them.
EDIT: I also don't know whether HTTPS would work differently in that regard, given that the headers are part of the encrypted payload and routing has to be performed another way because of this.
In general I find it’s best to set the real host and port explicitly in the application rather than try to guess these from the incoming request.
So for example Jira allows you to set the Base URL through which Jira will be accessed (which may be different to the one that it is actually run as). This means you can have Jira running on port 8080 and have Apache or Nginx in front of it (on the same or even a different server) on port 80 and 443.

Which changes do a browser make when using an HTTP Proxy?

Imagine a webbrowser that makes an HTTP request to a remote server, such as site.example.com
If the browser is then configured to use a proxy server, let's call it proxy.example.com using port 8080, in which ways are the request now different?
Obviously the request is now sent to proxy.example.com:8080, but there must surely be other changes to enable the proxy to make a request to the original url?
RFC 7230 - Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing, Section 5.3.2. absolute-form:
When making a request to a proxy, other than a CONNECT or server-wide
OPTIONS request (as detailed below), a client MUST send the target
URI in absolute-form as the request-target.
absolute-form = absolute-URI
The proxy is requested to either service that request from a valid
cache, if possible, or make the same request on the client's behalf
to either the next inbound proxy server or directly to the origin
server indicated by the request-target. Requirements on such
"forwarding" of messages are defined in Section 5.7.
An example absolute-form of request-line would be:
GET http://www.example.org/pub/WWW/TheProject.html HTTP/1.1
So, without proxy, the connection is made to www.example.org:80:
GET /pub/WWW/TheProject.html HTTP/1.1
Host: www.example.org
With proxy it is made to proxy.example.com:8080:
GET http://www.example.org/pub/WWW/TheProject.html HTTP/1.1
Host: www.example.org
Where in the latter case the Host header is optional (for HTTP/1.0 clients), and must be recalculated by the proxy anyway.
The proxy simply makes the request on behalf of the original client. Hence the name "proxy", the same meaning as in legalese. The browser sends their request to the proxy, the proxy makes a request to the requested server (or not, depending on whether the proxy wants to forward this request or deny it), the server returns a response to the proxy, the proxy returns the response to the original client. There's no fundamental difference in what the server will see, except for the fact that the originating client will appear to be the proxy server. The proxy may or may not alter the request, and it may or may not cache it; meaning the server may not receive a request at all if the proxy decides to deliver a cached version instead.

HTTP Spec: Proxy-Authorization and Authorization headers

So I'm trying to implement the following scenario:
An application is protected by Basic Authentication. Let's say it is hosted on app.com
An HTTP proxy, in front of the application, requires authentication as well. It is hosted on proxy.com
The user must therefore provide credentials for both the proxy and the application in the same request, thus he has different username/password pairs: one pair to authenticate himself against the application, and another username/password pair to authenticate himself against the proxy.
After reading the specs, I'm not really sure on how I should implement this. What I was thinking to do is:
The user makes an HTTP request to the proxy without any sort of authentication.
The proxy answers 407 Proxy Authentication Required and returns a Proxy-Authenticate header in the format of: "Proxy-Authenticate: Basic realm="proxy.com".Question: Is this Proxy-Authenticate header correctly set?
The client then retries the request with a Proxy-Authorization header, that is the Base64 representation of the proxy username:password.
This time the proxy authenticates the request, but then the application answers with a 401 Unauthorized header. The user was authenticated by the proxy, but not by the application. The application adds a WWW-Authenticate header to the response like WWW-Authenticate: Basic realm="app.com". Question: this header value is correct right?
The client retries again the request with both a Proxy-Authorization header, and a Authorization header valued with the Base64 representation of the app's username:password.
At this point, the proxy successfully authenticates the request, forwards the request to the application that authenticates the user as well. And the client finally gets a response back.
Is the whole workflow correct?
Yes, that looks like a valid workflow for the situation you described, and those Authenticate headers seem to be in the correct format.
It's interesting to note that it's possible, albeit unlikely, for a given connection to involve multiple proxies that are chained together, and each one can itself require authentication. In this case, the client side of each intermediate proxy would itself get back a 407 Proxy Authentication Required message and itself repeat the request with the Proxy-Authorization header; the Proxy-Authenticate and Proxy-Authorization headers are single-hop headers that do not get passed from one server to the next, but WWW-Authenticate and Authorization are end-to-end headers that are considered to be from the client to the final server, passed through verbatim by the intermediaries.
Since the Basic scheme sends the password in the clear (base64 is a reversible encoding) it is most commonly used over SSL. This scenario is implemented in a different fashion, because it is desirable to prevent the proxy from seeing the password sent to the final server:
the client opens an SSL channel to the proxy to initiate the request, but instead of submitting a regular HTTP request it would submit a special CONNECT request (still with a Proxy-Authorization header) to open a TCP tunnel to the remote server.
The client then proceeds to create another SSL channel nested inside the first, over which it transfers the final HTTP message including the Authorization header.
In this scenario the proxy only knows the host and port the client connected to, not what was transmitted or received over the inner SSL channel. Further, the use of nested channels allows the client to "see" the SSL certificates of both the proxy and the server, allowing the identity of both to be authenticated.

http push - http streaming method with ssl - do proxies interfere whith https traffic?

My Question is related to the HTTP Streaming Method for realizing HTTP Server Push:
The "HTTP streaming" mechanism keeps a request open indefinitely. It
never terminates the request or closes the connection, even after the
server pushes data to the client. This mechanism significantly
reduces the network latency because the client and the server do not
need to open and close the connection.
The HTTP streaming mechanism is based on the capability of the server
to send several pieces of information on the same response, without
terminating the request or the connection. This result can be
achieved by both HTTP/1.1 and HTTP/1.0 servers.
The HTTP protocol allows for intermediaries
(proxies, transparent proxies, gateways, etc.) to be involved in
the transmission of a response from server to the client. There
is no requirement for an intermediary to immediately forward a
partial response and it is legal for it to buffer the entire
response before sending any data to the client (e.g., caching
transparent proxies). HTTP streaming will not work with such
intermediaries.
Do I avoid the descibed problems whith proxy servers if i use HTTPS?
HTTPS doesn't use HTTP proxies - this would make security void. HTTPS connection can be routed via some HTTP proxy or just HTTP redirector by using HTTP CONNECT command, which establishes transparent tunnel to the destination host. This tunnel is completely opaque to the proxy, and proxy can't get to know, what is transferred (it can attempt to modify the dataflow, but SSL layer will detect modification and send an alert and/or close connection), i.e. what has been encrypted by SSL.
Update: for your task you can try to use one of NULL cipher suites (if the server allows) to reduce the number of operations, such as perform no encryption, anonymous key exchange etc. (this will not affect proxy's impossibility to alter your data).

Resources