Imagine a web browser that makes an HTTP request to a remote server, such as site.example.com.
If the browser is then configured to use a proxy server, let's call it proxy.example.com on port 8080, in which ways is the request now different?
Obviously the request is now sent to proxy.example.com:8080, but surely there must be other changes that enable the proxy to make a request to the original URL?
RFC 7230 - Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing, Section 5.3.2. absolute-form:
When making a request to a proxy, other than a CONNECT or server-wide
OPTIONS request (as detailed below), a client MUST send the target
URI in absolute-form as the request-target.
absolute-form = absolute-URI
The proxy is requested to either service that request from a valid
cache, if possible, or make the same request on the client's behalf
to either the next inbound proxy server or directly to the origin
server indicated by the request-target. Requirements on such
"forwarding" of messages are defined in Section 5.7.
An example absolute-form of request-line would be:
GET http://www.example.org/pub/WWW/TheProject.html HTTP/1.1
So, without a proxy, the connection is made to www.example.org:80:
GET /pub/WWW/TheProject.html HTTP/1.1
Host: www.example.org
With a proxy, it is made to proxy.example.com:8080:
GET http://www.example.org/pub/WWW/TheProject.html HTTP/1.1
Host: www.example.org
In the latter case, the Host header is optional (for HTTP/1.0 clients) and must be recalculated by the proxy anyway.
The proxy simply makes the request on behalf of the original client. Hence the name "proxy", with the same meaning as in legalese. The browser sends its request to the proxy, the proxy makes a request to the requested server (or not, depending on whether the proxy wants to forward this request or deny it), the server returns a response to the proxy, and the proxy returns the response to the original client. There's no fundamental difference in what the server will see, except that the originating client will appear to be the proxy server. The proxy may or may not alter the request, and it may or may not cache it; meaning the server may not receive a request at all if the proxy decides to deliver a cached version instead.
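To see the difference on the wire, here is a minimal Python sketch under the assumption that proxy.example.com:8080 is a plain HTTP proxy (hostnames and paths are the ones from the examples above): the direct request uses origin-form, while the proxied request uses absolute-form.

import socket

# Without a proxy: connect to the origin server and send origin-form.
direct = (b"GET /pub/WWW/TheProject.html HTTP/1.1\r\n"
          b"Host: www.example.org\r\n"
          b"Connection: close\r\n\r\n")
with socket.create_connection(("www.example.org", 80)) as s:
    s.sendall(direct)
    print(s.recv(4096).decode(errors="replace"))

# With a proxy: connect to the proxy and send absolute-form.
proxied = (b"GET http://www.example.org/pub/WWW/TheProject.html HTTP/1.1\r\n"
           b"Host: www.example.org\r\n"
           b"Connection: close\r\n\r\n")
with socket.create_connection(("proxy.example.com", 8080)) as s:
    s.sendall(proxied)
    print(s.recv(4096).decode(errors="replace"))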
Related
There is an issue raised by one of our clients who is using our REST-based API: whenever he sends a POST request to our server without an Accept-Encoding HTTP header, he gets compressed content in return. I checked the IIS logs on the API server that handled his request, and the request received on the server came with the Accept-Encoding header set to gzip. Between the client machine and our server sit intermediaries (proxies) and a load balancer. Which network tracing tool should I use to investigate where this HTTP header is getting added?
One way to prevent an HTTP message from being compressed is to add Cache-Control: no-transform to the request headers, which forbids payload alteration by proxies, as stated in RFC 7234, Section 5.2.1.6.
Also, the Via header may contain useful comments that can help identify what each proxy added to the request.
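As a quick client-side check, the following Python sketch (the endpoint URL is hypothetical) sends the request with Cache-Control: no-transform and an explicit Accept-Encoding, then prints the Via and Content-Encoding response headers to see whether an intermediary still altered the payload:

import urllib.request

req = urllib.request.Request(
    "http://api.example.com/resource",        # hypothetical endpoint
    data=b'{"key": "value"}',
    method="POST",
    headers={
        "Content-Type": "application/json",
        "Cache-Control": "no-transform",      # ask intermediaries not to alter the payload
        "Accept-Encoding": "identity",        # explicitly decline compressed responses
    },
)
with urllib.request.urlopen(req) as resp:
    print(resp.headers.get("Via"))              # proxies should list themselves here
    print(resp.headers.get("Content-Encoding")) # compression applied, if any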
I am trying to write (and understand) a transparent proxy.
My setup would look like this:
Client Browser ---> TProxy ----> Upstream Proxy ------> cloud
When the client browser makes a GET request, the idea is that TProxy would then CONNECT to the upstream proxy. The upstream proxy requires Digest authentication. So, essentially, the flow would look like:
Client Browser ---> TProxy --------> Upstream Proxy ---------------> cloud

Browser -> TProxy:           GET BBC.co.uk
TProxy  -> Upstream Proxy:   CONNECT
Upstream Proxy -> TProxy:    407 PROXY AUTH REQUIRED
TProxy  -> Upstream Proxy:   CONNECT (with Proxy-Authorization)
Upstream Proxy -> TProxy:    200 OK
TProxy  -> Upstream Proxy:   GET BBC.co.uk
I am confused about what happens once the CONNECT with authorization succeeds.
Am I supposed to modify the original GET request now to include a Proxy-Authorization header?
Or would the original GET request then be tunnelled inside another HTTP header, something like:
HTTP header
    Proxy-Authorization
    HTTP header (GET BBC.co.uk)
    Data
Or can I just pass the original GET request as is?
I am just starting with HTTP and would appreciate any help.
Thanks
When you authenticate upstream from your transparent proxy, the Proxy-Authorization header applies only to the CONNECT.
The GET requests happen within the tunnel, so the upstream explicit proxy is not supposed to see them, and for sure does not expect any proxy authentication headers on them.
In short, you do not need to worry about the GET, though not for the reason given in the other answer: there is a tunnel between the transparent proxy and the site, and the explicit proxy only sees and authenticates the CONNECT.
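To make the placement concrete, here is a minimal Python sketch of what the transparent proxy's side of that exchange could look like. Basic credentials are used only for brevity (with Digest you would first read the 407 challenge and compute the response), and the upstream address upstream-proxy.example:3128 is an assumption. The point is that Proxy-Authorization goes on the CONNECT, while the GET travels through the tunnel unchanged:

import base64
import socket

creds = base64.b64encode(b"user:secret").decode()

with socket.create_connection(("upstream-proxy.example", 3128)) as s:
    # Authenticate the CONNECT to the upstream explicit proxy.
    s.sendall(("CONNECT www.bbc.co.uk:80 HTTP/1.1\r\n"
               "Host: www.bbc.co.uk:80\r\n"
               "Proxy-Authorization: Basic " + creds + "\r\n\r\n").encode())
    reply = s.recv(4096)
    assert b" 200 " in reply.split(b"\r\n", 1)[0]   # e.g. "HTTP/1.1 200 Connection established"

    # Inside the tunnel: the original GET, exactly as the client sent it,
    # with no proxy authentication headers on it.
    s.sendall(b"GET / HTTP/1.1\r\nHost: www.bbc.co.uk\r\nConnection: close\r\n\r\n")
    print(s.recv(65536).decode(errors="replace"))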
There is no such thing as nested headers in HTTP.
A proxy - whether transparent or not - always terminates the HTTP connection from the client, and initiates a new one to the server.
That means that the HTTP GET from the client goes to your TProxy. TProxy creates a new GET request to Upstream Proxy. Ideally, TProxy will simply pass on all the headers. That would make it (nearly) undetectable.
The same goes in reverse for the response headers.
In reality, proxy servers will, and in many cases have to, manipulate some headers. They will often add their own header (for instance, to alert the communication partners to the presence of a proxy), and they can also manipulate existing headers.
So, the short answer to your question: whatever header field your TProxy receives, pass it on unchanged unless you fully understand the implications.
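As a rough illustration of that pass-through behaviour, here is a minimal Python sketch of a forwarding handler (GET only, no caching or error handling; the listen port and upstream proxy address are assumptions) that terminates the client connection and re-issues the request upstream with the headers copied across:

import http.client
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = ("proxy.example.com", 8080)   # assumed next-hop explicit proxy

class ForwardingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Pass (nearly) all headers on unchanged; drop hop-by-hop headers.
        headers = {k: v for k, v in self.headers.items()
                   if k.lower() not in ("connection", "proxy-connection")}
        # An explicit upstream proxy expects absolute-form, so rebuild it
        # from the Host header if the client sent origin-form.
        target = self.path if self.path.startswith("http") else \
            "http://{}{}".format(self.headers.get("Host", ""), self.path)

        upstream = http.client.HTTPConnection(*UPSTREAM)
        upstream.request("GET", target, headers=headers)
        resp = upstream.getresponse()
        body = resp.read()

        # Relay status and headers back to the client.
        self.send_response(resp.status, resp.reason)
        for k, v in resp.getheaders():
            if k.lower() not in ("connection", "transfer-encoding"):
                self.send_header(k, v)
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("127.0.0.1", 8888), ForwardingHandler).serve_forever()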
Imagine we have an HTTP client, some proxy, and a web server that serves as the backend. The proxy is configured to cache the responses of the backend.
A request arrives, the proxy transfers it to the backend, the backend responds, and the proxy caches the response and sends it to the client.
Imagine the backend has set some cache-related headers in its response to the proxy's request. For example:
Cache-Control: no-cache (or)
Cache-Control: max-age=100000 (or)
Expires: 'Next Friday'
A question is: will the next request from the client be processed by the proxy in compliance with those headers?
Another flavour of the question: is there a way for a proxy to know that the resource is stale, other than its own static resource-lifetime settings?
Third variation: can the client force the proxy to fetch a fresh version of the resource if the proxy does not consider its cached copy stale?
My question may look a little overgeneralized, not specific enough. I'll try to address this by working with a browser + nginx proxy + nginx web server setup. If my setup works correctly, then once a resource has been cached by the proxy and nginx's proxy_cache_valid timeout has not yet expired, nothing can prevent the proxy from serving the cached response; whatever I do, the requests do not hit the backend.
It looks like nginx decides whether the cached copy is stale based only on the proxy_cache_valid setting, and the headers of the backend's response do not matter at all. I'm wondering whether my guess is correct, and whether it would be wrong for other HTTP proxy setups: a reverse proxy such as nginx, an office network's internal proxy such as Squid, or a public internet proxy.
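One rough way to check whether a repeated request is being served from the proxy cache is to compare response headers across two requests: a Date that does not change, or a growing Age header (if the proxy emits one), indicates the response never reached the backend. A minimal Python sketch (the URL is hypothetical):

import time
import urllib.request

URL = "http://proxy.example.com/resource"   # hypothetical resource behind the caching proxy

for attempt in range(2):
    with urllib.request.urlopen(URL) as resp:
        # A cached response keeps the original Date; some caches also add Age.
        print(attempt, resp.headers.get("Date"), resp.headers.get("Age"))
    time.sleep(2)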
The JVM allows the proxy properties http.proxyHost and http.proxyPort for specifying an HTTP proxy server, and https.proxyHost and https.proxyPort for specifying an HTTPS proxy server.
I was wondering whether there are any advantages to using an HTTPS proxy server compared to an HTTP proxy server.
Is accessing an https URL via an HTTPS proxy less cumbersome than accessing it via an HTTP proxy?
An HTTP proxy gets a plain-text request and (in most, but not all, cases) sends a different HTTP request to the remote server, then returns the response to the client.
An HTTPS proxy is a relayer, which receives a special HTTP request (the CONNECT verb) and builds an opaque tunnel to the destination server (which is not necessarily even an HTTPS server). Then the client sends an SSL/TLS request to the server and they continue with the SSL handshake, and then with HTTPS (if requested).
As you see, these are two completely different proxy types with different behavior and different design goals. An HTTPS proxy can't cache anything, as it doesn't see the request sent to the server. With an HTTPS proxy you have a channel to the server, and the client receives and validates the server's certificate (and optionally vice versa). An HTTP proxy, on the other hand, sees and has control over the request it received from the client.
While an HTTPS request can be sent via an HTTP proxy, this is almost never done, because in this scenario the proxy would validate the server's certificate, but the client would only be able to receive and validate the proxy's certificate, and as the name in the proxy's certificate would not match the address the socket connected to, in most cases an alert is raised and the SSL handshake won't succeed (I am not going into details of how one might try to address this).
Finally, as an HTTP proxy can look into the request, this defeats the security provided by the HTTPS channel, so using an HTTP proxy for HTTPS requests is normally done only for debugging purposes (again, we omit cases of paranoid company security policies which require monitoring of all HTTPS traffic of company employees).
Addition: also read my answer on the similar topic here.
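As an illustration of the relaying behaviour described above, here is a minimal Python sketch (proxy.example.com:8080 is an assumed HTTP proxy address): the CONNECT is sent to the proxy in clear text, and the TLS handshake then runs end to end with www.example.org inside the tunnel, so the certificate the client validates is the origin server's, not the proxy's.

import http.client

conn = http.client.HTTPSConnection("proxy.example.com", 8080)
conn.set_tunnel("www.example.org", 443)   # plain-text CONNECT to the proxy
conn.request("GET", "/")                  # sent inside the TLS tunnel to the origin
resp = conn.getresponse()
print(resp.status, resp.reason)
print(conn.sock.getpeercert()["subject"])  # the origin's certificate, not the proxy's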
There are no pros or cons.
And there is no such thing as an "HTTPS proxy" server.
You can tell the protocol handlers which proxy server to use for different protocols. This can be done for http, https, ftp and socks. Not more and not less.
I can't tell you if you should use a different proxy for https connections or not. It depends.
I can only explain the difference between an HTTP and an HTTPS request to a proxy.
Since the HTTP proxy (or web proxy) understands HTTP (hence the name), the client can just send the request to the proxy server instead of the actual destination.
This does not work for HTTPS.
This is because the proxy can't perform the TLS handshake, which happens first.
Therefore the client must send a CONNECT request to the proxy.
The proxy establishes a TCP connection and just forwards the packets back and forth without touching them.
So the TLS handshake happens between the client and the destination.
The HTTP proxy server does not see the tunnelled traffic and does not validate the destination server's certificate whatsoever.
There can be some confusion with this whole http, https, proxy thing.
It is possible to connect to an HTTP proxy over HTTPS.
In this case, the communication between the client and the proxy is encrypted.
There are also so-called TLS-terminating or interception proxy servers, like Squid's SSL Peek and Splice or Burp, which see everything.
But this should not work out of the box, because the proxy uses its own certificates, which are not signed by trusted CAs.
References
https://docs.oracle.com/javase/8/docs/technotes/guides/net/proxies.html
https://parsiya.net/blog/2016-07-28-thick-client-proxying---part-6-how-https-proxies-work/
http://dev.chromium.org/developers/design-documents/secure-web-proxy
https://www.rfc-editor.org/rfc/rfc2817#section-5
https://www.rfc-editor.org/rfc/rfc7231#section-4.3.6
If by "HTTPS proxy" you mean connecting to an HTTP proxy server over TLS, then:
I was wondering whether there are any advantages of using a HTTPS
proxy server compared to a HTTP proxy server ?
The advantage is that your client's connection to the proxy server is encrypted. For example, a firewall cannot see which host you connect to with the CONNECT method.
Is accessing a https url via a HTTPS proxy less cumbersome than
accesing it from a HTTP proxy ?
Everything is the same, except that with an HTTPS proxy the browser-to-proxy connection is encrypted.
But you need to deploy a certificate on your proxy server, as an HTTPS website does, and use a PAC file to configure the browser to enable connecting to the proxy over SSL.
For more details and a practical example, check my question and answer here HTTPs proxy server only works in SwitchOmega
Unfortunately, "HTTPS proxy" has two distinct meanings:
1. A proxy that can forward HTTPS traffic to the destination. This proxy itself uses the HTTP protocol to set up the forwarding.
In case the browser is trying to connect to a website using HTTPS, the browser will send a CONNECT request to the proxy, and the proxy will set up a TCP connection with the website and mirror all TCP traffic sent on the connection from the browser to the proxy onto the connection between the proxy and the website, and similarly mirror the response TCP payload from the website back onto the connection with the browser. Hypothetically, the same mechanism using CONNECT could be used with HTTP traffic, but practically speaking browsers don't do that. For HTTP traffic, they send the actual HTTP request to the proxy, including the full path in the HTTP command (as well as setting the Host header): https://stackoverflow.com/a/38259076/10026
So, by this definition, HTTPS Proxy is a proxy that understands the CONNECT directive and can support HTTPS traffic going between the browser and the website.
2. A proxy that uses the HTTPS protocol to secure client communication.
In this mode (sometimes referred to as a "Secure Proxy"), the browser uses the proxy's own certificate to perform the TLS handshake with the proxy, and then sends either HTTP or HTTPS traffic (including CONNECT requests) on that connection as per (1). So, the connection between the browser and the proxy is always protected with a TLS key derived using the proxy's certificate, regardless of whether the traffic itself is encrypted with a key negotiated between the browser and the website. If HTTPS traffic is proxied via a secure proxy, it is double-encrypted on the connection between the browser and the proxy.
For example, the Proxy Switcher Chrome plugin has two separate settings to control each of these functionalities.
As of 2022, the option to use a secure proxy is not available in the macOS and Windows manual proxy configuration UIs. But a secure proxy may be specified in a PAC file used for automatic proxy configuration, via the HTTPS proxy directive. It is up to the consuming application to support the HTTPS directive; most major browsers (except Safari) and many desktop apps support it.
NOTE: Things get a bit more complicated because some proxies that proxy HTTPS traffic don't simply forward TCP packet payload, as described in (1), but act as intercepting proxies. Using a spoofed website certificate, they effectively perform a man-in-the-middle attack (well, it's not necessarily an attack, because it's expected behavior). Whereas the browser thinks it's using the website's certificate to set up a TLS tunnel with the website, it's actually using a spoofed certificate to set up a TLS tunnel with the proxy, and the proxy sets up the TLS tunnel with the website. The proxy then has visibility into the HTTPS requests/responses. But all of that is completely orthogonal to whether the proxy is acting as a secure proxy as per (2).
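A minimal Python sketch of meaning (2), talking to the proxy itself over TLS (secure-proxy.example:443 is an assumed address, and the proxy must present a certificate the client trusts): the client first completes a TLS handshake with the proxy and then speaks ordinary proxy HTTP, such as a CONNECT, inside that encrypted channel.

import socket
import ssl

ctx = ssl.create_default_context()
raw = socket.create_connection(("secure-proxy.example", 443))
with ctx.wrap_socket(raw, server_hostname="secure-proxy.example") as proxy_tls:
    # Proxy-directed traffic (here a CONNECT) is itself encrypted with the
    # proxy's certificate; a second TLS handshake with www.example.org would
    # then run inside this channel, so HTTPS traffic ends up double-encrypted
    # between the client and the proxy.
    proxy_tls.sendall(b"CONNECT www.example.org:443 HTTP/1.1\r\n"
                      b"Host: www.example.org:443\r\n\r\n")
    print(proxy_tls.recv(4096).decode(errors="replace"))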
So I'm trying to implement the following scenario:
An application is protected by Basic Authentication. Let's say it is hosted on app.com
An HTTP proxy, in front of the application, requires authentication as well. It is hosted on proxy.com
The user must therefore provide credentials for both the proxy and the application in the same request, so he has two different username/password pairs: one pair to authenticate himself against the application, and another to authenticate himself against the proxy.
After reading the specs, I'm not really sure on how I should implement this. What I was thinking to do is:
The user makes an HTTP request to the proxy without any sort of authentication.
The proxy answers 407 Proxy Authentication Required and returns a Proxy-Authenticate header in the form Proxy-Authenticate: Basic realm="proxy.com". Question: is this Proxy-Authenticate header correctly set?
The client then retries the request with a Proxy-Authorization header, whose value is the Base64 representation of the proxy username:password.
This time the proxy authenticates the request, but then the application answers with a 401 Unauthorized response. The user was authenticated by the proxy, but not by the application. The application adds a WWW-Authenticate header to the response, like WWW-Authenticate: Basic realm="app.com". Question: is this header value correct?
The client retries the request again with both a Proxy-Authorization header and an Authorization header whose value is the Base64 representation of the app's username:password.
At this point, the proxy successfully authenticates the request and forwards it to the application, which authenticates the user as well. And the client finally gets a response back.
Is the whole workflow correct?
Yes, that looks like a valid workflow for the situation you described, and those Authenticate headers seem to be in the correct format.
It's interesting to note that it's possible, albeit unlikely, for a given connection to involve multiple proxies that are chained together, and each one can itself require authentication. In this case, the client side of each intermediate proxy would itself get back a 407 Proxy Authentication Required message and itself repeat the request with the Proxy-Authorization header; the Proxy-Authenticate and Proxy-Authorization headers are single-hop headers that do not get passed from one server to the next, but WWW-Authenticate and Authorization are end-to-end headers that are considered to be from the client to the final server, passed through verbatim by the intermediaries.
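For reference, here is a minimal Python sketch of the final request in that workflow (hostnames are the ones from the question; the proxy port 8080 is an assumption). Both credential pairs travel on the same request: the proxy consumes Proxy-Authorization and passes Authorization through to the application.

import base64
import http.client

proxy_creds = base64.b64encode(b"proxyuser:proxypass").decode()
app_creds = base64.b64encode(b"appuser:apppass").decode()

conn = http.client.HTTPConnection("proxy.com", 8080)
conn.request(
    "GET",
    "http://app.com/",          # absolute-form, since the request goes to a proxy
    headers={
        "Host": "app.com",
        "Proxy-Authorization": "Basic " + proxy_creds,   # consumed by proxy.com
        "Authorization": "Basic " + app_creds,           # forwarded to app.com
    },
)
resp = conn.getresponse()
print(resp.status, resp.reason)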
Since the Basic scheme sends the password in the clear (base64 is a reversible encoding) it is most commonly used over SSL. This scenario is implemented in a different fashion, because it is desirable to prevent the proxy from seeing the password sent to the final server:
the client opens an SSL channel to the proxy to initiate the request, but instead of submitting a regular HTTP request it would submit a special CONNECT request (still with a Proxy-Authorization header) to open a TCP tunnel to the remote server.
The client then proceeds to create another SSL channel nested inside the first, over which it transfers the final HTTP message including the Authorization header.
In this scenario the proxy only knows the host and port the client connected to, not what was transmitted or received over the inner SSL channel. Further, the use of nested channels allows the client to "see" the SSL certificates of both the proxy and the server, allowing the identity of both to be authenticated.