Situation
I have a REST API in place that returns JSON objects to its clients. Part of these JSON responses are absolute URLs to other "linked" resources/objects of the form "http://myservice.com/servicePath/apiVersionA/123". In the example URL, "servicePath" is the web root of my service, "apiVersionA" is the specific version of the REST API being used, and "123" is the identifier of the actual resource. The fact that my API returns absolute resource URLs in the response body is prescribed by the protocol I'm using; I am not at liberty to change that.
Problem
The problem I'm facing now is reverse proxies, and more specifically how I assume they handle request patterns like the following:
Client sends request "GET http://myproxy.com/rest/apiVersionA/123" to the Reverse Proxy.
Reverse Proxy forwards the request to its final destination: "GET http://myservice.com/servicePath/apiVersionA/123". Please note that the proxy modifies both the host name and the URL path ("rest" is changed to "servicePath") of the request.
AFAIK, the usual suspects (i.e. the HTTP headers "Forwarded", "X-Forwarded-For", "X-Forwarded-Host", etc.) only contain IP addresses or domain names, but not the URL path. So right now, in the best case, I can generate URLs like "http://myproxy.com/apiVersionA/123" - notice the missing "rest" URL path element that is required by the proxy. These URLs will thus not be accepted by the proxy in a subsequent request made by the client, e.g. to retrieve a resource that is referenced in the response message.
Questions
Is there an obvious proxy-safe HTTP header I am missing that would deliver the original request URL to the actual REST API service?
If not, is there another way I can generate correct absolute URLs in my service?
So far I have thought about using a custom HTTP header (which might get removed by an intermediate proxy), or about using something like https://httpd.apache.org/docs/2.4/mod/mod_substitute.html on the reverse proxy (which I'd say is not feasible, as I do not control the networking infrastructure of my customers or any intermediaries). Another "non-solution" I came up with is to document this as part of my API: "do not use custom URL path elements on your HTTP proxies". This would allow me to generate URLs using just X-Forwarded-Host and X-Forwarded-Proto.
Any help is appreciated!
After some more research it became apparent that there is indeed no standard HTTP header that can do the job. I have thus proceeded with implementing a combination of "Forwarded/X-Forwarded-For" and Microsoft's proprietary "X-Forwarded-PathBase" (https://microsoft.github.io/reverse-proxy/articles/transforms.html) header. This works as intended - customers that cannot or do not want to use the proprietary header still get back valid absolute URLs via the reverse proxy, as long as they do not implement request path mapping in their proxy configuration.
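For illustration, here is a minimal Python sketch of the URL-building logic this amounts to, assuming the proxy sets X-Forwarded-Proto and X-Forwarded-Host and optionally the proprietary X-Forwarded-PathBase; the fallback values and function names are made up:

def build_absolute_url(headers, service_path, resource_path):
    # Fall back to the service's own scheme/host when no proxy headers are present.
    scheme = headers.get("X-Forwarded-Proto", "http")
    host = headers.get("X-Forwarded-Host", "myservice.com")
    # The proprietary X-Forwarded-PathBase carries the path prefix used by the proxy,
    # e.g. "/rest"; without it we can only assume no prefix was added.
    path_base = headers.get("X-Forwarded-PathBase", "")
    if path_base:
        # The proxy rewrote the path: swap our local web root for the proxy's prefix.
        return f"{scheme}://{host}{path_base}{resource_path}"
    # No path mapping known: keep our own web root.
    return f"{scheme}://{host}/{service_path}{resource_path}"

print(build_absolute_url(
    {"X-Forwarded-Proto": "http",
     "X-Forwarded-Host": "myproxy.com",
     "X-Forwarded-PathBase": "/rest"},
    "servicePath",
    "/apiVersionA/123"))
# prints http://myproxy.com/rest/apiVersionA/123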
Related
I have read many discussions on this, such as the fact that PUT is idempotent and POST is not, etc. However, doesn't this ultimately depend on how the server is implemented? A developer can always build the backend server such that the PUT request is not idempotent and creates multiple records for multiple requests. A developer can also build an endpoint for a PUT request such that it acts like a DELETE request and deletes a record in the database.
So my question is, considering that we don't take into account any server side code, is there any real difference between the HTTP methods? For example, GET and POST have real differences in that you can't send a body using a GET request, but you can send a body using a POST request. Also, from my understanding, GET requests are usually cached by default in most browsers.
Are HTTP request methods anything more than just a logical structure (semantics) so that as developers we can "expect" a certain behavior based on the type of HTTP request we send?
You are right that most of the differences are on the semantic level, and if your components decide to assign other semantics, this will work as well, unless there are components involved that you do not control (libraries, proxies, load balancers, etc.).
For instance, some component might take advantage of the fact that PUT is idempotent and can therefore be retried, while POST cannot.
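As an illustration of the kind of component behaviour meant here, a hand-rolled client-side retry policy might look like the following Python sketch (the helper name and the retry count are made up, not any particular library's policy):

import requests

IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"}

def send_with_retry(method, url, retries=3, **kwargs):
    # Only methods that HTTP defines as idempotent are retried automatically;
    # a POST is attempted exactly once because repeating it may create duplicates.
    attempts = retries if method.upper() in IDEMPOTENT_METHODS else 1
    last_error = None
    for _ in range(attempts):
        try:
            return requests.request(method, url, **kwargs)
        except requests.ConnectionError as error:
            last_error = error
    raise last_error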
The Hypertext Transfer Protocol (HTTP) is designed to enable communications between clients and servers.
HTTP works as a request-response protocol between a client and server.
A web browser may be the client, and an application on a computer that hosts a web site may be the server.
Example: A client (browser) submits an HTTP request to the server; then the server returns a response to the client. The response contains status information about the request and may also contain the requested content.
HTTP Methods
GET
POST
PUT
HEAD
DELETE
PATCH
OPTIONS
The GET Method
GET is used to request data from a specified resource.
GET is one of the most common HTTP methods.
Note that the query string (name/value pairs) is sent in the URL of a GET request.
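For example, with Python's requests library the name/value pairs are encoded into the URL itself (the URL and parameter names are placeholders):

import requests

# The name/value pairs end up in the URL, not in a request body.
response = requests.get("https://example.com/search", params={"q": "kittens", "page": "2"})
print(response.url)  # e.g. https://example.com/search?q=kittens&page=2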
The POST Method
POST is used to send data to a server to create/update a resource.
The data sent to the server with POST is stored in the request body of the HTTP request.
POST is one of the most common HTTP methods.
The PUT Method
PUT is used to send data to a server to create/update a resource.
The difference between POST and PUT is that PUT requests are idempotent. That is, calling the same PUT request multiple times will always produce the same result. In contrast, calling a POST request repeatedly may have the side effect of creating the same resource multiple times.
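A rough illustration of that difference, with placeholder URLs:

import requests

# PUT targets the resource's own URL; repeating it leaves that one resource in place.
requests.put("https://example.com/users/123", json={"name": "Alice"})
requests.put("https://example.com/users/123", json={"name": "Alice"})  # still one user 123

# POST targets the collection; repeating it typically creates a new resource each time.
requests.post("https://example.com/users", json={"name": "Alice"})
requests.post("https://example.com/users", json={"name": "Alice"})  # likely a second user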
The HEAD Method
HEAD is almost identical to GET, but without the response body.
In other words, if GET /users returns a list of users, then HEAD /users will make the same request but will not return the list of users.
HEAD requests are useful for checking what a GET request will return before actually making a GET request - like before downloading a large file or response body.
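For example, a sketch that uses HEAD to inspect the size before deciding whether to download (the URL and the size threshold are placeholders):

import requests

url = "https://example.com/files/big-archive.zip"
# Ask for the headers only; no response body is transferred.
head = requests.head(url)
size = int(head.headers.get("Content-Length", 0))
if size < 100 * 1024 * 1024:  # only fetch files smaller than ~100 MB
    data = requests.get(url).content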
The DELETE Method
The DELETE method deletes the specified resource.
The OPTIONS Method
The OPTIONS method describes the communication options for the target resource.
Source: w3schools
I have a web server which contains an API to upload files to Amazon's S3 storage. Since I do not want to waste resources on streaming the files through my server, when an upload request comes in, I generate a pre-signed URL for the client and then redirect that client to this URL using HTTP 307 - Temporary redirect.
In practice, the flow looks like this:
Client issues a PUT request to my server, requesting a file upload
My server inspects the request and generates a pre-signed URL for S3
My server responds to client with 307 redirection to the pre-signed URL
Client repeats the PUT request to the pre-signed URL
Upload commences
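A rough server-side sketch of that flow, assuming a Flask service and boto3; the route, bucket name, and expiry are placeholders:

import boto3
from flask import Flask, redirect

app = Flask(__name__)
s3 = boto3.client("s3")  # assumes AWS credentials are configured for the service

@app.route("/uploads/<key>", methods=["PUT"])
def start_upload(key):
    # ... check our own Authorization header here ...
    presigned_url = s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": "my-upload-bucket", "Key": key},  # bucket name is a placeholder
        ExpiresIn=300,
    )
    # 307 keeps the method (PUT) and the body on the follow-up request, but as described
    # below, clients also carry the request headers over.
    return redirect(presigned_url, code=307)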
The challenge
My server uses the Authorization header for... well, authorisation. Incidentally, Amazon also accepts this header for authorisation, although the values expected by both parties are completely different.
The problem is that since my upload API requires this header to be present on the upload request, when my server issues the 307 redirect, the client takes all the headers from the original request and sends them along to the pre-signed S3 URL, which causes the request to be rejected by Amazon with an authorisation error.
The question
Can I somehow instruct the client (via HTTP response header) to not include the Authorization header when following the redirection?
Current solution
Right now we "fixed" this by returning the pre-signed URL to the client in the response body. The client then manually issues a new PUT request to that URL without the Authorization header. This works fine. I would like to know if there is a way to achieve this behaviour without this extra manual work.
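The current workaround, sketched from the client side (the response field name and the bearer-token scheme are assumptions):

import requests

def upload_file(api_url, token, file_path):
    # Step 1: ask our API for a pre-signed URL; Authorization is sent only to our server.
    reply = requests.put(api_url, headers={"Authorization": f"Bearer {token}"})
    presigned_url = reply.json()["uploadUrl"]  # field name is illustrative

    # Step 2: a fresh PUT straight to S3, so no Authorization header is carried over.
    with open(file_path, "rb") as f:
        return requests.put(presigned_url, data=f)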
What is the client? In the above statements, when I mention the "client", right now it could be either a modern web browser or a native iOS or Android app. On iOS, we use Alamofire for HTTP communication. I am unsure of what library or components are used on Android.
Note: I have seen this question and its answers, but they do not contain the answer I seek.
I want to redirect my users' browser using HTTP code 303 to a GET URL that I secure using HMAC. Because the request will come from the user's browser, I will not have foreknowledge of the request headers. So I am generating the HMAC hash using the values of the HTTP method and URL only. For example, the URL I want the browser to go to might be:
GET /download
?name=report.pdf
&include=http://url1
&include=http://url2
This creates report.pdf for me, containing the contents of all the URLs specified using the include query param.
My HMAC code will change this URL to be
GET /download
?name=report.pdf
&include=http://url1
&include=http://url2
&hmac-algorithm=simple-hmac
&hmac-signature=idhihhoaiDOICNK
I can issue HTTP 303 to the user using this URL, and the user will get their report.pdf.
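For what it's worth, the signing step I describe looks roughly like this Python sketch (the secret, the algorithm label, and the parameter ordering are my own choices; verification would recompute the signature after stripping the two hmac-* parameters):

import hashlib
import hmac
from urllib.parse import urlencode

SECRET = b"server-side-secret"  # never exposed to the browser

def sign_url(method, path, params):
    # Sign only the method and the canonical URL (path plus ordered query string),
    # since the request headers are not known in advance.
    query = urlencode(sorted(params))
    message = f"{method} {path}?{query}".encode()
    signature = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return f"{path}?{query}&hmac-algorithm=hmac-sha256&hmac-signature={signature}"

print(sign_url("GET", "/download", [
    ("name", "report.pdf"),
    ("include", "http://url1"),
    ("include", "http://url2"),
]))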
As I am not including the request headers in the signature, I am wondering two things:
1) Can a would-be attacker take advantage of the fact that I am not signing the request headers?
2) Is there a better way to achieve what I am trying to do?
When I realised that what I am talking about here is a signed URL, I checked the Amazon Docs and found "REST Authentication Example 3: Query String Authentication Example" in this document: http://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html.
This example is about a signed URL for use through a browser. About signing the headers, the document says:
You know that when the browser makes the GET request, it won't provide a Content-Md5 or a Content-Type header, nor will it set any x-amz- headers, so those parts are all kept empty.
In other words, Amazon leave the headers out of the signature.
Amazon make no mention of potential security holes, so until I hear otherwise (or get hacked :) ), I will assume my approach above is fine.
I've been putting in some research around REST. I noticed that the Amazon S3 API uses mainly http headers for their REST interface. This was a surprise to me, since I assumed that the interface would work mainly off request parameters.
My question is this: Should I develop my REST interface using mainly http headers, or should I be using request parameters?
The question is mainly whether the parameters are part of the resource identifier (URI) or not. If so, you would use request parameters; otherwise, HTTP custom headers. For example, the id of an album in a music gallery must be part of the URI.
Remember that, for example, /employee/id/45 (or /employee?id=45; REST has no prejudice against query string parameters or in favour of clean slash-separated URIs) identifies one resource. Now you could use content negotiation by sending the request header Accept: text/plain or Accept: image/jpeg to get either the info or the image. In this respect, the resource is deemed to be the same, and the header is only used to select the format of the resource.
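For example (placeholder URL), the same resource requested in two representations:

import requests

# The same URI, two representations, chosen via the Accept request header.
info = requests.get("https://example.com/albums/45", headers={"Accept": "text/plain"})
cover = requests.get("https://example.com/albums/45", headers={"Accept": "image/jpeg"})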
Generally, I am not a big fan of HTTP custom headers. This usually assumes the client has prior knowledge of the server implementation (not discoverable through natural HTTP means, i.e. hypermedia), which is generally considered a REST anti-pattern.
HTTP headers usually define aspects of HTTP that are orthogonal to what is to be achieved in the request/response exchange. The Authorization header (really a misnomer; it should have been called authentication) is a classic example.
Is it allowable to pass parameters to a web page through the URL (after the question mark) when using the POST method? I know that it works (most of the time, anyway) because my company's webapp does it often, but I don't know if it's actually supported in the standard or if I can rely on this behavior. I'm considering implementing a SOAP request handler that uses a parameter after the question mark to indicate that it is a SOAP request and not a normal HTTP request. The reason for this is that the webapp is an IIS extension, so everything is accessed via the same URL (ex: example.com/myisapi.dll?command), and to get the SOAP request processed, I need to specify that "command" parameter. There would be one generic command for SOAP, not a specific command for each SOAP action -- those would be specified in the SOAP request itself.
Basically, I'm trying to integrate the Apache Axis2/C library into my webapp by letting the webapp handle the HTTP request and then pass off the incoming SOAP XML to Axis2 for handling if it's a SOAP request. Intuitively, I can't see any reason why this wouldn't work, since the URL you're posting to is just an arbitrary URL, as far as all the various components are concerned... it's the server that gives special meaning to the parts after the question mark.
Thanks for any help/insight you can provide.
Let's start with the simple stuff. HTTP GET request variables come from the URI. The URI is the requested resource, and so any webserver should (and Apache does) have the entire URI stored in some variable available to the modules or appserver components running within the webserver.
An HTTP POST, which is different from an HTTP GET, is a separate logical call to the webserver, but it still specifies a URI that should process the post. A good webserver (Apache being one) will again make the URI available to whatever module or appserver is running within it, and will additionally make available the variables that were sent in the POST body.
At the point where your application takes control from apache during a POST you should have access to both the GET and POST variables and be able to do whatever control logic you wish, including replying with a SOAP protocol instead of HTML.
If you are asking whether it is possible to send parameters via both GET and POST in a single HTTP request, then the answer is "YES". This is standard functionality that can be used reliably AFAIK.
One such example is sending authentication credentials in two pieces, one over GET and the other through POST so that any attempt to hijack a session would require hijacking both the GET and POST variables.
So in your case, you can use POST to contain the actual SOAP request but test for whether it is a SOAP request based on the parameter passed in GET (or in other words through the URL).
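In a modern framework the same idea looks something like this Flask sketch; the route mirrors the single-entry-point IIS setup only as an analogy, and the return values are dummies:

from flask import Flask, request

app = Flask(__name__)

# The single-entry-point IIS setup (example.com/myisapi.dll?command) is mirrored here
# with one route purely as an analogy.
@app.route("/myisapi.dll", methods=["GET", "POST"])
def entry_point():
    if "command" in request.args:       # the flag arrived via the query string
        soap_xml = request.get_data()   # the raw POST body, ready for the SOAP engine
        return f"SOAP request, {len(soap_xml)} bytes of XML"
    return "normal request"             # regular GET or form-encoded POST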
I believe that no standard actually defines the concept of "HTTP parameters" or "request variables". RFC 1738 defines that a URL may have a "search part", which is the substring after the question mark. HTML specifies, in the form submission protocol, how a browser processing a FORM element should submit it. In either case, how the server processes the search part and the HTTP body is entirely up to the server - discarding both would conform to these two specs (but be fairly useless).
In order to determine whether you can post a search part to a specific service, you need to study that service's protocol specification. If the service is practically defined by means of an HTML form, then you cannot use a mix - you can't even use POST if the FORM specifies GET (and vice versa). If you post to a web service, you need to look at the web service's WSDL, which will typically mandate POST with all data in a SOAP message. And so on.
Specific web frameworks may have the notion of "request variables" - whether they draw these variables from the search part, the request body, or both is something you need to find out from the product documentation.
I deployed a web application with 3 (a mobile network operator) in the UK. It originally used POST parameters, but the 3 gateway stripped them (and X-headers as well!). So beware...
Allowable? Sure, it's doable, but I'm leaning towards the spec suggesting that mixing the two isn't necessarily supposed to happen, or be supported. RFC 2616 defines HTTP/1.1 and, I would argue, suggests only one method per request. If you think about a typical HTTP transaction from the client side, you can see the limitation as well:
$ telnet localhost 80
POST /page.html?id=5 HTTP/1.1
host: localhost
content-length: 8

name=bob
As you can see, you can only use one method (POST, GET, etc.). However, due to the nature of how various languages operate, they may pick up the query string and assign it to the GET variables. Ultimately, though, this is a POST request, not a GET.
So basically, yes, this functionality exists. Is it intended? I would say no.