Is the HTTP 'HEAD' verb useful in web development? - http

I've read the w3.org spec on the 'HEAD' verb, and I guess I'm missing something. I can't see how it would be useful.
Is the HTTP 'HEAD' verb useful in web development?
If so, how?

From RFC2616:
This method (HEAD) can be used for obtaining
metainformation about the entity
implied by the request without
transferring the entity-body itself.
This method is often used for testing
hypertext links for validity,
accessibility, and recent
modification.
The reason why HEAD is preferred to GET is due to the absence of the message body in the response making it using in scenarios where you want to determine if the content has changed at all - a change in the last modified time or content length usually signifies this.
Also, a HEAD request will provide some information about the server setup (whether it is IIS/Apache etc.), unless the server was masked; of course, this is available in all responses, but HEAD is preferred especially when you don't know the size of the response. HEAD is also the easiest way to determine if a site is up or down; again the irrelevance of the message body makes HEAD the ideal candidate.
I'm not sure about this, but RSS/ATOM feed readers would use HEAD over GET to ascertain if the contents of the feed have changed.

The HTTP HEAD can also be used to pre-authenticate to web server, before you do HTTP PUT/POST of some large data. Without the first HEAD request, you would be sending the large data to web server twice (because the first request would return 401 unauthorized reponse with WWW-authenticate header).

It's mainly for browsers and proxies to determine whether they can use a cached copy of the web document without having to download the whole thing (which would rather defeat the purpose of a cache).

Related

Post in REST API design

I've been under the impression that Post in Rest means "Create".
But after reading up on the spec http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.5
It seems like it can be more than just Create?
That was also stated by Stormpath in their screencasts on rest api design.
According to Stormpath, Post means "Process" , which can be pretty much anything.
Is that the correct way to see it?
I can trigger custom actions for my resources using Post?
In theory, a POST request should attempt to create or modify some resource on the server. As #Tichodroma pointed out, an idempotent request will affect this change only the first time it is sent, but otherwise what's important is that some state on the server will be changed by the request.
More practically. POST requests are often used when the request payload is too large to fit into a GET URI (e.g. a large file upload). This is usually an intentional breach of HTTP standards to avoid a 414 Request-URI Too Long response.
In terms of verbiage, I don't know if I like "process", because even a GET request will usually be "processed" to determine the resource to return. The main difference in my mind is the change of some state on the server.

Is there a standard way in HTTP to specify no content should be returned?

For a PUT or POST (for example), I would like to specify to the server that I don't want any content returned in the response, even if it normally would. Essentially I'm looking for a way to perform blind inserts/updates, and was trying to avoid unnecessary response payloads if I have no intention of using them.
I thought maybe Accept: none as a request header (or something similar) might be an option, but couldn't find anything to support that.
Is there a standard way to specify this in an HTTP request, or do I have to just live with a little extra content in the response?
I think a minimal response is necessary to know if the request was handled correctly by web servers or if there was errors, even if it has no data other than status code and HTTP headers.
That said, you can use HEAD HTTP command to make a GET request having a response without the message body (you get back only headers). But this, AFAIK, doesn't work with POST or PUT requests.
Regards.
You might be interested in the proposal outlined in https://datatracker.ietf.org/doc/html/draft-snell-http-prefer-18.

Why do we need HTTP GET? Is there anything that can't be achieved by HTTP POST?

As far as I know what GET can do, the same can be achieved by POST. So why was GET required in first place while defining HTTP protocol. If GET is only for fetching the resource, people can still update resources by sending the parameters values in URL. Why this loophole? Or the guy who did the coding on server side to update the resource on GET request has written a bad code?
HTTP specified different methods for different purposes. The GET method is intended to be used to “retrieve whatever information (in the form of an entity) is identified by the Request-URI”. Especially, it is intended to be a safe and idempotent method. That means a GET request should not have side effects (i.e. changing data):
In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval.
And sending an identical request multiple times results in the same as sending it just once:
Methods can also have the property of "idempotence" in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request. The methods GET, HEAD, PUT and DELETE share this property.
Practically, no browser implements POSTing by clicking links (without intercepting the click event in JavaScript), nor bookmarking POST data. Furthermore, semantically POST and GET serve different purposes. One is for POSTing data to an application, the other is for GETting data from the application. These semantics have practical implications, but they also have theoretical design implications that speak to the quality of your application's design: an application that doesn't handle GET differently from POST probably has a great deal of security problems and workflow bugs.
From RFC 2616:
9.3 GET
The GET method means retrieve whatever
information (in the form of an entity)
is identified by the Request-URI. If
the Request-URI refers to a
data-producing process, it is the
produced data which shall be returned
as the entity in the response and not
the source text of the process, unless
that text happens to be the output of
the process.
The semantics of the GET method change
to a "conditional GET" if the request
message includes an If-Modified-Since,
If-Unmodified-Since, If-Match,
If-None-Match, or If-Range header
field. A conditional GET method
requests that the entity be
transferred only under the
circumstances described by the
conditional header field(s). The
conditional GET method is intended to
reduce unnecessary network usage by
allowing cached entities to be
refreshed without requiring multiple
requests or transferring data already
held by the client.
The semantics of the GET method change
to a "partial GET" if the request
message includes a Range header field.
A partial GET requests that only part
of the entity be transferred, as
described in section 14.35. The
partial GET method is intended to
reduce unnecessary network usage by
allowing partially-retrieved entities
to be completed without transferring
data already held by the client.
The response to a GET request is
cacheable if and only if it meets the
requirements for HTTP caching
described in section 13.
See section 15.1.3 for security
considerations when used for forms.
9.5 POST
The POST method is used to request
that the origin server accept the
entity enclosed in the request as a
new subordinate of the resource
identified by the Request-URI in the
Request-Line. POST is designed to
allow a uniform method to cover the
following functions:
- Annotation of existing resources;
- Posting a message to a bulletin board, newsgroup, mailing
list,
or similar group of articles;
- Providing a block of data, such as the result of submitting a
form, to a data-handling process;
- Extending a database through an append operation. The actual
function performed by the POST method
is determined by the server and is
usually dependent on the Request-URI.
The posted entity is subordinate to
that URI in the same way that a file
is subordinate to a directory
containing it, a news article is
subordinate to a newsgroup to which it
is posted, or a record is subordinate
to a database.
The action performed by the POST
method might not result in a resource
that can be identified by a URI. In
this case, either 200 (OK) or 204 (No
Content) is the appropriate response
status, depending on whether or not
the response includes an entity that
describes the result.
If a resource has been created on the
origin server, the response SHOULD be
201 (Created) and contain an entity
which describes the status of the
request and refers to the new
resource, and a Location header (see
section 14.30).
Responses to this method are not
cacheable, unless the response
includes appropriate Cache-Control or
Expires header fields. However, the
303 (See Other) response can be used
to direct the user agent to retrieve a
cacheable resource.
POST requests MUST obey the message
transmission requirements set out in
section 8.2.
See section 15.1.3 for security
considerations.
As stated, the response may change with GET if the request message has conditionals based on certain criteria. The POST requires that the server accept the request, no matter what.
Anytime you do a web search and you want to link someone to it, you can easily do it through:
http://www.google.com/search?q=lol
Can you imagine telling someone to do a POST request instead? A POST request isn't really bookmarkable like that, which is why GET is useful.
They simply have different purposes, as stated in other answers. GET is for GETing, POST is for POSTing.
Everything can also be achieved using raw TCP connections. Yet we often use HTTP rather than raw TCP connections because HTTP offers a layer of abstraction and, therefore, convenience and conforming implementations. Likewise, we use HTTP correctly (GETs, POSTs, PUTs, DELETEs, etc) rather than dumbly (POSTs only) because these verbs offer an additional layer of abstraction and, therefore, convenience and conforming implementations.
Lets say I want to send a variable to a page via a link, can I do that with POST? Nope, but with GET, I can send something over by doing ?variableName=someValue
You're right, everything can be tunnel through an HTTP POST. In fact, SOAP web services do exactly that. Everything is a POST using SOAP web services.
In that case, you are tunneling through HTTP, and not using HTTP to its fullest. If that's all you want to do, then that's fine.
However, if you wish to leverage HTTP for the features and benefits that it provides beyond simple message transport, then you should read the RFC and learn the rest of the HTTP protocol including GET, PUT, POST, DELETE, and all of the headers, cache management and result codes.

How do I deal with different requests that map to the same response?

I'm designing a Web service. The request is idempotent, so I chose the GET method. The response is relatively expensive to calculate and not small, so I want to get caching (on the protocol level) right. (Don't worry about memoisation at my part, I have that already covered; my question here is actually also paying attention to the Web as a whole.)
There's only one mandatory parameter and a number of optional parameter with default values if missing. For example, the following two map to the same representation of the response. (If this is a dumb way to go about it the interface, propose something better.)
GET /service?mandatory_parameter=some_data HTTP/1.1
GET /service?mandatory_parameter=some_data;optional_parameter=default1;another_optional_parameter=default2;yet_another_optional_parameter=default3 HTTP/1.1
However, I imagine clients do not know this and would treat them separate and therefore waste cache storage. What should I do to avoid violating the golden rule of caching?
Make up a canonical form, document it (e.g. all parameters are required after all and need to be sorted in a specific order) and return a client error unless the required form is met?
Instead of an error, redirect permanently to the canonical form of a request?
Or is it enough to not mind how the request looks like, and just respond with the same ETag for same responses?
First, don't use semicolons as a delimiter in a query string. You should be using ? to begin a query string and & to delimit variable/value pairs. RFC 3986 doesn't explicitly say you have to use &, but the vast majority of existing code uses this delimiter because of the application/x-www-form-urlencoded precedent.
Second, you're right, in that parameters in a query string result in a different URI, and thus, as far as caches are concerned, a different resource. Assuming you want optimal caching performance, if you know that an optional parameter has been specified, and its inclusion is unnecessary and does not affect the representation that will be transmitted, you should be making a redirect to a canonical representation that omits the parameter. (i.e., An optional parameter is given with a value that is set to the default value. For example, if you have http://example.com:80/, you can normalize to http://example.com/ because 80 is the default value for the port with HTTP. You can do the same for query parameters since you control the URI space.) If you have parameters included (optional or otherwise) that appear in an order other than the canonical order, you should redirect for that too. A 301 redirect would be preferred if you know that the relationship between URIs will be stable. Otherwise, do a 302/307 redirect as appropriate. I would recommend defining your canonical form the same way that OAuth does: Sort each parameter alphabetically, first by key, then by value. Other normalization operations will also help out here. RFC 3986 has an entire section on URI normalization that will be relevant to you. This technique will really only work for GET, and redirects on PUT/POST/DELETE are not generally recommended.
Third, ETags are great, and they provide a huge performance improvement if implemented well by both the client and server. However, it's unfortunately rare for both sides to do it right. Ditto for Last-Modified. You should pursue these, because the CPU and bandwidth savings are significant when it works, but they are not sufficient on their own. Other headers like Cache-Control are also frequently necessary. It's worth familiarizing yourself with Section 13 of RFC 2616 if you're planning on going into great detail on this stuff.
Finally, a word of warning — there is an issue with these redirects you need to be aware of: Clients trying to access your resources may frequently be redirected to other locations. This introduces overhead that only gives you an overall savings if the clients make subsequent requests against the same resource, maintaining state to avoid the subsequent redirect. Unless you've open-sourced a reference client implementation that takes advantage of your caching optimizations, you may never benefit from these tweaks.
I would pick option (2) in your list - I would make the request RESTful, rather than RPC like.
I.e. in this case, if you make all of the parameters parts of the request path:
/service/mandatory_parameter/some_data/optional_parameter/default1/another_optional_parameter/default2/yet_another_optional_parameter/default3
In the case where not all of the optional parameters are specified, return a 301 (Permanent redirect) to the full resource name with the defaults filled in. This will (or should) be cached by clients and web caches appropriately, and even if it gets to your backend then making the 301 should be very cheap for you.
At which point, you have one canonical form for the URI, and caching will work as normal/expected.
This does mean that every combination of parameters will be cached separately (as a 301), however that's fine really as the non-canonical requests will have an independent cache policy to the full request and clients which are worried about the extra round trip can fill in all the parameters themselves.
Your option (3) won't work as you expect - each form will be cached independently as they're different URIs.
It should also be noted that a lot of downstream caches / software won't cache your response at all due to the query parameters, which is why I suggest turning it into a 'proper' resource..
First it's a good thing you choice GET since other methods don't have as good caching support. As far as I know browsers do cache URIs with respect to the parameters so I don't think It's a good idea to use a canonical form.
One thing that you don't state here is how this service is going to be used. If those requests are made from a browser (and it looks to me that those are probably issued from a script) requests will probably look the same even if they are asked for more than once. So make sure that whatever generate the URI end up with the same URI for equal input data (remove default parameters or always include them).
When it comes to the ETag I recommend you to have this, though I would like to clarify how it works; You get the request, you process all your "expensive calculations" and then if there were a If-None-Match header with the same hash (ETag) as your processed response you may return 304 Not-Modified. So ETag is used to avoid transmitting the response if the client already have it. (Sure you may implement caching on server-side, but this is better to do based on input parameters).
To further improve cache hits on client side you may want to set proper caching headers in you response.
I asked almost the same question for me some month ago. My answer I describe on an example of my realization.
On the server side I have WFC service which receive requests in one of the following forms
GET /Service/RequestedData?param1=data1&param2=data2…
GET /Service/RequestedData/IdOfData?param1=data1&param2=data2…
PUT /Service/RequestedData/IdOfData // with param1=data1&param2=data2… in body
POST /Service/RequestedData/IdOfData // with param1=data1&param2=data2… in body
DELETE /Service/RequestedData/IdOfData
So requests are in REST for, but GET requests have some optional parameters. Especially this part is a port of your interest.
Because WFC support a URL templates, the prototype of functions which reply to a client request looks like
[WebGet (UriTemplate = "RequestedData?param1={myParam1}&param2={myParam2}",
ResponseFormat = WebMessageFormat.Json)]
[OperationContract]
MyResult GetData (string myParam1, int myParam2);
All requests like
GET /Service/RequestedData?param1=&param2=data2
GET /Service/RequestedData?param2=data2&param1=
GET /Service/RequestedData?param2=data2
will be mapped to the same call from the side of my WCF service. So I have one problem less.
Now at the beginning of implementation of every method which response to HTTP GET request I set in the HTTP header "Cache-Control: max-age=0". It means that client always try to verify client browser cache and no ajax requests will be not easy responded from the local cache like it can do Internet Explorer.
Next I calculate always an ETag based on my data. The exact algorithm is a subject of separate discussion, but important is, that in all responses to HTTP GET requests exist ETag in the HTTP header.
So clients every time verify his local cache and send GET request to server. They send the ETag, which come from its local cache, inside of "If-None-Match" HTTP header. Server computes the ETag which has data, which will be sending back to this GET request. It ETag of data is the same as in the client request server send back response with empty body and the code "304 Not Modified" back. In this case browser gives data from the local cache.
If the same client from a unknown reason create a new version of URL request, which will be interpret from the web browser as a new URL, then web browser will not find old server response in the local cache and send one more time the same request to the server. Is it a real problem? The server send the data one more time. If you have a server side caching you can makes a little more optimization. In the most cases, the URL of GET requests will be produced by a client side JavaScript so you will be no time have such situation.
Calculation of ETag and setting of "Cache-Control: max-age=0" and Etag header as well as setting "304 Not Modified" code should do WFC service, but it is very easy.
The most important is that my implementation of ETag calculation is not as expansive as getting the whole data from the database server and calculation MD5 cache from there. I use permanently rowversion data type in every row of data in the SQL Server database. This rowversion is nothing other as a counter of changes in the database. If one change a row of data rowversion value in the corresponding row will be incremented. So if one makes SELECT statement from maximum value of rowversion value, and this value is not changed comparing with the previous requests, one can be sure that the data were not changed in the time period. The algorithm of calculation of ETag should be only sensitive to deleting of data from the table. But it is also a solved problem. A little more about this you can read in Concurrency handling of Sql transactrion.
I don’t want suggest my ETag calculation as a best choice, I want only say, that calculation of ETag can be much cheaper as calculation MD5 from the whole data.
In case of errors Server throws an exception which will be mapped to a HTTP code, which I define in the throw statement. As a body WFC sends a standard JSON object {"description":"My error text"}. A custom error object is also possible (see Is WebProtocolException included in .net 4.0?). On the client side I use jQuery and in the corresponding jQuery.ajax inside of error event handler the error message will be decoded and displayed to the user.
So my recommendation: usage of ETag together with "Cache-Control: max-age=0" for all HTTP GET requests. For all other requests I’ll recommend you implement RESTfull service. For the error implementation you should look at the most native way which is supported by the software used for server and client implementation and use this.
UPDATED: To clear the URL structure I should add following. In my service the main part like GET /Service/RequestedData/IdOfData describes data objects requested. Parameters param1=data1&param2=data2 corresponds mostly the information about sorting, paging and filtering of data. I use active jqGrid plugin for jQuery and if the end-user scroll in the grid to the next page, click on the column header (sorting of data) or if he set a filter with respect of searching feature, all these follows to different optional parameters appended the main URL.

Passing params in the URL when using HTTP POST

Is it allowable to pass parameters to a web page through the URL (after the question mark) when using the POST method? I know that it works (most of the time, anyways) because my company's webapp does it often, but I don't know if it's actually supported in the standard or if I can rely on this behavior. I'm considering implementing a SOAP request handler that uses a parameter after the question mark to indicate that it is a SOAP request and not a normal HTTP request. The reason for this that the webapp is an IIS extension, so everything is accessed via the same URL (ex: example.com/myisapi.dll?command), so to get the SOAP request to be processed, I need to specify that "command" parameter. There would be one generic command for SOAP, not a specific command for each SOAP action -- those would be specified in the SOAP request itself.
Basically, I'm trying to integrate the Apache Axis2/C library into my webapp by letting the webapp handle the HTTP request and then pass off the incoming SOAP XML to Axis2 for handling if it's a SOAP request. Intuitively, I can't see any reason why this wouldn't work, since the URL you're posting to is just an arbitrary URL, as far as all the various components are concerned... it's the server that gives special meaning to the parts after the question mark.
Thanks for any help/insight you can provide.
Lets start with the simple stuff. HTTP GET request variables come from the URI. The URI is a requested resource, and so any webserver should (and apache does) have the entire URI stored in some variable available to the modules or appserver components running within the webserver.
An http POST which is different from an http GET is a separate logical call to the webserver, but it still defines a URI that should process the post. A good webserver (apache being one) will again make the URI available to whatever module or appserver is running within it, then will additionally make available the variables which were sent in the POST headers.
At the point where your application takes control from apache during a POST you should have access to both the GET and POST variables and be able to do whatever control logic you wish, including replying with a SOAP protocol instead of HTML.
If you are asking whether it is possible to send parameters via both GET and POST in a single HTTP request, then the answer is "YES". This is standard functionality that can be used reliably AFAIK.
One such example is sending authentication credentials in two pieces, one over GET and the other through POST so that any attempt to hijack a session would require hijacking both the GET and POST variables.
So in your case, you can use POST to contain the actual SOAP request but test for whether it is a SOAP request based on the parameter passed in GET (or in other words through the URL).
I believe that no standard actually defines the concept of "HTTP parameters" or "request variables". RFC 1738 defines that an URL may have a "search part", which is the substring after the question mark. HTML specifies in the form submission protocol how a browser processing a FORM element should submit it. In either case, how the server-side processes both the search part and the HTTP body is entirely up to the server - discarding both would be conforming to these two specs (but fairly useless).
In order to determine whether you can post a search part to a specific service, you need to study this service's protocol specification. If the service is practically defined by means of a HTML form, then you cannot use a mix - you can't even use POST if the FORM specifies GET (and vice versa). If you post to a web service, you need to look at the web service's WSDL - which will typically mandate POST; with all data in a SOAP message. Etc.
Specific web frameworks may have the notion of "request variables" - whether they will draw these variables both from a search part and a request body, you need to find out in the product documentation.
I deployed a web application with 3 (a mobile network operator) in the UK. It originally used POST parameters, but the 3 gateway stripped them (and X-headers as well!). So beware...
allowable? sure, it's doable, but i'm leaning towards the spec suggesting dual methods isn't necessarily supposed to happen, or be supported. RFC2616 defines HTTP/1.1, and i would argue suggests only one method per request. if you think about your typical HTTP transaction from the client side, you can see the limitation as well:
$ telnet localhost 80
POST /page.html?id=5 HTTP/1.1
host: localhost
as you can see, you can only use one method (POST/GET, etc...), however due to the nature of how various languages operate, they may pick up the query string, and assign it to the GET variable. ultimately though, this is a POST request, and not a GET.
so basically, yes this functionality exists, is it intended? i would say no.

Resources