appropriate user-agent header value - http

I'm using HttpBuilder (a Groovy HTTP library built on top of apache's httpclient) to sent requests to the last.fm API. The docs for this API say you should set the user-agent header to "something appropriate" in order to reduce your chances of getting blocked.
Any idea what kind of values would be deemed appropriate?

The name of your application including a version number?

I work for Last.fm. "Appropriate" means something which will identify your app in a helpful way to us when we're looking at our logs. Examples of when we use this information:
investigating bugs or odd behaviour; for example if you've found an edge case we don't handle, or are accidentally causing unusual load on a system
investigating behaviour that we think is inappropriate; we might want to get in touch to help your application work better with our services
we might use this information to judge which API methods are used, how often, and by whom, in order to do capacity planning or to get general statistics on the API eco-system.
A helpful (appropriate) User-Agent:
tells us the name and version of your application (preferably something unique and easy to find on Google!)
tells us the specific version of your application
might also contain a URL at which to find out more, e.g. your application's homepage
Examples of unhelpful (inappropriate) User-Agents:
the same as any of the popular web browsers
the default user-agent for your HTTP Client library (e.g. curl/7.10.6 or PEAR HTTP_Request)
We're aware that it's not possible to change the User-Agent sent when your application is browser-based (e.g. Javascript or Flash) and don't expect you to do so. (That shouldn't be a problem in your case.)
If you're using a 3rd party Last.fm API library, such as one of the ones listed at http://www.last.fm/api/downloads , then we would prefer it if you added extra information to the User-Agent to identify your application, but left the library name and version in there as well. This is immensely useful when tracking down bugs (in either our service or in the client libraries).

Related

What is the best HTTP status code to use when the requested language is unavailable?

I'm working on a site which will be available in multiple languages. I'm using subdomains to identify the locale (seems like the best option for us after reading: https://support.google.com/webmasters/answer/182192), so our English site will be at en.mysite.org, French at fr.mysite.org etc.
Not all translations will be available right away and some will never be available. So I have two scenarios which may require two different status codes:
When the user visits ru.mysite.org I would like to show them a page that tells them that the language is not yet available but will provide more information on how they can help make it available.
When the user visits pirate.mysite.org I would like them to know that it will likely never be available (and Google should probably also be unaware of the site).
Right now I'm simply rendering a 404 in both cases but I'm thinking that there may be a better practise for these cases particularly for SEO purposes. For scenario 1 I'm starting to think that 501 may make more sense. For the second scenario I'm not sure if there is a better option.
I think a 404 is reasonable for both cases:
404 Not Found
The requested resource could not be found but may be available again in the future. Subsequent requests by the client are permissible.
(Wikipedia)
HTTP does have language support via the Accept-Language header, but your site does not use it. (Why not?) For all HTTP knows, the fr and ru subdomains are just other parts of your site. Using the subdomain to represent language is not "the HTTP way" to return a different representation of the same resource. So as far as HTTP is concerned, a user has requested something which is not available, but it might be available in the future.
I do not think that either case represents a server error so 5xx is not the appropriate category.

Is it dangerous if a web resource POSTs to itself?

While reading some articles about writing web servers using Twisted, I came across this page that includes the following statement:
While it's convenient for this example, it's often not a good idea to
make a resource that POSTs to itself; this isn't about Twisted Web,
but the nature of HTTP in general; if you do this, make sure you
understand the possible negative consequences.
In the example discussed in the article, the resource is a web resource retrieved using a GET request.
My question is, what are the possible negative consequences that can arrive from having a resource POST to itself? I am only concerned about the aspects related to the HTTP protocol, so please ignore the fact that I mentioned about Twisted.
The POST verb is used for making a new resource in a collection.
This means that POSTing to a resource has no direct meaning (POST endpoints should always be collections, not resources).
If you want to update your resource, you should PUT to it.
Sometimes, you do not know if you want to update or create the resource (maybe you've created it locally and want to create-or-update it). I think that in that case, the PUT verb is more appropriate because POST really means "I want to create something new".
There's nothing inherently wrong about a page POSTing back to itself - in fact, many of the widely-used frameworks (ASP.NET, etc.) use that method to handle various events that happen on the client - some data is posted back to the same page where the server processes it and sends a new reponse.

Doesn't including tokens like #me in a REST URI break HTTP caching?

OpenSocial and some of the newer Google APIs include these tokens, such as "#me" or "#self", whose values are replaced by the API server with values based on the currently authenticated user. For example "/api/people/#me/#all" is an OpenSocial REST URL.
Doesn't this break with the goal of REST APIs to support native HTTP cache servers (like Squid)?
Even if you could get around the issue using the "Vary" header, it seems like a major drawback. And only real benefit is to allow developers to hard code some URIs into their apps. Anyone know why it was designed this way?
Yes it will make the use of public caches difficult. Personally I think it is a really bad idea and does seem to be driven by the desire to make it easier for clients to construct URIs. I sometimes wonder if the extensive use of caching servers like memcached are causing developers to forget about the benefits of http caching.

Can OAuth 2.0 support multiple grants in a single redirection journey?

My question is pretty simple:
If you have two web-application components:
Server-side (secret-capable) code in PHP, Python, Perl ... whatever
The javascript output and interpreted by the browser
Given a single redirection to the authorisation endpoint (and back) is it possible to specify and transfer the information for:
An authorization code grant (for the server-side code)
An implicit grant with restricted rights for the Javascript
thereby transferring the two grants (one in the request-url proper and the other in the fragment) in one round-trip without violating the RFC?
One redirect-loop seems cleaner than one for each grant (even if the second doesn't block due to previous authorization)
Thanks in advance!
References
https://datatracker.ietf.org/doc/html/draft-ietf-oauth-v2-16#section-4.2
edit 1: code_and_token seems to be the type of thing I am after ... an auth code grant for the server to request the access code using its credentials ... and an implicit token for the javascript. As mov matake mentions, it was pulled from the RFC after v11, with no real note as to why. Facebook and Google seem to support this which makes me suspect it will return.
The token_and_code request type was removed from the specification because it needed significant work in terms of security analysis and rules, and no one offered to do it. It was originally proposed by a Twitter engineer who left the working group shortly after.
It will not be added to the specification, but it can easily be introduced by an extension. Google supported this flow on the list, but later said they will not implement it, and instead, will implement something else using HTML5 features.
OAuth 2.0 had "code_and_token" response type before (might be "token_and_code").
But it had been removed from the spec later.
So in current spec, if you need code for your server, the way will be
use "code" response type
get an access token on server side
and give it to the client side
You can't get scope-restricted token only for client side though..
Or you might set up an proxy on your server side for your client side code.
http://www.ietf.org/mail-archive/web/oauth/current/msg04969.html and http://www.ietf.org/mail-archive/web/oauth/current/msg03655.html
says that the "code_and_token" type was good, but the RFC didn't make it clear enough that the token in the fragment (for Javascript) should/could have less rights than the token obtained by the access code...
Thanks Nov Matake for pointing out the code_and_token type was part of the spec (at one point) as I missed it in the old specification versions (though it is widely implemented).
Looks like it will make a comeback though, as it is quite well supported by existing implementations at Google and Facebook and seems to be a core request to support both user-agent tokens and server-side access codes in one round trip.
The problem seems to be defining the semantics of "scope" in this context as well as defining a degree to which scope can differ in a single request. It makes sense that the user-agent token has limited rights, ie not the same rights as the client application.
We shall wait and see ... the downside of implementing off the back of an involving RFC.

Is there any reason not to use HTTP PUT and DELETE in a web application?

Looking around, I can't name a single web application (not web service) that uses anything besides GET and POST requests. Is there a specific reason for this? Do some browsers (or servers) not support any other types of requests? Or is this only for historical reasons? I'd like to make use of PUT and DELETE requests to make my life a little easier on the server-side, but I'm reluctant to because no one else does.
Actually a fair amount of people use PUT and DELETE, mostly for non-browser APIs. Some examples are the Atom Publishing Protocol and the Google Data APIs:
http://www.ietf.org/rfc/rfc5023.txt
http://code.google.com/apis/gdata/docs/2.0/basics.html
Beyond that, you don't see PUT/DELETE in common usage because most browsers don't support PUT and DELETE through Forms. HTML5 seems to be fixing this:
http://www.w3.org/TR/html5/forms.html#form-submission-0
The way it works for browser applications is: people design RESTful applications with PUT and DELETE in mind, then "tunnel" those requests through POSTs from the browser. For example, see this SO question on how Ruby on Rails accomplishes this using hidden fields:
How can I emulate PUT/DELETE for Rails and GWT?
So, you wouldn't be on your own designing your application with the larger set of HTTP verbs in mind.
EDIT: By the way, if you're curious about why PUT/DELETE are missing from browser based form posts, it turns out there's no real good technical reason. Reading around this thread on the rest-discuss mailing list, especially Roy Fielding's comments, is interesting for some context:
http://tech.groups.yahoo.com/group/rest-discuss/message/9620?threaded=1&var=1&l=1&p=13
EDIT: There are some comments on whether AJAX libraries support all the methods. It does come down to the actual browser implementation of XMLHttpRequest. I thought someone might find this link handy, which tests your browser to see how compliant the HttpRequest object is with various HTTP options.
http://www.mnot.net/javascript/xmlhttprequest/
Unfortunately, I don't know of a reference which collects these results.
Quite simply, the HTML 4.01 form element only allows the values "POST" and "GET" in its method attribute
Some proxy servers with tough security policies might drop them. I'm using PUT and DELETE anyways.
I've read that some browsers do not support other HTTP methods properly, though I can't name any specifics.
Rails, in particular, will pack your forms with a method parameter to explicitly set this even if the browser doesn't support those methods. That seems like a reasonable precaution if you're going to do this.
I say use all the features of HTTP, browsers be damned, lol. Maybe it'll inspire more complete and proper use of the HTTP protocol moving forward. There's more happening on the net than just POSTs and GETs. About time browser implementations reflected this.
This depends on your browser and Ajax library. For example jQuery supports all HTTP methods even though the browser may not. See for example the jQuery "ajax" documentation on the "type" attribute.
The Restlet Java framework lets you tunnel PUT and DELETE requests through HTML POST operations. To do this, you just add method=put or method=delete to your URI's query string, eg:
http://www.example.com/user=xyz?method=delete ...
This is the same as Ruby on Rails' approach (as described by #ars above).
Personally, I really don't see any purpose for using PUT or DELETE in a web application. All operations that an application performs are read or write, aka input output. Why do you need to distinguish the nature of the operation in the header of the HTTP request?
I could make ajax calls with the same url of form /object/object_id
and do multiple operations like delete, update, get the value, or create.
Just by looking at the URL, I have no clue which one it is.
By using GET and POST only, my urls will be:
/object/id/delete
/object/id/create
/object/id/update
/object/id --> implied GET
etc.
Based on my limited experience, this is a lot cleaner than hidden header request types in many cases.
I am not saying one should never use PUT or DELETE, just saying, use them only if absolutely needed.
Refer to "RESTful Web API" by Leonard Richardson to read more about different use cases and conventions regarding HTTP request methods in a RESTful web api.

Resources