REST design: what verb and resource name to use for a filtering service - http

I am developing a cleanup/filtering service that has a method that receives a list of objects serialized in xml, and apply some filtering rules to return a subset of those objects.
In a REST-ful service, what verb shall I use for such a method? I thought that GET is a natural choice, but I have to put the serialized XML in the body of the request which works but feels incorrect. The other verbs don't seem to fit semantically.
What is a good way to define that Service interface? Naming the resource /Cleanup or /Filter seems weird mainly because in the examples I see online, it is always a name rather than a verb being used for resource name.
Am I right to feel that REST services are better suited for CRUD operations and you start bending the rules in situations like this service? If yes, am I then making a wrong architectural choice.
I've pushed to develop this service in REST-ful style (as opposed to SOAP) for simplicity, but such awkward cases happen a lot and make me feel like I am missing something. Either choosing REST where it shouldn't be used or may be over-thinking some stuff that doesn't really matter? In that case, what really matters?

REST is about using HTTP the way it was designed. To be RESTful consider (title was REST design :):
URLs should be permalinks to a resource (caching benefits, storing/sharing endpoints etc...)
Because they are permalinks to a resource, having verbs in the URL is a hint that you're on the wrong path (filter is a verb).
A collection of resources can be an endpoint /foos.
If you want to filter the collection of resources, consider querystring params like ?filter= or something like ?ids=1,2,3,4,5.
A GET should not change resources. Note that 'cleanup' implies something getting deleted so be cautious of changes to resources when you do a GET. REST says a GET shouldn't alter resources. Imagine a caching server taking you're cleanup request as a GET and returning OK because t's cached. Caching servers know not to cache a POST, DELETE etc... (that's the way HTTP was designed).
Don't rule out multiple calls - for example, you may do a get to filter and get a set of resources to clean up and then could be followed by many or one DELETE verb calls to do the cleanup.
Sometimes there's a temporal resource like a transaction or a 'job' that could do work like a cleanup. Don't rule out a POST to the resource with the body containing items to cleanup up and it returns a job id. You can then query the jobid for the cleanup progress or status.
It's hard to give exact guidance because the question isn't clear but hopefully the RESTful principlies guidance and thoughts above set you on the right track. If you clarify the exact calls, I'll try and recommend APIs.
So, let's say you wanted to cleanup duplicate foos.
[GET] /foos/duplicates (or /foos?filter=duplicates)
returns a body with identifies to of foos that are duplicates. Let's say that returns 1,2,5 (could be names).
Then you could issue:
[DELETE] /foos with the body being an array containing 1,2,5 (or names if unique). the delete call is passive so even if the GET call is cached according to REST principles it's fine.
It's also possible and valid to not go the REST route such as POX or JOSN RPC over http but just realize at that point that it's not REST. And that's fine but you're not getting the benefits of REST described in fielding's thesis.
http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
Also, read this:
http://blog.steveklabnik.com/posts/2011-07-03-nobody-understands-rest-or-http
EDIT:
After reading the comment where you clarified you're sending the server a set of objects (not persisted server side) and it returns the subset with the dupes filtered out (like a server side helper function), some options are:
Do this client/browser side if possible - why take the network roundtrip to filter out dupes out of collection?
If for some reason only the server has specific knowledge/data to determine that two items are functional equivalent (even though data not exactly the same), then consider POSTing the data set to the server with the response body containing the unique/filtered set. Even though the server isn't persisting the set, it would fall into a 'temporal' object or set and the server is modifying it. It's not conceptually a GET of server resources and caching offers no benefits in that scenario.

Last question first: What really matters is getting the job done in a way that is
Correct
As easy to use as practical
Easily maintained by future programmers (likely to include yourself)
REST is a natural fit for operations on resources where each URL matches some object that can be manipulated. It is a less natural fit for other uses, but these are more guidelines than actual rules. Others have pointed out the original dissertation on REST, but it is worth remembering that few implementations are pure.
If you have several URLs that perform these transformative kinds of functions, consider putting them in their own special URL space, like /api/filter and /api/transliterate, etc.. That will help users and maintainers alike know that certain URLs aren't REST, but are more like remote procedure calls. Posting data to these URLs results in you getting some kind of data back.
If you get stuck on specific names you should make a list of candidates, have a few beers, then choose one from the list. That's what I do when I get stuck on minutia.
SOAP is a neat protocol and has its uses, but it tends to be very heavy. Good documentation and consistency are probably more important to your budding API than using any specific technology.

Related

RESTful API: Is it more practical to pass in parent resource locators in the request body instead of in the URI?

Suppose we have are building a REST API for a job listings app where a user can apply for a job.
Instead of making JobApplication a nested URI resource like so:
It would be made a top-level resource:
Of course, in the case of the latter, the JobVacancy's id is still included in the request but is passed through the request body instead of the URI.
Why the latter approach? Because it saves the client the inconvenience of having to know the parent resource's id for 3 of the routes.
I believe that in the end, this is just a matter of standard. Both solutions will work fine, however, the first approach is more readable closer to the real-world operations, the applications are always related to some job vacancies.
Moreover, you need to consider if, in your system, you always access the applications starting from a job vacancy. If this is true, I believe that the first approach is much more clear, because the relation of dependencies between the two entities is clearly expressed in the signature of the method.

I am implementing partial updates over POST since I cannot use PATCH. Can I do it conditionally?

So the business requires us to implement Partial Updates.
HTTP PUT only caters to the case where the client sends across a complete representation of the resource.
Hence I decided to use the catch-all HTTP POST to implement the same. Question is, can I still safely take care of Conditional Updates using ETags and Last-Modifieds? Or does the Http Spec prevent me from doing so in any way?
a) Why do you think you can't use PATCH?
b) From the HTTP point of view, the conditional headers apply to all methods. However, there may be existing servers not getting this right, so Id be careful with relying on them. (see, for instance, http://trac.tools.ietf.org/wg/httpbis/trac/ticket/96)

Filtering Data in ASP.NET Web Services

I've been using this site for quite a while, usually being able to sort out my questions by browsing through the questions and following tags. However, I've recently come across a question that is rather hard to lookup amongst the great number of questions asked - a question I hope some of you might be able to share your opinion on.
As my problem is a bit hard to fit into a single line, going in the title, I'll try to give a bit more details on the problem I've encountered. So, as the title says I need to filter, or limit, some of the response data my standard ASP.NET Soap-based Web service returns on invoking various web methods. The web service is used to return data used by other systems (a data repository more or less), where the client today is able to specify a few parameters on how the data should be filtered and in return a full-set of data back.
Well, easy enough I thought, just put additional filtering options on the existing web methods which needs a bit more filtered applied, make adjustments on the server-side and we are all set to go - well, unfortunately it turned out to be a bit more tricky then this.
The problem I am facing is that I'm working on a web service running in a production environment, which needs to be extended in such that additional filters can be applied to existing web method being invoked w/o affecting the calls already being made by other systems used by the customer using their client stubs. This is where I am a bit troubled, since I can't seem to find a "right solution" on extending the current web service.
Today, the filter is send as a custom data structure which holds information on which data should filtered, but I am not sure if I can simply just add more information to this data structure w/o breaking code at the clients? One of my co-workers suggested that I could implement a solution where I would extend the web.config on the server-side to hold a section with details on which data should be excluded (filtered out), but I don't find this to be a viable solution long-sighted - and I don't trust customers with such an option since this is likely to go wrong at some point. So the solution I am looking for is a way that I can apply a "second filter" to the data I am requesting from the client so instead of getting a full-set of data back it should only give a fraction, it implemented in such that the filter can be easily modified and it must not affect the current client calls.
Any suggestions on how I should approach this problem?
Thanks!
Kind regards,
E.
A pretty common practice is to create another instance of the application OR use part of the url to signify the version of the endpoint they are connecting to, perhaps the virtual directory is the date. That way old calls will go to the old API and new calls will come in on the new API.
http://api.example.com/dostuff
vs
http://api.example.com/6-7-2011/dostuff

REST - Modify Part of Resource - PUT or POST

I'm seeing a good bit of hand-waving on the subject of how to update only part of a resource (eg. status indicator) using REST.
The options seem to be:
Complain that HTTP doesn't have a PATCH or MODIFY command. However, the accepted answer on HTTP MODIFY verb for REST? does a good job of showing why that's not as good an idea as it might seem.
Use POST with parameters and identify a method (eg. a parameter named "action"). Some suggestions are to specify an X-HTTP-Method-Override header with a self-defined method name. That seems to lead to the ugliness of switching within the implementation based on what you're trying to do, and to be open to the criticism of not being a particularly RESTful way to use POST. In fact, taking this approach starts to feel like an RPC-type interface.
Use PUT to over-write a sub-resource of the resource which represents the specific attribute(s) to update. In fact, this is effectively an over-write of the sub-resource, which seems in line with the spirit of PUT.
At this point, I see #3 as the most reasonable option.
Is this a best practice or an anti-pattern? Are there other options?
There are two ways to view a status update.
Update to a thing. That's a PUT. Option 3
Adding an additional log entry to the history of the thing. The list item in this sequence of log entries is the current status. That's a POST. Option 2.
If you're a data warehousing or functional programming type, you tend to be mistrustful of status changes, and like to POST a new piece of historical fact to a static, immutable thing. This does require distinguishing the thing from the history of the thing; leading to two tables.
Otherwise, you don't mind an "update" to alter the status of a thing and you're happy with a PUT. This does not distinguish between the thing and it's history, and keeps everything in one table.
Personally, I'm finding that I'm less and less trustful of mutable objects and PUT's (except for "error correction"). (And even then, I think the old thing can be left in place and the new thing added with a reference to the previous version of itself.)
If there's a status change, I think there should be a status log or history and there should be a POST to add a new entry to that history. There may be some optimization to reflect the "current" status in the object to which this applies, but that's just behind-the-scenes optimization.
Option 3 (PUT to some separated sub-resource) is your best bet right now, and it wouldn't necessarily be "wrong" to just use POST on the main resource itself - although you could disagree with that depending on how pedantic you want to be about it.
Stick with 3 and use more granular sub-resources, and if you really do have a need for PATCH-like behavior - use POST. Personally, I will still use this approach even if PATCH does actually end up as a viable option.
HTTP does have a PATCH command. It is defined in Section 19.6.1.1 of RFC 2068, and was updated in draft-dusseault-http-patch-16, currently awaiting publication as RFC.
It's ok to POST & emulating PATCH where not available
Before explaining this, it's probably worth mentioning that there's nothing wrong with using POST to do general updates (see here) In particular:
POST only becomes an issue when it is used in a situation for which some other method is ideally suited: e.g., retrieval of information that should be a representation of some resource (GET), complete replacement of a representation (PUT)
Really we should be using PATCH to make small updates to complex resources but it isn't as widely available as we'd like. We can emulated PATCH by using an additional attribute as part of a POST.
Our service needs to be open to third-party products such as SAP, Flex, Silverlight, Excel etc. That means that we have to use the lowest common denominator technology - for a while we weren't able to use PUT because only GET and POST were supported across all the client technologies.
The approach that I've gone with is to have a "_method=patch" as part of a POST request. The benefits are;
(a) It's easy to deal with on the server side - we're basically pretending that PATCH is available
(b) It indicates to third-parties that we are not violating REST but working around a limitation with the browser. It's also consistent with how PUT was handled a few years back by the Rails community so should be comprehensible by many
(c) It's easy to replace when PATCH becomes more widely available
(d) It's a pragmatic response to an awkward problem.
PATCH is fine for patch or diff formats. Until then it's not very useful at all.
As for your solution 2 with a custom method, be it in the request or in the headers, no no no no and no, it's awful :)
Only two ways that are valid are either to PUT the whole resource, with the sub data modified, or POST to that resource, or PUT to a sub-resource.
It all depends on the granularity of your resources and the intended consequences on caching.
A bit late with an answer but I would consider using JSON Patch for scenarios like this.
At the core of it, it requires two copies of the resource (the original and the modified), and performs a diff on it. The outcome of the diff is an array of patch operations describing the difference.
An example of this:
[
{ "op": "replace", "path": "/baz", "value": "boo" },
{ "op": "add", "path": "/hello", "value": ["world"] },
{ "op": "remove", "path": "/foo" }
]
There are many client libraries that can do the hard lifting in generat

"context" on a drupal page

This question is a general one, and I've already posted a version of it here. I'm hoping, though, that I'll have a better chance of getting a response, and of being useful to more people, by asking in this forum.
Associating content together when it all loads on a drupal page is tricky business. In drupal, each page, no matter the site, is basically the same: you have main content in the middle (a view, a node, or multiple nodes), with blocks surrounding that central content. To make the blocks somehow aware of whats in the middle, (much less aware of each other) you either have to do some really fancy footwork in your own custom module, or you have to make "arguments" available in the URL.
I've been studying the spaces/context/features/purl suite of modules provided by developmentseed, and I've also looked into the Panels/Ctools modules made by Earl Miles (the guy who wrote views). While both provide tools to make my job easier, my understanding of each is that I'm still required to place "arguments" in the URL if I want the contents of my blocks defined by my "context" (I use that in the general sense, and not in the specific sense meant by either the context module, or the concept of context in Ctools).
Am I missing something, or is that where we're at with Drupal?
Finally, I should say in closing that I am aware of other modules that help with this kind of thing on a limited, case-by-case basis. The Views attach module and the Node reference views module, for instance, each take a stab at solving this issue for a very specific use case. They're both good modules, and there are others like them, but I'd really like to find a solution to this problem in general.
I guess I do not really understand what you're aiming at, but I'll try nonetheless:
For every non static website, be it based on Drupal or anything else, there are two basic things providing the 'context' for the decision on what content to deliver for given a request.
The first and most important thing is obviously the request itself. This is the only information that is always guaranteed to be there. In most cases, this will simply be a GET request, and with those, the URL is implicitly the main source of 'context' available. POST requests can provide a bit more 'context' besides the URL, but for your question, one could argue that they are just a more complicated variation of a GET request, providing some more 'arguments' besides the ones from the URL (and in most cases, one could turn a POST request into a GET request with a more elaborated URL anyways).
The second 'context providing' thing is the session. Whatever mechanism session handling is based on (predominantly cookies nowadays), the goal is always the same, namely to carry some 'state' information across the boundary of inherently stateless requests. It does this by tying a given request to information from previous requests, stored on the server side. This allows to 'enrich' the information that is available for the decision on what content to deliver for a request. Basically, one could look at it as a way of adding some more 'arguments' to the request.
And that's it. Any other information needed for assembling a response needs to somehow be derived from the information given in the request (and one could say that session handling is already the main process of doing so, by adding 'context' based on a cookie or some other identifier coming with the request).
Drupal reflects this process pretty well, IMHO, as it first assembles the 'main' content for a response based on the URL, with additional information (e.g. about the user) 'attached' in the session. It is only after the main content got assembled via calling $return = menu_execute_active_handler() in index.php, that the other elements of a response get added (e.g. blocks, menus, etc.), by calling theme('page', $return);.
So whatever 'context' it is that you want to 'pass' to those other elements, you either have to 'reextract' it from the information already used for assembling the main content (URL, session), or you have to store it temporarily during the generation of the main context. You can do this in many ways, e.g. by adding it to the information already stored in the session, by using static caching within some functions, by setting global variables (don't ;), by passing stuff through the database, etc...
So again, I do not seem to understand what you are aiming at. What is it that you are missing here?
Good answer from Henrik but I'd like to add that there can be quite a lot of information in the request beside maintaining state with cookies. Think important HTTP headers like accept or language or even X-REQUESTED-WITH. Most webframeworks wrap this information into one convenient datastructure. Unfortunately from the answers given I have to conclude that drupal doesn't.

Resources