Use of custom header fields in HTTP - http

I just stumbled across this question:
Is it still an acceptable thing to use custom header fields and if yes how should they be named?
As far as i knew custom fields start with an X-.. but according to RFC 6648 these are depricated since 2012 but I also haven't found any proposals for alternatives from them.
So would it make sense to name custom fields just "normal" (without an X-Prefix)? Does it makes sense at all to use custom fields for exchanging additional information?

Just name them normally. See, e.g., Custom HTTP headers : naming conventions
Whether it makes sense to use them depends on the application. They can certainly be used for things done in middleware such as authentication tokens, for example.
If there is something you really want checked by the server before it spins off endpoint-specific data processing code, the headers aren't a terrible place to put the relevant data.

Related

How to detect and possibly drop/sanitize http request parameters/headers to prevent XSS attacks

Recently, we found that some of our SpringMVC based site's pages, that accept query parameters, are susceptible to XSS attacks. For e.g. a url like http://www.our-site.com/page?s='-(console.log(document.cookie))-'&a=1&fx=326tTDE could result in the injected JS to be executed in the context of the rendered page. These pages are all GET-based, no POST requests are supported.
These parameters are written in the markup in numerous places, so doing an HTML encode (in all these places) would be tedious and require more code changes. In some cases, they are also written to cookies.
Ideally, we would want to detect them as early as possible, say inside a Servlet Filter/Spring Interceptor and then for each request parameter, decide if we want to drop it all together, or sanitize it in some way, before it's available to the rest of the application. We would want this decision to be configurable as well, so that the approach to handle a particular request parameter can be modified over time without significant code change.
Now, since these are request parameters that we want to potentially modify, we would probably have to use an approach similar to the one described here, if we go the Filter way. We would potentially want to sanitize HTTP Request Headers similarly too.
So, what would be the most flexible/minimum overhead way to handle this situation? Would ESAPI be able to both detect and sanitize them, in a configurable way? It's not clear from its API as to what is possible. We would definitely not want to hand-roll regexes to do this. Also, would a Filter be the right place to handle this?
Thanks.

REST design: what verb and resource name to use for a filtering service

I am developing a cleanup/filtering service that has a method that receives a list of objects serialized in xml, and apply some filtering rules to return a subset of those objects.
In a REST-ful service, what verb shall I use for such a method? I thought that GET is a natural choice, but I have to put the serialized XML in the body of the request which works but feels incorrect. The other verbs don't seem to fit semantically.
What is a good way to define that Service interface? Naming the resource /Cleanup or /Filter seems weird mainly because in the examples I see online, it is always a name rather than a verb being used for resource name.
Am I right to feel that REST services are better suited for CRUD operations and you start bending the rules in situations like this service? If yes, am I then making a wrong architectural choice.
I've pushed to develop this service in REST-ful style (as opposed to SOAP) for simplicity, but such awkward cases happen a lot and make me feel like I am missing something. Either choosing REST where it shouldn't be used or may be over-thinking some stuff that doesn't really matter? In that case, what really matters?
REST is about using HTTP the way it was designed. To be RESTful consider (title was REST design :):
URLs should be permalinks to a resource (caching benefits, storing/sharing endpoints etc...)
Because they are permalinks to a resource, having verbs in the URL is a hint that you're on the wrong path (filter is a verb).
A collection of resources can be an endpoint /foos.
If you want to filter the collection of resources, consider querystring params like ?filter= or something like ?ids=1,2,3,4,5.
A GET should not change resources. Note that 'cleanup' implies something getting deleted so be cautious of changes to resources when you do a GET. REST says a GET shouldn't alter resources. Imagine a caching server taking you're cleanup request as a GET and returning OK because t's cached. Caching servers know not to cache a POST, DELETE etc... (that's the way HTTP was designed).
Don't rule out multiple calls - for example, you may do a get to filter and get a set of resources to clean up and then could be followed by many or one DELETE verb calls to do the cleanup.
Sometimes there's a temporal resource like a transaction or a 'job' that could do work like a cleanup. Don't rule out a POST to the resource with the body containing items to cleanup up and it returns a job id. You can then query the jobid for the cleanup progress or status.
It's hard to give exact guidance because the question isn't clear but hopefully the RESTful principlies guidance and thoughts above set you on the right track. If you clarify the exact calls, I'll try and recommend APIs.
So, let's say you wanted to cleanup duplicate foos.
[GET] /foos/duplicates (or /foos?filter=duplicates)
returns a body with identifies to of foos that are duplicates. Let's say that returns 1,2,5 (could be names).
Then you could issue:
[DELETE] /foos with the body being an array containing 1,2,5 (or names if unique). the delete call is passive so even if the GET call is cached according to REST principles it's fine.
It's also possible and valid to not go the REST route such as POX or JOSN RPC over http but just realize at that point that it's not REST. And that's fine but you're not getting the benefits of REST described in fielding's thesis.
http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
Also, read this:
http://blog.steveklabnik.com/posts/2011-07-03-nobody-understands-rest-or-http
EDIT:
After reading the comment where you clarified you're sending the server a set of objects (not persisted server side) and it returns the subset with the dupes filtered out (like a server side helper function), some options are:
Do this client/browser side if possible - why take the network roundtrip to filter out dupes out of collection?
If for some reason only the server has specific knowledge/data to determine that two items are functional equivalent (even though data not exactly the same), then consider POSTing the data set to the server with the response body containing the unique/filtered set. Even though the server isn't persisting the set, it would fall into a 'temporal' object or set and the server is modifying it. It's not conceptually a GET of server resources and caching offers no benefits in that scenario.
Last question first: What really matters is getting the job done in a way that is
Correct
As easy to use as practical
Easily maintained by future programmers (likely to include yourself)
REST is a natural fit for operations on resources where each URL matches some object that can be manipulated. It is a less natural fit for other uses, but these are more guidelines than actual rules. Others have pointed out the original dissertation on REST, but it is worth remembering that few implementations are pure.
If you have several URLs that perform these transformative kinds of functions, consider putting them in their own special URL space, like /api/filter and /api/transliterate, etc.. That will help users and maintainers alike know that certain URLs aren't REST, but are more like remote procedure calls. Posting data to these URLs results in you getting some kind of data back.
If you get stuck on specific names you should make a list of candidates, have a few beers, then choose one from the list. That's what I do when I get stuck on minutia.
SOAP is a neat protocol and has its uses, but it tends to be very heavy. Good documentation and consistency are probably more important to your budding API than using any specific technology.

I am implementing partial updates over POST since I cannot use PATCH. Can I do it conditionally?

So the business requires us to implement Partial Updates.
HTTP PUT only caters to the case where the client sends across a complete representation of the resource.
Hence I decided to use the catch-all HTTP POST to implement the same. Question is, can I still safely take care of Conditional Updates using ETags and Last-Modifieds? Or does the Http Spec prevent me from doing so in any way?
a) Why do you think you can't use PATCH?
b) From the HTTP point of view, the conditional headers apply to all methods. However, there may be existing servers not getting this right, so Id be careful with relying on them. (see, for instance, http://trac.tools.ietf.org/wg/httpbis/trac/ticket/96)

REST - Modify Part of Resource - PUT or POST

I'm seeing a good bit of hand-waving on the subject of how to update only part of a resource (eg. status indicator) using REST.
The options seem to be:
Complain that HTTP doesn't have a PATCH or MODIFY command. However, the accepted answer on HTTP MODIFY verb for REST? does a good job of showing why that's not as good an idea as it might seem.
Use POST with parameters and identify a method (eg. a parameter named "action"). Some suggestions are to specify an X-HTTP-Method-Override header with a self-defined method name. That seems to lead to the ugliness of switching within the implementation based on what you're trying to do, and to be open to the criticism of not being a particularly RESTful way to use POST. In fact, taking this approach starts to feel like an RPC-type interface.
Use PUT to over-write a sub-resource of the resource which represents the specific attribute(s) to update. In fact, this is effectively an over-write of the sub-resource, which seems in line with the spirit of PUT.
At this point, I see #3 as the most reasonable option.
Is this a best practice or an anti-pattern? Are there other options?
There are two ways to view a status update.
Update to a thing. That's a PUT. Option 3
Adding an additional log entry to the history of the thing. The list item in this sequence of log entries is the current status. That's a POST. Option 2.
If you're a data warehousing or functional programming type, you tend to be mistrustful of status changes, and like to POST a new piece of historical fact to a static, immutable thing. This does require distinguishing the thing from the history of the thing; leading to two tables.
Otherwise, you don't mind an "update" to alter the status of a thing and you're happy with a PUT. This does not distinguish between the thing and it's history, and keeps everything in one table.
Personally, I'm finding that I'm less and less trustful of mutable objects and PUT's (except for "error correction"). (And even then, I think the old thing can be left in place and the new thing added with a reference to the previous version of itself.)
If there's a status change, I think there should be a status log or history and there should be a POST to add a new entry to that history. There may be some optimization to reflect the "current" status in the object to which this applies, but that's just behind-the-scenes optimization.
Option 3 (PUT to some separated sub-resource) is your best bet right now, and it wouldn't necessarily be "wrong" to just use POST on the main resource itself - although you could disagree with that depending on how pedantic you want to be about it.
Stick with 3 and use more granular sub-resources, and if you really do have a need for PATCH-like behavior - use POST. Personally, I will still use this approach even if PATCH does actually end up as a viable option.
HTTP does have a PATCH command. It is defined in Section 19.6.1.1 of RFC 2068, and was updated in draft-dusseault-http-patch-16, currently awaiting publication as RFC.
It's ok to POST & emulating PATCH where not available
Before explaining this, it's probably worth mentioning that there's nothing wrong with using POST to do general updates (see here) In particular:
POST only becomes an issue when it is used in a situation for which some other method is ideally suited: e.g., retrieval of information that should be a representation of some resource (GET), complete replacement of a representation (PUT)
Really we should be using PATCH to make small updates to complex resources but it isn't as widely available as we'd like. We can emulated PATCH by using an additional attribute as part of a POST.
Our service needs to be open to third-party products such as SAP, Flex, Silverlight, Excel etc. That means that we have to use the lowest common denominator technology - for a while we weren't able to use PUT because only GET and POST were supported across all the client technologies.
The approach that I've gone with is to have a "_method=patch" as part of a POST request. The benefits are;
(a) It's easy to deal with on the server side - we're basically pretending that PATCH is available
(b) It indicates to third-parties that we are not violating REST but working around a limitation with the browser. It's also consistent with how PUT was handled a few years back by the Rails community so should be comprehensible by many
(c) It's easy to replace when PATCH becomes more widely available
(d) It's a pragmatic response to an awkward problem.
PATCH is fine for patch or diff formats. Until then it's not very useful at all.
As for your solution 2 with a custom method, be it in the request or in the headers, no no no no and no, it's awful :)
Only two ways that are valid are either to PUT the whole resource, with the sub data modified, or POST to that resource, or PUT to a sub-resource.
It all depends on the granularity of your resources and the intended consequences on caching.
A bit late with an answer but I would consider using JSON Patch for scenarios like this.
At the core of it, it requires two copies of the resource (the original and the modified), and performs a diff on it. The outcome of the diff is an array of patch operations describing the difference.
An example of this:
[
{ "op": "replace", "path": "/baz", "value": "boo" },
{ "op": "add", "path": "/hello", "value": ["world"] },
{ "op": "remove", "path": "/foo" }
]
There are many client libraries that can do the hard lifting in generat

"context" on a drupal page

This question is a general one, and I've already posted a version of it here. I'm hoping, though, that I'll have a better chance of getting a response, and of being useful to more people, by asking in this forum.
Associating content together when it all loads on a drupal page is tricky business. In drupal, each page, no matter the site, is basically the same: you have main content in the middle (a view, a node, or multiple nodes), with blocks surrounding that central content. To make the blocks somehow aware of whats in the middle, (much less aware of each other) you either have to do some really fancy footwork in your own custom module, or you have to make "arguments" available in the URL.
I've been studying the spaces/context/features/purl suite of modules provided by developmentseed, and I've also looked into the Panels/Ctools modules made by Earl Miles (the guy who wrote views). While both provide tools to make my job easier, my understanding of each is that I'm still required to place "arguments" in the URL if I want the contents of my blocks defined by my "context" (I use that in the general sense, and not in the specific sense meant by either the context module, or the concept of context in Ctools).
Am I missing something, or is that where we're at with Drupal?
Finally, I should say in closing that I am aware of other modules that help with this kind of thing on a limited, case-by-case basis. The Views attach module and the Node reference views module, for instance, each take a stab at solving this issue for a very specific use case. They're both good modules, and there are others like them, but I'd really like to find a solution to this problem in general.
I guess I do not really understand what you're aiming at, but I'll try nonetheless:
For every non static website, be it based on Drupal or anything else, there are two basic things providing the 'context' for the decision on what content to deliver for given a request.
The first and most important thing is obviously the request itself. This is the only information that is always guaranteed to be there. In most cases, this will simply be a GET request, and with those, the URL is implicitly the main source of 'context' available. POST requests can provide a bit more 'context' besides the URL, but for your question, one could argue that they are just a more complicated variation of a GET request, providing some more 'arguments' besides the ones from the URL (and in most cases, one could turn a POST request into a GET request with a more elaborated URL anyways).
The second 'context providing' thing is the session. Whatever mechanism session handling is based on (predominantly cookies nowadays), the goal is always the same, namely to carry some 'state' information across the boundary of inherently stateless requests. It does this by tying a given request to information from previous requests, stored on the server side. This allows to 'enrich' the information that is available for the decision on what content to deliver for a request. Basically, one could look at it as a way of adding some more 'arguments' to the request.
And that's it. Any other information needed for assembling a response needs to somehow be derived from the information given in the request (and one could say that session handling is already the main process of doing so, by adding 'context' based on a cookie or some other identifier coming with the request).
Drupal reflects this process pretty well, IMHO, as it first assembles the 'main' content for a response based on the URL, with additional information (e.g. about the user) 'attached' in the session. It is only after the main content got assembled via calling $return = menu_execute_active_handler() in index.php, that the other elements of a response get added (e.g. blocks, menus, etc.), by calling theme('page', $return);.
So whatever 'context' it is that you want to 'pass' to those other elements, you either have to 'reextract' it from the information already used for assembling the main content (URL, session), or you have to store it temporarily during the generation of the main context. You can do this in many ways, e.g. by adding it to the information already stored in the session, by using static caching within some functions, by setting global variables (don't ;), by passing stuff through the database, etc...
So again, I do not seem to understand what you are aiming at. What is it that you are missing here?
Good answer from Henrik but I'd like to add that there can be quite a lot of information in the request beside maintaining state with cookies. Think important HTTP headers like accept or language or even X-REQUESTED-WITH. Most webframeworks wrap this information into one convenient datastructure. Unfortunately from the answers given I have to conclude that drupal doesn't.

Resources