"context" on a drupal page - drupal

This question is a general one, and I've already posted a version of it here. I'm hoping, though, that I'll have a better chance of getting a response, and of being useful to more people, by asking in this forum.
Associating content together when it all loads on a drupal page is tricky business. In drupal, each page, no matter the site, is basically the same: you have main content in the middle (a view, a node, or multiple nodes), with blocks surrounding that central content. To make the blocks somehow aware of whats in the middle, (much less aware of each other) you either have to do some really fancy footwork in your own custom module, or you have to make "arguments" available in the URL.
I've been studying the spaces/context/features/purl suite of modules provided by developmentseed, and I've also looked into the Panels/Ctools modules made by Earl Miles (the guy who wrote views). While both provide tools to make my job easier, my understanding of each is that I'm still required to place "arguments" in the URL if I want the contents of my blocks defined by my "context" (I use that in the general sense, and not in the specific sense meant by either the context module, or the concept of context in Ctools).
Am I missing something, or is that where we're at with Drupal?
Finally, I should say in closing that I am aware of other modules that help with this kind of thing on a limited, case-by-case basis. The Views attach module and the Node reference views module, for instance, each take a stab at solving this issue for a very specific use case. They're both good modules, and there are others like them, but I'd really like to find a solution to this problem in general.

I guess I do not really understand what you're aiming at, but I'll try nonetheless:
For every non static website, be it based on Drupal or anything else, there are two basic things providing the 'context' for the decision on what content to deliver for given a request.
The first and most important thing is obviously the request itself. This is the only information that is always guaranteed to be there. In most cases, this will simply be a GET request, and with those, the URL is implicitly the main source of 'context' available. POST requests can provide a bit more 'context' besides the URL, but for your question, one could argue that they are just a more complicated variation of a GET request, providing some more 'arguments' besides the ones from the URL (and in most cases, one could turn a POST request into a GET request with a more elaborated URL anyways).
The second 'context providing' thing is the session. Whatever mechanism session handling is based on (predominantly cookies nowadays), the goal is always the same, namely to carry some 'state' information across the boundary of inherently stateless requests. It does this by tying a given request to information from previous requests, stored on the server side. This allows to 'enrich' the information that is available for the decision on what content to deliver for a request. Basically, one could look at it as a way of adding some more 'arguments' to the request.
And that's it. Any other information needed for assembling a response needs to somehow be derived from the information given in the request (and one could say that session handling is already the main process of doing so, by adding 'context' based on a cookie or some other identifier coming with the request).
Drupal reflects this process pretty well, IMHO, as it first assembles the 'main' content for a response based on the URL, with additional information (e.g. about the user) 'attached' in the session. It is only after the main content got assembled via calling $return = menu_execute_active_handler() in index.php, that the other elements of a response get added (e.g. blocks, menus, etc.), by calling theme('page', $return);.
So whatever 'context' it is that you want to 'pass' to those other elements, you either have to 'reextract' it from the information already used for assembling the main content (URL, session), or you have to store it temporarily during the generation of the main context. You can do this in many ways, e.g. by adding it to the information already stored in the session, by using static caching within some functions, by setting global variables (don't ;), by passing stuff through the database, etc...
So again, I do not seem to understand what you are aiming at. What is it that you are missing here?

Good answer from Henrik but I'd like to add that there can be quite a lot of information in the request beside maintaining state with cookies. Think important HTTP headers like accept or language or even X-REQUESTED-WITH. Most webframeworks wrap this information into one convenient datastructure. Unfortunately from the answers given I have to conclude that drupal doesn't.

Related

How to detect and possibly drop/sanitize http request parameters/headers to prevent XSS attacks

Recently, we found that some of our SpringMVC based site's pages, that accept query parameters, are susceptible to XSS attacks. For e.g. a url like http://www.our-site.com/page?s='-(console.log(document.cookie))-'&a=1&fx=326tTDE could result in the injected JS to be executed in the context of the rendered page. These pages are all GET-based, no POST requests are supported.
These parameters are written in the markup in numerous places, so doing an HTML encode (in all these places) would be tedious and require more code changes. In some cases, they are also written to cookies.
Ideally, we would want to detect them as early as possible, say inside a Servlet Filter/Spring Interceptor and then for each request parameter, decide if we want to drop it all together, or sanitize it in some way, before it's available to the rest of the application. We would want this decision to be configurable as well, so that the approach to handle a particular request parameter can be modified over time without significant code change.
Now, since these are request parameters that we want to potentially modify, we would probably have to use an approach similar to the one described here, if we go the Filter way. We would potentially want to sanitize HTTP Request Headers similarly too.
So, what would be the most flexible/minimum overhead way to handle this situation? Would ESAPI be able to both detect and sanitize them, in a configurable way? It's not clear from its API as to what is possible. We would definitely not want to hand-roll regexes to do this. Also, would a Filter be the right place to handle this?
Thanks.

REST design: what verb and resource name to use for a filtering service

I am developing a cleanup/filtering service that has a method that receives a list of objects serialized in xml, and apply some filtering rules to return a subset of those objects.
In a REST-ful service, what verb shall I use for such a method? I thought that GET is a natural choice, but I have to put the serialized XML in the body of the request which works but feels incorrect. The other verbs don't seem to fit semantically.
What is a good way to define that Service interface? Naming the resource /Cleanup or /Filter seems weird mainly because in the examples I see online, it is always a name rather than a verb being used for resource name.
Am I right to feel that REST services are better suited for CRUD operations and you start bending the rules in situations like this service? If yes, am I then making a wrong architectural choice.
I've pushed to develop this service in REST-ful style (as opposed to SOAP) for simplicity, but such awkward cases happen a lot and make me feel like I am missing something. Either choosing REST where it shouldn't be used or may be over-thinking some stuff that doesn't really matter? In that case, what really matters?
REST is about using HTTP the way it was designed. To be RESTful consider (title was REST design :):
URLs should be permalinks to a resource (caching benefits, storing/sharing endpoints etc...)
Because they are permalinks to a resource, having verbs in the URL is a hint that you're on the wrong path (filter is a verb).
A collection of resources can be an endpoint /foos.
If you want to filter the collection of resources, consider querystring params like ?filter= or something like ?ids=1,2,3,4,5.
A GET should not change resources. Note that 'cleanup' implies something getting deleted so be cautious of changes to resources when you do a GET. REST says a GET shouldn't alter resources. Imagine a caching server taking you're cleanup request as a GET and returning OK because t's cached. Caching servers know not to cache a POST, DELETE etc... (that's the way HTTP was designed).
Don't rule out multiple calls - for example, you may do a get to filter and get a set of resources to clean up and then could be followed by many or one DELETE verb calls to do the cleanup.
Sometimes there's a temporal resource like a transaction or a 'job' that could do work like a cleanup. Don't rule out a POST to the resource with the body containing items to cleanup up and it returns a job id. You can then query the jobid for the cleanup progress or status.
It's hard to give exact guidance because the question isn't clear but hopefully the RESTful principlies guidance and thoughts above set you on the right track. If you clarify the exact calls, I'll try and recommend APIs.
So, let's say you wanted to cleanup duplicate foos.
[GET] /foos/duplicates (or /foos?filter=duplicates)
returns a body with identifies to of foos that are duplicates. Let's say that returns 1,2,5 (could be names).
Then you could issue:
[DELETE] /foos with the body being an array containing 1,2,5 (or names if unique). the delete call is passive so even if the GET call is cached according to REST principles it's fine.
It's also possible and valid to not go the REST route such as POX or JOSN RPC over http but just realize at that point that it's not REST. And that's fine but you're not getting the benefits of REST described in fielding's thesis.
http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
Also, read this:
http://blog.steveklabnik.com/posts/2011-07-03-nobody-understands-rest-or-http
EDIT:
After reading the comment where you clarified you're sending the server a set of objects (not persisted server side) and it returns the subset with the dupes filtered out (like a server side helper function), some options are:
Do this client/browser side if possible - why take the network roundtrip to filter out dupes out of collection?
If for some reason only the server has specific knowledge/data to determine that two items are functional equivalent (even though data not exactly the same), then consider POSTing the data set to the server with the response body containing the unique/filtered set. Even though the server isn't persisting the set, it would fall into a 'temporal' object or set and the server is modifying it. It's not conceptually a GET of server resources and caching offers no benefits in that scenario.
Last question first: What really matters is getting the job done in a way that is
Correct
As easy to use as practical
Easily maintained by future programmers (likely to include yourself)
REST is a natural fit for operations on resources where each URL matches some object that can be manipulated. It is a less natural fit for other uses, but these are more guidelines than actual rules. Others have pointed out the original dissertation on REST, but it is worth remembering that few implementations are pure.
If you have several URLs that perform these transformative kinds of functions, consider putting them in their own special URL space, like /api/filter and /api/transliterate, etc.. That will help users and maintainers alike know that certain URLs aren't REST, but are more like remote procedure calls. Posting data to these URLs results in you getting some kind of data back.
If you get stuck on specific names you should make a list of candidates, have a few beers, then choose one from the list. That's what I do when I get stuck on minutia.
SOAP is a neat protocol and has its uses, but it tends to be very heavy. Good documentation and consistency are probably more important to your budding API than using any specific technology.

Any justification for an IT policy that query parameters should not be used?

My company, which builds ad server, affiliate network, contact form and CRM software was acquired last year, and we are now in the process of reworking our technology to fit the IT policies and guidelines of the parent corporation.
One of these policies is a tremendous sticking point and causing all sorts of problems for us:
No query parameters are to be used in any URL visible to the end user
This includes content URLs, ad clickthrough targets, redirects, anything which will either show up in the address bar or in a mouseover status bar update. The effect would be no affiliate ID parameters, media source tracking IDs, session IDs, CMS content selection parameters, anything. Several fundamental functions of our software simply can't be accomplished without passing parameter data from one page to another. In our case, many of these links are from different sites or subdomains, it's not possible to pass data via cookies, either
The only justification I've been given is that query parameters prevent some proxy caches from working properly. This makes no sense to me--I've never heard of such a thing--and nobody is willing or interested in discussing it at length. I've not even been given an example of what specifically is broken or why the policy was created.
In any case, this being a global corporate IT policy, in the end the reasoning doesn't matter, only compliance. Although getting it changed is most likely out of the question, I would still like to understand what valid concerns may have prompted its institution. Understanding the mindset may be a first step towards finding a workaround.
My first thought for a workaround was to embed parameters within the path portion of the URL and extract them with an Apache mod_rewrite, but this is out of the question because:
Corollary: Every URL must present unique content available through no other URL
So making multiple URLs which actually refer to the same page but contain other parameter data in the URL is also unacceptable.
Questions:
Is there valid justification for not using query parameters?
Specifically what proxies or systems fail to work when query parameters are present?
Does it possibly have something to do with SEO? The corollary makes it appear so.
What workarounds might there be for passing data from one site to another under this restriction?
i only have answer for the "workaround" question: use PATH_INFO.
edit to be more specific
instead of /banner.php?what=ever&any=thing use /banner.php/what=ever/any=thing. apache will still serve the request through /banner.php, and /what=ever/any=thing will be present in $_SERVER['PATH_INFO']. you'll have to rawurldecode and explode the string yourself since the webserver won't do that for you, but that's no big deal.

REST - Modify Part of Resource - PUT or POST

I'm seeing a good bit of hand-waving on the subject of how to update only part of a resource (eg. status indicator) using REST.
The options seem to be:
Complain that HTTP doesn't have a PATCH or MODIFY command. However, the accepted answer on HTTP MODIFY verb for REST? does a good job of showing why that's not as good an idea as it might seem.
Use POST with parameters and identify a method (eg. a parameter named "action"). Some suggestions are to specify an X-HTTP-Method-Override header with a self-defined method name. That seems to lead to the ugliness of switching within the implementation based on what you're trying to do, and to be open to the criticism of not being a particularly RESTful way to use POST. In fact, taking this approach starts to feel like an RPC-type interface.
Use PUT to over-write a sub-resource of the resource which represents the specific attribute(s) to update. In fact, this is effectively an over-write of the sub-resource, which seems in line with the spirit of PUT.
At this point, I see #3 as the most reasonable option.
Is this a best practice or an anti-pattern? Are there other options?
There are two ways to view a status update.
Update to a thing. That's a PUT. Option 3
Adding an additional log entry to the history of the thing. The list item in this sequence of log entries is the current status. That's a POST. Option 2.
If you're a data warehousing or functional programming type, you tend to be mistrustful of status changes, and like to POST a new piece of historical fact to a static, immutable thing. This does require distinguishing the thing from the history of the thing; leading to two tables.
Otherwise, you don't mind an "update" to alter the status of a thing and you're happy with a PUT. This does not distinguish between the thing and it's history, and keeps everything in one table.
Personally, I'm finding that I'm less and less trustful of mutable objects and PUT's (except for "error correction"). (And even then, I think the old thing can be left in place and the new thing added with a reference to the previous version of itself.)
If there's a status change, I think there should be a status log or history and there should be a POST to add a new entry to that history. There may be some optimization to reflect the "current" status in the object to which this applies, but that's just behind-the-scenes optimization.
Option 3 (PUT to some separated sub-resource) is your best bet right now, and it wouldn't necessarily be "wrong" to just use POST on the main resource itself - although you could disagree with that depending on how pedantic you want to be about it.
Stick with 3 and use more granular sub-resources, and if you really do have a need for PATCH-like behavior - use POST. Personally, I will still use this approach even if PATCH does actually end up as a viable option.
HTTP does have a PATCH command. It is defined in Section 19.6.1.1 of RFC 2068, and was updated in draft-dusseault-http-patch-16, currently awaiting publication as RFC.
It's ok to POST & emulating PATCH where not available
Before explaining this, it's probably worth mentioning that there's nothing wrong with using POST to do general updates (see here) In particular:
POST only becomes an issue when it is used in a situation for which some other method is ideally suited: e.g., retrieval of information that should be a representation of some resource (GET), complete replacement of a representation (PUT)
Really we should be using PATCH to make small updates to complex resources but it isn't as widely available as we'd like. We can emulated PATCH by using an additional attribute as part of a POST.
Our service needs to be open to third-party products such as SAP, Flex, Silverlight, Excel etc. That means that we have to use the lowest common denominator technology - for a while we weren't able to use PUT because only GET and POST were supported across all the client technologies.
The approach that I've gone with is to have a "_method=patch" as part of a POST request. The benefits are;
(a) It's easy to deal with on the server side - we're basically pretending that PATCH is available
(b) It indicates to third-parties that we are not violating REST but working around a limitation with the browser. It's also consistent with how PUT was handled a few years back by the Rails community so should be comprehensible by many
(c) It's easy to replace when PATCH becomes more widely available
(d) It's a pragmatic response to an awkward problem.
PATCH is fine for patch or diff formats. Until then it's not very useful at all.
As for your solution 2 with a custom method, be it in the request or in the headers, no no no no and no, it's awful :)
Only two ways that are valid are either to PUT the whole resource, with the sub data modified, or POST to that resource, or PUT to a sub-resource.
It all depends on the granularity of your resources and the intended consequences on caching.
A bit late with an answer but I would consider using JSON Patch for scenarios like this.
At the core of it, it requires two copies of the resource (the original and the modified), and performs a diff on it. The outcome of the diff is an array of patch operations describing the difference.
An example of this:
[
{ "op": "replace", "path": "/baz", "value": "boo" },
{ "op": "add", "path": "/hello", "value": ["world"] },
{ "op": "remove", "path": "/foo" }
]
There are many client libraries that can do the hard lifting in generat

So why should we use POST instead of GET for posting data? [duplicate]

This question already has answers here:
Closed 13 years ago.
Possible Duplicates:
How should I choose between GET and POST methods in HTML forms?
When do you use POST and when do you use GET?
Obviously, you should. But apart from doing so to fulfil the HTTP protocol, are there any reasons to do so? Less overhead? Some kind of security thing?
because GET must not alter the state of the server by definition.
see RFC2616 9.1.1 Safe Methods:
9.1.1 Safe Methods
Implementors should be aware that the
software represents the user in their
interactions over the Internet, and
should be careful to allow the user to
be aware of any actions they might
take which may have an unexpected
significance to themselves or others.
In particular, the convention has been
established that the GET and HEAD
methods SHOULD NOT have the
significance of taking an action other
than retrieval. These methods ought to
be considered "safe". This allows user
agents to represent other methods,
such as POST, PUT and DELETE, in a
special way, so that the user is made
aware of the fact that a possibly
unsafe action is being requested.
If you use GET to alter the state of the server then a search engine bot or some link prefetching extension in a web browser can wreak havoc on your site and (for example) delete all user data just by following links to your site.
There is a nice paper by the W3C about this: URIs, Addressability, and the use of HTTP GET and POST.
1.3 Quick Checklist for Choosing HTTP GET or POST
Use GET if:
The interaction is more like a question (i.e., it is a safe operation such as a query, read operation, or lookup).
Use POST if:
The interaction is more like an order, or
The interaction changes the state of the resource in a way that the user would perceive (e.g., a subscription to a service), or
The user be held accountable for the results of the interaction
Because, if you use GET to alter state, Google can delete your stuff.
When do you use POST and when do you use GET?
How should I choose between GET and POST methods in HTML forms?
If you accept GETs to perform write operations then a malicious hacker could inject somewhere links to perform an unauthorized operation. Your user clicks on a link - and something is deleted from a database. Or maybe some amount of money is transferred away from the user's account if he's still logged in to their online banking.
http://superbank.com/TransferMoney?amount=1000&recipient=2342524
Send a malicious email with an embedded image referencing this link, and as soon as the document is opened, something funny has happened behind the scenes.
GET is limited by the length of URL the browser/server can handle. This used to be as short as 256 characters.
There is atleast one situation where you want a GET to change data on the server. That is when a GET returns data, and you need to record which data was given to a user and when it was given.
If you use complex data types then it must be in a POST it cannot be in a GET. For example testing a WCF web service in a browser can only be done when the contract uses simple data types.
Using GET and POST where it is expected helps to keep your program understandable.
When you use POST, you can see the information being "posted" in the address-bar of the web browser. This is [apparently] not the case when you use the GET method.
This article was somewhere on http://www.w3schools.com/ Once I've found the exact page it was on, I'll repost. :-)

Resources