I have a project where we are sending text payloads over HTTP that include special ASCII codes. Using URLEncode on those special ASCII codes works as expected.
The gotcha is that I'm being told to use the x-www-form-urlencoded content type, but to accept the raw body as if it weren't a form (no key/value pairs) and just urldecode the received content as a single payload.
Question: Is this reasonable? I haven't run across this before, and I'm checking real-world versus potentially 'spec-breaking' usage of x-www-form-urlencoded (i.e. the 'form' part doesn't matter and can be ignored).
TIA for the cross-check!
To answer your question: No, it's not reasonable. If the data is not in key/value format, then using x-www-form-urlencoded as the Content-Type doesn't make much sense (see the standard, and here is a good explanation).
What should you use instead?
It really depends on your data. From the standard (RFC2045):
Content-Type Header Field
The purpose of the Content-Type field is to describe the data
contained in the body fully enough that the receiving user agent can
pick an appropriate agent or mechanism to present the data to the
user, or otherwise deal with the data in an appropriate manner. The
value in this field is called a media type.
So if you really have only ASCII characters it is still of type text/something. If the raw data is specific to your application, you can invent your own media type (e.g. application/x-yourapp) or fall back to application/octet-stream.
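For illustration, here is a minimal Java sketch of that approach (the endpoint URL and payload are hypothetical, and it assumes the payload is plain ASCII text sent as one opaque string):

import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class RawPayloadClient {
    public static void main(String[] args) throws Exception {
        String payload = "line1\u0001line2\u0002end";   // text containing special ASCII codes

        // Percent-encode the whole payload as one opaque string (no key/value pairs).
        String encoded = URLEncoder.encode(payload, StandardCharsets.UTF_8);

        // Send it with a text media type instead of application/x-www-form-urlencoded.
        HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/ingest")) // hypothetical endpoint
                .header("Content-Type", "text/plain; charset=us-ascii")
                .POST(HttpRequest.BodyPublishers.ofString(encoded))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The receiver would simply urldecode the raw body as a single value;
        // here we just round-trip locally to show the encoding is lossless.
        String decoded = URLDecoder.decode(encoded, StandardCharsets.UTF_8);
        System.out.println(response.statusCode() + " / round-trip ok: " + decoded.equals(payload));
    }
}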
Related
I am looking for a way to send key-value pairs from a client to a server using HTTP (POST). Which of the two possibilities would you recommend, and why? Does the size of the respective URL or header field matter (what are the limits exactly; are they defined anywhere)? Thanks in advance.
When browsers send a POST from a standard HTML form, they are sending parameters and values that are effectively key-value pairs--as the body of the POST request. The standard format for this includes standard delimiters, e.g. object=triangle&chosen+action=flip, which requires escaping if your keys or values contain them. The standard HTTP libraries for extracting the parameters will expect this, and any limitations on lengths will be specific to those libraries and server specifications. However, it is perfectly acceptable to use any format for data in the body of the POST request. The conventions are actually more specific to the HTML standard.
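As a small sketch (Java, with hypothetical field names), producing that standard format is just a matter of percent-encoding each key and value before joining them with the usual delimiters:

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormBody {
    // Percent-encode each key and value so literal '=' and '&' in the data
    // cannot be confused with the delimiters of the format.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("object", "triangle");
        fields.put("chosen action", "flip");     // the space becomes '+'
        System.out.println(encode(fields));      // object=triangle&chosen+action=flip
    }
}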
In a current project the UI posts an ordered list of ids of several files under one key to tell the server in which order the files need to be processed:
file[]=18&file[]=20&...
So far the order is preserved when handing this over from client to server, but I could find no specification of whether the HTTP protocol keeps the parameters in the given order. So the question is: is it safe to depend on the given order, or should I implement a workaround that assigns each file id an explicit position? E.g.
file_18=0&file_20=1&...
Edit:
jQuery UI has a serialize method that passes the parameters in exactly the initial way I described above:
foo_1, foo_5, foo_2 will serialize to foo[]=1&foo[]=5&foo[]=2
This is for a sortable list, so I assume they know what they are doing.
It depends on the server. The bytes of the request arrive in order (TCP guarantees that), so if the HTTP parser reads them in that order and stores the parameters in the same sequence, nothing along the way will reorder them - don't worry.
HTTP doesn't specify the format of GET and POST data. So they just get passed as blobs of data.
It is up to your form data parser to maintain the order (I'm not aware of any that don't, for identically named fields).
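If you'd rather not depend on the parser preserving order at all, a rough sketch of the explicit-index workaround from the question could look like this (servlet API with hypothetical parameter names; newer containers use jakarta.servlet instead of javax.servlet):

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class FileOrderServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // Collect parameters of the form file_<id>=<position> and sort by the
        // explicit position instead of trusting the order of the request body.
        List<int[]> files = new ArrayList<>();          // each entry is {id, position}
        for (String name : Collections.list(req.getParameterNames())) {
            if (name.startsWith("file_")) {
                int id = Integer.parseInt(name.substring("file_".length()));
                int position = Integer.parseInt(req.getParameter(name));
                files.add(new int[] {id, position});
            }
        }
        files.sort(Comparator.comparingInt(f -> f[1]));

        StringBuilder order = new StringBuilder();
        for (int[] f : files) order.append(f[0]).append(' ');
        resp.getWriter().println("processing order: " + order);
    }
}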
I was reading about ASP.NET Script Exploits, and one of the suggestions (emphasis is mine; it is #3 in the section "Guarding Against Scripting Exploits" on the web page) is:
If you want your application to accept some HTML (for example, some formatting instructions from users), you should encode the HTML at the client before it is submitted to the server. For more information, see How to: Protect Against Script Exploits in a Web Application by Applying HTML Encoding to Strings.
Isn't that really bad advice? I mean, an exploiter could send the HTML via curl or something similar, and the HTML would then be sent un-encoded to the server, which can't be good(?)
Am I missing something here or mis-interpreting the statement?
Microsoft is not wrong in that sentence, but it is far from complete, and on its own it is dangerous.
Since validateRequest == true by default, you do indeed need to encode special HTML characters on the client in order for them to get to the server in the first place, past validateRequest.
But - they should have emphasized that this is certainly not a replacement for server side filtering and validation.
Specifically, if you must accept HTML, the strongest advice is to use whitelisting instead of blacklist filtering (i.e. allow only very specific HTML tags and strip all the others). Use of the Microsoft AntiXSS library is highly recommended for strong user input filtering. It's far better than re-inventing the wheel yourself.
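AntiXSS is .NET-specific; as a language-neutral illustration of the output-encoding half of that advice, here is a minimal Java sketch (not a substitute for a vetted library such as the OWASP Java Encoder):

public final class HtmlEncode {
    // Encode the five characters that matter in HTML contexts so user input
    // is displayed as text instead of being interpreted as markup.
    public static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("<script>alert('xss')</script>"));
        // prints: &lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;
    }
}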
I don't think that advice is good...
From my experience I would totally agree with your thought and replace that advice with the following:
all input has to be checked server-side first thing on arrival
all input that can possibly contain "active content" (like HTML, JavaScript...) has to be escaped on arrival and never be sent to any client until full sanitization has taken place
I would never trust the client to send trusted data. As you stated there are simply too many ways that data can be submitted. Even non-malicious users may be able to bypass the system on the client if they have JavaScript disabled.
However, the page linked from that item makes it clear what they mean by point 3:
You can help protect against script exploits in the following ways:
Perform parameter validation on form variables, query-string
variables, and cookie values. This validation should include two types
of verification: verification that the variables can be converted to
the expected type (for example, convert to an integer, convert to
date-time, and so on), and verification of expected ranges or
formatting. For example, a form post variable that is intended to be
an integer should be checked with the Int32.TryParse method to verify
the variable really is an integer. Furthermore, the resulting integer
should be checked to verify the value falls within an expected range
of values.
Apply HTML encoding to string output when writing values back out
to the response. This helps ensure that any user-supplied string input
will be rendered as static text in the browsers instead of executable
script code or interpreted HTML elements.
HTML encoding converts HTML elements using HTML–reserved characters so
that they are displayed rather than executed.
I think this is just a case of a misplaced word, because there is no way you can perform this level of validation on the client, and the examples on the linked page clearly present server-side code without any mention of the client.
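As a rough illustration, a Java analogue of the Int32.TryParse-plus-range-check advice quoted above might look like this (the parameter name and range are hypothetical):

import java.util.OptionalInt;

public class ParamValidation {
    // Java has no TryParse, so catch NumberFormatException, then verify the
    // value falls within the expected range (here, a hypothetical 1..100).
    static OptionalInt parseQuantity(String raw) {
        try {
            int value = Integer.parseInt(raw.trim());
            return (value >= 1 && value <= 100) ? OptionalInt.of(value) : OptionalInt.empty();
        } catch (NumberFormatException | NullPointerException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseQuantity("42"));        // OptionalInt[42]
        System.out.println(parseQuantity("42; DROP"));  // OptionalInt.empty
        System.out.println(parseQuantity("9999"));      // OptionalInt.empty (out of range)
    }
}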
Edit:
You also have request validation enabled by default, right? So as far as Microsoft is concerned, the focus of protection is clearly on the server.
I think the author of the article misspoke. If you go to the linked web page, it talks about encoding data before it's sent back to the client, not the other way around. I think this is just an editing error and the author intended to say the opposite: encode it before it's returned to the client.
Below is the statement written at http://hateinterview.com/java-script/methods-get-vs-post-in-html-forms/1854.html
By specification, GET is used basically for retrieving data where as POST is used for data storing, data updating, ordering a product or even e-mailing
Whenever I have used the GET or POST method, I have read the parameters with the getParameter() method of the HTTP request. What I don't understand in the above statement is how the POST method is used for data storing or data updating in a way that can't be achieved with the GET method. I'm looking for a very brief example.
EDIT: Thanks all for your answers, but I am specifically looking for the meaning of data storing and data updating with the POST method, apart from the file upload stuff.
GET can in theory also store and update data, but it's simply not safe. It's too easy to accidentally store or update data just by bookmarking a URL, following a link, or being indexed by a search bot. POST requests are not bookmarkable/linkable and are not indexed by search bots. Also, the GET query string has a limited length; the safe limit is 255 characters. The POST request body, however, can be as large as 2GB. Also, uploading files is not possible with GET.
One difference is that the GET data (in the URL, as another answer says) show up at a *nix server as the contents of the environment variable QUERY_STRING, whereas POST data appear on stdin. Regardless of how they are packaged and sent, GET and POST data are identical in format, in my experience.
@Mohit edited his question to add: "Thanks all for your answers, but I am specifically looking for the meaning of data storing and data updating with the POST method, apart from the file upload stuff."
Read rfc2616, Hypertext Transfer Protocol -- HTTP/1.1, specifically Sections 9.3 GET and 9.5 POST:
"The GET method means retrieve ... information."
"The POST method is used to request that the origin server accept" information.
To be strictly compliant with rfc2616, use the GET method to read data from the server. Use the POST method to write data to the server.
The "meaning of data storing, data updating" is exactly that. How could this possibly be any clearer or more explicit?
POST sends the data in the body while GET puts the data in the URL...
For example, to upload a file you use POST... since GET puts the data into the URL, the data is visible to the user and the length is limited.
see for example http://www.cs.tut.fi/~jkorpela/forms/methods.html
There are things you cannot do with GET! The first of them: with POST you can upload files!
See this: Article or this one
Short version of the question:
Does "GET" at a particular URI need to match what was "PUT" to that URI?
I think not. Here's why:
Given that a resource is an abstract thing that is theoretically unknowable by the client, when we do a PUT, we must be only sending a representation. Based on combing over RFC2616, it doesn't seem entirely specified as to what that means for a resource that has many (potentially infinite?) representations, but here are my thoughts; please tell me if you agree:
My expectation is that if I PUT a representation to a resource, all other representations of the resource at that URI should be kept consistent (potentially updated) as necessary. In other words, you're telling the resource "use this representation to redefine yourself".
Thus, I should be able to do this:
PUT /resources/foo/myvacation
Content-type: image/jpg
...
And follow up with this:
GET /resources/foo/myvacation
Accept: image/png
...
and get the updated version of myvacation in a different format (assuming the server knows how to do that). Extrapolating from that, this composite atomic "image + metadata" PUT should also be legal:
PUT /resources/foo/myvacation
Content-type: multipart/form-data
Content-disposition: form-data; name="document"
Content-type: image/jpg
[..]
Content-disposition: form-data; name="iptc"
Content-type: application/iptc
[..]
Content-disposition: form-data; name="exif"
Content-type: application/exif
[..]
And then, because server-side content negotiation (RFC2616 section 12.1) can take place based on just about anything, we can default to the "document" content for this:
GET /resources/foo/myvacation
Content-type: image/jpg
[..]
or, if you believe as I do that RFC 2396 section 3.4 ("The query component is a string of information to be interpreted by the resource.") means that a URI with a query string refers to the same resource as the URI without the query string (and is isomorphic with just sending application/x-www-form-urlencoded data to the resource), then this should also be legal:
GET /resources/foo/myvacation?content=exif
Content-type: application/exif
[..]
The description of PUT says:
The PUT method requests that the enclosed entity be stored under the supplied Request-URI.
To me, this is fairly anti-REST, unless you read it in a very liberal manner. My interpretation is "The PUT method requests that a resource be created or updated at the supplied Request-URI based on the representation of the enclosed entity."
Later on, we get:
The fundamental difference between the POST and PUT requests is reflected in the different meaning of the Request-URI. The URI in a POST request identifies the resource that will handle the enclosed entity. That resource might be a data-accepting process, a gateway to some other protocol, or a separate entity that accepts annotations. In contrast, the URI in a PUT request identifies the entity enclosed with the request -- the user agent knows what URI is intended and the server MUST NOT attempt to apply the request to some other resource.
We need to similarly read this creatively, but the key bits here are "knows what URI is intended" and "apply the request".
So, to me the representation returned by GET at a given URI does not necessarily have to be the same representation that was PUT to that URI; it just has to be consistent.
True or false?
Based on the fact that content negotiation can return different representations from the same URI, I am quite sure that what you PUT does not have to be the same as what you retrieve.
Your assumptions are correct. The GET doesn't necessarily have to return the same representation as what you PUT, but it does have to be the same resource.
I'm currently working on a web application that will return any resource as XHTML, JSON, or a custom XML dialect, depending on what you ask for in the content negotiation headers. So a browser will see the HTML by default. Other HTTP clients, including XMLHttpRequest, can get the JSON, and so on. They are all representations of the same resource at the same URI.
Likewise, our application will accept a PUT or POST in any of the supported formats (subject to the semantics of the particular resource or collection in question).
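A stripped-down sketch of that kind of dispatch (servlet API, a hypothetical resource, only two formats, and without proper q-value parsing) might look like:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class VacationPhotoMetaServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // One resource, several representations: pick one based on the Accept header.
        // A real implementation would parse media ranges and q-values properly.
        String accept = req.getHeader("Accept");
        if (accept != null && accept.contains("application/json")) {
            resp.setContentType("application/json");
            resp.getWriter().println("{\"title\":\"myvacation\",\"format\":\"jpeg\"}");
        } else {
            resp.setContentType("text/html");
            resp.getWriter().println("<h1>myvacation</h1><p>format: jpeg</p>");
        }
    }
}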
I agree with the other answers that the resource you PUT is not required to be the same as the one you later GET. I wanted to add some of my experience in this area.
You need to be very careful when relying on content negotiation. It is very tricky to get right, and getting it wrong leads to nasty user problems. Let's do an example based on images...
If Alice PUTs an image in a raw format, then Bob can GET the image as a JPEG (through a server-side raw-to-JPEG transform), and Alice can GET the image in the raw format; no problems. However, if Bob PUTs a JPEG, then there is no way to get back to the raw format for Alice. For vacation photos the lack of symmetric transforms may not be a big deal, but for medical images it would be.
Another area where the lack of symmetric transforms bites you is in representations where one has a schema and the other does not. In this case, on the server side you end up making conventions for how to transform between them. But you get into big problems when you are dealing with documents whose schemas change over time and are out of your control. Every time the schema changes you have to update all of your transforms for the new schema shape, while still supporting resources using the old schema. Content negotiation quickly becomes more trouble than it's worth except in a few limited circumstances. One area where it can be manageable is if you fully control the resource representation and its underlying schema. Another is if the resource formats are standards and it's possible to make symmetric transformations between the different formats.
If you are transforming then it would make sense that what you PUT is not what you GET, so I don't see why it is a problem.
But if you PUT a user with certain information, then a GET should retrieve that person. Likewise, when I PUT my 4th vacation photo, a GET should give me that photo; it may be transformed to a different format or have some other transform applied, but if I get the 5th photo instead, then that is a problem.