Nginx fails with a multipart request with a custom boundary containing (CRLF), while according to the RFC it's a perfectly valid payload.
Example payload.
MIME-Version: 1.0
Content-Type: multipart/form-data;
boundary=------%^TestBoundary^%------
with multiple files.
At first the special characters in the header were causing the boundary to be dropped before the request was passed to the backend, so I added ignore_invalid_headers off. Now I see the Content-Type header passed to the backend, but with a spurious ":" appended to it.
multipart/form-data; boundary=------%^TestBoundary^%------:
Any clue what's causing this? How do I fix it in nginx before passing the request to the backend?
This is a known bug that is most likely your problem, where the "boundary" part is appended to the Content-Type string using CRLF plus a TAB:
https://forum.nginx.org/read.php?29,192093,192102#msg-192102
If you cannot fix this in your code then you would need to use HAProxy or something similar to fix this by replacing the CRLF + TAB with LWS (see the "Important note" at the bottom of section 1.2.2 in this https://www.haproxy.org/download/1.7/doc/configuration.txt)
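To make the suggested fix concrete, here is a minimal Python sketch of the transformation itself: replacing a folded line break (CRLF followed by a TAB or spaces) inside a header value with a single space. It is not nginx or HAProxy configuration; the function name unfold_header_value is made up for illustration, and the sample value is assembled from the header in the question.

```python
import re

# A line break followed by spaces or tabs: the obsolete "folded" header form.
_FOLD = re.compile(r"\r?\n[ \t]+")


def unfold_header_value(value: str) -> str:
    """Replace each folded line break (CRLF + whitespace) with one space."""
    return _FOLD.sub(" ", value).strip()


folded = "multipart/form-data;\r\n\tboundary=------%^TestBoundary^%------"
print(unfold_header_value(folded))
```

Whether this is done in the client code or in a proxy in front of nginx, the goal is the same: the Content-Type header reaches nginx on a single line.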
I'm working on an upload control for ASP.NET, and I need to work with Request.GetBufferlessInputStream()
This returns the raw unprocessed request stream. Is there a built-in way to parse the content of this stream, stripping out headers such as the example I've copied below?
If not, what is the best approach to parsing the file?
-----------------------------13166267887793
Content-Disposition: form-data; name="uploadFile"; filename="ABigFile.txt"
Content-Type: text/plain
The solution came from inspecting how it's done in System.Web, and adapting to my needs. The core functionality is found in an internal class called HttpMultipartContentTemplateParser. The system for processing the input stream is written up here:
http://blog.appsoftware.com/2014/03/aspnet-file-uploader-with-signalr.html
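For readers who want to see the general idea without digging through System.Web, here is a minimal Python sketch of splitting a fully buffered multipart body into parts. It is not the HttpMultipartContentTemplateParser algorithm; the function name parse_multipart and the sample boundary are assumptions for illustration, and a real parser would work incrementally on the stream rather than on a complete byte string.

```python
def parse_multipart(body: bytes, boundary: bytes) -> list:
    """Split a multipart body into dicts of {"headers": dict, "content": bytes}."""
    delimiter = b"--" + boundary
    parts = []
    # Everything before the first delimiter is preamble and is ignored.
    for chunk in body.split(delimiter)[1:]:
        if chunk.startswith(b"--"):
            break  # closing delimiter "--boundary--": nothing follows
        # Each part is framed by a CRLF after the delimiter and one before the next.
        if chunk.startswith(b"\r\n"):
            chunk = chunk[2:]
        if chunk.endswith(b"\r\n"):
            chunk = chunk[:-2]
        raw_headers, _, content = chunk.partition(b"\r\n\r\n")
        headers = {}
        for line in raw_headers.split(b"\r\n"):
            name, _, value = line.partition(b":")
            headers[name.strip().lower().decode("latin-1")] = value.strip().decode("latin-1")
        parts.append({"headers": headers, "content": content})
    return parts


# Sample values for illustration only.
boundary = b"13166267887793"
body = (
    b"--" + boundary + b"\r\n"
    b'Content-Disposition: form-data; name="uploadFile"; filename="ABigFile.txt"\r\n'
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"file contents\r\n"
    b"--" + boundary + b"--\r\n"
)
for part in parse_multipart(body, boundary):
    print(part["headers"], part["content"])
```

This naive split assumes the boundary string never occurs inside the file data, which is exactly what the sender is supposed to guarantee when it picks the boundary.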
In an HTTP GET request, parameters are sent as a query string:
http://example.com/page?parameter=value&also=another
In an HTTP POST request, the parameters are not sent along with the URI.
Where are the values? In the request header? In the request body? What does it look like?
The values are sent in the request body, in the format that the content type specifies.
Usually the content type is application/x-www-form-urlencoded, so the request body uses the same format as the query string:
parameter=value&also=another
When you use a file upload in the form, you use the multipart/form-data encoding instead, which has a different format. It's more complicated, but you usually don't need to care what it looks like, so I won't show an example, but it can be good to know that it exists.
The content is put after the HTTP headers. The format of an HTTP POST is to have the HTTP headers, followed by a blank line, followed by the request body. The POST variables are stored as key-value pairs in the body.
You can see this in the raw content of an HTTP Post, shown below:
POST /path/script.cgi HTTP/1.0
From: frog@jmarshall.com
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 32
home=Cosby&favorite+flavor=flies
You can see this using a tool like Fiddler, which you can use to watch the raw HTTP request and response payloads being sent across the wire.
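The layout described above (headers, blank line, body) can also be taken apart in a few lines of Python. This is only a sketch on a hard-coded request with a trimmed set of headers; split_request is a hypothetical helper, not part of any library.

```python
from urllib.parse import parse_qs

raw = (
    b"POST /path/script.cgi HTTP/1.0\r\n"
    b"User-Agent: HTTPTool/1.0\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 32\r\n"
    b"\r\n"
    b"home=Cosby&favorite+flavor=flies"
)


def split_request(raw: bytes):
    """Return (request_line, headers, body) from a raw HTTP request."""
    head, _, body = raw.partition(b"\r\n\r\n")  # the blank line ends the headers
    lines = head.decode("latin-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


request_line, headers, body = split_request(raw)
fields = parse_qs(body.decode("ascii"))  # "+" decodes to a space
print(request_line)
print(headers["content-length"], len(body))
print(fields)
```

Note that Content-Length counts only the bytes after the blank line.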
Short answer: in POST requests, values are sent in the "body" of the request. With web-forms they are most likely sent with a media type of application/x-www-form-urlencoded or multipart/form-data. Programming languages or frameworks which have been designed to handle web-requests usually do "The Right Thing™" with such requests and provide you with easy access to the readily decoded values (like $_REQUEST or $_POST in PHP, or cgi.FieldStorage(), flask.request.form in Python).
Now let's digress a bit, which may help understand the difference ;)
The difference between GET and POST requests is largely semantic. They are also "used" differently, which explains the difference in how values are passed.
GET (relevant RFC section)
When executing a GET request, you ask the server for one, or a set of entities. To allow the client to filter the result, it can use the so called "query string" of the URL. The query string is the part after the ?. This is part of the URI syntax.
So, from the point of view of your application code (the part which receives the request), you will need to inspect the URI query part to gain access to these values.
Note that the keys and values are part of the URI. Browsers may impose a limit on URI length. The HTTP standard states that there is no limit. But at the time of this writing, most browsers do limit the URIs (I don't have specific values). GET requests should never be used to submit new information to the server. Especially not larger documents. That's where you should use POST or PUT.
POST (relevant RFC section)
When executing a POST request, the client is actually submitting a new document to the remote host. So, a query string does not (semantically) make sense. Which is why you don't have access to them in your application code.
POST is a little bit more complex (and way more flexible):
When receiving a POST request, you should always expect a "payload", or, in HTTP terms: a message body. The message body in itself is pretty useless, as there is no standard format (as far as I can tell; maybe application/octet-stream?). The body format is defined by the Content-Type header. When using an HTML FORM element with method="POST", this is usually application/x-www-form-urlencoded. Another very common type is multipart/form-data if you use file uploads. But it could be anything, ranging from text/plain, over application/json or even a custom application/octet-stream.
In any case, if a POST request is made with a Content-Type which cannot be handled by the application, it should return a 415 status-code.
Most programming languages (and/or web-frameworks) offer a way to de/encode the message body from/to the most common types (like application/x-www-form-urlencoded, multipart/form-data or application/json). So that's easy. Custom types require potentially a bit more work.
Using a standard HTML form encoded document as example, the application should perform the following steps:
Read the Content-Type field
If the value is not one of the supported media-types, then return a response with a 415 status code
otherwise, decode the values from the message body.
Again, languages like PHP, or web-frameworks for other popular languages will probably handle this for you. The exception to this is the 415 error. No framework can predict which content-types your application chooses to support and/or not support. This is up to you.
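The three steps above can be sketched in Python as follows. This is framework-free and deliberately simplified: handle_post, DECODERS and the set of supported media types are assumptions for illustration, and the 200 status stands in for whatever your application would actually return on success.

```python
import json
from urllib.parse import parse_qs


def decode_form(body: bytes):
    return parse_qs(body.decode("utf-8"))


def decode_json(body: bytes):
    return json.loads(body.decode("utf-8"))


# Media types this hypothetical application chooses to support.
DECODERS = {
    "application/x-www-form-urlencoded": decode_form,
    "application/json": decode_json,
}


def handle_post(content_type: str, body: bytes):
    """Return (status_code, decoded_data_or_None)."""
    # Step 1: read the Content-Type field, ignoring parameters such as charset.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Step 2: unsupported media type -> 415.
    decoder = DECODERS.get(media_type)
    if decoder is None:
        return 415, None
    # Step 3: decode the values from the message body.
    return 200, decoder(body)


print(handle_post("application/x-www-form-urlencoded; charset=utf-8", b"a=1&b=2"))
print(handle_post("text/csv", b"a,b"))
```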
PUT (relevant RFC section)
A PUT request is pretty much handled in the exact same way as a POST request. The big difference is that a POST request is supposed to let the server decide how to (and if at all) create a new resource. Historically (per the now obsolete RFC 2616), it was to create a new resource as a "subordinate" (child) of the URI where the request was sent to.
A PUT request in contrast is supposed to "deposit" a resource exactly at that URI, and with exactly that content. No more, no less. The idea is that the client is responsible to craft the complete resource before "PUTting" it. The server should accept it as-is on the given URL.
As a consequence, a POST request is usually not used to replace an existing resource. A PUT request can do both create and replace.
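A toy in-memory model may help make the distinction tangible. This is only a sketch of the semantics described above, not of any real server; ResourceStore and its numbering scheme for new resources are invented for the example.

```python
import itertools


class ResourceStore:
    """Toy in-memory store; URIs are plain strings."""

    def __init__(self):
        self.resources = {}
        self._ids = itertools.count(1)

    def post(self, collection_uri: str, document: bytes) -> str:
        # POST: the server decides where (and whether) to create the resource.
        uri = f"{collection_uri.rstrip('/')}/{next(self._ids)}"
        self.resources[uri] = document
        return uri

    def put(self, uri: str, document: bytes) -> bool:
        # PUT: store exactly this document at exactly this URI.
        created = uri not in self.resources
        self.resources[uri] = document
        return created  # True -> created, False -> replaced


store = ResourceStore()
new_uri = store.post("/notes", b"first draft")  # server picks the URI
store.put(new_uri, b"final text")               # client replaces it in place
print(new_uri, store.resources[new_uri])
```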
Side-Note
There are also "path parameters" which can be used to send additional data to the remote, but they are so uncommon, that I won't go into too much detail here. But, for reference, here is an excerpt from the RFC:
Aside from dot-segments in hierarchical paths, a path segment is considered
opaque by the generic syntax. URI producing applications often use the
reserved characters allowed in a segment to delimit scheme-specific or
dereference-handler-specific subcomponents. For example, the semicolon (";")
and equals ("=") reserved characters are often used to delimit parameters and
parameter values applicable to that segment. The comma (",") reserved
character is often used for similar purposes. For example, one URI producer
might use a segment such as "name;v=1.1" to indicate a reference to version
1.1 of "name", whereas another might use a segment such as "name,1.1" to
indicate the same. Parameter types may be defined by scheme-specific
semantics, but in most cases the syntax of a parameter is specific
to the implementation of the URIs dereferencing algorithm.
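As the excerpt says, the syntax of such parameters is implementation-specific, so there is no standard parser for them. For the "name;v=1.1" style only, a minimal Python sketch could look like this (split_path_parameters is a made-up helper):

```python
def split_path_parameters(segment: str):
    """Split a segment such as 'name;v=1.1' into ('name', {'v': '1.1'})."""
    name, *params = segment.split(";")
    parsed = {}
    for param in params:
        key, _, value = param.partition("=")
        parsed[key] = value
    return name, parsed


print(split_path_parameters("name;v=1.1"))
```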
You cannot type it directly on the browser URL bar.
You can see how POST data is sent on the Internet with Live HTTP Headers for example.
The result will look something like this:
http://127.0.0.1/pass.php
POST /pass.php HTTP/1.1
Host: 127.0.0.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Referer: http://127.0.0.1/pass.php
Cookie: passx=87e8af376bc9d9bfec2c7c0193e6af70; PHPSESSID=l9hk7mfh0ppqecg8gialak6gt5
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 30
username=zurfyx&pass=password
The part that reads
Content-Length: 30
username=zurfyx&pass=password
contains the POST values: the body comes right after the headers.
The default media type in a POST request is application/x-www-form-urlencoded. This is a format for encoding key-value pairs. The same key may appear more than once. Each key-value pair is separated by an & character, and each key is separated from its value by an = character.
For example:
Name: John Smith
Grade: 19
Is encoded as:
Name=John+Smith&Grade=19
This is placed in the request body after the HTTP headers.
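In Python, for instance, the standard library produces this encoding directly. A short sketch using the example values above, plus a repeated key (the key name "tag" is just an illustration):

```python
from urllib.parse import urlencode, parse_qsl

body = urlencode({"Name": "John Smith", "Grade": 19})
print(body)  # Name=John+Smith&Grade=19

# A list of pairs allows the same key to appear more than once.
repeated = urlencode([("tag", "a"), ("tag", "b")])
print(repeated)  # tag=a&tag=b
print(parse_qsl(repeated))  # [('tag', 'a'), ('tag', 'b')]
```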
Form values in HTTP POSTs are sent in the request body, in the same format as the querystring.
For more information, see the spec.
Some web services require you to send request data and metadata separately. For example, a remote function may expect the signed metadata string to be included in the URI, while the data is posted in the HTTP body.
The POST request may semantically look like this:
POST /?AuthId=YOURKEY&Action=WebServiceAction&Signature=rcLXfkPldrYm04 HTTP/1.1
Content-Type: text/tab-separated-values; charset=iso-8859-1
Content-Length: []
Host: webservices.domain.com
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: identity
User-Agent: Mozilla/3.0 (compatible; Indy Library)
name id
John G12N
Sarah J87M
Bob N33Y
This approach logically combines QueryString and Body-Post using a single Content-Type which is a "parsing-instruction" for a web-server.
Please note: in the request line, HTTP/1.1 is preceded by a space (#32) and followed by a line break (CRLF, #13#10, per the HTTP specification, although many servers also accept a bare line feed, #10).
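On the receiving side, such a request is read in two places. The following Python sketch assumes the body columns are separated by TAB characters (as the Content-Type suggests) and uses the sample values from the request above; it illustrates the idea and is not any particular web service's implementation.

```python
import csv
import io
from urllib.parse import urlsplit, parse_qs

target = "/?AuthId=YOURKEY&Action=WebServiceAction&Signature=rcLXfkPldrYm04"
body = "name\tid\nJohn\tG12N\nSarah\tJ87M\nBob\tN33Y\n"

# Metadata comes from the query string of the request target...
metadata = {k: v[0] for k, v in parse_qs(urlsplit(target).query).items()}

# ...while the data rows come from the body, decoded according to the Content-Type.
rows = list(csv.DictReader(io.StringIO(body), delimiter="\t"))

print(metadata["Action"])
for row in rows:
    print(row["name"], row["id"])
```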
First of all, let's differentiate between GET and POST
GET: This is the default HTTP request method. It is used to retrieve data from the server, and the query string that comes after the ? in the URI identifies the resource to retrieve.
This is the format:
GET /someweb.asp?data=value HTTP/1.0
Here, data=value is the query string that is passed.
POST: It is used to send data to the server in the request body rather than in the URL. This is the format of a POST request:
POST /someweb.asp HTTP/1.0
Host: localhost
Content-Type: application/x-www-form-urlencoded //you can put any format here
Content-Length: 13 //the length of the body in bytes
Name=somename
Why POST over GET?
In GET, the values being sent to the server are usually appended to the base URL in the query string. There are two consequences of this:
The GET requests are saved in browser history with the parameters. So your passwords remain un-encrypted in browser history. This was a real issue for Facebook back in the days.
Usually servers have a limit on how long a URI can be. If you send too many parameters, you might receive a 414 error (URI Too Long).
In the case of a POST request, the data from your fields is added to the body instead. The length of the body is calculated and added to the Content-Length header, and no important data is appended directly to the URL.
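To illustrate how the Content-Length value relates to the body, here is a small Python sketch that assembles a raw POST request by hand. build_post is a hypothetical helper; in practice an HTTP client library does this for you.

```python
from urllib.parse import urlencode


def build_post(path: str, host: str, fields: dict) -> bytes:
    """Assemble a raw form POST; Content-Length is the byte length of the body."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # blank line separating headers from body
    ).encode("ascii")
    return head + body


request = build_post("/someweb.asp", "localhost", {"Name": "somename"})
print(request.decode("ascii"))
```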
You can use the Google Developer Tools' network section to see basic information about how requests are made to the servers.
You can always add more values to your request headers, such as Cache-Control, Origin, and Accept.
There are many ways/formats of post parameters
formdata
raw data
json
encoded data
file
xml
They are controlled by the Content-Type header, whose values are expressed as MIME types.
In CGI Programming on the World Wide Web the author says:
Using the POST method, the server sends the data as an input stream to
the program. ..... since the server passes information to this program
as an input stream, it sets the environment variable CONTENT_LENGTH to
the size of the data in number of bytes (or characters). We can use
this to read exactly that much data from standard input.
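What the quote describes can be sketched in Python like this. The environment and input stream are simulated here so the snippet runs on its own; in a real CGI program you would pass os.environ and sys.stdin.buffer instead. read_post_body is a made-up helper name.

```python
import io
from urllib.parse import parse_qs


def read_post_body(environ, stdin) -> bytes:
    """Read exactly CONTENT_LENGTH bytes of POST data from the given stream."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    return stdin.read(length)


# Simulated CGI environment and standard input (illustrative values).
environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "32"}
stdin = io.BytesIO(b"home=Cosby&favorite+flavor=flies")

data = read_post_body(environ, stdin)
print(parse_qs(data.decode("ascii")))
```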
I'm troubleshooting an integration between an external service which posts multipart/form-data data to a Controller in MVC3.
On the production server I've captured the erroneous request to a file using HttpRequest.SaveAs.
Is there any tool I can use to "replay" the request on my localhost so I can debug with Visual Studio?
(I've been trying with Fiddler but I can't get it working right. If I dump a local request from a simple form with POST, my controller receives the files correctly. If I dump the same request, copy and paste it into Fiddler as raw, and send it, the files are missing, so there's something wrong.)
Since there's a built-in function to dump the request, I'm thinking there might be some official way to resend the request as well. Is there a way to achieve this?
I have used the NCAT command-line tool to replay requests captured by the SaveAs method.
Command looks like this:
NCAT localhost 80 < CapFileName
You can find it in the Nmap distribution.
See my blog for more information.
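If NCAT is not at hand, the same idea (write the captured bytes to a TCP socket unchanged) can be sketched in Python. This assumes the capture file contains the complete request including the request line and headers; replay_on_socket and replay_file are hypothetical helper names, and host, port and timeout are placeholders.

```python
import socket


def replay_on_socket(sock: socket.socket, capture: bytes) -> bytes:
    """Send the captured bytes verbatim, then read the response until EOF."""
    sock.sendall(capture)
    sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
    chunks = []
    while True:
        try:
            chunk = sock.recv(65536)
        except socket.timeout:
            break  # server kept the connection open; return what we have
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def replay_file(path: str, host: str = "localhost", port: int = 80) -> bytes:
    """Read a capture file and replay it against host:port."""
    with open(path, "rb") as capture_file:
        capture = capture_file.read()
    with socket.create_connection((host, port), timeout=10) as sock:
        return replay_on_socket(sock, capture)
```

Calling replay_file("CapFileName") would then play the role of the NCAT command above. The Host header inside the capture still names the production server, so the local site must be willing to answer for that host name.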
I got it working in Fiddler if I do exactly this in the composer:
Open the dump file in Notepad
Choose Parsed
Enter only the Content-Type as a header (and let Fiddler add the others, even if they would be the same)
Paste the body of the request from Notepad into the request body
POST: http://localhost/Controller/Action
Request headers:
Content-Type: multipart/form-data; boundary=fJP-UWKXo6xvqX7niGR0StXXFQwdKhHc9quF
Request body:
--fJP-UWKXo6xvqX7niGR0StXXFQwdKhHc9quF
Content-Disposition: form-data; name="mmsimage"; filename="IMG_0959.jpg"
Content-Type: image/jpeg; name=IMG_0959.jpg; charset=ISO-8859-1
Content-Transfer-Encoding: binary
<the encoded file goes here as gibberish>
--fJP-UWKXo6xvqX7niGR0StXXFQwdKhHc9quF
Content-Disposition: form-data; name="somefield"
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
value of somefield
--fJP-UWKXo6xvqX7niGR0StXXFQwdKhHc9quF--
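The structure of the body above can be reproduced programmatically, which makes the framing explicit: every line break is a CRLF, each part starts with "--" plus the boundary, and the last delimiter ends with an extra "--". A minimal Python sketch follows; build_multipart is a made-up helper and the part shown is the text field from the example.

```python
def build_multipart(parts, boundary: str) -> bytes:
    """parts: iterable of (part_headers: dict, content: bytes)."""
    delimiter = b"--" + boundary.encode("ascii")
    out = []
    for headers, content in parts:
        out.append(delimiter + b"\r\n")
        for name, value in headers.items():
            out.append(f"{name}: {value}\r\n".encode("latin-1"))
        out.append(b"\r\n" + content + b"\r\n")  # blank line, content, CRLF
    out.append(delimiter + b"--\r\n")  # closing delimiter
    return b"".join(out)


boundary = "fJP-UWKXo6xvqX7niGR0StXXFQwdKhHc9quF"
body = build_multipart(
    [
        (
            {
                "Content-Disposition": 'form-data; name="somefield"',
                "Content-Type": "text/plain; charset=utf-8",
            },
            b"value of somefield",
        )
    ],
    boundary,
)
print(body.decode("utf-8"))
```

A body pasted through a text editor can easily end up with different line endings or a changed length, which would be one possible (unconfirmed) explanation for why letting Fiddler generate the remaining headers helped.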
Do web browsers send the file size in the http header when uploading a file to the server? And if that is the case, then, is it possible to refuse the file just by reading the header and not wait for the whole upload process to finish?
http://www.faqs.org/rfcs/rfc1867.html
HTTP clients are
encouraged to supply content-length for overall file input so that a
busy server could detect if the proposed file data is too large to be
processed reasonably
But the content-length is not required, so you cannot rely on it. Also, an attacker can forge a wrong content-length.
Reading the file content is the only reliable way. Having said that, if the Content-Length is present and is too big, closing the connection would be a reasonable thing to do.
Also, the content is sent as multipart, so most of the modern frameworks decode it first. That means you won't get the file byte stream until the framework is done, which could mean "until the whole file is uploaded".
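Combining both points, a server that has access to the raw request stream could reject early on a declared length that is too large and still count the bytes it actually receives. The following Python sketch works on any file-like stream; read_capped and UploadTooLarge are names invented for the example, and whether your framework exposes the stream this early is a separate question.

```python
import io


class UploadTooLarge(Exception):
    """Raised when the declared or actual body size exceeds the limit."""


def read_capped(stream, declared_length, max_bytes: int, chunk_size: int = 65536) -> bytes:
    # Early rejection: a too-large Content-Length is reason enough to refuse.
    if declared_length is not None and declared_length > max_bytes:
        raise UploadTooLarge("declared Content-Length exceeds limit")
    # The header may be missing or forged, so also count the bytes actually read.
    received = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        received.extend(chunk)
        if len(received) > max_bytes:
            raise UploadTooLarge("body exceeds limit")
    return bytes(received)


print(read_capped(io.BytesIO(b"small upload"), 12, max_bytes=1024))
```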
EDIT: before going too far, you may want to check this other answer relying on Apache configuration: Using jQuery, Restricting File Size Before Uploading. The description below is only useful if you really need even more custom feedback.
Yes, you can get some information upfront, before allowing the upload of the whole file.
Here's an example of header coming from a form with the enctype="multipart/form-data" attribute :
POST / HTTP/1.1
Host: 127.0.0.1:8000
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.3) Gecko/2008092414 Firefox/3.0.3
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.7,fr-be;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Content-Type: multipart/form-data; boundary=---------------------------886261531333586100294758961
Content-Length: 135361
-----------------------------886261531333586100294758961
Content-Disposition: form-data; name=""; filename="IMG_1132.jpg"
Content-Type: image/jpeg
(data starts here and ends with -----------------------------886261531333586100294758961 )
You have the Content-Length in the header, and additionally there is the Content-Type in the header of the file part (each file has its own header, which is the purpose of multipart encoding). Beware that it's the browser's responsibility to set a relevant Content-Type by guessing the file type; you can't guarantee it, but it should be fairly reliable for early rejection (yet you'd better check the whole file when it's entirely available).
Now, there is a gotcha. I used to filter image files like that, not on the size, but on the content-type; but as you want to stop the request as soon as possible, the same problem arises: the browser only gets your response once the whole request is sent, including form content and thus uploaded files.
If you don't want the provided content and want to stop the upload, you have no choice but to brutally close the socket. The user will only see a confusing "connection reset by peer" message. And that sucks, but it's by design.
So you only want to use this method in cases of background asynchronous checks (using a timer that checks the file field). So I had that hack:
I use jquery to tell me if the file field has changed
When a new file is chosen, disable all other file fields on the same form to get only that one.
Send the file asynchronously (jQuery can do it for you, it uses a hidden frame)
Server-side, check the header (content-length, content-type, ...), cut the connection as soon as you got what you need.
Set a session variable telling if that file was OK or not.
Client-side, as the file is uploaded to a frame you don't even get any kind of feedback if the connection is closed. Your only alternative is a timer.
Client-side, a timer polls the server to get a status for the uploaded file. Server-side, you have that session variable set; send it back to the browser.
The client has the status code; render it to your form: error message, green checkmark/red X, whatever. Reset the file field or disable the form, you decide. Don't forget to re-enable other file fields.
Quite messy, eh ? If any of you has a better alternative, I'm all ears.
I'm not sure, but you should not really trust anything sent in the header, as it could be faked by the user.
It depends on how the server works. For example in PHP your script will not run until the file upload is complete, so this wouldn't be possible.