Nginx: referring too-large file uploads to the application? - symfony

I need to be able to deal with uploaded file sizes that exceed the maximum nginx and PHP limits before nginx issues a 413 error page. Instead, I want to issue an error message within my application (Symfony) dialog.
To test the file-size limits in Symfony, my test upload file is 600 MB. When I upload the 600 MB file under nginx, the upload runs to 100% and then reports "413 Request Entity Too Large".
If I run "app/console server:run" (which uses the Symfony server instead of nginx), Symfony reports the error in the GUI before the upload occurs (as intended).
Is there any way to modify the nginx configuration so that it reads $_SERVER['CONTENT_LENGTH'] or $_SERVER['HTTP_CONTENT_LENGTH'], aborts the upload, and then passes the rejected request to the app? Symfony flags the error based on CONTENT_LENGTH (and, with a workaround for a Symfony issue, HTTP_CONTENT_LENGTH).
File size limits:
src/my_app/CoreBundle/Resources/config/validation.yml: maxSize: '500M'
/etc/php5/cgi/php.ini:post_max_size = 550M
/etc/php5/cgi/php.ini:upload_max_filesize = 500M
/etc/php5/cli/php.ini:post_max_size = 550M
/etc/php5/cli/php.ini:upload_max_filesize = 500M
/etc/php5/fpm/php.ini:post_max_size = 550M
/etc/php5/fpm/php.ini:upload_max_filesize = 500M
Versions:
symfony 2.5.12  
nginx 1.4.6-1ubuntu3.4

You could increase the allowed size in the nginx config http block.
client_max_body_size 800m;
Set a value that's higher than the value in your php.ini. Then nginx won't respond with a 413, and Symfony will show the normal error page because of the PHP limitation.
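For example, a minimal sketch of the relevant part of nginx.conf (the surrounding server settings are assumed; 800m is just a value comfortably above the php.ini limits):
http {
    # Larger than post_max_size (550M) and upload_max_filesize (500M) in php.ini,
    # so an oversized upload reaches PHP/Symfony and triggers the application's own
    # error handling instead of nginx's 413 response.
    client_max_body_size 800m;
}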

I worked around this by catching changes to the file selection in JavaScript, then popping up a warning and clearing the selection if the file is too large.
My hunch is that nginx doesn't parse the incoming request in any way, i.e., it doesn't read CONTENT_LENGTH or HTTP_CONTENT_LENGTH, and rejects the request purely based on whether the size of the request body exceeds client_max_body_size. If that's true (confirmation or denial would be great), then there's no way to deal with this when using nginx except letting nginx run the upload and hit the 413 error (which is time-consuming for a large file and/or a slow network).

Content-Length works for non-chunked content; however, I suspect your upload is chunked, and in that case there is no way for nginx to find out what the size of your content is.
Even if it could, it should not pass it on.
Per the HTTP/1.1 spec (http://www.ietf.org/rfc/rfc2616.txt):
3.If a Content-Length header field (section 14.13) is present, its
decimal value in OCTETs represents both the entity-length and the
transfer-length. The Content-Length header field MUST NOT be sent
if these two lengths are different (i.e., if a Transfer-Encoding
header field is present). If a message is received with both a
Transfer-Encoding header field and a Content-Length header field,
the latter MUST be ignored.

Related

How-to drop extra data based on "Content-Length" header in nginx

I have a custom application deployed on an IIS instance, which among other things acts as an http file server. For various reasons (bugs), many files are corrupted, in the sense that they have an additional byte at the end of the binary content. Fortunately I've got the exact content length of each file saved on a db, and when my application returns the file, it sets the content-length header correctly (both for corrupted and correct files).
So I have situations where the content-length in the response header says 100, while the bytes actually present in the body of the same response are 101 (100+1).
For some internal reason I cannot change the behavior of the application.
Calling the application directly from the browser (i.e., a direct call to IIS), there seem to be no obvious problems, but this situation seems to mess up my nginx (version 1.15.7), behind which the application is exposed in production. Note that the file is served, but it ends up corrupted (they are Excel files), while files downloaded via a direct call to IIS are correct.
I think there is a problem with some internal buffer, because it seems to always discard the last 8192 bytes, and the error log shows this warning: upstream sent more data than specified in "Content-Length" header while reading response header from upstream.
I tried adding the directive proxy_buffering off; but the result does not change (only the warning disappears from the error log).
My question is: is there any way to trim the response body based on the content-length value provided by my upstream? Obviously if and only if this value is present in the headers.
Thanks,
AleBer

HTTP: Can I trust 'Content-Length' request header value

I'm developing a file uploading service. I want to restrict my users' total uploaded file size, i.e. they have quotas for uploaded files.
So I want to check the available quota as a user starts uploading a new file.
The easiest way is to take the 'Content-Length' header value of the POST request and check it against the user's remaining quota.
But I'm anxious about whether I can trust the 'Content-Length' value.
What if a bad guy specifies a small value in the 'Content-Length' header and starts uploading a huge file?
Should I additionally check while reading from the input stream (and saving the file to disk), or is that redundant (and such a situation should be detected by web servers)?
Short answer: it's safe
Long answer: Generally, servers are required to read (at most) as many bytes as are specified in the Content-Length request header. Any bytes that come after that are expected to indicate a completely new request (reusing the same connection).
I'd assume that this requirement is validated on the server by checking that the next few bytes can be parsed as a request-line.
request-line = method SP request-target SP HTTP-version CRLF
If your bad guy is not smart enough to inject request headers in the right locations of the message body, the server should (must?) automatically treat the entire chain of requests as invalid and abort the file upload.
If your bad guy does inject new request headers in the message body (a.k.a. request smuggling), each resulting request is technically still valid and you'll still be able to trust that the Content-Length is valid for each message body. You just have to watch out for a different kind of attack. For example: you might have a proxy installed that filters incoming requests, but only does so by checking the headers of the first request. The smuggled requests get a free pass, which is an obvious security breach.
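For illustration only, a minimal PHP sketch of both checks (PHP is purely an example here; $remainingQuota and $targetPath are hypothetical values supplied by your application, and a raw, non-multipart request body is assumed):
// Quick rejection based on the declared Content-Length.
$declared = isset($_SERVER['CONTENT_LENGTH']) ? (int) $_SERVER['CONTENT_LENGTH'] : 0;
if ($declared > $remainingQuota) {
    header('HTTP/1.1 413 Request Entity Too Large');
    exit;
}
// Defensive check: also count the bytes actually received while streaming to disk.
$in  = fopen('php://input', 'rb');
$out = fopen($targetPath, 'wb');
$written = 0;
while (!feof($in)) {
    $chunk = fread($in, 8192);
    if ($chunk === false || $chunk === '') {
        break;
    }
    $written += strlen($chunk);
    if ($written > $remainingQuota) {
        // The body turned out larger than the header claimed: abort and clean up.
        fclose($in);
        fclose($out);
        unlink($targetPath);
        header('HTTP/1.1 413 Request Entity Too Large');
        exit;
    }
    fwrite($out, $chunk);
}
fclose($in);
fclose($out);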

Using If-Modified-Since header for dynamically generated remote files

Our web server regularly downloads images from other web servers. To prevent our server from having to download the same image every day even if it has not changed, I plan to store the Last-Modified header when the image is downloaded and then put that date in the If-Modified-Since header of subsequent requests for the same file.
I have this working fine, except when the remote file is generated on the fly when requested (e.g. if the remote server generates a certain-sized version for the web, on request, from a separate original file). In this case, the Last-Modified header is the date on which the remote server responds to the request, so the stored Last-Modified header from the previous download will always be earlier than the one for subsequent requests, so the image will always get downloaded and I'll never get the 304 Not Modified status code.
So, is there a way to reduce the download frequency when the remote server is serving up images that are generated on the fly?
It sounds to me like this is not possible, but I thought I'd ask anyway.
If you can create some form of hash for the images, use ETags. Your server will have to check the If-None-Match request header against the hash, and if they match you can return a 304 response.
Clients will still send the Last-Modified date (as If-Modified-Since), but if your hashing method does not generate many collisions you should be able to ignore it and just match the ETags.
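As a rough sketch of the server side (this assumes you control, or can influence, the server that generates the images; PHP is used purely for illustration, and $imageData holding the freshly generated image bytes is an assumption):
$etag = '"' . md5($imageData) . '"';
header('ETag: ' . $etag);
// If the client sent the same tag back, the generated image is unchanged: reply 304 with no body.
if (isset($_SERVER['HTTP_IF_NONE_MATCH']) && trim($_SERVER['HTTP_IF_NONE_MATCH']) === $etag) {
    header('HTTP/1.1 304 Not Modified');
    exit;
}
header('Content-Type: image/jpeg');
header('Content-Length: ' . strlen($imageData));
echo $imageData;
On the downloading side you would then store the ETag (instead of, or alongside, Last-Modified) and send it back in an If-None-Match header on the next request for the same file.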

Passenger 3.0.17 + Nginx 1.2.4 + "Content-Length" Header = 502 Bad Gateway

I have just run across an issue where setting response.headers['Content-Length'] in a Rails 3.2.2 application causes Nginx to throw a "502 Bad Gateway" error.
I have an action in a controller that uses the send_data method to send raw JPEG data contained within a variable. Before, I had a problem with some browsers not downloading the entire image being sent, and I discovered that no Content-Length header was being sent, so I decided to use the .bytesize property of the variable containing the JPEG data as the Content-Length.
This works fine in development (using Unicorn), and there is now a Content-Length header where there wasn't one before, but in production, where I'm using Nginx and Passenger, I get the aforementioned 502 Bad Gateway. Also, in the Nginx error log, I see:
[error] 30574#0: *1686 upstream prematurely closed connection while reading response header from upstream
There are no matching entries in the Rails production log, which tells me the application is fine.
I've commented out the line where I set the Content-Length header, and the problem went away. I'm still testing whether I actually need to send a Content-Length header, but meanwhile, I thought I might post this out of curiosity to see if anyone has any thoughts.
AHA! I had to convert the size to a string by adding the .to_s method. So, my final result is
response.headers['Content-Length'] = photo_data.bytesize.to_s
send_data photo_data, :type => :jpg, :filename => 'file_name.jpg', :disposition => 'attachment'
So, it would seem that it is okay to send the Content-Length header on Nginx/Passenger, but Passenger chokes if it's not explicitly a string.

Should I remove Etag for htm and php pages?

I generate htm files dynamically using PHP and .htaccess. I read somewhere that I should remove ETags for files of type text/html. Is that correct? I am wondering whether, if I use ETags and don't change the content, I could save some bandwidth. I would appreciate it if you could tell me whether I can use ETags for htm files.
As far as I know, the ETag is an HTTP header generated by the HTTP server and used by the cache system.
The idea:
You ask stackoverflow.com for the image logo.png
stackoverflow.com answers you with an HTTP 304 (content not modified, etag: XXXXXX)
Before asking for the image again, your browser will check its cache for a resource called logo.png, from the website stackoverflow, with the etag XXXXXXX
If the browser finds it, it will load the image from the cache, without downloading it again
If it can't find it, it will ask the web server again to download it.
So... for what purpose do you want to use ETags?
If you want to understand more about ETags, it could be interesting to download HttpFox for Firefox.
Apache has its own cache system, and it's used when you download or request any "static" resource, like HTML files and images.
If you want to do this in a dynamic context, you must implement it yourself.
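A minimal sketch of doing that in a plain PHP script, hashing the rendered output with output buffering (render_page() is just a placeholder for however you build the page):
ob_start();
echo render_page();                       // placeholder: generate the page as usual
$html = ob_get_clean();

$etag = '"' . md5($html) . '"';
header('ETag: ' . $etag);
if (isset($_SERVER['HTTP_IF_NONE_MATCH']) && trim($_SERVER['HTTP_IF_NONE_MATCH']) === $etag) {
    header('HTTP/1.1 304 Not Modified');  // unchanged content: skip resending the body
    exit;
}
echo $html;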
ETags can be useful for speeding up your website, even with dynamic content (like PHP scripts).
This is especially important on mobile connections, since connection speeds are slower.
I use ETag headers on some mobile websites like this:
https://gist.github.com/oliworx/4951478
Hint: You must not include the current time or other frequently changing content in the page, because this prevents it from being cached by the client (the browser).
Remove Etags if possible
The best caching method is max-age. The W3C mandates that browsers must use max-age when available.
When max-age is used, the browser will use the cached version and not even query the server.
This also means that if you are replacing a resource on your web page (e.g. CSS, JS, IMG, link), you should rename the resource.
The next best caching method is Expires.
In every PHP page with an echo it is not a bad idea to always include a max-age header.
header('Cache-Control: max-age=31536000');
These are wise as well (the example Content-Type is only for HTML):
header('Content-Type: text/html; charset=utf-8');
header('Connection: Keep-Alive');
header('Keep-Alive: timeout=50, max=100');
An ETag has no expiration, so the resource must be checked every time.
If you are using max-age or Expires, the browser will not make an HTTP request to check the resource.
When included along with max-age and/or Expires, the ETag is a waste of header space, and it wastes a few server CPU cycles to generate or look up the ETag value.
The problem with ETags is that unless the resource is very large, they provide little benefit. In an HTTP request, the time required to transmit the data is often minimal compared to the connect and wait times.
With an ETag the browser still has to make an HTTP request. When the ETag has not changed, the response is a 304.
Here is a typical HTTP request:
Only 3 milliseconds to download 2.9 KB
454 milliseconds of request time, plus 58 ms of DNS (very fast)
DNS Lookup: 58 ms
Initial Connection: 192 ms
Time to First Byte: 262 ms
Content Download: 3 ms
Bytes In (downloaded): 2.9 KB
An ETag would save 3 milliseconds.
If the resource had been cached, it would have freed the connection for another resource, in addition to saving the 400-500 ms.
Here is a 301 response from Intel
441 ms
DNS Lookup: 103 ms
Initial Connection: 219 ms
Time to First Byte: 222 ms
Content Download: ms
Bytes In (downloaded): 0.1 KB
