I apologize if there's an answer elsewhere.
I'm building a simple server and am now working on static file responses. Should I refuse all http methods except GET when serving static content? By static content I am referring to files stored on the file system on the server.
My immediate hunch is to only allow GET, but I want to make sure before locking it down.
What http method(s) should resolve static files of the form:
http://somedomain.com/foo/bar/baz.css?
Not necessarily requested through the browser, obviously.
All HTTP requests have a specific purpose. If you don't plan to implement that purpose or feature, block it with 405 Method Not Allowed.
For example, do you want to allow others update the files? You'll need PUT then. I'd recommend simply reading what the methods mean so you know what makes sense and what not.
Intuitively I think you probably only need GET and HEAD. I think it's good to respond to OPTIONS with a correct response as well.
Related
I have a db with original file names, location to files on disk, meta data like user that owns file... Those files on disk are with scrambled names. When user requests a file, the servlet will check whether he's authorized, then send the file in it's original name.
While researching on the subject i've found several cases that cover that issue, but nothing specific to mine.
Essentially there are 2 solutions:
A custom servlet that handles headers and other stuff the Default Servlet containers don't: http://balusc.omnifaces.org/2009/02/fileservlet-supporting-resume-and.html
Then there is the quick and easy one of just using the Default Servlet and do some path remapping. For ex., in Undertow you configure the Undertow subsystem and add file handlers in the standalone.xml that map http://example.com/content/ to /some/path/on/disk/with/files .
So i am leaning towards solution 1, since solution 2 is a straight path remap and i need to change file names on the fly.
I don't want to reinvent the hot water. And both solutions are non standard. So if i decide to migrate app server to other than Wildfly, it will be problematic. Is there a better way? How would you approach this problem?
While your problem is a fairly common one there isn't necessarily a standards based solution for every possible design challenge.
I don't think the #2 solution will be sufficient - what if two threads try to manipulate the file at the same time? If someone got the link to the file could they share it?
I've implemented something very similar to your #1 solution - the key there is that even if the link to the file got out no one could reuse the link as it requires security. You would just "return" a 401 or 403 for the resource.
Another possibility depends on how you're hosted. Amazon S3 allows you to generate a signed URL that has a limited time to live. In this way your server isn't sending the file directly. It is either sending a redirect or a URL to the front end to use. Keep the lifetime at like 15 seconds (depending on your needs) and then the URL is no longer valid.
I believe that the other cloud providers have a similar capability too.
I'd like to ask a question regarding HTTP Conditional GET. Is this actually reliable to detect changes of a web resource?
I mean, if I write a program to check if the page content is changed by using HTTP Conditional GET, is it possible that the web server is misconfigured (or intentionally configured) to return there are no changes even though the contents of the HTML or XML (Restful) has changed?
(I'm referring to requesting a web page with a header "If-Modified-Since" as part of the GET request. So, is the modified date that comes back is always reliable?)
Of course it is possible. But the whole point of using a communication protocol, is that you trust that the other side is fulfilling it.
Usually, the situations like the one you mention are called "byzantine", because one of the ends is not following the protocol or failing, but cheating.
Yes, it is possible for a server to say that the content hasn't changed even though it has. It is still just code running on the server so it can do anything it wants.
While reading some articles about writing web servers using Twisted, I came across this page that includes the following statement:
While it's convenient for this example, it's often not a good idea to
make a resource that POSTs to itself; this isn't about Twisted Web,
but the nature of HTTP in general; if you do this, make sure you
understand the possible negative consequences.
In the example discussed in the article, the resource is a web resource retrieved using a GET request.
My question is, what are the possible negative consequences that can arrive from having a resource POST to itself? I am only concerned about the aspects related to the HTTP protocol, so please ignore the fact that I mentioned about Twisted.
The POST verb is used for making a new resource in a collection.
This means that POSTing to a resource has no direct meaning (POST endpoints should always be collections, not resources).
If you want to update your resource, you should PUT to it.
Sometimes, you do not know if you want to update or create the resource (maybe you've created it locally and want to create-or-update it). I think that in that case, the PUT verb is more appropriate because POST really means "I want to create something new".
There's nothing inherently wrong about a page POSTing back to itself - in fact, many of the widely-used frameworks (ASP.NET, etc.) use that method to handle various events that happen on the client - some data is posted back to the same page where the server processes it and sends a new reponse.
I'm trying to improve performance for clients fetching pages from my Compojure webserver. We serve up a bunch of static files (JS, CSS) using (compojure.route/resources "/"), which looks for files on the filesystem, converts them to URLs, and then serves them to Ring as streams. By converting to streams, it seems to lose all file metadata, such as the mod time.
I can wrap the static-resource handler and add an Expires or Cache-Control: max-age header, but that prevents the client from sending any request at all. Useful, but these files do change on occasion (when we put out a release).
Ideally I'd like the client to trust its own cached version for, say, an hour, and make a request with an If-Modified-Since header after that hour has passed. Then we can just return 304 Not Modified and the client avoids downloading a couple hundred kilos of javascript.
It looks like I can set a Last-Modified header when serving a response, and that causes the client to qualify subsequent requests with If-Modified-Since headers. Great, except I'd have to rewrite most of the code in compojure.route/resources in order to add Last-Modified - not difficult, but tedious - and invent some more code to recognize and respond to the If-Modified-Since header. Not a monumental task, but not a simple one either.
Does this already exist somewhere? I couldn't find it, but it seems like a common enough, and large enough, task that someone would have written a library for it by now.
FWIW, I got this to work by using Ring's wrap-file-info middleware; I'm sorta embarrassed that I looked for this in Compojure instead of Ring. However, compojure.route's files and resources handlers both serve up streams instead of Files or URLs, and of course Ring can't figure out metadata from that.
I had to write basically a copy of resources that returns a File instead; when wrapped in wrap-file-info that met my needs. Still wouldn't mind a slightly better solution that doesn't involve copying a chunk of code from Compojure.
Have you considered using the ring-etag-middleware? It uses the last modified date of a file to generate the entity tag. It then keys a 304 on a match to the if-none-match header in the request.
I am creating a MVC 3 application (although just as applicable to other technologies e.g. ASP.NET Forms) and was just wondering if it is feasible (performance wise) to serve images from code rather than using the direct virtual path (like usual).
The idea is that I improve the common method of serving files to:
Apply security checks
Standardised method of serving files based on route values
Returning modified images (if requested) e.g. different dimentions (ok this would only be used sparingly so don't relate this to the performance question above).
Perform business logic before allowing access to the resource
I know HOW to do it but I don't know IF I should do it.
What are the performance issues (if any)
Does something weird happen e.g. images only load sequentially (maybe that's how HTML does it currently i am not sure - exposing my ignorance here).
Anything else you can think of.
Hope this all makes sense!
Thanks,
Dan.
UPDATE
OK - lets get specific:
What are the performance implications for using this type of method for serving all images in MVC 3 using a memory stream? Note: the image url would be GenericFetchImage/image1 (and just for simplicity - all my images are jpegs).
public FileStreamResult GenericFetchImage(string RouteValueRefToImage)
{
// Create a new memory stream object
MemoryStream ms = new MemoryStream();
// Go get image from file location
ms = GetImageAndPutIntoMemoryStream(RouteValueRefToImage);
// return the output as a file
return new FileStreamResult(ms, "image/jpeg");
}
I know that this method works, because I am using it to dynamically generate an image based on a session value for a captcha image. It's pretty neat - but I would like to use this method for all image retrieval.
I guess I am wondering in the above example if this is ok to do or whether it requires more processing to perform and if so, how much? For example, if the number of visitors were to multiply by 1000 for example, would the server be then processingly burdened in the delivery of images..
THANKS!
A similar question was asked before (Can an ASP.Net MVC controller return an Image?) and it appears that the performance implications are very small to serving images out of actions vs directly. As the accepted answer noted, the difference appears to be on the order of a millisecond (in that test case, about 13%). You could re-run the test locally and see what the difference is on your hardware.
The best answer to your question of if you should be using it is from this answer to (another) similar question (emphasis mine):
DO worry about the following: you will need to re-implement a caching strategy on the server, since IIS manages that for static files requested directly. You will also need to make sure you manage your client-side caching with the correct headers included in the response. Ultimately, just ask yourself if re-inventing a method of serving static files from a server is something that serves your application's needs.
To address the specific cases you provided with the question:
Apply security checks
You can already do this using the IIS 7 integrated pipeline. Relevant bit from documentation:
Allowing services provided by both native and managed modules to apply to all requests, regardless of handler. For example, managed Forms Authentication can be used for all content, including ASP pages, CGIs, and static files.
Standardised method of serving files based on route values
If I'm reading the documentation correctly you can insert a module early enough in the pipeline to re-write incoming URLs to point directly to static resources and let IIS handle the request from there. (For the sake of completeness there also this related question regarding mapping routes to mages: How do I route images using ASP.Net MVC routing?)
Empowering ASP.NET components to provide functionality that was previously unavailable to them due to their placement in the server pipeline. For example, a managed module providing request rewriting functionality can rewrite the request prior to any server processing, including authentication.
There are also some pretty powerful URL rewrite features that come with IIS more or less out of the box.
Returning modified images (if requested) e.g. different dimentions (ok this would only be used sparingly so don't relate this to the performance question above).
It looks like a module that does this is already available for IIS. Not sure if that would fall under serving images from code or not though, I guess it might.
Perform business logic before allowing access to the resource
If you're performing business logic to generate said resources (like a chart) or as you mentioned a captcha image then yeah, you basically have no choice but to do it this way.