Generating PDF on the fly with standard HTTP response fields - http

I'm developing a web page with a form which returns a PDF document based on the form data. Currently I use the HTTP response fields
Content-Type: application/pdf
Content-Disposition: attachment; filename="foo.pdf"
However, since the field Content-Disposition is non-standard and doesn't work in all browsers, I'm looking for a different approach. Do I have to save the PDF document on the server? What is the modus operandi?
Edit: By "doesn't work in all browsers" I mean that with some browsers the filename is not set to foo.pdf. Dillo, for instance, just sets the default filename (in the download dialog) to the basename of the URL path (plus query string).

Do I have to save the PDF document on the server?
No. As far as the HTTP client is concerned, the inner workings of the server are completely opaque. All it sees is a TCP stream of bytes from the server, and how exactly that stream is produced doesn't matter as long as the body matches the specified Content-Type.
Just send the PDF right after the HTTP headers and you're done.
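To make "send the PDF right after the headers" concrete, here is a minimal sketch as a plain WSGI app, assuming Python on the server side. render_pdf is a hypothetical stand-in for whatever PDF library you actually use; the stub does not produce a valid PDF, it only marks where the in-memory bytes come from. Nothing is written to disk.

```python
def render_pdf(form_data):
    # Hypothetical stand-in: call your real PDF generator here.
    # This stub is NOT a valid PDF; it only shows that the bytes live in memory.
    return b"%PDF-1.4\n% stub body\n%%EOF\n"


def build_response(form_data):
    """Return (status, headers, body) for a PDF generated on the fly."""
    body = render_pdf(form_data)
    headers = [
        ("Content-Type", "application/pdf"),
        ("Content-Disposition", 'attachment; filename="foo.pdf"'),
        ("Content-Length", str(len(body))),
    ]
    return "200 OK", headers, body


def app(environ, start_response):
    # Form parsing is omitted; an empty dict stands in for the form data.
    status, headers, body = build_response({})
    start_response(status, headers)
    return [body]
```

To try it locally, something like wsgiref.simple_server.make_server("", 8000, app).serve_forever() from the standard library should do.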
Update due to comment
If you're wondering how to supply a filename without using a header field: just append it to the URL, for example
http://${DOMAIN}/${PDF_GENERATOR}/${DESIRED_FILENAME}
In the HTTP server, add a rewrite rule that strips the filename part and internally maps the request (without sending an HTTP redirect to the client) to just
http://${DOMAIN}/${PDF_GENERATOR}
The HTTP client does not see the rewrite; all it sees is a URL ending with a "filename", which it can present to the user as the default name for saving.
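If you handle the routing in application code instead of a server rewrite rule, the same idea can be sketched like this. The /pdfgen prefix and the resolve function are made-up names for illustration; the only point is that whatever follows the generator's path is cosmetic and is ignored when picking the handler.

```python
GENERATOR_PREFIX = "/pdfgen"  # hypothetical mount point of the PDF generator


def resolve(path_info):
    """Map a request path to (handler_path, suggested_filename).

    The trailing segment only exists so the browser has a name to
    suggest in its save dialog; the handler is the same either way.
    """
    if path_info == GENERATOR_PREFIX:
        return GENERATOR_PREFIX, None
    if path_info.startswith(GENERATOR_PREFIX + "/"):
        filename = path_info[len(GENERATOR_PREFIX) + 1:]
        return GENERATOR_PREFIX, filename or None
    return None, None
```

With this, /pdfgen/foo.pdf and /pdfgen/bar.pdf both end up at the same generator, and the client never learns that the last segment was dropped.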

Related

Flask send_file: how do I know if I need "as_attachment"?

Flask has a method to return a file: http://flask.pocoo.org/docs/1.0/api/#flask.send_file
There is a parameter called as_attachment with a default of False, and there is a handwavy statement about it: "For extra security you probably want to send certain files as attachment (HTML for instance)"
How do I know if my use case is "those certain files"? Or alternatively stated, what does this do as opposed to leaving this as False?
You'll find a clue in the same documentation, further down in the parameter list:
as_attachment – set to True if you want to send this file with a Content-Disposition: attachment header.
So when the flag is set, an extra header is added to the response, which controls how the browser will handle the response. From the MDN documentation on Content-Disposition:
In a regular HTTP response, the Content-Disposition response header is a header indicating if the content is expected to be displayed inline in the browser, that is, as a Web page or as part of a Web page, or as an attachment, that is downloaded and saved locally.
Without an explicit Content-Disposition header, a text/html response from your Flask server will be shown as a web page in the browser. If you needed the file to be saved to disk instead (the browser prompting you what to do with the file), then you need to have a Content-Disposition: attachment set.
So when your response content type is likely to be shown in the browser as a web page but you want the user to download it instead, use as_attachment=True. These days, in addition to HTML, you probably want to set that flag for images, PDF files, and XML as well.
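As a rough rule of thumb, that decision could be sketched as below. This is only an illustration: needs_as_attachment and the list of types are my own assumptions, taken from the examples named above (HTML, images, PDF, XML), not something Flask provides.

```python
# Types a browser will typically render itself (illustrative, not exhaustive).
RENDERED_INLINE = {"text/html", "application/pdf", "application/xml", "text/xml"}


def needs_as_attachment(mimetype, user_should_download):
    """Return True when as_attachment=True is worth setting.

    It only matters for types the browser would otherwise display;
    types it cannot display get a download prompt anyway.
    """
    base = mimetype.split(";", 1)[0].strip().lower()
    rendered_inline = base in RENDERED_INLINE or base.startswith("image/")
    return user_should_download and rendered_inline
```

You would then pass the result as the as_attachment argument of send_file.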

Header Accept in HTTP

I have a problem with the "Accept" header in HTTP. I've written an HTTP client, and when I set "Accept: image/png" I can still read any file (like txt, html, etc).
I think it shouldn't be possible when header "Accept" is set like above.
I tried to check how Firefox behaves. I opened "about:config" and set "network.http.accept.default" to "image/png", and I can still surf the net as usual.
Am I misunderstanding the meaning of this header? I think that I should only be able to open *.png files.
Accept isn't mandatory; the server can (and often does) either not implement it or decide to return something else.
If the [Accept] header field is present in a request and none of the available representations for the response have a media type that is listed as acceptable, the origin server can either honor the header field by sending a 406 (Not Acceptable) response or disregard the header field by treating the response as if it is not subject to content negotiation.
Source - RFC 7231 5.3.2. Accept
Actually, the behavior you observed is normal. Let me give you an example.
If the given URL points to a PDF file and the Accept header accepts only docx, then the server will blindly ignore the header and send the PDF file, because the server is not set up to choose between PDF and other document formats.
If there are multiple formats available, then the server will consider the "Accept" header and try to send the response accordingly; if not, it will ignore the "Accept" header.
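A simplified sketch of that server-side choice, assuming Python. The strict flag is a made-up switch between the two options the RFC allows (406 versus ignoring the header), and the parsing is deliberately minimal: it sorts by q-value only and skips the specificity rules of a full implementation.

```python
def parse_accept(header):
    """Parse an Accept header into [(media_range, q)], highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((fields[0].lower(), q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range, media_type):
    if media_range == "*/*":
        return True
    rtype, _, rsub = media_range.partition("/")
    mtype, _, msub = media_type.partition("/")
    return rtype == mtype and rsub in ("*", msub)


def negotiate(accept_header, available, strict=False):
    """Return (status, media_type); available[0] is the server's default."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in available:
            if matches(media_range, media_type):
                return 200, media_type
    if strict:
        return 406, None  # honor the header
    return 200, available[0]  # disregard the header
```

With a single representation and strict left off, negotiate("image/png", ["application/pdf"]) still returns the PDF, which is exactly the behavior the question describes.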
As you suppose, setting Accept means that you can't accept media types other than those specified, and servers should return a 406 response code.
In practice, servers don't implement this correctly and always send a response.
All details are available in RFC 2616
The accept header is poorly implemented by browsers and causes strange errors when used on public sites where crawlers make requests too.
That's why the Accept header is ignored most of the time, as in the Rails framework.

Difference between downloading a web file directly & indirectly

On my web server I have a video file named 03.mp4.
I have a page (videoserver.aspx) to serve that file using below code
Response.ContentType = "application/octet-stream";
Response.AppendHeader("Content-Disposition", "attachment; filename=video.mp4");
Response.TransmitFile(Server.MapPath("03.mp4"));
Response.End();
What's the difference between these 2 calls?
1: http://localhost/media/03.mp4
2: http://localhost/media/videoserver.aspx?q=03
When I point to those URLs directly in my browser, it prompts me with a Save dialog in both cases.
I have another web page that has a SWFObject. It consumes a video as input. Ok. When I feed it URL 1, it loads the video.
When I feed it URL 2, it doesn't load the video.
Why this difference? I prefer URL 2 as you can dynamically change the videos you are serving to consumers based on the query-string.
A lot of video players, including the new HTML5 <video> element, require support for so-called byte range requests using the HTTP Range header. This is normally already built into any self-respecting HTTP server. Basically, to inform the client that the requested URL supports byte range requests, the server is supposed to return Accept-Ranges: bytes in the response and to be able to process all incoming Range requests by serving exactly the requested byte ranges back in the response, as per the specification (see the specification of the Range header for details).
So if you choose to take the HTTP response handling fully in your own hands instead of letting the HTTP server do the job it is designed for, you have to take this carefully into account.
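To give an idea of what "taking this into account" involves, here is a sketch in Python of the parsing a handler would need. It handles only a single byte range (multi-range requests are rejected), and the function names are my own; a range it returns would be answered with status 206 and the headers below, while None would mean falling back to a full 200 response or a 416.

```python
def parse_byte_range(header, size):
    """Return (start, end), inclusive, for a single-range Range header.

    Returns None if the header is malformed, uses several ranges,
    or cannot be satisfied for a resource of `size` bytes.
    """
    unit, _, spec = header.partition("=")
    if unit.strip().lower() != "bytes" or "," in spec:
        return None
    first, sep, last = spec.strip().partition("-")
    if not sep:
        return None
    try:
        if first == "":
            # Suffix form "bytes=-N": the last N bytes.
            length = int(last)
            if length <= 0:
                return None
            start, end = max(size - length, 0), size - 1
        else:
            start = int(first)
            end = int(last) if last else size - 1
    except ValueError:
        return None
    end = min(end, size - 1)
    if start < 0 or start > end:
        return None
    return start, end


def range_response_headers(start, end, size):
    """Headers to send along with a 206 Partial Content response."""
    return [
        ("Accept-Ranges", "bytes"),
        ("Content-Range", f"bytes {start}-{end}/{size}"),
        ("Content-Length", str(end - start + 1)),
    ]
```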
Hence it proves I am a newbie to SWFObject.
The SWFObject I was referring to was dished out by Camtasia and it accepts an mp4 file through FlashVars.
The question is "why did it not accept URL 2 while it accepted URL 1?". To which the answer is, URL 2 was not ending with .mp4.
The solution to my problem, then, was to create a handler that accepts the */media/*.mp4 path pattern and returns the appropriate file's content, which in my case is fetched from a DB.

nginx resumable upload with upload_module and multipart/form

I currently upload to a webservice on an nginx server using the upload module (http://www.grid.net.ru/nginx/upload.en.html) from a custom desktop application doing a simple multipart-form POST that sends a file in one part and a base64 encoded XML with the file's metadata in another part.
The server receives this POST, passes it to my webservice which reads the metadata, processes the file and all is good.
What I want to do now is use the upload module's upload_resumable directive to do the POST in several chunks to minimize disconnection chances and allow resume. I can currently do this following the protocol described here: http://www.grid.net.ru/nginx/resumable_uploads.en.html
One sends byte ranges of the file along with some headers to identify the chunk and the session in several posts and once all the parts have been uploaded, nginx will compose the final POST containing the file name and path and pass it to your upload_pass location (which in my case CGIs to a django app).
However, I am not clear on how one would send a multipart post with this method since the protocol indicates that the body of the POST must be the bytes indicated in the byte range. I need the final post to also contain the XML I wrote about above.
I can think of sending the XML as the first bytes of the body and a header that indicates how many bytes belong to it but that would mean extra handling of the final file to remove that header and the final files are potentially in the GB size range.
Any other ideas?
Since the protocol supported by nginx specifically states that the post should not be multipart I ended up sending the file in the body, and the rest of the parameters encoded in the URL. Not the prettiest URLs but it works.
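Roughly, the client side of that looks like the sketch below (Python, requests are only planned here, not sent). Treat the header names Session-ID and X-Content-Range as my recollection of the module's resumable upload protocol and check them against the page linked in the question; plan_chunks, the example URL and the parameter names are made up for illustration.

```python
from urllib.parse import urlencode


def plan_chunks(upload_url, file_size, chunk_size, session_id, filename, params):
    """Yield (url, headers, start, end) for each chunk of the upload.

    The metadata travels in the query string, so every request body
    can be exactly the raw bytes start..end of the file.
    """
    url = f"{upload_url}?{urlencode(params)}"
    for start in range(0, file_size, chunk_size):
        end = min(start + chunk_size, file_size) - 1  # inclusive
        headers = {
            "Content-Type": "application/octet-stream",
            "Content-Disposition": f'attachment; filename="{filename}"',
            # Header names below are assumptions; verify with the module docs.
            "X-Content-Range": f"bytes {start}-{end}/{file_size}",
            "Session-ID": session_id,
            "Content-Length": str(end - start + 1),
        }
        yield url, headers, start, end
```

Each tuple would then be turned into one POST whose body is the file slice from start to end inclusive.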

Should a HTTP POST'ed file be base64 encoded?

I'm currently implementing a client application that POST's a file over HTTP and have implemented base64 encoding on the file's data parameter.
However, it appears that when inspecting the traffic between a simple HTML page with a file upload form and the server that no Content-Transfer-Encoding header is sent in the body when describing the file's parameter.
Is this the preferred way of POST'ing a file over HTTP?
No, the preferred way is using multipart/form-data encoding, exactly as you would use with HTML form based file uploads.
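For a hand-rolled client, a multipart/form-data body can be put together roughly like this in Python. It is a sketch under simplifying assumptions: encode_multipart is my own helper, field names and the filename are not escaped, and the random boundary is assumed not to occur in the file. The point is that the file part carries the raw bytes, with no base64 and no Content-Transfer-Encoding header.

```python
import uuid


def encode_multipart(fields, file_field, filename, file_bytes,
                     file_type="application/octet-stream"):
    """Return (body, content_type) for a multipart/form-data POST."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    lines += [
        f"--{boundary}".encode(),
        (f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"').encode(),
        f"Content-Type: {file_type}".encode(),
        b"",
        file_bytes,  # raw bytes, sent as-is
        f"--{boundary}--".encode(),
        b"",
    ]
    body = b"\r\n".join(lines)
    return body, f"multipart/form-data; boundary={boundary}"
```

The returned content type goes into the request's Content-Type header, and the body is sent unchanged.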

Resources