After reading some posts and the Wikipedia article, I still don't have a clear picture of practical uses of chunked transfer encoding.
One example I found, in a discussion of the Content-Length header versus chunked encoding, is:
On the other hand, if the content length is really unpredictable
beforehand (e.g. when your intent is to zip several files together and
send it as one), then sending it in chunks may be faster than
buffering it in server's memory or writing to local disk file system
first.
So does that mean I can send zip files while I am still zipping them? How?
I've also noticed that when I download a GitHub repo, the data arrives chunked. Does GitHub also send files in this way (sending while zipping)?
A minimal example would be much appreciated. :)
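For context, "chunked" only describes how the body is framed on the wire: each chunk is preceded by its size in hexadecimal, and a zero-length chunk terminates the body, so the sender never needs to know the total length in advance. Here is a minimal sketch of that framing in Python (a generic illustration, not tied to any particular server):

def to_chunked(parts):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    for part in parts:
        if part:  # an empty part would look like the terminating chunk
            yield b"%X\r\n%s\r\n" % (len(part), part)
    yield b"0\r\n\r\n"  # zero-length chunk marks the end of the body

# Produces: b'5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n'
print(b"".join(to_chunked([b"hello", b" world"])))

The answer below shows how this pairs with building a zip archive on the fly.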
Here is an example using Perl (with the IO::Compress::Zip module) to send a zip file on the fly, as @deceze pointed out:
use IO::Compress::Zip qw(:all);

my @files = ('example.gif', 'example1.png'); # here are some files
my $path  = '/home/projects';                # files location

# here is the header: no Content-Length is sent, so the server is free to
# stream the body (e.g. with chunked transfer encoding) as we produce it
print "Content-Type: application/zip\r\n";
print "Content-Disposition: attachment; filename=\"zip.zip\"\r\n\r\n"; # zip.zip is just the name the client will see

binmode STDOUT; # the archive is binary data

# write the archive straight to the client; the first member is the first file
my $zip = IO::Compress::Zip->new(\*STDOUT, Name => $files[0], Method => ZIP_CM_STORE);

for my $i (0 .. $#files) {
    my $file = $files[$i];
    $zip->newStream(Name => $file, Method => ZIP_CM_STORE) if $i > 0; # start the next stored member
    open(my $fh, '<', "$path/$file") or die "Cannot open $path/$file: $!";
    binmode $fh; # read the file in binary mode
    my ($data, $n);
    while (($n = read($fh, $data, 1024)) != 0) { # read the file to the end, 1 KB at a time
        $zip->print($data); # write each block into the archive as soon as it is read
    }
    close($fh);
}
$zip->close;
As you can see in the script, the filename given in the Content-Disposition header is just a label for the client; we zip the files and print the compressed data in binary mode right away. There is no need to build the whole archive, store it somewhere, and only then send it: you can zip the files and stream them to the client directly.
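The same idea, sketched in Python for readers who don't use Perl (assuming a CGI-style setup where standard output becomes the HTTP response body; the file names are placeholders):

import sys
import zipfile

files = ['example.gif', 'example1.png']  # placeholder file names
path = '/home/projects'

sys.stdout.write('Content-Type: application/zip\r\n')
sys.stdout.write('Content-Disposition: attachment; filename="zip.zip"\r\n\r\n')
sys.stdout.flush()

out = sys.stdout.buffer  # binary stdout, i.e. the response body

# zipfile can write to a non-seekable stream, so the archive is emitted as it is built;
# ZIP_STORED mirrors ZIP_CM_STORE in the Perl example (no compression)
with zipfile.ZipFile(out, mode='w', compression=zipfile.ZIP_STORED) as zf:
    for name in files:
        with open('%s/%s' % (path, name), 'rb') as src, zf.open(name, 'w') as dst:
            while True:
                block = src.read(1024)
                if not block:
                    break
                dst.write(block)  # each block goes out as soon as it is read

Because the total size is never known up front, a server in front of this can send the body with chunked transfer encoding.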
Related
I want to benchmark an application by making many HTTP PUT requests that include a file body. I have many files and each file needs to be sent only once.
For now I am trying to do that using wrk. One way I have found is to split my data into several directories and give each wrk thread its own directory. But my big problem is how to pass a file as the PUT body (basically what curl -T does). For now I am doing it by reading the file in the Lua script and putting its content into wrk.body, which is not very performant (too slow).
Here is the part of the code I am using to do the PUT with a file body:
function read_file(path)
    local file, errorMessage = io.open(path, "r")
    if not file then
        error("Could not read the file: " .. path .. " Error : " .. errorMessage .. "\n")
    end
    local content = file:read "*all"
    file:close()
    return content
end

request = function()
    local body = read_file("/data/" .. id .. "/" .. files[counter])
    counter = counter + 1
    local Boundary = "----WebKitFormBoundaryePkpFF7tjBAqx29L"
    wrk.headers["Content-Type"] = "multipart/form-data; boundary=" .. Boundary
    return wrk.format("PUT", url, wrk.headers, body)
end
I just want to know if there is a more efficient way to send a file as the body of a PUT (or POST) request with wrk.
I am trying to download files to disk from Squeak. My method worked fine for small text/html files, but due to the lack of buffering it was very slow for the large binary file https://mirror.racket-lang.org/installers/6.12/racket-6.12-x86_64-win32.exe. Also, after it finished, the file was much larger (113 MB) than shown on the download page (75 MB).
My code looks like this:
download: anURL
    "download a file over HTTP and save it to disk under a name extracted from url."
    | ios name |
    name := ((anURL findTokens: '/') removeLast findTokens: '?') removeFirst.
    ios := FileStream oldFileNamed: name.
    ios nextPutAll: ((HTTPClient httpGetDocument: anURL) content).
    ios close.
    Transcript show: 'done'; cr.
I have tried reading fixed-size blocks from the HTTP response's contentStream, using [bytes := stream next: bufSize. bytes printTo: ios] inside a [stream atEnd] whileFalse: loop, but that garbled the output file with single quotes around each block, and also added extra content after the blocks, which looked like all the characters of the stream, each one singly quoted.
How can I implement buffered writing of an HTTP response to a disk file?
Also, is there a way to do this in Squeak while showing download progress?
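For comparison, the general buffered pattern the question is asking for looks like this in Python (a sketch using the third-party requests library; the Squeak answers below implement the same idea natively):

import requests

url = 'https://mirror.racket-lang.org/installers/6.12/racket-6.12-x86_64-win32.exe'

with requests.get(url, stream=True) as response:
    response.raise_for_status()
    with open('racket-6.12-x86_64-win32.exe', 'wb') as out:    # binary mode matters
        for block in response.iter_content(chunk_size=4096):   # fixed-size blocks
            out.write(block)

Reading and writing in fixed-size binary blocks keeps memory use flat and also gives a natural hook for reporting progress after each block.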
As Leandro already wrote, the issue is with #binary.
Your code is nearly correct; I have taken the liberty of running it, and with the change below it downloads the whole file correctly:
| ios name anURL |
anURL := 'https://mirror.racket-lang.org/installers/6.12/racket-6.12-x86_64-win32.exe'.
name := ((anURL findTokens: '/') removeLast findTokens: '?') removeFirst.
ios := FileStream newFileNamed: 'C:\Users\user\Downloads\_squeak\', name.
ios binary.
ios nextPutAll: ((HTTPClient httpGetDocument: anURL) content).
ios close.
Transcript show: 'done'; cr.
As for the freezing, I think the issue is that the whole environment runs in a single thread while you are downloading. That means that until the whole file is downloaded you won't be able to use Squeak.
I just tested in Pharo (easier to install) and the following code does what you want:
ZnClient new
url: 'https://mirror.racket-lang.org/installers/6.12/racket-6.12-x86_64-win32.exe';
downloadTo: 'C:\Users\user\Downloads\_squeak'.
The WebResponse class, when building the response content, creates a buffer large enough to hold the entire response, even for huge responses! I think this happens due to code in WebMessage>>#getContentWithProgress:.
I tried to copy data from the input SocketStream of WebResponse directly to an output FileStream.
I had to subclass WebClient and WebResponse, and write two methods.
Now the following code works as required.
| client link |
client := PkWebClient new.
link := 'http://localhost:8000/racket-6.12-x86_64-linux.sh'.
client download: link toFile: '/home/yo/test'.
I have verified block by block update and integrity of the downloaded file.
I include the source below. The method streamContentDirectToFile: aFilePathString is the one that does things differently and solves the problem.
WebClient subclass: #PkWebClient
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'PK'!
!PkWebClient commentStamp: 'pk 3/28/2018 20:16' prior: 0!
Trying to download http directly to file.!
!PkWebClient methodsFor: 'as yet unclassified' stamp: 'pk 3/29/2018 13:29'!
download: urlString toFile: aFilePathString
    "Try to download large files sensibly"
    | res |
    res := self httpGet: urlString.
    res := PkWebResponse new copySameFrom: res.
    res streamContentDirectToFile: aFilePathString! !
WebResponse subclass: #PkWebResponse
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'PK'!
!PkWebResponse commentStamp: 'pk 3/28/2018 20:49' prior: 0!
To make getContentwithProgress better.!
!PkWebResponse methodsFor: 'as yet unclassified' stamp: 'pk 3/29/2018 13:20'!
streamContentDirectToFile: aFilePathString
    "stream response's content directly to file."
    | buffer ostream |
    stream binary.
    buffer := ByteArray new: 4096.
    ostream := FileStream oldFileNamed: aFilePathString.
    ostream binary.
    [stream atEnd]
        whileFalse: [buffer := stream nextInBuffer: 4096.
            stream receiveAvailableData.
            ostream nextPutAll: buffer].
    stream close.
    ostream close! !
In Ruby:
require 'open-uri'
download = open('http://example.com/download.pdf')
IO.copy_stream(download, '~/my_file.pdf')
How to do the same in Crystal?
We can do the following:
require "http/client"
HTTP::Client.get("http://example.org") do |response|
  File.write("example.com.html", response.body_io)
end
This writes just the response body, without any HTTP headers, to the file. File.write is also smart enough not to download the entire file into memory first, but to write to the file as it reads chunks from the given IO.
I found something that works:
require "http/request"
require "file"
res = HTTP::Client.get "https://ya.ru"
fl=File.open("ya.html","wb")
res.to_io(fl)
fl.close
I have a ZIP file to be served via Symfony. The controller looks like this:
$headers = [
    'Content-Type' => 'application/zip',
    'Content-Disposition' => 'attachment; filename="archive.zip"'
];

return new Response(file_get_contents($pathToFile), 201, $headers);
And this one works well. However, if I try to use BinaryFileResponse (as the documentation recommends), the ZIP file gets corrupted:
$response = new BinaryFileResponse($pathToFile);
$response->setContentDisposition(ResponseHeaderBag::DISPOSITION_ATTACHMENT);
$response->setStatusCode(Response::HTTP_CREATED);
return $response;
The output I get when trying to fix the file with zip -FF archive.zip --out fixed.zip:
zip warning: End record (EOCDR) only 17 bytes - assume truncated
(this command fixes the archive correctly)
Is it a bug or am I doing something wrong?
My setup:
Symfony 2.8.11
PHP 7.0.8
Ubuntu 16.04
nginx 1.10.0
EDIT:
I have made the proposed changes, but the problem still exists:
$response = new BinaryFileResponse($pathToFile);
$response->setContentDisposition(ResponseHeaderBag::DISPOSITION_ATTACHMENT, 'archive.zip');
$response->headers->set('Content-Type', 'application/zip');
clearstatcache(false, $pathToFile);
return $response;
EDIT2:
I found one more interesting thing: serving this ZIP file with the standard Response (the working code) creates an openable file; however, running zip -T on it gives:
1 extra byte at beginning or within zipfile
Testing the original file gives:
OK
The size of the file is less than 1 MB.
SOLUTION:
When I opened the generated ZIP file in a text editor, I found an extra empty line at the beginning of it...
So I've added ob_clean(); before returning the Response object and now it works!
No idea where this newline character came from, though...
Since I see you are returning a 201 status code, I assume the file has been created during the same request. As per the Symfony documentation:
If you just created the file during this same request, the file may be sent without any content. This may be due to cached file stats that return zero for the size of the file. To fix this issue, call clearstatcache(false, $file) with the path to the binary file.
Although they resemble files, objects in Amazon S3 aren't really "files", just like S3 buckets aren't really directories. On a Unix system I can use head to preview the first few lines of a file, no matter how large it is, but I can't do this on S3. So how do I do a partial read on S3?
S3 objects can be huge, but you don't have to fetch the entire thing just to read the first few bytes. The S3 APIs support the HTTP Range: header (see RFC 2616), which takes a byte-range argument.
Just add a Range: bytes=0-NN header to your S3 request, where NN is the last byte offset you want, and you'll fetch only those bytes rather than the whole object. Now you can preview that 900 GB CSV file you left in an S3 bucket without waiting for the entire thing to download. Read the full GET Object documentation on Amazon's developer site.
The AWS .NET SDK appears to allow only fixed-ended ranges (RE: public ByteRange(long start, long end)). What if I want to start in the middle and read to the end? An HTTP range of Range: bytes=1000- is perfectly acceptable for "start at byte 1000 and read to the end"; I do not believe the .NET library allows for this.
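A quick way to confirm that the open-ended form works against S3 itself is to send it through boto3, which passes the Range string through unchanged (a sketch with placeholder bucket and key names; the answer below shows the bounded form):

import boto3

s3 = boto3.client('s3')

# 'bytes=1000-' means "from byte 1000 to the end of the object"
resp = s3.get_object(Bucket='my-bucket', Key='my-file.bin', Range='bytes=1000-')
tail = resp['Body'].read()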
The get_object API has an argument for a partial read:
import boto3

s3 = boto3.client('s3')
# bucket, key, start_byte and stop_byte are assumed to be defined by the caller
resp = s3.get_object(Bucket=bucket, Key=key, Range='bytes={}-{}'.format(start_byte, stop_byte - 1))
res = resp['Body'].read()
Using Python you can preview the first records of a compressed file.
Connect using boto:
# Connect using boto:
import boto

s3 = boto.connect_s3()
bname = 'my_bucket'
bucket = s3.get_bucket(bname, validate=False)
Read the first 20 lines from the gzip-compressed file:
# Read the first 20 records
import csv
import io
from gzip import GzipFile
from boto.s3.key import Key

limit = 20
k = Key(bucket)
k.key = 'my_file.gz'
k.open()
gzipped = GzipFile(None, 'rb', fileobj=k)
reader = csv.reader(io.TextIOWrapper(gzipped, newline="", encoding="utf-8"), delimiter='^')
for id, line in enumerate(reader):
    if id >= int(limit):
        break
    print(id, line)
So it's the equivalent of the following Unix command:
zcat my_file.gz | head -20