Crystal lang how to get binary file from http - http

In Ruby:
require 'open-uri'
download = URI.open('http://example.com/download.pdf')
IO.copy_stream(download, File.expand_path('~/my_file.pdf'))
How to do the same in Crystal?

We can do the following:
require "http/client"
HTTP::Client.get("http://example.org") do |response|
  File.write("example.com.html", response.body_io)
end
This writes just the response without any HTTP headers to the file. File.write is also smart enough to not download the entire file into memory first, but to write to the file as it reads chunks from the given IO.
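For comparison, the same chunk-by-chunk idea can be sketched in Python rather than Crystal. This is only an illustration: the helper names copy_stream and download are made up for this example, and the 64 KiB chunk size is an arbitrary assumption.

```python
import shutil
import urllib.request


def copy_stream(src, dest_path, chunk_size=64 * 1024):
    """Copy a binary file-like object to dest_path in fixed-size chunks."""
    with open(dest_path, "wb") as dest:
        # copyfileobj reads and writes chunk_size bytes at a time,
        # so the whole body is never held in memory at once.
        shutil.copyfileobj(src, dest, chunk_size)


def download(url, dest_path):
    """Stream the response body of url into dest_path (headers are not written)."""
    with urllib.request.urlopen(url) as response:
        copy_stream(response, dest_path)
```

A call such as download("http://example.com/download.pdf", "my_file.pdf") would then save only the body, much like the File.write call above.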

I found something that works:
require "http/client"
res = HTTP::Client.get "https://ya.ru"
# Note: Response#to_io serializes the status line and headers as well as the body
File.open("ya.html", "wb") do |fl|
  res.to_io(fl)
end

Related

Examples that send unknown-size data with the HTTP chunked header?

I still don't have a clear picture of how the chunked header is used in practice, even after reading some posts and Wikipedia.
One example I found, in "Content-Length header versus chunked encoding", is:
On the other hand, if the content length is really unpredictable
beforehand (e.g. when your intent is to zip several files together and
send it as one), then sending it in chunks may be faster than
buffering it in server's memory or writing to local disk file system
first.
So does that mean I can send zip files while I am still zipping them? How?
I've also noticed that when I download a GitHub repo, I receive the data chunked. Does GitHub also send files this way (sending while zipping)?
A minimal example would be much appreciated. :)
Here is an example using Perl (with the IO::Compress::Zip module) to send a zipped file on the fly, as @deceze pointed out:
use strict;
use warnings;
use IO::Compress::Zip qw(:all);
my @files = ('example.gif', 'example1.png'); # files to send
my $path = "/home/projects/"; # directory containing the files
# CGI response headers; the archive body follows the blank line
print "Content-Type: application/zip\r\n";
print "Content-Disposition: attachment; filename=\"zip.zip\"\r\n\r\n"; # zip.zip is only the name suggested to the client
# Write the archive straight to STDOUT instead of a file on disk
my $zip = IO::Compress::Zip->new(\*STDOUT)
    or die "zip failed: $ZipError\n";
foreach my $file (@files) {
    # Start a new archive member, stored without compression
    $zip->newStream(Name => $file, Method => ZIP_CM_STORE);
    open(my $fh, "<", "$path/$file") or die "Cannot open $path/$file: $!\n";
    binmode $fh; # read the file as raw bytes
    while (read($fh, my $data, 1024)) { # copy 1024 bytes at a time until EOF
        $zip->print($data); # emit the bytes into the archive stream
    }
    close($fh);
}
$zip->close;
As the script shows, the zip filename in the header is only a label for the client: no zip.zip file is ever created on the server. Each file is added to the archive and printed in binary mode as it is read, so there is no need to build and store the zip before sending it to the client.
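The same "zip while sending" idea can be sketched in Python with the standard zipfile module, which can write to a non-seekable stream. The names ChunkSink and stream_zip are hypothetical, and the input format (pairs of archive name and bytes) is an assumption made for the example. Each yielded piece could be handed to a server that uses chunked transfer encoding, since the total size is not known up front.

```python
import zipfile


class ChunkSink:
    """Write-only, non-seekable sink that collects whatever zipfile writes."""

    def __init__(self):
        self._chunks = []

    def write(self, data):
        self._chunks.append(bytes(data))
        return len(data)

    def flush(self):
        pass  # nothing is buffered beyond the chunk list

    def drain(self):
        """Return the chunks written so far and forget them."""
        chunks, self._chunks = self._chunks, []
        return chunks


def stream_zip(files):
    """Yield the bytes of a zip archive piece by piece.

    files is an iterable of (archive_name, content_bytes) pairs.
    """
    sink = ChunkSink()
    # ZIP_STORED mirrors ZIP_CM_STORE in the Perl script (no compression).
    with zipfile.ZipFile(sink, "w", zipfile.ZIP_STORED) as archive:
        for name, content in files:
            with archive.open(name, "w") as member:
                member.write(content)
            # Hand over this member's bytes before touching the next file.
            yield from sink.drain()
    # Closing the archive writes the central directory; emit that last.
    yield from sink.drain()
```

Joining the yielded pieces gives a complete archive, but a server never has to hold more than one member's output at a time.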

Build web graph with wget

I'm using wget with -r (recursive) option, to crawl and download all the pages starting from a root.
For debugging purposes I'd like to output which page led me to another one, for example: https://stackoverflow.com/ -> https://stackoverflow.com/questions
Is there such a way to do that?
Please note that I explicitly need to use wget.
The best solution I have found so far is to use the --warc-file option to export a WARC archive of my crawl. This format also stores the Referer.
Using a Python library to read the output, I wrote the following simple script to export a CSV with source/target columns:
import warc
f = warc.open("crawler.warc")
for record in f:
    if record['WARC-Type'] != 'request':
        continue
    for line in record.payload:
        if line.startswith("Referer:"):
            print line.replace("Referer: ", "").strip('\n\r'), ",", record['WARC-Target-URI']
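A variant of the extraction step, sketched in Python 3 and independent of the warc library: it assumes you already have pairs of (target URI, raw request header lines), which is a made-up input shape for illustration. The helper names find_referer and write_edges are hypothetical. It matches the header name case-insensitively and lets the csv module handle quoting.

```python
import csv
import sys


def find_referer(header_lines):
    """Return the Referer value from raw HTTP request header lines, or None."""
    for line in header_lines:
        if isinstance(line, bytes):
            line = line.decode("iso-8859-1")
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "referer":
            return value.strip()
    return None


def write_edges(records, out=sys.stdout):
    """Write source,target CSV rows for (target_uri, header_lines) pairs."""
    writer = csv.writer(out)
    writer.writerow(["source", "target"])
    for target_uri, header_lines in records:
        source = find_referer(header_lines)
        if source is not None:  # the root page has no Referer
            writer.writerow([source, target_uri])
```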

S3: How to do a partial read / seek without downloading the complete file?

Although they resemble files, objects in Amazon S3 aren't really "files", just as S3 buckets aren't really directories. On a Unix system I can use head to preview the first few lines of a file, no matter how large it is, but I can't do this on S3. So how do I do a partial read on S3?
S3 files can be huge, but you don't have to fetch the entire thing just to read the first few bytes. The S3 APIs support the HTTP Range: header (see RFC 2616), which takes a byte range argument.
Just add a Range: bytes=0-NN header to your S3 request, where NN is the offset of the last byte you want (the range is inclusive, so bytes=0-99 returns the first 100 bytes), and you'll fetch only those bytes rather than the whole file. Now you can preview that 900 GB CSV file you left in an S3 bucket without waiting for the entire thing to download. See the GET Object documentation in Amazon's developer docs for details.
The AWS .NET SDK appears to support only closed ranges (see public ByteRange(long start, long end)). What if I want to start in the middle and read to the end? An HTTP range of Range: bytes=1000- is perfectly acceptable for "start at byte 1000 and read to the end", but I do not believe the .NET library allows for it.
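The boto3 get_object API has a Range argument for partial reads:
import boto3
s3 = boto3.client('s3')
# Range is inclusive, so subtract 1 to read bytes start_byte .. stop_byte-1
resp = s3.get_object(Bucket=bucket, Key=key, Range='bytes={}-{}'.format(start_byte, stop_byte-1))
res = resp['Body'].read()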
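Since the inclusive end offset is easy to get wrong, here is a small sketch of helpers that build the Range value, including the open-ended form. The names byte_range and suffix_range are made up for this example; only the header syntax itself comes from HTTP.

```python
def byte_range(start=0, stop=None):
    """Build an HTTP Range value for bytes [start, stop).

    stop is exclusive, like a Python slice; None means "to the end".
    """
    if start < 0:
        raise ValueError("start must be non-negative")
    if stop is None:
        return "bytes={}-".format(start)
    if stop <= start:
        raise ValueError("stop must be greater than start")
    # HTTP ranges are inclusive, hence stop - 1.
    return "bytes={}-{}".format(start, stop - 1)


def suffix_range(count):
    """Build a Range value for the last count bytes of the object."""
    if count <= 0:
        raise ValueError("count must be positive")
    return "bytes=-{}".format(count)
```

For instance, Range=byte_range(1000) in the get_object call above would request everything from byte 1000 onward.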
Using Python, you can preview the first records of a compressed file.
Connect using boto:
import csv
import io
from gzip import GzipFile
import boto
from boto.s3.key import Key
# Connect
s3 = boto.connect_s3()
bname = 'my_bucket'
bucket = s3.get_bucket(bname, validate=False)
Read the first 20 lines from a gzip-compressed file:
# Read the first 20 records
limit = 20
k = Key(bucket)
k.key = 'my_file.gz'
k.open()
gzipped = GzipFile(None, 'rb', fileobj=k)
reader = csv.reader(io.TextIOWrapper(gzipped, newline="", encoding="utf-8"), delimiter='^')
for id, line in enumerate(reader):
    if id >= limit:
        break
    print(id, line)
This is the equivalent of the following Unix command:
zcat my_file.gz | head -20

Get the real extension of a file vb.net

I would like to determine the real extension of a file.
Example:
file = "test.fakeExt"
// but the real extension is .exe; for security reasons I want to avoid using it!
How can I do that?
If you want to determine the real file type, you could use FindMimeFromData.
It looks at the first part of the file to determine what type of file it is.
FindMimeFromData function
Sample code
The first two bytes of an .exe file are always 'MZ'.
So you could read the file in binary mode and check whether the first two bytes are MZ; if they are, you know it's an .exe file.
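The two-byte check is short enough to sketch. The example below is in Python rather than VB.NET, and looks_like_exe is a made-up name. Note that the MZ signature also begins DLLs and other Windows executable images, so a match means "executable image" rather than strictly ".exe".

```python
def looks_like_exe(path):
    """Return True if the file starts with the 'MZ' executable signature."""
    with open(path, "rb") as f:
        # Only the first two bytes are read, whatever the file size or name.
        return f.read(2) == b"MZ"
```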

Writing packet information into text file

I wrote the following code to output the source and destination addresses of all packets in a .pcap file to a text file, using Lua and tshark.
#!/usr/bin/lua
do
    local file = io.open("luawrite", "w")
    local function init_listener()
        local tap = Listener.new("ipv6")
        function tap.packet(pinfo, tvb)
            local srcadd = pinfo.src
            local dstadd = pinfo.dst
            file:write(tostring(srcadd), "\t", tostring(dstadd)"\n")
        end
    end
end
I am running this script using the following command:
tshark -r wireless.pcap -xlua_script:MyScript.lua
Why is nothing being written to my text file? Is there something wrong with the code? Help is very much appreciated. Thanks!
Probably because you are missing a comma before "\n":
---------------------------------------------------vv-----
file:write(tostring(srcadd), "\t", tostring(dstadd), "\n")
It may also be useful to check the file value returned by the io.open call.
I don't see any other problems with the script; if you still have issues, I have a page on debugging Wireshark Lua scripts that may help.
