Lua - Download file asynchronously via HTTP

I just finished reading the copas core code, and I want to write code to download files from a website asynchronously, but copas seems to only support socket IO.
Since Lua does not provide async syntax, other packages will surely have their own event loops, which, I think, cannot run alongside copas' loop.
So, to download a file asynchronously via HTTP, do I have to find a package that supports async HTTP and async file IO at the same time? Or are there any other ideas?

After reading a bunch of code, I can finally answer my own question.
As I mentioned in my comment on the question, one can make use of the step function exported by an async IO library and merge multiple stepping calls into one bigger loop.
In the case of luv, it uses an external thread pool in C to manage file IO, and a single-threaded loop to call pending callbacks and manage IO polling (polling is not needed in my use case).
One can simply call the file operation functions provided by luv to get async file IO, but luv's loop still needs to be stepped so that the callbacks bound to those IO operations get called.
The integrated main loop goes like this:
local function main_loop()
  copas.running = true
  while not copas.finished() or uv.loop_alive() do
    if not copas.finished() then
      copas.step()
    end
    if uv.loop_alive() then
      uv.run("nowait")
    end
  end
end
copas.step() is the stepping function of copas, and uv.run("nowait") makes luv run just one pass of its event loop without blocking if no IO is ready when polling.
A working solution looks like this:
local copas = require "copas"
local http = require "copas.http"
local uv = require "luv"

local urls = {
  "http://example.com",
  "http://example.com"
}

local function main_loop()
  copas.running = true
  while not copas.finished() or uv.loop_alive() do
    if not copas.finished() then
      copas.step()
    end
    if uv.loop_alive() then
      uv.run("nowait")
    end
  end
end

local function write_file(file_path, data)
  -- ** call to luv async file IO **
  uv.fs_open(file_path, "w+", 438, function(err, fd)
    assert(not err, err)
    uv.fs_write(fd, data, nil, function(err_o, _)
      assert(not err_o, err_o)
      uv.fs_close(fd, function(err_c)
        assert(not err_c, err_c)
        print("finished:", file_path)
      end)
    end)
  end)
end

local function dl_url(url)
  local content, _, _, _ = http.request(url)
  write_file("foo.txt", content)
end

-- add download tasks to copas' loop
for _, url in ipairs(urls) do
  copas.addthread(dl_url, url)
end

main_loop()

Related

Why is Rust's std::thread::sleep allowing my HTTP response to return the correct body?

I am working on the beginning of the final chapter of The Rust Programming Language, which teaches how to write an HTTP response with Rust.
For some reason, the HTML file being sent does not display in the browser unless I have Rust wait before calling TcpStream::flush().
Here is the code:
use std::io::prelude::*;
use std::net::TcpListener;
use std::net::TcpStream;
use std::fs;
use std::thread::sleep;
use std::time::Duration;

fn main() {
    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();

    for stream in listener.incoming() {
        let stream = stream.unwrap();
        handle_connection(stream);
    }
}

fn handle_connection(mut stream: TcpStream) {
    let mut buffer = [0; 1024];
    stream.read(&mut buffer).unwrap();

    let contents = fs::read_to_string("hello.html").unwrap();
    let response = format!(
        "HTTP/1.1 200 OK\r\nContent-Length: {}\r\n{}",
        contents.len(),
        contents
    );

    stream.write(response.as_bytes()).unwrap();
    // let i = stream.write(response.as_bytes()).unwrap();
    // println!("{} bytes written to the stream", i);
    // ^^ using this code instead will sometimes make it display properly

    sleep(Duration::from_secs(1));
    // ^^ uncommenting this will cause a blank page to load.

    stream.flush().unwrap();
}
I observe the same behavior in multiple browsers.
According to the Rust book, calling flush should ensure that the bytes finish writing to the stream. So why would I be unable to view the HTML file in the browser unless I sleep the thread before flushing?
I have done hard reloads and restarted the server with cargo run multiple times, and the behavior is the same. I have also printed the file contents to the terminal, and they are read fine in either case (of course they are).
I wonder if this is a problem with my operating system. I'm on Windows 10.
It isn't really holding the project up, as I can continue learning (and I'm not planning on putting an actual web project into production right now), but I would appreciate any insight anyone has into this issue. There must be something about Rust's handling of the stream, or about the environment, that I am not understanding.
Thanks for your time!

Reading JS library from CDN within Mirth

I'm doing some testing around Mirth Connect. I have a test channel where the data types are Raw for the source and the one destination. The destination is not doing anything right now. In the source, the connector type is JavaScript Reader, and the code does the following...
var url = new java.net.URL('https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.15/lodash.fp.min.js');
var conn = url.openConnection();
conn.setRequestMethod('GET');

if (conn.getResponseCode() === 200) {
    var body = org.apache.commons.io.IOUtils.toString(conn.getInputStream(), 'UTF-8');
    logger.debug('CONTENT: ' + body);
    globalMap.put('_', body);
}

conn.disconnect();

// This code is in source but also tested in destination
logger.debug('FROM GLOBAL: ' + $('_')); // library was found

var arr = [1, 2, 3, 4];
var _ = $('_');
var newArr = _.chunk(arr, 2);
The error I'm getting is: TypeError: Cannot find function chunk in object.
The reason I want to do this is to build custom/internal libraries with unit tests and serve them from an internal/company CDN, letting Mirth consume them.
How can I make the library available to Mirth?
Rhino actually has CommonJS support, but Mirth doesn't have it enabled by default. Here's how you can use it in your channel.
channel deploy script
with (JavaImporter(
    org.mozilla.javascript.Context,
    org.mozilla.javascript.commonjs.module.Require,
    org.mozilla.javascript.commonjs.module.provider.SoftCachingModuleScriptProvider,
    org.mozilla.javascript.commonjs.module.provider.UrlModuleSourceProvider,
    java.net.URI
)) {
    var require = new Require(
        Context.getCurrentContext(),
        this,
        new SoftCachingModuleScriptProvider(new UrlModuleSourceProvider([
            // Search path. You can add multiple URIs to this array
            new URI('https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.15/')
        ], null)),
        null,
        null,
        true
    );
} // end JavaImporter

var _ = require('lodash.min');
require('lodash.fp.min')(_); // convert lodash to fp
$gc('_', _);
Note: There's something funky with the cdnjs lodash fp packages: they don't detect the environment correctly and force that weird two-stage import. If you use https://cdn.jsdelivr.net/npm/lodash@4.17.15/ instead, you only need to do var _ = require('fp'); and it loads everything in one step.
transformer
var _ = $gc('_');
logger.info(JSON.stringify(_.chunk(2)([1,2,3,4])));
Note: This is the correct way to use fp/chunk. In your OP you were calling it with the standard chunk syntax.
Additional Commentary
I think it's probably OK to do it this way, where you download the library once at deploy time, store it in the globalChannelMap, and then retrieve it from the map where needed. It would probably also work to store the require object itself in the map if you wanted to call it elsewhere; it caches and reuses the objects it creates for future calls to the same resource.
I would not create new Require objects anywhere but the deploy script, or you will be re-downloading the resource on every message (or on every poll, in the case of a JavaScript Reader).
Edit: I guess for an internal web host this could be desirable in a JavaScript Reader, if you intend for it to pick up changes immediately on the next poll without a redeploy, assuming you would be upgrading the library in place instead of incrementing a version.
The benefit of using Code Templates, as Vibin suggested, is that they get compiled directly into your channel at deploy time, so there is no additional fetching step at runtime. Making the library available is as simple as assigning it to your channel.
Even though importing third-party libraries could be an option, I was actually looking into this so that our team can write our own custom functions, write unit tests for them, and finally pull that code into Mirth. I was experimenting with lodash, but using it was not my end goal. My solution was to do a REST GET call with Java in the global script, where the URL is the raw GitHub URL of the code you want to pull in. The code is otherwise the same as in my original question, but, like I said, the URL is the raw GitHub URL of the function I want to pull in.

Send a large file with HTTP.jl

I would like to implement a server with HTTP.jl and Julia. After some computation, the server would return a "large" file (several hundred MB). I would like to avoid having to read the whole file into memory and then send it to the client.
Some frameworks have a specific function for this (e.g. Flask's http://flask.pocoo.org/docs/0.12/api/#flask.send_file) or allow streaming the content to the client (http://flask.pocoo.org/docs/0.12/patterns/streaming/).
Is one of these two options also available in HTTP.jl, or in any other Julia web package?
Here is a test which reads the file testfile.txt, but I want to avoid loading the complete file into memory.
import HTTP

f = open("testfile.txt", "w")
write(f, "test")
close(f)

router = HTTP.Router()

function testfun(req::HTTP.Request)
    f = open("testfile.txt")
    data = read(f)
    close(f)
    return HTTP.Response(200, data)
end

HTTP.register!(router, "GET", "/testfun", HTTP.HandlerFunction(testfun))

server = HTTP.Servers.Server(router)
task = @async HTTP.serve(server, ip"127.0.0.1", 8000; verbose=false)
sleep(1.0)

req = HTTP.request("GET", "http://127.0.0.1:8000/testfun/")

# end server
put!(server.in, HTTP.Servers.KILL)

@show String(req.body)
You can use memory mapped IO like this:
function testfun(req::HTTP.Request)
    data = Mmap.mmap(open("testfile.txt"), Array{UInt8,1})
    return HTTP.Response(200, data)
end
data now looks like a normal byte array to Julia, but it is actually linked to the file, which might be exactly what you want. The file will be closed upon garbage collection; if you have many requests and no garbage collection is triggered, you might end up with a lot of open files. If your request takes quite long anyway, you might consider calling gc() at the beginning of the request.

How to use defer in combination with http.ListenAndServe?

When using http.ListenAndServe() in Go, the call blocks, and the application, apparently, can only be stopped by killing it. This seems to skip processing my defer statements. Please see the code below: when I kill the application, the db is not closed. How can I make sure my defer statement is run?
func main() {
    db := NewDB(DBFILENAME)
    defer db.Close()

    http.HandleFunc("/", handler)
    http.ListenAndServe(":80", nil)
}
defer statements get executed only when the function enclosing them returns. Your main function does not return when you kill the process, so you need to use signals and channels.
This is a good link explaining the same: https://www.socketloop.com/tutorials/golang-intercept-ctrl-c-interrupt-or-kill-signal-and-determine-the-signal-type
I faced the same problem with my last project and implemented a similar solution in my project wshare.
In your case, you can try something like:
ch := make(chan os.Signal, 3)
signal.Notify(ch, os.Interrupt, syscall.SIGTERM, syscall.SIGINT)

go func() {
    signalType := <-ch
    signal.Stop(ch)
    log.Println("Exit command received. Exiting...")

    // this is a good place to flush everything to disk
    // before terminating.
    db.Close()

    log.Println("Signal type : ", signalType)
    os.Exit(0)
}()
The net/http package in the next Go release, 1.8, will have a new Shutdown method that gracefully shuts down the server: https://beta.golang.org/pkg/net/http/#Server.Shutdown
I suppose that defer will work then.
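A minimal sketch of how that could look on Go 1.8+ (NewDB, DBFILENAME, and handler are taken from the question and assumed to exist): the server runs in a goroutine, main waits for a signal and calls Shutdown, and because main then returns normally, the deferred db.Close() actually runs.

package main

import (
    "context"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    db := NewDB(DBFILENAME) // from the question, assumed to exist
    defer db.Close()        // runs now, because main returns normally

    http.HandleFunc("/", handler) // handler is also from the question

    srv := &http.Server{Addr: ":80"}

    // Serve in a goroutine so main can block waiting for a signal.
    go func() {
        if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Fatal(err)
        }
    }()

    // Wait for Ctrl+C or SIGTERM.
    ch := make(chan os.Signal, 1)
    signal.Notify(ch, os.Interrupt, syscall.SIGTERM)
    <-ch

    // Stop accepting new connections and drain in-flight requests.
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()
    if err := srv.Shutdown(ctx); err != nil {
        log.Println("shutdown:", err)
    }
    // main returns here, so the deferred db.Close() executes.
}

Note that os.Exit(0), as used in the signal-handler snippet above, terminates the program without running deferred calls, which is why that snippet closes the db explicitly.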

Parallel HTTP web crawler in Erlang

I'm coding a simple web crawler and have generated a bunch of static files that I try to crawl with the code at the bottom. I have two issues/questions I don't have an idea for:
1.) Looping over the sequence 1..200 throws an error after exactly 100 pages have been crawled:
** exception error: no match of right hand side value {error,socket_closed_remotely}
in function erlang_test_01:fetch_page/1 (erlang_test_01.erl, line 11)
in call from lists:foreach/2 (lists.erl, line 1262)
2.) How do I parallelize the requests, e.g. 20 concurrent requests?
-module(erlang_test_01).
-export([start/0]).

-define(BASE_URL, "http://46.4.117.69/").

to_url(Id) ->
    ?BASE_URL ++ io_lib:format("~p", [Id]).

fetch_page(Id) ->
    Uri = to_url(Id),
    {ok, {{_, Status, _}, _, Data}} = httpc:request(get, {Uri, []}, [], [{body_format, binary}]),
    Status,
    Data.

start() ->
    inets:start(),
    lists:foreach(fun(I) -> fetch_page(I) end, lists:seq(1, 200)).
1. Error message
socket_closed_remotely indicates that the server closed the connection, maybe because you made too many requests in a short timespan.
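If the server is simply throttling you, one option (my own suggestion, a minimal sketch rather than a guaranteed fix for the root cause) is to match on the error tuple instead of crashing on it, and retry after a short pause; to_url/1 is the function from your module.

%% Drop-in replacement for fetch_page/1 with a bounded retry.
fetch_page(Id) ->
    fetch_page(Id, 3).

fetch_page(_Id, 0) ->
    {error, too_many_retries};
fetch_page(Id, Retries) ->
    Uri = to_url(Id),
    case httpc:request(get, {Uri, []}, [], [{body_format, binary}]) of
        {ok, {{_, _Status, _}, _, Data}} ->
            Data;
        {error, socket_closed_remotely} ->
            timer:sleep(500), % back off briefly before retrying
            fetch_page(Id, Retries - 1)
    end.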
2. Parallelization
Create 20 worker processes and one process holding the URL queue. Let each worker ask the queue for a URL (by sending it a message). This way you can control the number of workers; a sketch of this approach follows below.
An even more "Erlangy" way is to spawn one process per URL! The upside is that your code will be very straightforward. The downside is that you cannot easily control your bandwidth usage or the number of connections to the same remote server.
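A minimal sketch of the queue-plus-workers idea (the function and message names here are placeholders of my own; fetch_page/1 is the function from the question):

%% Spawn one queue process holding the IDs and 20 workers pulling from it.
start_parallel() ->
    inets:start(),
    Queue = spawn(fun() -> queue_loop(lists:seq(1, 200)) end),
    lists:foreach(fun(_) -> spawn(fun() -> worker(Queue) end) end, lists:seq(1, 20)),
    ok.

%% The queue hands out one ID per request; when empty it answers 'done'.
queue_loop([]) ->
    receive {next, From} -> From ! done end,
    queue_loop([]);
queue_loop([Id | Rest]) ->
    receive {next, From} -> From ! {id, Id} end,
    queue_loop(Rest).

%% Each worker keeps asking for the next ID until the queue is empty.
worker(Queue) ->
    Queue ! {next, self()},
    receive
        {id, Id} ->
            fetch_page(Id),
            worker(Queue);
        done ->
            ok
    end.

The queue process in this sketch never terminates; in real code you would stop it (and probably monitor the workers) once everything has been fetched.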
