Download large file with LuaSocket's HTTP module while keeping UI responsive

I would like to use LuaSocket's HTTP module to download a large file while displaying progress in the console and later on in a GUI. The UI must never block, not even when the server is unresponsive during the transfer. Additionally, creating a worker thread to handle the download is not an option.
Here's what I've got so far:
local io = io
local ltn12 = require("ltn12")
local http = require("socket.http")

local fileurl = "http://www.example.com/big_file.zip"
local fileout_path = "big_file.zip"

local file_size = 0
local file_down = 0

-- counter filter used in ltn12
function counter(chunk)
  if chunk == nil then
    return nil
  elseif chunk == "" then
    return ""
  else
    file_down = file_down + #chunk
    ui_update(file_size, file_down) -- update ui, run main ui loop etc.
    return chunk -- return unmodified chunk
  end
end
-- first request
-- determine file size
local r, c, h = http.request {
  method = "HEAD",
  url = fileurl
}
file_size = tonumber(h["content-length"]) -- header value arrives as a string

-- second request
-- download file
r, c, h = http.request {
  method = "GET",
  url = fileurl,
  -- set our chain, count first then write to file
  sink = ltn12.sink.chain(
    counter,
    ltn12.sink.file(io.open(fileout_path, "wb")) -- "wb": don't mangle binary data
  )
}
There are a few problems with the above, ignoring error checking and hard-coded values:
It requires two HTTP requests when one would suffice (a normal GET response also carries a Content-Length header).
If the server is unresponsive, the UI becomes unresponsive too, as the filter only gets called when there is data to process.
How can I do this while making sure the UI never blocks?

There is an example of non-preemptive multithreading in Programming in Lua that uses non-blocking LuaSocket calls and coroutines to perform multiple parallel downloads. It should be possible to apply the same logic to your process to avoid blocking; a sketch follows. I can only add that you should consider calling this logic from the idle event of your GUI (if there is such a thing) to avoid getting "attempt to yield across metamethod/c-call boundary" errors.
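A minimal sketch of that approach, assuming your GUI exposes an idle callback (on_idle and ui_update below are placeholders for whatever your toolkit provides). It speaks plain HTTP/1.0 over a raw socket instead of going through socket.http, and it omits redirects, chunked encoding and most error handling:

local socket = require("socket")

local file_size, file_down = 0, 0

local function download(host, path, file)
  local c = assert(socket.connect(host, 80))
  c:settimeout(10) -- don't hang forever while reading the headers
  assert(c:send("GET " .. path .. " HTTP/1.0\r\nHost: " .. host .. "\r\n\r\n"))
  -- read the status line and headers, picking up Content-Length on the way
  repeat
    local line = assert(c:receive("*l"))
    local len = line:match("^[Cc]ontent%-[Ll]ength:%s*(%d+)")
    if len then file_size = tonumber(len) end
  until line == ""
  c:settimeout(0) -- from here on, never block
  while true do
    local chunk, err, partial = c:receive(2^13) -- up to 8 KiB per step
    local data = chunk or partial
    if data and #data > 0 then
      file:write(data)
      file_down = file_down + #data
    end
    if err == "closed" then break end -- an HTTP/1.0 server closes when done
    coroutine.yield() -- hand control back to the UI loop
  end
  c:close()
  file:close()
end

local co = coroutine.create(function()
  download("www.example.com", "/big_file.zip", io.open("big_file.zip", "wb"))
end)

function on_idle() -- hypothetical GUI idle callback
  if coroutine.status(co) ~= "dead" then
    assert(coroutine.resume(co))
  end
  ui_update(file_size, file_down)
end

Because the coroutine yields after every chunk, and also after every timeout when the server stalls, each idle call returns quickly and the UI stays responsive even while no data arrives.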


How to read incomming messages using svSocket Server in R

I am using the svSocket package in R to create a socket server. I have successfully created the server using startSocketServer(...). I am able to connect my application to the server and send data from the server to the application. But I am struggling with reading messages sent by the application. I couldn't find any example of that on the internet; I found only the processSocket(...) example in the svSocket documentation (see below), which describes the function that processes a command coming from the socket. But I only want to read socket messages coming to the server in a repeat block and print them on the screen for testing.
## Not run:
## A simple REPL (R eval/process loop) using basic features of processSocket()
repl <- function ()
{
    pars <- parSocket("repl", "", bare = FALSE)            # Parameterize the loop
    cat("Enter R code, hit <CTRL-C> or <ESC> to exit\n> ") # First prompt
    repeat {
        entry <- readLines(n = 1)                          # Read a line of entry
        if (entry == "") entry <- "<<<esc>>>"              # Exit from multiline mode
        cat(processSocket(entry, "repl", ""))              # Process the entry
    }
}
repl()
## End(Not run)
Thanks for your input.
EDIT:
Here is a more specific example of socket server creation and sending a message:
require(svSocket)

# start server
svSocket::startSocketServer(
  port = 9999,
  server.name = "test_server",
  procfun = processSocket,
  secure = FALSE,
  local = FALSE
)

# test calls
svSocket::getSocketClients(port = 9999)       # ip and port of the connected client
svSocket::getSocketClientsNames(port = 9999)  # name of the connected client
svSocket::getSocketServerName(port = 9999)    # name of the socket server given at creation
svSocket::getSocketServers()                  # server name and port

# send message to client
svSocket::sendSocketClients(
  text = "send this message to the client",
  sockets = svSocket::getSocketClientsNames(port = 9999),
  serverport = 9999
)
... and the output of the code above is:
> require(svSocket)
>
> #start server
> svSocket::startSocketServer(
+ port = 9999,
+ server.name = "test_server",
+ procfun = processSocket,
+ secure = FALSE,
+ local = FALSE
+ )
[1] TRUE
>
> #test calls
> svSocket::getSocketClients(port = 9999) #ip and port of client connected
sock0000000005C576B0
"192.168.2.1:55427"
> svSocket::getSocketClientsNames(port = 9999) #name of client connected
[1] "sock0000000005C576B0"
> svSocket::getSocketServerName(port = 9999) #name of socket server given during creation
[1] "test_server"
> svSocket::getSocketServers() #server name and port
test_server
9999
>
> #send message to client
> svSocket::sendSocketClients(
+ text = "send this message to the client",
+ sockets = svSocket::getSocketClientsNames(port = 9999),
+ serverport = 9999
+ )
>
What you can see is:
successful creation of the socket server
successful connection of the external client sock0000000005C576B0 (192.168.2.1:55427) to the server
successful sending of a message to the client (no explicit output is shown in the console here, but the client reacts as expected)
What I am still not able to do is fetch client messages sent to the server. Could somebody provide an example of that?
For interaction with the server from the client side, see ?evalServer.
Otherwise, it is your processSocket() function (either the default one, or a custom function you provide) that is the entry point triggered when the server gets data from a connected client. From there, you have two possibilities:
The simplest one is just to use the default processSocket() function. Apart from some special code between <<<>>>, which is interpreted as special commands, the default version evaluates R code on the server side. So, just call the function you want on the server. For instance, define f <- function(txt) paste("Fake process", txt) on the server, and call evalServer(con, "f('some text')") on the client; your custom f() function is then executed on the server, as the sketch below shows. Just take care that you need to double-quote expressions that contain text here.
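A minimal two-session sketch of that round trip, assuming the server from your example is listening on port 9999 and both sessions run on the same machine:

# session 1 -- the server (the R session that ran startSocketServer()):
f <- function(txt) paste("Fake process", txt)

# session 2 -- the client:
library(svSocket)
con <- socketConnection(host = "localhost", port = 9999, blocking = FALSE)
evalServer(con, "f('some text')")   # f() is executed in the server session
close(con)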
An alternate solution is to define your own processSocket() function to capture the messages the client sends to the server; see the sketch after this paragraph. This is safer for a server that needs to process only a limited number of message types, without parsing and evaluating arbitrary R code received from the client.
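For instance, a sketch of a custom procfun that just prints every incoming message; this assumes, as for the default processSocket(), that procfun receives the raw message first and that its return value is sent back to the client:

library(svSocket)

# print each raw client message instead of evaluating it as R code
myProc <- function(msg, socket, serverport, ...) {
    cat("Received from", socket, ":", msg, "\n")
    "ACK"  # the returned string is written back to the client
}

svSocket::startSocketServer(port = 9999, procfun = myProc)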
Now, the server is asynchronous, meaning that you still have the prompt available in the server session while it is listening to client(s) and processing their requests.

PyZMQ - How to terminate a Context if no connection was made?

I've been trying to figure out how to close a Context-instance (or if I even need to) when my socket hasn't yet connected to a bound address. Here's my demo code:
import zmq
import json
data = {}
data['key'] = 'value'
json_data = json.dumps(data)
context = zmq.Context.instance()
socket = context.socket(zmq.REQ)
socket.connect("tcp://localhost:5555")
socket.send_json(data)
socket.close()
print("I get here!")
context.term()
My expected behavior is that this ends fine. My actual behavior is that context.term() blocks with no way to ^C out. It prints out "I get here!" before it stops, btw.
EDIT Incorporating the chosen answer's solution, this works:
import zmq
import json
data = {}
data['key'] = 'value'
json_data = json.dumps(data)
context = zmq.Context.instance()
socket = context.socket(zmq.REQ)
socket.setsockopt(zmq.LINGER, 100)
socket.connect("tcp://localhost:5555")
socket.send_json(data)
socket.close()
print("I get here!")
context.term()
Yes, this is The Desired behaviour. Why?
ZeroMQ uses the Context()-instance as an autonomous battlefield unit. It has its own resources and operates in as many IO-threads as performance tweaking has imperatively required it to spawn.
As these resource allocations and the transport-related infrastructure are "expensive", the .term()-instance method takes due care not to damage toys that still wait inside the IN/OUT-queues before getting delivered. Did I mention that both the infrastructure setup & maintenance and the message-delivery mechanisms are asynchronous, and neither takes place, much less is guaranteed to complete, upon request? No, they operate "separately" under the Context()-instance hood, in a best-effort fashion, having a Zen-of-Zero (incl. a Zero-"warranty") inside the design-DNA...
Your code has already put a message "there", so there is a gold-egg that the .term()-call tries not to damage before finally killing the Context-instance.
This behaviour is indeed The Desired behaviour, and one can change it for cases where due design care was taken otherwise:
import zmq
import json

print( "Run against ZeroMQ native-API[{0:}]".format( zmq.pyzmq_version_info() ) )

pass;    aLocalCONTEXT = zmq.Context.instance()
socket = aLocalCONTEXT.socket( zmq.REQ ); socket.connect( "tcp://localhost:5555" )
print( "<aSocket> has LINGER == [{0:}]".format( socket.getsockopt( zmq.LINGER ) ) )

socket.send_json( { 'key': 'value' } )   # MOV. data into Context(); send_json() serialises the dict itself
socket.close(); print( "I get here!" )   # N/P to .close() socket
# /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/ BUT!
aLocalCONTEXT.term()                     # THE HELL OPENS HERE,
                                         # GIVEN LINGER WAS -1
                                         # AS THE .term()-method
                                         # MUST WAIT UNTIL ALL MSGs
                                         # KNOWN TO BE IN-FLIGHT
                                         # GET INDEED DELIVERED, OUCH
Even though newer native ZeroMQ API versions (4.2+) promise a zmq.LINGER default other than the originally injected -1 (== wait indefinitely until delivered, if not forever in case no peers are out there), due design-side care is indeed a sign of fair engineering practice :o)
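For completeness, pyzmq also lets one set the linger at teardown time instead of via setsockopt(); a small sketch using the linger keyword arguments of Socket.close() and Context.destroy():

import zmq

context = zmq.Context.instance()
socket = context.socket(zmq.REQ)
socket.connect("tcp://localhost:5555")
socket.send_json({'key': 'value'})

socket.close(linger=100)    # wait at most 100 ms for this socket's pending messages
context.destroy(linger=0)   # close any remaining sockets, then terminate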

How to safely handle raw (file) data in Java?

An image gets corrupted while being retrieved (through HTTP) and then sent (through HTTP) to a database. The image's raw data is handled in String form.
The service sends a GET for an image file and receives a response with the raw image data (the response body) and the Content-Type. Then a PUT request is sent with the aforementioned body and Content-Type header. (The PUT request is constructed by providing the body as a String.) This PUT request is sent to a RESTful database (CouchDB), creating an attachment (for those unfamiliar with CouchDB, an attachment acts like a static file).
Now I have the original image, which my service GETs and PUTs to the database, and a 'copy' of the original image that I can GET from the database. If I then run curl --head -v "[copy's url]", it reports the Content-Type of the original image, but the Content-Length has changed, from about 200 kB to about 400 kB. If I GET the 'copy' image with a browser, it is not rendered, whereas the original renders fine. It is corrupted.
What might be the cause? My guess is that while handling the raw data as a String, my framework guesses the encoding wrong and corrupts it. I have not been able to confirm or deny this. How could I handle this raw data/request body in a safe manner, or how could I properly handle the encoding (if that proves to be the problem)?
Details: Play2 Framework's HTTP client, Scala. Below a test to reproduce:
"able to copy an image" in {
def waitFor[T](future:Future[T]):T = { // to bypass futures
Await.result(future, Duration(10000, "millis"))
}
val originalImageUrl = "http://laughingsquid.com/wp-content/uploads/grumpy-cat.jpg"
val couchdbUrl = "http://admin:admin#localhost:5984/testdb"
val getOriginal:ws.Response = waitFor(WS.url(originalImageUrl).get)
getOriginal.status mustEqual 200
val rawImage:String = getOriginal.body
val originalContentType = getOriginal.header("Content-Type").get
// need an empty doc to have something to attach the attachment to
val emptyDocUrl = couchdbUrl + "/empty_doc"
val putEmptyDoc:ws.Response = waitFor(WS.url(emptyDocUrl).put("{}"))
putEmptyDoc.status mustEqual 201
//uploading an attachment will require the doc's revision
val emptyDocRev = (putEmptyDoc.json \ "rev").as[String]
// create actual attachment/static file
val attachmentUrl = emptyDocUrl + "/0"
val putAttachment:ws.Response = waitFor(WS.url(attachmentUrl)
.withHeaders(("If-Match", emptyDocRev), ("Content-Type", originalContentType))
.put(rawImage))
putAttachment.status mustEqual 201
// retrieve attachment
val getAttachment:ws.Response = waitFor(WS.url(attachmentUrl).get)
getAttachment.status mustEqual 200
val attachmentContentType = getAttachment.header("Content-Type").get
originalContentType mustEqual attachmentContentType
val originalAndCopyMatch = getOriginal.body == getAttachment.body
originalAndCopyMatch aka "original matches copy" must beTrue // << false
}
Fails at the last 'must':
[error] x able to copy an image
[error] original matches copy is false (ApplicationSpec.scala:112)
The conversion to String is definitely going to cause problems; you need to work with the bytes, as Daniel mentioned. Decoding arbitrary image bytes as text and re-encoding them on the way out expands every byte that isn't valid in the assumed charset into a multi-byte sequence, which is consistent with the Content-Length roughly doubling from 200 kB to 400 kB.
Looking at the source, it looks like ws.Response is just a wrapper. If you get to the underlying class, there are some methods that may help you. On the Java side, someone made a commit on GitHub to expose more ways of getting the response data other than as a String.
I'm not familiar with Scala, but something like this may work:
getOriginal.getAHCResponse.getResponseBodyAsBytes
// instead of getOriginal.body
WS.scala
https://github.com/playframework/playframework/blob/master/framework/src/play/src/main/scala/play/api/libs/ws/WS.scala
WS.java
Here you can see that Response has some new methods, getBodyAsStream() and asByteArray.
https://github.com/playframework/playframework/blob/master/framework/src/play-java/src/main/java/play/libs/WS.java
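Putting that together, a sketch of a byte-safe version of the failing part of the test; this assumes the Play 2.1-era WS API from the question, and that put() accepts an Array[Byte] body via Play's built-in Writeable:

// fetch the original as bytes, not as a String
val getOriginal: ws.Response = waitFor(WS.url(originalImageUrl).get)
val rawImage: Array[Byte] = getOriginal.getAHCResponse.getResponseBodyAsBytes

// PUT the same bytes to CouchDB -- no charset guessing anywhere
val putAttachment: ws.Response = waitFor(WS.url(attachmentUrl)
  .withHeaders(("If-Match", emptyDocRev), ("Content-Type", originalContentType))
  .put(rawImage))
putAttachment.status mustEqual 201

// compare byte arrays instead of String bodies
val copyBytes = waitFor(WS.url(attachmentUrl).get).getAHCResponse.getResponseBodyAsBytes
java.util.Arrays.equals(rawImage, copyBytes) aka "original matches copy" must beTrue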

Multiple http requests on the same long running operation with Sinatra and EventMachine

I'm trying to understand how to use evented web servers with a combination of async Sinatra and EventMachine.
In the code below, each request on '/' will generate a new async HTTP request to Google. Is there an elegant solution for detecting that a request is already ongoing and waiting for its completion?
If I have 100 concurrent requests on '/', this will generate 100 requests to the Google backend. It would be much better to have a way of detecting that there is already an ongoing backend request and waiting for it to complete.
Thanks for your answers.
require 'sinatra'
require 'json'
require 'eventmachine'
require 'em-http-request'
require 'sinatra/async'

Sinatra.register Sinatra::Async

def get_data
  puts "Start request"
  http = EventMachine::HttpRequest.new("http://www.google.com").get
  http.callback {
    puts "Request completed"
    yield http.response
  }
end

aget '/' do
  get_data { |data| body data }
end
Update
I actually discovered you can add several callbacks to the same http request. So, it's easy to implement:
class Request
  def get_data
    if !@http || @http.response_header.status != 0
      # puts "Creating new request"
      @http = EventMachine::HttpRequest.new("http://www.bbc.com").get
    end
    # puts "Adding callback"
    @http.callback do
      # puts "Request completed"
      yield @http.response
    end
  end
end

$req = Request.new

aget '/' do
  $req.get_data { |data| body data }
end
This gives a very high number of requests per second. Cool!
You don't have to use sinatra/async at all to make it evented; just run it with an evented server (Thin, Rainbows!, Goliath).
Take a look at em-synchrony for an example of making multiple parallel requests without introducing spaghetti callback code:
require "em-synchrony"
require "em-synchrony/em-http"
EventMachine.synchrony do
multi = EventMachine::Synchrony::Multi.new
multi.add :a, EventMachine::HttpRequest.new("http://www.postrank.com").aget
multi.add :b, EventMachine::HttpRequest.new("http://www.postrank.com").apost
res = multi.perform
p "Look ma, no callbacks, and parallel HTTP requests!"
p res
EventMachine.stop
end
And yes, you can run this inside your Sinatra action; a sketch follows.
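A sketch of what that can look like, assuming an evented server such as Thin plus the rack-fiber_pool middleware so that each request runs in its own Fiber for em-synchrony to pause and resume:

require "sinatra"
require "rack/fiber_pool"
require "em-synchrony"
require "em-synchrony/em-http"

use Rack::FiberPool  # run each request in its own Fiber

get '/' do
  # looks synchronous, but the fiber is parked while the request is in
  # flight, so the reactor keeps serving other clients in the meantime
  res = EventMachine::HttpRequest.new("http://www.google.com").get
  res.response
end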
Also take a look at Faraday, specifically with its EM adapter.

is node.js' console.log asynchronous?

Are console.log/debug/warn/error in node.js asynchronous? I mean, will JavaScript code execution halt until the stuff is printed on screen, or will it print at a later stage?
Also, I am interested in knowing whether it is possible for a console.log to NOT display anything if the statement immediately after it crashes node.
Update: Starting with Node 0.6 this post is obsolete, since stdout is synchronous now.
Well, let's see what console.log actually does.
First of all it's part of the console module:
exports.log = function() {
  process.stdout.write(format.apply(this, arguments) + '\n');
};
So it simply does some formatting and writes to process.stdout, nothing asynchronous so far.
process.stdout is a getter defined on startup and lazily initialized; I've added some comments to explain things:
.... code here...
process.__defineGetter__('stdout', function() {
  if (stdout) return stdout; // only initialize it once

  /// many requires here ...

  if (binding.isatty(fd)) { // a terminal? great!
    stdout = new tty.WriteStream(fd);
  } else if (binding.isStdoutBlocking()) { // a file?
    stdout = new fs.WriteStream(null, {fd: fd});
  } else {
    stdout = new net.Stream(fd); // a stream?
    // For example: node foo.js > out.txt
    stdout.readable = false;
  }

  return stdout;
});
In the case of a TTY on UNIX we end up here. This thing inherits from socket, so all node basically does is push the data onto the socket; the terminal then takes care of the rest.
Let's test it!
var data = '111111111111111111111111111111111111111111111111111';
for (var i = 0, l = 12; i < l; i++) {
  data += data; // warning! gets very large, very quick
}

var start = Date.now();
console.log(data);
console.log('wrote %d bytes in %dms', data.length, Date.now() - start);
Result
....a lot of ones....1111111111111111
wrote 208896 bytes in 17ms
real 0m0.969s
user 0m0.068s
sys 0m0.012s
The terminal needs around 1 second to print out the socket's content, but node needs only 17 milliseconds to push the data to the terminal.
The same goes for the stream case, and the file case is also handled asynchronously.
So yes, Node.js holds true to its non-blocking promises.
console.warn() and console.error() are blocking. They do not return until the underlying system calls have succeeded.
Yes, it is possible for a program to exit before everything written to stdout has been flushed. process.exit() will terminate node immediately, even if there are still queued writes to stdout. You should use console.warn to avoid this behavior.
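A tiny sketch of that failure mode; run it with stdout piped (e.g. node script.js | cat), since that is the case where stdout writes are queued asynchronously:

// a large queued write to a piped stdout...
process.stdout.write('x'.repeat(1e6) + '\n');

// stderr is blocking, so this always makes it out:
console.error('stderr survives the exit');

// ...but exit() can fire before stdout's queue is drained,
// truncating or dropping the big write above
process.exit(0);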
My conclusion, after reading the Node.js 10.x docs (quoted below), is that you can use console.log for logging: console.log is synchronous and implemented in low-level C.
Although console.log is synchronous, it won't cause a performance issue unless you are logging huge amounts of data.
(The command-line example below demonstrates that stdout can be asynchronous while stderr is synchronous.)
Based on the Node.js docs:

The console functions are synchronous when the destination is a terminal or a file (to avoid lost messages in case of premature exit) and asynchronous when it's a pipe (to avoid blocking for long periods of time).
That is, in the following example, stdout is non-blocking while stderr is blocking:

$ node script.js 2> error.log | tee info.log

In daily use, the blocking/non-blocking dichotomy is not something you should worry about unless you log huge amounts of data.
Hope it helps
console.log is asynchronous on Windows, while it is synchronous on Linux/Mac. To make console.log synchronous on Windows, write this line at the start of your code (probably in your index.js); any console.log after this statement will be treated as synchronous by the interpreter:
if (process.stdout._handle) process.stdout._handle.setBlocking(true);
You can use this for synchronous logging:
const fs = require('fs')
fs.writeSync(1, 'Sync logging\n')
