Reciprocally-resuming coroutines - asynchronous

I apologize beforehand for the length of this question. I have tried to make it as succinct as possible, but it's just a rather complicated beast.
In chapter 24 of Ierusalimschy's Programming in Lua (4th ed.), the author presents a toy ("ugly") implementation of any asynchronous I/O library, like this one1:
-- filename: async.lua
-- Based (with several modifications) on Listing 24.3 (p. 246) of *Programming
-- in Lua*, 4th edition.
local async = {}
local queue = {}
local function enqueue (command) table.insert(queue, command) end
function async.readline (stream, callback)
enqueue(function () callback(stream:read()) end)
end
function async.writeline (stream, line, callback)
enqueue(function () callback(stream:write(line)) end)
end
function async.stop () enqueue("stop") end
function async.runloop ()
while true do
local next_command = table.remove(queue, 1)
if next_command == "stop" then break end
next_command()
end
end
return async
The author uses this toy library to illustrate some applications of coroutines, such as the scheme shown below for running "synchronous code on top of the asynchronous library"2:
-- Based (with several modifications) on Listing 24.5 (p. 248) of *Programming
-- in Lua*, 4th edition.
local async = require "async"
function run (synchronous_code)
local co = coroutine.create(function ()
synchronous_code()
async.stop()
end)
local wrapper = function ()
local status, result = assert(coroutine.resume(co))
return result
end
wrapper()
async.runloop()
end
function getline (stream)
local co = coroutine.running()
local callback = function (line) assert(coroutine.resume(co, line)) end
async.readline(stream, callback)
local line = coroutine.yield()
return line
end
function putline (stream, line)
local co = coroutine.running()
local callback = function () assert(coroutine.resume(co)) end
async.writeline(stream, line, callback)
coroutine.yield()
end
The author uses this technique to implement a function that prints to stdout in reverse order the lines it read from stdin:
function synchronous_code ()
local lines = {}
local input = io.input()
local output = io.output()
while true do
local line = getline(input)
if not line then break end
table.insert(lines, line)
end
for i = #lines, 1, -1 do putline(output, lines[i] .. "\n") end
end
run(synchronous_code)
The general idea is that the run function creates a coroutine that "registers" itself (through the callbacks created by getline and putline) into the asynchronous library's main loop. Whenever the asynchronous library's main loop executes one of these callbacks, it resumes the coroutine, which can do a bit more of its work, including registering the next callback with the main loop.
The run function gets the ball rolling by invoking the wrapper function, which, in turn, "resumes" (actually starts) the coroutine. The coroutine then runs until it encounters the first yield statement, which, in this example, happens within getline, right after getline has registered a callback into the async library's queue. Then the wrapper function regains control and returns. Finally, run invokes async.runloop. As async.runloop starts processing its queue, it resumes the coroutine, and off we go. The "synchronous code" (running within the coroutine) continues until the next getline or putline yields (after registering a callback), and async's main loop takes over again.
So far so good. But then, in Exercise 24.4 (p. 249), the author asks:
Exercise 24.4: Write a line iterator for the coroutine-based library (Listing 24.5), so that you can read the file with a for loop.
("Listing 24.5" refers to the code in the second code fragment above, where run, getline, and putline are defined.)
I am completely stumped with this one. In the example above, the coroutine "delivers" the lines it reads by writing them to stdout, which it can do all by itself. In contrast, the iterator requested by Exercise 24.4 would have to deliver its lines to a different coroutine, the one that is doing the iteration.
The only way that I can imagine this could happen is if the two coroutines could reciprocally resume each other. Is that even possible? I have not been able to construct a simple example of this, and would appreciate to see code that does it3.
Also, it seems to me that for this to work at all, one would need to implement an object with a write method (so that it can be passed to putline) that is ultimately responsible for delivering lines (somehow) to the iterator's coroutine.
1I have changed some superficial details, such as the names of variables, indentation, etc. The overall structure and function are unchanged.
2Again, I have changed some inessential details, to make the code easier for me to follow.
3 It is worth noting that the remaining two exercises for this chapter (24.5 and 24.6) are both about implementing systems involving multiple concurrent coroutines. Therefore, it is not farfetched to imagine that Exercise 24.4 is also about having two coroutines talking to each other.

I believe you're completely overthinking this exercise. The way I understand it, you're only meant to write a synchronous-style for iterator that runs within the synchronous code given to the run function. Taking the third code block as a base:
function for_file(file)
return function(file)
return getline(file)
end, file, nil
end
function synchronous_code ()
local lines = {}
local input = io.input()
local output = io.output()
for line in for_line(input) do
table.insert(lines, line)
end
for i = #lines, 1, -1 do putline(output, lines[i] .. "\n") end
end
run(synchronous_code)
As you can see, you don't really need to be aware of the coroutines at all for this to work, which is kind of the point of the library.

Related

How to systematically populate a whitelist for a sandboxing program?

On pp. 260-263 of Programming in Lua (4th ed.), the author discusses how to implement "sandboxing" (i.e. the running of untrusted code) in Lua.
When it comes to imposing limiting the functions that untrusted code can run, he recommends a "whitelist approach":
We should never think in terms of what functions to remove, but what functions to add.
This question is about tools and techniques for putting this suggestion into practice. (I expect there will be confusion on this point I want to emphasize it upfront.)
The author gives the following code as an illustration of a sandbox program based on a whitelist of allowed functions. (I have added or moved around some comments, and removed some blank lines, but I've copied the executable content verbatim from the book).
-- From p. 263 of *Programming in Lua* (4th ed.)
-- Listing 25.6. Using hooks to bar calls to unauthorized functions
local debug = require "debug"
local steplimit = 1000 -- maximum "steps" that can be performed
local count = 0 -- counter for steps
local validfunc = { -- set of authorized functions
[string.upper] = true,
[string.lower] = true,
... -- other authorized functions
}
local function hook (event)
if event == "call" then
local info = debug.getinfo(2, "fn")
if not validfunc[info.func] then
error("calling bad function: " .. (info.name or "?"))
end
end
count = count + 1
if count > steplimit then
error("script uses too much CPU")
end
end
local f = assert(loadfile(arg[1], "t", {})) -- load chunk
debug.sethook(hook, "", 100) -- set hook
f() -- run chunk
Right off the bat I am puzzled by this code, since the hook tests for event type (if event == "call" then...), and yet, when the hook is set, only count events are requested (debug.sethook(hook, "", 100)). Therefore, the whole song-and-dance with validfunc is for naught.
Maybe it is a typo. So I tried experimenting with this code, but I found it very difficult to put the whitelist technique in practice. The example below is a very simplified illustration of the type of problems I ran into.
First, here is a slightly modified version of the author's code.
#!/usr/bin/env lua5.3
-- Filename: sandbox
-- ----------------------------------------------------------------------------
local debug = require "debug"
local steplimit = 1000 -- maximum "steps" that can be performed
local count = 0 -- counter for steps
local validfunc = { -- set of authorized functions
[string.upper] = true,
[string.lower] = true,
[io.stdout.write] = true,
-- ... -- other authorized functions
}
local function hook (event)
if event == "call" then
local info = debug.getinfo(2, "fnS")
if not validfunc[info.func] then
error(string.format("calling bad function (%s:%d): %s",
info.short_src, info.linedefined, (info.name or "?")))
end
end
count = count + 1
if count > steplimit then
error("script uses too much CPU")
end
end
local f = assert(loadfile(arg[1], "t", {})) -- load chunk
validfunc[f] = true
debug.sethook(hook, "c", 100) -- set hook
f() -- run chunk
The most significant differences in the second snippet relative to the first one are:
the call to debug.sethook has "c" as mask;
the f function for the loaded chunk gets added to the validfunc whitelist;
io.stdout.write is added to the validfunc whitelist;
When I use this sandbox program to run the one-line script shown below:
# Filename: helloworld.lua
io.stdout:write("Hello, World!\n")
...I get the following error:
% ./sandbox helloworld.lua
lua5.3: ./sandbox:20: calling bad function ([C]:-1): __index
stack traceback:
[C]: in function 'error'
./sandbox:20: in function <./sandbox:16>
[C]: in metamethod '__index'
helloworld.lua:3: in local 'f'
./sandbox:34: in main chunk
[C]: in ?
I tried to fix this by adding the following to validfunc:
[getmetatable(io.stdout).__index] = true,
...but I still get pretty much the same error. I could go on guessing and trying more things to add, but this is what I would like to avoid.
I have two related questions:
What can I add to validfunc so that sandbox will run helloworld (as is) to completion?
More importantly, what is a systematic way to find determine what to add to a whitelist table?
Part (2) is the heart of this post. I am looking for tools/techniques that remove the guesswork from the problem of populating a whitelist table.
(I know that I can get helloworld to work if I replace io.stdout:write with print, register print in sandbox's validfunc, and pass {print = print} as the last argument to loadfile, but doing this does not answer the general question of how to systematically determine what needs to be added to the whitelist to allow some specific code to work in the sandbox.)
EDIT: Ask #DarkWiiPlayer pointed out, the calling bad function error is being triggered by the calling of an unregistered function (__index?), which happened as part of the response to an earlier attempt to index a nil value error. So, this post's questions are all about systematically determining what to add to validfunc to allow Lua to emit the attempt to index a nil value error normally.
I should add that the question of which function's call triggered the hook's execution responsible for the calling bad function error message is at the moment completely unclear. This error message blames the error on __index, but I suspect that this may be a red herring, possibly due to a bug in Lua.
Why suspect a bug in Lua? If I change the error call in sandbox slightly to
error(string.format("calling bad function (%s:%d): %s (%s)",
info.short_src, info.linedefined, (info.name or "?"),
info.func))
...then the error message looks like this:
lua5.3: ./sandbox:20: calling bad function ([C]:-1): __index (function: 0x55b391b79ef0)
stack traceback:
[C]: in function 'error'
./sandbox:20: in function <./sandbox:16>
[C]: in metamethod '__index'
helloworld.lua:3: in local 'f'
./sandbox:34: in main chunk
[C]: in ?
Nothing surprising there, but if now I change helloworld.lua to
# Filename: helloworld.lua
nonexistent()
io.stdout:write("Hello, World!\n")
...and run it under sandbox, the error message becomes
lua5.3: ./sandbox:20: calling bad function ([C]:-1): nonexistent (function: 0x556a161cdef0)
stack traceback:
[C]: in function 'error'
./sandbox:20: in function <./sandbox:16>
[C]: in global 'nonexistent'
helloworld.lua:3: in local 'f'
./sandbox:34: in main chunk
[C]: in ?
From this error message, one may conclude that nonexistent is a real function; after all, it's sitting right there at 0x556a161cdef0! But we know that nonexistent lives up to its name: it doesn't exist!
The whiff of a bug is definitely in the air. It could be that the function that is triggering the hook should really be excluded from those that trigger such "c"-masked hooks? Be that as it may, it appears that, in this particular situation, the call to debug.info is returning inconsistent information (since the name of the function [e.g. nonexistent] clearly does not correspond at all to the actual function object [e.g. function: 0x556a161cdef0] that is supposedly triggering the hook).
(Final answer at the bottom, feel free to skip until the <hr> line)
I'll explain my debugging step by step.
This is a really weird phenomenon. After some testing, I've managed to narrow it down a bit:
Since you pass {} to load, the function runs with an empty environment, so io is, in fact, nil (and io.stdout would error anyway)
The error happens directly when attempting to index io (which is a nil value)
The functio __index is a C function (see error message)
My first intuition was that __index was called somewhere internally. Thus, to find out what it does, I decided to look at its locals in hopes of guessing what it does.
A quick helper function I threw together:
local function locals(f)
return function(f, n)
local name, value = debug.getlocal(f+1, n)
if name then
return n+1, name, value
end
end, f, 1
end
Insert that right before the line where the error is raised:
for idx, name, value in locals(2) do
print(name, value)
end
error(string.format("calling bad function (%s:%d): %s", info.short_src, info.linedefined, (info.name or "?")))
This led to an interesting result:
(*temporary) stdin:43: attempt to index a nil value (global 'io')
(*temporary) table: 0x563cef2fd170
lua: stdin:29: calling bad function ([C]:-1): __index
stack traceback:
[C]: in function 'error'
stdin:29: in function <stdin:21>
[C]: in metamethod '__index'
stdin:43: in function 'f'
stdin:49: in main chunk
[C]: in ?
shell returned 1
Why is there a temporary string value with a completely different error message?
By the way, this error makes total sense; io does not exist because of the empty environment, so indexing it should obviously raise just that error.
It's honestly a very interesting error, but I'll leave it at this, as you're learning the language and this hint might be enough for you to figure it out on your own. It's also a very nice chance to actually use (and get to know) the debug module in a more practical context.
Actual Solution
After some time has now passed, I came back to add a proper solution to this problem, but I really already did just that. The weird error reporting is just Lua being weird. The real error is the empty environment that's set when loading the chunk, as I mentioned a few paragraphs above.
From the manual:
load (chunk [, chunkname [, mode [, env]]])
Loads a chunk.
[...]
If the resulting function has upvalues, the first upvalue is set to the value of env, if that parameter is given, or to the value of the global environment. Other upvalues are initialized with nil. (When you load a main chunk, the resulting function will always have exactly one upvalue, the _ENV variable (see §2.2). However, when you load a binary chunk created from a function (see string.dump), the resulting function can have an arbitrary number of upvalues.) All upvalues are fresh, that is, they are not shared with any other function.
[...]
Now, in a "main chunk", i.e. one loaded from a text Lua file, the first (and only) upvalue is always the environment of the chunk, so where it will look for "globals" (this is slightly different in Lua 5.1). Since an empty table is passed in, the chunk has no access to any of the global variables like string or io.
Therefore, when the function f() tries to index io, Lua throws an error "attempt to index a nil value", because io is nil. For whatever reason Lua then makes some internal function calls that end up triggering the blacklist, causing a new error that shadows the previous one; this makes debugging this error extremely inconvenient and almost impossible without using the debug library to get additional information about the call stack.
I ultimately only realized this myself after I noticed the original error message while looking at the locals of the function that made the blocked call.
I hope this solves the problem :)

Erlang: Make a ring

I'm quite new to Erlang (Reading through "Software for a Concurrent World"). From what I've read, we link two processes together to form a reliable system.
But if we need more than two process, I think we should connect them in a ring. Although this is slightly tangential to my actual question, please let me know if this is incorrect.
Given a list of PIDs:
[1,2,3,4,5]
I want to form these in a ring of {My_Pid, Linked_Pid} tuples:
[{1,2},{2,3},{3,4},{4,5},{5,1}]
I have trouble creating an elegant solution that adds the final {5,1} tuple.
Here is my attempt:
% linkedPairs takes [1,2,3] and returns [{1,2},{2,3}]
linkedPairs([]) -> [];
linkedPairs([_]) -> [];
linkedPairs([X1,X2|Xs]) -> [{X1, X2} | linkedPairs([X2|Xs])].
% joinLinks takes [{1,2},{2,3}] and returns [{1,2},{2,3},{3,1}]
joinLinks([{A, _}|_]=P) ->
{X, Y} = lists:last(P)
P ++ [{Y, A}].
% makeRing takes [1,2,3] and returns [{1,2},{2,3},{3,1}]
makeRing(PIDs) -> joinLinks(linkedPairs(PIDs)).
I cringe when looking at my joinLinks function - list:last is slow (I think), and it doesn't look very "functional".
Is there a better, more idiomatic solution to this?
If other functional programmers (non-Erlang) stumble upon this, please post your solution - the concepts are the same.
Use lists:zip with the original list and its 'rotated' version:
1> L=[1,2,3].
[1,2,3]
2> lists:zip(L, tl(L) ++ [hd(L)]).
[{1,2},{2,3},{3,1}]
If you are manipulating long lists, you can avoid the creation of the intermediary list tl(L) ++ [hd(L)] using an helper function:
1> L = lists:seq(1,5).
[1,2,3,4,5]
2> Link = fun Link([Last],First,Acc) -> lists:reverse([{Last,First}|Acc]);
Link([X|T],First,Acc) -> Link(T,First,[{X,hd(T)}|Acc]) end.
#Fun<erl_eval.42.127694169>
3> Joinlinks = fun(List) -> Link(List,hd(List),[]) end.
#Fun<erl_eval.6.127694169>
4> Joinlinks(L).
[{1,2},{2,3},{3,4},{4,5},{5,1}]
5>
But if we need more than two process, I think we should connect them
in a ring.
No. For instance, suppose you want to download the text of 10 different web pages. Instead of sending a request, then waiting for the server to respond, then sending the next request, etc., you can spawn a separate process for each request. Each spawned process only needs the pid of the main process, and the main process collects the results as they come in. When a spawned process gets a reply from a server, the spawned process sends a message to the main process with the results, then terminates. The spawned processes have no reason to send messages to each other. No ring.
I would guess that it is unlikely that you will ever create a ring of processes in your erlang career.
I have trouble creating an elegant solution that adds the final {5,1} tuple.
You can create the four other processes passing them self(), which will be different for each spawned process. Then, you can create a separate branch of your create_ring() function that terminates the recursion and returns the pid of the last created process to the main process:
init(N) ->
LastPid = create_ring(....),
create_ring(0, PrevPid) -> PrevPid;
create_ring(N, PrevPid) when N > 0 ->
Pid = spawn(?MODULE, loop, [PrevPid]),
create_ring(.......).
Then, the main process can call (not spawn) the same function that is being spawned by the other processes, passing the function the last pid that was returned by the create_ring() function:
init(N) ->
LastPid = create_ring(...),
loop(LastPid).
As a result, the main process will enter into the same message loop as the other processes, and the main process will have the last pid stored in the loop parameter variable to send messages to.
In erlang, you will often find that while you are defining a function, you won't be able to do everything that you want in that function, so you need to call another function to do whatever it is that is giving you trouble, and if in the second function you find you can't do everything you need to do, then you need to call another function, etc. Applied to the ring problem above, I found that init() couldn't do everything I wanted in one function, so I defined the create_ring() function to handle part of the problem.

Julia #distributed: subsequent code run before all workers finish

I have been headbutting on a wall for a few days around this code:
using Distributed
using SharedArrays
# Dimension size
M=10;
N=100;
z_ijw = zeros(Float64,M,N,M)
z_ijw_tmp = SharedArray{Float64}(M*M*N)
i2s = CartesianIndices(z_ijw)
#distributed for iall=1:(M*M*N)
# get index
i=i2s[iall][1]
j=i2s[iall][2]
w=i2s[iall][3]
# Assign function value
z_ijw_tmp[iall]=sqrt(i+j+w) # Any random function would do
end
# Print the last element of the array
println(z_ijw_tmp[end])
println(z_ijw_tmp[end])
println(z_ijw_tmp[end])
The first printed out number is always 0, the second number is either 0 or 10.95... (sqrt of 120, which is correct). The 3rd is either 0 or 10.95 (if the 2nd is 0)
So it appears that the print code (#mainthread?) is allowed to run before all the workers finish. Is there anyway for the print code to run properly the first time (without a wait command)
Without multiple println, I thought it was a problem with scope and spend a few days reading about it #.#
#distributed with a reducer function, i.e. #distributed (+), will be synced, whereas #distributed without a reducer function will be started asynchronously.
Putting a #sync in front of your #distributed should make the code behave the way you want it to.
This is also noted in the documentation here:
Note that without a reducer function, #distributed executes asynchronously, i.e. it spawns independent tasks on all available workers and returns immediately without waiting for completion. To wait for completion, prefix the call with #sync

How do I make a `Pipe` or `TTY` with a custom callback in Julia?

I'd like to fancy up my embedding of Julia in a MATLAB mex function by hooking up Julia's STDIN, STDOUT, and STDERR to the MATLAB terminal. The documentation for redirect_std[in|out|err] says that the stream that I pass in as the argument needs to be a TTY or a Pipe (or a TcpSocket, which wouldn't seem to apply).
I know how I will define the right callbacks for each stream (basically, wrappers around calls to MATLAB's input and fprintf), but I'm not sure how to construct the required stream.
Pipe was renamed PipeEndpoint in https://github.com/JuliaLang/julia/pull/12739, but the corresponding documentation was not updated and PipeEndpoint is now considered internal. Even so, creating the pipe up front is still doable:
pipe = Pipe()
Base.link_pipe(pipe)
redirect_stdout(pipe.in)
#async while !eof(pipe)
data = readavailable(pipe)
# Pass data to whatever function handles display here
end
Furthermore, the no-argument version of these functions already create a pipe object, so the recommended way to do this would be:
(rd,wr) = redirect_stdout()
#async while !eof(rd)
data = readavailable(rd)
# Pass data to whatever function handles display here
end
Nevertheless, all of this is less clear than it could be, so I have created a pull request to clean up this API: https://github.com/JuliaLang/julia/pull/18253. Once that pull request is merged, the link_pipe call will become unnecessary and pipe can be passed directly into redirect_stdout. Further, the return value from the no-argument version will become a regular Pipe.

Detecting keystrokes in Julia

I have a piece of code in Julia in which a solver iterates many, many times as it seeks a solution to a very complex problem. At present, I have to provide a number of iterations for the code to do, set low enough that I don't have to wait hours for the code to halt in order to save the current state, but high enough that I don't have to keep activating the code every 5 minutes.
Is there a way, with the current state of Julia (0.2), to detect a keystroke instructing the code to either end without saving (in case of problems) or end with saving? I require a method such that the code will continue unimpeded unless such a keystroke event has happened, and that will interrupt on any iteration.
Essentially, I'm looking for a command that will read in a keystroke if a keystroke has occurred (while the terminal that Julia is running in has focus), and run certain code if the keystroke was a specific key. Is this possible?
Note: I'm running julia via xfce4-terminal on Xubuntu, in case that affects the required command.
You can you an asynchronous task to read from STDIN, blocking until something is available to read. In your main computation task, when you are ready to check for input, you can call yield() to lend a few cycles to the read task, and check a global to see if anything was read. For example:
input = ""
#async while true
global input = readavailable(STDIN)
end
for i = 1:10^6 # some long-running computation
if isempty(input)
yield()
else
println("GOT INPUT: ", input)
global input = ""
end
# do some other work here
end
Note that, since this is cooperative multithreading, there are no race conditions.
You may be able to achieve this by sending an interrupt (Ctrl+C). This should work from the REPL without any changes to your code – if you want to implement saving you'll have to handle the resulting InterruptException and prompt the user.
I had some trouble with the answer from steven-g-johnson, and ended up using a Channel to communicate between tasks:
function kbtest()
# allow 'q' pressed on the keyboard to break the loop
quitChannel = Channel(10)
#async while true
kb_input = readline(stdin)
if contains(lowercase(kb_input), "q")
put!(quitChannel, 1)
break
end
end
start_time = time()
while (time() - start_time) < 10
if isready(quitChannel)
break
end
println("in loop # $(time() - start_time)")
sleep(1)
end
println("out of loop # $(time() - start_time)")
end
This requires pressing and then , which works well for my needs.

Resources