I have the following simple OCaml Async job which is supposed to write to a file and terminate the process.
Unix.openfile "foobar" ~mode:[`Creat;`Rdwr]
>>= fun fd ->
let wr = Writer.create fd in
Writer.write wr "this is a test";
Unix.close fd
>>= fun () ->
exit 0
However, it seems that fd gets closed before the write is performed (part of the displayed sexp is "writer fd unexpectedly closed"). Is there a way to wait for the write to complete before closing the file descriptor? Actually, I don't understand why Writer.write doesn't return a Deferred.t just as Reader.read does. Wouldn't that solve the issue?
My problem is actually a little more general. Basically, I have a periodic job that writes something to a file using Clock.every'. The program can exit at any time and close the file descriptors. How do I make sure that all the writes are processed before the fd gets closed?
If my stopping job is:
Unix.close fd
>>= fun () ->
exit 0
A scheduled write can very well happen between Unix.close fd and exit 0.
In your particular case, you shouldn't close the file descriptor directly with the Unix.close function. The idea is that you have actually moved the ownership of fd to the writer, so it is now the writer's responsibility to close the file descriptor. So, you need to use Writer.close, which returns a unit deferred. This function will wait until all pending writes are finished, and then close the file descriptor, e.g.,
Unix.openfile "foobar" ~mode:[`Creat;`Rdwr]
>>= fun fd ->
let wr = Writer.create fd in
Writer.write wr "this is a test";
Writer.close wr
>>= fun () ->
exit 0
Answering your more general question "Is there a way to wait for the write to complete before closing the file descriptor?": yes, Writer.close will wait, in the sense that it returns a deferred that becomes determined after everything is written and fd is closed. You can also use the force_close parameter, which forces the closing operation (abandoning any still-pending writes) if that is considered a better option than a hanging program. E.g., you can give the program a reasonable amount of time to flush data, and then terminate with an error.
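For example, a minimal sketch of that force_close idea (assuming Async's Clock.after and sec; the five-second budget is arbitrary):
(* Give the writer up to five seconds to flush its buffer; once the
   force_close deferred becomes determined, the close happens anyway. *)
Writer.close wr ~force_close:(Clock.after (sec 5.))
>>= fun () ->
exit 0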
I apologize beforehand for the length of this question. I have tried to make it as succinct as possible, but it's just a rather complicated beast.
In chapter 24 of Ierusalimschy's Programming in Lua (4th ed.), the author presents a toy ("ugly") implementation of an asynchronous I/O library, like this one1:
-- filename: async.lua
-- Based (with several modifications) on Listing 24.3 (p. 246) of *Programming
-- in Lua*, 4th edition.
local async = {}
local queue = {}
local function enqueue (command) table.insert(queue, command) end
function async.readline (stream, callback)
  enqueue(function () callback(stream:read()) end)
end
function async.writeline (stream, line, callback)
  enqueue(function () callback(stream:write(line)) end)
end
function async.stop () enqueue("stop") end
function async.runloop ()
  while true do
    local next_command = table.remove(queue, 1)
    if next_command == "stop" then break end
    next_command()
  end
end
return async
The author uses this toy library to illustrate some applications of coroutines, such as the scheme shown below for running "synchronous code on top of the asynchronous library"2:
-- Based (with several modifications) on Listing 24.5 (p. 248) of *Programming
-- in Lua*, 4th edition.
local async = require "async"
function run (synchronous_code)
  local co = coroutine.create(function ()
    synchronous_code()
    async.stop()
  end)
  local wrapper = function ()
    local status, result = assert(coroutine.resume(co))
    return result
  end
  wrapper()
  async.runloop()
end
function getline (stream)
  local co = coroutine.running()
  local callback = function (line) assert(coroutine.resume(co, line)) end
  async.readline(stream, callback)
  local line = coroutine.yield()
  return line
end
function putline (stream, line)
  local co = coroutine.running()
  local callback = function () assert(coroutine.resume(co)) end
  async.writeline(stream, line, callback)
  coroutine.yield()
end
The author uses this technique to implement a function that prints to stdout in reverse order the lines it read from stdin:
function synchronous_code ()
  local lines = {}
  local input = io.input()
  local output = io.output()
  while true do
    local line = getline(input)
    if not line then break end
    table.insert(lines, line)
  end
  for i = #lines, 1, -1 do putline(output, lines[i] .. "\n") end
end
run(synchronous_code)
The general idea is that the run function creates a coroutine that "registers" itself (through the callbacks created by getline and putline) into the asynchronous library's main loop. Whenever the asynchronous library's main loop executes one of these callbacks, it resumes the coroutine, which can do a bit more of its work, including registering the next callback with the main loop.
The run function gets the ball rolling by invoking the wrapper function, which, in turn, "resumes" (actually starts) the coroutine. The coroutine then runs until it encounters the first yield statement, which, in this example, happens within getline, right after getline has registered a callback into the async library's queue. Then the wrapper function regains control and returns. Finally, run invokes async.runloop. As async.runloop starts processing its queue, it resumes the coroutine, and off we go. The "synchronous code" (running within the coroutine) continues until the next getline or putline yields (after registering a callback), and async's main loop takes over again.
So far so good. But then, in Exercise 24.4 (p. 249), the author asks:
Exercise 24.4: Write a line iterator for the coroutine-based library (Listing 24.5), so that you can read the file with a for loop.
("Listing 24.5" refers to the code in the second code fragment above, where run, getline, and putline are defined.)
I am completely stumped with this one. In the example above, the coroutine "delivers" the lines it reads by writing them to stdout, which it can do all by itself. In contrast, the iterator requested by Exercise 24.4 would have to deliver its lines to a different coroutine, the one that is doing the iteration.
The only way that I can imagine this could happen is if the two coroutines could reciprocally resume each other. Is that even possible? I have not been able to construct a simple example of this, and would appreciate seeing code that does it3.
Also, it seems to me that for this to work at all, one would need to implement an object with a write method (so that it can be passed to putline) that is ultimately responsible for delivering lines (somehow) to the iterator's coroutine.
1I have changed some superficial details, such as the names of variables, indentation, etc. The overall structure and function are unchanged.
2Again, I have changed some inessential details, to make the code easier for me to follow.
3 It is worth noting that the remaining two exercises for this chapter (24.5 and 24.6) are both about implementing systems involving multiple concurrent coroutines. Therefore, it is not farfetched to imagine that Exercise 24.4 is also about having two coroutines talking to each other.
I believe you're completely overthinking this exercise. The way I understand it, you're only meant to write a synchronous-style for iterator that runs within the synchronous code given to the run function. Taking the third code block as a base:
function for_line (file)
  return function (file)
    return getline(file)
  end, file, nil
end
function synchronous_code ()
  local lines = {}
  local input = io.input()
  local output = io.output()
  for line in for_line(input) do
    table.insert(lines, line)
  end
  for i = #lines, 1, -1 do putline(output, lines[i] .. "\n") end
end
run(synchronous_code)
As you can see, you don't really need to be aware of the coroutines at all for this to work, which is kind of the point of the library.
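An equivalent closure-based form (a sketch, not from the original answer) makes the same point even more directly, since getline already behaves synchronously inside run:
function for_line (file)
  return function () return getline(file) end
end
Either version lets the for loop in synchronous_code read lines without knowing that coroutines are involved at all.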
I am attempting to send a message from one process that I spawned to another for an assignment, I feel like I am very close here, but I think my syntax is just a bit off:
-module(assignment6).
-export([start/1, process1/2, process2/0, send_message/2]).
process1(N, Pid) ->
    Message = "This is the original Message",
    if
        N == 1 ->
            timer:sleep(3000),
            send_message(Pid, Message);
        N > 1 ->
            timer:sleep(3000),
            send_message(Pid, Message),
            process1(N-1, Pid);
        true ->
            io:fwrite("Negative/0, Int/Floating-Point Numbers not allowed")
    end.
process2() ->
    recieve
        Message ->
            io:fwrite(Message),
            io:fwrite("~n");
    end.
send_message(Pid, Message) ->
    Pid ! {Message}.
start(N) ->
    Pid = spawn(assignment6, process2, []),
    spawn(assignment6, process1, [N, Pid]).
The goal of this program is that the Message will be printed out N times when the function is started, but be delayed enough so that I can hot-swap the wording of the message mid-run. I just can't quite get the Message to process2 for printing.
Four small things:
It's spelled receive, not recieve
Remove the semicolon in process2. The last clause in a receive expression does not have a terminating semicolon. You can see this in the if expression in process1: the first two clauses end with a semicolon, but the third one does not.
In process2, print the message like this:
io:fwrite("~p~n", [Message])
Since Message is a tuple, not a string, passing it as the first argument to io:fwrite causes a badarg error. Let's ask io:fwrite to format it for us instead.
process2 should probably call itself after printing the message. Otherwise, it will receive one message and then exit.
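Putting those four fixes together, process2 would look something like this (a sketch, keeping the names from the question):
process2() ->
    receive
        Message ->
            io:fwrite("~p~n", [Message]),
            process2()
    end.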
So now you can run the code, and while it's running you can load a new version of the module with a different message (so-called "hot code swapping"). Will that change the message being printed? Why / why not?
It won't. process1 does a local call to itself, which means that it stays in the old version of the module. Do an external call instead (explicitly specifying the module: assignment6:process1(N-1, Pid)), and it will switch to the new version.
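Concretely, process1 with the module-qualified recursive call (everything else stays as in the question):
process1(N, Pid) ->
    Message = "This is the original Message",
    if
        N == 1 ->
            timer:sleep(3000),
            send_message(Pid, Message);
        N > 1 ->
            timer:sleep(3000),
            send_message(Pid, Message),
            %% external call: picks up the newly loaded module version
            assignment6:process1(N-1, Pid);
        true ->
            io:fwrite("Negative/0, Int/Floating-Point Numbers not allowed")
    end.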
I'm quite new to Erlang (Reading through "Software for a Concurrent World"). From what I've read, we link two processes together to form a reliable system.
But if we need more than two processes, I think we should connect them in a ring. Although this is slightly tangential to my actual question, please let me know if this is incorrect.
Given a list of PIDs:
[1,2,3,4,5]
I want to form these in a ring of {My_Pid, Linked_Pid} tuples:
[{1,2},{2,3},{3,4},{4,5},{5,1}]
I have trouble creating an elegant solution that adds the final {5,1} tuple.
Here is my attempt:
% linkedPairs takes [1,2,3] and returns [{1,2},{2,3}]
linkedPairs([]) -> [];
linkedPairs([_]) -> [];
linkedPairs([X1,X2|Xs]) -> [{X1, X2} | linkedPairs([X2|Xs])].
% joinLinks takes [{1,2},{2,3}] and returns [{1,2},{2,3},{3,1}]
joinLinks([{A, _}|_]=P) ->
    {X, Y} = lists:last(P),
    P ++ [{Y, A}].
% makeRing takes [1,2,3] and returns [{1,2},{2,3},{3,1}]
makeRing(PIDs) -> joinLinks(linkedPairs(PIDs)).
I cringe when looking at my joinLinks function - lists:last is slow (I think), and it doesn't look very "functional".
Is there a better, more idiomatic solution to this?
If other functional programmers (non-Erlang) stumble upon this, please post your solution - the concepts are the same.
Use lists:zip with the original list and its 'rotated' version:
1> L=[1,2,3].
[1,2,3]
2> lists:zip(L, tl(L) ++ [hd(L)]).
[{1,2},{2,3},{3,1}]
If you are manipulating long lists, you can avoid the creation of the intermediate list tl(L) ++ [hd(L)] by using a helper function:
1> L = lists:seq(1,5).
[1,2,3,4,5]
2> Link = fun Link([Last],First,Acc) -> lists:reverse([{Last,First}|Acc]);
Link([X|T],First,Acc) -> Link(T,First,[{X,hd(T)}|Acc]) end.
#Fun<erl_eval.42.127694169>
3> Joinlinks = fun(List) -> Link(List,hd(List),[]) end.
#Fun<erl_eval.6.127694169>
4> Joinlinks(L).
[{1,2},{2,3},{3,4},{4,5},{5,1}]
5>
But if we need more than two process, I think we should connect them
in a ring.
No. For instance, suppose you want to download the text of 10 different web pages. Instead of sending a request, then waiting for the server to respond, then sending the next request, etc., you can spawn a separate process for each request. Each spawned process only needs the pid of the main process, and the main process collects the results as they come in. When a spawned process gets a reply from a server, the spawned process sends a message to the main process with the results, then terminates. The spawned processes have no reason to send messages to each other. No ring.
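A minimal sketch of that fan-out/collect pattern (fetch/1 is a hypothetical stand-in for whatever performs the HTTP request):
fetch_all(Urls) ->
    Parent = self(),
    %% one worker per URL; each sends {its pid, result} back to the parent
    Pids = [spawn(fun() -> Parent ! {self(), fetch(Url)} end) || Url <- Urls],
    %% collect one reply per worker, then the workers have already terminated
    [receive {Pid, Result} -> Result end || Pid <- Pids].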
I would guess that it is unlikely that you will ever create a ring of processes in your erlang career.
I have trouble creating an elegant solution that adds the final {5,1} tuple.
You can create the four other processes passing them self(), which will be different for each spawned process. Then, you can create a separate branch of your create_ring() function that terminates the recursion and returns the pid of the last created process to the main process:
init(N) ->
    LastPid = create_ring(....),

create_ring(0, PrevPid) -> PrevPid;
create_ring(N, PrevPid) when N > 0 ->
    Pid = spawn(?MODULE, loop, [PrevPid]),
    create_ring(.......).
Then, the main process can call (not spawn) the same function that is being spawned by the other processes, passing the function the last pid that was returned by the create_ring() function:
init(N) ->
LastPid = create_ring(...),
loop(LastPid).
As a result, the main process will enter into the same message loop as the other processes, and the main process will have the last pid stored in the loop parameter variable to send messages to.
In erlang, you will often find that while you are defining a function, you won't be able to do everything that you want in that function, so you need to call another function to do whatever it is that is giving you trouble, and if in the second function you find you can't do everything you need to do, then you need to call another function, etc. Applied to the ring problem above, I found that init() couldn't do everything I wanted in one function, so I defined the create_ring() function to handle part of the problem.
I'm trying to execute external program from Julia via run, then wait until it finishes and store its output into a variable.
The only solution I came up with is this:
callback = function(data)
    print(data)
end
open(`minizinc com.mzn com.dzn`) do f
    x = readall(f)
    callback(x)
end
The problem is that I do not want to use callbacks.
Is there any way, how to wait until the process is finished and then continue in executing?
Thanks in advance
You can just call readall (or readstring on Julia master) on the command object:
julia> readall(`echo Hello`)
"Hello\n"
I have a C function that I have wrapped in Tcl that opens a file, reads the contents, performs an operation, and returns a value. Unfortunately, when I call the function to open a large file, it blocks the event loop. The OS is Linux.
I'd like to make the calls asynchronous. How do I do so?
(I can pass the work to another Tcl thread, but that's not exactly what I want).
This is quite difficult to do in general. The issue is that asynchronous file operations don't work very well with ordinary files due to the abstractions involved at the OS level. The best way around this — if you can — is to build an index over the file first so that you can avoid reading through it all and instead just seek to somewhere close to the data. This is the core of how a database works.
If you can't do that but you can apply a simple filter, putting that filter in a subprocess (pipes do work with asynchronous I/O in Tcl, and they do so on all supported platforms) or another thread (inter-thread messages are nice from an asynch processing perspective too) can work wonders.
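For instance, a sketch of that subprocess variant (myfilter and the file name are hypothetical stand-ins); the pipe is read with an ordinary readable handler:
proc HandleChunk chan {
    # accumulate whatever the filter has produced so far
    append ::filtered [read $chan]
    if {[eof $chan]} {
        close $chan
        set ::pipeDone 1
    }
}
set pipe [open "|myfilter /the/big/file.txt" r]
chan configure $pipe -blocking no
chan event $pipe readable [list HandleChunk $pipe]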
Use the above techniques if you can. They're what I believe you should do.
If even that is impractical, you're going to have to do this the hard way.
The hard way involves inserting event-loop-aware delays in your processing.
Introducing delays in 8.5 and before
In Tcl 8.5 and before, you do this by splitting your code up into several pieces in different procedures and using a stanza like this to pass control between them through a “delay”:
# 100ms delay, but tune it yourself
after 100 [list theNextProcedure $oneArgument $another]
This is continuation-passing style, and it can be rather tricky to get right. In particular, it's rather messy with complicated processing. For example, suppose you were doing a loop over the first thousand lines of a file:
proc applyToLines {filename limit callback} {
    set f [open $filename]
    for {set i 1} {$i <= $limit} {incr i} {
        set line [gets $f]
        if {[eof $f]} break
        $callback $i $line
    }
    close $f
}
applyToLines "/the/filename.txt" 1000 DoSomething
In classic Tcl CPS, you'd do this:
proc applyToLines {filename limit callback} {
    set f [open $filename]
    Do1Line $f 1 $limit $callback
}
proc Do1Line {f i limit callback} {
    set line [gets $f]
    if {![eof $f]} {
        $callback $i $line
        if {[incr i] <= $limit} {
            after 10 [list Do1Line $f $i $limit $callback]
            return
        }
    }
    close $f
}
applyToLines "/the/filename.txt" 1000 DoSomething
As you can see, it's not a simple transformation, and if you wanted to do something once the processing was done, you'd need to pass around a callback. (You could also use globals, but that's hardly elegant…)
(If you want help changing your code to work this way, you'll need to show us the code that you want help with.)
Introducing delays in 8.6
In Tcl 8.6, though the above code techniques will still work, you've got another option: coroutines! We can write this instead:
proc applyToLines {filename limit callback} {
    set f [open $filename]
    for {set i 1} {$i <= $limit} {incr i} {
        set line [gets $f]
        if {[eof $f]} break
        yield [after 10 [info coroutine]]
        $callback $i $line
    }
    close $f
}
coroutine ApplyToAFile applyToLines "/the/filename.txt" 1000 DoSomething
That's almost the same, except for the line with yield and info coroutine (which suspends the coroutine until it is resumed from the event loop in about 10ms time) and the line with coroutine ApplyToAFile, where that prefix creates a coroutine (with the given arbitrary name ApplyToAFile) and sets it running. As you can see, it's not too hard to transform your code like this.
(There is no chance at all of a backport of the coroutine engine to 8.5 or before; it completely requires the non-recursive script execution engine in 8.6.)
Tcl does support asynchronous I/O on its channels (which include files) using an event-style (callback) approach.
The idea is to register a script as a callback for the so-called readable event on an opened channel set to non-blocking mode. In that script, call read on the channel once, process the data read, and then check whether that read operation hit the EOF condition, in which case close the file.
Basically this looks like this:
set data ""
set done false
proc read_chunk fd {
    global data
    append data [read $fd]
    if {[eof $fd]} {
        close $fd
        set ::done true
    }
}
set fd [open file]
chan configure $fd -blocking no
chan event $fd readable [list read_chunk $fd]
vwait ::done
(Two points: a) In case of Tcl ≤ 8.5 you'll have to use fconfigure instead of chan configure and fileevent instead of chan event; b) If you're using Tk you don't need vwait as Tk already forces the Tcl event loop to run).
Note one caveat though: if the file you're reading is located on a fast, physically attached medium (a rotating disk, an SSD, etc.), it will almost always be readable, which means the Tcl event loop will be saturated with readable events for your file. The overall user experience will then likely be worse than if you had read the file in one gulp, because the Tk UI uses idle-priority callbacks for many of its tasks, and they won't get a chance to run until your file is fully read; in the end you'll have a sluggish or frozen UI anyway, and the file will be read more slowly (in wall-clock terms) than in a single gulp.
There are two possible solutions:
Do use a separate thread.
Employ a hack which gives the idle-priority events a chance to run: in your callback script for the readable event, schedule execution of another callback script with idle priority:
chan event $fd readable [list after idle [list read_chunk $fd]]
Obviously, this actually doubles the number of events piped through the Tcl event loop in response to the chunks of the file's data becoming "available" but in exchange it brings the priority of processing your file's data down to that of UI events.
You might also be tempted to just call update in your readable callback to force the event loop to process the UI event, — please don't.
There's yet another approach available since Tcl 8.6: coroutines. The chief idea is that, instead of using events, you interleave reading the file in reasonably small chunks with some other processing. Both tasks should be implemented as coroutines that periodically yield to each other, creating cooperative multitasking. The Tcl wiki has more info on this.
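A minimal sketch of that idea, assuming Tcl 8.6 (the file path is a placeholder; the chunk size and the use of after idle to reschedule are arbitrary choices):
coroutine reader apply {{path} {
    set f [open $path]
    while {![eof $f]} {
        append ::data [read $f 65536]        ;# read one reasonably small chunk
        yield [after idle [info coroutine]]  ;# let other event-loop work run in between
    }
    close $f
    set ::done 1
}} /path/to/big/file
vwait ::done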