I have a C function that I have wrapped in Tcl that opens a file, reads the contents, performs an operation, and returns a value. Unfortunately, when I call the function to open a large file, it blocks the event loop. The OS is Linux.
I'd like to make the calls asynchronous. How do I do so?
(I can pass the work to another Tcl thread, but that's not exactly what I want).
This is quite difficult to do in general. The issue is that asynchronous file operations don't work very well with ordinary files due to the abstractions involved at the OS level. The best way around this — if you can — is to build an index over the file first so that you can avoid reading through it all and instead just seek to somewhere close to the data. This is the core of how a database works.
If you can't do that but you can apply a simple filter, putting that filter in a subprocess (pipes do work with asynchronous I/O in Tcl, and they do so on all supported platforms) or another thread (inter-thread messages are nice from an asynch processing perspective too) can work wonders.
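For instance, here is a minimal sketch of the subprocess route; the grep filter and the DoSomething handler are stand-ins for your own filter and per-line processing:
proc HandleLine {pipe} {
    if {[gets $pipe line] >= 0} {
        DoSomething $line            ;# your per-line processing
    } elseif {[eof $pipe]} {
        close $pipe
    }
}
set pipe [open |[list grep "pattern" /the/filename.txt] r]
chan configure $pipe -blocking 0
chan event $pipe readable [list HandleLine $pipe]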
Use the above techniques if you can. They're what I believe you should do.
If even that is impractical, you're going to have to do this the hard way.
The hard way involves inserting event-loop-aware delays in your processing.
Introducing delays in 8.5 and before
In Tcl 8.5 and before, you do this by splitting your code up into several pieces in different procedures and using a stanza like this to pass control between them through a “delay”:
# 100ms delay, but tune it yourself
after 100 [list theNextProcedure $oneArgument $another]
This is continuation-passing style, and it can be rather tricky to get right. In particular, it's rather messy with complicated processing. For example, suppose you were doing a loop over the first thousand lines of a file:
proc applyToLines {filename limit callback} {
    set f [open $filename]
    for {set i 1} {$i <= $limit} {incr i} {
        set line [gets $f]
        if {[eof $f]} break
        $callback $i $line
    }
    close $f
}
applyToLines "/the/filename.txt" 1000 DoSomething
In classic Tcl CPS, you'd do this:
proc applyToLines {filename limit callback} {
    set f [open $filename]
    Do1Line $f 1 $limit $callback
}
proc Do1Line {f i limit callback} {
    set line [gets $f]
    if {![eof $f]} {
        $callback $i $line
        if {[incr i] <= $limit} {
            after 10 [list Do1Line $f $i $limit $callback]
            return
        }
    }
    close $f
}
applyToLines "/the/filename.txt" 1000 DoSomething
As you can see, it's not a simple transformation, and if you wanted to do something once the processing was done, you'd need to pass around a callback. (You could also use globals, but that's hardly elegant…)
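For example, here is a sketch of threading a completion callback through; the extra done argument is my addition, not part of the code above:
proc applyToLines {filename limit callback done} {
    set f [open $filename]
    Do1Line $f 1 $limit $callback $done
}
proc Do1Line {f i limit callback done} {
    set line [gets $f]
    if {![eof $f]} {
        $callback $i $line
        if {[incr i] <= $limit} {
            after 10 [list Do1Line $f $i $limit $callback $done]
            return
        }
    }
    close $f
    {*}$done    ;# all lines processed; run the continuation
}
applyToLines "/the/filename.txt" 1000 DoSomething [list puts "all done"]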
(If you want help changing your code to work this way, you'll need to show us the code that you want help with.)
Introducing delays in 8.6
In Tcl 8.6, though the above code techniques will still work, you've got another option: coroutines! We can write this instead:
proc applyToLines {filename limit callback} {
    set f [open $filename]
    for {set i 1} {$i <= $limit} {incr i} {
        set line [gets $f]
        if {[eof $f]} break
        yield [after 10 [info coroutine]]
        $callback $i $line
    }
    close $f
}
coroutine ApplyToAFile applyToLines "/the/filename.txt" 1000 DoSomething
That's almost the same, except for the line with yield and info coroutine (which suspends the coroutine until it is resumed from the event loop in about 10ms time) and the line with coroutine ApplyToAFile, where that prefix creates a coroutine (with the given arbitrary name ApplyToAFile) and sets it running. As you can see, it's not too hard to transform your code like this.
(There is no chance at all of a backport of the coroutine engine to 8.5 or before; it completely requires the non-recursive script execution engine in 8.6.)
Tcl does support asynchronous I/O on its channels (including files) using an event-style (callback) approach.
The idea is to register a script as a callback for the so-called readable event on an opened channel that has been set to non-blocking mode. In that script, you call read on the channel once, process the data read, and then test whether that read operation hit the EOF condition, in which case you close the file.
Basically, it looks like this:
set data ""
set done false
proc read_chunk fd {
    global data
    append data [read $fd]
    if {[eof $fd]} {
        close $fd
        set ::done true
    }
}
set fd [open file]
chan configure $fd -blocking no
chan event $fd readable [list read_chunk $fd]
vwait ::done
(Two points: a) In case of Tcl ≤ 8.5 you'll have to use fconfigure instead of chan configure and fileevent instead of chan event; b) If you're using Tk you don't need vwait as Tk already forces the Tcl event loop to run).
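For example, the registration above spelled for 8.5 and earlier:
fconfigure $fd -blocking no
fileevent $fd readable [list read_chunk $fd]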
Note one caveat though: if the file you're reading is on a fast, locally attached medium (a rotating disk, an SSD, etc.), it will be readable almost all the time, which means Tcl's event loop will be saturated with readable events for your file. The overall user experience will likely be worse than if you'd read the file in one gulp, because the Tk UI uses idle-priority callbacks for many of its tasks, and those won't get a chance to run until your file has been read. In the end you'll have a sluggish or frozen UI anyway, and the file will be read more slowly (in wall-clock terms) than if it were read in a single gulp.
There are two possible solutions:
Do use a separate thread.
Employ a hack which gives the idle-priority events a chance to run: in your callback script for the readable event, schedule execution of another callback script with idle priority:
chan event $fd readable [list after idle [list read_chunk $fd]]
Obviously, this actually doubles the number of events piped through the Tcl event loop in response to chunks of the file's data becoming "available", but in exchange it brings the priority of processing your file's data down to that of the UI events.
You might also be tempted to just call update in your readable callback to force the event loop to process UI events. Please don't.
There's yet another approach available since Tcl 8.6: coroutines. The chief idea is that instead of using events, you interleave reading the file in reasonably small chunks with the other processing. Both tasks are implemented as coroutines that periodically yield to each other, creating cooperative multitasking. The Tcl wiki has more info on this.
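For illustration, a minimal sketch of that interleaving with a trivial round-robin driver; the task names and the chunk size are mine, not from the wiki:
proc reader {f} {
    while {![eof $f]} {
        set chunk [read $f 4096]   ;# a reasonably small chunk
        # ... process $chunk here ...
        yield 1                    ;# suspend; the driver resumes us later
    }
    close $f
    return 0                       ;# tells the driver we are done
}
proc worker {} {
    while 1 {
        # ... do one slice of the other processing ...
        yield 1
    }
}
coroutine ReadTask reader [open /the/filename.txt]
coroutine WorkTask worker
# Round-robin: give each task a turn until the file is fully read.
while {[ReadTask]} { WorkTask }
rename WorkTask {}                 ;# dispose of the still-suspended task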
My code is, simplified, something like:
import concurrent.futures
from IPython.display import display, display_markdown

def f(parameter):
    display_markdown("## %s" % (parameter,), raw=True)
    # Do some processing
    display(parameter)
    # Do some more processing
    display(parameter)
    # Do even more processing
    display(parameter)

with concurrent.futures.ThreadPoolExecutor() as executor:
    for result in executor.map(f, range(5)):
        pass  # Intentionally ignore results
The problem with this is that, because the function f is intentionally executed multiple times in parallel and the processing takes a different amount of time in each invocation, the display_markdown and display calls from different invocations end up interleaved.
How can I ensure that the subsections/output of each invocation of f are output together, without interleaving with the others?
And, because the processing takes some time, how can I see the intermediate results/outputs while they are being produced?
Logically, Jupyter would somehow have to maintain a cursor/pointer for each invocation of f and insert its output at the memorized point, even when further output has already appeared after it, instead of just appending everything at the end.
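For reference, a hedged sketch of one way this might be approached (my illustration, not part of the original question; it assumes ipywidgets is available): reserve an ipywidgets.Output area per invocation up front, in order, and have each invocation append only to its own area.
import concurrent.futures
import ipywidgets as widgets
from IPython.display import display, Markdown

outputs = [widgets.Output() for _ in range(5)]
for out in outputs:
    display(out)  # reserve each invocation's on-screen slot now, in order

def f(parameter):
    out = outputs[parameter]
    out.append_display_data(Markdown("## %s" % parameter))
    # ... some processing ...
    out.append_stdout("intermediate result for %s\n" % parameter)

with concurrent.futures.ThreadPoolExecutor() as executor:
    for result in executor.map(f, range(5)):
        pass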
I apologize beforehand for the length of this question. I have tried to make it as succinct as possible, but it's just a rather complicated beast.
In chapter 24 of Ierusalimschy's Programming in Lua (4th ed.), the author presents a toy ("ugly") implementation of an asynchronous I/O library, like this one1:
-- filename: async.lua
-- Based (with several modifications) on Listing 24.3 (p. 246) of *Programming
-- in Lua*, 4th edition.
local async = {}
local queue = {}

local function enqueue (command) table.insert(queue, command) end

function async.readline (stream, callback)
  enqueue(function () callback(stream:read()) end)
end

function async.writeline (stream, line, callback)
  enqueue(function () callback(stream:write(line)) end)
end

function async.stop () enqueue("stop") end

function async.runloop ()
  while true do
    local next_command = table.remove(queue, 1)
    if next_command == "stop" then break end
    next_command()
  end
end

return async
The author uses this toy library to illustrate some applications of coroutines, such as the scheme shown below for running "synchronous code on top of the asynchronous library"2:
-- Based (with several modifications) on Listing 24.5 (p. 248) of *Programming
-- in Lua*, 4th edition.
local async = require "async"

function run (synchronous_code)
  local co = coroutine.create(function ()
    synchronous_code()
    async.stop()
  end)
  local wrapper = function ()
    local status, result = assert(coroutine.resume(co))
    return result
  end
  wrapper()
  async.runloop()
end

function getline (stream)
  local co = coroutine.running()
  local callback = function (line) assert(coroutine.resume(co, line)) end
  async.readline(stream, callback)
  local line = coroutine.yield()
  return line
end

function putline (stream, line)
  local co = coroutine.running()
  local callback = function () assert(coroutine.resume(co)) end
  async.writeline(stream, line, callback)
  coroutine.yield()
end
The author uses this technique to implement a function that prints to stdout in reverse order the lines it read from stdin:
function synchronous_code ()
  local lines = {}
  local input = io.input()
  local output = io.output()
  while true do
    local line = getline(input)
    if not line then break end
    table.insert(lines, line)
  end
  for i = #lines, 1, -1 do putline(output, lines[i] .. "\n") end
end

run(synchronous_code)
The general idea is that the run function creates a coroutine that "registers" itself (through the callbacks created by getline and putline) into the asynchronous library's main loop. Whenever the asynchronous library's main loop executes one of these callbacks, it resumes the coroutine, which can do a bit more of its work, including registering the next callback with the main loop.
The run function gets the ball rolling by invoking the wrapper function, which, in turn, "resumes" (actually starts) the coroutine. The coroutine then runs until it encounters the first yield statement, which, in this example, happens within getline, right after getline has registered a callback into the async library's queue. Then the wrapper function regains control and returns. Finally, run invokes async.runloop. As async.runloop starts processing its queue, it resumes the coroutine, and off we go. The "synchronous code" (running within the coroutine) continues until the next getline or putline yields (after registering a callback), and async's main loop takes over again.
So far so good. But then, in Exercise 24.4 (p. 249), the author asks:
Exercise 24.4: Write a line iterator for the coroutine-based library (Listing 24.5), so that you can read the file with a for loop.
("Listing 24.5" refers to the code in the second code fragment above, where run, getline, and putline are defined.)
I am completely stumped with this one. In the example above, the coroutine "delivers" the lines it reads by writing them to stdout, which it can do all by itself. In contrast, the iterator requested by Exercise 24.4 would have to deliver its lines to a different coroutine, the one that is doing the iteration.
The only way that I can imagine this could happen is if the two coroutines could reciprocally resume each other. Is that even possible? I have not been able to construct a simple example of this, and would appreciate seeing code that does it3.
Also, it seems to me that for this to work at all, one would need to implement an object with a write method (so that it can be passed to putline) that is ultimately responsible for delivering lines (somehow) to the iterator's coroutine.
1I have changed some superficial details, such as the names of variables, indentation, etc. The overall structure and function are unchanged.
2Again, I have changed some inessential details, to make the code easier for me to follow.
3 It is worth noting that the remaining two exercises for this chapter (24.5 and 24.6) are both about implementing systems involving multiple concurrent coroutines. Therefore, it is not farfetched to imagine that Exercise 24.4 is also about having two coroutines talking to each other.
I believe you're completely overthinking this exercise. The way I understand it, you're only meant to write a synchronous-style for iterator that runs within the synchronous code given to the run function. Taking the third code block as a base:
function for_line(file)
  return function(file)
    return getline(file)
  end, file, nil
end

function synchronous_code ()
  local lines = {}
  local input = io.input()
  local output = io.output()
  for line in for_line(input) do
    table.insert(lines, line)
  end
  for i = #lines, 1, -1 do putline(output, lines[i] .. "\n") end
end

run(synchronous_code)
As you can see, you don't really need to be aware of the coroutines at all for this to work, which is kind of the point of the library.
Why do some people use while(true){} blocks in their code? How does it work?
It's an infinite loop. At each iteration, the condition will be evaluated. Since the condition is true, which is always... true... the loop will run forever. Exiting the loop is done by checking something inside the loop, and then breaking if necessary.
By placing the break check inside the loop, instead of using it as the condition, you can make it clearer that you expect this to run until some event occurs.
A common scenario where this is used is in games; you want to keep processing the action and rendering frames until the game is quit.
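For illustration, a minimal compilable sketch of such a game-style loop; the helper names are made up, standing in for real input handling, simulation, and rendering:
#include <stdbool.h>

static bool quit_requested = false;
static int frames_left = 3;

static void process_input(void)     { if (--frames_left <= 0) quit_requested = true; }
static void update_game_state(void) { /* advance the simulation one tick */ }
static void render_frame(void)      { /* draw the current state */ }

int main(void) {
    while (true) {
        process_input();        // may set the quit flag
        if (quit_requested)     // the single, explicit exit point
            break;
        update_game_state();
        render_frame();
    }
    return 0;
}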
It's just a loop that never ends on its own, known as an infinite loop. (Oftentimes, that's a bad thing.)
When it's empty, it serves to halt the program indefinitely*; otherwise there's typically some condition in the loop that, when true, breaks the loop:
while (true)
{
    // ...
    if (stopLoop)
        break;
    // ...
}
This is often cleaner than an auxiliary flag:
bool run = true;
while (run)
{
    // ...
    if (stopLoop)
    {
        run = false;
        continue; // jump to top
    }
    // ...
}
Also note some will recommend for (;;) instead, for various reasons. (Namely, it might get rid of a warning akin to "conditional expression is always true".)
*In most languages.
Rather than stuff all possible conditions in the while statement,
// Always tests all conditions in loop header:
while ((condition1 && condition2) || condition3 || conditionN_etc) {
    // logic...
    if (notable_condition)
        continue; // skip remainder, go direct to evaluation portion of loop
    // more logic
    // maybe more notable conditions use keyword: continue
}
Some programmers might argue it's better to put the conditions throughout the logic (i.e., not just inside the loop header) and to employ break statements to get out at appropriate places. This approach usually negates the original conditions to determine when to leave the loop (i.e., when to stop looping instead of when to keep looping).
// Always tests all conditions in body of loop logic:
while (true) {
    // logic...
    if (!condition1 || !condition2)
        break; // Break out for good.
    // more logic...
    if (!condition3)
        break;
    // even more logic ...
}
In real life it's often a grayer mixture, a combination of all these things, rather than a polarized decision to go one way or the other.
Usage will depend on the complexity of the logic and the preferences of the programmer... and maybe on the accepted answer of this thread :)
Also don't forget about do..while. The ultimate solution may use that version of the while construct to twist the conditional logic to its liking.
do {
    // logic with possible conditional tests and break or continue
} while (true); /* or many conditional tests */
In summary it's just nice to have options as a programmer. So don't forget to thank your compiler authors.
When Edsger W. Dijkstra was young, this was equivalent to:
Do loop initialization
label a:
    Do some code
    If (Loop is stoppable and End condition is met) goto label b
    /* nowadays replaced by some kind of break() */
    Do some more code, probably incrementing counters
    go to label a
label b:
    Be happy and continue
After Dijkstra decided to become Antigotoist, and convinced hordes of programmers to do so, a religious faith came upon earth and the truthiness of code was evident.
So the
Do loop initialization
While (true) {
    some code
    If (Loop is stoppable and End condition is met) break();
    Do some more code, probably incrementing counters
}
Be happy and continue
Replaced the abomination.
Not happy with that, fanatics went above and beyond. Once it was proved that recursion was better, clearer, and more general than looping, and that variables are just a diabolic incarnation, Functional Programming, as a dream, came true:
Nest[f[.],x, forever[May God help you break]]
And so loops (now recursion) became really unstoppable, or at least not demonstrably stoppable.
while (the condition) { do the function }
When the condition is true, it will do the function. So with while(true), the condition is always true and it will continue looping; the code after the loop will never be reached.
It's a loop that runs forever, unless there's a break statement somewhere inside the body.
The real point of having while (true) {..} is when the exit conditions have no strong single preference; it's a nice way to tell the reader: "well, there are actually break conditions A, B, C, ..., but the calculations of those conditions are too lengthy, so they were put into inner blocks independently, in order of their expected probability of occurring".
This code means that whatever is inside the loop will run indefinitely:
i = 0
while (true)
{
    i++;
}
echo i; // this code will never be reached
Unless there is something inside the curly brackets like:
if (i > 100) {
    break; // this will break the while loop
}
or another possibility for stopping the loop:
if (i > 100) {
    return i;
}
It is useful during testing, or during casual coding, or, as another answer points out, in video games.
But using it in production code is, in my view, bad practice.
For example, during debugging I want to know immediately what will stop the loop; I don't want to search the function for some hidden break or return.
And a programmer can easily forget to add one, and data in a database can be affected before the code is stopped by other means.
So the ideal would be something like this:
i = 0
while (i < 100)
{
    i++;
}
echo i; // this code will be reached in this scenario
I'm quite new to Erlang (Reading through "Software for a Concurrent World"). From what I've read, we link two processes together to form a reliable system.
But if we need more than two processes, I think we should connect them in a ring. Although this is slightly tangential to my actual question, please let me know if this is incorrect.
Given a list of PIDs:
[1,2,3,4,5]
I want to form these in a ring of {My_Pid, Linked_Pid} tuples:
[{1,2},{2,3},{3,4},{4,5},{5,1}]
I have trouble creating an elegant solution that adds the final {5,1} tuple.
Here is my attempt:
% linkedPairs takes [1,2,3] and returns [{1,2},{2,3}]
linkedPairs([]) -> [];
linkedPairs([_]) -> [];
linkedPairs([X1,X2|Xs]) -> [{X1, X2} | linkedPairs([X2|Xs])].

% joinLinks takes [{1,2},{2,3}] and returns [{1,2},{2,3},{3,1}]
joinLinks([{A, _}|_]=P) ->
    {X, Y} = lists:last(P),
    P ++ [{Y, A}].

% makeRing takes [1,2,3] and returns [{1,2},{2,3},{3,1}]
makeRing(PIDs) -> joinLinks(linkedPairs(PIDs)).
I cringe when looking at my joinLinks function - lists:last is slow (I think, since it has to traverse the whole list), and it doesn't look very "functional".
Is there a better, more idiomatic solution to this?
If other functional programmers (non-Erlang) stumble upon this, please post your solution - the concepts are the same.
Use lists:zip with the original list and its 'rotated' version:
1> L=[1,2,3].
[1,2,3]
2> lists:zip(L, tl(L) ++ [hd(L)]).
[{1,2},{2,3},{3,1}]
If you are manipulating long lists, you can avoid the creation of the intermediate list tl(L) ++ [hd(L)] by using a helper function:
1> L = lists:seq(1,5).
[1,2,3,4,5]
2> Link = fun Link([Last],First,Acc) -> lists:reverse([{Last,First}|Acc]);
Link([X|T],First,Acc) -> Link(T,First,[{X,hd(T)}|Acc]) end.
#Fun<erl_eval.42.127694169>
3> Joinlinks = fun(List) -> Link(List,hd(List),[]) end.
#Fun<erl_eval.6.127694169>
4> Joinlinks(L).
[{1,2},{2,3},{3,4},{4,5},{5,1}]
5>
But if we need more than two processes, I think we should connect them in a ring.
No. For instance, suppose you want to download the text of 10 different web pages. Instead of sending a request, then waiting for the server to respond, then sending the next request, etc., you can spawn a separate process for each request. Each spawned process only needs the pid of the main process, and the main process collects the results as they come in. When a spawned process gets a reply from a server, the spawned process sends a message to the main process with the results, then terminates. The spawned processes have no reason to send messages to each other. No ring.
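For illustration, here is a hedged sketch of that fan-out pattern; the module name, the fetch/1 stub, and the message shape are all made up for this example:
-module(fanout).
-export([collect/1, worker/2]).

%% Spawn one worker per URL; gather one result per worker, in completion order.
collect(Urls) ->
    Main = self(),
    [spawn(?MODULE, worker, [Main, Url]) || Url <- Urls],
    [receive {done, Result} -> Result end || _ <- Urls].

worker(Main, Url) ->
    Main ! {done, fetch(Url)}.

fetch(Url) ->
    {Url, fake_result}.  %% stub; replace with a real HTTP client call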
I would guess that it is unlikely that you will ever create a ring of processes in your erlang career.
I have trouble creating an elegant solution that adds the final {5,1} tuple.
You can create the four other processes, passing each one the pid of the process created just before it (starting with self() of the main process), so each spawned process gets a different pid to link to. Then, you can create a separate branch of your create_ring() function that terminates the recursion and returns the pid of the last created process to the main process:
init(N) ->
    LastPid = create_ring(....),

create_ring(0, PrevPid) -> PrevPid;
create_ring(N, PrevPid) when N > 0 ->
    Pid = spawn(?MODULE, loop, [PrevPid]),
    create_ring(.......).
Then, the main process can call (not spawn) the same function that is being spawned by the other processes, passing the function the last pid that was returned by the create_ring() function:
init(N) ->
    LastPid = create_ring(...),
    loop(LastPid).
As a result, the main process will enter into the same message loop as the other processes, and the main process will have the last pid stored in the loop parameter variable to send messages to.
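As an illustration only, here is a hypothetical loop/1 that each ring member might run, passing a token on to the next process; the message shape is made up, not from the original answer:
loop(NextPid) ->
    receive
        {token, 0} ->
            ok;                          % absorb the token; stop looping
        {token, N} when N > 0 ->
            NextPid ! {token, N - 1},    % forward it around the ring
            loop(NextPid)
    end.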
In Erlang, you will often find that while you are defining a function, you can't do everything you want in that one function, so you call another function to do whatever is giving you trouble; and if in the second function you find you can't do everything you need, you call yet another function, and so on. Applied to the ring problem above, I found that init() couldn't do everything I wanted in one function, so I defined the create_ring() function to handle part of the problem.
I have a piece of code in Julia in which a solver iterates many, many times as it seeks a solution to a very complex problem. At present, I have to provide a number of iterations for the code to do, set low enough that I don't have to wait hours for the code to halt in order to save the current state, but high enough that I don't have to keep activating the code every 5 minutes.
Is there a way, with the current state of Julia (0.2), to detect a keystroke instructing the code to either end without saving (in case of problems) or end with saving? I require a method such that the code will continue unimpeded unless such a keystroke event has happened, and that will interrupt on any iteration.
Essentially, I'm looking for a command that will read in a keystroke if a keystroke has occurred (while the terminal that Julia is running in has focus), and run certain code if the keystroke was a specific key. Is this possible?
Note: I'm running julia via xfce4-terminal on Xubuntu, in case that affects the required command.
You can use an asynchronous task to read from STDIN, blocking until something is available to read. In your main computation task, when you are ready to check for input, you can call yield() to lend a few cycles to the read task, and check a global to see if anything was read. For example:
input = ""
#async while true
global input = readavailable(STDIN)
end
for i = 1:10^6 # some long-running computation
if isempty(input)
yield()
else
println("GOT INPUT: ", input)
global input = ""
end
# do some other work here
end
Note that, since this is cooperative multithreading, there are no race conditions.
You may be able to achieve this by sending an interrupt (Ctrl+C). This should work from the REPL without any changes to your code – if you want to implement saving you'll have to handle the resulting InterruptException and prompt the user.
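For illustration, a sketch of catching the interrupt (written in current Julia syntax, so adapt for 0.2; save_state and the state argument are hypothetical stand-ins for your solver's saving routine and data). Note that catching Ctrl+C like this is reliable in the REPL, but not necessarily in non-interactive scripts:
function solve_with_interrupts(state)
    try
        for i in 1:10^6                  # the long-running solver loop
            # ... one iteration of the solver on `state` ...
        end
    catch err
        err isa InterruptException || rethrow()
        print("Interrupted. Save current state? [y/n] ")
        if startswith(lowercase(readline()), "y")
            save_state(state)            # hypothetical saving routine
        end
    end
end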
I had some trouble with the answer from steven-g-johnson, and ended up using a Channel to communicate between tasks:
function kbtest()
    # allow 'q' pressed on the keyboard to break the loop
    quitChannel = Channel(10)
    @async while true
        kb_input = readline(stdin)
        if contains(lowercase(kb_input), "q")
            put!(quitChannel, 1)
            break
        end
    end

    start_time = time()
    while (time() - start_time) < 10
        if isready(quitChannel)
            break
        end
        println("in loop @ $(time() - start_time)")
        sleep(1)
    end
    println("out of loop @ $(time() - start_time)")
end
This requires pressing q and then Enter, which works well for my needs.