I'm warming up with Clojure and started to write a few simple functions.
I'm realizing how the language is clearly well-suited for parallel computation and this got me thinking. I've got an app (written in Java but whatever) that works the following way:
one thread waits for input to come in (filesystem in my case but it could be network or whatever) and puts that input once it arrives on a queue
several consumers fetch data from that queue and process the data in parallel
The code that puts the input on the queue for parallel processing might look like this (it's just an example):
asynchFetchInput(new MyCallBack() {
    public void handle(Input input) {
        queue.put(input);
    }
});
Where asynchFetchInput would spawn a Thread and then call the callback.
It's really just an example but if someone could explain how to do something similar using Clojure it would greatly help me understand the "bigger picture".
If you have to transform data, you can make it into a seq and then feed it to either map or pmap; the latter will process it in parallel. filter and reduce are also really useful, so you might want to see if you can express your logic in those terms.
You might also want to look into the concurrency utilities that ship with Java (java.util.concurrent) rather than spawning your own threads.
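A rough Clojure sketch of the same producer/consumer shape, using a plain java.util.concurrent queue and futures for the threads; fetch-input and process below are placeholders for your own input source and per-item work:

(import '(java.util.concurrent LinkedBlockingQueue))

(def queue (LinkedBlockingQueue.))

;; Producer: runs on its own thread, blocking until input arrives.
(future
  (while true
    (.put queue (fetch-input))))      ; fetch-input is a placeholder

;; Consumers: a few threads taking items off the queue in parallel.
(dotimes [_ 4]
  (future
    (while true
      (process (.take queue)))))      ; process is a placeholder

If the input is already available as a seq, (pmap process inputs) covers the consumer side on its own.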
I am new to Elm and I am trying to wrap my head around asynchronous computations.
I am trying to write an Elm program to automate radiology report generation.
This program uses a graph data structure to store the state of a hierarchical tree of toggles, radio buttons, and edit boxes, and the user can edit the toggles (add, delete, move them around).
Individual toggles can also have their behavior modified so that toggling one ON will show or hide other toggles.
As an example, when you toggled Chest ON, the lung would show, and the liver would be hidden. I do this by using the edges of the graph in the update function, which then is interpreted and rendered in the view function.
I then use a look-up table of sorts to compute the English phrase corresponding to a set of specific ON or OFF toggles, like liver, cyst, 3, mm, segment_4.
This works fine, but computing the English phrases is computationally expensive and it blocks my UI thread.
I have been playing with Tasks and Process.spawn but I am misunderstanding them somehow.
I thought that I could change my ToggleOnOff message implementation so that I could spawn the long-running computation, return my new state immediately (i.e., (model, Cmd msg)), and have the spawned process call update again with the English text string when it finished the computation. I have failed completely in implementing this.
I have a generateReport function which I turned into a Task using Task.succeed.
Then I tried to spawn this generateReportTask and chain it using Task.andThen to the msg UpdateReportFullText. I would then need to return a new model immediately with the state of the toggles, before the long-running generateReportTask runs and calls update on its own to add the English text to the model.
I believe I am still thinking about this imperatively and that I am misunderstanding FP and Elm. My experience so far has shown me that the Elm way is usually shorter and clearer than my clumsy imperative-style noob attempts.
Can someone please educate me and help a poor fellow see the light?
Thank you
(Sorry for the dry, no-code question, but my program is hard to reproduce in a few lines...)
Long running computations are a bit tricky on the web platform, especially if you want them not to block the user interface.
In practice you have two options:
1. Use Web Workers. This is a technology that actually spins up a new process that can run entirely in parallel with your UI process. Unfortunately, Web Workers are basically an entirely separate program that communicates with your main program via (mostly) serialised messages. This means the time the offloaded computation saves has to outweigh what the serialization/deserialization ends up costing.
In Elm, you would use a Platform.worker program, then add ports and wire things together in JavaScript. This article seems to describe this in some detail.
2. Chunk up your computation into small bits and perform each small bit per frame. This can work even better in some cases where there are intermediate results that can be rendered, although this isn't necessary. You can see an example I wrote here (notice how it actually takes a few seconds to compute the layout of the graph, but it just looks like a neat little animation).
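For the chunking approach, a rough Elm sketch (Work, doChunk, and isDone are placeholders for your report state and the slice of work you can do in one frame; Model is assumed to carry a work field):

import Browser.Events
import Time

type Msg
    = Frame Time.Posix

subscriptions : Model -> Sub Msg
subscriptions model =
    if isDone model.work then
        Sub.none
    else
        Browser.Events.onAnimationFrame Frame

update : Msg -> Model -> ( Model, Cmd Msg )
update msg model =
    case msg of
        Frame _ ->
            -- advance the computation by one small step per frame
            ( { model | work = doChunk model.work }, Cmd.none )

The UI stays responsive because each step is short, and once isDone returns True the subscription switches itself off.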
I could not find detailed documentation about the @async macro. From the docs about parallelism I understand that there is only one system thread used inside a Julia process and that explicit task switching happens with the help of the yieldto function - correct me if I am wrong about this.
For me it is difficult to understand when exactly these task switches happen just by looking at the code, and knowing when it happens seems crucial.
As I understand it, a yieldto somewhere in the code (or in some function called by the code) needs to be there to ensure that the system is not stuck with only one task.
For example, when there is a read operation, inside the read there is probably a wait call, and in the implementation of wait there is probably a yieldto call. I thought that without the yieldto call the code would get stuck in one task; however, running the following example seems to prove this hypothesis wrong.
@async begin        # Task A
    while true
        println("A")
    end
end

while true          # Task B
    println("B")
end
This code produces the following output
BA
BA
BA
...
It is very unclear to me where the task switching happens inside the task created by the @async macro in the code above.
How can I tell, just by looking at some code, where task switches can happen?
The task switch happens inside the call to println("A"), which at some point calls write(STDOUT, "A".data). Because isa(STDOUT, Base.AsyncStream) and there is no method that is more specialized, this resolves to:
write{T}(s::AsyncStream,a::Array{T}) at stream.jl:782
If you look at this method, you will notice that it calls stream_wait(ct) on the current task ct, which in turn calls wait().
(Also note that println is not atomic, because there is a potential wait between writing the arguments and the newline.)
You could of course determine when stuff like that happens by looking at all the code involved. But I don't see why you would need to know this exactly: when working with parallelism, you should not depend on tasks not switching context anyway. If you depend on a certain execution order, synchronize explicitly.
(You already kind of noted this in your question, but let me restate it here: As a rule of thumb, when using green threads, you can expect potential context switches when doing IO, because blocking for IO is a textbook example of why green threads are useful in the first place.)
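As a concrete illustration of that last point: a compute-only loop never reaches an implicit wait, so the only way another task gets scheduled is an explicit yield. A minimal sketch (not your original code):

done = Ref(false)

@async begin
    println("task ran")   # only executes once the main loop hits a switch point
    done[] = true
end

while !done[]
    # ... some pure computation ...
    yield()               # without this, the loop spins forever and starves the task
end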
I'm currently writing a programme which has a function to hash a number of files in the background. I've read the Qt4 documentation a number of times over and I still can't really figure out which threading option is best for this.
http://doc.qt.io/qt-5/thread-basics.html
There's really no need to update the GUI when it's done with each file, I just don't wish to block the GUI and I really only need a single signal/slot connection upon completion. I'm thinking of extending QThread for a hashing thread. Does this sound reasonable/right?
I have this article bookmarked as it nicely illustrates the use of QThread and highlights some common misconceptions about it. Sample code available, which runs without blocking the GUI. Sample is hosted on RapidShare, but they seem to have implemented some sort of timed waiting period since I last used it.
This sounds like a good place to use the QtConcurrent::map() function. The map function can apply the same operation to a container of objects, in your case, files. Once you start the map function, you can create a QFutureWatcher and connect to its finished signal to be notified when all of the work is done.
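A rough sketch of that combination; hashFile, m_files, and onHashingFinished are placeholder names, and the connect syntax assumes Qt 5:

#include <QtConcurrent/QtConcurrent>
#include <QFutureWatcher>
#include <QCryptographicHash>
#include <QFile>

// Placeholder worker: hash one file (runs on a thread-pool thread).
static void hashFile(const QString &path)
{
    QFile f(path);
    if (f.open(QIODevice::ReadOnly)) {
        QCryptographicHash hash(QCryptographicHash::Sha1);
        hash.addData(&f);
        // ... store hash.result() somewhere thread-safe ...
    }
}

void MainWindow::startHashing(const QStringList &files)
{
    m_files = files;  // keep the container alive while the map runs

    auto *watcher = new QFutureWatcher<void>(this);
    connect(watcher, &QFutureWatcher<void>::finished,
            this, &MainWindow::onHashingFinished);  // single completion notification

    watcher->setFuture(QtConcurrent::map(m_files, hashFile));
}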
I have some high performance file transfer code which I wrote in C# using the Async Programming Model (APM) idiom (eg, BeginRead/EndRead). This code reads a file from a local disk and writes it to a socket.
For best performance on modern hardware, it's important to keep more than one outstanding I/O operation in flight whenever possible. Thus, I post several BeginRead operations on the file, then when one completes, I call a BeginSend on the socket, and when that completes I do another BeginRead on the file. The details are a bit more complicated than that but at the high level that's the idea.
I've got the APM-based code working, but it's very hard to follow and probably has subtle concurrency bugs. I'd love to use TPL for this instead. I figured Task.Factory.FromAsync would just about do it, but there's a catch.
All of the I/O samples I've seen (most particularly the StreamExtensions class in the Parallel Extensions Extras) assume one read followed by one write. This won't perform the way I need.
I can't use something simple like Parallel.ForEach or the Extras extension Task.Factory.Iterate because the async I/O tasks don't spend much time on a worker thread, so Parallel just starts up another task, resulting in potentially dozens or hundreds of pending I/O operations; way too much! You can work around that by Waiting on your tasks, but that causes creation of an event handle (a kernel object), and a blocking wait on a task wait handle, which ties up a worker thread. My APM-based implementation avoids both of those things.
I've been playing around with different ways to keep multiple read/write operations in flight, and I've managed to do so using continuations that call a method that creates another task, but it feels awkward, and definitely doesn't feel like idiomatic TPL.
Has anyone else grappled with an issue like this with the TPL? Any suggestions?
If you're worried about too many threads, you can just set ParallelOptions.MaxDegreeOfParallelism to an acceptable number in your call to Parallel.ForEach.
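For example (a minimal sketch; the directory path and ProcessFile are placeholders for your own work items and per-item processing):

using System.IO;
using System.Threading.Tasks;

var files = Directory.EnumerateFiles(@"C:\data");

var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(files, options, file =>
{
    ProcessFile(file);   // placeholder for the per-file work
});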
I was reading Joel On Software today and ran across this quote:
Without understanding functional programming, you can't invent MapReduce, the algorithm that makes Google so massively scalable. The terms Map and Reduce come from Lisp and functional programming. MapReduce is, in retrospect, obvious to anyone who remembers from their 6.001-equivalent programming class that purely functional programs have no side effects and are thus trivially parallelizable.
What does he mean when he says functional programs have no side effects? And how does this make parallelizing trivial?
What does he mean when he says functional programs have no side effects?
Most people think of programming as creating variables, assigning them values, adding things to lists, etc. Variables "vary", hence the name.
Functional programming is a style of designing programs to eliminate variables -- everything is a constant or readonly.
When Joel says functional programs have no side effects, there's a lot of hand-waving involved, since it's perfectly easy to write functional programs which do modify variables -- but largely, when people talk about functional programming, they mean programs which don't hold any modifiable state.
"But Juliet! How can I write a useful program if it can't modify anything?"
Good question!
You "modify" things by creating a new instance of your object with modified state. For example:
class Customer
{
    public string Id { get; private set; }
    public string Name { get; private set; }

    public Customer(string id, string name)
    {
        this.Id = id;
        this.Name = name;
    }

    public Customer SetName(string name)
    {
        // returns a new customer with the given name
        return new Customer(this.Id, name);
    }
}
So all the initialization takes place in the constructor, and we can't modify the object ever again -- we create new instances with our modifications passed into the constructor.
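For example, a caller never mutates the original (hypothetical values, of course):

var original = new Customer("42", "Alice");
var renamed  = original.SetName("Alicia");  // original still carries the name "Alice"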
You'll be surprised how far you can carry this style of programming.
"But Juliet!? How can this possibly be efficient with all this copying?"
The trick is realizing that you don't have to copy your entire object graph, only the parts which have changed. If parts of your object graph haven't changed, you can reuse them in your new object (copy the pointer; don't new up a fresh instance of anything in that part of the graph).
In fact, it's extremely easy to write immutable versions of many common data structures -- like immutable AVL trees, red-black trees, many kinds of heaps, etc. See here for an implementation of an immutable treap.
In most cases, the immutable version of a data structure has the same computational complexity for insert/lookup/delete as its mutable counterparts. The only difference is that inserting returns a new version of your data structure without modifying the original one.
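One concrete illustration from .NET (System.Collections.Immutable, whose collections share structure between versions in exactly this way):

using System.Collections.Immutable;

var v1 = ImmutableList.Create(1, 2, 3);
var v2 = v1.Add(4);   // v1 is unchanged; v2 reuses most of v1's internal structure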
And how does this make parallelizing trivial?
Think about it: if you have an immutable tree or any other data structure, then you can have two threads inserting, removing, and looking up items in the tree without needing to take a lock. Since the tree is immutable, it's not possible for one thread to put the object in an invalid state under another thread's nose -- so we eliminate a whole class of multithreading errors related to race conditions. And since we don't have race conditions, we don't have any need for locks, so we also eliminate a whole class of errors related to deadlocking.
Because immutable objects are intrinsically thread-safe, they're said to make concurrency "trivial". But that's only really half the story. There are times when we need changes in one thread to be visible to another - so how do we do that with immutable objects?
The trick is to re-think our concurrency model. Instead of having two threads sharing state with one another, we think of threads as being a kind of mailbox which can send and receive messages.
So if thread A has a pointer to thread B, it can pass a message -- the updated data structure -- to thread B, where thread B merges its copy of the data structure with the copy in the message it received. It's also possible for a thread to pass itself as a message, so that thread A sends itself to thread B, and thread B then sends a message back to thread A via the pointer it received.
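In-process, the mailbox can be as simple as a blocking queue that only ever carries immutable messages; a small self-contained sketch (BlockingCollection is just one convenient vehicle):

using System.Collections.Concurrent;
using System.Threading.Tasks;

var mailbox = new BlockingCollection<string>();

// "Thread B": drains the mailbox and reacts to each immutable message.
var receiver = Task.Run(() =>
{
    foreach (var msg in mailbox.GetConsumingEnumerable())
        System.Console.WriteLine($"got: {msg}");
});

// "Thread A": sends messages, then signals that no more are coming.
mailbox.Add("hello");
mailbox.Add("world");
mailbox.CompleteAdding();
receiver.Wait();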
Believe me, the strategy above makes concurrent programming 1000x easier than locks on mutable state. So the important part of Joel's comment: "Without understanding functional programming, you can't invent MapReduce, the algorithm that makes Google so massively scalable."
Traditional locking doesn't scale well because, in order to lock an object, you need a reference to its pointer -- the locked object needs to be in the same memory space as the code doing the locking. You can't obtain a lock on an object across processes.
But think about the message-passing model above: threads are passing messages to and from one another. Is there really a difference between passing a message to a thread in the same process and passing a message to a thread listening on some IP address? Not really. And it's exactly because threads can send and receive messages across the process boundary that message passing scales as well as it does: it's not bound to a single machine, so you can have your app running across as many machines as needed.
(For what it's worth, you can implement message passing using mutable messages; it's just that no one ever wants to, because a thread can't do anything with the message without locking it -- which we already know is full of problems. So immutability is the default way to go when you're using message-passing concurrency.)
Although it's very high level and glosses over a lot of actual implementation detail, the principles above are exactly how Google's MapReduce can scale pretty much indefinitely.
See also: http://www.defmacro.org/ramblings/fp.html
Let me wikipedia it for you
In brief, a pure function is one that calculates things based only on its given arguments and returns a result.
Writing something to the screen or changing a global variable (or a data member) is a side effect. Relying on data other than what is given in an argument also makes your function non-pure, although it is not a side effect.
Writing a "pure function" makes it easier to invoke many instances of it in parallel. That's mainly because, being pure, you can be sure it doesn't affect the outside world and doesn't rely on outside information.
Functional programming aims to create functions that depend only on their inputs and do not change state elsewhere in the system (i.e., their execution has no side effects).
This means, among other things, that they are idempotent: the same function can be run many times over the same input, and since it has no side-effects you don't care how many times it's run. This is good for parallelization, because it means that you don't have to create a lot of overhead to keep track of whether a particular node crashes.
Of course, in the real world it's hard to keep side effects out of your programs (e.g., writing to a file), so real-world programs tend to be a combination of functional and non-functional portions.
Units of functional programs have only their input and their output, no internal state. This lack of internal state means you can put the functional modules on any number of cores/nodes without having to worry about a previous calculation in the module affecting the next.
I believe what he means is that purely functional code makes explicit the flow of data through the program. Side-effects allow portions of the code to "communicate" in ways that are difficult to analyze.
Without side-effects in play, the runtime environment can determine how to best decompose the code into parallelism according to the structure of the functional code.
This would be a simplification of the reality, because there is also an issue of decomposing the code into "chunks" which amount to approximately equal "effort." This requires a human to write the functional code in such a way that it will decompose reasonably when parallelized.