Clojure - executing a bunch of HTTP requests in parallel - pmap? - asynchronous

I need to make 200 or so HTTP requests. I want them to run in parallel, or batches, and I'm not sure where to start for doing this in Clojure. pmap appears to have the effect I want, for example, using http.async.client:
(defn get-json [url]
  (with-open [client (http/create-client)]
    (let [resp (http/GET client url)]
      (try
        (println 1)
        (http/string (http/await resp))
        (println "********DONE*********")
        nil
        (catch Exception e (println e) {})))))
music.core=> (pmap get-json [url url2])
1
1
********DONE*********
********DONE*********
(nil nil)
But I can't prove that the requests are actually executing in parallel. Do I need to call into the JVM's Thread APIs? I'm searching around and coming up with other libraries like Netty, Lamina, Aleph - should I be using one of these? Please just point me in the right direction for learning about the best practice/simplest solution.

Ideally you don't want to tie up a thread waiting for the result of each http request, so pmap or other thread-based approaches aren't really a good idea.
What you really want to do is:
Fire off all the requests asynchronously
Wait for the results with just one thread
My suggested approach is to use http-kit to fire off all the asynchronous requests at once, producing a sequence of promises. You then just need to dereference all these promises in a single thread, which will block the thread until all results are returned.
Something like:
(require '[org.httpkit.client :as http])

(let [urls     (repeat 100 "http://google.com") ;; insert your URLs here
      promises (doall (map http/get urls))
      results  (doall (map deref promises))]
  #_do_stuff_with_results
  (first results))

What you're describing is a perfectly good use of pmap and I'd approach it in similar fashion.
As far as 'proving' that it runs in parallel, you have to trust that pmap runs the function on multiple threads. However, a simple way to check is to print the thread id as a sanity check:
user=> (defn thread-id [_] (.getId (Thread/currentThread)))
user=> (pmap thread-id [1 2 3])
(53 11 56)
As the thread ids are in fact different - meaning Clojure is running your function on separate threads - you can safely trust that the JVM will run your code in parallel.
Also have a look at other parallel functions such as pvalues and pcalls. They give you different semantics and might be the right answer depending on the problem at hand.

Take a look at Claypoole. Example code:
(require '[com.climate.claypoole :as cp])
;; Run with a thread pool of 100 threads, meaning up to 100 HTTP calls
;; will run simultaneously. with-shutdown! ensures the thread pool is
;; closed afterwards.
(cp/with-shutdown! [pool (cp/threadpool 100)]
  (cp/pmap pool get-json [url url2]))
The reason you should prefer com.climate.claypoole/pmap over clojure.core/pmap in this case is that the latter sets the number of threads based on the number of CPUs, with no way of overriding. For networking and other I/O operations that aren't CPU bound, you typically want to set the number of threads based on the desired amount of I/O, not based on CPU capacity.
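The pool-sizing point is language-agnostic. As a rough illustration in Python (the `fetch` function is a hypothetical stand-in for the blocking HTTP call), the worker count is chosen for the desired amount of concurrent I/O, not the CPU count:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for a blocking HTTP call; the sleep simulates network latency.
    time.sleep(0.01)
    return (url, 200)

urls = ["http://example.com/%d" % i for i in range(200)]

# Pool sized for the desired amount of concurrent I/O (100),
# independent of how many CPUs the machine has.
with ThreadPoolExecutor(max_workers=100) as pool:
    results = list(pool.map(fetch, urls))

print(len(results))  # 200
```

With 100 workers, the 200 simulated requests complete in roughly two round trips instead of 200 sequential ones.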
Or use a non-blocking client like http-kit that doesn't require one thread per connection, as suggested by mikera.

Related

MPI - Equivalent of MPI_SENDRCV with asynchronous functions

I know that MPI_SENDRECV allows us to overcome the problem of deadlocks (which we get when we use the classic MPI_SEND and MPI_RECV functions).
I would like to know if MPI_SENDRECV(sent_to_process_1, receive_from_process_0) is equivalent to:
MPI_ISEND(sent_to_process_1, request1)
MPI_IRECV(receive_from_process_0, request2)
MPI_WAIT(request1)
MPI_WAIT(request2)
with asynchronous MPI_ISEND and MPI_IRECV functions?
From what I have seen, MPI_ISEND and MPI_IRECV create a fork (i.e. 2 processes). So if I follow this logic, the first call of MPI_ISEND generates 2 processes. One does the communication and the other calls MPI_IRECV, which itself forks 2 processes.
But once the communication of the first MPI_ISEND is finished, does the second process call MPI_IRECV again? With this logic, the above equivalent doesn't seem to be valid...
Maybe I should change to this:
MPI_ISEND(sent_to_process_1, request1)
MPI_WAIT(request1)
MPI_IRECV(receive_from_process_0, request2)
MPI_WAIT(request2)
But I think that it could also create deadlocks.
Could anyone give me another solution, using MPI_ISEND, MPI_IRECV and MPI_WAIT, that gets the same behaviour as MPI_SENDRECV?
There are some dangerous lines of thought in the question and other answers. When you start a non-blocking MPI operation, the MPI library doesn't create a new process/thread/etc. You're thinking of something more like a parallel region in OpenMP, I believe, where new threads/tasks are created to do some work.
In MPI, starting a non-blocking operation is like telling the MPI library that you have some things that you'd like to get done whenever MPI gets a chance to do them. There are lots of equally valid options for when they are actually completed:
It could be that they all get done later when you call a blocking completion function (like MPI_WAIT or MPI_WAITALL). These functions guarantee that when the blocking completion call is done, all of the requests that you passed in as arguments are finished (in your case, the MPI_ISEND and the MPI_IRECV). Regardless of when the operations actually take place (see next few bullets), you as an application can't consider them done until they are actually marked as completed by a function like MPI_WAIT or MPI_TEST.
The operations could get done "in the background" during another MPI operation. For instance, if you do something like the code below:
MPI_Isend(..., MPI_COMM_WORLD, &req[0]);
MPI_Irecv(..., MPI_COMM_WORLD, &req[1]);
MPI_Barrier(MPI_COMM_WORLD);
MPI_Waitall(2, req);
The MPI_ISEND and the MPI_IRECV would probably actually do the data transfers in the background during the MPI_BARRIER. This is because as an application, you are transferring "control" of your application to the MPI library during the MPI_BARRIER call. This lets the library make progress on any ongoing MPI operation that it wants. Most likely, when the MPI_BARRIER is complete, so are most other things that finished first.
Some MPI libraries allow you to specify that you want a "progress thread". This tells the MPI library to start up another thread (note that thread != process) in the background that will actually do the MPI operations for you while your application continues in the main thread.
Remember that all of these in the end require that you actually call MPI_WAIT or MPI_TEST or some other function like it to ensure that your operation is actually complete, but none of these spawn new threads or processes to do the work for you when you call your nonblocking functions. Those really just act like you stick them on a list of things to do (which in reality, is how most MPI libraries implement them).
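That "list of things to do" model can be sketched in a few lines of Python (purely illustrative, not real MPI semantics): the nonblocking calls only record work, and the completion call is where the library is free to actually perform it:

```python
class Request:
    def __init__(self, op):
        self.op = op          # the deferred operation
        self.done = False

pending_log = []

def isend(op):
    # "Nonblocking" start: just record the operation; nothing runs yet.
    return Request(op)

def waitall(reqs):
    # Completion call: the "library" is free to do the work here.
    for req in reqs:
        if not req.done:
            req.op()
            req.done = True

r1 = isend(lambda: pending_log.append("send"))
r2 = isend(lambda: pending_log.append("recv"))
assert pending_log == []      # nothing happened at the nonblocking calls
waitall([r1, r2])
print(pending_log)            # ['send', 'recv']
```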
The best way to think of how MPI_SENDRECV is implemented is to do two non-blocking calls with one completion function:
MPI_Isend(..., MPI_COMM_WORLD, &req[0]);
MPI_Irecv(..., MPI_COMM_WORLD, &req[1]);
MPI_Waitall(2, req);
How I usually do this on node i communicating with node i+1:
mpi_isend(send_to_process_iPlus1, requests(1))
mpi_irecv(recv_from_process_iPlus1, requests(2))
...
mpi_waitall(2, requests)
You can see how ordering your commands this way with non-blocking communication allows you (during the ... above) to perform any computation that does not rely on the send/recv buffers to be done during your communication. Overlapping computation with communication is often crucial for maximizing performance.
mpi_send_recv on the other hand (while avoiding any deadlock issues) is still a blocking operation. Thus, your program must remain in that routine during the entire send/recv process.
Final points: you can initiate more than 2 requests and wait on all of them in the same way, using the same structure as with 2 requests. For instance, it's quite easy to start communication with node i-1 as well and wait on all 4 of the requests. Using mpi_send_recv you must always have a paired send and receive; what if you only want to send?

using core.async with blocking clients/drivers : are there performance benefits?

I'm programming a web application backend in Clojure, using among other things:
http-kit as an HTTP server and client (nonblocking)
monger as my DB driver (blocking)
clj-aws-s3 as an S3 client (blocking)
I am aware of the performance benefits of event-driven, non-blocking stacks like the ones you find on NodeJS and the Play Framework (this question helped me), and how it yields a much better load capacity. For that reason, I'm considering making my backend asynchronous using core.async.
My question is : Can you recreate the performance benefits of non-blocking web stacks by using core.async on top of blocking client/driver libraries?
Elaborating:
What I'm currently doing are the usual synchronous calls :
(defn handle-my-request [req]
  (let [data1  (db/findData1)
        data2  (db/findData2)
        data3  (s3/findData3)
        result (make-something-of data1 data2 data3)]
    (ring.util.response/response result)))
What I plan to do is wrapping any call involving IO in a thread block, and synchronize this inside a go block,
(defn handle-my-request! [req resp-chan] ;; resp-chan is a core.async channel through which the response must be pushed
  (go
    (let [data1-ch (thread (db/findData1)) ;; spin off threads to fetch the data (involves IO)
          data2-ch (thread (db/findData2))
          data3-ch (thread (s3/findData3))
          result   (make-something-of (<! data1-ch) (<! data2-ch) (<! data3-ch))] ;; synchronize
      (->> (ring.util.response/response result)
           (>! resp-chan))))) ;; send response
Is there a point doing it that way?
I'm doing this because that's kind of the best practices I found, but their performance benefits are still a mystery to me. I thought the issue with synchronous stacks was that they use one thread per request. Now it seems they use more than one.
Thanks in advance for your help, have a beautiful day.
The benefit from your example is that findData1,2 and 3 are run in parallel, which can decrease the response time at the cost of using more threads.
In my experience, what usually happens is that the call to findData2 depends on the result of findData1, and findData3 depends on the result of findData2, which means the calls cannot be parallelized, in which case there is no point in using core.async.
The simple answer is no, you're not going to increase capacity at all this way. If you've got memory to hold 100 threads, then you've got 300 "thread seconds" of capacity for each 3-second interval. So, say each of your blocks takes one second to execute. It doesn't matter if each request runs synchronously, holding the thread for the full three seconds, or blockingly-asynchronously, holding a thread for one second three times, you're never going to serve more than 100 requests per three seconds.
However if you make one step asynchronous, then suddenly your code needs only two thread-seconds per request, so you can now serve 300/2=150 requests per three seconds.
The more complicated answer is it might make it better or worse, depending on how your client or web server handles timeouts, how quickly/often clients retry the request, how parallelizable your code is, how expensive thread swapping is, etc. If you try to do 200 requests in the synchronous implementation, then 100 will get through after 3 secs and the remaining 100 will get through in 6 secs. In the async implementation, since they're all competing for threads at various async junctures, most of them will take 5-6 secs to complete. So that's that. But if the blocks are parallelizable, then some requests may complete in just one second, so that's that too.
So on the very edge it kind of depends, but ultimately the capacity is thread-seconds, and by that standard sync or blocking-async, it is all the same. It's not Clojure specific, and there are certainly plenty of more in-depth resources out there detailing all the edge cases than what I've provided here.
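The thread-seconds arithmetic above can be written out directly (numbers taken from this answer: 100 threads, 3-second requests, each made of three 1-second blocking steps):

```python
threads = 100
interval = 3                        # seconds
capacity = threads * interval       # 300 thread-seconds per 3-second interval

# Fully blocking (sync, or blocking-async): 3 thread-seconds per request.
sync_cost = 3
sync_throughput = capacity // sync_cost

# One of the three steps made truly asynchronous: 2 thread-seconds per request.
async_cost = 2
async_throughput = capacity // async_cost

print(sync_throughput, async_throughput)  # 100 150
```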

What does (SB-SYS:WAIT-UNTIL-FD-USABLE 6 :INPUT NIL NIL) do?

I am currently using sbcl 1.0.57.0, and my program generates constant output at the shell until, at a certain point, it freezes without any clue whatsoever.
C-c and down reveals the last call to be:
(SB-SYS:WAIT-UNTIL-FD-USABLE 6 :INPUT NIL NIL)
I restarted my program and tried this again, and again the program freezes and a C-c down reveals this call to be the last one. After a while, I didn't measure the exact time but it should roughly be around 5min, the program continues for a short period of time and then freezes again.
To put the call into context:
The first familiar call in the trace is drakma:http-request. The complete call used by itself does not result in a freeze, though.
Now I wonder what this call actually does and if this could be the reason the program freezes?
Since the second part of this question would require you to be clairvoyant if the call has nothing to do with my problem, my final question is: what does this call do?
(describe 'sb-sys:wait-until-fd-usable) gives:
WAIT-UNTIL-FD-USABLE names a compiled function:
Lambda-list: (FD DIRECTION &OPTIONAL TIMEOUT)
[...]
Documentation:
Wait until FD is usable for DIRECTION. DIRECTION should be either :INPUT
or :OUTPUT
TIMEOUT, if supplied, is the number of seconds to wait before giving up.
The intention of the call seems to be to wait (without any timeout)
until the file descriptor 6 is usable, but can the problem be that
the function is called with 4 arguments while it expects 2 or 3?
Though technically I did not ask this I wanted to add my newest information on this topic in case anybody else is looking for this problem in a similar context.
The problem arose using drakma:http-request. I was now able to write a proof of concept for a specific request and posted this as an issue at the drakma github page. It appears that drakma does not provide a timeout in sbcl thus the (SB-SYS:WAIT-UNTIL-FD-USABLE 6 :INPUT NIL NIL) waits on information to arrive without a timeout in this case:
https://github.com/edicl/drakma/issues/67

Tail calls in architectures without a call stack

My answer for a recent question about GOTOs and tail recursion was phrased in terms of a call stack. I'm worried that it wasn't sufficiently general, so I ask you: how is the notion of a tail call (or equivalent) useful in architectures without a call stack?
In continuation passing, all called functions replace the calling function, and are thus tail calls, so "tail call" doesn't seem to be a useful distinction. In messaging and event based architectures, there doesn't seem to be an equivalent, though please correct me if I'm wrong. The latter two architectures are interesting cases as they are associated with OOP, rather than FP. What about other architectures? Were the old Lisp machines based on call-stacks?
Edit: According to "What the heck is: Continuation Passing Style (CPS)" (and Alex below), the equivalent of a tail call under continuation passing is not "called function replaces calling function" but "calling function passes the continuation it was given, rather than creating a new continuation". This type of tail call is useful, unlike what I asserted.
Also, I'm not interested in systems that use call stacks at a lower level, for the higher level doesn't require a call stack. This restriction doesn't apply to Alex's answer because he's writing about the fact that other invocation architectures (is this the right term?) often have a call stack equivalent, not that they have a call stack somewhere under the hood. In the case of continuation passing, the structure is like an arborescence, but with edges in the opposite direction. Call stack equivalents are highly relevant to my interests.
"Architectures without a call stack" typically "simulate" one at some level -- for example, back in the time of IBM 360, we used the S-Type Linkage Convention using register-save areas and arguments-lists indicated, by convention, by certain general-purpose registers.
So "tail call" can still matter: does the calling function need to preserve the information necessary to resume execution after the calling point (once the called function is finished), or does it know there IS going to be no execution after the calling point, and so simply reuse its caller's "info to resume execution" instead?
So for example a tail call optimization might mean not appending the continuation needed to resume execution on whatever linked list is being used for the purpose... which I like to see as a "call stack simulation" (at some level, though it IS obviously a more flexible arrangement -- don't want to have continuation-passing fans jumping all over my answer;-).
On the off chance that this question interests someone other than me, I have an expanded answer for the other question that also answers this one. Here's the nutshell, non-rigorous version.
When a computational system performs sub-computations (i.e. a computation starts and must pause while another computation is performed because the first depends on the result of the second), a dependency relation between execution points naturally arises. In call-stack based architectures, the relation is topologically a path graph. In CPS, it's a tree, where every path between the root and a node is a continuation. In message passing and threading, it's a collection of path graphs. Synchronous event handling is basically message passing. Starting a sub-calculation involves extending the dependency relation, except in a tail call which replaces a leaf rather than appending to it.
Translating tail calling to asynchronous event handling is more complex, so instead consider a more general version. If A is subscribed to an event on channel 1, B is subscribed to the same event on channel 2 and B's handler merely fires the event on channel 1 (it translates the event across channels), then A can be subscribed to the event on channel 2 instead of subscribing B. This is more general because the equivalent of a tail call requires that
A's subscription on channel 1 be canceled when A is subscribed on channel 2
the handlers are self-unsubscribing (when invoked, they cancel the subscription)
Now for two systems that don't perform sub-computations: lambda calculus (or term rewriting systems in general) and RPN. For lambda calculus, tail calls roughly correspond to a sequence of reductions where the term length is O(1) (see iterative processes in SICP section 1.2). Take RPN to use a data stack and an operations stack (as opposed to a stream of operations; the operations are those yet to be processed), and an environment that maps symbols to a sequence of operations. Tail calls could correspond to processes with O(1) stack growth.

I just don't get continuations!

What are they and what are they good for?
I do not have a CS degree and my background is VB6 -> ASP -> ASP.NET/C#. Can anyone explain it in a clear and concise manner?
Imagine if every single line in your program was a separate function. Each accepts, as a parameter, the next line/function to execute.
Using this model, you can "pause" execution at any line and continue it later. You can also do inventive things like temporarily hop up the execution stack to retrieve a value, or save the current execution state to a database to retrieve later.
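That "every line takes the next line as a parameter" model can be sketched in Python (all names invented for illustration); pausing is just holding on to the not-yet-invoked continuation:

```python
paused = []

def line1(next_line):
    x = 40
    return next_line(x)           # "goto" the next line, passing the state along

def make_line2(next_line):
    def line2(x):
        return next_line(x + 2)
    return line2

def pause_here(x):
    # Instead of running the rest of the program, save it for later.
    paused.append(lambda: x)

line1(make_line2(pause_here))
# ...later, resume the saved continuation:
print(paused[0]())  # 42
```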
You probably understand them better than you think you did.
Exceptions are an example of "upward-only" continuations. They allow code deep down the stack to call up to an exception handler to indicate a problem.
Python example:
try:
    broken_function()
except SomeException:
    # jump to here
    pass

def broken_function():
    raise SomeException()  # go back up the stack
    # stuff that won't be evaluated
Generators are examples of "downward-only" continuations. They allow code to reenter a loop, for example, to create new values.
Python example:
def sequence_generator(i=1):
    while True:
        yield i  # "return" this value, and come back here for the next
        i = i + 1

g = sequence_generator()
while True:
    print g.next()
In both cases, these had to be added to the language specifically whereas in a language with continuations, the programmer can create these things where they're not available.
A heads up: this example is neither concise nor exceptionally clear. It is a demonstration of a powerful application of continuations. As a VB/ASP/C# programmer, you may not be familiar with the concept of a system stack or saving state, so the goal of this answer is a demonstration rather than an explanation.
Continuations are extremely versatile and are a way to save execution state and resume it later. Here is a small example of a cooperative multithreading environment using continuations in Scheme:
(Assume that the operations enqueue and dequeue work as expected on a global queue not defined here)
(define (fork)
  (display "forking\n")
  (call-with-current-continuation
    (lambda (cc)
      (enqueue (lambda ()
                 (cc #f)))
      (cc #t))))

(define (context-switch)
  (display "context switching\n")
  (call-with-current-continuation
    (lambda (cc)
      (enqueue
        (lambda ()
          (cc 'nothing)))
      ((dequeue)))))

(define (end-process)
  (display "ending process\n")
  (let ((proc (dequeue)))
    (if (eq? proc 'queue-empty)
        (display "all processes terminated\n")
        (proc))))
This provides three verbs that a function can use - fork, context-switch, and end-process. The fork operation forks the thread and returns #t in one instance and #f in another. The context-switch operation switches between threads, and end-process terminates a thread.
Here's an example of their use:
(define (test-cs)
  (display "entering test\n")
  (cond
    ((fork) (cond
              ((fork) (display "process 1\n")
                      (context-switch)
                      (display "process 1 again\n"))
              (else (display "process 2\n")
                    (end-process)
                    (display "you shouldn't see this (2)"))))
    (else (cond ((fork) (display "process 3\n")
                        (display "process 3 again\n")
                        (context-switch))
                (else (display "process 4\n")))))
  (context-switch)
  (display "ending process\n")
  (end-process)
  (display "process ended (should only see this once)\n"))
The output should be
entering test
forking
forking
process 1
context switching
forking
process 3
process 3 again
context switching
process 2
ending process
process 1 again
context switching
process 4
context switching
context switching
ending process
ending process
ending process
ending process
ending process
ending process
all processes terminated
process ended (should only see this once)
Those that have studied forking and threading in a class are often given examples similar to this. The purpose of this post is to demonstrate that with continuations you can achieve similar results within a single thread by saving and restoring its state - its continuation - manually.
P.S. - I think I remember something similar to this in On Lisp, so if you'd like to see professional code you should check the book out.
One way to think of a continuation is as a processor stack. When you "call-with-current-continuation c" it calls your function "c", and the parameter passed to "c" is your current stack with all your automatic variables on it (represented as yet another function, call it "k"). Meanwhile the processor starts off creating a new stack. When you call "k" it executes a "return from subroutine" (RTS) instruction on the original stack, jumping you back in to the context of the original "call-with-current-continuation" ("call-cc" from now on) and allowing your program to continue as before. If you passed a parameter to "k" then this becomes the return value of the "call-cc".
From the point of view of your original stack, the "call-cc" looks like a normal function call. From the point of view of "c", your original stack looks like a function that never returns.
There is an old joke about a mathematician who captured a lion in a cage by climbing into the cage, locking it, and declaring himself to be outside the cage while everything else (including the lion) was inside it. Continuations are a bit like the cage, and "c" is a bit like the mathematician. Your main program thinks that "c" is inside it, while "c" believes that your main program is inside "k".
You can create arbitrary flow-of-control structures using continuations. For instance you can create a threading library. "yield" uses "call-cc" to put the current continuation on a queue and then jumps in to the one on the head of the queue. A semaphore also has its own queue of suspended continuations, and a thread is rescheduled by taking it off the semaphore queue and putting it on to the main queue.
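Languages without first-class continuations can approximate that particular "yield puts the current continuation on a queue" pattern with generators; here is a minimal round-robin scheduler in Python (names invented for illustration):

```python
from collections import deque

def run(*tasks):
    # Each generator is a resumable "continuation" of its task.
    queue = deque(tasks)
    log = []
    while queue:
        task = queue.popleft()
        try:
            log.append(next(task))  # resume the task until its next yield
            queue.append(task)      # yield: requeue the suspended continuation
        except StopIteration:
            pass                    # task finished; drop it
    return log

def worker(name, steps):
    for i in range(steps):
        yield "%s:%d" % (name, i)

print(run(worker("a", 2), worker("b", 2)))  # ['a:0', 'b:0', 'a:1', 'b:1']
```

Generators only capture their own frame, so unlike full call/cc this can't copy or re-enter an arbitrary stack, but the scheduling shape is the same.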
Basically, a continuation is the ability for a function to stop execution and then pick back up where it left off at a later point in time. In C#, you can do this using the yield keyword. I can go into more detail if you wish, but you wanted a concise explanation. ;-)
I'm still getting "used" to continuations, but one way to think about them that I find useful is as abstractions of the Program Counter (PC) concept. A PC "points" to the next instruction to execute in memory, but of course that instruction (and pretty much every instruction) points, implicitly or explicitly, to the instruction that follows, as well as to whatever instructions should service interrupts. (Even a NOOP instruction implicitly does a JUMP to the next instruction in memory. But if an interrupt occurs, that'll usually involve a JUMP to some other instruction in memory.)
Each potentially "live" point in a program in memory to which control might jump at any given point is, in a sense, an active continuation. Other points that can be reached are potentially active continuations, but, more to the point, they are continuations that are potentially "calculated" (dynamically, perhaps) as a result of reaching one or more of the currently active continuations.
This seems a bit out of place in traditional introductions to continuations, in which all pending threads of execution are explicitly represented as continuations into static code; but it takes into account the fact that, on general-purpose computers, the PC points to an instruction sequence that might potentially change the contents of memory representing a portion of that instruction sequence, thus essentially creating a new (or modified, if you will) continuation on the fly, one that doesn't really exist as of the activations of continuations preceding that creation/modification.
So continuation can be viewed as a high-level model of the PC, which is why it conceptually subsumes ordinary procedure call/return (just as ancient iron did procedure call/return via low-level JUMP, aka GOTO, instructions plus recording of the PC on call and restoring of it on return) as well as exceptions, threads, coroutines, etc.
So just as the PC points to computations to happen in "the future", a continuation does the same thing, but at a higher, more-abstract level. The PC implicitly refers to memory plus all the memory locations and registers "bound" to whatever values, while a continuation represents the future via the language-appropriate abstractions.
Of course, while there might typically be just one PC per computer (core processor), there are in fact many "active" PC-ish entities, as alluded to above. The interrupt vector contains a bunch, the stack a bunch more, certain registers might contain some, etc. They are "activated" when their values are loaded into the hardware PC, but continuations are abstractions of the concept, not PCs or their precise equivalent (there's no inherent concept of a "master" continuation, though we often think and code in those terms to keep things fairly simple).
In essence, a continuation is a representation of "what to do next when invoked", and, as such, can be (and, in some languages and in continuation-passing-style programs, often is) a first-class object that is instantiated, passed around, and discarded just like most any other data type, and much like how a classic computer treats memory locations vis-a-vis the PC -- as nearly interchangeable with ordinary integers.
In C#, you have access to two continuations. One, accessed through return, lets a method continue from where it was called. The other, accessed through throw, lets a method continue at the nearest matching catch.
Some languages let you treat these statements as first-class values, so you can assign them and pass them around in variables. What this means is that you can stash the value of return or of throw and call them later when you're really ready to return or throw.
Continuation callback = return;
callMeLater(callback);
This can be handy in lots of situations. One example is like the one above, where you want to pause the work you're doing and resume it later when something happens (like getting a web request, or something).
I'm using them in a couple of projects I'm working on. In one, I'm using them so I can suspend the program while I'm waiting for IO over the network, then resume it later. In the other, I'm writing a programming language where I give the user access to continuations-as-values so they can write return and throw for themselves - or any other control flow, like while loops - without me needing to do it for them.
Think of threads. A thread can be run, and you can get the result of its computation. A continuation is a thread that you can copy, so you can run the same computation twice.
Continuations have had renewed interest with web programming because they nicely mirror the pause/resume character of web requests. A server can construct a continuation representing a user's session and resume it if and when the user continues the session.