I seem to recall hearing a term for breaking up long-running synchronous code into chunks that are incrementally evaluated over multiple callbacks on the event queue (to avoid blocking). Does such a term exist? If so, what is it?
The tags on your post indicate you're already familiar with "asynchronous", "non-blocking", and "event-driven". The only other terms that come to mind are "deferred processing" and "threaded", but those have different mechanics and meanings, and it is important that they are not confused.
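Whatever the term, the technique itself is straightforward. Here's a minimal sketch in JavaScript; `processInChunks` and `handleItem` are hypothetical names, not an established API:

```js
// A minimal sketch of the technique the question describes: each slice of
// work runs as its own callback, yielding to the event queue in between.
function processInChunks(items, handleItem, chunkSize = 100) {
  let i = 0;
  function nextChunk() {
    const end = Math.min(i + chunkSize, items.length);
    for (; i < end; i++) {
      handleItem(items[i]);
    }
    if (i < items.length) {
      setTimeout(nextChunk, 0); // schedule the rest; other callbacks can run now
    }
  }
  nextChunk();
}

// Usage: process a large array without blocking the event loop.
processInChunks(Array.from({ length: 1e6 }, (_, n) => n), (x) => x * x);
```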
My girlfriend was asked the below question in an interview:
We trigger 5 independent APIs simultaneously. Once they have all completed, we want to trigger a function. How will you design a system to do this?
My girlfriend replied she will use a flag variable, but the interviewer was evidently not happy with it.
So, is there a good way in which this could be handled (in a distributed context)? Note that each of the 5 API calls is made by a different server and the function to be triggered is on a 6th server.
The other answers suggesting Promises seem to assume all these requests necessarily come from the same client. If the context here is distributed systems, as you said it is, then I don't think those are valid answers. If they were, then the interview question would have nothing to do with distributed systems, except to test your girlfriend's ability to recognize something that isn't really a distributed systems problem.
And the question does have the shape of some classic problems in distributed systems. It sounds a lot like YouTube view counting: How do you achieve qualities like atomicity and consistency in a multi-threaded, multi-process, or multi-client environment? Failing to recognize this, thinking the answer could be as simple as "a flag", betrayed a lack of experience in distributed systems.
Another thing about that answer is that it leaves many ambiguities. Where does the flag live? As a variable in another (Java?) API? In a database? In a file? Even in a non-distributed context, these are important questions. And if she had gone on to address these questions, even being innocent of all the distributed systems complications, she might have happily fallen into a discussion of the kinds of D.S. problems that occur when you use, say, a file; and how using an ACID-compliant database might solve those problems, and what the tradeoffs might be there... And she might have corrected herself and said "counter" instead of "flag"!
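For illustration, a distributed "counter" could be built on an atomic increment. Below is a hypothetical sketch using Redis's INCR via the node-redis client; `notifySixthServer` is a stand-in for whatever actually triggers the function on the sixth server:

```js
const { createClient } = require('redis'); // assumes the node-redis package

// Each of the five servers calls this when its API call completes. INCR is
// atomic on the Redis server, so concurrent completions cannot race, and the
// trigger fires exactly once, on whichever server sees the count hit five.
async function reportCompletion(jobId) {
  const client = createClient();
  await client.connect();
  const completed = await client.incr(`job:${jobId}:completed`);
  if (completed === 5) {
    await notifySixthServer(jobId); // hypothetical: e.g. an HTTP POST
  }
  await client.quit();
}
```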
If I were asked this, my first thought would be to use promises/futures. The idea behind them is that you can execute time-consuming operations asynchronously and they will somehow notify you when they've completed, either successfully or unsuccessfully, typically by calling a callback function. So the first step is to spawn five asynchronous tasks and get five promises.
Then I would join the five promises together, creating a unified promise that represents the five separate tasks. In JavaScript I might call Promise.all(); in Java I would use CompletableFuture.allOf().
I would want to make sure to handle both success and failure. The combined promise should succeed if all of the API calls succeed and fail if any of them fail. If any fail there should be appropriate error handling/reporting. What happens if multiple calls fail? How would a mix of successes and failures be reported? These would be design points to mention, though not necessarily solve during the interview.
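In JavaScript, that might look like the following sketch; `callApi1` through `callApi5`, `triggerFunction`, and `reportFailure` are hypothetical stand-ins:

```js
// Spawn five asynchronous tasks and join them into one promise. Promise.all
// resolves when every call succeeds and rejects on the first failure.
const tasks = [callApi1(), callApi2(), callApi3(), callApi4(), callApi5()];

Promise.all(tasks)
  .then((results) => triggerFunction(results)) // all five succeeded
  .catch((err) => reportFailure(err));         // at least one failed

// If a mix of successes and failures must be reported, Promise.allSettled
// returns per-task outcomes instead of rejecting on the first failure.
```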
Promises and futures typically have a modular layering system that allows edge cases like timeouts to be handled by chaining handlers together. If done right, timeouts become just another error condition that is naturally handled by the error handling already in place.
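As one sketch of that layering, a timeout can be expressed as a promise raced against the real work, so it surfaces through the same rejection path; `withTimeout` is a hypothetical helper:

```js
// Wrap a promise so that exceeding the deadline becomes an ordinary rejection,
// handled by the same .catch as any other failure.
function withTimeout(promise, ms) {
  const timeout = new Promise((_, reject) =>
    setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms)
  );
  return Promise.race([promise, timeout]);
}

// Usage: the combined promise now also fails if any call takes too long.
// Promise.all(tasks.map((t) => withTimeout(t, 5000))).then(...).catch(...);
```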
This solution would not require any state to be shared across threads, so I would not have to worry about mutexes or deadlocks or other thread synchronization problems.
She said she would use a flag variable to keep track of the number of API calls that have returned.
One thing that makes great interviewees stand out is their ability to anticipate follow-up questions and explain details before they are asked. The best answers are fully fleshed out. They demonstrate that one has thought through one's answer in detail, and they have minimal handwaving.
When I read the above I have a slew of follow-up questions:
How will she know when each API call has returned? Is she waiting for a function call to return, a callback to be called, an event to be fired, or a promise to complete?
How is she causing all of the API calls to be executed concurrently? Is there multithreading, a fork-join pool, multiprocessing, or asynchronous execution?
Flag variables are booleans. Is she really using a flag, or does she mean a counter? (A counter-based sketch appears below this list.)
What is the variable tracking and what code is updating it?
What is monitoring the variable, what condition is it checking, and what's it doing when the condition is reached?
If using multithreading, how is she handling synchronization?
How will she handle edge cases such as API calls failing or timing out?
A flag variable might lead to a workable solution or it might lead nowhere. An interviewer can only tell which it is if she thinks about and proactively discusses these various questions. Otherwise, the interviewer will have to pepper her with follow-up questions, and will likely lower their evaluation of her.
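For contrast, here is what the "counter" version of her answer might look like in a single-process JavaScript setting. `whenAllDone` is a hypothetical helper; in a multithreaded or distributed setting the decrement itself would need synchronization:

```js
// Run node-style async tasks concurrently; fire the continuation once all
// have returned. Single-threaded JS means the counter needs no locking.
function whenAllDone(tasks, onAllDone) {
  if (tasks.length === 0) return onAllDone(null, []);
  let remaining = tasks.length;
  let failed = false;
  const results = new Array(tasks.length);
  tasks.forEach((task, i) => {
    task((err, result) => {
      if (failed) return;
      if (err) { failed = true; return onAllDone(err); } // first failure wins
      results[i] = result;
      if (--remaining === 0) onAllDone(null, results);
    });
  });
}
```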
When I interview people, my mental grades are something like:
S — Solution works and they addressed all issues without prompting.
A — Solution works, follow-up questions answered satisfactorily.
B — Solution works, explained well, but there's a better solution that more experienced devs would find.
C — What they said is okay, but their depth of knowledge is lacking.
F — Their answer is flat out incorrect, or getting them to explain their answer was like pulling teeth.
I have been reading about sagas, their intent, and their usage. BUT - I have two questions that I'd really like some closure on, and then more of an opinion question.
When using Sagas for a simple API call, the boilerplate seems very excessive. If I had 20 API calls, how is that any more manageable than using thunks? Plus, I keep hearing about "side effects", but I'm unsure how that plays into it all.
I read some blogs that used a pattern to dynamically generate the Sagas to reduce boilerplate, but couldn't you do that with thunks too? Also, any examples would be great.
Are Sagas still useful when dealing with very simple POST or GET calls?
Any opinions on redux-saga vs. redux-observable?
thanks!
Disclaimer: I am one of the authors of redux-observable, so my opinions of both redux-saga and redux-observable are tainted with bias.
Since you used the term Saga (instead of Epic) I'll assume you're asking in the context of redux-saga (not redux-observable).
In redux-saga, the effects you perform, e.g. an AJAX request, aren't actually directly handled inside your generator sagas. Instead, the helpers you use create plain old JavaScript objects that represent the intended effect. You yield those objects, and the redux-saga middleware itself performs the effect internally, hidden from you, providing the result back to your yield, as in yourSaga.next(response).
This is called "effects as data". Some like it because your saga generators are truly pure. Because it uses generators to support multiple effects, it's easy to test without mocks: you just assert that the effects the saga yielded are the ones you expected. Personally, I found in practice this seemed far cooler than it really is: many times you end up effectively recreating everything the saga does in your test. You are then testing that the implementation of the saga is correct, not testing the behavior of the saga. Many don't care about (or even notice) this, but I did. I imagine some even prefer it. FWIW, redux-observable does not use this "effects as data" model, which is the most fundamental difference between it and redux-saga.
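A minimal sketch of what I mean (`fetchUser`, `api.fetchUser`, and `fakeUser` are hypothetical; `call` and `put` are real redux-saga effect creators):

```js
import { call, put } from 'redux-saga/effects';
import assert from 'assert';

// Hypothetical API surface for the example.
const api = { fetchUser: (id) => fetch(`/users/${id}`).then((r) => r.json()) };

// The saga never performs the AJAX request itself; it yields a plain object
// describing it, and the middleware does the work.
function* fetchUser(action) {
  const user = yield call(api.fetchUser, action.id);
  yield put({ type: 'USER_FETCHED', user });
}

// Testing without mocks: assert on the yielded effect descriptions...
const fakeUser = { id: 42, name: 'Ada' };
const gen = fetchUser({ id: 42 });
assert.deepEqual(gen.next().value, call(api.fetchUser, 42));
// ...but notice the test restates the saga's implementation step by step.
assert.deepEqual(gen.next(fakeUser).value, put({ type: 'USER_FETCHED', user: fakeUser }));
```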
Tying this back to how they compare to redux-thunk, the biggest differences are: time-based operations (e.g. debouncing sequential actions) are impractical with redux-thunk alone without major hacks; redux-thunk doesn't come with any utilities at all, so you're on your own for debouncing and other common effects; and testing is much, much harder.
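For example, debouncing is nearly a one-liner in redux-observable. A sketch, where the 'SEARCH' action type and the /search endpoint are hypothetical:

```js
import { ofType } from 'redux-observable';
import { debounceTime, mergeMap, map } from 'rxjs/operators';
import { ajax } from 'rxjs/ajax';

// Waits until SEARCH actions stop arriving for 300ms, then issues one request.
const searchEpic = (action$) =>
  action$.pipe(
    ofType('SEARCH'),
    debounceTime(300),
    mergeMap((action) =>
      ajax.getJSON(`/search?q=${encodeURIComponent(action.query)}`).pipe(
        map((results) => ({ type: 'SEARCH_FULFILLED', results }))
      )
    )
  );
```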
These are mostly opinions, however. Certainly very successful applications can be (and have been) built with redux-thunk. https://m.twitter.com comes to mind.
I think it wouldn't be controversial to say that redux-thunk is significantly easier to learn and use for simple request->response AJAX calls, without needing cancellation, etc. In fact, I often recommend users unfamiliar with RxJS use redux-thunk for the simple stuff and only lean on redux-observable for the more complex stuff, so they can remain productive and learn as they go. There's definitely a place for academic "correctness" and beautiful code, but for most people's jobs, shipit™ should be #1 priority. Users don't care how correct our code is, only that it exists and [mostly] works.
Regarding opinions on redux-saga vs. redux-observable, I'm biased because I'm one of the authors of redux-observable, but I summarized some of my thoughts in a previous SO post: Why use Redux-Observable over Redux-Saga? tl;dr they have a similar overall pattern, but redux-saga uses "effects as data" whereas redux-observable uses real effects with RxJS. There are pros and cons to both; the primary pro of using RxJS is that it's a skill vastly useful for things other than redux-observable, and one that will almost certainly outlive redux-observable/redux-saga, so the skill is highly transferable.
Here are the two biggest things I took from the How to Design Programs (simplified Racket) course I just finished, straight from the lecture notes of the course:
1) Tail call optimization, and the lack thereof in non-functional languages:
Sadly, most other languages do not support TAIL CALL OPTIMIZATION. Put another way, they do build up a stack even for tail calls. Tail call optimization was invented in the mid 70s, long after the main elements of most languages were developed. Because they do not have tail call optimization, these languages provide a fixed set of LOOPING CONSTRUCTS that make it possible to traverse arbitrary sized data.
a) What are the equivalents to this type of optimization in procedural languages that don't feature it?
b) Does using those equivalents mean we avoid building up a stack in similar situations in languages that don't have it?
2) Mutation and multicore processors
This mechanism is fundamental in almost any other language you program in. We have delayed introducing it until now for several reasons:
- despite being fundamental, it is surprisingly complex
- overuse of it leads to programs that are not amenable to parallelization (running on multiple processors). Since multi-core computers are now common, the ability to use mutation only when needed is becoming more and more important
- overuse of mutation can also make it difficult to understand programs, and difficult to test them well
But mutable variables are important, and learning this mechanism will give you more preparation to work with Java, Python and many other languages. Even in such languages, you want to use a style called "mostly functional programming".
I learned some Java, Python and C++ before taking this course, so I came to take mutation for granted. Now that has all been thrown in the air by the above statement. My questions are:
a) where could I find more detailed information regarding what is suggested in the 2nd bullet, and what to do about it, and
b) what kind of patterns would emerge from a "mostly functional programming" style, as opposed to the more careless style I probably would have had, had I continued on with those other languages instead of taking this course?
As Leppie points out, looping constructs manage to recover the space savings of proper tail calling, for the particular kinds of loops that they support. The only problem with looping constructs is that the ones you have are never enough, unless you just hurl the ball into the user's court and force them to model the stack explicitly.
To take an example, suppose you're traversing a binary tree using a loop. It works... but you need to explicitly keep track of the "ones to come back to." A recursive traversal in a tail-calling language allows you to have your cake and eat it too, by not wasting space when not required, and not forcing you to keep track of the stack yourself.
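A sketch of that contrast in JavaScript (the tree shape and function names are hypothetical examples):

```js
// Without tail calls (and even with them, since tree traversal isn't
// tail-recursive), a loop must model the stack explicitly.
function sumTreeLoop(root) {
  let total = 0;
  const toVisit = [root]; // the "ones to come back to", managed by hand
  while (toVisit.length > 0) {
    const node = toVisit.pop();
    if (node === null) continue;
    total += node.value;
    toVisit.push(node.left, node.right);
  }
  return total;
}

// The recursive version keeps no explicit bookkeeping; the language does it.
function sumTree(node) {
  if (node === null) return 0;
  return node.value + sumTree(node.left) + sumTree(node.right);
}

const tree = { value: 1, left: { value: 2, left: null, right: null }, right: null };
console.log(sumTree(tree), sumTreeLoop(tree)); // 3 3
```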
Your question on parallelism and concurrency is much more wide-open, and the best pointers are probably to areas of research, rather than existing solutions. I think that most would agree that there's a crisis going on in the computing world; how do we adapt our mutation-heavy programming skills to the new multi-core world?
Simply switching to a functional paradigm isn't a silver bullet here, either; we still don't know how to write high-level code and generate blazing fast non-mutating run-concurrently code. Lots of folks are working on this, though!
To expand on the "mutability makes parallelism hard" concept, when you have multiple cores going, you have to use synchronisation if you want to modify something from one core and have it be seen consistently by all the other cores.
Getting synchronisation right is hard. If you over-synchronise, you have deadlocks, slow (serial rather than parallel) performance, etc. If you under-synchronise, you have partially-observed changes (where another core sees only a portion of the changes you made from a different core), leaving your objects observed in an invalid "halfway changed" state.
It is for that reason that many functional programming languages encourage a message-queue concept instead of a shared state concept. In that case, the only shared state is the message queue, and managing synchronisation in a message queue is a solved problem.
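A minimal sketch of the message-queue style using Node's worker_threads module; the worker body is inlined with the eval option just to keep the example self-contained:

```js
const { Worker } = require('worker_threads');

// Worker body as a string. No memory is shared with the main thread: the
// worker owns its data and communicates only over the message queue.
const workerCode = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (msg) => {
    const sum = msg.values.reduce((a, b) => a + b, 0);
    parentPort.postMessage(sum);
  });
`;

const worker = new Worker(workerCode, { eval: true });
worker.on('message', (sum) => {
  console.log('sum =', sum); // sum = 10
  worker.terminate();
});
worker.postMessage({ values: [1, 2, 3, 4] });
```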
a) What are the equivalents to this type of optimization in procedural languages that don't feature it? b) Does using those equivalents mean we avoid building up a stack in similar situations in languages that don't have it?
Well, the significance of a tail call is that it can evaluate another function without adding to the call stack, so anything that builds up the stack can't really be called an equivalent.
A tail call behaves essentially like a jump to the new code, using the language trappings of a function call and all the appropriate detail management. So in languages without this optimization, you'd use a jump within a single function. Loops, conditional blocks, or even arbitrary goto statements if nothing else works.
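A sketch of that transformation in JavaScript, which (in most engines) lacks tail call optimization:

```js
// Tail-recursive: nothing is left to do after the recursive call, yet most
// JS engines still grow the stack, so large n overflows it.
function sumToTail(n, acc = 0) {
  if (n === 0) return acc;
  return sumToTail(n - 1, acc + n);
}

// The same computation with the tail call rewritten as a jump back to the
// top of a loop: constant stack space for any n.
function sumToLoop(n) {
  let acc = 0;
  while (n > 0) {
    acc += n;
    n -= 1;
  }
  return acc;
}

console.log(sumToTail(10), sumToLoop(10)); // 55 55
```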
a) where could I find more detailed information regarding what is suggested in the 2nd bullet, and what to do about it
The second bullet sounds like an oversimplification. There are many ways to make parallelization more difficult than it needs to be, and overuse of mutation is just one.
However, note that parallelization (splitting a task into pieces that can be done simultaneously) is not entirely the same thing as concurrency (having multiple tasks executed simultaneously that may interact), though there's certainly overlap. Avoiding mutation is incredibly helpful in writing concurrent programs, since immutable data avoids a lot of race conditions and resource contention that would otherwise be possible.
b) what kind of patterns would emerge from a "mostly functional programming" style, as opposed to the more careless style I probably would have had, had I continued on with those other languages instead of taking this course?
Have you looked at Haskell or Clojure? Both are heavily inclined to a very functional style emphasizing controlled mutation. Haskell is more rigorous about it but has a lot of tools for working with limited forms of mutability, while Clojure is a bit more informal and might be more familiar to you since it's another Lisp dialect.
I'm looking for non-trivial resources on concepts of asynchronous programming, preferably books but also substantial articles or papers. This is not about the simple examples like passing a callback to an event listener in GUI programming, or having producer-consumer decoupled over a queue, or writing an onload handler for your HTML (although all those are valid). It's about the kind of problems the lighttpd developers might be concerned with, or someone doing substantial business logic in JavaScript that runs in a browser or on node.js. It's about situations where you need to pass a callback to a callback to a callback ... about complex asynchronous control flows, and staying sane at the same time. I'm looking for concepts that allow you to do this systematically, to reason about these kinds of control flows, to seriously manage a significant amount of logic distributed in deeply nested callbacks, with all its ensuing issues of timing, synchronization, binding of values, passing of contexts, etc.
I wouldn't shrink away from some abstract explorations like continuation-passing style, linear logic or temporal reasoning. Posts like this seem to go in the right direction, but discuss specific issues rather than a complete theory (e.g. the post mentions the "reactor" pattern, which seems relevant, without describing it).
Thanks.
EDIT:
To give more details about the aspects I'm interested in. I'm interested in a disciplined approach to asynchronous programming, a theory if you will, maybe just a set of specific patterns that I can pass to fellow programmers and say "This is the way we do asynchronous programming" in non-trivial scenarios. I need a theory to disentangle layers of callbacks that randomly fail to work, or produce spurious results. I want an approach which allows me to say "If we do it this way, we can be sure that ...". - Does this make things clearer?
EDIT 2:
As feedback indicates a dependency on the programming language: This will be JavaScript, but maybe it's enough to assume a language that allows higher-order functions.
EDIT 3:
Changed the title to be more specific (although I think design patterns are only one way to look at it; but at least it gives a better direction).
When doing layered callbacks, currying is a useful technique.
For more on this you can look at http://en.wikibooks.org/wiki/Haskell/Higher-order_functions_and_Currying and for javascript you can look at http://www.svendtofte.com/code/curried_javascript/.
Basically, if you have multiple layers of callbacks, rather than having one massive parameter list, you can build it up incrementally, so that when you are in a loop calling your function, the various callback functions have already been defined and passed.
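A small sketch of this in JavaScript (`fakeFetch`, `makeFetcher`, and the URLs are hypothetical stand-ins for any node-style err-first callback API):

```js
// Stand-in for an asynchronous API with an err-first callback.
function fakeFetch(url, cb) {
  setTimeout(() => cb(null, 'payload from ' + url), 10);
}

// Curried: each layer binds one piece of context, so by the time we are
// deep in a loop, the callbacks are already fully configured.
const makeFetcher = (logger) => (url) => (done) =>
  fakeFetch(url, (err, data) => {
    if (err) { logger(err.message); return done(err); }
    done(null, data);
  });

const fetcher = makeFetcher(console.error); // logger bound once, up front
['a', 'b'].forEach((path) => {
  fetcher('http://example.com/' + path)((err, data) => {
    if (!err) console.log(data);
  });
});
```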
This isn't meant as a complete answer to the question, but I was asked to put this part into an answer, so I did.
After a quick search, here is a blog post that shows using currying with callbacks:
http://bjouhier.wordpress.com/2011/04/04/currying-the-callback-or-the-essence-of-futures/
UPDATE:
After reading the edit to the original question, to see design patterns for asynchronous programming, this may be a good diagram:
http://www1.cse.wustl.edu/~schmidt/patterns-ace.html, but there is much more to good asynchronous design. First-class functions will enable this to be simplified, but if you are using, say, the MPI library and Fortran, then you will have different implementations.
How you approach the design is affected so heavily by the language and the technologies involved that any answer will fall short of being complete.
I am looking to update an application in which I have the ability to update synchronously or asynchronously. Given the real-time nature of the app, which currently executes methods synchronously at frequencies ranging from 1-60Hz, do you see any advantages to updating asynchronously in response to user input? Or should I wait until the next synchronous cycle to incorporate the change?
My thoughts so far:
The current advantage that I see with introducing an asynchronous update is that if a member in a 1Hz method is updated, the 60Hz method may execute 50+ times with the old value. I know this is still a relatively short amount of time to a user (< 1 second), but to me the principle of continuing calculations with bad values for 50+ repetitions seems bad.
The current advantage that I see with keeping it synchronous is the ease of readability for the flow of code execution.
Are there any repercussions I am not thinking of?
It's a little hard to say without more of a sense of your application. In general, I'd say it's preferable to stay synchronous for a real-time application where possible, just because it makes it easier to reason about timeliness (often the hardest thing to reason about). If you can reasonably make something periodic, make it periodic and thank your lucky stars.
Moving to a partially synchronous or async model does have some advantages. Like you say, it might feel less than aesthetic to continue operating on stale data. But consider: this is a real-time application. Presumably you have a requirement that states what the update latency for data input to your 60Hz task must be. As in any general-purpose computing performance setting, don't go to extra work to do better than that unless it's easy, it's clearer in the implementation, or it becomes necessary to achieve correctness.
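One sketch of the synchronous approach, assuming a JavaScript-style event loop and entirely hypothetical names: buffer user input as it arrives and apply it at a well-defined point at the top of each cycle, so every iteration computes against a consistent snapshot.

```js
// Asynchronous input only records intent; the 60Hz loop applies it at the
// top of each cycle, so all calculations within a cycle see one state.
let pendingInput = null;
let state = { value: 0 };

function onUserInput(v) {   // called from an event handler at any time
  pendingInput = v;
}

function tick60Hz() {
  if (pendingInput !== null) {
    state.value = pendingInput; // apply at the cycle boundary
    pendingInput = null;
  }
  // ... the rest of the 60Hz work uses `state` without surprises ...
}

setInterval(tick60Hz, 1000 / 60);
```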
So, all that said, there are no hard and fast rules. Make sure your rationale is both written down and reflected in your design.