R parallel backend: What happens when one process faces an exception?

I am using foreach + %dopar% to achieve parallelism over multiple cores. I know some of the tasks will face exceptions. When an exception occurs:
Will the remaining tasks that were already started in parallel still complete?
Will the tasks that were not yet scheduled (I don't know if that's the correct term) be scheduled and eventually complete? If so, will they still be able to utilize all the cores?
I tried finding resources on this, but couldn't find any. Looks like I'm using the wrong keywords. If you have any resources, please direct me to them.

There is a parameter in foreach called .errorhandling; it can take the values stop (the default), remove, or pass. Their behaviour is as follows:
stop: the function will be stopped.
remove: the result of this specific task will not be returned.
pass: the error object will be included with the results.
So, to address your specific question: if you have many tasks running in parallel and one task on one worker raises an exception, that worker stops the failing task and moves on to the next "scheduled" task (that is because of the default value stop). The other tasks will continue as normal, in parallel.
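A minimal sketch of the three modes, assuming a doParallel backend on two cores (the failing task and the returned values are illustrative):

library(foreach)
library(doParallel)
registerDoParallel(cores = 2)

# One of three tasks throws; with .errorhandling = "pass" the error object
# is returned in place of that task's result and the other tasks complete.
results <- foreach(i = 1:3, .errorhandling = "pass") %dopar% {
  if (i == 2) stop("task failed")
  i * 10
}
# results[[1]] and results[[3]] are 10 and 30; results[[2]] is the error object.
# With "remove" the failing task would simply be missing from the result list,
# and with "stop" the whole foreach call would signal the error.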
Please see this answer, which explains in more detail how errors are handled in foreach and %dopar%.
I hope this clarifies your problem a little.

Related

Asynchronous gradient descent in tensorflow under concurrent access

I have been having trouble implementing an asynchronous gradient descent in a multithreaded environment.
To describe the skeleton of my code, for each thread,
loop
    synchronize with global param
    < do some work / accumulate gradients on mini-batch >
    apply gradient descent to the global network, specifically
    self.optimizer.apply_gradients(grads_and_vars)
end
where each thread has its own optimizer.
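For concreteness, here is a minimal sketch of the per-thread wiring (TF1-style API; the variable and loss are stand-ins for my actual network):

import tensorflow as tf

# Stand-in global parameter and loss; the real network is more involved.
w = tf.Variable(0.0, name="global_w")
loss = tf.square(w - 1.0)

# Each thread constructs its own optimizer; use_locking controls whether the
# read-modify-write inside apply_gradients takes a lock on the variable.
opt = tf.train.GradientDescentOptimizer(0.01, use_locking=False)
grads_and_vars = opt.compute_gradients(loss, var_list=[w])
train_op = opt.apply_gradients(grads_and_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)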
Now the problem is that, when defining the optimizer with use_locking=False, it does not work, as evidenced by the rewards generated by my reinforcement learning agent.
However, when I set use_locking=True, it works, so the algorithm is correct; it's just that with use_locking=False the local gradients are not applied properly to the global param.
So some possible reasons I thought of were the following:
1. While one thread is updating the global param, another thread accessing the global param causes the former thread to cancel all of its remaining updates. And with too many threads accessing this global param concurrently, the threads do all their hard work for nothing.
2. Referring to How does asynchronous training work in distributed Tensorflow?, reading asynchronously is certainly fine at the top of the loop. However, it may be that as soon as a thread is done applying its gradients, it re-synchronizes from the global param so quickly that it does not fetch the updates from the other threads.
Can you, hopefully a TensorFlow developer, help me understand what is really happening with use_locking in this specific loop?
I have been spending days on this simple example. Although setting use_locking = True does solve the issue, it is not asynchronous in nature and it is also very slow.
I appreciate your help.

Julia: understanding when task switching happens

I could not find detailed documentation about the @async macro. From the docs about parallelism I understand that there is only one system thread used inside a Julia process and that explicit task switching happens with the help of the yieldto function - correct me if I am wrong about this.
For me it is difficult to understand when exactly these task switches happen just by looking at the code, and knowing when it happens seems crucial.
As I understand it, a yieldto somewhere in the code (or in some function called by the code) needs to be there to ensure that the system is not stuck with only one task.
For example, when there is a read operation, inside the read there probably is a wait call, and in the implementation of wait there probably is a yieldto call. I thought that without the yieldto call the code would get stuck in one task; however, running the following example seems to prove this hypothesis wrong.
@async begin # Task A
    while true
        println("A")
    end
end

while true # Task B
    println("B")
end
This code produces the following output
BA
BA
BA
...
It is very unclear to me where the task switching happens inside the task created by the @async macro in the code above.
How can I tell about looking at some code the points where task switching happens?
The task switch happens inside the call to println("A"), which at some point calls write(STDOUT, "A".data). Because isa(STDOUT, Base.AsyncStream) and there is no method that is more specialized, this resolves to:
write{T}(s::AsyncStream,a::Array{T}) at stream.jl:782
If you look at this method, you will notice that it calls stream_wait(ct) on the current task ct, which in turn calls wait().
(Also note that println is not atomic, because there is a potential wait between writing the arguments and the newline.)
You could of course determine when stuff like that happens by looking at all the code involved. But I don't see why you would need to know this exactly, because, when working with parallelism, you should not depend on tasks not switching context anyway. If you depend on a certain execution order, synchronize explicitly.
(You already kind of noted this in your question, but let me restate it here: As a rule of thumb, when using green threads, you can expect potential context switches when doing IO, because blocking for IO is a textbook example of why green threads are useful in the first place.)
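To illustrate the rule of thumb with a minimal sketch (not from the original code): when no IO is involved, nothing switches until you yield explicitly.

order = String[]
@async push!(order, "task")  # scheduled, but not started yet
push!(order, "main")         # the main task keeps running: nothing has yielded
yield()                      # explicit switch point: the scheduled task runs now
@show order                  # ["main", "task"]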

How to implement a callback/producer/consumers scheme?

I'm warming up with Clojure and started to write a few simple functions.
I'm realizing how the language is clearly well-suited for parallel computation and this got me thinking. I've got an app (written in Java but whatever) that works the following way:
one thread waits for input to come in (from the filesystem in my case, but it could be the network or whatever) and, once it arrives, puts that input on a queue
several consumers fetch data from that queue and process the data in parallel
The code that puts the input on the queue to be processed in parallel may look like this (it's just an example):
asynchFetchInput(new MyCallBack() {
    public void handle(Input input) {
        queue.put(input);
    }
});
Where asynchFetchInput would spawn a Thread and then call the callback.
It's really just an example but if someone could explain how to do something similar using Clojure it would greatly help me understand the "bigger picture".
If you have to transform data, you can make it into a seq and then feed it to either map or pmap; the latter will process it in parallel. filter and reduce are also both really useful, so you might want to see if you can express your logic in those terms.
You might also want to look into the concurrency utilities in basic Java rather than spawning your own threads.
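A minimal sketch of the whole scheme, assuming a LinkedBlockingQueue as the hand-off point (the producer loop and the squaring step are stand-ins for your real input source and processing):

(import '(java.util.concurrent LinkedBlockingQueue))

(def queue (LinkedBlockingQueue.))

;; Producer: a future plays the role of the callback thread.
(future
  (doseq [x (range 10)]   ; stand-in for input arriving
    (.put queue x)))

;; Expose the queue as a lazy seq, then consume it in parallel.
(defn queue-seq []
  (lazy-seq (cons (.take queue) (queue-seq))))

(println (doall (pmap #(* % %) (take 10 (queue-seq)))))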

How does Parallel.For work?

How does Parallel.For work? Does it invoke threads for each loop iteration, or divide the loop into parts and execute them in parallel? If it does, can we ensure the same result as with a normal for loop? I tested for performance and it really does use multiple cores. But I want to know the internal workings of how this is done.
Parallel.For partitions the work for a number of concurrent iterations. By default it uses the default task scheduler to schedule the iterations, which essentially uses the current thread as well as a number of thread pool threads. There are overloads that allow you to change this behavior.
A parallel loop may look very similar to a regular loop, but there are actually a number of important differences. First of all, the order of the iterations is not guaranteed, i.e. the code cannot assume any specific order. Doing so will lead to unpredictable results.
Also, since the code may run on multiple threads, exception handling is completely different from a regular for loop. Parallel.For will catch exceptions from the threads and marshal them back to the calling thread as inner exceptions in an instance of AggregateException.
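A minimal sketch of both points, i.e. unordered iterations and AggregateException (illustrative only):

using System;
using System.Threading.Tasks;

class ParallelForDemo
{
    static void Main()
    {
        try
        {
            Parallel.For(0, 10, i =>
            {
                Console.WriteLine("iteration " + i); // order varies from run to run
                if (i == 5) throw new InvalidOperationException("boom");
            });
        }
        catch (AggregateException ae)
        {
            foreach (var e in ae.InnerExceptions)
                Console.WriteLine("caught: " + e.Message);
        }
    }
}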
For additional details please check Parallel Programming with Microsoft .NET by Microsoft patterns and practices.
Parallel.For runs iterations of the loop on different threads in parallel. You can only use it if the iterations are independent of one another. Only if the iterations are independent can you assume that a parallel and a non-parallel for loop will produce the same results.

Suggestions for doing async I/O with Task Parallel Library

I have some high performance file transfer code which I wrote in C# using the Async Programming Model (APM) idiom (eg, BeginRead/EndRead). This code reads a file from a local disk and writes it to a socket.
For best performance on modern hardware, it's important to keep more than one outstanding I/O operation in flight whenever possible. Thus, I post several BeginRead operations on the file, then when one completes, I call a BeginSend on the socket, and when that completes I do another BeginRead on the file. The details are a bit more complicated than that but at the high level that's the idea.
I've got the APM-based code working, but it's very hard to follow and probably has subtle concurrency bugs. I'd love to use TPL for this instead. I figured Task.Factory.FromAsync would just about do it, but there's a catch.
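By "just about do it" I mean that wrapping a single APM call as a task is straightforward; a minimal sketch (file name and buffer size are arbitrary):

using System.IO;
using System.Threading.Tasks;

FileStream fs = File.OpenRead("input.dat");
byte[] buffer = new byte[64 * 1024];

// One BeginRead/EndRead pair wrapped as a Task<int>.
Task<int> readTask = Task<int>.Factory.FromAsync(
    fs.BeginRead, fs.EndRead, buffer, 0, buffer.Length, null);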
All of the I/O samples I've seen (most particularly the StreamExtensions class in the Parallel Extensions Extras) assume one read followed by one write. This won't perform the way I need.
I can't use something simple like Parallel.ForEach or the Extras extension Task.Factory.Iterate because the async I/O tasks don't spend much time on a worker thread, so Parallel just starts up another task, resulting in potentially dozens or hundreds of pending I/O operations; way too many! You can work around that by calling Wait on your tasks, but that causes the creation of an event handle (a kernel object) and a blocking wait on a task wait handle, which ties up a worker thread. My APM-based implementation avoids both of those things.
I've been playing around with different ways to keep multiple read/write operations in flight, and I've managed to do so using continuations that call a method that creates another task, but it feels awkward, and definitely doesn't feel like idiomatic TPL.
Has anyone else grappled with an issue like this with the TPL? Any suggestions?
If you're worried about too many threads, you can just set ParallelOptions.MaxDegreeOfParallelism to an acceptable number in your call to Parallel.ForEach.
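For example, a minimal sketch (the per-item work is a stand-in):

using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ThrottleDemo
{
    static void Main()
    {
        var items = Enumerable.Range(0, 100);
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

        // At most four iterations run concurrently, regardless of core count.
        Parallel.ForEach(items, options, item =>
        {
            // Stand-in for the real per-item work.
            Console.WriteLine("processing " + item + " on thread "
                + Thread.CurrentThread.ManagedThreadId);
        });
    }
}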
