According to the CUDA programming guide, you can disable asynchronous kernel launches at run time by setting an environment variable (CUDA_LAUNCH_BLOCKING=1).
This is a helpful tool for debugging. I also want to determine the benefit in my code from using concurrent kernels and transfers.
I also want to disable other concurrent calls, in particular cudaMemcpyAsync.
Does CUDA_LAUNCH_BLOCKING affect these kinds of calls in addition to kernel launches? I suspect not. What would be the best alternative? I can add cudaStreamSynchronize calls, but I would prefer a run time solution. I can run in the debugger, but that will affect the timing and defeat the purpose.
Setting CUDA_LAUNCH_BLOCKING won't affect the streams API at all. If you add some debug code to force all your streams code to use stream 0, all the calls other than kernel launches will revert to synchronous behaviour.
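As a sketch of that debug trick, assuming your code already funnels stream arguments and async copies through wrappers of your own (the DBG_* macros and the FORCE_SYNC switch below are made-up names, not part of the CUDA runtime; only cudaMemcpyAsync, cudaDeviceSynchronize, and cudaStream_t are real CUDA symbols):

```cuda
#include <cuda_runtime.h>

#ifdef FORCE_SYNC
  // Debug build: collapse every stream onto the default stream and
  // synchronize after each async copy, so transfers behave as if
  // they were fully synchronous.
  #define DBG_STREAM(s) ((cudaStream_t)0)
  #define DBG_MEMCPY_ASYNC(dst, src, n, kind, s)            \
      do {                                                  \
          cudaMemcpyAsync((dst), (src), (n), (kind), 0);    \
          cudaDeviceSynchronize();                          \
      } while (0)
#else
  // Normal build: pass everything through unchanged.
  #define DBG_STREAM(s) (s)
  #define DBG_MEMCPY_ASYNC(dst, src, n, kind, s)            \
      cudaMemcpyAsync((dst), (src), (n), (kind), (s))
#endif
```

Building once with -DFORCE_SYNC and once without, then comparing the timings, gives you a run-time estimate of what overlapping kernels and transfers is actually buying you.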
I have an old Xamarin Android JobIntentService that I'm replacing to support Android 12.0.
I now inherit Worker instead of JobIntentService, and use WorkManager to enqueue its job.
A couple of questions:
Is there a way to use await inside the DoWork override method?
Is it better to inherit ListenableWorker instead in order to use await? Do I lose anything if I switch to it?
If I call Task.Factory.StartNew(LongRunning) from Worker's DoWork and immediately follow that by returning a success result, will my long-running task run to completion, or will all associated work be terminated?
I think you should first get to know WorkManager better.
From the document Schedule tasks with WorkManager, we know that:
WorkManager is the recommended solution for persistent work. Work is persistent when it remains scheduled through app restarts and system reboots. Because most background processing is best accomplished through persistent work, WorkManager is the primary recommended API for background processing.
WorkManager handles three types of persistent work:
Immediate: Tasks that must begin immediately and complete soon. May be expedited.
Long Running: Tasks which might run for longer, potentially longer than 10 minutes.
Deferrable: Scheduled tasks that start at a later time and can run periodically.
And from the Work chaining section of Features, we know:
For complex related work, chain individual work tasks together using an intuitive interface that allows you to control which pieces run sequentially and which run in parallel.
WorkManager.getInstance(...)
.beginWith(Arrays.asList(workA, workB))
.then(workC)
.enqueue();
For each work task, you can define input and output data for that work. When chaining work together, WorkManager automatically passes output data from one work task to the next.
Note:
If you still have any problem, get back to me.
I'm looking at the mpi.h header, and I'm confused about the PMPI_Init function. It's placed right after the MPI_Init declaration, and it looks exactly the same. However, Msmpi.dll (for instance) doesn't have the MPI_Init export, only the PMPI_Init.
What are these PMPI_ functions?
You are looking at the MPI profiling interface. For each MPI function, there is also a similar PMPI function, which just differs by the prefix.
As a user, you should only call the MPI version and just ignore the PMPI version.
This is a mechanism that allows tool developers to intercept calls to the MPI functions and call the PMPI versions internally. Usually this is implemented such that all functions are implemented as PMPI functions and with MPI functions as weak symbols pointing to them. The tool can then replace the weak symbols with their own wrapper implementations and still call the PMPI functions internally.
// Normal case
user --calls--> libmpi:MPI_Init --redirects to--> libmpi:PMPI_Init (implementation)
// Tool case
user --calls--> libtool:MPI_Init (does tool things) --calls--> libmpi:PMPI_Init (implementation)
You can find more information in Section 14.2 of the MPI standard. In general, I highly recommend looking in the standard for function signatures and the like instead of the header.
The PMPI_ entry points are part of the MPI Profiling Interface.
These symbols by default simply refer to their MPI_ function namesakes, but by having them defined as part of the API, they make it easy for tools to insert themselves around MPI calls to do simple performance profiling or tracing. There are lots of examples of how they work and how to use them.
Most profiling tools for MPI codes make use of this to do things like time MPI communication routines, count the number of messages being sent/received with particular sizes, etc., without having to modify the user code; you just have to link in the profiling library.
The profiling interface doesn't have to be used strictly for profiling, of course; there have been projects that have used it for communications correctness checking (making sure sends and receives were matched), simple heuristic deadlock testing, etc.
The profiling interface was the only standard tools interface to the MPI library for some time, but there is now also a richer Tools interface.
I keep hearing that using async programming patterns will make my code run faster. Why is that true? Doesn't the same exact code have to run either way, whether it runs now or it runs later?
It's not faster, it just doesn't waste time.
Synchronous code stops processing while waiting for I/O, which means that while you're reading a file you can't run any other code. Now, if you have nothing else to do while that file is being read, then asynchronous code wouldn't buy you much.
Usually it's servers that can make use of that freed-up CPU time, so the question becomes: why do asynchronous programming instead of starting a new thread for each client?
It turns out that starting and tearing down threads is expensive. Some time back in the early 2000s, a web server benchmark found that tclhttpd compared favorably to Apache for serving static image files. This is despite the fact that tclhttpd was written in Tcl, Apache was written in C, and Tcl was known to be 50 times slower than C. Tcl managed to hold its own against Apache because Tcl had an easy-to-use asynchronous I/O API, so tclhttpd used it.
It's not that C doesn't have asynchronous I/O APIs. It's just that they're rarely used, so Apache didn't use them. These days, Apache 2 uses asynchronous I/O internally along with thread pools. The C code ends up looking more complicated, but it's faster; lesson learned.
Which leads us to the recent obsession with asynchronous programming. Why are people obsessed with it? (Most answers on Stack Overflow about JavaScript programming, for example, insist that you should never use the synchronous versions of asynchronous functions.)
This goes back to how rarely you see asynchronous programs in C, even though it's the superior way of doing things (GUI code is an exception, because UI libraries learned early on to rely on asynchronous programming and events). There are simply too many functions in C that are synchronous, so even if you wanted to do asynchronous programming, you'd end up calling a synchronous function sooner or later. The alternative is to abandon the stdlib and write your own asynchronous libraries for everything: from file I/O to networking to SQL.
So, in languages like JavaScript, where asynchronous programming ended up as the default style, there is pressure from other programmers not to mess it up by accidentally introducing synchronous functions, which would be hard to integrate with asynchronous code without losing a lot of performance. In the end, like taxes, asynchronous code has become a social contract.
It's not always faster. In fact, just setting up and tearing down the async environment adds a lot of time to your code. You have to spin off a new process/thread, set up an event queue/message pump, and clean up everything nicely in the end. (Even if your framework hides all these details from you, they're happening in the background).
The advantage is in avoiding blocking. Lots of our code depends on external resources. We need to query a database for the records to process, or download the latest version of something from a website. From the moment you ask that resource for information until you get an answer, your code has nothing to do. It's blocking, waiting for an answer. All the time your program spends blocking is totally wasted.
That's what async is designed for. By spinning the "wait for this blocking operation" code off into an async request, you let the rest of your non-blocking code keep running.
As a metaphor, imagine a manager telling his employee what to do that day. One of the tasks is a phone call to a company with long wait times. If he told her to make the call synchronously, she would call and wait on hold without doing anything else. Make it async, and she can work on a lot of other tasks while the phone sits on hold in the background.
It runs the same code, but it doesn't wait for time-consuming tasks to finish. It continues executing other code until the async operation is done.
While working in an R console, I'd like to set up a background task that monitors a particular connection and when an event occurs, another function (an alert) is executed. Alternatively, I can set things up so that an external function simply sends an alert to R, but this seems to be the same problem: it is necessary to set up a listener.
I could do this in a dedicated R process, but I don't know whether it's feasible from within a console session. Also, I'm not interested in interrupting R if it is calculating a function, only in alerting or interrupting it if the console is merely waiting on input.
Here are three use cases:
The simplest possible example is watching a file. Suppose that I have a file called "latestData.csv" and I want to monitor it for changes; when it changes, myAlert() is executed. (One can extend it to do different things, but just popping up with a note that a file has changed is useful.)
A different kind of monitor would watch for whether a given machine is running low on RAM and might execute a save.image() and terminate. Again, this could be a simple issue of watching a file produced by an external monitor that saves the output of top or some other command.
A different example is like another recent SO question about having R halt the EC2 machine it's running on. If an alert from another machine or process tells the program to save and terminate, then being able to listen for that alert would be great.
At the moment, I suspect there are two ways of handling this: via Rserve and possibly via fork. If anyone has examples of how to do this with either package or via another method, that would be great. I think that solving any of these three use cases would solve all of them, modulo a little bit of external code.
Note 1: I realize, per this answer to another SO question that R is single threaded, which is why I suspect fork and Rserve may work. However, I'm not sure about feasibility if one is interfacing with an R terminal. Although R's REPL is attached to the input from the console, I am trying to either get around this or mimic it, which is where fork or Rserve may be the answer.
Note 2: For those familiar with event handling / eventing methods, that would solve everything, too. I've just not found anything about this in R.
Update 1: I've found that the manual for writing R extensions has a section referencing event handling, which mentions the use of R_PolledEvents. This looks promising.
One more option is the svSocket package. It is non-blocking.
Here is an 8-minute video using it, which has over 3,000 views. It shows how to turn an R session into a server, and how to send commands to it and receive data back. It demonstrates doing that even while the server is busy; e.g., if you start a long-running process and forget to save intermediate results, you can connect to the server and fetch the results from it before it has finished.
It depends on whether you want to interrupt an idle or a working R. In the first case, you can bypass R's default REPL loop with an event listener that queues incoming events and evaluates them. The common option is to use the Tcl/Tk or GTK loop; I have made something like this around libev in my triggr package, which makes R digest requests coming from a socket.
The latter case is mostly hopeless, unless you manually make the computational code execute an if (eventOccurred) processIt() check periodically.
Multithreading is not a real option, because two interpreters in one process would break each other by using the same global variables, while forked processes have independent memory contents.
It turns out that the package Rdsm supports this as well.
With this package, one can set up a server/client relationship between different instances of R, each being a basic R terminal, and the server can send messages, including functions, to the clients.
Transformed to the use case I described, the server process can do whatever monitoring is necessary, and then send messages to the clients. The documentation is a little terse, unfortunately, but the functionality seems to be straightforward.
If the server process is, say, monitoring a connection (a file, a pipe, a URL, etc.) on a regular basis and a trigger is encountered, it can then send a message to the clients.
Although the primary purpose of the package is shared memory (which is how I came across it), this messaging works pretty well for other purposes, too.
Update 1: Of course for message passing, one can't ignore MPI and the Rmpi package. That may do the trick, but the Rdsm package launches / works with R consoles, which is the kind of interface I'd sought. I'm not yet sure what Rmpi supports.
A few ideas:
Run R from within another language's script (this is possible, for example, in Perl using RSPerl) and use the wrapping script to launch the listener.
Another option may be to run an external (non-R) command (using system()) from within R that will launch a listener in the background.
Run R in batch mode in the background, either before launching your interactive R session or in a separate window.
For example:
R --no-save < listener.R > output.out &
The listener can send an appropriate email when the event occurs.
I've seen a lot of other developers refer to threads in ActionScript functions. As a newbie I have no idea what they are referring to so:
What is a thread in this sense?
How would I run more than one thread at a time?
How do I ensure that I am only running one thread at a time?
Thanks
~mike
Threads represent a way to have a program appear to perform several jobs concurrently, although whether the jobs can actually occur simultaneously depends on several factors (most importantly, whether the CPU the program is running on has multiple cores available to do the work). Threads are useful because they allow work to be done in one context without interfering with another context.
An example will help to illustrate why this is important. Suppose that you have a program which fetches the list of everyone in the phone book whose name matches some string. When people click the "search" button, it will trigger a costly and time-consuming search, which might not complete for a few seconds.
If you have only a single-threaded execution model, the UI will hang and be unresponsive until the search completes. Your program has no choice but to wait for the results to finish.
But if you have several threads, you can offload the search operation to a different thread, and then have a callback -- a trigger which is invoked when the work is completed -- to let you know that things are ready. This frees up the UI and allows it to continue to respond to events.
Unfortunately, because ActionScript's execution model doesn't support threads natively, it's not possible to get true threading. There is a rough approximation called "green threads": threads that are scheduled by an execution context or virtual machine rather than by the operating system, which is how it's usually done. Several people have taken a stab at it, although I can't say how widespread the usage is. You can read more at Alex Harui's blog here and see an example of green threads for ActionScript here.
It really depends on what you mean. The execution model for ActionScript is single-threaded, meaning it cannot run a process in the background.
If you are not familiar with threading, it is essentially the ability to have something executed in the background of a main process.
So, if you needed to do a huge mathematical computation in your flex/flash project, with a multi-threaded program you could do that in the background while you simultaneously updated your UI. Because ActionScript is not multi-threaded you can not do such things. However, you can create a pseudo-threading class as demonstrated here:
http://blogs.adobe.com/aharui/pseudothread/PseudoThread.as
The others have described what threading is, and you'd need threading if you were getting hardcore into C++ and 3D game engines, among many other computationally-expensive operations, and languages that support multi-threading.
Actionscript doesn't have multi-threading. It executes all code in one frame. So if you create a for loop that processes 100,000,000 items, it will cause the app to freeze. That's because the Flash Player can only execute one thread of code at a time, per frame.
You can achieve pseudo-threading by using:
Timers
Event.ENTER_FRAME
Those allow you to jump around and execute code.
Tween engines like TweenMax can operate on thousands of objects at once over a few seconds by using Timers. You can also do this with Event.ENTER_FRAME. There is something called "chunking" (check out Grant Skinner's AS3 optimizations presentation), which means "execute computationally expensive tasks over a few frames", like drawing complex bitmaps; it's a pseudo-multi-threading technique you can use in ActionScript.
A lot of other things are asynchronous, like service calls. If you make an HTTPService request in Flex, it will send a request to the server and then continue executing code in that frame. After that, the server may still be processing the request (say it's saving a 30 MB video to a database on the server), and that might take a minute. Then it will send something back to Flex, and you can continue code execution with a ResultEvent.RESULT event handler.
So Actionscript basically uses:
Asynchronous events, and
Timers...
... to achieve pseudo-multi-threading.
A thread allows you to execute two or more blocks of ActionScript simultaneously. By default you will always be executing on the same default thread, unless you explicitly start a new thread.