PyOpenCL: What is Event.wait() for?

This function has no description in the PyOpenCL documentation: http://documen.tician.de/pyopencl/runtime.html?highlight=enqueue#pyopencl.Event.wait
My problem is: I have to call a kernel in a for-loop and after each call enqueue a copy operation with pyopencl.enqueue_copy_buffer(dest, src, size). First I did this with .wait() appended (i.e. pyopencl.enqueue_copy_buffer(dest, src, size).wait()) because I found this in an example. Then I removed .wait() and got a very significant speedup.
So what is this function for, and under which conditions can it be omitted? Thanks.

.wait() blocks the host (Python) code until the operation has completed. If your code can proceed without the operation being finished (or even started), you can leave it out. Note that this is not related to the order of operations in the queue: unless you use an out-of-order queue, they will be executed one after another, in the order you enqueued them.
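To make the distinction concrete, here is a small analogy in plain Python using only the standard library, not PyOpenCL itself (run_loop is a hypothetical name). A single-worker ThreadPoolExecutor stands in for an in-order command queue, since it runs submitted tasks one at a time in FIFO order, and Future.result() stands in for Event.wait(). Waiting after every copy and waiting once at the end produce the same execution order; the only difference is how often the host thread blocks.

```python
from concurrent.futures import ThreadPoolExecutor


def run_loop(n_iters, wait_each_copy):
    """Analogy for an in-order command queue (an illustration, not PyOpenCL).

    The single worker runs tasks in submission order, like an in-order queue;
    Future.result() plays the role of Event.wait().
    """
    log = []
    with ThreadPoolExecutor(max_workers=1) as queue:
        last = None
        for i in range(n_iters):
            queue.submit(log.append, f"kernel {i}")       # "enqueue" the kernel
            last = queue.submit(log.append, f"copy {i}")  # "enqueue" the copy
            if wait_each_copy:
                last.result()  # host blocks here on every iteration (the slow variant)
        if last is not None:
            last.result()      # a single wait before the host uses the results
    return log


print(run_loop(2, wait_each_copy=True))
print(run_loop(2, wait_each_copy=False))
```

Both calls print the same order. In real PyOpenCL code, the analogue of the final wait would be waiting on the last returned event, or calling queue.finish(), before the host reads the results.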

Related

Is there a way to make_ref() for spawned processes in erlang?

I have tried several ways to make references to spawned processes in Erlang, in order to make them compatible with the From argument of a gen_server call. So far I have tried P1ID = {spawn(fun() -> self() end), make_ref()}, in order to reproduce the structure of from() as stated in the Erlang documentation for gen_server:reply. I have not yet succeeded, and the documentation about make_ref() is rather scarce.
Were you attempting to build that {Pid, Ref} tuple in order to test the handle_call() gen_server callback from your tests?
If so, you should not test these gen_server internals directly. Instead, add higher-level functions to your module (which call the gen_server call/cast/... functions) and test those.
spawn() already returns a pid(), so there was no reason to return self() from the spawned process.
Hope it helps

Which wait mechanism is Polly using

Polly has several retry policies, for example WaitAndRetryForever. I looked in the documentation but couldn't find what exactly is used to make the thread wait until the next retry. I guess Polly uses System.Timers for this, or is it something completely different? Thanks for any help.
Asynchronous executions (fooAsyncPolicy.ExecuteAsync(...)) wait with Task.Delay(...), freeing the thread the caller was using while the delay occurs.
Synchronous executions (fooSyncPolicy.Execute(...)) wait between retries in a cancellable, thread-blocking manner. This means that, for the synchronous (a):
action();
compared to the synchronous (b):
policy.Execute(action);
the following three things all hold:
(1) both (a) and (b) block progress (subsequent code does not run) until the statement has completed;
(2) (b) executes action on the same thread that (a) originally would have;
(3) (b) surfaces exceptions (if the Policy operation does not intervene) in the same, or as similar as possible, way that (a) originally would have.
These semantics (1), (2), (3) are intentional: they keep synchronously executing code with Polly as similar as possible in semantics and behaviour to executing it without Polly, so surrounding code needs little adjustment.
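The two waiting styles can be illustrated with a Python analogy. This is a sketch, not Polly's implementation: execute_with_retry and execute_with_retry_async are hypothetical names, every exception is treated as retryable, and cancellation is simplified to re-raising the last failure (Polly itself signals cancellation differently). The synchronous version blocks the calling thread with a cancellable threading.Event.wait(timeout) and runs the action on the caller's thread; the asynchronous version awaits asyncio.sleep, which frees the event loop much as Task.Delay frees the thread.

```python
import asyncio
import threading
from typing import Awaitable, Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def execute_with_retry(action: Callable[[], T], delays: Iterable[float],
                       cancel: Optional[threading.Event] = None) -> T:
    """Synchronous wait-and-retry: blocks the calling thread between attempts."""
    if cancel is None:
        cancel = threading.Event()
    for delay in delays:
        try:
            return action()        # runs on the caller's own thread, as in (2)
        except Exception:
            # Cancellable, thread-blocking wait: returns True early if cancel is set.
            if cancel.wait(delay):
                raise              # simplified cancellation: re-raise the last failure
    return action()                # final attempt; its exception propagates unchanged, as in (3)


async def execute_with_retry_async(action: Callable[[], Awaitable[T]],
                                   delays: Iterable[float]) -> T:
    """Asynchronous wait-and-retry: awaits a delay instead of blocking a thread."""
    for delay in delays:
        try:
            return await action()
        except Exception:
            await asyncio.sleep(delay)  # yields to the event loop, like Task.Delay
    return await action()
```

The synchronous variant keeps properties (1) to (3) from the answer; the asynchronous one trades them for not holding a thread during the delay.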
Anticipating a follow-up question: wouldn't it be possible to write the synchronous Polly call Policy.Handle<T>().WaitAndRetry(...).Execute(action) so that it does not block a thread while waiting before retrying? Yes, but no solution has been found that is preferable to letting the caller control the transition to TPL Tasks or async/await, and then using Polly's ExecuteAsync(...).

Pintool- How can I traverse through all traces (even the ones that have already been executed once)?

I'm trying to count how many times a BBL is executed during the whole program run, but apparently TRACE_AddInstrumentFunction skips traces that have already been executed once. Does anyone have any ideas?
Pin instrumentation works in two phases. The instrumentation routine is called when new code is encountered, and allows you to insert analysis callbacks. The analysis callbacks are then called every time that code is executed.
I strongly recommend reading the first part of the Pin manual to understand the difference between instrumentation and analysis functions.
The instrumentation routine is where you insert the callbacks. In simpler terms, it lets you place a function call before each instrumented unit, which you can choose to be an instruction, a trace or a routine. Now, specific to your question: Pin follows its own definition of a BBL, and finding the number of times a BBL (per Pin's definition) is executed is easy. Register a trace instrumentation routine, insert an analysis call into every BBL of each trace that increments a counter, and you will get the BBL count.
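The difference between the two phases can be modelled with a toy Python simulation. ToyPin and its methods are hypothetical and this is not Pin's API: instrument_trace() runs only the first time a trace address is seen, like a callback registered with TRACE_AddInstrumentFunction, while the per-BBL counters it inserts (what a real Pintool would do with BBL_InsertCall) run on every execution. That is why the counting has to happen in the analysis routine, not in the instrumentation routine.

```python
from collections import Counter


class ToyPin:
    """Toy model of Pin's instrumentation vs. analysis phases (hypothetical, not Pin)."""

    def __init__(self, traces):
        self.traces = traces          # trace address -> list of BBL names
        self.code_cache = {}          # trace address -> analysis calls inserted for it
        self.instrumentation_calls = 0
        self.bbl_count = 0            # total BBL executions
        self.per_bbl = Counter()      # executions per BBL

    def instrument_trace(self, addr):
        """Instrumentation phase: runs once per newly seen trace."""
        self.instrumentation_calls += 1
        # One analysis call per BBL (a real Pintool would use BBL_InsertCall here).
        self.code_cache[addr] = [lambda bbl=bbl: self.count_bbl(bbl)
                                 for bbl in self.traces[addr]]

    def count_bbl(self, bbl):
        """Analysis routine: runs every time the BBL executes."""
        self.bbl_count += 1
        self.per_bbl[bbl] += 1

    def execute(self, addr):
        if addr not in self.code_cache:     # new code: instrument it first
            self.instrument_trace(addr)
        for call in self.code_cache[addr]:  # cached code: only the analysis calls run
            call()


pin = ToyPin({0x400: ["A", "B"], 0x500: ["C"]})
for addr in [0x400, 0x400, 0x500, 0x400]:
    pin.execute(addr)
print(pin.instrumentation_calls, pin.bbl_count)
```

Trace 0x400 runs three times but is instrumented only once, yet its BBLs are still counted three times.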
If you want to use the textbook definition of a BBL (single entry, single exit), under which a BBL ends at every branch or call instruction, insert an analysis call before each instruction for which INS_IsBranchOrCall() returns true, and increment the BBL counter in that callback.
I recommend trying both of them and figuring out the difference between the two definitions.

How to replace WAIT, to avoid locked error and to free the memory

Today I have a question about the use of WAIT.
I work with an internal source code quality team, in charge of reviewing and approving our code. Unfortunately, the WAIT UP TO x SECONDS statement is now forbidden, and this is not negotiable.
Issue
CALL FUNCTION 'ANY_FUNCTION_WITH_LOCK'.
CALL FUNCTION 'ANY_FUNCTION_WITH_LOCK_2'.
If I execute this pseudo-code, I get an error (locked object), because I use shared objects (I use function modules without synchronous/asynchronous mode).
I can resolve this issue by using WAIT:
CALL FUNCTION 'ANY_FUNCTION_WITH_LOCK'.
WAIT UP TO 5 SECONDS. " or 1, 2, 3, 4, 5, ... seconds <------------
CALL FUNCTION 'ANY_FUNCTION_WITH_LOCK_2'.
With this method (and standard function modules), the system stops and waits for a fixed time.
But sometimes the system needs 1 second, sometimes more; we can't know the exact time needed.
And if we execute this code inside a loop over a large number of objects, the total waiting time can grow without bound, until a memory dump occurs.
(Impacted functions are related to VL32N, QA11, ... and their objects)
Need
How can we replace the WAIT statement?
We need a solution/function module that behaves like WAIT UP TO, but has no (or less) impact on memory (dumps, excessive resource consumption, ...).
In fact we need something like COMMIT WORK AND WAIT, but waiting for the result of a function module rather than for the database update.
Solutions?
Use a loop with a timestamp comparison, calling ENQUEUE_READ to get the list of locked objects and checking whether the needed object is in this list, until X seconds have passed.
It seems that this solution needs the same level of resources as WAIT.
ENQUE_SLEEP seems to have the same effect on memory as WAIT (How to make an abap program pause?)
Refactor all the existing code and use synchronous function modules.
Anything else? Any ideas? Is it even possible?
Thanks in advance :)
Why not just put a check for the lock in between the two function modules? You could put that inside a loop and exit the loop as soon as the lock set by FM 1 is released.
I use ENQUE_SLEEP when I want to wait for a specified amount of time and then recheck something. For example you could wait 5 seconds and then check for the existence of the locks. If the objects are no longer locked, then proceed. If the locks are still there, sleep again. To avoid an infinite loop, you must have some limit on the number of times you are willing to sleep before you give up and log some kind of an error.
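The bounded sleep-and-recheck loop described above can be sketched in Python to show the control flow only; it is not ABAP. wait_until_unlocked and its parameters are hypothetical names: is_locked stands in for a lock check such as reading the lock table with ENQUEUE_READ, and sleep stands in for ENQUE_SLEEP. The max_attempts limit is what prevents the infinite loop the answer warns about.

```python
import time
from typing import Callable


def wait_until_unlocked(
    is_locked: Callable[[], bool],
    interval_s: float = 5.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Recheck a lock, sleeping between checks, for at most max_attempts sleeps.

    Returns True as soon as the lock is gone, or False if we gave up
    (the caller should then log an error instead of waiting forever).
    """
    for _ in range(max_attempts):
        if not is_locked():
            return True        # object no longer locked: proceed
        sleep(interval_s)      # still locked: sleep, then check again
    return not is_locked()     # one last check after the final sleep
```

Passing sleep as a parameter keeps the sketch testable without real delays.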
The problem with WAIT is that it triggers an implicit database commit. ENQUE_SLEEP does not.

How to properly write a SIGPROF handler that invokes AsyncGetCallTrace?

I am writing a short and simple profiler (in C), which is intended to print out stack traces for threads in various Java clients at regular intervals. I have to use the undocumented function AsyncGetCallTrace instead of GetStackTrace to minimize intrusion and allow for stack traces regardless of thread state. The source code for the function can be found here: http://download.java.net/openjdk/jdk6/promoted/b20/openjdk-6-src-b20-21_jun_2010.tar.gz
in hotspot/src/share/vm/prims/forte.cpp. I found some man pages documenting JVMTI, signal handling, and timing, as well as a blog with details on how to set up the AsyncGetCallTrace call: http://jeremymanson.blogspot.com/2007/05/profiling-with-jvmtijvmpi-sigprof-and.html
What this blog is missing is the code that actually invokes the function within the signal handler (the author assumes the reader can do this on his/her own). I am asking for help with exactly this. I am not sure how and where to create the struct ASGCT_CallTrace (and the internal struct ASGCT_CallFrame), as defined in the aforementioned file forte.cpp. The struct ASGCT_CallTrace is one of the parameters passed to AsyncGetCallTrace, so I do need to create it, but I don't know how to obtain the correct values for its fields: JNIEnv *env_id, jint num_frames, and JVMPI_CallFrame *frames. Furthermore, I do not know what the third parameter passed to AsyncGetCallTrace (void* ucontext) is supposed to be.
The above problem is the main one I am having. However, other issues I am faced with include:
SIGPROF doesn't seem to be raised by the timer exactly at the specified intervals, but rather a bit less frequently. That is, if I set the timer to send a SIGPROF every second (1 sec, 0 usec), then in a 5 second run, I am getting fewer than 5 SIGPROF handler outputs (usually 1-3)
SIGPROF handler outputs do not appear at all during a Thread.sleep in the Java code. So, if a SIGPROF is to be sent every second, and I have Thread.sleep(5000);, I will not get any handler outputs during the execution of that code.
Any help would be appreciated. Additional details (as well as parts of code and sample outputs) will be posted upon request.
I finally got a positive result, but since little discussion was spawned here, my own answer will be brief.
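The ASGCT_CallTrace structure (and the underlying ASGCT_CallFrame array) can simply be declared in the signal handler, so that it exists only on the stack:
ASGCT_CallTrace trace;
JNIEnv *env;
// global_VM_pointer is a global JavaVM* (C++ JNI syntax; in C: (*global_VM_pointer)->AttachCurrentThread(global_VM_pointer, ...))
global_VM_pointer->AttachCurrentThread((void **) &env, NULL);
trace.env_id = env;
trace.num_frames = 0;          // overwritten by AsyncGetCallTrace (negative on error)
ASGCT_CallFrame storage[25];   // room for at most 25 frames
trace.frames = storage;
The following gets the ucontext:
ucontext_t uContext;
getcontext(&uContext);
And then the call is just:
AsyncGetCallTrace(&trace, 25, &uContext);   // 25 = maximum depth, matching the size of storage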
I am sure there were some other nuances I had to take care of in the process, but I did not really document them. I am not sure I can disclose my full current code, which successfully requests and obtains, asynchronously and at fixed intervals, stack traces of any Java program. But if someone is interested in or stuck on the same problem, I am now able to help (I think).
On the other two issues:
[1] If a thread is sleeping and a SIGPROF is generated, the thread handles that signal only after waking up. This is normal, since it is the thread's job to handle the signal.
[2] The timer imperfections do not seem to appear anymore. Perhaps I mis-measured.
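Issue [2] from the question (no SIGPROF output during Thread.sleep, and fewer signals than expected) may also have a simpler cause worth checking, assuming the timer is armed with setitimer(ITIMER_PROF, ...), the usual source of SIGPROF. ITIMER_PROF counts CPU time consumed by the process (user plus system), not wall-clock time, so a mostly sleeping process barely advances the timer, and a process using less than one second of CPU per wall-clock second receives fewer signals than the wall-clock interval suggests. The Python sketch below (Unix only; count_sigprof and busy are hypothetical helper names) shows the effect by counting SIGPROF deliveries while sleeping and while spinning.

```python
import signal
import time


def count_sigprof(workload, interval_s=0.01):
    """Count SIGPROF signals from an ITIMER_PROF timer while workload() runs (Unix only)."""
    hits = 0

    def on_sigprof(signum, frame):
        nonlocal hits
        hits += 1

    previous = signal.signal(signal.SIGPROF, on_sigprof)
    # ITIMER_PROF only advances while the process consumes CPU time.
    signal.setitimer(signal.ITIMER_PROF, interval_s, interval_s)
    try:
        workload()
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0)   # disarm the timer
        signal.signal(signal.SIGPROF, previous)   # restore the previous handler
    return hits


def busy(seconds=0.3):
    """Burn CPU for roughly `seconds` of wall-clock time."""
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        pass


sleeping = count_sigprof(lambda: time.sleep(0.3))  # almost no CPU time consumed
spinning = count_sigprof(busy)                     # roughly 0.3 s of CPU time consumed
print(sleeping, spinning)
```

On a typical Linux machine the sleeping count is at or near zero while the spinning count is much larger; exact numbers depend on scheduling.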
