How to replace WAIT, to avoid lock errors and to free memory

I have a question today about the usage of WAIT.
I work with an internal source-code quality team that is in charge of reviewing our code and approving it. Unfortunately, the usage of the WAIT UP TO x SECONDS statement is now forbidden, and this is not negotiable.
Issue
CALL FUNCTION 'ANY_FUNCTION_WITH_LOCK'.
CALL FUNCTION 'ANY_FUNCTION_WITH_LOCK_2'.
If I execute this pseudo-code, I get an error (locked object) because both function modules work on shared objects (I call the function modules without a sync/async mode).
I can resolve this issue by using WAIT.
CALL FUNCTION 'ANY_FUNCTION_WITH_LOCK'.
WAIT UP TO 5 SECONDS. " or 1, 2, 3, 4, 5, ... seconds <------------
CALL FUNCTION 'ANY_FUNCTION_WITH_LOCK_2'.
With this method (and standard functions), the system stops and waits for a fixed time.
But sometimes the system needs 1 second... or more. We can't know the exact time needed.
And if we execute this code inside a loop over a large number of objects, the system can wait for a very long time, until it eventually dumps for lack of memory.
(The impacted functions are related to VL32N, QA11, ... and their objects.)
Need
The need is: how can the WAIT statement be replaced?
We need to find a solution/function that has the same behavior as WAIT UP TO, but with no impact (or less impact) on memory (dumps, excessive consumption of resources, ...).
In fact, we need something like COMMIT WORK AND WAIT, but waiting on the result of a function rather than on the database.
Solutions?
Use a loop with a timestamp comparison and ENQUEUE_READ to get the list of locked objects, checking whether the needed object is in this list, for up to X seconds. It seems that this solution needs the same level of resources as WAIT.
ENQUE_SLEEP seems to have the same behavior as WAIT regarding memory (How to make an abap program pause?).
Refactor all the code already written, and use synchronous functions.
Anything else? Any ideas? Is it even possible?
Thanks in advance :)

Why not just put a check for the lock between the two function modules? You could put that inside a loop and exit the loop as soon as the lock from FM 1 is cleared.

I use ENQUE_SLEEP when I want to wait for a specified amount of time and then recheck something. For example you could wait 5 seconds and then check for the existence of the locks. If the objects are no longer locked, then proceed. If the locks are still there, sleep again. To avoid an infinite loop, you must have some limit on the number of times you are willing to sleep before you give up and log some kind of an error.
The problem with WAIT is that it triggers an implicit commit. ENQUE_SLEEP will not do that.

Related

Meteor.setTimeout() memory leak?

I've created a new project with just one file (server.js) on the server with this tiny piece of code that does nothing. But, after running it, my node process is using about 1Gb of memory. Does anyone know why?
for (var i = 1000000; i >= 0; i--) {
Meteor.setTimeout(function(){},1000);
};
Apparently Meteor.setTimeout() function does or uses something (closure?) that prevents GC from clearing memory after it has been executed. Any ideas?
Since you are calling this on the server side, Meteor.setTimeout is a lot more complex than it appears on the surface. Meteor.setTimeout wraps setTimeout with Meteor.bindEnvironment(), which is essentially binding the context of the current environment to the timeout callback. When that timeout triggers, it will pull in the context of when it was originally called.
A good example would be if you called a Meteor.method() on the server and used a Meteor.setTimeout() within it. Meteor.method() will keep track of the user who called the method. If you use Meteor.setTimeout() it will bind that environment to the callback for the timeout, increasing the amount of memory needed for an empty function().
As to why there isn't any garbage collection occurring on your server, it may not have hit its buffer. I tried running your test and my virtual memory hit around 1.2 GB, but it never went any higher, even after subsequent tests. Try running that code multiple times to see whether memory consumption continues to increase linearly, or whether it hits a ceiling and stops growing.

Barriers in OpenCL

In OpenCL, my understanding is that you can use the barrier() function to synchronize threads in a work group. I do (generally) understand what they are for and when to use them. I'm also aware that all threads in a work group must hit the barrier, otherwise there are problems. However, every time I've tried to use barriers so far, it seems to result in either my video driver crashing, or an error message about accessing invalid memory of some sort. I've seen this on 2 different video cards so far (1 ATI, 1 NVIDIA).
So, my questions are:
Any idea why this would happen?
What is the difference between barrier(CLK_LOCAL_MEM_FENCE) and barrier(CLK_GLOBAL_MEM_FENCE)? I read the documentation, but it wasn't clear to me.
Is there general rule about when to use barrier(CLK_LOCAL_MEM_FENCE) vs. barrier(CLK_GLOBAL_MEM_FENCE)?
Is there ever a time that calling barrier() with the wrong parameter type could cause an error?
As you have stated, barriers may only synchronize threads in the same workgroup. There is no way to synchronize different workgroups in a kernel.
Now to answer your question, the specification was not clear to me either, but it seems to me that section 6.11.9 contains the answer:
CLK_LOCAL_MEM_FENCE – The barrier function will either flush any
variables stored in local memory or queue a memory fence to ensure
correct ordering of memory operations to local memory.
CLK_GLOBAL_MEM_FENCE – The barrier function will queue a memory fence
to ensure correct ordering of memory operations to global memory.
This can be useful when work-items, for example, write to buffer or
image memory objects and then want to read the updated data.
So, to my understanding, you should use CLK_LOCAL_MEM_FENCE when writing and reading to the __local memory space, and CLK_GLOBAL_MEM_FENCE when writing and reading to the __global memory space.
I have not tested whether this is any slower, but most of the time, when I need a barrier and I have a doubt about which memory space is impacted, I simply use a combination of the two, i.e.:
barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
This way you should not have any memory reading/writing ordering problem (as long as you are sure that every thread in the group goes through the barrier, but you are aware of that).
Hope it helps.
Reviving an old-ish thread here. I have had a little bit of trouble with barrier() myself.
Regarding your crash problem, one potential cause could be if your barrier is inside a condition. I read that when you use barrier, ALL work items in the group must be able to reach that instruction, or it will hang your kernel - usually resulting in a crash.
if(someCondition){
//do stuff
barrier(CLK_LOCAL_MEM_FENCE);
//more stuff
}else{
//other stuff
}
My understanding is that if one or more work items satisfies someCondition, ALL work items must satisfy that condition, or there will be some that will skip the barrier. Barriers wait until ALL work items reach that point. To fix the above code, I need to restructure it a bit:
if(someCondition){
//do stuff
}
barrier(CLK_LOCAL_MEM_FENCE);
if(someCondition){
//more stuff
}else{
//other stuff
}
Now all work items will reach the barrier.
I don't know to what extent this applies to loops; if a work item breaks from a for loop, does it hit barriers? I am unsure.
UPDATE: I have successfully crashed a few OpenCL programs with a barrier in a for loop. Make sure all work items exit the for loop at the same time, or better yet, put the barrier outside the loop.
(source: Heterogeneous Computing with OpenCL Chapter 5, p90-91)

How to properly write a SIGPROF handler that invokes AsyncGetCallTrace?

I am writing a short and simple profiler (in C), which is intended to print out stack traces for threads in various Java clients at regular intervals. I have to use the undocumented function AsyncGetCallTrace instead of GetStackTrace to minimize intrusion and allow for stack traces regardless of thread state. The source code for the function can be found here: http://download.java.net/openjdk/jdk6/promoted/b20/openjdk-6-src-b20-21_jun_2010.tar.gz
in hotspot/src/share/vm/prims/forte.cpp. I found some man pages documenting JVMTI, signal handling, and timing, as well as a blog with details on how to set up the AsyncGetCallTrace call: http://jeremymanson.blogspot.com/2007/05/profiling-with-jvmtijvmpi-sigprof-and.html
What this blog is missing is the code to actually invoke the function within the signal handler (the author assumes the reader can do this on his/her own). I am asking for help in doing exactly this. I am not sure how and where to create the struct ASGCT_CallTrace (and the internal struct ASGCT_CallFrame), as defined in the aforementioned file forte.cpp. The struct ASGCT_CallTrace is one of the parameters passed to AsyncGetCallTrace, so I do need to create it, but I don't know how to obtain the correct values for its fields: JNIEnv *env_id, jint num_frames, and JVMPI_CallFrame *frames. Furthermore, I do not know what the third parameter passed to AsyncGetCallTrace (void* ucontext) is supposed to be.
The above problem is the main one I am having. However, other issues I am faced with include:
SIGPROF doesn't seem to be raised by the timer exactly at the specified intervals, but rather a bit less frequently. That is, if I set the timer to send a SIGPROF every second (1 sec, 0 usec), then in a 5 second run, I am getting fewer than 5 SIGPROF handler outputs (usually 1-3)
SIGPROF handler outputs do not appear at all during a Thread.sleep in the Java code. So, if a SIGPROF is to be sent every second, and I have Thread.sleep(5000);, I will not get any handler outputs during the execution of that code.
Any help would be appreciated. Additional details (as well as parts of code and sample outputs) will be posted upon request.
I finally got a positive result, but since little discussion was spawned here, my own answer will be brief.
The ASGCT_CallTrace structure (and the underlying ASGCT_CallFrame array) can simply be declared in the signal handler, thus existing only on the stack:
ASGCT_CallTrace trace;
JNIEnv *env;
global_VM_pointer->AttachCurrentThread((void **) &env, NULL);
trace.env_id = env;
trace.num_frames = 0;
ASGCT_CallFrame storage[25];
trace.frames = storage;
The following gets the uContext:
ucontext_t uContext;
getcontext(&uContext);
And then the call is just:
AsyncGetCallTrace(&trace, 25, &uContext);
I am sure there are some other nuances that I had to take care of in the process, but I did not really document them. I am not sure I can disclose the full code I currently have, which successfully and asynchronously requests and obtains stack traces of any Java program at fixed intervals. But if someone is interested in or stuck on the same problem, I am now able to help (I think).
On the other two issues:
[1] If a thread is sleeping and a SIGPROF is generated, the thread handles that signal only after waking up. This is normal, since it is the thread's job to handle the signal.
[2] The timer imperfections do not seem to appear anymore. Perhaps I mis-measured.

What is call out table in unix?

Can anybody tell me what a 'call out table' is in Unix? Maurice J. Bach gives an explanation in his book Design of the UNIX Operating System, but I'm having difficulty in understanding the examples, especially the one explaining the reason of negative time-out fields. Why are software interrupts used there?
Thanks!
Interrupts stop the current code and start the execution of a high-priority handler; while this handler runs, nothing else can get the CPU. So if your interrupt handler needed to do something complex, it would hang the whole system.
The solution: Fill a data structure with all the necessary data and then store this data structure with a pointer to the handler in the call out table. Some service (usually the clock handler) will eventually visit the table and execute the entries one by one in a standard context (i.e. one which doesn't block the process switching).
In System V unix, the kernel or device drivers could schedule some function to be run (or "called out") by the kernel at a later time. The kernel clock handler was in charge of making sure such registered call outs were executed. The call out table was the kernel data structure in which such registered "call outs" were stored.
I don't know to what end they were generally used.

Should I care about thread safety of a static int (4 bytes) variable in ASP.NET?

I have the feeling that I should not care about thread-safe reads of / writes to a
public static int MyVar = 12;
in ASP.NET.
I read/write this variable from various user threads. Let's suppose this variable stores the number of clicks on a certain button/link.
My theory is that no thread can read/write this variable at the same time as another. It's just a simple variable of 4 bytes.
I do care about thread safety, but only for reference objects, List instances, and other types that take more cycles to read/update.
Am I wrong in my presumption?
EDIT
I understand this depends on my scenario, but that wasn't the point of the question. The question is: is it right that thread-safe code can be written with a static int variable without using the lock keyword?
It is my problem to write correct code. The answer seems to be: yes, if you write correct and simple code, and nothing too complicated, you can create thread-safe functions without needing the lock keyword.
If one thread simply sets the value and another thread reads the value, then a lock is not necessary; the read and write are atomic. But if multiple threads might be updating it and are also reading it to do the update (e.g., increment), then you definitely do need some kind of synchronization. If only one thread is ever going to update it even for an increment, then I would argue that no synchronization is necessary.
Edit (three years later) It might also be desirable to add the volatile keyword to the declaration to ensure that reads of the value always get the latest value (assuming that matters in the application).
The concept of thread 'safety' is too vague to be meaningful unfortunately. If you're asking whether you can read and write to it from multiple threads without the program crashing during the operation, the answer is almost certainly yes. If you're also asking if the variable is guaranteed to either be the old value or the new value without ever storing any broken intermediate values, the answer for this data type is again almost certainly yes.
But if your question is "will my program work correctly if I access this from multiple threads", then the answer depends entirely on what your program is doing. For example, if you run the following pseudo code in 2 threads repeatedly in most programming languages, eventually you'll hit the assertion.
if MyVar >= 1:
    MyVar = MyVar - 1
assert MyVar >= 0
Primitives like int are thread-safe in the sense that reads/writes are atomic. But as with most any type, it's left to you to do proper checking with more complex operations. For example, if (x > 0) x--; would be problematic in a multi-threaded scenario because x might change in between the if condition check and decrement.
A simple read or write on a field of 32 bits or less is always atomic. But you should still check your read/write code to make sure that it is thread safe.
Check out this post: http://msdn.microsoft.com/en-us/magazine/cc163929.aspx
It explains why you need to synchronize access to the integers in this scenario.
Try Interlocked.Increment() or Interlocked.Add() and you'll be fine. Your code complexity will be the same, but you truly won't have to worry. If you're not worried about losing a few clicks in your counter, you can continue as you are.
Reading or writing integers is atomic. However, reading and then writing is not atomic. So, if you have one thread that writes and many that read, you may be able to get away without locks.
However, even though the operations are atomic, there are still potential multi-threading issues. In order for one thread to be guaranteed that another thread can see values it writes, you need a memory barrier. Otherwise, the compiler can optimize the code so that the variable stays in a register (or even optimize the operation away completely), so changes would be invisible from one thread to another.
You can establish a memory barrier explicitly (volatile or Thread.MemoryBarrier), or with the Interlocked class -- or with the lock statement (Monitor).
