Intel Pin instrumentation to execute once - intel-pin

I am looking for a way to insert instrumentation before an instruction and then, once it has executed, remove that instrumentation so that I don't get multiple calls for the same instruction.
Any idea?

I am not quite sure I get your point. Do you have an instruction that you want to instrument only once, for example an instruction inside a loop? If so, you can use the inspection APIs in the instrumentation function to zero in on that particular instruction and, instead of inserting an analysis call with INS_InsertCall, simply perform the analysis in the instrumentation function itself. The instrumentation function works on the static code, so it runs when the instruction is instrumented rather than every time the instruction executes.

Protractor and asynchronous property of JS

I wanted to ask about this in general: I have concerns about switching my frontend automation suite from a Java framework to a JavaScript one, mainly around the steps of an individual test running asynchronously.
Could tests take steps out of order, or produce a false positive by passing before the last expect has resolved?
If so, how do I deal with this in general?
You have two options for handling the asynchronous nature of JS with Protractor:
1) Use async/await
or
2) Use the Control Flow (this basically means that if you are not using async/await, you are using the Control Flow; as far as I know the two cannot be combined)
If you are not familiar with async/await, you can simply write your tests in a synchronous style, e.g.:
browser.get("https://myurl.com");
element(by.id("login")).sendKeys("admin");
element(by.id("password")).sendKeys("secretpassword");
element(by.buttonText("Login!")).click();
expect(element(by.cssContainingText("header", "Welcome Admin")).isPresent()).toBe(true);
Protractor will execute this code in order, and the biggest plus is that it will actually wait for your Angular application to be ready (nevertheless, you will have to use explicit waits with Protractor's ExpectedConditions from time to time).
Before executing your tests, Protractor builds a queue of your steps and executes them one by one, since most of these steps are built on promises. Any non-Protractor/WebDriver call is executed right away, in the usual asynchronous JS manner. E.g. if you add console.log("foo") at the end of the previous code snippet, it will print before the steps are executed.
This is a pretty rough explanation, but I hope it helps.
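To make the ordering concrete, here is a toy model of that queueing behaviour. It is written in Python only to keep it self-contained, it is not Protractor's implementation, and the ControlFlow class with its schedule/run methods are made-up names. Scheduled steps wait in a queue, while a plain call (standing in for console.log) happens immediately, so it lands first in the log.

```python
from collections import deque


class ControlFlow:
    """Hypothetical stand-in for a command queue; not a Protractor API."""

    def __init__(self):
        self.queue = deque()
        self.log = []

    def schedule(self, step_name):
        # WebDriver-style calls only enqueue work; nothing runs yet.
        self.queue.append(step_name)

    def run(self):
        # Later, the queued steps are executed one by one, in order.
        while self.queue:
            self.log.append("run " + self.queue.popleft())


def demo():
    flow = ControlFlow()
    flow.schedule("browser.get")
    flow.schedule("sendKeys")
    flow.schedule("click")
    # A plain statement is not queued, so it takes effect right away.
    flow.log.append("console.log foo")
    flow.run()
    return flow.log
```

Calling demo() returns the log with the console.log entry ahead of all queued steps, which is the surprise described above.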

Abort ExUnit on the first test that does not pass

In Ruby, specifically RSpec, you can tell the test runner to abort on the first test that does not pass with the command-line flag --fail-fast. This helps a lot to avoid wasting time or losing focus when fixing a lot of tests in a row, for example when doing test-driven or behavior-driven development.
Now on Elixir with ExUnit I am looking for a way to do exactly that. Is there a way to do this?
There is such an option since Elixir 1.8.
Use the --max-failures switch to stop the suite once the given number of tests have failed. To halt the test suite after the first failure, run this:
mix test --max-failures 1
Unfortunately, in Elixir versions before 1.8 there is (to my knowledge) no such flag.
However, you can run a single test with
mix test path/to/testfile.exs:12
where 12 is the line number of the test.
Hope that helps!
That does not make much sense, since tests in Elixir are a) meant to run blazingly fast and b) in most cases meant to run asynchronously. Immediately terminating the test suite on a failed test is an anti-pattern, and that's why the ExUnit authors did not allow it.
One still has the option to shoot oneself in the foot: just implement a custom handler for the EventManager and kill the whole application on the "test failed" event.
For BDD, one preferably uses tags and runs the test suite with only the relevant feature included. That way you will be able to run tests per feature at any time in the future.
Also, as a last resort, one can run a specific file only by passing the file name to mix test, and/or a specific test only by passing the file name followed by a colon and a line number.

Is there a way to simplify OpenCL kernel usage?

To use an OpenCL kernel, the following is needed:
Put the kernel code in a string
call clCreateProgramWithSource
call clBuildProgram
call clCreateKernel
call clSetKernelArg (once per argument)
call clEnqueueNDRangeKernel
This needs to be done for each kernel. Is there a way to do this with less repeated code per kernel?
There is no shortcut for this process. You need to go through the steps you listed one by one.
But it is important to know why these steps are needed, to understand how flexible the chain is.
clCreateProgramWithSource: Lets you combine strings from different sources to generate the program. Some strings might be static, but some might be downloaded from a server or loaded from disk. This allows the CL code to be dynamic and updated over time.
clBuildProgram: Builds the program for the given devices. Maybe you have 8 devices, so you may need to build for each of them, in one call with a device list or in several calls. Each device can produce different binary code.
clCreateKernel: Creates a kernel. A kernel is an entry point in a binary, so you can create multiple kernels from one program (for different functions). The same kernel might also be created multiple times, since the kernel object holds the arguments. This is useful for having ready-to-launch instances with the proper parameters.
clSetKernelArg: Changes the parameters in the instance of the kernel (they are stored there, so the instance can be used multiple times in the future).
clEnqueueNDRangeKernel: Launches the kernel, configuring the size of the launch and the chain of dependencies with other operations.
So, even if you had a way to just call "getKernelFromString()", the functionality would be very limited and not very flexible.
You can have a look at wrapper libraries:
https://streamhpc.com/knowledge/for-developers/opencl-wrappers/
I suggest you look into SYCL. The build steps are performed offline, which saves execution time by skipping clCreateProgramWithSource. The arguments are set automatically by the runtime, which extracts the information from the user's lambda.
There is also CLU: https://github.com/Computing-Language-Utility/CLU - see https://www.khronos.org/assets/uploads/developers/library/2012-siggraph-opencl-bof/OpenCL-CLU-and-Intel-SIGGRAPH_Aug12.pdf for more info. It is a very simple tool, but should make life a bit easier.

Difference between write() and printf()

I have recently been studying operating systems, and I would like to know:
What’s the difference between a system call (like write()) and a standard library function (like printf())?
A system call is a call to a function that is not part of the application but lives inside the kernel. The kernel is a software layer that provides some basic functionality to abstract the hardware for you. Roughly, the kernel is something that turns your hardware into software.
You always ultimately use write() to write anything to a peripheral, whatever kind of device you are writing to. write() is designed to write a sequence of bytes, nothing more. But since write() is considered too basic (you may want to write an integer in base ten, or a floating-point number in scientific notation, etc.), different programming environments provide different libraries to make this easier.
For example, the C programming language gives you printf(), which lets you write data in many different formats. So you can think of printf() as a function that converts your data into a formatted sequence of bytes and then calls write() to write those bytes to the output. C++ gives you cout, Java gives you System.out.println, etc. Each of these functions ends in a call to write() (at least on POSIX systems).
One important thing to know is that such a system call is costly! It is not a simple function call, because you need to call something that is outside of your own code, and the system must ensure that you are not trying to do nasty things, etc. So it is very common for higher-level print-like functions to have some buffering built in, so that write() is not always called: your data is kept in some hidden structure and written only when it is really necessary (the buffer is full, or you really want to see the result of your print).
This is exactly what happens when you manage your money. If many people give you 5 bucks each, you won't go and deposit each note at the bank! You keep them in your wallet (this is the print) until it is full or you don't want to keep them anymore. Then you go to the bank and make one big deposit (this is the write). And you know that putting 5 bucks into your wallet is much, much faster than going to the bank and making a deposit. The bank is the kernel/OS.
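The wallet idea can be sketched with a counter. This is only an illustration in Python, not how the C library is implemented: CountingRaw is a made-up stand-in for a file descriptor that counts how many times the underlying write is reached, and io.BufferedWriter plays the role of the wallet.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical sink that counts calls, standing in for write()."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1          # one "trip to the bank"
        self.data += bytes(b)
        return len(b)


def count_raw_writes(chunks, buffered):
    """Write all chunks and return how many times the raw sink was called."""
    raw = CountingRaw()
    if buffered:
        out = io.BufferedWriter(raw, buffer_size=4096)
        for chunk in chunks:
            out.write(chunk)     # stays in the user-space buffer
        out.flush()              # one deposit at the end
    else:
        for chunk in chunks:
            raw.write(chunk)     # every chunk goes straight through
    return raw.calls, bytes(raw.data)
```

With a hundred small chunks that fit in the buffer, the buffered path reaches the sink far less often than the unbuffered one, while the bytes that arrive are the same.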
System calls are implemented by the operating system, and run in kernel mode. Library functions are implemented in user mode, just like application code. Library functions might invoke system calls (e.g. printf eventually calls write), but that depends on what the library function is for (math functions usually don't need to use the kernel).
System calls are used to interact with the OS. E.g. write() can be used to write something into the system or into a program.
Standard library functions, by contrast, are program specific. E.g. printf() will print something out, but only to the GUI/command line, and it won't affect the system.
EDIT: Barmar has good answer
I am writing a small program. At the moment it just reads each line from stdin and prints it to stdout. I can add a call to write in the loop, and it adds a few characters at the end of each line. But when I use printf instead, all the extra characters are clustered and appear all at once, instead of appearing on each line.
It seems that using printf causes stdout to be buffered. Adding fflush(stdout); after calling printf fixes the discrepancy in output.
I'd like to mention another point: the stdio buffers are maintained in the process's user-space memory, while the write system call transfers data directly to a kernel buffer. This means that if you fork a process after one write call and one printf call, you may end up with three pieces of output once the buffers are flushed (depending on whether stdout is line-buffered or block-buffered). Two of them come from the printf call, since the stdio buffer is duplicated in the child by fork.
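Here is a small sketch of that effect, assuming a POSIX system where os.fork is available. It uses Python's buffered file object as a stand-in for a stdio stream and os.write as the direct system call; the function name fork_output_demo is made up for this example.

```python
import os
import tempfile


def fork_output_demo():
    """Return what ends up in a file written by a parent and a forked child."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"write\n")         # goes straight to the kernel
        buffered = os.fdopen(fd, "w")    # user-space buffer, like stdio
        buffered.write("printf\n")       # stays in this process's buffer
        pid = os.fork()                  # the pending buffer is copied too
        if pid == 0:
            buffered.flush()             # child flushes its copy
            os._exit(0)                  # leave without further cleanup
        os.waitpid(pid, 0)
        buffered.close()                 # parent flushes its own copy
        with open(path) as f:
            return f.read()
    finally:
        os.unlink(path)
```

The text sent through os.write appears once, while the buffered text appears twice: once from the child's copy of the buffer and once from the parent's.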
printf() is one of the interfaces that the C library exposes to user-space programs.
Under the hood, printf() uses the write() system call, and write() is what actually sends the data to the output.

Why does zumero_sync need to be called multiple times?

According to the documentation for zumero_sync:
If a large amount of information needs to be pulled from the server,
this function may need to be called more than once.
In my Android app that uses Zumero that's no problem; I just keep calling zumero_sync until the return value doesn't start with "0;".
However, now I'm trying to write an admin script that also syncs with my server dbfiles. I'd like to use the sqlite3 shell, and have the script pass the SQL to execute via command line arguments. I need to call zumero_sync in a loop (which SQLite doesn't support) to make sure the db is fully synced. If I had to, I could invoke sqlite3 in a loop (reading its output, looking for "0;"), or even write a C++ app to call the SQLite/Zumero functions natively. But it certainly would be easier if a single zumero_sync was enough.
I guess my real question is: could zumero_sync be changed so it completes the sync before returning? If there are cases where the existing behavior is more useful, maybe there could be a parameter for specifying which mode to use?
I see two basic questions here:
(1) Why does zumero_sync() work the way it does?
(2) Can it work differently?
I'll answer (2) first, since it's easier: Yes, it could work differently. Or rather, we could (and probably will soon, now that you have brought this up) implement an additional function, named something like zumero_sync_complete(), which performs [the guts of] zumero_sync() in a loop and returns after the sync is complete.
We didn't implement zumero_sync_complete() because it doesn't add much value. It's a simple loop, so you can darn well write it yourself. :-)
Er, except in scripting environments which don't support loops. Like the sqlite3 shell.
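For a host language that does have loops, the loop in question might look like the sketch below. It is only an illustration: sync_until_complete and the sync_once callable are made-up names, sync_once is assumed to wrap one zumero_sync call and return its result as a string, and the max_rounds guard is an addition that is not part of Zumero. The only behaviour taken from the question is that a result starting with "0;" means more calls are needed.

```python
def sync_until_complete(sync_once, max_rounds=100):
    """Call sync_once() until its result no longer starts with "0;".

    sync_once is a hypothetical callable that performs a single
    zumero_sync call and returns the result string.
    """
    for rounds in range(1, max_rounds + 1):
        result = sync_once()
        if not result.startswith("0;"):
            # The sync is complete; report how many calls it took.
            return result, rounds
        # A client could update a progress bar or handle errors here.
    raise RuntimeError("sync did not complete within max_rounds calls")
```

The gap between iterations is exactly where a client gets the chance to do its own work, which is part of the reason the loop was left to the caller.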
Answer to (1):
The Zumero sync protocol is designed to give the server the flexibility to return partial results if it wants to do so. And for the sake of reducing load on the server (and increasing its scalability) it often does want to do exactly that.
Given that, one reason to expose this to the client is to increase the client's flexibility as well. As long as we're making multiple roundtrips, we might as well give the client an opportunity to do something (like, maybe, update a progress bar) in between them.
Another thing a client might want to do in between loop iterations is handle an error.
Or, in the case of a multithreaded client, it might want to deal with changes that happened on the client while the sync is going on.
Which raises the question of how locking should be managed. Do we hold the sqlite write lock during the entire loop? Or only when absolutely necessary?
Bottom line: A robust app would probably want to implement the loop itself so that it can make its own decisions and retain full control over things.
But, as you observe, the sqlite3 shell doesn't have loops. And it's not an app. And it doesn't have threads. Or progress bars. So it's a use case where a simpler-and-less-powerful form of zumero_sync() would make sense.
