Instrumenting R programs using intel-pin - r

I am instrumenting R programs using the pinatrace.so tool to generate a trace of read and write memory instructions. What I observe is that multiple #eof markers get printed in the trace file at different places (when #eof should actually be printed only once, at the end of the trace). Also, the line immediately after each #eof is distorted and not printed properly.
I am invoking R shell and my R program using the following command:
../../../pin -follow_execv -t obj-intel64/pinatrace.so -- /home/R-3.5.3./bin/R -f hello.R
The trace file gets printed as shown:
0 0x7ffc812cd1c8
1 0x7ffc812cd1c8
0 0x7f7f8555ee78
#eof
f6971ce8
1 0x6f4518
0 0x7ffc171a0b70
.....
.....
1 0x7ffc6da8f078
0 0x7f7c38786e78
#eof
ffc171a07c8
0 0x6f4e30
0 0x6ff918
What is wrong with this instrumentation?

When Pin is invoked with the follow_execv knob, it will create a new copy of itself in every child process that is created. The new copy is not aware that another copy is running in the parent process, or of any other copies at all. See here:
If -follow_execv is enabled and the user has not registered to get a notification, Pin will be injected into child/exec-ed process with the same command line as of current process.
If the Pintool wasn't created with -follow_execv in mind, all copies of the Pintool will normally write to the same file. This creates strange artifacts like the ones you're seeing: different processes write to the same file, and one process writes its terminator while other processes keep writing after it.
The simplest solution is to add a PID suffix to the trace file name; another option is to use the Follow Child Process API (linked above) to determine which subprocess is the actual R program you want to trace. Finally, R itself may have instrumentation support that you could use to instrument the program directly.
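A rough sketch of the PID-suffix approach, assuming the stock pinatrace.cpp skeleton (only the file naming differs; everything else stays as in the shipped tool):
// Give each process its own trace file by suffixing the PID, so the
// copies injected into children by -follow_execv never share a file.
#include <stdio.h>
#include "pin.H"

static FILE* trace;

int main(int argc, char* argv[])
{
    if (PIN_Init(argc, argv)) return 1;

    char name[64];
    snprintf(name, sizeof(name), "pinatrace.%u.out", (unsigned)PIN_GetPid());
    trace = fopen(name, "w");

    // ... register the read/write instrumentation callbacks here,
    // exactly as in the stock pinatrace.cpp ...

    PIN_StartProgram();   // never returns
    return 0;
}
With this, each process writes its own trace, and #eof appears exactly once, at the end of each file.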

Related

Issue in executing a batch file using PeopleCode in Application engine program

I want to execute a batch file using PeopleCode in an Application Engine program. But the program has an issue: Exec returns a non-zero code (value - 1).
Below is the PeopleCode snippet.
Global File &FileLog;
Global string &LogFileName, &Servername, &commandline;
Local string &Footer, &ScriptName;
Local number &ExitCode;
If &Servername = "PSNT" Then
&ScriptName = "D: && D:\psoft\PT854\appserv\prcs\RNBatchFile.bat";
End-If;
&commandline = &ScriptName;
/* Need to commit work or Exec will fail */
CommitWork();
&ExitCode = Exec("cmd.exe /c " | &commandline, %Exec_Synchronous + %FilePath_Absolute);
If &ExitCode <> 0 Then
MessageBox(0, "", 0, 0, ("Batch File Call Failed! Exit code returned by script was " | &ExitCode));
End-If;
Any help on how to resolve this issue would be appreciated.
Best bet is to do a trace of the execution.
Thoughts:
Can you log on to the process scheduler you are running this on and execute the script OK?
Is the AE being scheduled or called at run-time?
You should not need to change directory, as you are using a fully qualified path to the script.
You should not need to call "cmd /c", as this creates an additional shell for your application to run within, making debugging harder, etc.
Run a trace, and drop us the output. :) HTH
What about changing the working directory to D: inside of the script instead? You are invoking two commands and I'm wondering what the shell is returning to exec. I'm assuming you wrote your script to give the appropriate return code and that isn't the problem.
I couldn't tell from the question text, but are you looking for a negative result, such as -1? I think return codes are usually positive. 0 for success, some other positive number for failure. Negative numbers may be acceptable, but am wondering if Exec doesn't like negative numbers?
Perhaps the PeopleCode ChDir function still works as an alternative to two commands in one line? I haven't tried it for a LONG time.
Another alternative that gives you significant control over the process is to use java.lang.Runtime.exec from PeopleCode: http://jjmpsj.blogspot.com/2010/02/exec-processes-while-controlling-stdin.html.

AutoIt Scripting for an External CLI Program - eac3to.exe

I am attempting to design a front-end GUI for a CLI program by the name of eac3to.exe. The problem as I see it is that this program sends all of its output to a cmd window. This is giving me no end of trouble because I need to get a lot of this output into a GUI window. This sounds easy enough, but I am beginning to wonder whether I have found one of AutoIt's limitations?
I can use the Run() function with a Windows internal command such as Dir and then get the output into a variable with the AutoIt StdoutRead() function, but I just can't get the output from an external program such as eac3to.exe - it just doesn't seem to work whatever I do! Just for testing purposes I don't even need to get the output to a GUI window: just printing it with ConsoleWrite() is good enough, as this proves that I was able to read it into a variable. So at this stage that's all I need to do - get the text (usually about 10 lines) that has been output to a cmd window by my external CLI program into a variable. Once I can do this the rest will be a lot easier. This is what I have been trying, but it never works:
Global $iPID = Run("C:\VIDEO_EDITING\eac3to\eac3to.exe","", #SW_SHOW)
Global $ScreenOutput = StdoutRead($iPID)
ConsoleWrite($ScreenOutput & #CRLF)
After running this script all I get from the ConsoleWrite() is a blank line - not the text data that was output as a result of running eac3to.exe (running eac3to without any arguments just lists a screen of help text relating to all the command-line options), and that's what I am trying to get into a variable so that I can put it to use later in the program.
Before I suggest a solution let me just tell you that AutoIt has one of the best help files out there. Use it.
You are missing $STDOUT_CHILD ("Provide a handle to the child's STDOUT stream").
Also, you can't just call Run and immediately call StdoutRead. At what point did you give the app some time to do anything and actually print something back to the console?
You need to either use ProcessWaitClose and read the stream then, or read the stream in a loop. The simplest check would be to put a Sleep between the Run and the read and see what happens.
#include <AutoItConstants.au3>
Global $iPID = Run("C:\VIDEO_EDITING\eac3to\eac3to.exe","", #SW_SHOW, $STDOUT_CHILD)
; Wait until the process has closed using the PID returned by Run.
ProcessWaitClose($iPID)
; Read the Stdout stream of the PID returned by Run. This can also be done in a while loop. Look at the example for StderrRead.
; If the process doesn't end when finished, you need to put this inside a loop.
Local $ScreenOutput = StdoutRead($iPID)
ConsoleWrite($ScreenOutput & #CRLF)

Process many EDI files through single MFX

I've created a mapping in MapForce 2013 and exported the MFX file. Now, I need to be able to run the mapping using MapForce Server. The problem is, I need to specify both the input EDI file and the output file. As far as I can tell, the usage pattern is to run the mapping with MapForce server using the input/output configuration in the MFX itself, not passed in on the command line.
I suppose I could change the input/output to some standard file name and then just write the input file to that path before performing the mapping, and then grab the output from the standard output file path when the mapping is complete.
But I'd prefer to be able to do something like:
MapForceServer run -in=MyInputFile.txt -out=MyOutputFile.xml MyMapping.mfx > MyLogFile.txt
Is something like this possible? Perhaps using parameters within the mapping?
There are two options that I've come across in dealing with a similar situation.
Option 1 - If you set the input XML file to *.xml in the component settings, mapforceserver.exe will automatically process all the matching files in the directory, assuming your source is XML (this should work for text just the same). Similar to the example below, you can set a cleanup routine to move the files into another folder after processing.
Note: it looks in the folder where the schema file is located.
Option 2 - Since your output is XML you can use Altova's RaptorXML (rack up another license charge). You can generate XSLT 2.0 code from the mapping and use a batch file to execute it automatically, something like this:
::#echo off
for %%f IN (*.xml) DO (RaptorXML xslt --xslt-version=2 --input="%%f" --output="out/%%f" %* "mymapping.xslt"
if NOT errorlevel 1 move "%%f" processed
if errorlevel 1 move "%%f" error)
sleep 15
mymapping.bat
I tossed in a sleep command so the batch loops and rechecks every 15 seconds. Unfortunately this does not work if your output target is a database.

unix: can I write to the same file in parallel without missing entries?

I wrote a script that executes commands in parallel. I let them all write entries to the same log file. It does not matter if the order is wrong or entries are interleaved, but I noticed that some entries are missing. I should probably lock the file before writing; however, is it true that if multiple processes try to write to a file simultaneously, it will result in missing entries?
Yes, if different processes independently open and write to the same file, it may result in overlapping writes and missing data. This happens because each process gets its own file offset, which advances only with that process's own writes.
Instead of locking, a better option might be to open the log file once in an ancestor of all the worker processes, have it inherited across fork(), and have them all use it for logging. That way there is a single shared file offset, which advances whenever any of the processes writes a new entry. For example:
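A minimal C sketch of that pattern (the log file name and messages are illustrative; this assumes POSIX append semantics):
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    /* open once in the ancestor, in append mode */
    int fd = open("run.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return 1;

    for (int i = 0; i < 4; i++) {
        if (fork() == 0) {              /* each child inherits fd ... */
            char line[64];
            int n = snprintf(line, sizeof(line), "worker %d: done\n", i);
            write(fd, line, n);         /* ... and emits one full line per write() */
            _exit(0);
        }
    }
    while (wait(NULL) > 0)              /* reap all the children */
        ;
    return 0;
}
Because the descriptor is shared and opened with O_APPEND, every write() is positioned atomically at the current end-of-file: entries may interleave between workers, but none are silently lost, as long as each entry goes out in a single write().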
In a shell script you should use ">> file" (double greater-than) to append output to that file; the interpreter will open the destination in "append" mode. If your program also wants to append, follow these guidelines:
Open the text file in "append" mode ("a+") and give preference to printing only full lines (don't do multiple 'print' calls followed by a final 'println'; print the entire line with a single 'println').
The fopen documentation states this:
DESCRIPTION
The fopen() function opens the file whose pathname is the string pointed to by filename, and associates a stream with it.
The argument mode points to a string beginning with one of the following sequences:
r or rb            Open file for reading.
w or wb            Truncate to zero length or create file for writing.
a or ab            Append; open or create file for writing at end-of-file.
r+ or rb+ or r+b   Open file for update (reading and writing).
w+ or wb+ or w+b   Truncate to zero length or create file for update.
a+ or ab+ or a+b   Append; open or create file for update, writing at end-of-file.
The character b has no effect, but is allowed for ISO C standard conformance (see standards(5)). Opening a file with read mode (r as the first character in the mode argument) fails if the file does not exist or cannot be read.
Opening a file with append mode (a as the first character in the mode argument) causes all subsequent writes to the file to be forced to the then current end-of-file, regardless of intervening calls to fseek(3C). If two separate processes open the same file for append, each process may write freely to the file without fear of destroying output being written by the other. The output from the two processes will be intermixed in the file in the order in which it is written.
It is because of this intermixing that you want to give preference to using only 'println' (or its equivalent).

Who knows the history of Unix fork?

Fork is a great tool in Unix. We can use it to create a copy of our process and change its behaviour. But I don't know the history of fork.
Can someone tell me the story?
Actually, unlike many of the basic UNIX features, fork was a relative latecomer (a).
The earliest existence of multiple processes within UNIX consisted of a few (fixed number of) processes, one per terminal that was attached to the PDP-7 machine (b).
The basic idea was that the shell process for a given terminal would accept a command from the user, locate the program file, load a small bootstrap program into high memory and jump to it, passing enough details for the bootstrap code to load the program file.
The bootstrap code, after loading the program into low memory (overwriting the shell), would then jump to it.
When the program was finished, it would call exit but it wasn't like the exit we know and love today. This exit would simply reload the shell and run it using pretty much the same method used to load the program in the first place.
So it was really more like a rudimentary exec command, the one that replaces your current program with another, in the same process space.
The shell would exec your program then, when your program was done, it would again exec the shell by calling exit.
This method was similar to that found in many other interactive systems at the time, including the Multics from whence UNIX got its name.
From the two-way exec, it was actually not that big a leap to add fork as a process duplicator to work in conjunction with it. While many systems run another program directly, it's this "just add what's needed" method which is responsible for the separation of duties between fork and exec in UNIX. It also resulted in a very simple fork function; a sketch of the resulting pattern follows.
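That division of labour is still how a shell launches a program today. A minimal, illustrative C sketch of the fork-then-exec pattern (modern code, not early UNIX):
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();                 /* duplicate this process */
    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {
        /* child: replace ourselves with another program, much as the old
           bootstrap code once overwrote the shell in memory */
        execlp("ls", "ls", "-l", (char *)NULL);
        perror("execlp");               /* reached only if exec fails */
        _exit(127);
    }
    int status;                         /* parent (the "shell") waits */
    waitpid(pid, &status, 0);
    printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}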
If you're interested in the early history of various features(c) of Unix, you cannot go past the article The Evolution of the Unix Time-Sharing System by Dennis Ritchie, presented at a 1979 conference in Australia, and subsequently published by AT&T.
(a) Though I mean latecomer in the sense that the separation of the four fundamental forces in the universe was "late", happening some 0.00000000001 seconds after the big bang.
(b) Since a question was raised in a comment as to how the shells were originally started off, there's a great resource holding very early source code for Unix over at The Unix Heritage Society, specifically the source code archives and, in particular, the first edition.
The init.s file from the first edition shows how the fixed number of shell processes were created (slightly reformatted):
...
mov $itab, r1 / address of table to r1
1:
mov (r1)+, r0 / 'x, x=0, 1... to r0
beq 1f / branch if table end
movb r0, ttyx+8 / put symbol in ttyx
jsr pc, dfork / go to make new init for this ttyx
mov r0, (r1)+ / save child id in word after '0, '1, etc
br 1b / set up next child
1:
...
itab:
'0; ..
'1; ..
'2; ..
'3; ..
'4; ..
'5; ..
'6; ..
'7; ..
0
Here you can see the snippet which creates the processes for each connected terminal. These are the days of hard-coded values, no auto detection of terminal quantity involved. The zero-terminated table at itab is used to create a number of processes and hopefully the comments from the code explain how (the only possibly tricky bit is the labels - though there are multiple 1 labels, you branch to the nearest one in a given direction, hence 1b means the closest 1 label in the backwards direction).
The code shown simply processes the table, calling dfork to create a process for each terminal and start getty, the login prompt. The getty program, in turn, eventually started the shell. From that point, it's as I described in the main part of this answer.
(c) No paths (and use of temporary links to get around this limitation), limited processes, why there's a GECOS field in the password file, and all sorts of other trivia, generally interesting only to uber-geeks, of course.
