I want to visualize memory mapping states of processes. For this I parsed the output of
# strace -s 256 -v -k -f -e trace=memory,process command
and now I have a time series of disjoint sums of intervals on the real line. Is there a convenient visualization library for such data? A Haskell interface would be the most time-saving for me, but any suggestion is welcome. Thanks!
Just in case this might be useful for anyone, I hacked up a little tool to do this. (By the way, I ended up using R/Shiny for the interactive visualization.)
Here's the GitHub repo.
It's interactive in that if you click a region, the stack trace responsible for that memory mapping is shown, like this:
trace:
22695 mmap(NULL, 251658240, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x2b4210000000
/lib/x86_64-linux-gnu/libc-2.19.so(mmap64+0xa) [0xf487a]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(_ZN2os17pd_reserve_memoryEmPcm+0x31) [0x91e9c1]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(_ZN2os14reserve_memoryEmPcm+0x20) [0x91ced0]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(_ZN13ReservedSpace10initializeEmmbPcmb+0x256) [0xac20a6]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(_ZN17ReservedCodeSpaceC1Emmb+0x2c) [0xac270c]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(_ZN8CodeHeap7reserveEmmm+0xa5) [0x61a3c5]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(_ZN9CodeCache10initializeEv+0x80) [0x47ff50]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(_Z12init_globalsv+0x45) [0x63c905]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(_ZN7Threads9create_vmEP14JavaVMInitArgsPb+0x23e) [0xa719be]
/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so(JNI_CreateJavaVM+0x74) [0x6d11c4]
/usr/lib/jvm/java-8-oracle/lib/amd64/jli/libjli.so(JavaMain+0x9e) [0x745e]
/lib/x86_64-linux-gnu/libpthread-2.19.so(start_thread+0xc4) [0x8184]
/lib/x86_64-linux-gnu/libc-2.19.so(clone+0x6d) [0xfa37d]
The same colors correspond to the same flags for mmap/msync/madvise etc.
Synopsis
$ make show-prerequisites
# (Follow the instructions)
$ make COMMAND="time ls"
...
DATA_DIR=build/data-2016-12-12_02h38m13s
Listening on http://127.0.0.1:5000
....
$ firefox http://127.0.0.1:5000
$ # Re-browse the previous results
$ make DATA_DIR=build/data-2016-12-12_02h38m13s
In the process of development I was struck by how geometric the problem is. So I created a module called Sheaf and described there a recipe for defining a Grothendieck topology and a constant sheaf on it. It now seems that Grothendieck (or Lawvere-Tierney) topologies are actually ubiquitous in programming, but I'm not sure whether this will lead to anything worthwhile. So feel free to check it out!
I am trying to construct and submit an array job based on R on my university's HPC cluster.
I'm used to submitting array jobs based on Matlab, and I have some doubts about how to translate the overall procedure to R. Let me show a very simple Matlab example and then my questions.
The code is based on 3 files:
"main" which does some preliminary operations.
"subf" which should be run by each task and uses some matrices created by "main".
a bash file which I qsub in the terminal.
1. main:
clear
%% Do all the operations that are common across tasks
% Here, as an example, I create
% 1) a matrix A that I will sum to the output of each task
% 2) a matrix grid; each task will use some rows of the matrix grid
m=1000;
A=rand(m,m);
grid=rand(m,m);
%% Tasks
tasks=10; %number of tasks
jobs=round(size(grid,1)/tasks); %I split the number of rows of the matrix grid among the tasks
2. subf:
%% Set task ID
idtemp=str2double(getenv('SGE_TASK_ID'));
%% Select local grid
if idtemp<tasks
grid_local= grid(jobs*(idtemp-1)+1: idtemp*jobs,:);
else
grid_local= grid(jobs*(idtemp-1)+1: end,:); %for the last task, we should take all the rows of grid that have been left
end
sg_local=size(grid_local,1);
%% Do the task
output=zeros(sg_local,1);
for g=1:sg_local
output(g,:)=sum(sum(A+repmat(grid_local(g,:),m,1)));
end
%% Save output by keeping track of task ID
filename = sprintf('output.%d.mat', ID);
save(filename,'output')
3. bash
#$ -S /bin/bash
#$ -l h_vmem=6G
#$ -l tmem=6G
#$ -l h_rt=480:0:0
#$ -cwd
#$ -j y
#Run 10 tasks where each task has a different $SGE_TASK_ID ranging from 1 to 10
#$ -t 1-10
#$ -N Example
date
hostname
#Output the Task ID
echo "Task ID is $SGE_TASK_ID"
export PATH=/xx/xx/matlab/bin:$PATH
matlab -nodisplay -nodesktop -nojvm -nosplash -r "main; ID = $SGE_TASK_ID; subf; exit"
These are my questions:
Suppose I'm able to translate "main" and "subf" into R. Should I be extra careful about anything in particular concerning the parallelisation? For example, do I have to declare some parallel environment, such as parLapply or %dopar%?
In the "main" file I should also install some R packages. Can I do them locally in my folder directly at the beginning of the "main" file, or should I contact the HPC administrator to install them globally?
I could not find any example of a bash file for R in the instructions provided by my university, so I have doubts about how to adapt the bash file above. I suppose that the only lines to change are:
export PATH=/xx/xx/matlab/bin:$PATH
matlab -nodisplay -nodesktop -nojvm -nosplash -r "main; ID = $SGE_TASK_ID; subf; exit"
Could you give some hints on how I should change them?
The parallelization is handled by the HPC scheduler, right? In that case, I think "no", nothing special is required.
It depends on how they allow/enable R. On an HPC cluster that I use (not your school's), the individual nodes do not have direct internet access, so installing packages requires special care; this might be the exception, I don't know.
Recommendation: if there is a shared filesystem that both you and all of the nodes can access, then create an R "library" directory there that contains the installed packages you need, and use .libPaths(...) in your R scripts to add that directory to the package search path. The only gotcha might be non-R shared-library (e.g., .dll, .so, .a) requirements. For those, it's either "docker" or "ask the admins".
If you don't have a shared filesystem, then you might ask the cluster admins if they use/prefer docker images (you might provide an image or a DOCKERFILE to create one) or if they have preferred mechanisms for enabling various packages.
I do not recommend asking them to install the packages, for two reasons: First, think about them needing to do this with every person who has a job to run, for any number of programming languages, and then realize that they may have no idea how to do it for that language. Second, package versions are very important, and you asking them to install a package may install either a too-new package or overwrite an older version that somebody else is relying on. (See packrat and renv for discussions on reproducible environments.)
Bottom line, the use of a path you control (and using .libPaths) enables you to have complete control over package versions. If you have not been bitten by unintended consequences of newer-versioned packages, just wait ... congratulations, you've been lucky.
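As a minimal sketch of that approach (the library path and package names below are just placeholders for whatever you actually use):

# One-off, on a node with internet access: install into a directory you control
dir.create("/shared/myuser/Rlib", recursive = TRUE, showWarnings = FALSE)
install.packages(c("data.table", "Matrix"), lib = "/shared/myuser/Rlib")

# In every job script: put that directory at the front of the library search path
.libPaths(c("/shared/myuser/Rlib", .libPaths()))
library(data.table)  # now resolved from /shared/myuser/Rlib first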
I suggest you can add source("main.R") to the beginning of subf.R, which would make your bash file perhaps as simple as
export PATH=/usr/local/R-4.x.x/bin:$PATH
Rscript /path/to/subf.R
(Noting that you'll need to reference Sys.getenv("SGE_TASK_ID") somewhere in subf.R.)
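For what it's worth, here is a rough sketch of what subf.R could look like, mirroring the Matlab subf above. It assumes main.R has already been translated to R and defines m, A, grid, tasks and jobs:

# subf.R -- run by each array task
source("main.R")                # common setup: m, A, grid, tasks, jobs

# Set task ID from the SGE environment
idtemp <- as.numeric(Sys.getenv("SGE_TASK_ID"))

# Select local grid; the last task takes all remaining rows
if (idtemp < tasks) {
  grid_local <- grid[(jobs * (idtemp - 1) + 1):(idtemp * jobs), , drop = FALSE]
} else {
  grid_local <- grid[(jobs * (idtemp - 1) + 1):nrow(grid), , drop = FALSE]
}
sg_local <- nrow(grid_local)

# Do the task
output <- numeric(sg_local)
for (g in seq_len(sg_local)) {
  output[g] <- sum(A + matrix(grid_local[g, ], nrow = m, ncol = m, byrow = TRUE))
}

# Save output, keeping track of the task ID
saveRDS(output, sprintf("output.%d.rds", idtemp))

(The matrix(..., byrow = TRUE) call plays the role of Matlab's repmat(grid_local(g,:), m, 1).)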
I have checked what the "-n" option does:
"Displays active TCP connections, however, addresses and port numbers are expressed numerically and no attempt is made to determine names."
But I can't see why "-n" would make netstat exit immediately.
From a quick check, I don't see the same description for the "-n" option as you do, and it doesn't make netstat run continuously.
As you didn't specify the version and exact command you are using, I tried both the version that comes with RH7.6 (net-tools 2.10-alpha) and the latest from source code (net-tools 3.14-alpha). The net-tools source code can be found on GitHub [1].
As I couldn't find the exact option you describe, I tried all flags (without combinations) that don't require an argument. As far as I can tell, the only options that cause netstat not to exit immediately are '-g' and '-c'. '-c' makes sense, as it is the flag for running netstat continuously. For '-g' it isn't as obvious: the apparently continuous behavior comes from reading the /proc/net/igmp and /proc/net/igmp6 files line by line. The first file is read quickly, but the igmp6 file takes much longer (about one line per second). So the '-g' option isn't really continuous; it just takes a long time to finish.
From the code, the only reason for continuous execution is (appears 4 times in the code):
if (i || !flag_cnt)
break;
wait_continous();
'i' is a return code from a function, and the 'break' statement exits an infinite for loop, so basically the code runs continuously only if flag_cnt is set (which happens only when '-c' is provided) and there were no errors in the previous commands.
For the specific issue above there could be a few reasons:
The option involves reading from a file and it takes very long time to finish, but it is not really continuous.
There's a correlation between the given option and flag_cnt, which causes flag_cnt to be set.
There's a call to wait_continous() which doesn't follow the condition above.
As I said, I couldn't reproduce the issue in the original question, nor could I find any flag with the description above. Also, none of the flags besides '-c' caused netstat to run continuously.
If you still want to figure this out, I suggest you take a look at your code, or at least specify the net-tools version you use. The kernel version is also important, as some code may be compiled out due to missing kernel support.
[1] https://github.com/ecki/net-tools
I have a C++ library and it has a few C++ static objects. The library could suffer from the C++ static initialization order fiasco. I'm trying to vet unforeseen translation-unit dependencies by randomizing the order of the *.o files during a build.
I visited 2.3 How make Processes a Makefile in the GNU manual and it tells me:
Goals are the targets that make strives ultimately to update. You can override this behavior using the command line (see Arguments to Specify the Goals) ...
I also followed the link to 9.2 Arguments to Specify the Goals, but this topic was not covered there. That did not surprise me.
Is it possible to have Make randomize its goals? If so, then how do I do it?
If not, are there any alternatives? This is in a test environment, so I have more tools available to me than just GNUmake.
Thanks in advance.
This is really implementation-defined, but GNU Make will process targets from left to right.
Say you have an OBJS variable with the objects you want to randomize; you could write something like this (using e.g. shuf):
RAND_OBJS := $(shell shuf -e -- $(OBJS))
random_build: $(RAND_OBJS)
This holds as long as you're not using parallel make (the -j option). If you are, the order will still be randomized, but it will also depend on the number of jobs, system load, the current phase of the moon, etc.
The next release of GNU make will have a --shuffle mode. It will allow you to execute prerequisites in random order, to shake out missing dependencies, by running $ make --shuffle.
The feature was recently added in https://savannah.gnu.org/bugs/index.php?62100 and so far is available only in GNU make's git tree.
My Jenkins server is running arc diff, and once in a while I have large diffs. I don't want my job to fail when that is the case. Right now, with the latest master of arc, I get:
This diff has a very large number of changes (762). Differential works
best for changes which will receive detailed human review, and not as
well for large automated changes or bulk checkins. See
https://secure.phabricator.com/book/phabricator/article/differential_large_changes/
for information about reviewing big checkins. Continue anyway? [y/N]
Usage Exception: Aborted generation of gigantic diff.
Build step 'Execute shell' marked build as failure
My current code tries to avoid interactivity and mostly works, except for large diffs. Any way around this?
echo "jenkins
Summary:
Test Plan:
required
Reviewers:
alberto56
Subscribers:
JIRA Issues:
$JIRAISSUE" > arc_info.txt
arc diff --allow-untracked --message jenkins --message-file arc_info.txt origin/master
rm arc_info.txt
There is no option (yet) to turn off the interactive prompt in arc diff. You may want to try something like:
echo 'y' | arc diff ...
or, to answer several prompts, one 'y' per line:
printf 'y\ny\ny\n' | arc diff ...
You could also use the yes command: http://linux.die.net/man/1/yes
Fork is a great tool in Unix. We can use it to create a copy of our process and change its behaviour. But I don't know the history of fork.
Can someone tell me the story?
Actually, unlike many of the basic UNIX features, fork was a relative latecomer (a).
The earliest existence of multiple processes within UNIX consisted of a few (fixed number of) processes, one per terminal that was attached to the PDP-7 machine (b).
The basic idea was that the shell process for a given terminal would accept a command from the user, locate the program file, load a small bootstrap program into high memory and jump to it, passing enough details for the bootstrap code to load the program file.
The bootstrap code, after loading the program into low memory (overwriting the shell), would then jump to it.
When the program was finished, it would call exit but it wasn't like the exit we know and love today. This exit would simply reload the shell and run it using pretty much the same method used to load the program in the first place.
So it was really more like a rudimentary exec command, the one that replaces your current program with another, in the same process space.
The shell would exec your program then, when your program was done, it would again exec the shell by calling exit.
This method was similar to that found in many other interactive systems at the time, including the Multics from whence UNIX got its name.
From the two-way exec, it was actually not that big a leap to add fork as a process duplicator to work in conjunction with it. While many systems run another program directly, it's this "just add what's needed" approach that is responsible for the separation of duties between fork and exec in UNIX. It also resulted in a very simple fork function.
If you're interested in the early history of various features(c) of Unix, you cannot go past the article The Evolution of the Unix Time-Sharing System by Dennis Ritchie, presented at a 1979 conference in Australia, and subsequently published by AT&T.
(a) Though I mean latecomer in the sense that the separation of the four fundamental forces in the universe was "late", happening some 0.00000000001 seconds after the big bang.</humour>.
(b) Since a question was raised in a comment as to how the shells were originally started off, there's a great resource holding very early source code for Unix over at The Unix Heritage Society, specifically the source code archives and, in particular, the first edition.
The init.s file from the first edition shows how the fixed number of shell processes were created (slightly reformatted):
...
mov $itab, r1 / address of table to r1
1:
mov (r1)+, r0 / 'x, x=0, 1... to r0
beq 1f / branch if table end
movb r0, ttyx+8 / put symbol in ttyx
jsr pc, dfork / go to make new init for this ttyx
mov r0, (r1)+ / save child id in word offer '0, '1, etc
br 1b / set up next child
1:
...
itab:
'0; ..
'1; ..
'2; ..
'3; ..
'4; ..
'5; ..
'6; ..
'7; ..
0
Here you can see the snippet which creates the processes for each connected terminal. These are the days of hard-coded values, no auto detection of terminal quantity involved. The zero-terminated table at itab is used to create a number of processes and hopefully the comments from the code explain how (the only possibly tricky bit is the labels - though there are multiple 1 labels, you branch to the nearest one in a given direction, hence 1b means the closest 1 label in the backwards direction).
The code shown simply processes the table, calling dfork to create a process for each terminal and start getty, the login prompt. The getty program, in turn, eventually started the shell. From that point, it's as I described in the main part of this answer.
(c) No paths (and use of temporary links to get around this limitation), limited processes, why there's a GECOS field in the password file, and all sorts of other trivia, generally interesting only to uber-geeks, of course.