Who know the history of unix fork? - unix

Fork is a great tool in unix.We can use it to generate our copy and change its behaviour.But I don't know the history of fork.
Does someone can tell me the story?

Actually, unlike many of the basic UNIX features, fork was a relative latecomer (a).
The earliest existence of multiple processes within UNIX consisted of a few (fixed number of) processes, one per terminal that was attached to the PDP-7 machine (b).
The basic idea was that the shell process for a given terminal would accept a command from the user, locate the program file, load a small bootstrap program into high memory and jump to it, passing enough details for the bootstrap code to load the program file.
The bootstrap code, after loading the program into low memory (overwriting the shell), would then jump to it.
When the program was finished, it would call exit but it wasn't like the exit we know and love today. This exit would simply reload the shell and run it using pretty much the same method used to load the program in the first place.
So it was really more like a rudimentary exec command, the one that replaces your current program with another, in the same process space.
The shell would exec your program then, when your program was done, it would again exec the shell by calling exit.
This method was similar to that found in many other interactive systems at the time, including the Multics from whence UNIX got its name.
From the two-way exec, it was actually not that big a leap to adding fork as a process duplicator to work in conjunction. While many systems run another program directly, it's this "just add what's needed" method which is responsible for the separation of duties between fork and exec in UNIX. It also resulted in a very simple fork function.
If you're interested in the early history of various features(c) of Unix, you cannot go past the article The Evolution of the Unix Time-Sharing System by Dennis Ritchie, presented at a 1979 conference in Australia, and subsequently published by AT&T.
(a) Though I mean latecomer in the sense that the separation of the four fundamental forces in the universe was "late", happening some 0.00000000001 seconds after the big bang.</humour>.
(b) Since a question was raised in a comment as to how the shells were originally started off, there's a great resource holding very early source code for Unix over at The Unix Heritage Society, specifically the source code archives and, in particular, the first edition.
The init.s file from the first edition shows how the fixed number of shell processes were created (slightly reformatted):
...
mov $itab, r1 / address of table to r1
1:
mov (r1)+, r0 / 'x, x=0, 1... to r0
beq 1f / branch if table end
movb r0, ttyx+8 / put symbol in ttyx
jsr pc, dfork / go to make new init for this ttyx
mov r0, (r1)+ / save child id in word offer '0, '1, etc
br 1b / set up next child
1:
...
itab:
'0; ..
'1; ..
'2; ..
'3; ..
'4; ..
'5; ..
'6; ..
'7; ..
0
Here you can see the snippet which creates the processes for each connected terminal. These are the days of hard-coded values, no auto detection of terminal quantity involved. The zero-terminated table at itab is used to create a number of processes and hopefully the comments from the code explain how (the only possibly tricky bit is the labels - though there are multiple 1 labels, you branch to the nearest one in a given direction, hence 1b means the closest 1 label in the backwards direction).
The code shown simply processes the table, calling dfork to create a process for each terminal and start getty, the login prompt. The getty program, in turn, eventually started the shell. From that point, it's as I described in the main part of this answer.
(c) No paths (and use of temporary links to get around this limitation), limited processes, why there's a GECOS field in the password file, and all sorts of other trivia, generally interesting only to uber-geeks, of course.

Related

Array job based on R [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 1 year ago.
Improve this question
I am trying to construct and submit an array job based on R in the HPC of my university.
I'm used to submit array jobs based on Matlab and I have some doubts on how to translate the overall procedure to R. Let me report a very simple Matlab example and then my questions.
The code is based on 3 files:
"main" which does some preliminary operations.
"subf" which should be run by each task and uses some matrices created by "main".
a bash file which I qsub in the terminal.
1. main:
clear
%% Do all the operations that are common across tasks
% Here, as an example, I create
% 1) a matrix A that I will sum to the output of each task
% 2) a matrix grid; each task will use some rows of the matrix grid
m=1000;
A=rand(m,m);
grid=rand(m,m);
%% Tasks
tasks=10; %number of tasks
jobs=round(size(grid,1)/tasks); %I split the number of rows of the matrix grid among the tasks
2. subf:
%% Set task ID
idtemp=str2double(getenv('SGE_TASK_ID'));
%% Select local grid
if idtemp<tasks
grid_local= grid(jobs*(idtemp-1)+1: idtemp*jobs,:);
else
grid_local= grid(jobs*(idtemp-1)+1: end,:); %for the last task, we should take all the rows of grid that have been left
end
sg_local=size(grid_local,1);
%% Do the task
output=zeros(sg_local,1);
for g=1:sg_local
output(g,:)=sum(sum(A+repmat(grid_local(g,:),m,1)));
end
%% Save output by keeping track of task ID
filename = sprintf('output.%d.mat', ID);
save(filename,'output')
3. bash
#$ -S /bin/bash
#$ -l h_vmem=6G
#$ -l tmem=6G
#$ -l h_rt=480:0:0
#$ -cwd
#$ -j y
#Run 10 tasks where each task has a different $SGE_TASK_ID ranging from 1 to 10
#$ -t 1-10
#$ -N Example
date
hostname
#Output the Task ID
echo "Task ID is $SGE_TASK_ID"
export PATH=/xx/xx/matlab/bin:$PATH
matlab -nodisplay -nodesktop -nojvm -nosplash -r "main; ID = $SGE_TASK_ID; subf; exit"
These are my questions:
Suppose I'm able to translate "main" and "subf" into R language. Should I be extra-careful about anything in particular concerning the parallelisation? For example, do I have to declare some parallel environment, such as parLapply or dopar?
In the "main" file I should also install some R packages. Can I do them locally in my folder directly at the beginning of the "main" file, or should I contact the HPC administrator to install them globally?
I could not find any example of bash file for R in the instructions given by my university. Therefore, I have doubts on how to re-adapt the above bash file. I suppose that the only lines to change are:
export PATH=/xx/xx/matlab/bin:$PATH
matlab -nodisplay -nodesktop -nojvm -nosplash -r "main; ID = $SGE_TASK_ID; subf; exit"
Could you give some hints on how I should change them?
The parallelization is handled by the HPC, right? In which case, I think "no", nothing special required.
It depends on how they allow/enable R. In a HPC that I use (not your school), the individual nodes do not have direct internet access, so it would require special care; this might be the exception, I don't know.
Recommendation: if there is a shared filesystem that both you and all of the nodes can access, then create an R "library" there that contains the installed packages you need, then use .libPaths(...) in your R scripts here to add that to the search path for packages. The only gotcha to this might be if there are non-R shared library (e.g., .dll, .so, .a) requirements. For this, either "docker" or "ask admins".
If you don't have a shared filesystem, then you might ask the cluster admins if they use/prefer docker images (you might provide an image or a DOCKERFILE to create one) or if they have preferred mechanisms for enabling various packages.
I do not recommend asking them to install the packages, for two reasons: First, think about them needing to do this with every person who has a job to run, for any number of programming languages, and then realize that they may have no idea how to do it for that language. Second, package versions are very important, and you asking them to install a package may install either a too-new package or overwrite an older version that somebody else is relying on. (See packrat and renv for discussions on reproducible environments.)
Bottom line, the use of a path you control (and using .libPaths) enables you to have complete control over package versions. If you have not been bitten by unintended consequences of newer-versioned packages, just wait ... congratulations, you've been lucky.
I suggest you can add source("main.R") to the beginning of subf.R, which would make your bash file perhaps as simple as
export PATH=/usr/local/R-4.x.x/bin:$PATH
Rscript /path/to/subf.R
(Noting that you'll need to reference Sys.getenv("SGE_TASK_ID") somewhere in subf.R.)

Mainframe Unix Codepage for SYSPRINT or SYSOUT direct display

Hello this my first question to StackOverflow, not sure about the forum and topic.
While participating in an Open Mainframe initiative using Visual Studio Code and Putty for Unix I developed a sample program in COBOL showing international sayings (german, english, french, spanish, latin for now). It works fine via batch with JCL to file and being called from REXX. In file I can't see special chars for non-english but I had a lucky punch with a twin-program in PL/1 (doing the same and showing the special chars in REXX).
Now my question: I also tried to call by mvscmd from Unix bash script. It works so far but dont show me the special chars. Ok I have last chance to call mvscmd from Python. Or alternatively I can transfer file from MVS to unix (for any reason then it automatically converts and I see my special chars contents).
Where is the place to handle it? Cobol? (as I said, for any reason PL/1 can do. I only use standard put edit in PL/1 vs display in Cobol). Converting the Sysprint/Sysout?
Any specialist can help me?
Hello and sorry for late replay. Well the whole code is a little bit much but I guess my problem is the following - MVSCMD direct coded in the shell script
#!/bin/sh
parm='Z08800.FYD.DATA'
#echo "arg1=>"$1"<"
[ ! -z "$1" ] && parm=$parm","$1
#echo "arg2=>"$2"<"
[ ! -z "$2" ] && parm=$parm","$2
#echo "parm=>"$parm"<"
mvscmd --pgm=saycob --args=$parm \
--steplib='z08800.fyd.load' \
--sysin=dummy \
--sysout=*
I have some more shell script but this is the main. I directly put it to sysout (its the COBOL diplay. I can use fixed string or my saying read from MVS file). When using PL/1 program the last file is then sysprint because PL/1 makes it by PUT EDIT.
I assume my codepage is pretty wrong. But I dont know how to repair. I used some settings in the shell but LANG remains on C ??? By the way this Unix seems to be quite old and I only have the chance to use it until August.
My main interest is to use the program on Mainframe and in JCL and/or REXX.
But they gave us chance with this embedded Unix (?) also so I wanted to try.
Direct Sysout from COBOL program to Unix terminal.
I meant when executing the program on the Mainframe and then watching the result file in ISPF (old stuff) editor by PF3 I can see German and Spanish and French special characters. So they are there seems, produced by COBOL and PL/1.
When transfering the MVS file (kind of PDS) into the UNIX by MVSCMD, it is also fine (special chars) but thats not what I wanted.
I tried to use Python instead flat shell but its going even worse. I cannot direct the Sysout to terminal, all what is Python able to call is on the Mainframe and with the MVS filesystem. So I have to transfer it after. It is to much overhead in my eyes when call say 7 sayings and I want them to be displayed in the Unix terminal lol.
Here is my REXX that is doing the trick
/* rexx */
ARG PARM1 PARM2
PARAMETER = '/Z08800.FYD.DATA'
If Length(PARM1) > 0
Then PARAMETER = PARAMETER","PARM1
If Length(PARM2) > 0
Then PARAMETER = PARAMETER","PARM2
PARAMETER = "'"PARAMETER"'"
Address TSO "Alloc File(sysprint) Dataset(*)"
Address TSO "Alloc File(sysin) Dummy"
Address TSO "Call fyd.load(saypli)" PARAMETER
Address TSO "Free File(sysprint)"
Address TSO "Free File(sysin)"
It is now the other Load, the PL/1 - but the COBOL does the same with Sysout instead of Sysprint.
It is shown in my REXX terminal that is also called by ISPF and then 3.4 in the edit panel. The program has no manual input but reads file. And yes, the sayings are not allocated here, I read them by dynamic allocation but it doesnt matter from where my strings come to the DISPLAY / PUT EDIT
And this now JCL. OK works little different, it stores to PDS member
//SAYCOB JOB
//COBCLG EXEC IGYWCLG,
// PARM.GO='Z08800.FYD.DATA'
// SET MBR=SAYCOB
//COBOL.SYSIN DD DSN=&SYSUID..FYD.SOURCE(&MBR),DISP=SHR
//LKED.SYSLMOD DD DSN=&SYSUID..FYD.LOAD(&MBR),DISP=SHR
//GO.SYSOUT DD SYSOUT=*
//*-------------------------------------------------------------
//*
//*-------------------------------------------------------------
//SAYCOB EXEC PGM=&MBR,PARM='Z08800.FYD.DATA,001,007'
//STEPLIB DD DSN=&SYSUID..FYD.LOAD,DISP=SHR
//SYSOUT DD DSN=&SYSUID..FYD.OUTPUT(&MBR),DISP=SHR
//*-------------------------------------------------------------
//LIST EXEC PGM=LINE80,PARM='/80'
//STEPLIB DD DSN=&SYSUID..FYD.LOAD,DISP=SHR
//SYSIN DD DSN=&SYSUID..FYD.OUTPUT(&MBR),DISP=SHR
//SYSPRINT DD SYSOUT=*
//
Here in the parameter I give them the library to my sayings and then I allocate by PL/1 or COBOL. I can of course show, but its a little bit much, about 200 lines... The problem is not MVS I guess but the Unix codepage.

why "netstat -a" do not exit immediately but "netstat -n" does?

I have checked about the function of "-n" --
"Displays active TCP connections, however, addresses and port numbers are expressed numerically and no attempt is made to determine names."
But I can't see why "-n" can make netstat exit immediately?
From a quick check, I don't see the same description for the "-n" option as you do, and it doesn't make netstat run continuously.
As you didn't specify the version and exact command you are using, I tried both the version that comes with RH7.6 (net-tools 2.10-alpha) and the latest from source code (net-tools 3.14-alpha). The net-tools source code can be found in github [1].
As I couldn't find the exact option you describe, I tried all flags (without combinations) that don't require an argument. As far as I can tell the only options that cause netstat to not exit immediately are '-g' and '-c'. '-c' makes sense as it is the flag for running netstat continuously. For '-g' it isn't as obvious as the continuous behavior is coming from reading the /proc/net/igmp and /proc/net/igmp6 files line-by-line. The first file is read quickly but the igmp6 file takes much longer (1 line per ~1 sec). The '-g' option isn't really continuous, but just takes a lot of time to finish.
From the code, the only reason for continuous execution is (appears 4 times in the code):
if (i || !flag_cnt)
break;
wait_continous();
'i' is a return code from a function and the 'break' command is to break from an infinite for loop, so basically the code will run continuously only if flag_cnt is set (only happens when '-c' is provided) and there were no errors with previous commands.
For the specific issue above there could be a few reasons:
The option involves reading from a file and it takes very long time to finish, but it is not really continuous.
There's a correlation between the given option and flag_cnt, which cause flag_cnt to be set.
There's a call to wait_continous() which doesn't follow the condition above.
As I said, I couldn't reproduce the issue in the original question, nor could I find any flag with the description above. Also, non of the flags besides '-c' caused netstat to run continuously.
If you still want to figure this out I suggest you take a look at your code, or at least specify the net-tools version you use. The kernel version is also important as some code would be compiled-out due to missing kernel support.
[1] https://github.com/ecki/net-tools

First token could not be read or is not the keyword 'FoamFile' in OpenFOAM

I am a beginner to programming. I am trying to run a simulation of a combustion chamber using reactingFoam.
I have modified the counterflow2D tutorial.
For those who maybe don't know OpenFOAM, it is a programme built in C++ but it does not require C++ programming, just well-defining the variables in the files needed.
In one of my first tries I have made a very simple model but since I wanted to check it very well I set it to 60 seconds with a 1e-6 timestep.
My computer is not very powerful so it took me for a day aprox. (by this I mean I'd like to find a solution rather than repeating the simulation).
I executed the solver reactingFOAM using 4 processors in parallel using
mpirun -np 4 reactingFOAM -parallel > log
The log does not show any evidence of error.
The problem is that when I use reconstructPar it works perfectly but then I try to watch the results with paraFoam and this error is shown:
From function bool Foam::IOobject::readHeader(Foam::Istream&)
in file db/IOobject/IOobjectReadHeader.C at line 88
Reading "mypath/constant/reactions" at line 1
First token could not be read or is not the keyword 'FoamFile'
I have read that maybe some files are empty when they are not supposed to be so, but I have not found that problem.
My 'reactions' file have not been modified from the tutorial and has always worked.
edit:
Sorry for the vague question. I have modified it a bit.
A typical OpenFOAM dictionary file always contains a Foam::Istream named FoamFile. An example from a typical system/controlDict file can be seen below:
FoamFile
{
version 2.0;
format ascii;
class dictionary;
location "system";
object controlDict;
}
During the construction of the dictionary header, if this Istream is absent, OpenFOAM ceases its operation by raising an error message that you have experienced:
First token could not be read or is not the keyword 'FoamFile'
The benefit of the header is possibly to contribute OpenFOAM's abstraction mechanisms, which would be difficult otherwise.
As mentioned in the comments, adding the header entity almost always solves this problem.

Pipe file to stdin and keep alive, waiting for new data

A semi-newbie to UNIX piping, so apologies if I'm asking anything obvious here. I'm using a program called CCExtractor to grab the closed captions from a video file. It has the option to receive a file from stdin, and it works great if I do the following:
./ccextractor -stdin < myvideofile.wtv
However, I want to try using it "live" - as a video is recording, it'll transcribe the subtitles. From my understanding, < won't do that, as it'll stop as soon as it reaches the current end of the file. Following this answer on Stack Overflow, it seems like:
tail -c +1 -f myvideofile.wtv | ./ccextractor -stdin
should work - but it doesn't process any part of the video at all (it should, at least, work as well as the previous command and parse the existing data). I figured I'd take a step back and use a simple cat:
cat myvideofile.wtv | ./ccextractor -stdin
and that doesn't work either. I was of the belief that the first and third commands ought to be roughly equivalent, but that's obviously not the case. What are the differences, and how could I get this to work?

Resources