How do I read a directory as a file in Unix?

I understand that a directory is just a file in Unix that contains the inode numbers and names of the files within. How do I take a look at this? I can't use cat or less on a directory, and opening it in vi just shows me a listing of the files, with no inode numbers.

Since this is a programming question (it is a programming question, isn't it?), you should check out the opendir, readdir and closedir functions. These are part of the Single UNIX Spec.
#include <sys/types.h>
#include <dirent.h>
DIR *opendir (const char *dirname);
struct dirent *readdir(DIR *dirp);
int closedir(DIR *dirp);
The dirent.h file should have the structure you need, containing at least:
char d_name[] name of entry
ino_t d_ino file serial number
See the readdir manpage for details; it links to the others.
Keep in mind that the amount of information about a file stored in the directory entries for it is minimal. The inode itself contains the stuff you get from the stat function, things like times, size, owner, permissions and so on, along with the all-important pointers to the actual file content.
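For a quick look without writing C, the same directory-entry data is reachable from Python through os.scandir. The following is only a minimal sketch; the helper name list_entries is made up for illustration.

```python
import os

def list_entries(path="."):
    """Return (inode number, name) pairs for the entries in a directory."""
    with os.scandir(path) as entries:
        # DirEntry.inode() gives the inode number of each entry;
        # "." and ".." are not included by os.scandir.
        return sorted((entry.inode(), entry.name) for entry in entries)

if __name__ == "__main__":
    for inode, name in list_entries("."):
        print(f"{inode:>10}  {name}")
```

From the shell, ls -i prints the same name and inode pairs.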

In the old days - Version 7, System III, early System V - you could indeed open a directory and read the contents into memory, especially for the old Unix file system with 2-byte inode numbers and a limit of 14 bytes on the file name.
As more exotic file systems became more prevalent, the opendir(), readdir(), closedir() family of function calls had to be used instead because parsing the contents of a directory became increasingly non-trivial.
Finally, in the last decade or so, it has reached the point where on most systems, you cannot read the directory; you can open it (primarily so operations such as fchdir() can work), and you can use the opendir() family of calls to read it.
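This behaviour can be checked with a short Python sketch. It assumes a Linux-like system, where read() on a directory descriptor fails with EISDIR; the function name probe_directory is hypothetical, and note that it changes the current working directory.

```python
import os

def probe_directory(path="."):
    """Try read() on a directory, then use the same descriptor for fchdir()."""
    fd = os.open(path, os.O_RDONLY)      # opening a directory is allowed
    try:
        try:
            os.read(fd, 512)
            read_result = "read succeeded"
        except OSError as exc:           # EISDIR on Linux
            read_result = f"read failed: {exc.strerror}"
        os.fchdir(fd)                    # the descriptor still works for fchdir()
        return read_result, os.getcwd()
    finally:
        os.close(fd)

if __name__ == "__main__":
    print(probe_directory("/tmp"))
```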

It looks like the stat command might be in order. From the article:
stat /etc/passwd
File: `/etc/passwd'
Size: 2911 Blocks: 8 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 324438 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2008-08-11 05:24:17.000000000 -0400
Modify: 2008-08-03 05:11:05.000000000 -0400
Change: 2008-08-03 05:11:05.000000000 -0400

Related

How can I find the first clusters/blocks of a file?

I have a FAT16 drive that contains the following info:
Bytes per sector: 512 bytes (0x200)
Sectors per cluster: 64 (0x40)
Reserved sectors: 6 (0x06)
Number of FATs: 2 (0x02)
Number of root entries: 512 (0x0200)
Total number of sectors: 3805043 (0x3a0f73)
Sectors per file allocation table: 233 (0xE9)
Root directory is located at sector 472 (0x1d8)
I'm looking for a file with the following details:
File name: LOREMI~1
File extension: TXT
File size: 3284 bytes (0x0cd4)
First cluster: 660 (0x294)
However, I know that the file's first cluster starts at sector 42616. My problem is: what equation should I use to produce 42616?
I have trouble figuring this out since there is barely any information about this other than a tutorial made by Tavi Systems but the part involving this is very hard to follow.

Actually, the FAT filesystem is fairly well documented. The official FAT documentation by Microsoft can be found by the filename fatgen103.
The directory entry LOREMI~1.TXT can be found in the root directory and is preceded by the long file name entries (xt, lorem ipsum.t → lorem ipsum.txt). The directory entry format is documented in the «FAT Directory Structure» chapter; for FAT16 you are interested in the two bytes at offsets 26 and 27 (DIR_FstClusLO), which hold the first cluster number in little-endian order: 0x0294 (or 660₁₀).
Based on the BPB header information you provided, we can calculate the data sector like this:
data_sector = (cluster - 2) * sectors_per_cluster +
              (reserved_sectors + (number_of_fats * fat_size) +
              root_dir_sectors)
Why cluster-2? Because cluster numbering starts at 2: the first two FAT entries (0 and 1) are reserved, so cluster 2 is the first cluster of the data region; see chapter «FAT Data Structure» in fatgen103.doc.
To solve this, we still need the number of sectors occupied by the root directory. For FAT12/16 this can be determined like this:
root_dir_sectors = ((root_entries * directory_entry_size) +
                    (bytes_per_sector - 1)) // bytes_per_sector
The directory entry size is always 32 bytes as per the specification (see chapter «FAT Directory Structure» in fatgen103.doc), and every other value is known by now:
root_dir_sectors = ((512*32)+(512-1)) // 512 → 32
data_sector = (660-2)*64+(6+(2*233)+32) → 42616
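The calculation above can be wrapped into a small Python function as a sanity check. This is only a sketch: the function and parameter names are chosen for illustration, and the two intermediate values follow the RootDirSectors and FirstDataSector quantities from fatgen103.

```python
def first_sector_of_cluster(cluster, *, bytes_per_sector, sectors_per_cluster,
                            reserved_sectors, number_of_fats, fat_size,
                            root_entries, directory_entry_size=32):
    """Return the first sector of a data cluster on a FAT12/16 volume."""
    # Sectors occupied by the fixed-size root directory, rounded up.
    root_dir_sectors = (root_entries * directory_entry_size
                        + bytes_per_sector - 1) // bytes_per_sector
    # The data region starts after the reserved area, the FATs and the root directory.
    first_data_sector = (reserved_sectors + number_of_fats * fat_size
                         + root_dir_sectors)
    # Cluster numbering starts at 2, hence the "- 2".
    return (cluster - 2) * sectors_per_cluster + first_data_sector

sector = first_sector_of_cluster(
    660, bytes_per_sector=512, sectors_per_cluster=64, reserved_sectors=6,
    number_of_fats=2, fat_size=233, root_entries=512)
print(sector)  # 42616
```

With cluster=2 the same function returns 504, the first sector after the 32-sector root directory that begins at sector 472.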

Mainframe Unix Codepage for SYSPRINT or SYSOUT direct display

Hello, this is my first question on StackOverflow, so I'm not sure about the forum and topic.
While participating in an Open Mainframe initiative, using Visual Studio Code and PuTTY for Unix, I developed a sample program in COBOL that shows international sayings (German, English, French, Spanish and Latin for now). It works fine in batch with JCL writing to a file, and when called from REXX. In the file I can't see the special characters of the non-English sayings, but I got lucky with a twin program in PL/1 (it does the same thing and does show the special characters in REXX).
Now my question: I also tried calling it via mvscmd from a Unix bash script. It works so far, but it doesn't show the special characters. My last chance is to call mvscmd from Python. Alternatively, I can transfer the file from MVS to Unix (for some reason it is then converted automatically and I can see the special characters).
Where is the right place to handle this? In COBOL? (As I said, for some reason PL/1 can do it. I only use the standard PUT EDIT in PL/1 versus DISPLAY in COBOL.) By converting the SYSPRINT/SYSOUT?
Can any specialist help me?

Hello, and sorry for the late reply. The whole code is a bit much to post, but I guess my problem is in the following: MVSCMD coded directly in the shell script
#!/bin/sh
parm='Z08800.FYD.DATA'
#echo "arg1=>"$1"<"
[ ! -z "$1" ] && parm=$parm","$1
#echo "arg2=>"$2"<"
[ ! -z "$2" ] && parm=$parm","$2
#echo "parm=>"$parm"<"
mvscmd --pgm=saycob --args=$parm \
--steplib='z08800.fyd.load' \
--sysin=dummy \
--sysout=*
I have some more shell script, but this is the main part. I write directly to SYSOUT (it's the COBOL DISPLAY; I can use a fixed string or my saying read from an MVS file). When using the PL/1 program, the last file is SYSPRINT instead, because PL/1 writes it with PUT EDIT.
I assume my codepage is simply wrong, but I don't know how to fix it. I tried some settings in the shell, but LANG remains C. By the way, this Unix seems to be quite old, and I can only use it until August.
My main interest is to use the program on Mainframe and in JCL and/or REXX.
But they gave us chance with this embedded Unix (?) also so I wanted to try.
Direct Sysout from COBOL program to Unix terminal.
What I meant: when I execute the program on the mainframe and then look at the result file in the ISPF (old stuff) editor via PF3, I can see the German, Spanish and French special characters. So it seems they are there, produced by both COBOL and PL/1.
When transferring the MVS file (a kind of PDS) into Unix by MVSCMD, it is also fine (the special characters are shown), but that's not what I wanted.
I tried to use Python instead of a plain shell script, but it got even worse. I cannot direct the SYSOUT to the terminal; everything Python can call stays on the mainframe with the MVS filesystem, so I have to transfer the output afterwards. That is too much overhead in my eyes when I call, say, 7 sayings and just want them displayed in the Unix terminal.

Here is my REXX that is doing the trick
/* rexx */
ARG PARM1 PARM2
PARAMETER = '/Z08800.FYD.DATA'
If Length(PARM1) > 0
Then PARAMETER = PARAMETER","PARM1
If Length(PARM2) > 0
Then PARAMETER = PARAMETER","PARM2
PARAMETER = "'"PARAMETER"'"
Address TSO "Alloc File(sysprint) Dataset(*)"
Address TSO "Alloc File(sysin) Dummy"
Address TSO "Call fyd.load(saypli)" PARAMETER
Address TSO "Free File(sysprint)"
Address TSO "Free File(sysin)"
This one calls the other load module, the PL/1 one, but the COBOL version does the same with SYSOUT instead of SYSPRINT.
The output is shown in my REXX terminal, which is also called from ISPF (option 3.4, then the edit panel). The program takes no manual input but reads a file. And yes, the sayings are not allocated here; I read them by dynamic allocation, but it doesn't matter where the strings passed to DISPLAY / PUT EDIT come from.

And this is the JCL. It works a little differently: it stores the output to a PDS member
//SAYCOB JOB
//COBCLG EXEC IGYWCLG,
// PARM.GO='Z08800.FYD.DATA'
// SET MBR=SAYCOB
//COBOL.SYSIN DD DSN=&SYSUID..FYD.SOURCE(&MBR),DISP=SHR
//LKED.SYSLMOD DD DSN=&SYSUID..FYD.LOAD(&MBR),DISP=SHR
//GO.SYSOUT DD SYSOUT=*
//*-------------------------------------------------------------
//*
//*-------------------------------------------------------------
//SAYCOB EXEC PGM=&MBR,PARM='Z08800.FYD.DATA,001,007'
//STEPLIB DD DSN=&SYSUID..FYD.LOAD,DISP=SHR
//SYSOUT DD DSN=&SYSUID..FYD.OUTPUT(&MBR),DISP=SHR
//*-------------------------------------------------------------
//LIST EXEC PGM=LINE80,PARM='/80'
//STEPLIB DD DSN=&SYSUID..FYD.LOAD,DISP=SHR
//SYSIN DD DSN=&SYSUID..FYD.OUTPUT(&MBR),DISP=SHR
//SYSPRINT DD SYSOUT=*
//
Here the parameter gives the library that holds my sayings, which I then allocate from PL/1 or COBOL. I can of course show the program, but it's a bit much, about 200 lines... I guess the problem is not MVS but the Unix codepage.

getcwd: work with symbolic links

#include "apue.h"    /* the book's header: declares err_sys and path_alloc */

int
main(void)
{
    char    *ptr;
    size_t  size;

    if (chdir("/usr/spool/uucppublic") < 0)
        err_sys("chdir failed");
    ptr = path_alloc(&size);    /* our own function */
    if (getcwd(ptr, size) == NULL)
        err_sys("getcwd failed");
    printf("cwd = %s\n", ptr);
    exit(0);
}
$ ./a.out
cwd = /var/spool/uucppublic
$ ls -l /usr/spool
lrwxrwxrwx 1 root 12 Jan 31 07:57 /usr/spool -> ../var/spool
Note that chdir follows the symbolic link—as we expect it to, from
Figure 4.17 — but when it goes up the directory tree, getcwd has no
idea when it hits the /var/spool directory that it is pointed to by
the symbolic link /usr/spool. This is a characteristic of symbolic
links.
All of the above is from the book Advanced Programming in the UNIX Environment by Stevens and Rago.
First, chdir follows symbolic links, but what does the kernel store as the current working directory of the process? Just uucppublic?
Second, what did the author want to state by saying
getcwd has no idea when hits /var/spool
As I understand it, getcwd should start by reading the inode of .. in the directory uucppublic to jump to the directory spool whose parent is var, not usr. That is why getcwd should not care whether there was a symbolic link or not: chdir has already followed it.

It looks like you got the idea, but you're parsing the English wrong.
getcwd has no idea when it hits the /var/spool directory that it is pointed to by the symbolic link /usr/spool
"when it hits the /var/spool directory" is a modifier on the whole clause:
getcwd has no idea that it is pointed to by the symbolic link /usr/spool
and in that sentence, the "it" is "the /var/spool directory". So read it like this:
getcwd has no idea that the /var/spool directory is pointed to by the symbolic link /usr/spool
The snippet you pulled out:
getcwd has no idea when hits /var/spool
is not a meaningful fragment because it keeps the modifying "when" clause but drops the more important "that" clause which is the object of "has no idea..."
As a side note, you are working from an old book, so you should be aware that things have changed a little. getcwd is a syscall now (in Linux at least) so the old algorithm (traverse .. and search for matching inode numbers) is no longer used. The dedicated syscall gives the same result faster.
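The effect described in the book is easy to reproduce. The sketch below uses Python's os module and a temporary directory instead of /usr/spool; the directory names and the function name are made up for illustration.

```python
import os
import tempfile

def cwd_after_chdir_through_symlink():
    """chdir through a symlink and report what getcwd returns afterwards."""
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as base:
        real_dir = os.path.join(base, "var_spool")
        link = os.path.join(base, "usr_spool")
        os.mkdir(real_dir)
        os.symlink(real_dir, link)          # usr_spool -> var_spool
        try:
            os.chdir(link)                  # chdir follows the symlink
            reported = os.getcwd()          # the link name does not appear here
        finally:
            os.chdir(old_cwd)               # leave before the directory is removed
        return reported, os.path.realpath(real_dir), link

if __name__ == "__main__":
    reported, real, link = cwd_after_chdir_through_symlink()
    print("chdir to:", link)
    print("getcwd  :", reported)
```

The reported path ends in var_spool, not usr_spool: the process's working directory refers to the directory itself, so the name of the link used to get there is not recoverable.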

Unix: What are stdin/out/err REALLY?

Assuming the following are correct...
stdin, stdout, and stderr are streams
streams are file descriptors
file descriptors are numbers/indexes in the kernel representing open files
Questions:
a. Does it follow by transitivity that stdin/out/err involve open files? So if I do ls /dir, does ls output the results to a file referred to by stdout(2)?
b. Where does above file live? in a /proc//? OR is that where the FD lives?
c. What is /dev/stdout? If I do vim /dev/stdout, vim tells me it is not a file. I see there's a series of links that lead to /dev/pts/27. What is going on? I tried to cat /dev/stdout but nothing happens.
d. In general, how is it that "files" in Linux are actually NOT files?

Some of your assumptions are incorrect. For example, stdin is of type FILE*; it's not a "file descriptor".
stdin, stdout, and stderr are macros defined in <stdio.h>. (Yes, they're required to be macros, not just variable names). They expand to expressions of type FILE*, and they point to the FILE objects associated with the standard input, output, and error streams.
A "file descriptor" is a small integer value representing a POSIX stream. On UNIX-like systems, FILE* values are generally associated with file descriptors (you can use the fileno and fdopen functions to go from one to the other), but they're not the same thing.
Basically, there are two distinct I/O systems, one built on top of the other. The lower level system uses numeric file descriptors, manipulated via the open, read, write, and close functions, and so forth. The higher level, as defined by the ISO C standard, uses pointers of type FILE*, manipulated with fopen, fread, fwrite, fprintf, putchar, fclose, and so forth.
As I mentioned, on UNIX-like systems, the C standard layer is generally implemented on top of the POSIX layer. On non-POSIX systems (like MS Windows), the C standard layer may be implemented on top of some other system-specific interface.
Linux and other UNIX-like systems try (incompletely) to follow an "everything is a file" philosophy. There are a number of file-like entities under /proc. These are not physical files stored on disk; they're entities that can be accessed using either the POSIX or ISO C I/O layers. Neither layer requires the "files" it deals with to be actual disk files, so there's nothing inconsistent about this.
See man proc for more information on what's under the /proc directory (there's far more detail than I can put in this answer).
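The relationship between the two layers can be sketched in Python, whose os.fdopen and fileno() mirror the C functions fdopen and fileno mentioned above. This is an analogy under that assumption: a Python file object stands in for a FILE*, and the function name two_layers_demo is hypothetical.

```python
import os

def two_layers_demo():
    """Write to one pipe through both the stream layer and the descriptor layer."""
    read_fd, write_fd = os.pipe()            # descriptors are plain integers
    stream = os.fdopen(write_fd, "w")        # buffered stream on top of a descriptor
    same_fd = stream.fileno() == write_fd    # the stream knows its descriptor
    stream.write("via the stream layer\n")
    stream.flush()                           # push the buffered text down to the descriptor
    os.write(write_fd, b"via the descriptor layer\n")
    stream.close()                           # closing the stream closes write_fd too
    data = os.read(read_fd, 4096)
    os.close(read_fd)
    return same_fd, data

if __name__ == "__main__":
    print(two_layers_demo())
```

A pipe is used here instead of the real standard streams so the sketch has no side effects; in a normal process stdin, stdout and stderr sit on descriptors 0, 1 and 2.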

Unix: can I write to the same file in parallel without missing entries?

I wrote a script that executes commands in parallel. I let them all write an entry to the same log file. It does not matter if the order is wrong or entries are interleaved, but I noticed that some entries are missing. I should probably lock the file before writing; however, is it true that if multiple processes try to write to a file simultaneously, it will result in missing entries?

Yes, if different processes independently open and write to the same file, it may result in overlapping writes and missing data. This happens because each process will get its own file pointer, that advances only by local writes.
Instead of locking, a better option might be to open the log file once in an ancestor of all worker processes, have it inherited across fork(), and used by them for logging. This means that there will be a single shared file pointer, that advances when any of the processes writes a new entry.
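The difference between independent and shared file pointers can be shown deterministically in a single process. In this sketch os.dup() stands in for a descriptor inherited across fork(), since both share one file offset; the function name and file name are made up for illustration.

```python
import os
import tempfile

def write_two_entries(share_offset):
    """Write two log entries through two descriptors and return the file content."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log.txt")
        fd1 = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        # os.dup() shares the file offset, like a descriptor inherited across fork();
        # a second os.open() gets its own offset, which starts at 0.
        fd2 = os.dup(fd1) if share_offset else os.open(path, os.O_WRONLY)
        os.write(fd1, b"first\n")
        os.write(fd2, b"second\n")
        os.close(fd1)
        os.close(fd2)
        with open(path, "rb") as f:
            return f.read()

if __name__ == "__main__":
    print(write_two_entries(share_offset=False))  # b'second\n'
    print(write_two_entries(share_offset=True))   # b'first\nsecond\n'
```

With independent opens the second write lands at offset 0 and overwrites the first entry, which is exactly the missing-entry symptom from the question.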

In a script you should use ">> file" (double greater than) to append output to that file. The interpreter will open the destination in "append" mode. If your program also wants to append, follow the directives below:
Open a text file in "append" mode ("a+") and give preference to printing only full lines (don't do multiple 'print' followed by a final 'println', but print the entire line with a single 'println').
The fopen documentation states this:
DESCRIPTION
The fopen() function opens the file whose pathname is the
string pointed to by filename, and associates a stream with
it.
The argument mode points to a string beginning with one of
the following sequences:
r or rb             Open file for reading.
w or wb             Truncate to zero length or create file for writing.
a or ab             Append; open or create file for writing at end-of-file.
r+ or rb+ or r+b    Open file for update (reading and writing).
w+ or wb+ or w+b    Truncate to zero length or create file for update.
a+ or ab+ or a+b    Append; open or create file for update, writing at end-of-file.
The character b has no effect, but is allowed for ISO C
standard conformance (see standards(5)). Opening a file with
read mode (r as the first character in the mode argument)
fails if the file does not exist or cannot be read.
Opening a file with append mode (a as the first character in
the mode argument) causes all subsequent writes to the file
to be forced to the then current end-of-file, regardless of
intervening calls to fseek(3C). If two separate processes
open the same file for append, each process may write freely
to the file without fear of destroying output being written
by the other. The output from the two processes will be
intermixed in the file in the order in which it is written.
It is because of this intermixing that you want to give preference to
using only 'println' (or its equivalent).
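As a rough check of the append-mode advice, the following Python sketch forks several children that each open the log independently with O_APPEND and emit every entry with a single write. It assumes a Unix system with a local filesystem; the function name, worker count and line format are made up for illustration.

```python
import os
import tempfile

def append_from_children(path, workers=4, lines_each=50):
    """Fork `workers` children; each appends `lines_each` complete lines to `path`."""
    pids = []
    for w in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: independent open in append mode, one write() per full line.
            status = 1
            try:
                fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
                for i in range(lines_each):
                    os.write(fd, f"worker {w} entry {i}\n".encode())
                os.close(fd)
                status = 0
            finally:
                os._exit(status)             # never fall through into the parent's code
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)                   # wait for every child to finish
    with open(path, "rb") as f:
        return f.read().splitlines()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lines = append_from_children(os.path.join(tmp, "log.txt"))
        print(len(lines), "lines written")
```

The order of the lines varies from run to run, but each entry should arrive whole because it is written in one call.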
