unix: can i write to the same file in parallel without missing entries? - unix

I wrote a script that executes commands in parallel. I let them all write an entry to the same log file. It does not matter if the order is wrong or entries are interleaved, but i noticed that some entries are missing. I should probably lock the file before writing, however, is it true that if multiple processes try to write to a file simultaneously, it will result in missing entries?

Yes, if different processes independently open and write to the same file, it may result in overlapping writes and missing data. This happens because each process will get its own file pointer, that advances only by local writes.
Instead of locking, a better option might be to open the log file once in an ancestor of all worker processes, have it inherited across fork(), and used by them for logging. This means that there will be a single shared file pointer, that advances when any of the processes writes a new entry.

In a script you should use ">> file" (double greater than) to append output to that file. The interpreter will open the destination in "append" mode. If your program also wants to append, follow the directives below:
Open a text file in "append" mode ("a+") and give preference to printing only full lines (don't do multiple 'print' followed by a final 'println', but print the entire line with a single 'println').
The fopen documentation states this:
DESCRIPTION
The fopen() function opens the file whose pathname is the
string pointed to by filename, and associates a stream with
it.
The argument mode points to a string beginning with one of
the following sequences:
r or rb Open file for reading.
w or wb Truncate to zero length or create file
for writing.
a or ab Append; open or create file for writing
at end-of-file.
r+ or rb+ or r+b Open file for update (reading and writ-
ing).
w+ or wb+ or w+b Truncate to zero length or create file
for update.
a+ or ab+ or a+b Append; open or create file for update,
writing at end-of-file.
The character b has no effect, but is allowed for ISO C
standard conformance (see standards(5)). Opening a file with
read mode (r as the first character in the mode argument)
fails if the file does not exist or cannot be read.
Opening a file with append mode (a as the first character in
the mode argument) causes all subsequent writes to the file
to be forced to the then current end-of-file, regardless of
intervening calls to fseek(3C). If two separate processes
open the same file for append, each process may write freely
to the file without fear of destroying output being written
by the other. The output from the two processes will be
intermixed in the file in the order in which it is written.
It is because of this intermixing that you want to give preference to
using only 'println' (or its equivalent).

Related

Fortran90: Scripting of Standard In not working as expected

Working with Fortran90 in Unix...
I have a programme which needs to read in the input parameters from a file "input-deck.par". This filename is currently hard-coded but I want to run a number of runs using different input-deck files (input-deck01.par, input-deck02.par, input-deck03.par etc.) so I've set-up the code to do a simple "read(*,*) inpfile" to allow the user to input the name of this file directly on run-time with a view to scripting this later.
This works fine interactively. If I execute the programme it asks for the file name, you type it in and the filename is accepted, the file is opened and the programme picks up the parameters from that file.
The issue lies in scripting this. I've tried scripting using the "<" pipe command so:
myprog.e < input-deck01.par
But I get an error saying:
Fortran runtime error: Cannot open file '----------------': No such file or directory
If I print the filename right after the input line, it prints that the filename is '----------------' (I initialise the variable as having 16 characters hence the 16 hyphens I think)
It seems like the "<" pipe is not passing the keyboard input in correctly. I've tried with spaces and quotes around the filename (various combinations) but the errors are the same.
Can anyone help?
(Please be gentle- this is my first post on SO and Fortran is not my primary language....)
Fortran has the ability to read the command line arguments. A simple example is
program foo
implicit none
character(len=80) name
logical available
integer fd
if (command_argument_count() == 1) then
call get_command_argument(1, name)
else
call usage
end if
inquire(file=name, exist=available)
if (.not. available) then
call usage
end if
open(newunit=fd, file=name, status='old')
! Read file
contains
subroutine usage
write(*,'(A)') 'Usage: foo filename'
write(*,'(A)') ' filename --> file containing input info'
stop
end subroutine usage
end program foo
Instead of piping the file into the executable you simply do
% foo input.txt

Looping through the content of a file in Zsh

I'm trying to loop through the contents of a file in zsh. In my loop I want to get user input. Going off of this answer for Bash, I'm attempting to do:
while read -u 10 line; do
echo $line;
# TODO read from stdin here, etc.
done 10<myfile.txt
However I get an error:
zsh: parse error near `10'
Referring to the 10 after the done. Obviously I'm not getting the file descriptor syntax right, but I'm having trouble figuring out the docs.
Use a file descriptor number less than 10. If you want to hard code file descriptor numbers, stick to the range 3-9 (plus 0-2 for stdin,out,err). When zsh needs file descriptors itself, it uses them in the 10+ range.
If you're even getting close to needing more than the 7 available hard coded file descriptors, you should really think about using variables to name them. Syntax like exec {myfd}<myfile.txt will open a file with zsh allocating a file descriptor greater than 10 and assigning it to $myfd.
Bourne shell syntax is not entirely unambiguous given file descriptors numbering 10 and over and even in bash, I'd advise against using them. I'm not entirely sure how bash avoids conflicts if it needs to open any for internal use - I guess it never needs to leave any open. This may look like a zsh limitation at first sight but is actually a sensible feature.

Unix: What are stdin/out/err REALLY?

Assuming the following are correct...
stdin, stdout, and stderr are streams
streams are file descriptors
file descriptors are numbers/indexes in the kernel representing open files
Questions:
a. Does it follow by transition that stdin/out/err involve open files? So if I do ls /dir, does ls output the results to a file referred to by stdout(2)?
b. Where does above file live? in a /proc//? OR is that where the FD lives?
c. What is /dev/stdout? If I do vim /dev/stdout, vim tells me it is not a file. I see there's a series of links that lead to /dev/pts/27. What is going on? I tried to cat /dev/stdout but nothing happens.
d. In general, how is it that "files" in linux are actually NOT files?
Some of your assumptions are incorrect. For example, stdin is of type FILE*; it's not a "file descriptor".
stdin, stdout, and stderr are macros defined in <stdio.h>. (Yes, they're required to be macros, not just variable names). They expand to expressions of type FILE*, and they point to the FILE objects associated with the standard input, output, and error streams.
A "file descriptor" is a small integer value representing a POSIX stream. On UNIX-like systems, FILE* values are generally associated with file descriptors (you can use the fileno and fdopen functions to go from one to the other), but they're not the same thing.
Basically, there are two distinct I/O systems, one built on top of the other. The lower level system uses numeric file descriptors, manipulated via the open, read, write, and close functions, and so forth. The higher level, as defined by the ISO C standard, uses pointers of type FILE*, manipulated with fopen, fread, fwrite, fprintf, putchar, fclose, and so forth.
As I mentioned, on UNIX-like system, the C standard layer is generally implemented on top of the POSIX layer. On non-POSIX systems (like MS Windows), the C standard layer may be implemented on top of some other system-specific interface.
Linux and other UNIX-like systems try (incompletely) to follow an "everything is a file" philosophy. There are a number of file-like entities under /proc. These are not physical files stored on disk; they're entities that can be accessed using either the POSIX or ISO C I/O layers. Neither layer requires the "files" it deals with to be actual disk files, so there's nothing inconsistent about this.
man proc for more information on what's under the /proc directory (there's far more detail than I can put in this answer).

Process many EDI files through single MFX

I've created a mapping in MapForce 2013 and exported the MFX file. Now, I need to be able to run the mapping using MapForce Server. The problem is, I need to specify both the input EDI file and the output file. As far as I can tell, the usage pattern is to run the mapping with MapForce server using the input/output configuration in the MFX itself, not passed in on the command line.
I suppose I could change the input/output to some standard file name and then just write the input file to that path before performing the mapping, and then grab the output from the standard output file path when the mapping is complete.
But I'd prefer to be able to do something like:
MapForceServer run -in=MyInputFile.txt -out=MyOutputFile.xml MyMapping.mfx > MyLogFile.txt
Is something like this possible? Perhaps using parameters within the mapping?
There are two options that I've come across in dealing with a similar situation.
Option 1- If you set the input XML file to *.xml in the component settings, mapforceserver.exe will automatically process all txt in the directory assuming your source is xml (this should work for text just the same). Similar to the example below you can set a cleanup routine to move the files into another folder after processing.
Note: It looks in the folder where the schema file is located.
Option 2 - Since your output is XML you can use Altova's raptorxml (rack up another license charge). Now you can generate code in XSLT 2.0 and use a batch file to automatically execute, something like this.
::#echo off
for %%f IN (*.xml) DO (RaptorXML xslt --xslt-version=2 --input="%%f" --output="out/%%f" %* "mymapping.xslt"
if NOT errorlevel 1 move "%%f" processed
if errorlevel 1 move "%%f" error)
sleep 15
mymapping.bat
I tossed in a sleep command to loop the batch for rechecking every 15 seconds. Unfortunately this does not work if your output target is a database.

Are there any equivalent of C/C++ __FILE__ and __LINE__ macros in R?

I'm trying to get the equivalent of FILE or LINE macros in C or C++ in R (or S+). Any ideas?
FILE The presumed name of the current source file (a character string literal).
LINE The presumed line number (within the current source file) of the current source line (an integer constant).
As for context - I have log messages being flushed to console from different sections of the code, and given that the messages themselves are built at run-time, it is often very difficult to find out where this log message is coming from (with the size of the R code growing to many thousand lines and running on a distributed grid). However if I could dump the FILE and LINE number along with the log messages, it would be much easier to trace the logs...
Use the #line directive. The structure is #line nn "filename". See Duncan's Murdoch's article on source references for more.

Resources