Will using the "yes" command in Unix waste a lot of CPU cycles?

Will using the "yes" command waste a lot of CPU cycles?
I have a long-running script (the script code is not under my control) which accepts some input just once. Then the script runs for a long time.
To automate this, I use the "yes" command to feed the input:
yes hello | myscript
Will the yes command steal/waste a lot of CPU cycles? As per the docs I read, it keeps printing the string argument to the piped program.
I ran top, and I didn't see "yes" there.

yes will print the string "hello" whenever it has the chance, which means the receiving end (your script) must be reading its input. Once the pipe's buffer is full and your script is not reading, yes blocks inside write(). So no: while the receiving end is not waiting for input, yes does not take any CPU, because the process is blocked.
You can confirm this by looking at the state of the yes process in ps auxf (S, sleeping, rather than R, running).


In a UNIX pipeline how to get the user-tool interaction of the first stage piped into the next stage?

Page 32 of the book "The UNIX Programming Environment" makes this profound statement about UNIX pipes:
The programs in a pipeline actually run at the same time, not one after another. This means that the programs in a pipeline can be interactive; the kernel looks after whatever scheduling and synchronization is needed to make it all work.
Wow! By using pipes I can get parallel processing for free!
"I've got to illustrate this awesome capability to my colleagues" I thought. I will implement a demo: create a simple interactive tool that mimics what I type in at the command window. I'll name that tool mimic. Use a pipe to connect it to another tool which counts the number of lines and characters. I'll name that tool wc (sadly, I am working on Windows, which doesn't have the UNIX wc program so I must implement my own). The two tools will run on a command line like this:
mimic | wc
The output of mimic is piped into wc.
Well, that's the idea.
I implemented the mimic tool with a very simple Flex lexer, which I show below. Compiling it generated mimic.exe.
When I run mimic.exe from the command line it does indeed mimic what I type:
> mimic
hello world
hello world
greetings
greetings
ctrl-c
I implemented wc using AWK. The AWK program (wc.awk) is shown below. When I run wc from the command line it does indeed count the lines and characters:
> echo Hello World | awk -f wc.awk
lines 1 chars 13
However, when I put them together with a pipe, they don't work as I imagined they would. Here's a sample run:
> mimic | awk -f wc.awk
hello world
greetings
ctrl-c
Hmm, nothing. No mimicking. No line counts. No char counts.
How come it's not working? What am I doing wrong, please?
What can I do to make it work as I expected? I expected it to work like this: I type something in at the command line. mimic repeats it, sending it to the pipe which sends it to wc which reports the number of lines and characters. I type in the next thing at the command line. mimic repeats it, sending it to the pipe which sends it to wc which reports the number of lines and characters. And so forth. That's the behavior I thought I was implementing.
Here is my simple Flex lexer (mimic):
%option noyywrap
%option always-interactive
%%
%%
int main(int argc, char *argv[])
{
yyin = stdin;
yylex();
return 0;
}
Here is my simple AWK program (wc):
{ nchars = nchars + length($0) + 1 }
END { printf("lines %-10d chars %d\n", NR, nchars) }
This question has little or nothing to do with Flex, Bison, Awk, and really not so much about Unix, either (since you're experimenting with Windows).
I don't have Windows handy, but the underlying issue is basically about stdio buffering, so it's reproducible on Unix as well.
To simplify, I only implemented mimic, which I did directly rather than using Flex (which is clearly overkill):
#include <stdio.h>
/* Copy stdin to stdout, one character at a time. */
int main(void) {
for (int ch; (ch = getchar()) != EOF; ) putchar(ch);
return 0;
}
Since you use %option always-interactive, which forces Flex to always read one character at a time with fgetc(), your lexer makes basically the same sequence of standard library calls as this program, except that I have simplified an fwrite of one byte to the equivalent putchar.
It certainly has the same execution characteristic:
$ ./mimic
Here we go round the mulberry bush,
Here we go round the mulberry bush,
the mulberry bush, the mulberry bush.
the mulberry bush, the mulberry bush.
In the above, I signalled end-of-input by typing Ctrl-D for the third line. On Windows, I would have had to type Ctrl-Z followed by Enter to get the same effect. If I kill the execution by typing Ctrl-C instead, I get roughly the same result (other than the fact that Ctrl-C shows up in the console):
$ ./mimic
Here we go round the mulberry bush,
Here we go round the mulberry bush,
the mulberry bush, the mulberry bush.
the mulberry bush, the mulberry bush.
^C
Now, since mimic just copies stdin to stdout, I might expect to be able to pipe it into itself and get the same result. But the output is a little different:
$ ./mimic | ./mimic
Here we go round the mulberry bush,
the mulberry bush, the mulberry bush.
Here we go round the mulberry bush,
the mulberry bush, the mulberry bush.
Again, I signaled end of input by typing Ctrl-D. And it was only after I typed the Ctrl-D that any output appeared; at that point, both lines were echoed. And if I terminate the program abruptly with Ctrl-C, I don't get any output at all:
$ ./mimic | ./mimic
Here we go round the mulberry bush,
the mulberry bush, the mulberry bush.
^C
OK, that's the data. Now, we need an explanation. And the explanation has to do with the way C standard library streams are buffered, which you can read about in the setvbuf manpage (and in many places on the web). I'll summarise:
The C standard specifies that all input functions execute "as if" implemented by repeated calls to fgetc, and all output functions "as if" implemented by repeated calls to fputc. Note that this means that there is nothing special about the chunk of bytes written by a single printf or fwrite. The library does not do anything to ensure that the sequence of calls is atomic. If you have two processes both writing to stdout, the messages can get interleaved, and that will happen from time to time.
The data written by fputc (and, consequently, all stdio output functions) is actually placed into an output buffer. This buffer is not part of the Operating System (which may add another layer of buffering). It's strictly part of the stdio library functions, which are ordinary userland functions. You could write them yourself (and it's a pretty good exercise to do so). From time to time, the contents of this buffer are sent to the appropriate operating system interface in order to be transferred to the output device.
"From time to time" is deliberately unspecific. There are three standard buffering modes, although the standard doesn't require them all to be used by a particular library implementation, and it also doesn't restrict the library implementation from using different buffering modes (although the three specified modes do basically cover the useful possibilities). However, most C library implementations you're likely to use do implement all three, pretty well as described in the standard. So take the following as a description of a common implementation technique, but be aware that on certain idiosyncratic platforms, it might not be accurate.
The three buffering modes are:
Unbuffered. In this mode, each byte written is transferred to the operating system (and, it is hoped, to the actual output device) as soon as possible.
Fully buffered. In this mode, there is a buffer of a predetermined size (often 8 kilobytes, but different library implementations have different defaults for different platforms). If you want to, you can supply your own buffer (of an arbitrary size) for a particular output stream, using the setvbuf standard library function (q.v.). Fully-buffered output might stay in the output buffer until it is full (although a given implementation may release output earlier). The buffer will, however, be sent to the operating system if you call fflush or fclose on the stream, or if fclose is called automatically when main returns. (It's not sent if the process dies, though.)
Line-buffered. In this mode, the stream again has an output buffer of a predetermined size, which is usually exactly the same as the buffer used in "fully-buffered" mode. The difference is that the buffer is also sent to the operating system when a new line character ('\n') is written. (If the buffer gets full before an end-of-line character is written, then it is sent to the operating system, just as in Fully-Buffered mode. But most of the time, lines will be fairly short, so the buffer will be sent to the OS at the end of each line.)
Finally, you need to know that stdout is fully-buffered by default unless the standard library can determine that stdout is connected to some kind of console device, in which case it is not fully-buffered. (On Unix, it is typically line-buffered.) By contrast, stderr is not fully-buffered. (On Unix, it is typically unbuffered.) You can change the buffering mode, and the buffer size if relevant, by calling setvbuf before any other operation on the stream.
The above-mentioned defaults are one of the reasons you are encouraged to write error messages to stderr rather than stdout: since stderr is unbuffered, the error message will appear as soon as possible. Also, you should normally put \n at the end of each output line; if standard output is a terminal and therefore line-buffered, the \n will ensure that the line is actually output, rather than languishing in the output buffer.
You can see all of this in action in the above examples. When I just ran ./mimic, leaving stdout mapped to the terminal, the output showed up each time I entered a line. (That also has to do with the way terminal input is handled by the terminal driver, which is another kettle of fish.)
But when I piped mimic into itself, the first mimic's standard output was redirected to a pipe. A pipe is not a terminal, so that mimic's stdout is fully-buffered by default. Since the buffer is longer than the total input, the entire program runs without sending anything to stdout, until the buffer is flushed when stdout is implicitly closed by main returning.
Moreover, if I kill the process (by typing Ctrl-C, for example, or by sending it a SIGKILL signal), then the output buffer is never sent to the operating system, and nothing appears on the console.
If you're writing console apps using standard C library calls, it's very important to understand how stdio output buffering affects the sequence of outputs you see. That's just as true on Windows as on Unix. (Of course, it doesn't apply if you use native Windows or Posix I/O interfaces.)

How to make zsh's REPORTTIME work? (time for long-running commands)

This is my .zshrc:
export REPORTTIME=3
When I run sleep 4 it doesn't output anything.
If I change it to REPORTTIME=blablabla (or anything nonsensical), it doesn't raise an error and starts behaving as if REPORTTIME=0, i.e. reporting the time taken for everything.
Interestingly, if I try REPORTTIME=3s I get the following message:
zsh: bad math expression: operator expected at `s'
sleep 4 0.00s user 0.00s system 0% cpu 4.004 total
So I get the error and still the output.
I tried REPORTTIME="3" and even REPORTTIME=1+2. None of these work.
Also, if I run python -c "import time; time.sleep(4)" I get the same results (so the problem is not with sleep).
Of course, I tried other values too (other than 3).
I'm running MacOS with iterm2 and zsh is my default shell.
You need to set it explicitly to a non-negative integer representing seconds; i.e.
% REPORTTIME=3
Setting it to anything other than a non-negative integer does not work on my Zsh v5.4.2 either. With REPORTTIME=3, running something like a system update (e.g., yaourt) then acts as if I had put time in front of it. Pretty slick!
So you need a command that eats some user/system time; sleep does not. Although the total elapsed time is long enough, the user and system times are not:
% time sleep 3
sleep 3 0.00s user 0.00s system 0% cpu 3.002 total
Also, there is no need to export it, since it is used directly by Zsh itself.
You can undo/turn off this behavior with:
% unset REPORTTIME
Docs on REPORTTIME from man zshparam:
If nonnegative, commands whose combined user and system execution times (measured in seconds) are greater than this value have timing statistics printed for them. Output is suppressed for commands executed within the line editor, including completion; commands explicitly marked with the time keyword still cause the summary to be printed in this case.

AutoIt Scripting for an External CLI Program - eac3to.exe

I am attempting to design a front-end GUI for a CLI program by the name of eac3to.exe. The problem as I see it is that this program sends all of its output to a cmd window. This is giving me no end of trouble because I need to get a lot of this output into a GUI window. This sounds easy enough, but I am beginning to wonder whether I have found one of AutoIt's limitations.
I can use the Run() function with a Windows internal command such as Dir and then get the output into a variable with the AutoIt StdoutRead() function, but I just can't get the output from an external program such as eac3to.exe; it just doesn't seem to work whatever I do! Just for testing purposes I don't even need to get the output to a GUI window: just printing it with ConsoleWrite() is good enough, as this proves that I was able to read it into a variable. So at this stage that's all I need to do: get the text (usually about 10 lines) that my external CLI program outputs to a cmd window into a variable. Once I can do this, the rest will be a lot easier. This is what I have been trying, but it never works:
Global $iPID = Run("C:\VIDEO_EDITING\eac3to\eac3to.exe", "", @SW_SHOW)
Global $ScreenOutput = StdoutRead($iPID)
ConsoleWrite($ScreenOutput & @CRLF)
After running this script, all I get from ConsoleWrite() is a blank line, not the text that was output as a result of running eac3to.exe (running eac3to without any arguments just lists a screen of help text describing all the command-line options), and that's what I am trying to get into a variable so that I can put it to use later in the program.
Before I suggest a solution, let me just tell you that AutoIt has one of the best help files out there. Use it.
You are missing the $STDOUT_CHILD flag for Run() ("Provide a handle to the child's STDOUT stream").
Also, you can't just call Run and immediately call StdoutRead. At what point did you give the app any time to do anything and actually print something back to the console?
You need to either use ProcessWaitClose and read the stream afterwards, or read the stream in a loop. The simplest check would be to put a Sleep between Run and StdoutRead and see what happens.
#include <AutoItConstants.au3>
Global $iPID = Run("C:\VIDEO_EDITING\eac3to\eac3to.exe", "", @SW_SHOW, $STDOUT_CHILD)
; Wait until the process has closed using the PID returned by Run.
ProcessWaitClose($iPID)
; Read the stdout stream of the PID returned by Run. This can also be done in a While loop; see the example for StderrRead.
; If the process doesn't end when it has finished, you need to put this inside a loop.
Local $ScreenOutput = StdoutRead($iPID)
ConsoleWrite($ScreenOutput & @CRLF)

Why doesn't "yes | head" hang?

Why doesn't yes | head hang?
I thought the system collected all of the output from yes and then piped it to head, and because yes is an infinite loop, the system would hang. But it actually stops and shows 10 lines of y.
How does the system manage to stop yes when head is done collecting data?
When you say yes | head, the shell arranges things so that the output of yes goes to a pipe and the input of head comes from that same pipe. When head has read 10 lines, it closes its STDIN_FILENO (at the latest when it exits), thereby closing its end of the pipe. When yes then tries to write to a pipe that no longer has a reader, it gets a SIGPIPE, whose default action is to kill it.
A simple way to test this is with strace:
$ strace yes | head
y
[...]
y
write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 4096) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=4069, si_uid=1000} ---
+++ killed by SIGPIPE +++
the system collects all of the result from yes and then pipes it to head
That would be extremely inefficient. When you use a pipe, the operating system creates a buffer for the pipe communication.
send | receive
As long as there's enough space in the buffer, the sending process will write into it, and as long as there's data in the buffer, the receiver will process it. If there isn't, the waiting process is blocked.
As soon as head finishes, no process has the read end of the pipe open any more, so the sender's next write triggers a SIGPIPE signal, which terminates the sending process (unless the process handles or ignores it).

How to avoid multiple writers in a named pipe?

I am writing a program with a named pipe with multiple readers and multiple writers. The idea is to use that named pipe to create pairs of reader/writer. That is:
A reads the pipe
B writes in the pipe
(vice versa)
Pair A-B created!
In order to ensure that only one process is reading and one is writing, I have used two locks with flock, like this:
Reader Code:
echo "[JOB $2, Part $REMAINING] Taking next machine..."
VMTAKEN=$((
flock -x 200;
cat $VMPIPE;
)200>$JOINQUEUELOCK)
echo "[JOB $2, Part $REMAINING] Machine $VMTAKEN taken..."
Writer Code:
((
flock -x 200;
echo "[MACHINE $MACHINEID] I am inside the critical section"
echo "$MACHINEID" > $VMPIPE;
echo "[MACHINE $MACHINEID] Going outside the critical section"
)200>$VMQUEUELOCK)
echo "[MACHINE $MACHINEID] Got new Job"
I sometimes get the following problem:
[MACHINE 3] I am inside the critical section
[JOB 1, Part 249] Taking next machine...
[MACHINE 3] Going outside the critical section
[MACHINE 1] I am inside the critical section
[MACHINE 1] Going outside the critical section
[MACHINE 1]: Got new Job
[MACHINE 3]: Got new Job
[JOB 1, Part 249] Machine 3
1 taken...
As you can see, another writer wrote before the reader finished reading. What can I do to get rid of this problem? Should I use an ACK pipe or something?
Thank you in advance
This would be a typical use for semaphores:
Create 2 semaphores, one for reading processes and the other for writing processes, and set each semaphore to value 1.
Reading processes call sem_wait(3) on the readers' semaphore, which waits until the semaphore is > 0 and lowers it to zero when they get it.
Writing processes do the same with the semaphore intended for them.
A controlling process (which may also set up the semaphores initially) could check whether both semaphores are zero and assign the pair.
The reader and writer release their semaphores (increasing them by 1 again), so the next reader or writer will get the semaphore.
For passing information between reader and writer, shared memory may be used.
