Difference between cat filename and cat < filename in unix - unix

Suppose I have a file named "file1". I want to display the contents of "file1" using the cat command in Unix.
Both cat file1 and cat < file1 are working similarly. What is the difference between them?

It's where input comes from.
If you say cat file1, the shell doesn't do anything special. cat calls open(2) on the file and reads from it.
If you say cat < file1, the shell calls open(2) on the file and uses dup2(2) to make it STDIN_FILENO for cat. cat just reads from STDIN_FILENO.
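A minimal sketch of this difference, assuming a POSIX shell and a throwaway directory (the file name missing is made up for illustration). Both forms print the same content, but when the file does not exist, the error comes from whichever program tried to open it:

```shell
# Work in a scratch directory; file1 is the example name from the question.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf 'hello\n' > file1

by_arg=$(cat file1)        # cat itself opens file1
by_stdin=$(cat < file1)    # the shell opens file1; cat only reads stdin

# "missing" does not exist. "|| true" keeps the script going if it runs under set -e.
err_arg=$(cat missing 2>&1) || true              # error reported by cat
err_stdin=$( { cat < missing; } 2>&1 ) || true   # error reported by the shell; cat never starts

printf 'argument form:    %s\n' "$err_arg"
printf 'redirection form: %s\n' "$err_stdin"
```

The first message is prefixed with cat:, while the second is prefixed by the shell. The exact wording varies between shells, so it is not reproduced here.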

We can use another command to see the difference:
wc -w food2.txt
Possible output:
6 food2.txt
The command prints the file name because it knows it (it was passed as an argument).
wc -w < food2.txt
Possible output:
6
Standard input is redirected from the file food2.txt without the command knowing the file's name.

cat fileName makes cat open the file itself, while cat < fileName tells the shell to open the file and attach it to cat's standard input.
Here is a link with more detailed information/answer:
https://unix.stackexchange.com/questions/258931/difference-between-cat-and-cat

Related

Use ls to show only certain number of items

I'm trying to get a simple myhead command in C to show the top 10 lines of the first five .HTML files in a directory. I was advised to use ls to carry this out in conjunction with my myhead command. My main issue is with getting ls to only show 5 .html files and not list them all.
I was thinking something like this
ls *.html -n 5 > myhead
However, that doesn't exist. Any ideas? We are only meant to use ls and myhead.
If I understand correctly, you've written a C program myhead that prints out ten lines of a file passed in.
You definitely don't want to do this
ls *.html -n 5 > myhead
This would overwrite or create a new file myhead in the current directory.
The key things needed to achieve this are command-line pipes, which let the stdout of one command become the stdin of the next. You'll also need command substitution, which takes the stdout of a command (or of piped commands) and uses it as text for another command. Historically this has been done with backticks, as in `ls`; in bash you can also use $(ls) to get an ls listing and use it as text for another command.
Given you're okay with the standard ls file list order you can do this to get the first 5 .html files:
ls *.html | head -n 5
I don't know what myhead is or how it works, as it's not explained in the question. You say it shows the first ten lines of a file passed into it. There are a few ways it could do that.
I'll give a solution for each possibility (assuming you're using bash):
take one file at a time, passed in as an argument
for f in $(ls *.html | head -n 5) ; do myhead "$f" ; done
take multiple files at a time, passed in as multiple arguments
myhead $(ls *.html | head -n 5)
take the contents of a file passed in through stdin
for f in $(ls *.html | head -n 5) ; do cat "$f" | myhead ; done
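If the "only ls and myhead" restriction can be relaxed, a glob avoids parsing ls output, which breaks on file names containing whitespace. This is only a sketch: it uses the standard head -n 10 as a stand-in for myhead (whose interface is not known), and the function name first_five and the sample file names are made up for illustration.

```shell
# Scratch directory with seven sample .html files of 12 lines each.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
for n in 1 2 3 4 5 6 7; do
  seq 1 12 > "page$n.html"
done

# Print the first 10 lines of at most five .html files, in glob (sorted) order.
first_five() {
  count=0
  for f in *.html; do
    [ -e "$f" ] || continue                  # skip the literal pattern if nothing matches
    count=$((count + 1))
    if [ "$count" -gt 5 ]; then break; fi
    head -n 10 "$f"                          # stand-in for: myhead "$f"
  done
}

shown=$(first_five)
printf '%s\n' "$shown" | wc -l               # 5 files x 10 lines
```

Quoting "$f" keeps each file name as a single argument even if it contains spaces.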
What you're looking for are pipes
You can use them like this:
# all the output (STDOUT) of ls is passed as the input (STDIN) of myhead
ls *.html | myhead -5
myhead reads the input on STDIN, and outputs N lines of it on STDOUT.
With the standard Unix head command, you can do this:
head -n 10 $(ls *.html | head -n 5)
First, you should run this command exactly as shown in your shell to verify it works. Next, try it with your myhead command instead of head.

Why not pipe list of file names into cat?

What is the design rationale that cat doesn't take list of file names from pipe input? Why did the designers choose that the following does not work?
ls *.txt | cat
Instead of this, they chose that we need to pass the file names as argument to cat as:
ls *.txt | xargs cat
When you say ls *.txt | cat doesn't work, you should say that it doesn't work as you expect. In fact, it works exactly as designed.
From man:
cat - Concatenate FILE(s), or standard input, to standard output
Suppose the next output:
$ ls *.txt
file1.txt
file2.txt
... the input to cat will be:
file1.txt
file2.txt
...and that's exactly what cat writes to standard output.
In some shells, it's equivalent to:
cat <(ls *.txt)
or
ls *.txt > tmpfile; cat tmpfile
So cat is really working as its designers intended.
On the other hand, what you are expecting is that cat interprets its input as a set of filenames to read and concatenate, but when you pipe to cat, that input is treated as the contents of a single file.
In short, cat is a command, like echo, cp, and a few others, that does not convert a piped input stream into arguments.
So xargs is used to pass the input stream as arguments to the command.
More details here: http://en.wikipedia.org/wiki/Xargs
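A small sketch of the two behaviours, assuming a POSIX shell and a scratch directory with two made-up files:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf 'alpha\n' > file1.txt
printf 'beta\n'  > file2.txt

# cat receives the list of names on stdin and simply copies it.
names=$(ls *.txt | cat)

# xargs reads the names from stdin and passes them to cat as arguments,
# so cat opens each file and prints its contents.
contents=$(ls *.txt | xargs cat)

printf 'piped to cat:\n%s\n' "$names"
printf 'piped to xargs cat:\n%s\n' "$contents"
```

Note that xargs splits its input on whitespace by default, so file names containing spaces need extra care.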
As a former Unix SA and now a Python developer, I believe xargs could be compared to StringIO/cStringIO in Python, as it helps in a similar way.
As for your question, why didn't they allow stream input? Here is what I think:
Nobody but them could answer this.
I believe, however, that cat is meant to print the content of a file to stdout, while echo is meant to print the content of a string to stdout.
Each of these commands had a specific role when it was created.

How to cat using part of a filename in terminal?

I'm using terminal on OS 10.X. I have some data files of the format:
mbh5.0_mrg4.54545454545_period0.000722172513951.params.dat
mbh5.0_mrg4.54545454545_period0.00077271543854.params.dat
mbh5.0_mrg4.59090909091_period-0.000355232058085.params.dat
mbh5.0_mrg4.59090909091_period-0.000402015664015.params.dat
I know that there will be some files with similar numbers after mbh and mrg, but I won't know ahead of time what the numbers will be or how many similarly numbered ones there will be. My goal is to cat all the data from all the files with similar numbers after mbh and mrg into one data file. So from the above I would want to do something like...
cat mbh5.0_mrg4.54545454545*dat > mbh5.0_mrg4.54545454545.dat
cat mbh5.0_mrg4.5909090909*dat > mbh5.0_mrg4.5909090909.dat
I want to automate this process because there will be many such files.
What would be the best way to do this? I've been looking into sed, but I don't have a solution yet.
for file in *.params.dat; do
prefix=${file%_*}
cat "$file" >> "$prefix.dat"
done
The expansion ${file%_*} removes the last underscore and everything after it from the end of $file; the result is saved in the prefix variable. (Ref: http://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion)
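To see what the expansion does on one of the file names from the question, here is a short sketch. The variable name longest is introduced only for comparison with the %% form, which removes the longest match instead of the shortest:

```shell
file='mbh5.0_mrg4.54545454545_period0.000722172513951.params.dat'

prefix=${file%_*}     # remove the shortest suffix matching "_*"
longest=${file%%_*}   # remove the longest suffix matching "_*"

echo "$prefix"        # mbh5.0_mrg4.54545454545
echo "$longest"       # mbh5.0
```

The single % is what the loop needs here, because it keeps both the mbh and mrg parts of the name.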
It's not 100% clear to me what you're trying to achieve here but if you want to aggregate files into a file with the same number after "mbh5.0_mrg4." then you can do the following.
ls -l mbh5.0_mrg4* | awk '{print "cat " $9 " >> mbh5.0_mrg4." substr($9,13,11) ".dat" }' | /bin/bash
The "ls -l" lists the files and the "awk" takes the 9th column (the file name) from the output of ls. With some string concatenation, each line becomes a cat command that appends (>>) the file to its group's output file, and the result is passed to /bin/bash to be executed.
This is a Linux bash script, so it assumes you have /bin/bash; I'm not 100% familiar with OS X. The script also assumes that the number you're grouping on is always in the same place in the filename. I think you can change /bin/bash to almost any shell you have installed.

How to display contents of all files under a directory on the screen using unix commands

Using cat command as follows we can display content of multiple files on screen
cat file1 file2 file3
But suppose a directory has more than 20 files, and I want the content of all of them displayed on the screen without naming every file in the cat command as above.
How can I do this?
You can use the * character to match all the files in your current directory.
cat * will display the content of all the files.
If you want to display only files with .txt extension, you can use cat *.txt, or if you want to display all the files whose filenames start with "file" like your example, you can use cat file*
If it's just one level of subdirectory, use cat * */*
Otherwise,
find . -type f -exec cat {} \;
which means run the find command, to search the current directory (.) for all ordinary files (-type f). For each file found, run the application (-exec) cat, with the current file name as a parameter (the {} is a placeholder for the filename). The escaped semicolon is required to terminate the -exec clause.
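find also accepts + instead of the escaped semicolon, in which case it passes many file names to a single cat invocation rather than starting cat once per file. A sketch under the assumption of a scratch directory; the file names are made up, and the output is sorted only because find does not guarantee any particular order:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
mkdir sub
printf 'one\n' > plain.txt
printf 'two\n' > 'sub/with space.txt'   # a name with a space, to show it is handled

# {} + batches the found names into as few cat invocations as possible.
all=$(find . -type f -exec cat {} + | sort)

printf '%s\n' "$all"
```

Because find hands the names to cat directly, names containing spaces are passed through intact.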
I also found it useful to print filename before printing content of each file:
find ./ -type f | xargs tail -n +1
It will go through all subdirectories as well.
Have you tried this command?
grep . *
It's not suitable for large files but works for /sys or /proc, if this is what you meant to see.
You could use awk too. Let's say we need to print the content of all the text files in a directory some-directory:
awk '{print}' some-directory/*.txt
If you want to run more than one command for every file, a for loop is more flexible. For example, to print each filename followed by its contents:
for file in parent_dir/*.file_extension; do echo "$file"; cat "$file"; echo; done

difference between grep Vs cat and grep

I would like to know the difference between the two commands below. I understand that 2) should be used, but I want to know the exact sequence of events in 1) and 2).
Suppose filename has 200 characters in it.
1) cat filename | grep regex
2) grep regex filename
Functionally (in terms of output), those two are the same. The first one actually creates a separate process, cat, which simply sends the contents of the file to standard output, which shows up on the standard input of grep because the shell has connected the two with a pipe.
In that sense grep regex <filename is also equivalent but with one less process.
Where you'll start seeing the difference is in variants when the extra information (the file names) is used by grep, such as with:
grep -n regex filename1 filename2
The difference between that and:
cat filename1 filename2 | grep -n regex
is that the former knows about the individual files whereas the latter sees it as one file (with no name).
While the former may give you:
filename1:7:line with regex in 10-line file
filename2:2:another regex line
the latter will be more like:
7:line with regex in 10-line file
12:another regex line
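The outputs above can be reproduced with a short sketch, assuming a scratch directory; the filler lines are made up so that the matches land on line 7 of a 10-line file and line 2 of the second file:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# filename1: 10 lines, with the match on line 7.
for n in 1 2 3 4 5 6 7 8 9 10; do
  if [ "$n" -eq 7 ]; then
    echo 'line with regex in 10-line file'
  else
    echo "filler $n"
  fi
done > filename1

# filename2: the match is on line 2.
printf 'filler\nanother regex line\n' > filename2

per_file=$(grep -n regex filename1 filename2)        # grep knows each file
merged=$(cat filename1 filename2 | grep -n regex)    # grep sees one unnamed stream

printf '%s\n' "$per_file"
printf '%s\n' "$merged"
```

In the merged case the second match is reported on line 12 because the numbering continues across the concatenated input (10 lines from filename1 plus 2).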
Another executable that acts differently if it knows the file names is wc, the word counter program:
$ cat qq.in
1
2
3
$ wc -l qq.in # knows file so prints it
3 qq.in
$ cat qq.in | wc -l # does not know file
3
$ wc -l <qq.in # also does not know file
3
First one:
cat filename | grep regex
Normally cat opens the file and writes its contents to stdout. Here, its stdout is connected to a pipe ('|'). grep reads from that pipe as its stdin and prints each line that matches regex to stdout. One detail: the shell starts cat and grep as separate processes and connects them with the pipe.
Second one:
grep regex filename
Here grep reads directly from the file (above, it was reading from the pipe) and prints each line that matches regex to stdout.
If you want to check the actual execution time difference, first create a file with 100000 lines:
user#server ~ $ for i in $(seq 1 100000); do echo line${i} >> test_f; done
user#server ~ $ wc -l test_f
100000 test_f
Now measure:
user#server ~ $ time grep line test_f
#...
real 0m1.320s
user 0m0.101s
sys 0m0.122s
user#server ~ $ time cat test_f | grep line
#...
real 0m1.288s
user 0m0.132s
sys 0m0.108s
As we can see, the difference is not too big...
Although the outputs are the same, the two commands work differently:
$ cat filename | grep regex
This command first outputs the content of the file "filename", and grep then searches that output for regex; while
$ grep regex filename
This command searches for regex directly in the file "filename".
Functionally they are equivalent, however, the shell will fork two processes for cat filename | grep regex and connect them with a pipe.
