Why does sed only show one line through pipe - unix

I have several txt files under a directory, and I want to see the first line of every file.
So I use:
ls *txt | xargs sed -n '1p'
However, it only returns the first line of the first file.
What is wrong?
P.S.: I know I can use head, but what I'm asking is why sed is not working.

Use the argument -t to xargs to see what is going on:
ls *txt | xargs -t sed -n '1p'
You will see that sed is run as:
sed -n '1p' foo.txt bar.txt gar.txt
and as sed treats all its input files as one continuous stream, '1p' prints only the first line of that stream, which is the first line of foo.txt.
xargs assumes you want to pass the list of input files all together.
To tell it to pass one at a time, you need to use the -L 1 option,
which tells xargs to run the command once per line of input from your ls command.
[Note there are other issues you will run into if any file name has a blank in it]
ls *txt | xargs -L 1 -t sed -n '1p'
You will see:
sed -n '1p' foo.txt
sed -n '1p' bar.txt
sed -n '1p' gar.txt
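If any file name contains a blank, the ls | xargs pipeline above will split it apart. A safer per-file variant (a sketch, assuming GNU find and xargs) pairs find's -print0 with xargs -0 and runs sed once per file via -n 1:
find . -maxdepth 1 -type f -name '*txt' -print0 | xargs -0 -n 1 sed -n '1p'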
In unix there are many ways to do any task; other ways include:
(if you use /bin/csh or /bin/tcsh):
foreach f (*txt)
    echo -n "${f}: "
    sed -n '1p' "$f"
end
If you use /bin/sh or /bin/ksh, then:
for files in *txt
do
    echo -n "$files: "
    sed -n '1p' "$files"
done
Also consider using the program find; it lets you qualify the types of files you want to look at and can recursively examine subdirectories:
find . -type f -a -name "*txt" -print -a -exec sed -n '1p' {} \;

First, ls and xargs are not useful here. Please read: "Don't Parse ls". For a more reliable form of the command that works with all kinds of file names, use:
sed -n '1p' *.txt
Second, sed treats its input files all as one stream. So, the above does not do what you want. Use head instead (as you said):
head -n1 *.txt
To suppress the verbose headers that head prints and make the output more like sed 1p, use the -q option:
head -qn1 *.txt
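For illustration, with two hypothetical files foo.txt and bar.txt whose first lines are "alpha" and "beta", head -n1 foo.txt bar.txt prints labelled output:
==> foo.txt <==
alpha

==> bar.txt <==
beta
while head -qn1 foo.txt bar.txt prints just:
alpha
beta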
Handling very many files
If you have a great many .txt files, where, depending on system configuration, "many" likely means several tens of thousands, enough to overflow the argument list, then another approach is needed. find is useful:
find . -maxdepth 1 -name '*.txt' -exec head -n1 {} +

This might work for you (GNU sed):
sed -s '1!d' file1 file2 file...
This will print the first line of each file i.e. delete all lines but the first of each file.
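If you also want each file's name next to its first line, recent GNU sed's F command prints the current input file name (a sketch, assuming GNU sed 4.2.2 or later):
sed -sn '1{F;p}' file1 file2 file...
For each file this prints the file name on one line and the first line of the file on the next.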

Related

How to perform a static count of loaded packages in R?

I'd like to search a directory structure to count the number of times I've loaded various R packages. The source is contained in .org and .R files. I'm willing to assume that "library(" is the first non-blank entry on any line I care about, and that there is at most one such call per line.
find . -regex ".*/.*\.org" -print
gets me a list of .org files, and
find . -regex ".*\.\(org\|R\)$" -print
gets me a list of .org and .R files (thanks to https://unix.stackexchange.com/questions/15308/how-to-use-find-command-to-search-for-multiple-extensions).
Given a particular file,
grep -h "library(" file | sed 's/library(//' | sed 's/)//'
gets me the package name. I'd like to hook them together and then possibly redirect the output to a file, from which I can use R to calculate frequencies.
The seemingly straightforward
find . -regex ".*/.*\.org" -print | xargs -0 grep -h "library(" | sed 's/library(//' | sed 's/)//'
doesn't work; I get:
Usage: /usr/bin/grep [OPTION]... PATTERN [FILE]...
Try '/usr/bin/grep --help' for more information.
and I'm not sure what to do next.
I also tried
find . -regex ".*/.*\.org" -exec grep -h "library(" "{}" "\;"
and got:
find: missing argument to `-exec'
It seems simple. What am I missing?
UPDATE: Adding -t to the above xargs shows me the first command:
grep -h library ./dirname/filename.org
followed by, presumably, a list of all the matching files with paths relative to the PWD. Actually, that works if I only search for .org files; if I add .R files, too, I get "xargs: argument line too long". I think that means xargs is passing the entire list of files as the argument to one invocation of grep.
The -print and -print0 output must be matched with the right xargs input mode:
find ... -print  | xargs       OK
find ... -print0 | xargs -0    OK
find ... -print0 | xargs       broken
find ... -print  | xargs -0    broken (what you used)
Also, please don't:
grep -h "library(" | sed 's/library(//' | sed 's/)//'
when this is faster:
grep -h "library(" | sed -e 's/library(//' -e 's/)//'
and this is even faster, and more interesting:
grep -h "library(" | grep -o '(.*)' | tr -d ' ()'

find + sed, filename output

I have a directory, D:/Temp, with a lot of subfolders containing text files. Each folder has a "file.txt". Some file.txt files contain the word "pattern". I would like to check how many "pattern" words there are, and also get the file path to each such file.txt:
find D:/Temp -type f -name "file.txt" -exec basename {} cat {} \; | sed -n '/pattern/p' | wc -l
Output should be:
4
D:/Temp/abc1/file.txt
D:/Temp/abc2/file.txt
D:/Temp/abc3/file.txt
D:/Temp/abc4/file.txt
Or similar.
You could use GNU grep:
grep -lr --include file.txt "pattern" "D:/Temp/"
This will return the file paths.
grep -cr --include file.txt "pattern" "D:/Temp/"
This will return the count (counting the pattern occurrences rather than the number of files).
Explanation of the flags:
-r makes grep browse its target recursively, so the target can be a directory
--include <glob> makes grep restrict its recursive browsing to files matching the <glob>.
-l makes grep return only the file paths. Additionally, it will stop parsing a file as soon as it has encountered the pattern.
-c makes grep return only the number of matches
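To get both at once, the paths and per-file counts while skipping files with no match, you can filter the -c output (a sketch):
grep -cr --include file.txt "pattern" "D:/Temp/" | grep -v ':0$'
Each remaining line has the form path:count.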
If your file names don't contain spaces then all you need is:
awk '/pattern/{print FILENAME; cnt++; nextfile} END{print cnt+0}' $(find D:/Temp -type f -name "file.txt")
The above uses GNU awk for nextfile.
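If file names may contain spaces, the $(find ...) substitution above splits them apart; letting find hand the list straight to awk avoids that (a sketch; note that with very many files, -exec ... + may run awk more than once, printing one subtotal per run):
find D:/Temp -type f -name "file.txt" -exec awk '/pattern/{print FILENAME; cnt++; nextfile} END{print cnt+0}' {} +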
I'd propose using two commands: one to find all the files:
find ./ -name "file.txt" -exec fgrep -l "pattern" {} \;
Another for counting them:
find ./ -name "file.txt" -exec fgrep -l "pattern" {} \; | wc -l
Previously I've used:
grep -Hc "pattern" $(find D:/temp -type f -name "file.txt")
This will only work if at least one file.txt is found. Otherwise you could use the following, which handles both the found and not-found cases:
searchFiles=$(find D:/temp -type f -name "file.txt"); [[ ! -z "$searchFiles" ]] && grep -Hc "pattern" $searchFiles
The output for this would look more like:
D:/Temp/abc1/file.txt 2
D:/Temp/abc2/file.txt 1
D:/Temp/abc3/file.txt 1
D:/Temp/abc4/file.txt 1
I would use
find D:/Temp -type f -name "file.txt" -exec dirname {} \; > tmpfile
wc -l tmpfile
cat tmpfile
rm tmpfile
Try this safe version (assuming GNU find for -printf):
find D:/Temp -type f -name file.txt -printf "%p\0" | xargs -0 bash -c 'for f in "$@"; do printf "%s:" "$f"; grep -c "pattern" "$f"; done' bash | grep ":[1-9][0-9]*$"
For each file.txt found in the D:/Temp directory and its subdirectories, the bash loop prints the filename and the number of lines which contain pattern (grep -c).
A final grep ":[1-9][0-9]*$" keeps only filenames with a count greater than 0.
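A shorter variant (a sketch) leans on find's -exec ... + and a /dev/null padding argument, which forces grep to prefix every count with a file name even when a batch holds a single file; the /dev/null:0 line is filtered out along with the other zero counts:
find D:/Temp -type f -name file.txt -exec grep -c "pattern" /dev/null {} + | grep -v ':0$'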
The way I'm reading your question, I'm going to answer as if:
some but not all file.txt files contain pattern,
you want a list of the paths leading to file.txt with pattern, and
you want a count of pattern in each of those files.
There are a few options. (Always multiple ways to do anything.)
If your bash is version 4 or higher, you can use globstar to recurse through directories:
shopt -s globstar
for file in **/file.txt; do
    if count=$(grep -c 'pattern' "$file"); then
        printf "%d %s\n" "$count" "${file%/*}"
    fi
done
This works because the if evaluation considers a failed grep (i.e. zero occurrences) to be FALSE, and thus does not print results.
Note that this may be high impact because it launches a separate grep on each file that is found. A lighter weight alternative might be to run a single grep on the fileglob, and parse the results:
shopt -s globstar
grep -c 'pattern' **/file.txt | grep -v ':0$'
This also depends on bash 4, and of course if you have millions of files you may overwhelm bash's command line maximum length. The output of this will be obvious, but you'll need to parse it with care if your filenames contain colons. I.e. cut -d: -f2 may not cut it.
One more option that leverages grep instead of bash might be:
grep -r --include 'file.txt' -c 'pattern' ./ | grep -v ':0$'
This uses GNU grep's --include option, which modifies the behaviour of -r (recursive). It should work in Linux, FreeBSD, NetBSD, OSX, but not with the default grep on OpenBSD or most SVR4 (Solaris, HP/UX, etc).
Note that I have tested none of these. No liability assumed. May contain nuts.
This should do it:
find . -name "file.txt" -type f -exec grep -l 'pattern' {} + | awk '{print} END { print NR }'

How to run a command on all results of find?

Using find I create a file that contains all the files that use a specific key word:
find . -type f | xargs grep -l 'foo' > foo.txt
I want to take that list in foo.txt and maybe run some commands using that list, i.e. run an ls command on the list contained within the file.
You don't need xargs to create foo.txt. Just execute the command with -exec like this:
find . -type f -exec grep -l 'foo' {} \; > foo.txt
Then you can run ls on each entry by looping through the file:
while IFS= read -r file
do
    ls "$file"
done < foo.txt
Maybe it is a little ugly (and it breaks on file names with spaces), but this can also do it:
ls $(cat foo.txt)
You can use xargs like this:
xargs ls < foo.txt
The advantage of xargs is that it will execute the command with multiple arguments which is more efficient than executing the command once per argument using a loop, for example.
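If the paths stored in foo.txt may contain spaces, GNU xargs can be told to split on newlines only (a sketch, assuming GNU xargs):
xargs -d '\n' ls < foo.txt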

How to delete a line from files in subfolders using Unix commands?

I've a folder structure where the starting folder is test. test has two folders in it, test1 and test2, plus a bunch of files. The word welcome appears in this bunch of files as well as in the files in test1 and test2. I tried this and it did not work:
sed 's/\<welcome\>//g' **/*
What am I doing wrong? How can this be done?
This might work for you (GNU sed):
find test -type f -exec sed -i '/welcome/d' '{}' \;
sed -e 's/welcome//g' test > test2
The file test has several entries, including welcome.
The above sed line deletes welcome from test.
You could put it in a loop.
And I see that sputnick has just answered your question to a large extent! :P
Try doing this (GNU sed):
sed -i '/welcome/s/\<welcome\>//g' test/*
The -i switch modifies the files in place, so take care.
If the -i switch is not supported on your Unix:
for i in test/*; do
    sed '/welcome/s/\<welcome\>//g' "$i" > /tmp/.sed && mv /tmp/.sed "$i"
done
find . -type f | xargs perl -ni -e 'print unless /\bwelcome\b/'

find and replace in multiple files on command line

How do I find and replace a string on the command line in multiple files on Unix?
There are many ways, but one answer would be:
find . -name '*.html' |xargs perl -pi -e 's/find/replace/g'
Like the Zombie solution (and faster, I assume) but with sed (standard on many distros and OS X) instead of Perl:
find . -name '*.py' | xargs sed -i.bak 's/foo/bar/g'
This will replace all foo occurrences in your Python files below the current directory with bar and create a backup for each file with the .py.bak extension.
And to remove the .bak files:
find . -name "*.bak" -delete
I always did that with ed scripts or ex scripts.
for i in "$#"; do ex - "$i" << 'eof'; done
%s/old/new/
x
eof
The ex command is just the : line mode from vi.
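The same edit can also be piped in without a heredoc (a sketch, using a hypothetical file name):
printf '%s\n' '%s/old/new/' x | ex - file.txt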
To use find and sed with file names or directories containing spaces, use this:
find . -name '*.py' -print0 | xargs -0 sed -i 's/foo/bar/g'
With a recent bash shell, and assuming you do not need to traverse directories:
for file in *.txt
do
    while read -r line
    do
        echo "${line//find/replace}"
    done < "$file" > temp
    mv temp "$file"
done
