This is the code I'm using to untar a file, grep the contents of the files inside the tar, and then delete the untarred files. I don't have enough space to untar all the files at once.
The issue I'm having is with the for f in `ls | grep -v *.gz` line. This is supposed to find the files that came out of the tar, which can be identified by not having a .tar.gz extension, but it doesn't seem to pick them up.
Any help would be much appreciated.
for i in *.tar.gz; do
    echo $i >> outtput1
    tar -xvvzf $i
    mv $i ./processed/
    for f in `ls | grep -v *.gz`; do   # ----- this is the line that isn't working
        echo $f >> outtput1
        grep 93149249194 $f >> outtput1
        grep 788 $f >> outtput1
        rm -f $f
    done
done
Try ls -1 | grep -v "\.gz$". The -1 makes ls output one result per line. I've also fixed your regex in a few ways: the pattern is a regular expression rather than a glob, so the dot has to be escaped and the $ anchors the match to the end of the filename, and quoting it stops the shell from expanding it before grep sees it.
Although a better way to solve this whole thing is with find -exec.
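A minimal sketch of that approach, assuming GNU find and the same outtput1 log file and search strings as in the question (-H prints the filename with each match, standing in for the echo $f step; the log file itself is excluded so it doesn't get grepped or deleted):
find . -maxdepth 1 -type f ! -name '*.gz' ! -name outtput1 \
    -exec grep -H -e 93149249194 -e 788 {} \; >> outtput1
find . -maxdepth 1 -type f ! -name '*.gz' ! -name outtput1 -delete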
Change it to ls | grep -v "*.gz"; you have to quote *.gz because otherwise the shell will expand it against the files in the working directory before grep ever sees it.
Never use ls in scripts, and don't use grep to match file patterns. Use globbing and tests instead:
for f in *
do
if [[ $f != *.gz ]]
then
...
fi
done
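Applied to the original script, a sketch under the same assumptions as the question (extracted files land in the current directory, matches are logged to outtput1):
for i in *.tar.gz; do
    echo "$i" >> outtput1
    tar -xvzf "$i" && mv "$i" ./processed/
    for f in *; do
        # skip anything still ending in .gz, skip the log file itself,
        # and skip directories such as ./processed
        if [[ $f != *.gz && $f != outtput1 && -f $f ]]; then
            echo "$f" >> outtput1
            grep 93149249194 "$f" >> outtput1
            grep 788 "$f" >> outtput1
            rm -f -- "$f"
        fi
    done
done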
Completely noob question, but: using ls piped to grep, I need to find files or directories that have all-capital names, and directories need "/" appended to indicate that they are directories. Appending the "/" is the only part I am stuck on. Again, I apologize for the amateur question. I currently have ls | grep [A-Z], and the example output should be: BIRD, DOG, DOGDIR/
It's an interesting question because it's a somewhat difficult thing to accomplish with a bash one-liner.
Here's what I came up with. It doesn't seem very elegant, but I'm not sure how to improve it.
find /animals -type d -or -type f \
| grep '/[A-Z]*$' \
| xargs -I + bash -c 'echo -n $(basename +)$( test -d + && echo -n /),\\ ' \
| sed -e 's/, *$//'; echo
I'll break that down for you
find /animals -type d -or -type f writes out, one per line, the directories and files it found in /animals (see below for my test environment Dockerfile; I created /animals to match your desired output). As far as I know, find can't do a regex match on the name, so...
grep '/[A-Z]*$' filters find's output so that only paths are shown where the last part of the file or directory name, after the final /, is all uppercase.
xargs -I + bash -c '...' when you're in a shell and you want a "for" loop, chances are what you should be using is xargs. Learn it, know it, love it. xargs takes its input, split on whitespace (or, with -I, on newlines) by default, and runs the command you give it for each piece of input. So this is going to run a bash shell for each path that passed the grep filter. In my case, -I + makes xargs replace the literal + character with its current input filename; -I also makes it pass the inputs through one at a time. For more information, see the xargs manual page.
'echo -n $(basename +)$( test -d + && echo -n /),\\ ' is the inner bash script that will be run by xargs for each path that got through grep.
basename + cuts the directory component off the path; from your example output you don't want e.g. /animals/DOGDIR/, you want DOGDIR/. basename is the program that trims the directories for us.
test -d + && echo -n / checks whether + (remember, xargs will replace it with the filename) is a directory, and if so runs echo -n /. The -n argument to echo suppresses the newline, which is important for getting the output into the CSV format you specified.
Now we can put it all together and see that we echo -n the output of basename +, with / appended if it's a directory, and then , appended to that. All the echos run with -n to suppress newlines and keep the output CSV-shaped.
| sed -e 's/, *$//'; echo is purely for formatting. Adding , to each individual output was an easy way to get the CSV, but it leaves a trailing , at the end of the list. The sed invocation removes a , followed by any number of spaces at the end of the output so far, i.e. the entire output from all the xargs invocations. And since we never output a newline at the end of all that, the final echo adds one.
Usually in Unix shells you wouldn't want CSV-style output. In most cases you'd instead want newline-separated output, one matching file per line, which would be somewhat simpler to do because you wouldn't need all that faffing with -n and , to make it CSV-style; see the sketch below. But it's a valid requirement if the need is there.
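For comparison, a sketch of that simpler newline-separated variant, assuming the same /animals layout:
find /animals \( -type d -or -type f \) | grep '/[A-Z]*$' | while read -r path; do
    name=$(basename "$path")           # strip the leading directories
    test -d "$path" && name="$name/"   # mark directories with a trailing /
    echo "$name"
done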
FROM debian
RUN mkdir -p /animals
WORKDIR /animals
RUN mkdir -p DOGDIR lowerdir && touch DOGDIR/DOG DOGDIR/lowerDOG2 lowerdir/BIRD
ENTRYPOINT [ "/bin/bash" ]
CMD [ "-c" , "find /animals -type d -or -type f | grep '/[A-Z]*$'| xargs -I + bash -c 'echo -n $(basename +)$( test -d + && echo -n /),\\ ' | sed -e 's/, *$//'; echo"]
$ docker run --rm test
BIRD, DOGDIR/, DOG
You can start looking at
ls -F | grep -v "[[:lower:]]"
I did not add anything for a comma-separated line, because that would be the wrong method: parsing ls should be avoided! It will go wrong for filenames like
I am a terrible filename,
with newlines inside me,
and the ls command combined with grep
will only show the last line
BECAUSE THIS LINE HAS NO LOWERCASE CHARACTERS
To get the files without a pipe, you can use
shopt -s extglob
ls -dp +([[:upper:]])
shopt -u extglob
An explanation of the extglob and uppercase can be found at https://unix.stackexchange.com/a/389071/57293
When you want the output on one line, you can run into trouble with filenames that have newlines or commas in their names. You might be tempted by something like
# parsing ls: yes, this is wrong and fails for some filenames
ls -dp +([[:upper:]]) | tr "\n" "," | sed 's/,$/\n/'
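A safer sketch that builds the list without parsing ls, reusing the same extglob pattern in a bash array (the plain comma separator is my assumption; adjust to taste):
shopt -s extglob nullglob
names=()
for f in +([[:upper:]]); do
    [[ -d $f ]] && f+=/               # mark directories, as ls -p would
    names+=( "$f" )
done
shopt -u extglob nullglob
( IFS=,; printf '%s\n' "${names[*]}" )   # join on commas without spawning ls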
Given a list of filename*.tar.gz archives, I need to see the name of the tar file that contains file1.txt.
How can I improve this command?
for file in filename*.tar.gz; do tar -ztvf "$file" | grep 'file1.txt'; done
What you have so far will print the entry in a tar that matches the pattern "file1.txt", but it will not print the name of the tar file itself that contains the entry.
If you want to print the name of the tar file that contains file1.txt, you can use a conditional statement like this:
for file in filename*.tar.gz; do
if tar ztf "$file" | grep -q file1.txt; then
echo "$file"
fi
done
The condition here is the exit code of grep. If the pattern is found, grep exits with zero, which means success, and the echo "$file" will be executed.
Also note that I added the -q flag to grep to make it quiet, so it does not print the matched line as it would by default. With this flag, grep outputs nothing, and that's fine: we only need the exit code to decide the conditional.
A more compact equivalent:
for file in filename*.tar.gz; do
tar ztf "$file" | grep -q file1.txt && echo "$file"
done
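One refinement worth considering: grep treats the pattern as a regular expression, so the unescaped dot matches any character and the pattern can match anywhere in an entry's path. A tighter sketch, escaping the dot and anchoring to the end of the entry name:
for file in filename*.tar.gz; do
    tar ztf "$file" | grep -q 'file1\.txt$' && echo "$file"
done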
I have a text file that lists a large number of file paths. I need to copy all these files from the source directory (given by the path on each line of the file) to a destination directory.
Currently, the command I tried is
while read line; do cp $line dest_dir; done < my_file.txt
This seems to be a bit slow. Is there a way to parallelise this whole thing or speed it up?
You could try GNU Parallel as follows:
parallel --dry-run -a fileList.txt cp {} destinationDirectory
If you like what it says, remove the --dry-run.
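If GNU Parallel isn't installed, a rough equivalent with GNU xargs (the -P value of 8 is an assumption; tune it to your hardware):
# read one path per line and run up to 8 cp processes at a time
xargs -a fileList.txt -d '\n' -I{} -P8 cp {} destinationDirectory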
You could do something like the following (in your chosen shell):
#!/bin/bash
BATCHSIZE=2
# **NOTE**: check that it exists with -f and points at the right place; you might not need this. Depends on your own taste for risk.
ln -s `which cp` /tmp/myuniquecpname
# **NOTE**: this sort of thing can hit argument-length limits in some shells
for i in `cat test.txt`
do
    BASENAME="`basename "$i"`"
    echo doing /tmp/myuniquecpname "$i" test2/"$BASENAME"
    /tmp/myuniquecpname "$i" test2/"$BASENAME" &
    # count the copies still in flight by looking for the unique name in the process list
    COUNT=`ps -ef | grep /tmp/myuniquecpname | grep -v grep | wc -l`
    # **NOTE**: maybe put a timeout on this loop
    until [ "$COUNT" -lt "$BATCHSIZE" ]; do
        COUNT=`ps -ef | grep /tmp/myuniquecpname | grep -v grep | wc -l`
        echo waiting...
        sleep 1
    done
done
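On a reasonably recent bash (4.3+ for wait -n), the process-counting can be replaced with job control. A sketch, keeping the question's my_file.txt and dest_dir names:
#!/bin/bash
BATCHSIZE=4
while IFS= read -r path; do
    cp -- "$path" dest_dir/ &
    # once BATCHSIZE copies are in flight, block until one of them finishes
    while [ "$(jobs -rp | wc -l)" -ge "$BATCHSIZE" ]; do
        wait -n
    done
done < my_file.txt
wait   # let the last batch drain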
I want to concatenate the files whose names do not include "_BASE_". I thought it would be somewhere along the lines of
ls | grep -v _BASE_ | cat > all.txt
The cat part is what I am not getting right. Can anybody give me some idea about this?
Try this
ls | grep -v _BASE_ | xargs cat > all.txt
You can make ls ignore some files using its --ignore option and then cat the rest into a file.
ls --ignore="*_BASE_*" | xargs cat > all.txt
Also you can do that without xargs:
cat $( ls --ignore="*_BASE_*" ) > all.txt
UPD:
Dale Hagglund noticed that a filename like "Some File" will appear as two filenames, "Some" and "File". To avoid that you can use the --quoting-style=WORD option, where WORD can be shell or escape.
For example, with --quoting-style=shell, Some File will print as 'Some File' and will be interpreted as one file.
Another problem is that the output file could be the same as one of the listed files. We need to ignore it too.
So the answer is:
outputFile=a.txt; ls --ignore="*sh*" --ignore="${outputFile}" --quoting-style=shell | xargs cat > ${outputFile}
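If you'd rather avoid parsing ls entirely, a glob-based sketch (assumes bash with extglob and a flat directory of regular files; the !(...) pattern excludes both the _BASE_ files and the output file itself):
shopt -s extglob nullglob
cat !(*_BASE_*|all.txt) > all.txt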
If you also want files from subdirectories, find is your friend:
find . -type f ! -name '*_BASE_*' ! -path ./all.txt -exec cat {} >> all.txt \+
It searches files in the current directory and its subdirectories, finds only files (-type f), ignores files matching the wildcard pattern *_BASE_*, ignores all.txt, and executes cat in the same manner as xargs would.
Is there a way to search and replace a string recursively in multiple directories using the single Unix command grep?
I know it can be done using a combination of find with other utilities like sed, perl, etc., but is there a way to do this on the Unix command line using only grep?
I don't think grep alone would work here; involving sed and other utilities will be much easier than using just grep.
One way, if you have GNU find and a bash shell:
find /path -type f -iname "*.txt" | while read -r FILE
do
    while read -r LINE
    do
        # substitute the replacement into any line containing the search word
        case "$LINE" in
            *WORD_TO_SEARCH* ) LINE=${LINE//WORD_TO_SEARCH/REPLACE};;
        esac
        echo "$LINE"
    done < "$FILE" > temp    # write the rewritten file to temp in one pass
    mv temp "$FILE"
done
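That said, the usual idiom combines grep, to find the files recursively, with sed, to edit them in place. A sketch assuming GNU grep and GNU sed, with WORD_TO_SEARCH and REPLACE as placeholders:
# -r: recurse, -l: print only the names of files that contain the word
# -d '\n' makes xargs treat each line as one filename
grep -rl 'WORD_TO_SEARCH' /path | xargs -d '\n' sed -i 's/WORD_TO_SEARCH/REPLACE/g'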