How to scan a folder recursively in ffmpeg to check for presence of subtitles? - unix

I have a folder (with subfolders) of videos on my Raspberry Pi, some containing soft-embedded subtitles and some without.
I want to use ffmpeg to check for the presence of the soft-embedded subtitle, and return a result against each filename so that I can locate.
The videos may be mp4, mkv or avi.
I have been using this command successfully but in a limited way as it only works for .mp4 files and doesn't check recursively. The output is a nice list with either a 0 or 1 at the beginning of the line, where 1 means that there is no subtitle.
for f in *.mp4; do ffmpeg -i "$f" -c copy -map 0:s:0 -frames:s 1 -f null - -v 0 -hide_banner; echo $? "$f" ; done
I have tried all manner of ways like find and -exec, read, xargs, all to no avail. The below is the closest I've got, but it doesn't deal with whitespaces properly so filenames including spaces are split over two lines and the command seems to run only one word at a time, so it fails and shows 1 for everything.
for f in `find /mnt/SSD/ | grep -E '(.mp4|.mkv|.avi)'`; do ffmpeg -i "$f" -c copy -map 0:s:0 -frames:s 1 -f null - -v 0 -hide_banner; echo $? "$f" ; done
Any ideas what I'm doing wrong?

The for command generates the list of items to loop through from the space separated list of words. You can avoid splitting filenames by reading the output of find command line by line, e.g.:
find . -iname "*.mp4" -o -iname "*.mkv" -o -iname "*.avi" | \
while read f; do \
subs_num=$(ffprobe -v quiet -select_streams s -show_entries stream=codec_type -of default=noprint_wrappers=1 "$f" | grep codec_type=subtitle | wc -l); \
[ $subs_num -gt 0 ] && echo "Subtitles: $subs_num File: $f"; \
done
BTW, yo can use ffprobe to detect subtitles streams. The above command counts them and prints the results only for videos with subtitles.

Related

Combine find, grep and xargs with printf

I have a find command combined with exec grep and a printf option :
find -L /home/blast/dirtest -maxdepth 3 **-exec grep -q "pattern" {} \;** -printf '%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n' 2> /dev/null
Result :
f/#/2018-01-01 10:00:00/#/191/#/filee.xml/#//#//home/blast/dirtest/01/05
I need the printf to get all the desired file informations at once (date, type size etc)
The above command works fine. But the exec option is too slow comparing to xargs.
I tryed to do the same with xarg but I did not succeed.
Any Idea on how to acheive that ? using the xargs command keeping the desired printf or similar .
Thanks
Your code is:
find -L /home/blast/dirtest -maxdepth 3 \
-exec grep -q "pattern" {} \; \
-printf '%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n' 2> /dev/null
This invokes a new grep process for each file.
If you are using GNU utilities, you can reduce the number of grep processes by something like:
(
format=\''%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n'\'
find -L /home/blast/dirtest -maxdepth 3 -print0 |\
xargs -0 grep -l -Z "pattern" |\
xargs -0 sh -c 'find "$#" -printf '"$format" --
) 2>/dev/null
for clarity, store the formatstring in a variable
use -print0 / -0 / -Z options to enable null-delimited data
generate initial filelist with find
filter on "pattern" with grep (use of xargs minimises the number of times grep gets called)
feed the filtered filelist into another xargs to run a minimal number of find -printf
in second xargs, call a subshell so that extra arguments can be appended (find requires the paths to precede the operators)
dummy second argument (--) to the sh -c invocation prevents the first filename being lost due to assignment to $0
To do it exactly how you want:
find -L /home/blast/dirtest/ -maxdepth 3 \
-printf '%p#%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n' \
> tmp.out
cut -d# -f1 tmp.out \
| xargs grep -l "pattern" 2>/dev/null \
| sed 's/^/^/; s/$/#/' \
| grep -f /dev/stdin tmp.out \
| sed 's/^.*#//'
This operates under the assumption that you have no character # in your file names.
What it does is avoid the grep at first and just dump all the files with the requested metadata to a temporary file.
But it also prefixes each line with the full path (%p#).
Then we extract (cut) the full paths out of this list and list the files which contains the pattern (xargs grep).
We then use sed to prefix each such file name with ^ and suffix it with #, which makes it a greppable pattern in our tmp.out file.
Then we use this pattern (grep -f /dev/stdin) to extract only those paths from the big list in tmp.out.
Now all that's left is to remove the artificial full path we prefixed using the last sed command.
Seeing how you used /home, there's a good chance you're on Linux, which, if you're willing to accept some output format changes, allows you to do it somewhat more elegantly:
find -L /home/blast/dirtest/ -maxdepth 3 \
| xargs grep -l "pattern" 2>/dev/null \
| xargs stat --printf '%F/#/%y/#/%s/#/%n\n'
The output of stat --printf is different from that of find -printf (and from that of MacOS' stat -f), but it's the same information.
Do note, however, that because you passed -L to find, and you're grepping the result:
The results are limited to file types which can be grepped, so they will never be directories, links, etc..
If you stumble upon a broken link, it will not be in the output because it cannot be grepped.
I'v found an intresting thing about the -exec option.
We could run the grep once using the exec with the plus-sign (+)
-exec command {} +
This variant of the -exec option runs the specified command on the selected files, but the command line is built by appending each selected file name at the end; the total
number of invocations of the command will be much less than the number of matched files. The command line is built in much the same way that xargs builds its command
lines. Only one instance of ’{}’ is allowed within the command. The command is executed in the starting directory.
That means if I change this :
-exec grep -l 'pattern' {} \;
By this ( replace the semicolon with the plus signe ):
-exec grep -l 'pattern' {} \+
Will improve the performance significantly.
Then I can pipe only one xargs for the format printing needs only.

Append "/" to end of directory

Completely noob question but, using ls piped to grep, I need to find files or directories that have all capitals in their name, and directories need to have "/" appended to indicate that it is a directory. Trying to append the "/" is the only part I am stuck on. Again, I apologize for the amateur question. I currently have ls | grep [A-Z] and the example out should be: BIRD, DOG, DOGDIR/
It's an interesting question because it's a somewhat difficult thing to accomplish with a bash one-liner.
Here's what I came up with. It doesn't seem very elegant, but I'm not sure how to improve.
find /animals -type d -or -type f \
| grep '/[A-Z]*$' \
| xargs -I + bash -c 'echo -n $(basename +)$( test -d + && echo -n /),\\ ' \
| sed -e 's/, *$//'; echo
I'll break that down for you
find /animals -type d -or -type f writes out, once per line, the directories and files it found in /animals (see below for my test environment dockerfile - I created /animals to match your desired output). Find can't do a regex match as far as I know on the name, so...
grep '/[A-Z]*$' filter's find's output so that only paths are shown where the last part of the file or directory name, after the final /, is all uppercase
xargs -I + bash -c '...' when you're in a shell and you want to use a "for" loop, chances are what you should be using is xargs. Learn it, know it, love it. xargs takes its input, separated by default by $IFS, and runs the command you give it for each piece of input . So this is going to run a bash shell for each path. that passed the grep filter. In my case, -I + will make xargs replace the literal '+' character with its current input filename. -I also makes it pass one at a time through xargs. For more information, see the xargs manual page.
'echo -n $(basename +)$( test -d + && echo -n /),\\ ' this is the inner bash script that will be run by xargs for each path that got through grep.
basename + cuts the directory component off the path; from your example output you don't want eg /animals/DOGDIR/, you want DOGDIR/. basename is the program that trims the directories for us.
test -d + && echo -n / checks to see whether + (remember xargs will replace it with filename) is a directory ,and if so, runs echo -n /. the -n argument to echo suppresses the newline, important to get the output in the CSV format you specified.
now we can put it all together to see that we're echo -n the output of basename + , with / appended, if it's a directory, and then , appended to that. All the echos run with -n to suppress newlines to keep output CSV looking.
| sed -e 's/, *$//'; echo is purely for formatting. Adding , to each individual output was an easy way to get the CSV, but it leaves us with a final , at the end of the list. The sed invocation removes , followed by any number of spaces at the end of the output so far - eg the entire output from all the xargs invocations. And since we never did output a newline at the end of that output, the final echo is adding that.
Usually in unix shells, you probably wouldn't want a CSV style output. You'd probably instead want a newline-separated output in most cases, one matching file per line, and that would be somewhat simpler to do because you wouldn't need all that faffing with -n and , to make it CSV style. But, valid requirement if the need is there.
FROM debian
RUN mkdir -p /animals
WORKDIR /animals
RUN mkdir -p DOGDIR lowerdir && touch DOGDIR/DOG DOGDIR/lowerDOG2 lowerdir/BIRD
ENTRYPOINT [ "/bin/bash" ]
CMD [ "-c" , "find /animals -type d -or -type f | grep '/[A-Z]*$'| xargs -I + bash -c 'echo -n $(basename +)$( test -d + && echo -n /),\\ ' | sed -e 's/, *$//'; echo"]
$ docker run --rm test
BIRD, DOGDIR/, DOG
You can start looking at
ls -F | grep -v "[[:lower:]]"
I did not add something for a comma-seperated line, because this is the wrong method: Parsing ls should be avoided ! It will go wrong for filenames like
I am a terribble filename,
with newlines inside me,
and the ls command combined with grep
will only show the last line
BECAUSE THIS LINE HAS NO LOWERCASE CHARACTERS
To get the files without a pipe, you can use
shopt -s extglob
ls -dp +([[:upper:]])
shopt -u extglob
An explanation of the extglob and uppercase can be found at https://unix.stackexchange.com/a/389071/57293
When you want the output in one line, you can get troubles with filenames that have newlines or commas in its name. You might want something like
# parsing ls, yes wrong and failing for some files
ls -dp +([[:upper:]]) | tr "\n" "," | sed 's/,$/\n/'

What is the fastest way to copy a large number of file paths mentioned in a text file, from one directory to another

I have a text file that lists a large number of file paths. I need to copy all these files from the source directory (mentioned in the path in the file one every line) to a destination directory.
Currently, the command line I tried is
while read line; do cp $ line dest_dir; done < my_file.txt
This seems to be a bit slow. Is there a way to parallelise this whole thing or speed it up ?
You could try GNU Parallel as follows:
parallel --dry-run -a fileList.txt cp {} destinationDirectory
If you like what it says, remove the --dry-run.
You could do something like the following (in your chosen shell)
#!/bin/bash
BATCHSIZE=2
# **NOTE**: check exists with -f and points at the right place. you might not need this. depends on your own taste for risk.
ln -s `which cp` /tmp/myuniquecpname
# **NOTE**: this sort of thing can have limits in some shells
for i in `cat test.txt`
do
BASENAME="`basename $i`"
echo doing /tmp/myuniquecpname $i test2/$BASENAME &
/tmp/myuniquecpname $i test2/$BASENAME &
COUNT=`ps -ef | grep /tmp/myuniquecpname | grep -v grep | wc -l`
# **NOTE**: maybe need to put a timeout on this loop
until [ $COUNT -lt $BATCHSIZE ]; do
COUNT=`ps -ef | grep /tmp/myuniquecpname | grep -v grep | wc -l`
echo waiting...
sleep 1
done
done

How to copy files in shell that do not end with a certain file extension

For example copy all files that do not end with .txt
Bash will accept a not pattern.
cp !(*.txt)
You can use ls with grep -v option:
for i in `ls | grep -v ".txt"`
do
cp $i $dest_dir
done
Depending on how many assumptions you can afford to make about the characters in the file names, it might be as simple as:
cp $(ls | grep -v '\.txt$') /some/other/place
If that won't work for you, then maybe find ... -print0 | xargs -0 cp ... can be used instead (though that has issues - because the destination goes at the end of the argument list).
On MacOS X, xargs has an option -J that supports what is needed:
-J replstr
If this option is specified, xargs will use the data read from standard input to replace the first occurrence of replstr instead of append-
ing that data after all other arguments. This option will not affect how many arguments will be read from input (-n), or the size of the
command(s) xargs will generate (-s). The option just moves where those arguments will be placed in the command(s) that are executed. The
replstr must show up as a distinct argument to xargs. It will not be recognized if, for instance, it is in the middle of a quoted string.
Furthermore, only the first occurrence of the replstr will be replaced. For example, the following command will copy the list of files and
directories which start with an uppercase letter in the current directory to destdir:
/bin/ls -1d [A-Z]* | xargs -J % cp -rp % destdir
It appears the GNU xargs does not have -J but does have the related but slightly restrictive -I option (which is also present in MacOS X):
-I replace-str
Replace occurrences of replace-str in the initial-arguments with
names read from standard input. Also, unquoted blanks do not
terminate input items; instead the separator is the newline
character. Implies -x and -L 1.
You can rely on:
find . -not -name "*.txt"
By using:
find -x . -not -name "*.txt" -d 1 -exec cp '{}' toto/ \;`
Which copies all file that are not .txt of the current directory to a subdirectory toto/. the -d 1 is used to prevent recursion here.
Either do:
for f in $(ls | grep -v "\.txt$")
do
cp -- "$f" ⟨destination-directory⟩
done
or if you have a huge amount of files:
find -prune \! -name "*.txt" -exec cp -- "{}" ⟨destination-directory⟩ .. \;
Two things here to comment on. One is the use of the double hyphen in the invocation of cp, and the quoting of $f. The first guards against "wacky" filenames that begin with a hyphen and might be interpreted as options. The second guards agains filenames with spaces (or what's in IFS) in them.
In zsh:
setopt extendedglob
cp *^.txt /some/folder
(if you just want files)...
cp *.^txt(.) /some/folder
More information on zsh globbing here and here.
I would do it like this, where destination is the destination directory:
ls | grep -v "\.txt$" | xargs cp -t destination
Edit: added "-t" thanks to the comments

for i in `ls |grep` question

This is the code i'm using to untar a file grep on the contents of the files within the tar and then delete the untared files. I dont have enough space to untar all files at once.
the issue i'm having is with the for f in `ls | grep -v *.gz line this is supposed to find the files that have come out of the tar and can be identified by not having a .tar.gz extension but it doesnt seem to pick them up?
Any help would be much appreciated
M
for i in *.tar.gz;
do echo $i >>outtput1;
tar -xvvzf $i; mv $i ./processed/;
for f in `ls | grep -v *.gz`; ----- this is the line that isn't working
do echo $f >> outtput1;
grep 93149249194 $f >>
outtput1; grep 788 $f >> outtput1;
rm -f $f;
done;
done
Try ls -1 | grep -v "\\.gz$". The -1 will make ls output one result per line. I've also fixed your regex for you in a few ways.
Although a better way to solve this whole thing is with find -exec.
Change it to ls | grep -v "*.gz", you have to quote *.gz because otherwise it will just glob the files in the working directory and grep them.
Never use ls in scripts, and don't use grep to match file patterns. Use globbing and tests instead:
for f in *
do
if [[ $f != *.gz ]]
then
...
fi
done

Resources